ARM Architecture Reference Manual ARMv7 A And R Edition V7

Page Count: 2158

ARM® Architecture Reference Manual
ARM®v7-A and ARM®v7-R edition

Copyright © 1996-1998, 2000, 2004-2008 ARM Limited. All rights reserved.
ARM DDI 0406B

ARM Architecture Reference Manual
ARMv7-A and ARMv7-R edition
Release Information
The following changes have been made to this document.
Change History

Date            Issue  Confidentiality   Change
05 April 2007   A      Non-Confidential  New edition for ARMv7-A and ARMv7-R architecture profiles.
                                         Document number changed from ARM DDI 0100 to ARM DDI 0406
                                         and contents restructured.
29 April 2008   B      Non-Confidential  Addition of the VFP Half-precision and Multiprocessing
                                         Extensions, and many clarifications and enhancements.

From ARMv7, the ARM® architecture defines different architectural profiles, and this edition of this manual describes
only the A and R profiles. For details of the documentation of the ARMv7-M profile, see Further reading on page xx.
Before ARMv7 there was only a single ARM Architecture Reference Manual, with document number DDI 0100. The first
issue of this was in February 1996, and the final issue, Issue I, was in July 2005. For more information, see Further reading
on page xx.
Proprietary Notice
Words and logos marked with ® or ™ are registered trademarks or trademarks of ARM Limited in the EU and other
countries, except as otherwise stated below in this proprietary notice. Other brands and names mentioned herein may be
the trademarks of their respective owners.
Neither the whole nor any part of the information contained in, or the product described in, this document may be adapted
or reproduced in any material form except with the prior written permission of the copyright holder.
The product described in this document is subject to continuous developments and improvements. All particulars of the
product and its use contained in this document are given by ARM in good faith. However, all warranties implied or
expressed, including but not limited to implied warranties of merchantability, or fitness for purpose, are excluded.
1. Subject to the provisions set out below, ARM hereby grants to you a perpetual, non-exclusive, nontransferable, royalty
free, worldwide licence to use this ARM Architecture Reference Manual for the purposes of developing; (i) software
applications or operating systems which are targeted to run on microprocessor cores distributed under licence from ARM;
(ii) tools which are designed to develop software programs which are targeted to run on microprocessor cores distributed
under licence from ARM; (iii) or having developed integrated circuits which incorporate a microprocessor core
manufactured under licence from ARM.
2. Except as expressly licensed in Clause 1 you acquire no right, title or interest in the ARM Architecture Reference
Manual, or any Intellectual Property therein. In no event shall the licences granted in Clause 1, be construed as granting
you expressly or by implication, estoppel or otherwise, licences to any ARM technology other than the ARM Architecture
Reference Manual. The licence grant in Clause 1 expressly excludes any rights for you to use or take into use any ARM
patents. No right is granted to you under the provisions of Clause 1 to; (i) use the ARM Architecture Reference Manual
for the purposes of developing or having developed microprocessor cores or models thereof which are compatible in
whole or part with either or both the instructions or programmers’ models described in this ARM Architecture Reference
Manual; or (ii) develop or have developed models of any microprocessor cores designed by or for ARM; or (iii) distribute
in whole or in part this ARM Architecture Reference Manual to third parties, other than to your subcontractors for the
purposes of having developed products in accordance with the licence grant in Clause 1 without the express written
permission of ARM; or (iv) translate or have translated this ARM Architecture Reference Manual into any other
languages.
3. THE ARM ARCHITECTURE REFERENCE MANUAL IS PROVIDED "AS IS" WITH NO WARRANTIES
EXPRESS, IMPLIED OR STATUTORY, INCLUDING BUT NOT LIMITED TO ANY WARRANTY OF
SATISFACTORY QUALITY, NONINFRINGEMENT OR FITNESS FOR A PARTICULAR PURPOSE.
4. No licence, express, implied or otherwise, is granted to LICENSEE, under the provisions of Clause 1, to use the ARM
tradename, in connection with the use of the ARM Architecture Reference Manual or any products based thereon.
Nothing in Clause 1 shall be construed as authority for you to make any representations on behalf of ARM in respect of
the ARM Architecture Reference Manual or any products based thereon.
Where the term ARM is used to refer to the company it means “ARM or any of its subsidiaries as appropriate”.

Note
The term ARM is also used to refer to versions of the ARM architecture, for example ARMv6 refers to version 6 of the
ARM architecture. The context makes it clear when the term is used in this way.

Copyright © 1996-1998, 2000, 2004-2008 ARM Limited
110 Fulbourn Road Cambridge, England CB1 9NJ
Restricted Rights Legend: Use, duplication or disclosure by the United States Government is subject to the restrictions
set forth in DFARS 252.227-7013 (c)(1)(ii) and FAR 52.227-19.
This document is Non-Confidential. The right to use, copy and disclose this document is subject to the licence set out
above.

ARM DDI 0406B

Copyright © 1996-1998, 2000, 2004-2008 ARM Limited. All rights reserved.

iii

iv

Copyright © 1996-1998, 2000, 2004-2008 ARM Limited. All rights reserved.

ARM DDI 0406B

Contents
ARM Architecture Reference Manual
ARMv7-A and ARMv7-R edition

Preface
About this manual ... xiv
Using this manual ... xv
Conventions ... xviii
Further reading ... xx
Feedback ... xxi

Part A  Application Level Architecture

Chapter A1  Introduction to the ARM Architecture
A1.1  About the ARM architecture ... A1-2
A1.2  The ARM and Thumb instruction sets ... A1-3
A1.3  Architecture versions, profiles, and variants ... A1-4
A1.4  Architecture extensions ... A1-6
A1.5  The ARM memory model ... A1-7
A1.6  Debug ... A1-8

Chapter A2  Application Level Programmers’ Model
A2.1  About the Application level programmers’ model ... A2-2
A2.2  ARM core data types and arithmetic ... A2-3
A2.3  ARM core registers ... A2-11
A2.4  The Application Program Status Register (APSR) ... A2-14
A2.5  Execution state registers ... A2-15
A2.6  Advanced SIMD and VFP extensions ... A2-20
A2.7  Floating-point data types and arithmetic ... A2-32
A2.8  Polynomial arithmetic over {0,1} ... A2-67
A2.9  Coprocessor support ... A2-68
A2.10 Execution environment support ... A2-69
A2.11 Exceptions, debug events and checks ... A2-81

Chapter A3  Application Level Memory Model
A3.1  Address space ... A3-2
A3.2  Alignment support ... A3-4
A3.3  Endian support ... A3-7
A3.4  Synchronization and semaphores ... A3-12
A3.5  Memory types and attributes and the memory order model ... A3-24
A3.6  Access rights ... A3-38
A3.7  Virtual and physical addressing ... A3-40
A3.8  Memory access order ... A3-41
A3.9  Caches and memory hierarchy ... A3-51

Chapter A4  The Instruction Sets
A4.1  About the instruction sets ... A4-2
A4.2  Unified Assembler Language ... A4-4
A4.3  Branch instructions ... A4-7
A4.4  Data-processing instructions ... A4-8
A4.5  Status register access instructions ... A4-18
A4.6  Load/store instructions ... A4-19
A4.7  Load/store multiple instructions ... A4-22
A4.8  Miscellaneous instructions ... A4-23
A4.9  Exception-generating and exception-handling instructions ... A4-24
A4.10 Coprocessor instructions ... A4-25
A4.11 Advanced SIMD and VFP load/store instructions ... A4-26
A4.12 Advanced SIMD and VFP register transfer instructions ... A4-29
A4.13 Advanced SIMD data-processing operations ... A4-30
A4.14 VFP data-processing instructions ... A4-38

Chapter A5  ARM Instruction Set Encoding
A5.1  ARM instruction set encoding ... A5-2
A5.2  Data-processing and miscellaneous instructions ... A5-4
A5.3  Load/store word and unsigned byte ... A5-19
A5.4  Media instructions ... A5-21
A5.5  Branch, branch with link, and block data transfer ... A5-27
A5.6  Supervisor Call, and coprocessor instructions ... A5-28
A5.7  Unconditional instructions ... A5-30

Chapter A6  Thumb Instruction Set Encoding
A6.1  Thumb instruction set encoding ... A6-2
A6.2  16-bit Thumb instruction encoding ... A6-6
A6.3  32-bit Thumb instruction encoding ... A6-14

Chapter A7  Advanced SIMD and VFP Instruction Encoding
A7.1  Overview ... A7-2
A7.2  Advanced SIMD and VFP instruction syntax ... A7-3
A7.3  Register encoding ... A7-8
A7.4  Advanced SIMD data-processing instructions ... A7-10
A7.5  VFP data-processing instructions ... A7-24
A7.6  Extension register load/store instructions ... A7-26
A7.7  Advanced SIMD element or structure load/store instructions ... A7-27
A7.8  8, 16, and 32-bit transfer between ARM core and extension registers ... A7-31
A7.9  64-bit transfers between ARM core and extension registers ... A7-32

Chapter A8  Instruction Details
A8.1  Format of instruction descriptions ... A8-2
A8.2  Standard assembler syntax fields ... A8-7
A8.3  Conditional execution ... A8-8
A8.4  Shifts applied to a register ... A8-10
A8.5  Memory accesses ... A8-13
A8.6  Alphabetical list of instructions ... A8-14

Chapter A9  ThumbEE
A9.1  The ThumbEE instruction set ... A9-2
A9.2  ThumbEE instruction set encoding ... A9-6
A9.3  Additional instructions in Thumb and ThumbEE instruction sets ... A9-7
A9.4  ThumbEE instructions with modified behavior ... A9-8
A9.5  Additional ThumbEE instructions ... A9-14

Part B  System Level Architecture

Chapter B1  The System Level Programmers’ Model
B1.1  About the system level programmers’ model ... B1-2
B1.2  System level concepts and terminology ... B1-3
B1.3  ARM processor modes and core registers ... B1-6
B1.4  Instruction set states ... B1-23
B1.5  The Security Extensions ... B1-25
B1.6  Exceptions ... B1-30
B1.7  Coprocessors and system control ... B1-62
B1.8  Advanced SIMD and floating-point support ... B1-64
B1.9  Execution environment support ... B1-73

Chapter B2  Common Memory System Architecture Features
B2.1  About the memory system architecture ... B2-2
B2.2  Caches ... B2-3
B2.3  Implementation defined memory system features ... B2-27
B2.4  Pseudocode details of general memory system operations ... B2-29

Chapter B3  Virtual Memory System Architecture (VMSA)
B3.1  About the VMSA ... B3-2
B3.2  Memory access sequence ... B3-4
B3.3  Translation tables ... B3-7
B3.4  Address mapping restrictions ... B3-23
B3.5  Secure and Non-secure address spaces ... B3-26
B3.6  Memory access control ... B3-28
B3.7  Memory region attributes ... B3-32
B3.8  VMSA memory aborts ... B3-40
B3.9  Fault Status and Fault Address registers in a VMSA implementation ... B3-48
B3.10 Translation Lookaside Buffers (TLBs) ... B3-54
B3.11 Virtual Address to Physical Address translation operations ... B3-63
B3.12 CP15 registers for a VMSA implementation ... B3-64
B3.13 Pseudocode details of VMSA memory system operations ... B3-156

Chapter B4  Protected Memory System Architecture (PMSA)
B4.1  About the PMSA ... B4-2
B4.2  Memory access control ... B4-9
B4.3  Memory region attributes ... B4-11
B4.4  PMSA memory aborts ... B4-13
B4.5  Fault Status and Fault Address registers in a PMSA implementation ... B4-18
B4.6  CP15 registers for a PMSA implementation ... B4-22
B4.7  Pseudocode details of PMSA memory system operations ... B4-79

Chapter B5  The CPUID Identification Scheme
B5.1  Introduction to the CPUID scheme ... B5-2
B5.2  The CPUID registers ... B5-4
B5.3  Advanced SIMD and VFP feature identification registers ... B5-34

Chapter B6  System Instructions
B6.1  Alphabetical list of instructions ... B6-2

Part C  Debug Architecture

Chapter C1  Introduction to the ARM Debug Architecture
C1.1  Scope of part C of this manual ... C1-2
C1.2  About the ARM Debug architecture ... C1-3
C1.3  Security Extensions and debug ... C1-8
C1.4  Register interfaces ... C1-9

Chapter C2  Invasive Debug Authentication
C2.1  About invasive debug authentication ... C2-2

Chapter C3  Debug Events
C3.1  About debug events ... C3-2
C3.2  Software debug events ... C3-5
C3.3  Halting debug events ... C3-38
C3.4  Generation of debug events ... C3-40
C3.5  Debug event prioritization ... C3-43

Chapter C4  Debug Exceptions
C4.1  About debug exceptions ... C4-2
C4.2  Effects of debug exceptions on CP15 registers and the DBGWFAR ... C4-4

Chapter C5  Debug State
C5.1  About Debug state ... C5-2
C5.2  Entering Debug state ... C5-3
C5.3  Behavior of the PC and CPSR in Debug state ... C5-7
C5.4  Executing instructions in Debug state ... C5-9
C5.5  Privilege in Debug state ... C5-13
C5.6  Behavior of non-invasive debug in Debug state ... C5-19
C5.7  Exceptions in Debug state ... C5-20
C5.8  Memory system behavior in Debug state ... C5-24
C5.9  Leaving Debug state ... C5-28

Chapter C6  Debug Register Interfaces
C6.1  About the debug register interfaces ... C6-2
C6.2  Reset and power-down support ... C6-4
C6.3  Debug register map ... C6-18
C6.4  Synchronization of debug register updates ... C6-24
C6.5  Access permissions ... C6-26
C6.6  The CP14 debug register interfaces ... C6-32
C6.7  The memory-mapped and recommended external debug interfaces ... C6-43

Chapter C7  Non-invasive Debug Authentication
C7.1  About non-invasive debug authentication ... C7-2
C7.2  v7 Debug non-invasive debug authentication ... C7-4
C7.3  Effects of non-invasive debug authentication ... C7-6
C7.4  ARMv6 non-invasive debug authentication ... C7-8

Chapter C8  Sample-based Profiling
C8.1  Program Counter sampling ... C8-2

Chapter C9  Performance Monitors
C9.1  About the performance monitors ... C9-2
C9.2  Status in the ARM architecture ... C9-4
C9.3  Accuracy of the performance monitors ... C9-5
C9.4  Behavior on overflow ... C9-6
C9.5  Interaction with Security Extensions ... C9-7
C9.6  Interaction with trace ... C9-8
C9.7  Interaction with power saving operations ... C9-9
C9.8  CP15 c9 register map ... C9-10
C9.9  Access permissions ... C9-12
C9.10 Event numbers ... C9-13

Chapter C10  Debug Registers Reference
C10.1 Accessing the debug registers ... C10-2
C10.2 Debug identification registers ... C10-3
C10.3 Control and status registers ... C10-10
C10.4 Instruction and data transfer registers ... C10-40
C10.5 Software debug event registers ... C10-48
C10.6 OS Save and Restore registers, v7 Debug only ... C10-75
C10.7 Memory system control registers ... C10-80
C10.8 Management registers, ARMv7 only ... C10-88
C10.9 Performance monitor registers ... C10-105

Appendix A  Recommended External Debug Interface
A.1  System integration signals ... AppxA-2
A.2  Recommended debug slave port ... AppxA-13

Appendix B  Common VFP Subarchitecture Specification
B.1  Scope of this appendix ... AppxB-2
B.2  Introduction to the Common VFP subarchitecture ... AppxB-3
B.3  Exception processing ... AppxB-6
B.4  Support code requirements ... AppxB-11
B.5  Context switching ... AppxB-14
B.6  Subarchitecture additions to the VFP system registers ... AppxB-15
B.7  Version 1 of the Common VFP subarchitecture ... AppxB-23
B.8  Version 2 of the Common VFP subarchitecture ... AppxB-24

Appendix C  Legacy Instruction Mnemonics
C.1  Thumb instruction mnemonics ... AppxC-2
C.2  Pre-UAL pseudo-instruction NOP ... AppxC-3

Appendix D  Deprecated and Obsolete Features
D.1  Deprecated features ... AppxD-2
D.2  Deprecated terminology ... AppxD-5
D.3  Obsolete features ... AppxD-6
D.4  Semaphore instructions ... AppxD-7
D.5  Use of the SP as a general-purpose register ... AppxD-8
D.6  Explicit use of the PC in ARM instructions ... AppxD-9
D.7  Deprecated Thumb instructions ... AppxD-10

Appendix E  Fast Context Switch Extension (FCSE)
E.1  About the FCSE ... AppxE-2
E.2  Modified virtual addresses ... AppxE-3
E.3  Debug and trace ... AppxE-5

Appendix F  VFP Vector Operation Support
F.1  About VFP vector mode ... AppxF-2
F.2  Vector length and stride control ... AppxF-3
F.3  VFP register banks ... AppxF-5
F.4  VFP instruction type selection ... AppxF-7

Appendix G  ARMv6 Differences
G.1  Introduction to ARMv6 ... AppxG-2
G.2  Application level register support ... AppxG-3
G.3  Application level memory support ... AppxG-6
G.4  Instruction set support ... AppxG-10
G.5  System level register support ... AppxG-16
G.6  System level memory model ... AppxG-20
G.7  System Control coprocessor (CP15) support ... AppxG-29

Appendix H  ARMv4 and ARMv5 Differences
H.1  Introduction to ARMv4 and ARMv5 ... AppxH-2
H.2  Application level register support ... AppxH-4
H.3  Application level memory support ... AppxH-6
H.4  Instruction set support ... AppxH-11
H.5  System level register support ... AppxH-18
H.6  System level memory model ... AppxH-21
H.7  System Control coprocessor (CP15) support ... AppxH-31

Appendix I  Pseudocode Definition
I.1  Instruction encoding diagrams and pseudocode ... AppxI-2
I.2  Limitations of pseudocode ... AppxI-4
I.3  Data types ... AppxI-5
I.4  Expressions ... AppxI-9
I.5  Operators and built-in functions ... AppxI-11
I.6  Statements and program structure ... AppxI-17
I.7  Miscellaneous helper procedures and functions ... AppxI-22

Appendix J  Pseudocode Index
J.1  Pseudocode operators and keywords ... AppxJ-2
J.2  Pseudocode functions and procedures ... AppxJ-6

Appendix K  Register Index
K.1  Register index ... AppxK-2

Glossary

Preface

This preface summarizes the contents of this manual and lists the conventions it uses. It contains the
following sections:
• About this manual on page xiv
• Using this manual on page xv
• Conventions on page xviii
• Further reading on page xx
• Feedback on page xxi.


About this manual
This manual describes the ARM®v7 instruction set architecture, including its high code density Thumb®
instruction encoding and the following extensions to it:
• The System Control coprocessor, coprocessor 15 (CP15), used to control memory system
  components such as caches, write buffers, Memory Management Units, and Protection Units.
• The optional Advanced SIMD extension, which provides high-performance integer and
  single-precision floating-point vector operations.
• The optional VFP extension, which provides high-performance floating-point operations. It can
  optionally support double-precision operations.
• The Debug architecture, which provides software access to debug features in ARM processors.

Part A describes the application level view of the architecture, including the application level view of the
programmers’ model and the memory model. It also describes the precise effects of each instruction in User
mode (the normal operating mode), including any restrictions on its use. This information is of primary
importance to authors and users of compilers, assemblers, and other programs that generate ARM machine
code.
Part B describes the system level view of the architecture. It gives details of system registers that are not
accessible from User mode, and the system level view of the memory model. It also gives full details of the
effects of instructions in privileged modes (any mode other than User mode), where these are different from
their effects in User mode.
Part C describes the Debug architecture. This is an extension to the ARM architecture that provides
configuration, breakpoint and watchpoint support, and a Debug Communications Channel (DCC) to a debug
host.
Assembler syntax is given for the instructions described in this manual, permitting instructions to be
specified in textual form. However, this manual is not intended as tutorial material for ARM assembler
language, nor does it describe ARM assembler language at anything other than a very basic level. To make
effective use of ARM assembler language, consult the documentation supplied with the assembler being
used.


Using this manual
The information in this manual is organized into four parts, as described below.

Part A, Application Level Architecture
Part A describes the application level view of the architecture. It contains the following chapters:
Chapter A1

Gives a brief overview of the ARM architecture, and the ARM and Thumb instruction sets.

Chapter A2

Describes the application level view of the ARM programmers’ model, including the
application level view of the Advanced SIMD and VFP extensions. It describes the types of
value that ARM instructions operate on, the general-purpose registers that contain those
values, and the Application Program Status Register.

Chapter A3

Describes the application level view of the memory model, including the ARM memory
types and attributes, and memory access control.

Chapter A4

Describes the range of instructions available in the ARM, Thumb, Advanced SIMD, and
VFP instruction sets. It also contains some details of instruction operation, where these are
common to several instructions.

Chapter A5

Gives details of the encoding of the ARM instruction set.

Chapter A6

Gives details of the encoding of the Thumb instruction set.

Chapter A7

Gives details of the encoding of the Advanced SIMD and VFP instruction sets.

Chapter A8

Provides detailed reference information about every instruction available in the Thumb,
ARM, Advanced SIMD, and VFP instruction sets, with the exception of information only
relevant in privileged modes.

Chapter A9

Provides detailed reference information about the ThumbEE (Execution Environment)
variant of the Thumb instruction set.

Part B, System Level Architecture
Part B describes the system level view of the architecture. It contains the following chapters:
Chapter B1

Describes the system level view of the programmers’ model.

Chapter B2

Describes the system level view of the memory model features that are common to all
memory systems.

Chapter B3

Describes the system level view of the Virtual Memory System Architecture (VMSA) that
is part of all ARMv7-A implementations. This chapter includes descriptions of all of the
CP15 System Control Coprocessor registers in a VMSA implementation.

Chapter B4

Describes the system level view of the Protected Memory System Architecture (PMSA) that
is part of all ARMv7-R implementations. This chapter includes descriptions of all of the
CP15 System Control Coprocessor registers in a PMSA implementation.

Chapter B5

Describes the CPUID scheme.

Chapter B6

Provides detailed reference information about system instructions, and more information
about instructions where they behave differently in privileged modes.

Part C, Debug Architecture
Part C describes the Debug architecture. It contains the following chapters:
Chapter C1

Gives a brief introduction to the Debug architecture.

Chapter C2

Describes the authentication of invasive debug.

Chapter C3

Describes the debug events.

Chapter C4

Describes the debug exceptions.

Chapter C5

Describes Debug state.

Chapter C6

Describes the permitted debug register interfaces.

Chapter C7

Describes the authentication of non-invasive debug.

Chapter C8

Describes sample-based profiling.

Chapter C9

Describes the ARM performance monitors.

Chapter C10

Describes the debug registers.

Part D, Appendices
This manual contains the following appendices:
Appendix A

Describes the recommended external Debug interfaces.

Note
This description is not part of the ARM architecture specification. It is included here only
as supplementary information, for the convenience of developers and users who might
require this information.
Appendix B

The Common VFP subarchitecture specification.

Note
This specification is not part of the ARM architecture specification. This sub-architectural
information is included here only as supplementary information, for the convenience of
developers and users who might require this information.
Appendix C

Describes the legacy mnemonics.

Appendix D

Identifies the deprecated architectural features.

Appendix E

Describes the Fast Context Switch Extension (FCSE). From ARMv6, the use of this feature
is deprecated, and in ARMv7 the FCSE is optional.

Appendix F

Describes the VFP vector operations. Use of these operations is deprecated in ARMv7.

Appendix G

Describes the differences in the ARMv6 architecture.

Appendix H

Describes the differences in the ARMv4 and ARMv5 architectures.

Appendix I

The formal definition of the pseudocode.

Appendix J

Index to definitions of pseudocode operators, keywords, functions, and procedures.

Appendix K

Index to register descriptions in the manual.

Conventions
This manual employs typographic and other conventions intended to improve its ease of use.

General typographic conventions
typewriter

Is used for assembler syntax descriptions, pseudocode descriptions of instructions,
and source code examples. In the cases of assembler syntax descriptions and
pseudocode descriptions, see the additional conventions below.
The typewriter style is also used in the main text for instruction mnemonics and for
references to other items appearing in assembler syntax descriptions, pseudocode
descriptions of instructions and source code examples.

italic

Highlights important notes, introduces special terminology, and denotes internal
cross-references and citations.

bold

Is used for emphasis in descriptive lists and elsewhere, where appropriate.

SMALL CAPITALS

Are used for a few terms that have specific technical meanings. Their meanings can
be found in the Glossary.

Signals
In general this specification does not define processor signals, but it does include some signal examples and
recommendations. It uses the following signal conventions:
Signal level

The level of an asserted signal depends on whether the signal is active-HIGH or
active-LOW. Asserted means:
•
HIGH for active-HIGH signals
•
LOW for active-LOW signals.

Lower-case n

At the start or end of a signal name denotes an active-LOW signal.

Numbers
Numbers are normally written in decimal. Binary numbers are preceded by 0b, and hexadecimal numbers
by 0x and written in a typewriter font.

Bit values
Values of bits and bitfields are normally given in binary, in single quotes. The quotes are normally omitted
in encoding diagrams and tables.

Pseudocode descriptions
This manual uses a form of pseudocode to provide precise descriptions of the specified functionality. This
pseudocode is written in a typewriter font, and is described in Appendix I Pseudocode Definition.

Assembler syntax descriptions
This manual contains numerous syntax descriptions for assembler instructions and for components of
assembler instructions. These are shown in a typewriter font, and use the conventions described in
Assembler syntax on page A8-4.

Further reading
This section lists publications from both ARM and third parties that provide more information on the ARM
family of processors.
ARM periodically provides updates and corrections to its documentation. See http://www.arm.com for
current errata sheets and addenda, and the ARM Frequently Asked Questions.

ARM publications
•
ARM Debug Interface v5 Architecture Specification (ARM IHI 0031)
•
ARMv7-M Architecture Reference Manual (ARM DDI 0403)
•
CoreSight Architecture Specification (ARM IHI 0029)
•
ARM Architecture Reference Manual (ARM DDI 0100I)
Note
—
Issue I of the ARM Architecture Reference Manual (DDI 0100I) was issued in July 2005 and
describes the first version of the ARMv6 architecture, and all previous architecture versions.
—
Addison-Wesley Professional publish ARM Architecture Reference Manual, Second Edition
(December 27, 2000). The contents of this are identical to Issue E of the ARM Architecture
Reference Manual (DDI 0100E). It describes ARMv5TE and earlier versions of the ARM
architecture, and is superseded by DDI 0100I.
•
Embedded Trace Macrocell Architecture Specification (ARM IHI 0014)
•
CoreSight Program Flow Trace Architecture Specification (ARM IHI 0035).

External publications
The following books are referred to in this manual, or provide more information:

•

IEEE Std 1596.5-1993, IEEE Standard for Shared-Data Formats Optimized for Scalable Coherent
Interface (SCI) Processors, ISBN 1-55937-354-7

•

IEEE Std 1149.1-2001, IEEE Standard Test Access Port and Boundary Scan Architecture (JTAG)

•

ANSI/IEEE Std 754-1985, IEEE Standard for Binary Floating-Point Arithmetic

•

JEP106, Standard Manufacturers Identification Code, JEDEC Solid State Technology Association

•

The Java Virtual Machine Specification Second Edition, Tim Lindholm and Frank Yellin, published
by Addison Wesley (ISBN: 0-201-43294-3)

•

Memory Consistency Models for Shared Memory-Multiprocessors, Kourosh Gharachorloo, Stanford
University Technical Report CSL-TR-95-685

Feedback
ARM welcomes feedback on its documentation.

Feedback on this manual
If you notice any errors or omissions in this manual, send e-mail to errata@arm.com giving:
•
the document title
•
the document number
•
the page number(s) to which your comments apply
•
a concise explanation of the problem.
General suggestions for additions and improvements are also welcome.

Part A
Application Level Architecture

Chapter A1
Introduction to the ARM Architecture

This chapter introduces the ARM architecture and contains the following sections:
•
About the ARM architecture on page A1-2
•
The ARM and Thumb instruction sets on page A1-3
•
Architecture versions, profiles, and variants on page A1-4
•
Architecture extensions on page A1-6
•
The ARM memory model on page A1-7
•
Debug on page A1-8.

A1.1

About the ARM architecture
The ARM architecture supports implementations across a wide range of performance points. It is
established as the dominant architecture in many market segments. The architectural simplicity of ARM
processors leads to very small implementations, and small implementations mean devices can have very low
power consumption. Implementation size, performance, and very low power consumption are key attributes
of the ARM architecture.
The ARM architecture is a Reduced Instruction Set Computer (RISC) architecture, as it incorporates these
typical RISC architecture features:
•

a large uniform register file

•

a load/store architecture, where data-processing operations only operate on register contents, not
directly on memory contents

•

simple addressing modes, with all load/store addresses being determined from register contents and
instruction fields only.

In addition, the ARM architecture provides:
•
instructions that combine a shift with an arithmetic or logical operation
•
auto-increment and auto-decrement addressing modes to optimize program loops
•
Load and Store Multiple instructions to maximize data throughput
•
conditional execution of almost all instructions to maximize execution throughput.
These enhancements to a basic RISC architecture enable ARM processors to achieve a good balance of high
performance, small code size, low power consumption, and small silicon area.
Except where the architecture specifies differently, the programmer-visible behavior of an implementation
must be the same as a simple sequential execution of the program. This programmer-visible behavior does
not include the execution time of the program.

A1.2

The ARM and Thumb instruction sets
The ARM instruction set is a set of 32-bit instructions providing comprehensive data-processing and control
functions.
The Thumb instruction set was developed as a 16-bit instruction set with a subset of the functionality of the
ARM instruction set. It provides significantly improved code density, at a cost of some reduction in
performance. A processor executing Thumb instructions can change to executing ARM instructions for
performance critical segments, in particular for handling interrupts.
In ARMv6T2, Thumb-2 technology is introduced. This technology makes it possible to extend the original
Thumb instruction set with many 32-bit instructions. The range of 32-bit Thumb instructions included in
ARMv6T2 permits Thumb code to achieve performance similar to ARM code, with code density better than
that of earlier Thumb code.
From ARMv6T2, the ARM and Thumb instruction sets provide almost identical functionality. For more
information, see Chapter A4 The Instruction Sets.

A1.3

Architecture versions, profiles, and variants
The ARM and Thumb instruction set architectures have evolved significantly since they were first
developed. They will continue to be developed in the future. Seven major versions of the instruction set have
been defined to date, denoted by the version numbers 1 to 7. Of these, the first three versions are now
obsolete.
ARMv7 provides three profiles:
ARMv7-A

Application profile, described in this manual. Implements a traditional ARM architecture
with multiple modes and supporting a Virtual Memory System Architecture (VMSA) based
on an MMU. Supports the ARM and Thumb instruction sets.

ARMv7-R

Real-time profile, described in this manual. Implements a traditional ARM architecture with
multiple modes and supporting a Protected Memory System Architecture (PMSA) based on
an MPU. Supports the ARM and Thumb instruction sets.

ARMv7-M

Microcontroller profile, described in the ARMv7-M Architecture Reference Manual.
Implements a programmers' model designed for fast interrupt processing, with hardware
stacking of registers and support for writing interrupt handlers in high-level languages.
Implements a variant of the ARMv7 PMSA and supports a variant of the Thumb instruction
set.

Versions can be qualified with variant letters to specify additional instructions and other functionality that
are included as an architecture extension. Extensions are typically included in the base architecture of the
next version number. Provision is also made to exclude variants by prefixing the variant letter with x.
Some extensions are described separately instead of using a variant letter. For details of these extensions see
Architecture extensions on page A1-6.
The valid variants of ARMv4, ARMv5, and ARMv6 are as follows:

ARMv4

The earliest architecture variant covered by this manual. It includes only the ARM
instruction set.

ARMv4T

Adds the Thumb instruction set.

ARMv5T

Improves interworking of ARM and Thumb instructions. Adds count leading zeros (CLZ)
and software breakpoint (BKPT) instructions.

ARMv5TE

Enhances arithmetic support for digital signal processing (DSP) algorithms. Adds preload
data (PLD), dual word load (LDRD) and store (STRD), and 64-bit coprocessor register
transfers (MCRR, MRRC).

ARMv5TEJ

Adds the BXJ instruction and other support for the Jazelle® architecture extension.

ARMv6

Adds many new instructions to the ARM instruction set. Formalizes and revises the memory
model and the Debug architecture.

ARMv6K

Adds instructions to the ARM instruction set to support multiprocessing, and some extra
memory model features.

ARMv6T2

Introduces Thumb-2 technology, giving a major development of the Thumb instruction set
to provide a similar level of functionality to the ARM instruction set.

Note
ARMv6KZ or ARMv6Z are sometimes used to describe the ARMv6K architecture with the optional
Security Extensions.
For detailed information about versions of the ARM architecture, see Appendix G ARMv6 Differences and
Appendix H ARMv4 and ARMv5 Differences.
The following architecture variants are now obsolete:
ARMv1, ARMv2, ARMv2a, ARMv3, ARMv3G, ARMv3M, ARMv4xM, ARMv4TxM, ARMv5,
ARMv5xM, ARMv5TxM, and ARMv5TExP.
Contact ARM if you require details of obsolete variants.
Instruction descriptions in this manual specify the architecture versions that support them.

A1.4

Architecture extensions
This manual describes the following extensions to the ARM and Thumb instruction set architectures:
ThumbEE

Is a variant of the Thumb instruction set that is designed as a target for dynamically
generated code. It is:
•
a required extension to the ARMv7-A profile
•
an optional extension to the ARMv7-R profile.

VFP

Is a floating-point coprocessor extension to the instruction set architectures. There
have been three main versions of VFP to date:
•
VFPv1 is obsolete. Details are available on request from ARM.
•
VFPv2 is an optional extension to:
—
the ARM instruction set in the ARMv5TE, ARMv5TEJ, ARMv6, and
ARMv6K architectures
—
the ARM and Thumb instruction sets in the ARMv6T2 architecture.
•
VFPv3 is an optional extension to the ARM, Thumb and ThumbEE
instruction sets in the ARMv7-A and ARMv7-R profiles.
VFPv3 can be implemented with either thirty-two or sixteen doubleword
registers, as described in Advanced SIMD and VFP extension registers on
page A2-21. Where necessary, the terms VFPv3-D32 and VFPv3-D16 are
used to distinguish between these two implementation options. Where the
term VFPv3 is used it covers both options.
VFPv3 can be extended by the half-precision extensions that provide
conversion functions in both directions between half-precision floating-point
and single-precision floating-point.

Advanced SIMD

Is an instruction set extension that provides Single Instruction Multiple Data
(SIMD) functionality. It is an optional extension to the ARMv7-A and ARMv7-R
profiles. When VFPv3 and Advanced SIMD are both implemented, they use a
shared register bank and have some shared instructions.
Advanced SIMD can be extended by the half-precision extensions that provide
conversion functions in both directions between half-precision floating-point and
single-precision floating-point.

Security Extensions

Are a set of security features that facilitate the development of secure applications.
They are an optional extension to the ARMv6K architecture and the ARMv7-A
profile.

Jazelle

Is the Java bytecode execution extension that extended ARMv5TE to ARMv5TEJ.
From ARMv6, Jazelle is a required part of the architecture, but is still often
described as the Jazelle extension.

Multiprocessing Extensions
Are a set of features that enhance multiprocessing functionality. They are an
optional extension to the ARMv7-A and ARMv7-R profiles.

A1.5

The ARM memory model
The ARM architecture uses a single, flat address space of 2^32 8-bit bytes. The address space is also regarded
as 2^30 32-bit words or 2^31 16-bit halfwords.
The architecture provides facilities for:
•
faulting unaligned memory accesses
•
restricting access by applications to specified areas of memory
•
translating virtual addresses provided by executing instructions into physical addresses
•
altering the interpretation of word and halfword data between big-endian and little-endian
•
optionally preventing out-of-order access to memory
•
controlling caches
•
synchronizing access to shared memory by multiple processors.
For more information, see:
•
Chapter A3 Application Level Memory Model
•
Chapter B2 Common Memory System Architecture Features
•
Chapter B3 Virtual Memory System Architecture (VMSA)
•
Chapter B4 Protected Memory System Architecture (PMSA).
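
To make two of these facilities concrete, the following Python sketch checks address alignment and reads the same four bytes as a word under little-endian and big-endian interpretation. It is an illustrative model only, assuming memory is a plain byte array: the helper names are hypothetical, and the sketch does not model any particular ARM endianness mode or the controls that select it.

```python
# Hypothetical helpers illustrating the flat 2^32-byte address space,
# alignment checking, and little-endian vs big-endian data interpretation.

ADDRESS_SPACE_BYTES = 2 ** 32          # 2^32 8-bit bytes
WORDS = ADDRESS_SPACE_BYTES // 4       # 2^30 32-bit words
HALFWORDS = ADDRESS_SPACE_BYTES // 2   # 2^31 16-bit halfwords


def is_aligned(address: int, size: int) -> bool:
    """Return True if `address` is a multiple of the access size in bytes."""
    if not 0 <= address < ADDRESS_SPACE_BYTES:
        raise ValueError("address outside the 32-bit address space")
    return address % size == 0


def load_word(memory: bytes, address: int, big_endian: bool) -> int:
    """Read 4 bytes starting at `address` and interpret them as a 32-bit word."""
    order = "big" if big_endian else "little"
    return int.from_bytes(memory[address:address + 4], order)


memory = bytes([0x11, 0x22, 0x33, 0x44])
print(hex(load_word(memory, 0, big_endian=False)))   # 0x44332211
print(hex(load_word(memory, 0, big_endian=True)))    # 0x11223344
print(is_aligned(0x1002, 2), is_aligned(0x1002, 4))  # True False
```

The byte values in memory are the same in both reads; only the interpretation of their order within the word changes.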

A1.6

Debug
ARMv7 processors implement two types of debug support:
Invasive debug

Debug permitting modification of the state of the processor. This is intended
primarily for run-control debugging.

Non-invasive debug

Debug permitting data and program flow observation, without modifying the state
of the processor or interrupting the flow of execution.
This provides for:
•
instruction and data tracing
•
program counter sampling
•
performance monitors.

For more information, see Chapter C1 Introduction to the ARM Debug Architecture.

Chapter A2
Application Level Programmers’ Model

This chapter gives an application level view of the ARM programmers’ model. It contains the following
sections:
•
About the Application level programmers’ model on page A2-2
•
ARM core data types and arithmetic on page A2-3
•
ARM core registers on page A2-11
•
The Application Program Status Register (APSR) on page A2-14
•
Execution state registers on page A2-15
•
Advanced SIMD and VFP extensions on page A2-20
•
Floating-point data types and arithmetic on page A2-32
•
Polynomial arithmetic over {0,1} on page A2-67
•
Coprocessor support on page A2-68
•
Execution environment support on page A2-69
•
Exceptions, debug events and checks on page A2-81.

A2.1

About the Application level programmers’ model
This chapter contains the programmers’ model information required for application development.
The information in this chapter is distinct from the system information required to service and support
application execution under an operating system. However, some knowledge of that system information is
needed to put the Application level programmers' model into context.
System level support requires access to all features and facilities of the architecture, a mode of operation
referred to as privileged operation. System code determines whether an application runs in a privileged or
unprivileged manner. When an operating system supports both privileged and unprivileged operation, an
application usually runs unprivileged. This:
•

permits the operating system to allocate system resources to it in a unique or shared manner

•

provides a degree of protection from other processes and tasks, and so helps protect the operating
system from malfunctioning applications.

This chapter indicates where some system level understanding is helpful, and where appropriate it:
•

gives an overview of the system level information

•

gives references to the system level descriptions in Chapter B1 The System Level Programmers’
Model and elsewhere.

The Security Extensions extend the architecture to provide hardware security features that support the
development of secure applications. For more information, see The Security Extensions on page B1-25.

A2.2

ARM core data types and arithmetic
All ARMv7-A and ARMv7-R processors support the following data types in memory:
Byte          8 bits
Halfword      16 bits
Word          32 bits
Doubleword    64 bits.
Processor registers are 32 bits in size. The instruction set contains instructions supporting the following data
types held in registers:
•
32-bit pointers
•
unsigned or signed 32-bit integers
•
unsigned 16-bit or 8-bit integers, held in zero-extended form
•
signed 16-bit or 8-bit integers, held in sign-extended form
•
two 16-bit integers packed into a register
•
four 8-bit integers packed into a register
•
unsigned or signed 64-bit integers held in two registers.
Load and store operations can transfer bytes, halfwords, or words to and from memory. Loads of bytes or
halfwords zero-extend or sign-extend the data as it is loaded, as specified in the appropriate load instruction.
The instruction sets include load and store operations that transfer two or more words to and from memory.
You can load and store doublewords using these instructions. The exclusive doubleword load/store
instructions LDREXD and STREXD specify single-copy atomic doubleword accesses to memory.
When any of the data types is described as unsigned, the N-bit data value represents a non-negative integer
in the range 0 to 2^N - 1, using normal binary format.
When any of these types is described as signed, the N-bit data value represents an integer in the range -2^(N-1)
to +2^(N-1) - 1, using two's complement format.
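
The unsigned and signed interpretations above, and the zero-extension and sign-extension applied by byte and halfword loads, can be sketched in Python as follows. The function names echo the pseudocode's UInt() and SInt() but are assumptions of this sketch, not their definitions; a bitstring is modelled as a non-negative Python int together with its width N.

```python
def uint(x: int, n: int) -> int:
    """Unsigned value of an N-bit bitstring, in the range 0 to 2^N - 1."""
    assert 0 <= x < 2 ** n
    return x


def sint(x: int, n: int) -> int:
    """Two's complement value of an N-bit bitstring, in the range -2^(N-1) to 2^(N-1) - 1."""
    assert 0 <= x < 2 ** n
    return x - 2 ** n if (x >> (n - 1)) & 1 else x


def zero_extend(x: int, n: int, m: int = 32) -> int:
    """Widen an N-bit bitstring to M bits by adding zero bits at the top."""
    assert n <= m
    return uint(x, n)


def sign_extend(x: int, n: int, m: int = 32) -> int:
    """Widen an N-bit bitstring to M bits by replicating its top (sign) bit."""
    assert n <= m
    return sint(x, n) % 2 ** m


byte = 0xF0                           # an 8-bit value as read from memory
print(uint(byte, 8), sint(byte, 8))   # 240 -16
print(hex(zero_extend(byte, 8)))      # 0xf0, as a zero-extending byte load would give
print(hex(sign_extend(byte, 8)))      # 0xfffffff0, as a sign-extending byte load would give
```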
The instructions that operate on packed halfwords or bytes include some multiply instructions that use just
one of two halfwords, and Single Instruction Multiple Data (SIMD) instructions that operate on all of the
halfwords or bytes in parallel.
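
To illustrate the packed-data idea, the Python sketch below adds two registers as pairs of independent unsigned 16-bit halfwords, so that a carry out of the low halfword never reaches the high one. It is loosely modelled on ARM's parallel add instructions but is not a definition of any of them; the function name is hypothetical and the sketch ignores any flags such instructions may also set.

```python
def add_halfwords_packed(a: int, b: int) -> int:
    """Add two 32-bit registers lane by lane as two unsigned 16-bit halfwords.

    Each lane wraps modulo 2^16; no carry passes from the low lane to the high lane.
    """
    result = 0
    for lane in range(2):                  # lane 0 = bits 15:0, lane 1 = bits 31:16
        shift = 16 * lane
        x = (a >> shift) & 0xFFFF
        y = (b >> shift) & 0xFFFF
        result |= ((x + y) & 0xFFFF) << shift
    return result


# The low lanes 0xFFFF + 0x0001 wrap to 0x0000 and the high lanes 0x0001 + 0x0002
# give 0x0003. A plain 32-bit addition of the same values would give 0x40000.
print(hex(add_halfwords_packed(0x0001FFFF, 0x00020001)))   # 0x30000
```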
Direct instruction support for 64-bit integers is limited, and most 64-bit operations require sequences of two
or more instructions to synthesize them.
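
As an example of such a sequence, a 64-bit addition is typically synthesized from a 32-bit addition of the low words that records the carry, followed by an addition of the high words that consumes it (in ARM assembler, an ADDS followed by an ADC). The Python sketch below models that two-step sequence; the function name is hypothetical and registers are modelled as Python ints.

```python
MASK32 = 0xFFFFFFFF


def add64_via_32bit(a_lo: int, a_hi: int, b_lo: int, b_hi: int) -> tuple[int, int]:
    """Add two 64-bit values, each held as a (low, high) pair of 32-bit registers."""
    # Step 1: add the low words and record the unsigned carry out (like ADDS).
    lo_sum = a_lo + b_lo
    carry = 1 if lo_sum > MASK32 else 0
    # Step 2: add the high words plus that carry (like ADC); any final carry is dropped.
    hi_sum = a_hi + b_hi + carry
    return lo_sum & MASK32, hi_sum & MASK32


lo, hi = add64_via_32bit(0xFFFFFFFF, 0x00000000, 0x00000001, 0x00000000)
print(hex(hi), hex(lo))   # 0x1 0x0
```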

A2.2.1

Integer arithmetic
The instruction set provides a wide variety of operations on the values in registers, including bitwise logical
operations, shifts, additions, subtractions, multiplications, and many others. These operations are defined
using the pseudocode described in Appendix I Pseudocode Definition, usually in one of three ways:

•

By direct use of the pseudocode operators and built-in functions defined in Operators and built-in
functions on page AppxI-11.

•

By use of pseudocode helper functions defined in the main text. These can be located using the table
in Appendix J Pseudocode Index.

•

By a sequence of the form:
1.

Use of the SInt(), UInt(), and Int() built-in functions defined in Converting bitstrings to
integers on page AppxI-14 to convert the bitstring contents of the instruction operands to the
unbounded integers that they represent as two's complement or unsigned integers.

2.

Use of mathematical operators, built-in functions and helper functions on those unbounded
integers to calculate other such integers.

3.

Use of either the bitstring extraction operator defined in Bitstring extraction on page AppxI-12
or of the saturation helper functions described in Pseudocode details of saturation on
page A2-9 to convert an unbounded integer result into a bitstring result that can be written to
a register.
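
The three-step pattern maps naturally onto a language with unbounded integers. The Python sketch below follows it for a 32-bit signed multiply that keeps only the low 32 bits of the product: convert the operands with an SInt()-like helper, multiply the unbounded integers, then extract the low N bits. The helper names are illustrative stand-ins for the pseudocode functions, not definitions of them.

```python
def sint(x: int, n: int) -> int:
    """Step 1: interpret an N-bit bitstring (a non-negative int) as two's complement."""
    return x - (1 << n) if (x >> (n - 1)) & 1 else x


def low_bits(i: int, n: int) -> int:
    """Step 3: the bitstring extraction i<N-1:0>, which also works for negative i."""
    return i & ((1 << n) - 1)


def mul_low32(x: int, y: int) -> int:
    """Multiply two 32-bit register values, keeping the low 32 bits of the product."""
    product = sint(x, 32) * sint(y, 32)   # Step 2: arithmetic on unbounded integers
    return low_bits(product, 32)


print(hex(mul_low32(0xFFFFFFFF, 0x00000003)))   # -1 * 3 = -3, so 0xfffffffd
```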

Shift and rotate operations
The following types of shift and rotate operations are used in instructions:
Logical Shift Left
(LSL) moves each bit of a bitstring left by a specified number of bits. Zeros are shifted in at
the right end of the bitstring. Bits that are shifted off the left end of the bitstring are
discarded, except that the last such bit can be produced as a carry output.
Logical Shift Right
(LSR) moves each bit of a bitstring right by a specified number of bits. Zeros are shifted in
at the left end of the bitstring. Bits that are shifted off the right end of the bitstring are
discarded, except that the last such bit can be produced as a carry output.
Arithmetic Shift Right
(ASR) moves each bit of a bitstring right by a specified number of bits. Copies of the leftmost
bit are shifted in at the left end of the bitstring. Bits that are shifted off the right end of the
bitstring are discarded, except that the last such bit can be produced as a carry output.
Rotate Right (ROR) moves each bit of a bitstring right by a specified number of bits. Each bit that is shifted
off the right end of the bitstring is re-introduced at the left end. The last bit shifted off the
right end of the bitstring can be produced as a carry output.
Rotate Right with Extend
(RRX) moves each bit of a bitstring right by one bit. The carry input is shifted in at the left
end of the bitstring. The bit shifted off the right end of the bitstring can be produced as a
carry output.
Pseudocode details of shift and rotate operations
These shift and rotate operations are supported in pseudocode by the following functions:
// LSL_C()
// =======
(bits(N), bit) LSL_C(bits(N) x, integer shift)
    assert shift > 0;
    extended_x = x : Zeros(shift);
    result = extended_x<N-1:0>;
    carry_out = extended_x<N>;
    return (result, carry_out);
// LSL()
// =====
bits(N) LSL(bits(N) x, integer shift)
    assert shift >= 0;
    if shift == 0 then
        result = x;
    else
        (result, -) = LSL_C(x, shift);
    return result;
// LSR_C()
// =======
(bits(N), bit) LSR_C(bits(N) x, integer shift)
    assert shift > 0;
    extended_x = ZeroExtend(x, shift+N);
    result = extended_x<shift+N-1:shift>;
    carry_out = extended_x<shift-1>;
    return (result, carry_out);
// LSR()
// =====
bits(N) LSR(bits(N) x, integer shift)
    assert shift >= 0;
    if shift == 0 then
        result = x;
    else
        (result, -) = LSR_C(x, shift);
    return result;
// ASR_C()
// =======
(bits(N), bit) ASR_C(bits(N) x, integer shift)
    assert shift > 0;
    extended_x = SignExtend(x, shift+N);
    result = extended_x<shift+N-1:shift>;
    carry_out = extended_x<shift-1>;
    return (result, carry_out);
// ASR()
// =====
bits(N) ASR(bits(N) x, integer shift)
    assert shift >= 0;
    if shift == 0 then
        result = x;
    else
        (result, -) = ASR_C(x, shift);
    return result;
// ROR_C()
// =======
(bits(N), bit) ROR_C(bits(N) x, integer shift)
    assert shift != 0;
    m = shift MOD N;
    result = LSR(x,m) OR LSL(x,N-m);
    carry_out = result<N-1>;
    return (result, carry_out);
// ROR()
// =====
bits(N) ROR(bits(N) x, integer shift)
    if shift == 0 then
        result = x;
    else
        (result, -) = ROR_C(x, shift);
    return result;
// RRX_C()
// =======
(bits(N), bit) RRX_C(bits(N) x, bit carry_in)
    result = carry_in : x<N-1:1>;
    carry_out = x<0>;
    return (result, carry_out);
// RRX()
// =====
bits(N) RRX(bits(N) x, bit carry_in)
    (result, -) = RRX_C(x, carry_in);
    return result;
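
For experimenting with concrete values, the shift and rotate pseudocode translates fairly directly into Python. This is a sketch under stated assumptions, not part of the architecture: bitstrings are modelled as non-negative ints with an explicit width n (defaulting to 32), carry bits are the ints 0 and 1, and each hypothetical function returns a (result, carry_out) pair like the corresponding *_C() pseudocode function.

```python
def lsl_c(x: int, shift: int, n: int = 32) -> tuple[int, int]:
    """LSL_C(): carry_out is bit N of x : Zeros(shift)."""
    assert shift > 0
    extended = x << shift
    return extended & ((1 << n) - 1), (extended >> n) & 1


def lsr_c(x: int, shift: int, n: int = 32) -> tuple[int, int]:
    """LSR_C(): zeros shift in at the top; carry_out is bit shift-1 of x (0 if beyond N)."""
    assert shift > 0
    return x >> shift, (x >> (shift - 1)) & 1


def asr_c(x: int, shift: int, n: int = 32) -> tuple[int, int]:
    """ASR_C(): copies of the sign bit shift in at the top."""
    assert shift > 0
    signed = x - (1 << n) if (x >> (n - 1)) & 1 else x   # SInt(x)
    # Python's >> on negative ints is an arithmetic (sign-filling) shift.
    return (signed >> shift) & ((1 << n) - 1), (signed >> (shift - 1)) & 1


def ror_c(x: int, shift: int, n: int = 32) -> tuple[int, int]:
    """ROR_C(): rotate right by shift MOD N; carry_out is the new top bit."""
    assert shift != 0
    m = shift % n
    result = ((x >> m) | (x << (n - m))) & ((1 << n) - 1)
    return result, (result >> (n - 1)) & 1


def rrx_c(x: int, carry_in: int, n: int = 32) -> tuple[int, int]:
    """RRX_C(): result = carry_in : x<N-1:1>, carry_out = x<0>."""
    return (carry_in << (n - 1)) | (x >> 1), x & 1


print(lsl_c(0x80000001, 1))        # (2, 1): bit 31 is shifted out into the carry
result, carry = asr_c(0x80000000, 4)
print(hex(result), carry)          # 0xf8000000 0
print(rrx_c(0x00000003, 0))        # (1, 1)
```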

Pseudocode details of addition and subtraction
In pseudocode, addition and subtraction can be performed on any combination of unbounded integers and
bitstrings, provided that if they are performed on two bitstrings, the bitstrings must be identical in length.
The result is another unbounded integer if both operands are unbounded integers, and a bitstring of the same
length as the bitstring operand(s) otherwise. For the precise definition of these operations, see Addition and
subtraction on page AppxI-15.
The main addition and subtraction instructions can produce status information about both unsigned carry
and signed overflow conditions. This status information can be used to synthesize multi-word additions and
subtractions. In pseudocode the AddWithCarry() function provides an addition with a carry input and carry
and overflow outputs:
// AddWithCarry()
// ==============
(bits(N), bit, bit) AddWithCarry(bits(N) x, bits(N) y, bit carry_in)
    unsigned_sum = UInt(x) + UInt(y) + UInt(carry_in);
    signed_sum   = SInt(x) + SInt(y) + UInt(carry_in);
    result       = unsigned_sum<N-1:0>; // == signed_sum<N-1:0>
    carry_out    = if UInt(result) == unsigned_sum then '0' else '1';
    overflow     = if SInt(result) == signed_sum then '0' else '1';
    return (result, carry_out, overflow);

An important property of the AddWithCarry() function is that if:
(result, carry_out, overflow) = AddWithCarry(x, NOT(y), carry_in)

then:
•
if carry_in == '1', then result == x-y with:
—
overflow == '1' if signed overflow occurred during the subtraction
—
carry_out == '1' if unsigned borrow did not occur during the subtraction, that is, if x >= y
•
if carry_in == '0', then result == x-y-1 with:
—
overflow == '1' if signed overflow occurred during the subtraction
—
carry_out == '1' if unsigned borrow did not occur during the subtraction, that is, if x > y.
Together, these mean that the carry_in and carry_out bits in AddWithCarry() calls can act as NOT borrow
flags for subtractions as well as carry flags for additions.
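The NOT borrow property can be checked with a small Python sketch. It assumes 32-bit operands held as non-negative ints; sub_with_borrow32 and sub64 are hypothetical helpers that mimic a SUBS followed by an SBC, where the first subtraction uses carry_in == '1' and later words feed the previous carry_out back in.

```python
MASK32 = 0xFFFFFFFF


def sint32(v):
    """Interpret a 32-bit pattern as a signed integer."""
    return v - (1 << 32) if v & 0x80000000 else v


def add_with_carry32(x, y, carry_in):
    """AddWithCarry() for N == 32: returns (result, carry_out, overflow)."""
    unsigned_sum = x + y + carry_in
    signed_sum = sint32(x) + sint32(y) + carry_in
    result = unsigned_sum & MASK32
    return result, int(result != unsigned_sum), int(sint32(result) != signed_sum)


def sub_with_borrow32(x, y, carry_in):
    """Compute x - y - NOT(carry_in) as AddWithCarry(x, NOT(y), carry_in)."""
    return add_with_carry32(x, (~y) & MASK32, carry_in)


def sub64(a, b):
    """Subtract 64-bit values word by word; the carry acts as a NOT borrow flag."""
    lo, carry, _ = sub_with_borrow32(a & MASK32, b & MASK32, 1)  # first word: no borrow in
    hi, carry, overflow = sub_with_borrow32(a >> 32, b >> 32, carry)
    return (hi << 32) | lo, carry, overflow
```

A final carry of 1 means no unsigned borrow occurred, that is, a >= b.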


Pseudocode details of saturation
Some instructions perform saturating arithmetic, that is, if the result of the arithmetic overflows the
destination signed or unsigned N-bit integer range, the result produced is the largest or smallest value in that
range, rather than wrapping around modulo 2^N. This is supported in pseudocode by the SignedSatQ() and
UnsignedSatQ() functions when a boolean result indicating whether saturation occurred is also required, and by
the SignedSat() and UnsignedSat() functions when only the saturated result is required:
// SignedSatQ()
// ============
(bits(N), boolean) SignedSatQ(integer i, integer N)
    if i > 2^(N-1) - 1 then
        result = 2^(N-1) - 1; saturated = TRUE;
    elsif i < -(2^(N-1)) then
        result = -(2^(N-1)); saturated = TRUE;
    else
        result = i; saturated = FALSE;
    return (result<N-1:0>, saturated);
// UnsignedSatQ()
// ==============
(bits(N), boolean) UnsignedSatQ(integer i, integer N)
    if i > 2^N - 1 then
        result = 2^N - 1; saturated = TRUE;
    elsif i < 0 then
        result = 0; saturated = TRUE;
    else
        result = i; saturated = FALSE;
    return (result<N-1:0>, saturated);
// SignedSat()
// ===========
bits(N) SignedSat(integer i, integer N)
    (result, -) = SignedSatQ(i, N);
    return result;
// UnsignedSat()
// =============
bits(N) UnsignedSat(integer i, integer N)
    (result, -) = UnsignedSatQ(i, N);
    return result;
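A Python sketch of the saturation functions, including the SatQ() and Sat() dispatchers described next. It assumes the bits(N) result is returned as a non-negative int below 2^N, so a saturated negative value appears as its two's complement pattern. The returned boolean is the kind of value an instruction could use to set a sticky flag such as APSR.Q; that use is illustrative here.

```python
def signed_sat_q(i, n):
    """SignedSatQ(): clamp i to [-2^(N-1), 2^(N-1)-1]; return (N-bit pattern, saturated)."""
    hi, lo = (1 << (n - 1)) - 1, -(1 << (n - 1))
    if i > hi:
        result, saturated = hi, True
    elif i < lo:
        result, saturated = lo, True
    else:
        result, saturated = i, False
    return result & ((1 << n) - 1), saturated   # result<N-1:0>


def unsigned_sat_q(i, n):
    """UnsignedSatQ(): clamp i to [0, 2^N-1]; return (N-bit pattern, saturated)."""
    if i > (1 << n) - 1:
        result, saturated = (1 << n) - 1, True
    elif i < 0:
        result, saturated = 0, True
    else:
        result, saturated = i, False
    return result, saturated


def sat_q(i, n, unsigned):
    """SatQ(): choose the unsigned or signed form from the third argument."""
    return unsigned_sat_q(i, n) if unsigned else signed_sat_q(i, n)


def sat(i, n, unsigned):
    """Sat(): only the saturated result."""
    return sat_q(i, n, unsigned)[0]
```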
SatQ(i, N, unsigned) returns either UnsignedSatQ(i, N) or SignedSatQ(i, N) depending on the value of its
third argument, and Sat(i, N, unsigned) returns either UnsignedSat(i, N) or SignedSat(i, N) depending on
the value of its third argument:


// SatQ()
// ======
(bits(N), boolean) SatQ(integer i, integer N, boolean unsigned)
    (result, sat) = if unsigned then UnsignedSatQ(i, N) else SignedSatQ(i, N);
    return (result, sat);
// Sat()
// =====
bits(N) Sat(integer i, integer N, boolean unsigned)
    result = if unsigned then UnsignedSat(i, N) else SignedSat(i, N);
    return result;


A2.3

ARM core registers
In the application level view, an ARM processor has:
•   thirteen general-purpose 32-bit registers, R0 to R12
•   three 32-bit registers, R13 to R15, that sometimes or always have a special use.
Registers R13 to R15 are usually referred to by names that indicate their special uses:
SP, the Stack Pointer
Register R13 is used as a pointer to the active stack.
In Thumb code, most instructions cannot access SP. The only instructions that can access
SP are those designed to use SP as a stack pointer.
The use of SP for any purpose other than as a stack pointer is deprecated.

Note
Using SP for any purpose other than as a stack pointer is likely to break the requirements of
operating systems, debuggers, and other software systems, causing them to malfunction.
LR, the Link Register
Register R14 is used to store the return address from a subroutine. At other times, LR can
be used for other purposes.
When a BL or BLX instruction performs a subroutine call, LR is set to the subroutine return
address. To perform a subroutine return, copy LR back to the program counter. This is
typically done in one of two ways, after entering the subroutine with a BL or BLX instruction:
•   Return with a BX LR instruction.
•   On subroutine entry, store LR to the stack with an instruction of the form:
        PUSH {<registers>,LR}
    and use a matching instruction to return:
        POP {<registers>,PC}

ThumbEE checks and handler calls use LR in a similar way. For details see Chapter A9
ThumbEE.
PC, the Program Counter
Register R15 is the program counter:
•   When executing an ARM instruction, PC reads as the address of the current
    instruction plus 8.
•   When executing a Thumb instruction, PC reads as the address of the current
    instruction plus 4.
•   Writing an address to PC causes a branch to that address.

In Thumb code, most instructions cannot access PC.
See ARM core registers on page B1-9 for the system level view of SP, LR, and PC.


Note
The names SP, LR and PC are preferred to R13, R14 and R15. However, sometimes it is simpler to use the
R13-R15 names when referring to a group of registers. For example, it is simpler to refer to Registers R8 to
R15, rather than to Registers R8 to R12, the SP, LR and PC. These two descriptions of the group of
registers have exactly the same meaning.

A2.3.1

Pseudocode details of operations on ARM core registers
In pseudocode, the R[] function is used to:
•   Read or write R0-R12, SP, and LR, using n == 0-12, 13, and 14 respectively.
•   Read the PC, using n == 15.
This function has prototypes:
bits(32) R[integer n]
    assert n >= 0 && n <= 15;
R[integer n] = bits(32) value
    assert n >= 0 && n <= 14;

The full operation of this function is explained in Pseudocode details of ARM core register operations on
page B1-12.
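A simplified Python model of the R[] accessors, for illustration only. The class CoreRegisters and its attributes instr_addr and thumb are hypothetical. The sketch covers only the application level rules given in this section: R0-R14 are readable and writable, R15 is read-only through R[], and a PC read returns the current instruction address plus 8 (ARM) or plus 4 (Thumb). It does not model the system level behavior described on page B1-12.

```python
class CoreRegisters:
    """Hypothetical model of R[]: R0-R12, SP (R13) and LR (R14), plus read-only PC reads."""

    def __init__(self):
        self._r = [0] * 15        # R0-R14
        self.instr_addr = 0       # address of the instruction currently executing
        self.thumb = False        # True when executing a Thumb instruction

    def read(self, n):
        assert 0 <= n <= 15
        if n == 15:
            # PC reads as the current instruction address + 8 (ARM) or + 4 (Thumb)
            return (self.instr_addr + (4 if self.thumb else 8)) & 0xFFFFFFFF
        return self._r[n]

    def write(self, n, value):
        assert 0 <= n <= 14       # PC writes go through the branch functions instead
        self._r[n] = value & 0xFFFFFFFF
```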
Descriptions of ARM store instructions that store the PC value use the PCStoreValue() pseudocode function
to specify the PC value stored by the instruction:
// PCStoreValue()
// ==============
bits(32) PCStoreValue()
    // This function returns the PC value. On architecture versions before ARMv7, it
    // is permitted to instead return PC+4, provided it does so consistently. It is
    // used only to describe ARM instructions, so it returns the address of the current
    // instruction plus 8 (normally) or 12 (when the alternative is permitted).
    return PC;

Writing an address to the PC causes either a simple branch to that address or an interworking branch that
also selects the instruction set to execute after the branch. A simple branch is performed by the
BranchWritePC() function:
// BranchWritePC()
// ===============
BranchWritePC(bits(32) address)
    if CurrentInstrSet() == InstrSet_ARM then
        if ArchVersion() < 6 && address<1:0> != '00' then UNPREDICTABLE;
        BranchTo(address<31:2>:'00');
    else
        BranchTo(address<31:1>:'0');

An interworking branch is performed by the BXWritePC() function:

// BXWritePC()
// ===========
BXWritePC(bits(32) address)
    if CurrentInstrSet() == InstrSet_ThumbEE then
        if address<0> == '1' then
            BranchTo(address<31:1>:'0'); // Remaining in ThumbEE state
        else
            UNPREDICTABLE;
    else
        if address<0> == '1' then
            SelectInstrSet(InstrSet_Thumb);
            BranchTo(address<31:1>:'0');
        elsif address<1> == '0' then
            SelectInstrSet(InstrSet_ARM);
            BranchTo(address);
        else // address<1:0> == '10'
            UNPREDICTABLE;

The LoadWritePC() and ALUWritePC() functions are used for two cases where the behavior was systematically
modified between architecture versions:
// LoadWritePC()
// =============
LoadWritePC(bits(32) address)
    if ArchVersion() >= 5 then
        BXWritePC(address);
    else
        BranchWritePC(address);
// ALUWritePC()
// ============
ALUWritePC(bits(32) address)
    if ArchVersion() >= 7 && CurrentInstrSet() == InstrSet_ARM then
        BXWritePC(address);
    else
        BranchWritePC(address);
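The four PC-writing functions can be sketched in Python as pure functions that return the new instruction set and branch target instead of changing processor state. This is a modelling choice for illustration: the instruction-set names, the Unpredictable exception, and passing the architecture version as an argument are all assumptions of the sketch, and Debug state is not modelled.

```python
ARM, THUMB, THUMBEE = "ARM", "Thumb", "ThumbEE"


class Unpredictable(Exception):
    """Raised where the pseudocode says UNPREDICTABLE (a modelling choice only)."""


def branch_write_pc(address, iset, arch_version):
    """BranchWritePC(): simple branch that stays in the current instruction set."""
    if iset == ARM:
        if arch_version < 6 and (address & 0b11) != 0:
            raise Unpredictable("unaligned ARM branch before ARMv6")
        return iset, address & 0xFFFFFFFC          # address<31:2>:'00'
    return iset, address & 0xFFFFFFFE              # address<31:1>:'0'


def bx_write_pc(address, iset):
    """BXWritePC(): interworking branch selected by address<1:0>."""
    if iset == THUMBEE:
        if address & 1:
            return THUMBEE, address & 0xFFFFFFFE   # remain in ThumbEE state
        raise Unpredictable("ThumbEE branch with address<0> == '0'")
    if address & 1:
        return THUMB, address & 0xFFFFFFFE
    if (address & 0b10) == 0:
        return ARM, address
    raise Unpredictable("address<1:0> == '10'")


def load_write_pc(address, iset, arch_version):
    """LoadWritePC(): loads to the PC interwork from ARMv5 onwards."""
    if arch_version >= 5:
        return bx_write_pc(address, iset)
    return branch_write_pc(address, iset, arch_version)


def alu_write_pc(address, iset, arch_version):
    """ALUWritePC(): data-processing PC writes interwork only in ARM state from ARMv7."""
    if arch_version >= 7 and iset == ARM:
        return bx_write_pc(address, iset)
    return branch_write_pc(address, iset, arch_version)
```

For example, in ARM state on ARMv7 an ALU write of 0x8001 to the PC switches to Thumb at 0x8000, while the same write on ARMv6 stays in ARM state and branches to 0x8000.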

Note
The behavior of the PC writes performed by the ALUWritePC() function is different in Debug state, where
there are more UNPREDICTABLE cases. The pseudocode in this section only handles the non-debug cases. For
more information, see Data-processing instructions with the PC as the target in Debug state on page C5-12.


A2.4

The Application Program Status Register (APSR)
Program status is reported in the 32-bit Application Program Status Register (APSR). The format of the
APSR is:
Bits      Field
[31]      N
[30]      Z
[29]      C
[28]      V
[27]      Q
[26:24]   RAZ/SBZP
[23:20]   Reserved
[19:16]   GE[3:0]
[15:0]    Reserved

In the APSR, the bits are in the following categories:
•   Reserved bits are allocated to system features, or are available for future expansion. Unprivileged
    execution ignores writes to privileged fields. However, application level software that writes to the
    APSR must treat reserved bits as Do-Not-Modify (DNM) bits. For more information about the
    reserved bits, see Format of the CPSR and SPSRs on page B1-16.
•   Flags that can be set by many instructions:
    N, bit [31]   Negative condition code flag. Set to bit [31] of the result of the instruction. If the result
                  is regarded as a two's complement signed integer, then N == 1 if the result is negative and
                  N == 0 if it is positive or zero.
    Z, bit [30]   Zero condition code flag. Set to 1 if the result of the instruction is zero, and to 0 otherwise.
                  A result of zero often indicates an equal result from a comparison.
    C, bit [29]   Carry condition code flag. Set to 1 if the instruction results in a carry condition, for
                  example an unsigned overflow on an addition.
    V, bit [28]   Overflow condition code flag. Set to 1 if the instruction results in an overflow condition,
                  for example a signed overflow on an addition.
    Q, bit [27]   Set to 1 to indicate overflow or saturation occurred in some instructions, normally related
                  to Digital Signal Processing (DSP). For more information, see Pseudocode details of
                  saturation on page A2-9.
    GE[3:0], bits [19:16]
                  Greater than or Equal flags. SIMD instructions update these flags to indicate the results
                  from individual bytes or halfwords of the operation. These flags can control a later SEL
                  instruction. For more information, see SEL on page A8-312.
•   Bits [26:24] are RAZ/SBZP. Therefore, software can use MSR instructions that write the top byte of
    the APSR without using a read-modify-write sequence. If it does this, it must write zeros to
    bits [26:24].
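A Python sketch of working with the APSR bit layout above. The helper names are hypothetical. apsr_top_byte builds only bits [31:24], writing zeros to bits [26:24] as the RAZ/SBZP rule requires, which is the case where the top byte can be written without a read-modify-write sequence. nz_from_result shows how N and Z follow from a 32-bit result.

```python
def decode_apsr(apsr):
    """Extract the application level flags from a 32-bit APSR value."""
    return {
        "N": (apsr >> 31) & 1,
        "Z": (apsr >> 30) & 1,
        "C": (apsr >> 29) & 1,
        "V": (apsr >> 28) & 1,
        "Q": (apsr >> 27) & 1,
        "GE": (apsr >> 16) & 0xF,     # bits [19:16]
    }


def apsr_top_byte(n, z, c, v, q=0):
    """Value for bits [31:24] only; bits [26:24] are written as zero (RAZ/SBZP)."""
    return (n << 31) | (z << 30) | (c << 29) | (v << 28) | (q << 27)


def nz_from_result(result):
    """N is bit [31] of the 32-bit result; Z is 1 when the result is zero."""
    return (result >> 31) & 1, int(result == 0)
```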

Instructions can test the N, Z, C, and V condition code flags to determine whether the instruction is to be
executed. In this way, execution of the instruction can be made conditional on the result of a previous
operation. For more information about conditional execution see Conditional execution on page A4-3 and
Conditional execution on page A8-8.
In ARMv7-A and ARMv7-R, the APSR is the same register as the CPSR, but the APSR must be used only
to access the N, Z, C, V, Q, and GE[3:0] bits. For more information, see Program Status Registers (PSRs)
on page B1-14.


A2.5

Execution state registers
The execution state registers modify the execution of instructions. They control:
•   Whether instructions are interpreted as Thumb instructions, ARM instructions, ThumbEE
    instructions, or Java bytecodes. For more information, see ISETSTATE.
•   In Thumb state and ThumbEE state only, what conditions apply to up to the next four instructions. For
    more information, see ITSTATE on page A2-17.
•   Whether data is interpreted as big-endian or little-endian. For more information, see ENDIANSTATE
    on page A2-19.

In ARMv7-A and ARMv7-R, the execution state registers are part of the Current Program Status Register.
For more information, see Program Status Registers (PSRs) on page B1-14.
There is no direct access to the execution state registers from application level instructions, but they can be
changed by side effects of application level instructions.

A2.5.1

ISETSTATE
Bit   1    0
      J    T
The J bit and the T bit determine the instruction set used by the processor. Table A2-1 shows the encoding
of these bits.
Table A2-1 J and T bit encoding in ISETSTATE
J    T    Instruction set state
0    0    ARM
0    1    Thumb
1    0    Jazelle
1    1    ThumbEE

ARM state

The processor executes the ARM instruction set described in Chapter A5 ARM
Instruction Set Encoding.

Thumb state

The processor executes the Thumb instruction set as described in Chapter A6
Thumb Instruction Set Encoding.

Jazelle state

The processor executes Java bytecodes as part of a Java Virtual Machine (JVM). For
more information, see Jazelle direct bytecode execution support on page A2-73.


ThumbEE state

The processor executes a variation of the Thumb instruction set specifically targeted
for use with dynamic compilation techniques associated with an execution
environment. This can be Java or other execution environments. This feature is
required in ARMv7-A, and optional in ARMv7-R. For more information, see
Thumb Execution Environment on page A2-69.

Pseudocode details of ISETSTATE operations
The following pseudocode functions return the current instruction set and select a new instruction set:
enumeration InstrSet {InstrSet_ARM, InstrSet_Thumb, InstrSet_Jazelle, InstrSet_ThumbEE};
// CurrentInstrSet()
// =================
InstrSet CurrentInstrSet()
    case ISETSTATE of
        when '00' result = InstrSet_ARM;
        when '01' result = InstrSet_Thumb;
        when '10' result = InstrSet_Jazelle;
        when '11' result = InstrSet_ThumbEE;
    return result;

// SelectInstrSet()
// ================
SelectInstrSet(InstrSet iset)
    case iset of
        when InstrSet_ARM
            if CurrentInstrSet() == InstrSet_ThumbEE then
                UNPREDICTABLE;
            else
                ISETSTATE = '00';
        when InstrSet_Thumb
            ISETSTATE = '01';
        when InstrSet_Jazelle
            ISETSTATE = '10';
        when InstrSet_ThumbEE
            ISETSTATE = '11';
    return;
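A Python sketch of the ISETSTATE functions, assuming ISETSTATE is held as a 2-bit int J:T and that SelectInstrSet() returns the new value rather than updating state in place. The enum member values follow Table A2-1; raising ValueError for the UNPREDICTABLE case is a modelling choice.

```python
from enum import Enum


class InstrSet(Enum):
    ARM = 0b00       # J=0, T=0
    THUMB = 0b01     # J=0, T=1
    JAZELLE = 0b10   # J=1, T=0
    THUMBEE = 0b11   # J=1, T=1


def current_instr_set(isetstate):
    """CurrentInstrSet(): decode the 2-bit J:T value."""
    return InstrSet(isetstate & 0b11)


def select_instr_set(isetstate, iset):
    """SelectInstrSet(): return the new ISETSTATE value for iset."""
    if iset is InstrSet.ARM and current_instr_set(isetstate) is InstrSet.THUMBEE:
        raise ValueError("UNPREDICTABLE: ThumbEE to ARM")
    return iset.value
```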


A2.5.2

ITSTATE
Bits [7:0]   IT[7:0]
This field holds the If-Then execution state bits for the Thumb IT instruction. See IT on page A8-104 for a
description of the IT instruction and the associated IT block.
ITSTATE divides into two subfields:
IT[7:5]

Holds the base condition for the current IT block. The base condition is the top 3 bits of the
condition specified by the IT instruction.
This subfield is 0b000 when no IT block is active.

IT[4:0]

Encodes:
•   The size of the IT block. This is the number of instructions that are to be conditionally
    executed. The size of the block is implied by the position of the least significant 1 in
    this field, as shown in Table A2-2 on page A2-18.
•   The value of the least significant bit of the condition code for each instruction in the
    block.

Note
Changing the value of the least significant bit of a condition code from 0 to 1 has the
effect of inverting the condition code.
This subfield is 0b00000 when no IT block is active.
When an IT instruction is executed, these bits are set according to the condition in the instruction, and the
Then and Else (T and E) parameters in the instruction. For more information, see IT on page A8-104.
An instruction in an IT block is conditional, see Conditional instructions on page A4-4 and Conditional
execution on page A8-8. The condition used is the current value of IT[7:4]. When an instruction in an IT
block completes its execution normally, ITSTATE is advanced to the next line of Table A2-2 on page A2-18.
For details of what happens if such an instruction takes an exception see Exception entry on page B1-34.

Note
Instructions that can complete their normal execution by branching are only permitted in an IT block as its
last instruction, and so always result in ITSTATE advancing to normal execution.

Note
ITSTATE affects instruction execution only in Thumb and ThumbEE states. In ARM and Jazelle states,
ITSTATE must be '00000000', otherwise behavior is UNPREDICTABLE.


Table A2-2 Effect of IT execution state bits
IT bits (a)
[7:5]       [4]   [3]   [2]   [1]   [0]   Note
cond_base   P1    P2    P3    P4    1     Entry point for 4-instruction IT block
cond_base   P1    P2    P3    1     0     Entry point for 3-instruction IT block
cond_base   P1    P2    1     0     0     Entry point for 2-instruction IT block
cond_base   P1    1     0     0     0     Entry point for 1-instruction IT block
000         0     0     0     0     0     Normal execution, not in an IT block

a. Combinations of the IT bits not shown in this table are reserved.

Pseudocode details of ITSTATE operations
ITSTATE advances after normal execution of an IT block instruction. This is described by the ITAdvance()
pseudocode function:
// ITAdvance()
// ===========
ITAdvance()
    if ITSTATE.IT<2:0> == '000' then
        ITSTATE.IT = '00000000';
    else
        ITSTATE.IT<4:0> = LSL(ITSTATE.IT<4:0>, 1);

The following functions test whether the current instruction is in an IT block, and whether it is the last
instruction of an IT block:
// InITBlock()
// ===========
boolean InITBlock()
    return (ITSTATE.IT<3:0> != '0000');
// LastInITBlock()
// ===============
boolean LastInITBlock()
    return (ITSTATE.IT<3:0> == '1000');
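A Python sketch of ITAdvance(), InITBlock() and LastInITBlock() operating on ITSTATE held as an 8-bit int. walk_it_block is a hypothetical helper that steps through a block and records the condition used by each instruction (IT[7:4]) and whether it is the last one. The example values are built directly from the rows of Table A2-2: 0x06 is a 3-instruction block with cond_base '000', P1 = 0, P2 = 0 and P3 = 1, so the third instruction has the least significant condition bit inverted.

```python
def it_advance(itstate):
    """ITAdvance(): clear ITSTATE after the last instruction, else shift IT<4:0> left by one."""
    if (itstate & 0b111) == 0:           # IT<2:0> == '000'
        return 0
    low5 = ((itstate & 0x1F) << 1) & 0x1F
    return (itstate & 0xE0) | low5        # IT<7:5> (cond_base) is unchanged


def in_it_block(itstate):
    return (itstate & 0xF) != 0           # IT<3:0> != '0000'


def last_in_it_block(itstate):
    return (itstate & 0xF) == 0b1000      # IT<3:0> == '1000'


def current_condition(itstate):
    return (itstate >> 4) & 0xF           # the condition used is IT<7:4>


def walk_it_block(itstate):
    """Return [(condition, is_last), ...] for the block and the final ITSTATE."""
    steps = []
    while in_it_block(itstate):
        steps.append((current_condition(itstate), last_in_it_block(itstate)))
        itstate = it_advance(itstate)
    return steps, itstate
```

Because the condition is read from IT[7:4], each left shift moves the next instruction's P bit into IT[4], which is how a single base condition and per-instruction least significant bits describe the whole block.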


A2.5.3

ENDIANSTATE
ARMv7-A and ARMv7-R support configuration between little-endian and big-endian interpretations of
data memory, as shown in Table A2-3. The endianness is controlled by ENDIANSTATE.
Table A2-3 APSR configuration of endianness
ENDIANSTATE   Endian mapping
0             Little-endian
1             Big-endian

The ARM and Thumb instruction sets both include an instruction to manipulate ENDIANSTATE:
SETEND BE   Sets ENDIANSTATE to 1, for big-endian operation.
SETEND LE   Sets ENDIANSTATE to 0, for little-endian operation.
The SETEND instruction is unconditional. For more information, see SETEND on page A8-314.

Pseudocode details of ENDIANSTATE operations
The BigEndian() pseudocode function tests whether big-endian memory accesses are currently selected.
// BigEndian()
// ===========
boolean BigEndian()
    return (ENDIANSTATE == '1');
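To show what ENDIANSTATE changes, here is a deliberately simplified Python sketch of a 32-bit data load from a byte array. load_word is a hypothetical helper; it ignores alignment checks and all system-level memory behavior, and only illustrates how the same four bytes are assembled in the opposite order when BigEndian() is TRUE.

```python
def big_endian(endianstate):
    """BigEndian(): TRUE when ENDIANSTATE is 1."""
    return endianstate == 1


def load_word(memory, address, endianstate):
    """Read four bytes starting at address and combine them per ENDIANSTATE."""
    data = memory[address:address + 4]
    return int.from_bytes(data, "big" if big_endian(endianstate) else "little")
```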


A2.6

Advanced SIMD and VFP extensions
Advanced SIMD and VFP are two optional extensions to ARMv7.
Advanced SIMD performs packed Single Instruction Multiple Data (SIMD) operations, either integer or
single-precision floating-point. VFP performs single-precision or double-precision floating-point
operations.
Both extensions permit floating-point exceptions, such as overflow or division by zero, to be handled in an
untrapped fashion. When handled in this way, a floating-point exception causes a cumulative status register
bit to be set to 1 and a default result to be produced by the operation.
The ARMv7 VFP implementation is VFPv3. ARMv7 also permits a variant of VFPv3, VFPv3U, that
supports the trapping of floating-point exceptions, see VFPv3U on page A2-31. VFPv2 also supports the
trapping of floating-point exceptions.
For more information about floating-point exceptions see Floating-point exceptions on page A2-42.
Each extension can be implemented at a number of levels. Table A2-4 shows the permitted combinations of
implementations of the two extensions.
Table A2-4 Permitted combinations of Advanced SIMD and VFP extensions
Advanced SIMD                                  VFP
Not implemented                                Not implemented
Integer only                                   Not implemented
Integer and single-precision floating-point    Single-precision floating-point only (a)
Integer and single-precision floating-point    Single-precision and double-precision floating-point
Not implemented                                Single-precision floating-point only (a)
Not implemented                                Single-precision and double-precision floating-point

a. Must be able to load and store double-precision data.

The optional half-precision extensions provide conversion functions in both directions between
half-precision floating-point and single-precision floating-point. These extensions can be implemented with
any Advanced SIMD and VFP implementation that supports single-precision floating-point. The
half-precision extensions apply to both VFP and Advanced SIMD if they are both implemented.
For system-level information about the Advanced SIMD and VFP extensions see:
•   Advanced SIMD and VFP extension system registers on page B1-66
•   Advanced SIMD and floating-point support on page B1-64.


Note
Before ARMv7, the VFP extension was called the Vector Floating-point Architecture, and was used for
vector operations. For details of these deprecated operations see Appendix F VFP Vector Operation
Support. From ARMv7:
•   ARM recommends that the Advanced SIMD extension is used for single-precision vector
    floating-point operations
•   an implementation that requires support for vector operations must implement the Advanced SIMD
    extension.

A2.6.1

Advanced SIMD and VFP extension registers
Advanced SIMD and VFPv3 use the same register set. This is distinct from the ARM core register set. These
registers are generally referred to as the extension registers.
The extension register set consists of either thirty-two or sixteen doubleword registers, as follows:
•   If VFPv2 is implemented, it consists of sixteen doubleword registers.
•   If VFPv3 is implemented, it consists of either thirty-two or sixteen doubleword registers. Where
    necessary the terms VFPv3-D32 and VFPv3-D16 are used to distinguish between these two
    implementation options.
•   If Advanced SIMD is implemented, it consists of thirty-two doubleword registers. If both Advanced
    SIMD and VFPv3 are implemented, VFPv3 must be implemented in its VFPv3-D32 form.

The Advanced SIMD and VFP views of the extension register set are not identical. They are described in
the following sections.
Figure A2-1 on page A2-22 shows the views of the extension register set, and the way the word,
doubleword, and quadword registers overlap.

Advanced SIMD views of the extension register set
Advanced SIMD can view this register set as:
•   Sixteen 128-bit quadword registers, Q0-Q15.
•   Thirty-two 64-bit doubleword registers, D0-D31. This view is also available in VFPv3.
These views can be used simultaneously. For example, a program might hold 64-bit vectors in D0 and D1
and a 128-bit vector in Q1.


VFP views of the extension register set
In VFPv3-D32, the extension register set consists of thirty-two doubleword registers, which VFP can view as:
•   Thirty-two 64-bit doubleword registers, D0-D31. This view is also available in Advanced SIMD.
•   Thirty-two 32-bit single word registers, S0-S31. Only half of the set is accessible in this view.
In VFPv3-D16 and VFPv2, the extension register set consists of sixteen doubleword registers, which VFP can
view as:
•   Sixteen 64-bit doubleword registers, D0-D15.
•   Thirty-two 32-bit single word registers, S0-S31.
In each case, the two views can be used simultaneously.

Advanced SIMD and VFP register mapping

S0-S31      D0-D15        D0-D31           Q0-Q15
VFP only    VFPv2 or      VFPv3-D32 or     Advanced SIMD only
            VFPv3-D16     Advanced SIMD
S0, S1      D0            D0               Q0
S2, S3      D1            D1
S4, S5      D2            D2               Q1
S6, S7      D3            D3
...         ...           ...              ...
S28, S29    D14           D14              Q7
S30, S31    D15           D15
                          D16              Q8
                          D17
                          ...              ...
                          D30              Q15
                          D31

Figure A2-1 Advanced SIMD and VFP register set


The mapping between the registers is as follows:
•   S<2n> maps to the least significant half of D<n>
•   S<2n+1> maps to the most significant half of D<n>
•   D<2n> maps to the least significant half of Q<n>
•   D<2n+1> maps to the most significant half of Q<n>.
For example, you can access the least significant half of the elements of a vector in Q6 by referring to D12,
and the most significant half of the elements by referring to D13.

Pseudocode details of Advanced SIMD and VFP extension registers
The pseudocode function VFPSmallRegisterBank() returns FALSE if all of the 32 registers D0-D31 can be
accessed, and TRUE if only the 16 registers D0-D15 can be accessed:
boolean VFPSmallRegisterBank()

In more detail, VFPSmallRegisterBank():
•   returns TRUE for a VFPv2 or VFPv3-D16 implementation
•   for a VFPv3-D32 implementation:
    —   returns FALSE if CPACR.D32DIS == 0
    —   returns TRUE if CPACR.D32DIS == 1 and CPACR.ASEDIS == 1
    —   results in UNPREDICTABLE behavior if CPACR.D32DIS == 1 and CPACR.ASEDIS == 0.
For details of the CPACR register, see:
•   c1, Coprocessor Access Control Register (CPACR) on page B3-104 for a VMSA implementation
•   c1, Coprocessor Access Control Register (CPACR) on page B4-51 for a PMSA implementation.
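The decision rules for VFPSmallRegisterBank() can be written as a small Python function. The signature is hypothetical: the implementation variant is passed as a string and the two CPACR bits as ints, and the UNPREDICTABLE combination is reported by raising ValueError.

```python
def vfp_small_register_bank(implementation, d32dis=0, asedis=0):
    """Return True if only D0-D15 are accessible, following the rules listed above."""
    if implementation in ("VFPv2", "VFPv3-D16"):
        return True
    if implementation != "VFPv3-D32":
        raise ValueError("unknown implementation: " + implementation)
    if d32dis == 0:
        return False                  # all of D0-D31 accessible
    if asedis == 1:
        return True                   # D16-D31 disabled, Advanced SIMD disabled too
    raise ValueError("UNPREDICTABLE: CPACR.D32DIS == 1 and CPACR.ASEDIS == 0")
```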
The S0-S31, D0-D31, and Q0-Q15 views of the registers are provided by the following functions:
// The 64-bit extension register bank for Advanced SIMD and VFP.
array bits(64) _D[0..31];
// S[] - non-assignment form
// =========================
bits(32) S[integer n]
    assert n >= 0 && n <= 31;
    if (n MOD 2) == 0 then
        result = D[n DIV 2]<31:0>;
    else
        result = D[n DIV 2]<63:32>;
    return result;
// S[] - assignment form
// =====================
S[integer n] = bits(32) value
    assert n >= 0 && n <= 31;
    if (n MOD 2) == 0 then
        D[n DIV 2]<31:0> = value;
    else
        D[n DIV 2]<63:32> = value;
    return;
// D[] - non-assignment form
// =========================
bits(64) D[integer n]
    assert n >= 0 && n <= 31;
    if n >= 16 && VFPSmallRegisterBank() then UNDEFINED;
    return _D[n];
// D[] - assignment form
// =====================
D[integer n] = bits(64) value
    assert n >= 0 && n <= 31;
    if n >= 16 && VFPSmallRegisterBank() then UNDEFINED;
    _D[n] = value;
    return;
// Q[] - non-assignment form
// =========================
bits(128) Q[integer n]
    assert n >= 0 && n <= 15;
    return D[2*n+1]:D[2*n];
// Q[] - assignment form
// =====================
Q[integer n] = bits(128) value
    assert n >= 0 && n <= 15;
    D[2*n] = value<63:0>;
    D[2*n+1] = value<127:64>;
    return;
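A Python sketch of the _D[0..31] bank and its S[], D[] and Q[] views, following the pseudocode above. The class ExtensionRegisters is hypothetical; the small_bank flag stands in for the result of VFPSmallRegisterBank(), and raising RuntimeError stands in for UNDEFINED. All three views share one backing list, so writing Q6 is visible through D12 and D13.

```python
MASK32 = 0xFFFFFFFF
MASK64 = (1 << 64) - 1


class ExtensionRegisters:
    """Hypothetical model of _D[0..31] with S[], D[] and Q[] accessors."""

    def __init__(self, small_bank=False):
        self._d = [0] * 32
        self.small_bank = small_bank          # stands in for VFPSmallRegisterBank()

    def get_d(self, n):
        assert 0 <= n <= 31
        if n >= 16 and self.small_bank:
            raise RuntimeError("UNDEFINED")
        return self._d[n]

    def set_d(self, n, value):
        assert 0 <= n <= 31
        if n >= 16 and self.small_bank:
            raise RuntimeError("UNDEFINED")
        self._d[n] = value & MASK64

    def get_s(self, n):
        assert 0 <= n <= 31
        d = self.get_d(n // 2)
        return (d >> 32) if n % 2 else (d & MASK32)   # odd: D<63:32>, even: D<31:0>

    def set_s(self, n, value):
        assert 0 <= n <= 31
        d = self.get_d(n // 2)
        value &= MASK32
        if n % 2:
            d = (d & MASK32) | (value << 32)          # replace the top half
        else:
            d = (d & (MASK32 << 32)) | value          # replace the bottom half
        self.set_d(n // 2, d)

    def get_q(self, n):
        assert 0 <= n <= 15
        return (self.get_d(2 * n + 1) << 64) | self.get_d(2 * n)

    def set_q(self, n, value):
        assert 0 <= n <= 15
        self.set_d(2 * n, value & MASK64)
        self.set_d(2 * n + 1, (value >> 64) & MASK64)
```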


A2.6.2

Data types supported by the Advanced SIMD extension
When the Advanced SIMD extension is implemented, it can operate on integer and floating-point data. It
defines a set of data types to represent the different data formats. Table A2-5 shows the available formats.
Each instruction description specifies the data types that the instruction supports.
Table A2-5 Advanced SIMD data types
Data type specifier   Meaning
.<size>               Any element of <size> bits
.F<size>              Floating-point number of <size> bits
.I<size>              Signed or unsigned integer of <size> bits
.P<size>              Polynomial over {0,1} of degree less than <size>
.S<size>              Signed integer of <size> bits
.U<size>              Unsigned integer of <size> bits
The polynomial data type is described in Polynomial arithmetic over {0,1} on page A2-67.
The .F16 data type is the half-precision data type currently selected by the FPSCR.AHP bit, see Advanced
SIMD and VFP system registers on page A2-28. It is supported only when the half-precision extensions are
implemented.
The .F32 data type is the ARM standard single-precision floating-point data type, see Advanced SIMD and
VFP single-precision format on page A2-34.
The instruction definitions use a data type specifier to define the data types appropriate to the operation.
Figure A2-2 on page A2-26 shows the hierarchy of Advanced SIMD data types.


.8
    .I8
        .S8
        .U8
    .P8
.16
    .I16
        .S16
        .U16
    .P16
    .F16 ‡
.32
    .I32
        .S32
        .U32
    .F32
.64
    .I64
        .S64
        .U64

‡ Supported only if the half-precision extensions are implemented

Figure A2-2 Advanced SIMD data type hierarchy
For example, a multiply instruction must distinguish between integer and floating-point data types.
However, some multiply instructions use modulo arithmetic for integer operations and therefore do not
need to distinguish between signed and unsigned inputs.
A multiply instruction that generates a double-width (long) result must specify the input data types as signed
or unsigned, because for this operation the interpretation of the inputs changes the result.
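The following Python sketch illustrates why only the long form needs a signed or unsigned data type. The helpers are hypothetical and operate on 8-bit elements: the same-width product keeps only the low 8 bits, which are identical whichever way the inputs are interpreted, while the 16-bit product differs.

```python
def to_signed(v, bits):
    """Interpret a bits-wide pattern as a two's complement signed value."""
    return v - (1 << bits) if v >> (bits - 1) else v


def mul_modulo(a, b, bits=8):
    """Same-width product: only the low `bits` bits are kept."""
    return (a * b) & ((1 << bits) - 1)


def mul_long(a, b, bits=8, signed=False):
    """Double-width product: depends on how the inputs are interpreted."""
    if signed:
        a, b = to_signed(a, bits), to_signed(b, bits)
    return (a * b) & ((1 << (2 * bits)) - 1)
```

With inputs 0xFF and 0x02, the 8-bit product is 0xFE both for 255 * 2 and for -1 * 2, but the 16-bit product is 0x01FE when the inputs are unsigned and 0xFFFE when they are signed.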

A2.6.3

Advanced SIMD vectors
When the Advanced SIMD extension is implemented, a register can hold one or more packed elements, all
of the same size and type. The combination of a register and a data type describes a vector of elements. The
vector is considered to be an array of elements of the data type specified in the instruction. The number of
elements in the vector is implied by the size of the data elements and the size of the register.
Vector indices are in the range 0 to (number of elements – 1). An index of 0 refers to the least significant
end of the vector. Figure A2-3 on page A2-27 shows examples of Advanced SIMD vectors:


Qn, bits [127:0]:  .F32[3] | .F32[2] | .F32[1] | .F32[0]
    128-bit vector of single-precision (32-bit) floating-point numbers
Qn, bits [127:0]:  .S16[7] | .S16[6] | .S16[5] | .S16[4] | .S16[3] | .S16[2] | .S16[1] | .S16[0]
    128-bit vector of 16-bit signed integers
Dn, bits [63:0]:   .S32[1] | .S32[0]
    64-bit vector of 32-bit signed integers
Dn, bits [63:0]:   .U16[3] | .U16[2] | .U16[1] | .U16[0]
    64-bit vector of 16-bit unsigned integers

Figure A2-3 Examples of Advanced SIMD vectors

Pseudocode details of Advanced SIMD vectors
The pseudocode function Elem[] is used to access the element of a specified index and size in a vector:
// Elem[] - non-assignment form
// ============================
bits(size) Elem[bits(N) vector, integer e, integer size]
    assert e >= 0 && (e+1)*size <= N;
    return vector<(e+1)*size-1:e*size>;
// Elem[] - assignment form
// ========================
Elem[bits(N) vector, integer e, integer size] = bits(size) value
    assert e >= 0 && (e+1)*size <= N;
    vector<(e+1)*size-1:e*size> = value;
    return;
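A Python sketch of Elem[], assuming an N-bit vector is held as a non-negative int. Because Python ints are immutable, the hypothetical elem_set returns the updated vector instead of modifying it in place. Element 0 is at the least significant end, and an N-bit register holds N // size elements.

```python
def elem_get(vector, e, size, n):
    """Elem[] non-assignment form: return bits (e+1)*size-1 down to e*size."""
    assert e >= 0 and (e + 1) * size <= n
    return (vector >> (e * size)) & ((1 << size) - 1)


def elem_set(vector, e, size, n, value):
    """Elem[] assignment form: return vector with element e replaced by value."""
    assert e >= 0 and (e + 1) * size <= n
    mask = ((1 << size) - 1) << (e * size)
    return (vector & ~mask) | ((value << (e * size)) & mask)
```

For example, a 64-bit Dn viewed as .U16 has 64 // 16 == 4 elements, [3] down to [0].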


A2.6.4

Advanced SIMD and VFP system registers
The Advanced SIMD and VFP extensions have a shared register space for system registers. Only one
register in this space is accessible at the application level, see Floating-point Status and Control Register
(FPSCR).
See Advanced SIMD and VFP extension system registers on page B1-66 for the system level description of
the registers.

Floating-point Status and Control Register (FPSCR)
The Floating-point Status and Control Register (FPSCR) is implemented in any system that implements one
or both of:
•   the VFP extension
•   the Advanced SIMD extension.
The FPSCR provides all necessary User level control of the floating-point system.
The FPSCR is a 32-bit read/write system register, accessible in unprivileged and privileged modes.
The format of the FPSCR is:
Bits      Field
[31]      N
[30]      Z
[29]      C
[28]      V
[27]      QC
[26]      AHP
[25]      DN
[24]      FZ
[23:22]   RMode
[21:20]   Stride
[19]      UNK/SBZP
[18:16]   Len
[15]      IDE
[14:13]   UNK/SBZP
[12]      IXE
[11]      UFE
[10]      OFE
[9]       DZE
[8]       IOE
[7]       IDC
[6:5]     UNK/SBZP
[4]       IXC
[3]       UFC
[2]       OFC
[1]       DZC
[0]       IOC
Bits [31:28]

Condition code bits. These are updated on floating-point comparison operations. They are
not updated on SIMD operations, and do not affect SIMD instructions.
N, bit [31] Negative condition code flag.
Z, bit [30] Zero condition code flag.
C, bit [29] Carry condition code flag.
V, bit [28] Overflow condition code flag.

QC, bit [27]

Cumulative saturation flag, Advanced SIMD only. This bit is set to 1 to indicate that an
Advanced SIMD integer operation has saturated since 0 was last written to this bit. For
details of saturation, see Pseudocode details of saturation on page A2-9.
The value of this bit is ignored by the VFP extension. If Advanced SIMD is not implemented
this bit is UNK/SBZP.


AHP, bit [26]

Alternative half-precision control bit:
0  IEEE half-precision format selected.
1  Alternative half-precision format selected.
For more information see Advanced SIMD and VFP half-precision formats on page A2-38.
If the half-precision extensions are not implemented this bit is UNK/SBZP.
Bits [19,14:13,6:5]
Reserved. UNK/SBZP.
DN, bit [25]

Default NaN mode control bit:
0  NaN operands propagate through to the output of a floating-point operation.
1  Any operation involving one or more NaNs returns the Default NaN.
For more information, see NaN handling and the Default NaN on page A2-41.
The value of this bit only controls VFP arithmetic. Advanced SIMD arithmetic always uses
the Default NaN setting, regardless of the value of the DN bit.

FZ, bit [24]

Flush-to-zero mode control bit:
0  Flush-to-zero mode disabled. Behavior of the floating-point system is fully compliant with the IEEE 754 standard.
1  Flush-to-zero mode enabled.
For more information, see Flush-to-zero on page A2-39.
The value of this bit only controls VFP arithmetic. Advanced SIMD arithmetic always uses
the Flush-to-zero setting, regardless of the value of the FZ bit.
RMode, bits [23:22]
Rounding Mode control field. The encoding of this field is:
0b00  Round to Nearest (RN) mode
0b01  Round towards Plus Infinity (RP) mode
0b10  Round towards Minus Infinity (RM) mode
0b11  Round towards Zero (RZ) mode.
The specified rounding mode is used by almost all VFP floating-point instructions.
Advanced SIMD arithmetic always uses the Round to Nearest setting, regardless of the
value of the RMode bits.
Stride, bits [21:20] and Len, bits [18:16]
Use of nonzero values of these fields is deprecated in ARMv7. For details of their use in
previous versions of the ARM architecture see Appendix F VFP Vector Operation Support.
The values of these fields are ignored by the Advanced SIMD extension.
Bits [15,12:8] Floating-point exception trap enable bits. These bits are supported only in VFPv2 and
VFPv3U. They are reserved, RAZ/SBZP, on a system that implements VFPv3.


The possible values of each bit are:
0  Untrapped exception handling selected
1  Trapped exception handling selected.
The values of these bits control only VFP arithmetic. Advanced SIMD arithmetic always
uses untrapped exception handling, regardless of the values of these bits.
For more information, see Floating-point exceptions on page A2-42.

IDE, bit [15]   Input Denormal exception trap enable.
IXE, bit [12]   Inexact exception trap enable.
UFE, bit [11]   Underflow exception trap enable.
OFE, bit [10]   Overflow exception trap enable.
DZE, bit [9]    Division by Zero exception trap enable.
IOE, bit [8]    Invalid Operation exception trap enable.

Bits [7,4:0]

Cumulative exception flags for floating-point exceptions. Each of these bits is set to 1 to
indicate that the corresponding exception has occurred since 0 was last written to it. How
VFP instructions update these bits depends on the value of the corresponding exception trap
enable bits:
Trap enable bit = 0
If the floating-point exception occurs then the cumulative exception flag is set
to 1.
Trap enable bit = 1
If the floating-point exception occurs the trap handling software can decide
whether to set the cumulative exception flag to 1.
Advanced SIMD instructions set each cumulative exception flag if the corresponding
exception occurs in one or more of the floating-point calculations performed by the
instruction, regardless of the setting of the trap enable bits.
For more information, see Floating-point exceptions on page A2-42.
IDC, bit [7]   Input Denormal cumulative exception flag.
IXC, bit [4]   Inexact cumulative exception flag.
UFC, bit [3]   Underflow cumulative exception flag.
OFC, bit [2]   Overflow cumulative exception flag.
DZC, bit [1]   Division by Zero cumulative exception flag.
IOC, bit [0]   Invalid Operation cumulative exception flag.

If the processor implements the integer-only Advanced SIMD extension and does not implement the VFP
extension, all of these bits except QC are UNK/SBZP.
Writes to the FPSCR can have side-effects on various aspects of processor operation. All of these
side-effects are synchronous to the FPSCR write. This means they are guaranteed not to be visible to earlier
instructions in the execution stream, and they are guaranteed to be visible to later instructions in the
execution stream.
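The field layout above can be expressed as masks and shifts. The following C sketch assumes the FPSCR value is already held in a uint32_t. Reading it with VMRS and writing it back with VMSR is not shown. All macro and function names are hypothetical. Mapping RMode onto the C99 <fenv.h> rounding macros assumes a host C library that defines all four of them.

```c
#include <assert.h>
#include <fenv.h>
#include <stdint.h>

/* Field positions taken from the FPSCR description (hypothetical names). */
#define FPSCR_QC          (UINT32_C(1) << 27)
#define FPSCR_RMODE_SHIFT 22
#define FPSCR_RMODE_MASK  (UINT32_C(3) << FPSCR_RMODE_SHIFT)
#define FPSCR_CUMULATIVE  UINT32_C(0x9F)  /* IDC (bit 7) and IXC..IOC (bits 4:0) */

/* N, Z, C, V as a 4-bit value from bits [31:28]. */
static unsigned fpscr_nzcv(uint32_t fpscr)
{
    return (unsigned)(fpscr >> 28) & 0xFu;
}

/* RMode from bits [23:22]. */
static unsigned fpscr_rmode(uint32_t fpscr)
{
    return (unsigned)((fpscr & FPSCR_RMODE_MASK) >> FPSCR_RMODE_SHIFT);
}

/* Map an RMode encoding to the matching C99 <fenv.h> rounding macro. */
static int rmode_to_fenv(unsigned rmode)
{
    switch (rmode & 3u) {
    case 0:  return FE_TONEAREST;   /* 0b00 RN */
    case 1:  return FE_UPWARD;      /* 0b01 RP */
    case 2:  return FE_DOWNWARD;    /* 0b10 RM */
    default: return FE_TOWARDZERO;  /* 0b11 RZ */
    }
}

/* Modify step of a read-modify-write: replace RMode, keep every other field. */
static uint32_t fpscr_with_rmode(uint32_t fpscr, unsigned rmode)
{
    assert(rmode <= 3u);
    return (fpscr & ~FPSCR_RMODE_MASK) | ((uint32_t)rmode << FPSCR_RMODE_SHIFT);
}

/* Clear QC and the floating-point cumulative flags by writing 0 to them. */
static uint32_t fpscr_clear_cumulative(uint32_t fpscr)
{
    return fpscr & ~(FPSCR_QC | FPSCR_CUMULATIVE);
}
```

For example, fpscr_with_rmode(v, 3) returns v with RMode set to Round towards Zero and every other field unchanged.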


Accessing the FPSCR
You read or write the FPSCR using the VMRS and VMSR instructions. For more information, see VMRS on
page A8-658 and VMSR on page A8-660. For example:
VMRS <Rt>, FPSCR    ; Read Floating-point Status and Control Register
VMSR FPSCR, <Rt>    ; Write Floating-point Status and Control Register

A2.6.5 VFPv3U
VFPv3 does not support the exception trap enable bits in the FPSCR, see Floating-point Status and Control
Register (FPSCR) on page A2-28. All floating-point exceptions are untrapped.
The VFPv3U variant of the VFPv3 architecture implements the exception trap enable bits in the FPSCR,
and provides exception handling as described in VFP support code on page B1-70. There is a separate trap
enable bit for each of the six floating-point exceptions described in Floating-point exceptions on
page A2-42. The VFPv3U architecture is otherwise identical to VFPv3.
Trapped exception handling never causes the corresponding cumulative exception bit of the FPSCR to be
set to 1. If this behavior is desired, the trap handler routine must use a read-modify-write sequence on the
FPSCR to set the cumulative exception bit.
VFPv3U is backwards compatible with VFPv2.
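The following C sketch shows the read-modify-write sequence that a VFPv3U trap handler could use to set a cumulative flag. The FPSCR is modeled by an ordinary variable. read_fpscr() and write_fpscr() are hypothetical stand-ins for VMRS and VMSR. A real handler would also have to deal with the trapped operation itself, which is not shown.

```c
#include <assert.h>
#include <stdint.h>

/* Cumulative flag bit positions from the FPSCR description. */
enum fp_flag { FP_IOC = 0, FP_DZC = 1, FP_OFC = 2, FP_UFC = 3, FP_IXC = 4, FP_IDC = 7 };

/* The FPSCR modeled as a variable; read_fpscr() and write_fpscr() are
   hypothetical stand-ins for VMRS and VMSR. */
static uint32_t simulated_fpscr;
static uint32_t read_fpscr(void)    { return simulated_fpscr; }
static void write_fpscr(uint32_t v) { simulated_fpscr = v; }

/* Hardware does not set the cumulative flag for a trapped exception, so a
   handler that wants it set performs a read-modify-write of the FPSCR. */
static void trap_handler_set_cumulative(enum fp_flag flag)
{
    assert(flag <= FP_IXC || flag == FP_IDC);
    uint32_t v = read_fpscr();          /* read */
    v |= UINT32_C(1) << flag;           /* modify: set only this flag */
    write_fpscr(v);                     /* write */
}
```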


A2.7 Floating-point data types and arithmetic
The VFP extension supports single-precision (32-bit) and double-precision (64-bit) floating-point data
types and arithmetic as defined by the IEEE 754 floating-point standard. It also supports the ARM standard
modifications to that arithmetic described in Flush-to-zero on page A2-39 and NaN handling and the
Default NaN on page A2-41.
Trapped floating-point exception handling is supported in the VFPv3U variant only (see VFPv3U on
page A2-31).
ARM standard floating-point arithmetic means IEEE 754 floating-point arithmetic with the ARM standard
modifications and:
• the Round to Nearest rounding mode selected
• untrapped exception handling selected for all floating-point exceptions.
The Advanced SIMD extension only supports single-precision ARM standard floating-point arithmetic.
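Read literally, the definition above means VFP arithmetic is ARM standard when the following conditions hold:
• Flush-to-zero mode is enabled.
• Default NaN mode is enabled.
• Round to Nearest is selected.
• No exception is trapped.

The C predicate below encodes that reading of the FPSCR fields. It is an interpretation for illustration, not a check defined by the architecture. The macro and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define FPSCR_DN           (UINT32_C(1) << 25)
#define FPSCR_FZ           (UINT32_C(1) << 24)
#define FPSCR_RMODE_MASK   (UINT32_C(3) << 22)
#define FPSCR_TRAP_ENABLES UINT32_C(0x9F00)  /* IDE (bit 15), IXE..IOE (bits 12:8) */

/* Sketch: Flush-to-zero and Default NaN on, Round to Nearest (RMode == 0b00),
   and every exception untrapped. */
static bool fpscr_is_arm_standard(uint32_t fpscr)
{
    return (fpscr & FPSCR_FZ) != 0
        && (fpscr & FPSCR_DN) != 0
        && (fpscr & FPSCR_RMODE_MASK) == 0
        && (fpscr & FPSCR_TRAP_ENABLES) == 0;
}
```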

Note
Implementations of the VFP extension require support code to be installed in the system if trapped
floating-point exception handling is required. See VFP support code on page B1-70.
They might also require support code to be installed in the system to support other aspects of their
floating-point arithmetic. It is IMPLEMENTATION DEFINED which aspects of VFP floating-point arithmetic
are supported in a system without support code installed.
Aspects of floating-point arithmetic that are implemented in support code are likely to run much more
slowly than those that are executed in hardware.
ARM recommends that:
• To maximize the chance of getting high floating-point performance, software developers use ARM
standard floating-point arithmetic.
• Software developers check whether their systems have support code installed, and if not, observe the
IMPLEMENTATION DEFINED restrictions on what operations their VFP implementation can handle
without support code.
• VFP implementation developers implement at least ARM standard floating-point arithmetic in
hardware, so that it can be executed without any need for support code.


A2.7.1 ARM standard floating-point input and output values
ARM standard floating-point arithmetic supports the following input formats defined by the IEEE 754
floating-point standard:
• Zeros.
• Normalized numbers.
• Denormalized numbers are flushed to 0 before floating-point operations. For details, see
Flush-to-zero on page A2-39.
• NaNs.
• Infinities.
ARM standard floating-point arithmetic supports the Round to Nearest rounding mode defined by the IEEE
754 standard.
ARM standard floating-point arithmetic supports the following output result formats defined by the IEEE
754 standard:
• Zeros.
• Normalized numbers.
• Results that are less than the minimum normalized number are flushed to zero, see Flush-to-zero on
page A2-39.
• NaNs produced in floating-point operations are always the default NaN, see NaN handling and the
Default NaN on page A2-41.
• Infinities.


A2.7.2 Advanced SIMD and VFP single-precision format
The single-precision floating-point format used by the Advanced SIMD and VFP extensions is as defined
by the IEEE 754 standard.
This description includes ARM-specific details that are left open by the standard. It is only intended as an
introduction to the formats and to the values they can contain. For full details, especially of the handling of
infinities, NaNs and signed zeros, see the IEEE 754 standard.
A single-precision value is a 32-bit word, and must be word-aligned when held in memory. It has the format:
Bit 31: S | bits [30:23]: exponent | bits [22:0]: fraction

The interpretation of the format depends on the value of the exponent field, bits [30:23]:
0 < exponent < 0xFF
The value is a normalized number and is equal to:
$(-1)^{S} \times 2^{(\text{exponent} - 127)} \times (1.\text{fraction})$
The minimum positive normalized number is $2^{-126}$, or approximately $1.175 \times 10^{-38}$.
The maximum positive normalized number is $(2 - 2^{-23}) \times 2^{127}$, or approximately
$3.403 \times 10^{38}$.
exponent == 0
The value is either a zero or a denormalized number, depending on the fraction bits:
fraction == 0
The value is a zero. There are two distinct zeros:
+0
when S==0
–0
when S==1.
These usually behave identically. In particular, the result is equal if +0 and –0
are compared as floating-point numbers. However, they yield different results in
some circumstances. For example, the sign of the infinity produced as the result
of dividing by zero depends on the sign of the zero. The two zeros can be
distinguished from each other by performing an integer comparison of the two
words.
fraction != 0
The value is a denormalized number and is equal to:
$(-1)^{S} \times 2^{-126} \times (0.\text{fraction})$
The minimum positive denormalized number is $2^{-149}$, or approximately $1.401 \times 10^{-45}$.
Denormalized numbers are flushed to zero in the Advanced SIMD extension. They are
optionally flushed to zero in the VFP extension. For details see Flush-to-zero on
page A2-39.


exponent == 0xFF
The value is either an infinity or a Not a Number (NaN), depending on the fraction bits:
fraction == 0
The value is an infinity. There are two distinct infinities:
+∞

When S==0. This represents all positive numbers that are too big to
be represented accurately as a normalized number.

-∞

When S==1. This represents all negative numbers with an absolute
value that is too big to be represented accurately as a normalized
number.

fraction != 0
The value is a NaN, and is either a quiet NaN or a signaling NaN.
In the VFP architecture, the two types of NaN are distinguished on the basis of
their most significant fraction bit, bit [22]:
bit [22] == 0
The NaN is a signaling NaN. The sign bit can take any value, and
the remaining fraction bits can take any value except all zeros.
bit [22] == 1
The NaN is a quiet NaN. The sign bit and remaining fraction bits
can take any value.
For details of the default NaN see NaN handling and the Default NaN on page A2-41.

Note
NaNs with different sign or fraction bits are distinct NaNs, but this does not mean you can use floating-point
comparison instructions to distinguish them. This is because the IEEE 754 standard specifies that a NaN
compares as unordered with everything, including itself. However, you can use integer comparisons to
distinguish different NaNs.
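The rules above can be checked mechanically on a raw 32-bit pattern. This C sketch classifies a single-precision word. It also evaluates zeros, denormalized numbers, and normalized numbers with the formulas given. It assumes a host whose double is IEEE 754 binary64, so every single-precision value is exactly representable. sp_classify, sp_value and pow2 are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

enum sp_class { SP_ZERO, SP_DENORMAL, SP_NORMAL, SP_INFINITY, SP_QNAN, SP_SNAN };

/* Exact power of two by repeated doubling or halving (avoids linking libm). */
static double pow2(int k)
{
    double r = 1.0;
    for (; k > 0; k--) r *= 2.0;
    for (; k < 0; k++) r *= 0.5;
    return r;
}

/* Classify a single-precision word by its exponent (bits [30:23]) and fraction. */
static enum sp_class sp_classify(uint32_t w)
{
    uint32_t exponent = (w >> 23) & 0xFFu;
    uint32_t fraction = w & 0x7FFFFFu;
    if (exponent == 0)
        return fraction == 0 ? SP_ZERO : SP_DENORMAL;
    if (exponent == 0xFF) {
        if (fraction == 0)
            return SP_INFINITY;
        return (fraction & (UINT32_C(1) << 22)) ? SP_QNAN : SP_SNAN;  /* bit [22] */
    }
    return SP_NORMAL;
}

/* Value of a zero, denormalized or normalized word, using the formulas above. */
static double sp_value(uint32_t w)
{
    enum sp_class c = sp_classify(w);
    assert(c == SP_ZERO || c == SP_DENORMAL || c == SP_NORMAL);
    double sign = (w >> 31) ? -1.0 : 1.0;
    int exponent = (int)((w >> 23) & 0xFFu);
    double fraction = (double)(w & 0x7FFFFFu) * pow2(-23);   /* 0.fraction */
    if (exponent == 0)
        return sign * pow2(-126) * fraction;                 /* zero or denormalized */
    return sign * pow2(exponent - 127) * (1.0 + fraction);   /* normalized */
}
```

Comparing 0x00000000 and 0x80000000 as integers tells the two zeros apart. By contrast, the values sp_value() returns for them compare equal as floating-point numbers.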


A2.7.3 VFP double-precision format
The double-precision floating-point format used by the VFP extension is as defined by the IEEE 754
standard.
This description includes VFP-specific details that are left open by the standard. It is only intended as an
introduction to the formats and to the values they can contain. For full details, especially of the handling of
infinities, NaNs and signed zeros, see the IEEE 754 standard.
A double-precision value consists of two 32-bit words, with the formats:
Most significant word: bit 31: S | bits [30:20]: exponent | bits [19:0]: fraction[51:32]
Least significant word: bits [31:0]: fraction[31:0]
When held in memory, the two words must appear consecutively and must both be word-aligned. The order
of the two words depends on the endianness of the memory system:
• In a little-endian memory system, the least significant word appears at the lower memory address and
the most significant word at the higher memory address.
• In a big-endian memory system, the most significant word appears at the lower memory address and
the least significant word at the higher memory address.
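To make the word order concrete, the following C sketch splits a double into its most and least significant words. It then places the two words at the lower and higher addresses for either endianness. It assumes the host double is IEEE 754 binary64 with the same byte order as uint64_t, which holds on common platforms. dp_words and dp_store_words are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Split a double into the most and least significant 32-bit words. */
static void dp_words(double d, uint32_t *ms_word, uint32_t *ls_word)
{
    uint64_t bits;
    memcpy(&bits, &d, sizeof bits);       /* reinterpret the IEEE 754 bit pattern */
    *ms_word = (uint32_t)(bits >> 32);    /* S, exponent, fraction[51:32] */
    *ls_word = (uint32_t)bits;            /* fraction[31:0] */
}

/* Lay the two words out as a memory system of the given endianness would:
   words[0] is the lower address, words[1] the higher address. */
static void dp_store_words(double d, int big_endian, uint32_t words[2])
{
    uint32_t ms, ls;
    dp_words(d, &ms, &ls);
    words[0] = big_endian ? ms : ls;
    words[1] = big_endian ? ls : ms;
}
```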

Double-precision values represent numbers, infinities and NaNs in a similar way to single-precision values,
with the interpretation of the format depending on the value of the exponent:
0 < exponent < 0x7FF
The value is a normalized number and is equal to:
$(-1)^{S} \times 2^{(\text{exponent} - 1023)} \times (1.\text{fraction})$
The minimum positive normalized number is $2^{-1022}$, or approximately $2.225 \times 10^{-308}$.
The maximum positive normalized number is $(2 - 2^{-52}) \times 2^{1023}$, or approximately
$1.798 \times 10^{308}$.
exponent == 0
The value is either a zero or a denormalized number, depending on the fraction bits:
fraction == 0
The value is a zero. There are two distinct zeros that behave analogously to the
two single-precision zeros:
+0
when S==0
–0
when S==1.


fraction != 0
The value is a denormalized number and is equal to:
$(-1)^{S} \times 2^{-1022} \times (0.\text{fraction})$
The minimum positive denormalized number is $2^{-1074}$, or approximately $4.941 \times 10^{-324}$.
Optionally, denormalized numbers are flushed to zero in the VFP extension. For details see
Flush-to-zero on page A2-39.
exponent == 0x7FF
The value is either an infinity or a NaN, depending on the fraction bits:
fraction == 0
The value is an infinity. As for single-precision, there are two infinities:
+∞
Plus infinity, when S==0
-∞
Minus infinity, when S==1.
fraction != 0
The value is a NaN, and is either a quiet NaN or a signaling NaN.
In the VFP architecture, the two types of NaN are distinguished on the basis of
their most significant fraction bit, bit [19] of the most significant word:
bit [19] == 0
The NaN is a signaling NaN. The sign bit can take any value, and
the remaining fraction bits can take any value except all zeros.
bit [19] == 1
The NaN is a quiet NaN. The sign bit and the remaining fraction bits
can take any value.
For details of the default NaN see NaN handling and the Default NaN on page A2-41.
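The classification for double precision uses only the most significant word to tell a quiet NaN from a signaling NaN, because the deciding bit is bit [19] of that word. The following C sketch works directly on the two words. dp_classify is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

enum dp_class { DP_ZERO, DP_DENORMAL, DP_NORMAL, DP_INFINITY, DP_QNAN, DP_SNAN };

/* Classify a double-precision value given as its two 32-bit words. */
static enum dp_class dp_classify(uint32_t ms_word, uint32_t ls_word)
{
    uint32_t exponent = (ms_word >> 20) & 0x7FFu;     /* bits [30:20] of MS word */
    uint32_t frac_hi = ms_word & 0xFFFFFu;            /* fraction[51:32] */
    int fraction_is_zero = (frac_hi == 0 && ls_word == 0);
    if (exponent == 0)
        return fraction_is_zero ? DP_ZERO : DP_DENORMAL;
    if (exponent == 0x7FF) {
        if (fraction_is_zero)
            return DP_INFINITY;
        /* Most significant fraction bit is bit [19] of the MS word. */
        return (ms_word & (UINT32_C(1) << 19)) ? DP_QNAN : DP_SNAN;
    }
    return DP_NORMAL;
}
```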


A2.7.4 Advanced SIMD and VFP half-precision formats
Two half-precision floating-point formats are used by the half-precision extensions to Advanced SIMD and
VFP:
• IEEE half-precision, as described in the revised IEEE 754 standard
• Alternative half-precision.
The description of IEEE half-precision includes ARM-specific details that are left open by the standard, and
is only an introduction to the formats and to the values they can contain. For more information, especially
on the handling of infinities, NaNs and signed zeros, see the IEEE 754 standard.
For both half-precision floating-point formats, the layout of the 16-bit number is the same. The format is:
Bit 15: S | bits [14:10]: exponent | bits [9:0]: fraction

The interpretation of the format depends on the value of the exponent field, bits[14:10] and on which
half-precision format is being used.
0 < exponent < 0x1F
The value is a normalized number and is equal to:
$(-1)^{S} \times 2^{(\text{exponent} - 15)} \times (1.\text{fraction})$
The minimum positive normalized number is $2^{-14}$, or approximately $6.104 \times 10^{-5}$.
The maximum positive normalized number is $(2 - 2^{-10}) \times 2^{15}$, or 65504.
Larger normalized numbers can be expressed using the alternative format when the
exponent == 0x1F.
exponent == 0
The value is either a zero or a denormalized number, depending on the fraction bits:
fraction == 0
The value is a zero. There are two distinct zeros:
+0
when S==0
–0
when S==1.
fraction != 0
The value is a denormalized number and is equal to:
$(-1)^{S} \times 2^{-14} \times (0.\text{fraction})$
The minimum positive denormalized number is $2^{-24}$, or approximately $5.960 \times 10^{-8}$.


exponent == 0x1F
The value depends on which half-precision format is being used:
IEEE Half-precision
The value is either an infinity or a Not a Number (NaN), depending on the
fraction bits:
fraction == 0
The value is an infinity. There are two distinct infinities:
+∞

When S==0. This represents all positive
numbers that are too big to be represented
accurately as a normalized number.

-∞

When S==1. This represents all negative
numbers with an absolute value that is too
big to be represented accurately as a
normalized number.

fraction != 0
The value is a NaN, and is either a quiet NaN or a signaling NaN.
The two types of NaN are distinguished by their most significant
fraction bit, bit [9]:
bit [9] == 0

The NaN is a signaling NaN. The sign bit
can take any value, and the remaining
fraction bits can take any value except all
zeros.

bit [9] == 1

The NaN is a quiet NaN. The sign bit and
remaining fraction bits can take any value.

Alternative Half-precision
The value is a normalized number and is equal to:
$(-1)^{S} \times 2^{16} \times (1.\text{fraction})$
The maximum positive normalized number is $(2 - 2^{-10}) \times 2^{16}$, or 131008.
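The two half-precision formats differ only when the exponent is 0x1F. The C sketch below decodes a 16-bit pattern in either format, with the ahp argument playing the role of FPSCR.AHP. half_value and pow2 are hypothetical names. IEEE NaN payloads are not preserved.

```c
#include <assert.h>
#include <math.h>
#include <stdint.h>

/* Exact power of two by repeated doubling or halving (avoids linking libm). */
static double pow2(int k)
{
    double r = 1.0;
    for (; k > 0; k--) r *= 2.0;
    for (; k < 0; k++) r *= 0.5;
    return r;
}

/* Decode a half-precision pattern. ahp selects the format as FPSCR.AHP does:
   0 = IEEE half-precision, 1 = alternative half-precision. */
static double half_value(uint16_t h, int ahp)
{
    double sign = (h >> 15) ? -1.0 : 1.0;
    int exponent = (h >> 10) & 0x1F;                   /* bits [14:10] */
    unsigned frac_bits = h & 0x3FFu;                   /* bits [9:0] */
    double fraction = (double)frac_bits * pow2(-10);   /* 0.fraction */
    if (exponent == 0)                                 /* zero or denormalized */
        return sign * pow2(-14) * fraction;
    if (exponent == 0x1F) {
        if (ahp)                                       /* alternative: still normalized */
            return sign * pow2(16) * (1.0 + fraction);
        if (frac_bits == 0)                            /* IEEE: infinity */
            return sign * INFINITY;
        return NAN;                                    /* IEEE: NaN, payload not kept */
    }
    return sign * pow2(exponent - 15) * (1.0 + fraction);  /* normalized */
}
```

For example, 0x7BFF decodes to 65504 in both formats. By contrast, 0x7FFF is a NaN in IEEE half-precision and 131008 in the alternative format.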

A2.7.5 Flush-to-zero
The performance of floating-point implementations can be significantly reduced when performing
calculations involving denormalized numbers and Underflow exceptions. In particular this occurs for
implementations that only handle normalized numbers and zeros in hardware, and invoke support code to
handle any other types of value. For an algorithm where a significant number of the operands and
intermediate results are denormalized numbers, this can result in a considerable loss of performance.
In many of these algorithms, this performance can be recovered, without significantly affecting the accuracy
of the final result, by replacing the denormalized operands and intermediate results with zeros. To permit
this optimization, VFP implementations have a special processing mode called Flush-to-zero mode.
Advanced SIMD implementations always use Flush-to-zero mode.


Behavior in Flush-to-zero mode differs from normal IEEE 754 arithmetic in the following ways:
• All inputs to floating-point operations that are double-precision denormalized numbers or
single-precision denormalized numbers are treated as though they were zero. This causes an Input
Denormal exception, but does not cause an Inexact exception. The Input Denormal exception occurs
only in Flush-to-zero mode.
The FPSCR contains a cumulative exception bit FPSCR.IDC and trap enable bit FPSCR.IDE
corresponding to the Input Denormal exception. For details of how these are used when processing
the exception see Advanced SIMD and VFP system registers on page A2-28.
The occurrence of all exceptions except Input Denormal is determined using the input values after
flush-to-zero processing has occurred.
• The result of a floating-point operation is flushed to zero if the result of the operation before rounding
satisfies the condition:
0 < Abs(result) < MinNorm, where:
— MinNorm == $2^{-126}$ for single-precision
— MinNorm == $2^{-1022}$ for double-precision.
This causes the FPSCR.UFC bit to be set to 1, and prevents any Inexact exception from occurring for
the operation.
Underflow exceptions occur only when a result is flushed to zero.
In a VFPv2 or VFPv3U implementation, Underflow exceptions that occur in Flush-to-zero mode are
always treated as untrapped, even when the Underflow trap enable bit, FPSCR.UFE, is set to 1.
• An Inexact exception does not occur if the result is flushed to zero, even though the final result of
zero is not equivalent to the value that would be produced if the operation were performed with
unbounded precision and exponent range.

For information on the FPSCR bits see Floating-point Status and Control Register (FPSCR) on page A2-28.
When an input or a result is flushed to zero the value of the sign bit of the zero is determined as follows:
• In VFPv3 or VFPv3U, it is preserved. That is, the sign bit of the zero matches the sign bit of the input
or result that is being flushed to zero.
• In VFPv2, it is IMPLEMENTATION DEFINED whether it is preserved or always positive. The same
choice must be made for all cases of flushing an input or result to zero.

Flush-to-zero mode has no effect on half-precision numbers that are inputs to floating-point operations, or
results from floating-point operations.
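The following C sketch applies the two flush-to-zero rules to single precision, following the VFPv3 choice of preserving the sign. The FPSCR is modeled as a plain uint32_t. ftz_result() assumes the exact unrounded result is available as a double. That holds, for instance, for the product of two single-precision values. The function names are hypothetical, and exception trapping is not modeled.

```c
#include <assert.h>
#include <float.h>
#include <stdint.h>

#define FPSCR_UFC (UINT32_C(1) << 3)
#define FPSCR_IDC (UINT32_C(1) << 7)

/* Input side: a denormalized single-precision input becomes a zero with the
   same sign (the VFPv3 rule) and sets the Input Denormal cumulative flag. */
static uint32_t ftz_input(uint32_t w, uint32_t *fpscr)
{
    uint32_t exponent = (w >> 23) & 0xFFu;
    uint32_t fraction = w & 0x7FFFFFu;
    if (exponent == 0 && fraction != 0) {
        *fpscr |= FPSCR_IDC;
        return w & UINT32_C(0x80000000);   /* keep only the sign: +0 or -0 */
    }
    return w;                              /* all other inputs are unchanged */
}

/* Result side: an exact result r with 0 < |r| < MinNorm (2^-126, FLT_MIN) is
   replaced by a zero of the same sign and sets UFC; no Inexact is raised. */
static double ftz_result(double r, uint32_t *fpscr)
{
    double magnitude = r < 0.0 ? -r : r;
    if (magnitude > 0.0 && magnitude < FLT_MIN) {
        *fpscr |= FPSCR_UFC;
        return r < 0.0 ? -0.0 : 0.0;
    }
    return r;
}
```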


Note
Flush-to-zero mode is incompatible with the IEEE 754 standard, and must not be used when IEEE 754
compatibility is a requirement. Flush-to-zero mode must be treated with care. Although it can lead to a major
performance increase on many algorithms, there are significant limitations on its use. These are application
dependent:
• On many algorithms, it has no noticeable effect, because the algorithm does not normally use
denormalized numbers.
• On other algorithms, it can cause exceptions to occur or seriously reduce the accuracy of the results
of the algorithm.

A2.7.6 NaN handling and the Default NaN
The IEEE 754 standard specifies that:
• an operation that produces an Invalid Operation floating-point exception generates a quiet NaN as its
result if that exception is untrapped
• an operation involving a quiet NaN operand, but not a signaling NaN operand, returns an input NaN
as its result.
The VFP behavior when Default NaN mode is disabled adheres to this with the following extra details,
where the first operand means the first argument to the pseudocode function call that describes the
operation:
• If an untrapped Invalid Operation floating-point exception is produced because one of the operands
is a signaling NaN, the quiet NaN result is equal to the signaling NaN with its most significant
fraction bit changed to 1. If both operands are signaling NaNs, the result is produced in this way from
the first operand.
• If an untrapped Invalid Operation floating-point exception is produced for other reasons, the quiet
NaN result is the Default NaN.
• If both operands are quiet NaNs, the result is the first operand.
The VFP behavior when Default NaN mode is enabled, and the Advanced SIMD behavior in all
circumstances, is that the Default NaN is the result of all floating-point operations that:
• generate untrapped Invalid Operation floating-point exceptions
• have one or more quiet NaN inputs.
Table A2-6 on page A2-42 shows the format of the default NaN for ARM floating-point processors.
Default NaN mode is selected for VFP by setting the FPSCR.DN bit to 1, see Floating-point Status and
Control Register (FPSCR) on page A2-28.


Other aspects of the functionality of the Invalid Operation exception are not affected by Default NaN mode.
These are that:
• If untrapped, it causes the FPSCR.IOC bit to be set to 1.
• If trapped, it causes a user trap handler to be invoked. This is only possible in VFPv2 and VFPv3U.
Table A2-6 Default NaN encoding
          | Half-precision, IEEE format     | Single-precision                 | Double-precision
Sign bit  | 0                               | 0 (a)                            | 0 (a)
Exponent  | 0x1F                            | 0xFF                             | 0x7FF
Fraction  | bit [9] == 1, bits [8:0] == 0   | bit [22] == 1, bits [21:0] == 0  | bit [51] == 1, bits [50:0] == 0
a. In VFPv2, the sign bit of the Default NaN is UNKNOWN.
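The NaN selection rules above can be written as a small C function for two single-precision operands. It covers only the NaN-operand cases. An Invalid Operation raised for other reasons, for example infinity minus infinity, also yields the Default NaN but is not modeled. The Default NaN constant follows Table A2-6 with sign bit 0. All names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SP_DEFAULT_NAN UINT32_C(0x7FC00000)   /* Table A2-6, sign bit 0 */
#define SP_QUIET_BIT   (UINT32_C(1) << 22)

static bool sp_is_nan(uint32_t w)
{
    return (w & UINT32_C(0x7F800000)) == UINT32_C(0x7F800000)   /* exponent 0xFF */
        && (w & UINT32_C(0x007FFFFF)) != 0;                      /* fraction != 0 */
}

static bool sp_is_snan(uint32_t w)
{
    return sp_is_nan(w) && (w & SP_QUIET_BIT) == 0;
}

/* NaN result of a two-operand operation. Returns false when neither operand is
   a NaN. *ioc reports an (untrapped) Invalid Operation caused by a signaling NaN. */
static bool sp_process_nans(uint32_t op1, uint32_t op2, bool default_nan_mode,
                            uint32_t *result, bool *ioc)
{
    *ioc = sp_is_snan(op1) || sp_is_snan(op2);
    if (!sp_is_nan(op1) && !sp_is_nan(op2))
        return false;
    if (default_nan_mode)
        *result = SP_DEFAULT_NAN;          /* DN == 1: always the Default NaN */
    else if (sp_is_snan(op1))
        *result = op1 | SP_QUIET_BIT;      /* first signaling NaN, made quiet */
    else if (sp_is_snan(op2))
        *result = op2 | SP_QUIET_BIT;
    else if (sp_is_nan(op1))
        *result = op1;                     /* otherwise the first quiet NaN */
    else
        *result = op2;
    return true;
}
```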

A2.7.7 Floating-point exceptions
The Advanced SIMD and VFP extensions record the following floating-point exceptions in the FPSCR
cumulative flags, see Floating-point Status and Control Register (FPSCR) on page A2-28:
IOC

Invalid Operation. The flag is set to 1 if the result of an operation has no mathematical value
or cannot be represented. Examples include infinity * 0 and +infinity + (–infinity).
These tests are made after flush-to-zero processing. For example, if flush-to-zero mode is
selected, multiplying a denormalized number and an infinity is treated as 0 * infinity and
causes an Invalid Operation floating-point exception.
IOC is also set on any floating-point operation with one or more signaling NaNs as
operands, except for negation and absolute value, as described in Negation and absolute
value on page A2-47.

DZC

Division by Zero. The flag is set to 1 if a divide operation has a zero divisor and a dividend
that is not zero, an infinity or a NaN. These tests are made after flush-to-zero processing, so
if flush-to-zero processing is selected, a denormalized dividend is treated as zero and
prevents Division by Zero from occurring, and a denormalized divisor is treated as zero and
causes Division by Zero to occur if the dividend is a normalized number.
For the reciprocal and reciprocal square root estimate functions the dividend is assumed to
be +1.0. This means that a zero or denormalized operand to these functions sets the DZC
flag.


OFC

Overflow. The flag is set to 1 if the absolute value of the result of an operation, produced
after rounding, is greater than the maximum positive normalized number for the destination
precision.

UFC

Underflow. The flag is set to 1 if the absolute value of the result of an operation, produced
before rounding, is less than the minimum positive normalized number for the destination
precision, and the rounded result is inexact.


The criteria for the Underflow exception to occur are different in Flush-to-zero mode. For
details, see Flush-to-zero on page A2-39.
IXC

Inexact. The flag is set to 1 if the result of an operation is not equivalent to the value that
would be produced if the operation were performed with unbounded precision and exponent
range.
The criteria for the Inexact exception to occur are different in Flush-to-zero mode. For
details, see Flush-to-zero on page A2-39.

IDC

Input Denormal. The flag is set to 1 if a denormalized input operand is replaced in the
computation by a zero, as described in Flush-to-zero on page A2-39.
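The IOC and DZC rules for division are applied after flush-to-zero processing. The following C sketch makes that explicit for single-precision operands. It reports only IOC, DZC and IDC. OFC, UFC and IXC depend on the computed result and are left out. The helper names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define FP_IOC (1u << 0)
#define FP_DZC (1u << 1)
#define FP_IDC (1u << 7)

static bool is_zero(uint32_t w)   { return (w & 0x7FFFFFFFu) == 0; }
static bool is_denorm(uint32_t w) { return (w & 0x7F800000u) == 0 && (w & 0x7FFFFFu) != 0; }
static bool is_inf(uint32_t w)    { return (w & 0x7FFFFFFFu) == 0x7F800000u; }
static bool is_nan(uint32_t w)    { return (w & 0x7F800000u) == 0x7F800000u && (w & 0x7FFFFFu) != 0; }
static bool is_snan(uint32_t w)   { return is_nan(w) && (w & 0x400000u) == 0; }

/* IOC, DZC and IDC flags for a single-precision division n / d. */
static unsigned div_flags(uint32_t n, uint32_t d, bool flush_to_zero)
{
    unsigned flags = 0;
    if (flush_to_zero) {                 /* flush first: the tests below see zeros */
        if (is_denorm(n)) { n &= 0x80000000u; flags |= FP_IDC; }
        if (is_denorm(d)) { d &= 0x80000000u; flags |= FP_IDC; }
    }
    if (is_snan(n) || is_snan(d))
        return flags | FP_IOC;           /* signaling NaN operand */
    if (is_nan(n) || is_nan(d))
        return flags;                    /* quiet NaN: no exception */
    if ((is_zero(n) && is_zero(d)) || (is_inf(n) && is_inf(d)))
        return flags | FP_IOC;           /* 0 / 0 or infinity / infinity */
    if (is_zero(d) && !is_inf(n))
        return flags | FP_DZC;           /* finite nonzero dividend, zero divisor */
    return flags;
}
```

For example, with flush-to-zero selected, a normalized dividend divided by a denormalized divisor sets DZC as well as IDC. A denormalized dividend with a zero divisor instead becomes 0 / 0 and sets IOC.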

With the Advanced SIMD extension and the VFPv3 extension these are non-trapping exceptions and the
data-processing instructions do not generate any trapped exceptions.
With the VFPv2 and VFPv3U extensions:
• These exceptions can be trapped, by setting trap enable flags in the FPSCR, see VFPv3U on
page A2-31. Trapped floating-point exceptions are delivered to user code in an IMPLEMENTATION
DEFINED fashion.
• The definitions of the floating-point exceptions change as follows:
— if the Underflow exception is trapped, it occurs if the absolute value of the result of an
operation, produced before rounding, is less than the minimum positive normalized number
for the destination precision, regardless of whether the rounded result is inexact
— higher priority trapped exceptions can prevent lower priority exceptions from occurring, as
described in Combinations of exceptions on page A2-44.

Table A2-7 shows the default results of the floating-point exceptions:
Table A2-7 Floating-point exception default results
Exception type          | Default result for positive sign | Default result for negative sign
IOC, Invalid Operation  | Quiet NaN                        | Quiet NaN
DZC, Division by Zero   | +∞ (plus infinity)               | –∞ (minus infinity)
OFC, Overflow           | RN, RP: +∞ (plus infinity)       | RN, RM: –∞ (minus infinity)
                        | RM, RZ: +MaxNorm                 | RP, RZ: –MaxNorm
UFC, Underflow          | Normal rounded result            | Normal rounded result
IXC, Inexact            | Normal rounded result            | Normal rounded result
IDC, Input Denormal     | Normal rounded result            | Normal rounded result


In Table A2-7 on page A2-43:
MaxNorm  The maximum normalized number of the destination precision
RM       Round towards Minus Infinity mode, as defined in the IEEE 754 standard
RN       Round to Nearest mode, as defined in the IEEE 754 standard
RP       Round towards Plus Infinity mode, as defined in the IEEE 754 standard
RZ       Round towards Zero mode, as defined in the IEEE 754 standard
• For Invalid Operation exceptions, for details of which quiet NaN is produced as the default result see
NaN handling and the Default NaN on page A2-41.
• For Division by Zero exceptions, the sign bit of the default result is determined normally for a
division. This means it is the exclusive OR of the sign bits of the two operands.
• For Overflow exceptions, the sign bit of the default result is determined normally for the overflowing
operation.
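The Overflow and Division by Zero rows of Table A2-7 translate directly into code. This C sketch returns the single-precision default results, using FLT_MAX from <float.h> as MaxNorm. The enum and function names are hypothetical.

```c
#include <assert.h>
#include <float.h>
#include <math.h>
#include <stdbool.h>

enum rmode { RN = 0, RP = 1, RM = 2, RZ = 3 };   /* FPSCR.RMode encodings */

/* Overflow row: infinity for RN and for the directed mode that rounds towards
   the result's infinity (RP when positive, RM when negative); otherwise
   MaxNorm with the sign of the result. */
static float overflow_default(bool negative, enum rmode mode)
{
    bool to_infinity = negative ? (mode == RN || mode == RM)
                                : (mode == RN || mode == RP);
    float magnitude = to_infinity ? INFINITY : FLT_MAX;   /* FLT_MAX is MaxNorm */
    return negative ? -magnitude : magnitude;
}

/* Division by Zero row: an infinity whose sign is the XOR of the operand signs. */
static float div_by_zero_default(bool dividend_negative, bool divisor_negative)
{
    return (dividend_negative != divisor_negative) ? -INFINITY : INFINITY;
}
```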

Combinations of exceptions
The following pseudocode functions perform floating-point operations:
FixedToFP()
FPAbs()
FPAdd()
FPCompare()
FPCompareGE()
FPCompareGT()
FPDiv()
FPDoubleToSingle()
FPMax()
FPMin()
FPMul()
FPNeg()
FPRecipEstimate()
FPRecipStep()
FPRSqrtEstimate()
FPRSqrtStep()
FPSingleToDouble()
FPSqrt()
FPSub()
FPToFixed()

All of these operations except FPAbs() and FPNeg() can generate floating-point exceptions.
More than one exception can occur on the same operation. The only combinations of exceptions that can
occur are:
• Overflow with Inexact
• Underflow with Inexact
• Input Denormal with other exceptions.

When none of the exceptions caused by an operation are trapped, any exception that occurs causes the
associated cumulative flag in the FPSCR to be set.
When one or more exceptions caused by an operation are trapped, the behavior of the instruction depends
on the priority of the exceptions. The Inexact exception is treated as lowest priority, and Input Denormal as
highest priority:
• If the higher priority exception is trapped, its trap handler is called. It is IMPLEMENTATION DEFINED
whether the parameters to the trap handler include information about the lower priority exception.
Apart from this, the lower priority exception is ignored in this case.
• If the higher priority exception is untrapped, its cumulative bit is set to 1 and its default result is
evaluated. Then the lower priority exception is handled normally, using this default result.
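The priority rules can be sketched in C as a walk over the raised exceptions, from Input Denormal down to Inexact. The relative order of IOC, DZC, OFC and UFC is an arbitrary choice here, because the text allows at most one of them per operation. The sketch tracks only flags. It does not model evaluating the default result for the next exception, or the IMPLEMENTATION DEFINED trap handler parameters. The names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Cumulative-flag bit numbers, reused as exception identifiers. Each trap
   enable bit sits 8 positions higher (IOE = bit 8 ... IDE = bit 15). */
enum { EXC_IOC = 0, EXC_DZC = 1, EXC_OFC = 2, EXC_UFC = 3, EXC_IXC = 4, EXC_IDC = 7 };

/* 'raised' is a mask of cumulative-flag bits for one operation. Returns the
   exception whose trap handler would be called, or -1 if none is trapped. */
static int handle_exceptions(uint32_t *fpscr, uint32_t raised)
{
    static const int order[] = { EXC_IDC, EXC_IOC, EXC_DZC, EXC_OFC, EXC_UFC, EXC_IXC };
    for (unsigned i = 0; i < sizeof order / sizeof order[0]; i++) {
        int e = order[i];
        if (!(raised & (UINT32_C(1) << e)))
            continue;
        if (*fpscr & (UINT32_C(1) << (e + 8)))
            return e;                    /* trapped: lower-priority ones ignored */
        *fpscr |= UINT32_C(1) << e;      /* untrapped: set flag, continue */
    }
    return -1;
}
```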

Some floating-point instructions specify more than one floating-point operation, as indicated by the
pseudocode descriptions of the instruction. In such cases, an exception on one operation is treated as higher
priority than an exception on another operation if the occurrence of the second exception depends on the
result of the first operation. Otherwise, it is UNPREDICTABLE which exception is treated as higher priority.
For example, a VMLA.F32 instruction specifies a floating-point multiplication followed by a floating-point
addition. The addition can generate Overflow, Underflow and Inexact exceptions, all of which depend on
both operands to the addition and so are treated as lower priority than any exception on the multiplication.
The same applies to Invalid Operation exceptions on the addition caused by adding opposite-signed
infinities.
The addition can also generate an Input Denormal exception, caused by the addend being a denormalized
number while in Flush-to-zero mode. It is UNPREDICTABLE which of an Input Denormal exception on the
addition and an exception on the multiplication is treated as higher priority, because the occurrence of the
Input Denormal exception does not depend on the result of the multiplication. The same applies to an Invalid
Operation exception on the addition caused by the addend being a signaling NaN.

Note
Like other details of VFP instruction execution, these rules about exception handling apply to the overall
results produced by an instruction when the system uses a combination of hardware and support code to
implement it. See VFP support code on page B1-70 for more information.
These principles also apply to the multiple floating-point operations generated by VFP instructions in the
deprecated VFP vector mode of operation. For details of this mode of operation see Appendix F VFP Vector
Operation Support.

A2.7.8 Pseudocode details of floating-point operations
This section contains pseudocode definitions of the floating-point operations used by the architecture.

Generation of specific floating-point values
The following pseudocode functions generate specific floating-point values. The sign argument of
FPInfinity(), FPMaxNormal(), and FPZero() is '0' for the positive version and '1' for the negative version.
// FPZero()
// ========

bits(N) FPZero(bit sign, integer N)
    assert N == 16 || N == 32 || N == 64;
    if N == 16 then
        return sign : '00000 0000000000';
    elsif N == 32 then
        return sign : '00000000 00000000000000000000000';
    else
        return sign : '00000000000 0000000000000000000000000000000000000000000000000000';

// FPTwo()
// =======

bits(N) FPTwo(integer N)
    assert N == 32 || N == 64;
    if N == 32 then
        return '0 10000000 00000000000000000000000';
    else
        return '0 10000000000 0000000000000000000000000000000000000000000000000000';

// FPThree()
// =========

bits(N) FPThree(integer N)
    assert N == 32 || N == 64;
    if N == 32 then
        return '0 10000000 10000000000000000000000';
    else
        return '0 10000000000 1000000000000000000000000000000000000000000000000000';

// FPMaxNormal()
// =============

bits(N) FPMaxNormal(bit sign, integer N)
    assert N == 16 || N == 32 || N == 64;
    if N == 16 then
        return sign : '11110 1111111111';
    elsif N == 32 then
        return sign : '11111110 11111111111111111111111';
    else
        return sign : '11111111110 1111111111111111111111111111111111111111111111111111';


// FPInfinity()
// ============

bits(N) FPInfinity(bit sign, integer N)
    assert N == 16 || N == 32 || N == 64;
    if N == 16 then
        return sign : '11111 0000000000';
    elsif N == 32 then
        return sign : '11111111 00000000000000000000000';
    else
        return sign : '11111111111 0000000000000000000000000000000000000000000000000000';

// FPDefaultNaN()
// ==============

bits(N) FPDefaultNaN(integer N)
    assert N == 16 || N == 32 || N == 64;
    if N == 16 then
        return '0 11111 1000000000';
    elsif N == 32 then
        return '0 11111111 10000000000000000000000';
    else
        return '0 11111111111 1000000000000000000000000000000000000000000000000000';
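For single precision (N == 32) each of these functions returns a fixed bit pattern. A C sketch that builds them from the sign : exponent : fraction layout follows; the function names are hypothetical, and the default NaN shown is the VFPv3 form with sign bit 0.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Single-precision layout: sign<31> : exponent<30:23> : fraction<22:0>. */
static uint32_t fp32_pack(uint32_t sign, uint32_t exp, uint32_t frac)
{
    assert(sign <= 1u && exp <= 0xFFu && frac <= 0x7FFFFFu);
    return (sign << 31) | (exp << 23) | frac;
}

uint32_t fp32_zero(uint32_t sign)       { return fp32_pack(sign, 0x00u, 0u); }
uint32_t fp32_two(void)                 { return fp32_pack(0u, 0x80u, 0u); }
uint32_t fp32_three(void)               { return fp32_pack(0u, 0x80u, 0x400000u); }
uint32_t fp32_max_normal(uint32_t sign) { return fp32_pack(sign, 0xFEu, 0x7FFFFFu); }
uint32_t fp32_infinity(uint32_t sign)   { return fp32_pack(sign, 0xFFu, 0u); }

/* VFPv3 default NaN: sign 0, all-ones exponent, only fraction<22> set. */
uint32_t fp32_default_nan(void)         { return fp32_pack(0u, 0xFFu, 0x400000u); }

/* Reinterpret a float's encoding without undefined behaviour. */
uint32_t fp32_bits(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}
```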

Note
This definition of FPDefaultNaN() applies to VFPv3 and VFPv3U. For VFPv2, the sign bit of the result is a
single-bit UNKNOWN value, instead of 0.

Negation and absolute value
The floating-point negation and absolute value operations only affect the sign bit. They do not treat NaN
operands specially, nor denormalized number operands when flush-to-zero is selected.
// FPNeg()
// =======

bits(N) FPNeg(bits(N) operand)
    assert N == 32 || N == 64;
    return NOT(operand<N-1>) : operand<N-2:0>;

// FPAbs()
// =======

bits(N) FPAbs(bits(N) operand)
    assert N == 32 || N == 64;
    return '0' : operand<N-2:0>;
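Because FPNeg() and FPAbs() only touch the sign bit, for N == 32 they reduce to single bitwise operations on the raw encoding. A minimal C sketch, with hypothetical function names:

```c
#include <assert.h>
#include <stdint.h>

/* FPNeg() for N == 32: NOT(operand<31>) : operand<30:0>.
   NaN payloads (including signaling NaNs) and denormals pass through
   unchanged, regardless of flush-to-zero mode. */
uint32_t fp32_neg(uint32_t operand)
{
    return operand ^ 0x80000000u;
}

/* FPAbs() for N == 32: '0' : operand<30:0>. */
uint32_t fp32_abs(uint32_t operand)
{
    return operand & 0x7FFFFFFFu;
}
```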

Floating-point value unpacking
The FPUnpack() function determines the type and numerical value of a floating-point number. It also does
flush-to-zero processing on input operands.
enumeration FPType {FPType_Nonzero, FPType_Zero, FPType_Infinity, FPType_QNaN, FPType_SNaN};

// FPUnpack()
// ==========
//
// Unpack a floating-point number into its type, sign bit and the real number
// that it represents. The real number result has the correct sign for numbers
// and infinities, is very large in magnitude for infinities, and is 0.0 for
// NaNs. (These values are chosen to simplify the description of comparisons
// and conversions.)
//
// The 'fpscr_val' argument supplies FPSCR control bits. Status information is
// updated directly in the FPSCR where appropriate.

(FPType, bit, real) FPUnpack(bits(N) fpval, bits(32) fpscr_val)
    assert N == 16 || N == 32 || N == 64;
    if N == 16 then
        sign = fpval<15>;
        exp = fpval<14:10>;
        frac = fpval<9:0>;
        if IsZero(exp) then
            // Produce zero if value is zero
            if IsZero(frac) then
                type = FPType_Zero; value = 0.0;
            else
                type = FPType_Nonzero; value = 2^-14 * (UInt(frac) * 2^-10);
        elsif IsOnes(exp) && fpscr_val<26> == '0' then // Infinity or NaN in IEEE format
            if IsZero(frac) then
                type = FPType_Infinity; value = 2^1000000;
            else
                type = if frac<9> == '1' then FPType_QNaN else FPType_SNaN;
                value = 0.0;
        else
            type = FPType_Nonzero; value = 2^(UInt(exp)-15) * (1.0 + UInt(frac) * 2^-10);
    elsif N == 32 then
        sign = fpval<31>;
        exp = fpval<30:23>;
        frac = fpval<22:0>;
        if IsZero(exp) then
            // Produce zero if value is zero or flush-to-zero is selected.
            if IsZero(frac) || fpscr_val<24> == '1' then
                type = FPType_Zero; value = 0.0;
                if !IsZero(frac) then // Denormalized input flushed to zero
                    FPProcessException(FPExc_InputDenorm, fpscr_val);
            else
                type = FPType_Nonzero; value = 2^-126 * (UInt(frac) * 2^-23);
        elsif IsOnes(exp) then
            if IsZero(frac) then
                type = FPType_Infinity; value = 2^1000000;
            else
                type = if frac<22> == '1' then FPType_QNaN else FPType_SNaN;
                value = 0.0;
        else
            type = FPType_Nonzero; value = 2^(UInt(exp)-127) * (1.0 + UInt(frac) * 2^-23);
    else // N == 64
        sign = fpval<63>;
        exp = fpval<62:52>;
        frac = fpval<51:0>;
        if IsZero(exp) then
            // Produce zero if value is zero or flush-to-zero is selected.
            if IsZero(frac) || fpscr_val<24> == '1' then
                type = FPType_Zero; value = 0.0;
                if !IsZero(frac) then // Denormalized input flushed to zero
                    FPProcessException(FPExc_InputDenorm, fpscr_val);
            else
                type = FPType_Nonzero; value = 2^-1022 * (UInt(frac) * 2^-52);
        elsif IsOnes(exp) then
            if IsZero(frac) then
                type = FPType_Infinity; value = 2^1000000;
            else
                type = if frac<51> == '1' then FPType_QNaN else FPType_SNaN;
                value = 0.0;
        else
            type = FPType_Nonzero; value = 2^(UInt(exp)-1023) * (1.0 + UInt(frac) * 2^-52);
    if sign == '1' then value = -value;
    return (type, sign, value);
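As an illustration, the N == 32 branch of FPUnpack() can be written in C. This is a sketch under stated assumptions: the names are hypothetical, `fz` stands for fpscr_val<24>, HUGE_VAL stands in for the pseudocode's 2^1000000, and instead of calling FPProcessException() the sketch reports a flushed denormal through the `input_denorm` field.

```c
#include <assert.h>
#include <math.h>
#include <stdint.h>

typedef enum { FPType_Nonzero, FPType_Zero, FPType_Infinity,
               FPType_QNaN, FPType_SNaN } FPType;

typedef struct {
    FPType type;
    unsigned sign;
    double value;       /* HUGE_VAL stands in for 2^1000000 */
    int input_denorm;   /* 1 if a denormal was flushed (FPExc_InputDenorm) */
} fp32_unpacked;

/* Exact power of two without libm. */
static double pow2i(int e)
{
    double r = 1.0;
    for (; e > 0; e--) r *= 2.0;
    for (; e < 0; e++) r *= 0.5;
    return r;
}

/* Sketch of FPUnpack() for N == 32; 'fz' plays the role of fpscr_val<24>. */
fp32_unpacked fp32_unpack(uint32_t fpval, int fz)
{
    fp32_unpacked u = { FPType_Nonzero, fpval >> 31, 0.0, 0 };
    uint32_t exp  = (fpval >> 23) & 0xFFu;
    uint32_t frac = fpval & 0x7FFFFFu;

    if (exp == 0) {
        if (frac == 0 || fz) {               /* zero, or denormal flushed to zero */
            u.type = FPType_Zero;
            u.input_denorm = (frac != 0);
        } else {                             /* 2^-126 * (frac * 2^-23) */
            u.value = pow2i(-149) * (double)frac;
        }
    } else if (exp == 0xFFu) {
        if (frac == 0) {
            u.type = FPType_Infinity;
            u.value = HUGE_VAL;
        } else {                             /* fraction<22> selects QNaN or SNaN */
            u.type = (frac & 0x400000u) ? FPType_QNaN : FPType_SNaN;
        }
    } else {                                 /* 2^(exp-127) * (1 + frac * 2^-23) */
        u.value = pow2i((int)exp - 127) * (1.0 + (double)frac * pow2i(-23));
    }
    if (u.sign)
        u.value = -u.value;
    return u;
}
```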

Floating-point exception and NaN handling
The FPProcessException() procedure checks whether a floating-point exception is trapped, and handles it
accordingly:
enumeration FPExc {FPExc_InvalidOp, FPExc_DivideByZero, FPExc_Overflow,
                   FPExc_Underflow, FPExc_Inexact, FPExc_InputDenorm};

// FPProcessException()
// ====================
//
// The 'fpscr_val' argument supplies FPSCR control bits. Status information is
// updated directly in the FPSCR where appropriate.

FPProcessException(FPExc exception, bits(32) fpscr_val)
    // Get appropriate FPSCR bit numbers
    case exception of
        when FPExc_InvalidOp    enable = 8;  cumul = 0;
        when FPExc_DivideByZero enable = 9;  cumul = 1;
        when FPExc_Overflow     enable = 10; cumul = 2;
        when FPExc_Underflow    enable = 11; cumul = 3;
        when FPExc_Inexact      enable = 12; cumul = 4;
        when FPExc_InputDenorm  enable = 15; cumul = 7;
    if fpscr_val<enable> == '1' then
        IMPLEMENTATION_DEFINED floating-point trap handling;
    else
        FPSCR<cumul> = '1';
    return;
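The bit numbers in FPProcessException() can be captured in two lookup tables. The C sketch below models only the untrapped path; trap handling is IMPLEMENTATION DEFINED, so it just reports whether the trap is enabled. The function names are hypothetical, and unlike the pseudocode a single FPSCR word carries both the enable and the cumulative bits.

```c
#include <assert.h>
#include <stdint.h>

typedef enum { FPExc_InvalidOp, FPExc_DivideByZero, FPExc_Overflow,
               FPExc_Underflow, FPExc_Inexact, FPExc_InputDenorm } FPExc;

/* Bit numbers from the case statement in FPProcessException(). */
static const int enable_bit[] = { 8, 9, 10, 11, 12, 15 };
static const int cumul_bit[]  = { 0, 1, 2, 3, 4, 7 };

/* Nonzero if the FPSCR enables the trap for this exception. */
int fp_exception_trapped(FPExc exception, uint32_t fpscr)
{
    assert((unsigned)exception <= (unsigned)FPExc_InputDenorm);
    return (int)((fpscr >> enable_bit[exception]) & 1u);
}

/* Untrapped case: return the FPSCR with the cumulative flag set.
   The trapped case is not modelled, so the FPSCR is returned unchanged. */
uint32_t fp_record_exception(FPExc exception, uint32_t fpscr)
{
    if (fp_exception_trapped(exception, fpscr))
        return fpscr;
    return fpscr | (1u << cumul_bit[exception]);
}
```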

The FPProcessNaN() function processes a NaN operand, producing the correct result value and generating an
Invalid Operation exception if necessary:
// FPProcessNaN()
// ==============
//
// The 'fpscr_val' argument supplies FPSCR control bits. Status information is
// updated directly in the FPSCR where appropriate.

bits(N) FPProcessNaN(FPType type, bits(N) operand, bits(32) fpscr_val)
    assert N == 32 || N == 64;
    topfrac = if N == 32 then 22 else 51;
    result = operand;
    if type == FPType_SNaN then
        result<topfrac> = '1';
        FPProcessException(FPExc_InvalidOp, fpscr_val);
    if fpscr_val<25> == '1' then // DefaultNaN requested
        result = FPDefaultNaN(N);
    return result;

The FPProcessNaNs() function performs the standard NaN processing for a two-operand operation:
// FPProcessNaNs()
// ===============
//
// The boolean part of the return value says whether a NaN has been found and
// processed. The bits(N) part is only relevant if it has, and then supplies the
// result of the operation.
//
// The 'fpscr_val' argument supplies FPSCR control bits. Status information is
// updated directly in the FPSCR where appropriate.

(boolean, bits(N)) FPProcessNaNs(FPType type1, FPType type2,
                                 bits(N) op1, bits(N) op2,
                                 bits(32) fpscr_val)
    assert N == 32 || N == 64;
    if type1 == FPType_SNaN then
        done = TRUE;  result = FPProcessNaN(type1, op1, fpscr_val);
    elsif type2 == FPType_SNaN then
        done = TRUE;  result = FPProcessNaN(type2, op2, fpscr_val);
    elsif type1 == FPType_QNaN then
        done = TRUE;  result = FPProcessNaN(type1, op1, fpscr_val);
    elsif type2 == FPType_QNaN then
        done = TRUE;  result = FPProcessNaN(type2, op2, fpscr_val);
    else
        done = FALSE; result = Zeros(N); // 'Don't care' result
    return (done, result);
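For N == 32, FPProcessNaNs() and FPProcessNaN() together amount to a priority choice followed by quieting. A C sketch on raw bit patterns, with hypothetical names; `dn` stands for fpscr_val<25>, and the Invalid Operation exception is reported through an optional out-parameter rather than through FPProcessException().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FP32_QUIET_BIT   0x00400000u   /* fraction<22>: topfrac for N == 32 */
#define FP32_DEFAULT_NAN 0x7FC00000u

static int fp32_is_nan(uint32_t x)
{
    return (x & 0x7F800000u) == 0x7F800000u && (x & 0x007FFFFFu) != 0;
}

static int fp32_is_snan(uint32_t x)
{
    return fp32_is_nan(x) && (x & FP32_QUIET_BIT) == 0;
}

/* 'done' in FPProcessNaNs(): true if either operand is a NaN. */
int fp32_any_nan(uint32_t op1, uint32_t op2)
{
    return fp32_is_nan(op1) || fp32_is_nan(op2);
}

/* Result of FPProcessNaNs() for N == 32; only valid when fp32_any_nan().
   *invalid_op (may be NULL) is set to 1 if an SNaN raised Invalid Operation. */
uint32_t fp32_process_nans(uint32_t op1, uint32_t op2, int dn, int *invalid_op)
{
    uint32_t chosen;

    assert(fp32_any_nan(op1, op2));
    /* Priority: SNaN in op1, SNaN in op2, QNaN in op1, QNaN in op2. */
    if (fp32_is_snan(op1))      chosen = op1;
    else if (fp32_is_snan(op2)) chosen = op2;
    else if (fp32_is_nan(op1))  chosen = op1;
    else                        chosen = op2;

    if (invalid_op != NULL)
        *invalid_op = fp32_is_snan(chosen);
    if (fp32_is_snan(chosen))
        chosen |= FP32_QUIET_BIT;      /* FPProcessNaN(): result<topfrac> = '1' */
    if (dn)
        chosen = FP32_DEFAULT_NAN;     /* DefaultNaN requested */
    return chosen;
}
```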

Floating-point rounding
The FPRound() function rounds and encodes a floating-point result value to a specified destination format.
This includes processing Overflow, Underflow and Inexact floating-point exceptions and performing
flush-to-zero processing on result values.
// FPRound()
// =========
//
// The 'fpscr_val' argument supplies FPSCR control bits. Status information is
// updated directly in the FPSCR where appropriate.

bits(N) FPRound(real result, integer N, bits(32) fpscr_val)
    assert N == 16 || N == 32 || N == 64;
    assert result != 0.0;
    // Obtain format parameters - minimum exponent, numbers of exponent and fraction bits.
    if N == 16 then
        minimum_exp = -14;   E = 5;  F = 10;
    elsif N == 32 then
        minimum_exp = -126;  E = 8;  F = 23;
    else // N == 64
        minimum_exp = -1022; E = 11; F = 52;
    // Split value into sign, unrounded mantissa and exponent.
    if result < 0.0 then
        sign = '1'; mantissa = -result;
    else
        sign = '0'; mantissa = result;
    exponent = 0;
    while mantissa < 1.0 do
        mantissa = mantissa * 2.0;  exponent = exponent - 1;
    while mantissa >= 2.0 do
        mantissa = mantissa / 2.0;  exponent = exponent + 1;
    // Deal with flush-to-zero.
    if fpscr_val<24> == '1' && N != 16 && exponent < minimum_exp then
        result = FPZero(sign, N);
        FPSCR.UFC = '1'; // Flush-to-zero never generates a trapped exception
    else
        // Start creating the exponent value for the result. Start by biasing the actual exponent
        // so that the minimum exponent becomes 1, lower values 0 (indicating possible underflow).
        biased_exp = Max(exponent - minimum_exp + 1, 0);
        if biased_exp == 0 then mantissa = mantissa / 2^(minimum_exp - exponent);
        // Get the unrounded mantissa as an integer, and the "units in last place" rounding error.
        int_mant = RoundDown(mantissa * 2^F); // < 2^F if biased_exp == 0, >= 2^F if not
        error = mantissa * 2^F - int_mant;
        // Underflow occurs if exponent is too small before rounding, and result is inexact or
        // the Underflow exception is trapped.
        if biased_exp == 0 && (error != 0.0 || fpscr_val<11> == '1') then
            FPProcessException(FPExc_Underflow, fpscr_val);
        // Round result according to rounding mode.
        case fpscr_val<23:22> of
            when '00' // Round to Nearest (rounding to even if exactly halfway)
                round_up = (error > 0.5 || (error == 0.5 && int_mant<0> == '1'));
                overflow_to_inf = TRUE;
            when '01' // Round towards Plus Infinity
                round_up = (error != 0.0 && sign == '0');
                overflow_to_inf = (sign == '0');
            when '10' // Round towards Minus Infinity
                round_up = (error != 0.0 && sign == '1');
                overflow_to_inf = (sign == '1');
            when '11' // Round towards Zero
                round_up = FALSE;
                overflow_to_inf = FALSE;
        if round_up then
            int_mant = int_mant + 1;
            if int_mant == 2^F then      // Rounded up from denormalized to normalized
                biased_exp = 1;
            if int_mant == 2^(F+1) then  // Rounded up to next exponent
                biased_exp = biased_exp + 1; int_mant = int_mant DIV 2;
        // Deal with overflow and generate result.
        if N != 16 || fpscr_val<26> == '0' then // Single, double or IEEE half precision
            if biased_exp >= 2^E - 1 then
                result = if overflow_to_inf then FPInfinity(sign, N) else FPMaxNormal(sign, N);
                FPProcessException(FPExc_Overflow, fpscr_val);
            else
                result = sign : biased_exp : int_mant;
        else
            // Alternative half precision
            if biased_exp >= 2^E then
                result = sign : Ones(15);
                FPProcessException(FPExc_InvalidOp, fpscr_val);
                error = 0.0; // avoid an Inexact exception
            else
                result = sign : biased_exp : int_mant;
        // Deal with Inexact exception.
        if error != 0 then
            FPProcessException(FPExc_Inexact, fpscr_val);
    return result;
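The following C sketch follows the same steps as FPRound() for N == 32, restricted to Round to Nearest with flush-to-zero disabled and no traps enabled. It takes a finite, nonzero `double` in place of the pseudocode's real value, so its input has already been rounded once to double precision; `fp32_round_nearest()`, `fp32_rounded` and the `FP_*` flag bits are hypothetical names. As in the pseudocode, Inexact is reported only when `error` is nonzero.

```c
#include <assert.h>
#include <stdint.h>

enum { FP_UNDERFLOW = 1, FP_OVERFLOW = 2, FP_INEXACT = 4 };  /* hypothetical flag bits */

typedef struct { uint32_t bits; unsigned flags; } fp32_rounded;

fp32_rounded fp32_round_nearest(double result)
{
    const int minimum_exp = -126, E = 8, F = 23;
    fp32_rounded out = { 0u, 0u };
    uint32_t sign = 0u;
    double mantissa = result;
    int exponent = 0;

    assert(result != 0.0 && result - result == 0.0);   /* nonzero and finite */
    if (result < 0.0) { sign = 1u; mantissa = -result; }

    /* Split into 1.0 <= mantissa < 2.0 and an unbiased exponent. */
    while (mantissa < 1.0)  { mantissa *= 2.0; exponent--; }
    while (mantissa >= 2.0) { mantissa /= 2.0; exponent++; }

    /* Bias so minimum_exp maps to 1; 0 indicates a possible underflow. */
    int biased_exp = exponent - minimum_exp + 1;
    if (biased_exp < 0) biased_exp = 0;
    if (biased_exp == 0)
        for (int i = exponent; i < minimum_exp; i++) mantissa /= 2.0;

    /* Integer mantissa and the rounding error in units in the last place. */
    double scaled = mantissa * (double)(1u << F);
    uint32_t int_mant = (uint32_t)scaled;       /* truncation == RoundDown() here */
    double error = scaled - (double)int_mant;   /* exact in double arithmetic */

    if (biased_exp == 0 && error != 0.0) out.flags |= FP_UNDERFLOW;

    /* Round to Nearest, ties to even. */
    if (error > 0.5 || (error == 0.5 && (int_mant & 1u))) {
        int_mant++;
        if (int_mant == (1u << F)) biased_exp = 1;          /* denormal became normal */
        if (int_mant == (1u << (F + 1))) { biased_exp++; int_mant >>= 1; }
    }

    if (biased_exp >= (1 << E) - 1) {
        out.bits = (sign << 31) | 0x7F800000u;   /* overflow_to_inf is TRUE in this mode */
        out.flags |= FP_OVERFLOW;
    } else {
        /* Keep the low F bits of int_mant; the units bit is implied by biased_exp. */
        out.bits = (sign << 31) | ((uint32_t)biased_exp << F) | (int_mant & ((1u << F) - 1u));
    }
    if (error != 0.0) out.flags |= FP_INEXACT;
    return out;
}
```

For example, an input of exactly 1.0 + 2^-24 lies halfway between 1.0 and the next single-precision value, and the tie resolves to the even mantissa, giving 0x3F800000.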

Selection of ARM standard floating-point arithmetic
StandardFPSCRValue is an FPSCR value that selects ARM standard floating-point arithmetic. Most of the
arithmetic functions have a boolean fpscr_controlled argument that is TRUE for VFP operations and FALSE
for Advanced SIMD operations, and that selects between using the real FPSCR value and this value.

// StandardFPSCRValue()
// ====================

bits(32) StandardFPSCRValue()
    return '00000' : FPSCR<26> : '11000000000000000000000000';
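StandardFPSCRValue() keeps only bit 26 of the current FPSCR and forces Default NaN (bit 25) and flush-to-zero (bit 24), with bits 23:22 = '00' (Round to Nearest) and every trap-enable bit clear. A hypothetical C helper computing the same word:

```c
#include <assert.h>
#include <stdint.h>

uint32_t standard_fpscr_value(uint32_t fpscr)
{
    const uint32_t AHP = 1u << 26;   /* copied: alternative half-precision control */
    const uint32_t DN  = 1u << 25;   /* forced: Default NaN mode */
    const uint32_t FZ  = 1u << 24;   /* forced: flush-to-zero mode */
    return (fpscr & AHP) | DN | FZ;
}
```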

Comparisons
The FPCompare() function compares two floating-point numbers, producing an (N,Z,C,V) flags result as
shown in Table A2-8:
Table A2-8 VFP comparison flag values

Comparison result    N  Z  C  V
Equal                0  1  1  0
Less than            1  0  0  0
Greater than         0  0  1  0
Unordered            0  0  1  1

This result is used to define the VCMP instruction in the VFP extension. The VCMP instruction writes these flag
values in the FPSCR. After using a VMRS instruction to transfer them to the APSR, they can be used to control
conditional execution as shown in Table A8-1 on page A8-8.
// FPCompare()
// ===========

(bit, bit, bit, bit) FPCompare(bits(N) op1, bits(N) op2, boolean quiet_nan_exc,
                               boolean fpscr_controlled)
    assert N == 32 || N == 64;
    fpscr_val = if fpscr_controlled then FPSCR else StandardFPSCRValue();
    (type1,sign1,value1) = FPUnpack(op1, fpscr_val);
    (type2,sign2,value2) = FPUnpack(op2, fpscr_val);
    if type1==FPType_SNaN || type1==FPType_QNaN || type2==FPType_SNaN || type2==FPType_QNaN then
        result = ('0','0','1','1');
        if type1==FPType_SNaN || type2==FPType_SNaN || quiet_nan_exc then
            FPProcessException(FPExc_InvalidOp, fpscr_val);
    else
        // All non-NaN cases can be evaluated on the values produced by FPUnpack()
        if value1 == value2 then
            result = ('0','1','1','0');
        elsif value1 < value2 then
            result = ('1','0','0','0');
        else // value1 > value2
            result = ('0','0','1','0');
    return result;
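A C sketch of the flag encoding in Table A2-8, returning N, Z, C and V as a 4-bit value with N in bit 3. It operates on host doubles, so flush-to-zero and the Invalid Operation exception are not modelled, and the function name is hypothetical. With these values a GE condition (N == V) holds for Equal and Greater than, but not for Less than or Unordered.

```c
#include <assert.h>
#include <math.h>

unsigned fp_compare_nzcv(double op1, double op2)
{
    if (isnan(op1) || isnan(op2))
        return 0x3u;   /* Unordered:    N=0 Z=0 C=1 V=1 */
    if (op1 == op2)
        return 0x6u;   /* Equal:        N=0 Z=1 C=1 V=0 (+0.0 equals -0.0) */
    if (op1 < op2)
        return 0x8u;   /* Less than:    N=1 Z=0 C=0 V=0 */
    return 0x2u;       /* Greater than: N=0 Z=0 C=1 V=0 */
}
```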

The FPCompareEQ(), FPCompareGE() and FPCompareGT() functions are used to describe Advanced SIMD
instructions that perform floating-point comparisons.
// FPCompareEQ()
// =============

boolean FPCompareEQ(bits(32) op1, bits(32) op2, boolean fpscr_controlled)
    fpscr_val = if fpscr_controlled then FPSCR else StandardFPSCRValue();
    (type1,sign1,value1) = FPUnpack(op1, fpscr_val);
    (type2,sign2,value2) = FPUnpack(op2, fpscr_val);
    if type1==FPType_SNaN || type1==FPType_QNaN || type2==FPType_SNaN || type2==FPType_QNaN then
        result = FALSE;
        if type1==FPType_SNaN || type2==FPType_SNaN then
            FPProcessException(FPExc_InvalidOp, fpscr_val);
    else
        // All non-NaN cases can be evaluated on the values produced by FPUnpack()
        result = (value1 == value2);
    return result;

// FPCompareGE()
// =============

boolean FPCompareGE(bits(32) op1, bits(32) op2, boolean fpscr_controlled)
    fpscr_val = if fpscr_controlled then FPSCR else StandardFPSCRValue();
    (type1,sign1,value1) = FPUnpack(op1, fpscr_val);
    (type2,sign2,value2) = FPUnpack(op2, fpscr_val);
    if type1==FPType_SNaN || type1==FPType_QNaN || type2==FPType_SNaN || type2==FPType_QNaN then
        result = FALSE;
        FPProcessException(FPExc_InvalidOp, fpscr_val);
    else
        // All non-NaN cases can be evaluated on the values produced by FPUnpack()
        result = (value1 >= value2);
    return result;

// FPCompareGT()
// =============

boolean FPCompareGT(bits(32) op1, bits(32) op2, boolean fpscr_controlled)
    fpscr_val = if fpscr_controlled then FPSCR else StandardFPSCRValue();
    (type1,sign1,value1) = FPUnpack(op1, fpscr_val);
    (type2,sign2,value2) = FPUnpack(op2, fpscr_val);
    if type1==FPType_SNaN || type1==FPType_QNaN || type2==FPType_SNaN || type2==FPType_QNaN then
        result = FALSE;
        FPProcessException(FPExc_InvalidOp, fpscr_val);
    else
        // All non-NaN cases can be evaluated on the values produced by FPUnpack()
        result = (value1 > value2);
    return result;
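The difference between FPCompareEQ() and the ordered comparisons is which NaNs raise Invalid Operation: only signaling NaNs for EQ, any NaN for GE and GT. A C sketch on raw single-precision bit patterns; it assumes the Advanced SIMD case (StandardFPSCRValue(), so denormals are flushed to zero), does not model the Input Denormal exception, and uses hypothetical names. A GT version would differ from the GE one only in using `>`.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef struct { int result; int invalid_op; } fp_cmp;

static int is_nan32(uint32_t x)  { return (x & 0x7F800000u) == 0x7F800000u && (x & 0x7FFFFFu) != 0; }
static int is_snan32(uint32_t x) { return is_nan32(x) && !(x & 0x400000u); }

/* Flush a denormal to a zero of the same sign (flush-to-zero selected). */
static float to_float_fz(uint32_t x)
{
    float f;
    if ((x & 0x7F800000u) == 0)
        x &= 0x80000000u;
    memcpy(&f, &x, sizeof f);
    return f;
}

fp_cmp fp32_compare_eq(uint32_t op1, uint32_t op2)
{
    fp_cmp c = { 0, 0 };
    if (is_nan32(op1) || is_nan32(op2))
        c.invalid_op = is_snan32(op1) || is_snan32(op2);   /* only SNaNs signal */
    else
        c.result = to_float_fz(op1) == to_float_fz(op2);
    return c;
}

fp_cmp fp32_compare_ge(uint32_t op1, uint32_t op2)
{
    fp_cmp c = { 0, 0 };
    if (is_nan32(op1) || is_nan32(op2))
        c.invalid_op = 1;                                  /* any NaN signals */
    else
        c.result = to_float_fz(op1) >= to_float_fz(op2);
    return c;
}
```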

Maximum and minimum
// FPMax()
// =======

bits(N) FPMax(bits(N) op1, bits(N) op2, boolean fpscr_controlled)
    assert N == 32 || N == 64;
    fpscr_val = if fpscr_controlled then FPSCR else StandardFPSCRValue();
    (type1,sign1,value1) = FPUnpack(op1, fpscr_val);
    (type2,sign2,value2) = FPUnpack(op2, fpscr_val);
    (done,result) = FPProcessNaNs(type1, type2, op1, op2, fpscr_val);
    if !done then
        if type1 == FPType_Zero && type2 == FPType_Zero && sign1 == NOT(sign2) then
            // Opposite-signed zeros produce +0.0
            result = FPZero('0', N);
        else
            // All other cases can be evaluated on the values produced by FPUnpack()
            result = if value1 > value2 then op1 else op2;
    return result;

// FPMin()
// =======

bits(N) FPMin(bits(N) op1, bits(N) op2, boolean fpscr_controlled)
    assert N == 32 || N == 64;
    fpscr_val = if fpscr_controlled then FPSCR else StandardFPSCRValue();
    (type1,sign1,value1) = FPUnpack(op1, fpscr_val);
    (type2,sign2,value2) = FPUnpack(op2, fpscr_val);
    (done,result) = FPProcessNaNs(type1, type2, op1, op2, fpscr_val);
    if !done then
        if type1 == FPType_Zero && type2 == FPType_Zero && sign1 == NOT(sign2) then
            // Opposite-signed zeros produce -0.0
            result = FPZero('1', N);
        else
            // All other cases can be evaluated on the values produced by FPUnpack()
            result = if value1 < value2 then op1 else op2;
    return result;
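The explicit zero test in FPMax() and FPMin() matters because a plain `op1 > op2 ? op1 : op2` returns op2 when the operands compare equal, so for +0.0 and -0.0 it would return -0.0. A C sketch on host doubles, assuming Default NaN mode (as StandardFPSCRValue() selects) so that any NaN operand gives a quiet NaN; exceptions and flush-to-zero are not modelled, and the names are hypothetical.

```c
#include <assert.h>
#include <math.h>

double fp_max(double op1, double op2)
{
    if (isnan(op1) || isnan(op2))
        return NAN;                     /* Default NaN mode assumed */
    if (op1 == 0.0 && op2 == 0.0 && !signbit(op1) != !signbit(op2))
        return 0.0;                     /* opposite-signed zeros give +0.0 */
    return op1 > op2 ? op1 : op2;
}

double fp_min(double op1, double op2)
{
    if (isnan(op1) || isnan(op2))
        return NAN;
    if (op1 == 0.0 && op2 == 0.0 && !signbit(op1) != !signbit(op2))
        return -0.0;                    /* opposite-signed zeros give -0.0 */
    return op1 < op2 ? op1 : op2;
}
```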

Addition and subtraction
// FPAdd()
// =======

bits(N) FPAdd(bits(N) op1, bits(N) op2, boolean fpscr_controlled)
    assert N == 32 || N == 64;
    fpscr_val = if fpscr_controlled then FPSCR else StandardFPSCRValue();
    (type1,sign1,value1) = FPUnpack(op1, fpscr_val);
    (type2,sign2,value2) = FPUnpack(op2, fpscr_val);
    (done,result) = FPProcessNaNs(type1, type2, op1, op2, fpscr_val);
    if !done then
        inf1 = (type1 == FPType_Infinity);  inf2 = (type2 == FPType_Infinity);
        zero1 = (type1 == FPType_Zero);      zero2 = (type2 == FPType_Zero);
        if inf1 && inf2 && sign1 == NOT(sign2) then
            result = FPDefaultNaN(N);
            FPProcessException(FPExc_InvalidOp, fpscr_val);
        elsif (inf1 && sign1 == '0') || (inf2 && sign2 == '0') then
            result = FPInfinity('0', N);
        elsif (inf1 && sign1 == '1') || (inf2 && sign2 == '1') then
            result = FPInfinity('1', N);
        elsif zero1 && zero2 && sign1 == sign2 then
            result = FPZero(sign1, N);
        else
            result_value = value1 + value2;
            if result_value == 0.0 then // Sign of exact zero result depends on rounding mode
                result_sign = if fpscr_val<23:22> == '10' then '1' else '0';
                result = FPZero(result_sign, N);
            else
                result = FPRound(result_value, N, fpscr_val);
    return result;

// FPSub()
// =======

bits(N) FPSub(bits(N) op1, bits(N) op2, boolean fpscr_controlled)
    assert N == 32 || N == 64;
    fpscr_val = if fpscr_controlled then FPSCR else StandardFPSCRValue();
    (type1,sign1,value1) = FPUnpack(op1, fpscr_val);
    (type2,sign2,value2) = FPUnpack(op2, fpscr_val);
    (done,result) = FPProcessNaNs(type1, type2, op1, op2, fpscr_val);
    if !done then
        inf1 = (type1 == FPType_Infinity);  inf2 = (type2 == FPType_Infinity);
        zero1 = (type1 == FPType_Zero);      zero2 = (type2 == FPType_Zero);
        if inf1 && inf2 && sign1 == sign2 then
            result = FPDefaultNaN(N);
            FPProcessException(FPExc_InvalidOp, fpscr_val);
        elsif (inf1 && sign1 == '0') || (inf2 && sign2 == '1') then
            result = FPInfinity('0', N);
        elsif (inf1 && sign1 == '1') || (inf2 && sign2 == '0') then
            result = FPInfinity('1', N);
        elsif zero1 && zero2 && sign1 == NOT(sign2) then
            result = FPZero(sign1, N);
        else
            result_value = value1 - value2;
            if result_value == 0.0 then // Sign of exact zero result depends on rounding mode
                result_sign = if fpscr_val<23:22> == '10' then '1' else '0';
                result = FPZero(result_sign, N);
            else
                result = FPRound(result_value, N, fpscr_val);
    return result;
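The zero-result rules of FPAdd() can be checked with a C sketch on host doubles. It assumes finite operands and a host that does not flush denormals, so `op1 + op2` is zero only when the exact sum is zero; the `RMODE_*` names are hypothetical labels for the FPSCR<23:22> encodings, and nonzero sums use the host's own rounding rather than FPRound(). FPSub() follows the same pattern with `op1 - op2` and the zero test `sign1 == NOT(sign2)`.

```c
#include <assert.h>
#include <math.h>

/* Hypothetical labels for the FPSCR<23:22> rounding-mode encodings. */
enum { RMODE_RN = 0, RMODE_RP = 1, RMODE_RM = 2, RMODE_RZ = 3 };

double fp_add_zero_rules(double op1, double op2, int rmode)
{
    assert(isfinite(op1) && isfinite(op2));
    if (op1 == 0.0 && op2 == 0.0 && !signbit(op1) == !signbit(op2))
        return op1;                              /* (+0)+(+0) = +0, (-0)+(-0) = -0 */
    double sum = op1 + op2;                      /* zero only for an exact zero sum */
    if (sum == 0.0)
        return rmode == RMODE_RM ? -0.0 : 0.0;   /* '1' only for Round towards Minus Infinity */
    return sum;
}
```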

Multiplication and division
// FPMul()
// =======

bits(N) FPMul(bits(N) op1, bits(N) op2, boolean fpscr_controlled)
    assert N == 32 || N == 64;
    fpscr_val = if fpscr_controlled then FPSCR else StandardFPSCRValue();
    (type1,sign1,value1) = FPUnpack(op1, fpscr_val);
    (type2,sign2,value2) = FPUnpack(op2, fpscr_val);
    (done,result) = FPProcessNaNs(type1, type2, op1, op2, fpscr_val);
    if !done then
        inf1 = (type1 == FPType_Infinity);  inf2 = (type2 == FPType_Infinity);
        zero1 = (type1 == FPType_Zero);      zero2 = (type2 == FPType_Zero);
        if (inf1 && zero2) || (zero1 && inf2) then
            result = FPDefaultNaN(N);
            FPProcessException(FPExc_InvalidOp, fpscr_val);
        elsif inf1 || inf2 then
            result_sign = if sign1 == sign2 then '0' else '1';
            result = FPInfinity(result_sign, N);
        elsif zero1 || zero2 then
            result_sign = if sign1 == sign2 then '0' else '1';
            result = FPZero(result_sign, N);
        else
            result = FPRound(value1*value2, N, fpscr_val);
    return result;

// FPDiv()
// =======

bits(N) FPDiv(bits(N) op1, bits(N) op2, boolean fpscr_controlled)
    assert N == 32 || N == 64;
    fpscr_val = if fpscr_controlled then FPSCR else StandardFPSCRValue();
    (type1,sign1,value1) = FPUnpack(op1, fpscr_val);
    (type2,sign2,value2) = FPUnpack(op2, fpscr_val);
    (done,result) = FPProcessNaNs(type1, type2, op1, op2, fpscr_val);
    if !done then
        inf1 = (type1 == FPType_Infinity);  inf2 = (type2 == FPType_Infinity);
        zero1 = (type1 == FPType_Zero);      zero2 = (type2 == FPType_Zero);
        if (inf1 && inf2) || (zero1 && zero2) then
            result = FPDefaultNaN(N);
            FPProcessException(FPExc_InvalidOp, fpscr_val);
        elsif inf1 || zero2 then
            result_sign = if sign1 == sign2 then '0' else '1';
            result = FPInfinity(result_sign, N);
            if !inf1 then FPProcessException(FPExc_DivideByZero, fpscr_val);
        elsif zero1 || inf2 then
            result_sign = if sign1 == sign2 then '0' else '1';
            result = FPZero(result_sign, N);
        else
            result = FPRound(value1/value2, N, fpscr_val);
    return result;
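A C sketch of the special cases of FPDiv() on host doubles. NaN operands, which FPProcessNaNs() handles first, are excluded; ordinary quotients use the host division rather than FPRound(); the names are hypothetical. Note that an infinite dividend over zero returns infinity without a Division by Zero exception.

```c
#include <assert.h>
#include <math.h>

typedef struct {
    double result;
    int invalid_op;    /* FPExc_InvalidOp would be raised */
    int div_by_zero;   /* FPExc_DivideByZero would be raised */
} fp_div_outcome;

fp_div_outcome fp_div(double op1, double op2)
{
    fp_div_outcome o = { 0.0, 0, 0 };
    int inf1 = isinf(op1) != 0, inf2 = isinf(op2) != 0;
    int zero1 = op1 == 0.0, zero2 = op2 == 0.0;
    int negative = !signbit(op1) != !signbit(op2);   /* result_sign == '1' */

    assert(!isnan(op1) && !isnan(op2));
    if ((inf1 && inf2) || (zero1 && zero2)) {
        o.result = NAN;                 /* default NaN */
        o.invalid_op = 1;
    } else if (inf1 || zero2) {
        o.result = negative ? -INFINITY : INFINITY;
        o.div_by_zero = !inf1;          /* only a finite dividend over zero */
    } else if (zero1 || inf2) {
        o.result = negative ? -0.0 : 0.0;
    } else {
        o.result = op1 / op2;
    }
    return o;
}
```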

Reciprocal estimate and step
The Advanced SIMD extension includes instructions that support Newton-Raphson calculation of the
reciprocal of a number.
The VRECPE instruction produces the initial estimate of the reciprocal. It uses the following pseudocode
functions:
// FPRecipEstimate()
// =================

bits(32) FPRecipEstimate(bits(32) operand)
    (type,sign,value) = FPUnpack(operand, StandardFPSCRValue());
    if type == FPType_SNaN || type == FPType_QNaN then
        result = FPProcessNaN(type, operand, StandardFPSCRValue());
    elsif type == FPType_Infinity then
        result = FPZero(sign, 32);
    elsif type == FPType_Zero then
        result = FPInfinity(sign, 32);
        FPProcessException(FPExc_DivideByZero, StandardFPSCRValue());
    elsif Abs(value) >= 2^126 then // Result underflows to zero of correct sign
        result = FPZero(sign, 32);
        FPProcessException(FPExc_Underflow, StandardFPSCRValue());
    else
        // Operand must be normalized, since denormalized numbers are flushed to zero. Scale to a
        // double-precision value in the range 0.5 <= x < 1.0, and calculate result exponent.
        // Scaled value has copied sign bit, exponent = 1022 = double-precision biased version of
        // -1, fraction = original fraction extended with zeros.
        scaled = operand<31> : '01111111110' : operand<22:0> : Zeros(29);
        result_exp = 253 - UInt(operand<30:23>); // In range 253-252 = 1 to 253-1 = 252
        // Call C function to get reciprocal estimate of scaled value.
        estimate = recip_estimate(scaled);
        // Result is double-precision and a multiple of 1/256 in the range 1 to 511/256. Convert
        // to scaled single-precision result with copied sign bit and high-order fraction bits,
        // and exponent calculated above.
        result = estimate<63> : result_exp<7:0> : estimate<51:29>;
    return result;

// UnsignedRecipEstimate()
// =======================

bits(32) UnsignedRecipEstimate(bits(32) operand)
    if operand<31> == '0' then // Operands <= 0x7FFFFFFF produce 0xFFFFFFFF
        result = Ones(32);
    else
        // Generate double-precision value = operand * 2^-32. This has zero sign bit,
        // exponent = 1022 = double-precision biased version of -1, fraction taken from
        // operand, excluding its most significant bit.
        dp_operand = '0 01111111110' : operand<30:0> : Zeros(21);
        // Call C function to get reciprocal estimate of scaled value.
        estimate = recip_estimate(dp_operand);
        // Result is double-precision and a multiple of 1/256 in the range 1 to 511/256.
        // Multiply by 2^31 and convert to an unsigned integer - this just involves
        // concatenating the implicit units bit with the top 31 fraction bits.
        result = '1' : estimate<51:21>;
    return result;

where recip_estimate() is defined by the following C function:
double recip_estimate(double a)
{
    int q, s;
    double r;
    q = (int)(a * 512.0);                    /* a in units of 1/512 rounded down */
    r = 1.0 / (((double)q + 0.5) / 512.0);   /* reciprocal r */
    s = (int)(256.0 * r + 0.5);              /* r in units of 1/256 rounded to nearest */
    return (double)s / 256.0;
}
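To see the scaling in FPRecipEstimate() at work, the following C sketch reproduces its final branch for positive, normalized single-precision operands, reusing recip_estimate() unchanged and moving bits between `double` and `uint64_t` with memcpy(). The restriction to positive operands and the name `fp32_recip_estimate_normal()` are assumptions of the sketch. For 2.0 it gives 0x3EFF8000 (0.4990234375), and for 3.0 it gives 0x3EAA8000 (0.3330078125).

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* recip_estimate() as defined above. */
static double recip_estimate(double a)
{
    int q, s;
    double r;
    q = (int)(a * 512.0);                    /* a in units of 1/512 rounded down */
    r = 1.0 / (((double)q + 0.5) / 512.0);   /* reciprocal r */
    s = (int)(256.0 * r + 0.5);              /* r in units of 1/256 rounded to nearest */
    return (double)s / 256.0;
}

/* Final 'else' branch of FPRecipEstimate() for a positive, normalized
   operand with biased exponent 1 to 252 (so Abs(value) < 2^126). */
uint32_t fp32_recip_estimate_normal(uint32_t operand)
{
    uint32_t exp = (operand >> 23) & 0xFFu;
    uint64_t bits;
    double scaled, estimate;

    assert((operand >> 31) == 0 && exp >= 1u && exp <= 252u);

    /* scaled = operand<31> : '01111111110' : operand<22:0> : Zeros(29), sign 0 here */
    bits = ((uint64_t)1022 << 52) | ((uint64_t)(operand & 0x7FFFFFu) << 29);
    memcpy(&scaled, &bits, sizeof scaled);

    uint32_t result_exp = 253u - exp;
    estimate = recip_estimate(scaled);
    memcpy(&bits, &estimate, sizeof bits);

    /* result = estimate<63> : result_exp<7:0> : estimate<51:29> */
    return ((uint32_t)(bits >> 63) << 31) | (result_exp << 23)
         | (uint32_t)((bits >> 29) & 0x7FFFFFu);
}
```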

Table A2-9 shows the results where input values are out of range.
Table A2-9 VRECPE results for out-of-range inputs

Number type       Input Vm[i]                      Result Vd[i]
Integer           <= 0x7FFFFFFF                    0xFFFFFFFF
Floating-point    NaN                              Default NaN
Floating-point    +/– 0 or denormalized number     +/– infinity (a)
Floating-point    +/– infinity                     +/– 0
Floating-point    Absolute value >= 2^126          +/– 0

a. The Division by Zero exception bit in the FPSCR (FPSCR[1]) is set.

The Newton-Raphson iteration:
$$x_{n+1} = x_n (2 - d x_n)$$

converges to $1/d$ if $x_0$ is the result of VRECPE applied to $d$.
The VRECPS instruction performs a 2 - op1*op2 calculation and can be used with a multiplication to
perform a step of this iteration. The functionality of this instruction is defined by the following pseudocode
function:
// FPRecipStep()
// =============

bits(32) FPRecipStep(bits(32) op1, bits(32) op2)
    (type1,sign1,value1) = FPUnpack(op1, StandardFPSCRValue());
    (type2,sign2,value2) = FPUnpack(op2, StandardFPSCRValue());
    (done,result) = FPProcessNaNs(type1, type2, op1, op2, StandardFPSCRValue());
    if !done then
        inf1 = (type1 == FPType_Infinity);  inf2 = (type2 == FPType_Infinity);
        zero1 = (type1 == FPType_Zero);      zero2 = (type2 == FPType_Zero);
        if (inf1 && zero2) || (zero1 && inf2) then
            product = FPZero('0', 32);
        else
            product = FPMul(op1, op2, FALSE);
        result = FPSub(FPTwo(32), product, FALSE);
    return result;
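A C sketch of how VRECPE and VRECPS combine: each iteration multiplies the current estimate by FPRecipStep(d, x). It uses host `float` arithmetic instead of the Advanced SIMD rounding rules, ignores NaN handling and flush-to-zero, and uses hypothetical names. Because $1 - d x_{n+1} = (1 - d x_n)^2$, the relative error roughly squares on each step, so starting from the VRECPE estimate of 1/3 (0.3330078125, relative error about 2^-10), two or three steps reach single-precision accuracy.

```c
#include <assert.h>
#include <math.h>

/* FPRecipStep() on host floats: 2.0 - op1*op2, with infinity times zero
   giving 2.0 as in the pseudocode. */
float recip_step(float op1, float op2)
{
    if ((isinf(op1) && op2 == 0.0f) || (op1 == 0.0f && isinf(op2)))
        return 2.0f;
    return 2.0f - op1 * op2;
}

/* Refine an estimate x0 of 1/d: x(n+1) = x(n) * (2 - d * x(n)),
   the VRECPS-plus-multiply pattern. */
float refine_recip(float d, float x0, int steps)
{
    float x = x0;
    for (int i = 0; i < steps; i++)
        x = x * recip_step(d, x);
    return x;
}
```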

Table A2-10 shows the results where input values are out of range.
Table A2-10 VRECPS results for out-of-range inputs

Input Vn[i]                      Input Vm[i]                      Result Vd[i]
Any NaN                          -                                Default NaN
-                                Any NaN                          Default NaN
+/– 0.0 or denormalized number   +/– infinity                     2.0
+/– infinity                     +/– 0.0 or denormalized number   2.0

Square root
// FPSqrt()
// ========

bits(N) FPSqrt(bits(N) operand, boolean fpscr_controlled)
    assert N == 32 || N == 64;
    fpscr_val = if fpscr_controlled then FPSCR else StandardFPSCRValue();
    (type,sign,value) = FPUnpack(operand, fpscr_val);
    if type == FPType_SNaN || type == FPType_QNaN then
        result = FPProcessNaN(type, operand, fpscr_val);
    elsif type == FPType_Zero || (type == FPType_Infinity && sign == '0') then
        result = operand;
    elsif sign == '1' then
        result = FPDefaultNaN(N);
        FPProcessException(FPExc_InvalidOp, fpscr_val);
    else
        result = FPRound(Sqrt(value), N, fpscr_val);
    return result;

Reciprocal square root
The Advanced SIMD extension includes instructions that support Newton-Raphson calculation of the
reciprocal of the square root of a number.
The VRSQRTE instruction produces the initial estimate of the reciprocal of the square root. It uses the following
pseudocode functions:
// FPRSqrtEstimate()
// =================

bits(32) FPRSqrtEstimate(bits(32) operand)
    (type,sign,value) = FPUnpack(operand, StandardFPSCRValue());
    if type == FPType_SNaN || type == FPType_QNaN then
        result = FPProcessNaN(type, operand, StandardFPSCRValue());
    elsif type == FPType_Zero then
        result = FPInfinity(sign, 32);
        FPProcessException(FPExc_DivideByZero, StandardFPSCRValue());
    elsif sign == '1' then
        result = FPDefaultNaN(32);
        FPProcessException(FPExc_InvalidOp, StandardFPSCRValue());
    elsif type == FPType_Infinity then
        result = FPZero('0', 32);
    else
        // Operand must be normalized, since denormalized numbers are flushed to zero. Scale to a
        // double-precision value in the range 0.25 <= x < 1.0, with the evenness or oddness of
        // the exponent unchanged, and calculate result exponent. Scaled value has copied sign
        // bit, exponent = 1022 or 1021 = double-precision biased version of -1 or -2, fraction
        // = original fraction extended with zeros.
        if operand<23> == '0' then
            scaled = operand<31> : '01111111110' : operand<22:0> : Zeros(29);
        else
            scaled = operand<31> : '01111111101' : operand<22:0> : Zeros(29);
        result_exp = (380 - UInt(operand<30:23>)) DIV 2;
        // Call C function to get reciprocal square root estimate of scaled value.
        estimate = recip_sqrt_estimate(scaled);
        // Result is double-precision and a multiple of 1/256 in the range 1 to 511/256. Convert
        // to scaled single-precision result with copied sign bit and high-order fraction bits,
        // and exponent calculated above.
        result = estimate<63> : result_exp<7:0> : estimate<51:29>;
    return result;

// UnsignedRSqrtEstimate()
// =======================

bits(32) UnsignedRSqrtEstimate(bits(32) operand)
    if operand<31:30> == '00' then // Operands <= 0x3FFFFFFF produce 0xFFFFFFFF
        result = Ones(32);
    else
        // Generate double-precision value = operand * 2^-32. This has zero sign bit,
        // exponent = 1022 or 1021 = double-precision biased version of -1 or -2,
        // fraction taken from operand, excluding its most significant one or two bits.
        if operand<31> == '1' then
            dp_operand = '0 01111111110' : operand<30:0> : Zeros(21);
        else // operand<31:30> == '01'
            dp_operand = '0 01111111101' : operand<29:0> : Zeros(22);
        // Call C function to get reciprocal square root estimate of scaled value.
        estimate = recip_sqrt_estimate(dp_operand);
        // Result is double-precision and a multiple of 1/256 in the range 1 to 511/256.
        // Multiply by 2^31 and convert to an unsigned integer - this just involves
        // concatenating the implicit units bit with the top 31 fraction bits.
        result = '1' : estimate<51:21>;
    return result;

where recip_sqrt_estimate() is defined by the following C function:
#include <math.h>

double recip_sqrt_estimate(double a)
{
    int q0, q1, s;
    double r;
    if (a < 0.5)    /* range 0.25 <= a < 0.5 */
    {
        q0 = (int)(a * 512.0);                         /* a in units of 1/512 rounded down */
        r = 1.0 / sqrt(((double)q0 + 0.5) / 512.0);    /* reciprocal root r */
    }
    else            /* range 0.5 <= a < 1.0 */
    {
        q1 = (int)(a * 256.0);                         /* a in units of 1/256 rounded down */
        r = 1.0 / sqrt(((double)q1 + 0.5) / 256.0);    /* reciprocal root r */
    }
    s = (int)(256.0 * r + 0.5);                        /* r in units of 1/256 rounded to nearest */
    return (double)s / 256.0;
}

Table A2-11 shows the results where input values are out of range.
Table A2-11 VRSQRTE results for out-of-range inputs

Number type       Input Vm[i]                              Result Vd[i]
Integer           <= 0x3FFFFFFF                            0xFFFFFFFF
Floating-point    NaN, – normalized number, – infinity     Default NaN
Floating-point    – 0 or – denormalized number             – infinity (a)
Floating-point    + 0 or + denormalized number             + infinity (a)
Floating-point    + infinity                               +0

a. The Division by Zero exception bit in the FPSCR (FPSCR[1]) is set.

The Newton-Raphson iteration:

$$x_{n+1} = x_n\,(3 - d\,x_n^2)/2$$

converges to $1/\sqrt{d}$ if $x_0$ is the result of VRSQRTE applied to $d$.
The VRSQRTS instruction performs a (3 – op1*op2)/2 calculation and can be used with two multiplications to
perform a step of this iteration. The functionality of this instruction is defined by the following pseudocode
function:
// FPRSqrtStep()
// =============

bits(32) FPRSqrtStep(bits(32) op1, bits(32) op2)
    (type1,sign1,value1) = FPUnpack(op1, StandardFPSCRValue());
    (type2,sign2,value2) = FPUnpack(op2, StandardFPSCRValue());
    (done,result) = FPProcessNaNs(type1, type2, op1, op2, StandardFPSCRValue());
    if !done then
        inf1 = (type1 == FPType_Infinity);  inf2 = (type2 == FPType_Infinity);
        zero1 = (type1 == FPType_Zero);     zero2 = (type2 == FPType_Zero);
        if (inf1 && zero2) || (zero1 && inf2) then
            product = FPZero(‘0’, 32);
        else
            product = FPMul(op1, op2, FALSE);
        result = FPDiv(FPSub(FPThree(32), product, FALSE), FPTwo(32), FALSE);
    return result;
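A minimal C sketch of one iteration step, written as the multiply, VRSQRTS, multiply sequence described above. It assumes ordinary IEEE single-precision C arithmetic in place of FPMul(), FPSub() and FPDiv() with the standard FPSCR value; NaN handling, flush-to-zero and exception flags are omitted, and rsqrt_step() and rsqrt_refine() are hypothetical names.

```c
#include <math.h>

/* Sketch of FPRSqrtStep(): (3 - op1*op2) / 2, where infinity * zero counts
   as a zero product so that the result is exactly 1.5 (see Table A2-12). */
static float rsqrt_step(float op1, float op2)
{
    float product;
    if ((isinf(op1) && op2 == 0.0f) || (op1 == 0.0f && isinf(op2)))
        product = 0.0f;
    else
        product = op1 * op2;
    return (3.0f - product) / 2.0f;
}

/* One Newton-Raphson step x' = x * (3 - d*x*x) / 2 towards 1/sqrt(d). */
static float rsqrt_refine(float d, float x)
{
    float dx = d * x;               /* first multiplication */
    float k = rsqrt_step(dx, x);    /* VRSQRTS: (3 - d*x*x) / 2 */
    return x * k;                   /* second multiplication */
}
```

Starting from 0.4990234375, the value the VRSQRTE pseudocode above yields for d = 4.0, two steps bring the result within about 1e-6 of 0.5, reflecting the quadratic convergence of the iteration.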

Table A2-12 shows the results where input values are out of range.
Table A2-12 VRSQRTS results for out-of-range inputs

Input Vn[i]                       Input Vm[i]                       Result Vd[i]
Any NaN                           -                                 Default NaN
-                                 Any NaN                           Default NaN
+/– 0.0 or denormalized number    +/– infinity                      1.5
+/– infinity                      +/– 0.0 or denormalized number    1.5

Conversions
The following functions perform conversions between half-precision and single-precision floating-point
numbers.
// FPHalfToSingle()
// ================

bits(32) FPHalfToSingle(bits(16) operand, boolean fpscr_controlled)
    fpscr_val = if fpscr_controlled then FPSCR else StandardFPSCRValue();
    (type,sign,value) = FPUnpack(operand, fpscr_val);
    if type == FPType_SNaN || type == FPType_QNaN then
        if fpscr_val<25> == ‘1’ then  // DN bit set
            result = FPDefaultNaN(32);
        else
            result = sign : ‘11111111 1’ : operand<8:0> : Zeros(13);
        if type == FPType_SNaN then
            FPProcessException(FPExc_InvalidOp, fpscr_val);
    elsif type == FPType_Infinity then
        result = FPInfinity(sign, 32);
    elsif type == FPType_Zero then
        result = FPZero(sign, 32);
    else
        result = FPRound(value, 32, fpscr_val);  // Rounding will be exact
    return result;

// FPSingleToHalf()
// ================

bits(16) FPSingleToHalf(bits(32) operand, boolean fpscr_controlled)
    fpscr_val = if fpscr_controlled then FPSCR else StandardFPSCRValue();
    (type,sign,value) = FPUnpack(operand, fpscr_val);
    if type == FPType_SNaN || type == FPType_QNaN then
        if fpscr_val<26> == ‘1’ then      // AH bit set
            result = FPZero(sign, 16);
        elsif fpscr_val<25> == ‘1’ then   // DN bit set
            result = FPDefaultNaN(16);
        else
            result = sign : ‘11111 1’ : operand<21:13>;
        if type == FPType_SNaN || fpscr_val<26> == ‘1’ then
            FPProcessException(FPExc_InvalidOp, fpscr_val);
    elsif type == FPType_Infinity then
        if fpscr_val<26> == ‘1’ then      // AH bit set
            result = sign : Ones(15);
            FPProcessException(FPExc_InvalidOp, fpscr_val);
        else
            result = FPInfinity(sign, 16);
    elsif type == FPType_Zero then
        result = FPZero(sign, 16);
    else
        result = FPRound(value, 16, fpscr_val);
    return result;
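The FPHalfToSingle() cases can be sketched in C on raw bit patterns. This sketch assumes IEEE half-precision format (AH bit, FPSCR<26>, is 0) and DN bit (FPSCR<25>) is 0, does not model exception flags, and treats half-precision denormals as exact IEEE values; half_to_single_bits() is a hypothetical name. Every half-precision value is exactly representable in single precision, so no rounding is needed.

```c
#include <math.h>
#include <stdint.h>
#include <string.h>

/* Sketch of FPHalfToSingle() on bit patterns (IEEE format, DN == 0, no flags). */
static uint32_t half_to_single_bits(uint16_t h)
{
    uint32_t sign = (uint32_t)(h >> 15) << 31;
    uint32_t exp = (h >> 10) & 0x1Fu;
    uint32_t frac = h & 0x3FFu;

    if (exp == 0x1Fu) {
        if (frac == 0)
            return sign | 0x7F800000u;                  /* +/-infinity */
        /* NaN: sign : '11111111 1' : operand<8:0> : Zeros(13); an SNaN becomes quiet */
        return sign | 0x7FC00000u | ((frac & 0x1FFu) << 13);
    }
    if (exp == 0 && frac == 0)
        return sign;                                    /* +/-0 */
    if (exp == 0) {                                     /* denormal: frac * 2^-24, exact */
        float f = ldexpf((float)frac, -24);
        uint32_t bits;
        memcpy(&bits, &f, sizeof bits);
        return sign | bits;
    }
    /* Normal number: rebias the exponent from 15 to 127 and widen the fraction. */
    return sign | ((exp + 112u) << 23) | (frac << 13);
}
```

For example, half_to_single_bits(0x3C00) returns 0x3F800000 (1.0), and the signaling NaN 0x7D00 becomes the quiet NaN 0x7FE00000.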


The following functions perform conversions between single-precision and double-precision floating-point
numbers.
// FPSingleToDouble()
// ==================

bits(64) FPSingleToDouble(bits(32) operand, boolean fpscr_controlled)
    fpscr_val = if fpscr_controlled then FPSCR else StandardFPSCRValue();
    (type,sign,value) = FPUnpack(operand, fpscr_val);
    if type == FPType_SNaN || type == FPType_QNaN then
        if fpscr_val<25> == ‘1’ then  // DN bit set
            result = FPDefaultNaN(64);
        else
            result = sign : ‘11111111111 1’ : operand<21:0> : Zeros(29);
        if type == FPType_SNaN then
            FPProcessException(FPExc_InvalidOp, fpscr_val);
    elsif type == FPType_Infinity then
        result = FPInfinity(sign, 64);
    elsif type == FPType_Zero then
        result = FPZero(sign, 64);
    else
        result = FPRound(value, 64, fpscr_val);  // Rounding will be exact
    return result;

// FPDoubleToSingle()
// ==================

bits(32) FPDoubleToSingle(bits(64) operand, boolean fpscr_controlled)
    fpscr_val = if fpscr_controlled then FPSCR else StandardFPSCRValue();
    (type,sign,value) = FPUnpack(operand, fpscr_val);
    if type == FPType_SNaN || type == FPType_QNaN then
        if fpscr_val<25> == ‘1’ then  // DN bit set
            result = FPDefaultNaN(32);
        else
            result = sign : ‘11111111 1’ : operand<50:29>;
        if type == FPType_SNaN then
            FPProcessException(FPExc_InvalidOp, fpscr_val);
    elsif type == FPType_Infinity then
        result = FPInfinity(sign, 32);
    elsif type == FPType_Zero then
        result = FPZero(sign, 32);
    else
        result = FPRound(value, 32, fpscr_val);
    return result;
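The NaN branches of FPSingleToDouble() and FPDoubleToSingle() are plain bit concatenations, sketched here in C under the assumption that the DN bit is 0; the Invalid Operation flag is not modeled, and single_nan_to_double() and double_nan_to_single() are hypothetical names. In both directions the result keeps the sign, is forced quiet, and keeps as many payload bits as fit.

```c
#include <stdint.h>

/* sign : '11111111111 1' : operand<21:0> : Zeros(29) */
static uint64_t single_nan_to_double(uint32_t s)
{
    return ((uint64_t)(s >> 31) << 63) | 0x7FF8000000000000ull
         | ((uint64_t)(s & 0x3FFFFFu) << 29);
}

/* sign : '11111111 1' : operand<50:29> */
static uint32_t double_nan_to_single(uint64_t d)
{
    return ((uint32_t)(d >> 63) << 31) | 0x7FC00000u
         | (uint32_t)((d >> 29) & 0x3FFFFFu);
}
```

The signaling NaN 0x7F800001 widens to 0x7FF8000020000000 and narrows back to 0x7FC00001, so the payload survives the round trip but the NaN is now quiet. Payload bits below bit 29 of a double NaN are discarded when narrowing.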

The following functions perform conversions between floating-point numbers and integers or fixed-point
numbers:
// FPToFixed()
// ===========

bits(M) FPToFixed(bits(N) operand, integer M, integer fraction_bits, boolean unsigned,
                  boolean round_towards_zero, boolean fpscr_controlled)
    assert N == 32 || N == 64;
    fpscr_val = if fpscr_controlled then FPSCR else StandardFPSCRValue();
    if round_towards_zero then fpscr_val<23:22> = ‘11’;
    (type,sign,value) = FPUnpack(operand, fpscr_val);

    // For NaNs and infinities, FPUnpack() has produced a value that will round to the
    // required result of the conversion. Also, the value produced for infinities will
    // cause the conversion to overflow and signal an Invalid Operation floating-point
    // exception as required. NaNs must also generate such a floating-point exception.
    if type == FPType_SNaN || type == FPType_QNaN then
        FPProcessException(FPExc_InvalidOp, fpscr_val);

    // Scale value by specified number of fraction bits, then start rounding to an integer
    // and determine the rounding error.
    value = value * 2^fraction_bits;
    int_result = RoundDown(value);
    error = value - int_result;

    // Apply the specified rounding mode.
    case fpscr_val<23:22> of
        when ‘00’  // Round to Nearest (rounding to even if exactly halfway)
            round_up = (error > 0.5 || (error == 0.5 && int_result<0> == ‘1’));
        when ‘01’  // Round towards Plus Infinity
            round_up = (error != 0.0);
        when ‘10’  // Round towards Minus Infinity
            round_up = FALSE;
        when ‘11’  // Round towards Zero
            round_up = (error != 0.0 && int_result < 0);
    if round_up then int_result = int_result + 1;

    // Bitstring result is the integer result saturated to the destination size, with
    // saturation indicating overflow of the conversion (signaled as an Invalid
    // Operation floating-point exception).
    (result, overflow) = SatQ(int_result, M, unsigned);
    if overflow then
        FPProcessException(FPExc_InvalidOp, fpscr_val);
    elsif error != 0 then
        FPProcessException(FPExc_Inexact, fpscr_val);
    return result;

// FixedToFP()
// ===========

bits(N) FixedToFP(bits(M) operand, integer N, integer fraction_bits, boolean unsigned,
                  boolean round_to_nearest, boolean fpscr_controlled)
    assert N == 32 || N == 64;
    fpscr_val = if fpscr_controlled then FPSCR else StandardFPSCRValue();
    if round_to_nearest then fpscr_val<23:22> = ‘00’;
    int_operand = if unsigned then UInt(operand) else SInt(operand);
    real_operand = int_operand / 2^fraction_bits;
    if real_operand == 0.0 then
        result = FPZero(‘0’, N);
    else
        result = FPRound(real_operand, N, fpscr_val);
    return result;


A2.8 Polynomial arithmetic over {0,1}
The polynomial data type represents a polynomial in x of the form $b_{n-1}x^{n-1} + \dots + b_1x + b_0$, where $b_k$ is
bit [k] of the value.
The coefficients 0 and 1 are manipulated using the rules of Boolean arithmetic, that is, arithmetic modulo 2:
• 0 + 0 = 1 + 1 = 0
• 0 + 1 = 1 + 0 = 1
• 0 * 0 = 0 * 1 = 1 * 0 = 0
• 1 * 1 = 1.
That is:
• adding two polynomials over {0,1} is the same as a bitwise exclusive OR
• multiplying two polynomials over {0,1} is the same as integer multiplication except that partial
  products are exclusive-ORed instead of being added.

A2.8.1 Pseudocode details of polynomial multiplication
In pseudocode, polynomial addition is described by the EOR operation on bitstrings.
Polynomial multiplication is described by the PolynomialMult() function:

// PolynomialMult()
// ================

bits(M+N) PolynomialMult(bits(M) op1, bits(N) op2)
    result = Zeros(M+N);
    extended_op2 = Zeros(M) : op2;
    for i=0 to M-1
        if op1<i> == ‘1’ then
            result = result EOR LSL(extended_op2, i);
    return result;
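As a hedged illustration, this C sketch specializes PolynomialMult() to M = N = 8, producing a 16-bit product; poly_mult8() is a hypothetical name, and the loop mirrors the pseudocode bit for bit. Polynomial addition needs no helper, because it is plain exclusive OR (^ in C).

```c
#include <stdint.h>

/* Sketch of PolynomialMult() with M = N = 8: partial products are combined
   with exclusive OR, so no carries propagate between bit positions. */
static uint16_t poly_mult8(uint8_t op1, uint8_t op2)
{
    uint16_t result = 0;                    /* Zeros(M+N) */
    uint16_t extended_op2 = op2;            /* Zeros(M) : op2 */
    for (int i = 0; i < 8; i++)
        if ((op1 >> i) & 1u)                /* op1<i> == '1' */
            result ^= (uint16_t)(extended_op2 << i);
    return result;
}
```

Because (x + 1)(x + 1) = x^2 + 1 over {0,1}, poly_mult8(0x03, 0x03) returns 0x05, whereas ordinary integer multiplication gives 9.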

A2.9 Coprocessor support
Coprocessor space is used to extend the functionality of an ARM processor. There are sixteen coprocessors
defined in the coprocessor instruction space. These are commonly known as CP0 to CP15. The following
coprocessors are reserved by ARM for specific purposes:
• Coprocessor 15 (CP15) provides system control functionality. This includes architecture and feature
  identification, as well as control, status information and configuration support. The following
  sections describe CP15:
  — CP15 registers for a VMSA implementation on page B3-64
  — CP15 registers for a PMSA implementation on page B4-22.
  CP15 also provides performance monitor registers, see Chapter C9 Performance Monitors.
• Coprocessor 14 (CP14) supports:
  — debug, see Chapter C6 Debug Register Interfaces
  — the execution environment features defined by the architecture, see Execution environment
    support on page A2-69.
• Coprocessor 11 (CP11) supports double-precision floating-point operations.
• Coprocessor 10 (CP10) supports single-precision floating-point operations and the control and
  configuration of both the VFP and the Advanced SIMD architecture extensions.
• Coprocessors 8, 9, 12, and 13 are reserved for future use by ARM.

Note
Any implementation that includes either or both of the Advanced SIMD extension and the VFP extension
must enable access to both CP10 and CP11, see Enabling Advanced SIMD and floating-point support on
page B1-64.

In general, privileged access is required for:
• system control through CP15
• debug control and configuration
• access to the identification registers
• access to any register bits that enable or disable coprocessor features.
For details of the exact split between the privileged and unprivileged coprocessor operations see the relevant
sections of this manual.
All load, store, branch and data operation instructions associated with floating-point, Advanced SIMD and
execution environment support can execute unprivileged.
Coprocessors 0 to 7 can be used to provide vendor-specific features.


A2.10 Execution environment support
The Jazelle and ThumbEE states, introduced in ISETSTATE on page A2-15, support execution
environments:
• The ThumbEE state is more generic, supporting a variant of the Thumb instruction set that minimizes
  the code size overhead generated by a Just-In-Time (JIT) or Ahead-Of-Time (AOT) compiler. JIT and
  AOT compilers convert execution environment source code to a native executable. For more
  information, see Thumb Execution Environment on page A2-69.
• The Jazelle state is specific to hardware acceleration of Java bytecodes. For more information, see
  Jazelle direct bytecode execution support on page A2-73.

A2.10.1 Thumb Execution Environment
Thumb Execution Environment (ThumbEE) is a variant of the Thumb instruction set designed as a target for
dynamically generated code. This is code that is compiled on the device, from a portable bytecode or other
intermediate or native representation, either shortly before or during execution. ThumbEE provides support
for Just-In-Time (JIT), Dynamic Adaptive Compilation (DAC) and Ahead-Of-Time (AOT) compilers, but
cannot interwork freely with the ARM and Thumb instruction sets.
ThumbEE is particularly suited to languages that feature managed pointers and array types.
ThumbEE executes instructions in the ThumbEE instruction set state. For information about instruction set
states see ISETSTATE on page A2-15.
See Thumb Execution Environment on page B1-73 for system level information about ThumbEE.

ThumbEE instructions
In ThumbEE state, the processor executes almost the same instruction set as in Thumb state. However, some
instructions behave differently, some are removed, and some ThumbEE instructions are added.
The key differences are:
• additional instructions to change instruction set in both Thumb state and ThumbEE state
• new ThumbEE instructions to branch to handlers
• null pointer checking on load/store instructions executed in ThumbEE state
• an additional instruction in ThumbEE state to check array bounds
• some other modifications to load, store, and control flow instructions.
For more information about the ThumbEE instructions see Chapter A9 ThumbEE.

ThumbEE configuration
ThumbEE introduces two new registers:
• ThumbEE Configuration Register, TEECR. This contains a single bit, the ThumbEE configuration
  control bit, XED.
• ThumbEE Handler Base Register, TEEHBR. This contains the base address for ThumbEE handlers.
  A handler is a short, commonly executed, sequence of instructions. It is typically, but not always,
  associated directly with one or more bytecodes or other intermediate language elements.

Changes to these CP14 registers have the same synchronization requirements as changes to the CP15
registers. These are described in:
• Changes to CP15 registers and the memory order model on page B3-77 for a VMSA implementation
• Changes to CP15 registers and the memory order model on page B4-28 for a PMSA implementation.
ThumbEE is an unprivileged, user-level facility, and there are no special provisions for using it securely. For
more information, see ThumbEE and the Security Extensions on page B1-73.
ThumbEE Configuration Register (TEECR)
The ThumbEE Configuration Register (TEECR) controls unprivileged access to the ThumbEE Handler
Base Register.
The TEECR is:
• a CP14 register
• a 32-bit register, with access rights that depend on the current privilege:
  — the result of an unprivileged write to the register is UNDEFINED
  — unprivileged reads, and privileged reads and writes, are permitted.
• when the Security Extensions are implemented, a Common register.
The format of the TEECR is:

 31                                                     1   0
+--------------------------------------------------------+-----+
|                        UNK/SBZP                        | XED |
+--------------------------------------------------------+-----+

Bits [31:1]
        UNK/SBZP.
XED, bit [0]
        Execution Environment Disable bit. Controls unprivileged access to the ThumbEE Handler
        Base Register:
        0    Unprivileged access permitted.
        1    Unprivileged access disabled.
        The reset value of this bit is 0.

The effects of a write to this register on ThumbEE configuration are only guaranteed to be visible to
subsequent instructions after the execution of an ISB instruction, an exception entry or an exception return.
However, a read of this register always returns the value most recently written to the register.
To access the TEECR, read or write the CP14 registers with an MRC or MCR instruction with <opc1> set to 6,
<CRn> set to c0, <CRm> set to c0, and <opc2> set to 0. For example:
MRC p14, 6, <Rt>, c0, c0, 0    ; Read ThumbEE Configuration Register
MCR p14, 6, <Rt>, c0, c0, 0    ; Write ThumbEE Configuration Register
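For code that emits these accesses at run time, such as a JIT compiler, the following C sketch builds the 32-bit ARM encoding of an MRC or MCR instruction with the AL condition. It assumes the ARM (A1) encoding cond | 1110 | opc1 | L | CRn | Rt | coproc | opc2 | 1 | CRm, with L = 1 for MRC and 0 for MCR, as described in MCR, MCR2 on page A8-186 and MRC, MRC2 on page A8-202; encode_cp_transfer() is a hypothetical helper.

```c
#include <stdbool.h>
#include <stdint.h>

/* Sketch: ARM (A1) encoding of MRC/MCR with condition AL (0b1110). */
static uint32_t encode_cp_transfer(bool is_mrc, unsigned coproc, unsigned opc1,
                                   unsigned rt, unsigned crn, unsigned crm, unsigned opc2)
{
    return 0xEu << 28                      /* cond = AL */
         | 0xEu << 24                      /* 1110 */
         | (opc1 & 7u) << 21
         | (is_mrc ? 1u : 0u) << 20        /* L: 1 = MRC, 0 = MCR */
         | (crn & 0xFu) << 16
         | (rt & 0xFu) << 12
         | (coproc & 0xFu) << 8
         | (opc2 & 7u) << 5
         | 1u << 4
         | (crm & 0xFu);
}
```

For example, the TEECR read shown above with Rt = R0, encode_cp_transfer(true, 14, 6, 0, 0, 0, 0), gives 0xEED00E10.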


ThumbEE Handler Base Register (TEEHBR)
The ThumbEE Handler Base Register (TEEHBR) holds the base address for ThumbEE handlers.
The TEEHBR is:
• a CP14 register
• a 32-bit read/write register, with access rights that depend on the current privilege and the value of
  the TEECR.XED bit:
  — privileged accesses are always permitted
  — when TEECR.XED == 0, unprivileged accesses are permitted
  — when TEECR.XED == 1, the result of an unprivileged access is UNDEFINED.
• when the Security Extensions are implemented, a Common register.
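The access rules for the two ThumbEE registers can be summarized as a predicate. This C sketch is illustrative only: thumbee_access_permitted() and its parameters are hypothetical, and a false result stands for an access whose result is UNDEFINED.

```c
#include <stdbool.h>
#include <stdint.h>

enum thumbee_reg { TEECR, TEEHBR };

/* Returns true if the access is permitted, false if its result is UNDEFINED. */
static bool thumbee_access_permitted(enum thumbee_reg reg, bool is_write,
                                     bool privileged, uint32_t teecr)
{
    if (privileged)
        return true;                 /* privileged reads and writes are always permitted */
    if (reg == TEECR)
        return !is_write;            /* unprivileged TEECR writes are UNDEFINED */
    return (teecr & 1u) == 0;        /* TEEHBR: unprivileged access only when XED == 0 */
}
```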

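The format of the TEEHBR is:

 31                                                     2 1   0
+--------------------------------------------------------+-----+
|                      HandlerBase                       | SBZ |
+--------------------------------------------------------+-----+

HandlerBase, bits [31:2]
        The address of the ThumbEE Handler_00 implementation. This is the address of the first of
        the ThumbEE handlers.
        The reset value of this field is UNKNOWN.
Bits [1:0]
        Reserved, SBZ.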

The effects of a write to this register on ThumbEE handler entry are only guaranteed to be visible to
subsequent instructions after the execution of an ISB instruction, an exception entry or an exception return.
However, a read of this register always returns the value most recently written to the register.
To access the TEEHBR, read or write the CP14 registers with an MRC or MCR instruction with <opc1> set to 6,
<CRn> set to c1, <CRm> set to c0, and <opc2> set to 0. For example:
MRC p14, 6, <Rt>, c1, c0, 0    ; Read ThumbEE Handler Base Register
MCR p14, 6, <Rt>, c1, c0, 0    ; Write ThumbEE Handler Base Register


Use of HandlerBase
ThumbEE handlers are entered by reference to a HandlerBase address, defined by the TEEHBR. See
ThumbEE Handler Base Register (TEEHBR) on page A2-71. Table A2-13 shows how the handlers are
arranged in relation to the value of HandlerBase:
Table A2-13 Access to ThumbEE handlers

Offset from HandlerBase    Name           Value stored
-0x0008                    IndexCheck     Branch to IndexCheck handler
-0x0004                    NullCheck      Branch to NullCheck handler
+0x0000                    Handler_00     Implementation of Handler_00
+0x0020                    Handler_01     Implementation of Handler_01
...                        ...            ...
+(0x0000 + 32n)            Handler_<n>    Implementation of Handler_<n>
...                        ...            Implementation of additional handlers

The IndexCheck occurs when a CHKA instruction detects an index out of range. For more information, see
CHKA on page A9-15.
The NullCheck occurs when any memory access instruction is executed with a value of 0 in the base register.
For more information, see Null checking on page A9-3.
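The layout in Table A2-13 reduces to simple address arithmetic. This C sketch computes the entry points from a TEEHBR value, masking bits [1:0] because they are SBZ; the function names are hypothetical.

```c
#include <stdint.h>

/* Sketch: entry points relative to HandlerBase = TEEHBR bits [31:2]. */
static uint32_t handler_base(uint32_t teehbr)          { return teehbr & ~3u; }
static uint32_t index_check_entry(uint32_t teehbr)     { return handler_base(teehbr) - 8u; }
static uint32_t null_check_entry(uint32_t teehbr)      { return handler_base(teehbr) - 4u; }
static uint32_t handler_entry(uint32_t teehbr, uint32_t n)
{
    return handler_base(teehbr) + 32u * n;              /* Handler_<n> at +32n */
}
```

With HandlerBase = 0x8000, Handler_3 starts at 0x8060, the NullCheck branch is at 0x7FFC and the IndexCheck branch is at 0x7FF8.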

Note
Checks are similar to conditional branches, with the added property that they clear the IT bits when taken.
Other handlers are called using explicit handler call instructions. For details see the following sections:
• HB, HBL on page A9-16
• HBLP on page A9-17
• HBP on page A9-18.


A2.10.2 Jazelle direct bytecode execution support
From ARMv5TEJ, the architecture requires every system to include an implementation of the Jazelle
extension. The Jazelle extension provides architectural support for hardware acceleration of bytecode
execution by a Java Virtual Machine (JVM).
In the simplest implementations of the Jazelle extension, the processor does not accelerate the execution of
any bytecodes, and the JVM uses software routines to execute all bytecodes. Such an implementation is
called a trivial implementation of the Jazelle extension, and has minimal additional cost compared with not
implementing the Jazelle extension at all. An implementation that provides hardware acceleration of
bytecode execution is a non-trivial Jazelle implementation.
These requirements for the Jazelle extension mean a JVM can be written to both:
• function correctly on all processors that include a Jazelle extension implementation
• automatically take advantage of the accelerated bytecode execution provided by a processor that
  includes a non-trivial implementation.

Typically, a non-trivial implementation of the Jazelle extension implements a subset of the bytecodes in
hardware, choosing bytecodes that:
• can have simple hardware implementations
• account for a large percentage of bytecode execution time.
The required features of a non-trivial implementation are:
• provision of the Jazelle state
• a new instruction, BXJ, to enter Jazelle state
• system support that enables an operating system to regulate the use of the Jazelle extension hardware
• system support that enables a JVM to configure the Jazelle extension hardware to its specific needs.
The required features of a trivial implementation are:
• Normally, the Jazelle instruction set state is never entered. If an incorrect exception return causes
  entry to the Jazelle instruction set state, the next instruction executed is treated as UNDEFINED.
• The BXJ instruction behaves as a BX instruction.
• Configuration support that maintains the interface to the Jazelle extension is permanently disabled.

For more information about trivial implementations see Trivial implementation of the Jazelle extension on
page B1-81.
A JVM that has been written to take advantage automatically of hardware-accelerated bytecode execution
is known as an Enabled JVM (EJVM).


Subarchitectures
A processor implementation that includes the Jazelle extension expects the general-purpose register values
and other resources of the ARM processor to conform to an interface standard defined by the Jazelle
implementation when Jazelle state is entered and exited. For example, a specific general-purpose register
might be reserved for use as the pointer to the current bytecode.
For an EJVM and associated debug support to function correctly, they must be written to comply with
the interface standard defined by the acceleration hardware at Jazelle state execution entry and exit points.
An implementation of the Jazelle extension might define other configuration registers in addition to the
architecturally defined ones.
The interface standard and any additional configuration registers used to communicate with the Jazelle
extension are known collectively as the subarchitecture of the implementation. They are not described in
this manual. Only EJVM implementations and debug or similar software can depend on the subarchitecture.
All other software must rely only on the architectural definition of the Jazelle extension given in this manual.
A particular subarchitecture is identified by reading the JIDR described in Jazelle ID Register (JIDR) on
page A2-76.

Jazelle state
While the processor is in Jazelle state, it executes bytecode programs. A bytecode program is defined as an
executable object that comprises one or more class files, or is derived from and functionally equivalent to
one or more class files. See Lindholm and Yellin, The Java Virtual Machine Specification 2nd Edition for
the definition of class files.
While the processor is in Jazelle state, the PC identifies the next JVM bytecode to be executed. A JVM
bytecode is a bytecode defined in Lindholm and Yellin, or a functionally equivalent transformed version of
a bytecode defined in Lindholm and Yellin.
For the Jazelle extension, the functionality of Native methods, as described in Lindholm and Yellin, must be
specified using only instructions from the ARM, Thumb, and ThumbEE instruction sets.
An implementation of the Jazelle extension must not be documented or promoted as performing any task
while it is in Jazelle state other than the acceleration of bytecode programs in accordance with this section
and The Java Virtual Machine Specification.

Jazelle state entry instruction, BXJ
ARMv7 includes an ARM instruction similar to BX. The BXJ instruction has a single register operand that
specifies a target instruction set state, ARM state or Thumb state, and a branch target address for use if entry
to Jazelle state is not available. For more information, see BXJ on page A8-64.
Correct entry into Jazelle state involves the EJVM executing the BXJ instruction at a time when both:
• the Jazelle extension Control and Configuration registers are initialized correctly, see Application
  level configuration and control of the Jazelle extension on page A2-75
• application level registers and any additional configuration registers are initialized as required by the
  subarchitecture of the implementation.

Executing BXJ with Jazelle extension enabled
Executing a BXJ instruction when the JMCR.JE bit is 1, see Jazelle Main Configuration Register (JMCR) on
page A2-77, causes the Jazelle hardware to do one of the following:
• enter Jazelle state and start executing bytecodes directly from a SUBARCHITECTURE DEFINED address
• branch to a SUBARCHITECTURE DEFINED handler.
Which of these occurs is SUBARCHITECTURE DEFINED.
The Jazelle subarchitecture can use Application Level registers (but not System Level registers) to transfer
information between the Jazelle extension and the EJVM. There are SUBARCHITECTURE DEFINED
restrictions on what Application Level registers must contain when a BXJ instruction is executed, and
Application Level registers have SUBARCHITECTURE DEFINED values when Jazelle state execution ends and
ARM or Thumb state execution resumes.
Jazelle subarchitectures and implementations must not use any unallocated bits in Application Level
registers such as the CPSR or FPSCR. All such bits are reserved for future expansion of the ARM
architecture.
Executing BXJ with Jazelle extension disabled
If a BXJ instruction is executed when the JMCR.JE bit is 0, it is executed identically to a BX instruction with
the same register operand.
This means that BXJ instructions can be executed freely when the JMCR.JE bit is 0. In particular, if an EJVM
determines that it is executing on a processor whose Jazelle extension implementation is trivial or uses an
incompatible subarchitecture, it can set JE == 0 and execute correctly. In this case it executes without the
benefit of any Jazelle hardware acceleration that might be present.
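The two cases for BXJ can be modeled in C as seen by software executing in ARM or Thumb state. This is a sketch: model_bxj() and the outcome names are hypothetical, the JE == 1 case is left opaque because its behavior is SUBARCHITECTURE DEFINED, and the JE == 0 case assumes the usual BX interworking rules (bit [0] of the target selects Thumb state, and a target with bits [1:0] == 0b10 is UNPREDICTABLE).

```c
#include <stdint.h>

enum bxj_outcome { BXJ_TO_ARM, BXJ_TO_THUMB, BXJ_UNPREDICTABLE, BXJ_SUBARCH_DEFINED };

struct bxj_result { enum bxj_outcome outcome; uint32_t target; };

/* Sketch of BXJ Rm from ARM or Thumb state, given the current JMCR value. */
static struct bxj_result model_bxj(uint32_t rm, uint32_t jmcr)
{
    struct bxj_result r = { BXJ_SUBARCH_DEFINED, 0 };
    if (jmcr & 1u)                      /* JE == 1: enter Jazelle state or branch */
        return r;                       /* to a handler, SUBARCHITECTURE DEFINED */

    /* JE == 0: executed identically to BX Rm. */
    if (rm & 1u) {                      /* bit [0] set: Thumb state */
        r.outcome = BXJ_TO_THUMB;
        r.target = rm & ~1u;
    } else if ((rm & 2u) == 0) {        /* bits [1:0] == 0b00: ARM state */
        r.outcome = BXJ_TO_ARM;
        r.target = rm;
    } else {
        r.outcome = BXJ_UNPREDICTABLE;  /* bits [1:0] == 0b10 */
    }
    return r;
}
```

For example, with JE == 0, a target of 0x8001 selects Thumb state at 0x8000, exactly as BX would.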

Application level configuration and control of the Jazelle extension
All registers associated with the Jazelle extension are implemented in coprocessor space as part of
coprocessor 14 (CP14). The registers are accessed using the instructions:
• MCR, see MCR, MCR2 on page A8-186
• MRC, see MRC, MRC2 on page A8-202.
In a non-trivial implementation at least three registers are required. These are described in:
• Jazelle ID Register (JIDR) on page A2-76
• Jazelle Main Configuration Register (JMCR) on page A2-77
• Jazelle OS Control Register (JOSCR) on page B1-77.
Additional configuration registers might be provided and are SUBARCHITECTURE DEFINED.
The following rules apply to all Jazelle extension control and configuration registers:
• All configuration registers are accessed by CP14 MRC and MCR instructions with <opc1> set to 7.
• The values contained in configuration registers are changed only by the execution of MCR instructions.
  In particular, they are never changed by Jazelle state execution of bytecodes.
• The access policy for the required registers is fully defined in their descriptions. With unprivileged
  operation:
  — all MCR accesses to the JIDR are UNDEFINED
  — MRC and MCR accesses that are restricted to privileged modes are UNDEFINED.
  The access policy of other configuration registers is SUBARCHITECTURE DEFINED.
• When the Security Extensions are implemented, the registers are common to the Secure and
  Non-secure security states. For more information, see Effect of the Security Extensions on the CP15
  registers on page B3-71. This section applies to some CP14 registers as well as to the CP15 registers.
• When a configuration register is readable, reading the register returns the last value written to it.
  Reading a readable configuration register has no side effects.
  When a configuration register is not readable, attempting to read it returns an UNKNOWN value.
• When a configuration register can be written, the effect of writing to it must be idempotent. That is,
  the overall effect of writing the same value more than once must not differ from the effect of writing
  it once.

Changes to these CP14 registers have the same synchronization requirements as changes to the CP15
registers. These are described in:
• Changes to CP15 registers and the memory order model on page B3-77 for a VMSA implementation
• Changes to CP15 registers and the memory order model on page B4-28 for a PMSA implementation.
For more information, see Jazelle state configuration and control on page B1-77.
Jazelle ID Register (JIDR)
The Jazelle ID Register (JIDR) enables an EJVM to determine the architecture and subarchitecture under
which it is running.
The JIDR is:
• a CP14 register
• a 32-bit read-only register
• accessible during privileged and unprivileged execution
• when the Security Extensions are implemented, a Common register, see Common CP15 registers on
  page B3-74.

The format of the JIDR is:
31

28 27

Architecture

20 19

Implementer

12 11

Subarchitecture

0
SUBARCHITECTURE DEFINED

Architecture, bits [31:28]
Architecture code. This uses the same Architecture code that appears in the Main ID register
in coprocessor 15, see c0, Main ID Register (MIDR) on page B3-81 (VMSA
implementation) or c0, Main ID Register (MIDR) on page B4-32 (PMSA implementation).
Implementer, bits [27:20]
Implementer code of the designer of the subarchitecture. This uses the same Implementer
code that appears in the Main ID register in coprocessor 15, see c0, Main ID Register
(MIDR) on page B3-81 (VMSA implementation) or c0, Main ID Register (MIDR) on
page B4-32 (PMSA implementation).
If the trivial implementation of the Jazelle extension is used, the Implementer code is 0x00.
Subarchitecture, bits [19:12]
Contains the subarchitecture code. The following subarchitecture code is defined:
0x00
Jazelle v1 subarchitecture, or trivial implementation of the Jazelle extension if the Implementer code is 0x00.
Bits [11:0]
Contain additional SUBARCHITECTURE DEFINED information.

To access the JIDR, read the CP14 registers with an MRC instruction with <opc1> set to 7, <CRn> set to c0, <CRm> set to c0, and <opc2> set to 0. For example:
MRC p14, 7, <Rt>, c0, c0, 0 ; Read Jazelle ID register

Jazelle Main Configuration Register (JMCR)
The Jazelle Main Configuration Register (JMCR) controls the Jazelle extension.
The JMCR is:
• a CP14 register
• a 32-bit register, with access rights that depend on the current privilege:
— for privileged operations the register is read/write
— for unprivileged operations, the register is normally write-only
• when the Security Extensions are implemented, a Common register, see Common CP15 registers on page B3-74.

For more information about unprivileged access restrictions see Access to Jazelle registers on page A2-78.


The format of the JMCR is:
Bits [31:1]    SUBARCHITECTURE DEFINED
Bit [0]        JE

Bits [31:1]
SUBARCHITECTURE DEFINED information.
JE, bit [0]
Jazelle Enable bit:
0    Jazelle extension disabled. The BXJ instruction does not cause Jazelle state execution. BXJ behaves exactly as a BX instruction, see Jazelle state entry instruction, BXJ on page A2-74.
1    Jazelle extension enabled.
The reset value of this bit is 0.
To access the JMCR, read or write the CP14 registers with an MRC or MCR instruction with <opc1> set to 7, <CRn> set to c2, <CRm> set to c0, and <opc2> set to 0. For example:
MRC p14, 7, <Rt>, c2, c0, 0 ; Read Jazelle Main Configuration register
MCR p14, 7, <Rt>, c2, c0, 0 ; Write Jazelle Main Configuration register

Access to Jazelle registers
Table A2-14 shows the access permissions for the Jazelle registers, and how unprivileged access to the registers depends on the value of the JOSCR.
Table A2-14 Access to Jazelle registers
Jazelle register | Unprivileged access, JOSCR.CD == 0 a | Unprivileged access, JOSCR.CD == 1 a | Privileged access
JIDR | Read access permitted; Write access ignored | Read and write access UNDEFINED | Read access permitted; Write access ignored
JMCR | Read access UNDEFINED; Write access permitted | Read and write access UNDEFINED | Read and write access permitted
SUBARCHITECTURE DEFINED configuration registers | Read access UNDEFINED; Write access permitted | Read and write access UNDEFINED | Read access SUBARCHITECTURE DEFINED; Write access permitted
a. See Jazelle OS Control Register (JOSCR) on page B1-77.


EJVM operation
The following subsections summarize how an EJVM must operate to meet the requirements of the architecture:
• Initialization
• Bytecode execution
• Jazelle exception conditions
• Other considerations on page A2-80.
Initialization
During initialization, the EJVM must first check which subarchitecture is present, by checking the
Implementer and Subarchitecture codes in the value read from the JIDR.
If the EJVM is incompatible with the subarchitecture, it must do one of the following:
• write a value with JE == 0 to the JMCR
• if unaccelerated bytecode execution is unacceptable, generate an error.
If the EJVM is compatible with the subarchitecture, it must write its required configuration to the JMCR
and any SUBARCHITECTURE DEFINED configuration registers.
Bytecode execution
The EJVM must contain a handler for each bytecode.
The EJVM initiates bytecode execution by executing a BXJ instruction with:
• the register operand specifying the target address of the bytecode handler for the first bytecode of the program
• the Application Level registers set up in accordance with the SUBARCHITECTURE DEFINED interface standard.
The bytecode handler:
• performs the data-processing operations required by the bytecode indicated
• determines the address of the next bytecode to be executed
• determines the address of the handler for that bytecode
• performs a BXJ to that handler address with the registers again set up to the SUBARCHITECTURE DEFINED interface standard.

Jazelle exception conditions
During bytecode execution, the EJVM might encounter SUBARCHITECTURE DEFINED Jazelle exception
conditions that must be resolved by a software handler. For example, in the case of a configuration invalid
handler, the handler rewrites the desired configuration to the JMCR and to any SUBARCHITECTURE DEFINED
configuration registers.


On entry to a Jazelle exception condition handler the contents of the Application Level registers are
SUBARCHITECTURE DEFINED. This interface to the Jazelle exception condition handler might differ from the
interface standard for the bytecode handler, in order to supply information about the Jazelle exception
condition.
The Jazelle exception condition handler:
• resolves the Jazelle exception condition
• determines the address of the next bytecode to be executed
• determines the address of the handler for that bytecode
• performs a BXJ to that handler address with the registers again set up to the SUBARCHITECTURE DEFINED interface standard.

Other considerations
To ensure correct application execution and correct interaction with an operating system, an EJVM must only perform operations that are permitted in unprivileged operation. In particular, for register accesses it must only:
• read the JIDR
• write to the JMCR and other configuration registers.
An EJVM must not attempt to access the JOSCR.


A2.11 Exceptions, debug events and checks
ARMv7 uses the following terms to describe various types of exceptional condition:
Exceptions

In the ARM architecture, exceptions cause entry into a privileged mode and execution of a
software handler for the exception.

Note
The terms floating-point exception and Jazelle exception condition do not use this meaning
of exception. These terms are described later in this list.
Exceptions include:
• reset
• interrupts
• memory system aborts
• undefined instructions
• supervisor calls (SVCs).
Most details of exception handling are not visible to application-level code, and are described in Exceptions on page B1-30. Aspects that are visible to application-level code are:
• The SVC instruction causes an SVC exception. This provides a mechanism for unprivileged code to make a call to the operating system (or other privileged component of the software system).
• If the Security Extensions are implemented, the SMC instruction causes an SMC exception, but only if it is executed in a privileged mode. Unprivileged code can only cause SMC exceptions to occur by methods defined by the operating system (or other privileged component of the software system).
• The WFI instruction provides a hint that nothing needs to be done until an interrupt or similar exception is taken, see Wait For Interrupt on page B1-47. This permits the processor to enter a low-power state until that happens.
• The WFE instruction provides a hint that nothing needs to be done until either an event is generated by an SEV instruction or an interrupt or similar exception is taken, see Wait For Event and Send Event on page B1-44. This permits the processor to enter a low-power state until one of these happens.
• The YIELD instruction provides a hint that the current execution thread is of low importance, see The Yield instruction on page A2-82.

Floating-point exceptions
These relate to exceptional conditions encountered during floating-point arithmetic, such as
division by zero or overflow. For more information see:
•
Floating-point exceptions on page A2-42
•
Floating-point Status and Control Register (FPSCR) on page A2-28
•
ANSI/IEEE Std. 754-1985, IEEE Standard for Binary Floating-Point Arithmetic.


Jazelle exception conditions
These are conditions that cause Jazelle hardware acceleration to exit into a software handler, as described in Jazelle exception conditions on page A2-79.
Debug events
These are conditions that cause a debug system to take action. Most aspects of debug events are not visible to application-level code, and are described in Chapter C3 Debug Events. Aspects that are visible to application-level code include:
• The BKPT instruction causes a BKPT Instruction debug event to occur, see BKPT Instruction debug events on page C3-20.
• The DBG instruction provides a hint to the debug system.
Checks
These are provided in the ThumbEE extension. A check causes an unconditional branch to a specific handler entry point. The base address of the ThumbEE check handlers is held in the TEEHBR, see ThumbEE Handler Base Register (TEEHBR) on page A2-71.

A2.11.1 The Yield instruction
In a Symmetric Multi-Threading (SMT) design, a thread can use a Yield instruction to give a hint to the
processor that it is running on. The Yield hint indicates that whatever the thread is currently doing is of low
importance, and so could yield. For example, the thread might be sitting in a spin-lock. Similar behavior
might be used to modify the arbitration priority of the snoop bus in a multiprocessor (MP) system. Defining
such an instruction permits binary compatibility between SMT and SMP systems.
ARMv7 defines a YIELD instruction as a specific NOP-hint instruction, see YIELD on page A8-812.
The YIELD instruction has no effect in a single-threaded system, but developers of such systems can use the
instruction to flag its intended use on migration to a multiprocessor or multithreading system. Operating
systems can use YIELD in places where a yield hint is wanted, knowing that it will be treated as a NOP if there
is no implementation benefit.


Chapter A3
Application Level Memory Model

This chapter gives an application level view of the memory model. It contains the following sections:
• Address space on page A3-2
• Alignment support on page A3-4
• Endian support on page A3-7
• Synchronization and semaphores on page A3-12
• Memory types and attributes and the memory order model on page A3-24
• Access rights on page A3-38
• Virtual and physical addressing on page A3-40
• Memory access order on page A3-41
• Caches and memory hierarchy on page A3-51.


A3.1 Address space
The ARM architecture uses a single, flat address space of 2^32 8-bit bytes. Byte addresses are treated as unsigned numbers, running from 0 to 2^32 - 1. The address space is also regarded as:
• 2^30 32-bit words:
— the address of each word is word-aligned, meaning that the address is divisible by 4 and the last two bits of the address are 0b00
— the word at word-aligned address A consists of the four bytes with addresses A, A+1, A+2 and A+3.
• 2^31 16-bit halfwords:
— the address of each halfword is halfword-aligned, meaning that the address is divisible by 2 and the last bit of the address is 0
— the halfword at halfword-aligned address A consists of the two bytes with addresses A and A+1.
In some situations the ARM architecture supports accesses to halfwords and words that are not aligned to the appropriate access size, see Alignment support on page A3-4.
Normally, address calculations are performed using ordinary integer instructions. This means that the address wraps around if the calculation overflows or underflows the address space. Another way of describing this is that any address calculation is reduced modulo 2^32.

A3.1.1 Address incrementing and address space overflow
When a processor performs normal sequential execution of instructions, it effectively calculates:
(address_of_current_instruction) + (size_of_executed_instruction)

after each instruction to determine which instruction to execute next.

Note
The size of the executed instruction depends on the current instruction set, and might depend on the
instruction executed.
If this address calculation overflows the top of the address space, the result is UNPREDICTABLE. In other words, a program must not rely on sequential execution of the instruction at address 0x00000000 after the instruction at address:
• 0xFFFFFFFC, when a 4-byte instruction is executed
• 0xFFFFFFFE, when a 2-byte instruction is executed
• 0xFFFFFFFF, when a single byte instruction is executed.


This UNPREDICTABLE behavior only applies to instructions that are executed, including those that fail their
condition code check. Most ARM implementations prefetch instructions ahead of the currently-executing
instruction. If this prefetching overflows the top of the address space, it does not cause UNPREDICTABLE
behavior unless a prefetched instruction with an overflowed address is actually executed.
LDC, LDM, LDRD, POP, PUSH, STC, STRD, and STM instructions access a sequence of words at increasing memory
addresses, effectively incrementing the memory address by 4 for each load or store. If this calculation
overflows the top of the address space, the result is UNPREDICTABLE. In other words, programs must not use
these instructions in such a way that they attempt to access the word at address 0x00000000 sequentially after
the word at address 0xFFFFFFFC.

Note
In some cases instructions that operate on multiple words can decrement the memory address by 4 after each
word access. If this calculation underflows the address space, by decrementing the address 0x00000000, the
result is UNPREDICTABLE.
The behavior of any unaligned load or store with a calculated address that would access the byte at
0xFFFFFFFF and the byte at address 0x00000000 as part of the instruction is UNPREDICTABLE.


A3.2 Alignment support
Instructions in the ARM architecture are aligned as follows:
• ARM instructions are word-aligned
• Thumb and ThumbEE instructions are halfword-aligned
• Java bytecodes are byte-aligned.
The data alignment behavior supported by the ARM architecture has changed significantly between ARMv4 and ARMv7. This behavior is indicated by the SCTLR.U bit, see:
• c1, System Control Register (SCTLR) on page B3-96 for a VMSAv7 implementation
• c1, System Control Register (SCTLR) on page B4-45 for a PMSAv7 implementation
• c1, System Control Register (SCTLR) on page AppxG-34 for architecture versions before ARMv7.
This bit defines the alignment behavior of the memory system for data accesses. Table A3-1 shows the values of SCTLR.U for the different architecture versions.
Table A3-1 SCTLR.U bit values for different architecture versions
Architecture version | SCTLR.U value
Before ARMv6 | 0
ARMv6 | 0 or 1
ARMv7 | 1

On an ARMv6 processor, the SCTLR.U bit indicates which of two possible alignment models is selected:
U == 0

The processor implements the legacy alignment model. This is described in Alignment on
page AppxG-6.

Note
The use of U == 0 is deprecated in ARMv6T2, and is obsolete from ARMv7.
U == 1

The processor implements the alignment model described in this section. This model
supports unaligned data accesses.

ARMv7 requires the processor to implement the alignment model described in this section.


A3.2.1 Unaligned data access
An ARMv7 implementation must support unaligned data accesses. The SCTLR.U bit is RAO to indicate this support. The SCTLR.A bit, the strict alignment bit, controls whether strict alignment is required. The checking of load and store alignment depends on the value of this bit. For more information, see c1, System Control Register (SCTLR) on page B3-96 for a VMSA implementation, or c1, System Control Register (SCTLR) on page B4-45 for a PMSA implementation.
Table A3-2 shows how the checking of load and store alignment depends on the instruction type and the value of SCTLR.A.
Table A3-2 Alignment requirements of load/store instructions
Instructions | Alignment check | Result if check fails when SCTLR.A == 0 | Result if check fails when SCTLR.A == 1
LDRB, LDREXB, LDRBT, LDRSB, LDRSBT, STRB, STREXB, STRBT, SWPB, TBB | None | - | -
LDRH, LDRHT, LDRSH, LDRSHT, STRH, STRHT, TBH | Halfword | Unaligned access | Alignment fault
LDREXH, STREXH | Halfword | Alignment fault | Alignment fault
LDR, LDRT, STR, STRT | Word | Unaligned access | Alignment fault
LDREX, STREX | Word | Alignment fault | Alignment fault
LDREXD, STREXD | Doubleword | Alignment fault | Alignment fault
All forms of LDM, LDRD, PUSH, POP, RFE, SRS, all forms of STM, STRD, SWP | Word | Alignment fault | Alignment fault
LDC, LDC2, STC, STC2 | Word | Alignment fault | Alignment fault
VLDM, VLDR, VSTM, VSTR | Word | Alignment fault | Alignment fault
VLD1, VLD2, VLD3, VLD4, VST1, VST2, VST3, VST4, all with standard alignment a | Element size | Unaligned access | Alignment fault
VLD1, VLD2, VLD3, VLD4, VST1, VST2, VST3, VST4, all with @<align> specified a | As specified by @<align> | Alignment fault | Alignment fault
a. These element and structure load/store instructions are only in the Advanced SIMD extension to the ARMv7 ARM and Thumb instruction sets. ARMv7 does not support the pre-ARMv6 alignment model, so you cannot use that model with these instructions.


A3.2.2 Cases where unaligned accesses are UNPREDICTABLE
The following cases cause the resulting unaligned accesses to be UNPREDICTABLE, and overrule any successful load or store behavior described in Unaligned data access on page A3-5:
• Any load instruction that is not faulted by the alignment restrictions and that loads the PC has UNPREDICTABLE behavior if the address it loads from is not word-aligned.
• Any unaligned access that is not faulted by the alignment restrictions and that accesses memory with the Strongly-ordered or Device attribute has UNPREDICTABLE behavior.
Note
These memory attributes are described in Memory types and attributes and the memory order model on page A3-24.

A3.2.3 Unaligned data access restrictions in ARMv7 and ARMv6
ARMv7 and ARMv6 have the following restrictions on unaligned data accesses:
• Accesses are not guaranteed to be single-copy atomic, see Atomicity in the ARM architecture on page A3-26. An access can be synthesized out of a series of aligned operations in a shared memory system without guaranteeing locked transaction cycles.
• Unaligned accesses typically take a number of additional cycles to complete compared to a naturally aligned transfer. The real-time implications must be analyzed carefully and key data structures might need to have their alignment adjusted for optimum performance.
• If an unaligned access occurs across a page boundary, the operation can abort on either or both halves of the access.
Shared memory schemes must not rely on seeing monotonic updates of non-aligned data of loads and stores for data items larger than byte wide. For more information, see Atomicity in the ARM architecture on page A3-26.
Unaligned access operations must not be used for accessing Device memory-mapped registers. They must only be used with care in shared memory structures that are protected by aligned semaphores or synchronization variables.


A3.3 Endian support
The rules in Address space on page A3-2 require that for a word-aligned address A:
• the word at address A consists of the bytes at addresses A, A+1, A+2 and A+3
• the halfword at address A consists of the bytes at addresses A and A+1
• the halfword at address A+2 consists of the bytes at addresses A+2 and A+3
• the word at address A therefore consists of the halfwords at addresses A and A+2.
However, this does not specify completely the mappings between words, halfwords, and bytes.
A memory system uses one of the two following mapping schemes. This choice is known as the endianness of the memory system.
In a little-endian memory system:
• the byte or halfword at a word-aligned address is the least significant byte or halfword in the word at that address
• the byte at a halfword-aligned address is the least significant byte in the halfword at that address.
In a big-endian memory system:
• the byte or halfword at a word-aligned address is the most significant byte or halfword in the word at that address
• the byte at a halfword-aligned address is the most significant byte in the halfword at that address.
For a word-aligned address A, Table A3-3 and Table A3-4 on page A3-8 show the relationship between:
• the word at address A
• the halfwords at addresses A and A+2
• the bytes at addresses A, A+1, A+2 and A+3.
Table A3-3 shows this relationship for a big-endian memory system, and Table A3-4 on page A3-8 shows the relationship for a little-endian memory system.
Table A3-3 Big-endian memory system
Byte lanes of the word: MSByte | MSByte - 1 | LSByte + 1 | LSByte
Word: Word at Address A (all four byte lanes)
Halfwords: Halfword at Address A (MSByte, MSByte - 1) | Halfword at Address A+2 (LSByte + 1, LSByte)
Bytes: Byte at Address A | Byte at Address A+1 | Byte at Address A+2 | Byte at Address A+3

Table A3-4 Little-endian memory system
Byte lanes of the word: MSByte | MSByte - 1 | LSByte + 1 | LSByte
Word: Word at Address A (all four byte lanes)
Halfwords: Halfword at Address A+2 (MSByte, MSByte - 1) | Halfword at Address A (LSByte + 1, LSByte)
Bytes: Byte at Address A+3 | Byte at Address A+2 | Byte at Address A+1 | Byte at Address A

The big-endian and little-endian mapping schemes determine the order in which the bytes of a word or
halfword are interpreted. For example, a load of a word (4 bytes) from address 0x1000 always results in an
access of the bytes at memory locations 0x1000, 0x1001, 0x1002, and 0x1003. The endianness mapping scheme
determines the significance of these four bytes.

A3.3.1 Control of the endianness mapping scheme in ARMv7
In ARMv7-A, the mapping of instruction memory is always little-endian. In ARMv7-R, instruction
endianness can be controlled at the system level, see Instruction endianness.
For information about data memory endianness control, see ENDIANSTATE on page A2-19.

Note
Versions of the ARM architecture before ARMv7 had a different mechanism to control the endianness, see
Endian configuration and control on page AppxG-20.

A3.3.2 Instruction endianness
Before ARMv7, the ARM architecture included legacy support for an alternative big-endian memory model,
described as BE-32 and controlled by the B bit, bit [7], of the SCTLR, see c1, System Control Register
(SCTLR) on page AppxG-34. ARMv7 does not support BE-32 operation, and bit [7] of the SCTLR is RAZ.
Where legacy object code for ARM processors contains instructions with a big-endian byte order, the
removal of support for BE-32 operation requires the instructions in the object files to have their bytes
reversed for the code to be executed on an ARMv7 processor. This means that:
• each Thumb instruction, whether a 32-bit Thumb instruction or a 16-bit Thumb instruction, must have the byte order of each halfword of instruction reversed
• each ARM instruction must have the byte order of each word of instruction reversed.

For most situations, this can be handled in the link stage of a tool-flow, provided the object files include
sufficient information to permit this to happen. In practice, this is the situation for all applications with the
ARMv7-A profile.


For applications of the ARMv7-R profile, there are some legacy code situations where the arrangement of
the bytes in the object files cannot be adjusted by the linker. For these object files to be used by an ARMv7-R
processor the byte order of the instructions must be reversed by the processor at runtime. Therefore, the
ARMv7-R profile permits configuration of the instruction endianness.

Instruction endianness static configuration, ARMv7-R only
To provide support for legacy big-endian object code, the ARMv7-R profile supports optional byte order
reversal hardware as a static option from reset. The ARMv7-R profile includes a read-only bit in the CP15
Control Register, SCTLR.IE, bit [31]. For more information, see c1, System Control Register (SCTLR) on
page B4-45.

A3.3.3 Element size and endianness
The effect of the endianness mapping on data transfers depends on the size of the data element or elements transferred by the load/store instructions. Table A3-5 lists the element sizes of all the load/store instructions, for all instruction sets.
Table A3-5 Element size of load/store instructions
Instructions | Element size
LDRB, LDREXB, LDRBT, LDRSB, LDRSBT, STRB, STREXB, STRBT, SWPB, TBB | Byte
LDRH, LDREXH, LDRHT, LDRSH, LDRSHT, STRH, STREXH, STRHT, TBH | Halfword
LDR, LDRT, LDREX, STR, STRT, STREX | Word
LDRD, LDREXD, STRD, STREXD | Word
All forms of LDM, PUSH, POP, RFE, SRS, all forms of STM, SWP | Word
LDC, LDC2, STC, STC2, VLDM, VLDR, VSTM, VSTR | Word
VLD1, VLD2, VLD3, VLD4, VST1, VST2, VST3, VST4 | Element size of the Advanced SIMD access

A3.3.4 Instructions to reverse bytes in a general-purpose register
An application or device driver might have to interface to memory-mapped peripheral registers or shared memory structures that are not the same endianness as the internal data structures. Similarly, the endianness of the operating system might not match that of the peripheral registers or shared memory. In these cases, the processor requires an efficient method to explicitly transform the endianness of the data.
In ARMv7, the ARM and Thumb instruction sets provide this functionality. There are instructions to:
• Reverse word (four bytes) register, for transforming big- and little-endian 32-bit representations. See REV on page A8-272.
• Reverse halfword and sign-extend, for transforming signed 16-bit representations. See REVSH on page A8-276.
• Reverse packed halfwords in a register, for transforming big- and little-endian 16-bit representations. See REV16 on page A8-274.

A3.3.5 Endianness in Advanced SIMD
Advanced SIMD element load/store instructions transfer vectors of elements between memory and the
Advanced SIMD register bank. An instruction specifies both the length of the transfer and the size of the
data elements being transferred. This information is used by the processor to load and store data correctly
in both big-endian and little-endian systems.
Consider, for example, the instruction:
VLD1.16 {D0}, [R1]

This loads a 64-bit register with four 16-bit values. The four elements appear in the register in array order,
with the lowest indexed element fetched from the lowest address. The order of bytes in the elements depends
on the endianness configuration, as shown in Figure A3-1. Therefore, the order of the elements in the
registers is the same regardless of the endianness configuration. This means that Advanced SIMD code is
usually independent of endianness.
64-bit register containing four 16-bit elements (bit 63 on the left, bit 0 on the right):
D[15:8] D[7:0] C[15:8] C[7:0] B[15:8] B[7:0] A[15:8] A[7:0]
Both memory systems below load this register with VLD1.16 {D0}, [R1]. Memory contents, as byte address and byte:
Memory system with Little endian addressing (LE): 0 A[7:0], 1 A[15:8], 2 B[7:0], 3 B[15:8], 4 C[7:0], 5 C[15:8], 6 D[7:0], 7 D[15:8]
Memory system with Big endian addressing (BE): 0 A[15:8], 1 A[7:0], 2 B[15:8], 3 B[7:0], 4 C[15:8], 5 C[7:0], 6 D[15:8], 7 D[7:0]
Figure A3-1 Advanced SIMD byte order example
The Advanced SIMD extension supports Little-Endian (LE) and Big-Endian (BE) models.
For information about the alignment of Advanced SIMD instructions see Unaligned data access on
page A3-5.


Note
Advanced SIMD is an extension to the ARMv7 ARM and Thumb instruction sets. In ARMv7, the SCTLR.B
bit always has the value 0, indicating that ARMv7 does not support the legacy BE-32 endianness model, and
you cannot use this model with Advanced SIMD element and structure load/store instructions.


A3.4 Synchronization and semaphores
In architecture versions before ARMv6, support for the synchronization of shared memory depends on the SWP and SWPB instructions. These are read-locked-write operations that swap register contents with memory, and are described in SWP, SWPB on page A8-432. These instructions support basic busy/free semaphore mechanisms, but do not support mechanisms that require calculation to be performed on the semaphore between the read and write phases.
ARMv6 introduced a new mechanism to support more comprehensive non-blocking synchronization of shared memory, using synchronization primitives that scale for multiprocessor system designs. ARMv6 provided a pair of synchronization primitives, LDREX and STREX. ARMv7 extends the new model by:
• adding byte, halfword and doubleword versions of the synchronization primitives
• adding a Clear-Exclusive instruction, CLREX
• adding the synchronization primitives to the Thumb instruction set.

Note
From ARMv6, use of the SWP and SWPB instructions is deprecated. ARM strongly recommends that all
software migrates to using the new synchronization primitives described in this section.
In ARMv7, the synchronization primitives provided in the ARM and Thumb instruction sets are:
• Load-Exclusives:
— LDREX, see LDREX on page A8-142
— LDREXB, see LDREXB on page A8-144
— LDREXD, see LDREXD on page A8-146
— LDREXH, see LDREXH on page A8-148
• Store-Exclusives:
— STREX, see STREX on page A8-400
— STREXB, see STREXB on page A8-402
— STREXD, see STREXD on page A8-404
— STREXH, see STREXH on page A8-406
• Clear-Exclusive, CLREX, see CLREX on page A8-70.

Note
This section describes the operation of a Load-Exclusive/Store-Exclusive pair of synchronization primitives using, as examples, the LDREX and STREX instructions. The same description applies to any other pair of synchronization primitives:
• LDREXB used with STREXB
• LDREXD used with STREXD
• LDREXH used with STREXH.
Each Load-Exclusive instruction must be used only with the corresponding Store-Exclusive instruction.


The model for the use of a Load-Exclusive/Store-Exclusive instruction pair, accessing a non-aborting
memory address x is:
• The Load-Exclusive instruction reads a value from memory address x.
• The corresponding Store-Exclusive instruction succeeds in writing back to memory address x only if no other observer, process, or thread has performed a more recent store of address x. The Store-Exclusive operation returns a status bit that indicates whether the memory write succeeded.

A Load-Exclusive instruction tags a small block of memory for exclusive access. The size of the tagged
block is IMPLEMENTATION DEFINED, see Tagging and the size of the tagged memory block on page A3-20.
A Store-Exclusive instruction to the same address clears the tag.

Note
In this section, the term processor includes any observer that can generate a Load-Exclusive or a
Store-Exclusive.

A3.4.1 Exclusive access instructions and Non-shareable memory regions
For memory regions that do not have the Shareable attribute, the exclusive access instructions rely on a local
monitor that tags any address from which the processor executes a Load-Exclusive. Any non-aborted
attempt by the same processor to use a Store-Exclusive to modify any address is guaranteed to clear the tag.
A Load-Exclusive performs a load from memory, and:
• the executing processor tags the physical memory address for exclusive access
• the local monitor of the executing processor transitions to its Exclusive Access state.
A Store-Exclusive performs a conditional store to memory that depends on the state of the local monitor:
If the local monitor is in its Exclusive Access state
• If the address of the Store-Exclusive is the same as the address that has been tagged
in the monitor by an earlier Load-Exclusive, then the store takes place; otherwise it
is IMPLEMENTATION DEFINED whether the store takes place.
• A status value is returned to a register:
— if the store took place the status value is 0
— otherwise, the status value is 1.
• The local monitor of the executing processor transitions to its Open Access state.
If the local monitor is in its Open Access state
• no store takes place
• a status value of 1 is returned to a register
• the local monitor remains in its Open Access state.
The Store-Exclusive instruction defines the register to which the status value is returned.
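These rules, together with Table A3-6, describe a small state machine. The Python sketch below encodes it for one processor so that the transitions can be exercised directly. `LocalMonitor` and its methods are hypothetical names. Where the architecture leaves a choice IMPLEMENTATION DEFINED, the sketch exposes it as a constructor flag instead of fixing it, and the granule parameter `a` follows Tagging and the size of the tagged memory block on page A3-20.

```python
OPEN, EXCLUSIVE = "Open Access", "Exclusive Access"


class LocalMonitor:
    """Sketch of one processor's local monitor for Non-shareable memory.

    Follows the rules of A3.4.1 and Table A3-6. The two IMPLEMENTATION DEFINED
    choices are constructor flags with hypothetical names:
      strex_mismatch_stores: whether StoreExcl(!t) updates memory
      store_clears_tag:      whether Store(t) moves the monitor to Open Access
    """

    def __init__(self, a=4, strex_mismatch_stores=False, store_clears_tag=False):
        self.a = a                    # tag granule: Tagged_address = address[31:a]
        self.state = OPEN
        self.tag = None
        self.strex_mismatch_stores = strex_mismatch_stores
        self.store_clears_tag = store_clears_tag
        self.mem = {}

    def _tag_of(self, addr):
        return (addr & 0xFFFFFFFF) >> self.a

    def load_excl(self, addr):
        self.tag = self._tag_of(addr)
        self.state = EXCLUSIVE
        return self.mem.get(addr, 0)

    def store_excl(self, addr, value):
        """Return the status value: 0 if memory was updated, 1 otherwise."""
        if self.state == OPEN:
            return 1                  # no store; monitor stays in Open Access
        matches = self._tag_of(addr) == self.tag
        self.state = OPEN             # every StoreExcl from Exclusive Access ends in Open Access
        if matches or self.strex_mismatch_stores:
            self.mem[addr] = value
            return 0
        return 1

    def store(self, addr, value):
        """Any store other than a Store-Exclusive."""
        self.mem[addr] = value
        if (self.state == EXCLUSIVE and self.store_clears_tag
                and self._tag_of(addr) == self.tag):
            self.state = OPEN

    def clrex(self):
        self.state = OPEN
        self.tag = None


mon = LocalMonitor()
r_open = mon.store_excl(0x1000, 1)     # Open Access: no store, status 1
mon.load_excl(0x1000)
r_pair = mon.store_excl(0x1000, 2)     # matching pair: store, status 0
r_again = mon.store_excl(0x1000, 3)    # no intervening LoadExcl: status 1
mon.load_excl(0x1000)
mon.clrex()
r_cleared = mon.store_excl(0x1000, 4)  # CLREX returned the monitor to Open Access
print(r_open, r_pair, r_again, r_cleared, mon.mem[0x1000])  # 1 0 1 1 2
```

Building the monitor with `strex_mismatch_stores=True` or `store_clears_tag=True` selects the other permitted behavior, so code can be tested against both alternatives.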


When a processor writes using any instruction other than a Store-Exclusive:
• if the write is to a physical address that is not covered by its local monitor, the write does not affect
the state of the local monitor
• if the write is to a physical address that is covered by its local monitor, it is IMPLEMENTATION DEFINED
whether the write affects the state of the local monitor.

If the local monitor is in its Exclusive Access state and the processor performs a Store-Exclusive to any
address other than the last one from which it performed a Load-Exclusive, it is IMPLEMENTATION DEFINED
whether the store updates memory, but in all cases the local monitor is reset to its Open Access state. This
mechanism:
• is used on a context switch, see Context switch support on page A3-21
• must be treated as a software programming error in all other cases.

Note
It is UNPREDICTABLE whether a store to a tagged physical address causes a tag in the local monitor to be
cleared if that store is by an observer other than the one that caused the physical address to be tagged.
Figure A3-2 shows the state machine for the local monitor. Table A3-6 on page A3-15 shows the effect of
each of the operations shown in the figure.
[Figure A3-2: local monitor state diagram with two states, Open Access and Exclusive Access.
Open Access to Open Access: CLREX, StoreExcl(x), Store(x).
Open Access to Exclusive Access: LoadExcl(x).
Exclusive Access to Exclusive Access: LoadExcl(x), Store(!Tagged_address), Store(Tagged_address) *.
Exclusive Access to Open Access: CLREX, StoreExcl(Tagged_address), StoreExcl(!Tagged_address), Store(Tagged_address) *.]
Operations marked * are possible alternative IMPLEMENTATION DEFINED options.
In the diagram: LoadExcl represents any Load-Exclusive instruction
StoreExcl represents any Store-Exclusive instruction
Store represents any other store instruction.
Any LoadExcl operation updates the tagged address to the most significant bits of the address x used
for the operation. For more information see Tagging and the size of the tagged memory block on page A3-20.

Figure A3-2 Local monitor state machine diagram


Note
For the local monitor state machine, as shown in Figure A3-2 on page A3-14:
• The IMPLEMENTATION DEFINED options for the local monitor are consistent with the local monitor
being constructed so that it does not hold any physical address, but instead treats any access as
matching the address of the previous LoadExcl.
• A local monitor implementation can be unaware of Load-Exclusive and Store-Exclusive operations
from other processors.
• It is UNPREDICTABLE whether the transition from Exclusive Access to Open Access state occurs when
the Store or StoreExcl is from another observer.

Table A3-6 shows the effect of the operations shown in Figure A3-2 on page A3-14.
Table A3-6 Effect of Exclusive instructions and write operations on local monitor
Initial state | Operation (a) | Effect | Final state
Open Access | CLREX | No effect | Open Access
Open Access | StoreExcl(x) | Does not update memory, returns status 1 | Open Access
Open Access | LoadExcl(x) | Loads value from memory, tags address x | Exclusive Access
Open Access | Store(x) | Updates memory, no effect on monitor | Open Access
Exclusive Access | CLREX | Clears tagged address | Open Access
Exclusive Access | StoreExcl(t) | Updates memory, returns status 0 | Open Access
Exclusive Access | StoreExcl(!t) | Updates memory, returns status 0 (b) | Open Access
Exclusive Access | StoreExcl(!t) | Does not update memory, returns status 1 (b) | Open Access
Exclusive Access | LoadExcl(x) | Loads value from memory, changes tag to address x | Exclusive Access
Exclusive Access | Store(!t) | Updates memory, no effect on monitor | Exclusive Access
Exclusive Access | Store(t) | Updates memory | Exclusive Access (b) or Open Access (b)
a. In the table:
LoadExcl represents any Load-Exclusive instruction
StoreExcl represents any Store-Exclusive instruction
Store represents any store operation other than a Store-Exclusive operation.
t is the tagged address, bits [31:a] of the address of the last Load-Exclusive instruction. For more information, see
Tagging and the size of the tagged memory block on page A3-20.
b. IMPLEMENTATION DEFINED alternative actions.


A3.4.2 Exclusive access instructions and Shareable memory regions
For memory regions that have the Shareable attribute, exclusive access instructions rely on:
• A local monitor for each processor in the system that tags any address from which the processor
executes a Load-Exclusive. The local monitor operates as described in Exclusive access instructions
and Non-shareable memory regions on page A3-13, except that for Shareable memory any
Store-Exclusive is then subject to checking by the global monitor if it is described in that section as
doing at least one of:
— updating memory
— returning a status value of 0.
The local monitor can ignore exclusive accesses from other processors in the system.
• A global monitor that tags a physical address as exclusive access for a particular processor. This tag
is used later to determine whether a Store-Exclusive to that address that has not been failed by the
local monitor can occur. Any successful write to the tagged address by any other observer in the
shareability domain of the memory location is guaranteed to clear the tag. For each processor in the
system, the global monitor:
— holds a single tagged address
— maintains a state machine.

The global monitor can either reside in a processor block or exist as a secondary monitor at the memory
interfaces.

Note
An implementation can combine the functionality of the global and local monitors into a single unit.

Operation of the global monitor
Load-Exclusive from Shareable memory performs a load from memory, and causes the physical address of
the access to be tagged as exclusive access for the requesting processor. This access also causes the exclusive
access tag to be removed from any other physical address that has been tagged by the requesting processor.
The global monitor only supports a single outstanding exclusive access to Shareable memory per processor.
Store-Exclusive performs a conditional store to memory:
• The store is guaranteed to succeed only if the physical address accessed is tagged as exclusive access
for the requesting processor and both the local monitor and the global monitor state machines for the
requesting processor are in the Exclusive Access state. In this case:
— a status value of 0 is returned to a register to acknowledge the successful store
— the final state of the global monitor state machine for the requesting processor is
IMPLEMENTATION DEFINED
— if the address accessed is tagged for exclusive access in the global monitor state machine for
any other processor then that state machine transitions to Open Access state.
• If no address is tagged as exclusive access for the requesting processor, the store does not succeed:
— a status value of 1 is returned to a register to indicate that the store failed
— the global monitor is not affected and remains in Open Access state for the requesting
processor.
• If a different physical address is tagged as exclusive access for the requesting processor, it is
IMPLEMENTATION DEFINED whether the store succeeds or not:
— if the store succeeds a status value of 0 is returned to a register, otherwise a value of 1 is
returned
— if the global monitor state machine for the processor was in the Exclusive Access state before
the Store-Exclusive it is IMPLEMENTATION DEFINED whether that state machine transitions to
the Open Access state.
The Store-Exclusive instruction defines the register to which the status value is returned.
In a shared memory system, the global monitor implements a separate state machine for each processor in
the system. The state machine for accesses to Shareable memory by processor (n) can respond to all the
Shareable memory accesses visible to it. This means it responds to:
• accesses generated by the associated processor (n)
• accesses generated by the other observers in the shareability domain of the memory location (!n).
In a shared memory system, the global monitor implements a separate state machine for each observer that
can generate a Load-Exclusive or a Store-Exclusive in the system.
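The per-processor state machines can be sketched in Python to show how one processor's successful Store-Exclusive, or a plain store, clears another processor's tag. This is a hypothetical model, not an architectural definition: `GlobalMonitor` and its methods are invented names, each Store-Exclusive is assumed to have already passed its local monitor, and where the rules above and Table A3-7 permit alternatives the sketch picks one, as listed in its docstring.

```python
OPEN, EXCLUSIVE = "Open", "Exclusive"


class GlobalMonitor:
    """Sketch of a global monitor for Shareable memory, one state machine per processor.

    Where A3.4.2 and Table A3-7 allow a choice, this model fixes one (assumption):
      * a StoreExcl by processor n to an address other than its tag fails, and n -> Open
      * after a successful StoreExcl(t, n), n -> Open
      * Store(t, n) by the tagging processor itself leaves n in Exclusive
    It also assumes each StoreExcl has already passed the local monitor.
    """

    def __init__(self, nprocs, a=4):
        self.a = a
        self.state = [OPEN] * nprocs
        self.tag = [None] * nprocs
        self.mem = {}

    def _t(self, addr):
        return (addr & 0xFFFFFFFF) >> self.a

    def _clear_others(self, n, addr):
        # A write to a tagged address by another observer clears that observer's tag.
        for m in range(len(self.state)):
            if m != n and self.state[m] == EXCLUSIVE and self.tag[m] == self._t(addr):
                self.state[m] = OPEN

    def load_excl(self, n, addr):
        self.tag[n] = self._t(addr)   # replaces any earlier tag held for processor n
        self.state[n] = EXCLUSIVE
        return self.mem.get(addr, 0)

    def store_excl(self, n, addr, value):
        """Return the status value: 0 if memory was updated, 1 otherwise."""
        ok = self.state[n] == EXCLUSIVE and self.tag[n] == self._t(addr)
        self.state[n] = OPEN
        if not ok:
            return 1
        self.mem[addr] = value
        self._clear_others(n, addr)
        return 0

    def store(self, n, addr, value):
        """Any store other than a Store-Exclusive, by processor n."""
        self.mem[addr] = value
        self._clear_others(n, addr)


gm = GlobalMonitor(nprocs=2)
gm.load_excl(0, 0x8000)
gm.load_excl(1, 0x8000)
s0 = gm.store_excl(0, 0x8000, 1)  # processor 0 succeeds, which clears processor 1's tag
s1 = gm.store_excl(1, 0x8000, 1)  # processor 1 now fails
gm.load_excl(0, 0x8000)
gm.store(1, 0x8004, 5)            # ordinary store to the same 16-byte granule
s2 = gm.store_excl(0, 0x8000, 2)  # fails: the granule was written by processor 1
print(s0, s1, s2, gm.mem[0x8000])  # 0 1 1 1
```

The last step shows a store to a different address inside the same granule clearing the tag, which is the effect that makes separating objects accessed by exclusive accesses worthwhile.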
Figure A3-3 on page A3-18 shows the state machine for processor(n) in a global monitor. Table A3-7 on
page A3-19 shows the effect of each of the operations shown in the figure.


[Figure A3-3: global monitor state diagram for processor(n), with two states, Open Access and Exclusive Access.
Open Access to Open Access: CLREX(n), CLREX(!n), LoadExcl(x,!n), StoreExcl(x,n), StoreExcl(x,!n), Store(x,n), Store(x,!n).
Open Access to Exclusive Access: LoadExcl(x,n).
Exclusive Access to Exclusive Access: LoadExcl(x,n), StoreExcl(Tagged_address,!n)‡, Store(!Tagged_address,n), StoreExcl(Tagged_address,n) *, StoreExcl(!Tagged_address,n) *, Store(Tagged_address,n) *, CLREX(n) *, StoreExcl(!Tagged_address,!n), Store(!Tagged_address,!n), CLREX(!n).
Exclusive Access to Open Access: StoreExcl(Tagged_address,!n)‡, Store(Tagged_address,!n), StoreExcl(Tagged_address,n) *, StoreExcl(!Tagged_address,n) *, Store(Tagged_address,n) *, CLREX(n) *.]
‡ StoreExcl(Tagged_address,!n) clears the monitor only if the StoreExcl updates memory.
Operations marked * are possible alternative IMPLEMENTATION DEFINED options.
In the diagram: LoadExcl represents any Load-Exclusive instruction
StoreExcl represents any Store-Exclusive instruction
Store represents any other store instruction.
Any LoadExcl operation updates the tagged address to the most significant bits of the address x
used for the operation. For more information see Tagging and the size of the tagged memory block on page A3-20.

Figure A3-3 Global monitor state machine diagram for processor(n) in a multiprocessor system

Note
For the global monitor state machine, as shown in Figure A3-3:
• Whether a Store-Exclusive successfully updates memory or not depends on whether the address
accessed matches the tagged Shareable memory address for the processor issuing the Store-Exclusive
instruction. For this reason, Figure A3-3 and Table A3-7 on page A3-19 only show how the (!n)
entries cause state transitions of the state machine for processor(n).
• A Load-Exclusive can only update the tagged Shareable memory address for the processor issuing
the Load-Exclusive instruction.
• The effect of the CLREX instruction on the global monitor is IMPLEMENTATION DEFINED.
• It is IMPLEMENTATION DEFINED:
— whether a modification to a Non-shareable memory location can cause a global monitor to
transition from Exclusive Access to Open Access state
— whether a Load-Exclusive to a Non-shareable memory location can cause a global monitor to
transition from Open Access to Exclusive Access state.


Table A3-7 shows the effect of the operations shown in Figure A3-3 on page A3-18.
Table A3-7 Effect of load/store operations on global monitor for processor(n)
Initial state (a) | Operation (b) | Effect | Final state (a)
Open | CLREX(n), CLREX(!n) | None | Open
Open | StoreExcl(x,n) | Does not update memory, returns status 1 | Open
Open | LoadExcl(x,!n) | Loads value from memory, no effect on tag address for processor(n) | Open
Open | StoreExcl(x,!n) | Depends on state machine and tag address for processor issuing STREX (c) | Open
Open | Store(x,n), Store(x,!n) | Updates memory, no effect on monitor | Open
Open | LoadExcl(x,n) | Loads value from memory, tags address x | Exclusive
Exclusive | LoadExcl(x,n) | Loads value from memory, tags address x | Exclusive
Exclusive | CLREX(n) | None. Effect on the final state is IMPLEMENTATION DEFINED. | Exclusive (e) or Open (e)
Exclusive | CLREX(!n) | None | Exclusive
Exclusive | StoreExcl(t,!n) | Updates memory, returns status 0 (c) | Open
Exclusive | StoreExcl(t,!n) | Does not update memory, returns status 1 (c) | Exclusive
Exclusive | StoreExcl(t,n) | Updates memory, returns status 0 (d) | Open or Exclusive
Exclusive | StoreExcl(!t,n) | Updates memory, returns status 0 (e) | Open or Exclusive
Exclusive | StoreExcl(!t,n) | Does not update memory, returns status 1 (e) | Open or Exclusive
Exclusive | StoreExcl(!t,!n) | Depends on state machine and tag address for processor issuing STREX | Exclusive
Exclusive | Store(t,n) | Updates memory | Exclusive (e) or Open (e)
Exclusive | Store(t,!n) | Updates memory | Open
Exclusive | Store(!t,n), Store(!t,!n) | Updates memory, no effect on monitor | Exclusive
a. Open = Open Access state, Exclusive = Exclusive Access state.
b. In the table:
LoadExcl represents any Load-Exclusive instruction
StoreExcl represents any Store-Exclusive instruction
Store represents any store operation other than a Store-Exclusive operation.
t is the tagged address for processor(n), bits [31:a] of the address of the last Load-Exclusive instruction issued by
processor(n), see Tagging and the size of the tagged memory block.
c. The result of a STREX(x,!n) or a STREX(t,!n) operation depends on the state machine and tagged address for the processor
issuing the STREX instruction. This table shows how each possible outcome affects the state machine for processor(n).
d. After a successful STREX to the tagged address, the state of the state machine is IMPLEMENTATION DEFINED. However,
this state has no effect on the subsequent operation of the global monitor.
e. Effect is IMPLEMENTATION DEFINED. The table shows all permitted implementations.

A3.4.3 Tagging and the size of the tagged memory block
As stated in the footnotes to Table A3-6 on page A3-15 and Table A3-7 on page A3-19, when a
Load-Exclusive instruction is executed, the resulting tag address ignores the least significant bits of the
memory address.
Tagged_address = Memory_address[31:a]

The value of a in this assignment is IMPLEMENTATION DEFINED, between a minimum value of 3 and a
maximum value of 11. For example, in an implementation where a == 4, a successful LDREX of address
0x000341B4 gives a tag value of bits [31:4] of the address, that is, 0x000341B. This means that the four words
of memory from 0x000341B0 to 0x000341BF are tagged for exclusive access.
The size of the tagged memory block is called the Exclusives Reservation Granule. The Exclusives
Reservation Granule is IMPLEMENTATION DEFINED between:
• two words, in an implementation with a == 3
• 512 words, in an implementation with a == 11.
In some implementations the CTR identifies the Exclusives Reservation Granule, see:
• c0, Cache Type Register (CTR) on page B3-83 for a VMSA implementation
• c0, Cache Type Register (CTR) on page B4-34 for a PMSA implementation.
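The tag and granule arithmetic can be checked with a few lines of Python. The function names are hypothetical, and `a` is passed in because its value is IMPLEMENTATION DEFINED. The sketch reproduces the a == 4 example above and the two-word and 512-word limits.

```python
def tagged_address(addr, a):
    """Tagged_address = Memory_address[31:a] for a 32-bit address."""
    if not 3 <= a <= 11:
        raise ValueError("a must be between 3 and 11")
    return (addr & 0xFFFFFFFF) >> a


def granule_range(addr, a):
    """First and last byte addresses of the Exclusives Reservation Granule containing addr."""
    base = tagged_address(addr, a) << a
    return base, base + (1 << a) - 1


def granule_words(a):
    """Size of the Exclusives Reservation Granule in 32-bit words."""
    return (1 << a) // 4


tag = tagged_address(0x000341B4, 4)
first, last = granule_range(0x000341B4, 4)
print(hex(tag), hex(first), hex(last), granule_words(4))  # 0x341b 0x341b0 0x341bf 4
```

Any two addresses with the same `tagged_address` value fall in the same granule, so a store to one of them can clear a tag set by a Load-Exclusive of the other.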


A3.4.4 Context switch support
After a context switch, software must ensure that the local monitor is in the Open Access state. This requires
it to either:
• execute a CLREX instruction
• execute a dummy STREX to a memory address allocated for this purpose.

Note
• Using a dummy STREX for this purpose is backwards-compatible with the ARMv6 implementation of
the exclusive operations. The CLREX instruction was introduced in ARMv6K.
• Context switching is not an application level operation. However, this information is included here to
complete the description of the exclusive operations.

The STREX or CLREX instruction following a context switch might cause a subsequent Store-Exclusive to fail,
requiring a load … store sequence to be replayed. To minimize the possibility of this happening, ARM
recommends that the Store-Exclusive instruction is kept as close as possible to the associated
Load-Exclusive instruction, see Load-Exclusive and Store-Exclusive usage restrictions.
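A small simulation shows why the monitor must be cleared. The Python sketch below is hypothetical: `Monitor` and `run` are invented names, it assumes an implementation in which ordinary stores never affect the local monitor (one of the IMPLEMENTATION DEFINED options in A3.4.1), and it models each context switch only as the point where CLREX is executed. Without the CLREX, thread A's stale STREX succeeds and thread B's write is lost.

```python
class Monitor:
    """Local monitor sketch. Assumption: an ordinary store never changes the monitor
    state, which is one of the IMPLEMENTATION DEFINED options in A3.4.1."""

    def __init__(self):
        self.exclusive = False
        self.tag = None
        self.mem = {}

    def ldrex(self, addr):
        self.exclusive, self.tag = True, addr
        return self.mem.get(addr, 0)

    def strex(self, addr, value):
        ok = self.exclusive and self.tag == addr
        self.exclusive = False
        if ok:
            self.mem[addr] = value
        return 0 if ok else 1

    def str_(self, addr, value):
        self.mem[addr] = value        # plain store: monitor unaffected in this model

    def clrex(self):
        self.exclusive = False


def run(clear_on_switch):
    """Thread A increments a counter but is preempted between LDREX and STREX."""
    mon = Monitor()                   # one processor, so one local monitor for both threads
    counter = 0x100
    a_old = mon.ldrex(counter)        # thread A: LDREX reads 0
    if clear_on_switch:
        mon.clrex()                   # context switch to thread B
    mon.str_(counter, 100)            # thread B: ordinary store
    if clear_on_switch:
        mon.clrex()                   # context switch back to thread A
    while mon.strex(counter, a_old + 1) != 0:
        a_old = mon.ldrex(counter)    # thread A retries with a fresh LDREX
    return mon.mem[counter]


print(run(clear_on_switch=False), run(clear_on_switch=True))  # 1 101
```

With the CLREX, thread A's first STREX fails, the retry re-reads 100, and the final value correctly reflects both threads' updates.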

A3.4.5 Load-Exclusive and Store-Exclusive usage restrictions
The Load-Exclusive and Store-Exclusive instructions are intended to work together, as a pair, for example
an LDREX/STREX pair or an LDREXB/STREXB pair. As mentioned in Context switch support, ARM recommends that
the Store-Exclusive instruction always follows within a few instructions of its associated Load-Exclusive
instruction. To support different implementations of these functions, software must follow the notes and
restrictions given here.
These notes describe use of an LDREX/STREX pair, but apply equally to any other
Load-Exclusive/Store-Exclusive pair:
• The exclusives support a single outstanding exclusive access for each processor thread that is
executed. The architecture makes use of this by not requiring an address or size check as part of the
IsExclusiveLocal() function. If the target address of an STREX is different from the preceding LDREX in
the same execution thread, behavior can be UNPREDICTABLE. As a result, an LDREX/STREX pair can only
be relied upon to eventually succeed if they are executed with the same address. Where a context
switch or exception might result in a change of execution thread, a CLREX instruction or a dummy
STREX instruction must be executed to avoid unwanted effects, as described in Context switch support
on page A3-21. Using an STREX in this way is the only occasion where software can program an STREX
with a different address from the previously executed LDREX.
• An explicit store to memory can cause the clearing of exclusive monitors associated with other
processors; therefore, performing a store between the LDREX and the STREX can result in a livelock
situation. As a result, code must avoid placing an explicit store between an LDREX and an STREX in a
single code sequence.


• If two STREX instructions are executed without an intervening LDREX, the second STREX returns a status
value of 1. This means that:
— every STREX must have a preceding LDREX associated with it in a given thread of execution
— it is not necessary for every LDREX to have a subsequent STREX.
• An implementation of the Load-Exclusive and Store-Exclusive instructions can require that, in any
thread of execution, the transaction size of a Store-Exclusive is the same as the transaction size of the
preceding Load-Exclusive that was executed in that thread. If the transaction size of a
Store-Exclusive is different from the preceding Load-Exclusive in the same execution thread,
behavior can be UNPREDICTABLE. As a result, an LDREX/STREX pair can be relied upon to
eventually succeed only if they have the same size. Where a context switch or exception might result
in a change of execution thread, the software must execute a CLREX instruction or a dummy STREX
instruction to avoid unwanted effects, as described in Context switch support on page A3-21. Using
an STREX in this way is the only occasion where software can use a Store-Exclusive instruction with
a different transaction size from the previously executed Load-Exclusive instruction.
• An implementation might clear an exclusive monitor between the LDREX and the STREX, without any
application-related cause. For example, this might happen because of cache evictions. Code written
for such an implementation must avoid having any explicit memory accesses or cache maintenance
operations between the LDREX and STREX instructions.
• Implementations can benefit from keeping the LDREX and STREX operations close together in a single
code sequence. This minimizes the likelihood of the exclusive monitor state being cleared between
the LDREX instruction and the STREX instruction. Therefore, for best performance, ARM strongly
recommends a limit of 128 bytes between LDREX and STREX instructions in a single code sequence.
• Implementations that implement coherent protocols, or have only a single master, might combine the
local and global monitors for a given processor. The IMPLEMENTATION DEFINED and UNPREDICTABLE
parts of the definitions in Exclusive monitors operations on page B2-35 are provided to cover this
behavior.
• The architecture sets an upper limit of 2048 bytes on the size of a region that can be marked as
exclusive. Therefore, for performance reasons, ARM recommends that software separates objects
that will be accessed by exclusive accesses by at least 2048 bytes. This is a performance guideline
rather than a functional requirement.
• LDREX and STREX operations must be performed only on memory with the Normal memory attribute.
• The effect of Data Abort exceptions on the state of monitors is UNPREDICTABLE. ARM recommends
that abort handling code performs a CLREX instruction or a dummy STREX instruction to clear the
monitor state.
• If the memory attributes for the memory being accessed by an LDREX/STREX pair are changed between
the LDREX and the STREX, behavior is UNPREDICTABLE.
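Several of these restrictions can be checked mechanically on a straight-line instruction trace. The following Python sketch is a hypothetical lint, not an ARM tool: `Insn`, `check_exclusive_pairs` and the op names are invented for illustration, it examines a single thread only, and it does not recognize the sanctioned dummy STREX used on a context switch. It flags a STREX with no preceding LDREX, a mismatched address or transaction size, an explicit memory access between the pair, and a pair more than 128 bytes apart.

```python
from dataclasses import dataclass


@dataclass
class Insn:
    offset: int        # byte offset of the instruction within the code sequence
    op: str            # "LDREX", "STREX", "CLREX", "LOAD", "STORE" or "OTHER"
    addr: int = 0      # data address for memory operations
    size: int = 4      # transaction size in bytes


def check_exclusive_pairs(seq):
    """Return warnings for one thread's trace, following the A3.4.5 usage restrictions."""
    warnings = []
    pending = None     # most recent LDREX not yet consumed by a STREX or CLREX
    for insn in seq:
        where = f"{insn.offset:#x}"
        if insn.op == "LDREX":
            pending = insn
        elif insn.op == "CLREX":
            pending = None
        elif insn.op in ("LOAD", "STORE") and pending is not None:
            warnings.append(f"{where}: explicit memory access between LDREX and STREX")
        elif insn.op == "STREX":
            if pending is None:
                warnings.append(f"{where}: STREX has no preceding LDREX")
                continue
            if insn.addr != pending.addr:
                warnings.append(f"{where}: STREX address differs from LDREX address")
            if insn.size != pending.size:
                warnings.append(f"{where}: STREX size differs from LDREX size")
            if insn.offset - pending.offset > 128:
                warnings.append(f"{where}: STREX is more than 128 bytes after LDREX")
            pending = None
    return warnings


good = [Insn(0, "LDREX", 0x2000), Insn(4, "OTHER"), Insn(8, "STREX", 0x2000)]
bad = [Insn(0, "LDREX", 0x2000), Insn(4, "STORE", 0x3000),
       Insn(8, "STREX", 0x2004, size=1), Insn(12, "STREX", 0x2000)]
for w in check_exclusive_pairs(bad):
    print(w)
```

The `bad` trace trips four rules: the intervening store, the address and size mismatch on the first STREX, and the second STREX that has no LDREX of its own.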


A3.4.6 Semaphores
The Swap (SWP) and Swap Byte (SWPB) instructions must be used with care to ensure that expected behavior
is observed. Two examples are as follows:
1. A system with multiple bus masters that uses Swap instructions to implement semaphores that control
interactions between different bus masters.
In this case, the semaphores must be placed in an uncached region of memory, where any buffering
of writes occurs at a point common to all bus masters using the mechanism. The Swap instruction
then causes a locked read-write bus transaction.
2. A system with multiple threads running on a uniprocessor that uses the Swap instructions to
implement semaphores that control interaction of the threads.
In this case, the semaphores can be placed in a cached region of memory, and a locked read-write bus
transaction might or might not occur. The Swap and Swap Byte instructions are likely to have better
performance on such a system than they do on a system with multiple bus masters such as that
described in example 1.

Note
From ARMv6, use of the Swap and Swap Byte instructions is deprecated. All new software should use the
Load-Exclusive and Store-Exclusive synchronization primitives described in Synchronization and
semaphores on page A3-12, for example LDREX and STREX.

A3.4.7 Synchronization primitives and the memory order model
The synchronization primitives follow the memory order model of the memory type accessed by the
instructions. For this reason:
• Portable code for claiming a spin-lock must include a Data Memory Barrier (DMB) operation,
performed by a DMB instruction, between claiming the spin-lock and making any access that makes
use of the spin-lock.
• Portable code for releasing a spin-lock must include a DMB instruction before writing to clear the
spin-lock.

This requirement applies to code using:
• the Load-Exclusive/Store-Exclusive instruction pairs, for example LDREX/STREX
• the deprecated synchronization primitives, SWP/SWPB.
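The two barrier rules can be expressed as a check on a flat instruction trace of one lock-protected region. This Python sketch is a hypothetical illustration, not a way to generate or validate real code: the trace strings ("STREX lock", "STR lock", "DMB") are invented, the trace shows only the successful iteration of the claim loop, and the check simply looks for a DMB between the claiming STREX and the first protected access, and between the last protected access and the releasing store.

```python
def spinlock_barriers_ok(trace, lock="lock"):
    """Check the two DMB placement rules on a trace such as
    ["LDREX lock", "STREX lock", "DMB", "LDR data", "DMB", "STR lock"].

    The claim is the last "STREX <lock>"; the release is the last "STR <lock>".
    """
    claim = max(i for i, op in enumerate(trace) if op == f"STREX {lock}")
    release = max(i for i, op in enumerate(trace) if op == f"STR {lock}")
    body = [i for i in range(claim + 1, release) if trace[i] != "DMB"]
    if not body:                      # empty critical section
        return "DMB" in trace[claim + 1:release]
    dmb_after_claim = "DMB" in trace[claim + 1:body[0]]
    dmb_before_release = "DMB" in trace[body[-1] + 1:release]
    return dmb_after_claim and dmb_before_release


good = ["LDREX lock", "STREX lock", "DMB", "LDR data", "STR data", "DMB", "STR lock"]
no_acquire = ["LDREX lock", "STREX lock", "LDR data", "STR data", "DMB", "STR lock"]
no_release = ["LDREX lock", "STREX lock", "DMB", "LDR data", "STR data", "STR lock"]
print([spinlock_barriers_ok(t) for t in (good, no_acquire, no_release)])  # [True, False, False]
```

A missing first DMB lets protected accesses be observed before the lock is seen as claimed; a missing second DMB lets them be observed after the lock is seen as released.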

A3.4.8 Use of WFE and SEV instructions by spin-locks
ARMv7 and ARMv6K provide Wait For Event and Send Event instructions, WFE and SEV, that can assist with
reducing power consumption and bus contention caused by processors repeatedly attempting to obtain a
spin-lock. These instructions can be used at application level, but a complete understanding of what they do
depends on system-level understanding of exceptions. They are described in Wait For Event and Send Event
on page B1-44.


A3.5 Memory types and attributes and the memory order model
ARMv6 defined a set of memory attributes with the characteristics required to support the memory and
devices in the system memory map. In ARMv7 this set of attributes is extended by the addition of the Outer
Shareable attribute for Normal memory.

Note
Whether an ARMv7 implementation supports the Outer Shareable memory attribute is IMPLEMENTATION
DEFINED.
The ordering of accesses for regions of memory, referred to as the memory order model, is defined by the
memory attributes. This model is described in the following sections:
• Memory types
• Summary of ARMv7 memory attributes on page A3-25
• Atomicity in the ARM architecture on page A3-26
• Normal memory on page A3-28
• Device memory on page A3-33
• Strongly-ordered memory on page A3-34
• Memory access restrictions on page A3-35
• Backwards compatibility on page A3-37
• The effect of the Security Extensions on page A3-37.

A3.5.1 Memory types
For each memory region, the most significant memory attribute specifies the memory type. There are three
mutually exclusive memory types:
• Normal
• Device
• Strongly-ordered.
Normal and Device memory regions have additional attributes.
Usually, memory used for program code and for data storage is Normal memory. Examples of Normal
memory technologies are:
• programmed Flash ROM
Note
During programming, Flash memory can be ordered more strictly than Normal memory.
• ROM
• SRAM
• DRAM and DDR memory.

System peripherals (I/O) generally conform to different access rules from Normal memory. Examples of I/O
accesses are:
• FIFOs where consecutive accesses
— add queued values on write accesses
— remove queued values on read accesses.
• interrupt controller registers where an access can be used as an interrupt acknowledge, changing the
state of the controller itself
• memory controller configuration registers that are used to set up the timing and correctness of areas
of Normal memory
• memory-mapped peripherals, where accessing a memory location can cause side effects in the
system.

In ARMv7, regions of the memory map for these accesses are defined as Device or Strongly-ordered
memory. To ensure system correctness, access rules for Device and Strongly-ordered memory are more
restrictive than those for Normal memory:
• both read and write accesses can have side effects
• accesses must not be repeated, for example, on return from an exception
• the number, order and sizes of the accesses must be maintained.
In addition, for Strongly-ordered memory, all memory accesses are strictly ordered to correspond to the
program order of the memory access instructions.

A3.5.2 Summary of ARMv7 memory attributes
Table A3-8 summarizes the memory attributes. For more information about these attributes see:
• Normal memory on page A3-28 and Shareable attribute for Device memory regions on page A3-34,
for the shareability attribute
• Write-Through Cacheable, Write-Back Cacheable and Non-cacheable Normal memory on
page A3-32, for the cacheability attribute.
Table A3-8 Memory attribute summary
Memory type attribute | Shareability | Other attributes | Description
Strongly-ordered | - | - | All memory accesses to Strongly-ordered memory occur in program order. All Strongly-ordered regions are assumed to be Shareable.
Device | Shareable | - | Intended to handle memory-mapped peripherals that are shared by several processors.
Device | Non-shareable | - | Intended to handle memory-mapped peripherals that are used only by a single processor.
Normal | Outer Shareable | Cacheability (a), one of: Non-cacheable, Write-Through Cacheable, Write-Back Write-Allocate Cacheable, Write-Back no Write-Allocate Cacheable | The Outer Shareable attribute qualifies the Shareable attribute for Normal memory regions and enables two levels of Normal memory sharing. (b)
Normal | Inner Shareable | Cacheability (a), one of: Non-cacheable, Write-Through Cacheable, Write-Back Write-Allocate Cacheable, Write-Back no Write-Allocate Cacheable | Intended to handle Normal memory that is shared between several processors.
Normal | Non-shareable | Cacheability (a), one of: Non-cacheable, Write-Through Cacheable, Write-Back Write-Allocate Cacheable, Write-Back no Write-Allocate Cacheable | Intended to handle Normal memory that is used by only a single processor.
a. The cacheability attribute is defined independently for inner and outer cache regions.
b. The significance of the Outer Shareable attribute is IMPLEMENTATION DEFINED.

A3.5.3 Atomicity in the ARM architecture
Atomicity is a feature of memory accesses, described as atomic accesses. The ARM architecture description
refers to two types of atomicity, defined in:
• Single-copy atomicity on page A3-27
• Multi-copy atomicity on page A3-28.

Single-copy atomicity
A read or write operation is single-copy atomic if the following conditions are both true:
• After any number of write operations to an operand, the value of the operand is the value written by
one of the write operations. It is impossible for part of the value of the operand to come from one
write operation and another part of the value to come from a different write operation.
• When a read operation and a write operation are made to the same operand, the value obtained by the
read operation is one of:
— the value of the operand before the write operation
— the value of the operand after the write operation.
It is never the case that the value of the read operation is partly the value of the operand before the
write operation and partly the value of the operand after the write operation.

In ARMv7, the single-copy atomic processor accesses are:
• all byte accesses
• all halfword accesses to halfword-aligned locations
• all word accesses to word-aligned locations
• memory accesses caused by LDREXD and STREXD instructions to doubleword-aligned locations.
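For a single access, this list reduces to a rule on access size and alignment. The Python sketch below encodes it; the function name and the `exclusive_doubleword` flag (standing for an access made by LDREXD or STREXD) are hypothetical. It covers only the four cases listed, not the multiple-access instructions described next.

```python
def is_single_copy_atomic(size, addr, exclusive_doubleword=False):
    """Whether one ARMv7 processor access is single-copy atomic.

    size: access size in bytes (1, 2, 4 or 8).
    exclusive_doubleword: True for the access made by an LDREXD or STREXD.
    """
    if size == 1:
        return True                       # all byte accesses
    if size in (2, 4):
        return addr % size == 0           # halfword/word accesses must be naturally aligned
    if size == 8:
        return exclusive_doubleword and addr % 8 == 0  # only LDREXD/STREXD, doubleword-aligned
    return False


cases = [(1, 0x1003), (2, 0x1002), (2, 0x1001), (4, 0x1004), (4, 0x1002), (8, 0x1000)]
print([is_single_copy_atomic(s, a) for s, a in cases])  # [True, True, False, True, False, False]
print(is_single_copy_atomic(8, 0x1000, exclusive_doubleword=True))  # True
```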
LDM, LDC, LDC2, LDRD, STM, STC, STC2, STRD, PUSH, POP, RFE, SRS, VLDM, VLDR, VSTM, and VSTR instructions are

executed as a sequence of word-aligned word accesses. Each 32-bit word access is guaranteed to be
single-copy atomic. A subsequence of two or more word accesses from the sequence might not exhibit
single-copy atomicity.
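The access rules above can be summarized as a predicate. The following minimal Python sketch is not an ARM-defined interface. The function names are hypothetical, sizes are in bytes, and the `exclusive_pair` flag is an assumed stand-in for "this access was made by LDREXD or STREXD".

```python
def is_single_copy_atomic(size, address, exclusive_pair=False):
    """Return True if one access of `size` bytes at `address` is single-copy atomic."""
    if size == 1:
        return True                      # all byte accesses
    if size in (2, 4):
        return address % size == 0       # halfword/word accesses must be naturally aligned
    if size == 8 and exclusive_pair:
        return address % 8 == 0          # LDREXD/STREXD to doubleword-aligned locations
    return False


def multiword_accesses(base, word_count):
    """Split an LDM/STM-style transfer into the word accesses it performs.

    Each (address, size) pair is single-copy atomic on its own. No guarantee
    is made about any group of two or more of them.
    """
    if base % 4 != 0:
        raise ValueError("this sketch only models word-aligned multi-word transfers")
    return [(base + 4 * i, 4) for i in range(word_count)]
```

For example, `multiword_accesses(0x2000, 3)` yields three separate word accesses at 0x2000, 0x2004 and 0x2008. Another observer might see any prefix of them completed.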
Advanced SIMD element and structure loads and stores are executed as a sequence of accesses of the
element or structure size. The element accesses are single-copy atomic if and only if both:
• the element size is 32 bits or smaller
• the elements are naturally aligned.
Accesses to 64-bit elements or structures that are at least word-aligned are executed as a sequence of 32-bit
accesses, each of which is single-copy atomic. Subsequences of two or more 32-bit accesses from the
sequence might not be single-copy atomic.
When an access is not single-copy atomic, it is executed as a sequence of smaller accesses, each of which
is single-copy atomic, at least at the byte level.
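The next sketch shows how the Advanced SIMD rule splits one element access into pieces that are each single-copy atomic. The function name is hypothetical. As an assumption, any access the rules do not otherwise cover is modelled at the guaranteed minimum granularity of one byte.

```python
def simd_element_pieces(address, element_size):
    """Return the (address, size) accesses that are each single-copy atomic.

    * A naturally aligned element of 32 bits or smaller is one atomic access.
    * A 64-bit element that is at least word-aligned becomes two 32-bit accesses.
    * Anything else is modelled at the byte-level minimum guarantee.
    """
    if element_size <= 4 and address % element_size == 0:
        return [(address, element_size)]
    if element_size == 8 and address % 4 == 0:
        return [(address, 4), (address + 4, 4)]
    return [(address + i, 1) for i in range(element_size)]
```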
If an instruction is executed as a sequence of accesses according to these rules, some exceptions can be taken
during the sequence and cause execution of the instruction to be abandoned. These exceptions are:
• synchronous Data Abort exceptions
• if low interrupt latency configuration is selected and the accesses are to Normal memory, see Low interrupt latency configuration on page B1-43:
  — IRQ interrupts
  — FIQ interrupts
  — asynchronous aborts.

If any of these exceptions returns using its preferred exception return, the instruction that generated the
sequence of accesses is re-executed, and so any accesses that had already been performed before the
exception was taken are repeated.
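A toy model shows the consequence of re-execution. This hypothetical Python sketch tracks only the order and values of writes. It models an STM-like sequence of word writes that is abandoned after `fault_after` accesses. While the exception is handled, another observer may change memory through the assumed `on_exception` callback. The whole sequence is then repeated from its first access.

```python
def run_with_restart(writes, memory, fault_after=None, on_exception=None):
    """Apply `writes`, a list of (address, value) pairs, to the dict `memory`.

    If `fault_after` is given, the first attempt is abandoned after that many
    writes. `on_exception`, if given, runs while the exception is handled.
    The instruction is then re-executed from its first access.
    Returns the list of addresses in the order they were written.
    """
    log = []
    if fault_after is not None:
        for address, value in writes[:fault_after]:   # abandoned first attempt
            memory[address] = value
            log.append(address)
        if on_exception is not None:
            on_exception(memory)                      # another observer runs
    for address, value in writes:                     # re-execution after return
        memory[address] = value
        log.append(address)
    return log
```

Consider writes of 1, 2 and 3 to addresses 0, 4 and 8. Suppose a fault occurs after two writes, and another observer writes 99 to address 0 during the exception. The log is then `[0, 4, 0, 4, 8]`, and address 0 finally holds 1. The repeated write has silently overwritten the other observer's update.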

Note
The exception behavior of these multiple-access instructions means that they are not suitable for performing
writes to memory for the purpose of software synchronization.

For implicit accesses:
• Cache linefills and evictions have no effect on the single-copy atomicity of explicit transactions or instruction fetches.
• Instruction fetches are single-copy atomic for each instruction fetched.
  Note
  32-bit Thumb instructions are fetched as two 16-bit items.
• Translation table walks are performed as 32-bit accesses aligned to 32 bits, each of which is single-copy atomic.

Multi-copy atomicity
In a multiprocessing system, writes to a memory location are multi-copy atomic if the following conditions
are both true:
• All writes to the same location are serialized, meaning they are observed in the same order by all observers, although some observers might not observe all of the writes.
• A read of a location does not return the value of a write until all observers observe that write.

Writes to Normal memory are not multi-copy atomic.
All writes to Device and Strongly-ordered memory that are single-copy atomic are also multi-copy atomic.
All write accesses to the same location are serialized. Write accesses to Normal memory can be repeated up
to the point that another write to the same address is observed.
For Normal memory, serialization of writes does not prohibit the merging of writes.
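The serialization condition can be checked mechanically for one location. The orders reported by all observers must be consistent with a single total order of the writes. This Python sketch uses the standard-library `graphlib` module (Python 3.9 or later). The observer names, write identifiers and the function are hypothetical. The sketch checks only the serialization condition, not the requirement on when a read can return a write.

```python
from graphlib import CycleError, TopologicalSorter


def writes_serialized(observations):
    """Return True if all observers' views of one location fit a single write order.

    `observations` maps an observer name to the list of write identifiers it
    observed, in observation order. An observer may omit some writes.
    """
    predecessors = {}
    for seen in observations.values():
        for write in seen:
            predecessors.setdefault(write, set())
        for earlier, later in zip(seen, seen[1:]):
            predecessors[later].add(earlier)   # `earlier` must precede `later`
    try:
        # A single total order exists if and only if the combined ordering has no cycle.
        tuple(TopologicalSorter(predecessors).static_order())
    except CycleError:
        return False
    return True
```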

A3.5.4 Normal memory
Normal memory is idempotent, meaning that it exhibits the following properties:
• read accesses can be repeated with no side effects
• repeated read accesses return the last value written to the resource being read
• read accesses can prefetch additional memory locations with no side effects
• write accesses can be repeated with no side effects, provided that the contents of the location are unchanged between the repeated writes
• unaligned accesses can be supported
• accesses can be merged before accessing the target memory system.
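As an illustration of the last property, the following Python sketch models a write-combining buffer for Normal memory. The class, the 4-byte block size and the merge policy are assumptions made for the example, not architectural requirements. Merging of this kind is specific to Normal memory, because accesses to Device memory must occur at their program size.

```python
class MergingWriteBuffer:
    """Hypothetical buffer that merges byte stores within an aligned block."""

    BLOCK = 4  # assumed merge granularity in bytes

    def __init__(self):
        self.entries = {}  # block base address -> {byte offset: value}

    def store_byte(self, address, value):
        base = address - address % self.BLOCK
        # A later store to the same byte replaces the earlier value.
        self.entries.setdefault(base, {})[address - base] = value & 0xFF

    def drain(self):
        """Return one merged (base, {offset: value}) access per block, in address order."""
        merged = sorted(self.entries.items())
        self.entries.clear()
        return merged
```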

Normal memory can be read/write or read-only, and a Normal memory region is defined as being either
Shareable or Non-shareable. In a VMSA implementation, Shareable Normal memory can be either Inner
Shareable or Outer Shareable. In a PMSA implementation, no distinction is made between Inner Shareable
and Outer Shareable regions.
The Normal memory type attribute applies to most memory used in a system.
Accesses to Normal memory have a weakly consistent model of memory ordering. See a standard text
describing memory ordering issues for a description of weakly consistent memory models, for example,
chapter 2 of Memory Consistency Models for Shared-Memory Multiprocessors, Kourosh Gharachorloo,
Stanford University Technical Report CSL-TR-95-685. In general, for Normal memory, barrier operations
are required where the order of memory accesses observed by other observers must be controlled. This
requirement applies regardless of the cacheability and shareability attributes of the Normal memory region.
The ordering requirements of accesses described in Ordering requirements for memory accesses on
page A3-45 apply to all explicit accesses.
An instruction that generates a sequence of accesses as described in Atomicity in the ARM architecture on
page A3-26 might be abandoned as a result of an exception being taken during the sequence of accesses. On
return from the exception the instruction is restarted, and therefore one or more of the memory locations
might be accessed multiple times. This can result in repeated write accesses to a location that has been
changed between the write accesses.
The architecture permits speculative accesses to memory locations marked as Normal if the access
permissions and domain permit an access to the locations.
A Normal memory region has shareability attributes that define the data coherency properties of the region.
These attributes do not affect the coherency requirements of:
• instruction fetches, see Instruction coherency issues on page A3-53
• translation table walks, if supported, in the base ARMv7 architecture and in versions of the architecture before ARMv7, see TLB maintenance operations and the memory order model on page B3-59.


Non-shareable Normal memory
For a Normal memory region, the Non-shareable attribute identifies Normal memory that is likely to be
accessed only by a single processor.
A region of Normal memory with the Non-shareable attribute does not have any requirement to make data
accesses by different observers coherent, unless the memory is non-cacheable. If other observers share the
memory system, software must use cache maintenance operations if the presence of caches might lead to
coherency issues when communicating between the observers. This cache maintenance requirement is in
addition to the barrier operations that are required to ensure memory ordering.
For Non-shareable Normal memory, the Load-Exclusive and Store-Exclusive synchronization primitives do
not take account of the possibility of accesses by more than one observer.

Shareable, Inner Shareable, and Outer Shareable Normal memory
For Normal memory, the Shareable and Outer Shareable memory attributes describe Normal memory that
is expected to be accessed by multiple processors or other system masters:
• In a VMSA implementation, Normal memory that has the Shareable attribute but not the Outer Shareable attribute assigned is described as having the Inner Shareable attribute.
• In a PMSA implementation, no distinction is made between Inner Shareable and Outer Shareable Normal memory, and you cannot assign the Outer Shareable attribute to Normal memory regions.

A region of Normal memory with the Shareable attribute is one for which data accesses to memory by
different observers within the same shareability domain are coherent.
The Outer Shareable attribute is introduced in ARMv7, and can be applied only to a Normal memory region
in a VMSA implementation that has the Shareable attribute assigned. It creates three levels of shareability
for a Normal memory region:
Non-shareable
A Normal memory region that does not have the Shareable attribute assigned.
Inner Shareable
A Normal memory region that has the Shareable attribute assigned, but not the Outer
Shareable attribute.
Outer Shareable
A Normal memory region that has both the Shareable and the Outer Shareable attributes
assigned.
These attributes can be used to define sets of observers for which the shareability attributes make the data
or unified caches transparent for data accesses. The sets of observers that are affected by the shareability
attributes are described as shareability domains. The details of the use of these attributes are
system-specific. Example A3-1 on page A3-31 shows how they might be used:


Example A3-1 Use of shareability attributes
In a VMSA implementation, a particular sub-system with two clusters of processors has the requirement
that:
• in each cluster, the data or unified caches of the processors in the cluster are transparent for all data accesses with the Inner Shareable attribute
• however, between the two clusters, the caches:
  — are not transparent for data accesses that have only the Inner Shareable attribute
  — are transparent for data accesses that have the Outer Shareable attribute.

In this system, each cluster is in a different shareability domain for the Inner Shareable attribute, but all
components of the sub-system are in the same shareability domain for the Outer Shareable attribute.
A system might implement two such sub-systems. If the data or unified caches of one sub-system are not
transparent to the accesses from the other sub-system, this system has two Outer Shareable shareability
domains.
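Example A3-1 can be modelled by recording, for each processor, the shareability domain it belongs to at each level. In this Python sketch the processor names, domain names and functions are hypothetical. The Non-shareable case is simplified to "no coherency requirement between different observers", which ignores the Non-cacheable case described earlier.

```python
# Hypothetical topology for one sub-system of Example A3-1.
INNER_DOMAIN = {"cpu0": "cluster0", "cpu1": "cluster0",
                "cpu2": "cluster1", "cpu3": "cluster1"}
OUTER_DOMAIN = {cpu: "subsystem0" for cpu in INNER_DOMAIN}


def shareability(shareable, outer_shareable):
    """Map the Shareable and Outer Shareable attributes to one of the three levels."""
    if not shareable:
        return "Non-shareable"
    return "Outer Shareable" if outer_shareable else "Inner Shareable"


def caches_transparent(p, q, level):
    """True if data accesses at `level` must appear coherent between p and q."""
    if level == "Inner Shareable":
        return INNER_DOMAIN[p] == INNER_DOMAIN[q]
    if level == "Outer Shareable":
        return OUTER_DOMAIN[p] == OUTER_DOMAIN[q]
    return p == q  # Non-shareable: simplified to no requirement between observers
```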
Having two levels of shareability attribute means you can reduce the performance and power overhead for
shared memory regions that do not need to be part of the Outer Shareable shareability domain.
Whether an ARMv7 implementation supports the Outer Shareable attribute is IMPLEMENTATION DEFINED.
If the Outer Shareable attribute is supported, its significance in the implementation is IMPLEMENTATION
DEFINED.
For Shareable Normal memory, the Load-Exclusive and Store-Exclusive synchronization primitives take
account of the possibility of accesses by more than one observer in the same shareability domain.

Note
The Shareable concept enables system designers to specify the locations in Normal memory that must have
coherency requirements. However, to facilitate porting of software, software developers must not assume
that specifying a memory region as Non-shareable permits software to make assumptions about the
incoherency of memory locations between different processors in a shared memory system. Such
assumptions are not portable between different multiprocessing implementations that make use of the
Shareable concept. Any multiprocessing implementation might implement caches that, inherently, are
shared between different processing elements.


Write-Through Cacheable, Write-Back Cacheable and Non-cacheable Normal memory
In addition to being Outer Shareable, Inner Shareable or Non-shareable, each region of Normal memory can
be marked as being one of:
• Write-Through Cacheable
• Write-Back Cacheable, with an additional qualifier that marks it as one of:
  — Write-Back, Write-Allocate
  — Write-Back, no Write-Allocate
• Non-cacheable.
If the same memory locations are marked as having different cacheability attributes, for example by the use
of aliases in a virtual to physical address mapping, behavior is UNPREDICTABLE.
The cacheability attributes provide a mechanism of coherency control with observers that lie outside the
shareability domain of a region of memory. In some cases, the use of Write-Through Cacheable or
Non-cacheable regions of memory might provide a better mechanism for controlling coherency than the use
of hardware coherency mechanisms or the use of cache maintenance routines. To this end, the architecture
requires the following properties for Non-cacheable or Write-Through Cacheable memory:
• a completed write to a memory location that is Non-cacheable or Write-Through Cacheable for a level of cache, made by an observer accessing the memory system inside the level of cache, is visible to all observers accessing the memory system outside the level of cache without the need for explicit cache maintenance
• a completed write to a memory location that is Non-cacheable for a level of cache, made by an observer accessing the memory system outside the level of cache, is visible to all observers accessing the memory system inside the level of cache without the need for explicit cache maintenance.

Note
Implementations can also use the cacheability attributes as a hint about the performance benefit of
caching. For example, a programmer might know that a piece of memory is not going to be accessed again
and would be better treated as Non-cacheable. The distinction between Write-Back Write-Allocate and
Write-Back no Write-Allocate memory exists only as a performance hint.
The ARM architecture provides independent cacheability attributes for Normal memory for two conceptual
levels of cache, the inner and the outer cache. The relationship between these conceptual levels of cache and
the implemented physical levels of cache is IMPLEMENTATION DEFINED, and can differ from the boundaries
between the Inner and Outer Shareability domains. However:

• inner refers to the innermost caches, and always includes the lowest level of cache
• no cache controlled by the Inner cacheability attributes can lie outside a cache controlled by the Outer cacheability attributes
• an implementation might not have any outer cache.

Example A3-2 to Example A3-4 describe the three possible ways of implementing a system with three
levels of cache, L1 to L3. L1 is the level closest to the processor, see Memory hierarchy on page A3-52.
Example A3-2 Implementation with two inner and one outer cache levels
Implement the three levels of cache in the system, L1 to L3, with:
• the Inner cacheability attribute applied to L1 and L2 cache
• the Outer cacheability attribute applied to L3 cache.

Example A3-3 Implementation with three inner and no outer cache levels
Implement the three levels of cache in the system, L1 to L3, with the Inner cacheability attribute applied to
L1, L2, and L3 cache. Do not use the Outer cacheability attribute.

Example A3-4 Implementation with one inner and two outer cache levels
Implement the three levels of cache in the system, L1 to L3, with:
• the Inner cacheability attribute applied to L1 cache
• the Outer cacheability attribute applied to L2 and L3 cache.
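The three constraints on the inner and outer conceptual caches can be expressed as a check on a proposed assignment of physical cache levels. In this hypothetical Python sketch, levels are numbered from 1, the level closest to the processor. As an assumption taken from the examples, each level is assigned to exactly one of the inner or outer sets.

```python
def valid_cache_split(inner, outer, levels):
    """Check an assignment of cache levels 1..levels to the inner and outer caches.

    `inner` and `outer` are sets of level numbers. Every level must be
    assigned to exactly one of the two sets (an assumption of this sketch).
    """
    if (inner & outer) or (inner | outer) != set(range(1, levels + 1)):
        return False
    if 1 not in inner:
        return False  # inner always includes the lowest level of cache
    # No inner cache may lie outside an outer cache, so every inner level must
    # be closer to the processor than every outer level. Outer may be empty.
    return not outer or max(inner) < min(outer)
```

There are eight ways to assign three levels to inner or outer. Applying this check to all of them leaves exactly the three arrangements of Example A3-2 to Example A3-4.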

A3.5.5 Device memory
The Device memory type attribute defines memory locations where an access to the location can cause side
effects, or where the value returned for a load can vary depending on the number of loads performed.
Memory-mapped peripherals and I/O locations are examples of memory regions normally marked as being
Device memory.
For explicit accesses from the processor to memory marked as Device:
• all accesses occur at their program size
• the number of accesses is the number specified by the program.
An implementation must not repeat an access to a Device memory location if the program has only one
access to that location. In other words, accesses to Device memory locations are not restartable.
The architecture does not permit speculative accesses to memory marked as Device.
The architecture permits an Advanced SIMD element or structure load instruction to access bytes in Device
memory that are not explicitly accessed by the instruction, provided the bytes accessed are within a 16-byte
window, aligned to 16 bytes, that contains at least one byte that is explicitly accessed by the instruction.
Address locations marked as Device are never held in a cache.
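The 16-byte window rule reduces to comparing the bases of aligned windows. The function name in this Python sketch is hypothetical. `explicit_addresses` holds the byte addresses that the instruction accesses explicitly.

```python
def extra_byte_permitted(byte_address, explicit_addresses):
    """True if an Advanced SIMD load may also access `byte_address`.

    The byte must lie in a 16-byte window, aligned to 16 bytes, that contains
    at least one byte the instruction accesses explicitly.
    """
    window_base = byte_address & ~0xF  # clear the low four address bits
    return any((addr & ~0xF) == window_base for addr in explicit_addresses)
```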


All explicit accesses to Device memory must comply with the ordering requirements of accesses described
in Ordering requirements for memory accesses on page A3-45.
An instruction that generates a sequence of accesses as described in Atomicity in the ARM architecture on
page A3-26 might be abandoned as a result of an exception being taken during the sequence of accesses. On
return from the exception the instruction is restarted, and therefore one or more of the memory locations
might be accessed multiple times. This can result in repeated write accesses to a location that has been
changed between the write accesses.

Note
Do not use an instruction that generates a sequence of accesses to access Device memory if the instruction
might generate an abort on any access other than the first one.

Any unaligned access to Device memory that is not faulted by the alignment restrictions has
UNPREDICTABLE behavior.

Shareable attribute for Device memory regions
Device memory regions can be given the Shareable attribute. This means that a region of Device memory
can be described as either:
• Shareable Device memory
• Non-shareable Device memory.
Non-shareable Device memory is defined as only accessible by a single processor. An example of a system
supporting Shareable and Non-shareable Device memory is an implementation that supports both:
• a local bus for its private peripherals
• system peripherals implemented on the main shared system bus.
Such a system might have more predictable access times for local peripherals such as watchdog timers or
interrupt controllers. In particular, a specific address in a Non-shareable Device memory region might
access a different physical peripheral for each processor.

A3.5.6 Strongly-ordered memory
The Strongly-ordered memory type attribute defines memory locations where an access to the location can
cause side effects, or where the value returned for a load can vary depending on the number of loads
performed. Examples of memory regions normally marked as being Strongly-ordered are memory-mapped
peripherals and I/O locations.
For explicit accesses from the processor to memory marked as Strongly-ordered:
• all accesses occur at their program size
• the number of accesses is the number specified by the program.
An implementation must not repeat an access to a Strongly-ordered memory location if the program has
only one access to that location. In other words, accesses to Strongly-ordered memory locations are not
restartable.


The architecture does not permit speculative accesses to memory marked as Strongly-ordered.
The architecture permits an Advanced SIMD element or structure load instruction to access bytes in
Strongly-ordered memory that are not explicitly accessed by the instruction, provided the bytes accessed are
within a 16-byte window, aligned to 16 bytes, that contains at least one byte that is explicitly accessed by
the instruction.
Address locations in Strongly-ordered memory are not held in a cache, and are always treated as Shareable
memory locations.
All explicit accesses to Strongly-ordered memory must correspond to the ordering requirements of accesses
described in Ordering requirements for memory accesses on page A3-45.
An instruction that generates a sequence of accesses as described in Atomicity in the ARM architecture on
page A3-26 might be abandoned as a result of an exception being taken during the sequence of accesses. On
return from the exception the instruction is restarted, and therefore one or more of the memory locations
might be accessed multiple times. This can result in repeated write accesses to a location that has been
changed between the write accesses.

Note
Do not use an instruction that generates a sequence of accesses to access Strongly-ordered memory if the
instruction might generate an abort on any access other than the first one.

Any unaligned access to Strongly-ordered memory that is not faulted by the alignment restrictions has
UNPREDICTABLE behavior.

Note
See Ordering of instructions that change the CPSR interrupt masks on page AppxG-8 for additional
requirements that apply to accesses to Strongly-ordered memory in ARMv6.

A3.5.7 Memory access restrictions
The following restrictions apply to memory accesses:
• For any access X, the bytes accessed by X must all have the same memory type attribute, otherwise the behavior of the access is UNPREDICTABLE. That is, an unaligned access that spans a boundary between different memory types is UNPREDICTABLE.
• For any two memory accesses X and Y that are generated by the same instruction, the bytes accessed by X and Y must all have the same memory type attribute, otherwise the results are UNPREDICTABLE. For example, an LDC, LDM, LDRD, STC, STM, or STRD that spans a boundary between Normal and Device memory is UNPREDICTABLE.
• An instruction that generates an unaligned memory access to Device or Strongly-ordered memory is UNPREDICTABLE.


• To ensure access rules are maintained, an instruction that causes multiple accesses to Device or Strongly-ordered memory must not cross a 4KB address boundary, otherwise the effect is UNPREDICTABLE. For this reason, it is important that an access to a volatile memory device is not made using a single instruction that crosses a 4KB address boundary.
  ARM expects this restriction to impose constraints on the placing of volatile memory devices in the memory map of a system, rather than expecting a compiler to be aware of the alignment of memory accesses.
• For instructions that generate accesses to Device or Strongly-ordered memory, implementations must not change the sequence of accesses specified by the pseudocode of the instruction. This includes not changing:
  — how many accesses there are
  — the time order of the accesses
  — the data sizes and other properties of each access.
  In addition, processor implementations expect any attached memory system to be able to identify the memory type of an access, and to obey similar restrictions with regard to the number, time order, data sizes and other properties of the accesses.
  Exceptions to this rule are:

  — An implementation of a processor can break this rule, provided that the information it supplies to the memory system enables the original number, time order, and other details of the accesses to be reconstructed. In addition, the implementation must place a requirement on attached memory systems to do this reconstruction when the accesses are to Device or Strongly-ordered memory.
    For example, an implementation with a 64-bit bus might pair the word loads generated by an LDM into 64-bit accesses. This is because the instruction semantics ensure that the 64-bit access is always a word load from the lower address followed by a word load from the higher address. However, the implementation must permit the memory systems to unpack the two word loads when the access is to Device or Strongly-ordered memory.
  — Any implementation technique that produces results that cannot be observed to be different from those described above is legitimate.
  — An Advanced SIMD element or structure load instruction can access bytes in Device or Strongly-ordered memory that are not explicitly accessed by the instruction, provided the bytes accessed are within a 16-byte window, aligned to 16 bytes, that contains at least one byte that is explicitly accessed by the instruction.
• Any multi-access instruction that loads or stores the PC must access only Normal memory. If the instruction accesses Device or Strongly-ordered memory the result is UNPREDICTABLE. There is one exception to this restriction. In the VMSA architecture, when the MMU is disabled any multi-access instruction that loads or stores the PC functions correctly, see Enabling and disabling the MMU on page B3-5.
• Any instruction fetch must access only Normal memory. If it accesses Device or Strongly-ordered memory, the result is UNPREDICTABLE. For example, instruction fetches must not be performed to an area of memory that contains read-sensitive devices, because there is no ordering requirement between instruction fetches and explicit accesses.


• Behavior is UNPREDICTABLE if the same memory location:
  — is marked as Shareable Normal and Non-shareable Normal
  — is marked as having different memory types (Normal, Device, or Strongly-ordered)
  — is marked as having different cacheability attributes
  — is marked as being Shareable Device and Non-shareable Device memory.
  Such memory marking contradictions can occur, for example, by the use of aliases in a virtual to physical address mapping.
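Several of these restrictions can be checked mechanically for a single instruction. In this Python sketch, each access is an `(address, size)` pair in bytes. `memory_type` is a hypothetical callback that returns "Normal", "Device" or "Strongly-ordered" for a byte address. The sketch covers only the memory-type, alignment and 4KB-boundary rules. An empty result means none of these checks fired, not that the instruction is otherwise well-behaved.

```python
def check_instruction_accesses(accesses, memory_type):
    """Return reasons why one instruction's accesses would be UNPREDICTABLE."""
    problems = []
    types = {memory_type(a + i) for a, size in accesses for i in range(size)}
    if len(types) > 1:
        problems.append("bytes with different memory types")
    if types & {"Device", "Strongly-ordered"}:
        if any(a % size for a, size in accesses):
            problems.append("unaligned access to Device or Strongly-ordered memory")
        if len(accesses) > 1:
            first = min(a for a, _ in accesses)
            last = max(a + size - 1 for a, size in accesses)
            if first >> 12 != last >> 12:  # different 4KB pages
                problems.append("multiple accesses cross a 4KB boundary")
    return problems
```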

Before ARMv6, it is IMPLEMENTATION DEFINED whether a low interrupt latency mode is supported. From
ARMv6, low interrupt latency support is controlled by the SCTLR.FI bit. It is IMPLEMENTATION DEFINED
whether multi-access instructions behave correctly in low interrupt latency configurations.

A3.5.8 Backwards compatibility
From ARMv6, the memory attributes are significantly different from those in previous versions of the
architecture. Table A3-9 shows the interpretation of the earlier memory types in the light of this definition.
Table A3-9 Backwards compatibility

Previous architectures                    ARMv6 and ARMv7 attribute
NCNB (Non-cacheable, Non-bufferable)      Strongly-ordered
NCB (Non-cacheable, Bufferable)           Shareable Device
Write-Through Cacheable, Bufferable       Non-shareable Normal, Write-Through Cacheable
Write-Back Cacheable, Bufferable          Non-shareable Normal, Write-Back Cacheable

A3.5.9 The effect of the Security Extensions
The Security Extensions can be included as part of an ARMv7-A implementation, with a VMSA. They
provide two distinct 4GByte virtual memory spaces:
• a Secure virtual memory space
• a Non-secure virtual memory space.
The Secure virtual memory space is accessed by memory accesses in the Secure state, and the Non-secure
virtual memory space is accessed by memory accesses in the Non-secure state.
By providing different virtual memory spaces, the Security Extensions permit memory accesses made from
the Non-secure state to be distinguished from those made from the Secure state.


A3.6 Access rights
ARMv7 includes additional attributes for memory regions that enable:
• Data accesses to be restricted, based on the privilege of the access. See Privilege level access controls for data accesses.
• Instruction fetches to be restricted, based on the privilege of the process or thread making the fetch. See Privilege level access controls for instruction accesses.
• On a system that implements the Security Extensions, accesses to be restricted to memory accesses with the Secure memory attribute. See Memory region security status on page A3-39.

A3.6.1 Privilege level access controls for data accesses
The memory attributes can define that a memory region is:
• not accessible to any accesses
• accessible only to Privileged accesses
• accessible to Privileged and Unprivileged accesses.
The access privilege level is defined separately for explicit read and explicit write accesses. However, a
system that defines the memory attributes is not required to support all combinations of memory attributes
for read and write accesses.
A Privileged access is an access made during privileged execution, as a result of a load or store operation
other than LDRT, STRT, LDRBT, STRBT, LDRHT, STRHT, LDRSHT, or LDRSBT.
An Unprivileged access is an access made as a result of a load or store operation performed in one of these
cases:
• when the processor is in an unprivileged mode
• when the processor is in any mode and the access is made as a result of an LDRT, STRT, LDRBT, STRBT, LDRHT, STRHT, LDRSHT, or LDRSBT instruction.

A Data Abort exception is generated if the processor attempts a data access that the access rights do not
permit. For example, a Data Abort exception is generated if the processor is in unprivileged mode and
attempts to access a memory region that is marked as only accessible to Privileged accesses.
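The data access rules can be sketched as follows. The function names are hypothetical, and so is the string encoding of the region attribute: "none", "privileged" or "all", given separately for the read or write direction being checked. In a real system these rights come from MMU or MPU attributes.

```python
# Load/store forms that always make Unprivileged accesses.
UNPRIVILEGED_FORMS = {"LDRT", "STRT", "LDRBT", "STRBT",
                      "LDRHT", "STRHT", "LDRSHT", "LDRSBT"}


def is_privileged_access(privileged_mode, mnemonic):
    """An access is Privileged only in a privileged mode and not from an ...T form."""
    return privileged_mode and mnemonic not in UNPRIVILEGED_FORMS


def data_access_permitted(region_perm, privileged_mode, mnemonic):
    """Return False where the processor would take a Data Abort exception.

    `region_perm` is "none", "privileged" or "all" for this access direction.
    """
    if region_perm == "all":
        return True
    if region_perm == "privileged":
        return is_privileged_access(privileged_mode, mnemonic)
    return False
```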

A3.6.2 Privilege level access controls for instruction accesses
Memory attributes can define that a memory region is:
• Not accessible for execution
• Accessible for execution by Privileged processes only
• Accessible for execution by Privileged and Unprivileged processes.
To define the instruction access rights to a memory region, the memory attributes describe, separately, for
the region:
• its read access rights, see Privilege level access controls for data accesses
• whether it is suitable for execution.

For example, a region that is accessible for execution by Privileged processes only has the memory
attributes:
• accessible only to Privileged read accesses
• suitable for execution.
This means there is some linkage between the memory attributes that define the accessibility of a region to
explicit memory accesses, and those that define that a region can be executed.
A memory fault occurs if a processor attempts to execute code from a memory location with attributes that
do not permit code execution.
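Execute rights combine the region's read rights with a separate "suitable for execution" attribute. The check for an instruction fetch can therefore be sketched as follows. The parameters are a hypothetical encoding, with `read_perm` taking the values "none", "privileged" or "all".

```python
def fetch_permitted(read_perm, executable, privileged):
    """Return False where executing from the region causes a memory fault.

    `read_perm` is "none", "privileged" or "all"; `executable` is the
    separate suitable-for-execution attribute; `privileged` describes the
    process or thread making the fetch.
    """
    if not executable:
        return False
    if read_perm == "all":
        return True
    return read_perm == "privileged" and privileged
```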

A3.6.3 Memory region security status
An additional memory attribute determines whether the memory region is Secure or Non-secure in an
ARMv7-A system that implements the Security Extensions. When the Security Extensions are
implemented, this attribute is checked by the system hardware to ensure that a region of memory that is
designated as Secure by the system hardware is not accessed by memory accesses with the Non-secure
memory attribute. For more information, see Memory region attributes on page B3-32.


A3.7 Virtual and physical addressing
ARMv7 provides three alternative architectural profiles, ARMv7-A, ARMv7-R and ARMv7-M. Each of the
profiles specifies a different memory system. This manual describes two of these profiles:
ARMv7-A profile
The ARMv7-A memory system incorporates a Memory Management Unit (MMU),
controlled by CP15 registers. The memory system supports virtual addressing, with the
MMU performing virtual to physical address translation, in hardware, as part of program
execution.
ARMv7-R profile
The ARMv7-R memory system incorporates a Memory Protection Unit (MPU), controlled
by CP15 registers. The MPU does not support virtual addressing.
At the application level, the difference between the ARMv7-A and ARMv7-R memory systems is
transparent. Regardless of which profile is implemented, an application accesses the memory map described
in Address space on page A3-2, and the implemented memory system makes the features described in this
chapter available to the application.
For a system-level description of the ARMv7-A and ARMv7-R memory models see:
• Chapter B2 Common Memory System Architecture Features
• Chapter B3 Virtual Memory System Architecture (VMSA)
• Chapter B4 Protected Memory System Architecture (PMSA).

Note
This manual does not describe the ARMv7-M profile. For details of this profile see:
• ARMv7-M Architecture Application Level Reference Manual, for an application-level description
• ARMv7-M Architecture Reference Manual, for a full description.


A3.8 Memory access order
ARMv7 provides a set of three memory types, Normal, Device, and Strongly-ordered, with well-defined
memory access properties.
The ARMv7 application-level view of the memory attributes is described in:
• Memory types and attributes and the memory order model on page A3-24
• Access rights on page A3-38.
When considering memory access ordering, an important feature of the ARMv6 memory model is the
Shareable memory attribute, which indicates whether a region of memory can be shared between multiple
processors, and therefore requires an appearance of cache transparency in the ordering model.
The key issues with the memory order model depend on the target audience:
• For software programmers, considering the model at the application level, the key factor is that for
accesses to Normal memory, barriers are required in some situations where the order of accesses
observed by other observers must be controlled.
• For silicon implementers, considering the model at the system level, the Strongly-ordered and Device
memory attributes place certain restrictions on the system designer in terms of what can be built and
when to indicate completion of an access.

Note
Implementations remain free to choose the mechanisms required to implement the functionality of
the memory model.
More information about the memory order model is given in the following subsections:
• Reads and writes on page A3-42
• Ordering requirements for memory accesses on page A3-45
• Memory barriers on page A3-47.
Additional attributes and behaviors relate to the memory system architecture. These features are defined in
the system level section of this manual:
• Virtual memory systems based on an MMU, described in Chapter B3 Virtual Memory System
Architecture (VMSA).
• Protected memory systems based on an MPU, described in Chapter B4 Protected Memory System
Architecture (PMSA).
• Caches, described in Caches on page B2-3.

Note
In these system-level descriptions, some attributes are described in relation to an MMU. In general, these
descriptions can also be applied to an MPU-based system.

A3.8.1 Reads and writes
Each memory access is either a read or a write. Explicit memory accesses are the memory accesses required
by the function of an instruction. The following can cause memory accesses that are not explicit:
• instruction fetches
• cache loads and writebacks
• translation table walks.
Except where otherwise stated, the memory ordering requirements only apply to explicit memory accesses.

Reads
Reads are defined as memory operations that have the semantics of a load.
The memory accesses of the following instructions are reads:
• LDR, LDRB, LDRH, LDRSB, and LDRSH
• LDRT, LDRBT, LDRHT, LDRSBT, and LDRSHT
• LDREX, LDREXB, LDREXD, and LDREXH
• LDM, LDRD, POP, and RFE
• LDC, LDC2, VLDM, VLDR, VLD1, VLD2, VLD3, and VLD4
• the return of status values by STREX, STREXB, STREXD, and STREXH
• in the ARM instruction set only, SWP and SWPB
• in the Thumb instruction set only, TBB and TBH.
Hardware-accelerated opcode execution by the Jazelle extension can cause a number of reads to occur,
according to the state of the operand stack and the implementation of the Jazelle hardware acceleration.

Writes
Writes are defined as memory operations that have the semantics of a store.
The memory accesses of the following instructions are writes:
• STR, STRB, and STRH
• STRT, STRBT, and STRHT
• STREX, STREXB, STREXD, and STREXH
• STM, STRD, PUSH, and SRS
• STC, STC2, VSTM, VSTR, VST1, VST2, VST3, and VST4
• in the ARM instruction set only, SWP and SWPB.
Hardware-accelerated opcode execution by the Jazelle extension can cause a number of writes to occur,
according to the state of the operand stack and the implementation of the Jazelle hardware acceleration.

Synchronization primitives
Synchronization primitives must ensure correct operation of system semaphores in the memory order
model. The synchronization primitive instructions are defined as those instructions that are used to ensure
memory synchronization:
• LDREX, STREX, LDREXB, STREXB, LDREXD, STREXD, LDREXH, STREXH.
• SWP, SWPB. Use of these instructions is deprecated from ARMv6.
Before ARMv6, support consisted of the SWP and SWPB instructions. ARMv6 introduced new Load-Exclusive
and Store-Exclusive instructions LDREX and STREX, and deprecated using the SWP and SWPB instructions.
ARMv7 introduces:
• additional Load-Exclusive and Store-Exclusive instructions, LDREXB, LDREXD, LDREXH, STREXB, STREXD,
and STREXH
• the Clear-Exclusive instruction CLREX
• the Load-Exclusive, Store-Exclusive and Clear-Exclusive instructions in the Thumb instruction set.

For details of the Load-Exclusive, Store-Exclusive and Clear-Exclusive instructions see Synchronization
and semaphores on page A3-12.
The Load-Exclusive and Store-Exclusive instructions are supported for Shareable and Non-shareable
memory. Non-shareable memory can be used to synchronize processes that are running on the same
processor. Shareable memory must be used to synchronize processes that might be running on different
processors.
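As an application-level illustration, the following C11 sketch builds a minimal spinlock from a compare-and-swap. It assumes a C11 toolchain with <stdatomic.h>; the type and function names (demo_spinlock, spin_trylock, spin_lock, spin_unlock) are hypothetical. When targeting ARMv7, compilers such as GCC and Clang typically lower the compare-and-swap to an LDREX/STREX retry loop with DMB barriers, but the exact instruction sequence is a compiler choice and is not specified by this manual. To synchronize processes that might run on different processors, the lock word must be in Shareable memory.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical lock word: 0 means free, 1 means held. */
typedef struct {
    atomic_int state;
} demo_spinlock;

/* One acquisition attempt. When targeting ARMv7, GCC and Clang typically
   compile this compare-and-swap to an LDREX/STREX loop plus DMB (assumption:
   the exact sequence depends on the compiler and options). */
static bool spin_trylock(demo_spinlock *l)
{
    int expected = 0;
    return atomic_compare_exchange_strong_explicit(
        &l->state, &expected, 1,
        memory_order_acquire,   /* on success: later accesses stay after the lock */
        memory_order_relaxed);  /* on failure: nothing needs ordering */
}

/* Busy-waits until the lock is acquired. */
static void spin_lock(demo_spinlock *l)
{
    while (!spin_trylock(l)) {
        /* spin */
    }
}

/* Release ordering keeps the critical section's accesses before the
   store that frees the lock. */
static void spin_unlock(demo_spinlock *l)
{
    atomic_store_explicit(&l->state, 0, memory_order_release);
}
```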

Observability and completion
An observer is an agent in the system that can access memory. For a processor, the following mechanisms
must be treated as independent observers:
• the mechanism that performs reads or writes to memory
• a mechanism that causes an instruction cache to be filled from memory or that fetches instructions to
be executed directly from memory
• a mechanism that performs translation table walks.

The set of observers that can observe a memory access is defined by the system.
For all memory:
• a write to a location in memory is said to be observed by an observer when a subsequent read of the
location by the same observer will return the value written by the write
• a write to a location in memory is said to be globally observed for a shareability domain when a
subsequent read of the location by any observer in that shareability domain will return the value
written by the write
• a read of a location in memory is said to be observed by an observer when a subsequent write to the
location by the same observer will have no effect on the value returned by the read
• a read of a location in memory is said to be globally observed for a shareability domain when a
subsequent write to the location by any observer in that shareability domain will have no effect on
the value returned by the read.

Additionally, for Strongly-ordered memory:
• A read or write of a memory-mapped location in a peripheral that exhibits side effects is said to be
observed, and globally observed, only when the read or write:
  – meets the general conditions listed above
  – can begin to affect the state of the memory-mapped peripheral
  – can trigger all associated side effects, whether they affect other peripheral devices, processors
    or memory.

For all memory, the completion rules are defined as:
• A read or write is complete for a shareability domain when all of the following are true:
  – the read or write is globally observed for that shareability domain
  – any translation table walks associated with the read or write are complete for that shareability
    domain.
• A translation table walk is complete for a shareability domain when the memory accesses associated
with the translation table walk are globally observed for that shareability domain, and the TLB is
updated.
• A cache, branch predictor or TLB maintenance operation is complete for a shareability domain when
the effects of the operation are globally observed for that shareability domain and any translation table
walks that arise from the operation are complete for that shareability domain.
The completion of any cache, branch predictor and TLB maintenance operation includes its
completion on all processors that are affected by both the operation and the DSB.

Side effect completion in Strongly-ordered and Device memory
The completion of a memory access in Strongly-ordered or Device memory is not guaranteed to be
sufficient to determine that the side effects of the memory access are visible to all observers. The mechanism
that ensures the visibility of the side effects of a memory access is IMPLEMENTATION DEFINED.

A3.8.2 Ordering requirements for memory accesses
ARMv7 and ARMv6 define restrictions on the permitted ordering of memory accesses. These
restrictions depend on the memory attributes of the accesses involved.
Two terms used in describing the memory access ordering requirements are:
Address dependency
An address dependency exists when the value returned by a read access is used to compute
the virtual address of a subsequent read or write access. An address dependency exists even
if the value read by the first read access does not change the virtual address of the second
read or write access. This might be the case if the value returned is masked off before it is
used, or if it has no effect on the predicted address value for the second access.
Control dependency
A control dependency exists when the data value returned by a read access is used to
determine the condition code flags, and the values of the flags are used for condition code
checking to determine the address of a subsequent read access. This address determination
might be through conditional execution, or through the evaluation of a branch.
Figure A3-4 on page A3-46 shows the memory ordering between two explicit accesses A1 and A2, where
A1 occurs before A2 in program order. The symbols used in the figure are as follows:
<   Accesses must be observed in program order, that is, A1 must be observed before A2.
-   Accesses can be observed in any order, provided that the requirements of uniprocessor
    semantics, for example respecting dependencies between instructions in a single processor,
    are maintained.
    The following additional restrictions apply to the ordering of memory accesses that have this
    symbol:
    • If there is an address dependency then the two memory accesses are observed in
      program order by any observer in the common shareability domain of the two
      accesses.
      This ordering restriction does not apply if there is only a control dependency between
      the two read accesses.
      If there is both an address dependency and a control dependency between two read
      accesses, the ordering requirements of the address dependency apply.
    • If the value returned by a read access is used as data written by a subsequent write
      access, then the two memory accesses are observed in program order.
    • It is impossible for an observer in the shareability domain of a memory location to
      observe a write access to that memory location if that location would not be written
      to in a sequential execution of a program.
    • It is impossible for an observer in the shareability domain of a memory location to
      observe a value written to that memory location if that value would not be
      written in a sequential execution of a program.
    • It is impossible for an observer in the shareability domain of a memory location to
      observe two reads to the same memory location performed by the same observer in
      an order that would not occur in a sequential execution of a program.

In Figure A3-4, an access refers to a read or a write access to the specified memory type.
For example, Device access, Non-shareable refers to a read or write access to Non-shareable
Device memory.
A1 \ A2                       Normal   Device access,  Device access,  Strongly-ordered
                              access   Non-shareable   Shareable       access
Normal access                 -        -               -               -
Device access, Non-shareable  -        <               -               <
Device access, Shareable      -        -               <               <
Strongly-ordered access       -        <               <               <

Figure A3-4 Memory ordering restrictions
There are no ordering requirements for implicit accesses to any type of memory.
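Figure A3-4 can be captured as a lookup table, which makes it easy to check a given pair of access types. This is a sketch only: the enumeration and function names are hypothetical, and the lookup encodes just the '<' and '-' entries for explicit accesses, not the dependency rules that qualify the '-' entries.

```c
#include <stdbool.h>

/* Access types, in the row and column order of Figure A3-4 (hypothetical names). */
typedef enum {
    ACC_NORMAL,
    ACC_DEVICE_NONSHAREABLE,
    ACC_DEVICE_SHAREABLE,
    ACC_STRONGLY_ORDERED
} access_kind;

/* order_table[A1][A2]: true is '<' (A1 must be observed before A2),
   false is '-' (any order, subject to uniprocessor semantics and the
   additional restrictions listed for '-'). */
static const bool order_table[4][4] = {
    /* A2:          Normal  Dev-NS  Dev-S   SO    */
    /* Normal */  { false,  false,  false,  false },
    /* Dev-NS */  { false,  true,   false,  true  },
    /* Dev-S  */  { false,  false,  true,   true  },
    /* SO     */  { false,  true,   true,   true  },
};

/* A1 precedes A2 in program order; both are explicit accesses. */
static bool must_observe_in_order(access_kind a1, access_kind a2)
{
    return order_table[a1][a2];
}
```

For example, a Strongly-ordered access followed by a Shareable Device access must be observed in order, while a Normal access followed by anything carries no requirement from the table alone.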

Program order for instruction execution
The program order of instruction execution is the order of the instructions in the control flow trace.
Explicit memory accesses in an execution can be either:
Strictly Ordered
Denoted by <. Must occur strictly in order.
Ordered
Denoted by <=. Can occur either in order or simultaneously.
Load/store multiple instructions, such as LDM, LDRD, STM, and STRD, generate multiple word accesses, each of
which is a separate access for the purpose of determining ordering.
The rules for determining program order for two accesses A1 and A2 are:
If A1 and A2 are generated by two different instructions:
• A1 < A2 if the instruction that generates A1 occurs before the instruction that generates A2 in
program order
• A2 < A1 if the instruction that generates A2 occurs before the instruction that generates A1 in
program order.
If A1 and A2 are generated by the same instruction:
• If A1 and A2 are the load and store generated by a SWP or SWPB instruction:
  – A1 < A2 if A1 is the load and A2 is the store
  – A2 < A1 if A2 is the load and A1 is the store.
• If A1 and A2 are two word loads generated by an LDC-class or LDM-class instruction, or two word
stores generated by an STC-class or STM-class instruction, excluding LDM-class and STM-class
instructions with a register list that includes the PC:
  – A1 <= A2 if the address of A1 is less than the address of A2
  – A2 <= A1 if the address of A2 is less than the address of A1.
  In these descriptions:
  – an LDM-class instruction is any form of LDM, LDMDA, LDMDB, LDMIB, or POP instruction
  – an LDC-class instruction is an LDC, VLDM, or VLDR instruction
  – an STM-class instruction is any form of STM, STMDA, STMDB, STMIB, or PUSH instruction
  – an STC-class instruction is an STC, VSTM, or VSTR instruction.
• If A1 and A2 are two word loads generated by an LDM-class instruction with a register list that
includes the PC or two word stores generated by an STM-class instruction with a register list that
includes the PC, the program order of the memory accesses is not defined.
• If A1 and A2 are two word loads generated by an LDRD instruction or two word stores generated by
an STRD instruction, the program order of the memory accesses is not defined.
• If A1 and A2 are load or store accesses generated by Advanced SIMD element or structure load/store
instructions, the program order of the memory accesses is not defined.
• For any instruction or operation not explicitly mentioned in this section, if the single-copy atomicity
rules described in Single-copy atomicity on page A3-27 mean the operation becomes a sequence of
accesses, then the time-ordering of those accesses is not defined.
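The address-ordering rule for LDM-class and STM-class instructions can be sketched by computing the word addresses an instruction accesses and listing them lowest first; by the rule above, that ascending order is a valid program order for the accesses (accesses related by <= may also occur simultaneously). The start-address formulas for the IA, IB, DA and DB forms are taken from the load/store multiple instruction descriptions rather than from this section, and all names in the sketch are hypothetical. The sketch does not cover register lists that include the PC, for which program order is not defined.

```c
#include <stddef.h>
#include <stdint.h>

/* Addressing forms of LDM-class/STM-class instructions (hypothetical names). */
typedef enum { MODE_IA, MODE_IB, MODE_DA, MODE_DB } lsm_mode;

/* Writes the word addresses accessed by a load/store multiple of nregs
   registers (PC not in the list) into out[], lowest address first, and
   returns the number written. Lowest address first is also an order
   consistent with "A1 <= A2 if the address of A1 is less than A2". */
static size_t lsm_word_addresses(lsm_mode mode, uint32_t rn, size_t nregs,
                                 uint32_t *out)
{
    uint32_t start;
    switch (mode) {
    case MODE_IA: start = rn;                               break; /* increment after  */
    case MODE_IB: start = rn + 4u;                          break; /* increment before */
    case MODE_DA: start = rn - 4u * (uint32_t)nregs + 4u;   break; /* decrement after  */
    case MODE_DB: start = rn - 4u * (uint32_t)nregs;        break; /* decrement before */
    default:      return 0;
    }
    /* The lowest-numbered register uses the lowest address. */
    for (size_t i = 0; i < nregs; i++)
        out[i] = start + 4u * (uint32_t)i;
    return nregs;
}
```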

A3.8.3 Memory barriers
Memory barrier is the general term applied to an instruction, or sequence of instructions, used to force
synchronization events by a processor with respect to retiring load/store instructions. The ARM architecture
defines a number of memory barriers that provide a range of functionality, including:
• ordering of issued load/store instructions to the programmers’ model
• completion of preceding load/store instructions to the programmers’ model
• flushing of any instructions prefetched before the memory barrier operation.
ARMv7 and ARMv6 require three explicit memory barriers to support the memory order model described
in this chapter. In ARMv7 the memory barriers are provided as instructions that are available in the ARM
and Thumb instruction sets, and in ARMv6 the memory barriers are performed by CP15 register writes. The
three memory barriers are:
• Data Memory Barrier, see Data Memory Barrier (DMB) on page A3-48
• Data Synchronization Barrier, see Data Synchronization Barrier (DSB) on page A3-49
• Instruction Synchronization Barrier, see Instruction Synchronization Barrier (ISB) on page A3-49.
Depending on the synchronization needed, a program might use memory barriers on their own, or it might
use them in conjunction with cache and memory management maintenance operations that are only
available in privileged modes.

The DMB and DSB memory barriers affect reads and writes to the memory system generated by load/store
instructions and data or unified cache maintenance operations being executed by the processor. Instruction
fetches or accesses caused by a hardware translation table access are not explicit accesses.

Data Memory Barrier (DMB)
The DMB instruction is a data memory barrier. The processor that executes the DMB instruction is referred to
as the executing processor, Pe. The DMB instruction takes the required shareability domain and required
access types as arguments. If the required shareability is Full system then the operation applies to all
observers within the system.
A DMB creates two groups of memory accesses, Group A and Group B:
Group A
Contains:
• All explicit memory accesses of the required access types from observers in the same
required shareability domain as Pe that are observed by Pe before the DMB instruction.
These accesses include any accesses of the required access types and required
shareability domain performed by Pe.
• All loads of required access types from observers in the same required shareability
domain as Pe that have been observed by any given observer, Py, in the same required
shareability domain as Pe before Py has performed a memory access that is a member
of Group A.
Group B
Contains:
• All explicit memory accesses of the required access types by Pe that occur in program
order after the DMB instruction.
• All explicit memory accesses of the required access types by any given observer Px
in the same required shareability domain as Pe that can only occur after Px has
observed a store that is a member of Group B.

Any observer with the same required shareability domain as Pe observes all members of Group A before it
observes any member of Group B to the extent that those group members are required to be observed, as
determined by the shareability and cacheability of the memory locations accessed by the group members.
Where members of Group A and Group B access the same memory-mapped peripheral, all members of
Group A will be visible at the memory-mapped peripheral before any members of Group B are visible at
that peripheral.

Note
• A memory access might be in neither Group A nor Group B. The DMB does not affect the order of
observation of such a memory access.
• The second part of the definition of Group A is recursive. Ultimately, membership of Group A derives
from the observation by Py of a load before Py performs an access that is a member of Group A as a
result of the first part of the definition of Group A.
• The second part of the definition of Group B is recursive. Ultimately, membership of Group B derives
from the observation by any observer of an access by Pe that is a member of Group B as a result of
the first part of the definition of Group B.

DMB only affects memory accesses. It has no effect on the ordering of any other instructions executing on the
processor.

For details of the DMB instruction in the Thumb and ARM instruction sets see DMB on page A8-90.
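A typical use of DMB is message passing between two observers in the same shareability domain. The following C11 sketch shows the pattern; the mailbox type and function names are hypothetical, and the mapping to hardware is an assumption about common toolchains: when targeting ARMv7, compilers commonly implement C11 acquire and release fences with a DMB (usually DMB ISH). In the producer, the payload store falls in Group A and the flag store in Group B. The consumer also needs a barrier, because a control dependency alone does not order two reads.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical mailbox: ordinary data plus a ready flag. */
typedef struct {
    int payload;        /* plain (non-atomic) data in Normal memory */
    atomic_int ready;   /* 0 = empty, 1 = payload valid */
} mailbox;

/* Producer: the release fence (commonly a DMB on ARMv7) keeps the payload
   store observable before the flag store. */
static void mailbox_publish(mailbox *m, int value)
{
    m->payload = value;
    atomic_thread_fence(memory_order_release);
    atomic_store_explicit(&m->ready, 1, memory_order_relaxed);
}

/* Consumer: after seeing the flag, the acquire fence keeps the payload
   load after the flag load; the branch alone would not order them. */
static bool mailbox_consume(mailbox *m, int *out)
{
    if (atomic_load_explicit(&m->ready, memory_order_relaxed) == 0)
        return false;
    atomic_thread_fence(memory_order_acquire);
    *out = m->payload;
    return true;
}
```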

Data Synchronization Barrier (DSB)
The DSB instruction is a special memory barrier that synchronizes the execution stream with memory
accesses. The DSB instruction takes the required shareability domain and required access types as arguments.
If the required shareability is Full system then the operation applies to all observers within the system.
A DSB behaves as a DMB with the same arguments, and also has the additional properties defined here.
A DSB completes when both:
• all explicit memory accesses that are observed by Pe before the DSB is executed, that are of the required
access types, and that are from observers in the same required shareability domain as Pe, are complete for
the set of observers in the required shareability domain
• all cache, branch predictor, and TLB maintenance operations issued by Pe before the DSB are complete
for the required shareability domain.

In addition, no instruction that appears in program order after the DSB instruction can execute until the DSB
completes.
For details of the DSB instruction in the Thumb and ARM instruction sets see DSB on page A8-92.

Note
Historically, this operation was referred to as Drain Write Buffer or Data Write Barrier (DWB). From
ARMv6, these names and the use of DWB were deprecated in favor of the new Data Synchronization Barrier
name and DSB abbreviation. DSB better reflects the functionality provided from ARMv6, because DSB is
architecturally defined to include all cache, TLB and branch prediction maintenance operations as well as
explicit memory operations.

Instruction Synchronization Barrier (ISB)
An ISB instruction flushes the pipeline in the processor, so that all instructions that come after the ISB
instruction in program order are fetched from cache or memory only after the ISB instruction has completed.
Using an ISB ensures that the effects of context altering operations executed before the ISB are visible to the
instructions fetched after the ISB instruction. Examples of context altering operations that require the
insertion of an ISB instruction to ensure the operations are complete are:
• cache, TLB, and branch predictor maintenance operations
• changes to the CP14 and CP15 registers.

In addition, any branches that appear in program order after the ISB instruction are written into the branch
prediction logic with the context that is visible after the ISB instruction. This is needed to ensure correct
execution of the instruction stream.
Any context altering operations appearing in program order after the ISB instruction only take effect after
the ISB has been executed.
For details of the ISB instruction in the Thumb and ARM instruction sets see ISB on page A8-102.

Pseudocode details of memory barriers
The following types define the required shareability domains and required access types used as arguments
for DMB and DSB instructions:
enumeration MBReqDomain {MBReqDomain_FullSystem,
MBReqDomain_OuterShareable,
MBReqDomain_InnerShareable,
MBReqDomain_Nonshareable};
enumeration MBReqTypes {MBReqTypes_All, MBReqTypes_Writes};

The following procedures perform the memory barriers:
DataMemoryBarrier(MBReqDomain domain, MBReqTypes types)
DataSynchronizationBarrier(MBReqDomain domain, MBReqTypes types)
InstructionSynchronizationBarrier()
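The pseudocode arguments correspond to the option operand of the DMB and DSB instructions. The following C sketch mirrors the two enumerations and maps each combination to an option name. The option names (SY, ST, OSH, OSHST, ISH, ISHST, NSH, NSHST) are taken from the DMB and DSB instruction descriptions rather than from this section, and the function name barrier_option is hypothetical.

```c
#include <stddef.h>

/* C mirrors of the pseudocode enumerations above. */
typedef enum {
    MBReqDomain_FullSystem,
    MBReqDomain_OuterShareable,
    MBReqDomain_InnerShareable,
    MBReqDomain_Nonshareable
} MBReqDomain;

typedef enum { MBReqTypes_All, MBReqTypes_Writes } MBReqTypes;

/* Returns the assembler option name for a DMB/DSB with these arguments,
   or NULL if either argument is out of range. */
static const char *barrier_option(MBReqDomain domain, MBReqTypes types)
{
    static const char *const names[4][2] = {
        /*                     All     Writes  */
        /* FullSystem     */ { "SY",   "ST"    },
        /* OuterShareable */ { "OSH",  "OSHST" },
        /* InnerShareable */ { "ISH",  "ISHST" },
        /* Nonshareable   */ { "NSH",  "NSHST" },
    };
    if ((unsigned)domain > 3u || (unsigned)types > 1u)
        return NULL;
    return names[domain][types];
}
```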

A3.9 Caches and memory hierarchy
The implementation of a memory system depends heavily on the microarchitecture and therefore the details
of the system are IMPLEMENTATION DEFINED. ARMv7 defines the application level interface to the memory
system, and supports a hierarchical memory system with multiple levels of cache. This section provides an
application level view of this system. It contains the subsections:
• Introduction to caches
• Memory hierarchy on page A3-52
• Implication of caches for the application programmer on page A3-52
• Preloading caches on page A3-54.

A3.9.1 Introduction to caches
A cache is a block of high-speed memory that contains a number of entries, each consisting of:
• main memory address information, commonly known as a tag
• the associated data.
Caches are used to increase the average speed of a memory access. Cache operation takes account of two
principles of locality:
Spatial locality
An access to one location is likely to be followed by accesses to adjacent locations.
Examples of this principle are:
• sequential instruction execution
• accessing a data structure.
Temporal locality
An access to an area of memory is likely to be repeated in a short time period. An example
of this principle is the execution of a code loop.
To minimize the quantity of control information stored, the spatial locality property is used to group several
locations together under the same tag. This logical block is commonly known as a cache line. When data is
loaded into a cache, access times for subsequent loads and stores are reduced, resulting in overall
performance benefits. An access to information already in a cache is known as a cache hit, and other
accesses are called cache misses.
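The grouping of locations under one tag can be illustrated by splitting an address into offset, index, and tag fields. This is a sketch under assumed parameters: the 32-byte line and 128-set geometry, and all names, are hypothetical, and this manual does not prescribe any particular cache organization. Addresses that differ only in the offset bits fall in the same cache line, which is how spatial locality is exploited.

```c
#include <stdint.h>

/* Hypothetical geometry, for illustration only. */
#define LINE_BYTES 32u   /* bytes per cache line */
#define NUM_SETS   128u  /* number of sets (lines per way) */

typedef struct {
    uint32_t tag;     /* address information stored alongside the line */
    uint32_t index;   /* which set the line maps to */
    uint32_t offset;  /* byte position within the line */
} cache_fields;

static cache_fields split_address(uint32_t addr)
{
    cache_fields f;
    f.offset = addr % LINE_BYTES;
    f.index  = (addr / LINE_BYTES) % NUM_SETS;
    f.tag    = addr / (LINE_BYTES * NUM_SETS);
    return f;
}
```

For example, 0x8000 and 0x801C share a tag and index and so occupy the same line, while 0x8020 maps to the next set.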
Normally, caches are self-managing, with the updates occurring automatically. Whenever the processor
wants to access a cacheable location, the cache is checked. If the access is a cache hit, the access occurs in
the cache; otherwise, a location is allocated and the cache line is loaded from memory. Different cache
topologies and access policies are possible; however, they must comply with the memory coherency model
of the underlying architecture.
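The hit, miss, and allocate behavior described above can be modelled with a toy direct-mapped cache. The geometry (16-byte lines, four lines), the allocate-on-every-miss policy, and all names are assumptions chosen for illustration; real implementations differ in size, associativity, and allocation policy, and the model tracks only tags, not data.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical geometry for a toy direct-mapped cache. */
#define SIM_LINE_BYTES 16u
#define SIM_LINES      4u

typedef struct {
    bool     valid[SIM_LINES];
    uint32_t tag[SIM_LINES];
    unsigned hits, misses;
} toy_cache;

/* Looks up addr. A hit is counted when the indexed line holds the same tag;
   otherwise the miss allocates the line, evicting any previous occupant. */
static void toy_access(toy_cache *c, uint32_t addr)
{
    uint32_t line  = addr / SIM_LINE_BYTES;
    uint32_t index = line % SIM_LINES;
    uint32_t tag   = line / SIM_LINES;
    if (c->valid[index] && c->tag[index] == tag) {
        c->hits++;
    } else {
        c->misses++;
        c->valid[index] = true;
        c->tag[index]   = tag;
    }
}
```

Two passes over eight consecutive words starting at address 0 give 2 misses (one per line, first pass only) and 14 hits: spatial locality serves the rest of each line, and temporal locality serves the whole second pass.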
Caches introduce a number of potential problems, mainly because of:
• memory accesses occurring at times other than when the programmer would normally expect them
• there being multiple physical locations where a data item can be held.

A3.9.2 Memory hierarchy
Memory close to a processor has very low latency, but is limited in size and expensive to implement. Further
from the processor it is easier to implement larger blocks of memory but these have increased latency. To
optimize overall performance, an ARMv7 memory system can include multiple levels of cache in a
hierarchical memory system. Figure A3-5 shows such a system, in an ARMv7-A implementation of a
VMSA, supporting virtual addressing.
[Figure A3-5: a processor (registers R0 to R15, instruction prefetch, load and store paths, CP15
configuration and control, and address translation from virtual to physical addresses) connected to a
Level 1 cache, a Level 2 cache, Level 3 memory (DRAM, SRAM, Flash, ROM), and Level 4 storage
(for example, CF card, disk).]
Figure A3-5 Multiple levels of cache in a memory hierarchy

Note
In this manual, in a hierarchical memory system, Level 1 refers to the level closest to the processor, as shown
in Figure A3-5.

A3.9.3 Implication of caches for the application programmer
In normal operation, the caches are largely invisible to the application programmer. However, they can
become visible when there is a breakdown in the coherency of the caches. Such a breakdown can occur:
• when memory locations are updated by other agents in the system
• when memory updates made from the application code must be made visible to other agents in the
system.

For example:
• In a system with a DMA controller that reads memory locations that are held in the data cache of a
processor, a breakdown of coherency occurs when the processor has written new data in the data
cache, but the DMA controller reads the old data held in memory.
• In a Harvard architecture of caches, where there are separate instruction and data caches, a
breakdown of coherency occurs when new instruction data has been written into the data cache, but
the instruction cache still contains the old instruction data.

Data coherency issues
You can ensure the data coherency of caches in the following ways:
• By not using the caches in situations where coherency issues can arise. You can achieve this by:
  – using Non-cacheable or, in some cases, Write-Through Cacheable memory for the caches
  – not enabling caches in the system.
• By using cache maintenance operations to manage the coherency issues in software, see Cache
maintenance functionality on page B2-9. Many of these operations are only available to system
software.
• By using hardware coherency mechanisms to ensure the coherency of data accesses to memory for
cacheable locations by observers within the different shareability domains, see Non-shareable
Normal memory on page A3-30 and Shareable, Inner Shareable, and Outer Shareable Normal
memory on page A3-30.
The performance of these hardware coherency mechanisms is highly implementation specific. In
some implementations the mechanism suppresses the ability to cache shareable locations. In other
implementations, cache coherency hardware can hold data in caches while managing coherency
between observers within the shareability domains.

Instruction coherency issues
How far ahead of the current point of execution instructions are prefetched is IMPLEMENTATION
DEFINED. Such prefetching can be either a fixed or a dynamically varying number of instructions, and can
follow any or all possible future execution paths. For all types of memory:
• the processor might have fetched the instructions from memory at any time since the last ISB,
exception entry or exception return executed by that processor
• any instructions fetched in this way might be executed multiple times, if this is required by the
execution of the program, without being refetched from memory.

In addition, the ARM architecture does not require the hardware to ensure coherency between instruction
caches and memory, even for regions of memory with Shareable attributes. This means that for cacheable
regions of memory, an instruction cache can hold instructions that were fetched from memory before the
last ISB, exception entry or exception return.
If software requires coherency between instruction execution and memory, it must manage this coherency
using the ISB and DSB memory barriers and cache maintenance operations, see Ordering of cache and
branch predictor maintenance operations on page B2-21. Many of these operations are only available to
system software.
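Application code that generates instructions at run time, such as a JIT compiler, usually obtains the required maintenance through the toolchain or operating system rather than by issuing privileged operations itself. The following C sketch assumes GCC or Clang, whose __builtin___clear_cache builtin makes a range of memory coherent for instruction fetch; on ARM Linux this typically goes through a kernel call, and on some other targets it compiles to nothing. The function name install_code is hypothetical, and obtaining executable memory is outside the scope of the sketch.

```c
#include <stddef.h>
#include <string.h>

/* Copies freshly generated instructions into buf, then asks the toolchain to
   make that range coherent between the data side and instruction fetch.
   Assumption: GCC or Clang; how the maintenance is performed is target- and
   OS-specific. buf must already be executable for the code to be run. */
static void install_code(unsigned char *buf, const unsigned char *code, size_t len)
{
    memcpy(buf, code, len);
    __builtin___clear_cache((char *)buf, (char *)buf + len);
}
```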

A3.9.4 Preloading caches
The ARM architecture provides memory system hints PLD (Preload Data) and PLI (Preload Instruction) to
permit software to communicate the expected use of memory locations to the hardware. The memory system
can respond by taking actions that are expected to speed up the memory accesses if and when they do occur.
The effect of these memory system hints is IMPLEMENTATION DEFINED. Typically, implementations will use
this information to bring the data or instruction locations into caches that have faster access times than
normal memory.
The Preload instructions are hints, and so implementations can treat them as NOPs without affecting the
functional behavior of the device. The instructions do not generate synchronous Data Abort exceptions, but
the memory system operations might, under exceptional circumstances, generate asynchronous aborts. For
more information, see Data Abort exception on page B1-55.
Hardware implementations can provide other implementation-specific mechanisms to prefetch memory
locations in the cache. These must comply with the general cache behavior described in Cache behavior on
page B2-5.
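Software can issue these hints from C through the GCC and Clang __builtin_prefetch builtin, which compilers typically turn into a PLD when targeting ARMv7. In the sketch below, the prefetch distance of 16 elements is an arbitrary assumption that would need tuning for a given implementation, and the function name is hypothetical. Because the hint has no architectural effect, the result is the same whether or not the hardware acts on it.

```c
#include <stddef.h>

/* Hypothetical tuning parameter: how many elements ahead to preload. */
#define PRELOAD_DISTANCE 16u

/* Sums an array, hinting that data PRELOAD_DISTANCE elements ahead will be
   read soon. The bounds check avoids forming a pointer past the array. */
static long sum_with_preload(const int *a, size_t n)
{
    long total = 0;
    for (size_t i = 0; i < n; i++) {
        if (i + PRELOAD_DISTANCE < n)
            /* rw = 0 (read), locality = 3 (keep in cache if possible) */
            __builtin_prefetch(&a[i + PRELOAD_DISTANCE], 0, 3);
        total += a[i];
    }
    return total;
}
```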

Chapter A4
The Instruction Sets

This chapter describes the ARM and Thumb instruction sets. It contains the following sections:
• About the instruction sets on page A4-2
• Unified Assembler Language on page A4-4
• Branch instructions on page A4-7
• Data-processing instructions on page A4-8
• Status register access instructions on page A4-18
• Load/store instructions on page A4-19
• Load/store multiple instructions on page A4-22
• Miscellaneous instructions on page A4-23
• Exception-generating and exception-handling instructions on page A4-24
• Coprocessor instructions on page A4-25
• Advanced SIMD and VFP load/store instructions on page A4-26
• Advanced SIMD and VFP register transfer instructions on page A4-29
• Advanced SIMD data-processing operations on page A4-30
• VFP data-processing instructions on page A4-38.


A4.1 About the instruction sets
ARMv7 contains two main instruction sets, the ARM and Thumb instruction sets. Much of the functionality
available is identical in the two instruction sets. This chapter describes the functionality available in the
instruction sets, and the Unified Assembler Language (UAL) that can be assembled to either instruction set.
The two instruction sets differ in how instructions are encoded:
• Thumb instructions are either 16-bit or 32-bit, and are aligned on a two-byte boundary. 16-bit and
  32-bit instructions can be intermixed freely. Many common operations are most efficiently executed
  using 16-bit instructions. However:
  – Most 16-bit instructions can only access eight of the general-purpose registers, R0-R7. These
    are known as the low registers. A small number of 16-bit instructions can access the high
    registers, R8-R15.
  – Many operations that would require two or more 16-bit instructions can be more efficiently
    executed with a single 32-bit instruction.
• ARM instructions are always 32-bit, and are aligned on a four-byte boundary.
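The following Python sketch illustrates how the two Thumb instruction widths can be told apart when decoding. It assumes the ARMv7 Thumb encoding rule that a halfword whose bits [15:11] are 0b11101, 0b11110, or 0b11111 is the first halfword of a 32-bit instruction, and that any other halfword is a complete 16-bit instruction (Chapter A6 gives the authoritative rule). The function names are hypothetical.

```python
def thumb_instruction_size(first_halfword: int) -> int:
    """Return 2 or 4: the size in bytes of the Thumb instruction that starts
    with first_halfword, decided from bits [15:11]."""
    top5 = (first_halfword >> 11) & 0b11111
    if top5 in (0b11101, 0b11110, 0b11111):
        return 4  # first halfword of a 32-bit instruction
    return 2      # complete 16-bit instruction


def split_thumb_stream(halfwords):
    """Group a sequence of halfwords into instructions (tuples of 1 or 2)."""
    out, i = [], 0
    while i < len(halfwords):
        if thumb_instruction_size(halfwords[i]) == 4:
            out.append(tuple(halfwords[i:i + 2]))
            i += 2
        else:
            out.append((halfwords[i],))
            i += 1
    return out
```

For example, the halfword stream 0x4408, 0xF04F, 0x0001, 0xE7FE splits into a 16-bit instruction, a 32-bit instruction, and another 16-bit instruction.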

The ARM and Thumb instruction sets can interwork freely, that is, different procedures can be compiled or
assembled to different instruction sets, and still be able to call each other efficiently.
ThumbEE is a variant of the Thumb instruction set that is designed as a target for dynamically generated
code. However, it cannot interwork freely with the ARM and Thumb instruction sets.
See:
• Chapter A5 ARM Instruction Set Encoding for encoding details of the ARM instruction set
• Chapter A6 Thumb Instruction Set Encoding for encoding details of the Thumb instruction set
• Chapter A8 Instruction Details for detailed descriptions of the instructions
• Chapter A9 ThumbEE for encoding details of the ThumbEE instruction set.

A4.1.1 Changing between Thumb state and ARM state
A processor in Thumb state (that is, executing Thumb instructions) can enter ARM state (and change to
executing ARM instructions) by executing any of the following instructions: BX, BLX, or an LDR or LDM that
loads the PC.
A processor in ARM state (that is, executing ARM instructions) can enter Thumb state (and change to
executing Thumb instructions) by executing any of the same instructions.
In ARMv7, a processor in ARM state can also enter Thumb state (and change to executing Thumb
instructions) by executing an ADC, ADD, AND, ASR, BIC, EOR, LSL, LSR, MOV, MVN, ORR, ROR, RRX, RSB, RSC, SBC, or SUB
instruction that has the PC as destination register and does not set the condition flags.


Note
This permits calls and returns between ARM code written for ARMv4 processors and Thumb code running
on ARMv7 processors to function correctly. In new code, ARM recommends that you use BX or BLX
instructions instead. In particular, use BX LR to return from a procedure, not MOV PC,LR.
The target instruction set is either encoded directly in the instruction (for the immediate offset version of
BLX), or is held as bit [0] of an interworking address. For details, see the description of the BXWritePC()
function in Pseudocode details of operations on ARM core registers on page A2-12.
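The Python sketch below models the interworking rule that BXWritePC() applies outside ThumbEE state: bit [0] of the address selects the target instruction set, and an ARM-state target must be word-aligned. The function name and its return convention are illustrative only; the pseudocode referenced above remains the definitive description.

```python
def bx_write_pc(address: int):
    """Return (instruction_set, new_pc) for an interworking branch.

    Models the non-ThumbEE case of BXWritePC():
      address<0> == 1        -> Thumb state, PC = address with bit 0 cleared
      address<1:0> == 0b00   -> ARM state, PC = address
      address<1:0> == 0b10   -> UNPREDICTABLE
    """
    address &= 0xFFFFFFFF
    if address & 1:
        return "Thumb", address & 0xFFFFFFFE
    if (address & 2) == 0:
        return "ARM", address
    raise ValueError("UNPREDICTABLE: ARM-state target is not word-aligned")
```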
Exception entries and returns can also change between ARM and Thumb states. For details see Exceptions
on page B1-30.

A4.1.2 Conditional execution
Most ARM instructions can be conditionally executed. This means that they only have their normal effect
on the programmers’ model operation, memory and coprocessors if the N, Z, C and V flags in the APSR
satisfy a condition specified in the instruction. If the flags do not satisfy this condition, the instruction acts
as a NOP, that is, execution advances to the next instruction as normal, including any relevant checks for
exceptions being taken, but has no other effect.
Most Thumb instructions are unconditional. Conditional execution in Thumb code can be achieved using
any of the following instructions:
• A 16-bit conditional branch instruction, with a branch range of –256 to +254 bytes. For details see B
  on page A8-44. Before ARMv6T2, this was the only mechanism for conditional execution in Thumb
  code.
• A 32-bit conditional branch instruction, with a branch range of approximately ± 1MB. For details see
  B on page A8-44.
• 16-bit Compare and Branch on Zero and Compare and Branch on Nonzero instructions, with a branch
  range of +4 to +130 bytes. For details see CBNZ, CBZ on page A8-66.
• A 16-bit If-Then instruction that makes up to four following instructions conditional. For details see
  IT on page A8-104. The instructions that are made conditional by an IT instruction are called its IT
  block. Instructions in an IT block must either all have the same condition, or some can have one
  condition, and others can have the inverse condition.

For more information about conditional execution see Conditional execution on page A8-8.
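The branch ranges in the list above follow from the field widths of the 16-bit Thumb encodings. This sketch assumes the field names imm8 (16-bit conditional branch) and i, imm5 (CBZ, CBNZ) used in the instruction descriptions in Chapter A8, and a PC value equal to the instruction address plus 4. It shows why the conditional branch offset spans –256 to +254, and why CBZ and CBNZ can reach only 4 to 130 bytes forward of the instruction. The function names are hypothetical.

```python
def sign_extend(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of value as a two's complement number."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value & (1 << (bits - 1)) else value


def b_t1_offset(imm8: int) -> int:
    """16-bit conditional branch: SignExtend(imm8:'0'), added to the PC value."""
    return sign_extend(imm8 << 1, 9)


def cbz_target(instr_addr: int, i: int, imm5: int) -> int:
    """CBZ/CBNZ: PC value (instr_addr + 4) + ZeroExtend(i:imm5:'0')."""
    return instr_addr + 4 + ((i << 6) | (imm5 << 1))
```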


A4.2 Unified Assembler Language
This document uses the ARM Unified Assembler Language (UAL). This assembly language syntax
provides a canonical form for all ARM and Thumb instructions.
UAL describes the syntax for the mnemonic and the operands of each instruction. In addition, it assumes
that instructions and data items can be given labels. It does not specify the syntax to be used for labels, nor
what assembler directives and options are available. See your assembler documentation for these details.
Most earlier ARM assembly language mnemonics are still supported as synonyms, as described in the
instruction details.

Note
Most earlier Thumb assembly language mnemonics are not supported. For details see Appendix C Legacy
Instruction Mnemonics.
UAL includes instruction selection rules that specify which instruction encoding is selected when more than
one can provide the required functionality. For example, both 16-bit and 32-bit encodings exist for an
ADD R0,R1,R2 instruction. The most common instruction selection rule is that when both a 16-bit encoding
and a 32-bit encoding are available, the 16-bit encoding is selected, to optimize code density.
Syntax options exist to override the normal instruction selection rules and ensure that a particular encoding
is selected. These are useful when disassembling code, to ensure that subsequent assembly produces the
original code, and in some other situations.

A4.2.1 Conditional instructions
For maximum portability of UAL assembly language between the ARM and Thumb instruction sets, ARM
recommends that:
• IT instructions are written before conditional instructions in the correct way for the Thumb
  instruction set.
• When assembling to the ARM instruction set, assemblers check that any IT instructions are correct,
  but do not generate any code for them.

Although other Thumb instructions are unconditional, all instructions that are made conditional by an IT
instruction must be written with a condition. These conditions must match the conditions imposed by the IT
instruction. For example, an ITTEE EQ instruction imposes the EQ condition on the first two following
instructions, and the NE condition on the next two. Those four instructions must be written with EQ, EQ, NE
and NE conditions respectively.
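A sketch of how an assembler might derive the condition each instruction in an IT block must carry follows. It relies on the 4-bit condition encodings being paired so that inverting bit [0] gives the inverse condition (EQ is 0b0000 and NE is 0b0001, for example). The function names are hypothetical.

```python
# Condition mnemonics in encoding order: index == 4-bit cond field value.
CONDS = ["EQ", "NE", "CS", "CC", "MI", "PL", "VS", "VC",
         "HI", "LS", "GE", "LT", "GT", "LE", "AL"]


def inverse(cond: str) -> str:
    """Flip bit [0] of the condition encoding to get the inverse condition."""
    code = CONDS.index(cond)
    if code == 14:
        raise ValueError("AL has no inverse, so an IT AL block cannot use E")
    return CONDS[code ^ 1]


def it_block_conditions(mnemonic: str, firstcond: str):
    """Expand e.g. ('ITTEE', 'EQ') into the condition of each instruction."""
    if not mnemonic.startswith("IT") or len(mnemonic) > 5:
        raise ValueError("expected IT followed by up to three T/E letters")
    conds = [firstcond]               # the first instruction always uses firstcond
    for letter in mnemonic[2:]:
        if letter == "T":
            conds.append(firstcond)
        elif letter == "E":
            conds.append(inverse(firstcond))
        else:
            raise ValueError(f"bad IT letter {letter!r}")
    return conds
```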
Some instructions cannot be made conditional by an IT instruction. Some instructions can be conditional if
they are the last instruction in the IT block, but not otherwise.
The branch instruction encodings that include a condition field cannot be made conditional by an IT
instruction. If the assembler syntax indicates a conditional branch that correctly matches a preceding IT
instruction, it is assembled using a branch instruction encoding that does not include a condition field.


A4.2.2 Use of labels in UAL instruction syntax
The UAL syntax for some instructions includes the label of an instruction or a literal data item that is at a
fixed offset from the instruction being specified. The assembler must:
1. Calculate the PC or Align(PC,4) value of the instruction. The PC value of an instruction is its address
   plus 4 for a Thumb instruction, or plus 8 for an ARM instruction. The Align(PC,4) value of an
   instruction is its PC value ANDed with 0xFFFFFFFC to force it to be word-aligned. There is no
   difference between the PC and Align(PC,4) values for an ARM instruction, but there can be for a
   Thumb instruction.
2. Calculate the offset from the PC or Align(PC,4) value of the instruction to the address of the labelled
   instruction or literal data item.
3. Assemble a PC-relative encoding of the instruction, that is, one that reads its PC or Align(PC,4) value
   and adds the calculated offset to form the required address.

Note
For instructions that can encode a subtraction operation, if the instruction cannot encode the
calculated offset but can encode minus the calculated offset, the instruction encoding specifies a
subtraction of minus the calculated offset.
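Steps 1 and 2 can be sketched in Python as follows, assuming 32-bit addresses. The function names are illustrative and are not part of any assembler's interface.

```python
def pc_value(instr_addr: int, thumb: bool) -> int:
    """PC value read by an instruction: its address + 4 (Thumb) or + 8 (ARM)."""
    return (instr_addr + (4 if thumb else 8)) & 0xFFFFFFFF


def align_pc4(instr_addr: int, thumb: bool) -> int:
    """Align(PC,4): the PC value ANDed with 0xFFFFFFFC."""
    return pc_value(instr_addr, thumb) & 0xFFFFFFFC


def label_offset(instr_addr: int, label_addr: int, thumb: bool, aligned: bool) -> int:
    """Offset from the PC or Align(PC,4) value to the labelled item."""
    base = align_pc4(instr_addr, thumb) if aligned else pc_value(instr_addr, thumb)
    return label_addr - base
```

For a Thumb instruction at 0x8002, the PC value is 0x8006 but Align(PC,4) is 0x8004; for an ARM instruction the two values are always equal. A negative result corresponds to the subtraction case described in the Note.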
The syntax of the following instructions includes a label:
• B, BL, and BLX (immediate). The assembler syntax for these instructions always specifies the label of
  the instruction that they branch to. Their encodings specify a sign-extended immediate offset that is
  added to the PC value of the instruction to form the target address of the branch.
• CBNZ and CBZ. The assembler syntax for these instructions always specifies the label of the instruction
  that they branch to. Their encodings specify a zero-extended immediate offset that is added to the PC
  value of the instruction to form the target address of the branch. They do not support backward
  branches.
• LDC, LDC2, LDR, LDRB, LDRD, LDRH, LDRSB, LDRSH, PLD, PLDW, PLI, and VLDR. The normal assembler syntax of
  these load instructions can specify the label of a literal data item that is to be loaded. The encodings
  of these instructions specify a zero-extended immediate offset that is either added to or subtracted
  from the Align(PC,4) value of the instruction to form the address of the data item. A few such
  encodings perform a fixed addition or a fixed subtraction and must only be used when that operation
  is required, but most contain a bit that specifies whether the offset is to be added or subtracted.
  When the assembler calculates an offset of 0 for the normal syntax of these instructions, it must
  assemble an encoding that adds 0 to the Align(PC,4) value of the instruction. Encodings that subtract
  0 from the Align(PC,4) value cannot be specified by the normal syntax.
  There is an alternative syntax for these instructions that specifies the addition or subtraction and the
  immediate offset explicitly. In this syntax, the label is replaced by [PC, #+/-<imm>], where:
  +/-      Is + or omitted to specify that the immediate offset is to be added to the Align(PC,4) value,
           or - if it is to be subtracted.
  <imm>    Is the immediate offset.
  This alternative syntax makes it possible to assemble the encodings that subtract 0 from the
  Align(PC,4) value, and to disassemble them to a syntax that can be re-assembled correctly.
• ADR. The normal assembler syntax for this instruction can specify the label of an instruction or literal
  data item whose address is to be calculated. Its encoding specifies a zero-extended immediate offset
  that is either added to or subtracted from the Align(PC,4) value of the instruction to form the address
  of the data item, and some opcode bits that determine whether it is an addition or subtraction.
  When the assembler calculates an offset of 0 for the normal syntax of this instruction, it must
  assemble the encoding that adds 0 to the Align(PC,4) value of the instruction. The encoding that
  subtracts 0 from the Align(PC,4) value cannot be specified by the normal syntax.
  There is an alternative syntax for this instruction that specifies the addition or subtraction and the
  immediate value explicitly, by writing them as additions ADD <Rd>,PC,#<const> or subtractions
  SUB <Rd>,PC,#<const>. This alternative syntax makes it possible to assemble the encoding that subtracts
  0 from the Align(PC,4) value, and to disassemble it to a syntax that can be re-assembled correctly.

Note
ARM recommends that where possible, you avoid using:
• the alternative syntax for the ADR, LDC, LDC2, LDR, LDRB, LDRD, LDRH, LDRSB, LDRSH, PLD, PLI, PLDW, and VLDR
  instructions
• the encodings of these instructions that subtract 0 from the Align(PC,4) value.
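A minimal sketch of the add/subtract decision for a literal load whose encoding has an add/subtract bit, assuming the offset has already been calculated from Align(PC,4). The function name is hypothetical, and the 12-bit default limit is only an illustration: the real limit depends on the encoding.

```python
def literal_offset_fields(offset: int, max_imm: int = 4095):
    """Return (add, imm) for a literal load with an add/subtract bit.

    An offset of 0 always selects the 'add' form, because the normal syntax
    must never produce an encoding that subtracts 0 from Align(PC,4).
    max_imm is an illustrative, encoding-specific limit.
    """
    add = offset >= 0
    imm = offset if add else -offset
    if imm > max_imm:
        raise ValueError("offset out of range for this encoding")
    return add, imm
```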


A4.3 Branch instructions
Table A4-1 summarizes the branch instructions in the ARM and Thumb instruction sets. In addition to
providing for changes in the flow of execution, some branch instructions can change instruction set.
Table A4-1 Branch instructions

Instruction | See | Range (Thumb) | Range (ARM)
Branch to target address | B on page A8-44 | +/–16MB | +/–32MB
Compare and Branch on Nonzero, Compare and Branch on Zero | CBNZ, CBZ on page A8-66 | 0-126B | - (a)
Call a subroutine | BL, BLX (immediate) on page A8-58 | +/–16MB | +/–32MB
Call a subroutine, change instruction set (b) | BL, BLX (immediate) on page A8-58 | +/–16MB | +/–32MB
Call a subroutine, optionally change instruction set | BLX (register) on page A8-60 | Any | Any
Branch to target address, change instruction set | BX on page A8-62 | Any | Any
Change to Jazelle state | BXJ on page A8-64 | - | -
Table Branch (byte offsets) | TBB, TBH on page A8-446 | 0-510B | - (a)
Table Branch (halfword offsets) | TBB, TBH on page A8-446 | 0-131070B | - (a)

a. These instructions do not exist in the ARM instruction set.
b. The range is determined by the instruction set of the BLX instruction, not of the instruction it branches to.

Branches to loaded and calculated addresses can be performed by LDR, LDM and data-processing instructions.
For details see Load/store instructions on page A4-19, Load/store multiple instructions on page A4-22,
Standard data-processing instructions on page A4-8, and Shift instructions on page A4-10.
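The ±16MB and ±32MB ranges in Table A4-1 follow from the offset widths of the 32-bit Thumb B and BL encodings (a signed 25-bit, halfword-aligned offset) and of the ARM B and BL encodings (a signed 26-bit, word-aligned offset). The hypothetical helper below checks whether an offset, measured from the PC value of the branch, fits. It does not cover the 16-bit Thumb branch encodings, which have much shorter ranges.

```python
def branch_offset_fits(offset: int, thumb: bool) -> bool:
    """Check whether a B/BL offset (target minus the PC value) is encodable.

    Thumb 32-bit B/BL: signed 25-bit, halfword-aligned (about +/-16MB).
    ARM B/BL:          signed 26-bit, word-aligned (about +/-32MB).
    """
    bits, align = (25, 2) if thumb else (26, 4)
    lo = -(1 << (bits - 1))
    hi = (1 << (bits - 1)) - align
    return offset % align == 0 and lo <= offset <= hi
```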


A4.4 Data-processing instructions
Core data-processing instructions belong to one of the following groups:
• Standard data-processing instructions. These instructions perform basic data-processing operations,
  and share a common format with some variations.
• Shift instructions on page A4-10.
• Multiply instructions on page A4-11.
• Saturating instructions on page A4-13.
• Packing and unpacking instructions on page A4-14.
• Miscellaneous data-processing instructions on page A4-15.
• Parallel addition and subtraction instructions on page A4-16.
• Divide instructions on page A4-17.

For extension data-processing instructions, see Advanced SIMD data-processing operations on page A4-30
and VFP data-processing instructions on page A4-38.

A4.4.1 Standard data-processing instructions
These instructions generally have a destination register Rd, a first operand register Rn, and a second
operand. The second operand can be another register Rm, or an immediate constant.
If the second operand is an immediate constant, it can be:
• Encoded directly in the instruction.
• A modified immediate constant that uses 12 bits of the instruction to encode a range of constants.
  Thumb and ARM instructions have slightly different ranges of modified immediate constants. For
  details see Modified immediate constants in Thumb instructions on page A6-17 and Modified
  immediate constants in ARM instructions on page A5-9.
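For the ARM instruction set, a modified immediate constant is an 8-bit value rotated right by twice the value of a 4-bit rotation field. The sketch below assumes that rule (see Modified immediate constants in ARM instructions on page A5-9) and does not cover the different Thumb scheme. The function names are hypothetical.

```python
def ror32(value: int, amount: int) -> int:
    """Rotate a 32-bit value right by amount bits."""
    value &= 0xFFFFFFFF
    amount %= 32
    return ((value >> amount) | (value << (32 - amount))) & 0xFFFFFFFF


def arm_expand_imm(imm12: int) -> int:
    """ARM modified immediate: imm12<7:0> rotated right by 2 * imm12<11:8>."""
    return ror32(imm12 & 0xFF, 2 * ((imm12 >> 8) & 0xF))


def encode_arm_imm(value: int):
    """Return an imm12 that expands to value, or None if no encoding exists."""
    value &= 0xFFFFFFFF
    for rot in range(16):
        # Undo a right rotation by 2*rot with a right rotation by 32 - 2*rot.
        unrotated = ror32(value, (32 - 2 * rot) % 32)
        if unrotated < 256:
            return (rot << 8) | unrotated
    return None
```

For example, 0xFF000000 is encodable (as 0x4FF), but 0x101 is not, because its set bits are too far apart to fit in one rotated 8-bit field.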

If the second operand is another register, it can optionally be shifted in any of the following ways:
LSL      Logical Shift Left by 1-31 bits.
LSR      Logical Shift Right by 1-32 bits.
ASR      Arithmetic Shift Right by 1-32 bits.
ROR      Rotate Right by 1-31 bits.
RRX      Rotate Right with Extend. For details see Shift and rotate operations on page A2-5.
In Thumb code, the amount to shift by is always a constant encoded in the instruction. In ARM code, the
amount to shift by is either a constant encoded in the instruction, or the value of a register Rs.
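The following sketch models the five shift types on a 32-bit register operand, returning both the shifted value and the carry out that flag-setting logical instructions can use. It is written in the spirit of the shift pseudocode referenced in Shift and rotate operations on page A2-5, restricted to the constant shift amounts listed above; the function name is hypothetical.

```python
MASK32 = 0xFFFFFFFF
# Permitted constant shift amounts, as listed above (RRX always shifts by 1).
LIMITS = {"LSL": (1, 31), "LSR": (1, 32), "ASR": (1, 32), "ROR": (1, 31)}


def shift_c(value: int, kind: str, amount: int, carry_in: int):
    """Return (result, carry_out) for a shifted 32-bit register operand."""
    value &= MASK32
    if kind == "RRX":
        # The carry flag moves into bit 31; bit 0 becomes the new carry.
        return (carry_in << 31) | (value >> 1), value & 1
    lo, hi = LIMITS[kind]
    if not lo <= amount <= hi:
        raise ValueError(f"{kind} amount must be {lo}-{hi}")
    if kind == "LSL":
        result = (value << amount) & MASK32
        carry = (value >> (32 - amount)) & 1      # last bit shifted out
    elif kind == "LSR":
        result = value >> amount
        carry = (value >> (amount - 1)) & 1
    elif kind == "ASR":
        signed = value - (1 << 32) if value >> 31 else value
        result = (signed >> amount) & MASK32      # copies of bit 31 shift in
        carry = (signed >> (amount - 1)) & 1
    else:  # ROR
        result = ((value >> amount) | (value << (32 - amount))) & MASK32
        carry = result >> 31
    return result, carry
```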
For instructions other than CMN, CMP, TEQ, and TST, the result of the data-processing operation is placed in the
destination register. In the ARM instruction set, the destination register can be the PC, causing the result to
be treated as an address to branch to. In the Thumb instruction set, this is only permitted for some 16-bit
forms of the ADD and MOV instructions.


These instructions can optionally set the condition code flags, according to the result of the operation. If
they do not set the flags, existing flag settings from a previous instruction are preserved.
Table A4-2 summarizes the main data-processing instructions in the Thumb and ARM instruction sets.
Generally, each of these instructions is described in three sections in Chapter A8 Instruction Details, one
section for each of the following:
• INSTRUCTION (immediate) where the second operand is a modified immediate constant.
• INSTRUCTION (register) where the second operand is a register, or a register shifted by a constant.
• INSTRUCTION (register-shifted register) where the second operand is a register shifted by a value
  obtained from another register. These are only available in the ARM instruction set.

Table A4-2 Standard data-processing instructions

Instruction | Mnemonic | Notes
Add with Carry | ADC | -
Add | ADD | Thumb instruction set permits use of a modified immediate constant or a zero-extended 12-bit immediate constant.
Form PC-relative Address | ADR | First operand is the PC. Second operand is an immediate constant. Thumb instruction set uses a zero-extended 12-bit immediate constant. Operation is an addition or a subtraction.
Bitwise AND | AND | -
Bitwise Bit Clear | BIC | -
Compare Negative | CMN | Sets flags. Like ADD but with no destination register.
Compare | CMP | Sets flags. Like SUB but with no destination register.
Bitwise Exclusive OR | EOR | -
Copy operand to destination | MOV | Has only one operand, with the same options as the second operand in most of these instructions. If the operand is a shifted register, the instruction is an LSL, LSR, ASR, or ROR instruction instead. For details see Shift instructions on page A4-10. The ARM and Thumb instruction sets permit use of a modified immediate constant or a zero-extended 16-bit immediate constant.
Bitwise NOT | MVN | Has only one operand, with the same options as the second operand in most of these instructions.
Bitwise OR NOT | ORN | Not available in the ARM instruction set.
Bitwise OR | ORR | -
Reverse Subtract | RSB | Subtracts first operand from second operand. This permits subtraction from constants and shifted registers.
Reverse Subtract with Carry | RSC | Not available in the Thumb instruction set.
Subtract with Carry | SBC | -
Subtract | SUB | Thumb instruction set permits use of a modified immediate constant or a zero-extended 12-bit immediate constant.
Test Equivalence | TEQ | Sets flags. Like EOR but with no destination register.
Test | TST | Sets flags. Like AND but with no destination register.
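To show how the arithmetic rows of this table set the flags, the following sketch computes N, Z, C and V for a 32-bit addition with carry in, in the style of the AddWithCarry() pseudocode function used by the instruction descriptions. It assumes, as that pseudocode does, that a subtraction a - b is computed as a + NOT(b) + 1, which is why CMP sets C when no borrow occurs. The function names are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def _s32(value: int) -> int:
    """Interpret a 32-bit pattern as a signed integer."""
    return value - (1 << 32) if value & 0x80000000 else value


def add_with_carry(x: int, y: int, carry_in: int):
    """Return (result, n, z, c, v) for x + y + carry_in on 32-bit values."""
    x &= MASK32
    y &= MASK32
    unsigned_sum = x + y + carry_in
    signed_sum = _s32(x) + _s32(y) + carry_in
    result = unsigned_sum & MASK32
    n = result >> 31
    z = int(result == 0)
    c = int(result != unsigned_sum)        # unsigned overflow gives carry
    v = int(_s32(result) != signed_sum)    # signed overflow
    return result, n, z, c, v


def cmp_flags(a: int, b: int):
    """(n, z, c, v) set by CMP a, b: computed like SUB as a + NOT(b) + 1."""
    return add_with_carry(a, ~b & MASK32, 1)[1:]


def cmn_flags(a: int, b: int):
    """(n, z, c, v) set by CMN a, b: computed like ADD."""
    return add_with_carry(a, b, 0)[1:]
```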

A4.4.2 Shift instructions
Table A4-3 lists the shift instructions in the ARM and Thumb instruction sets.
Table A4-3 Shift instructions

Instruction | See
Arithmetic Shift Right | ASR (immediate) on page A8-40
Arithmetic Shift Right | ASR (register) on page A8-42
Logical Shift Left | LSL (immediate) on page A8-178
Logical Shift Left | LSL (register) on page A8-180
Logical Shift Right | LSR (immediate) on page A8-182
Logical Shift Right | LSR (register) on page A8-184
Rotate Right | ROR (immediate) on page A8-278
Rotate Right | ROR (register) on page A8-280
Rotate Right with Extend | RRX on page A8-282

In the ARM instruction set only, the destination register of these instructions can be the PC, causing the
result to be treated as an address to branch to.


A4.4.3 Multiply instructions
These instructions can operate on signed or unsigned quantities. In some types of operation, the results are
the same whether the operands are signed or unsigned.
• Table A4-4 summarizes the multiply instructions where there is no distinction between signed and
  unsigned quantities. The least significant 32 bits of the result are used. More significant bits are
  discarded.
• Table A4-5 summarizes the signed multiply instructions.
• Table A4-6 on page A4-12 summarizes the unsigned multiply instructions.
Table A4-4 General multiply instructions

Instruction | See | Operation (number of bits)
Multiply Accumulate | MLA on page A8-190 | 32 = 32 + 32 x 32
Multiply and Subtract | MLS on page A8-192 | 32 = 32 – 32 x 32
Multiply | MUL on page A8-212 | 32 = 32 x 32

Table A4-5 Signed multiply instructions

Instruction | See | Operation (number of bits)
Signed Multiply Accumulate (halfwords) | SMLABB, SMLABT, SMLATB, SMLATT on page A8-330 | 32 = 32 + 16 x 16
Signed Multiply Accumulate Dual | SMLAD on page A8-332 | 32 = 32 + 16 x 16 + 16 x 16
Signed Multiply Accumulate Long | SMLAL on page A8-334 | 64 = 64 + 32 x 32
Signed Multiply Accumulate Long (halfwords) | SMLALBB, SMLALBT, SMLALTB, SMLALTT on page A8-336 | 64 = 64 + 16 x 16
Signed Multiply Accumulate Long Dual | SMLALD on page A8-338 | 64 = 64 + 16 x 16 + 16 x 16
Signed Multiply Accumulate (word by halfword) | SMLAWB, SMLAWT on page A8-340 | 32 = 32 + 32 x 16 (a)
Signed Multiply Subtract Dual | SMLSD on page A8-342 | 32 = 32 + 16 x 16 – 16 x 16
Signed Multiply Subtract Long Dual | SMLSLD on page A8-344 | 64 = 64 + 16 x 16 – 16 x 16
Signed Most Significant Word Multiply Accumulate | SMMLA on page A8-346 | 32 = 32 + 32 x 32 (b)
Signed Most Significant Word Multiply Subtract | SMMLS on page A8-348 | 32 = 32 – 32 x 32 (b)
Signed Most Significant Word Multiply | SMMUL on page A8-350 | 32 = 32 x 32 (b)
Signed Dual Multiply Add | SMUAD on page A8-352 | 32 = 16 x 16 + 16 x 16
Signed Multiply (halfwords) | SMULBB, SMULBT, SMULTB, SMULTT on page A8-354 | 32 = 16 x 16
Signed Multiply Long | SMULL on page A8-356 | 64 = 32 x 32
Signed Multiply (word by halfword) | SMULWB, SMULWT on page A8-358 | 32 = 32 x 16 (a)
Signed Dual Multiply Subtract | SMUSD on page A8-360 | 32 = 16 x 16 – 16 x 16

a. The most significant 32 bits of the 48-bit product are used. Less significant bits are discarded.
b. The most significant 32 bits of the 64-bit product are used. Less significant bits are discarded.

Table A4-6 Unsigned multiply instructions

Instruction | See | Operation (number of bits)
Unsigned Multiply Accumulate Accumulate Long | UMAAL on page A8-482 | 64 = 32 + 32 + 32 x 32
Unsigned Multiply Accumulate Long | UMLAL on page A8-484 | 64 = 64 + 32 x 32
Unsigned Multiply Long | UMULL on page A8-486 | 64 = 32 x 32
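The following sketch illustrates the bit widths in Tables A4-4 to A4-6, in particular why MUL needs no separate signed and unsigned forms while the long multiplies do. Inputs are treated as 32-bit register values. Only a few representative instructions are modelled, and the helper names are hypothetical.

```python
MASK32 = 0xFFFFFFFF
MASK64 = (1 << 64) - 1


def s32(x: int) -> int:
    x &= MASK32
    return x - (1 << 32) if x >> 31 else x


def s16(x: int) -> int:
    x &= 0xFFFF
    return x - (1 << 16) if x >> 15 else x


def mul(rn, rm):
    """MUL: only the low 32 bits are kept, identical for signed or unsigned."""
    return ((rn & MASK32) * (rm & MASK32)) & MASK32


def smull(rn, rm):
    """SMULL: 64-bit signed product, returned as a 64-bit bit pattern."""
    return (s32(rn) * s32(rm)) & MASK64


def umull(rn, rm):
    """UMULL: 64-bit unsigned product."""
    return ((rn & MASK32) * (rm & MASK32)) & MASK64


def smulwb(rn, rm):
    """SMULWB: top 32 bits of the 48-bit product Rn x Rm<15:0> (footnote a)."""
    return ((s32(rn) * s16(rm)) >> 16) & MASK32


def smmul(rn, rm):
    """SMMUL: top 32 bits of the 64-bit signed product (footnote b), truncated."""
    return ((s32(rn) * s32(rm)) >> 32) & MASK32


def umaal(rdlo, rdhi, rn, rm):
    """UMAAL: 64 = 32 + 32 + 32 x 32."""
    return (rdlo & MASK32) + (rdhi & MASK32) + (rn & MASK32) * (rm & MASK32)
```

UMAAL cannot overflow: with all four inputs at 0xFFFFFFFF the result is exactly 2^64 - 1.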


A4.4.4 Saturating instructions
Table A4-7 lists the saturating instructions in the ARM and Thumb instruction sets. For more information,
see Pseudocode details of saturation on page A2-9.
Table A4-7 Saturating instructions

Instruction | See | Operation
Signed Saturate | SSAT on page A8-362 | Saturates optionally shifted 32-bit value to selected range
Signed Saturate 16 | SSAT16 on page A8-364 | Saturates two 16-bit values to selected range
Unsigned Saturate | USAT on page A8-504 | Saturates optionally shifted 32-bit value to selected range
Unsigned Saturate 16 | USAT16 on page A8-506 | Saturates two 16-bit values to selected range
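A minimal sketch of the saturation rule, assuming any optional shift has already been applied and the value is given as a signed Python integer. The bit-count ranges in the docstrings (SSAT 1 to 32, USAT 0 to 31) are the ones the instruction descriptions specify. In the manual's pseudocode a saturation also sets the APSR.Q flag; here that is represented only by the returned boolean. The function names are hypothetical.

```python
def signed_sat(value: int, n: int):
    """SSAT #n (1 <= n <= 32): clamp to [-2^(n-1), 2^(n-1) - 1].

    Returns (result, saturated).
    """
    lo, hi = -(1 << (n - 1)), (1 << (n - 1)) - 1
    result = max(lo, min(hi, value))
    return result, result != value


def unsigned_sat(value: int, n: int):
    """USAT #n (0 <= n <= 31): clamp to [0, 2^n - 1]. Returns (result, saturated)."""
    hi = (1 << n) - 1
    result = max(0, min(hi, value))
    return result, result != value


def ssat16(halves, n):
    """SSAT16: saturate each of two signed halfword values independently."""
    return [signed_sat(h, n)[0] for h in halves]
```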


A4.4.5 Packing and unpacking instructions
Table A4-8 lists the packing and unpacking instructions in the ARM and Thumb instruction sets. These are
all available from ARMv6T2 in the Thumb instruction set, and from ARMv6 onwards in the ARM
instruction set.
Table A4-8 Packing and unpacking instructions

Instruction | See | Operation
Pack Halfword | PKH on page A8-234 | Combine halfwords
Signed Extend and Add Byte | SXTAB on page A8-434 | Extend 8 bits to 32 and add
Signed Extend and Add Byte 16 | SXTAB16 on page A8-436 | Dual extend 8 bits to 16 and add
Signed Extend and Add Halfword | SXTAH on page A8-438 | Extend 16 bits to 32 and add
Signed Extend Byte | SXTB on page A8-440 | Extend 8 bits to 32
Signed Extend Byte 16 | SXTB16 on page A8-442 | Dual extend 8 bits to 16
Signed Extend Halfword | SXTH on page A8-444 | Extend 16 bits to 32
Unsigned Extend and Add Byte | UXTAB on page A8-514 | Extend 8 bits to 32 and add
Unsigned Extend and Add Byte 16 | UXTAB16 on page A8-516 | Dual extend 8 bits to 16 and add
Unsigned Extend and Add Halfword | UXTAH on page A8-518 | Extend 16 bits to 32 and add
Unsigned Extend Byte | UXTB on page A8-520 | Extend 8 bits to 32
Unsigned Extend Byte 16 | UXTB16 on page A8-522 | Dual extend 8 bits to 16
Unsigned Extend Halfword | UXTH on page A8-524 | Extend 16 bits to 32
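The extend instructions first rotate the source register right by 0, 8, 16 or 24 bits, then extend the selected byte or halfword. The sketch below models a few of them, treating registers as 32-bit Python integers. The helper names are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def ror32(x: int, n: int) -> int:
    """Rotate a 32-bit value right by n bits (n = 0, 8, 16 or 24 here)."""
    x &= MASK32
    return ((x >> n) | (x << (32 - n))) & MASK32 if n else x


def _sext(value: int, bits: int) -> int:
    return value - (1 << bits) if value & (1 << (bits - 1)) else value


def sxtb(rm, rotation=0):
    """SXTB: sign-extend bits [7:0] of ROR(Rm, rotation) to 32 bits."""
    return _sext(ror32(rm, rotation) & 0xFF, 8) & MASK32


def uxth(rm, rotation=0):
    """UXTH: zero-extend bits [15:0] of ROR(Rm, rotation) to 32 bits."""
    return ror32(rm, rotation) & 0xFFFF


def uxtab(rn, rm, rotation=0):
    """UXTAB: Rn + zero-extended byte of ROR(Rm, rotation), modulo 2^32."""
    return (rn + (ror32(rm, rotation) & 0xFF)) & MASK32


def sxtb16(rm, rotation=0):
    """SXTB16: sign-extend bytes 0 and 2 of ROR(Rm, rotation) to two halfwords."""
    r = ror32(rm, rotation)
    lo = _sext(r & 0xFF, 8) & 0xFFFF
    hi = _sext((r >> 16) & 0xFF, 8) & 0xFFFF
    return (hi << 16) | lo
```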


A4.4.6 Miscellaneous data-processing instructions
Table A4-9 lists the miscellaneous data-processing instructions in the ARM and Thumb instruction sets.
Immediate values in these instructions are simple binary numbers.
Table A4-9 Miscellaneous data-processing instructions

Instruction | See | Notes
Bit Field Clear | BFC on page A8-46 | -
Bit Field Insert | BFI on page A8-48 | -
Count Leading Zeros | CLZ on page A8-72 | -
Move Top | MOVT on page A8-200 | Moves 16-bit immediate value to top halfword. Bottom halfword unchanged.
Reverse Bits | RBIT on page A8-270 | -
Byte-Reverse Word | REV on page A8-272 | -
Byte-Reverse Packed Halfword | REV16 on page A8-274 | -
Byte-Reverse Signed Halfword | REVSH on page A8-276 | -
Signed Bit Field Extract | SBFX on page A8-308 | -
Select Bytes using GE flags | SEL on page A8-312 | -
Unsigned Bit Field Extract | UBFX on page A8-466 | -
Unsigned Sum of Absolute Differences | USAD8 on page A8-500 | -
Unsigned Sum of Absolute Differences and Accumulate | USADA8 on page A8-502 | -
-


A4.4.7 Parallel addition and subtraction instructions
These instructions perform additions and subtractions on the values of two registers and write the result to
a destination register, treating the register values as sets of two halfwords or four bytes. They are available
in ARMv6 and above.
These instructions consist of a prefix followed by a main instruction mnemonic. The prefixes are as follows:
S        Signed arithmetic modulo 2^8 or 2^16.
Q        Signed saturating arithmetic.
SH       Signed arithmetic, halving the results.
U        Unsigned arithmetic modulo 2^8 or 2^16.
UQ       Unsigned saturating arithmetic.
UH       Unsigned arithmetic, halving the results.
The main instruction mnemonics are as follows:
ADD16

Adds the top halfwords of two operands to form the top halfword of the result, and the
bottom halfwords of the same two operands to form the bottom halfword of the result.

ASX

Exchanges halfwords of the second operand, and then adds top halfwords and subtracts
bottom halfwords.

SAX

Exchanges halfwords of the second operand, and then subtracts top halfwords and adds
bottom halfwords.

SUB16

Subtracts each halfword of the second operand from the corresponding halfword of the first
operand to form the corresponding halfword of the result.

ADD8

Adds each byte of the second operand to the corresponding byte of the first operand to form
the corresponding byte of the result.

SUB8

Subtracts each byte of the second operand from the corresponding byte of the first operand
to form the corresponding byte of the result.

The instruction set permits all 36 combinations of prefix and main instruction operand.
See also Advanced SIMD parallel addition and subtraction on page A4-31.
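The effect of the six prefixes can be sketched for the ADD8 and ADD16 main instructions as follows. This is a simplified model: it treats registers as 32-bit Python integers, and it does not model the GE flags that the S and U forms also update. The function names are hypothetical.

```python
def _lanes(x: int, width: int):
    """Split a 32-bit value into lanes of `width` bits, least significant first."""
    mask = (1 << width) - 1
    return [(x >> (i * width)) & mask for i in range(32 // width)]


def _signed(v: int, width: int) -> int:
    return v - (1 << width) if v >> (width - 1) else v


def parallel_add(prefix: str, a: int, b: int, width: int) -> int:
    """Model <prefix>ADD8 (width=8) or <prefix>ADD16 (width=16).

    prefix is one of S, Q, SH (signed) or U, UQ, UH (unsigned).
    """
    signed = prefix in ("S", "Q", "SH")
    lo = -(1 << (width - 1)) if signed else 0
    hi = (1 << (width - 1)) - 1 if signed else (1 << width) - 1
    result = 0
    for i, (x, y) in enumerate(zip(_lanes(a, width), _lanes(b, width))):
        if signed:
            x, y = _signed(x, width), _signed(y, width)
        total = x + y
        if prefix in ("Q", "UQ"):
            total = max(lo, min(hi, total))   # saturate to the lane range
        elif prefix in ("SH", "UH"):
            total >>= 1                        # halve, rounding towards -infinity
        # S and U: the mask below wraps the sum modulo 2^width
        result |= (total & ((1 << width) - 1)) << (i * width)
    return result
```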


A4.4.8 Divide instructions
In the ARMv7-R profile, the Thumb instruction set includes signed and unsigned integer divide instructions
that are implemented in hardware. For details of the instructions see:
• SDIV on page A8-310
• UDIV on page A8-468.

Note
• SDIV and UDIV are UNDEFINED in the ARMv7-A profile.
• The ARMv7-M profile also includes the SDIV and UDIV instructions.

In the ARMv7-R profile, the SCTLR.DZ bit enables divide by zero fault detection, see c1, System Control
Register (SCTLR) on page B4-45:
DZ == 0    Divide-by-zero returns a zero result.
DZ == 1    SDIV and UDIV generate an Undefined Instruction exception on a divide-by-zero.

The SCTLR.DZ bit is cleared to zero on reset.
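A sketch of the SDIV and UDIV results, including the SCTLR.DZ behavior described above. It assumes that quotients round towards zero, as the instruction descriptions specify, and it stands in for the Undefined Instruction exception with a hypothetical Python exception class. The function names are also hypothetical.

```python
MASK32 = 0xFFFFFFFF


class UndefinedInstruction(Exception):
    """Hypothetical stand-in for the Undefined Instruction exception."""


def _s32(x: int) -> int:
    x &= MASK32
    return x - (1 << 32) if x >> 31 else x


def udiv(rn, rm, dz=0):
    """UDIV: unsigned 32-bit division, rounding towards zero."""
    rn &= MASK32
    rm &= MASK32
    if rm == 0:
        if dz:
            raise UndefinedInstruction("divide by zero with SCTLR.DZ == 1")
        return 0
    return rn // rm


def sdiv(rn, rm, dz=0):
    """SDIV: signed 32-bit division, rounding towards zero."""
    a, b = _s32(rn), _s32(rm)
    if b == 0:
        if dz:
            raise UndefinedInstruction("divide by zero with SCTLR.DZ == 1")
        return 0
    q = abs(a) // abs(b)        # magnitude of the truncated quotient
    if (a < 0) != (b < 0):
        q = -q
    return q & MASK32           # 0x80000000 / -1 wraps back to 0x80000000
```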


A4.5 Status register access instructions
The MRS and MSR instructions move the contents of the Application Program Status Register (APSR) to or
from a general-purpose register.
The APSR is described in The Application Program Status Register (APSR) on page A2-14.
The condition flags in the APSR are normally set by executing data-processing instructions, and are
normally used to control the execution of conditional instructions. However, you can set the flags explicitly
using the MSR instruction, and you can read the current state of the flags explicitly using the MRS instruction.
For details of the system level use of status register access instructions CPS, MRS, and MSR, see Chapter B6
System Instructions.
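For illustration, this sketch splits a value read from the APSR with MRS into its fields, and builds the flag bits that could be written back with MSR APSR_nzcvq. It assumes the standard APSR layout (N, Z, C, V and Q in bits [31:27], GE[3:0] in bits [19:16]); The Application Program Status Register (APSR) on page A2-14 is the definitive description. The function names are hypothetical.

```python
def decode_apsr(apsr: int) -> dict:
    """Split a 32-bit APSR value into its named fields."""
    return {
        "N": (apsr >> 31) & 1,
        "Z": (apsr >> 30) & 1,
        "C": (apsr >> 29) & 1,
        "V": (apsr >> 28) & 1,
        "Q": (apsr >> 27) & 1,
        "GE": (apsr >> 16) & 0xF,
    }


def encode_nzcvq(n=0, z=0, c=0, v=0, q=0) -> int:
    """Build the flag bits [31:27] for an MSR APSR_nzcvq write."""
    return (n << 31) | (z << 30) | (c << 29) | (v << 28) | (q << 27)
```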


A4.6 Load/store instructions
Table A4-10 summarizes the general-purpose register load/store instructions in the ARM and Thumb
instruction sets. See also:
• Load/store multiple instructions on page A4-22
• Advanced SIMD and VFP load/store instructions on page A4-26.
Load/store instructions have several options for addressing memory. For more information, see Addressing
modes on page A4-20.
Table A4-10 Load/store instructions

Data type | Load | Store | Load unprivileged | Store unprivileged | Load-Exclusive | Store-Exclusive
32-bit word | LDR | STR | LDRT | STRT | LDREX | STREX
16-bit halfword | - | STRH | - | STRHT | - | STREXH
16-bit unsigned halfword | LDRH | - | LDRHT | - | LDREXH | -
16-bit signed halfword | LDRSH | - | LDRSHT | - | - | -
8-bit byte | - | STRB | - | STRBT | - | STREXB
8-bit unsigned byte | LDRB | - | LDRBT | - | LDREXB | -
8-bit signed byte | LDRSB | - | LDRSBT | - | - | -
Two 32-bit words | LDRD | STRD | - | - | - | -
64-bit doubleword | - | - | - | - | LDREXD | STREXD

A4.6.1 Loads to the PC
The LDR instruction can be used to load a value into the PC. The value loaded is treated as an interworking
address, as described by the LoadWritePC() pseudocode function in Pseudocode details of operations on
ARM core registers on page A2-12.
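A minimal Python sketch of the interworking applied to a value loaded into the PC. It assumes the ARMv7 behavior in which LoadWritePC() interworks in the same way as a BX instruction: bit [0] selects Thumb state, and an ARM state target must be word-aligned. ThumbEE state is not modelled, and `load_write_pc` is a hypothetical name.

```python
def load_write_pc(address):
    """Return (instruction_set, branch_target) for a value loaded into the PC."""
    address &= 0xFFFFFFFF
    if address & 0b1:
        # Bit [0] set: execution continues in Thumb state with bit [0] cleared.
        return "Thumb", address & 0xFFFFFFFE
    if (address & 0b10) == 0:
        # Bits [1:0] == 0b00: word-aligned ARM state target.
        return "ARM", address
    # Bits [1:0] == 0b10 is UNPREDICTABLE as an ARM state target.
    raise ValueError("UNPREDICTABLE: address[1:0] == 0b10")
```

This is why a `POP {..., PC}` can return correctly to either ARM or Thumb callers: the return address carries the state in its low bit.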

A4.6.2 Halfword and byte loads and stores
Halfword and byte stores store the least significant halfword or byte of the register to 16 or 8 bits of
memory respectively. There is no distinction between signed and unsigned stores.
Halfword and byte loads load 16 or 8 bits from memory into the least significant halfword or byte of a
register. Unsigned loads zero-extend the loaded value to 32 bits, and signed loads sign-extend the value to
32 bits.
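The extension rules above can be sketched in Python as follows. The helper names are hypothetical; `load_extend` returns the 32-bit pattern that LDRB/LDRH (zero-extending) or LDRSB/LDRSH (sign-extending) would leave in the register.

```python
def load_extend(raw, size_bits, signed):
    """Extend a loaded byte (8) or halfword (16) to a 32-bit register value."""
    raw &= (1 << size_bits) - 1
    if signed and raw & (1 << (size_bits - 1)):
        raw -= 1 << size_bits   # top bit set: the value is negative
    return raw & 0xFFFFFFFF     # 32-bit register bit pattern


def store_truncate(register, size_bits):
    """STRB/STRH write only the least significant byte or halfword."""
    return register & ((1 << size_bits) - 1)
```

Loading the byte 0x80 gives 0x00000080 with LDRB but 0xFFFFFF80 with LDRSB.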


A4.6.3 Unprivileged loads and stores
In an unprivileged mode, unprivileged loads and stores operate in exactly the same way as the corresponding
ordinary operations. In a privileged mode, unprivileged loads and stores are treated as though they were
executed in an unprivileged mode. For more information, see Privilege level access controls for data
accesses on page A3-38.

A4.6.4 Exclusive loads and stores
Exclusive loads and stores provide for shared memory synchronization. For more information, see
Synchronization and semaphores on page A3-12.

A4.6.5 Addressing modes
The address for a load or store is formed from two parts: a value from a base register, and an offset.
The base register can be any one of the general-purpose registers.
For loads, the base register can be the PC. This permits PC-relative addressing for position-independent
code. Instructions marked (literal) in their title in Chapter A8 Instruction Details are PC-relative loads.
The offset takes one of three formats:
Immediate

The offset is an unsigned number that can be added to or subtracted from the base
register value. Immediate offset addressing is useful for accessing data elements that
are a fixed distance from the start of the data object, such as structure fields, stack
offsets and input/output registers.

Register

The offset is a value from a general-purpose register. This register cannot be the PC.
The value can be added to, or subtracted from, the base register value. Register
offsets are useful for accessing arrays or blocks of data.

Scaled register

The offset is a general-purpose register, other than the PC, shifted by an immediate
value, then added to or subtracted from the base register. This means an array index
can be scaled by the size of each array element.

The offset and base register can be used in three different ways to form the memory address. The addressing
modes are described as follows:


Offset

The offset is added to or subtracted from the base register to form the memory
address.

Pre-indexed

The offset is added to or subtracted from the base register to form the memory
address. The base register is then updated with this new address, to permit automatic
indexing through an array or memory block.

Post-indexed

The value of the base register alone is used as the memory address. The offset is then
added to or subtracted from the base register. The result is stored back in the base
register, to permit automatic indexing through an array or memory block.


Note
Not every variant is available for every instruction, and the range of permitted immediate values and the
options for scaled registers vary from instruction to instruction. See Chapter A8 Instruction Details for full
details for each instruction.
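The three addressing modes differ only in which address is accessed and what is written back to the base register. The Python sketch below shows this, assuming an LSL shift for the scaled register form (as in `[Rn, Rm, LSL #2]` for a word array); other shift types exist and, as the Note says, options vary by instruction. Function names are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def lsl_offset(rm, shift):
    """Scaled register offset: the index register shifted left by an immediate."""
    return (rm << shift) & MASK32


def transfer_address(rn, offset, add, mode):
    """Return (memory_address, new_rn) for a load or store.

    add selects whether the offset is added to or subtracted from Rn;
    mode is "offset", "pre" (pre-indexed) or "post" (post-indexed).
    """
    offset_addr = (rn + offset if add else rn - offset) & MASK32
    if mode == "offset":
        return offset_addr, rn           # base register unchanged
    if mode == "pre":
        return offset_addr, offset_addr  # access, then write back the same address
    if mode == "post":
        return rn, offset_addr           # access at Rn, then write back Rn +/- offset
    raise ValueError(f"unknown addressing mode {mode!r}")
```

For example, element 5 of a word array based at 0x2000 is at `transfer_address(0x2000, lsl_offset(5, 2), True, "offset")[0]`, which is 0x2014.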


A4.7 Load/store multiple instructions
Load Multiple instructions load a subset, or possibly all, of the general-purpose registers from memory.
Store Multiple instructions store a subset, or possibly all, of the general-purpose registers to memory.
The memory locations are consecutive word-aligned words. The addresses used are obtained from a base
register, and can be either above or below the value in the base register. The base register can optionally be
updated by the total size of the data transferred.
Table A4-11 summarizes the load/store multiple instructions in the ARM and Thumb instruction sets.
Table A4-11 Load/store multiple instructions

Instruction                                              See
Load Multiple, Increment After or Full Descending        LDM / LDMIA / LDMFD on page A8-110
Load Multiple, Decrement After or Full Ascending (a)     LDMDA / LDMFA on page A8-112
Load Multiple, Decrement Before or Empty Ascending       LDMDB / LDMEA on page A8-114
Load Multiple, Increment Before or Empty Descending (a)  LDMIB / LDMED on page A8-116
Pop multiple registers off the stack (b)                 POP on page A8-246
Push multiple registers onto the stack (c)               PUSH on page A8-248
Store Multiple, Increment After or Empty Ascending       STM / STMIA / STMEA on page A8-374
Store Multiple, Decrement After or Empty Descending (a)  STMDA / STMED on page A8-376
Store Multiple, Decrement Before or Full Descending      STMDB / STMFD on page A8-378
Store Multiple, Increment Before or Full Ascending (a)   STMIB / STMFA on page A8-380

a. Not available in the Thumb instruction set.
b. This instruction is equivalent to an LDM instruction with the SP as base register, and base register updating.
c. This instruction is equivalent to an STMDB instruction with the SP as base register, and base register
updating.
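The four addressing variants in Table A4-11 can be sketched as follows. The sketch assumes the usual rule that the lowest-numbered register is transferred to or from the lowest address; `multiple_transfer_addresses` is a hypothetical name. Following the footnotes, PUSH behaves like STMDB with the SP as base and writeback, and POP like LDM (Increment After) with the SP as base and writeback.

```python
def multiple_transfer_addresses(rn, register_count, mode, writeback=True):
    """Word addresses used by LDM/STM variants, lowest-numbered register first.

    mode is "IA", "IB", "DA" or "DB". Returns (addresses, final_rn).
    """
    if register_count < 1:
        raise ValueError("at least one register must be transferred")
    size = 4 * register_count
    start = {
        "IA": rn,             # Increment After: first word at Rn
        "IB": rn + 4,         # Increment Before: first word at Rn + 4
        "DA": rn - size + 4,  # Decrement After: last word at Rn
        "DB": rn - size,      # Decrement Before: last word at Rn - 4
    }[mode]
    addresses = [(start + 4 * i) & 0xFFFFFFFF for i in range(register_count)]
    updated = rn + size if mode in ("IA", "IB") else rn - size
    return addresses, (updated if writeback else rn) & 0xFFFFFFFF
```

With SP = 0x1000, pushing three registers (DB) uses 0xFF4, 0xFF8 and 0xFFC and leaves SP = 0xFF4; a matching three-register POP (IA) from 0xFF4 reads the same words and restores SP to 0x1000.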

System level variants of the LDM and STM instructions load and store User mode registers from a privileged
mode. Another system level variant of the LDM instruction performs an exception return. For details, see
Chapter B6 System Instructions.

A4.7.1 Loads to the PC
The LDM, LDMDA, LDMDB, LDMIB, and POP instructions can be used to load a value into the PC. The value loaded
is treated as an interworking address, as described by the LoadWritePC() pseudocode function in Pseudocode
details of operations on ARM core registers on page A2-12.


A4.8 Miscellaneous instructions
Table A4-12 summarizes the miscellaneous instructions in the ARM and Thumb instruction sets.
Table A4-12 Miscellaneous instructions

Instruction                                          See
Clear-Exclusive                                      CLREX on page A8-70
Debug hint                                           DBG on page A8-88
Data Memory Barrier                                  DMB on page A8-90
Data Synchronization Barrier                         DSB on page A8-92
Instruction Synchronization Barrier                  ISB on page A8-102
If Then (makes following instructions conditional)   IT on page A8-104
No Operation                                         NOP on page A8-222
Preload Data                                         PLD, PLDW (immediate) on page A8-236
                                                     PLD (literal) on page A8-238
                                                     PLD, PLDW (register) on page A8-240
Preload Instruction                                  PLI (immediate, literal) on page A8-242
                                                     PLI (register) on page A8-244
Set Endianness                                       SETEND on page A8-314
Send Event                                           SEV on page A8-316
Supervisor Call                                      SVC (previously SWI) on page A8-430
Swap, Swap Byte. Use deprecated. (a)                 SWP, SWPB on page A8-432
Wait For Event                                       WFE on page A8-808
Wait For Interrupt                                   WFI on page A8-810
Yield                                                YIELD on page A8-812

a. Use Load/Store-Exclusive instructions instead, see Load/store instructions on page A4-19.


A4.9 Exception-generating and exception-handling instructions
The following instructions are intended specifically to cause a processor exception to occur:
• The Supervisor Call (SVC, previously SWI) instruction is used to cause an SVC exception to occur. This
is the main mechanism for User mode code to make calls to privileged operating system code. For
more information, see Supervisor Call (SVC) exception on page B1-52.
• The Breakpoint instruction BKPT provides for software breakpoints. For more information, see About
debug events on page C3-2.
• In privileged system level code, the Secure Monitor Call (SMC, previously SMI) instruction is used to
cause an SMC exception. For more information, see Secure Monitor Call (SMC) exception on page B1-53.

System level variants of the SUBS and LDM instructions can be used to return from exceptions. From ARMv6,
the SRS instruction can be used near the start of an exception handler to store return information, and the RFE
instruction can be used to return from an exception using the stored return information. For details of these
instructions, see Chapter B6 System Instructions.


A4.10 Coprocessor instructions
There are three types of instruction for communicating with coprocessors. These permit the processor to:
• Initiate a coprocessor data-processing operation. For details see CDP, CDP2 on page A8-68.
• Transfer general-purpose registers to and from coprocessor registers. For details, see:
  — MCR, MCR2 on page A8-186
  — MCRR, MCRR2 on page A8-188
  — MRC, MRC2 on page A8-202
  — MRRC, MRRC2 on page A8-204.
• Load or store the values of coprocessor registers. For details, see:
  — LDC, LDC2 (immediate) on page A8-106
  — LDC, LDC2 (literal) on page A8-108
  — STC, STC2 on page A8-372.

The instruction set distinguishes up to 16 coprocessors with a 4-bit field in each coprocessor instruction, so
each coprocessor is assigned a particular number.

Note
One coprocessor can use more than one of the 16 numbers if a large coprocessor instruction set is required.
Coprocessors 10 and 11 are used, together, for VFP and some Advanced SIMD functionality. These
coprocessors are accessed with their own instructions, of similar types to the instructions for the other
coprocessors, that is, instructions to:
• Initiate a coprocessor data-processing operation. For details see VFP data-processing instructions on
page A4-38.
• Transfer general-purpose registers to and from coprocessor registers. For details, see Advanced SIMD
and VFP register transfer instructions on page A4-29.
• Load or store the values of coprocessor registers. For details, see Advanced SIMD and VFP load/store
instructions on page A4-26.

Coprocessors execute the same instruction stream as the processor, ignoring non-coprocessor instructions
and coprocessor instructions for other coprocessors. Coprocessor instructions that cannot be executed by
any coprocessor hardware cause an Undefined Instruction exception.
For more information about specific coprocessors see Coprocessor support on page A2-68.
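The coprocessor number can be read directly from an instruction word. This Python sketch assumes the 4-bit field sits in bits [11:8], as it does in the CDP, MCR, MRC, LDC and STC encodings; the helper names are hypothetical. The example word 0xEE100F10 encodes MRC p15, 0, R0, c0, c0, 0.

```python
def coprocessor_number(instruction):
    """Extract the 4-bit coprocessor number field, bits [11:8]."""
    return (instruction >> 8) & 0xF


def targets_vfp_or_simd(instruction):
    """True if the instruction addresses coprocessor 10 or 11 (VFP / Advanced SIMD)."""
    return coprocessor_number(instruction) in (10, 11)
```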


A4.11 Advanced SIMD and VFP load/store instructions
Table A4-13 summarizes the extension register load/store instructions in the Advanced SIMD and VFP
instruction sets.
Advanced SIMD also provides instructions for loading and storing multiple elements, or structures of
elements, see Element and structure load/store instructions on page A4-27.
Table A4-13 Extension register load/store instructions

Instruction              See                    Operation
Vector Load Multiple     VLDM on page A8-626    Load 1-16 consecutive 64-bit registers (Adv. SIMD and VFP)
                                                Load 1-16 consecutive 32-bit registers (VFP only)
Vector Load Register     VLDR on page A8-628    Load one 64-bit register (Adv. SIMD and VFP)
                                                Load one 32-bit register (VFP only)
Vector Store Multiple    VSTM on page A8-784    Store 1-16 consecutive 64-bit registers (Adv. SIMD and VFP)
                                                Store 1-16 consecutive 32-bit registers (VFP only)
Vector Store Register    VSTR on page A8-786    Store one 64-bit register (Adv. SIMD and VFP)
                                                Store one 32-bit register (VFP only)


A4.11.1 Element and structure load/store instructions
Table A4-14 shows the element and structure load/store instructions available in the Advanced SIMD
instruction set. Loading and storing structures of more than one element automatically de-interleaves or
interleaves the elements, see Figure A4-1 on page A4-28 for an example of de-interleaving. Interleaving is
the inverse process.
Table A4-14 Element and structure load/store instructions

Instruction                                      See
Load single element        Multiple elements     VLD1 (multiple single elements) on page A8-602
                           To one lane           VLD1 (single element to one lane) on page A8-604
                           To all lanes          VLD1 (single element to all lanes) on page A8-606
Load 2-element structure   Multiple structures   VLD2 (multiple 2-element structures) on page A8-608
                           To one lane           VLD2 (single 2-element structure to one lane) on page A8-610
                           To all lanes          VLD2 (single 2-element structure to all lanes) on page A8-612
Load 3-element structure   Multiple structures   VLD3 (multiple 3-element structures) on page A8-614
                           To one lane           VLD3 (single 3-element structure to one lane) on page A8-616
                           To all lanes          VLD3 (single 3-element structure to all lanes) on page A8-618
Load 4-element structure   Multiple structures   VLD4 (multiple 4-element structures) on page A8-620
                           To one lane           VLD4 (single 4-element structure to one lane) on page A8-622
                           To all lanes          VLD4 (single 4-element structure to all lanes) on page A8-624
Store single element       Multiple elements     VST1 (multiple single elements) on page A8-768
                           From one lane         VST1 (single element from one lane) on page A8-770
Store 2-element structure  Multiple structures   VST2 (multiple 2-element structures) on page A8-772
                           From one lane         VST2 (single 2-element structure from one lane) on page A8-774
Store 3-element structure  Multiple structures   VST3 (multiple 3-element structures) on page A8-776
                           From one lane         VST3 (single 3-element structure from one lane) on page A8-778
Store 4-element structure  Multiple structures   VST4 (multiple 4-element structures) on page A8-780
                           From one lane         VST4 (single 4-element structure from one lane) on page A8-782

Memory:     A[0].x A[0].y A[0].z A[1].x A[1].y A[1].z A[2].x A[2].y A[2].z A[3].x A[3].y A[3].z
Registers:  D0 = X3 X2 X1 X0
            D1 = Y3 Y2 Y1 Y0
            D2 = Z3 Z2 Z1 Z0

Figure A4-1 De-interleaving an array of 3-element structures
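The de-interleaving in Figure A4-1 can be modelled with Python list slicing. This is a sketch under the assumption that index i of each result list stands for lane i of D0, D1 and D2 (X0 in lane 0); the function names are hypothetical and no register widths are modelled.

```python
def vld3_deinterleave(memory):
    """Split [x0, y0, z0, x1, y1, z1, ...] into (xs, ys, zs)."""
    if len(memory) % 3:
        raise ValueError("memory must hold whole 3-element structures")
    return memory[0::3], memory[1::3], memory[2::3]


def vst3_interleave(xs, ys, zs):
    """Inverse operation, as VST3 performs when storing."""
    if not len(xs) == len(ys) == len(zs):
        raise ValueError("all three registers must have the same lane count")
    memory = []
    for x, y, z in zip(xs, ys, zs):
        memory.extend((x, y, z))
    return memory
```

With the twelve elements A[0].x to A[3].z from the figure, `xs` becomes A[0].x, A[1].x, A[2].x, A[3].x, matching X0 to X3 in D0.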


A4.12 Advanced SIMD and VFP register transfer instructions
Table A4-15 summarizes the extension register transfer instructions in the Advanced SIMD and VFP
instruction sets. These instructions transfer data from ARM core registers to extension registers, or from
extension registers to ARM core registers.
Advanced SIMD vectors, and single-precision and double-precision VFP registers, are all views of the same
extension register set. For details see Advanced SIMD and VFP extension registers on page A2-21.
Table A4-15 Extension register transfer instructions

Instruction
    See

Copy element from ARM core register to every element of Advanced SIMD vector
    VDUP (ARM core register) on page A8-594

Copy byte, halfword, or word from ARM core register to extension register
    VMOV (ARM core register to scalar) on page A8-644

Copy byte, halfword, or word from extension register to ARM core register
    VMOV (scalar to ARM core register) on page A8-646

Copy from single-precision VFP register to ARM core register, or from ARM core register to
single-precision VFP register
    VMOV (between ARM core register and single-precision register) on page A8-648

Copy two words from ARM core registers to consecutive single-precision VFP registers, or from
consecutive single-precision VFP registers to ARM core registers
    VMOV (between two ARM core registers and two single-precision registers) on page A8-650

Copy two words from ARM core registers to doubleword extension register, or from doubleword
extension register to ARM core registers
    VMOV (between two ARM core registers and a doubleword extension register) on page A8-652

Copy from Advanced SIMD and VFP extension System Register to ARM core register
    VMRS on page A8-658
    VMRS on page B6-27 (system level view)

Copy from ARM core register to Advanced SIMD and VFP extension System Register
    VMSR on page A8-660
    VMSR on page B6-29 (system level view)
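Because single-precision, doubleword and quadword registers are views of one register file, the transfers above can be pictured as bit-field packing. This Python sketch assumes the standard aliasing, in which S<2n> is the low word and S<2n+1> the high word of D<n>, and D<2n> is the low half and D<2n+1> the high half of Q<n>. All helper names are hypothetical.

```python
M32 = (1 << 32) - 1
M64 = (1 << 64) - 1


def d_from_s_pair(s_even, s_odd):
    """D<n> viewed as S<2n+1>:S<2n>."""
    return ((s_odd & M32) << 32) | (s_even & M32)


def s_pair_from_d(d):
    """Inverse view: return (S<2n>, S<2n+1>) for D<n>."""
    return d & M32, (d >> 32) & M32


def q_from_d_pair(d_even, d_odd):
    """Q<n> viewed as D<2n+1>:D<2n>."""
    return ((d_odd & M64) << 64) | (d_even & M64)


def vmov_core_to_d(rt, rt2):
    """VMOV Dm, Rt, Rt2: Rt fills the low word and Rt2 the high word of Dm."""
    return d_from_s_pair(rt, rt2)
```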


A4.13 Advanced SIMD data-processing operations
Advanced SIMD data-processing operations process registers containing vectors of elements of the same
type packed together, enabling the same operation to be performed on multiple items in parallel.
Instructions operate on vectors held in 64-bit or 128-bit registers. Figure A4-2 shows an operation on two
64-bit operand vectors, generating a 64-bit vector result.

Note
Figure A4-2 and other similar figures show 64-bit vectors that consist of four 16-bit elements, and 128-bit
vectors that consist of four 32-bit elements. Other element sizes produce similar figures, but with one, two,
eight, or sixteen operations performed in parallel instead of four.

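[Figure: four parallel Op lanes, each combining one element of Dn with the corresponding element of Dm to produce one element of Dd]
Figure A4-2 Advanced SIMD instruction operating on 64-bit registers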
Many Advanced SIMD instructions have variants that produce vectors of elements double the size of the
inputs. In this case, the number of elements in the result vector is the same as the number of elements in the
operand vectors, but each element, and the whole vector, is double the size.
Figure A4-3 shows an example of an Advanced SIMD instruction operating on 64-bit registers, and
generating a 128-bit result.
[Figure: four parallel Op lanes take elements of Dn and Dm and produce double-width elements of Qd]
Figure A4-3 Advanced SIMD instruction producing wider result
Some Advanced SIMD instructions also have variants that produce vectors containing elements half the
size of the inputs. Figure A4-4 on page A4-31 shows an example of an Advanced SIMD instruction
operating on one 128-bit register, and generating a 64-bit result.


[Figure: four parallel Op lanes take elements of Qn and produce half-width elements of Dd]
Figure A4-4 Advanced SIMD instruction producing narrower result
Some Advanced SIMD instructions do not conform to these standard patterns. Their operation patterns are
described in the individual instruction descriptions.
Advanced SIMD instructions that perform floating-point arithmetic use the ARM standard floating-point
arithmetic defined in Floating-point data types and arithmetic on page A2-32.
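The three patterns in Figures A4-2 to A4-4 can be modelled by unpacking a register value into lanes, operating per lane, and repacking. This is a minimal Python sketch that uses VADD.I16, VADDL.U16 and VMOVN.I32 as representative same-size, wider and narrower operations. It assumes element 0 occupies the least significant bits; the helper names are hypothetical.

```python
def split_lanes(value, esize, count):
    """Unpack count elements of esize bits; element 0 is least significant."""
    mask = (1 << esize) - 1
    return [(value >> (esize * i)) & mask for i in range(count)]


def join_lanes(lanes, esize):
    """Pack elements into one integer, truncating each to esize bits."""
    mask = (1 << esize) - 1
    value = 0
    for i, lane in enumerate(lanes):
        value |= (lane & mask) << (esize * i)
    return value


def vadd_i16(dn, dm):
    """Same-size result: four 16-bit adds, each wrapping modulo 2**16."""
    lanes = zip(split_lanes(dn, 16, 4), split_lanes(dm, 16, 4))
    return join_lanes([a + b for a, b in lanes], 16)


def vaddl_u16(dn, dm):
    """Wider result: the same four adds kept as 32-bit lanes of a 128-bit Qd."""
    lanes = zip(split_lanes(dn, 16, 4), split_lanes(dm, 16, 4))
    return join_lanes([a + b for a, b in lanes], 32)


def vmovn_i32(qm):
    """Narrower result: low 16 bits of each 32-bit lane of Qm, packed into Dd."""
    return join_lanes(split_lanes(qm, 32, 4), 16)
```

Adding 0xFFFF and 1 in one lane wraps to 0 with VADD.I16 but gives 0x10000 with VADDL.U16, which is why the long form needs the double-width destination.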

A4.13.1 Advanced SIMD parallel addition and subtraction
Table A4-16 shows the Advanced SIMD parallel add and subtract instructions.
Table A4-16 Advanced SIMD parallel add and subtract instructions

Instruction                                                See
Vector Add                                                 VADD (integer) on page A8-536
                                                           VADD (floating-point) on page A8-538
Vector Add and Narrow, returning High Half                 VADDHN on page A8-540
Vector Add Long, Vector Add Wide                           VADDL, VADDW on page A8-542
Vector Halving Add, Vector Halving Subtract                VHADD, VHSUB on page A8-600
Vector Pairwise Add and Accumulate Long                    VPADAL on page A8-682
Vector Pairwise Add                                        VPADD (integer) on page A8-684
                                                           VPADD (floating-point) on page A8-686
Vector Pairwise Add Long                                   VPADDL on page A8-688
Vector Rounding Add and Narrow, returning High Half        VRADDHN on page A8-726
Vector Rounding Halving Add                                VRHADD on page A8-734
Vector Rounding Subtract and Narrow, returning High Half   VRSUBHN on page A8-748
Vector Saturating Add                                      VQADD on page A8-700
Vector Saturating Subtract                                 VQSUB on page A8-724
Vector Subtract                                            VSUB (integer) on page A8-788
                                                           VSUB (floating-point) on page A8-790
Vector Subtract and Narrow, returning High Half            VSUBHN on page A8-792
Vector Subtract Long, Vector Subtract Wide                 VSUBL, VSUBW on page A8-794
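The per-lane arithmetic of several of these instructions is sketched below in Python. Each function works on one lane whose value is already interpreted as a signed or unsigned integer, and the sketch does not model the cumulative saturation flag. Function names are hypothetical; `wide` is the element size of the operands of the narrowing forms (for example 16 for a result of 8-bit elements).

```python
def saturate(value, esize, signed):
    """Clamp value to the range of an esize-bit signed or unsigned integer."""
    if signed:
        lo, hi = -(1 << (esize - 1)), (1 << (esize - 1)) - 1
    else:
        lo, hi = 0, (1 << esize) - 1
    return max(lo, min(hi, value))


def vqadd(a, b, esize, signed):
    """VQADD, one lane: the exact sum, saturated."""
    return saturate(a + b, esize, signed)


def vhadd(a, b):
    """VHADD, one lane: (a + b) >> 1 with no intermediate overflow."""
    return (a + b) >> 1


def vrhadd(a, b):
    """VRHADD, one lane: as VHADD, but rounding by adding 1 before halving."""
    return (a + b + 1) >> 1


def vaddhn(a, b, wide):
    """VADDHN, one lane: most significant half of the wide-bit sum."""
    half = wide // 2
    return ((a + b) >> half) & ((1 << half) - 1)


def vraddhn(a, b, wide):
    """VRADDHN, one lane: as VADDHN, with rounding before narrowing."""
    half = wide // 2
    return ((a + b + (1 << (half - 1))) >> half) & ((1 << half) - 1)
```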

A4.13.2 Bitwise Advanced SIMD data-processing instructions
Table A4-17 shows bitwise Advanced SIMD data-processing instructions. These operate on the doubleword
(64-bit) or quadword (128-bit) extension registers, and there is no division into vector elements.
Table A4-17 Bitwise Advanced SIMD data-processing instructions

Instruction                                  See
Vector Bitwise AND                           VAND (register) on page A8-544
Vector Bitwise Bit Clear (AND complement)    VBIC (immediate) on page A8-546
                                             VBIC (register) on page A8-548
Vector Bitwise Exclusive OR                  VEOR on page A8-596
Vector Bitwise Insert if False,              VBIF, VBIT, VBSL on page A8-550
Vector Bitwise Insert if True
Vector Bitwise Move                          VMOV (immediate) on page A8-640
                                             VMOV (register) on page A8-642
Vector Bitwise NOT                           VMVN (immediate) on page A8-668
                                             VMVN (register) on page A8-670
Vector Bitwise OR                            VORR (immediate) on page A8-678
                                             VORR (register) on page A8-680
Vector Bitwise OR NOT                        VORN (register) on page A8-676
Vector Bitwise Select                        VBIF, VBIT, VBSL on page A8-550
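The three related select and insert operations differ only in which operand acts as the bit mask. A Python sketch on 64-bit values, assuming the operand order `Vd, Vn, Vm` with Vd as both a source and the destination; the function names are hypothetical.

```python
MASK64 = (1 << 64) - 1


def vbsl(d, n, m):
    """Bitwise Select: each 1 bit in d takes n's bit, each 0 bit takes m's bit."""
    return ((n & d) | (m & ~d)) & MASK64


def vbit(d, n, m):
    """Insert if True: take n's bit where m is 1, keep d's bit elsewhere."""
    return ((d & ~m) | (n & m)) & MASK64


def vbif(d, n, m):
    """Insert if False: take n's bit where m is 0, keep d's bit elsewhere."""
    return ((d & m) | (n & ~m)) & MASK64
```

A comparison result (all ones or all zeros per element) is a natural mask for VBSL, giving a branch-free per-element select.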


A4.13.3 Advanced SIMD comparison instructions
Table A4-18 shows Advanced SIMD comparison instructions.
Table A4-18 Advanced SIMD comparison instructions

Instruction                                    See
Vector Absolute Compare                        VACGE, VACGT, VACLE, VACLT on page A8-534
Vector Compare Equal                           VCEQ (register) on page A8-552
Vector Compare Equal to Zero                   VCEQ (immediate #0) on page A8-554
Vector Compare Greater Than or Equal           VCGE (register) on page A8-556
Vector Compare Greater Than or Equal to Zero   VCGE (immediate #0) on page A8-558
Vector Compare Greater Than                    VCGT (register) on page A8-560
Vector Compare Greater Than Zero               VCGT (immediate #0) on page A8-562
Vector Compare Less Than or Equal to Zero      VCLE (immediate #0) on page A8-564
Vector Compare Less Than Zero                  VCLT (immediate #0) on page A8-568
Vector Test Bits                               VTST on page A8-802
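Advanced SIMD comparisons do not set condition flags; each result element is all ones if the test is true and all zeros if it is false. This Python sketch models that per lane, with lanes given as integers already interpreted as signed or unsigned (or as floats for VACGE). The helper names are hypothetical.

```python
def lane_mask(condition, esize):
    """All ones if condition holds, else all zeros."""
    return (1 << esize) - 1 if condition else 0


def vcgt(a_lanes, b_lanes, esize):
    """VCGT: per-lane a > b."""
    return [lane_mask(a > b, esize) for a, b in zip(a_lanes, b_lanes)]


def vtst(a_lanes, b_lanes, esize):
    """VTST: per-lane test whether (a AND b) is nonzero."""
    return [lane_mask((a & b) != 0, esize) for a, b in zip(a_lanes, b_lanes)]


def vacge(a_lanes, b_lanes):
    """VACGE on 32-bit floating-point lanes: |a| >= |b|."""
    return [lane_mask(abs(a) >= abs(b), 32) for a, b in zip(a_lanes, b_lanes)]
```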


A4.13.4 Advanced SIMD shift instructions
Table A4-19 lists the shift instructions in the Advanced SIMD instruction set.
Table A4-19 Advanced SIMD shift instructions

Instruction                                         See
Vector Saturating Rounding Shift Left               VQRSHL on page A8-714
Vector Saturating Rounding Shift Right and Narrow   VQRSHRN, VQRSHRUN on page A8-716
Vector Saturating Shift Left                        VQSHL (register) on page A8-718
                                                    VQSHL, VQSHLU (immediate) on page A8-720
Vector Saturating Shift Right and Narrow            VQSHRN, VQSHRUN on page A8-722
Vector Rounding Shift Left                          VRSHL on page A8-736
Vector Rounding Shift Right                         VRSHR on page A8-738
Vector Rounding Shift Right and Accumulate          VRSRA on page A8-746
Vector Rounding Shift Right and Narrow              VRSHRN on page A8-740
Vector Shift Left                                   VSHL (immediate) on page A8-750
                                                    VSHL (register) on page A8-752
Vector Shift Left Long                              VSHLL on page A8-754
Vector Shift Right                                  VSHR on page A8-756
Vector Shift Right and Narrow                       VSHRN on page A8-758
Vector Shift Left and Insert                        VSLI on page A8-760
Vector Shift Right and Accumulate                   VSRA on page A8-764
Vector Shift Right and Insert                       VSRI on page A8-766
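The Python sketch below models a few of these shifts on a single lane, to show how the rounding, saturating and insert forms differ from a plain shift. Signed lanes are Python ints, so `>>` is an arithmetic shift. The function names are hypothetical, and `n` is assumed to be in the valid range for the element size (at least 1 for the rounding form).

```python
def vshr(x, n):
    """VSHR, one lane: truncating right shift."""
    return x >> n


def vrshr(x, n):
    """VRSHR, one lane: add 2**(n-1) before shifting, so the result rounds."""
    return (x + (1 << (n - 1))) >> n


def vqshl_unsigned(x, n, esize):
    """VQSHL on an unsigned lane: shift left, saturating at 2**esize - 1."""
    return min(x << n, (1 << esize) - 1)


def vsli(d, m, n, esize):
    """VSLI: insert m << n, keeping the low n bits of d."""
    ones = (1 << esize) - 1
    mask = (ones << n) & ones
    return (d & ~mask & ones) | ((m << n) & mask)


def vsri(d, m, n, esize):
    """VSRI: insert m >> n, keeping the high n bits of d."""
    ones = (1 << esize) - 1
    mask = ones >> n
    return (d & ~mask & ones) | ((m & ones) >> n)
```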


A4.13.5 Advanced SIMD multiply instructions
Table A4-20 summarizes the Advanced SIMD multiply instructions.
Table A4-20 Advanced SIMD multiply instructions

Instruction
    See

Vector Multiply Accumulate, Vector Multiply Accumulate Long,
Vector Multiply Subtract, Vector Multiply Subtract Long
    VMLA, VMLAL, VMLS, VMLSL (integer) on page A8-634
    VMLA, VMLS (floating-point) on page A8-636
    VMLA, VMLAL, VMLS, VMLSL (by scalar) on page A8-638

Vector Multiply, Vector Multiply Long
    VMUL, VMULL (integer and polynomial) on page A8-662
    VMUL (floating-point) on page A8-664
    VMUL, VMULL (by scalar) on page A8-666

Vector Saturating Doubling Multiply Accumulate Long,
Vector Saturating Doubling Multiply Subtract Long
    VQDMLAL, VQDMLSL on page A8-702

Vector Saturating Doubling Multiply Returning High Half
    VQDMULH on page A8-704

Vector Saturating Rounding Doubling Multiply Returning High Half
    VQRDMULH on page A8-712

Vector Saturating Doubling Multiply Long
    VQDMULL on page A8-706

Advanced SIMD multiply instructions can operate on vectors of:
• 8-bit, 16-bit, or 32-bit unsigned integers
• 8-bit, 16-bit, or 32-bit signed integers
• 8-bit or 16-bit polynomials over {0,1} (VMUL and VMULL only)
• single-precision (32-bit) floating-point numbers.
They can also act on one vector and one scalar.
Long instructions have doubleword (64-bit) operands, and produce quadword (128-bit) results. Other
Advanced SIMD multiply instructions can have either doubleword or quadword operands, and produce
results of the same size.
VFP multiply instructions can operate on:
• single-precision (32-bit) floating-point numbers
• double-precision (64-bit) floating-point numbers.
Some VFP implementations do not support double-precision numbers.
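Two of the less familiar multiply forms are sketched below in Python: polynomial multiplication over {0,1}, where partial products are combined with XOR instead of addition, and the saturating doubling multiply returning the high half. The sketch covers only the 8-bit polynomial forms (VMUL.P8, VMULL.P8) and signed 16-bit lanes for VQDMULH, and does not model the cumulative saturation flag. Function names are hypothetical.

```python
def poly_mul(a, b):
    """Multiply polynomials over {0,1}: partial products combined with XOR."""
    product = 0
    while b:
        if b & 1:
            product ^= a
        a <<= 1
        b >>= 1
    return product


def vmul_p8(a, b):
    """VMUL.P8, one lane: low 8 bits of the polynomial product."""
    return poly_mul(a & 0xFF, b & 0xFF) & 0xFF


def vmull_p8(a, b):
    """VMULL.P8, one lane: full 16-bit polynomial product."""
    return poly_mul(a & 0xFF, b & 0xFF)


def vqdmulh_s16(a, b):
    """VQDMULH.S16, one lane: saturate((2 * a * b) >> 16)."""
    return max(-0x8000, min(0x7FFF, (2 * a * b) >> 16))
```

Saturation only matters for VQDMULH when both operands are -0x8000: doubling their product gives 2**31, whose high half does not fit in a signed 16-bit lane.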


A4.13.6 Miscellaneous Advanced SIMD data-processing instructions
Table A4-21 shows miscellaneous Advanced SIMD data-processing instructions.
Table A4-21 Miscellaneous Advanced SIMD data-processing instructions

Instruction                                   See
Vector Absolute Difference and Accumulate     VABA, VABAL on page A8-526
Vector Absolute Difference                    VABD, VABDL (integer) on page A8-528
                                              VABD (floating-point) on page A8-530
Vector Absolute                               VABS on page A8-532
Vector Convert between floating-point and     VCVT (between floating-point and fixed-point, Advanced SIMD) on page A8-580
  fixed-point
Vector Convert between floating-point and     VCVT (between floating-point and integer, Advanced SIMD) on page A8-576
  integer
Vector Convert between half-precision and     VCVT (between half-precision and single-precision, Advanced SIMD) on page A8-586
  single-precision
Vector Count Leading Sign Bits                VCLS on page A8-566
Vector Count Leading Zeros                    VCLZ on page A8-570
Vector Count Set Bits                         VCNT on page A8-574
Vector Duplicate scalar                       VDUP (scalar) on page A8-592
Vector Extract                                VEXT on page A8-598
Vector Move and Narrow                        VMOVN on page A8-656
Vector Move Long                              VMOVL on page A8-654
Vector Maximum, Minimum                       VMAX, VMIN (integer) on page A8-630
                                              VMAX, VMIN (floating-point) on page A8-632
Vector Negate                                 VNEG on page A8-672
Vector Pairwise Maximum, Minimum              VPMAX, VPMIN (integer) on page A8-690
                                              VPMAX, VPMIN (floating-point) on page A8-692
Vector Reciprocal Estimate                    VRECPE on page A8-728
Vector Reciprocal Step                        VRECPS on page A8-730
Vector Reciprocal Square Root Estimate        VRSQRTE on page A8-742
Vector Reciprocal Square Root Step            VRSQRTS on page A8-744
Vector Reverse                                VREV16, VREV32, VREV64 on page A8-732
Vector Saturating Absolute                    VQABS on page A8-698
Vector Saturating Move and Narrow             VQMOVN, VQMOVUN on page A8-708
Vector Saturating Negate                      VQNEG on page A8-710
Vector Swap                                   VSWP on page A8-796
Vector Table Lookup                           VTBL, VTBX on page A8-798
Vector Transpose                              VTRN on page A8-800
Vector Unzip                                  VUZP on page A8-804
Vector Zip                                    VZIP on page A8-806
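The two table lookup instructions differ only in how they treat an out-of-range index. A Python sketch on lists of bytes, assuming the table is the 8 to 32 bytes held in one to four consecutive doubleword registers; the function names are hypothetical.

```python
def vtbl(table, indices):
    """VTBL: each index byte selects a table byte; out-of-range indices give 0."""
    return [table[i] if i < len(table) else 0 for i in indices]


def vtbx(dest, table, indices):
    """VTBX: as VTBL, but out-of-range indices leave the destination byte unchanged."""
    return [table[i] if i < len(table) else d for d, i in zip(dest, indices)]
```

VTBX is therefore useful for chaining lookups into tables larger than 32 bytes: each step only overwrites the bytes whose index falls inside its part of the table.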


A4.14 VFP data-processing instructions
Table A4-22 summarizes the data-processing instructions in the VFP instruction set.
For details of the floating-point arithmetic used by VFP instructions, see Floating-point data types and
arithmetic on page A2-32.
Table A4-22 VFP data-processing instructions

Instruction                                     See
Absolute value                                  VABS on page A8-532
Add                                             VADD (floating-point) on page A8-538
Compare (optionally with exceptions enabled)    VCMP, VCMPE on page A8-572
Convert between floating-point and integer      VCVT, VCVTR (between floating-point and integer, VFP) on page A8-578
Convert between floating-point and fixed-point  VCVT (between floating-point and fixed-point, VFP) on page A8-582
Convert between double-precision and            VCVT (between double-precision and single-precision) on page A8-584
  single-precision
Convert between half-precision and              VCVTB, VCVTT (between half-precision and single-precision, VFP) on page A8-588
  single-precision
Divide                                          VDIV on page A8-590
Multiply Accumulate, Multiply Subtract          VMLA, VMLS (floating-point) on page A8-636
Move immediate value to extension register      VMOV (immediate) on page A8-640
Copy from one extension register to another     VMOV (register) on page A8-642
Multiply                                        VMUL (floating-point) on page A8-664
Negate (invert the sign bit)                    VNEG on page A8-672
Multiply Accumulate and Negate, Multiply        VNMLA, VNMLS, VNMUL on page A8-674
  Subtract and Negate, Multiply and Negate
Square Root                                     VSQRT on page A8-762
Subtract                                        VSUB (floating-point) on page A8-790
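Two of the conversions above are sketched in Python. VCVTB and VCVTT read or write the bottom or top halfword of a single-precision register. VCVT from floating-point to integer rounds toward zero, whereas VCVTR uses the rounding mode in the FPSCR. This is an approximation under stated assumptions: it assumes the IEEE half-precision format (not the alternative format), lets Python's round-to-nearest-even encoding stand in for the FPSCR rounding mode, does not model exception flags, and the function names are hypothetical. Register contents are passed as 32-bit integer bit patterns.

```python
import math
import struct


def half_bits_to_float(bits):
    """Decode an IEEE 754 half-precision bit pattern."""
    return struct.unpack("<e", struct.pack("<H", bits & 0xFFFF))[0]


def float_to_half_bits(value):
    """Encode a float as IEEE 754 half precision (round to nearest even)."""
    return struct.unpack("<H", struct.pack("<e", value))[0]


def vcvtb_f32_f16(sm_bits):
    """VCVTB.F32.F16: convert the bottom halfword of Sm."""
    return half_bits_to_float(sm_bits & 0xFFFF)


def vcvtt_f32_f16(sm_bits):
    """VCVTT.F32.F16: convert the top halfword of Sm."""
    return half_bits_to_float((sm_bits >> 16) & 0xFFFF)


def vcvtb_f16_f32(sd_bits, value):
    """VCVTB.F16.F32: write the bottom halfword of Sd, keeping the top halfword."""
    return (sd_bits & 0xFFFF0000) | float_to_half_bits(value)


def vcvt_s32_f32(value):
    """VCVT.S32.F32 (not VCVTR): round toward zero, saturating; NaN gives 0."""
    if math.isnan(value):
        return 0
    return math.trunc(max(-2**31, min(2**31 - 1, value)))
```

Note that `struct.pack("<e", ...)` raises `OverflowError` for values too large for half precision, where hardware would instead produce an infinity or the largest finite value depending on the format and rounding mode.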


Chapter A5
ARM Instruction Set Encoding

This chapter describes the encoding of the ARM instruction set. It contains the following sections:
• ARM instruction set encoding on page A5-2
• Data-processing and miscellaneous instructions on page A5-4
• Load/store word and unsigned byte on page A5-19
• Media instructions on page A5-21
• Branch, branch with link, and block data transfer on page A5-27
• Supervisor Call, and coprocessor instructions on page A5-28
• Unconditional instructions on page A5-30.

Note
• Architecture variant information in this chapter describes the architecture variant or extension in
which the instruction encoding was introduced into the ARM instruction set. An entry of All means that
the instruction encoding was introduced in ARMv4 or earlier, and so is in all variants of the ARM
instruction set covered by this manual.
• In the decode tables in this chapter, an entry of - for a field value means the value of the field does
not affect the decoding.


A5.1 ARM instruction set encoding

 31   28 27  25 24                       5  4  3      0
+-------+------+--------------------------+---+--------+
| cond  | op1  |                          |op |        |
+-------+------+--------------------------+---+--------+

The ARM instruction stream is a sequence of word-aligned words. Each ARM instruction is a single 32-bit
word in that stream.
Table A5-1 shows the major subdivisions of the ARM instruction set, determined by bits [31:25,4].
Most ARM instructions can be conditional, with a condition determined by bits [31:28] of the instruction,
the cond field. For details see The condition field. This applies to all instructions except those with the cond
field equal to 0b1111.
Table A5-1 ARM instruction encoding

cond       op1   op   Instruction classes
not 1111   00x   -    Data-processing and miscellaneous instructions on page A5-4.
           010   -    Load/store word and unsigned byte on page A5-19.
           011   0    Load/store word and unsigned byte on page A5-19.
                 1    Media instructions on page A5-21.
           10x   -    Branch, branch with link, and block data transfer on page A5-27.
           11x   -    Supervisor Call, and coprocessor instructions on page A5-28.
                      Includes VFP instructions and Advanced SIMD data transfers, see Chapter A7
                      Advanced SIMD and VFP Instruction Encoding.
1111       -     -    If the cond field is 0b1111, the instruction can only be executed unconditionally,
                      see Unconditional instructions on page A5-30.
                      Includes Advanced SIMD instructions, see Chapter A7 Advanced SIMD and VFP
                      Instruction Encoding.
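The top-level decode in Table A5-1 can be written as a short Python function that examines bits [31:28], [27:25] and [4]. This is a sketch of the first decode level only; each returned class needs further decoding as described in the referenced sections, and `arm_instruction_class` is a hypothetical name. In the examples, 0xE0810002 encodes ADD R0, R1, R2 and 0xEA000000 encodes an unconditional B.

```python
def arm_instruction_class(instruction):
    """Return the Table A5-1 instruction class of a 32-bit ARM instruction."""
    cond = (instruction >> 28) & 0xF
    op1 = (instruction >> 25) & 0x7
    op = (instruction >> 4) & 0x1
    if cond == 0b1111:
        return "Unconditional instructions"
    if (op1 >> 1) == 0b00:
        return "Data-processing and miscellaneous instructions"
    if op1 == 0b010 or (op1 == 0b011 and op == 0):
        return "Load/store word and unsigned byte"
    if op1 == 0b011:
        return "Media instructions"
    if (op1 >> 1) == 0b10:
        return "Branch, branch with link, and block data transfer"
    return "Supervisor Call, and coprocessor instructions"
```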

A5.1.1 The condition field
Every conditional instruction contains a 4-bit condition code field in bits 31 to 28:

 31   28 27                                             0
+-------+-----------------------------------------------+
| cond  |                                               |
+-------+-----------------------------------------------+
This field contains one of the values 0b0000-0b1110 described in Table A8-1 on page A8-8. Most instruction mnemonics can be extended with the letters defined in the mnemonic extension column of that table.
If the always (AL) condition is specified, the instruction is executed irrespective of the value of the condition code flags. The absence of a condition code on an instruction mnemonic implies the AL condition code.
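A minimal Python sketch of how the 4-bit cond field is evaluated against the N, Z, C and V condition flags. The flag tests follow the standard meanings listed in Table A8-1 (EQ: Z set, GE: N equal to V, and so on); the function name is hypothetical, and treating 0b1111 as an error simply reflects that it is not a condition code in ARM state (see Table A5-1).

```python
def condition_passed(cond: int, n: bool, z: bool, c: bool, v: bool) -> bool:
    """Return True if an instruction with condition field `cond` executes,
    given the current N, Z, C and V condition flags."""
    if cond == 0b1110:                   # AL: always executes
        return True
    if cond == 0b1111:
        raise ValueError("0b1111 is not a condition code (see Table A5-1)")
    tests = {                            # indexed by cond<3:1>
        0b000: z,                        # EQ / NE
        0b001: c,                        # CS / CC
        0b010: n,                        # MI / PL
        0b011: v,                        # VS / VC
        0b100: c and not z,              # HI / LS
        0b101: n == v,                   # GE / LT
        0b110: (n == v) and not z,       # GT / LE
    }
    result = tests[cond >> 1]
    # cond<0> == 1 selects the inverse test (NE, CC, PL, VC, LS, LT, LE).
    return (not result) if (cond & 1) else result


# EQ passes when Z is set; NE does not.
print(condition_passed(0b0000, False, True, False, False))
print(condition_passed(0b0001, False, True, False, False))
```

Pairing each condition with its inverse through cond<0> keeps the table of flag tests to seven entries.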


A5.1.2 UNDEFINED and UNPREDICTABLE instruction set space
An attempt to execute an unallocated instruction results in either:
• Unpredictable behavior. The instruction is described as UNPREDICTABLE.
• An Undefined Instruction exception. The instruction is described as UNDEFINED.
An instruction is UNDEFINED if it is declared as UNDEFINED in an instruction description, or in this chapter.
An instruction is UNPREDICTABLE if:
• it is declared as UNPREDICTABLE in an instruction description or in this chapter, or
• the pseudocode for that encoding does not indicate that a different special case applies, and a bit marked (0) or (1) in the encoding diagram of an instruction is not 0 or 1 respectively.
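The (0)/(1) rule amounts to a mask test. The Python sketch below is illustrative only: the helper and constant names are hypothetical, and the masks assume the ARM encoding of MOV (register), whose bits [19:16] are shown as (0)(0)(0)(0) in its encoding diagram. A False result means the encoding is UNPREDICTABLE unless the instruction's pseudocode indicates a different special case.

```python
def should_be_bits_ok(word: int, sbz_mask: int, sbo_mask: int) -> bool:
    """Return True if every bit marked (0) is 0 and every bit marked (1) is 1.

    sbz_mask: bits shown as (0) in the encoding diagram (hypothetical input)
    sbo_mask: bits shown as (1) in the encoding diagram (hypothetical input)
    """
    return (word & sbz_mask) == 0 and (word & sbo_mask) == sbo_mask


# Assumed masks for MOV (register): bits [19:16] are (0), no bits are (1).
MOV_REG_SBZ = 0x000F0000
MOV_REG_SBO = 0x00000000

mov_r0_r1 = 0xE1A00001            # MOV R0, R1 with bits [19:16] == 0b0000
odd_mov = mov_r0_r1 | 0x00010000  # same word, but bit [16] set

print(should_be_bits_ok(mov_r0_r1, MOV_REG_SBZ, MOV_REG_SBO))
print(should_be_bits_ok(odd_mov, MOV_REG_SBZ, MOV_REG_SBO))
```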

Unless otherwise specified:
• ARM instructions introduced in an architecture variant are UNDEFINED in earlier architecture variants.
• ARM instructions introduced in one or more architecture extensions are UNDEFINED if none of those extensions are implemented.

A5.1.3 The PC and the use of 0b1111 as a register specifier
In ARM instructions, the use of 0b1111 as a register specifier specifies the PC.
Many instructions are UNPREDICTABLE if they use 0b1111 as a register specifier. This is specified by
pseudocode in the instruction description.

Note
Use of the PC as the base register in any store instruction is deprecated in ARMv7.

A5.1.4 The SP and the use of 0b1101 as a register specifier
In ARM instructions, the use of 0b1101 as a register specifier specifies the SP.
ARM deprecates:
• using SP for any purpose other than as a stack pointer
• using the SP in ARM instructions in ways other than those listed in 32-bit Thumb instruction support for R13 on page A6-4, except that ARM does not deprecate the use of instructions of the following form that write a word-aligned address to SP:
SUB SP, , #


A5.2 Data-processing and miscellaneous instructions
Bit fields: [31:28] cond | [27:26] 00 | [25] op | [24:20] op1 | [7:4] op2

Table A5-2 shows the allocation of encodings in this space.
Table A5-2 Data-processing and miscellaneous instructions
op | op1 | op2 | Instruction or instruction class | Variant
0 | not 10xx0 | xxx0 | Data-processing (register) on page A5-5 | -
0 | not 10xx0 | 0xx1 | Data-processing (register-shifted register) on page A5-7 | -
0 | 10xx0 | 0xxx | Miscellaneous instructions on page A5-18 | -
0 | 10xx0 | 1xx0 | Halfword multiply and multiply-accumulate on page A5-13 | -
0 | 0xxxx | 1001 | Multiply and multiply-accumulate on page A5-12 | -
0 | 1xxxx | 1001 | Synchronization primitives on page A5-16 | -
0 | not 0xx1x | 1011 | Extra load/store instructions on page A5-14 | -
0 | not 0xx1x | 11x1 | Extra load/store instructions on page A5-14 | -
0 | 0xx1x | 1011 | Extra load/store instructions (unprivileged) on page A5-15 | -
0 | 0xx1x | 11x1 | Extra load/store instructions (unprivileged) on page A5-15 | -
1 | not 10xx0 | - | Data-processing (immediate) on page A5-8 | -
1 | 10000 | - | 16-bit immediate load (MOV (immediate) on page A8-194) | v6T2
1 | 10100 | - | High halfword 16-bit immediate load (MOVT on page A8-200) | v6T2
1 | 10x10 | - | MSR (immediate), and hints on page A5-17 | -
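Decode tables like Table A5-2 can be applied mechanically: take the first row whose patterns all match. The Python sketch below transcribes the table's rows; the pattern syntax ('x' for either bit value, 'not' for the complement, '-' for any value) mirrors the table, while the function and table names are hypothetical. It assumes the caller has already established that bits [27:26] are 0b00 and cond is not 0b1111.

```python
def field_matches(pattern: str, value: int) -> bool:
    """Match a field value against a decode-table entry such as '10xx0',
    'not 10xx0' or '-' (any value). 'x' matches either bit value."""
    if pattern == "-":
        return True
    if pattern.startswith("not "):
        return not field_matches(pattern[4:], value)
    width = len(pattern)
    for i, ch in enumerate(pattern):
        bit = (value >> (width - 1 - i)) & 1   # leftmost character is the MSB
        if ch != "x" and int(ch) != bit:
            return False
    return True


# (op, op1, op2, instruction class), in the order of Table A5-2.
TABLE_A5_2 = [
    ("0", "not 10xx0", "xxx0", "Data-processing (register)"),
    ("0", "not 10xx0", "0xx1", "Data-processing (register-shifted register)"),
    ("0", "10xx0", "0xxx", "Miscellaneous instructions"),
    ("0", "10xx0", "1xx0", "Halfword multiply and multiply-accumulate"),
    ("0", "0xxxx", "1001", "Multiply and multiply-accumulate"),
    ("0", "1xxxx", "1001", "Synchronization primitives"),
    ("0", "not 0xx1x", "1011", "Extra load/store instructions"),
    ("0", "not 0xx1x", "11x1", "Extra load/store instructions"),
    ("0", "0xx1x", "1011", "Extra load/store instructions (unprivileged)"),
    ("0", "0xx1x", "11x1", "Extra load/store instructions (unprivileged)"),
    ("1", "not 10xx0", "-", "Data-processing (immediate)"),
    ("1", "10000", "-", "16-bit immediate load (MOV)"),
    ("1", "10100", "-", "High halfword 16-bit immediate load (MOVT)"),
    ("1", "10x10", "-", "MSR (immediate), and hints"),
]


def decode_dp_misc(word: int) -> str:
    """Return the Table A5-2 class of a word whose bits [27:26] are 0b00."""
    op = (word >> 25) & 0x1        # bit [25]
    op1 = (word >> 20) & 0x1F      # bits [24:20]
    op2 = (word >> 4) & 0xF        # bits [7:4]
    for row_op, row_op1, row_op2, name in TABLE_A5_2:
        if (field_matches(row_op, op) and field_matches(row_op1, op1)
                and field_matches(row_op2, op2)):
            return name
    raise ValueError("no matching row")


print(decode_dp_misc(0xE0000291))  # MUL R0, R2, R1
print(decode_dp_misc(0xE12FFF1E))  # BX LR
```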


A5.2.1 Data-processing (register)
Bit fields: [31:28] cond | [27:25] 000 | [24:20] op1 | [11:7] op2 | [6:5] op3 | [4] 0

If op1 == 0b10xx0, see Data-processing and miscellaneous instructions on page A5-4.
Table A5-3 shows the allocation of encodings in this space. These encodings are in all architecture variants.
Table A5-3 Data-processing (register) instructions
op1 | op2 | op3 | Instruction | See
0000x | - | - | Bitwise AND | AND (register) on page A8-36
0001x | - | - | Bitwise Exclusive OR | EOR (register) on page A8-96
0010x | - | - | Subtract | SUB (register) on page A8-422
0011x | - | - | Reverse Subtract | RSB (register) on page A8-286
0100x | - | - | Add | ADD (register) on page A8-24
0101x | - | - | Add with Carry | ADC (register) on page A8-16
0110x | - | - | Subtract with Carry | SBC (register) on page A8-304
0111x | - | - | Reverse Subtract with Carry | RSC (register) on page A8-292
10001 | - | - | Test | TST (register) on page A8-456
10011 | - | - | Test Equivalence | TEQ (register) on page A8-450
10101 | - | - | Compare | CMP (register) on page A8-82
10111 | - | - | Compare Negative | CMN (register) on page A8-76
1100x | - | - | Bitwise OR | ORR (register) on page A8-230
1101x | 00000 | 00 | Move | MOV (register) on page A8-196
1101x | not 00000 | 00 | Logical Shift Left | LSL (immediate) on page A8-178
1101x | - | 01 | Logical Shift Right | LSR (immediate) on page A8-182
1101x | - | 10 | Arithmetic Shift Right | ASR (immediate) on page A8-40
1101x | 00000 | 11 | Rotate Right with Extend | RRX on page A8-282
1101x | not 00000 | 11 | Rotate Right | ROR (immediate) on page A8-278
1110x | - | - | Bitwise Bit Clear | BIC (register) on page A8-52
1111x | - | - | Bitwise NOT | MVN (register) on page A8-216
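In Table A5-3, op1<4:1> selects the operation for most rows, while the 1101x rows are split further by op2 (the immediate shift amount) and op3 (the shift type). The Python sketch below, with hypothetical names, follows that layout. It assumes the word has already been placed in this space (bits [27:25] == 0b000, bit [4] == 0, op1 != 0b10xx0); the comment that op1<0> is the S bit reflects the instruction encodings described in Chapter A8.

```python
# op1<4:1> for rows of Table A5-3 that do not depend on op2 or op3.
# (op1<0> is the S bit in these encodings.)
SIMPLE_OPS = {
    0b0000: "AND", 0b0001: "EOR", 0b0010: "SUB", 0b0011: "RSB",
    0b0100: "ADD", 0b0101: "ADC", 0b0110: "SBC", 0b0111: "RSC",
    0b1100: "ORR", 0b1110: "BIC", 0b1111: "MVN",
}
# Test and compare rows are only defined with op1<0> == 1.
COMPARE_OPS = {0b10001: "TST", 0b10011: "TEQ", 0b10101: "CMP", 0b10111: "CMN"}


def dp_register_mnemonic(word: int) -> str:
    """Return the Table A5-3 mnemonic for a data-processing (register) word."""
    op1 = (word >> 20) & 0x1F      # bits [24:20]
    op2 = (word >> 7) & 0x1F       # bits [11:7], the immediate shift amount
    op3 = (word >> 5) & 0x3        # bits [6:5], the shift type
    opcode = op1 >> 1              # op1<4:1>
    if opcode in SIMPLE_OPS:
        return SIMPLE_OPS[opcode]
    if op1 in COMPARE_OPS:
        return COMPARE_OPS[op1]
    if opcode == 0b1101:
        # MOV and the shift-by-immediate instructions share op1 == 1101x.
        if op3 == 0b00:
            return "MOV" if op2 == 0 else "LSL"
        if op3 == 0b01:
            return "LSR"
        if op3 == 0b10:
            return "ASR"
        return "RRX" if op2 == 0 else "ROR"
    raise ValueError("op1 == 0b10xx0 belongs to Data-processing and miscellaneous")


print(dp_register_mnemonic(0xE1A00101))  # LSL R0, R1, #2
```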


A5.2.2 Data-processing (register-shifted register)
Bit fields: [31:28] cond | [27:25] 000 | [24:20] op1 | [7] 0 | [6:5] op2 | [4] 1

If op1 == 0b10xx0, see Data-processing and miscellaneous instructions on page A5-4.
Table A5-4 shows the allocation of encodings in this space. These encodings are in all architecture variants.
Table A5-4 Data-processing (register-shifted register) instructions
op1 | op2 | Instruction | See
0000x | - | Bitwise AND | AND (register-shifted register) on page A8-38
0001x | - | Bitwise Exclusive OR | EOR (register-shifted register) on page A8-98
0010x | - | Subtract | SUB (register-shifted register) on page A8-424
0011x | - | Reverse Subtract | RSB (register-shifted register) on page A8-288
0100x | - | Add | ADD (register-shifted register) on page A8-26
0101x | - | Add with Carry | ADC (register-shifted register) on page A8-18
0110x | - | Subtract with Carry | SBC (register-shifted register) on page A8-306
0111x | - | Reverse Subtract with Carry | RSC (register-shifted register) on page A8-294
10001 | - | Test | TST (register-shifted register) on page A8-458
10011 | - | Test Equivalence | TEQ (register-shifted register) on page A8-452
10101 | - | Compare | CMP (register-shifted register) on page A8-84
10111 | - | Compare Negative | CMN (register-shifted register) on page A8-78
1100x | - | Bitwise OR | ORR (register-shifted register) on page A8-232
1101x | 00 | Logical Shift Left | LSL (register) on page A8-180
1101x | 01 | Logical Shift Right | LSR (register) on page A8-184
1101x | 10 | Arithmetic Shift Right | ASR (register) on page A8-42
1101x | 11 | Rotate Right | ROR (register) on page A8-280
1110x | - | Bitwise Bit Clear | BIC (register-shifted register) on page A8-54
1111x | - | Bitwise NOT | MVN (register-shifted register) on page A8-218


A5.2.3 Data-processing (immediate)
Bit fields: [31:28] cond | [27:25] 001 | [24:20] op | [19:16] Rn

If op == 0b10xx0, see Data-processing and miscellaneous instructions on page A5-4.
Table A5-5 shows the allocation of encodings in this space. These encodings are in all architecture variants.
Table A5-5 Data-processing (immediate) instructions
op | Rn | Instruction | See
0000x | - | Bitwise AND | AND (immediate) on page A8-34
0001x | - | Bitwise Exclusive OR | EOR (immediate) on page A8-94
0010x | not 1111 | Subtract | SUB (immediate, ARM) on page A8-420
0010x | 1111 | Form PC-relative address | ADR on page A8-32
0011x | - | Reverse Subtract | RSB (immediate) on page A8-284
0100x | not 1111 | Add | ADD (immediate, ARM) on page A8-22
0100x | 1111 | Form PC-relative address | ADR on page A8-32
0101x | - | Add with Carry | ADC (immediate) on page A8-14
0110x | - | Subtract with Carry | SBC (immediate) on page A8-302
0111x | - | Reverse Subtract with Carry | RSC (immediate) on page A8-290
10001 | - | Test | TST (immediate) on page A8-454
10011 | - | Test Equivalence | TEQ (immediate) on page A8-448
10101 | - | Compare | CMP (immediate) on page A8-80
10111 | - | Compare Negative | CMN (immediate) on page A8-74
1100x | - | Bitwise OR | ORR (immediate) on page A8-228
1101x | - | Move | MOV (immediate) on page A8-194
1110x | - | Bitwise Bit Clear | BIC (immediate) on page A8-50
1111x | - | Bitwise NOT | MVN (immediate) on page A8-214

These instructions all have modified immediate constants, rather than a simple 12-bit binary number. This
provides a more useful range of values. For details see Modified immediate constants in ARM instructions
on page A5-9.


A5.2.4 Modified immediate constants in ARM instructions
Bit fields: [11:8] rotation | [7:0] a b c d e f g h

Table A5-6 shows the range of modified immediate constants available in ARM data-processing
instructions, and how they are encoded in the a, b, c, d, e, f, g, h, and rotation fields in the instruction.
Table A5-6 Encoding of modified immediates in ARM processing instructions
rotation | Constant value (a)
0000 | 00000000 00000000 00000000 abcdefgh
0001 | gh000000 00000000 00000000 00abcdef
0010 | efgh0000 00000000 00000000 0000abcd
0011 | cdefgh00 00000000 00000000 000000ab
0100 | abcdefgh 00000000 00000000 00000000
0101-1000 | 8-bit values shifted to other even-numbered positions
1001 | 00000000 00abcdef gh000000 00000000
1010-1101 | 8-bit values shifted to other even-numbered positions
1110 | 00000000 00000000 0000abcd efgh0000
1111 | 00000000 00000000 000000ab cdefgh00
a. In this table, the immediate constant value is shown in binary form, to relate abcdefgh to the encoding diagram. In assembly syntax, the immediate value is specified in the usual way (a decimal number by default).

Note
The range of values available in ARM modified immediate constants is slightly different from the range of
values available in 32-bit Thumb instructions. See Modified immediate constants in Thumb instructions on
page A6-17.


Carry out
A logical instruction with rotation == 0b0000 does not affect APSR.C. Otherwise, a logical instruction that
sets the flags sets APSR.C to the value of bit [31] of the modified immediate constant.

Constants with multiple encodings
Some constant values have multiple possible encodings. In this case, a UAL assembler must select the
encoding with the lowest unsigned value of the rotation field. This is the encoding that appears first in
Table A5-6 on page A5-9. For example, the constant #3 must be encoded with (rotation, abcdefgh) ==
(0b0000, 0b00000011), not (0b0001, 0b00001100), (0b0010, 0b00110000), or (0b0011, 0b11000000).
In particular, this means that all constants in the range 0-255 are encoded with rotation == 0b0000, and
permitted constants outside that range are encoded with rotation != 0b0000. A flag-setting logical instruction
with a modified immediate constant therefore leaves APSR.C unchanged if the constant is in the range 0-255
and sets it to the most significant bit of the constant otherwise. This matches the behavior of Thumb
modified immediate constants for all constants that are permitted in both the ARM and Thumb instruction
sets.
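The selection rule can be sketched by trying rotation values in increasing order and stopping at the first one that undoes the rotation to an 8-bit value. The Python sketch below uses hypothetical function names; it returns the (rotation, abcdefgh) pair a UAL assembler must choose, or None when the constant has no modified-immediate encoding.

```python
from typing import Optional, Tuple


def ror32(value: int, amount: int) -> int:
    """Rotate a 32-bit value right by `amount` bits."""
    amount %= 32
    value &= 0xFFFFFFFF
    return ((value >> amount) | (value << (32 - amount))) & 0xFFFFFFFF


def encode_modified_immediate(const: int) -> Optional[Tuple[int, int]]:
    """Return (rotation, abcdefgh) for a 32-bit constant, or None.

    Rotation values are tried from 0 upwards, so the first match is the
    encoding with the lowest unsigned value of the rotation field.
    """
    const &= 0xFFFFFFFF
    for rotation in range(16):
        # The constant is abcdefgh rotated right by 2*rotation, so rotating
        # it right by 32 - 2*rotation (that is, left by 2*rotation) must
        # recover an 8-bit value.
        candidate = ror32(const, 32 - 2 * rotation)
        if candidate <= 0xFF:
            return rotation, candidate
    return None


print(encode_modified_immediate(3))           # rotation 0, not 1, 2 or 3
print(encode_modified_immediate(0xFF000000))
print(encode_modified_immediate(0x101))       # not encodable
```

Because rotation 0 is tried first, every constant in the range 0-255 comes back with rotation 0, matching the rule above.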
An alternative syntax is available for a modified immediate constant that permits the programmer to specify the encoding directly. In this syntax, the single immediate value is instead written as a pair of comma-separated immediate values, where:
• the first value is the numeric value of abcdefgh, in the range 0-255
• the second value is twice the numeric value of rotation, an even number in the range 0-30.
This syntax permits all ARM data-processing instructions with modified immediate constants to be disassembled to assembler syntax that will assemble to the original instruction.
This syntax also makes it possible to write variants of some flag-setting logical instructions that have different effects on APSR.C from those obtained with the normal single-immediate syntax. For example, ANDS R1,R2,#12,#2 has the same behavior as ANDS R1,R2,#3 except that it sets APSR.C to 0 instead of leaving it unchanged. Such variants of flag-setting logical instructions do not have equivalents in the Thumb instruction set, and their use is deprecated.
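The two-immediate syntax is the encoding fields written out directly. The Python sketch below, with hypothetical helper names, renders a 12-bit imm12 field in that form and rebuilds the field from the two values. It shows that ANDS R1,R2,#3 and ANDS R1,R2,#12,#2 carry different imm12 fields (0x003 and 0x10C), which is why their effect on APSR.C differs.

```python
def byte_rot_syntax(imm12: int) -> str:
    """Render a 12-bit modified immediate field in the two-immediate syntax:
    the abcdefgh value, then twice the rotation field."""
    rotation = (imm12 >> 8) & 0xF      # imm12<11:8>
    abcdefgh = imm12 & 0xFF            # imm12<7:0>
    return f"#{abcdefgh},#{2 * rotation}"


def imm12_from_byte_rot(byte: int, rot: int) -> int:
    """Inverse of byte_rot_syntax(): rebuild the 12-bit field."""
    if not 0 <= byte <= 255:
        raise ValueError("first value must be in the range 0-255")
    if rot % 2 != 0 or not 0 <= rot <= 30:
        raise ValueError("second value must be an even number in the range 0-30")
    return ((rot // 2) << 8) | byte


print(byte_rot_syntax(0x003))                     # field used by ANDS ...,#3
print(hex(imm12_from_byte_rot(12, 2)))            # field used by ANDS ...,#12,#2
```

Every one of the 4096 possible imm12 values has exactly one rendering in this syntax, which is what makes lossless disassembly possible.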

Operation
// ARMExpandImm()
// ==============

bits(32) ARMExpandImm(bits(12) imm12)

    // APSR.C argument to following function call does not affect the imm32 result.
    (imm32, -) = ARMExpandImm_C(imm12, APSR.C);

    return imm32;

// ARMExpandImm_C()
// ================

(bits(32), bit) ARMExpandImm_C(bits(12) imm12, bit carry_in)

    unrotated_value = ZeroExtend(imm12<7:0>, 32);
    (imm32, carry_out) = Shift_C(unrotated_value, SRType_ROR, 2*UInt(imm12<11:8>), carry_in);

    return (imm32, carry_out);
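A direct Python transliteration of ARMExpandImm() and ARMExpandImm_C(), useful for checking the Carry out rules against concrete values. It assumes the usual Shift_C() behavior for SRType_ROR: a rotation of 0 returns the value unchanged with carry_in as the carry, and otherwise carry_out is bit [31] of the rotated result. The Python function names are hypothetical.

```python
from typing import Tuple


def shift_c_ror(value: int, amount: int, carry_in: int) -> Tuple[int, int]:
    """ROR case of Shift_C() for a 32-bit value (amount 0-31)."""
    if amount == 0:
        return value, carry_in                    # no rotation: carry unchanged
    result = ((value >> amount) | (value << (32 - amount))) & 0xFFFFFFFF
    return result, (result >> 31) & 1             # carry_out = result<31>


def arm_expand_imm_c(imm12: int, carry_in: int) -> Tuple[int, int]:
    unrotated_value = imm12 & 0xFF                # ZeroExtend(imm12<7:0>, 32)
    rotation = (imm12 >> 8) & 0xF                 # imm12<11:8>
    return shift_c_ror(unrotated_value, 2 * rotation, carry_in)


def arm_expand_imm(imm12: int) -> int:
    # As in the pseudocode, the carry input does not affect imm32.
    imm32, _ = arm_expand_imm_c(imm12, 0)
    return imm32


# ANDS R1,R2,#3 (imm12 0x003) versus ANDS R1,R2,#12,#2 (imm12 0x10C):
print(arm_expand_imm_c(0x003, 1))   # carry_in passed through
print(arm_expand_imm_c(0x10C, 1))   # same value, carry forced to bit [31]
```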


A5.2.5 Multiply and multiply-accumulate
Bit fields: [31:28] cond | [27:24] 0000 | [23:20] op | [7:4] 1001

Table A5-7 shows the allocation of encodings in this space.
Table A5-7 Multiply and multiply-accumulate instructions
op | Instruction | See | Variant
000x | Multiply | MUL on page A8-212 | All
001x | Multiply Accumulate | MLA on page A8-190 | All
0100 | Unsigned Multiply Accumulate Accumulate Long | UMAAL on page A8-482 | v6
0101 | UNDEFINED | - | -
0110 | Multiply and Subtract | MLS on page A8-192 | v6T2
0111 | UNDEFINED | - | -
100x | Unsigned Multiply Long | UMULL on page A8-486 | All
101x | Unsigned Multiply Accumulate Long | UMLAL on page A8-484 | All
110x | Signed Multiply Long | SMULL on page A8-356 | All
111x | Signed Multiply Accumulate Long | SMLAL on page A8-334 | All
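A small Python sketch of Table A5-7 as a lookup on bits [23:20]. The names are hypothetical. For the rows written with a trailing x, it assumes that bit (op<0>) is the S (flag-setting) bit, as in the MUL, MLA and long multiply encodings, and appends an S suffix when it is set.

```python
# op<3:1> -> mnemonic for the rows of Table A5-7 that end in x.
PAIRED_OPS = {0b000: "MUL", 0b001: "MLA", 0b100: "UMULL",
              0b101: "UMLAL", 0b110: "SMULL", 0b111: "SMLAL"}
# Rows that specify all four bits of op.
EXACT_OPS = {0b0100: "UMAAL", 0b0101: "UNDEFINED",
             0b0110: "MLS", 0b0111: "UNDEFINED"}


def decode_multiply(word: int) -> str:
    """Decode bits [23:20] of a multiply or multiply-accumulate word."""
    op = (word >> 20) & 0xF
    if op in EXACT_OPS:
        return EXACT_OPS[op]
    suffix = "S" if op & 1 else ""       # op<0> assumed to be the S bit
    return PAIRED_OPS[op >> 1] + suffix


print(decode_multiply(0xE0100291))  # MULS R0, R1, R2
```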


A5.2.6 Saturating addition and subtraction
Bit fields: [31:28] cond | [27:23] 00010 | [22:21] op | [20] 0 | [7:4] 0101

Table A5-8 shows the allocation of encodings in this space. These encodings are all available in ARMv5TE
and above, and are UNDEFINED in earlier variants of the architecture.
Table A5-8 Saturating addition and subtraction instructions
op | Instruction | See
00 | Saturating Add | QADD on page A8-250
01 | Saturating Subtract | QSUB on page A8-264
10 | Saturating Double and Add | QDADD on page A8-258
11 | Saturating Double and Subtract | QDSUB on page A8-260

A5.2.7 Halfword multiply and multiply-accumulate
Bit fields: [31:28] cond | [27:23] 00010 | [22:21] op1 | [20] 0 | [7] 1 | [5] op | [4] 0

Table A5-9 shows the allocation of encodings in this space.
These encodings are signed multiply (SMUL) and signed multiply-accumulate (SMLA) instructions, operating
on 16-bit values, or mixed 16-bit and 32-bit values. The results and accumulators are 32-bit or 64-bit.
These encodings are all available in ARMv5TE and above, and are UNDEFINED in earlier variants of the
architecture.
Table A5-9 Halfword multiply and multiply-accumulate instructions
op1 | op | Instruction | See
00 | - | Signed 16-bit multiply, 32-bit accumulate | SMLABB, SMLABT, SMLATB, SMLATT on page A8-330
01 | 0 | Signed 16-bit x 32-bit multiply, 32-bit accumulate | SMLAWB, SMLAWT on page A8-340
01 | 1 | Signed 16-bit x 32-bit multiply, 32-bit result | SMULWB, SMULWT on page A8-358
10 | - | Signed 16-bit multiply, 64-bit accumulate | SMLALBB, SMLALBT, SMLALTB, SMLALTT on page A8-336
11 | - | Signed 16-bit multiply, 32-bit result | SMULBB, SMULBT, SMULTB, SMULTT on page A8-354


A5.2.8 Extra load/store instructions
Bit fields: [31:28] cond | [27:25] 000 | [24:20] op1 | [19:16] Rn | [7] 1 | [6:5] op2 | [4] 1

If op1 == 0b0xx1x or op2 == 0b00, see Data-processing and miscellaneous instructions on page A5-4.
Table A5-10 shows the allocation of encodings in this space.
Table A5-10 Extra load/store instructions
op2 | op1 | Rn | Instruction | See | Variant
01 | xx0x0 | - | Store Halfword | STRH (register) on page A8-412 | All
01 | xx0x1 | - | Load Halfword | LDRH (register) on page A8-156 | All
01 | xx1x0 | - | Store Halfword | STRH (immediate, ARM) on page A8-410 | All
01 | xx1x1 | not 1111 | Load Halfword | LDRH (immediate, ARM) on page A8-152 | All
01 | xx1x1 | 1111 | Load Halfword | LDRH (literal) on page A8-154 | All
10 | xx0x0 | - | Load Dual | LDRD (register) on page A8-140 | v5TE
10 | xx0x1 | - | Load Signed Byte | LDRSB (register) on page A8-164 | All
10 | xx1x0 | not 1111 | Load Dual | LDRD (immediate) on page A8-136 | v5TE
10 | xx1x0 | 1111 | Load Dual | LDRD (literal) on page A8-138 | v5TE
10 | xx1x1 | not 1111 | Load Signed Byte | LDRSB (immediate) on page A8-160 | All
10 | xx1x1 | 1111 | Load Signed Byte | LDRSB (literal) on page A8-162 | All
11 | xx0x0 | - | Store Dual | STRD (register) on page A8-398 | v5TE
11 | xx0x1 | - | Load Signed Halfword | LDRSH (register) on page A8-172 | All
11 | xx1x0 | - | Store Dual | STRD (immediate) on page A8-396 | v5TE
11 | xx1x1 | not 1111 | Load Signed Halfword | LDRSH (immediate) on page A8-168 | All
11 | xx1x1 | 1111 | Load Signed Halfword | LDRSH (literal) on page A8-170 | All


A5.2.9 Extra load/store instructions (unprivileged)
Bit fields: [31:28] cond | [27:24] 0000 | [21] 1 | [20] op | [15:12] Rt | [7] 1 | [6:5] op2 | [4] 1

If op2 == 0b00, see Data-processing and miscellaneous instructions on page A5-4.
Table A5-11 shows the allocation of encodings in this space. The instruction encodings are all available in
ARMv6T2 and above, and are UNDEFINED in earlier variants of the architecture.
Table A5-11 Extra load/store instructions (unprivileged)
op2 | op | Rt | Instruction | See
01 | 0 | - | Store Halfword Unprivileged | STRHT on page A8-414
01 | 1 | - | Load Halfword Unprivileged | LDRHT on page A8-158
1x | 0 | xxx0 | UNPREDICTABLE | -
1x | 0 | xxx1 | UNDEFINED | -
10 | 1 | - | Load Signed Byte Unprivileged | LDRSBT on page A8-166
11 | 1 | - | Load Signed Halfword Unprivileged | LDRSHT on page A8-174


A5.2.10 Synchronization primitives
Bit fields: [31:28] cond | [27:24] 0001 | [23:20] op | [7:4] 1001

Table A5-12 shows the allocation of encodings in this space.
Other encodings in this space are UNDEFINED.
Table A5-12 Synchronization primitives
op | Instruction | See | Variant
0x00 | Swap Word, Swap Byte | SWP, SWPB on page A8-432 (a) | All
1000 | Store Register Exclusive | STREX on page A8-400 | v6
1001 | Load Register Exclusive | LDREX on page A8-142 | v6
1010 | Store Register Exclusive Doubleword | STREXD on page A8-404 | v6K
1011 | Load Register Exclusive Doubleword | LDREXD on page A8-146 | v6K
1100 | Store Register Exclusive Byte | STREXB on page A8-402 | v6K
1101 | Load Register Exclusive Byte | LDREXB on page A8-144 | v6K
1110 | Store Register Exclusive Halfword | STREXH on page A8-406 | v6K
1111 | Load Register Exclusive Halfword | LDREXH on page A8-148 | v6K
a. Use of these instructions is deprecated.


A5.2.11 MSR (immediate), and hints
Bit fields: [31:28] cond | [27:23] 00110 | [22] op | [21:20] 10 | [19:16] op1 | [7:0] op2

Table A5-13 shows the allocation of encodings in this space.
Other encodings in this space are unallocated hints. They execute as NOPs, but software must not use them.
Table A5-13 MSR (immediate), and hints
op | op1 | op2 | Instruction | See | Variant
0 | 0000 | 00000000 | No Operation hint | NOP on page A8-222 | v6K, v6T2
0 | 0000 | 00000001 | Yield hint | YIELD on page A8-812 | v6K
0 | 0000 | 00000010 | Wait For Event hint | WFE on page A8-808 | v6K
0 | 0000 | 00000011 | Wait For Interrupt hint | WFI on page A8-810 | v6K
0 | 0000 | 00000100 | Send Event hint | SEV on page A8-316 | v6K
0 | 0000 | 1111xxxx | Debug hint | DBG on page A8-88 | v7
0 | 0100 | - | Move to Special Register, application level | MSR (immediate) on page A8-208 | All
0 | 1x00 | - | Move to Special Register, application level | MSR (immediate) on page A8-208 | All
0 | xx01 | - | Move to Special Register, system level | MSR (immediate) on page B6-12 | All
0 | xx1x | - | Move to Special Register, system level | MSR (immediate) on page B6-12 | All
1 | - | - | Move to Special Register, system level | MSR (immediate) on page B6-12 | All
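A Python sketch of Table A5-13, showing how op, op1 and op2 separate the hints from the two levels of MSR (immediate). The function and table names are hypothetical; the sketch assumes the word is already known to lie in this space (bits [27:23] == 0b00110 and bits [21:20] == 0b10) and reports other hint values as unallocated, in line with the text above.

```python
HINTS = {0x00: "NOP", 0x01: "YIELD", 0x02: "WFE", 0x03: "WFI", 0x04: "SEV"}


def decode_msr_imm_or_hint(word: int) -> str:
    """Classify a word in the MSR (immediate), and hints space."""
    op = (word >> 22) & 0x1        # bit [22]
    op1 = (word >> 16) & 0xF       # bits [19:16]
    op2 = word & 0xFF              # bits [7:0]
    if op == 0 and op1 == 0b0000:
        if op2 in HINTS:
            return HINTS[op2]
        if op2 >> 4 == 0b1111:     # 1111xxxx: DBG, option in op2<3:0>
            return f"DBG #{op2 & 0xF}"
        return "unallocated hint (executes as NOP; software must not use it)"
    if op == 0 and op1 in (0b0100, 0b1000, 0b1100):   # 0100 or 1x00
        return "MSR (immediate), application level"
    return "MSR (immediate), system level"             # xx01, xx1x, or op == 1


print(decode_msr_imm_or_hint(0xE320F003))  # WFI
```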


A5.2.12 Miscellaneous instructions
Bit fields: [31:28] cond | [27:23] 00010 | [22:21] op | [20] 0 | [19:16] op1 | [7] 0 | [6:4] op2

Table A5-14 shows the allocation of encodings in this space.
Other encodings in this space are UNDEFINED.
Table A5-14 Miscellaneous instructions
op2 | op | op1 | Instruction or instruction class | See | Variant
000 | x0 | xxxx | Move Special Register to Register | MRS on page A8-206; MRS on page B6-10 | All
000 | 01 | xx00 | Move to Special Register, application level | MSR (register) on page A8-210 | All
000 | 01 | xx01, xx1x | Move to Special Register, system level | MSR (register) on page B6-14 | All
000 | 11 | - | Move to Special Register, system level | MSR (register) on page B6-14 | All
001 | 01 | - | Branch and Exchange | BX on page A8-62 | v4T
001 | 11 | - | Count Leading Zeros | CLZ on page A8-72 | v5T
010 | 01 | - | Branch and Exchange Jazelle | BXJ on page A8-64 | v5TEJ
011 | 01 | - | Branch with Link and Exchange | BLX (register) on page A8-60 | v5T
101 | - | - | Saturating addition and subtraction | Saturating addition and subtraction on page A5-13 | -
111 | 01 | - | Breakpoint | BKPT on page A8-56 | v5T
111 | 11 | - | Secure Monitor Call | SMC (previously SMI) on page B6-18 | Security Extensions


A5.3 Load/store word and unsigned byte
Bit fields: [31:28] cond | [27:26] 01 | [25] A | [24:20] op1 | [19:16] Rn | [4] B

These instructions have either A == 0 or B == 0. For instructions with A == 1 and B == 1, see Media
instructions on page A5-21.
Table A5-15 shows the allocation of encodings in this space. These encodings are in all architecture
variants.
Table A5-15 Single data transfer instructions
A | op1 | B | Rn | Instruction | See
0 | xx0x0 not 0x010 | - | - | Store Register | STR (immediate, ARM) on page A8-384
1 | xx0x0 not 0x010 | 0 | - | Store Register | STR (register) on page A8-386
0 | 0x010 | - | - | Store Register Unprivileged | STRT on page A8-416
1 | 0x010 | 0 | - | Store Register Unprivileged | STRT on page A8-416
0 | xx0x1 not 0x011 | - | not 1111 | Load Register (immediate) | LDR (immediate, ARM) on page A8-120
0 | xx0x1 not 0x011 | - | 1111 | Load Register (literal) | LDR (literal) on page A8-122
1 | xx0x1 not 0x011 | 0 | - | Load Register | LDR (register) on page A8-124
0 | 0x011 | - | - | Load Register Unprivileged | LDRT on page A8-176
1 | 0x011 | 0 | - | Load Register Unprivileged | LDRT on page A8-176
0 | xx1x0 not 0x110 | - | - | Store Register Byte (immediate) | STRB (immediate, ARM) on page A8-390
1 | xx1x0 not 0x110 | 0 | - | Store Register Byte (register) | STRB (register) on page A8-392
0 | 0x110 | - | - | Store Register Byte Unprivileged | STRBT on page A8-394
1 | 0x110 | 0 | - | Store Register Byte Unprivileged | STRBT on page A8-394
0 | xx1x1 not 0x111 | - | not 1111 | Load Register Byte (immediate) | LDRB (immediate, ARM) on page A8-128
0 | xx1x1 not 0x111 | - | 1111 | Load Register Byte (literal) | LDRB (literal) on page A8-130
1 | xx1x1 not 0x111 | 0 | - | Load Register Byte (register) | LDRB (register) on page A8-132
0 | 0x111 | - | - | Load Register Byte Unprivileged | LDRBT on page A8-134
1 | 0x111 | 0 | - | Load Register Byte Unprivileged | LDRBT on page A8-134
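The op1 entries in Table A5-15 combine a pattern with an exclusion ("xx0x0 not 0x010"), which separates the unprivileged forms from the ordinary ones. The Python sketch below, using hypothetical names, transcribes the table and matches rows in order; it assumes bits [27:26] are 0b01 and cond is not 0b1111, and hands A == 1, B == 1 words off to the media space as the text above describes.

```python
def field_matches(pattern: str, value: int) -> bool:
    """Match a field against a decode-table entry: '-' (any value),
    'not P', or 'P not Q' (matches P but not Q). 'x' matches 0 or 1."""
    if pattern == "-":
        return True
    if pattern.startswith("not "):
        return not field_matches(pattern[4:], value)
    if " not " in pattern:
        keep, exclude = pattern.split(" not ")
        return field_matches(keep, value) and not field_matches(exclude, value)
    width = len(pattern)
    return all(ch == "x" or int(ch) == ((value >> (width - 1 - i)) & 1)
               for i, ch in enumerate(pattern))


# (A, op1, B, Rn, instruction), transcribed from Table A5-15.
TABLE_A5_15 = [
    ("0", "xx0x0 not 0x010", "-", "-", "STR (immediate)"),
    ("1", "xx0x0 not 0x010", "0", "-", "STR (register)"),
    ("0", "0x010", "-", "-", "STRT"), ("1", "0x010", "0", "-", "STRT"),
    ("0", "xx0x1 not 0x011", "-", "not 1111", "LDR (immediate)"),
    ("0", "xx0x1 not 0x011", "-", "1111", "LDR (literal)"),
    ("1", "xx0x1 not 0x011", "0", "-", "LDR (register)"),
    ("0", "0x011", "-", "-", "LDRT"), ("1", "0x011", "0", "-", "LDRT"),
    ("0", "xx1x0 not 0x110", "-", "-", "STRB (immediate)"),
    ("1", "xx1x0 not 0x110", "0", "-", "STRB (register)"),
    ("0", "0x110", "-", "-", "STRBT"), ("1", "0x110", "0", "-", "STRBT"),
    ("0", "xx1x1 not 0x111", "-", "not 1111", "LDRB (immediate)"),
    ("0", "xx1x1 not 0x111", "-", "1111", "LDRB (literal)"),
    ("1", "xx1x1 not 0x111", "0", "-", "LDRB (register)"),
    ("0", "0x111", "-", "-", "LDRBT"), ("1", "0x111", "0", "-", "LDRBT"),
]


def decode_load_store(word: int) -> str:
    a = (word >> 25) & 1           # bit [25]
    op1 = (word >> 20) & 0x1F      # bits [24:20]
    rn = (word >> 16) & 0xF        # bits [19:16]
    b = (word >> 4) & 1            # bit [4]
    if a == 1 and b == 1:
        return "media instruction (see A5.4)"
    for row_a, row_op1, row_b, row_rn, name in TABLE_A5_15:
        if all(field_matches(p, v) for p, v in
               ((row_a, a), (row_op1, op1), (row_b, b), (row_rn, rn))):
            return name
    raise ValueError("no matching row")


print(decode_load_store(0xE59F0008))  # LDR R0, [PC, #8]
```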


A5.4 Media instructions
Bit fields: [31:28] cond | [27:25] 011 | [24:20] op1 | [15:12] Rd | [7:5] op2 | [4] 1 | [3:0] Rn

Table A5-16 shows the allocation of encodings in this space.
Other encodings in this space are UNDEFINED.
Table A5-16 Media instructions
op1 | op2 | Rd | Rn | Instructions | See | Variant
000xx | - | - | - | - | Parallel addition and subtraction, signed on page A5-22 | -
001xx | - | - | - | - | Parallel addition and subtraction, unsigned on page A5-23 | -
01xxx | - | - | - | - | Packing, unpacking, saturation, and reversal on page A5-24 | -
10xxx | - | - | - | - | Signed multiplies on page A5-26 | -
11000 | 000 | 1111 | - | Unsigned Sum of Absolute Differences | USAD8 on page A8-500 | v6
11000 | 000 | not 1111 | - | Unsigned Sum of Absolute Differences and Accumulate | USADA8 on page A8-502 | v6
1101x | x10 | - | - | Signed Bit Field Extract | SBFX on page A8-308 | v6T2
1110x | x00 | - | 1111 | Bit Field Clear | BFC on page A8-46 | v6T2
1110x | x00 | - | not 1111 | Bit Field Insert | BFI on page A8-48 | v6T2
1111x | x10 | - | - | Unsigned Bit Field Extract | UBFX on page A8-466 | v6T2
11111 | 111 | - | - | Permanently UNDEFINED. This space will not be allocated in future. | - | -


A5.4.1 Parallel addition and subtraction, signed
Bit fields: [31:28] cond | [27:22] 011000 | [21:20] op1 | [7:5] op2 | [4] 1

Table A5-17 shows the allocation of encodings in this space. These encodings are all available in ARMv6
and above, and are UNDEFINED in earlier variants of the architecture.
Other encodings in this space are UNDEFINED.
Table A5-17 Signed parallel addition and subtraction instructions
op1 | op2 | Instruction | See
01 | 000 | Add 16-bit | SADD16 on page A8-296
01 | 001 | Add and Subtract with Exchange | SASX on page A8-300
01 | 010 | Subtract and Add with Exchange | SSAX on page A8-366
01 | 011 | Subtract 16-bit | SSUB16 on page A8-368
01 | 100 | Add 8-bit | SADD8 on page A8-298
01 | 111 | Subtract 8-bit | SSUB8 on page A8-370
Saturating instructions
10 | 000 | Saturating Add 16-bit | QADD16 on page A8-252
10 | 001 | Saturating Add and Subtract with Exchange | QASX on page A8-256
10 | 010 | Saturating Subtract and Add with Exchange | QSAX on page A8-262
10 | 011 | Saturating Subtract 16-bit | QSUB16 on page A8-266
10 | 100 | Saturating Add 8-bit | QADD8 on page A8-254
10 | 111 | Saturating Subtract 8-bit | QSUB8 on page A8-268
Halving instructions
11 | 000 | Halving Add 16-bit | SHADD16 on page A8-318
11 | 001 | Halving Add and Subtract with Exchange | SHASX on page A8-322
11 | 010 | Halving Subtract and Add with Exchange | SHSAX on page A8-324
11 | 011 | Halving Subtract 16-bit | SHSUB16 on page A8-326
11 | 100 | Halving Add 8-bit | SHADD8 on page A8-320
11 | 111 | Halving Subtract 8-bit | SHSUB8 on page A8-328


A5.4.2 Parallel addition and subtraction, unsigned
Bit fields: [31:28] cond | [27:22] 011001 | [21:20] op1 | [7:5] op2 | [4] 1

Table A5-18 shows the allocation of encodings in this space. These encodings are all available in ARMv6
and above, and are UNDEFINED in earlier variants of the architecture.
Other encodings in this space are UNDEFINED.
Table A5-18 Unsigned parallel addition and subtraction instructions
op1 | op2 | Instruction | See
01 | 000 | Add 16-bit | UADD16 on page A8-460
01 | 001 | Add and Subtract with Exchange | UASX on page A8-464
01 | 010 | Subtract and Add with Exchange | USAX on page A8-508
01 | 011 | Subtract 16-bit | USUB16 on page A8-510
01 | 100 | Add 8-bit | UADD8 on page A8-462
01 | 111 | Subtract 8-bit | USUB8 on page A8-512
Saturating instructions
10 | 000 | Saturating Add 16-bit | UQADD16 on page A8-488
10 | 001 | Saturating Add and Subtract with Exchange | UQASX on page A8-492
10 | 010 | Saturating Subtract and Add with Exchange | UQSAX on page A8-494
10 | 011 | Saturating Subtract 16-bit | UQSUB16 on page A8-496
10 | 100 | Saturating Add 8-bit | UQADD8 on page A8-490
10 | 111 | Saturating Subtract 8-bit | UQSUB8 on page A8-498
Halving instructions
11 | 000 | Halving Add 16-bit | UHADD16 on page A8-470
11 | 001 | Halving Add and Subtract with Exchange | UHASX on page A8-474
11 | 010 | Halving Subtract and Add with Exchange | UHSAX on page A8-476
11 | 011 | Halving Subtract 16-bit | UHSUB16 on page A8-478
11 | 100 | Halving Add 8-bit | UHADD8 on page A8-472
11 | 111 | Halving Subtract 8-bit | UHSUB8 on page A8-480


A5.4.3 Packing, unpacking, saturation, and reversal
Bit fields: [31:28] cond | [27:23] 01101 | [22:20] op1 | [19:16] A | [7:5] op2 | [4] 1

Table A5-19 shows the allocation of encodings in this space.
Other encodings in this space are UNDEFINED.
Table A5-19 Packing, unpacking, saturation, and reversal instructions
op1 | op2 | A | Instructions | See | Variant
000 | xx0 | - | Pack Halfword | PKH on page A8-234 | v6
01x | xx0 | - | Signed Saturate | SSAT on page A8-362 | v6
11x | xx0 | - | Unsigned Saturate | USAT on page A8-504 | v6
000 | 011 | not 1111 | Signed Extend and Add Byte 16 | SXTAB16 on page A8-436 | v6
000 | 011 | 1111 | Signed Extend Byte 16 | SXTB16 on page A8-442 | v6
000 | 101 | - | Select Bytes | SEL on page A8-312 | v6
010 | 001 | - | Signed Saturate 16 | SSAT16 on page A8-364 | v6
010 | 011 | not 1111 | Signed Extend and Add Byte | SXTAB on page A8-434 | v6
010 | 011 | 1111 | Signed Extend Byte | SXTB on page A8-440 | v6
011 | 001 | - | Byte-Reverse Word | REV on page A8-272 | v6
011 | 011 | not 1111 | Signed Extend and Add Halfword | SXTAH on page A8-438 | v6
011 | 011 | 1111 | Signed Extend Halfword | SXTH on page A8-444 | v6
011 | 101 | - | Byte-Reverse Packed Halfword | REV16 on page A8-274 | v6
100 | 011 | not 1111 | Unsigned Extend and Add Byte 16 | UXTAB16 on page A8-516 | v6
100 | 011 | 1111 | Unsigned Extend Byte 16 | UXTB16 on page A8-522 | v6
110 | 001 | - | Unsigned Saturate 16 | USAT16 on page A8-506 | v6
110 | 011 | not 1111 | Unsigned Extend and Add Byte | UXTAB on page A8-514 | v6
110 | 011 | 1111 | Unsigned Extend Byte | UXTB on page A8-520 | v6
111 | 001 | - | Reverse Bits | RBIT on page A8-270 | v6T2
111 | 011 | not 1111 | Unsigned Extend and Add Halfword | UXTAH on page A8-518 | v6
111 | 011 | 1111 | Unsigned Extend Halfword | UXTH on page A8-524 | v6
111 | 101 | - | Byte-Reverse Signed Halfword | REVSH on page A8-276 | v6


A5.4.4 Signed multiplies
Bit fields: [31:28] cond | [27:23] 01110 | [22:20] op1 | [15:12] A | [7:5] op2 | [4] 1

Table A5-20 shows the allocation of encodings in this space. These encodings are all available in ARMv6 and above, and are UNDEFINED in earlier variants of the architecture.
Other encodings in this space are UNDEFINED.
Table A5-20 Signed multiply instructions
op1   op2   A          Instruction                                        See
000   00x   not 1111   Signed Multiply Accumulate Dual                    SMLAD on page A8-332
000   00x   1111       Signed Dual Multiply Add                           SMUAD on page A8-352
000   01x   not 1111   Signed Multiply Subtract Dual                      SMLSD on page A8-342
000   01x   1111       Signed Dual Multiply Subtract                      SMUSD on page A8-360
100   00x   -          Signed Multiply Accumulate Long Dual               SMLALD on page A8-338
100   01x   -          Signed Multiply Subtract Long Dual                 SMLSLD on page A8-344
101   00x   not 1111   Signed Most Significant Word Multiply Accumulate   SMMLA on page A8-346
101   00x   1111       Signed Most Significant Word Multiply              SMMUL on page A8-350
101   11x   -          Signed Most Significant Word Multiply Subtract     SMMLS on page A8-348
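As an illustration of how Table A5-20 is read, the following Python sketch extracts op1 (bits [22:20]), A (bits [15:12]) and op2 (bits [7:5]) from a 32-bit ARM instruction word and applies the table. It assumes the caller has already matched bits [27:23] == 0b01110 and bit [4] == 1; the function name and the use of None for UNDEFINED encodings are illustrative choices, not anything defined by the architecture.

```python
from typing import Optional


def decode_signed_multiply(instr: int) -> Optional[str]:
    """Map an ARM instruction in the signed multiply space to a mnemonic (Table A5-20).

    Assumes bits [27:23] == 0b01110 and bit [4] == 1 were checked by the caller.
    Returns None for encodings the table leaves UNDEFINED.
    """
    op1 = (instr >> 20) & 0b111   # bits [22:20]
    a = (instr >> 12) & 0xF       # bits [15:12], the A field
    op2 = (instr >> 5) & 0b111    # bits [7:5]
    op2_hi = op2 >> 1             # top two bits of op2 (the table uses 00x, 01x, 11x)
    a_is_1111 = a == 0b1111       # A == 1111 selects the non-accumulating form

    if op1 == 0b000:
        if op2_hi == 0b00:
            return "SMUAD" if a_is_1111 else "SMLAD"
        if op2_hi == 0b01:
            return "SMUSD" if a_is_1111 else "SMLSD"
    elif op1 == 0b100:
        if op2_hi == 0b00:
            return "SMLALD"
        if op2_hi == 0b01:
            return "SMLSLD"
    elif op1 == 0b101:
        if op2_hi == 0b00:
            return "SMMUL" if a_is_1111 else "SMMLA"
        if op2_hi == 0b11:
            return "SMMLS"
    return None  # other encodings in this space are UNDEFINED
```

For example, 0xE700F010 has A == 0b1111 and so decodes as SMUAD rather than SMLAD.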

A5.5

Branch, branch with link, and block data transfer
Bits:   31:28  27:26  25:20  15
Field:  cond   1 0    op     R

Table A5-21 shows the allocation of encodings in this space. These encodings are in all architecture
variants.
Table A5-21 Branch, branch with link, and block data transfer instructions

op       R   Instructions                       See
0000x0   -   Store Multiple Decrement After     STMDA / STMED on page A8-376
0000x1   -   Load Multiple Decrement After      LDMDA / LDMFA on page A8-112
0010x0   -   Store Multiple (Increment After)   STM / STMIA / STMEA on page A8-374
0010x1   -   Load Multiple (Increment After)    LDM / LDMIA / LDMFD on page A8-110
0100x0   -   Store Multiple Decrement Before    STMDB / STMFD on page A8-378
0100x1   -   Load Multiple Decrement Before     LDMDB / LDMEA on page A8-114
0110x0   -   Store Multiple Increment Before    STMIB / STMFA on page A8-380
0110x1   -   Load Multiple Increment Before     LDMIB / LDMED on page A8-116
0xx1x0   -   Store Multiple (user registers)    STM (user registers) on page B6-22
0xx1x1   0   Load Multiple (user registers)     LDM (user registers) on page B6-7
0xx1x1   1   Load Multiple (exception return)   LDM (exception return) on page B6-5
10xxxx   -   Branch                             B on page A8-44
11xxxx   -   Branch with Link                   BL, BLX (immediate) on page A8-58
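A similar sketch for Table A5-21 follows. It reads op from bits [25:20] and R from bit [15], and assumes that bits [27:26] == 0b10 and cond != 0b1111 have already been checked. The function name and the returned strings are hypothetical; the addressing-mode suffix is taken from op<4:3>, which is the only difference between the DA, IA, DB and IB rows of the table.

```python
def decode_branch_block_transfer(instr: int) -> str:
    """Classify an ARM instruction with bits [27:26] == 0b10 using Table A5-21.

    Assumes the caller has already matched bits [27:26] and that cond != 0b1111.
    """
    op = (instr >> 20) & 0x3F     # bits [25:20]
    r = (instr >> 15) & 1         # bit [15]: for LDM, set when the register list includes the PC
    if op & 0b100000:             # op == 1xxxxx: branch rows
        return "BL" if op & 0b010000 else "B"
    s_bit = (op >> 2) & 1         # op<2> selects the user registers / exception return rows
    is_load = op & 1              # op<0>: 1 = load, 0 = store
    if s_bit:
        if not is_load:
            return "STM (user registers)"
        return "LDM (exception return)" if r else "LDM (user registers)"
    # op<4:3> selects the addressing mode: 00 = DA, 01 = IA, 10 = DB, 11 = IB
    mode = ("DA", "IA", "DB", "IB")[(op >> 3) & 0b11]
    return ("LDM" if is_load else "STM") + mode
```

With this sketch, 0xE92D4010 (the STMDB SP! form of PUSH {r4, lr}) decodes as STMDB, and 0xE8FD8000, which has op<2> and the PC bit of the register list set, decodes as LDM (exception return).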

A5.6

Supervisor Call, and coprocessor instructions
Bits:   31:28  27:26  25:20  19:16  11:8    4
Field:  cond   1 1    op1    Rn     coproc  op

Table A5-22 shows the allocation of encodings in this space.
Table A5-22 Supervisor Call, and coprocessor instructions
op1        op   coproc     Rn         Instructions                                      See                                    Variant
0xxxxx a   -    101x       -          Extension register load/store instructions on page A7-26                                   Advanced SIMD, VFP
0xxxx0 a   -    not 101x   -          Store Coprocessor                                 STC, STC2 on page A8-372               All
0xxxx1 a   -    not 101x   not 1111   Load Coprocessor                                  LDC, LDC2 (immediate) on page A8-106   All
0xxxx1 a   -    not 101x   1111       Load Coprocessor                                  LDC, LDC2 (literal) on page A8-108     All
00000x     -    -          -          UNDEFINED                                         -                                      -
00010x     -    101x       -          64-bit transfers between ARM core and extension registers on page A7-32                    Advanced SIMD, VFP
000100     -    not 101x   -          Move to Coprocessor from two ARM core registers   MCRR, MCRR2 on page A8-188             v5TE
000101     -    not 101x   -          Move to two ARM core registers from Coprocessor   MRRC, MRRC2 on page A8-204             v5TE
10xxxx     0    101x       -          VFP data-processing instructions on page A7-24                                             -
10xxxx     0    not 101x   -          Coprocessor data operations                       CDP, CDP2 on page A8-68                All
10xxxx     1    101x       -          8, 16, and 32-bit transfer between ARM core and extension registers on page A7-31          Advanced SIMD, VFP
10xxx0     1    not 101x   -          Move to Coprocessor from ARM core register        MCR, MCR2 on page A8-186               All
10xxx1     1    not 101x   -          Move to ARM core register from Coprocessor        MRC, MRC2 on page A8-202               All
11xxxx     -    -          -          Supervisor Call                                   SVC (previously SWI) on page A8-430    All

a. But not 000x0x

For more information about specific coprocessors see Coprocessor support on page A2-68.


A5.7

Unconditional instructions
Bits:   31:28    27:20  19:16  4
Field:  1 1 1 1  op1    Rn     op

Table A5-23 shows the allocation of encodings in this space.
Other encodings in this space are UNDEFINED in ARMv5 and above.
All encodings in this space are UNPREDICTABLE in ARMv4 and ARMv4T.
Table A5-23 Unconditional instructions
op1        op   Rn         Instruction                                       See                                    Variant
0xxxxxxx   -    -          Miscellaneous instructions, memory hints, and Advanced SIMD instructions on page A5-31   -
100xx1x0   -    -          Store Return State                                SRS on page B6-20                      v6
100xx0x1   -    -          Return From Exception                             RFE on page B6-16                      v6
101xxxxx   -    -          Branch with Link and Exchange                     BL, BLX (immediate) on page A8-58      v5
11000x11   -    not 1111   Load Coprocessor (immediate)                      LDC, LDC2 (immediate) on page A8-106   v5
11001xx1
1101xxx1
11000x11   -    1111       Load Coprocessor (literal)                        LDC, LDC2 (literal) on page A8-108     v5
11001xx1
1101xxx1
11000x10   -    -          Store Coprocessor                                 STC, STC2 on page A8-372               v5
11001xx0
1101xxx0
11000100   -    -          Move to Coprocessor from two ARM core registers   MCRR, MCRR2 on page A8-188             v6
11000101   -    -          Move to two ARM core registers from Coprocessor   MRRC, MRRC2 on page A8-204             v6
1110xxxx   0    -          Coprocessor data operations                       CDP, CDP2 on page A8-68                v5
1110xxx0   1    -          Move to Coprocessor from ARM core register        MCR, MCR2 on page A8-186               v5
1110xxx1   1    -          Move to ARM core register from Coprocessor        MRC, MRC2 on page A8-202               v5

A5.7.1

Miscellaneous instructions, memory hints, and Advanced SIMD instructions
Bits:   31:27      26:20  19:16  7:4
Field:  1 1 1 1 0  op1    Rn     op2

Table A5-24 shows the allocation of encodings in this space.
Other encodings in this space are UNDEFINED in ARMv5 and above. All these encodings are
UNPREDICTABLE in ARMv4 and ARMv4T.
Table A5-24 Hints, and Advanced SIMD instructions
op1       op2                     Rn         Instruction                               See                                       Variant
0010000   xx0x                    xxx0       Change Processor State                    CPS on page B6-3                          v6
0010000   0000                    xxx1       Set Endianness                            SETEND on page A8-314                     v6
01xxxxx   -                       -          See Advanced SIMD data-processing instructions on page A7-10                        v7
100xxx0   -                       -          See Advanced SIMD element or structure load/store instructions on page A7-27        v7
100x001   -                       -          Unallocated memory hint (treat as NOP)    -                                         MP Extensions a
100x101   -                       -          Preload Instruction                       PLI (immediate, literal) on page A8-242   v7
101x001   -                       not 1111   Preload Data with intent to Write         PLD, PLDW (immediate) on page A8-236      MP Extensions a
101x001   -                       1111       UNPREDICTABLE                             -                                         -
101x101   -                       not 1111   Preload Data                              PLD, PLDW (immediate) on page A8-236      v5TE
101x101   -                       1111       Preload Data                              PLD (literal) on page A8-238              v5TE
1010111   0001                    -          Clear-Exclusive                           CLREX on page A8-70                       v6K
1010111   0100                    -          Data Synchronization Barrier              DSB on page A8-92                         v6T2
1010111   0101                    -          Data Memory Barrier                       DMB on page A8-90                         v7
1010111   0110                    -          Instruction Synchronization Barrier       ISB on page A8-102                        v6T2
1010111   except as shown above   -          UNPREDICTABLE                             -                                         -
10xxx11   -                       -          UNPREDICTABLE                             -                                         -
110x001   xxx0                    -          Unallocated memory hint (treat as NOP)    -                                         MP Extensions a
110x101   xxx0                    -          Preload Instruction                       PLI (register) on page A8-244             v7
111x001   xxx0                    -          Preload Data with intent to Write         PLD, PLDW (register) on page A8-240       MP Extensions a
111x101   xxx0                    -          Preload Data                              PLD, PLDW (register) on page A8-240       v5TE
11xxx11   xxx0                    -          UNPREDICTABLE                             -                                         -

a. Multiprocessing Extensions.
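The Clear-Exclusive and barrier rows of Table A5-24 all share op1 == 0b1010111 and differ only in op2. A minimal Python sketch of that part of the decode follows; it assumes a 32-bit instruction word, deliberately covers only this row group, and the function name is hypothetical.

```python
def decode_barrier_group(instr: int) -> str:
    """Decode the op1 == 0b1010111 rows of Table A5-24."""
    assert ((instr >> 27) & 0x1F) == 0b11110, "bits [31:27] must be 0b11110"
    op1 = (instr >> 20) & 0x7F    # bits [26:20]
    assert op1 == 0b1010111, "only the CLREX/DSB/DMB/ISB row group is handled"
    op2 = (instr >> 4) & 0xF      # bits [7:4]
    names = {0b0001: "CLREX", 0b0100: "DSB", 0b0101: "DMB", 0b0110: "ISB"}
    # Any other op2 value falls into the "except as shown above" row.
    return names.get(op2, "UNPREDICTABLE")
```

For example, 0xF57FF05F (op2 == 0b0101) decodes as DMB, while an op2 value not listed in the table, such as 0b0111 in 0xF57FF07F, falls into the UNPREDICTABLE row.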


Chapter A6
Thumb Instruction Set Encoding

This chapter introduces the Thumb instruction set and describes how it uses the ARM programmers’ model.
It contains the following sections:
• Thumb instruction set encoding on page A6-2
• 16-bit Thumb instruction encoding on page A6-6
• 32-bit Thumb instruction encoding on page A6-14.
For details of the differences between the Thumb and ThumbEE instruction sets see Chapter A9 ThumbEE.

Note
• Architecture variant information in this chapter describes the architecture variant or extension in
which the instruction encoding was introduced into the Thumb instruction set.
• In the decode tables in this chapter, an entry of - for a field value means the value of the field does
not affect the decoding.


A6.1

Thumb instruction set encoding
The Thumb instruction stream is a sequence of halfword-aligned halfwords. Each Thumb instruction is
either a single 16-bit halfword in that stream, or a 32-bit instruction consisting of two consecutive halfwords
in that stream.
If bits [15:11] of the halfword being decoded take any of the following values, the halfword is the first
halfword of a 32-bit instruction:
• 0b11101
• 0b11110
• 0b11111.
Otherwise, the halfword is a 16-bit instruction.
For details of the encoding of 16-bit Thumb instructions see 16-bit Thumb instruction encoding on
page A6-6.
For details of the encoding of 32-bit Thumb instructions see 32-bit Thumb instruction encoding on
page A6-14.
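The length rule above is simple enough to express directly. This Python sketch assumes the halfwords are already available as integers in instruction-stream order (byte order within memory is out of scope), and the function names are hypothetical.

```python
from typing import List, Tuple


def thumb_instruction_length(first_halfword: int) -> int:
    """Return 4 if this halfword starts a 32-bit Thumb instruction, otherwise 2."""
    top5 = (first_halfword >> 11) & 0x1F   # bits [15:11]
    return 4 if top5 in (0b11101, 0b11110, 0b11111) else 2


def split_thumb_stream(halfwords: List[int]) -> List[Tuple[int, ...]]:
    """Group a sequence of halfwords into 16-bit and 32-bit instructions.

    Raises IndexError if the stream ends halfway through a 32-bit instruction.
    """
    out: List[Tuple[int, ...]] = []
    i = 0
    while i < len(halfwords):
        if thumb_instruction_length(halfwords[i]) == 4:
            out.append((halfwords[i], halfwords[i + 1]))   # first and second halfword
            i += 2
        else:
            out.append((halfwords[i],))
            i += 1
    return out
```

For example, [0x4770, 0xF000, 0xF800, 0xBF00] splits into a 16-bit instruction, a 32-bit instruction and another 16-bit instruction: 0xF000 has bits [15:11] == 0b11110, so it consumes 0xF800 as its second halfword.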

A6.1.1

UNDEFINED and UNPREDICTABLE instruction set space
An attempt to execute an unallocated instruction results in either:
• Unpredictable behavior. The instruction is described as UNPREDICTABLE.
• An Undefined Instruction exception. The instruction is described as UNDEFINED.
An instruction is UNDEFINED if it is declared as UNDEFINED in an instruction description, or in this chapter.
An instruction is UNPREDICTABLE if:
• a bit marked (0) or (1) in the encoding diagram of an instruction is not 0 or 1 respectively, and the
pseudocode for that encoding does not indicate that a different special case applies
• it is declared as UNPREDICTABLE in an instruction description or in this chapter.
Unless otherwise specified:
• Thumb instructions introduced in an architecture variant are either UNPREDICTABLE or UNDEFINED in
earlier architecture variants.
• A Thumb instruction that is provided by one or more of the architecture extensions is either
UNPREDICTABLE or UNDEFINED in an implementation that does not include any of those extensions.

In both cases, the instruction is UNPREDICTABLE if it is a 32-bit instruction in an architecture variant before
ARMv6T2, and UNDEFINED otherwise.


A6.1.2

Use of 0b1111 as a register specifier
The use of 0b1111 as a register specifier is not normally permitted in Thumb instructions. When a value of
0b1111 is permitted, a variety of meanings is possible. For register reads, these meanings are:
• Read the PC value, that is, the address of the current instruction + 4. The base register of the table
branch instructions TBB and TBH can be the PC. This enables branch tables to be placed in memory
immediately after the instruction.
Note
Use of the PC as the base register in the STC instruction is deprecated in ARMv7.
• Read the word-aligned PC value, that is, the address of the current instruction + 4, with bits [1:0]
forced to zero. The base register of LDC, LDR, LDRB, LDRD (pre-indexed, no writeback), LDRH, LDRSB, and
LDRSH instructions can be the word-aligned PC. This enables PC-relative data addressing. In addition,
some encodings of the ADD and SUB instructions permit their source registers to be 0b1111 for the same
purpose.
• Read zero. This is done in some cases when one instruction is a special case of another, more general
instruction, but with one operand zero. In these cases, the instructions are listed on separate pages,
with a special case in the pseudocode for the more general instruction cross-referencing the other
page.
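The two PC-read cases above differ only in the final alignment step. A minimal Python sketch, assuming 32-bit addresses and a Thumb instruction located at instr_addr; the helper names, including literal_address, are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def thumb_pc_read(instr_addr: int) -> int:
    """PC value read by a Thumb instruction: its own address + 4."""
    return (instr_addr + 4) & MASK32


def thumb_pc_read_aligned(instr_addr: int) -> int:
    """Word-aligned PC: address of the instruction + 4, with bits [1:0] forced to zero."""
    return thumb_pc_read(instr_addr) & ~0b11 & MASK32


def literal_address(instr_addr: int, imm: int, add: bool = True) -> int:
    """Hypothetical helper: address formed from the word-aligned PC plus or minus imm."""
    base = thumb_pc_read_aligned(instr_addr)
    return (base + imm if add else base - imm) & MASK32
```

Because of the alignment step, instructions at 0x8000 and 0x8002 both see a word-aligned PC of 0x8004, so literal_address(0x8002, 8) gives 0x800C.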

For register writes, these meanings are:
• The PC can be specified as the destination register of an LDR instruction. This is done by encoding Rt
as 0b1111. The loaded value is treated as an address, and the effect of execution is a branch to that
address. Bit [0] of the loaded value selects whether to execute ARM or Thumb instructions after the
branch.
Some other instructions write the PC in similar ways, either implicitly (for example branch
instructions) or by using a register mask rather than a register specifier (LDM). The address to branch
to can be:
— a loaded value, for example, RFE
— a register value, for example, BX
— the result of a calculation, for example, TBB or TBH.
The method of choosing the instruction set used after the branch can be:
— similar to the LDR case, for LDM or BX
— a fixed instruction set other than the one currently being used, for example, the immediate form
of BLX
— unchanged, for example branch instructions
— set from the (J,T) bits of the SPSR, for RFE and SUBS PC,LR,#imm8.
• Discard the result of a calculation. This is done in some cases when one instruction is a special case
of another, more general instruction, but with the result discarded. In these cases, the instructions are
listed on separate pages, with a special case in the pseudocode for the more general instruction
cross-referencing the other page.
• If the destination register specifier of an LDRB, LDRH, LDRSB, or LDRSH instruction is 0b1111, the
instruction is a memory hint instead of a load operation.
• If the destination register specifier of an MRC instruction is 0b1111, bits [31:28] of the value
transferred from the coprocessor are written to the N, Z, C, and V flags in the APSR, and bits [27:0]
are discarded.

A6.1.3

Use of 0b1101 as a register specifier
R13 is defined in the Thumb instruction set so that its use is primarily as a stack pointer, and R13 is normally
identified as SP in Thumb instructions. In 32-bit Thumb instructions, if you use R13 as a general-purpose
register beyond the architecturally defined constraints described in this section, the results are
UNPREDICTABLE.
The restrictions applicable to R13 are described in:
• R13[1:0] definition
• 32-bit Thumb instruction support for R13.
See also 16-bit Thumb instruction support for R13 on page A6-5.

R13[1:0] definition
Bits [1:0] of R13 are SBZP. Writing a nonzero value to bits [1:0] causes UNPREDICTABLE behavior.

32-bit Thumb instruction support for R13
R13 instruction support is restricted to the following:
• R13 as the source or destination register of a MOV instruction. Only register to register transfers without
shifts are supported, with no flag setting:
    MOV      SP,<Rm>
    MOV      <Rn>,SP
• Using the following instructions to adjust R13 up or down by a multiple of 4:
    ADD{W}   SP,SP,#<imm>
    SUB{W}   SP,SP,#<imm>
    ADD      SP,SP,<Rm>
    ADD      SP,SP,<Rm>,LSL #<n>    ; For <n> = 1,2,3
    SUB      SP,SP,<Rm>
    SUB      SP,SP,<Rm>,LSL #<n>    ; For <n> = 1,2,3
• R13 as a base register <Rn> of any load/store instruction. This supports SP-based addressing for load,
store, or memory hint instructions, with positive or negative offsets, with and without writeback.
• R13 as the first operand <Rn> in any ADD{S}, CMN, CMP, or SUB{S} instruction. The add and subtract
instructions support SP-based address generation, with the address going into a general-purpose
register. CMN and CMP are useful for stack checking in some circumstances.
• R13 as the transferred register <Rt> in any LDR or STR instruction.

16-bit Thumb instruction support for R13
For 16-bit data-processing instructions that affect high registers, R13 can only be used as described in 32-bit
Thumb instruction support for R13 on page A6-4. Any other use is deprecated. This affects the high register
forms of CMP and ADD, where the use of R13 as <Rm> is deprecated.


A6.2

16-bit Thumb instruction encoding
Bits:   15:10
Field:  Opcode
Table A6-1 shows the allocation of 16-bit instruction encodings.
Table A6-1 16-bit Thumb instruction encoding
Opcode   Instruction or instruction class                                          Variant
00xxxx   Shift (immediate), add, subtract, move, and compare on page A6-7          -
010000   Data-processing on page A6-8                                              -
010001   Special data instructions and branch and exchange on page A6-9            -
01001x   Load from Literal Pool, see LDR (literal) on page A8-122                  v4T
0101xx   Load/store single data item on page A6-10                                 -
011xxx
100xxx
10100x   Generate PC-relative address, see ADR on page A8-32                       v4T
10101x   Generate SP-relative address, see ADD (SP plus immediate) on page A8-28   v4T
1011xx   Miscellaneous 16-bit instructions on page A6-11                           -
11000x   Store multiple registers, see STM / STMIA / STMEA on page A8-374 a        v4T
11001x   Load multiple registers, see LDM / LDMIA / LDMFD on page A8-110 a         v4T
1101xx   Conditional branch, and Supervisor Call on page A6-13                     -
11100x   Unconditional Branch, see B on page A8-44                                 v4T

a. In ThumbEE, 16-bit load/store multiple instructions are not available. This encoding is used for special
ThumbEE instructions. For details see Chapter A9 ThumbEE.
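Table A6-1 can be applied mechanically to the top six bits of a halfword. The following Python sketch does this; the function name and the class labels are illustrative, and it only classifies the halfword without decoding the individual instruction.

```python
def thumb16_class(hw: int) -> str:
    """Return the Table A6-1 instruction class for a 16-bit Thumb halfword."""
    op = (hw >> 10) & 0x3F        # Opcode field, bits [15:10]
    if op >> 4 == 0b00:
        return "shift (immediate), add, subtract, move, and compare"
    if op == 0b010000:
        return "data-processing"
    if op == 0b010001:
        return "special data instructions and branch and exchange"
    if op >> 1 == 0b01001:
        return "LDR (literal)"
    if op >> 2 == 0b0101 or op >> 3 in (0b011, 0b100):
        return "load/store single data item"
    if op >> 1 == 0b10100:
        return "ADR"
    if op >> 1 == 0b10101:
        return "ADD (SP plus immediate)"
    if op >> 2 == 0b1011:
        return "miscellaneous 16-bit instructions"
    if op >> 1 == 0b11000:
        return "STM / STMIA / STMEA"
    if op >> 1 == 0b11001:
        return "LDM / LDMIA / LDMFD"
    if op >> 2 == 0b1101:
        return "conditional branch, and Supervisor Call"
    if op >> 1 == 0b11100:
        return "B (unconditional)"
    # Remaining opcodes (11101x, 1111xx) have bits [15:11] == 0b11101, 0b11110 or 0b11111.
    return "first halfword of a 32-bit instruction"
```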

A6.2.1

Shift (immediate), add, subtract, move, and compare

Bits:   15:14  13:9
Field:  0 0    Opcode

Table A6-2 shows the allocation of encodings in this space.
All these instructions are available since the Thumb instruction set was introduced in ARMv4T.
Table A6-2 16-bit Thumb shift (immediate), add, subtract, move, and compare instructions

Opcode   Instruction                See
000xx    Logical Shift Left         LSL (immediate) on page A8-178
001xx    Logical Shift Right        LSR (immediate) on page A8-182
010xx    Arithmetic Shift Right     ASR (immediate) on page A8-40
01100    Add register               ADD (register) on page A8-24
01101    Subtract register          SUB (register) on page A8-422
01110    Add 3-bit immediate        ADD (immediate, Thumb) on page A8-20
01111    Subtract 3-bit immediate   SUB (immediate, Thumb) on page A8-418
100xx    Move                       MOV (immediate) on page A8-194
101xx    Compare                    CMP (immediate) on page A8-80
110xx    Add 8-bit immediate        ADD (immediate, Thumb) on page A8-20
111xx    Subtract 8-bit immediate   SUB (immediate, Thumb) on page A8-418

A6.2.2

Data-processing

Bits:   15:10        9:6
Field:  0 1 0 0 0 0  Opcode

Table A6-3 shows the allocation of encodings in this space.
All these instructions are available since the Thumb instruction set was introduced in ARMv4T.
Table A6-3 16-bit Thumb data-processing instructions

Opcode   Instruction               See
0000     Bitwise AND               AND (register) on page A8-36
0001     Bitwise Exclusive OR      EOR (register) on page A8-96
0010     Logical Shift Left        LSL (register) on page A8-180
0011     Logical Shift Right       LSR (register) on page A8-184
0100     Arithmetic Shift Right    ASR (register) on page A8-42
0101     Add with Carry            ADC (register) on page A8-16
0110     Subtract with Carry       SBC (register) on page A8-304
0111     Rotate Right              ROR (register) on page A8-280
1000     Test                      TST (register) on page A8-456
1001     Reverse Subtract from 0   RSB (immediate) on page A8-284
1010     Compare Registers         CMP (register) on page A8-82
1011     Compare Negative          CMN (register) on page A8-76
1100     Bitwise OR                ORR (register) on page A8-230
1101     Multiply Two Registers    MUL on page A8-212
1110     Bitwise Bit Clear         BIC (register) on page A8-52
1111     Bitwise NOT               MVN (register) on page A8-216
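Because the Opcode field of Table A6-3 is a dense 4-bit index, the table maps directly onto a lookup. A Python sketch, assuming hw is a 16-bit halfword already known to lie in the data-processing space (the names are hypothetical):

```python
# Opcode (bits [9:6]) -> mnemonic, in Table A6-3 order
THUMB16_DP = ("AND", "EOR", "LSL", "LSR", "ASR", "ADC", "SBC", "ROR",
              "TST", "RSB", "CMP", "CMN", "ORR", "MUL", "BIC", "MVN")


def thumb16_data_processing(hw: int) -> str:
    """Mnemonic for a 16-bit Thumb data-processing instruction (bits [15:10] == 0b010000)."""
    assert ((hw >> 10) & 0x3F) == 0b010000, "not in the 16-bit data-processing space"
    return THUMB16_DP[(hw >> 6) & 0xF]
```

Opcode 0b1001 maps to RSB because this encoding is the reverse-subtract-from-0 form, so 0x4240 decodes as RSB.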

A6.2.3

Special data instructions and branch and exchange

Bits:   15:10        9:6
Field:  0 1 0 0 0 1  Opcode

Table A6-4 shows the allocation of encodings in this space.
Table A6-4 16-bit Thumb special data instructions and branch and exchange
Opcode   Instruction                     See                             Variant
0000     Add Low Registers               ADD (register) on page A8-24    v6T2 a
0001     Add High Registers              ADD (register) on page A8-24    v4T
001x
0100     UNPREDICTABLE                   -                               -
0101     Compare High Registers          CMP (register) on page A8-82    v4T
011x
1000     Move Low Registers              MOV (register) on page A8-196   v6 a
1001     Move High Registers             MOV (register) on page A8-196   v4T
101x
110x     Branch and Exchange             BX on page A8-62                v4T
111x     Branch with Link and Exchange   BLX (register) on page A8-60    v5T a

a. UNPREDICTABLE in earlier variants.

A6.2.4

Load/store single data item

Bits:   15:12  11:9
Field:  opA    opB

These instructions have one of the following values in opA:
• 0b0101
• 0b011x
• 0b100x.
Table A6-5 shows the allocation of encodings in this space.
All these instructions are available since the Thumb instruction set was introduced in ARMv4T.
Table A6-5 16-bit Thumb Load/store instructions
opA    opB   Instruction                     See
0101   000   Store Register                  STR (register) on page A8-386
0101   001   Store Register Halfword         STRH (register) on page A8-412
0101   010   Store Register Byte             STRB (register) on page A8-392
0101   011   Load Register Signed Byte       LDRSB (register) on page A8-164
0101   100   Load Register                   LDR (register) on page A8-124
0101   101   Load Register Halfword          LDRH (register) on page A8-156
0101   110   Load Register Byte              LDRB (register) on page A8-132
0101   111   Load Register Signed Halfword   LDRSH (register) on page A8-172
0110   0xx   Store Register                  STR (immediate, Thumb) on page A8-382
0110   1xx   Load Register                   LDR (immediate, Thumb) on page A8-118
0111   0xx   Store Register Byte             STRB (immediate, Thumb) on page A8-388
0111   1xx   Load Register Byte              LDRB (immediate, Thumb) on page A8-126
1000   0xx   Store Register Halfword         STRH (immediate, Thumb) on page A8-408
1000   1xx   Load Register Halfword          LDRH (immediate, Thumb) on page A8-150
1001   0xx   Store Register SP relative      STR (immediate, Thumb) on page A8-382
1001   1xx   Load Register SP relative       LDR (immediate, Thumb) on page A8-118
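The two-level decode in Table A6-5 can be sketched in Python as follows. It assumes hw is a 16-bit halfword whose opA is one of the three values listed above, and the function and table names are hypothetical.

```python
# opA == 0b0101: opB (bits [11:9]) selects one of eight register-offset forms
_REG_FORMS = ("STR", "STRH", "STRB", "LDRSB", "LDR", "LDRH", "LDRB", "LDRSH")
# Other opA values: (store form, load form); opB<2> chooses between them
_IMM_FORMS = {0b0110: ("STR", "LDR"), 0b0111: ("STRB", "LDRB"),
              0b1000: ("STRH", "LDRH"), 0b1001: ("STR", "LDR")}


def thumb16_load_store(hw: int) -> str:
    """Decode a 16-bit Thumb load/store single data item using Table A6-5."""
    op_a = (hw >> 12) & 0xF       # bits [15:12]
    op_b = (hw >> 9) & 0b111      # bits [11:9]
    if op_a == 0b0101:
        return _REG_FORMS[op_b] + " (register)"
    if op_a in _IMM_FORMS:
        store, load = _IMM_FORMS[op_a]
        name = load if op_b & 0b100 else store
        return name + (" (SP relative)" if op_a == 0b1001 else " (immediate)")
    raise ValueError("not a 16-bit load/store single data item encoding")
```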

A6.2.5

Miscellaneous 16-bit instructions

Bits:   15:12    11:5
Field:  1 0 1 1  Opcode

Table A6-6 shows the allocation of encodings in this space. Other encodings in this space are UNDEFINED.
Table A6-6 Miscellaneous 16-bit instructions
Opcode    Instruction                     See                                      Variant
0110010   Set Endianness                  SETEND on page A8-314                    v6
0110011   Change Processor State          CPS on page B6-3                         v6
00000xx   Add Immediate to SP             ADD (SP plus immediate) on page A8-28    v4T
00001xx   Subtract Immediate from SP      SUB (SP minus immediate) on page A8-426  v4T
0001xxx   Compare and Branch on Zero      CBNZ, CBZ on page A8-66                  v6T2
001000x   Signed Extend Halfword          SXTH on page A8-444                      v6
001001x   Signed Extend Byte              SXTB on page A8-440                      v6
001010x   Unsigned Extend Halfword        UXTH on page A8-524                      v6
001011x   Unsigned Extend Byte            UXTB on page A8-520                      v6
0011xxx   Compare and Branch on Zero      CBNZ, CBZ on page A8-66                  v6T2
010xxxx   Push Multiple Registers         PUSH on page A8-248                      v4T
1001xxx   Compare and Branch on Nonzero   CBNZ, CBZ on page A8-66                  v6T2
101000x   Byte-Reverse Word               REV on page A8-272                       v6
101001x   Byte-Reverse Packed Halfword    REV16 on page A8-274                     v6
101011x   Byte-Reverse Signed Halfword    REVSH on page A8-276                     v6
1011xxx   Compare and Branch on Nonzero   CBNZ, CBZ on page A8-66                  v6T2
110xxxx   Pop Multiple Registers          POP on page A8-246                       v4T
1110xxx   Breakpoint                      BKPT on page A8-56                       v5
1111xxx   If-Then, and hints              If-Then, and hints on page A6-12         -

If-Then, and hints

Bits:   15:8             7:4  3:0
Field:  1 0 1 1 1 1 1 1  opA  opB

Table A6-7 shows the allocation of encodings in this space.
Other encodings in this space are unallocated hints. They execute as NOPs, but software must not use them.
Table A6-7 If-Then and hint instructions

opA    opB        Instruction               See                    Variant
-      not 0000   If-Then                   IT on page A8-104      v6T2
0000   0000       No Operation hint         NOP on page A8-222     v6T2
0001   0000       Yield hint                YIELD on page A8-812   v7
0010   0000       Wait For Event hint       WFE on page A8-808     v7
0011   0000       Wait For Interrupt hint   WFI on page A8-810     v7
0100   0000       Send Event hint           SEV on page A8-316     v7
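A Python sketch of the Table A6-7 decode, assuming hw is a 16-bit halfword with bits [15:8] == 0b10111111 (the function name and returned strings are hypothetical):

```python
_HINTS = {0b0000: "NOP", 0b0001: "YIELD", 0b0010: "WFE", 0b0011: "WFI", 0b0100: "SEV"}


def thumb16_it_or_hint(hw: int) -> str:
    """Decode a halfword in the If-Then and hints space."""
    assert (hw >> 8) == 0b10111111, "not in the If-Then/hint space"
    op_a = (hw >> 4) & 0xF        # bits [7:4]
    op_b = hw & 0xF               # bits [3:0]
    if op_b != 0b0000:
        return "IT"
    # Other opA values are unallocated hints: they execute as NOPs, but software must not use them.
    return _HINTS.get(op_a, "unallocated hint")
```

Any non-zero opB selects IT regardless of opA; for example, 0xBF08 decodes as IT, while 0xBF50 is an unallocated hint.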

A6.2.6

Conditional branch, and Supervisor Call

Bits:   15:12    11:8
Field:  1 1 0 1  Opcode

Table A6-8 shows the allocation of encodings in this space.
All these instructions are available since the Thumb instruction set was introduced in ARMv4T.
Table A6-8 Conditional branch and Supervisor Call instructions

Opcode     Instruction                                                          See
not 111x   Conditional branch                                                   B on page A8-44
1110       Permanently UNDEFINED. This space will not be allocated in future.   -
1111       Supervisor Call                                                      SVC (previously SWI) on page A8-430

A6.3

32-bit Thumb instruction encoding
First halfword    Bits:   15:13  12:11  10:4
                  Field:  1 1 1  op1    op2
Second halfword   Bits:   15
                  Field:  op

If op1 == 0b00, a 16-bit instruction is encoded, see 16-bit Thumb instruction encoding on page A6-6.
Table A6-9 shows the allocation of encodings in this space.
Table A6-9 32-bit Thumb instruction encoding
op1   op2       op   Instruction class, see
01    00xx0xx   -    Load/store multiple on page A6-23
01    00xx1xx   -    Load/store dual, load/store exclusive, table branch on page A6-24
01    01xxxxx   -    Data-processing (shifted register) on page A6-31
01    1xxxxxx   -    Coprocessor instructions on page A6-40
10    x0xxxxx   0    Data-processing (modified immediate) on page A6-15
10    x1xxxxx   0    Data-processing (plain binary immediate) on page A6-19
10    -         1    Branches and miscellaneous control on page A6-20
11    000xxx0   -    Store single data item on page A6-30
11    001xxx0   -    Advanced SIMD element or structure load/store instructions on page A7-27
11    00xx001   -    Load byte, memory hints on page A6-28
11    00xx011   -    Load halfword, memory hints on page A6-26
11    00xx101   -    Load word on page A6-25
11    00xx111   -    UNDEFINED
11    010xxxx   -    Data-processing (register) on page A6-33
11    0110xxx   -    Multiply, multiply accumulate, and absolute difference on page A6-38
11    0111xxx   -    Long multiply, long multiply accumulate, and divide on page A6-39
11    1xxxxxx   -    Coprocessor instructions on page A6-40
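The class selection in Table A6-9 depends on fields from both halfwords. The following Python sketch applies the table by matching op2 against each row pattern written in the table's own 0/1/x notation. It assumes hw1 and hw2 are the first and second halfwords as 16-bit integers; the function names and labels are illustrative.

```python
def bits_match(value: int, pattern: str) -> bool:
    """True if value matches a pattern such as '00xx1xx' (MSB first, 'x' = don't care)."""
    width = len(pattern)
    for i, ch in enumerate(pattern):
        bit = (value >> (width - 1 - i)) & 1
        if ch != "x" and int(ch) != bit:
            return False
    return True


def thumb32_class(hw1: int, hw2: int) -> str:
    """Return the Table A6-9 instruction class for a 32-bit Thumb instruction."""
    op1 = (hw1 >> 11) & 0b11      # first halfword, bits [12:11]
    op2 = (hw1 >> 4) & 0x7F       # first halfword, bits [10:4]
    op = (hw2 >> 15) & 1          # second halfword, bit [15]
    assert (hw1 >> 13) == 0b111 and op1 != 0b00, "not a 32-bit Thumb encoding"

    if op1 == 0b01:
        rows = [("00xx0xx", "load/store multiple"),
                ("00xx1xx", "load/store dual, load/store exclusive, table branch"),
                ("01xxxxx", "data-processing (shifted register)"),
                ("1xxxxxx", "coprocessor instructions")]
    elif op1 == 0b10:
        if op == 1:
            return "branches and miscellaneous control"
        rows = [("x0xxxxx", "data-processing (modified immediate)"),
                ("x1xxxxx", "data-processing (plain binary immediate)")]
    else:  # op1 == 0b11
        rows = [("000xxx0", "store single data item"),
                ("001xxx0", "Advanced SIMD element or structure load/store"),
                ("00xx001", "load byte, memory hints"),
                ("00xx011", "load halfword, memory hints"),
                ("00xx101", "load word"),
                ("00xx111", "UNDEFINED"),
                ("010xxxx", "data-processing (register)"),
                ("0110xxx", "multiply, multiply accumulate, and absolute difference"),
                ("0111xxx", "long multiply, long multiply accumulate, and divide"),
                ("1xxxxxx", "coprocessor instructions")]
    for pattern, name in rows:
        if bits_match(op2, pattern):
            return name
    raise AssertionError("unreachable: the rows above cover every op2 value")
```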

A6.3.1

Data-processing (modified immediate)
First halfword    Bits:   15:11      9  8:5  4  3:0
                  Field:  1 1 1 1 0  0  op   S  Rn
Second halfword   Bits:   15  11:8
                  Field:  0   Rd

Table A6-10 shows the allocation of encodings in this space. Other encodings in this space are UNDEFINED.
In the Rn, Rd and S columns, - indicates that the value of the field does not affect the decoding.
These encodings are all available in ARMv6T2 and above.
Table A6-10 32-bit modified immediate data-processing instructions
op     Rn         Rd         S   Instruction            See
0000   -          not 1111   x   Bitwise AND            AND (immediate) on page A8-34
0000   -          1111       0   UNPREDICTABLE          -
0000   -          1111       1   Test                   TST (immediate) on page A8-454
0001   -          -          -   Bitwise Bit Clear      BIC (immediate) on page A8-50
0010   not 1111   -          -   Bitwise OR             ORR (immediate) on page A8-228
0010   1111       -          -   Move                   MOV (immediate) on page A8-194
0011   not 1111   -          -   Bitwise OR NOT         ORN (immediate) on page A8-224
0011   1111       -          -   Bitwise NOT            MVN (immediate) on page A8-214
0100   -          not 1111   x   Bitwise Exclusive OR   EOR (immediate) on page A8-94
0100   -          1111       0   UNPREDICTABLE          -
0100   -          1111       1   Test Equivalence       TEQ (immediate) on page A8-448
1000   -          not 1111   -   Add                    ADD (immediate, Thumb) on page A8-20
1000   -          1111       0   UNPREDICTABLE          -
1000   -          1111       1   Compare Negative       CMN (immediate) on page A8-74
1010   -          -          -   Add with Carry         ADC (immediate) on page A8-14
1011   -          -          -   Subtract with Carry    SBC (immediate) on page A8-302
1101   -          not 1111   -   Subtract               SUB (immediate, Thumb) on page A8-418
1101   -          1111       0   UNPREDICTABLE          -
1101   -          1111       1   Compare                CMP (immediate) on page A8-80
1110   -          -          -   Reverse Subtract       RSB (immediate) on page A8-284

These instructions all have modified immediate constants, rather than a simple 12-bit binary number. This
provides a more useful range of values. For details see Modified immediate constants in Thumb instructions
on page A6-17.


A6.3.2

Modified immediate constants in Thumb instructions
First halfword    Bits:   10
                  Field:  i
Second halfword   Bits:   14:12  7:0
                  Field:  imm3   a b c d e f g h

Table A6-11 shows the range of modified immediate constants available in Thumb data-processing
instructions, and how they are encoded in the a, b, c, d, e, f, g, h, i, and imm3 fields in the instruction.
Table A6-11 Encoding of modified immediates in Thumb data-processing instructions
i:imm3:a   <const> a
0000x      00000000 00000000 00000000 abcdefgh
0001x      00000000 abcdefgh 00000000 abcdefgh b
0010x      abcdefgh 00000000 abcdefgh 00000000 b
0011x      abcdefgh abcdefgh abcdefgh abcdefgh b
01000      1bcdefgh 00000000 00000000 00000000
01001      01bcdefg h0000000 00000000 00000000 c
01010      001bcdef gh000000 00000000 00000000
01011      0001bcde fgh00000 00000000 00000000 c
...        8-bit values shifted to other positions
11101      00000000 00000000 000001bc defgh000 c
11110      00000000 00000000 0000001b cdefgh00
11111      00000000 00000000 00000001 bcdefgh0 c

a. In this table, the immediate constant value is shown in binary form, to relate
abcdefgh to the encoding diagram. In assembly syntax, the immediate value is
specified in the usual way (a decimal number by default).
b. Not available in ARM instructions. UNPREDICTABLE if abcdefgh == 00000000.
c. Not available in ARM instructions if h == 1.

Note
The range of values available in Thumb modified immediate constants is slightly different from the range
of values available in ARM instructions. See Modified immediate constants in ARM instructions on
page A5-9 for the ARM values.


Carry out
A logical instruction with i:imm3:a == '00xxx' does not affect the carry flag. Otherwise, a logical
instruction that sets the flags sets the Carry flag to the value of bit [31] of the modified immediate constant.

Operation
// ThumbExpandImm()
// ================

bits(32) ThumbExpandImm(bits(12) imm12)
    // APSR.C argument to following function call does not affect the imm32 result.
    (imm32, -) = ThumbExpandImm_C(imm12, APSR.C);
    return imm32;

// ThumbExpandImm_C()
// ==================

(bits(32), bit) ThumbExpandImm_C(bits(12) imm12, bit carry_in)
    if imm12<11:10> == ‘00’ then
        case imm12<9:8> of
            when ‘00’
                imm32 = ZeroExtend(imm12<7:0>, 32);
            when ‘01’
                if imm12<7:0> == ‘00000000’ then UNPREDICTABLE;
                imm32 = ‘00000000’ : imm12<7:0> : ‘00000000’ : imm12<7:0>;
            when ‘10’
                if imm12<7:0> == ‘00000000’ then UNPREDICTABLE;
                imm32 = imm12<7:0> : ‘00000000’ : imm12<7:0> : ‘00000000’;
            when ‘11’
                if imm12<7:0> == ‘00000000’ then UNPREDICTABLE;
                imm32 = imm12<7:0> : imm12<7:0> : imm12<7:0> : imm12<7:0>;
        carry_out = carry_in;
    else
        unrotated_value = ZeroExtend(‘1’:imm12<6:0>, 32);
        (imm32, carry_out) = ROR_C(unrotated_value, UInt(imm12<11:7>));
    return (imm32, carry_out);
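For experimenting with these rules outside the manual's pseudocode, a direct Python transliteration may help. This is a sketch, not a reference model: raising the hypothetical `Unpredictable` exception is a modelling choice for the UNPREDICTABLE cases, and `ror_c` mirrors ROR_C() only for the non-zero rotations (8 to 31) that this function can request.

```python
from typing import Tuple


class Unpredictable(Exception):
    """Raised where the pseudocode says UNPREDICTABLE (a modelling choice)."""


def ror_c(value: int, shift: int) -> Tuple[int, int]:
    """ROR_C() on 32-bit values: rotate right; carry out is bit 31 of the result."""
    assert 0 < shift < 32, "ROR_C is not defined for a zero shift"
    result = ((value >> shift) | (value << (32 - shift))) & 0xFFFFFFFF
    return result, result >> 31


def thumb_expand_imm_c(imm12: int, carry_in: int) -> Tuple[int, int]:
    """Port of ThumbExpandImm_C(): returns (imm32, carry_out)."""
    imm8 = imm12 & 0xFF
    if (imm12 >> 10) & 0b11 == 0b00:
        pattern = (imm12 >> 8) & 0b11                 # imm12<9:8>
        if pattern != 0b00 and imm8 == 0:
            raise Unpredictable(f"imm12={imm12:#05x}")
        if pattern == 0b00:
            imm32 = imm8                              # bytes: 00 00 00 XY
        elif pattern == 0b01:
            imm32 = (imm8 << 16) | imm8               # bytes: 00 XY 00 XY
        elif pattern == 0b10:
            imm32 = (imm8 << 24) | (imm8 << 8)        # bytes: XY 00 XY 00
        else:
            imm32 = imm8 * 0x01010101                 # bytes: XY XY XY XY
        return imm32, carry_in                        # carry passes through
    unrotated_value = 0x80 | (imm12 & 0x7F)           # ZeroExtend('1':imm12<6:0>, 32)
    return ror_c(unrotated_value, (imm12 >> 7) & 0x1F)  # UInt(imm12<11:7>)


def thumb_expand_imm(imm12: int) -> int:
    """ThumbExpandImm(): the carry argument does not affect imm32."""
    return thumb_expand_imm_c(imm12, 0)[0]
```

For instance, `thumb_expand_imm_c(0x400, 0)` gives `(0x80000000, 1)`: the rotated value has bit [31] set, so the carry out is 1, as the Carry out rule above describes.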


A6.3.3 Data-processing (plain binary immediate)
Encoding: first halfword [15:11] = 11110, [9] = 1, [8:4] = op, [3:0] = Rn; second halfword [15] = 0.

Table A6-12 shows the allocation of encodings in this space. Other encodings in this space are UNDEFINED.
These encodings are all available in ARMv6T2 and above.
Table A6-12 32-bit unmodified immediate data-processing instructions

op          Rn         Instruction                    See
00000       not 1111   Add Wide (12-bit)              ADD (immediate, Thumb) on page A8-20
00000       1111       Form PC-relative Address       ADR on page A8-32
00100       -          Move Wide (16-bit)             MOV (immediate) on page A8-194
01010       not 1111   Subtract Wide (12-bit)         SUB (immediate, Thumb) on page A8-418
01010       1111       Form PC-relative Address       ADR on page A8-32
01100       -          Move Top (16-bit)              MOVT on page A8-200
100x0 (a)   -          Signed Saturate                SSAT on page A8-362
10010 (b)   -          Signed Saturate (two 16-bit)   SSAT16 on page A8-364
10100       -          Signed Bit Field Extract       SBFX on page A8-308
10110       not 1111   Bit Field Insert               BFI on page A8-48
10110       1111       Bit Field Clear                BFC on page A8-46
110x0 (a)   -          Unsigned Saturate              USAT on page A8-504
11010 (b)   -          Unsigned Saturate 16           USAT16 on page A8-506
11100       -          Unsigned Bit Field Extract     UBFX on page A8-466

a. In the second halfword of the instruction, bits [14:12,7:6] != 0b00000.
b. In the second halfword of the instruction, bits [14:12,7:6] == 0b00000.
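A decoder for this space reduces to a handful of field comparisons. The sketch below takes the two halfwords as integers (hw1 is the first halfword, hw2 the second; these names are illustrative) and returns the mnemonic from the See column. It assumes the caller has already matched the fixed bits of the encoding diagram. One reading is an assumption: footnotes a and b are used only to separate SSAT16 (or USAT16) from SSAT (or USAT) when op is 10010 (or 11010), so op == 10000 or 11000 decodes as SSAT or USAT whatever the shift bits are.

```python
def decode_plain_binary_immediate(hw1: int, hw2: int) -> str:
    """Look up Table A6-12. Assumes hw1[15:11] == 0b11110, hw1[9] == 1, hw2[15] == 0."""
    op = (hw1 >> 4) & 0x1F          # hw1 bits [8:4]
    rn = hw1 & 0xF                  # hw1 bits [3:0]
    # hw2 bits [14:12,7:6] concatenated, as tested by footnotes a and b
    shift = (((hw2 >> 12) & 0b111) << 2) | ((hw2 >> 6) & 0b11)
    if op == 0b00000:
        return "ADR" if rn == 0xF else "ADD (immediate, Thumb)"
    if op == 0b00100:
        return "MOV (immediate)"
    if op == 0b01010:
        return "ADR" if rn == 0xF else "SUB (immediate, Thumb)"
    if op == 0b01100:
        return "MOVT"
    if op in (0b10000, 0b10010):
        return "SSAT16" if op == 0b10010 and shift == 0 else "SSAT"
    if op == 0b10100:
        return "SBFX"
    if op == 0b10110:
        return "BFC" if rn == 0xF else "BFI"
    if op in (0b11000, 0b11010):
        return "USAT16" if op == 0b11010 and shift == 0 else "USAT"
    if op == 0b11100:
        return "UBFX"
    return "UNDEFINED"              # every other op value in this space
```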


A6.3.4 Branches and miscellaneous control
Encoding: first halfword [15:11] = 11110, [10:4] = op; second halfword [15] = 1, [14:12] = op1, [11:8] = op2.

Table A6-13 shows the allocation of encodings in this space. Other encodings in this space are UNDEFINED.
Table A6-13 Branches and miscellaneous control instructions

op1   op            op2    Instruction                                   See                                                  Variant
0x0   not x111xxx   -      Conditional branch                            B on page A8-44                                      v6T2
0x0   0111000       xx00   Move to Special Register, application level   MSR (register) on page A8-210                        All
0x0   0111000       xx01   Move to Special Register, system level        MSR (register) on page B6-14                         All
0x0   0111000       xx1x   Move to Special Register, system level        MSR (register) on page B6-14                         All
0x0   0111001       -      Move to Special Register, system level        MSR (register) on page B6-14                         All
0x0   0111010       -      -                                             Change Processor State, and hints on page A6-21      -
0x0   0111011       -      -                                             Miscellaneous control instructions on page A6-21     -
0x0   0111100       -      Branch and Exchange Jazelle                   BXJ on page A8-64                                    v6T2
0x0   0111101       -      Exception Return                              SUBS PC, LR and related instructions on page B6-25   v6T2
0x0   011111x       -      Move from Special Register                    MRS on page A8-206                                   v6T2
000   1111111       -      Secure Monitor Call                           SMC (previously SMI) on page B6-18                   Security Extensions
010   1111111       -      Permanently UNDEFINED. This space will not be allocated in future.
0x1   -             -      Branch                                        B on page A8-44                                      v6T2
1x0   -             -      Branch with Link and Exchange                 BL, BLX (immediate) on page A8-58                    v5T (a)
1x1   -             -      Branch with Link                              BL, BLX (immediate) on page A8-58                    v4T

a. UNDEFINED in ARMv4T.
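The same approach works for Table A6-13. In the sketch below, each x in a table pattern becomes a mask (for example, op1 == 0x0 is tested as op1 & 0b101 == 0). The function name and returned strings are illustrative, and the fixed bits of the encoding diagram are assumed to have been checked already.

```python
def decode_branch_misc(hw1: int, hw2: int) -> str:
    """Look up Table A6-13. Assumes hw1[15:11] == 0b11110 and hw2[15] == 1."""
    op = (hw1 >> 4) & 0x7F          # hw1 bits [10:4]
    op1 = (hw2 >> 12) & 0b111       # hw2 bits [14:12]
    op2 = (hw2 >> 8) & 0xF          # hw2 bits [11:8]
    if (op1 & 0b101) == 0b000:                      # op1 == 0x0
        if (op & 0b0111000) != 0b0111000:           # op is "not x111xxx"
            return "B (conditional)"
        if op == 0b0111000:
            return "MSR (application level)" if (op2 & 0b11) == 0 else "MSR (system level)"
        if op == 0b0111001:
            return "MSR (system level)"
        if op == 0b0111010:
            return "Change Processor State, and hints"
        if op == 0b0111011:
            return "Miscellaneous control instructions"
        if op == 0b0111100:
            return "BXJ"
        if op == 0b0111101:
            return "SUBS PC, LR (exception return)"
        if (op & 0b1111110) == 0b0111110:           # op == 011111x
            return "MRS"
        if op == 0b1111111:
            return "SMC" if op1 == 0b000 else "Permanently UNDEFINED"
        return "UNDEFINED"
    if (op1 & 0b101) == 0b001:                      # op1 == 0x1
        return "B"
    if (op1 & 0b101) == 0b100:                      # op1 == 1x0
        return "BLX (immediate)"
    return "BL"                                     # op1 == 1x1
```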


Change Processor State, and hints
Encoding: first halfword [15:4] = 111100111010; second halfword [15:14] = 10, [12] = 0, [10:8] = op1, [7:0] = op2.

Table A6-14 shows the allocation of encodings in this space. Other encodings in this space are unallocated
hints that execute as NOPs. These unallocated hint encodings are reserved and software must not use them.
Table A6-14 Change Processor State, and hint instructions

op1       op2        Instruction               See                    Variant
not 000   -          Change Processor State    CPS on page B6-3       v6T2
000       00000000   No Operation hint         NOP on page A8-222     v6T2
000       00000001   Yield hint                YIELD on page A8-812   v7
000       00000010   Wait For Event hint       WFE on page A8-808     v7
000       00000011   Wait For Interrupt hint   WFI on page A8-810     v7
000       00000100   Send Event hint           SEV on page A8-316     v7
000       1111xxxx   Debug hint                DBG on page A8-88      v7
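Because unallocated encodings in this space execute as NOPs rather than being UNDEFINED, a decoder for Table A6-14 never returns UNDEFINED. A minimal sketch that takes only the second halfword (the names are illustrative):

```python
HINTS = {
    0b00000000: "NOP",
    0b00000001: "YIELD",
    0b00000010: "WFE",
    0b00000011: "WFI",
    0b00000100: "SEV",
}


def decode_cps_hint(hw2: int) -> str:
    """Look up Table A6-14 from the second halfword of a CPS/hint encoding."""
    op1 = (hw2 >> 8) & 0b111        # hw2 bits [10:8]
    op2 = hw2 & 0xFF                # hw2 bits [7:0]
    if op1 != 0b000:
        return "CPS"
    if op2 in HINTS:
        return HINTS[op2]
    if (op2 & 0xF0) == 0xF0:        # 1111xxxx: DBG, option in bits [3:0]
        return f"DBG #{op2 & 0xF}"
    return "unallocated hint (executes as NOP; reserved)"
```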

Miscellaneous control instructions
Encoding: first halfword [15:4] = 111100111011; second halfword [15:14] = 10, [12] = 0, [7:4] = op.

Table A6-15 shows the allocation of encodings in this space. Other encodings in this space are UNDEFINED
in ARMv7. They are UNPREDICTABLE in ARMv6.
Table A6-15 Miscellaneous control instructions

op     Instruction                           See                           Variant
0000   Leave ThumbEE state (a)               ENTERX, LEAVEX on page A9-7   ThumbEE
0001   Enter ThumbEE state                   ENTERX, LEAVEX on page A9-7   ThumbEE
0010   Clear-Exclusive                       CLREX on page A8-70           v7
0100   Data Synchronization Barrier          DSB on page A8-92             v7
0101   Data Memory Barrier                   DMB on page A8-90             v7
0110   Instruction Synchronization Barrier   ISB on page A8-102            v7

a. This instruction is a NOP in Thumb state.


A6.3.5 Load/store multiple
Encoding: first halfword [15:9] = 1110100, [8:7] = op, [6] = 0, [4] = L, [3:0] = Rn.

Table A6-16 shows the allocation of encodings in this space.
These encodings are all available in ARMv6T2 and above.
Table A6-16 Load/store multiple instructions

op   L   Rn         Instruction                                          See
00   0   -          Store Return State                                   SRS on page B6-20
00   1   -          Return From Exception                                RFE on page B6-16
01   0   -          Store Multiple (Increment After, Empty Ascending)    STM / STMIA / STMEA on page A8-374
01   1   not 1101   Load Multiple (Increment After, Full Descending)     LDM / LDMIA / LDMFD on page A8-110
01   1   1101       Pop Multiple Registers from the stack                POP on page A8-246
10   0   not 1101   Store Multiple (Decrement Before, Full Descending)   STMDB / STMFD on page A8-378
10   0   1101       Push Multiple Registers to the stack                 PUSH on page A8-248
10   1   -          Load Multiple (Decrement Before, Empty Ascending)    LDMDB / LDMEA on page A8-114
11   0   -          Store Return State                                   SRS on page B6-20
11   1   -          Return From Exception                                RFE on page B6-16


A6.3.6 Load/store dual, load/store exclusive, table branch
Encoding: first halfword [15:9] = 1110100, [8:7] = op1, [6] = 1, [5:4] = op2, [3:0] = Rn; second halfword [7:4] = op3.

Table A6-17 shows the allocation of encodings in this space. Other encodings in this space are UNDEFINED.
Table A6-17 Load/store double or exclusive, table branch

op1   op2   op3    Rn         Instruction                           See                                Variant
00    00    -      -          Store Register Exclusive              STREX on page A8-400               v6T2
00    01    -      -          Load Register Exclusive               LDREX on page A8-142               v6T2
0x    10    -      -          Store Register Dual                   STRD (immediate) on page A8-396    v6T2
1x    x0    -      -          Store Register Dual                   STRD (immediate) on page A8-396    v6T2
0x    11    -      not 1111   Load Register Dual (immediate)        LDRD (immediate) on page A8-136    v6T2
1x    x1    -      not 1111   Load Register Dual (immediate)        LDRD (immediate) on page A8-136    v6T2
0x    11    -      1111       Load Register Dual (literal)          LDRD (literal) on page A8-138      v6T2
1x    x1    -      1111       Load Register Dual (literal)          LDRD (literal) on page A8-138      v6T2
01    00    0100   -          Store Register Exclusive Byte         STREXB on page A8-402              v7
01    00    0101   -          Store Register Exclusive Halfword     STREXH on page A8-406              v7
01    00    0111   -          Store Register Exclusive Doubleword   STREXD on page A8-404              v7
01    01    0000   -          Table Branch Byte                     TBB, TBH on page A8-446            v6T2
01    01    0001   -          Table Branch Halfword                 TBB, TBH on page A8-446            v6T2
01    01    0100   -          Load Register Exclusive Byte          LDREXB on page A8-144              v7
01    01    0101   -          Load Register Exclusive Halfword      LDREXH on page A8-148              v7
01    01    0111   -          Load Register Exclusive Doubleword    LDREXD on page A8-146              v7
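The paired patterns in this table (op1 = 0x with op2 = 10, or op1 = 1x with op2 = x0) collapse into a single test per instruction. The following is a sketch with illustrative names, and it assumes the fixed bits of the encoding diagram have already been matched:

```python
def decode_ldst_dual_excl(hw1: int, hw2: int) -> str:
    """Look up Table A6-17. Assumes hw1[15:9] == 0b1110100 and hw1[6] == 1."""
    op1 = (hw1 >> 7) & 0b11         # hw1 bits [8:7]
    op2 = (hw1 >> 4) & 0b11         # hw1 bits [5:4]
    rn = hw1 & 0xF                  # hw1 bits [3:0]
    op3 = (hw2 >> 4) & 0xF          # hw2 bits [7:4]
    op1_is_0x = (op1 & 0b10) == 0
    if op1 == 0b00 and op2 == 0b00:
        return "STREX"
    if op1 == 0b00 and op2 == 0b01:
        return "LDREX"
    if (op1_is_0x and op2 == 0b10) or (not op1_is_0x and (op2 & 1) == 0):
        return "STRD (immediate)"                   # 0x/10 or 1x/x0
    if (op1_is_0x and op2 == 0b11) or (not op1_is_0x and (op2 & 1) == 1):
        return "LDRD (literal)" if rn == 0xF else "LDRD (immediate)"
    if op1 == 0b01 and op2 == 0b00:
        return {0b0100: "STREXB", 0b0101: "STREXH", 0b0111: "STREXD"}.get(op3, "UNDEFINED")
    if op1 == 0b01 and op2 == 0b01:
        return {0b0000: "TBB", 0b0001: "TBH", 0b0100: "LDREXB",
                0b0101: "LDREXH", 0b0111: "LDREXD"}.get(op3, "UNDEFINED")
    return "UNDEFINED"
```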


A6.3.7 Load word
Encoding: first halfword [15:9] = 1111100, [8:7] = op1, [6:4] = 101, [3:0] = Rn; second halfword [11:6] = op2.

Table A6-18 shows the allocation of encodings in this space. Other encodings in this space are UNDEFINED.
These encodings are all available in ARMv6T2 and above.
Table A6-18 Load word

op1   op2      Rn         Instruction                  See
01    -        not 1111   Load Register                LDR (immediate, Thumb) on page A8-118
00    1xx1xx   not 1111   Load Register                LDR (immediate, Thumb) on page A8-118
00    1100xx   not 1111   Load Register                LDR (immediate, Thumb) on page A8-118
00    1110xx   not 1111   Load Register Unprivileged   LDRT on page A8-176
00    000000   not 1111   Load Register                LDR (register) on page A8-124
0x    -        1111       Load Register                LDR (literal) on page A8-122
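Most of the op2 patterns in Table A6-18 (1xx1xx, 1100xx, 1110xx, 000000) recur in the load halfword, load byte and store tables that follow. The sketch below shows how they can be tested with masks; the function name is illustrative and the fixed bits are assumed to have been matched already.

```python
def decode_load_word(hw1: int, hw2: int) -> str:
    """Look up Table A6-18. Assumes hw1[15:9] == 0b1111100 and hw1[6:4] == 0b101."""
    op1 = (hw1 >> 7) & 0b11         # hw1 bits [8:7]
    rn = hw1 & 0xF                  # hw1 bits [3:0]
    op2 = (hw2 >> 6) & 0x3F         # hw2 bits [11:6]
    if rn == 0xF:
        return "LDR (literal)" if (op1 & 0b10) == 0 else "UNDEFINED"   # op1 == 0x
    if op1 == 0b01:
        return "LDR (immediate, Thumb)"
    if op1 == 0b00:
        if (op2 & 0b100100) == 0b100100 or (op2 & 0b111100) == 0b110000:
            return "LDR (immediate, Thumb)"         # 1xx1xx or 1100xx
        if (op2 & 0b111100) == 0b111000:
            return "LDRT"                           # 1110xx
        if op2 == 0b000000:
            return "LDR (register)"
    return "UNDEFINED"
```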


A6.3.8 Load halfword, memory hints
Encoding: first halfword [15:9] = 1111100, [8:7] = op1, [6:4] = 011, [3:0] = Rn; second halfword [15:12] = Rt, [11:6] = op2.

Table A6-19 shows the allocation of encodings in this space. Other encodings in this space are UNDEFINED.
Except where otherwise noted, these encodings are available in ARMv6T2 and above.
Table A6-19 Load halfword, preload

op1   op2      Rn         Rt         Instruction                                  See
0x    -        1111       not 1111   Load Register Halfword                       LDRH (literal) on page A8-154
01    -        not 1111   not 1111   Load Register Halfword                       LDRH (immediate, Thumb) on page A8-150
00    1xx1xx   not 1111   not 1111   Load Register Halfword                       LDRH (immediate, Thumb) on page A8-150
00    1100xx   not 1111   not 1111   Load Register Halfword                       LDRH (immediate, Thumb) on page A8-150
00    1110xx   not 1111   not 1111   Load Register Halfword Unprivileged          LDRHT on page A8-158
00    000000   not 1111   not 1111   Load Register Halfword                       LDRH (register) on page A8-156
1x    -        1111       not 1111   Load Register Signed Halfword                LDRSH (literal) on page A8-170
11    -        not 1111   not 1111   Load Register Signed Halfword                LDRSH (immediate) on page A8-168
10    1xx1xx   not 1111   not 1111   Load Register Signed Halfword                LDRSH (immediate) on page A8-168
10    1100xx   not 1111   not 1111   Load Register Signed Halfword                LDRSH (immediate) on page A8-168
10    1110xx   not 1111   not 1111   Load Register Signed Halfword Unprivileged   LDRSHT on page A8-174
10    000000   not 1111   not 1111   Load Register Signed Halfword                LDRSH (register) on page A8-172
0x    -        1111       1111       UNPREDICTABLE                                -
01    -        not 1111   1111       Preload Data with intent to Write (a)        PLD, PLDW (immediate) on page A8-236
00    1100xx   not 1111   1111       Preload Data with intent to Write (a)        PLD, PLDW (immediate) on page A8-236
00    000000   not 1111   1111       Preload Data with intent to Write (a)        PLD, PLDW (register) on page A8-240
00    1xx1xx   not 1111   1111       UNPREDICTABLE                                -
00    1110xx   not 1111   1111       UNPREDICTABLE                                -
1x    -        1111       1111       Unallocated memory hint (treat as NOP)       -
10    1100xx   not 1111   1111       Unallocated memory hint (treat as NOP)       -
10    000000   not 1111   1111       Unallocated memory hint (treat as NOP)       -
10    1xx1xx   not 1111   1111       UNPREDICTABLE                                -
10    1110xx   not 1111   1111       UNPREDICTABLE                                -
11    -        not 1111   1111       Unallocated memory hint (treat as NOP)       -

a. Available in ARMv7 with the Multiprocessing Extensions. In the ARMv7 base architecture and in ARMv6T2 these are
unallocated memory hints (treat as NOP).


A6.3.9 Load byte, memory hints
Encoding: first halfword [15:9] = 1111100, [8:7] = op1, [6:4] = 001, [3:0] = Rn; second halfword [15:12] = Rt, [11:6] = op2.

Table A6-20 shows the allocation of encodings in this space. Other encodings in this space are UNDEFINED.
These encodings are all available in ARMv6T2 and above.
Table A6-20 Load byte, preload

op1   op2      Rn         Rt         Instruction                              See
0x    -        1111       not 1111   Load Register Byte                       LDRB (literal) on page A8-130
01    -        not 1111   not 1111   Load Register Byte                       LDRB (immediate, Thumb) on page A8-126
00    1xx1xx   not 1111   not 1111   Load Register Byte                       LDRB (immediate, Thumb) on page A8-126
00    1100xx   not 1111   not 1111   Load Register Byte                       LDRB (immediate, Thumb) on page A8-126
00    1110xx   not 1111   not 1111   Load Register Byte Unprivileged          LDRBT on page A8-134
00    000000   not 1111   not 1111   Load Register Byte                       LDRB (register) on page A8-132
1x    -        1111       not 1111   Load Register Signed Byte                LDRSB (literal) on page A8-162
11    -        not 1111   not 1111   Load Register Signed Byte                LDRSB (immediate) on page A8-160
10    1xx1xx   not 1111   not 1111   Load Register Signed Byte                LDRSB (immediate) on page A8-160
10    1100xx   not 1111   not 1111   Load Register Signed Byte                LDRSB (immediate) on page A8-160
10    1110xx   not 1111   not 1111   Load Register Signed Byte Unprivileged   LDRSBT on page A8-166
10    000000   not 1111   not 1111   Load Register Signed Byte                LDRSB (register) on page A8-164
0x    -        1111       1111       Preload Data                             PLD (literal) on page A8-238
01    -        not 1111   1111       Preload Data                             PLD, PLDW (immediate) on page A8-236
00    1100xx   not 1111   1111       Preload Data                             PLD, PLDW (immediate) on page A8-236
00    000000   not 1111   1111       Preload Data                             PLD, PLDW (register) on page A8-240
00    1xx1xx   not 1111   1111       UNPREDICTABLE                            -
00    1110xx   not 1111   1111       UNPREDICTABLE                            -
1x    -        1111       1111       Preload Instruction                      PLI (immediate, literal) on page A8-242
11    -        not 1111   1111       Preload Instruction                      PLI (immediate, literal) on page A8-242
10    1100xx   not 1111   1111       Preload Instruction                      PLI (immediate, literal) on page A8-242
10    000000   not 1111   1111       Preload Instruction                      PLI (register) on page A8-244
10    1xx1xx   not 1111   1111       UNPREDICTABLE                            -
10    1110xx   not 1111   1111       UNPREDICTABLE                            -
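Table A6-20 is large enough that a row-driven lookup is easier to check against the printed table than nested conditionals. In this sketch (illustrative names), each row is copied from the table in order, '-' matches any value, 'x' matches either bit value, and 'not' negates a pattern. The first matching row wins, and no match is treated as UNDEFINED, following the sentence that introduces the table.

```python
def field_matches(pattern: str, value: int) -> bool:
    """Match a decode-table entry such as '1x', '1100xx', 'not 1111' or '-'."""
    if pattern == "-":
        return True
    if pattern.startswith("not "):
        return not field_matches(pattern[4:], value)
    width = len(pattern)
    for i, ch in enumerate(pattern):
        bit = (value >> (width - 1 - i)) & 1
        if ch != "x" and int(ch) != bit:
            return False
    return True


# (op1, op2, Rn, Rt, result) in the order of Table A6-20.
ROWS = [
    ("0x", "-", "1111", "not 1111", "LDRB (literal)"),
    ("01", "-", "not 1111", "not 1111", "LDRB (immediate, Thumb)"),
    ("00", "1xx1xx", "not 1111", "not 1111", "LDRB (immediate, Thumb)"),
    ("00", "1100xx", "not 1111", "not 1111", "LDRB (immediate, Thumb)"),
    ("00", "1110xx", "not 1111", "not 1111", "LDRBT"),
    ("00", "000000", "not 1111", "not 1111", "LDRB (register)"),
    ("1x", "-", "1111", "not 1111", "LDRSB (literal)"),
    ("11", "-", "not 1111", "not 1111", "LDRSB (immediate)"),
    ("10", "1xx1xx", "not 1111", "not 1111", "LDRSB (immediate)"),
    ("10", "1100xx", "not 1111", "not 1111", "LDRSB (immediate)"),
    ("10", "1110xx", "not 1111", "not 1111", "LDRSBT"),
    ("10", "000000", "not 1111", "not 1111", "LDRSB (register)"),
    ("0x", "-", "1111", "1111", "PLD (literal)"),
    ("01", "-", "not 1111", "1111", "PLD, PLDW (immediate)"),
    ("00", "1100xx", "not 1111", "1111", "PLD, PLDW (immediate)"),
    ("00", "000000", "not 1111", "1111", "PLD, PLDW (register)"),
    ("00", "1xx1xx", "not 1111", "1111", "UNPREDICTABLE"),
    ("00", "1110xx", "not 1111", "1111", "UNPREDICTABLE"),
    ("1x", "-", "1111", "1111", "PLI (immediate, literal)"),
    ("11", "-", "not 1111", "1111", "PLI (immediate, literal)"),
    ("10", "1100xx", "not 1111", "1111", "PLI (immediate, literal)"),
    ("10", "000000", "not 1111", "1111", "PLI (register)"),
    ("10", "1xx1xx", "not 1111", "1111", "UNPREDICTABLE"),
    ("10", "1110xx", "not 1111", "1111", "UNPREDICTABLE"),
]


def decode_load_byte(hw1: int, hw2: int) -> str:
    """Look up Table A6-20. Assumes hw1[15:9] == 0b1111100 and hw1[6:4] == 0b001."""
    fields = ((hw1 >> 7) & 0b11,    # op1: hw1 bits [8:7]
              (hw2 >> 6) & 0x3F,    # op2: hw2 bits [11:6]
              hw1 & 0xF,            # Rn:  hw1 bits [3:0]
              (hw2 >> 12) & 0xF)    # Rt:  hw2 bits [15:12]
    for *patterns, name in ROWS:
        if all(field_matches(p, v) for p, v in zip(patterns, fields)):
            return name
    return "UNDEFINED"
```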


A6.3.10 Store single data item
Encoding: first halfword [15:8] = 11111000, [7:5] = op1, [4] = 0; second halfword [11:6] = op2.

Table A6-21 shows the allocation of encodings in this space. Other encodings in this space are UNDEFINED.
These encodings are all available in ARMv6T2 and above.
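Table A6-21 Store single data item

op1   op2      Instruction                            See
100   -        Store Register Byte                    STRB (immediate, Thumb) on page A8-388
000   1xx1xx   Store Register Byte                    STRB (immediate, Thumb) on page A8-388
000   1100xx   Store Register Byte                    STRB (immediate, Thumb) on page A8-388
000   1110xx   Store Register Byte Unprivileged       STRBT on page A8-394
000   0xxxxx   Store Register Byte                    STRB (register) on page A8-392
101   -        Store Register Halfword                STRH (immediate, Thumb) on page A8-408
001   1xx1xx   Store Register Halfword                STRH (immediate, Thumb) on page A8-408
001   1100xx   Store Register Halfword                STRH (immediate, Thumb) on page A8-408
001   1110xx   Store Register Halfword Unprivileged   STRHT on page A8-414
001   0xxxxx   Store Register Halfword                STRH (register) on page A8-412
110   -        Store Register (immediate)             STR (immediate, Thumb) on page A8-382
010   1xx1xx   Store Register (immediate)             STR (immediate, Thumb) on page A8-382
010   1100xx   Store Register (immediate)             STR (immediate, Thumb) on page A8-382
010   1110xx   Store Register Unprivileged            STRT on page A8-416
010   0xxxxx   Store Register (register)              STR (register) on page A8-386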
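In Table A6-21, op1<1:0> selects the data size (00 byte, 01 halfword, 10 word) and op1<2> selects the form whose op2 entry is '-'. The sketch below relies on that regularity, which is read off the table rather than stated by the manual; the function name is illustrative.

```python
def decode_store_single(hw1: int, hw2: int) -> str:
    """Look up Table A6-21. Assumes hw1[15:8] == 0b11111000 and hw1[4] == 0."""
    op1 = (hw1 >> 5) & 0b111        # hw1 bits [7:5]
    op2 = (hw2 >> 6) & 0x3F         # hw2 bits [11:6]
    size = op1 & 0b11               # 00 byte, 01 halfword, 10 word
    if size == 0b11:
        return "UNDEFINED"
    base = ("STRB", "STRH", "STR")[size]
    if op1 & 0b100:                 # op1 = 100, 101, 110: op2 is '-'
        return f"{base} (immediate, Thumb)"
    if (op2 & 0b100100) == 0b100100 or (op2 & 0b111100) == 0b110000:
        return f"{base} (immediate, Thumb)"         # 1xx1xx or 1100xx
    if (op2 & 0b111100) == 0b111000:
        return base + "T"                           # 1110xx: unprivileged
    if (op2 & 0b100000) == 0:
        return f"{base} (register)"                 # 0xxxxx
    return "UNDEFINED"
```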


A6.3.11 Data-processing (shifted register)
Encoding: first halfword [15:9] = 1110101, [8:5] = op, [4] = S, [3:0] = Rn; second halfword [11:8] = Rd.

Table A6-22 shows the allocation of encodings in this space.
Other encodings in this space are UNDEFINED.
These encodings are all available in ARMv6T2 and above.
Table A6-22 Data-processing (shifted register)

op     Rn         Rd         S   Instruction            See
0000   -          not 1111   x   Bitwise AND            AND (register) on page A8-36
0000   -          1111       0   UNPREDICTABLE          -
0000   -          1111       1   Test                   TST (register) on page A8-456
0001   -          -          -   Bitwise Bit Clear      BIC (register) on page A8-52
0010   not 1111   -          -   Bitwise OR             ORR (register) on page A8-230
0010   1111       -          -   Move                   MOV (register) on page A8-196
0011   not 1111   -          -   Bitwise OR NOT         ORN (register) on page A8-226
0011   1111       -          -   Bitwise NOT            MVN (register) on page A8-216
0100   -          not 1111   -   Bitwise Exclusive OR   EOR (register) on page A8-96
0100   -          1111       0   UNPREDICTABLE          -
0100   -          1111       1   Test Equivalence       TEQ (register) on page A8-450
0110   -          -          -   Pack Halfword          PKH on page A8-234
1000   -          not 1111   -   Add                    ADD (register) on page A8-24
1000   -          1111       0   UNPREDICTABLE          -
1000   -          1111       1   Compare Negative       CMN (register) on page A8-76
1010   -          -          -   Add with Carry         ADC (register) on page A8-16
1011   -          -          -   Subtract with Carry    SBC (register) on page A8-304
1101   -          not 1111   -   Subtract               SUB (register) on page A8-422
1101   -          1111       0   UNPREDICTABLE          -
1101   -          1111       1   Compare                CMP (register) on page A8-82
1110   -          -          -   Reverse Subtract       RSB (register) on page A8-286
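Four rows of Table A6-22 follow the same rule: when Rd == 1111, S == 1 turns the operation into a flag-setting test or compare (TST, TEQ, CMN, CMP) and S == 0 is UNPREDICTABLE. Two further rows become MOV and MVN when Rn == 1111. The following sketch of that structure uses illustrative names:

```python
# op -> (normal form, form used when Rd == 1111 and S == 1)
RD_ALIASES = {0b0000: ("AND", "TST"), 0b0100: ("EOR", "TEQ"),
              0b1000: ("ADD", "CMN"), 0b1101: ("SUB", "CMP")}
# op -> (normal form, form used when Rn == 1111)
RN_ALIASES = {0b0010: ("ORR", "MOV"), 0b0011: ("ORN", "MVN")}
PLAIN = {0b0001: "BIC", 0b0110: "PKH", 0b1010: "ADC", 0b1011: "SBC", 0b1110: "RSB"}


def decode_dp_shifted_register(hw1: int, hw2: int) -> str:
    """Look up Table A6-22. Assumes hw1[15:9] == 0b1110101."""
    op = (hw1 >> 5) & 0xF           # hw1 bits [8:5]
    s = (hw1 >> 4) & 1              # hw1 bit [4]
    rn = hw1 & 0xF                  # hw1 bits [3:0]
    rd = (hw2 >> 8) & 0xF           # hw2 bits [11:8]
    if op in RD_ALIASES:
        base, alias = RD_ALIASES[op]
        if rd != 0xF:
            return base
        return alias if s else "UNPREDICTABLE"
    if op in RN_ALIASES:
        base, alias = RN_ALIASES[op]
        return alias if rn == 0xF else base
    return PLAIN.get(op, "UNDEFINED")
```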


A6.3.12 Data-processing (register)
Encoding: first halfword [15:8] = 11111010, [7:4] = op1, [3:0] = Rn; second halfword [15:12] = 1111, [7:4] = op2.

If, in the second halfword of the instruction, bits [15:12] != 0b1111, the instruction is UNDEFINED.
Table A6-23 shows the allocation of encodings in this space. Other encodings in this space are UNDEFINED.
These encodings are all available in ARMv6T2 and above.
Table A6-23 Data-processing (register)

op1    op2    Rn         Instruction                        See
000x   0000   -          Logical Shift Left                 LSL (register) on page A8-180
001x   0000   -          Logical Shift Right                LSR (register) on page A8-184
010x   0000   -          Arithmetic Shift Right             ASR (register) on page A8-42
011x   0000   -          Rotate Right                       ROR (register) on page A8-280
0000   1xxx   not 1111   Signed Extend and Add Halfword     SXTAH on page A8-438
0000   1xxx   1111       Signed Extend Halfword             SXTH on page A8-444
0001   1xxx   not 1111   Unsigned Extend and Add Halfword   UXTAH on page A8-518
0001   1xxx   1111       Unsigned Extend Halfword           UXTH on page A8-524
0010   1xxx   not 1111   Signed Extend and Add Byte 16      SXTAB16 on page A8-436
0010   1xxx   1111       Signed Extend Byte 16              SXTB16 on page A8-442
0011   1xxx   not 1111   Unsigned Extend and Add Byte 16    UXTAB16 on page A8-516
0011   1xxx   1111       Unsigned Extend Byte 16            UXTB16 on page A8-522
0100   1xxx   not 1111   Signed Extend and Add Byte         SXTAB on page A8-434
0100   1xxx   1111       Signed Extend Byte                 SXTB on page A8-440
0101   1xxx   not 1111   Unsigned Extend and Add Byte       UXTAB on page A8-514
0101   1xxx   1111       Unsigned Extend Byte               UXTB on page A8-520
1xxx   00xx   -          -                                  Parallel addition and subtraction, signed on page A6-35
1xxx   01xx   -          -                                  Parallel addition and subtraction, unsigned on page A6-36
10xx   10xx   -          -                                  Miscellaneous operations on page A6-37
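The extend rows of Table A6-23 follow a regular naming pattern: op1<0> distinguishes signed from unsigned, op1<2:1> chooses halfword, two bytes, or byte, and Rn == 1111 selects the form without the add. The pattern is inferred from the table, and the function below is a hypothetical helper, not part of the manual.

```python
def extend_mnemonic(op1: int, rn: int) -> str:
    """Name the op2 == 1xxx rows of Table A6-23 (op1 = 0000..0101)."""
    if not 0 <= op1 <= 0b0101:
        raise ValueError("op1 must be in the range 0000..0101 for these rows")
    sign = "U" if op1 & 1 else "S"          # op1<0>: signed or unsigned
    size = ("H", "B16", "B")[op1 >> 1]      # op1<2:1>: halfword, two bytes, byte
    add = "" if rn == 0xF else "A"          # Rn == 1111: no add operand
    return f"{sign}XT{add}{size}"
```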


A6.3.13 Parallel addition and subtraction, signed
Encoding: first halfword [15:7] = 111110101, [6:4] = op1; second halfword [15:12] = 1111, [7:6] = 00, [5:4] = op2.

If, in the second halfword of the instruction, bits [15:12] != 0b1111, the instruction is UNDEFINED.
Table A6-24 shows the allocation of encodings in this space. Other encodings in this space are UNDEFINED.
These encodings are all available in ARMv6T2 and above.
Table A6-24 Signed parallel addition and subtraction instructions

op1   op2   Instruction                  See
001   00    Add 16-bit                   SADD16 on page A8-296
010   00    Add, Subtract                SASX on page A8-300
110   00    Subtract, Add                SSAX on page A8-366
101   00    Subtract 16-bit              SSUB16 on page A8-368
000   00    Add 8-bit                    SADD8 on page A8-298
100   00    Subtract 8-bit               SSUB8 on page A8-370
Saturating instructions
001   01    Saturating Add 16-bit        QADD16 on page A8-252
010   01    Saturating Add, Subtract     QASX on page A8-256
110   01    Saturating Subtract, Add     QSAX on page A8-262
101   01    Saturating Subtract 16-bit   QSUB16 on page A8-266
000   01    Saturating Add 8-bit         QADD8 on page A8-254
100   01    Saturating Subtract 8-bit    QSUB8 on page A8-268
Halving instructions
001   10    Halving Add 16-bit           SHADD16 on page A8-318
010   10    Halving Add, Subtract        SHASX on page A8-322
110   10    Halving Subtract, Add        SHSAX on page A8-324
101   10    Halving Subtract 16-bit      SHSUB16 on page A8-326
000   10    Halving Add 8-bit            SHADD8 on page A8-320
100   10    Halving Subtract 8-bit       SHSUB8 on page A8-328


A6.3.14 Parallel addition and subtraction, unsigned
Encoding: first halfword [15:7] = 111110101, [6:4] = op1; second halfword [15:12] = 1111, [7:6] = 01, [5:4] = op2.

If, in the second halfword of the instruction, bits [15:12] != 0b1111, the instruction is UNDEFINED.
Table A6-25 shows the allocation of encodings in this space. Other encodings in this space are UNDEFINED.
These encodings are all available in ARMv6T2 and above.
Table A6-25 Unsigned parallel addition and subtraction instructions

op1   op2   Instruction                  See
001   00    Add 16-bit                   UADD16 on page A8-460
010   00    Add, Subtract                UASX on page A8-464
110   00    Subtract, Add                USAX on page A8-508
101   00    Subtract 16-bit              USUB16 on page A8-510
000   00    Add 8-bit                    UADD8 on page A8-462
100   00    Subtract 8-bit               USUB8 on page A8-512
Saturating instructions
001   01    Saturating Add 16-bit        UQADD16 on page A8-488
010   01    Saturating Add, Subtract     UQASX on page A8-492
110   01    Saturating Subtract, Add     UQSAX on page A8-494
101   01    Saturating Subtract 16-bit   UQSUB16 on page A8-496
000   01    Saturating Add 8-bit         UQADD8 on page A8-490
100   01    Saturating Subtract 8-bit    UQSUB8 on page A8-498
Halving instructions
001   10    Halving Add 16-bit           UHADD16 on page A8-470
010   10    Halving Add, Subtract        UHASX on page A8-474
110   10    Halving Subtract, Add        UHSAX on page A8-476
101   10    Halving Subtract 16-bit      UHSUB16 on page A8-478
000   10    Halving Add 8-bit            UHADD8 on page A8-472
100   10    Halving Subtract 8-bit       UHSUB8 on page A8-480
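Tables A6-24 and A6-25 use the same op1 values, so each mnemonic can be assembled from two parts: an operation selected by op1, and a prefix selected by bits [7:6] of the second halfword (00 signed, 01 unsigned) together with op2 (plain, saturating, or halving). The following sketch works under that reading of the two tables; the names are illustrative and the fixed bits are assumed to have been matched.

```python
# op1 (hw1 bits [6:4]) -> operation suffix, shared by Tables A6-24 and A6-25
OPERATION = {0b001: "ADD16", 0b010: "ASX", 0b110: "SAX",
             0b101: "SUB16", 0b000: "ADD8", 0b100: "SUB8"}
# (hw2 bits [7:6], op2 = hw2 bits [5:4]) -> prefix
PREFIX = {(0b00, 0b00): "S", (0b00, 0b01): "Q", (0b00, 0b10): "SH",
          (0b01, 0b00): "U", (0b01, 0b01): "UQ", (0b01, 0b10): "UH"}


def parallel_add_sub_mnemonic(hw1: int, hw2: int) -> str:
    """Assumes hw1[15:7] == 0b111110101 and hw2[15:12] == 0b1111."""
    op1 = (hw1 >> 4) & 0b111
    key = ((hw2 >> 6) & 0b11, (hw2 >> 4) & 0b11)
    if op1 not in OPERATION or key not in PREFIX:
        return "UNDEFINED"
    return PREFIX[key] + OPERATION[op1]
```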


A6.3.15 Miscellaneous operations
Encoding: first halfword [15:6] = 1111101010, [5:4] = op1; second halfword [15:12] = 1111, [7:6] = 10, [5:4] = op2.

If, in the second halfword of the instruction, bits [15:12] != 0b1111, the instruction is UNDEFINED.
Table A6-26 shows the allocation of encodings in this space. Other encodings in this space are UNDEFINED.
These encodings are all available in ARMv6T2 and above.
Table A6-26 Miscellaneous operations

op1   op2   Instruction                      See
00    00    Saturating Add                   QADD on page A8-250
00    01    Saturating Double and Add        QDADD on page A8-258
00    10    Saturating Subtract              QSUB on page A8-264
00    11    Saturating Double and Subtract   QDSUB on page A8-260
01    00    Byte-Reverse Word                REV on page A8-272
01    01    Byte-Reverse Packed Halfword     REV16 on page A8-274
01    10    Reverse Bits                     RBIT on page A8-270
01    11    Byte-Reverse Signed Halfword     REVSH on page A8-276
10    00    Select Bytes                     SEL on page A8-312
11    00    Count Leading Zeros              CLZ on page A8-72


A6.3.16 Multiply, multiply accumulate, and absolute difference
Encoding: first halfword [15:7] = 111110110, [6:4] = op1; second halfword [15:12] = Ra, [7:6] = 00, [5:4] = op2.

If, in the second halfword of the instruction, bits [7:6] != 0b00, the instruction is UNDEFINED.
Table A6-27 shows the allocation of encodings in this space. Other encodings in this space are UNDEFINED.
These encodings are all available in ARMv6T2 and above.
Table A6-27 Multiply, multiply accumulate, and absolute difference operations

op1   op2   Ra         Instruction                                        See
000   00    not 1111   Multiply Accumulate                                MLA on page A8-190
000   00    1111       Multiply                                           MUL on page A8-212
000   01    -          Multiply and Subtract                              MLS on page A8-192
001   -     not 1111   Signed Multiply Accumulate (Halfwords)             SMLABB, SMLABT, SMLATB, SMLATT on page A8-330
001   -     1111       Signed Multiply (Halfwords)                        SMULBB, SMULBT, SMULTB, SMULTT on page A8-354
010   0x    not 1111   Signed Multiply Accumulate Dual                    SMLAD on page A8-332
010   0x    1111       Signed Dual Multiply Add                           SMUAD on page A8-352
011   0x    not 1111   Signed Multiply Accumulate (Word by halfword)      SMLAWB, SMLAWT on page A8-340
011   0x    1111       Signed Multiply (Word by halfword)                 SMULWB, SMULWT on page A8-358
100   0x    not 1111   Signed Multiply Subtract Dual                      SMLSD on page A8-342
100   0x    1111       Signed Dual Multiply Subtract                      SMUSD on page A8-360
101   0x    not 1111   Signed Most Significant Word Multiply Accumulate   SMMLA on page A8-346
101   0x    1111       Signed Most Significant Word Multiply              SMMUL on page A8-350
110   0x    -          Signed Most Significant Word Multiply Subtract     SMMLS on page A8-348
111   00    not 1111   Unsigned Sum of Absolute Differences, Accumulate   USADA8 on page A8-502
111   00    1111       Unsigned Sum of Absolute Differences               USAD8 on page A8-500
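Most rows of Table A6-27 come in pairs in which Ra == 1111 selects the non-accumulating form. A sketch with illustrative names; strings such as `SMLA<x><y>` stand for the instruction families listed in the See column.

```python
def decode_multiply(hw1: int, hw2: int) -> str:
    """Look up Table A6-27. Assumes hw1[15:7] == 0b111110110."""
    op1 = (hw1 >> 4) & 0b111        # hw1 bits [6:4]
    ra = (hw2 >> 12) & 0xF          # hw2 bits [15:12]
    op2 = (hw2 >> 4) & 0b11         # hw2 bits [5:4]
    if (hw2 >> 6) & 0b11:           # hw2 bits [7:6] must be 00
        return "UNDEFINED"
    no_acc = ra == 0xF              # Ra == 1111 selects the non-accumulating form
    if op1 == 0b000:
        if op2 == 0b00:
            return "MUL" if no_acc else "MLA"
        return "MLS" if op2 == 0b01 else "UNDEFINED"
    if op1 == 0b001:                # op2 is '-' in this row
        return "SMUL<x><y>" if no_acc else "SMLA<x><y>"
    pairs = {0b010: ("SMLAD", "SMUAD"), 0b011: ("SMLAW<y>", "SMULW<y>"),
             0b100: ("SMLSD", "SMUSD"), 0b101: ("SMMLA", "SMMUL")}
    if op1 in pairs and (op2 & 0b10) == 0:          # op2 == 0x
        acc, plain = pairs[op1]
        return plain if no_acc else acc
    if op1 == 0b110 and (op2 & 0b10) == 0:
        return "SMMLS"
    if op1 == 0b111 and op2 == 0b00:
        return "USAD8" if no_acc else "USADA8"
    return "UNDEFINED"
```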


A6.3.17 Long multiply, long multiply accumulate, and divide
Encoding: first halfword [15:7] = 111110111, [6:4] = op1; second halfword [7:4] = op2.

Table A6-28 shows the allocation of encodings in this space. Other encodings in this space are UNDEFINED.
Table A6-28 Long multiply, long multiply accumulate, and divide operations

op1   op2    Instruction                                    See                                                  Variant
000   0000   Signed Multiply Long                           SMULL on page A8-356                                 v6T2
001   1111   Signed Divide                                  SDIV on page A8-310                                  v7-R (a)
010   0000   Unsigned Multiply Long                         UMULL on page A8-486                                 v6T2
011   1111   Unsigned Divide                                UDIV on page A8-468                                  v7-R (a)
100   0000   Signed Multiply Accumulate Long                SMLAL on page A8-334                                 v6T2
100   10xx   Signed Multiply Accumulate Long (Halfwords)    SMLALBB, SMLALBT, SMLALTB, SMLALTT on page A8-336    v6T2
100   110x   Signed Multiply Accumulate Long Dual           SMLALD on page A8-338                                v6T2
101   110x   Signed Multiply Subtract Long Dual             SMLSLD on page A8-344                                v6T2
110   0000   Unsigned Multiply Accumulate Long              UMLAL on page A8-484                                 v6T2
110   0110   Unsigned Multiply Accumulate Accumulate Long   UMAAL on page A8-482                                 v6T2

a. UNDEFINED in ARMv7-A.


A6.3.18 Coprocessor instructions
Encoding: first halfword [15:13] = 111, [11:10] = 11, [9:4] = op1, [3:0] = Rn; second halfword [11:8] = coproc, [4] = op.

Table A6-29 shows the allocation of encodings in this space. These encodings are all available in ARMv6T2
and above.
Table A6-29 Coprocessor instructions

op1                      op   coproc     Rn         Instructions                                      See
000x1x, 001xxx, 01xxxx   -    101x       -          Advanced SIMD, VFP                                Extension register load/store instructions on page A7-26
000x10, 001xx0, 01xxx0   -    not 101x   -          Store Coprocessor                                 STC, STC2 on page A8-372
000x11, 001xx1, 01xxx1   -    not 101x   not 1111   Load Coprocessor (immediate)                      LDC, LDC2 (immediate) on page A8-106
000x11, 001xx1, 01xxx1   -    not 101x   1111       Load Coprocessor (literal)                        LDC, LDC2 (literal) on page A8-108
00000x                   -    -          -          UNDEFINED                                         -
00010x                   -    101x       -          Advanced SIMD, VFP                                64-bit transfers between ARM core and extension registers on page A7-32
000100                   -    not 101x   -          Move to Coprocessor from two ARM core registers   MCRR, MCRR2 on page A8-188
000101                   -    not 101x   -          Move to two ARM core registers from Coprocessor   MRRC, MRRC2 on page A8-204
10xxxx                   0    101x       -          VFP                                               VFP data-processing instructions on page A7-24
10xxxx                   0    not 101x   -          Coprocessor data operations                       CDP, CDP2 on page A8-68
10xxxx                   1    101x       -          Advanced SIMD, VFP                                8, 16, and 32-bit transfer between ARM core and extension registers on page A7-31
10xxx0                   1    not 101x   -          Move to Coprocessor from ARM core register        MCR, MCR2 on page A8-186
10xxx1                   1    not 101x   -          Move to ARM core register from Coprocessor        MRC, MRC2 on page A8-202
11xxxx                   -    -          -          Advanced SIMD                                     Advanced SIMD data-processing instructions on page A7-10

For more information about specific coprocessors see Coprocessor support on page A2-68.
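In Table A6-29, coproc == 101x (coprocessors 10 and 11) diverts each op1 group to the Advanced SIMD and VFP encodings of Chapter A7. The following sketch of the lookup uses illustrative names and returned strings, and it assumes the fixed bits of the encoding diagram have already been matched.

```python
def decode_coprocessor(hw1: int, hw2: int) -> str:
    """Look up Table A6-29. Assumes hw1[15:13] == 0b111 and hw1[11:10] == 0b11."""
    op1 = (hw1 >> 4) & 0x3F             # hw1 bits [9:4]
    rn = hw1 & 0xF                      # hw1 bits [3:0]
    coproc = (hw2 >> 8) & 0xF           # hw2 bits [11:8]
    op = (hw2 >> 4) & 1                 # hw2 bit [4]
    ext = (coproc & 0b1110) == 0b1010   # coproc == 101x
    if (op1 & 0b110000) == 0b110000:    # 11xxxx
        return "Advanced SIMD data-processing"
    if (op1 & 0b111110) == 0b000000:    # 00000x
        return "UNDEFINED"
    if (op1 & 0b110000) == 0b100000:    # 10xxxx
        if op == 0:
            return "VFP data-processing" if ext else "CDP, CDP2"
        if ext:
            return "8, 16, and 32-bit transfer"
        return "MRC, MRC2" if op1 & 1 else "MCR, MCR2"
    if (op1 & 0b111110) == 0b000100:    # 00010x
        if ext:
            return "64-bit transfer"
        return "MRRC, MRRC2" if op1 & 1 else "MCRR, MCRR2"
    # Remaining op1 values are 000x1x, 001xxx and 01xxxx: load/store forms.
    if ext:
        return "Extension register load/store"
    if (op1 & 1) == 0:
        return "STC, STC2"
    return "LDC, LDC2 (literal)" if rn == 0xF else "LDC, LDC2 (immediate)"
```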


Chapter A7
Advanced SIMD and VFP Instruction Encoding

This chapter gives an overview of the Advanced SIMD and VFP instruction sets. It contains the following
sections:
• Overview on page A7-2
• Advanced SIMD and VFP instruction syntax on page A7-3
• Register encoding on page A7-8
• Advanced SIMD data-processing instructions on page A7-10
• VFP data-processing instructions on page A7-24
• Extension register load/store instructions on page A7-26
• Advanced SIMD element or structure load/store instructions on page A7-27
• 8, 16, and 32-bit transfer between ARM core and extension registers on page A7-31
• 64-bit transfers between ARM core and extension registers on page A7-32.

Note
• The Advanced SIMD architecture extension, its associated implementations, and supporting
software are commonly referred to as NEON™ technology.
• In the decode tables in this chapter, an entry of - for a field value means the value of the field does
not affect the decoding.


A7.1 Overview
All Advanced SIMD and VFP instructions are available in both ARM state and Thumb state.

A7.1.1 Advanced SIMD
The following sections describe the classes of instruction in the Advanced SIMD extension:
• Advanced SIMD data-processing instructions on page A7-10
• Advanced SIMD element or structure load/store instructions on page A7-27
• Extension register load/store instructions on page A7-26
• 8, 16, and 32-bit transfer between ARM core and extension registers on page A7-31
• 64-bit transfers between ARM core and extension registers on page A7-32.

A7.1.2 VFP
The following sections describe the classes of instruction in the VFP extension:
• Extension register load/store instructions on page A7-26
• 8, 16, and 32-bit transfer between ARM core and extension registers on page A7-31
• 64-bit transfers between ARM core and extension registers on page A7-32
• VFP data-processing instructions on page A7-24.


A7.2

Advanced SIMD and VFP instruction syntax
Advanced SIMD and VFP instructions use the general conventions of the ARM instruction set.
Advanced SIMD and VFP data-processing instructions use the following general format:
    V{<modifier>}<operation>{<shape>}{<c>}{<q>}{.<dt>} {<dest>,} <src1>, <src2>

All Advanced SIMD and VFP instructions begin with a V. This distinguishes Advanced SIMD vector and VFP instructions from ARM scalar instructions.

The main operation is specified in the <operation> field. It is usually a three-letter mnemonic that is the same as, or similar to, the corresponding scalar integer instruction.

The <c> and <q> fields are standard assembler syntax fields. For details see Standard assembler syntax fields on page A8-7.

A7.2.1 Advanced SIMD instruction modifiers

The <modifier> field provides additional variants of some instructions. Table A7-1 provides definitions of the modifiers. Modifiers are not available for every instruction.

Table A7-1 Advanced SIMD instruction modifiers

<modifier>  Meaning
Q           The operation uses saturating arithmetic.
R           The operation performs rounding.
D           The operation doubles the result (before accumulation, if any).
H           The operation halves the result.

A7.2.2 Advanced SIMD operand shapes

The <shape> field provides additional variants of some instructions. Table A7-2 provides definitions of the shapes. Operand shapes are not available for every instruction.

Table A7-2 Advanced SIMD operand shapes

<shape>  Meaning                                                              Typical register shape
(none)   The operands and result are all the same width.                      Dd, Dn, Dm  or  Qd, Qn, Qm
L        Long operation: the result is twice the width of both operands.      Qd, Dn, Dm
N        Narrow operation: the result is half the width of both operands.     Dd, Qn, Qm
W        Wide operation: the result and first operand are twice the width
         of the second operand.                                               Qd, Qn, Dm

A7.2.3 Data type specifiers

The <dt> field normally contains one data type specifier. This indicates the data type contained in:
• the second operand, if any
• the operand, if there is no second operand
• the result, if there are no operand registers.

The data types of the other operand and result are implied by the <dt> field combined with the instruction shape. For information about data type formats see Data types supported by the Advanced SIMD extension on page A2-25.

In the instruction syntax descriptions in Chapter A8 Instruction Details, the <dt> field is usually specified as a single field. However, where more convenient, it is sometimes specified as a concatenation of two fields, <type><size>.

Syntax flexibility

There is some flexibility in the data type specifier syntax:
• You can specify three data types, specifying the result and both operand data types. For example:
      VSUBW.I16.I16.S8 Q3,Q5,D0
  instead of:
      VSUBW.S8 Q3,Q5,D0
• You can specify two data types, specifying the data types of the two operands. The data type of the result is implied by the instruction shape.
• You can specify two data types, specifying the data types of the single operand and the result.
• Where an instruction requires a less specific data type, you can instead specify a more specific type, as shown in Table A7-3.
• Where an instruction does not require a data type, you can provide one.
• The F32 data type can be abbreviated to F.
• The F64 data type can be abbreviated to D.

In all cases, if you provide additional information, it must match the instruction shape. Disassembly does not regenerate this additional information.

Table A7-3 Data type specification flexibility

Specified data type  Permitted more specific data types
None                 Any
.I                   .S    .U
.8                   .I8   .S8   .U8   .P8
.16                  .I16  .S16  .U16  .P16  .F16
.32                  .I32  .S32  .U32  .F32 or .F
.64                  .I64  .S64  .U64  .F64 or .D

A7.2.4 Register specifiers

The <dest>, <src1>, and <src2> fields contain register specifiers, or in some cases scalar specifiers or register lists. Table A7-4 shows the register and scalar specifier formats that appear in the instruction descriptions. If <dest> is omitted, it is the same as <src1>.
Table A7-4 Advanced SIMD and VFP register specifier formats

<specifier>  Usual meaning (a)
<Qd>         A quadword destination register for the result vector (Advanced SIMD only).
<Qn>         A quadword source register for the first operand vector (Advanced SIMD only).
<Qm>         A quadword source register for the second operand vector (Advanced SIMD only).
<Dd>         A doubleword destination register for the result vector.
<Dn>         A doubleword source register for the first operand vector.
<Dm>         A doubleword source register for the second operand vector.
<Sd>         A singleword destination register for the result vector (VFP only).
<Sn>         A singleword source register for the first operand vector (VFP only).
<Sm>         A singleword source register for the second operand vector (VFP only).
<Dd[x]>      A destination scalar for the result. Element x of vector <Dd>. (Advanced SIMD only)
<Dn[x]>      A source scalar for the first operand. Element x of vector <Dn>. (Advanced SIMD only)
<Dm[x]>      A source scalar for the second operand. Element x of vector <Dm>. (Advanced SIMD only)
<Rt>         An ARM core register. Can be source or destination.
<Rt2>        An ARM core register. Can be source or destination.

a. In some instructions the roles of registers are different.

A7.2.5 Register lists

A register list is a list of register specifiers separated by commas and enclosed in braces, { and }. There are restrictions on which registers can appear in a register list. These restrictions are described in the individual instruction descriptions. Table A7-5 shows some register list formats, with examples of actual register lists corresponding to those formats.

Note
Register lists must not wrap around the end of the register bank.

Syntax flexibility

There is some flexibility in the register list syntax:
• Where a register list contains consecutive registers, they can be specified as a range, instead of listing every register, for example {D0-D3} instead of {D0,D1,D2,D3}.
• Where a register list contains an even number of consecutive doubleword registers starting with an even-numbered register, it can be written as a list of quadword registers instead, for example {Q1,Q2} instead of {D2-D5}.
• Where a register list contains only one register, the enclosing braces can be omitted, for example VLD1.8 D0,[R0] instead of VLD1.8 {D0},[R0].

Table A7-5 Example register lists

Format                  Example      Alternative
{<Dd>}                  {D3}         D3
{<Dd>,<Dd+1>,<Dd+2>}    {D3,D4,D5}   {D3-D5}
{<Dd[]>}                {D7[]}       D7[]

A7.3 Register encoding

Advanced SIMD registers are either quadword (128 bits wide) or doubleword (64 bits wide). Some instructions have options for either doubleword or quadword registers. This is normally encoded in Q (bit [6]) as Q = 0 for doubleword operations, Q = 1 for quadword operations.

VFP registers are either double-precision (64 bits wide) or single-precision (32 bits wide). This is encoded in the sz field (bit [8]) as sz = 1 for double-precision operations, or sz = 0 for single-precision operations.

Some instructions use only one or two registers, and use the unused register fields as additional opcode bits. Table A7-6 shows the encodings for the registers.

Register fields (ARM bit numbering; in the Thumb encoding the first halfword holds bits [31:16] and the second halfword bits [15:0]): D = bit [22], Vn = bits [19:16], Vd = bits [15:12], sz = bit [8], N = bit [7], Q = bit [6], M = bit [5], Vm = bits [3:0].

Table A7-6 Encoding of register numbers

Register  Usual usage                        Register number encoded in   Notes (a)       Used in
mnemonic
<Qd>      Destination (quadword)             D, Vd (bits [22,15:13])      bit [12] == 0   Adv. SIMD
<Qn>      First operand (quadword)           N, Vn (bits [7,19:17])       bit [16] == 0   Adv. SIMD
<Qm>      Second operand (quadword)          M, Vm (bits [5,3:1])         bit [0] == 0    Adv. SIMD
<Dd>      Destination (doubleword)           D, Vd (bits [22,15:12])      -               Both
<Dn>      First operand (doubleword)         N, Vn (bits [7,19:16])       -               Both
<Dm>      Second operand (doubleword)        M, Vm (bits [5,3:0])         -               Both
<Sd>      Destination (single-precision)     Vd, D (bits [15:12,22])      -               VFP
<Sn>      First operand (single-precision)   Vn, N (bits [19:16,7])       -               VFP
<Sm>      Second operand (single-precision)  Vm, M (bits [3:0,5])         -               VFP

a. If one of these bits is 1, the instruction is UNDEFINED.

A7.3.1 Advanced SIMD scalars

Advanced SIMD scalars can be 8-bit, 16-bit, 32-bit, or 64-bit. Instructions other than multiply instructions can access any element in the register set. The instruction syntax refers to the scalars using an index into a doubleword vector. The descriptions of the individual instructions contain details of the encodings.

Table A7-7 shows the form of encoding for scalars used in multiply instructions. These instructions cannot access scalars in some registers. The descriptions of the individual instructions contain cross references to this section where appropriate.

32-bit Advanced SIMD scalars, when used as single-precision floating-point numbers, are equivalent to VFP single-precision registers. That is, Dm[x] in a 32-bit context (0 <= m <= 15, 0 <= x <= 1) is equivalent to S[2m + x].

Table A7-7 Encoding of scalars in multiply instructions

Scalar    Usual usage     Scalar size  Register specifier  Index specifier  Accessible registers
mnemonic
<Dm[x]>   Second operand  16-bit       Vm[2:0]             M, Vm[3]         D0-D7
                          32-bit       Vm[3:0]             M                D0-D15
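As a cross-check of Tables A7-6 and A7-7, the Python sketch below computes register numbers and multiply-instruction scalars from the encoding fields. The helper names dreg, qreg, sreg, mul_scalar and scalar_to_sreg are hypothetical, not part of the manual's pseudocode, and raising ValueError stands in for UNDEFINED behavior.

```python
# Sketch of the register-number rules in Table A7-6 and the multiply-scalar
# rules in Table A7-7. All function names are hypothetical helpers.


def dreg(bit: int, field: int) -> int:
    """Doubleword register: the single bit (D, N or M) is the MSB, e.g. D:Vd."""
    return (bit << 4) | field


def qreg(bit: int, field: int) -> int:
    """Quadword register from D:Vd<3:1>; Vd<0> must be 0."""
    if field & 1:
        # Table A7-6, note a: the encoding is UNDEFINED.
        raise ValueError("UNDEFINED: low bit of the quadword register field is 1")
    return ((bit << 4) | field) >> 1


def sreg(bit: int, field: int) -> int:
    """Single-precision register: Vd:D, so the single bit is the LSB."""
    return (field << 1) | bit


def mul_scalar(m: int, vm: int, size: int) -> tuple[int, int]:
    """Return (register number, index) of a multiply-instruction scalar Dm[x]."""
    if size == 16:
        # Register from Vm[2:0] (D0-D7), index from M:Vm[3].
        return vm & 0b111, (m << 1) | (vm >> 3)
    if size == 32:
        # Register from Vm[3:0] (D0-D15), index from M.
        return vm, m
    raise ValueError("multiply-instruction scalars are 16-bit or 32-bit")


def scalar_to_sreg(m: int, x: int) -> int:
    """A 32-bit scalar Dm[x] (0 <= m <= 15, 0 <= x <= 1) aliases S[2m + x]."""
    if not (0 <= m <= 15 and 0 <= x <= 1):
        raise ValueError("scalar out of range")
    return 2 * m + x


# With D = 1 and Vd = 0b0100 the same fields name D20, Q10 or S9.
print(dreg(1, 0b0100), qreg(1, 0b0100), sreg(1, 0b0100))
```

The example shows why the bit ordering differs: for D and Q registers the extra bit extends the register number upwards, while for S registers it selects the low or high half of a D register.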
A7.4 Advanced SIMD data-processing instructions

Encoding (bit numbers treat a Thumb instruction as one word with the first halfword in bits [31:16]): Thumb bits [31:24] = 111U 1111; ARM bits [31:24] = 1111 001U. In both encodings, A = bits [23:19], B = bits [11:8], and C = bits [7:4].

Table A7-8 shows the encoding for Advanced SIMD data-processing instructions. Other encodings in this space are UNDEFINED.

In these instructions, the U bit is in a different location in ARM and Thumb instructions. This is bit [12] of the first halfword in the Thumb encoding, and bit [24] in the ARM encoding. Other variable bits are in identical locations in the two encodings, after adjusting for the fact that the ARM encoding is held in memory as a single word and the Thumb encoding is held as two consecutive halfwords.

The ARM instructions can only be executed unconditionally. The Thumb instructions can be executed conditionally by using the IT instruction. For details see IT on page A8-104.

Table A7-8 Data-processing instructions

U  A      B     C     See
-  0xxxx  -     -     Three registers of the same length on page A7-12
-  1x000  -     0xx1  One register and a modified immediate value on page A7-21
-  1x001  -     0xx1  Two registers and a shift amount on page A7-17
-  1x01x  -     0xx1  Two registers and a shift amount on page A7-17
-  1x1xx  -     0xx1  Two registers and a shift amount on page A7-17
-  1xxxx  -     1xx1  Two registers and a shift amount on page A7-17
-  1x0xx  -     x0x0  Three registers of different lengths on page A7-15
-  1x10x  -     x0x0  Three registers of different lengths on page A7-15
-  1x0xx  -     x1x0  Two registers and a scalar on page A7-16
-  1x10x  -     x1x0  Two registers and a scalar on page A7-16
0  1x11x  -     xxx0  Vector Extract, VEXT on page A8-598
1  1x11x  0xxx  xxx0  Two registers, miscellaneous on page A7-19
1  1x11x  10xx  xxx0  Vector Table Lookup, VTBL, VTBX on page A8-798
1  1x11x  1100  0xx0  Vector Duplicate, VDUP (scalar) on page A8-592
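To make the Table A7-8 decode concrete, here is a small Python sketch that extracts U, A, B and C from an instruction word and returns the first matching row. It assumes the word is already known to lie in the Advanced SIMD data-processing space, and it represents a Thumb instruction as a 32-bit value with the first halfword in bits [31:16]. The names SIMD_GROUPS, matches and classify are hypothetical.

```python
# Rows of Table A7-8: (U, A, B, C, result). None means "any value";
# 'x' in a pattern is a don't-care bit.
SIMD_GROUPS = [
    (None, "0xxxx", None, None, "Three registers of the same length"),
    (None, "1x000", None, "0xx1", "One register and a modified immediate value"),
    (None, "1x001", None, "0xx1", "Two registers and a shift amount"),
    (None, "1x01x", None, "0xx1", "Two registers and a shift amount"),
    (None, "1x1xx", None, "0xx1", "Two registers and a shift amount"),
    (None, "1xxxx", None, "1xx1", "Two registers and a shift amount"),
    (None, "1x0xx", None, "x0x0", "Three registers of different lengths"),
    (None, "1x10x", None, "x0x0", "Three registers of different lengths"),
    (None, "1x0xx", None, "x1x0", "Two registers and a scalar"),
    (None, "1x10x", None, "x1x0", "Two registers and a scalar"),
    (0, "1x11x", None, "xxx0", "VEXT"),
    (1, "1x11x", "0xxx", "xxx0", "Two registers, miscellaneous"),
    (1, "1x11x", "10xx", "xxx0", "VTBL, VTBX"),
    (1, "1x11x", "1100", "0xx0", "VDUP (scalar)"),
]


def matches(pattern: str, value: int) -> bool:
    """True if value matches a bit pattern such as '1x0xx'."""
    width = len(pattern)
    for i, ch in enumerate(pattern):
        bit = (value >> (width - 1 - i)) & 1
        if ch != "x" and int(ch) != bit:
            return False
    return True


def classify(word: int, thumb: bool = False) -> str:
    """Return the Table A7-8 entry for an Advanced SIMD data-processing word."""
    u = (word >> (28 if thumb else 24)) & 1  # Thumb: bit [12] of first halfword
    a = (word >> 19) & 0b11111               # A = bits [23:19]
    b = (word >> 8) & 0b1111                 # B = bits [11:8]
    c = (word >> 4) & 0b1111                 # C = bits [7:4]
    for u_want, a_pat, b_pat, c_pat, name in SIMD_GROUPS:
        if u_want is not None and u_want != u:
            continue
        if not matches(a_pat, a):
            continue
        if b_pat is not None and not matches(b_pat, b):
            continue
        if c_pat is not None and not matches(c_pat, c):
            continue
        return name
    return "UNDEFINED"


# VADD.I32 D0, D1, D2 in its ARM (0xF2210802) and Thumb (0xEF210802) forms.
print(classify(0xF2210802))
print(classify(0xEF210802, thumb=True))
```

Only the U extraction depends on the instruction set, which mirrors the note above that all other variable bits sit in the same places once the Thumb halfwords are combined.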
A7.4.1 Three registers of the same length

Encoding: Thumb bits [31:23] = 111U 1111 0; ARM bits [31:23] = 1111 001U 0. C = bits [21:20], A = bits [11:8], B = bit [4].

Table A7-9 shows the allocation of encodings in this space. Other encodings in this space are UNDEFINED.

Table A7-9 Three registers of the same length

A     B  U  C   Instruction                                        See
0000  0  -  -   Vector Halving Add                                 VHADD, VHSUB on page A8-600
0000  1  -  -   Vector Saturating Add                              VQADD on page A8-700
0001  0  -  -   Vector Rounding Halving Add                        VRHADD on page A8-734
0001  1  0  00  Vector Bitwise AND                                 VAND (register) on page A8-544
0001  1  0  01  Vector Bitwise Bit Clear (AND complement)          VBIC (register) on page A8-548
0001  1  0  10  Vector Bitwise OR (if source registers differ)     VORR (register) on page A8-680
                Vector Move (if source registers identical)        VMOV (register) on page A8-642
0001  1  0  11  Vector Bitwise OR NOT                              VORN (register) on page A8-676
0001  1  1  00  Vector Bitwise Exclusive OR                        VEOR on page A8-596
0001  1  1  01  Vector Bitwise Select                              VBIF, VBIT, VBSL on page A8-550
0001  1  1  10  Vector Bitwise Insert if True                      VBIF, VBIT, VBSL on page A8-550
0001  1  1  11  Vector Bitwise Insert if False                     VBIF, VBIT, VBSL on page A8-550
0010  0  -  -   Vector Halving Subtract                            VHADD, VHSUB on page A8-600
0010  1  -  -   Vector Saturating Subtract                         VQSUB on page A8-724
0011  0  -  -   Vector Compare Greater Than                        VCGT (register) on page A8-560
0011  1  -  -   Vector Compare Greater Than or Equal               VCGE (register) on page A8-556
0100  0  -  -   Vector Shift Left                                  VSHL (register) on page A8-752
0100  1  -  -   Vector Saturating Shift Left                       VQSHL (register) on page A8-718
0101  0  -  -   Vector Rounding Shift Left                         VRSHL on page A8-736
0101  1  -  -   Vector Saturating Rounding Shift Left              VQRSHL on page A8-714
0110  -  -  -   Vector Maximum or Minimum                          VMAX, VMIN (integer) on page A8-630
0111  0  -  -   Vector Absolute Difference                         VABD, VABDL (integer) on page A8-528
0111  1  -  -   Vector Absolute Difference and Accumulate          VABA, VABAL on page A8-526
1000  0  0  -   Vector Add                                         VADD (integer) on page A8-536
1000  0  1  -   Vector Subtract                                    VSUB (integer) on page A8-788
1000  1  0  -   Vector Test Bits                                   VTST on page A8-802
1000  1  1  -   Vector Compare Equal                               VCEQ (register) on page A8-552
1001  0  -  -   Vector Multiply Accumulate or Subtract             VMLA, VMLAL, VMLS, VMLSL (integer) on page A8-634
1001  1  -  -   Vector Multiply                                    VMUL, VMULL (integer and polynomial) on page A8-662
1010  -  -  -   Vector Pairwise Maximum or Minimum                 VPMAX, VPMIN (integer) on page A8-690
1011  0  0  -   Vector Saturating Doubling Multiply Returning
                High Half                                          VQDMULH on page A8-704
1011  0  1  -   Vector Saturating Rounding Doubling Multiply
                Returning High Half                                VQRDMULH on page A8-712
1011  1  0  -   Vector Pairwise Add                                VPADD (integer) on page A8-684
1101  0  0  0x  Vector Add                                         VADD (floating-point) on page A8-538
1101  0  0  1x  Vector Subtract                                    VSUB (floating-point) on page A8-790
1101  0  1  0x  Vector Pairwise Add                                VPADD (floating-point) on page A8-686
1101  0  1  1x  Vector Absolute Difference                         VABD (floating-point) on page A8-530
1101  1  0  -   Vector Multiply Accumulate or Subtract             VMLA, VMLS (floating-point) on page A8-636
1101  1  1  0x  Vector Multiply                                    VMUL (floating-point) on page A8-664
1110  0  0  0x  Vector Compare Equal                               VCEQ (register) on page A8-552
1110  0  1  0x  Vector Compare Greater Than or Equal               VCGE (register) on page A8-556
1110  0  1  1x  Vector Compare Greater Than                        VCGT (register) on page A8-560
1110  1  1  -   Vector Absolute Compare Greater or Less Than
                (or Equal)                                         VACGE, VACGT, VACLE, VACLT on page A8-534
1111  0  0  -   Vector Maximum or Minimum                          VMAX, VMIN (floating-point) on page A8-632
1111  0  1  -   Vector Pairwise Maximum or Minimum                 VPMAX, VPMIN (floating-point) on page A8-692
1111  1  0  0x  Vector Reciprocal Step                             VRECPS on page A8-730
1111  1  0  1x  Vector Reciprocal Square Root Step                 VRSQRTS on page A8-744

A7.4.2 Three registers of different lengths

Encoding: Thumb bits [31:23] = 111U 1111 1; ARM bits [31:23] = 1111 001U 1. B = bits [21:20], A = bits [11:8], bit [6] = 0, bit [4] = 0.

If B == 0b11, see Advanced SIMD data-processing instructions on page A7-10.

Table A7-10 shows the allocation of encodings in this space. Other encodings in this space are UNDEFINED.
Table A7-10 Data-processing instructions with three registers of different lengths

A     U  Instruction                                             See
000x  -  Vector Add Long or Wide                                 VADDL, VADDW on page A8-542
001x  -  Vector Subtract Long or Wide                            VSUBL, VSUBW on page A8-794
0100  0  Vector Add and Narrow, returning High Half              VADDHN on page A8-540
0100  1  Vector Rounding Add and Narrow, returning High Half     VRADDHN on page A8-726
0101  -  Vector Absolute Difference and Accumulate               VABA, VABAL on page A8-526
0110  0  Vector Subtract and Narrow, returning High Half         VSUBHN on page A8-792
0110  1  Vector Rounding Subtract and Narrow, returning
         High Half                                               VRSUBHN on page A8-748
0111  -  Vector Absolute Difference                              VABD, VABDL (integer) on page A8-528
10x0  -  Vector Multiply Accumulate or Subtract                  VMLA, VMLAL, VMLS, VMLSL (integer) on page A8-634
10x1  0  Vector Saturating Doubling Multiply Accumulate or
         Subtract Long                                           VQDMLAL, VQDMLSL on page A8-702
1100  -  Vector Multiply (integer)                               VMUL, VMULL (integer and polynomial) on page A8-662
1101  0  Vector Saturating Doubling Multiply Long                VQDMULL on page A8-706
1110  -  Vector Multiply (polynomial)                            VMUL, VMULL (integer and polynomial) on page A8-662

A7.4.3 Two registers and a scalar

Encoding: Thumb bits [31:23] = 111U 1111 1; ARM bits [31:23] = 1111 001U 1. B = bits [21:20], A = bits [11:8], bit [6] = 1, bit [4] = 0.

If B == 0b11, see Advanced SIMD data-processing instructions on page A7-10.

Table A7-11 shows the allocation of encodings in this space. Other encodings in this space are UNDEFINED.
Table A7-11 Data-processing instructions with two registers and a scalar

A     U  Instruction                                             See
0x0x  -  Vector Multiply Accumulate or Subtract                  VMLA, VMLAL, VMLS, VMLSL (by scalar) on page A8-638
0x10  -  Vector Multiply Accumulate or Subtract Long             VMLA, VMLAL, VMLS, VMLSL (by scalar) on page A8-638
0x11  0  Vector Saturating Doubling Multiply Accumulate or
         Subtract Long                                           VQDMLAL, VQDMLSL on page A8-702
100x  -  Vector Multiply                                         VMUL, VMULL (by scalar) on page A8-666
1010  -  Vector Multiply Long                                    VMUL, VMULL (by scalar) on page A8-666
1011  0  Vector Saturating Doubling Multiply Long                VQDMULL on page A8-706
1100  -  Vector Saturating Doubling Multiply returning High Half VQDMULH on page A8-704
1101  -  Vector Saturating Rounding Doubling Multiply
         returning High Half                                     VQRDMULH on page A8-712

A7.4.4 Two registers and a shift amount

Encoding: Thumb bits [31:23] = 111U 1111 1; ARM bits [31:23] = 1111 001U 1. imm3 = bits [21:19], A = bits [11:8], L = bit [7], B = bit [6], bit [4] = 1.

If [L, imm3] == 0b0000, see One register and a modified immediate value on page A7-21.

Table A7-12 shows the allocation of encodings in this space. Other encodings in this space are UNDEFINED.
Table A7-12 Data-processing instructions with two registers and a shift amount

A     U  B  L  Instruction                                           See
0000  -  -  -  Vector Shift Right                                    VSHR on page A8-756
0001  -  -  -  Vector Shift Right and Accumulate                     VSRA on page A8-764
0010  -  -  -  Vector Rounding Shift Right                           VRSHR on page A8-738
0011  -  -  -  Vector Rounding Shift Right and Accumulate            VRSRA on page A8-746
0100  1  -  -  Vector Shift Right and Insert                         VSRI on page A8-766
0101  0  -  -  Vector Shift Left                                     VSHL (immediate) on page A8-750
0101  1  -  -  Vector Shift Left and Insert                          VSLI on page A8-760
011x  -  -  -  Vector Saturating Shift Left                          VQSHL, VQSHLU (immediate) on page A8-720
1000  0  0  0  Vector Shift Right Narrow                             VSHRN on page A8-758
1000  0  1  -  Vector Rounding Shift Right Narrow                    VRSHRN on page A8-740
1000  1  0  -  Vector Saturating Shift Right, Unsigned Narrow        VQSHRN, VQSHRUN on page A8-722
1000  1  1  -  Vector Saturating Shift Right, Rounded Unsigned
               Narrow                                                VQRSHRN, VQRSHRUN on page A8-716
1001  -  0  -  Vector Saturating Shift Right, Narrow                 VQSHRN, VQSHRUN on page A8-722
1001  -  1  -  Vector Saturating Shift Right, Rounded Narrow         VQRSHRN, VQRSHRUN on page A8-716
1010  -  0  -  Vector Shift Left Long                                VSHLL on page A8-754
               Vector Move Long                                      VMOVL on page A8-654
111x  -  -  -  Vector Convert                                        VCVT (between floating-point and fixed-point, Advanced SIMD) on page A8-580
A7.4.5 Two registers, miscellaneous

Encoding: Thumb bits [31:23] = 1111 1111 1; ARM bits [31:23] = 1111 0011 1. Bits [21:20] = 11, A = bits [17:16], bit [11] = 0, B = bits [10:6], bit [4] = 0.

The allocation of encodings in this space is shown in Table A7-13. Other encodings in this space are UNDEFINED.

Table A7-13 Instructions with two registers, miscellaneous

A   B      Instruction                                  See
00  0000x  Vector Reverse in doublewords                VREV16, VREV32, VREV64 on page A8-732
00  0001x  Vector Reverse in words                      VREV16, VREV32, VREV64 on page A8-732
00  0010x  Vector Reverse in halfwords                  VREV16, VREV32, VREV64 on page A8-732
00  010xx  Vector Pairwise Add Long                     VPADDL on page A8-688
00  1000x  Vector Count Leading Sign Bits               VCLS on page A8-566
00  1001x  Vector Count Leading Zeros                   VCLZ on page A8-570
00  1010x  Vector Count                                 VCNT on page A8-574
00  1011x  Vector Bitwise NOT                           VMVN (register) on page A8-670
00  110xx  Vector Pairwise Add and Accumulate Long      VPADAL on page A8-682
00  1110x  Vector Saturating Absolute                   VQABS on page A8-698
00  1111x  Vector Saturating Negate                     VQNEG on page A8-710
01  x000x  Vector Compare Greater Than Zero             VCGT (immediate #0) on page A8-562
01  x001x  Vector Compare Greater Than or Equal to Zero VCGE (immediate #0) on page A8-558
01  x010x  Vector Compare Equal to Zero                 VCEQ (immediate #0) on page A8-554
01  x011x  Vector Compare Less Than or Equal to Zero    VCLE (immediate #0) on page A8-564
01  x100x  Vector Compare Less Than Zero                VCLT (immediate #0) on page A8-568
01  x110x  Vector Absolute                              VABS on page A8-532
01  x111x  Vector Negate                                VNEG on page A8-672
10  0000x  Vector Swap                                  VSWP on page A8-796
10  0001x  Vector Transpose                             VTRN on page A8-800
10  0010x  Vector Unzip                                 VUZP on page A8-804
10  0011x  Vector Zip                                   VZIP on page A8-806
10  01000  Vector Move and Narrow                       VMOVN on page A8-656
10  01001  Vector Saturating Move and Unsigned Narrow   VQMOVN, VQMOVUN on page A8-708
10  0101x  Vector Saturating Move and Narrow            VQMOVN, VQMOVUN on page A8-708
10  01100  Vector Shift Left Long (maximum shift)       VSHLL on page A8-754
10  11x00  Vector Convert                               VCVT (between half-precision and single-precision, Advanced SIMD) on page A8-586
11  10x0x  Vector Reciprocal Estimate                   VRECPE on page A8-728
11  10x1x  Vector Reciprocal Square Root Estimate       VRSQRTE on page A8-742
11  11xxx  Vector Convert                               VCVT (between floating-point and integer, Advanced SIMD) on page A8-576

A7.4.6 One register and a modified immediate value

Encoding: Thumb bits [31:23] = 111a 1111 1; ARM bits [31:23] = 1111 001a 1. Bits [21:19] = 000, b c d = bits [18:16], cmode = bits [11:8], bit [7] = 0, op = bit [5], bit [4] = 1, e f g h = bits [3:0].

Table A7-14 shows the allocation of encodings in this space.
Table A7-15 on page A7-22 shows the modified immediate constants available with these instructions, and how they are encoded.

Table A7-14 Data-processing instructions with one register and a modified immediate value

op  cmode  Instruction         See
0   0xx0   Vector Move         VMOV (immediate) on page A8-640
0   0xx1   Vector Bitwise OR   VORR (immediate) on page A8-678
0   10x0   Vector Move         VMOV (immediate) on page A8-640
0   10x1   Vector Bitwise OR   VORR (immediate) on page A8-678
0   11xx   Vector Move         VMOV (immediate) on page A8-640
1   0xx0   Vector Bitwise NOT  VMVN (immediate) on page A8-668
1   0xx1   Vector Bit Clear    VBIC (immediate) on page A8-546
1   10x0   Vector Bitwise NOT  VMVN (immediate) on page A8-668
1   10x1   Vector Bit Clear    VBIC (immediate) on page A8-546
1   110x   Vector Bitwise NOT  VMVN (immediate) on page A8-668
1   1110   Vector Move         VMOV (immediate) on page A8-640
1   1111   UNDEFINED           -

Table A7-15 Modified immediate values for Advanced SIMD instructions

op  cmode  Constant (a)                                                             <dt> (b)  Notes
-   000x   00000000 00000000 00000000 abcdefgh 00000000 00000000 00000000 abcdefgh  I32       c
-   001x   00000000 00000000 abcdefgh 00000000 00000000 00000000 abcdefgh 00000000  I32       c, d
-   010x   00000000 abcdefgh 00000000 00000000 00000000 abcdefgh 00000000 00000000  I32       c, d
-   011x   abcdefgh 00000000 00000000 00000000 abcdefgh 00000000 00000000 00000000  I32       c, d
-   100x   00000000 abcdefgh 00000000 abcdefgh 00000000 abcdefgh 00000000 abcdefgh  I16       c
-   101x   abcdefgh 00000000 abcdefgh 00000000 abcdefgh 00000000 abcdefgh 00000000  I16       c, d
-   1100   00000000 00000000 abcdefgh 11111111 00000000 00000000 abcdefgh 11111111  I32       d, e
-   1101   00000000 abcdefgh 11111111 11111111 00000000 abcdefgh 11111111 11111111  I32       d, e
0   1110   abcdefgh abcdefgh abcdefgh abcdefgh abcdefgh abcdefgh abcdefgh abcdefgh  I8        f
1   1110   aaaaaaaa bbbbbbbb cccccccc dddddddd eeeeeeee ffffffff gggggggg hhhhhhhh  I64       f
0   1111   aBbbbbbc defgh000 00000000 00000000 aBbbbbbc defgh000 00000000 00000000  F32       f, g
1   1111   UNDEFINED                                                                -         -

a. In this table, the immediate value is shown in binary form, to relate abcdefgh to the encoding diagram. In assembler syntax, the constant is specified by a data type and a value of that type. That value is specified in the normal way (a decimal number by default) and is replicated enough times to fill the 64-bit immediate. For example, a data type of I32 and a value of 10 specify the 64-bit constant 0x0000000A0000000A.
b. This specifies the data type used when the instruction is disassembled. On assembly, the data type must be matched in the table if possible. Other data types are permitted as pseudo-instructions when code is assembled, provided the 64-bit constant specified by the data type and value is available for the instruction (if it is available in more than one way, the first entry in this table that can produce it is used). For example, VMOV.I64 D0,#0x8000000080000000 does not specify a 64-bit constant that is available from the I64 line of the table, but does specify one that is available from the fourth I32 line (cmode = 011x). It is therefore assembled using that line, and is disassembled as VMOV.I32 D0,#0x80000000.
c. This constant is available for the VBIC, VMOV, VMVN, and VORR instructions.
d. UNPREDICTABLE if abcdefgh == 00000000.
e. This constant is available for the VMOV and VMVN instructions only.
f. This constant is available for the VMOV instruction only.
g. In this entry, B = NOT(b). The bit pattern represents the floating-point number $(-1)^{S} \times 2^{exp} \times mantissa$, where $S = \mathrm{UInt}(a)$, $exp = \mathrm{UInt}(\mathrm{NOT}(b){:}c{:}d) - 3$ and $mantissa = (16 + \mathrm{UInt}(e{:}f{:}g{:}h))/16$.

Operation

```
// AdvSIMDExpandImm()
// ==================

bits(64) AdvSIMDExpandImm(bit op, bits(4) cmode, bits(8) imm8)
    case cmode<3:1> of
        when '000'
            testimm8 = FALSE; imm64 = Replicate(Zeros(24):imm8, 2);
        when '001'
            testimm8 = TRUE;  imm64 = Replicate(Zeros(16):imm8:Zeros(8), 2);
        when '010'
            testimm8 = TRUE;  imm64 = Replicate(Zeros(8):imm8:Zeros(16), 2);
        when '011'
            testimm8 = TRUE;  imm64 = Replicate(imm8:Zeros(24), 2);
        when '100'
            testimm8 = FALSE; imm64 = Replicate(Zeros(8):imm8, 4);
        when '101'
            testimm8 = TRUE;  imm64 = Replicate(imm8:Zeros(8), 4);
        when '110'
            testimm8 = TRUE;
            if cmode<0> == '0' then
                imm64 = Replicate(Zeros(16):imm8:Ones(8), 2);
            else
                imm64 = Replicate(Zeros(8):imm8:Ones(16), 2);
        when '111'
            testimm8 = FALSE;
            if cmode<0> == '0' && op == '0' then
                imm64 = Replicate(imm8, 8);
            if cmode<0> == '0' && op == '1' then
                imm8a = Replicate(imm8<7>, 8); imm8b = Replicate(imm8<6>, 8);
                imm8c = Replicate(imm8<5>, 8); imm8d = Replicate(imm8<4>, 8);
                imm8e = Replicate(imm8<3>, 8); imm8f = Replicate(imm8<2>, 8);
                imm8g = Replicate(imm8<1>, 8); imm8h = Replicate(imm8<0>, 8);
                imm64 = imm8a:imm8b:imm8c:imm8d:imm8e:imm8f:imm8g:imm8h;
            if cmode<0> == '1' && op == '0' then
                imm32 = imm8<7>:NOT(imm8<6>):Replicate(imm8<6>,5):imm8<5:0>:Zeros(19);
                imm64 = Replicate(imm32, 2);
            if cmode<0> == '1' && op == '1' then
                UNDEFINED;

    if testimm8 && imm8 == '00000000' then
        UNPREDICTABLE;

    return imm64;
```

A7.5 VFP data-processing instructions

Encoding: Thumb bits [31:24] = 111T 1110; ARM bits [31:28] = cond, bits [27:24] = 1110. opc1 = bits [23:20], opc2 = bits [19:16], bits [11:9] = 101, opc3 = bits [7:6], bit [4] = 0, opc4 = bits [3:0].

If T == 1 in the Thumb encoding or cond == 0b1111 in the ARM encoding, the instruction is UNDEFINED. Otherwise:
• Table A7-16 shows the encodings for three-register VFP data-processing instructions. Other encodings in this space are UNDEFINED.
• Table A7-17 on page A7-25 applies only if Table A7-16 indicates that it does. It shows the encodings for VFP data-processing instructions with two registers or a register and an immediate. Other encodings in this space are UNDEFINED.
• Table A7-18 on page A7-25 shows the immediate constants available in the VMOV (immediate) instruction.

These instructions are CDP instructions for coprocessors 10 and 11.
Table A7-16 Three-register VFP data-processing instructions

opc1  opc3  Instruction                                     See
0x00  -     Vector Multiply Accumulate or Subtract          VMLA, VMLS (floating-point) on page A8-636
0x01  -     Vector Negate Multiply Accumulate or Subtract   VNMLA, VNMLS, VNMUL on page A8-674
0x10  x1    Vector Negate Multiply Accumulate or Subtract   VNMLA, VNMLS, VNMUL on page A8-674
0x10  x0    Vector Multiply                                 VMUL (floating-point) on page A8-664
0x11  x0    Vector Add                                      VADD (floating-point) on page A8-538
0x11  x1    Vector Subtract                                 VSUB (floating-point) on page A8-790
1x00  x0    Vector Divide                                   VDIV on page A8-590
1x11  -     Other VFP data-processing instructions          Table A7-17 on page A7-25

Table A7-17 Other VFP data-processing instructions

opc2  opc3  Instruction          See
-     x0    Vector Move          VMOV (immediate) on page A8-640
0000  01    Vector Move          VMOV (register) on page A8-642
0000  11    Vector Absolute      VABS on page A8-532
0001  01    Vector Negate        VNEG on page A8-672
0001  11    Vector Square Root   VSQRT on page A8-762
001x  x1    Vector Convert       VCVTB, VCVTT (between half-precision and single-precision, VFP) on page A8-588
010x  x1    Vector Compare       VCMP, VCMPE on page A8-572
0111  11    Vector Convert       VCVT (between double-precision and single-precision) on page A8-584
1000  x1    Vector Convert       VCVT, VCVTR (between floating-point and integer, VFP) on page A8-578
101x  x1    Vector Convert       VCVT (between floating-point and fixed-point, VFP) on page A8-582
110x  x1    Vector Convert       VCVT, VCVTR (between floating-point and integer, VFP) on page A8-578
111x  x1    Vector Convert       VCVT (between floating-point and fixed-point, VFP) on page A8-582

Table A7-18 VFP modified immediate constants

Data type  opc2  opc4  Constant (a)
F32        abcd  efgh  aBbbbbbc defgh000 00000000 00000000
F64        abcd  efgh  aBbbbbbb bbcdefgh 00000000 00000000 00000000 00000000 00000000 00000000

a. In this column, B = NOT(b). The bit pattern represents the floating-point number $(-1)^{S} \times 2^{exp} \times mantissa$, where $S = \mathrm{UInt}(a)$, $exp = \mathrm{UInt}(\mathrm{NOT}(b){:}c{:}d) - 3$ and $mantissa = (16 + \mathrm{UInt}(e{:}f{:}g{:}h))/16$.
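The immediate-expansion rules in Tables A7-15 and A7-18 can be cross-checked with a short Python sketch that renders VFPExpandImm() and AdvSIMDExpandImm() bit for bit and compares the F32/F64 patterns with the value formula in the table footnotes. The names vfp_expand_imm, vfp_imm_value, replicate and adv_simd_expand_imm are hypothetical, and raising ValueError stands in for UNDEFINED and UNPREDICTABLE.

```python
import struct


def vfp_expand_imm(imm8: int, n: int) -> int:
    """imm8<7>:NOT(imm8<6>):Replicate(imm8<6>, 5 or 8):imm8<5:0>:Zeros(19 or 48)."""
    assert n in (32, 64) and 0 <= imm8 <= 0xFF
    b = (imm8 >> 6) & 1
    result = (imm8 >> 7) & 1                 # a
    result = (result << 1) | (b ^ 1)         # B = NOT(b)
    for _ in range(5 if n == 32 else 8):
        result = (result << 1) | b           # replicated b
    result = (result << 6) | (imm8 & 0x3F)   # c d e f g h
    return result << (19 if n == 32 else 48)


def vfp_imm_value(imm8: int) -> float:
    """(-1)^S * 2^exp * mantissa, from the table footnotes."""
    sign = -1.0 if (imm8 >> 7) & 1 else 1.0
    not_b = ((imm8 >> 6) & 1) ^ 1
    exp = ((not_b << 2) | ((imm8 >> 4) & 0b11)) - 3  # UInt(NOT(b):c:d) - 3
    mantissa = (16 + (imm8 & 0xF)) / 16              # (16 + UInt(e:f:g:h)) / 16
    return sign * 2.0 ** exp * mantissa


def replicate(value: int, width: int, times: int) -> int:
    """Concatenate `times` copies of a `width`-bit value."""
    out = 0
    for _ in range(times):
        out = (out << width) | value
    return out


def adv_simd_expand_imm(op: int, cmode: int, imm8: int) -> int:
    hi, lo = cmode >> 1, cmode & 1
    test = hi in (0b001, 0b010, 0b011, 0b101, 0b110)  # testimm8
    if hi <= 0b011:      # I32, imm8 in byte 0, 1, 2 or 3 of each word
        imm64 = replicate(imm8 << (8 * hi), 32, 2)
    elif hi <= 0b101:    # I16, imm8 in byte 0 or 1 of each halfword
        imm64 = replicate(imm8 << (8 * (hi - 0b100)), 16, 4)
    elif hi == 0b110:    # I32 with ones shifted in below imm8
        if lo == 0:
            imm64 = replicate((imm8 << 8) | 0xFF, 32, 2)
        else:
            imm64 = replicate((imm8 << 16) | 0xFFFF, 32, 2)
    elif lo == 0 and op == 0:    # I8
        imm64 = replicate(imm8, 8, 8)
    elif lo == 0 and op == 1:    # I64: each bit of imm8 becomes a whole byte
        imm64 = 0
        for i in range(7, -1, -1):
            imm64 = (imm64 << 8) | (0xFF if (imm8 >> i) & 1 else 0x00)
    elif op == 0:                # F32, the same pattern as VFPExpandImm(imm8, 32)
        imm64 = replicate(vfp_expand_imm(imm8, 32), 32, 2)
    else:
        raise ValueError("UNDEFINED")
    if test and imm8 == 0:
        raise ValueError("UNPREDICTABLE")
    return imm64


print(hex(vfp_expand_imm(0x70, 32)), vfp_imm_value(0x70))
```

Reading the bit pattern back with struct confirms, for every imm8, that the footnote formula and the expansion agree in both widths. The same harness shows that the 32-bit pattern 0x80000000 is produced by the fourth I32 line (cmode = 011x, imm8 = 0x80) but by no F32 immediate, since bit [30] must equal NOT(bit [29]) in that format.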
A7.5.1 Operation

```
// VFPExpandImm()
// ==============

bits(N) VFPExpandImm(bits(8) imm8, integer N)
    assert N == 32 || N == 64;
    if N == 32 then
        return imm8<7>:NOT(imm8<6>):Replicate(imm8<6>,5):imm8<5:0>:Zeros(19);
    else
        return imm8<7>:NOT(imm8<6>):Replicate(imm8<6>,8):imm8<5:0>:Zeros(48);
```

A7.6 Extension register load/store instructions

Encoding: Thumb bits [31:25] = 111T 110; ARM bits [31:28] = cond, bits [27:25] = 110. Opcode = bits [24:20], Rn = bits [19:16], bits [11:9] = 101.

If T == 1 in the Thumb encoding or cond == 0b1111 in the ARM encoding, the instruction is UNDEFINED. Otherwise, the allocation of encodings in this space is shown in Table A7-19. Other encodings in this space are UNDEFINED.

These instructions are LDC and STC instructions for coprocessors 10 and 11.

Table A7-19 Extension register load/store instructions

Opcode  Rn        Instruction                                            See
0010x   -         -                                                      64-bit transfers between ARM core and extension registers on page A7-32
01x00   -         Vector Store Multiple (Increment After, no writeback)  VSTM on page A8-784
01x10   -         Vector Store Multiple (Increment After, writeback)     VSTM on page A8-784
1xx00   -         Vector Store Register                                  VSTR on page A8-786
10x10   not 1101  Vector Store Multiple (Decrement Before, writeback)    VSTM on page A8-784
10x10   1101      Vector Push Registers                                  VPUSH on page A8-696
01x01   -         Vector Load Multiple (Increment After, no writeback)   VLDM on page A8-626
01x11   not 1101  Vector Load Multiple (Increment After, writeback)      VLDM on page A8-626
01x11   1101      Vector Pop Registers                                   VPOP on page A8-694
1xx01   -         Vector Load Register                                   VLDR on page A8-628
10x11   -         Vector Load Multiple (Decrement Before, writeback)     VLDM on page A8-626
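Table A7-19 is easy to misread because the Rn column only matters for two rows. The Python sketch below decodes an ARM-encoded word in this space; it assumes bits [27:25] == 0b110 and bits [11:9] == 0b101 have already been checked, and the names EXT_LDST_ROWS, matches and decode_arm are hypothetical.

```python
# Rows of Table A7-19: (Opcode pattern, Rn condition, result).
# Rn condition: None = any Rn, True = Rn == 0b1101, False = Rn != 0b1101.
EXT_LDST_ROWS = [
    ("0010x", None, "64-bit transfers between ARM core and extension registers"),
    ("01x00", None, "VSTM (Increment After, no writeback)"),
    ("01x10", None, "VSTM (Increment After, writeback)"),
    ("1xx00", None, "VSTR"),
    ("10x10", False, "VSTM (Decrement Before, writeback)"),
    ("10x10", True, "VPUSH"),
    ("01x01", None, "VLDM (Increment After, no writeback)"),
    ("01x11", False, "VLDM (Increment After, writeback)"),
    ("01x11", True, "VPOP"),
    ("1xx01", None, "VLDR"),
    ("10x11", None, "VLDM (Decrement Before, writeback)"),
]


def matches(pattern: str, value: int) -> bool:
    """True if value matches a bit pattern such as '10x10' ('x' = don't care)."""
    width = len(pattern)
    return all(ch == "x" or int(ch) == ((value >> (width - 1 - i)) & 1)
               for i, ch in enumerate(pattern))


def decode_arm(word: int) -> str:
    if word >> 28 == 0b1111:          # cond == 0b1111
        return "UNDEFINED"
    opcode = (word >> 20) & 0b11111   # Opcode = bits [24:20]
    rn = (word >> 16) & 0b1111        # Rn = bits [19:16]
    for pattern, rn_is_sp, name in EXT_LDST_ROWS:
        if matches(pattern, opcode) and (rn_is_sp is None or rn_is_sp == (rn == 0b1101)):
            return name
    return "UNDEFINED"


print(decode_arm(0xED2D8B10))  # the ARM encoding of VPUSH {D8-D15}
```

Changing only Rn from 0b1101 to any other register turns the same opcode into an ordinary decrement-before VSTM, which is exactly the "not 1101" split in the table.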
A7.7 Advanced SIMD element or structure load/store instructions

Thumb encoding (two halfwords):
    First halfword:  bits[15:8] = 1 1 1 1 1 0 0 1, bit[7] = A, bit[5] = L, bit[4] = 0
    Second halfword: bits[11:8] = B
ARM encoding:
    bits[31:24] = 1 1 1 1 0 1 0 0, bit[23] = A, bit[21] = L, bit[20] = 0, bits[11:8] = B

The allocation of encodings in this space is shown in:
 • Table A7-20 if L == 0, store instructions
 • Table A7-21 on page A7-28 if L == 1, load instructions.
Other encodings in this space are UNDEFINED.

The variable bits are in identical locations in the two encodings, after adjusting for the fact that the ARM encoding is held in memory as a single word and the Thumb encoding is held as two consecutive halfwords.

The ARM instructions can only be executed unconditionally. The Thumb instructions can be executed conditionally by using the IT instruction. For details see IT on page A8-104.

Table A7-20 Element and structure store instructions (L == 0)

A   B                  Instruction    See
0   0010, 011x, 1010   Vector Store   VST1 (multiple single elements) on page A8-768
    0011, 100x         Vector Store   VST2 (multiple 2-element structures) on page A8-772
    010x               Vector Store   VST3 (multiple 3-element structures) on page A8-776
    000x               Vector Store   VST4 (multiple 4-element structures) on page A8-780
1   0x00, 1000         Vector Store   VST1 (single element from one lane) on page A8-770
    0x01, 1001         Vector Store   VST2 (single 2-element structure from one lane) on page A8-774
    0x10, 1010         Vector Store   VST3 (single 3-element structure from one lane) on page A8-778
    0x11, 1011         Vector Store   VST4 (single 4-element structure from one lane) on page A8-782

Table A7-21 Element and structure load instructions (L == 1)

A   B                  Instruction    See
0   0010, 011x, 1010   Vector Load    VLD1 (multiple single elements) on page A8-602
    0011, 100x         Vector Load    VLD2 (multiple 2-element structures) on page A8-608
    010x               Vector Load    VLD3 (multiple 3-element structures) on page A8-614
    000x               Vector Load    VLD4 (multiple 4-element structures) on page A8-620
1   0x00, 1000         Vector Load    VLD1 (single element to one lane) on page A8-604
    1100               Vector Load    VLD1 (single element to all lanes) on page A8-606
    0x01, 1001         Vector Load    VLD2 (single 2-element structure to one lane) on page A8-610
    1101               Vector Load    VLD2 (single 2-element structure to all lanes) on page A8-612
    0x10, 1010         Vector Load    VLD3 (single 3-element structure to one lane) on page A8-616
    1110               Vector Load    VLD3 (single 3-element structure to all lanes) on page A8-618
    0x11, 1011         Vector Load    VLD4 (single 4-element structure to one lane) on page A8-622
    1111               Vector Load    VLD4 (single 4-element structure to all lanes) on page A8-624

A7.7.1 Advanced SIMD addressing mode

All the element and structure load/store instructions use this addressing mode.
There is a choice of three formats:

[<Rn>{@<align>}]
    The address is contained in ARM core register Rn. Rn is not updated by this instruction.
    Encoded as Rm = 0b1111. If Rn is encoded as 0b1111, the instruction is UNPREDICTABLE.

[<Rn>{@<align>}]!
    The address is contained in ARM core register Rn. Rn is updated by this instruction: Rn = Rn + transfer_size
    Encoded as Rm = 0b1101. transfer_size is the number of bytes transferred by the instruction. This means that, after the instruction is executed, Rn points to the address in memory immediately following the last address loaded from or stored to. If Rn is encoded as 0b1111, the instruction is UNPREDICTABLE.
    This addressing mode can also be written as:
        [<Rn>{@<align>}], #<transfer_size>
    However, disassembly produces the [<Rn>{@<align>}]! form.

[<Rn>{@<align>}], <Rm>
    The address is contained in ARM core register <Rn>. Rn is updated by this instruction: Rn = Rn + Rm
    Encoded as Rm = Rm. Rm must not be encoded as 0b1111 or 0b1101 (the PC or the SP). If Rn is encoded as 0b1111, the instruction is UNPREDICTABLE.

In all cases, <align> specifies an optional alignment. Details are given in the individual instruction descriptions.

A7.8 8, 16, and 32-bit transfer between ARM core and extension registers

Thumb encoding (two halfwords):
    First halfword:  bits[15:8] = 1 1 1 T 1 1 1 0, bits[7:5] = A, bit[4] = L
    Second halfword: bits[11:9] = 1 0 1, bit[8] = C, bits[6:5] = B, bit[4] = 1
ARM encoding:
    bits[31:28] = cond, bits[27:24] = 1 1 1 0, bits[23:21] = A, bit[20] = L, bits[11:9] = 1 0 1, bit[8] = C, bits[6:5] = B, bit[4] = 1

If T == 1 in the Thumb encoding or cond == 0b1111 in the ARM encoding, the instruction is UNDEFINED. Otherwise, the allocation of encodings in this space is shown in Table A7-22. Other encodings in this space are UNDEFINED.

These instructions are MRC and MCR instructions for coprocessors 10 and 11.
Table A7-22 8-bit, 16-bit and 32-bit data transfer instructions

L   C   A     B    Instruction                                          See
0   0   000   -    Vector Move                                          VMOV (between ARM core register and single-precision register) on page A8-648
        111   -    Move to VFP Special Register from ARM core register  VMSR on page A8-660; VMSR on page B6-29 (System level view)
    1   0xx   -    Vector Move                                          VMOV (ARM core register to scalar) on page A8-644
        1xx   0x   Vector Duplicate                                     VDUP (ARM core register) on page A8-594
1   0   000   -    Vector Move                                          VMOV (between ARM core register and single-precision register) on page A8-648
        111   -    Move to ARM core register from VFP Special Register  VMRS on page A8-658; VMRS on page B6-27 (System level view)
    1   xxx   -    Vector Move                                          VMOV (scalar to ARM core register) on page A8-646

A7.9 64-bit transfers between ARM core and extension registers

Thumb encoding (two halfwords):
    First halfword:  bits[15:5] = 1 1 1 T 1 1 0 0 0 1 0
    Second halfword: bits[11:9] = 1 0 1, bit[8] = C, bits[7:4] = op
ARM encoding:
    bits[31:28] = cond, bits[27:21] = 1 1 0 0 0 1 0, bits[11:9] = 1 0 1, bit[8] = C, bits[7:4] = op

If T == 1 in the Thumb encoding or cond == 0b1111 in the ARM encoding, the instruction is UNDEFINED. Otherwise, the allocation of encodings in this space is shown in Table A7-23. Other encodings in this space are UNDEFINED.

These instructions are MRRC and MCRR instructions for coprocessors 10 and 11.

Table A7-23 64-bit data transfer instructions

C   op     Instruction
0   00x1   VMOV (between two ARM core registers and two single-precision registers) on page A8-650
1   00x1   VMOV (between two ARM core registers and a doubleword extension register) on page A8-652

Chapter A8 Instruction Details

This chapter describes each instruction.
It contains the following sections:
 • Format of instruction descriptions on page A8-2
 • Standard assembler syntax fields on page A8-7
 • Conditional execution on page A8-8
 • Shifts applied to a register on page A8-10
 • Memory accesses on page A8-13
 • Alphabetical list of instructions on page A8-14.

A8.1 Format of instruction descriptions

The instruction descriptions in Alphabetical list of instructions on page A8-14 normally use the following format:
 • instruction section title
 • introduction to the instruction
 • instruction encoding(s) with architecture information
 • assembler syntax
 • pseudocode describing how the instruction operates
 • exception information
 • notes (where applicable).

Each of these items is described in more detail in the following subsections.

A few instruction descriptions describe alternative mnemonics for other instructions and use an abbreviated and modified version of this format.

A8.1.1 Instruction section title

The instruction section title gives the base mnemonic for the instructions described in the section. When one mnemonic has multiple forms described in separate instruction sections, this is followed by a short description of the form in parentheses. The most common use of this is to distinguish between forms of an instruction in which one of the operands is an immediate value and forms in which it is a register.

Parenthesized text is also used to document the former mnemonic in some cases where a mnemonic has been replaced entirely by another mnemonic in the new assembler syntax.

A8.1.2 Introduction to the instruction

The instruction section title is followed by text that briefly describes the main features of the instruction. This description is not necessarily complete and is not definitive. If there is any conflict between it and the more detailed information that follows, the latter takes priority.
A8.1.3 Instruction encodings

This is a list of one or more instruction encodings. Each instruction encoding is labelled as:
 • T1, T2, T3 … for the first, second, third and any additional Thumb encodings
 • A1, A2, A3 … for the first, second, third and any additional ARM encodings
 • E1, E2, E3 … for the first, second, third and any additional ThumbEE encodings that are not also Thumb encodings.

Where Thumb and ARM encodings are very closely related, the two encodings are described together, for example as encoding T1 / A1.

Each instruction encoding description consists of:
 • Information about which architecture variants include the particular encoding of the instruction. This is presented in one of two ways:
    - For instruction encodings that are in the main instruction set architecture, as a list of the architecture variants that include the encoding. See Architecture versions, profiles, and variants on page A1-4 for a summary of these variants.
    - For instruction encodings that are in the architecture extensions, as a list of the architecture extensions that include the encoding. See Architecture extensions on page A1-6 for a summary of the architecture extensions and the architecture variants that they can extend.
   In architecture variant lists:
    - ARMv7 means ARMv7-A and ARMv7-R profiles. The architecture variant information in this manual does not cover the ARMv7-M profile.
    - * is used as a wildcard. For example, ARMv5T* means ARMv5T, ARMv5TE, and ARMv5TEJ.
 • An assembly syntax that ensures that the assembler selects the encoding in preference to any other encoding. In some cases, multiple syntaxes are given. The correct one to use is sometimes indicated by annotations to the syntax, such as Inside IT block and Outside IT block.
   In other cases, the correct one to use can be determined by looking at the assembler syntax description and using it to determine which syntax corresponds to the instruction being disassembled.
   There is usually more than one syntax that ensures re-assembly to any particular encoding, and the exact set of syntaxes that do so usually depends on the register numbers, immediate constants and other operands to the instruction. For example, when assembling to the Thumb instruction set, the syntax AND R0,R0,R8 ensures selection of a 32-bit encoding but AND R0,R0,R1 selects a 16-bit encoding.
   The assembly syntax documented for the encoding is chosen to be the simplest one that ensures selection of that encoding for all operand combinations supported by that encoding. This often means that it includes elements that are only necessary for a small subset of operand combinations. For example, the assembler syntax documented for the 32-bit Thumb AND (register) encoding includes the .W qualifier to ensure that the 32-bit encoding is selected even for the small proportion of operand combinations for which the 16-bit encoding is also available.
   The assembly syntax given for an encoding is therefore a suitable syntax for a disassembler to produce for that encoding. However, disassemblers might wish to use simpler syntaxes when they are suitable for the operand combination, in order to produce more readable disassembled code.
 • An encoding diagram, or a Thumb encoding diagram followed by an ARM encoding diagram when they are being described together. This is half-width for 16-bit Thumb encodings and full-width for 32-bit Thumb and ARM encodings. The 32-bit Thumb encodings use a double vertical line between the two halfwords of the instruction to distinguish them from ARM encodings and to act as a reminder that 32-bit Thumb instructions consist of two consecutive halfwords rather than a word.
   In particular, if instructions are stored using the standard little-endian instruction endianness, the encoding diagram for an ARM instruction at address A shows the bytes at addresses A+3, A+2, A+1, A from left to right, but the encoding diagram for a 32-bit Thumb instruction shows them in the order A+1, A for the first halfword, followed by A+3, A+2 for the second halfword.
 • Encoding-specific pseudocode. This is pseudocode that translates the encoding-specific instruction fields into inputs to the encoding-independent pseudocode in the later Operation subsection, and that picks out any special cases in the encoding. For a detailed description of the pseudocode used and of the relationship between the encoding diagram, the encoding-specific pseudocode and the encoding-independent pseudocode, see Appendix I Pseudocode Definition.

A8.1.4 Assembler syntax

The Assembler syntax subsection describes the standard UAL syntax for the instruction.

Each syntax description consists of the following elements:
 • One or more syntax prototype lines written in a typewriter font, using the conventions described in Assembler syntax prototype line conventions on page A8-5. Each prototype line documents the mnemonic and (where appropriate) operand parts of a full line of assembler code. When there is more than one such line, each prototype line is annotated to indicate required results of the encoding-specific pseudocode. For each instruction encoding, this information can be used to determine whether any instructions matching that encoding are available when assembling that syntax, and if so, which ones.
 • The line where: followed by descriptions of all of the variable or optional fields of the prototype syntax line.
   Some syntax fields are standardized across all or most instructions. Standard assembler syntax fields on page A8-7 describes these fields.
   By default, syntax fields that specify registers, such as <Rd>, <Rn>, or <Rt>, can be any of R0-R12 or LR in Thumb instructions, and any of R0-R12, SP or LR in ARM instructions. These require that the encoding-specific pseudocode set the corresponding integer variable (such as d, n, or t) to the corresponding register number (0-12 for R0-R12, 13 for SP, 14 for LR). This can normally be done by setting the corresponding bitfield in the instruction (named Rd, Rn, Rt…) to the binary encoding of that number.
   In the case of 16-bit Thumb encodings, this bitfield is normally of length 3 and so the encoding is only available when one of R0-R7 is specified in the assembler syntax. It is also common for such encodings to use a bitfield name such as Rdn. This indicates that the encoding is only available if <Rd> and <Rn> specify the same register, and that the register number of that register is encoded in the bitfield if they do.
   The description of a syntax field that specifies a register sometimes extends or restricts the permitted range of registers or documents other differences from the default rules for such fields. Typical extensions are to permit the use of the SP in Thumb instructions and to permit the use of the PC (using register number 15).
 • Where appropriate, text that briefly describes changes from the pre-UAL ARM assembler syntax. Where present, this usually consists of an alternative pre-UAL form of the assembler mnemonic. The pre-UAL ARM assembler syntax does not conflict with UAL, and support for it is a recommended optional extension to UAL, to enable the assembly of pre-UAL ARM assembler source files.

Note
The pre-UAL Thumb assembler syntax is incompatible with UAL and is not documented in the instruction sections. For details see Appendix C Legacy Instruction Mnemonics.
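To make the 3-bit field and Rdn rules concrete, the following Python sketch packs the 16-bit ADC (register) encoding T1 described in A8.6.2, whose layout is bits[15:6] = 0100000101, bits[5:3] = Rm and bits[2:0] = Rdn. The helper name encode_adc_t1 is hypothetical. It returns None whenever the operands cannot be expressed in that encoding, which is the case in which only a 32-bit encoding can represent the instruction.

```python
from typing import Optional


def encode_adc_t1(d: int, n: int, m: int) -> Optional[int]:
    """Hypothetical helper: pack ADC (register) encoding T1, or return None."""
    if d != n:
        return None  # Rdn: <Rd> and <Rn> must be the same register
    if not (0 <= d <= 7 and 0 <= m <= 7):
        return None  # 3-bit fields can only name R0-R7
    # bits[15:6] = 0100000101, bits[5:3] = Rm, bits[2:0] = Rdn. There is no S bit:
    # T1 means ADCS outside an IT block and ADC<c> inside one (setflags = !InITBlock()).
    return (0b0100000101 << 6) | (m << 3) | d


print(hex(encode_adc_t1(0, 0, 1)))  # 0x4148
print(encode_adc_t1(0, 0, 8))       # None: R8 does not fit in a 3-bit field
```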
Assembler syntax prototype line conventions

The following conventions are used in assembler syntax prototype lines and their subfields:

< >      Any item bracketed by < and > is a short description of a type of value to be supplied by the user in that position. A longer description of the item is normally supplied by subsequent text. Such items often correspond to a similarly named field in an encoding diagram for an instruction. When the correspondence simply requires the binary encoding of an integer value or register number to be substituted into the instruction encoding, it is not described explicitly. For example, if the assembler syntax for an ARM instruction contains an item <Rn> and the instruction encoding diagram contains a 4-bit field named Rn, the number of the register specified in the assembler syntax is encoded in binary in the instruction field.
         If the correspondence between the assembler syntax item and the instruction encoding is more complex than simple binary encoding of an integer or register number, the item description indicates how it is encoded. This is often done by specifying a required output from the encoding-specific pseudocode, such as add = TRUE. The assembler must only use encodings that produce that output.

{ }      Any item bracketed by { and } is optional. A description of the item and of how its presence or absence is encoded in the instruction is normally supplied by subsequent text.
         Many instructions have an optional destination register. Unless otherwise stated, if such a destination register is omitted, it is the same as the immediately following source register in the instruction syntax.

spaces   Single spaces are used for clarity, to separate items. When a space is obligatory in the assembler syntax, two or more consecutive spaces are used.

+/-      This indicates an optional + or - sign. If neither is coded, + is assumed.

All other characters must be encoded precisely as they appear in the assembler syntax.
Apart from { and }, the special characters described above do not appear in the basic forms of assembler instructions documented in this manual. The { and } characters need to be encoded in a few places as part of a variable item. When this happens, the long description of the variable item indicates how they must be used.

A8.1.5 Pseudocode describing how the instruction operates

The Operation subsection contains encoding-independent pseudocode that describes the main operation of the instruction. For a detailed description of the pseudocode used and of the relationship between the encoding diagram, the encoding-specific pseudocode and the encoding-independent pseudocode, see Appendix I Pseudocode Definition.

A8.1.6 Exception information

The Exceptions subsection contains a list of the exceptional conditions that can be caused by execution of the instruction.

Processor exceptions are listed as follows:
 • Resets and interrupts (both IRQs and FIQs) are not listed. They can occur before or after the execution of any instruction, and in some cases during the execution of an instruction, but they are not in general caused by the instruction concerned.
 • Prefetch Abort exceptions are normally caused by a memory abort when an instruction is fetched, followed by an attempt to execute that instruction. This can happen for any instruction, but is caused by the aborted attempt to fetch the instruction rather than by the instruction itself, and so is not listed. A special case is the BKPT instruction, which is defined as causing a Prefetch Abort exception in some circumstances.
 • Data Abort exceptions are listed for all instructions that perform data memory accesses.
 • Undefined Instruction exceptions are listed when they are part of the effects of a defined instruction.
   For example, all coprocessor instructions are defined to produce the Undefined Instruction exception if not accepted by their coprocessor. Undefined Instruction exceptions caused by the execution of an UNDEFINED instruction are not listed, even when the UNDEFINED instruction is a special case of one or more of the encodings of the instruction. Such special cases are instead indicated in the encoding-specific pseudocode for the encoding.
 • Supervisor Call and Secure Monitor Call exceptions are listed for the SVC and SMC instructions respectively. Supervisor Call exceptions and the SVC instruction were previously called Software Interrupt exceptions and the SWI instruction. Secure Monitor Call exceptions and the SMC instruction were previously called Secure Monitor interrupts and the SMI instruction.

Floating-point exceptions are listed for instructions that can produce them. Floating-point exceptions on page A2-42 describes these exceptions. They do not normally result in processor exceptions.

A8.1.7 Notes

Where appropriate, other notes about the instruction appear under additional subheadings.

Note
Information that was documented in notes in previous versions of the ARM Architecture Reference Manual and its supplements has often been moved elsewhere. For example, operand restrictions on the values of bitfields in an instruction encoding are now normally documented in the encoding-specific pseudocode for that encoding.

A8.2 Standard assembler syntax fields

The following assembler syntax fields are standard across all or most instructions:

<c>   Is an optional field. It specifies the condition under which the instruction is executed. See Conditional execution on page A8-8 for the range of available conditions and their encoding. If <c> is omitted, it defaults to always (AL).

<q>   Specifies optional assembler qualifiers on the instruction.
      The following qualifiers are defined:
      .N   Meaning narrow. Specifies that the assembler must select a 16-bit encoding for the instruction. If this is not possible, an assembler error is produced.
      .W   Meaning wide. Specifies that the assembler must select a 32-bit encoding for the instruction. If this is not possible, an assembler error is produced.
      If neither .W nor .N is specified, the assembler can select either 16-bit or 32-bit encodings. If both are available, it must select a 16-bit encoding. In a few cases, more than one encoding of the same length can be available for an instruction. The rules for selecting between such encodings are instruction-specific and are part of the instruction description.

Note
When assembling to the ARM instruction set, the .N qualifier produces an assembler error and the .W qualifier has no effect.

Although the instruction descriptions throughout this manual show the <c> and <q> fields without { } around them, these fields are optional as described in this section.

A8.3 Conditional execution

Most ARM instructions, and most Thumb instructions from ARMv6T2 onwards, can be executed conditionally, based on the values of the APSR condition flags. Before ARMv6T2, the only conditional Thumb instruction was the 16-bit conditional branch instruction. Table A8-1 lists the available conditions.

In Thumb instructions, the condition (if it is not AL) is normally encoded in a preceding IT instruction. For details see Conditional instructions on page A4-4 and IT on page A8-104. Some conditional branch instructions do not require a preceding IT instruction, and include a condition code in their encoding.

In ARM instructions, bits [31:28] of the instruction contain the condition, or contain 1111 for some ARM instructions that can only be executed unconditionally.
Table A8-1 Condition codes

cond   Mnemonic extension   Meaning (integer)             Meaning (floating-point) (a)       Condition flags
0000   EQ                   Equal                         Equal                              Z == 1
0001   NE                   Not equal                     Not equal, or unordered            Z == 0
0010   CS (b)               Carry set                     Greater than, equal, or unordered  C == 1
0011   CC (c)               Carry clear                   Less than                          C == 0
0100   MI                   Minus, negative               Less than                          N == 1
0101   PL                   Plus, positive or zero        Greater than, equal, or unordered  N == 0
0110   VS                   Overflow                      Unordered                          V == 1
0111   VC                   No overflow                   Not unordered                      V == 0
1000   HI                   Unsigned higher               Greater than, or unordered         C == 1 and Z == 0
1001   LS                   Unsigned lower or same        Less than or equal                 C == 0 or Z == 1
1010   GE                   Signed greater than or equal  Greater than or equal              N == V
1011   LT                   Signed less than              Less than, or unordered            N != V
1100   GT                   Signed greater than           Greater than                       Z == 0 and N == V
1101   LE                   Signed less than or equal     Less than, equal, or unordered     Z == 1 or N != V
1110   None (AL) (d)        Always (unconditional)        Always (unconditional)             Any

a. Unordered means at least one NaN operand.
b. HS (unsigned higher or same) is a synonym for CS.
c. LO (unsigned lower) is a synonym for CC.
d. AL is an optional mnemonic extension for always, except in IT instructions. For details see IT on page A8-104.

A8.3.1 Pseudocode details of conditional execution

The CurrentCond() pseudocode function has prototype:

    bits(4) CurrentCond()

and returns a 4-bit condition specifier as follows:
 • For ARM instructions, it returns bits[31:28] of the instruction.
 • For the T1 and T3 encodings of the Branch instruction (see B on page A8-44), it returns the 4-bit 'cond' field of the encoding.
 • For all other Thumb and ThumbEE instructions, it returns ITSTATE.IT<7:4>. See ITSTATE on page A2-17.
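The Condition flags column of Table A8-1 can be written directly as predicates over N, Z, C and V. The Python sketch below does this, with each flag passed as 0 or 1; the names CONDITIONS, condition_holds and passing_conditions are illustrative. It is built from the table rather than from ConditionPassed(), so the two descriptions can be compared against each other.

```python
# Table A8-1: cond -> (mnemonic, predicate over the APSR flags N, Z, C, V).
CONDITIONS = {
    0b0000: ("EQ", lambda n, z, c, v: z == 1),
    0b0001: ("NE", lambda n, z, c, v: z == 0),
    0b0010: ("CS", lambda n, z, c, v: c == 1),  # HS is a synonym
    0b0011: ("CC", lambda n, z, c, v: c == 0),  # LO is a synonym
    0b0100: ("MI", lambda n, z, c, v: n == 1),
    0b0101: ("PL", lambda n, z, c, v: n == 0),
    0b0110: ("VS", lambda n, z, c, v: v == 1),
    0b0111: ("VC", lambda n, z, c, v: v == 0),
    0b1000: ("HI", lambda n, z, c, v: c == 1 and z == 0),
    0b1001: ("LS", lambda n, z, c, v: c == 0 or z == 1),
    0b1010: ("GE", lambda n, z, c, v: n == v),
    0b1011: ("LT", lambda n, z, c, v: n != v),
    0b1100: ("GT", lambda n, z, c, v: z == 0 and n == v),
    0b1101: ("LE", lambda n, z, c, v: z == 1 or n != v),
    0b1110: ("AL", lambda n, z, c, v: True),
}


def condition_holds(cond: int, n: int, z: int, c: int, v: int) -> bool:
    """Evaluate the Condition flags column of Table A8-1 for one cond value."""
    if cond not in CONDITIONS:
        raise ValueError("cond 0b1111 is not listed in Table A8-1")
    return bool(CONDITIONS[cond][1](n, z, c, v))


def passing_conditions(n: int, z: int, c: int, v: int) -> list:
    """Mnemonics, in cond order, whose condition holds for the given flags."""
    return [name for name, pred in CONDITIONS.values() if pred(n, z, c, v)]


# Flags after comparing 5 with 3: N = 0, Z = 0, C = 1 (no borrow), V = 0.
print(passing_conditions(n=0, z=0, c=1, v=0))
# ['NE', 'CS', 'PL', 'VC', 'HI', 'GE', 'GT', 'AL']
```

Each odd cond value in the table is the complement of the even value below it, which is the property ConditionPassed() exploits by evaluating cond<3:1> and then inverting the result when cond<0> is 1.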
The ConditionPassed() function uses this condition specifier and the APSR condition flags to determine whether the instruction must be executed:

// ConditionPassed()
// =================

boolean ConditionPassed()
    cond = CurrentCond();

    // Evaluate base condition.
    case cond<3:1> of
        when ‘000’ result = (APSR.Z == ‘1’);                          // EQ or NE
        when ‘001’ result = (APSR.C == ‘1’);                          // CS or CC
        when ‘010’ result = (APSR.N == ‘1’);                          // MI or PL
        when ‘011’ result = (APSR.V == ‘1’);                          // VS or VC
        when ‘100’ result = (APSR.C == ‘1’) && (APSR.Z == ‘0’);       // HI or LS
        when ‘101’ result = (APSR.N == APSR.V);                       // GE or LT
        when ‘110’ result = (APSR.N == APSR.V) && (APSR.Z == ‘0’);    // GT or LE
        when ‘111’ result = TRUE;                                     // AL

    // Condition bits ‘111x’ indicate the instruction is always executed. Otherwise,
    // invert condition if necessary.
    if cond<0> == ‘1’ && cond != ‘1111’ then
        result = !result;

    return result;

A8.4 Shifts applied to a register

ARM register offset load/store word and unsigned byte instructions can apply a wide range of different constant shifts to the offset register.

Both Thumb and ARM data-processing instructions can apply the same range of different constant shifts to the second operand register. For details see Constant shifts.

ARM data-processing instructions can apply a register-controlled shift to the second operand register.

A8.4.1 Constant shifts

These are the same in Thumb and ARM instructions, except that the input bits come from different positions.

<shift> is an optional shift to be applied to <Rm>. It can be any one of:

(omitted)   No shift.
LSL #<n>    Logical shift left <n> bits. 1 <= <n> <= 31.
LSR #<n>    Logical shift right <n> bits. 1 <= <n> <= 32.
ASR #<n>    Arithmetic shift right <n> bits. 1 <= <n> <= 32.
ROR #<n>    Rotate right <n> bits. 1 <= <n> <= 31.
RRX         Rotate right one bit, with extend. Bit [0] is written to shifter_carry_out, bits [31:1] are shifted right one bit, and the Carry Flag is shifted into bit [31].

Note
Assemblers can permit the use of some or all of ASR #0, LSL #0, LSR #0, and ROR #0 to specify that no shift is to be performed. This is not standard UAL, and the encoding selected for Thumb instructions might vary between UAL assemblers if it is used. To ensure disassembled code assembles to the original instructions, disassemblers must omit the shift specifier when the instruction specifies no shift.
Similarly, assemblers can permit the use of #0 in the immediate forms of ASR, LSL, LSR, and ROR instructions to specify that no shift is to be performed, that is, that a MOV (register) instruction is wanted. Again, this is not standard UAL, and the encoding selected for Thumb instructions might vary between UAL assemblers if it is used. To ensure disassembled code assembles to the original instructions, disassemblers must use the MOV (register) syntax when the instruction specifies no shift.

Encoding

The assembler encodes <shift> into two type bits and five immediate bits, as follows:

(omitted)   type = 0b00, immediate = 0.
LSL #<n>    type = 0b00, immediate = <n>.
LSR #<n>    type = 0b01. If <n> < 32, immediate = <n>. If <n> == 32, immediate = 0.
ASR #<n>    type = 0b10. If <n> < 32, immediate = <n>. If <n> == 32, immediate = 0.
ROR #<n>    type = 0b11, immediate = <n>.
RRX         type = 0b11, immediate = 0.

A8.4.2 Register controlled shifts

These are only available in ARM instructions.

<type> is the type of shift to apply to the value read from <Rm>. It must be one of:

ASR   Arithmetic shift right, encoded as type = 0b10
LSL   Logical shift left, encoded as type = 0b00
LSR   Logical shift right, encoded as type = 0b01
ROR   Rotate right, encoded as type = 0b11.

The bottom byte of <Rs> contains the shift amount.
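The Encoding list in A8.4.1 is a small mapping from a constant <shift> specifier to the two type bits and the five immediate bits. A Python sketch of that mapping follows; encode_imm_shift is a hypothetical helper name, and passing None stands for an omitted shift.

```python
from typing import Optional, Tuple

# Shift kind -> (type bits, largest permitted <n>), from the A8.4.1 lists.
_LIMITS = {"LSL": (0b00, 31), "LSR": (0b01, 32), "ASR": (0b10, 32), "ROR": (0b11, 31)}


def encode_imm_shift(kind: Optional[str], n: int = 0) -> Tuple[int, int]:
    """Return (type, imm5) for a constant <shift>, per the A8.4.1 Encoding list."""
    if kind is None:
        return (0b00, 0)   # (omitted): no shift
    if kind == "RRX":
        return (0b11, 0)   # RRX takes the ROR type with immediate 0
    if kind not in _LIMITS:
        raise ValueError(f"unknown shift {kind!r}")
    type_bits, max_n = _LIMITS[kind]
    if not 1 <= n <= max_n:
        raise ValueError(f"{kind} #{n} is outside 1..{max_n}")
    return (type_bits, 0 if n == 32 else n)  # #32 is encoded as immediate 0


print(encode_imm_shift("LSR", 32), encode_imm_shift("ROR", 8))  # (1, 0) (3, 8)
```

The immediate value 0 is overloaded: it means no shift for type 0b00, a shift by 32 for LSR and ASR, and RRX for type 0b11. This is why DecodeImmShift() treats imm5 == ‘00000’ as a special case.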
A8.4.3 Pseudocode details of instruction-specified shifts and rotates

enumeration SRType (SRType_LSL, SRType_LSR, SRType_ASR, SRType_ROR, SRType_RRX);

// DecodeImmShift()
// ================

(SRType, integer) DecodeImmShift(bits(2) type, bits(5) imm5)
    case type of
        when ‘00’
            shift_t = SRType_LSL;  shift_n = UInt(imm5);
        when ‘01’
            shift_t = SRType_LSR;  shift_n = if imm5 == ‘00000’ then 32 else UInt(imm5);
        when ‘10’
            shift_t = SRType_ASR;  shift_n = if imm5 == ‘00000’ then 32 else UInt(imm5);
        when ‘11’
            if imm5 == ‘00000’ then
                shift_t = SRType_RRX;  shift_n = 1;
            else
                shift_t = SRType_ROR;  shift_n = UInt(imm5);
    return (shift_t, shift_n);

// DecodeRegShift()
// ================

SRType DecodeRegShift(bits(2) type)
    case type of
        when ‘00’ shift_t = SRType_LSL;
        when ‘01’ shift_t = SRType_LSR;
        when ‘10’ shift_t = SRType_ASR;
        when ‘11’ shift_t = SRType_ROR;
    return shift_t;

// Shift()
// =======

bits(N) Shift(bits(N) value, SRType type, integer amount, bit carry_in)
    (result, -) = Shift_C(value, type, amount, carry_in);
    return result;

// Shift_C()
// =========

(bits(N), bit) Shift_C(bits(N) value, SRType type, integer amount, bit carry_in)
    assert !(type == SRType_RRX && amount != 1);

    if amount == 0 then
        (result, carry_out) = (value, carry_in);
    else
        case type of
            when SRType_LSL
                (result, carry_out) = LSL_C(value, amount);
            when SRType_LSR
                (result, carry_out) = LSR_C(value, amount);
            when SRType_ASR
                (result, carry_out) = ASR_C(value, amount);
            when SRType_ROR
                (result, carry_out) = ROR_C(value, amount);
            when SRType_RRX
                (result, carry_out) = RRX_C(value, carry_in);

    return (result, carry_out);
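The following Python sketch mirrors DecodeImmShift() and Shift_C() for N = 32. The bodies of LSL_C(), LSR_C(), ASR_C(), ROR_C() and RRX_C() are defined elsewhere in this manual; here they are written inline with the usual semantics (the carry output is the last bit shifted out, which for ROR is bit 31 of the result). Treat those inline bodies as an assumption of the sketch rather than a transcription. Function names are illustrative.

```python
MASK32 = 0xFFFFFFFF


def decode_imm_shift(type_bits: int, imm5: int):
    """Mirror of DecodeImmShift(): returns (shift_t, shift_n)."""
    if type_bits == 0b00:
        return ("LSL", imm5)
    if type_bits == 0b01:
        return ("LSR", 32 if imm5 == 0 else imm5)
    if type_bits == 0b10:
        return ("ASR", 32 if imm5 == 0 else imm5)
    return ("RRX", 1) if imm5 == 0 else ("ROR", imm5)


def shift_c(value: int, shift_t: str, amount: int, carry_in: int):
    """32-bit Shift_C(): returns (result, carry_out)."""
    assert not (shift_t == "RRX" and amount != 1)
    value &= MASK32
    if amount == 0:
        return value, carry_in
    if shift_t == "LSL":        # carry is the last bit shifted out at the top
        wide = value << amount
        return wide & MASK32, (wide >> 32) & 1
    if shift_t == "LSR":        # zeros shifted in at the top
        return value >> amount, (value >> (amount - 1)) & 1
    if shift_t == "ASR":        # copies of bit 31 shifted in at the top
        signed = value - (1 << 32) if value & 0x80000000 else value
        return (signed >> amount) & MASK32, (signed >> (amount - 1)) & 1
    if shift_t == "ROR":        # carry is bit 31 of the rotated result
        m = amount % 32
        result = ((value >> m) | (value << (32 - m))) & MASK32
        return result, result >> 31
    if shift_t == "RRX":        # carry_in enters at bit 31, bit 0 leaves as carry
        return (carry_in << 31) | (value >> 1), value & 1
    raise ValueError(shift_t)


# Operand "<Rm>, ROR #8" with <Rm> = 0x000000FF: type = 0b11, imm5 = 8.
t, n = decode_imm_shift(0b11, 8)
print(t, n, hex(shift_c(0x000000FF, t, n, 0)[0]))  # ROR 8 0xff000000
```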
A8.5 Memory accesses

Commonly, the following addressing modes are permitted for memory access instructions:

Offset addressing
    The offset value is applied to an address obtained from the base register. The result is used as the address for the memory access. The value of the base register is unchanged.
    The assembly language syntax for this mode is:
        [<Rn>,<offset>]

Pre-indexed addressing
    The offset value is applied to an address obtained from the base register. The result is used as the address for the memory access, and written back into the base register.
    The assembly language syntax for this mode is:
        [<Rn>,<offset>]!

Post-indexed addressing
    The address obtained from the base register is used, unchanged, as the address for the memory access. The offset value is applied to the address, and written back into the base register.
    The assembly language syntax for this mode is:
        [<Rn>],<offset>

In each case, <Rn> is the base register. <offset> can be:
 • an immediate constant, such as <imm8> or <imm12>
 • an index register, <Rm>
 • a shifted index register, such as <Rm>, LSL #<shift>.

For information about unaligned access, endianness, and exclusive access, see:
 • Alignment support on page A3-4
 • Endian support on page A3-7
 • Synchronization and semaphores on page A3-12.

A8.6 Alphabetical list of instructions

Every instruction is listed in this section. For details of the format used see Format of instruction descriptions on page A8-2.

A8.6.1 ADC (immediate)

Add with Carry (immediate) adds an immediate value and the carry flag value to a register value, and writes the result to the destination register. It can optionally update the condition flags based on the result.
Encoding T1    ARMv6T2, ARMv7
ADC{S}<c> <Rd>,<Rn>,#<const>
Bits 15-0 | 15-0: 1 1 1 1 0 i 0 1 0 1 0 S Rn | 0 imm3 Rd imm8

    d = UInt(Rd); n = UInt(Rn); setflags = (S == ‘1’); imm32 = ThumbExpandImm(i:imm3:imm8);
    if BadReg(d) || BadReg(n) then UNPREDICTABLE;

Encoding A1    ARMv4*, ARMv5T*, ARMv6*, ARMv7
ADC{S}<c> <Rd>,<Rn>,#<const>
Bits 31-0: cond 0 0 1 0 1 0 1 S Rn Rd imm12

    if Rd == ‘1111’ && S == ‘1’ then SEE SUBS PC, LR and related instructions;
    d = UInt(Rd); n = UInt(Rn); setflags = (S == ‘1’); imm32 = ARMExpandImm(imm12);

Assembler syntax

ADC{S}<c><q> {<Rd>,} <Rn>, #<const>

where:
S         If S is present, the instruction updates the flags. Otherwise, the flags are not updated.
<c>, <q>  See Standard assembler syntax fields on page A8-7.
<Rd>      The destination register.
<Rn>      The first operand register.
<const>   The immediate value to be added to the value obtained from <Rn>. See Modified immediate constants in Thumb instructions on page A6-17 or Modified immediate constants in ARM instructions on page A5-9 for the range of values.

The pre-UAL syntax ADC<c>S is equivalent to ADCS<c>.

Operation

    if ConditionPassed() then
        EncodingSpecificOperations();
        (result, carry, overflow) = AddWithCarry(R[n], imm32, APSR.C);
        if d == 15 then          // Can only occur for ARM encoding
            ALUWritePC(result);  // setflags is always FALSE here
        else
            R[d] = result;
            if setflags then
                APSR.N = result<31>;
                APSR.Z = IsZeroBit(result);
                APSR.C = carry;
                APSR.V = overflow;

Exceptions

None.

A8.6.2 ADC (register)

Add with Carry (register) adds a register value, the carry flag value, and an optionally-shifted register value, and writes the result to the destination register. It can optionally update the condition flags based on the result.
Encoding T1    ARMv4T, ARMv5T*, ARMv6*, ARMv7
ADCS <Rdn>,<Rm>          Outside IT block.
ADC<c> <Rdn>,<Rm>        Inside IT block.
Bits 15-0: 0 1 0 0 0 0 0 1 0 1 Rm Rdn

    d = UInt(Rdn); n = UInt(Rdn); m = UInt(Rm); setflags = !InITBlock();
    (shift_t, shift_n) = (SRType_LSL, 0);

Encoding T2    ARMv6T2, ARMv7
ADC{S}<c>.W <Rd>,<Rn>,<Rm>{,<shift>}
Bits 15-0 | 15-0: 1 1 1 0 1 0 1 1 0 1 0 S Rn | (0) imm3 Rd imm2 type Rm

    d = UInt(Rd); n = UInt(Rn); m = UInt(Rm); setflags = (S == ‘1’);
    (shift_t, shift_n) = DecodeImmShift(type, imm3:imm2);
    if BadReg(d) || BadReg(n) || BadReg(m) then UNPREDICTABLE;

Encoding A1    ARMv4*, ARMv5T*, ARMv6*, ARMv7
ADC{S}<c> <Rd>,<Rn>,<Rm>{,<shift>}
Bits 31-0: cond 0 0 0 0 1 0 1 S Rn Rd imm5 type 0 Rm

    if Rd == ‘1111’ && S == ‘1’ then SEE SUBS PC, LR and related instructions;
    d = UInt(Rd); n = UInt(Rn); m = UInt(Rm); setflags = (S == ‘1’);
    (shift_t, shift_n) = DecodeImmShift(type, imm5);

Assembler syntax

ADC{S}<c><q> {<Rd>,} <Rn>, <Rm> {,<shift>}

where:
S         If S is present, the instruction updates the flags. Otherwise, the flags are not updated.
<c>, <q>  See Standard assembler syntax fields on page A8-7.
<Rd>      The destination register.
<Rn>      The first operand register.
<Rm>      The optionally shifted second operand register.
<shift>   The shift to apply to the value read from <Rm>. If present, encoding T1 is not permitted. If absent, no shift is applied and any encoding is permitted. Shifts applied to a register on page A8-10 describes the shifts and how they are encoded.

In Thumb assembly:
• outside an IT block, if ADCS <Rd>,<Rn>,<Rd> has <Rd> and <Rn> both in the range R0-R7, it is assembled using encoding T1 as though ADCS <Rd>,<Rn> had been written.
• inside an IT block, if ADC<c> <Rd>,<Rn>,<Rd> has <Rd> and <Rn> both in the range R0-R7, it is assembled using encoding T1 as though ADC<c> <Rd>,<Rn> had been written.
To prevent either of these happening, use the .W qualifier.
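The Thumb assembly rules above, together with the ordinary case where <Rd> equals <Rn>, can be summarized as a small predicate. This is a hypothetical Python helper and not any real assembler's API. It assumes there is no <shift> operand and no .W qualifier, since either one rules out T1. It builds the T1 halfword 0 1 0 0 0 0 0 1 0 1 Rm Rdn.

```python
from typing import Optional


def adc_t1_halfword(rd: int, rn: int, rm: int, setflags: bool,
                    in_it_block: bool) -> Optional[int]:
    """Return the 16-bit T1 encoding of ADC{S} Rd, Rn, Rm, or None if T1 cannot be used."""
    # T1 sets the flags exactly when it executes outside an IT block.
    if setflags == in_it_block:
        return None
    if rd == rn:            # ADC{S} Rd, Rd, Rm: Rdn = Rd, Rm = Rm
        other = rm
    elif rd == rm:          # ADC{S} Rd, Rn, Rd: assembled as though ADC{S} Rd, Rn
        other = rn
    else:
        return None
    if rd > 7 or other > 7:
        return None         # T1 can only name R0-R7
    return 0x4140 | (other << 3) | rd
```

So `ADCS r0, r1, r0` outside an IT block gets the same halfword (0x4148) as `ADCS r0, r1`. That works because addition is commutative.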
The pre-UAL syntax ADC<c>S is equivalent to ADCS<c>.

Operation

    if ConditionPassed() then
        EncodingSpecificOperations();
        shifted = Shift(R[m], shift_t, shift_n, APSR.C);
        (result, carry, overflow) = AddWithCarry(R[n], shifted, APSR.C);
        if d == 15 then          // Can only occur for ARM encoding
            ALUWritePC(result);  // setflags is always FALSE here
        else
            R[d] = result;
            if setflags then
                APSR.N = result<31>;
                APSR.Z = IsZeroBit(result);
                APSR.C = carry;
                APSR.V = overflow;

Exceptions

None.

A8.6.3 ADC (register-shifted register)

Add with Carry (register-shifted register) adds a register value, the carry flag value, and a register-shifted register value. It writes the result to the destination register, and can optionally update the condition flags based on the result.

Encoding A1    ARMv4*, ARMv5T*, ARMv6*, ARMv7
ADC{S}<c> <Rd>,<Rn>,<Rm>,<type> <Rs>
Bits 31-0: cond 0 0 0 0 1 0 1 S Rn Rd Rs 0 type 1 Rm

    d = UInt(Rd); n = UInt(Rn); m = UInt(Rm); s = UInt(Rs);
    setflags = (S == ‘1’); shift_t = DecodeRegShift(type);
    if d == 15 || n == 15 || m == 15 || s == 15 then UNPREDICTABLE;

Assembler syntax

ADC{S}<c><q> {<Rd>,} <Rn>, <Rm>, <type> <Rs>

where:
S         If S is present, the instruction updates the flags. Otherwise, the flags are not updated.
<c>, <q>  See Standard assembler syntax fields on page A8-7.
<Rd>      The destination register.
<Rn>      The first operand register.
<Rm>      The register that is shifted and used as the second operand.
<type>    The type of shift to apply to the value read from <Rm>. It must be one of:
          ASR  Arithmetic shift right, encoded as type = 0b10
          LSL  Logical shift left, encoded as type = 0b00
          LSR  Logical shift right, encoded as type = 0b01
          ROR  Rotate right, encoded as type = 0b11.
<Rs>      The register whose bottom byte contains the amount to shift by.
The pre-UAL syntax ADC<c>S is equivalent to ADCS<c>.

Operation

    if ConditionPassed() then
        EncodingSpecificOperations();
        shift_n = UInt(R[s]<7:0>);
        shifted = Shift(R[m], shift_t, shift_n, APSR.C);
        (result, carry, overflow) = AddWithCarry(R[n], shifted, APSR.C);
        R[d] = result;
        if setflags then
            APSR.N = result<31>;
            APSR.Z = IsZeroBit(result);
            APSR.C = carry;
            APSR.V = overflow;

Exceptions

None.

A8.6.4 ADD (immediate, Thumb)

This instruction adds an immediate value to a register value, and writes the result to the destination register. It can optionally update the condition flags based on the result.

Encoding T1    ARMv4T, ARMv5T*, ARMv6*, ARMv7
ADDS <Rd>,<Rn>,#<imm3>       Outside IT block.
ADD<c> <Rd>,<Rn>,#<imm3>     Inside IT block.
Bits 15-0: 0 0 0 1 1 1 0 imm3 Rn Rd

    d = UInt(Rd); n = UInt(Rn); setflags = !InITBlock(); imm32 = ZeroExtend(imm3, 32);

Encoding T2    ARMv4T, ARMv5T*, ARMv6*, ARMv7
ADDS <Rdn>,#<imm8>           Outside IT block.
ADD<c> <Rdn>,#<imm8>         Inside IT block.
Bits 15-0: 0 0 1 1 0 Rdn imm8

    d = UInt(Rdn); n = UInt(Rdn); setflags = !InITBlock(); imm32 = ZeroExtend(imm8, 32);

Encoding T3    ARMv6T2, ARMv7
ADD{S}<c>.W <Rd>,<Rn>,#<const>
Bits 15-0 | 15-0: 1 1 1 1 0 i 0 1 0 0 0 S Rn | 0 imm3 Rd imm8

    if Rd == ‘1111’ && S == ‘1’ then SEE CMN (immediate);
    if Rn == ‘1101’ then SEE ADD (SP plus immediate);
    d = UInt(Rd); n = UInt(Rn); setflags = (S == ‘1’); imm32 = ThumbExpandImm(i:imm3:imm8);
    if BadReg(d) || n == 15 then UNPREDICTABLE;

Encoding T4    ARMv6T2, ARMv7
ADDW<c> <Rd>,<Rn>,#<imm12>
Bits 15-0 | 15-0: 1 1 1 1 0 i 1 0 0 0 0 0 Rn | 0 imm3 Rd imm8

    if Rn == ‘1111’ then SEE ADR;
    if Rn == ‘1101’ then SEE ADD (SP plus immediate);
    d = UInt(Rd); n = UInt(Rn); setflags = FALSE; imm32 = ZeroExtend(i:imm3:imm8, 32);
    if BadReg(d) then UNPREDICTABLE;
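Encoding T3 above obtains its constant from ThumbExpandImm(i:imm3:imm8), which is described under Modified immediate constants in Thumb instructions on page A6-17 and is not reproduced in this section. The Python sketch below is an assumption of how that expansion works. It handles the three replicated byte patterns and the rotated 1bcdefgh form. It omits the carry output, and it raises an error where the manual would say UNPREDICTABLE. `thumb_expand_imm` is a hypothetical name.

```python
def thumb_expand_imm(imm12: int) -> int:
    """Expand the 12-bit field i:imm3:imm8 into a 32-bit constant (carry-out omitted)."""
    imm8 = imm12 & 0xFF
    if (imm12 >> 10) == 0b00:
        pattern = (imm12 >> 8) & 0b11
        if pattern != 0b00 and imm8 == 0:
            raise ValueError("UNPREDICTABLE: replicated pattern with imm8 == 0")
        if pattern == 0b00:
            return imm8                          # 00000000 00000000 00000000 abcdefgh
        if pattern == 0b01:
            return (imm8 << 16) | imm8           # 00000000 abcdefgh 00000000 abcdefgh
        if pattern == 0b10:
            return (imm8 << 24) | (imm8 << 8)    # abcdefgh 00000000 abcdefgh 00000000
        return imm8 * 0x01010101                 # abcdefgh repeated four times
    # Otherwise rotate 1bcdefgh right by imm12<11:7>, a value from 8 to 31.
    unrotated = 0x80 | (imm12 & 0x7F)
    rot = imm12 >> 7
    return ((unrotated >> rot) | (unrotated << (32 - rot))) & 0xFFFFFFFF
```

With this sketch, `ADD r0, r1, #0xABABABAB` could use encoding T3 with i:imm3:imm8 = 0x3AB. A constant such as 0x12345678 has no T3 form.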
Assembler syntax

ADD{S}<c><q> {<Rd>,} <Rn>, #<const>     All encodings permitted
ADDW<c><q> {<Rd>,} <Rn>, #<const>       Only encoding T4 permitted

where:
S         If S is present, the instruction updates the flags. Otherwise, the flags are not updated.
<c>, <q>  See Standard assembler syntax fields on page A8-7.
<Rd>      The destination register.
<Rn>      The first operand register. If <Rn> is SP, see ADD (SP plus immediate) on page A8-28. If <Rn> is PC, see ADR on page A8-32.
<const>   The immediate value to be added to the value obtained from <Rn>. The range of values is 0-7 for encoding T1, 0-255 for encoding T2 and 0-4095 for encoding T4. See Modified immediate constants in Thumb instructions on page A6-17 for the range of values for encoding T3.

When multiple encodings of the same length are available for an instruction, encoding T3 is preferred to encoding T4 (if encoding T4 is required, use the ADDW syntax). Encoding T1 is preferred to encoding T2 if <Rd> is specified and encoding T2 is preferred to encoding T1 if <Rd> is omitted.

The pre-UAL syntax ADD<c>S is equivalent to ADDS<c>.

Operation

    if ConditionPassed() then
        EncodingSpecificOperations();
        (result, carry, overflow) = AddWithCarry(R[n], imm32, ‘0’);
        R[d] = result;
        if setflags then
            APSR.N = result<31>;
            APSR.Z = IsZeroBit(result);
            APSR.C = carry;
            APSR.V = overflow;

Exceptions

None.

A8.6.5 ADD (immediate, ARM)

This instruction adds an immediate value to a register value, and writes the result to the destination register. It can optionally update the condition flags based on the result.
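The ARM encoding that follows derives its constant with ARMExpandImm(imm12). That function is described under Modified immediate constants in ARM instructions on page A5-9 rather than here. The Python sketch below assumes the usual rule: rotate the 8-bit value imm12<7:0> right by twice the 4-bit field imm12<11:8>. It also includes a brute-force search for an encodable imm12. Both function names are hypothetical. The search returns the smallest rotation that fits, and which encoding a particular assembler picks is not claimed here.

```python
from typing import Optional

MASK32 = 0xFFFFFFFF


def ror32(x: int, n: int) -> int:
    """Rotate a 32-bit value right by n bits."""
    n %= 32
    return ((x >> n) | (x << (32 - n))) & MASK32


def arm_expand_imm(imm12: int) -> int:
    """ARMExpandImm() without carry: ROR(ZeroExtend(imm12<7:0>), 2 * imm12<11:8>)."""
    return ror32(imm12 & 0xFF, 2 * ((imm12 >> 8) & 0xF))


def arm_encode_imm(value: int) -> Optional[int]:
    """Search for an imm12 that expands to value; None if the constant is not encodable."""
    value &= MASK32
    for rotate in range(16):
        imm8 = ror32(value, 32 - 2 * rotate)   # rotating left undoes the right rotation
        if imm8 <= 0xFF:
            return (rotate << 8) | imm8
    return None
```

For example, 0xFF000000 is encodable (imm12 = 0x4FF). The constant 0x101 is not, because its set bits do not fit in any even-rotated 8-bit window.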
Encoding A1    ARMv4*, ARMv5T*, ARMv6*, ARMv7
ADD{S}<c> <Rd>,<Rn>,#<const>
Bits 31-0: cond 0 0 1 0 1 0 0 S Rn Rd imm12

    if Rn == ‘1111’ && S == ‘0’ then SEE ADR;
    if Rn == ‘1101’ then SEE ADD (SP plus immediate);
    if Rd == ‘1111’ && S == ‘1’ then SEE SUBS PC, LR and related instructions;
    d = UInt(Rd); n = UInt(Rn); setflags = (S == ‘1’); imm32 = ARMExpandImm(imm12);

Assembler syntax

ADD{S}<c><q> {<Rd>,} <Rn>, #<const>

where:
S         If S is present, the instruction updates the flags. Otherwise, the flags are not updated.
<c>, <q>  See Standard assembler syntax fields on page A8-7.
<Rd>      The destination register.
<Rn>      The first operand register. If the SP is specified for <Rn>, see ADD (SP plus immediate) on page A8-28. If the PC is specified for <Rn>, see ADR on page A8-32.
<const>   The immediate value to be added to the value obtained from <Rn>. See Modified immediate constants in ARM instructions on page A5-9 for the range of values.

The pre-UAL syntax ADD<c>S is equivalent to ADDS<c>.

Operation

    if ConditionPassed() then
        EncodingSpecificOperations();
        (result, carry, overflow) = AddWithCarry(R[n], imm32, ‘0’);
        if d == 15 then
            ALUWritePC(result);  // setflags is always FALSE here
        else
            R[d] = result;
            if setflags then
                APSR.N = result<31>;
                APSR.Z = IsZeroBit(result);
                APSR.C = carry;
                APSR.V = overflow;

Exceptions

None.

A8.6.6 ADD (register)

This instruction adds a register value and an optionally-shifted register value, and writes the result to the destination register. It can optionally update the condition flags based on the result.

Encoding T1    ARMv4T, ARMv5T*, ARMv6*, ARMv7
ADDS <Rd>,<Rn>,<Rm>        Outside IT block.
ADD<c> <Rd>,<Rn>,<Rm>      Inside IT block.
Bits 15-0: 0 0 0 1 1 0 0 Rm Rn Rd

    d = UInt(Rd); n = UInt(Rn); m = UInt(Rm); setflags = !InITBlock();
    (shift_t, shift_n) = (SRType_LSL, 0);

Encoding T2    ARMv6T2, ARMv7 if <Rdn> and <Rm> are both from R0-R7
               ARMv4T, ARMv5T*, ARMv6*, ARMv7 otherwise
ADD<c> <Rdn>,<Rm>          If <Rdn> is the PC, must be outside or last in IT block.
Bits 15-0: 0 1 0 0 0 1 0 0 DN Rm Rdn

    if (DN:Rdn) == ‘1101’ || Rm == ‘1101’ then SEE ADD (SP plus register);
    d = UInt(DN:Rdn); n = d; m = UInt(Rm); setflags = FALSE;
    (shift_t, shift_n) = (SRType_LSL, 0);
    if n == 15 && m == 15 then UNPREDICTABLE;
    if d == 15 && InITBlock() && !LastInITBlock() then UNPREDICTABLE;

Encoding T3    ARMv6T2, ARMv7
ADD{S}<c>.W <Rd>,<Rn>,<Rm>{,<shift>}
Bits 15-0 | 15-0: 1 1 1 0 1 0 1 1 0 0 0 S Rn | (0) imm3 Rd imm2 type Rm

    if Rd == ‘1111’ && S == ‘1’ then SEE CMN (register);
    if Rn == ‘1101’ then SEE ADD (SP plus register);
    d = UInt(Rd); n = UInt(Rn); m = UInt(Rm); setflags = (S == ‘1’);
    (shift_t, shift_n) = DecodeImmShift(type, imm3:imm2);
    if BadReg(d) || n == 15 || BadReg(m) then UNPREDICTABLE;

Encoding A1    ARMv4*, ARMv5T*, ARMv6*, ARMv7
ADD{S}<c> <Rd>,<Rn>,<Rm>{,<shift>}
Bits 31-0: cond 0 0 0 0 1 0 0 S Rn Rd imm5 type 0 Rm

    if Rd == ‘1111’ && S == ‘1’ then SEE SUBS PC, LR and related instructions;
    if Rn == ‘1101’ then SEE ADD (SP plus register);
    d = UInt(Rd); n = UInt(Rn); m = UInt(Rm); setflags = (S == ‘1’);
    (shift_t, shift_n) = DecodeImmShift(type, imm5);

Assembler syntax

ADD{S}<c><q> {<Rd>,} <Rn>, <Rm> {,<shift>}

where:
S         If S is present, the instruction updates the flags. Otherwise, the flags are not updated.
<c>, <q>  See Standard assembler syntax fields on page A8-7.
<Rd>      The destination register. If omitted, <Rd> is the same as <Rn> and encoding T2 is preferred to encoding T1 inside an IT block.
          If <Rd> is present, encoding T1 is preferred to encoding T2.
<Rn>      The first operand register. If <Rn> is SP, see ADD (SP plus register) on page A8-30.
<Rm>      The register that is optionally shifted and used as the second operand.
<shift>   The shift to apply to the value read from <Rm>. If present, only encoding T3 or A1 is permitted. If omitted, no shift is applied and any encoding is permitted. Shifts applied to a register on page A8-10 describes the shifts and how they are encoded.

In Thumb assembly, inside an IT block, if ADD<c> <Rd>,<Rn>,<Rd> cannot be assembled using encoding T1, it is assembled using encoding T2 as though ADD<c> <Rd>,<Rn> had been written. To prevent this happening, use the .W qualifier.

The pre-UAL syntax ADD<c>S is equivalent to ADDS<c>.

Operation

    if ConditionPassed() then
        EncodingSpecificOperations();
        shifted = Shift(R[m], shift_t, shift_n, APSR.C);
        (result, carry, overflow) = AddWithCarry(R[n], shifted, ‘0’);
        if d == 15 then
            ALUWritePC(result);  // setflags is always FALSE here
        else
            R[d] = result;
            if setflags then
                APSR.N = result<31>;
                APSR.Z = IsZeroBit(result);
                APSR.C = carry;
                APSR.V = overflow;

Exceptions

None.

A8.6.7 ADD (register-shifted register)

Add (register-shifted register) adds a register value and a register-shifted register value. It writes the result to the destination register, and can optionally update the condition flags based on the result.

Encoding A1    ARMv4*, ARMv5T*, ARMv6*, ARMv7
ADD{S}<c> <Rd>,<Rn>,<Rm>,<type> <Rs>
Bits 31-0: cond 0 0 0 0 1 0 0 S Rn Rd Rs 0 type 1 Rm

    d = UInt(Rd); n = UInt(Rn); m = UInt(Rm); s = UInt(Rs);
    setflags = (S == ‘1’); shift_t = DecodeRegShift(type);
    if d == 15 || n == 15 || m == 15 || s == 15 then UNPREDICTABLE;
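Only the bottom byte of <Rs> is used, so shift amounts from 32 to 255 can occur. Shift_C() still gives those amounts defined results: LSL and LSR produce zero, ASR produces copies of bit 31, and ROR uses the amount modulo 32. The Python sketch below shows the effect on the second operand. It does not model flags or carry-out, and its function names are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def shift_by_register(rm: int, shift_type: str, rs: int) -> int:
    """Second operand of a register-shifted-register ADD: Rm shifted by Rs<7:0>."""
    rm &= MASK32
    amount = rs & 0xFF                      # shift_n = UInt(R[s]<7:0>)
    if amount == 0:
        return rm                           # Shift_C returns the value unchanged
    if shift_type == "LSL":
        return (rm << amount) & MASK32      # becomes 0 once amount >= 32
    if shift_type == "LSR":
        return rm >> amount                 # becomes 0 once amount >= 32
    if shift_type == "ASR":
        signed = rm - (1 << 32) if rm & 0x80000000 else rm
        return (signed >> amount) & MASK32  # all copies of bit 31 once amount >= 32
    if shift_type == "ROR":
        m = amount % 32                     # ROR_C rotates by amount MOD 32
        return ((rm >> m) | (rm << (32 - m))) & MASK32
    raise ValueError(f"unknown shift type {shift_type!r}")


def add_rsr(rn: int, rm: int, shift_type: str, rs: int) -> int:
    """Result of ADD Rd, Rn, Rm, <type> Rs (flags not modelled)."""
    return (rn + shift_by_register(rm, shift_type, rs)) & MASK32
```

For example, with Rs = 0x108 only the byte 0x08 is used, so `add_rsr(1, 1, "LSL", 0x108)` adds 1 and 0x100.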
Assembler syntax

ADD{S}<c><q> {<Rd>,} <Rn>, <Rm>, <type> <Rs>

where:
S         If S is present, the instruction updates the flags. Otherwise, the flags are not updated.
<c>, <q>  See Standard assembler syntax fields on page A8-7.
<Rd>      The destination register.
<Rn>      The first operand register.
<Rm>      The register that is shifted and used as the second operand.
<type>    The type of shift to apply to the value read from <Rm>. It must be one of:
          ASR  Arithmetic shift right, encoded as type = 0b10
          LSL  Logical shift left, encoded as type = 0b00
          LSR  Logical shift right, encoded as type = 0b01
          ROR  Rotate right, encoded as type = 0b11.
<Rs>      The register whose bottom byte contains the amount to shift by.

The pre-UAL syntax ADD<c>S is equivalent to ADDS<c>.

Operation

    if ConditionPassed() then
        EncodingSpecificOperations();
        shift_n = UInt(R[s]<7:0>);
        shifted = Shift(R[m], shift_t, shift_n, APSR.C);
        (result, carry, overflow) = AddWithCarry(R[n], shifted, ‘0’);
        R[d] = result;
        if setflags then
            APSR.N = result<31>;
            APSR.Z = IsZeroBit(result);
            APSR.C = carry;
            APSR.V = overflow;

Exceptions

None.

A8.6.8 ADD (SP plus immediate)

This instruction adds an immediate value to the SP value, and writes the result to the destination register.
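The 16-bit encodings T1 and T2 listed next store the constant divided by 4 (imm32 = ZeroExtend(imm8:’00’) or ZeroExtend(imm7:’00’)). That is why only multiples of 4 are accepted. The Python sketch below is a hypothetical encoder for just those two forms. It prefers T2 when the destination is SP, and it raises an error when a 32-bit encoding would be needed instead.

```python
def encode_add_sp_imm16(rd: int, imm: int) -> int:
    """Hypothetical helper: 16-bit halfword for ADD Rd, SP, #imm (encoding T1 or T2)."""
    if imm < 0 or imm % 4 != 0:
        raise ValueError("16-bit encodings need a non-negative multiple of 4")
    if rd == 13 and imm <= 508:
        return 0xB000 | (imm >> 2)              # T2: 1 0 1 1 0 0 0 0 0 imm7, imm7 = imm / 4
    if 0 <= rd <= 7 and imm <= 1020:
        return 0xA800 | (rd << 8) | (imm >> 2)  # T1: 1 0 1 0 1 Rd imm8, imm8 = imm / 4
    raise ValueError("no 16-bit encoding; a 32-bit encoding (T3 or T4) would be needed")
```

For example, `ADD SP, SP, #16` gives 0xB004 and `ADD r0, SP, #4` gives 0xA801.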
Encoding T1    ARMv4T, ARMv5T*, ARMv6*, ARMv7
ADD<c> <Rd>,SP,#<imm>
Bits 15-0: 1 0 1 0 1 Rd imm8

    d = UInt(Rd); setflags = FALSE; imm32 = ZeroExtend(imm8:’00’, 32);

Encoding T2    ARMv4T, ARMv5T*, ARMv6*, ARMv7
ADD<c> SP,SP,#<imm>
Bits 15-0: 1 0 1 1 0 0 0 0 0 imm7

    d = 13; setflags = FALSE; imm32 = ZeroExtend(imm7:’00’, 32);

Encoding T3    ARMv6T2, ARMv7
ADD{S}<c>.W <Rd>,SP,#<const>
Bits 15-0 | 15-0: 1 1 1 1 0 i 0 1 0 0 0 S 1 1 0 1 | 0 imm3 Rd imm8

    if Rd == ‘1111’ && S == ‘1’ then SEE CMN (immediate);
    d = UInt(Rd); setflags = (S == ‘1’); imm32 = ThumbExpandImm(i:imm3:imm8);
    if d == 15 then UNPREDICTABLE;

Encoding T4    ARMv6T2, ARMv7
ADDW<c> <Rd>,SP,#<imm12>
Bits 15-0 | 15-0: 1 1 1 1 0 i 1 0 0 0 0 0 1 1 0 1 | 0 imm3 Rd imm8

    d = UInt(Rd); setflags = FALSE; imm32 = ZeroExtend(i:imm3:imm8, 32);
    if d == 15 then UNPREDICTABLE;

Encoding A1    ARMv4*, ARMv5T*, ARMv6*, ARMv7
ADD{S}<c> <Rd>,SP,#<const>
Bits 31-0: cond 0 0 1 0 1 0 0 S 1 1 0 1 Rd imm12

    if Rd == ‘1111’ && S == ‘1’ then SEE SUBS PC, LR and related instructions;
    d = UInt(Rd); setflags = (S == ‘1’); imm32 = ARMExpandImm(imm12);

Assembler syntax

ADD{S}<c><q> {<Rd>,} SP, #<const>     All encodings permitted
ADDW<c><q> {<Rd>,} SP, #<const>       Only encoding T4 is permitted

where:
S         If S is present, the instruction updates the flags. Otherwise, the flags are not updated.
<c>, <q>  See Standard assembler syntax fields on page A8-7.
<Rd>      The destination register. If omitted, <Rd> is SP.
<const>   The immediate value to be added to the value obtained from SP. Values are multiples of 4 in the range 0-1020 for encoding T1, multiples of 4 in the range 0-508 for encoding T2 and any value in the range 0-4095 for encoding T4.
          See Modified immediate constants in Thumb instructions on page A6-17 or Modified immediate constants in ARM instructions on page A5-9 for the range of values for encodings T3 and A1.

When both 32-bit encodings are available for an instruction, encoding T3 is preferred to encoding T4 (if encoding T4 is required, use the ADDW syntax).

The pre-UAL syntax ADD<c>S is equivalent to ADDS<c>.

Operation

    if ConditionPassed() then
        EncodingSpecificOperations();
        (result, carry, overflow) = AddWithCarry(SP, imm32, ‘0’);
        if d == 15 then          // Can only occur for ARM encoding
            ALUWritePC(result);  // setflags is always FALSE here
        else
            R[d] = result;
            if setflags then
                APSR.N = result<31>;
                APSR.Z = IsZeroBit(result);
                APSR.C = carry;
                APSR.V = overflow;

Exceptions

None.

A8.6.9 ADD (SP plus register)

This instruction adds an optionally-shifted register value to the SP value, and writes the result to the destination register.
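The two 16-bit encodings listed next overlap. T1 forms a 4-bit register number from DM:Rdm, and T2 with Rm == ‘1101’ is redirected to T1, which is the ADD SP,SP,SP case. The Python decoder below is a hypothetical sketch of that overlap. It checks T1 first and prints register 13 as SP. It is not taken from any disassembler.

```python
from typing import Optional


def reg_name(r: int) -> str:
    """Name a core register, printing R13 as SP."""
    return "SP" if r == 13 else f"R{r}"


def decode_add_sp_reg16(hw: int) -> Optional[str]:
    """Hypothetical decoder for the 16-bit ADD (SP plus register) halfwords."""
    if (hw & 0xFF78) == 0x4468:                      # T1: 0 1 0 0 0 1 0 0 DM 1 1 0 1 Rdm
        rdm = (((hw >> 7) & 1) << 3) | (hw & 0b111)  # d = m = UInt(DM:Rdm)
        return f"ADD {reg_name(rdm)}, SP, {reg_name(rdm)}"
    if (hw & 0xFF87) == 0x4485:                      # T2: 0 1 0 0 0 1 0 0 1 Rm 1 0 1
        rm = (hw >> 3) & 0xF                         # Rm == 1101 was already matched as T1
        return f"ADD SP, {reg_name(rm)}"
    return None
```

For instance, 0x44ED matches both patterns. Checking T1 first decodes it as ADD SP, SP, SP, which agrees with the rule "if Rm == ‘1101’ then SEE encoding T1".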
Encoding T1    ARMv4T, ARMv5T*, ARMv6*, ARMv7
ADD<c> <Rdm>, SP, <Rdm>
Bits 15-0: 0 1 0 0 0 1 0 0 DM 1 1 0 1 Rdm

    d = UInt(DM:Rdm); m = UInt(DM:Rdm); setflags = FALSE;
    (shift_t, shift_n) = (SRType_LSL, 0);

Encoding T2    ARMv4T, ARMv5T*, ARMv6*, ARMv7
ADD<c> SP,<Rm>
Bits 15-0: 0 1 0 0 0 1 0 0 1 Rm 1 0 1

    if Rm == ‘1101’ then SEE encoding T1;
    d = 13; m = UInt(Rm); setflags = FALSE;
    (shift_t, shift_n) = (SRType_LSL, 0);

Encoding T3    ARMv6T2, ARMv7
ADD{S}<c>.W <Rd>,SP,<Rm>{,<shift>}
Bits 15-0 | 15-0: 1 1 1 0 1 0 1 1 0 0 0 S 1 1 0 1 | 0 imm3 Rd imm2 type Rm

    d = UInt(Rd); m = UInt(Rm); setflags = (S == ‘1’);
    (shift_t, shift_n) = DecodeImmShift(type, imm3:imm2);
    if d == 13 && (shift_t != SRType_LSL || shift_n > 3) then UNPREDICTABLE;
    if d == 15 || BadReg(m) then UNPREDICTABLE;

Encoding A1    ARMv4*, ARMv5T*, ARMv6*, ARMv7
ADD{S}<c> <Rd>,SP,<Rm>{,<shift>}
Bits 31-0: cond 0 0 0 0 1 0 0 S 1 1 0 1 Rd imm5 type 0 Rm

    if Rd == ‘1111’ && S == ‘1’ then SEE SUBS PC, LR and related instructions;
    d = UInt(Rd); m = UInt(Rm); setflags = (S == ‘1’);
    (shift_t, shift_n) = DecodeImmShift(type, imm5);

Assembler syntax

ADD{S}<c><q> {<Rd>,} SP, <Rm>{, <shift>}

where:
S         If S is present, the instruction updates the flags. Otherwise, the flags are not updated.
<c>, <q>  See Standard assembler syntax fields on page A8-7.
<Rd>      The destination register. This register can be SP. If omitted, <Rd> is SP. This register can be the PC, but if it is, encoding T3 is not permitted. Using the PC is deprecated.
<Rm>      The register that is optionally shifted and used as the second operand. This register can be the PC, but if it is, encoding T3 is not permitted. Using the PC is deprecated.
          This register can be SP in both ARM and Thumb instructions, but:
          • the use of SP is deprecated
          • when assembling for the Thumb instruction set, only encoding T1 is available and so the instruction can only be ADD SP,SP,SP.
<shift>   The shift to apply to the value read from <Rm>. If omitted, no shift is applied and any encoding is permitted. If present, only encoding T3 or A1 is permitted. Shifts applied to a register on page A8-10 describes the shifts and how they are encoded.
          In the Thumb instruction set, if <Rd> is SP or omitted, <shift> is only permitted to be omitted, LSL #1, LSL #2, or LSL #3.

The pre-UAL syntax ADD<c>S is equivalent to ADDS<c>.

Operation

    if ConditionPassed() then
        EncodingSpecificOperations();
        shifted = Shift(R[m], shift_t, shift_n, APSR.C);
        (result, carry, overflow) = AddWithCarry(SP, shifted, ‘0’);
        if d == 15 then
            ALUWritePC(result);  // setflags is always FALSE here
        else
            R[d] = result;
            if setflags then
                APSR.N = result<31>;
                APSR.Z = IsZeroBit(result);
                APSR.C = carry;
                APSR.V = overflow;

Exceptions

None.

A8.6.10 ADR

This instruction adds an immediate value to the PC value to form a PC-relative address, and writes the result to the destination register.

Encoding T1    ARMv4T, ARMv5T*, ARMv6*, ARMv7
ADR<c> <Rd>,<label>
A8.6.266 VABA, VABAL

Encoding T1 / A1    Advanced SIMD
VABA<c>.<dt> <Qd>, <Qn>, <Qm>
VABA<c>.<dt> <Dd>, <Dn>, <Dm>
Thumb bits 15-0 | 15-0: 1 1 1 U 1 1 1 1 0 D size Vn | Vd 0 1 1 1 N Q M 1 Vm
ARM bits 31-0:          1 1 1 1 0 0 1 U 0 D size Vn Vd 0 1 1 1 N Q M 1 Vm

    if size == ‘11’ then UNDEFINED;
    if Q == ‘1’ && (Vd<0> == ‘1’ || Vn<0> == ‘1’ || Vm<0> == ‘1’) then UNDEFINED;
    unsigned = (U == ‘1’); long_destination = FALSE;
    esize = 8 << UInt(size); elements = 64 DIV esize;
    d = UInt(D:Vd); n = UInt(N:Vn); m = UInt(M:Vm); regs = if Q == ‘0’ then 1 else 2;

Encoding T2 / A2    Advanced SIMD
VABAL<c>.<dt> <Qd>, <Dn>, <Dm>
Thumb bits 15-0 | 15-0: 1 1 1 U 1 1 1 1 1 D size Vn | Vd 0 1 0 1 N 0 M 0 Vm
ARM bits 31-0:          1 1 1 1 0 0 1 U 1 D size Vn Vd 0 1 0 1 N 0 M 0 Vm

    if size == ‘11’ then SEE “Related encodings”;
    if Vd<0> == ‘1’ then UNDEFINED;
    unsigned = (U == ‘1’); long_destination = TRUE;
    esize = 8 << UInt(size); elements = 64 DIV esize;
    d = UInt(D:Vd); n = UInt(N:Vn); m = UInt(M:Vm); regs = 1;

Related encodings    See Advanced SIMD data-processing instructions on page A7-10.

Assembler syntax

VABA<c><q>.<dt> <Qd>, <Qn>, <Qm>      Encoding T1 / A1, Q = 1
VABA<c><q>.<dt> <Dd>, <Dn>, <Dm>      Encoding T1 / A1, Q = 0
VABAL<c><q>.<dt> <Qd>, <Dn>, <Dm>     Encoding T2 / A2

where:
<c>, <q>  See Standard assembler syntax fields on page A8-7. An ARM VABA or VABAL instruction must be unconditional.
<dt>      The data type for the elements of the operands. It must be one of:
          S8   encoded as size = 0b00, U = 0
          S16  encoded as size = 0b01, U = 0
          S32  encoded as size = 0b10, U = 0
          U8   encoded as size = 0b00, U = 1
          U16  encoded as size = 0b01, U = 1
          U32  encoded as size = 0b10, U = 1.
<Qd>, <Qn>, <Qm>  The destination vector and the operand vectors, for a quadword operation.
<Dd>, <Dn>, <Dm>  The destination vector and the operand vectors, for a doubleword operation.
<Qd>, <Dn>, <Dm>  The destination vector and the operand vectors, for a long operation.

Operation

    if ConditionPassed() then
        EncodingSpecificOperations(); CheckAdvSIMDEnabled();
        for r = 0 to regs-1
            for e = 0 to elements-1
                op1 = Elem[D[n+r],e,esize];
                op2 = Elem[D[m+r],e,esize];
                absdiff = Abs(Int(op1,unsigned) - Int(op2,unsigned));
                if long_destination then
                    Elem[Q[d>>1],e,2*esize] = Elem[Q[d>>1],e,2*esize] + absdiff;
                else
                    Elem[D[d+r],e,esize] = Elem[D[d+r],e,esize] + absdiff;

Exceptions

Undefined Instruction.

A8.6.267 VABD, VABDL (integer)

Vector Absolute Difference {Long} (integer) subtracts the elements of one vector from the corresponding elements of another vector, and places the absolute values of the results in the elements of the destination vector.

Operand and result elements are either all integers of the same length, or optionally the results can be double the length of the operands.

Encoding T1 / A1    Advanced SIMD
VABD<c>.<dt> <Qd>, <Qn>, <Qm>
VABD<c>.<dt> <Dd>, <Dn>, <Dm>
Thumb bits 15-0 | 15-0: 1 1 1 U 1 1 1 1 0 D size Vn | Vd 0 1 1 1 N Q M 0 Vm
ARM bits 31-0:          1 1 1 1 0 0 1 U 0 D size Vn Vd 0 1 1 1 N Q M 0 Vm

    if size == ‘11’ then UNDEFINED;
    if Q == ‘1’ && (Vd<0> == ‘1’ || Vn<0> == ‘1’ || Vm<0> == ‘1’) then UNDEFINED;
    unsigned = (U == ‘1’); long_destination = FALSE;
    esize = 8 << UInt(size); elements = 64 DIV esize;
    d = UInt(D:Vd); n = UInt(N:Vn); m = UInt(M:Vm); regs = if Q == ‘0’ then 1 else 2;

Encoding T2 / A2    Advanced SIMD
VABDL<c>.<dt> <Qd>, <Dn>, <Dm>
Thumb bits 15-0 | 15-0: 1 1 1 U 1 1 1 1 1 D size Vn | Vd 0 1 1 1 N 0 M 0 Vm
ARM bits 31-0:          1 1 1 1 0 0 1 U 1 D size Vn Vd 0 1 1 1 N 0 M 0 Vm

    if size == ‘11’ then SEE “Related encodings”;
    if Vd<0> == ‘1’ then UNDEFINED;
    unsigned = (U == ‘1’); long_destination = TRUE;
    esize = 8 << UInt(size); elements = 64 DIV esize;
    d = UInt(D:Vd); n = UInt(N:Vn); m = UInt(M:Vm); regs = 1;

Related encodings    See Advanced SIMD data-processing instructions on page A7-10.

Assembler syntax

VABD<c><q>.<dt> <Qd>, <Qn>, <Qm>      Encoding T1 / A1, Q = 1
VABD<c><q>.<dt> <Dd>, <Dn>, <Dm>      Encoding T1 / A1, Q = 0
VABDL<c><q>.<dt> <Qd>, <Dn>, <Dm>     Encoding T2 / A2

where:
<c>, <q>  See Standard assembler syntax fields on page A8-7. An ARM VABD or VABDL instruction must be unconditional.
<dt>      The data type for the elements of the operands. It must be one of:
          S8   encoded as size = 0b00, U = 0
          S16  encoded as size = 0b01, U = 0
          S32  encoded as size = 0b10, U = 0
          U8   encoded as size = 0b00, U = 1
          U16  encoded as size = 0b01, U = 1
          U32  encoded as size = 0b10, U = 1.
<Qd>, <Qn>, <Qm>  The destination vector and the operand vectors, for a quadword operation.
<Dd>, <Dn>, <Dm>  The destination vector and the operand vectors, for a doubleword operation.
<Qd>, <Dn>, <Dm>  The destination vector and the operand vectors, for a long operation.

Operation

    if ConditionPassed() then
        EncodingSpecificOperations(); CheckAdvSIMDEnabled();
        for r = 0 to regs-1
            for e = 0 to elements-1
                op1 = Elem[D[n+r],e,esize];
                op2 = Elem[D[m+r],e,esize];
                absdiff = Abs(Int(op1,unsigned) - Int(op2,unsigned));
                if long_destination then
                    Elem[Q[d>>1],e,2*esize] = absdiff<2*esize-1:0>;
                else
                    Elem[D[d+r],e,esize] = absdiff;

Exceptions

Undefined Instruction.

A8.6.268 VABD (floating-point)

Vector Absolute Difference (floating-point) subtracts the elements of one vector from the corresponding elements of another vector, and places the absolute values of the results in the elements of the destination vector.

Operand and result elements are all single-precision floating-point numbers.

Encoding T1 / A1    Advanced SIMD (UNDEFINED in integer-only variant)
VABD<c>.F32 <Qd>, <Qn>, <Qm>
VABD<c>.F32 <Dd>, <Dn>, <Dm>
Thumb bits 15-0 | 15-0: 1 1 1 1 1 1 1 1 0 D 1 sz Vn | Vd 1 1 0 1 N Q M 0 Vm
ARM bits 31-0:          1 1 1 1 0 0 1 1 0 D 1 sz Vn Vd 1 1 0 1 N Q M 0 Vm

    if Q == ‘1’ && (Vd<0> == ‘1’ || Vn<0> == ‘1’ || Vm<0> == ‘1’) then UNDEFINED;
    if sz == ‘1’ then UNDEFINED;
    esize = 32; elements = 2;
    d = UInt(D:Vd); n = UInt(N:Vn); m = UInt(M:Vm); regs = if Q == ‘0’ then 1 else 2;

Assembler syntax

VABD<c><q>.F32 <Qd>, <Qn>, <Qm>       Encoded as Q = 1, sz = 0
VABD<c><q>.F32 <Dd>, <Dn>, <Dm>       Encoded as Q = 0, sz = 0

where:
<c>, <q>  See Standard assembler syntax fields on page A8-7. An ARM VABD instruction must be unconditional.
<Qd>, <Qn>, <Qm>  The destination vector and the operand vectors, for a quadword operation.
<Dd>, <Dn>, <Dm>  The destination vector and the operand vectors, for a doubleword operation.

Operation

    if ConditionPassed() then
        EncodingSpecificOperations(); CheckAdvSIMDEnabled();
        for r = 0 to regs-1
            for e = 0 to elements-1
                op1 = Elem[D[n+r],e,esize];
                op2 = Elem[D[m+r],e,esize];
                Elem[D[d+r],e,esize] = FPAbs(FPSub(op1,op2,FALSE));

Exceptions

Undefined Instruction.
Floating-point exceptions: Input Denormal, Invalid Operation, Overflow, Underflow, and Inexact.

A8.6.269 VABS

Vector Absolute takes the absolute value of each element in a vector, and places the results in a second vector. The floating-point version only clears the sign bit.

Encoding T1 / A1    Advanced SIMD (F = 1 UNDEFINED in integer-only variants)
VABS<c>.<dt> <Qd>, <Qm>
VABS<c>.<dt> <Dd>, <Dm>
Thumb bits 15-0 | 15-0: 1 1 1 1 1 1 1 1 1 D 1 1 size 0 1 | Vd 0 F 1 1 0 Q M 0 Vm
ARM bits 31-0:          1 1 1 1 0 0 1 1 1 D 1 1 size 0 1 Vd 0 F 1 1 0 Q M 0 Vm

    if size == ‘11’ || (F == ‘1’ && size != ‘10’) then UNDEFINED;
    if Q == ‘1’ && (Vd<0> == ‘1’ || Vm<0> == ‘1’) then UNDEFINED;
    advsimd = TRUE; floating_point = (F == ‘1’);
    esize = 8 << UInt(size); elements = 64 DIV esize;
    d = UInt(D:Vd); m = UInt(M:Vm); regs = if Q == ‘0’ then 1 else 2;

Encoding T2 / A2    VFPv2, VFPv3 (sz = 1 UNDEFINED in single-precision only variants)
VABS<c>.F64 <Dd>, <Dm>
VABS<c>.F32 <Sd>, <Sm>
Thumb bits 15-0 | 15-0: 1 1 1 0 1 1 1 0 1 D 1 1 0 0 0 0 | Vd 1 0 1 sz 1 1 M 0 Vm
ARM bits 31-0:          cond 1 1 1 0 1 D 1 1 0 0 0 0 Vd 1 0 1 sz 1 1 M 0 Vm

    if FPSCR.LEN != ‘000’ || FPSCR.STRIDE != ‘00’ then SEE “VFP vectors”;
    advsimd = FALSE; dp_operation = (sz == ‘1’);
    d = if dp_operation then UInt(D:Vd) else UInt(Vd:D);
    m = if dp_operation then UInt(M:Vm) else UInt(Vm:M);

VFP vectors    Encoding T2 / A2 can operate on VFP vectors under control of the FPSCR.LEN and FPSCR.STRIDE bits. For details see Appendix F VFP Vector Operation Support.

Assembler syntax

VABS<c><q>.<dt> <Qd>, <Qm>       <dt> != F64
VABS<c><q>.<dt> <Dd>, <Dm>
VABS<c><q>.F32 <Sd>, <Sm>        VFP only. Encoding T2/A2, sz = 0

where:
<c>, <q>  See Standard assembler syntax fields on page A8-7. An ARM Advanced SIMD VABS instruction must be unconditional.
<dt>      The data type for the elements of the vectors. It must be one of:
          S8   encoding T1 / A1, size = 0b00, F = 0
          S16  encoding T1 / A1, size = 0b01, F = 0
          S32  encoding T1 / A1, size = 0b10, F = 0
          F32  encoding T1 / A1, size = 0b10, F = 1
          F64  encoding T2 / A2, sz = 1.
<Qd>, <Qm>  The destination vector and the operand vector, for a quadword operation.
<Dd>, <Dm>  The destination vector and the operand vector, for a doubleword operation.
<Sd>, <Sm>  The destination vector and the operand vector, for a singleword operation.

Operation
if ConditionPassed() then
    EncodingSpecificOperations();  CheckAdvSIMDOrVFPEnabled(TRUE, advsimd);
    if advsimd then  // Advanced SIMD instruction
        for r = 0 to regs-1
            for e = 0 to elements-1
                if floating_point then
                    Elem[D[d+r],e,esize] = FPAbs(Elem[D[m+r],e,esize]);
                else
                    result = Abs(SInt(Elem[D[m+r],e,esize]));
                    Elem[D[d+r],e,esize] = result<esize-1:0>;
    else  // VFP instruction
        if dp_operation then
            D[d] = FPAbs(D[m]);
        else
            S[d] = FPAbs(S[m]);

Exceptions
Undefined Instruction.

A8.6.270 VACGE, VACGT, VACLE, VACLT
VACGE (Vector Absolute Compare Greater Than or Equal) and VACGT (Vector Absolute Compare Greater Than) take the absolute value of each element in a vector, and compare it with the absolute value of the corresponding element of a second vector. If the condition is true, the corresponding element in the destination vector is set to all ones. Otherwise, it is set to all zeros.
VACLE (Vector Absolute Compare Less Than or Equal) is a pseudo-instruction, equivalent to a VACGE instruction with the operands reversed. Disassembly produces the VACGE instruction.
VACLT (Vector Absolute Compare Less Than) is a pseudo-instruction, equivalent to a VACGT instruction with the operands reversed. Disassembly produces the VACGT instruction.
The operands and result can be quadword or doubleword vectors. They must all be the same size. The operand vector elements must be 32-bit floating-point numbers. The result vector elements are 32-bit bitfields.

Encoding T1 / A1    Advanced SIMD (UNDEFINED in integer-only variant)
V<op><c>.F32 <Qd>, <Qn>, <Qm>
V<op><c>.F32 <Dd>, <Dn>, <Dm>
T1:  1111 1111 0 D op sz Vn | Vd 1110 N Q M 1 Vm
A1:  1111 0011 0 D op sz Vn Vd 1110 N Q M 1 Vm
if Q == ‘1’ && (Vd<0> == ‘1’ || Vn<0> == ‘1’ || Vm<0> == ‘1’) then UNDEFINED;
if sz == ‘1’ then UNDEFINED;
or_equal = (op == ‘0’);  esize = 32;  elements = 2;
d = UInt(D:Vd);  n = UInt(N:Vn);  m = UInt(M:Vm);  regs = if Q == ‘0’ then 1 else 2;

Assembler syntax
V<op><c><q>.F32 {<Qd>,} <Qn>, <Qm>    Encoded as Q = 1
V<op><c><q>.F32 {<Dd>,} <Dn>, <Dm>    Encoded as Q = 0
where:
<op>  The operation. <op> must be one of:
      ACGE  Absolute Compare Greater than or Equal, encoded as op = 0
      ACGT  Absolute Compare Greater Than, encoded as op = 1.
<c>, <q>  See Standard assembler syntax fields on page A8-7. An ARM VACGE, VACGT, VACLE, or VACLT instruction must be unconditional.
<Qd>, <Qn>, <Qm>  The destination vector and the operand vectors, for a quadword operation.
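The following Python sketch shows the absolute comparison, the all-ones or all-zeros result element, and the reduction of VACLE to VACGE with reversed operands. It is a simplified model under stated assumptions: vacge_f32 and vacle_f32 are hypothetical names, elements are Python floats assumed to be exactly representable in single precision, and neither the flush-to-zero handling of denormal inputs nor the floating-point exceptions are modelled.

```python
ALL_ONES = 0xFFFFFFFF  # Ones(32): the "true" result element


def vacge_f32(n, m, or_equal=True):
    """Model VACGE.F32 (or_equal=True) or VACGT.F32 (or_equal=False).

    n and m are lists of float elements. Each result element is ALL_ONES
    when |n[e]| >= |m[e]| (or > for VACGT), and 0 otherwise. A comparison
    involving a NaN is false, so it yields 0.
    """
    out = []
    for a, b in zip(n, m):
        a, b = abs(a), abs(b)
        passed = a >= b if or_equal else a > b
        out.append(ALL_ONES if passed else 0)
    return out


def vacle_f32(n, m):
    """VACLE.F32 is assembled as VACGE.F32 with the operands reversed."""
    return vacge_f32(m, n, or_equal=True)
```

Note that -2.0 and 2.0 satisfy VACGE but not VACGT, since only magnitudes are compared.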
<Dd>, <Dn>, <Dm>  The destination vector and the operand vectors, for a doubleword operation.

Operation
if ConditionPassed() then
    EncodingSpecificOperations();  CheckAdvSIMDEnabled();
    for r = 0 to regs-1
        for e = 0 to elements-1
            op1 = FPAbs(Elem[D[n+r],e,esize]);  op2 = FPAbs(Elem[D[m+r],e,esize]);
            if or_equal then
                test_passed = FPCompareGE(op1, op2, FALSE);
            else
                test_passed = FPCompareGT(op1, op2, FALSE);
            Elem[D[d+r],e,esize] = if test_passed then Ones(esize) else Zeros(esize);

Exceptions
Undefined Instruction.
Floating-point exceptions: Input Denormal and Invalid Operation.

A8.6.271 VADD (integer)
Vector Add adds corresponding elements in two vectors, and places the results in the destination vector.

Encoding T1 / A1    Advanced SIMD
VADD<c>.<dt> <Qd>, <Qn>, <Qm>
VADD<c>.<dt> <Dd>, <Dn>, <Dm>
T1:  1110 1111 0 D size Vn | Vd 1000 N Q M 0 Vm
A1:  1111 0010 0 D size Vn Vd 1000 N Q M 0 Vm
if Q == ‘1’ && (Vd<0> == ‘1’ || Vn<0> == ‘1’ || Vm<0> == ‘1’) then UNDEFINED;
esize = 8 << UInt(size);  elements = 64 DIV esize;
d = UInt(D:Vd);  n = UInt(N:Vn);  m = UInt(M:Vm);  regs = if Q == ‘0’ then 1 else 2;

Assembler syntax
VADD<c><q>.<dt> {<Qd>,} <Qn>, <Qm>
VADD<c><q>.<dt> {<Dd>,} <Dn>, <Dm>
where:
<c>, <q>  See Standard assembler syntax fields on page A8-7. An ARM Advanced SIMD VADD instruction must be unconditional.
<dt>  The data type for the elements of the vectors. It must be one of:
      I8    size = 0b00
      I16   size = 0b01
      I32   size = 0b10
      I64   size = 0b11.
<Qd>, <Qn>, <Qm>  The destination vector and the operand vectors, for a quadword operation.
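Each lane of VADD (integer) wraps independently, so a carry out of one element never reaches its neighbour. The Python sketch below illustrates this for one 64-bit doubleword stored as a Python int; vadd_int is a hypothetical helper, and a quadword operation (regs = 2) would apply it to each of the two doublewords in turn.

```python
def vadd_int(dn, dm, esize):
    """Model VADD.I<esize> on one 64-bit doubleword register.

    dn and dm are 64-bit register values as Python ints. Each esize-bit
    lane is added independently; carries never cross lane boundaries.
    """
    mask = (1 << esize) - 1
    result = 0
    for e in range(64 // esize):
        shift = e * esize  # Elem[D[x], e, esize] occupies bits shift+esize-1 : shift
        a = (dn >> shift) & mask
        b = (dm >> shift) & mask
        result |= ((a + b) & mask) << shift
    return result
```

With esize = 8, adding 0xFF and 0x01 gives 0x00 in lane 0 and leaves lane 1 untouched; with esize = 16 the same operands give 0x0100.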
<Dd>, <Dn>, <Dm>  The destination vector and the operand vectors, for a doubleword operation.

Operation
if ConditionPassed() then
    EncodingSpecificOperations();  CheckAdvSIMDEnabled();
    for r = 0 to regs-1
        for e = 0 to elements-1
            Elem[D[d+r],e,esize] = Elem[D[n+r],e,esize] + Elem[D[m+r],e,esize];

Exceptions
Undefined Instruction.

A8.6.272 VADD (floating-point)
Vector Add adds corresponding elements in two vectors, and places the results in the destination vector.

Encoding T1 / A1    Advanced SIMD (UNDEFINED in integer-only variant)
VADD<c>.F32 <Qd>, <Qn>, <Qm>
VADD<c>.F32 <Dd>, <Dn>, <Dm>
T1:  1110 1111 0 D 0 sz Vn | Vd 1101 N Q M 0 Vm
A1:  1111 0010 0 D 0 sz Vn Vd 1101 N Q M 0 Vm
if Q == ‘1’ && (Vd<0> == ‘1’ || Vn<0> == ‘1’ || Vm<0> == ‘1’) then UNDEFINED;
if sz == ‘1’ then UNDEFINED;
advsimd = TRUE;  esize = 32;  elements = 2;
d = UInt(D:Vd);  n = UInt(N:Vn);  m = UInt(M:Vm);  regs = if Q == ‘0’ then 1 else 2;

Encoding T2 / A2    VFPv2, VFPv3 (sz = 1 UNDEFINED in single-precision only variants)
VADD<c>.F64 <Dd>, <Dn>, <Dm>
VADD<c>.F32 <Sd>, <Sn>, <Sm>
T2:  1110 1110 0 D 11 Vn | Vd 101 sz N 0 M 0 Vm
A2:  cond 1110 0 D 11 Vn Vd 101 sz N 0 M 0 Vm
if FPSCR.LEN != ‘000’ || FPSCR.STRIDE != ‘00’ then SEE “VFP vectors”;
advsimd = FALSE;  dp_operation = (sz == ‘1’);
d = if dp_operation then UInt(D:Vd) else UInt(Vd:D);
n = if dp_operation then UInt(N:Vn) else UInt(Vn:N);
m = if dp_operation then UInt(M:Vm) else UInt(Vm:M);

VFP vectors
Encoding T2 / A2 can operate on VFP vectors under control of the FPSCR.LEN and FPSCR.STRIDE bits. For details see Appendix F VFP Vector Operation Support.

Assembler syntax
VADD<c><q>.F32 {<Qd>,} <Qn>, <Qm>    Encoding T1 / A1, Q = 1, sz = 0
VADD<c><q>.F32 {<Dd>,} <Dn>, <Dm>    Encoding T1 / A1, Q = 0, sz = 0
VADD<c><q>.F64 {<Dd>,} <Dn>, <Dm>    Encoding T2 / A2, sz = 1
VADD<c><q>.F32 {<Sd>,} <Sn>, <Sm>    Encoding T2 / A2, sz = 0
where:
<c>, <q>  See Standard assembler syntax fields on page A8-7. An ARM Advanced SIMD VADD instruction must be unconditional.
<Qd>, <Qn>, <Qm>  The destination vector and the operand vectors, for a quadword operation.
<Dd>, <Dn>, <Dm>  The destination vector and the operand vectors, for a doubleword operation.
<Sd>, <Sn>, <Sm>  The destination vector and the operand vectors, for a singleword operation.

Operation
if ConditionPassed() then
    EncodingSpecificOperations();  CheckAdvSIMDOrVFPEnabled(TRUE, advsimd);
    if advsimd then  // Advanced SIMD instruction
        for r = 0 to regs-1
            for e = 0 to elements-1
                Elem[D[d+r],e,esize] = FPAdd(Elem[D[n+r],e,esize], Elem[D[m+r],e,esize], FALSE);
    else  // VFP instruction
        if dp_operation then
            D[d] = FPAdd(D[n], D[m], TRUE);
        else
            S[d] = FPAdd(S[n], S[m], TRUE);

Exceptions
Undefined Instruction.
Floating-point exceptions: Input Denormal, Invalid Operation, Overflow, Underflow, and Inexact.

A8.6.273 VADDHN
Vector Add and Narrow, returning High Half adds corresponding elements in two quadword vectors, and places the most significant half of each result in a doubleword vector. The results are truncated. (For rounded results, see VRADDHN on page A8-726.)
The operand elements can be 16-bit, 32-bit, or 64-bit integers. There is no distinction between signed and unsigned integers.

Encoding T1 / A1    Advanced SIMD
VADDHN<c>.<dt> <Dd>, <Qn>, <Qm>
T1:  1110 1111 1 D size Vn | Vd 0100 N 0 M 0 Vm
A1:  1111 0010 1 D size Vn Vd 0100 N 0 M 0 Vm
if size == ‘11’ then SEE “Related encodings”;
if Vn<0> == ‘1’ || Vm<0> == ‘1’ then UNDEFINED;
esize = 8 << UInt(size);  elements = 64 DIV esize;
d = UInt(D:Vd);  n = UInt(N:Vn);  m = UInt(M:Vm);
Related encodings  See Advanced SIMD data-processing instructions on page A7-10.

Assembler syntax
VADDHN<c><q>.<dt> <Dd>, <Qn>, <Qm>
where:
<c>, <q>  See Standard assembler syntax fields on page A8-7. An ARM VADDHN instruction must be unconditional.
<dt>  The data type for the elements of the operands. It must be one of:
      I16   size = 0b00
      I32   size = 0b01
      I64   size = 0b10.
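The narrowing step of VADDHN amounts to adding at double width and discarding the low half. In the hypothetical Python helper below, the operand lanes are passed as lists of unsigned 2*esize-bit values rather than as packed Q registers, and esize is the result element size (8 for .I16, 16 for .I32, 32 for .I64), matching esize = 8 << UInt(size) in the decode.

```python
def vaddhn(qn_elems, qm_elems, esize):
    """Model VADDHN.I<2*esize>: add 2*esize-bit lanes, keep the high half.

    The sum wraps at 2*esize bits, then the low esize bits are simply
    discarded (truncation); VRADDHN would round instead.
    """
    wide_mask = (1 << (2 * esize)) - 1
    return [((a + b) & wide_mask) >> esize for a, b in zip(qn_elems, qm_elems)]
```

For example, with .I32 operands (esize = 16), 0x12348000 + 0x00008000 = 0x12350000 narrows to 0x1235, while 0xFFFFFFFF + 1 wraps to zero and narrows to 0x0000.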
<Dd>, <Qn>, <Qm>  The destination vector, the first operand vector, and the second operand vector.

Operation
if ConditionPassed() then
    EncodingSpecificOperations();  CheckAdvSIMDEnabled();
    for e = 0 to elements-1
        result = Elem[Q[n>>1],e,2*esize] + Elem[Q[m>>1],e,2*esize];
        Elem[D[d],e,esize] = result<2*esize-1:esize>;

Exceptions
Undefined Instruction.

A8.6.274 VADDL, VADDW
VADDL (Vector Add Long) adds corresponding elements in two doubleword vectors, and places the results in a quadword vector. Before adding, it sign-extends or zero-extends the elements of both operands.
VADDW (Vector Add Wide) adds corresponding elements in one quadword and one doubleword vector, and places the results in a quadword vector. Before adding, it sign-extends or zero-extends the elements of the doubleword operand.

Encoding T1 / A1    Advanced SIMD
VADDL<c>.<dt> <Qd>, <Dn>, <Dm>
VADDW<c>.<dt> <Qd>, <Qn>, <Dm>
T1:  111U 1111 1 D size Vn | Vd 000 op N 0 M 0 Vm
A1:  1111 001U 1 D size Vn Vd 000 op N 0 M 0 Vm
if size == ‘11’ then SEE “Related encodings”;
if Vd<0> == ‘1’ || (op == ‘1’ && Vn<0> == ‘1’) then UNDEFINED;
unsigned = (U == ‘1’);
esize = 8 << UInt(size);  elements = 64 DIV esize;  is_vaddw = (op == ‘1’);
d = UInt(D:Vd);  n = UInt(N:Vn);  m = UInt(M:Vm);
Related encodings  See Advanced SIMD data-processing instructions on page A7-10.

Assembler syntax
VADDL<c><q>.<dt> <Qd>, <Dn>, <Dm>      Encoded as op = 0
VADDW<c><q>.<dt> {<Qd>,} <Qn>, <Dm>    Encoded as op = 1
where:
<c>, <q>  See Standard assembler syntax fields on page A8-7. An ARM VADDL or VADDW instruction must be unconditional.
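Sign-extension versus zero-extension is selected by the U bit (the unsigned variable in the Operation pseudocode). The Python sketch below models both instructions on lists of lane values; _extend, vaddl, and vaddw are hypothetical names. The wide VADDW operand is used as-is, which gives the same 2*esize-bit result as extending it first, because the sum is truncated to 2*esize bits either way.

```python
def _extend(raw, esize, unsigned):
    """Int(x, unsigned): zero-extend or sign-extend an esize-bit pattern."""
    raw &= (1 << esize) - 1
    if unsigned or not raw >> (esize - 1):
        return raw
    return raw - (1 << esize)


def vaddl(dn, dm, esize, unsigned):
    """Model VADDL.<S|U><esize>: extend both narrow operands, then add."""
    wide_mask = (1 << (2 * esize)) - 1
    return [(_extend(a, esize, unsigned) + _extend(b, esize, unsigned)) & wide_mask
            for a, b in zip(dn, dm)]


def vaddw(qn, dm, esize, unsigned):
    """Model VADDW.<S|U><esize>: only the narrow second operand is extended."""
    wide_mask = (1 << (2 * esize)) - 1
    return [(a + _extend(b, esize, unsigned)) & wide_mask for a, b in zip(qn, dm)]
```

The same bytes give different wide results: 0xFF + 0x01 is 0x0100 for VADDL.U8 but 0x0000 for VADDL.S8, because 0xFF is -1 as a signed byte.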
<dt>  The data type for the elements of the second operand vector. It must be one of:
      S8    encoded as size = 0b00, U = 0
      S16   encoded as size = 0b01, U = 0
      S32   encoded as size = 0b10, U = 0
      U8    encoded as size = 0b00, U = 1
      U16   encoded as size = 0b01, U = 1
      U32   encoded as size = 0b10, U = 1.
<Qd>  The destination register. If this register is omitted in a VADDW instruction, it is the same register as <Qn>.
<Qn>, <Dm>  The first and second operand registers for a VADDW instruction.
<Dn>, <Dm>  The first and second operand registers for a VADDL instruction.

Operation
if ConditionPassed() then
    EncodingSpecificOperations();  CheckAdvSIMDEnabled();
    for e = 0 to elements-1
        if is_vaddw then
            op1 = Int(Elem[Q[n>>1],e,2*esize], unsigned);
        else
            op1 = Int(Elem[D[n],e,esize], unsigned);
        result = op1 + Int(Elem[D[m],e,esize], unsigned);
        Elem[Q[d>>1],e,2*esize] = result<2*esize-1:0>;

Exceptions
Undefined Instruction.

A8.6.275 VAND (immediate)
This is a pseudo-instruction, equivalent to a VBIC (immediate) instruction with the immediate value bitwise inverted. For details see VBIC (immediate) on page A8-546.

A8.6.276 VAND (register)
This instruction performs a bitwise AND operation between two registers, and places the result in the destination register.

Encoding T1 / A1    Advanced SIMD
VAND<c> <Qd>, <Qn>, <Qm>
VAND<c> <Dd>, <Dn>, <Dm>
T1:  1110 1111 0 D 00 Vn | Vd 0001 N Q M 1 Vm
A1:  1111 0010 0 D 00 Vn Vd 0001 N Q M 1 Vm
if Q == ‘1’ && (Vd<0> == ‘1’ || Vn<0> == ‘1’ || Vm<0> == ‘1’) then UNDEFINED;
d = UInt(D:Vd);  n = UInt(N:Vn);  m = UInt(M:Vm);  regs = if Q == ‘0’ then 1 else 2;

Assembler syntax
VAND<c><q>{.<dt>} {<Qd>,} <Qn>, <Qm>    Encoded as Q = 1
VAND<c><q>{.<dt>} {<Dd>,} <Dn>, <Dm>    Encoded as Q = 0
where:
<c>, <q>  See Standard assembler syntax fields on page A8-7. An ARM VAND instruction must be unconditional.
<dt>  An optional data type. It is ignored by assemblers, and does not affect the encoding.
<Qd>, <Qn>, <Qm>  The destination vector and the operand vectors, for a quadword operation.
<Dd>, <Dn>, <Dm>  The destination vector and the operand vectors, for a doubleword operation.

Operation
if ConditionPassed() then
    EncodingSpecificOperations();  CheckAdvSIMDEnabled();
    for r = 0 to regs-1
        D[d+r] = D[n+r] AND D[m+r];

Exceptions
Undefined Instruction.

A8.6.277 VBIC (immediate)
Vector Bitwise Bit Clear (immediate) performs a bitwise AND between a register value and the complement of an immediate value, and places the result in the destination vector. For the range of constants available, see One register and a modified immediate value on page A7-21.

Encoding T1 / A1    Advanced SIMD
VBIC<c>.<dt> <Qd>, #<imm>
VBIC<c>.<dt> <Dd>, #<imm>
T1:  111i 1111 1 D 000 imm3 | Vd cmode 0 Q 1 1 imm4
A1:  1111 001i 1 D 000 imm3 Vd cmode 0 Q 1 1 imm4
if cmode<0> == ‘0’ || cmode<3:2> == ‘11’ then SEE “Related encodings”;
if Q == ‘1’ && Vd<0> == ‘1’ then UNDEFINED;
imm64 = AdvSIMDExpandImm(‘1’, cmode, i:imm3:imm4);
d = UInt(D:Vd);  regs = if Q == ‘0’ then 1 else 2;
Related encodings  See One register and a modified immediate value on page A7-21.

Assembler syntax
VBIC<c><q>.<dt> {<Qd>,} <Qd>, #<imm>    Encoded as Q = 1
VBIC<c><q>.<dt> {<Dd>,} <Dd>, #<imm>    Encoded as Q = 0
where:
<c>, <q>  See Standard assembler syntax fields on page A8-7. An ARM VBIC instruction must be unconditional.
<dt>  The data type used for <imm>. It can be either I16 or I32. I8, I64, and F32 are also permitted, but the resulting syntax is a pseudo-instruction.
<Qd>  The destination vector for a quadword operation.
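Replication of the immediate, and the AND NOT that follows, can be modelled in a few lines of Python. This sketch assumes the constant has already been accepted by the encoding: the cmode and AdvSIMDExpandImm() expansion to imm64 is not modelled, and replicate and vbic_imm are hypothetical names.

```python
MASK64 = (1 << 64) - 1


def replicate(imm, esize):
    """Replicate an esize-bit constant across a 64-bit doubleword."""
    value = 0
    for e in range(64 // esize):
        value |= (imm & ((1 << esize) - 1)) << (e * esize)
    return value


def vbic_imm(dd, imm, esize):
    """Model VBIC.I<esize> <Dd>, #<imm> as D[d] = D[d] AND NOT(imm64)."""
    return dd & ~replicate(imm, esize) & MASK64
```

With esize = 32 and imm = 10, replicate() yields 0x0000000A0000000A, the value used in the VBIC.I32 D0, #10 example given for <imm>; clearing those bits in an all-ones register leaves 0xFFFFFFF5FFFFFFF5.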
<Dd>  The destination vector for a doubleword operation.
<imm>  A constant of the type specified by <dt>. This constant is replicated enough times to fill the destination register. For example, VBIC.I32 D0, #10 ANDs the complement of 0x0000000A0000000A with D0, and puts the result into D0.
For details of the range of constants available and the encoding of <dt> and <imm>, see One register and a modified immediate value on page A7-21.

Operation
if ConditionPassed() then
    EncodingSpecificOperations();  CheckAdvSIMDEnabled();
    for r = 0 to regs-1
        D[d+r] = D[d+r] AND NOT(imm64);

Exceptions
Undefined Instruction.

Pseudo-instructions
VAND can be used with a range of constants that are the bitwise inverse of the available constants for VBIC. This is assembled as the equivalent VBIC instruction. Disassembly produces the VBIC form.
One register and a modified immediate value on page A7-21 describes pseudo-instructions with a combination of <dt> and <imm> that is not supported by hardware, but that generates the same destination register value as a different combination that is supported by hardware.

A8.6.278 VBIC (register)
Vector Bitwise Bit Clear (register) performs a bitwise AND between a register value and the complement of a register value, and places the result in the destination register.

Encoding T1 / A1    Advanced SIMD
VBIC<c> <Qd>, <Qn>, <Qm>
VBIC<c> <Dd>, <Dn>, <Dm>
T1:  1110 1111 0 D 01 Vn | Vd 0001 N Q M 1 Vm
A1:  1111 0010 0 D 01 Vn Vd 0001 N Q M 1 Vm
if Q == ‘1’ && (Vd<0> == ‘1’ || Vn<0> == ‘1’ || Vm<0> == ‘1’) then UNDEFINED;
d = UInt(D:Vd);  n = UInt(N:Vn);  m = UInt(M:Vm);  regs = if Q == ‘0’ then 1 else 2;

Assembler syntax
VBIC<c><q>{.<dt>} {<Qd>,} <Qn>, <Qm>    Encoded as Q = 1
VBIC<c><q>{.<dt>} {<Dd>,} <Dn>, <Dm>    Encoded as Q = 0
where:
<c>, <q>  See Standard assembler syntax fields on page A8-7. An ARM VBIC instruction must be unconditional.
<dt>  An optional data type. It is ignored by assemblers, and does not affect the encoding.
<Qd>, <Qn>, <Qm>  The destination vector and the operand vectors, for a quadword operation.
<Dd>, <Dn>, <Dm>  The destination vector and the operand vectors, for a doubleword operation.

Operation
if ConditionPassed() then
    EncodingSpecificOperations();  CheckAdvSIMDEnabled();
    for r = 0 to regs-1
        D[d+r] = D[n+r] AND NOT(D[m+r]);

Exceptions
Undefined Instruction.

A8.6.279 VBIF, VBIT, VBSL
VBIF (Vector Bitwise Insert if False), VBIT (Vector Bitwise Insert if True), and VBSL (Vector Bitwise Select) perform bitwise selection under the control of a mask, and place the results in the destination register. The registers can be either quadword or doubleword, and must all be the same size.

Encoding T1 / A1    Advanced SIMD
V<op><c> <Qd>, <Qn>, <Qm>
V<op><c> <Dd>, <Dn>, <Dm>
T1:  1111 1111 0 D op Vn | Vd 0001 N Q M 1 Vm
A1:  1111 0011 0 D op Vn Vd 0001 N Q M 1 Vm
if Q == ‘1’ && (Vd<0> == ‘1’ || Vn<0> == ‘1’ || Vm<0> == ‘1’) then UNDEFINED;
if op == ‘00’ then SEE VEOR;
if op == ‘01’ then operation = VBitOps_VBSL;
if op == ‘10’ then operation = VBitOps_VBIT;
if op == ‘11’ then operation = VBitOps_VBIF;
d = UInt(D:Vd);  n = UInt(N:Vn);  m = UInt(M:Vm);  regs = if Q == ‘0’ then 1 else 2;

Assembler syntax
V<op><c><q>{.<dt>} {<Qd>,} <Qn>, <Qm>    Encoded as Q = 1
V<op><c><q>{.<dt>} {<Dd>,} <Dn>, <Dm>    Encoded as Q = 0
where:
<op>  The operation. It must be one of:
      BIF  Bitwise Insert if False, encoded as op = 0b11. Inserts each bit from Vn into Vd if the corresponding bit of Vm is 0, otherwise leaves the Vd bit unchanged.
      BIT  Bitwise Insert if True, encoded as op = 0b10. Inserts each bit from Vn into Vd if the corresponding bit of Vm is 1, otherwise leaves the Vd bit unchanged.
      BSL  Bitwise Select, encoded as op = 0b01. Selects each bit from Vn into Vd if the corresponding bit of Vd is 1, otherwise selects the bit from Vm.
<c>, <q>  See Standard assembler syntax fields on page A8-7. An ARM VBIF, VBIT, or VBSL instruction must be unconditional.
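The three selection operations differ only in which register supplies the mask and where the selected bits come from. The following Python sketch models one doubleword of each; vbit_ops is a hypothetical name, and the register values are 64-bit Python ints.

```python
MASK64 = (1 << 64) - 1


def vbit_ops(op, dd, dn, dm):
    """Model one doubleword of VBIF, VBIT, or VBSL and return the new D[d].

    op is "BIF", "BIT" or "BSL"; dd, dn and dm are 64-bit register values.
    """
    if op == "BIF":  # take Vn bits where Vm is 0, keep Vd bits where Vm is 1
        return ((dd & dm) | (dn & ~dm)) & MASK64
    if op == "BIT":  # take Vn bits where Vm is 1, keep Vd bits where Vm is 0
        return ((dn & dm) | (dd & ~dm)) & MASK64
    if op == "BSL":  # Vd is the selector: Vn bits where Vd is 1, else Vm bits
        return ((dn & dd) | (dm & ~dd)) & MASK64
    raise ValueError(f"unknown operation {op!r}")
```

For example, with Vd = 0xF0F0, Vn = 0xAAAA, and Vm = 0xFF00, BIF gives 0xF0AA, BIT gives 0xAAF0, and BSL gives 0xAFA0.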
<dt>  An optional data type. It is ignored by assemblers, and does not affect the encoding.
<Qd>, <Qn>, <Qm>  The destination vector and the operand vectors, for a quadword operation.
<Dd>, <Dn>, <Dm>  The destination vector and the operand vectors, for a doubleword operation.

Operation
enumeration VBitOps {VBitOps_VBIF, VBitOps_VBIT, VBitOps_VBSL};
if ConditionPassed() then
    EncodingSpecificOperations();  CheckAdvSIMDEnabled();
    for r = 0 to regs-1
        case operation of
            when VBitOps_VBIF  D[d+r] = (D[d+r] AND D[m+r]) OR (D[n+r] AND NOT(D[m+r]));
            when VBitOps_VBIT  D[d+r] = (D[n+r] AND D[m+r]) OR (D[d+r] AND NOT(D[m+r]));
            when VBitOps_VBSL  D[d+r] = (D[n+r] AND D[d+r]) OR (D[m+r] AND NOT(D[d+r]));

Exceptions
Undefined Instruction.

A8.6.280 VCEQ (register)
VCEQ (Vector Compare Equal) takes each element in a vector, and compares it with the corresponding element of a second vector. If they are equal, the corresponding element in the destination vector is set to all ones. Otherwise, it is set to all zeros.
The operand vector elements can be any one of:
•  8-bit, 16-bit, or 32-bit integers. There is no distinction between signed and unsigned integers.
•  32-bit floating-point numbers.
The result vector elements are bitfields the same size as the operand vector elements.

Encoding T1 / A1    Advanced SIMD
VCEQ<c>.<dt> <Qd>, <Qn>, <Qm>    <dt> an integer type
VCEQ<c>.<dt> <Dd>, <Dn>, <Dm>    <dt> an integer type
T1:  1111 1111 0 D size Vn | Vd 1000 N Q M 1 Vm
A1:  1111 0011 0 D size Vn Vd 1000 N Q M 1 Vm
if Q == ‘1’ && (Vd<0> == ‘1’ || Vn<0> == ‘1’ || Vm<0> == ‘1’) then UNDEFINED;
if size == ‘11’ then UNDEFINED;
int_operation = TRUE;  esize = 8 << UInt(size);  elements = 64 DIV esize;
d = UInt(D:Vd);  n = UInt(N:Vn);  m = UInt(M:Vm);  regs = if Q == ‘0’ then 1 else 2;

Encoding T2 / A2    Advanced SIMD (UNDEFINED in integer-only variant)
VCEQ<c>.F32 <Qd>, <Qn>, <Qm>
VCEQ<c>.F32 <Dd>, <Dn>, <Dm>
T2:  1110 1111 0 D 0 sz Vn | Vd 1110 N Q M 0 Vm
A2:  1111 0010 0 D 0 sz Vn Vd 1110 N Q M 0 Vm
if Q == ‘1’ && (Vd<0> == ‘1’ || Vn<0> == ‘1’ || Vm<0> == ‘1’) then UNDEFINED;
if sz == ‘1’ then UNDEFINED;
int_operation = FALSE;  esize = 32;  elements = 2;
d = UInt(D:Vd);  n = UInt(N:Vn);  m = UInt(M:Vm);  regs = if Q == ‘0’ then 1 else 2;

Assembler syntax
VCEQ<c><q>.<dt> {<Qd>,} <Qn>, <Qm>    Encoded as Q = 1
VCEQ<c><q>.<dt> {<Dd>,} <Dn>, <Dm>    Encoded as Q = 0
where:
<c>, <q>  See Standard assembler syntax fields on page A8-7. An ARM VCEQ instruction must be unconditional.
<dt>  The data type for the elements of the operands. It must be one of:
      I8    encoding T1 / A1, size = 0b00
      I16   encoding T1 / A1, size = 0b01
      I32   encoding T1 / A1, size = 0b10
      F32   encoding T2 / A2, sz = 0.
<Qd>, <Qn>, <Qm>  The destination vector and the operand vectors, for a quadword operation.
<Dd>, <Dn>, <Dm>  The destination vector and the operand vectors, for a doubleword operation.

Operation
if ConditionPassed() then
    EncodingSpecificOperations();  CheckAdvSIMDEnabled();
    for r = 0 to regs-1
        for e = 0 to elements-1
            op1 = Elem[D[n+r],e,esize];  op2 = Elem[D[m+r],e,esize];
            if int_operation then
                test_passed = (op1 == op2);
            else
                test_passed = FPCompareEQ(op1, op2, FALSE);
            Elem[D[d+r],e,esize] = if test_passed then Ones(esize) else Zeros(esize);

Exceptions
Undefined Instruction.
Floating-point exceptions: Input Denormal and Invalid Operation.

A8.6.281 VCEQ (immediate #0)
VCEQ #0 (Vector Compare Equal to zero) takes each element in a vector, and compares it with zero. If it is equal to zero, the corresponding element in the destination vector is set to all ones. Otherwise, it is set to all zeros.
The operand vector elements can be any one of:
•  8-bit, 16-bit, or 32-bit integers. There is no distinction between signed and unsigned integers.
•  32-bit floating-point numbers.
The result vector elements are bitfields the same size as the operand vector elements.

Encoding T1 / A1    Advanced SIMD (F = 1 UNDEFINED in integer-only variants)
VCEQ<c>.<dt> <Qd>, <Qm>, #0
VCEQ<c>.<dt> <Dd>, <Dm>, #0
T1:  1111 1111 1 D 11 size 01 | Vd 0 F 010 Q M 0 Vm
A1:  1111 0011 1 D 11 size 01 Vd 0 F 010 Q M 0 Vm
if size == ‘11’ || (F == ‘1’ && size != ‘10’) then UNDEFINED;
if Q == ‘1’ && (Vd<0> == ‘1’ || Vm<0> == ‘1’) then UNDEFINED;
floating_point = (F == ‘1’);  esize = 8 << UInt(size);  elements = 64 DIV esize;
d = UInt(D:Vd);  m = UInt(M:Vm);  regs = if Q == ‘0’ then 1 else 2;

Assembler syntax
VCEQ<c><q>.<dt> {<Qd>,} <Qm>, #0    Encoded as Q = 1
VCEQ<c><q>.<dt> {<Dd>,} <Dm>, #0    Encoded as Q = 0
where:
<c>, <q>  See Standard assembler syntax fields on page A8-7. An ARM VCEQ instruction must be unconditional.
<dt>  The data type for the elements of the operands. It must be one of:
      I8    encoded as size = 0b00, F = 0
      I16   encoded as size = 0b01, F = 0
      I32   encoded as size = 0b10, F = 0
      F32   encoded as size = 0b10, F = 1.
<Qd>, <Qm>  The destination vector and the operand vector, for a quadword operation.
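For the F32 form, signed zeros and NaNs are the interesting cases, together with the flush-to-zero handling that ARMv7 Advanced SIMD floating-point operations apply to denormal inputs (the FALSE argument to FPCompareEQ selects the standard FPSCR value, which has flush-to-zero enabled). The Python sketch below is illustrative only: vceq_zero_f32 and vceq_zero_int are hypothetical names, Python floats stand in for single-precision elements, and the floating-point exceptions are not modelled.

```python
ONES32 = 0xFFFFFFFF
MIN_NORMAL_F32 = 2.0 ** -126  # smallest positive normal single-precision value


def vceq_zero_f32(values):
    """Model VCEQ.F32 ..., #0 on a list of float elements."""
    out = []
    for v in values:
        # Denormal inputs are flushed to zero before the comparison
        if abs(v) < MIN_NORMAL_F32:
            v = 0.0
        # +0.0 == -0.0 is true; any comparison with a NaN is false
        out.append(ONES32 if v == 0.0 else 0)
    return out


def vceq_zero_int(raw_elems, esize):
    """Model VCEQ.I<esize> ..., #0: all ones where the element is all zeros."""
    ones = (1 << esize) - 1
    return [ones if (x & ones) == 0 else 0 for x in raw_elems]
```

Under these assumptions -0.0 and a single-precision denormal such as 1e-40 both compare equal to zero, while a NaN never does.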
<Dd>, <Dm>  The destination vector and the operand vector, for a doubleword operation.

Operation
if ConditionPassed() then
    EncodingSpecificOperations();  CheckAdvSIMDEnabled();
    for r = 0 to regs-1
        for e = 0 to elements-1
            if floating_point then
                test_passed = FPCompareEQ(Elem[D[m+r],e,esize], FPZero(‘0’,esize), FALSE);
            else
                test_passed = (Elem[D[m+r],e,esize] == Zeros(esize));
            Elem[D[d+r],e,esize] = if test_passed then Ones(esize) else Zeros(esize);

Exceptions
Undefined Instruction.
Floating-point exceptions: Input Denormal and Invalid Operation.

A8.6.282 VCGE (register)
VCGE (Vector Compare Greater Than or Equal) takes each element in a vector, and compares it with the corresponding element of a second vector. If the first is greater than or equal to the second, the corresponding element in the destination vector is set to all ones. Otherwise, it is set to all zeros.
The operand vector elements can be any one of:
•  8-bit, 16-bit, or 32-bit signed integers
•  8-bit, 16-bit, or 32-bit unsigned integers
•  32-bit floating-point numbers.
The result vector elements are bitfields the same size as the operand vector elements.

Encoding T1 / A1    Advanced SIMD
VCGE<c>.<dt> <Qd>, <Qn>, <Qm>    <dt> an integer type
VCGE<c>.<dt> <Dd>, <Dn>, <Dm>    <dt> an integer type
T1:  111U 1111 0 D size Vn | Vd 0011 N Q M 1 Vm
A1:  1111 001U 0 D size Vn Vd 0011 N Q M 1 Vm
if Q == ‘1’ && (Vd<0> == ‘1’ || Vn<0> == ‘1’ || Vm<0> == ‘1’) then UNDEFINED;
if size == ‘11’ then UNDEFINED;
type = if U == ‘1’ then VCGEtype_unsigned else VCGEtype_signed;
esize = 8 << UInt(size);  elements = 64 DIV esize;
d = UInt(D:Vd);  n = UInt(N:Vn);  m = UInt(M:Vm);  regs = if Q == ‘0’ then 1 else 2;

Encoding T2 / A2    Advanced SIMD (UNDEFINED in integer-only variant)
VCGE<c>.F32 <Qd>, <Qn>, <Qm>
VCGE<c>.F32 <Dd>, <Dn>, <Dm>
T2:  1111 1111 0 D 0 sz Vn | Vd 1110 N Q M 0 Vm
A2:  1111 0011 0 D 0 sz Vn Vd 1110 N Q M 0 Vm
if Q == ‘1’ && (Vd<0> == ‘1’ || Vn<0> == ‘1’ || Vm<0> == ‘1’) then UNDEFINED;
if sz == ‘1’ then UNDEFINED;
type = VCGEtype_fp;  esize = 32;  elements = 2;
d = UInt(D:Vd);  n = UInt(N:Vn);  m = UInt(M:Vm);  regs = if Q == ‘0’ then 1 else 2;

Assembler syntax
VCGE<c><q>.<dt> {<Qd>,} <Qn>, <Qm>    Encoded as Q = 1
VCGE<c><q>.<dt> {<Dd>,} <Dn>, <Dm>    Encoded as Q = 0
where:
<c>, <q>  See Standard assembler syntax fields on page A8-7. An ARM VCGE instruction must be unconditional.
<dt>  The data type for the elements of the operands. It must be one of:
      S8    encoding T1 / A1, size = 0b00, U = 0
      S16   encoding T1 / A1, size = 0b01, U = 0
      S32   encoding T1 / A1, size = 0b10, U = 0
      U8    encoding T1 / A1, size = 0b00, U = 1
      U16   encoding T1 / A1, size = 0b01, U = 1
      U32   encoding T1 / A1, size = 0b10, U = 1
      F32   encoding T2 / A2, sz = 0.
<Qd>, <Qn>, <Qm>  The destination vector and the operand vectors, for a quadword operation.
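The same lane bits can compare differently under the signed and unsigned forms, because the U bit only changes how each element is interpreted. The hypothetical Python helper below models VCGE.S<esize> and VCGE.U<esize> on lists of raw lane values; replacing >= with > would give the corresponding VCGT (register) behaviour.

```python
def vcge_int(n_elems, m_elems, esize, unsigned):
    """Model VCGE.S<esize> / VCGE.U<esize> on raw esize-bit lane values."""
    ones = (1 << esize) - 1

    def sint(x):
        # SInt(): two's-complement reinterpretation of an esize-bit pattern
        return x - (1 << esize) if x >> (esize - 1) else x

    out = []
    for a, b in zip(n_elems, m_elems):
        a, b = a & ones, b & ones
        passed = a >= b if unsigned else sint(a) >= sint(b)
        out.append(ones if passed else 0)
    return out
```

With esize = 8, the lane value 0x80 is 128 as U8 but -128 as S8, so comparing it against 0x01 yields 0xFF for VCGE.U8 and 0x00 for VCGE.S8.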
<Dd>, <Dn>, <Dm>  The destination vector and the operand vectors, for a doubleword operation.

Operation
enumeration VCGEtype {VCGEtype_signed, VCGEtype_unsigned, VCGEtype_fp};
if ConditionPassed() then
    EncodingSpecificOperations();  CheckAdvSIMDEnabled();
    for r = 0 to regs-1
        for e = 0 to elements-1
            op1 = Elem[D[n+r],e,esize];  op2 = Elem[D[m+r],e,esize];
            case type of
                when VCGEtype_signed    test_passed = (SInt(op1) >= SInt(op2));
                when VCGEtype_unsigned  test_passed = (UInt(op1) >= UInt(op2));
                when VCGEtype_fp        test_passed = FPCompareGE(op1, op2, FALSE);
            Elem[D[d+r],e,esize] = if test_passed then Ones(esize) else Zeros(esize);

Exceptions
Undefined Instruction.
Floating-point exceptions: Input Denormal and Invalid Operation.

A8.6.283 VCGE (immediate #0)
VCGE #0 (Vector Compare Greater Than or Equal to Zero) takes each element in a vector, and compares it with zero. If it is greater than or equal to zero, the corresponding element in the destination vector is set to all ones. Otherwise, it is set to all zeros.
The operand vector elements can be any one of:
•  8-bit, 16-bit, or 32-bit signed integers
•  32-bit floating-point numbers.
The result vector elements are bitfields the same size as the operand vector elements.

Encoding T1 / A1    Advanced SIMD (F = 1 UNDEFINED in integer-only variants)
VCGE<c>.<dt> <Qd>, <Qm>, #0
VCGE<c>.<dt> <Dd>, <Dm>, #0
T1:  1111 1111 1 D 11 size 01 | Vd 0 F 001 Q M 0 Vm
A1:  1111 0011 1 D 11 size 01 Vd 0 F 001 Q M 0 Vm
if size == ‘11’ || (F == ‘1’ && size != ‘10’) then UNDEFINED;
if Q == ‘1’ && (Vd<0> == ‘1’ || Vm<0> == ‘1’) then UNDEFINED;
floating_point = (F == ‘1’);  esize = 8 << UInt(size);  elements = 64 DIV esize;
d = UInt(D:Vd);  m = UInt(M:Vm);  regs = if Q == ‘0’ then 1 else 2;

Assembler syntax
VCGE<c><q>.<dt> {<Qd>,} <Qm>, #0    Encoded as Q = 1
VCGE<c><q>.<dt> {<Dd>,} <Dm>, #0    Encoded as Q = 0
where:
<c>, <q>  See Standard assembler syntax fields on page A8-7. An ARM VCGE instruction must be unconditional.
<dt>  The data type for the elements of the operands. It must be one of:
      S8    encoded as size = 0b00, F = 0
      S16   encoded as size = 0b01, F = 0
      S32   encoded as size = 0b10, F = 0
      F32   encoded as size = 0b10, F = 1.
<Qd>, <Qm>  The destination vector and the operand vector, for a quadword operation.
, The destination vector and the operand vector, for a doubleword operation. Operation if ConditionPassed() then EncodingSpecificOperations(); CheckAdvSIMDEnabled(); for r = 0 to regs-1 for e = 0 to elements-1 if floating_point then test_passed = FPCompareGE(Elem[D[m+r],e,esize], FPZero(‘0’,esize), FALSE); else test_passed = (SInt(Elem[D[m+r],e,esize]) >= 0); Elem[D[d+r],e,esize] = if test_passed then Ones(esize) else Zeros(esize); Exceptions Undefined Instruction. Floating-point exceptions: Input Denormal and Invalid Operation. ARM DDI 0406B Copyright © 1996-1998, 2000, 2004-2008 ARM Limited. All rights reserved. A8-559 Instruction Details A8.6.284 VCGT (register) VCGT (Vector Compare Greater Than) takes each element in a vector, and compares it with the corresponding element of a second vector. If the first is greater than the second, the corresponding element in the destination vector is set to all ones. Otherwise, it is set to all zeros. The operand vector elements can be any one of: • 8-bit, 16-bit, or 32-bit signed integers • 8-bit, 16-bit, or 32-bit unsigned integers • 32-bit floating-point numbers. The result vector elements are bitfields the same size as the operand vector elements. Encoding T1 / A1 Advanced SIMD VCGT.
Encoding T1 / A1   Advanced SIMD
VCGT.<dt> <Qd>, <Qn>, <Qm>      <dt> an integer type
VCGT.<dt> <Dd>, <Dn>, <Dm>      <dt> an integer type
T1 (halfword 1 | halfword 2): 1 1 1 U 1 1 1 1 0 D size Vn | Vd 0 0 1 1 N Q M 0 Vm
A1 (bits 31:0): 1 1 1 1 0 0 1 U 0 D size Vn Vd 0 0 1 1 N Q M 0 Vm
if Q == ‘1’ && (Vd<0> == ‘1’ || Vn<0> == ‘1’ || Vm<0> == ‘1’) then UNDEFINED;
if size == ‘11’ then UNDEFINED;
type = if U == ‘1’ then VCGTtype_unsigned else VCGTtype_signed;
esize = 8 << UInt(size); elements = 64 DIV esize;
d = UInt(D:Vd); n = UInt(N:Vn); m = UInt(M:Vm); regs = if Q == ‘0’ then 1 else 2;

Encoding T2 / A2   Advanced SIMD (UNDEFINED in integer-only variant)
VCGT.F32 <Qd>, <Qn>, <Qm>
VCGT.F32 <Dd>, <Dn>, <Dm>
T2 (halfword 1 | halfword 2): 1 1 1 1 1 1 1 1 0 D 1 sz Vn | Vd 1 1 1 0 N Q M 0 Vm
A2 (bits 31:0): 1 1 1 1 0 0 1 1 0 D 1 sz Vn Vd 1 1 1 0 N Q M 0 Vm
if Q == ‘1’ && (Vd<0> == ‘1’ || Vn<0> == ‘1’ || Vm<0> == ‘1’) then UNDEFINED;
if sz == ‘1’ then UNDEFINED;
type = VCGTtype_fp; esize = 32; elements = 2;
d = UInt(D:Vd); n = UInt(N:Vn); m = UInt(M:Vm); regs = if Q == ‘0’ then 1 else 2;

Assembler syntax
VCGT<c><q>.<dt> {<Qd>,} <Qn>, <Qm>      Encoded as Q = 1
VCGT<c><q>.<dt> {<Dd>,} <Dn>, <Dm>      Encoded as Q = 0
where:
<c><q>              See Standard assembler syntax fields on page A8-7. An ARM VCGT instruction must be unconditional.
<dt>                The data types for the elements of the operands. It must be one of:
                      S8    encoding T1 / A1, size = 0b00, U = 0
                      S16   encoding T1 / A1, size = 0b01, U = 0
                      S32   encoding T1 / A1, size = 0b10, U = 0
                      U8    encoding T1 / A1, size = 0b00, U = 1
                      U16   encoding T1 / A1, size = 0b01, U = 1
                      U32   encoding T1 / A1, size = 0b10, U = 1
                      F32   encoding T2 / A2, sz = 0.
<Qd>, <Qn>, <Qm>    The destination vector and the operand vectors, for a quadword operation.
<Dd>, <Dn>, <Dm>    The destination vector and the operand vectors, for a doubleword operation.

Operation
enumeration VCGTtype {VCGTtype_signed, VCGTtype_unsigned, VCGTtype_fp};
if ConditionPassed() then
    EncodingSpecificOperations(); CheckAdvSIMDEnabled();
    for r = 0 to regs-1
        for e = 0 to elements-1
            op1 = Elem[D[n+r],e,esize]; op2 = Elem[D[m+r],e,esize];
            case type of
                when VCGTtype_signed    test_passed = (SInt(op1) > SInt(op2));
                when VCGTtype_unsigned  test_passed = (UInt(op1) > UInt(op2));
                when VCGTtype_fp        test_passed = FPCompareGT(op1, op2, FALSE);
            Elem[D[d+r],e,esize] = if test_passed then Ones(esize) else Zeros(esize);

Exceptions
Undefined Instruction.
Floating-point exceptions: Input Denormal and Invalid Operation.

A8.6.285 VCGT (immediate #0)
VCGT #0 (Vector Compare Greater Than Zero) takes each element in a vector and compares it with zero. If it is greater than zero, the corresponding element in the destination vector is set to all ones. Otherwise, it is set to all zeros.
The operand vector elements can be any one of:
• 8-bit, 16-bit, or 32-bit signed integers
• 32-bit floating-point numbers.
The result vector elements are bitfields the same size as the operand vector elements.

Encoding T1 / A1   Advanced SIMD (F = 1 UNDEFINED in integer-only variants)
VCGT.<dt> <Qd>, <Qm>, #0
VCGT.<dt> <Dd>, <Dm>, #0
T1 (halfword 1 | halfword 2): 1 1 1 1 1 1 1 1 1 D 1 1 size 0 1 | Vd 0 F 0 0 0 Q M 0 Vm
A1 (bits 31:0): 1 1 1 1 0 0 1 1 1 D 1 1 size 0 1 Vd 0 F 0 0 0 Q M 0 Vm
if size == ‘11’ || (F == ‘1’ && size != ‘10’) then UNDEFINED;
if Q == ‘1’ && (Vd<0> == ‘1’ || Vm<0> == ‘1’) then UNDEFINED;
floating_point = (F == ‘1’); esize = 8 << UInt(size); elements = 64 DIV esize;
d = UInt(D:Vd); m = UInt(M:Vm); regs = if Q == ‘0’ then 1 else 2;

Assembler syntax
VCGT<c><q>.<dt> {<Qd>,} <Qm>, #0      Encoded as Q = 1
VCGT<c><q>.<dt> {<Dd>,} <Dm>, #0      Encoded as Q = 0
where:
<c><q>        See Standard assembler syntax fields on page A8-7. An ARM VCGT instruction must be unconditional.
<dt>          The data types for the elements of the operands. It must be one of:
                S8    encoded as size = 0b00, F = 0
                S16   encoded as size = 0b01, F = 0
                S32   encoded as size = 0b10, F = 0
                F32   encoded as size = 0b10, F = 1.
<Qd>, <Qm>    The destination vector and the operand vector, for a quadword operation.
<Dd>, <Dm>    The destination vector and the operand vector, for a doubleword operation.

Operation
if ConditionPassed() then
    EncodingSpecificOperations(); CheckAdvSIMDEnabled();
    for r = 0 to regs-1
        for e = 0 to elements-1
            if floating_point then
                test_passed = FPCompareGT(Elem[D[m+r],e,esize], FPZero(‘0’,esize), FALSE);
            else
                test_passed = (SInt(Elem[D[m+r],e,esize]) > 0);
            Elem[D[d+r],e,esize] = if test_passed then Ones(esize) else Zeros(esize);

Exceptions
Undefined Instruction.
Floating-point exceptions: Input Denormal and Invalid Operation.

A8.6.286 VCLE (register)
VCLE is a pseudo-instruction, equivalent to a VCGE instruction with the operands reversed. For details see VCGE (register) on page A8-556.

A8.6.287 VCLE (immediate #0)
VCLE #0 (Vector Compare Less Than or Equal to Zero) takes each element in a vector and compares it with zero. If it is less than or equal to zero, the corresponding element in the destination vector is set to all ones. Otherwise, it is set to all zeros.
The operand vector elements can be any one of:
• 8-bit, 16-bit, or 32-bit signed integers
• 32-bit floating-point numbers.
The result vector elements are bitfields the same size as the operand vector elements.
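A small Python model can check two consequences of the definitions here.
- VCLE (register) gives the same mask as VCGE with its operands swapped.
- For floating-point elements, VCLE #0 is not the bitwise complement of VCGT #0. Every comparison involving a NaN is false, so both instructions write zeros for a NaN element.

The sketch works on lists of already-decoded element values. It relies on Python's float comparisons following the IEEE 754 rules, and it does not model denormal inputs or the floating-point exceptions. All function names are illustrative.

```python
import math

ONES32 = 0xFFFFFFFF  # Ones(32): the element value written for a true comparison


def vcge(a, b):
    """VCGE (register) on decoded element values."""
    return [ONES32 if x >= y else 0 for x, y in zip(a, b)]


def vcle(a, b):
    """VCLE <Dd>, <Dn>, <Dm> is encoded as VCGE <Dd>, <Dm>, <Dn>."""
    return vcge(b, a)


def vcgt_zero(a):
    """VCGT.F32 #0: FPCompareGT(x, +0.0) per element."""
    return [ONES32 if x > 0.0 else 0 for x in a]


def vcle_zero(a):
    """VCLE.F32 #0: FPCompareGE(+0.0, x) per element."""
    return [ONES32 if 0.0 >= x else 0 for x in a]


values = [-1.0, 0.0, 1.0, math.nan]
print(vcle_zero(values))  # [4294967295, 4294967295, 0, 0]
print(vcgt_zero(values))  # [0, 0, 4294967295, 0]  (the NaN element is 0 in both)
```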
Encoding T1 / A1   Advanced SIMD (F = 1 UNDEFINED in integer-only variants)
VCLE.<dt> <Qd>, <Qm>, #0
VCLE.<dt> <Dd>, <Dm>, #0
T1 (halfword 1 | halfword 2): 1 1 1 1 1 1 1 1 1 D 1 1 size 0 1 | Vd 0 F 0 1 1 Q M 0 Vm
A1 (bits 31:0): 1 1 1 1 0 0 1 1 1 D 1 1 size 0 1 Vd 0 F 0 1 1 Q M 0 Vm
if size == ‘11’ || (F == ‘1’ && size != ‘10’) then UNDEFINED;
if Q == ‘1’ && (Vd<0> == ‘1’ || Vm<0> == ‘1’) then UNDEFINED;
floating_point = (F == ‘1’); esize = 8 << UInt(size); elements = 64 DIV esize;
d = UInt(D:Vd); m = UInt(M:Vm); regs = if Q == ‘0’ then 1 else 2;

Assembler syntax
VCLE<c><q>.<dt> {<Qd>,} <Qm>, #0      Encoded as Q = 1
VCLE<c><q>.<dt> {<Dd>,} <Dm>, #0      Encoded as Q = 0
where:
<c><q>        See Standard assembler syntax fields on page A8-7. An ARM VCLE instruction must be unconditional.
<dt>          The data types for the elements of the operands. It must be one of:
                S8    encoded as size = 0b00, F = 0
                S16   encoded as size = 0b01, F = 0
                S32   encoded as size = 0b10, F = 0
                F32   encoded as size = 0b10, F = 1.
<Qd>, <Qm>    The destination vector and the operand vector, for a quadword operation.
<Dd>, <Dm>    The destination vector and the operand vector, for a doubleword operation.

Operation
if ConditionPassed() then
    EncodingSpecificOperations(); CheckAdvSIMDEnabled();
    for r = 0 to regs-1
        for e = 0 to elements-1
            if floating_point then
                test_passed = FPCompareGE(FPZero(‘0’,esize), Elem[D[m+r],e,esize], FALSE);
            else
                test_passed = (SInt(Elem[D[m+r],e,esize]) <= 0);
            Elem[D[d+r],e,esize] = if test_passed then Ones(esize) else Zeros(esize);

Exceptions
Undefined Instruction.
Floating-point exceptions: Input Denormal and Invalid Operation.

A8.6.288 VCLS
Vector Count Leading Sign Bits counts the number of consecutive bits following the topmost bit that are the same as the topmost bit, in each element in a vector, and places the results in a second vector. The count does not include the topmost bit itself.
The operand vector elements can be any one of 8-bit, 16-bit, or 32-bit signed integers.
The result vector elements are the same data type as the operand vector elements.
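CountLeadingSignBits() is not defined in this section. The sketch below computes it directly from the description of VCLS.

It also checks the result against CountLeadingZeroBits(x<N-1:1> EOR x<N-2:0>). In that value, each bit records whether two neighbouring bits of x differ. Treat that formulation as an illustration that the exhaustive 8-bit check confirms, not as a quotation of the manual's definition.

All function names are hypothetical, and element 0 is assumed to be the least significant field of the doubleword.

```python
def count_leading_sign_bits(x: int, n: int) -> int:
    """Bits below the top bit of an n-bit field that equal the top bit, counted downwards."""
    top = (x >> (n - 1)) & 1
    count = 0
    for bit in range(n - 2, -1, -1):
        if (x >> bit) & 1 != top:
            break
        count += 1
    return count


def count_leading_zero_bits(x: int, n: int) -> int:
    """Leading zeros of an n-bit field."""
    return n - (x & ((1 << n) - 1)).bit_length()


def cls_via_clz(x: int, n: int) -> int:
    """Same count via x<n-1:1> XOR x<n-2:0>: a 1 bit marks a neighbouring pair that differs."""
    low_mask = (1 << (n - 1)) - 1
    return count_leading_zero_bits((x >> 1) ^ (x & low_mask), n - 1)


def vcls_d(dm: int, esize: int) -> int:
    """VCLS on one doubleword; element 0 is the least significant esize bits."""
    mask = (1 << esize) - 1
    result = 0
    for e in range(64 // esize):
        field = (dm >> (e * esize)) & mask
        result |= count_leading_sign_bits(field, esize) << (e * esize)
    return result


print(count_leading_sign_bits(0x01, 8), count_leading_sign_bits(0xC0, 8))  # 6 1
```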
Encoding T1 / A1   Advanced SIMD
VCLS.<dt> <Qd>, <Qm>
VCLS.<dt> <Dd>, <Dm>
T1 (halfword 1 | halfword 2): 1 1 1 1 1 1 1 1 1 D 1 1 size 0 0 | Vd 0 1 0 0 0 Q M 0 Vm
A1 (bits 31:0): 1 1 1 1 0 0 1 1 1 D 1 1 size 0 0 Vd 0 1 0 0 0 Q M 0 Vm
if size == ‘11’ then UNDEFINED;
if Q == ‘1’ && (Vd<0> == ‘1’ || Vm<0> == ‘1’) then UNDEFINED;
esize = 8 << UInt(size); elements = 64 DIV esize;
d = UInt(D:Vd); m = UInt(M:Vm); regs = if Q == ‘0’ then 1 else 2;

Assembler syntax
VCLS<c><q>.<dt> <Qd>, <Qm>      Encoded as Q = 1
VCLS<c><q>.<dt> <Dd>, <Dm>      Encoded as Q = 0
where:
<c><q>        See Standard assembler syntax fields on page A8-7. An ARM VCLS instruction must be unconditional.
<dt>          The data size for the elements of the operands. It must be one of:
                S8    encoded as size = 0b00
                S16   encoded as size = 0b01
                S32   encoded as size = 0b10.
<Qd>, <Qm>    The destination vector and the operand vector, for a quadword operation.
<Dd>, <Dm>    The destination vector and the operand vector, for a doubleword operation.

Operation
if ConditionPassed() then
    EncodingSpecificOperations(); CheckAdvSIMDEnabled();
    for r = 0 to regs-1
        for e = 0 to elements-1
            Elem[D[d+r],e,esize] = CountLeadingSignBits(Elem[D[m+r],e,esize]);

Exceptions
Undefined Instruction.

A8.6.289 VCLT (register)
VCLT is a pseudo-instruction, equivalent to a VCGT instruction with the operands reversed. For details see VCGT (register) on page A8-560.

A8.6.290 VCLT (immediate #0)
VCLT #0 (Vector Compare Less Than Zero) takes each element in a vector and compares it with zero. If it is less than zero, the corresponding element in the destination vector is set to all ones. Otherwise, it is set to all zeros.
The operand vector elements can be any one of:
• 8-bit, 16-bit, or 32-bit signed integers
• 32-bit floating-point numbers.
The result vector elements are bitfields the same size as the operand vector elements.

Encoding T1 / A1   Advanced SIMD (F = 1 UNDEFINED in integer-only variants)
VCLT.<dt> <Qd>, <Qm>, #0
VCLT.<dt> <Dd>, <Dm>, #0
T1 (halfword 1 | halfword 2): 1 1 1 1 1 1 1 1 1 D 1 1 size 0 1 | Vd 0 F 1 0 0 Q M 0 Vm
A1 (bits 31:0): 1 1 1 1 0 0 1 1 1 D 1 1 size 0 1 Vd 0 F 1 0 0 Q M 0 Vm
if size == ‘11’ || (F == ‘1’ && size != ‘10’) then UNDEFINED;
if Q == ‘1’ && (Vd<0> == ‘1’ || Vm<0> == ‘1’) then UNDEFINED;
floating_point = (F == ‘1’); esize = 8 << UInt(size); elements = 64 DIV esize;
d = UInt(D:Vd); m = UInt(M:Vm); regs = if Q == ‘0’ then 1 else 2;

Assembler syntax
VCLT<c><q>.<dt> {<Qd>,} <Qm>, #0      Encoded as Q = 1
VCLT<c><q>.<dt> {<Dd>,} <Dm>, #0      Encoded as Q = 0
where:
<c><q>        See Standard assembler syntax fields on page A8-7. An ARM VCLT instruction must be unconditional.
<dt>          The data types for the elements of the operands. It must be one of:
                S8    encoded as size = 0b00, F = 0
                S16   encoded as size = 0b01, F = 0
                S32   encoded as size = 0b10, F = 0
                F32   encoded as size = 0b10, F = 1.
<Qd>, <Qm>    The destination vector and the operand vector, for a quadword operation.
<Dd>, <Dm>    The destination vector and the operand vector, for a doubleword operation.

Operation
if ConditionPassed() then
    EncodingSpecificOperations(); CheckAdvSIMDEnabled();
    for r = 0 to regs-1
        for e = 0 to elements-1
            if floating_point then
                test_passed = FPCompareGT(FPZero(‘0’,esize), Elem[D[m+r],e,esize], FALSE);
            else
                test_passed = (SInt(Elem[D[m+r],e,esize]) < 0);
            Elem[D[d+r],e,esize] = if test_passed then Ones(esize) else Zeros(esize);

Exceptions
Undefined Instruction.
Floating-point exceptions: Input Denormal and Invalid Operation.

A8.6.291 VCLZ
Vector Count Leading Zeros counts the number of consecutive zeros, starting from the most significant bit, in each element in a vector, and places the results in a second vector.
The operand vector elements can be any one of 8-bit, 16-bit, or 32-bit integers. There is no distinction between signed and unsigned integers.
The result vector elements are the same data type as the operand vector elements.
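VCLZ treats each element as a plain bit pattern, so a per-element model only needs the position of the highest set bit. The sketch below uses Python's int.bit_length(). An all-zero element gives esize, the full element width. The function names are illustrative, and element 0 is assumed to be the least significant field.

```python
def clz(field: int, esize: int) -> int:
    """CountLeadingZeroBits for an esize-bit field (an all-zero field gives esize)."""
    return esize - field.bit_length()


def vclz_d(dm: int, esize: int) -> int:
    """VCLZ on one doubleword, element 0 in the least significant bits."""
    mask = (1 << esize) - 1
    result = 0
    for e in range(64 // esize):
        field = (dm >> (e * esize)) & mask
        result |= clz(field, esize) << (e * esize)
    return result


# 16-bit elements, element 0 first: 0x00F0 -> 8, 0xFFFF -> 0, 0x0000 -> 16, 0x0001 -> 15
print(hex(vclz_d(0x0001_0000_FFFF_00F0, 16)))  # 0xf001000000008
```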
Encoding T1 / A1   Advanced SIMD
VCLZ.<dt> <Qd>, <Qm>
VCLZ.<dt> <Dd>, <Dm>
T1 (halfword 1 | halfword 2): 1 1 1 1 1 1 1 1 1 D 1 1 size 0 0 | Vd 0 1 0 0 1 Q M 0 Vm
A1 (bits 31:0): 1 1 1 1 0 0 1 1 1 D 1 1 size 0 0 Vd 0 1 0 0 1 Q M 0 Vm
if size == ‘11’ then UNDEFINED;
if Q == ‘1’ && (Vd<0> == ‘1’ || Vm<0> == ‘1’) then UNDEFINED;
esize = 8 << UInt(size); elements = 64 DIV esize;
d = UInt(D:Vd); m = UInt(M:Vm); regs = if Q == ‘0’ then 1 else 2;

Assembler syntax
VCLZ<c><q>.<dt> <Qd>, <Qm>      Encoded as Q = 1
VCLZ<c><q>.<dt> <Dd>, <Dm>      Encoded as Q = 0
where:
<c><q>        See Standard assembler syntax fields on page A8-7. An ARM VCLZ instruction must be unconditional.
<dt>          The data size for the elements of the operands. It must be one of:
                I8    encoded as size = 0b00
                I16   encoded as size = 0b01
                I32   encoded as size = 0b10.
<Qd>, <Qm>    The destination vector and the operand vector, for a quadword operation.
<Dd>, <Dm>    The destination vector and the operand vector, for a doubleword operation.

Operation
if ConditionPassed() then
    EncodingSpecificOperations(); CheckAdvSIMDEnabled();
    for r = 0 to regs-1
        for e = 0 to elements-1
            Elem[D[d+r],e,esize] = CountLeadingZeroBits(Elem[D[m+r],e,esize]);

Exceptions
Undefined Instruction.

A8.6.292 VCMP, VCMPE
This instruction compares two floating-point registers, or one floating-point register and zero. It writes the result to the FPSCR flags. These are normally transferred to the ARM flags by a subsequent VMRS instruction.
It can optionally raise an Invalid Operation exception if either operand is any type of NaN. It always raises an Invalid Operation exception if either operand is a signaling NaN.
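The flag values that FPCompare() produces for each outcome, combined with the standard ARM condition-code tests, show the practical effect of an unordered result once the flags are copied to the APSR, for example by VMRS APSR_nzcv, FPSCR. GE and GT fail, but LT succeeds, so after a floating-point compare LT means "less than or unordered" while MI means "less than".

This Python sketch is illustrative only:
- The unordered result uses N=0, Z=0, C=1, V=1, as stated in the NaNs note for this instruction.
- The ordered results use the usual ARM flag patterns.
- The Invalid Operation exception is not modelled.
- The names CONDITIONS, fp_compare_flags and holds are hypothetical.

```python
import math

# Flag tests for a few ARM condition codes, applied to (N, Z, C, V).
CONDITIONS = {
    "EQ": lambda n, z, c, v: z == 1,
    "NE": lambda n, z, c, v: z == 0,
    "MI": lambda n, z, c, v: n == 1,
    "VS": lambda n, z, c, v: v == 1,
    "GE": lambda n, z, c, v: n == v,
    "LT": lambda n, z, c, v: n != v,
    "GT": lambda n, z, c, v: z == 0 and n == v,
    "LE": lambda n, z, c, v: z == 1 or n != v,
}


def fp_compare_flags(op1: float, op2: float):
    """(N, Z, C, V) written to the FPSCR when VCMP/VCMPE compares op1 with op2."""
    if math.isnan(op1) or math.isnan(op2):
        return (0, 0, 1, 1)  # unordered
    if op1 == op2:
        return (0, 1, 1, 0)  # equal (note that -0.0 == +0.0)
    if op1 < op2:
        return (1, 0, 0, 0)  # less than
    return (0, 0, 1, 0)      # greater than


def holds(cond: str, op1: float, op2: float) -> bool:
    """Would a conditional instruction with this condition execute after the compare?"""
    return CONDITIONS[cond](*fp_compare_flags(op1, op2))


for cond in ("GE", "GT", "LT", "MI", "VS"):
    print(cond, holds(cond, math.nan, 1.0))
# GE False, GT False, LT True, MI False, VS True
```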
Encoding T1 / A1   VFPv2, VFPv3 (sz = 1 UNDEFINED in single-precision only variants)
VCMP{E}.F64 <Dd>, <Dm>
VCMP{E}.F32 <Sd>, <Sm>
T1 (halfword 1 | halfword 2): 1 1 1 0 1 1 1 0 1 D 1 1 0 1 0 0 | Vd 1 0 1 sz E 1 M 0 Vm
A1 (bits 31:0): cond 1 1 1 0 1 D 1 1 0 1 0 0 Vd 1 0 1 sz E 1 M 0 Vm
dp_operation = (sz == ‘1’); quiet_nan_exc = (E == ‘1’); with_zero = FALSE;
d = if dp_operation then UInt(D:Vd) else UInt(Vd:D);
m = if dp_operation then UInt(M:Vm) else UInt(Vm:M);

Encoding T2 / A2   VFPv2, VFPv3 (sz = 1 UNDEFINED in single-precision only variants)
VCMP{E}.F64 <Dd>, #0.0
VCMP{E}.F32 <Sd>, #0.0
T2 (halfword 1 | halfword 2): 1 1 1 0 1 1 1 0 1 D 1 1 0 1 0 1 | Vd 1 0 1 sz E 1 (0) 0 (0) (0) (0) (0)
A2 (bits 31:0): cond 1 1 1 0 1 D 1 1 0 1 0 1 Vd 1 0 1 sz E 1 (0) 0 (0) (0) (0) (0)
dp_operation = (sz == ‘1’); quiet_nan_exc = (E == ‘1’); with_zero = TRUE;
d = if dp_operation then UInt(D:Vd) else UInt(Vd:D);

Assembler syntax
VCMP{E}<c><q>.F64 <Dd>, <Dm>      Encoding T1 / A1, sz = 1
VCMP{E}<c><q>.F32 <Sd>, <Sm>      Encoding T1 / A1, sz = 0
VCMP{E}<c><q>.F64 <Dd>, #0.0      Encoding T2 / A2, sz = 1
VCMP{E}<c><q>.F32 <Sd>, #0.0      Encoding T2 / A2, sz = 0
where:
E             If present, any NaN operand causes an Invalid Operation exception. Encoded as E = 1.
              Otherwise, only a signaling NaN causes the exception. Encoded as E = 0.
<c><q>        See Standard assembler syntax fields on page A8-7.
<Dd>, <Dm>    The operand registers, for a double-precision operation.
<Sd>, <Sm>    The operand registers, for a single-precision operation.

Operation
if ConditionPassed() then
    EncodingSpecificOperations(); CheckVFPEnabled(TRUE);
    if dp_operation then
        op2 = if with_zero then FPZero(‘0’,64) else D[m];
        (FPSCR.N, FPSCR.Z, FPSCR.C, FPSCR.V) = FPCompare(D[d], op2, quiet_nan_exc, TRUE);
    else
        op2 = if with_zero then FPZero(‘0’,32) else S[m];
        (FPSCR.N, FPSCR.Z, FPSCR.C, FPSCR.V) = FPCompare(S[d], op2, quiet_nan_exc, TRUE);

Exceptions
Undefined Instruction.
Floating-point exceptions: Invalid Operation, Input Denormal.

NaNs
The IEEE 754 standard specifies that the result of a comparison is precisely one of <, ==, > or unordered. If either or both of the operands are NaNs, they are unordered, and all three of (Operand1 < Operand2), (Operand1 == Operand2) and (Operand1 > Operand2) are false. This results in the FPSCR flags being set as N=0, Z=0, C=1 and V=1.
VCMPE raises an Invalid Operation exception if either operand is any type of NaN, and is suitable for testing for <, <=, >, >=, and other predicates that raise an exception when the operands are unordered.

A8.6.293 VCNT
This instruction counts the number of bits that are one in each element in a vector, and places the results in a second vector.
The operand vector elements must be 8-bit bitfields.
The result vector elements are 8-bit integers.
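A per-byte population count is simple to model. The Python sketch below assumes the Elem[] numbering, with element 0 in the least significant byte; the function name is illustrative.

It also shows a useful consequence. The eight byte counts of a doubleword always sum to the number of set bits in all 64 bits, which makes VCNT a natural first step when counting bits in wider values.

```python
def vcnt_d(dm: int) -> int:
    """VCNT.8 on one doubleword: each result byte is the number of one bits in that byte."""
    result = 0
    for e in range(8):
        byte = (dm >> (8 * e)) & 0xFF
        result |= bin(byte).count("1") << (8 * e)
    return result


value = 0x0000_0000_00FF_0F01
counts = vcnt_d(value)
print(hex(counts))  # 0x80401
# Summing the eight byte counts gives the 64-bit population count.
print(sum((counts >> (8 * e)) & 0xFF for e in range(8)) == bin(value).count("1"))  # True
```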
Encoding T1 / A1   Advanced SIMD
VCNT.8 <Qd>, <Qm>
VCNT.8 <Dd>, <Dm>
T1 (halfword 1 | halfword 2): 1 1 1 1 1 1 1 1 1 D 1 1 size 0 0 | Vd 0 1 0 1 0 Q M 0 Vm
A1 (bits 31:0): 1 1 1 1 0 0 1 1 1 D 1 1 size 0 0 Vd 0 1 0 1 0 Q M 0 Vm
if size != ‘00’ then UNDEFINED;
if Q == ‘1’ && (Vd<0> == ‘1’ || Vm<0> == ‘1’) then UNDEFINED;
esize = 8; elements = 8;
d = UInt(D:Vd); m = UInt(M:Vm); regs = if Q == ‘0’ then 1 else 2;

Assembler syntax
VCNT<c><q>.8 <Qd>, <Qm>      Encoded as Q = 1
VCNT<c><q>.8 <Dd>, <Dm>      Encoded as Q = 0
where:
<c><q>        See Standard assembler syntax fields on page A8-7. An ARM VCNT instruction must be unconditional.
<Qd>, <Qm>    The destination vector and the operand vector, for a quadword operation.
<Dd>, <Dm>    The destination vector and the operand vector, for a doubleword operation.

Operation
if ConditionPassed() then
    EncodingSpecificOperations(); CheckAdvSIMDEnabled();
    for r = 0 to regs-1
        for e = 0 to elements-1
            Elem[D[d+r],e,esize] = BitCount(Elem[D[m+r],e,esize]);

Exceptions
Undefined Instruction.

A8.6.294 VCVT (between floating-point and integer, Advanced SIMD)
This instruction converts each element in a vector from floating-point to integer, or from integer to floating-point, and places the results in a second vector.
The vector elements must be 32-bit floating-point numbers, or 32-bit integers. Signed and unsigned integers are distinct.
The floating-point to integer operation uses the Round towards Zero rounding mode. The integer to floating-point operation uses the Round to Nearest rounding mode.
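The Round towards Zero conversion that FPToFixed() performs with zero fraction bits can be sketched in Python as truncation followed by saturation to the destination range. This is a minimal model under stated assumptions:
- The input is a Python float that already holds a single-precision value.
- Out-of-range results saturate to the 32-bit limits.
- A NaN converts to 0.
- Exception signalling (Invalid Operation, Inexact) is not modelled.
- The function name is hypothetical.

```python
import math


def fp_to_int32(value: float, unsigned: bool) -> int:
    """Sketch of the VCVT.S32.F32 / VCVT.U32.F32 element conversion."""
    lo, hi = (0, 2**32 - 1) if unsigned else (-2**31, 2**31 - 1)
    if math.isnan(value):
        return 0                         # NaN converts to zero
    if math.isinf(value):
        return hi if value > 0 else lo   # infinities saturate
    truncated = math.trunc(value)        # Round towards Zero
    return min(max(truncated, lo), hi)   # saturate to the destination range


print([fp_to_int32(v, unsigned=False) for v in (3.99, -3.99, 1e10)])
# [3, -3, 2147483647]
print(fp_to_int32(-1.5, unsigned=True))  # 0
```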
Encoding T1 / A1   Advanced SIMD (UNDEFINED in integer-only variant)
VCVT.<Td>.<Tm> <Qd>, <Qm>
VCVT.<Td>.<Tm> <Dd>, <Dm>
T1 (halfword 1 | halfword 2): 1 1 1 1 1 1 1 1 1 D 1 1 size 1 1 | Vd 0 1 1 op Q M 0 Vm
A1 (bits 31:0): 1 1 1 1 0 0 1 1 1 D 1 1 size 1 1 Vd 0 1 1 op Q M 0 Vm
if Q == ‘1’ && (Vd<0> == ‘1’ || Vm<0> == ‘1’) then UNDEFINED;
if size != ‘10’ then UNDEFINED;
to_integer = (op<1> == ‘1’); unsigned = (op<0> == ‘1’); esize = 32; elements = 2;
if to_integer then
    round_zero = TRUE;     // Variable name indicates purpose of FPToFixed() argument
else
    round_nearest = TRUE;  // Variable name indicates purpose of FixedToFP() argument
d = UInt(D:Vd); m = UInt(M:Vm); regs = if Q == ‘0’ then 1 else 2;

Assembler syntax
VCVT<c><q>.<Td>.<Tm> <Qd>, <Qm>      Encoded as Q = 1
VCVT<c><q>.<Td>.<Tm> <Dd>, <Dm>      Encoded as Q = 0
where:
<c><q>        See Standard assembler syntax fields on page A8-7. An ARM Advanced SIMD VCVT instruction must be unconditional.
.<Td>.<Tm>    The data types for the elements of the vectors. They must be one of:
                .S32.F32   encoded as op = 0b10, size = 0b10
                .U32.F32   encoded as op = 0b11, size = 0b10
                .F32.S32   encoded as op = 0b00, size = 0b10
                .F32.U32   encoded as op = 0b01, size = 0b10.
<Qd>, <Qm>    The destination vector and the operand vector, for a quadword operation.
<Dd>, <Dm>    The destination vector and the operand vector, for a doubleword operation.

Operation
if ConditionPassed() then
    EncodingSpecificOperations(); CheckAdvSIMDEnabled();
    for r = 0 to regs-1
        for e = 0 to elements-1
            op = Elem[D[m+r],e,esize];
            if to_integer then
                result = FPToFixed(op, esize, 0, unsigned, round_zero, FALSE);
            else
                result = FixedToFP(op, esize, 0, unsigned, round_nearest, FALSE);
            Elem[D[d+r],e,esize] = result;

Exceptions
Undefined Instruction.
Floating-point exceptions: Input Denormal, Invalid Operation, and Inexact.

A8.6.295 VCVT, VCVTR (between floating-point and integer, VFP)
These instructions convert a value in a register from floating-point to a 32-bit integer, or from a 32-bit integer to floating-point, and place the result in a second register.
The floating-point to integer operation normally uses the Round towards Zero rounding mode, but can optionally use the rounding mode specified by the FPSCR. The integer to floating-point operation uses the rounding mode specified by the FPSCR.
VCVT (between floating-point and fixed-point, VFP) on page A8-582 describes conversions between floating-point and 16-bit integers.
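The encoding of these instructions reduces opc2 and op to three properties. The R suffix changes only the rounding of the float-to-integer forms.

The Python sketch below mirrors that decode. It then compares Round towards Zero (VCVT) with Round to Nearest, which VCVTR uses on the assumption that the FPSCR selects that mode. Python's round() rounds halfway cases to even, which matches IEEE 754 Round to Nearest.

Saturation and exceptions are ignored, so inputs are assumed to be in range. The function names are hypothetical.

```python
import math


def decode_vcvt_vfp(opc2: int, op: int):
    """(to_integer, unsigned, round_zero) for the VFP VCVT/VCVTR integer encodings."""
    if opc2 not in (0b000, 0b100, 0b101):
        raise ValueError("not this instruction: see Related encodings")
    if opc2 & 0b100:                          # opc2<2> == '1': floating-point to integer
        return True, (opc2 & 1) == 0, op == 1  # op = 1: Round towards Zero (R omitted)
    return False, op == 0, False               # integer to floating-point: FPSCR rounding


def vcvt_s32(value: float) -> int:
    """VCVT.S32.F32/F64 with R omitted: Round towards Zero."""
    return math.trunc(value)


def vcvtr_s32(value: float) -> int:
    """VCVTR.S32.F32/F64, assuming the FPSCR selects Round to Nearest (ties to even)."""
    return round(value)


for v in (2.5, 2.7, -2.7, 3.5):
    print(v, vcvt_s32(v), vcvtr_s32(v))
# 2.5 2 2 / 2.7 2 3 / -2.7 -2 -3 / 3.5 3 4
```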
Encoding T1 / A1   VFPv2, VFPv3 (sz = 1 UNDEFINED in single-precision only variants)
VCVT{R}.S32.F64 <Sd>, <Dm>
VCVT{R}.S32.F32 <Sd>, <Sm>
VCVT{R}.U32.F64 <Sd>, <Dm>
VCVT{R}.U32.F32 <Sd>, <Sm>
VCVT.F64.<Tm> <Dd>, <Sm>
VCVT.F32.<Tm> <Sd>, <Sm>
T1 (halfword 1 | halfword 2): 1 1 1 0 1 1 1 0 1 D 1 1 1 opc2 | Vd 1 0 1 sz op 1 M 0 Vm
A1 (bits 31:0): cond 1 1 1 0 1 D 1 1 1 opc2 Vd 1 0 1 sz op 1 M 0 Vm
if opc2 != ‘000’ && opc2 != ‘10x’ then SEE “Related encodings”;
to_integer = (opc2<2> == ‘1’); dp_operation = (sz == ‘1’);
if to_integer then
    unsigned = (opc2<0> == ‘0’); round_zero = (op == ‘1’);
    d = UInt(Vd:D); m = if dp_operation then UInt(M:Vm) else UInt(Vm:M);
else
    unsigned = (op == ‘0’); round_fpscr = FALSE;   // FALSE selects FPSCR rounding
    m = UInt(Vm:M); d = if dp_operation then UInt(D:Vd) else UInt(Vd:D);
Related encodings   See VFP data-processing instructions on page A7-24.

Assembler syntax
VCVT{R}<c><q>.S32.F64 <Sd>, <Dm>      opc2 = ‘101’, sz = 1
VCVT{R}<c><q>.S32.F32 <Sd>, <Sm>      opc2 = ‘101’, sz = 0
VCVT{R}<c><q>.U32.F64 <Sd>, <Dm>      opc2 = ‘100’, sz = 1
VCVT{R}<c><q>.U32.F32 <Sd>, <Sm>      opc2 = ‘100’, sz = 0
VCVT<c><q>.F64.<Tm> <Dd>, <Sm>        opc2 = ‘000’, sz = 1
VCVT<c><q>.F32.<Tm> <Sd>, <Sm>        opc2 = ‘000’, sz = 0
where:
R             If R is specified, the operation uses the rounding mode specified by the FPSCR. Encoded as op = 0.
              If R is omitted, the operation uses the Round towards Zero rounding mode. For syntaxes in which R is optional, op is encoded as 1 if R is omitted.
<c><q>        See Standard assembler syntax fields on page A8-7.
<Tm>          The data type for the operand. It must be one of:
                S32   encoded as op = 1
                U32   encoded as op = 0.
<Sd>, <Dm>    The destination register and the operand register, for a double-precision operand.
<Dd>, <Sm>    The destination register and the operand register, for a double-precision result.
<Sd>, <Sm>    The destination register and the operand register, for a single-precision operand or result.

Operation
if ConditionPassed() then
    EncodingSpecificOperations(); CheckVFPEnabled(TRUE);
    if to_integer then
        if dp_operation then
            S[d] = FPToFixed(D[m], 32, 0, unsigned, round_zero, TRUE);
        else
            S[d] = FPToFixed(S[m], 32, 0, unsigned, round_zero, TRUE);
    else
        if dp_operation then
            D[d] = FixedToFP(S[m], 64, 0, unsigned, round_fpscr, TRUE);
        else
            S[d] = FixedToFP(S[m], 32, 0, unsigned, round_fpscr, TRUE);

Exceptions
Undefined Instruction.
Floating-point exceptions: Input Denormal, Invalid Operation, and Inexact.

A8.6.296 VCVT (between floating-point and fixed-point, Advanced SIMD)
This instruction converts each element in a vector from floating-point to fixed-point, or from fixed-point to floating-point, and places the results in a second vector.
The vector elements must be 32-bit floating-point numbers, or 32-bit integers. Signed and unsigned integers are distinct.
The floating-point to fixed-point operation uses the Round towards Zero rounding mode. The fixed-point to floating-point operation uses the Round to Nearest rounding mode.
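With <fbits> fraction bits, a fixed-point value is an integer scaled by 2^-<fbits>. The encoding stores 64 - <fbits> in imm6, so an imm6 value with its top bit clear never selects a fixed-point VCVT.

The Python sketch below shows the imm6 mapping and the scaling for 32-bit elements. When converting to fixed-point it truncates towards zero, saturates, and maps NaN to 0. When converting back it returns exact Python floats rather than rounding to single precision. All function names are hypothetical.

```python
import math


def encode_imm6(fbits: int) -> int:
    """imm6 field for #<fbits>; only 1 to 32 fraction bits are encodable."""
    if not 1 <= fbits <= 32:
        raise ValueError("fbits must be in the range 1 to 32")
    return 64 - fbits


def decode_frac_bits(imm6: int) -> int:
    """frac_bits = 64 - UInt(imm6); imm6 = '0xxxxx' is not this instruction."""
    if imm6 < 0b100000:
        raise ValueError("UNDEFINED or a related encoding")
    return 64 - imm6


def float_to_fixed32(value: float, fbits: int, unsigned: bool) -> int:
    """Float to a 32-bit fixed-point bit pattern: scale, truncate, saturate."""
    lo, hi = (0, 2**32 - 1) if unsigned else (-2**31, 2**31 - 1)
    if math.isnan(value):
        return 0
    scaled = value * 2.0 ** fbits
    if scaled >= hi:
        n = hi
    elif scaled <= lo:
        n = lo
    else:
        n = math.trunc(scaled)  # Round towards Zero
    return n & 0xFFFFFFFF


def fixed32_to_float(bits: int, fbits: int, unsigned: bool) -> float:
    """Real value of a 32-bit fixed-point bit pattern."""
    if not unsigned and bits >> 31:
        bits -= 1 << 32
    return bits / 2.0 ** fbits


print(encode_imm6(16), hex(float_to_fixed32(1.75, 8, unsigned=False)))  # 48 0x1c0
```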
Encoding T1 / A1   Advanced SIMD (UNDEFINED in integer-only variant)
VCVT.<Td>.<Tm> <Qd>, <Qm>, #<fbits>
VCVT.<Td>.<Tm> <Dd>, <Dm>, #<fbits>
T1 (halfword 1 | halfword 2): 1 1 1 U 1 1 1 1 1 D imm6 | Vd 1 1 1 op 0 Q M 1 Vm
A1 (bits 31:0): 1 1 1 1 0 0 1 U 1 D imm6 Vd 1 1 1 op 0 Q M 1 Vm
if imm6 == ‘000xxx’ then SEE “Related encodings”;
if imm6 == ‘0xxxxx’ then UNDEFINED;
if Q == ‘1’ && (Vd<0> == ‘1’ || Vm<0> == ‘1’) then UNDEFINED;
to_fixed = (op == ‘1’); unsigned = (U == ‘1’);
if to_fixed then
    round_zero = TRUE;     // Variable name indicates purpose of FPToFixed() argument
else
    round_nearest = TRUE;  // Variable name indicates purpose of FixedToFP() argument
esize = 32; frac_bits = 64 - UInt(imm6); elements = 2;
d = UInt(D:Vd); m = UInt(M:Vm); regs = if Q == ‘0’ then 1 else 2;
Related encodings   See One register and a modified immediate value on page A7-21.

Assembler syntax
VCVT<c><q>.<Td>.<Tm> <Qd>, <Qm>, #<fbits>      Encoded as Q = 1
VCVT<c><q>.<Td>.<Tm> <Dd>, <Dm>, #<fbits>      Encoded as Q = 0
where:
<c><q>        See Standard assembler syntax fields on page A8-7. An ARM Advanced SIMD VCVT instruction must be unconditional.
.<Td>.<Tm>    The data types for the elements of the vectors. They must be one of:
                .S32.F32   encoded as op = 1, U = 0
                .U32.F32   encoded as op = 1, U = 1
                .F32.S32   encoded as op = 0, U = 0
                .F32.U32   encoded as op = 0, U = 1.
<Qd>, <Qm>    The destination vector and the operand vector, for a quadword operation.
<Dd>, <Dm>    The destination vector and the operand vector, for a doubleword operation.
<fbits>       The number of fraction bits in the fixed-point number, in the range 1 to 32. (64 - <fbits>) is encoded in imm6.
              An assembler can permit an <fbits> value of 0. This is encoded as a floating-point to integer or integer to floating-point instruction; see VCVT (between floating-point and integer, Advanced SIMD) on page A8-576.

Operation
if ConditionPassed() then
    EncodingSpecificOperations(); CheckAdvSIMDEnabled();
    for r = 0 to regs-1
        for e = 0 to elements-1
            op = Elem[D[m+r],e,esize];
            if to_fixed then
                result = FPToFixed(op, esize, frac_bits, unsigned, round_zero, FALSE);
            else
                result = FixedToFP(op, esize, frac_bits, unsigned, round_nearest, FALSE);
            Elem[D[d+r],e,esize] = result;

Exceptions
Undefined Instruction.
Floating-point exceptions: Input Denormal, Invalid Operation, and Inexact.

A8.6.297 VCVT (between floating-point and fixed-point, VFP)
This instruction converts a value in a register from floating-point to fixed-point, or from fixed-point to floating-point, and writes the result back to the same register. You can specify the fixed-point value as either signed or unsigned.
The floating-point value can be single-precision or double-precision.
The fixed-point value can be 16-bit or 32-bit. Conversions from fixed-point values take their operand from the low-order bits of the source register and ignore any remaining bits. Signed conversions to fixed-point values sign-extend the result value to the destination register width. Unsigned conversions to fixed-point values zero-extend the result value to the destination register width.
The floating-point to fixed-point operation uses the Round towards Zero rounding mode. The fixed-point to floating-point operation uses the Round to Nearest rounding mode.
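In the VFP form, the 5-bit value imm4:i holds size - <fbits>. A signed fixed-point result is sign-extended to the full register width.

The Python sketch below shows both steps under stated assumptions:
- size is 16 or 32.
- Register widths are 32 bits (S registers) or 64 bits (D registers).
- The function names are hypothetical.

```python
def encode_vfp_fbits(size: int, fbits: int):
    """(imm4, i) for VCVT.<Td>.F32/F64 #<fbits> with a size-bit fixed-point type."""
    if size not in (16, 32):
        raise ValueError("size must be 16 or 32")
    low = 0 if size == 16 else 1          # S16/U16: 0-16, S32/U32: 1-32
    if not low <= fbits <= size:
        raise ValueError("fbits out of range for this fixed-point size")
    field = size - fbits                  # the 5-bit value imm4:i
    return field >> 1, field & 1


def decode_vfp_fbits(size: int, imm4: int, i: int) -> int:
    """frac_bits = size - UInt(imm4:i); a negative result is UNPREDICTABLE."""
    frac_bits = size - ((imm4 << 1) | i)
    if frac_bits < 0:
        raise ValueError("UNPREDICTABLE")
    return frac_bits


def extend_result(result: int, size: int, width: int, unsigned: bool) -> int:
    """Zero- or sign-extend a size-bit fixed-point result to the register width."""
    result &= (1 << size) - 1
    if not unsigned and result >> (size - 1):
        result -= 1 << size               # negative value: sign-extend
    return result & ((1 << width) - 1)


print(encode_vfp_fbits(16, 8), hex(extend_result(0x8000, 16, 32, unsigned=False)))
# (4, 0) 0xffff8000
```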
Encoding T1 / A1   VFPv3 (sf = 1 UNDEFINED in single-precision only variants)
VCVT.<Td>.F64 <Dd>, <Dd>, #<fbits>
VCVT.<Td>.F32 <Sd>, <Sd>, #<fbits>
VCVT.F64.<Td> <Dd>, <Dd>, #<fbits>
VCVT.F32.<Td> <Sd>, <Sd>, #<fbits>
T1 (halfword 1 | halfword 2): 1 1 1 0 1 1 1 0 1 D 1 1 1 op 1 U | Vd 1 0 1 sf sx 1 i 0 imm4
A1 (bits 31:0): cond 1 1 1 0 1 D 1 1 1 op 1 U Vd 1 0 1 sf sx 1 i 0 imm4
to_fixed = (op == ‘1’); unsigned = (U == ‘1’); dp_operation = (sf == ‘1’);
size = if sx == ‘0’ then 16 else 32;
frac_bits = size - UInt(imm4:i);
if to_fixed then
    round_zero = TRUE;
else
    round_nearest = TRUE;
d = if dp_operation then UInt(D:Vd) else UInt(Vd:D);
if frac_bits < 0 then UNPREDICTABLE;

Assembler syntax
VCVT<c><q>.<Td>.F64 <Dd>, <Dd>, #<fbits>      op = 1, sf = 1
VCVT<c><q>.<Td>.F32 <Sd>, <Sd>, #<fbits>      op = 1, sf = 0
VCVT<c><q>.F64.<Td> <Dd>, <Dd>, #<fbits>      op = 0, sf = 1
VCVT<c><q>.F32.<Td> <Sd>, <Sd>, #<fbits>      op = 0, sf = 0
where:
<c><q>        See Standard assembler syntax fields on page A8-7.
<Td>          The data type for the fixed-point number. It must be one of:
                S16   encoded as U = 0, sx = 0
                U16   encoded as U = 1, sx = 0
                S32   encoded as U = 0, sx = 1
                U32   encoded as U = 1, sx = 1.
<Dd>          The destination and operand register, for a double-precision operand.
<Sd>          The destination and operand register, for a single-precision operand.
<fbits>       The number of fraction bits in the fixed-point number:
              • If <Td> is S16 or U16, <fbits> must be in the range 0-16. (16 - <fbits>) is encoded in [imm4,i].
              • If <Td> is S32 or U32, <fbits> must be in the range 1-32. (32 - <fbits>) is encoded in [imm4,i].

Operation
if ConditionPassed() then
    EncodingSpecificOperations(); CheckVFPEnabled(TRUE);
    if to_fixed then
        if dp_operation then
            result = FPToFixed(D[d], size, frac_bits, unsigned, round_zero, TRUE);
            D[d] = if unsigned then ZeroExtend(result, 64) else SignExtend(result, 64);
        else
            result = FPToFixed(S[d], size, frac_bits, unsigned, round_zero, TRUE);
            S[d] = if unsigned then ZeroExtend(result, 32) else SignExtend(result, 32);
    else
        if dp_operation then
            D[d] = FixedToFP(D[d], 64, frac_bits, unsigned, round_nearest, TRUE);
        else
            S[d] = FixedToFP(S[d], 32, frac_bits, unsigned, round_nearest, TRUE);

Exceptions
Undefined Instruction.
Floating-point exceptions: Input Denormal, Invalid Operation, and Inexact.

A8.6.298 VCVT (between double-precision and single-precision)
This instruction does one of the following:
• converts the value in a double-precision register to single-precision and writes the result to a single-precision register
• converts the value in a single-precision register to double-precision and writes the result to a double-precision register.

Encoding T1 / A1   VFPv2, VFPv3 (UNDEFINED in single-precision only variants)
VCVT.F64.F32 <Dd>, <Sm>
VCVT.F32.F64 <Sd>, <Dm>
T1 (halfword 1 | halfword 2): 1 1 1 0 1 1 1 0 1 D 1 1 0 1 1 1 | Vd 1 0 1 sz 1 1 M 0 Vm
A1 (bits 31:0): cond 1 1 1 0 1 D 1 1 0 1 1 1 Vd 1 0 1 sz 1 1 M 0 Vm
double_to_single = (sz == ‘1’);
d = if double_to_single then UInt(Vd:D) else UInt(D:Vd);
m = if double_to_single then UInt(M:Vm) else UInt(Vm:M);

Assembler syntax
VCVT<c><q>.F64.F32 <Dd>, <Sm>      Encoded as sz = 0
VCVT<c><q>.F32.F64 <Sd>, <Dm>      Encoded as sz = 1
where:
<c><q>        See Standard assembler syntax fields on page A8-7.
<Dd>, <Sm>    The destination register and the operand register, for a single-precision operand.
<Sd>, <Dm>    The destination register and the operand register, for a double-precision operand.

Operation
if ConditionPassed() then
    EncodingSpecificOperations(); CheckVFPEnabled(TRUE);
    if double_to_single then
        S[d] = FPDoubleToSingle(D[m], TRUE);
    else
        D[d] = FPSingleToDouble(S[m], TRUE);

Exceptions
Undefined Instruction.
Floating-point exceptions: Invalid Operation, Input Denormal, Overflow, Underflow, and Inexact.

A8.6.299 VCVT (between half-precision and single-precision, Advanced SIMD)
This instruction converts each element in a vector from single-precision to half-precision floating-point or from half-precision to single-precision, and places the results in a second vector.
The vector elements must be 32-bit floating-point numbers, or 16-bit floating-point numbers.
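Python's struct module can illustrate the element layout, because its 'e' format code packs IEEE 754 half-precision values. In the sketch below, four single-precision values are narrowed into one doubleword with element 0 in bits [15:0], and then widened back.

The sketch makes three assumptions:
- It uses the IEEE half-precision format, not the alternative half-precision format.
- The values lie within the half-precision range; struct raises OverflowError otherwise.
- The function names are hypothetical.

```python
import struct
from typing import List


def vcvt_f16_f32(singles: List[float]) -> int:
    """VCVT.F16.F32 <Dd>, <Qm>: four singles -> four halves in one 64-bit doubleword."""
    packed = struct.pack("<4e", *singles)  # rounds each value to half precision
    return int.from_bytes(packed, "little")


def vcvt_f32_f16(dm: int) -> List[float]:
    """VCVT.F32.F16 <Qd>, <Dm>: four halves in a doubleword -> four singles (exact)."""
    return list(struct.unpack("<4e", dm.to_bytes(8, "little")))


d = vcvt_f16_f32([1.0, -2.0, 0.5, 65504.0])
print(hex(d))  # 0x7bff3800c0003c00
```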
T1:  1 1 1 1 1 1 1 1 1 D 1 1 size 1 0 | Vd 0 1 1 op 0 0 M 0 Vm
A1:  1 1 1 1 0 0 1 1 1 D 1 1 size 1 0 | Vd 0 1 1 op 0 0 M 0 Vm
    half_to_single = (op == ‘1’);
    if size != ‘01’ then UNDEFINED;
    if half_to_single && Vd<0> == ‘1’ then UNDEFINED;
    if !half_to_single && Vm<0> == ‘1’ then UNDEFINED;
    esize = 16; elements = 4;
    m = UInt(M:Vm); d = UInt(D:Vd);
Assembler syntax
VCVT.F32.F16 <Qd>, <Dm>    Encoded as op = 1
VCVT.F16.F32 <Dd>, <Qm>    Encoded as op = 0
where:
See Standard assembler syntax fields on page A8-7.
<Qd>, <Dm>   The destination vector and the operand vector for a half-precision to single-precision operation.
<Dd>, <Qm>   The destination vector and the operand vector for a single-precision to half-precision operation.
Operation
    if ConditionPassed() then
        EncodingSpecificOperations(); CheckAdvSIMDEnabled();
        for e = 0 to elements-1
            if half_to_single then
                Elem[Q[d>>1],e,2*esize] = FPHalfToSingle(Elem[D[m],e,esize], FALSE);
            else
                Elem[D[d],e,esize] = FPSingleToHalf(Elem[Q[m>>1],e,2*esize], FALSE);
Exceptions
    Undefined Instruction.
    Floating-point exceptions: Invalid Operation, Input Denormal, Overflow, Underflow, and Inexact.

A8.6.300 VCVTB, VCVTT (between half-precision and single-precision, VFP)
This instruction does one of the following:
• converts the half-precision value in the top or bottom half of a single-precision register to single-precision and writes the result to a single-precision register
• converts the value in a single-precision register to half-precision and writes the result into the top or bottom half of a single-precision register, preserving the other half of the target register.
Encoding T1 / A1    VFPv3 half-precision extensions
VCVT<y>.F32.F16 <Sd>, <Sm>
VCVT<y>.F16.F32 <Sd>, <Sm>
T1:  1 1 1 0 1 1 1 0 1 D 1 1 0 0 1 op | Vd 1 0 1 0 T 1 M 0 Vm
A1:  cond 1 1 1 0 1 D 1 1 0 0 1 op | Vd 1 0 1 0 T 1 M 0 Vm
    half_to_single = (op == ‘0’);
    lowbit = if T == ‘1’ then 16 else 0;
    m = UInt(Vm:M); d = UInt(Vd:D);
Assembler syntax
VCVT<y>.F32.F16 <Sd>, <Sm>    Encoded as op = 0
VCVT<y>.F16.F32 <Sd>, <Sm>    Encoded as op = 1
where:
<y>    Specifies which half of the operand register or destination register is used for the operand or destination:
       • If <y> is B, then the T bit is encoded as 0 and the bottom half (bits [15:0]) of <Sm> or <Sd> is used.
       • If <y> is T, then the T bit is encoded as 1 and the top half (bits [31:16]) of <Sm> or <Sd> is used.
See Standard assembler syntax fields on page A8-7.
<Sd>   The destination register.
<Sm>   The operand register.
Operation
    if ConditionPassed() then
        EncodingSpecificOperations(); CheckVFPEnabled(TRUE);
        if half_to_single then
            S[d] = FPHalfToSingle(S[m]<lowbit+15:lowbit>, TRUE);
        else
            S[d]<lowbit+15:lowbit> = FPSingleToHalf(S[m], TRUE);
Exceptions
    Undefined Instruction.
    Floating-point exceptions: Invalid Operation, Input Denormal, Overflow, Underflow, and Inexact.

A8.6.301 VDIV
This instruction divides one floating-point value by another floating-point value and writes the result to a third floating-point register.
Encoding T1 / A1    VFPv2, VFPv3 (sz = 1 UNDEFINED in single-precision only variants)
VDIV.F64 <Dd>, <Dn>, <Dm>
VDIV.F32 <Sd>, <Sn>, <Sm>
T1:  1 1 1 0 1 1 1 0 1 D 0 0 Vn | Vd 1 0 1 sz N 0 M 0 Vm
A1:  cond 1 1 1 0 1 D 0 0 Vn | Vd 1 0 1 sz N 0 M 0 Vm
    if FPSCR.LEN != ‘000’ || FPSCR.STRIDE != ‘00’ then SEE “VFP vectors”;
    dp_operation = (sz == ‘1’);
    d = if dp_operation then UInt(D:Vd) else UInt(Vd:D);
    n = if dp_operation then UInt(N:Vn) else UInt(Vn:N);
    m = if dp_operation then UInt(M:Vm) else UInt(Vm:M);
VFP vectors
This instruction can operate on VFP vectors under control of the FPSCR.LEN and FPSCR.STRIDE bits. For details see Appendix F VFP Vector Operation Support.
Assembler syntax
VDIV.F64 {<Dd>,} <Dn>, <Dm>    Encoded as sz = 1
VDIV.F32 {<Sd>,} <Sn>, <Sm>    Encoded as sz = 0
where:
See Standard assembler syntax fields on page A8-7.
<Dd>, <Dn>, <Dm>   The destination register and the operand registers, for a double-precision operation.
<Sd>, <Sn>, <Sm>   The destination register and the operand registers, for a single-precision operation.
Operation
    if ConditionPassed() then
        EncodingSpecificOperations(); CheckVFPEnabled(TRUE);
        if dp_operation then
            D[d] = FPDiv(D[n], D[m], TRUE);
        else
            S[d] = FPDiv(S[n], S[m], TRUE);
Exceptions
    Undefined Instruction.
    Floating-point exceptions: Invalid Operation, Division by Zero, Overflow, Underflow, Inexact, and Input Denormal.

A8.6.302 VDUP (scalar)
Vector Duplicate duplicates a scalar into every element of the destination vector. The scalar and the destination vector elements can be any one of 8-bit, 16-bit, or 32-bit bitfields. There is no distinction between data types.
For more information about scalars see Advanced SIMD scalars on page A7-9.
Encoding T1 / A1    Advanced SIMD
VDUP.<size> <Qd>, <Dm[x]>
VDUP.<size> <Dd>, <Dm[x]>
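The imm4 decode and the broadcast performed by VDUP (scalar) can be sketched in C on a single 64-bit doubleword. The function names are hypothetical; `vdup_decode_imm4` follows the `case imm4 of` decode (returning 0 for the UNDEFINED pattern ‘x000’), and `vdup_scalar` models `Elem[D[m],index,esize]` copied into every element of one destination doubleword. The quadword form simply writes the same value to D[d] and D[d+1].

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical decode of imm4: returns esize (8, 16 or 32) and stores the
   scalar index, or returns 0 for the UNDEFINED pattern imm4 == 'x000'. */
static unsigned vdup_decode_imm4(unsigned imm4, unsigned *index)
{
    assert(imm4 < 16 && index != 0);
    if (imm4 & 0x1u)          { *index = imm4 >> 1; return 8;  }  /* 'xxx1' */
    if ((imm4 & 0x3u) == 0x2) { *index = imm4 >> 2; return 16; }  /* 'xx10' */
    if ((imm4 & 0x7u) == 0x4) { *index = imm4 >> 3; return 32; }  /* 'x100' */
    return 0;                                                    /* 'x000' */
}

/* Broadcast element `index` (of size esize) of doubleword dm into every
   element of a 64-bit result. */
static uint64_t vdup_scalar(uint64_t dm, unsigned index, unsigned esize)
{
    assert(esize == 8 || esize == 16 || esize == 32);
    assert(index < 64 / esize);
    uint64_t mask   = (UINT64_C(1) << esize) - 1;
    uint64_t scalar = (dm >> (index * esize)) & mask;
    uint64_t result = 0;
    for (unsigned e = 0; e < 64 / esize; e++)
        result |= scalar << (e * esize);
    return result;
}
```

For instance, imm4 = 0b0101 decodes to esize 8 with index 2, so `VDUP.8 D0, D1[2]` would fill every byte of D0 with byte 2 of D1.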
T1:  1 1 1 1 1 1 1 1 1 D 1 1 imm4 | Vd 1 1 0 0 0 Q M 0 Vm
A1:  1 1 1 1 0 0 1 1 1 D 1 1 imm4 | Vd 1 1 0 0 0 Q M 0 Vm
    if imm4 == ‘x000’ then UNDEFINED;
    if Q == ‘1’ && Vd<0> == ‘1’ then UNDEFINED;
    case imm4 of
        when ‘xxx1’  esize = 8;  elements = 8; index = UInt(imm4<3:1>);
        when ‘xx10’  esize = 16; elements = 4; index = UInt(imm4<3:2>);
        when ‘x100’  esize = 32; elements = 2; index = UInt(imm4<3>);
    d = UInt(D:Vd); m = UInt(M:Vm); regs = if Q == ‘0’ then 1 else 2;
Assembler syntax
VDUP.<size> <Qd>, <Dm[x]>    Encoded as Q = 1
VDUP.<size> <Dd>, <Dm[x]>    Encoded as Q = 0
where:
See Standard assembler syntax fields on page A8-7. An ARM VDUP instruction must be unconditional.
<size>    The data size. It must be one of:
          8    Encoded as imm4<0> = ‘1’. imm4<3:1> encodes the index [x] of the scalar.
          16   Encoded as imm4<1:0> = ‘10’. imm4<3:2> encodes the index [x] of the scalar.
          32   Encoded as imm4<2:0> = ‘100’. imm4<3> encodes the index [x] of the scalar.
<Qd>      The destination vector for a quadword operation.
<Dd>      The destination vector for a doubleword operation.
<Dm[x]>   The scalar. For details of how [x] is encoded, see the description of <size>.
Operation
    if ConditionPassed() then
        EncodingSpecificOperations(); CheckAdvSIMDEnabled();
        scalar = Elem[D[m],index,esize];
        for r = 0 to regs-1
            for e = 0 to elements-1
                Elem[D[d+r],e,esize] = scalar;
Exceptions
    Undefined Instruction.

A8.6.303 VDUP (ARM core register)
This instruction duplicates an element from an ARM core register into every element of the destination vector. The destination vector elements can be 8-bit, 16-bit, or 32-bit bitfields. The source element is the least significant 8, 16, or 32 bits of the ARM core register. There is no distinction between data types.
Encoding T1 / A1    Advanced SIMD
VDUP.<size> <Qd>, <Rt>
VDUP.<size> <Dd>, <Rt>
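A short C sketch of VDUP (ARM core register): the b:e field selects the element size, and the least significant esize bits of Rt are replicated across the destination doubleword. The names `vdup_core_esize` and `vdup_core` are hypothetical; the quadword form would write the same 64-bit value to both D[d] and D[d+1].

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical decode of the b:e field; returns 0 for the UNDEFINED value '11'. */
static unsigned vdup_core_esize(unsigned b, unsigned e)
{
    assert(b <= 1 && e <= 1);
    switch ((b << 1) | e) {
    case 0:  return 32;   /* b:e = '00' */
    case 1:  return 16;   /* b:e = '01' */
    case 2:  return 8;    /* b:e = '10' */
    default: return 0;    /* b:e = '11': UNDEFINED */
    }
}

/* Replicate the least significant esize bits of Rt across one doubleword. */
static uint64_t vdup_core(uint32_t rt, unsigned esize)
{
    assert(esize == 8 || esize == 16 || esize == 32);
    uint64_t scalar = rt & ((UINT64_C(1) << esize) - 1);
    uint64_t result = 0;
    for (unsigned e = 0; e < 64 / esize; e++)
        result |= scalar << (e * esize);
    return result;
}
```

With Rt = 0x12345678, the 8-bit form yields 0x7878787878787878 and the 16-bit form 0x5678567856785678; the upper bits of Rt are ignored.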
T1:  1 1 1 0 1 1 1 0 1 b Q 0 Vd | Rt 1 0 1 1 D 0 e 1 (0) (0) (0) (0)
A1:  cond 1 1 1 0 1 b Q 0 Vd | Rt 1 0 1 1 D 0 e 1 (0) (0) (0) (0)
    if Q == ‘1’ && Vd<0> == ‘1’ then UNDEFINED;
    d = UInt(D:Vd); t = UInt(Rt); regs = if Q == ‘0’ then 1 else 2;
    case b:e of
        when ‘00’  esize = 32; elements = 2;
        when ‘01’  esize = 16; elements = 4;
        when ‘10’  esize = 8;  elements = 8;
        when ‘11’  UNDEFINED;
    if t == 15 || (CurrentInstrSet() != InstrSet_ARM && t == 13) then UNPREDICTABLE;
Assembler syntax
VDUP.<size> <Qd>, <Rt>    Encoded as Q = 1
VDUP.<size> <Dd>, <Rt>    Encoded as Q = 0
where:
See Standard assembler syntax fields on page A8-7. An ARM VDUP instruction must be unconditional.
<size>   The data size for the elements of the destination vector. It must be one of:
         8    encoded as [b,e] = 0b10
         16   encoded as [b,e] = 0b01
         32   encoded as [b,e] = 0b00.
<Qd>     The destination vector for a quadword operation.
<Dd>     The destination vector for a doubleword operation.
<Rt>     The ARM source register.
Operation
    if ConditionPassed() then
        EncodingSpecificOperations(); CheckAdvSIMDEnabled();
        scalar = R[t]<esize-1:0>;
        for r = 0 to regs-1
            for e = 0 to elements-1
                Elem[D[d+r],e,esize] = scalar;
Exceptions
    Undefined Instruction.

A8.6.304 VEOR
Vector Bitwise Exclusive OR performs a bitwise Exclusive OR operation between two registers, and places the result in the destination register. The operand and result registers can be quadword or doubleword. They must all be the same size.
Encoding T1 / A1    Advanced SIMD
VEOR <Qd>, <Qn>, <Qm>
VEOR <Dd>, <Dn>, <Dm>
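The VEOR Operation loop, including the rule that quadword operands must start at even-numbered D registers, can be sketched in C over a hypothetical array `D[32]` standing in for the doubleword register file. `veor` is an illustrative name; d, n and m are D register numbers as produced by the decode.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model: D[0..31] stands for the doubleword register file.
   q selects the quadword form (two consecutive D registers per operand). */
static void veor(uint64_t D[32], unsigned d, unsigned n, unsigned m, int q)
{
    unsigned regs = q ? 2 : 1;
    /* Quadword operands with an odd D register number are UNDEFINED. */
    assert(!q || (d % 2 == 0 && n % 2 == 0 && m % 2 == 0));
    assert(d + regs <= 32 && n + regs <= 32 && m + regs <= 32);
    for (unsigned r = 0; r < regs; r++)
        D[d + r] = D[n + r] ^ D[m + r];
}
```

Because x EOR x is 0, using the same register for both operands clears the destination.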
T1:  1 1 1 1 1 1 1 1 0 D 0 0 Vn | Vd 0 0 0 1 N Q M 1 Vm
A1:  1 1 1 1 0 0 1 1 0 D 0 0 Vn | Vd 0 0 0 1 N Q M 1 Vm
    if Q == ‘1’ && (Vd<0> == ‘1’ || Vn<0> == ‘1’ || Vm<0> == ‘1’) then UNDEFINED;
    d = UInt(D:Vd); n = UInt(N:Vn); m = UInt(M:Vm); regs = if Q == ‘0’ then 1 else 2;
Assembler syntax
VEOR{.<dt>} {<Qd>,} <Qn>, <Qm>    Encoded as Q = 1
VEOR{.<dt>} {<Dd>,} <Dn>, <Dm>    Encoded as Q = 0
where:
See Standard assembler syntax fields on page A8-7. An ARM VEOR instruction must be unconditional.
<dt>               An optional data type. It is ignored by assemblers, and does not affect the encoding.
<Qd>, <Qn>, <Qm>   The destination vector and the operand vectors, for a quadword operation.
<Dd>, <Dn>, <Dm>   The destination vector and the operand vectors, for a doubleword operation.
Operation
    if ConditionPassed() then
        EncodingSpecificOperations(); CheckAdvSIMDEnabled();
        for r = 0 to regs-1
            D[d+r] = D[n+r] EOR D[m+r];
Exceptions
    Undefined Instruction.

A8.6.305 VEXT
Vector Extract extracts elements from the bottom end of the second operand vector and the top end of the first, concatenates them and places the result in the destination vector. See Figure A8-1 for an example. The elements of the vectors are treated as being 8-bit bitfields. There is no distinction between data types.
Figure A8-1 Operation of doubleword VEXT for imm = 3 (byte lanes 7-0 of Vm, Vn, and Vd)
Encoding T1 / A1    Advanced SIMD
VEXT.8 <Qd>, <Qn>, <Qm>, #<imm>
VEXT.8 <Dd>, <Dn>, <Dm>, #<imm>
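For the doubleword form, VEXT selects bytes imm to imm+7 of the 128-bit concatenation Dm:Dn, which in C reduces to two shifts on 64-bit values. This is a minimal sketch with a hypothetical function name; the quadword form would do the same on 256 bits and is not modelled.

```c
#include <assert.h>
#include <stdint.h>

/* Doubleword VEXT.8: bits [8*imm+63 : 8*imm] of the 128-bit value Dm:Dn,
   with Dn supplying the low 64 bits. */
static uint64_t vext8_d(uint64_t dn, uint64_t dm, unsigned imm)
{
    assert(imm < 8);
    if (imm == 0)
        return dn;                   /* avoid an undefined shift by 64 */
    return (dn >> (8 * imm)) | (dm << (64 - 8 * imm));
}
```

With Dn = 0x0706050403020100, Dm = 0x0F0E0D0C0B0A0908 and imm = 3 (the case drawn in Figure A8-1), the result is 0x0A09080706050403. Following the pseudo-instruction rule, VEXT.16 with #1 corresponds to calling this with imm = 2.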
T1:  1 1 1 0 1 1 1 1 1 D 1 1 Vn | Vd imm4 N Q M 0 Vm
A1:  1 1 1 1 0 0 1 0 1 D 1 1 Vn | Vd imm4 N Q M 0 Vm
    if Q == ‘1’ && (Vd<0> == ‘1’ || Vn<0> == ‘1’ || Vm<0> == ‘1’) then UNDEFINED;
    if Q == ‘0’ && imm4<3> == ‘1’ then UNDEFINED;
    quadword_operation = (Q == ‘1’); position = 8 * UInt(imm4);
    d = UInt(D:Vd); n = UInt(N:Vn); m = UInt(M:Vm);
Assembler syntax
VEXT.<size> {<Qd>,} <Qn>, <Qm>, #<imm>    Encoded as Q = 1
VEXT.<size> {<Dd>,} <Dn>, <Dm>, #<imm>    Encoded as Q = 0
where:
See Standard assembler syntax fields on page A8-7. An ARM VEXT instruction must be unconditional.
<size>   Size of the operation. The value can be:
         • 8, 16, or 32 for doubleword operations
         • 8, 16, 32, or 64 for quadword operations.
         If the value is 16, 32, or 64, the syntax is a pseudo-instruction for a VEXT instruction specifying the equivalent number of bytes. The following examples show how an assembler treats values greater than 8:
             VEXT.16 D0,D1,#x   is treated as   VEXT.8 D0,D1,#(x*2)
             VEXT.32 D0,D1,#x   is treated as   VEXT.8 D0,D1,#(x*4)
             VEXT.64 Q0,Q1,#x   is treated as   VEXT.8 Q0,Q1,#(x*8).
<Qd>, <Qn>, <Qm>   The destination vector and the operand vectors, for a quadword operation.
<Dd>, <Dn>, <Dm>   The destination vector and the operand vectors, for a doubleword operation.
<imm>    The location of the extracted result in the concatenation of the operands, as a number of bytes from the least significant end, in the range 0-7 for a doubleword operation or 0-15 for a quadword operation.
Operation
    if ConditionPassed() then
        EncodingSpecificOperations(); CheckAdvSIMDEnabled();
        if quadword_operation then
            Q[d] = (Q[m]:Q[n])<position+127:position>;
        else
            D[d] = (D[m]:D[n])<position+63:position>;
Exceptions
    Undefined Instruction.

A8.6.306 VHADD, VHSUB
Vector Halving Add adds corresponding elements in two vectors of integers, shifts each result right one bit, and places the final results in the destination vector. The results of the halving operations are truncated (for rounded results see VRHADD on page A8-734).
Vector Halving Subtract subtracts the elements of the second operand from the corresponding elements of the first operand, shifts each result right one bit, and places the final results in the destination vector. The results of the halving operations are truncated (there is no rounding version).
The operand and result elements are all the same type, and can be any one of:
• 8-bit, 16-bit, or 32-bit signed integers
• 8-bit, 16-bit, or 32-bit unsigned integers.
Encoding T1 / A1    Advanced SIMD
VH<op>.<dt> <Qd>, <Qn>, <Qm>
VH<op>.<dt> <Dd>, <Dn>, <Dm>
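The key detail of VHADD and VHSUB is that the sum or difference is formed at full precision before bits [esize:1] are kept, so no intermediate overflow occurs and the truncation rounds toward minus infinity. A minimal C sketch for 8-bit elements in one doubleword (S8 or U8), with a hypothetical function name:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of VHADD/VHSUB on eight 8-bit lanes of a doubleword.
   add: 1 for VHADD, 0 for VHSUB. is_unsigned: 1 for U8, 0 for S8. */
static uint64_t vhaddsub8(uint64_t dn, uint64_t dm, int add, int is_unsigned)
{
    uint64_t result = 0;
    for (unsigned e = 0; e < 8; e++) {
        int32_t op1 = (int32_t)((dn >> (8 * e)) & 0xFFu);
        int32_t op2 = (int32_t)((dm >> (8 * e)) & 0xFFu);
        if (!is_unsigned) {               /* Int(x, FALSE): sign-extend bit 7 */
            if (op1 > 127) op1 -= 256;
            if (op2 > 127) op2 -= 256;
        }
        int32_t r = add ? op1 + op2 : op1 - op2;   /* exact, no overflow */
        /* result<esize:1>: bits 8..1 of the two's complement result */
        uint64_t lane = ((uint32_t)r >> 1) & 0xFFu;
        result |= lane << (8 * e);
    }
    return result;
}
```

For example, U8 VHADD of 0xFF and 0xFF gives 0xFF rather than wrapping, and S8 VHADD of -3 and 0 gives -2 (0xFE), because the shift truncates toward minus infinity.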
T1:  1 1 1 U 1 1 1 1 0 D size Vn | Vd 0 0 op 0 N Q M 0 Vm
A1:  1 1 1 1 0 0 1 U 0 D size Vn | Vd 0 0 op 0 N Q M 0 Vm
    if Q == ‘1’ && (Vd<0> == ‘1’ || Vn<0> == ‘1’ || Vm<0> == ‘1’) then UNDEFINED;
    if size == ‘11’ then UNDEFINED;
    add = (op == ‘0’); unsigned = (U == ‘1’);
    esize = 8 << UInt(size); elements = 64 DIV esize;
    d = UInt(D:Vd); n = UInt(N:Vn); m = UInt(M:Vm); regs = if Q == ‘0’ then 1 else 2;
Assembler syntax
VH<op>.<dt> {<Qd>,} <Qn>, <Qm>    Encoded as Q = 1
VH<op>.<dt> {<Dd>,} <Dn>, <Dm>    Encoded as Q = 0
where:
<op>    Must be one of:
        ADD   encoded as op = 0
        SUB   encoded as op = 1.
See Standard assembler syntax fields on page A8-7. An ARM VHADD or VHSUB instruction must be unconditional.
<dt>    The data type for the elements of the vectors. It must be one of:
        S8    encoded as size = 0b00, U = 0
        S16   encoded as size = 0b01, U = 0
        S32   encoded as size = 0b10, U = 0
        U8    encoded as size = 0b00, U = 1
        U16   encoded as size = 0b01, U = 1
        U32   encoded as size = 0b10, U = 1.
<Qd>, <Qn>, <Qm>   The destination vector and the operand vectors, for a quadword operation.
<Dd>, <Dn>, <Dm>   The destination vector and the operand vectors, for a doubleword operation.
Operation
    if ConditionPassed() then
        EncodingSpecificOperations(); CheckAdvSIMDEnabled();
        for r = 0 to regs-1
            for e = 0 to elements-1
                op1 = Int(Elem[D[n+r],e,esize], unsigned);
                op2 = Int(Elem[D[m+r],e,esize], unsigned);
                result = if add then op1+op2 else op1-op2;
                Elem[D[d+r],e,esize] = result<esize:1>;
Exceptions
    Undefined Instruction.

A8.6.307 VLD1 (multiple single elements)
This instruction loads elements from memory into one, two, three, or four registers, without de-interleaving. Every element of each register is loaded. For details of the addressing mode see Advanced SIMD addressing mode on page A7-30.
Encoding T1 / A1    Advanced SIMD
VLD1.<size> <list>, [<Rn>{@<align>}]{!}
VLD1.<size> <list>, [<Rn>{@<align>}], <Rm>
T1:  1 1 1 1 1 0 0 1 0 D 1 0 Rn | Vd type size align Rm
A1:  1 1 1 1 0 1 0 0 0 D 1 0 Rn | Vd type size align Rm
    case type of
        when ‘0111’  regs = 1; if align<1> == ‘1’ then UNDEFINED;
        when ‘1010’  regs = 2; if align == ‘11’ then UNDEFINED;
        when ‘0110’  regs = 3; if align<1> == ‘1’ then UNDEFINED;
        when ‘0010’  regs = 4;
        otherwise    SEE “Related encodings”;
    alignment = if align == ‘00’ then 1 else 4 << UInt(align);
    ebytes = 1 << UInt(size); esize = 8 * ebytes; elements = 8 DIV ebytes;
    d = UInt(D:Vd); n = UInt(Rn); m = UInt(Rm);
    wback = (m != 15); register_index = (m != 15 && m != 13);
    if d+regs > 32 then UNPREDICTABLE;
Related encodings
See Advanced SIMD element or structure load/store instructions on page A7-27.
Assembler syntax
VLD1.<size> <list>, [<Rn>{@<align>}]          Rm = ‘1111’
VLD1.<size> <list>, [<Rn>{@<align>}]!         Rm = ‘1101’
VLD1.<size> <list>, [<Rn>{@<align>}], <Rm>    Rm = other values
where:
See Standard assembler syntax fields on page A8-7. An ARM VLD1 instruction must be unconditional.
<size>    The data size.
          It must be one of:
          8    encoded as size = 0b00
          16   encoded as size = 0b01
          32   encoded as size = 0b10
          64   encoded as size = 0b11.
<list>    The list of registers to load. It must be one of:
          {<Dd>}                           encoded as D:Vd = <Dd>, type = 0b0111
          {<Dd>, <Dd+1>}                   encoded as D:Vd = <Dd>, type = 0b1010
          {<Dd>, <Dd+1>, <Dd+2>}           encoded as D:Vd = <Dd>, type = 0b0110
          {<Dd>, <Dd+1>, <Dd+2>, <Dd+3>}   encoded as D:Vd = <Dd>, type = 0b0010.
<Rn>      Contains the base address for the access.
<align>   The alignment. It can be one of:
          64        8-byte alignment, encoded as align = 0b01.
          128       16-byte alignment, available only if <list> contains two or four registers, encoded as align = 0b10.
          256       32-byte alignment, available only if <list> contains four registers, encoded as align = 0b11.
          omitted   Standard alignment, see Unaligned data access on page A3-5. Encoded as align = 0b00.
!         If present, specifies writeback.
<Rm>      Contains an address offset applied after the access.
For more information about <Rn>, !, and <Rm>, see Advanced SIMD addressing mode on page A7-30.
Operation
    if ConditionPassed() then
        EncodingSpecificOperations(); CheckAdvSIMDEnabled(); NullCheckIfThumbEE(n);
        address = R[n]; if (address MOD alignment) != 0 then GenerateAlignmentException();
        if wback then R[n] = R[n] + (if register_index then R[m] else 8*regs);
        for r = 0 to regs-1
            for e = 0 to elements-1
                Elem[D[d+r],e,esize] = MemU[address,ebytes];
                address = address + ebytes;
Exceptions
    Undefined Instruction, Data Abort.

A8.6.308 VLD1 (single element to one lane)
This instruction loads one element from memory into one element of a register. Elements of the register that are not loaded are unchanged. For details of the addressing mode see Advanced SIMD addressing mode on page A7-30.
Encoding T1 / A1    Advanced SIMD
VLD1.<size> <list>, [<Rn>{@<align>}]{!}
VLD1.<size> <list>, [<Rn>{@<align>}], <Rm>
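The lane insert performed by VLD1 (single element to one lane), `Elem[D[d],index,esize] = MemU[address,ebytes]` with all other lanes unchanged, can be modelled in C with a mask. This sketch assumes little-endian data accesses (CPSR.E = 0); the function name is hypothetical, and alignment checking and writeback are not modelled.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model: load ebytes bytes (little-endian) from mem into lane
   `index` of doubleword dd, leaving every other lane unchanged. */
static uint64_t vld1_lane(uint64_t dd, const uint8_t *mem,
                          unsigned index, unsigned ebytes)
{
    assert(ebytes == 1 || ebytes == 2 || ebytes == 4);
    assert(index < 8 / ebytes && mem != 0);
    uint64_t value = 0;
    for (unsigned i = 0; i < ebytes; i++)
        value |= (uint64_t)mem[i] << (8 * i);     /* little-endian MemU */
    unsigned shift = 8 * ebytes * index;
    uint64_t mask  = ((UINT64_C(1) << (8 * ebytes)) - 1) << shift;
    return (dd & ~mask) | (value << shift);
}
```

Loading the bytes 0x11, 0x22 into 16-bit lane 1 of an all-ones register gives 0xFFFFFFFF2211FFFF.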
T1:  1 1 1 1 1 0 0 1 1 D 1 0 Rn | Vd size 0 0 index_align Rm
A1:  1 1 1 1 0 1 0 0 1 D 1 0 Rn | Vd size 0 0 index_align Rm
    if size == ‘11’ then SEE VLD1 (single element to all lanes);
    case size of
        when ‘00’
            if index_align<0> != ‘0’ then UNDEFINED;
            ebytes = 1; esize = 8; index = UInt(index_align<3:1>); alignment = 1;
        when ‘01’
            if index_align<1> != ‘0’ then UNDEFINED;
            ebytes = 2; esize = 16; index = UInt(index_align<3:2>);
            alignment = if index_align<0> == ‘0’ then 1 else 2;
        when ‘10’
            if index_align<2> != ‘0’ then UNDEFINED;
            if index_align<1:0> != ‘00’ && index_align<1:0> != ‘11’ then UNDEFINED;
            ebytes = 4; esize = 32; index = UInt(index_align<3>);
            alignment = if index_align<1:0> == ‘00’ then 1 else 4;
    d = UInt(D:Vd); n = UInt(Rn); m = UInt(Rm);
    wback = (m != 15); register_index = (m != 15 && m != 13);
Assembler syntax
VLD1.<size> <list>, [<Rn>{@<align>}]          Rm = ‘1111’
VLD1.<size> <list>, [<Rn>{@<align>}]!         Rm = ‘1101’
VLD1.<size> <list>, [<Rn>{@<align>}], <Rm>    Rm = other values
where:
See Standard assembler syntax fields on page A8-7. An ARM VLD1 instruction must be unconditional.
<size>    The data size. It must be one of:
          8    encoded as size = 0b00
          16   encoded as size = 0b01
          32   encoded as size = 0b10.
<list>    The register containing the element to load. It must be {<Dd[x]>}. The register <Dd> is encoded in D:Vd.
<Rn>      Contains the base address for the access.
<align>   The alignment. It can be one of:
          16        2-byte alignment, available only if <size> is 16
          32        4-byte alignment, available only if <size> is 32
          omitted   Standard alignment, see Unaligned data access on page A3-5.
!         If present, specifies writeback.
<Rm>      Contains an address offset applied after the access.
For more information about <Rn>, !, and <Rm>, see Advanced SIMD addressing mode on page A7-30.
Table A8-5 shows the encoding of index and alignment for the different <size> values.

Table A8-5 Encoding of index and alignment
                   <size> == 8            <size> == 16             <size> == 32
Index              index_align[3:1] = x   index_align[3:2] = x     index_align[3] = x
<align> omitted    index_align[0] = 0     index_align[1:0] = ‘00’  index_align[2:0] = ‘000’
<align> == 16      -                      index_align[1:0] = ‘01’  -
<align> == 32      -                      -                        index_align[2:0] = ‘011’

Operation
    if ConditionPassed() then
        EncodingSpecificOperations(); CheckAdvSIMDEnabled(); NullCheckIfThumbEE(n);
        address = R[n]; if (address MOD alignment) != 0 then GenerateAlignmentException();
        if wback then R[n] = R[n] + (if register_index then R[m] else ebytes);
        Elem[D[d],index,esize] = MemU[address,ebytes];
Exceptions
    Undefined Instruction, Data Abort.

A8.6.309 VLD1 (single element to all lanes)
This instruction loads one element from memory into every element of one or two vectors. For details of the addressing mode see Advanced SIMD addressing mode on page A7-30.
Encoding T1 / A1    Advanced SIMD
VLD1.<size> <list>, [<Rn>{@<align>}]{!}
VLD1.<size> <list>, [<Rn>{@<align>}], <Rm>
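VLD1 (single element to all lanes) combines a size check with `Replicate(MemU[address,ebytes], elements)`. The following C sketch covers the UNDEFINED checks on size and a, and the replication into one doubleword, assuming little-endian data; the function name is hypothetical, and the T bit (which just writes the same value to a second register), alignment checking, and writeback are not modelled.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model: returns -1 for the UNDEFINED combinations
   (size == '11', or size == '00' with a == '1'); otherwise stores the
   replicated element in *dd and returns 0. */
static int vld1_all_lanes(const uint8_t *mem, unsigned size, unsigned a,
                          uint64_t *dd)
{
    assert(mem != 0 && dd != 0);
    if (size == 3 || (size == 0 && a == 1))
        return -1;
    unsigned ebytes = 1u << size;
    uint64_t elem = 0, rep = 0;
    for (unsigned i = 0; i < ebytes; i++)
        elem |= (uint64_t)mem[i] << (8 * i);      /* little-endian MemU */
    for (unsigned e = 0; e < 8 / ebytes; e++)
        rep |= elem << (8 * ebytes * e);          /* Replicate(...) */
    *dd = rep;
    return 0;
}
```

With the bytes 0x34, 0x12 in memory and size = 0b01, every 16-bit lane of the destination becomes 0x1234.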
T1:  1 1 1 1 1 0 0 1 1 D 1 0 Rn | Vd 1 1 0 0 size T a Rm
A1:  1 1 1 1 0 1 0 0 1 D 1 0 Rn | Vd 1 1 0 0 size T a Rm
    if size == ‘11’ || (size == ‘00’ && a == ‘1’) then UNDEFINED;
    ebytes = 1 << UInt(size); elements = 8 DIV ebytes;
    regs = if T == ‘0’ then 1 else 2;
    alignment = if a == ‘0’ then 1 else ebytes;
    d = UInt(D:Vd); n = UInt(Rn); m = UInt(Rm);
    wback = (m != 15); register_index = (m != 15 && m != 13);
    if d+regs > 32 then UNPREDICTABLE;
Assembler syntax
VLD1.<size> <list>, [<Rn>{@<align>}]          Rm = ‘1111’
VLD1.<size> <list>, [<Rn>{@<align>}]!         Rm = ‘1101’
VLD1.<size> <list>, [<Rn>{@<align>}], <Rm>    Rm = other values
where:
See Standard assembler syntax fields on page A8-7. An ARM VLD1 instruction must be unconditional.
<size>    The data size. It must be one of:
          8    encoded as size = 0b00
          16   encoded as size = 0b01
          32   encoded as size = 0b10.
<list>    The list of registers to load. It must be one of:
          {<Dd[]>}             encoded as D:Vd = <Dd>, T = 0
          {<Dd[]>, <Dd+1[]>}   encoded as D:Vd = <Dd>, T = 1.
<Rn>      Contains the base address for the access.
<align>   The alignment. It can be one of:
          16        2-byte alignment, available only if <size> is 16, encoded as a = 1.
          32        4-byte alignment, available only if <size> is 32, encoded as a = 1.
          omitted   Standard alignment, see Unaligned data access on page A3-5. Encoded as a = 0.
!         If present, specifies writeback.
<Rm>      Contains an address offset applied after the access.
For more information about <Rn>, !, and <Rm>, see Advanced SIMD addressing mode on page A7-30.
Operation
    if ConditionPassed() then
        EncodingSpecificOperations(); CheckAdvSIMDEnabled(); NullCheckIfThumbEE(n);
        address = R[n]; if (address MOD alignment) != 0 then GenerateAlignmentException();
        if wback then R[n] = R[n] + (if register_index then R[m] else ebytes);
        replicated_element = Replicate(MemU[address,ebytes], elements);
        for r = 0 to regs-1
            D[d+r] = replicated_element;
Exceptions
    Undefined Instruction, Data Abort.

A8.6.310 VLD2 (multiple 2-element structures)
This instruction loads multiple 2-element structures from memory into two or four registers, with de-interleaving. For more information, see Element and structure load/store instructions on page A4-27. Every element of each register is loaded. For details of the addressing mode see Advanced SIMD addressing mode on page A7-30.
Encoding T1 / A1    Advanced SIMD
VLD2.<size> <list>, [<Rn>{@<align>}]{!}
VLD2.<size> <list>, [<Rn>{@<align>}], <Rm>
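The de-interleaving done by VLD2 (multiple 2-element structures) is easiest to see for 8-bit elements with list {<Dd>, <Dd+1>} (type = 0b1000): even-addressed bytes go to the first register and odd-addressed bytes to the second. A minimal C sketch with a hypothetical function name, covering only this one size and list form:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of VLD2.8 {Dd, Dd+1}: 16 bytes hold eight 2-element
   structures {a0,b0,a1,b1,...}; the a elements go to *dd, the b to *dd2. */
static void vld2_8(const uint8_t mem[16], uint64_t *dd, uint64_t *dd2)
{
    assert(mem != 0 && dd != 0 && dd2 != 0);
    uint64_t a = 0, b = 0;
    for (unsigned e = 0; e < 8; e++) {
        a |= (uint64_t)mem[2 * e]     << (8 * e);   /* Elem[D[d],e,8]  */
        b |= (uint64_t)mem[2 * e + 1] << (8 * e);   /* Elem[D[d2],e,8] */
    }
    *dd  = a;
    *dd2 = b;
}
```

A typical use is splitting interleaved stereo samples or real/imaginary pairs into separate registers in a single load.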
T1:  1 1 1 1 1 0 0 1 0 D 1 0 Rn | Vd type size align Rm
A1:  1 1 1 1 0 1 0 0 0 D 1 0 Rn | Vd type size align Rm
    if size == ‘11’ then UNDEFINED;
    case type of
        when ‘1000’  regs = 1; inc = 1; if align == ‘11’ then UNDEFINED;
        when ‘1001’  regs = 1; inc = 2; if align == ‘11’ then UNDEFINED;
        when ‘0011’  regs = 2; inc = 2;
        otherwise    SEE “Related encodings”;
    alignment = if align == ‘00’ then 1 else 4 << UInt(align);
    ebytes = 1 << UInt(size); esize = 8 * ebytes; elements = 8 DIV ebytes;
    d = UInt(D:Vd); d2 = d + inc; n = UInt(Rn); m = UInt(Rm);
    wback = (m != 15); register_index = (m != 15 && m != 13);
    if d2+regs > 32 then UNPREDICTABLE;
Related encodings
See Advanced SIMD element or structure load/store instructions on page A7-27.
Assembler syntax
VLD2.<size> <list>, [<Rn>{@<align>}]          Rm = ‘1111’
VLD2.<size> <list>, [<Rn>{@<align>}]!         Rm = ‘1101’
VLD2.<size> <list>, [<Rn>{@<align>}], <Rm>    Rm = other values
where:
See Standard assembler syntax fields on page A8-7. An ARM VLD2 instruction must be unconditional.
<size>    The data size. It must be one of:
          8    encoded as size = 0b00
          16   encoded as size = 0b01
          32   encoded as size = 0b10.
<list>    The list of registers to load. It must be one of:
          {<Dd>, <Dd+1>}                   encoded as D:Vd = <Dd>, type = 0b1000
          {<Dd>, <Dd+2>}                   encoded as D:Vd = <Dd>, type = 0b1001
          {<Dd>, <Dd+1>, <Dd+2>, <Dd+3>}   encoded as D:Vd = <Dd>, type = 0b0011.
<Rn>      Contains the base address for the access.
<align>   The alignment. It can be one of:
          64        8-byte alignment, encoded as align = 0b01.
          128       16-byte alignment, encoded as align = 0b10.
          256       32-byte alignment, available only if <list> contains four registers, encoded as align = 0b11.
          omitted   Standard alignment, see Unaligned data access on page A3-5. Encoded as align = 0b00.
!         If present, specifies writeback.
<Rm>      Contains an address offset applied after the access.
For more information about <Rn>, !, and <Rm>, see Advanced SIMD addressing mode on page A7-30.
Operation
    if ConditionPassed() then
        EncodingSpecificOperations(); CheckAdvSIMDEnabled(); NullCheckIfThumbEE(n);
        address = R[n]; if (address MOD alignment) != 0 then GenerateAlignmentException();
        if wback then R[n] = R[n] + (if register_index then R[m] else 16*regs);
        for r = 0 to regs-1
            for e = 0 to elements-1
                Elem[D[d+r],e,esize] = MemU[address,ebytes];
                Elem[D[d2+r],e,esize] = MemU[address+ebytes,ebytes];
                address = address + 2*ebytes;
Exceptions
    Undefined Instruction, Data Abort.

A8.6.311 VLD2 (single 2-element structure to one lane)
This instruction loads one 2-element structure from memory into corresponding elements of two registers. Elements of the registers that are not loaded are unchanged. For details of the addressing mode see Advanced SIMD addressing mode on page A7-30.
Encoding T1 / A1    Advanced SIMD
VLD2.<size> <list>, [<Rn>{@<align>}]{!}
VLD2.<size> <list>, [<Rn>{@<align>}], <Rm>
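The index_align decode for VLD2 (single 2-element structure to one lane) packs three pieces of information (lane index, register spacing, alignment) into four bits, with a different layout for each element size. The C sketch below transcribes the `case size of` decode; the struct and function names are hypothetical, and size == ‘11’ is reported as -1 here because it belongs to a different instruction form rather than being decoded.

```c
#include <assert.h>

/* Hypothetical container for the decoded fields. */
struct lane_decode { unsigned index, inc, alignment; };

/* Returns 0 on success; -1 for index_align<1> == '1' with size == '10'
   (UNDEFINED) or size == '11' (decoded as the all-lanes form instead). */
static int vld2_lane_decode(unsigned size, unsigned ia, struct lane_decode *out)
{
    assert(ia < 16 && out != 0);
    switch (size) {
    case 0:   /* 8-bit: no double-spacing */
        out->index = ia >> 1;
        out->inc = 1;
        out->alignment = (ia & 1u) ? 2 : 1;
        return 0;
    case 1:   /* 16-bit */
        out->index = ia >> 2;
        out->inc = ((ia >> 1) & 1u) ? 2 : 1;
        out->alignment = (ia & 1u) ? 4 : 1;
        return 0;
    case 2:   /* 32-bit */
        if ((ia >> 1) & 1u)
            return -1;
        out->index = ia >> 3;
        out->inc = ((ia >> 2) & 1u) ? 2 : 1;
        out->alignment = (ia & 1u) ? 8 : 1;
        return 0;
    default:
        return -1;
    }
}
```

For example, 16-bit elements with index_align = 0b1011 decode to lane 2, double-spaced registers (inc = 2), and 4-byte alignment.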
T1:  1 1 1 1 1 0 0 1 1 D 1 0 Rn | Vd size 0 1 index_align Rm
A1:  1 1 1 1 0 1 0 0 1 D 1 0 Rn | Vd size 0 1 index_align Rm
    if size == ‘11’ then SEE VLD2 (single 2-element structure to all lanes);
    case size of
        when ‘00’
            ebytes = 1; esize = 8; index = UInt(index_align<3:1>); inc = 1;
            alignment = if index_align<0> == ‘0’ then 1 else 2;
        when ‘01’
            ebytes = 2; esize = 16; index = UInt(index_align<3:2>);
            inc = if index_align<1> == ‘0’ then 1 else 2;
            alignment = if index_align<0> == ‘0’ then 1 else 4;
        when ‘10’
            if index_align<1> != ‘0’ then UNDEFINED;
            ebytes = 4; esize = 32; index = UInt(index_align<3>);
            inc = if index_align<2> == ‘0’ then 1 else 2;
            alignment = if index_align<0> == ‘0’ then 1 else 8;
    d = UInt(D:Vd); d2 = d + inc; n = UInt(Rn); m = UInt(Rm);
    wback = (m != 15); register_index = (m != 15 && m != 13);
    if d2 > 31 then UNPREDICTABLE;
Assembler syntax
VLD2.<size> <list>, [<Rn>{@<align>}]          Rm = ‘1111’
VLD2.<size> <list>, [<Rn>{@<align>}]!         Rm = ‘1101’
VLD2.<size> <list>, [<Rn>{@<align>}], <Rm>    Rm = other values
where:
See Standard assembler syntax fields on page A8-7. An ARM VLD2 instruction must be unconditional.
<size>    The data size. It must be one of:
          8    encoded as size = 0b00
          16   encoded as size = 0b01
          32   encoded as size = 0b10.
<list>    The registers containing the structure. Encoded with D:Vd = <Dd>. It must be one of:
          {<Dd[x]>, <Dd+1[x]>}   Single-spaced registers, see Table A8-6.
          {<Dd[x]>, <Dd+2[x]>}   Double-spaced registers, see Table A8-6. This is not available if <size> == 8.
<Rn>      Contains the base address for the access.
<align>   The alignment. It can be one of:
          16        2-byte alignment, available only if <size> is 8
          32        4-byte alignment, available only if <size> is 16
          64        8-byte alignment, available only if <size> is 32
          omitted   Standard alignment, see Unaligned data access on page A3-5.
!         If present, specifies writeback.
<Rm>      Contains an address offset applied after the access.
For more information about <Rn>, !, and <Rm>, see Advanced SIMD addressing mode on page A7-30.

Table A8-6 Encoding of index, alignment, and register spacing
                   <size> == 8            <size> == 16            <size> == 32
Index              index_align[3:1] = x   index_align[3:2] = x    index_align[3] = x
Single-spacing     -                      index_align[1] = 0      index_align[2] = 0
Double-spacing     -                      index_align[1] = 1      index_align[2] = 1
<align> omitted    index_align[0] = 0     index_align[0] = 0      index_align[1:0] = ‘00’
<align> == 16      index_align[0] = 1     -                       -
<align> == 32      -                      index_align[0] = 1      -
<align> == 64      -                      -                       index_align[1:0] = ‘01’

Operation
    if ConditionPassed() then
        EncodingSpecificOperations(); CheckAdvSIMDEnabled(); NullCheckIfThumbEE(n);
        address = R[n]; if (address MOD alignment) != 0 then GenerateAlignmentException();
        if wback then R[n] = R[n] + (if register_index then R[m] else 2*ebytes);
        Elem[D[d],index,esize] = MemU[address,ebytes];
        Elem[D[d2],index,esize] = MemU[address+ebytes,ebytes];
Exceptions
    Undefined Instruction, Data Abort.

A8.6.312 VLD2 (single 2-element structure to all lanes)
This instruction loads one 2-element structure from memory into all lanes of two registers. For details of the addressing mode see Advanced SIMD addressing mode on page A7-30.
Encoding T1 / A1    Advanced SIMD
VLD2.<size> <list>, [<Rn>{@<align>}]{!}
VLD2.<size> <list>, [<Rn>{@<align>}], <Rm>
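VLD2 (single 2-element structure to all lanes) replicates the first structure element across D[d] and the second across D[d2], where d2 is d+1 or d+2 depending on T. This C sketch uses a hypothetical array `D[32]` for the doubleword register file and assumes little-endian data; the names are illustrative, and alignment checking and writeback are not modelled.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: read one little-endian element of ebytes bytes and
   replicate it across a 64-bit doubleword. */
static uint64_t replicate_le(const uint8_t *p, unsigned ebytes)
{
    uint64_t elem = 0, result = 0;
    for (unsigned i = 0; i < ebytes; i++)
        elem |= (uint64_t)p[i] << (8 * i);
    for (unsigned e = 0; e < 8 / ebytes; e++)
        result |= elem << (8 * ebytes * e);
    return result;
}

/* Model of the Operation: t is the T bit (0: d2 = d + 1, 1: d2 = d + 2). */
static void vld2_all_lanes(uint64_t D[32], unsigned d, unsigned t,
                           const uint8_t *mem, unsigned ebytes)
{
    assert(ebytes == 1 || ebytes == 2 || ebytes == 4);
    unsigned d2 = d + (t ? 2u : 1u);
    assert(d2 <= 31);                       /* d2 > 31 is UNPREDICTABLE */
    D[d]  = replicate_le(mem, ebytes);
    D[d2] = replicate_le(mem + ebytes, ebytes);
}
```

With T = 1 the register between D[d] and D[d2] is left untouched, which is what distinguishes the double-spaced form.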
T1:  1 1 1 1 1 0 0 1 1 D 1 0 Rn | Vd 1 1 0 1 size T a Rm
A1:  1 1 1 1 0 1 0 0 1 D 1 0 Rn | Vd 1 1 0 1 size T a Rm
    if size == ‘11’ then UNDEFINED;
    ebytes = 1 << UInt(size); elements = 8 DIV ebytes;
    alignment = if a == ‘0’ then 1 else 2*ebytes;
    inc = if T == ‘0’ then 1 else 2;
    d = UInt(D:Vd); d2 = d + inc; n = UInt(Rn); m = UInt(Rm);
    wback = (m != 15); register_index = (m != 15 && m != 13);
    if d2 > 31 then UNPREDICTABLE;
Assembler syntax
VLD2.<size> <list>, [<Rn>{@<align>}]          Rm = ‘1111’
VLD2.<size> <list>, [<Rn>{@<align>}]!         Rm = ‘1101’
VLD2.<size> <list>, [<Rn>{@<align>}], <Rm>    Rm = other values
where:
See Standard assembler syntax fields on page A8-7. An ARM VLD2 instruction must be unconditional.
<size>    The data size. It must be one of:
          8    encoded as size = 0b00
          16   encoded as size = 0b01
          32   encoded as size = 0b10.
<list>    The registers containing the structure. It must be one of:
          {<Dd[]>, <Dd+1[]>}   single-spaced register transfer, encoded as D:Vd = <Dd>, T = 0
          {<Dd[]>, <Dd+2[]>}   double-spaced register transfer, encoded as D:Vd = <Dd>, T = 1.
<Rn>      Contains the base address for the access.
<align>   The alignment. It can be one of:
          16        2-byte alignment, available only if <size> is 8, encoded as a = 1
          32        4-byte alignment, available only if <size> is 16, encoded as a = 1
          64        8-byte alignment, available only if <size> is 32, encoded as a = 1
          omitted   Standard alignment, see Unaligned data access on page A3-5. Encoded as a = 0.
!         If present, specifies writeback.
<Rm>      Contains an address offset applied after the access.
For more information about <Rn>, !, and <Rm>, see Advanced SIMD addressing mode on page A7-30.
Operation
    if ConditionPassed() then
        EncodingSpecificOperations(); CheckAdvSIMDEnabled(); NullCheckIfThumbEE(n);
        address = R[n]; if (address MOD alignment) != 0 then GenerateAlignmentException();
        if wback then R[n] = R[n] + (if register_index then R[m] else 2*ebytes);
        D[d] = Replicate(MemU[address,ebytes], elements);
        D[d2] = Replicate(MemU[address+ebytes,ebytes], elements);
Exceptions
    Undefined Instruction, Data Abort.

A8.6.313 VLD3 (multiple 3-element structures)
This instruction loads multiple 3-element structures from memory into three registers, with de-interleaving. For more information, see Element and structure load/store instructions on page A4-27. Every element of each register is loaded. For details of the addressing mode see Advanced SIMD addressing mode on page A7-30.
Encoding T1 / A1    Advanced SIMD
VLD3.<size> <list>, [<Rn>{@<align>}]{!}
VLD3.<size> <list>, [<Rn>{@<align>}], <Rm>
VLD3<c>.<size> <list>, [<Rn>{@<align>}], <Rm>

T1: 1 1 1 1 1 0 0 1 0 D 1 0 Rn | Vd type size align Rm
A1: 1 1 1 1 0 1 0 0 0 D 1 0 Rn Vd type size align Rm

if size == '11' || align<1> == '1' then UNDEFINED;
case type of
    when '0100'
        inc = 1;
    when '0101'
        inc = 2;
    otherwise
        SEE "Related encodings";
alignment = if align<0> == '0' then 1 else 8;
ebytes = 1 << UInt(size); esize = 8 * ebytes; elements = 8 DIV ebytes;
d = UInt(D:Vd); d2 = d + inc; d3 = d2 + inc; n = UInt(Rn); m = UInt(Rm);
wback = (m != 15); register_index = (m != 15 && m != 13);
if d3 > 31 then UNPREDICTABLE;

Related encodings
See Advanced SIMD element or structure load/store instructions on page A7-27.

Assembler syntax
VLD3<c><q>.<size> <list>, [<Rn>{@<align>}]          Rm = '1111'
VLD3<c><q>.<size> <list>, [<Rn>{@<align>}]!         Rm = '1101'
VLD3<c><q>.<size> <list>, [<Rn>{@<align>}], <Rm>    Rm = other values
where:
<c>, <q>        See Standard assembler syntax fields on page A8-7. An ARM VLD3 instruction must be unconditional.
<size>          The data size. It must be one of:
                  8    encoded as size = 0b00
                  16   encoded as size = 0b01
                  32   encoded as size = 0b10.
<list>          The list of registers to load. It must be one of:
                  {<Dd>, <Dd+1>, <Dd+2>}   encoded as D:Vd = <Dd>, type = 0b0100
                  {<Dd>, <Dd+2>, <Dd+4>}   encoded as D:Vd = <Dd>, type = 0b0101.
<Rn>            Contains the base address for the access.
<align>         The alignment. It can be:
                  64        8-byte alignment, encoded as align = 0b01.
                  omitted   Standard alignment, see Unaligned data access on page A3-5. Encoded as align = 0b00.
!               If present, specifies writeback.
<Rm>            Contains an address offset applied after the access.
For more information about <Rn>, !, and <Rm>, see Advanced SIMD addressing mode on page A7-30.

Operation
if ConditionPassed() then
    EncodingSpecificOperations(); CheckAdvSIMDEnabled(); NullCheckIfThumbEE(n);
    address = R[n];
    if (address MOD alignment) != 0 then GenerateAlignmentException();
    if wback then R[n] = R[n] + (if register_index then R[m] else 24);
    for e = 0 to elements-1
        Elem[D[d],e,esize] = MemU[address,ebytes];
        Elem[D[d2],e,esize] = MemU[address+ebytes,ebytes];
        Elem[D[d3],e,esize] = MemU[address+2*ebytes,ebytes];
        address = address + 3*ebytes;

Exceptions
Undefined Instruction, Data Abort.

A8.6.314 VLD3 (single 3-element structure to one lane)
This instruction loads one 3-element structure from memory into corresponding elements of three registers. Elements of the registers that are not loaded are unchanged. For details of the addressing mode see Advanced SIMD addressing mode on page A7-30.

Encoding T1 / A1   Advanced SIMD
VLD3<c>.<size> <list>, [<Rn>]{!}
VLD3<c>.<size> <list>, [<Rn>], <Rm>

T1: 1 1 1 1 1 0 0 1 1 D 1 0 Rn | Vd size 1 0 index_align Rm
A1: 1 1 1 1 0 1 0 0 1 D 1 0 Rn Vd size 1 0 index_align Rm

if size == '11' then SEE VLD3 (single 3-element structure to all lanes);
case size of
    when '00'
        if index_align<0> != '0' then UNDEFINED;
        ebytes = 1; esize = 8; index = UInt(index_align<3:1>); inc = 1;
    when '01'
        if index_align<0> != '0' then UNDEFINED;
        ebytes = 2; esize = 16; index = UInt(index_align<3:2>);
        inc = if index_align<1> == '0' then 1 else 2;
    when '10'
        if index_align<1:0> != '00' then UNDEFINED;
        ebytes = 4; esize = 32; index = UInt(index_align<3>);
        inc = if index_align<2> == '0' then 1 else 2;
d = UInt(D:Vd); d2 = d + inc; d3 = d2 + inc; n = UInt(Rn); m = UInt(Rm);
wback = (m != 15); register_index = (m != 15 && m != 13);
if d3 > 31 then UNPREDICTABLE;

Assembler syntax
VLD3<c><q>.<size> <list>, [<Rn>]          Rm = '1111'
VLD3<c><q>.<size> <list>, [<Rn>]!         Rm = '1101'
VLD3<c><q>.<size> <list>, [<Rn>], <Rm>    Rm = other values
where:
<c>, <q>        See Standard assembler syntax fields on page A8-7. An ARM VLD3 instruction must be unconditional.
<size>          The data size. It must be one of:
                  8    encoded as size = 0b00
                  16   encoded as size = 0b01
                  32   encoded as size = 0b10.
<list>          The registers containing the structure. Encoded with D:Vd = <Dd>. It must be one of:
                  {<Dd[x]>, <Dd+1[x]>, <Dd+2[x]>}   Single-spaced registers, see Table A8-7.
                  {<Dd[x]>, <Dd+2[x]>, <Dd+4[x]>}   Double-spaced registers, see Table A8-7. This is not available if <size> == 8.
<Rn>            Contains the base address for the access.
!               If present, specifies writeback.
<Rm>            Contains an address offset applied after the access.
For more information about <Rn>, !, and <Rm>, see Advanced SIMD addressing mode on page A7-30.

Table A8-7 Encoding of index and register spacing
                  <size> == 8            <size> == 16              <size> == 32
Index             index_align[3:1] = x   index_align[3:2] = x      index_align[3] = x
Single-spacing    index_align[0] = 0     index_align[1:0] = '00'   index_align[2:0] = '000'
Double-spacing    -                      index_align[1:0] = '10'   index_align[2:0] = '100'

Alignment
Standard alignment rules apply, see Unaligned data access on page A3-5.

Operation
if ConditionPassed() then
    EncodingSpecificOperations(); CheckAdvSIMDEnabled(); NullCheckIfThumbEE(n);
    address = R[n];
    if wback then R[n] = R[n] + (if register_index then R[m] else 3*ebytes);
    Elem[D[d],index,esize] = MemU[address,ebytes];
    Elem[D[d2],index,esize] = MemU[address+ebytes,ebytes];
    Elem[D[d3],index,esize] = MemU[address+2*ebytes,ebytes];

Exceptions
Undefined Instruction, Data Abort.

A8.6.315 VLD3 (single 3-element structure to all lanes)
This instruction loads one 3-element structure from memory into all lanes of three registers. For details of the addressing mode see Advanced SIMD addressing mode on page A7-30.

Encoding T1 / A1   Advanced SIMD
VLD3<c>.<size> <list>, [<Rn>]{!}
VLD3<c>.<size> <list>, [<Rn>], <Rm>

T1: 1 1 1 1 1 0 0 1 1 D 1 0 Rn | Vd 1 1 1 0 size T a Rm
A1: 1 1 1 1 0 1 0 0 1 D 1 0 Rn Vd 1 1 1 0 size T a Rm

if size == '11' || a == '1' then UNDEFINED;
ebytes = 1 << UInt(size); elements = 8 DIV ebytes;
inc = if T == '0' then 1 else 2;
d = UInt(D:Vd); d2 = d + inc; d3 = d2 + inc; n = UInt(Rn); m = UInt(Rm);
wback = (m != 15); register_index = (m != 15 && m != 13);
if d3 > 31 then UNPREDICTABLE;

Assembler syntax
VLD3<c><q>.<size> <list>, [<Rn>]          Rm = '1111'
VLD3<c><q>.<size> <list>, [<Rn>]!         Rm = '1101'
VLD3<c><q>.<size> <list>, [<Rn>], <Rm>    Rm = other values
where:
<c>, <q>        See Standard assembler syntax fields on page A8-7. An ARM VLD3 instruction must be unconditional.
<size>          The data size. It must be one of:
                  8    encoded as size = 0b00
                  16   encoded as size = 0b01
                  32   encoded as size = 0b10.
<list>          The registers containing the structures. It must be one of:
                  {<Dd[]>, <Dd+1[]>, <Dd+2[]>}   Single-spaced register transfer, encoded as D:Vd = <Dd>, T = 0.
                  {<Dd[]>, <Dd+2[]>, <Dd+4[]>}   Double-spaced register transfer, encoded as D:Vd = <Dd>, T = 1.
<Rn>            Contains the base address for the access.
!               If present, specifies writeback.
<Rm>            Contains an address offset applied after the access.
For more information about <Rn>, !, and <Rm>, see Advanced SIMD addressing mode on page A7-30.

Alignment
Standard alignment rules apply, see Unaligned data access on page A3-5. The a bit must be encoded as 0.

Operation
if ConditionPassed() then
    EncodingSpecificOperations(); CheckAdvSIMDEnabled(); NullCheckIfThumbEE(n);
    address = R[n];
    if wback then R[n] = R[n] + (if register_index then R[m] else 3*ebytes);
    D[d] = Replicate(MemU[address,ebytes], elements);
    D[d2] = Replicate(MemU[address+ebytes,ebytes], elements);
    D[d3] = Replicate(MemU[address+2*ebytes,ebytes], elements);

Exceptions
Undefined Instruction, Data Abort.

A8.6.316 VLD4 (multiple 4-element structures)
This instruction loads multiple 4-element structures from memory into four registers, with de-interleaving. For more information, see Element and structure load/store instructions on page A4-27. Every element of each register is loaded. For details of the addressing mode see Advanced SIMD addressing mode on page A7-30.

Encoding T1 / A1   Advanced SIMD
VLD4<c>.<size> <list>, [<Rn>{@<align>}]{!}
VLD4<c>.<size> <list>, [<Rn>{@<align>}], <Rm>

T1: 1 1 1 1 1 0 0 1 0 D 1 0 Rn | Vd type size align Rm
A1: 1 1 1 1 0 1 0 0 0 D 1 0 Rn Vd type size align Rm

if size == '11' then UNDEFINED;
case type of
    when '0000'
        inc = 1;
    when '0001'
        inc = 2;
    otherwise
        SEE "Related encodings";
alignment = if align == '00' then 1 else 4 << UInt(align);
ebytes = 1 << UInt(size); esize = 8 * ebytes; elements = 8 DIV ebytes;
d = UInt(D:Vd); d2 = d + inc; d3 = d2 + inc; d4 = d3 + inc; n = UInt(Rn); m = UInt(Rm);
wback = (m != 15); register_index = (m != 15 && m != 13);
if d4 > 31 then UNPREDICTABLE;

Related encodings
See Advanced SIMD element or structure load/store instructions on page A7-27.

Assembler syntax
VLD4<c><q>.<size> <list>, [<Rn>{@<align>}]          Rm = '1111'
VLD4<c><q>.<size> <list>, [<Rn>{@<align>}]!         Rm = '1101'
VLD4<c><q>.<size> <list>, [<Rn>{@<align>}], <Rm>    Rm = other values
where:
<c>, <q>        See Standard assembler syntax fields on page A8-7. An ARM VLD4 instruction must be unconditional.
<size>          The data size. It must be one of:
                  8    encoded as size = 0b00
                  16   encoded as size = 0b01
                  32   encoded as size = 0b10.
<list>          The list of registers to load. It must be one of:
                  {<Dd>, <Dd+1>, <Dd+2>, <Dd+3>}   encoded as D:Vd = <Dd>, type = 0b0000
                  {<Dd>, <Dd+2>, <Dd+4>, <Dd+6>}   encoded as D:Vd = <Dd>, type = 0b0001.
<Rn>            Contains the base address for the access.
<align>         The alignment. It can be one of:
                  64        8-byte alignment, encoded as align = 0b01.
                  128       16-byte alignment, encoded as align = 0b10.
                  256       32-byte alignment, encoded as align = 0b11.
                  omitted   Standard alignment, see Unaligned data access on page A3-5. Encoded as align = 0b00.
!               If present, specifies writeback.
<Rm>            Contains an address offset applied after the access.
For more information about <Rn>, !, and <Rm>, see Advanced SIMD addressing mode on page A7-30.

Operation
if ConditionPassed() then
    EncodingSpecificOperations(); CheckAdvSIMDEnabled(); NullCheckIfThumbEE(n);
    address = R[n];
    if (address MOD alignment) != 0 then GenerateAlignmentException();
    if wback then R[n] = R[n] + (if register_index then R[m] else 32);
    for e = 0 to elements-1
        Elem[D[d],e,esize] = MemU[address,ebytes];
        Elem[D[d2],e,esize] = MemU[address+ebytes,ebytes];
        Elem[D[d3],e,esize] = MemU[address+2*ebytes,ebytes];
        Elem[D[d4],e,esize] = MemU[address+3*ebytes,ebytes];
        address = address + 4*ebytes;

Exceptions
Undefined Instruction, Data Abort.

A8.6.317 VLD4 (single 4-element structure to one lane)
This instruction loads one 4-element structure from memory into corresponding elements of four registers. Elements of the registers that are not loaded are unchanged. For details of the addressing mode see Advanced SIMD addressing mode on page A7-30.

Encoding T1 / A1   Advanced SIMD
VLD4<c>.<size> <list>, [<Rn>{@<align>}]{!}
VLD4<c>.<size> <list>, [<Rn>{@<align>}], <Rm>

T1: 1 1 1 1 1 0 0 1 1 D 1 0 Rn | Vd size 1 1 index_align Rm
A1: 1 1 1 1 0 1 0 0 1 D 1 0 Rn Vd size 1 1 index_align Rm

if size == '11' then SEE VLD4 (single 4-element structure to all lanes);
case size of
    when '00'
        ebytes = 1; esize = 8; index = UInt(index_align<3:1>); inc = 1;
        alignment = if index_align<0> == '0' then 1 else 4;
    when '01'
        ebytes = 2; esize = 16; index = UInt(index_align<3:2>);
        inc = if index_align<1> == '0' then 1 else 2;
        alignment = if index_align<0> == '0' then 1 else 8;
    when '10'
        if index_align<1:0> == '11' then UNDEFINED;
        ebytes = 4; esize = 32; index = UInt(index_align<3>);
        inc = if index_align<2> == '0' then 1 else 2;
        alignment = if index_align<1:0> == '00' then 1 else 4 << UInt(index_align<1:0>);
d = UInt(D:Vd); d2 = d + inc; d3 = d2 + inc; d4 = d3 + inc; n = UInt(Rn); m = UInt(Rm);
wback = (m != 15); register_index = (m != 15 && m != 13);
if d4 > 31 then UNPREDICTABLE;

Assembler syntax
VLD4<c><q>.<size> <list>, [<Rn>{@<align>}]          Rm = '1111'
VLD4<c><q>.<size> <list>, [<Rn>{@<align>}]!         Rm = '1101'
VLD4<c><q>.<size> <list>, [<Rn>{@<align>}], <Rm>    Rm = other values
where:
<c>, <q>        See Standard assembler syntax fields on page A8-7. An ARM VLD4 instruction must be unconditional.
<size>          The data size. It must be one of:
                  8    encoded as size = 0b00
                  16   encoded as size = 0b01
                  32   encoded as size = 0b10.
<list>          The registers containing the structure. Encoded with D:Vd = <Dd>. It must be one of:
                  {<Dd[x]>, <Dd+1[x]>, <Dd+2[x]>, <Dd+3[x]>}   single-spaced registers, see Table A8-8.
                  {<Dd[x]>, <Dd+2[x]>, <Dd+4[x]>, <Dd+6[x]>}   double-spaced registers, see Table A8-8. Not available if <size> == 8.
<Rn>            The base address for the access.
<align>         The alignment. It can be:
                  32        4-byte alignment, available only if <size> is 8.
                  64        8-byte alignment, available only if <size> is 16 or 32.
                  128       16-byte alignment, available only if <size> is 32.
                  omitted   Standard alignment, see Unaligned data access on page A3-5.
!               If present, specifies writeback.
<Rm>            Contains an address offset applied after the access.
For more information about <Rn>, !, and <Rm>, see Advanced SIMD addressing mode on page A7-30.

Table A8-8 Encoding of index, alignment, and register spacing
                  <size> == 8            <size> == 16           <size> == 32
Index             index_align[3:1] = x   index_align[3:2] = x   index_align[3] = x
Single-spacing    -                      index_align[1] = 0     index_align[2] = 0
Double-spacing    -                      index_align[1] = 1     index_align[2] = 1
<align> omitted   index_align[0] = 0     index_align[0] = 0     index_align[1:0] = '00'
<align> == 32     index_align[0] = 1     -                      -
<align> == 64     -                      index_align[0] = 1     index_align[1:0] = '01'
<align> == 128    -                      -                      index_align[1:0] = '10'

Operation
if ConditionPassed() then
    EncodingSpecificOperations(); CheckAdvSIMDEnabled(); NullCheckIfThumbEE(n);
    address = R[n];
    if (address MOD alignment) != 0 then GenerateAlignmentException();
    if wback then R[n] = R[n] + (if register_index then R[m] else 4*ebytes);
    Elem[D[d],index,esize] = MemU[address,ebytes];
    Elem[D[d2],index,esize] = MemU[address+ebytes,ebytes];
    Elem[D[d3],index,esize] = MemU[address+2*ebytes,ebytes];
    Elem[D[d4],index,esize] = MemU[address+3*ebytes,ebytes];

Exceptions
Undefined Instruction, Data Abort.

A8.6.318 VLD4 (single 4-element structure to all lanes)
This instruction loads one 4-element structure from memory into all lanes of four registers. For details of the addressing mode see Advanced SIMD addressing mode on page A7-30.

Encoding T1 / A1   Advanced SIMD
VLD4<c>.<size> <list>, [<Rn>{@<align>}]{!}
VLD4<c>.<size> <list>, [<Rn>{@<align>}], <Rm>

T1: 1 1 1 1 1 0 0 1 1 D 1 0 Rn | Vd 1 1 1 1 size T a Rm
A1: 1 1 1 1 0 1 0 0 1 D 1 0 Rn Vd 1 1 1 1 size T a Rm

if size == '11' && a == '0' then UNDEFINED;
if size == '11' then
    ebytes = 4; elements = 2; alignment = 16;
else
    ebytes = 1 << UInt(size); elements = 8 DIV ebytes;
    if size == '10' then
        alignment = if a == '0' then 1 else 8;
    else
        alignment = if a == '0' then 1 else 4*ebytes;
inc = if T == '0' then 1 else 2;
d = UInt(D:Vd); d2 = d + inc; d3 = d2 + inc; d4 = d3 + inc; n = UInt(Rn); m = UInt(Rm);
wback = (m != 15); register_index = (m != 15 && m != 13);
if d4 > 31 then UNPREDICTABLE;

Assembler syntax
VLD4<c><q>.<size> <list>, [<Rn>{@<align>}]          Rm = '1111'
VLD4<c><q>.<size> <list>, [<Rn>{@<align>}]!         Rm = '1101'
VLD4<c><q>.<size> <list>, [<Rn>{@<align>}], <Rm>    Rm = other values
where:
<c>, <q>        See Standard assembler syntax fields on page A8-7. An ARM VLD4 instruction must be unconditional.
<size>          The data size. It must be one of:
                  8    encoded as size = 0b00
                  16   encoded as size = 0b01
                  32   encoded as size = 0b10 (or 0b11 for 16-byte alignment).
<list>          The registers containing the structures. It must be one of:
                  {<Dd[]>, <Dd+1[]>, <Dd+2[]>, <Dd+3[]>}   single-spaced registers, encoded as D:Vd = <Dd>, T = 0
                  {<Dd[]>, <Dd+2[]>, <Dd+4[]>, <Dd+6[]>}   double-spaced register transfer, encoded as D:Vd = <Dd>, T = 1.
<Rn>            The base address for the access.
<align>         The alignment. It can be one of:
                  32        4-byte alignment, available only if <size> is 8, encoded as a = 1.
                  64        8-byte alignment, available only if <size> is 16 or 32, encoded as a = 1.
                  128       16-byte alignment, available only if <size> is 32, encoded as a = 1, size = 0b11.
                  omitted   Standard alignment, see Unaligned data access on page A3-5. Encoded as a = 0.
!               If present, specifies writeback.
<Rm>            Contains an address offset applied after the access.
For more information about <Rn>, !, and <Rm>, see Advanced SIMD addressing mode on page A7-30.

Operation
if ConditionPassed() then
    EncodingSpecificOperations(); CheckAdvSIMDEnabled(); NullCheckIfThumbEE(n);
    address = R[n];
    if (address MOD alignment) != 0 then GenerateAlignmentException();
    if wback then R[n] = R[n] + (if register_index then R[m] else 4*ebytes);
    D[d] = Replicate(MemU[address,ebytes], elements);
    D[d2] = Replicate(MemU[address+ebytes,ebytes], elements);
    D[d3] = Replicate(MemU[address+2*ebytes,ebytes], elements);
    D[d4] = Replicate(MemU[address+3*ebytes,ebytes], elements);

Exceptions
Undefined Instruction, Data Abort.

A8.6.319 VLDM
Vector Load Multiple loads multiple extension registers from consecutive memory locations using an address from an ARM core register.

Encoding T1 / A1   VFPv2, VFPv3, Advanced SIMD
VLDM{mode}<c> <Rn>{!}, <list>      <list> is consecutive 64-bit registers

T1: 1 1 1 0 1 1 0 P U D W 1 Rn | Vd 1 0 1 1 imm8
A1: cond 1 1 0 P U D W 1 Rn Vd 1 0 1 1 imm8

if P == '0' && U == '0' && W == '0' then SEE "Related encodings";
if P == '0' && U == '1' && W == '1' && Rn == '1101' then SEE VPOP;
if P == '1' && W == '0' then SEE VLDR;
if P == U && W == '1' then UNDEFINED;
// Remaining combinations are PUW = 010 (IA without !), 011 (IA with !), 101 (DB with !)
single_regs = FALSE; add = (U == '1'); wback = (W == '1');
d = UInt(D:Vd); n = UInt(Rn); imm32 = ZeroExtend(imm8:'00', 32);
regs = UInt(imm8) DIV 2;  // If UInt(imm8) is odd, see "FLDMX".
if n == 15 && (wback || CurrentInstrSet() != InstrSet_ARM) then UNPREDICTABLE;
if regs == 0 || regs > 16 || (d+regs) > 32 then UNPREDICTABLE;

Encoding T2 / A2   VFPv2, VFPv3
VLDM{mode}<c> <Rn>{!}, <list>      <list> is consecutive 32-bit registers

T2: 1 1 1 0 1 1 0 P U D W 1 Rn | Vd 1 0 1 0 imm8
A2: cond 1 1 0 P U D W 1 Rn Vd 1 0 1 0 imm8

if P == '0' && U == '0' && W == '0' then SEE "Related encodings";
if P == '0' && U == '1' && W == '1' && Rn == '1101' then SEE VPOP;
if P == '1' && W == '0' then SEE VLDR;
if P == U && W == '1' then UNDEFINED;
// Remaining combinations are PUW = 010 (IA without !), 011 (IA with !), 101 (DB with !)
single_regs = TRUE; add = (U == '1'); wback = (W == '1');
d = UInt(Vd:D); n = UInt(Rn); imm32 = ZeroExtend(imm8:'00', 32);
regs = UInt(imm8);
if n == 15 && (wback || CurrentInstrSet() != InstrSet_ARM) then UNPREDICTABLE;
if regs == 0 || (d+regs) > 32 then UNPREDICTABLE;

Related encodings
See 64-bit transfers between ARM core and extension registers on page A7-32.

FLDMX
Encoding T1 / A1 behaves as described by the pseudocode if imm8 is odd. However, there is no UAL syntax for such encodings and their use is deprecated. For more information, see FLDMX, FSTMX on page A8-101.

Assembler syntax
VLDM{<mode>}<c><q>{.<size>} <Rn>{!}, <list>
where:
<mode>          The addressing mode:
                  IA   Increment After. The consecutive addresses start at the address specified in <Rn>. This is the default and can be omitted. Encoded as P = 0, U = 1.
                  DB   Decrement Before. The consecutive addresses end just before the address specified in <Rn>. Encoded as P = 1, U = 0.
<c>, <q>        See Standard assembler syntax fields on page A8-7.
<size>          An optional data size specifier. If present, it must be equal to the size in bits, 32 or 64, of the registers in <list>.
<Rn>            The base register. The SP can be used. In the ARM instruction set, if ! is not specified the PC can be used.
!               Causes the instruction to write a modified value back to <Rn>. This is required if <mode> == DB, and is optional if <mode> == IA. Encoded as W = 1.
                If ! is omitted, the instruction does not change <Rn> in this way. Encoded as W = 0.
<list>          The extension registers to be loaded, as a list of consecutively numbered doubleword (encoding T1 / A1) or singleword (encoding T2 / A2) registers, separated by commas and surrounded by brackets. It is encoded in the instruction by setting D and Vd to specify the first register in the list, and imm8 to twice the number of registers in the list (encoding T1 / A1) or the number of registers in the list (encoding T2 / A2). <list> must contain at least one register. If it contains doubleword registers it must not contain more than 16 registers.

Operation
if ConditionPassed() then
    EncodingSpecificOperations(); CheckVFPEnabled(TRUE); NullCheckIfThumbEE(n);
    address = if add then R[n] else R[n]-imm32;
    if wback then R[n] = if add then R[n]+imm32 else R[n]-imm32;
    for r = 0 to regs-1
        if single_regs then
            S[d+r] = MemA[address,4]; address = address+4;
        else
            word1 = MemA[address,4]; word2 = MemA[address+4,4]; address = address+8;
            // Combine the word-aligned words in the correct order for current endianness.
            D[d+r] = if BigEndian() then word1:word2 else word2:word1;

Exceptions
Undefined Instruction, Data Abort.

A8.6.320 VLDR
This instruction loads a single extension register from memory, using an address from an ARM core register, with an optional offset.

Encoding T1 / A1   VFPv2, VFPv3, Advanced SIMD
VLDR<c> <Dd>, [<Rn>{, #+/-<imm>}]
VLDR<c> <Dd>, <label>
VLDR<c> <Dd>, [PC,#-0]

T1: 1 1 1 0 1 1 0 1 U D 0 1 Rn | Vd 1 0 1 1 imm8
A1: cond 1 1 0 1 U D 0 1 Rn Vd 1 0 1 1 imm8

single_reg = FALSE; add = (U == '1'); imm32 = ZeroExtend(imm8:'00', 32);
d = UInt(D:Vd); n = UInt(Rn);

Encoding T2 / A2   VFPv2, VFPv3
VLDR<c> <Sd>, [<Rn>{, #+/-<imm>}]
VLDR<c> <Sd>, <label>

Assembler syntax
VLDR<c><q>{.64} <Dd>, [<Rn> {, #+/-<imm>}]
VLDR<c><q>{.64} <Dd>, <label>
VLDR<c><q>{.64} <Dd>, [PC, #+/-<imm>]
VLDR<c><q>{.32} <Sd>, [<Rn> {, #+/-<imm>}]
VLDR<c><q>{.32} <Sd>, <label>
where:
<Dd>            The destination register for a doubleword load.
<Sd>            The destination register for a singleword load.
<Rn>            The base register. The SP can be used.
+/-             Is + or omitted if the immediate offset is to be added to the base register value (add == TRUE), or - if it is to be subtracted (add == FALSE). #0 and #-0 generate different instructions.
<imm>           The immediate offset used to form the address. For the immediate forms of the syntax, <imm> can be omitted, in which case the #0 form of the instruction is assembled. Permitted values are multiples of 4 in the range 0 to 1020.
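The offset rules above can be modeled in a few lines: the offset is a multiple of 4 from 0 to 1020, U carries the sign (add = (U == '1')), and the T1 / A1 decode recovers the magnitude as imm32 = ZeroExtend(imm8:'00', 32). The following is a minimal Python sketch, not part of the architecture. The helper names encode_vldr_offset, decode_vldr_offset and vldr_address are hypothetical. The sketch assumes the base is an ordinary 32-bit register value, so it does not model the PC-relative literal forms.

```python
from typing import Tuple


def encode_vldr_offset(offset: int, negative_zero: bool = False) -> Tuple[int, int]:
    """Hypothetical helper: return (U, imm8) for a VLDR immediate offset."""
    if offset % 4 != 0 or not -1020 <= offset <= 1020:
        raise ValueError("offset must be a multiple of 4 in the range -1020 to +1020")
    if offset == 0 and negative_zero:
        return 0, 0                      # #-0: subtract form with a zero offset
    u = 0 if offset < 0 else 1           # U = 1 means add, U = 0 means subtract
    return u, abs(offset) >> 2           # imm8 holds the offset divided by 4


def decode_vldr_offset(u: int, imm8: int) -> Tuple[bool, int]:
    """Mirror the decode: add = (U == '1'); imm32 = ZeroExtend(imm8:'00', 32)."""
    add = (u == 1)
    imm32 = (imm8 & 0xFF) << 2           # appending '00' multiplies by 4
    return add, imm32


def vldr_address(base: int, u: int, imm8: int) -> int:
    """Hypothetical helper: base +/- imm32, wrapped to 32 bits (PC case not modeled)."""
    add, imm32 = decode_vldr_offset(u, imm8)
    return (base + imm32 if add else base - imm32) & 0xFFFFFFFF
```

For example, an offset of #-8 encodes as U = 0, imm8 = 2, and decoding that pair with a base of 0x1000 gives the address 0xFF8. The negative_zero flag exists only because the text states that #0 and #-0 assemble to different instructions.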
V<op><c>.<dt> <Qd>, <Qn>, <Qm>
V<op><c>.<dt> <Dd>, <Dn>, <Dm>

T1: 1 1 1 U 1 1 1 1 0 D size Vn | Vd 0 1 1 0 N Q M op Vm
A1: 1 1 1 1 0 0 1 U 0 D size Vn Vd 0 1 1 0 N Q M op Vm

if Q == '1' && (Vd<0> == '1' || Vn<0> == '1' || Vm<0> == '1') then UNDEFINED;
if size == '11' then UNDEFINED;
maximum = (op == '0'); unsigned = (U == '1');
esize = 8 << UInt(size); elements = 64 DIV esize;
d = UInt(D:Vd); n = UInt(N:Vn); m = UInt(M:Vm); regs = if Q == '0' then 1 else 2;

Assembler syntax
V<op><c><q>.<dt> {<Qd>,} <Qn>, <Qm>    Encoded as Q = 1
V<op><c><q>.<dt> {<Dd>,} <Dn>, <Dm>    Encoded as Q = 0
where:
<op>            Must be one of:
                  MAX   encoded as op = 0
                  MIN   encoded as op = 1.
<c>, <q>        See Standard assembler syntax fields on page A8-7. An ARM VMAX or VMIN instruction must be unconditional.
<dt>            The data types for the elements of the vectors. It must be one of:
                  S8    size = 0b00, U = 0
                  S16   size = 0b01, U = 0
                  S32   size = 0b10, U = 0
                  U8    size = 0b00, U = 1
                  U16   size = 0b01, U = 1
                  U32   size = 0b10, U = 1.
<Qd>, <Qn>, <Qm>   The destination vector and the operand vectors, for a quadword operation.
<Dd>, <Dn>, <Dm>   The destination vector and the operand vectors, for a doubleword operation.

Operation
if ConditionPassed() then
    EncodingSpecificOperations(); CheckAdvSIMDEnabled();
    for r = 0 to regs-1
        for e = 0 to elements-1
            op1 = Int(Elem[D[n+r],e,esize], unsigned);
            op2 = Int(Elem[D[m+r],e,esize], unsigned);
            result = if maximum then Max(op1,op2) else Min(op1,op2);
            Elem[D[d+r],e,esize] = result;

Exceptions
Undefined Instruction.

A8.6.322 VMAX, VMIN (floating-point)
Vector Maximum compares corresponding elements in two vectors, and copies the larger of each pair into the corresponding element in the destination vector. Vector Minimum compares corresponding elements in two vectors, and copies the smaller of each pair into the corresponding element in the destination vector. The operand vector elements are 32-bit floating-point numbers.

Encoding T1 / A1   Advanced SIMD (UNDEFINED in integer-only variant)
V<op><c>.F32 <Qd>, <Qn>, <Qm>
V<op><c>.F32 <Dd>, <Dn>, <Dm>

T1: 1 1 1 0 1 1 1 1 0 D op sz Vn | Vd 1 1 1 1 N Q M 0 Vm
A1: 1 1 1 1 0 0 1 0 0 D op sz Vn Vd 1 1 1 1 N Q M 0 Vm

if Q == '1' && (Vd<0> == '1' || Vn<0> == '1' || Vm<0> == '1') then UNDEFINED;
if sz == '1' then UNDEFINED;
maximum = (op == '0'); esize = 32; elements = 2;
d = UInt(D:Vd); n = UInt(N:Vn); m = UInt(M:Vm); regs = if Q == '0' then 1 else 2;

Assembler syntax
V<op><c><q>.F32 {<Qd>,} <Qn>, <Qm>    Encoded as Q = 1
V<op><c><q>.F32 {<Dd>,} <Dn>, <Dm>    Encoded as Q = 0
where:
<op>            Must be one of:
                  MAX   encoded as op = 0
                  MIN   encoded as op = 1.
<c>, <q>        See Standard assembler syntax fields on page A8-7. An ARM VMAX or VMIN instruction must be unconditional.
<Qd>, <Qn>, <Qm>   The destination vector and the operand vectors, for a quadword operation.
<Dd>, <Dn>, <Dm>   The destination vector and the operand vectors, for a doubleword operation.

Operation
if ConditionPassed() then
    EncodingSpecificOperations(); CheckAdvSIMDEnabled();
    for r = 0 to regs-1
        for e = 0 to elements-1
            op1 = Elem[D[n+r],e,esize]; op2 = Elem[D[m+r],e,esize];
            Elem[D[d+r],e,esize] = if maximum then FPMax(op1,op2,FALSE) else FPMin(op1,op2,FALSE);

Exceptions
Undefined Instruction.

Floating-point maximum and minimum
  • max(+0.0, -0.0) = +0.0
  • min(+0.0, -0.0) = -0.0
  • If any input is a NaN, the corresponding result element is the default NaN.

A8.6.323 VMLA, VMLAL, VMLS, VMLSL (integer)
Vector Multiply Accumulate and Vector Multiply Subtract multiply corresponding elements in two vectors, and either add the products to, or subtract them from, the corresponding elements of the destination vector. Vector Multiply Accumulate Long and Vector Multiply Subtract Long do the same thing, but with destination vector elements that are twice as long as the elements that are multiplied.

Encoding T1 / A1   Advanced SIMD
V<op><c>.<dt> <Qd>, <Qn>, <Qm>
V<op><c>.<dt> <Dd>, <Dn>, <Dm>

T1: 1 1 1 op 1 1 1 1 0 D size Vn | Vd 1 0 0 1 N Q M 0 Vm
A1: 1 1 1 1 0 0 1 op 0 D size Vn Vd 1 0 0 1 N Q M 0 Vm

if size == '11' then UNDEFINED;
if Q == '1' && (Vd<0> == '1' || Vn<0> == '1' || Vm<0> == '1') then UNDEFINED;
add = (op == '0'); long_destination = FALSE;
unsigned = FALSE;  // "Don't care" value: TRUE produces same functionality
esize = 8 << UInt(size); elements = 64 DIV esize;
d = UInt(D:Vd); n = UInt(N:Vn); m = UInt(M:Vm); regs = if Q == '0' then 1 else 2;

Encoding T2 / A2   Advanced SIMD
V<op>L<c>.<dt> <Qd>, <Dn>, <Dm>

T2: 1 1 1 U 1 1 1 1 1 D size Vn | Vd 1 0 op 0 N 0 M 0 Vm
A2: 1 1 1 1 0 0 1 U 1 D size Vn Vd 1 0 op 0 N 0 M 0 Vm

if size == '11' then SEE "Related encodings";
if Vd<0> == '1' then UNDEFINED;
add = (op == '0'); long_destination = TRUE; unsigned = (U == '1');
esize = 8 << UInt(size); elements = 64 DIV esize;
d = UInt(D:Vd); n = UInt(N:Vn); m = UInt(M:Vm); regs = 1;

Related encodings
See Advanced SIMD data-processing instructions on page A7-10.

Assembler syntax
V<op><c><q>.<type><size> <Qd>, <Qn>, <Qm>     Encoding T1 / A1, Q = 1
V<op><c><q>.<type><size> <Dd>, <Dn>, <Dm>     Encoding T1 / A1, Q = 0
V<op>L<c><q>.<type><size> <Qd>, <Dn>, <Dm>    Encoding T2 / A2
where:
<op>            Must be either MLA (op = 0) or MLS (op = 1).
<c>, <q>        See Standard assembler syntax fields on page A8-7. An ARM Advanced SIMD VMLA, VMLAL, VMLS, or VMLSL instruction must be unconditional.
<type>          The data type for the elements of the operands. It must be one of:
                  S   Optional in encoding T1 / A1. U = 0 in encoding T2 / A2.
                  U   Optional in encoding T1 / A1. U = 1 in encoding T2 / A2.
                  I   Available only in encoding T1 / A1.
<size>          The data size for the elements of the operands. It must be one of:
                  8    encoded as size = 0b00
                  16   encoded as size = 0b01
                  32   encoded as size = 0b10.
<Qd>, <Qn>, <Qm>   The destination vector and the operand vectors, for a quadword operation.
<Dd>, <Dn>, <Dm>   The destination vector and the operand vectors, for a doubleword operation.
<Qd>, <Dn>, <Dm>   The destination vector and the operand vectors, for a long operation.

Operation
if ConditionPassed() then
    EncodingSpecificOperations(); CheckAdvSIMDEnabled();
    for r = 0 to regs-1
        for e = 0 to elements-1
            product = Int(Elem[D[n+r],e,esize],unsigned) * Int(Elem[D[m+r],e,esize],unsigned);
            addend = if add then product else -product;
            if long_destination then
                Elem[Q[d>>1],e,2*esize] = Elem[Q[d>>1],e,2*esize] + addend;
            else
                Elem[D[d+r],e,esize] = Elem[D[d+r],e,esize] + addend;

Exceptions
Undefined Instruction.

A8.6.324 VMLA, VMLS (floating-point)
Vector Multiply Accumulate multiplies corresponding elements in two vectors, and accumulates the results into the elements of the destination vector. Vector Multiply Subtract multiplies corresponding elements in two vectors, subtracts the products from corresponding elements of the destination vector, and places the results in the destination vector.

Encoding T1 / A1   Advanced SIMD (UNDEFINED in integer-only variant)
V<op><c>.F32 <Qd>, <Qn>, <Qm>
, , 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 1 1 0 1 1 1 1 0 D op sz 1 0 15 14 13 12 11 10 9 8 Vn Vd 31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 1 1 1 1 0 0 1 0 0 D op sz Vn Vd 7 6 5 4 3 2 1 1 0 1 N Q M 1 7 6 5 4 3 2 1 1 0 1 N Q M 1 1 0 Vm 1 0 Vm if Q == ‘1’ && (Vd<0> == ‘1’ || Vn<0> == ‘1’ || Vm<0> == ‘1’) then UNDEFINED; if sz == ‘1’ then UNDEFINED; advsimd = TRUE; add = (op == ‘0’); esize = 32; elements = 2; d = UInt(D:Vd); n = UInt(N:Vn); m = UInt(M:Vm); regs = if Q == ‘0’ then 1 else 2; Encoding T2 / A2 VFPv2, VFPv3 (sz = 1 UNDEFINED in single-precision only variants) V.F64
, , V.F32 , , 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 1 1 0 1 1 1 0 0 D 0 0 1 0 15 14 13 12 11 10 9 8 Vn Vd 31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 cond 1 1 1 0 0 D 0 0 Vn Vd 7 6 5 4 3 2 1 0 1 sz N op M 0 7 6 5 4 3 2 1 0 1 sz N op M 0 1 0 Vm 1 0 Vm if FPSCR.LEN != ‘000’ || FPSCR.STRIDE != ‘00’ then SEE “VFP vectors”; advsimd = FALSE; dp_operation = (sz == ‘1’); add = (op == ‘0’); d = if dp_operation then UInt(D:Vd) else UInt(Vd:D); n = if dp_operation then UInt(N:Vn) else UInt(Vn:N); m = if dp_operation then UInt(M:Vm) else UInt(Vm:M); VFP vectors A8-636 Encoding T2 / A2 can operate on VFP vectors under control of the FPSCR.LEN and FPSCR.STRIDE bits. For details see Appendix F VFP Vector Operation Support. Copyright © 1996-1998, 2000, 2004-2008 ARM Limited. All rights reserved. ARM DDI 0406B Instruction Details Assembler syntax V.F32 , , V.F32
, , V.F64
, , V.F32 , , Encoding T1 / A1, Q = 1, sz = 0 Encoding T1 / A1, Q = 0, sz = 0 Encoding T2 / A2, sz = 1 Encoding T2 / A2, sz = 0 where: Must be either MLA (op = 0) or MLS (op = 1). See Standard assembler syntax fields on page A8-7. An ARM Advanced SIMD VMLA or VMLS instruction must be unconditional. , , The destination vector and the operand vectors, for a quadword operation.
<Dd>, <Dn>, <Dm>  The destination vector and the operand vectors, for a doubleword operation.
<Sd>, <Sn>, <Sm>  The destination vector and the operand vectors, for a singleword operation.
Operation
if ConditionPassed() then
    EncodingSpecificOperations();  CheckAdvSIMDOrVFPEnabled(TRUE, advsimd);
    if advsimd then  // Advanced SIMD instruction
        for r = 0 to regs-1
            for e = 0 to elements-1
                product = FPMul(Elem[D[n+r],e,esize], Elem[D[m+r],e,esize], FALSE);
                addend = if add then product else FPNeg(product);
                Elem[D[d+r],e,esize] = FPAdd(Elem[D[d+r],e,esize], addend, FALSE);
    else             // VFP instruction
        if dp_operation then
            addend = if add then FPMul(D[n], D[m], TRUE) else FPNeg(FPMul(D[n], D[m], TRUE));
            D[d] = FPAdd(D[d], addend, TRUE);
        else
            addend = if add then FPMul(S[n], S[m], TRUE) else FPNeg(FPMul(S[n], S[m], TRUE));
            S[d] = FPAdd(S[d], addend, TRUE);
Exceptions
Undefined Instruction.
Floating-point exceptions: Input Denormal, Invalid Operation, Overflow, Underflow, and Inexact.

A8.6.325 VMLA, VMLAL, VMLS, VMLSL (by scalar)
Vector Multiply Accumulate and Vector Multiply Subtract multiply elements of a vector by a scalar, and either add the products to, or subtract them from, corresponding elements of the destination vector. Vector Multiply Accumulate Long and Vector Multiply Subtract Long do the same thing, but with destination vector elements that are twice as long as the elements that are multiplied.
For more information about scalars see Advanced SIMD scalars on page A7-9.
Encoding T1 / A1  Advanced SIMD (F = 1 UNDEFINED in integer-only variants)
V<op><c>.<dt> <Qd>, <Qn>, <Dm[x]>
V<op><c>.<dt> <Dd>, <Dn>, <Dm[x]>
Encoding T1: 1 1 1 Q 1 1 1 1 1 D size Vn | Vd 0 op 0 F N 1 M 0 Vm
Encoding A1: 1 1 1 1 0 0 1 Q 1 D size Vn Vd 0 op 0 F N 1 M 0 Vm
if size == ‘11’ then SEE “Related encodings”;
if size == ‘00’ || (F == ‘1’ && size == ‘01’) then UNDEFINED;
if Q == ‘1’ && (Vd<0> == ‘1’ || Vn<0> == ‘1’) then UNDEFINED;
unsigned = FALSE;  // “Don’t care” value: TRUE produces same functionality
add = (op == ‘0’);  floating_point = (F == ‘1’);  long_destination = FALSE;
d = UInt(D:Vd);  n = UInt(N:Vn);  regs = if Q == ‘0’ then 1 else 2;
if size == ‘01’ then esize = 16;  elements = 4;  m = UInt(Vm<2:0>);  index = UInt(M:Vm<3>);
if size == ‘10’ then esize = 32;  elements = 2;  m = UInt(Vm);  index = UInt(M);
Encoding T2 / A2  Advanced SIMD
V<op>L<c>.<dt> <Qd>, <Dn>, <Dm[x]>
Encoding T2: 1 1 1 U 1 1 1 1 1 D size Vn | Vd 0 op 1 0 N 1 M 0 Vm
Encoding A2: 1 1 1 1 0 0 1 U 1 D size Vn Vd 0 op 1 0 N 1 M 0 Vm
if size == ‘11’ then SEE “Related encodings”;
if size == ‘00’ || Vd<0> == ‘1’ then UNDEFINED;
unsigned = (U == ‘1’);  add = (op == ‘0’);  floating_point = FALSE;  long_destination = TRUE;
d = UInt(D:Vd);  n = UInt(N:Vn);  regs = 1;
if size == ‘01’ then esize = 16;  elements = 4;  m = UInt(Vm<2:0>);  index = UInt(M:Vm<3>);
if size == ‘10’ then esize = 32;  elements = 2;  m = UInt(Vm);  index = UInt(M);
Related encodings  See Advanced SIMD data-processing instructions on page A7-10.
Assembler syntax
V<op><c>.<type><size> <Qd>, <Qn>, <Dm[x]>     Encoding T1 / A1, Q = 1
V<op><c>.<type><size> <Dd>, <Dn>, <Dm[x]>     Encoding T1 / A1, Q = 0
V<op>L<c>.<type><size> <Qd>, <Dn>, <Dm[x]>    Encoding T2 / A2
where:
<op>    Must be either MLA (encoded as op = 0) or MLS (encoded as op = 1).
<c>     See Standard assembler syntax fields on page A8-7. An ARM Advanced SIMD VMLA, VMLAL, VMLS, or VMLSL instruction must be unconditional.
<type>  The data type for the elements of the operands. It must be one of:
        S   encoding T2 / A2, U = ‘0’.
        U   encoding T2 / A2, U = ‘1’.
        I   encoding T1 / A1, F = ‘0’.
        F   encoding T1 / A1, F = ‘1’. <size> must be 32.
<size>  The operand element data size. It can be 16 (size = ‘01’) or 32 (size = ‘10’).
<Qd>, <Qn>  The accumulate vector, and the operand vector, for a quadword operation.
<Dd>, <Dn>  The accumulate vector, and the operand vector, for a doubleword operation.
<Qd>, <Dn>  The accumulate vector, and the operand vector, for a long operation.
<Dm[x]>     The scalar. Dm is restricted to D0-D7 if <size> is 16, or D0-D15 otherwise.
Operation
if ConditionPassed() then
    EncodingSpecificOperations();  CheckAdvSIMDEnabled();
    op2 = Elem[D[m],index,esize];  op2val = Int(op2, unsigned);
    for r = 0 to regs-1
        for e = 0 to elements-1
            op1 = Elem[D[n+r],e,esize];  op1val = Int(op1, unsigned);
            if floating_point then
                fp_addend = if add then FPMul(op1,op2,FALSE) else FPNeg(FPMul(op1,op2,FALSE));
                Elem[D[d+r],e,esize] = FPAdd(Elem[D[d+r],e,esize], fp_addend, FALSE);
            else
                addend = if add then op1val*op2val else -op1val*op2val;
                if long_destination then
                    Elem[Q[d>>1],e,2*esize] = Elem[Q[d>>1],e,2*esize] + addend;
                else
                    Elem[D[d+r],e,esize] = Elem[D[d+r],e,esize] + addend;
Exceptions
Undefined Instruction.
Floating-point exceptions: Input Denormal, Invalid Operation, Overflow, Underflow, and Inexact.

A8.6.326 VMOV (immediate)
This instruction places an immediate constant into every element of the destination register.
Encoding T1 / A1  Advanced SIMD
VMOV<c>.<dt> <Qd>, #<imm>
VMOV<c>.<dt> <Dd>, #<imm>
Encoding T1: 1 1 1 i 1 1 1 1 1 D 0 0 0 imm3 | Vd cmode 0 Q op 1 imm4
Encoding A1: 1 1 1 1 0 0 1 i 1 D 0 0 0 imm3 Vd cmode 0 Q op 1 imm4
if op == ‘0’ && cmode<0> == ‘1’ && cmode<3:2> != ‘11’ then SEE VORR (immediate);
if op == ‘1’ && cmode != ‘1110’ then SEE “Related encodings”;
if Q == ‘1’ && Vd<0> == ‘1’ then UNDEFINED;
single_register = FALSE;  advsimd = TRUE;
imm64 = AdvSIMDExpandImm(op, cmode, i:imm3:imm4);
d = UInt(D:Vd);  regs = if Q == ‘0’ then 1 else 2;
Encoding T2 / A2  VFPv3 (sz = 1 UNDEFINED in single-precision only variants)
VMOV<c>.F64 <Dd>, #<imm>
VMOV<c>.F32 <Sd>, #<imm>
Encoding T2: 1 1 1 0 1 1 1 0 1 D 1 1 imm4H | Vd 1 0 1 sz (0) 0 (0) 0 imm4L
Encoding A2: cond 1 1 1 0 1 D 1 1 imm4H Vd 1 0 1 sz (0) 0 (0) 0 imm4L
if FPSCR.LEN != ‘000’ || FPSCR.STRIDE != ‘00’ then SEE “VFP vectors”;
single_register = (sz == ‘0’);  advsimd = FALSE;
if single_register then
    d = UInt(Vd:D);  imm32 = VFPExpandImm(imm4H:imm4L, 32);
else
    d = UInt(D:Vd);  imm64 = VFPExpandImm(imm4H:imm4L, 64);  regs = 1;
Related encodings  See One register and a modified immediate value on page A7-21.
VFP vectors
Encoding T2 / A2 can operate on VFP vectors under control of the FPSCR.LEN and FPSCR.STRIDE bits. For details see Appendix F VFP Vector Operation Support.
Assembler syntax
VMOV<c>.<dt> <Qd>, #<imm>    Encoding T1 / A1, Q = 1
VMOV<c>.<dt> <Dd>, #<imm>    Encoding T1 / A1, Q = 0
VMOV<c>.F64 <Dd>, #<imm>     Encoding T2 / A2, sz = 1
VMOV<c>.F32 <Sd>, #<imm>     Encoding T2 / A2, sz = 0
where:
<c>   See Standard assembler syntax fields on page A8-7. An ARM Advanced SIMD VMOV (immediate) instruction must be unconditional.
<dt>  The data type. It must be one of I8, I16, I32, I64, or F32.
<Qd>  The destination register for a quadword operation.
<Dd>  The destination register for a doubleword operation.
<Sd>  The destination register for a singleword operation.
<imm> A constant of the type specified by <dt>. This constant is replicated enough times to fill the destination register. For example, VMOV.I32 D0,#10 writes 0x0000000A0000000A to D0.
      For the range of constants available, and the encoding of <dt> and <imm>, see:
      • One register and a modified immediate value on page A7-21 for encoding T1 / A1
      • VFP data-processing instructions on page A7-24 for encoding T2 / A2.
Operation
if ConditionPassed() then
    EncodingSpecificOperations();  CheckAdvSIMDOrVFPEnabled(TRUE, advsimd);
    if single_register then
        S[d] = imm32;
    else
        for r = 0 to regs-1
            D[d+r] = imm64;
Exceptions
Undefined Instruction.
Pseudo-instructions
One register and a modified immediate value on page A7-21 describes pseudo-instructions with a combination of <dt> and <imm> that is not supported by hardware, but that generates the same destination register value as a different combination that is supported by hardware.

A8.6.327 VMOV (register)
This instruction copies the contents of one register to another.
Encoding T1 / A1  Advanced SIMD
VMOV<c> <Qd>, <Qm>
VMOV<c> <Dd>, <Dm>
Encoding T1: 1 1 1 0 1 1 1 1 0 D 1 0 Vm | Vd 0 0 0 1 M Q M 1 Vm
Encoding A1: 1 1 1 1 0 0 1 0 0 D 1 0 Vm Vd 0 0 0 1 M Q M 1 Vm
if !Consistent(M) || !Consistent(Vm) then SEE VORR (register);
if Q == ‘1’ && (Vd<0> == ‘1’ || Vm<0> == ‘1’) then UNDEFINED;
single_register = FALSE;  advsimd = TRUE;
d = UInt(D:Vd);  m = UInt(M:Vm);  regs = if Q == ‘0’ then 1 else 2;
Encoding T2 / A2  VFPv2, VFPv3 (sz = 1 UNDEFINED in single-precision only variants)
VMOV<c>.F64 <Dd>, <Dm>
VMOV<c>.F32 <Sd>, <Sm>
Encoding T2: 1 1 1 0 1 1 1 0 1 D 1 1 0 0 0 0 | Vd 1 0 1 sz 0 1 M 0 Vm
Encoding A2: cond 1 1 1 0 1 D 1 1 0 0 0 0 Vd 1 0 1 sz 0 1 M 0 Vm
if FPSCR.LEN != ‘000’ || FPSCR.STRIDE != ‘00’ then SEE “VFP vectors”;
single_register = (sz == ‘0’);  advsimd = FALSE;
if single_register then
    d = UInt(Vd:D);  m = UInt(Vm:M);
else
    d = UInt(D:Vd);  m = UInt(M:Vm);  regs = 1;
VFP vectors
Encoding T2 / A2 can operate on VFP vectors under control of the FPSCR.LEN and FPSCR.STRIDE bits. For details see Appendix F VFP Vector Operation Support.
Assembler syntax
VMOV<c>{.<dt>} <Qd>, <Qm>    Encoding T1 / A1, Q = 1
VMOV<c>{.<dt>} <Dd>, <Dm>    Encoding T1 / A1, Q = 0
VMOV<c>.F64 <Dd>, <Dm>       Encoding T2 / A2, sz = 1
VMOV<c>.F32 <Sd>, <Sm>       Encoding T2 / A2, sz = 0
where:
<c>   See Standard assembler syntax fields on page A8-7. An ARM Advanced SIMD VMOV (register) instruction must be unconditional.
<dt>  An optional data type. <dt> must not be F64, but it is otherwise ignored.
<Qd>, <Qm>  The destination register and the source register, for a quadword operation.
<Dd>, <Dm>  The destination register and the source register, for a doubleword operation.
<Sd>, <Sm>  The destination register and the source register, for a singleword operation.
Operation
if ConditionPassed() then
    EncodingSpecificOperations();  CheckAdvSIMDOrVFPEnabled(TRUE, advsimd);
    if single_register then
        S[d] = S[m];
    else
        for r = 0 to regs-1
            D[d+r] = D[m+r];
Exceptions
Undefined Instruction.

A8.6.328 VMOV (ARM core register to scalar)
This instruction copies a byte, halfword, or word from an ARM core register into an Advanced SIMD scalar.
On a VFP-only system, this instruction transfers one word to the upper or lower half of a double-precision floating-point register from an ARM core register. This is an identical operation to the Advanced SIMD single word transfer.
For more information about scalars see Advanced SIMD scalars on page A7-9.
Encoding T1 / A1  VFPv2, VFPv3, Advanced SIMD if opc1 == ‘0x’ && opc2 == ‘00’; Advanced SIMD otherwise
VMOV<c>.<size> <Dd[x]>, <Rt>
Encoding T1: 1 1 1 0 1 1 1 0 0 opc1 0 Vd | Rt 1 0 1 1 D opc2 1 (0) (0) (0) (0)
Encoding A1: cond 1 1 1 0 0 opc1 0 Vd Rt 1 0 1 1 D opc2 1 (0) (0) (0) (0)
case opc1:opc2 of
    when ‘1xxx’  advsimd = TRUE;  esize = 8;   index = UInt(opc1<0>:opc2);
    when ‘0xx1’  advsimd = TRUE;  esize = 16;  index = UInt(opc1<0>:opc2<1>);
    when ‘0x00’  advsimd = FALSE; esize = 32;  index = UInt(opc1<0>);
    when ‘0x10’  UNDEFINED;
d = UInt(D:Vd);  t = UInt(Rt);
if t == 15 || (CurrentInstrSet() != InstrSet_ARM && t == 13) then UNPREDICTABLE;
Assembler syntax
VMOV<c>{.<size>} <Dd[x]>, <Rt>
where:
<c>     See Standard assembler syntax fields on page A8-7.
<size>  The data size. It must be one of:
        8   Encoded as opc1<1> = 1. [x] is encoded in opc1<0>, opc2.
        16  Encoded as opc1<1>, opc2<0> = 0b01. [x] is encoded in opc1<0>, opc2<1>.
        32  Encoded as opc1<1>, opc2 = 0b000. [x] is encoded in opc1<0>.
        omitted  Equivalent to 32.
<Dd[x]> The scalar. The register <Dd> is encoded in D:Vd. For details of how [x] is encoded, see the description of <size>.
<Rt>    The source ARM core register.
Operation
if ConditionPassed() then
    EncodingSpecificOperations();  CheckAdvSIMDOrVFPEnabled(TRUE, advsimd);
    Elem[D[d],index,esize] = R[t]<esize-1:0>;
Exceptions
Undefined Instruction.

A8.6.329 VMOV (scalar to ARM core register)
This instruction copies a byte, halfword, or word from an Advanced SIMD scalar to an ARM core register. Bytes and halfwords can be either zero-extended or sign-extended.
On a VFP-only system, this instruction transfers one word from the upper or lower half of a double-precision floating-point register to an ARM core register. This is an identical operation to the Advanced SIMD single word transfer.
For more information about scalars see Advanced SIMD scalars on page A7-9.
Encoding T1 / A1  VFPv2, VFPv3, Advanced SIMD if opc1 == ‘0x’ && opc2 == ‘00’; Advanced SIMD otherwise
VMOV<c>.<dt> <Rt>, <Dn[x]>
Encoding T1: 1 1 1 0 1 1 1 0 U opc1 1 Vn | Rt 1 0 1 1 N opc2 1 (0) (0) (0) (0)
Encoding A1: cond 1 1 1 0 U opc1 1 Vn Rt 1 0 1 1 N opc2 1 (0) (0) (0) (0)
case U:opc1:opc2 of
    when ‘x1xxx’  advsimd = TRUE;  esize = 8;   index = UInt(opc1<0>:opc2);
    when ‘x0xx1’  advsimd = TRUE;  esize = 16;  index = UInt(opc1<0>:opc2<1>);
    when ‘00x00’  advsimd = FALSE; esize = 32;  index = UInt(opc1<0>);
    when ‘10x00’  UNDEFINED;
    when ‘x0x10’  UNDEFINED;
t = UInt(Rt);  n = UInt(N:Vn);  unsigned = (U == ‘1’);
if t == 15 || (CurrentInstrSet() != InstrSet_ARM && t == 13) then UNPREDICTABLE;
Assembler syntax
VMOV<c>{.<dt>} <Rt>, <Dn[x]>
where:
<c>   See Standard assembler syntax fields on page A8-7.
<dt>  The data type. It must be one of:
      S8   Encoded as U = ‘0’, opc1<1> = ‘1’. [x] is encoded in opc1<0>, opc2.
      S16  Encoded as U = ‘0’, opc1<1> = ‘0’, opc2<0> = ‘1’. [x] is encoded in opc1<0>, opc2<1>.
      U8   Encoded as U = ‘1’, opc1<1> = ‘1’. [x] is encoded in opc1<0>, opc2.
      U16  Encoded as U = ‘1’, opc1<1> = ‘0’, opc2<0> = ‘1’. [x] is encoded in opc1<0>, opc2<1>.
      32   Encoded as U = ‘0’, opc1<1> = ‘0’, opc2 = ‘00’. [x] is encoded in opc1<0>.
      omitted  Equivalent to 32.
<Dn[x]> The scalar. For details of how [x] is encoded see the description of <dt>.
<Rt>  The destination ARM core register.
Operation
if ConditionPassed() then
    EncodingSpecificOperations();  CheckAdvSIMDOrVFPEnabled(TRUE, advsimd);
    if unsigned then
        R[t] = ZeroExtend(Elem[D[n],index,esize], 32);
    else
        R[t] = SignExtend(Elem[D[n],index,esize], 32);
Exceptions
Undefined Instruction.

A8.6.330 VMOV (between ARM core register and single-precision register)
This instruction transfers the contents of a single-precision VFP register to an ARM core register, or the contents of an ARM core register to a single-precision VFP register.
Encoding T1 / A1  VFPv2, VFPv3
VMOV<c> <Sn>, <Rt>
VMOV<c> <Rt>, <Sn>
Encoding T1: 1 1 1 0 1 1 1 0 0 0 0 op Vn | Rt 1 0 1 0 N (0) (0) 1 (0) (0) (0) (0)
Encoding A1: cond 1 1 1 0 0 0 0 op Vn Rt 1 0 1 0 N (0) (0) 1 (0) (0) (0) (0)
to_arm_register = (op == ‘1’);  t = UInt(Rt);  n = UInt(Vn:N);
if t == 15 || (CurrentInstrSet() != InstrSet_ARM && t == 13) then UNPREDICTABLE;
Assembler syntax
VMOV<c> <Sn>, <Rt>    Encoded as op = 0
VMOV<c> <Rt>, <Sn>    Encoded as op = 1
where:
<c>   See Standard assembler syntax fields on page A8-7.
<Sn>  The single-precision VFP register.
<Rt>  The ARM core register.
Operation
if ConditionPassed() then
    EncodingSpecificOperations();  CheckVFPEnabled(TRUE);
    if to_arm_register then
        R[t] = S[n];
    else
        S[n] = R[t];
Exceptions
Undefined Instruction.
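Because the Operation is a plain assignment (R[t] = S[n] or S[n] = R[t]), the transfer moves the 32-bit pattern unchanged and performs no conversion between floating-point and integer values. The Python sketch below models this by reinterpreting an IEEE 754 single-precision value as its raw word with the standard `struct` module. The helper names `vmov_s_to_r` and `vmov_r_to_s` are hypothetical and are used only for illustration.

```python
import struct

def vmov_s_to_r(s_value: float) -> int:
    """Model VMOV <Rt>, <Sn>: R[t] = S[n], a raw 32-bit copy."""
    # Encode as IEEE 754 single precision, then reread the same 4 bytes as an unsigned word.
    return struct.unpack("<I", struct.pack("<f", s_value))[0]

def vmov_r_to_s(r_value: int) -> float:
    """Model VMOV <Sn>, <Rt>: S[n] = R[t], reinterpreting the word as a float."""
    # Mask to 32 bits the way a core register would hold the value.
    return struct.unpack("<f", struct.pack("<I", r_value & 0xFFFFFFFF))[0]

print(hex(vmov_s_to_r(1.0)))     # 0x3f800000 (the bit pattern, not the integer 1)
print(vmov_r_to_s(0xC0000000))   # -2.0
```

If a numeric conversion is needed rather than a bit copy, a separate conversion instruction is required. This sketch deliberately models only the copy.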
A8.6.331 VMOV (between two ARM core registers and two single-precision registers)
This instruction transfers the contents of two consecutively numbered single-precision VFP registers to two ARM core registers, or the contents of two ARM core registers to a pair of single-precision VFP registers. The ARM core registers do not have to be contiguous.
Encoding T1 / A1  VFPv2, VFPv3
VMOV<c> <Sm>, <Sm1>, <Rt>, <Rt2>
VMOV<c> <Rt>, <Rt2>, <Sm>, <Sm1>
Encoding T1: 1 1 1 0 1 1 0 0 0 1 0 op Rt2 | Rt 1 0 1 0 0 0 M 1 Vm
Encoding A1: cond 1 1 0 0 0 1 0 op Rt2 Rt 1 0 1 0 0 0 M 1 Vm
to_arm_registers = (op == ‘1’);  t = UInt(Rt);  t2 = UInt(Rt2);  m = UInt(Vm:M);
if t == 15 || t2 == 15 || m == 31 then UNPREDICTABLE;
if CurrentInstrSet() != InstrSet_ARM && (t == 13 || t2 == 13) then UNPREDICTABLE;
if to_arm_registers && t == t2 then UNPREDICTABLE;
Assembler syntax
VMOV<c> <Sm>, <Sm1>, <Rt>, <Rt2>    Encoded as op = 0
VMOV<c> <Rt>, <Rt2>, <Sm>, <Sm1>    Encoded as op = 1
where:
<c>    See Standard assembler syntax fields on page A8-7.
<Sm>   The first single-precision VFP register.
<Sm1>  The second single-precision VFP register. This is the next single-precision VFP register after <Sm>.
<Rt>   The ARM core register that <Sm> is transferred to or from.
<Rt2>  The ARM core register that <Sm1> is transferred to or from.
Operation
if ConditionPassed() then
    EncodingSpecificOperations();  CheckVFPEnabled(TRUE);
    if to_arm_registers then
        R[t] = S[m];  R[t2] = S[m+1];
    else
        S[m] = R[t];  S[m+1] = R[t2];
Exceptions
Undefined Instruction.
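The decode checks above are easy to misread, particularly the m == 31 case: S31 has no following register to act as <Sm1>. The following Python sketch models the decode and Operation under stated assumptions. The functions `decode_vmov_rr_ss` and `execute_vmov_rr_ss` and the `Unpredictable` exception are hypothetical. Raising an exception is only a modeling choice, because the architecture does not define what an UNPREDICTABLE encoding does. Registers are modeled as Python lists of 32-bit words.

```python
class Unpredictable(Exception):
    """Hypothetical marker for encodings that the pseudocode calls UNPREDICTABLE."""

def decode_vmov_rr_ss(op, rt, rt2, vm, m_bit, arm_state=True):
    """Decode fields for VMOV between two core registers and two S registers."""
    to_arm_registers = (op == 1)
    t, t2 = rt, rt2
    m = (vm << 1) | m_bit                  # m = UInt(Vm:M)
    if t == 15 or t2 == 15 or m == 31:     # S31 has no S32 partner
        raise Unpredictable("PC operand, or Sm is S31")
    if not arm_state and (t == 13 or t2 == 13):
        raise Unpredictable("SP operand outside ARM state")
    if to_arm_registers and t == t2:
        raise Unpredictable("Rt == Rt2 when writing core registers")
    return to_arm_registers, t, t2, m

def execute_vmov_rr_ss(to_arm_registers, t, t2, m, R, S):
    """Operation: copy raw 32-bit words between S[m], S[m+1] and R[t], R[t2]."""
    if to_arm_registers:
        R[t], R[t2] = S[m], S[m + 1]
    else:
        S[m], S[m + 1] = R[t], R[t2]

# VMOV R0, R1, S4, S5: op = 1, Rt = 0, Rt2 = 1, Vm:M = 4, so Vm = 2 and M = 0
R = [0] * 16
S = [0] * 32
S[4], S[5] = 0x11111111, 0x22222222
execute_vmov_rr_ss(*decode_vmov_rr_ss(1, 0, 1, 2, 0), R, S)
print(hex(R[0]), hex(R[1]))   # 0x11111111 0x22222222
```

The single-precision register number is rebuilt as Vm:M. This differs from A8.6.332, where the doubleword register number is M:Vm.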
A8.6.332 VMOV (between two ARM core registers and a doubleword extension register)
This instruction copies two words from two ARM core registers into a doubleword extension register, or from a doubleword extension register to two ARM core registers.
Encoding T1 / A1  VFPv2, VFPv3, Advanced SIMD
VMOV<c> <Dm>, <Rt>, <Rt2>
VMOV<c> <Rt>, <Rt2>, <Dm>
Encoding T1: 1 1 1 0 1 1 0 0 0 1 0 op Rt2 | Rt 1 0 1 1 0 0 M 1 Vm
Encoding A1: cond 1 1 0 0 0 1 0 op Rt2 Rt 1 0 1 1 0 0 M 1 Vm
to_arm_registers = (op == ‘1’);  t = UInt(Rt);  t2 = UInt(Rt2);  m = UInt(M:Vm);
if t == 15 || t2 == 15 then UNPREDICTABLE;
if CurrentInstrSet() != InstrSet_ARM && (t == 13 || t2 == 13) then UNPREDICTABLE;
if to_arm_registers && t == t2 then UNPREDICTABLE;
Assembler syntax
VMOV<c> <Dm>, <Rt>, <Rt2>    Encoded as op = 0
VMOV<c> <Rt>, <Rt2>, <Dm>    Encoded as op = 1
where:
<c>          See Standard assembler syntax fields on page A8-7.
<Dm>         The doubleword extension register.
<Rt>, <Rt2>  The two ARM core registers.
Operation
if ConditionPassed() then
    EncodingSpecificOperations();  CheckVFPEnabled(TRUE);
    if to_arm_registers then
        R[t] = D[m]<31:0>;  R[t2] = D[m]<63:32>;
    else
        D[m]<31:0> = R[t];  D[m]<63:32> = R[t2];
Exceptions
Undefined Instruction.

A8.6.333 VMOVL
Vector Move Long takes each element in a doubleword vector, sign-extends or zero-extends it to twice its original length, and places the results in a quadword vector.
Encoding T1 / A1  Advanced SIMD
VMOVL<c>.<dt> <Qd>, <Dm>
Encoding T1: 1 1 1 U 1 1 1 1 1 D imm3 0 0 0 | Vd 1 0 1 0 0 0 M 1 Vm
Encoding A1: 1 1 1 1 0 0 1 U 1 D imm3 0 0 0 Vd 1 0 1 0 0 0 M 1 Vm
if imm3 == ‘000’ then SEE “Related encodings”;
if imm3 != ‘001’ && imm3 != ‘010’ && imm3 != ‘100’ then SEE VSHLL;
if Vd<0> == ‘1’ then UNDEFINED;
esize = 8 * UInt(imm3);  unsigned = (U == ‘1’);  elements = 64 DIV esize;
d = UInt(D:Vd);  m = UInt(M:Vm);
Related encodings  See One register and a modified immediate value on page A7-21.
Assembler syntax
VMOVL<c>.<dt> <Qd>, <Dm>
where:
<c>   See Standard assembler syntax fields on page A8-7. An ARM Advanced SIMD VMOVL instruction must be unconditional.
<dt>  The data type for the elements of the operand. It must be one of:
      S8   encoded as U = 0, imm3 = ‘001’
      S16  encoded as U = 0, imm3 = ‘010’
      S32  encoded as U = 0, imm3 = ‘100’
      U8   encoded as U = 1, imm3 = ‘001’
      U16  encoded as U = 1, imm3 = ‘010’
      U32  encoded as U = 1, imm3 = ‘100’.
<Qd>, <Dm>  The destination vector and the operand vector.
Operation
if ConditionPassed() then
    EncodingSpecificOperations();  CheckAdvSIMDEnabled();
    for e = 0 to elements-1
        result = Int(Elem[D[m],e,esize], unsigned);
        Elem[Q[d>>1],e,2*esize] = result<2*esize-1:0>;
Exceptions
Undefined Instruction.

A8.6.334 VMOVN
Vector Move and Narrow copies the least significant half of each element of a quadword vector into the corresponding elements of a doubleword vector.
The operand vector elements can be any one of 16-bit, 32-bit, or 64-bit integers. There is no distinction between signed and unsigned integers.
Encoding T1 / A1  Advanced SIMD
VMOVN<c>.<dt> <Dd>, <Qm>
Encoding T1: 1 1 1 1 1 1 1 1 1 D 1 1 size 1 0 | Vd 0 0 1 0 0 0 M 0 Vm
Encoding A1: 1 1 1 1 0 0 1 1 1 D 1 1 size 1 0 Vd 0 0 1 0 0 0 M 0 Vm
if size == ‘11’ then UNDEFINED;
if Vm<0> == ‘1’ then UNDEFINED;
esize = 8 << UInt(size);  elements = 64 DIV esize;
d = UInt(D:Vd);  m = UInt(M:Vm);
Assembler syntax
VMOVN<c>.<dt> <Dd>, <Qm>
where:
<c>   See Standard assembler syntax fields on page A8-7. An ARM Advanced SIMD VMOVN instruction must be unconditional.
<dt>  The data type for the elements of the operand. It must be one of:
      I16  encoded as size = 0b00
      I32  encoded as size = 0b01
      I64  encoded as size = 0b10.
<Dd>, <Qm>  The destination vector and the operand vector.
Operation
if ConditionPassed() then
    EncodingSpecificOperations();  CheckAdvSIMDEnabled();
    for e = 0 to elements-1
        Elem[D[d],e,esize] = Elem[Q[m>>1],e,2*esize]<esize-1:0>;
Exceptions
Undefined Instruction.

A8.6.335 VMRS
Move to ARM core register from Advanced SIMD and VFP extension System Register moves the value of the FPSCR to a general-purpose register.
For details of system level use of this instruction, see VMRS on page B6-27.
Encoding T1 / A1  VFPv2, VFPv3, Advanced SIMD
VMRS<c> <Rt>, FPSCR
Encoding T1: 1 1 1 0 1 1 1 0 1 1 1 1 0 0 0 1 | Rt 1 0 1 0 0 (0) (0) 1 (0) (0) (0) (0)
Encoding A1: cond 1 1 1 0 1 1 1 1 0 0 0 1 Rt 1 0 1 0 0 (0) (0) 1 (0) (0) (0) (0)
t = UInt(Rt);
if t == 13 && CurrentInstrSet() != InstrSet_ARM then UNPREDICTABLE;
Assembler syntax
VMRS<c> <Rt>, FPSCR
where:
<c>   See Standard assembler syntax fields on page A8-7.
<Rt>  The destination ARM core register. This register can be R0-R14 or APSR_nzcv. APSR_nzcv is encoded as Rt = ‘1111’, and the instruction transfers the FPSCR N, Z, C, and V flags to the APSR N, Z, C, and V flags.
The pre-UAL instruction FMSTAT is equivalent to VMRS APSR_nzcv, FPSCR.
Operation
if ConditionPassed() then
    EncodingSpecificOperations();  CheckVFPEnabled(TRUE);  SerializeVFP();  VFPExcBarrier();
    if t != 15 then
        R[t] = FPSCR;
    else
        APSR.N = FPSCR.N;  APSR.Z = FPSCR.Z;  APSR.C = FPSCR.C;  APSR.V = FPSCR.V;
Exceptions
Undefined Instruction.
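The two branches of the VMRS Operation differ in scope. A normal destination receives the whole FPSCR, but APSR_nzcv (Rt = ‘1111’) receives only the four condition flags, and every other APSR bit is left unchanged. The Python sketch below models both branches. It relies on the fact that N, Z, C and V occupy bits 31 to 28 of both the FPSCR and the APSR. The function name `vmrs`, the list `R`, and passing the APSR as an integer are illustrative assumptions. The sketch does not model CheckVFPEnabled, SerializeVFP or VFPExcBarrier.

```python
FLAGS_MASK = 0xF0000000  # N, Z, C, V are bits 31..28 of both FPSCR and APSR

def vmrs(t, fpscr, R, apsr):
    """Model VMRS <Rt>, FPSCR. Returns the APSR value after the instruction."""
    if t != 15:
        R[t] = fpscr & 0xFFFFFFFF          # R[t] = FPSCR (the whole register)
        return apsr
    # Rt == '1111' encodes APSR_nzcv: copy only N, Z, C and V into the APSR.
    return (apsr & ~FLAGS_MASK & 0xFFFFFFFF) | (fpscr & FLAGS_MASK)

R = [0] * 16
apsr = vmrs(15, 0x80C00000, R, 0x68000000)  # FPSCR.N set; APSR has Z, C and Q set
print(hex(apsr))  # 0x88000000: N copied in, Z and C cleared, Q (bit 27) untouched
```

Because the APSR_nzcv form leaves the core registers untouched, a following conditional instruction can test the floating-point flags directly. The general-register form is used instead when software needs the full FPSCR value.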
A8.6.336 VMSR
Move to Advanced SIMD and VFP extension System Register from ARM core register moves the value of a general-purpose register to the FPSCR.
For details of system level use of this instruction, see VMSR on page B6-29.
Encoding T1 / A1  VFPv2, VFPv3, Advanced SIMD
VMSR<c> FPSCR, <Rt>
Encoding T1: 1 1 1 0 1 1 1 0 1 1 1 0 0 0 0 1 | Rt 1 0 1 0 0 (0) (0) 1 (0) (0) (0) (0)
Encoding A1: cond 1 1 1 0 1 1 1 0 0 0 0 1 Rt 1 0 1 0 0 (0) (0) 1 (0) (0) (0) (0)
t = UInt(Rt);
if t == 15 || (t == 13 && CurrentInstrSet() != InstrSet_ARM) then UNPREDICTABLE;
Assembler syntax
VMSR<c> FPSCR, <Rt>
where:
<c>   See Standard assembler syntax fields on page A8-7.
<Rt>  The general-purpose register to be transferred to the FPSCR.
Operation
if ConditionPassed() then
    EncodingSpecificOperations();  CheckVFPEnabled(TRUE);  SerializeVFP();  VFPExcBarrier();
    FPSCR = R[t];
Exceptions
Undefined Instruction.

A8.6.337 VMUL, VMULL (integer and polynomial)
Vector Multiply multiplies corresponding elements in two vectors. Vector Multiply Long does the same thing, but with destination vector elements that are twice as long as the elements that are multiplied.
For information about multiplying polynomials see Polynomial arithmetic over {0,1} on page A2-67.
Encoding T1 / A1  Advanced SIMD
VMUL<c>.<dt> <Qd>, <Qn>, <Qm>
VMUL<c>.<dt> <Dd>, <Dn>, <Dm>
Encoding T1: 1 1 1 op 1 1 1 1 0 D size Vn | Vd 1 0 0 1 N Q M 1 Vm
Encoding A1: 1 1 1 1 0 0 1 op 0 D size Vn Vd 1 0 0 1 N Q M 1 Vm
if size == ‘11’ || (op == ‘1’ && size != ‘00’) then UNDEFINED;
if Q == ‘1’ && (Vd<0> == ‘1’ || Vn<0> == ‘1’ || Vm<0> == ‘1’) then UNDEFINED;
polynomial = (op == ‘1’);  long_destination = FALSE;
unsigned = FALSE;  // “Don’t care” value: TRUE produces same functionality
esize = 8 << UInt(size);  elements = 64 DIV esize;
d = UInt(D:Vd);  n = UInt(N:Vn);  m = UInt(M:Vm);  regs = if Q == ‘0’ then 1 else 2;
Encoding T2 / A2  Advanced SIMD
VMULL<c>.<dt> <Qd>, <Dn>, <Dm>
Encoding T2: 1 1 1 U 1 1 1 1 1 D size Vn | Vd 1 1 op 0 N 0 M 0 Vm
Encoding A2: 1 1 1 1 0 0 1 U 1 D size Vn Vd 1 1 op 0 N 0 M 0 Vm
if size == ‘11’ then SEE “Related encodings”;
if op == ‘1’ && (U != ‘0’ || size != ‘00’) then UNDEFINED;
if Vd<0> == ‘1’ then UNDEFINED;
polynomial = (op == ‘1’);  long_destination = TRUE;  unsigned = (U == ‘1’);
esize = 8 << UInt(size);  elements = 64 DIV esize;
d = UInt(D:Vd);  n = UInt(N:Vn);  m = UInt(M:Vm);  regs = 1;
Related encodings  See Advanced SIMD data-processing instructions on page A7-10.
Assembler syntax
VMUL<c>.<type><size> {<Qd>,} <Qn>, <Qm>    Encoding T1 / A1, Q = 1
VMUL<c>.<type><size> {<Dd>,} <Dn>, <Dm>    Encoding T1 / A1, Q = 0
VMULL<c>.<type><size> <Qd>, <Dn>, <Dm>     Encoding T2 / A2
where:
<c>     See Standard assembler syntax fields on page A8-7. An ARM Advanced SIMD VMUL or VMULL instruction must be unconditional.
<type>  The data type for the elements of the operands. It must be one of:
        S   op = 0 in both encodings. U = 0 in encoding T2 / A2.
        U   op = 0 in both encodings. U = 1 in encoding T2 / A2.
        I   op = 0 in encoding T1 / A1, not available in encoding T2 / A2.
        P   op = 1 in both encodings. U = 0 in encoding T2 / A2.
        When <type> is P, <size> must be 8.
<size>  The data size for the elements of the operands. It must be one of:
        8   encoded as size = 0b00
        16  encoded as size = 0b01
        32  encoded as size = 0b10.
<Qd>, <Qn>, <Qm>  The destination vector and the operand vectors, for a quadword operation.
<Dd>, <Dn>, <Dm>  The destination vector and the operand vectors, for a doubleword operation.
<Qd>, <Dn>, <Dm>  The destination vector and the operand vectors, for a long operation.
Operation
if ConditionPassed() then
    EncodingSpecificOperations();  CheckAdvSIMDEnabled();
    for r = 0 to regs-1
        for e = 0 to elements-1
            op1 = Elem[D[n+r],e,esize];  op1val = Int(op1, unsigned);
            op2 = Elem[D[m+r],e,esize];  op2val = Int(op2, unsigned);
            if polynomial then
                product = PolynomialMult(op1,op2);
            else
                product = (op1val*op2val)<2*esize-1:0>;
            if long_destination then
                Elem[Q[d>>1],e,2*esize] = product;
            else
                Elem[D[d+r],e,esize] = product<esize-1:0>;
Exceptions
Undefined Instruction.

A8.6.338 VMUL (floating-point)
Vector Multiply multiplies corresponding elements in two vectors, and places the results in the destination vector.
Encoding T1 / A1  Advanced SIMD (UNDEFINED in integer-only variant)
VMUL<c>.F32 <Qd>, <Qn>, <Qm>
VMUL<c>.F32 <Dd>, <Dn>, <Dm>
Encoding T1: 1 1 1 1 1 1 1 1 0 D 0 sz Vn | Vd 1 1 0 1 N Q M 1 Vm
Encoding A1: 1 1 1 1 0 0 1 1 0 D 0 sz Vn Vd 1 1 0 1 N Q M 1 Vm
if Q == ‘1’ && (Vd<0> == ‘1’ || Vn<0> == ‘1’ || Vm<0> == ‘1’) then UNDEFINED;
if sz == ‘1’ then UNDEFINED;
advsimd = TRUE;  esize = 32;  elements = 2;
d = UInt(D:Vd);  n = UInt(N:Vn);  m = UInt(M:Vm);  regs = if Q == ‘0’ then 1 else 2;
Encoding T2 / A2  VFPv2, VFPv3 (sz = 1 UNDEFINED in single-precision only variants)
VMUL<c>.F64 <Dd>, <Dn>, <Dm>
VMUL<c>.F32 <Sd>, <Sn>, <Sm>
Encoding T2: 1 1 1 0 1 1 1 0 0 D 1 0 Vn | Vd 1 0 1 sz N 0 M 0 Vm
Encoding A2: cond 1 1 1 0 0 D 1 0 Vn Vd 1 0 1 sz N 0 M 0 Vm
if FPSCR.LEN != ‘000’ || FPSCR.STRIDE != ‘00’ then SEE “VFP vectors”;
advsimd = FALSE;  dp_operation = (sz == ‘1’);
d = if dp_operation then UInt(D:Vd) else UInt(Vd:D);
n = if dp_operation then UInt(N:Vn) else UInt(Vn:N);
m = if dp_operation then UInt(M:Vm) else UInt(Vm:M);
VFP vectors
Encoding T2 / A2 can operate on VFP vectors under control of the FPSCR.LEN and FPSCR.STRIDE bits. For details see Appendix F VFP Vector Operation Support.
Assembler syntax
VMUL<c>.F32 {<Qd>,} <Qn>, <Qm>    Encoding T1 / A1, Q = 1, sz = 0
VMUL<c>.F32 {<Dd>,} <Dn>, <Dm>    Encoding T1 / A1, Q = 0, sz = 0
VMUL<c>.F64 {<Dd>,} <Dn>, <Dm>    Encoding T2 / A2, sz = 1
VMUL<c>.F32 {<Sd>,} <Sn>, <Sm>    Encoding T2 / A2, sz = 0
where:
<c>   See Standard assembler syntax fields on page A8-7. An ARM Advanced SIMD VMUL instruction must be unconditional.
<Qd>, <Qn>, <Qm>  The destination vector and the operand vectors, for a quadword operation.
<Dd>, <Dn>, <Dm>   The destination vector and the operand vectors, for a doubleword operation.
<Sd>, <Sn>, <Sm>   The destination vector and the operand vectors, for a singleword operation.

Operation
if ConditionPassed() then
    EncodingSpecificOperations(); CheckAdvSIMDOrVFPEnabled(TRUE, advsimd);
    if advsimd then // Advanced SIMD instruction
        for r = 0 to regs-1
            for e = 0 to elements-1
                Elem[D[d+r],e,esize] = FPMul(Elem[D[n+r],e,esize], Elem[D[m+r],e,esize], FALSE);
    else // VFP instruction
        if dp_operation then
            D[d] = FPMul(D[n], D[m], TRUE);
        else
            S[d] = FPMul(S[n], S[m], TRUE);

Exceptions
Undefined Instruction.
Floating-point exceptions: Input Denormal, Invalid Operation, Overflow, Underflow, and Inexact.

A8.6.339 VMUL, VMULL (by scalar)
Vector Multiply multiplies each element in a vector by a scalar, and places the results in a second vector. Vector Multiply Long does the same thing, but with destination vector elements that are twice as long as the elements that are multiplied.
For more information about scalars see Advanced SIMD scalars on page A7-9.

Encoding T1 / A1   Advanced SIMD (F = 1 UNDEFINED in integer-only variants)
VMUL<c>.<dt> <Qd>, <Qn>, <Dm[x]>
VMUL<c>.<dt> <Dd>, <Dn>, <Dm[x]>
    T1: 1 1 1 Q 1 1 1 1 1 D size Vn | Vd 1 0 0 F N 1 M 0 Vm
    A1: 1 1 1 1 0 0 1 Q 1 D size Vn Vd 1 0 0 F N 1 M 0 Vm
if size == ‘11’ then SEE “Related encodings”;
if size == ‘00’ || (F == ‘1’ && size == ‘01’) then UNDEFINED;
if Q == ‘1’ && (Vd<0> == ‘1’ || Vn<0> == ‘1’) then UNDEFINED;
unsigned = FALSE; // “Don’t care” value: TRUE produces same functionality
floating_point = (F == ‘1’); long_destination = FALSE;
d = UInt(D:Vd); n = UInt(N:Vn); regs = if Q == ‘0’ then 1 else 2;
if size == ‘01’ then esize = 16; elements = 4; m = UInt(Vm<2:0>); index = UInt(M:Vm<3>);
if size == ‘10’ then esize = 32; elements = 2; m = UInt(Vm); index = UInt(M);

Encoding T2 / A2   Advanced SIMD
VMULL<c>.<dt> <Qd>, <Dn>, <Dm[x]>
    T2: 1 1 1 U 1 1 1 1 1 D size Vn | Vd 1 0 1 0 N 1 M 0 Vm
    A2: 1 1 1 1 0 0 1 U 1 D size Vn Vd 1 0 1 0 N 1 M 0 Vm
if size == ‘11’ then SEE “Related encodings”;
if size == ‘00’ || Vd<0> == ‘1’ then UNDEFINED;
unsigned = (U == ‘1’); long_destination = TRUE; floating_point = FALSE;
d = UInt(D:Vd); n = UInt(N:Vn); regs = 1;
if size == ‘01’ then esize = 16; elements = 4; m = UInt(Vm<2:0>); index = UInt(M:Vm<3>);
if size == ‘10’ then esize = 32; elements = 2; m = UInt(Vm); index = UInt(M);

Related encodings
See Advanced SIMD data-processing instructions on page A7-10.

Assembler syntax
VMUL<c><q>.<dt> {<Qd>,} <Qn>, <Dm[x]>    Encoding T1 / A1, Q = 1
VMUL<c><q>.<dt> {<Dd>,} <Dn>, <Dm[x]>    Encoding T1 / A1, Q = 0
VMULL<c><q>.<dt> <Qd>, <Dn>, <Dm[x]>     Encoding T2 / A2
where:
<c>, <q>   See Standard assembler syntax fields on page A8-7. An ARM Advanced SIMD VMUL or VMULL instruction must be unconditional.
<dt>         The data type for the scalar, and the elements of the operand vector. It must be one of:
               I16   encoding T1 / A1, size = 0b01, F = 0
               I32   encoding T1 / A1, size = 0b10, F = 0
               F32   encoding T1 / A1, size = 0b10, F = 1
               S16   encoding T2 / A2, size = 0b01, U = 0
               S32   encoding T2 / A2, size = 0b10, U = 0
               U16   encoding T2 / A2, size = 0b01, U = 1
               U32   encoding T2 / A2, size = 0b10, U = 1
<Qd>, <Qn>   The destination vector, and the operand vector, for a quadword operation.
<Dd>, <Dn>   The destination vector, and the operand vector, for a doubleword operation.
<Qd>, <Dn>   The destination vector, and the operand vector, for a long operation.
<Dm[x]>      The scalar. Dm is restricted to D0-D7 if <dt> is I16, S16, or U16, or D0-D15 otherwise.

Operation
if ConditionPassed() then
    EncodingSpecificOperations(); CheckAdvSIMDEnabled();
    op2 = Elem[D[m],index,esize]; op2val = Int(op2, unsigned);
    for r = 0 to regs-1
        for e = 0 to elements-1
            op1 = Elem[D[n+r],e,esize]; op1val = Int(op1, unsigned);
            if floating_point then
                Elem[D[d+r],e,esize] = FPMul(op1, op2, FALSE);
            else
                if long_destination then
                    Elem[Q[d>>1],e,2*esize] = (op1val*op2val)<2*esize-1:0>;
                else
                    Elem[D[d+r],e,esize] = (op1val*op2val)<esize-1:0>;

Exceptions
Undefined Instruction.
Floating-point exceptions: Input Denormal, Invalid Operation, Overflow, Underflow, and Inexact.

A8.6.340 VMVN (immediate)
Vector Bitwise NOT (immediate) places the bitwise inverse of an immediate integer constant into every element of the destination register. For the range of constants available, see One register and a modified immediate value on page A7-21.

Encoding T1 / A1   Advanced SIMD
VMVN<c>.<dt> <Qd>, #<imm>
VMVN<c>.<dt> <Dd>, #<imm>
    T1: 1 1 1 i 1 1 1 1 1 D 0 0 0 imm3 | Vd cmode 0 Q 1 1 imm4
    A1: 1 1 1 1 0 0 1 i 1 D 0 0 0 imm3 Vd cmode 0 Q 1 1 imm4
if (cmode<0> == ‘1’ && cmode<3:2> != ‘11’) || cmode<3:1> == ‘111’ then SEE “Related encodings”;
if Q == ‘1’ && Vd<0> == ‘1’ then UNDEFINED;
imm64 = AdvSIMDExpandImm(‘1’, cmode, i:imm3:imm4);
d = UInt(D:Vd); regs = if Q == ‘0’ then 1 else 2;

Related encodings
See One register and a modified immediate value on page A7-21.

Assembler syntax
VMVN<c><q>.<dt> <Qd>, #<imm>    Encoding T1 / A1, Q = 1
VMVN<c><q>.<dt> <Dd>, #<imm>    Encoding T1 / A1, Q = 0
where:
<c>, <q>   See Standard assembler syntax fields on page A8-7. An ARM VMVN instruction must be unconditional.
<dt>   The data type. It must be either I16 or I32.
<Qd>   The destination register for a quadword operation.
<Dd>    The destination register for a doubleword operation.
<imm>   A constant of the specified type. See One register and a modified immediate value on page A7-21 for the range of constants available, and the encoding of <dt> and <imm>.

Operation
if ConditionPassed() then
    EncodingSpecificOperations(); CheckAdvSIMDEnabled();
    for r = 0 to regs-1
        D[d+r] = NOT(imm64);

Exceptions
Undefined Instruction.

Pseudo-instructions
One register and a modified immediate value on page A7-21 describes pseudo-instructions with a combination of <dt> and <imm> that is not supported by hardware, but that generates the same destination register value as a different combination that is supported by hardware.

A8.6.341 VMVN (register)
Vector Bitwise NOT (register) takes a value from a register, inverts the value of each bit, and places the result in the destination register. The registers can be either doubleword or quadword.

Encoding T1 / A1   Advanced SIMD
VMVN<c> <Qd>, <Qm>
VMVN<c> <Dd>, <Dm>
    T1: 1 1 1 1 1 1 1 1 1 D 1 1 size 0 0 | Vd 0 1 0 1 1 Q M 0 Vm
    A1: 1 1 1 1 0 0 1 1 1 D 1 1 size 0 0 Vd 0 1 0 1 1 Q M 0 Vm
if size != ‘00’ then UNDEFINED;
if Q == ‘1’ && (Vd<0> == ‘1’ || Vm<0> == ‘1’) then UNDEFINED;
d = UInt(D:Vd); m = UInt(M:Vm); regs = if Q == ‘0’ then 1 else 2;

Assembler syntax
VMVN<c><q>{.<dt>} <Qd>, <Qm>
VMVN<c><q>{.<dt>} <Dd>, <Dm>
where:
<c>, <q>   See Standard assembler syntax fields on page A8-7. An ARM VMVN instruction must be unconditional.
<dt>         An optional data type. It is ignored by assemblers, and does not affect the encoding.
<Qd>, <Qm>   The destination vector and the operand vector, for a quadword operation.
<Dd>, <Dm>   The destination vector and the operand vector, for a doubleword operation.

Operation
if ConditionPassed() then
    EncodingSpecificOperations(); CheckAdvSIMDEnabled();
    for r = 0 to regs-1
        D[d+r] = NOT(D[m+r]);

Exceptions
Undefined Instruction.

A8.6.342 VNEG
Vector Negate negates each element in a vector, and places the results in a second vector. The floating-point version only inverts the sign bit.

Encoding T1 / A1   Advanced SIMD (F = 1 UNDEFINED in integer-only variants)
VNEG<c>.<dt> <Qd>, <Qm>
VNEG<c>.<dt> <Dd>, <Dm>
    T1: 1 1 1 1 1 1 1 1 1 D 1 1 size 0 1 | Vd 0 F 1 1 1 Q M 0 Vm
    A1: 1 1 1 1 0 0 1 1 1 D 1 1 size 0 1 Vd 0 F 1 1 1 Q M 0 Vm
if size == ‘11’ || (F == ‘1’ && size != ‘10’) then UNDEFINED;
if Q == ‘1’ && (Vd<0> == ‘1’ || Vm<0> == ‘1’) then UNDEFINED;
advsimd = TRUE; floating_point = (F == ‘1’);
esize = 8 << UInt(size); elements = 64 DIV esize;
d = UInt(D:Vd); m = UInt(M:Vm); regs = if Q == ‘0’ then 1 else 2;

Encoding T2 / A2   VFPv2, VFPv3 (sz = 1 UNDEFINED in single-precision only variants)
VNEG<c>.F64 <Dd>, <Dm>
VNEG<c>.F32 <Sd>, <Sm>
    T2: 1 1 1 0 1 1 1 0 1 D 1 1 0 0 0 1 | Vd 1 0 1 sz 0 1 M 0 Vm
    A2: cond 1 1 1 0 1 D 1 1 0 0 0 1 Vd 1 0 1 sz 0 1 M 0 Vm
if FPSCR.LEN != ‘000’ || FPSCR.STRIDE != ‘00’ then SEE “VFP vectors”;
advsimd = FALSE; dp_operation = (sz == ‘1’);
d = if dp_operation then UInt(D:Vd) else UInt(Vd:D);
m = if dp_operation then UInt(M:Vm) else UInt(Vm:M);

VFP vectors
Encoding T2 / A2 can operate on VFP vectors under control of the FPSCR.LEN and FPSCR.STRIDE bits. For details see Appendix F VFP Vector Operation Support.

Assembler syntax
VNEG<c><q>.<dt> <Qd>, <Qm>    <dt> != F64
VNEG<c><q>.<dt> <Dd>, <Dm>
VNEG<c><q>.F32 <Sd>, <Sm>     VFP only, encoding T2/A2, sz = 0
where:
<c>, <q>   See Standard assembler syntax fields on page A8-7. An ARM Advanced SIMD VNEG instruction must be unconditional.
<dt>         The data type for the elements of the vectors. It must be one of:
               S8    encoding T1 / A1, size = 0b00, F = 0
               S16   encoding T1 / A1, size = 0b01, F = 0
               S32   encoding T1 / A1, size = 0b10, F = 0
               F32   encoding T1 / A1, size = 0b10, F = 1
               F64   encoding T2 / A2, sz = 1
<Qd>, <Qm>   The destination vector and the operand vector, for a quadword operation.
<Dd>, <Dm>   The destination vector and the operand vector, for a doubleword operation.
<Sd>, <Sm>   The destination vector and the operand vector, for a singleword operation.

Operation
if ConditionPassed() then
    EncodingSpecificOperations(); CheckAdvSIMDOrVFPEnabled(TRUE, advsimd);
    if advsimd then // Advanced SIMD instruction
        for r = 0 to regs-1
            for e = 0 to elements-1
                if floating_point then
                    Elem[D[d+r],e,esize] = FPNeg(Elem[D[m+r],e,esize]);
                else
                    result = -SInt(Elem[D[m+r],e,esize]);
                    Elem[D[d+r],e,esize] = result<esize-1:0>;
    else // VFP instruction
        if dp_operation then
            D[d] = FPNeg(D[m]);
        else
            S[d] = FPNeg(S[m]);

Exceptions
Undefined Instruction.

A8.6.343 VNMLA, VNMLS, VNMUL
VNMLA multiplies together two floating-point register values, adds the negation of the floating-point value in the destination register to the negation of the product, and writes the result back to the destination register.
VNMLS multiplies together two floating-point register values, adds the negation of the floating-point value in the destination register to the product, and writes the result back to the destination register.
VNMUL multiplies together two floating-point register values, and writes the negation of the result to the destination register.

Encoding T1 / A1   VFPv2, VFPv3 (sz = 1 UNDEFINED in single-precision only variants)
VNMLA<c>.F64 <Dd>, <Dn>, <Dm>
VNMLA<c>.F32 <Sd>, <Sn>, <Sm>
VNMLS<c>.F64 <Dd>, <Dn>, <Dm>
VNMLS<c>.F32 <Sd>, <Sn>, <Sm>
    T1: 1 1 1 0 1 1 1 0 0 D 0 1 Vn | Vd 1 0 1 sz N op M 0 Vm
    A1: cond 1 1 1 0 0 D 0 1 Vn Vd 1 0 1 sz N op M 0 Vm
if FPSCR.LEN != ‘000’ || FPSCR.STRIDE != ‘00’ then SEE “VFP vectors”;
type = if op == ‘1’ then VFPNegMul_VNMLA else VFPNegMul_VNMLS;
dp_operation = (sz == ‘1’);
d = if dp_operation then UInt(D:Vd) else UInt(Vd:D);
n = if dp_operation then UInt(N:Vn) else UInt(Vn:N);
m = if dp_operation then UInt(M:Vm) else UInt(Vm:M);

Encoding T2 / A2   VFPv2, VFPv3 (sz = 1 UNDEFINED in single-precision only variants)
VNMUL<c>.F64 <Dd>, <Dn>, <Dm>
VNMUL<c>.F32 <Sd>, <Sn>, <Sm>
    T2: 1 1 1 0 1 1 1 0 0 D 1 0 Vn | Vd 1 0 1 sz N 1 M 0 Vm
    A2: cond 1 1 1 0 0 D 1 0 Vn Vd 1 0 1 sz N 1 M 0 Vm
if FPSCR.LEN != ‘000’ || FPSCR.STRIDE != ‘00’ then SEE “VFP vectors”;
type = VFPNegMul_VNMUL;
dp_operation = (sz == ‘1’);
d = if dp_operation then UInt(D:Vd) else UInt(Vd:D);
n = if dp_operation then UInt(N:Vn) else UInt(Vn:N);
m = if dp_operation then UInt(M:Vm) else UInt(Vm:M);

VFP vectors
These instructions can operate on VFP vectors under control of the FPSCR.LEN and FPSCR.STRIDE bits. For details see Appendix F VFP Vector Operation Support.

Assembler syntax
VN<y><c><q>.F64 <Dd>, <Dn>, <Dm>        Encoding T1 / A1 with sz = 1
VN<y><c><q>.F32 <Sd>, <Sn>, <Sm>        Encoding T1 / A1 with sz = 0
VNMUL<c><q>.F64 {<Dd>,} <Dn>, <Dm>      Encoding T2 / A2 with sz = 1
VNMUL<c><q>.F32 {<Sd>,} <Sn>, <Sm>      Encoding T2 / A2 with sz = 0
where:
<c>, <q>   See Standard assembler syntax fields on page A8-7.
<y>        Must be one of:
             MLA   op = 1
             MLS   op = 0
<Dd>, <Dn>, <Dm>   The destination register and the operand registers, for a double-precision operation.
<Sd>, <Sn>, <Sm>   The destination register and the operand registers, for a single-precision operation.

Operation
enumeration VFPNegMul {VFPNegMul_VNMLA, VFPNegMul_VNMLS, VFPNegMul_VNMUL};

if ConditionPassed() then
    EncodingSpecificOperations(); CheckVFPEnabled(TRUE);
    if dp_operation then
        product = FPMul(D[n], D[m], TRUE);
        case type of
            when VFPNegMul_VNMLA  D[d] = FPAdd(FPNeg(D[d]), FPNeg(product), TRUE);
            when VFPNegMul_VNMLS  D[d] = FPAdd(FPNeg(D[d]), product, TRUE);
            when VFPNegMul_VNMUL  D[d] = FPNeg(product);
    else
        product = FPMul(S[n], S[m], TRUE);
        case type of
            when VFPNegMul_VNMLA  S[d] = FPAdd(FPNeg(S[d]), FPNeg(product), TRUE);
            when VFPNegMul_VNMLS  S[d] = FPAdd(FPNeg(S[d]), product, TRUE);
            when VFPNegMul_VNMUL  S[d] = FPNeg(product);

Exceptions
Undefined Instruction.
Floating-point exceptions: Invalid Operation, Overflow, Underflow, Inexact, Input Denormal.

A8.6.344 VORN (immediate)
VORN (immediate) is a pseudo-instruction, equivalent to a VORR (immediate) instruction with the immediate value bitwise inverted. For details see VORR (immediate) on page A8-678.

A8.6.345 VORN (register)
This instruction performs a bitwise OR NOT operation between two registers, and places the result in the destination register. The operand and result registers can be quadword or doubleword. They must all be the same size.

Encoding T1 / A1   Advanced SIMD
VORN<c> <Qd>, <Qn>, <Qm>
VORN<c> <Dd>, <Dn>, <Dm>
    T1: 1 1 1 0 1 1 1 1 0 D 1 1 Vn | Vd 0 0 0 1 N Q M 1 Vm
    A1: 1 1 1 1 0 0 1 0 0 D 1 1 Vn Vd 0 0 0 1 N Q M 1 Vm
if Q == ‘1’ && (Vd<0> == ‘1’ || Vn<0> == ‘1’ || Vm<0> == ‘1’) then UNDEFINED;
d = UInt(D:Vd); n = UInt(N:Vn); m = UInt(M:Vm); regs = if Q == ‘0’ then 1 else 2;

Assembler syntax
VORN<c><q>{.<dt>} {<Qd>,} <Qn>, <Qm>    Encoded as Q = 1
VORN<c><q>{.<dt>} {<Dd>,} <Dn>, <Dm>    Encoded as Q = 0
where:
<c>, <q>   See Standard assembler syntax fields on page A8-7. An ARM VORN instruction must be unconditional.
<dt>               An optional data type. It is ignored by assemblers, and does not affect the encoding.
<Qd>, <Qn>, <Qm>   The destination vector and the operand vectors, for a quadword operation.
<Dd>, <Dn>, <Dm>   The destination vector and the operand vectors, for a doubleword operation.

Operation
if ConditionPassed() then
    EncodingSpecificOperations(); CheckAdvSIMDEnabled();
    for r = 0 to regs-1
        D[d+r] = D[n+r] OR NOT(D[m+r]);

Exceptions
Undefined Instruction.

A8.6.346 VORR (immediate)
This instruction takes the contents of the destination vector, performs a bitwise OR with an immediate constant, and writes the result back to the destination vector. For the range of constants available, see One register and a modified immediate value on page A7-21.

Encoding T1 / A1   Advanced SIMD
VORR<c>.<dt> <Qd>, #<imm>
VORR<c>.<dt> <Dd>, #<imm>
    T1: 1 1 1 i 1 1 1 1 1 D 0 0 0 imm3 | Vd cmode 0 Q 0 1 imm4
    A1: 1 1 1 1 0 0 1 i 1 D 0 0 0 imm3 Vd cmode 0 Q 0 1 imm4
if cmode<0> == ‘0’ || cmode<3:2> == ‘11’ then SEE VMOV (immediate);
if Q == ‘1’ && Vd<0> == ‘1’ then UNDEFINED;
imm64 = AdvSIMDExpandImm(‘0’, cmode, i:imm3:imm4);
d = UInt(D:Vd); regs = if Q == ‘0’ then 1 else 2;

Assembler syntax
VORR<c><q>.<dt> {<Qd>,} <Qd>, #<imm>    Encoded as Q = 1
VORR<c><q>.<dt> {<Dd>,} <Dd>, #<imm>    Encoded as Q = 0
where:
<c>, <q>   See Standard assembler syntax fields on page A8-7. An ARM VORR instruction must be unconditional.
<dt>   The data type used for <imm>. It can be either I16 or I32. I8, I64, and F32 are also permitted, but the resulting syntax is a pseudo-instruction.
<Qd>   The destination vector for a quadword operation.
<Dd>    The destination vector for a doubleword operation.
<imm>   A constant of the type specified by <dt>. This constant is replicated enough times to fill the destination register. For example, VORR.I32 D0,#10 ORs 0x0000000A0000000A into D0. For details of the range of constants available, and the encoding of <dt> and <imm>, see One register and a modified immediate value on page A7-21.

Operation
if ConditionPassed() then
    EncodingSpecificOperations(); CheckAdvSIMDEnabled();
    for r = 0 to regs-1
        D[d+r] = D[d+r] OR imm64;

Exceptions
Undefined Instruction.

Pseudo-instructions
VORN can be used, with a range of constants that are the bitwise inverse of the available constants for VORR. This is assembled as the equivalent VORR instruction. Disassembly produces the VORR form.
One register and a modified immediate value on page A7-21 describes pseudo-instructions with a combination of <dt> and <imm> that is not supported by hardware, but that generates the same destination register value as a different combination that is supported by hardware.

A8.6.347 VORR (register)
This instruction performs a bitwise OR operation between two registers, and places the result in the destination register. The operand and result registers can be quadword or doubleword. They must all be the same size.

Encoding T1 / A1   Advanced SIMD
VORR<c> <Qd>, <Qn>, <Qm>
VORR<c> <Dd>, <Dn>, <Dm>
    T1: 1 1 1 0 1 1 1 1 0 D 1 0 Vn | Vd 0 0 0 1 N Q M 1 Vm
    A1: 1 1 1 1 0 0 1 0 0 D 1 0 Vn Vd 0 0 0 1 N Q M 1 Vm
if N == M && Vn == Vm then SEE VMOV (register);
if Q == ‘1’ && (Vd<0> == ‘1’ || Vn<0> == ‘1’ || Vm<0> == ‘1’) then UNDEFINED;
d = UInt(D:Vd); n = UInt(N:Vn); m = UInt(M:Vm); regs = if Q == ‘0’ then 1 else 2;

Assembler syntax
VORR<c><q>{.<dt>} {<Qd>,} <Qn>, <Qm>    Encoded as Q = 1
VORR<c><q>{.<dt>} {<Dd>,} <Dn>, <Dm>    Encoded as Q = 0
where:
<c>, <q>   See Standard assembler syntax fields on page A8-7. An ARM VORR instruction must be unconditional.
<dt>               An optional data type. It is ignored by assemblers, and does not affect the encoding.
<Qd>, <Qn>, <Qm>   The destination vector and the operand vectors, for a quadword operation.
<Dd>, <Dn>, <Dm>   The destination vector and the operand vectors, for a doubleword operation.

Operation
if ConditionPassed() then
    EncodingSpecificOperations(); CheckAdvSIMDEnabled();
    for r = 0 to regs-1
        D[d+r] = D[n+r] OR D[m+r];

Exceptions
Undefined Instruction.

A8.6.348 VPADAL
Vector Pairwise Add and Accumulate Long adds adjacent pairs of elements of a vector, and accumulates the results into the elements of the destination vector. The vectors can be doubleword or quadword. The operand elements can be 8-bit, 16-bit, or 32-bit integers. The result elements are twice the length of the operand elements.
Figure A8-2 shows an example of the operation of VPADAL.

Encoding T1 / A1   Advanced SIMD
VPADAL<c>.<dt> <Qd>, <Qm>
VPADAL<c>.<dt> <Dd>, <Dm>
    T1: 1 1 1 1 1 1 1 1 1 D 1 1 size 0 0 | Vd 0 1 1 0 op Q M 0 Vm
    A1: 1 1 1 1 0 0 1 1 1 D 1 1 size 0 0 Vd 0 1 1 0 op Q M 0 Vm
if size == ‘11’ then UNDEFINED;
if Q == ‘1’ && (Vd<0> == ‘1’ || Vm<0> == ‘1’) then UNDEFINED;
unsigned = (op == ‘1’);
esize = 8 << UInt(size); elements = 64 DIV esize;
d = UInt(D:Vd); m = UInt(M:Vm); regs = if Q == ‘0’ then 1 else 2;

Figure A8-2 Operation of doubleword VPADAL for data type S16

Assembler syntax
VPADAL<c><q>.<dt> <Qd>, <Qm>    Encoded as Q = 1
VPADAL<c><q>.<dt> <Dd>, <Dm>    Encoded as Q = 0
where:
<c>, <q>   See Standard assembler syntax fields on page A8-7. An ARM VPADAL instruction must be unconditional.
<dt>         The data type for the elements of the vectors. It must be one of:
               S8    encoded as size = 0b00, op = 0
               S16   encoded as size = 0b01, op = 0
               S32   encoded as size = 0b10, op = 0
               U8    encoded as size = 0b00, op = 1
               U16   encoded as size = 0b01, op = 1
               U32   encoded as size = 0b10, op = 1
<Qd>, <Qm>   The destination vector and the operand vector, for a quadword operation.
<Dd>, <Dm>   The destination vector and the operand vector, for a doubleword operation.

Operation
if ConditionPassed() then
    EncodingSpecificOperations(); CheckAdvSIMDEnabled();
    h = elements/2;
    for r = 0 to regs-1
        for e = 0 to h-1
            op1 = Elem[D[m+r],2*e,esize]; op2 = Elem[D[m+r],2*e+1,esize];
            result = Int(op1, unsigned) + Int(op2, unsigned);
            Elem[D[d+r],e,2*esize] = Elem[D[d+r],e,2*esize] + result;

Exceptions
Undefined Instruction.

A8.6.349 VPADD (integer)
Vector Pairwise Add (integer) adds adjacent pairs of elements of two vectors, and places the results in the destination vector. The operands and result are doubleword vectors. The operand and result elements must all be the same type, and can be 8-bit, 16-bit, or 32-bit integers. There is no distinction between signed and unsigned integers.
Figure A8-3 shows an example of the operation of VPADD.

Encoding T1 / A1   Advanced SIMD
VPADD<c>.<dt> <Dd>, <Dn>, <Dm>
    T1: 1 1 1 0 1 1 1 1 0 D size Vn | Vd 1 0 1 1 N Q M 1 Vm
    A1: 1 1 1 1 0 0 1 0 0 D size Vn Vd 1 0 1 1 N Q M 1 Vm
if size == ‘11’ || Q == ‘1’ then UNDEFINED;
esize = 8 << UInt(size); elements = 64 DIV esize;
d = UInt(D:Vd); n = UInt(N:Vn); m = UInt(M:Vm);

Figure A8-3 Operation of VPADD for data type I16

Assembler syntax
VPADD<c><q>.<dt> {<Dd>,} <Dn>, <Dm>    Encoded as Q = 0
where:
<c>, <q>   See Standard assembler syntax fields on page A8-7. An ARM VPADD instruction must be unconditional.
<dt>   The data type for the elements of the vectors. It must be one of:
         I8    encoding T1 / A1, size = 0b00
         I16   encoding T1 / A1, size = 0b01
         I32   encoding T1 / A1, size = 0b10
<Dd>, <Dn>, <Dm>   The destination vector, the first operand vector, and the second operand vector.

Operation
if ConditionPassed() then
    EncodingSpecificOperations(); CheckAdvSIMDEnabled();
    bits(64) dest;
    h = elements/2;
    for e = 0 to h-1
        Elem[dest,e,esize] = Elem[D[n],2*e,esize] + Elem[D[n],2*e+1,esize];
        Elem[dest,e+h,esize] = Elem[D[m],2*e,esize] + Elem[D[m],2*e+1,esize];
    D[d] = dest;

Exceptions
Undefined Instruction.

A8.6.350 VPADD (floating-point)
Vector Pairwise Add (floating-point) adds adjacent pairs of elements of two vectors, and places the results in the destination vector. The operands and result are doubleword vectors. The operand and result elements are 32-bit floating-point numbers.
Figure A8-3 on page A8-684 shows an example of the operation of VPADD.

Encoding T1 / A1   Advanced SIMD (UNDEFINED in integer-only variant)
VPADD<c>.F32 <Dd>, <Dn>, <Dm>
    T1: 1 1 1 1 1 1 1 1 0 D 0 sz Vn | Vd 1 1 0 1 N Q M 0 Vm
    A1: 1 1 1 1 0 0 1 1 0 D 0 sz Vn Vd 1 1 0 1 N Q M 0 Vm
if sz == ‘1’ || Q == ‘1’ then UNDEFINED;
esize = 32; elements = 2;
d = UInt(D:Vd); n = UInt(N:Vn); m = UInt(M:Vm);

Assembler syntax
VPADD<c><q>.F32 {<Dd>,} <Dn>, <Dm>    Encoded as Q = 0, sz = 0
where:
<c>, <q>   See Standard assembler syntax fields on page A8-7. An ARM VPADD instruction must be unconditional.
<Dd>, <Dn>, <Dm>   The destination vector, the first operand vector, and the second operand vector.

Operation
if ConditionPassed() then
    EncodingSpecificOperations(); CheckAdvSIMDEnabled();
    bits(64) dest;
    h = elements/2;
    for e = 0 to h-1
        Elem[dest,e,esize] = FPAdd(Elem[D[n],2*e,esize], Elem[D[n],2*e+1,esize], FALSE);
        Elem[dest,e+h,esize] = FPAdd(Elem[D[m],2*e,esize], Elem[D[m],2*e+1,esize], FALSE);
    D[d] = dest;

Exceptions
Undefined Instruction.
Floating-point exceptions: Input Denormal, Invalid Operation, Overflow, Underflow, and Inexact.

A8.6.351 VPADDL
Vector Pairwise Add Long adds adjacent pairs of elements of a vector, and places the results in the destination vector. The vectors can be doubleword or quadword. The operand elements can be 8-bit, 16-bit, or 32-bit integers. The result elements are twice the length of the operand elements.
Figure A8-4 shows an example of the operation of VPADDL.

Encoding T1 / A1   Advanced SIMD
VPADDL<c>.<dt> <Qd>, <Qm>
VPADDL<c>.<dt> <Dd>, <Dm>
    T1: 1 1 1 1 1 1 1 1 1 D 1 1 size 0 0 | Vd 0 0 1 0 op Q M 0 Vm
    A1: 1 1 1 1 0 0 1 1 1 D 1 1 size 0 0 Vd 0 0 1 0 op Q M 0 Vm
if size == ‘11’ then UNDEFINED;
if Q == ‘1’ && (Vd<0> == ‘1’ || Vm<0> == ‘1’) then UNDEFINED;
unsigned = (op == ‘1’);
esize = 8 << UInt(size); elements = 64 DIV esize;
d = UInt(D:Vd); m = UInt(M:Vm); regs = if Q == ‘0’ then 1 else 2;

Figure A8-4 Operation of doubleword VPADDL for data type S16

Assembler syntax
VPADDL<c><q>.<dt> <Qd>, <Qm>    Encoded as Q = 1
VPADDL<c><q>.<dt> <Dd>, <Dm>    Encoded as Q = 0
where:
<c>, <q>   See Standard assembler syntax fields on page A8-7. An ARM VPADDL instruction must be unconditional.
<dt>         The data type for the elements of the vectors. It must be one of:
               S8    encoded as size = 0b00, op = 0
               S16   encoded as size = 0b01, op = 0
               S32   encoded as size = 0b10, op = 0
               U8    encoded as size = 0b00, op = 1
               U16   encoded as size = 0b01, op = 1
               U32   encoded as size = 0b10, op = 1
<Qd>, <Qm>   The destination vector and the operand vector, for a quadword operation.
<Dd>, <Dm>   The destination vector and the operand vector, for a doubleword operation.

Operation
if ConditionPassed() then
    EncodingSpecificOperations(); CheckAdvSIMDEnabled();
    h = elements/2;
    for r = 0 to regs-1
        for e = 0 to h-1
            op1 = Elem[D[m+r],2*e,esize]; op2 = Elem[D[m+r],2*e+1,esize];
            result = Int(op1, unsigned) + Int(op2, unsigned);
            Elem[D[d+r],e,2*esize] = result<2*esize-1:0>;

Exceptions
Undefined Instruction.

A8.6.352 VPMAX, VPMIN (integer)
Vector Pairwise Maximum compares adjacent pairs of elements in two doubleword vectors, and copies the larger of each pair into the corresponding element in the destination doubleword vector.
Vector Pairwise Minimum compares adjacent pairs of elements in two doubleword vectors, and copies the smaller of each pair into the corresponding element in the destination doubleword vector.
Figure A8-5 shows an example of the operation of VPMAX.

Encoding T1 / A1   Advanced SIMD
VP<op><c>.<dt> <Dd>, <Dn>, <Dm>
    T1: 1 1 1 U 1 1 1 1 0 D size Vn | Vd 1 0 1 0 N Q M op Vm
    A1: 1 1 1 1 0 0 1 U 0 D size Vn Vd 1 0 1 0 N Q M op Vm
if size == ‘11’ || Q == ‘1’ then UNDEFINED;
maximum = (op == ‘0’); unsigned = (U == ‘1’);
esize = 8 << UInt(size); elements = 64 DIV esize;
d = UInt(D:Vd); n = UInt(N:Vn); m = UInt(M:Vm);

Figure A8-5 Operation of VPMAX for data type S16 or U16

Assembler syntax
VP<op><c><q>.<dt> {<Dd>,} <Dn>, <Dm>    Encoded as Q = 0
where:
<op>       Must be one of:
             MAX   encoded as op = 0
             MIN   encoded as op = 1
<c>, <q>   See Standard assembler syntax fields on page A8-7. An ARM VPMAX or VPMIN instruction must be unconditional.
<dt>   The data type for the elements of the vectors. It must be one of:
         S8    encoding T1 / A1, size = 0b00, U = 0
         S16   encoding T1 / A1, size = 0b01, U = 0
         S32   encoding T1 / A1, size = 0b10, U = 0
         U8    encoding T1 / A1, size = 0b00, U = 1
         U16   encoding T1 / A1, size = 0b01, U = 1
         U32   encoding T1 / A1, size = 0b10, U = 1
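<Dd>, <Dn>, <Dm>   The destination vector and the operand vectors.

Operation
if ConditionPassed() then
    EncodingSpecificOperations(); CheckAdvSIMDEnabled();
    bits(64) dest;
    h = elements/2;
    for e = 0 to h-1
        op1 = Int(Elem[D[n],2*e,esize], unsigned); op2 = Int(Elem[D[n],2*e+1,esize], unsigned);
        result = if maximum then Max(op1,op2) else Min(op1,op2);
        Elem[dest,e,esize] = result<esize-1:0>;
        op1 = Int(Elem[D[m],2*e,esize], unsigned); op2 = Int(Elem[D[m],2*e+1,esize], unsigned);
        result = if maximum then Max(op1,op2) else Min(op1,op2);
        Elem[dest,e+h,esize] = result<esize-1:0>;
    D[d] = dest;

Exceptions
Undefined Instruction.

A8.6.353 VPMAX, VPMIN (floating-point)
Vector Pairwise Maximum compares adjacent pairs of elements in two doubleword vectors, and copies the larger of each pair into the corresponding element in the destination doubleword vector.
Vector Pairwise Minimum compares adjacent pairs of elements in two doubleword vectors, and copies the smaller of each pair into the corresponding element in the destination doubleword vector.
Figure A8-5 on page A8-690 shows an example of the operation of VPMAX.

Encoding T1 / A1   Advanced SIMD (UNDEFINED in integer-only variant)
VP<op><c>.F32 <Dd>, <Dn>, <Dm>
    T1: 1 1 1 1 1 1 1 1 0 D op sz Vn | Vd 1 1 1 1 N Q M 0 Vm
    A1: 1 1 1 1 0 0 1 1 0 D op sz Vn Vd 1 1 1 1 N Q M 0 Vm
if sz == ‘1’ || Q == ‘1’ then UNDEFINED;
maximum = (op == ‘0’); esize = 32; elements = 2;
d = UInt(D:Vd); n = UInt(N:Vn); m = UInt(M:Vm);

Assembler syntax
VP<op><c><q>.F32 {<Dd>,} <Dn>, <Dm>    Encoded as Q = 0, sz = 0
where:
<op>               Must be one of:
                     MAX   encoded as op = 0
                     MIN   encoded as op = 1
<c>, <q>           See Standard assembler syntax fields on page A8-7. An ARM VPMAX or VPMIN instruction must be unconditional.
<Dd>, <Dn>, <Dm>   The destination vector and the operand vectors.

Operation
if ConditionPassed() then
    EncodingSpecificOperations(); CheckAdvSIMDEnabled();
    bits(64) dest;
    h = elements/2;
    for e = 0 to h-1
        op1 = Elem[D[n],2*e,esize]; op2 = Elem[D[n],2*e+1,esize];
        Elem[dest,e,esize] = if maximum then FPMax(op1,op2,FALSE) else FPMin(op1,op2,FALSE);
        op1 = Elem[D[m],2*e,esize]; op2 = Elem[D[m],2*e+1,esize];
        Elem[dest,e+h,esize] = if maximum then FPMax(op1,op2,FALSE) else FPMin(op1,op2,FALSE);
    D[d] = dest;

Exceptions
Undefined Instruction.

A8.6.354 VPOP
Vector Pop loads multiple consecutive extension registers from the stack.

Encoding T1 / A1   VFPv2, VFPv3, Advanced SIMD
VPOP<c> <list>      <list> is consecutive 64-bit registers
    T1: 1 1 1 0 1 1 0 0 1 D 1 1 1 1 0 1 | Vd 1 0 1 1 imm8
    A1: cond 1 1 0 0 1 D 1 1 1 1 0 1 Vd 1 0 1 1 imm8
single_regs = FALSE; d = UInt(D:Vd); imm32 = ZeroExtend(imm8:‘00’, 32);
regs = UInt(imm8) DIV 2; // If UInt(imm8) is odd, see “FLDMX”.
if regs == 0 || regs > 16 || (d+regs) > 32 then UNPREDICTABLE;

Encoding T2 / A2   VFPv2, VFPv3
VPOP<c> <list>      <list> is consecutive 32-bit registers
    T2: 1 1 1 0 1 1 0 0 1 D 1 1 1 1 0 1 | Vd 1 0 1 0 imm8
    A2: cond 1 1 0 0 1 D 1 1 1 1 0 1 Vd 1 0 1 0 imm8
single_regs = TRUE; d = UInt(Vd:D); imm32 = ZeroExtend(imm8:‘00’, 32);
regs = UInt(imm8);
if regs == 0 || regs > 16 || (d+regs) > 32 then UNPREDICTABLE;

FLDMX
Encoding T1 / A1 behaves as described by the pseudocode if imm8 is odd. However, there is no UAL syntax for such encodings and their use is deprecated. For more information, see FLDMX, FSTMX on page A8-101.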
<Dd>, <Dn>, <Dm>  The destination vector and the operand vectors.
Operation
if ConditionPassed() then
  EncodingSpecificOperations(); CheckAdvSIMDEnabled();
  bits(64) dest;
  h = elements/2;
  for e = 0 to h-1
    op1 = Elem[D[n],2*e,esize]; op2 = Elem[D[n],2*e+1,esize];
    Elem[dest,e,esize] = if maximum then FPMax(op1,op2,FALSE) else FPMin(op1,op2,FALSE);
    op1 = Elem[D[m],2*e,esize]; op2 = Elem[D[m],2*e+1,esize];
    Elem[dest,e+h,esize] = if maximum then FPMax(op1,op2,FALSE) else FPMin(op1,op2,FALSE);
  D[d] = dest;
Exceptions
Undefined Instruction.

A8.6.354 VPOP
Vector Pop loads multiple consecutive extension registers from the stack.
Encoding T1 / A1    VFPv2, VFPv3, Advanced SIMD
VPOP<c> <list>    <list> is consecutive 64-bit registers
T1: 1 1 1 0 1 1 0 0 1 D 1 1 1 1 0 1 | Vd 1 0 1 1 imm8
A1: cond 1 1 0 0 1 D 1 1 1 1 0 1 Vd 1 0 1 1 imm8
single_regs = FALSE; d = UInt(D:Vd); imm32 = ZeroExtend(imm8:‘00’, 32);
regs = UInt(imm8) DIV 2;  // If UInt(imm8) is odd, see “FLDMX”.
if regs == 0 || regs > 16 || (d+regs) > 32 then UNPREDICTABLE;
Encoding T2 / A2    VFPv2, VFPv3
VPOP<c> <list>    <list> is consecutive 32-bit registers
T2: 1 1 1 0 1 1 0 0 1 D 1 1 1 1 0 1 | Vd 1 0 1 0 imm8
A2: cond 1 1 0 0 1 D 1 1 1 1 0 1 Vd 1 0 1 0 imm8
single_regs = TRUE; d = UInt(Vd:D); imm32 = ZeroExtend(imm8:‘00’, 32);
regs = UInt(imm8);
if regs == 0 || regs > 16 || (d+regs) > 32 then UNPREDICTABLE;
FLDMX
Encoding T1 / A1 behaves as described by the pseudocode if imm8 is odd. However, there is no UAL syntax for such encodings and their use is deprecated. For more information, see FLDMX, FSTMX on page A8-101.
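To make the VPOP field arithmetic above concrete, here is a minimal Python sketch that applies the encoding-specific pseudocode to raw D, Vd and imm8 values and reports the first register, the register count and the SP adjustment. It is an illustration only, not part of the architecture: the names `decode_vpop`, `VPopFields` and `register_names` are hypothetical, it assumes the fields have already been extracted from the instruction word, and raising `ValueError` for the cases the pseudocode calls UNPREDICTABLE is a choice made for the sketch, not architected behavior.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class VPopFields:
    single_regs: bool  # True for encoding T2/A2 (S registers), False for T1/A1 (D registers)
    d: int             # index of the first register in <list>
    regs: int          # number of registers loaded
    imm32: int         # number of bytes added to SP
    fldmx_form: bool   # True when T1/A1 has an odd imm8 (deprecated, no UAL syntax)


def decode_vpop(D: int, Vd: int, imm8: int, single_regs: bool) -> VPopFields:
    """Apply the VPOP encoding-specific pseudocode to raw field values."""
    if not (0 <= D <= 1 and 0 <= Vd <= 15 and 0 <= imm8 <= 255):
        raise ValueError("field value out of range")
    imm32 = imm8 << 2                  # ZeroExtend(imm8:'00', 32)
    if single_regs:
        d = (Vd << 1) | D              # d = UInt(Vd:D), D is the low bit
        regs = imm8                    # regs = UInt(imm8)
    else:
        d = (D << 4) | Vd              # d = UInt(D:Vd), D is the high bit
        regs = imm8 // 2               # regs = UInt(imm8) DIV 2
    if regs == 0 or regs > 16 or d + regs > 32:
        # UNPREDICTABLE in the architecture; this sketch rejects it.
        raise ValueError("UNPREDICTABLE register list")
    return VPopFields(single_regs, d, regs, imm32,
                      fldmx_form=(not single_regs and imm8 % 2 == 1))


def register_names(f: VPopFields) -> List[str]:
    """Names of the registers in <list>, in the order they are loaded."""
    prefix = "S" if f.single_regs else "D"
    return [f"{prefix}{f.d + r}" for r in range(f.regs)]


if __name__ == "__main__":
    f = decode_vpop(D=0, Vd=8, imm8=16, single_regs=False)
    print(f.regs, f.imm32, register_names(f)[0], register_names(f)[-1])  # 8 64 D8 D15
```

With an odd imm8 in encoding T1 / A1, for example imm8 = 17, the sketch reports regs = 8 and imm32 = 68, so, following the pseudocode, SP advances by one word more than the 64 bytes actually loaded.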
Assembler syntax
VPOP<c><q>{.<size>} <list>
where:
<c><q>  See Standard assembler syntax fields on page A8-7.
<size>  An optional data size specifier. If present, it must be equal to the size in bits, 32 or 64, of the registers in <list>.
<list>  The extension registers to be loaded, as a list of consecutively numbered doubleword (encoding T1 / A1) or singleword (encoding T2 / A2) registers, separated by commas and surrounded by brackets. It is encoded in the instruction by setting D and Vd to specify the first register in the list, and imm8 to twice the number of registers in the list (encoding T1 / A1) or the number of registers in the list (encoding T2 / A2). <list> must contain at least one register, and not more than sixteen.
Operation
if ConditionPassed() then
  EncodingSpecificOperations(); CheckVFPEnabled(TRUE); NullCheckIfThumbEE(13);
  address = SP;
  SP = SP + imm32;
  if single_regs then
    for r = 0 to regs-1
      S[d+r] = MemA[address,4]; address = address+4;
  else
    for r = 0 to regs-1
      word1 = MemA[address,4]; word2 = MemA[address+4,4]; address = address+8;
      // Combine the word-aligned words in the correct order for current endianness.
      D[d+r] = if BigEndian() then word1:word2 else word2:word1;
Exceptions
Undefined Instruction, Data Abort.

A8.6.355 VPUSH
Vector Push stores multiple consecutive extension registers to the stack.
Encoding T1 / A1    VFPv2, VFPv3, Advanced SIMD
VPUSH<c> <list>    <list> is consecutive 64-bit registers
T1: 1 1 1 0 1 1 0 1 0 D 1 0 1 1 0 1 | Vd 1 0 1 1 imm8
A1: cond 1 1 0 1 0 D 1 0 1 1 0 1 Vd 1 0 1 1 imm8
single_regs = FALSE; d = UInt(D:Vd); imm32 = ZeroExtend(imm8:‘00’, 32);
regs = UInt(imm8) DIV 2;  // If UInt(imm8) is odd, see “FSTMX”.
if regs == 0 || regs > 16 || (d+regs) > 32 then UNPREDICTABLE;
Encoding T2 / A2    VFPv2, VFPv3
VPUSH<c> <list>    <list> is consecutive 32-bit registers
T2: 1 1 1 0 1 1 0 1 0 D 1 0 1 1 0 1 | Vd 1 0 1 0 imm8
A2: cond 1 1 0 1 0 D 1 0 1 1 0 1 Vd 1 0 1 0 imm8
single_regs = TRUE; d = UInt(Vd:D); imm32 = ZeroExtend(imm8:‘00’, 32);
regs = UInt(imm8);
if regs == 0 || regs > 16 || (d+regs) > 32 then UNPREDICTABLE;
FSTMX
Encoding T1 / A1 behaves as described by the pseudocode if imm8 is odd. However, there is no UAL syntax for such encodings and their use is deprecated. For more information, see FLDMX, FSTMX on page A8-101.
Assembler syntax
VPUSH<c><q>{.<size>} <list>
where:
<c><q>  See Standard assembler syntax fields on page A8-7.
<size>  An optional data size specifier. If present, it must be equal to the size in bits, 32 or 64, of the registers in <list>.
<list>  The extension registers to be stored, as a list of consecutively numbered doubleword (encoding T1 / A1) or singleword (encoding T2 / A2) registers, separated by commas and surrounded by brackets. It is encoded in the instruction by setting D and Vd to specify the first register in the list, and imm8 to twice the number of registers in the list (encoding T1 / A1), or the number of registers in the list (encoding T2 / A2). <list> must contain at least one register, and not more than sixteen.
Operation
if ConditionPassed() then
  EncodingSpecificOperations(); CheckVFPEnabled(TRUE); NullCheckIfThumbEE(13);
  address = SP - imm32;
  SP = SP - imm32;
  if single_regs then
    for r = 0 to regs-1
      MemA[address,4] = S[d+r]; address = address+4;
  else
    for r = 0 to regs-1
      // Store as two word-aligned words in the correct order for current endianness.
      MemA[address,4] = if BigEndian() then D[d+r]<63:32> else D[d+r]<31:0>;
      MemA[address+4,4] = if BigEndian() then D[d+r]<31:0> else D[d+r]<63:32>;
      address = address+8;
Exceptions
Undefined Instruction, Data Abort.

A8.6.356 VQABS
Vector Saturating Absolute takes the absolute value of each element in a vector, and places the results in the destination vector. If any of the results overflow, they are saturated. The cumulative saturation flag, QC, is set if saturation occurs. For details see Pseudocode details of saturation on page A2-9.
Encoding T1 / A1    Advanced SIMD
VQABS<c>.<dt> <Qd>, <Qm>
VQABS<c>.<dt> <Dd>, <Dm>
T1: 1 1 1 1 1 1 1 1 1 D 1 1 size 0 0 | Vd 0 1 1 1 0 Q M 0 Vm
A1: 1 1 1 1 0 0 1 1 1 D 1 1 size 0 0 Vd 0 1 1 1 0 Q M 0 Vm
if size == ‘11’ then UNDEFINED;
if Q == ‘1’ && (Vd<0> == ‘1’ || Vm<0> == ‘1’) then UNDEFINED;
esize = 8 << UInt(size); elements = 64 DIV esize;
d = UInt(D:Vd); m = UInt(M:Vm); regs = if Q == ‘0’ then 1 else 2;
Assembler syntax
VQABS<c><q>.<dt> <Qd>, <Qm>    Encoded as Q = 1
VQABS<c><q>.<dt> <Dd>, <Dm>    Encoded as Q = 0
where:
<c><q>  See Standard assembler syntax fields on page A8-7. An ARM VQABS instruction must be unconditional.
<dt>  The data type for the elements of the vectors. It must be one of:
      S8   encoded as size = 0b00
      S16  encoded as size = 0b01
      S32  encoded as size = 0b10.
<Qd>, <Qm>  The destination vector and the operand vector, for a quadword operation.
<Dd>, <Dm>  The destination vector and the operand vector, for a doubleword operation.
Operation
if ConditionPassed() then
  EncodingSpecificOperations(); CheckAdvSIMDEnabled();
  for r = 0 to regs-1
    for e = 0 to elements-1
      result = Abs(SInt(Elem[D[m+r],e,esize]));
      (Elem[D[d+r],e,esize], sat) = SignedSatQ(result, esize);
      if sat then FPSCR.QC = ‘1’;
Exceptions
Undefined Instruction.

A8.6.357 VQADD
Vector Saturating Add adds the values of corresponding elements of two vectors, and places the results in the destination vector. If any of the results overflow, they are saturated. The cumulative saturation flag, QC, is set if saturation occurs. For details see Pseudocode details of saturation on page A2-9.
Encoding T1 / A1    Advanced SIMD
VQADD<c>.<dt> <Qd>, <Qn>, <Qm>
VQADD<c>.<dt> <Dd>, <Dn>, <Dm>
T1: 1 1 1 U 1 1 1 1 0 D size Vn | Vd 0 0 0 0 N Q M 1 Vm
A1: 1 1 1 1 0 0 1 U 0 D size Vn Vd 0 0 0 0 N Q M 1 Vm
if Q == ‘1’ && (Vd<0> == ‘1’ || Vn<0> == ‘1’ || Vm<0> == ‘1’) then UNDEFINED;
unsigned = (U == ‘1’);
esize = 8 << UInt(size); elements = 64 DIV esize;
d = UInt(D:Vd); n = UInt(N:Vn); m = UInt(M:Vm); regs = if Q == ‘0’ then 1 else 2;
Assembler syntax
VQADD<c><q>.<type><size> {<Qd>,} <Qn>, <Qm>    Encoded as Q = 1
VQADD<c><q>.<type><size> {<Dd>,} <Dn>, <Dm>    Encoded as Q = 0
where:
<c><q>  See Standard assembler syntax fields on page A8-7. An ARM VQADD instruction must be unconditional.
<type>  The data type for the elements of the vectors. It must be one of:
        S  signed, encoded as U = 0
        U  unsigned, encoded as U = 1.
<size>  The data size for the elements of the vectors. It must be one of:
        8   encoded as size = 0b00
        16  encoded as size = 0b01
        32  encoded as size = 0b10
        64  encoded as size = 0b11.
<Qd>, <Qn>, <Qm>  The destination vector and the operand vectors, for a quadword operation.
<Dd>, <Dn>, <Dm>  The destination vector and the operand vectors, for a doubleword operation.
Operation
if ConditionPassed() then
  EncodingSpecificOperations(); CheckAdvSIMDEnabled();
  for r = 0 to regs-1
    for e = 0 to elements-1
      sum = Int(Elem[D[n+r],e,esize], unsigned) + Int(Elem[D[m+r],e,esize], unsigned);
      (Elem[D[d+r],e,esize], sat) = SatQ(sum, esize, unsigned);
      if sat then FPSCR.QC = ‘1’;
Exceptions
Undefined Instruction.

A8.6.358 VQDMLAL, VQDMLSL
Vector Saturating Doubling Multiply Accumulate Long multiplies corresponding elements in two doubleword vectors, doubles the products, and accumulates the results into the elements of a quadword vector.
Vector Saturating Doubling Multiply Subtract Long multiplies corresponding elements in two doubleword vectors, subtracts double the products from corresponding elements of a quadword vector, and places the results in the same quadword vector.
In both instructions, the second operand can be a scalar instead of a vector. For more information about scalars see Advanced SIMD scalars on page A7-9.
If any of the results overflow, they are saturated. The cumulative saturation flag, QC, is set if saturation occurs. For details see Pseudocode details of saturation on page A2-9.
Encoding T1 / A1    Advanced SIMD
VQD<op><c>.<dt> <Qd>, <Dn>, <Dm>
T1: 1 1 1 0 1 1 1 1 1 D size Vn | Vd 1 0 op 1 N 0 M 0 Vm
A1: 1 1 1 1 0 0 1 0 1 D size Vn Vd 1 0 op 1 N 0 M 0 Vm
if size == ‘11’ then SEE “Related encodings”;
if size == ‘00’ || Vd<0> == ‘1’ then UNDEFINED;
add = (op == ‘0’); scalar_form = FALSE;
d = UInt(D:Vd); n = UInt(N:Vn); m = UInt(M:Vm);
esize = 8 << UInt(size); elements = 64 DIV esize;
Encoding T2 / A2    Advanced SIMD
VQD<op><c>.<dt> <Qd>, <Dn>, <Dm[x]>
T2: 1 1 1 0 1 1 1 1 1 D size Vn | Vd 0 op 1 1 N 1 M 0 Vm
A2: 1 1 1 1 0 0 1 0 1 D size Vn Vd 0 op 1 1 N 1 M 0 Vm
if size == ‘11’ then SEE “Related encodings”;
if size == ‘00’ || Vd<0> == ‘1’ then UNDEFINED;
add = (op == ‘0’); scalar_form = TRUE;
d = UInt(D:Vd); n = UInt(N:Vn);
if size == ‘01’ then esize = 16; elements = 4; m = UInt(Vm<2:0>); index = UInt(M:Vm<3>);
if size == ‘10’ then esize = 32; elements = 2; m = UInt(Vm); index = UInt(M);
Related encodings  See Advanced SIMD data-processing instructions on page A7-10.
Assembler syntax
VQD<op><c><q>.<dt> <Qd>, <Dn>, <Dm>
VQD<op><c><q>.<dt> <Qd>, <Dn>, <Dm[x]>
where:
<op>  Must be one of:
      MLAL  encoded as op = 0
      MLSL  encoded as op = 1.
<c><q>  See Standard assembler syntax fields on page A8-7. An ARM VQDMLAL or VQDMLSL instruction must be unconditional.
<dt>  The data type for the elements of the operands. It must be one of:
      S16  encoded as size = 0b01
      S32  encoded as size = 0b10.
<Qd>, <Dn>  The destination vector and the first operand vector.
<Dm>  The second operand vector, for an all vector operation.
<Dm[x]>  The scalar for a scalar operation. If <dt> is S16, Dm is restricted to D0-D7. If <dt> is S32, Dm is restricted to D0-D15.
Operation
if ConditionPassed() then
  EncodingSpecificOperations(); CheckAdvSIMDEnabled();
  if scalar_form then op2 = SInt(Elem[D[m],index,esize]);
  for e = 0 to elements-1
    if !scalar_form then op2 = SInt(Elem[D[m],e,esize]);
    op1 = SInt(Elem[D[n],e,esize]);
    // The following only saturates if both op1 and op2 equal -(2^(esize-1))
    (product, sat1) = SignedSatQ(2*op1*op2, 2*esize);
    if add then
      result = SInt(Elem[Q[d>>1],e,2*esize]) + SInt(product);
    else
      result = SInt(Elem[Q[d>>1],e,2*esize]) - SInt(product);
    (Elem[Q[d>>1],e,2*esize], sat2) = SignedSatQ(result, 2*esize);
    if sat1 || sat2 then FPSCR.QC = ‘1’;
Exceptions
Undefined Instruction.

A8.6.359 VQDMULH
Vector Saturating Doubling Multiply Returning High Half multiplies corresponding elements in two vectors, doubles the results, and places the most significant half of the final results in the destination vector. The results are truncated (for rounded results see VQRDMULH on page A8-712).
The second operand can be a scalar instead of a vector. For more information about scalars see Advanced SIMD scalars on page A7-9.
If any of the results overflow, they are saturated. The cumulative saturation flag, QC, is set if saturation occurs. For details see Pseudocode details of saturation on page A2-9.
Encoding T1 / A1    Advanced SIMD
VQDMULH<c>.<dt> <Qd>, <Qn>, <Qm>
VQDMULH<c>.<dt> <Dd>, <Dn>, <Dm>
T1: 1 1 1 0 1 1 1 1 0 D size Vn | Vd 1 0 1 1 N Q M 0 Vm
A1: 1 1 1 1 0 0 1 0 0 D size Vn Vd 1 0 1 1 N Q M 0 Vm
if Q == ‘1’ && (Vd<0> == ‘1’ || Vn<0> == ‘1’ || Vm<0> == ‘1’) then UNDEFINED;
if size == ‘00’ || size == ‘11’ then UNDEFINED;
scalar_form = FALSE; esize = 8 << UInt(size); elements = 64 DIV esize;
d = UInt(D:Vd); n = UInt(N:Vn); m = UInt(M:Vm); regs = if Q == ‘0’ then 1 else 2;
Encoding T2 / A2    Advanced SIMD
VQDMULH<c>.<dt> <Qd>, <Qn>, <Dm[x]>
VQDMULH<c>.<dt> <Dd>, <Dn>, <Dm[x]>
T2: 1 1 1 Q 1 1 1 1 1 D size Vn | Vd 1 1 0 0 N 1 M 0 Vm
A2: 1 1 1 1 0 0 1 Q 1 D size Vn Vd 1 1 0 0 N 1 M 0 Vm
if Q == ‘1’ && (Vd<0> == ‘1’ || Vn<0> == ‘1’) then UNDEFINED;
if size == ‘00’ || size == ‘11’ then UNDEFINED;
scalar_form = TRUE; d = UInt(D:Vd); n = UInt(N:Vn); regs = if Q == ‘0’ then 1 else 2;
if size == ‘01’ then esize = 16; elements = 4; m = UInt(Vm<2:0>); index = UInt(M:Vm<3>);
if size == ‘10’ then esize = 32; elements = 2; m = UInt(Vm); index = UInt(M);
Related encodings  See Advanced SIMD data-processing instructions on page A7-10.
Assembler syntax
VQDMULH<c><q>.<dt> {<Qd>,} <Qn>, <Qm>       Encoding T1 / A1, Q = 1
VQDMULH<c><q>.<dt> {<Dd>,} <Dn>, <Dm>       Encoding T1 / A1, Q = 0
VQDMULH<c><q>.<dt> {<Qd>,} <Qn>, <Dm[x]>    Encoding T2 / A2, Q = 1
VQDMULH<c><q>.<dt> {<Dd>,} <Dn>, <Dm[x]>    Encoding T2 / A2, Q = 0
where:
<c><q>  See Standard assembler syntax fields on page A8-7. An ARM VQDMULH instruction must be unconditional.
<dt>  The data type for the elements of the operands. It must be one of:
      S16  encoded as size = 0b01
      S32  encoded as size = 0b10.
<Qd>, <Qn>  The destination vector and the first operand vector, for a quadword operation.
<Dd>, <Dn>  The destination vector and the first operand vector, for a doubleword operation.
<Qm>  The second operand vector, for a quadword all vector operation.
<Dm>  The second operand vector, for a doubleword all vector operation.
<Dm[x]>  The scalar for either a quadword or a doubleword scalar operation. If <dt> is S16, Dm is restricted to D0-D7. If <dt> is S32, Dm is restricted to D0-D15.
Operation
if ConditionPassed() then
  EncodingSpecificOperations(); CheckAdvSIMDEnabled();
  if scalar_form then op2 = SInt(Elem[D[m],index,esize]);
  for r = 0 to regs-1
    for e = 0 to elements-1
      if !scalar_form then op2 = SInt(Elem[D[m+r],e,esize]);
      op1 = SInt(Elem[D[n+r],e,esize]);
      // The following only saturates if both op1 and op2 equal -(2^(esize-1))
      (result, sat) = SignedSatQ((2*op1*op2) >> esize, esize);
      Elem[D[d+r],e,esize] = result;
      if sat then FPSCR.QC = ‘1’;
Exceptions
Undefined Instruction.

A8.6.360 VQDMULL
Vector Saturating Doubling Multiply Long multiplies corresponding elements in two doubleword vectors, doubles the products, and places the results in a quadword vector.
The second operand can be a scalar instead of a vector. For more information about scalars see Advanced SIMD scalars on page A7-9.
If any of the results overflow, they are saturated. The cumulative saturation flag, QC, is set if saturation occurs. For details see Pseudocode details of saturation on page A2-9.
Encoding T1 / A1    Advanced SIMD
VQDMULL<c>.<dt> <Qd>, <Dn>, <Dm>
T1: 1 1 1 0 1 1 1 1 1 D size Vn | Vd 1 1 0 1 N 0 M 0 Vm
A1: 1 1 1 1 0 0 1 0 1 D size Vn Vd 1 1 0 1 N 0 M 0 Vm
if size == ‘11’ then SEE “Related encodings”;
if size == ‘00’ || Vd<0> == ‘1’ then UNDEFINED;
scalar_form = FALSE; d = UInt(D:Vd); n = UInt(N:Vn); m = UInt(M:Vm);
esize = 8 << UInt(size); elements = 64 DIV esize;
Encoding T2 / A2    Advanced SIMD
VQDMULL<c>.<dt> <Qd>, <Dn>, <Dm[x]>
T2: 1 1 1 0 1 1 1 1 1 D size Vn | Vd 1 0 1 1 N 1 M 0 Vm
A2: 1 1 1 1 0 0 1 0 1 D size Vn Vd 1 0 1 1 N 1 M 0 Vm
if size == ‘11’ then SEE “Related encodings”;
if size == ‘00’ || Vd<0> == ‘1’ then UNDEFINED;
scalar_form = TRUE; d = UInt(D:Vd); n = UInt(N:Vn);
if size == ‘01’ then esize = 16; elements = 4; m = UInt(Vm<2:0>); index = UInt(M:Vm<3>);
if size == ‘10’ then esize = 32; elements = 2; m = UInt(Vm); index = UInt(M);
Related encodings  See Advanced SIMD data-processing instructions on page A7-10.
Assembler syntax
VQDMULL<c><q>.<dt> <Qd>, <Dn>, <Dm>
VQDMULL<c><q>.<dt> <Qd>, <Dn>, <Dm[x]>
where:
<c><q>  See Standard assembler syntax fields on page A8-7. An ARM VQDMULL instruction must be unconditional.
<dt>  The data type for the elements of the operands. It must be one of:
      S16  encoded as size = 0b01
      S32  encoded as size = 0b10.
<Qd>, <Dn>  The destination vector and the first operand vector.
<Dm>  The second operand vector, for an all vector operation.
<Dm[x]>  The scalar for a scalar operation. If <dt> is S16, Dm is restricted to D0-D7. If <dt> is S32, Dm is restricted to D0-D15.
Operation
if ConditionPassed() then
  EncodingSpecificOperations(); CheckAdvSIMDEnabled();
  if scalar_form then op2 = SInt(Elem[D[m],index,esize]);
  for e = 0 to elements-1
    if !scalar_form then op2 = SInt(Elem[D[m],e,esize]);
    op1 = SInt(Elem[D[n],e,esize]);
    // The following only saturates if both op1 and op2 equal -(2^(esize-1))
    (product, sat) = SignedSatQ(2*op1*op2, 2*esize);
    Elem[Q[d>>1],e,2*esize] = product;
    if sat then FPSCR.QC = ‘1’;
Exceptions
Undefined Instruction.

A8.6.361 VQMOVN, VQMOVUN
Vector Saturating Move and Narrow copies each element of the operand vector to the corresponding element of the destination vector.
The operand is a quadword vector. The elements can be any one of:
• 16-bit, 32-bit, or 64-bit signed integers
• 16-bit, 32-bit, or 64-bit unsigned integers.
The result is a doubleword vector. The elements are half the length of the operand vector elements. If the operand is unsigned, the results are unsigned. If the operand is signed, the results can be signed or unsigned.
If any of the results overflow, they are saturated. The cumulative saturation flag, QC, is set if saturation occurs. For details see Pseudocode details of saturation on page A2-9.
Encoding T1 / A1    Advanced SIMD
VQMOV{U}N<c>.<type><size> <Dd>, <Qm>
T1: 1 1 1 1 1 1 1 1 1 D 1 1 size 1 0 | Vd 0 0 1 0 op M 0 Vm
A1: 1 1 1 1 0 0 1 1 1 D 1 1 size 1 0 Vd 0 0 1 0 op M 0 Vm
if op == ‘00’ then SEE VMOVN;
if size == ‘11’ || Vm<0> == ‘1’ then UNDEFINED;
src_unsigned = (op == ‘11’); dest_unsigned = (op<0> == ‘1’);
esize = 8 << UInt(size); elements = 64 DIV esize;
d = UInt(D:Vd); m = UInt(M:Vm);
Assembler syntax
VQMOV{U}N<c><q>.<type><size> <Dd>, <Qm>
where:
U  If present, specifies that the operation produces unsigned results, even though the operands are signed. Encoded as op = 0b01.
<c><q>  See Standard assembler syntax fields on page A8-7. An ARM VQMOVN or VQMOVUN instruction must be unconditional.
<type>  The data type for the elements of the operand. It must be one of:
        S  encoded as:
           • op = 0b10 for VQMOVN
           • op = 0b01 for VQMOVUN.
        U  encoded as op = 0b11. Not available for VQMOVUN.
<size>  The data size for the elements of the operand. It must be one of:
        16  encoded as size = 0b00
        32  encoded as size = 0b01
        64  encoded as size = 0b10.
<Dd>, <Qm>  The destination vector and the operand vector.
Operation
if ConditionPassed() then
  EncodingSpecificOperations(); CheckAdvSIMDEnabled();
  for e = 0 to elements-1
    operand = Int(Elem[Q[m>>1],e,2*esize], src_unsigned);
    (Elem[D[d],e,esize], sat) = SatQ(operand, esize, dest_unsigned);
    if sat then FPSCR.QC = ‘1’;
Exceptions
Undefined Instruction.

A8.6.362 VQNEG
Vector Saturating Negate negates each element in a vector, and places the results in the destination vector. If any of the results overflow, they are saturated. The cumulative saturation flag, QC, is set if saturation occurs. For details see Pseudocode details of saturation on page A2-9.
Encoding T1 / A1    Advanced SIMD
VQNEG<c>.<dt> <Qd>, <Qm>
VQNEG<c>.<dt> <Dd>, <Dm>
T1: 1 1 1 1 1 1 1 1 1 D 1 1 size 0 0 | Vd 0 1 1 1 1 Q M 0 Vm
A1: 1 1 1 1 0 0 1 1 1 D 1 1 size 0 0 Vd 0 1 1 1 1 Q M 0 Vm
if size == ‘11’ then UNDEFINED;
if Q == ‘1’ && (Vd<0> == ‘1’ || Vm<0> == ‘1’) then UNDEFINED;
esize = 8 << UInt(size); elements = 64 DIV esize;
d = UInt(D:Vd); m = UInt(M:Vm); regs = if Q == ‘0’ then 1 else 2;
Assembler syntax
VQNEG<c><q>.<dt> <Qd>, <Qm>    Encoded as Q = 1
VQNEG<c><q>.<dt> <Dd>, <Dm>    Encoded as Q = 0
where:
<c><q>  See Standard assembler syntax fields on page A8-7. An ARM VQNEG instruction must be unconditional.
<dt>  The data type for the elements of the vectors. It must be one of:
      S8   encoded as size = 0b00
      S16  encoded as size = 0b01
      S32  encoded as size = 0b10.
<Qd>, <Qm>  The destination vector and the operand vector, for a quadword operation.
<Dd>, <Dm>  The destination vector and the operand vector, for a doubleword operation.
Operation
if ConditionPassed() then
  EncodingSpecificOperations(); CheckAdvSIMDEnabled();
  for r = 0 to regs-1
    for e = 0 to elements-1
      result = -SInt(Elem[D[m+r],e,esize]);
      (Elem[D[d+r],e,esize], sat) = SignedSatQ(result, esize);
      if sat then FPSCR.QC = ‘1’;
Exceptions
Undefined Instruction.

A8.6.363 VQRDMULH
Vector Saturating Rounding Doubling Multiply Returning High Half multiplies corresponding elements in two vectors, doubles the results, and places the most significant half of the final results in the destination vector. The results are rounded (for truncated results see VQDMULH on page A8-704).
The second operand can be a scalar instead of a vector. For more information about scalars see Advanced SIMD scalars on page A7-9.
If any of the results overflow, they are saturated. The cumulative saturation flag, QC, is set if saturation occurs. For details see Pseudocode details of saturation on page A2-9.
Encoding T1 / A1    Advanced SIMD
VQRDMULH<c>.<dt> <Qd>, <Qn>, <Qm>
VQRDMULH<c>.<dt> <Dd>, <Dn>, <Dm>
T1: 1 1 1 1 1 1 1 1 0 D size Vn | Vd 1 0 1 1 N Q M 0 Vm
A1: 1 1 1 1 0 0 1 1 0 D size Vn Vd 1 0 1 1 N Q M 0 Vm
if Q == ‘1’ && (Vd<0> == ‘1’ || Vn<0> == ‘1’ || Vm<0> == ‘1’) then UNDEFINED;
if size == ‘00’ || size == ‘11’ then UNDEFINED;
scalar_form = FALSE; esize = 8 << UInt(size); elements = 64 DIV esize;
d = UInt(D:Vd); n = UInt(N:Vn); m = UInt(M:Vm); regs = if Q == ‘0’ then 1 else 2;
Encoding T2 / A2    Advanced SIMD
VQRDMULH<c>.<dt> <Qd>, <Qn>, <Dm[x]>
VQRDMULH<c>.<dt> <Dd>, <Dn>, <Dm[x]>
T2: 1 1 1 Q 1 1 1 1 1 D size Vn | Vd 1 1 0 1 N 1 M 0 Vm
A2: 1 1 1 1 0 0 1 Q 1 D size Vn Vd 1 1 0 1 N 1 M 0 Vm
if Q == ‘1’ && (Vd<0> == ‘1’ || Vn<0> == ‘1’) then UNDEFINED;
if size == ‘00’ || size == ‘11’ then UNDEFINED;
scalar_form = TRUE; d = UInt(D:Vd); n = UInt(N:Vn); regs = if Q == ‘0’ then 1 else 2;
if size == ‘01’ then esize = 16; elements = 4; m = UInt(Vm<2:0>); index = UInt(M:Vm<3>);
if size == ‘10’ then esize = 32; elements = 2; m = UInt(Vm); index = UInt(M);
Related encodings  See Advanced SIMD data-processing instructions on page A7-10.
Assembler syntax
VQRDMULH<c><q>.<dt> {<Qd>,} <Qn>, <Qm>       Encoding T1 / A1, Q = 1
VQRDMULH<c><q>.<dt> {<Dd>,} <Dn>, <Dm>       Encoding T1 / A1, Q = 0
VQRDMULH<c><q>.<dt> {<Qd>,} <Qn>, <Dm[x]>    Encoding T2 / A2, Q = 1
VQRDMULH<c><q>.<dt> {<Dd>,} <Dn>, <Dm[x]>    Encoding T2 / A2, Q = 0
where:
<c><q>  See Standard assembler syntax fields on page A8-7. An ARM VQRDMULH instruction must be unconditional.
<dt>  The data type for the elements of the operands. It must be one of:
      S16  encoded as size = 0b01
      S32  encoded as size = 0b10.
<Qd>, <Qn>  The destination vector and the first operand vector, for a quadword operation.
<Dd>, <Dn>  The destination vector and the first operand vector, for a doubleword operation.
<Qm>  The second operand vector, for a quadword all vector operation.
<Dm>  The second operand vector, for a doubleword all vector operation.
<Dm[x]>  The scalar for either a quadword or a doubleword scalar operation. If <dt> is S16, Dm is restricted to D0-D7. If <dt> is S32, Dm is restricted to D0-D15.
Operation
if ConditionPassed() then
  EncodingSpecificOperations(); CheckAdvSIMDEnabled();
  round_const = 1 << (esize-1);
  if scalar_form then op2 = SInt(Elem[D[m],index,esize]);
  for r = 0 to regs-1
    for e = 0 to elements-1
      op1 = SInt(Elem[D[n+r],e,esize]);
      if !scalar_form then op2 = SInt(Elem[D[m+r],e,esize]);
      (result, sat) = SignedSatQ((2*op1*op2 + round_const) >> esize, esize);
      Elem[D[d+r],e,esize] = result;
      if sat then FPSCR.QC = ‘1’;
Exceptions
Undefined Instruction.

A8.6.364 VQRSHL
Vector Saturating Rounding Shift Left takes each element in a vector, shifts it by a value from the least significant byte of the corresponding element of a second vector, and places the results in the destination vector. If the shift value is positive, the operation is a left shift. Otherwise, it is a right shift. For truncated results see VQSHL (register) on page A8-718.
The first operand and result elements are the same data type, and can be any one of:
• 8-bit, 16-bit, 32-bit, or 64-bit signed integers
• 8-bit, 16-bit, 32-bit, or 64-bit unsigned integers.
The second operand is a signed integer of the same size.
If any of the results overflow, they are saturated. The cumulative saturation flag, QC, is set if saturation occurs. For details see Pseudocode details of saturation on page A2-9.
Encoding T1 / A1    Advanced SIMD
VQRSHL<c>.<type><size> <Qd>, <Qm>, <Qn>
VQRSHL<c>.<type><size> <Dd>, <Dm>, <Dn>
T1: 1 1 1 U 1 1 1 1 0 D size Vn | Vd 0 1 0 1 N Q M 1 Vm
A1: 1 1 1 1 0 0 1 U 0 D size Vn Vd 0 1 0 1 N Q M 1 Vm
if Q == ‘1’ && (Vd<0> == ‘1’ || Vm<0> == ‘1’ || Vn<0> == ‘1’) then UNDEFINED;
unsigned = (U == ‘1’);
esize = 8 << UInt(size); elements = 64 DIV esize;
d = UInt(D:Vd); m = UInt(M:Vm); n = UInt(N:Vn); regs = if Q == ‘0’ then 1 else 2;
Assembler syntax
VQRSHL<c><q>.<type><size> {<Qd>,} <Qm>, <Qn>    Encoded as Q = 1
VQRSHL<c><q>.<type><size> {<Dd>,} <Dm>, <Dn>    Encoded as Q = 0
where:
<c><q>  See Standard assembler syntax fields on page A8-7. An ARM VQRSHL instruction must be unconditional.
<type>  The data type for the elements of the vectors. It must be one of:
        S  signed, encoded as U = 0
        U  unsigned, encoded as U = 1.
<size>  The data size for the elements of the vectors. It must be one of:
        8   encoded as size = 0b00
        16  encoded as size = 0b01
        32  encoded as size = 0b10
        64  encoded as size = 0b11.
<Qd>, <Qm>, <Qn>  The destination vector and the operand vectors, for a quadword operation.
<Dd>, <Dm>, <Dn>  The destination vector and the operand vectors, for a doubleword operation.
Operation
if ConditionPassed() then
  EncodingSpecificOperations(); CheckAdvSIMDEnabled();
  for r = 0 to regs-1
    for e = 0 to elements-1
      shift = SInt(Elem[D[n+r],e,esize]<7:0>);
      round_const = 1 << (-1-shift); // 0 for left shift, 2^(n-1) for right shift
      operand = Int(Elem[D[m+r],e,esize], unsigned);
      (result, sat) = SatQ((operand + round_const) << shift, esize, unsigned);
      Elem[D[d+r],e,esize] = result;
      if sat then FPSCR.QC = ‘1’;
Exceptions
Undefined Instruction.

A8.6.365 VQRSHRN, VQRSHRUN
Vector Saturating Rounding Shift Right, Narrow takes each element in a quadword vector of integers, right shifts them by an immediate value, and places the rounded results in a doubleword vector. For truncated results, see VQSHRN, VQSHRUN on page A8-722.
The operand elements must all be the same size, and can be any one of:
• 16-bit, 32-bit, or 64-bit signed integers
• 16-bit, 32-bit, or 64-bit unsigned integers.
The result elements are half the width of the operand elements. If the operand elements are signed, the results can be either signed or unsigned. If the operand elements are unsigned, the result elements must also be unsigned.
If any of the results overflow, they are saturated. The cumulative saturation flag, QC, is set if saturation occurs. For details see Pseudocode details of saturation on page A2-9.
Encoding T1 / A1    Advanced SIMD
VQRSHR{U}N<c>.<type><size> <Dd>, <Qm>, #<imm>
T1: 1 1 1 U 1 1 1 1 1 D imm6 | Vd 1 0 0 op 0 1 M 1 Vm
A1: 1 1 1 1 0 0 1 U 1 D imm6 Vd 1 0 0 op 0 1 M 1 Vm
if imm6 == ‘000xxx’ then SEE “Related encodings”;
if U == ‘0’ && op == ‘0’ then SEE VRSHRN;
if Vm<0> == ‘1’ then UNDEFINED;
case imm6 of
  when ‘001xxx’ esize = 8; elements = 8; shift_amount = 16 - UInt(imm6);
  when ‘01xxxx’ esize = 16; elements = 4; shift_amount = 32 - UInt(imm6);
  when ‘1xxxxx’ esize = 32; elements = 2; shift_amount = 64 - UInt(imm6);
src_unsigned = (U == ‘1’ && op == ‘1’); dest_unsigned = (U == ‘1’);
d = UInt(D:Vd); m = UInt(M:Vm);
Related encodings  See One register and a modified immediate value on page A7-21.
Assembler syntax
VQRSHR{U}N<c><q>.<type><size> <Dd>, <Qm>, #<imm>
where:
U  If present, specifies that the results are unsigned, although the operands are signed.
<c><q>  See Standard assembler syntax fields on page A8-7. An ARM VQRSHRN or VQRSHRUN instruction must be unconditional.
<type>  The data type for the elements of the vectors. It must be one of:
        S  encoded as:
           U = 0, op = 1, if U is absent
           U = 1, op = 0, if U is present
        U  encoded as U = 1, op = 1. Not available for VQRSHRUN.
<size>  The data size for the elements of the vectors. It must be one of:
        16  Encoded as L = ‘0’, imm6<5:3> = ‘001’. (8 - <imm>) is encoded in imm6<2:0>.
        32  Encoded as L = ‘0’, imm6<5:4> = ‘01’. (16 - <imm>) is encoded in imm6<3:0>.
        64  Encoded as L = ‘0’, imm6<5> = ‘1’. (32 - <imm>) is encoded in imm6<4:0>.
    The destination vector and the operand vector.
    The immediate value, in the range 1 to half the data size. See the description of the data size for how the immediate value is encoded.
Operation
    if ConditionPassed() then
        EncodingSpecificOperations(); CheckAdvSIMDEnabled();
        round_const = 1 << (shift_amount - 1);
        for e = 0 to elements-1
            operand = Int(Elem[Q[m>>1],e,2*esize], src_unsigned);
            (result, sat) = SatQ((operand + round_const) >> shift_amount, esize, dest_unsigned);
            Elem[D[d],e,esize] = result;
            if sat then FPSCR.QC = ‘1’;
Exceptions
    Undefined Instruction.
Pseudo-instructions
    VQRSHRN.I , , #0     is a synonym for    VQMOVN.I ,
    VQRSHRUN.I , , #0    is a synonym for    VQMOVUN.I ,
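The VQRSHL Operation pseudocode (the entry that ends just before A8.6.365) can be modelled one element at a time. The Python below is a minimal sketch under stated assumptions, not an ARM-supplied model. `sat_q` and `vqrshl_element` are hypothetical names, and `sat_q` stands in for SatQ(). The operand is passed already converted by Int(), while the shift element is passed as its raw bit pattern. Python's unbounded integers play the role of the pseudocode's integers, and `>>` on a negative integer rounds toward minus infinity.

```python
def sat_q(value, esize, unsigned):
    """Model of SatQ(): clamp an unbounded integer to esize bits; return (result, saturated)."""
    if unsigned:
        lo, hi = 0, (1 << esize) - 1
    else:
        lo, hi = -(1 << (esize - 1)), (1 << (esize - 1)) - 1
    clamped = min(max(value, lo), hi)
    return clamped, clamped != value


def vqrshl_element(operand, shift_element, esize, unsigned):
    """One VQRSHL element (hypothetical helper).

    operand:       Int(Elem[D[m+r],e,esize], unsigned), already an integer
    shift_element: raw bit pattern of Elem[D[n+r],e,esize]
    """
    # shift = SInt(shift_element<7:0>): only the low byte is used, read as signed.
    low_byte = shift_element & 0xFF
    shift = low_byte - 256 if low_byte & 0x80 else low_byte
    # The pseudocode comment says round_const is 0 for a left shift and 2^(k-1)
    # for a right shift by k. Python rejects negative shift counts, so split the cases.
    round_const = 1 << (-1 - shift) if shift < 0 else 0
    value = operand + round_const
    # A negative shift count means a right shift; >> floors, like the pseudocode.
    shifted = value << shift if shift >= 0 else value >> -shift
    return sat_q(shifted, esize, unsigned)


# Signed 8-bit examples:
print(vqrshl_element(5, -1, 8, False))    # (3, False): 5 / 2 = 2.5 rounds to 3
print(vqrshl_element(-7, -2, 8, False))   # (-2, False): -7 / 4 = -1.75 rounds to -2
print(vqrshl_element(100, 1, 8, False))   # (127, True): 200 saturates, so QC would be set
```

Adding round_const before the flooring right shift turns truncation into round-to-nearest with halves rounded up. A left shift adds nothing and relies only on saturation.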
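The VQRSHRN and VQRSHRUN decode table and Operation above can be combined into a sketch for one source Q register. Again this is illustrative Python with hypothetical helper names. Source lanes are raw 2*esize-bit patterns, `to_int` models Int(), and `sat_q` models SatQ(). The returned flag stands in for the cumulative FPSCR.QC bit.

```python
def to_int(bits, width, unsigned):
    """Model of Int(): read a width-bit pattern as unsigned or two's complement."""
    bits &= (1 << width) - 1
    if not unsigned and bits >> (width - 1):
        return bits - (1 << width)
    return bits


def sat_q(value, esize, unsigned):
    """Model of SatQ(): clamp to esize bits, reporting whether clamping happened."""
    lo = 0 if unsigned else -(1 << (esize - 1))
    hi = (1 << esize) - 1 if unsigned else (1 << (esize - 1)) - 1
    clamped = min(max(value, lo), hi)
    return clamped, clamped != value


def decode_narrow_imm6(imm6):
    """Return (esize, elements, shift_amount) following the 'case imm6 of' table."""
    if imm6 & 0b100000:            # '1xxxxx'
        return 32, 2, 64 - imm6
    if imm6 & 0b010000:            # '01xxxx'
        return 16, 4, 32 - imm6
    if imm6 & 0b001000:            # '001xxx'
        return 8, 8, 16 - imm6
    raise ValueError("imm6 == '000xxx': see Related encodings")


def vqrshrn(q_elements, imm6, U, op):
    """Rounding, saturating narrow of one Q register's lanes; returns (results, qc)."""
    if U == 0 and op == 0:
        raise ValueError("U == '0' && op == '0' selects VRSHRN, not VQRSHRN")
    esize, elements, shift_amount = decode_narrow_imm6(imm6)
    if len(q_elements) != elements:
        raise ValueError(f"expected {elements} source lanes")
    src_unsigned, dest_unsigned = (U == 1 and op == 1), (U == 1)
    round_const = 1 << (shift_amount - 1)
    results, qc = [], False
    for bits in q_elements:
        operand = to_int(bits, 2 * esize, src_unsigned)
        result, sat = sat_q((operand + round_const) >> shift_amount, esize, dest_unsigned)
        results.append(result)
        qc = qc or sat
    return results, qc


src = [1000, 0xFFF0, 20, 0x7FFF, 0, 8, 12, 0x8000]   # eight 16-bit lanes of Qm
print(vqrshrn(src, 16 - 3, U=0, op=1))   # VQRSHRN.S16 #3 -> ([125, -2, 3, 127, 0, 1, 2, -128], True)
print(vqrshrn(src, 16 - 3, U=1, op=0))   # VQRSHRUN.S16 #3 -> ([125, 0, 3, 255, 0, 1, 2, 0], True)
```

For a shift of 3 on 16-bit sources, imm6 is 16 - 3 = 13, so the shift is recovered as 16 - UInt(imm6). The signed and unsigned-result forms read the same source lanes identically but saturate into different ranges, so negative results clamp to 0 only for VQRSHRUN.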
A8.6.366 VQSHL (register)
Vector Saturating Shift Left (register) takes each element in a vector, shifts them by a value from the least significant byte of the corresponding element of a second vector, and places the results in the destination vector. If the shift value is positive, the operation is a left shift. Otherwise, it is a right shift. The results are truncated. For rounded results, see VQRSHL on page A8-714.
The first operand and result elements are the same data type, and can be any one of:
• 8-bit, 16-bit, 32-bit, or 64-bit signed integers
• 8-bit, 16-bit, 32-bit, or 64-bit unsigned integers.
The second operand is a signed integer of the same size.
If any of the results overflow, they are saturated. The cumulative saturation flag, QC, is set if saturation occurs. For details see Pseudocode details of saturation on page A2-9.
Encoding T1 / A1 (Advanced SIMD)
    VQSHL. ,,    (Q = 1)
    VQSHL. ,,    (Q = 0)
    T1: 111U 1111 0 D size Vn | Vd 0100 N Q M 1 Vm
    A1: 1111 001U 0 D size Vn Vd 0100 N Q M 1 Vm
    if Q == ‘1’ && (Vd<0> == ‘1’ || Vm<0> == ‘1’ || Vn<0> == ‘1’) then UNDEFINED;
    unsigned = (U == ‘1’);
    esize = 8 << UInt(size); elements = 64 DIV esize;
    d = UInt(D:Vd); m = UInt(M:Vm); n = UInt(N:Vn);
    regs = if Q == ‘0’ then 1 else 2;
Assembler syntax
    VQSHL. {,} ,    Encoded as Q = 1
    VQSHL. {,} ,    Encoded as Q = 0
where:
    See Standard assembler syntax fields on page A8-7. An ARM VQSHL instruction must be unconditional.
    The data type for the elements of the vectors. It must be one of:
        S    signed, encoded as U = 0
        U    unsigned, encoded as U = 1.
    The data size for the elements of the vectors. It must be one of:
        8    encoded as size = 0b00
        16   encoded as size = 0b01
        32   encoded as size = 0b10
        64   encoded as size = 0b11.
    The destination vector and the operand vectors, for a quadword operation.
    The destination vector and the operand vectors, for a doubleword operation.
Operation
    if ConditionPassed() then
        EncodingSpecificOperations(); CheckAdvSIMDEnabled();
        for r = 0 to regs-1
            for e = 0 to elements-1
                shift = SInt(Elem[D[n+r],e,esize]<7:0>);
                operand = Int(Elem[D[m+r],e,esize], unsigned);
                (result,sat) = SatQ(operand << shift, esize, unsigned);
                Elem[D[d+r],e,esize] = result;
                if sat then FPSCR.QC = ‘1’;
Exceptions
    Undefined Instruction.

A8.6.367 VQSHL, VQSHLU (immediate)
Vector Saturating Shift Left (immediate) takes each element in a vector of integers, left shifts them by an immediate value, and places the results in a second vector.
The operand elements must all be the same size, and can be any one of:
• 8-bit, 16-bit, 32-bit, or 64-bit signed integers
• 8-bit, 16-bit, 32-bit, or 64-bit unsigned integers.
The result elements are the same size as the operand elements. If the operand elements are signed, the results can be either signed or unsigned. If the operand elements are unsigned, the result elements must also be unsigned.
If any of the results overflow, they are saturated. The cumulative saturation flag, QC, is set if saturation occurs. For details see Pseudocode details of saturation on page A2-9.
Encoding T1 / A1 (Advanced SIMD)
    VQSHL{U}. ,,#    (Q = 1)
    VQSHL{U}. ,,#    (Q = 0)
    T1: 111U 1111 1 D imm6 | Vd 011 op L Q M 1 Vm
    A1: 1111 001U 1 D imm6 Vd 011 op L Q M 1 Vm
    if L:imm6 == ‘0000xxx’ then SEE “Related encodings”;
    if U == ‘0’ && op == ‘0’ then UNDEFINED;
    if Q == ‘1’ && (Vd<0> == ‘1’ || Vm<0> == ‘1’) then UNDEFINED;
    case L:imm6 of
        when ‘0001xxx’ esize = 8; elements = 8; shift_amount = UInt(imm6) - 8;
        when ‘001xxxx’ esize = 16; elements = 4; shift_amount = UInt(imm6) - 16;
        when ‘01xxxxx’ esize = 32; elements = 2; shift_amount = UInt(imm6) - 32;
        when ‘1xxxxxx’ esize = 64; elements = 1; shift_amount = UInt(imm6);
    src_unsigned = (U == ‘1’ && op == ‘1’); dest_unsigned = (U == ‘1’);
    d = UInt(D:Vd); m = UInt(M:Vm);
    regs = if Q == ‘0’ then 1 else 2;
Related encodings: See One register and a modified immediate value on page A7-21.
Assembler syntax
    VQSHL{U}. {,} , #    Encoded as Q = 1
    VQSHL{U}. {,} , #    Encoded as Q = 0
where:
    U    If present, specifies that the results are unsigned, although the operands are signed.
    See Standard assembler syntax fields on page A8-7. An ARM VQSHL or VQSHLU instruction must be unconditional.
    The data type for the elements of the vectors. It must be one of:
        S    encoded as U = 0, op = 1, if U is absent
             encoded as U = 1, op = 0, if U is present
        U    encoded as U = 1, op = 1. Not available for VQSHLU.
    The data size for the elements of the vectors. It must be one of:
        8    encoded as L = ‘0’, imm6<5:3> = ‘001’; the immediate value is encoded in imm6<2:0>
        16   encoded as L = ‘0’, imm6<5:4> = ‘01’; the immediate value is encoded in imm6<3:0>
        32   encoded as L = ‘0’, imm6<5> = ‘1’; the immediate value is encoded in imm6<4:0>
        64   encoded as L = ‘1’; the immediate value is encoded in imm6<5:0>.
    The destination vector, and the operand vector, for a quadword operation.
    The destination vector, and the operand vector, for a doubleword operation.
    The immediate value, in the range 0 to (data size – 1). See the description of the data size for how the immediate value is encoded.
Operation
    if ConditionPassed() then
        EncodingSpecificOperations(); CheckAdvSIMDEnabled();
        for r = 0 to regs-1
            for e = 0 to elements-1
                operand = Int(Elem[D[m+r],e,esize], src_unsigned);
                (result, sat) = SatQ(operand << shift_amount, esize, dest_unsigned);
                Elem[D[d+r],e,esize] = result;
                if sat then FPSCR.QC = ‘1’;
Exceptions
    Undefined Instruction.

A8.6.368 VQSHRN, VQSHRUN
Vector Saturating Shift Right, Narrow takes each element in a quadword vector of integers, right shifts them by an immediate value, and places the truncated results in a doubleword vector. For rounded results, see VQRSHRN, VQRSHRUN on page A8-716.
The operand elements must all be the same size, and can be any one of:
• 16-bit, 32-bit, or 64-bit signed integers
• 16-bit, 32-bit, or 64-bit unsigned integers.
The result elements are half the width of the operand elements. If the operand elements are signed, the results can be either signed or unsigned. If the operand elements are unsigned, the result elements must also be unsigned.
If any of the results overflow, they are saturated. The cumulative saturation flag, QC, is set if saturation occurs. For details see Pseudocode details of saturation on page A2-9.
Encoding T1 / A1 (Advanced SIMD)
    VQSHR{U}N. , , #
    T1: 111U 1111 1 D imm6 | Vd 100 op 0 0 M 1 Vm
    A1: 1111 001U 1 D imm6 Vd 100 op 0 0 M 1 Vm
    if imm6 == ‘000xxx’ then SEE “Related encodings”;
    if U == ‘0’ && op == ‘0’ then SEE VSHRN;
    if Vm<0> == ‘1’ then UNDEFINED;
    case imm6 of
        when ‘001xxx’ esize = 8; elements = 8; shift_amount = 16 - UInt(imm6);
        when ‘01xxxx’ esize = 16; elements = 4; shift_amount = 32 - UInt(imm6);
        when ‘1xxxxx’ esize = 32; elements = 2; shift_amount = 64 - UInt(imm6);
    src_unsigned = (U == ‘1’ && op == ‘1’); dest_unsigned = (U == ‘1’);
    d = UInt(D:Vd); m = UInt(M:Vm);
Related encodings: See One register and a modified immediate value on page A7-21.
Assembler syntax
    VQSHR{U}N. , , #
where:
    U    If present, specifies that the results are unsigned, although the operands are signed.
    See Standard assembler syntax fields on page A8-7. An ARM VQSHRN or VQSHRUN instruction must be unconditional.
    The data type for the elements of the vectors. It must be one of:
        S    encoded as U = 0, op = 1, if U is absent
             encoded as U = 1, op = 0, if U is present
        U    encoded as U = 1, op = 1. Not available for VQSHRUN.
    The data size for the elements of the vectors. It must be one of:
        16   encoded as imm6<5:3> = ‘001’; (8 – the immediate value) is encoded in imm6<2:0>
        32   encoded as imm6<5:4> = ‘01’; (16 – the immediate value) is encoded in imm6<3:0>
        64   encoded as imm6<5> = ‘1’; (32 – the immediate value) is encoded in imm6<4:0>.
    The destination vector, and the operand vector.
    The immediate value, in the range 1 to half the data size. See the description of the data size for how the immediate value is encoded.
Operation
    if ConditionPassed() then
        EncodingSpecificOperations(); CheckAdvSIMDEnabled();
        for e = 0 to elements-1
            operand = Int(Elem[Q[m>>1],e,2*esize], src_unsigned);
            (result, sat) = SatQ(operand >> shift_amount, esize, dest_unsigned);
            Elem[D[d],e,esize] = result;
            if sat then FPSCR.QC = ‘1’;
Exceptions
    Undefined Instruction.
Pseudo-instructions
    VQSHRN.I , , #0     is a synonym for    VQMOVN.I ,
    VQSHRUN.I , , #0    is a synonym for    VQMOVUN.I ,
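For VQSHL and VQSHLU (immediate), the 7-bit L:imm6 field selects both the element size and the shift. The Python below is a hedged sketch of that decode and of the per-element Operation. The function names are illustrative, and lanes are raw esize-bit patterns. The helpers model Int() and SatQ() as the pseudocode describes them; they are not calls into any ARM library.

```python
def to_int(bits, width, unsigned):
    """Model of Int(): read a width-bit pattern as unsigned or two's complement."""
    bits &= (1 << width) - 1
    if not unsigned and bits >> (width - 1):
        return bits - (1 << width)
    return bits


def sat_q(value, esize, unsigned):
    """Model of SatQ(): clamp to esize bits, reporting whether clamping happened."""
    lo = 0 if unsigned else -(1 << (esize - 1))
    hi = (1 << esize) - 1 if unsigned else (1 << (esize - 1)) - 1
    clamped = min(max(value, lo), hi)
    return clamped, clamped != value


def decode_vqshl_imm(L, imm6):
    """Return (esize, shift_amount) following the 'case L:imm6 of' table."""
    if L:                          # '1xxxxxx'
        return 64, imm6
    if imm6 & 0b100000:            # '01xxxxx'
        return 32, imm6 - 32
    if imm6 & 0b010000:            # '001xxxx'
        return 16, imm6 - 16
    if imm6 & 0b001000:            # '0001xxx'
        return 8, imm6 - 8
    raise ValueError("L:imm6 == '0000xxx': see Related encodings")


def vqshl_imm(lanes, L, imm6, U, op):
    """Saturating left shift of each lane by the decoded immediate; returns (results, qc)."""
    if U == 0 and op == 0:
        raise ValueError("UNDEFINED: U == '0' && op == '0'")
    esize, shift_amount = decode_vqshl_imm(L, imm6)
    src_unsigned, dest_unsigned = (U == 1 and op == 1), (U == 1)
    results, qc = [], False
    for bits in lanes:
        operand = to_int(bits, esize, src_unsigned)
        result, sat = sat_q(operand << shift_amount, esize, dest_unsigned)
        results.append(result)
        qc = qc or sat
    return results, qc


lanes = [0x01, 0x20, 0x40, 0xFF]                 # 8-bit lanes: 1, 32, 64, and -1 when signed
print(vqshl_imm(lanes, 0, 8 + 2, U=0, op=1))     # VQSHL.S8 #2  -> ([4, 127, 127, -4], True)
print(vqshl_imm(lanes, 0, 8 + 2, U=1, op=0))     # VQSHLU.S8 #2 -> ([4, 128, 255, 0], True)
```

With L = 0 and imm6 = 8 + 2, the decode selects 8-bit lanes and a shift of 2. The signed form clamps large results to 127. VQSHLU reads the same lanes as signed but clamps into 0 to 255, so the negative lane becomes 0.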
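VQSHRN and VQRSHRN differ only in the round_const term. Both immediate encodings cover shifts of 1 to half the data size, so a shift of 0 has no encoding of its own. The sketch below compares the two Operation pseudocodes on integer inputs. It also checks that a zero shift reduces to a plain saturating narrow, which is consistent with the #0 forms being listed as synonyms for VQMOVN and VQMOVUN. The `rounding` flag and all function names are assumptions made for this illustration.

```python
def sat_q(value, esize, unsigned):
    """Model of SatQ(): clamp to esize bits, reporting whether clamping happened."""
    lo = 0 if unsigned else -(1 << (esize - 1))
    hi = (1 << esize) - 1 if unsigned else (1 << (esize - 1)) - 1
    clamped = min(max(value, lo), hi)
    return clamped, clamped != value


def narrow_shift(values, shift_amount, esize, dest_unsigned, rounding):
    """VQSHR{U}N (rounding=False) or VQRSHR{U}N (rounding=True) on integer inputs."""
    round_const = 1 << (shift_amount - 1) if rounding and shift_amount > 0 else 0
    results, qc = [], False
    for operand in values:
        result, sat = sat_q((operand + round_const) >> shift_amount, esize, dest_unsigned)
        results.append(result)
        qc = qc or sat
    return results, qc


def saturating_narrow(values, esize, dest_unsigned):
    """A plain saturating narrow (the VQMOVN / VQMOVUN behaviour): no shift."""
    results, qc = [], False
    for v in values:
        result, sat = sat_q(v, esize, dest_unsigned)
        results.append(result)
        qc = qc or sat
    return results, qc


values = [20, -20, 300, -3]                                # signed 16-bit source values
print(narrow_shift(values, 3, 8, False, rounding=False))   # VQSHRN.S16 #3  -> ([2, -3, 37, -1], False)
print(narrow_shift(values, 3, 8, False, rounding=True))    # VQRSHRN.S16 #3 -> ([3, -2, 38, 0], False)
print(narrow_shift(values, 0, 8, False, rounding=True) == saturating_narrow(values, 8, False))  # True
```

Truncation floors toward minus infinity, so -20 / 8 = -2.5 gives -3 for VQSHRN. The rounding form gives -2.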
A8.6.369 VQSUB
Vector Saturating Subtract subtracts the elements of the second operand vector from the corresponding elements of the first operand vector, and places the results in the destination vector. Signed and unsigned operations are distinct.
The operand and result elements must all be the same type, and can be any one of:
• 8-bit, 16-bit, 32-bit, or 64-bit signed integers
• 8-bit, 16-bit, 32-bit, or 64-bit unsigned integers.
If any of the results overflow, they are saturated. The cumulative saturation flag, QC, is set if saturation occurs. For details see Pseudocode details of saturation on page A2-9.
Encoding T1 / A1 (Advanced SIMD)
    VQSUB. , ,    (Q = 1)
    VQSUB. , ,    (Q = 0)
    T1: 111U 1111 0 D size Vn | Vd 0010 N Q M 1 Vm
    A1: 1111 001U 0 D size Vn Vd 0010 N Q M 1 Vm
    if Q == ‘1’ && (Vd<0> == ‘1’ || Vn<0> == ‘1’ || Vm<0> == ‘1’) then UNDEFINED;
    unsigned = (U == ‘1’);
    esize = 8 << UInt(size); elements = 64 DIV esize;
    d = UInt(D:Vd); n = UInt(N:Vn); m = UInt(M:Vm);
    regs = if Q == ‘0’ then 1 else 2;
Assembler syntax
    VQSUB. {,} ,    Encoded as Q = 1
    VQSUB. {,} ,    Encoded as Q = 0
where:
    See Standard assembler syntax fields on page A8-7. An ARM VQSUB instruction must be unconditional.
    The data type for the elements of the vectors. It must be one of:
        S    signed, encoded as U = 0
        U    unsigned, encoded as U = 1.
    The data size for the elements of the vectors. It must be one of:
        8    encoded as size = 0b00
        16   encoded as size = 0b01
        32   encoded as size = 0b10
        64   encoded as size = 0b11.
    The destination vector and the operand vectors, for a quadword operation.
    The destination vector and the operand vectors, for a doubleword operation.
Operation
    if ConditionPassed() then
        EncodingSpecificOperations(); CheckAdvSIMDEnabled();
        for r = 0 to regs-1
            for e = 0 to elements-1
                diff = Int(Elem[D[n+r],e,esize], unsigned) - Int(Elem[D[m+r],e,esize], unsigned);
                (Elem[D[d+r],e,esize], sat) = SatQ(diff, esize, unsigned);
                if sat then FPSCR.QC = ‘1’;
Exceptions
    Undefined Instruction.

A8.6.370 VRADDHN
Vector Rounding Add and Narrow, returning High Half adds corresponding elements in two quadword vectors, and places the most significant half of each result in a doubleword vector. The results are rounded. (For truncated results, see VADDHN on page A8-540.)
The operand elements can be 16-bit, 32-bit, or 64-bit integers. There is no distinction between signed and unsigned integers.
Encoding T1 / A1 (Advanced SIMD)
    VRADDHN. , ,
    T1: 1111 1111 1 D size Vn | Vd 0100 N 0 M 0 Vm
    A1: 1111 0011 1 D size Vn Vd 0100 N 0 M 0 Vm
    if size == ‘11’ then SEE “Related encodings”;
    if Vn<0> == ‘1’ || Vm<0> == ‘1’ then UNDEFINED;
    esize = 8 << UInt(size); elements = 64 DIV esize;
    d = UInt(D:Vd); n = UInt(N:Vn); m = UInt(M:Vm);
Related encodings: See Advanced SIMD data-processing instructions on page A7-10.
Assembler syntax
    VRADDHN. , ,
where:
    See Standard assembler syntax fields on page A8-7. An ARM VRADDHN instruction must be unconditional.
    The data type for the elements of the operands. It must be one of:
        I16   encoded as size = 0b00
        I32   encoded as size = 0b01
        I64   encoded as size = 0b10.
    The destination vector and the operand vectors.
Operation
    if ConditionPassed() then
        EncodingSpecificOperations(); CheckAdvSIMDEnabled();
        round_const = 1 << (esize-1);
        for e = 0 to elements-1
            result = Elem[Q[n>>1],e,2*esize] + Elem[Q[m>>1],e,2*esize] + round_const;
            Elem[D[d],e,esize] = result<2*esize-1:esize>;
Exceptions
    Undefined Instruction.

A8.6.371 VRECPE
Vector Reciprocal Estimate finds an approximate reciprocal of each element in the operand vector, and places the results in the destination vector.
The operand and result elements are the same type, and can be 32-bit floating-point numbers, or 32-bit unsigned integers.
For details of the operation performed by this instruction see Reciprocal estimate and step on page A2-58.
Encoding T1 / A1 (Advanced SIMD; F = 1 UNDEFINED in integer-only variants)
    VRECPE. ,    (Q = 1)
    VRECPE. ,    (Q = 0)
    T1: 1111 1111 1 D 11 size 11 | Vd 010 F 0 Q M 0 Vm
    A1: 1111 0011 1 D 11 size 11 Vd 010 F 0 Q M 0 Vm
    if Q == ‘1’ && (Vd<0> == ‘1’ || Vm<0> == ‘1’) then UNDEFINED;
    if size != ‘10’ then UNDEFINED;
    floating_point = (F == ‘1’); esize = 32; elements = 2;
    d = UInt(D:Vd); m = UInt(M:Vm);
    regs = if Q == ‘0’ then 1 else 2;
Assembler syntax
    VRECPE. ,    Encoded as Q = 1
    VRECPE. ,    Encoded as Q = 0
where:
    See Standard assembler syntax fields on page A8-7. An ARM VRECPE instruction must be unconditional.
    The data types for the elements of the vectors. It must be one of:
        U32   encoded as F = 0, size = 0b10
        F32   encoded as F = 1, size = 0b10.
    The destination vector and the operand vector, for a quadword operation.
    The destination vector and the operand vector, for a doubleword operation.
Operation
    if ConditionPassed() then
        EncodingSpecificOperations(); CheckAdvSIMDEnabled();
        for r = 0 to regs-1
            for e = 0 to elements-1
                if floating_point then
                    Elem[D[d+r],e,esize] = FPRecipEstimate(Elem[D[m+r],e,esize]);
                else
                    Elem[D[d+r],e,esize] = UnsignedRecipEstimate(Elem[D[m+r],e,esize]);
Exceptions
    Undefined Instruction.
    Floating-point exceptions: Input Denormal, Invalid Operation, Underflow, and Division by Zero.
Newton-Raphson iteration
    For details of the operation performed and how it can be used in a Newton-Raphson iteration to calculate the reciprocal of a number, see Reciprocal estimate and step on page A2-58.

A8.6.372 VRECPS
Vector Reciprocal Step multiplies the elements of one vector by the corresponding elements of another vector, subtracts each of the products from 2.0, and places the results into the elements of the destination vector.
The operand and result elements are 32-bit floating-point numbers.
For details of the operation performed by this instruction see Reciprocal estimate and step on page A2-58.
Encoding T1 / A1 (Advanced SIMD; UNDEFINED in integer-only variant)
    VRECPS.F32 , ,    (Q = 1)
    VRECPS.F32 , ,    (Q = 0)
    T1: 1110 1111 0 D 0 sz Vn | Vd 1111 N Q M 1 Vm
    A1: 1111 0010 0 D 0 sz Vn Vd 1111 N Q M 1 Vm
    if Q == ‘1’ && (Vd<0> == ‘1’ || Vn<0> == ‘1’ || Vm<0> == ‘1’) then UNDEFINED;
    if sz == ‘1’ then UNDEFINED;
    esize = 32; elements = 2;
    d = UInt(D:Vd); n = UInt(N:Vn); m = UInt(M:Vm);
    regs = if Q == ‘0’ then 1 else 2;
Assembler syntax
    VRECPS.F32 {,} ,    Encoded as Q = 1
    VRECPS.F32 {,} ,    Encoded as Q = 0
where:
    See Standard assembler syntax fields on page A8-7. An ARM VRECPS instruction must be unconditional.
    The destination vector and the operand vectors, for a quadword operation.
    The destination vector and the operand vectors, for a doubleword operation.
Operation
    if ConditionPassed() then
        EncodingSpecificOperations(); CheckAdvSIMDEnabled();
        for r = 0 to regs-1
            for e = 0 to elements-1
                Elem[D[d+r],e,esize] = FPRecipStep(Elem[D[n+r],e,esize], Elem[D[m+r],e,esize]);
Exceptions
    Undefined Instruction.
    Floating-point exceptions: Input Denormal, Invalid Operation, Overflow, Underflow, and Inexact.
Newton-Raphson iteration
    For details of the operation performed and how it can be used in a Newton-Raphson iteration to calculate the reciprocal of a number, see Reciprocal estimate and step on page A2-58.

A8.6.373 VREV16, VREV32, VREV64
VREV16 (Vector Reverse in halfwords) reverses the order of 8-bit elements in each halfword of the vector, and places the result in the corresponding destination vector.
VREV32 (Vector Reverse in words) reverses the order of 8-bit or 16-bit elements in each word of the vector, and places the result in the corresponding destination vector.
VREV64 (Vector Reverse in doublewords) reverses the order of 8-bit, 16-bit, or 32-bit elements in each doubleword of the vector, and places the result in the corresponding destination vector.
There is no distinction between data types, other than size.
Encoding T1 / A1 (Advanced SIMD)
    VREV. ,    (Q = 1)
    VREV. ,    (Q = 0)
    T1: 1111 1111 1 D 11 size 00 | Vd 000 op Q M 0 Vm
    A1: 1111 0011 1 D 11 size 00 Vd 000 op Q M 0 Vm
    if UInt(op)+UInt(size) >= 3 then UNDEFINED;
    if Q == ‘1’ && (Vd<0> == ‘1’ || Vm<0> == ‘1’) then UNDEFINED;
    esize = 8 << UInt(size); elements = 64 DIV esize;
    groupsize = (1 << (3-UInt(op)-UInt(size))); // elements per reversing group: 2, 4 or 8
    reverse_mask = (groupsize-1); // EORing mask used for index calculations
    d = UInt(D:Vd); m = UInt(M:Vm);
    regs = if Q == ‘0’ then 1 else 2;
Figure A8-6 shows two examples of the operation of VREV.
    [Figure A8-6 Examples of operation. Panels: VREV64.8, doubleword (registers Dm, Dd); VREV64.32, quadword (registers Qm, Qd)]
Assembler syntax
    VREV. ,    Encoded as Q = 1
    VREV. ,    Encoded as Q = 0
where:
    The size of the regions in which the vector elements are reversed. It must be one of:
        16   encoded as op = 0b10
        32   encoded as op = 0b01
        64   encoded as op = 0b00.
    See Standard assembler syntax fields on page A8-7. An ARM VREV instruction must be unconditional.
    The size of the vector elements. It must be one of:
        8    encoded as size = 0b00
        16   encoded as size = 0b01
        32   encoded as size = 0b10.
        The element size must be smaller than the region size.
    The destination vector and the operand vector, for a quadword operation.
    The destination vector and the operand vector, for a doubleword operation.
    If op + size >= 3, the instruction is reserved.
Operation
    if ConditionPassed() then
        EncodingSpecificOperations();
        bits(64) dest;
        CheckAdvSIMDEnabled();
        for r = 0 to regs-1
            for e = 0 to elements-1
                // Calculate destination element index by bitwise EOR on source element index:
                e_bits = e; d_bits = e_bits EOR reverse_mask;
                Elem[dest,UInt(d_bits),esize] = Elem[D[m+r],e,esize];
            D[d+r] = dest;
Exceptions
    Undefined Instruction.

A8.6.374 VRHADD
Vector Rounding Halving Add adds corresponding elements in two vectors of integers, shifts each result right one bit, and places the final results in the destination vector.
The operand and result elements are all the same type, and can be any one of:
• 8-bit, 16-bit, or 32-bit signed integers
• 8-bit, 16-bit, or 32-bit unsigned integers.
The results of the halving operations are rounded (for truncated results see VHADD, VHSUB on page A8-600).
Encoding T1 / A1 (Advanced SIMD)
    VRHADD , ,    (Q = 1)
    VRHADD , ,    (Q = 0)
    T1: 111U 1111 0 D size Vn | Vd 0001 N Q M 0 Vm
    A1: 1111 001U 0 D size Vn Vd 0001 N Q M 0 Vm
    if Q == ‘1’ && (Vd<0> == ‘1’ || Vn<0> == ‘1’ || Vm<0> == ‘1’) then UNDEFINED;
    if size == ‘11’ then UNDEFINED;
    unsigned = (U == ‘1’);
    esize = 8 << UInt(size); elements = 64 DIV esize;
    d = UInt(D:Vd); n = UInt(N:Vn); m = UInt(M:Vm);
    regs = if Q == ‘0’ then 1 else 2;
Assembler syntax
    VRHADD. {,} ,    Encoded as Q = 1
    VRHADD. {,} ,    Encoded as Q = 0
where:
    See Standard assembler syntax fields on page A8-7. An ARM VRHADD instruction must be unconditional.
    The data type for the elements of the vectors. It must be one of:
        S8    encoded as size = 0b00, U = 0
        S16   encoded as size = 0b01, U = 0
        S32   encoded as size = 0b10, U = 0
        U8    encoded as size = 0b00, U = 1
        U16   encoded as size = 0b01, U = 1
        U32   encoded as size = 0b10, U = 1.
    The destination vector and the operand vectors, for a quadword operation.
    The destination vector and the operand vectors, for a doubleword operation.
Operation
    if ConditionPassed() then
        EncodingSpecificOperations(); CheckAdvSIMDEnabled();
        for r = 0 to regs-1
            for e = 0 to elements-1
                op1 = Int(Elem[D[n+r],e,esize], unsigned);
                op2 = Int(Elem[D[m+r],e,esize], unsigned);
                result = op1 + op2 + 1;
                Elem[D[d+r],e,esize] = result<esize:1>;
Exceptions
    Undefined Instruction.

A8.6.375 VRSHL
Vector Rounding Shift Left takes each element in a vector, shifts them by a value from the least significant byte of the corresponding element of a second vector, and places the results in the destination vector. If the shift value is positive, the operation is a left shift. If the shift value is negative, it is a rounding right shift. (For a truncating shift, see VSHL (register) on page A8-752.)
The first operand and result elements are the same data type, and can be any one of:
• 8-bit, 16-bit, 32-bit, or 64-bit signed integers
• 8-bit, 16-bit, 32-bit, or 64-bit unsigned integers.
The second operand is always a signed integer of the same size.
Encoding T1 / A1 (Advanced SIMD)
    VRSHL. , ,    (Q = 1)
    VRSHL. , ,    (Q = 0)
    T1: 111U 1111 0 D size Vn | Vd 0101 N Q M 0 Vm
    A1: 1111 001U 0 D size Vn Vd 0101 N Q M 0 Vm
    if Q == ‘1’ && (Vd<0> == ‘1’ || Vm<0> == ‘1’ || Vn<0> == ‘1’) then UNDEFINED;
    unsigned = (U == ‘1’);
    esize = 8 << UInt(size); elements = 64 DIV esize;
    d = UInt(D:Vd); m = UInt(M:Vm); n = UInt(N:Vn);
    regs = if Q == ‘0’ then 1 else 2;
Assembler syntax
    VRSHL. {,} ,    Encoded as Q = 1
    VRSHL. {,} ,    Encoded as Q = 0
where:
    See Standard assembler syntax fields on page A8-7. An ARM VRSHL instruction must be unconditional.
    The data type for the elements of the vectors. It must be one of:
        S    signed, encoded as U = 0
        U    unsigned, encoded as U = 1.
    The data size for the elements of the vectors. It must be one of:
        8    encoded as size = 0b00
        16   encoded as size = 0b01
        32   encoded as size = 0b10
        64   encoded as size = 0b11.
    The destination vector and the operand vectors, for a quadword operation.
    The destination vector and the operand vectors, for a doubleword operation.
Operation
    if ConditionPassed() then
        EncodingSpecificOperations(); CheckAdvSIMDEnabled();
        for r = 0 to regs-1
            for e = 0 to elements-1
                shift = SInt(Elem[D[n+r],e,esize]<7:0>);
                round_const = 1 << (-shift-1); // 0 for left shift, 2^(n-1) for right shift
                result = (Int(Elem[D[m+r],e,esize], unsigned) + round_const) << shift;
                Elem[D[d+r],e,esize] = result;
Exceptions
    Undefined Instruction.

A8.6.376 VRSHR
Vector Rounding Shift Right takes each element in a vector, right shifts them by an immediate value, and places the rounded results in the destination vector. For truncated results, see VSHR on page A8-756.
The operand and result elements must be the same size, and can be any one of:
• 8-bit, 16-bit, 32-bit, or 64-bit signed integers
• 8-bit, 16-bit, 32-bit, or 64-bit unsigned integers.
Encoding T1 / A1 (Advanced SIMD)
    VRSHR. , , #    (Q = 1)
    VRSHR. , , #    (Q = 0)
    T1: 111U 1111 1 D imm6 | Vd 0010 L Q M 1 Vm
    A1: 1111 001U 1 D imm6 Vd 0010 L Q M 1 Vm
    if L:imm6 == ‘0000xxx’ then SEE “Related encodings”;
    if Q == ‘1’ && (Vd<0> == ‘1’ || Vm<0> == ‘1’) then UNDEFINED;
    case L:imm6 of
        when ‘0001xxx’ esize = 8; elements = 8; shift_amount = 16 - UInt(imm6);
        when ‘001xxxx’ esize = 16; elements = 4; shift_amount = 32 - UInt(imm6);
        when ‘01xxxxx’ esize = 32; elements = 2; shift_amount = 64 - UInt(imm6);
        when ‘1xxxxxx’ esize = 64; elements = 1; shift_amount = 64 - UInt(imm6);
    unsigned = (U == ‘1’); d = UInt(D:Vd); m = UInt(M:Vm);
    regs = if Q == ‘0’ then 1 else 2;
Related encodings: See One register and a modified immediate value on page A7-21.
Assembler syntax
    VRSHR. {,} , #    Encoded as Q = 1
    VRSHR. {,} , #    Encoded as Q = 0
where:
    See Standard assembler syntax fields on page A8-7. An ARM VRSHR instruction must be unconditional.
    The data type for the elements of the vectors. It must be one of:
        S    signed, encoded as U = 0
        U    unsigned, encoded as U = 1.
    The data size for the elements of the vectors. It must be one of:
        8    encoded as L = ‘0’, imm6<5:3> = ‘001’; (8 – the immediate value) is encoded in imm6<2:0>
        16   encoded as L = ‘0’, imm6<5:4> = ‘01’; (16 – the immediate value) is encoded in imm6<3:0>
        32   encoded as L = ‘0’, imm6<5> = ‘1’; (32 – the immediate value) is encoded in imm6<4:0>
        64   encoded as L = ‘1’; (64 – the immediate value) is encoded in imm6<5:0>.
    The destination vector, and the operand vector, for a quadword operation.
    The destination vector, and the operand vector, for a doubleword operation.
    The immediate value, in the range 1 to the data size. See the description of the data size for how the immediate value is encoded.
Operation
    if ConditionPassed() then
        EncodingSpecificOperations(); CheckAdvSIMDEnabled();
        round_const = 1 << (shift_amount - 1);
        for r = 0 to regs-1
            for e = 0 to elements-1
                result = (Int(Elem[D[m+r],e,esize], unsigned) + round_const) >> shift_amount;
                Elem[D[d+r],e,esize] = result;
Exceptions
    Undefined Instruction.
Pseudo-instructions
    VRSHR. , , #0    is a synonym for    VMOV ,
    VRSHR. , , #0    is a synonym for    VMOV ,
    For details see VMOV (register) on page A8-642.

A8.6.377 VRSHRN
Vector Rounding Shift Right and Narrow takes each element in a vector, right shifts them by an immediate value, and places the rounded results in the destination vector. For truncated results, see VSHRN on page A8-758.
The operand elements can be 16-bit, 32-bit, or 64-bit integers. There is no distinction between signed and unsigned integers. The destination elements are half the size of the source elements.
Encoding T1 / A1 (Advanced SIMD)
    VRSHRN.I , , #
    T1: 1110 1111 1 D imm6 | Vd 1000 0 1 M 1 Vm
    A1: 1111 0010 1 D imm6 Vd 1000 0 1 M 1 Vm
    if imm6 == ‘000xxx’ then SEE “Related encodings”;
    if Vm<0> == ‘1’ then UNDEFINED;
    case imm6 of
        when ‘001xxx’ then esize = 8; elements = 8; shift_amount = 16 - UInt(imm6);
        when ‘01xxxx’ then esize = 16; elements = 4; shift_amount = 32 - UInt(imm6);
        when ‘1xxxxx’ then esize = 32; elements = 2; shift_amount = 64 - UInt(imm6);
    d = UInt(D:Vd); m = UInt(M:Vm);
Related encodings: See One register and a modified immediate value on page A7-21.
Assembler syntax
    VRSHRN.I , , #
where:
    See Standard assembler syntax fields on page A8-7. An ARM VRSHRN instruction must be unconditional.
    The data size for the elements of the vectors. It must be one of:
        16   encoded as imm6<5:3> = ‘001’; (8 – the immediate value) is encoded in imm6<2:0>
        32   encoded as imm6<5:4> = ‘01’; (16 – the immediate value) is encoded in imm6<3:0>
        64   encoded as imm6<5> = ‘1’; (32 – the immediate value) is encoded in imm6<4:0>.
<Dd>, <Qm>  The destination vector, and the operand vector.
<imm>  The immediate value, in the range 1 to <size>/2. See the description of <size> for how <imm> is encoded.

Operation
if ConditionPassed() then
    EncodingSpecificOperations(); CheckAdvSIMDEnabled();
    round_const = 1 << (shift_amount-1);
    for e = 0 to elements-1
        result = LSR(Elem[Q[m>>1],e,2*esize] + round_const, shift_amount);
        Elem[D[d],e,esize] = result;

Exceptions
Undefined Instruction.

Pseudo-instructions
VRSHRN.I<size> <Dd>, <Qm>, #0 is a synonym for VMOVN.I<size> <Dd>, <Qm>
For details see VMOVN on page A8-656.

A8.6.378 VRSQRTE
Vector Reciprocal Square Root Estimate finds an approximate reciprocal square root of each element in a vector, and places the results in a second vector.
The operand and result elements are the same type, and can be 32-bit floating-point numbers, or 32-bit unsigned integers.
For details of the operation performed by this instruction see Reciprocal square root on page A2-61.

Encoding T1 / A1 Advanced SIMD (F = 1 UNDEFINED in integer-only variants)
VRSQRTE<c>.<dt> <Qd>, <Qm>
VRSQRTE<c>.<dt> <Dd>, <Dm>
T1: 1111 1111 1 D 11 size 11 | Vd 010 F 1 Q M 0 Vm
A1: 1111 0011 1 D 11 size 11 Vd 010 F 1 Q M 0 Vm

if Q == ‘1’ && (Vd<0> == ‘1’ || Vm<0> == ‘1’) then UNDEFINED;
if size != ‘10’ then UNDEFINED;
floating_point = (F == ‘1’); esize = 32; elements = 2;
d = UInt(D:Vd); m = UInt(M:Vm); regs = if Q == ‘0’ then 1 else 2;

Assembler syntax
VRSQRTE<c><q>.<dt> <Qd>, <Qm>    Encoded as Q = 1
VRSQRTE<c><q>.<dt> <Dd>, <Dm>    Encoded as Q = 0
where:
<c><q>  See Standard assembler syntax fields on page A8-7. An ARM VRSQRTE instruction must be unconditional.
<dt>  The data types for the elements of the vectors. It must be one of:
    U32  Encoded as F = 0, size = 0b10.
    F32  Encoded as F = 1, size = 0b10.
<Qd>, <Qm>  The destination vector and the operand vector, for a quadword operation.
<Dd>, <Dm>  The destination vector and the operand vector, for a doubleword operation.

Operation
if ConditionPassed() then
    EncodingSpecificOperations(); CheckAdvSIMDEnabled();
    for r = 0 to regs-1
        for e = 0 to elements-1
            if floating_point then
                Elem[D[d+r],e,esize] = FPRSqrtEstimate(Elem[D[m+r],e,esize]);
            else
                Elem[D[d+r],e,esize] = UnsignedRSqrtEstimate(Elem[D[m+r],e,esize]);

Exceptions
Undefined Instruction.
Floating-point exceptions: Input Denormal, Invalid Operation, and Division by Zero.

Newton-Raphson iteration
For details of the operation performed and how it can be used in a Newton-Raphson iteration to calculate the reciprocal of the square root of a number, see Reciprocal square root on page A2-61.

A8.6.379 VRSQRTS
Vector Reciprocal Square Root Step multiplies the elements of one vector by the corresponding elements of another vector, subtracts each of the products from 3.0, divides these results by 2.0, and places the results into the elements of the destination vector.
The operand and result elements are 32-bit floating-point numbers.
For details of the operation performed by this instruction see Reciprocal square root on page A2-61.

Encoding T1 / A1 Advanced SIMD (UNDEFINED in integer-only variant)
VRSQRTS<c>.F32 <Qd>, <Qn>, <Qm>
VRSQRTS<c>.F32 <Dd>, <Dn>, <Dm>
T1: 1110 1111 0 D 1 sz Vn | Vd 1111 N Q M 1 Vm
A1: 1111 0010 0 D 1 sz Vn Vd 1111 N Q M 1 Vm

if Q == ‘1’ && (Vd<0> == ‘1’ || Vn<0> == ‘1’ || Vm<0> == ‘1’) then UNDEFINED;
if sz == ‘1’ then UNDEFINED;
esize = 32; elements = 2;
d = UInt(D:Vd); n = UInt(N:Vn); m = UInt(M:Vm); regs = if Q == ‘0’ then 1 else 2;

Assembler syntax
VRSQRTS<c><q>.F32 {<Qd>,} <Qn>, <Qm>    Encoded as Q = 1, sz = 0
VRSQRTS<c><q>.F32 {<Dd>,} <Dn>, <Dm>    Encoded as Q = 0, sz = 0
where:
<c><q>  See Standard assembler syntax fields on page A8-7. An ARM VRSQRTS instruction must be unconditional.
<Qd>, <Qn>, <Qm>  The destination vector and the operand vectors, for a quadword operation.
<Dd>, <Dn>, <Dm>  The destination vector and the operand vectors, for a doubleword operation.

Operation
if ConditionPassed() then
    EncodingSpecificOperations(); CheckAdvSIMDEnabled();
    for r = 0 to regs-1
        for e = 0 to elements-1
            Elem[D[d+r],e,esize] = FPRSqrtStep(Elem[D[n+r],e,esize], Elem[D[m+r],e,esize]);

Exceptions
Undefined Instruction.
Floating-point exceptions: Input Denormal, Invalid Operation, Overflow, Underflow, and Inexact.

Newton-Raphson iteration
For details of the operation performed and how it can be used in a Newton-Raphson iteration to calculate the reciprocal of the square root of a number, see Reciprocal square root on page A2-61.

A8.6.380 VRSRA
Vector Rounding Shift Right and Accumulate takes each element in a vector, right shifts them by an immediate value, and accumulates the rounded results into the destination vector. (For truncated results, see VSRA on page A8-764.)
The operand and result elements must all be the same type, and can be any one of:
• 8-bit, 16-bit, 32-bit, or 64-bit signed integers.
• 8-bit, 16-bit, 32-bit, or 64-bit unsigned integers.

Encoding T1 / A1 Advanced SIMD
VRSRA<c>.<type><size> <Qd>, <Qm>, #<imm>
VRSRA<c>.<type><size> <Dd>, <Dm>, #<imm>
T1: 111U 1111 1 D imm6 | Vd 0011 L Q M 1 Vm
A1: 1111 001U 1 D imm6 Vd 0011 L Q M 1 Vm

if L:imm6 == ‘0000xxx’ then SEE “Related encodings”;
if Q == ‘1’ && (Vd<0> == ‘1’ || Vm<0> == ‘1’) then UNDEFINED;
case L:imm6 of
    when ‘0001xxx’ esize = 8; elements = 8; shift_amount = 16 - UInt(imm6);
    when ‘001xxxx’ esize = 16; elements = 4; shift_amount = 32 - UInt(imm6);
    when ‘01xxxxx’ esize = 32; elements = 2; shift_amount = 64 - UInt(imm6);
    when ‘1xxxxxx’ esize = 64; elements = 1; shift_amount = 64 - UInt(imm6);
unsigned = (U == ‘1’); d = UInt(D:Vd); m = UInt(M:Vm); regs = if Q == ‘0’ then 1 else 2;

Related encodings  See One register and a modified immediate value on page A7-21.

Assembler syntax
VRSRA<c><q>.<type><size> {<Qd>,} <Qm>, #<imm>    Encoded as Q = 1
VRSRA<c><q>.<type><size> {<Dd>,} <Dm>, #<imm>    Encoded as Q = 0
where:
<c><q>  See Standard assembler syntax fields on page A8-7. An ARM VRSRA instruction must be unconditional.
<type>  The data type for the elements of the vectors. It must be one of:
    S  Signed, encoded as U = 0.
    U  Unsigned, encoded as U = 1.
<size>  The data size for the elements of the vectors. It must be one of:
    8   Encoded as L = ‘0’, imm6<5:3> = ‘001’. (8 – <imm>) is encoded in imm6<2:0>.
    16  Encoded as L = ‘0’, imm6<5:4> = ‘01’. (16 – <imm>) is encoded in imm6<3:0>.
    32  Encoded as L = ‘0’, imm6<5> = ‘1’. (32 – <imm>) is encoded in imm6<4:0>.
    64  Encoded as L = ‘1’. (64 – <imm>) is encoded in imm6<5:0>.
<Qd>, <Qm>  The destination vector, and the operand vector, for a quadword operation.
<Dd>, <Dm>  The destination vector, and the operand vector, for a doubleword operation.
<imm>  The immediate value, in the range 1 to <size>. See the description of <size> for how <imm> is encoded.

Operation
if ConditionPassed() then
    EncodingSpecificOperations(); CheckAdvSIMDEnabled();
    round_const = 1 << (shift_amount - 1);
    for r = 0 to regs-1
        for e = 0 to elements-1
            result = (Int(Elem[D[m+r],e,esize], unsigned) + round_const) >> shift_amount;
            Elem[D[d+r],e,esize] = Elem[D[d+r],e,esize] + result;

Exceptions
Undefined Instruction.

A8.6.381 VRSUBHN
Vector Rounding Subtract and Narrow, returning High Half subtracts the elements of one quadword vector from the corresponding elements of another quadword vector, takes the most significant half of each result, and places the final results in a doubleword vector. The results are rounded. (For truncated results, see VSUBHN on page A8-792.)
The operand elements can be 16-bit, 32-bit, or 64-bit integers. There is no distinction between signed and unsigned integers.

Encoding T1 / A1 Advanced SIMD
VRSUBHN<c>.<dt> <Dd>, <Qn>, <Qm>
T1: 1111 1111 1 D size Vn | Vd 0110 N 0 M 0 Vm
A1: 1111 0011 1 D size Vn Vd 0110 N 0 M 0 Vm

if size == ‘11’ then SEE “Related encodings”;
if Vn<0> == ‘1’ || Vm<0> == ‘1’ then UNDEFINED;
esize = 8 << UInt(size); elements = 64 DIV esize;
d = UInt(D:Vd); n = UInt(N:Vn); m = UInt(M:Vm);

Related encodings  See Advanced SIMD data-processing instructions on page A7-10.

Assembler syntax
VRSUBHN<c><q>.<dt> <Dd>, <Qn>, <Qm>
where:
<c><q>  See Standard assembler syntax fields on page A8-7. An ARM VRSUBHN instruction must be unconditional.
<dt>  The data type for the elements of the operands. It must be one of:
    I16  Encoded as size = 0b00.
    I32  Encoded as size = 0b01.
    I64  Encoded as size = 0b10.
<Dd>, <Qn>, <Qm>  The destination vector and the operand vectors.

Operation
if ConditionPassed() then
    EncodingSpecificOperations(); CheckAdvSIMDEnabled();
    round_const = 1 << (esize-1);
    for e = 0 to elements-1
        result = Elem[Q[n>>1],e,2*esize] - Elem[Q[m>>1],e,2*esize] + round_const;
        Elem[D[d],e,esize] = result<2*esize-1:esize>;

Exceptions
Undefined Instruction.

A8.6.382 VSHL (immediate)
Vector Shift Left (immediate) takes each element in a vector of integers, left shifts them by an immediate value, and places the results in the destination vector.
Bits shifted out of the left of each element are lost.
The elements must all be the same size, and can be 8-bit, 16-bit, 32-bit, or 64-bit integers. There is no distinction between signed and unsigned integers.

Encoding T1 / A1 Advanced SIMD
VSHL<c>.I<size> <Qd>, <Qm>, #<imm>
VSHL<c>.I<size> <Dd>, <Dm>, #<imm>
T1: 1110 1111 1 D imm6 | Vd 0101 L Q M 1 Vm
A1: 1111 0010 1 D imm6 Vd 0101 L Q M 1 Vm

if L:imm6 == ‘0000xxx’ then SEE “Related encodings”;
if Q == ‘1’ && (Vd<0> == ‘1’ || Vm<0> == ‘1’) then UNDEFINED;
case L:imm6 of
    when ‘0001xxx’ esize = 8; elements = 8; shift_amount = UInt(imm6) - 8;
    when ‘001xxxx’ esize = 16; elements = 4; shift_amount = UInt(imm6) - 16;
    when ‘01xxxxx’ esize = 32; elements = 2; shift_amount = UInt(imm6) - 32;
    when ‘1xxxxxx’ esize = 64; elements = 1; shift_amount = UInt(imm6);
d = UInt(D:Vd); m = UInt(M:Vm); regs = if Q == ‘0’ then 1 else 2;

Related encodings  See One register and a modified immediate value on page A7-21.

Assembler syntax
VSHL<c><q>.I<size> {<Qd>,} <Qm>, #<imm>    Encoded as Q = 1
VSHL<c><q>.I<size> {<Dd>,} <Dm>, #<imm>    Encoded as Q = 0
where:
<c><q>  See Standard assembler syntax fields on page A8-7. An ARM VSHL instruction must be unconditional.
<size>  The data size for the elements of the vectors. It must be one of:
    8   Encoded as L = ‘0’, imm6<5:3> = ‘001’. <imm> is encoded in imm6<2:0>.
    16  Encoded as L = ‘0’, imm6<5:4> = ‘01’. <imm> is encoded in imm6<3:0>.
    32  Encoded as L = ‘0’, imm6<5> = ‘1’. <imm> is encoded in imm6<4:0>.
    64  Encoded as L = ‘1’. <imm> is encoded in imm6<5:0>.
<Qd>, <Qm>  The destination vector, and the operand vector, for a quadword operation.
<Dd>, <Dm>  The destination vector, and the operand vector, for a doubleword operation.
<imm>  The immediate value, in the range 0 to <size>-1. See the description of <size> for how <imm> is encoded.

Operation
if ConditionPassed() then
    EncodingSpecificOperations(); CheckAdvSIMDEnabled();
    for r = 0 to regs-1
        for e = 0 to elements-1
            Elem[D[d+r],e,esize] = LSL(Elem[D[m+r],e,esize], shift_amount);

Exceptions
Undefined Instruction.

A8.6.383 VSHL (register)
Vector Shift Left (register) takes each element in a vector, shifts them by a value from the least significant byte of the corresponding element of a second vector, and places the results in the destination vector. If the shift value is positive, the operation is a left shift. If the shift value is negative, it is a truncating right shift. (For a rounding shift, see VRSHL on page A8-736.)
The first operand and result elements are the same data type, and can be any one of:
• 8-bit, 16-bit, 32-bit, or 64-bit signed integers.
• 8-bit, 16-bit, 32-bit, or 64-bit unsigned integers.
The second operand is always a signed integer of the same size.

Encoding T1 / A1 Advanced SIMD
VSHL<c>.<type><size> <Qd>, <Qm>, <Qn>
VSHL<c>.<type><size> <Dd>, <Dm>, <Dn>
T1: 111U 1111 0 D size Vn | Vd 0100 N Q M 0 Vm
A1: 1111 001U 0 D size Vn Vd 0100 N Q M 0 Vm

if Q == ‘1’ && (Vd<0> == ‘1’ || Vm<0> == ‘1’ || Vn<0> == ‘1’) then UNDEFINED;
unsigned = (U == ‘1’); esize = 8 << UInt(size); elements = 64 DIV esize;
d = UInt(D:Vd); m = UInt(M:Vm); n = UInt(N:Vn); regs = if Q == ‘0’ then 1 else 2;

Assembler syntax
VSHL<c><q>.<type><size> {<Qd>,} <Qm>, <Qn>    Encoded as Q = 1
VSHL<c><q>.<type><size> {<Dd>,} <Dm>, <Dn>    Encoded as Q = 0
where:
<c><q>  See Standard assembler syntax fields on page A8-7. An ARM VSHL instruction must be unconditional.
<type>  The data type for the elements of the vectors. It must be one of:
    S  Signed, encoded as U = 0.
    U  Unsigned, encoded as U = 1.
<size>  The data size for the elements of the vectors. It must be one of:
    8   Encoded as size = 0b00.
    16  Encoded as size = 0b01.
    32  Encoded as size = 0b10.
    64  Encoded as size = 0b11.
<Qd>, <Qm>, <Qn>  The destination vector and the operand vectors, for a quadword operation.
<Dd>, <Dm>, <Dn>  The destination vector and the operand vectors, for a doubleword operation.

Operation
if ConditionPassed() then
    EncodingSpecificOperations(); CheckAdvSIMDEnabled();
    for r = 0 to regs-1
        for e = 0 to elements-1
            shift = SInt(Elem[D[n+r],e,esize]<7:0>);
            result = Int(Elem[D[m+r],e,esize], unsigned) << shift;
            Elem[D[d+r],e,esize] = result;

Exceptions
Undefined Instruction.

A8.6.384 VSHLL
Vector Shift Left Long takes each element in a doubleword vector, left shifts them by an immediate value, and places the results in a quadword vector.
The operand elements can be:
• 8-bit, 16-bit, or 32-bit signed integers
• 8-bit, 16-bit, or 32-bit unsigned integers
• 8-bit, 16-bit, or 32-bit untyped integers (maximum shift only).
The result elements are twice the length of the operand elements.

Encoding T1 / A1 Advanced SIMD (0 < <imm> < <size>)
VSHLL<c>.<type><size> <Qd>, <Dm>, #<imm>
T1: 111U 1111 1 D imm6 | Vd 1010 0 0 M 1 Vm
A1: 1111 001U 1 D imm6 Vd 1010 0 0 M 1 Vm

if imm6 == ‘000xxx’ then SEE “Related encodings”;
if Vd<0> == ‘1’ then UNDEFINED;
case imm6 of
    when ‘001xxx’ esize = 8; elements = 8; shift_amount = UInt(imm6) - 8;
    when ‘01xxxx’ esize = 16; elements = 4; shift_amount = UInt(imm6) - 16;
    when ‘1xxxxx’ esize = 32; elements = 2; shift_amount = UInt(imm6) - 32;
if shift_amount == 0 then SEE VMOVL;
unsigned = (U == ‘1’); d = UInt(D:Vd); m = UInt(M:Vm);

Encoding T2 / A2 Advanced SIMD (<imm> == <size>)
VSHLL<c>.<type><size> <Qd>, <Dm>, #<imm>
T2: 1111 1111 1 D 11 size 10 | Vd 0011 0 0 M 0 Vm
A2: 1111 0011 1 D 11 size 10 Vd 0011 0 0 M 0 Vm

if size == ‘11’ || Vd<0> == ‘1’ then UNDEFINED;
esize = 8 << UInt(size); elements = 64 DIV esize; shift_amount = esize;
unsigned = FALSE;  // Or TRUE without change of functionality
d = UInt(D:Vd); m = UInt(M:Vm);

Related encodings  See One register and a modified immediate value on page A7-21.

Assembler syntax
VSHLL<c><q>.<type><size> <Qd>, <Dm>, #<imm>
where:
<c><q>  See Standard assembler syntax fields on page A8-7. An ARM VSHLL instruction must be unconditional.
<type>  The data type for the elements of the operand. It must be one of:
    S  Encoded as U = 0 in encoding T1 / A1.
    U  Encoded as U = 1 in encoding T1 / A1.
    I  Available only in encoding T2 / A2.
<size>  The data size for the elements of the operand. It must be one of:
    8   Encoded as imm6<5:3> = ‘001’ or size = ‘00’.
    16  Encoded as imm6<5:4> = ‘01’ or size = ‘01’.
    32  Encoded as imm6<5> = ‘1’ or size = ‘10’.
<Qd>, <Dm>  The destination vector and the operand vector.
<imm>  The immediate value. <imm> must lie in the range 1 to <size>:
    • if <imm> == <size>, the encoding is T2 / A2
    • if <size> == 8, <imm> is encoded in imm6<2:0>
    • if <size> == 16, <imm> is encoded in imm6<3:0>
    • if <size> == 32, <imm> is encoded in imm6<4:0>.

Operation
if ConditionPassed() then
    EncodingSpecificOperations(); CheckAdvSIMDEnabled();
    for e = 0 to elements-1
        result = Int(Elem[D[m],e,esize], unsigned) << shift_amount;
        Elem[Q[d>>1],e,2*esize] = result<2*esize-1:0>;

Exceptions
Undefined Instruction.

A8.6.385 VSHR
Vector Shift Right takes each element in a vector, right shifts them by an immediate value, and places the truncated results in the destination vector. For rounded results, see VRSHR on page A8-738.
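The per-element arithmetic can be sketched in Python to contrast the truncated result of VSHR with the rounded result of VRSHR. This is an illustrative model under stated assumptions, not ARM code:

* `to_signed` and `shift_right_element` are hypothetical names.
* Each call models a single vector element, passed in and returned as a raw esize-bit pattern.
* The rounding branch adds round_const = 1 << (shift_amount - 1) before shifting, following the VRSHR operation pseudocode.

```python
# Hypothetical model of one VSHR (truncating) or VRSHR (rounding) element.
# The names are illustrative only; this is not ARM-provided code.


def to_signed(bits: int, esize: int) -> int:
    """Interpret the low esize bits of bits as a two's complement integer."""
    bits &= (1 << esize) - 1
    return bits - (1 << esize) if bits >> (esize - 1) else bits


def shift_right_element(bits: int, esize: int, shift: int,
                        unsigned: bool, rounding: bool = False) -> int:
    """Return the esize-bit result pattern for one element.

    rounding=False models VSHR; rounding=True models VRSHR, that is
    (Int(element, unsigned) + round_const) >> shift_amount.
    """
    if not 1 <= shift <= esize:
        raise ValueError("shift must be in the range 1 to esize")
    value = bits & ((1 << esize) - 1) if unsigned else to_signed(bits, esize)
    if rounding:
        value += 1 << (shift - 1)  # round_const
    # Python's >> on a negative int rounds toward minus infinity, which
    # matches an arithmetic right shift of a two's complement value.
    result = value >> shift
    return result & ((1 << esize) - 1)  # keep only the low esize bits


# Unsigned byte 0xFF shifted right by 1.
print(hex(shift_right_element(0xFF, 8, 1, unsigned=True)))                 # 0x7f
print(hex(shift_right_element(0xFF, 8, 1, unsigned=True, rounding=True)))  # 0x80
```

Consider a signed byte holding -3 (0xFD) shifted right by 1. Truncation gives 0xFE (-2) and rounding gives 0xFF (-1). This shows that the truncating form always rounds toward minus infinity.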
The operand and result elements must be the same size, and can be any one of:
• 8-bit, 16-bit, 32-bit, or 64-bit signed integers.
• 8-bit, 16-bit, 32-bit, or 64-bit unsigned integers.

Encoding T1 / A1 Advanced SIMD
VSHR<c>.<type><size> <Qd>, <Qm>, #<imm>
VSHR<c>.<type><size> <Dd>, <Dm>, #<imm>
T1: 111U 1111 1 D imm6 | Vd 0000 L Q M 1 Vm
A1: 1111 001U 1 D imm6 Vd 0000 L Q M 1 Vm

if L:imm6 == ‘0000xxx’ then SEE “Related encodings”;
if Q == ‘1’ && (Vd<0> == ‘1’ || Vm<0> == ‘1’) then UNDEFINED;
case L:imm6 of
    when ‘0001xxx’ esize = 8; elements = 8; shift_amount = 16 - UInt(imm6);
    when ‘001xxxx’ esize = 16; elements = 4; shift_amount = 32 - UInt(imm6);
    when ‘01xxxxx’ esize = 32; elements = 2; shift_amount = 64 - UInt(imm6);
    when ‘1xxxxxx’ esize = 64; elements = 1; shift_amount = 64 - UInt(imm6);
unsigned = (U == ‘1’); d = UInt(D:Vd); m = UInt(M:Vm); regs = if Q == ‘0’ then 1 else 2;

Related encodings  See One register and a modified immediate value on page A7-21.

Assembler syntax
VSHR<c><q>.<type><size> {<Qd>,} <Qm>, #<imm>    Encoded as Q = 1
VSHR<c><q>.<type><size> {<Dd>,} <Dm>, #<imm>    Encoded as Q = 0
where:
<c><q>  See Standard assembler syntax fields on page A8-7. An ARM VSHR instruction must be unconditional.
<type>  The data type for the elements of the vectors. It must be one of:
    S  Signed, encoded as U = 0.
    U  Unsigned, encoded as U = 1.
<size>  The data size for the elements of the vectors. It must be one of:
    8   Encoded as L = ‘0’, imm6<5:3> = ‘001’. (8 – <imm>) is encoded in imm6<2:0>.
    16  Encoded as L = ‘0’, imm6<5:4> = ‘01’. (16 – <imm>) is encoded in imm6<3:0>.
    32  Encoded as L = ‘0’, imm6<5> = ‘1’. (32 – <imm>) is encoded in imm6<4:0>.
    64  Encoded as L = ‘1’. (64 – <imm>) is encoded in imm6<5:0>.
<Qd>, <Qm>  The destination vector, and the operand vector, for a quadword operation.
<Dd>, <Dm>  The destination vector, and the operand vector, for a doubleword operation.
<imm>  The immediate value, in the range 1 to <size>. See the description of <size> for how <imm> is encoded.

Operation
if ConditionPassed() then
    EncodingSpecificOperations(); CheckAdvSIMDEnabled();
    for r = 0 to regs-1
        for e = 0 to elements-1
            result = Int(Elem[D[m+r],e,esize], unsigned) >> shift_amount;
            Elem[D[d+r],e,esize] = result;

Exceptions
Undefined Instruction.

Pseudo-instructions
VSHR.<type><size> <Qd>, <Qm>, #0 is a synonym for VMOV <Qd>, <Qm>
VSHR.<type><size> <Dd>, <Dm>, #0 is a synonym for VMOV <Dd>, <Dm>

A8.6.386 VSHRN
Vector Shift Right Narrow takes each element in a vector, right shifts them by an immediate value, and places the truncated results in the destination vector. For rounded results, see VRSHRN on page A8-740.
The operand elements can be 16-bit, 32-bit, or 64-bit integers. There is no distinction between signed and unsigned integers. The destination elements are half the size of the source elements.

Encoding T1 / A1 Advanced SIMD
VSHRN<c>.I<size> <Dd>, <Qm>, #<imm>
T1: 1110 1111 1 D imm6 | Vd 1000 0 0 M 1 Vm
A1: 1111 0010 1 D imm6 Vd 1000 0 0 M 1 Vm

if imm6 == ‘000xxx’ then SEE “Related encodings”;
if Vm<0> == ‘1’ then UNDEFINED;
case imm6 of
    when ‘001xxx’ esize = 8; elements = 8; shift_amount = 16 - UInt(imm6);
    when ‘01xxxx’ esize = 16; elements = 4; shift_amount = 32 - UInt(imm6);
    when ‘1xxxxx’ esize = 32; elements = 2; shift_amount = 64 - UInt(imm6);
d = UInt(D:Vd); m = UInt(M:Vm);

Related encodings  See One register and a modified immediate value on page A7-21.

Assembler syntax
VSHRN<c><q>.I<size> <Dd>, <Qm>, #<imm>
where:
<c><q>  See Standard assembler syntax fields on page A8-7. An ARM VSHRN instruction must be unconditional.
<size>  The data size for the elements of the vectors. It must be one of:
    16  Encoded as imm6<5:3> = ‘001’. (8 – <imm>) is encoded in imm6<2:0>.
    32  Encoded as imm6<5:4> = ‘01’. (16 – <imm>) is encoded in imm6<3:0>.
    64  Encoded as imm6<5> = ‘1’. (32 – <imm>) is encoded in imm6<4:0>.
<Dd>, <Qm>  The destination vector, and the operand vector.
<imm>  The immediate value, in the range 1 to <size>/2. See the description of <size> for how <imm> is encoded.

Operation
if ConditionPassed() then
    EncodingSpecificOperations(); CheckAdvSIMDEnabled();
    for e = 0 to elements-1
        result = LSR(Elem[Q[m>>1],e,2*esize], shift_amount);
        Elem[D[d],e,esize] = result;

Exceptions
Undefined Instruction.

Pseudo-instructions
VSHRN.I<size> <Dd>, <Qm>, #0 is a synonym for VMOVN.I<size> <Dd>, <Qm>
For details see VMOVN on page A8-656.

A8.6.387 VSLI
Vector Shift Left and Insert takes each element in the operand vector, left shifts them by an immediate value, and inserts the results in the destination vector. Bits shifted out of the left of each element are lost.
The elements must all be the same size, and can be 8-bit, 16-bit, 32-bit, or 64-bit. There is no distinction between data types.

Encoding T1 / A1 Advanced SIMD
VSLI<c>.<size> <Qd>, <Qm>, #<imm>
VSLI<c>.<size> <Dd>, <Dm>, #<imm>
T1: 1111 1111 1 D imm6 | Vd 0101 L Q M 1 Vm
A1: 1111 0011 1 D imm6 Vd 0101 L Q M 1 Vm

if L:imm6 == ‘0000xxx’ then SEE “Related encodings”;
if Q == ‘1’ && (Vd<0> == ‘1’ || Vm<0> == ‘1’) then UNDEFINED;
case L:imm6 of
    when ‘0001xxx’ esize = 8; elements = 8; shift_amount = UInt(imm6) - 8;
    when ‘001xxxx’ esize = 16; elements = 4; shift_amount = UInt(imm6) - 16;
    when ‘01xxxxx’ esize = 32; elements = 2; shift_amount = UInt(imm6) - 32;
    when ‘1xxxxxx’ esize = 64; elements = 1; shift_amount = UInt(imm6);
d = UInt(D:Vd); m = UInt(M:Vm); regs = if Q == ‘0’ then 1 else 2;

Related encodings  See One register and a modified immediate value on page A7-21.

Assembler syntax
VSLI<c><q>.<size> {<Qd>,} <Qm>, #<imm>    Encoded as Q = 1
VSLI<c><q>.<size> {<Dd>,} <Dm>, #<imm>    Encoded as Q = 0
where:
<c><q>  See Standard assembler syntax fields on page A8-7. An ARM VSLI instruction must be unconditional.
<size>  The data size for the elements of the vectors. It must be one of:
    8   Encoded as L = ‘0’, imm6<5:3> = ‘001’. <imm> is encoded in imm6<2:0>.
    16  Encoded as L = ‘0’, imm6<5:4> = ‘01’. <imm> is encoded in imm6<3:0>.
    32  Encoded as L = ‘0’, imm6<5> = ‘1’. <imm> is encoded in imm6<4:0>.
    64  Encoded as L = ‘1’. <imm> is encoded in imm6<5:0>.
<Qd>, <Qm>  The destination vector, and the operand vector, for a quadword operation.
<Dd>, <Dm>  The destination vector, and the operand vector, for a doubleword operation.
<imm>  The immediate value, in the range 0 to <size>-1. See the description of <size> for how <imm> is encoded.

Operation
if ConditionPassed() then
    EncodingSpecificOperations(); CheckAdvSIMDEnabled();
    mask = LSL(Ones(esize), shift_amount);
    for r = 0 to regs-1
        for e = 0 to elements-1
            shifted_op = LSL(Elem[D[m+r],e,esize], shift_amount);
            Elem[D[d+r],e,esize] = (Elem[D[d+r],e,esize] AND NOT(mask)) OR shifted_op;

Exceptions
Undefined Instruction.

A8.6.388 VSQRT
This instruction calculates the square root of the value in a floating-point register and writes the result to another floating-point register.

Encoding T1 / A1 VFPv2, VFPv3 (sz = 1 UNDEFINED in single-precision only variants)
VSQRT<c>.F64 <Dd>, <Dm>
VSQRT<c>.F32 <Sd>, <Sm>
T1: 1110 1110 1 D 11 0001 | Vd 101 sz 1 1 M 0 Vm
A1: cond 1110 1 D 11 0001 Vd 101 sz 1 1 M 0 Vm

if FPSCR.LEN != ‘000’ || FPSCR.STRIDE != ‘00’ then SEE “VFP vectors”;
dp_operation = (sz == ‘1’);
d = if dp_operation then UInt(D:Vd) else UInt(Vd:D);
m = if dp_operation then UInt(M:Vm) else UInt(Vm:M);

VFP vectors  This instruction can operate on VFP vectors under control of the FPSCR.LEN and FPSCR.STRIDE bits. For details see Appendix F VFP Vector Operation Support.

Assembler syntax
VSQRT<c><q>.F64 <Dd>, <Dm>    Encoded as sz = 1
VSQRT<c><q>.F32 <Sd>, <Sm>    Encoded as sz = 0
where:
<c><q>  See Standard assembler syntax fields on page A8-7.
<Dd>, <Dm>  The destination vector and the operand vector, for a double-precision operation.
<Sd>, <Sm>  The destination vector and the operand vector, for a single-precision operation.

Operation
if ConditionPassed() then
    EncodingSpecificOperations(); CheckVFPEnabled(TRUE);
    if dp_operation then
        D[d] = FPSqrt(D[m]);
    else
        S[d] = FPSqrt(S[m]);

Exceptions
Undefined Instruction.
Floating-point exceptions: Invalid Operation, Inexact, Input Denormal.

A8.6.389 VSRA
Vector Shift Right and Accumulate takes each element in a vector, right shifts them by an immediate value, and accumulates the truncated results into the destination vector. (For rounded results, see VRSRA on page A8-746.)
The operand and result elements must all be the same type, and can be any one of:
• 8-bit, 16-bit, 32-bit, or 64-bit signed integers.
• 8-bit, 16-bit, 32-bit, or 64-bit unsigned integers.

Encoding T1 / A1 Advanced SIMD
VSRA<c>.<type><size> <Qd>, <Qm>, #<imm>
VSRA<c>.<type><size> <Dd>, <Dm>, #<imm>
T1: 111U 1111 1 D imm6 | Vd 0001 L Q M 1 Vm
A1: 1111 001U 1 D imm6 Vd 0001 L Q M 1 Vm

if L:imm6 == ‘0000xxx’ then SEE “Related encodings”;
if Q == ‘1’ && (Vd<0> == ‘1’ || Vm<0> == ‘1’) then UNDEFINED;
case L:imm6 of
    when ‘0001xxx’ esize = 8; elements = 8; shift_amount = 16 - UInt(imm6);
    when ‘001xxxx’ esize = 16; elements = 4; shift_amount = 32 - UInt(imm6);
    when ‘01xxxxx’ esize = 32; elements = 2; shift_amount = 64 - UInt(imm6);
    when ‘1xxxxxx’ esize = 64; elements = 1; shift_amount = 64 - UInt(imm6);
unsigned = (U == ‘1’); d = UInt(D:Vd); m = UInt(M:Vm); regs = if Q == ‘0’ then 1 else 2;

Related encodings  See One register and a modified immediate value on page A7-21.

Assembler syntax
VSRA<c><q>.<type><size> {<Qd>,} <Qm>, #<imm>    Encoded as Q = 1
VSRA<c><q>.<type><size> {<Dd>,} <Dm>, #<imm>    Encoded as Q = 0
where:
<c><q>  See Standard assembler syntax fields on page A8-7. An ARM VSRA instruction must be unconditional.
<type>  The data type for the elements of the vectors. It must be one of:
    S  Signed, encoded as U = 0.
    U  Unsigned, encoded as U = 1.
<size>  The data size for the elements of the vectors. It must be one of:
    8   Encoded as L = ‘0’, imm6<5:3> = ‘001’. (8 – <imm>) is encoded in imm6<2:0>.
    16  Encoded as L = ‘0’, imm6<5:4> = ‘01’. (16 – <imm>) is encoded in imm6<3:0>.
    32  Encoded as L = ‘0’, imm6<5> = ‘1’. (32 – <imm>) is encoded in imm6<4:0>.
    64  Encoded as L = ‘1’. (64 – <imm>) is encoded in imm6<5:0>.
<Qd>, <Qm>  The destination vector, and the operand vector, for a quadword operation.
<Dd>, <Dm>  The destination vector, and the operand vector, for a doubleword operation.
<imm>  The immediate value, in the range 1 to <size>. See the description of <size> for how <imm> is encoded.

Operation
if ConditionPassed() then
    EncodingSpecificOperations(); CheckAdvSIMDEnabled();
    for r = 0 to regs-1
        for e = 0 to elements-1
            result = Int(Elem[D[m+r],e,esize], unsigned) >> shift_amount;
            Elem[D[d+r],e,esize] = Elem[D[d+r],e,esize] + result;

Exceptions
Undefined Instruction.

A8.6.390 VSRI
Vector Shift Right and Insert takes each element in the operand vector, right shifts them by an immediate value, and inserts the results in the destination vector. Bits shifted out of the right of each element are lost.
The elements must all be the same size, and can be 8-bit, 16-bit, 32-bit, or 64-bit. There is no distinction between data types.

Encoding T1 / A1 Advanced SIMD
VSRI<c>.<size> <Qd>, <Qm>, #<imm>
VSRI<c>.<size> <Dd>, <Dm>, #<imm>
T1: 1111 1111 1 D imm6 | Vd 0100 L Q M 1 Vm
A1: 1111 0011 1 D imm6 Vd 0100 L Q M 1 Vm

if L:imm6 == ‘0000xxx’ then SEE “Related encodings”;
if Q == ‘1’ && (Vd<0> == ‘1’ || Vm<0> == ‘1’) then UNDEFINED;
case L:imm6 of
    when ‘0001xxx’ esize = 8; elements = 8; shift_amount = 16 - UInt(imm6);
    when ‘001xxxx’ esize = 16; elements = 4; shift_amount = 32 - UInt(imm6);
    when ‘01xxxxx’ esize = 32; elements = 2; shift_amount = 64 - UInt(imm6);
    when ‘1xxxxxx’ esize = 64; elements = 1; shift_amount = 64 - UInt(imm6);
d = UInt(D:Vd); m = UInt(M:Vm); regs = if Q == ‘0’ then 1 else 2;

Related encodings  See One register and a modified immediate value on page A7-21.

Assembler syntax
VSRI<c><q>.<size> {<Qd>,} <Qm>, #<imm>    Encoded as Q = 1
VSRI<c><q>.<size> {<Dd>,} <Dm>, #<imm>    Encoded as Q = 0
where:
<c><q>  See Standard assembler syntax fields on page A8-7. An ARM VSRI instruction must be unconditional.
<size>  The data size for the elements of the vectors. It must be one of:
    8   Encoded as L = ‘0’, imm6<5:3> = ‘001’. (8 – <imm>) is encoded in imm6<2:0>.
    16  Encoded as L = ‘0’, imm6<5:4> = ‘01’. (16 – <imm>) is encoded in imm6<3:0>.
    32  Encoded as L = ‘0’, imm6<5> = ‘1’. (32 – <imm>) is encoded in imm6<4:0>.
    64  Encoded as L = ‘1’. (64 – <imm>) is encoded in imm6<5:0>.
<Qd>, <Qm>  The destination vector, and the operand vector, for a quadword operation.
<Dd>, <Dm>   The destination vector, and the operand vector, for a doubleword operation.
<imm>        The immediate value, in the range 1 to <size>. See the description of <size> for how <imm> is encoded.

Operation

    if ConditionPassed() then
        EncodingSpecificOperations(); CheckAdvSIMDEnabled();
        mask = LSR(Ones(esize), shift_amount);
        for r = 0 to regs-1
            for e = 0 to elements-1
                shifted_op = LSR(Elem[D[m+r],e,esize], shift_amount);
                Elem[D[d+r],e,esize] = (Elem[D[d+r],e,esize] AND NOT(mask)) OR shifted_op;

Exceptions

Undefined Instruction.

A8.6.391 VST1 (multiple single elements)

Vector Store (multiple single elements) stores elements to memory from one, two, three, or four registers, without interleaving. Every element of each register is stored. For details of the addressing mode see Advanced SIMD addressing mode on page A7-30.

Encoding T1 / A1    Advanced SIMD

VST1<c>.<size> <list>, [<Rn>{@<align>}]{!}
VST1<c>.<size> <list>, [<Rn>{@<align>}], <Rm>

Thumb (T1):  1 1 1 1 1 0 0 1 0 D 0 0 Rn | Vd type size align Rm
ARM (A1):    1 1 1 1 0 1 0 0 0 D 0 0 Rn Vd type size align Rm

    case type of
        when ‘0111’
            regs = 1; if align<1> == ‘1’ then UNDEFINED;
        when ‘1010’
            regs = 2; if align == ‘11’ then UNDEFINED;
        when ‘0110’
            regs = 3; if align<1> == ‘1’ then UNDEFINED;
        when ‘0010’
            regs = 4;
        otherwise
            SEE “Related encodings”;
    alignment = if align == ‘00’ then 1 else 4 << UInt(align);
    ebytes = 1 << UInt(size); esize = 8 * ebytes; elements = 8 DIV ebytes;
    d = UInt(D:Vd); n = UInt(Rn); m = UInt(Rm);
    wback = (m != 15); register_index = (m != 15 && m != 13);
    if d+regs > 32 then UNPREDICTABLE;

Related encodings

See Advanced SIMD element or structure load/store instructions on page A7-27.

Assembler syntax

VST1<c><q>.<size> <list>, [<Rn>{@<align>}]          Rm = ‘1111’
VST1<c><q>.<size> <list>, [<Rn>{@<align>}]!         Rm = ‘1101’
VST1<c><q>.<size> <list>, [<Rn>{@<align>}], <Rm>    Rm = other values

where:

<c>, <q>   See Standard assembler syntax fields on page A8-7. An ARM VST1 instruction must be unconditional.
<size>     The data size. It must be one of:
               8     encoded as size = 0b00
               16    encoded as size = 0b01
               32    encoded as size = 0b10
               64    encoded as size = 0b11.
<list>     The list of registers to store. It must be one of:
               {<Dd>}                          encoded as D:Vd = <Dd>, type = 0b0111
               {<Dd>, <Dd+1>}                  encoded as D:Vd = <Dd>, type = 0b1010
               {<Dd>, <Dd+1>, <Dd+2>}          encoded as D:Vd = <Dd>, type = 0b0110
               {<Dd>, <Dd+1>, <Dd+2>, <Dd+3>}  encoded as D:Vd = <Dd>, type = 0b0010.
<Rn>       Contains the base address for the access.
<align>    The alignment. It can be one of:
               64       8-byte alignment, encoded as align = 0b01.
               128      16-byte alignment, available only if <list> contains two or four registers, encoded as align = 0b10.
               256      32-byte alignment, available only if <list> contains four registers, encoded as align = 0b11.
               omitted  Standard alignment, see Unaligned data access on page A3-5. Encoded as align = 0b00.
!          If present, specifies writeback.
<Rm>       Contains an address offset applied after the access.

For more information about <Rn>, !, and <Rm>, see Advanced SIMD addressing mode on page A7-30.

Operation

    if ConditionPassed() then
        EncodingSpecificOperations(); CheckAdvSIMDEnabled(); NullCheckIfThumbEE(n);
        address = R[n];
        if (address MOD alignment) != 0 then GenerateAlignmentException();
        if wback then R[n] = R[n] + (if register_index then R[m] else 8*regs);
        for r = 0 to regs-1
            for e = 0 to elements-1
                MemU[address,ebytes] = Elem[D[d+r],e,esize];
                address = address + ebytes;

Exceptions

Undefined Instruction, Data Abort.
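As a reading aid, the VST1 (multiple single elements) decode pseudocode can be transcribed into Python. This is a minimal sketch, not ARM-supplied code. The function name `decode_vst1_multiple`, its dictionary result, and the use of `ValueError` in place of UNDEFINED, UNPREDICTABLE, and SEE "Related encodings" are all assumptions made for illustration.

```python
def decode_vst1_multiple(type_: int, size: int, align: int, d: int, m: int) -> dict:
    """Hypothetical transcription of the VST1 (multiple single elements) decode.

    ValueError stands in for UNDEFINED, UNPREDICTABLE and SEE "Related encodings".
    """
    regs_for_type = {0b0111: 1, 0b1010: 2, 0b0110: 3, 0b0010: 4}
    if type_ not in regs_for_type:
        raise ValueError('SEE "Related encodings"')
    regs = regs_for_type[type_]
    # One or three registers: align<1> must be 0 (no 128-bit or 256-bit alignment).
    if regs in (1, 3) and (align & 0b10):
        raise ValueError("UNDEFINED")
    # Two registers: 256-bit alignment (align == '11') is not allowed.
    if regs == 2 and align == 0b11:
        raise ValueError("UNDEFINED")
    if d + regs > 32:
        raise ValueError("UNPREDICTABLE")
    ebytes = 1 << size
    return {
        "regs": regs,
        "alignment": 1 if align == 0b00 else 4 << align,  # in bytes
        "ebytes": ebytes,
        "elements": 8 // ebytes,
        "wback": m != 15,
        "register_index": m not in (13, 15),
        "imm_increment": 8 * regs,  # writeback amount when register_index is False
    }


# VST1.32 {d0, d1, d2, d3}, [r0@256]!  ->  type '0010', size '10', align '11', Rm = 13
info = decode_vst1_multiple(0b0010, 0b10, 0b11, d=0, m=13)
```

In this example, the `@256` qualifier decodes to a 32-byte alignment requirement, and the `!` form advances the base register by 32 bytes (four 8-byte registers).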
A8.6.392 VST1 (single element from one lane)

This instruction stores one element to memory from one element of a register. For details of the addressing mode see Advanced SIMD addressing mode on page A7-30.

Encoding T1 / A1    Advanced SIMD

VST1<c>.<size> <list>, [<Rn>{@<align>}]{!}
VST1<c>.<size> <list>, [<Rn>{@<align>}], <Rm>

Thumb (T1):  1 1 1 1 1 0 0 1 1 D 0 0 Rn | Vd size 0 0 index_align Rm
ARM (A1):    1 1 1 1 0 1 0 0 1 D 0 0 Rn Vd size 0 0 index_align Rm

    if size == ‘11’ then UNDEFINED;
    case size of
        when ‘00’
            if index_align<0> != ‘0’ then UNDEFINED;
            ebytes = 1; esize = 8; index = UInt(index_align<3:1>); alignment = 1;
        when ‘01’
            if index_align<1> != ‘0’ then UNDEFINED;
            ebytes = 2; esize = 16; index = UInt(index_align<3:2>);
            alignment = if index_align<0> == ‘0’ then 1 else 2;
        when ‘10’
            if index_align<2> != ‘0’ then UNDEFINED;
            if index_align<1:0> != ‘00’ && index_align<1:0> != ‘11’ then UNDEFINED;
            ebytes = 4; esize = 32; index = UInt(index_align<3>);
            alignment = if index_align<1:0> == ‘00’ then 1 else 4;
    d = UInt(D:Vd); n = UInt(Rn); m = UInt(Rm);
    wback = (m != 15); register_index = (m != 15 && m != 13);

Assembler syntax

VST1<c><q>.<size> <list>, [<Rn>{@<align>}]          Rm = ‘1111’
VST1<c><q>.<size> <list>, [<Rn>{@<align>}]!         Rm = ‘1101’
VST1<c><q>.<size> <list>, [<Rn>{@<align>}], <Rm>    Rm = other values

where:

<c>, <q>   See Standard assembler syntax fields on page A8-7. An ARM VST1 instruction must be unconditional.
<size>     The data size. It must be one of:
               8     encoded as size = 0b00
               16    encoded as size = 0b01
               32    encoded as size = 0b10.
<list>     The register containing the element to store. It must be {<Dd[x]>}. The register Dd is encoded in D:Vd.
<Rn>       Contains the base address for the access.
<align>    The alignment. It can be one of:
               16       2-byte alignment, available only if <size> is 16
               32       4-byte alignment, available only if <size> is 32
               omitted  Standard alignment, see Unaligned data access on page A3-5.
!          If present, specifies writeback.
<Rm>       Contains an address offset applied after the access.

For more information about <Rn>, !, and <Rm>, see Advanced SIMD addressing mode on page A7-30.

Table A8-9 shows the encoding of index and alignment for different <size> values.

Table A8-9 Encoding of index and alignment

                    <size> == 8            <size> == 16             <size> == 32
Index               index_align[3:1] = x   index_align[3:2] = x     index_align[3] = x
<align> omitted     index_align[0] = 0     index_align[1:0] = ‘00’  index_align[2:0] = ‘000’
<align> == 16       -                      index_align[1:0] = ‘01’  -
<align> == 32       -                      -                        index_align[2:0] = ‘011’

Operation

    if ConditionPassed() then
        EncodingSpecificOperations(); CheckAdvSIMDEnabled(); NullCheckIfThumbEE(n);
        address = R[n];
        if (address MOD alignment) != 0 then GenerateAlignmentException();
        if wback then R[n] = R[n] + (if register_index then R[m] else ebytes);
        MemU[address,ebytes] = Elem[D[d],index,esize];

Exceptions

Undefined Instruction, Data Abort.
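Table A8-9 and the decode pseudocode for VST1 (single element from one lane) pack the lane index and the alignment hint into the single 4-bit index_align field. The following Python sketch shows how that field splits for each element size. The helper name `decode_vst1_lane` is hypothetical, and `ValueError` stands in for UNDEFINED.

```python
def decode_vst1_lane(size: int, index_align: int) -> tuple:
    """Return (index, alignment in bytes) for VST1 (single element from one lane)."""
    ia = index_align & 0xF
    if size == 0b00:
        if ia & 0b1:                       # index_align<0> must be 0
            raise ValueError("UNDEFINED")
        return ia >> 1, 1                  # index = index_align<3:1>
    if size == 0b01:
        if ia & 0b10:                      # index_align<1> must be 0
            raise ValueError("UNDEFINED")
        return ia >> 2, (2 if ia & 0b1 else 1)
    if size == 0b10:
        # index_align<2> must be 0, and index_align<1:0> must be '00' or '11'
        if (ia & 0b100) or (ia & 0b11) not in (0b00, 0b11):
            raise ValueError("UNDEFINED")
        return ia >> 3, (4 if (ia & 0b11) == 0b11 else 1)
    raise ValueError("UNDEFINED")          # size == '11'
```

For example, `VST1.16 {d5[1]}, [r2@16]` needs lane index 1 and 2-byte alignment, so an assembler would encode index_align = 0b0101, and `decode_vst1_lane(0b01, 0b0101)` recovers `(1, 2)`.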
A8.6.393 VST2 (multiple 2-element structures)

This instruction stores multiple 2-element structures from two or four registers to memory, with interleaving. For more information, see Element and structure load/store instructions on page A4-27. Every element of each register is saved. For details of the addressing mode see Advanced SIMD addressing mode on page A7-30.

Encoding T1 / A1    Advanced SIMD

VST2<c>.<size> <list>, [<Rn>{@<align>}]{!}
VST2<c>.<size> <list>, [<Rn>{@<align>}], <Rm>

Thumb (T1):  1 1 1 1 1 0 0 1 0 D 0 0 Rn | Vd type size align Rm
ARM (A1):    1 1 1 1 0 1 0 0 0 D 0 0 Rn Vd type size align Rm

    if size == ‘11’ then UNDEFINED;
    case type of
        when ‘1000’
            regs = 1; inc = 1; if align == ‘11’ then UNDEFINED;
        when ‘1001’
            regs = 1; inc = 2; if align == ‘11’ then UNDEFINED;
        when ‘0011’
            regs = 2; inc = 2;
        otherwise
            SEE “Related encodings”;
    alignment = if align == ‘00’ then 1 else 4 << UInt(align);
    ebytes = 1 << UInt(size); esize = 8 * ebytes; elements = 8 DIV ebytes;
    d = UInt(D:Vd); d2 = d + inc; n = UInt(Rn); m = UInt(Rm);
    wback = (m != 15); register_index = (m != 15 && m != 13);
    if d2+regs > 32 then UNPREDICTABLE;

Related encodings

See Advanced SIMD element or structure load/store instructions on page A7-27.

Assembler syntax

VST2<c><q>.<size> <list>, [<Rn>{@<align>}]          Rm = ‘1111’
VST2<c><q>.<size> <list>, [<Rn>{@<align>}]!         Rm = ‘1101’
VST2<c><q>.<size> <list>, [<Rn>{@<align>}], <Rm>    Rm = other values

where:

<c>, <q>   See Standard assembler syntax fields on page A8-7. An ARM VST2 instruction must be unconditional.
<size>     The data size. It must be one of:
               8     encoded as size = 0b00
               16    encoded as size = 0b01
               32    encoded as size = 0b10.
<list>     The list of registers to store. It must be one of:
               {<Dd>, <Dd+1>}                  encoded as D:Vd = <Dd>, type = 0b1000
               {<Dd>, <Dd+2>}                  encoded as D:Vd = <Dd>, type = 0b1001
               {<Dd>, <Dd+1>, <Dd+2>, <Dd+3>}  encoded as D:Vd = <Dd>, type = 0b0011.
<Rn>       Contains the base address for the access.
<align>    The alignment. It can be one of:
               64       8-byte alignment, encoded as align = 0b01.
               128      16-byte alignment, encoded as align = 0b10.
               256      32-byte alignment, available only if <list> contains four registers, encoded as align = 0b11.
               omitted  Standard alignment, see Unaligned data access on page A3-5. Encoded as align = 0b00.
!          If present, specifies writeback.
<Rm>       Contains an address offset applied after the access.

For more information about <Rn>, !, and <Rm>, see Advanced SIMD addressing mode on page A7-30.

Operation

    if ConditionPassed() then
        EncodingSpecificOperations(); CheckAdvSIMDEnabled(); NullCheckIfThumbEE(n);
        address = R[n];
        if (address MOD alignment) != 0 then GenerateAlignmentException();
        if wback then R[n] = R[n] + (if register_index then R[m] else 16*regs);
        for r = 0 to regs-1
            for e = 0 to elements-1
                MemU[address,ebytes] = Elem[D[d+r],e,esize];
                MemU[address+ebytes,ebytes] = Elem[D[d2+r],e,esize];
                address = address + 2*ebytes;

Exceptions

Undefined Instruction, Data Abort.
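The VST2 store loop interleaves D[d+r] and D[d2+r] element by element. This is why VST2 is a natural fit for writing two planar channels, such as left and right audio samples, back to one interleaved buffer. In the four-register form (inc = 2), the pairs are (Dd, Dd+2) and then (Dd+1, Dd+3). The Python model below is a sketch under stated assumptions: data memory is little-endian, each register is supplied as a list of its element values, and the function name `vst2_multiple` is hypothetical. The function returns the bytes written from the base address upward.

```python
def vst2_multiple(dregs: list, regs: int, inc: int, ebytes: int) -> bytes:
    """Sketch of the VST2 (multiple 2-element structures) store loop.

    dregs[i] holds the elements of register D(d+i), element 0 first.
    """
    out = bytearray()
    for r in range(regs):
        first, second = dregs[r], dregs[inc + r]  # D[d+r] and D[d2+r], d2 = d + inc
        for e in range(8 // ebytes):
            out += first[e].to_bytes(ebytes, "little")
            out += second[e].to_bytes(ebytes, "little")
    return bytes(out)


# {d0, d1, d2, d3} with 8-bit elements: D0 = 0..7, D1 = 8..15, D2 = 16..23, D3 = 24..31
dregs = [list(range(8 * k, 8 * k + 8)) for k in range(4)]
image = vst2_multiple(dregs, regs=2, inc=2, ebytes=1)
```

In this example, the first 16 bytes interleave D0 with D2, and the next 16 bytes interleave D1 with D3. The total matches the 16*regs writeback increment.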
A8.6.394 VST2 (single 2-element structure from one lane)

This instruction stores one 2-element structure to memory from corresponding elements of two registers. For details of the addressing mode see Advanced SIMD addressing mode on page A7-30.

Encoding T1 / A1    Advanced SIMD

VST2<c>.<size> <list>, [<Rn>{@<align>}]{!}
VST2<c>.<size> <list>, [<Rn>{@<align>}], <Rm>

Thumb (T1):  1 1 1 1 1 0 0 1 1 D 0 0 Rn | Vd size 0 1 index_align Rm
ARM (A1):    1 1 1 1 0 1 0 0 1 D 0 0 Rn Vd size 0 1 index_align Rm

    if size == ‘11’ then UNDEFINED;
    case size of
        when ‘00’
            ebytes = 1; esize = 8; index = UInt(index_align<3:1>); inc = 1;
            alignment = if index_align<0> == ‘0’ then 1 else 2;
        when ‘01’
            ebytes = 2; esize = 16; index = UInt(index_align<3:2>);
            inc = if index_align<1> == ‘0’ then 1 else 2;
            alignment = if index_align<0> == ‘0’ then 1 else 4;
        when ‘10’
            if index_align<1> != ‘0’ then UNDEFINED;
            ebytes = 4; esize = 32; index = UInt(index_align<3>);
            inc = if index_align<2> == ‘0’ then 1 else 2;
            alignment = if index_align<0> == ‘0’ then 1 else 8;
    d = UInt(D:Vd); d2 = d + inc; n = UInt(Rn); m = UInt(Rm);
    wback = (m != 15); register_index = (m != 15 && m != 13);
    if d2 > 31 then UNPREDICTABLE;

Assembler syntax

VST2<c><q>.<size> <list>, [<Rn>{@<align>}]          Rm = ‘1111’
VST2<c><q>.<size> <list>, [<Rn>{@<align>}]!         Rm = ‘1101’
VST2<c><q>.<size> <list>, [<Rn>{@<align>}], <Rm>    Rm = other values

where:

<c>, <q>   See Standard assembler syntax fields on page A8-7. An ARM VST2 instruction must be unconditional.
<size>     The data size. It must be one of:
               8     encoded as size = 0b00
               16    encoded as size = 0b01
               32    encoded as size = 0b10.
<list>     The registers containing the structure. Encoded with D:Vd = <Dd>. It must be one of:
               {<Dd[x]>, <Dd+1[x]>}   Single-spaced registers, see Table A8-10.
               {<Dd[x]>, <Dd+2[x]>}   Double-spaced registers, see Table A8-10. This is not available if <size> == 8.
<Rn>       Contains the base address for the access.
<align>    The alignment. It can be one of:
               16       2-byte alignment, available only if <size> is 8
               32       4-byte alignment, available only if <size> is 16
               64       8-byte alignment, available only if <size> is 32
               omitted  Standard alignment, see Unaligned data access on page A3-5.
!          If present, specifies writeback.
<Rm>       Contains an address offset applied after the access.

For more information about <Rn>, !, and <Rm>, see Advanced SIMD addressing mode on page A7-30.

Table A8-10 Encoding of index, alignment, and register spacing

                    <size> == 8            <size> == 16           <size> == 32
Index               index_align[3:1] = x   index_align[3:2] = x   index_align[3] = x
Single-spacing      -                      index_align[1] = 0     index_align[2] = 0
Double-spacing      -                      index_align[1] = 1     index_align[2] = 1
<align> omitted     index_align[0] = 0     index_align[0] = 0     index_align[1:0] = ‘00’
<align> == 16       index_align[0] = 1     -                      -
<align> == 32       -                      index_align[0] = 1     -
<align> == 64       -                      -                      index_align[1:0] = ‘01’

Operation

    if ConditionPassed() then
        EncodingSpecificOperations(); CheckAdvSIMDEnabled(); NullCheckIfThumbEE(n);
        address = R[n];
        if (address MOD alignment) != 0 then GenerateAlignmentException();
        if wback then R[n] = R[n] + (if register_index then R[m] else 2*ebytes);
        MemU[address,ebytes] = Elem[D[d],index,esize];
        MemU[address+ebytes,ebytes] = Elem[D[d2],index,esize];

Exceptions

Undefined Instruction, Data Abort.
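For VST2 (single 2-element structure from one lane), index_align also selects the register spacing. The hypothetical Python helper below follows the decode pseudocode and Table A8-10. It returns the lane index, the second register number d2, and the required alignment in bytes. `ValueError` stands in for UNDEFINED and UNPREDICTABLE.

```python
def decode_vst2_lane(size: int, index_align: int, d: int) -> tuple:
    """Return (index, d2, alignment in bytes) for VST2 (single lane)."""
    ia = index_align & 0xF
    if size == 0b00:
        index, inc, alignment = ia >> 1, 1, (2 if ia & 0b1 else 1)
    elif size == 0b01:
        index = ia >> 2
        inc = 2 if ia & 0b10 else 1          # index_align<1> selects double spacing
        alignment = 4 if ia & 0b1 else 1
    elif size == 0b10:
        if ia & 0b10:                        # index_align<1> must be 0
            raise ValueError("UNDEFINED")
        index = ia >> 3
        inc = 2 if ia & 0b100 else 1         # index_align<2> selects double spacing
        alignment = 8 if ia & 0b1 else 1
    else:
        raise ValueError("UNDEFINED")
    d2 = d + inc
    if d2 > 31:
        raise ValueError("UNPREDICTABLE")
    return index, d2, alignment
```

For example, `VST2.16 {d4[3], d6[3]}, [r1]` is double-spaced with no alignment qualifier, which corresponds to index_align = 0b1110. Decoding that value with d = 4 gives index 3 and d2 = 6.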
A8.6.395 VST3 (multiple 3-element structures)

This instruction stores multiple 3-element structures to memory from three registers, with interleaving. For more information, see Element and structure load/store instructions on page A4-27. Every element of each register is saved. For details of the addressing mode see Advanced SIMD addressing mode on page A7-30.

Encoding T1 / A1    Advanced SIMD

VST3<c>.<size> <list>, [<Rn>{@<align>}]{!}
VST3<c>.<size> <list>, [<Rn>{@<align>}], <Rm>

Thumb (T1):  1 1 1 1 1 0 0 1 0 D 0 0 Rn | Vd type size align Rm
ARM (A1):    1 1 1 1 0 1 0 0 0 D 0 0 Rn Vd type size align Rm

    if size == ‘11’ || align<1> == ‘1’ then UNDEFINED;
    case type of
        when ‘0100’
            inc = 1;
        when ‘0101’
            inc = 2;
        otherwise
            SEE “Related encodings”;
    alignment = if align<0> == ‘0’ then 1 else 8;
    ebytes = 1 << UInt(size); esize = 8 * ebytes; elements = 8 DIV ebytes;
    d = UInt(D:Vd); d2 = d + inc; d3 = d2 + inc; n = UInt(Rn); m = UInt(Rm);
    wback = (m != 15); register_index = (m != 15 && m != 13);
    if d3 > 31 then UNPREDICTABLE;

Related encodings

See Advanced SIMD element or structure load/store instructions on page A7-27.

Assembler syntax

VST3<c><q>.<size> <list>, [<Rn>{@<align>}]          Rm = ‘1111’
VST3<c><q>.<size> <list>, [<Rn>{@<align>}]!         Rm = ‘1101’
VST3<c><q>.<size> <list>, [<Rn>{@<align>}], <Rm>    Rm = other values

where:

<c>, <q>   See Standard assembler syntax fields on page A8-7. An ARM VST3 instruction must be unconditional.
<size>     The data size. It must be one of:
               8     encoded as size = 0b00
               16    encoded as size = 0b01
               32    encoded as size = 0b10.
<list>     The list of registers to store. It must be one of:
               {<Dd>, <Dd+1>, <Dd+2>}   encoded as D:Vd = <Dd>, type = 0b0100
               {<Dd>, <Dd+2>, <Dd+4>}   encoded as D:Vd = <Dd>, type = 0b0101.
<Rn>       Contains the base address for the access.
<align>    The alignment. It can be:
               64       8-byte alignment, encoded as align = 0b01.
               omitted  Standard alignment, see Unaligned data access on page A3-5. Encoded as align = 0b00.
!          If present, specifies writeback.
<Rm>       Contains an address offset applied after the access.

For more information about <Rn>, !, and <Rm>, see Advanced SIMD addressing mode on page A7-30.

Operation

    if ConditionPassed() then
        EncodingSpecificOperations(); CheckAdvSIMDEnabled(); NullCheckIfThumbEE(n);
        address = R[n];
        if (address MOD alignment) != 0 then GenerateAlignmentException();
        if wback then R[n] = R[n] + (if register_index then R[m] else 24);
        for e = 0 to elements-1
            MemU[address,ebytes] = Elem[D[d],e,esize];
            MemU[address+ebytes,ebytes] = Elem[D[d2],e,esize];
            MemU[address+2*ebytes,ebytes] = Elem[D[d3],e,esize];
            address = address + 3*ebytes;

Exceptions

Undefined Instruction, Data Abort.
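A common use of VST3 is converting three planar channels, such as separate R, G, and B byte vectors, into packed RGB pixels. The Python model below sketches the store loop for any spacing. It assumes little-endian data memory and registers supplied as element lists, and the name `vst3_multiple` is hypothetical.

```python
def vst3_multiple(d1: list, d2: list, d3: list, ebytes: int) -> bytes:
    """Model the VST3 (multiple 3-element structures) store loop.

    d1, d2, d3 are the element lists of D[d], D[d2] and D[d3]. Returns the
    24 bytes written from the base address upward.
    """
    out = bytearray()
    for e in range(8 // ebytes):
        for reg in (d1, d2, d3):
            out += reg[e].to_bytes(ebytes, "little")
    return bytes(out)


red = list(range(0, 8))
green = list(range(10, 18))
blue = list(range(20, 28))
packed = vst3_multiple(red, green, blue, ebytes=1)
```

Each register contributes 8 bytes, so the function always produces 24 bytes. This matches the fixed writeback increment of 24 in the Operation pseudocode.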
A8.6.396 VST3 (single 3-element structure from one lane)

This instruction stores one 3-element structure to memory from corresponding elements of three registers. For details of the addressing mode see Advanced SIMD addressing mode on page A7-30.

Encoding T1 / A1    Advanced SIMD

VST3<c>.<size> <list>, [<Rn>]{!}
VST3<c>.<size> <list>, [<Rn>], <Rm>

Thumb (T1):  1 1 1 1 1 0 0 1 1 D 0 0 Rn | Vd size 1 0 index_align Rm
ARM (A1):    1 1 1 1 0 1 0 0 1 D 0 0 Rn Vd size 1 0 index_align Rm

    if size == ‘11’ then UNDEFINED;
    case size of
        when ‘00’
            if index_align<0> != ‘0’ then UNDEFINED;
            ebytes = 1; esize = 8; index = UInt(index_align<3:1>); inc = 1;
        when ‘01’
            if index_align<0> != ‘0’ then UNDEFINED;
            ebytes = 2; esize = 16; index = UInt(index_align<3:2>);
            inc = if index_align<1> == ‘0’ then 1 else 2;
        when ‘10’
            if index_align<1:0> != ‘00’ then UNDEFINED;
            ebytes = 4; esize = 32; index = UInt(index_align<3>);
            inc = if index_align<2> == ‘0’ then 1 else 2;
    d = UInt(D:Vd); d2 = d + inc; d3 = d2 + inc; n = UInt(Rn); m = UInt(Rm);
    wback = (m != 15); register_index = (m != 15 && m != 13);
    if d3 > 31 then UNPREDICTABLE;

Assembler syntax

VST3<c><q>.<size> <list>, [<Rn>]          Rm = ‘1111’
VST3<c><q>.<size> <list>, [<Rn>]!         Rm = ‘1101’
VST3<c><q>.<size> <list>, [<Rn>], <Rm>    Rm = other values

where:

<c>, <q>   See Standard assembler syntax fields on page A8-7. An ARM VST3 instruction must be unconditional.
<size>     The data size. It must be one of:
               8     encoded as size = 0b00
               16    encoded as size = 0b01
               32    encoded as size = 0b10.
<list>     The registers containing the structure. Encoded with D:Vd = <Dd>. It must be one of:
               {<Dd[x]>, <Dd+1[x]>, <Dd+2[x]>}   Single-spaced registers, see Table A8-11.
               {<Dd[x]>, <Dd+2[x]>, <Dd+4[x]>}   Double-spaced registers, see Table A8-11. This is not available if <size> == 8.
<Rn>       Contains the base address for the access.
!          If present, specifies writeback.
<Rm>       Contains an address offset applied after the access.

For more information about <Rn>, !, and <Rm>, see Advanced SIMD addressing mode on page A7-30.

Table A8-11 Encoding of index and register spacing

                    <size> == 8            <size> == 16             <size> == 32
Index               index_align[3:1] = x   index_align[3:2] = x     index_align[3] = x
Single-spacing      index_align[0] = 0     index_align[1:0] = ‘00’  index_align[2:0] = ‘000’
Double-spacing      -                      index_align[1:0] = ‘10’  index_align[2:0] = ‘100’

Alignment

Standard alignment rules apply, see Unaligned data access on page A3-5.

Operation

    if ConditionPassed() then
        EncodingSpecificOperations(); CheckAdvSIMDEnabled(); NullCheckIfThumbEE(n);
        address = R[n];
        if wback then R[n] = R[n] + (if register_index then R[m] else 3*ebytes);
        MemU[address,ebytes] = Elem[D[d],index,esize];
        MemU[address+ebytes,ebytes] = Elem[D[d2],index,esize];
        MemU[address+2*ebytes,ebytes] = Elem[D[d3],index,esize];

Exceptions

Undefined Instruction, Data Abort.
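VST3 (single lane) has no alignment qualifier, so index_align carries only the lane index and the register spacing. The following hypothetical Python sketch combines the decode pseudocode with the three MemU stores. It assumes little-endian memory and a list `dregs` in which `dregs[i]` holds the elements of D(d+i). `ValueError` stands in for UNDEFINED.

```python
def vst3_lane(dregs: list, size: int, index_align: int) -> bytes:
    """Sketch of VST3 (single 3-element structure from one lane).

    Returns the 3*ebytes bytes written from the base address upward.
    """
    ia = index_align & 0xF
    if size == 0b00:
        if ia & 0b1:
            raise ValueError("UNDEFINED")
        ebytes, index, inc = 1, ia >> 1, 1
    elif size == 0b01:
        if ia & 0b1:
            raise ValueError("UNDEFINED")
        ebytes, index, inc = 2, ia >> 2, (2 if ia & 0b10 else 1)
    elif size == 0b10:
        if ia & 0b11:
            raise ValueError("UNDEFINED")
        ebytes, index, inc = 4, ia >> 3, (2 if ia & 0b100 else 1)
    else:
        raise ValueError("UNDEFINED")
    out = bytearray()
    for k in range(3):  # D[d], D[d2] = D[d+inc], D[d3] = D[d+2*inc]
        out += dregs[k * inc][index].to_bytes(ebytes, "little")
    return bytes(out)


# Five registers D(d)..D(d+4) holding 16-bit elements 10*i + e
dregs = [[10 * i + e for e in range(4)] for i in range(5)]
stored = vst3_lane(dregs, size=0b01, index_align=0b1110)
```

In this example, index_align = 0b1110 selects lane 3 with double spacing. The stored bytes therefore come from D(d), D(d+2), and D(d+4), as in `VST3.16 {d0[3], d2[3], d4[3]}, [r0]` when d = 0.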
A8.6.397 VST4 (multiple 4-element structures)

This instruction stores multiple 4-element structures to memory from four registers, with interleaving. For more information, see Element and structure load/store instructions on page A4-27. Every element of each register is saved. For details of the addressing mode see Advanced SIMD addressing mode on page A7-30.

Encoding T1 / A1    Advanced SIMD

VST4<c>.<size> <list>, [<Rn>{@<align>}]{!}
VST4<c>.<size> <list>, [<Rn>{@<align>}], <Rm>

Thumb (T1):  1 1 1 1 1 0 0 1 0 D 0 0 Rn | Vd type size align Rm
ARM (A1):    1 1 1 1 0 1 0 0 0 D 0 0 Rn Vd type size align Rm

    if size == ‘11’ then UNDEFINED;
    case type of
        when ‘0000’
            inc = 1;
        when ‘0001’
            inc = 2;
        otherwise
            SEE “Related encodings”;
    alignment = if align == ‘00’ then 1 else 4 << UInt(align);
    ebytes = 1 << UInt(size); esize = 8 * ebytes; elements = 8 DIV ebytes;
    d = UInt(D:Vd); d2 = d + inc; d3 = d2 + inc; d4 = d3 + inc; n = UInt(Rn); m = UInt(Rm);
    wback = (m != 15); register_index = (m != 15 && m != 13);
    if d4 > 31 then UNPREDICTABLE;

Related encodings

See Advanced SIMD element or structure load/store instructions on page A7-27.

Assembler syntax

VST4<c><q>.<size> <list>, [<Rn>{@<align>}]          Rm = ‘1111’
VST4<c><q>.<size> <list>, [<Rn>{@<align>}]!         Rm = ‘1101’
VST4<c><q>.<size> <list>, [<Rn>{@<align>}], <Rm>    Rm = other values

where:

<c>, <q>   See Standard assembler syntax fields on page A8-7. An ARM VST4 instruction must be unconditional.
<size>     The data size. It must be one of:
               8     encoded as size = 0b00
               16    encoded as size = 0b01
               32    encoded as size = 0b10.
<list>     The list of registers to store. It must be one of:
               {<Dd>, <Dd+1>, <Dd+2>, <Dd+3>}   encoded as D:Vd = <Dd>, type = 0b0000
               {<Dd>, <Dd+2>, <Dd+4>, <Dd+6>}   encoded as D:Vd = <Dd>, type = 0b0001.
<Rn>       Contains the base address for the access.
<align>    The alignment. It can be one of:
               64       8-byte alignment, encoded as align = 0b01.
               128      16-byte alignment, encoded as align = 0b10.
               256      32-byte alignment, encoded as align = 0b11.
               omitted  Standard alignment, see Unaligned data access on page A3-5. Encoded as align = 0b00.
!          If present, specifies writeback.
<Rm>       Contains an address offset applied after the access.

For more information about <Rn>, !, and <Rm>, see Advanced SIMD addressing mode on page A7-30.

Operation

    if ConditionPassed() then
        EncodingSpecificOperations(); CheckAdvSIMDEnabled(); NullCheckIfThumbEE(n);
        address = R[n];
        if (address MOD alignment) != 0 then GenerateAlignmentException();
        if wback then R[n] = R[n] + (if register_index then R[m] else 32);
        for e = 0 to elements-1
            MemU[address,ebytes] = Elem[D[d],e,esize];
            MemU[address+ebytes,ebytes] = Elem[D[d2],e,esize];
            MemU[address+2*ebytes,ebytes] = Elem[D[d3],e,esize];
            MemU[address+3*ebytes,ebytes] = Elem[D[d4],e,esize];
            address = address + 4*ebytes;

Exceptions

Undefined Instruction, Data Abort.
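The VST4 Operation pseudocode checks alignment against the base address before any element is stored, and it computes the writeback value from that same base. The Python sketch below models this ordering for the multiple-structure form. The function name, the `writeback` and `rm_value` parameters, and little-endian memory are all assumptions. `ValueError` stands in for the alignment fault raised by GenerateAlignmentException().

```python
def vst4_multiple(base, dregs, inc, ebytes, align, writeback=False, rm_value=None):
    """Sketch of VST4 (multiple 4-element structures).

    dregs[i] holds the elements of D(d+i). writeback=False models Rm == 15;
    writeback=True with rm_value=None models the "!" form (Rm == 13, +32);
    rm_value models a post-index register offset.
    Returns (bytes written from base upward, final Rn value).
    """
    alignment = 1 if align == 0b00 else 4 << align
    if base % alignment != 0:
        raise ValueError("alignment fault")
    new_rn = base
    if writeback:
        new_rn = base + (32 if rm_value is None else rm_value)
    out = bytearray()
    for e in range(8 // ebytes):
        for k in range(4):  # D[d], D[d2], D[d3], D[d4], spaced by inc
            out += dregs[k * inc][e].to_bytes(ebytes, "little")
    return bytes(out), new_rn


dregs = [[8 * k + e for e in range(8)] for k in range(4)]
data, rn = vst4_multiple(0x1000, dregs, inc=1, ebytes=1, align=0b11, writeback=True)
```

In this example, an `@256` store from 0x1000 passes the 32-byte alignment check, writes 32 bytes, and leaves Rn at 0x1020.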
A8.6.398 VST4 (single 4-element structure from one lane)

This instruction stores one 4-element structure to memory from corresponding elements of four registers. For details of the addressing mode see Advanced SIMD addressing mode on page A7-30.

Encoding T1 / A1    Advanced SIMD

VST4<c>.<size> <list>, [<Rn>{@<align>}]{!}
VST4<c>.<size> <list>, [<Rn>{@<align>}], <Rm>

Thumb (T1):  1 1 1 1 1 0 0 1 1 D 0 0 Rn | Vd size 1 1 index_align Rm
ARM (A1):    1 1 1 1 0 1 0 0 1 D 0 0 Rn Vd size 1 1 index_align Rm

    if size == ‘11’ then UNDEFINED;
    case size of
        when ‘00’
            ebytes = 1; esize = 8; index = UInt(index_align<3:1>); inc = 1;
            alignment = if index_align<0> == ‘0’ then 1 else 4;
        when ‘01’
            ebytes = 2; esize = 16; index = UInt(index_align<3:2>);
            inc = if index_align<1> == ‘0’ then 1 else 2;
            alignment = if index_align<0> == ‘0’ then 1 else 8;
        when ‘10’
            if index_align<1:0> == ‘11’ then UNDEFINED;
            ebytes = 4; esize = 32; index = UInt(index_align<3>);
            inc = if index_align<2> == ‘0’ then 1 else 2;
            alignment = if index_align<1:0> == ‘00’ then 1 else 4 << UInt(index_align<1:0>);
    d = UInt(D:Vd); d2 = d + inc; d3 = d2 + inc; d4 = d3 + inc; n = UInt(Rn); m = UInt(Rm);
    wback = (m != 15); register_index = (m != 15 && m != 13);
    if d4 > 31 then UNPREDICTABLE;

Assembler syntax

VST4<c><q>.<size> <list>, [<Rn>{@<align>}]          Rm = ‘1111’
VST4<c><q>.<size> <list>, [<Rn>{@<align>}]!         Rm = ‘1101’
VST4<c><q>.<size> <list>, [<Rn>{@<align>}], <Rm>    Rm = other values

where:

<c>, <q>   See Standard assembler syntax fields on page A8-7. An ARM VST4 instruction must be unconditional.
<size>     The data size. It must be one of:
               8     encoded as size = 0b00
               16    encoded as size = 0b01
               32    encoded as size = 0b10.
<list>     The registers containing the structure. Encoded with D:Vd = <Dd>. It must be one of:
               {<Dd[x]>, <Dd+1[x]>, <Dd+2[x]>, <Dd+3[x]>}   Single-spaced registers, see Table A8-12.
               {<Dd[x]>, <Dd+2[x]>, <Dd+4[x]>, <Dd+6[x]>}   Double-spaced registers, see Table A8-12. This is not available if <size> == 8.
<Rn>       Contains the base address for the access.
<align>    The alignment. It can be:
               32       4-byte alignment, available only if <size> is 8.
               64       8-byte alignment, available only if <size> is 16 or 32.
               128      16-byte alignment, available only if <size> is 32.
               omitted  Standard alignment, see Unaligned data access on page A3-5.
!          If present, specifies writeback.
<Rm>       Contains an address offset applied after the access.

For more information about <Rn>, !, and <Rm>, see Advanced SIMD addressing mode on page A7-30.

Table A8-12 Encoding of index, alignment, and register spacing

                    <size> == 8            <size> == 16           <size> == 32
Index               index_align[3:1] = x   index_align[3:2] = x   index_align[3] = x
Single-spacing      -                      index_align[1] = 0     index_align[2] = 0
Double-spacing      -                      index_align[1] = 1     index_align[2] = 1
<align> omitted     index_align[0] = 0     index_align[0] = 0     index_align[1:0] = ‘00’
<align> == 32       index_align[0] = 1     -                      -
<align> == 64       -                      index_align[0] = 1     index_align[1:0] = ‘01’
<align> == 128      -                      -                      index_align[1:0] = ‘10’

Operation

    if ConditionPassed() then
        EncodingSpecificOperations(); CheckAdvSIMDEnabled(); NullCheckIfThumbEE(n);
        address = R[n];
        if (address MOD alignment) != 0 then GenerateAlignmentException();
        if wback then R[n] = R[n] + (if register_index then R[m] else 4*ebytes);
        MemU[address,ebytes] = Elem[D[d],index,esize];
        MemU[address+ebytes,ebytes] = Elem[D[d2],index,esize];
        MemU[address+2*ebytes,ebytes] = Elem[D[d3],index,esize];
        MemU[address+3*ebytes,ebytes] = Elem[D[d4],index,esize];

Exceptions

Undefined Instruction, Data Abort.
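An assembler works in the other direction: it must build index_align from the lane index, the register spacing, and the alignment qualifier. The hypothetical Python encoder below follows Table A8-12 for VST4 (single 4-element structure from one lane). It takes the element size in bits and the alignment in bits, or `None` when <align> is omitted, and it rejects the combinations that the table marks as unavailable.

```python
from typing import Optional


def encode_vst4_lane(esize: int, index: int, double_spaced: bool, align: Optional[int]) -> int:
    """Return the 4-bit index_align field for VST4 (single lane), per Table A8-12."""
    if esize == 8:
        if double_spaced or align not in (None, 32) or not 0 <= index < 8:
            raise ValueError("not encodable for <size> == 8")
        return (index << 1) | (1 if align == 32 else 0)
    if esize == 16:
        if align not in (None, 64) or not 0 <= index < 4:
            raise ValueError("not encodable for <size> == 16")
        return (index << 2) | (int(double_spaced) << 1) | (1 if align == 64 else 0)
    if esize == 32:
        align_bits = {None: 0b00, 64: 0b01, 128: 0b10}
        if align not in align_bits or not 0 <= index < 2:
            raise ValueError("not encodable for <size> == 32")
        return (index << 3) | (int(double_spaced) << 2) | align_bits[align]
    raise ValueError("<size> must be 8, 16 or 32")
```

As a cross-check against the decode pseudocode, decoding 0b1110 with size = ‘10’ gives index = 1, inc = 2, and alignment = 4 << 2 = 16 bytes. This matches a double-spaced, `@128` request for lane 1.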
A8.6.399 VSTM

Vector Store Multiple stores multiple extension registers to consecutive memory locations using an address from an ARM core register.

Encoding T1 / A1    VFPv2, VFPv3, Advanced SIMD

VSTM{mode}<c> <Rn>{!}, <list>        <list> is consecutive 64-bit registers

Thumb (T1):  1 1 1 0 1 1 0 P U D W 0 Rn | Vd 1 0 1 1 imm8
ARM (A1):    cond 1 1 0 P U D W 0 Rn Vd 1 0 1 1 imm8

    if P == ‘0’ && U == ‘0’ && W == ‘0’ then SEE “Related encodings”;
    if P == ‘1’ && U == ‘0’ && W == ‘1’ && Rn == ‘1101’ then SEE VPUSH;
    if P == ‘1’ && W == ‘0’ then SEE VSTR;
    if P == U && W == ‘1’ then UNDEFINED;
    // Remaining combinations are PUW = 010 (IA without !), 011 (IA with !), 101 (DB with !)
    single_regs = FALSE; add = (U == ‘1’); wback = (W == ‘1’);
    d = UInt(D:Vd); n = UInt(Rn); imm32 = ZeroExtend(imm8:‘00’, 32);
    regs = UInt(imm8) DIV 2;    // If UInt(imm8) is odd, see “FSTMX”.
    if n == 15 && (wback || CurrentInstrSet() != InstrSet_ARM) then UNPREDICTABLE;
    if regs == 0 || regs > 16 || (d+regs) > 32 then UNPREDICTABLE;

Encoding T2 / A2    VFPv2, VFPv3

VSTM{mode}<c> <Rn>{!}, <list>        <list> is consecutive 32-bit registers

Thumb (T2):  1 1 1 0 1 1 0 P U D W 0 Rn | Vd 1 0 1 0 imm8
ARM (A2):    cond 1 1 0 P U D W 0 Rn Vd 1 0 1 0 imm8

    if P == ‘0’ && U == ‘0’ && W == ‘0’ then SEE “Related encodings”;
    if P == ‘1’ && U == ‘0’ && W == ‘1’ && Rn == ‘1101’ then SEE VPUSH;
    if P == ‘1’ && W == ‘0’ then SEE VSTR;
    if P == U && W == ‘1’ then UNDEFINED;
    // Remaining combinations are PUW = 010 (IA without !), 011 (IA with !), 101 (DB with !)
    single_regs = TRUE; add = (U == ‘1’); wback = (W == ‘1’);
    d = UInt(Vd:D); n = UInt(Rn); imm32 = ZeroExtend(imm8:‘00’, 32);
    regs = UInt(imm8);
    if n == 15 && (wback || CurrentInstrSet() != InstrSet_ARM) then UNPREDICTABLE;
    if regs == 0 || (d+regs) > 32 then UNPREDICTABLE;

Related encodings

See 64-bit transfers between ARM core and extension registers on page A7-32.

FSTMX

Encoding T1 / A1 behaves as described by the pseudocode if imm8 is odd. However, there is no UAL syntax for such encodings and their use is deprecated. For more information, see FLDMX, FSTMX on page A8-101.

Assembler syntax

VSTM{<mode>}<c><q>{.<size>} <Rn>{!}, <list>

where:

<mode>     The addressing mode:
               IA   Increment After. The consecutive addresses start at the address specified in <Rn>. This is the default and can be omitted. Encoded as P = 0, U = 1.
               DB   Decrement Before. The consecutive addresses end just before the address specified in <Rn>. Encoded as P = 1, U = 0.
<c>, <q>   See Standard assembler syntax fields on page A8-7.
<size>     An optional data size specifier. If present, it must be equal to the size in bits, 32 or 64, of the registers in <list>.
<Rn>       The base register. The SP can be used. In the ARM instruction set, if ! is not specified the PC can be used. However, use of the PC is deprecated.
!          Causes the instruction to write a modified value back to <Rn>. Required if <mode> == DB. Encoded as W = 1.
           If ! is omitted, the instruction does not change <Rn> in this way. Encoded as W = 0.
<list>     The extension registers to be stored, as a list of consecutively numbered doubleword (encoding T1 / A1) or singleword (encoding T2 / A2) registers, separated by commas and surrounded by braces. It is encoded in the instruction by setting D and Vd to specify the first register in the list, and imm8 to twice the number of registers in the list (encoding T1 / A1) or the number of registers (encoding T2 / A2). <list> must contain at least one register. If it contains doubleword registers it must not contain more than 16 registers.

Operation

    if ConditionPassed() then
        EncodingSpecificOperations(); CheckVFPEnabled(TRUE); NullCheckIfThumbEE(n);
        address = if add then R[n] else R[n]-imm32;
        if wback then R[n] = if add then R[n]+imm32 else R[n]-imm32;
        for r = 0 to regs-1
            if single_regs then
                MemA[address,4] = S[d+r]; address = address+4;
            else
                // Store as two word-aligned words in the correct order for current endianness.
                MemA[address,4] = if BigEndian() then D[d+r]<63:32> else D[d+r]<31:0>;
                MemA[address+4,4] = if BigEndian() then D[d+r]<31:0> else D[d+r]<63:32>;
                address = address+8;

Exceptions

Undefined Instruction, Data Abort.
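The VSTM Operation pseudocode derives the start address from the addressing mode and stores each doubleword as two words, in an order that depends on endianness. The Python sketch below models encoding T1 / A1 only. The function name `vstm_d`, its parameters, and the representation of registers as 64-bit integers are assumptions made for illustration.

```python
def vstm_d(base, dregs, increment_after=True, writeback=False, big_endian=False):
    """Sketch of VSTM with doubleword registers (encoding T1 / A1).

    dregs holds the 64-bit values of D[d], D[d+1], ... Returns the list of
    (address, 32-bit word) stores in program order and the final Rn value.
    """
    if not increment_after and not writeback:
        raise ValueError("DB requires ! (P = 1, W = 0 encodes VSTR instead)")
    if not 1 <= len(dregs) <= 16:
        raise ValueError("UNPREDICTABLE register count")
    imm32 = 8 * len(dregs)  # imm8 = 2 * regs, imm32 = imm8:'00'
    address = base if increment_after else base - imm32
    new_rn = base
    if writeback:
        new_rn = base + imm32 if increment_after else base - imm32
    stores = []
    for value in dregs:
        low, high = value & 0xFFFFFFFF, (value >> 32) & 0xFFFFFFFF
        first, second = (high, low) if big_endian else (low, high)
        stores.append((address, first))
        stores.append((address + 4, second))
        address += 8
    return stores, new_rn


# DB with writeback: the PUW = 101 form (with Rn = SP the decode redirects to VPUSH)
stores, rn = vstm_d(0x2000, [0x1111111122222222, 0x3333333344444444],
                    increment_after=False, writeback=True)
```

In this example, the two registers occupy the 16 bytes just below the original base, and Rn is left pointing at the lowest stored word.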
A8.6.400 VSTR

This instruction stores a single extension register to memory, using an address from an ARM core register, with an optional offset.

Encoding T1 / A1    VFPv2, VFPv3, Advanced SIMD

VSTR<c> <Dd>, [<Rn>{, #+/-<imm>}]

Thumb (T1):  1 1 1 0 1 1 0 1 U D 0 0 Rn | Vd 1 0 1 1 imm8
ARM (A1):    cond 1 1 0 1 U D 0 0 Rn Vd 1 0 1 1 imm8

    single_reg = FALSE; add = (U == ‘1’); imm32 = ZeroExtend(imm8:‘00’, 32);
    d = UInt(D:Vd); n = UInt(Rn);
    if n == 15 && CurrentInstrSet() != InstrSet_ARM then UNPREDICTABLE;

Encoding T2 / A2    VFPv2, VFPv3

VSTR<c> <Sd>, [<Rn>{, #+/-<imm>}]

Thumb (T2):  1 1 1 0 1 1 0 1 U D 0 0 Rn | Vd 1 0 1 0 imm8
ARM (A2):    cond 1 1 0 1 U D 0 0 Rn Vd 1 0 1 0 imm8

    single_reg = TRUE; add = (U == ‘1’); imm32 = ZeroExtend(imm8:‘00’, 32);
    d = UInt(Vd:D); n = UInt(Rn);
    if n == 15 && CurrentInstrSet() != InstrSet_ARM then UNPREDICTABLE;

Assembler syntax

VSTR<c><q>{.64} <Dd>, [<Rn>{, #+/-<imm>}]    Encoding T1 / A1
VSTR<c><q>{.32} <Sd>, [<Rn>{, #+/-<imm>}]    Encoding T2 / A2

where:

<c>, <q>   See Standard assembler syntax fields on page A8-7.
.32, .64   Optional data size specifiers.
<Dd>       The source register for a doubleword store.
<Sd>       The source register for a singleword store.
<Rn>       The base register. The SP can be used. In the ARM instruction set the PC can be used. However, use of the PC is deprecated.
+/-        Is + or omitted if the immediate offset is to be added to the base register value (add == TRUE), or - if it is to be subtracted (add == FALSE). #0 and #-0 generate different instructions.
<imm>      The immediate offset used to form the address. Values are multiples of 4 in the range 0-1020. <imm> can be omitted, meaning an offset of +0.

Operation

    if ConditionPassed() then
        EncodingSpecificOperations(); CheckVFPEnabled(TRUE); NullCheckIfThumbEE(n);
        address = if add then (R[n] + imm32) else (R[n] - imm32);
        if single_reg then
            MemA[address,4] = S[d];
        else
            // Store as two word-aligned words in the correct order for current endianness.
            MemA[address,4] = if BigEndian() then D[d]<63:32> else D[d]<31:0>;
            MemA[address+4,4] = if BigEndian() then D[d]<31:0> else D[d]<63:32>;

Exceptions

Undefined Instruction, Data Abort.
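The VSTR immediate is held as imm8 = offset / 4 together with the U bit, and it is decoded as imm32 = ZeroExtend(imm8:‘00’, 32). The hypothetical Python helpers below sketch that mapping for a signed byte offset. A Python integer has no negative zero, so this sketch always encodes 0 as #+0 and cannot express the separate #-0 encoding (U = 0, imm8 = 0); that case would need an explicit flag.

```python
def encode_vstr_offset(offset: int) -> tuple:
    """Return (U, imm8) for a VSTR byte offset; 0 is always encoded as #+0."""
    magnitude = abs(offset)
    if magnitude % 4 != 0 or magnitude > 1020:
        raise ValueError("offset must be a multiple of 4 with magnitude 0-1020")
    u = 0 if offset < 0 else 1
    return u, magnitude // 4


def decode_vstr_offset(u: int, imm8: int) -> int:
    """Inverse mapping: imm32 = ZeroExtend(imm8:'00', 32), applied with sign from U."""
    imm32 = imm8 << 2
    return imm32 if u == 1 else -imm32
```

Only multiples of 4 are accepted because the two low bits of imm32 are always zero.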
A8.6.401 VSUB (integer)

Vector Subtract subtracts the elements of one vector from the corresponding elements of another vector, and places the results in the destination vector.

Encoding T1 / A1    Advanced SIMD

VSUB<c>.<dt> <Qd>, <Qn>, <Qm>
VSUB<c>.<dt> <Dd>, <Dn>, <Dm>

Thumb (T1):  1 1 1 1 1 1 1 1 0 D size Vn | Vd 1 0 0 0 N Q M 0 Vm
ARM (A1):    1 1 1 1 0 0 1 1 0 D size Vn Vd 1 0 0 0 N Q M 0 Vm

    if Q == ‘1’ && (Vd<0> == ‘1’ || Vn<0> == ‘1’ || Vm<0> == ‘1’) then UNDEFINED;
    esize = 8 << UInt(size); elements = 64 DIV esize;
    d = UInt(D:Vd); n = UInt(N:Vn); m = UInt(M:Vm); regs = if Q == ‘0’ then 1 else 2;

Assembler syntax

VSUB<c><q>.<dt> {<Qd>,} <Qn>, <Qm>
VSUB<c><q>.<dt> {<Dd>,} <Dn>, <Dm>

where:

<c>, <q>           See Standard assembler syntax fields on page A8-7. An ARM Advanced SIMD VSUB instruction must be unconditional.
<dt>               The data type for the elements of the vectors. It must be one of:
                       I8    size = 0b00
                       I16   size = 0b01
                       I32   size = 0b10
                       I64   size = 0b11.
<Qd>, <Qn>, <Qm>   The destination vector and the operand vectors, for a quadword operation.
<Dd>, <Dn>, <Dm>   The destination vector and the operand vectors, for a doubleword operation.

Operation

    if ConditionPassed() then
        EncodingSpecificOperations(); CheckAdvSIMDEnabled();
        for r = 0 to regs-1
            for e = 0 to elements-1
                Elem[D[d+r],e,esize] = Elem[D[n+r],e,esize] - Elem[D[m+r],e,esize];

Exceptions

Undefined Instruction.
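Each lane of VSUB is computed independently modulo 2^esize. As a result, a borrow never propagates from one element into the next, and because the I8 to I64 data types carry no signedness, the same result bits serve for signed and unsigned data. The Python sketch below models the doubleword form on a 64-bit integer, with element e occupying bits ((e+1)*esize-1):(e*esize), as Elem[] defines. The helper name `vsub_d` is hypothetical.

```python
def vsub_d(dn: int, dm: int, size: int) -> int:
    """Model VSUB.I<esize> on one doubleword: each lane is (n - m) mod 2^esize."""
    esize = 8 << size                      # size '00'..'11' -> 8, 16, 32, 64
    elements = 64 // esize
    mask = (1 << esize) - 1
    result = 0
    for e in range(elements):
        a = (dn >> (e * esize)) & mask     # Elem[D[n], e, esize]
        b = (dm >> (e * esize)) & mask     # Elem[D[m], e, esize]
        result |= ((a - b) & mask) << (e * esize)
    return result
```

For example, with I16 elements, 0x0005_0003 minus 0x0001_0004 gives 0x0004_FFFF. The low lane wraps to 0xFFFF without disturbing the lane above it.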
, , 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 1 1 0 1 1 1 1 0 D 1 sz 1 0 15 14 13 12 11 10 9 8 Vn Vd 1 1 0 1 N Q M 0 31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 1 1 1 1 0 0 1 0 0 D 1 sz Vn Vd 7 6 5 4 3 2 Vm 7 6 5 4 3 2 1 1 0 1 N Q M 0 1 0 1 0 Vm if Q == ‘1’ && (Vd<0> == ‘1’ || Vn<0> == ‘1’ || Vm<0> == ‘1’) then UNDEFINED; if sz == ‘1’ then UNDEFINED; advsimd = TRUE; esize = 32; elements = 2; d = UInt(D:Vd); n = UInt(N:Vn); m = UInt(M:Vm); regs = if Q == ‘0’ then 1 else 2; Encoding T2 / A2 VFPv2, VFPv3 (sz = 1 UNDEFINED in single-precision only variants) VSUB.F64
, , VSUB.F32 , , 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 1 1 0 1 1 1 0 0 D 1 1 1 0 15 14 13 12 11 10 9 8 Vn Vd 1 0 1 sz N 1 M 0 31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 cond 1 1 1 0 0 D 1 1 Vn Vd 7 6 5 4 3 2 Vm 7 6 5 4 3 2 1 0 1 sz N 1 M 0 1 0 1 0 Vm if FPSCR.LEN != ‘000’ || FPSCR.STRIDE != ‘00’ then SEE “VFP vectors”; advsimd = FALSE; dp_operation = (sz == ‘1’); d = if dp_operation then UInt(D:Vd) else UInt(Vd:D); n = if dp_operation then UInt(N:Vn) else UInt(Vn:N); m = if dp_operation then UInt(M:Vm) else UInt(Vm:M); VFP vectors A8-790 Encoding T2 / A2 can operate on VFP vectors under control of the FPSCR.LEN and FPSCR.STRIDE bits. For details see Appendix F VFP Vector Operation Support. Copyright © 1996-1998, 2000, 2004-2008 ARM Limited. All rights reserved. ARM DDI 0406B Instruction Details Assembler syntax VSUB.F32 {,} , VSUB.F32 {
,} , VSUB.F64 {
,} , VSUB.F32 {,} , Encoding T1 / A1, Q = 1, sz = 0 Encoding T1 / A1, Q = 0, sz = 0 Encoding T2 / A2, sz = 1 Encoding T2 / A2, sz = 0 where: See Standard assembler syntax fields on page A8-7. An ARM Advanced SIMD VSUB instruction must be unconditional. , , The destination vector and the operand vectors, for a quadword operation.
<Dd>, <Dn>, <Dm> The destination vector and the operand vectors, for a doubleword operation.
<Sd>, <Sn>, <Sm> The destination vector and the operand vectors, for a singleword operation.

Operation
if ConditionPassed() then
    EncodingSpecificOperations(); CheckAdvSIMDOrVFPEnabled(TRUE, advsimd);
    if advsimd then    // Advanced SIMD instruction
        for r = 0 to regs-1
            for e = 0 to elements-1
                Elem[D[d+r],e,esize] = FPSub(Elem[D[n+r],e,esize], Elem[D[m+r],e,esize], FALSE);
    else               // VFP instruction
        if dp_operation then
            D[d] = FPSub(D[n], D[m], TRUE);
        else
            S[d] = FPSub(S[n], S[m], TRUE);

Exceptions
Undefined Instruction.
Floating-point exceptions: Input Denormal, Invalid Operation, Overflow, Underflow, and Inexact.

A8.6.403 VSUBHN
Vector Subtract and Narrow, returning High Half subtracts the elements of one quadword vector from the corresponding elements of another quadword vector, takes the most significant half of each result, and places the final results in a doubleword vector. The results are truncated. (For rounded results, see VRSUBHN on page A8-748.)
There is no distinction between signed and unsigned integers.

Encoding T1 / A1    Advanced SIMD
VSUBHN<c>.<dt> <Dd>, <Qn>, <Qm>
T1: 1110 1111 1 D size Vn | Vd 0110 N 0 M 0 Vm
A1: 1111 0010 1 D size Vn Vd 0110 N 0 M 0 Vm

if size == ‘11’ then SEE “Related encodings”;
if Vn<0> == ‘1’ || Vm<0> == ‘1’ then UNDEFINED;
esize = 8 << UInt(size); elements = 64 DIV esize;
d = UInt(D:Vd); n = UInt(N:Vn); m = UInt(M:Vm);

Related encodings    See Advanced SIMD data-processing instructions on page A7-10.

Assembler syntax
VSUBHN<c><q>.<dt> <Dd>, <Qn>, <Qm>
where:
<c>, <q> See Standard assembler syntax fields on page A8-7. An ARM VSUBHN instruction must be unconditional.
<dt> The data type for the elements of the operands. It must be one of:
I16 encoded as size = 0b00
I32 encoded as size = 0b01
I64 encoded as size = 0b10.
<Dd>, <Qn>, <Qm> The destination vector, the first operand vector, and the second operand vector.

Operation
if ConditionPassed() then
    EncodingSpecificOperations(); CheckAdvSIMDEnabled();
    for e = 0 to elements-1
        result = Elem[Q[n>>1],e,2*esize] - Elem[Q[m>>1],e,2*esize];
        Elem[D[d],e,esize] = result<2*esize-1:esize>;

Exceptions
Undefined Instruction.

A8.6.404 VSUBL, VSUBW
Vector Subtract Long subtracts the elements of one doubleword vector from the corresponding elements of another doubleword vector, and places the results in a quadword vector. Before subtracting, it sign-extends or zero-extends the elements of both operands.
Vector Subtract Wide subtracts the elements of a doubleword vector from the corresponding elements of a quadword vector, and places the results in another quadword vector. Before subtracting, it sign-extends or zero-extends the elements of the doubleword operand.

Encoding T1 / A1    Advanced SIMD
VSUBL<c>.<dt> <Qd>, <Dn>, <Dm>
VSUBW<c>.<dt> {<Qd>,} <Qn>, <Dm>
T1: 111U 1111 1 D size Vn | Vd 001 op N 0 M 0 Vm
A1: 1111 001U 1 D size Vn Vd 001 op N 0 M 0 Vm

if size == ‘11’ then SEE “Related encodings”;
if Vd<0> == ‘1’ || (op == ‘1’ && Vn<0> == ‘1’) then UNDEFINED;
unsigned = (U == ‘1’);
esize = 8 << UInt(size); elements = 64 DIV esize; is_vsubw = (op == ‘1’);
d = UInt(D:Vd); n = UInt(N:Vn); m = UInt(M:Vm);

Related encodings    See Advanced SIMD data-processing instructions on page A7-10.

Assembler syntax
VSUBL<c><q>.<dt> <Qd>, <Dn>, <Dm>      Encoded as op = 0
VSUBW<c><q>.<dt> {<Qd>,} <Qn>, <Dm>    Encoded as op = 1
where:
<c>, <q> See Standard assembler syntax fields on page A8-7. An ARM VSUBL or VSUBW instruction must be unconditional.
<dt> The data type for the elements of the second operand. It must be one of:
S8 encoded as size = 0b00, U = 0
S16 encoded as size = 0b01, U = 0
S32 encoded as size = 0b10, U = 0
U8 encoded as size = 0b00, U = 1
U16 encoded as size = 0b01, U = 1
U32 encoded as size = 0b10, U = 1.
<Qd> The destination register.
<Qn>, <Dm> The first and second operand registers for a VSUBW instruction.
<Dn>, <Dm> The first and second operand registers for a VSUBL instruction.

Operation
if ConditionPassed() then
    EncodingSpecificOperations(); CheckAdvSIMDEnabled();
    for e = 0 to elements-1
        if is_vsubw then
            op1 = Int(Elem[Q[n>>1],e,2*esize], unsigned);
        else
            op1 = Int(Elem[D[n],e,esize], unsigned);
        result = op1 - Int(Elem[D[m],e,esize], unsigned);
        Elem[Q[d>>1],e,2*esize] = result<2*esize-1:0>;

Exceptions
Undefined Instruction.

A8.6.405 VSWP
VSWP (Vector Swap) exchanges the contents of two vectors. The vectors can be either doubleword or quadword. There is no distinction between data types.

Encoding T1 / A1    Advanced SIMD
VSWP<c> <Qd>, <Qm>
VSWP<c> <Dd>, <Dm>
T1: 1111 1111 1 D 11 size 10 | Vd 0000 0 Q M 0 Vm
A1: 1111 0011 1 D 11 size 10 Vd 0000 0 Q M 0 Vm

if size != ‘00’ then UNDEFINED;
if Q == ‘1’ && (Vd<0> == ‘1’ || Vm<0> == ‘1’) then UNDEFINED;
d = UInt(D:Vd); m = UInt(M:Vm); regs = if Q == ‘0’ then 1 else 2;

Assembler syntax
VSWP<c><q>{.<dt>} <Qd>, <Qm>    Encoded as Q = 1, size = ‘00’
VSWP<c><q>{.<dt>} <Dd>, <Dm>    Encoded as Q = 0, size = ‘00’
where:
<c>, <q> See Standard assembler syntax fields on page A8-7. An ARM VSWP instruction must be unconditional.
<dt> An optional data type. It is ignored by assemblers, and does not affect the encoding.
<Qd>, <Qm> The vectors for a quadword operation.
<Dd>, <Dm> The vectors for a doubleword operation.

Operation
if ConditionPassed() then
    EncodingSpecificOperations(); CheckAdvSIMDEnabled();
    for r = 0 to regs-1
        if d == m then
            D[d+r] = bits(64) UNKNOWN;
        else
            tmp = D[d+r];
            D[d+r] = D[m+r];
            D[m+r] = tmp;

Exceptions
Undefined Instruction.

A8.6.406 VTBL, VTBX
Vector Table Lookup uses byte indexes in a control vector to look up byte values in a table and generate a new vector. Indexes out of range return 0.
Vector Table Extension works in the same way, except that indexes out of range leave the destination element unchanged.

Encoding T1 / A1    Advanced SIMD
V<op><c>.8 <Dd>, <list>, <Dm>
T1: 1111 1111 1 D 11 Vn | Vd 10 len N op M 0 Vm
A1: 1111 0011 1 D 11 Vn Vd 10 len N op M 0 Vm

is_vtbl = (op == ‘0’); length = UInt(len)+1;
d = UInt(D:Vd); n = UInt(N:Vn); m = UInt(M:Vm);
if n+length > 32 then UNPREDICTABLE;

Assembler syntax
V<op><c><q>.8 <Dd>, <list>, <Dm>
where:
<op> Specifies the operation. It must be one of:
TBL encoded as op = 0
TBX encoded as op = 1
<c>, <q> See Standard assembler syntax fields on page A8-7. An ARM VTBL or VTBX instruction must be unconditional.
<Dd> The destination vector.
<list> The vectors containing the table. It must be one of:
{<Dn>} encoded as len = 0b00
{<Dn>,<Dn+1>} encoded as len = 0b01
{<Dn>,<Dn+1>,<Dn+2>} encoded as len = 0b10
{<Dn>,<Dn+1>,<Dn+2>,<Dn+3>} encoded as len = 0b11
<Dm> The index vector.

Operation
if ConditionPassed() then
    EncodingSpecificOperations(); CheckAdvSIMDEnabled();
    // Create 256-bit = 32-byte table variable, with zeros in entries that will not be used.
    table3 = if length == 4 then D[n+3] else Zeros(64);
    table2 = if length >= 3 then D[n+2] else Zeros(64);
    table1 = if length >= 2 then D[n+1] else Zeros(64);
    table = table3 : table2 : table1 : D[n];
    for i = 0 to 7
        index = UInt(Elem[D[m],i,8]);
        if index < 8*length then
            Elem[D[d],i,8] = Elem[table,index,8];
        else
            if is_vtbl then
                Elem[D[d],i,8] = Zeros(8);
            // else Elem[D[d],i,8] unchanged

Exceptions
Undefined Instruction.

A8.6.407 VTRN
Vector Transpose treats the elements of its operand vectors as elements of 2 × 2 matrices, and transposes the matrices. The elements of the vectors can be 8-bit, 16-bit, or 32-bit. There is no distinction between data types.

Encoding T1 / A1    Advanced SIMD
VTRN<c>.<size> <Qd>, <Qm>
VTRN<c>.<size> <Dd>, <Dm>
T1: 1111 1111 1 D 11 size 10 | Vd 0000 1 Q M 0 Vm
A1: 1111 0011 1 D 11 size 10 Vd 0000 1 Q M 0 Vm

if size == ‘11’ then UNDEFINED;
if Q == ‘1’ && (Vd<0> == ‘1’ || Vm<0> == ‘1’) then UNDEFINED;
esize = 8 << UInt(size); elements = 64 DIV esize;
d = UInt(D:Vd); m = UInt(M:Vm); regs = if Q == ‘0’ then 1 else 2;

Figure A8-7 shows the operation of doubleword VTRN. Quadword VTRN performs the same operation as doubleword VTRN twice, once on the upper halves of the quadword vectors, and once on the lower halves.

[Figure A8-7 Operation of doubleword VTRN: element exchanges between Dd and Dm for VTRN.32, VTRN.16, and VTRN.8]

Assembler syntax
VTRN<c><q>.<size> <Qd>, <Qm>    Encoded as Q = 1
VTRN<c><q>.<size> <Dd>, <Dm>    Encoded as Q = 0
where:
<c>, <q> See Standard assembler syntax fields on page A8-7. An ARM VTRN instruction must be unconditional.
<size> The data size for the elements of the vectors. It must be one of:
8 encoded as size = 0b00
16 encoded as size = 0b01
32 encoded as size = 0b10.
<Qd>, <Qm> The destination vector, and the operand vector, for a quadword operation.
<Dd>, <Dm> The destination vector, and the operand vector, for a doubleword operation.

Operation
if ConditionPassed() then
    EncodingSpecificOperations(); h = elements/2;
    CheckAdvSIMDEnabled();
    for r = 0 to regs-1
        if d == m then
            D[d+r] = bits(64) UNKNOWN;
        else
            for e = 0 to h-1
                tmp = Elem[D[d+r],2*e+1,esize];
                Elem[D[d+r],2*e+1,esize] = Elem[D[m+r],2*e,esize];
                Elem[D[m+r],2*e,esize] = tmp;

Exceptions
Undefined Instruction.

A8.6.408 VTST
Vector Test Bits takes each element in a vector, and bitwise ANDs it with the corresponding element of a second vector. If the result is not zero, the corresponding element in the destination vector is set to all ones. Otherwise, it is set to all zeros.
The operand vector elements can be any one of:
• 8-bit, 16-bit, or 32-bit fields.
The result vector elements are bitfields the same size as the operand vector elements.

Encoding T1 / A1    Advanced SIMD
VTST<c>.<size> <Qd>, <Qn>, <Qm>
VTST<c>.<size> <Dd>, <Dn>, <Dm>
T1: 1110 1111 0 D size Vn | Vd 1000 N Q M 1 Vm
A1: 1111 0010 0 D size Vn Vd 1000 N Q M 1 Vm

if Q == ‘1’ && (Vd<0> == ‘1’ || Vn<0> == ‘1’ || Vm<0> == ‘1’) then UNDEFINED;
if size == ‘11’ then UNDEFINED;
esize = 8 << UInt(size); elements = 64 DIV esize;
d = UInt(D:Vd); n = UInt(N:Vn); m = UInt(M:Vm); regs = if Q == ‘0’ then 1 else 2;

Assembler syntax
VTST<c><q>.<size> {<Qd>,} <Qn>, <Qm>    Encoded as Q = 1
VTST<c><q>.<size> {<Dd>,} <Dn>, <Dm>    Encoded as Q = 0
where:
<c>, <q> See Standard assembler syntax fields on page A8-7. An ARM VTST instruction must be unconditional.
<size> The data size for the elements of the operands. It must be one of:
8 encoded as size = 0b00
16 encoded as size = 0b01
32 encoded as size = 0b10.
<Qd>, <Qn>, <Qm> The destination vector and the operand vectors, for a quadword operation.
<Dd>, <Dn>, <Dm> The destination vector and the operand vectors, for a doubleword operation.

Operation
if ConditionPassed() then
    EncodingSpecificOperations(); CheckAdvSIMDEnabled();
    for r = 0 to regs-1
        for e = 0 to elements-1
            if !IsZero(Elem[D[n+r],e,esize] AND Elem[D[m+r],e,esize]) then
                Elem[D[d+r],e,esize] = Ones(esize);
            else
                Elem[D[d+r],e,esize] = Zeros(esize);

Exceptions
Undefined Instruction.

A8.6.409 VUZP
Vector Unzip de-interleaves the elements of two vectors. See Table A8-13 and Table A8-14 for examples of the operation. The elements of the vectors can be 8-bit, 16-bit, or 32-bit. There is no distinction between data types.

Encoding T1 / A1    Advanced SIMD
VUZP<c>.<size> <Qd>, <Qm>
VUZP<c>.<size> <Dd>, <Dm>
T1: 1111 1111 1 D 11 size 10 | Vd 0001 0 Q M 0 Vm
A1: 1111 0011 1 D 11 size 10 Vd 0001 0 Q M 0 Vm

if size == ‘11’ || (Q == ‘0’ && size == ‘10’) then UNDEFINED;
if Q == ‘1’ && (Vd<0> == ‘1’ || Vm<0> == ‘1’) then UNDEFINED;
quadword_operation = (Q == ‘1’); esize = 8 << UInt(size);
d = UInt(D:Vd); m = UInt(M:Vm);

Table A8-13 Operation of doubleword VUZP.8
Register   State before operation     State after operation
Dd         A7 A6 A5 A4 A3 A2 A1 A0    B6 B4 B2 B0 A6 A4 A2 A0
Dm         B7 B6 B5 B4 B3 B2 B1 B0    B7 B5 B3 B1 A7 A5 A3 A1

Table A8-14 Operation of quadword VUZP.32
Register   State before operation     State after operation
Qd         A3 A2 A1 A0                B2 B0 A2 A0
Qm         B3 B2 B1 B0                B3 B1 A3 A1

Assembler syntax
VUZP<c><q>.<size> <Qd>, <Qm>    Encoded as Q = 1
VUZP<c><q>.<size> <Dd>, <Dm>    Encoded as Q = 0
where:
<c>, <q> See Standard assembler syntax fields on page A8-7. An ARM VUZP instruction must be unconditional.
<size> The data size for the elements of the vectors. It must be one of:
8 encoded as size = 0b00
16 encoded as size = 0b01
32 encoded as size = 0b10, for a quadword operation.
A doubleword operation with <size> = 32 is a pseudo-instruction.
<Qd>, <Qm> The vectors for a quadword operation.
<Dd>, <Dm> The vectors for a doubleword operation.

Operation
if ConditionPassed() then
    EncodingSpecificOperations(); CheckAdvSIMDEnabled();
    if quadword_operation then
        if d == m then
            Q[d>>1] = bits(128) UNKNOWN; Q[m>>1] = bits(128) UNKNOWN;
        else
            zipped_q = Q[m>>1]:Q[d>>1];
            for e = 0 to (128 DIV esize) - 1
                Elem[Q[d>>1],e,esize] = Elem[zipped_q,2*e,esize];
                Elem[Q[m>>1],e,esize] = Elem[zipped_q,2*e+1,esize];
    else
        if d == m then
            D[d] = bits(64) UNKNOWN; D[m] = bits(64) UNKNOWN;
        else
            zipped_d = D[m]:D[d];
            for e = 0 to (64 DIV esize) - 1
                Elem[D[d],e,esize] = Elem[zipped_d,2*e,esize];
                Elem[D[m],e,esize] = Elem[zipped_d,2*e+1,esize];

Exceptions
Undefined Instruction.

Pseudo-instruction
VUZP.32 <Dd>, <Dm> is a synonym for VTRN.32 <Dd>, <Dm>. For details see VTRN on page A8-800.

A8.6.410 VZIP
Vector Zip interleaves the elements of two vectors. See Table A8-15 and Table A8-16 for examples of the operation. The elements of the vectors can be 8-bit, 16-bit, or 32-bit. There is no distinction between data types.

Encoding T1 / A1    Advanced SIMD
VZIP<c>.<size> <Qd>, <Qm>
VZIP<c>.<size> <Dd>, <Dm>
T1: 1111 1111 1 D 11 size 10 | Vd 0001 1 Q M 0 Vm
A1: 1111 0011 1 D 11 size 10 Vd 0001 1 Q M 0 Vm

if size == ‘11’ || (Q == ‘0’ && size == ‘10’) then UNDEFINED;
if Q == ‘1’ && (Vd<0> == ‘1’ || Vm<0> == ‘1’) then UNDEFINED;
quadword_operation = (Q == ‘1’); esize = 8 << UInt(size);
d = UInt(D:Vd); m = UInt(M:Vm);

Table A8-15 Operation of doubleword VZIP.8
Register   State before operation     State after operation
Dd         A7 A6 A5 A4 A3 A2 A1 A0    B3 A3 B2 A2 B1 A1 B0 A0
Dm         B7 B6 B5 B4 B3 B2 B1 B0    B7 A7 B6 A6 B5 A5 B4 A4

Table A8-16 Operation of quadword VZIP.32
Register   State before operation     State after operation
Qd         A3 A2 A1 A0                B1 A1 B0 A0
Qm         B3 B2 B1 B0                B3 A3 B2 A2

Assembler syntax
VZIP<c><q>.<size> <Qd>, <Qm>    Encoded as Q = 1
VZIP<c><q>.<size> <Dd>, <Dm>    Encoded as Q = 0
where:
<c>, <q> See Standard assembler syntax fields on page A8-7. An ARM VZIP instruction must be unconditional.
<size> The data size for the elements of the vectors. It must be one of:
8 encoded as size = 0b00
16 encoded as size = 0b01
32 encoded as size = 0b10, for a quadword operation.
A doubleword operation with <size> = 32 is a pseudo-instruction.
<Qd>, <Qm> The vectors for a quadword operation.
<Dd>, <Dm> The vectors for a doubleword operation.

Operation
if ConditionPassed() then
    EncodingSpecificOperations(); CheckAdvSIMDEnabled();
    if quadword_operation then
        if d == m then
            Q[d>>1] = bits(128) UNKNOWN; Q[m>>1] = bits(128) UNKNOWN;
        else
            bits(256) zipped_q;
            for e = 0 to (128 DIV esize) - 1
                Elem[zipped_q,2*e,esize] = Elem[Q[d>>1],e,esize];
                Elem[zipped_q,2*e+1,esize] = Elem[Q[m>>1],e,esize];
            Q[d>>1] = zipped_q<127:0>; Q[m>>1] = zipped_q<255:128>;
    else
        if d == m then
            D[d] = bits(64) UNKNOWN; D[m] = bits(64) UNKNOWN;
        else
            bits(128) zipped_d;
            for e = 0 to (64 DIV esize) - 1
                Elem[zipped_d,2*e,esize] = Elem[D[d],e,esize];
                Elem[zipped_d,2*e+1,esize] = Elem[D[m],e,esize];
            D[d] = zipped_d<63:0>; D[m] = zipped_d<127:64>;

Exceptions
Undefined Instruction.

Pseudo-instruction
VZIP.32 <Dd>, <Dm> is a synonym for VTRN.32 <Dd>, <Dm>. For details see VTRN on page A8-800.

A8.6.411 WFE
Wait For Event is a hint instruction that permits the processor to enter a low-power state until one of a number of events occurs, including events signaled by executing the SEV instruction on any processor in the multiprocessor system. For more information, see Wait For Event and Send Event on page B1-44.

Encoding T1    ARMv7 (executes as NOP in ARMv6T2)
WFE<c>
T1: 1011 1111 0010 0000
// No additional decoding required

Encoding T2    ARMv7 (executes as NOP in ARMv6T2)
WFE<c>.W
T2: 1111 0011 1010 (1)(1)(1)(1) | 10(0)0 (0)000 0000 0010
// No additional decoding required

Encoding A1    ARMv6K, ARMv7 (executes as NOP in ARMv6T2)
WFE<c>
A1: cond 0011 0010 0000 (1)(1)(1)(1) (0)(0)(0)(0) 0000 0010
// No additional decoding required

Assembler syntax
WFE<c><q>
where:
<c>, <q> See Standard assembler syntax fields on page A8-7.

Operation
if ConditionPassed() then
    EncodingSpecificOperations();
    if EventRegistered() then
        ClearEventRegister();
    else
        WaitForEvent();

Exceptions
None.

A8.6.412 WFI
Wait For Interrupt is a hint instruction that permits the processor to enter a low-power state until one of a number of asynchronous events occurs. For details, see Wait For Interrupt on page B1-47.
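The WFE Operation pseudocode in A8.6.411 only waits when no event is registered. If an event arrived earlier, for example from SEV, WFE consumes it and completes immediately. The toy Python model below is a sketch of that rule only. It assumes a simple per-processor boolean event register and assumes that SEV signals every processor in the system. The class and method names are hypothetical, and the model ignores other wake-up sources and implementation-defined behavior.

```python
class EventRegisterModel:
    """Toy model of the WFE/SEV event register (hypothetical names, not an ARM API)."""

    def __init__(self, num_cpus: int) -> None:
        # One event-register flag per processor; all clear at start.
        self.event_registered = [False] * num_cpus

    def sev(self) -> None:
        # Assumption for this model: SEV signals an event to every processor.
        for cpu in range(len(self.event_registered)):
            self.event_registered[cpu] = True

    def wfe(self, cpu: int) -> bool:
        """Return True if WFE completes at once, False if the processor may wait."""
        if self.event_registered[cpu]:          # EventRegistered()
            self.event_registered[cpu] = False  # ClearEventRegister()
            return True
        return False                            # WaitForEvent(): may enter low-power state


model = EventRegisterModel(num_cpus=2)
print(model.wfe(0))  # False: nothing registered, so cpu 0 may wait
model.sev()
print(model.wfe(0))  # True: the pending event is consumed
print(model.wfe(0))  # False: the event register was cleared by the previous WFE
print(model.wfe(1))  # True: cpu 1 also registered the SEV
```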
Encoding T1    ARMv7 (executes as NOP in ARMv6T2)
WFI<c>
T1: 1011 1111 0011 0000
// No additional decoding required

Encoding T2    ARMv7 (executes as NOP in ARMv6T2)
WFI<c>.W
T2: 1111 0011 1010 (1)(1)(1)(1) | 10(0)0 (0)000 0000 0011
// No additional decoding required

Encoding A1    ARMv6K, ARMv7 (executes as NOP in ARMv6T2)
WFI<c>
A1: cond 0011 0010 0000 (1)(1)(1)(1) (0)(0)(0)(0) 0000 0011
// No additional decoding required

Assembler syntax
WFI<c><q>
where:
<c>, <q> See Standard assembler syntax fields on page A8-7.

Operation
if ConditionPassed() then
    EncodingSpecificOperations();
    WaitForInterrupt();

Exceptions
None.

A8.6.413 YIELD
YIELD is a hint instruction. It enables software with a multithreading capability to indicate to the hardware that it is performing a task, for example a spin-lock, that could be swapped out to improve overall system performance. Hardware can use this hint to suspend and resume multiple code threads if it supports the capability.
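The WFE and WFI encodings above, and the YIELD encodings that follow, share one bit pattern per encoding and differ only in a small hint field in the low bits (YIELD = 1, WFE = 2, WFI = 3 in the patterns shown). The sketch below is a hypothetical helper, not taken from any assembler. It builds the instruction words with should-be-one bits (1) set and should-be-zero bits (0) cleared. For T2 it returns both halfwords as one 32-bit value, with the first halfword in bits [31:16].

```python
# Hint field values as they appear in the YIELD, WFE and WFI encodings.
HINT_NUMBERS = {"YIELD": 1, "WFE": 2, "WFI": 3}


def encode_hint(name: str, encoding: str, cond: int = 0xE) -> int:
    """Return the instruction word for a hint (hypothetical helper).

    encoding: "T1" (16-bit Thumb), "T2" (32-bit Thumb, first halfword in
    bits 31:16) or "A1" (ARM, with condition field cond; 0xE is AL).
    """
    hint = HINT_NUMBERS[name]
    if encoding == "T1":
        return 0xBF00 | (hint << 4)              # 1011 1111 <hint> 0000
    if encoding == "T2":
        return 0xF3AF8000 | hint                 # 1111 0011 1010 1111 | 1000 0000 0000 <hint>
    if encoding == "A1":
        if not 0 <= cond <= 0xE:
            raise ValueError("cond must be in the range 0b0000-0b1110")
        return (cond << 28) | 0x0320F000 | hint  # cond 0011 0010 0000 1111 0000 0000 <hint>
    raise ValueError(f"unknown encoding {encoding!r}")


for name in HINT_NUMBERS:
    print(name, hex(encode_hint(name, "T1")),
          hex(encode_hint(name, "T2")), hex(encode_hint(name, "A1")))
```

Running the loop prints, for example, `WFE 0xbf20 0xf3af8002 0xe320f002`.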
Encoding T1    ARMv7 (executes as NOP in ARMv6T2)
YIELD<c>
T1: 1011 1111 0001 0000
// No additional decoding required

Encoding T2    ARMv7 (executes as NOP in ARMv6T2)
YIELD<c>.W
T2: 1111 0011 1010 (1)(1)(1)(1) | 10(0)0 (0)000 0000 0001
// No additional decoding required

Encoding A1    ARMv6K, ARMv7 (executes as NOP in ARMv6T2)
YIELD<c>
A1: cond 0011 0010 0000 (1)(1)(1)(1) (0)(0)(0)(0) 0000 0001
// No additional decoding required

Assembler syntax
YIELD<c><q>
where:
<c>, <q> See Standard assembler syntax fields on page A8-7.

Operation
if ConditionPassed() then
    EncodingSpecificOperations();
    Hint_Yield();

Exceptions
None.

Chapter A9 ThumbEE
This chapter contains detailed reference material on ThumbEE. It contains the following sections:
• The ThumbEE instruction set on page A9-2
• ThumbEE instruction set encoding on page A9-6
• Additional instructions in Thumb and ThumbEE instruction sets on page A9-7
• ThumbEE instructions with modified behavior on page A9-8
• Additional ThumbEE instructions on page A9-14.

A9.1 The ThumbEE instruction set
In general, instructions in ThumbEE are identical to Thumb instructions, with the following exceptions:
• A small number of instructions are affected by modifications to transitions from ThumbEE state. For more information, see ThumbEE state transitions.
• A substantial number of instructions have a null check on the base register before any other operation takes place, but are identical (or almost identical) in all other respects. For more information, see Null checking on page A9-3.
• A small number of instructions are modified in additional ways. See Instructions with modifications on page A9-4.
• Three Thumb instructions, BLX (immediate), 16-bit LDM, and 16-bit STM, are removed in ThumbEE state. The encoding corresponding to BLX (immediate) in Thumb is UNDEFINED in ThumbEE state. 16-bit LDM and STM are replaced by new instructions; for details see Additional ThumbEE instructions on page A9-14.
• Two new 32-bit instructions, ENTERX and LEAVEX, are introduced in both the Thumb instruction set and the ThumbEE instruction set. See Additional instructions in Thumb and ThumbEE instruction sets on page A9-7. These instructions use previously UNDEFINED encodings.

A9.1.1 ThumbEE state transitions
Instruction set state transitions to ThumbEE state can occur implicitly as part of a return from exception, or explicitly on execution of an ENTERX instruction.
Instruction set state transitions from ThumbEE state can only occur due to an exception, or due to a transition to Thumb state using the LEAVEX instruction. Return from exception instructions (RFE and SUBS PC, LR, #imm) are UNPREDICTABLE in ThumbEE state.
Any other Thumb instructions that can update the PC in ThumbEE state are UNPREDICTABLE if they attempt to change to ARM state. Interworking of ARM and Thumb instructions is not supported in ThumbEE state. The instructions affected are:
• LDR, LDM, and POP instructions that write to the PC, if bit [0] of the value loaded to the PC is 0
• BLX (register), BX, and BXJ, where Rm bit [0] == 0.
Note
SVC, BKPT, and UNDEFINED instructions cause an exception to occur.
If a BXJ instruction is executed in ThumbEE state, with Rm bit[0] == 1, it does not enter Jazelle state.
Instead, it behaves like the corresponding BX instruction and remains in ThumbEE state.
Debug state is a special case. For the rules governing changes to CPSR state bits and Debug state, see Executing instructions in Debug state on page C5-9.

A9.1.2 Null checking
A null check is performed for all load/store instructions when they are executed in ThumbEE state. If the value in the base register is zero, execution branches to the NullCheck handler at HandlerBase - 4.
For most load/store instructions, this is the only difference from normal Thumb operation. Exceptions to this rule are described in this chapter.
Note
• The null check examines the value in the base register, not any calculated value offset from the base register.
• If the base register is the SP or the PC, a zero value in the base register results in UNPREDICTABLE behavior.
The instructions affected by null checking are:
• all instructions whose mnemonic starts with LD, ST, VLD or VST
• POP, PUSH, TBB, TBH, VPOP, and VPUSH.
For each of these instructions, the pseudocode shown in the Operation section uses the following function:

// NullCheckIfThumbEE()
// ====================
NullCheckIfThumbEE(integer n)
    if CurrentInstrSet() == InstrSet_ThumbEE then
        if n == 15 then
            if IsZero(Align(PC,4)) then UNPREDICTABLE;
        elsif n == 13 then
            if IsZero(SP) then UNPREDICTABLE;
        else
            if IsZero(R[n]) then
                LR = PC<31:1> : ‘1’;  // PC holds this instruction’s address plus 4
                BranchWritePC(TEEHBR - 4);
                EndOfInstruction();
    return;

A9.1.3 Instructions with modifications
In addition to the instructions described in ThumbEE state transitions on page A9-2 and Null checking on page A9-3, Table A9-1 shows other instructions that are modified in ThumbEE state.
The pseudocode, including the null check if any, is given in ThumbEE instructions with modified behavior on page A9-8.

Table A9-1 Modified instructions
Instruction        Rbase   Modification
LDR (register)     Rn      Rm multiplied by 4, null check
LDRH (register)    Rn      Rm multiplied by 2, null check
LDRSH (register)   Rn      Rm multiplied by 2, null check
STR (register)     Rn      Rm multiplied by 4, null check
STRH (register)    Rn      Rm multiplied by 2, null check

A9.1.4 IT block and check handlers
CHKA, stores, and permitted loads (loads to the PC are only permitted as the last instruction) can occur anywhere in an IT block. If one of these instructions results in a branch to the null pointer or array index handlers, the IT state bits in ITSTATE are cleared. This enables unconditional execution from the start of the handler. The original IT state bits are not preserved.

A9.2 ThumbEE instruction set encoding
In general, instructions in the ThumbEE instruction set are encoded in exactly the same way as Thumb instructions described in Chapter A6 Thumb Instruction Set Encoding. The differences are as follows:
• There are no 16-bit LDM or STM instructions in the ThumbEE instruction set.
• The 16-bit encodings used for LDM and STM in the Thumb instruction set are used for different 16-bit instructions in the ThumbEE instruction set. For details, see 16-bit ThumbEE instructions.
• There are two new 32-bit instructions in both Thumb state and ThumbEE state. For details, see Additional instructions in Thumb and ThumbEE instruction sets on page A9-7.

A9.2.1 16-bit ThumbEE instructions
Encoding space: bits [15:12] = 1100, bits [11:8] = Opcode, bits [7:0] depend on the instruction.
Table A9-2 shows the allocation of encodings in this space. Other encodings in this space are UNDEFINED.
Table A9-2 16-bit ThumbEE instructions
Opcode   Instruction                                 See
0000     Handler Branch with Parameter               HBP on page A9-18
0001     UNDEFINED                                   -
001x     Handler Branch, Handler Branch with Link    HB, HBL on page A9-16
01xx     Handler Branch with Link and Parameter      HBLP on page A9-17
100x     Load Register from a frame                  LDR (immediate) on page A9-19
1010     Check Array                                 CHKA on page A9-15
1011     Load Register from a literal pool           LDR (immediate) on page A9-19
110x     Load Register (array operations)            LDR (immediate) on page A9-19
111x     Store Register to a frame                   STR (immediate) on page A9-21

A9.3 Additional instructions in Thumb and ThumbEE instruction sets
On a processor with the ThumbEE extension, there are two additional 32-bit instructions, ENTERX and LEAVEX. These are available in both Thumb state and ThumbEE state.

A9.3.1 ENTERX, LEAVEX
ENTERX causes a change from Thumb state to ThumbEE state, or has no effect in ThumbEE state.
LEAVEX causes a change from ThumbEE state to Thumb state, or has no effect in Thumb state.

Encoding T1    ThumbEE
ENTERX    Not permitted in IT block.
LEAVEX    Not permitted in IT block.
T1: 1111 0011 1011 (1)(1)(1)(1) | 10(0)0 (1)(1)(1)(1) 000 J (1)(1)(1)(1)

is_enterx = (J == ‘1’);

Assembler syntax
ENTERX    Encoded as J = 1
LEAVEX    Encoded as J = 0
where: See Standard assembler syntax fields on page A8-7. An ENTERX or LEAVEX instruction must be unconditional.

Operation
if is_enterx then
    SelectInstrSet(InstrSet_ThumbEE);
else
    SelectInstrSet(InstrSet_Thumb);

Exceptions
None.
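As a sketch of how the J bit selects between the two instructions, the hypothetical helpers below build and recognize the ENTERX and LEAVEX halfwords from the T1 diagram above. They are illustrative assumptions, not a disassembler: encoding sets the should-be-one bits (1) and clears the should-be-zero bit (0), and decoding checks only the fixed bits, so it says nothing about what happens when the should-be bits hold other values.

```python
from typing import Optional, Tuple


def encode_enterx_leavex(enter: bool) -> Tuple[int, int]:
    """Return (hw1, hw2) for ENTERX (enter=True) or LEAVEX (enter=False)."""
    hw1 = 0xF3BF                      # 1111 0011 1011 (1)(1)(1)(1)
    hw2 = 0x8F0F | (int(enter) << 4)  # 10(0)0 (1)(1)(1)(1) 000 J (1)(1)(1)(1)
    return hw1, hw2


def decode_enterx_leavex(hw1: int, hw2: int) -> Optional[str]:
    """Return "ENTERX", "LEAVEX", or None if the fixed bits do not match."""
    if (hw1 & 0xFFF0) != 0xF3B0:      # hw1[15:4] must be 1111 0011 1011
        return None
    if (hw2 & 0xD0E0) != 0x8000:      # hw2[15:14] = 10, hw2[12] = 0, hw2[7:5] = 000
        return None
    is_enterx = ((hw2 >> 4) & 1) == 1  # is_enterx = (J == '1')
    return "ENTERX" if is_enterx else "LEAVEX"


print([hex(h) for h in encode_enterx_leavex(True)])  # ['0xf3bf', '0x8f1f']
print(decode_enterx_leavex(0xF3BF, 0x8F0F))          # LEAVEX
```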
A9.4 ThumbEE instructions with modified behavior

The 16-bit encodings of the following Thumb instructions have changed functionality in ThumbEE:
• LDR (register) on page A9-9
• LDRH (register) on page A9-10
• LDRSH (register) on page A9-11
• STR (register) on page A9-12
• STRH (register) on page A9-13.

In ThumbEE state there are the following changes in the behavior of instructions:
• All load/store instructions perform null checks on their base register values, as described in Null checking on page A9-3. The pseudocode for these instructions in Chapter A8 Instruction Details describes this by calling the NullCheckIfThumbEE() pseudocode procedure.
• Instructions that attempt to enter ARM state are UNPREDICTABLE, as described in ThumbEE state transitions on page A9-2. The pseudocode for these instructions in Chapter A8 Instruction Details describes this by calling the SelectInstrSet() or BXWritePC() pseudocode procedure.
• The BXJ instruction behaves like the BX instruction, as described in ThumbEE state transitions on page A9-2. The pseudocode for the instruction, in BXJ on page A8-64, describes this directly.

A9.4.1 LDR (register)

Load Register (register) calculates an address from a base register value and an offset register value, loads a word from memory, and writes it to a register. The offset register value is shifted left by 2 bits. For information about memory accesses see Memory accesses on page A8-13.

The similar Thumb instruction does not have a left shift.

Encoding T1  ThumbEE
LDR <Rt>,[<Rn>,<Rm>, LSL #2]

 15 14 13 12 11 10  9 | 8  7  6 | 5  4  3 | 2  1  0
  0  1  0  1  1  0  0 |   Rm    |   Rn    |   Rt

t = UInt(Rt);  n = UInt(Rn);  m = UInt(Rm);

Assembler syntax
LDR<c><q> <Rt>, [<Rn>, <Rm>, LSL #2]

where:
<c><q>    See Standard assembler syntax fields on page A8-7.
<Rt>      The destination register.
<Rn>      The base register.
<Rm>      Contains the offset that is shifted and applied to the value of <Rn> to form the address.
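All five modified encodings share one layout: bits 15:9 select the instruction (0101100 for LDR above; the LDRH, LDRSH, STR, and STRH values are given in their own subsections), and the remaining bits hold Rm, Rn, and Rt. The following C sketch, using a hypothetical struct and function name, decodes that layout and records the ThumbEE shift amount from Table A9-1. It recognizes only these five encodings and leaves every other 16-bit pattern to the caller.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical decoded form of the five ThumbEE register-offset encodings. */
typedef struct {
    const char *mnemonic; /* "LDR", "LDRH", "LDRSH", "STR" or "STRH" */
    unsigned shift;       /* LSL applied to Rm in ThumbEE state */
    unsigned t, n, m;     /* register numbers from Rt, Rn, Rm */
} ee_t1_fields;

/* Returns false for any halfword that is not one of the five encodings. */
static bool ee_decode_t1(uint16_t hw, ee_t1_fields *f) {
    switch (hw >> 9) {                                       /* bits 15:9 */
    case 0x2C: f->mnemonic = "LDR";   f->shift = 2; break;   /* 0101100 */
    case 0x2D: f->mnemonic = "LDRH";  f->shift = 1; break;   /* 0101101 */
    case 0x2F: f->mnemonic = "LDRSH"; f->shift = 1; break;   /* 0101111 */
    case 0x28: f->mnemonic = "STR";   f->shift = 2; break;   /* 0101000 */
    case 0x29: f->mnemonic = "STRH";  f->shift = 1; break;   /* 0101001 */
    default:   return false;
    }
    f->m = (hw >> 6) & 7u;                                   /* bits 8:6 */
    f->n = (hw >> 3) & 7u;                                   /* bits 5:3 */
    f->t = hw & 7u;                                          /* bits 2:0 */
    return true;
}
```

For example, 0x58D1 decodes as LDR R1, [R2, R3, LSL #2].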
Operation
if ConditionPassed() then
    EncodingSpecificOperations();  NullCheckIfThumbEE(n);
    address = R[n] + LSL(R[m],2);
    R[t] = MemU[address,4];

Exceptions and checks
Data Abort, NullCheck.

A9.4.2 LDRH (register)

Load Register Halfword (register) calculates an address from a base register value and an offset register value, loads a halfword from memory, zero-extends it to form a 32-bit word, and writes it to a register. The offset register value is shifted left by 1 bit. For information about memory accesses see Memory accesses on page A8-13.

The similar Thumb instruction does not have a left shift.

Encoding T1  ThumbEE
LDRH <Rt>,[<Rn>,<Rm>, LSL #1]

 15 14 13 12 11 10  9 | 8  7  6 | 5  4  3 | 2  1  0
  0  1  0  1  1  0  1 |   Rm    |   Rn    |   Rt

t = UInt(Rt);  n = UInt(Rn);  m = UInt(Rm);

Assembler syntax
LDRH<c><q> <Rt>, [<Rn>, <Rm>, LSL #1]

where:
<c><q>    See Standard assembler syntax fields on page A8-7.
<Rt>      The destination register.
<Rn>      The base register.
<Rm>      Contains the offset that is shifted and applied to the value of <Rn> to form the address.

Operation
if ConditionPassed() then
    EncodingSpecificOperations();  NullCheckIfThumbEE(n);
    address = R[n] + LSL(R[m],1);
    R[t] = ZeroExtend(MemU[address,2], 32);

Exceptions and checks
Data Abort, NullCheck.

A9.4.3 LDRSH (register)

Load Register Signed Halfword (register) calculates an address from a base register value and an offset register value, loads a halfword from memory, sign-extends it to form a 32-bit word, and writes it to a register. The offset register value is shifted left by 1 bit. For information about memory accesses see Memory accesses on page A8-13.

The similar Thumb instruction does not have a left shift.
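LDRH and LDRSH share the same null check and address calculation and differ only in ZeroExtend() versus SignExtend() of the loaded halfword. The following C sketch models both on a byte array standing in for memory. The type and function names are hypothetical, the little-endian byte order is an assumption of the example, and a failed null check is returned as a flag rather than as a branch to the NullCheck handler.

```c
#include <stdbool.h>
#include <stdint.h>

/* Outcome of a modelled ThumbEE halfword load (hypothetical type). */
typedef struct {
    bool null_check_failed; /* true: no access; the NullCheck handler runs instead */
    uint32_t value;         /* register result when the check passes */
} ee_load_result;

/* Little-endian halfword read from a byte array standing in for memory
   (byte order and array model are assumptions of this sketch). */
static uint16_t read_halfword_le(const uint8_t *mem, uint32_t address) {
    return (uint16_t)(mem[address] | (mem[address + 1u] << 8));
}

/* LDRH/LDRSH (register) in ThumbEE state: null check on Rn first, then
   address = Rn + (Rm << 1), then zero- or sign-extension to 32 bits. */
static ee_load_result ee_load_halfword(const uint8_t *mem, uint32_t rn,
                                       uint32_t rm, bool sign_extend) {
    ee_load_result r = { false, 0 };
    if (rn == 0) {                          /* NullCheckIfThumbEE(n) */
        r.null_check_failed = true;
        return r;
    }
    uint32_t address = rn + (rm << 1);      /* LSL(R[m],1) */
    uint16_t h = read_halfword_le(mem, address);
    if (sign_extend && (h & 0x8000u))
        r.value = 0xFFFF0000u | h;          /* SignExtend(..., 32), negative case */
    else
        r.value = h;                        /* ZeroExtend(..., 32) */
    return r;
}
```

With the halfword 0xFFFE in memory, the LDRH form yields 0x0000FFFE and the LDRSH form yields 0xFFFFFFFE.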
Encoding T1  ThumbEE
LDRSH <Rt>,[<Rn>,<Rm>, LSL #1]

 15 14 13 12 11 10  9 | 8  7  6 | 5  4  3 | 2  1  0
  0  1  0  1  1  1  1 |   Rm    |   Rn    |   Rt

t = UInt(Rt);  n = UInt(Rn);  m = UInt(Rm);

Assembler syntax
LDRSH<c><q> <Rt>, [<Rn>, <Rm>, LSL #1]

where:
<c><q>    See Standard assembler syntax fields on page A8-7.
<Rt>      The destination register.
<Rn>      The base register.
<Rm>      Contains the offset that is shifted and applied to the value of <Rn> to form the address.

Operation
if ConditionPassed() then
    EncodingSpecificOperations();  NullCheckIfThumbEE(n);
    address = R[n] + LSL(R[m],1);
    R[t] = SignExtend(MemU[address,2], 32);

Exceptions and checks
Data Abort, NullCheck.

A9.4.4 STR (register)

Store Register (register) calculates an address from a base register value and an offset register value, and stores a word from a register to memory. The offset register value is shifted left by 2 bits. For information about memory accesses see Memory accesses on page A8-13.

The similar Thumb instruction does not have a left shift.

Encoding T1  ThumbEE
STR <Rt>,[<Rn>,<Rm>, LSL #2]

 15 14 13 12 11 10  9 | 8  7  6 | 5  4  3 | 2  1  0
  0  1  0  1  0  0  0 |   Rm    |   Rn    |   Rt

t = UInt(Rt);  n = UInt(Rn);  m = UInt(Rm);

Assembler syntax
STR<c><q> <Rt>, [<Rn>, <Rm>, LSL #2]

where:
<c><q>    See Standard assembler syntax fields on page A8-7.
<Rt>      The source register.
<Rn>      The base register.
<Rm>      Contains the offset that is shifted and applied to the value of <Rn> to form the address.

Operation
if ConditionPassed() then
    EncodingSpecificOperations();  NullCheckIfThumbEE(n);
    address = R[n] + LSL(R[m],2);
    MemU[address,4] = R[t];

Exceptions and checks
Data Abort, NullCheck.

A9.4.5 STRH (register)

Store Register Halfword (register) calculates an address from a base register value and an offset register value, and stores a halfword from a register to memory. The offset register value is shifted left by 1 bit. For information about memory accesses see Memory accesses on page A8-13.
The similar Thumb instruction does not have a left shift.

Encoding T1  ThumbEE
STRH <Rt>,[<Rn>,<Rm>, LSL #1]

 15 14 13 12 11 10  9 | 8  7  6 | 5  4  3 | 2  1  0
  0  1  0  1  0  0  1 |   Rm    |   Rn    |   Rt

t = UInt(Rt);  n = UInt(Rn);  m = UInt(Rm);

Assembler syntax
STRH<c><q> <Rt>, [<Rn>, <Rm>, LSL #1]

where:
<c><q>    See Standard assembler syntax fields on page A8-7.
<Rt>      The source register.
<Rn>      The base register.
<Rm>      Contains the offset that is shifted and applied to the value of <Rn> to form the address.

Operation
if ConditionPassed() then
    EncodingSpecificOperations();  NullCheckIfThumbEE(n);
    address = R[n] + LSL(R[m],1);
    MemU[address,2] = R[t]<15:0>;

Exceptions and checks
Data Abort, NullCheck.

A9.5 Additional ThumbEE instructions

The following instructions are available in ThumbEE state, but not in Thumb state:
• CHKA on page A9-15
• HB, HBL on page A9-16
• HBLP on page A9-17
• HBP on page A9-18
• LDR (immediate) on page A9-19
• STR (immediate) on page A9-21.

These are 16-bit instructions. They occupy the instruction encoding space that STMIA and LDMIA occupy in Thumb state.

A9.5.1 CHKA

CHKA (Check Array) compares the unsigned values in two registers. If the first (the array size) is lower than, or the same as, the second (the array index), it copies the PC to the LR, and causes a branch to the IndexCheck handler.

Encoding E1  ThumbEE
CHKA <Rn>, <Rm>

 15 14 13 12 11 10  9  8 | 7 | 6  5  4  3 | 2  1  0
  1  1  0  0  1  0  1  0 | N |     Rm     |   Rn

n = UInt(N:Rn);  m = UInt(Rm);
if n == 15 || BadReg(m) then UNPREDICTABLE;

Assembler syntax
CHKA<c><q> <Rn>, <Rm>

where:
<c><q>    See Standard assembler syntax fields on page A8-7.
<Rn>      The first operand register. This contains the array size. Use of the SP is permitted.
<Rm>      The second operand register. This contains the array index.
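In the CHKA encoding, the number of the first operand register is split across the N bit and the Rn field. This split is what lets a 16-bit instruction name any of R0-R14, including the SP, as the array-size register. The following C sketch, with a hypothetical struct and function name, reassembles n = UInt(N:Rn), extracts m, and flags the UNPREDICTABLE cases. It assumes, as in the ARMv7 pseudocode, that BadReg() is true for R13 and R15.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical decoded form of CHKA encoding E1. */
typedef struct {
    unsigned n;          /* first operand (array size) register, 0-15 */
    unsigned m;          /* second operand (array index) register, 0-15 */
    bool unpredictable;  /* n == 15 || BadReg(m) */
} chka_fields;

/* Returns false when bits 15:8 are not 11001010. */
static bool chka_decode(uint16_t hw, chka_fields *f) {
    if ((hw >> 8) != 0xCA) return false;
    unsigned N  = (hw >> 7) & 1u;      /* bit 7 */
    unsigned rm = (hw >> 3) & 0xFu;    /* bits 6:3 */
    unsigned rn = hw & 7u;             /* bits 2:0 */
    f->n = (N << 3) | rn;              /* n = UInt(N:Rn) */
    f->m = rm;                         /* m = UInt(Rm) */
    /* BadReg() is true for R13 (SP) and R15 (PC). */
    f->unpredictable = (f->n == 15u) || rm == 13u || rm == 15u;
    return true;
}
```

For example, 0xCA91 decodes as CHKA R9, R2, and 0xCA85 as CHKA SP, R0, which is permitted.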
Operation
if ConditionPassed() then
    EncodingSpecificOperations();
    if UInt(R[n]) <= UInt(R[m]) then
        LR = PC<31:1> : ‘1’;        // PC holds this instruction’s address + 4
        BranchWritePC(TEEHBR - 8);

Exceptions and checks
IndexCheck.

Usage
Use CHKA to check that an array index is in bounds. CHKA does not modify the APSR condition code flags.

A9.5.2 HB, HBL

Handler Branch branches to a specified handler.

Handler Branch with Link saves a return address to the LR, and then branches to a specified handler.

Encoding E1  ThumbEE
HB{L} #<HandlerID>

 15 14 13 12 11 10  9 | 8 | 7  6  5  4  3  2  1  0
  1  1  0  0  0  0  1 | L |        handler

generate_link = (L == ‘1’);  handler_offset = ZeroExtend(handler:‘00000’, 32);

Assembler syntax
HB<c><q> #<HandlerID>     Encoded as L = 0
HBL<c><q> #<HandlerID>    Encoded as L = 1

where:
<c><q>         See Standard assembler syntax fields on page A8-7.
<HandlerID>    The index number of the handler to be called, in the range 0-255.

Operation
if ConditionPassed() then
    EncodingSpecificOperations();
    if generate_link then
        next_instr_addr = PC - 2;
        LR = next_instr_addr<31:1> : ‘1’;
    BranchWritePC(TEEHBR + handler_offset);

Exceptions
None.

Usage
HB{L} makes a large number of handlers available: the 8-bit handler field selects one of 256 handler entries, spaced 32 bytes apart starting at TEEHBR.

A9.5.3 HBLP

HBLP (Handler Branch with Link and Parameter) saves a return address to the LR, and then branches to a specified handler. It passes a 5-bit parameter to the handler in R8.

Encoding E1  ThumbEE
HBLP #<imm>, #<HandlerID>

 15 14 13 12 11 10 | 9  8  7  6  5 | 4  3  2  1  0
  1  1  0  0  0  1 |     imm5      |    handler

imm32 = ZeroExtend(imm5, 32);  handler_offset = ZeroExtend(handler:‘00000’, 32);

Assembler syntax
HBLP<c><q> #<imm>, #<HandlerID>

where:
<c><q>         See Standard assembler syntax fields on page A8-7.
<imm>          The parameter to pass to the handler, in the range 0-31.
<HandlerID>    The index number of the handler to be called, in the range 0-31.
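The branch targets in the CHKA and HB, HBL pseudocode are fixed offsets from TEEHBR. The following C sketch evaluates them for a TEEHBR value and an instruction address supplied by the caller. The function names are hypothetical, and the values used in the example are arbitrary. The sketch shows that consecutive handler entries are 32 bytes apart (handler:‘00000’), that the IndexCheck entry is 8 bytes below TEEHBR, and how the HBL link value is formed.

```c
#include <stdint.h>

/* CHKA failure: BranchWritePC(TEEHBR - 8), the IndexCheck handler. */
static uint32_t ee_indexcheck_target(uint32_t teehbr) {
    return teehbr - 8u;
}

/* HB, HBL: BranchWritePC(TEEHBR + handler_offset), where
   handler_offset = ZeroExtend(handler:'00000', 32), that is handler * 32. */
static uint32_t ee_handler_target(uint32_t teehbr, uint32_t handler) {
    return teehbr + (handler << 5);
}

/* HBL link value: PC reads as this instruction's address + 4,
   next_instr_addr = PC - 2, and LR = next_instr_addr<31:1>:'1'. */
static uint32_t ee_hbl_link_value(uint32_t instr_addr) {
    uint32_t pc = instr_addr + 4u;
    uint32_t next_instr_addr = pc - 2u;
    return (next_instr_addr & ~1u) | 1u;   /* bit 0 forced to 1 */
}
```

With TEEHBR = 0x8000, handler 3 is entered at 0x8060 and the IndexCheck handler at 0x7FF8; an HBL at 0x1000 writes 0x1003 to the LR.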
Operation
if ConditionPassed() then
    EncodingSpecificOperations();
    R[8] = imm32;
    next_instr_addr = PC - 2;
    LR = next_instr_addr<31:1> : ‘1’;
    BranchWritePC(TEEHBR + handler_offset);

Exceptions
None.

A9.5.4 HBP

HBP (Handler Branch with Parameter) causes a branch to a specified handler. It passes a 3-bit parameter to the handler in R8.

Encoding E1  ThumbEE
HBP #<imm>, #<HandlerID>

 15 14 13 12 11 10  9  8 | 7  6  5 | 4  3  2  1  0
  1  1  0  0  0  0  0  0 |  imm3   |    handler

imm32 = ZeroExtend(imm3, 32);  handler_offset = ZeroExtend(handler:‘00000’, 32);

Assembler syntax
HBP<c><q> #<imm>, #<HandlerID>

where:
<c><q>         See Standard assembler syntax fields on page A8-7.
<imm>          The parameter to pass to the handler, in the range 0-7.
<HandlerID>    The index number of the handler to be called, in the range 0-31.

Operation
if ConditionPassed() then
    EncodingSpecificOperations();
    R[8] = imm32;
    BranchWritePC(TEEHBR + handler_offset);

Exceptions
None.

A9.5.5 LDR (immediate)

Load Register (immediate) provides 16-bit instructions to load words using:
• R9 as base register, with a positive offset of up to 63 words, for loading from a frame
• R10 as base register, with a positive offset of up to 31 words, for loading from a literal pool
• R0-R7 as base register, with a negative offset of up to 7 words, for array operations.

Encoding E1  ThumbEE
LDR <Rt>,[R9{, #<imm>}]

 15 14 13 12 11 10  9 | 8  7  6  5  4  3 | 2  1  0
  1  1  0  0  1  1  0 |       imm6       |   Rt

t = UInt(Rt);  n = 9;  imm32 = ZeroExtend(imm6:‘00’, 32);  add = TRUE;

Encoding E2  ThumbEE
LDR <Rt>,[R10{, #<imm>}]

 15 14 13 12 11 10  9  8 | 7  6  5  4  3 | 2  1  0
  1  1  0  0  1  0  1  1 |     imm5      |   Rt

t = UInt(Rt);  n = 10;  imm32 = ZeroExtend(imm5:‘00’, 32);  add = TRUE;

Encoding E3  ThumbEE
LDR <Rt>,[<Rn>{, #-<imm>}]

 15 14 13 12 11 10  9 | 8  7  6 | 5  4  3 | 2  1  0
  1  1  0  0  1  0  0 |  imm3   |   Rn    |   Rt

t = UInt(Rt);  n = UInt(Rn);  imm32 = ZeroExtend(imm3:‘00’, 32);  add = FALSE;
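The offset limits listed for LDR (immediate) follow from the field widths: imm6, imm5, and imm3 each count words, so they are scaled by 4. The following C sketch of an assembler-side encoder picks E1, E2, or E3 from the base register and returns -1 when the offset cannot be encoded. The function name is hypothetical, the offset is taken in bytes, and the base constants 0xCC00, 0xCB00, and 0xC800 are read off the three encoding diagrams.

```c
#include <stdint.h>

/* Returns the 16-bit ThumbEE encoding of LDR <Rt>, [<Rn>{, #<imm>}],
   or -1 when no encoding exists. offset is the signed byte offset. */
static int32_t ee_encode_ldr_imm(unsigned rt, unsigned rn, int32_t offset) {
    if (rt > 7u || offset % 4 != 0) return -1;      /* Rt is 3 bits; word offsets only */
    if (rn == 9u && offset >= 0 && offset <= 252)    /* E1: 1100110 imm6 Rt */
        return (int32_t)(0xCC00u | ((uint32_t)(offset / 4) << 3) | rt);
    if (rn == 10u && offset >= 0 && offset <= 124)   /* E2: 11001011 imm5 Rt */
        return (int32_t)(0xCB00u | ((uint32_t)(offset / 4) << 3) | rt);
    if (rn <= 7u && offset <= 0 && offset >= -28)    /* E3: 1100100 imm3 Rn Rt */
        return (int32_t)(0xC800u | ((uint32_t)(-offset / 4) << 6) | (rn << 3) | rt);
    return -1;                                       /* out of range or wrong base */
}
```

For example, LDR R0, [R9, #8] encodes as 0xCC10, while a positive offset from R2 has no 16-bit ThumbEE encoding.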
Assembler syntax
LDR<c><q> <Rt>, [<Rn>{, #<imm>}]

where:
<c><q>    See Standard assembler syntax fields on page A8-7.
<Rt>      The destination register.
<Rn>      The base register. This register is:
          • R9 for encoding E1
          • R10 for encoding E2
          • any of R0-R7 for encoding E3.
<imm>     The immediate offset used to form the address. Values are multiples of 4 in the range:
          0-252       encoding E1
          0-124       encoding E2
          -28 to 0    encoding E3.
          <imm> can be omitted, meaning an offset of 0.

Operation
if ConditionPassed() then
    EncodingSpecificOperations();  NullCheckIfThumbEE(n);
    address = if add then (R[n] + imm32) else (R[n] - imm32);
    R[t] = MemU[address,4];

Exceptions and checks
Data Abort, NullCheck.

A9.5.6 STR (immediate)

Store Register (immediate) provides a 16-bit word store instruction using R9 as base register, with a positive offset of up to 63 words, for storing to a frame.

Encoding E1  ThumbEE
STR <Rt>,[R9{, #<imm>}]

 15 14 13 12 11 10  9 | 8  7  6  5  4  3 | 2  1  0
  1  1  0  0  1  1  1 |       imm6       |   Rt

t = UInt(Rt);  imm32 = ZeroExtend(imm6:‘00’, 32);

Assembler syntax
STR<c><q> <Rt>, [R9{, #<imm>}]

where:
<c><q>    See Standard assembler syntax fields on page A8-7.
<Rt>      The source register.
<imm>     The immediate offset applied to the value of R9 to form the address. Values are multiples of 4 in the range 0-252. <imm> can be omitted, meaning an offset of 0.

Operation
if ConditionPassed() then
    EncodingSpecificOperations();  NullCheckIfThumbEE(9);
    address = R[9] + imm32;
    MemU[address,4] = R[t];

Exceptions and checks
Data Abort, NullCheck.

Part B
System Level Architecture

Chapter B1
The System Level Programmers’ Model

This chapter provides a system-level view of the programmers’ model.
It contains the following sections:
• About the system level programmers’ model on page B1-2
• System level concepts and terminology on page B1-3
• ARM processor modes and core registers on page B1-6
• Instruction set states on page B1-23
• The Security Extensions on page B1-25
• Exceptions on page B1-30
• Coprocessors and system control on page B1-62
• Advanced SIMD and floating-point support on page B1-64
• Execution environment support on page B1-73.

B1.1 About the system level programmers’ model

An application programmer has only a restricted view of the system. The system level programmers’ model supports this application level view of the system, and includes features required for an operating system (OS) to provide the programming environment seen by an application.

The system level programmers’ model includes all of the system features required to support operating systems and to handle hardware events.

System level concepts and terminology on page B1-3 gives a system level introduction to the basic concepts of the ARM architecture, and the terminology used to describe the architecture. The rest of this chapter describes the system level programmers’ model.

The other chapters in this part describe:
• The memory system architectures:
  — Chapter B2 Common Memory System Architecture Features describes common features of the memory system architectures
  — Chapter B3 Virtual Memory System Architecture (VMSA) describes the Virtual Memory System Architecture (VMSA) used in the ARMv7-A profile
  — Chapter B4 Protected Memory System Architecture (PMSA) describes the Protected Memory System Architecture (PMSA) used in the ARMv7-R profile.
• The CPUID mechanism, that enables an OS to determine the capabilities of the processor it is running on. See Chapter B5 The CPUID Identification Scheme.
• The instructions that provide system-level functionality, such as returning from an exception. See Chapter B6 System Instructions.

B1.2 System level concepts and terminology

A number of concepts are critical to understanding the system level architecture support. These are introduced in the following sections:
• Privilege, mode, and state
• Exceptions on page B1-4.

B1.2.1 Privilege, mode, and state

Privilege, mode, and state are key concepts in the ARM architecture.

Privilege
Software can execute as privileged or unprivileged:
• Unprivileged execution limits or excludes access to some resources in the current security state.
• Privileged execution gives access to all resources in the current security state.

Mode
The ARM architecture provides a set of modes that support normal software execution and handle exceptions. The current mode determines the set of registers that are available and the privilege of the executing software. For more information, see ARM processor modes and core registers on page B1-6.

State
In the ARM architecture, state is used to describe the following distinct concepts:

Instruction set state
ARMv7 provides four instruction set states. The instruction set state determines the instruction set that is being executed, and is one of ARM state, Thumb state, Jazelle state, or ThumbEE state. ISETSTATE on page A2-15 gives more information about these states.

Execution state
The execution state consists of the instruction set state and some control bits that modify how the instruction stream is decoded. For details, see Execution state registers on page A2-15 and Program Status Registers (PSRs) on page B1-14.
Security state
In the ARM architecture, the number of security states depends on whether the Security Extensions are implemented:
• When the Security Extensions are implemented, the ARM architecture provides two security states, Secure state and Non-secure state. Each security state has its own system registers and memory address space.
  The security state is largely independent of the processor mode. The only exception to this independence of security state and processor mode is Monitor mode. Monitor mode exists only in the Secure state, and supports transitions between Secure and Non-secure state.
  Some system control resources are only accessible from the Secure state.
  For more information, see The Security Extensions on page B1-25.
  Note
  In some documentation, the Secure state is described as the Secure world, and the Non-secure state is described as the Non-secure world.
• When the Security Extensions are not implemented, the ARM architecture provides only a single security state.

Debug state
Debug state refers to the processor being halted for debug purposes, because a debug event has occurred when the processor is configured to Halting debug-mode. See Invasive debug on page C1-3.
When the processor is not in Debug state it is in Non-debug state.
Except where explicitly stated otherwise, parts A and B of this manual describe processor behavior and instruction execution in Non-debug state. Chapter C5 Debug State describes the differences in Debug state.

B1.2.2 Exceptions

An exception is a condition that changes the normal flow of control in a program. The change of flow switches execution to an exception handler, and the state of the system at the point where the exception occurred is presented to the exception handler.
A key component of the state presented to the handler is the return address, that indicates the point in the instruction stream where the exception was taken.

The ARM architecture provides a number of different exceptions as described in Exceptions on page B1-30.

Terminology for describing exceptions
In this manual, a number of terms have specific meanings when used to describe exceptions:
• An exception is generated in one of the following ways:
  — Directly as a result of the execution or attempted execution of the instruction stream. For example, an exception is generated as a result of an UNDEFINED instruction.
  — Less directly, as a result of something in the state of the system. For example, an exception is generated as a result of an interrupt signaled by a peripheral.
• An exception is taken by a processor at the point where it causes a change to the normal flow of control in the program.
• An exception is described as synchronous if both of the following apply:
  — the exception is generated as a result of direct execution or attempted execution of the instruction stream
  — the return address presented to the exception handler is guaranteed to indicate the instruction that caused the exception.
• An exception is described as asynchronous if either of the following applies:
  — the exception is not generated as a result of direct execution or attempted execution of the instruction stream
  — the return address presented to the exception handler is not guaranteed to indicate the instruction that caused the exception.

Asynchronous exceptions are of two types:
• a precise asynchronous exception guarantees that the state presented to the exception handler is consistent with the state at an identifiable instruction boundary in the execution stream from which the exception was taken.
• an imprecise asynchronous exception is one where the state presented to the exception handler is not guaranteed to be consistent with any point in the execution stream from which the exception was taken.

B1.3 ARM processor modes and core registers

The following sections describe the ARM processor modes and the core registers:
• ARM processor modes
• ARM core registers on page B1-9
• Program Status Registers (PSRs) on page B1-14.

B1.3.1 ARM processor modes

The ARM architecture defines eight modes, shown in Table B1-1:

Table B1-1 ARM processor modes

Processor mode (a)   Mode encoding (b)   Privilege      Description
User         usr     10000               Unprivileged   Suitable for most application code.
FIQ          fiq     10001               Privileged     Entered as a result of a fast interrupt. (c)
IRQ          irq     10010               Privileged     Entered as a result of a normal interrupt. (c)
Supervisor   svc     10011               Privileged     Suitable for running most kernel code. Entered on Reset, and on execution of a Supervisor Call (SVC) instruction.
Monitor (d)  mon     10110               Privileged     A Secure mode that enables change between Secure and Non-secure states, and can also be used to handle any of FIQs, IRQs and external aborts. (c) Entered on execution of a Secure Monitor Call (SMC) instruction.
Abort        abt     10111               Privileged     Entered as a result of a Data Abort exception or Prefetch Abort exception. (c)
Undefined    und     11011               Privileged     Entered as a result of an instruction-related error.
System       sys     11111               Privileged     Suitable for application code that requires privileged access.

a. Processor mode names and abbreviations.
b. CPSR.M. All other values are reserved. When the Security Extensions are not implemented the Monitor mode encoding, 0b10110, is reserved.
c. Bits in the Secure Configuration Register can be set so that one or more of FIQs, IRQs and external aborts are handled in Monitor mode, see c1, Secure Configuration Register (SCR) on page B3-106.
d. Only supported when the Security Extensions are implemented.

Mode changes can be made under software control, or can be caused by an external or internal exception.

Notes on the ARM processor modes

User mode
User mode enables the operating system to restrict the use of system resources. Application programs normally execute in User mode. In User mode, the program being executed:
• cannot access protected system resources
• cannot change mode except by causing an exception, see Exceptions on page B1-30.

Privileged modes
The modes other than User mode are known as privileged modes. In their security state they have full access to system resources and can change mode freely.

Exception modes
The exception modes are:
• FIQ mode
• IRQ mode
• Supervisor mode
• Abort mode
• Undefined mode
• Monitor mode.
Each of these modes normally handles the corresponding exceptions, as shown in Table B1-1 on page B1-6. Each exception mode has some banked registers to avoid corrupting the registers of the mode in use when the exception is taken, see ARM core registers on page B1-9.

System mode
System mode has the same registers available as User mode, and is not entered by any exception. System mode is intended for use by operating system tasks that must access system resources, but do not want to use the exception entry mechanism and the associated additional registers. It is also used when the operating system has to access the User mode registers.

Monitor mode
Monitor mode is only implemented as part of the Security Extensions, and is always in the Secure state, regardless of the value of the SCR.NS bit. For more information, see The Security Extensions on page B1-25.
Code running in Monitor mode has access to both the Secure and Non-secure copies of system registers.
This means Monitor mode provides the normal method of changing between the Secure and Non-secure security states.

Secure and Non-secure modes
In a processor that implements the Security Extensions, a mode description can be qualified as Secure or Non-secure, to indicate whether the processor is also in Secure state or Non-secure state. For example:
• if a processor is in a privileged mode and Secure state, it is in a Secure privileged mode
• if a processor is in User mode and Non-secure state, it is in Non-secure User mode.

Pseudocode details of mode operations
The BadMode() function tests whether a 5-bit mode number corresponds to one of the permitted modes:

    // BadMode()
    // =========

    boolean BadMode(bits(5) mode)
        case mode of
            when ‘10000’ result = FALSE;                // User mode
            when ‘10001’ result = FALSE;                // FIQ mode
            when ‘10010’ result = FALSE;                // IRQ mode
            when ‘10011’ result = FALSE;                // Supervisor mode
            when ‘10110’ result = !HaveSecurityExt();   // Monitor mode
            when ‘10111’ result = FALSE;                // Abort mode
            when ‘11011’ result = FALSE;                // Undefined mode
            when ‘11111’ result = FALSE;                // System mode
            otherwise    result = TRUE;
        return result;

The following pseudocode functions provide information about the current mode:

    // CurrentModeIsPrivileged()
    // =========================

    boolean CurrentModeIsPrivileged()
        if BadMode(CPSR.M) then UNPREDICTABLE;
        if CPSR.M == ‘10000’ then return FALSE;     // User mode
        return TRUE;                                // Other modes

    // CurrentModeIsUserOrSystem()
    // ===========================

    boolean CurrentModeIsUserOrSystem()
        if BadMode(CPSR.M) then UNPREDICTABLE;
        if CPSR.M == ‘10000’ then return TRUE;      // User mode
        if CPSR.M == ‘11111’ then return TRUE;      // System mode
        return FALSE;                               // Other modes
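Table B1-1 and the three pseudocode functions above can be mirrored in C for tools that inspect a saved CPSR value. The following sketch is not ARM code. The function names are hypothetical, the mode is passed explicitly instead of being read from CPSR.M, and where the pseudocode is UNPREDICTABLE (a bad mode) the sketch reports this through a valid flag instead.

```c
#include <stdbool.h>
#include <stdint.h>

/* Mirrors BadMode(): true when a 5-bit CPSR.M value is not a permitted mode. */
static bool bad_mode(uint32_t mode, bool have_security_ext) {
    switch (mode & 0x1Fu) {
    case 0x10: case 0x11: case 0x12: case 0x13:   /* usr, fiq, irq, svc */
    case 0x17: case 0x1B: case 0x1F:              /* abt, und, sys */
        return false;
    case 0x16:                                    /* mon */
        return !have_security_ext;
    default:
        return true;
    }
}

/* Mirrors CurrentModeIsPrivileged() for an explicit mode value. */
static bool mode_is_privileged(uint32_t mode, bool have_security_ext, bool *valid) {
    *valid = !bad_mode(mode, have_security_ext);
    return (mode & 0x1Fu) != 0x10u;               /* everything except User */
}

/* Mirrors CurrentModeIsUserOrSystem(), with the same handling of bad modes. */
static bool mode_is_user_or_system(uint32_t mode, bool have_security_ext, bool *valid) {
    *valid = !bad_mode(mode, have_security_ext);
    uint32_t m = mode & 0x1Fu;
    return m == 0x10u || m == 0x1Fu;
}
```

Returning the validity separately keeps the privilege answer well defined for callers, while still exposing the cases the architecture leaves UNPREDICTABLE.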
B1.3.2 ARM core registers

ARM core registers on page A2-11 describes the application level view of the ARM register file. This view provides 16 ARM core registers, R0 to R15, that include the Stack Pointer (SP), Link Register (LR), and Program Counter (PC). These registers are selected from a total set of either 31 or 33 registers, depending on whether or not the Security Extensions are implemented. The current execution mode determines the selected set of registers, as shown in Figure B1-1. This shows that the arrangement of the registers provides duplicate copies of some registers, with the current register selected by the execution mode. This arrangement is described as banking of the registers, and the duplicated copies of registers are referred to as banked registers:

Figure B1-1 Organization of general-purpose registers and Program Status Registers

Application   User and       Supervisor   Monitor‡    Abort      Undefined   IRQ        FIQ
level view    System modes   mode         mode        mode       mode        mode       mode
R0-R7         R0_usr-R7_usr in all modes
R8-R12        R8_usr-R12_usr in all modes except FIQ mode                               R8_fiq-R12_fiq
SP            SP_usr         SP_svc       SP_mon‡     SP_abt     SP_und      SP_irq     SP_fiq
LR            LR_usr         LR_svc       LR_mon‡     LR_abt     LR_und      LR_irq     LR_fiq
PC            PC in all modes
APSR          CPSR in all modes
-             -              SPSR_svc     SPSR_mon‡   SPSR_abt   SPSR_und    SPSR_irq   SPSR_fiq

The privileged modes are System mode and the exception modes (Supervisor, Monitor, Abort, Undefined, IRQ, and FIQ modes).
‡ Monitor mode, and the associated banked registers, are implemented only as part of the Security Extensions.

Figure B1-1 includes the views of the Current Program Status Register (CPSR) and of the banked Saved Program Status Register (SPSR), see Program Status Registers (PSRs) on page B1-14.

Note
• System level register names, such as R0_usr, R8_usr, and R8_fiq, are used when it is necessary to identify a specific register.
The Application level names refer to the registers for the current mode, and usually are sufficient to identify a register.
• In ARMv7, the Security Extensions can be implemented only as part of an ARMv7-A implementation.

Each of the exception modes selects a different copy of the banked SP and LR, because these registers have special functions on exception entry:

SP    This enables the exception handler to use a different stack from the one in use when the exception occurred. For example, it can use a stack in privileged memory rather than one in unprivileged memory.

LR    The exception return address is placed in the banked LR of the exception mode. This means the application's LR value is not corrupted. The address placed in the banked LR is at an exception-dependent offset from the next instruction to be executed in the code in which the exception occurred. This address enables the exception handler to return to that code, so the processor can resume execution of the code. Table B1-4 on page B1-34 shows the LR value saved on entry to each of the exception modes.

In addition:
• FIQ mode provides its own mappings for the general-purpose registers R8 to R12. These enable very fast processing of interrupts that are simple enough to be processed using only registers R8 to R12, SP, LR, and PC, without affecting the corresponding registers of the mode in which the interrupt was taken.
• In an exception mode the processor can access the SPSR for that mode. There is no SPSR for User mode and System mode.

In all ARMv7-A and ARMv7-R implementations:
• Every mode except User mode is privileged.
• User mode and System mode share the same register file. The only difference between System and User modes is that System mode runs with privileged access.

For more information about the application level view of the SP, LR, and the Program Counter (PC), and the alternative descriptions of them as R13, R14 and R15, see ARM core registers on page A2-11.
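Figure B1-1 and the banking rules above can be condensed into a lookup from a register number and a CPSR.M value to the bank that supplies the register. Adding up the entries of the figure also reproduces the 31 or 33 total. This C sketch uses hypothetical names. bank_for_mode() does not check whether the Security Extensions are implemented (BadMode() would reject 0b10110 without them), and the PC and PSRs are left out of the lookup.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bank identifiers, one per column group of Figure B1-1. */
typedef enum { BANK_USR, BANK_FIQ, BANK_IRQ, BANK_SVC,
               BANK_ABT, BANK_UND, BANK_MON, BANK_BAD } bank_t;

/* Bank whose SP and LR a CPSR.M value selects; System mode uses User mode's. */
static bank_t bank_for_mode(uint32_t mode) {
    switch (mode & 0x1Fu) {
    case 0x10: case 0x1F: return BANK_USR;   /* usr, sys */
    case 0x11: return BANK_FIQ;
    case 0x12: return BANK_IRQ;
    case 0x13: return BANK_SVC;
    case 0x16: return BANK_MON;              /* Security Extensions only */
    case 0x17: return BANK_ABT;
    case 0x1B: return BANK_UND;
    default:   return BANK_BAD;              /* reserved encoding */
    }
}

/* Bank that holds Rn (0-14) in the given mode: R0-R7 are never banked,
   R8-R12 have a separate FIQ copy, and SP (13) and LR (14) are banked
   for every exception mode. */
static bank_t bank_for_register(unsigned n, uint32_t mode) {
    bank_t b = bank_for_mode(mode);
    if (b == BANK_BAD || n > 14u) return BANK_BAD;
    if (n <= 7u) return BANK_USR;
    if (n <= 12u) return (b == BANK_FIQ) ? BANK_FIQ : BANK_USR;
    return b;
}

/* Physical core registers implied by Figure B1-1 (31 or 33). */
static int physical_core_register_count(bool have_security_ext) {
    int sp_lr_banks = have_security_ext ? 7 : 6; /* usr, svc, abt, und, irq, fiq (+ mon) */
    return 8                 /* R0-R7 */
         + 5 * 2             /* R8-R12: _usr and _fiq copies */
         + 2 * sp_lr_banks   /* one SP and one LR per bank */
         + 1;                /* PC */
}
```

The count works out as 8 + 10 + 12 + 1 = 31 without the Security Extensions, and 8 + 10 + 14 + 1 = 33 with them.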
Writing to the PC
In ARMv7, instruction writes to the PC are handled as follows:
• Exception return instructions write both the PC and the CPSR. The value written to the CPSR determines the new instruction set state, and the value written to the PC determines the address that is branched to. For full details, including which instructions are exception return instructions and how incorrectly aligned PC values are handled, see Exception return on page B1-38.
• The following two 16-bit Thumb instruction encodings remain in Thumb state and branch to a value written to the PC:
  — encoding T2 of ADD (register) on page A8-24
  — encoding T1 of MOV (register) on page A8-196.
  The value written to the PC is forced to be halfword-aligned by ignoring its least significant bit, instead treating that bit as being 0.
• The following instructions remain in the same instruction set state and branch to a value written to the PC:
  — B, BL, CBNZ, CBZ, CHKA, HB, HBL, HBLP, HBP, TBB, and TBH
  — in ThumbEE state, load/store instructions that fail their null check.
  The definition of each of these instructions ensures that the value written to the PC is correctly aligned for the current instruction set state.
• The BLX (immediate) instruction switches between ARM and Thumb states and branches to a value written to the PC. Its definition ensures that the value written to the PC is correctly aligned for the new instruction set state.
• The following instructions write a value to the PC, treating that value as an interworking address with low-order bits that determine the new instruction set state and an address to branch to:
  — BLX (register), BX, and BXJ
  — LDR and LDRT instructions whose destination register is the PC
  — POP and all forms of LDM except LDM (exception return), when the register list includes the PC
  — in ARM state only, ADC, ADD, ADR, AND, ASR (immediate), BIC, EOR, LSL (immediate), LSR (immediate), MOV, MVN, ORR, ROR (immediate), RRX, RSB, RSC, SBC, and SUB instructions whose destination register is the PC and that do not specify flag setting.
  For details of how an interworking address specifies the new instruction set state and instruction address, see Pseudocode details of operations on ARM core registers on page A2-12.
  Note
  — The LDR, LDRT, POP, and LDM instructions first have this behavior in ARMv5T.
  — The instructions listed as having this behavior in ARM state only first have this behavior in ARMv7.
  In both cases, the behavior in earlier architecture versions is a branch that remains in the same instruction set state. For more information, see:
  — Interworking on page AppxG-4, for ARMv6
  — Interworking on page AppxH-5, for ARMv5 and ARMv4.

Pseudocode details of ARM core register operations
The following pseudocode gives access to the general-purpose registers:

// The names of the banked core registers.
enumeration RName {RName_0usr, RName_1usr, RName_2usr, RName_3usr, RName_4usr, RName_5usr,
                   RName_6usr, RName_7usr,
                   RName_8usr, RName_8fiq, RName_9usr, RName_9fiq, RName_10usr, RName_10fiq,
                   RName_11usr, RName_11fiq, RName_12usr, RName_12fiq,
                   RName_SPusr, RName_SPfiq, RName_SPirq, RName_SPsvc,
                   RName_SPabt, RName_SPund, RName_SPmon,
                   RName_LRusr, RName_LRfiq, RName_LRirq, RName_LRsvc,
                   RName_LRabt, RName_LRund, RName_LRmon,
                   RName_PC};

// The physical array of banked core registers.
//
// _R[RName_PC] is defined to be the address of the current instruction. The
// offset of 4 or 8 bytes is applied to it by the register access functions.

array bits(32) _R[RName];

// RBankSelect()
// =============

RName RBankSelect(bits(5) mode, RName usr, RName fiq, RName irq,
                  RName svc, RName abt, RName und, RName mon)
    if BadMode(mode) then
        UNPREDICTABLE;
    else
        case mode of
            when ‘10000’ result = usr;  // User mode
            when ‘10001’ result = fiq;  // FIQ mode
            when ‘10010’ result = irq;  // IRQ mode
            when ‘10011’ result = svc;  // Supervisor mode
            when ‘10110’ result = mon;  // Monitor mode
            when ‘10111’ result = abt;  // Abort mode
            when ‘11011’ result = und;  // Undefined mode
            when ‘11111’ result = usr;  // System mode uses User mode registers
    return result;

// RfiqBankSelect()
// ================

RName RfiqBankSelect(bits(5) mode, RName usr, RName fiq)
    return RBankSelect(mode, usr, fiq, usr, usr, usr, usr, usr);

// LookUpRName()
// =============

RName LookUpRName(integer n, bits(5) mode)
    assert n >= 0 && n <= 14;
    case n of
        when 0  result = RName_0usr;
        when 1  result = RName_1usr;
        when 2  result = RName_2usr;
        when 3  result = RName_3usr;
        when 4  result = RName_4usr;
        when 5  result = RName_5usr;
        when 6  result = RName_6usr;
        when 7  result = RName_7usr;
        when 8  result = RfiqBankSelect(mode, RName_8usr, RName_8fiq);
        when 9  result = RfiqBankSelect(mode, RName_9usr, RName_9fiq);
        when 10 result = RfiqBankSelect(mode, RName_10usr, RName_10fiq);
        when 11 result = RfiqBankSelect(mode, RName_11usr, RName_11fiq);
        when 12 result = RfiqBankSelect(mode, RName_12usr, RName_12fiq);
        when 13 result = RBankSelect(mode, RName_SPusr, RName_SPfiq, RName_SPirq,
                                     RName_SPsvc, RName_SPabt, RName_SPund, RName_SPmon);
        when 14 result = RBankSelect(mode, RName_LRusr, RName_LRfiq, RName_LRirq,
                                     RName_LRsvc, RName_LRabt, RName_LRund, RName_LRmon);
    return result;

// Rmode[] - non-assignment form
// =============================

bits(32) Rmode[integer n, bits(5) mode]
    assert n >= 0 && n <= 14;

    // In Non-secure state, check for attempted use of Monitor mode (‘10110’), or of FIQ
    // mode (‘10001’) when the Security Extensions are reserving the FIQ registers. The
    // definition of UNPREDICTABLE does not permit this to be a security hole.
    if !IsSecure() && mode == ‘10110’ then UNPREDICTABLE;
    if !IsSecure() && mode == ‘10001’ && NSACR.RFR == ‘1’ then UNPREDICTABLE;

    return _R[LookUpRName(n,mode)];

// Rmode[] - assignment form
// =========================

Rmode[integer n, bits(5) mode] = bits(32) value
    assert n >= 0 && n <= 14;

    // In Non-secure state, check for attempted use of Monitor mode (‘10110’), or of FIQ
    // mode (‘10001’) when the Security Extensions are reserving the FIQ registers. The
    // definition of UNPREDICTABLE does not permit this to be a security hole.
    if !IsSecure() && mode == ‘10110’ then UNPREDICTABLE;
    if !IsSecure() && mode == ‘10001’ && NSACR.RFR == ‘1’ then UNPREDICTABLE;

    // Writes of non word-aligned values to SP are only permitted in ARM state.
    if n == 13 && value<1:0> != ‘00’ && CurrentInstrSet() != InstrSet_ARM then
        UNPREDICTABLE;
    _R[LookUpRName(n,mode)] = value;
    return;

// R[] - non-assignment form
// =========================

bits(32) R[integer n]
    assert n >= 0 && n <= 15;
    if n == 15 then
        offset = if CurrentInstrSet() == InstrSet_ARM then 8 else 4;
        result = _R[RName_PC] + offset;
    else
        result = Rmode[n, CPSR.M];
    return result;

// R[] - assignment form
// =====================

R[integer n] = bits(32) value
    assert n >= 0 && n <= 14;
    Rmode[n, CPSR.M] = value;
    return;

// BranchTo()
// ==========

BranchTo(bits(32) address)
    _R[RName_PC] = address;
    return;

B1.3.3 Program Status Registers (PSRs)
The application level programmers’ model provides the Application Program Status Register, see The Application Program Status Register (APSR) on page A2-14. This is an application level alias for the Current Program Status Register (CPSR). The system level view of the CPSR extends the register, adding system level information.
Each of the exception modes has its own saved copy of the CPSR, the Saved Program Status Register (SPSR), as shown in Figure B1-1 on page B1-9. For example, the SPSR for Monitor mode is called SPSR_mon.

The Current Program Status Register (CPSR)
The Current Program Status Register (CPSR) holds processor status and control information:
• the APSR, see The Application Program Status Register (APSR) on page A2-14
• the current instruction set state, see ISETSTATE on page A2-15
• the execution state bits for the Thumb If-Then instruction, see ITSTATE on page A2-17
• the current endianness, see ENDIANSTATE on page A2-19
• the current processor mode
• interrupt and asynchronous abort disable bits.
The non-APSR bits of the CPSR have defined reset values. These are shown in the TakeReset() pseudocode function, see Reset on page B1-48.
Writes to the CPSR have side-effects on various aspects of processor operation.
All of these side-effects, except for those on memory accesses caused by fetching instructions, are synchronous to the CPSR write. This means they are guaranteed not to be visible to earlier instructions in the execution stream, and they are guaranteed to be visible to later instructions in the execution stream.
Fetching an instruction causes an instruction fetch memory access. In addition, in a Virtual Memory System Architecture (VMSA) implementation, fetching an instruction can cause a translation table walk. The privilege of these memory accesses can be affected by changes to the mode field of the CPSR. Also, if the Security Extensions are implemented, the virtual memory space of these accesses can be affected by changes to the mode field. Those mode changes take effect on the memory accesses as follows:
• A mode change by an exception entry is synchronous to the exception entry. This applies to all exception entries, including the exception entry for a synchronous exception generated directly by an instruction.
• A mode change by an exception return instruction is synchronous to the instruction.
• A mode change can also be made by an instruction other than an exception return instruction, without being the result of a synchronous exception generated directly by that instruction, for example by a CPS or MSR instruction. Such a mode change:
  — is guaranteed not to be visible to memory accesses caused by fetching earlier instructions in the execution stream
  — is guaranteed to be visible to memory accesses caused by fetching instructions after the next exception entry, exception return instruction, or ISB instruction in the execution stream
  — might or might not affect memory accesses caused by fetching instructions between the mode change instruction and the point where mode changes are guaranteed to be visible.
See Exception return on page B1-38 for the definition of exception return instructions.
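To illustrate how these pieces of state share one 32-bit register, the following C sketch unpacks a CPSR or SPSR value using the bit positions given in Format of the CPSR and SPSRs on page B1-16. It assumes the ISETSTATE encoding described on page A2-15 (J:T = 00 ARM, 01 Thumb, 10 Jazelle, 11 ThumbEE). The CpsrFields structure and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

typedef enum { ISET_ARM, ISET_THUMB, ISET_JAZELLE, ISET_THUMBEE } InstrSet;

typedef struct {
    unsigned n, z, c, v, q;  /* condition code flags and Q flag */
    unsigned it;             /* IT[7:0], reassembled from two fields */
    unsigned ge;             /* GE[3:0] */
    unsigned e, a, i, f;     /* endianness bit and mask bits */
    InstrSet iset;           /* decoded from J and T */
    unsigned mode;           /* M[4:0] */
} CpsrFields;

/* Extracts 'width' bits of x starting at bit 'lo'. */
static unsigned field(uint32_t x, unsigned lo, unsigned width)
{
    assert(width >= 1 && width < 32 && lo + width <= 32);
    return (unsigned)((x >> lo) & ((1u << width) - 1u));
}

static CpsrFields decode_psr(uint32_t psr)
{
    CpsrFields d;
    d.n = field(psr, 31, 1);
    d.z = field(psr, 30, 1);
    d.c = field(psr, 29, 1);
    d.v = field(psr, 28, 1);
    d.q = field(psr, 27, 1);
    /* IT[7:2] is in bits [15:10] and IT[1:0] is in bits [26:25]. */
    d.it = (field(psr, 10, 6) << 2) | field(psr, 25, 2);
    d.ge = field(psr, 16, 4);
    d.e = field(psr, 9, 1);
    d.a = field(psr, 8, 1);
    d.i = field(psr, 7, 1);
    d.f = field(psr, 6, 1);
    /* ISETSTATE: J:T = 00 ARM, 01 Thumb, 10 Jazelle, 11 ThumbEE. */
    d.iset = (InstrSet)((field(psr, 24, 1) << 1) | field(psr, 5, 1));
    d.mode = field(psr, 0, 5);
    return d;
}
```

For example, decoding 0x600001D3 gives Z = C = 1, all three mask bits set, ARM state, and mode 0b10011 (Supervisor). The split IT field is the main reason a helper like this is convenient: IT[1:0] and IT[7:2] must be reassembled from two separate bit ranges.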
The Saved Program Status Registers (SPSRs)
The purpose of an SPSR is to record the pre-exception value of the CPSR. When taking an exception, the processor copies the CPSR to the SPSR of the exception mode it is about to enter. Saving this value means the exception handler can:
• on exception return, restore the CPSR to the value it had when the exception was taken
• examine the value the CPSR had when the exception was taken, for example to determine the instruction set state in which the instruction that caused an Undefined Instruction exception was executed.
The SPSRs do not have defined reset values.

Format of the CPSR and SPSRs
The format of the CPSR and SPSRs is:

Bits   31 30 29 28 27 26:25   24 23:20    19:16   15:10   9 8 7 6 5 4:0
Field  N  Z  C  V  Q  IT[1:0] J  Reserved GE[3:0] IT[7:2] E A I F T M[4:0]

Condition code flags, bits [31:28]
Set on the result of instruction execution. The flags are:
N, bit [31]   Negative condition code flag
Z, bit [30]   Zero condition code flag
C, bit [29]   Carry condition code flag
V, bit [28]   Overflow condition code flag.
The condition code flags can be read or written in any mode, and are described in The Application Program Status Register (APSR) on page A2-14.

Q, bit [27]
Cumulative saturation flag. This flag can be read or written in any mode, and is described in The Application Program Status Register (APSR) on page A2-14.

IT[7:0], bits [15:10,26:25]
If-Then execution state bits for the Thumb IT (If-Then) instruction. ITSTATE on page A2-17 describes the encoding of these bits. CPSR.IT[7:0] are the IT[7:0] bits described there. For more information, see IT on page A8-104.
For details of how these bits can be accessed see Accessing the execution state bits on page B1-18.

J, bit [24]
Jazelle bit, see the description of the T bit, bit [5].

Bits [23:20]
Reserved. RAZ/SBZP.
GE[3:0], bits [19:16]
Greater than or Equal flags, for SIMD instructions.
The GE[3:0] field can be read or written in any mode, and is described in The Application Program Status Register (APSR) on page A2-14.

E, bit [9]
Endianness execution state bit. Controls the load and store endianness for data accesses:
0   Little endian operation
1   Big endian operation.
This bit is ignored by instruction fetches.
ENDIANSTATE on page A2-19 describes the encoding of this bit. CPSR.E is the ENDIANSTATE bit described there.
For details of how this bit can be accessed see Accessing the execution state bits on page B1-18.

Mask bits, bits [8:6]
The mask bits disable some asynchronous exceptions. The three mask bits are:
A, bit [8]   Asynchronous abort disable bit. Used to mask asynchronous aborts.
I, bit [7]   Interrupt disable bit. Used to mask IRQ interrupts.
F, bit [6]   Fast interrupt disable bit. Used to mask FIQ interrupts.
The possible values of each bit are:
0   Exception enabled
1   Exception disabled.
The mask bits can be written only in privileged modes. Their values can be read in any mode, but use of their values and attempts to change them by User mode code are deprecated.
Updates to the F bit are restricted if Non-maskable Fast Interrupts (NMFIs) are supported, see Non-maskable fast interrupts on page B1-18.
If implemented, the Security Extensions can restrict updates to the A and F bits from the Non-secure state, see Use of the A, F, and Mode bits by the Security Extensions on page B1-19.

T, bit [5]
Thumb execution state bit. This bit and the J execution state bit, bit [24], determine the instruction set state of the processor: ARM, Thumb, Jazelle, or ThumbEE. ISETSTATE on page A2-15 describes the encoding of these bits. CPSR.J and CPSR.T are the same bits as ISETSTATE.J and ISETSTATE.T respectively.
For more information, see Instruction set states on page B1-23.
For details of how these bits can be accessed see Accessing the execution state bits on page B1-18.

M[4:0], bits [4:0]
Mode field. This field determines the current mode of the processor. The permitted values of this field are listed in Table B1-1 on page B1-6. All other values of M[4:0] are reserved. The effect of setting M[4:0] to a reserved value is UNPREDICTABLE.
For more information about the processor modes see ARM processor modes on page B1-6. Figure B1-1 on page B1-9 shows the registers that can be accessed in each mode.
This field can be written only in privileged modes. Its value can be read in any mode, but use of its value and attempts to change it by User mode code are deprecated.
If implemented, the Security Extensions restrict use of the mode field to enter Monitor and FIQ modes, see Use of the A, F, and Mode bits by the Security Extensions on page B1-19.

Accessing the execution state bits
The execution state bits are the IT[7:0], J, E, and T bits. In exception modes you can read or write these bits in the current SPSR.
In the CPSR, unless the processor is in Debug state:
• The execution state bits, other than the E bit, are RAZ when read by an MRS instruction.
• Writes to the execution state bits, other than the E bit, by an MSR instruction are:
  — For ARMv7 and ARMv6T2, ignored in all modes.
  — For architecture variants before ARMv6T2, ignored in User mode and required to write zeros in privileged modes. If a nonzero value is written in a privileged mode, behavior is UNPREDICTABLE.
Instructions other than MRS and MSR that access the execution state bits can read and write them in any mode.
Unlike the other execution state bits in the CPSR, CPSR.E can be read by an MRS instruction and written by an MSR instruction.
However, using the CPSR.E value read by an MRS instruction is deprecated, and using an MSR instruction to change the value of CPSR.E is deprecated.
Note
• Use the SETEND instruction to change the current endianness.
• To determine the current endianness, use an LDR instruction to load a word of memory whose value is known and will differ if the endianness is reversed. For example, use an LDR (literal) instruction to load a word whose four bytes are 0x01, 0x00, 0x00, and 0x00 in ascending order of memory address. The LDR instruction loads the destination register with:
  — 0x00000001 if the current endianness is little-endian
  — 0x01000000 if the current endianness is big-endian.
For more information about the behavior of these bits in Debug state see Behavior of the PC and CPSR in Debug state on page C5-7.

Non-maskable fast interrupts
Exceptions, debug events and checks on page A2-81 introduces the two levels of external interrupts to an ARM processor, Interrupt Requests (IRQs) and higher priority Fast Interrupt Requests (FIQs). Both IRQs and FIQs can be masked by bits in the CPSR, see Program Status Registers (PSRs) on page B1-14:
• when the CPSR.I bit is set to 1, IRQ interrupts are masked
• when the CPSR.F bit is set to 1, FIQ interrupts are masked.
ARMv7 supports an operating mode where FIQs are not maskable by software. This Non-maskable Fast Interrupt (NMFI) operation is controlled by a configuration input signal to the processor, that is asserted HIGH to enable NMFI operation. There is no software control of NMFI operation.
Software can detect whether FIQs are maskable by reading the SCTLR.NMFI bit:
NMFI == 0   Software can mask FIQs by setting the CPSR.F bit to 1
NMFI == 1   Software cannot mask FIQs.
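Returning to the endianness test suggested in the Note on CPSR.E earlier in this section, the same idea can be imitated in C. This is only a host-side analogue: memcpy() stands in for the LDR (literal) load, so the result reflects the data endianness the C program runs with rather than being a direct read of CPSR.E. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Loads a 32-bit word from the bytes 0x01, 0x00, 0x00, 0x00 (ascending addresses). */
static uint32_t load_test_word(void)
{
    static const unsigned char bytes[4] = { 0x01, 0x00, 0x00, 0x00 };
    uint32_t word;
    memcpy(&word, bytes, sizeof word);   /* stands in for LDR (literal) */
    assert(word == 0x00000001u || word == 0x01000000u);
    return word;
}

/* Returns 1 if data accesses are little-endian, 0 if they are big-endian. */
static int data_is_little_endian(void)
{
    return load_test_word() == 0x00000001u;   /* big-endian loads 0x01000000 */
}
```

Using memcpy() rather than a pointer cast avoids alignment and strict-aliasing problems while still performing a single 32-bit interpretation of the four bytes.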
For more information see:
• c1, System Control Register (SCTLR) on page B3-96 for a VMSA implementation
• c1, System Control Register (SCTLR) on page B4-45 for a PMSA implementation.
It is IMPLEMENTATION DEFINED whether an ARMv7 processor supports NMFIs. The SCTLR.NMFI bit is RAO only if the processor supports NMFIs and the configuration input signal is asserted HIGH, otherwise it is RAZ.
When the SCTLR.NMFI bit is 1:
• an instruction writing 0 to the CPSR.F bit clears it to 0, but an instruction attempting to write 1 to it leaves it unchanged
• CPSR.F can be set to 1 only by exception entries, as described in CPSR M field and A, I, and F mask bit values on exception entry on page B1-36.

Use of the A, F, and Mode bits by the Security Extensions
When the Security Extensions are implemented and the processor is in the Non-secure state:
• the CPSR.F bit cannot be changed if the SCR.FW bit is set to 0
• the CPSR.A bit cannot be changed if the SCR.AW bit is set to 0
• the effect of setting CPSR.M to 0b10110, Monitor mode, is UNPREDICTABLE
• the effect of setting CPSR.M to 0b10001, FIQ mode, is UNPREDICTABLE if NSACR.RFR is set to 1.
Note
• When the Security Extensions are implemented and the processor is in the Non-secure state the SPSR.F and SPSR.A bits can be changed even if the corresponding bits in the SCR are set to 0. However, when the SPSR is copied to the CPSR the CPSR.F and CPSR.A bits are not updated if the corresponding bits in the SCR are set to 0.
• UNPREDICTABLE behavior must not be a security hole. Therefore, every implementation must ensure that:
  — If NSACR.RFR is 0, setting CPSR.M to 0b10110 when in Non-secure state cannot cause entry to either Monitor mode or Secure state
  — If NSACR.RFR is 1, setting CPSR.M to 0b10001 or 0b10110 when in Non-secure state cannot cause entry to Monitor mode, FIQ mode or Secure state.
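The restrictions on the A, I, and F bits described above (privilege, SCR.AW, SCR.FW and SCTLR.NMFI) can be sketched as a single C function. It mirrors the mask-bit part of the CPSRWriteByInstr() pseudocode later in this section, assuming an MSR-style write in which byte-mask bits 1 and 0 are both set. The MaskWriteCtx structure and the function name are hypothetical, and the mode and execution state bits are deliberately not modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define CPSR_A (1u << 8)
#define CPSR_I (1u << 7)
#define CPSR_F (1u << 6)

/* Hypothetical bundle of the controls that gate writes to the mask bits. */
typedef struct {
    bool privileged;   /* current mode is not User mode */
    bool secure;       /* result of IsSecure() */
    bool scr_aw;       /* SCR.AW */
    bool scr_fw;       /* SCR.FW */
    bool sctlr_nmfi;   /* SCTLR.NMFI */
} MaskWriteCtx;

/* Returns the CPSR after an instruction tries to write 'value' to A, I and F. */
static uint32_t write_mask_bits(uint32_t cpsr, uint32_t value, const MaskWriteCtx *c)
{
    assert(c != NULL);
    if (!c->privileged)
        return cpsr;                                  /* User mode: no change */
    if (c->secure || c->scr_aw)                       /* A bit */
        cpsr = (cpsr & ~CPSR_A) | (value & CPSR_A);
    cpsr = (cpsr & ~CPSR_I) | (value & CPSR_I);       /* I bit */
    if ((c->secure || c->scr_fw) &&                   /* F bit: with NMFI, */
        (!c->sctlr_nmfi || (value & CPSR_F) == 0))    /* only 0 is accepted */
        cpsr = (cpsr & ~CPSR_F) | (value & CPSR_F);
    return cpsr;
}
```

The results agree with Table B1-2: in Secure state with SCTLR.NMFI set, writing 0 clears F but writing 1 leaves it unchanged, and in Non-secure state with SCR.FW = 0 the F bit cannot be written at all.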
For more information about the access controls provided by the Security Extensions see c1, Secure Configuration Register (SCR) on page B3-106.

Software running in Non-secure state might not be able to set the CPSR.F bit to 1 to mask FIQs, as described in Use of the A, F, and Mode bits by the Security Extensions on page B1-19. Table B1-2 shows how the SCTLR.NMFI bit interacts with the SCR.FW bit to control access to the CPSR.F bit, in the Secure and Non-secure security states.

Table B1-2 Summary of NMFI behavior when Security Extensions are implemented
Security state   SCR.FW bit   SCTLR.NMFI bit   CPSR.F bit properties
Secure           x            0                F bit can be written to 0 or 1
Secure           x            1                F bit can be written to 0 but not to 1
Non-secure       0            x                F bit cannot be written
Non-secure       1            0                F bit can be written to 0 or 1
Non-secure       1            1                F bit can be written to 0 but not to 1

Note
The SCTLR.NMFI bit is common to the Secure and Non-secure versions of the SCTLR, because it is a read-only bit that reflects the value of a configuration input signal.

Pseudocode details of PSR operations
The following pseudocode gives access to the PSRs:

bits(32) CPSR, SPSR_fiq, SPSR_irq, SPSR_svc, SPSR_mon, SPSR_abt, SPSR_und;

// SPSR[] - non-assignment form
// ============================

bits(32) SPSR[]
    if BadMode(CPSR.M) then
        UNPREDICTABLE;
    else
        case CPSR.M of
            when ‘10001’ result = SPSR_fiq;  // FIQ mode
            when ‘10010’ result = SPSR_irq;  // IRQ mode
            when ‘10011’ result = SPSR_svc;  // Supervisor mode
            when ‘10110’ result = SPSR_mon;  // Monitor mode
            when ‘10111’ result = SPSR_abt;  // Abort mode
            when ‘11011’ result = SPSR_und;  // Undefined mode
            otherwise    UNPREDICTABLE;
    return result;

// SPSR[] - assignment form
// ========================
SPSR[] = bits(32) value
    if BadMode(CPSR.M) then
        UNPREDICTABLE;
    else
        case CPSR.M of
            when ‘10001’ SPSR_fiq = value;  // FIQ mode
            when ‘10010’ SPSR_irq = value;  // IRQ mode
            when ‘10011’ SPSR_svc = value;  // Supervisor mode
            when ‘10110’ SPSR_mon = value;  // Monitor mode
            when ‘10111’ SPSR_abt = value;  // Abort mode
            when ‘11011’ SPSR_und = value;  // Undefined mode
            otherwise    UNPREDICTABLE;
    return;

// CPSRWriteByInstr()
// ==================

CPSRWriteByInstr(bits(32) value, bits(4) bytemask, boolean affect_execstate)
    privileged = CurrentModeIsPrivileged();
    nmfi = (SCTLR.NMFI == ‘1’);

    if bytemask<3> == ‘1’ then
        CPSR<31:27> = value<31:27>;          // N,Z,C,V,Q flags
        if affect_execstate then
            CPSR<26:24> = value<26:24>;      // IT<1:0>,J execution state bits

    if bytemask<2> == ‘1’ then
        // bits <23:20> are reserved SBZP bits
        CPSR<19:16> = value<19:16>;          // GE<3:0> flags

    if bytemask<1> == ‘1’ then
        if affect_execstate then
            CPSR<15:10> = value<15:10>;      // IT<7:2> execution state bits
        CPSR<9> = value<9>;                  // E bit is user-writable
        if privileged && (IsSecure() || SCR.AW == ‘1’) then
            CPSR<8> = value<8>;              // A interrupt mask

    if bytemask<0> == ‘1’ then
        if privileged then
            CPSR<7> = value<7>;              // I interrupt mask
        if privileged && (IsSecure() || SCR.FW == ‘1’) && (!nmfi || value<6> == ‘0’) then
            CPSR<6> = value<6>;              // F interrupt mask
        if affect_execstate then
            CPSR<5> = value<5>;              // T execution state bit
        if privileged then
            if BadMode(value<4:0>) then
                UNPREDICTABLE;
            else
                // Check for attempts to enter modes only permitted in Secure state from
                // Non-secure state. These are Monitor mode (‘10110’), and FIQ mode (‘10001’)
                // if the Security Extensions have reserved it. The definition of UNPREDICTABLE
                // does not permit the resulting behavior to be a security hole.
                if !IsSecure() && value<4:0> == ‘10110’ then UNPREDICTABLE;
                if !IsSecure() && value<4:0> == ‘10001’ && NSACR.RFR == ‘1’ then UNPREDICTABLE;
                CPSR<4:0> = value<4:0>;      // M<4:0> mode bits
    return;

// SPSRWriteByInstr()
// ==================

SPSRWriteByInstr(bits(32) value, bits(4) bytemask)
    if CurrentModeIsUserOrSystem() then UNPREDICTABLE;

    if bytemask<3> == ‘1’ then
        SPSR[]<31:24> = value<31:24>;   // N,Z,C,V,Q flags, IT<1:0>,J execution state bits

    if bytemask<2> == ‘1’ then
        // bits <23:20> are reserved SBZP bits
        SPSR[]<19:16> = value<19:16>;   // GE<3:0> flags

    if bytemask<1> == ‘1’ then
        SPSR[]<15:8> = value<15:8>;     // IT<7:2> execution state bits, E bit, A interrupt mask

    if bytemask<0> == ‘1’ then
        SPSR[]<7:5> = value<7:5>;       // I,F interrupt masks, T execution state bit
        if BadMode(value<4:0>) then
            UNPREDICTABLE;
        else
            SPSR[]<4:0> = value<4:0>;   // Mode bits

    return;

B1.4 Instruction set states
The instruction set states are described in Chapter A2 Application Level Programmers’ Model and application level operations on them are described there. This section supplies more information about how they interact with system level functionality, in the sections:
• Exceptions and instruction set state
• Unimplemented instruction sets.

B1.4.1 Exceptions and instruction set state
An exception is handled in the appropriate exception mode. The SCTLR.TE bit determines the processor instruction set state that handles exceptions. If necessary, the processor changes to this instruction set state on exception entry. For more information see:
• c1, System Control Register (SCTLR) on page B3-96 for a VMSA implementation
• c1, System Control Register (SCTLR) on page B4-45 for a PMSA implementation.
When an exception is taken, the value of the CPSR before the exception is written to the SPSR for the exception mode.
On returning from the exception:
• the CPSR is restored:
  — from a memory location if the RFE instruction is used
  — otherwise, from the SPSR for the exception mode
• the processor instruction set state is determined by the restored CPSR.J and CPSR.T values.
Note
The Reset exception is a special case and behaves differently, see Reset on page B1-48.

B1.4.2 Unimplemented instruction sets
The CPSR.J and CPSR.T bits define the current instruction set state, see ISETSTATE on page A2-15. The Jazelle state is optional, and the ThumbEE state is optional in the ARMv7-R architecture.
Some system instructions permit an attempt to set CPSR.J and CPSR.T to values that select an unimplemented instruction set option, for example to set CPSR.J = 1, CPSR.T = 0 on a processor that does not implement the Jazelle state. If such values are written to CPSR.J and CPSR.T, the implementation behaves in one of these ways:
• Sets CPSR.J and CPSR.T to the requested values and causes the next instruction to be UNDEFINED. Entry to the Undefined Instruction handler forces the processor into the state indicated by the SCTLR.TE bit. The handler can detect the cause of the exception because CPSR.J and CPSR.T are set to the unimplemented combination in SPSR_und. Table B1-4 on page B1-34 shows the value in LR_und on exception entry.
  For the description of the SCTLR see:
  — c1, System Control Register (SCTLR) on page B3-96 for a VMSA implementation
  — c1, System Control Register (SCTLR) on page B4-45 for a PMSA implementation.
• Does not set CPSR.J and CPSR.T to the requested values. The processor might change the value of one or both of the bits in such a way that the new values correspond to an implemented instruction set state. If this is done then the instruction set state changes to this new state.
The detailed behavior of the attempt to change to an unimplemented state is IMPLEMENTATION DEFINED.

B1.5 The Security Extensions
It is IMPLEMENTATION DEFINED whether an ARMv7-A system includes the Security Extensions. When implemented, the Security Extensions integrate hardware security features into the architecture, to facilitate the development of secure applications. Many features of the architecture are extended to integrate with the Security Extensions, and because of this integration, features of the Security Extensions are described in many sections of this manual.
Note
The Security Extensions are also permitted as an extension to the ARMv6K architecture. The resulting combination is sometimes known as the ARMv6Z or ARMv6KZ architecture.
General information about the Security Extensions is given in:
• Security states
• Impact of the Security Extensions on the modes and exception model on page B1-28
• Effect of the Security Extensions on the CP15 registers on page B3-71.

B1.5.1 Security states
The Security Extensions define two security states, Secure state and Non-secure state. All code execution takes place either in Secure state or in Non-secure state:
• each security state operates in its own virtual memory address space
• many system controls can be set independently in each of the security states
• all of the processor modes that are available in a system that does not implement the Security Extensions are available in each of the security states.
The Security Extensions also define an additional processor mode, Monitor mode, that provides a bridge between code running in Non-secure state and code running in Secure state.
The following features mean the two security states can provide more security than is typically provided by systems using the split between privileged and unprivileged code:
• the memory system provides mechanisms that prevent the Non-secure state accessing regions of the physical memory designated as Secure
• system controls that apply to the Secure state are not accessible from the Non-secure state
• entry to the Secure state from the Non-secure state is provided only by a small number of exceptions
• exit from the Secure state to the Non-secure state is provided only by a small number of mechanisms
• many operating system exceptions can be handled without changing security state.
The fundamental mechanism that determines the security state is the SCR.NS bit, see c1, Secure Configuration Register (SCR) on page B3-106:
• for all modes other than Monitor mode, the SCR.NS bit determines the security state for code execution
• code executing in Monitor mode is executed in the Secure state regardless of the value of the SCR.NS bit.
Code can change the SCR only if it is executing in the Secure state.
The general-purpose registers and the processor status registers are not banked between the Secure and the Non-secure states. When execution switches between the Non-secure and Secure security states, ARM expects that the values of these registers are switched by a kernel running mostly in Monitor mode.
Many of the system registers described in Coprocessors and system control on page B1-62 are banked between the Secure and Non-secure security states. A banked copy of a register applies only to execution in the appropriate security state. A small number of system registers are not banked but apply to both the Secure and Non-secure security states.
Typically the registers that are not banked relate to global system configuration options that ARM expects to be common to the two security states.
Figure B1-2 on page B1-27 shows the normal transfers of control between different modes and security states.
Note
In Figure B1-2 on page B1-27, the route labelled as MCR is for an MCR instruction writing to the SCR, that sets SCR.NS to 1 (Non-secure) at a time when SCR.NS == 0 (Secure) and the processor is not in Monitor mode. This is a possible transfer, but ARM recommends that the value of SCR.NS is changed only by code executing in Monitor mode, see Changing from Secure to Non-secure state on page B1-27.

[Figure B1-2 Security state, Monitor mode, and the SCR.NS bit. Secure state (CP15 SCR.NS = 0): Monitor mode, the Secure privileged modes other than Monitor mode (Supervisor, FIQ, IRQ, Undef, Abort, System), and User mode. Non-secure state (CP15 SCR.NS = 1): the Non-secure privileged modes (Supervisor, FIQ, IRQ, Undef, Abort, System) and User mode. Transitions are labelled SMC and MCR, with ‡ marking "or other mode changing method".]

Note
It is important to distinguish between:
Monitor mode   This is a processor mode that is only available when the Security Extensions are implemented. It is used in normal operation, as a mechanism to transfer between Secure and Non-secure state, as described in this section.
Monitor debug-mode   This is a debug mode and is available regardless of whether the Security Extensions are implemented. For more information, see About the ARM Debug architecture on page C1-3.

Changing from Secure to Non-secure state
The security state is controlled by the SCR.NS bit, and ARM recommends that the SCR is modified only in Monitor mode. Monitor mode is responsible for switching between Secure and Non-secure states.
To return to Non-secure state, set the SCR.NS bit to 1 and then perform an exception return.
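A minimal C sketch of the security-state rule (SCR.NS decides, except that Monitor mode is always Secure) and of the recommended switch to Non-secure state follows. It assumes a heavily simplified state holding only SCR.NS, the current mode and the mode field saved in SPSR_mon. The WorldState structure and monitor_return_to_nonsecure() are hypothetical; a real switch also restores the PC, the rest of the CPSR, and the general-purpose registers.

```c
#include <assert.h>
#include <stdbool.h>

#define MODE_SVC 0x13u
#define MODE_MON 0x16u

/* Mirrors IsSecure(): Secure if the extensions are absent, SCR.NS is 0,
   or the processor is in Monitor mode. */
static bool is_secure(bool have_security_ext, unsigned scr_ns, unsigned mode)
{
    return !have_security_ext || scr_ns == 0 || mode == MODE_MON;
}

/* Hypothetical, heavily simplified processor state. */
typedef struct {
    unsigned scr_ns;         /* SCR.NS */
    unsigned mode;           /* CPSR.M[4:0] */
    unsigned spsr_mon_mode;  /* M[4:0] field saved in SPSR_mon */
} WorldState;

/* Recommended route to Non-secure state: from Monitor mode, set SCR.NS
   to 1, then perform an exception return that restores CPSR from SPSR_mon. */
static void monitor_return_to_nonsecure(WorldState *w)
{
    assert(w->mode == MODE_MON);   /* ARM recommends changing SCR.NS only here */
    w->scr_ns = 1;                 /* still Secure: Monitor mode ignores NS */
    w->mode = w->spsr_mon_mode;    /* exception return switches mode */
}
```

The ordering matters: because Monitor mode is Secure regardless of SCR.NS, setting NS first is harmless, and the security state changes only when the exception return leaves Monitor mode.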
Note
To avoid security holes, ARM strongly recommends that:
• you do not change from Secure to Non-secure state by using an MSR or CPS instruction to switch from Monitor mode to some other mode while SCR.NS is 1
• you do not use an MCR instruction that writes SCR.NS to change from Secure to Non-secure state. This means you should not alter the SCR.NS bit in any mode except Monitor mode.
The usual mechanism for changing from Secure to Non-secure state is an exception return.

Pseudocode details of Secure state operations
The HaveSecurityExt() function returns TRUE if the Security Extensions are implemented, and FALSE otherwise.
The following function returns TRUE if the Security Extensions are not implemented or the processor is in Secure state, and FALSE otherwise.

    // IsSecure()
    // ==========

    boolean IsSecure()
        return !HaveSecurityExt() || SCR.NS == '0' || CPSR.M == '10110';  // Monitor mode

B1.5.2 Impact of the Security Extensions on the modes and exception model
This section summarizes the effect of the Security Extensions on the modes and exception model, to give an overview of the Security Extensions.
When the Security Extensions are implemented:
• An additional mode, Monitor mode, is implemented. For more information, see ARM processor modes on page B1-6 and Security states on page B1-25.
• An additional exception, the Secure Monitor Call (SMC) exception, is implemented. This is generated by the SMC instruction. For more information, see Secure Monitor Call (SMC) exception on page B1-53 and SMC (previously SMI) on page B6-18.
• Because the SCTLR is banked between the Secure and Non-secure states, the V and VE bits are defined independently for the Secure and Non-secure states.
  For each state:
  — the SCTLR.V bit controls whether the normal or the high exception vectors are used
  — the SCTLR.VE bit controls whether the IRQ and FIQ vectors are IMPLEMENTATION DEFINED.
  For more information, see Exception vectors and the exception base address on page B1-30.
• The base address for the normal exception vectors is held in a CP15 register that is banked between the two security states. This register defines the base address used for exceptions handled in modes other than Monitor mode. Another CP15 register holds the base address for exceptions handled in Monitor mode. For more information, see Exception vectors and the exception base address on page B1-30.
• If an exception is taken in Monitor mode in Non-debug state, the SCR.NS bit is set to zero, see c1, Secure Configuration Register (SCR) on page B3-106. This forces Secure state entry for all exceptions. However, if an exception is taken in Monitor mode in Debug state, the SCR.NS bit is not set to zero.
  Note
  Many uses of the Security Extensions can be simplified if the system is designed so that exceptions cannot be taken in Monitor mode.
• Setting bits in the Secure Configuration Register causes one or more of external aborts, IRQs and FIQs to be handled in Monitor mode and to use the Monitor exception base address:
  — setting the SCR.EA bit to 1 means external aborts are handled in Monitor mode, instead of Abort mode
  — setting the SCR.FIQ bit to 1 means FIQs are handled in Monitor mode, instead of FIQ mode
  — setting the SCR.IRQ bit to 1 means IRQs are handled in Monitor mode, instead of IRQ mode.
  For more information see:
  — Control of exception handling by the Security Extensions on page B1-41
  — c1, Secure Configuration Register (SCR) on page B3-106.
• Clearing bits in the Secure Configuration Register prevents code executing in Non-secure state from being able to mask one or both of asynchronous aborts and FIQs:
  — clearing the SCR.AW bit to 0 prevents Non-secure setting of CPSR.A to 1
  — clearing the SCR.FW bit to 0 prevents Non-secure setting of CPSR.F to 1.
  This matches Table B1-9 and Table B1-10, where asynchronous aborts and FIQs are maskable only in Secure state when AW == 0 or FW == 0. For details of how this setting interacts with NMFIs see Non-maskable fast interrupts on page B1-18.

B1.6 Exceptions
An exception causes the processor to suspend program execution to handle an event, such as an externally generated interrupt or an attempt to execute an undefined instruction. Exceptions can be generated by internal and external sources.
Normally, when an exception is taken the processor state is preserved immediately, before handling the exception. This means that, when the event has been handled, the original state can be restored and program execution resumed from the point where the exception was taken.
More than one exception might be generated at the same time, and a new exception can be generated while the processor is handling an exception.
The following sections describe exception handling in general:
• Exception vectors and the exception base address
• Exception priority order on page B1-33
• Exception entry on page B1-34
• Exception return on page B1-38
• Exception-handling instructions on page B1-41
• Control of exception handling by the Security Extensions on page B1-41
• Low interrupt latency configuration on page B1-43
• Wait For Event and Send Event on page B1-44
• Wait For Interrupt on page B1-47.
The following sections give details of each exception:
• Reset on page B1-48
• Undefined Instruction exception on page B1-49
• Supervisor Call (SVC) exception on page B1-52
• Secure Monitor Call (SMC) exception on page B1-53
• Prefetch Abort exception on page B1-54
• Data Abort exception on page B1-55
• IRQ exception on page B1-58
• FIQ exception on page B1-60.

B1.6.1 Exception vectors and the exception base address
When an exception is taken, processor execution is forced to an address that corresponds to the type of exception. These addresses are called the exception vectors.
By default, the exception vectors are eight consecutive word-aligned memory addresses, starting at an exception base address. Table B1-3 on page B1-31 shows the assignment of the exceptions to the eight memory addresses.

Table B1-3 Offsets from exception base addresses
Exception   Exception that is vectored at that offset from:
offset      Monitor exception base address (a)   Base address for all other exceptions
0x00        Not used                             Reset
0x04        Not used                             Undefined Instruction
0x08        Secure Monitor Call (SMC)            Supervisor Call (SVC)
0x0C        Prefetch Abort                       Prefetch Abort
0x10        Data Abort                           Data Abort
0x14        Not used                             Not used
0x18        IRQ (interrupt)                      IRQ (interrupt)
0x1C        FIQ (fast interrupt)                 FIQ (fast interrupt)
a. This column applies only if the Security Extensions are implemented.

The default exception vectors for the IRQ and FIQ exceptions can be changed by setting the SCTLR.VE bit to 1, as described in Vectored interrupt support on page B1-32.
If the Security Extensions are not implemented there is a single exception base address. This is controlled by the SCTLR.V bit:
V == 0  Exception base address = 0x00000000. This setting is referred to as normal vectors, or as low vectors.
V == 1  Exception base address = 0xFFFF0000. This setting is referred to as high vectors, or Hivecs.
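Without the Security Extensions, a vector address is the base selected by SCTLR.V plus the offset from Table B1-3. The following C sketch models that calculation on a host; the enum and function names are hypothetical, and it assumes vectored interrupts are disabled (SCTLR.VE == 0).

```c
#include <assert.h>
#include <stdint.h>

/* Offsets from the exception base address (Table B1-3, column for
 * exceptions not handled in Monitor mode). Offset 0x14 is not used. */
enum vec_offset {
    VEC_RESET = 0x00,
    VEC_UNDEF = 0x04,
    VEC_SVC   = 0x08,
    VEC_PABT  = 0x0C,
    VEC_DABT  = 0x10,
    VEC_IRQ   = 0x18,
    VEC_FIQ   = 0x1C
};

/* Vector address without the Security Extensions, SCTLR.VE == 0.
 * sctlr_v == 0 selects normal (low) vectors at 0x00000000,
 * sctlr_v == 1 selects high vectors (Hivecs) at 0xFFFF0000. */
static uint32_t vector_address(unsigned sctlr_v, enum vec_offset off)
{
    assert(sctlr_v <= 1u);
    uint32_t base = sctlr_v ? 0xFFFF0000u : 0x00000000u;
    return base + (uint32_t)off;
}
```

For example, with Hivecs selected an FIQ is vectored to 0xFFFF001C.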
Note
Use of the Hivecs setting, V == 1, is deprecated in ARMv7-R. ARM recommends that Hivecs is used only in ARMv7-A implementations.

If the Security Extensions are implemented there are three exception base addresses:
• the Non-secure exception base address is used for all exceptions that are processed in Non-secure state
• the Secure exception base address is used for all exceptions that are processed in Secure state but not in Monitor mode
• the Monitor exception base address is used for all exceptions that are processed in Monitor mode.

See CPSR M field and A, I, and F mask bit values on exception entry on page B1-36 to determine the mode in which an exception is processed. If that mode is Monitor mode, the exception is processed in Secure state. Otherwise, the exception is processed in the current security state, determined at the time when the exception is taken.

The Non-secure exception base address is controlled by the SCTLR.V bit in the Non-secure SCTLR:
V == 0  The exception base address is the value of the Non-secure Vector Base Address Register (VBAR), see c12, Vector Base Address Register (VBAR) on page B3-148.
V == 1  Exception base address = 0xFFFF0000. This setting is often referred to as Hivecs.

The Secure exception base address is controlled similarly, by the Secure SCTLR.V bit and the Secure VBAR.
The Monitor exception base address is always the value of the Monitor Vector Base Address Register (MVBAR), see c12, Monitor Vector Base Address Register (MVBAR) on page B3-149.

Vectored interrupt support
By default, the IRQ and FIQ exception vectors are at fixed offsets from the exception base address that is being used. This is consistent with previous versions of the ARM architecture.
With this default configuration, each of the FIQ and IRQ handlers typically starts with an instruction sequence that determines the cause of the interrupt and then branches to an appropriate routine to handle it.
Support for vectored interrupts means an interrupt controller can prioritize interrupts and provide the address of the required interrupt handler directly to the processor, for use as the interrupt vector.
Vectored interrupt behavior is enabled by setting the SCTLR.VE bit to 1, see:
• c1, System Control Register (SCTLR) on page B3-96 for a VMSA implementation
• c1, System Control Register (SCTLR) on page B4-45 for a PMSA implementation.
The hardware that supports vectored interrupts is IMPLEMENTATION DEFINED.
For backwards compatibility, the vectored interrupt mechanism is disabled on reset.
When the Security Extensions are implemented:
• The SCTLR.VE bit is banked between Secure and Non-secure states to provide independent control of whether vectored interrupt support is enabled.
• Interrupts can be trapped to Monitor mode by setting either or both of the SCR.IRQ and SCR.FIQ bits to 1. When an interrupt is trapped to Monitor mode it uses the vector in the vector table addressed by the Monitor exception base address held in MVBAR, regardless of the value of either banked copy of the SCTLR.VE bit.

Operation
In pseudocode, the current exception base address for exceptions processed in Monitor mode is determined by reading MVBAR, and for other exceptions by the following function:

    // ExcVectorBase()
    // ===============

    bits(32) ExcVectorBase()
        if SCTLR.V == '1' then  // Hivecs selected, base = 0xFFFF0000
            return Ones(16):Zeros(16);
        elsif HaveSecurityExt() then
            return VBAR;
        else
            return Zeros(32);

B1.6.2 Exception priority order
In principle a number of different synchronous exceptions can be generated by a single instruction.
The following principles determine which synchronous exception is taken:
• No instruction is valid if it has a synchronous Prefetch Abort exception associated with it. Therefore, other synchronous exceptions are not taken in this case.
• An instruction that generates an Undefined Instruction exception cannot cause any memory access, and therefore cannot cause a Data Abort exception.
• All other synchronous exceptions are mutually exclusive and are derived from a decode of the instruction.
The ARM architecture does not define when asynchronous exceptions are taken. Therefore the prioritization of asynchronous exceptions relative to other exceptions, both synchronous and asynchronous, depends on the implementation.
The CPSR includes a mask bit for each type of asynchronous exception. Setting one of these bits to 1 prevents the corresponding asynchronous exception from being taken. Taking an exception sets an exception-dependent subset of these mask bits.
Note
• The subset of the CPSR mask bits that is set on taking an exception prioritizes the execution of FIQ handlers over that of IRQ and asynchronous abort handlers.
• A special requirement applies to asynchronous watchpoints, see Debug event prioritization on page C3-43.

B1.6.3 Exception entry
On taking an exception:
1. The value of the CPSR is saved in the SPSR for the exception mode that is handling the exception.
2. The value of (PC + exception-dependent offset) is saved in the LR for the exception mode that is handling the exception, see Table B1-4.
3. The CPSR and PC are updated with information for the exception handler:
   • The CPSR is updated with new context information. This includes:
     — Setting CPSR.M to the processor mode in which the exception is to be handled.
     — Disabling appropriate classes of interrupt, to prevent uncontrolled nesting of exception handlers.
       For more information, see Table B1-6 on page B1-36, Table B1-7 on page B1-37, and Table B1-8 on page B1-37.
     — Setting the instruction set state to the instruction set chosen for exception entry, see Instruction set state on exception entry on page B1-35.
     — Setting the endianness to the value chosen for exception entry, see CPSR.E bit value on exception entry on page B1-38.
     — Clearing the IT[7:0] bits to 0.
     For more information, see CPSR M field and A, I, and F mask bit values on exception entry on page B1-36.
   • The appropriate exception vector is loaded to the PC, see Exception vectors and the exception base address on page B1-30.
4. Execution continues from the address held in the PC.

At step 2 of the exception entry, the address saved in the LR depends on:
• the exception type
• the instruction set state in which the processor is executing when the exception occurs.
Table B1-4 shows the LR value saved for all cases:

Table B1-4 Link Register value saved on exception entry
                                                                          Offset, for processor state of: (a)
Exception               Base LR value                                     ARM   Thumb or ThumbEE   Jazelle
Reset                   UNKNOWN                                           -     -                  -
Undefined Instruction   Address of the undefined instruction              +4    +2                 +2 or +4 (b)
SVC                     Address of SVC instruction                        +4    +2                 - (c)
SMC                     Address of SMC instruction                        +4    +4                 - (c)
Prefetch Abort          Address of aborted instruction fetch              +4    +4                 +4
Data Abort              Address of instruction that generated the abort   +8    +8                 +8
IRQ or FIQ              Address of next instruction to execute            +4    +4                 +4
a. Except for the Reset exception, the value saved in the LR is the base LR value plus the offset value for the processor state immediately before the exception entry.
b. In Jazelle state, Undefined Instruction exceptions can only happen on a processor that includes a trivial implementation of Jazelle state. On such a processor, if an exception return instruction writes {CPSR.J, CPSR.T} to 0b10, the processor takes an Undefined Instruction exception when it next attempts to execute an instruction. It is IMPLEMENTATION DEFINED whether the processor uses an offset of +2 or +4 in these circumstances, but it must always use the same offset.
c. SVC and SMC exceptions cannot occur in Jazelle state.

Instruction set state on exception entry
Exception handlers always execute in either Thumb state or ARM state. Which state they execute in is determined by the Thumb Exception enable bit, SCTLR.TE, see:
• c1, System Control Register (SCTLR) on page B3-96 for a VMSA implementation
• c1, System Control Register (SCTLR) on page B4-45 for a PMSA implementation.
On exception entry, the CPSR.T and CPSR.J bits are set to values that depend on the SCTLR.TE value, as shown in Table B1-5:

Table B1-5 CPSR.J and CPSR.T bit values on exception entry
SCTLR.TE   CPSR.J   CPSR.T   Exception handler state
0          0        0        ARM
1          0        1        Thumb

When the Security Extensions are implemented, the SCTLR is banked for Secure and Non-secure states, and therefore the TE bit value might be different for Secure and Non-secure states. The SCTLR.TE bit for the security state in which the exception is handled determines the instruction set state for the exception handler. This means the exception handlers might run in different instruction set states, depending on the security state.
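The offsets in Table B1-4 can be captured as a lookup, which also shows how the LR value is formed from the base LR value. This is a host-side C sketch with hypothetical names, not an architectural interface; it omits Reset, whose LR value is UNKNOWN, and takes the IMPLEMENTATION DEFINED Jazelle offset for Undefined Instruction exceptions (footnote b) as a parameter.

```c
#include <assert.h>
#include <stdint.h>

/* Exception types from Table B1-4 (Reset omitted: its LR is UNKNOWN). */
enum lr_exc { LR_UNDEF, LR_SVC, LR_SMC, LR_PABT, LR_DABT, LR_IRQ_FIQ };

/* Processor state immediately before the exception (table columns). */
enum lr_state { ST_ARM, ST_THUMB_OR_THUMBEE, ST_JAZELLE };

/* Offset added to the base LR value, or -1 where the table says the
 * combination cannot occur (SVC and SMC in Jazelle state, footnote c).
 * jazelle_undef is the IMPLEMENTATION DEFINED choice, 2 or 4 (footnote b). */
static int lr_offset(enum lr_exc exc, enum lr_state st, int jazelle_undef)
{
    switch (exc) {
    case LR_UNDEF:
        if (st == ST_ARM)
            return 4;
        if (st == ST_THUMB_OR_THUMBEE)
            return 2;
        assert(jazelle_undef == 2 || jazelle_undef == 4);
        return jazelle_undef;
    case LR_SVC:
        if (st == ST_JAZELLE)
            return -1;
        return st == ST_ARM ? 4 : 2;
    case LR_SMC:
        return st == ST_JAZELLE ? -1 : 4;
    case LR_PABT:
    case LR_IRQ_FIQ:
        return 4;
    case LR_DABT:
        return 8;
    }
    return -1; /* not reached for valid enum values */
}

/* LR value written on exception entry: base LR value plus the offset. */
static uint32_t lr_on_entry(uint32_t base_lr, int offset)
{
    assert(offset > 0);
    return base_lr + (uint32_t)offset;
}
```

For example, a Data Abort generated by an instruction at 0x8000 in ARM state leaves LR == 0x8008.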
CPSR M field and A, I, and F mask bit values on exception entry
On exception entry, the processor mode is set to one of the exception modes and the CPSR[A,I,F] interrupt disable (mask) bits are set to new values:
• the CPSR.I bit is always set to 1, to disable IRQs
• the CPSR.M (mode), CPSR.A (asynchronous abort disable), and CPSR.F (FIQ disable) bits are set to values that depend:
  — on the exception type
  — if the Security Extensions are implemented, on the security state and some bits of the SCR, see c1, Secure Configuration Register (SCR) on page B3-106.
The new values are shown in:
• Table B1-6, for an implementation that does not include the Security Extensions
• Table B1-7 on page B1-37, for an implementation that includes the Security Extensions, when the security state is Secure (NS == 0)
• Table B1-8 on page B1-37, for an implementation that includes the Security Extensions, when the security state is Non-secure (NS == 1).
In these tables, Unchanged indicates that the bit value is unchanged from its value when the exception was taken.

Table B1-6 A and F bit values on exception entry, without Security Extensions
Exception               Exception mode   CPSR.A      CPSR.F
Reset                   Supervisor       1           1
Undefined Instruction   Undefined        Unchanged   Unchanged
Supervisor Call (SVC)   Supervisor       Unchanged   Unchanged
All aborts              Abort            1           Unchanged
IRQ                     IRQ              1           Unchanged
FIQ                     FIQ              1           1
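Table B1-6 can be read as a function from the exception type and the old CPSR.A and CPSR.F values to the new mode and mask bits. The C sketch below models that table only (no Security Extensions); the type and function names are hypothetical, and modes are symbolic rather than CPSR.M encodings.

```c
#include <assert.h>
#include <stdbool.h>

enum exc_type { EXC_RESET, EXC_UNDEF, EXC_SVC, EXC_ABORT, EXC_IRQ, EXC_FIQ };
enum exc_mode { MODE_SVC, MODE_UND, MODE_ABT, MODE_IRQ, MODE_FIQ };

struct entry_bits {
    enum exc_mode mode;  /* new processor mode */
    bool a, i, f;        /* new CPSR.A, CPSR.I and CPSR.F */
};

/* Model of Table B1-6. old_a and old_f supply the "Unchanged" values. */
static struct entry_bits entry_no_secext(enum exc_type e, bool old_a, bool old_f)
{
    assert((unsigned)e <= (unsigned)EXC_FIQ);
    /* Start from Unchanged A and F; CPSR.I is always set to 1. */
    struct entry_bits r = { MODE_SVC, old_a, true, old_f };
    switch (e) {
    case EXC_RESET: r.mode = MODE_SVC; r.a = true; r.f = true; break;
    case EXC_UNDEF: r.mode = MODE_UND;                         break;
    case EXC_SVC:   r.mode = MODE_SVC;                         break;
    case EXC_ABORT: r.mode = MODE_ABT; r.a = true;             break;
    case EXC_IRQ:   r.mode = MODE_IRQ; r.a = true;             break;
    case EXC_FIQ:   r.mode = MODE_FIQ; r.a = true; r.f = true; break;
    }
    return r;
}
```

Because only FIQ and Reset set CPSR.F, an FIQ can still pre-empt IRQ and abort handlers, as the Note in Exception priority order describes.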
Table B1-7 A and F bit values on exception entry, with Security Extensions and NS == 0
                            SCR bits         NS == 0, Secure
Exception                   EA   IRQ  FIQ    Exception mode   CPSR.A      CPSR.F
Reset                       x    x    x      Supervisor       1           1
Undefined Instruction       x    x    x      Undefined        Unchanged   Unchanged
Supervisor Call (SVC)       x    x    x      Supervisor       Unchanged   Unchanged
Secure Monitor Call (SMC)   x    x    x      Monitor          1           1
All external aborts         0    x    x      Abort            1           Unchanged
                            1    x    x      Monitor          1           1
All internal aborts         x    x    x      Abort            1           Unchanged
IRQ                         x    0    x      IRQ              1           Unchanged
                            x    1    x      Monitor          1           1
FIQ                         x    x    0      FIQ              1           1
                            x    x    1      Monitor          1           1

Table B1-8 A and F bit values on exception entry, with Security Extensions and NS == 1
                            SCR bits                   NS == 1, Non-secure
Exception                   EA   IRQ  FIQ  AW   FW     Exception mode   CPSR.A      CPSR.F
Reset                       x    x    x    x    x      Supervisor       1           1
Undefined Instruction       x    x    x    x    x      Undefined        Unchanged   Unchanged
Supervisor Call (SVC)       x    x    x    x    x      Supervisor       Unchanged   Unchanged
Secure Monitor Call (SMC)   x    x    x    x    x      Monitor          1           1
All external aborts         0    x    x    0    x      Abort            Unchanged   Unchanged
                            0    x    x    1    x      Abort            1           Unchanged
                            1    x    x    x    x      Monitor          1           1
All internal aborts         x    x    x    0    x      Abort            Unchanged   Unchanged
                            x    x    x    1    x      Abort            1           Unchanged
IRQ                         x    0    x    0    x      IRQ              Unchanged   Unchanged
                            x    0    x    1    x      IRQ              1           Unchanged
                            x    1    x    x    x      Monitor          1           1
FIQ                         x    x    0    0    0      FIQ              Unchanged   Unchanged
                            x    x    0    0    1      FIQ              Unchanged   1
                            x    x    0    1    0      FIQ              1           Unchanged
                            x    x    0    1    1      FIQ              1           1
                            x    x    1    x    x      Monitor          1           1

CPSR.E bit value on exception entry
On exception entry, the CPSR.E bit is set to the value of the SCTLR.EE bit. This bit of the CPSR controls the load and store endianness for data handling by the exception handler, see the bit description in Format of the CPSR and SPSRs on page B1-16.
For the description of the SCTLR see:
• c1, System Control Register (SCTLR) on page B3-96 for a VMSA implementation
• c1, System Control Register (SCTLR) on page B4-45 for a PMSA implementation.

B1.6.4 Exception return
In the ARM architecture, exception return requires the simultaneous restoration of the PC and CPSR to values that are consistent with the desired state of execution on returning from the exception. Normally, this is the state of execution just before the exception was taken, but it can be different in some circumstances, for example if the exception handler performed instruction emulation. Typically, an exception returns to one of:
• the instruction boundary at which an asynchronous exception was taken
• the instruction following an SVC or SMC instruction, for an exception generated by one of those instructions
• the instruction that caused the exception, after the reason for the exception has been removed
• the subsequent instruction, if the instruction that caused the exception has been emulated in the exception handler.
The ARM architecture makes no requirement that exception return must be to any particular place in the execution stream. However, the architecture does have a preferred exception return for each exception other than Reset. The values of the SPSR.IT[7:0] bits generated on exception entry are always correct for the preferred exception return, but might require adjustment by software if returning elsewhere.
In some cases, the value of the LR set on taking the exception, as shown in Table B1-4 on page B1-34, makes it necessary to perform a subtraction to calculate the appropriate return address. The value that must be subtracted for the preferred exception return, and other details of the preferred exception return, are given in the description of each of the exceptions.
The ARM architecture provides the following exception return instructions:
• Data-processing instructions with the S bit set and the PC as a destination, see SUBS PC, LR and related instructions on page B6-25. Typically, SUBS is used when a subtraction is required, and SUBS with an operand of 0 or MOVS is used otherwise.
• From ARMv6, the RFE instruction, see RFE on page B6-16. If a subtraction is required, typically it is performed before saving the LR value to memory.
• In ARM state, a form of the LDM instruction, see LDM (exception return) on page B6-5. If a subtraction is required, typically it is performed before saving the LR value to memory.

Alignment of exception returns
An unaligned exception return is one where the address transferred to the PC on an exception return is not aligned to the size of instructions in the target instruction set. The target instruction set is controlled by the [J,T] bits of the value transferred to the CPSR for the exception return. The behavior of the hardware for exception returns for different values of the [J,T] bits is as follows:
[J,T] == 00  The target instruction set state is ARM state. Bits [1:0] of the address transferred to the PC are ignored by the hardware.
[J,T] == 01  The target instruction set state is Thumb state:
             • bit [0] of the address transferred to the PC is ignored by the hardware
             • bit [1] of the address transferred to the PC is part of the instruction address.
[J,T] == 10  The target instruction set state is Jazelle state. In a non-trivial implementation of the Jazelle extension, bits [1:0] of the address transferred to the PC are part of the instruction address. In a trivial implementation of the Jazelle extension, behavior is UNPREDICTABLE, see Exception return to an unsupported instruction set state on page B1-40. For details of the trivial implementation of Jazelle state see Trivial implementation of the Jazelle extension on page B1-81.
[J,T] == 11  The target instruction set state is ThumbEE state:
             • bit [0] of the address transferred to the PC is ignored by the hardware
             • bit [1] of the address transferred to the PC is part of the instruction address.
ARM deprecates any dependence on the requirements that the hardware ignores bits of the address. ARM recommends that the address transferred to the PC for an exception return is correctly aligned for the target instruction set.
After an exception entry other than Reset, the LR value has the correct alignment for the instruction set indicated by the SPSR.[J,T] bits. This means that if exception return instructions are used with the LR and SPSR values produced by such an exception entry, the only precaution software needs to take to ensure correct alignment is that any subtraction is of a multiple of four if returning to ARM state, or a multiple of two if returning to Thumb state or to ThumbEE state.

Exception return to an unsupported instruction set state
An implementation that does not support one or both of Jazelle and ThumbEE states does not normally get into an unsupported instruction set state, because:
• on a trivial Jazelle implementation, the BXJ instruction acts as a BX instruction
• on an implementation that does not include ThumbEE support, the ENTERX instruction is UNDEFINED
• normal exception entry and return preserve the instruction set state.
However, it is possible for an exception return instruction to set CPSR.J and CPSR.T to the values corresponding to an unsupported instruction set state. This is most likely to happen because a faulty exception handler restores the wrong value to the CPSR.
If the processor attempts to execute an instruction while the CPSR.J and CPSR.T bits indicate an unsupported instruction set state:
• If the unsupported instruction set state is Jazelle state, behavior is UNPREDICTABLE.
• If the unsupported instruction set state is ThumbEE state, the processor takes an Undefined Instruction exception. The Undefined Instruction handler can detect the cause of this exception because on entry to the handler the SPSR.J and SPSR.T bits indicate the ThumbEE state. If the Undefined Instruction handler wants to return, avoiding a return to ThumbEE state, it can change the values its exception return instruction writes to the CPSR.J and CPSR.T bits.
  If an exception return writes CPSR.J = 1 and CPSR.T = 1, corresponding to ThumbEE state, and also writes the address of an aborting memory location to the PC, it is IMPLEMENTATION DEFINED whether:
  — the instruction is fetched and a Prefetch Abort exception is taken because the memory access aborts
  — an Undefined Instruction exception is taken, without the instruction being fetched.
An implementation that supports neither Jazelle nor ThumbEE state can implement the J bits of the PSRs as RAZ/WI. On such an implementation, a return to an unsupported instruction set state cannot occur.

B1.6.5 Exception-handling instructions
From ARMv6, the instruction sets include the following exception-handling instructions, in addition to the exception return instructions described in Exception return on page B1-38:
• a CPS (Change Processor State) instruction to simplify changes of processor mode and the disabling and enabling of interrupts, see CPS on page B6-3
• an SRS (Store Return State) instruction, to reduce the processing cost of handling exceptions in a different mode to the exception entry mode, by removing any need to use the stack of the original mode, see SRS on page B6-20.
As an example of where these instructions might be used, an IRQ routine might want to execute in System or Supervisor mode, so that it can both re-enable IRQs and use BL instructions.
This is not possible in IRQ mode, because a nested IRQ could corrupt the return link of a BL at any time. With the CPS and SRS instructions, the system can use the following instruction sequence at the start of its exception handler:

    SUB    LR, LR, #4       ; IRQ requires subtraction from LR
    SRSFD  SP!, #<mode>     ; <mode> = 19 for Supervisor, 31 for System
    CPSIE  i, #<mode>

This:
• stores the return state held in the LR and SPSR_irq to the stack for Supervisor mode or for User and System mode
• switches to Supervisor or System mode and re-enables IRQs.
This is done efficiently, without making any use of SP_irq or the IRQ stack.
At the end of the exception handler, an RFEFD SP! instruction pops the return state off the stack and returns from the exception.

B1.6.6 Control of exception handling by the Security Extensions
The Security Extensions provide additional controls of the handling of:
• aborts, see Control of aborts by the Security Extensions
• FIQs, see Control of FIQs by the Security Extensions on page B1-42
• IRQs, see Control of IRQs by the Security Extensions on page B1-43.

Control of aborts by the Security Extensions
The CPSR.A bit can be used to disable asynchronous aborts. When the Security Extensions are implemented:
• the SCR.AW bit controls whether the CPSR.A bit can be modified in Non-secure state
• the SCR.EA bit controls whether external aborts are handled in Abort mode or Monitor mode.
For details of these bits see c1, Secure Configuration Register (SCR) on page B3-106.
Table B1-9 shows the possible values for the SCR.AW and SCR.EA bits, and the abort handling that results in each case:

Table B1-9 Effect of the SCR.AW and SCR.EA bits on abort handling
SCR bits
AW   EA   Effect on abort handling
0    0    All aborts are handled locally using Abort mode. Asynchronous aborts are maskable only in Secure state. This is the reset state and supports legacy systems.
0    1    All external aborts, synchronous and asynchronous, are handled in Monitor mode. Asynchronous aborts are maskable only in Secure state. All security aborts from peripherals can be treated in a safe manner in Monitor mode.
1    0    All aborts are handled locally, using Abort mode. Asynchronous aborts are maskable in both Secure and Non-secure states.
1    1    All external aborts are trapped to Monitor mode. Non-secure state can hide asynchronous external aborts from the Monitor by changing the CPSR.A bit.

When the SCR.EA bit is set to 1, and an external abort causes entry to Monitor mode, fault information is written to the Secure copies of the Fault Status and Fault Address registers.

Control of FIQs by the Security Extensions
The CPSR.F bit can be used to disable FIQs. When the Security Extensions are implemented:
• the SCR.FW bit controls whether the CPSR.F bit can be modified in Non-secure state
• the SCR.FIQ bit controls whether FIQs are handled in FIQ mode or Monitor mode.
For details of these bits see c1, Secure Configuration Register (SCR) on page B3-106.
Table B1-10 on page B1-43 shows the effect of these bits on FIQ handling:

Table B1-10 Effect of the SCR.FW and SCR.FIQ bits on FIQ handling
SCR bits
FW   FIQ  Effect on FIQ handling
0    0    FIQs are handled locally using FIQ mode. FIQs are maskable only in Secure state. This is the reset state and supports legacy systems.
0    1    FIQs are handled in Monitor mode. FIQs are maskable only in Secure state. This setting gives Secure FIQs.
1    0    FIQs are handled locally in FIQ mode. FIQs can be masked in both Secure and Non-secure states.
1    1    All FIQs are trapped to Monitor mode. Non-secure state can hide FIQs from the Monitor by changing the CPSR.F bit.

Note
• The configuration with SCR.FW == 1 and SCR.FIQ == 1 permits Non-secure state to deny service by changing the CPSR.F bit.
ARM recommends that this configuration is not used.
• Interrupts driven by Secure peripherals are called Secure interrupts. When SCR.FW == 0 and SCR.FIQ == 1, FIQ exceptions can be used as Secure interrupts. These enter Secure state in a deterministic way.

Control of IRQs by the Security Extensions

When the Security Extensions are implemented, the SCR.IRQ bit controls whether IRQs are handled in IRQ mode or Monitor mode. For details of this bit see c1, Secure Configuration Register (SCR) on page B3-106.

B1.6.7 Low interrupt latency configuration

Setting the SCTLR.FI bit to 1 enables the low interrupt latency configuration of an implementation. This configuration can reduce the interrupt latency of the processor. The mechanisms implemented to achieve low interrupt latency are IMPLEMENTATION DEFINED.

For the description of the SCTLR see:
• c1, System Control Register (SCTLR) on page B3-96 for a VMSA implementation
• c1, System Control Register (SCTLR) on page B4-45 for a PMSA implementation.

To ensure that a change between normal and low interrupt latency configurations is synchronized correctly, the SCTLR.FI bit must be changed only in IMPLEMENTATION DEFINED circumstances. The FI bit can be changed shortly after reset, before the MMU, MPU, or caches are enabled and while interrupts are disabled, using the following sequence:

    DSB
    ISB
    MCR  p15, 0, Rx, c1, c0, 0    ; change FI bit in the SCTLR
    DSB
    ISB

An implementation can define other sequences and circumstances that permit the SCTLR.FI bit to be changed.

Reducing interrupt latency can reduce overall performance. Examples of methods that might be used to reduce interrupt latency are:
• disabling Hit-Under-Miss functionality in a processor
• abandoning restartable external accesses.
These choices permit the processor to react to a pending interrupt faster than would otherwise be the case.

A low interrupt latency configuration permits interrupts and asynchronous aborts to be taken during a sequence of memory transactions generated by a load/store instruction. For details of what these sequences are and the consequences of taking interrupts and asynchronous aborts in this way see Single-copy atomicity on page A3-27.

ARM deprecates any software reliance on the behavior that an interrupt or asynchronous abort cannot occur in a sequence of memory transactions generated by a single load/store instruction to Normal memory.

Note
A particular case that has shown this reliance is load multiple instructions that load the stack pointer from memory. In an implementation that takes an interrupt during the LDM, the stack pointer can be corrupted.

B1.6.8 Wait For Event and Send Event

A multiprocessor operating system requires locking mechanisms to protect data structures from being accessed simultaneously by multiple processors. These mechanisms prevent the data structures becoming inconsistent or corrupted if different processors try to make conflicting changes. If a lock is busy, because a data structure is being used by one processor, it might not be practical for another processor to do anything except wait for the lock to be released.

For example, if a processor is handling an interrupt from a device it might need to add data received from the device to a queue. If another processor is removing data from the queue, it will have locked the memory area that holds the queue. The first processor cannot add the new data until the queue is in a consistent state and the lock has been released. It cannot return from the interrupt handler until the data has been added to the queue, so it must wait.
Typically, a spin-lock mechanism is provided for these circumstances:
• A processor requiring access to the protected data attempts to obtain the lock using single-copy atomic synchronization primitives such as the ARM Load-Exclusive and Store-Exclusive operations described in Synchronization and semaphores on page A3-12.
• If the processor obtains the lock it performs its memory operation and releases the lock.
• If the processor cannot obtain the lock, it reads the lock value repeatedly in a tight loop until the lock becomes available. At this point it again attempts to obtain the lock.

However, this spin-lock mechanism is not ideal for all situations:
• in a low-power system the tight read loop is undesirable because it uses energy to no effect
• in a multi-threaded processor the execution of spin-locks by waiting threads can significantly degrade overall performance.

Therefore, ARMv7 provides an alternative locking mechanism based on events. The Wait For Event lock mechanism permits a processor that has failed to obtain a lock to enter a low-power state. When the processor that currently holds the required lock releases the lock, it sends an event that causes any waiting processors to wake up and attempt to gain the lock again.

Note
Although a complex operating system can contain thousands of distinct locks, the event sent by this mechanism does not indicate which lock has been released. If the event relates to a different lock, or if another processor acquires the lock more quickly, the processor fails to acquire the lock and can re-enter the low-power state to wait for the next event.
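The spin-lock and Wait For Event scheme described above can be sketched in portable C11 as follows. This is an illustrative sketch under stated assumptions, not an ARM-provided implementation: wait_for_event() and send_event() are hypothetical placeholders for the WFE and SEV instructions (they do nothing here, so the code runs on any host), a C11 compare-and-exchange stands in for a Load-Exclusive/Store-Exclusive sequence, and the type and function names are invented for this example.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical placeholders for the WFE and SEV instructions. On an ARMv7
 * target they might wrap the real instructions; here they do nothing, so
 * the sketch can run on any host. */
static void wait_for_event(void) { /* WFE: may suspend until a wake-up event */ }
static void send_event(void)     { /* SEV: signal an event to all processors */ }

typedef struct {
    atomic_int locked;  /* 0 = free, 1 = held */
} evt_spinlock;

/* One attempt to claim the lock. The compare-and-exchange stands in for a
 * Load-Exclusive/Store-Exclusive sequence. */
static bool spin_trylock(evt_spinlock *l)
{
    int expected = 0;
    return atomic_compare_exchange_strong_explicit(
        &l->locked, &expected, 1, memory_order_acquire, memory_order_relaxed);
}

static void spin_lock(evt_spinlock *l)
{
    while (!spin_trylock(l)) {
        /* Lock busy: wait for an event instead of spinning on reads. Any
         * wake-up can be spurious or relate to a different lock, so the
         * lock value is always re-checked before trying again. */
        while (atomic_load_explicit(&l->locked, memory_order_relaxed) != 0)
            wait_for_event();
    }
}

static void spin_unlock(evt_spinlock *l)
{
    atomic_store_explicit(&l->locked, 0, memory_order_release);
    send_event();  /* wake any processors waiting for this or another lock */
}
```

Because a wake-up can be spurious or belong to another lock, spin_lock() never assumes the lock is free after wait_for_event() returns. On ARMv7 hardware, lock code commonly places a DSB between the releasing store and the SEV so that the release is observable before the event is signaled; that barrier is not modeled in this sketch.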
The Wait For Event system relies on hardware and software working together to achieve energy saving:
• the hardware provides the mechanism to enter the Wait For Event low-power state
• the operating system software is responsible for issuing:
  — a Wait For Event instruction when waiting for a spin-lock, to enter the low-power state
  — a Send Event instruction when releasing a spin-lock.

The mechanism depends on the interaction of:
• WFE wake-up events, see WFE wake-up events
• the Event Register, see The Event Register on page B1-46
• the Send Event instruction, see The Send Event instruction on page B1-46
• the Wait For Event instruction, see The Wait For Event instruction on page B1-46.

WFE wake-up events

The following events are WFE wake-up events:
• the execution of an SEV instruction on any processor in the multiprocessor system
• an IRQ interrupt, unless masked by the CPSR.I bit
• an FIQ interrupt, unless masked by the CPSR.F bit
• an asynchronous abort, unless masked by the CPSR.A bit
• a debug event, if invasive debug is enabled and the debug event is permitted.

For details of the masking bits in the CPSR see Format of the CPSR and SPSRs on page B1-16. This masking is an important consideration with this mechanism, because lock mechanisms can be required when interrupts are disabled.

The Event Register

The Event Register is a single-bit register for each processor. When set, the Event Register indicates that, since the register was last cleared, an event has occurred that might mean the processor does not need to suspend operation when it issues a WFE instruction.

The value of the Event Register at reset is UNKNOWN.

The Event Register is set by any WFE wake-up event or by the execution of an exception return instruction. For the definition of exception return instructions see Exception return on page B1-38.
The Event Register is cleared only by a Wait For Event instruction.

You cannot read or write the value of the Event Register directly.

The Send Event instruction

The Send Event instruction causes an event to be signaled to all processors in the multiprocessor system. The mechanism used to signal the event to the processors is IMPLEMENTATION DEFINED. The Send Event instruction sets the Event Register.

The Send Event instruction, SEV, is available to both unprivileged and privileged code, see SEV on page A8-316.

The Wait For Event instruction

The action of the Wait For Event instruction depends on the state of the Event Register:
• If the Event Register is set, the instruction clears the register and returns immediately. Normally, if this happens the processor makes another attempt to claim the lock.
• If the Event Register is clear, the processor can suspend execution and enter a low-power state. It can remain in that state until the processor detects a WFE wake-up event or a reset. When the processor detects a WFE wake-up event, or earlier if the implementation chooses, the WFE instruction completes.

The Wait For Event instruction, WFE, is available to both unprivileged and privileged code, see WFE on page A8-808.

Code using the Wait For Event mechanism must be tolerant of spurious wake-up events, including multiple wake-ups.

Pseudocode details of the Wait For Event lock mechanism

The ClearEventRegister() pseudocode procedure clears the Event Register of the current processor.

The EventRegistered() pseudocode function returns TRUE if the Event Register of the current processor is set and FALSE if it is clear:

    boolean EventRegistered()

The WaitForEvent() pseudocode procedure optionally suspends execution until a WFE wake-up event or reset occurs, or until some earlier time if the implementation chooses.
It is IMPLEMENTATION DEFINED whether restarting execution after the period of suspension causes a ClearEventRegister() to occur.

The SendEvent() pseudocode procedure sets the Event Register of every processor in the multiprocessor system.

B1.6.9 Wait For Interrupt

Previous versions of the ARM architecture have included a Wait For Interrupt concept, and Wait For Interrupt is a required feature of the architecture from ARMv6. In ARMv7, Wait For Interrupt is supported only through an instruction, WFI, that is provided in the ARM and Thumb instruction sets. For more information, see WFI on page A8-810.

Note
In ARMv7 the CP15 c7 encoding previously used for WFI is redefined as a NOP, see CP15 c7, No Operation (NOP) on page B3-138 and CP15 c7, Miscellaneous functions on page B4-72.

When a processor issues a WFI instruction it can suspend execution and enter a low-power state. It can remain in that state until the processor detects a reset or one of the following WFI wake-up events:
• an IRQ interrupt, regardless of the value of the CPSR.I bit
• an FIQ interrupt, regardless of the value of the CPSR.F bit
• an asynchronous abort, regardless of the value of the CPSR.A bit
• a debug event, when invasive debug is enabled and the debug event is permitted.

When the hardware detects a WFI wake-up event, or earlier if the implementation chooses, the WFI instruction completes.

WFI wake-up events cannot be masked by the mask bits in the CPSR.

Note
• Because debug entry is one of the WFI wake-up events, ARM strongly recommends that Wait For Interrupt is used as part of an idle loop rather than waiting for a single specific interrupt event to occur and then moving forward. This ensures that debug intervention while waiting does not significantly change the function of the program being debugged.
• In some previous implementations of Wait For Interrupt, the idle loop is followed by exit functions that must be executed before the interrupt is taken.
The operation of Wait For Interrupt remains consistent with this model, and therefore differs from the operation of Wait For Event.
• Some implementations of Wait For Interrupt drain down any pending memory activity before suspending execution. This increases the power saving, by increasing the area over which clocks can be stopped. This operation is not required by the ARM architecture, and code must not rely on Wait For Interrupt operating in this way.

Using WFI to indicate an idle state on bus interfaces

A common implementation practice is to complete any entry into power-down routines with a WFI instruction. Typically, the WFI instruction:
1. forces the suspension of execution, and of all associated bus activity
2. stops the processor executing instructions.

The control logic required to do this typically tracks the activity of the bus interfaces of the processor. This means it can signal to an external power controller that there is no ongoing bus activity.

The exact nature of this interface is IMPLEMENTATION DEFINED, but the use of Wait For Interrupt as the only architecturally-defined mechanism that completely suspends execution makes it very suitable as the preferred power-down entry mechanism for future implementations.

Pseudocode details of Wait For Interrupt

The WaitForInterrupt() pseudocode procedure optionally suspends execution until a WFI wake-up event or reset occurs, or until some earlier time if the implementation chooses.

B1.6.10 Reset

On an ARM processor, when the Reset input is asserted the processor immediately stops execution of the current instruction. When Reset is de-asserted, the actions described in Exception entry on page B1-34 are performed for the Reset exception. The processor then starts executing code in Supervisor mode with interrupts disabled.
Execution starts from the normal or high reset vector address, 0x00000000 or 0xFFFF0000, as determined by the reset value of the SCTLR.V bit. This reset value can be determined by an IMPLEMENTATION DEFINED configuration input signal.

Note
• The ARM architecture does not distinguish between multiple levels of reset. A system can provide multiple distinct levels of reset that reset different parts of the system. These all correspond to this single reset exception.
• The reset value of the SCTLR.EE bit can be defined by a configuration input signal. If this is done, that value also applies to the CPSR.E bit on reset. For more information see:
  — c1, System Control Register (SCTLR) on page B3-96 for a VMSA implementation
  — c1, System Control Register (SCTLR) on page B4-45 for a PMSA implementation.

The following pseudocode describes how this exception is taken:

    // TakeReset()
    // ===========

    TakeReset()
        // Enter Supervisor mode and (if relevant) Secure state, and reset CP15. This affects
        // the banked versions and values of various registers accessed later in the code.
        // Also reset other system components.
        CPSR.M = ‘10011’;              // Supervisor mode
        if HaveSecurityExt() then SCR.NS = ‘0’;
        ResetCP15Registers();
        ResetDebugRegisters();
        if HaveAdvSIMDorVFP() then FPEXC.EN = ‘0’; SUBARCHITECTURE_DEFINED further resetting;
        if HaveThumbEE() then TEECR.XED = ‘0’;
        if HaveJazelle() then JMCR.JE = ‘0’; SUBARCHITECTURE_DEFINED further resetting;

        // Further CPSR changes: all interrupts disabled, IT state reset, instruction set
        // and endianness according to the SCTLR values produced by the above call to
        // ResetCP15Registers().
        CPSR.I = ‘1’;  CPSR.F = ‘1’;  CPSR.A = ‘1’;
        CPSR.IT = ‘00000000’;
        CPSR.J = ‘0’;
        CPSR.T = SCTLR.TE;             // TE=0: ARM, TE=1: Thumb
        CPSR.E = SCTLR.EE;             // EE=0: little-endian, EE=1: big-endian

        // All registers, bits and fields not reset by the above pseudocode or by the
        // BranchTo() call below are UNKNOWN bitstrings after reset. In particular, the
        // return information registers R14_svc and SPSR_svc have UNKNOWN values, so that
        // it is impossible to return from a reset in an architecturally defined way.

        // Branch to Reset vector.
        BranchTo(ExcVectorBase() + 0);

The ARM architecture does not define any way of returning from a reset.

B1.6.11 Undefined Instruction exception

An Undefined Instruction exception might be caused by:
• a coprocessor instruction that is not accessible because of the settings in one or both of:
  — the Coprocessor Access Control Register, see c1, Coprocessor Access Control Register (CPACR) on page B3-104 for a VMSA implementation, or c1, Coprocessor Access Control Register (CPACR) on page B4-51 for a PMSA implementation
  — in an implementation that includes the Security Extensions, the Non-Secure Access Control Register, see c1, Non-Secure Access Control Register (NSACR) on page B3-110
• a coprocessor instruction that is not implemented
• an instruction that is UNDEFINED
• an attempt to execute an instruction in an unsupported instruction set state, see Exception return to an unsupported instruction set state on page B1-40
• division by zero in an SDIV or UDIV instruction in the ARMv7-R profile when the SCTLR.DZ bit is set to 1, see c1, System Control Register (SCTLR) on page B4-45.

The Undefined Instruction exception can be used for:
• software emulation of a coprocessor in a system that does not have the physical coprocessor hardware
• lazy context switching of coprocessor registers
• general-purpose instruction set extension by software emulation
• signaling an illegal instruction execution
• division by zero errors.

In some coprocessor designs, an internal exceptional condition caused by one coprocessor instruction is signaled asynchronously by refusing to respond to a later coprocessor instruction that belongs to the same coprocessor. In these circumstances, the Undefined Instruction handler must take whatever action is needed to clear the exceptional condition, and then return to the second coprocessor instruction.

Note
The only mechanism to determine the cause of an Undefined Instruction exception is analysis of the instruction indicated by the return link in the LR on exception entry. Therefore it is important that a coprocessor only reports exceptional conditions by generating Undefined Instruction exceptions on its own coprocessor instructions.

The following pseudocode describes how this exception is taken:

    // TakeUndefInstrException()
    // =========================

    TakeUndefInstrException()
        // Determine return information. SPSR is to be the current CPSR, and LR is to be the
        // current PC minus 2 for Thumb or 4 for ARM, to change the PC offsets of 4 or 8
        // respectively from the address of the current instruction into the required return
        // address offsets of 2 or 4 respectively.
        new_lr_value = if CPSR.T == ‘1’ then PC-2 else PC-4;
        new_spsr_value = CPSR;

        // Enter Undefined (‘11011’) mode, and ensure Secure state if initially in Monitor
        // (‘10110’) mode. This affects the banked versions of various registers accessed later
        // in the code.
        if CPSR.M == ‘10110’ then SCR.NS = ‘0’;
        CPSR.M = ‘11011’;

        // Write return information to registers, and make further CPSR changes: IRQs disabled,
        // IT state reset, instruction set and endianness to SCTLR-configured values.
        SPSR[] = new_spsr_value;
        R[14] = new_lr_value;
        CPSR.I = ‘1’;
        CPSR.IT = ‘00000000’;
        CPSR.J = ‘0’;
        CPSR.T = SCTLR.TE;             // TE=0: ARM, TE=1: Thumb
        CPSR.E = SCTLR.EE;             // EE=0: little-endian, EE=1: big-endian

        // Branch to Undefined Instruction vector.
        BranchTo(ExcVectorBase() + 4);

The preferred exception return from an Undefined Instruction exception is a return to the instruction that generated the exception. Use the LR and SPSR values generated by the exception entry to produce this return as follows:
• If SPSR.J and SPSR.T are both 0, indicating that the exception occurred in ARM state, use an exception return instruction with a subtraction of 4.
• If SPSR.J and SPSR.T are not both 0, indicating that the exception occurred in Thumb state or ThumbEE state, use an exception return instruction with a subtraction of 2.

For more information, see Exception return on page B1-38.

Note
• Undefined Instruction exceptions cannot occur in Jazelle state.
• If handling the Undefined Instruction exception requires instruction emulation, followed by return to the next instruction after the instruction that caused the exception, the instruction emulator must use the instruction length to calculate the correct return address, and to calculate the updated values of the IT bits if necessary.

Conditional execution of undefined instructions

The conditional execution rules described in Conditional execution on page A8-8 apply to all instructions. This includes UNDEFINED instructions and other instructions that would cause entry to the Undefined Instruction exception.
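The condition check these rules refer to can be sketched in C, following the shape of the ConditionPassed() function used in the ARM pseudocode: bits[3:1] of the 4-bit condition code select a test on the N, Z, C and V flags, and bit[0] inverts the result, except for the value 0b1111. The sketch assumes an Undefined Instruction handler entered from ARM state, where the condition field is bits[31:28] of the instruction word and the flags come from the SPSR; Thumb state, where the condition comes from the IT bits, is not covered. The function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Returns true if the 4-bit condition code `cond` passes for the N, Z, C
 * and V flags held in bits[31:28] of `psr` (for example, the SPSR on
 * handler entry). Mirrors the shape of the ConditionPassed() pseudocode. */
static bool condition_passed(uint32_t cond, uint32_t psr)
{
    bool n = (psr >> 31) & 1u;
    bool z = (psr >> 30) & 1u;
    bool c = (psr >> 29) & 1u;
    bool v = (psr >> 28) & 1u;
    bool result;

    switch ((cond >> 1) & 7u) {             /* cond[3:1] selects the test */
    case 0:  result = z;               break;  /* EQ / NE */
    case 1:  result = c;               break;  /* CS / CC */
    case 2:  result = n;               break;  /* MI / PL */
    case 3:  result = v;               break;  /* VS / VC */
    case 4:  result = c && !z;         break;  /* HI / LS */
    case 5:  result = (n == v);        break;  /* GE / LT */
    case 6:  result = (n == v) && !z;  break;  /* GT / LE */
    default: result = true;            break;  /* AL, and 0b1111 */
    }
    /* cond[0] inverts the test, except for 0b1111. */
    if ((cond & 1u) && cond != 0xFu)
        result = !result;
    return result;
}

/* ARM state only: the condition field is bits[31:28] of the instruction. */
static bool arm_insn_condition_passed(uint32_t insn, uint32_t spsr)
{
    return condition_passed(insn >> 28, spsr);
}
```

For an exception taken in ARM state, a handler could read the instruction word at LR minus 4, pass it with the SPSR to arm_insn_condition_passed(), and treat a failed check as a NOP by returning to the following instruction without a subtraction.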
If such an instruction fails its condition check, the behavior depends on the architecture profile and the potential cause of entry to the Undefined Instruction exception, as follows:
• In the ARMv7-A profile:
  — If the potential cause is the execution of the instruction itself and depends on data values the instruction reads, the instruction executes as a NOP and does not cause an Undefined Instruction exception.
  — If the potential cause is the execution of an earlier coprocessor instruction, or the execution of the instruction itself but does not depend on data values the instruction reads, it is IMPLEMENTATION DEFINED whether the instruction executes as a NOP or causes an Undefined Instruction exception. An implementation must handle all such cases in the same way.
• In the ARMv7-R profile, the instruction executes as a NOP and does not cause an Undefined Instruction exception.

Note
Before ARMv7, all implementations executed any instruction that failed its condition check as a NOP, even if it would otherwise have caused an Undefined Instruction exception. Undefined Instruction handlers written for these implementations might assume without checking that the undefined instruction passed its condition check. Such handlers are likely to need rewriting to check that the condition passed before they function correctly on all ARMv7-A implementations.

B1.6.12 Supervisor Call (SVC) exception

The Supervisor Call instruction SVC enters Supervisor mode and requests a supervisor function. Typically, the SVC instruction is used to request an operating system function. For more information, see SVC (previously SWI) on page A8-430.

Note
In previous versions of the ARM architecture, the SVC instruction was called SWI, Software Interrupt.
The following pseudocode describes how this exception is taken:

    // TakeSVCException()
    // ==================

    TakeSVCException()
        // Determine return information. SPSR is to be the current CPSR, after changing the IT[]
        // bits to give them the correct values for the following instruction, and LR is to be
        // the current PC minus 2 for Thumb or 4 for ARM, to change the PC offsets of 4 or 8
        // respectively from the address of the current instruction into the required address of
        // the next instruction (the SVC instruction having size 2 or 4 bytes respectively).
        ITAdvance();
        new_lr_value = if CPSR.T == ‘1’ then PC-2 else PC-4;
        new_spsr_value = CPSR;

        // Enter Supervisor (‘10011’) mode, and ensure Secure state if initially in Monitor
        // (‘10110’) mode. This affects the banked versions of various registers accessed later
        // in the code.
        if CPSR.M == ‘10110’ then SCR.NS = ‘0’;
        CPSR.M = ‘10011’;

        // Write return information to registers, and make further CPSR changes: IRQs disabled,
        // IT state reset, instruction set and endianness to SCTLR-configured values.
        SPSR[] = new_spsr_value;
        R[14] = new_lr_value;
        CPSR.I = ‘1’;
        CPSR.IT = ‘00000000’;
        CPSR.J = ‘0’;
        CPSR.T = SCTLR.TE;             // TE=0: ARM, TE=1: Thumb
        CPSR.E = SCTLR.EE;             // EE=0: little-endian, EE=1: big-endian

        // Branch to SVC vector.
        BranchTo(ExcVectorBase() + 8);

The preferred exception return from an SVC exception is a return to the next instruction after the SVC instruction. Use the LR and SPSR values generated by the exception entry to produce this return by using an exception return instruction without a subtraction. For more information, see Exception return on page B1-38.

B1.6.13 Secure Monitor Call (SMC) exception

The Secure Monitor Call instruction SMC enters Monitor mode and requests a Monitor function. For more information, see SMC (previously SMI) on page B6-18.
Note
In previous versions of the ARM architecture, the SMC instruction was called SMI, Software Monitor Interrupt.

The following pseudocode describes how this exception is taken:

    // TakeSMCException()
    // ==================

    TakeSMCException()
        // Determine return information. SPSR is to be the current CPSR, after changing the IT[]
        // bits to give them the correct values for the following instruction, and LR is to be
        // the current PC minus 0 for Thumb or 4 for ARM, to change the PC offsets of 4 or 8
        // respectively from the address of the current instruction into the required address of
        // the next instruction (with the SMC instruction always being 4 bytes in length).
        ITAdvance();
        new_lr_value = if CPSR.T == ‘1’ then PC else PC-4;
        new_spsr_value = CPSR;

        // Enter Monitor (‘10110’) mode, and ensure Secure state if initially in Monitor mode.
        // This affects the banked versions of various registers accessed later in the code.
        if CPSR.M == ‘10110’ then SCR.NS = ‘0’;
        CPSR.M = ‘10110’;

        // Write return information to registers, and make further CPSR changes: interrupts
        // disabled, IT state reset, instruction set and endianness to SCTLR-configured values.
        SPSR[] = new_spsr_value;
        R[14] = new_lr_value;
        CPSR.I = ‘1’;  CPSR.F = ‘1’;  CPSR.A = ‘1’;
        CPSR.IT = ‘00000000’;
        CPSR.J = ‘0’;
        CPSR.T = SCTLR.TE;             // TE=0: ARM, TE=1: Thumb
        CPSR.E = SCTLR.EE;             // EE=0: little-endian, EE=1: big-endian

        // Branch to SMC vector.
        BranchTo(MVBAR + 8);

The preferred exception return from an SMC exception is a return to the next instruction after the SMC instruction. Use the LR and SPSR values generated by the exception entry to produce this return by using an exception return instruction without a subtraction. For more information, see Exception return on page B1-38.
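The return-link arithmetic in the TakeUndefInstrException(), TakeSVCException() and TakeSMCException() pseudocode can be checked with a small C model. It assumes, as the pseudocode comments state, that PC reads as the address of the current instruction plus 8 in ARM state and plus 4 in Thumb state. The function names are hypothetical and the model ignores the IT-state and mode changes.

```c
#include <stdbool.h>
#include <stdint.h>

/* PC as read by the exception-entry pseudocode: current instruction
 * address plus 8 (ARM state) or plus 4 (Thumb state). */
static uint32_t pc_read(uint32_t insn_addr, bool thumb)
{
    return insn_addr + (thumb ? 4u : 8u);
}

/* Undefined Instruction and SVC: new_lr_value = PC-2 (Thumb) or PC-4 (ARM). */
static uint32_t lr_undef_or_svc(uint32_t insn_addr, bool thumb)
{
    return pc_read(insn_addr, thumb) - (thumb ? 2u : 4u);
}

/* SMC: new_lr_value = PC (Thumb) or PC-4 (ARM); SMC is always 4 bytes. */
static uint32_t lr_smc(uint32_t insn_addr, bool thumb)
{
    return pc_read(insn_addr, thumb) - (thumb ? 0u : 4u);
}

/* Preferred return address for an Undefined Instruction exception:
 * subtract 4 (ARM state) or 2 (Thumb or ThumbEE state) from the LR. */
static uint32_t undef_return_addr(uint32_t lr, bool thumb)
{
    return lr - (thumb ? 2u : 4u);
}
```

For a 32-bit Thumb instruction the Undefined Instruction LR value is the address of its second halfword, which is why an emulator that returns to the next instruction must use the instruction length rather than the LR alone. For SVC and SMC the LR already holds the address of the next instruction, so no subtraction is needed.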
Note
You can return to the SMC instruction itself by using an exception return with a subtraction of 4, without any adjustment to the SPSR.IT[7:0] bits. The result is that the return occurs, then interrupts or external aborts might occur and be handled, then the SMC instruction is re-executed and another SMC exception occurs. This relies on:
• the SMC instruction being used correctly, either outside an IT block or as the last instruction in an IT block, so that the SPSR.IT[7:0] bits indicate unconditional execution
• the SMC handler not changing the result of the original conditional execution test for the SMC instruction.

B1.6.14 Prefetch Abort exception

A Prefetch Abort exception can be generated by:
• A synchronous memory abort on an instruction fetch.
  Note
  Asynchronous aborts on instruction fetches are reported using the Data Abort exception, see Data Abort exception on page B1-55.
  Prefetch Abort exception entry is synchronous to the instruction whose instruction fetch aborted.
  If an implementation prefetches instructions, it must handle a synchronous abort on an instruction prefetch by:
  — generating a Prefetch Abort exception if and when the instruction is about to execute
  — ignoring the abort if the instruction does not reach the point of being about to execute, for example, if a branch misprediction or exception entry occurs before the instruction is reached.
  For more information about memory aborts see:
  — VMSA memory aborts on page B3-40
  — PMSA memory aborts on page B4-13.
• A Breakpoint, Vector Catch or BKPT Instruction debug event, see Debug exception on Breakpoint, BKPT Instruction or Vector Catch debug events on page C4-2.

The following pseudocode describes how this exception is taken:

    // TakePrefetchAbortException()
    // ============================

    TakePrefetchAbortException()
        // Determine return information.
        // SPSR is to be the current CPSR, and LR is to be the current PC minus 0 for Thumb
        // or 4 for ARM, to change the PC offsets of 4 or 8 respectively from the address of
        // the current instruction into the required address of the current instruction plus 4.
        new_lr_value = if CPSR.T == ‘1’ then PC else PC-4;
        new_spsr_value = CPSR;

        // Determine whether this is an external abort to be trapped to Monitor mode.
        trap_to_monitor = HaveSecurityExt() && SCR.EA == ‘1’ && IsExternalAbort();

        // Enter Abort (‘10111’) or Monitor (‘10110’) mode, and ensure Secure state if
        // initially in Monitor mode. This affects the banked versions of various registers
        // accessed later in the code.
        if CPSR.M == ‘10110’ then SCR.NS = ‘0’;
        CPSR.M = if trap_to_monitor then ‘10110’ else ‘10111’;

        // Write return information to registers, and make further CPSR changes: IRQs disabled,
        // other interrupts disabled if appropriate, IT state reset, instruction set and
        // endianness to SCTLR-configured values.
        SPSR[] = new_spsr_value;
        R[14] = new_lr_value;
        CPSR.I = ‘1’;
        if trap_to_monitor then
            CPSR.F = ‘1’;  CPSR.A = ‘1’;
        else
            if !HaveSecurityExt() || SCR.NS == ‘0’ || SCR.AW == ‘1’ then CPSR.A = ‘1’;
        CPSR.IT = ‘00000000’;
        CPSR.J = ‘0’;
        CPSR.T = SCTLR.TE;             // TE=0: ARM, TE=1: Thumb
        CPSR.E = SCTLR.EE;             // EE=0: little-endian, EE=1: big-endian

        // Branch to correct Prefetch Abort vector.
        if trap_to_monitor then
            BranchTo(MVBAR + 12);
        else
            BranchTo(ExcVectorBase() + 12);

The preferred exception return from a Prefetch Abort exception is a return to the aborted instruction. Use the LR and SPSR values generated by the exception entry to produce this return by using an exception return instruction with a subtraction of 4. For more information, see Exception return on page B1-38.
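The mode, mask-bit and vector-base decisions in TakePrefetchAbortException(), which TakeDataAbortException() repeats with a different vector offset, can be expressed as a pure C function so that the effect of SCR.EA, SCR.AW and SCR.NS is easy to test. This is a sketch under stated assumptions: the struct and function names are hypothetical, the mode constants are the 5-bit CPSR.M encodings used in the pseudocode, and the is_external argument stands in for IsExternalAbort().

```c
#include <stdbool.h>
#include <stdint.h>

#define MODE_USR 0x10u  /* '10000' */
#define MODE_MON 0x16u  /* '10110' */
#define MODE_ABT 0x17u  /* '10111' */

typedef struct {
    bool     have_security_ext;   /* HaveSecurityExt() */
    bool     scr_ea, scr_aw, scr_ns;
    uint32_t cpsr_mode;           /* CPSR.M when the abort is taken */
} abort_state;

typedef struct {
    uint32_t mode;        /* new CPSR.M */
    bool     set_f;       /* CPSR.F forced to 1 */
    bool     set_a;       /* CPSR.A forced to 1 */
    bool     use_mvbar;   /* vector base is MVBAR rather than ExcVectorBase() */
    uint32_t vector_off;  /* 12 for Prefetch Abort, 16 for Data Abort */
} abort_entry;

static abort_entry take_abort(abort_state s, bool is_external, bool is_data_abort)
{
    abort_entry e;
    bool trap_to_monitor = s.have_security_ext && s.scr_ea && is_external;

    /* Entry from Monitor mode forces SCR.NS to 0 before the CPSR.A decision. */
    bool ns = (s.cpsr_mode == MODE_MON) ? false : s.scr_ns;

    e.mode       = trap_to_monitor ? MODE_MON : MODE_ABT;
    e.set_f      = trap_to_monitor;
    e.set_a      = trap_to_monitor || !s.have_security_ext || !ns || s.scr_aw;
    e.use_mvbar  = trap_to_monitor;
    e.vector_off = is_data_abort ? 16u : 12u;
    return e;
}
```

The only case in which CPSR.A is left unchanged is an abort handled in Abort mode from Non-secure state with SCR.AW == 0, which matches the rows of Table B1-9 where asynchronous aborts are maskable only in Secure state.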
B1.6.15 Data Abort exception

A Data Abort exception can be generated by:
• A synchronous abort on a data read or write memory access. Exception entry is synchronous to the instruction that generated the memory access.
• An asynchronous abort. The memory access that caused the abort can be any of:
  — a data read or write access
  — an instruction fetch or prefetch
  — in a VMSA memory system, a translation table access.
  Exception entry occurs asynchronously. It is similar to an interrupt, but uses either Abort mode or Monitor mode, and the associated banked registers. Setting the CPSR.A bit prevents asynchronous aborts from being taken.
  Note
  There are no asynchronous internal aborts in ARMv7 and earlier architecture versions, so asynchronous aborts are always asynchronous external aborts.
• A Watchpoint debug event, see Debug exception on Watchpoint debug event on page C4-3.
  Note
  A Data Abort exception generated by a Watchpoint debug event can be either asynchronous or synchronous, but is not an abort. This means that if it is asynchronous, it is an asynchronous Data Abort exception but not an asynchronous abort.

For more information about memory aborts see:
• VMSA memory aborts on page B3-40
• PMSA memory aborts on page B4-13.

The following pseudocode describes how this exception is taken:

    // TakeDataAbortException()
    // ========================

    TakeDataAbortException()
        // Determine return information. SPSR is to be the current CPSR, and LR is to be the
        // current PC plus 4 for Thumb or 0 for ARM, to change the PC offsets of 4 or 8
        // respectively from the address of the current instruction into the required address
        // of the current instruction plus 8. For an asynchronous abort, the PC and CPSR are
        // considered to have already moved on to their values for the instruction following
        // the instruction boundary at which the exception occurred.
    new_lr_value = if CPSR.T == ‘1’ then PC+4 else PC;
    new_spsr_value = CPSR;

    // Determine whether this is an external abort to be trapped to Monitor mode.
    trap_to_monitor = HaveSecurityExt() && SCR.EA == ‘1’ && IsExternalAbort();

    // Enter Abort (‘10111’) or Monitor (‘10110’) mode, and ensure Secure state if
    // initially in Monitor mode. This affects the banked versions of various registers
    // accessed later in the code.
    if CPSR.M == ‘10110’ then SCR.NS = ‘0’;
    CPSR.M = if trap_to_monitor then ‘10110’ else ‘10111’;

    // Write return information to registers, and make further CPSR changes: IRQs disabled,
    // other interrupts disabled if appropriate, IT state reset, instruction set and
    // endianness to SCTLR-configured values.
    SPSR[] = new_spsr_value;
    R[14] = new_lr_value;
    CPSR.I = ‘1’;
    if trap_to_monitor then
        CPSR.F = ‘1’;
        CPSR.A = ‘1’;
    else
        if !HaveSecurityExt() || SCR.NS == ‘0’ || SCR.AW == ‘1’ then CPSR.A = ‘1’;
    CPSR.IT = ‘00000000’;
    CPSR.J = ‘0’;
    CPSR.T = SCTLR.TE;    // TE=0: ARM, TE=1: Thumb
    CPSR.E = SCTLR.EE;    // EE=0: little-endian, EE=1: big-endian

    // Branch to correct Data Abort vector.
    if trap_to_monitor then
        BranchTo(MVBAR + 16);
    else
        BranchTo(ExcVectorBase() + 16);

The preferred exception return from a Data Abort exception is a return to the instruction that generated the aborting memory access, or to the instruction following the instruction boundary at which an asynchronous Data Abort exception occurred. Use the LR and SPSR values generated by the exception entry to produce this return by using an exception return instruction with a subtraction of 8. For more information, see Exception return on page B1-38.

Effects of data-aborted instructions

Instructions that access data memory can modify memory by storing one or more values.
If a Data Abort exception is generated by executing such an instruction, the value of each memory location that the instruction stores to is:
• unchanged if the memory system does not permit write access to the memory location
• UNKNOWN otherwise.

Instructions that access data memory can modify registers in the following ways:
• By loading values into one or more of the general-purpose registers. The registers loaded can include the PC.
• By specifying base register write-back, in which the base register used in the address calculation has a modified value written to it. All instructions that support base register write-back have UNPREDICTABLE results if this is specified with the PC as the base register. Only general-purpose registers other than the PC can be modified reliably in this way.
• By loading values into coprocessor registers.
• By modifying the CPSR.

If a synchronous Data Abort exception is generated by executing such an instruction, the following rules determine the values left in these registers:
1. On entry to the Data Abort handler:
   • the PC value is the Data Abort vector address, see Exception vectors and the exception base address on page B1-30
   • the LR_abt value is determined from the address of the aborted instruction.
   Neither value is affected in any way by the results of any load specified by the instruction.
2. The base register is restored to its original value if either:
   • the aborted instruction is a load that includes the base register in the list to be loaded
   • the base register is being written back.
3. If the instruction only loads one general-purpose register, the value in that register is unchanged.
4. If the instruction loads more than one general-purpose register, UNKNOWN values are left in destination registers other than the PC and the base register of the instruction.
5.
If the instruction loads coprocessor registers, UNKNOWN values are left in the destination coprocessor registers.
6. CPSR bits that are not defined as updated on exception entry retain their current value.
7. If a synchronous Data Abort exception is generated by execution of a STREX, STREXB, STREXH, or STREXD instruction:
   • memory is not updated
   • <Rd> is not updated
   • it is UNPREDICTABLE whether the monitor changes from the Exclusive state to the Open state.

The ARM abort model

The abort model used by an ARM processor implementation is described as a Base Restored Abort Model. This means that if a synchronous Data Abort exception is generated by executing an instruction that specifies base register write-back, the value in the base register is unchanged.

Note
In versions of the ARM architecture before ARMv6, it is IMPLEMENTATION DEFINED whether the abort model used is the Base Restored Abort Model or the Base Updated Abort Model. For more information, see The ARM abort model on page AppxH-20.

The abort model applies uniformly across all instructions.

B1.6.16 IRQ exception

The IRQ exception is generated by IMPLEMENTATION DEFINED means. Typically this is by asserting an IRQ interrupt request input to the processor.

Whether and how an IRQ exception is taken depends on the CPSR.I and SCTLR.FI bits:
• If CPSR.I == 1, IRQ exceptions are disabled and are not taken.
• If CPSR.I == 0 and SCTLR.FI == 0, IRQ exceptions can be taken. In this case IRQ exception entry is precise to an instruction boundary.
• If CPSR.I == 0 and SCTLR.FI == 1, IRQ exceptions can be taken. In this case IRQ exception entry is precise to an instruction boundary, except that some of the effects of the instruction that follows that boundary might have occurred.
  These effects are restricted to those that can be repeated idempotently and without breaking the rules in Single-copy atomicity on page A3-27. Examples of such effects are:
  — changing the value of a register that the instruction writes but does not read
  — performing an access to Normal memory.

Note
This relaxation of the normal definition of a precise asynchronous exception permits interrupts to occur during the execution of instructions that change register or memory values, while only requiring the implementation to restore those register values that are needed to correctly re-execute the instruction after the preferred exception return. LDM and STM are examples of such instructions.

The following pseudocode describes how this exception is taken:

// TakeIRQException()
// ==================

TakeIRQException()

    // Determine return information. SPSR is to be the current CPSR, and LR is to be the
    // current PC minus 0 for Thumb or 4 for ARM, to change the PC offsets of 4 or 8
    // respectively from the address of the current instruction into the required address
    // of the instruction boundary at which the interrupt occurred plus 4. For this
    // purpose, the PC and CPSR are considered to have already moved on to their values
    // for the instruction following that boundary.
    new_lr_value = if CPSR.T == ‘1’ then PC else PC-4;
    new_spsr_value = CPSR;

    // Determine whether IRQs are trapped to Monitor mode.
    trap_to_monitor = HaveSecurityExt() && SCR.IRQ == ‘1’;

    // Enter IRQ (‘10010’) or Monitor (‘10110’) mode, and ensure Secure state if initially
    // in Monitor mode. This affects the banked versions of various registers accessed
    // later in the code.
    if CPSR.M == ‘10110’ then SCR.NS = ‘0’;
    CPSR.M = if trap_to_monitor then ‘10110’ else ‘10010’;

    // Write return information to registers, and make further CPSR changes: IRQs disabled,
    // other interrupts disabled if appropriate, IT state reset, instruction set and
    // endianness to SCTLR-configured values.
    SPSR[] = new_spsr_value;
    R[14] = new_lr_value;
    CPSR.I = ‘1’;
    if trap_to_monitor then
        CPSR.F = ‘1’;
        CPSR.A = ‘1’;
    else
        if !HaveSecurityExt() || SCR.NS == ‘0’ || SCR.AW == ‘1’ then CPSR.A = ‘1’;
    CPSR.IT = ‘00000000’;
    CPSR.J = ‘0’;
    CPSR.T = SCTLR.TE;    // TE=0: ARM, TE=1: Thumb
    CPSR.E = SCTLR.EE;    // EE=0: little-endian, EE=1: big-endian

    // Branch to correct IRQ vector.
    if trap_to_monitor then
        BranchTo(MVBAR + 24);
    elsif SCTLR.VE == ‘1’ then
        IMPLEMENTATION_DEFINED branch to an IRQ vector;
    else
        BranchTo(ExcVectorBase() + 24);

The preferred exception return from an IRQ interrupt is a return to the instruction following the instruction boundary at which the interrupt occurred. Use the LR and SPSR values generated by the exception entry to produce this return by using an exception return instruction with a subtraction of 4. For more information, see Exception return on page B1-38.

B1.6.17 FIQ exception

The FIQ exception is generated by IMPLEMENTATION DEFINED means. Typically this is by asserting an FIQ interrupt request input to the processor.

Whether and how an FIQ exception is taken depends on the CPSR.F and SCTLR.FI bits:
• If CPSR.F == 1, FIQ exceptions are disabled and are not taken.
• If CPSR.F == 0 and SCTLR.FI == 0, FIQ exceptions can be taken. In this case FIQ exception entry is precise to an instruction boundary.
• If CPSR.F == 0 and SCTLR.FI == 1, FIQ exceptions can be taken. In this case FIQ exception entry is precise to an instruction boundary, except that some of the effects of the instruction that follows that boundary might have occurred. These effects are restricted to those that can be repeated idempotently and without breaking the rules in Single-copy atomicity on page A3-27. Examples of such effects are:
  — changing the value of a register that the instruction writes but does not read
  — performing an access to Normal memory.
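The branch step at the end of TakeIRQException() can be modelled as a C function that picks the vector address. This is a sketch under stated assumptions: the offsets are the ones used by the pseudocode in this section; ExcVectorBase() is assumed to return 0xFFFF0000 when SCTLR.V is 1, otherwise VBAR when the Security Extensions are implemented, otherwise 0 (see Exception vectors and the exception base address on page B1-30); and struct vec_state, vector_address(), and the other names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Vector offsets used by the pseudocode in this section. */
enum exc_offset {
    VEC_PREFETCH_ABORT = 12,
    VEC_DATA_ABORT     = 16,
    VEC_IRQ            = 24,
    VEC_FIQ            = 28
};

/* Hypothetical snapshot of the state the branch step reads. */
struct vec_state {
    bool     have_security_ext;
    bool     sctlr_v;    /* SCTLR.V == 1 selects high vectors */
    bool     sctlr_ve;   /* SCTLR.VE == 1: IMPLEMENTATION DEFINED IRQ/FIQ vectors */
    uint32_t vbar;
    uint32_t mvbar;
};

/* Assumed model of ExcVectorBase(), as described in the lead-in. */
static uint32_t exc_vector_base(const struct vec_state *s)
{
    if (s->sctlr_v)
        return 0xFFFF0000u;
    if (s->have_security_ext)
        return s->vbar;
    return 0x00000000u;
}

/* Writes the vector address to *addr and returns true, or returns false
 * (leaving *addr untouched) when SCTLR.VE routes an untrapped IRQ or FIQ
 * to an IMPLEMENTATION DEFINED vector. */
static bool vector_address(const struct vec_state *s, enum exc_offset off,
                           bool trap_to_monitor, uint32_t *addr)
{
    if (trap_to_monitor) {
        *addr = s->mvbar + (uint32_t)off;
        return true;
    }
    if (s->sctlr_ve && (off == VEC_IRQ || off == VEC_FIQ))
        return false;
    *addr = exc_vector_base(s) + (uint32_t)off;
    return true;
}
```

Returning false for the SCTLR.VE case keeps the IMPLEMENTATION DEFINED vector outside the model instead of inventing an address for it. With high vectors selected and no trap to Monitor mode, the FIQ entry evaluates to 0xFFFF001C.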
Note
This relaxation of the normal definition of a precise asynchronous exception permits interrupts to occur during the execution of instructions that change register or memory values, while only requiring the implementation to restore those register values that are needed to correctly re-execute the instruction after the preferred exception return. LDM and STM are examples of such instructions.

The FIQ vector is the last vector in the vector table. This means the FIQ exception handler can be placed directly at the FIQ vector address, see Exception vectors and the exception base address on page B1-30. For example, if High vectors are enabled and VE == 0, the FIQ exception handler software can be placed at 0xFFFF001C. This avoids a branch instruction from the vector.

The following pseudocode describes how this exception is taken:

// TakeFIQException()
// ==================

TakeFIQException()

    // Determine return information. SPSR is to be the current CPSR, and LR is to be the
    // current PC minus 0 for Thumb or 4 for ARM, to change the PC offsets of 4 or 8
    // respectively from the address of the current instruction into the required address
    // of the instruction boundary at which the interrupt occurred plus 4. For this
    // purpose, the PC and CPSR are considered to have already moved on to their values
    // for the instruction following that boundary.
    new_lr_value = if CPSR.T == ‘1’ then PC else PC-4;
    new_spsr_value = CPSR;

    // Determine whether FIQs are trapped to Monitor mode.
    trap_to_monitor = HaveSecurityExt() && SCR.FIQ == ‘1’;

    // Enter FIQ (‘10001’) or Monitor (‘10110’) mode, and ensure Secure state if initially
    // in Monitor mode. This affects the banked versions of various registers accessed
    // later in the code.
    if CPSR.M == ‘10110’ then SCR.NS = ‘0’;
    CPSR.M = if trap_to_monitor then ‘10110’ else ‘10001’;

    // Write return information to registers, and make further CPSR changes: IRQs disabled,
    // other interrupts disabled if appropriate, IT state reset, instruction set and
    // endianness to SCTLR-configured values.
    SPSR[] = new_spsr_value;
    R[14] = new_lr_value;
    CPSR.I = ‘1’;
    if trap_to_monitor then
        CPSR.F = ‘1’;
        CPSR.A = ‘1’;
    else
        if !HaveSecurityExt() || SCR.NS == ‘0’ || SCR.FW == ‘1’ then CPSR.F = ‘1’;
        if !HaveSecurityExt() || SCR.NS == ‘0’ || SCR.AW == ‘1’ then CPSR.A = ‘1’;
    CPSR.IT = ‘00000000’;
    CPSR.J = ‘0’;
    CPSR.T = SCTLR.TE;    // TE=0: ARM, TE=1: Thumb
    CPSR.E = SCTLR.EE;    // EE=0: little-endian, EE=1: big-endian

    // Branch to correct FIQ vector.
    if trap_to_monitor then
        BranchTo(MVBAR + 28);
    elsif SCTLR.VE == ‘1’ then
        IMPLEMENTATION_DEFINED branch to an FIQ vector;
    else
        BranchTo(ExcVectorBase() + 28);

The preferred exception return from an FIQ interrupt is a return to the instruction following the instruction boundary at which the interrupt occurred. Use the LR and SPSR values generated by the exception entry to produce this return by using an exception return instruction with a subtraction of 4. For more information, see Exception return on page B1-38.

B1.7 Coprocessors and system control

The ARM architecture supports sixteen coprocessors, usually referred to as CP0 to CP15. These coprocessors are introduced in Coprocessor support on page A2-68. The architecture reserves two of these coprocessors, CP14 and CP15, for configuration and control related to the architecture:
• CP14 is reserved for the configuration and control of:
  — debug features, see The CP14 debug register interfaces on page C6-32
  — execution environment features, see Execution environment support on page B1-73.
• CP15 is called the System Control coprocessor, and is reserved for the control and configuration of the ARM processor system, including architecture and feature identification.

This section gives:
• general information about the CP15 registers, in CP15 System Control coprocessor registers
• information about access controls for coprocessors CP0 to CP13, in Access controls on CP0 to CP13 on page B1-63.

B1.7.1 CP15 System Control coprocessor registers

The implementation of the CP15 registers depends heavily on whether the ARMv7 implementation is:
• an ARMv7-A implementation with a Virtual Memory System Architecture (VMSA)
• an ARMv7-R implementation with a Protected Memory System Architecture (PMSA).

Therefore, detailed descriptions of the CP15 registers are given in:
• CP15 registers for a VMSA implementation on page B3-64
• CP15 registers for a PMSA implementation on page B4-22.

Registers that are common to VMSA and PMSA implementations are described in both of these sections. Some registers are implemented differently in VMSA and PMSA implementations. Those descriptions do not include the registers that implement the processor identification scheme, CPUID. The CPUID registers are described in Chapter B5 The CPUID Identification Scheme.

CP15, the System Control coprocessor, can contain up to 16 primary registers, each of which is 32 bits long. The CP15 register access instructions define the required primary register. Additional fields in the instruction are used to refine the access, and increase the number of physical 32-bit registers in CP15. In descriptions of the System Control coprocessor the 4-bit primary register number is used as a top level register identifier, because it is the primary factor determining the function of the register. The 16 primary registers in CP15 are identified as c0 to c15.

For details of register access rights and restrictions see the descriptions of the individual registers.
In ARMv7-A implementations, see also Effect of the Security Extensions on the CP15 registers on page B3-71.

The CP15 register access instructions are:
• MCR, to write an ARM core register to a CP15 register, see MCR, MCR2 on page A8-186
• MRC, to read the value of a CP15 register into an ARM core register, see MRC, MRC2 on page A8-202.

All CP15 CDP, CDP2, LDC, LDC2, MCR2, MCRR, MCRR2, MRC2, MRRC, MRRC2, STC, and STC2 instructions are UNDEFINED.

B1.7.2 Access controls on CP0 to CP13

Coprocessors CP0 to CP13 might be required for optional features of the ARMv7 implementation. In particular, CP10 and CP11 are used to support floating-point operations through the VFP and Advanced SIMD extensions to the architecture, see Advanced SIMD and floating-point support on page B1-64.

Coprocessors CP0 to CP7 can be used to provide IMPLEMENTATION DEFINED vendor-specific features.

Access to the coprocessors CP0 to CP13 is controlled by the Coprocessor Access Control Register, see:
• c1, Coprocessor Access Control Register (CPACR) on page B3-104 for a VMSA implementation
• c1, Coprocessor Access Control Register (CPACR) on page B4-51 for a PMSA implementation.

Initially on power up or reset, access to coprocessors CP0 to CP13 is disabled.

When the Security Extensions are implemented, the Non-Secure Access Control Register determines which of the CP0 to CP13 coprocessors can be accessed from the Non-secure state, see c1, Non-Secure Access Control Register (NSACR) on page B3-110.

Note
• When an implementation includes either or both of the VFP and Advanced SIMD extensions, the access settings for CP10 and CP11 must be identical. If these settings are not identical the behavior of the extensions is UNPREDICTABLE.
• To check which coprocessors are implemented:
  1. If required, read the Coprocessor Access Control Register and save the value.
  2.
Write the value 0x0FFFFFFF to the register, to write 0b11 to the access field for each of the coprocessors CP13 to CP0.
  3. Read the Coprocessor Access Control Register again and check the access field for each coprocessor:
     • if the access field value is 0b00 the coprocessor is not implemented
     • if the access field value is 0b11 the coprocessor is implemented.
  4. If required, write the value from stage 1 back to the register to restore the original value.

B1.8 Advanced SIMD and floating-point support

Advanced SIMD and VFP extensions on page A2-20 introduces:
• the VFP extension, used for scalar floating-point operations
• the Advanced SIMD extension, used for integer and floating-point vector operations
• the Advanced SIMD and VFP extension registers D0 - D31 and their alternative views as S0 - S31 and Q0 - Q15
• the Floating-Point Status and Control Register (FPSCR).

For more information about the system registers for the Advanced SIMD and VFP extensions see Advanced SIMD and VFP extension system registers on page B1-66.

Software can interrogate the registers described in Advanced SIMD and VFP feature identification registers on page B5-34 to discover the Advanced SIMD and floating-point support implemented in a system.

This section gives more information about the Advanced SIMD and VFP extensions, in the subsections:
• Enabling Advanced SIMD and floating-point support
• Advanced SIMD and VFP extension system registers on page B1-66
• The Floating-Point Exception Register (FPEXC) on page B1-68
• Context switching with the Advanced SIMD and VFP extensions on page B1-69
• VFP support code on page B1-70
• VFP subarchitecture support on page B1-72.
B1.8.1 Enabling Advanced SIMD and floating-point support

If an ARMv7 implementation includes support for any Advanced SIMD or VFP features then the boot software for any system that uses that implementation must ensure that:
• access to CP10 and CP11 is enabled in the Coprocessor Access Control Register, see:
  — c1, Coprocessor Access Control Register (CPACR) on page B3-104 for a VMSA implementation
  — c1, Coprocessor Access Control Register (CPACR) on page B4-51 for a PMSA implementation.
• if the Security Extensions are implemented and Non-secure access to the Advanced SIMD or VFP features is required, the access flags for CP10 and CP11 in the NSACR must be set to 1, see c1, Non-Secure Access Control Register (NSACR) on page B3-110.

If this is not done, operation of Advanced SIMD and VFP features is UNDEFINED.

If the access control bits are programmed differently for CP10 and CP11, operation of Advanced SIMD and VFP features is UNPREDICTABLE.

In addition, software must set the FPEXC.EN bit to 1 to enable most Advanced SIMD and VFP operations, see The Floating-Point Exception Register (FPEXC) on page B1-68.

When floating-point operation is disabled because FPEXC.EN is 0, all Advanced SIMD and VFP instructions are treated as Undefined Instructions except for:
• a VMSR to the FPEXC or FPSID register
• a VMRS from the FPEXC, FPSID, MVFR0, or MVFR1 register.
These instructions can be executed only in privileged modes.

Note
• When FPEXC.EN == 0, these operations are treated as Undefined Instructions:
  — a VMSR to the FPSCR
  — a VMRS from the FPSCR
• If a VFP implementation contains system registers additional to the FPSID, FPSCR, FPEXC, MVFR0, and MVFR1 registers, the behavior of VMSR instructions to them and VMRS instructions from them is SUBARCHITECTURE DEFINED.
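The register values that boot software has to produce for these requirements can be computed with plain bit operations. The following C sketch only computes values; it does not access hardware. It assumes the CPACR layout in which the access field for coprocessor n occupies bits [2n+1:2n] (so cp10 is bits [21:20] and cp11 is bits [23:22]), that the NSACR cp10 and cp11 flags are bits 10 and 11, and that FPEXC.EN is bit 30 as described in The Floating-Point Exception Register (FPEXC) on page B1-68. All macro and function names are hypothetical. The last function decodes the readback from the coprocessor probing procedure in B1.7.2.

```c
#include <stdint.h>

/* Hypothetical helper names; field positions as stated in the lead-in. */
#define CPACR_CP_SHIFT(n)  (2u * (n))                 /* cpn access field: bits [2n+1:2n] */
#define CPACR_CP_FULL(n)   (3u << CPACR_CP_SHIFT(n))  /* 0b11: full access */
#define NSACR_CP10         (1u << 10)
#define NSACR_CP11         (1u << 11)
#define FPEXC_EN           (1u << 30)                 /* FPEXC.EN, reset value 0 */

/* Grant full access to CP10 and CP11. Both fields get the same value,
 * because differing settings make the extensions UNPREDICTABLE. */
static uint32_t cpacr_enable_vfp(uint32_t cpacr)
{
    return cpacr | CPACR_CP_FULL(10) | CPACR_CP_FULL(11);
}

/* With the Security Extensions, permit Non-secure use of CP10 and CP11. */
static uint32_t nsacr_allow_vfp(uint32_t nsacr)
{
    return nsacr | NSACR_CP10 | NSACR_CP11;
}

/* Decode the CPACR readback from the B1.7.2 probe (after writing 0x0FFFFFFF):
 * bit n of the result is set when the cpn field reads back as 0b11.
 * A field that reads back as 0b00 means CPn is not implemented. */
static uint16_t cpacr_implemented_mask(uint32_t readback)
{
    uint16_t mask = 0;
    for (unsigned n = 0; n <= 13; n++) {
        if (((readback >> CPACR_CP_SHIFT(n)) & 3u) == 3u)
            mask |= (uint16_t)(1u << n);
    }
    return mask;
}
```

On hardware, privileged code would typically read and write the CPACR with MRC and MCR to p15, 0, <Rt>, c1, c0, 2, synchronize the change with an ISB before the first VFP instruction, and then write a value with FPEXC_EN set using VMSR FPEXC, <Rt>.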
Pseudocode details of enabling the Advanced SIMD and VFP extensions

The following pseudocode takes appropriate action if an Advanced SIMD or VFP instruction is used when the extensions are not enabled:

// CheckAdvSIMDOrVFPEnabled()
// ==========================

CheckAdvSIMDOrVFPEnabled(boolean include_fpexc_check, boolean advsimd)
    if HaveSecurityExt() then
        // Check Non-secure Access Control Register for permission to use CP10/11.
        if NSACR.cp10 != NSACR.cp11 then UNPREDICTABLE;
        if SCR.NS == ‘1’ && NSACR.cp10 == ‘0’ then UNDEFINED;

    // Check Coprocessor Access Control Register for permission to use CP10/11.
    if CPACR.cp10 != CPACR.cp11 then UNPREDICTABLE;
    case CPACR.cp10 of
        when ‘00’
            UNDEFINED;
        when ‘01’
            if !CurrentModeIsPrivileged() then UNDEFINED;
            // else CPACR permits access
        when ‘10’
            UNPREDICTABLE;
        when ‘11’
            // CPACR permits access

    // If the Advanced SIMD extension is specified, check whether it is disabled.
    if advsimd && CPACR.ASEDIS == ‘1’ then UNDEFINED;

    // If required, check FPEXC enabled bit.
    if include_fpexc_check && FPEXC.EN == ‘0’ then UNDEFINED;

    return;

// CheckAdvSIMDEnabled()
// =====================

CheckAdvSIMDEnabled()
    return CheckAdvSIMDOrVFPEnabled(TRUE, TRUE);

// CheckVFPEnabled()
// =================

CheckVFPEnabled(boolean include_fpexc_check)
    return CheckAdvSIMDOrVFPEnabled(include_fpexc_check, FALSE);

B1.8.2 Advanced SIMD and VFP extension system registers

The Advanced SIMD and VFP extensions share a common set of special-purpose registers. Any ARMv7 implementation that includes either or both of these extensions must implement these registers. This section gives general information about this set of registers, and indicates where each register is described in detail.
It contains the following subsections:
• Register map of the Advanced SIMD and VFP extension system registers
• Accessing the Advanced SIMD and VFP extension system registers on page B1-67.

Register map of the Advanced SIMD and VFP extension system registers

Table B1-11 shows the register map of the Advanced SIMD and VFP registers. When the Security Extensions are implemented, the Advanced SIMD and VFP registers are not banked.

Table B1-11 Advanced SIMD and VFP common register block

System register   Name                      Description
0b0000            FPSID                     See Floating-point System ID Register (FPSID) on page B5-34
0b0001            FPSCR                     See Floating-point Status and Control Register (FPSCR) on page A2-28
0b0010-0b0101     Reserved                  All accesses are UNPREDICTABLE
0b0110            MVFR1                     Media and VFP Feature Register 1, see Media and VFP Feature registers on page B5-36
0b0111            MVFR0                     Media and VFP Feature Register 0, see Media and VFP Feature registers on page B5-36
0b1000            FPEXC                     See The Floating-Point Exception Register (FPEXC) on page B1-68
0b1001-0b1111     SUBARCHITECTURE DEFINED

Note
• Appendix B Common VFP Subarchitecture Specification includes examples of how a VFP subarchitecture might define additional registers, in the SUBARCHITECTURE DEFINED register space using addresses in the 0b1001 to 0b1111 range.
• Appendix B is not part of the ARMv7 architecture. It is included as an example of how a VFP subarchitecture might be defined.

Accessing the Advanced SIMD and VFP extension system registers

You access the Advanced SIMD and VFP extension system registers using the VMRS and VMSR instructions, see:
• VMRS on page A8-658
• VMSR on page A8-660.
For example:

    VMRS <Rt>, FPSID     ; Read Floating-Point System ID Register
    VMRS <Rt>, MVFR1     ; Read Media and VFP Feature Register 1
    VMSR FPSCR, <Rt>     ; Write Floating-Point Status and Control Register

You must enable access to CP10 and CP11 in the Coprocessor Access Control Register before you can access any of the Advanced SIMD and VFP extension system registers, see Enabling Advanced SIMD and floating-point support on page B1-64. To enable access to the FPSCR you must also set the EN flag in the FPEXC Register to 1, see The Floating-Point Exception Register (FPEXC) on page B1-68.

Table B1-12 shows the permitted accesses to the Advanced SIMD and VFP extension system registers when the access rights to CP10 and CP11 are sufficient.

Table B1-12 Access to Advanced SIMD and VFP system registers

                                 Privileged accesses           User accesses
Register       Register access   EN == 0 (a)    EN == 1 (a)    EN == 0 (a)    EN == 1 (a)
FPSID          Read-only         Permitted      Permitted      Not permitted  Not permitted
FPSCR          Read/write        Not permitted  Permitted      Not permitted  Permitted
MVFR1, MVFR0   Read-only         Permitted      Permitted      Not permitted  Not permitted
FPEXC          Read/write        Permitted      Permitted      Not permitted  Not permitted

a. In the FPEXC Register, see The Floating-Point Exception Register (FPEXC) on page B1-68.

Note
All hardware ID information can be accessed only from privileged modes.
• The FPSID is privileged access only. This is a change in VFPv3. In VFPv2 implementations the FPSID register can be accessed in all modes.
• The MVFR registers are privileged access only. User code must issue a system call to determine what features are supported.

B1.8.3 The Floating-Point Exception Register (FPEXC)

The Floating-Point Exception Register (FPEXC) provides global enable and disable control of the Advanced SIMD and VFP extensions, and indicates how the state of these extensions is recorded.
The FPEXC:
• Is in the CP10 and CP11 register space.
• Is present only when at least one of the VFP and Advanced SIMD extensions is implemented.
• Is a 32-bit read/write register, that can have different access rights for different bits.
• If the Security Extensions are implemented, is a Configurable access register. The FPEXC is only accessible in the Non-secure state if the CP10 and CP11 bits in the NSACR are set to 1, see c1, Non-Secure Access Control Register (NSACR) on page B3-110.
• Is accessible only in privileged modes, and only if access to coprocessors CP10 and CP11 is enabled in the Coprocessor Access Control Register, see:
  — c1, Coprocessor Access Control Register (CPACR) on page B3-104 for a VMSA implementation
  — c1, Coprocessor Access Control Register (CPACR) on page B4-51 for a PMSA implementation.
• Has a reset value of 0 for bit [30], FPEXC.EN.

The format of the FPEXC is:

  31   30  29                              0
+----+----+---------------------------------+
| EX | EN |     SUBARCHITECTURE DEFINED     |
+----+----+---------------------------------+

EX, bit [31]
Exception bit. A status bit that specifies how much information must be saved to record the state of the Advanced SIMD and VFP system:
0   The only significant state is the contents of the registers:
    • D0 - D15
    • D16 - D31, if implemented
    • FPSCR
    • FPEXC.
    A context switch can be performed by saving and restoring the values of these registers.
1   There is additional state that must be handled by any context switch system.
The behavior of the EX bit on writes is SUBARCHITECTURE DEFINED, except that in any implementation a write of 0 to this bit must be a valid operation, and must return a value of 0 if read back immediately.

EN, bit [30]
Enable bit. A global enable for the Advanced SIMD and VFP extensions:
0   The Advanced SIMD and VFP extensions are disabled. For details of how the system operates when EN == 0 see Enabling Advanced SIMD and floating-point support on page B1-64.
1   The Advanced SIMD and VFP extensions are enabled and operate normally.
This bit is always a normal read/write bit. It has a reset value of 0.

Bits [29:0]
SUBARCHITECTURE DEFINED. An implementation can use these bits to communicate exception information between the floating-point hardware and the support code. The subarchitectural definition of these bits includes their read/write access. This can be defined on a bit by bit basis. A constraint on these bits is that if EX == 0 it must be possible to save and restore all significant state for the floating-point system by saving and restoring only the extension registers and the two extension system registers FPSCR and FPEXC.

Writes to the FPEXC can have side-effects on various aspects of processor operation. All of these side-effects are synchronous to the FPEXC write. This means they are guaranteed not to be visible to earlier instructions in the execution stream, and they are guaranteed to be visible to later instructions in the execution stream.

See Advanced SIMD and VFP extension system registers on page B1-66 for an overview of the common set of system registers for the Advanced SIMD and VFP extensions.

B1.8.4 Context switching with the Advanced SIMD and VFP extensions

In an implementation that includes one or both of the Advanced SIMD and VFP extensions, if the VFP registers are used by only a subset of processes, the operating system might implement lazy context switching of the extension registers and extension system registers.

In the simplest lazy context switch implementation, the primary context switch code simply disables the VFP and Advanced SIMD extensions, by disabling access to coprocessors CP10 and CP11 in the Coprocessor Access Control Register, see Enabling Advanced SIMD and floating-point support on page B1-64. Subsequently, when a process or thread attempts to use an Advanced SIMD or VFP instruction, it triggers an Undefined Instruction exception.
The operating system responds by saving and restoring the extension registers and extension system registers.

B1.8.5 VFP support code

A complete VFP implementation might require a software component, known as the support code. For example, if VFPv3U is implemented, support code must handle the trapped floating-point exceptions.

Typically, the support code is entered through the ARM Undefined Instruction vector, when the extension hardware does not respond to a VFP instruction. This software entry is known as a bounce.

When VFPv3U is implemented, the bounce mechanism is used to support trapped floating-point exceptions. Trapped floating-point exceptions, known as traps, are floating-point exceptions that an implementation passes back to application software to resolve, see Floating-point exceptions on page A2-42. The support code must catch a trapped exception and convert it into a trap handler call.

The support code can perform other tasks, as determined by the implementation. It might be used for rare conditions, such as operations that are difficult to implement in hardware, or operations that are gate intensive in hardware. This permits consistent software behavior with varying degrees of hardware support.

The division of labor between the hardware and software components of an implementation, and details of the interface between the support code and hardware are SUBARCHITECTURE DEFINED.

Asynchronous bounces, serialization, and VFP exception barriers

A VFP implementation can produce an asynchronous bounce, in which a VFP instruction takes the Undefined Instruction exception because support code processing is required for an earlier VFP instruction. The mechanism by which the nature of the required processing is communicated to the support code is SUBARCHITECTURE DEFINED.
Typically, it involves:
• using the SUBARCHITECTURE DEFINED bits of the FPEXC, see The Floating-Point Exception Register (FPEXC) on page B1-68
• using the SUBARCHITECTURE DEFINED extension system registers, see Advanced SIMD and VFP extension system registers on page B1-66
• setting FPEXC.EX == 1, to indicate that the SUBARCHITECTURE DEFINED extension system registers must be saved on a context switch.

An asynchronous bounce might not relate to the last VFP instruction executed before the one that took the Undefined Instruction exception. It is possible that another VFP instruction has been issued and retired before the asynchronous bounce occurs. This is possible only if this intervening instruction has no register dependencies on the VFP instruction that requires support code processing. In addition, it is possible that there are SUBARCHITECTURE DEFINED mechanisms for handling an intervening VFP instruction that has issued but not retired.

However, VMRS and VMSR instructions that access the FPSID, FPSCR, or FPEXC registers are serializing instructions. This means they ensure that any exceptional condition in any preceding VFP instruction that requires support code processing has been detected and reflected in the extension system registers before they perform the register transfer. A VMSR instruction to the read-only FPSID register is a serializing NOP.

In addition:
• A VMRS or VMSR instruction that accesses the FPSCR acts as a VFP exception barrier. This means it ensures that any outstanding exceptional conditions in preceding VFP instructions have been detected and processed by the support code before it performs the register transfer. If necessary, the VMRS or VMSR instruction takes an asynchronous bounce to force the processing of outstanding exceptional conditions.
• VMRS and VMSR instructions that access the FPSID or FPEXC do not take asynchronous bounces.

VFP serialization and the VFP exception barriers are described in pseudocode by the SerializeVFP() and VFPExcBarrier() functions respectively:

SerializeVFP()
VFPExcBarrier()

Interactions with the ARM architecture

ARM recommends that a VFP extension uses the Undefined Instruction mechanism to invoke its support code, see Undefined Instruction exceptions on page B1-76. To do this:
1. Before enabling the extension hardware, install the support code on the Undefined Instruction vector.
2. If the extension hardware requires assistance from the support code, it does not respond to a VFP instruction.
3. This causes an Undefined Instruction exception, which causes the support code to be executed.

VFP load/store instructions can generate Data Abort exceptions, and therefore implementations must be able to cope with a Data Abort exception on any memory access caused by such instructions.

Interrupts

Taking the Undefined Instruction exception causes IRQs to be disabled, see Undefined Instruction exception on page B1-49. Normally, IRQs are not re-enabled until the exception handler returns. This means that normal use of a VFP extension that requires support code in a system can increase worst-case IRQ latency considerably.

You can reduce this IRQ latency penalty considerably by explicitly re-enabling interrupts soon after entry to the Undefined Instruction handler. This requires careful integration of the Undefined Instruction handler into the rest of the operating system. How this might be done is highly system-specific and beyond the scope of this manual.

A system where the IRQ handler itself might use the VFP coprocessor has a second potential cause of increased IRQ latency. This increase occurs if a long-latency VFP operation is initiated by the interrupted application program, denying the use of the extension hardware to the IRQ handler for a significant number of cycles.
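To show where the extra IRQ latency comes from and how the handler can shorten it, the following C model tracks only the CPSR mode and mask bits on Undefined Instruction entry. It is a simplified sketch. The bit positions (I at bit [7], F at bit [6], Undefined mode 0b11011) come from the PSR definitions rather than from this section. The J, T, IT and E updates are omitted, and take_undefined() and reenable_irqs() are hypothetical names.

```c
#include <stdint.h>

#define CPSR_MODE_MASK 0x1Fu
#define MODE_USR       0x10u
#define MODE_UND       0x1Bu
#define CPSR_F         (1u << 6)  /* FIQ mask bit */
#define CPSR_I         (1u << 7)  /* IRQ mask bit */

/* Simplified Undefined Instruction entry: only mode and interrupt masks. */
static uint32_t take_undefined(uint32_t cpsr, uint32_t *spsr_und) {
    *spsr_und = cpsr;                            /* old CPSR saved in SPSR_und */
    cpsr = (cpsr & ~CPSR_MODE_MASK) | MODE_UND;  /* enter Undefined mode */
    cpsr |= CPSR_I;                              /* IRQs are disabled */
    return cpsr;                                 /* CPSR.F unchanged: FIQs stay enabled */
}

/* What a latency-aware handler might do once it has saved enough context to
 * tolerate being interrupted: clear CPSR.I again (CPSIE i in assembler). */
static uint32_t reenable_irqs(uint32_t cpsr) { return cpsr & ~CPSR_I; }
```

Until reenable_irqs() runs, every IRQ waits for the support code, which is the latency penalty described above.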
Therefore, if a system contains IRQ handlers that require both low interrupt latency and the use of VFP instructions, ARM recommends that the use of the highest-latency Advanced SIMD or VFP instructions is avoided.

Note
FIQs are not disabled by entry to the Undefined Instruction handler, and so FIQ latency is not affected by the use of the Undefined Instruction exception described here. However, because they are not disabled, an FIQ can occur at any point during support code execution, including during the entry and exit sequences of the Undefined Instruction handler. If an FIQ handler can make any change to the state of the Advanced SIMD or VFP implementation, you must take great care to ensure that it handles every case correctly. Usually, this requirement is incompatible with the requirement that FIQs provide fast interrupt processing. Therefore ARM recommends that FIQ handlers do not use the Advanced SIMD or VFP extension.

B1.8.6 VFP subarchitecture support

In the ARMv7 specification of the VFP extension, some features are identified as SUBARCHITECTURE DEFINED. ARMv7 is fully compatible with the ARM Common VFP subarchitecture, which ARM has used for several VFP implementations. However, ARMv7 does not require or specifically recommend the use of the ARM Common VFP subarchitecture.

Appendix B Common VFP Subarchitecture Specification is the specification of the ARM Common VFP subarchitecture. The subarchitecture is not part of the ARMv7 architecture specification. For details of the status of the subarchitecture specification see the Note on the cover page of Appendix B.
B1.9 Execution environment support

Support code for an execution environment can execute in two of the processor states described in Instruction set states on page B1-23:
• ThumbEE state supports the Thumb Execution Environment. For more information, see Thumb Execution Environment.
• Jazelle state supports direct bytecode execution. For more information, see Jazelle direct bytecode execution on page B1-74.

B1.9.1 Thumb Execution Environment

See Thumb Execution Environment on page A2-69 for an introduction to the Thumb Execution Environment (ThumbEE), including an application level view of the execution environment, and a definition of its CP14 registers. This section describes the system level programmers' model for ThumbEE. For more information about ThumbEE see Chapter A9 ThumbEE.

The ThumbEE Configuration Register can be read in User mode, but can be written only in privileged modes, see ThumbEE Configuration Register (TEECR) on page A2-70. Access to the ThumbEE Handler Base Register depends on the value held in the TEECR and the current privilege level, see ThumbEE Handler Base Register (TEEHBR) on page A2-71.

The processor executes ThumbEE instructions when it is in ThumbEE state. The processor instruction set state is indicated by the CPSR.J and CPSR.T bits, see Program Status Registers (PSRs) on page B1-14. (J,T) == 0b11 when the processor is in ThumbEE state.

During normal execution, not involving exception entries and returns:
• ThumbEE state can only be entered from Thumb state, using the ENTERX instruction
• exit from ThumbEE state always occurs using the LEAVEX instruction and returns execution to Thumb state.
For details of these instructions see ENTERX, LEAVEX on page A9-7.

When an exception occurs in ThumbEE state, exception entry goes to either ARM state or Thumb state as usual, depending on the value of SCTLR.TE.
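The (J,T) encoding can be decoded mechanically. The following C sketch assumes the CPSR bit positions J at bit [24] and T at bit [5]. It includes the ARM (0b00) and Thumb (0b01) encodings as well as the ThumbEE encoding given here and the Jazelle encoding given in Jazelle direct bytecode execution. model_enterx() and model_leavex() are hypothetical helpers that show only the effect on (J,T), not the full behavior of ENTERX and LEAVEX.

```c
#include <stdint.h>

#define CPSR_T_SHIFT 5u   /* CPSR.T */
#define CPSR_J_SHIFT 24u  /* CPSR.J */

typedef enum {
    ISET_ARM = 0,      /* (J,T) == 0b00 */
    ISET_THUMB = 1,    /* (J,T) == 0b01 */
    ISET_JAZELLE = 2,  /* (J,T) == 0b10 */
    ISET_THUMBEE = 3   /* (J,T) == 0b11 */
} iset_t;

static iset_t current_iset(uint32_t cpsr) {
    uint32_t j = (cpsr >> CPSR_J_SHIFT) & 1u;
    uint32_t t = (cpsr >> CPSR_T_SHIFT) & 1u;
    return (iset_t)((j << 1) | t);
}

/* Effect on (J,T) only: T stays 1 in both directions; ENTERX sets J
 * (Thumb to ThumbEE) and LEAVEX clears it (ThumbEE to Thumb). */
static uint32_t model_enterx(uint32_t cpsr) { return cpsr | (1u << CPSR_J_SHIFT); }
static uint32_t model_leavex(uint32_t cpsr) { return cpsr & ~(1u << CPSR_J_SHIFT); }
```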
When the exception handler returns, the exception return instruction restores CPSR.J and CPSR.T as usual, causing a return to ThumbEE state. In ThumbEE state, execution of the exception return instructions described in Exception return on page B1-38 is UNPREDICTABLE.

ThumbEE and the Security Extensions

When an implementation that supports ThumbEE includes the Security Extensions, the ThumbEE registers are not banked. If ThumbEE support is required in both Secure and Non-secure states, the monitor must save and restore the register contents accordingly.

Aborts, exceptions, and checks

Aborts and exceptions are unchanged in ThumbEE. A null check takes priority over an abort or watchpoint on the same memory access. For more information, see Null checking on page A9-3.

The IT state bits in the CPSR are always cleared on entry to a NullCheck or IndexCheck handler. For more information, see IT block and check handlers on page A9-5.

B1.9.2 Jazelle direct bytecode execution

In Jazelle state the processor executes bytecode programs, as described in Jazelle state on page A2-74. The processor instruction set state is indicated by the CPSR.J and CPSR.T bits, see Program Status Registers (PSRs) on page B1-14. (J,T) == 0b10 when the processor is in Jazelle state. For more information about entering and leaving Jazelle state see Jazelle state on page B1-81.

Extension of the PC to 32 bits

To enable the PC to point to an arbitrary bytecode instruction, in a non-trivial Jazelle implementation all 32 bits of the PC are defined. In the PC, bit [0] always reads as zero when in ARM, Thumb, or ThumbEE state. The existence of bit [0] in the PC is only visible in ARM, Thumb, or ThumbEE states when an exception occurs in Jazelle state and the exception return address is odd-byte aligned.
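To see why the odd-byte case matters, suppose, hypothetically, that a bytecode instruction at address 0x8001 is interrupted at a point where its partial execution is idempotent. The interrupt rules given under Interrupts and Fast interrupts, IRQ and FIQ then give LR_irq = 0x8001 + 4. A handler that returns to LR - 4 resumes correctly only if it keeps bit [0]. The C functions below are illustrative and are not taken from any real handler.

```c
#include <stdint.h>

/* An IRQ/FIQ handler returns to LR - 4. In Jazelle state the target can be
 * an odd byte address, so every bit of the computed address matters. */
static uint32_t irq_return_target(uint32_t lr_irq) {
    return lr_irq - 4u;            /* keeps bit [0] */
}

/* Hypothetical "alignment" step that some handler code might add. It loses
 * nothing for ARM, Thumb or ThumbEE targets but corrupts a Jazelle target. */
static uint32_t broken_irq_return_target(uint32_t lr_irq) {
    return (lr_irq - 4u) & ~1u;    /* discards PC[0] */
}
```

With LR_irq = 0x8005, irq_return_target() resumes at 0x8001, but broken_irq_return_target() resumes at 0x8000, one byte before the interrupted bytecode.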
The main architectural implication of this 32-bit PC is that an exception handler must ensure that it restores all 32 bits of the PC. The recommended ways of handling exception returns behave correctly.

Exception handling in the Jazelle extension

Exceptions on page B1-30 describes how exception entry occurs if an exception occurs while the processor is executing in Jazelle state. This section gives more information about how exceptions in Jazelle state are taken and handled.

Interrupts and Fast interrupts, IRQ and FIQ

To enable the standard mechanism for handling interrupts to work correctly, a Jazelle hardware implementation must ensure that one of the following applies at the point where execution of a bytecode instruction might be interrupted by an IRQ or FIQ:
• Execution has reached a bytecode instruction boundary. That is:
— all operations required to implement one bytecode instruction have completed
— no operations required to implement the next bytecode instruction have completed.
The LR value on entry to the interrupt handler must be (address of the next bytecode instruction) + 4.
• The sequence of operations performed from the start of execution of the current bytecode instruction, up to the point where the interrupt occurs, is idempotent. This means that the sequence can be repeated from its start without changing the overall result of executing the bytecode instruction.
The LR value on entry to the interrupt handler must be (address of the current bytecode instruction) + 4.
• Corrective action is taken either:
— directly by the Jazelle extension hardware
— indirectly, by calling a SUBARCHITECTURE DEFINED handler in the EJVM.
The corrective action must re-create a situation where the bytecode instruction can be re-executed from its start.
The LR value on entry to the interrupt handler must be (address of the interrupted bytecode instruction) + 4.

Data Abort exceptions

On taking a Data Abort exception, the value saved in LR_abt must ensure that the Data Abort handler can:
• read the CP15 Fault Status and Fault Address registers
• fix the reason for the abort
• return using SUBS PC,LR,#8 or its equivalent.

The abort handler must be able to do this without looking at the instruction that caused the abort or which instruction set state it was executed in. On an ARMv7-A implementation, the abort handler must take account of the virtual memory system.

Note
• This assumes that the intention is to return to and retry the bytecode instruction that caused the Data Abort exception. If the intention is instead to return to the bytecode instruction after the one that caused the abort, then the return address must be modified by the length of the bytecode instruction that caused the abort.
• For details of the CP15 Fault Status and Fault Address registers:
— for a VMSA implementation, see CP15 c5, Fault status registers on page B3-121 and CP15 c6, Fault Address registers on page B3-124
— for a PMSA implementation, see CP15 c5, Fault status registers on page B4-54 and CP15 c6, Fault Address registers on page B4-57.

To enable the standard mechanism for handling Data Abort exceptions to work correctly, a Jazelle hardware implementation must ensure that one of the following applies at any point where a bytecode instruction can generate a Data Abort exception:
• The sequence of operations performed from the start of execution of the bytecode instruction, up to the point where the Data Abort exception is generated, is idempotent. This means that the sequence can be repeated from its start without changing the overall result of executing the bytecode instruction.
• If the Data Abort exception is generated during execution of a bytecode instruction, corrective action is taken either:
— directly by the Jazelle extension hardware
— indirectly, by calling a SUBARCHITECTURE DEFINED handler in the EJVM.
The corrective action must re-create a situation where the bytecode instruction can be re-executed from its start.

Note
From ARMv6, the ARM architecture does not support the Base Updated Abort Model. This removes a potential obstacle to the first of these solutions. For information about the Base Updated Abort Model in earlier versions of the ARM architecture see The ARM abort model on page AppxH-20.

Prefetch Abort exceptions

On taking a Prefetch Abort exception, the value saved in LR_abt must ensure that the Prefetch Abort handler can locate the start of the instruction that caused the abort simply and without looking at the instruction set state in which its execution was attempted. The start of this instruction is always at address (LR_abt – 4). On an ARMv7-A implementation, the abort handler must take account of the virtual memory system.

A multi-byte bytecode instruction can cross a page boundary. In this case the Prefetch Abort handler cannot use LR_abt to determine which of the two pages caused the abort. How this situation is handled is SUBARCHITECTURE DEFINED, but if it is handled by taking a Prefetch Abort exception, the architecture requires that (LR_abt – 4) points to the first byte of the bytecode instruction that caused the abort.

To ensure subarchitecture-independence, OS designers must write Prefetch Abort handlers in such a way that they can handle a Prefetch Abort exception generated in either of the two pages spanned by a multi-byte bytecode instruction that crosses a page boundary.
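The return-address arithmetic for both abort types, and the page choice for a bytecode instruction that straddles two pages, can be sketched in C. This minimal sketch assumes 4KB pages and uses a toy list of mapped pages in place of real translation tables. prefetch_abort_fixup(), data_abort_retry() and data_abort_skip() are hypothetical helpers. The page choice follows the simple technique for implementations without an IFAR that is given in the next paragraph.

```c
#include <stdbool.h>
#include <stdint.h>

#define JZ_PAGE_SIZE 4096u  /* assumption: 4KB pages */
#define MAX_PAGES 16

/* Hypothetical stand-in for translation tables: a list of mapped page bases. */
static uint32_t mapped[MAX_PAGES];
static int n_mapped = 0;

static uint32_t page_base(uint32_t addr) { return addr & ~(JZ_PAGE_SIZE - 1u); }

static bool is_mapped(uint32_t page) {
    for (int i = 0; i < n_mapped; i++)
        if (mapped[i] == page)
            return true;
    return false;
}

static void map_page(uint32_t page) {
    if (!is_mapped(page) && n_mapped < MAX_PAGES)
        mapped[n_mapped++] = page;
}

/* Prefetch Abort: the faulting instruction starts at LR_abt - 4. Without an
 * IFAR, map that page if it is unmapped, otherwise map the following page.
 * Returns the address at which the instruction is retried. */
static uint32_t prefetch_abort_fixup(uint32_t lr_abt) {
    uint32_t first = page_base(lr_abt - 4u);
    uint32_t target = is_mapped(first) ? first + JZ_PAGE_SIZE : first;
    map_page(target);
    return lr_abt - 4u;
}

/* Data Abort: SUBS PC,LR,#8 retries the aborting bytecode instruction;
 * skipping it instead means adding that instruction's length. */
static uint32_t data_abort_retry(uint32_t lr_abt) { return lr_abt - 8u; }
static uint32_t data_abort_skip(uint32_t lr_abt, uint32_t len) { return lr_abt - 8u + len; }
```

For a bytecode starting at 0x1FFF, the first fault maps page 0x1000 and, if the retry faults again, the second maps page 0x2000. Both are handled without knowing which page actually faulted.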
In an implementation that has an Instruction Fault Address Register (IFAR), the IFAR can be used to determine the faulting page. Otherwise, a simple technique is:

IF the page pointed to by (LR_abt – 4) is not mapped THEN
    map the page
ELSE
    map the page following the page including (LR_abt – 4)
ENDIF
retry the instruction

SVC and SMC exceptions

SVC and SMC exceptions must not be taken during Jazelle state execution. To cause either of these exceptions to be taken, a Jazelle implementation must exit to a software handler that executes an SVC or SMC instruction.

Undefined Instruction exceptions

The Undefined Instruction exception must not be taken during Jazelle state execution, except on a trivial implementation of Jazelle state as described in Exception return to an unsupported instruction set state on page B1-40.

When executing in Jazelle state, the Jazelle extension hardware might use a coprocessor extension such as the VFP extension to execute some operations. If it does so, it must avoid taking Undefined Instruction exceptions while in Jazelle state, even if an exceptional condition occurs that would normally cause the coprocessor extension to generate an Undefined Instruction exception.

Jazelle state configuration and control

For details of the configuration and control of Jazelle state from the application level, see Application level configuration and control of the Jazelle extension on page A2-75. That section includes descriptions of the Jazelle extension registers that can be accessed from User mode:
• Jazelle ID Register (JIDR) on page A2-76
• Jazelle Main Configuration Register (JMCR) on page A2-77.

The other Jazelle extension register is accessible only from privileged modes, see Jazelle OS Control Register (JOSCR). This register controls access to the Jazelle extension.
When the Security Extensions are implemented, the Jazelle registers are common to the Secure and Non-secure security states. Each register has the same access permissions in both security states. For more information, see the register descriptions.

Changes to the Jazelle CP14 registers have the same synchronization requirements as changes to the CP15 registers. These are described in:
• Changes to CP15 registers and the memory order model on page B3-77 for a VMSA implementation
• Changes to CP15 registers and the memory order model on page B4-28 for a PMSA implementation.

Note
• Normally, an EJVM never accesses the JOSCR.
• An EJVM that runs in User mode must not attempt to access the JOSCR.

Jazelle OS Control Register (JOSCR)

The Jazelle OS Control Register (JOSCR) provides operating system control of the use of the Jazelle extension by processes and threads.

The JOSCR is:
• a CP14 register
• a 32-bit read/write register
• accessible only from privileged modes
• when the Security Extensions are implemented, a Common register.

The format of the JOSCR is:

 31                                     2    1    0
+----------------------------------------+----+----+
|             Reserved, RAZ              | CV | CD |
+----------------------------------------+----+----+

Bits [31:2]
Reserved, RAZ. These bits are reserved for future expansion.

CV, bit [1]
Configuration Valid bit. This bit is used by an operating system to signal to the EJVM that it must re-write its configuration to the configuration registers. The possible values are:
0 Configuration not valid. The EJVM must re-write its configuration to the configuration registers before it executes another bytecode instruction.
1 Configuration valid. The EJVM does not need to update the configuration registers.
When the JMCR.JE bit is set to 1, the CV bit also controls entry to Jazelle state, see Controlling entry to Jazelle state on page B1-79.

CD, bit [0]
Configuration Disabled bit.
This bit is used by an operating system to disable User mode access to the JIDR and configuration registers:
0 Configuration enabled. Access to the Jazelle registers, including User mode accesses, operates normally. For more information, see the register descriptions in Application level configuration and control of the Jazelle extension on page A2-75.
1 Configuration disabled in User mode. User mode access to the Jazelle registers is UNDEFINED, and all User mode accesses to the Jazelle registers cause an Undefined Instruction exception.
For more information about the use of this bit see Monitoring and controlling User mode access to the Jazelle extension on page B1-80.

The JOSCR provides a control mechanism that is independent of the subarchitecture of the Jazelle extension. An operating system can use this mechanism to control access to the Jazelle extension. Normally, this register is used in conjunction with the JMCR.JE bit, see Jazelle Main Configuration Register (JMCR) on page A2-77.

The JOSCR.CV and JOSCR.CD bits are both set to 0 on reset. This ensures that, subject to some conditions, an EJVM can operate under an OS that does not support the Jazelle extension. The main condition required to ensure an EJVM can operate under an OS that does not support the Jazelle extension is that the operating system never swaps between two EJVM processes that require different settings of the Jazelle configuration registers. Two examples of how this condition can be met in a system are:
• if there is only ever one process or thread using the EJVM
• if all of the processes or threads that use the EJVM use the same static settings of the configuration registers.

Accessing the JOSCR

To access the JOSCR you read or write the CP14 registers with <opc1> set to 7, <CRn> set to c1, <CRm> set to c0, and <opc2> set to 0.
For example:

MRC p14, 7, <Rt>, c1, c0, 0   ; Read Jazelle OS Control Register
MCR p14, 7, <Rt>, c1, c0, 0   ; Write Jazelle OS Control Register

Note
For maximum compatibility with any future enhancements to the Jazelle extension, ARM strongly recommends that a read, modify, write sequence is used to update the JOSCR. Updating the register in this way preserves the value of any of bits [31:2] that might be used by a future expansion.

Controlling entry to Jazelle state

The normal method of entering Jazelle state is using the BXJ instruction, see Jazelle state entry instruction, BXJ on page A2-74. The operation of this instruction depends on both:
• the value of the JMCR.JE bit, see Jazelle Main Configuration Register (JMCR) on page A2-77
• the value of the JOSCR.CV bit.

When the JMCR.JE bit is 0, the JOSCR has no effect on the execution of BXJ instructions. They always execute as BX instructions, and there is no attempt to enter Jazelle state.

When the JMCR.JE bit is 1, the JOSCR.CV bit controls the operation of BXJ instructions:

If CV == 1
The Jazelle extension hardware configuration is valid and enabled. A BXJ instruction causes the processor to enter Jazelle state in SUBARCHITECTURE DEFINED circumstances, and execute bytecode instructions as described in Executing BXJ with Jazelle extension enabled on page A2-75.

If CV == 0
The Jazelle extension hardware configuration is not valid and therefore entry to Jazelle state is disabled. In all SUBARCHITECTURE DEFINED circumstances where, if CV had been 1, the BXJ instruction would have caused the Jazelle extension hardware to enter Jazelle state, it instead:
• enters a Configuration Invalid handler
• sets CV to 1.
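The BXJ decision described above can be summarized as a small C model. It is a sketch under stated assumptions. CV is taken to be JOSCR bit [1], as in the register format. The would_enter flag stands in for the SUBARCHITECTURE DEFINED circumstances in which the hardware would enter Jazelle state. The case where it would not is labeled but not modelled. model_bxj() and bxj_result are hypothetical names.

```c
#include <stdbool.h>
#include <stdint.h>

#define JOSCR_CV (1u << 1)  /* Configuration Valid */

typedef enum {
    BXJ_AS_BX,           /* JMCR.JE == 0: plain BX, no attempt to enter Jazelle state */
    BXJ_ENTER_JAZELLE,   /* JE == 1, CV == 1: bytecode execution starts */
    BXJ_CONFIG_INVALID,  /* JE == 1, CV == 0: Configuration Invalid handler entered */
    BXJ_OTHER_HANDLER    /* hardware would not enter Jazelle state: not modelled */
} bxj_result;

static bxj_result model_bxj(bool jmcr_je, uint32_t *joscr, bool would_enter) {
    if (!jmcr_je)
        return BXJ_AS_BX;          /* the JOSCR has no effect */
    if (!would_enter)
        return BXJ_OTHER_HANDLER;  /* SUBARCHITECTURE or IMPLEMENTATION DEFINED */
    if (*joscr & JOSCR_CV)
        return BXJ_ENTER_JAZELLE;
    *joscr |= JOSCR_CV;            /* entering the handler sets CV to 1 */
    return BXJ_CONFIG_INVALID;
}
```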
A Configuration Invalid handler is a sequence of instructions that:
• includes MCR instructions to write the configuration required by the EJVM
• ends with a BXJ instruction to re-attempt execution of the required bytecode instruction.

The following are SUBARCHITECTURE DEFINED:
• how the address of the Configuration Invalid handler is determined
• the entry and exit conditions of the Configuration Invalid handler.

In circumstances in which the Jazelle extension hardware would not have entered Jazelle state if CV had been 1, it is IMPLEMENTATION DEFINED whether:
• the Configuration Invalid handler is entered
• a SUBARCHITECTURE DEFINED handler is entered, as described in Executing BXJ with Jazelle extension enabled on page A2-75.

In ARMv7, the JOSCR.CV bit is set to 0 on exception entry for all implementations other than a trivial implementation of the Jazelle extension.

The intended use of the JOSCR.CV bit is:
1. When a context switch occurs, JOSCR.CV is set to 0. This is done by the operating system or, in ARMv7, as the result of an exception.
2. When the new process or thread performs a BXJ instruction to start executing bytecode instructions, the Configuration Invalid handler is entered and JOSCR.CV becomes 1.
3. The Configuration Invalid handler:
• writes the configuration required by the EJVM to the Jazelle configuration registers
• retries the BXJ instruction to execute the bytecode instruction.

This ensures that the Jazelle extension configuration registers are set up correctly for the EJVM concerned before any bytecode instructions are executed. It successfully handles cases where a context switch occurs during execution of the Configuration Invalid handler.
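The three-step use of JOSCR.CV can be simulated in C to show that each EJVM reloads its own configuration after a context switch. This is an illustrative model only. The registers are plain variables, and jazelle_config is a hypothetical stand-in for the SUBARCHITECTURE DEFINED configuration registers. os_context_switch() and ejvm_bxj() are hypothetical helpers. The OS update uses the read, modify, write sequence recommended for the JOSCR.

```c
#include <stdint.h>

#define JOSCR_CV (1u << 1)

/* Simulated state; real code would use MRC/MCR p14, 7, <Rt>, c1, c0, 0
 * for the JOSCR and SUBARCHITECTURE DEFINED writes for the configuration. */
static uint32_t joscr = 0u;
static uint32_t jazelle_config = 0u;
static int handler_entries = 0;

/* Step 1: the OS clears CV with a read, modify, write sequence. */
static void os_context_switch(void) {
    uint32_t v = joscr;   /* read */
    v &= ~JOSCR_CV;       /* modify: clear only bit [1] */
    joscr = v;            /* write */
}

/* Steps 2 and 3: an EJVM issues BXJ. If CV == 0, the Configuration Invalid
 * handler runs (CV becomes 1), writes this EJVM's configuration and retries. */
static void ejvm_bxj(uint32_t my_config) {
    if ((joscr & JOSCR_CV) == 0u) {
        joscr |= JOSCR_CV;           /* set by the hardware on handler entry */
        handler_entries++;
        jazelle_config = my_config;  /* the handler's MCR writes */
        /* the handler ends with BXJ; CV == 1 now, so bytecode executes */
    }
}
```

Because os_context_switch() clears only bit [1], any value that a future extension places in bits [31:2] survives the update.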
Monitoring and controlling User mode access to the Jazelle extension

The system can use the JOSCR.CD bit in different ways to monitor and control User mode access to the Jazelle extension hardware. Possible uses include:
• An OS can set JOSCR.CD == 1 and JMCR.JE == 0, to prevent all User mode access to the Jazelle extension hardware. With these settings any use of the BXJ instruction has the same result as a BX instruction, and any attempt to configure the hardware, including any attempt to set the JMCR.JE bit to 1, results in an Undefined Instruction exception.
• A simple mechanism for the OS to provide User mode access to the Jazelle extension hardware, while protecting EJVMs from conflicting use of the hardware by other processes, is:
— Set the JOSCR.CD bit to 0.
— Preserve and restore the JMCR on context switches, initializing its value to 0 for new processes.
— The JOSCR.CV bit is set to 0 on each context switch, either by the operating system or, in ARMv7, as the result of an exception. This ensures that EJVMs reconfigure the Jazelle extension hardware to match their requirements when necessary.
The context switch mechanism is described in Controlling entry to Jazelle state on page B1-79.

EJVM operation

EJVM operation on page A2-79 described the architectural requirements for an EJVM at the application level. Because the EJVM is provided for use by applications, the system level description of the architecture does not require significant additional information about the EJVM.

Initialization on page A2-79 stated that, if the EJVM is compatible with the subarchitecture, the EJVM must write its required configuration to the JMCR and any other configuration registers. The EJVM must not omit this step on the assumption that the JOSCR.CV bit is 0. In other words, the EJVM must not assume that JOSCR.CV == 0, and that this will trigger entry to the Configuration Invalid handler before any bytecode instruction is executed by the Jazelle extension hardware.
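The two usage patterns listed under Monitoring and controlling User mode access to the Jazelle extension might be expressed as OS hooks like the following C sketch. The structures and function names are hypothetical, the registers are simulated, and the sketch relies only on facts stated there and in the JOSCR format. Writing 0 to the JMCR gives JE == 0, JOSCR.CD is bit [0], and JOSCR.CV is bit [1].

```c
#include <stdbool.h>
#include <stdint.h>

#define JOSCR_CD (1u << 0)
#define JOSCR_CV (1u << 1)

struct jazelle_regs { uint32_t joscr; uint32_t jmcr; };  /* simulated registers */
struct process { uint32_t saved_jmcr; };                  /* hypothetical per-process area */

/* New processes start with JMCR == 0. */
static void process_init(struct process *p) { p->saved_jmcr = 0u; }

/* Policy 1: no User mode access. JMCR == 0 gives JE == 0, so BXJ acts as BX,
 * and CD == 1 makes User mode configuration accesses UNDEFINED. */
static void policy_lock_out(struct jazelle_regs *r) {
    r->jmcr = 0u;
    r->joscr |= JOSCR_CD;
}

/* Policy 2: shared access with a per-process JMCR, run on every context switch. */
static void policy_switch(struct jazelle_regs *r, struct process *prev, struct process *next) {
    r->joscr &= ~JOSCR_CD;       /* User mode may configure the hardware */
    prev->saved_jmcr = r->jmcr;  /* preserve the outgoing JMCR */
    r->jmcr = next->saved_jmcr;  /* restore the incoming JMCR */
    r->joscr &= ~JOSCR_CV;       /* force EJVMs to reconfigure */
}

static bool user_config_access_undefined(const struct jazelle_regs *r) {
    return (r->joscr & JOSCR_CD) != 0u;
}
```

Clearing CV in software is harmless on ARMv7, where an exception may already have cleared it, and it keeps the hook correct on implementations where it has not.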
Trivial implementation of the Jazelle extension

Jazelle direct bytecode execution support on page A2-73 introduced the possible trivial implementation of the Jazelle extension, and summarized the application level requirements of a trivial implementation. This section gives the system level description of a trivial implementation of the Jazelle extension.

A trivial implementation of the Jazelle extension must:
• Implement the JIDR with the implementer and subarchitecture fields set to zero. The register can be implemented so that the whole register is RAZ.
• Implement the JMCR as RAZ/WI.
• Implement the JOSCR either:
— so that it can be read and written, but its effects are ignored
— as RAZ/WI.
This enables operating systems that support an EJVM to execute correctly.
• Implement the BXJ instruction to behave identically to the BX instruction in all circumstances, as required by the fact that the JMCR.JE bit is always zero. This means that Jazelle state can never be entered normally on a trivial implementation.
• Treat Jazelle state as an unsupported instruction set state, as described in Exception return to an unsupported instruction set state on page B1-40.

A trivial implementation does not have to extend the PC to 32 bits, that is, it can implement PC[0] as RAZ/WI. This is because the only way that PC[0] is visible in ARM or Thumb state is as a result of a processor exception occurring during Jazelle state execution, and Jazelle state execution cannot occur on a trivial implementation.

Jazelle state

All processor configuration information that can be modified by Jazelle state execution must be kept in the Application Level registers described in ARM processor modes and core registers on page B1-6.
This ensures that the processor configuration information is preserved and restored correctly when processor exceptions and context switches occur. Configuration information can be kept either in Application Level registers or in configuration registers. In this context, configuration information is information that affects Jazelle state execution but is not modified by it.

An Enabled Java Virtual Machine (EJVM) implementation must check whether the implemented Jazelle extension is compatible with its use of the Application Level registers. If the implementation is compatible, the EJVM sets JE == 1 in the JMCR, see Jazelle Main Configuration Register (JMCR) on page A2-77. If the implementation is not compatible, the EJVM sets JE == 0 and executes without hardware acceleration.

Jazelle state exit

The processor exits Jazelle state in IMPLEMENTATION DEFINED circumstances. Typically, this is due to attempted execution of a bytecode instruction that the implementation cannot handle in hardware, or that generates one of the Java exceptions described in Lindholm and Yellin, The Java Virtual Machine Specification 2nd Edition. On exit from Jazelle state, various processor registers contain SUBARCHITECTURE DEFINED values, enabling the EJVM to resume software execution of the bytecode program correctly.

The processor also exits Jazelle state when a processor exception occurs. The CPSR is copied to the banked SPSR for the exception mode, so the banked SPSR contains J == 1 and T == 0, and Jazelle state is restored on return from the exception when the SPSR is copied back into the CPSR. With the restriction that Jazelle state execution can modify only Application Level registers, this ensures that all registers are correctly preserved and can be restored by the exception handlers.
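The preservation argument can be checked with a minimal C model that tracks only CPSR.J (bit [24]) and CPSR.T (bit [5]) through exception entry and return. It is a simplification. One SPSR stands in for the banked SPSR of the exception mode, take_exception() and exception_return() are hypothetical names, and all other CPSR updates are ignored.

```c
#include <stdbool.h>
#include <stdint.h>

#define CPSR_T (1u << 5)
#define CPSR_J (1u << 24)

struct cpu { uint32_t cpsr; uint32_t spsr; };  /* one banked SPSR for brevity */

static bool in_jazelle(uint32_t psr) { return (psr & (CPSR_J | CPSR_T)) == CPSR_J; }

/* Simplified exception entry: SCTLR.TE selects whether the handler runs in
 * Thumb state (1) or ARM state (0). */
static void take_exception(struct cpu *c, bool sctlr_te) {
    c->spsr = c->cpsr;                /* SPSR now holds J == 1, T == 0 */
    c->cpsr &= ~(CPSR_J | CPSR_T);
    if (sctlr_te)
        c->cpsr |= CPSR_T;
}

/* Exception return copies the SPSR back, which re-enters Jazelle state. */
static void exception_return(struct cpu *c) { c->cpsr = c->spsr; }
```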
Configuration and control registers can be modified in the exception handler itself as described in Jazelle state configuration and control on page B1-77 and Jazelle OS Control Register (JOSCR) on page B1-77.

Specific considerations apply to processor exceptions, see Exception handling in the Jazelle extension on page B1-74.

It is IMPLEMENTATION DEFINED whether Jazelle extension hardware contains state that is both:
• modified during Jazelle state execution
• held outside the Application Level registers during Jazelle state execution.

If such state exists, the implementation must:
• Initialize the state from one or more of the Application Level registers whenever Jazelle state is entered, whether as the result of:
— the execution of a BXJ instruction
— returning from a processor exception.
• Write the state into one or more of the Application Level registers whenever Jazelle state is exited, whether as a result of taking a processor exception or of IMPLEMENTATION DEFINED circumstances.
• Ensure that the mechanism for writing the state into Application Level registers on taking a processor exception, and initializing the state from Application Level registers on returning from that exception, ensures that the state is correctly preserved and restored over the exception.

Additional Jazelle state restrictions

The Jazelle extension hardware must obey the following restrictions:
• It must not change processor mode other than by taking one of the standard ARM processor exceptions.
• It must not access banked versions of registers other than the ones belonging to the processor mode in which it is entered.
• It must not do anything that is illegal for an UNPREDICTABLE instruction. That is, it must not:
— generate a security loophole
— halt or hang the processor or any other part of the system.
As a result of these requirements, Jazelle state can be entered from User mode without risking a breach of OS security. In addition, Jazelle state execution is UNPREDICTABLE in FIQ mode.

Chapter B2 Common Memory System Architecture Features

This chapter provides a system-level view of the general features of the memory system. It contains the following sections:
• About the memory system architecture on page B2-2
• Caches on page B2-3
• Implementation defined memory system features on page B2-27
• Pseudocode details of general memory system operations on page B2-29.

B2.1 About the memory system architecture

The ARM architecture supports different implementation choices for the memory system microarchitecture and memory hierarchy, depending on the requirements of the system being implemented. In this respect, the memory system architecture describes a design space in which an implementation is made. The architecture does not prescribe a particular form for the memory systems. Key concepts are abstracted in a way that enables implementation choices to be made while enabling the development of common software routines that do not have to be specific to a particular microarchitectural form of the memory system. For more information about the concept of a hierarchical memory system see Memory hierarchy on page A3-52.

B2.1.1 Form of the memory system architecture

ARMv7 supports different forms of the memory system architecture, which map onto the different architecture profiles.
Two of these are described in this manual:
• ARMv7-A, the A profile, requires the inclusion of a Virtual Memory System Architecture (VMSA), as described in Chapter B3 Virtual Memory System Architecture (VMSA).
• ARMv7-R, the R profile, requires the inclusion of a Protected Memory System Architecture (PMSA), as described in Chapter B4 Protected Memory System Architecture (PMSA).

Both of these memory system architectures provide mechanisms to split memory into different regions. Each region has specific memory types and attributes. The two memory system architectures have different capabilities and programmers' models.

The memory system architecture model required by ARMv7-M, the M profile, is outside the scope of this manual. It is described in the ARMv7-M Architecture Reference Manual.

B2.1.2 Memory attributes

Summary of ARMv7 memory attributes on page A3-25 summarizes the memory attributes, including how different memory types have different attributes. Each region of memory has a set of memory attributes:
• in a PMSA implementation the attributes are part of each MPU memory region definition
• in a VMSA implementation the translation table entry that defines a virtual memory region also defines the attributes for that region.

B2.1.3 Levels of cache

From ARMv7, the architecturally-defined cache control mechanism covers multiple levels of cache, as described in Caches on page B2-3. It also permits levels of cache beyond the scope of these cache control mechanisms, see System-level caches on page B2-26.

Note
Before ARMv7, the architecturally-defined cache control mechanism covers only a single level of cache, and any support for other levels of cache is IMPLEMENTATION DEFINED.

B2.2 Caches

The concept of caches is described in Caches and memory hierarchy on page A3-51.
This section describes the cache identification and control mechanisms in ARMv7. These are described in the following sections:
• Cache identification
• Cache behavior on page B2-5
• Cache enabling and disabling on page B2-8
• Cache maintenance functionality on page B2-9
• The interaction of cache lockdown with cache maintenance on page B2-18
• Branch predictors on page B2-19
• Ordering of cache and branch predictor maintenance operations on page B2-21
• Multiprocessor effects on cache maintenance operations on page B2-23
• System-level caches on page B2-26.

Note
The cache identification and control mechanisms for previous versions of the ARM architecture are described in:
• Cache support on page AppxG-21, for ARMv6
• Cache support on page AppxH-21, for the ARMv4 and ARMv5 architectures.

B2.2.1 Cache identification

The ARMv7 cache identification consists of a set of registers that describe the implemented caches that are under the control of the processor:
• A single Cache Type Register defines:
  - the minimum line length of any of the instruction caches
  - the minimum line length of any of the data or unified caches
  - the cache indexing and tagging policy of the Level 1 instruction cache.
  For more information, see:
  - c0, Cache Type Register (CTR) on page B3-83, for a VMSA implementation
  - c0, Cache Type Register (CTR) on page B4-34, for a PMSA implementation.
• A single Cache Level ID Register defines:
  - the type of cache implemented at each cache level, up to a maximum of seven levels
  - the Level of Coherence for the caches
  - the Level of Unification for the caches.
  For more information, see:
  - c0, Cache Level ID Register (CLIDR) on page B3-92, for a VMSA implementation
  - c0, Cache Level ID Register (CLIDR) on page B4-41, for a PMSA implementation.
• A single Cache Size Selection Register selects the cache level and cache type of the current Cache Size Identification Register, see:
  - c0, Cache Size Selection Register (CSSELR) on page B3-95, for a VMSA implementation
  - c0, Cache Size Selection Register (CSSELR) on page B4-43, for a PMSA implementation.
• For each implemented cache, across all the levels of caching, a Cache Size Identification Register defines:
  - whether the cache supports Write-Through, Write-Back, Read-Allocate and Write-Allocate
  - the number of sets, associativity and line size of the cache.
  For more information, see:
  - c0, Cache Size ID Registers (CCSIDR) on page B3-91, for a VMSA implementation
  - c0, Cache Size ID Registers (CCSIDR) on page B4-40, for a PMSA implementation.

Identifying the cache resources in ARMv7

From ARMv7 the architecture defines support for multiple levels of cache, up to a maximum of seven levels. This makes the process of identifying the cache resources available to the processor in an ARMv7 implementation more complicated. To obtain this information:
1. Read the Cache Type Register to find the indexing and tagging policy used for the Level 1 instruction cache. This register also provides the size of the smallest cache lines used for the instruction caches, and for the data and unified caches. These values are used in cache maintenance operations.
2. Read the Cache Level ID Register to find what caches are implemented. The register includes seven Cache type fields, for cache levels 1 to 7. Scanning these fields, starting from Level 1, identifies the instruction, data or unified caches implemented at each level. This scan ends when it reaches a level at which no caches are defined. The Cache Level ID Register also provides the Level of Unification and the Level of Coherence for the cache implementation.
3. For each cache identified at stage 2:
   • Write to the Cache Size Selection Register to select the required cache.
     A cache is identified by its level, and whether it is:
     - an instruction cache
     - a data or unified cache.
   • Read the Cache Size ID Register to find details of the cache.

Note
In ARMv6, only the Level 1 caches are architecturally defined, and the Cache Type Register holds details of the caches. For more information, see Cache support on page AppxG-21.

B2.2.2 Cache behavior

The behavior of caches in an ARMv7 implementation is summarized in the following subsections:
• General behavior of the caches
• Behavior of the caches at reset on page B2-6
• Behavior of Preload Data (PLD, PLDW) and Preload Instruction (PLI) with caches on page B2-7.

General behavior of the caches

When a memory location is marked with a Normal Cacheable memory attribute, whether a copy of the memory location is held in a cache still depends on many aspects of the implementation. Typically, the following non-exhaustive list of factors might be involved:
• the size, line length, and associativity of the cache
• the cache allocation algorithm
• activity by other elements of the system that can access the memory
• instruction prefetching algorithms
• data prefetching algorithms
• interrupt behaviors.

Given this range of factors, and the large variety of cache systems that might be implemented, the architecture cannot guarantee whether:
• a memory location present in the cache remains in the cache
• a memory location not present in the cache is brought into the cache.

Instead, the following principles apply to the behavior of caches:
• The architecture has a concept of an entry locked down in the cache. How lockdown is achieved is IMPLEMENTATION DEFINED, and lockdown might not be supported by:
  - a particular implementation
  - some memory attributes.
• An unlocked entry in the cache cannot be relied upon to remain in the cache.
  If an unlocked entry does remain in the cache, it cannot be relied upon to remain incoherent with the rest of memory. In other words, software must not assume that an unlocked item that remains in the cache remains dirty.
• A locked entry in the cache can be relied upon to remain in the cache. A locked entry in the cache cannot be relied upon to remain incoherent with the rest of memory, that is, it cannot be relied on to remain dirty.
  Note
  For more information, see The interaction of cache lockdown with cache maintenance on page B2-18.
• If a memory location is marked as Cacheable there is no mechanism by which it can be guaranteed not to be allocated to an enabled cache at any time. Any application must assume that any Cacheable memory location can be allocated to any enabled cache at any time.
• If the cache is disabled, it is guaranteed that no new allocation of memory locations into the cache will occur.
• If the cache is enabled, it is guaranteed that no memory location that does not have a Cacheable attribute is allocated into the cache.
• If the cache is enabled, it is guaranteed that no memory location is allocated to the cache if its translation table attributes or region attributes prevent privileged read access.
• Any memory location that is marked as Normal Shareable is guaranteed to be coherent with all masters in that shareability domain for data accesses.
• No memory location is guaranteed to remain incoherent with the rest of memory.
• The eviction of a cache entry from a cache level can overwrite memory that has been written by another observer only if the entry contains a memory location that has been written to by a processor that controls that cache. The maximum size of the memory that can be overwritten is called the Cache Writeback Granule.
  In some implementations the CTR identifies the Cache Writeback Granule, see:
  - c0, Cache Type Register (CTR) on page B3-83 for a VMSA implementation
  - c0, Cache Type Register (CTR) on page B4-34 for a PMSA implementation.
• The allocation of a memory location into a cache cannot cause the most recent value of that memory location to become invisible to an observer, if it had previously been visible to that observer.

For the purpose of these principles, a cache entry covers at least 16 bytes and no more than 2KB of contiguous address space, aligned to its size.

In addition, in ARMv7, in the following situations it is UNPREDICTABLE whether the location is returned from cache or from memory:
• The location is not marked as Cacheable but is contained in the cache. This situation can occur if a location is marked as Non-cacheable after it has been allocated into the cache.
• The location is marked as Cacheable and might be contained in the cache, but the cache is disabled.

Behavior of the caches at reset

In ARMv7:
• All caches are disabled at reset.
• An implementation can require the use of a specific cache initialization routine to invalidate its storage array before it is enabled. The exact form of any required initialization routine is IMPLEMENTATION DEFINED, but the routine must be documented clearly as part of the documentation of the device.
• It is IMPLEMENTATION DEFINED whether an access can generate a cache hit when the cache is disabled. If an implementation permits cache hits when the cache is disabled the cache initialization routine must:
  - provide a mechanism to ensure the correct initialization of the caches
  - be documented clearly as part of the documentation of the device.
  In particular, if an implementation permits cache hits when the cache is disabled and the cache contents are not invalidated at reset, the initialization routine must avoid any possibility of running from an uninitialized cache. It is acceptable for an initialization routine to require a fixed instruction sequence to be placed in a restricted range of memory.
• ARM recommends that whenever an invalidation routine is required, it is based on the ARMv7 cache maintenance operations.

When the caches are enabled, their state is UNPREDICTABLE if the appropriate initialization routine has not been performed.

Similar rules apply:
• to branch predictor behavior, see Behavior of the branch predictors at reset on page B2-21
• on an ARMv7-A implementation, to TLB behavior, see TLB behavior at reset on page B3-55.

Note
Before ARMv7, caches are invalidated by the assertion of reset, see Cache behavior at reset on page AppxG-23.

Behavior of Preload Data (PLD, PLDW) and Preload Instruction (PLI) with caches

Preload Data and Preload Instruction operations are provided by the PLD and PLI instructions. These are implemented in the ARM and Thumb instruction sets. The Multiprocessing Extensions add the PLDW instruction.

PLD, PLDW and PLI act as hints to the memory system, and as such their operation does not cause a precise abort to occur. However, a memory operation performed as a result of one of these memory system hints might trigger an asynchronous event, so influencing the execution of the processor. Examples of the asynchronous events that might be triggered are asynchronous aborts and interrupts.

A PLD or PLDW instruction is guaranteed not to cause any effect to the caches, or TLB, or memory other than the effects that, for permission or other reasons, the equivalent load from the same location with the same context and at the same privilege level can cause.
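Software usually reaches PLD and PLDW through a compiler rather than by writing the instructions directly. The sketch below assumes a GCC or Clang toolchain and uses the compiler builtin __builtin_prefetch, which on ARM targets is typically lowered to PLD for a read hint and, where the Multiprocessing Extensions are available, may be lowered to PLDW for a write hint; on other targets it may compile to nothing. The function name sum_with_prefetch and the prefetch distance are hypothetical. Because the preload is only a hint, the result is identical with or without it.

```c
#include <stddef.h>

/* Prefetch distance in elements: an arbitrary, untuned value for illustration. */
#define PREFETCH_AHEAD 16

/* Sum an array, issuing a read-prefetch hint PREFETCH_AHEAD elements ahead.
   __builtin_prefetch(addr, rw, locality) is a GCC/Clang builtin: rw == 0
   requests a read prefetch, rw == 1 a write prefetch, and locality 3 asks
   for the data to be kept in as many cache levels as possible. */
static long sum_with_prefetch(const int *data, size_t n)
{
    long total = 0;
    for (size_t i = 0; i < n; i++) {
        /* Stay inside the array: forming an out-of-bounds pointer is undefined
           in C, even though the hardware hint itself cannot cause a precise abort. */
        if (i + PREFETCH_AHEAD < n)
            __builtin_prefetch(&data[i + PREFETCH_AHEAD], 0, 3);
        total += data[i];
    }
    return total;
}
```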
A PLD or PLDW instruction is guaranteed not to access Strongly-ordered or Device memory.

A PLI instruction is guaranteed not to cause any effect to the caches, or TLB, or memory other than the effects that, for permission or other reasons, the fetch resulting from changing the PC to the location specified by the PLI instruction with the same context and at the same privilege level can cause.

A PLI instruction is guaranteed not to access Strongly-ordered or Device memory. In a VMSA implementation, a PLI instruction must not perform any accesses when the MMU is disabled.

Note
In ARMv6, an instruction prefetch is provided by the optional Prefetch instruction cache line operation in CP15 c7, with encoding <opc1> == 0, <CRm> == c13, <opc2> == 1, see c7, Cache operations on page AppxG-38.

Cache lockdown

Cache lockdown requirements can conflict with the management of hardware coherency. For this reason, ARMv7 introduces significant changes in this area, compared to previous versions of the ARM architecture. These changes recognize that, in many systems, cache lockdown is inappropriate.

For an ARMv7 implementation:
• There is no requirement to support cache lockdown.
• If cache lockdown is supported, the lockdown mechanism is IMPLEMENTATION DEFINED. However, key properties of the interaction of lockdown with the architecture must be described in the implementation documentation.
• The Cache Type Register does not hold information about lockdown. This is a change from ARMv6. However, some CP15 c9 encodings are available for IMPLEMENTATION DEFINED cache lockdown features, see Implementation defined memory system features on page B2-27.

Note
For details of cache lockdown in ARMv6 see c9, Cache lockdown support on page AppxG-45.
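The identification procedure in Identifying the cache resources in ARMv7 can be sketched in C as pure decoding functions over raw register values, which keeps the example runnable without privileged access. The function and type names are hypothetical. The field layouts used (Ctype<n> in CLIDR bits [3n-1:3n-3], LoUIS in bits [23:21], LoC in bits [26:24], LoUU in bits [29:27]; CCSIDR LineSize in bits [2:0], Associativity in bits [12:3], NumSets in bits [27:13], and the WT, WB, RA and WA flags in bits [31:28]) are assumed from the register descriptions referenced in B2.2.1, not from this page.

```c
#include <stdint.h>

/* Ctype encodings in CLIDR. */
enum { CTYPE_NONE = 0, CTYPE_INSTR = 1, CTYPE_DATA = 2, CTYPE_SEPARATE = 3, CTYPE_UNIFIED = 4 };

/* Ctype<level>, for level 1..7, is the 3-bit field at bits [3*level-1 : 3*level-3]. */
static unsigned clidr_ctype(uint32_t clidr, unsigned level)
{
    return (clidr >> (3u * (level - 1u))) & 0x7u;
}

/* Step 2: scan from Level 1 and stop at the first level with no cache. */
static unsigned clidr_levels(uint32_t clidr)
{
    unsigned level = 1;
    while (level <= 7 && clidr_ctype(clidr, level) != CTYPE_NONE)
        level++;
    return level - 1;
}

static unsigned clidr_louis(uint32_t clidr) { return (clidr >> 21) & 0x7u; }
static unsigned clidr_loc(uint32_t clidr)   { return (clidr >> 24) & 0x7u; }
static unsigned clidr_louu(uint32_t clidr)  { return (clidr >> 27) & 0x7u; }

/* Step 3: geometry and policy from a CCSIDR value read after selecting a cache in CSSELR. */
typedef struct {
    unsigned line_bytes, ways, sets;
    int write_through, write_back, read_alloc, write_alloc;
} cache_geometry;

static cache_geometry ccsidr_decode(uint32_t ccsidr)
{
    cache_geometry g;
    g.line_bytes    = 1u << ((ccsidr & 0x7u) + 4u);    /* LineSize = log2(words per line) - 2 */
    g.ways          = ((ccsidr >> 3) & 0x3FFu) + 1u;   /* Associativity = ways - 1 */
    g.sets          = ((ccsidr >> 13) & 0x7FFFu) + 1u; /* NumSets = sets - 1 */
    g.write_through = (int)((ccsidr >> 31) & 1u);
    g.write_back    = (int)((ccsidr >> 30) & 1u);
    g.read_alloc    = (int)((ccsidr >> 29) & 1u);
    g.write_alloc   = (int)((ccsidr >> 28) & 1u);
    return g;
}
```

On hardware, the inputs would typically come from MRC p15, 1, <Rt>, c0, c0, 1 for CLIDR, then, for each cache, MCR p15, 2, <Rt>, c0, c0, 0 to write CSSELR ((level - 1) << 1, with bit 0 set to select an instruction cache), an ISB, and MRC p15, 1, <Rt>, c0, c0, 0 to read CCSIDR.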
B2.2.3 Cache enabling and disabling

Levels of cache on page B2-2 indicates that:
• from ARMv7 the architecture defines the control of multiple levels of cache
• before ARMv7 the architecture defines the control of only one level of cache.

This means the mechanism for enabling and disabling caches changes in ARMv7. In both cases, enabling and disabling of caches is controlled by the SCTLR.C and SCTLR.I bits, see:
• c1, System Control Register (SCTLR) on page B3-96, for a VMSA implementation
• c1, System Control Register (SCTLR) on page B4-45, for a PMSA implementation.

In ARMv7:
• The SCTLR.C bit enables or disables all data and unified caches, across all levels of cache visible to the processor.
• The SCTLR.I bit enables or disables all instruction caches, across all levels of cache visible to the processor.
• If an implementation requires finer-grained control of cache enabling it can implement control bits in the Auxiliary Control Register for this purpose. For example, an implementation might define control bits to enable and disable the caches at a particular level. For more information about the Auxiliary Control Register see:
  - c1, Implementation defined Auxiliary Control Register (ACTLR) on page B3-103, for a VMSA implementation
  - c1, Implementation defined Auxiliary Control Register (ACTLR) on page B4-50, for a PMSA implementation.

Note
In ARMv6, the SCTLR I, C, and W bits provide separate enables for the Level 1 instruction cache (if implemented), the Level 1 data or unified cache, and write buffering. For more information, see c1, System Control Register (SCTLR) on page AppxG-34.
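As a sketch of this single-bit control, the following C fragment computes a new SCTLR value with the C bit (bit 2) and I bit (bit 12) set or cleared, leaving all other bits unchanged. The helper names are hypothetical. The guarded accessors assume a GCC-compatible compiler targeting an ARMv7-A or ARMv7-R processor, and must run at a privileged level. As described in Behavior of the caches at reset, any IMPLEMENTATION DEFINED initialization routine must have run before the caches are enabled.

```c
#include <stdint.h>
#include <stdbool.h>

#define SCTLR_M (UINT32_C(1) << 0)  /* MMU or MPU enable */
#define SCTLR_C (UINT32_C(1) << 2)  /* all data and unified caches, all levels */
#define SCTLR_I (UINT32_C(1) << 12) /* all instruction caches, all levels */

/* Compute a new SCTLR value with C and I set or cleared as requested,
   leaving every other bit unchanged. */
static uint32_t sctlr_with_caches(uint32_t sctlr, bool dcache_on, bool icache_on)
{
    sctlr = dcache_on ? (sctlr | SCTLR_C) : (sctlr & ~SCTLR_C);
    sctlr = icache_on ? (sctlr | SCTLR_I) : (sctlr & ~SCTLR_I);
    return sctlr;
}

#if defined(__arm__) && defined(__ARM_ARCH_PROFILE) && \
    (__ARM_ARCH_PROFILE == 'A' || __ARM_ARCH_PROFILE == 'R')
/* Privileged-only accessors (assumption: GCC inline assembly syntax). */
static inline uint32_t read_sctlr(void)
{
    uint32_t v;
    __asm__ volatile("mrc p15, 0, %0, c1, c0, 0" : "=r"(v));
    return v;
}

static inline void write_sctlr(uint32_t v)
{
    /* The ISB makes the new cache enables visible to subsequent instructions. */
    __asm__ volatile("mcr p15, 0, %0, c1, c0, 0\n\tisb" : : "r"(v) : "memory");
}
#endif
```

Keeping the bit arithmetic separate from the coprocessor accesses lets the value computation be checked on any host.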
When a cache is disabled:
• it is IMPLEMENTATION DEFINED whether a cache hit occurs if a location that is held in the cache is accessed
• any location that is not held in the cache is not brought into the cache as a result of a memory access.

The SCTLR.C and SCTLR.I bits describe the enabling of the caches, and do not affect the memory attributes generated by an enabled MMU or MPU. If the MMU or MPU is disabled, the effects of the SCTLR.C and SCTLR.I bits on the memory attributes are described in:
• Enabling and disabling the MMU on page B3-5 for the MMU
• Behavior when the MPU is disabled on page B4-5 for the MPU.

B2.2.4 Cache maintenance functionality

ARMv7 redefines the required CP15 cache maintenance operations. The two main features of this change are:
• improved support for multiple levels of cache, including abstracting how many levels of cache are implemented
• reducing the architecturally-defined set of operations to the minimum set required for operating systems.

This section only describes cache maintenance for ARMv7. For details of cache maintenance in previous versions of the ARM architecture see:
• c7, Cache operations on page AppxG-38 for ARMv6
• c7, Cache operations on page AppxH-49 for the ARMv4 and ARMv5 architectures.

Terms used in describing cache operations on page B2-10 describes the terms used in this section. Then the following subsections describe the ARMv7 cache maintenance functionality:
• ARMv7 cache maintenance operations on page B2-13
• The ARMv7 abstraction of the cache hierarchy on page B2-15.

Terms used in describing cache operations

This section describes particular terms used in the descriptions of cache maintenance operations.

Cache maintenance operations are defined to act on particular memory locations.
Operations can be defined:
• by the address of the memory location to be maintained, referred to as by MVA
• by a mechanism that describes the location in the hardware of the cache, referred to as by set/way.

In addition, the instruction cache invalidate operation has an option that invalidates all entries in the instruction caches.

The following subsections define the terms used to describe the cache operations:
• Operations by MVA
• Operations by set/way
• Clean, Invalidate, and Clean and Invalidate on page B2-11.

Operations by MVA

For cache operations by MVA, these terms relate to memory addressing, and in particular the relation between:
• Modified Virtual Address (MVA)
• Virtual Address (VA)
• Physical Address (PA).

The term Modified Virtual Address relates to the Fast Context Switch Extension (FCSE) mechanism, described in Appendix E Fast Context Switch Extension (FCSE). Use of the FCSE is deprecated in ARMv6 and the FCSE is optional in ARMv7. When the FCSE is absent or disabled, the MVA and VA have the same value. However, the term MVA is used throughout this section, and elsewhere in this manual, for cache and TLB operations. This is consistent with previous issues of the ARM Architecture Reference Manual.

Virtual addresses only exist in systems with an MMU. When no MMU is implemented or the MMU is disabled, the MVA and VA are identical to the PA.

In the cache operations, any operation described as operating by MVA includes as part of any required MVA to PA translation:
• the current system Address Space Identifier (ASID)
• the current security state, if the Security Extensions are implemented.

Operations by set/way

Cache maintenance operations by set/way refer to the particular structures in a cache. Three parameters describe the location in a cache hierarchy that an operation works on. These parameters are:

Level
    The cache level of the hierarchy.
    The number of levels of cache is IMPLEMENTATION DEFINED, and can be determined from the Cache Level ID Register, see:
    • c0, Cache Level ID Register (CLIDR) on page B3-92 for a VMSA implementation
    • c0, Cache Level ID Register (CLIDR) on page B4-41 for a PMSA implementation.
    In the ARM architecture, the lower numbered levels are those closest to the processor, see Memory hierarchy on page A3-52.

Set
    Each level of a cache is split up into a number of sets. Each set is a set of locations in a cache level that an address can be assigned to. Usually, the set number is an IMPLEMENTATION DEFINED function of an address. In the ARM architecture, sets are numbered from 0.

Way
    The Associativity of a cache defines the number of locations in a set that an address can be assigned to. The way number specifies a location in a set. In the ARM architecture, ways are numbered from 0.

Cache maintenance operations that work by set/way use the level, set and way values to determine the location acted on by the operation. The address in memory that corresponds to this cache location is determined by the cache.

Note
Because the allocation of a memory address to a cache location is entirely IMPLEMENTATION DEFINED, ARM expects that most portable code will use only the set/way operations as single steps in a routine to perform maintenance on the entire cache.

Clean, Invalidate, and Clean and Invalidate

Caches introduce coherency problems in two possible directions:
1. An update to a memory location by a processor that accesses a cache might not be visible to other observers that can access memory. This can occur because new updates are still in the cache and are not yet visible to the other observers that do not access that cache.
2. Updates to memory locations by other observers that can access memory might not be visible to a processor that accesses a cache. This can occur when the cache contains an old, or stale, copy of the memory location that has been updated.

The Clean and Invalidate operations address these two issues. The definitions of these operations are:

Clean
    A cache clean operation ensures that updates made by an observer that controls the cache are made visible to other observers that can access memory at the point to which the operation is performed. Once the Clean has completed, the new memory values are guaranteed to be visible to the point to which the operation is performed, for example to the point of unification.
    The cleaning of a cache entry from a cache can overwrite memory that has been written by another observer only if the entry contains a location that has been written to by a processor that controls that cache.

Invalidate
    A cache invalidate operation ensures that updates made visible by observers that access memory at the point to which the invalidate is defined are made visible to an observer that controls the cache. This might result in the loss of updates to the locations affected by the invalidate operation that have been written by observers that access the cache.
    If the address of an entry on which the invalidate operates does not have a Normal Cacheable attribute, or if the cache is disabled, then an invalidate operation also ensures that this address is not present in the cache.
    Note
    Entries for addresses with a Normal Cacheable attribute can be allocated to an enabled cache at any time, and so the cache invalidate operation cannot ensure that the address is not present in the cache.
Clean and Invalidate
    A cache clean and invalidate operation behaves as the execution of a clean operation followed immediately by an invalidate operation. Both operations are performed to the same location.

The points to which a cache maintenance operation can be defined differ depending on whether the operation is by MVA or by set/way:
• For set/way operations, and for All (entire cache) operations, the point is defined to be to the next level of caching.
• For MVA operations, two conceptual points are defined:

  Point of coherency (PoC)
      For a particular MVA, the PoC is the point at which all agents that can access memory are guaranteed to see the same copy of a memory location. In many cases, this is effectively the main system memory, although the architecture does not prohibit the implementation of caches beyond the PoC that have no effect on the coherence between memory system agents.

  Point of unification (PoU)
      The PoU for a processor is the point by which the instruction and data caches and the translation table walks of that processor are guaranteed to see the same copy of a memory location. In many cases, the point of unification is the point in a uniprocessor memory system by which the instruction and data caches and the translation table walks have merged.
      The PoU for an Inner Shareable shareability domain is the point by which the instruction and data caches and the translation table walks of all the processors in that Inner Shareable shareability domain are guaranteed to see the same copy of a memory location.
      Defining this point permits self-modifying code to ensure future instruction fetches are associated with the modified version of the code by using the standard correctness policy of:
      1. clean data cache entry by address
      2. invalidate instruction cache entry by address.
      The PoU also enables a uniprocessor system which does not implement the Multiprocessing Extensions to use the clean data cache entry operation to ensure that all writes to the translation tables are visible to the translation table walk hardware.

Three field definitions in the Cache Level ID Register relate to these conceptual points:

Level of Coherence
    The Level of Coherence field defines the first level of cache that does not have to be cleaned or invalidated when cleaning or invalidating to the point of coherency. The value in the register is one less than the cache level, so a value of 0 indicates Level 1 cache. For example, if the Level of Coherence field contains the value 3:
    • Level 4 cache is the first level that does not have to be cleaned or invalidated
    • therefore, a clean to the point of coherency operation requires the Level 1, Level 2 and Level 3 caches to be cleaned.
    The specified Level of Coherence can be a level that is not implemented, indicating that all implemented caches are before the point of coherency.

Level of Unification Uniprocessor
    The Level of Unification Uniprocessor field defines the first level of cache that does not have to be cleaned or invalidated when cleaning or invalidating to the point of unification for the processor. As with the Level of Coherence, the value in the register is one less than the cache level, so a value of 0 indicates Level 1 cache.
    The specified Level of Unification Uniprocessor can be a level that is not implemented, indicating that all implemented caches are before the point of unification.

Level of Unification Inner Shareable
    The Level of Unification Inner Shareable field defines the first level of cache that does not have to be cleaned or invalidated when cleaning or invalidating to the point of unification for the Inner Shareable shareability domain.
    As with the Level of Coherence, the value in the register is one less than the cache level, so a value of 0 indicates Level 1 cache.
    The specified Level of Unification Inner Shareable can be a level that is not implemented, indicating that all implemented caches are before the point of unification.
    The Level of Unification Inner Shareable field is RAZ in implementations that do not implement the Multiprocessing Extensions.

For more information, see:
• c0, Cache Level ID Register (CLIDR) on page B3-92 for a VMSA implementation
• c0, Cache Level ID Register (CLIDR) on page B4-41 for a PMSA implementation.

ARMv7 cache maintenance operations

Cache maintenance operations are performed using accesses to CP15 c7. The operations are described in:
• CP15 c7, Cache and branch predictor maintenance functions on page B3-126, for a VMSA implementation
• CP15 c7, Cache and branch predictor maintenance functions on page B4-68, for a PMSA implementation.

The operations required by ARMv7 are:

Data cache and unified cache line operations
    Any of these operations can be applied to:
    • any data cache
    • any unified cache.
    The supported operations are:
    Invalidate by MVA
        Performs an invalidate of a data or unified cache line based on the address it contains.
    Invalidate by set/way
        Performs an invalidate of a data or unified cache line based on its location in the cache hierarchy.
    Clean by MVA
        Performs a clean of a data or unified cache line based on the address it contains.
    Clean by set/way
        Performs a clean of a data or unified cache line based on its location in the cache hierarchy.
    Clean and Invalidate by MVA
        Performs a clean and invalidate of a data or unified cache line based on the address it contains.
    Clean and Invalidate by set/way
        Performs a clean and invalidate of a data or unified cache line based on its location in the cache hierarchy.
Instruction cache operations
    Invalidate by MVA
        Performs an invalidate of an instruction cache line based on the address it contains.
    Invalidate All
        Performs an invalidate of the entire instruction cache or caches, and of all Branch Prediction caches.

Note
Other cache maintenance operations specified in ARMv6 are not supported in ARMv7. Their associated encodings in CP15 c7 are UNPREDICTABLE.

An ARMv7 implementation can add IMPLEMENTATION DEFINED cache maintenance functionality using CP15 c15 operations, if this is required.

The ARMv7 specification of the cache maintenance operations describes what they are guaranteed to do in a system. It does not limit other behaviors that might occur, provided they are consistent with the requirements for cache behavior described in Cache behavior on page B2-5. This means that, as a side-effect of a cache maintenance operation:
• any location in the cache might be cleaned
• any unlocked location in the cache might be cleaned and invalidated.

Note
ARM recommends that, for best performance, such side-effects are kept to a minimum. In particular, when the Security Extensions are implemented, ARM strongly recommends that the side-effects of operations performed in Non-secure state do not have a significant performance impact on execution in Secure state.

Effect of the Security Extensions on the cache maintenance operations
When the Security Extensions are implemented, each security state has its own physical address space. For details of how this affects the cache maintenance operations see The effect of the Security Extensions on the cache operations on page B3-27.
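The operations required by ARMv7, listed earlier in this subsection, are all issued as MCR instructions to CP15 c7. As an informal cross-reference, the Python sketch below tabulates their encodings as (opc1, CRm, opc2) triples. The clean by set/way encoding (c7, c10, 2) matches the example code later in this section, and the branch predictor and Inner Shareable encodings match those quoted elsewhere in this chapter; the short names (DCCSW and so on) follow the Multiprocessing Extensions text. The dictionary and the helper `mcr_text` are hypothetical conveniences, and the table should be checked against CP15 c7, Cache and branch predictor maintenance functions on page B3-126 or B4-68 before it is relied on.

```python
"""Cross-reference of ARMv7 CP15 c7 cache and branch predictor maintenance
encodings. Each entry is (opc1, CRm, opc2); CRn is always c7."""

CP15_C7_OPS = {
    # Instruction cache and branch predictor operations
    "ICIALLU":   (0, 5, 0),   # Invalidate all instruction caches (and branch predictors)
    "ICIMVAU":   (0, 5, 1),   # Invalidate instruction cache line by MVA to PoU
    "BPIALL":    (0, 5, 6),   # Invalidate entire branch predictor array
    "BPIMVA":    (0, 5, 7),   # Invalidate MVA from branch predictor array
    # Data and unified cache line operations
    "DCIMVAC":   (0, 6, 1),   # Invalidate by MVA to PoC
    "DCISW":     (0, 6, 2),   # Invalidate by set/way
    "DCCMVAC":   (0, 10, 1),  # Clean by MVA to PoC
    "DCCSW":     (0, 10, 2),  # Clean by set/way
    "DCCMVAU":   (0, 11, 1),  # Clean by MVA to PoU
    "DCCIMVAC":  (0, 14, 1),  # Clean and invalidate by MVA to PoC
    "DCCISW":    (0, 14, 2),  # Clean and invalidate by set/way
    # Multiprocessing Extensions, Inner Shareable variants
    "ICIALLUIS": (0, 1, 0),
    "BPIALLIS":  (0, 1, 6),
}


def mcr_text(op: str, rt: str = "Rt") -> str:
    """Return the MCR instruction text for a named c7 maintenance operation."""
    opc1, crm, opc2 = CP15_C7_OPS[op]
    return f"MCR p15, {opc1}, {rt}, c7, c{crm}, {opc2}"


if __name__ == "__main__":
    for name in ("DCCSW", "BPIALL", "ICIALLUIS"):
        print(f"{name:10} {mcr_text(name)}")
```

For example, `mcr_text("DCCSW", "R11")` produces the same instruction text as the clean by set/way line in the example code.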
The ARMv7 abstraction of the cache hierarchy
The following subsections describe the ARMv7 abstraction of the cache hierarchy:
• Cache hierarchy abstraction for address-based operations
• Cache hierarchy abstraction for set/way-based operations on page B2-16.
Example code for cache maintenance operations on page B2-16 gives an example of cache maintenance code that can be adapted for other cache operations, and Boundary conditions for cache maintenance operations on page B2-17 gives more information about the cache operations.

Cache hierarchy abstraction for address-based operations
The address-based cache operations are described as operating by MVA. Each of these operations is always qualified as being one of:
• performed to the point of coherency
• performed to the point of unification.
See Terms used in describing cache operations on page B2-10 for definitions of point of coherency and point of unification, and more information about possible meanings of MVA.
This means that the full list of possible address-based cache operations is:
• Invalidate data cache or unified cache line by MVA to the point of coherency
• Clean data cache or unified cache line by MVA to the point of coherency
• Clean data cache or unified cache line by MVA to the point of unification
• Clean and invalidate data cache or unified cache line by MVA to the point of coherency
• Invalidate instruction cache line by MVA to the point of unification.

The Cache Type Register holds minimum line length values for:
• the instruction caches
• the data and unified caches.
These values enable a range of addresses to be invalidated in an efficient manner. For details of the register see:
• c0, Cache Type Register (CTR) on page B3-83 for a VMSA implementation
• c0, Cache Type Register (CTR) on page B4-34 for a PMSA implementation.
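Because the Cache Type Register reports the smallest line length, software can walk an address range one minimum-sized line at a time and be sure that every line holding part of the range is operated on; caches with longer lines simply see the same line more than once. The Python sketch below shows only that address arithmetic. It assumes the ARMv7 CTR layout in which DminLine is bits [19:16] and IminLine is bits [3:0], each holding log2 of the line length in 4-byte words; the function names and the example CTR value are hypothetical. On hardware, each yielded MVA would be the operand of an address-based operation such as clean data cache line by MVA to the point of coherency.

```python
"""Sketch: enumerate the MVAs needed to maintain an address range by line,
using the minimum line lengths reported by the ARMv7 Cache Type Register."""

from typing import Iterator


def ctr_dminline_bytes(ctr: int) -> int:
    """Smallest data/unified cache line, in bytes (CTR.DminLine, bits [19:16])."""
    return 4 << ((ctr >> 16) & 0xF)


def ctr_iminline_bytes(ctr: int) -> int:
    """Smallest instruction cache line, in bytes (CTR.IminLine, bits [3:0])."""
    return 4 << (ctr & 0xF)


def line_mvas(start: int, end: int, line_bytes: int) -> Iterator[int]:
    """Yield one line-aligned MVA per line overlapping [start, end).

    Stepping by the minimum line length never skips a line, even if some
    caches in the hierarchy have longer lines (they just see repeats)."""
    addr = start & ~(line_bytes - 1)   # round down to a line boundary
    while addr < end:
        yield addr
        addr += line_bytes


if __name__ == "__main__":
    ctr = 0x8444C004                   # illustrative value, not from a real part
    d = ctr_dminline_bytes(ctr)        # 64-byte minimum D-side line
    for mva in line_mvas(0x1010, 0x1090, d):
        print(hex(mva))                # each would be a DCCMVAC operand on hardware
```

The range end is exclusive, so an empty range yields no operations, and a range that starts part-way through a line still covers that whole line.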
For details of the CP15 c7 encodings for all cache maintenance operations see:
• CP15 c7, Cache and branch predictor maintenance functions on page B3-126 for a VMSA implementation
• CP15 c7, Cache and branch predictor maintenance functions on page B4-68 for a PMSA implementation.

Cache hierarchy abstraction for set/way-based operations
The set/way-based cache maintenance operations are:
• Invalidate data cache or unified cache line by set/way
• Clean data cache or unified cache line by set/way
• Clean and invalidate data cache or unified cache line by set/way.
The CP15 c7 encodings of these operations include a field that must be used to specify the cache level for the operation:
• a clean operation cleans from the level of cache specified through to at least the next level of cache, moving further from the processor
• an invalidate operation invalidates only at the level specified.
In addition to these set/way operations, a cache operation is provided for instruction cache maintenance, to invalidate all instruction cache lines to the point of unification.
For details of the CP15 c7 encodings for all cache maintenance operations see:
• CP15 c7, Cache and branch predictor maintenance functions on page B3-126 for a VMSA implementation
• CP15 c7, Cache and branch predictor maintenance functions on page B4-68 for a PMSA implementation.

Example code for cache maintenance operations
This code sequence illustrates a generic mechanism for cleaning the entire data or unified cache to the point of coherency:

        MRC     p15, 1, R0, c0, c0, 1   ; Read CLIDR
        ANDS    R3, R0, #&7000000       ; Extract the LoC field, bits [26:24]
        MOV     R3, R3, LSR #23         ; Cache level value (naturally aligned)
        BEQ     Finished
        MOV     R10, #0                 ; Start at cache level 0 (Level 1 cache)
Loop1   ADD     R2, R10, R10, LSR #1    ; Work out 3 x cache level
        MOV     R1, R0, LSR R2          ; bottom 3 bits are the Cache type for this level
        AND     R1, R1, #7              ; get those 3 bits alone
        CMP     R1, #2
        BLT     Skip                    ; no cache or only instruction cache at this level
        MCR     p15, 2, R10, c0, c0, 0  ; write the Cache Size selection register
        ISB                             ; ISB to sync the change to the CacheSizeID reg
        MRC     p15, 1, R1, c0, c0, 0   ; reads current Cache Size ID register
        AND     R2, R1, #&7             ; extract the line length field
        ADD     R2, R2, #4              ; add 4 for the line length offset (log2 16 bytes)
        LDR     R4, =0x3FF
        ANDS    R4, R4, R1, LSR #3      ; R4 is the max number of the way size (right aligned)
        CLZ     R5, R4                  ; R5 is the bit position of the way size increment
        LDR     R7, =0x00007FFF
        ANDS    R7, R7, R1, LSR #13     ; R7 is the max number of the index size (right aligned)
Loop2   MOV     R9, R4                  ; R9 working copy of the max way size (right aligned)
Loop3   ORR     R11, R10, R9, LSL R5    ; factor in the way number and cache number into R11
        ORR     R11, R11, R7, LSL R2    ; factor in the index number
        MCR     p15, 0, R11, c7, c10, 2 ; clean by set/way
        SUBS    R9, R9, #1              ; decrement the way number
        BGE     Loop3
        SUBS    R7, R7, #1              ; decrement the index
        BGE     Loop2
Skip    ADD     R10, R10, #2            ; increment the cache number
        CMP     R3, R10
        BGT     Loop1
Finished

Similar approaches can be used for all cache maintenance operations.

Boundary conditions for cache maintenance operations
Cache maintenance operations operate on the caches both when the caches are enabled and when they are disabled.
For the address-based cache maintenance operations, the operations operate on the caches regardless of the memory type and cacheability attributes marked for the memory address in the VMSA translation table entries or in the PMSA section attributes.
This means that the cache operations take no account of:
• whether the address accessed:
    - is Strongly-ordered, Device or Normal memory
    - has a Cacheable attribute or the Non-cacheable attribute
• the domain control of the address accessed
• the access permissions for the address accessed.
Therefore, software can:
• ensure there are no more allocations to the caches of a range of addresses because of prefetching effects or interrupts
• at the same time, continue to perform cache maintenance operations on these addresses.
In a VMSA implementation, some cache maintenance operations can generate an MMU fault, see MMU faults on page B3-40.

B2.2.5 The interaction of cache lockdown with cache maintenance
The interaction of cache lockdown and cache maintenance operations is IMPLEMENTATION DEFINED. However, an architecturally-defined cache maintenance operation on a locked cache line must comply with the following general rules:
• The effect of these operations on locked cache entries is IMPLEMENTATION DEFINED:
    - cache clean by set/way
    - cache invalidate by set/way
    - cache clean and invalidate by set/way
    - instruction cache invalidate all.
  However, one of the following approaches must be adopted in all these cases:
  1. If the operation specified an invalidation, a locked entry is not invalidated from the cache. If the operation specified a clean, it is IMPLEMENTATION DEFINED whether locked entries are cleaned.
  2. If an entry is locked down, or could be locked down, an IMPLEMENTATION DEFINED Data Abort exception is generated, using the fault status code defined for this purpose in CP15 c5, see either:
      - Fault Status and Fault Address registers in a VMSA implementation on page B3-48
      - Fault Status and Fault Address registers in a PMSA implementation on page B4-18.
  This permits a typical usage model for cache invalidate routines to operate on a large range of addresses by performing the required operation on the entire cache, without having to consider whether any cache entries are locked. The operation performed is either an invalidate, or a clean and invalidate.
• The effect of these operations is IMPLEMENTATION DEFINED:
    - cache clean by MVA
    - cache invalidate by MVA
    - cache clean and invalidate by MVA.
  However, one of the following approaches must be adopted in all these cases:
  1. If the operation specified an invalidation, a locked entry is invalidated from the cache. For the clean and invalidate operation, the entry must be cleaned before it is invalidated.
  2. If the operation specified an invalidation, a locked entry is not invalidated from the cache. If the operation specified a clean, it is IMPLEMENTATION DEFINED whether locked entries are cleaned.
  3. If an entry is locked down, or could be locked down, an IMPLEMENTATION DEFINED Data Abort exception is generated, using the fault status code defined for this purpose in CP15 c5, see either:
      - Fault Status and Fault Address registers in a VMSA implementation on page B3-48
      - Fault Status and Fault Address registers in a PMSA implementation on page B4-18.
An implementation that uses the abort mechanisms for entries that could be locked must:
• document IMPLEMENTATION DEFINED code sequences that then perform the required operation on entries that are not locked down
• implement one of the other permitted alternatives for the locked entries.
ARM recommends that, where possible, architecturally-defined operations are used in such code sequences. This minimizes the number of customized operations required.
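The difference between the two approaches permitted for the set/way operations can be seen in a toy model of a whole-cache invalidate routine. The Python sketch below is illustrative only: the `CacheLine` and `LockdownAbort` names are hypothetical, a real routine would issue set/way operations rather than update objects, and the model aborts only on entries that are actually locked, whereas an implementation using the abort approach might also abort on entries that could be locked.

```python
"""Toy model contrasting the two permitted behaviors when a set/way
invalidate meets a locked cache entry."""

from dataclasses import dataclass


class LockdownAbort(Exception):
    """Stands in for the IMPLEMENTATION DEFINED Data Abort (approach 2)."""


@dataclass
class CacheLine:
    valid: bool = False
    locked: bool = False


def invalidate_entire_cache(lines: list, approach: int) -> None:
    """Invalidate every line, the way a whole-cache set/way routine would.

    approach 1: locked entries are silently left valid.
    approach 2: touching a locked entry raises LockdownAbort.
    """
    if approach not in (1, 2):
        raise ValueError("approach must be 1 or 2")
    for line in lines:
        if line.locked:
            if approach == 2:
                raise LockdownAbort("set/way invalidate hit a locked entry")
            continue  # approach 1: locked entry is not invalidated
        line.valid = False


if __name__ == "__main__":
    cache = [CacheLine(True), CacheLine(True, locked=True)]
    invalidate_entire_cache(cache, approach=1)
    print([line.valid for line in cache])   # locked line stays valid
```

Under approach 1 the same routine works whether or not anything is locked; under approach 2 the caller needs the documented IMPLEMENTATION DEFINED sequence, or must first ensure that no entries are locked.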
In addition, any implementation that uses aborts for handling cache maintenance operations on entries that might be locked must provide a mechanism that can be used to ensure that no entries are locked in the cache. The reset setting of the cache must be that no cache entries are locked.
On an ARMv7-A implementation, similar rules apply to TLB lockdown, see The interaction of TLB maintenance operations with TLB lockdown on page B3-57.

Additional cache functions for the implementation of lockdown
An implementation can add further cache maintenance functions for the handling of lockdown in the IMPLEMENTATION DEFINED spaces reserved for Cache Lockdown. Examples of possible functions are:
• Operations that unlock all cache entries.
• Operations that preload into specific levels of cache. These operations might be provided for instruction caches, data caches, or both.
An implementation can add other functions as required.

B2.2.6 Branch predictors
Branch predictor hardware typically uses a form of cache to hold branch information. The ARM architecture permits this branch predictor hardware to be visible to the functional behavior of software, and so the branch predictor is not architecturally invisible. This means that under some circumstances software must perform branch predictor maintenance to avoid incorrect execution caused by out-of-date entries in the branch predictor.

Branch prediction maintenance operations
In some implementations, to ensure correct operation it might be necessary to invalidate branch prediction entries on a change of instruction or instruction address mapping. For more information, see Branch predictor maintenance operations and the memory order model on page B2-20.
Two CP15 c7 operations apply to branch prediction hardware:
MCR p15, 0, Rt, c7, c5, 6    Invalidate entire branch predictor array
MCR p15, 0, Rt, c7, c5, 7    Invalidate MVA from branch predictor array
In ARMv7, these functions can behave as NOPs if the operation of Branch Prediction hardware is not architecturally visible.
The invalidate entire branch predictor array operation ensures that any location held in the branch predictor has no functional effect on execution.
The invalidate MVA from branch predictor array operation operates on the address of the branch instruction. It includes the current system ASID and the security state when determining which line is affected as part of any required VA to PA translation. Security state checking is performed only if the Security Extensions are implemented. The invalidate by MVA operation can affect other branch predictor entries.

Note
The architecture does not make visible the range of addresses in a branch predictor to which the invalidate operation applies. This means the address used in the invalidate MVA instruction must be the address of the branch to be invalidated.

If the correct functioning of a system requires invalidation of the branch predictor when there are changes to the instructions in memory, the invalidate entire instruction cache operation also causes an invalidate entire branch predictor array operation.

Branch predictor maintenance operations and the memory order model
The following rule describes the effect of the memory order model on the branch predictor maintenance operations:
• Any invalidation of the branch predictor is guaranteed to take effect only after one of the following:
    - execution of an ISB instruction
    - taking an exception
    - return from an exception.
  Therefore, if a branch instruction appears between an invalidate branch prediction instruction and an ISB instruction, exception entry or exception return, it is UNPREDICTABLE whether the branch instruction is affected by the invalidate. Software must avoid this ordering of instructions, because it might lead to UNPREDICTABLE behavior.
The branch predictor maintenance operations must be used to invalidate entries in the branch predictor after any of the following events:
• enabling or disabling the MMU
• writing new data to instruction locations
• writing new mappings to the translation tables
• changes to the TTBR0, TTBR1, or TTBCR registers, unless accompanied by a change to the ContextID or the FCSE ProcessID.
Failure to invalidate entries might give UNPREDICTABLE results, caused by the execution of old branches.
In ARMv7, there is no requirement to use the branch predictor maintenance operations to invalidate the branch predictor after:
• changing the ContextID or FCSE ProcessID
• a cache operation that is identified as also flushing the branch target cache, see either:
    - CP15 c7, Cache and branch predictor maintenance functions on page B3-126 for a VMSA implementation
    - CP15 c7, Cache and branch predictor maintenance functions on page B4-68 for a PMSA implementation.

Note
In ARMv6, the branch predictor must be invalidated after a change to the ContextID or FCSE ProcessID, see c13, Context ID support on page AppxG-54.

Behavior of the branch predictors at reset
In ARMv7:
• If branch predictors are not architecturally invisible, the branch prediction logic is disabled at reset.
• An implementation can require the use of a specific branch predictor initialization routine to invalidate its storage array before it is enabled.
  The exact form of any required initialization routine is IMPLEMENTATION DEFINED, but the routine must be documented clearly as part of the documentation of the device.
• ARM recommends that whenever an invalidation routine is required, it is based on the ARMv7 branch predictor maintenance operations.
When it is enabled, the state of the branch predictor logic is UNPREDICTABLE if the appropriate initialization routine has not been performed.
Similar rules apply:
• to cache behavior, see Behavior of the caches at reset on page B2-6
• on an ARMv7-A implementation, to TLB behavior, see TLB behavior at reset on page B3-55.

B2.2.7 Ordering of cache and branch predictor maintenance operations
The following rules describe the effect of the memory order model on the cache and branch predictor maintenance operations:
• All cache and branch predictor maintenance operations are executed, relative to each other, in program order.
• On an ARMv7-A implementation, where a cache or branch predictor maintenance operation appears in program order before a change to the translation tables, the cache or branch predictor maintenance operation is guaranteed to take place before the change to the translation tables is visible.
• On an ARMv7-A implementation, where a change of the translation tables appears in program order before a cache or branch predictor maintenance operation, that change is guaranteed to be visible only after the sequence outlined in TLB maintenance operations and the memory order model on page B3-59 is executed.
• A DMB instruction causes the effect of all data cache or unified cache maintenance operations appearing in program order before the DMB to be visible to all explicit load and store operations appearing in program order after the DMB.
  It also ensures that the effects of any data cache or unified cache maintenance operations appearing in program order before the DMB are observable by any observer in the same required shareability domain before any data cache or unified cache maintenance or explicit memory operations appearing in program order after the DMB are observed by the same observer. Completion of the DMB does not guarantee the visibility of all data to other observers. For example, some data might not be visible to a translation table walk, or to instruction fetches.
• A DSB instruction causes the completion of all cache maintenance operations appearing in program order before the DSB instruction.
• An ISB instruction, an exception entry or a return from exception causes the effect of all branch predictor maintenance operations appearing in program order before the ISB instruction, exception entry or exception return to be visible to all instructions after the ISB instruction, exception entry or exception return.
• Any data cache or unified cache maintenance operation by MVA must be executed in program order relative to any explicit load or store on the same processor to an address covered by the MVA of the cache operation. The order of memory accesses that result from the cache maintenance operation, relative to any other memory accesses, is subject to the memory ordering rules. For more information, see Ordering requirements for memory accesses on page A3-45.
• There is no restriction on the ordering of data cache or unified cache maintenance operations by MVA relative to any explicit load or store on the same processor where the address of the explicit load or store is not covered by the MVA of the cache operation. Where the ordering must be restricted, a DMB instruction must be inserted to enforce ordering.
• There is no restriction on the ordering of a data cache or unified cache maintenance operation by set/way relative to any explicit load or store on the same processor.
  Where the ordering must be restricted, a DMB instruction must be inserted to enforce ordering.
• The execution of a data cache or unified cache maintenance operation by set/way might not be visible to other observers in the system until after a DSB instruction is executed.
• The execution of an instruction cache maintenance operation is guaranteed to be complete only after the execution of a DSB instruction.
• The completion of an instruction cache maintenance operation is guaranteed to be visible to the instruction fetch only after the execution of an ISB instruction or an exception entry or return from exception.

The last two points mean that the sequence of cache maintenance operations for a line of self-modifying code on a uniprocessor system is:

        ; Enter this code with <Rt> containing the new 32-bit instruction. Use STRH in the first
        ; line instead of STR for a 16-bit instruction.
        STR     <Rt>, [instruction location]
        Clean data cache by MVA to point of unification [instruction location]
        DSB                     ; Ensures visibility of the data cleaned from the data cache
        Invalidate instruction cache by MVA [instruction location]
        Invalidate BTC entry by MVA [instruction location]
        DSB                     ; Ensures completion of the instruction cache invalidation
        ISB

B2.2.8 Multiprocessor effects on cache maintenance operations
This section describes the multiprocessor effects on cache maintenance operations for the base ARMv7 architecture and the base ARMv7 architecture with Multiprocessing Extensions.

Base ARMv7 architecture
The base ARMv7 architecture defines that all cache maintenance operations apply only to the caches directly attached to the processor on which the operation is executed. There is no requirement that cache maintenance operations influence all processors with which the data can be shared.
In porting an architecturally portable multiprocessor operating system to ARMv7, when a cache maintenance operation is performed, Inter-Processor Interrupts (IPIs) must be used to inform other processors in a multiprocessor configuration that they must perform the equivalent operation.

Multiprocessing Extensions
To improve the implementation of multiprocessor systems, a set of extensions to ARMv7, called the Multiprocessing Extensions, has been introduced. These expand the role of cache and branch predictor maintenance operations in the multiprocessing system. For the VMSA architecture, the Multiprocessing Extensions also extend the role of TLB operations. For more information see Multiprocessor effects on TLB maintenance operations on page B3-62.
The extensions can be implemented in a uniprocessor system with no hardware support for cache coherency. In such a system, the Inner Shareable and Outer Shareable domains are limited to the single processor, and all instructions defined to apply to the Inner Shareable domain behave as aliases of the local operations.

Data and Unified cache operations to the point of coherency
The following instructions have an effect on data and unified caches to the point of coherency, and must affect the caches of other processors in the shareability domain described by the shareability attributes of the MVA passed with the instruction:
• invalidate data, or unified, cache line by MVA to the point of coherency (DCIMVAC)
• clean data, or unified, cache line by MVA to the point of coherency (DCCMVAC)
• clean and invalidate data, or unified, cache line by MVA to the point of coherency (DCCIMVAC).
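The requirement above, and the per-shareability detail in Table B2-1 and Table B2-2, amount to a function from the shareability attribute of the MVA to a minimum set of affected processors. The Python sketch below models only that set, not the point to which the operation occurs. It assumes the executing processor's Inner Shareable and Outer Shareable domains are known as sets of processor IDs; `Share`, `Domains`, `poc_op_targets` and `pou_op_targets` are hypothetical names. Because the tables give minimum sets, an implementation might affect more processors than the model returns.

```python
"""Minimum set of processors affected by MVA-based maintenance operations
under the Multiprocessing Extensions (after Tables B2-1 and B2-2)."""

from dataclasses import dataclass
from enum import Enum


class Share(Enum):
    NON_SHAREABLE = "NS"
    INNER = "IS"
    OUTER = "OS"


@dataclass(frozen=True)
class Domains:
    """Shareability domains containing the executing processor."""
    self_id: int
    inner: frozenset[int]   # Inner Shareable domain (includes self_id)
    outer: frozenset[int]   # Outer Shareable domain (superset of inner)


def poc_op_targets(share: Share, d: Domains) -> frozenset[int]:
    """DCIMVAC, DCCMVAC, DCCIMVAC: scope follows the address attribute."""
    if share is Share.NON_SHAREABLE:
        return frozenset({d.self_id})
    return d.inner if share is Share.INNER else d.outer


def pou_op_targets(share: Share, d: Domains) -> frozenset[int]:
    """DCCMVAU, ICIMVAU, BPIMVA: never wider than the Inner Shareable domain."""
    if share is Share.NON_SHAREABLE:
        return frozenset({d.self_id})
    return d.inner


if __name__ == "__main__":
    d = Domains(0, frozenset({0, 1}), frozenset({0, 1, 2, 3}))
    for s in Share:
        print(s.value, sorted(poc_op_targets(s, d)), sorted(pou_op_targets(s, d)))
```

The asymmetry between the two functions is the point of the tables: an Outer Shareable address widens the scope of a point-of-coherency operation, but not of the point-of-unification operations.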
Table B2-1 shows, for these instructions, the minimum set of processors that they affect, and the earliest point to which the operations occur, according to the shareability attribute of the address used.

Table B2-1 Processors affected by Data and Unified cache operations

Shareability: Non-shareable
    Processors affected: the processor executing the instruction.
    Point that the operations occur to: point of coherency of the entire system.
Shareability: Inner Shareable
    Processors affected: all processors in the same Inner Shareable shareability domain as the processor executing the instruction.
    Point that the operations occur to: point of coherency of the entire system.
Shareability: Outer Shareable
    Processors affected: all processors in the same Outer Shareable shareability domain as the processor executing the instruction.
    Point that the operations occur to: point of coherency of the entire system.

Address-based cache maintenance operations not to the point of coherency
The following operations are redefined in the Multiprocessing Extensions:
• Clean data, or unified, cache line by MVA to the point of unification (DCCMVAU)
• Invalidate instruction cache line by MVA to point of unification (ICIMVAU)
• Invalidate MVA from branch predictor array (BPIMVA).
Table B2-2 shows, for these instructions, the minimum set of processors that they affect, and the earliest point to which the operations occur, according to the shareability attribute of the address used.
Table B2-2 Processors affected by address-based cache maintenance operations

Shareability of the address: Non-shareable
    Processors affected: the processor executing the instruction.
    Point that the operations occur to: the point of unification of instruction cache fills, data cache fills and writebacks, and translation table walks on the processor executing the instruction.
Shareability of the address: Inner Shareable or Outer Shareable
    Processors affected: all processors in the same Inner Shareable shareability domain as the processor executing the instruction.
    Point that the operations occur to: the point of unification of instruction cache fills, data cache fills and writebacks, and translation table walks of all processors in the same Inner Shareable shareability domain as the processor executing the instruction.

Note
The set of processors that is guaranteed to be affected is never greater than the Inner Shareable shareability domain containing the executing processor.

Entire and set/way-based cache maintenance operations
This section describes the Local and Inner Shareable instructions for entire and set/way-based cache maintenance operations:

Local instructions
The following instructions are only guaranteed to apply to the caches of the processor that the instructions are run on:
• Invalidate entire instruction cache (ICIALLU)
• Invalidate entire branch predictor array (BPIALL)
• Clean and Invalidate data or unified cache line by set/way (DCCISW)
• Clean data or unified cache line by set/way (DCCSW)
• Invalidate data or unified cache line by set/way (DCISW).
These operations have an effect on the processor executing the instruction. These operations are functionally unchanged from the base architecture.
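The set/way instructions in this list (DCCISW, DCCSW and DCISW) take a single register operand that names a line by cache level, set and way. The Python sketch below mirrors the arithmetic of the clean to the point of coherency loop in Example code for cache maintenance operations on page B2-16: it walks the CLIDR from Level 1 up to the Level of Coherence, skips levels with no data or unified cache, decodes the Cache Size ID Register for each remaining level, and yields the operands that the loop writes with its clean by set/way MCR. The field positions are the ones that code uses (CLIDR Ctype fields at 3 bits per level and LoC in bits [26:24]; CCSIDR LineSize in bits [2:0], Associativity in bits [12:3] and NumSets in bits [27:13]). On hardware, CCSIDR is read after selecting the level in the Cache Size Selection Register; here the caller supplies a `ccsidr_for_level` lookup instead. That lookup, the function names and the example register values are hypothetical.

```python
"""Model of the set/way operands produced by the ARMv7 'clean entire data
cache to PoC' loop. Pure arithmetic: no coprocessor access is performed."""

from typing import Callable, Iterator


def clidr_ctype(clidr: int, level: int) -> int:
    """Ctype field for cache level 'level' (1-based): 3 bits per level."""
    return (clidr >> (3 * (level - 1))) & 0x7


def clidr_loc(clidr: int) -> int:
    """Level of Coherence, bits [26:24]."""
    return (clidr >> 24) & 0x7


def setway_operands(clidr: int,
                    ccsidr_for_level: Callable[[int], int]) -> Iterator[int]:
    """Yield set/way operands for every data/unified line up to the LoC."""
    for level in range(1, clidr_loc(clidr) + 1):
        if clidr_ctype(clidr, level) < 2:      # 0: no cache, 1: I-cache only
            continue
        ccsidr = ccsidr_for_level(level)
        line_shift = (ccsidr & 0x7) + 4        # log2(line length in bytes)
        max_way = (ccsidr >> 3) & 0x3FF        # associativity - 1
        max_set = (ccsidr >> 13) & 0x7FFF      # number of sets - 1
        # Way field is left-aligned; the assembly computes this with CLZ.
        way_shift = 32 - max_way.bit_length()
        level_field = (level - 1) << 1         # same value as R10 in the loop
        for s in range(max_set, -1, -1):       # outer loop: index (set)
            for w in range(max_way, -1, -1):   # inner loop: way
                way_bits = (w << way_shift) & 0xFFFFFFFF
                yield level_field | way_bits | (s << line_shift)


if __name__ == "__main__":
    # Illustrative CLIDR: Level 1 separate I/D, Level 2 unified, LoC = 2.
    clidr = (2 << 24) | (0b100 << 3) | 0b011
    # Illustrative CCSIDRs: L1 32KB 4-way 32-byte lines; L2 tiny 8-way 64-byte lines.
    table = {1: (255 << 13) | (3 << 3) | 1, 2: (1 << 13) | (7 << 3) | 2}
    ops = list(setway_operands(clidr, table.__getitem__))
    print(len(ops), hex(ops[0]), hex(ops[-1]))
```

Each yielded value would be the operand of one DCCSW; substituting DCISW or DCCISW gives the invalidate or clean and invalidate variants, which is what "Similar approaches can be used for all cache maintenance operations" refers to.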
Inner Shareable instructions
The following instructions can be applied to the caches of all processors in the same Inner Shareable shareability domain as the processor executing the instruction:
• Invalidate entire branch predictor array Inner Shareable (BPIALLIS)
• Invalidate entire instruction cache Inner Shareable (ICIALLUIS).
ICIALLUIS automatically performs the BPIALLIS function, in the same way as ICIALLU automatically performs the BPIALL function.
These operations have an effect to the point of unification of instruction cache fills, data cache fills and writebacks, and translation table walks of all processors in the same Inner Shareable shareability domain.
These instructions complement the ICIALLU and BPIALL instructions defined in the base ARMv7 architecture, and extend them to the same Inner Shareable shareability domain.
Inner Shareable instruction encodings:
• ICIALLUIS is encoded as MCR p15, 0, <Rt>, c7, c1, 0
• BPIALLIS is encoded as MCR p15, 0, <Rt>, c7, c1, 6

B2.2.9 System-level caches
The system-level architecture might define further aspects of the software view of caches and the memory model that are not defined by the ARMv7 processor architecture. These aspects of the system-level architecture can affect the requirements for software management of caches and coherency. For example, a system design might introduce additional levels of caching that cannot be managed using the CP15 maintenance operations defined by the ARMv7 architecture. Typically, such caches are referred to as system caches and are managed through the use of memory-mapped operations.
The ARMv7 architecture does not forbid the presence of system caches that are outside the scope of the architecture, but ARM strongly recommends the following for any such cache:
• Physical, rather than virtual, addresses are used for address-based cache maintenance operations.
• Any IMPLEMENTATION DEFINED system cache maintenance operations include as a minimum the set of functions defined by ARMv7 cache maintenance operations on page B2-13, with the number of levels of system cache operated on by these cache maintenance operations being IMPLEMENTATION DEFINED.
• Where possible, such system caches are included in the caches affected by the architecturally-defined CP15 cache maintenance operations, so that the architecturally-defined software sequences for managing the memory model and coherency are sufficient for managing all caches in the system.

B2.3 IMPLEMENTATION DEFINED memory system features
ARMv7 reserves space in the SCTLR for use with IMPLEMENTATION DEFINED features of the cache, and other IMPLEMENTATION DEFINED features of the memory system architecture.
In particular, in ARMv7 the following memory system features are IMPLEMENTATION DEFINED:
• Cache lockdown, see Cache lockdown on page B2-8.
• In VMSAv7, TLB lockdown, see TLB lockdown on page B3-56.
• Tightly Coupled Memory (TCM) support, including any associated DMA scheme. The TCM Type Register, TCMTR, is required in all implementations, and if no TCMs are implemented this must be indicated by the value of this register.

Note
For details of the optional TCMs and associated DMA scheme in ARMv6 see Tightly Coupled Memory (TCM) support on page AppxG-23.
B2.3.1 ARMv7 CP15 register support for IMPLEMENTATION DEFINED features
The ARMv7 CP15 registers implementation includes the following support for IMPLEMENTATION DEFINED features of the memory system:
• The TCM Type Register, TCMTR, in CP15 c0, must be implemented. The following conditions apply to this register:
    - If no TCMs are implemented, the TCMTR indicates zero-size TCMs. For more information see c0, TCM Type Register (TCMTR) on page B3-85 (for a VMSA implementation) or c0, TCM Type Register (TCMTR) on page B4-35 (for a PMSA implementation).
    - If bits [31:29] are 0b100, the format of the rest of the register is IMPLEMENTATION DEFINED. This value indicates that the implementation includes TCMs that do not follow the ARMv6 usage model. Other fields in the register might give more information about the TCMs.
  For more information, see:
    - c0, TCM Type Register (TCMTR) on page B3-85, for a VMSA implementation
    - c0, TCM Type Register (TCMTR) on page B4-35, for a PMSA implementation.
• The CP15 c9 encoding space with <CRm> = {0-2,5-7} is IMPLEMENTATION DEFINED for all values of <opc1> and <opc2>. This space is reserved for branch predictor, cache and TCM functionality, for example maintenance, override behaviors and lockdown. It permits:
    - ARMv6 backwards compatible schemes
    - alternative schemes.
  For more information, see:
    - CP15 c9, Cache and TCM lockdown registers and performance monitors on page B3-141, for a VMSA implementation
    - CP15 c9, Cache and TCM lockdown registers and performance monitors on page B4-74, for a PMSA implementation.
• In a VMSAv7 implementation, part of the CP15 c10 encoding space is IMPLEMENTATION DEFINED and reserved for TLB functionality, see TLB lockdown on page B3-56.
• The CP15 c11 encoding space with <CRm> = {0-8,15} is IMPLEMENTATION DEFINED for all values of <opc1> and <opc2>.
  This space is reserved for DMA operations to and from the TCMs. It permits:
  — an ARMv6 backwards compatible scheme
  — an alternative scheme.
  For more information, see:
  — CP15 c11, Reserved for TCM DMA registers on page B3-147, for a VMSA implementation
  — CP15 c11, Reserved for TCM DMA registers on page B4-75, for a PMSA implementation.

B2.4 Pseudocode details of general memory system operations

This section contains pseudocode describing general memory operations, in the subsections:
• Memory data type definitions
• Basic memory accesses on page B2-30
• Interfaces to memory system specific pseudocode on page B2-30
• Aligned memory accesses on page B2-31
• Unaligned memory accesses on page B2-32
• Reverse endianness on page B2-34
• Exclusive monitors operations on page B2-35
• Access permission checking on page B2-37
• Default memory access decode on page B2-37
• Data Abort exception on page B2-39.

The pseudocode in this section applies to both VMSA and PMSA implementations. Additional pseudocode for memory operations is given in:
• Pseudocode details of VMSA memory system operations on page B3-156
• Pseudocode details of PMSA memory system operations on page B4-79.
B2.4.1 Memory data type definitions

The following data type definitions are used by the memory system pseudocode functions:

    // Types of memory
    enumeration MemType {MemType_Normal, MemType_Device, MemType_StronglyOrdered};

    // Memory attributes descriptor
    type MemoryAttributes is (
        MemType type,
        bits(2) innerattrs,   // ‘00’ = Non-cacheable; ‘01’ = WBWA; ‘10’ = WT; ‘11’ = WBnWA
        bits(2) outerattrs,   // ‘00’ = Non-cacheable; ‘01’ = WBWA; ‘10’ = WT; ‘11’ = WBnWA
        boolean shareable,
        boolean outershareable
    )

    // Physical address type, with extra bits used by some VMSA features
    type FullAddress is (
        bits(32) physicaladdress,
        bits(8)  physicaladdressext,
        bit      NS                  // ‘0’ = Secure, ‘1’ = Non-secure
    )

    // Descriptor used to access the underlying memory array
    type AddressDescriptor is (
        MemoryAttributes memattrs,
        FullAddress      paddress
    )

    // Access permissions descriptor
    type Permissions is (
        bits(3) ap,   // Access Permission bits
        bit     xn    // Execute Never bit
    )

B2.4.2 Basic memory accesses

The _Mem[] function performs single-copy atomic, aligned, little-endian memory accesses to the underlying physical memory array of bytes:

    bits(8*size) _Mem[AddressDescriptor memaddrdesc, integer size]
        assert size == 1 || size == 2 || size == 4 || size == 8;

    _Mem[AddressDescriptor memaddrdesc, integer size] = bits(8*size) value
        assert size == 1 || size == 2 || size == 4 || size == 8;

This function addresses the array using memaddrdesc.paddress, which supplies:
• A 32-bit physical address.
• An 8-bit physical address extension, which is treated as additional high-order bits of the physical address. This extension is always 0b00000000 in the PMSA.
• A single NS bit to select between Secure and Non-secure parts of the array. This bit is always 0 if the Security Extensions are not implemented.

The actual implemented array of memory might be smaller than the 2^41 bytes implied by these 32 + 8 + 1 address bits.
In this case, either the scheme for aliasing is IMPLEMENTATION DEFINED, or some parts of the address space might give rise to external aborts. For more information, see:
• External aborts on page B3-45, for a VMSA implementation
• External aborts on page B4-15, for a PMSA implementation.

The attributes in memaddrdesc.memattrs are used by the memory system to determine caching and ordering behaviors, as described in Memory types and attributes and the memory order model on page A3-24.

B2.4.3 Interfaces to memory system specific pseudocode

The following functions call the VMSA-specific or PMSA-specific functions to handle Alignment faults and perform address translation.

    // AlignmentFault()
    // ================

    AlignmentFault(bits(32) address, boolean iswrite)
        case MemorySystemArchitecture() of
            when MemArch_VMSA  AlignmentFaultV(address, iswrite);
            when MemArch_PMSA  AlignmentFaultP(address, iswrite);

    // TranslateAddress()
    // ==================

    AddressDescriptor TranslateAddress(bits(32) VA, boolean ispriv, boolean iswrite)
        case MemorySystemArchitecture() of
            when MemArch_VMSA  return TranslateAddressV(VA, ispriv, iswrite);
            when MemArch_PMSA  return TranslateAddressP(VA, ispriv, iswrite);

B2.4.4 Aligned memory accesses

The MemA[] function performs a memory access at the current privilege level, and the MemA_unpriv[] function performs an access that is always unprivileged. In both cases the architecture requires the access to be aligned, and in ARMv7 the function generates an Alignment fault if it is not.

Note
In versions of the architecture before ARMv7, if the SCTLR.A and SCTLR.U bits are both 0, an unaligned access is forced to be aligned by replacing the low-order address bits with zeros.
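As a minimal sketch of the alignment step at the start of MemA_with_priv[], the C fragment below models its three outcomes: an aligned access proceeds unchanged, an unaligned access faults when SCTLR.A or SCTLR.U is 1, and otherwise (the legacy configuration described in the Note) the address is forced to alignment. The names align_down(), mema_alignment() and align_outcome are hypothetical, and the SCTLR bits are passed in as plain flags rather than read from the register.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical result of the "Sort out alignment" step of MemA_with_priv[]. */
typedef enum {
    ACCESS_AS_IS,       /* address already aligned: VA = address */
    ACCESS_FORCED,      /* legacy configuration: VA = Align(address, size) */
    ACCESS_ALIGN_FAULT  /* AlignmentFault() is called; no VA is produced */
} align_outcome;

/* Align(address, size) for the power-of-two access sizes used by MemA[]:
   clears the low-order address bits. */
static uint32_t align_down(uint32_t address, uint32_t size)
{
    assert(size == 1 || size == 2 || size == 4 || size == 8);
    return address & ~(size - 1u);
}

/* sctlr_a and sctlr_u stand in for the SCTLR.A and SCTLR.U bits. */
static align_outcome mema_alignment(uint32_t address, uint32_t size,
                                    bool sctlr_a, bool sctlr_u, uint32_t *va)
{
    if (address == align_down(address, size)) {
        *va = address;
        return ACCESS_AS_IS;
    }
    if (sctlr_a || sctlr_u)
        return ACCESS_ALIGN_FAULT;    /* *va is not written */
    *va = align_down(address, size);  /* pre-ARMv7 behavior described in the Note */
    return ACCESS_FORCED;
}
```

Under these assumptions, a 4-byte access to 0x1002 is performed at 0x1000 when SCTLR.A and SCTLR.U are both 0, but faults when either bit is 1.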
    // MemA[]
    // ======

    bits(8*size) MemA[bits(32) address, integer size]
        return MemA_with_priv[address, size, CurrentModeIsPrivileged()];

    MemA[bits(32) address, integer size] = bits(8*size) value
        MemA_with_priv[address, size, CurrentModeIsPrivileged()] = value;
        return;

    // MemA_unpriv[]
    // =============

    bits(8*size) MemA_unpriv[bits(32) address, integer size]
        return MemA_with_priv[address, size, FALSE];

    MemA_unpriv[bits(32) address, integer size] = bits(8*size) value
        MemA_with_priv[address, size, FALSE] = value;
        return;

    // MemA_with_priv[]
    // ================

    // Non-assignment form

    bits(8*size) MemA_with_priv[bits(32) address, integer size, boolean privileged]

        // Sort out alignment
        if address == Align(address, size) then
            VA = address;
        elsif SCTLR.A == ‘1’ || SCTLR.U == ‘1’ then
            AlignmentFault(address, FALSE);
        else // if legacy non alignment-checking configuration
            VA = Align(address, size);

        // MMU or MPU
        memaddrdesc = TranslateAddress(VA, privileged, FALSE);

        // Memory array access, and sort out endianness
        value = _Mem[memaddrdesc, size];
        if CPSR.E == ‘1’ then
            value = BigEndianReverse(value, size);
        return value;

    // Assignment form

    MemA_with_priv[bits(32) address, integer size, boolean privileged] = bits(8*size) value

        // Sort out alignment
        if address == Align(address, size) then
            VA = address;
        elsif SCTLR.A == ‘1’ || SCTLR.U == ‘1’ then
            AlignmentFault(address, TRUE);
        else // if legacy non alignment-checking configuration
            VA = Align(address, size);

        // MMU or MPU
        memaddrdesc = TranslateAddress(VA, privileged, TRUE);

        // Effect on exclusives
        if memaddrdesc.memattrs.shareable then
            ClearExclusiveByAddress(memaddrdesc.paddress, ProcessorID(), size);

        // Sort out endianness, then memory array access
        if CPSR.E == ‘1’ then
            value = BigEndianReverse(value, size);
        _Mem[memaddrdesc, size] = value;
        return;

B2.4.5 Unaligned memory accesses

The MemU[] function performs a memory access at the current privilege level, and the MemU_unpriv[] function performs an access that is always unprivileged. In both cases:
• if the SCTLR.A bit is 0, unaligned accesses are supported
• if the SCTLR.A bit is 1, unaligned accesses produce Alignment faults.

Note
In versions of the architecture before ARMv7, if the SCTLR.A and SCTLR.U bits are both 0, an unaligned access is forced to be aligned by replacing the low-order address bits with zeros.

    // MemU[]
    // ======

    bits(8*size) MemU[bits(32) address, integer size]
        return MemU_with_priv[address, size, CurrentModeIsPrivileged()];

    MemU[bits(32) address, integer size] = bits(8*size) value
        MemU_with_priv[address, size, CurrentModeIsPrivileged()] = value;
        return;

    // MemU_unpriv[]
    // =============

    bits(8*size) MemU_unpriv[bits(32) address, integer size]
        return MemU_with_priv[address, size, FALSE];

    MemU_unpriv[bits(32) address, integer size] = bits(8*size) value
        MemU_with_priv[address, size, FALSE] = value;
        return;

    // MemU_with_priv[]
    // ================
    // Due to single-copy atomicity constraints, the aligned accesses are distinguished
    // from the unaligned accesses:
    //  * aligned accesses are performed at their size
    //  * unaligned accesses are expressed as a set of bytes.
    // Non-assignment form

    bits(8*size) MemU_with_priv[bits(32) address, integer size, boolean privileged]

        bits(8*size) value;

        // Legacy non alignment-checking configuration forces access to be aligned
        if SCTLR.A == ‘0’ && SCTLR.U == ‘0’ then address = Align(address, size);

        // Do aligned access, take alignment fault, or do sequence of bytes
        if address == Align(address, size) then
            value = MemA_with_priv[address, size, privileged];
        elsif SCTLR.A == ‘1’ then
            AlignmentFault(address, FALSE);
        else // if unaligned access, SCTLR.A == ‘0’, and SCTLR.U == ‘1’
            for i = 0 to size-1
                value<8*i+7:8*i> = MemA_with_priv[address+i, 1, privileged];
            if CPSR.E == ‘1’ then
                value = BigEndianReverse(value, size);
        return value;

    // Assignment form

    MemU_with_priv[bits(32) address, integer size, boolean privileged] = bits(8*size) value

        // Legacy non alignment-checking configuration forces access to be aligned
        if SCTLR.A == ‘0’ && SCTLR.U == ‘0’ then address = Align(address, size);

        // Do aligned access, take alignment fault, or do sequence of bytes
        if address == Align(address, size) then
            MemA_with_priv[address, size, privileged] = value;
        elsif SCTLR.A == ‘1’ then
            AlignmentFault(address, TRUE);
        else // if unaligned access, SCTLR.A == ‘0’, and SCTLR.U == ‘1’
            if CPSR.E == ‘1’ then
                value = BigEndianReverse(value, size);
            for i = 0 to size-1
                MemA_with_priv[address+i, 1, privileged] = value<8*i+7:8*i>;
        return;

B2.4.6 Reverse endianness

The following pseudocode describes the operation to reverse endianness:

    // BigEndianReverse()
    // ==================

    bits(8*N) BigEndianReverse (bits(8*N) value, integer N)
        assert N == 1 || N == 2 || N == 4 || N == 8;
        bits(8*N) result;
        case N of
            when 1
                result<7:0> = value<7:0>;
            when 2
                result<15:8> = value<7:0>;
                result<7:0> = value<15:8>;
            when 4
                result<31:24> = value<7:0>;
                result<23:16> = value<15:8>;
                result<15:8> = value<23:16>;
                result<7:0> = value<31:24>;
            when 8
                result<63:56> = value<7:0>;
                result<55:48> = value<15:8>;
                result<47:40> = value<23:16>;
                result<39:32> = value<31:24>;
                result<31:24> = value<39:32>;
                result<23:16> = value<47:40>;
                result<15:8> = value<55:48>;
                result<7:0> = value<63:56>;
        return result;

B2.4.7 Exclusive monitors operations

The SetExclusiveMonitors() function sets the exclusive monitors for a Load-Exclusive instruction. The ExclusiveMonitorsPass() function checks whether a Store-Exclusive instruction still has possession of the exclusive monitors and therefore completes successfully.

    // SetExclusiveMonitors()
    // ======================

    SetExclusiveMonitors(bits(32) address, integer size)

        memaddrdesc = TranslateAddress(address, CurrentModeIsPrivileged(), FALSE);

        if memaddrdesc.memattrs.shareable then
            MarkExclusiveGlobal(memaddrdesc.paddress, ProcessorID(), size);

        MarkExclusiveLocal(memaddrdesc.paddress, ProcessorID(), size);

    // ExclusiveMonitorsPass()
    // =======================

    boolean ExclusiveMonitorsPass(bits(32) address, integer size)

        // It is IMPLEMENTATION DEFINED whether the detection of memory aborts happens
        // before or after the check on the local Exclusive Monitor. As a result a failure
        // of the local monitor can occur on some implementations even if the memory
        // access would give a memory abort.
        if address != Align(address, size) then
            AlignmentFault(address, TRUE);
        else
            memaddrdesc = TranslateAddress(address, CurrentModeIsPrivileged(), TRUE);

        passed = IsExclusiveLocal(memaddrdesc.paddress, ProcessorID(), size);

        if memaddrdesc.memattrs.shareable then
            passed = passed && IsExclusiveGlobal(memaddrdesc.paddress, ProcessorID(), size);

        if passed then
            ClearExclusiveLocal(ProcessorID());

        return passed;

The MarkExclusiveGlobal() procedure takes as arguments a FullAddress paddress, the processor identifier processorid and the size of the transfer. The procedure records that processor processorid has requested exclusive access covering at least size bytes from address paddress. The size of the region marked as exclusive is IMPLEMENTATION DEFINED, up to a limit of 2KB and no smaller than two words, and the region is aligned in the address space to its size. It is UNPREDICTABLE whether this causes any previous request for exclusive access to any other address by the same processor to be cleared.

    MarkExclusiveGlobal(FullAddress paddress, integer processorid, integer size)

The MarkExclusiveLocal() procedure takes as arguments a FullAddress paddress, the processor identifier processorid and the size of the transfer. The procedure records in a local record that processor processorid has requested exclusive access to an address covering at least size bytes from address paddress. The size of the region marked as exclusive is IMPLEMENTATION DEFINED, and can at its largest cover the whole of memory, but is no smaller than two words, and the region is aligned in the address space to its size. It is IMPLEMENTATION DEFINED whether this procedure also performs a MarkExclusiveGlobal() using the same parameters.
    MarkExclusiveLocal(FullAddress paddress, integer processorid, integer size)

The IsExclusiveGlobal() function takes as arguments a FullAddress paddress, the processor identifier processorid and the size of the transfer. The function returns TRUE if the processor processorid has marked in a global record an address range as exclusive access requested that covers at least size bytes from address paddress. It is IMPLEMENTATION DEFINED whether it returns TRUE or FALSE if a global record has marked a different address as exclusive access requested. If no address is marked in a global record as exclusive access requested, IsExclusiveGlobal() returns FALSE.

    boolean IsExclusiveGlobal(FullAddress paddress, integer processorid, integer size)

The IsExclusiveLocal() function takes as arguments a FullAddress paddress, the processor identifier processorid and the size of the transfer. The function returns TRUE if the processor processorid has marked an address range as exclusive access requested that covers at least size bytes from address paddress. It is IMPLEMENTATION DEFINED whether this function returns TRUE or FALSE if the address marked as exclusive access requested does not cover all of the size bytes from address paddress. If no address is marked as exclusive access requested, then this function returns FALSE. It is IMPLEMENTATION DEFINED whether this result is ANDed with the result of IsExclusiveGlobal() with the same parameters.

    boolean IsExclusiveLocal(FullAddress paddress, integer processorid, integer size)

The ClearExclusiveByAddress() procedure takes as arguments a FullAddress paddress, the processor identifier processorid and the size of the transfer. The procedure clears the global records of all processors, other than processorid, for which an address region including any of the size bytes starting from paddress has had a request for an exclusive access.
It is IMPLEMENTATION DEFINED whether the equivalent global record of the processor processorid is also cleared if any of the size bytes starting from paddress has had a request for an exclusive access, or if any other address has had a request for an exclusive access.

    ClearExclusiveByAddress(FullAddress paddress, integer processorid, integer size)

The ClearExclusiveLocal() procedure takes as its argument the processor identifier processorid. The procedure clears the local record of processor processorid that an address has had a request for an exclusive access. It is IMPLEMENTATION DEFINED whether this operation also clears the global record of processor processorid that an address has had a request for an exclusive access.

    ClearExclusiveLocal(integer processorid)

B2.4.8 Access permission checking

The function CheckPermission() is used by both the VMSA and PMSA architectures to perform access permission checking based on attributes derived from the translation tables or region descriptors. The domain and sectionnotpage arguments are only relevant for the VMSA architecture.

The interpretation of the access permissions is shown in:
• Access permissions on page B3-28, for a VMSA implementation
• Access permissions on page B4-9, for a PMSA implementation.
The following pseudocode describes the checking of the access permission:

    // CheckPermission()
    // =================

    CheckPermission(Permissions perms, bits(32) mva,
                    boolean sectionnotpage, bits(4) domain, boolean iswrite, boolean ispriv)

        // When the access flag is enabled, AP[0] is the access flag and only
        // AP[2:1] encode the access permissions
        if SCTLR.AFE == ‘1’ then
            perms.ap<0> = ‘1’;

        case perms.ap of
            when ‘000’  abort = TRUE;
            when ‘001’  abort = !ispriv;
            when ‘010’  abort = !ispriv && iswrite;
            when ‘011’  abort = FALSE;
            when ‘100’  UNPREDICTABLE;
            when ‘101’  abort = !ispriv || iswrite;
            when ‘110’  abort = iswrite;
            when ‘111’
                if MemorySystemArchitecture() == MemArch_VMSA then
                    abort = iswrite;
                else
                    UNPREDICTABLE;

        if abort then
            DataAbort(mva, domain, sectionnotpage, iswrite, DAbort_Permission);

        return;

B2.4.9 Default memory access decode

The function DefaultTEXDecode() is used by both the VMSA and PMSA architectures to decode the texcb and S attributes derived from the translation tables or region descriptors. The interpretation of the arguments is shown in:
• C, B, and TEX[2:0] encodings without TEX remap on page B3-33, for a VMSA implementation
• C, B, and TEX[2:0] encodings on page B4-11, for a PMSA implementation.

The following pseudocode describes the default memory access decoding, when memory region remapping is not implemented:
    // DefaultTEXDecode()
    // ==================

    MemoryAttributes DefaultTEXDecode(bits(5) texcb, bit S)

        MemoryAttributes memattrs;

        case texcb of
            when ‘00000’
                memattrs.type = MemType_StronglyOrdered;
                memattrs.innerattrs = ‘00’;   // Non-cacheable
                memattrs.outerattrs = ‘00’;   // Non-cacheable
                memattrs.shareable = TRUE;
            when ‘00001’
                memattrs.type = MemType_Device;
                memattrs.innerattrs = ‘00’;   // Non-cacheable
                memattrs.outerattrs = ‘00’;   // Non-cacheable
                memattrs.shareable = TRUE;
            when ‘0001x’, ‘00100’
                memattrs.type = MemType_Normal;
                memattrs.innerattrs = texcb<1:0>;
                memattrs.outerattrs = texcb<1:0>;
                memattrs.shareable = (S == ‘1’);
            when ‘00110’
                IMPLEMENTATION_DEFINED setting of memattrs;
            when ‘00111’
                memattrs.type = MemType_Normal;
                memattrs.innerattrs = ‘01’;   // Write-back write-allocate cacheable
                memattrs.outerattrs = ‘01’;   // Write-back write-allocate cacheable
                memattrs.shareable = (S == ‘1’);
            when ‘01000’
                memattrs.type = MemType_Device;
                memattrs.innerattrs = ‘00’;   // Non-cacheable
                memattrs.outerattrs = ‘00’;   // Non-cacheable
                memattrs.shareable = FALSE;
            when ‘1xxxx’
                memattrs.type = MemType_Normal;
                memattrs.innerattrs = texcb<1:0>;
                memattrs.outerattrs = texcb<3:2>;
                memattrs.shareable = (S == ‘1’);
            otherwise
                UNPREDICTABLE;

        memattrs.outershareable = memattrs.shareable;

        return memattrs;

B2.4.10 Data Abort exception

The DataAbort() function generates a Data Abort exception and is used by both the VMSA and PMSA architectures. It sets the DFSR to indicate:
• the type of the abort, including the distinction between section and page on a VMSA implementation
• on a VMSA implementation, the domain, if appropriate
• whether the access was a read or write.

For a synchronous abort it also sets the DFAR to the MVA of the abort.
For details of the FSR encoding values see:
• Fault Status Register encodings for the VMSA on page B3-50, for a VMSA implementation
• Fault Status Register encodings for the PMSA on page B4-19, for a PMSA implementation.

An implementation might also set the IMPLEMENTATION DEFINED ADFSR.

    // Data abort types.
    enumeration DAbort {DAbort_AccessFlag, DAbort_Alignment, DAbort_Background, DAbort_Domain,
                        DAbort_Permission, DAbort_Translation};

    DataAbort(bits(32) address, bits(4) domain, boolean sectionnotpage, boolean iswrite, DAbort type)

Chapter B3
Virtual Memory System Architecture (VMSA)

This chapter provides a system-level view of the Virtual Memory System Architecture (VMSA), the memory system architecture of an ARMv7-A implementation. It contains the following sections:
• About the VMSA on page B3-2
• Memory access sequence on page B3-4
• Translation tables on page B3-7
• Address mapping restrictions on page B3-23
• Secure and Non-secure address spaces on page B3-26
• Memory access control on page B3-28
• Memory region attributes on page B3-32
• VMSA memory aborts on page B3-40
• Fault Status and Fault Address registers in a VMSA implementation on page B3-48
• Translation Lookaside Buffers (TLBs) on page B3-54
• Virtual Address to Physical Address translation operations on page B3-63
• CP15 registers for a VMSA implementation on page B3-64
• Pseudocode details of VMSA memory system operations on page B3-156.

Note
For an ARMv7-A implementation, this chapter must be read with Chapter B2 Common Memory System Architecture Features.
B3.1 About the VMSA

Complex operating systems typically use a virtual memory system to provide separate, protected address spaces for different processes. The ARMv7 VMSA is referred to as VMSAv7. For details of the differences in previous versions of the ARM architecture see:
• Virtual memory support on page AppxH-21 for the ARMv4 and ARMv5 architectures
• Virtual memory support on page AppxG-24 for ARMv6.

In a VMSA, a Memory Management Unit (MMU) provides facilities that enable an operating system to dynamically allocate memory and other memory-mapped system resources to the processes. The MMU provides fine-grained control of a memory system through a set of virtual to physical address mappings and associated memory properties held in memory-mapped tables known as translation tables.

The translation properties associated with each translation table entry include:

Memory access permission control
    This controls whether a program has access to a memory area. The possible settings are no access, read-only access, or read/write access. In addition, there is control of whether code can be executed from the memory area. If a processor attempts an access that is not permitted, a memory abort is signaled to the processor. The permitted level of access can be affected by:
    • whether the program is running in User mode or a privileged mode
    • the use of domains.

Memory region attributes
    These describe the properties of a memory region. The top-level attribute, the Memory type, is one of Strongly-ordered, Device, or Normal. Device and Normal memory regions have additional attributes, see Summary of ARMv7 memory attributes on page A3-25.

Virtual-to-physical address mapping
    The VMSA regards the address of an explicit data access or an instruction fetch as a Virtual Address (VA). The MMU maps this address onto the required Physical Address (PA).
    VA to PA address mapping can be used to manage the allocation of physical memory in many ways. For example:
    • to allocate memory to different processes with potentially conflicting address maps
    • to enable an application with a sparse address map to use a contiguous region of physical memory.

A full translation table lookup is called a translation table walk. It is performed automatically by hardware, and has a significant cost in execution time, requiring at least one main memory access, and often two. Translation Lookaside Buffers (TLBs) reduce the average cost of a memory access by caching the results of translation table walks. TLBs behave as caches of the translation table information, and the VMSA provides TLB maintenance operations to manage TLB contents in software.

To reduce the software overhead of TLB maintenance, the VMSA distinguishes between Global pages and process-specific pages. The Address Space Identifier (ASID) identifies pages associated with a specific process and provides a mechanism for changing process-specific tables without having to perform maintenance on the TLB structures.

System Control coprocessor (CP15) registers control the VMSA, including defining the location of the translation tables. They include registers that contain memory fault status and address information. See CP15 registers for a VMSA implementation on page B3-64.

When the Security Extensions are implemented, many of the CP15 registers are banked between the Secure and Non-secure security states. This means that separate system control software can be used in each security state.

VMSAv7 supports physical addresses of up to 40 bits, though implementations are permitted to support only 32 bits of physical address.
Where implementations support more than 32 bits of physical address, generating physical addresses with PA[39:32] != 0b00000000 requires the use of Supersections, see Translation tables on page B3-7.

B3.2 Memory access sequence

Explicit data accesses and instruction fetches generate memory accesses using VAs. The VA for an access is subject to two stages of address translation:
1. A translation from VA to Modified Virtual Address (MVA) by the FCSE, if it is implemented, see FCSE translation.
2. A translation from MVA to PA using the translation tables, see Translation from MVA to PA using the translation tables.

B3.2.1 FCSE translation

The FCSE translation is a linear remapping of the bottom 32MBytes of the Virtual Address map to a 32MByte address block determined by the FCSEIDR, see c13, FCSE Process ID Register (FCSEIDR) on page B3-152. The translation performed is described by the pseudo-function FCSETranslate, see FCSE translation on page B3-156.

Note
• The FCSE translation has no effect if bits FCSEIDR[31:25] are 0b0000000, see c13, FCSE Process ID Register (FCSEIDR) on page B3-152.
• From VMSAv6, use of FCSE translation is deprecated.
• In VMSAv7, the FCSE is optional and might not be implemented. If it is not implemented, the VMSA behavior is that MVA = VA, and the FCSEIDR register is RAZ/WI.

B3.2.2 Translation from MVA to PA using the translation tables

The MMU translates the MVA to the PA. Typically, this translation attempts to find the translation table entry held in a TLB that either:
• is a global entry
• was brought into the TLB with the ASID that matches the current value held in the CONTEXTIDR, see c13, Context ID Register (CONTEXTIDR) on page B3-153.

If no matching entry is found in a TLB, then the hardware locates the appropriate entry in the translation tables held in memory.
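The two lookups described above, FCSE remapping and TLB entry matching, can be sketched in C as follows. This is an illustration under stated assumptions, not the architectural FCSETranslate() pseudocode on page B3-156: it assumes that the FCSE PID occupies FCSEIDR[31:25], as the Note implies, and it models a TLB entry with a hypothetical tlb_entry structure holding an 8-bit ASID (the ASID width is an assumption of the sketch). An entry supplies the translation only if it covers the MVA and is either global or tagged with the current ASID.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* FCSE: a VA in the bottom 32MB (VA[31:25] == 0) is moved into the 32MB block
   selected by FCSEIDR[31:25]; any other VA passes through unchanged.
   With FCSEIDR[31:25] == 0 this has no effect, as the Note states. */
static uint32_t fcse_translate(uint32_t va, uint32_t fcseidr)
{
    if ((va >> 25) == 0)
        return (fcseidr & 0xFE000000u) | va;
    return va;
}

/* Hypothetical TLB entry, for illustration only. */
typedef struct {
    uint32_t mva_base;  /* base MVA of the mapped block, aligned to its size */
    uint32_t size;      /* block size in bytes, a power of two */
    bool     global;    /* global entry: matches regardless of ASID */
    uint8_t  asid;      /* ASID recorded when the entry was brought into the TLB */
} tlb_entry;

/* True if this entry can supply the translation for mva under current_asid. */
static bool tlb_entry_matches(const tlb_entry *e, uint32_t mva, uint8_t current_asid)
{
    bool covers = (mva & ~(e->size - 1u)) == e->mva_base;
    return covers && (e->global || e->asid == current_asid);
}
```

In this model, changing the current ASID on a context switch makes non-global entries of the previous process stop matching, which is why the TLB does not need maintenance for them.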
When the translation table entry is located, in the TLB or in memory, either:
• It is not a valid translation, and therefore causes a Translation fault.
• The contents of the entry contain the PA, the memory permission attributes and the memory type attributes for the required access. Using the entry might cause an abort, for a variety of reasons.

For more information about MMU faults see MMU faults on page B3-40.

B3.2.3 Enabling and disabling the MMU

The MMU can be enabled and disabled by writing to the SCTLR.M bit, see c1, System Control Register (SCTLR) on page B3-96. On reset, this bit is cleared to 0, disabling the MMU.

When the MMU is disabled, memory accesses are treated as follows:
• All data accesses are treated as Non-cacheable and Strongly-ordered. Unexpected data cache hit behavior is IMPLEMENTATION DEFINED.
• The treatment of instruction accesses depends on the value of the SCTLR.I bit:
  When I == 0   All instruction accesses are Non-cacheable.
  When I == 1   All instruction accesses are Cacheable:
                — Inner Write-Through no Write-Allocate
                — Outer Write-Through no Write-Allocate.
  In both cases all instruction accesses are Non-shareable, Normal memory.
  Note
  On some implementations, if the SCTLR.TRE bit is set to 0 then this behavior can be changed by the remap settings in the memory remap registers, see CP15 c10, Memory Remap Registers on page B3-143. The details of TEX remapping when SCTLR.TRE is set to 0 are IMPLEMENTATION DEFINED, see SCTLR.TRE, SCTLR.M, and the effect of the MMU remap registers on page B3-38.
• No memory access permission checks are performed, and no aborts are generated by the MMU.
• For every access the PA is equal to the MVA. This is known as a flat address mapping.
• If the FCSE is implemented, the FCSE PID is SBZ when the MMU is disabled. This is the reset value for the FCSE PID.
  Behavior is UNPREDICTABLE if the FCSE PID is not zero when the MMU is disabled. When the FCSE is implemented, software must clear the FCSE PID before disabling the MMU.
• CP15 cache maintenance operations act on the target cache whether the MMU is enabled or not, and regardless of the values of the memory attributes. However, if the MMU is disabled, they use the flat address mapping, and all mappings are considered global. CP15 TLB invalidate operations act on the target TLB whether the MMU is enabled or not.

All relevant CP15 registers must be programmed before the MMU is enabled. This includes setting up suitable translation tables in memory.

When the MMU is disabled, an instruction can be fetched if one of the following conditions is met:
• The instruction is in the same 4KB block of memory (aligned to 4KB) as an instruction that is required by a simple sequential execution of the program, or is in the 4KB block of memory immediately following such a block.
• The instruction is in the same 4KB block of memory (aligned to 4KB) from which an instruction has previously been required by a simple sequential execution of the program with the MMU disabled, or is in the 4KB block immediately following such a block.

Note
• Software must ensure that instructions that will be executed when the MMU is disabled are located within 4KB blocks of the address space that contain only memory that is tolerant to prefetching and speculative accesses, and that the 4KB block following each such block also contains only memory that is tolerant to prefetching and speculative accesses.
• Enabling or disabling the MMU effectively changes the translation tables that are in use. The synchronization requirements that apply on changing translation tables also apply to enabling or disabling the MMU.
  For more information, see Changing translation table attributes on page B3-21. See also Requirements for instruction caches on page B3-23.

In addition, if the physical address of the code that enables or disables the MMU differs from its MVA, instruction prefetching can cause complications. Therefore, ARM strongly recommends that any code that enables or disables the MMU has identical virtual and physical addresses.

B3.3 Translation tables

The MMU supports memory accesses based on memory sections or pages:
Supersections   Consist of 16MB blocks of memory. Support for Supersections is optional.
Sections        Consist of 1MB blocks of memory.
Large pages     Consist of 64KB blocks of memory.
Small pages     Consist of 4KB blocks of memory.

Support for Supersections, Sections and Large pages enables a large region of memory to be mapped using only a single entry in the TLB.

The translation tables held in memory have two levels:
First-level table     Holds first-level descriptors that contain:
                      • the base address and translation properties for Sections and Supersections
                      • translation properties and pointers to a second-level table for Large pages and Small pages.
Second-level tables   Hold second-level descriptors, each containing the base address and translation properties for a Small page or a Large page. Second-level tables are also referred to as Page tables.

The translation tables are described in the following sections:
• Translation table entry formats
• Translation table base registers on page B3-11
• Translation table walks on page B3-13
• Changing translation table attributes on page B3-21
• The access flag on page B3-21.
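To illustrate why the larger block sizes save TLB entries, the sketch below forms a PA by taking a block-aligned base address, assumed to come from a descriptor, and passing the low-order MVA bits through as the offset within the block. The helper names block_pa() and mappings_needed() are hypothetical, and the 64-bit PA type simply leaves room for the optional 40-bit Supersection addresses.

```c
#include <assert.h>
#include <stdint.h>

/* Block sizes of the four VMSAv7 mapping granularities, in bytes. */
enum {
    SMALL_PAGE   = 4 * 1024,          /* 4KB  */
    LARGE_PAGE   = 64 * 1024,         /* 64KB */
    SECTION      = 1024 * 1024,       /* 1MB  */
    SUPERSECTION = 16 * 1024 * 1024   /* 16MB */
};

/* Within one mapping, the low-order MVA bits are the offset into the block
   and the high-order PA bits come from the (block-aligned) base address. */
static uint64_t block_pa(uint64_t pa_base, uint32_t mva, uint32_t block_size)
{
    assert((block_size & (block_size - 1u)) == 0);  /* power of two */
    assert((pa_base & (block_size - 1u)) == 0);     /* base is block-aligned */
    return pa_base | (mva & (block_size - 1u));
}

/* Number of translations (and so, potentially, TLB entries) needed to map
   region_size bytes using blocks of block_size bytes. */
static uint32_t mappings_needed(uint32_t region_size, uint32_t block_size)
{
    return (region_size + block_size - 1u) / block_size;
}
```

Under these assumptions, mapping 1MB with Small pages needs 256 translations, whereas a single Section descriptor covers the same range.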
B3.3.1 Translation table entry formats
The formats of the first-level and second-level descriptor entries in the translation tables are described in:
• First-level descriptors on page B3-8
• Second-level descriptors on page B3-10.
For more information about second-level translation tables see Additional requirements for translation tables on page B3-11.
Note
In previous versions of the ARM Architecture Reference Manual and in some other documentation, the AP[2] bit in the translation table entries is described as the APX bit.

First-level descriptors
Each entry in the first-level table is a descriptor of how the associated 1MB MVA range is mapped. Table B3-1 shows the possible first-level descriptor formats, where the value of bits [1:0] of the descriptor identifies the descriptor type:
0b00 Invalid or fault entry. The associated MVA is unmapped, and attempting to access it generates a Translation fault, see VMSA memory aborts on page B3-40. Software can use bits [31:2] of an invalid descriptor for its own purposes, because these bits are ignored by the hardware.
0b01 Page table descriptor. The descriptor gives the physical address of a second-level translation table that specifies how the associated 1MB MVA range is mapped. A second-level translation table requires 1KB of memory and can map Large pages and Small pages, see Second-level descriptors on page B3-10.
0b10 Section or Supersection descriptor for the associated MVA. Bit [18] determines whether the descriptor is of a Section or a Supersection. For details of how the descriptor is interpreted see The full translation flow for Sections, Supersections, Small pages and Large pages on page B3-15.
0b11 Reserved. In VMSAv7, descriptors with bits [1:0] == 0b11 generate Translation faults, and must not be used.
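The decoding rule above can be sketched in C as follows, assuming the descriptor has already been read as a 32-bit value in the correct endianness. `l1_kind` and `l1_classify` are hypothetical names used only for illustration.

```c
#include <stdint.h>

/* Hypothetical descriptor kinds for a VMSAv7 first-level entry. */
typedef enum { L1_FAULT, L1_PAGE_TABLE, L1_SECTION, L1_SUPERSECTION, L1_RESERVED } l1_kind;

/* Classify a first-level descriptor from bits [1:0] and, for 0b10, bit [18]. */
static l1_kind l1_classify(uint32_t desc) {
    switch (desc & 0x3u) {
    case 0x0: return L1_FAULT;        /* bits [31:2] are ignored by hardware */
    case 0x1: return L1_PAGE_TABLE;   /* points to a 1KB second-level table */
    case 0x2: return (desc & (1u << 18)) ? L1_SUPERSECTION : L1_SECTION;
    default:  return L1_RESERVED;     /* 0b11: generates a Translation fault */
    }
}
```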
Table B3-1 VMSAv7 first-level descriptor formats
Fault: [31:2] IGNORE; [1:0] = 0b00
Page table: [31:10] Page table base address, PA[31:10]; [9] IMP; [8:5] Domain; [4] SBZ; [3] NS; [2] SBZ; [1:0] = 0b01
Section: [31:20] Section base address, PA[31:20]; [19] NS; [18] 0; [17] nG; [16] S; [15] AP[2]; [14:12] TEX[2:0]; [11:10] AP[1:0]; [9] IMP; [8:5] Domain; [4] XN; [3] C; [2] B; [1:0] = 0b10
Supersection: [31:24] Supersection base address, PA[31:24]; [23:20] Extended base address, PA[35:32]; [19] NS; [18] 1; [17] nG; [16] S; [15] AP[2]; [14:12] TEX[2:0]; [11:10] AP[1:0]; [9] IMP; [8:5] Extended base address, PA[39:36]; [4] XN; [3] C; [2] B; [1:0] = 0b10
Reserved: [1:0] = 0b11

The address information in the first-level descriptors is:
Page table Bits [31:10] of the descriptor are bits [31:10] of the physical address of a Page table.
Section Bits [31:20] of the descriptor are bits [31:20] of the physical address of the Section.
Supersection Bits [31:24] of the descriptor are bits [31:24] of the physical address of the Supersection. Optionally, bits [8:5] and [23:20] of the descriptor are bits [39:36] and [35:32] of the extended Supersection address.

The other fields in the descriptors are:
TEX[2:0], C, B Memory region attribute bits, see Memory region attributes on page B3-32. These bits are not present in a Page table entry.
XN bit The execute-never bit. Determines whether the region is executable, see The Execute Never (XN) attribute and instruction prefetching on page B3-30. This bit is not present in a Page table entry.
NS bit Non-secure bit. When the Security Extensions are implemented, this bit specifies whether the translated PA targets Secure or Non-secure memory, see Secure and Non-secure address spaces on page B3-26.
Domain Domain field, see Domains on page B3-31. This field is not present in a Supersection entry. Memory described by Supersections is in domain 0.
IMP bit The meaning of this bit is IMPLEMENTATION DEFINED.
AP[2], AP[1:0] Access Permissions bits, see Memory access control on page B3-28.
AP[0] can be configured as the access flag, see The access flag on page B3-21. These bits are not present in a Page table entry.
S bit The Shareable bit. Determines whether the translation is for Shareable memory, see Memory region attributes on page B3-32. This bit is not present in a Page table entry.
nG bit The not global bit. Determines how the translation is marked in the TLB, see Global and non-global regions in the virtual memory map on page B3-54. This bit is not present in a Page table entry.
Bit [18], when bits [1:0] == 0b10
0 Descriptor is for a Section.
1 Descriptor is for a Supersection.

Second-level descriptors
Table B3-2 shows the possible formats of a second-level descriptor, where bits [1:0] of the descriptor identify the descriptor type:
0b00 Invalid or fault entry. The associated MVA is unmapped, and attempting to access it generates a Translation fault. Software can use bits [31:2] of an invalid descriptor for its own purposes, because these bits are ignored by the hardware.
0b01 Large page descriptor. Bits [31:16] of the descriptor are the base address of the Large page.
0b1X Small page descriptor. Bits [31:12] of the descriptor are the base address of the Small page. In this descriptor format, bit [0] of the descriptor is the XN bit.

Table B3-2 VMSAv7 second-level descriptor formats
Fault: [31:2] IGNORE; [1:0] = 0b00
Large page: [31:16] Large page base address, PA[31:16]; [15] XN; [14:12] TEX[2:0]; [11] nG; [10] S; [9] AP[2]; [8:6] SBZ; [5:4] AP[1:0]; [3] C; [2] B; [1:0] = 0b01
Small page: [31:12] Small page base address, PA[31:12]; [11] nG; [10] S; [9] AP[2]; [8:6] TEX[2:0]; [5:4] AP[1:0]; [3] C; [2] B; [1] 1; [0] XN

The address information in the second-level descriptors is:
Large page Bits [31:16] of the descriptor are bits [31:16] of the physical address of the Large page.
Small page Bits [31:12] of the descriptor are bits [31:12] of the physical address of the Small page.
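A minimal C sketch of second-level decoding, showing that the XN bit sits in a different position for each page type and how the output address is formed from the descriptor and the MVA. It assumes a valid descriptor already read as a 32-bit value; `l2_classify`, `l2_xn` and `l2_output_pa` are hypothetical names, and only PA[31:0] is produced.

```c
#include <stdbool.h>
#include <stdint.h>

typedef enum { L2_FAULT, L2_LARGE, L2_SMALL } l2_kind;

/* 0b00 fault, 0b01 Large page, 0b1X Small page. */
static l2_kind l2_classify(uint32_t desc) {
    if (desc & 0x2u)
        return L2_SMALL;                  /* bit [0] is then the XN bit */
    return (desc & 0x1u) ? L2_LARGE : L2_FAULT;
}

/* XN is bit [0] in a Small page descriptor and bit [15] in a Large page one. */
static bool l2_xn(uint32_t desc) {
    switch (l2_classify(desc)) {
    case L2_SMALL: return desc & 0x1u;
    case L2_LARGE: return (desc >> 15) & 0x1u;
    default:       return false;
    }
}

/* PA[31:0] for an MVA translated by a valid second-level descriptor. */
static uint32_t l2_output_pa(uint32_t desc, uint32_t mva) {
    if (l2_classify(desc) == L2_SMALL)
        return (desc & 0xFFFFF000u) | (mva & 0x00000FFFu);  /* PA[31:12] | MVA[11:0] */
    return (desc & 0xFFFF0000u) | (mva & 0x0000FFFFu);      /* PA[31:16] | MVA[15:0] */
}
```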
The other fields in the descriptors are:
XN bit The execute-never bit. Determines whether the region is executable, see The Execute Never (XN) attribute and instruction prefetching on page B3-30.
TEX[2:0], C, B Memory region attribute bits, see Memory region attributes on page B3-32.
AP[2], AP[1:0] Access Permissions bits, see Memory access control on page B3-28. AP[0] can be configured as the access flag, see The access flag on page B3-21.
S bit The Shareable bit. Determines whether the translation is for Shareable memory, see Memory region attributes on page B3-32.
nG bit The not global bit. Used in the TLB matching process, see Global and non-global regions in the virtual memory map on page B3-54.

Additional requirements for translation tables
Additional requirements for the entries in a translation table apply:
• to first-level translation tables when Supersection descriptors are used
• to second-level translation tables when Large page descriptors are used.
These requirements exist because the top four bits of the Supersection or Large page index region of the MVA overlap with the bottom four bits of the table index. Translation table walks on page B3-13 gives more information, and these two cases are shown in:
• Figure B3-5 on page B3-18 for the first-level translation table Supersection entry
• Figure B3-7 on page B3-20 for the second-level translation table Large page entry.
Considering the case of using Large page descriptors in a second-level translation table, this overlap means that for any specific Large page, the bottom four bits of the second-level translation table index might take any value from 0b0000 to 0b1111. Therefore, each of these sixteen index values must point to a separate copy of the same descriptor.
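The replication requirement can be sketched in C. In a second-level table, MVA[15:12] are both part of the Large page index and the bottom four bits of the second-level table index MVA[19:12], so the descriptor is written to all 16 entries of the 16-word-aligned group that the 64KB page covers; Supersections in a first-level table follow the same pattern with MVA[23:20]. The helper names are hypothetical, and the first-level helper assumes a full 4096-entry table (TTBCR.N == 0 or TTBR1). A real implementation would also need the synchronization described in Changing translation table attributes on page B3-21.

```c
#include <stdint.h>

/* Install a Large page descriptor in a 256-entry second-level table. */
static void l2_set_large_page(uint32_t table[256], uint32_t mva, uint32_t desc) {
    uint32_t first = ((mva >> 12) & 0xFFu) & ~0xFu;  /* index MVA[19:12], 16-aligned */
    for (uint32_t i = 0; i < 16; i++)
        table[first + i] = desc;                     /* 16 identical copies */
}

/* Install a Supersection descriptor in a 4096-entry first-level table. */
static void l1_set_supersection(uint32_t table[4096], uint32_t mva, uint32_t desc) {
    uint32_t first = (mva >> 20) & ~0xFu;            /* index MVA[31:20], 16-aligned */
    for (uint32_t i = 0; i < 16; i++)
        table[first + i] = desc;
}
```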
Because of this overlap, in a second-level translation table each Large page descriptor must:
• occur first on a sixteen-word boundary
• be repeated in 16 consecutive memory locations.
For similar reasons, in a first-level translation table, each Supersection descriptor must also:
• occur first on a sixteen-word boundary
• be repeated in 16 consecutive memory locations.
Second-level translation tables are 1KB in size, and must be aligned on a 1KB boundary. Each 32-bit entry in a table provides translation information for 4KB of memory. VMSAv7 supports two page sizes:
• Large pages are 64KB in size
• Small pages are 4KB in size.
The required replication of Large page descriptors preserves this 4KB per entry relationship:
(4KB per entry) x (16 replicated entries) = 64KB = Large page size

B3.3.2 Translation table base registers
Three translation table registers describe the translation tables that are held in memory. For descriptions of the registers, see:
• c2, Translation Table Base Register 0 (TTBR0) on page B3-113
• c2, Translation Table Base Register 1 (TTBR1) on page B3-116
• c2, Translation Table Base Control Register (TTBCR) on page B3-117.
On a translation table walk, the most significant bits of the MVA and the value of TTBCR.N determine whether TTBR0 or TTBR1 is used as the translation table base register. TTBCR.N specifies a number, N, of most significant bits of the MVA, and:
• if either TTBCR.N is zero or the indicated bits of the MVA are zero, TTBR0 is used
• otherwise TTBR1 is used.
For more information, see Determining which TTBR to use, and the TTBR0 translation table size on page B3-118.
The normal use of the two TTBRs is:
TTBR0 Typically used for process-specific addresses. This table ranges in size from 128 bytes to 16KB, depending on the value of TTBCR.N.
Each process maintains a separate first-level translation table. On a context switch:
• TTBR0 is updated to point to the first-level translation table for the new context
• TTBCR is updated if the size of the translation table changes
• the CONTEXTIDR is updated.
When the TTBCR is programmed to zero, all translations use TTBR0 in a manner compatible with earlier versions of the architecture, that is, with versions before ARMv6.
TTBR1 Typically used for operating system and I/O addresses, which do not change on a context switch. The size of this table is always 16KB.
In the selected TTBR, the following bits define the memory region attributes for the translation table walk:
• the RGN, S and C bits, in the ARMv7-A base architecture
• the RGN, S, and IRGN[1:0] bits, when the Multiprocessing Extensions are implemented.
When the Security Extensions are implemented, two bits in the TTBCR for the current security state control whether a translation table walk is performed on a TLB miss:
• PD0, bit [4], controls whether translation table walks based on TTBR0 are performed
• PD1, bit [5], controls whether translation table walks based on TTBR1 are performed.
For more information about the TTBCR see c2, Translation Table Base Control Register (TTBCR) on page B3-117.
The effect of these bits is:
PDx == 0 When a TLB miss occurs based on TTBRx, a translation table walk is performed. The privilege of the memory access, Secure or Non-secure, corresponds to the current security state.
PDx == 1 If a TLB miss occurs based on TTBRx, a Section Translation fault is returned. No translation table walk is performed.
Note
When the Security Extensions are implemented, setting PD0 == 1 or PD1 == 1 can result in recursive entry into the abort handler. This effectively deadlocks the system if the mapping for the abort vectors is not guaranteed to be present in the TLB. TLB lockdown might be used to guarantee that the mapping for the abort vectors is present in the TLB.
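The TTBR selection rule, the resulting TTBR0 table size, and the PD0/PD1 check can be sketched in C as below. The function names are hypothetical; `n` is assumed to be the value of TTBCR.N (0 to 7), and the PDx bits are only meaningful when the Security Extensions are implemented.

```c
#include <stdbool.h>
#include <stdint.h>

typedef enum { USE_TTBR0, USE_TTBR1 } ttbr_sel;

/* TTBR0 is used when N == 0 or MVA[31:32-N] are all zero; otherwise TTBR1. */
static ttbr_sel select_ttbr(uint32_t mva, unsigned n) {
    if (n == 0)
        return USE_TTBR0;
    return (mva >> (32 - n)) == 0 ? USE_TTBR0 : USE_TTBR1;
}

/* TTBR0 first-level table size in bytes: 16KB >> N, from 16KB down to 128 bytes. */
static uint32_t ttbr0_table_size(unsigned n) {
    return 16384u >> n;
}

/* TTBCR.PD0 is bit [4] and PD1 is bit [5]; when the relevant bit is 1, a TLB
 * miss returns a Section Translation fault instead of starting a walk. */
static bool walk_permitted(uint32_t ttbcr, ttbr_sel sel) {
    unsigned pd_bit = (sel == USE_TTBR0) ? 4u : 5u;
    return ((ttbcr >> pd_bit) & 1u) == 0;
}
```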
B3.3.3 Translation table walks
A translation table walk occurs as the result of a TLB miss, and starts with a read of the appropriate first-level translation table:
• a section-mapped access only requires a read of the first-level translation table
• a page-mapped access also requires a read of the second-level translation table.
The value of the SCTLR.EE bit determines the endianness of the translation table lookups.
The physical address of the base of the first-level translation table is determined from the appropriate Translation Table Base Register (TTBR), see Translation table base registers on page B3-11.
In the base ARMv7 architecture, and in versions of the architecture before ARMv7, it is IMPLEMENTATION DEFINED whether a hardware translation table walk can cause a read from the L1 unified or data cache. If an implementation does not support translation table accesses from L1, software must ensure coherency between translation table walks and data updates. Typically this involves one of:
• storing translation tables in Inner Write-Through Cacheable Normal memory
• storing translation tables in Inner Write-Back Cacheable Normal memory and ensuring the appropriate cache entries are cleaned after modification
• storing translation tables in Non-cacheable memory.
For more information, see TLB maintenance operations and the memory order model on page B3-59.
With the Multiprocessing Extensions, translation table walks are required to access data or unified caches, or data and unified caches, of other agents participating in the coherency protocol, according to the shareability attributes described in the translation table base register. The shareability attributes described in the translation table base register must be consistent with the shareability attributes for the translation tables themselves.
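To tie the walk together, the following C sketch models a VMSAv7 short-descriptor walk in software, using the address formation shown in Figures B3-1 to B3-7 of this section. It is an illustration under stated assumptions, not a description of hardware: `vmsa_walk` is a hypothetical name, physical memory is modelled as a flat array of 32-bit words starting at PA 0 in host byte order (SCTLR.EE is not modelled), all descriptors are assumed to lie inside that array, and domain, permission, attribute and access flag checks are omitted.

```c
#include <stdbool.h>
#include <stdint.h>

/* mem:       hypothetical model of physical memory, 32-bit words from PA 0.
 * ttbr_base: value of the selected TTBR (low bits are masked off).
 * n:         TTBCR.N for a TTBR0 walk, or 0 for a TTBR1 walk.
 * Returns true and writes the output PA (up to 40 bits) to *pa on success,
 * or false on a Translation fault. */
static bool vmsa_walk(const uint32_t *mem, uint32_t ttbr_base, unsigned n,
                      uint32_t mva, uint64_t *pa) {
    /* First-level read: PA[31:14-N] from the TTBR, PA[13-N:2] = MVA[31-N:20]. */
    uint32_t base = ttbr_base & ~((1u << (14 - n)) - 1u);
    uint32_t l1_index = (mva << n) >> (20 + n);
    uint32_t l1 = mem[(base | (l1_index << 2)) >> 2];

    switch (l1 & 0x3u) {
    case 0x2:
        if (l1 & (1u << 18)) {
            /* Supersection: PA[39:36] = desc[8:5], PA[35:32] = desc[23:20],
             * PA[31:24] = desc[31:24], PA[23:0] = MVA[23:0]. */
            *pa = ((uint64_t)((l1 >> 5) & 0xFu) << 36) |
                  ((uint64_t)((l1 >> 20) & 0xFu) << 32) |
                  (l1 & 0xFF000000u) | (mva & 0x00FFFFFFu);
        } else {
            /* Section: PA[31:20] = desc[31:20], PA[19:0] = MVA[19:0]. */
            *pa = (l1 & 0xFFF00000u) | (mva & 0x000FFFFFu);
        }
        return true;
    case 0x1: {
        /* Page table: second-level read at PA[31:10] = desc[31:10],
         * PA[9:2] = MVA[19:12]. */
        uint32_t l2 = mem[((l1 & 0xFFFFFC00u) | (((mva >> 12) & 0xFFu) << 2)) >> 2];
        if (l2 & 0x2u) {                  /* Small page: PA[31:12] | MVA[11:0] */
            *pa = (l2 & 0xFFFFF000u) | (mva & 0x00000FFFu);
            return true;
        }
        if (l2 & 0x1u) {                  /* Large page: PA[31:16] | MVA[15:0] */
            *pa = (l2 & 0xFFFF0000u) | (mva & 0x0000FFFFu);
            return true;
        }
        return false;                     /* second-level fault entry */
    }
    default:
        return false;                     /* 0b00 fault or 0b11 reserved */
    }
}
```

A section-mapped MVA costs one `mem` read in this model and a page-mapped MVA costs two, matching the two cases listed at the start of this section.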
Reading a first-level translation table
To perform a fetch based on TTBR1, bits TTBR1[31:14] are concatenated with bits [31:20] of the MVA and two zero bits to produce a 32-bit physical address, as shown in Figure B3-1.
[Figure B3-1 Accessing the translation table first-level descriptors based on TTBR1: PA[31:14] = TTBR1[31:14] (Translation base; TTBR1[13:0] SBZ); PA[13:2] = MVA[31:20] (Table index); PA[1:0] = 0b00; PA[39:32] = 0x00.]
When performing a fetch based on TTBR0:
• the address bits taken from TTBR0 vary between bits [31:14] and bits [31:7]
• the address bits taken from the MVA vary between bits [31:20] and bits [24:20].
The width of the TTBR0 and MVA fields depends on the value of TTBCR.N, as shown in Figure B3-2.
[Figure B3-2 Accessing the translation table first-level descriptors based on TTBR0: PA[31:14-N] = TTBR0[31:14-N] (Translation base; TTBR0[13-N:0] SBZ); PA[13-N:2] = MVA[31-N:20] (Table index); PA[1:0] = 0b00; PA[39:32] = 0x00. N is the value of TTBCR.N; the field MVA[31:32-N] is absent if N == 0.]
Regardless of which register is used as the base for the fetch, the resulting physical address selects a four-byte translation table entry that is one of:
• A first-level descriptor for a Section or Supersection.
• A Page table pointer to a second-level translation table. In this case a second fetch is performed to retrieve a second-level descriptor, see Reading a second-level translation table.
• A faulting entry.
Note
Comparing Figure B3-1 on page B3-14 with Figure B3-2 on page B3-14, you can see that when using TTBR0 with N == 0 the construction of the PA becomes identical to that for TTBR1.
Other diagrams in this section show the PA formation from TTBR0, but also represent PA formation from TTBR1, for which N = 0.

Reading a second-level translation table
Figure B3-3 shows how the address of a second-level descriptor is obtained by combining:
• the result of the first-level fetch
• the second-level table index value held in bits [19:12] of the MVA.
See Table B3-1 on page B3-8 for the format of the Access control fields of the first-level descriptor.
[Figure B3-3 Accessing second-level Page table descriptors: the first-level descriptor holds the Page table base address in bits [31:10], Access control fields in bits [9:2], and bits [1:0] = 0b01; PA[31:10] = Page table base address; PA[9:2] = MVA[19:12] (Second-level table index); PA[1:0] = 0b00; PA[39:32] = 0x00.]

The full translation flow for Sections, Supersections, Small pages and Large pages
This section summarizes how each of the memory section and page options is described in the translation tables, and has a subsection summarizing the full translation flow for each of the options.
The four options are:
Section A 1MB memory region, described by a first-level translation table descriptor with bits [18,1:0] == 0b010. See Translation flow for a Section on page B3-17.
Supersection A 16MB memory region, described by a first-level translation table entry with bits [18,1:0] == 0b110. See Translation flow for a Supersection on page B3-18.
Small page A 4KB memory region, described by:
• a first-level translation table entry with bits [1:0] == 0b01, giving a second-level Page table address.
• a second-level descriptor with bit [1] == 1.
See Translation flow for a Small page on page B3-19.
Large page A 64KB memory region, described by:
• a first-level translation table entry with bits [1:0] == 0b01, giving a second-level Page table address.
• a second-level descriptor with bits [1:0] == 0b01.
See Translation flow for a Large page on page B3-20.

Translation flow for a Section
Figure B3-4 shows the virtual to physical address translation for a Section. For details of the access control fields in the first-level descriptor see the Section entry in Table B3-1 on page B3-8.
[Figure B3-4 Section address translation: the first-level read uses PA[31:14-N] = Translation base, PA[13-N:2] = MVA[31-N:20] (Table index), PA[1:0] = 0b00, and returns a Section descriptor with the Section base address in bits [31:20], access control fields in bits [19:2], and bits [1:0] = 0b10. The output address is PA[31:20] = Section base address, PA[19:0] = MVA[19:0] (Section index), PA[39:32] = 0x00. For a translation based on TTBR0, N is the value of TTBCR.N; for a translation based on TTBR1, N is 0. The field MVA[31:32-N] is absent if N == 0.]

Translation flow for a Supersection
Figure B3-5 shows the virtual to physical address translation for a Supersection. For details of the extended Supersection address and access control fields in the first-level descriptor see the Supersection entry in Table B3-1 on page B3-8.
[Figure B3-5 Supersection address translation: the first-level read is formed as for a Section, using MVA[31-N:20] as the Table index, and returns a Supersection descriptor with the Supersection base address in bits [31:24], the extended base address and access control fields in bits [23:2], and bits [1:0] = 0b10. The output address is PA[39:32] = Extended Supersection base address, PA[31:24] = Supersection base address, PA[23:0] = MVA[23:0] (Supersection index). For a translation based on TTBR0, N is the value of TTBCR.N; for a translation based on TTBR1, N is 0. The field MVA[31:32-N] is absent if N == 0.]
Note
Figure B3-5 shows how, when the MVA addresses a Supersection, the top four bits of the Supersection index bits of the MVA overlap the bottom four bits of the Table index bits. For more information, see Additional requirements for translation tables on page B3-11.

Translation flow for a Small page
Figure B3-6 shows the virtual to physical address translation for a Small page. For details of the access control fields in the first-level descriptor see the Page table entry in Table B3-1 on page B3-8. For details of the access control fields in the second-level descriptor see the Small page entry in Table B3-2 on page B3-10.
[Figure B3-6 Small page address translation: the first-level read (PA[31:14-N] = Translation base, PA[13-N:2] = MVA[31-N:20] First-level table index, PA[1:0] = 0b00) returns a Page table descriptor; the second-level read (PA[31:10] = Page table base address, PA[9:2] = MVA[19:12] Second-level table index, PA[1:0] = 0b00) returns a Small page descriptor with the Small page base address in bits [31:12] and bit [1] = 1. The output address is PA[31:12] = Small page base address, PA[11:0] = MVA[11:0] (Page index), PA[39:32] = 0x00. For a translation based on TTBR0, N is the value of TTBCR.N; for a translation based on TTBR1, N is 0. The field MVA[31:32-N] is absent if N == 0.]

Translation flow for a Large page
Figure B3-7 shows the virtual to physical address translation for a Large page. For details of the access control fields in the first-level descriptor see the Page table entry in Table B3-1 on page B3-8. For details of the access control fields in the second-level descriptor see the Large page entry in Table B3-2 on page B3-10.
[Figure B3-7 Large page address translation: the first-level and second-level reads are formed as for a Small page, using the First-level table index MVA[31-N:20] and the Second-level table index MVA[19:12]; the second-level read returns a Large page descriptor with the Large page base address in bits [31:16] and bits [1:0] = 0b01. The output address is PA[31:16] = Large page base address, PA[15:0] = MVA[15:0] (Page index), PA[39:32] = 0x00. For a translation based on TTBR0, N is the value of TTBCR.N; for a translation based on TTBR1, N is 0. The field MVA[31:32-N] is absent if N == 0.]
Note
Figure B3-7 shows how, when the MVA addresses a Large page, the top four bits of the page index bits of the MVA overlap the bottom four bits of the Second-level table index bits. For more information, see Additional requirements for translation tables on page B3-11. This diagram also shows the width of the page index bits when addressing a Small page, to show that there is no overlap in this case.

B3.3.4 Changing translation table attributes
When changing translation table attributes, you must avoid situations where caching or pipelining effects mean that overlapping entries or aliases with different attributes might be visible to the processor at the same time. To avoid these situations, ARM recommends that you invalidate old translation table entries, and synchronize the effects of those invalidations, before you create new translation table entries that might overlap or create aliases with different attributes. This approach is sometimes called break before make.
For information about the procedure for synchronizing a change to the translation tables see TLB maintenance operations and the memory order model on page B3-59.
Translation table entries that create Translation faults are not held in the TLB, see Translation fault on page B3-43. Therefore TLB and branch predictor invalidation is not required for the synchronization of a change from a translation table entry that causes a Translation fault to one that does not.

B3.3.5 The access flag
From VMSAv7, the AP[0] bit in the translation table descriptors can be redefined as an access flag. This is done by setting SCTLR.AFE to 1, see c1, System Control Register (SCTLR) on page B3-96. When this bit is set, the access permissions information in the translation table descriptors is limited to the AP[2:1] bits, as described in Simplified access permissions model on page B3-29.
The access flag is used to indicate when a page or section of memory is accessed for the first time since the access flag was set to 0.
It is IMPLEMENTATION DEFINED whether the access flag is managed by software or by hardware. The two options are described in the subsections:
• Software management of the access flag
• Hardware management of the access flag.
The access flag mechanism expects that, when an Access Flag fault occurs, software sets the access flag to 1 in the translation table entry that caused the fault. This prevents the fault occurring the next time the memory is accessed. Software does not have to flush the entry from the TLB after setting the flag.

Software management of the access flag
With an implementation that requires software to manage the access flag, an Access Flag fault is generated when both:
• the SCTLR.AFE bit is set to 1
• a translation table entry with the access flag set to 0 is read into the TLB.

Hardware management of the access flag
An implementation can choose to provide hardware management of the access flag.
In this case, when the SCTLR.AFE bit is set to 1 and a translation table entry with the access flag set to 0 is read into the TLB, the hardware must write 1 to the access flag bit of the translation table entry in memory.
Any implementation of hardware management of the access flag must ensure that any software changes to the translation table are not lost. The architecture does not require software that performs translation table changes to use interlocked operations. The hardware management mechanisms for the access flag must prevent any loss of data written to translation table entries that might occur when, for example, a write by another processor occurs between the read and write phases of a translation table walk that updates the access flag.
An implementation that provides hardware management of the access flag:
• does not generate Access Flag faults when the access flag is enabled
• uses the HW access flag field, ID_MMFR2[31:28], to indicate this implementation choice, see c0, Memory Model Feature Register 2 (ID_MMFR2) on page B5-14.
Architecturally, an operating system that makes use of the access flag must support the software faulting option that uses the Access Flag fault. This provides compatibility between systems that include a hardware implementation of the access flag and those systems that do not implement this feature.
When an implementation provides hardware management of the access flag it must also implement the SCTLR.HA bit, which can be used to enable or disable the access flag mechanism. See c1, System Control Register (SCTLR) on page B3-96.

Changing the access flag enable
It is UNPREDICTABLE whether the TLB caches the effect of the SCTLR.AFE bit on translation tables.
This means that, after changing the SCTLR.AFE bit, software must invalidate the TLB before it relies on the effect of the new value of the SCTLR.AFE bit.

B3.4 Address mapping restrictions
ARMv6 supported a page coloring restriction that, when implemented, required all Virtual Address aliases of a given Physical Address to have the same value for address bits [13:12]. This page coloring restriction was required to support Virtually Indexed Physically Tagged (VIPT) caches with a cache way size larger than 4KB. For details of the page coloring restriction see Virtual to physical translation mapping restrictions on page AppxG-26.
ARMv7 does not support page coloring, and requires that all data and unified caches behave as Physically Indexed Physically Tagged (PIPT) caches.
Note
An ARMv7 implementation might use techniques such as hardware alias avoidance to make a VIPT cache behave as a PIPT cache, and might improve performance by avoiding accesses to frequently alternating aliases to a physical address. Such approaches give good results, but ARM recommends migration to the use of true PIPT caches for all data and unified caches.
In an ARMv7 implementation, any data or unified cache maintenance operation that operates on a virtual address must take account of the fact that the cache behaves as a PIPT cache. This means that the implementation must perform the appropriate action on the physical address that corresponds to the MVA targeted by the operation.
The ARMv7 requirements for instruction caches are described in Requirements for instruction caches.
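The reason for the ARMv6 page coloring restriction described at the start of this section can be sketched in C. In a VIPT cache the set index uses virtual address bits up to log2(way size) - 1, so when a way is larger than 4KB some index bits lie above the page offset and two aliases of one physical page can land in different sets. The helper names are hypothetical, way sizes are assumed to be powers of two, and the 16KB way used for the ARMv6 check is an assumption chosen to match the bits [13:12] rule.

```c
#include <stdbool.h>
#include <stdint.h>

/* Number of page "colors": each way spans way_size / 4KB page-sized slots. */
static uint32_t vipt_colors(uint32_t way_size) {
    return way_size <= 4096u ? 1u : way_size / 4096u;
}

/* Color of a VA: the set-index bits above the page offset, VA[log2(way_size)-1:12]. */
static uint32_t vipt_color(uint32_t va, uint32_t way_size) {
    return way_size <= 4096u ? 0u : (va & (way_size - 1u)) >> 12;
}

/* ARMv6-style rule for a 16KB way: aliases of one PA must agree in VA[13:12]. */
static bool armv6_alias_allowed(uint32_t va1, uint32_t va2) {
    return vipt_color(va1, 16384u) == vipt_color(va2, 16384u);
}
```

With a 16KB way there are four colors, which is why two VA bits, [13:12], had to match.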
B3.4.1 Requirements for instruction caches
In a base VMSAv7 implementation, the following conditions require cache maintenance of an instruction cache:
• writing new data to an instruction address
• writing new address mappings to the translation table
• changing one or more of the TTBR0, TTBR1 and TTBCR registers without changing the ASID
• enabling or disabling the MMU, by writing to the SCTLR.
Note
These conditions are consistent with the maintenance required for an ASID-tagged Virtually Indexed Virtually Tagged (VIVT) instruction cache that also includes a security status bit for each cache entry.
VMSAv7 can be implemented with an optional extension, the IVIPT extension (Instruction cache Virtually Indexed Physically Tagged extension). The effect of this extension is to reduce the instruction cache maintenance requirement to a single condition:
• writing new data to an instruction address.
Note
This condition is consistent with the maintenance required for a Virtually Indexed Physically Tagged (VIPT) instruction cache.
Software can read the Cache Type Register to determine whether the IVIPT extension is implemented, see c0, Cache Type Register (CTR) on page B3-83.
Functionally, the relationship between cache type and the software management requirement depends on whether the operating system uses ASIDs to distinguish processes that use different translation tables:
• when ASIDs are used, management is similar for a VIPT and an ASID-tagged VIVT cache
• when ASIDs are not used, management is similar for a VIVT and an ASID-tagged VIVT cache.
A remapping policy that supports ASID changes means that translation tables can be swapped simply by updates to the TTBR0, TTBR1 and TTBCR registers, with an appropriate change of the ASID held in the CONTEXTIDR, see Synchronization of changes of ASID and TTBR on page B3-60.
Such changes are transparent to an ASID-tagged VIVT instruction cache until an ASID value is reused. In contrast, a VIVT instruction cache that is not ASID-tagged must be invalidated whenever the virtual to physical address mappings change. Therefore, such a cache must be invalidated on an ASID change.
Software written to rely on a VIPT instruction cache must only be used with processors that implement the IVIPT extension. For maximum compatibility across processors, ARM recommends that operating systems target the ARMv7 base architecture that uses ASID-tagged VIVT instruction caches, and do not assume the presence of the IVIPT extension. Software that relies on the IVIPT extension might fail in an UNPREDICTABLE way on an ARMv7 implementation that does not include the IVIPT extension.
With an instruction cache, the distinction between a VIPT cache and a PIPT cache is much less visible to the programmer than it is for a data cache, because normally the contents of an instruction cache are not changed by writing to the cached memory. However, there are situations where a program must distinguish between the different cache tagging strategies. Example B3-1 describes such a situation.
Example B3-1 A situation where software must be aware of the Instruction cache tagging strategy
Two processes, P1 and P2, share some code and have separate virtual mappings to the same region of instruction memory. P1 changes this region, for example as a result of a JIT, or some other self-modifying code operation. P2 needs to see the modified code.
As part of its self-modifying code operation, P1 must invalidate the changed locations from the instruction cache. For more information, see Ordering of cache and branch predictor maintenance operations on page B2-21. If this invalidation is performed by MVA, and the instruction cache is a VIPT cache, then P2 might continue to see the old code.
In this situation, if the instruction cache is a VIPT cache, the entire instruction cache must be invalidated after the code modification to ensure that P2 observes the new version of the code.

Note
Software can read the Cache Type Register to determine whether the instruction cache is PIPT, VIPT, or ASID-tagged VIVT, see c0, Cache Type Register (CTR) on page B3-83.

B3.4.2 Instruction cache maintenance operations by MVA

On a cache maintenance operation by MVA, the generation of Data Abort exceptions can depend on the tagging strategy of the instruction cache:
• With an ASID-tagged VIVT instruction cache, it is IMPLEMENTATION DEFINED whether the TLB is checked to see whether a valid translation table mapping exists for the VA used by a cache maintenance operation. Therefore, it is IMPLEMENTATION DEFINED whether cache maintenance operations by MVA can generate Data Abort exceptions.
• With a VIPT or PIPT instruction cache, the TLB must be checked and an abort is generated on a Translation fault or an Access Flag fault. No abort is generated on a Domain or Permission fault.

For maximum portability, ARM recommends that operating systems always provide an abort handler to process Data Abort exceptions on instruction cache maintenance operations by MVA, even though some ARMv7 implementations might not be capable of generating these aborts.

The effect of an instruction cache maintenance operation by MVA can depend on the tagging strategy of the instruction cache:
• For an ASID-tagged VIVT instruction cache or a VIPT instruction cache, the effect of the operation is only guaranteed to apply to the modified virtual address supplied to the instruction. It is not guaranteed to apply to any other alias of that modified virtual address.
• For a PIPT instruction cache, the effect of the operation applies to all aliases of the modified virtual address supplied to the instruction.

B3.5 Secure and Non-secure address spaces

When implemented, the Security Extensions provide two physical address spaces, a Secure physical address space and a Non-secure physical address space.

The translation table registers TTBR0, TTBR1 and TTBCR are banked between Secure and Non-secure versions, and the security state of a memory access selects the corresponding version of the registers. Therefore, the translation tables are separated between Secure and Non-secure versions.

Translation table walks are made to the physical address space corresponding to the security state of the translation tables.

Non-secure translation table entries can only translate a virtual address to a physical address in the Non-secure physical address space. Secure translation table entries can translate a virtual address to a physical address in either the Secure or the Non-secure address space. Selection of which physical address space to use is managed by the NS field in the first-level descriptors, see First-level descriptors on page B3-8:
• for Non-secure translation table entries, the NS field is ignored
• for Secure translation table entries, the NS field determines which physical address space is accessed:
  NS == 0  Secure physical address space is accessed.
  NS == 1  Non-secure physical address space is accessed.

Because the NS field is defined only in the first-level translation tables, the granularity of the Secure and Non-secure memory spaces is 1MB. However, within these memory regions you can define physical memory regions with a granularity of 4KB. For more information, see Translation tables on page B3-7.
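The NS selection rules above reduce to a small decision function. This is a minimal C sketch with hypothetical names. It models only the rules stated in this section: the NS field is ignored for Non-secure tables and selects the address space for Secure tables. It also shows the 1MB granularity that follows from NS being present only in first-level descriptors, each of which covers 1MB of virtual address space.

```c
#include <stdbool.h>
#include <stdint.h>

typedef enum { PA_SPACE_SECURE, PA_SPACE_NONSECURE } pa_space;

/* secure_tables: the entry comes from the Secure translation tables.
 * ns_bit: NS field of the first-level descriptor. */
static pa_space select_pa_space(bool secure_tables, unsigned ns_bit)
{
    if (!secure_tables)
        return PA_SPACE_NONSECURE;          /* NS field is ignored */
    return (ns_bit & 1u) ? PA_SPACE_NONSECURE : PA_SPACE_SECURE;
}

/* Number of the 1MB region containing va. All addresses in one region
 * share a single first-level descriptor, and so a single NS value. */
static uint32_t mb_region(uint32_t va)
{
    return va >> 20;
}
```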
Note
A system implementation can alias parts of the Secure physical address space to the Non-secure physical address space in implementation-specific ways. As with any other aliasing of physical memory, the use of aliases in this way can require the use of cache maintenance operations to ensure that changes to memory made using one alias of the physical memory are visible to accesses to the other alias of the physical memory.

B3.5.1 The effect of the Security Extensions on the cache operations

When the Security Extensions are implemented and each security state has its own physical address space, Table B3-3 shows the effect of the security state on the cache operations.

Table B3-3 Effect of the security state on the cache operations

Cache operation | Security state | Targeted entry
Instruction cache operations:
Invalidate All | Non-secure | All instruction cache lines that contain entries that can be accessed from the Non-secure security state
Invalidate All | Secure | All instruction cache lines
Invalidate by MVA | Either | Base architecture: all lines that match the specified MVA and the current ASID and come from the same virtual address space as the current security state. IVIPT extension (a): all lines that match the specified MVA and the current ASID and come from the same physical address space, as described in the translation tables
Data or unified cache operations:
Invalidate, Clean, Clean and Invalidate by set/way | Non-secure | Line specified by set/way, provided that the entry comes from the Non-secure physical address space
Invalidate, Clean, Clean and Invalidate by set/way | Secure | Line specified by set/way, regardless of the physical address space that the entry has come from
Invalidate, Clean, Clean and Invalidate by MVA | Either | All lines that match the specified MVA and the current ASID and come from the same physical address space, as described in the translation tables

a. For more information about the IVIPT extension see Requirements for instruction caches on page B3-23.

For locked entries and entries that might be locked, the behavior of cache maintenance operations described in The interaction of cache lockdown with cache maintenance on page B2-18 applies. This behavior is not affected by the Security Extensions. With an implementation that generates aborts if entries are locked or might be locked in the cache, if the use of lockdown aborts is enabled then these aborts can occur on any cache maintenance operation regardless of the Security Extensions.

B3.6 Memory access control

Access to a memory region is controlled by the access permission bits and the domain field in the TLB entry. These form part of the translation table entry formats described in Translation tables on page B3-7. The bits and fields are summarized in First-level descriptors on page B3-8 and Second-level descriptors on page B3-10.

The TLB memory access controls are described in:
• Access permissions
• The Execute Never (XN) attribute and instruction prefetching on page B3-30
• Domains on page B3-31.

B3.6.1 Access permissions

The access permission bits control access to the corresponding memory region. If an access is made to an area of memory without the required permissions, a Permission fault is generated if the domain is set to Client. The access permissions are determined by the AP[2:0] bits in the translation table entry. The XN bit in the translation table entry provides an additional permission bit for instruction fetches.

Note
• Before VMSAv7, the SCTLR.S and SCTLR.R bits also affected the access permissions. For more information, see Translation attributes on page AppxH-22.
• From VMSAv7, the full set of access permissions shown in Table B3-4 is supported only when the SCTLR.AFE bit is set to 0. When SCTLR.AFE = 1, the only supported access permissions are those described in Simplified access permissions model on page B3-29.
• In previous issues of the ARM Architecture Reference Manual and in some other documentation, the AP[2] bit in the translation table entries is described as the APX bit.

Table B3-4 shows the encoding of the access permissions:

Table B3-4 VMSAv7 MMU access permissions

AP[2] | AP[1:0] | Privileged permissions | User permissions | Description
0 | 00 | No access | No access | All accesses generate Permission faults
0 | 01 | Read/write | No access | Privileged access only
0 | 10 | Read/write | Read-only | Writes in User mode generate Permission faults
0 | 11 | Read/write | Read/write | Full access
1 | 00 | - | - | Reserved
1 | 01 | Read-only | No access | Privileged read-only
1 | 10 | Read-only | Read-only | Privileged and User read-only, deprecated in VMSAv7 (a)
1 | 11 | Read-only | Read-only | Privileged and User read-only (b)

a. From VMSAv7, ARM strongly recommends that the 0b11 encoding is used for Privileged and User read-only.
b. This mapping is introduced in VMSAv7, and is reserved in VMSAv6. For more information, see Simplified access permissions model.

Each memory region can be tagged as not containing executable code. If the Execute Never (XN) bit is set to 1 and the region is in a Client domain, any attempt to execute an instruction in that region results in a Permission fault. If the XN bit is 0 and there is valid read permission, code can execute from that memory region, provided that no other Prefetch Abort condition exists.

Note
The XN bit is ignored on accesses to Manager domains.
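Table B3-4 and the XN rule above, together with the simplified two-flag model described next, can be expressed as a lookup table plus two small helpers. This is a minimal C sketch with hypothetical names. client_can_execute() covers only the XN and read-permission conditions for a Client domain; other Prefetch Abort conditions and Manager domains (where neither AP nor XN is checked) are left out.

```c
#include <stdbool.h>

typedef enum { ACC_NONE, ACC_RO, ACC_RW, ACC_RESERVED } acc_t;

/* Permissions granted by one AP[2:0] encoding. */
typedef struct {
    acc_t priv;  /* privileged modes */
    acc_t user;  /* User mode */
} ap_perms;

/* Decode AP[2] and AP[1:0] as in Table B3-4 (full model, SCTLR.AFE == 0). */
static ap_perms decode_ap(unsigned ap2, unsigned ap10)
{
    static const ap_perms table[2][4] = {
        /* AP[2] == 0: AP[1:0] = 00, 01, 10, 11 */
        { {ACC_NONE, ACC_NONE}, {ACC_RW, ACC_NONE},
          {ACC_RW, ACC_RO},     {ACC_RW, ACC_RW} },
        /* AP[2] == 1: 00 is reserved, 10 is deprecated in VMSAv7 */
        { {ACC_RESERVED, ACC_RESERVED}, {ACC_RO, ACC_NONE},
          {ACC_RO, ACC_RO},             {ACC_RO, ACC_RO} },
    };
    return table[ap2 & 1u][ap10 & 3u];
}

/* Execution from a Client domain needs XN == 0 and read permission at the
 * current privilege level. */
static bool client_can_execute(ap_perms p, bool privileged, unsigned xn)
{
    acc_t a = privileged ? p.priv : p.user;
    return (xn & 1u) == 0u && (a == ACC_RO || a == ACC_RW);
}

/* Simplified model (Table B3-5): AP[2] selects read-only, AP[1] selects
 * User access. With SCTLR.AFE == 0, AP[0] is set to 1; with AFE == 1,
 * AP[0] is the access flag and is taken from af. Returns AP[2:0]. */
static unsigned simple_ap(bool read_only, bool user, bool afe, bool af)
{
    unsigned ap0 = afe ? (af ? 1u : 0u) : 1u;
    return ((read_only ? 1u : 0u) << 2) | ((user ? 1u : 0u) << 1) | ap0;
}
```

Feeding an encoding produced by simple_ap() with afe false back into decode_ap() gives the access combination listed for it in Table B3-5, which is why the simplified model depends on the AP[2:0] == 0b111 encoding introduced in VMSAv7.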
Simplified access permissions model

Some memory management schemes require a simple access permissions model where:
• one flag selects between read-only and read/write access
• a second flag selects between User and Kernel control.

In the ARM architecture, this model permits four access combinations:
• read-only by both privileged and unprivileged code
• read/write by both privileged and unprivileged code
• read-only by privileged code, no access by unprivileged code
• read/write by privileged code, no access by unprivileged code.

With the VMSAv7 MMU access permissions shown in Table B3-4 on page B3-28, this model is implemented by:
• Setting the AP[0] bit to 1, unless the SCTLR.AFE bit is set to 1, see c1, System Control Register (SCTLR) on page B3-96.
• Using the AP[2:1] bits to control access, as shown in Table B3-5 on page B3-30.

Table B3-5 VMSAv7 simple access control model

AP[2] | AP[1] | Access (a)
0 | 0 | Kernel, read/write
0 | 1 | User, read/write
1 | 0 | Kernel, read-only
1 | 1 | User, read-only

a. Kernel access corresponds to access by privileged code only.

Note
This model depends on the definition of the AP[2] == 1, AP[1:0] == 0b11 encoding shown in Table B3-4 on page B3-28. This encoding is introduced in VMSAv7, and therefore the simplified access permissions model cannot be supported in VMSAv6.

When the SCTLR.AFE bit is set to 1, the AP[0] bit becomes an access flag, see The access flag on page B3-21. In this case, this simplified access permissions model becomes the only supported access permissions model.

B3.6.2 The Execute Never (XN) attribute and instruction prefetching

An implementation must not fetch instructions from any memory location that is marked as Execute Never. A location is marked as Execute Never when it has its XN attribute set to 1 in a Client domain.
When the MMU is enabled, instructions can only be fetched or prefetched from memory locations in Client domains where:
• XN is set to 0
• valid read permissions exist
• no other Prefetch Abort condition exists.

Any region of memory that is read-sensitive must be marked as Execute Never, to avoid the possibility of a speculative prefetch accessing the memory region. For example, any memory region that corresponds to a read-sensitive peripheral must be marked as Execute Never.

The XN attribute is not checked for domains marked as Manager. Read-sensitive memory must not be included in domains marked as Manager, because the XN bit does not prevent prefetches in these cases.

The XN attribute is not checked when the MMU is disabled. All VMSAv7 implementations must ensure that, when the MMU is disabled, prefetching down non-sequential paths cannot cause unwanted accesses to read-sensitive devices.

B3.6.3 Domains

A domain is a collection of memory regions. The ARM VMSA supports 16 domains, and each VMSA memory region is assigned to a domain:
• First-level translation table entries for Page tables and Sections include a domain field.
• Translation table entries for Supersections do not include a domain field. Supersections are defined as being in domain 0.
• Second-level translation table entries inherit a domain setting from the parent first-level Page table entry.
• Each TLB entry includes a domain field.

A domain field specifies which domain the entry is in. Access to each domain is controlled by a two-bit field in the Domain Access Control Register, see c3, Domain Access Control Register (DACR) on page B3-119. Each field enables access to an entire domain to be enabled and disabled very quickly, so that whole memory areas can be swapped in and out of virtual memory very efficiently.
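Reading and updating one domain's two-bit DACR field can be sketched in C. This sketch assumes that domain n is controlled by DACR bits [2n+1:2n], as described in c3, Domain Access Control Register (DACR) on page B3-119, and uses the access values of Table B3-6; the function names are hypothetical. A Supersection would be looked up with domain 0.

```c
#include <stdint.h>

/* Two-bit DACR field values (Table B3-6). */
typedef enum {
    DOMAIN_NO_ACCESS = 0, /* 0b00: any access generates a Domain fault */
    DOMAIN_CLIENT    = 1, /* 0b01: accesses checked against the AP bits */
    DOMAIN_RESERVED  = 2, /* 0b10: UNPREDICTABLE */
    DOMAIN_MANAGER   = 3  /* 0b11: accesses not checked against the AP bits */
} domain_access;

/* Return the access value for one of the 16 domains. */
static domain_access dacr_get(uint32_t dacr, unsigned domain)
{
    return (domain_access)((dacr >> (2u * (domain & 0xFu))) & 0x3u);
}

/* Return a new DACR value with one domain's field replaced. */
static uint32_t dacr_set(uint32_t dacr, unsigned domain, domain_access a)
{
    unsigned shift = 2u * (domain & 0xFu);
    return (dacr & ~(0x3u << shift)) | ((uint32_t)a << shift);
}
```

Because a whole domain is switched by rewriting one two-bit field, revoking access to every region in that domain needs only a single DACR write, which is the speed advantage the text describes.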
The VMSA supports two kinds of domain access:
Clients   Users of domains, guarded by the access permissions of the TLB entries for that domain. Clients execute programs and access data held in the domain.
Managers  Control the behavior of the domain, and are not guarded by the access permissions for TLB entries in that domain. The domain behavior controlled by a Manager covers:
          • the sections and pages currently in the domain
          • the current access permissions for the domain.

A single program might:
• be a Client of some domains
• be a Manager of some other domains
• have no access to the remaining domains.

This permits very flexible memory protection for programs that access different memory resources.

Table B3-6 shows the encoding of the bits in the DACR.

Table B3-6 Domain access values

Value | Access type | Description
00 | No access | Any access generates a Domain fault
01 | Client | Accesses are checked against the access permission bits in the TLB entry
10 | Reserved | Using this value has UNPREDICTABLE results
11 | Manager | Accesses are not checked against the access permission bits in the TLB entry, so a Permission fault cannot be generated

B3.7 Memory region attributes

Each TLB entry has an associated set of memory region attributes. These control accesses to the caches, how the write buffer is used, and whether the memory region is Shareable and therefore must be kept coherent.

From VMSAv6:
• Most of the memory attributes are controlled by the C and B bits and the TEX[2:0] field of the translation table entries. More information about these attributes is given in the sections:
  — The alternative descriptions of the Memory region attributes
  — C, B, and TEX[2:0] encodings without TEX remap on page B3-33
  — Memory region attribute descriptions when TEX remap is enabled on page B3-34.
• When the Security Extensions are implemented, the NS bit provides an additional memory attribute, see Secure and Non-secure address spaces on page B3-26.

Note
The Bufferable (B), Cacheable (C), and Type Extension (TEX) bit names are inherited from earlier versions of the architecture. These names no longer adequately describe the function of the B, C, and TEX bits.

The translation table entries also include an S bit. This bit:
• Is ignored if the entry refers to Device or Strongly-ordered memory.
• For Normal memory, determines whether the memory region is Shareable or Non-shareable:
  S == 0  Normal memory region is Non-shareable.
  S == 1  Normal memory region is Shareable.

B3.7.1 The alternative descriptions of the Memory region attributes

From VMSAv7, there are two alternative schemes for describing the memory region attributes, and the current scheme is selected by the SCTLR.TRE (TEX Remap Enable) bit, see c1, System Control Register (SCTLR) on page B3-96. The two schemes are:
TRE == 0  TEX Remap disabled. TEX[2:0] are used, with the C and B bits, to describe the memory region attributes. This is the scheme used in VMSAv6, and it is described in C, B, and TEX[2:0] encodings without TEX remap on page B3-33.
TRE == 1  TEX Remap enabled. TEX[2:1] are reassigned for use as flags managed by the operating system. The TEX[0], C and B bits are used to describe the memory region attributes, with the MMU remap registers:
          • the Primary Region Remap Register, PRRR
          • the Normal Memory Remap Register, NMRR.
          This scheme is described in Memory region attribute descriptions when TEX remap is enabled on page B3-34.

When the Security Extensions are implemented, the SCTLR.TRE bit is banked between the Secure and Non-secure states.
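With SCTLR.TRE == 0, the encodings tabulated in Table B3-7 and Table B3-8 (in B3.7.2, which follows) can be decoded mechanically. This is a minimal C sketch with hypothetical names. It returns the memory type, the Inner and Outer cache attributes for Normal memory, and whether the region is Shareable, taking the S bit into account only for Normal memory, as described above.

```c
#include <stdbool.h>

typedef enum { MT_STRONGLY_ORDERED, MT_DEVICE, MT_NORMAL, MT_IMPDEF, MT_RESERVED } mem_type;

/* Table B3-8 encoding, shared by the Inner and Outer attributes. */
typedef enum {
    CA_NON_CACHEABLE = 0, /* 0b00 */
    CA_WB_WA         = 1, /* 0b01 Write-Back, Write-Allocate */
    CA_WT_NO_WA      = 2, /* 0b10 Write-Through, no Write-Allocate */
    CA_WB_NO_WA      = 3  /* 0b11 Write-Back, no Write-Allocate */
} cache_attr;

typedef struct {
    mem_type type;
    cache_attr inner, outer; /* meaningful only when type == MT_NORMAL */
    bool shareable;
} mem_attrs;

/* Decode TEX[2:0], C, B and S when SCTLR.TRE == 0 (Table B3-7). */
static mem_attrs decode_tre0(unsigned tex, unsigned c, unsigned b, unsigned s)
{
    mem_attrs m = { MT_RESERVED, CA_NON_CACHEABLE, CA_NON_CACHEABLE, false };
    unsigned cb = ((c & 1u) << 1) | (b & 1u);
    bool s_bit = (s & 1u) != 0u;

    tex &= 7u;
    if (tex & 4u) {                         /* TEX == 0b1BB: Cacheable memory */
        m.type = MT_NORMAL;
        m.inner = (cache_attr)cb;           /* AA = C,B */
        m.outer = (cache_attr)(tex & 3u);   /* BB = TEX[1:0] */
        m.shareable = s_bit;
        return m;
    }
    switch ((tex << 2) | cb) {              /* TEX[1:0]:C:B with TEX[2] == 0 */
    case 0x0: m.type = MT_STRONGLY_ORDERED; m.shareable = true; break;
    case 0x1: m.type = MT_DEVICE; m.shareable = true; break;   /* Shareable Device */
    case 0x2: m.type = MT_NORMAL; m.inner = m.outer = CA_WT_NO_WA; m.shareable = s_bit; break;
    case 0x3: m.type = MT_NORMAL; m.inner = m.outer = CA_WB_NO_WA; m.shareable = s_bit; break;
    case 0x4: m.type = MT_NORMAL; m.shareable = s_bit; break;  /* Non-cacheable */
    case 0x6: m.type = MT_IMPDEF; break;
    case 0x7: m.type = MT_NORMAL; m.inner = m.outer = CA_WB_WA; m.shareable = s_bit; break;
    case 0x8: m.type = MT_DEVICE; m.shareable = false; break;  /* Non-shareable Device */
    default:  break;                        /* 0x5 and 0x9 to 0xF: Reserved */
    }
    return m;
}
```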
B3.7.2 C, B, and TEX[2:0] encodings without TEX remap

Table B3-7 shows the C, B, and TEX[2:0] encodings when TEX remap is disabled (TRE == 0).

Table B3-7 TEX, C, and B encodings when TRE == 0

TEX[2:0] | C | B | Description | Memory type | Page Shareable
000 | 0 | 0 | Strongly-ordered | Strongly-ordered | Shareable
000 | 0 | 1 | Shareable Device | Device | Shareable
000 | 1 | 0 | Outer and Inner Write-Through, no Write-Allocate | Normal | S bit (a)
000 | 1 | 1 | Outer and Inner Write-Back, no Write-Allocate | Normal | S bit (a)
001 | 0 | 0 | Outer and Inner Non-cacheable | Normal | S bit (a)
001 | 0 | 1 | Reserved | - | -
001 | 1 | 0 | IMPLEMENTATION DEFINED | IMPLEMENTATION DEFINED | IMPLEMENTATION DEFINED
001 | 1 | 1 | Outer and Inner Write-Back, Write-Allocate | Normal | S bit (a)
010 | 0 | 0 | Non-shareable Device | Device | Non-shareable
010 | 0 | 1 | Reserved | - | -
010 | 1 | X | Reserved | - | -
011 | X | X | Reserved | - | -
1BB | A | A | Cacheable memory: AA = Inner attribute (b), BB = Outer attribute | Normal | S bit (a)

a. Whether the memory is Shareable depends on the value of the S bit, see the description in Memory region attributes on page B3-32.
b. For more information, see Cacheable memory attributes on page B3-34.

See Memory types and attributes and the memory order model on page A3-24 for an explanation of Normal, Strongly-ordered and Device memory types and of the Shareable attribute.

Cacheable memory attributes

When TEX[2] == 1, the translation table entry describes Cacheable memory, and the rest of the encoding defines the Inner and Outer cache attributes:
TEX[1:0]  defines the Outer cache attribute
C,B       defines the Inner cache attribute

The same encoding is used for the Outer and Inner cache attributes. Table B3-8 shows the encoding.
Table B3-8 Inner and Outer cache attribute encoding

Encoding | Cache attribute
00 | Non-cacheable
01 | Write-Back, Write-Allocate
10 | Write-Through, no Write-Allocate
11 | Write-Back, no Write-Allocate

B3.7.3 Memory region attribute descriptions when TEX remap is enabled

The VMSAv6 scheme for describing the memory region attributes, described in C, B, and TEX[2:0] encodings without TEX remap on page B3-33, uses the TEX[2:0], C and B bits to describe all of the options for Inner and Outer cacheability. However, many system software implementations do not need to use all of these options simultaneously. Instead, a smaller subset of attributes can be enabled. This alternative functionality is called TEX remap, and permits software to hold software-interpreted values in the translation tables.

When TEX remap is enabled:
• only the TEX[0], C and B bits are used to describe the memory region attributes
• fewer attribute options are available at any time
• the available options are configurable using the PRRR and NMRR registers
• TEX[2:1] are not updated by hardware, see The OS managed translation table bits on page B3-38.

When TEX remap is enabled:
• For seven of the eight possible combinations of the TEX[0], C and B bits:
  — a field in the PRRR defines the corresponding memory region as being Normal, Device or Strongly-ordered memory
  — a field in the NMRR defines the Inner cache attributes that apply if the PRRR field identifies the region as Normal memory
  — a second field in the NMRR defines the Outer cache attributes that apply if the PRRR field identifies the region as Normal memory.
• The meaning of the eighth combination for the TEX[0], C and B bits is IMPLEMENTATION DEFINED.
• Four bits in the PRRR permit mapping of the Shareable attribute by defining, for the translation table S bit:
  — the meaning of S == 0 if the region is identified as Device memory
  — the meaning of S == 1 if the region is identified as Device memory
  — the meaning of S == 0 if the region is identified as Normal memory
  — the meaning of S == 1 if the region is identified as Normal memory.
  In each case, the meaning of the Shareable bit value is that the memory region is one of:
  — Shareable
  — Non-shareable.

For each of the possible encodings of the TEX[0], C and B bits in a translation table entry, Table B3-9 shows which fields of the PRRR and NMRR registers describe the memory region attributes.

Table B3-9 TEX, C, and B encodings when TRE == 1

TEX[0] | C | B | Memory type (a) | Inner cache (a, b) | Outer cache (a, b) | Outer Shareable attribute (b)
0 | 0 | 0 | PRRR[1:0] | NMRR[1:0] | NMRR[17:16] | NOT(PRRR[24])
0 | 0 | 1 | PRRR[3:2] | NMRR[3:2] | NMRR[19:18] | NOT(PRRR[25])
0 | 1 | 0 | PRRR[5:4] | NMRR[5:4] | NMRR[21:20] | NOT(PRRR[26])
0 | 1 | 1 | PRRR[7:6] | NMRR[7:6] | NMRR[23:22] | NOT(PRRR[27])
1 | 0 | 0 | PRRR[9:8] | NMRR[9:8] | NMRR[25:24] | NOT(PRRR[28])
1 | 0 | 1 | PRRR[11:10] | NMRR[11:10] | NMRR[27:26] | NOT(PRRR[29])
1 | 1 | 0 | IMPLEMENTATION DEFINED | IMPLEMENTATION DEFINED | IMPLEMENTATION DEFINED | IMPLEMENTATION DEFINED
1 | 1 | 1 | PRRR[15:14] | NMRR[15:14] | NMRR[31:30] | NOT(PRRR[31])

a. For details of the memory type field encodings see c10, Primary Region Remap Register (PRRR) on page B3-143. For details of the cache attribute encodings see Table B3-8 on page B3-34.
b. Only applies if the memory type for the region is mapped as Normal memory and the location is Shareable.
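Table B3-9 follows a regular pattern: for n = TEX[0]:C:B, the memory type is PRRR[2n+1:2n], the Inner cache attribute is NMRR[2n+1:2n], the Outer cache attribute is NMRR[2n+17:2n+16], and the Outer Shareable attribute is NOT(PRRR[24+n]). A minimal C sketch of that lookup, with hypothetical names. It returns the raw two-bit fields, whose encodings are given on the PRRR page and in Table B3-8, and it does not special-case n == 6, whose meaning is IMPLEMENTATION DEFINED.

```c
#include <stdint.h>

typedef struct {
    unsigned type;            /* PRRR[2n+1:2n], memory type field */
    unsigned inner;           /* NMRR[2n+1:2n], Table B3-8 encoding */
    unsigned outer;           /* NMRR[2n+17:2n+16], Table B3-8 encoding */
    unsigned outer_shareable; /* NOT(PRRR[24+n]) */
} remap_fields;

/* Caller must not rely on the result for TEX[0]:C:B == 0b110. */
static remap_fields tre1_fields(uint32_t prrr, uint32_t nmrr,
                                unsigned tex0, unsigned c, unsigned b)
{
    unsigned n = ((tex0 & 1u) << 2) | ((c & 1u) << 1) | (b & 1u);
    remap_fields f;

    f.type  = (prrr >> (2u * n)) & 3u;
    f.inner = (nmrr >> (2u * n)) & 3u;
    f.outer = (nmrr >> (2u * n + 16u)) & 3u;
    f.outer_shareable = ((prrr >> (24u + n)) & 1u) ^ 1u;
    return f;
}
```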
To find the meaning of the value of the S bit in a translation table entry you must:
• use Table B3-9 to find the memory type of the region described by the entry
• if the memory type is Strongly-ordered then the region is Shareable
• if the memory type is not Strongly-ordered then look up the memory type and value of the S bit in Table B3-10 on page B3-36 to find which bit of the PRRR defines the Shareable attribute of the region. The appropriate bit of the PRRR indicates whether the region is Shareable or Non-shareable.

Table B3-10 Remapping of the S bit

Memory type | Shareable attribute when S = 0 | Shareable attribute when S = 1
Strongly-ordered | Shareable (a) | Shareable (a)
Device | PRRR[16] | PRRR[17]
Normal | PRRR[18] | PRRR[19]

a. No remapping, Strongly-ordered memory is always Shareable.

Note
When TEX remapping is enabled, it is possible for a translation table entry with S = 0 to be mapped as Shareable memory.

For full descriptions of the TEX remap registers see:
• c10, Primary Region Remap Register (PRRR) on page B3-143
• c10, Normal Memory Remap Register (NMRR) on page B3-146.

When the Security Extensions are implemented, the TEX remap registers and the SCTLR.TRE bit are banked between the Secure and Non-secure security states. For more information, see The effect of the Security Extensions on TEX remapping on page B3-39.

When TEX remap is enabled, the mappings specified by the PRRR and NMRR determine the mapping of the TEX[0], C and B bits in the translation tables to memory type and cacheability attributes:
1. The primary mapping, indicated by a field in the PRRR as shown in the Memory type column of Table B3-9 on page B3-35, takes precedence.
2. Any region that is mapped as Normal memory can have the Inner and Outer Cacheable attributes determined by the NMRR.
3. If it is supported, the Outer Shareable mapping adds a third level of attribute, see Interpretation of the NOSn fields in the PRRR on page B3-37.

The TEX remap registers must be static during normal operation. In particular, when the remap registers are changed:
• it is IMPLEMENTATION DEFINED when the changes take effect
• it is UNPREDICTABLE whether the TLB caches the effect of the TEX remap on translation tables.

The sequence to ensure the synchronization of changes to the TEX remap registers is:
1. Perform a DSB. This ensures any memory accesses using the old mapping have completed.
2. Write the TEX remap registers or SCTLR.TRE bit.
3. Perform an ISB. This ensures synchronization of the register updates.
4. Invalidate the entire TLB.
5. Perform a DSB. This ensures completion of the entire TLB operation.
6. Clean and invalidate all caches. This removes any cached information associated with the old mapping.
7. Perform a DSB. This ensures completion of the cache maintenance.
8. Perform an ISB. This ensures instruction synchronization.

This extends the standard rules for the synchronization of changes to CP15 registers described in Changes to CP15 registers and the memory order model on page B3-77, and provides implementation freedom as to whether or not the effect of the TEX remap is cached.

Interpretation of the NOSn fields in the PRRR

When all of the following apply, the NOSn fields in the PRRR distinguish between Inner Shareable and Outer Shareable memory regions:
• the SCTLR.TRE bit is set to 1
• the region is mapped as Normal memory
• the Normal memory remapping of the S bit value for the entry makes the region Shareable
• the implementation supports the distinction between Inner Shareable and Outer Shareable.
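Combining Table B3-9, Table B3-10 and the NOSn conditions above, the shareability of a region with SCTLR.TRE == 1 can be resolved as in the following C sketch. Names are hypothetical. The memory type is passed in already decoded from the PRRR, the reserved PRRR type encoding is not handled, and the caller states whether the implementation distinguishes Inner Shareable from Outer Shareable (if it does not, the NOSn fields are RAZ/WI).

```c
#include <stdbool.h>
#include <stdint.h>

typedef enum { RT_STRONGLY_ORDERED, RT_DEVICE, RT_NORMAL } remap_type;
typedef enum { SH_NON_SHAREABLE, SH_SHAREABLE, SH_INNER_SHAREABLE, SH_OUTER_SHAREABLE } share_t;

/* n is TEX[0]:C:B for the entry and s its S bit; type comes from PRRR field n. */
static share_t tre1_shareability(uint32_t prrr, remap_type type, unsigned n,
                                 unsigned s, bool inner_outer_supported)
{
    unsigned bit;

    if (type == RT_STRONGLY_ORDERED)
        return SH_SHAREABLE;                 /* always Shareable, no remapping */

    /* Table B3-10: Device uses PRRR[16] (S == 0) or PRRR[17] (S == 1);
     * Normal uses PRRR[18] (S == 0) or PRRR[19] (S == 1). */
    bit = (type == RT_DEVICE ? 16u : 18u) + (s & 1u);
    if (((prrr >> bit) & 1u) == 0u)
        return SH_NON_SHAREABLE;

    /* NOSn applies only to Shareable Normal memory: Outer Shareable is NOT(PRRR[24+n]). */
    if (type == RT_NORMAL && inner_outer_supported)
        return ((prrr >> (24u + (n & 7u))) & 1u) ? SH_INNER_SHAREABLE : SH_OUTER_SHAREABLE;
    return SH_SHAREABLE;
}
```

For example, with PRRR[19] set, a Normal entry with S == 1 is Shareable, and PRRR[24+n] then decides between Outer Shareable (bit clear) and Inner Shareable (bit set).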
If the SCTLR.TRE bit is set to 0, an implementation can provide an IMPLEMENTATION DEFINED mechanism to interpret the NOSn fields in the PRRR, see SCTLR.TRE, SCTLR.M, and the effect of the MMU remap registers on page B3-38.

The values of the NOSn fields in the PRRR have no effect if any of the following apply:
• the SCTLR.TRE bit is set to 0 and the IMPLEMENTATION DEFINED mechanism has not been invoked
• the region is not mapped as Normal memory
• the Normal memory remapping of the S bit value for the entry makes the region Non-shareable.

The NOSn fields in the PRRR are RAZ/WI if the implementation does not support the distinction between Inner Shareable and Outer Shareable memory regions.

SCTLR.TRE, SCTLR.M, and the effect of the MMU remap registers

When TEX remap is disabled, because the SCTLR.TRE bit is set to 0:
• the effect of the MMU remap registers can be IMPLEMENTATION DEFINED
• the interpretation of the fields of the PRRR and NMRR registers can differ from the description given in this section.

VMSAv7 requires that the effect of these registers is limited to remapping the attributes of memory locations. These registers must not change whether any cache or MMU hardware is enabled. The mechanism by which the MMU remap registers have an effect when the SCTLR.TRE bit is set to 0 is IMPLEMENTATION DEFINED. The ARMv7 architecture requires that from reset, if the IMPLEMENTATION DEFINED mechanism has not been invoked:
• If the MMU is enabled, the architecturally-defined behavior of the TEX[2:0], C, and B bits must apply, without reference to the TEX remap functionality. In other words, memory attribute assignment must comply with the scheme described in C, B, and TEX[2:0] encodings without TEX remap on page B3-33.
• If the MMU is disabled, then the architecturally-defined behavior of the VMSA with the MMU disabled must apply, without reference to the TEX remap functionality. See Enabling and disabling the MMU on page B3-5.

Typical mechanisms for enabling the IMPLEMENTATION DEFINED effect of the TEX remap registers when the SCTLR.TRE bit is set to 0 include:
• a control bit in the ACTLR, or in a CP15 c15 register
• changing the behavior when the PRRR and NMRR registers are changed from their IMPLEMENTATION DEFINED reset values.

In addition, if the MMU is disabled and the SCTLR.TRE bit is set to 1, the architecturally-defined behavior of the VMSA with the MMU disabled must apply without reference to the TEX remap functionality.

When the Security Extensions are implemented, the IMPLEMENTATION DEFINED effect of these registers must only take effect in the security state of the registers.

The OS managed translation table bits

When TEX remap is enabled, the TEX[2:1] bits in the translation table descriptors are available as two flags that can be managed by the operating system. In VMSAv7, as long as the SCTLR.TRE bit is set to 1, the values of the TEX[2:1] bits are ignored by the memory management hardware. You can write any value to these bits in the translation tables. In a system that implements access flag updates in hardware, a hardware access flag update never changes these bits.

B3.7.4 The effect of the Security Extensions on TEX remapping

When the Security Extensions are implemented, the MMU remap registers are banked in the Secure and Non-secure security states. The register versions for the current security state apply to all TLB lookups.
The SCTLR.TRE bit is banked in the Secure and Non-secure copies of the register, and the appropriate version of this bit determines whether TEX remapping is applied to TLB lookups in the current security state.

When the Security Extensions are implemented, the translation table descriptors include an NS bit. For security reasons, the NS bit is not accessible through the MMU remap registers.

Write accesses to the Secure copies of the MMU remap registers are disabled when the CP15SDISABLE input is asserted HIGH, and the MCR operations to access these registers become UNDEFINED. For more information, see The CP15SDISABLE input on page B3-76.

B3.8 VMSA memory aborts

The mechanisms that cause the ARM processor to take an exception because of a failed memory access are:
MMU fault       The MMU detects an access restriction and signals the processor.
External abort  A memory system component other than the MMU signals an illegal or faulting memory access.

The exception taken is a Prefetch Abort exception if either of these occurs synchronously on an instruction fetch, and a Data Abort exception otherwise.

Collectively, these mechanisms are called aborts. The different abort mechanisms are described in:
• MMU faults
• External aborts on page B3-45.

An access that causes an abort is said to be aborted, and uses the Fault Address Registers (FARs) and Fault Status Registers (FSRs) to record context information. The FARs and FSRs are described in Fault Status and Fault Address registers in a VMSA implementation on page B3-48.

Also, a debug exception can cause the processor to take a Prefetch Abort exception or a Data Abort exception, and to update the FARs and FSRs. For details see Chapter C4 Debug Exceptions and Debug event prioritization on page C3-43.
B3.8.1 MMU faults

The MMU checks the memory accesses required for instruction fetches and for explicit memory accesses:
• if an instruction fetch faults it generates a Prefetch Abort exception
• if an explicit memory access faults it generates a Data Abort exception.

For more information about Prefetch Abort exceptions and Data Abort exceptions see Exceptions on page B1-30.

MMU faults are always synchronous. For more information, see Terminology for describing exceptions on page B1-4.

When the MMU generates an abort for a region of memory, no memory access is made if that region is or could be marked as Strongly-ordered or Device.

Fault-checking sequence

The sequence used by the MMU to check for access faults is slightly different for sections and pages. For both sections and pages:
• Figure B3-8 on page B3-41 shows the checking sequence
• Figure B3-9 on page B3-42 shows the descriptor fetch and check performed during the checking sequence.

[Figure B3-8 VMSA fault checking sequence]
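The decision order of Figures B3-8 and B3-9 can be expressed as a C sketch. It is a model of the flowcharts, not MMU code: the descriptor fetch, the DACR lookup and the permission check are passed in pre-evaluated through the hypothetical desc_fetch and vmsa_access structures, af_faults stands for SCTLR.AFE == 1 with software-managed access flags, and the reserved DACR value 0b10 is not modelled. As in Figure B3-8, the check order is alignment, first-level descriptor, second-level descriptor (Page path only), domain, then permissions.

```c
#include <stdbool.h>
#include <stdint.h>

typedef enum {
    VF_OK,                          /* translation yields a physical address */
    VF_ALIGNMENT_FAULT,
    VF_TRANSLATION_EXTERNAL_ABORT,  /* abort on the descriptor fetch */
    VF_TRANSLATION_FAULT,
    VF_ACCESS_FLAG_FAULT,
    VF_DOMAIN_FAULT,
    VF_PERMISSION_FAULT
} vmsa_fault;

/* DACR setting for the domain; the reserved value 0b10 is not modelled. */
typedef enum { DOM_NO_ACCESS, DOM_CLIENT, DOM_MANAGER } dom_access;

/* Hypothetical result of fetching one descriptor. */
typedef struct {
    bool fetch_aborted;  /* external abort on the translation table walk */
    uint32_t desc;       /* descriptor value; bits [1:0] give its type */
    unsigned af;         /* access flag, used only when af_faults is true */
} desc_fetch;

/* Figure B3-9: check one descriptor after it has been fetched. */
static vmsa_fault desc_check(const desc_fetch *f, bool first_level, bool af_faults)
{
    unsigned type = f->desc & 3u;

    if (f->fetch_aborted)
        return VF_TRANSLATION_EXTERNAL_ABORT;
    /* 0b00 is the fault encoding at both levels; 0b11 is reserved at the first level. */
    if (type == 0u || (first_level && type == 3u))
        return VF_TRANSLATION_FAULT;
    /* A first-level Page table descriptor (0b01) has no access flag. */
    if (af_faults && (f->af & 1u) == 0u && !(first_level && type == 1u))
        return VF_ACCESS_FLAG_FAULT;
    return VF_OK;
}

/* Hypothetical pre-evaluated summary of one access. */
typedef struct {
    bool alignment_checked, misaligned;
    desc_fetch l1, l2;     /* l2 is used only if l1 is a Page table descriptor */
    dom_access domain;     /* DACR field for the entry's domain */
    bool ap_violation;     /* the access permissions forbid this access */
} vmsa_access;

/* Figure B3-8: return the first fault found. *on_page is set when the
 * second-level (Page) path is taken, to tell Section faults from Page faults. */
static vmsa_fault vmsa_check(const vmsa_access *a, bool af_faults, bool *on_page)
{
    vmsa_fault f;

    *on_page = false;
    if (a->alignment_checked && a->misaligned)
        return VF_ALIGNMENT_FAULT;
    if ((f = desc_check(&a->l1, true, af_faults)) != VF_OK)
        return f;
    if ((a->l1.desc & 3u) == 1u) {  /* Page table: fetch the second-level descriptor */
        *on_page = true;
        if ((f = desc_check(&a->l2, false, af_faults)) != VF_OK)
            return f;
    }
    if (a->domain == DOM_NO_ACCESS)
        return VF_DOMAIN_FAULT;
    if (a->domain == DOM_CLIENT && a->ap_violation)
        return VF_PERMISSION_FAULT;
    return VF_OK;                   /* Manager, or Client with no violation */
}
```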
Figure B3-9 Descriptor fetch and check in the fault checking sequence

The faults that might be detected during the fault checking sequence are described in the following subsections:
• Alignment fault
• External abort on a translation table walk
• Translation fault on page B3-43
• Access Flag fault on page B3-43
• Domain fault on page B3-44
• Permission fault on page B3-44.

Alignment fault

The ARMv7 memory architecture requires support for strict alignment checking. This checking is controlled by the SCTLR.A bit, see c1, System Control Register (SCTLR) on page B3-96. For details of when Alignment faults are generated see Unaligned data access on page A3-5.

External abort on a translation table walk

This is described in External abort on a translation table walk on page B3-46, part of the section External aborts on page B3-45.

Translation fault

There are two types of Translation fault:

Section   This is generated if the first-level descriptor is marked as invalid. This happens if bits [1:0] of the descriptor are:
          • 0b00, the fault encoding
          • 0b11, the reserved encoding.
          For more information, see First-level descriptors on page B3-8.

Page      This is generated if the second-level descriptor is marked as invalid. This happens if bits [1:0] of the descriptor are 0b00, the fault encoding. For more information, see Second-level descriptors on page B3-10.

Translation table entries that result in Translation faults are guaranteed not to be cached, meaning the TLB is not updated.
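The Translation fault conditions depend only on bits [1:0] of the descriptor, so they can be checked with simple masking. The Python sketch below classifies short-format descriptors; the non-fault encodings it uses (page table, Section and Supersection, Large page and Small page) follow the VMSAv7 descriptor layouts described in First-level descriptors and Second-level descriptors, and the function names are hypothetical.

```python
def first_level_type(desc: int) -> str:
    """Classify a VMSAv7 short-format first-level descriptor by bits [1:0]."""
    bits = desc & 0b11
    if bits == 0b00:
        return "fault"
    if bits == 0b01:
        return "page table"
    if bits == 0b10:
        # Bit [18] distinguishes a Supersection (1) from a Section (0).
        return "supersection" if (desc >> 18) & 1 else "section"
    return "reserved"


def second_level_type(desc: int) -> str:
    """Classify a second-level descriptor by bits [1:0]."""
    bits = desc & 0b11
    if bits == 0b00:
        return "fault"
    if bits == 0b01:
        return "large page"
    return "small page"  # 0b1x: bit [0] is the XN bit of a Small page


def translation_fault(desc: int, level: int) -> bool:
    """True if this descriptor gives a Section (level 1) or Page (level 2) Translation fault."""
    if level == 1:
        return first_level_type(desc) in ("fault", "reserved")
    return second_level_type(desc) == "fault"
```

Note that the reserved first-level encoding 0b11 faults, but the second-level encoding 0b11 is a valid Small page with XN set.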
Therefore, when a Translation fault occurs, it is not necessary to perform any TLB maintenance operations to remove the faulting entries.

Translation faults can be generated by data and unified cache maintenance operations by MVA. It is IMPLEMENTATION DEFINED whether Translation faults can be generated by instruction cache invalidate by MVA operations, see Instruction cache maintenance operations by MVA on page B3-25. It is IMPLEMENTATION DEFINED whether Translation faults can be generated by branch predictor maintenance operations.

Access Flag fault

There are two types of Access Flag fault:

Section   This can be generated when a section with AF == 0 is accessed.
Page      This can be generated when a page with AF == 0 is accessed.

Access Flag faults only occur on a VMSAv7 implementation that provides software management of the access flag, and are only generated when the AFE bit is set to 1 in the SCTLR, see c1, System Control Register (SCTLR) on page B3-96.

Translation table entries that result in Access Flag faults are guaranteed not to be cached, meaning the TLB is not updated. Therefore, when an Access Flag fault occurs, it is not necessary to perform any TLB maintenance operations to remove the faulting entries.

It is IMPLEMENTATION DEFINED whether Access Flag faults can be generated by any cache maintenance operations by MVA. It is IMPLEMENTATION DEFINED whether Access Flag faults can be generated by branch predictor invalidate by MVA operations.

For more information, see The access flag on page B3-21.

Domain fault

There are two types of Domain fault:

Section   When a first-level descriptor fetch returns a valid Section first-level descriptor, the MMU checks the domain field of that descriptor against the Domain Access Control Register, and generates a Section Domain fault if this check fails.
Page      When a second-level descriptor fetch returns a valid second-level descriptor, the MMU checks the domain field of the first-level descriptor that required the second-level fetch against the Domain Access Control Register, and generates a Page Domain fault if this check fails.

Domain faults cannot occur on cache or branch predictor maintenance operations.

For more information, see Domains on page B3-31.

Where a Domain fault results in an update to the associated translation tables, the appropriate TLB entry must be flushed to ensure correctness. For more information, see the translation table entry update example in TLB maintenance operations and the memory order model on page B3-59.

Changes to the Domain Access Control Register must be synchronized by one of:
• performing an ISB operation
• an exception
• an exception return.

For details see Changes to CP15 registers and the memory order model on page B3-77.

Permission fault

When a memory access is to a Client domain, the MMU checks the access permission field in the translation table entry. As with other MMU faults, there are two types of Permission fault:

Section   This can be generated when a section in a Client domain is accessed.
Page      This can be generated when a page in a Client domain is accessed.

For details of conditions that cause a Permission fault see Access permissions on page B3-28.

Where a Permission fault results in an update to the associated translation tables, the appropriate TLB entry must be flushed to ensure correctness. For more information, see the translation table entry update example in TLB maintenance operations and the memory order model on page B3-59.

Permission faults cannot occur on cache or branch predictor maintenance operations.
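The domain check that decides between a Domain fault, a permission check, or no permission check can be sketched in Python. This assumes the VMSAv7 Domain Access Control Register layout (sixteen 2-bit fields, D<n> at bits [2n+1:2n], encoded 0b00 No access, 0b01 Client, 0b10 Reserved, 0b11 Manager) and the domain field at bits [8:5] of a Section or page table first-level descriptor. The names `descriptor_domain` and `domain_check` are hypothetical.

```python
DACR_TYPES = {0b00: "no_access", 0b01: "client", 0b10: "reserved", 0b11: "manager"}


def descriptor_domain(first_level_desc: int) -> int:
    """Domain field, bits [8:5], of a Section or page table first-level descriptor."""
    return (first_level_desc >> 5) & 0xF


def domain_check(dacr: int, first_level_desc: int, is_section: bool) -> str:
    """Apply the DACR check to the domain of a first-level descriptor.

    For a page, the domain still comes from the first-level descriptor
    that pointed to the second-level table.
    """
    domain = descriptor_domain(first_level_desc)
    access = DACR_TYPES[(dacr >> (2 * domain)) & 0b11]  # field D<domain>
    kind = "Section" if is_section else "Page"
    if access == "no_access":
        return f"{kind} Domain fault"
    if access == "client":
        return "check access permissions"
    if access == "manager":
        return "no permission check"
    return "UNPREDICTABLE"  # reserved encoding 0b10
```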
B3.8.2 External aborts

External aborts are defined as errors that occur in the memory system other than those that are detected by the MMU or Debug hardware. They include parity errors detected by the caches or other parts of the memory system. An external abort is one of:
• synchronous
• precise asynchronous
• imprecise asynchronous.

For more information, see Terminology for describing exceptions on page B1-4.

The ARM architecture does not provide a method to distinguish between precise asynchronous and imprecise asynchronous aborts.

The ARM architecture handles asynchronous aborts in a similar way to interrupts, except that they are reported to the processor using the Data Abort exception. Setting the CPSR.A bit to 1 masks asynchronous aborts, see Program Status Registers (PSRs) on page B1-14.

Normally, external aborts are rare. An imprecise asynchronous external abort is likely to be fatal to the process that is running. An example of an event that might cause an external abort is an uncorrectable parity or ECC failure on a Level 2 memory structure.

It is IMPLEMENTATION DEFINED which external aborts, if any, are supported.

VMSAv7 permits external aborts on data accesses, translation table walks, and instruction fetches to be either synchronous or asynchronous. The DFSR indicates whether the external abort is synchronous or asynchronous, see c5, Data Fault Status Register (DFSR) on page B3-121.

Note
Because imprecise external aborts are normally fatal to the process that caused them, ARM recommends that implementations make external aborts precise wherever possible.
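The interrupt-like handling of asynchronous aborts can be sketched as a pending flag gated by CPSR.A, which is bit [8] of the CPSR. This Python model is hypothetical: the class `AsyncAbortLine` and its methods are illustrative, and it assumes that a masked abort stays pending until CPSR.A is cleared, as a masked interrupt would.

```python
from typing import Optional

CPSR_A = 1 << 8  # CPSR.A, the asynchronous abort mask bit


class AsyncAbortLine:
    """Hypothetical model of a single pending asynchronous abort."""

    def __init__(self) -> None:
        self.pending = False

    def signal(self) -> None:
        """The memory system reports an asynchronous external abort."""
        self.pending = True

    def sample(self, cpsr: int) -> Optional[str]:
        """Return "Data Abort" if the pending abort is taken with this CPSR value."""
        if self.pending and not (cpsr & CPSR_A):
            self.pending = False
            return "Data Abort"
        return None  # nothing pending, or masked by CPSR.A == 1
```

In use, an abort signalled while `CPSR_A` is set is not taken; it is taken as a Data Abort on the first sample after software clears the bit.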
More information about possible external aborts is given in the subsections:
• External abort on instruction fetch on page B3-46
• External abort on data read or write on page B3-46
• External abort on a translation table walk on page B3-46
• Behavior of external aborts on a translation table walk caused by a VA to PA translation on page B3-46
• Parity error reporting on page B3-46.

For details of how external aborts are reported see Fault Status and Fault Address registers in a VMSA implementation on page B3-48.

External abort on instruction fetch

An external abort on an instruction fetch can be either synchronous or asynchronous. A synchronous external abort on an instruction fetch is taken precisely.

An implementation can report the external abort asynchronously from the instruction that it applies to. In such an implementation these aborts behave essentially as interrupts. They are masked by the CPSR.A bit when it is set to 1, otherwise they are reported using the Data Abort exception.

External abort on data read or write

Externally generated errors during a data read or write can be either synchronous or asynchronous.

An implementation can report the external abort asynchronously from the instruction that generated the access. In such an implementation these aborts behave essentially as interrupts. They are masked by the CPSR.A bit when it is set to 1, otherwise they are reported using the Data Abort exception.

External abort on a translation table walk

An external abort on a translation table walk can be either synchronous or asynchronous. If the external abort is synchronous then the result is:
• a synchronous Prefetch Abort exception if the translation table walk is for an instruction fetch
• a synchronous Data Abort exception if the translation table walk is for a data access.
An implementation can report the error in the translation table walk asynchronously from executing the instruction whose instruction fetch or memory access caused the translation table walk. In such an implementation these aborts behave essentially as interrupts. They are masked by the CPSR.A bit when it is set to 1, otherwise they are reported using the Data Abort exception.

Behavior of external aborts on a translation table walk caused by a VA to PA translation

The VA to PA translation operations described in CP15 c7, Virtual Address to Physical Address translation operations on page B3-130 require translation table walks. An external abort can occur in the translation table walk, as described in External abort on a translation table walk. The abort generates a Data Abort exception, and can be synchronous or asynchronous.

Parity error reporting

The ARM architecture supports the reporting of both synchronous and asynchronous parity errors from the cache systems. It is IMPLEMENTATION DEFINED which parity errors in the cache systems, if any, result in synchronous or asynchronous parity errors.

A fault status code is defined for reporting parity errors, see Fault Status and Fault Address registers in a VMSA implementation on page B3-48. However, when parity error reporting is implemented, it is IMPLEMENTATION DEFINED whether the assigned fault status code or another appropriate encoding is used to report parity errors.

For all purposes other than the fault status encoding, parity errors are treated as external aborts.

B3.8.3 Prioritization of aborts

For synchronous aborts, Debug event prioritization on page C3-43 describes the relationship between debug events, MMU faults and external aborts.
In general, the ARM architecture does not define when asynchronous events are taken, and therefore the prioritization of asynchronous events is IMPLEMENTATION DEFINED.

Note
A special requirement applies to asynchronous watchpoints, see Debug event prioritization on page C3-43.

B3.9 Fault Status and Fault Address registers in a VMSA implementation

This section describes the Fault Status and Fault Address registers, and how they report information about VMSA aborts. It contains the following subsections:
• About the Fault Status and Fault Address registers
• Data Abort exceptions on page B3-49
• Prefetch Abort exceptions on page B3-49
• Fault Status Register encodings for the VMSA on page B3-50
• Distinguishing read and write accesses on Data Abort exceptions on page B3-52
• Provision for classification of external aborts on page B3-52
• The Domain field in the DFSR on page B3-52
• Auxiliary Fault Status Registers on page B3-53.

Also, these registers are used to report information about debug exceptions. For details see Effects of debug exceptions on CP15 registers and the DBGWFAR on page C4-4.

B3.9.1 About the Fault Status and Fault Address registers

VMSAv7 provides four registers for reporting fault address and status information:
• The Data Fault Status Register, see c5, Data Fault Status Register (DFSR) on page B3-121. The DFSR is updated on taking a Data Abort exception.
• The Instruction Fault Status Register, see c5, Instruction Fault Status Register (IFSR) on page B3-122. The IFSR is updated on taking a Prefetch Abort exception.
• The Data Fault Address Register, see c6, Data Fault Address Register (DFAR) on page B3-124. In some cases, on taking a synchronous Data Abort exception the DFAR is updated with the faulting address. See Terminology for describing exceptions on page B1-4 for a description of synchronous exceptions.
• The Instruction Fault Address Register, see c6, Instruction Fault Address Register (IFAR) on page B3-125. The IFAR is updated with the faulting address on taking a Prefetch Abort exception.

In addition, the architecture provides encodings for two IMPLEMENTATION DEFINED Auxiliary Fault Status Registers, see Auxiliary Fault Status Registers on page B3-53.

Note
• On a Data Abort exception that is generated by an instruction cache maintenance operation, the IFSR is also updated.
• Before ARMv7, the Data Fault Address Register (DFAR) was called the Fault Address Register (FAR).

On a Watchpoint debug exception, the Watchpoint Fault Address Register (DBGWFAR) is used to hold fault information. On a watchpoint access the DBGWFAR is updated with the address of the instruction that generated the Data Abort exception. For more information, see Watchpoint Fault Address Register (DBGWFAR) on page C10-28.

B3.9.2 Data Abort exceptions

On taking a Data Abort exception the processor:
• updates the DFSR with a fault status code
• if the Data Abort exception is synchronous:
  — updates the DFSR with whether the faulted access was a read or a write, and the domain number of the access, if applicable
  — if the Data Abort exception was not caused by a Watchpoint debug event, updates the DFAR with the MVA that caused the Data Abort exception
  — if the Data Abort exception was caused by a Watchpoint debug event, the DFAR becomes UNKNOWN
• if the Data Abort exception is asynchronous, the DFAR becomes UNKNOWN.

When the Security Extensions are implemented, the security state of the processor immediately after taking the Data Abort exception determines whether the Secure or Non-secure DFSR and DFAR are updated.
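The register-update rules for a Data Abort exception can be restated as a Python function that lists what is written. This is an illustrative sketch only: the function `data_abort_updates`, its parameters, and the string "UNKNOWN" used as a marker are hypothetical, and the choice between Secure and Non-secure register copies is not modeled.

```python
from typing import Optional


def data_abort_updates(fault_status: int, synchronous: bool, watchpoint: bool,
                       mva: int, is_write: bool, domain: Optional[int]) -> dict:
    """List the DFSR and DFAR updates made on taking a Data Abort (B3.9.2)."""
    updates = {"DFSR.FS": fault_status}  # the fault status code is always written
    if synchronous:
        updates["DFSR.WnR"] = 1 if is_write else 0
        if domain is not None:  # "the domain number of the access, if applicable"
            updates["DFSR.Domain"] = domain
        # A Watchpoint debug event leaves the DFAR UNKNOWN instead of holding the MVA.
        updates["DFAR"] = "UNKNOWN" if watchpoint else mva
    else:
        updates["DFAR"] = "UNKNOWN"
    return updates
```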
If the Data Abort exception is generated by an instruction cache or branch predictor invalidation by MVA, the DFSR indicates an Instruction Cache Maintenance Operation Fault and the IFSR indicates a Translation or Access Flag fault.

On an access that might have multiple aborts, the MMU fault checking sequence and the prioritization of aborts determine which abort occurs. For more information, see Fault-checking sequence on page B3-40 and Prioritization of aborts on page B3-47.

B3.9.3 Prefetch Abort exceptions

A Prefetch Abort exception is taken synchronously with the instruction that an abort is reported on. This means:
• If the instruction is executed, a Prefetch Abort exception is generated.
• If the instruction fetch is issued but the processor does not attempt to execute the instruction, no Prefetch Abort exception is generated for that instruction. For example, if the processor branches around the instruction, no Prefetch Abort exception is generated.

On taking a Prefetch Abort exception the processor:
• updates the IFSR with a fault status code
• updates the IFAR with the MVA that caused the Prefetch Abort exception.

When the Security Extensions are implemented, the security state of the processor immediately after taking the Prefetch Abort exception determines whether the Secure or Non-secure IFSR and IFAR are updated.

B3.9.4 Fault Status Register encodings for the VMSA

For the fault status encodings for a VMSA implementation see:
• Table B3-11 for the Instruction Fault Status Register (IFSR) encodings
• Table B3-12 on page B3-51 for the Data Fault Status Register (DFSR) encodings.

Note
In previous ARM documentation, the terms precise and imprecise were used instead of synchronous and asynchronous. For details of the more exact terminology introduced in this manual see Terminology for describing exceptions on page B1-4.
Table B3-11 VMSAv7 IFSR encodings

IFSR[10,3:0] a  Source                                                        IFAR     Notes
01100           Translation table walk synchronous external abort, 1st level  Valid    -
01110           Translation table walk synchronous external abort, 2nd level  Valid    -
11100           Translation table walk synchronous parity error, 1st level    Valid    -
11110           Translation table walk synchronous parity error, 2nd level    Valid    -
00101           Translation fault, Section                                    Valid    MMU fault
00111           Translation fault, Page                                       Valid    MMU fault
00011 b         Access Flag fault, Section                                    Valid    MMU fault
00110           Access Flag fault, Page                                       Valid    MMU fault
01001           Domain fault, Section                                         Valid    MMU fault
01011           Domain fault, Page                                            Valid    MMU fault
01101           Permission fault, Section                                     Valid    MMU fault
01111           Permission fault, Page                                        Valid    MMU fault
00010           Debug event                                                   UNKNOWN  See Software debug events on page C3-5
01000           Synchronous external abort                                    Valid    -
10100           IMPLEMENTATION DEFINED                                        Valid    Lockdown
11010           IMPLEMENTATION DEFINED                                        Valid    Coprocessor abort
11001           Memory access synchronous parity error                        Valid    -

a. All IFSR[10,3:0] values not listed in this table are reserved.
b. Previously, this encoding was a deprecated encoding for Alignment fault. The extensive changes in the memory model in ARMv7 and VMSAv7 mean there should be no possibility of confusing these two uses.
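Because the status code is split across bit [10] and bits [3:0], software has to reassemble the 5-bit value before looking it up in Table B3-11. The Python sketch below does this; the dictionary and function names are hypothetical, and the dictionary simply transcribes the table above.

```python
IFSR_SOURCES = {
    0b01100: "Translation table walk synchronous external abort, 1st level",
    0b01110: "Translation table walk synchronous external abort, 2nd level",
    0b11100: "Translation table walk synchronous parity error, 1st level",
    0b11110: "Translation table walk synchronous parity error, 2nd level",
    0b00101: "Translation fault, Section",
    0b00111: "Translation fault, Page",
    0b00011: "Access Flag fault, Section",
    0b00110: "Access Flag fault, Page",
    0b01001: "Domain fault, Section",
    0b01011: "Domain fault, Page",
    0b01101: "Permission fault, Section",
    0b01111: "Permission fault, Page",
    0b00010: "Debug event",
    0b01000: "Synchronous external abort",
    0b10100: "IMPLEMENTATION DEFINED (Lockdown)",
    0b11010: "IMPLEMENTATION DEFINED (Coprocessor abort)",
    0b11001: "Memory access synchronous parity error",
}


def fault_status(fsr: int) -> int:
    """Build the 5-bit status value from FSR bit [10] and bits [3:0]."""
    return (((fsr >> 10) & 1) << 4) | (fsr & 0xF)


def decode_ifsr(ifsr: int) -> str:
    """Look up the Table B3-11 source; unlisted values are reserved."""
    return IFSR_SOURCES.get(fault_status(ifsr), "Reserved")
```

For example, an IFSR value of 0x40C has bit [10] set and bits [3:0] = 0b1100, giving status 0b11100, a 1st-level synchronous parity error on a translation table walk.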
Table B3-12 VMSAv7 DFSR encodings

DFSR[10,3:0] a  Source                                                        DFAR     Domain   Notes
00001           Alignment fault                                               Valid    UNKNOWN  MMU fault
00100           Instruction cache maintenance fault                           Valid    UNKNOWN  -
01100           Translation table walk synchronous external abort, 1st level  Valid    UNKNOWN  -
01110           Translation table walk synchronous external abort, 2nd level  Valid    Valid    -
11100           Translation table walk synchronous parity error, 1st level    Valid    UNKNOWN  -
11110           Translation table walk synchronous parity error, 2nd level    Valid    Valid    -
00101           Translation fault, Section                                    Valid    UNKNOWN  MMU fault
00111           Translation fault, Page                                       Valid    Valid    MMU fault
00011 b         Access Flag fault, Section                                    Valid    UNKNOWN  MMU fault
00110           Access Flag fault, Page                                       Valid    Valid    MMU fault
01001           Domain fault, Section                                         Valid    Valid    MMU fault
01011           Domain fault, Page                                            Valid    Valid    MMU fault
01101           Permission fault, Section                                     Valid    Valid    MMU fault
01111           Permission fault, Page                                        Valid    Valid    MMU fault
00010           Debug event                                                   UNKNOWN  UNKNOWN  See Software debug events on page C3-5
01000           Synchronous external abort                                    Valid    UNKNOWN  -
10100           IMPLEMENTATION DEFINED                                        -        -        Lockdown
11010           IMPLEMENTATION DEFINED                                        -        -        Coprocessor abort
11001           Memory access synchronous parity error                        Valid    UNKNOWN  -
10110           Asynchronous external abort c                                 UNKNOWN  UNKNOWN  -
11000           Memory access asynchronous parity error                       UNKNOWN  UNKNOWN  Including on translation table walk

a. All DFSR[10,3:0] values not listed in this table are reserved.
b. Previously, this encoding was a deprecated encoding for Alignment fault. The extensive changes in the memory model in ARMv7 and VMSAv7 mean there should be no possibility of confusing these two uses.
c. Including asynchronous data external abort on translation table walk or instruction fetch.

Reserved encodings in the IFSR and DFSR encodings tables

A single encoding is reserved for cache and TLB lockdown faults. The details of these faults and any associated subsidiary registers are IMPLEMENTATION DEFINED.

A single encoding is reserved for aborts associated with coprocessors.
The details of these faults are IMPLEMENTATION DEFINED.

B3.9.5 Distinguishing read and write accesses on Data Abort exceptions

On a synchronous Data Abort exception, the DFSR.WnR bit, bit [11] of the register, indicates whether the abort occurred on a read access or on a write access. However, for a fault on a CP15 cache maintenance operation, including a fault on a VA to PA translation operation, this bit always indicates a write access fault.

For a fault generated by an SWP or SWPB instruction, the WnR bit is 0 if a read to the location would have generated a fault, otherwise it is 1.

B3.9.6 Provision for classification of external aborts

An implementation can use the DFSR.ExT and IFSR.ExT bits to provide more information about external aborts:
• DFSR.ExT can provide an IMPLEMENTATION DEFINED classification of external aborts on data accesses
• IFSR.ExT can provide an IMPLEMENTATION DEFINED classification of external aborts on instruction accesses.

For all aborts other than external aborts these bits return a value of 0.

B3.9.7 The Domain field in the DFSR

The DFSR includes a domain field. This has been inherited from previous versions of the VMSA. There is no domain field in the IFSR.

The domain field of the DFSR is not valid on watchpoints.

From ARMv7, use of the domain field in the DFSR is deprecated. This field might not be supported in future versions of the ARM architecture. ARM strongly recommends that new software does not use this field.

For both Data Abort exceptions and Prefetch Abort exceptions, software can find the domain information by performing a translation table read for the faulting address and extracting the domain field from the translation table entry.
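An abort handler typically splits the DFSR into its fields before acting on it. The Python sketch below assumes the VMSAv7 DFSR layout: FS[3:0] at bits [3:0], Domain at bits [7:4], FS[4] at bit [10], WnR at bit [11] and ExT at bit [12]. The set of encodings with a valid Domain is transcribed from Table B3-12. The names `decode_dfsr` and `DOMAIN_VALID` are hypothetical, and because the Domain field is deprecated, new code would normally read the domain from the translation table entry instead.

```python
DOMAIN_VALID = {0b01110, 0b11110, 0b00111, 0b00110,
                0b01001, 0b01011, 0b01101, 0b01111}  # Domain column of Table B3-12


def decode_dfsr(dfsr: int) -> dict:
    """Split a DFSR value into its fields (VMSAv7 layout assumed)."""
    fs = (((dfsr >> 10) & 1) << 4) | (dfsr & 0xF)
    fields = {
        "FS": fs,
        "WnR": (dfsr >> 11) & 1,  # 1 = write; meaningful only for synchronous aborts
        "ExT": (dfsr >> 12) & 1,  # IMPLEMENTATION DEFINED external abort classification
    }
    # Deprecated Domain field, bits [7:4]: report it only where the table says Valid.
    fields["Domain"] = (dfsr >> 4) & 0xF if fs in DOMAIN_VALID else None
    return fields
```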
B3.9.8 Auxiliary Fault Status Registers

ARMv7 defines two Auxiliary Fault Status Registers:
• the Auxiliary Data Fault Status Register (ADFSR)
• the Auxiliary Instruction Fault Status Register (AIFSR).

These registers enable additional fault status information to be returned:
• The position of these registers is architecturally-defined, but the content and use of the registers is IMPLEMENTATION DEFINED.
• An implementation that does not need to report additional fault information must implement these registers as UNK/SBZ. This ensures that a privileged attempt to access these registers does not cause an Undefined Instruction exception.

An example use of these registers would be to return more information for diagnosing parity errors.

See c5, Auxiliary Data and Instruction Fault Status Registers (ADFSR and AIFSR) on page B3-123 for the architectural details of these registers.

B3.10 Translation Lookaside Buffers (TLBs)

Translation Lookaside Buffers (TLBs) are an implementation technique that caches translations or translation table entries. TLBs avoid the requirement for every memory access to perform a translation table lookup. The ARM architecture does not specify the exact form of the TLB structures for any design. In a similar way to the requirements for caches, the architecture only defines certain principles for TLBs:
• The architecture has a concept of an entry locked down in the TLB. The method by which lockdown is achieved is IMPLEMENTATION DEFINED, and an implementation might not support lockdown.
• An unlocked entry in the TLB is not guaranteed to remain in the TLB.
• A locked entry in the TLB is guaranteed to remain in the TLB. However, a locked entry in a TLB might be updated by subsequent updates to the translation tables.
  Therefore, when a change is made to the translation tables, a locked entry is not guaranteed to remain incoherent with the corresponding translation table entry.
• A translation table entry that returns a Translation fault or an Access Flag fault is guaranteed not to be held in the TLB. However, a translation table entry that returns a Domain fault or a Permission fault might be held in the TLB.
• Any translation table entry that does not return a Translation or Access Flag fault might be allocated to an enabled TLB at any time. The only translation table entries guaranteed not to be held in the TLB are those that return a Translation or Access Flag fault.
• Software can rely on the fact that between disabling and re-enabling the MMU, entries in the TLB have not been corrupted to give incorrect translations.

B3.10.1 Global and non-global regions in the virtual memory map

The VMSA permits the virtual memory map to be divided into global and non-global regions, distinguished by the nG bit in the translation table descriptors:

nG == 0   The translation is global.
nG == 1   The translation is process specific, meaning it relates to the current ASID, as defined by the CONTEXTIDR.

Each non-global region has an associated Address Space Identifier (ASID). These identifiers enable different translation table mappings to co-exist in a caching structure such as a TLB. This means that a new mapping of a non-global memory region can be created without removing previous mappings.

For a symmetric multiprocessor cluster where a single operating system is running on the set of processing elements, ARMv7 requires all ASID values to be assigned uniquely. In other words, each ASID value must have the same meaning to all processing elements in the system.

The use of non-global pages when FCSEIDR[31:25] is not 0b0000000 is UNPREDICTABLE.
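The effect of the nG bit on TLB lookups can be illustrated with a toy TLB in which two non-global mappings of the same virtual page coexist under different ASIDs. This Python sketch is a simplification: the `TlbEntry` class and `lookup` function are hypothetical, only 4KB small pages are modeled, and a real TLB organization is not architecturally defined.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class TlbEntry:
    """Hypothetical TLB entry for a 4KB small page."""
    vpn: int          # MVA bits [31:12]
    pfn: int          # PA bits [31:12]
    non_global: bool  # the nG bit from the descriptor
    asid: int = 0     # only meaningful when non_global is True


def lookup(tlb: List[TlbEntry], mva: int, current_asid: int) -> Optional[TlbEntry]:
    """Return an entry that translates mva under the current ASID, if any."""
    vpn = mva >> 12
    for entry in tlb:
        # Global entries match any ASID; non-global entries need an ASID match.
        if entry.vpn == vpn and (not entry.non_global or entry.asid == current_asid):
            return entry
    return None
```

With entries for virtual page 0x10 under ASIDs 1 and 2, changing the current ASID selects a different physical page without any TLB maintenance, which is the benefit described above.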
B3.10.2 TLB matching

A TLB is a hardware caching structure for translation table information. Like other hardware caching structures, it is mostly invisible to software. However, there are some situations where it can become visible. These are associated with coherency problems caused by an update to the translation table that has not been reflected in the TLB. The TLB maintenance operations, described in TLB maintenance on page B3-56, enable software to prevent any TLB incoherency becoming a problem.

A particular case where the presence of the TLB can become visible is if the translation table entries that are in use under a particular ASID are changed without suitable invalidation of the TLB. This is an issue regardless of whether or not the translation table entries are global. In some cases, the TLB can hold two mappings for the same address, and this can lead to UNPREDICTABLE behavior.

TLB block size

When the TLB is scanned, address matching is performed on bits [31:N] of the MVA, where N is log2 of the page size, or block size, for the TLB entry. In VMSAv7, a TLB can store entries based on the following block sizes:

Supersections   Consist of 16MB blocks of memory, N = 24.
Sections        Consist of 1MB blocks of memory, N = 20.
Large pages     Consist of 64KB blocks of memory, N = 16.
Small pages     Consist of 4KB blocks of memory, N = 12.

Supersections, Sections and Large pages are supported to permit mapping of a large region of memory while using only a single entry in a TLB.

B3.10.3 TLB behavior at reset

In ARMv7, there is no requirement that a reset invalidates the TLBs. ARMv7 recognizes that an implementation might require caches, including TLBs, to maintain context over a system reset. Possible reasons for doing so include power management and debug requirements.

For ARMv7:
• All TLBs are disabled at reset.
• An implementation can require the use of a specific TLB invalidation routine, to invalidate the TLB arrays before they are enabled after a reset. The exact form of this routine is IMPLEMENTATION DEFINED, but if an invalidation routine is required it must be documented clearly as part of the documentation of the device.
  ARM recommends that if an invalidation routine is required for this purpose, the routine is based on the ARMv7 TLB maintenance operations described in CP15 c8, TLB maintenance operations on page B3-138.
• When TLBs that have not been invalidated by some mechanism since reset are enabled, the state of those TLBs is UNPREDICTABLE.

Similar rules apply:
• to cache behavior, see Behavior of the caches at reset on page B2-6
• to branch predictor behavior, see Behavior of the branch predictors at reset on page B2-21.

B3.10.4 TLB lockdown

ARMv7 recognizes that any TLB lockdown scheme is heavily dependent on the microarchitecture, making it inappropriate to define a common mechanism across all implementations. This means that:
• ARMv7 does not require TLB lockdown support.
• If TLB lockdown support is implemented, the lockdown mechanism is IMPLEMENTATION DEFINED. However, key properties of the interaction of lockdown with the architecture must be documented as part of the implementation documentation.

This means that:
• In ARMv7, the TLB Type Register TLBTR does not define the lockdown scheme in use. This is a change from previous versions of the architecture.
• A region of the CP15 c10 encodings is reserved for IMPLEMENTATION DEFINED TLB functions, such as TLB lockdown functions. The reserved encodings are those with:
  — <CRm> = {0, 1, 4, 8}
  — all values of <opc1> and <opc2>.
  See also The implementation defined TLB control operations on page B3-143.
An implementation might use some of the CP15 c10 encodings that are reserved for IMPLEMENTATION DEFINED TLB functions to implement additional TLB control functions. These functions might include:
• Unlock all locked TLB entries.
• Preload into a specific level of TLB. This is beyond the scope of the PLI and PLD hint instructions.

B3.10.5 TLB maintenance

TLB maintenance operations provide a mechanism to invalidate entries from a TLB. Any TLB operation might affect other TLB entries that are not locked down.

TLB maintenance operations are provided by CP15 c8 functions. The following operations are supported:
• invalidate all unlocked entries in the TLB
• invalidate a single TLB entry, by MVA, or MVA and ASID for a non-global entry
• invalidate all TLB entries that match a specified ASID.

The Multiprocessing Extensions add the following operations:
• invalidate all TLB entries that match a specified MVA, regardless of the ASID
• operations that apply across multiprocessors in the same Inner Shareable domain, see Multiprocessor effects on TLB maintenance operations on page B3-62.

In the TLB operations:
• An operation that depends on an MVA value includes a field for the ASID to be used as part of the translation. For a translation table entry that refers to a non-global region, the ASID must be specified.
• If the Security Extensions are implemented, operations include the current security state as part of the VA to PA address translation required for the TLB operation.

A single register function can apply one of these operations:
• when separate Instruction and Data TLBs are implemented, to:
  — only the Instruction TLB
  — only the Data TLB
  — both the Instruction TLB and the Data TLB
• to the Unified TLB, when a Unified TLB is implemented.
The distinction between the Instruction TLB and Data TLB in TLB maintenance operations is historical and is not supported in newer instructions. The distinction is deprecated in ARMv7. Developers must not rely on this distinction being maintained in future versions of the ARM architecture.

The ARM architecture does not dictate the form in which the TLB stores translation table entries. However, for TLB invalidate operations, the size of the table entry that must be removed from the TLB must be at least the size that appears in the translation table entry.

These operations are described in CP15 c8, TLB maintenance operations on page B3-138.

The interaction of TLB maintenance operations with TLB lockdown

The precise interaction of TLB lockdown with the TLB maintenance operations is IMPLEMENTATION DEFINED. However, the architecturally-defined TLB maintenance operations must comply with these rules:
• The effect on locked entries of the TLB invalidate all unlocked entries and TLB invalidate by MVA all ASID operations is IMPLEMENTATION DEFINED. However, these operations must implement one of the following options:
  — Have no effect on entries that are locked down.
  — Generate an IMPLEMENTATION DEFINED Data Abort exception if an entry is locked down, or might be locked down. A fault status code is provided in the CP15 c5 fault status registers for cache and TLB lockdown faults, see Table B3-11 on page B3-50 and Table B3-12 on page B3-51.
  This permits a typical usage model for TLB invalidate routines, where the routine invalidates a large range of addresses, without considering whether any entries are locked in the TLB.
• The effect on locked entries of the TLB invalidate by MVA and invalidate by ASID match operations is IMPLEMENTATION DEFINED. However, these operations must implement one of these options:
  — A locked entry is invalidated in the TLB.
  — The operation has no effect on a locked entry in the TLB.
    In the case of the Invalidate single entry by MVA operation, this means the operation is treated as a NOP.
  — The operation generates an IMPLEMENTATION DEFINED Data Abort exception if it operates on an entry that is locked down, or might be locked down. A fault status code is provided in the CP15 c5 fault status registers for cache and TLB lockdown faults, see Table B3-11 on page B3-50 and Table B3-12 on page B3-51.

Any implementation that uses an abort mechanism for entries that might be locked must:
• document the IMPLEMENTATION DEFINED code sequences that then perform the required operations on entries that are not locked down
• implement one of the other specified alternatives for the locked entries.

ARM recommends that architecturally-defined operations are used wherever possible in such sequences, to minimize the number of customized operations required.

In addition, if an implementation uses an abort mechanism for entries that might be locked, it must also provide a mechanism that ensures that no TLB entries are locked.

Similar rules apply to cache lockdown, see The interaction of cache lockdown with cache maintenance on page B2-18.

An unlocked entry in the TLB is not guaranteed to remain in the TLB. This means that, as a side effect of a TLB maintenance operation, any unlocked entry in the TLB might be invalidated.

The effect of the Security Extensions on the TLB maintenance operations

If an implementation includes the Security Extensions, the TLB maintenance operations must take account of the current security state. Table B3-13 summarizes how the Security Extensions affect these operations.

Table B3-13 TLB maintenance operations when the Security Extensions are implemented

TLB maintenance operation            TLB entries guaranteed to be invalidated
Invalidate all entries               All TLB entries accessible in the current security state.
Invalidate single entry by MVA       Targeted TLB entry, only if all of these apply:
                                     • the MVA value matches
                                     • the ASID value matches, for a non-global entry
                                     • the entry applies to the current security state.
Invalidate entries by ASID match     All non-global TLB entries for which both:
                                     • the ASID value matches
                                     • the entry applies to the current security state.
Invalidate entries by MVA, all ASID  All targeted TLB entries for which both:
                                     • the MVA value matches
                                     • the entry applies to the current security state.

The Security Extensions do not change the possible effects of TLB maintenance operations on entries that are locked or might be locked, as described in The interaction of TLB maintenance operations with TLB lockdown on page B3-57. If an implementation has TLB maintenance operations that generate aborts on entries that are locked or might be locked, then those aborts can occur on any maintenance operation, regardless of the Security Extensions. However, aborts must not be generated as a result of entries from the other security state.

TLB maintenance operations and the memory order model

The following rules describe the relations between the memory order model and the TLB maintenance operations:
• A TLB invalidate operation is complete when all memory accesses using the TLB entries that have been invalidated have been observed by all observers to the extent that those accesses are required to be observed, as determined by the shareability and cacheability of the memory locations accessed by the accesses. In addition, once the TLB invalidate operation is complete, no new memory accesses that can be observed by those observers using those TLB entries will be performed.
• A TLB maintenance operation is only guaranteed to be complete after the execution of a DSB instruction.
• An ISB instruction, or a return from an exception, causes the effect of all completed TLB maintenance operations that appear in program order before the ISB or return from exception to be visible to all subsequent instructions, including the instruction fetches for those instructions.
• An exception causes all completed TLB maintenance operations that appear in the instruction stream before the point where the exception was taken to be visible to all subsequent instructions, including the instruction fetches for those instructions.
• All TLB maintenance operations are executed in program order relative to each other.
• The execution of a Data or Unified TLB maintenance operation is guaranteed not to affect any explicit memory access of any instruction that appears in program order before the TLB maintenance operation. This ordering is guaranteed by the hardware implementation, so no memory barrier instruction is required.
• The execution of a Data or Unified TLB maintenance operation is only guaranteed to be visible to a subsequent explicit load or store operation after both:
  — the execution of a DSB instruction to ensure the completion of the TLB operation
  — a subsequent ISB instruction, or taking an exception, or returning from an exception.
• The execution of an Instruction or Unified TLB maintenance operation is only guaranteed to be visible to a subsequent instruction fetch after both:
  — the execution of a DSB instruction to ensure the completion of the TLB operation
  — a subsequent ISB instruction, or taking an exception, or returning from an exception.

The following rules apply when writing translation table entries. They ensure that the updated entries are visible to subsequent accesses and cache maintenance operations.
For TLB maintenance, the translation table walk is treated as a separate observer:
• A write to the translation tables, after it has been cleaned from the cache if appropriate, is only guaranteed to be seen by a translation table walk caused by an explicit load or store after the execution of both a DSB and an ISB. However, it is guaranteed that any writes to the translation tables are not seen by any explicit memory access that occurs in program order before the write to the translation tables.
• For the base ARMv7 architecture and versions of the architecture before ARMv7, if the translation tables are held in Write-Back Cacheable memory, the caches must be cleaned to the point of unification after writing to the translation tables and before the DSB instruction. This ensures that the updated translation table entries are visible to a hardware translation table walk.
• A write to the translation tables, after it has been cleaned from the cache if appropriate, is only guaranteed to be seen by a translation table walk caused by the instruction fetch of an instruction that follows the write to the translation tables after both a DSB and an ISB.

Therefore, typical code for writing a translation table entry, covering changes to the instruction or data mappings in a uniprocessor system, is as follows. The register assignments are assumed for illustration: r0 holds the address of the translation table entry, r1 holds the new descriptor, and r2 holds the page MVA combined with the ASID if the entry is non-global.

    STR   r1, [r0]                ; write new entry to the translation table
    MCR   p15, 0, r0, c7, c11, 1  ; clean the D cache line holding the entry to PoU (DCCMVAU);
                                  ; this operation is not required with the Multiprocessing Extensions
    DSB                           ; ensure visibility of the data cleaned from the D cache
    MCR   p15, 0, r2, c8, c7, 1   ; invalidate TLB entry by MVA, and ASID if non-global (TLBIMVA)
    MCR   p15, 0, r0, c7, c5, 6   ; invalidate all branch predictors, the "Invalidate BTC" step (BPIALL)
    DSB                           ; ensure completion of the TLB invalidate operation
    ISB                           ; ensure table changes are visible to instruction fetch

Synchronization of changes of ASID and TTBR

A common virtual memory management requirement is to change the ContextID and Translation Table Base Registers together to associate the new ContextID with different translation tables. However, such a change is complicated by:
• the depth of prefetch being IMPLEMENTATION DEFINED
• the use of branch prediction.

The virtual memory management operations must ensure the synchronization of changes of the ContextID and the translation table registers. Without this synchronization, some or all of the TLBs, BTCs (Branch Target Caches) and other caching of ASID and translation information might become corrupted with invalid translations. Synchronization is necessary to avoid either:
• the old ASID being associated with translation table walks from the new translation tables
• the new ASID being associated with translation table walks from the old translation tables.

There are a number of possible solutions to this problem, and the most appropriate approach depends on the system. Example B3-2 on page B3-61, Example B3-3 on page B3-61, and Example B3-4 on page B3-62 describe three possible approaches.

Note
Another instance of the synchronization problem occurs if a branch is encountered between changing the ASID and performing the synchronization. In this case the value in the branch predictor might be associated with the incorrect ASID. This possibility can be addressed by any of these approaches, and can also be addressed by avoiding such branches.
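The hazard can be illustrated with a deliberately simplified toy model in C. The model rests on these assumptions:
• Until the next ISB, a prefetch or translation table walk might use either the old or the new value of the ASID and of the TTBR, independently of each other.
• A pairing is flagged if it could create a non-global entry tagged with a real ASID that does not own the translation tables being walked.
• Each translation table set is identified by the ASID that owns it, GLOBAL_ONLY marks a global-only table set, and ASID 0 is treated as reserved, as in Example B3-2.
All names are hypothetical. The model ignores branch prediction, prefetch depth and the mapping-size condition.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy model: a table set is identified by the ASID that owns
   its non-global entries; GLOBAL_ONLY means it has only global mappings. */
enum { GLOBAL_ONLY = -1, RESERVED_ASID = 0, MAX_VALS = 8 };

typedef enum { STEP_SET_ASID, STEP_SET_TTBR, STEP_ISB } step_kind;
typedef struct { step_kind kind; int value; } step;

/* Values of one register that might still be in effect since the last ISB. */
typedef struct { int v[MAX_VALS]; size_t n; } window;

static void window_reset(window *w, int current) { w->v[0] = current; w->n = 1; }
static void window_add(window *w, int value) { if (w->n < MAX_VALS) w->v[w->n++] = value; }

/* Any ASID in the window can be paired with any table set in the window.
   A pairing is harmful if a real ASID tags entries from tables it does not own. */
static bool window_safe(const window *asids, const window *tables)
{
    for (size_t i = 0; i < asids->n; i++)
        for (size_t j = 0; j < tables->n; j++) {
            int a = asids->v[i], t = tables->v[j];
            if (t != GLOBAL_ONLY && a != RESERVED_ASID && a != t)
                return false;
        }
    return true;
}

/* Returns true if no harmful pairing is possible at any point in the sequence. */
static bool sequence_safe(int asid, int ttbr, const step *s, size_t n)
{
    window asids, tables;
    window_reset(&asids, asid);
    window_reset(&tables, ttbr);
    for (size_t i = 0; i < n; i++) {
        switch (s[i].kind) {
        case STEP_SET_ASID: asid = s[i].value; window_add(&asids, asid); break;
        case STEP_SET_TTBR: ttbr = s[i].value; window_add(&tables, ttbr); break;
        case STEP_ISB:      window_reset(&asids, asid); window_reset(&tables, ttbr); break;
        }
        if (!window_safe(&asids, &tables))
            return false;
    }
    return true;
}
```

Take old ASID 1 with its tables and new ASID 2 with its tables. The model flags a sequence that changes the TTBR and then the ASID. It also flags the reserved-ASID sequence with its first ISB removed. It accepts the sequences of Example B3-2 and Example B3-3.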
Example B3-2 Using a reserved ASID to synchronize ASID and TTBR changes

In this approach, a particular ASID value is reserved for use by the operating system, and is used only for the synchronization of the ASID and Translation Table Base Register. This example uses the value of 0 for this purpose, but any value could be used.

This approach can be used only when the size of the mapping for any given virtual address is the same in the old and new translation tables.

The following sequence is followed, and must be executed from memory marked as being global:

    Change ASID to 0
    ISB
    Change Translation Table Base Register
    ISB
    Change ASID to new value

This approach ensures that any non-global pages prefetched at a time when it is uncertain whether the old or new translation tables are being accessed are associated with the unused ASID value of 0. Since the ASID value of 0 is not used for any normal operations, these entries cannot cause corruption of execution.

Example B3-3 Using translation tables that contain only global mappings when changing the ASID

A second approach involves switching the translation tables to a set of translation tables that only contain global mappings while switching the ASID.

The following sequence is followed, and must be executed from memory marked as being global:

    Change Translation Table Base Register to the global-only mappings
    ISB
    Change ASID to new value
    ISB
    Change Translation Table Base Register to new value

This approach ensures that no non-global pages can be prefetched at a time when it is uncertain whether the old or new ASID value will be used.

Example B3-4 Disabling non-global mappings when changing the ASID

In systems where the only non-global mappings are held in TTBR0, you can use the TTBCR.PD0 field to disable use of the TTBR0 register during the change of ASID.
This means you do not require a set of global-only mappings.

The following sequence is followed, and must be executed from a memory region with a translation that is accessed from the base address in the TTBR1 register, and is marked as global:

    Set TTBCR.PD0 = 1
    ISB
    Change ASID to new value
    Change Translation Table Base Register to new value
    ISB
    Set TTBCR.PD0 = 0

This approach ensures that no non-global pages can be prefetched at a time when it is uncertain whether the old or new ASID value will be used.

Multiprocessor effects on TLB maintenance operations

The base ARMv7 architecture defines that the TLB maintenance operations apply only to the TLB directly attached to the processor on which the operation is executed. To improve the implementation of multiprocessor systems, a set of extensions to ARMv7, called the Multiprocessing Extensions, has been introduced. These add TLB maintenance operations that apply to the TLBs of processors in the same Inner Shareable domain.

The extensions can be implemented in a uniprocessor system with no hardware support for cache coherency. In such a system, the Inner Shareable domain would be limited to being the single processor, and all instructions defined to apply to the Inner Shareable domain behave as aliases of the local operations.

B3.11 Virtual Address to Physical Address translation operations

CP15 c7 includes operations for Virtual Address (VA) to Physical Address (PA) translation. For more information, see CP15 c7, Virtual Address to Physical Address translation operations on page B3-130. The details of these operations depend on whether the Security Extensions are implemented.

All VA to PA translations take account of the TEX remapping when this remapping is enabled, see The alternative descriptions of the Memory region attributes on page B3-32.
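As an illustration of how software consumes the result of these operations: software requests the translation by writing the VA to a c7 encoding. For example, MCR p15, 0, <Rt>, c7, c8, 0 requests a privileged read check in the current security state. Software then reads the result back from the PAR with MRC p15, 0, <Rt>, c7, c4, 0, typically after an ISB. The C sketch below is hypothetical. It assumes the PAR layout given in the PAR register description: bit [0] is the fault flag F and, when F is 0, bits [31:12] hold PA[31:12]. It also assumes the opc2 assignment shown in the comments.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: decode a PAR value. Assumes PAR.F is bit [0] and,
   when F == 0, PAR[31:12] holds PA[31:12]; the low 12 bits come from the VA. */
static bool par_to_pa(uint32_t par, uint32_t va, uint32_t *pa)
{
    if (par & 1u)                       /* F == 1: the translation aborted */
        return false;
    *pa = (par & 0xFFFFF000u) | (va & 0x00000FFFu);
    return true;
}

/* Assumed opc2 for MCR p15, 0, <Rt>, c7, c8, <opc2>:
   0..3 = privileged read, privileged write, User read, User write in the
   current security state; 4..7 = the same checks for the other security
   state (Security Extensions only). */
static unsigned va2pa_opc2(bool user, bool write, bool other_state)
{
    return (other_state ? 4u : 0u) | (user ? 2u : 0u) | (write ? 1u : 0u);
}
```

Decoding treats any PAR value with bit [0] set as an aborted translation, and leaves attribute bits in PAR[11:1] out of the resulting PA.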
A VA to PA translation operation might require a translation table walk, and an external abort might occur on this walk. For more information, see Behavior of external aborts on a translation table walk caused by a VA to PA translation on page B3-46. If an external abort occurs on this walk:
• The Physical Address Register, PAR:
  — is not updated if the abort is synchronous
  — is UNPREDICTABLE if the abort is asynchronous.
• If the Security Extensions are implemented, fault status and fault address register updates occur only in the security state in which the abort is handled. Fault address and fault status registers in the other security state are not changed.

B3.12 CP15 registers for a VMSA implementation

This section gives a full description of the registers implemented in the CP15 System Control Coprocessor in an ARMv7 implementation that includes the VMSA memory system. Therefore, this is the description of the CP15 registers for an ARMv7-A implementation.

Some of the registers described in this section are also included in an ARMv7 implementation with a PMSA. The section CP15 registers for a PMSA implementation on page B4-22 also includes descriptions of these registers.

See Coprocessors and system control on page B1-62 for general information about the System Control Coprocessor, CP15 and the register access instructions MRC and MCR.

Information in this section is organized as follows:
• general information is given in:
  — Organization of the CP15 registers in a VMSA implementation
  — General behavior of CP15 registers on page B3-68
  — Effect of the Security Extensions on the CP15 registers on page B3-71
  — Changes to CP15 registers and the memory order model on page B3-77
  — Meaning of fixed bit values in register diagrams on page B3-78.
• this is followed by, for each of the primary CP15 registers c0 to c15:
  — a general description of the organization of the primary CP15 register
  — detailed descriptions of all the registers in that primary register.

Note
The detailed descriptions of the registers that implement the processor identification scheme, CPUID, are given in Chapter B5 The CPUID Identification Scheme, and not in this section.

Table B3-14 on page B3-66 lists all of the CP15 registers in a VMSA implementation, and is an index to the detailed description of each register.

B3.12.1 Organization of the CP15 registers in a VMSA implementation

Figure B3-10 on page B3-65 summarizes the ARMv7 CP15 registers when the VMSA is implemented. Table B3-14 on page B3-66 lists all of these registers.

Note
ARMv7 introduces significant changes to the memory system registers, especially in relation to caches. For details of:
• the CP15 register implementation in VMSAv6, see Organization of CP15 registers for an ARMv6 VMSA implementation on page AppxG-29
• how the ARMv7 registers must be used to discover what caches can be accessed by the processor, see Identifying the cache resources in ARMv7 on page B2-4.
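The CRn, opc1, CRm and opc2 values of an MCR or MRC instruction select each CP15 register or operation in Figure B3-10. The following C function is a sketch that assembles the 32-bit instruction word for a CP15 access. It assumes the ARM-state (A32) encoding of MCR and MRC with the condition field set to AL. The function name is hypothetical. A helper like this can be useful when checking a disassembly or generating code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: ARM-state encoding of
   MCR/MRC p15, <opc1>, <Rt>, <CRn>, <CRm>, <opc2> with cond = AL (0xE).
   Layout: cond[31:28] 1110[27:24] opc1[23:21] L[20] CRn[19:16] Rt[15:12]
           coproc[11:8] opc2[7:5] 1[4] CRm[3:0], with L = 1 for MRC. */
static uint32_t cp15_encode(bool is_mrc, unsigned opc1, unsigned rt,
                            unsigned crn, unsigned crm, unsigned opc2)
{
    assert(opc1 < 8 && rt < 16 && crn < 16 && crm < 16 && opc2 < 8);
    return (0xEu << 28) | (0xEu << 24) | (opc1 << 21) |
           ((is_mrc ? 1u : 0u) << 20) | (crn << 16) | (rt << 12) |
           (15u << 8) | (opc2 << 5) | (1u << 4) | crm;
}
```

For example, cp15_encode(true, 0, 0, 0, 0, 0) gives 0xEE100F10. This is MRC p15, 0, r0, c0, c0, 0, which reads the MIDR. Similarly, cp15_encode(false, 0, 0, 8, 7, 0) gives 0xEE080F17, the CP15 c8 operation that invalidates the entire unified TLB.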
[Figure B3-10 CP15 registers in a VMSA implementation: a map of the CRn, opc1, CRm and opc2 encodings to the CP15 registers and operations, from the ID registers in c0 to the IMPLEMENTATION DEFINED registers in c15. Legend: read-only, read/write, write-only, or access depends on the operation; bold text = accessible in User mode; † read-only if FCSE not implemented; * some encodings are only in the Multiprocessing Extensions.]

For information about the CP15 encodings not shown in Figure B3-10 on page B3-65, see UNPREDICTABLE and UNDEFINED behavior for CP15 accesses on page B3-68.

Summary of CP15 register descriptions in a VMSA implementation

Table B3-14 shows the CP15 registers in a VMSA implementation. The table also includes links to the descriptions of each of the primary CP15 registers, c0 to c15.

Table B3-14 Summary of VMSA CP15 register descriptions

Register and description
CP15 c0, ID codes registers on page B3-79
  c0, Main ID Register (MIDR) on page B3-81
  c0, Cache Type Register (CTR) on page B3-83
  c0, TCM Type Register (TCMTR) on page B3-85
  c0, TLB Type Register (TLBTR) on page B3-86
  c0, Multiprocessor Affinity Register (MPIDR) on page B3-87
  CP15 c0, Processor Feature registers on page B5-4
  c0, Debug Feature Register 0 (ID_DFR0) on page B5-6
  c0, Auxiliary Feature Register 0 (ID_AFR0) on page B5-8
  CP15 c0, Memory Model Feature registers on page B5-9
  CP15 c0, Instruction Set Attribute registers on page B5-19
  c0, Cache Size ID Registers (CCSIDR) on page B3-91
  c0, Cache Level ID Register (CLIDR) on page B3-92
  c0, Implementation defined Auxiliary ID Register (AIDR) on page B3-94
  c0, Cache Size Selection Register (CSSELR) on page B3-95
CP15 c1, System control registers on page B3-96
  c1, System Control Register (SCTLR) on page B3-96
  c1, Implementation defined Auxiliary Control Register (ACTLR) on page B3-103
  c1, Coprocessor Access Control Register (CPACR) on page B3-104
  c1, Secure Configuration Register (SCR) on page B3-106
  c1, Secure Debug Enable Register (SDER) on page B3-108
  c1, Non-Secure Access Control Register (NSACR) on page B3-110
CP15 c2 and c3, Memory protection and control registers on page B3-113
  c2, Translation Table Base Register 0 (TTBR0) on page B3-113
  c2, Translation Table Base Register 1 (TTBR1) on page B3-116
  c2, Translation Table Base Control Register (TTBCR) on page B3-117
  c3, Domain Access Control Register (DACR) on page B3-119
CP15 c4, Not used on page B3-120
CP15 c5 and c6, Memory system fault registers on page B3-120
  c5, Data Fault Status Register (DFSR) on page B3-121
  c5, Instruction Fault Status Register (IFSR) on page B3-122
  c5, Auxiliary Data and Instruction Fault Status Registers (ADFSR and AIFSR) on page B3-123
  c6, Data Fault Address Register (DFAR) on page B3-124
  c6, Instruction Fault Address Register (IFAR) on page B3-125
CP15 c7, Cache maintenance and other functions on page B3-126
  CP15 c7, Cache and branch predictor maintenance functions on page B3-126
  CP15 c7, Virtual Address to Physical Address translation operations on page B3-130
  CP15 c7, Data and Instruction Barrier operations on page B3-137
  CP15 c7, No Operation (NOP) on page B3-138
CP15 c8, TLB maintenance operations on page B3-138
CP15 c9, Cache and TCM lockdown registers and performance monitors on page B3-141
CP15 c10, Memory remapping and TLB control registers on page B3-142
  c10, Primary Region Remap Register (PRRR) on page B3-143
  c10, Normal Memory Remap Register (NMRR) on page B3-146
CP15 c11, Reserved for TCM DMA registers on page B3-147
CP15 c12, Security Extensions registers on page B3-148
  c12, Vector Base Address Register (VBAR) on page B3-148
  c12, Monitor Vector Base Address Register (MVBAR) on page B3-149
  c12, Interrupt Status Register (ISR) on page B3-150
CP15 c13, Process, context and thread ID registers on page B3-151
  c13, FCSE Process ID Register (FCSEIDR) on page B3-152
  c13, Context ID Register (CONTEXTIDR) on page B3-153
  CP15 c13 Software Thread ID registers on page B3-154
CP15 c14 is not used, see Unallocated CP15 encodings on page B3-69
CP15 c15, Implementation defined registers on page B3-155

B3.12.2 General behavior of CP15 registers

The following sections give information about the general behavior of CP15 registers:
• Read-only bits in read/write registers
• UNPREDICTABLE and UNDEFINED behavior for CP15 accesses
• Reset behavior of CP15 registers on page B3-70.

See also Meaning of fixed bit values in register diagrams on page B3-78.

Read-only bits in read/write registers

Some read/write registers include bits that are read-only. These bits ignore writes.

An example of this is the SCTLR.NMFI bit, bit [27], see c1, System Control Register (SCTLR) on page B3-96.

UNPREDICTABLE and UNDEFINED behavior for CP15 accesses

In ARMv7 the following operations are UNDEFINED:
• all CDP, MCRR, MRRC, LDC and STC operations to CP15
• all CDP2, MCR2, MRC2, MCRR2, MRRC2, LDC2 and STC2 operations to CP15.

Unless otherwise indicated in the individual register descriptions:
• reserved fields in registers are UNK/SBZP
• reserved values of fields can have UNPREDICTABLE effects.
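The two UNDEFINED rules above can be checked on an ARM-state instruction word. The following C classifier is a hypothetical sketch. It assumes the standard A32 coprocessor encodings:
• bits [27:24] = 0b1110 for CDP, MCR and MRC, with bit [4] = 1 for MCR and MRC
• bits [27:25] = 0b110 for LDC, STC, MCRR and MRRC
• a condition field of 0b1111 for the "2" forms.
It looks only at coprocessor number 15 and does not fully decode the instruction.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef enum { CP_NOT_CP15, CP_MCR_MRC, CP_UNDEFINED_FOR_CP15 } cp15_class;

/* Hypothetical classifier for A32 coprocessor instructions targeting CP15.
   Only MCR and MRC with cond != 0b1111 are permitted; CDP, MCRR, MRRC,
   LDC, STC and all of the "2" forms (cond == 0b1111) are UNDEFINED. */
static cp15_class classify_cp15(uint32_t insn)
{
    uint32_t cond   = insn >> 28;
    uint32_t coproc = (insn >> 8) & 0xFu;
    bool cdp_mcr_mrc = ((insn >> 24) & 0xFu) == 0xEu;  /* bits[27:24] == 1110 */
    bool ldc_stc_rr  = ((insn >> 25) & 0x7u) == 0x6u;  /* bits[27:25] == 110  */

    if (coproc != 15u || !(cdp_mcr_mrc || ldc_stc_rr))
        return CP_NOT_CP15;
    if (cond == 0xFu)
        return CP_UNDEFINED_FOR_CP15;         /* MCR2, MRC2, CDP2, LDC2, ... */
    if (cdp_mcr_mrc && (insn & (1u << 4)))
        return CP_MCR_MRC;                    /* bit[4] == 1: MCR or MRC */
    return CP_UNDEFINED_FOR_CP15;             /* CDP, MCRR, MRRC, LDC, STC */
}
```

For example, 0xEE080F17 (MCR p15, 0, r0, c8, c7, 0) is classified as a permitted MCR. The same word with the condition field set to 0b1111 is an MCR2, which is UNDEFINED for CP15.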
The following subsections give more information about UNPREDICTABLE and UNDEFINED behavior for CP15:
• Unallocated CP15 encodings
• Rules for MCR and MRC accesses to CP15 registers
• Effects of the Security Extensions on page B3-70.

Unallocated CP15 encodings

When MCR and MRC instructions perform CP15 operations, the CRn value for the instruction is the major register specifier for the CP15 space. Accesses to unallocated major registers are UNDEFINED. For the ARMv7-A architecture, this means that:
• for an implementation that includes the Security Extensions, accesses with <CRn> = {c4, c14} are UNDEFINED
• for an implementation that does not include the Security Extensions, accesses with <CRn> = {c4, c12, c14} are UNDEFINED.

In an allocated CP15 major register specifier, MCR and MRC accesses to all unallocated encodings are UNPREDICTABLE for privileged accesses. For the ARMv7-A architecture this means that:
• if the Security Extensions are implemented, any privileged MCR or MRC access with <CRn> != {c4, c14} and a combination of <opc1>, <CRm> and <opc2> values not shown in Figure B3-10 on page B3-65 is UNPREDICTABLE
• if the Security Extensions are not implemented, any privileged MCR or MRC access with <CRn> != {c4, c12, c14} and a combination of <opc1>, <CRm> and <opc2> values not shown in Figure B3-10 on page B3-65 is UNPREDICTABLE.

Note
As shown in Figure B3-10 on page B3-65, accesses to unallocated principal ID registers map onto the MIDR. These are accesses with <CRn> = c0, <opc1> = 0, <CRm> = c0, and <opc2> = {4, 6, 7}.

Rules for MCR and MRC accesses to CP15 registers

All MCR operations from the PC are UNPREDICTABLE for all coprocessors, including for CP15.

All MRC operations to APSR_nzcv are UNPREDICTABLE for CP15.

The following accesses are UNPREDICTABLE:
• an MCR access to an encoding for which no write behavior is defined in any circumstances
• an MRC access to an encoding for which no read behavior is defined in any circumstances.
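The rules for unallocated major registers, the MIDR aliases and the use of register 15 can be collected into one check. This C sketch is hypothetical and deliberately incomplete. It does not know which opc1, CRm and opc2 combinations Figure B3-10 allocates, and it does not model the User mode or Security Extensions access rules. A result of ACC_NOT_EXCLUDED therefore only means that none of these particular rules applies.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum {
    ACC_NOT_EXCLUDED,   /* none of the rules modeled here applies */
    ACC_UNDEFINED,      /* unallocated major register (CRn) */
    ACC_UNPREDICTABLE,  /* MCR from the PC, or MRC to APSR_nzcv */
    ACC_MIDR_ALIAS      /* unallocated principal ID register: reads MIDR */
} cp15_access;

/* Hypothetical check of an ARMv7-A MCR/MRC access to CP15.
   rt == 15 encodes the PC for MCR and APSR_nzcv for MRC. */
static cp15_access check_cp15(bool is_mrc, unsigned crn, unsigned opc1,
                              unsigned crm, unsigned opc2, unsigned rt,
                              bool security_ext)
{
    if (crn == 4 || crn == 14 || (crn == 12 && !security_ext))
        return ACC_UNDEFINED;
    if (rt == 15)
        return ACC_UNPREDICTABLE;
    if (is_mrc && crn == 0 && opc1 == 0 && crm == 0 &&
        (opc2 == 4 || opc2 == 6 || opc2 == 7))
        return ACC_MIDR_ALIAS;
    return ACC_NOT_EXCLUDED;
}
```

For example, an MRC with CRn = c12 is UNDEFINED only when the Security Extensions are not implemented. An MRC with CRn = c0, opc1 = 0, CRm = c0 and opc2 = 6 returns the MIDR.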
Except for CP15 encodings that are accessible in User mode, all MCR and MRC accesses from User mode are UNDEFINED. This applies to all User mode accesses to unallocated CP15 encodings. Individual register descriptions, and the summaries of the CP15 major registers, show the CP15 encodings that are accessible in User mode.

Some individual registers can be made inaccessible by setting configuration bits, possibly including IMPLEMENTATION DEFINED configuration bits, to disable access to the register. The effects of the architecturally-defined configuration bits are defined individually in this manual. Typically, setting a configuration bit to disable access to a register results in the register becoming UNDEFINED for MRC and MCR accesses.

Effects of the Security Extensions

In Non-secure state, any User or privileged access to a CP15 register is UNDEFINED if either:
• There are no circumstances in which all bits and fields in the register can be accessed from Non-secure privileged modes.
• Settings in the NSACR mean that there are no circumstances in which all bits and fields in the register can be accessed from Non-secure privileged modes.

Note
The ARMv7-A architecture does not define any registers of this type. However, an ARMv7-A implementation might include one or more IMPLEMENTATION DEFINED registers of this type.

When Non-secure access to a field of a CP15 register is controlled by an access control bit in the NSACR, and that access control bit is set to 0, the controlled register field is RAZ/WI when accessed from a privileged mode in Non-secure state. If the register can be accessed from User mode, then the field is also RAZ/WI when accessed from User mode.

If write access to a register is disabled by the CP15SDISABLE signal, then any MCR access to that register is UNDEFINED.
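The RAZ/WI behavior of an NSACR-controlled field can be sketched using the CPACR, the only required Configurable access register. The example makes two assumptions taken from the CPACR and NSACR register descriptions. First, the CPACR cp10 and cp11 fields occupy bits [21:20] and [23:22]. Second, NSACR bits [10] and [11] control Non-secure access to those fields. The functions are hypothetical, other CPACR fields are ignored, and only privileged accesses are modeled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed field positions (see the CPACR and NSACR descriptions). */
#define CPACR_CP10_MASK (3u << 20)
#define CPACR_CP11_MASK (3u << 22)
#define NSACR_CP10      (1u << 10)
#define NSACR_CP11      (1u << 11)

/* CPACR bits that are RAZ/WI for a Non-secure privileged access. */
static uint32_t cpacr_ns_hidden_mask(uint32_t nsacr)
{
    uint32_t mask = 0;
    if (!(nsacr & NSACR_CP10)) mask |= CPACR_CP10_MASK;
    if (!(nsacr & NSACR_CP11)) mask |= CPACR_CP11_MASK;
    return mask;
}

/* Value returned by a privileged read in the given security state. */
static uint32_t cpacr_read(uint32_t cpacr, uint32_t nsacr, bool nonsecure)
{
    return nonsecure ? (cpacr & ~cpacr_ns_hidden_mask(nsacr)) : cpacr;
}

/* New register value after a privileged write: hidden fields keep their
   previous value, because writes to them are ignored. */
static uint32_t cpacr_write(uint32_t cpacr, uint32_t value,
                            uint32_t nsacr, bool nonsecure)
{
    uint32_t hidden = nonsecure ? cpacr_ns_hidden_mask(nsacr) : 0u;
    return (cpacr & hidden) | (value & ~hidden);
}
```

Because the CPACR is a single register shared by both security states, a Non-secure write cannot change a field that the NSACR hides. The Secure setting of that field is preserved.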
Reset behavior of CP15 registers

After a reset, only a limited subset of the processor state is guaranteed to be set to defined values. On reset, the VMSAv7 architecture requires that the following CP15 registers are set to defined values.

Note
When the Security Extensions are implemented, only the Secure copy of a banked register is reset to the defined value.

• The SCTLR, see c1, System Control Register (SCTLR) on page B3-96.
• The CPACR, see c1, Coprocessor Access Control Register (CPACR) on page B3-104.
• The SCR, when the Security Extensions are implemented, see c1, Secure Configuration Register (SCR) on page B3-106.
• The TTBCR, see c2, Translation Table Base Control Register (TTBCR) on page B3-117.
• The Secure version of the VBAR, when the Security Extensions are implemented, see c12, Vector Base Address Register (VBAR) on page B3-148.
• The FCSEIDR, if the Fast Context Switch Extension (FCSE) is implemented, see c13, FCSE Process ID Register (FCSEIDR) on page B3-152. This register is RAZ/WI when the FCSE is not implemented.

For details of the reset values of these registers see the register descriptions. If the introductory description of a register does not include its reset value then the architecture does not require that register to be reset to a defined value.

The values of all other registers at reset are architecturally UNKNOWN. An implementation can assign an IMPLEMENTATION DEFINED reset value to a register whose reset value is architecturally UNKNOWN. After a reset, software must not rely on the value of any read/write register that does not have either an architecturally-defined reset value or an IMPLEMENTATION DEFINED reset value.

B3.12.3 Effect of the Security Extensions on the CP15 registers

When the Security Extensions are implemented, they integrate with many features of the architecture.
Therefore, the descriptions of the individual CP15 registers include information about how the Security Extensions affect the register. This section:
• summarizes how the Security Extensions affect the implementation of the CP15 registers
• summarizes how the Security Extensions control access to the CP15 registers
• describes a Security Extensions signal that can control access to some CP15 registers.

It contains the following subsections:
• Banked CP15 registers on page B3-72
• Restricted access CP15 registers on page B3-73
• Configurable access CP15 registers on page B3-74
• Common CP15 registers on page B3-74
• The CP15SDISABLE input on page B3-76
• Access to registers in Monitor mode on page B3-77.

Note
• This section describes the effect of the Security Extensions on all of the CP15 registers that are present in an implementation that includes the Security Extensions.
• When the Security Extensions are implemented, the register classifications of Banked, Restricted access, Configurable, or Common can apply to some coprocessor registers in addition to the CP15 registers. It is IMPLEMENTATION DEFINED whether each IMPLEMENTATION DEFINED register is Banked, Restricted access, Configurable, or Common.

Banked CP15 registers

When the Security Extensions are implemented, some CP15 registers are banked. Banked CP15 registers have two copies, one Secure and one Non-secure. The SCR.NS bit selects the Secure or Non-secure register, see c1, Secure Configuration Register (SCR) on page B3-106.

Table B3-15 shows which registers are banked, and the permitted access to each register.
Table B3-15 Banked CP15 registers

CP15 register  Banked register                                         Permitted accesses (a)
c0             CSSELR, Cache Size Selection Register                   Read/write in privileged modes only
c1             SCTLR, System Control Register (b)                      Read/write in privileged modes only
               ACTLR, Auxiliary Control Register (c)                   Read/write in privileged modes only
c2             TTBR0, Translation Table Base 0                         Read/write in privileged modes only
               TTBR1, Translation Table Base 1                         Read/write in privileged modes only
               TTBCR, Translation Table Base Control                   Read/write in privileged modes only
c3             DACR, Domain Access Control Register                    Read/write in privileged modes only
c5             DFSR, Data Fault Status Register                        Read/write in privileged modes only
               IFSR, Instruction Fault Status Register                 Read/write in privileged modes only
               ADFSR, Auxiliary Data Fault Status Register (c)         Read/write in privileged modes only
               AIFSR, Auxiliary Instruction Fault Status Register (c)  Read/write in privileged modes only
c6             DFAR, Data Fault Address Register                       Read/write in privileged modes only
               IFAR, Instruction Fault Address Register                Read/write in privileged modes only
c7             PAR, Physical Address Register (VA to PA translation)   Read/write in privileged modes only
c10            PRRR, Primary Region Remap Register                     Read/write in privileged modes only
               NMRR, Normal Memory Remap Register                      Read/write in privileged modes only
c12            VBAR, Vector Base Address Register                      Read/write in privileged modes only
c13            FCSEIDR, FCSE PID Register (d)                          Read/write in privileged modes only
               CONTEXTIDR, Context ID Register                         Read/write in privileged modes only
               TPIDRURW, User Read/Write Thread ID                     Read/write in unprivileged and privileged modes
               TPIDRURO, User Read-only Thread ID                      Read-only in User mode; read/write in privileged modes
               TPIDRPRW, Privileged Only Thread ID                     Read/write in privileged modes only

a. Any attempt to execute an access that is not permitted results in an Undefined Instruction exception.
b. Some bits are common to the Secure and the Non-secure register, see c1, System Control Register (SCTLR) on page B3-96.
c. Register is IMPLEMENTATION DEFINED.
d. Banked only if the FCSE is implemented. The FCSE PID Register is RAZ/WI if the FCSE is not implemented.

A Banked CP15 register can contain a mixture of:
• fields that are banked
• fields that are read-only in Non-secure privileged modes but read/write in the Secure state.

The System Control Register, SCTLR, is an example of a register that contains this mixture of fields.

The Secure copies of the Banked CP15 registers are sometimes referred to as the Secure Banked CP15 registers. The Non-secure copies of the Banked CP15 registers are sometimes referred to as the Non-secure Banked CP15 registers.

Restricted access CP15 registers

When the Security Extensions are implemented, some CP15 registers are present only in the Secure security state. These are called Restricted access registers, and their read/write access permissions are:
• Restricted access CP15 registers cannot be modified in Non-secure state.
• The NSACR can be read in Non-secure privileged modes, but not in Non-secure User mode. This enables software running in a Non-secure privileged mode to read the access permissions for CP15 registers that have configurable access.
• Apart from the NSACR, Restricted access CP15 registers cannot be read in Non-secure state.

Table B3-16 on page B3-74 shows the Restricted access CP15 registers when the Security Extensions are implemented:
Table B3-16 Restricted access CP15 registers
CP15 register | Secure register | Permitted accesses (a)
c1 | NSACR, Non-Secure Access Control Register | Read/write in Secure privileged modes; read-only in Non-secure privileged modes
c1 | SCR, Secure Configuration Register | Read/write in Secure privileged modes
c1 | SDER, Secure Debug Enable Register | Read/write in Secure privileged modes
c12 | MVBAR, Monitor Vector Base Address Register | Read/write in Secure privileged modes
a. Any attempt to execute an access that is not permitted results in an Undefined Instruction exception.

Configurable access CP15 registers
Access to some CP15 registers is configurable. These registers can be:
• accessible from Secure state only
• accessible from both Secure and Non-secure states.
Access is controlled by bits in the NSACR, see c1, Non-Secure Access Control Register (NSACR) on page B3-110. In ARMv7-A, the only required Configurable access CP15 register is:
• CPACR, Coprocessor Access Control Register.

Common CP15 registers
Some CP15 registers and operations are common to the Secure and Non-secure security states. These are described as the Common access CP15 registers, or simply as the Common CP15 registers. These registers are:
• Read-only registers that hold configuration information.
• Register encodings used for various memory system operations, rather than to access registers.
• The Interrupt Status Register (ISR).
Table B3-17 shows the registers that are present in an ARMv7-A implementation that are not affected by the Security Extensions. When the Security Extensions are implemented these registers are sometimes described as the Common registers.
Table B3-17 Common CP15 registers
CP15 register | Register | Permitted accesses (a)
c0 | MIDR, Main ID Register | Read-only in privileged modes only
c0 | CTR, Cache Type Register | Read-only in privileged modes only
c0 | TCMTR, TCM Type Register (b) | Read-only in privileged modes only
c0 | TLBTR, TLB Type Register (b) | Read-only in privileged modes only
c0 | MPIDR, Multiprocessor Affinity Register | Read-only in privileged modes only
c0 | ID_PFRx, Processor Feature Registers | Read-only in privileged modes only
c0 | ID_DFR0, Debug Feature Register 0 | Read-only in privileged modes only
c0 | ID_AFR0, Auxiliary Feature Register 0 | Read-only in privileged modes only
c0 | ID_MMFRx, Memory Model Feature Registers | Read-only in privileged modes only
c0 | ID_ISARx, Instruction Set Attribute Registers | Read-only in privileged modes only
c0 | CCSIDR, Cache Size ID Register | Read-only in privileged modes only
c0 | CLIDR, Cache Level ID Register | Read-only in privileged modes only
c0 | AIDR, Auxiliary ID Register (b) | Read-only in privileged modes only
c7 | NOP | Write-only in privileged modes only
c7 | Cache maintenance operations | See CP15 c7, Cache and branch predictor maintenance functions on page B3-126
c7 | VA to PA translation operations | See CP15 c7, Virtual Address to Physical Address translation operations on page B3-130
c7 | Data barrier operations | Write-only in unprivileged and privileged modes
c8 | TLB maintenance operations | Write-only in privileged modes only
c9 | Performance monitors | See Access permissions on page C9-12
c12 | ISR, Interrupt Status Register | Read-only in privileged modes only
a. Any attempt to execute an access that is not permitted results in an Undefined Instruction exception.
b. Register or operation details are IMPLEMENTATION DEFINED.
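The four register classes described above (Banked, Restricted access, Configurable access and Common) can be summarized as a small decision function. The C sketch below is illustrative only: it models privileged-mode accesses, treats `secure_state`, `is_nsacr` and `nsacr_allows_ns` as caller-supplied inputs (hypothetical names, not architectural signals), and ignores per-register read-only or write-only restrictions other than the NSACR being read-only from Non-secure state. A false result corresponds to the Undefined Instruction exception in the table footnotes.

```c
#include <assert.h>
#include <stdbool.h>

/* The four CP15 register classes defined by the Security Extensions. */
typedef enum {
    REG_BANKED,
    REG_RESTRICTED,
    REG_CONFIGURABLE,
    REG_COMMON
} cp15_class;

/*
 * Sketch: is a privileged-mode CP15 access permitted in the current
 * security state? false means the access is UNDEFINED.
 * Assumptions of this sketch:
 *  - per-register read-only/write-only rules are not modelled, except
 *    that the NSACR is read-only in Non-secure privileged modes;
 *  - nsacr_allows_ns stands for whichever NSACR bit controls a
 *    Configurable access register.
 */
static bool cp15_privileged_access_ok(cp15_class cls, bool secure_state,
                                      bool is_write, bool is_nsacr,
                                      bool nsacr_allows_ns)
{
    /* The NSACR is itself a Restricted access register. */
    assert(!is_nsacr || cls == REG_RESTRICTED);

    if (secure_state)
        return true;                    /* Secure privileged modes: all classes */

    switch (cls) {
    case REG_BANKED:                    /* the Non-secure copy is accessed */
    case REG_COMMON:                    /* one copy shared by both states */
        return true;
    case REG_RESTRICTED:                /* only the NSACR, and only for reads */
        return is_nsacr && !is_write;
    case REG_CONFIGURABLE:              /* gated by the NSACR */
        return nsacr_allows_ns;
    }
    return false;
}
```

For example, a Non-secure privileged read of the NSACR is permitted, while a Non-secure write to it, or any Non-secure access to the SCR, is not.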
Secure CP15 registers
The Secure CP15 registers comprise:
• the Secure copies of the Banked CP15 registers
• the Restricted access CP15 registers
• the Configurable access CP15 registers that are configured to be accessible only from Secure state.

The CP15SDISABLE input
The Security Extensions include an input signal, CP15SDISABLE, that disables write access to some of the Secure registers when asserted HIGH.
Note
The interaction between CP15SDISABLE and any IMPLEMENTATION DEFINED register is IMPLEMENTATION DEFINED.
Table B3-18 shows the registers and operations affected.
Table B3-18 Secure registers affected by CP15SDISABLE
CP15 register | Register name | Affected operation
c1 | SCTLR, System Control Register | MCR p15, 0, <Rt>, c1, c0, 0
c2 | TTBR0, Translation Table Base Register 0 | MCR p15, 0, <Rt>, c2, c0, 0
c2 | TTBCR, Translation Table Base Control Register | MCR p15, 0, <Rt>, c2, c0, 2
c3 | DACR, Domain Access Control Register | MCR p15, 0, <Rt>, c3, c0, 0
c10 | PRRR, Primary Region Remap Register | MCR p15, 0, <Rt>, c10, c2, 0
c10 | NMRR, Normal Memory Remap Register | MCR p15, 0, <Rt>, c10, c2, 1
c12 | VBAR, Vector Base Address Register | MCR p15, 0, <Rt>, c12, c0, 0
c12 | MVBAR, Monitor Vector Base Address Register | MCR p15, 0, <Rt>, c12, c0, 1
c13 | FCSEIDR, FCSE PID Register (a) | MCR p15, 0, <Rt>, c13, c0, 0
a. If the FCSE is implemented. The FCSE PID Register is RAZ/WI if the FCSE is not implemented.
On a reset by the external system, the CP15SDISABLE input signal must be taken LOW. This permits the reset code to set up the configuration of the Security Extensions. When the input is asserted HIGH, any attempt to write to the Secure registers shown in Table B3-18 results in an Undefined Instruction exception.
The CP15SDISABLE input does not affect reading Secure registers, or reading or writing Non-secure registers.
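As an illustration of Table B3-18, the following C sketch checks whether a given MCR would be rejected because CP15SDISABLE is HIGH. It assumes the caller already knows whether the write targets a Secure register (for example, a Banked register written from Secure state with SCR.NS == 0, or MVBAR). The type and function names are hypothetical, and IMPLEMENTATION DEFINED registers are left out because their interaction with CP15SDISABLE is IMPLEMENTATION DEFINED.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Operand fields of MCR p15, <opc1>, <Rt>, <CRn>, <CRm>, <opc2> */
typedef struct { unsigned opc1, crn, crm, opc2; } cp15_enc;

/* Encodings listed in Table B3-18. FCSEIDR only matters when the FCSE
   is implemented; IMPLEMENTATION DEFINED registers are not modelled. */
static const cp15_enc sdisable_locked[] = {
    {0, 1, 0, 0},   /* SCTLR   */
    {0, 2, 0, 0},   /* TTBR0   */
    {0, 2, 0, 2},   /* TTBCR   */
    {0, 3, 0, 0},   /* DACR    */
    {0, 10, 2, 0},  /* PRRR    */
    {0, 10, 2, 1},  /* NMRR    */
    {0, 12, 0, 0},  /* VBAR    */
    {0, 12, 0, 1},  /* MVBAR   */
    {0, 13, 0, 0},  /* FCSEIDR */
};

/* Returns true if the MCR takes an Undefined Instruction exception because
   CP15SDISABLE is HIGH. targets_secure_copy is supplied by the caller.
   Reads (MRC) and writes to Non-secure copies are never blocked. */
static bool mcr_blocked_by_cp15sdisable(bool cp15sdisable_high,
                                        bool targets_secure_copy,
                                        cp15_enc e)
{
    assert(e.opc1 <= 7 && e.crn <= 15 && e.crm <= 15 && e.opc2 <= 7);
    if (!cp15sdisable_high || !targets_secure_copy)
        return false;
    for (size_t i = 0; i < sizeof sdisable_locked / sizeof sdisable_locked[0]; i++) {
        const cp15_enc *l = &sdisable_locked[i];
        if (l->opc1 == e.opc1 && l->crn == e.crn &&
            l->crm == e.crm && l->opc2 == e.opc2)
            return true;
    }
    return false;
}
```

Keeping the locked encodings in a table keeps the check a direct transcription of Table B3-18; an implementation that locks further IMPLEMENTATION DEFINED registers would extend that table.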
It is IMPLEMENTATION DEFINED how the input is changed and when changes to this input are reflected in the processor. However, changes must be reflected as quickly as possible. Once a change to the input is visible to the processor, the change must take effect, with respect to instruction execution boundaries, before the completion of any Instruction Synchronization Barrier (ISB) operation issued after it. To ensure that all subsequent instructions are affected by a change to CP15SDISABLE, software must perform an ISB operation that meets these conditions.
The assertion of CP15SDISABLE enables key Secure privileged features to be locked in a known good state, providing an additional level of overall system security. ARM expects control of this input to reside in the system, in a system block dedicated to security.

Access to registers in Monitor mode
When the processor is in Monitor mode, the processor is in Secure state regardless of the value of the SCR.NS bit. In Monitor mode, the SCR.NS bit determines whether the Secure Banked CP15 registers or the Non-secure Banked CP15 registers are read or written using MRC or MCR instructions. That is:
NS = 0 Common, Restricted access, and Secure Banked registers are accessed by CP15 MRC and MCR instructions.
CP15 operations use the security state to determine all resources used, that is, all CP15 based operations are performed in Secure state.
NS = 1 Common, Restricted access, and Non-secure Banked registers are accessed by CP15 MRC and MCR instructions.
CP15 operations use the security state to determine all resources used, that is, all CP15 based operations are performed in Secure state.
The security state determines whether the Secure or Non-secure Banked registers are used to determine the control state.

B3.12.4 Changes to CP15 registers and the memory order model
All changes to CP15 registers that appear in program order after any explicit memory operations are guaranteed not to affect those memory operations.
Any change to CP15 registers is guaranteed to be visible to subsequent instructions only after one of:
• the execution of an ISB instruction
• the taking of an exception
• the return from an exception.
To guarantee the visibility of changes to some CP15 registers, additional operations might be required, on a case by case basis, before the ISB instruction, exception or return from exception. These cases are identified specifically in the definition of the registers.
However, for CP15 register accesses, all MRC and MCR instructions to the same register using the same register number appear to occur in program order relative to each other without context synchronization.
Where a change to the CP15 registers that is not yet guaranteed to be visible has an effect on exception processing, the following rule applies:
• When it is determined that an exception must be taken, any change of state held in CP15 registers involved in the triggering of the exception and that affects the processing of the exception is guaranteed to take effect before the exception is taken.
Therefore, in the following example, where initially A=1 and V=0, the LDR might or might not take a Data Abort exception due to the unaligned access, but if an exception occurs the vector used is affected by the V bit:
MCR p15, 0, R0, c1, c0, 0 ; clears the A bit and sets the V bit
LDR R2, [R3] ; unaligned load

B3.12.5 Meaning of fixed bit values in register diagrams
In register diagrams, fixed bits are indicated by one of the following:
0 In any implementation:
• the bit must read as 0
• writes to the bit must be ignored.
Software:
• can rely on the bit reading as 0
• must use an SBZP policy to write to the bit.
(0) In any implementation:
• the bit must read as 0
• writes to the bit must be ignored.
Software:
• must not rely on the bit reading as 0
• must use an SBZP policy to write to the bit.
1 In any implementation:
• the bit must read as 1
• writes to the bit must be ignored.
Software:
• can rely on the bit reading as 1
• must use an SBOP policy to write to the bit.
(1) In any implementation:
• the bit must read as 1
• writes to the bit must be ignored.
Software:
• must not rely on the bit reading as 1
• must use an SBOP policy to write to the bit.
Fields that are more than 1 bit wide are sometimes described as UNK/SBZP, instead of having each bit marked as (0).

B3.12.6 CP15 c0, ID codes registers
The CP15 c0 registers are used for processor and feature identification. Figure B3-11 shows the CP15 c0 registers.
Figure B3-11 CP15 c0 registers in a VMSA implementation (CRn = c0 for all entries; ‡ marks the CPUID registers)
opc1 | CRm | opc2 | Register | Access
0 | c0 | 0 | MIDR, Main ID Register | Read-only
0 | c0 | 1 | CTR, Cache Type Register | Read-only
0 | c0 | 2 | TCMTR, TCM Type Register, details IMPLEMENTATION DEFINED | Read-only
0 | c0 | 3 | TLBTR, TLB Type Register, details IMPLEMENTATION DEFINED | Read-only
0 | c0 | 5 | MPIDR, Multiprocessor Affinity Register | Read-only
0 | c0 | 4, 6, 7 | Aliases of Main ID Register | Read-only
0 | c1 | 0 | ID_PFR0, Processor Feature Register 0 ‡ | Read-only
0 | c1 | 1 | ID_PFR1, Processor Feature Register 1 ‡ | Read-only
0 | c1 | 2 | ID_DFR0, Debug Feature Register 0 ‡ | Read-only
0 | c1 | 3 | ID_AFR0, Auxiliary Feature Register 0 ‡ | Read-only
0 | c1 | 4 | ID_MMFR0, Memory Model Feature Register 0 ‡ | Read-only
0 | c1 | 5 | ID_MMFR1, Memory Model Feature Register 1 ‡ | Read-only
0 | c1 | 6 | ID_MMFR2, Memory Model Feature Register 2 ‡ | Read-only
0 | c1 | 7 | ID_MMFR3, Memory Model Feature Register 3 ‡ | Read-only
0 | c2 | 0 | ID_ISAR0, ISA Feature Register 0 ‡ | Read-only
0 | c2 | 1 | ID_ISAR1, ISA Feature Register 1 ‡ | Read-only
0 | c2 | 2 | ID_ISAR2, ISA Feature Register 2 ‡ | Read-only
0 | c2 | 3 | ID_ISAR3, ISA Feature Register 3 ‡ | Read-only
0 | c2 | 4 | ID_ISAR4, ISA Feature Register 4 ‡ | Read-only
0 | c2 | 5 | ID_ISAR5, ISA Feature Register 5 ‡ | Read-only
0 | c2 | 6, 7 | Read-As-Zero | Read-only
0 | c3-c7 | 0-7 | Read-As-Zero | Read-only
1 | c0 | 0 | CCSIDR, Cache Size ID Registers | Read-only
1 | c0 | 1 | CLIDR, Cache Level ID Register | Read-only
1 | c0 | 7 | AIDR, Auxiliary ID Register, IMPLEMENTATION DEFINED | Read-only
2 | c0 | 0 | CSSELR, Cache Size Selection Register | Read/Write
CP15 c0 register encodings not shown in Figure B3-11 are UNPREDICTABLE, see Unallocated CP15 encodings on page B3-69.
Note
Chapter B5 The CPUID Identification Scheme describes the CPUID registers shown in Figure B3-11.
Table B3-19 lists the CP15 c0 registers and shows where each register is described in full. The table does not include the reserved and aliased registers that are shown in Figure B3-11 on page B3-79.
Table B3-19 Index to CP15 c0 register descriptions
opc1 | CRm | opc2 | Register and description
0 | c0 | 0 | c0, Main ID Register (MIDR) on page B3-81
0 | c0 | 1 | c0, Cache Type Register (CTR) on page B3-83
0 | c0 | 2 | c0, TCM Type Register (TCMTR) on page B3-85
0 | c0 | 3 | c0, TLB Type Register (TLBTR) on page B3-86
0 | c0 | 5 | c0, Multiprocessor Affinity Register (MPIDR) on page B3-87
0 | c0 | 4, 6, 7 | c0, Main ID Register (MIDR) on page B3-81
0 | c1 | 0, 1 | CP15 c0, Processor Feature registers on page B5-4
0 | c1 | 2 | c0, Debug Feature Register 0 (ID_DFR0) on page B5-6
0 | c1 | 3 | c0, Auxiliary Feature Register 0 (ID_AFR0) on page B5-8
0 | c1 | 4-7 | CP15 c0, Memory Model Feature registers on page B5-9
0 | c2 | 0-5 | CP15 c0, Instruction Set Attribute registers on page B5-19
1 | c0 | 0 | c0, Cache Size ID Registers (CCSIDR) on page B3-91
1 | c0 | 1 | c0, Cache Level ID Register (CLIDR) on page B3-92
1 | c0 | 7 | c0, Implementation defined Auxiliary ID Register (AIDR) on page B3-94
2 | c0 | 0 | c0, Cache Size Selection Register (CSSELR) on page B3-95
Note
The CPUID scheme described in Chapter B5 The CPUID Identification Scheme includes information about the implementation of the optional VFP and Advanced SIMD architecture extensions. See Advanced SIMD and VFP extensions on page A2-20 for a summary of the implementation options for these features.
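The c0 encoding space in Figure B3-11 can be expressed as a lookup. This hypothetical C helper returns the register name for a given <opc1>, <CRm> and <opc2> (with <CRn> fixed at c0), returns "RAZ" for the Read-As-Zero encodings, and returns NULL for encodings the figure does not show, which are UNPREDICTABLE. It only decodes an encoding; it does not execute an MRC.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: register selected by MRC p15, <opc1>, <Rt>, c0, <CRm>, <opc2>
   according to Figure B3-11. Returns NULL for UNPREDICTABLE encodings. */
static const char *cp15_c0_register(unsigned opc1, unsigned crm, unsigned opc2)
{
    static const char *const crm1[8] = {
        "ID_PFR0", "ID_PFR1", "ID_DFR0", "ID_AFR0",
        "ID_MMFR0", "ID_MMFR1", "ID_MMFR2", "ID_MMFR3"};
    static const char *const crm2[6] = {
        "ID_ISAR0", "ID_ISAR1", "ID_ISAR2", "ID_ISAR3", "ID_ISAR4", "ID_ISAR5"};

    assert(opc1 <= 7 && crm <= 15 && opc2 <= 7);

    if (opc1 == 0) {
        if (crm == 0) {
            switch (opc2) {
            case 0: return "MIDR";
            case 1: return "CTR";
            case 2: return "TCMTR";
            case 3: return "TLBTR";
            case 5: return "MPIDR";
            default: return "MIDR";   /* opc2 = 4, 6, 7 are aliases of the MIDR */
            }
        }
        if (crm == 1)
            return crm1[opc2];
        if (crm == 2)
            return opc2 <= 5 ? crm2[opc2] : "RAZ";
        if (crm <= 7)
            return "RAZ";             /* CRm = c3 to c7 */
        return NULL;
    }
    if (opc1 == 1 && crm == 0) {
        if (opc2 == 0) return "CCSIDR";
        if (opc2 == 1) return "CLIDR";
        if (opc2 == 7) return "AIDR";
    }
    if (opc1 == 2 && crm == 0 && opc2 == 0)
        return "CSSELR";
    return NULL;
}
```

For example, cp15_c0_register(0, 0, 5) names the MPIDR, and cp15_c0_register(1, 0, 3) returns NULL because that encoding is not shown in the figure.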
B3.12.7 c0, Main ID Register (MIDR)
The Main ID Register, MIDR, provides identification information for the processor, including an implementer code for the device and a device ID number.
The MIDR is:
• a 32-bit read-only register
• accessible only in privileged modes
• when the Security Extensions are implemented, a Common register.
Some fields of the MIDR are IMPLEMENTATION DEFINED. For details of the values of these fields for a particular ARMv7 implementation, and any implementation-specific significance of these values, see the product documentation.
The format of the MIDR is:
Bits [31:24] Implementer | Bits [23:20] Variant | Bits [19:16] Architecture | Bits [15:4] Primary part number | Bits [3:0] Revision
Implementer, bits [31:24]
The Implementer code. Table B3-20 shows the permitted values for this field:
Table B3-20 Implementer codes
Bits [31:24] | ASCII character | Implementer
0x41 | A | ARM Limited
0x44 | D | Digital Equipment Corporation
0x4D | M | Motorola, Freescale Semiconductor Inc.
0x51 | Q | QUALCOMM Inc.
0x56 | V | Marvell Semiconductor Inc.
0x69 | i | Intel Corporation
All other values are reserved by ARM and must not be used.
Variant, bits [23:20]
An IMPLEMENTATION DEFINED variant number. Typically, this field is used to distinguish between different product variants, or major revisions of a product.
Architecture, bits [19:16]
Table B3-21 shows the permitted values for this field:
Table B3-21 Architecture codes
Bits [19:16] | Architecture
0x1 | ARMv4
0x2 | ARMv4T
0x3 | ARMv5 (obsolete)
0x4 | ARMv5T
0x5 | ARMv5TE
0x6 | ARMv5TEJ
0x7 | ARMv6
0xF | Defined by CPUID scheme
All other values are reserved by ARM and must not be used.
Primary part number, bits [15:4]
An IMPLEMENTATION DEFINED primary part number for the device.
Note
On processors implemented by ARM, if the top four bits of the primary part number are 0x0 or 0x7, the variant and architecture are encoded differently, see c0, Main ID Register (MIDR) on page AppxH-34. Processors implemented by ARM have an Implementer code of 0x41.
Revision, bits [3:0]
An IMPLEMENTATION DEFINED revision number for the device.
ARMv7 requires all implementations to use the CPUID scheme, described in Chapter B5 The CPUID Identification Scheme, and an implementation is described by the MIDR together with the CPUID registers.
Note
For an ARMv7 implementation by ARM, the MIDR is interpreted as:
Bits [31:24] Implementer code, must be 0x41.
Bits [23:20] Major revision number, rX.
Bits [19:16] Architecture code, must be 0xF.
Bits [15:4] ARM part number.
Bits [3:0] Minor revision number, pY.
Accessing the MIDR
To access the MIDR you read the CP15 registers with <opc1> set to 0, <CRn> set to c0, <CRm> set to c0, and <opc2> set to 0. For example:
MRC p15,0,<Rt>,c0,c0,0 ; Read CP15 Main ID Register

B3.12.8 c0, Cache Type Register (CTR)
The Cache Type Register, CTR, provides information about the architecture of the caches.
The CTR is:
• a 32-bit read-only register
• accessible only in privileged modes
• when the Security Extensions are implemented, a Common register.
The format of the CTR is changed in ARMv7. The new format of the register is indicated by bits [31:29] being set to 0b100. For details of the format of the Cache Type Register in versions of the ARM architecture before ARMv7 see c0, Cache Type Register (CTR) on page AppxH-35.
In ARMv7, the format of the CTR is:
Bits [31:29] 0b100 | Bit [28] 0 | Bits [27:24] CWG | Bits [23:20] ERG | Bits [19:16] DminLine | Bits [15:14] L1Ip | Bits [13:4] 0b0000000000 | Bits [3:0] IminLine
Bits [31:29]
Set to 0b100 for the ARMv7 register format. Set to 0b000 for the format used in ARMv6 and earlier.
Bit [28]
RAZ.
CWG, bits [27:24]
Cache Writeback Granule.
Log2 of the number of words of the maximum size of memory that can be overwritten as a result of the eviction of a cache entry that has had a memory location in it modified.
A value of 0b0000 indicates that the CTR does not provide Cache Writeback Granule information and either:
• the architectural maximum of 512 words (2Kbytes) must be assumed
• the Cache Writeback Granule can be determined from the maximum cache line size encoded in the Cache Size ID Registers.
Values greater than 0b1001 are reserved.
ERG, bits [23:20]
Exclusives Reservation Granule. Log2 of the number of words of the maximum size of the reservation granule that has been implemented for the Load-Exclusive and Store-Exclusive instructions. For more information, see Tagging and the size of the tagged memory block on page A3-20.
A value of 0b0000 indicates that the CTR does not provide Exclusives Reservation Granule information and the architectural maximum of 512 words (2Kbytes) must be assumed.
Values greater than 0b1001 are reserved.
DminLine, bits [19:16]
Log2 of the number of words in the smallest cache line of all the data caches and unified caches that are controlled by the processor.
L1Ip, bits [15:14]
Level 1 instruction cache policy. Indicates the indexing and tagging policy for the L1 instruction cache. Table B3-22 shows the possible values for this field.
Table B3-22 Level 1 instruction cache policy field values
L1Ip bits | L1 instruction cache indexing and tagging policy
00 | Reserved
01 | ASID-tagged Virtual Index, Virtual Tag (AIVIVT)
10 | Virtual Index, Physical Tag (VIPT)
11 | Physical Index, Physical Tag (PIPT)
Bits [13:4]
RAZ.
IminLine, bits [3:0]
Log2 of the number of words in the smallest cache line of all the instruction caches that are controlled by the processor.
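The following C sketch decodes the CTR fields into byte sizes, the form that cache maintenance code typically needs (for example, to step through an address range by the DminLine size). It assumes the register value has already been read with MRC and is passed in. The names are hypothetical, and substituting the 2Kbyte architectural maximum for a zero CWG or ERG field is one of the two options the text allows, chosen here for simplicity.

```c
#include <stdbool.h>
#include <stdint.h>

/* Decoded fields of an ARMv7-format CTR. Sizes are in bytes (1 word = 4 bytes). */
typedef struct {
    bool     armv7_format;    /* bits [31:29] == 0b100 */
    uint32_t cwg_bytes;       /* Cache Writeback Granule */
    uint32_t erg_bytes;       /* Exclusives Reservation Granule */
    uint32_t dminline_bytes;  /* smallest data or unified cache line */
    uint32_t iminline_bytes;  /* smallest instruction cache line */
    unsigned l1ip;            /* 1 = AIVIVT, 2 = VIPT, 3 = PIPT, 0 = reserved */
} ctr_info;

/* A CWG or ERG field of 0b0000 means "not provided"; this sketch then uses
   the architectural maximum of 512 words (2Kbytes). */
static uint32_t granule_bytes(unsigned log2_words)
{
    return log2_words == 0 ? 512u * 4u : 4u << log2_words;
}

static ctr_info decode_ctr(uint32_t ctr)
{
    ctr_info i;
    i.armv7_format   = ((ctr >> 29) & 0x7u) == 0x4u;
    i.cwg_bytes      = granule_bytes((ctr >> 24) & 0xFu);
    i.erg_bytes      = granule_bytes((ctr >> 20) & 0xFu);
    i.dminline_bytes = 4u << ((ctr >> 16) & 0xFu);  /* log2(words) -> bytes */
    i.l1ip           = (ctr >> 14) & 0x3u;
    i.iminline_bytes = 4u << (ctr & 0xFu);
    return i;
}
```

For an illustrative CTR value of 0x83338003, the sketch reports the ARMv7 format, 32-byte CWG, ERG, DminLine and IminLine, and L1Ip = 0b10 (VIPT).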
Accessing the CTR
To access the CTR you read the CP15 registers with <opc1> set to 0, <CRn> set to c0, <CRm> set to c0, and <opc2> set to 1. For example:
MRC p15,0,<Rt>,c0,c0,1 ; Read CP15 Cache Type Register

B3.12.9 c0, TCM Type Register (TCMTR)
The TCM Type Register, TCMTR, provides information about the implementation of the TCM.
The TCMTR is:
• a 32-bit read-only register
• accessible only in privileged modes
• when the Security Extensions are implemented, a Common register.
From ARMv7:
• TCMTR must be implemented
• when the ARMv7 format is used, the meaning of register bits [28:0] is IMPLEMENTATION DEFINED
• the ARMv6 format of the TCM Type Register remains a valid usage model
• if no TCMs are implemented the ARMv6 format must be used to indicate zero-sized TCMs.
The ARMv7 format of the TCMTR is:
Bits [31:29] 0b100 | Bits [28:0] IMPLEMENTATION DEFINED
Bits [31:29]
Set to 0b100 for the ARMv7 register format.
Note
This field is set to 0b000 for the format used in ARMv6 and earlier.
Bits [28:0]
IMPLEMENTATION DEFINED in the ARMv7 register format.
If no TCMs are implemented, the TCMTR must be implemented with this ARMv6 format:
Bits [31:29] 0b000 | Bits [28:19] UNKNOWN | Bits [18:16] 0b000 | Bits [15:3] UNKNOWN | Bits [2:0] 0b000
For details of the ARMv6 optional implementation of the TCM Type Register see c0, TCM Type Register (TCMTR) on page AppxG-33.
Accessing the TCMTR
To access the TCMTR you read the CP15 registers with <opc1> set to 0, <CRn> set to c0, <CRm> set to c0, and <opc2> set to 2. For example:
MRC p15,0,<Rt>,c0,c0,2 ; Read CP15 TCM Type Register

B3.12.10 c0, TLB Type Register (TLBTR)
The TLB Type Register, TLBTR, provides information about the TLB implementation. The register must define whether the implementation provides separate instruction and data TLBs, or a unified TLB.
Normally, the IMPLEMENTATION DEFINED information in this register includes the number of lockable entries in the TLB.
The TLBTR is:
• a 32-bit read-only register
• accessible only in privileged modes
• implemented only when the VMSA is implemented
• when the Security Extensions are implemented, a Common register.
The format of the TLBTR is:
Bits [31:1] IMPLEMENTATION DEFINED | Bit [0] nU
Bits [31:1]
IMPLEMENTATION DEFINED.
nU, bit [0]
Not Unified TLB. Indicates whether the implementation has a unified TLB:
nU == 0 Unified TLB.
nU == 1 Separate Instruction and Data TLBs.
Note
From ARMv7, the TLB lockdown mechanism is IMPLEMENTATION DEFINED, and therefore the details of bits [31:1] of the TLB Type Register are IMPLEMENTATION DEFINED.
Accessing the TLBTR
To access the TLBTR you read the CP15 registers with <opc1> set to 0, <CRn> set to c0, <CRm> set to c0, and <opc2> set to 3. For example:
MRC p15,0,<Rt>,c0,c0,3 ; Read CP15 TLB Type Register

B3.12.11 c0, Multiprocessor Affinity Register (MPIDR)
The Multiprocessor Affinity Register, MPIDR, provides an additional processor identification mechanism for scheduling purposes in a multiprocessor system. In a uniprocessor system ARM recommends that this register returns a value of 0.
The MPIDR is:
• a 32-bit read-only register
• accessible only in privileged modes
• when the Security Extensions are implemented, a Common register
• introduced in ARMv7.
In the ARMv7 base architecture the format of the MPIDR is:
Bits [31:24] 0b00000000 | Bits [23:16] Affinity Level 2 | Bits [15:8] Affinity Level 1 | Bits [7:0] Affinity Level 0
When the Multiprocessing Extensions are implemented the format of the MPIDR is:
Bit [31] 1 | Bit [30] U | Bits [29:25] (0) | Bit [24] MT | Bits [23:16] Affinity Level 2 | Bits [15:8] Affinity Level 1 | Bits [7:0] Affinity Level 0
Note
In the MPIDR bit definitions, a processor in the system can be a physical processor or a virtual CPU.
Bits [31:24], ARMv7 base architecture
Reserved, RAZ.
Bit [31], Multiprocessing Extensions
RAO. Indicates that the processor implements the Multiprocessing Extensions register format.
U bit, bit [30], Multiprocessing Extensions
Indicates a uniprocessor system, as distinct from processor 0 in a multiprocessor system. The possible values of this bit are:
0 Processor is part of a multiprocessor system.
1 Processor is part of a uniprocessor system.
Bits [29:25], Multiprocessing Extensions
Reserved, UNK.
MT bit, bit [24], Multiprocessing Extensions
Indicates whether the lowest level of affinity consists of logical processors that are implemented using a multi-threading type approach. The possible values of this bit are:
0 Performance of processors at the lowest affinity level is largely independent.
1 Performance of processors at the lowest affinity level is very interdependent.
For more information about the meaning of this bit see Multi-threading approach to lowest affinity levels, Multiprocessing Extensions on page B3-89.
Affinity level 2, bits [23:16]
The most significant affinity level field, for this processor in the system.
Affinity level 1, bits [15:8]
The intermediate affinity level field, for this processor in the system.
Affinity level 0, bits [7:0]
The least significant affinity level field, for this processor in the system.
In the system as a whole, for each of the affinity level fields, the assigned values must start at 0 and increase monotonically. Increasing monotonically means that:
• There must not be any gaps in the sequence of numbers used.
• A higher value of the field includes any properties indicated by all lower values of the field.
When matching against an affinity level field, scheduler software checks for a value equal to or greater than a required value.
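A minimal C sketch of decoding both MPIDR formats follows. It assumes the value was read with MRC p15,0,<Rt>,c0,c0,5 and passed in; the type and function names are hypothetical. Bit [31] distinguishes the two formats: in the base architecture format bits [31:24] are RAZ, so the sketch reports U and MT as 0 in that case.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    bool    mp_format;     /* bit [31]: Multiprocessing Extensions format */
    bool    uniprocessor;  /* U, bit [30]; false unless mp_format */
    bool    mt;            /* MT, bit [24]; false unless mp_format */
    uint8_t aff[3];        /* aff[0] = bits [7:0], aff[1] = [15:8], aff[2] = [23:16] */
} mpidr_info;

static mpidr_info decode_mpidr(uint32_t v)
{
    mpidr_info m;
    m.mp_format    = (v >> 31) & 1u;
    m.uniprocessor = m.mp_format && ((v >> 30) & 1u);
    m.mt           = m.mp_format && ((v >> 24) & 1u);
    for (int level = 0; level < 3; level++)
        m.aff[level] = (uint8_t)(v >> (8 * level));  /* keep the low 8 bits */
    return m;
}
```

For example, a value of 0x80000102 decodes as the Multiprocessing Extensions format with U = 0, MT = 0, and affinity levels 2, 1 and 0 equal to 0, 1 and 2.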
Recommended use of the MPIDR on page B3-89 includes a description of an example multiprocessor system and the affinity level field values it might use.
The interpretation of these fields is IMPLEMENTATION DEFINED, and must be documented as part of the documentation of the multiprocessor system. ARM recommends that this register is used as described in Recommended use of the MPIDR.
The software mechanism to discover the total number of affinity numbers used at each level is IMPLEMENTATION DEFINED, and is part of the general system identification task.

Multi-threading approach to lowest affinity levels, Multiprocessing Extensions
When the Multiprocessing Extensions are implemented, if the MPIDR.MT bit is set to 1, this indicates that the processors at affinity level 0 are logical processors, implemented using a multi-threading type approach. In such an approach, there can be a significant performance impact if a new thread is assigned to a processor with:
• a different Affinity Level 0 value to some other thread, referred to as the original thread
• a pair of values for Affinity Levels 1 and 2 that are the same as the pair of values of the original thread.
In this situation, the performance of the original thread might be significantly reduced.
Note
In this description, thread always refers to a thread or a process.

Recommended use of the MPIDR
In a multiprocessor system the register might provide two important functions:
• Identifying special functionality of a particular processor in the system. In general, the actual meaning of the affinity level fields is not important. In a small number of situations, an affinity level field value might have a special IMPLEMENTATION DEFINED significance. Possible examples include booting from reset and power-down events.
• Providing affinity information for the scheduling software, to help the scheduler run an individual thread or process on either:
- the same processor, or as similar a processor as possible, as the processor it was running on previously
- a processor on which a related thread or process was run.
Note
A monotonically increasing single number ID mechanism provides a convenient index into software arrays and for accessing the interrupt controller. This might be:
• performed as part of the boot sequence
• stored as part of the local storage of threads.
MPIDR provides a mechanism with up to three levels of affinity information, but the meaning of those levels of affinity is entirely IMPLEMENTATION DEFINED. The levels of affinity provided can have different meanings. Table B3-23 on page B3-90 shows two possible implementations:
Table B3-23 Possible implementations of the affinity levels
Affinity level | Example system 1 | Example system 2
0 | Virtual CPUs in a multi-threaded processor | Processors in an SMP cluster
1 | Processors in a Symmetric Multi-Processor (SMP) cluster | Clusters within a system
2 | Clusters in a system | No meaning, fixed as 0
The scheduler maintains affinity level information for all threads and processes. When it has to reschedule a thread or process the scheduler:
• looks for an available processor that matches at all three affinity levels
• if this fails, it might look for a processor that matches at levels 1 and 2 only
• if it still cannot find an available processor it might look for a match at level 2 only.
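The scheduler fallback can be sketched in C as follows. This is a hypothetical helper, not an architectural requirement. It uses the MPIDR field numbering (levels 0 to 2): it first requires a match at all three levels, then only at levels 1 and 2, then only at level 2. It compares fields for equality rather than using the "equal to or greater than" comparison mentioned for affinity matching, and it returns -1 when all three searches fail, leaving any further policy to the caller.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Affinity level n (0, 1 or 2) of an MPIDR value */
static unsigned affinity(uint32_t mpidr, unsigned n)
{
    return (mpidr >> (8u * n)) & 0xFFu;
}

/* Returns the index of a free processor whose MPIDR matches prev_mpidr at
   levels 0-2, else at levels 1-2, else at level 2, or -1 if none is found. */
static int pick_processor(uint32_t prev_mpidr, const uint32_t *mpidr,
                          const bool *is_free, size_t count)
{
    for (unsigned lowest = 0; lowest <= 2; lowest++) {   /* compare levels lowest..2 */
        for (size_t i = 0; i < count; i++) {
            if (!is_free[i])
                continue;
            bool match = true;
            for (unsigned n = lowest; n <= 2 && match; n++)
                match = affinity(mpidr[i], n) == affinity(prev_mpidr, n);
            if (match)
                return (int)i;
        }
    }
    return -1;
}
```

For example, with processors at (level 2, level 1, level 0) = (0,0,0), (0,0,1), (0,1,0) and (1,0,0), and only the last three free, a thread last run on (0,0,0) is placed on (0,0,1), the free processor that still matches at levels 1 and 2.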
A multiprocessor system corresponding to Example system 1 in Table B3-23 might implement affinity values as shown in Table B3-24:
Table B3-24 Example of possible affinity values at different affinity levels
Affinity level 2 (cluster level) | Affinity level 1 (processor level) | Affinity level 0 (virtual CPU level)
0 | 0 | 0, 1
0 | 1 | 0, 1
0 | 2 | 0, 1
0 | 3 | 0, 1
1 | 0 | 0, 1
1 | 1 | 0, 1
1 | 2 | 0, 1
1 | 3 | 0, 1
Accessing the MPIDR
To access the MPIDR you read the CP15 registers with <opc1> set to 0, <CRn> set to c0, <CRm> set to c0, and <opc2> set to 5. For example:
MRC p15,0,<Rt>,c0,c0,5 ; Read Multiprocessor Affinity Register

B3.12.12 c0, Cache Size ID Registers (CCSIDR)
The Cache Size ID Registers, CCSIDR, provide information about the architecture of the caches.
The CCSIDR registers are:
• 32-bit read-only registers
• accessible only in privileged modes
• when the Security Extensions are implemented, Common registers
• introduced in ARMv7.
One CCSIDR is implemented for each cache that can be accessed by the processor. CSSELR selects which Cache Size ID Register is accessible, see c0, Cache Size Selection Register (CSSELR) on page B3-95.
The format of a CCSIDR is:
Bit [31] WT | Bit [30] WB | Bit [29] RA | Bit [28] WA | Bits [27:13] NumSets | Bits [12:3] Associativity | Bits [2:0] LineSize
WT, bit [31]
Indicates whether the cache level supports Write-Through, see Table B3-25.
WB, bit [30]
Indicates whether the cache level supports Write-Back, see Table B3-25.
RA, bit [29]
Indicates whether the cache level supports Read-Allocation, see Table B3-25.
WA, bit [28]
Indicates whether the cache level supports Write-Allocation, see Table B3-25.
Table B3-25 WT, WB, RA and WA bit values
WT, WB, RA or WA bit value | Meaning
0 | Feature not supported
1 | Feature supported
NumSets, bits [27:13]
(Number of sets in cache) - 1, therefore a value of 0 indicates 1 set in the cache. The number of sets does not have to be a power of 2.
Associativity, bits [12:3]
  (Associativity of cache) - 1, therefore a value of 0 indicates an associativity of 1. The associativity does not have to be a power of 2.
LineSize, bits [2:0]
  (Log2(Number of words in cache line)) - 2. For example:
  • For a line length of 4 words: Log2(4) = 2, LineSize entry = 0. This is the minimum line length.
  • For a line length of 8 words: Log2(8) = 3, LineSize entry = 1.

Accessing the currently selected CCSIDR
The CSSELR selects a CCSIDR, see c0, Cache Size Selection Register (CSSELR) on page B3-95. To access the currently-selected CCSIDR you read the CP15 registers with <opc1> set to 1, <CRn> set to c0, <CRm> set to c0, and <opc2> set to 0. For example:

MRC p15,1,<Rd>,c0,c0,0      ; Read current CP15 Cache Size ID Register

Accessing the CCSIDR when the value in CSSELR corresponds to a cache that is not implemented returns an UNKNOWN value.

B3.12.13 c0, Cache Level ID Register (CLIDR)
The Cache Level ID Register, CLIDR:
• identifies the type of cache, or caches, implemented at each level, up to a maximum of seven levels
• identifies the Level of Coherency and Level of Unification for the cache hierarchy.
The CLIDR is:
• a 32-bit read-only register
• accessible only in privileged modes
• when the Security Extensions are implemented, a Common register
• introduced in ARMv7.

The format of the CLIDR is:

Bits      Field
[31:30]   RAZ
[29:27]   LoUU
[26:24]   LoC
[23:21]   LoUIS
[20:18]   Ctype7
[17:15]   Ctype6
[14:12]   Ctype5
[11:9]    Ctype4
[8:6]     Ctype3
[5:3]     Ctype2
[2:0]     Ctype1

Bits [31:30]
  RAZ.
LoUU, bits [29:27]
  Level of Unification Uniprocessor for the cache hierarchy, see Clean, Invalidate, and Clean and Invalidate on page B2-11.
LoC, bits [26:24]
  Level of Coherency for the cache hierarchy, see Clean, Invalidate, and Clean and Invalidate on page B2-11.
LoUIS, bits [23:21]
  Level of Unification Inner Shareable for the cache hierarchy, see Clean, Invalidate, and Clean and Invalidate on page B2-11. This field is RAZ in implementations that do not implement the Multiprocessing Extensions.
CtypeX, bits [3(x - 1) + 2:3(x - 1)], for x = 1 to 7
  Cache Type fields. Indicate the type of cache implemented at each level, from Level 1 up to a maximum of seven levels of cache hierarchy. The Level 1 cache field, Ctype1, is bits [2:0], see register diagram. Table B3-26 shows the possible values for each CtypeX field.

Table B3-26 Ctype bit values

CtypeX value   Meaning, cache implemented at this level
000            No cache
001            Instruction cache only
010            Data cache only
011            Separate instruction and data caches
100            Unified cache
101, 11X       Reserved

If you read the Cache Type fields from Ctype1 upwards, once you have seen a value of 0b000, no caches exist at further out levels of the hierarchy. So, for example, if Ctype3 is the first Cache Type field with a value of 0b000, the values of Ctype4 to Ctype7 must be ignored.

The CLIDR describes only the caches that are under the control of the processor.

Accessing the CLIDR
To access the CLIDR you read the CP15 registers with <opc1> set to 1, <CRn> set to c0, <CRm> set to c0, and <opc2> set to 1. For example:

MRC p15,1,<Rd>,c0,c0,1      ; Read CP15 Cache Level ID Register

B3.12.14 c0, IMPLEMENTATION DEFINED Auxiliary ID Register (AIDR)
The IMPLEMENTATION DEFINED Auxiliary ID Register, AIDR, provides implementation-specific ID information. The value of this register must be used in conjunction with the value of MIDR.
The IMPLEMENTATION DEFINED AIDR is:
• a 32-bit read-only register
• accessible only in privileged modes
• when the Security Extensions are implemented, a Common register
• introduced in ARMv7.
The format of the AIDR is IMPLEMENTATION DEFINED.

Accessing the AIDR
To access the AIDR you read the CP15 registers with <opc1> set to 1, <CRn> set to c0, <CRm> set to c0, and <opc2> set to 7. For example:

MRC p15,1,<Rd>,c0,c0,7      ; Read IMPLEMENTATION DEFINED Auxiliary ID Register

B3.12.15 c0, Cache Size Selection Register (CSSELR)
The Cache Size Selection Register, CSSELR, selects the current CCSIDR. An ARMv7 implementation must include a CCSIDR for every implemented cache that is under the control of the processor. The CSSELR identifies which CCSIDR can be accessed, by specifying, for the required cache:
• the cache level
• the cache type, either:
  — instruction cache
  — data cache. The data cache argument is also used for a unified cache.
The CSSELR is:
• a 32-bit read/write register
• accessible only in privileged modes
• when the Security Extensions are implemented, a Banked register
• introduced in ARMv7.

The format of the CSSELR is:

Bits      Field
[31:4]    UNK/SBZP
[3:1]     Level
[0]       InD

Bits [31:4]
  UNK/SBZP.
Level, bits [3:1]
  Cache level of required cache. Permitted values are from 0b000, indicating Level 1 cache, to 0b110, indicating Level 7 cache.
InD, bit [0]
  Instruction not Data bit. Permitted values are:
  0   Data or unified cache
  1   Instruction cache.

If CSSELR is set to indicate a cache that is not implemented, the result of reading CCSIDR is UNPREDICTABLE.

Accessing CSSELR
To access CSSELR you read or write the CP15 registers with <opc1> set to 2, <CRn> set to c0, <CRm> set to c0, and <opc2> set to 0. For example:

MRC p15,2,<Rd>,c0,c0,0      ; Read Cache Size Selection Register
MCR p15,2,<Rd>,c0,c0,0      ; Write Cache Size Selection Register
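Taken together, the CLIDR, CSSELR and CCSIDR descriptions are enough to enumerate the cache hierarchy. The following C sketch shows only the field decoding. It assumes the raw register values have already been obtained with the MRC and MCR instructions shown above, and all function and type names are hypothetical. The cache size is computed as sets × ways × line length, where the line length in bytes is 4 × 2^(LineSize + 2).

```c
#include <assert.h>
#include <stdint.h>

/* Ctype values from Table B3-26. */
enum ctype {
    CTYPE_NONE     = 0x0,
    CTYPE_INSTR    = 0x1,
    CTYPE_DATA     = 0x2,
    CTYPE_SEPARATE = 0x3,
    CTYPE_UNIFIED  = 0x4
};

/* CLIDR.CtypeX for cache level x (1 to 7), bits [3(x-1)+2:3(x-1)]. */
static unsigned clidr_ctype(uint32_t clidr, unsigned level)
{
    assert(level >= 1u && level <= 7u);
    return (unsigned)((clidr >> (3u * (level - 1u))) & 0x7u);
}

/* CLIDR.LoC, bits [26:24]. */
static unsigned clidr_loc(uint32_t clidr)
{
    return (unsigned)((clidr >> 24) & 0x7u);
}

/* Number of cache levels: Ctype fields after the first 0b000 are ignored. */
static unsigned clidr_num_levels(uint32_t clidr)
{
    unsigned level = 1u;
    while (level <= 7u && clidr_ctype(clidr, level) != CTYPE_NONE)
        level++;
    return level - 1u;
}

/* CSSELR value: Level field [3:1] holds (level - 1), InD bit [0] is 1 for an instruction cache. */
static uint32_t csselr_value(unsigned level, int instruction_cache)
{
    assert(level >= 1u && level <= 7u);
    return ((uint32_t)(level - 1u) << 1) | (instruction_cache ? 1u : 0u);
}

struct cache_geometry {
    uint32_t sets;       /* NumSets + 1 */
    uint32_t ways;       /* Associativity + 1 */
    uint32_t line_bytes; /* 4 bytes per word, 2^(LineSize + 2) words */
};

/* Decode a CCSIDR value read after selecting a cache through CSSELR. */
static struct cache_geometry ccsidr_decode(uint32_t ccsidr)
{
    struct cache_geometry g;
    g.sets = ((ccsidr >> 13) & 0x7FFFu) + 1u;
    g.ways = ((ccsidr >> 3) & 0x3FFu) + 1u;
    g.line_bytes = 4u << ((ccsidr & 0x7u) + 2u);
    return g;
}

/* Total size in bytes; 64-bit so that large geometries cannot overflow. */
static uint64_t ccsidr_size_bytes(uint32_t ccsidr)
{
    struct cache_geometry g = ccsidr_decode(ccsidr);
    return (uint64_t)g.sets * g.ways * g.line_bytes;
}
```

A walk would, for each level from 1 to clidr_num_levels(), write csselr_value(level, 0) to CSSELR (and csselr_value(level, 1) as well when the Ctype field is 0b011), then read and decode the CCSIDR. For example, a CCSIDR value of 0x001FE019 (NumSets 255, Associativity 3, LineSize 1) describes a 32KB, 4-way cache with 32-byte lines.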
B3.12.16 CP15 c1, System control registers
The CP15 c1 registers are used for system control. Figure B3-12 shows the CP15 c1 registers.

Figure B3-12 CP15 c1 registers in a VMSA implementation

CRn   opc1   CRm   opc2   Register
c1    0      c0    0      SCTLR, Control Register
                   1      ACTLR, Auxiliary Control Register, IMPLEMENTATION DEFINED
                   2      CPACR, Coprocessor Access Control Register
             c1    0      SCR, Secure Configuration Register *
                   1      SDER, Secure Debug Enable Register *
                   2      NSACR, Non-secure Access Control Register *

* Only present if the Security Extensions are implemented.

CP15 c1 register encodings not shown in Figure B3-12 are UNPREDICTABLE. When the Security Extensions are not implemented, all encodings with CRm == c1 are UNPREDICTABLE. For more information, see Unallocated CP15 encodings on page B3-69.

The following sections describe the CP15 c1 registers:
• c1, System Control Register (SCTLR)
• c1, IMPLEMENTATION DEFINED Auxiliary Control Register (ACTLR) on page B3-103
• c1, Coprocessor Access Control Register (CPACR) on page B3-104
• c1, Secure Configuration Register (SCR) on page B3-106
• c1, Secure Debug Enable Register (SDER) on page B3-108
• c1, Non-Secure Access Control Register (NSACR) on page B3-110.

B3.12.17 c1, System Control Register (SCTLR)
The System Control Register, SCTLR, provides the top level control of the system, including its memory system.
The SCTLR:
• Is a 32-bit read/write register, with different access rights for some bits of the register. In ARMv7, some bits in the register are read-only. These bits relate to non-configurable features of an ARMv7 implementation, and are provided for compatibility with previous versions of the architecture.
• Is accessible only in privileged modes.
• Has a defined reset value. The reset value is IMPLEMENTATION DEFINED, see Reset value of the SCTLR on page B3-102.
  When the Security Extensions are implemented, the defined reset value applies only to the Secure copy of the SCTLR, and software must program the non-banked read/write bits of the Non-secure copy of the register with the required values.
• When the Security Extensions are implemented:
  — is a Banked register, with some bits common to the Secure and Non-secure copies of the register
  — has write access to the Secure copy of the register disabled when the CP15SDISABLE signal is asserted HIGH.
  For more information, see Effect of the Security Extensions on the CP15 registers on page B3-71.

Control bits in the SCTLR that are not applicable to a VMSA implementation read as the value that most closely reflects that implementation, and ignore writes.

In an ARMv7-A implementation the format of the SCTLR is:

Bits      Field
[31]      UNK/SBZP (0)
[30]      TE
[29]      AFE
[28]      TRE
[27]      NMFI
[26]      RAZ/SBZP (0)
[25]      EE
[24]      VE
[23]      RAO/SBOP (1)
[22]      U
[21]      FI
[20:19]   RAZ/SBZP (0)
[18]      RAO/SBOP (1)
[17]      HA
[16]      RAO/SBOP (1)
[15]      RAZ/SBZP (0)
[14]      RR
[13]      V
[12]      I
[11]      Z
[10]      SW
[9:8]     RAZ/SBZP (0)
[7]       B
[6:3]     RAO/SBOP (1)
[2]       C
[1]       A
[0]       M

Bit [31]
  UNK/SBZP.
TE, bit [30]
  Thumb Exception enable. This bit controls whether exceptions are taken in ARM or Thumb state:
  0   Exceptions, including reset, handled in ARM state
  1   Exceptions, including reset, handled in Thumb state.
  When the Security Extensions are implemented, this bit is banked between the Secure and Non-secure versions of the register.
  An implementation can include a configuration input signal that determines the reset value of the TE bit. If there is no configuration input signal to determine the reset value of this bit, then it resets to 0 in an ARMv7-A implementation.
  For more information about the use of this bit, see Instruction set state on exception entry on page B1-35.
AFE, bit [29]
  Access Flag Enable bit. This bit enables use of the AP[0] bit in the translation table descriptors as an access flag.
  It also restricts access permissions in the translation table descriptors to the simplified model described in Simplified access permissions model on page B3-29. The possible values of this bit are:
  0   In the translation table descriptors, AP[0] is an access permissions bit. The full range of access permissions is supported. No access flag is implemented.
  1   In the translation table descriptors, AP[0] is an access flag. Only the simplified model for access permissions is supported.
  When the Security Extensions are implemented, this bit is banked between the Secure and Non-secure versions of the register.
TRE, bit [28]
  TEX Remap Enable bit. This bit enables remapping of the TEX[2:1] bits for use as two translation table bits that can be managed by the operating system. Enabling this remapping also changes the scheme used to describe the memory region attributes in the VMSA. The possible values of this bit are:
  0   TEX Remap disabled. TEX[2:0] are used, with the C and B bits, to describe the memory region attributes.
  1   TEX Remap enabled. TEX[2:1] are reassigned for use as flags managed by the operating system. The TEX[0], C and B bits are used to describe the memory region attributes, with the MMU remap registers.
  When the Security Extensions are implemented, this bit is banked between the Secure and Non-secure versions of the register.
  For more information, see The alternative descriptions of the Memory region attributes on page B3-32.
NMFI, bit [27]
  Non-maskable Fast Interrupts enable:
  0   Fast interrupts (FIQs) can be masked in the CPSR
  1   Fast interrupts are non-maskable.
  When the Security Extensions are implemented, this bit is common to the Secure and Non-secure versions of the register.
  This bit is read-only.
  It is IMPLEMENTATION DEFINED whether an implementation supports Non-Maskable Fast Interrupts (NMFIs):
  • If NMFIs are not supported, this bit must be RAZ.
  • If NMFIs are supported, this bit is controlled by a configuration input signal.
  For more information, see Non-maskable fast interrupts on page B1-18.
Bit [26]
  RAZ/SBZP.
EE, bit [25]
  Exception Endianness bit. The value of this bit defines the value of the CPSR.E bit on entry to an exception vector, including reset. This value also indicates the endianness of the translation table data for translation table lookups. The permitted values of this bit are:
  0   Little endian
  1   Big endian.
  When the Security Extensions are implemented, this bit is banked between the Secure and Non-secure versions of the register.
  This is a read/write bit. An implementation can include a configuration input signal that determines the reset value of the EE bit. If there is no configuration input signal to determine the reset value of this bit, then it resets to 0.
VE, bit [24]
  Interrupt Vectors Enable bit. This bit controls the vectors used for the FIQ and IRQ interrupts. The permitted values of this bit are:
  0   Use the FIQ and IRQ vectors from the vector table, see the V bit entry
  1   Use the IMPLEMENTATION DEFINED values for the FIQ and IRQ vectors.
  When the Security Extensions are implemented, this bit is banked between the Secure and Non-secure versions of the register.
  For more information, see Vectored interrupt support on page B1-32.
  If the implementation does not support IMPLEMENTATION DEFINED FIQ and IRQ vectors, then this bit is RAZ/WI.
Bit [23]
  RAO/SBOP.
U, bit [22]
  In ARMv7 this bit is RAO/SBOP, indicating use of the alignment model described in Alignment support on page A3-4. For details of this bit in earlier versions of the architecture see Alignment on page AppxG-6.
FI, bit [21]
  Fast Interrupts configuration enable bit. This bit can be used to reduce interrupt latency in an implementation by disabling IMPLEMENTATION DEFINED performance features. The permitted values of this bit are:
  0   All performance features enabled.
  1   Low interrupt latency configuration. Some performance features disabled.
  When the Security Extensions are implemented, this bit is common to the Secure and Non-secure versions of the register.
  This bit is:
  • a read/write bit if the Security Extensions are not implemented
  • if the Security Extensions are implemented:
    — a read/write bit if the processor is in Secure state
    — a read-only bit if the processor is in Non-secure state.
  For more information, see Low interrupt latency configuration on page B1-43.
  If the implementation does not support a mechanism for selecting a low interrupt latency configuration, this bit is RAZ/WI.
Bits [20:19]
  RAZ/SBZP.
Bit [18]
  RAO/SBOP.
HA, bit [17]
  Hardware Access Flag Enable bit. If the implementation provides hardware management of the access flag, this bit enables the access flag management:
  0   Hardware management of access flag disabled
  1   Hardware management of access flag enabled.
  If the Security Extensions are implemented, this bit is banked between the Secure and Non-secure versions of the register.
  If the implementation does not provide hardware management of the access flag, this bit is RAZ/WI.
  For more information, see Hardware management of the access flag on page B3-21.
Bit [16]
  RAO/SBOP.
Bit [15]
  RAZ/SBZP.
RR, bit [14]
  Round Robin bit. If the cache implementation supports the use of an alternative replacement strategy that has a more easily predictable worst-case performance, this bit selects it:
  0   Normal replacement strategy, for example, random replacement
  1   Predictable strategy, for example, round-robin replacement.
  When the Security Extensions are implemented, this bit is common to the Secure and Non-secure versions of the register.
  This bit is:
  • a read/write bit if the Security Extensions are not implemented
  • if the Security Extensions are implemented:
    — a read/write bit if the processor is in Secure state
    — a read-only bit if the processor is in Non-secure state.
  The replacement strategy associated with each value of the RR bit is IMPLEMENTATION DEFINED.
  If the implementation does not support multiple IMPLEMENTATION DEFINED replacement strategies, this bit is RAZ/WI.
V, bit [13]
  Vectors bit. This bit selects the base address of the exception vectors:
  0   Normal exception vectors, base address 0x00000000. When the Security Extensions are implemented this base address can be re-mapped.
  1   High exception vectors (Hivecs), base address 0xFFFF0000. This base address is never remapped.
  When the Security Extensions are implemented, this bit is banked between the Secure and Non-secure versions of the register.
  An implementation can include a configuration input signal that determines the reset value of the V bit. If there is no configuration input signal to determine the reset value of this bit, then it resets to 0.
  For more information, see Exception vectors and the exception base address on page B1-30.
I, bit [12]
  Instruction cache enable bit. This is a global enable bit for instruction caches:
  0   Instruction caches disabled
  1   Instruction caches enabled.
  When the Security Extensions are implemented, this bit is banked between the Secure and Non-secure versions of the register.
  If the system does not implement any instruction caches that can be accessed by the processor, at any level of the memory hierarchy, this bit is RAZ/WI.
  If the system implements any instruction caches that can be accessed by the processor, then it must be possible to disable them by setting this bit to 0.
  Cache enabling and disabling on page B2-8 describes the effect of enabling the caches.
Z, bit [11]
  Branch prediction enable bit. This bit is used to enable branch prediction, also called program flow prediction:
  0   Program flow prediction disabled
  1   Program flow prediction enabled.
  When the Security Extensions are implemented, this bit is banked between the Secure and Non-secure versions of the register.
  If program flow prediction cannot be disabled, this bit is RAO/WI.
  Program flow prediction includes all possible forms of speculative change of instruction stream prediction. Examples include static prediction, dynamic prediction, and return stacks.
  If the implementation does not support program flow prediction, this bit is RAZ/WI.
SW, bit [10]
  SWP/SWPB Enable bit. This bit enables the use of SWP and SWPB instructions:
  0   SWP and SWPB are UNDEFINED
  1   SWP and SWPB perform as described in SWP, SWPB on page A8-432.
  When the Security Extensions are implemented, this bit is banked between the Secure and Non-secure versions of the register. The bit is reset to 0.
  This bit is part of the Multiprocessing Extensions. In implementations that do not implement the Multiprocessing Extensions, this bit is RAZ and SWP and SWPB instructions perform as described in SWP, SWPB on page A8-432.
  Note
  At reset, this bit disables SWP and SWPB. This means that an operating system must explicitly choose to enable SWP and SWPB if it wants to use them.
Bits [9:8]
  RAZ/SBZP.
B, bit [7]
  In ARMv7 this bit is RAZ/SBZP, indicating use of the endianness model described in Endian support on page A3-7. For details of this bit in earlier versions of the architecture see Endian support on page AppxG-7 and Endian support on page AppxH-7.
Bits [6:3]
  RAO/SBOP.
C, bit [2]
  Cache enable bit. This is a global enable bit for data and unified caches:
  0   Data and unified caches disabled
  1   Data and unified caches enabled.
  When the Security Extensions are implemented, this bit is banked between the Secure and Non-secure versions of the register.
  If the system does not implement any data or unified caches that can be accessed by the processor, at any level of the memory hierarchy, this bit is RAZ/WI.
  If the system implements any data or unified caches that can be accessed by the processor, then it must be possible to disable them by setting this bit to 0.
  Cache enabling and disabling on page B2-8 describes the effect of enabling the caches.
A, bit [1]
  Alignment bit. This is the enable bit for Alignment fault checking:
  0   Alignment fault checking disabled
  1   Alignment fault checking enabled.
  When the Security Extensions are implemented, this bit is banked between the Secure and Non-secure versions of the register.
  For more information, see Alignment fault on page B3-42, for a VMSA implementation.
M, bit [0]
  MMU enable bit. This is a global enable bit for the MMU:
  0   MMU disabled
  1   MMU enabled.
  When the Security Extensions are implemented, this bit is banked between the Secure and Non-secure versions of the register.
  For more information, see Enabling and disabling the MMU on page B3-5.

Reset value of the SCTLR
The SCTLR has a defined reset value that is IMPLEMENTATION DEFINED. There are different types of bit in the SCTLR:
• Some bits are defined as RAZ or RAO, and have the same value in all VMSAv7 implementations. Figure B3-13 on page B3-103 shows the values of these bits.
• Some bits are read-only and either:
  — have an IMPLEMENTATION DEFINED value
  — have a value that is determined by a configuration input signal.
• Some bits are read/write and either:
  — reset to zero
  — reset to an IMPLEMENTATION DEFINED value
  — reset to a value that is determined by a configuration input signal.
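Because some SCTLR bits have the same fixed value in every VMSAv7 implementation, software can sanity-check a value read from the register and update it without touching unrelated bits. This C sketch collects the RAZ and RAO bit positions from the field descriptions above; bit 31 is UNK/SBZP, so it is deliberately excluded from the check. The macro and function names are hypothetical, and the register access itself, which uses the MRC and MCR encodings given under Accessing the SCTLR, is not shown. With every configurable bit at 0, the SCTLR value reduces to the RAO mask, 0x00C50078.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Enable bits, from the field descriptions above. */
#define SCTLR_M (1u << 0)   /* MMU enable */
#define SCTLR_A (1u << 1)   /* Alignment fault checking */
#define SCTLR_C (1u << 2)   /* Data and unified caches */
#define SCTLR_Z (1u << 11)  /* Program flow prediction */
#define SCTLR_I (1u << 12)  /* Instruction caches */

/* RAO/SBOP bits: 23, 22 (U), 18, 16 and 6:3. */
#define SCTLR_RAO_MASK ((1u << 23) | (1u << 22) | (1u << 18) | (1u << 16) | (0xFu << 3))
/* RAZ/SBZP bits: 26, 20:19, 15, 9:8 and 7 (B). Bit 31 is UNK/SBZP and is not checked. */
#define SCTLR_RAZ_MASK ((1u << 26) | (3u << 19) | (1u << 15) | (3u << 8) | (1u << 7))

/* True if a value read from the SCTLR has the expected fixed RAZ and RAO bits. */
static bool sctlr_fixed_bits_ok(uint32_t value)
{
    return (value & SCTLR_RAO_MASK) == SCTLR_RAO_MASK &&
           (value & SCTLR_RAZ_MASK) == 0u;
}

/* Read-modify-write step: change only the requested bits, preserving all others. */
static uint32_t sctlr_update(uint32_t old, uint32_t set, uint32_t clear)
{
    assert((set & clear) == 0u);  /* a bit cannot be both set and cleared */
    return (old & ~clear) | set;
}
```

For example, sctlr_update(old, SCTLR_M | SCTLR_C | SCTLR_I, 0) turns on the MMU and both cache enables while leaving every other bit, including any bits that are currently unallocated, exactly as read.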
Figure B3-13 on page B3-103 shows the reset value, or how the reset value is defined, for each bit of the SCTLR. It also shows the possible values of each half byte of the register.

Figure B3-13 Reset value of the SCTLR, ARMv7-A (VMSAv7)

Bits      Reset value of each bit, from high to low            Possible half-byte values
[31:28]   bit 31 = 0, TE = ‡, AFE = 0, TRE = 0                 0x4 or 0x0
[27:24]   NMFI = ‡, bit 26 = 0, EE = ‡, VE = 0                 0xA, 0x8, 0x2 or 0x0
[23:20]   bit 23 = 1, U = 1, FI = 0, bit 20 = 0                0xC
[19:16]   bit 19 = 0, bit 18 = 1, HA = 0, bit 16 = 1           0x5
[15:12]   bit 15 = 0, RR = 0, V = ‡, I = 0                     0x2 or 0x0
[11:8]    Z = 0 (1 if RAO), SW = 0, bit 9 = 0, bit 8 = 0       0x8 or 0x0
[7:4]     B = 0, bit 6 = 1, bit 5 = 1, bit 4 = 1               0x7
[3:0]     bit 3 = 1, C = 0, A = 0, M = 0                       0x8

‡ Value or reset value can depend on a configuration input. Otherwise RAZ or resets to 0.

Accessing the SCTLR
To access the SCTLR you read or write the CP15 registers with <opc1> set to 0, <CRn> set to c1, <CRm> set to c0, and <opc2> set to 0. For example:

MRC p15,0,<Rd>,c1,c0,0      ; Read CP15 System Control Register
MCR p15,0,<Rd>,c1,c0,0      ; Write CP15 System Control Register

Note
Additional configuration and control bits might be added to the SCTLR in future versions of the ARM architecture. ARM strongly recommends that software always uses a read, modify, write sequence to update the SCTLR. This prevents software modifying any bit that is currently unallocated, and minimizes the chance of the register update having undesired side effects.

B3.12.18 c1, IMPLEMENTATION DEFINED Auxiliary Control Register (ACTLR)
The Auxiliary Control Register, ACTLR, provides implementation-specific configuration and control options.
The ACTLR is:
• A 32-bit read/write register.
• Accessible only in privileged modes.
• When the Security Extensions are implemented, a Banked register.
  However, some bits might define global configuration settings, and be common to the Secure and Non-secure copies of the register.

The contents of this register are IMPLEMENTATION DEFINED. ARMv7 requires this register to be privileged read/write accessible, even if an implementation has not created any control bits in this register.

Accessing the ACTLR
To access the ACTLR you read or write the CP15 registers with <opc1> set to 0, <CRn> set to c1, <CRm> set to c0, and <opc2> set to 1. For example:

MRC p15,0,<Rd>,c1,c0,1      ; Read CP15 Auxiliary Control Register
MCR p15,0,<Rd>,c1,c0,1      ; Write CP15 Auxiliary Control Register

B3.12.19 c1, Coprocessor Access Control Register (CPACR)
The Coprocessor Access Control Register, CPACR, controls access to all coprocessors other than CP14 and CP15. It also enables software to check for the presence of coprocessors CP0 to CP13.
The CPACR:
• is a 32-bit read/write register
• is accessible only in privileged modes
• has a defined reset value of 0
• when the Security Extensions are implemented, is a Configurable access register.

The format of the CPACR is:

Bits      Field
[31]      ASEDIS
[30]      D32DIS
[29:28]   Reserved (0)
[27:26]   cp13
[25:24]   cp12
[23:22]   cp11
[21:20]   cp10
[19:18]   cp9
[17:16]   cp8
[15:14]   cp7
[13:12]   cp6
[11:10]   cp5
[9:8]     cp4
[7:6]     cp3
[5:4]     cp2
[3:2]     cp1
[1:0]     cp0

ASEDIS, bit [31]
  Disable Advanced SIMD Functionality:
  0   This bit does not cause any instructions to be UNDEFINED.
  1   All instruction encodings identified in the Alphabetical list of instructions on page A8-14 as being part of Advanced SIMD, but that are not VFPv3 instructions, are UNDEFINED.
  On an implementation that:
  • Implements VFP and does not implement Advanced SIMD, this bit is RAO/WI.
  • Does not implement VFP or Advanced SIMD, this bit is UNK/SBZP.
  • Implements both VFP and Advanced SIMD, it is IMPLEMENTATION DEFINED whether this bit is supported. If it is not supported it is RAZ/WI.
  This bit resets to 0 if it is supported.
D32DIS, bit [30]
  Disable use of D16-D31 of the VFP register file:
  0   This bit does not cause any instructions to be UNDEFINED.
  1   All instruction encodings identified in the Alphabetical list of instructions on page A8-14 as being VFPv3 instructions are UNDEFINED if they access any of registers D16-D31.
  If this bit is 1 when CPACR.ASEDIS == 0, the result is UNPREDICTABLE.
  On an implementation that:
  • Does not implement VFP, this bit is UNK/SBZP.
  • Implements VFP and does not implement D16-D31, this bit is RAO/WI.
  • Implements VFP and implements D16-D31, it is IMPLEMENTATION DEFINED whether this bit is supported. If it is not, then this bit is RAZ/WI.
  This bit resets to 0 if it is supported.
Bits [29:28]
  Reserved. UNK/SBZP.
cp<n>, bits [2n+1:2n], for n = 0 to 13
  Defines the access rights for coprocessor n. The possible values of the field are:
  0b00   Access denied. Any attempt to access the coprocessor generates an Undefined Instruction exception.
  0b01   Privileged access only. Any attempt to access the coprocessor in User mode generates an Undefined Instruction exception.
  0b10   Reserved. The effect of this value is UNPREDICTABLE.
  0b11   Full access. The meaning of full access is defined by the appropriate coprocessor.
  The value for a coprocessor that is not implemented is 0b00, access denied.

When the Security Extensions are implemented, the NSACR controls whether each coprocessor can be accessed from the Non-secure state, see c1, Non-Secure Access Control Register (NSACR) on page B3-110. When the NSACR permits Non-secure access to a coprocessor, the level of access permitted is determined by the CPACR. Because the CPACR is not banked, the options for Non-secure state access to a coprocessor are:
• no access
• identical access rights to the Secure state.
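The cp<n> fields are 2-bit values at bits [2n+1:2n], so updates are naturally done as read-modify-write operations on a single field. The following C sketch, with hypothetical names, operates on a CPACR value that software has read with MRC and will write back with MCR. Because VFP uses both CP10 and CP11, the sketch provides a helper that always gives the two coprocessors the same access rights. The reserved encoding 0b10 is deliberately not offered.

```c
#include <assert.h>
#include <stdint.h>

/* Values of a CPACR cp<n> field. 0b10 is reserved (UNPREDICTABLE) and is omitted. */
enum cpacr_access {
    CPACR_ACCESS_DENIED     = 0x0, /* 0b00 */
    CPACR_ACCESS_PRIVILEGED = 0x1, /* 0b01 */
    CPACR_ACCESS_FULL       = 0x3  /* 0b11 */
};

#define CPACR_ASEDIS (1u << 31)
#define CPACR_D32DIS (1u << 30)

/* Return the 2-bit access field for coprocessor n (0 to 13), bits [2n+1:2n]. */
static unsigned cpacr_get(uint32_t cpacr, unsigned n)
{
    assert(n <= 13u);
    return (unsigned)((cpacr >> (2u * n)) & 0x3u);
}

/* Read-modify-write: change only the field for coprocessor n. */
static uint32_t cpacr_set(uint32_t cpacr, unsigned n, enum cpacr_access access)
{
    assert(n <= 13u);
    uint32_t shift = 2u * n;
    return (cpacr & ~(0x3u << shift)) | ((uint32_t)access << shift);
}

/* VFP uses CP10 and CP11; give both the same access rights in one update. */
static uint32_t cpacr_set_vfp(uint32_t cpacr, enum cpacr_access access)
{
    return cpacr_set(cpacr_set(cpacr, 10u, access), 11u, access);
}
```

For example, granting full VFP access starting from the reset value of 0 sets bits [23:20], giving 0x00F00000, and leaves every other coprocessor field at 0b00.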
If more than one coprocessor is used to provide a set of functionality, having different values for the CPACR fields for those coprocessors can lead to UNPREDICTABLE behavior. An example where this must be considered is the VFP extension, which uses CP10 and CP11.

Typically, an operating system uses this register to control coprocessor resource sharing among applications:
• Initially all applications are denied access to the shared coprocessor-based resources.
• When an application attempts to use a resource, it causes an Undefined Instruction exception.
• The Undefined Instruction handler can then grant access to the resource by setting the appropriate field in the CPACR.

For details of how this register can be used to check for implemented coprocessors see Access controls on CP0 to CP13 on page B1-63.

Sharing resources among applications requires a state saving mechanism. Two possibilities are:
• during a context switch, if the last executing process or thread had access rights to a coprocessor, the operating system saves the state of that coprocessor
• on receiving a request for access to a coprocessor, the operating system saves the old state for that coprocessor with the last process or thread that accessed it.

Accessing the CPACR
To access the CPACR you read or write the CP15 registers with <opc1> set to 0, <CRn> set to c1, <CRm> set to c0, and <opc2> set to 2. For example:

MRC p15,0,<Rd>,c1,c0,2      ; Read CP15 Coprocessor Access Control Register
MCR p15,0,<Rd>,c1,c0,2      ; Write CP15 Coprocessor Access Control Register

Normally, software uses a read, modify, write sequence to update the CPACR, to avoid unwanted changes to the access settings for other coprocessors.

B3.12.20 c1, Secure Configuration Register (SCR)
The Secure Configuration Register, SCR, is part of the Security Extensions.
The SCR defines the configuration of the current security state. It specifies:
• the security state of the processor, Secure or Non-secure
• what mode the processor branches to if an IRQ, FIQ or external abort occurs
• whether the CPSR.F and CPSR.A bits can be modified when SCR.NS = 1.
The SCR:
• is present only when the Security Extensions are implemented
• is a 32-bit read/write register
• is accessible in Secure privileged modes only
• has a defined reset value of 0
• is a Restricted access register, meaning it exists only in the Secure state.

The format of the SCR is:

Bits      Field
[31:7]    UNK/SBZP
[6]       nET
[5]       AW
[4]       FW
[3]       EA
[2]       FIQ
[1]       IRQ
[0]       NS

Bits [31:7]
  Reserved. UNK/SBZP.
nET, bit [6]
  Not Early Termination. This bit disables early termination:
  0   Early termination permitted. Execution time of data operations can depend on the data values.
  1   Disable early termination. The number of cycles required for data operations is forced to be independent of the data values.
  This IMPLEMENTATION DEFINED mechanism can be used to disable data dependent timing optimizations from multiplies and data operations. It can provide system support against information leakage that might be exploited by timing correlation types of attack.
  On implementations that do not have early termination, this bit is UNK/SBZP.
AW, bit [5]
  A bit writable. This bit controls whether the A bit in the CPSR can be modified in Non-secure state:
  0   The CPSR.A bit can be modified only in Secure state.
  1   The CPSR.A bit can be modified in any security state.
  For more information, see Control of aborts by the Security Extensions on page B1-41.
FW, bit [4]
  F bit writable. This bit controls whether the F bit in the CPSR can be modified in Non-secure state:
  0   The CPSR.F bit can be modified only in Secure state
  1   The CPSR.F bit can be modified in any security state.
  For more information, see Control of FIQs by the Security Extensions on page B1-42.
EA, bit [3]
  External Abort handler. This bit controls which mode handles external aborts:
  0   Abort mode handles external aborts
  1   Monitor mode handles external aborts.
  For more information, see Control of aborts by the Security Extensions on page B1-41.
FIQ, bit [2]
  FIQ handler. This bit controls which mode the processor enters when a Fast Interrupt (FIQ) is taken:
  0   FIQ mode entered when FIQ is taken
  1   Monitor mode entered when FIQ is taken.
  For more information, see Control of FIQs by the Security Extensions on page B1-42.
IRQ, bit [1]
  IRQ handler. This bit controls which mode the processor enters when an Interrupt (IRQ) is taken:
  0   IRQ mode entered when IRQ is taken
  1   Monitor mode entered when IRQ is taken.
NS, bit [0]
  Non Secure bit. Except when the processor is in Monitor mode, this bit determines the security state of the processor. Table B3-27 shows the security settings:

  Table B3-27 Processor security state

  SCR.NS   Processor mode, from CPSR.M bits
           Monitor mode     All modes except Monitor mode
  0        Secure state     Secure state
  1        Secure state     Non-secure state

  For more information, see Changing from Secure to Non-secure state on page B1-27.
  The value of the NS bit also affects the accessibility of the Banked CP15 registers in Monitor mode, see Access to registers in Monitor mode on page B3-77.
  Unless the processor is in Debug state, when an exception occurs in Monitor mode the hardware sets the NS bit to 0.

Whenever the processor changes security state, the monitor code can change the value of the EA, FIQ and IRQ bits. This means that the behavior of IRQ, FIQ and External Abort exceptions can be different in each security state.

Accessing the SCR
To access the SCR you read or write the CP15 registers with <opc1> set to 0, <CRn> set to c1, <CRm> set to c1, and <opc2> set to 0.
For example:
MRC p15, 0, r0, c1, c1, 0 ; Read CP15 Secure Configuration Register into r0
MCR p15, 0, r0, c1, c1, 0 ; Write r0 to CP15 Secure Configuration Register
B3.12.21 c1, Secure Debug Enable Register (SDER)
The Secure Debug Enable Register, SDER, is part of the Security Extensions. The SDER controls invasive and non-invasive debug in Secure User mode.
The SDER is:
• present only when the Security Extensions are implemented
• a 32-bit read/write register
• a Restricted access register, meaning it exists only in the Secure state
• accessible in Secure privileged modes only.
The format of the SDER is:
[31:2] UNK/SBZP, [1] SUNIDEN, [0] SUIDEN
Bits [31:2]
Reserved. UNK/SBZP.
SUNIDEN, bit [1]
Secure User Non-Invasive Debug ENable:
0 non-invasive debug not permitted in Secure User mode
1 non-invasive debug permitted in Secure User mode.
SUIDEN, bit [0]
Secure User Invasive Debug ENable:
0 invasive debug not permitted in Secure User mode
1 invasive debug permitted in Secure User mode.
For more information about the use of the SUNIDEN and SUIDEN bits see:
• Chapter C2 Invasive Debug Authentication
• Chapter C7 Non-invasive Debug Authentication.
Note
Invasive and non-invasive debug in Secure privileged modes is controlled by hardware only. For more information, see Chapter C2 Invasive Debug Authentication and Chapter C7 Non-invasive Debug Authentication.
Accessing the SDER
To access the SDER you read or write the CP15 registers with <opc1> set to 0, <CRn> set to c1, <CRm> set to c1, and <opc2> set to 1. For example:
MRC p15, 0, r0, c1, c1, 1 ; Read CP15 Secure Debug Enable Register into r0
MCR p15, 0, r0, c1, c1, 1 ; Write r0 to CP15 Secure Debug Enable Register
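The SCR and SDER fields described above are single bits at fixed positions, so Secure software typically composes a value with masks before writing it with MCR. The following C sketch is only an illustration: it names those bit positions, applies the Table B3-27 rule for the current security state, and builds one example SCR value. The macro and function names, the routing policy chosen in scr_for_nonsecure_world(), and the Monitor mode encoding (CPSR.M = 0b10110, taken from the CPSR description elsewhere in this manual) are assumptions, not architectural definitions.

```c
#include <stdbool.h>
#include <stdint.h>

/* SCR bit positions, from the SCR field descriptions. */
#define SCR_NS   (1u << 0)  /* Non-secure bit */
#define SCR_IRQ  (1u << 1)  /* 1: IRQs taken to Monitor mode */
#define SCR_FIQ  (1u << 2)  /* 1: FIQs taken to Monitor mode */
#define SCR_EA   (1u << 3)  /* 1: external aborts taken to Monitor mode */
#define SCR_FW   (1u << 4)  /* 1: CPSR.F writable in any security state */
#define SCR_AW   (1u << 5)  /* 1: CPSR.A writable in any security state */
#define SCR_nET  (1u << 6)  /* 1: disable early termination */

/* SDER bit positions. */
#define SDER_SUIDEN  (1u << 0)  /* Secure User invasive debug enable */
#define SDER_SUNIDEN (1u << 1)  /* Secure User non-invasive debug enable */

/* Assumed CPSR.M encoding of Monitor mode (0b10110). */
#define CPSR_MODE_MON 0x16u

/* Table B3-27: Monitor mode is always Secure; in every other mode
   SCR.NS selects the state. Only CPSR.M[4:0] of 'cpsr' is examined. */
static bool processor_is_secure(uint32_t cpsr, uint32_t scr)
{
    if ((cpsr & 0x1Fu) == CPSR_MODE_MON)
        return true;
    return (scr & SCR_NS) == 0;
}

/* Hypothetical policy: Non-secure world, FIQs and external aborts routed
   to Monitor mode, IRQs taken in IRQ mode. FW = AW = 0, so CPSR.F and
   CPSR.A can be modified only in Secure state. Bits [31:7] stay zero
   as SBZP requires. */
static uint32_t scr_for_nonsecure_world(void)
{
    return SCR_NS | SCR_FIQ | SCR_EA;
}
```

With this policy, an FIQ taken while SCR.NS = 1 enters Monitor mode, and the hardware then clears SCR.NS as described for exceptions taken to Monitor mode.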
B3.12.22 c1, Non-Secure Access Control Register (NSACR)
The Non-Secure Access Control Register, NSACR, is part of the Security Extensions. The NSACR defines the Non-secure access permissions to the coprocessors CP0 to CP13. Additional bits in the register can be used to define Non-secure access permissions for IMPLEMENTATION DEFINED functionality.
The NSACR is:
• present only when the Security Extensions are implemented
• a 32-bit register
• a Restricted access register. The NSACR exists only in the Secure state, but can be read from Non-secure state.
• accessible only in privileged modes, with access rights that depend on the mode and security state:
— the NSACR is read/write in Secure privileged modes
— the NSACR is read-only in Non-secure privileged modes.
The format of the NSACR is:
[31:20] UNK/SBZP, [19] RFR, [18:16] IMPLEMENTATION DEFINED, [15] NSASEDIS, [14] NSD32DIS, [13:0] cp13 to cp0 (Coprocessor Non-secure access enables, see text)
Bits [31:20]
Reserved. UNK/SBZP.
RFR, bit [19]
Reserve FIQ Registers:
0 FIQ mode and the FIQ Banked registers are accessible in both Secure and Non-secure security states.
1 FIQ mode and the FIQ Banked registers are accessible in the Secure security state only. Any attempt to access any FIQ Banked register or to enter FIQ mode when in Non-secure security state is UNPREDICTABLE.
This bit resets to 0. On some implementations this bit cannot be set to 1.
If NSACR.RFR == 1 when SCR.FIQ == 0, instruction execution is UNPREDICTABLE in Non-secure security state.
Bits [18:16]
IMPLEMENTATION DEFINED. These bits can be used to define the Non-secure access to IMPLEMENTATION DEFINED features.
NSASEDIS, bit [15]
Disable Non-secure Advanced SIMD functionality:
0 This bit has no effect on the ability to write CPACR.ASEDIS.
1 When executing in Non-secure state, the CPACR.ASEDIS bit has a fixed value of 1 and writes to it are ignored.
On an implementation that:
• implements VFP and does not implement Advanced SIMD, this bit is RAO/WI
• does not implement VFP or Advanced SIMD, this bit is UNK/SBZP
• implements both VFP and Advanced SIMD, it is IMPLEMENTATION DEFINED whether this bit is supported. If it is not supported it is RAZ/WI.
This bit resets to 0 if it is supported.
NSD32DIS, bit [14]
Disable Non-secure use of D16-D31 of the VFP register file:
0 This bit has no effect on the ability to write CPACR.D32DIS.
1 When executing in Non-secure state, the CPACR.D32DIS bit has a fixed value of 1 and writes to it are ignored.
If this bit is 1 when NSACR.NSASEDIS == 0, the result is UNPREDICTABLE.
On an implementation that:
• does not implement VFP, this bit is UNK/SBZP
• implements VFP and does not implement D16-D31, this bit is RAO/WI
• implements VFP and implements D16-D31, it is IMPLEMENTATION DEFINED whether this bit is supported. If it is not supported it is RAZ/WI.
This bit resets to 0 if it is supported.
cp<n>, bit [n], for n = 0 to 13
Non-secure access to coprocessor enable. Each bit enables access to the corresponding coprocessor from Non-secure state:
0 The coprocessor can be accessed only from Secure state. Any attempt to access the coprocessor in Non-secure state results in an Undefined Instruction exception. If the processor is in Non-secure state it cannot write the corresponding bits in the CPACR, and reads them as 0b00, access denied.
1 The coprocessor can be accessed from any security state.
If Non-secure access to a coprocessor is enabled, the CPACR must be checked to determine the level of access that is permitted, see c1, Coprocessor Access Control Register (CPACR) on page B3-104.
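Because the VFP extension is controlled by coprocessors 10 and 11, and because NSD32DIS interacts with NSASEDIS, Secure software may want to check a candidate NSACR value before writing it. The C sketch below is illustrative only: the macro and function names are hypothetical, and it checks just two rules from this description, NSD32DIS not being 1 while NSASEDIS is 0, and cp10 and cp11 having the same value (the requirement stated in the paragraph that follows).

```c
#include <stdbool.h>
#include <stdint.h>

#define NSACR_CP(n)     (1u << (n))  /* cp<n> enable, n = 0 to 13 */
#define NSACR_NSD32DIS  (1u << 14)
#define NSACR_NSASEDIS  (1u << 15)
#define NSACR_RFR       (1u << 19)

/* Hypothetical helper: grant Non-secure access to the VFP coprocessors.
   cp10 and cp11 control the same feature, so both are set together. */
static uint32_t nsacr_enable_vfp(uint32_t nsacr)
{
    return nsacr | NSACR_CP(10) | NSACR_CP(11);
}

/* Returns false if the value hits one of the UNPREDICTABLE
   combinations checked here. */
static bool nsacr_is_consistent(uint32_t nsacr)
{
    bool cp10 = (nsacr & NSACR_CP(10)) != 0;
    bool cp11 = (nsacr & NSACR_CP(11)) != 0;
    if (cp10 != cp11)
        return false;  /* cp10 and cp11 must match */
    if ((nsacr & NSACR_NSD32DIS) && !(nsacr & NSACR_NSASEDIS))
        return false;  /* NSD32DIS == 1 with NSASEDIS == 0 */
    return true;
}
```

The check is deliberately partial: for example, it does not test NSACR.RFR against SCR.FIQ, which would need the SCR value as a second input.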
If multiple coprocessors are used to control a feature then the Non-secure access enable bits for those coprocessors must be set to the same value, otherwise behavior is UNPREDICTABLE. For example, when the VFP extension is implemented it is controlled by coprocessors 10 and 11, and bits [10,11] of the NSACR must be set to the same value.
For bits that correspond to coprocessors that are not implemented, it is IMPLEMENTATION DEFINED whether the bits:
• behave as RAZ/WI
• can be written by Secure privileged modes.
Accessing the NSACR
To access the NSACR you read or write the CP15 registers with <opc1> set to 0, <CRn> set to c1, <CRm> set to c1, and <opc2> set to 2. For example:
MRC p15, 0, r0, c1, c1, 2 ; Read CP15 Non-Secure Access Control Register into r0
MCR p15, 0, r0, c1, c1, 2 ; Write r0 to CP15 Non-Secure Access Control Register
You can write to the NSACR only in Secure privileged modes. You can read the register in any privileged mode.
B3.12.23 CP15 c2 and c3, Memory protection and control registers
On an ARMv7-A implementation, the CP15 c2 and c3 registers are used for memory protection and control. Figure B3-14 shows these registers.
Figure B3-14 CP15 c2 and c3 registers (all read/write)
CRn  opc1  CRm  opc2  Register
c2   0     c0   0     TTBR0, Translation Table Base Register 0
c2   0     c0   1     TTBR1, Translation Table Base Register 1
c2   0     c0   2     TTBCR, Translation Table Base Control Register
c3   0     c0   0     DACR, Domain Access Control Register
CP15 c2 and c3 register encodings not shown in Figure B3-14 are UNPREDICTABLE, see Unallocated CP15 encodings on page B3-69.
B3.12.24 CP15 c2, Translation table support registers
When the VMSA is implemented, three translation table support registers are implemented in CP15 c2. Table B3-28 summarizes these registers.
Table B3-28 VMSA translation table support registers
Register name                    Description
Translation Table Base 0         c2, Translation Table Base Register 0 (TTBR0)
Translation Table Base 1         c2, Translation Table Base Register 1 (TTBR1) on page B3-116
Translation Table Base Control   c2, Translation Table Base Control Register (TTBCR) on page B3-117
The description of the TTBCR explains how this set of registers is used, see c2, Translation Table Base Control Register (TTBCR) on page B3-117.
c2, Translation Table Base Register 0 (TTBR0)
The Translation Table Base Register 0, TTBR0, holds the base address of translation table 0, and information about the memory it occupies.
The TTBR0 register:
• is a 32-bit read/write register
• is accessible only in privileged modes
• when the Security Extensions are implemented:
— is a Banked register
— has write access to the Secure copy of the register disabled when the CP15SDISABLE signal is asserted HIGH.
When the Multiprocessing Extensions are not implemented, the format of the TTBR0 register is:
[31:14-N] Translation table base 0 address, [13-N:6] UNK/SBZP, [5] NOS, [4:3] RGN, [2] IMP, [1] S, [0] C
When the Multiprocessing Extensions are implemented, the format of the TTBR0 register is:
[31:14-N] Translation table base 0 address, [13-N:7] UNK/SBZP, [6] IRGN[0], [5] NOS, [4:3] RGN, [2] IMP, [1] S, [0] IRGN[1]
Bits [31:14-N]
Translation table base 0 address, bits [31:14-N]. The value of N, held in TTBCR.N, determines the required alignment of the translation table, which must be aligned to 2^(14-N) bytes.
Bits [13-N:6], ARMv7-A base architecture
UNK/SBZP.
Bits [13-N:7], when the Multiprocessing Extensions are implemented
UNK/SBZP.
IRGN[0], bit [6], when the Multiprocessing Extensions are implemented
See the description of bit [0] when the Multiprocessing Extensions are implemented.
NOS, bit [5]
Not Outer Shareable bit.
Indicates the Outer Shareable attribute for the memory associated with a translation table walk that has the Shareable attribute, indicated by TTBR0.S == 1:
0 Outer Shareable
1 Inner Shareable.
This bit is ignored when TTBR0.S == 0. This bit is only implemented from ARMv7.
RGN, bits [4:3]
Region bits. Indicates the Outer Cacheability attributes for the memory associated with the translation table walks:
0b00 Normal memory, Outer Non-cacheable
0b01 Normal memory, Outer Write-Back Write-Allocate Cacheable
0b10 Normal memory, Outer Write-Through Cacheable
0b11 Normal memory, Outer Write-Back no Write-Allocate Cacheable.
IMP, bit [2]
The effect of this bit is IMPLEMENTATION DEFINED. If the translation table implementation does not include any IMPLEMENTATION DEFINED features this bit is SBZ.
S, bit [1]
Shareable bit. Indicates the Shareable attribute for the memory associated with the translation table walks:
0 Non-shareable
1 Shareable.
C, bit [0], ARMv7-A base architecture
Cacheable bit. Indicates whether the translation table walk is to Inner Cacheable memory:
0 Inner Non-cacheable
1 Inner Cacheable.
For regions marked as Inner Cacheable, it is IMPLEMENTATION DEFINED whether the read has the Write-Through, Write-Back no Write-Allocate, or Write-Back Write-Allocate attribute.
IRGN, bits [6,0], when the Multiprocessing Extensions are implemented
Inner region bits. Indicates the Inner Cacheability attributes for the memory associated with the translation table walks. The possible values of IRGN[1:0] are:
0b00 Normal memory, Inner Non-cacheable
0b01 Normal memory, Inner Write-Back Write-Allocate Cacheable
0b10 Normal memory, Inner Write-Through Cacheable
0b11 Normal memory, Inner Write-Back no Write-Allocate Cacheable.
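The TTBR0 attribute bits, the split IRGN field (explained in the Note that follows), and the TTBCR.N rules described later under the TTBCR can be modelled together in a few lines of host-side code. The following C sketch is an illustration under stated assumptions: the helper names are hypothetical, it uses the Multiprocessing Extensions layout, it leaves IMP as 0, and it only computes values. Writing them to the hardware still requires the MCR instructions given for each register.

```c
#include <stdbool.h>
#include <stdint.h>

/* Cacheability encodings shared by RGN[1:0] and IRGN[1:0]. */
enum {
    CACHE_NC       = 0,  /* Non-cacheable */
    CACHE_WBWA     = 1,  /* Write-Back Write-Allocate */
    CACHE_WT       = 2,  /* Write-Through */
    CACHE_WB_NO_WA = 3   /* Write-Back no Write-Allocate */
};

/* Low attribute bits of TTBR0/TTBR1 with the Multiprocessing Extensions:
   IRGN[1] -> bit 0, S -> bit 1, RGN -> bits [4:3], NOS -> bit 5,
   IRGN[0] -> bit 6. */
static uint32_t ttbr_walk_attrs(unsigned irgn, unsigned rgn,
                                bool shareable, bool inner_shareable)
{
    uint32_t v = 0;
    if (irgn & 2u) v |= 1u << 0;        /* IRGN[1] */
    if (irgn & 1u) v |= 1u << 6;        /* IRGN[0] */
    v |= (uint32_t)(rgn & 3u) << 3;     /* RGN */
    if (shareable) {
        v |= 1u << 1;                   /* S */
        if (inner_shareable)
            v |= 1u << 5;               /* NOS = 1: Inner Shareable */
    }                                   /* NOS is ignored when S == 0 */
    return v;
}

/* The TTBR0 table base must be aligned to 2^(14-N) bytes, N = TTBCR.N. */
static bool ttbr0_base_ok(uint32_t base, unsigned n)
{
    if (n > 7)
        return false;
    uint32_t align = 1u << (14 - n);
    return (base & (align - 1u)) == 0;
}

/* Size in bytes of the TTBR0 first-level table, as in Table B3-29. */
static uint32_t ttbr0_table_size(unsigned n)
{
    return 16384u >> n;  /* 16KB for N = 0 down to 128 bytes for N = 7 */
}

/* Which TTBR (0 or 1) a translation table walk uses for an MVA. */
static int ttbr_for_mva(uint32_t mva, unsigned n)
{
    if (n == 0)
        return 0;                           /* N == 0: always TTBR0 */
    return (mva >> (32 - n)) == 0 ? 0 : 1;  /* bits [31:32-N] all zero? */
}
```

For example, with TTBCR.N = 1 an MVA below 0x80000000 is walked through TTBR0, whose table is then 8KB, and every other MVA through TTBR1.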
Note
The encoding of the IRGN bits is counter-intuitive, with register bit [6] being IRGN[0] and register bit [0] being IRGN[1]. This encoding is chosen to give a consistent encoding of memory region types and to ensure that software written for the ARMv7 base architecture can run unmodified on an implementation that includes the Multiprocessing Extensions.
Accessing the TTBR0 register
To access the TTBR0 register you read or write the CP15 registers with <opc1> set to 0, <CRn> set to c2, <CRm> set to c0, and <opc2> set to 0. For example:
MRC p15, 0, r0, c2, c0, 0 ; Read CP15 Translation Table Base Register 0 into r0
MCR p15, 0, r0, c2, c0, 0 ; Write r0 to CP15 Translation Table Base Register 0
c2, Translation Table Base Register 1 (TTBR1)
The Translation Table Base Register 1, TTBR1, holds the base address of translation table 1, and information about the memory it occupies.
The TTBR1 register is:
• a 32-bit read/write register
• accessible only in privileged modes
• when the Security Extensions are implemented, a Banked register.
When the Multiprocessing Extensions are not implemented, the format of the TTBR1 register is:
[31:14] Translation table base 1 address, [13:6] UNK/SBZP, [5] NOS, [4:3] RGN, [2] IMP, [1] S, [0] C
When the Multiprocessing Extensions are implemented, the format of the TTBR1 register is:
[31:14] Translation table base 1 address, [13:7] UNK/SBZP, [6] IRGN[0], [5] NOS, [4:3] RGN, [2] IMP, [1] S, [0] IRGN[1]
Bits [31:14]
Translation table base 1 address, bits [31:14]. The translation table must be aligned on a 16KByte boundary.
Bits [13:6], ARMv7-A base architecture
UNK/SBZP.
Bits [13:7], when the Multiprocessing Extensions are implemented
UNK/SBZP.
IRGN[0:1], bits [6,0], when the Multiprocessing Extensions are implemented
See the definition given for the TTBR0 in c2, Translation Table Base Register 0 (TTBR0) on page B3-113.
NOS, RGN, IMP, S, bits [5:1]
See the definitions given for the TTBR0 in c2, Translation Table Base Register 0 (TTBR0) on page B3-113.
C, bit [0], ARMv7-A base architecture
See the definition given for the TTBR0 in c2, Translation Table Base Register 0 (TTBR0) on page B3-113.
Accessing the TTBR1 register
To access the TTBR1 register you read or write the CP15 registers with <opc1> set to 0, <CRn> set to c2, <CRm> set to c0, and <opc2> set to 1. For example:
MRC p15, 0, r0, c2, c0, 1 ; Read CP15 Translation Table Base Register 1 into r0
MCR p15, 0, r0, c2, c0, 1 ; Write r0 to CP15 Translation Table Base Register 1
c2, Translation Table Base Control Register (TTBCR)
The Translation Table Base Control Register, TTBCR, determines which of the Translation Table Base Registers, TTBR0 or TTBR1, defines the base address for the translation table walk that is required when an MVA is not found in the TLB.
The TTBCR:
• is a 32-bit read/write register
• is accessible only in privileged modes
• has a defined reset value of 0. When the Security Extensions are implemented, this reset value applies only to the Secure copy of the register, and software must program the Non-secure copy of the register with the required value.
• when the Security Extensions are implemented:
— is a Banked register
— has write access to the Secure copy of the register disabled when the CP15SDISABLE signal is asserted HIGH.
When the Security Extensions are not implemented, the format of the TTBCR is:
[31:3] UNK/SBZP, [2:0] N
When the Security Extensions are implemented, the format of the TTBCR is:
[31:6] UNK/SBZP, [5] PD1, [4] PD0, [3] (0), [2:0] N
Bits [31:6, 3]
UNK/SBZP.
PD1, bit [5], when Security Extensions are implemented
Translation table walk Disable bit for TTBR1.
This bit controls whether a translation table walk is performed on a TLB miss when TTBR1 is used:
0 If a TLB miss occurs when TTBR1 is used a translation table walk is performed.
1 If a TLB miss occurs when TTBR1 is used no translation table walk is performed and a Section Translation fault is returned.
PD0, bit [4], when Security Extensions are implemented
Translation table walk Disable bit for TTBR0. This bit controls whether a translation table walk is performed on a TLB miss when TTBR0 is used. The meanings of the possible values of this bit are equivalent to those for the PD1 bit.
Bits [5:4], when Security Extensions are not implemented
UNK/SBZP.
N, bits [2:0]
Indicates the width of the base address held in TTBR0. In TTBR0, the base address field is bits [31:14-N]. The value of N also determines:
• whether TTBR0 or TTBR1 is used as the base address for translation table walks
• the size of the translation table pointed to by TTBR0.
N can take any value from 0 to 7, that is, from 0b000 to 0b111.
When N has its reset value of 0, the translation table base is compatible with ARMv5 and ARMv6.
Determining which TTBR to use, and the TTBR0 translation table size
When an MVA is not found in the TLB, the value of TTBCR.N determines whether TTBR0 or TTBR1 is used as the base address for the translation table walk in memory:
• if N == 0 then always use TTBR0
• if N > 0 then:
— if bits [31:32-N] of the MVA are all zero then use TTBR0
— otherwise use TTBR1.
The size of the first-level translation tables accessed by TTBR0 depends on the value of TTBCR.N as shown in Table B3-29:
Table B3-29 Value of N field and the size of the TTBR0 translation table
TTBCR.N   Size of TTBR0 translation table
0b000     16KB
0b001     8KB
0b010     4KB
0b011     2KB
0b100     1KB
0b101     512 bytes
0b110     256 bytes
0b111     128 bytes
Accessing the TTBCR
To access the TTBCR you read or write the CP15 registers with <opc1> set to 0, <CRn> set to c2, <CRm> set to c0, and <opc2> set to 2. For example:
MRC p15, 0, r0, c2, c0, 2 ; Read CP15 Translation Table Base Control Register into r0
MCR p15, 0, r0, c2, c0, 2 ; Write r0 to CP15 Translation Table Base Control Register
B3.12.25 c3, Domain Access Control Register (DACR)
The Domain Access Control Register, DACR, defines the access permission for each of the sixteen memory domains.
The DACR:
• is a 32-bit read/write register
• is accessible only in privileged modes
• when the Security Extensions are implemented:
— is a Banked register
— has write access to the Secure copy of the register disabled when the CP15SDISABLE signal is asserted HIGH.
The format of the DACR is:
[31:30] D15, [29:28] D14, [27:26] D13, [25:24] D12, [23:22] D11, [21:20] D10, [19:18] D9, [17:16] D8, [15:14] D7, [13:12] D6, [11:10] D5, [9:8] D4, [7:6] D3, [5:4] D2, [3:2] D1, [1:0] D0
Dn, bits [(2n+1):2n]
Domain n access permission, where n = 0 to 15. Permitted values are:
0b00 No access. Any access to the domain generates a Domain fault.
0b01 Client. Accesses are checked against the permission bits in the translation tables.
0b10 Reserved, effect is UNPREDICTABLE.
0b11 Manager. Accesses are not checked against the permission bits in the translation tables.
For more information, see Domains on page B3-31.
Accessing the DACR
To access the DACR you read or write the CP15 registers with <opc1> set to 0, <CRn> set to c3, <CRm> set to c0, and <opc2> set to 0.
For example:
MRC p15, 0, r0, c3, c0, 0 ; Read CP15 Domain Access Control Register into r0
MCR p15, 0, r0, c3, c0, 0 ; Write r0 to CP15 Domain Access Control Register
B3.12.26 CP15 c4, Not used
CP15 c4 is not used on any ARMv7 implementation, see Unallocated CP15 encodings on page B3-69.
B3.12.27 CP15 c5 and c6, Memory system fault registers
The CP15 c5 and c6 registers are used for memory system fault reporting. Figure B3-15 shows the CP15 c5 and c6 registers.
Figure B3-15 CP15 c5 and c6 registers in a VMSA implementation (all read/write)
CRn  opc1  CRm  opc2  Register
c5   0     c0   0     DFSR, Data Fault Status Register
c5   0     c0   1     IFSR, Instruction Fault Status Register
c5   0     c1   0     ADFSR, Auxiliary DFSR (details are IMPLEMENTATION DEFINED)
c5   0     c1   1     AIFSR, Auxiliary IFSR (details are IMPLEMENTATION DEFINED)
c6   0     c0   0     DFAR, Data Fault Address Register
c6   0     c0   2     IFAR, Instruction Fault Address Register
CP15 c5 and c6 register encodings not shown in Figure B3-15 are UNPREDICTABLE, see Unallocated CP15 encodings on page B3-69.
The CP15 c5 and c6 registers are described in:
• CP15 c5, Fault status registers on page B3-121
• CP15 c6, Fault Address registers on page B3-124.
These registers are also used to report information about debug exceptions. For details see Effects of debug exceptions on CP15 registers and the DBGWFAR on page C4-4.
B3.12.28 CP15 c5, Fault status registers
There are two fault status registers in CP15 c5, and the architecture provides encodings for two additional IMPLEMENTATION DEFINED registers. Table B3-30 summarizes these registers.
Table B3-30 Fault status registers
Register name                                  Description
Data Fault Status Register (DFSR)              c5, Data Fault Status Register (DFSR)
Instruction Fault Status Register (IFSR)       c5, Instruction Fault Status Register (IFSR) on page B3-122
Auxiliary Data Fault Status Register (ADFSR)   c5, Auxiliary Data and Instruction Fault Status Registers (ADFSR and AIFSR) on page B3-123
and Auxiliary Instruction Fault Status Register (AIFSR)
Fault information is returned using the fault status registers and the fault address registers described in CP15 c6, Fault Address registers on page B3-124. For details of how these registers are used see Fault Status and Fault Address registers in a VMSA implementation on page B3-48.
c5, Data Fault Status Register (DFSR)
The Data Fault Status Register, DFSR, holds status information about the last data fault.
The DFSR is:
• a 32-bit read/write register
• accessible only in privileged modes
• when the Security Extensions are implemented, a Banked register.
The format of the DFSR is:
[31:13] UNK/SBZP, [12] ExT, [11] WnR, [10] FS[4], [9:8] (0), [7:4] Domain, [3:0] FS[3:0]
Bits [31:13,9:8]
UNK/SBZP.
ExT, bit [12]
External abort type. This bit can be used to provide an IMPLEMENTATION DEFINED classification of external aborts. For aborts other than external aborts this bit always returns 0.
WnR, bit [11]
Write not Read bit. Indicates whether the abort was caused by a write or a read access:
0 Abort caused by a read access
1 Abort caused by a write access.
For faults on CP15 cache maintenance operations, including the VA to PA translation operations, this bit always returns a value of 1.
FS, bits [10,3:0]
Fault status bits. For the valid encodings of these bits in an ARMv7-A implementation with a VMSA, see Table B3-12 on page B3-51. All encodings not shown in the table are reserved.
Domain, bits [7:4]
The domain of the fault address.
From ARMv7, use of this field is deprecated, see The Domain field in the DFSR on page B3-52.
For information about using the DFSR see Fault Status and Fault Address registers in a VMSA implementation on page B3-48.
Accessing the DFSR
To access the DFSR you read or write the CP15 registers with <opc1> set to 0, <CRn> set to c5, <CRm> set to c0, and <opc2> set to 0. For example:
MRC p15, 0, r0, c5, c0, 0 ; Read CP15 Data Fault Status Register into r0
MCR p15, 0, r0, c5, c0, 0 ; Write r0 to CP15 Data Fault Status Register
c5, Instruction Fault Status Register (IFSR)
The Instruction Fault Status Register, IFSR, holds status information about the last instruction fault.
The IFSR is:
• a 32-bit read/write register
• accessible only in privileged modes
• when the Security Extensions are implemented, a Banked register.
The format of the IFSR is:
[31:13] UNK/SBZP, [12] ExT, [11] (0), [10] FS[4], [9:4] UNK/SBZP, [3:0] FS[3:0]
Bits [31:13,11,9:4]
UNK/SBZP.
ExT, bit [12]
External abort type. This bit can be used to provide an IMPLEMENTATION DEFINED classification of external aborts. For aborts other than external aborts this bit always returns 0.
FS, bits [10,3:0]
Fault status bits. For the valid encodings of these bits in an ARMv7-A implementation with a VMSA, see Table B3-11 on page B3-50. All encodings not shown in the table are reserved.
For information about using the IFSR see Fault Status and Fault Address registers in a VMSA implementation on page B3-48.
Accessing the IFSR
To access the IFSR you read or write the CP15 registers with <opc1> set to 0, <CRn> set to c5, <CRm> set to c0, and <opc2> set to 1.
For example:
MRC p15, 0, r0, c5, c0, 1 ; Read CP15 Instruction Fault Status Register into r0
MCR p15, 0, r0, c5, c0, 1 ; Write r0 to CP15 Instruction Fault Status Register
c5, Auxiliary Data and Instruction Fault Status Registers (ADFSR and AIFSR)
The Auxiliary Data Fault Status Register (ADFSR) and the Auxiliary Instruction Fault Status Register (AIFSR) enable the system to return additional IMPLEMENTATION DEFINED fault status information, see Auxiliary Fault Status Registers on page B3-53.
The ADFSR and AIFSR are:
• 32-bit read/write registers
• accessible only in privileged modes
• when the Security Extensions are implemented, Banked registers
• introduced in ARMv7.
The formats of the ADFSR and AIFSR are IMPLEMENTATION DEFINED.
Accessing the ADFSR and AIFSR
To access the ADFSR or AIFSR you read or write the CP15 registers with <opc1> set to 0, <CRn> set to c5, <CRm> set to c1, and <opc2> set to:
• 0 for the ADFSR
• 1 for the AIFSR.
For example:
MRC p15, 0, r0, c5, c1, 0 ; Read CP15 Auxiliary Data Fault Status Register into r0
MCR p15, 0, r0, c5, c1, 0 ; Write r0 to CP15 Auxiliary Data Fault Status Register
MRC p15, 0, r0, c5, c1, 1 ; Read CP15 Auxiliary Instruction Fault Status Register into r0
MCR p15, 0, r0, c5, c1, 1 ; Write r0 to CP15 Auxiliary Instruction Fault Status Register
B3.12.29 CP15 c6, Fault Address registers
There are two Fault Address registers in CP15 c6, as shown in Figure B3-15 on page B3-120. The two Fault Address registers complement the Fault Status registers, and are shown in Table B3-31.
Table B3-31 Fault Address registers
Register name                               Description
Data Fault Address Register (DFAR)          c6, Data Fault Address Register (DFAR)
Instruction Fault Address Register (IFAR)   c6, Instruction Fault Address Register (IFAR) on page B3-125
Note
Before ARMv7:
• The DFAR was called the Fault Address Register (FAR).
• The Watchpoint Fault Address Register (DBGWFAR) was implemented in CP15 c6, with <opc2> = 1.
From ARMv7, the DBGWFAR is only implemented as a CP14 debug register, see Watchpoint Fault Address Register (DBGWFAR) on page C10-28.
Fault information is returned using the fault address registers and the fault status registers described in CP15 c5, Fault status registers on page B3-121. For details of how these registers are used, and when the value in the IFAR is valid, see Fault Status and Fault Address registers in a VMSA implementation on page B3-48.
c6, Data Fault Address Register (DFAR)
The Data Fault Address Register, DFAR, holds the MVA of the faulting address that caused a synchronous Data Abort exception.
The DFAR is:
• a 32-bit read/write register
• accessible only in privileged modes
• when the Security Extensions are implemented, a Banked register.
The format of the DFAR is:
[31:0] MVA of faulting address of synchronous Data Abort exception
For information about using the DFAR, and when the value in the DFAR is valid, see Fault Status and Fault Address registers in a VMSA implementation on page B3-48.
A debugger can write to the DFAR to restore its value.
Accessing the DFAR
To access the DFAR you read or write the CP15 registers with <opc1> set to 0, <CRn> set to c6, <CRm> set to c0, and <opc2> set to 0. For example:
MRC p15, 0, r0, c6, c0, 0 ; Read CP15 Data Fault Address Register into r0
MCR p15, 0, r0, c6, c0, 0 ; Write r0 to CP15 Data Fault Address Register
c6, Instruction Fault Address Register (IFAR)
The Instruction Fault Address Register, IFAR, holds the MVA of the faulting access that caused a synchronous Prefetch Abort exception.
The IFAR is:
• a 32-bit read/write register
• accessible only in privileged modes
• when the Security Extensions are implemented, a Banked register.
The format of the IFAR is:
[31:0] MVA of faulting address of synchronous Prefetch Abort exception
For information about using the IFAR see Fault Status and Fault Address registers in a VMSA implementation on page B3-48.
A debugger can write to the IFAR to restore its value.
Accessing the IFAR
To access the IFAR you read or write the CP15 registers with <opc1> set to 0, <CRn> set to c6, <CRm> set to c0, and <opc2> set to 2. For example:
MRC p15, 0, r0, c6, c0, 2 ; Read CP15 Instruction Fault Address Register into r0
MCR p15, 0, r0, c6, c0, 2 ; Write r0 to CP15 Instruction Fault Address Register
B3.12.30 CP15 c7, Cache maintenance and other functions
The CP15 c7 registers are used for cache maintenance operations. They also provide barrier operations, and VA to PA address translation functions. Figure B3-16 shows the CP15 c7 registers.
Figure B3-16 CP15 c7 registers in a VMSA implementation (CRn = c7, opc1 = 0)
CRm   opc2    Register or operation
c0    4       NOP
c1    {0,6}   Cache maintenance operations ‡
c4    0       PAR, PA result from VA to PA translation (read/write)
c5    {0,1}   Cache maintenance operations
c5    4       CP15ISB, Instruction Synchronization Barrier operation
c5    {6,7}   Branch predictor maintenance operations
c6    {1,2}   Cache maintenance operations
c8    {0-7}   VA to PA translation operations
c10   {1,2}   Cache maintenance operations
c10   {4,5}   Data barrier operations
c11   1       DCCMVAU, cache maintenance operation
c13   1       NOP
c14   {1,2}   Cache maintenance operations
‡ Part of the Multiprocessing Extensions. The barrier operations, CP15ISB and the data barrier operations, are accessible in User mode.
CP15 c7 register encodings not shown in Figure B3-16 are UNPREDICTABLE, see Unallocated CP15 encodings on page B3-69.
The CP15 c7 registers are described in:
• CP15 c7, Cache and branch predictor maintenance functions
• CP15 c7, Virtual Address to Physical Address translation operations on page B3-130
• CP15 c7, Miscellaneous functions on page B3-136.
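Returning to the fault status registers of B3.12.28: the DFSR and IFSR split the five-bit FS field across register bits [10] and [3:0], which is easy to get wrong in an abort handler. The following C sketch is illustrative only. The structure and function names are hypothetical, it assumes the DFSR value has already been read with MRC p15, 0, r0, c5, c0, 0, and it deliberately does not interpret FS, whose valid encodings are in Table B3-12 (DFSR) and Table B3-11 (IFSR).

```c
#include <stdbool.h>
#include <stdint.h>

/* Fields of a DFSR value, following the DFSR layout. */
struct dfsr_fields {
    unsigned fs;      /* FS[4:0], assembled from bits [10] and [3:0] */
    unsigned domain;  /* bits [7:4]; use is deprecated from ARMv7 */
    bool wnr;         /* bit [11]: 1 = abort caused by a write access */
    bool ext;         /* bit [12]: IMPLEMENTATION DEFINED classification */
};

/* FS[4] is register bit [10] and FS[3:0] is bits [3:0]; the IFSR
   uses the same split, so this helper works for both registers. */
static unsigned fsr_status(uint32_t fsr)
{
    return (((fsr >> 10) & 1u) << 4) | (fsr & 0xFu);
}

static struct dfsr_fields dfsr_decode(uint32_t dfsr)
{
    struct dfsr_fields f;
    f.fs = fsr_status(dfsr);
    f.domain = (dfsr >> 4) & 0xFu;
    f.wnr = ((dfsr >> 11) & 1u) != 0;
    f.ext = ((dfsr >> 12) & 1u) != 0;
    return f;
}
```

Recall that WnR reads as 1 for faults on CP15 cache maintenance and VA to PA translation operations, so a handler cannot treat WnR == 1 as proof of an ordinary store.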
B3.12.31 CP15 c7, Cache and branch predictor maintenance functions
CP15 c7 provides a number of functions. This section describes only the CP15 c7 cache and branch predictor maintenance operations. Branch predictor operations are included in this section because they operate in a similar way to the cache maintenance operations.
Note
ARMv7 introduces significant changes in the CP15 c7 operations. Most of these changes are because, from ARMv7, the architecture covers multiple levels of cache. This section only describes the ARMv7 requirements for these operations. For details of these operations in previous versions of the architecture see:
• c7, Cache operations on page AppxG-38 for ARMv6
• c7, Cache operations on page AppxH-49 for ARMv4 and ARMv5.
Figure B3-17 shows the CP15 c7 cache and branch predictor maintenance operations.
Figure B3-17 CP15 c7 Cache and branch predictor maintenance operations (same encodings and mnemonics as Table B3-32; PoU: Point of Unification; PoC: Point of Coherency)
The CP15 c7 cache and branch predictor maintenance operations are all write-only operations that can be executed only in privileged modes. They are listed in Table B3-32. For more information about the terms used in this section see Terms used in describing cache operations on page B2-10.
The Multiprocessing Extensions change the set of caches affected by these operations, see Multiprocessor effects on cache maintenance operations on page B2-23.
In Table B3-32, the Rt data column specifies what data is required in the register Rt specified by the MCR instruction used to perform the operation. For more information about the possible data formats, see Data formats for the cache and branch predictor operations on page B3-128.
Table B3-32 CP15 c7 cache and branch predictor maintenance operations
CRm  opc2  Mnemonic       Rt data   Function (a)
c1   0     ICIALLUIS (b)  Ignored   Invalidate all instruction caches Inner Shareable to PoU. Also flushes branch target cache. (c)
c1   6     BPIALLIS (b)   Ignored   Invalidate entire branch predictor array Inner Shareable.
c5   0     ICIALLU        Ignored   Invalidate all instruction caches to PoU. Also flushes branch target cache. (c)
c5   1     ICIMVAU        MVA       Invalidate instruction cache line by MVA to PoU. (c)
c5   6     BPIALL         Ignored   Invalidate entire branch predictor array.
c5   7     BPIMVA         MVA       Invalidate MVA from branch predictor array.
c6   1     DCIMVAC        MVA       Invalidate data or unified cache line by MVA to PoC.
c6   2     DCISW          Set/way   Invalidate data or unified cache line by set/way.
c10  1     DCCMVAC        MVA       Clean data or unified cache line by MVA to PoC.
c10  2     DCCSW          Set/way   Clean data or unified cache line by set/way.
c11  1     DCCMVAU        MVA       Clean data or unified cache line by MVA to PoU.
c14  1     DCCIMVAC       MVA       Clean and invalidate data or unified cache line by MVA to PoC.
c14  2     DCCISW         Set/way   Clean and invalidate data or unified cache line by set/way.
a. Modified Virtual Address (MVA), point of coherency (PoC) and point of unification (PoU) are described in Terms used in describing cache operations on page B2-10.
b. Part of the Multiprocessing Extensions, see Multiprocessor effects on cache maintenance operations on page B2-23.
c. Only applies to separate instruction caches, does not apply to unified caches.
Data formats for the cache and branch predictor operations
Table B3-32 on page B3-127 shows three possibilities for the data in the register Rt specified by the MCR instruction. These are described in the following subsections:
• Ignored
• MVA
• Set/way on page B3-129.
Ignored
The value in the register specified by the MCR instruction is ignored. You do not have to write a value to the register before issuing the MCR instruction.
MVA
For more information about the possible meaning when the table shows that an MVA is required see Terms used in describing cache operations on page B2-10. When the data is stated to be an MVA, it does not have to be cache line aligned.
Set/way
For an operation by set/way, the data identifies the cache line that the operation is to be applied to by specifying:
• the cache set the line belongs to
• the way number of the line in the set
• the cache level.
The format of the register data for a set/way operation is:
[31:32-A] Way, [31-A:B] SBZ, [B-1:L] Set, [L-1:4] SBZ, [3:1] Level, [0] 0
Where:
A = Log2(ASSOCIATIVITY)
L = Log2(LINELEN)
S = Log2(NSETS)
B = (L + S)
ASSOCIATIVITY, LINELEN (Line Length) and NSETS (number of sets) have their usual meanings and are the values for the cache level being operated on. The values of A and S are rounded up to the next integer.
((Cache level to operate on) -1) For example, this field is 0 for operations on L 1 cache, or 1 for operations on L 2 cache. The number of the set to operate on. The number of the way to operate on. Set Way Note • If L = 4 then there is no SBZ field between the set and level fields in the register. • If A = 0 there is no way field in the register, and register bits [31:B] are SBZ. • If the level, set or way field in the register is larger than the size implemented in the cache then the effect of the operation is UNPREDICTABLE. Accessing the CP15 c7 cache and branch predictor maintenance operations To perform one of the cache maintenance operations you write the CP15 registers with set to 0, set to c7, and and set to the values shown in Table B3-32 on page B3-127. That is: MCR p15,0,,c7,, For example: MCR p15,0,,c7,c5,0 MCR p15,0,,c7,c10,2 ARM DDI 0406B ; Invalidate all instruction caches to point of unification ; Clean data or unified cache line by set/way Copyright © 1996-1998, 2000, 2004-2008 ARM Limited. All rights reserved. B3-129 Virtual Memory System Architecture (VMSA) B3.12.32 CP15 c7, Virtual Address to Physical Address translation operations CP15 c7 provides a number of functions, summarized in Figure B3-10 on page B3-65. This section describes only the CP15 c7 operations that provide Virtual Address (VA) to Physical Address (PA) translation on implementations that include the VMSA, and the register that returns the result of the operation. Figure B3-18 shows all of the CP15 c7 VA to PA translation operations. It does not show the other CP15 c7 operations. Note As explained in this section, the CP15 c7 encodings for VA to PA translation with == {4-7} are available only when the Security Extensions are implemented. These encodings are reserved and UNPREDICTABLE when the Security Extensions are not implemented. 
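Returning to the set/way data format described under B3.12.31: the register value for DCISW, DCCSW or DCCISW can be assembled with a few shifts once A, L and S are known. The following C is a minimal sketch, assuming the cache geometry is passed in directly rather than read from the cache identification registers; the names ceil_log2 and setway_value are hypothetical, and the check that the fields do not overlap is an added assumption, not a rule stated in this section.

```c
#include <assert.h>
#include <stdint.h>

/* Smallest n such that 2^n >= x: the "rounded up to the next integer" rule. */
static unsigned ceil_log2(uint32_t x)
{
    unsigned n = 0;
    assert(x >= 1 && x <= (1u << 31));
    while ((1u << n) < x)
        n++;
    return n;
}

/* Hypothetical helper: Rt value for a set/way operation.
 * cache_level is 1-based (1 = L1); line_len_bytes is LINELEN in bytes. */
static uint32_t setway_value(uint32_t associativity, uint32_t line_len_bytes,
                             uint32_t nsets, unsigned cache_level,
                             uint32_t set, uint32_t way)
{
    unsigned A = ceil_log2(associativity);  /* width of the Way field */
    unsigned L = ceil_log2(line_len_bytes); /* lowest bit of the Set field */
    unsigned S = ceil_log2(nsets);          /* width of the Set field */
    uint32_t value = 0;

    /* Out-of-range level, set or way values are UNPREDICTABLE: reject them. */
    assert(cache_level >= 1 && cache_level <= 8);
    assert(set < nsets && way < associativity);
    /* Assumed: Level needs bits [3:1], and Way/Set must not overlap. */
    assert(L >= 4 && A + L + S <= 32);

    if (A > 0)                 /* A == 0: no Way field, bits [31:B] are SBZ */
        value |= way << (32 - A);
    value |= set << L;         /* Set in bits [B-1:L], where B = L + S */
    value |= (uint32_t)(cache_level - 1) << 1; /* Level in bits [3:1] */
    return value;              /* bit [0] and the gaps stay zero (SBZ) */
}
```

For example, a 4-way cache with 32-byte lines and 128 sets gives A = 2, L = 5, S = 7 and B = 12, so way 3, set 5 of the L1 cache encodes as 0xC00000A0. A clean or invalidate of a whole cache level would loop over every set and way and issue one MCR per value.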
[Figure B3-18 CP15 c7 VA to PA translation operations. Encodings with CRn = c7 and opc1 = 0: CRm = c4, opc2 = 0 is the PAR, Physical Address Register (translation result, read/write); CRm = c8, opc2 = 0-3 are the Current security state operations V2PCWPR, V2PCWPW, V2PCWUR and V2PCWUW; CRm = c8, opc2 = 4-7, marked †, are the Other security state operations V2POWPR, V2POWPW, V2POWUR and V2POWUW. The translation operations are write-only. Shown with Security Extensions implemented. When they are not implemented, the concepts of Current security state and Other security state are not defined, and encodings marked † are reserved and UNPREDICTABLE.]

This set of registers comprises:
• A single Physical Address Register, PAR, that returns the result of the VA to PA translation. For more information about this register see c7, Physical Address Register (PAR) and VA to PA translations on page B3-133.
• A set of VA to PA translation operations. These are:
  - 32-bit write-only operations
  - accessible only in privileged modes.

When the Security Extensions are not implemented, there are four VA to PA translation operations, listed in Table B3-33.

Table B3-33 VA to PA translation when Security Extensions are not implemented

CRm  opc2  Mnemonic  Register or operation
c4   0     -         PAR, Physical Address Register
c8   0     V2PCWPR   Privileged read VA to PA translation
c8   1     V2PCWPW   Privileged write VA to PA translation
c8   2     V2PCWUR   User read VA to PA translation
c8   3     V2PCWUW   User write VA to PA translation

When the Security Extensions are implemented, there are eight VA to PA translation operations. Four of these are common to the Secure and Non-secure security states, and four are only available in the Secure state.
Table B3-34 lists these operations, and shows the security states in which each is available.

Table B3-34 VA to PA translation when Security Extensions are implemented

                           Register or operation:
CRm  opc2  Mnemonic  Common                                        Non-secure state  Secure state
c4   0     -         -                                             PAR               PAR
c8   0     V2PCWPR   Current security state privileged read (a)    -                 -
c8   1     V2PCWPW   Current security state privileged write (a)   -                 -
c8   2     V2PCWUR   Current security state User read (a)          -                 -
c8   3     V2PCWUW   Current security state User write (a)         -                 -
c8   4     V2POWPR   -                                             -                 Other security state privileged read (a)
c8   5     V2POWPW   -                                             -                 Other security state privileged write (a)
c8   6     V2POWUR   -                                             -                 Other security state User read (a)
c8   7     V2POWUW   -                                             -                 Other security state User write (a)

a. VA to PA translation operations.

Writing a VA to a VA to PA translation operation encoding translates the VA to the corresponding PA. The PA value is returned in the PAR. These operations are accessible only in privileged modes.

The available VA to PA translations depend on:
• whether the Security Extensions are implemented
• if the Security Extensions are implemented, whether the processor is in the Secure or Non-secure state.

In more detail:

Security Extensions not implemented
Four VA to PA translation operations are available, as shown in Table B3-33 on page B3-131. These operations provide VA to PA translation for privileged read or write, and for User read or write.

Security Extensions implemented, processor in Non-secure state
Only the four current security state VA to PA translation operations are available, as shown in Table B3-33 on page B3-131.
These operations provide VA to PA translation for privileged read or write, and for User read or write, in the Non-secure security state. It is not possible to perform VA to PA translations for the Secure security state. Attempting to access an Other security state VA to PA translation operation encoding generates an Undefined Instruction exception.

Security Extensions implemented, processor in Secure security state
Eight VA to PA translation operations are available, as shown in Table B3-34 on page B3-131:
• The four current security state VA to PA translation operations provide address translation for privileged read or write, and for User read or write, in the Secure security state.
• The four other security state VA to PA translation operations provide address translation for privileged read or write, and for User read or write, in the Non-secure security state.

Note
In all cases:
• If the FCSE is implemented, the VA required is the VA before any modification by the FCSE, not the MVA.
• For information about translations when the MMU is disabled see VA to PA translation when the MMU is disabled on page B3-136.

c7, Physical Address Register (PAR) and VA to PA translations

The Physical Address Register, PAR, of the current security state receives the PA during any VA to PA translation.

The PAR is:
• a 32-bit read/write register
• accessible only in privileged modes
• when the Security Extensions are implemented, a Banked register.

Write access to the register means its contents can be context switched.

The PAR format depends on the value of bit [0]. Bit [0] indicates whether the address translation operation completed successfully.

If the translation completed successfully, the format of the PAR is:

Bits [31:12]  PA
Bit [11]      (0)
Bit [10]      NOS
Bit [9]       NS
Bit [8]       IMPLEMENTATION DEFINED
Bit [7]       SH
Bits [6:4]    Inner[2:0]
Bits [3:2]    Outer[1:0]
Bit [1]       SS
Bit [0]       F

PA, bits [31:12]
Physical Address.
The physical address corresponding to the supplied virtual address. Address bits [31:12] are returned.

Bit [11]
Reserved. UNK/SBZP.

Bits [10:1]
Return information from the translation table entry used for the translation:

NOS, bit [10]
Not Outer Shareable attribute. Indicates whether the physical memory is Outer Shareable:
0  Memory is Outer Shareable
1  Memory is not Outer Shareable.
On an implementation that does not support Outer Shareable, this bit is UNK/SBZP.

NS, bit [9]
Non-secure. The NS bit from the translation table entry.

Bit [8]
IMPLEMENTATION DEFINED.

SH, bit [7]
Shareable attribute. Indicates whether the physical memory is Shareable:
0  Memory is Non-shareable
1  Memory is Shareable.

Inner[2:0], bits [6:4]
Inner memory attributes from the translation table entry. Permitted values are:
0b111  Write-Back, no Write-Allocate
0b110  Write-Through
0b101  Write-Back, Write-Allocate
0b011  Device
0b001  Strongly-ordered
0b000  Non-cacheable.
Other encodings for Inner[2:0] are reserved.

Outer[1:0], bits [3:2]
Outer memory attributes from the translation table entry. Possible values are:
0b11  Write-Back, no Write-Allocate
0b10  Write-Through, no Write-Allocate
0b01  Write-Back, Write-Allocate
0b00  Non-cacheable.

SS, bit [1]
Supersection. Used to indicate if the result is a Supersection:
0  Page is not a Supersection, that is, PAR[31:12] contains PA[31:12], regardless of the page size.
1  Page is part of a Supersection:
   • PAR[31:24] contains PA[31:24]
   • PAR[23:16] contains PA[39:32]
   • PAR[15:12] contains 0b0000.
If an implementation supports fewer than 40 bits of physical address, the bits in the PAR field that correspond to physical address bits that are not implemented are UNKNOWN.

Note
PA[23:12] is the same as VA[23:12] for Supersections.

F, bit [0]
F bit is 0 if the conversion completed successfully.
In the Inner[2:0] and Outer[1:0] fields, an implementation that does not support all of the attributes can report the memory type behavior that the cache does support, rather than the value held in the translation table entry.

If the translation fails without generating an abort, the format of the PAR is:

Bits [31:7]  UNK/SBZP
Bits [6:1]   FS
Bit [0]      F

Bits [31:7]
UNK/SBZP.

FS, bits [6:1]
Fault status bits. Bits [12,10,3:0] from the Data Fault Status Register indicate the source of the abort. For more information, see c5, Data Fault Status Register (DFSR) on page B3-121.

F, bit [0]
F bit is 1 if the conversion aborted.

The VA to PA translation only generates an abort if the translation fails because an external abort occurred on a translation table walk request. In this case:
• If the external abort is synchronous, the DFSR and DFAR of the security state in which the abort is handled are updated. The DFSR indicates the appropriate external abort on Translation fault, and the DFAR indicates the MVA that caused the translation. PAR is UNKNOWN.
• If the external abort is asynchronous, the DFSR of the security state in which the abort is handled is updated when the abort is taken. The DFSR indicates the asynchronous external abort. The DFAR is not updated. PAR is UNKNOWN.

For all other cases where the VA to PA translation fails:
• no abort is generated, and the DFSR and DFAR are unchanged
• the PAR [6:1] field is updated with an FSR encoding that indicates the fault
• the PAR bit [0] is set to 1.

Implementations that do not support all attributes can report the behavior for those memory types that the cache does support.
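The two PAR formats above can be told apart by bit [0] alone, so software reading the PAR after a translation operation typically branches on F first. The following C is a sketch of such a decoder, assuming the caller still has the original VA available: the page offset PA[11:0] is taken from VA[11:0], and for a Supersection PA[23:0] is taken from the VA as the Note above permits. The struct and function names are hypothetical, and the FS-to-DFSR mapping assumes the six FS bits hold DFSR bits [12], [10] and [3:0] in that order, most significant first.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical decoded view of a 32-bit PAR value. */
struct par_info {
    bool     fault;               /* F, bit [0] */
    /* The following fields are meaningful only when fault is false. */
    uint64_t pa;                  /* PA rebuilt from the PAR and the original VA */
    bool     supersection;        /* SS, bit [1] */
    unsigned outer;               /* Outer[1:0], bits [3:2] */
    unsigned inner;               /* Inner[2:0], bits [6:4] */
    bool     shareable;           /* SH, bit [7] */
    bool     ns;                  /* NS, bit [9] */
    bool     not_outer_shareable; /* NOS, bit [10] */
    /* Meaningful only when fault is true. */
    unsigned fs;                  /* FS, bits [6:1] */
};

static struct par_info decode_par(uint32_t par, uint32_t va)
{
    struct par_info r = {0};

    r.fault = (par & 1u) != 0;
    if (r.fault) {
        r.fs = (par >> 1) & 0x3Fu;
        return r;
    }
    r.supersection        = ((par >> 1) & 1u) != 0;
    r.outer               = (par >> 2) & 0x3u;
    r.inner               = (par >> 4) & 0x7u;
    r.shareable           = ((par >> 7) & 1u) != 0;
    r.ns                  = ((par >> 9) & 1u) != 0;
    r.not_outer_shareable = ((par >> 10) & 1u) != 0;

    if (r.supersection) {
        /* PAR[31:24] = PA[31:24], PAR[23:16] = PA[39:32];
         * PA[23:0] comes from the VA (PA[23:12] == VA[23:12]). */
        r.pa = ((uint64_t)((par >> 16) & 0xFFu) << 32)
             | (uint64_t)(par & 0xFF000000u)
             | (uint64_t)(va & 0x00FFFFFFu);
    } else {
        /* PAR[31:12] = PA[31:12]; the page offset PA[11:0] equals VA[11:0]. */
        r.pa = (uint64_t)(par & 0xFFFFF000u) | (va & 0xFFFu);
    }
    return r;
}

/* Assumption: FS holds DFSR bits [12], [10] and [3:0], most significant
 * first. Move them back to their DFSR bit positions. */
static uint32_t par_fs_to_dfsr_bits(unsigned fs)
{
    return ((uint32_t)((fs >> 5) & 1u) << 12)
         | ((uint32_t)((fs >> 4) & 1u) << 10)
         | (uint32_t)(fs & 0xFu);
}
```

Returning the PA as a 64-bit value keeps the Supersection case, which can produce up to 40 address bits, in the same type as the ordinary case.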
Accessing the PAR and the VA to PA translation operations

To access one of the VA to PA translation operations you write the CP15 registers with <opc1> set to 0, <CRn> set to c7, <CRm> set to c8, and <opc2> set to the value shown in Table B3-33 on page B3-131 or Table B3-34 on page B3-131. With register Rt containing the original VA this gives:

MCR p15,0,<Rt>,c7,c8,<opc2>

To read the PAR you read the CP15 registers with <opc1> set to 0, <CRn> set to c7, <CRm> set to c4, and <opc2> set to 0. To return the translated PA in register Rt this gives:

MRC p15,0,<Rt>,c7,c4,0

The PAR is a read/write register, and you can write to the CP15 registers with the same settings to write to the register. There is no translation operation that requires writing to this register, but the write operation might be required to restore the value of the PAR after a context switch.

An example of a VA to PA translation when the Security Extensions are not implemented is:

MCR p15,0,<Rt>,c7,c8,2    ; Write CP15 VA to User Read VA to PA Translation Register
MRC p15,0,<Rt>,c7,c4,0    ; Read CP15 PA from Physical Address Register

An example of a VA to PA translation when the Security Extensions are implemented and the processor is in the Secure state is:

MCR p15,0,<Rt>,c7,c8,5    ; Write VA to Other State Privileged Write VA to PA Translation Register
                          ; Performs VA to PA translation for Non-secure security state
MRC p15,0,<Rt>,c7,c4,0    ; Read PA from Physical Address Register

VA to PA translation when the MMU is disabled

The VA to PA translation operations occur even when the MMU is disabled. The operations report the flat address mapping and the MMU-disabled value of the attributes and permissions for the data side accesses. These include any MMU-disabled re-mapping specified by the TEX remap facilities. The SS (Supersection) bit is 0 when the MMU is disabled.
For more information about the address and attributes returned when the MMU is disabled see Enabling and disabling the MMU on page B3-5.

When the Security Extensions are implemented, this information applies when the MMU is disabled in the security state for which the VA to PA translation is performed.

B3.12.33 CP15 c7, Miscellaneous functions

CP15 c7 provides a number of functions, summarized in Figure B3-10 on page B3-65. This section describes only the CP15 c7 miscellaneous operations. Figure B3-19 shows the CP15 c7 miscellaneous operations. It does not show the other CP15 c7 operations.

[Figure B3-19 CP15 c7 Miscellaneous operations. Write-only encodings with CRn = c7 and opc1 = 0: CRm = c0, opc2 = 4, NOP (was Wait For Interrupt, CP15WFI, in ARMv6); CRm = c5, opc2 = 4, CP15ISB, Instruction Synchronization Barrier operation; CRm = c10, opc2 = 4, CP15DSB, Data Synchronization Barrier operation; CRm = c10, opc2 = 5, CP15DMB, Data Memory Barrier operation; CRm = c13, opc2 = 1, NOP (was Prefetch instruction by MVA in ARMv6). In the original figure, bold text marks the operations accessible in User mode.]

The CP15 c7 miscellaneous operations are described in:
• CP15 c7, Data and Instruction Barrier operations on page B3-137
• CP15 c7, No Operation (NOP) on page B3-138.

CP15 c7, Data and Instruction Barrier operations

ARMv6 includes two CP15 c7 operations to perform Data Barrier operations, and another operation to perform an Instruction Barrier operation. In ARMv7:
• The ARM and Thumb instruction sets include instructions to perform the barrier operations, which can be executed in all modes, see Memory barriers on page A3-47.
• The CP15 c7 operations are defined as write-only operations, which can be executed in all modes. The three operations are described in:
  - Instruction Synchronization Barrier operation
  - Data Synchronization Barrier operation
  - Data Memory Barrier operation.
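Each of the three deprecated barrier operations is an ordinary MCR to CP15 c7, using the CRm and opc2 values given in the subsections that follow. The following C is a sketch that assembles the 32-bit ARM-state instruction word for MCR and MRC. The bit layout is the standard A1 coprocessor register transfer encoding, restated here as an assumption rather than taken from this section, and the function and macro names are hypothetical. This is mainly useful for recognizing the old encodings when inspecting ARMv6-era code, because ARMv7 software should use the ISB, DSB and DMB instructions instead.

```c
#include <assert.h>
#include <stdint.h>

#define COND_AL 0xEu  /* "always" condition code */

/* ARM-state (A1) encoding of MCR and MRC:
 *   cond[31:28] 1110[27:24] opc1[23:21] L[20] CRn[19:16] Rt[15:12]
 *   coproc[11:8] opc2[7:5] 1[4] CRm[3:0]
 * with L = 0 for MCR (write to coprocessor) and L = 1 for MRC (read). */
static uint32_t encode_cp_xfer(int is_mrc, uint32_t cond, uint32_t coproc,
                               uint32_t opc1, uint32_t rt, uint32_t crn,
                               uint32_t crm, uint32_t opc2)
{
    assert(cond <= 0xEu && coproc <= 15u && opc1 <= 7u && rt <= 15u &&
           crn <= 15u && crm <= 15u && opc2 <= 7u);
    return (cond << 28) | (0xEu << 24) | (opc1 << 21) |
           ((is_mrc ? 1u : 0u) << 20) | (crn << 16) | (rt << 12) |
           (coproc << 8) | (opc2 << 5) | (1u << 4) | crm;
}

/* Deprecated ARMv7 CP15 c7 barrier operations, MCR p15,0,<Rt>,c7,<CRm>,<opc2>.
 * The value in Rt is ignored, so any register number can be used. */
static uint32_t cp15_isb(uint32_t rt) { return encode_cp_xfer(0, COND_AL, 15, 0, rt, 7, 5, 4); }
static uint32_t cp15_dsb(uint32_t rt) { return encode_cp_xfer(0, COND_AL, 15, 0, rt, 7, 10, 4); }
static uint32_t cp15_dmb(uint32_t rt) { return encode_cp_xfer(0, COND_AL, 15, 0, rt, 7, 10, 5); }
```

The same helper covers the other CP15 c7 accesses in this section; for instance, encode_cp_xfer(1, COND_AL, 15, 0, 0, 7, 4, 0) is the MRC that reads the PAR into r0.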
The value in the register Rt specified by the MCR instruction used to perform one of these operations is ignored. You do not have to write a value to the register before issuing the MCR instruction.

In ARMv7 using these CP15 c7 operations is deprecated. Use the ISB, DSB, and DMB instructions instead.

Note
• In ARMv6 and earlier documentation, the Instruction Synchronization Barrier operation is referred to as a Prefetch Flush (PFF).
• In versions of the ARM architecture before ARMv6 the Data Synchronization Barrier operation is described as a Data Write Barrier (DWB).

Instruction Synchronization Barrier operation
In ARMv7, the ISB instruction is used to perform an Instruction Synchronization Barrier, see ISB on page A8-102. The deprecated CP15 c7 encoding for an Instruction Synchronization Barrier is <opc1> set to 0, <CRn> set to c7, <CRm> set to c5, and <opc2> set to 4.

Data Synchronization Barrier operation
In ARMv7, the DSB instruction is used to perform a Data Synchronization Barrier, see DSB on page A8-92. The deprecated CP15 c7 encoding for a Data Synchronization Barrier is <opc1> set to 0, <CRn> set to c7, <CRm> set to c10, and <opc2> set to 4. This operation performs the full system barrier performed by the DSB instruction.

Data Memory Barrier operation
In ARMv7, the DMB instruction is used to perform a Data Memory Barrier, see DMB on page A8-90. The deprecated CP15 c7 encoding for a Data Memory Barrier is <opc1> set to 0, <CRn> set to c7, <CRm> set to c10, and <opc2> set to 5. This operation performs the full system barrier performed by the DMB instruction.

CP15 c7, No Operation (NOP)

ARMv6 includes two CP15 c7 operations that are not supported in ARMv7, with encodings that become No Operation (NOP) in ARMv7. These are:
• The Wait For Interrupt (CP15WFI) operation. In ARMv7 this operation is performed by the WFI instruction, which is available in the ARM and Thumb instruction sets.
For more information, see WFI on page A8-810.
• The prefetch instruction by MVA operation. In ARMv7 this operation is replaced by the PLI instruction, which is available in the ARM and Thumb instruction sets. For more information, see PLI (immediate, literal) on page A8-242 and PLI (register) on page A8-244.

In ARMv7, the CP15 c7 encodings that were used for these operations must be valid write-only operations that perform a NOP. These encodings are:
• for the ARMv6 CP15WFI operation: <opc1> set to 0, <CRn> set to c7, <CRm> set to c0, and <opc2> set to 4
• for the ARMv6 prefetch instruction by MVA operation: <opc1> set to 0, <CRn> set to c7, <CRm> set to c13, and <opc2> set to 1.

B3.12.34 CP15 c8, TLB maintenance operations

On ARMv7-A implementations, CP15 c8 operations are used for TLB maintenance functions. Figure B3-20 shows the CP15 c8 encodings.

[Figure B3-20 CP15 c8 operations. Write-only encodings with CRn = c8, opc1 = 0 and CRm = c3, c5, c6 or c7; the individual operations are listed in Table B3-35. * See text for more information about these mnemonics. ‡ Part of the Multiprocessing Extensions.]

CP15 c8 encodings not shown in Figure B3-20 are UNPREDICTABLE, see Unallocated CP15 encodings on page B3-69.
The CP15 c8 TLB maintenance functions:
• are write-only operations
• can be executed only in privileged modes.

Table B3-35 summarizes the TLB maintenance operations.

Table B3-35 CP15 c8 TLB maintenance operations

CRm  opc2  Mnemonic      Function                                                              Rt data
c3   0     TLBIALLIS     Invalidate entire unified TLB (d) Inner Shareable (a)                 Ignored
c3   1     TLBIMVAIS     Invalidate unified TLB (d) entry by MVA Inner Shareable (a)           MVA
c3   2     TLBIASIDIS    Invalidate unified TLB (d) by ASID match Inner Shareable (a)          ASID
c3   3     TLBIMVAAIS    Invalidate unified TLB (d) entry by MVA all ASID Inner Shareable (a)  MVA
c5   0     ITLBIALL      Invalidate entire instruction TLB (b)                                 Ignored
c5   1     ITLBIMVA      Invalidate instruction TLB (b) entry by MVA                           MVA
c5   2     ITLBIASID     Invalidate instruction TLB (b) by ASID match                          ASID
c6   0     DTLBIALL      Invalidate entire data TLB (b)                                        Ignored
c6   1     DTLBIMVA      Invalidate data TLB (b) entry by MVA                                  MVA
c6   2     DTLBIASID     Invalidate data TLB (b) by ASID match                                 ASID
c7   0     TLBIALL (c)   Invalidate entire unified TLB (d)                                     Ignored
c7   1     TLBIMVA (c)   Invalidate unified TLB (d) entry by MVA                               MVA
c7   2     TLBIASID (c)  Invalidate unified TLB (d) by ASID match                              ASID
c7   3     TLBIMVAA      Invalidate unified TLB (d) entries by MVA all ASID (a)                MVA

a. Implemented only as part of the Multiprocessing Extensions.
b. If these operations are performed on an implementation that has a unified TLB they operate on the unified TLB.
c. The mnemonics for the operations with CRm==c7, opc2=={0,1,2} were previously UTLBIALL, UTLBIMVA and UTLBIASID.
d. When separate instruction and data TLBs are implemented, these operations are performed on both TLBs.

For more information about the Inner Shareable operations see Multiprocessor effects on TLB maintenance operations on page B3-62.

For information about the effect of these operations on locked TLB entries see The interaction of TLB maintenance operations with TLB lockdown on page B3-57.
About the TLB maintenance operations

For more information about TLBs and their maintenance see Translation Lookaside Buffers (TLBs) on page B3-54, and in particular TLB maintenance on page B3-56.

The following subsections give more information about the TLB maintenance operations:
• Invalidate entire TLB
• Invalidate single TLB entry by MVA
• Invalidate TLB entries by ASID match
• Invalidate TLB entries by MVA all ASID on page B3-141.

As stated in the footnotes to Table B3-35 on page B3-139:
• If an Instruction TLB or Data TLB operation is used on a system that implements a Unified TLB then the operation is performed on the Unified TLB.
• If a Unified TLB operation is used on a system that implements separate Instruction and Data TLBs then the operation is performed on both the Instruction TLB and the Data TLB.
• The mnemonics for the operations to invalidate a unified TLB that are defined in the ARMv7 base architecture were previously UTLBIALL, UTLBIMVA, and UTLBIASID. These remain synonyms for these operations, but ARM deprecates the use of the older names. These are the operations with CRm==c7, opc2=={0,1,2}.

For information about the synchronization of the TLB maintenance operations see TLB maintenance operations and the memory order model on page B3-59.

Invalidate entire TLB
The Invalidate entire TLB operations invalidate all unlocked entries in the TLB. The value in the register Rt specified by the MCR instruction used to perform the operation is ignored. You do not have to write a value to the register before issuing the MCR instruction.

Invalidate single TLB entry by MVA
The Invalidate single entry operations invalidate a TLB entry that matches the MVA and ASID values provided as an argument to the operation. The register format required is:

Bits [31:12]  MVA
Bits [11:8]   SBZ
Bits [7:0]    ASID

With global entries in the TLB, the supplied ASID value is not checked.
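The Rt argument for the by-MVA operation above, and for the ASID-match and MVA-all-ASID operations described in the subsections that follow, is simple bit packing. The following C is a minimal sketch of those three formats; the function names are hypothetical. Masking the MVA to bits [31:12] keeps the SBZ bits zero even when the caller passes an address that is not page aligned, and taking the ASID as an 8-bit type keeps bits [11:8] zero.

```c
#include <assert.h>
#include <stdint.h>

/* Rt for TLBIMVA, TLBIMVAIS, ITLBIMVA and DTLBIMVA:
 * MVA[31:12] in bits [31:12], SBZ in bits [11:8], ASID in bits [7:0]. */
static uint32_t tlbi_mva_asid_arg(uint32_t mva, uint8_t asid)
{
    return (mva & 0xFFFFF000u) | asid;
}

/* Rt for TLBIASID, TLBIASIDIS, ITLBIASID and DTLBIASID:
 * SBZ in bits [31:8], ASID in bits [7:0]. */
static uint32_t tlbi_asid_arg(uint8_t asid)
{
    return asid;
}

/* Rt for TLBIMVAA and TLBIMVAAIS:
 * MVA[31:12] in bits [31:12], SBZ in bits [11:0]. */
static uint32_t tlbi_mva_all_asid_arg(uint32_t mva)
{
    return mva & 0xFFFFF000u;
}
```

For a global mapping, the ASID passed to tlbi_mva_asid_arg does not affect which entry is invalidated, because the ASID is not checked for global entries.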
Invalidate TLB entries by ASID match
The Invalidate on ASID Match operations invalidate all TLB entries for non-global pages that match the ASID value provided as an argument to the operation. The register format required is:

Bits [31:8]  SBZ
Bits [7:0]   ASID

Invalidate TLB entries by MVA all ASID
The Invalidate TLB entries by MVA all ASID operations invalidate all TLB entries that match the MVA provided as an argument to the operation, regardless of the ASID. The register format required is:

Bits [31:12]  MVA
Bits [11:0]   SBZ

Accessing the CP15 c8 TLB maintenance operations
To perform one of the TLB maintenance operations you write the CP15 registers with <opc1> == 0, <CRn> == c8, and <CRm> and <opc2> set to the values shown in Table B3-35 on page B3-139. That is:

MCR p15,0,<Rt>,c8,<CRm>,<opc2>

For example:

MCR p15,0,<Rt>,c8,c5,0    ; Invalidate all unlocked entries in Instruction TLB
MCR p15,0,<Rt>,c8,c6,2    ; Invalidate Data TLB entries on ASID match

B3.12.35 CP15 c9, Cache and TCM lockdown registers and performance monitors

Some CP15 c9 encodings are reserved for IMPLEMENTATION DEFINED memory system functions, in particular:
• cache control, including lockdown
• TCM control, including lockdown
• branch predictor control.

Additional CP15 c9 encodings are reserved for performance monitors. These encodings fall into two groups:
• the optional performance monitors, described in Chapter C9 Performance Monitors
• additional IMPLEMENTATION DEFINED performance monitors.

The reserved encodings permit implementations that are compatible with previous versions of the ARM architecture, in particular with the ARMv6 requirements.

Figure B3-21 shows the permitted CP15 c9 register encodings.
[Figure B3-21 Permitted CP15 c9 encodings. With CRn = c9, opc1 = {0-7} and opc2 = {0-7}: CRm = {c0-c2} and {c5-c8} are reserved for branch predictor, cache and TCM operations; CRm = {c12-c14} are reserved for ARM-recommended performance monitors; CRm = c15 is reserved for IMPLEMENTATION DEFINED performance monitors. For all of these, access (‡) depends on the operation.]

CP15 c9 encodings not shown in Figure B3-21 on page B3-141 are UNPREDICTABLE, see Unallocated CP15 encodings on page B3-69.

In ARMv6, CP15 c9 provides cache lockdown functions. With the ARMv7 abstraction of the hierarchical memory model, for CP15 c9:
• All encodings with CRm = {c0-c2, c5-c8} are reserved for IMPLEMENTATION DEFINED cache, branch predictor and TCM operations. This reservation enables the implementation of a scheme that is backwards compatible with ARMv6. For details of the ARMv6 implementation see c9, Cache lockdown support on page AppxG-45.

  Note
  In an ARMv6 implementation that implements the Security Extensions, a Cache Behavior Override Register is required in CP15 c9, with CRm = c8, see c9, Cache Behavior Override Register (CBOR) on page AppxG-49. This register is not architecturally defined in ARMv7, and therefore the CP15 c9 encoding with CRm = c8 is IMPLEMENTATION DEFINED. However, an ARMv7 implementation can include the CBOR, in which case ARM recommends that this encoding is used for it.

• All encodings with CRm = {c12-c14} are reserved for the optional performance monitors that are defined in Chapter C9 Performance Monitors.
• All encodings with CRm = c15 are reserved for IMPLEMENTATION DEFINED performance monitoring features.

B3.12.36 CP15 c10, Memory remapping and TLB control registers

On ARMv7-A implementations, CP15 c10 is used for memory remapping registers.
In addition, some encodings are reserved for IMPLEMENTATION DEFINED TLB control functions, in particular TLB lockdown. The reserved encodings permit implementations that are compatible with previous versions of the ARM architecture, in particular with the ARMv6 requirements.

Figure B3-22 shows the CP15 c10 registers and reserved encodings.

[Figure B3-22 CP15 c10 registers. With CRn = c10 and opc1 = 0: CRm = {c0, c1, c4, c8}, opc2 = {0-7} are reserved for TLB lockdown operations, with access depending on the operation; CRm = c2, opc2 = 0 is PRRR, Primary Region Remap Register; CRm = c2, opc2 = 1 is NMRR, Normal Memory Remap Register.]

CP15 c10 encodings not shown in Figure B3-22 are UNPREDICTABLE, see Unallocated CP15 encodings on page B3-69.

The CP15 c10 memory remap registers are described in CP15 c10, Memory Remap Registers on page B3-143.

The IMPLEMENTATION DEFINED TLB control operations

In VMSAv6, CP15 c10 provides TLB lockdown functions. In VMSAv7, the TLB lockdown mechanism is IMPLEMENTATION DEFINED and some CP15 c10 encodings are reserved for IMPLEMENTATION DEFINED TLB control operations. These are the encodings with <CRn> == c10, <opc1> == 0, <CRm> == {c0, c1, c4, c8}, and <opc2> == {0-7}.

B3.12.37 CP15 c10, Memory Remap Registers

CP15 c10 includes two Memory Remap Registers, described in the subsections:
• c10, Primary Region Remap Register (PRRR)
• c10, Normal Memory Remap Register (NMRR) on page B3-146.

In addition:
• The significance and use of these registers is described in Memory region attribute descriptions when TEX remap is enabled on page B3-34.
• The function of these registers is architecturally defined only when either:
  - the SCTLR.TRE bit is set to 1, or
  - the SCTLR.TRE bit is set to 0 and no IMPLEMENTATION DEFINED mechanism using MMU remap has been invoked.
  Otherwise their behavior is IMPLEMENTATION DEFINED, see SCTLR.TRE, SCTLR.M, and the effect of the MMU remap registers on page B3-38.
c10, Primary Region Remap Register (PRRR)

The Primary Region Remap Register, PRRR, can in some cases control the top level mapping of the TEX[0], C, and B memory region attributes.

The PRRR:
• is a 32-bit read/write register
• is accessible only in privileged modes
• when the Security Extensions are implemented:
  - is a Banked register
  - has write access to the Secure copy of the register disabled when the CP15SDISABLE signal is asserted HIGH.

The format of the PRRR is:

Bits [31:24]  NOS7, NOS6, NOS5, NOS4, NOS3, NOS2, NOS1, NOS0 (one bit each, NOSn at bit [24+n])
Bits [23:20]  UNK/SBZP
Bit [19]      NS1
Bit [18]      NS0
Bit [17]      DS1
Bit [16]      DS0
Bits [15:0]   TR7, TR6, TR5, TR4, TR3, TR2, TR1, TR0 (two bits each, TRn at bits [2n+1:2n])

The reset value of the PRRR is IMPLEMENTATION DEFINED.

NOSn, bit [24+n], for values of n from 0 to 7
Outer Shareable property mapping for memory attributes n, if the region is mapped as Normal memory that is Shareable. n is the value of the TEX[0], C and B bits, see Table B3-36 on page B3-145. The possible values of each NOSn bit are:
0  Shareable Normal memory region is Outer Shareable
1  Shareable Normal memory region is Inner Shareable.
The value of this bit is ignored if the region is not Shareable Normal memory.
The meaning of the field with n = 6 is IMPLEMENTATION DEFINED and might differ from the meaning given here. This is because the meaning of the attribute combination {TEX[0] = 1, C = 1, B = 0} is IMPLEMENTATION DEFINED.
If the implementation does not support the Outer Shareable memory attribute then these bits are reserved, RAZ/SBZP.

Bits [23:20]
Reserved. UNK/SBZP.

NS1, bit [19]
Mapping of S = 1 attribute for Normal memory. This bit gives the mapped Shareable attribute for a region of memory that:
• is mapped as Normal memory
• has the S bit set to 1.
The possible values of the bit are:
0  Region is not Shareable
1  Region is Shareable.
NS0, bit [18]
Mapping of S = 0 attribute for Normal memory. This bit gives the mapped Shareable attribute for a region of memory that:
• is mapped as Normal memory
• has the S bit set to 0.
The possible values of the bit are the same as those given for the NS1 bit, bit [19].

DS1, bit [17]
Mapping of S = 1 attribute for Device memory. This bit gives the mapped Shareable attribute for a region of memory that:
• is mapped as Device memory
• has the S bit set to 1.
The possible values of the bit are the same as those given for the NS1 bit, bit [19].

DS0, bit [16]
Mapping of S = 0 attribute for Device memory. This bit gives the mapped Shareable attribute for a region of memory that:
• is mapped as Device memory
• has the S bit set to 0.
The possible values of the bit are the same as those given for the NS1 bit, bit [19].

TRn, bits [2n+1:2n], for values of n from 0 to 7
Primary TEX mapping for memory attributes n. n is the value of the TEX[0], C and B bits, see Table B3-36. This field defines the mapped memory type for a region with attributes n. The possible values of the field are:
00   Strongly-ordered
01   Device
10   Normal memory
11   Reserved, effect is UNPREDICTABLE.
The meaning of the field with n = 6 is IMPLEMENTATION DEFINED and might differ from the meaning given here. This is because the meaning of the attribute combination {TEX[0] = 1, C = 1, B = 0} is IMPLEMENTATION DEFINED.

Table B3-36 shows the mapping between the memory region attributes and the n value used in the PRRR.NOSn and PRRR.TRn field descriptions.

Table B3-36 Memory attributes and the n value for the PRRR field descriptions
TEX[0]   C   B   n value
0        0   0   0
0        0   1   1
0        1   0   2
0        1   1   3
1        0   0   4
1        0   1   5
1        1   0   6
1        1   1   7

For more information about the PRRR see Memory region attribute descriptions when TEX remap is enabled on page B3-34.

Accessing the PRRR
To access the PRRR you read or write the CP15 registers with <opc1> set to 0, <CRn> set to c10, <CRm> set to c2, and <opc2> set to 0. For example:
MRC p15,0,<Rd>,c10,c2,0 ; Read CP15 Primary Region Remap Register
MCR p15,0,<Rd>,c10,c2,0 ; Write CP15 Primary Region Remap Register

c10, Normal Memory Remap Register (NMRR)
The Normal Memory Remap Register, NMRR, can in some cases provide additional mapping controls for memory regions that are mapped as Normal memory by their entry in the PRRR. The NMRR:
• is a 32-bit read/write register
• is accessible only in privileged modes
• when the Security Extensions are implemented:
  - is a Banked register
  - has write access to the Secure copy of the register disabled when the CP15SDISABLE signal is asserted HIGH.

The format of the NMRR is:
Bits [31:16]   OR7 to OR0 (ORn is bits [2n+17:2n+16])
Bits [15:0]    IR7 to IR0 (IRn is bits [2n+1:2n])

The reset value of the NMRR is IMPLEMENTATION DEFINED.

ORn, bits [2n+17:2n+16], for values of n from 0 to 7
Outer Cacheable property mapping for memory attributes n, if the region is mapped as Normal memory by the TRn entry in the PRRR, see c10, Primary Region Remap Register (PRRR) on page B3-143. n is the value of the TEX[0], C and B bits, see Table B3-36 on page B3-145. The possible values of this field are:
00   Region is Non-cacheable
01   Region is Write-Back, Write-Allocate
10   Region is Write-Through, Non-Write-Allocate
11   Region is Write-Back, Non-Write-Allocate.
The meaning of the field with n = 6 is IMPLEMENTATION DEFINED and might differ from the meaning given here. This is because the meaning of the attribute combination {TEX[0] = 1, C = 1, B = 0} is IMPLEMENTATION DEFINED.

IRn, bits [2n+1:2n], for values of n from 0 to 7
Inner Cacheable property mapping for memory attributes n, if the region is mapped as Normal memory by the TRn entry in the PRRR, see c10, Primary Region Remap Register (PRRR) on page B3-143. n is the value of the TEX[0], C and B bits, see Table B3-36 on page B3-145. The possible values of this field are the same as those given for the ORn field.
The meaning of the field with n = 6 is IMPLEMENTATION DEFINED and might differ from the meaning given here. This is because the meaning of the attribute combination {TEX[0] = 1, C = 1, B = 0} is IMPLEMENTATION DEFINED.

For more information about the NMRR see Memory region attribute descriptions when TEX remap is enabled on page B3-34.

Accessing the NMRR
To access the NMRR you read or write the CP15 registers with <opc1> set to 0, <CRn> set to c10, <CRm> set to c2, and <opc2> set to 1. For example:
MRC p15,0,<Rd>,c10,c2,1 ; Read CP15 Normal Memory Remap Register
MCR p15,0,<Rd>,c10,c2,1 ; Write CP15 Normal Memory Remap Register

B3.12.38 CP15 c11, Reserved for TCM DMA registers
Some CP15 c11 register encodings are reserved for IMPLEMENTATION DEFINED DMA operations to and from TCM, see Figure B3-23.

Figure B3-23 Permitted CP15 c11 encodings
CRn   opc1    CRm       opc2    Use
c11   {0-7}   {c0-c8}   {0-7}   Reserved for DMA operations for TCM access (access depends on the operation)
c11   {0-7}   c15       {0-7}   Reserved for DMA operations for TCM access (access depends on the operation)

CP15 c11 encodings not shown in Figure B3-23 are UNPREDICTABLE, see Unallocated CP15 encodings on page B3-69.
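Returning to the PRRR and NMRR described in B3.12.37, their field descriptions can be combined into a single lookup. The following Python sketch decodes the remapped attributes for one combination of TEX[0], C, B and S bits. It assumes that TEX remap is enabled and architecturally defined (SCTLR.TRE set to 1) and that the implementation supports the Outer Shareable attribute. The function and dictionary names are hypothetical illustrations, not the architecture's RemappedTEXDecode() pseudocode, and on real hardware the result for n = 6 is IMPLEMENTATION DEFINED.

```python
# Sketch only: decodes PRRR/NMRR fields for one set of TEX[0], C, B, S bits.
# Assumes SCTLR.TRE == 1 and Outer Shareable support; all names are hypothetical.

MEM_TYPE = {0b00: "Strongly-ordered", 0b01: "Device", 0b10: "Normal", 0b11: "Reserved"}

CACHE_POLICY = {
    0b00: "Non-cacheable",
    0b01: "Write-Back, Write-Allocate",
    0b10: "Write-Through, Non-Write-Allocate",
    0b11: "Write-Back, Non-Write-Allocate",
}


def remap_index(tex0: int, c: int, b: int) -> int:
    """Return n = TEX[0]:C:B, as listed in Table B3-36."""
    return (tex0 << 2) | (c << 1) | b


def decode_remap(prrr: int, nmrr: int, tex0: int, c: int, b: int, s: int) -> dict:
    n = remap_index(tex0, c, b)
    tr = (prrr >> (2 * n)) & 0b11            # TRn, PRRR bits [2n+1:2n]
    attrs = {"n": n, "type": MEM_TYPE[tr]}
    if tr == 0b01:
        # Device memory: DS1 (bit 17) maps S = 1, DS0 (bit 16) maps S = 0
        attrs["shareable"] = bool((prrr >> (17 if s else 16)) & 1)
    elif tr == 0b10:
        # Normal memory: NS1 (bit 19) maps S = 1, NS0 (bit 18) maps S = 0
        shareable = bool((prrr >> (19 if s else 18)) & 1)
        attrs["shareable"] = shareable
        if shareable:
            # NOSn, bit [24+n]: 0 = Outer Shareable, 1 = Inner Shareable
            attrs["outer_shareable"] = ((prrr >> (24 + n)) & 1) == 0
        attrs["inner"] = CACHE_POLICY[(nmrr >> (2 * n)) & 0b11]       # IRn
        attrs["outer"] = CACHE_POLICY[(nmrr >> (2 * n + 16)) & 0b11]  # ORn
    return attrs
```

For example, with PRRR = 0x00080008 (TR1 = Normal, NS1 = 1) and NMRR = 0x000C0004 (IR1 = 0b01, OR1 = 0b11), a region with TEX[0] = 0, C = 0, B = 1 and S = 1 decodes as Outer Shareable Normal memory that is Inner Write-Back, Write-Allocate and Outer Write-Back, Non-Write-Allocate.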
B3.12.39 CP15 c12, Security Extensions registers
When the Security Extensions are implemented, CP15 c12 is used for the Vector Base Address Registers and an Interrupt Status Register. Figure B3-24 shows the CP15 c12 Security Extensions registers.

Figure B3-24 Security Extensions CP15 c12 registers
CRn   opc1   CRm   opc2   Register
c12   0      c0    0      VBAR, Vector Base Address Register (read/write)
c12   0      c0    1      MVBAR, Monitor Vector Base Address Register (read/write)
c12   0      c1    0      ISR, Interrupt Status Register (read-only)
All three registers are implemented only when the Security Extensions are implemented.

When the Security Extensions are implemented, CP15 c12 encodings not shown in Figure B3-24 are UNPREDICTABLE. On an implementation that does not include the Security Extensions all CP15 c12 encodings are UNDEFINED. For more information, see Unallocated CP15 encodings on page B3-69.

The CP15 c12 registers are described in the subsections:
• c12, Vector Base Address Register (VBAR)
• c12, Monitor Vector Base Address Register (MVBAR) on page B3-149
• c12, Interrupt Status Register (ISR) on page B3-150.

B3.12.40 c12, Vector Base Address Register (VBAR)
When the Security Extensions are implemented and high exception vectors are not selected, the Vector Base Address Register, VBAR, provides the exception base address for exceptions that are not handled in Monitor mode, see Exception vectors and the exception base address on page B1-30. The high exception vectors always have the base address 0xFFFF0000 and are not affected by the value of VBAR.
The VBAR:
• Is present only when the Security Extensions are implemented.
• Is a 32-bit read/write register.
• Is accessible only in privileged modes.
• Has a defined reset value, for the Secure copy of the register, of 0. This reset value does not apply to the Non-secure copy of the register, and software must program the Non-secure copy of the register with the required value, as part of the processor boot sequence.
• Is a Banked register.
• Has write access to the Secure copy of the register disabled when the CP15SDISABLE signal is asserted HIGH.

The format of the VBAR is:
Bits [31:5]   Vector_Base_Address
Bits [4:0]    Reserved, UNK/SBZP

The Secure copy of the VBAR holds the vector base address for the Secure state, described as the Secure exception base address. The Non-secure copy of the VBAR holds the vector base address for the Non-secure state, described as the Non-secure exception base address.

Vector_Base_Address, bits [31:5]
Bits [31:5] of the base address of the normal exception vectors. Bits [4:0] of an exception vector address are the exception offset, see Table B1-3 on page B1-31.

Bits [4:0]
Reserved, UNK/SBZP.

For details of how the VBAR registers are used to determine the exception addresses see Exception vectors and the exception base address on page B1-30.

Accessing the VBAR
To access the VBAR you read or write the CP15 registers with <opc1> set to 0, <CRn> set to c12, <CRm> set to c0, and <opc2> set to 0. For example:
MRC p15,0,<Rd>,c12,c0,0 ; Read CP15 Vector Base Address Register
MCR p15,0,<Rd>,c12,c0,0 ; Write CP15 Vector Base Address Register

B3.12.41 c12, Monitor Vector Base Address Register (MVBAR)
The Monitor Vector Base Address Register, MVBAR, provides the exception base address for all exceptions that are handled in Monitor mode, see Exception vectors and the exception base address on page B1-30.
The MVBAR is:
• present only when the Security Extensions are implemented
• a 32-bit read/write register
• accessible in Secure privileged modes only
• a Restricted access register, meaning it exists only in the Secure state.

The format of the MVBAR is:
Bits [31:5]   Monitor_Vector_Base_Address
Bits [4:0]    Reserved, UNK/SBZP

The reset value of the MVBAR is UNKNOWN.
The MVBAR must be programmed as part of the boot sequence.

Monitor_Vector_Base_Address, bits [31:5]
Bits [31:5] of the base address of the exception vectors for exceptions that are handled in Monitor mode. Bits [4:0] of an exception vector address are the exception offset, see Table B1-3 on page B1-31.

Bits [4:0]
Reserved, UNK/SBZP.

For details of how the MVBAR is used to determine the exception addresses see Exception vectors and the exception base address on page B1-30.

Accessing the MVBAR
To access the MVBAR you read or write the CP15 registers with <opc1> set to 0, <CRn> set to c12, <CRm> set to c0, and <opc2> set to 1. For example:
MRC p15,0,<Rd>,c12,c0,1 ; Read CP15 Monitor Vector Base Address Register
MCR p15,0,<Rd>,c12,c0,1 ; Write CP15 Monitor Vector Base Address Register

B3.12.42 c12, Interrupt Status Register (ISR)
The Interrupt Status Register, ISR, shows whether an IRQ, FIQ or external abort is pending.
The ISR is:
• present only when the Security Extensions are implemented
• a 32-bit read-only register
• accessible only in privileged modes
• a Common register, meaning it is available in the Secure and Non-secure states.

The format of the ISR is:
Bits [31:9]   UNK
Bit [8]       A
Bit [7]       I
Bit [6]       F
Bits [5:0]    Reserved

Bits [31:9]
Reserved, UNK.

A, bit [8]
External abort pending flag:
0   no pending external abort
1   an external abort is pending.

I, bit [7]
Interrupt pending flag. Indicates whether an IRQ interrupt is pending:
0   no pending IRQ
1   an IRQ interrupt is pending.

F, bit [6]
Fast interrupt pending flag. Indicates whether an FIQ fast interrupt is pending:
0   no pending FIQ
1   an FIQ fast interrupt is pending.

Bits [5:0]
Reserved, UNK/SBZP.

The bit positions of the A, I and F flags in the ISR match the A, I and F flag bits in the CPSR, see Program Status Registers (PSRs) on page B1-14. This means the same masks can be used to extract the flags from the register value.
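Because the A, I and F flags occupy the same bit positions in the ISR and the CPSR, one set of masks serves both registers. The following Python sketch, with hypothetical function names, lists the exceptions flagged as pending in an ISR value and, to illustrate sharing the masks, finds the pending exceptions that the current CPSR does not mask (a CPSR A, I or F bit set to 1 disables that exception). It works on plain integers, for example a value already read with MRC p15,0,<Rd>,c12,c1,0, and does not itself access CP15.

```python
# ISR and CPSR share these bit positions: A = bit 8, I = bit 7, F = bit 6.
PSR_A = 1 << 8
PSR_I = 1 << 7
PSR_F = 1 << 6
AIF_MASK = PSR_A | PSR_I | PSR_F


def pending_exceptions(isr: int) -> list:
    """Names of the exceptions flagged as pending in an ISR value."""
    names = []
    if isr & PSR_A:
        names.append("external abort")
    if isr & PSR_I:
        names.append("IRQ")
    if isr & PSR_F:
        names.append("FIQ")
    return names


def unmasked_pending(isr: int, cpsr: int) -> int:
    """A/I/F flags that are pending in the ISR and not masked in the CPSR."""
    return isr & ~cpsr & AIF_MASK
```

For example, an ISR value of 0x0C0 with a CPSR whose I bit is set reports both IRQ and FIQ as pending, but only the FIQ flag survives unmasked_pending().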
Note
• The ISR.F and ISR.I bits directly reflect the state of the FIQ and IRQ inputs.
• The ISR.A bit is set when an asynchronous abort is generated and is cleared automatically when the abort is taken.

Accessing the ISR
To access the ISR you read the CP15 registers with <opc1> set to 0, <CRn> set to c12, <CRm> set to c1, and <opc2> set to 0. For example:
MRC p15,0,<Rd>,c12,c1,0 ; Read Interrupt Status Register

B3.12.43 CP15 c13, Process, context and thread ID registers
The CP15 c13 registers are used for:
• a Context ID register
• three software Thread ID registers
• an FCSE Process ID Register.

Note
From ARMv6, use of the FCSE is deprecated, and in ARMv7 the FCSE is an optional component of a VMSA implementation. ARM expects the FCSE to become obsolete during the lifetime of ARMv7. However, every ARMv7-A implementation must include the FCSE Process ID Register.

Figure B3-25 on page B3-152 shows the CP15 c13 registers:

Figure B3-25 CP15 c13 registers in a VMSA implementation
CRn   opc1   CRm   opc2   Register
c13   0      c0    0      FCSEIDR, FCSE PID Register (access depends on whether the FCSE is implemented)
c13   0      c0    1      CONTEXTIDR, Context ID Register (read/write)
c13   0      c0    2      TPIDRURW, User Read/Write Software Thread ID Register (read/write)
c13   0      c0    3      TPIDRURO, User Read-only Software Thread ID Register (read/write)
c13   0      c0    4      TPIDRPRW, Privileged Only Software Thread ID Register (read/write)

CP15 c13 encodings not shown in Figure B3-25 are UNPREDICTABLE, see Unallocated CP15 encodings on page B3-69.

The CP15 c13 registers are described in:
• c13, FCSE Process ID Register (FCSEIDR)
• c13, Context ID Register (CONTEXTIDR) on page B3-153
• CP15 c13 Software Thread ID registers on page B3-154.

B3.12.44 c13, FCSE Process ID Register (FCSEIDR)
The FCSE Process ID Register, FCSEIDR, identifies the current Process ID (PID) for the Fast Context Switch Extension (FCSE). In ARMv7, the FCSE is optional. However, the FCSEIDR must be implemented regardless of whether the FCSE is implemented.
Software can access this register to determine whether the FCSE is implemented.
The FCSEIDR:
• Is a 32-bit register, with access that depends on whether the FCSE is implemented:
  - FCSE implemented: the register is read/write
  - FCSE not implemented: the register is RAZ/WI.
• Is accessible only in privileged modes.
• When implemented as a read/write register, has a defined reset value of 0. When the Security Extensions are implemented, this reset value applies only to the Secure copy of the register, and software must program the Non-secure copy of the register with the required value.
• When the Security Extensions are implemented, is a Banked register.
When the Security Extensions are implemented and the FCSE is implemented, write access to the Secure copy of the FCSEIDR is disabled when the CP15SDISABLE signal is asserted HIGH.

The format of the FCSEIDR is:
Bits [31:25]   PID
Bits [24:0]    UNK/SBZP

PID, bits [31:25]
The current Process ID, for the FCSE. If the FCSE is not implemented this field is RAZ/WI.

Bits [24:0]
Reserved. If the FCSE is not implemented this field is RAZ/WI. If the FCSE is implemented, the value of this field is UNKNOWN on reads and Should-Be-Zero-or-Preserved on writes.

Note
• When the PID is written, the overall virtual-to-physical address mapping changes. Because of this, you must ensure that instructions that might have been prefetched already are not affected by the address mapping change.
• From ARMv6, use of the FCSE is deprecated, and in ARMv7 the FCSE is optional.

Accessing the FCSEIDR
To access the FCSEIDR you read or write the CP15 registers with <opc1> set to 0, <CRn> set to c13, <CRm> set to c0, and <opc2> set to 0.
For example:
MRC p15,0,<Rd>,c13,c0,0 ; Read CP15 FCSE PID Register
MCR p15,0,<Rd>,c13,c0,0 ; Write CP15 FCSE PID Register

B3.12.45 c13, Context ID Register (CONTEXTIDR)
The Context ID Register, CONTEXTIDR, identifies the current:
• Process Identifier (PROCID)
• Address Space Identifier (ASID).
The value of the whole of this register is called the Context ID and is used by:
• the debug logic, for Linked and Unlinked Context ID matching, see Breakpoint debug events on page C3-5 and Watchpoint debug events on page C3-15
• the trace logic, to identify the current process.
The ASID field value is used by many memory management functions.
The CONTEXTIDR is:
• a 32-bit read/write register
• accessible only in privileged modes
• when the Security Extensions are implemented, a Banked register.

The format of the CONTEXTIDR is:
Bits [31:8]   PROCID
Bits [7:0]    ASID

PROCID, bits [31:8]
Process Identifier. This field must be programmed with a unique value that identifies the current process. It is used by the trace logic and the debug logic to identify the process that is currently running.

ASID, bits [7:0]
Address Space Identifier. This field is programmed with the value of the current ASID.

Using the CONTEXTIDR
For information about the synchronization of changes to the CONTEXTIDR see Changes to CP15 registers and the memory order model on page B3-77. There are particular synchronization requirements when changing the ASID and Translation Table Base Registers, see Synchronization of changes of ASID and TTBR on page B3-60.

Accessing the CONTEXTIDR
To access the CONTEXTIDR you read or write the CP15 registers with <opc1> set to 0, <CRn> set to c13, <CRm> set to c0, and <opc2> set to 1.
For example:
MRC p15,0,<Rd>,c13,c0,1 ; Read CP15 Context ID Register
MCR p15,0,<Rd>,c13,c0,1 ; Write CP15 Context ID Register

B3.12.46 CP15 c13 Software Thread ID registers
The Software Thread ID registers provide locations where software can store thread identifying information, for OS management purposes. These registers are never updated by the hardware.
The Software Thread ID registers are:
• three 32-bit read/write registers:
  - User Read/Write Thread ID Register, TPIDRURW
  - User Read-only Thread ID Register, TPIDRURO
  - Privileged Only Thread ID Register, TPIDRPRW.
• accessible in different modes:
  - the User Read/Write Thread ID Register is read/write in unprivileged and privileged modes
  - the User Read-only Thread ID Register is read-only in User mode, and read/write in privileged modes
  - the Privileged Only Thread ID Register is only accessible in privileged modes, and is read/write.
• when the Security Extensions are implemented, Banked registers
• introduced in ARMv7.

Accessing the Software Thread ID registers
To access the Software Thread ID registers you read or write the CP15 registers with <opc1> set to 0, <CRn> set to c13, <CRm> set to c0, and <opc2> set to:
• 2 for the User Read/Write Thread ID Register, TPIDRURW
• 3 for the User Read-only Thread ID Register, TPIDRURO
• 4 for the Privileged Only Thread ID Register, TPIDRPRW.
For example:
MRC p15, 0, <Rd>, c13, c0, 2 ; Read CP15 User Read/Write Thread ID Register
MCR p15, 0, <Rd>, c13, c0, 2 ; Write CP15 User Read/Write Thread ID Register
MRC p15, 0, <Rd>, c13, c0, 3 ; Read CP15 User Read-only Thread ID Register
MCR p15, 0, <Rd>, c13, c0, 3 ; Write CP15 User Read-only Thread ID Register
MRC p15, 0, <Rd>, c13, c0, 4 ; Read CP15 Privileged Only Thread ID Register
MCR p15, 0, <Rd>, c13, c0, 4 ; Write CP15 Privileged Only Thread ID Register

B3.12.47 CP15 c14, Not used
CP15 c14 is not used on any ARMv7 implementation, see Unallocated CP15 encodings on page B3-69.

B3.12.48 CP15 c15, IMPLEMENTATION DEFINED registers
CP15 c15 is reserved for IMPLEMENTATION DEFINED purposes. ARMv7 does not impose any restrictions on the use of the CP15 c15 encodings. The documentation of the ARMv7 implementation must describe fully any registers implemented in CP15 c15. Normally, for processor implementations by ARM, this information is included in the Technical Reference Manual for the processor.
Typically, CP15 c15 is used to provide test features, and any required configuration options that are not covered by this manual.

B3.13 Pseudocode details of VMSA memory system operations
This section contains pseudocode describing VMSA memory operations. The following subsections describe the pseudocode functions:
• Alignment fault
• FCSE translation
• Address translation on page B3-157
• Domain checking on page B3-157
• TLB operations on page B3-158
• Translation table walk on page B3-158.
See also the pseudocode for general memory system operations in Pseudocode details of general memory system operations on page B2-29.
B3.13.1 Alignment fault
The following pseudocode describes the generation of an Alignment fault Data Abort exception:

// AlignmentFaultV()
// =================

AlignmentFaultV(bits(32) address, boolean iswrite)
    mva = FCSETranslate(address);
    DataAbort(mva, bits(4) UNKNOWN, boolean UNKNOWN, iswrite, DAbort_Alignment);

B3.13.2 FCSE translation
The following pseudocode describes the FCSE translation:

// FCSETranslate()
// ===============

bits(32) FCSETranslate(bits(32) va)
    if va<31:25> == '0000000' then
        mva = FCSEIDR.PID : va<24:0>;
    else
        mva = va;
    return mva;

B3.13.3 Address translation
The following pseudocode describes address translation in a VMSA implementation:

// TranslateAddressV()
// ===================

AddressDescriptor TranslateAddressV(bits(32) va, boolean ispriv, boolean iswrite)
    mva = FCSETranslate(va);
    if SCTLR.M == '1' then // MMU is enabled
        (tlbhit, tlbrecord) = CheckTLB(CONTEXTIDR.ASID, mva);
        if !tlbhit then
            tlbrecord = TranslationTableWalk(mva, iswrite);
        if CheckDomain(tlbrecord.domain, mva, tlbrecord.sectionnotpage, iswrite) then
            CheckPermission(tlbrecord.perms, mva, tlbrecord.sectionnotpage, iswrite, ispriv);
    else
        tlbrecord = TranslationTableWalk(mva, iswrite);
    return tlbrecord.addrdesc;

B3.13.4 Domain checking
The following pseudocode describes domain checking:

// CheckDomain()
// =============

boolean CheckDomain(bits(4) domain, bits(32) mva, boolean sectionnotpage, boolean iswrite)
    bitpos = 2*UInt(domain);
    case DACR<bitpos+1:bitpos> of
        when '00' DataAbort(mva, domain, sectionnotpage, iswrite, DAbort_Domain);
        when '01' permissioncheck = TRUE;
        when '10' UNPREDICTABLE;
        when '11' permissioncheck = FALSE;
    return permissioncheck;
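The FCSETranslate() and CheckDomain() pseudocode above can be modelled with ordinary integer arithmetic. The following Python sketch is an illustration under stated assumptions: the FCSEIDR and DACR values are passed in as plain integers, DomainFault is a hypothetical exception standing in for DataAbort(..., DAbort_Domain), and the UNPREDICTABLE '10' domain encoding is reported as an error rather than modelled.

```python
class DomainFault(Exception):
    """Hypothetical stand-in for DataAbort(..., DAbort_Domain)."""


def fcse_translate(va: int, fcseidr: int) -> int:
    """FCSETranslate(): substitute FCSEIDR.PID for va<31:25> when va<31:25> is zero."""
    if (va >> 25) == 0:
        # mva = FCSEIDR.PID : va<24:0>; PID occupies FCSEIDR bits [31:25]
        return (fcseidr & 0xFE00_0000) | (va & 0x01FF_FFFF)
    return va


def check_domain(dacr: int, domain: int) -> bool:
    """CheckDomain(): True means the access permissions must then be checked."""
    field = (dacr >> (2 * domain)) & 0b11   # DACR<bitpos+1:bitpos>, bitpos = 2*domain
    if field == 0b00:                       # No access: Domain fault
        raise DomainFault(domain)
    if field == 0b01:                       # Client: check permissions
        return True
    if field == 0b11:                       # Manager: no permission check
        return False
    raise ValueError("DACR domain field 0b10 is UNPREDICTABLE")
```

For example, with FCSEIDR.PID = 1 (FCSEIDR = 0x02000000), the virtual address 0x00001234 becomes the MVA 0x02001234, while 0x80000000 is passed through unchanged because va<31:25> is not zero.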
B3.13.5 TLB operations
The TLBRecord type represents the contents of a TLB entry:

// Types of TLB entry

enumeration TLBRecType = { TLBRecType_SmallPage,
                           TLBRecType_LargePage,
                           TLBRecType_Section,
                           TLBRecType_Supersection,
                           TLBRecType_MMUDisabled };

type TLBRecord is (
    Permissions       perms,
    bit               nG,        // '0' = Global, '1' = not Global
    bits(4)           domain,
    boolean           sectionnotpage,
    TLBRecType        type,
    AddressDescriptor addrdesc
)

The CheckTLB() function checks whether the TLB contains an entry that matches an ASID and address, and returns TRUE and the matching TLBRecord if so. Otherwise, it returns FALSE and an UNKNOWN TLBRecord.

(boolean, TLBRecord) CheckTLB(bits(8) asid, bits(32) address)

The AssignToTLB() procedure supplies an ASID and new TLBRecord to the TLB, for possible allocation to a TLB entry. It is IMPLEMENTATION DEFINED under what circumstances this allocation takes place, and TLB entries might also be allocated at other times.

AssignToTLB(bits(8) asid, bits(32) mva, TLBRecord entry)

B3.13.6 Translation table walk
The following pseudocode describes the translation table walk operation:

// TranslationTableWalk()
// ======================
//
// Returns a result of a translation table walk in TLBRecord form.

TLBRecord TranslationTableWalk(bits(32) mva, boolean is_write)
    TLBRecord result;
    AddressDescriptor l1descaddr;
    AddressDescriptor l2descaddr;
    if SCTLR.M == '1' then // MMU is enabled
        domain = bits(4) UNKNOWN; // For Data Abort exceptions found before a domain is known
        // Determine correct Translation Table Base Register to use.
        n = UInt(TTBCR.N);
        if n == 0 || IsZero(mva<31:(32-n)>) then
            ttbr = TTBR0;
            disabled = (TTBCR.PD0 == '1');
        else
            ttbr = TTBR1;
            disabled = (TTBCR.PD1 == '1');
            n = 0; // TTBR1 translation always works like N=0 TTBR0 translation

        // Check this Translation Table Base Register is not disabled.
        if HaveSecurityExt() && disabled then
            DataAbort(mva, domain, TRUE, is_write, DAbort_Translation);

        // Obtain level 1 descriptor.
        l1descaddr.paddress.physicaladdress = ttbr<31:(14-n)> : mva<(31-n):20> : '00';
        l1descaddr.paddress.physicaladdressext = '00000000';
        l1descaddr.paddress.NS = if IsSecure() then '0' else '1';
        l1descaddr.memattrs.type = MemType_Normal;
        l1descaddr.memattrs.shareable = (ttbr<1> == '1');
        l1descaddr.memattrs.outershareable = (ttbr<5> == '0') && (ttbr<1> == '1');
        l1descaddr.memattrs.outerattrs = ttbr<4:3>;
        if HaveMPExt() then
            l1descaddr.memattrs.innerattrs = ttbr<0>:ttbr<6>;
        else
            if ttbr<0> == '0' then
                l1descaddr.memattrs.innerattrs = '00';
            else
                IMPLEMENTATION_DEFINED set l1descaddr.memattrs.innerattrs to one of '01','10','11';
        l1desc = _Mem[l1descaddr,4];

        // Process level 1 descriptor.
        case l1desc<1:0> of
            when '00', '11' // Fault, Reserved
                DataAbort(mva, domain, TRUE, is_write, DAbort_Translation);

            when '10' // Section or Supersection
                texcb = l1desc<14:12,3,2>;
                S = l1desc<16>;
                ap = l1desc<15,11:10>;
                xn = l1desc<4>;
                nG = l1desc<17>;
                sectionnotpage = TRUE;
                NS = l1desc<19>;
                if SCTLR.AFE == '1' && l1desc<10> == '0' then
                    if SCTLR.HA == '0' then
                        DataAbort(mva, domain, sectionnotpage, is_write, DAbort_AccessFlag);
                    else // Hardware-managed access flag must be set in memory
                        _Mem[l1descaddr,4]<10> = '1';
                if l1desc<18> == '0' then // Section
                    domain = l1desc<8:5>;
                    type = TLBRecType_Section;
                    physicaladdressext = '00000000';
                    physicaladdress = l1desc<31:20> : mva<19:0>;
                else // Supersection
                    domain = '0000';
                    type = TLBRecType_Supersection;
                    physicaladdressext = l1desc<8:5,23:20>;
                    physicaladdress = l1desc<31:24> : mva<23:0>;

            when '01' // Page table: Large page or Small page
                domain = l1desc<8:5>;
                sectionnotpage = FALSE;
                NS = l1desc<3>;

                // Obtain level 2 descriptor.
                l2descaddr.paddress.physicaladdress = l1desc<31:10> : mva<19:12> : '00';
                l2descaddr.paddress.physicaladdressext = '00000000';
                l2descaddr.paddress.NS = if IsSecure() then '0' else '1';
                l2descaddr.memattrs = l1descaddr.memattrs;
                l2desc = _Mem[l2descaddr,4];

                // Process level 2 descriptor.
                if l2desc<1:0> == '00' then
                    DataAbort(mva, domain, sectionnotpage, is_write, DAbort_Translation);
                S = l2desc<10>;
                ap = l2desc<9,5:4>;
                nG = l2desc<11>;
                if SCTLR.AFE == '1' && l2desc<4> == '0' then
                    if SCTLR.HA == '0' then
                        DataAbort(mva, domain, sectionnotpage, is_write, DAbort_AccessFlag);
                    else // Hardware-managed access flag must be set in memory
                        _Mem[l2descaddr,4]<4> = '1';
                if l2desc<1> == '0' then // Large page
                    texcb = l2desc<14:12,3,2>;
                    xn = l2desc<15>;
                    type = TLBRecType_LargePage;
                    physicaladdressext = '00000000';
                    physicaladdress = l2desc<31:16> : mva<15:0>;
                else // Small page
                    texcb = l2desc<8:6,3,2>;
                    xn = l2desc<0>;
                    type = TLBRecType_SmallPage;
                    physicaladdressext = '00000000';
                    physicaladdress = l2desc<31:12> : mva<11:0>;

    else // MMU is disabled
        texcb = '00000';
        S = '1';
        ap = bits(3) UNKNOWN;
        xn = bit UNKNOWN;
        nG = bit UNKNOWN;
        domain = bits(4) UNKNOWN;
        sectionnotpage = boolean UNKNOWN;
        type = TLBRecType_MMUDisabled;
        physicaladdress = mva;
        physicaladdressext = '00000000';
        NS = if IsSecure() then '0' else '1';

    // Decode the TEX, C, B and S bits to produce the TLBRecord's memory attributes.
    if SCTLR.TRE == '0' then
        if RemapRegsHaveResetValues() then
            result.addrdesc.memattrs = DefaultTEXDecode(texcb, S);
        else
            IMPLEMENTATION_DEFINED setting of result.addrdesc.memattrs;
    else
        if SCTLR.M == '0' then
            result.addrdesc.memattrs = DefaultTEXDecode(texcb, S);
        else
            result.addrdesc.memattrs = RemappedTEXDecode(texcb, S);

    // Set the rest of the TLBRecord, try to add it to the TLB, and return it.
    result.perms.ap = ap;
    result.perms.xn = xn;
    result.nG = nG;
    result.domain = domain;
    result.sectionnotpage = sectionnotpage;
    result.type = type;
    result.addrdesc.paddress.physicaladdress = physicaladdress;
    result.addrdesc.paddress.physicaladdressext = physicaladdressext;
    result.addrdesc.paddress.NS = NS;
    AssignToTLB(CONTEXTIDR.ASID, mva, result);
    return result;

Chapter B4 Protected Memory System Architecture (PMSA)
This chapter provides a system-level view of the memory system. It contains the following sections:
• About the PMSA on page B4-2
• Memory access control on page B4-9
• Memory region attributes on page B4-11
• PMSA memory aborts on page B4-13
• Fault Status and Fault Address registers in a PMSA implementation on page B4-18
• CP15 registers for a PMSA implementation on page B4-22
• Pseudocode details of PMSA memory system operations on page B4-79.

Note
For an ARMv7-R implementation, this chapter must be read with Chapter B2 Common Memory System Architecture Features.

B4.1 About the PMSA
The PMSA is based on a Memory Protection Unit (MPU). The PMSA provides a much simpler memory protection scheme than the MMU-based VMSA described in Chapter B3 Virtual Memory System Architecture (VMSA).
The simplification applies to both the hardware and the software. A PMSAv7 processor is identified by the presence of the MPU Type Register, see c0, MPU Type Register (MPUIR) on page B4-36.
The main simplification is that the MPU does not use translation tables. Instead, System Control Coprocessor (CP15) registers are used to define protection regions. The protection regions eliminate the need for:
• hardware to perform translation table walks
• software to set up and maintain the translation tables.
The use of protection regions has the benefit of making the memory checking fully deterministic. However, the level of control is region based rather than page based, meaning the control is considerably less fine-grained than in the VMSA.
A second simplification is that the PMSA does not support virtual to physical address mapping other than flat address mapping. The physical memory address accessed is the same as the virtual address generated by the processor.

B4.1.1 Protection regions
In a PMSA implementation, you can use CP15 registers to define protection regions in the physical memory map. When describing a PMSA implementation, protection regions are often referred to as regions.
This means the PMSA has the following features:
• For each defined region, CP15 registers specify:
  - the region size
  - the base address
  - the memory attributes, for example, memory type and access permissions.
  Regions of 256 bytes or larger can be split into 8 subregions for improved granularity of memory access control. The minimum region size supported is IMPLEMENTATION DEFINED.
• Memory region control, requiring read and write access to the region configuration registers, is possible only from privileged modes.
• Regions can overlap. If an address is defined in multiple regions, a fixed priority scheme is used to define the properties of the address being accessed. This scheme gives priority to the region with the highest region number.
• The PMSA can be configured so that an access to an address that is not defined in any region either:
  - causes a memory abort
  - if it is a privileged access, uses the default memory map.
• All addresses are physical addresses; address translation is not supported.
• Instruction and data address spaces can be either:
  - unified, so a single region descriptor applies to both instruction and data accesses
  - separated between different instruction region descriptors and data region descriptors.

When the processor generates a memory access, the MPU compares the memory address with the programmed memory regions:
• If a matching memory region is not found, then:
  - the access can be mapped onto a background region, see Using the default memory map as a background region on page B4-5
  - otherwise, a Background Fault memory abort is signaled to the processor.
• If a matching memory region is found:
  - The access permission bits are used to determine whether the access is permitted. If the access is not permitted, the MPU signals a Permissions Fault memory abort. Otherwise, the access proceeds. See Memory access control on page B4-9 for a description of the access permission bits.
  - The memory region attributes are used to determine the memory type, as described in Memory region attributes on page B4-11.

B4.1.2 Subregions
A region of the PMSA memory map can be split into eight equal-sized, non-overlapping subregions:
• any region size between 256 bytes and 4GB supports 8 subregions
• region sizes below 256 bytes do not support subregions.
In the Region Size Register for each region, there is a Subregion disable bit for each subregion. This means that each subregion is either:
• part of the region, if its Subregion disable bit is 0
• not part of the region, if its Subregion disable bit is 1.
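The subregion rule can be expressed as a membership test. The following Python sketch assumes a region described by a hypothetical (base, size_log2, srd) triple: the region size is 2^size_log2 bytes, the base is aligned to the region size, and bit k of srd is the Subregion disable bit for subregion k, numbered upwards from the lowest address. This is an illustration of the rule only; it does not reproduce the encoding of the Region Size Register.

```python
def in_region(addr: int, base: int, size_log2: int, srd: int = 0) -> bool:
    """True if addr is inside the region and its subregion is not disabled."""
    size = 1 << size_log2
    if not (base <= addr < base + size):
        return False
    if size < 256:
        # Regions smaller than 256 bytes have no subregions
        return True
    # Eight equal subregions, each 2**(size_log2 - 3) bytes
    subregion = (addr - base) >> (size_log2 - 3)
    return ((srd >> subregion) & 1) == 0
```

For a 16KB region at 0x0 with only the highest subregion disabled (srd = 0b10000000), address 0x3010 falls in subregion 6 and still matches, but 0x3810 falls in subregion 7 and does not.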
If the region size is smaller than 256 bytes then all eight of the Subregion bits are UNK/SBZ.

If a subregion is part of the region then the protection and memory type attributes of the region apply to the subregion. If a subregion is not part of the region then the addresses covered by the subregion do not match as part of the region.

Subregions are not available in versions of the PMSA before PMSAv7.

B4.1.3 Overlapping regions
The MPU can be programmed with two or more overlapping regions. When memory regions overlap, a fixed priority scheme determines the region whose attributes are applied to the memory access. The higher the region number, the higher the priority. Therefore, for example, in an implementation that supports eight memory regions, the attributes for region 7 have highest priority and those for region 0 have lowest priority.

Figure B4-1 shows a case where the MPU is programmed with overlapping memory regions.

[Figure B4-1 Overlapping memory regions in the MPU: Region 1 spans 0x0000 to 0x4000, Region 2 spans 0x3000 to 0x4000, and the example access is at 0x3010.]

In this example:
• Data region 2 is programmed to be 4KB in size, starting from address 0x3000 with AP[2:0] == 0b010, giving privileged mode full access, User mode read-only access.
• Data region 1 is programmed to be 16KB in size, starting from address 0x0 with AP[2:0] == 0b001, giving privileged mode access only.

If the processor performs a data load from address 0x3010 while in User mode, the address is in both region 1 and region 2. Region 2 has the higher priority, therefore the region 2 attributes apply to the access. This means the load does not abort.

B4.1.4 The background region
Background region refers to a region that matches the entire 4GB physical address map, and has a lower priority than any other region.
Therefore, a background region provides the memory attributes for any memory access that does not match any of the defined memory regions.

When the SCTLR.BR bit is set to 0, the MPU behaves as if there is a background region that generates a Background Fault memory abort on any access. This means that any memory access that does not match any of the programmed memory regions generates a Background Fault memory abort. This is the same as the behavior in PMSAv6.

If you want a background region with a different set of memory attributes, you can program region 0 as a 4GB region with the attributes you require. Because region 0 has the lowest priority, this region then acts as a background region.

Using the default memory map as a background region
The default memory map is defined in The default memory map on page B4-6. Before PMSAv7, the default memory map was used only to define the behavior of memory accesses when the MPU is disabled or not implemented. From PMSAv7, when the SCTLR.BR bit is set to 1, and the MPU is present and enabled:
• the default memory map defines the background region for privileged memory accesses, meaning that a privileged access that does not match any of the programmed memory regions takes the properties defined for that address in the default memory map
• an unprivileged memory access that does not match any of the defined memory regions generates a Background Fault memory abort.

Using the default memory map as the background region means that all of the programmable memory region definitions can be used to define protection regions in the 4GB memory address space.

B4.1.5 Enabling and disabling the MPU
The SCTLR.M bit is used to enable and disable the MPU, see c1, System Control Register (SCTLR) on page B4-45. On reset, this bit is cleared to 0, meaning the MPU is disabled after a reset.
Software must program all relevant CP15 registers before enabling the MPU. This includes at least one of:
• setting up at least one memory region
• setting the SCTLR.BR bit to 1, to use the default memory map as a background region, see Using the default memory map as a background region.

Synchronization of changes to the CP15 registers is discussed in Changes to CP15 registers and the memory order model on page B4-28. These considerations apply to any change that enables or disables the MPU or the caches.

Behavior when the MPU is disabled
When the MPU is disabled:
• Instruction accesses use the default memory map and attributes shown in Table B4-1 on page B4-6. An access to a memory region with the Execute Never attribute generates a Permission fault, see The Execute Never (XN) attribute and instruction prefetching on page B4-10. No other permission checks are performed. Additional control of the cacheability is made by:
  – the SCTLR.I bit if separate instruction and data caches are implemented
  – the SCTLR.C bit if unified caches are implemented.
• Data accesses use the default memory map and attributes shown in Table B4-2 on page B4-7. No memory access permission checks are performed, and no aborts can be generated.
• Program flow prediction functions as normal, controlled by the value of the SCTLR.Z bit, see c1, System Control Register (SCTLR) on page B4-45.
• All of the CP15 cache operations work as normal.
• Instruction and data prefetch operations work as normal, based on the default memory map:
  – Data prefetch operations have no effect if the data cache is disabled
  – Instruction prefetch operations have no effect if the instruction cache is disabled.
• The Outer memory attributes are the same as those for the Inner memory system.
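The following minimal sketch shows two things: how an address that matches no programmed region is resolved, and the precondition for setting SCTLR.M. It assumes the SCTLR bit positions given in the SCTLR register description, with M at bit [0] and BR at bit [17]. The function names are hypothetical. Real code would also perform the CP15 writes and the synchronization described in Changes to CP15 registers and the memory order model.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit positions assumed from c1, System Control Register (SCTLR). */
#define SCTLR_M  (1u << 0)    /* MPU enable */
#define SCTLR_BR (1u << 17)   /* default memory map as background region */

typedef enum { ATTR_FROM_REGION, ATTR_FROM_DEFAULT_MAP, BACKGROUND_FAULT } lookup_result;

/* matched_region: result of the region search, or -1 if no region matched. */
static lookup_result classify_lookup(uint32_t sctlr, int matched_region, bool privileged)
{
    assert(matched_region >= -1);
    if ((sctlr & SCTLR_M) == 0)
        return ATTR_FROM_DEFAULT_MAP;      /* MPU disabled: default memory map */
    if (matched_region >= 0)
        return ATTR_FROM_REGION;
    if ((sctlr & SCTLR_BR) != 0 && privileged)
        return ATTR_FROM_DEFAULT_MAP;      /* privileged access, BR == 1 */
    return BACKGROUND_FAULT;               /* BR == 0, or unprivileged access */
}

/* Precondition for setting SCTLR.M: at least one region, or SCTLR.BR == 1. */
static bool can_enable_mpu(uint32_t sctlr, unsigned regions_configured)
{
    return regions_configured > 0 || (sctlr & SCTLR_BR) != 0;
}
```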
The default memory map
The PMSAv7 default memory map is fixed and not configurable, and is shown in:
• Table B4-1 for the instruction access attributes
• Table B4-2 on page B4-7 for the data access attributes.

The regions of the default memory map are identical in both tables. The information about the memory map is split into two tables only to improve the presentation of the information.

Table B4-1 Default memory map, showing instruction access attributes

Address range | HIVECS | Instruction memory type, caching enabled (a) | Instruction memory type, caching disabled (a) | Execute Never, XN
0xFFFFFFFF - 0xF0000000 | 0 | Not applicable | Not applicable | Execute Never
0xFFFFFFFF - 0xF0000000 | 1 (b) | Normal, Non-cacheable | Normal, Non-cacheable | Execution permitted
0xEFFFFFFF - 0xC0000000 | X | Not applicable | Not applicable | Execute Never
0xBFFFFFFF - 0xA0000000 | X | Not applicable | Not applicable | Execute Never
0x9FFFFFFF - 0x80000000 | X | Not applicable | Not applicable | Execute Never
0x7FFFFFFF - 0x60000000 | X | Normal, Non-shareable, Write-Through Cacheable | Normal, Non-shareable, Non-cacheable | Execution permitted
0x5FFFFFFF - 0x40000000 | X | Normal, Non-shareable, Write-Through Cacheable | Normal, Non-shareable, Non-cacheable | Execution permitted
0x3FFFFFFF - 0x00000000 | X | Normal, Non-shareable, Write-Through Cacheable | Normal, Non-shareable, Non-cacheable | Execution permitted

a. When separate instruction and data caches are implemented, caching is enabled for instruction accesses if the instruction caches are enabled. When unified caches are implemented, caching is enabled if the data or unified caches are enabled. See the descriptions of the C and I bits in c1, System Control Register (SCTLR) on page B4-45.
b. Use of HIVECS == 1 is deprecated in PMSAv7, see Exception vectors and the exception base address on page B1-30.
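Table B4-1 can be expressed as a lookup on the address. The hypothetical helpers below do two things. The first returns whether the default memory map permits instruction execution at an address. The second returns which instruction memory type the table gives. The `caching_enabled` argument stands for the condition in footnote a. The enum names are illustrative only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef enum {
    IMEM_NOT_APPLICABLE,   /* Execute Never region */
    IMEM_NORMAL_NC,        /* Normal, Non-cacheable (high vectors) */
    IMEM_NORMAL_NSH_NC,    /* Normal, Non-shareable, Non-cacheable */
    IMEM_NORMAL_NSH_WT     /* Normal, Non-shareable, Write-Through Cacheable */
} imem_type;

/* Execute Never column of Table B4-1. */
static bool default_map_exec_permitted(uint32_t addr, bool hivecs)
{
    if (addr >= 0xF0000000u)
        return hivecs;     /* executable only when HIVECS == 1 (deprecated) */
    if (addr >= 0x80000000u)
        return false;      /* 0x80000000-0xEFFFFFFF: Execute Never */
    return true;           /* 0x00000000-0x7FFFFFFF: execution permitted */
}

/* Instruction memory type columns of Table B4-1. */
static imem_type default_map_imem_type(uint32_t addr, bool hivecs, bool caching_enabled)
{
    if (!default_map_exec_permitted(addr, hivecs))
        return IMEM_NOT_APPLICABLE;
    if (addr >= 0xF0000000u)
        return IMEM_NORMAL_NC;
    return caching_enabled ? IMEM_NORMAL_NSH_WT : IMEM_NORMAL_NSH_NC;
}
```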
Table B4-2 Default memory map, showing data access attributes

Address range | Data memory type, caching enabled (a) | Data memory type, caching disabled
0xFFFFFFFF - 0xC0000000 | Strongly-ordered | Strongly-ordered
0xBFFFFFFF - 0xA0000000 | Shareable Device | Shareable Device
0x9FFFFFFF - 0x80000000 | Non-shareable Device | Non-shareable Device
0x7FFFFFFF - 0x60000000 | Normal, Shareable, Non-cacheable | Normal, Shareable, Non-cacheable
0x5FFFFFFF - 0x40000000 | Normal, Non-shareable, Write-Through Cacheable | Normal, Shareable, Non-cacheable
0x3FFFFFFF - 0x00000000 | Normal, Non-shareable, Write-Back, Write-Allocate Cacheable | Normal, Shareable, Non-cacheable

a. Caching is enabled for data accesses if the data or unified caches are enabled. See the description of the C bit in c1, System Control Register (SCTLR) on page B4-45.

Behavior of an implementation that does not include an MPU
If a PMSAv7 implementation does not include an MPU, it must adopt the default memory map behavior described in Behavior when the MPU is disabled on page B4-5.

A PMSAv7 implementation that does not include an MPU is identified by an MPU Type Register entry that shows a Unified MPU with zero Data or Unified regions, see c0, MPU Type Register (MPUIR) on page B4-36.

B4.1.6 Finding the minimum supported region size
You can use the DRBAR to find the minimum region size supported by an implementation, by following this procedure:
1. Write a valid memory region number to the RGNR. Normally you use region number 0, because this is always a valid region number.
2. Write the value 0xFFFFFFFC to the DRBAR. This value sets all valid bits in the register to 1.
3. Read back the value of the DRBAR. In the returned value, the least significant bit set indicates the resolution of the selected region. If the least significant bit set is bit M, the resolution of the region is 2^M bytes.
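Step 3 of the procedure reduces to finding the least significant set bit of the value read back. The sketch below assumes the caller has already performed steps 1 and 2, which are the CP15 writes to the RGNR and the DRBAR. The caller then passes in the DRBAR value read back in step 3. The register accessors are not shown, and the function name is hypothetical. The same calculation applies to an IRBAR value read back.

```c
#include <assert.h>
#include <stdint.h>

/* Value written to the DRBAR (or IRBAR) in step 2. */
#define DRBAR_PROBE_VALUE 0xFFFFFFFCu

/* Returns the region resolution 2^M in bytes, where M is the least
   significant set bit of the base address register value read back. */
static uint32_t min_region_size_from_readback(uint32_t readback)
{
    assert(readback != 0u);                             /* some base bits must be writable */
    assert((readback & ~DRBAR_PROBE_VALUE) == 0u);      /* bits [1:0] were written as 0 */
    unsigned m = 0;
    while (((readback >> m) & 1u) == 0u)
        m++;
    return 1u << m;
}
```

For example, a value of 0xFFFFFFE0 read back has bit 5 as its lowest set bit, so the minimum region size is 32 bytes.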
If the MPU implements separate data and instruction regions, this process gives the minimum size for data regions. To find the minimum size for instruction regions, use the same procedure with the IRBAR.

For more information about the registers used see:
• c6, MPU Region Number Register (RGNR) on page B4-66
• c6, Data Region Base Address Register (DRBAR) on page B4-60
• c6, Instruction Region Base Address Register (IRBAR) on page B4-61.

B4.2 Memory access control
Access to a memory region is controlled by the access permission bits for each region, held in the DRACR and IRACR. For descriptions of the registers see:
• c6, Data Region Access Control Register (DRACR) on page B4-64
• c6, Instruction Region Access Control Register (IRACR) on page B4-65.

B4.2.1 Access permissions
Access permission bits control access to the corresponding memory region. If an access is made to an area of memory without the required permissions, a Permission fault is generated. In the appropriate Region Access Control Register:
• the AP bits determine the access permissions
• the XN bit provides an additional permission bit for instruction fetches.

The access permissions are a three-bit field, DRACR.AP[2:0] or IRACR.AP[2:0]. Table B4-3 shows the possible values of this field.
Table B4-3 Access permissions

AP[2:0] | Privileged permissions | User permissions | Description
000 | No access | No access | All accesses generate a Permission fault
001 | Read/Write | No access | All User mode accesses generate Permission faults
010 | Read/Write | Read-only | User mode write accesses generate Permission faults
011 | Read/Write | Read/Write | Full access
100 | UNPREDICTABLE | UNPREDICTABLE | Reserved
101 | Read-only | No access | Privileged read-only, all other accesses generate Permission faults
110 | Read-only | Read-only | All write accesses generate Permission faults
111 | UNPREDICTABLE | UNPREDICTABLE | Reserved

The Execute Never (XN) attribute and instruction prefetching
Each memory region can be tagged as not containing executable code. If the Execute Never (XN) bit is set to 1, any attempt to execute an instruction in that region results in a Permission fault, and the implementation must not access the region to prefetch instructions speculatively. If the XN bit is 0, code can execute from that memory region.

Note
The XN bit acts as an additional permission check. The address must also have a valid read access permission.

In ARMv7, all regions of memory that contain read-sensitive peripherals must be marked as XN to avoid the possibility of a speculative prefetch accessing the locations.

B4.3 Memory region attributes
Each memory region has an associated set of memory region attributes. These control accesses to the caches, how the write buffer is used, and whether the memory region is Shareable and therefore is guaranteed by hardware to be coherent. These attributes are encoded in the C, B, TEX[2:0] and S bits of the appropriate Region Access Control Register.
Note
The Bufferable (B), Cacheable (C), and Type Extension (TEX) bit names are inherited from earlier versions of the architecture. These names no longer adequately describe the function of the B, C, and TEX bits.

B4.3.1 C, B, and TEX[2:0] encodings
The TEX[2:0] field must be considered with the C and B bits to give a five-bit encoding of the access attributes for an MPU memory region. Table B4-4 shows these encodings.

For Normal memory regions, the S (Shareable) bit gives more information about whether the region is Shareable. A Shareable region can be shared by multiple processors. A Normal memory region is Shareable if the S bit for the region is set to 1. For other memory types, the value of the S bit is ignored.

Table B4-4 C, B and TEX[2:0] encodings

TEX[2:0] | C | B | Description | Memory type | Shareable?
000 | 0 | 0 | Strongly-ordered. | Strongly-ordered | Shareable
000 | 0 | 1 | Shareable Device. | Device | Shareable
000 | 1 | 0 | Outer and Inner Write-Through, no Write-Allocate. | Normal | S bit (a)
000 | 1 | 1 | Outer and Inner Write-Back, no Write-Allocate. | Normal | S bit (a)
001 | 0 | 0 | Outer and Inner Non-cacheable. | Normal | S bit (a)
001 | 0 | 1 | Reserved. | - | -
001 | 1 | 0 | IMPLEMENTATION DEFINED. | IMP. DEF. (b) | IMP. DEF. (b)
001 | 1 | 1 | Outer and Inner Write-Back, Write-Allocate. | Normal | S bit (a)
010 | 0 | 0 | Non-shareable Device. | Device | Non-shareable
010 | 0 | 1 | Reserved. | - | -
010 | 1 | X | Reserved. | - | -
011 | X | X | Reserved. | - | -
1BB | A | A | Cacheable memory: AA = Inner attribute (c), BB = Outer policy | Normal | S bit (a)

a. Region is Shareable if S == 1, and Non-shareable if S == 0.
b. IMP. DEF. = IMPLEMENTATION DEFINED.
c. For more information see Cacheable memory attributes.
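The following sketch decodes the five-bit TEX[2:0], C, B encoding of Table B4-4, together with the S bit. The result is a memory type, a shareability value, and Inner and Outer cache attributes. For TEX[2] == 1 it applies the two-bit encoding given in Table B4-5 (see Cacheable memory attributes). The type and function names are hypothetical. Reserved and IMPLEMENTATION DEFINED encodings are only flagged, not interpreted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef enum { MT_STRONGLY_ORDERED, MT_DEVICE, MT_NORMAL, MT_IMP_DEF, MT_RESERVED } mem_type;
typedef enum { CA_NON_CACHEABLE, CA_WB_WA, CA_WT_NO_WA, CA_WB_NO_WA } cache_attr;

typedef struct {
    mem_type   type;
    bool       shareable;     /* not meaningful for MT_RESERVED or MT_IMP_DEF */
    cache_attr inner, outer;  /* meaningful for Normal memory only */
} region_attrs;

/* Table B4-5: the same 2-bit encoding is used for Inner and Outer attributes. */
static cache_attr decode_cache_attr(unsigned enc)
{
    static const cache_attr table[4] = { CA_NON_CACHEABLE, CA_WB_WA, CA_WT_NO_WA, CA_WB_NO_WA };
    assert(enc < 4u);
    return table[enc];
}

static region_attrs decode_tex_c_b(unsigned tex, unsigned c, unsigned b, unsigned s)
{
    assert(tex < 8u && c < 2u && b < 2u && s < 2u);
    region_attrs a = { MT_RESERVED, false, CA_NON_CACHEABLE, CA_NON_CACHEABLE };
    unsigned cb = (c << 1) | b;
    if (tex & 4u) {                              /* 1BB AA: cacheable Normal memory */
        a.type = MT_NORMAL;
        a.shareable = (s != 0u);
        a.inner = decode_cache_attr(cb);         /* AA = C,B */
        a.outer = decode_cache_attr(tex & 3u);   /* BB = TEX[1:0] */
        return a;
    }
    switch ((tex << 2) | cb) {
    case 0x0: a.type = MT_STRONGLY_ORDERED; a.shareable = true; break;   /* 000 0 0 */
    case 0x1: a.type = MT_DEVICE; a.shareable = true; break;             /* 000 0 1 */
    case 0x2: a.type = MT_NORMAL; a.shareable = (s != 0u);               /* 000 1 0 */
              a.inner = a.outer = CA_WT_NO_WA; break;
    case 0x3: a.type = MT_NORMAL; a.shareable = (s != 0u);               /* 000 1 1 */
              a.inner = a.outer = CA_WB_NO_WA; break;
    case 0x4: a.type = MT_NORMAL; a.shareable = (s != 0u); break;        /* 001 0 0 */
    case 0x6: a.type = MT_IMP_DEF; break;                                /* 001 1 0 */
    case 0x7: a.type = MT_NORMAL; a.shareable = (s != 0u);               /* 001 1 1 */
              a.inner = a.outer = CA_WB_WA; break;
    case 0x8: a.type = MT_DEVICE; a.shareable = false; break;            /* 010 0 0 */
    default: break;          /* 001 0 1, 010 0 1, 010 1 X, 011 X X: reserved */
    }
    return a;
}
```

Take TEX[2:0] == 0b110 with C,B == 0b01 as an example. This gives Normal memory whose Outer attribute is Write-Through, no Write-Allocate, and whose Inner attribute is Write-Back, Write-Allocate.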
For an explanation of Normal, Strongly-ordered and Device memory types, and the Shareable attribute, see Memory types and attributes and the memory order model on page A3-24.

Cacheable memory attributes
When TEX[2] == 1, the memory region is Cacheable memory, and the rest of the encoding defines the Inner and Outer cache attributes:
• TEX[1:0] defines the Outer cache attribute
• C,B defines the Inner cache attribute.

The same encoding is used for the Outer and Inner cache attributes. Table B4-5 shows the encoding.

Table B4-5 Inner and Outer cache attribute encoding

Memory attribute encoding | Cache attribute
00 | Non-cacheable
01 | Write-Back, Write-Allocate
10 | Write-Through, no Write-Allocate
11 | Write-Back, no Write-Allocate

B4.4 PMSA memory aborts
The mechanisms that cause the ARM processor to take an exception because of a memory access are:
MPU fault: The MPU detects an access restriction and signals the processor.
External abort: A memory system component other than the MPU signals an illegal or faulting external memory access.

The exception taken is a Prefetch Abort exception if either of these occurs synchronously on an instruction fetch, and a Data Abort exception otherwise.

Collectively these mechanisms are called aborts. The different abort mechanisms are described in:
• MPU faults
• External aborts on page B4-15.

An access that causes an abort is said to be aborted, and uses the Fault Address Registers (FARs) and Fault Status Registers (FSRs) to record context information. The FARs and FSRs are described in Fault Status and Fault Address registers in a PMSA implementation on page B4-18.

Also, a debug exception can cause the processor to take a Prefetch Abort exception or a Data Abort exception, and to update the FARs and FSRs. For details see Chapter C4 Debug Exceptions, and Debug event prioritization on page C3-43.
B4.4.1 MPU faults
The MPU checks the memory accesses required for instruction fetches and for explicit memory accesses:
• if an instruction fetch faults it generates a Prefetch Abort exception
• if an explicit memory access faults it generates a Data Abort exception.

For more information about Prefetch Abort exceptions and Data Abort exceptions see Exceptions on page B1-30.

MPU faults are always synchronous. For more information, see Terminology for describing exceptions on page B1-4.

When the MPU generates an abort for a region of memory, no memory access is made if that region is or could be marked as Strongly-ordered or Device.

The MPU can generate three types of fault, described in the subsections:
• Alignment fault on page B4-14
• Background fault on page B4-14
• Permission fault on page B4-14.

The MPU fault checking sequence on page B4-15 describes the fault checking sequence.

Alignment fault
The ARMv7 memory architecture requires support for strict alignment checking. This checking is controlled by the SCTLR.A bit, see c1, System Control Register (SCTLR) on page B4-45. For details of when Alignment faults are generated see Unaligned data access on page A3-5.

Background fault
If the memory access address does not match one of the programmed MPU memory regions, and the default memory map is not being used, a Background Fault memory abort is generated.

Background faults cannot occur on any cache or branch predictor maintenance operation.

Permission fault
The access permissions, defined in Memory access control on page B4-9, are checked against the processor memory access. If the access is not permitted, a Permission Fault memory abort is generated.

Permission faults cannot occur on any cache or branch predictor maintenance operation.
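The fault types above are checked in a fixed order, as shown in the MPU fault checking sequence (Figure B4-2). Alignment is checked first, then the region match, then the access permissions. This C sketch models that sequence, using the access permission decoding of Table B4-3 and the XN check. It is an illustrative model with hypothetical names. It makes two assumptions of its own. First, it treats the reserved AP encodings 100 and 111 as no access, although the architecture makes them UNPREDICTABLE. Second, it assumes that an instruction fetch from an XN area of the default memory map, when that map is used as a background region, generates a Permission fault.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum { MPU_OK, FAULT_ALIGNMENT, FAULT_BACKGROUND, FAULT_PERMISSION } mpu_result;

/* Table B4-3: true if AP[2:0] permits the access. */
static bool ap_permits(unsigned ap, bool privileged, bool write)
{
    assert(ap < 8u);
    switch (ap) {
    case 0: return false;                   /* no access */
    case 1: return privileged;              /* privileged RW only */
    case 2: return privileged || !write;    /* User read-only */
    case 3: return true;                    /* full access */
    case 5: return privileged && !write;    /* privileged read-only */
    case 6: return !write;                  /* read-only for all */
    default: return false;                  /* 100, 111: UNPREDICTABLE, treated as no access */
    }
}

typedef struct {
    bool     privileged, write, instr_fetch;
    bool     misaligned;       /* result of the alignment check, if one is required */
    int      region;           /* highest-priority matching region, or -1 */
    unsigned ap;               /* AP[2:0] of that region */
    bool     xn;               /* XN bit of that region */
    bool     br_enabled;       /* SCTLR.BR */
    bool     default_map_xn;   /* XN for the address in the default memory map */
} mpu_access;

static mpu_result mpu_check(const mpu_access *a)
{
    if (a->misaligned)
        return FAULT_ALIGNMENT;
    if (a->region >= 0) {
        /* An instruction fetch is a read, so it also needs read permission. */
        if (!ap_permits(a->ap, a->privileged, a->write))
            return FAULT_PERMISSION;
        if (a->instr_fetch && a->xn)
            return FAULT_PERMISSION;
        return MPU_OK;
    }
    if (a->br_enabled && a->privileged) {
        if (a->instr_fetch && a->default_map_xn)
            return FAULT_PERMISSION;         /* assumption, see lead-in */
        return MPU_OK;                       /* default memory map attributes apply */
    }
    return FAULT_BACKGROUND;
}
```

Consider the User mode load at 0x3010 in Figure B4-1. The matching region is region 2, with AP[2:0] == 0b010, so `mpu_check` returns MPU_OK. Checking the same load against region 1, with AP[2:0] == 0b001, returns FAULT_PERMISSION.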
The MPU fault checking sequence
Figure B4-2 shows the MPU fault checking sequence, when the MPU is enabled.

[Figure B4-2 MPU fault checking sequence. Flowchart steps: alignment check, if required (Alignment fault); check that the address is in a defined memory region, then check access permissions (Permission fault); if no region matches, use of the default memory map as a background region for privileged accesses, including a check for access to an XN area in the background region (otherwise Background fault); access memory.]

B4.4.2 External aborts
External memory errors are defined as errors that occur in the memory system other than those that are detected by the MPU or Debug hardware. They include parity errors detected by the caches or other parts of the memory system. An external abort is one of:
• synchronous
• precise asynchronous
• imprecise asynchronous.

For more information, see Terminology for describing exceptions on page B1-4.

The ARM architecture does not provide a method to distinguish between precise asynchronous and imprecise asynchronous aborts.

The ARM architecture handles asynchronous aborts in a similar way to interrupts, except that they are reported to the processor using the Data Abort exception. Setting the CPSR.A bit to 1 masks asynchronous aborts, see Program Status Registers (PSRs) on page B1-14.

Normally, external aborts are rare. An imprecise asynchronous external abort is likely to be fatal to the process that is running. An example of an event that might cause an external abort is an uncorrectable parity or ECC failure on a Level 2 memory structure.
It is IMPLEMENTATION DEFINED which external aborts, if any, are supported.

PMSAv7 permits external aborts on data accesses and instruction fetches to be either synchronous or asynchronous. The DFSR indicates whether the external abort is synchronous or asynchronous, see c5, Data Fault Status Register (DFSR) on page B4-55.

Note
Because imprecise external aborts are normally fatal to the process that caused them, ARM recommends that implementations make external aborts precise wherever possible.

More information about possible external aborts is given in the subsections:
• External abort on instruction fetch
• External abort on data read or write
• Parity error reporting on page B4-17.

For information about how external aborts are reported see Fault Status and Fault Address registers in a PMSA implementation on page B4-18.

External abort on instruction fetch
An external abort on an instruction fetch can be either synchronous or asynchronous. A synchronous external abort on an instruction fetch is taken precisely.

An implementation can report the external abort asynchronously from the instruction that it applies to. In such an implementation these aborts behave essentially as interrupts. They are masked by the CPSR.A bit when it is set to 1; otherwise they are reported using the Data Abort exception.

External abort on data read or write
Externally generated errors during a data read or write can be either synchronous or asynchronous.

An implementation can report the external abort asynchronously from the instruction that generated the access. In such an implementation these aborts behave essentially as interrupts. They are masked by the CPSR.A bit when it is set to 1; otherwise they are reported using the Data Abort exception.
Parity error reporting
The ARM architecture supports the reporting of both synchronous and asynchronous parity errors from the cache systems. It is IMPLEMENTATION DEFINED what parity errors in the cache systems, if any, result in synchronous or asynchronous parity errors.

A fault status code is defined for reporting parity errors, see Fault Status and Fault Address registers in a PMSA implementation on page B4-18. However, when parity error reporting is implemented, it is IMPLEMENTATION DEFINED whether the assigned fault status code or another appropriate encoding is used to report parity errors.

For all purposes other than the fault status encoding, parity errors are treated as external aborts.

B4.4.3 Prioritization of aborts
For synchronous aborts, Debug event prioritization on page C3-43 describes the relationship between debug events, MPU faults and external aborts.

In general, the ARM architecture does not define when asynchronous events are taken, and therefore the prioritization of asynchronous events is IMPLEMENTATION DEFINED.

Note
A special requirement applies to asynchronous watchpoints, see Debug event prioritization on page C3-43.

B4.5 Fault Status and Fault Address registers in a PMSA implementation
This section describes the Fault Status and Fault Address registers, and how they report information about PMSA aborts. It contains the following subsections:
• About the Fault Status and Fault Address registers
• Data Abort exceptions on page B4-19
• Prefetch Abort exceptions on page B4-19
• Fault Status Register encodings for the PMSA on page B4-19
• Distinguishing read and write accesses on Data Abort exceptions on page B4-21
• Provision for classification of external aborts on page B4-21
• Auxiliary Fault Status Registers on page B4-21.
Also, these registers are used to report information about debug exceptions. For details see Effects of debug exceptions on CP15 registers and the DBGWFAR on page C4-4.

B4.5.1 About the Fault Status and Fault Address registers
PMSAv7 provides four registers for reporting fault address and status information:
• The Data Fault Status Register, see c5, Data Fault Status Register (DFSR) on page B4-55. The DFSR is updated on taking a Data Abort exception.
• The Instruction Fault Status Register, see c5, Instruction Fault Status Register (IFSR) on page B4-56. The IFSR is updated on taking a Prefetch Abort exception.
• The Data Fault Address Register, see c6, Data Fault Address Register (DFAR) on page B4-57. In some cases, on taking a synchronous Data Abort exception the DFAR is updated with the faulting address. See Terminology for describing exceptions on page B1-4 for a description of synchronous exceptions.
• The Instruction Fault Address Register, see c6, Instruction Fault Address Register (IFAR) on page B4-58. The IFAR is updated with the faulting address on taking a Prefetch Abort exception.

In addition, the architecture provides encodings for two IMPLEMENTATION DEFINED Auxiliary Fault Status Registers, see Auxiliary Fault Status Registers on page B4-21.

Note
• On a Data Abort exception that is generated by an instruction cache maintenance operation, the IFSR is also updated.
• Before ARMv7, the Data Fault Address Register (DFAR) was called the Fault Address Register (FAR).

On a Watchpoint debug exception, the Watchpoint Fault Address Register (DBGWFAR) is used to hold fault information. On a watchpoint access the DBGWFAR is updated with the address of the instruction that generated the Data Abort exception. For more information, see Watchpoint Fault Address Register (DBGWFAR) on page C10-28.
B4.5.2 Data Abort exceptions
On taking a Data Abort exception the processor:
• updates the DFSR with a fault status code
• if the Data Abort exception is synchronous:
  – updates the DFSR with whether the faulted access was a read or a write
  – if the Data Abort exception was not caused by a Watchpoint debug event, updates the DFAR with the address that caused the Data Abort exception
  – if the Data Abort exception was caused by a Watchpoint debug event, the DFAR becomes UNKNOWN
• if the Data Abort exception is asynchronous, the DFAR becomes UNKNOWN.

On an access that might have multiple aborts, the MPU fault checking sequence and the prioritization of aborts determine which abort occurs. For more information, see The MPU fault checking sequence on page B4-15 and Prioritization of aborts on page B4-17.

B4.5.3 Prefetch Abort exceptions
A Prefetch Abort exception can be generated on an instruction fetch. The Prefetch Abort exception is taken synchronously with the instruction that the abort is reported on. This means:
• If the instruction is executed, a Prefetch Abort exception is generated.
• If the instruction fetch is issued but the processor does not attempt to execute the instruction, no Prefetch Abort exception is generated. For example, if the processor branches around the instruction, no Prefetch Abort exception is generated.

On taking a Prefetch Abort exception the processor:
• updates the IFSR with a fault status code
• updates the IFAR with the address that caused the Prefetch Abort exception.

B4.5.4 Fault Status Register encodings for the PMSA
For the PMSA fault status encodings in priority order see:
• Table B4-6 for the Instruction Fault Status Register (IFSR) encodings
• Table B4-7 on page B4-20 for the Data Fault Status Register (DFSR) encodings.
Table B4-6 PMSAv7 IFSR encodings

IFSR[10,3:0] (a) | Sources | IFAR | Notes
00001 | Alignment fault | Valid | MPU fault
00000 | Background fault | Valid | MPU fault
01101 | Permission fault | Valid | MPU fault
00010 | Debug event | UNKNOWN | See Software debug events on page C3-5
01000 | Synchronous external abort | Valid | -
10100 | IMPLEMENTATION DEFINED | - | Lockdown
11010 | IMPLEMENTATION DEFINED | - | Coprocessor abort
11001 | Memory access synchronous parity error | Valid | -

a. All IFSR[10,3:0] values not listed in this table are reserved.

Table B4-7 PMSAv7 DFSR encodings

DFSR[10,3:0] (a) | Sources | DFAR | Notes
00001 | Alignment fault | Valid | MPU fault
00000 | Background fault | Valid | MPU fault
01101 | Permission fault | Valid | MPU fault
00010 | Debug event | UNKNOWN | See Software debug events on page C3-5
01000 | Synchronous external abort | Valid | -
10100 | IMPLEMENTATION DEFINED | - | Lockdown
11010 | IMPLEMENTATION DEFINED | - | Coprocessor abort
11001 | Memory access synchronous parity error | (b) | -
10110 | Asynchronous external abort | UNKNOWN | -
11000 | Memory access asynchronous parity error | UNKNOWN | -

a. All DFSR[10,3:0] values not listed in this table are reserved.
b. It is IMPLEMENTATION DEFINED whether the DFAR is updated for a synchronous parity error.

Note
In previous ARM documentation, the terms precise and imprecise were used instead of synchronous and asynchronous. For details of the more exact terminology introduced in this manual see Terminology for describing exceptions on page B1-4.

Reserved encodings in the IFSR and DFSR encodings tables
A single encoding is reserved for cache lockdown faults. The details of these faults and any associated subsidiary registers are IMPLEMENTATION DEFINED.
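An abort handler typically starts by extracting the five-bit status value from FSR bit [10] and bits [3:0]. The following sketch does that and maps the result onto the encodings of Table B4-6 and Table B4-7. The enum and function names are hypothetical. Footnote b makes DFAR validity for synchronous parity errors IMPLEMENTATION DEFINED, so the sketch conservatively treats those errors as not having a valid DFAR.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef enum {
    FS_BACKGROUND = 0x00, FS_ALIGNMENT = 0x01, FS_DEBUG = 0x02,
    FS_SYNC_EXTERNAL = 0x08, FS_PERMISSION = 0x0D, FS_IMPDEF_LOCKDOWN = 0x14,
    FS_ASYNC_EXTERNAL = 0x16, FS_ASYNC_PARITY = 0x18, FS_SYNC_PARITY = 0x19,
    FS_IMPDEF_COPROC = 0x1A, FS_RESERVED = 0xFF
} fault_status;

/* Combines FS[4] (bit [10]) and FS[3:0] (bits [3:0]) into a 5-bit value. */
static unsigned fsr_status(uint32_t fsr)
{
    return (unsigned)(((fsr >> 6) & 0x10u) | (fsr & 0xFu));
}

/* Table B4-7: all unlisted values are reserved. */
static fault_status dfsr_decode(uint32_t dfsr)
{
    unsigned fs = fsr_status(dfsr);
    switch (fs) {
    case FS_BACKGROUND: case FS_ALIGNMENT: case FS_DEBUG: case FS_SYNC_EXTERNAL:
    case FS_PERMISSION: case FS_IMPDEF_LOCKDOWN: case FS_ASYNC_EXTERNAL:
    case FS_ASYNC_PARITY: case FS_SYNC_PARITY: case FS_IMPDEF_COPROC:
        return (fault_status)fs;
    default:
        return FS_RESERVED;
    }
}

/* Table B4-6 lists the same codes except the two asynchronous ones. */
static fault_status ifsr_decode(uint32_t ifsr)
{
    fault_status s = dfsr_decode(ifsr);
    return (s == FS_ASYNC_EXTERNAL || s == FS_ASYNC_PARITY) ? FS_RESERVED : s;
}

/* True where Table B4-7 marks the DFAR as Valid. */
static bool dfar_valid(fault_status s)
{
    assert(s != FS_RESERVED);
    return s == FS_ALIGNMENT || s == FS_BACKGROUND ||
           s == FS_PERMISSION || s == FS_SYNC_EXTERNAL;
}
```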
A single encoding is reserved for aborts associated with coprocessors. The details of these faults are IMPLEMENTATION DEFINED.

B4.5.5 Distinguishing read and write accesses on Data Abort exceptions
On a Data Abort exception, the DFSR.WnR bit, bit [11] of the register, indicates whether the abort occurred on a read access or on a write access. However, for a fault on a CP15 cache maintenance operation this bit always indicates a write access fault. For a fault generated by a SWP or SWPB instruction, the WnR bit is 0 if a read to the location would have generated a fault, otherwise it is 1.

B4.5.6 Provision for classification of external aborts
An implementation can use the DFSR.ExT and IFSR.ExT bits to provide more information about external aborts:
• DFSR.ExT can provide an IMPLEMENTATION DEFINED classification of external aborts on data accesses
• IFSR.ExT can provide an IMPLEMENTATION DEFINED classification of external aborts on instruction accesses.

For all aborts other than external aborts these bits return a value of 0.

B4.5.7 Auxiliary Fault Status Registers
ARMv7 architects two Auxiliary Fault Status Registers:
• the Auxiliary Data Fault Status Register (ADFSR)
• the Auxiliary Instruction Fault Status Register (AIFSR).

These registers enable additional fault status information to be returned:
• The position of these registers is architecturally defined, but the content and use of the registers is IMPLEMENTATION DEFINED.
• An implementation that does not need to report additional fault information must implement these registers as UNK/SBZ. This ensures that a privileged attempt to access these registers is not faulted.

An example use of these registers would be to return more information for diagnosing parity errors.

See c5, Auxiliary Data and Instruction Fault Status Registers (ADFSR and AIFSR) on page B4-56 for the architectural details of these registers.
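The WnR and ExT bits can be read with simple masks. This sketch takes WnR at bit [11], as stated above. It assumes that ExT is at bit [12], as given in the DFSR and IFSR register descriptions. The helper names are hypothetical. WnR always reads as a write for faults on CP15 cache maintenance operations, and the meaning of ExT is IMPLEMENTATION DEFINED. A handler can therefore only report these bits; it cannot interpret them further without implementation documentation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define FSR_WNR (1u << 11)   /* DFSR.WnR, bit [11] */
#define FSR_EXT (1u << 12)   /* DFSR.ExT / IFSR.ExT, assumed bit [12] */

/* True if the aborted access was a write (always true for CP15 cache
   maintenance faults; for SWP/SWPB see the rule above). */
static bool dfsr_is_write(uint32_t dfsr)
{
    return (dfsr & FSR_WNR) != 0u;
}

/* IMPLEMENTATION DEFINED external abort classification; 0 for other aborts. */
static bool fsr_ext_bit(uint32_t fsr)
{
    return (fsr & FSR_EXT) != 0u;
}
```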
B4.6 CP15 registers for a PMSA implementation
This section gives a full description of the registers implemented in the CP15 System Control Coprocessor in an ARMv7 implementation that includes the PMSA memory system. Therefore, this is the description of the CP15 registers for an ARMv7-R implementation.
Some of the registers described in this section are also included in an ARMv7 implementation with a VMSA. The section CP15 registers for a VMSA implementation on page B3-64 also includes descriptions of these registers.
See Coprocessors and system control on page B1-62 for general information about the System Control Coprocessor, CP15, and the register access instructions MRC and MCR.
Information in this section is organized as follows:
• general information is given in:
  — Organization of the CP15 registers in a PMSA implementation
  — General behavior of CP15 registers on page B4-26
  — Changes to CP15 registers and the memory order model on page B4-28
  — Meaning of fixed bit values in register diagrams on page B4-29.
• this is followed by, for each of the primary CP15 registers c0 to c15:
  — a general description of the organization of the primary CP15 register
  — detailed descriptions of all the registers in that primary register.

Note
The detailed descriptions of the registers that implement the processor identification scheme, CPUID, are given in Chapter B5 The CPUID Identification Scheme, and not in this section.

Table B4-8 on page B4-24 lists all of the CP15 registers in a PMSA implementation, and is an index to the detailed description of each register.

B4.6.1 Organization of the CP15 registers in a PMSA implementation
Figure B4-3 on page B4-23 summarizes the ARMv7 CP15 registers when the PMSA is implemented. Table B4-8 on page B4-24 lists all of these registers.

Note
ARMv7 introduces significant changes to the memory system registers, especially in relation to caches.
For details of:
• the CP15 register implementation in PMSAv6, see Organization of CP15 registers for an ARMv6 PMSA implementation on page AppxG-31
• how the ARMv7 registers must be used to discover what caches can be accessed by the processor, see Identifying the cache resources in ARMv7 on page B2-4.

[Figure B4-3 CP15 registers in a PMSA implementation. Legend: Read-only, Read/Write, Write-only; Bold text = Accessible in User mode; some encodings are marked Access depends on the operation.]

For information about the CP15 encodings not shown in Figure B4-3 on page B4-23 see UNPREDICTABLE and UNDEFINED behavior for CP15 accesses on page B4-26.

Summary of CP15 register descriptions in a PMSA implementation
Table B4-8 shows the CP15 registers in a PMSA implementation. The table also includes links to the descriptions of each of the primary CP15 registers, c0 to c15.

Table B4-8 Summary of CP15 registers in a PMSA implementation

Register and description
CP15 c0, ID codes registers on page B4-30
    c0, Main ID Register (MIDR) on page B4-32
    c0, Cache Type Register (CTR) on page B4-34
    c0, TCM Type Register (TCMTR) on page B4-35
    c0, MPU Type Register (MPUIR) on page B4-36
    c0, Multiprocessor Affinity Register (MPIDR) on page B4-37
    CP15 c0, Processor Feature registers on page B5-4
    c0, Debug Feature Register 0 (ID_DFR0) on page B5-6
    c0, Auxiliary Feature Register 0 (ID_AFR0) on page B5-8
    CP15 c0, Memory Model Feature registers on page B5-9
    CP15 c0, Instruction Set Attribute registers on page B5-19
    c0, Cache Size ID Registers (CCSIDR) on page B4-40
    c0, Cache Level ID Register (CLIDR) on page B4-41
    c0, Implementation defined Auxiliary ID Register (AIDR) on page B4-43
    c0, Cache Size Selection Register (CSSELR) on page B4-43
CP15 c1, System control registers on page B4-44
    c1, System Control Register (SCTLR) on page B4-45
    c1, Implementation defined Auxiliary Control Register (ACTLR) on page B4-50
    c1, Coprocessor Access Control Register (CPACR) on page B4-51
CP15 registers c2, c3, and c4 are not used on a PMSA implementation, see Unallocated CP15 encodings on page B4-27
CP15 c5 and c6, Memory system fault registers on page B4-53
    c5, Data Fault Status Register (DFSR) on page B4-55
    c5, Instruction Fault Status Register (IFSR) on page B4-56
    c5, Auxiliary Data and Instruction Fault Status Registers (ADFSR and AIFSR) on page B4-56
    c6, Data Fault Address Register (DFAR) on page B4-57
    c6, Instruction Fault Address Register (IFAR) on page B4-58
    c6, Data Region Base Address Register (DRBAR) on page B4-60
    c6, Instruction Region Base Address Register (IRBAR) on page B4-61
    c6, Data Region Size and Enable Register (DRSR) on page B4-62
    c6, Instruction Region Size and Enable Register (IRSR) on page B4-63
    c6, Data Region Access Control Register (DRACR) on page B4-64
    c6, Instruction Region Access Control Register (IRACR) on page B4-65
    c6, MPU Region Number Register (RGNR) on page B4-66
CP15 c7, Cache maintenance and other functions on page B4-68
    CP15 c7, Cache and branch predictor maintenance functions on page B4-68
    CP15 c7, Data and Instruction Barrier operations on page B4-72
    CP15 c7, No Operation (NOP) on page B4-73
CP15 c8 is not used on a PMSA implementation, see Unallocated CP15 encodings on page B4-27
CP15 c9, Cache and TCM lockdown registers and performance monitors on page B4-74
CP15 c10 is not used on a PMSA implementation, see Unallocated CP15 encodings on page B4-27
CP15 c11, Reserved for TCM DMA registers on page B4-75
CP15 c12 is not used on a PMSA implementation, see Unallocated CP15 encodings on page B4-27
CP15 c13, Context and Thread ID registers on page B4-75
    c13, Context ID Register (CONTEXTIDR) on page B4-76
    CP15 c13 Software Thread ID registers on page B4-77
CP15 c14 is not used on a PMSA implementation, see Unallocated CP15 encodings on page B4-27
CP15 c15, Implementation defined registers on page B4-78

B4.6.2 General behavior of CP15 registers
The following sections give information about the general behavior of CP15 registers:
• UNPREDICTABLE and UNDEFINED behavior for CP15 accesses
• Reset behavior of CP15 registers on page B4-27.
See also Meaning of fixed bit values in register diagrams on page B4-29.

Read-only bits in read/write registers
Some read/write registers include bits that are read-only. These bits ignore writes.
An example of this is the SCTLR.NMFI bit, bit [27], see c1, System Control Register (SCTLR) on page B4-45.

UNPREDICTABLE and UNDEFINED behavior for CP15 accesses
In ARMv7 the following operations are UNDEFINED:
• all CDP, MCRR, MRRC, LDC and STC operations to CP15
• all CDP2, MCR2, MRC2, MCRR2, MRRC2, LDC2 and STC2 operations to CP15.
Unless otherwise indicated in the individual register descriptions:
• reserved fields in registers are UNK/SBZP
• reserved values of fields can have UNPREDICTABLE effects.
The following subsections give more information about UNPREDICTABLE and UNDEFINED behavior for CP15:
• Unallocated CP15 encodings on page B4-27
• Rules for MCR and MRC accesses to CP15 registers on page B4-27.

Unallocated CP15 encodings
When MCR and MRC instructions perform CP15 operations, the <CRn> value for the instruction is the major register specifier for the CP15 space. Accesses to unallocated major registers are UNDEFINED.
For the ARMv7-R architecture, this means that accesses with <CRn> = {c2-c4, c8, c10, c12, c14} are UNDEFINED.
In an allocated CP15 major register specifier, MCR and MRC accesses to all unallocated encodings are UNPREDICTABLE for privileged accesses. For the ARMv7-R architecture this means that privileged MCR and MRC accesses with <CRn> not in {c2-c4, c8, c10, c12, c14}, but with an unallocated combination of <opc1>, <CRm> and <opc2> values, are UNPREDICTABLE. For <CRn> not in {c2-c4, c8, c10, c12, c14}, Figure B4-3 on page B4-23 shows all allocated combinations of <opc1>, <CRm> and <opc2>. A privileged access using any combination not shown in the figure is UNPREDICTABLE.

Note
As shown in Figure B4-3 on page B4-23, accesses to unallocated principal ID registers map onto the Main ID Register. These are accesses with <CRn> = c0, <opc1> = 0, <CRm> = c0, and <opc2> = {3, 6, 7}.

Rules for MCR and MRC accesses to CP15 registers
All MCR operations from the PC are UNPREDICTABLE for all coprocessors, including for CP15.
All MRC operations to APSR_nzcv are UNPREDICTABLE for CP15.
The following accesses are UNPREDICTABLE:
• an MCR access to an encoding for which no write behavior is defined in any circumstances
• an MRC access to an encoding for which no read behavior is defined in any circumstances.
Except for CP15 encodings that are accessible in User mode, all MCR and MRC accesses from User mode are UNDEFINED. This applies to all User mode accesses to unallocated CP15 encodings. Individual register descriptions, and the summaries of the CP15 major registers, show the CP15 encodings that are accessible in User mode.
Some individual registers can be made inaccessible by setting configuration bits, possibly including IMPLEMENTATION DEFINED configuration bits, to disable access to the register. The effects of the architecturally-defined configuration bits are defined individually in this manual. Typically, setting a configuration bit to disable access to a register results in the register becoming UNDEFINED for MRC and MCR accesses.
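The first two rules in Unallocated CP15 encodings can be expressed as a small classifier. The following C sketch is a hypothetical helper, not part of any ARM API: it flags the unallocated major registers as UNDEFINED and recognizes the aliased principal ID encodings that Table B4-9 maps onto the MIDR (<opc2> = 3, 6, 7). Everything else must still be checked against Figure B4-3. It deliberately ignores User mode accesses and any configuration bits that disable access.

```c
#include <stdbool.h>

/* Hypothetical, coarse classification of a privileged MCR/MRC to CP15 on
 * an ARMv7-R (PMSA) implementation. Not modelled: User mode, access
 * disable configuration bits, and the full allocation map of Figure B4-3. */
enum cp15_class {
    CP15_UNDEFINED,   /* unallocated major register (CRn)                 */
    CP15_MIDR_ALIAS,  /* unallocated principal ID register, reads the MIDR */
    CP15_SEE_FIGURE   /* allocated if shown in Figure B4-3, otherwise
                         UNPREDICTABLE                                     */
};

/* CRn values that are not allocated in a PMSA implementation. */
static bool pmsa_crn_unallocated(unsigned crn)
{
    switch (crn) {
    case 2: case 3: case 4: case 8: case 10: case 12: case 14:
        return true;
    default:
        return false;
    }
}

static enum cp15_class pmsa_cp15_classify(unsigned crn, unsigned opc1,
                                          unsigned crm, unsigned opc2)
{
    if (pmsa_crn_unallocated(crn))
        return CP15_UNDEFINED;
    if (crn == 0 && opc1 == 0 && crm == 0 &&
        (opc2 == 3 || opc2 == 6 || opc2 == 7))
        return CP15_MIDR_ALIAS;
    return CP15_SEE_FIGURE;
}
```

A table-driven version covering every allocated encoding in Figure B4-3 would be the natural extension; this sketch keeps only the rules that are stated as text in this section.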
Reset behavior of CP15 registers
After a reset, only a limited subset of the processor state is guaranteed to be set to defined values. On reset, the PMSAv7 architecture requires that the following CP15 registers are set to defined values:
• the SCTLR, see c1, System Control Register (SCTLR) on page B4-45
• the CPACR, see c1, Coprocessor Access Control Register (CPACR) on page B4-51
• the DRSR, see c6, Data Region Size and Enable Register (DRSR) on page B4-62
• the IRSR, if implemented, see c6, Instruction Region Size and Enable Register (IRSR) on page B4-63.
For details of the reset values of these registers see the register descriptions.
After a reset, software must not rely on the value of any read/write register not included in this list.

B4.6.3 Changes to CP15 registers and the memory order model
All changes to CP15 registers that appear in program order after any explicit memory operations are guaranteed not to affect those memory operations.
Any change to CP15 registers is guaranteed to be visible to subsequent instructions only after one of:
• the execution of an ISB instruction
• the taking of an exception
• the return from an exception.
To guarantee the visibility of changes to some CP15 registers, additional operations might be required, on a case by case basis, before the ISB instruction, exception or return from exception. These cases are identified specifically in the definition of the registers.
However, for CP15 register accesses, all MRC and MCR instructions to the same register using the same register number appear to occur in program order relative to each other without context synchronization.
Where a change to the CP15 registers that is not yet guaranteed to be visible has an effect on exception processing, the following rule applies:
• When it is determined that an exception must be taken, any change of state held in CP15 registers involved in the triggering of the exception and that affects the processing of the exception is guaranteed to take effect before the exception is taken.
Therefore, in the following example, where initially A=1 and V=0, the LDR might or might not take a Data Abort exception due to the unaligned access, but if an exception occurs, the vector used is affected by the V bit:

    MCR p15, 0, R0, c1, c0, 0   ; clears the A bit and sets the V bit
    LDR R2, [R3]                ; unaligned load

B4.6.4 Meaning of fixed bit values in register diagrams
In register diagrams, fixed bits are indicated by one of the following:
0     In any implementation:
      • the bit must read as 0
      • writes to the bit must be ignored.
      Software:
      • can rely on the bit reading as 0
      • must use an SBZP policy to write to the bit.
(0)   In any implementation:
      • the bit must read as 0
      • writes to the bit must be ignored.
      Software:
      • must not rely on the bit reading as 0
      • must use an SBZP policy to write to the bit.
1     In any implementation:
      • the bit must read as 1
      • writes to the bit must be ignored.
      Software:
      • can rely on the bit reading as 1
      • must use an SBOP policy to write to the bit.
(1)   In any implementation:
      • the bit must read as 1
      • writes to the bit must be ignored.
      Software:
      • must not rely on the bit reading as 1
      • must use an SBOP policy to write to the bit.
Fields that are more than 1 bit wide are sometimes described as UNK/SBZP, instead of having each bit marked as (0).
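The simplest way for software to honor the SBZP (Should-Be-Zero-or-Preserved) and SBOP (Should-Be-One-or-Preserved) policies is a read-modify-write that changes only the field it means to change. The C sketch below illustrates this under a simplified reading of the two policies: an SBZP bit may be written as 0 or as the value just read, and an SBOP bit as 1 or as the value just read. Both function names are hypothetical, and the register value is treated as an ordinary 32-bit integer rather than an actual CP15 access.

```c
#include <stdbool.h>
#include <stdint.h>

/* Read-modify-write: replace only the bits in field_mask (field_value must
 * already be shifted into position) and write every other bit back with
 * the value just read. Preserving a bit satisfies both SBZP and SBOP. */
static uint32_t cp15_rmw(uint32_t value_read, uint32_t field_mask,
                         uint32_t field_value)
{
    return (value_read & ~field_mask) | (field_value & field_mask);
}

/* Check a proposed write against the simplified policies above. */
static bool write_obeys_policy(uint32_t value_read, uint32_t value_written,
                               uint32_t sbzp_mask, uint32_t sbop_mask)
{
    /* SBZP violation: a 1 written where a 0 was read. */
    uint32_t bad_sbzp = sbzp_mask & value_written & ~value_read;
    /* SBOP violation: a 0 written where a 1 was read. */
    uint32_t bad_sbop = sbop_mask & ~value_written & value_read;
    return (bad_sbzp | bad_sbop) == 0;
}
```

Writing back the value read is the policy-neutral choice: it is correct for every reserved bit, whether the register diagram marks it 0, (0), 1 or (1).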
B4.6.5 CP15 c0, ID codes registers
The CP15 c0 registers are used for processor and feature identification. Figure B4-4 shows the CP15 c0 registers.

[Figure B4-4 CP15 c0 registers in a PMSA implementation]

All CP15 c0 register encodings not shown in Figure B4-4 are UNPREDICTABLE, see Unallocated CP15 encodings on page B4-27.

Note
Chapter B5 The CPUID Identification Scheme describes the CPUID registers shown in Figure B4-4.

Table B4-9 lists the CP15 c0 registers and shows where each register is described in full. The table does not include the reserved and aliased registers that are shown in Figure B4-4 on page B4-30.
Table B4-9 Index to CP15 c0 register descriptions

opc1  CRm  opc2     Register and description
0     c0   0        c0, Main ID Register (MIDR) on page B4-32
           1        c0, Cache Type Register (CTR) on page B4-34
           2        c0, TCM Type Register (TCMTR) on page B4-35
           4        c0, MPU Type Register (MPUIR) on page B4-36
           5        c0, Multiprocessor Affinity Register (MPIDR) on page B4-37
           3, 6, 7  c0, Main ID Register (MIDR) on page B4-32
      c1   0, 1     CP15 c0, Processor Feature registers on page B5-4
           2        c0, Debug Feature Register 0 (ID_DFR0) on page B5-6
           3        c0, Auxiliary Feature Register 0 (ID_AFR0) on page B5-8
           4-7      CP15 c0, Memory Model Feature registers on page B5-9
      c2   0-5      CP15 c0, Instruction Set Attribute registers on page B5-19
1     c0   0        c0, Cache Size ID Registers (CCSIDR) on page B4-40
           1        c0, Cache Level ID Register (CLIDR) on page B4-41
           7        c0, Implementation defined Auxiliary ID Register (AIDR) on page B4-43
2     c0   0        c0, Cache Size Selection Register (CSSELR) on page B4-43

Note
The CPUID scheme described in Chapter B5 The CPUID Identification Scheme includes information about the implementation of the optional Floating-Point and Advanced SIMD architecture extensions. See Advanced SIMD and VFP extensions on page A2-20 for a summary of the implementation options for these features.

B4.6.6 c0, Main ID Register (MIDR)
The Main ID Register, MIDR, provides identification information for the processor, including an implementer code for the device and a device ID number.
The MIDR is:
• a 32-bit read-only register
• accessible only in privileged modes.
Some fields of the MIDR are IMPLEMENTATION DEFINED. For details of the values of these fields for a particular ARMv7 implementation, and any implementation-specific significance of these values, see the product documentation.
The format of the MIDR is:
Bits [31:24] Implementer | Bits [23:20] Variant | Bits [19:16] Architecture | Bits [15:4] Primary part number | Bits [3:0] Revision

Implementer, bits [31:24]
The Implementer code. Table B4-10 shows the permitted values for this field:

Table B4-10 Implementer codes

Bits [31:24]  ASCII character  Implementer
0x41          A                ARM Limited
0x44          D                Digital Equipment Corporation
0x4D          M                Motorola, Freescale Semiconductor Inc.
0x51          Q                QUALCOMM Inc.
0x56          V                Marvell Semiconductor Inc.
0x69          i                Intel Corporation

All other values are reserved by ARM and must not be used.

Variant, bits [23:20]
An IMPLEMENTATION DEFINED variant number. Typically, this field is used to distinguish between different product variants, for example implementations of the same product with different cache sizes.

Architecture, bits [19:16]
Table B4-11 shows the permitted values for this field:

Table B4-11 Architecture codes

Bits [19:16]  Architecture
0x1           ARMv4
0x2           ARMv4T
0x3           ARMv5 (obsolete)
0x4           ARMv5T
0x5           ARMv5TE
0x6           ARMv5TEJ
0x7           ARMv6
0xF           Defined by CPUID scheme

All other values are reserved by ARM and must not be used.

Primary part number, bits [15:4]
An IMPLEMENTATION DEFINED primary part number for the device.

Note
On processors implemented by ARM, if the top four bits of the primary part number are 0x0 or 0x7, the variant and architecture are encoded differently, see c0, Main ID Register (MIDR) on page AppxH-34. Processors implemented by ARM have an Implementer code of 0x41.

Revision, bits [3:0]
An IMPLEMENTATION DEFINED revision number for the device.

ARMv7 requires all implementations to use the CPUID scheme, described in Chapter B5 The CPUID Identification Scheme, and an implementation is described by the MIDR and the CPUID registers.

Note
For an ARMv7 implementation by ARM, the MIDR is interpreted as:
Bits [31:24]  Implementer code, must be 0x41.
Bits [23:20]  Major revision number, rX.
Bits [19:16]  Architecture code, must be 0xF.
Bits [15:4]   ARM part number.
Bits [3:0]    Minor revision number, pY.

Accessing the MIDR
To access the MIDR you read the CP15 registers with <opc1> set to 0, <CRn> set to c0, <CRm> set to c0, and <opc2> set to 0. For example:

    MRC p15,0,<Rt>,c0,c0,0    ; Read CP15 Main ID Register

B4.6.7 c0, Cache Type Register (CTR)
The Cache Type Register, CTR, provides information about the architecture of the caches.
The CTR is:
• a 32-bit read-only register
• accessible only in privileged modes.
The format of the CTR is changed in ARMv7. The ARMv7 format of the register is indicated by bits [31:29] being set to 0b100. For details of the format of the Cache Type Register in versions of the ARM architecture before ARMv7 see c0, Cache Type Register (CTR) on page AppxH-35.
In ARMv7, the format of the CTR is:
Bits [31:29] 0b100 | Bit [28] 0 | Bits [27:24] CWG | Bits [23:20] ERG | Bits [19:16] DminLine | Bit [15] 1 | Bits [14:4] 0 | Bits [3:0] IminLine

Bits [31:29]
Set to 0b100 for the ARMv7 register format. Set to 0b000 for the format used in ARMv6 and earlier.
Bit [28]
RAZ.
CWG, bits [27:24]
Cache Writeback Granule. Log2 of the number of words of the maximum size of memory that can be overwritten as a result of the eviction of a cache entry that has had a memory location in it modified.
A value of 0b0000 indicates that the CTR does not provide Cache Writeback Granule information and either:
• the architectural maximum of 512 words (2Kbytes) must be assumed
• the Cache Writeback Granule can be determined from the maximum cache line size encoded in the Cache Size ID Registers.
Values greater than 0b1001 are reserved.
ERG, bits [23:20]
Exclusives Reservation Granule. Log2 of the number of words of the maximum size of the reservation granule that has been implemented for the Load-Exclusive and Store-Exclusive instructions.
For more information, see Tagging and the size of the tagged memory block on page A3-20.
A value of 0b0000 indicates that the CTR does not provide Exclusives Reservation Granule information and the architectural maximum of 512 words (2Kbytes) must be assumed.
Values greater than 0b1001 are reserved.
DminLine, bits [19:16]
Log2 of the number of words in the smallest cache line of all the data caches and unified caches that are controlled by the processor.
Bit [15]
RAO.
Bits [14:4]
RAZ.
IminLine, bits [3:0]
Log2 of the number of words in the smallest cache line of all the instruction caches that are controlled by the processor.

Accessing the CTR
To access the CTR you read the CP15 registers with <opc1> set to 0, <CRn> set to c0, <CRm> set to c0, and <opc2> set to 1. For example:

    MRC p15,0,<Rt>,c0,c0,1    ; Read CP15 Cache Type Register

B4.6.8 c0, TCM Type Register (TCMTR)
The TCM Type Register, TCMTR, provides information about the implementation of the TCM.
The TCMTR is:
• a 32-bit read-only register
• accessible only in privileged modes.
From ARMv7:
• the TCMTR must be implemented
• when the ARMv7 format is used, the meaning of register bits [28:0] is IMPLEMENTATION DEFINED
• the ARMv6 format of the TCM Type Register remains a valid usage model
• if no TCMs are implemented the ARMv6 format must be used to indicate zero-sized TCMs.
The ARMv7 format of the TCMTR is:
Bits [31:29] 0b100 | Bits [28:0] IMPLEMENTATION DEFINED
Bits [31:29]
Set to 0b100 for the ARMv7 register format. Set to 0b000 for the format used in ARMv6 and earlier.
Bits [28:0]
IMPLEMENTATION DEFINED in the ARMv7 register format.
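The CTR and the TCMTR both use bits [31:29] to identify their register format, so software typically checks those bits before interpreting anything else. The following C sketch, with hypothetical function names, performs that check and, for an ARMv7-format CTR, converts the Log2-of-words fields described in B4.6.7 into byte counts. Where CWG or ERG is 0b0000 it takes the conservative option the text allows, assuming the 512-word (2Kbyte) architectural maximum, rather than deriving CWG from the CCSIDRs.

```c
#include <stdint.h>

/* Hypothetical helpers. CTR and TCMTR encode their format in bits [31:29]. */
enum reg_format { REG_FORMAT_ARMV6, REG_FORMAT_ARMV7, REG_FORMAT_OTHER };

static enum reg_format ctr_tcmtr_format(uint32_t reg)
{
    switch (reg >> 29) {
    case 0x0: return REG_FORMAT_ARMV6;   /* 0b000: ARMv6 and earlier */
    case 0x4: return REG_FORMAT_ARMV7;   /* 0b100: ARMv7 format      */
    default:  return REG_FORMAT_OTHER;
    }
}

/* CTR size fields hold Log2 of a number of words; one word is 4 bytes. */
static uint32_t log2_words_to_bytes(uint32_t log2_words)
{
    return 4u << log2_words;
}

static uint32_t ctr_dminline_bytes(uint32_t ctr)   /* DminLine, bits [19:16] */
{
    return log2_words_to_bytes((ctr >> 16) & 0xFu);
}

static uint32_t ctr_iminline_bytes(uint32_t ctr)   /* IminLine, bits [3:0] */
{
    return log2_words_to_bytes(ctr & 0xFu);
}

/* CWG (bits [27:24]) and ERG (bits [23:20]): 0b0000 means "not provided",
 * and this sketch then assumes the 512-word (2048-byte) maximum.
 * Reserved values above 0b1001 are not checked. */
static uint32_t ctr_cwg_bytes(uint32_t ctr)
{
    uint32_t field = (ctr >> 24) & 0xFu;
    return field ? log2_words_to_bytes(field) : 2048u;
}

static uint32_t ctr_erg_bytes(uint32_t ctr)
{
    uint32_t field = (ctr >> 20) & 0xFu;
    return field ? log2_words_to_bytes(field) : 2048u;
}
```

For example, a hypothetical CTR value of 0x80338003 is in the ARMv7 format, reports 8-word (32-byte) minimum data and instruction cache lines and a 32-byte ERG, and does not provide a CWG.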
If no TCMs are implemented, the TCMTR must be implemented with this ARMv6 format:
Bits [31:29] 0b000 | Bits [28:19] UNKNOWN | Bits [18:16] 0b000 | Bits [15:3] UNKNOWN | Bits [2:0] 0b000
For details of the ARMv6 optional implementation of the TCM Type Register see c0, TCM Type Register (TCMTR) on page AppxG-33.

Accessing the TCMTR
To access the TCMTR you read the CP15 registers with <opc1> set to 0, <CRn> set to c0, <CRm> set to c0, and <opc2> set to 2. For example:

    MRC p15,0,<Rt>,c0,c0,2    ; Read CP15 TCM Type Register

B4.6.9 c0, MPU Type Register (MPUIR)
The MPU Type Register, MPUIR, identifies the features of the MPU implementation. In particular it identifies:
• whether the MPU implements:
  — a Unified address map, also referred to as a von Neumann architecture
  — separate Instruction and Data address maps, also referred to as a Harvard architecture.
• the number of memory regions implemented by the MPU.
The MPUIR is:
• a 32-bit read-only register
• accessible only in privileged modes
• implemented only when the PMSA is implemented.
The format of the MPUIR is:
Bits [31:24] UNKNOWN | Bits [23:16] IRegion | Bits [15:8] DRegion | Bits [7:1] UNKNOWN | Bit [0] nU

Bits [31:24]
UNKNOWN.
IRegion, bits [23:16]
Specifies the number of Instruction regions implemented by the MPU. If the MPU implements a Unified memory map this field is UNK/SBZ.
DRegion, bits [15:8]
Specifies the number of Data or Unified regions implemented by the MPU. If this field is zero, no MPU is implemented, and the default memory map is in use.
Bits [7:1]
UNKNOWN.
nU, bit [0]
Not Unified MPU. Indicates whether the MPU implements a unified memory map:
nU == 0  Unified memory map. Bits [23:16] of the register are zero.
nU == 1  Separate Instruction and Data memory maps.

Accessing the MPUIR
To access the MPUIR you read the CP15 registers with <opc1> set to 0, <CRn> set to c0, <CRm> set to c0, and <opc2> set to 4.
For example:

    MRC p15,0,<Rt>,c0,c0,4    ; Read CP15 MPU Type Register

B4.6.10 c0, Multiprocessor Affinity Register (MPIDR)
The Multiprocessor Affinity Register, MPIDR, provides an additional processor identification mechanism for scheduling purposes in a multiprocessor system. In a uniprocessor system ARM recommends that this register returns a value of 0.
The MPIDR is:
• a 32-bit read-only register
• accessible only in privileged modes
• introduced in ARMv7.
The format of the MPIDR is:
Bits [31:24] 0b00000000 | Bits [23:16] Affinity level 2 | Bits [15:8] Affinity level 1 | Bits [7:0] Affinity level 0

Note
In the MPIDR bit definitions, a processor in the system can be a physical processor or a virtual processor.

Bits [31:24]
Reserved, RAZ.
Affinity level 2, bits [23:16]
The most significant affinity level field, for this processor in the system.
Affinity level 1, bits [15:8]
The intermediate affinity level field, for this processor in the system.
Affinity level 0, bits [7:0]
The least significant affinity level field, for this processor in the system.

In the system as a whole, for each of the affinity level fields, the assigned values must start at 0 and increase monotonically.
Increasing monotonically means that:
• There must not be any gaps in the sequence of numbers used.
• A higher value of the field includes any properties indicated by all lower values of the field.
When matching against an affinity level field, scheduler software checks for a value equal to or greater than a required value.
Recommended use of the MPIDR includes a description of an example multiprocessor system and the affinity level field values it might use.
The interpretation of these fields is IMPLEMENTATION DEFINED, and must be documented as part of the documentation of the multiprocessor system. ARM recommends that this register is used as described in Recommended use of the MPIDR.
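Scheduler code usually works with the three affinity fields separately rather than with the raw MPIDR value. The following C sketch shows one way to split them out and to test whether two processors agree at a given affinity level and at every level above it, which is the kind of comparison a scheduler needs when it starts with an exact match and then relaxes the lower levels. The structure and function names are hypothetical; only the bit positions come from the register format above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical decoded form of the MPIDR affinity fields. */
struct mpidr_affinity {
    uint8_t aff0;   /* Affinity level 0, bits [7:0]   */
    uint8_t aff1;   /* Affinity level 1, bits [15:8]  */
    uint8_t aff2;   /* Affinity level 2, bits [23:16] */
};

static struct mpidr_affinity mpidr_decode(uint32_t mpidr)
{
    struct mpidr_affinity a;
    a.aff0 = (uint8_t)(mpidr & 0xFFu);
    a.aff1 = (uint8_t)((mpidr >> 8) & 0xFFu);
    a.aff2 = (uint8_t)((mpidr >> 16) & 0xFFu);
    return a;   /* bits [31:24] are RAZ and are ignored */
}

/* True if a and b have equal fields at 'level' (0, 1 or 2) and at every
 * higher level. level 0 is an exact match; level 2 compares only Aff2. */
static bool mpidr_match_from(struct mpidr_affinity a, struct mpidr_affinity b,
                             int level)
{
    if (a.aff2 != b.aff2)
        return false;                 /* level 2 is always compared */
    if (level <= 1 && a.aff1 != b.aff1)
        return false;
    if (level <= 0 && a.aff0 != b.aff0)
        return false;
    return true;
}
```

Because the meaning of each level is IMPLEMENTATION DEFINED, a real scheduler would take the policy for relaxing matches from the system documentation rather than hard-coding it.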
The software mechanism to discover the total number of affinity numbers used at each level is IMPLEMENTATION DEFINED, and is part of the general system identification task.

Recommended use of the MPIDR
In a multiprocessor system the register might provide two important functions:
• Identifying special functionality of a particular processor in the system. In general, the actual meaning of the affinity level fields is not important. In a small number of situations, an affinity level field value might have a special IMPLEMENTATION DEFINED significance. Possible examples include booting from reset and power-down events.
• Providing affinity information for the scheduling software, to help the scheduler run an individual thread or process on either:
  — the same processor, or as similar a processor as possible, as the processor it was running on previously
  — a processor on which a related thread or process was run.

Note
A monotonically increasing single number ID mechanism provides a convenient index into software arrays and for accessing the interrupt controller. This might be:
• performed as part of the boot sequence
• stored as part of the local storage of threads.

MPIDR provides a mechanism with up to three levels of affinity information, but the meaning of those levels of affinity is entirely IMPLEMENTATION DEFINED. The levels of affinity provided can have different meanings. Table B4-12 shows two possible implementations:

Table B4-12 Possible implementations of the affinity levels

Affinity level  Example system 1                                         Example system 2
0               Virtual CPUs in a multi-threaded processor               Processors in an SMP cluster
1               Processors in a Symmetric Multi Processor (SMP) cluster  Clusters in a system
2               Clusters in a system                                     No meaning, fixed as 0.

The scheduler maintains affinity level information for all threads and processes.
When it has to reschedule a thread or process the scheduler:
• looks for an available processor that matches at all three affinity levels
• if this fails, it might look for a processor that matches at levels 1 and 2 only
• if it still cannot find an available processor it might look for a match at level 2 only.
A multiprocessor system corresponding to Example system 1 in Table B4-12 might implement affinity values as shown in Table B4-13:

Table B4-13 Example of possible affinity values at different affinity levels

Affinity level 2,  Affinity level 1,  Affinity level 0,
Cluster level      Processor level    Virtual CPU level
0                  0                  0, 1
0                  1                  0, 1
0                  2                  0, 1
0                  3                  0, 1
1                  0                  0, 1
1                  1                  0, 1
1                  2                  0, 1
1                  3                  0, 1

Accessing the MPIDR
To access the MPIDR you read the CP15 registers with <opc1> set to 0, <CRn> set to c0, <CRm> set to c0, and <opc2> set to 5. For example:

    MRC p15,0,<Rt>,c0,c0,5    ; Read Multiprocessor Affinity Register

B4.6.11 c0, Cache Size ID Registers (CCSIDR)
The Cache Size ID Registers, CCSIDR, provide information about the architecture of the caches.
The CCSIDR registers are:
• 32-bit read-only registers
• accessible only in privileged modes
• introduced in ARMv7.
One CCSIDR is implemented for each cache that can be accessed by the processor. CSSELR selects which Cache Size ID register is accessible, see c0, Cache Size Selection Register (CSSELR) on page B4-43.
The format of a CCSIDR is:
Bit [31] WT | Bit [30] WB | Bit [29] RA | Bit [28] WA | Bits [27:13] NumSets | Bits [12:3] Associativity | Bits [2:0] LineSize

WT, bit [31]
Indicates whether the cache level supports Write-Through, see Table B4-14.
WB, bit [30]
Indicates whether the cache level supports Write-Back, see Table B4-14.
RA, bit [29]
Indicates whether the cache level supports Read-Allocation, see Table B4-14.
WA, bit [28]
Indicates whether the cache level supports Write-Allocation, see Table B4-14.
Table B4-14 WT, WB, RA and WA bit values
WT, WB, RA or WA bit value  Meaning
0  Feature not supported
1  Feature supported

NumSets, bits [27:13]  (Number of sets in cache) - 1, therefore a value of 0 indicates 1 set in the cache. The number of sets does not have to be a power of 2.
Associativity, bits [12:3]  (Associativity of cache) - 1, therefore a value of 0 indicates an associativity of 1. The associativity does not have to be a power of 2.
LineSize, bits [2:0]  (Log2(Number of words in cache line)) - 2. For example:
• For a line length of 4 words: Log2(4) = 2, LineSize entry = 0. This is the minimum line length.
• For a line length of 8 words: Log2(8) = 3, LineSize entry = 1.

Accessing the currently selected CCSIDR
The CSSELR selects a CCSIDR, see c0, Cache Size Selection Register (CSSELR) on page B4-43. To access the currently-selected CCSIDR you read the CP15 registers with <opc1> set to 1, <CRn> set to c0, <CRm> set to c0, and <opc2> set to 0. For example:
MRC p15,1,<Rt>,c0,c0,0 ; Read current CP15 Cache Size ID Register
Accessing the CCSIDR when the value in CSSELR corresponds to a cache that is not implemented returns an UNKNOWN value.

B4.6.12 c0, Cache Level ID Register (CLIDR)
The Cache Level ID Register, CLIDR:
• identifies the type of cache, or caches, implemented at each level, up to a maximum of seven levels
• identifies the Level of Coherency and Level of Unification for the cache hierarchy.
The CLIDR is:
• a 32-bit read-only register
• accessible only in privileged modes
• introduced in ARMv7.
The format of the CLIDR is:
Bits [31:30] 0, bits [29:27] LoUU, bits [26:24] LoC, bits [23:21] LoUIS, bits [20:18] Ctype7, bits [17:15] Ctype6, bits [14:12] Ctype5, bits [11:9] Ctype4, bits [8:6] Ctype3, bits [5:3] Ctype2, bits [2:0] Ctype1.
Bits [31:30]  RAZ.
LoUU, bits [29:27]  Level of Unification Uniprocessor for the cache hierarchy, see Clean, Invalidate, and Clean and Invalidate on page B2-11.
LoC, bits [26:24]  Level of Coherency for the cache hierarchy, see Clean, Invalidate, and Clean and Invalidate on page B2-11.
LoUIS, bits [23:21]  Level of Unification Inner Shareable for the cache hierarchy, see Clean, Invalidate, and Clean and Invalidate on page B2-11. This field is RAZ in implementations that do not implement the Multiprocessing Extensions.
CtypeX, bits [3(x - 1) + 2:3(x - 1)], for x = 1 to 7  Cache type fields. Indicate the type of cache implemented at each level, from Level 1 up to a maximum of seven levels of cache hierarchy. The Level 1 cache type field, Ctype1, is bits [2:0], see register diagram. Table B4-15 shows the possible values for each CtypeX field.

Table B4-15 Ctype bit values
CtypeX bits  Meaning, cache implemented at this level
000  No cache
001  Instruction cache only
010  Data cache only
011  Separate instruction and data caches
100  Unified cache
101, 11X  Reserved

If you read the Cache type fields from Ctype1 upwards, once you have seen a value of 0b000, no caches exist at further out levels of the hierarchy. So, for example, if Ctype3 is the first Cache type field with a value of 0b000, the values of Ctype4 to Ctype7 must be ignored.
The CLIDR describes only the caches that are under the control of the processor.

Accessing the CLIDR
To access the CLIDR you read the CP15 registers with <opc1> set to 1, <CRn> set to c0, <CRm> set to c0, and <opc2> set to 1. For example:
MRC p15,1,<Rt>,c0,c0,1 ; Read CP15 Cache Level ID Register

B4.6.13 c0, IMPLEMENTATION DEFINED Auxiliary ID Register (AIDR)
The IMPLEMENTATION DEFINED Auxiliary ID Register, AIDR, provides implementation-specific ID information. The value of this register must be used in conjunction with the value of the MIDR.
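The CLIDR and CCSIDR described earlier in this section can be decoded with plain shifts and masks. The Python sketch below is illustrative only. It assumes the register values have already been read with the MRC instructions shown, for example from a debugger dump, and that a word is 4 bytes. The function names are hypothetical. `clidr_levels` stops at the first 0b000 Ctype field, as the text requires. `ccsidr_geometry` reverses the minus-one encodings of NumSets and Associativity and the log2 encoding of LineSize.

```python
CTYPE_MEANINGS = {
    0b000: "No cache",
    0b001: "Instruction cache only",
    0b010: "Data cache only",
    0b011: "Separate instruction and data caches",
    0b100: "Unified cache",
}


def clidr_levels(clidr: int) -> list:
    """Return [(level, meaning), ...] for Ctype1 upwards, stopping at the first 0b000."""
    levels = []
    for level in range(1, 8):
        ctype = (clidr >> (3 * (level - 1))) & 0b111
        if ctype == 0b000:
            break  # no caches at this or further-out levels; later fields are ignored
        levels.append((level, CTYPE_MEANINGS.get(ctype, "Reserved")))
    return levels


def clidr_loc(clidr: int) -> int:
    """Level of Coherency, bits [26:24]."""
    return (clidr >> 24) & 0b111


def ccsidr_geometry(ccsidr: int) -> dict:
    """Decode a CCSIDR value into line size, ways, sets and total size in bytes."""
    line_words = 1 << ((ccsidr & 0b111) + 2)  # LineSize = log2(words per line) - 2
    ways = ((ccsidr >> 3) & 0x3FF) + 1        # Associativity field holds ways - 1
    sets = ((ccsidr >> 13) & 0x7FFF) + 1      # NumSets field holds sets - 1
    line_bytes = 4 * line_words               # assumes 4-byte words
    return {
        "write_through": bool((ccsidr >> 31) & 1),
        "write_back": bool((ccsidr >> 30) & 1),
        "read_allocate": bool((ccsidr >> 29) & 1),
        "write_allocate": bool((ccsidr >> 28) & 1),
        "line_bytes": line_bytes,
        "ways": ways,
        "sets": sets,
        "size_bytes": line_bytes * ways * sets,
    }


# Hypothetical values: L1 separate I and D, L2 unified, LoUU = 1, LoC = 2,
# and a 4-way, 256-set cache with 8-word lines that supports WB, RA and WA.
print(clidr_levels(0x0A000023), clidr_loc(0x0A000023))
print(ccsidr_geometry(0x701FE019)["size_bytes"])  # prints 32768
```

To enumerate every cache, software would combine the two decoders with CSSELR. It would write CSSELR for each level that `clidr_levels` reports, then read and decode the selected CCSIDR.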
The IMPLEMENTATION DEFINED AIDR is:
• a 32-bit read-only register
• accessible only in privileged modes
• introduced in ARMv7.
The format of the AIDR is IMPLEMENTATION DEFINED.

Accessing the AIDR
To access the AIDR you read the CP15 registers with <opc1> set to 1, <CRn> set to c0, <CRm> set to c0, and <opc2> set to 7. For example:
MRC p15,1,<Rt>,c0,c0,7 ; Read IMPLEMENTATION DEFINED Auxiliary ID Register

B4.6.14 c0, Cache Size Selection Register (CSSELR)
The Cache Size Selection Register, CSSELR, selects the current CCSIDR. An ARMv7 implementation must include a CCSIDR for every implemented cache that is under the control of the processor. The CSSELR identifies which CCSIDR can be accessed, by specifying, for the required cache:
• the cache level
• the cache type, either:
— instruction cache
— data cache. The data cache argument is also used for a unified cache.
CSSELR is:
• a 32-bit read/write register
• accessible only in privileged modes
• introduced in ARMv7.
The format of the CSSELR is:
Bits [31:4] UNK/SBZP, bits [3:1] Level, bit [0] InD.
Bits [31:4]  UNK/SBZP.
Level, bits [3:1]  Cache level of required cache. Permitted values are from 0b000, indicating Level 1 cache, to 0b110, indicating Level 7 cache.
InD, bit [0]  Instruction not data bit. Permitted values are:
0  Data or unified cache
1  Instruction cache.
If CSSELR is set to indicate a cache that is not implemented, the result of reading the current CCSIDR is UNPREDICTABLE.

Accessing CSSELR
To access CSSELR you read or write the CP15 registers with <opc1> set to 2, <CRn> set to c0, <CRm> set to c0, and <opc2> set to 0. For example:
MRC p15,2,<Rt>,c0,c0,0 ; Read Cache Size Selection Register
MCR p15,2,<Rt>,c0,c0,0 ; Write Cache Size Selection Register

B4.6.15 CP15 c1, System control registers
The CP15 c1 registers are used for system control. Figure B4-5 shows the CP15 c1 registers.
Figure B4-5 CP15 c1 registers in a PMSA implementation (CRn = c1, opc1 = 0, CRm = c0)
opc2 = 0  SCTLR, Control Register
opc2 = 1  ACTLR, Auxiliary Control Register, IMPLEMENTATION DEFINED
opc2 = 2  CPACR, Coprocessor Access Control Register

All CP15 c1 register encodings not shown in Figure B4-5 are UNPREDICTABLE, see Unallocated CP15 encodings on page B4-27.

B4.6.16 c1, System Control Register (SCTLR)
The System Control Register, SCTLR, provides the top level control of the system, including its memory system.
The SCTLR:
• Is a 32-bit read/write register, with different access rights for some bits of the register. In ARMv7, some bits in the register are read-only. These bits relate to non-configurable features of an ARMv7 implementation, and are provided for compatibility with previous versions of the architecture.
• Is accessible only in privileged modes.
• Has a defined reset value. The reset value is IMPLEMENTATION DEFINED, see Reset value of the SCTLR on page B4-49.
Control bits in the SCTLR that are not applicable to a PMSA implementation read as the value that most closely reflects that implementation, and ignore writes.
In an ARMv7-R implementation the format of the SCTLR is as follows, with fixed bits shown as 0 or 1:
Bits [31:24]  IE, TE, 0, 0, NMFI, 0, EE, VE
Bits [23:16]  1, U, FI, 0, DZ, 1, BR, 1
Bits [15:8]   0, RR, V, I, Z, SW, 0, 0
Bits [7:0]    B, 1, 1, 1, 1, C, A, M
IE, bit [31]  Instruction Endianness. This bit indicates the endianness of the instructions issued to the processor:
0  Little-endian byte ordering in the instructions
1  Big-endian byte ordering in the instructions.
When set, this bit causes the byte order of instructions to be reversed at runtime.
This bit is read-only. It is IMPLEMENTATION DEFINED which instruction endianness is used by an ARMv7-R implementation, and this bit must indicate the implemented endianness.
If IE == 1 and EE == 0, behavior is UNPREDICTABLE.
TE, bit [30]  Thumb Exception enable. This bit controls whether exceptions are taken in ARM or Thumb state:
0  Exceptions, including reset, handled in ARM state
1  Exceptions, including reset, handled in Thumb state.
An implementation can include a configuration input signal that determines the reset value of the TE bit. If the implementation does not include a configuration signal for this purpose then this bit resets to zero in an ARMv7-R implementation.
For more information about the use of this bit see Instruction set state on exception entry on page B1-35.
Bits [29:28]  RAZ/SBZP.
NMFI, bit [27]  Non-Maskable Fast Interrupts enable:
0  Fast interrupts (FIQs) can be masked in the CPSR
1  Fast interrupts are non-maskable.
This bit is read-only. It is IMPLEMENTATION DEFINED whether an implementation supports Non-Maskable Fast Interrupts (NMFIs):
• If NMFIs are not supported then this bit is RAZ/WI.
• If NMFIs are supported then this bit is determined by a configuration input signal.
For more information, see Non-maskable fast interrupts on page B1-18.
Bit [26]  RAZ/SBZP.
EE, bit [25]  Exception Endianness bit. The value of this bit defines the value of the CPSR.E bit on entry to an exception vector, including reset. The permitted values of this bit are:
0  Little endian
1  Big endian.
This is a read/write bit. An implementation can include a configuration input signal that determines the reset value of the EE bit. If the implementation does not include a configuration signal for this purpose then this bit resets to zero.
If IE == 1 and EE == 0, behavior is UNPREDICTABLE.
VE, bit [24]  Interrupt Vectors Enable bit. This bit controls the vectors used for the FIQ and IRQ interrupts.
The permitted values of this bit are:
0  Use the FIQ and IRQ vectors from the vector table, see the V bit entry
1  Use the IMPLEMENTATION DEFINED values for the FIQ and IRQ vectors.
For more information, see Vectored interrupt support on page B1-32.
If the implementation does not support IMPLEMENTATION DEFINED FIQ and IRQ vectors then this bit is RAZ/WI.
Bit [23]  RAO/SBOP.
U, bit [22]  In ARMv7 this bit is RAO/SBOP, indicating use of the alignment model described in Alignment support on page A3-4. For details of this bit in earlier versions of the architecture see Alignment on page AppxG-6.
FI, bit [21]  Fast Interrupts configuration enable bit. This bit can be used to reduce interrupt latency in an implementation by disabling IMPLEMENTATION DEFINED performance features. The permitted values of this bit are:
0  All performance features enabled.
1  Low interrupt latency configuration. Some performance features disabled.
If the implementation does not support a mechanism for selecting a low interrupt latency configuration this bit is RAZ/WI.
For more information, see Low interrupt latency configuration on page B1-43.
Bit [20]  RAZ/SBZP.
DZ, bit [19]  Divide by Zero fault enable bit. Any ARMv7-R implementation includes instructions to perform unsigned and signed division, see SDIV on page A8-310 and UDIV on page A8-468. This bit controls whether an integer divide by zero causes an Undefined Instruction exception:
0  Divide by zero returns the result zero, and no exception is taken
1  Attempting a divide by zero causes an Undefined Instruction exception on the SDIV or UDIV instruction.
Bit [18]  RAO/SBOP.
BR, bit [17]  Background Region bit.
When the MPU is enabled this bit controls how an access that does not map to any MPU memory region is handled:
0  Any access to an address that is not mapped to an MPU region generates a Background Fault memory abort. This is the PMSAv6 behavior.
1  The default memory map is used as a background region:
• A privileged access to an address that does not map to an MPU region takes the properties defined for that address in the default memory map.
• An unprivileged access to an address that does not map to an MPU region generates a Background Fault memory abort.
For more information, see Using the default memory map as a background region on page B4-5.
Bit [16]  RAO/SBOP.
Bit [15]  RAZ/SBZP.
RR, bit [14]  Round Robin bit. If the cache implementation supports the use of an alternative replacement strategy that has a more easily predictable worst-case performance, this bit selects it:
0  Normal replacement strategy, for example, random replacement
1  Predictable strategy, for example, round-robin replacement.
The RR bit must reset to 0. The replacement strategy associated with each value of the RR bit is IMPLEMENTATION DEFINED.
If the implementation does not support multiple IMPLEMENTATION DEFINED replacement strategies this bit is RAZ/WI.
V, bit [13]  Vectors bit. This bit selects the base address of the exception vectors:
0  Normal exception vectors, base address 0x00000000.
1  High exception vectors (Hivecs), base address 0xFFFF0000.
For more information, see Exception vectors and the exception base address on page B1-30.
Note
Use of the Hivecs setting, V == 1, is deprecated in an ARMv7-R implementation.
An implementation can include a configuration input signal that determines the reset value of the V bit. If the implementation does not include a configuration signal for this purpose then this bit resets to zero.
I, bit [12]  Instruction cache enable bit. This is a global enable bit for instruction caches:
0  Instruction caches disabled
1  Instruction caches enabled.
If the system does not implement any instruction caches that can be accessed by the processor, at any level of the memory hierarchy, this bit is RAZ/WI.
If the system implements any instruction caches that can be accessed by the processor then it must be possible to disable them by setting this bit to 0.
Cache enabling and disabling on page B2-8 describes the effect of enabling the caches.
Z, bit [11]  Branch prediction enable bit. This bit is used to enable branch prediction, also called program flow prediction:
0  Program flow prediction disabled
1  Program flow prediction enabled.
If program flow prediction cannot be disabled, this bit is RAO/WI.
If the implementation does not support program flow prediction then this bit is RAZ/WI.
SW, bit [10]  SWP/SWPB enable bit. This bit enables the use of SWP and SWPB instructions:
0  SWP and SWPB are UNDEFINED
1  SWP and SWPB perform as described in section SWP, SWPB on page A8-432.
This bit is added as part of the Multiprocessing Extensions.
Note
At reset, this bit disables SWP and SWPB. This means that an operating system must explicitly enable SWP and SWPB if it wants to use them.
Bits [9:8]  RAZ/SBZP.
B, bit [7]  In ARMv7 this bit is RAZ/SBZP, indicating use of the endianness model described in Endian support on page A3-7. For details of this bit in earlier versions of the architecture see Endian support on page AppxG-7 and Endian support on page AppxH-7.
Bits [6:3]  RAO/SBOP.
C, bit [2]  Cache enable bit. This is a global enable bit for data and unified caches:
0  Data and unified caches disabled
1  Data and unified caches enabled.
If the system does not implement any data or unified caches that can be accessed by the processor, at any level of the memory hierarchy, this bit is RAZ/WI.
If the system implements any data or unified caches that can be accessed by the processor then it must be possible to disable them by setting this bit to 0.
Cache enabling and disabling on page B2-8 describes the effect of enabling the caches.
A, bit [1]  Alignment bit. This is the enable bit for Alignment fault checking:
0  Alignment fault checking disabled
1  Alignment fault checking enabled.
For more information, see Alignment fault on page B4-14.
M, bit [0]  MPU enable bit. This is a global enable bit for the MPU:
0  MPU disabled
1  MPU enabled.
For more information, see Enabling and disabling the MPU on page B4-5.

Reset value of the SCTLR
The SCTLR has a defined reset value that is IMPLEMENTATION DEFINED. There are different types of bit in the SCTLR:
• Some bits are defined as RAZ or RAO, and have the same value in all PMSAv7 implementations. Figure B4-6 on page B4-50 shows the values of these bits.
• Some bits are read-only and either:
— have an IMPLEMENTATION DEFINED value
— have a value that is determined by a configuration input signal.
• Some bits are read/write and either:
— reset to zero
— reset to an IMPLEMENTATION DEFINED value
— reset to a value that is determined by a configuration input signal.
Figure B4-6 on page B4-50 shows the reset value, or how the reset value is defined, for each bit of the SCTLR. It also shows the possible values of each half byte of the register.
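The half-byte values in Figure B4-6 follow from the bit descriptions above. As a cross-check, the Python sketch below builds two extreme reset values. The smallest uses the RAO bits alone. The largest also sets every bit that the descriptions allow to be 1 at reset: IE, TE, NMFI, EE and V, which are IMPLEMENTATION DEFINED or set by a configuration input, and Z, which is RAO/WI when program flow prediction cannot be disabled. The helper names are hypothetical, and the sketch does not model the UNPREDICTABLE combination IE == 1 with EE == 0.

```python
# Bits that are RAO/SBOP in the ARMv7-R SCTLR: bit 23, U (bit 22), 18, 16 and bits [6:3].
RAO_BITS = (23, 22, 18, 16, 6, 5, 4, 3)

# Bits that can read as 1 at reset, depending on the implementation.
MAY_RESET_TO_ONE = {"IE": 31, "TE": 30, "NMFI": 27, "EE": 25, "V": 13, "Z": 11}


def mask(bits) -> int:
    """Build a 32-bit mask with the given bit positions set."""
    value = 0
    for bit in bits:
        value |= 1 << bit
    return value


MIN_RESET = mask(RAO_BITS)
MAX_RESET = MIN_RESET | mask(MAY_RESET_TO_ONE.values())


def half_bytes(value: int) -> list:
    """Half bytes of a 32-bit value, most significant first."""
    return [(value >> shift) & 0xF for shift in range(28, -4, -4)]


print(f"{MIN_RESET:#010x} {MAX_RESET:#010x}")       # 0x00c50078 0xcac52878
print([f"{n:X}" for n in half_bytes(MAX_RESET)])  # ['C', 'A', 'C', '5', '2', '8', '7', '8']
```

The largest value reproduces the highest half-byte value listed for each position in Figure B4-6. The RAO-only value reproduces the fixed half bytes 0xC, 0x5, 0x7 and 0x8.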
Figure B4-6 Reset value of the SCTLR, ARMv7-R (PMSAv7)
Bits      Reset value of each bit, most significant first   Possible values of the half byte
[31:28]   IE †, TE ‡, 0, 0                                  0xC, 0x8, 0x4 or 0x0
[27:24]   NMFI ‡, 0, EE ‡, VE 0                             0xA, 0x8, 0x2 or 0x0
[23:20]   1, U 1, FI 0, 0                                   0xC
[19:16]   DZ 0, 1, BR 0, 1                                  0x5
[15:12]   0, RR 0, V ‡, I 0                                 0x2 or 0x0
[11:8]    Z 0 (1 if RAO/WI), SW 0, 0, 0                     0x8 or 0x0
[7:4]     B 0, 1, 1, 1                                      0x7
[3:0]     1, C 0, A 0, M 0                                  0x8
* Read-only bits, including RAZ and RAO bits: IE, bits [29:28], NMFI, bit [26], bit [23], U, bit [20], bit [18], bits [16:15], bits [9:8], B and bits [6:3].
(*) Can be RAZ. Otherwise read/write, resets to 0.
† Value is IMPLEMENTATION DEFINED.
(†) Can be read-only, with IMPLEMENTATION DEFINED value. Otherwise resets to 0.
‡ Value or reset value can depend on configuration input. Otherwise RAZ or resets to 0.

Accessing the SCTLR
To access SCTLR you read or write the CP15 registers with <opc1> set to 0, <CRn> set to c1, <CRm> set to c0, and <opc2> set to 0. For example:
MRC p15,0,<Rt>,c1,c0,0 ; Read CP15 System Control Register
MCR p15,0,<Rt>,c1,c0,0 ; Write CP15 System Control Register
Note
Additional configuration and control bits might be added to the SCTLR in future versions of the ARM architecture. ARM strongly recommends that software always uses a read, modify, write sequence to update the SCTLR. This prevents software modifying any bit that is currently unallocated, and minimizes the chance of the register update having undesired side effects.

B4.6.17 c1, IMPLEMENTATION DEFINED Auxiliary Control Register (ACTLR)
The Auxiliary Control Register, ACTLR, provides implementation-specific configuration and control options.
The ACTLR is:
• A 32-bit read/write register.
• Accessible only in privileged modes.
The contents of this register are IMPLEMENTATION DEFINED. ARMv7 requires this register to be privileged read/write accessible, even if an implementation has not created any control bits in this register.
Accessing the ACTLR
To access the ACTLR you read or write the CP15 registers with <opc1> set to 0, <CRn> set to c1, <CRm> set to c0, and <opc2> set to 1. For example:
MRC p15,0,<Rt>,c1,c0,1 ; Read CP15 Auxiliary Control Register
MCR p15,0,<Rt>,c1,c0,1 ; Write CP15 Auxiliary Control Register

B4.6.18 c1, Coprocessor Access Control Register (CPACR)
The Coprocessor Access Control Register, CPACR, controls access to all coprocessors other than CP14 and CP15. It also enables software to check for the presence of coprocessors CP0 to CP13.
The CPACR:
• is a 32-bit read/write register
• is accessible only in privileged modes
• has a defined reset value of 0.
The format of the CPACR is:
Bit [31] ASEDIS, bit [30] D32DIS, bits [29:28] (0), then one 2-bit field for each coprocessor, from cp13 in bits [27:26] down to cp0 in bits [1:0].
ASEDIS, bit [31]  Disable Advanced SIMD functionality:
0  This bit does not cause any instructions to be UNDEFINED.
1  All instruction encodings identified in the Alphabetical list of instructions on page A8-14 as being part of Advanced SIMD, but that are not VFPv3 instructions, are UNDEFINED.
On an implementation that:
• Implements VFP and does not implement Advanced SIMD, this bit is RAO/WI.
• Does not implement VFP or Advanced SIMD, this bit is UNK/SBZP.
• Implements both VFP and Advanced SIMD, it is IMPLEMENTATION DEFINED whether this bit is supported. If it is not supported it is RAZ/WI.
This bit resets to 0 if it is supported.
D32DIS, bit [30]  Disable use of D16-D31 of the VFP register file:
0  This bit does not cause any instructions to be UNDEFINED.
1  All instruction encodings identified in the Alphabetical list of instructions on page A8-14 as being VFPv3 instructions are UNDEFINED if they access any of registers D16-D31.
If this bit is 1 when CPACR.ASEDIS == 0, the result is UNPREDICTABLE.
On an implementation that:
• Does not implement VFP, this bit is UNK/SBZP.
• Implements VFP and does not implement D16-D31, this bit is RAO/WI.
• Implements VFP and implements D16-D31, it is IMPLEMENTATION DEFINED whether this bit is supported. If it is not supported it is RAZ/WI.
This bit resets to 0 if it is supported.
Bits [29:28]  Reserved. UNK/SBZP.
cp<n>, bits [2n+1, 2n], for n = 0 to 13  Defines the access rights for coprocessor n. The possible values of the field are:
00  Access denied. Any attempt to access the coprocessor generates an Undefined Instruction exception.
01  Privileged access only. Any attempt to access the coprocessor in User mode generates an Undefined Instruction exception.
10  Reserved. The effect of this value is UNPREDICTABLE.
11  Full access. The meaning of full access is defined by the appropriate coprocessor.
The value for a coprocessor that is not implemented is 00, access denied.
If more than one coprocessor is used to provide a set of functionality then having different values for the CPACR fields for those coprocessors can lead to UNPREDICTABLE behavior. An example where this must be considered is the VFP extension, which uses CP10 and CP11.
Typically, an operating system uses this register to control coprocessor resource sharing among applications:
• Initially all applications are denied access to the shared coprocessor-based resources.
• When an application attempts to use a resource it results in an Undefined Instruction exception.
• The Undefined Instruction handler can then grant access to the resource by setting the appropriate field in the CPACR.
For details of how this register can be used to check for implemented coprocessors see Access controls on CP0 to CP13 on page B1-63.
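The field arithmetic behind the resource-sharing scheme above is small. The Python sketch below is a set of hypothetical helpers, not an ARM-provided API. They extract a cp<n> field and update it with a read, modify, write pattern, so the settings for other coprocessors are preserved. They refuse the reserved value 0b10. They also update CP10 and CP11 together, because the VFP extension uses both and mismatched values can be UNPREDICTABLE. The resulting value would then be written back with the MCR instruction given under Accessing the CPACR.

```python
DENIED, PRIVILEGED_ONLY, RESERVED, FULL = 0b00, 0b01, 0b10, 0b11


def cp_access(cpacr: int, n: int) -> int:
    """Return the 2-bit access field cp<n>, bits [2n+1:2n], for n = 0 to 13."""
    if not 0 <= n <= 13:
        raise ValueError("CPACR only has fields for CP0 to CP13")
    return (cpacr >> (2 * n)) & 0b11


def set_cp_access(cpacr: int, n: int, access: int) -> int:
    """Return cpacr with only the cp<n> field changed (read, modify, write)."""
    if not 0 <= n <= 13:
        raise ValueError("CPACR only has fields for CP0 to CP13")
    if access == RESERVED:
        raise ValueError("0b10 is reserved; its effect is UNPREDICTABLE")
    shift = 2 * n
    cleared = cpacr & ~(0b11 << shift) & 0xFFFFFFFF
    return cleared | ((access & 0b11) << shift)


def grant_vfp(cpacr: int, access: int = FULL) -> int:
    """Give CP10 and CP11 the same access rights, as the VFP extension uses both."""
    for n in (10, 11):
        cpacr = set_cp_access(cpacr, n, access)
    return cpacr


print(hex(grant_vfp(0)))  # prints 0xf00000
```

An Undefined Instruction handler following the scheme above might call `grant_vfp` on the value it read, and leave every other field as it was.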
Sharing resources among applications requires a state saving mechanism. Two possibilities are:
• during a context switch, if the last executing process or thread had access rights to a coprocessor then the operating system saves the state of that coprocessor
• on receiving a request for access to a coprocessor, the operating system saves the old state for that coprocessor with the last process or thread that accessed it.

Accessing the CPACR
To access the CPACR you read or write the CP15 registers with <opc1> set to 0, <CRn> set to c1, <CRm> set to c0, and <opc2> set to 2. For example:
MRC p15,0,<Rt>,c1,c0,2 ; Read CP15 Coprocessor Access Control Register
MCR p15,0,<Rt>,c1,c0,2 ; Write CP15 Coprocessor Access Control Register
Normally, software uses a read, modify, write sequence to update the CPACR, to avoid unwanted changes to the access settings for other coprocessors.

B4.6.19 CP15 c2 and c3, Not used on a PMSA implementation
The CP15 c2 and c3 register encodings are not used on an ARMv7-R implementation, see Unallocated CP15 encodings on page B4-27.

B4.6.20 CP15 c4, Not used
The CP15 c4 register encodings are not used on an ARMv7 implementation, see Unallocated CP15 encodings on page B4-27.

B4.6.21 CP15 c5 and c6, Memory system fault registers
The CP15 c5 and c6 registers are used for memory system fault reporting. In addition, c6 provides the MPU Region registers. Figure B4-7 on page B4-54 shows the CP15 c5 and c6 registers.
Figure B4-7 CP15 c5 and c6 registers in a PMSA implementation (opc1 = 0 for all encodings)
CRn c5, CRm c0, opc2 0  DFSR, Data Fault Status Register
CRn c5, CRm c0, opc2 1  IFSR, Instruction Fault Status Register
CRn c5, CRm c1, opc2 0  ADFSR, Auxiliary DFSR (details are IMPLEMENTATION DEFINED)
CRn c5, CRm c1, opc2 1  AIFSR, Auxiliary IFSR (details are IMPLEMENTATION DEFINED)
CRn c6, CRm c0, opc2 0  DFAR, Data Fault Address Register
CRn c6, CRm c0, opc2 2  IFAR, Instruction Fault Address Register
CRn c6, CRm c1, opc2 0  DRBAR, Data Region Base Address Register
CRn c6, CRm c1, opc2 1  IRBAR, Instruction Region Base Address Register
CRn c6, CRm c1, opc2 2  DRSR, Data Region Size and Enable Register
CRn c6, CRm c1, opc2 3  IRSR, Instruction Region Size and Enable Register
CRn c6, CRm c1, opc2 4  DRACR, Data Region Access Control Register
CRn c6, CRm c1, opc2 5  IRACR, Instruction Region Access Control Register
CRn c6, CRm c2, opc2 0  RGNR, MPU Region Number Register

All CP15 c5 and c6 register encodings not shown in Figure B4-7 are UNPREDICTABLE, see Unallocated CP15 encodings on page B4-27.
The CP15 c5 and c6 registers are described in:
• CP15 c5, Fault status registers
• CP15 c6, Fault Address registers on page B4-57
• CP15 c6, Memory region programming registers on page B4-59.
Also, these registers are used to report information about debug exceptions. For details see Effects of debug exceptions on CP15 registers and the DBGWFAR on page C4-4.

B4.6.22 CP15 c5, Fault status registers
There are two fault status registers, in CP15 c5, and the architecture provides encodings for two additional IMPLEMENTATION DEFINED registers. Table B4-16 summarizes these registers.

Table B4-16 Fault status registers
Data Fault Status Register (DFSR)  See c5, Data Fault Status Register (DFSR) on page B4-55
Instruction Fault Status Register (IFSR)  See c5, Instruction Fault Status Register (IFSR) on page B4-56
Auxiliary Data Fault Status Register (ADFSR) and Auxiliary Instruction Fault Status Register (AIFSR)  See c5, Auxiliary Data and Instruction Fault Status Registers (ADFSR and AIFSR) on page B4-56
Fault information is returned using the fault status registers and the fault address registers described in CP15 c6, Fault Address registers on page B4-57. For details of how these registers are used see Fault Status and Fault Address registers in a PMSA implementation on page B4-18.

c5, Data Fault Status Register (DFSR)
The Data Fault Status Register, DFSR, holds status information about the last data fault.
The DFSR is:
• a 32-bit read/write register
• accessible only in privileged modes.
The format of the DFSR is:
Bits [31:13] UNK/SBZP, bit [12] ExT, bit [11] WnR, bit [10] FS[4], bits [9:4] UNK/SBZP, bits [3:0] FS[3:0].
Bits [31:13,9:4]  UNK/SBZP.
ExT, bit [12]  External abort type. This bit can be used to provide an IMPLEMENTATION DEFINED classification of external aborts. For aborts other than external aborts this bit always returns 0.
WnR, bit [11]  Write not Read bit. Indicates whether the abort was caused by a write or a read access:
0  Abort caused by a read access
1  Abort caused by a write access.
For faults on CP15 cache maintenance operations this bit always returns a value of 1.
FS, bits [10,3:0]  Fault status bits. For the valid encodings of these bits in an ARMv7-R implementation with a PMSA, see Table B4-7 on page B4-20. All encodings not shown in the table are reserved.
For information about using the DFSR see Fault Status and Fault Address registers in a PMSA implementation on page B4-18.

Accessing the DFSR
To access the DFSR you read or write the CP15 registers with <opc1> set to 0, <CRn> set to c5, <CRm> set to c0, and <opc2> set to 0. For example:
MRC p15,0,<Rt>,c5,c0,0 ; Read CP15 Data Fault Status Register
MCR p15,0,<Rt>,c5,c0,0 ; Write CP15 Data Fault Status Register

c5, Instruction Fault Status Register (IFSR)
The Instruction Fault Status Register, IFSR, holds status information about the last instruction fault.
The IFSR is:
• a 32-bit read/write register
• accessible only in privileged modes.
The format of the IFSR is:
Bits [31:13] UNK/SBZP, bit [12] ExT, bit [11] (0), bit [10] FS[4], bits [9:4] UNK/SBZP, bits [3:0] FS[3:0].
Bits [31:13,11,9:4]  UNK/SBZP.
ExT, bit [12]  External abort type. This bit can be used to provide an IMPLEMENTATION DEFINED classification of external aborts. For aborts other than external aborts this bit always returns 0.
FS, bits [10,3:0]  Fault status bits. See Table B4-7 on page B4-20 for the valid encodings of these bits. All encodings not shown in the table are reserved.
For information about using the IFSR see Fault Status and Fault Address registers in a PMSA implementation on page B4-18.

Accessing the IFSR
To access the IFSR you read or write the CP15 registers with <opc1> set to 0, <CRn> set to c5, <CRm> set to c0, and <opc2> set to 1. For example:
MRC p15,0,<Rt>,c5,c0,1 ; Read CP15 Instruction Fault Status Register
MCR p15,0,<Rt>,c5,c0,1 ; Write CP15 Instruction Fault Status Register

c5, Auxiliary Data and Instruction Fault Status Registers (ADFSR and AIFSR)
The Auxiliary Data Fault Status Register (ADFSR) and the Auxiliary Instruction Fault Status Register (AIFSR) enable the system to return additional IMPLEMENTATION DEFINED fault status information, see Auxiliary Fault Status Registers on page B4-21.
The ADFSR and AIFSR are:
• 32-bit read/write registers
• accessible only in privileged modes
• introduced in ARMv7.
The formats of the ADFSR and AIFSR are IMPLEMENTATION DEFINED.

Accessing the ADFSR and AIFSR
To access the ADFSR or AIFSR you read or write the CP15 registers with <opc1> set to 0, <CRn> set to c5, <CRm> set to c1, and <opc2> set to:
• 0 for the ADFSR
• 1 for the AIFSR.
For example:
MRC p15,0,<Rt>,c5,c1,0 ; Read CP15 Auxiliary Data Fault Status Register
MCR p15,0,<Rt>,c5,c1,0 ; Write CP15 Auxiliary Data Fault Status Register
MRC p15,0,<Rt>,c5,c1,1 ; Read CP15 Auxiliary Instruction Fault Status Register
MCR p15,0,<Rt>,c5,c1,1 ; Write CP15 Auxiliary Instruction Fault Status Register

B4.6.23 CP15 c6, Fault Address registers
There are two Fault Address registers, in CP15 c6, as shown in Figure B4-7 on page B4-54. The two Fault Address registers complement the Fault Status registers, and are shown in Table B4-17.

Table B4-17 Fault address registers
Data Fault Address Register (DFAR)  See c6, Data Fault Address Register (DFAR)
Instruction Fault Address Register (IFAR)  See c6, Instruction Fault Address Register (IFAR) on page B4-58

Note
Before ARMv7:
• The DFAR was called the Fault Address Register (FAR).
• The Watchpoint Fault Address Register (DBGWFAR) was implemented in CP15 c6 with <opc2> == 1. From ARMv7, the DBGWFAR is only implemented as a CP14 debug register, see Watchpoint Fault Address Register (DBGWFAR) on page C10-28.

Fault information is returned using the fault address registers and the fault status registers described in CP15 c5, Fault status registers on page B4-54. For details of how these registers are used see Fault Status and Fault Address registers in a PMSA implementation on page B4-18.

c6, Data Fault Address Register (DFAR)
The Data Fault Address Register, DFAR, holds the faulting address that caused a synchronous Data Abort exception.
The DFAR is:
• a 32-bit read/write register
• accessible only in privileged modes.
The format of the DFAR is:
Bits [31:0]  Faulting address of synchronous Data Abort exception.
For information about using the DFAR, including when the value in the DFAR is valid, see Fault Status and Fault Address registers in a PMSA implementation on page B4-18.
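A data abort handler typically reads the DFSR and the DFAR together. The Python sketch below shows how to reassemble the split FS field, bits [10,3:0], into a 5-bit value alongside WnR and ExT. The function names are hypothetical. It assumes the register values were read with the MRC instructions given in this section. It deliberately leaves the meaning of each FS value to Table B4-7 rather than guessing it. Because the IFSR uses the same FS and ExT bit positions, `decode_fsr` also applies to an IFSR value if WnR is ignored.

```python
def decode_fsr(fsr: int) -> dict:
    """Split a DFSR (or IFSR) value into its architecturally defined fields."""
    fs = (((fsr >> 10) & 1) << 4) | (fsr & 0xF)  # FS[4] is bit 10, FS[3:0] is bits [3:0]
    return {
        "fs": fs,
        "wnr": bool((fsr >> 11) & 1),  # DFSR only: True means a write access
        "ext": bool((fsr >> 12) & 1),  # IMPLEMENTATION DEFINED external abort type
    }


def describe_data_abort(dfsr: int, dfar: int) -> str:
    """Hypothetical one-line report; the DFAR validity rules (page B4-18) are not checked."""
    f = decode_fsr(dfsr)
    access = "write" if f["wnr"] else "read"
    return f"{access} access, FS=0b{f['fs']:05b}, address 0x{dfar:08X}"


print(describe_data_abort(0x0000080D, 0x20001000))
# write access, FS=0b01101, address 0x20001000
```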
A debugger can write to the DFAR to restore its value.
Accessing the DFAR
To access the DFAR you read or write the CP15 registers with <opc1> set to 0, <CRn> set to c6, <CRm> set to c0, and <opc2> set to 0. For example:
    MRC p15,0,<Rt>,c6,c0,0    ; Read CP15 Data Fault Address Register
    MCR p15,0,<Rt>,c6,c0,0    ; Write CP15 Data Fault Address Register
c6, Instruction Fault Address Register (IFAR)
The Instruction Fault Address Register, IFAR, holds the address of the faulting access that caused a synchronous Prefetch Abort exception.
The IFAR is:
• a 32-bit read/write register
• accessible only in privileged modes.
The format of the IFAR is:
  Bits [31:0]  Faulting address of synchronous Prefetch Abort exception
For information about using the IFAR, including when the value in the IFAR is valid, see Fault Status and Fault Address registers in a PMSA implementation on page B4-18.
A debugger can write to the IFAR to restore its value.
Accessing the IFAR
To access the IFAR you read or write the CP15 registers with <opc1> set to 0, <CRn> set to c6, <CRm> set to c0, and <opc2> set to 2. For example:
    MRC p15,0,<Rt>,c6,c0,2    ; Read CP15 Instruction Fault Address Register
    MCR p15,0,<Rt>,c6,c0,2    ; Write CP15 Instruction Fault Address Register
B4.6.24 CP15 c6, Memory region programming registers
When the PMSA is implemented, a number of registers in CP15 c6 are used to configure the MPU memory regions. There are three registers for each memory region supported by the MPU:
• A Base Address Register, that defines the start address of the region in the memory map.
• A Region Size and Enable Register, that:
  - has a single enable bit for the region
  - defines the size of the region
  - has a disable bit for each of the eight subregions in the region.
• A Region Access Control Register, that defines the memory access attributes for the region.
The multiple copies of these registers are mapped onto three or six registers in CP15 c6, and another register is used to select the current memory region. The mapping of the region registers onto the CP15 registers depends on whether the MPU implements a unified memory map, or separate Instruction and Data memory maps:
Separate Instruction and Data memory maps
The multiple copies of the registers that describe each memory region map onto six CP15 registers.
For the memory regions in the Instruction memory map:
• the multiple Region Base Address Registers map onto the Instruction Region Base Address Register, IRBAR
• the multiple Region Size and Enable Registers map onto the Instruction Region Size and Enable Register, IRSR
• the multiple Region Access Control Registers map onto the Instruction Region Access Control Register, IRACR.
For the memory regions in the Data memory map:
• the multiple Region Base Address Registers map onto the Data Region Base Address Register, DRBAR
• the multiple Region Size and Enable Registers map onto the Data Region Size and Enable Register, DRSR
• the multiple Region Access Control Registers map onto the Data Region Access Control Register, DRACR.
The value in the RGNR is the index value for both the instruction region and the data region registers, see c6, MPU Region Number Register (RGNR) on page B4-66. The RGNR value indicates the current memory region for both the instruction and the data memory maps. However, a particular value might not be valid for both memory maps.
Unified memory maps
The multiple copies of the registers that describe each memory region map onto three CP15 registers:
• the multiple Region Base Address Registers map onto the Data Region Base Address Register, DRBAR
• the multiple Region Size and Enable Registers map onto the Data Region Size and Enable Register, DRSR
• the multiple Region Access Control Registers map onto the Data Region Access Control Register, DRACR.
The IRBAR, IRSR, and IRACR are not implemented. The value in the RGNR is the index value for the data region registers, see c6, MPU Region Number Register (RGNR) on page B4-66. Its value indicates the current memory region in the unified memory map.
The read-only MPUIR indicates:
• whether the MPU implements separate Instruction and Data address maps, or a Unified address map
• the number of Data or Unified regions the MPU supports
• if separate Instruction and Data address maps are implemented, the number of Instruction regions the MPU supports.
For more information, see c0, MPU Type Register (MPUIR) on page B4-36.
Table B4-18 summarizes the CP15 registers that are used to program the MPU memory regions, and gives references to the full descriptions of these registers.
Table B4-18 MPU Memory Region Programming Registers
  Register name                               Description
  Data or Unified Region Base Address         c6, Data Region Base Address Register (DRBAR)
  Instruction Region Base Address (a)         c6, Instruction Region Base Address Register (IRBAR) on page B4-61
  Data or Unified Region Size and Enable      c6, Data Region Size and Enable Register (DRSR) on page B4-62
  Instruction Region Size and Enable (a)      c6, Instruction Region Size and Enable Register (IRSR) on page B4-63
  Data or Unified Region Access Control       c6, Data Region Access Control Register (DRACR) on page B4-64
  Instruction Region Access Control (a)       c6, Instruction Region Access Control Register (IRACR) on page B4-65
  MPU Region Number                           c6, MPU Region Number Register (RGNR) on page B4-66
a. These registers are implemented only if the MPU implements separate Instruction and Data memory maps.
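The banking described above, where many copies of each region register sit behind one CP15 encoding and the RGNR picks the copy, can be pictured with a small software model. The sketch below is a hypothetical C model of the unified-map case only, not hardware access code: the type `mpu_model`, the functions `model_select_region` and `model_write_region`, and the region count of 12 are all assumptions made for illustration. On real hardware the registers are reached with MRC/MCR, and the region count is read from the MPUIR.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical software model, not hardware access code. It shows how a
 * single RGNR value selects which copy of DRBAR, DRSR and DRACR a CP15
 * access reaches in a unified memory map. */
#define MODEL_REGIONS 12u /* assumed region count; real value is in MPUIR */

typedef struct {
    uint32_t rgnr;                  /* current region number */
    uint32_t drbar[MODEL_REGIONS];  /* one copy of each register per region */
    uint32_t drsr[MODEL_REGIONS];   /* DRSR resets to 0: region disabled */
    uint32_t dracr[MODEL_REGIONS];
} mpu_model;

/* Models an RGNR write. Returns false for a region number that would give
 * UNPREDICTABLE results on hardware (>= the number of regions). */
static bool model_select_region(mpu_model *m, uint32_t region) {
    if (region >= MODEL_REGIONS)
        return false;
    m->rgnr = region;
    return true;
}

/* Models writes to DRBAR, DRSR and DRACR: all three land in the copy
 * selected by the current RGNR value. */
static void model_write_region(mpu_model *m, uint32_t drbar,
                               uint32_t drsr, uint32_t dracr) {
    m->drbar[m->rgnr] = drbar;
    m->drsr[m->rgnr] = drsr;
    m->dracr[m->rgnr] = dracr;
}
```

Zero-initializing the model mirrors the DRSR reset value of 0, so every modeled region starts disabled. The design point the model makes is that software must write the RGNR first; the three region registers that follow have no region number of their own.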
c6, Data Region Base Address Register (DRBAR)
The Data Region Base Address Register, DRBAR, indicates the base address of the current memory region in the data or unified address map. The base address must be aligned to the region size. The current memory region is selected by the value held in the RGNR, see c6, MPU Region Number Register (RGNR) on page B4-66.
The DRBAR is:
• a 32-bit read/write register
• accessible only in privileged modes.
The format of the DRBAR is:
  Bits [31:2]  Region Base Address
  Bits [1:0]   (0) (0)
Region Base Address, bits [31:2]
The Base Address for the region, in the Data or Unified address map. The region referenced is selected by the RGNR.
Bits [1:0]
UNK/SBZP.
The DRBAR can be used to find the minimum region size supported by the implementation for the Data or Unified memory map, see Finding the minimum supported region size on page B4-7.
Accessing the DRBAR
To access the DRBAR you read or write the CP15 registers with <opc1> set to 0, <CRn> set to c6, <CRm> set to c1, and <opc2> set to 0. For example:
    MRC p15,0,<Rt>,c6,c1,0    ; Read CP15 Data Region Base Address Register
    MCR p15,0,<Rt>,c6,c1,0    ; Write CP15 Data Region Base Address Register
c6, Instruction Region Base Address Register (IRBAR)
The Instruction Region Base Address Register, IRBAR, indicates the base address of the current memory region in the Instruction address map. The base address must be aligned to the region size. The current memory region is selected by the value held in the RGNR, see c6, MPU Region Number Register (RGNR) on page B4-66.
The IRBAR is:
• a 32-bit read/write register
• accessible only in privileged modes
• implemented only when the PMSA implements separate instruction and data memory maps.
The format of the IRBAR is identical to the DRBAR, see c6, Data Region Base Address Register (DRBAR) on page B4-60.
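Both base address registers carry the same rule: the base address must be aligned to the region size. A minimal C sketch of that check follows, assuming the caller already knows the region size in bytes. The function name `region_base_ok` is hypothetical. Region sizes run from $2^2$ bytes (RSize = 1, because RSize = 0 is reserved) up to $2^{32}$ bytes (RSize = 31), so the size is held in 64 bits. Whether a particular size is supported is IMPLEMENTATION DEFINED and is not checked.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: checks a proposed DRBAR/IRBAR value against the
 * rule that the base address must be aligned to the region size.
 * region_size is in bytes; it must be a power of two from 4 bytes
 * (RSize = 1) up to 4GB (RSize = 31), so it is held in 64 bits. */
static bool region_base_ok(uint32_t base, uint64_t region_size) {
    if (region_size < 4u || region_size > (1ull << 32) ||
        (region_size & (region_size - 1u)) != 0u)
        return false;  /* not an encodable region size */
    /* Alignment to at least 4 bytes also keeps the UNK/SBZP bits [1:0] zero. */
    return ((uint64_t)base & (region_size - 1u)) == 0u;
}
```

For example, a 64KB region may start at 0x20000000 but not at 0x20008000, and a 4GB region can only start at address 0.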
The IRBAR can be used to find the minimum region size supported by the implementation, see Finding the minimum supported region size on page B4-7.
Accessing the IRBAR
To access the IRBAR you read or write the CP15 registers with <opc1> set to 0, <CRn> set to c6, <CRm> set to c1, and <opc2> set to 1. For example:
    MRC p15,0,<Rt>,c6,c1,1    ; Read CP15 Instruction Region Base Address Register
    MCR p15,0,<Rt>,c6,c1,1    ; Write CP15 Instruction Region Base Address Register
c6, Data Region Size and Enable Register (DRSR)
The Data Region Size and Enable Register, DRSR, indicates the size of the current memory region in the data or unified address map, and can be used to enable or disable:
• the entire region
• each of the eight subregions, if the region is enabled.
The current memory region is selected by the value held in the RGNR, see c6, MPU Region Number Register (RGNR) on page B4-66.
The DRSR:
• is a 32-bit read/write register
• is accessible only in privileged modes
• has a defined reset value of 0.
The format of the DRSR is:
  Bits [31:16]  UNK/SBZP
  Bits [15:8]   S7D to S0D (SnD is bit [n+8])
  Bits [7:6]    (0) (0)
  Bits [5:1]    RSize
  Bit [0]       En
Bits [31:16,7:6]
UNK/SBZP.
SnD, bit [n+8], for values of n from 0 to 7
Subregion disable bit for subregion n. Indicates whether the subregion is part of this region:
0   Subregion is part of this region
1   Subregion disabled. The subregion is not part of this region.
The region is divided into exactly eight equal sized subregions. Subregion 0 is the subregion at the least significant address. For more information, see Subregions on page B4-3.
If the size of this region, indicated by the RSize field, is less than 256 bytes then the SnD fields are not defined, and register bits [15:8] are UNK/SBZP.
RSize, bits [5:1]
Region Size field. Indicates the size of the current memory region:
• A value of 0 is not permitted; this value is reserved and UNPREDICTABLE.
• If N is the value in this field, the region size is $2^{N+1}$ bytes.
En, bit [0]
Enable bit for the region:
0   Region is disabled
1   Region is enabled.
Because this register resets to zero, all memory regions are disabled on reset. All memory regions must be enabled before they are used.
The minimum region size supported is IMPLEMENTATION DEFINED, but if the memory system implementation includes a cache, ARM strongly recommends that the minimum region size is a multiple of the cache line length. This prevents cache attributes changing mid-way through a cache line.
Behavior is UNPREDICTABLE if you:
• write a region size that is outside the range supported by the implementation
• access this register when the RGNR does not point to a valid region in the MPU Data or Unified address map.
Accessing the DRSR
To access the DRSR you read or write the CP15 registers with <opc1> set to 0, <CRn> set to c6, <CRm> set to c1, and <opc2> set to 2. For example:
    MRC p15,0,<Rt>,c6,c1,2    ; Read CP15 Data Region Size and Enable Register
    MCR p15,0,<Rt>,c6,c1,2    ; Write CP15 Data Region Size and Enable Register
c6, Instruction Region Size and Enable Register (IRSR)
The Instruction Region Size and Enable Register, IRSR, indicates the size of the current memory region in the instruction address map, and can be used to enable or disable:
• the entire region
• each of the eight subregions, if the region is enabled.
The current memory region is selected by the value held in the RGNR, see c6, MPU Region Number Register (RGNR) on page B4-66.
The IRSR:
• is a 32-bit read/write register
• is accessible only in privileged modes
• has a defined reset value of 0
• is implemented only when the PMSA implements separate instruction and data memory maps.
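The DRSR fields (and the IRSR fields, which use the same format) lend themselves to a few small helpers. The C sketch below packs a value from a region size, a subregion-disable mask and an enable flag, decodes the size again using size = $2^{\text{RSize}+1}$, and tests whether an address falls in an enabled subregion. The names `drsr_encode`, `drsr_region_size` and `drsr_covers`, the `size_log2` parameter, and the choice to return 0 (a disabled region) for unencodable sizes are all assumptions of this sketch, not part of the architecture.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helpers for DRSR/IRSR values (the two share a format).
 * size_log2 is log2 of the region size in bytes. Because the size is
 * 2^(RSize+1), RSize = size_log2 - 1; RSize = 0 is reserved, so only
 * size_log2 values 2 to 32 are encodable. Whether a given size is
 * supported is IMPLEMENTATION DEFINED and is not checked here. */
static uint32_t drsr_encode(unsigned size_log2, uint8_t subregion_disable,
                            bool enable) {
    if (size_log2 < 2u || size_log2 > 32u)
        return 0u;                                   /* treat as "region disabled" */
    uint32_t value = (uint32_t)(size_log2 - 1u) << 1; /* RSize, bits [5:1] */
    if (enable)
        value |= 1u;                                 /* En, bit [0] */
    if (size_log2 >= 8u)                             /* SnD defined only for >= 256 bytes */
        value |= (uint32_t)subregion_disable << 8;   /* SnD is bit [n+8] */
    return value;
}

/* Region size in bytes encoded by the RSize field of a DRSR value. */
static uint64_t drsr_region_size(uint32_t drsr) {
    uint32_t rsize = (drsr >> 1) & 0x1Fu;
    return 1ull << (rsize + 1u);
}

/* Does addr fall in an enabled subregion of an enabled region? base must
 * be aligned to the region size. Subregion n covers 1/8 of the region,
 * counted from the lowest address; SnD = 1 removes it from the region. */
static bool drsr_covers(uint32_t drsr, uint32_t base, uint32_t addr) {
    uint64_t size = drsr_region_size(drsr);
    if ((drsr & 1u) == 0u || addr < base)
        return false;                                /* disabled, or below the region */
    uint64_t offset = (uint64_t)(addr - base);
    if (offset >= size)
        return false;                                /* above the region */
    if (size < 256u)
        return true;                                 /* no subregions below 256 bytes */
    unsigned n = (unsigned)(offset / (size / 8u));
    return ((drsr >> (8u + n)) & 1u) == 0u;
}
```

As a usage example, `drsr_encode(16, 0x81, true)` describes an enabled 64KB region with subregions 0 and 7 disabled, giving 0x811F; each subregion is then 8KB, so only offsets 0x2000 to 0xDFFF of the region are covered.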
The format of the IRSR is identical to the DRSR, see c6, Data Region Size and Enable Register (DRSR) on page B4-62.
All memory regions must be enabled before they are used.
The minimum region size supported is IMPLEMENTATION DEFINED, but if the memory system implementation includes an instruction cache, ARM strongly recommends that the minimum region size is a multiple of the instruction cache line length. This prevents cache attributes changing mid-way through a cache line.
Behavior is UNPREDICTABLE if you:
• write a region size that is outside the range supported by the implementation
• access this register when the RGNR does not point to a valid region in the MPU instruction address map.
Accessing the IRSR
To access the IRSR you read or write the CP15 registers with <opc1> set to 0, <CRn> set to c6, <CRm> set to c1, and <opc2> set to 3. For example:
    MRC p15,0,<Rt>,c6,c1,3    ; Read CP15 Instruction Region Size and Enable Register
    MCR p15,0,<Rt>,c6,c1,3    ; Write CP15 Instruction Region Size and Enable Register
c6, Data Region Access Control Register (DRACR)
The Data Region Access Control Register, DRACR, defines the memory attributes for the current memory region in the data or unified address map. The current memory region is selected by the value held in the RGNR, see c6, MPU Region Number Register (RGNR) on page B4-66.
The DRACR is:
• a 32-bit read/write register
• accessible only in privileged modes.
The format of the DRACR is:
  Bits [31:13]  UNK/SBZP
  Bit [12]      XN
  Bit [11]      (0)
  Bits [10:8]   AP[2:0]
  Bits [7:6]    (0) (0)
  Bits [5:3]    TEX[2:0]
  Bit [2]       S
  Bit [1]       C
  Bit [0]       B
Bits [31:13,11,7:6]
UNK/SBZP.
XN, bit [12]
Execute Never bit. Indicates whether instructions can be fetched from this region:
0   region can contain executable code
1   region is an Execute Never region, and any attempt to execute an instruction from the region results in a Permission fault.
If the MPU implements separate Instruction and Data memory maps this bit is UNK/SBZ.
For more information, see The Execute Never (XN) attribute and instruction prefetching on page B4-10.
AP[2:0], bits [10:8]
Access Permissions field. Indicates the read and write access permissions for unprivileged and privileged accesses to the memory region. For more information, see Access permissions on page B4-9.
TEX[2:0], C, B, bits [5:3,1:0]
Memory access attributes. For more information, see C, B, and TEX[2:0] encodings on page B4-11.
S, bit [2]
Shareable bit, for Normal memory regions:
0   If region is Normal memory, memory is Non-shareable
1   If region is Normal memory, memory is Shareable.
The value of this bit is ignored if the region is not Normal memory.
If you access this register when the RGNR does not point to a valid region in the MPU data or unified address map, the result is UNPREDICTABLE.
Accessing the DRACR
To access the DRACR you read or write the CP15 registers with <opc1> set to 0, <CRn> set to c6, <CRm> set to c1, and <opc2> set to 4. For example:
    MRC p15,0,<Rt>,c6,c1,4    ; Read CP15 Data Region Access Control Register
    MCR p15,0,<Rt>,c6,c1,4    ; Write CP15 Data Region Access Control Register
c6, Instruction Region Access Control Register (IRACR)
The Instruction Region Access Control Register, IRACR, defines the memory attributes for the current memory region in the instruction address map, when the MPU implements separate data and instruction address maps. The current memory region is selected by the value held in the RGNR, see c6, MPU Region Number Register (RGNR) on page B4-66.
The IRACR is:
• a 32-bit read/write register
• accessible only in privileged modes
• implemented only when the PMSA implements separate instruction and data memory maps.
The format of the IRACR is identical to the DRACR, see c6, Data Region Access Control Register (DRACR) on page B4-64.
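Because the DRACR and IRACR share one layout, a single packing helper covers both. The C sketch below only places each field at the bit positions given in the format above; it deliberately assigns no meaning to particular AP or TEX/C/B values, which are defined in Access permissions on page B4-9 and C, B, and TEX[2:0] encodings on page B4-11. The function name `dracr_encode` is hypothetical, and out-of-range AP and TEX inputs are simply masked to three bits.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: pack DRACR/IRACR fields into a register value.
 * It only places bits; the meaning of AP and TEX/C/B values is defined
 * elsewhere in the manual. The UNK/SBZP bits [31:13,11,7:6] stay zero. */
static uint32_t dracr_encode(bool xn, unsigned ap, unsigned tex,
                             bool s, bool c, bool b) {
    return ((xn ? 1u : 0u) << 12)   /* XN, bit [12] */
         | ((ap & 0x7u) << 8)       /* AP[2:0], bits [10:8] */
         | ((tex & 0x7u) << 3)      /* TEX[2:0], bits [5:3] */
         | ((s ? 1u : 0u) << 2)     /* S, bit [2] */
         | ((c ? 1u : 0u) << 1)     /* C, bit [1] */
         | (b ? 1u : 0u);           /* B, bit [0] */
}
```

With arbitrary field values chosen only to exercise every bit position, `dracr_encode(true, 3, 1, true, true, true)` yields 0x130F.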
Note
The XN bit, bit [12], is always valid in the IRACR.
If you access this register when the RGNR does not point to a valid region in the MPU instruction address map, the result is UNPREDICTABLE.
Accessing the IRACR
To access the IRACR you read or write the CP15 registers with <opc1> set to 0, <CRn> set to c6, <CRm> set to c1, and <opc2> set to 5. For example:
    MRC p15,0,<Rt>,c6,c1,5    ; Read CP15 Instruction Region Access Control Register
    MCR p15,0,<Rt>,c6,c1,5    ; Write CP15 Instruction Region Access Control Register
c6, MPU Region Number Register (RGNR)
The MPU Region Number Register, RGNR, defines the current memory region in:
• the MPU data or unified address map
• the MPU instruction address map, if the MPU implements separate data and instruction address maps.
The value in the RGNR identifies the memory region description accessed by the Region Base Address, Size and Enable, and Access Control Registers.
Note
There is only a single MPU Region Number Register. When the MPU implements separate data and instruction address maps, the current region number is always identical for both address maps. This might mean that the current region number is valid for one address map but invalid for the other map.
The RGNR is:
• a 32-bit read/write register
• accessible only in privileged modes.
The format of the RGNR is:
  Bits [31:N]   UNK/SBZP
  Bits [N-1:0]  Region
Bits [31:N]
UNK/SBZP.
Region, bits [N-1:0]
The number of the current region in the Data or Unified address map, and in the Instruction address map if the MPU implements separate Data and Instruction address maps.
The value of N is Log2(Number of regions supported) rounded up to an integer.
Memory region numbering starts at 0 and goes up to one less than the number of regions supported.
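The width N of the Region field follows directly from the region count. The C sketch below computes N as Log2 of the region count rounded up, and also applies the rule that, with separate Data and Instruction maps, the region count that matters for the RGNR is the greater of the two counts. Both function names, `rgnr_region_bits` and `rgnr_region_count`, are hypothetical.

```c
#include <stdint.h>

/* Hypothetical helper: width N of the RGNR Region field, that is
 * Log2(number of regions supported) rounded up to an integer. */
static unsigned rgnr_region_bits(uint32_t regions) {
    unsigned n = 0;
    while (n < 32u && (1ull << n) < regions)
        n++;
    return n;
}

/* With separate maps, the region count used for the RGNR is the greater
 * of the Data and Instruction region counts. */
static uint32_t rgnr_region_count(uint32_t data_regions,
                                  uint32_t instr_regions) {
    return data_regions > instr_regions ? data_regions : instr_regions;
}
```

For example, an MPU with 12 Data regions and 8 Instruction regions has a region count of 12 for this purpose, so N is 4 and valid region numbers are 0 to 11.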
Writing a value to this register that is greater than or equal to the number of memory regions supported has UNPREDICTABLE results.
In the context of the RGNR description, when the MPU implements separate Data and Instruction address maps the Number of memory regions supported is the greater of:
• number of Data memory regions supported
• number of Instruction memory regions supported.
Accessing the RGNR
To access the RGNR you read or write the CP15 registers with <opc1> set to 0, <CRn> set to c6, <CRm> set to c2, and <opc2> set to 0. For example:
    MRC p15,0,<Rt>,c6,c2,0    ; Read CP15 MPU Region Number Register
    MCR p15,0,<Rt>,c6,c2,0    ; Write CP15 MPU Region Number Register
B4.6.25 CP15 c7, Cache maintenance and other functions
The CP15 c7 registers are used for cache maintenance operations, and also provide barrier operations. Figure B4-8 shows the CP15 c7 registers.
Figure B4-8 CP15 c7 registers in a PMSA implementation (CRn = c7, opc1 = 0; ‡ = part of the Multiprocessing Extensions):
  CRm c0,  opc2 4        NOP
  CRm c1,  opc2 {0,6}    Cache maintenance operations ‡
  CRm c5,  opc2 {0,1}    Cache maintenance operations
  CRm c5,  opc2 4        CP15ISB, Instruction Synchronization Barrier operation (accessible in User mode)
  CRm c5,  opc2 {6,7}    Branch predictor maintenance operations
  CRm c6,  opc2 {1,2}    Cache maintenance operations
  CRm c10, opc2 {1,2}    Cache maintenance operations
  CRm c10, opc2 {4,5}    Data barrier operations (accessible in User mode)
  CRm c11, opc2 1        DCCMVAU, cache maintenance operation
  CRm c13, opc2 1        NOP
  CRm c14, opc2 {1,2}    Cache maintenance operations
All CP15 c7 encodings not shown in Figure B4-8 are UNPREDICTABLE, see Unallocated CP15 encodings on page B4-27.
The CP15 c7 operations are described in:
• CP15 c7, Cache and branch predictor maintenance functions
• CP15 c7, Miscellaneous functions on page B4-72.
B4.6.26 CP15 c7, Cache and branch predictor maintenance functions
CP15 c7 provides a number of functions.
This section describes only the CP15 c7 cache and branch predictor maintenance operations. Branch predictor operations are included in this section, because they operate in a similar way to the cache maintenance operations.
Note
ARMv7 introduces significant changes in the CP15 c7 operations. Most of these changes are because, from ARMv7, the architecture covers multiple levels of cache. This section only describes the ARMv7 requirements for these operations. For details of these operations in previous versions of the architecture see:
• c7, Cache operations on page AppxG-38 for ARMv6
• c7, Cache operations on page AppxH-49 for ARMv4 and ARMv5.
Figure B4-9 shows the CP15 c7 cache and branch predictor maintenance operations.
Figure B4-9 CP15 c7 Cache and branch predictor maintenance operations (CRn = c7, opc1 = 0; all write-only; † data or unified; ‡ part of the Multiprocessing Extensions; PoU: Point of Unification; PoC: Point of Coherency):
  c1, 0    ICIALLUIS, Invalidate all instruction caches to PoU Inner Shareable ‡
  c1, 6    BPIALLIS, Invalidate entire branch predictor array Inner Shareable ‡
  c5, 0    ICIALLU, Invalidate all instruction caches to PoU
  c5, 1    ICIMVAU, Invalidate instruction caches by MVA to PoU
  c5, 6    BPIALL, Invalidate entire branch predictor array
  c5, 7    BPIMVA, Invalidate MVA from branch predictor array
  c6, 1    DCIMVAC, Invalidate data† cache line by MVA to PoC
  c6, 2    DCISW, Invalidate data† cache line by set/way
  c10, 1   DCCMVAC, Clean data† cache line by MVA to PoC
  c10, 2   DCCSW, Clean data† cache line by set/way
  c11, 1   DCCMVAU, Clean data† cache line by MVA to PoU
  c14, 1   DCCIMVAC, Clean and invalidate data† cache line by MVA to PoC
  c14, 2   DCCISW, Clean and invalidate data† cache line by set/way
The CP15 c7 cache and branch predictor maintenance operations are all write-only operations that can be executed only in privileged modes.
They are listed in Table B4-19.
For more information about the terms used in this section see Terms used in describing cache operations on page B2-10. The Multiprocessing Extensions change the set of caches affected by these operations, see Multiprocessor effects on cache maintenance operations on page B2-23.
In Table B4-19, the Rt data column specifies what data is required in the register Rt specified by the MCR instruction used to perform the operation. For more information about the possible data formats see Data formats for the cache and branch predictor operations on page B4-70.
Table B4-19 CP15 c7 cache and branch predictor maintenance operations
  CRm  opc2  Mnemonic   Function (a)                                                                            Rt data
  c1   0     ICIALLUIS  Invalidate all instruction caches to PoU Inner Shareable. Also flushes branch target cache. (b)   Ignored
  c1   6     BPIALLIS   Invalidate entire branch predictor array Inner Shareable.                               Ignored
  c5   0     ICIALLU    Invalidate all instruction caches to PoU. Also flushes branch target cache. (c)         Ignored
  c5   1     ICIMVAU    Invalidate instruction cache line by address to PoU. (b, d)                             Address
  c5   6     BPIALL     Invalidate entire branch predictor array.                                               Ignored
  c5   7     BPIMVA     Invalidate address from branch predictor array in the inner shareable domain. (d)       Address
  c6   1     DCIMVAC    Invalidate data or unified cache line by address to PoC. (d)                            Address
  c6   2     DCISW      Invalidate data or unified cache line by set/way.                                       Set/way
  c10  1     DCCMVAC    Clean data or unified cache line by address to PoC. (d)                                 Address
  c10  2     DCCSW      Clean data or unified cache line by set/way.                                            Set/way
  c11  1     DCCMVAU    Clean data or unified cache line by address to PoU. (d)                                 Address
  c14  1     DCCIMVAC   Clean and invalidate data or unified cache line by address to PoC. (d)                  Address
  c14  2     DCCISW     Clean and invalidate data or unified cache line by set/way.                             Set/way
a. Address, point of coherency (PoC) and point of unification (PoU) are described in Terms used in describing cache operations on page B2-10.
b. Only applies to separate instruction caches, does not apply to unified caches.
c. Only applies to separate instruction caches, does not apply to unified caches.
d. In general descriptions of the cache operations, these functions are described as operating by MVA (Modified Virtual Address). In a PMSA implementation the MVA and the PA have the same value, and so the functions operate using a physical address in the memory map.
Data formats for the cache and branch predictor operations
Table B4-19 on page B4-69 shows three possibilities for the data in the register Rt specified by the MCR instruction. These are described in the following subsections:
• Ignored
• Address
• Set/way on page B4-71.
Ignored
The value in the register specified by the MCR instruction is ignored. You do not have to write a value to the register before issuing the MCR instruction.
Address
In general descriptions of the maintenance operations, operations that require a memory address are described as operating by MVA. For more information, see Terms used in describing cache operations on page B2-10. In a PMSA implementation, these operations require the physical address in the memory map. When the data is stated to be an address, it does not have to be cache line aligned.
Set/way
For an operation by set/way, the data identifies the cache line that the operation is to be applied to by specifying:
• the cache set the line belongs to
• the way number of the line in the set
• the cache level.
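The set/way operand packs three numbers into one register: the way number in the top A bits, the set number starting at bit L, and (cache level - 1) in bits [3:1], with A, L and the rounding rules defined in the format description that follows. The C sketch below builds that operand under those definitions. The names `log2_ceil` and `setway_value` are hypothetical, the cache geometry in the usage example is an assumption, and out-of-range set, way or level values (UNPREDICTABLE on hardware) are not checked.

```c
#include <stdint.h>

/* Log2 rounded up to the next integer, as required for A and S. */
static unsigned log2_ceil(uint32_t x) {
    unsigned n = 0;
    while ((1ull << n) < x)
        n++;
    return n;
}

/* Hypothetical helper: build the Rt value for DCISW, DCCSW or DCCISW.
 * level is the 1-based cache level (1 for L1); linelen_bytes and assoc
 * describe that cache. Fields that exceed the implemented cache are
 * UNPREDICTABLE on hardware and are not range-checked here. */
static uint32_t setway_value(unsigned level, uint32_t set, uint32_t way,
                             uint32_t linelen_bytes, uint32_t assoc) {
    unsigned L = log2_ceil(linelen_bytes);         /* Set field starts at bit L */
    unsigned A = log2_ceil(assoc);                 /* Way field is the top A bits */
    uint32_t value = (uint32_t)(level - 1u) << 1;  /* Level, bits [3:1] */
    value |= set << L;
    if (A != 0u)                                   /* A = 0: no Way field */
        value |= way << (32u - A);
    return value;                                  /* bit [0] stays 0 */
}
```

For an assumed 32KB, 4-way L1 data cache with 32-byte lines (so 256 sets, L = 5, A = 2), set 10 of way 3 encodes as `setway_value(1, 10, 3, 32, 4)`, which is 0xC0000140. Cleaning a whole cache level by set/way would repeat the operation for every way from 0 to ASSOCIATIVITY-1 and every set from 0 to NSETS-1; the loop order is left open by this section.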
The format of the register data for a set/way operation is:
  Bits [31:32-A]  Way
  Bits [31-A:B]   SBZ
  Bits [B-1:L]    Set
  Bits [L-1:4]    SBZ
  Bits [3:1]      Level
  Bit [0]         0
Where:
  $A = \log_2(\mathrm{ASSOCIATIVITY})$
  $L = \log_2(\mathrm{LINELEN})$
  $S = \log_2(\mathrm{NSETS})$
  $B = L + S$
ASSOCIATIVITY, LINELEN (Line Length) and NSETS (number of sets) have their usual meanings and are the values for the cache level being operated on. The values of A and S are rounded up to the next integer.
Level   ((Cache level to operate on) - 1). For example, this field is 0 for operations on L1 cache, or 1 for operations on L2 cache.
Set     The number of the set to operate on.
Way     The number of the way to operate on.
Note
• If L = 4 then there is no SBZ field between the set and level fields in the register.
• If A = 0 there is no way field in the register, and register bits [31:B] are SBZ.
• If the level, set or way field in the register is larger than the size implemented in the cache then the effect of the operation is UNPREDICTABLE.
Accessing the CP15 c7 cache maintenance operations
To perform one of the cache maintenance operations you write the CP15 registers with <opc1> set to 0, <CRn> set to c7, and <CRm> and <opc2> set to the values shown in Table B4-19 on page B4-69. That is:
    MCR p15,0,<Rt>,c7,<CRm>,<opc2>
For example:
    MCR p15,0,<Rt>,c7,c5,0     ; Invalidate all instruction caches to point of unification
    MCR p15,0,<Rt>,c7,c10,2    ; Clean data or unified cache line by set/way
B4.6.27 CP15 c7, Miscellaneous functions
CP15 c7 provides a number of functions, summarized in Figure B4-8 on page B4-68. This section describes only the CP15 c7 miscellaneous operations.
Figure B4-10 shows the CP15 c7 miscellaneous operations. It does not show the other CP15 c7 operations.
Figure B4-10 CP15 c7 Miscellaneous operations (CRn = c7, opc1 = 0; all write-only):
  c0, 4    NOP, was Wait For Interrupt (CP15WFI) in ARMv6
  c5, 4    CP15ISB, Instruction Synchronization Barrier operation (accessible in User mode)
  c10, 4   CP15DSB, Data Synchronization Barrier operation (accessible in User mode)
  c10, 5   CP15DMB, Data Memory Barrier operation (accessible in User mode)
  c13, 1   NOP, was Prefetch instruction by MVA in ARMv6
The CP15 c7 miscellaneous operations are described in:
• CP15 c7, Data and Instruction Barrier operations
• CP15 c7, No Operation (NOP) on page B4-73.
CP15 c7, Data and Instruction Barrier operations
ARMv6 includes two CP15 c7 operations to perform Data Barrier operations, and another operation to perform an Instruction Barrier operation. In ARMv7:
• The ARM and Thumb instruction sets include instructions to perform the barrier operations, that can be executed in unprivileged and privileged modes, see Memory barriers on page A3-47.
• The CP15 c7 operations are defined as write-only operations, that can be executed in unprivileged and privileged modes, but using these operations is deprecated. The three operations are described in:
  - Instruction Synchronization Barrier operation on page B4-73
  - Data Synchronization Barrier operation on page B4-73
  - Data Memory Barrier operation on page B4-73.
The value in the register Rt specified by the MCR instruction used to perform one of these operations is ignored. You do not have to write a value to the register before issuing the MCR instruction.
In ARMv7 using these CP15 c7 operations is deprecated. Use the ISB, DSB, and DMB instructions instead.
Note
• In ARMv6 and earlier documentation, the Instruction Synchronization Barrier operation is referred to as a Prefetch Flush.
• In versions of the ARM architecture before ARMv6 the Data Synchronization Barrier operation is described as a Data Write Barrier (DWB).
Instruction Synchronization Barrier operation
In ARMv7, the ISB instruction is used to perform an Instruction Synchronization Barrier, see ISB on page A8-102. The deprecated CP15 c7 encoding for an Instruction Synchronization Barrier is <opc1> set to 0, <CRn> set to c7, <CRm> set to c5, and <opc2> set to 4.
Data Synchronization Barrier operation
In ARMv7, the DSB instruction is used to perform a Data Synchronization Barrier, see DSB on page A8-92. The deprecated CP15 c7 encoding for a Data Synchronization Barrier is <opc1> set to 0, <CRn> set to c7, <CRm> set to c10, and <opc2> set to 4. This operation performs the full system barrier performed by the DSB instruction.
Data Memory Barrier operation
In ARMv7, the DMB instruction is used to perform a Data Memory Barrier, see DMB on page A8-90. The deprecated CP15 c7 encoding for a Data Memory Barrier is <opc1> set to 0, <CRn> set to c7, <CRm> set to c10, and <opc2> set to 5. This operation performs the full system barrier performed by the DMB instruction.
CP15 c7, No Operation (NOP)
ARMv6 includes two CP15 c7 operations that are not supported in ARMv7, with encodings that become No Operation (NOP) in ARMv7. These are:
• The Wait For Interrupt (CP15WFI) operation. In ARMv7 this operation is performed by the WFI instruction, that is available in the ARM and Thumb instruction sets. For more information, see WFI on page A8-810.
• The prefetch instruction by MVA operation. In ARMv7 this operation is replaced by the PLI instruction, that is available in the ARM and Thumb instruction sets. For more information, see PLI (immediate, literal) on page A8-242, and PLI (register) on page A8-244.
In ARMv7, the CP15 c7 encodings that were used for these operations must be valid write-only operations that perform a NOP. These encodings are:
• for the ARMv6 CP15WFI operation: <opc1> set to 0, <CRn> set to c7, <CRm> set to c0, and <opc2> set to 4
• for the ARMv6 prefetch instruction by MVA operation: <opc1> set to 0, <CRn> set to c7, <CRm> set to c13, and <opc2> set to 1.
B4.6.28 CP15 c8, Not used on a PMSA implementation
CP15 c8 is not used on an ARMv7-R implementation, see Unallocated CP15 encodings on page B4-27.
B4.6.29 CP15 c9, Cache and TCM lockdown registers and performance monitors
Some CP15 c9 register encodings are reserved for IMPLEMENTATION DEFINED memory system functions, in particular:
• cache control, including lockdown
• TCM control, including lockdown
• branch predictor control.
Additional CP15 c9 encodings are reserved for performance monitors. These encodings fall into two groups:
• the optional performance monitors described in Chapter C9 Performance Monitors
• additional IMPLEMENTATION DEFINED performance monitors.
The reserved encodings permit implementations that are compatible with previous versions of the ARM architecture, in particular with the ARMv6 requirements.
Figure B4-11 shows the permitted CP15 c9 register encodings.
Figure B4-11 Permitted CP15 c9 register encodings (CRn = c9, opc1 = {0-7}; access depends on the operation):
  CRm {c0-c2},   opc2 {0-7}   Reserved for Branch Predictor, Cache and TCM operations
  CRm {c5-c8},   opc2 {0-7}   Reserved for Branch Predictor, Cache and TCM operations
  CRm {c12-c14}, opc2 {0-7}   Reserved for ARM-recommended Performance Monitors
  CRm c15,       opc2 {0-7}   Reserved for IMPLEMENTATION DEFINED Performance Monitors
All CP15 c9 encodings not shown in Figure B4-11 are UNPREDICTABLE, see Unallocated CP15 encodings on page B4-27.
In ARMv6, CP15 c9 provides cache lockdown functions. With the ARMv7 abstraction of the hierarchical memory model, for CP15 c9:
• All encodings with CRm = {c0-c2, c5-c8} are reserved for IMPLEMENTATION DEFINED cache, branch predictor and TCM operations. This reservation enables the implementation of a scheme that is backwards compatible with ARMv6.
  For details of the ARMv6 implementation see c9, Cache lockdown support on page AppxG-45.
• All encodings with CRm = {c12-c14} are reserved for the optional performance monitors that are defined in Chapter C9 Performance Monitors.
• All encodings with CRm = c15 are reserved for IMPLEMENTATION DEFINED performance monitoring features.

B4.6.30 CP15 c10, Not used on a PMSA implementation
CP15 c10 is not used on an ARMv7-R implementation, see Unallocated CP15 encodings on page B4-27.

B4.6.31 CP15 c11, Reserved for TCM DMA registers
Some CP15 c11 register encodings are reserved for IMPLEMENTATION DEFINED DMA operations to and from TCM, see Figure B4-12.

Figure B4-12 Permitted CP15 c11 register encodings
(CRn = c11, opc1 = {0-7}, opc2 = {0-7}; for all of these encodings, access depends on the operation)
• CRm = {c0-c8}: Reserved for DMA operations for TCM access
• CRm = c15: Reserved for DMA operations for TCM access

All CP15 c11 encodings not shown in Figure B4-12 are UNPREDICTABLE, see Unallocated CP15 encodings on page B4-27.

B4.6.32 CP15 c12, Not used on a PMSA implementation
CP15 c12 is not used on an ARMv7-R implementation, see Unallocated CP15 encodings on page B4-27.

B4.6.33 CP15 c13, Context and Thread ID registers
The CP15 c13 registers are used for:
• a Context ID register
• three software Thread ID registers.
Figure B4-13 shows the CP15 c13 registers:

Figure B4-13 CP15 c13 registers in a PMSA implementation
(CRn = c13, opc1 = 0, CRm = c0)
• opc2 = 1: CONTEXTIDR, Context ID Register
• opc2 = 2: TPIDRURW, User Read/Write Software Thread ID Register
• opc2 = 3: TPIDRURO, User Read Only Software Thread ID Register
• opc2 = 4: TPIDRPRW, Privileged Only Software Thread ID Register

All CP15 c13 encodings not shown in Figure B4-13 are UNPREDICTABLE, see Unallocated CP15 encodings on page B4-27.
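Figures B4-11, B4-12 and B4-13 can each be read as a lookup from an encoding to its allocation. The following C sketch expresses the three figures as one classifier. It is illustrative only: the enumeration and the function classify_pmsa_cp15 are hypothetical names, register numbers are passed as plain integers (crm = 9 means c9), and CRn values other than c9, c11 and c13 are reported as not covered rather than guessed at.

```c
/* Hypothetical classification of PMSA CP15 encodings for CRn = c9, c11, c13,
 * following Figures B4-11, B4-12 and B4-13. */
typedef enum {
    ENC_NOT_COVERED,          /* outside this sketch (other CRn, bad field value) */
    ENC_UNPREDICTABLE,        /* encoding not shown in the relevant figure */
    ENC_IMPDEF_BP_CACHE_TCM,  /* c9: branch predictor, cache and TCM operations */
    ENC_ARM_PERF_MON,         /* c9: ARM-recommended performance monitors */
    ENC_IMPDEF_PERF_MON,      /* c9: IMPLEMENTATION DEFINED performance monitors */
    ENC_TCM_DMA,              /* c11: DMA operations for TCM access */
    ENC_CONTEXTIDR,           /* c13, opc2 = 1 */
    ENC_TPIDRURW,             /* c13, opc2 = 2 */
    ENC_TPIDRURO,             /* c13, opc2 = 3 */
    ENC_TPIDRPRW              /* c13, opc2 = 4 */
} pmsa_cp15_class;

static pmsa_cp15_class classify_pmsa_cp15(unsigned crn, unsigned opc1,
                                          unsigned crm, unsigned opc2)
{
    /* opc1 and opc2 are 3-bit fields; CRn and CRm are 4-bit fields. */
    if (crn > 15 || opc1 > 7 || crm > 15 || opc2 > 7)
        return ENC_NOT_COVERED;

    switch (crn) {
    case 9:   /* Figure B4-11: any opc1, any opc2 */
        if (crm <= 2 || (crm >= 5 && crm <= 8))
            return ENC_IMPDEF_BP_CACHE_TCM;
        if (crm >= 12 && crm <= 14)
            return ENC_ARM_PERF_MON;
        if (crm == 15)
            return ENC_IMPDEF_PERF_MON;
        return ENC_UNPREDICTABLE;              /* CRm = c3, c4, c9-c11 */
    case 11:  /* Figure B4-12: any opc1, any opc2 */
        if (crm <= 8 || crm == 15)
            return ENC_TCM_DMA;
        return ENC_UNPREDICTABLE;              /* CRm = c9-c14 */
    case 13:  /* Figure B4-13: only opc1 = 0, CRm = c0 */
        if (opc1 != 0 || crm != 0)
            return ENC_UNPREDICTABLE;
        switch (opc2) {
        case 1:  return ENC_CONTEXTIDR;
        case 2:  return ENC_TPIDRURW;
        case 3:  return ENC_TPIDRURO;
        case 4:  return ENC_TPIDRPRW;
        default: return ENC_UNPREDICTABLE;
        }
    default:
        return ENC_NOT_COVERED;
    }
}
```

Returning a separate "not covered" value keeps the sketch honest about its scope: an unlisted encoding inside c9, c11 or c13 is UNPREDICTABLE by the text above, but the sketch makes no claim about other CRn values.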
The CP15 c13 registers are described in:
• c13, Context ID Register (CONTEXTIDR) on page B4-76
• CP15 c13 Software Thread ID registers on page B4-77.

B4.6.34 c13, Context ID Register (CONTEXTIDR)
The Context ID Register, CONTEXTIDR, identifies the current context by means of a Context Identifier (Context ID).
Note
Previously, on PMSA implementations, this Context ID has been described as a Process Identifier (PROCID), and this CP15 c13 register has been called the Process ID Register. The new naming makes the register naming consistent for PMSA and VMSA implementations.
The whole of this register is used by:
• the debug logic, for Linked and Unlinked Context ID matching, see Breakpoint debug events on page C3-5 and Watchpoint debug events on page C3-15
• the trace logic, to identify the current process.
The CONTEXTIDR is:
• a 32-bit read/write register
• accessible only in privileged modes.
The format of the CONTEXTIDR is: bits [31:0] ContextID.
ContextID, bits [31:0]
    Context Identifier. This field must be programmed with a unique context identifier value that identifies the current process. It is used by the trace logic and the debug logic to identify the process that is currently running.
Accessing the CONTEXTIDR
To access the CONTEXTIDR you read or write the CP15 registers with <opc1> set to 0, <CRn> set to c13, <CRm> set to c0, and <opc2> set to 1. For example:
    MRC p15, 0, <Rd>, c13, c0, 1    ; Read CP15 Context ID Register
    MCR p15, 0, <Rd>, c13, c0, 1    ; Write CP15 Context ID Register

B4.6.35 CP15 c13 Software Thread ID registers
The Software Thread ID registers provide locations where software can store thread identifying information, for OS management purposes. These registers are never updated by the hardware.
The Software Thread ID registers are:
• three 32-bit read/write registers:
  — User Read/Write Thread ID Register, TPIDRURW
  — User Read-only Thread ID Register, TPIDRURO
  — Privileged Only Thread ID Register, TPIDRPRW
• accessible in different modes:
  — the User Read/Write Thread ID Register is read/write in unprivileged and privileged modes
  — the User Read-only Thread ID Register is read-only in User mode, and read/write in privileged modes
  — the Privileged Only Thread ID Register is only accessible in privileged modes, and is read/write
• introduced in ARMv7.
Accessing the Software Thread ID registers
To access the Software Thread ID registers you read or write the CP15 registers with <opc1> set to 0, <CRn> set to c13, <CRm> set to c0, and <opc2> set to:
• 2 for the User Read/Write Thread ID Register, TPIDRURW
• 3 for the User Read-only Thread ID Register, TPIDRURO
• 4 for the Privileged Only Thread ID Register, TPIDRPRW.
For example:
    MRC p15, 0, <Rd>, c13, c0, 2    ; Read CP15 User Read/Write Thread ID Register
    MCR p15, 0, <Rd>, c13, c0, 2    ; Write CP15 User Read/Write Thread ID Register
    MRC p15, 0, <Rd>, c13, c0, 3    ; Read CP15 User Read-only Thread ID Register
    MCR p15, 0, <Rd>, c13, c0, 3    ; Write CP15 User Read-only Thread ID Register
    MRC p15, 0, <Rd>, c13, c0, 4    ; Read CP15 Privileged Only Thread ID Register
    MCR p15, 0, <Rd>, c13, c0, 4    ; Write CP15 Privileged Only Thread ID Register

B4.6.36 CP15 c14, Not used
CP15 c14 is not used on any ARMv7 implementation, see Unallocated CP15 encodings on page B4-27.

B4.6.37 CP15 c15, IMPLEMENTATION DEFINED registers
CP15 c15 is reserved for IMPLEMENTATION DEFINED purposes. ARMv7 does not impose any restrictions on the use of the CP15 c15 encodings. The documentation of the ARMv7 implementation must fully describe any registers implemented in CP15 c15.
Normally, for processor implementations by ARM, this information is included in the Technical Reference Manual for the processor. Typically, CP15 c15 is used to provide test features, and any required configuration options that are not covered by this manual.

B4.7 Pseudocode details of PMSA memory system operations
This section contains pseudocode describing PMSA-specific memory operations. The following subsections describe the pseudocode functions:
• Alignment fault
• Address translation
• Default memory map attributes on page B4-81.
See also the pseudocode for general memory system operations in Pseudocode details of general memory system operations on page B2-29.

B4.7.1 Alignment fault
The following pseudocode describes the Alignment fault in a PMSA implementation:

    // AlignmentFaultP()
    // =================

    AlignmentFaultP(bits(32) address, boolean iswrite)
        DataAbort(address, bits(4) UNKNOWN, boolean UNKNOWN, iswrite, DAbort_Alignment);

B4.7.2 Address translation
The following pseudocode describes address translation in a PMSA implementation:

    // TranslateAddressP()
    // ===================

    AddressDescriptor TranslateAddressP(bits(32) va, boolean ispriv, boolean iswrite)

        AddressDescriptor result;
        Permissions perms;

        // PMSA only does flat mapping and security domain is effectively IMPLEMENTATION DEFINED.
        result.paddress.physicaladdress = va;
        result.paddress.physicaladdressext = '00000000';
        IMPLEMENTATION_DEFINED setting of result.paddress.NS;

        if SCTLR.M == '0' then      // MPU is disabled
            result.memattrs = DefaultMemoryAttributes(va);
        else                        // MPU is enabled
            // Scan through regions looking for matching ones. If found, the last
            // one matched is used.
            region_found = FALSE;
            for r = 0 to MPUIR.DRegion-1
                size_enable = DRSR[r];
                base_address = DRBAR[r];
                access_control = DRACR[r];
                if size_enable<0> == '1' then       // Region is enabled
                    lsbit = UInt(size_enable<5:1>) + 1;
                    if lsbit < 2 then UNPREDICTABLE;
                    if lsbit == 32 || va<31:lsbit> == base_address<31:lsbit> then
                        if lsbit >= 8 then          // can have subregions
                            subregion = UInt(va<lsbit-1:lsbit-3>);
                            hit = (size_enable<subregion+8> == '0');
                        else
                            hit = TRUE;
                        if hit then
                            texcb = access_control<5:3,1:0>;
                            S = access_control<2>;
                            perms.ap = access_control<10:8>;
                            perms.xn = access_control<12>;
                            region_found = TRUE;

            // Generate the memory attributes, and also the permissions if no region found.
            if region_found then
                result.memattrs = DefaultTEXDecode(texcb, S);
            else
                if SCTLR.BR == '0' || NOT(ispriv) then
                    DataAbort(va, bits(4) UNKNOWN, boolean UNKNOWN, iswrite, DAbort_Background);
                else
                    result.memattrs = DefaultMemoryAttributes(va);
                    perms.ap = '011';
                    perms.xn = if va<31:28> == '1111' then NOT(SCTLR.V) else va<31>;

            // Check the permissions.
            CheckPermission(perms, va, boolean UNKNOWN, bits(4) UNKNOWN, iswrite, ispriv);

        return result;
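The region scan in TranslateAddressP() can be modelled in C to check how a given set of region registers resolves an address. The following is a minimal sketch, assuming the DRSR bit positions used by the pseudocode above (bit 0 enable, bits [5:1] size, bits [15:8] one disable bit per subregion) and region registers supplied as plain arrays. The function name pmsa_find_region is hypothetical, and the sketch does not model access permissions, memory attributes, or the background region.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the region scan in TranslateAddressP().
 * drsr[r]  : bit 0 = region enable, bits [5:1] = size field (region is
 *            2^(size+1) bytes), bits [15:8] = subregion disable bits.
 * drbar[r] : region base address.
 * Returns the number of the last (highest-numbered) region that matches va,
 * or -1 if no region matches. */
static int pmsa_find_region(uint32_t va, const uint32_t drsr[],
                            const uint32_t drbar[], int num_regions)
{
    int match = -1;

    for (int r = 0; r < num_regions; r++) {
        if ((drsr[r] & 1u) == 0)
            continue;                                   /* region disabled */

        unsigned lsbit = ((drsr[r] >> 1) & 0x1Fu) + 1;
        if (lsbit < 2)
            continue;     /* UNPREDICTABLE in the architecture; skipped here */

        /* lsbit == 32 covers the whole address space. Testing it first also
           avoids a shift by 32, which is undefined behaviour in C. */
        bool base_match = (lsbit == 32) || ((va >> lsbit) == (drbar[r] >> lsbit));
        if (!base_match)
            continue;

        bool hit = true;
        if (lsbit >= 8) {                               /* region has 8 subregions */
            unsigned subregion = (va >> (lsbit - 3)) & 0x7u;   /* va<lsbit-1:lsbit-3> */
            hit = ((drsr[r] >> (subregion + 8)) & 1u) == 0;    /* disable bit clear */
        }
        if (hit)
            match = r;                                  /* last match wins */
    }
    return match;
}
```

For example, with region 0 covering the whole address space and region 1 a 64KB region at 0x20000000 whose lowest subregion is disabled, an address in that disabled 8KB subregion falls through to region 0, while an address in any other subregion of region 1 resolves to region 1.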
B4.7.3 Default memory map attributes
The following pseudocode describes the default memory map attributes in a PMSA implementation:

    // DefaultMemoryAttributes()
    // =========================

    MemoryAttributes DefaultMemoryAttributes(bits(32) va)

        MemoryAttributes memattrs;

        case va<31:30> of
            when '00'
                if SCTLR.C == '0' then
                    memattrs.type = MemType_Normal;
                    memattrs.innerattrs = '00';     // Non-cacheable
                    memattrs.shareable = TRUE;
                else
                    memattrs.type = MemType_Normal;
                    memattrs.innerattrs = '01';     // Write-back write-allocate cacheable
                    memattrs.shareable = FALSE;
            when '01'
                if SCTLR.C == '0' || va<29> == '1' then
                    memattrs.type = MemType_Normal;
                    memattrs.innerattrs = '00';     // Non-cacheable
                    memattrs.shareable = TRUE;
                else
                    memattrs.type = MemType_Normal;
                    memattrs.innerattrs = '10';     // Write-through cacheable
                    memattrs.shareable = FALSE;
            when '10'
                memattrs.type = MemType_Device;
                memattrs.innerattrs = '00';         // Non-cacheable
                memattrs.shareable = (va<29> == '1');
            when '11'
                memattrs.type = MemType_StronglyOrdered;
                memattrs.innerattrs = '00';         // Non-cacheable
                memattrs.shareable = TRUE;

        // Outer attributes are the same as the inner attributes in all cases.
        memattrs.outerattrs = memattrs.innerattrs;
        memattrs.outershareable = memattrs.shareable;

        return memattrs;

Chapter B5 The CPUID Identification Scheme
This chapter describes the CPUID scheme introduced as a requirement in ARMv7. This scheme provides registers that identify the architecture version and many features of the processor implementation. This chapter also describes the registers that identify the implemented Advanced SIMD and VFP features, if any.
This chapter contains the following sections:
• Introduction to the CPUID scheme on page B5-2
• The CPUID registers on page B5-4
• Advanced SIMD and VFP feature identification registers on page B5-34.
Note
The other chapters of this manual describe the permitted combinations of architectural features for the ARMv7-A and ARMv7-R architecture profiles, and some of the appendices give this information for previous versions of the architecture. Typically, permitted features are associated with a named architecture version, or version and profile, such as ARMv7-A or ARMv6.
The CPUID scheme is a mechanism for describing these permitted combinations in a way that enables software to determine the capabilities of the hardware it is running on.
The CPUID scheme does not extend the permitted combinations of architectural features beyond those associated with named architecture versions and profiles. The fact that the CPUID scheme can describe other combinations does not imply that those combinations are permitted ARM architecture variants.

B5.1 Introduction to the CPUID scheme
In ARM architecture versions before ARMv7, the architecture version is indicated by the Architecture field in the Main ID Register, see:
• c0, Main ID Register (MIDR) on page B3-81, for a VMSA implementation
• c0, Main ID Register (MIDR) on page B4-32, for a PMSA implementation.
From ARMv7, the architecture implements an extended processor identification scheme, using a number of registers in CP15 c0. ARMv7 requires the use of this scheme, and use of the scheme is indicated by a value of 0xF in the Architecture field of the Main ID Register.
Note
Some ARMv6 processors implemented the scheme before its formal adoption in the architecture.
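Software that must run on processors both with and without the CPUID scheme can test the Main ID Register before reading the CPUID registers. The following C sketch assumes the MIDR layout given in the Main ID Register descriptions referenced above, with the Architecture field in bits [19:16]; the function names are hypothetical, and reading the MIDR itself (MRC p15, 0, <Rd>, c0, c0, 0, in a privileged mode) is outside the sketch. Because some ARMv6 processors also report 0xF, the test identifies use of the scheme, not the architecture version.

```c
#include <stdbool.h>
#include <stdint.h>

/* Architecture field of the Main ID Register, bits [19:16]. */
static unsigned midr_architecture(uint32_t midr)
{
    return (midr >> 16) & 0xFu;
}

/* True when the Architecture field is 0xF, meaning that the CPUID registers
 * in CP15 c0 describe the processor features. This alone does not imply
 * ARMv7: some ARMv6 processors also report 0xF. */
static bool midr_uses_cpuid_scheme(uint32_t midr)
{
    return midr_architecture(midr) == 0xFu;
}
```

The MIDR values in a quick check such as midr_uses_cpuid_scheme(0x410FC090) are illustrative; only the Architecture nibble matters to this test.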
The CPUID scheme provides information about the implemented:
• processor features
• debug features
• auxiliary features, in particular IMPLEMENTATION DEFINED features
• memory model features
• instruction set features.
The following sections give more information about the CPUID registers:
• Organization of the CPUID registers
• General features of the CPUID registers on page B5-3.
The CPUID registers on page B5-4 gives detailed descriptions of the registers.
This chapter also describes the identification registers for any Advanced SIMD or VFP implementation. These are registers in the shared register space for the Advanced SIMD and VFP extensions, in CP10 and CP11. Advanced SIMD and VFP feature identification registers on page B5-34 describes these registers.

B5.1.1 Organization of the CPUID registers
Figure B5-1 on page B5-3 shows the CPUID registers and their encodings in CP15. Two of the encodings shown, with <CRm> == c2 and <opc2> == {6,7}, are reserved for future expansion of the CPUID scheme. In addition, all CP15 c0 encodings with <CRm> == {c3-c7} and <opc2> == {0-7} are reserved for future expansion of the scheme. These reserved encodings must be RAZ.
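The CPUID allocation in Figure B5-1, together with the reserved ranges just described, can be expressed as a lookup from (<CRm>, <opc2>) to a register name. The following C sketch is illustrative only: the function names are hypothetical, it assumes <CRn> = c0 and <opc1> = 0, and it returns NULL for encodings, such as <CRm> = c0, that are not part of Figure B5-1.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical lookup of the CPUID register selected by
 *     MRC p15, 0, <Rd>, c0, <CRm>, <opc2>
 * Returns the register name, "RAZ" for an encoding reserved for future
 * expansion of the CPUID scheme, or NULL outside Figure B5-1. */
static const char *cpuid_register_name(unsigned crm, unsigned opc2)
{
    static const char *const c1_regs[8] = {
        "ID_PFR0", "ID_PFR1", "ID_DFR0", "ID_AFR0",
        "ID_MMFR0", "ID_MMFR1", "ID_MMFR2", "ID_MMFR3"
    };
    static const char *const c2_regs[6] = {
        "ID_ISAR0", "ID_ISAR1", "ID_ISAR2", "ID_ISAR3", "ID_ISAR4", "ID_ISAR5"
    };

    if (opc2 > 7)
        return NULL;                          /* opc2 is a 3-bit field */
    if (crm == 1)
        return c1_regs[opc2];
    if (crm == 2)
        return opc2 <= 5 ? c2_regs[opc2] : "RAZ";   /* opc2 {6,7} reserved */
    if (crm >= 3 && crm <= 7)
        return "RAZ";                         /* reserved, must read as zero */
    return NULL;                              /* e.g. CRm = c0 */
}

/* True if the encoding is reserved for the CPUID scheme and reads as zero. */
static bool cpuid_encoding_is_raz(unsigned crm, unsigned opc2)
{
    const char *name = cpuid_register_name(crm, opc2);
    return name != NULL && strcmp(name, "RAZ") == 0;
}
```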
Figure B5-1 The CPUID register encodings
(CRn = c0, opc1 = 0)
• CRm = c1, opc2 = 0: ID_PFR0, Processor Feature Register 0
• CRm = c1, opc2 = 1: ID_PFR1, Processor Feature Register 1
• CRm = c1, opc2 = 2: ID_DFR0, Debug Feature Register 0
• CRm = c1, opc2 = 3: ID_AFR0, Auxiliary Feature Register 0
• CRm = c1, opc2 = 4: ID_MMFR0, Memory Model Feature Register 0
• CRm = c1, opc2 = 5: ID_MMFR1, Memory Model Feature Register 1
• CRm = c1, opc2 = 6: ID_MMFR2, Memory Model Feature Register 2
• CRm = c1, opc2 = 7: ID_MMFR3, Memory Model Feature Register 3
• CRm = c2, opc2 = 0: ID_ISAR0, ISA Feature Register 0
• CRm = c2, opc2 = 1: ID_ISAR1, ISA Feature Register 1
• CRm = c2, opc2 = 2: ID_ISAR2, ISA Feature Register 2
• CRm = c2, opc2 = 3: ID_ISAR3, ISA Feature Register 3
• CRm = c2, opc2 = 4: ID_ISAR4, ISA Feature Register 4
• CRm = c2, opc2 = 5: ID_ISAR5, ISA Feature Register 5
• CRm = c2, opc2 = {6-7}: Reserved

B5.1.2 General features of the CPUID registers
All of the CPUID registers are:
• 32-bit read-only registers
• accessible only in privileged modes
• when the Security Extensions are implemented, Common registers, see Common CP15 registers on page B3-74.
Each register is divided into eight 4-bit fields, and the possible field values are defined individually for each field. Some registers do not use all of these fields.

B5.2 The CPUID registers
The CPUID registers are described in detail in the following sections:
• CP15 c0, Processor Feature registers
• c0, Debug Feature Register 0 (ID_DFR0) on page B5-6
• c0, Auxiliary Feature Register 0 (ID_AFR0) on page B5-8
• CP15 c0, Memory Model Feature registers on page B5-9
• CP15 c0, Instruction Set Attribute registers on page B5-19.
See also General features of the CPUID registers on page B5-3.

B5.2.1 CP15 c0, Processor Feature registers
The Processor Feature registers, ID_PFR0 and ID_PFR1, provide information about the instruction set state support and programmers’ model for the processor.
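Because every CPUID register is built from 4-bit fields, decoding one reduces to extracting nibbles. The following C sketch, with hypothetical function names, extracts a field by index and applies two rules from the ID_PFR0 field descriptions in this section: State1 == 0b0011 indicates Thumb-2 support, and State3 (ThumbEE) may be 0b0001 only when State1 == 0b0011. It assumes the register value has already been read, for example with MRC p15, 0, <Rd>, c0, c1, 0.

```c
#include <stdbool.h>
#include <stdint.h>

/* Field n of a CPUID register: n = 0 is bits [3:0], n = 7 is bits [31:28]. */
static unsigned cpuid_field(uint32_t reg, unsigned n)
{
    return (reg >> (4u * n)) & 0xFu;
}

/* ID_PFR0.State1, bits [7:4] == 0b0011: Thumb encoding after the
 * introduction of Thumb-2 technology. */
static bool id_pfr0_has_thumb2(uint32_t id_pfr0)
{
    return cpuid_field(id_pfr0, 1) == 0x3u;
}

/* ID_PFR0.State3, bits [15:12]: 0b0000 is always permitted, and 0b0001 only
 * when State1 == 0b0011. Other values are not defined, so they are rejected. */
static bool id_pfr0_thumbee_valid(uint32_t id_pfr0)
{
    unsigned state3 = cpuid_field(id_pfr0, 3);
    return state3 == 0x0u || (state3 == 0x1u && id_pfr0_has_thumb2(id_pfr0));
}
```

For example, a value of 0x00001231 reports ARM support (State0 = 1), Thumb-2 (State1 = 3), Jazelle with clearing of JOSCR.CV (State2 = 2), and ThumbEE (State3 = 1), which passes the ThumbEE check.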
There are two Processor Feature registers, described in:
• c0, Processor Feature Register 0 (ID_PFR0)
• c0, Processor Feature Register 1 (ID_PFR1) on page B5-5
• Accessing the Processor Feature registers on page B5-6.

c0, Processor Feature Register 0 (ID_PFR0)
The format of ID_PFR0 is:
    bits [31:16] Reserved, RAZ | [15:12] State3 | [11:8] State2 | [7:4] State1 | [3:0] State0
Bits [31:16]
    Reserved, RAZ.
State3, bits [15:12]
    ThumbEE instruction set support. Permitted values are:
    0b0000  Not supported.
    0b0001  ThumbEE instruction set supported.
    The value of 0b0001 is only permitted when State1 == 0b0011.
State2, bits [11:8]
    Jazelle extension support. Permitted values are:
    0b0000  Not supported.
    0b0001  Support for Jazelle extension, without clearing of JOSCR.CV on exception entry.
    0b0010  Support for Jazelle extension, with clearing of JOSCR.CV on exception entry.
State1, bits [7:4]
    Thumb instruction set support. Permitted values are:
    0b0000  No support for Thumb instruction set.
    0b0001  Support for Thumb encoding before the introduction of Thumb-2 technology:
            • all instructions are 16-bit
            • a BL or BLX is a pair of 16-bit instructions
            • 32-bit instructions other than BL and BLX cannot be encoded.
    0b0010  Reserved.
    0b0011  Support for Thumb encoding after the introduction of Thumb-2 technology, and for all 16-bit and 32-bit Thumb basic instructions.
State0, bits [3:0]
    ARM instruction set support. Permitted values are:
    0b0000  No support for ARM instruction set.
    0b0001  Support for ARM instruction set.

c0, Processor Feature Register 1 (ID_PFR1)
The format of ID_PFR1 is:
    bits [31:12] Reserved, RAZ | [11:8] M profile programmers’ model | [7:4] Security Extensions | [3:0] Programmers’ model
Bits [31:12]
    Reserved, RAZ.
M profile programmers’ model, bits [11:8]
    Permitted values are:
    0b0000  Not supported.
    0b0010  Support for two-stack programmers’ model.
    The value of 0b0001 is reserved.
Security Extensions, bits [7:4]
    Permitted values are:
    0b0000  Not supported.
    0b0001  Support for the Security Extensions. This includes support for Monitor mode and the SMC instruction.
    0b0010  As for 0b0001, and adds the ability to set the NSACR.RFR bit.
Programmers’ model, bits [3:0]
    Support for the standard programmers’ model for ARMv4 and later. Model must support User, FIQ, IRQ, Supervisor, Abort, Undefined and System modes. Permitted values are:
    0b0000  Not supported.
    0b0001  Supported.

Accessing the Processor Feature registers
To access the Processor Feature Registers you read the CP15 registers with <opc1> set to 0, <CRn> set to c0, <CRm> set to c1, and <opc2> set to:
• 0 for ID_PFR0
• 1 for ID_PFR1.
For example:
    MRC p15, 0, <Rd>, c0, c1, 0    ; Read Processor Feature Register 0

B5.2.2 c0, Debug Feature Register 0 (ID_DFR0)
The Debug Feature Register 0, ID_DFR0, provides top level information about the debug system for the processor. You can obtain more information from the debug infrastructure, see Debug identification registers on page C10-3.
The format of the ID_DFR0 is:
    bits [31:24] Reserved, RAZ | [23:20] Debug model, M profile | [19:16] Memory-mapped trace model | [15:12] Coprocessor trace model | [11:8] Memory-mapped debug model, A and R profiles | [7:4] Coprocessor Secure debug model, A profile only | [3:0] Coprocessor debug model, A and R profiles
Bits [31:24]
    Reserved, RAZ.
Debug model, M profile, bits [23:20]
    Support for memory-mapped debug model for M profile processors. Permitted values are:
    0b0000  Not supported.
    0b0001  Support for M profile Debug architecture, with memory-mapped access.
Memory-mapped trace model, bits [19:16]
    Support for memory-mapped trace model. Permitted values are:
    0b0000  Not supported.
    0b0001  Support for ARM trace architecture, with memory-mapped access. The ID register, register 0x079, gives more information about the implementation. See also Trace on page C1-5.
Coprocessor trace model, bits [15:12]
    Support for coprocessor-based trace model. Permitted values are:
    0b0000  Not supported.
    0b0001  Support for ARM trace architecture, with CP14 access. The ID register, register 0x079, gives more information about the implementation. See also Trace on page C1-5.
Memory-mapped debug model, A and R profiles, bits [11:8]
    Support for memory-mapped debug model, for A and R profile processors. Permitted values are:
    0b0000  Not supported, or pre-ARMv6 implementation.
    0b0100  Support for v7 Debug architecture, with memory-mapped access.
    Values 0b0001, 0b0010, and 0b0011 are reserved.
Coprocessor Secure debug model, bits [7:4]
    Support for coprocessor-based Secure debug model, for an A profile processor that includes the Security Extensions. Permitted values are:
    0b0000  Not supported.
    0b0011  Support for v6.1 Debug architecture, with CP14 access.
    0b0100  Support for v7 Debug architecture, with CP14 access.
    Values 0b0001 and 0b0010 are reserved.
Coprocessor debug model, bits [3:0]
    Support for coprocessor-based debug model, for A and R profile processors. Permitted values are:
    0b0000  Not supported.
    0b0010  Support for v6 Debug architecture, with CP14 access.
    0b0011  Support for v6.1 Debug architecture, with CP14 access.
    0b0100  Support for v7 Debug architecture, with CP14 access.
    Value 0b0001 is reserved.

Accessing the ID_DFR0
To access the ID_DFR0 you read the CP15 registers with <opc1> set to 0, <CRn> set to c0, <CRm> set to c1, and <opc2> set to 2.
For example:
    MRC p15, 0, <Rd>, c0, c1, 2    ; Read Debug Feature Register 0

B5.2.3 c0, Auxiliary Feature Register 0 (ID_AFR0)
The Auxiliary Feature Register 0, ID_AFR0, provides information about the IMPLEMENTATION DEFINED features of the processor.
The format of the ID_AFR0 is:
    bits [31:16] Reserved, RAZ | [15:12] IMPLEMENTATION DEFINED | [11:8] IMPLEMENTATION DEFINED | [7:4] IMPLEMENTATION DEFINED | [3:0] IMPLEMENTATION DEFINED
Bits [31:16]
    Reserved, RAZ.
IMPLEMENTATION DEFINED, bits [15:12]
IMPLEMENTATION DEFINED, bits [11:8]
IMPLEMENTATION DEFINED, bits [7:4]
IMPLEMENTATION DEFINED, bits [3:0]
    The Auxiliary Feature Register 0 has four 4-bit IMPLEMENTATION DEFINED fields. These fields are defined by the implementer of the design. The implementer is identified by the Implementer field of the Main ID Register, see:
    • c0, Main ID Register (MIDR) on page B3-81, for a VMSA implementation
    • c0, Main ID Register (MIDR) on page B4-32, for a PMSA implementation.
The Auxiliary Feature Register 0 enables implementers to include additional design features in the CPUID scheme. Field definitions for the Auxiliary Feature Register 0 might:
• differ between different implementers
• be subject to change
• migrate over time, for example if they are incorporated into the main architecture.

Accessing the ID_AFR0
To access the ID_AFR0 you read the CP15 registers with <opc1> set to 0, <CRn> set to c0, <CRm> set to c1, and <opc2> set to 3. For example:
    MRC p15, 0, <Rd>, c0, c1, 3    ; Read Auxiliary Feature Register 0

B5.2.4 CP15 c0, Memory Model Feature registers
The Memory Model Feature registers, ID_MMFR0 to ID_MMFR3, provide general information about the implemented memory model and memory management support, including the supported cache and TLB operations.
There are four Memory Model Feature registers, described in:
• c0, Memory Model Feature Register 0 (ID_MMFR0)
• c0, Memory Model Feature Register 1 (ID_MMFR1) on page B5-11
• c0, Memory Model Feature Register 2 (ID_MMFR2) on page B5-14
• c0, Memory Model Feature Register 3 (ID_MMFR3) on page B5-17
• Accessing the Memory Model Feature registers on page B5-19.

c0, Memory Model Feature Register 0 (ID_MMFR0)
The format of the ID_MMFR0 is:
    bits [31:28] Innermost shareability | [27:24] FCSE support | [23:20] Auxiliary registers | [19:16] TCM support | [15:12] Shareability levels | [11:8] Outermost shareability | [7:4] PMSA support | [3:0] VMSA support
Innermost shareability, bits [31:28]
    Indicates the innermost shareability domain implemented. Permitted values are:
    0b0000  Implemented as Non-cacheable.
    0b0001  Implemented with hardware coherency support.
    0b1111  Shareability ignored.
    This field is valid only if more than one level of shareability is implemented, as indicated by the value of the Shareability levels field, bits [15:12]. When the Shareability levels field is zero, this field is UNK.
FCSE support, bits [27:24]
    Indicates whether the implementation includes the FCSE. Permitted values are:
    0b0000  Not supported.
    0b0001  Support for FCSE.
    The value of 0b0001 is only permitted when the VMSA support field has a value greater than 0b0010.
Auxiliary registers, bits [23:20]
    Indicates support for Auxiliary registers. Permitted values are:
    0b0000  None supported.
    0b0001  Support for Auxiliary Control Register only.
    0b0010  Support for Auxiliary Fault Status Registers (AIFSR and ADFSR) and Auxiliary Control Register.
TCM support, bits [19:16]
    Indicates support for TCMs and associated DMAs. Permitted values are:
    0b0000  Not supported.
    0b0001  Support is IMPLEMENTATION DEFINED. ARMv7 requires this setting.
    0b0010  Support for TCM only, ARMv6 implementation.
    0b0011  Support for TCM and DMA, ARMv6 implementation.
    Note
    An ARMv7 implementation might include an ARMv6 model for TCM support. However, in ARMv7 this is an IMPLEMENTATION DEFINED option, and therefore it must be represented by the 0b0001 encoding in this field.
Shareability levels, bits [15:12]
    Indicates the number of shareability levels implemented. Permitted values are:
    0b0000  One level of shareability implemented.
    0b0001  Two levels of shareability implemented.
Outermost shareability, bits [11:8]
    Indicates the outermost shareability domain implemented. Permitted values are:
    0b0000  Implemented as Non-cacheable.
    0b0001  Implemented with hardware coherency support.
    0b1111  Shareability ignored.
PMSA support, bits [7:4]
    Indicates support for a PMSA. Permitted values are:
    0b0000  Not supported.
    0b0001  Support for IMPLEMENTATION DEFINED PMSA.
    0b0010  Support for PMSAv6, with a Cache Type Register implemented.
    0b0011  Support for PMSAv7, with support for memory subsections. ARMv7-R profile.
    When the PMSA support field is set to a value other than 0b0000 the VMSA support field must be set to 0b0000.
VMSA support, bits [3:0]
    Indicates support for a VMSA. Permitted values are:
    0b0000  Not supported.
    0b0001  Support for IMPLEMENTATION DEFINED VMSA.
    0b0010  Support for VMSAv6, with Cache and TLB Type Registers implemented.
    0b0011  Support for VMSAv7, with support for remapping and the access flag. ARMv7-A profile.
    When the VMSA support field is set to a value other than 0b0000 the PMSA support field must be set to 0b0000.
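The mutual exclusion between the PMSA support and VMSA support fields, and the restriction on the FCSE support field, can be checked mechanically. The following C sketch, with hypothetical names, classifies an ID_MMFR0 value by its memory system architecture and validates the FCSE support field. It assumes the value has already been read (ID_MMFR0 is at <CRm> = c1, <opc2> = 4 in Figure B5-1) and does not check any other field.

```c
#include <stdbool.h>
#include <stdint.h>

typedef enum {
    MM_NONE,      /* neither PMSA nor VMSA supported */
    MM_PMSAV7,    /* PMSA support == 0b0011: ARMv7-R profile */
    MM_VMSAV7,    /* VMSA support == 0b0011: ARMv7-A profile */
    MM_OTHER,     /* IMPLEMENTATION DEFINED, v6, or reserved values */
    MM_INVALID    /* both fields nonzero, which the rules above forbid */
} mm_kind;

/* Classify ID_MMFR0 using VMSA support, bits [3:0], and PMSA support, bits [7:4]. */
static mm_kind classify_id_mmfr0(uint32_t id_mmfr0)
{
    unsigned vmsa = id_mmfr0 & 0xFu;
    unsigned pmsa = (id_mmfr0 >> 4) & 0xFu;

    if (vmsa != 0 && pmsa != 0)
        return MM_INVALID;
    if (vmsa == 0x3u)
        return MM_VMSAV7;
    if (pmsa == 0x3u)
        return MM_PMSAV7;
    if (vmsa == 0 && pmsa == 0)
        return MM_NONE;
    return MM_OTHER;
}

/* FCSE support, bits [27:24], may be 0b0001 only when VMSA support > 0b0010. */
static bool id_mmfr0_fcse_valid(uint32_t id_mmfr0)
{
    unsigned fcse = (id_mmfr0 >> 24) & 0xFu;
    unsigned vmsa = id_mmfr0 & 0xFu;
    return fcse == 0 || (fcse == 1 && vmsa > 0x2u);
}
```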
c0, Memory Model Feature Register 1 (ID_MMFR1)
The format of the ID_MMFR1 is:
    bits [31:28] Branch Predictor | [27:24] L1 cache Test and Clean | [23:20] L1 unified cache | [19:16] L1 Harvard cache | [15:12] L1 unified cache s/w | [11:8] L1 Harvard cache s/w | [7:4] L1 unified cache VA | [3:0] L1 Harvard cache VA
Branch predictor, bits [31:28]
    Indicates branch predictor management requirements. Permitted values are:
    0b0000  No branch predictor, or no MMU present. Implies a fixed MPU configuration.
    0b0001  Branch predictor requires flushing on:
            • enabling or disabling the MMU
            • writing new data to instruction locations
            • writing new mappings to the translation tables
            • any change to the TTBR0, TTBR1, or TTBCR registers
            • changes of FCSE ProcessID or ContextID.
    0b0010  Branch predictor requires flushing on:
            • enabling or disabling the MMU
            • writing new data to instruction locations
            • writing new mappings to the translation tables
            • any change to the TTBR0, TTBR1, or TTBCR registers without a corresponding change to the FCSE ProcessID or ContextID.
    0b0011  Branch predictor requires flushing only on:
            • writing new data to instruction locations.
    0b0100  For execution correctness, branch predictor requires no flushing at any time.
    Note
    The branch predictor is described in some documentation as the Branch Target Buffer.
L1 cache Test and Clean, bits [27:24]
    Indicates the supported Level 1 data cache test and clean operations, for Harvard or unified cache implementations. Permitted values are:
    0b0000  None supported. This is the required setting for ARMv7.
    0b0001  Supported Level 1 data cache test and clean operations are:
            • Test and clean data cache.
    0b0010  As for 0b0001, and adds:
            • Test, clean, and invalidate data cache.
L1 unified cache, bits [23:20]
    Indicates the supported entire Level 1 cache maintenance operations, for a unified cache implementation. Permitted values are:
    0b0000  None supported. This is the required setting for ARMv7, because ARMv7 requires a hierarchical cache implementation.
    0b0001  Supported entire Level 1 cache operations are:
            • Invalidate cache, including branch predictor if appropriate
            • Invalidate branch predictor, if appropriate.
    0b0010  As for 0b0001, and adds:
            • Clean cache. Uses a recursive model, using the cache dirty status bit.
            • Clean and invalidate cache. Uses a recursive model, using the cache dirty status bit.
    If this field is set to a value other than 0b0000 then the L1 Harvard cache field, bits [19:16], must be set to 0b0000.
L1 Harvard cache, bits [19:16]
    Indicates the supported entire Level 1 cache maintenance operations, for a Harvard cache implementation. Permitted values are:
    0b0000  None supported. This is the required setting for ARMv7, because ARMv7 requires a hierarchical cache implementation.
    0b0001  Supported entire Level 1 cache operations are:
            • Invalidate instruction cache, including branch predictor if appropriate
            • Invalidate branch predictor, if appropriate.
    0b0010  As for 0b0001, and adds:
            • Invalidate data cache
            • Invalidate data cache and instruction cache, including branch predictor if appropriate.
    0b0011  As for 0b0010, and adds:
            • Clean data cache. Uses a recursive model, using the cache dirty status bit.
            • Clean and invalidate data cache. Uses a recursive model, using the cache dirty status bit.
    If this field is set to a value other than 0b0000 then the L1 unified cache field, bits [23:20], must be set to 0b0000.
L1 unified cache s/w, bits [15:12]
    Indicates the supported Level 1 cache line maintenance operations by set/way, for a unified cache implementation. Permitted values are:
    0b0000  None supported.
            This is the required setting for ARMv7, because ARMv7 requires a hierarchical cache implementation.
    0b0001  Supported Level 1 unified cache line maintenance operations by set/way are:
            • Clean cache line by set/way.
    0b0010  As for 0b0001, and adds:
            • Clean and invalidate cache line by set/way.
    0b0011  As for 0b0010, and adds:
            • Invalidate cache line by set/way.
    If this field is set to a value other than 0b0000 then the L1 Harvard cache s/w field, bits [11:8], must be set to 0b0000.
L1 Harvard cache s/w, bits [11:8]
    Indicates the supported Level 1 cache line maintenance operations by set/way, for a Harvard cache implementation. Permitted values are:
    0b0000  None supported. This is the required setting for ARMv7, because ARMv7 requires a hierarchical cache implementation.
    0b0001  Supported Level 1 Harvard cache line maintenance operations by set/way are:
            • Clean data cache line by set/way
            • Clean and invalidate data cache line by set/way.
    0b0010  As for 0b0001, and adds:
            • Invalidate data cache line by set/way.
    0b0011  As for 0b0010, and adds:
            • Invalidate instruction cache line by set/way.
    If this field is set to a value other than 0b0000 then the L1 unified cache s/w field, bits [15:12], must be set to 0b0000.
L1 unified cache VA, bits [7:4]
    Indicates the supported Level 1 cache line maintenance operations by MVA, for a unified cache implementation. Permitted values are:
    0b0000  None supported. This is the required setting for ARMv7, because ARMv7 requires a hierarchical cache implementation.
    0b0001  Supported Level 1 unified cache line maintenance operations by MVA are:
            • Clean cache line by MVA
            • Invalidate cache line by MVA
            • Clean and invalidate cache line by MVA.
    0b0010  As for 0b0001, and adds:
            • Invalidate branch predictor by MVA, if branch predictor is implemented.
If this field is set to a value other than 0b0000 then the L1 Harvard cache VA field, bits [3:0], must be set to 0b0000.
L1 Harvard cache VA, bits [3:0]
Indicates the supported Level 1 cache line maintenance operations by MVA, for a Harvard cache implementation. Permitted values are:
0b0000 None supported. This is the required setting for ARMv7, because ARMv7 requires a hierarchical cache implementation.
0b0001 Supported Level 1 Harvard cache line maintenance operations by MVA are:
• Clean data cache line by MVA
• Invalidate data cache line by MVA
• Clean and invalidate data cache line by MVA
• Clean instruction cache line by MVA.
0b0010 As for 0b0001, and adds:
• Invalidate branch predictor by MVA, if branch predictor is implemented.
If this field is set to a value other than 0b0000 then the L1 unified cache VA field, bits [7:4], must be set to 0b0000.
c0, Memory Model Feature Register 2 (ID_MMFR2)
The format of the ID_MMFR2 is:
Bits [31:28] HW access flag
Bits [27:24] WFI stall
Bits [23:20] Mem barrier
Bits [19:16] Unified TLB
Bits [15:12] Harvard TLB
Bits [11:8] L1 Harvard range
Bits [7:4] L1 Harvard bg prefetch
Bits [3:0] L1 Harvard fg prefetch
HW access flag, bits [31:28]
Indicates support for a Hardware access flag, as part of the VMSAv7 implementation. Permitted values are:
0b0000 Not supported.
0b0001 Support for VMSAv7 access flag, updated in hardware.
On an ARMv7-R implementation this field must be 0b0000.
WFI stall, bits [27:24]
Indicates the support for Wait For Interrupt (WFI) stalling. Permitted values are:
0b0000 Not supported.
0b0001 Support for WFI stalling.
Mem barrier, bits [23:20]
Indicates the supported CP15 memory barrier operations:
0b0000 None supported.
0b0001 Supported CP15 Memory barrier operations are:
• Data Synchronization Barrier (DSB). In previous versions of the ARM architecture, DSB was named Data Write Barrier (DWB).
0b0010 As for 0b0001, and adds:
• Instruction Synchronization Barrier (ISB). In previous versions of the ARM architecture, the ISB operation was called Prefetch Flush.
• Data Memory Barrier (DMB).
Unified TLB, bits [19:16]
Indicates the supported TLB maintenance operations, for a unified TLB implementation. Permitted values are:
0b0000 Not supported.
0b0001 Supported unified TLB maintenance operations are:
• Invalidate all entries in the TLB
• Invalidate TLB entry by MVA.
0b0010 As for 0b0001, and adds:
• Invalidate TLB entries by ASID match.
0b0011 As for 0b0010, and adds:
• Invalidate TLB entries by MVA All ASID.
If this field is set to a value other than 0b0000 then the Harvard TLB field, bits [15:12], must be set to 0b0000.
Harvard TLB, bits [15:12]
Indicates the supported TLB maintenance operations, for a Harvard TLB implementation. Permitted values are:
0b0000 Not supported.
0b0001 Supported Harvard TLB maintenance operations are:
• Invalidate all entries in the ITLB and the DTLB. This is a shared unified TLB operation.
• Invalidate all ITLB entries.
• Invalidate all DTLB entries.
• Invalidate ITLB entry by MVA.
• Invalidate DTLB entry by MVA.
0b0010 As for 0b0001, and adds:
• Invalidate ITLB and DTLB entries by ASID match. This is a shared unified TLB operation.
• Invalidate ITLB entries by ASID match
• Invalidate DTLB entries by ASID match.
If this field is set to a value other than 0b0000 then the Unified TLB field, bits [19:16], must be set to 0b0000.
L1 Harvard range, bits [11:8]
Indicates the supported Level 1 cache maintenance range operations, for a Harvard cache implementation. Permitted values are:
0b0000 Not supported.
0b0001 Supported Level 1 Harvard cache maintenance range operations are:
• Invalidate data cache range by VA
• Invalidate instruction cache range by VA
• Clean data cache range by VA
• Clean and invalidate data cache range by VA.
L1 Harvard bg prefetch, bits [7:4]
Indicates the supported Level 1 cache background prefetch operations, for a Harvard cache implementation. When supported, background prefetch operations are non-blocking operations. Permitted values are:
0b0000 Not supported.
0b0001 Supported Level 1 Harvard cache background prefetch operations are:
• Prefetch instruction cache range by VA
• Prefetch data cache range by VA.
L1 Harvard fg prefetch, bits [3:0]
Indicates the supported Level 1 cache foreground prefetch operations, for a Harvard cache implementation. When supported, foreground prefetch operations are blocking operations. Permitted values are:
0b0000 Not supported.
0b0001 Supported Level 1 Harvard cache foreground prefetch operations are:
• Prefetch instruction cache range by VA
• Prefetch data cache range by VA.
c0, Memory Model Feature Register 3 (ID_MMFR3)
The format of the ID_MMFR3 is:
Bits [31:28] Supersection support
Bits [27:24] Reserved, RAZ
Bits [23:20] Coherent walk
Bits [19:16] Reserved, RAZ
Bits [15:12] Maintenance broadcast
Bits [11:8] BP maintain
Bits [7:4] Cache maintain s/w
Bits [3:0] Cache maintain MVA
Supersection support, bits [31:28]
On a VMSA implementation, indicates whether Supersections are supported. Permitted values are:
0b0000 Supersections supported.
0b1111 Supersections not supported.
All other values are reserved.
Note
The sense of this identification is reversed from the normal usage in the CPUID mechanism, with the value of zero indicating that the feature is supported.
Bits [27:24]
Reserved, RAZ.
Coherent walk, bits [23:20]
Indicates whether translation table updates require a clean to the point of unification.
Permitted values are:
0b0000 Updates to the translation tables require a clean to the point of unification to ensure visibility by subsequent translation table walks.
0b0001 Updates to the translation tables do not require a clean to the point of unification to ensure visibility by subsequent translation table walks.
Bits [19:16]
Reserved, RAZ.
Maintenance broadcast, bits [15:12]
Indicates whether cache, TLB and branch predictor operations are broadcast. Permitted values are:
0b0000 Cache, TLB and branch predictor operations only affect local structures.
0b0001 Cache and branch predictor operations affect structures according to shareability and defined behavior of instructions. TLB operations only affect local structures.
0b0010 Cache, TLB and branch predictor operations affect structures according to shareability and defined behavior of instructions.
BP maintain, bits [11:8]
Indicates the supported branch predictor maintenance operations in an implementation with hierarchical cache maintenance operations. Permitted values are:
0b0000 None supported.
0b0001 Supported branch predictor maintenance operations are:
• Invalidate entire branch predictor array.
0b0010 As for 0b0001, and adds:
• Invalidate branch predictor by MVA.
Cache maintain s/w, bits [7:4]
Indicates the supported cache maintenance operations by set/way, in an implementation with hierarchical caches. Permitted values are:
0b0000 None supported.
0b0001 Supported hierarchical cache maintenance operations by set/way are:
• Invalidate data cache by set/way
• Clean data cache by set/way
• Clean and invalidate data cache by set/way.
In a unified cache implementation, the data cache operations apply to the unified caches.
Cache maintain MVA, bits [3:0]
Indicates the supported cache maintenance operations by MVA, in an implementation with hierarchical caches.
Permitted values are:
0b0000 None supported.
0b0001 Supported hierarchical cache maintenance operations by MVA are:
• Invalidate data cache by MVA
• Clean data cache by MVA
• Clean and invalidate data cache by MVA
• Invalidate instruction cache by MVA
• Invalidate all instruction cache entries.
In a unified cache implementation, the data cache operations apply to the unified caches, and the instruction cache operations are not implemented.
Accessing the Memory Model Feature registers
To access the Memory Model Feature Registers you read the CP15 registers with <opc1> set to 0, <CRn> set to c0, <CRm> set to c1, and <opc2> set to:
• 4 for the ID_MMFR0
• 5 for the ID_MMFR1
• 6 for the ID_MMFR2
• 7 for the ID_MMFR3.
For example:
MRC p15, 0, <Rt>, c0, c1, 6    ; Read Memory Model Feature Register 2
B5.2.5 CP15 c0, Instruction Set Attribute registers
The Instruction Set Attribute registers, ID_ISAR0 to ID_ISAR5, provide information about the instruction set supported by the processor. The instruction set is divided into:
• The basic instructions, for the ARM, Thumb, and ThumbEE instruction sets. If the Processor Feature Register 0 indicates support for one of these instruction sets then all basic instructions that have encodings in the corresponding instruction set must be implemented.
• The non-basic instructions. The Instruction Set Attribute registers indicate which of these instructions are implemented.
Instruction set descriptions in the CPUID scheme on page B5-20 describes the division of the instruction set into basic and non-basic instructions. Summary of Instruction Set Attribute register attributes on page B5-22 lists all of the attributes and shows which register holds each attribute.
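Software that consumes the Memory Model Feature Registers described earlier in this section typically extracts each 4-bit field and then applies the rule stated for that field. The following C sketch is illustrative only: it assumes the raw register values have already been read (for example with the MRC instructions shown under Accessing the Memory Model Feature registers), and the helper names id_field, mmfr1_fields_consistent, mmfr2_hw_access_flag, and mmfr3_supersections_supported are hypothetical, not part of any ARM-defined API. It checks the ID_MMFR1 rule that a unified-cache field and its Harvard counterpart are never both nonzero, and it handles the reversed sense of the ID_MMFR3 Supersection support field.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Extract the 4-bit ID field whose least significant bit is lsb (0, 4, ..., 28). */
static unsigned id_field(uint32_t reg, unsigned lsb) {
    assert(lsb <= 28 && lsb % 4 == 0);
    return (reg >> lsb) & 0xFu;
}

/* ID_MMFR1: a unified-cache field and its Harvard counterpart must not both be
   nonzero. Pairs are (unified lsb, Harvard lsb) for the entire-cache, set/way,
   and MVA fields. */
static bool mmfr1_fields_consistent(uint32_t mmfr1) {
    static const unsigned pairs[3][2] = { {20, 16}, {12, 8}, {4, 0} };
    for (unsigned i = 0; i < 3; i++) {
        if (id_field(mmfr1, pairs[i][0]) != 0 && id_field(mmfr1, pairs[i][1]) != 0)
            return false;
    }
    return true;
}

/* ID_MMFR2 HW access flag, bits [31:28]: 0b0001 means the VMSAv7 access flag
   is updated in hardware. */
static bool mmfr2_hw_access_flag(uint32_t mmfr2) {
    return id_field(mmfr2, 28) == 0x1;
}

/* ID_MMFR3 Supersection support, bits [31:28]: the sense is reversed, so
   0b0000 means supported. 0b1111 (not supported) and all reserved values
   return false here. */
static bool mmfr3_supersections_supported(uint32_t mmfr3) {
    return id_field(mmfr3, 28) == 0x0;
}
```

Treating reserved Supersection values as "not supported" is a conservative choice made for this sketch, not a rule stated by the architecture.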
ARMv7 implements six Instruction Set Attribute registers, described in:
• c0, Instruction Set Attribute Register 0 (ID_ISAR0) on page B5-24
• c0, Instruction Set Attribute Register 1 (ID_ISAR1) on page B5-25
• c0, Instruction Set Attribute Register 2 (ID_ISAR2) on page B5-27
• c0, Instruction Set Attribute Register 3 (ID_ISAR3) on page B5-29
• c0, Instruction Set Attribute Register 4 (ID_ISAR4) on page B5-31
• c0, Instruction Set Attribute Register 5 (ID_ISAR5) on page B5-33
• Accessing the Instruction Set Attribute registers on page B5-33.
Instruction set descriptions in the CPUID scheme
The following subsections describe how the CPUID scheme describes the instruction set, and how instructions are classified as either basic or non-basic:
• General rules for instruction classification
• Data-processing instructions
• Multiply instructions on page B5-21
• Branches on page B5-21
• Load or Store single word instructions on page B5-21
• Load or Store multiple word instructions on page B5-21
• Q flag support in the PSRs on page B5-21.
General rules for instruction classification
Two general rules apply to the description of instruction classification given in this section:
1. The rules about an instruction being basic do not guarantee that it is available in any particular instruction set. For example, the rules given in this section classify MOV R0, #123456789 as a basic instruction, but this instruction is not available in any existing ARM instruction set.
2. Whether an instruction is conditional or unconditional never makes any difference to whether it is a basic instruction.
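Rule 1 can be made concrete for the ARM instruction set. As background not stated in this section, an ARM-state data-processing immediate must be an 8-bit value rotated right by an even number of bits, so a constant such as 123456789 (0x075BCD15) has no encoding even though the classification rules call MOV R0, #123456789 basic. The C sketch below tests only that ARM-state encoding constraint; the names rotr32 and arm_imm_encodable are hypothetical, and it does not model Thumb modified immediates or the 16-bit MOV encodings.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Rotate a 32-bit value right by n bits, 0 <= n < 32. */
static uint32_t rotr32(uint32_t x, unsigned n) {
    assert(n < 32);
    return n == 0 ? x : (x >> n) | (x << (32 - n));
}

/* True if value equals some 8-bit constant rotated right by an even amount
   (0, 2, ..., 30), the constraint on ARM-state data-processing immediates. */
static bool arm_imm_encodable(uint32_t value) {
    for (unsigned rot = 0; rot < 32; rot += 2) {
        /* Undo a right rotation by rot: rotate right by (32 - rot) mod 32. */
        if (rotr32(value, (32 - rot) % 32) <= 0xFFu)
            return true;
    }
    return false;
}
```

With this check, 0xFF, 0x3FC and 0xFF000000 are encodable, while 123456789 and 0x1FE (0xFF shifted by an odd amount) are not.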
Data-processing instructions
The data-processing instructions are:
ADC  ADD  AND  ASR  BIC  CMN  CMP  EOR  LSL  LSR  MOV  MVN
NEG  ORN  ORR  ROR  RRX  RSB  RSC  SBC  SUB  TEQ  TST
An instruction from this group is a basic instruction if both of these conditions apply:
• The second source operand, or the only source operand of a MOV or MVN instruction, is an immediate or an unshifted register.
Note
A MOV instruction with a shifted register source operand must be treated as the equivalent ASR, LSL, LSR, ROR, or RRX instruction, see MOV (shifted register) on page A8-198.
• The instruction is not one of the exception return instructions described in SUBS PC, LR and related instructions on page B6-25.
If either of these conditions does not apply then the instruction is a non-basic instruction. One or both of these attributes in the Instruction Set Attribute registers show the support for non-basic data-processing instructions:
• PSR_AR_instrs
• WithShifts_instrs.
Multiply instructions
The classification of multiply instructions is:
• MUL instructions are always basic instructions
• all other multiply instructions, and all multiply-accumulate instructions, are non-basic instructions.
Branches
All B and BL instructions are basic instructions.
Load or Store single word instructions
The instructions in this group are:
LDR  LDRB  LDRH  LDRSB  LDRSH  STR  STRB  STRH
An instruction in this group is a basic instruction if its addressing mode is one of these forms:
• [Rn, #immediate]
• [Rn, #-immediate]
• [Rn, Rm]
• [Rn, -Rm].
A Load or Store single word instruction with any other addressing mode is a non-basic instruction. One or more of these attributes in the Instruction Set Attribute registers show the support for these instructions:
• WithShifts_instrs
• Writeback_instrs
• Unpriv_instrs.
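The classification rules for data-processing and Load or Store single word instructions reduce to simple predicates. The following C sketch assumes a hypothetical decoder that has already tagged each instruction's operand form; the enum and function names are illustrative only. The comments that pair each non-basic addressing mode with WithShifts_instrs, Writeback_instrs, or Unpriv_instrs are an assumption drawn from the attribute list above, not a one-to-one mapping stated by this section.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical tag for the second (or only MOV/MVN) source operand. */
typedef enum { OP2_IMMEDIATE, OP2_REGISTER, OP2_SHIFTED_REGISTER } op2_form;

/* A data-processing instruction is basic if its operand is an immediate or an
   unshifted register and it is not an exception return form (SUBS PC, LR etc.). */
static bool dp_is_basic(op2_form form, bool is_exception_return) {
    return form != OP2_SHIFTED_REGISTER && !is_exception_return;
}

/* Hypothetical addressing-mode tags for LDR, LDRB, LDRH, LDRSB, LDRSH, STR,
   STRB and STRH. The attribute named for each non-basic form is an assumption. */
typedef enum {
    AM_IMM_OFFSET,   /* [Rn, #immediate] or [Rn, #-immediate]: basic */
    AM_REG_OFFSET,   /* [Rn, Rm] or [Rn, -Rm]: basic */
    AM_SHIFTED_REG,  /* e.g. [Rn, Rm, LSL #2]: WithShifts_instrs */
    AM_WRITEBACK,    /* pre- or post-indexed forms: Writeback_instrs */
    AM_UNPRIVILEGED  /* T variants such as LDRT: Unpriv_instrs */
} ls_addr_mode;

static bool ls_is_basic(ls_addr_mode mode) {
    assert(mode <= AM_UNPRIVILEGED);
    return mode == AM_IMM_OFFSET || mode == AM_REG_OFFSET;
}
```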
Load or Store multiple word instructions
The Load or Store multiple word instructions are:
LDM  STM  PUSH  POP
A limited number of variants of these instructions are non-basic. The Except_instrs attribute in the Instruction Set Attribute registers shows the support for these instructions. For details of these non-basic instructions see c0, Instruction Set Attribute Register 1 (ID_ISAR1) on page B5-25.
All other forms of these instructions are always basic instructions.
Q flag support in the PSRs
The Q flag is present in the CPSR and SPSRs when one or more of these conditions apply to the Instruction Set Attribute register attributes:
• MultS_instrs ≥ 2
• Saturate_instrs ≥ 1
• SIMD_instrs ≥ 1.
Summary of Instruction Set Attribute register attributes
The Instruction Set Attribute registers use a set of attributes to indicate the non-basic instructions supported by the processor. The descriptions of the non-basic instructions in Instruction set descriptions in the CPUID scheme on page B5-20 include the attribute or attributes used to indicate support for each category of non-basic instructions. Table B5-1 lists all of these attributes in alphabetical order, and shows which Instruction Set Attribute register holds each attribute.
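The Q flag rule given under Q flag support in the PSRs can be evaluated directly from raw register values, using the field positions given later in this section: MultS_instrs is ID_ISAR2 bits [19:16], and SIMD_instrs and Saturate_instrs are ID_ISAR3 bits [7:4] and [3:0]. A minimal C sketch, assuming both register values are already available; q_flag_present and id_field are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Extract the 4-bit ID field whose least significant bit is lsb. */
static unsigned id_field(uint32_t reg, unsigned lsb) {
    assert(lsb <= 28 && lsb % 4 == 0);
    return (reg >> lsb) & 0xFu;
}

/* The Q flag is present in the CPSR and SPSRs if MultS_instrs >= 2,
   Saturate_instrs >= 1, or SIMD_instrs >= 1. */
static bool q_flag_present(uint32_t id_isar2, uint32_t id_isar3) {
    unsigned mults    = id_field(id_isar2, 16); /* ID_ISAR2 bits [19:16] */
    unsigned simd     = id_field(id_isar3, 4);  /* ID_ISAR3 bits [7:4] */
    unsigned saturate = id_field(id_isar3, 0);  /* ID_ISAR3 bits [3:0] */
    return mults >= 2 || saturate >= 1 || simd >= 1;
}
```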
Table B5-1 Alphabetic list of Instruction Set Attribute register attributes
Attribute              Register
Barrier_instrs         c0, Instruction Set Attribute Register 4 (ID_ISAR4) on page B5-31
BitCount_instrs        c0, Instruction Set Attribute Register 0 (ID_ISAR0) on page B5-24
Bitfield_instrs        c0, Instruction Set Attribute Register 0 (ID_ISAR0) on page B5-24
CmpBranch_instrs       c0, Instruction Set Attribute Register 0 (ID_ISAR0) on page B5-24
Coproc_instrs          c0, Instruction Set Attribute Register 0 (ID_ISAR0) on page B5-24
Debug_instrs           c0, Instruction Set Attribute Register 0 (ID_ISAR0) on page B5-24
Divide_instrs          c0, Instruction Set Attribute Register 0 (ID_ISAR0) on page B5-24
Endian_instrs          c0, Instruction Set Attribute Register 1 (ID_ISAR1) on page B5-25
Except_AR_instrs       c0, Instruction Set Attribute Register 1 (ID_ISAR1) on page B5-25
Except_instrs          c0, Instruction Set Attribute Register 1 (ID_ISAR1) on page B5-25
Extend_instrs          c0, Instruction Set Attribute Register 1 (ID_ISAR1) on page B5-25
IfThen_instrs          c0, Instruction Set Attribute Register 1 (ID_ISAR1) on page B5-25
Immediate_instrs       c0, Instruction Set Attribute Register 1 (ID_ISAR1) on page B5-25
Interwork_instrs       c0, Instruction Set Attribute Register 1 (ID_ISAR1) on page B5-25
Jazelle_instrs         c0, Instruction Set Attribute Register 1 (ID_ISAR1) on page B5-25
LoadStore_instrs       c0, Instruction Set Attribute Register 2 (ID_ISAR2) on page B5-27
MemHint_instrs         c0, Instruction Set Attribute Register 2 (ID_ISAR2) on page B5-27
Mult_instrs            c0, Instruction Set Attribute Register 2 (ID_ISAR2) on page B5-27
MultiAccessInt_instrs  c0, Instruction Set Attribute Register 2 (ID_ISAR2) on page B5-27
MultS_instrs           c0, Instruction Set Attribute Register 2 (ID_ISAR2) on page B5-27
MultU_instrs           c0, Instruction Set Attribute Register 2 (ID_ISAR2) on page B5-27
PSR_AR_instrs          c0, Instruction Set Attribute Register 2 (ID_ISAR2) on page B5-27
PSR_M_instrs           c0, Instruction Set Attribute Register 4 (ID_ISAR4) on page B5-31
Reversal_instrs        c0, Instruction Set Attribute Register 2 (ID_ISAR2) on page B5-27
Saturate_instrs        c0, Instruction Set Attribute Register 3 (ID_ISAR3) on page B5-29
SIMD_instrs            c0, Instruction Set Attribute Register 3 (ID_ISAR3) on page B5-29
SMC_instrs             c0, Instruction Set Attribute Register 4 (ID_ISAR4) on page B5-31
SVC_instrs             c0, Instruction Set Attribute Register 3 (ID_ISAR3) on page B5-29
Swap_instrs            c0, Instruction Set Attribute Register 0 (ID_ISAR0) on page B5-24
SynchPrim_instrs       c0, Instruction Set Attribute Register 3 (ID_ISAR3) on page B5-29
SynchPrim_instrs_frac  c0, Instruction Set Attribute Register 4 (ID_ISAR4) on page B5-31
TabBranch_instrs       c0, Instruction Set Attribute Register 3 (ID_ISAR3) on page B5-29
ThumbCopy_instrs       c0, Instruction Set Attribute Register 3 (ID_ISAR3) on page B5-29
ThumbEE_extn_instrs    c0, Instruction Set Attribute Register 3 (ID_ISAR3) on page B5-29
TrueNOP_instrs         c0, Instruction Set Attribute Register 3 (ID_ISAR3) on page B5-29
Unpriv_instrs          c0, Instruction Set Attribute Register 4 (ID_ISAR4) on page B5-31
WithShifts_instrs      c0, Instruction Set Attribute Register 4 (ID_ISAR4) on page B5-31
Writeback_instrs       c0, Instruction Set Attribute Register 4 (ID_ISAR4) on page B5-31
c0, Instruction Set Attribute Register 0 (ID_ISAR0)
The format of the ID_ISAR0 is:
Bits [31:28] Reserved, RAZ
Bits [27:24] Divide_instrs
Bits [23:20] Debug_instrs
Bits [19:16] Coproc_instrs
Bits [15:12] CmpBranch_instrs
Bits [11:8] Bitfield_instrs
Bits [7:4] BitCount_instrs
Bits [3:0] Swap_instrs
Bits [31:28]
Reserved, RAZ.
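Using the ID_ISAR0 layout above, a decoder can split the register into its seven attribute fields before testing individual features such as hardware divide. This C sketch is illustrative: the struct and the functions field4, decode_id_isar0, and has_hw_divide are hypothetical names, and it treats any nonzero Divide_instrs value as indicating SDIV and UDIV, on the assumption that attribute values are cumulative (this edition defines only 0b0001).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* ID_ISAR0 attribute fields; bits [31:28] are reserved, RAZ. */
typedef struct {
    unsigned divide;     /* bits [27:24] Divide_instrs */
    unsigned debug;      /* bits [23:20] Debug_instrs */
    unsigned coproc;     /* bits [19:16] Coproc_instrs */
    unsigned cmp_branch; /* bits [15:12] CmpBranch_instrs */
    unsigned bitfield;   /* bits [11:8]  Bitfield_instrs */
    unsigned bit_count;  /* bits [7:4]   BitCount_instrs */
    unsigned swap;       /* bits [3:0]   Swap_instrs */
} id_isar0_fields;

/* Extract the 4-bit field whose least significant bit is lsb. */
static unsigned field4(uint32_t reg, unsigned lsb) {
    assert(lsb <= 28 && lsb % 4 == 0);
    return (reg >> lsb) & 0xFu;
}

static id_isar0_fields decode_id_isar0(uint32_t isar0) {
    id_isar0_fields f = {
        .divide = field4(isar0, 24), .debug = field4(isar0, 20),
        .coproc = field4(isar0, 16), .cmp_branch = field4(isar0, 12),
        .bitfield = field4(isar0, 8), .bit_count = field4(isar0, 4),
        .swap = field4(isar0, 0),
    };
    return f;
}

/* SDIV and UDIV are available when Divide_instrs is nonzero. */
static bool has_hw_divide(uint32_t isar0) {
    return decode_id_isar0(isar0).divide >= 1;
}
```

For a hypothetical register value of 0x01101111, decode_id_isar0 reports Divide_instrs, Debug_instrs, CmpBranch_instrs, Bitfield_instrs, BitCount_instrs, and Swap_instrs as 0b0001, and Coproc_instrs as 0b0000.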
Divide_instrs, bits [27:24]
Indicates the supported Divide instructions. Permitted values are:
0b0000 None supported.
0b0001 Adds support for SDIV and UDIV.
Debug_instrs, bits [23:20]
Indicates the supported Debug instructions. Permitted values are:
0b0000 None supported.
0b0001 Adds support for BKPT.
Coproc_instrs, bits [19:16]
Indicates the supported Coprocessor instructions. Permitted values are:
0b0000 None supported, except for separately attributed architectures including CP15, CP14, and Advanced SIMD and VFP.
0b0001 Adds support for generic CDP, LDC, MCR, MRC, and STC.
0b0010 As for 0b0001, and adds generic CDP2, LDC2, MCR2, MRC2, and STC2.
0b0011 As for 0b0010, and adds generic MCRR and MRRC.
0b0100 As for 0b0011, and adds generic MCRR2 and MRRC2.
CmpBranch_instrs, bits [15:12]
Indicates the supported combined Compare and Branch instructions in the Thumb instruction set. Permitted values are:
0b0000 None supported.
0b0001 Adds support for CBNZ and CBZ.
Bitfield_instrs, bits [11:8]
Indicates the supported BitField instructions. Permitted values are:
0b0000 None supported.
0b0001 Adds support for BFC, BFI, SBFX, and UBFX.
BitCount_instrs, bits [7:4]
Indicates the supported Bit Counting instructions. Permitted values are:
0b0000 None supported.
0b0001 Adds support for CLZ.
Swap_instrs, bits [3:0]
Indicates the supported Swap instructions in the ARM instruction set. Permitted values are:
0b0000 None supported.
0b0001 Adds support for SWP and SWPB.
c0, Instruction Set Attribute Register 1 (ID_ISAR1)
The format of the ID_ISAR1 is:
Bits [31:28] Jazelle_instrs
Bits [27:24] Interwork_instrs
Bits [23:20] Immediate_instrs
Bits [19:16] IfThen_instrs
Bits [15:12] Extend_instrs
Bits [11:8] Except_AR_instrs
Bits [7:4] Except_instrs
Bits [3:0] Endian_instrs
Jazelle_instrs, bits [31:28]
Indicates the supported Jazelle extension instructions. Permitted values are:
0b0000 No support for Jazelle.
0b0001 Adds support for BXJ instruction, and the J bit in the PSR. This setting might indicate a trivial implementation of Jazelle support.
Interwork_instrs, bits [27:24]
Indicates the supported Interworking instructions. Permitted values are:
0b0000 None supported.
0b0001 Adds support for BX instruction, and the T bit in the PSR.
0b0010 As for 0b0001, and adds support for BLX instruction. PC loads have BX-like behavior.
0b0011 As for 0b0010, but guarantees that data-processing instructions in the ARM instruction set with the PC as the destination and the S bit clear have BX-like behavior.
Note
A value of 0b0000, 0b0001, or 0b0010 in this field does not guarantee that an ARM data-processing instruction with the PC as the destination and the S bit clear behaves like an old MOV PC instruction, ignoring bits [1:0] of the result. With these values of this field:
• if bits [1:0] of the result value are 0b00 then the processor remains in ARM state
• if bits [1:0] are 0b01, 0b10 or 0b11, the result must be treated as UNPREDICTABLE.
Immediate_instrs, bits [23:20]
Indicates the support for data-processing instructions with long immediates. Permitted values are:
0b0000 None supported.
0b0001 Adds support for:
• the MOVT instruction
• the MOV instruction encodings with zero-extended 16-bit immediates
• the Thumb ADD and SUB instruction encodings with zero-extended 12-bit immediates, and the other ADD, ADR and SUB encodings cross-referenced by the pseudocode for those encodings.
IfThen_instrs, bits [19:16]
Indicates the supported IfThen instructions in the Thumb instruction set. Permitted values are:
0b0000 None supported.
0b0001 Adds support for the IT instructions, and for the IT bits in the PSRs.
Extend_instrs, bits [15:12]
Indicates the supported Extend instructions.
Permitted values are:
0b0000 No scalar sign-extend or zero-extend instructions are supported, where scalar instructions means non-Advanced SIMD instructions.
0b0001 Adds support for the SXTB, SXTH, UXTB, and UXTH instructions.
0b0010 As for 0b0001, and adds support for the SXTB16, SXTAB, SXTAB16, SXTAH, UXTB16, UXTAB, UXTAB16, and UXTAH instructions.
Note
In addition:
• the shift options on these instructions are available only if the WithShifts_instrs attribute is 0b0011 or greater
• the SXTAB16, SXTB16, UXTAB16, and UXTB16 instructions are available only if both:
— the Extend_instrs attribute is 0b0010 or greater
— the SIMD_instrs attribute is 0b0011 or greater.
Except_AR_instrs, bits [11:8]
Indicates the supported A and R profile exception-handling instructions. Permitted values are:
0b0000 None supported.
0b0001 Adds support for the SRS and RFE instructions, and the A and R profile forms of the CPS instruction.
Except_instrs, bits [7:4]
Indicates the supported exception-handling instructions in the ARM instruction set. Permitted values are:
0b0000 Not supported. This indicates that the User bank and Exception return forms of the LDM and STM instructions are not supported.
0b0001 Adds support for the LDM (exception return), LDM (user registers) and STM (user registers) instruction versions.
Endian_instrs, bits [3:0]
Indicates the supported Endian instructions. Permitted values are:
0b0000 None supported.
0b0001 Adds support for the SETEND instruction, and the E bit in the PSRs.
c0, Instruction Set Attribute Register 2 (ID_ISAR2)
The format of the ID_ISAR2 is:
Bits [31:28] Reversal_instrs
Bits [27:24] PSR_AR_instrs
Bits [23:20] MultU_instrs
Bits [19:16] MultS_instrs
Bits [15:12] Mult_instrs
Bits [11:8] MultiAccessInt_instrs
Bits [7:4] MemHint_instrs
Bits [3:0] LoadStore_instrs
Reversal_instrs, bits [31:28]
Indicates the supported Reversal instructions.
Permitted values are:
0b0000 None supported.
0b0001 Adds support for the REV, REV16, and REVSH instructions.
0b0010 As for 0b0001, and adds support for the RBIT instruction.
PSR_AR_instrs, bits [27:24]
Indicates the supported A and R profile instructions to manipulate the PSR. Permitted values are:
0b0000 None supported.
0b0001 Adds support for the MRS and MSR instructions, and the exception return forms of data-processing instructions described in SUBS PC, LR and related instructions on page B6-25.
Note
The exception return forms of the data-processing instructions are:
• In the ARM instruction set, data-processing instructions with the PC as the destination and the S bit set. These instructions might be affected by the WithShifts_instrs attribute.
• In the Thumb instruction set, the SUBS PC,LR,#N instruction.
MultU_instrs, bits [23:20]
Indicates the supported advanced unsigned Multiply instructions. Permitted values are:
0b0000 None supported.
0b0001 Adds support for the UMULL and UMLAL instructions.
0b0010 As for 0b0001, and adds support for the UMAAL instruction.
MultS_instrs, bits [19:16]
Indicates the supported advanced signed Multiply instructions. Permitted values are:
0b0000 None supported.
0b0001 Adds support for the SMULL and SMLAL instructions.
0b0010 As for 0b0001, and adds support for the SMLABB, SMLABT, SMLALBB, SMLALBT, SMLALTB, SMLALTT, SMLATB, SMLATT, SMLAWB, SMLAWT, SMULBB, SMULBT, SMULTB, SMULTT, SMULWB, and SMULWT instructions. Also adds support for the Q bit in the PSRs.
0b0011 As for 0b0010, and adds support for the SMLAD, SMLADX, SMLALD, SMLALDX, SMLSD, SMLSDX, SMLSLD, SMLSLDX, SMMLA, SMMLAR, SMMLS, SMMLSR, SMMUL, SMMULR, SMUAD, SMUADX, SMUSD, and SMUSDX instructions.
Mult_instrs, bits [15:12]
Indicates the supported additional Multiply instructions. Permitted values are:
0b0000 No additional instructions supported.
This means only MUL is supported.
0b0001 Adds support for the MLA instruction.
0b0010 As for 0b0001, and adds support for the MLS instruction.
MultiAccessInt_instrs, bits [11:8]
Indicates the support for multi-access interruptible instructions. Permitted values are:
0b0000 None supported. This means the LDM and STM instructions are not interruptible.
0b0001 LDM and STM instructions are restartable.
0b0010 LDM and STM instructions are continuable.
MemHint_instrs, bits [7:4]
Indicates the supported Memory Hint instructions. Permitted values are:
0b0000 None supported.
0b0001 Adds support for the PLD instruction.
0b0010 Adds support for the PLD instruction.
In the MemHint_instrs field, entries of 0b0001 and 0b0010 have identical meanings.
0b0011 As for 0b0001 (or 0b0010), and adds support for the PLI instruction.
0b0100 As for 0b0011, and adds support for the PLDW instruction.
LoadStore_instrs, bits [3:0]
Indicates the supported additional load/store instructions. Permitted values are:
0b0000 None supported.
0b0001 Adds support for the LDRD and STRD instructions.
c0, Instruction Set Attribute Register 3 (ID_ISAR3)
The format of the ID_ISAR3 is:
Bits [31:28] ThumbEE_extn_instrs
Bits [27:24] TrueNOP_instrs
Bits [23:20] ThumbCopy_instrs
Bits [19:16] TabBranch_instrs
Bits [15:12] SynchPrim_instrs
Bits [11:8] SVC_instrs
Bits [7:4] SIMD_instrs
Bits [3:0] Saturate_instrs
ThumbEE_extn_instrs, bits [31:28]
Indicates the supported Thumb Execution Environment (ThumbEE) extension instructions. Permitted values are:
0b0000 None supported.
0b0001 Adds support for the ENTERX and LEAVEX instructions, and modifies the load behavior to include null checking.
Note
This field can only have a value other than 0b0000 when the PFR0 register State3 field has a value of 0b0001, see c0, Processor Feature Register 0 (ID_PFR0) on page B5-4.
TrueNOP_instrs, bits [27:24]
Indicates the support for True NOP instructions.
Permitted values are:
0b0000 None supported. This means there are no NOP instructions that do not have any register dependencies.
0b0001 Adds true NOP instructions in both the Thumb and ARM instruction sets. Also permits additional NOP-compatible hints.
ThumbCopy_instrs, bits [23:20]
Indicates the supported Thumb non flag-setting MOV instructions. Permitted values are:
0b0000 Not supported. This means that in the Thumb instruction set, encoding T1 of the MOV (register) instruction does not support a copy from a low register to a low register.
0b0001 Adds support for Thumb instruction set encoding T1 of the MOV (register) instruction, copying from a low register to a low register.
TabBranch_instrs, bits [19:16]
Indicates the supported Table Branch instructions in the Thumb instruction set. Permitted values are:
0b0000 None supported.
0b0001 Adds support for the TBB and TBH instructions.
SynchPrim_instrs, bits [15:12]
This field is used with the SynchPrim_instrs_frac field of ID_ISAR4 to indicate the supported Synchronization Primitive instructions. Table B5-2 shows the permitted values of these fields:
Table B5-2 Synchronization Primitives support
SynchPrim_instrs  SynchPrim_instrs_frac  Supported Synchronization Primitives
0000              0000                   None supported.
0001              0000                   Adds support for the LDREX and STREX instructions.
0001              0011                   As for [0001,0000], and adds support for the CLREX, LDREXB, LDREXH, STREXB, and STREXH instructions.
0010              0000                   As for [0001,0011], and adds support for the LDREXD and STREXD instructions.
All combinations of SynchPrim_instrs and SynchPrim_instrs_frac not shown in Table B5-2 are reserved.
SVC_instrs, bits [11:8]
Indicates the supported SVC instructions. Permitted values are:
0b0000 Not supported.
0b0001 Adds support for the SVC instruction.
Note
The SVC instruction was called the SWI instruction in previous versions of the ARM architecture.
SIMD_instrs, bits [7:4]
Indicates the supported SIMD instructions. Permitted values are:
0b0000 None supported.
0b0001 Adds support for the SSAT and USAT instructions, and for the Q bit in the PSRs.
0b0011 As for 0b0001, and adds support for the PKHBT, PKHTB, QADD16, QADD8, QASX, QSUB16, QSUB8, QSAX, SADD16, SADD8, SASX, SEL, SHADD16, SHADD8, SHASX, SHSUB16, SHSUB8, SHSAX, SSAT16, SSUB16, SSUB8, SSAX, SXTAB16, SXTB16, UADD16, UADD8, UASX, UHADD16, UHADD8, UHASX, UHSUB16, UHSUB8, UHSAX, UQADD16, UQADD8, UQASX, UQSUB16, UQSUB8, UQSAX, USAD8, USADA8, USAT16, USUB16, USUB8, USAX, UXTAB16, and UXTB16 instructions. Also adds support for the GE[3:0] bits in the PSRs.
Note
• in the SIMD_instrs field, the value of 0b0010 is reserved
• the SXTAB16, SXTB16, UXTAB16, and UXTB16 instructions are available only if both:
— the Extend_instrs attribute is 0b0010 or greater
— the SIMD_instrs attribute is 0b0011 or greater.
Saturate_instrs, bits [3:0]
Indicates the supported Saturate instructions. Permitted values are:
0b0000 None supported. This means no non-Advanced SIMD saturate instructions are supported.
0b0001 Adds support for the QADD, QDADD, QDSUB, and QSUB instructions, and for the Q bit in the PSRs.
c0, Instruction Set Attribute Register 4 (ID_ISAR4)
The format of the ID_ISAR4 is:
Bits [31:28] SWP_frac
Bits [27:24] PSR_M_instrs
Bits [23:20] SynchPrim_instrs_frac
Bits [19:16] Barrier_instrs
Bits [15:12] SMC_instrs
Bits [11:8] Writeback_instrs
Bits [7:4] WithShifts_instrs
Bits [3:0] Unpriv_instrs
SWP_frac, bits [31:28]
Indicates support for the memory system locking the bus for SWP or SWPB instructions. Permitted values are:
0b0000 SWP or SWPB not supported.
0b0001 SWP or SWPB supported but only in a uniprocessor context.
SWP and SWPB do not guarantee whether memory accesses from other masters can come between the load memory access and the store memory access of the SWP or SWPB.
This field is valid only if the Swap_instrs field in ID_ISAR0 is zero.

PSR_M_instrs, bits [27:24]
Indicates the supported M profile instructions to modify the PSRs. Permitted values are:
0b0000  None supported.
0b0001  Adds support for the M profile forms of the CPS, MRS, and MSR instructions.

SynchPrim_instrs_frac, bits [23:20]
This field is used with the SynchPrim_instrs field of ID_ISAR3 to indicate the supported Synchronization Primitive instructions. Table B5-2 on page B5-30 shows the permitted values of these fields. All combinations of SynchPrim_instrs and SynchPrim_instrs_frac not shown in Table B5-2 on page B5-30 are reserved.

Barrier_instrs, bits [19:16]
Indicates the supported Barrier instructions in the ARM and Thumb instruction sets. Permitted values are:
0b0000  None supported. Barrier operations are provided only as CP15 operations.
0b0001  Adds support for the DMB, DSB, and ISB barrier instructions.

SMC_instrs, bits [15:12]
Indicates the supported SMC instructions. Permitted values are:
0b0000  Not supported.
0b0001  Adds support for the SMC instruction.
Note
The SMC instruction was called the SMI instruction in previous versions of the ARM architecture.

Writeback_instrs, bits [11:8]
Indicates the support for Writeback addressing modes. Permitted values are:
0b0000  Basic support. Only the LDM, STM, PUSH, POP, SRS, and RFE instructions support writeback addressing modes. These instructions support all of their writeback addressing modes.
0b0001  Adds support for all of the writeback addressing modes defined in ARMv7.
WithShifts_instrs, bits [7:4]
Indicates the support for instructions with shifts. Permitted values are:
0b0000  Nonzero shifts supported only in MOV and shift instructions.
0b0001  Adds support for shifts of loads and stores over the range LSL 0-3.
0b0011  As for 0b0001, and adds support for other constant shift options, both on load/store and other instructions.
0b0100  As for 0b0011, and adds support for register-controlled shift options.
Note
• In this field, the value of 0b0010 is reserved.
• Additions to the basic support indicated by the 0b0000 field value apply only when the encoding supports them. In particular, in the Thumb instruction set there is no difference between the 0b0011 and 0b0100 levels of support.
• MOV instructions with shift options are treated as ASR, LSL, LSR, ROR, or RRX instructions, as described in Data-processing instructions on page B5-20.

Unpriv_instrs, bits [3:0]
Indicates the supported Unprivileged instructions. Permitted values are:
0b0000  None supported. No T variant instructions are implemented.
0b0001  Adds support for the LDRBT, LDRT, STRBT, and STRT instructions.
0b0010  As for 0b0001, and adds support for the LDRHT, LDRSBT, LDRSHT, and STRHT instructions.

c0, Instruction Set Attribute Register 5 (ID_ISAR5)
The format of the ID_ISAR5 is:
Bits [31:0]  Reserved, RAZ.

Accessing the Instruction Set Attribute registers
To access the Instruction Set Attribute Registers you read the CP15 registers with <opc1> set to 0, <CRn> set to c0, <CRm> set to c2, and <opc2> set to:
• 0 for the ID_ISAR0
• 1 for the ID_ISAR1
• 2 for the ID_ISAR2
• 3 for the ID_ISAR3
• 4 for the ID_ISAR4
• 5 for the ID_ISAR5.
For example:
MRC p15, 0, <Rt>, c0, c2, 3    ; Read Instruction Set Attribute Register 3
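As an illustration only, the following C sketch shows how software might combine SynchPrim_instrs (ID_ISAR3 bits [15:12]) and SynchPrim_instrs_frac (ID_ISAR4 bits [23:20]) to classify support according to Table B5-2. It assumes the raw register values have already been read in a privileged mode, for example with the MRC instruction shown above. The synch_level type and the id_field and decode_synch_prims functions are hypothetical names, not part of the architecture.

```c
#include <assert.h>
#include <stdint.h>

/* Support levels from Table B5-2. The enumerator names are illustrative. */
typedef enum {
    SYNCH_NONE,        /* [0000,0000] */
    SYNCH_LDREX_STREX, /* [0001,0000]: LDREX, STREX */
    SYNCH_BYTE_HALF,   /* [0001,0011]: adds CLREX, LDREXB/H, STREXB/H */
    SYNCH_DOUBLEWORD,  /* [0010,0000]: adds LDREXD, STREXD */
    SYNCH_RESERVED     /* any combination not listed in Table B5-2 */
} synch_level;

/* Return the 4-bit ID field whose least significant bit is at position lsb. */
unsigned id_field(uint32_t reg, unsigned lsb)
{
    assert(lsb <= 28u);
    return (unsigned)((reg >> lsb) & 0xFu);
}

/* isar3 and isar4 are the raw ID_ISAR3 and ID_ISAR4 values. */
synch_level decode_synch_prims(uint32_t isar3, uint32_t isar4)
{
    unsigned instrs = id_field(isar3, 12); /* SynchPrim_instrs, ID_ISAR3[15:12] */
    unsigned frac   = id_field(isar4, 20); /* SynchPrim_instrs_frac, ID_ISAR4[23:20] */

    if (instrs == 0x0u && frac == 0x0u) return SYNCH_NONE;
    if (instrs == 0x1u && frac == 0x0u) return SYNCH_LDREX_STREX;
    if (instrs == 0x1u && frac == 0x3u) return SYNCH_BYTE_HALF;
    if (instrs == 0x2u && frac == 0x0u) return SYNCH_DOUBLEWORD;
    return SYNCH_RESERVED;
}
```

Treating unlisted combinations as a separate SYNCH_RESERVED result, rather than rounding down to a lower level, keeps the sketch faithful to the rule that such combinations are reserved.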
B5.3 Advanced SIMD and VFP feature identification registers
When an implementation includes one or both of the optional Advanced SIMD and VFP extensions, the feature identification registers for the extensions are implemented in a common register block. The extensions reside in the coprocessor space for coprocessors CP10 and CP11, and the registers are accessed using the VMRS and VMSR instructions. For more information, see Register map of the Advanced SIMD and VFP extension system registers on page B1-66.
Table B5-3 lists the feature identification registers for the Advanced SIMD and VFP extensions. These are described in the remainder of this section. When the Security Extensions are implemented, these registers are Common registers.

Table B5-3 Advanced SIMD and VFP feature identification registers
System register  Name   Description
0b0000           FPSID  See Floating-point System ID Register (FPSID)
0b0110           MVFR1  See Media and VFP Feature Register 1 (MVFR1) on page B5-38
0b0111           MVFR0  See Media and VFP Feature Register 0 (MVFR0) on page B5-36

B5.3.1 Floating-point System ID Register (FPSID)
In ARMv7, the FPSID Register provides top-level information about the floating-point implementation.
Note
• In an ARMv7 implementation that includes one or both of the Advanced SIMD and VFP extensions, the Media and VFP Feature registers provide details of the implemented VFP architecture.
• The FPSID can be implemented in a system that provides only software emulation of the ARM floating-point instructions.
The ARMv7 format of the FPSID is:
Bits [31:24]  Implementer
Bit [23]      SW
Bits [22:16]  Subarchitecture
Bits [15:8]   Part number
Bits [7:4]    Variant
Bits [3:0]    Revision

Implementer, bits [31:24]
Implementer codes are the same as those used for the Main ID Register, see:
• c0, Main ID Register (MIDR) on page B3-81, for a VMSA implementation
• c0, Main ID Register (MIDR) on page B4-32, for a PMSA implementation.
For an implementation by ARM this field is 0x41, the ASCII code for A.
SW, bit [23]
Software flag. This bit is used to indicate that a system provides only software emulation of the VFP floating-point instructions:
0  The system includes hardware support for VFP floating-point operations.
1  The system provides only software emulation of the VFP floating-point instructions.

Subarchitecture, bits [22:16]
Subarchitecture version number. For an implementation by ARM, permitted values are:
0b0000000  VFPv1 architecture with an IMPLEMENTATION DEFINED subarchitecture. Not permitted in an ARMv7 implementation.
0b0000001  VFPv2 architecture with Common VFP subarchitecture v1. Not permitted in an ARMv7 implementation.
0b0000010  VFP architecture v3 or later with Common VFP subarchitecture v2. The VFP architecture version is indicated by the MVFR0 and MVFR1 registers.
0b0000011  VFP architecture v3 or later with Null subarchitecture. The entire floating-point implementation is in hardware, and no software support code is required. The VFP architecture version is indicated by the MVFR0 and MVFR1 registers. This value can be used only by an implementation that does not support the trap enable bits in the FPSCR, see Floating-point Status and Control Register (FPSCR) on page A2-28.
0b0000100  VFP architecture v3 or later with Common VFP subarchitecture v3. The VFP architecture version is indicated by the MVFR0 and MVFR1 registers.
For a subarchitecture designed by ARM the most significant bit of this field, register bit [22], is 0. Values with a most significant bit of 0 that are not listed here are reserved.
When the subarchitecture designer is not ARM, the most significant bit of this field, register bit [22], must be 1. Each implementer must maintain its own list of subarchitectures it has designed, starting at subarchitecture version number 0x40.
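Because every FPSID field sits at a fixed bit position, software can unpack the register with shifts and masks. The following C sketch is illustrative only: it assumes a raw FPSID value already obtained with VMRS, and the fpsid_fields type and the decode_fpsid and arm_subarch_listed_for_v7 functions are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical container for the ARMv7 FPSID fields. */
typedef struct {
    unsigned implementer;    /* bits [31:24]; 0x41 ('A') for ARM */
    bool     sw_only;        /* bit [23]; true = software emulation only */
    unsigned subarch;        /* bits [22:16] */
    bool     subarch_by_arm; /* bit [22] == 0: subarchitecture designed by ARM */
    unsigned part;           /* bits [15:8] */
    unsigned variant;        /* bits [7:4] */
    unsigned revision;       /* bits [3:0] */
} fpsid_fields;

fpsid_fields decode_fpsid(uint32_t fpsid)
{
    fpsid_fields f;
    f.implementer    = (unsigned)((fpsid >> 24) & 0xFFu);
    f.sw_only        = ((fpsid >> 23) & 0x1u) != 0;
    f.subarch        = (unsigned)((fpsid >> 16) & 0x7Fu);
    f.subarch_by_arm = (f.subarch & 0x40u) == 0;  /* MSB of the 7-bit field */
    f.part           = (unsigned)((fpsid >> 8) & 0xFFu);
    f.variant        = (unsigned)((fpsid >> 4) & 0xFu);
    f.revision       = (unsigned)(fpsid & 0xFu);
    return f;
}

/* True for the ARM-designed Subarchitecture values listed as usable in an
   ARMv7 implementation (0b0000010, 0b0000011, 0b0000100). Values with
   bit 6 set belong to other designers and are not checked here. */
bool arm_subarch_listed_for_v7(unsigned subarch)
{
    assert(subarch <= 0x7Fu);  /* the field is 7 bits wide */
    return subarch == 0x02u || subarch == 0x03u || subarch == 0x04u;
}
```

The Part number, Variant, and Revision values are IMPLEMENTATION DEFINED, so the sketch only extracts them and does not interpret them.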
Part number, bits [15:8]
An IMPLEMENTATION DEFINED part number for the floating-point implementation, assigned by the implementer.

Variant, bits [7:4]
An IMPLEMENTATION DEFINED variant number. Typically, this field is used to distinguish between different production variants of a single product.

Revision, bits [3:0]
An IMPLEMENTATION DEFINED revision number for the floating-point implementation.

B5.3.2 Media and VFP Feature registers
The Media and VFP Feature registers describe the features provided by the Advanced SIMD and VFP extensions, when an implementation includes either or both of these extensions. For details of the implementation options for these extensions see Advanced SIMD and VFP extensions on page A2-20.
In VFPv2, it is IMPLEMENTATION DEFINED whether the Media and VFP Feature registers are implemented.
Note
Often, the complete implementation of a VFP architecture uses support code to provide some VFP functionality. In such an implementation, only the support code can provide full details of the supported features. In this case the Media and VFP Feature registers are not used directly.
The Media and VFP Feature registers are described in:
• Media and VFP Feature Register 0 (MVFR0)
• Media and VFP Feature Register 1 (MVFR1) on page B5-38.

Media and VFP Feature Register 0 (MVFR0)
The format of the MVFR0 register is:
Bits [31:28]  VFP rounding modes
Bits [27:24]  Short vectors
Bits [23:20]  Square root
Bits [19:16]  Divide
Bits [15:12]  VFP exception trapping
Bits [11:8]   Double-precision
Bits [7:4]    Single-precision
Bits [3:0]    A_SIMD registers

VFP rounding modes, bits [31:28]
Indicates the rounding modes supported by the VFP floating-point hardware. Permitted values are:
0b0000  Only Round to Nearest mode supported, except that Round towards Zero mode is supported for VCVT instructions that always use that rounding mode regardless of the FPSCR setting.
0b0001  All rounding modes supported.
Short vectors, bits [27:24]
Indicates the hardware support for VFP short vectors. Permitted values are:
0b0000  Not supported.
0b0001  Short vector operation supported.

Square root, bits [23:20]
Indicates the hardware support for VFP square root operations. Permitted values are:
0b0000  Not supported in hardware.
0b0001  Supported.
Note
• the FSQRTS instruction also requires the single-precision VFP attribute, bits [7:4]
• the FSQRTD instruction also requires the double-precision VFP attribute, bits [11:8].

Divide, bits [19:16]
Indicates the hardware support for VFP divide operations. Permitted values are:
0b0000  Not supported in hardware.
0b0001  Supported.
Note
• the FDIVS instruction also requires the single-precision VFP attribute, bits [7:4]
• the FDIVD instruction also requires the double-precision VFP attribute, bits [11:8].

VFP exception trapping, bits [15:12]
Indicates whether the VFP hardware implementation supports exception trapping. Permitted values are:
0b0000  Not supported. This is the value for VFPv3.
0b0001  Supported by the hardware. This is the value for VFPv3U, and for VFPv2. When exception trapping is supported, support code is needed to handle the trapped exceptions.
Note
This value does not indicate that trapped exception handling is available. Because trapped exception handling requires support code, only the support code can provide this information.

Double-precision, bits [11:8]
Indicates the hardware support for VFP double-precision operations. Permitted values are:
0b0000  Not supported in hardware.
0b0001  Supported, VFPv2.
0b0010  Supported, VFPv3. VFPv3 adds an instruction to load a double-precision floating-point constant, and conversions between double-precision and fixed-point values.
A value of 0b0001 or 0b0010 indicates support for all VFP double-precision instructions in the supported version of VFP, except that, in addition to this field being nonzero:
• FSQRTD is only available if the Square root field is 0b0001
• FDIVD is only available if the Divide field is 0b0001
• conversion between double-precision and single-precision is only available if the single-precision field is nonzero.

Single-precision, bits [7:4]
Indicates the hardware support for VFP single-precision operations. Permitted values are:
0b0000  Not supported in hardware.
0b0001  Supported, VFPv2.
0b0010  Supported, VFPv3. VFPv3 adds an instruction to load a single-precision floating-point constant, and conversions between single-precision and fixed-point values.
A value of 0b0001 or 0b0010 indicates support for all VFP single-precision instructions in the supported version of VFP, except that, in addition to this field being nonzero:
• FSQRTS is only available if the Square root field is 0b0001
• FDIVS is only available if the Divide field is 0b0001
• conversion between double-precision and single-precision is only available if the double-precision field is nonzero.

A_SIMD registers, bits [3:0]
Indicates support for the Advanced SIMD register bank. Permitted values are:
0b0000  Not supported.
0b0001  Supported, 16 x 64-bit registers.
0b0010  Supported, 32 x 64-bit registers.
If this field is nonzero:
• all VFP LDC, STC, MCR, and MRC instructions are supported
• if the CPUID register shows that the MCRR and MRRC instructions are supported then the corresponding VFP instructions are supported.

Media and VFP Feature Register 1 (MVFR1)
The format of the MVFR1 register is:
Bits [31:28]  Reserved, RAZ
Bits [27:24]  VFP HPFP
Bits [23:20]  A_SIMD HPFP
Bits [19:16]  A_SIMD SPFP
Bits [15:12]  A_SIMD integer
Bits [11:8]   A_SIMD load/store
Bits [7:4]    D_NaN mode
Bits [3:0]    FtZ mode

Bits [31:28]
Reserved, RAZ.
VFP HPFP, bits [27:24]
Indicates whether the VFP supports half-precision floating-point conversion operations. Permitted values are:
0b0000  Not supported.
0b0001  Supported.

A_SIMD HPFP, bits [23:20]
Indicates whether Advanced SIMD supports half-precision floating-point conversion operations. Permitted values are:
0b0000  Not supported.
0b0001  Supported. This value is only permitted if the A_SIMD SPFP field is 0b0001.

A_SIMD SPFP, bits [19:16]
Indicates whether the Advanced SIMD extension supports single-precision floating-point operations. Permitted values are:
0b0000  Not supported.
0b0001  Supported. This value is only permitted if the A_SIMD integer field is 0b0001.

A_SIMD integer, bits [15:12]
Indicates whether the Advanced SIMD extension supports integer operations. Permitted values are:
0b0000  Not supported.
0b0001  Supported.

A_SIMD load/store, bits [11:8]
Indicates whether the Advanced SIMD extension supports load/store instructions. Permitted values are:
0b0000  Not supported.
0b0001  Supported.

D_NaN mode, bits [7:4]
Indicates whether the VFP hardware implementation supports only the Default NaN mode. Permitted values are:
0b0000  Hardware supports only the Default NaN mode. If a VFP subarchitecture is implemented its support code might include support for propagation of NaN values.
0b0001  Hardware supports propagation of NaN values.

FtZ mode, bits [3:0]
Indicates whether the VFP hardware implementation supports only the Flush-to-Zero mode of operation. Permitted values are:
0b0000  Hardware supports only the Flush-to-Zero mode of operation. If a VFP subarchitecture is implemented its support code might include support for full denormalized number arithmetic.
0b0001  Hardware supports full denormalized number arithmetic.
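Several MVFR0 and MVFR1 fields only have meaning in combination. For example, hardware FSQRTD needs both a nonzero Double-precision field and a Square root field of 0b0001. The following C sketch of such checks is illustrative only. It assumes raw register values obtained with VMRS. All function names are hypothetical. The checks report hardware support only, not functionality that support code might add.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Return the 4-bit field whose least significant bit is at position lsb. */
unsigned mvfr_field(uint32_t reg, unsigned lsb)
{
    assert(lsb <= 28u);
    return (unsigned)((reg >> lsb) & 0xFu);
}

/* MVFR0: FSQRTD needs Double-precision [11:8] nonzero and Square root [23:20] == 1. */
bool mvfr0_has_sqrt_f64(uint32_t mvfr0)
{
    return mvfr_field(mvfr0, 8) != 0 && mvfr_field(mvfr0, 20) == 0x1u;
}

/* MVFR0: FDIVS needs Single-precision [7:4] nonzero and Divide [19:16] == 1. */
bool mvfr0_has_div_f32(uint32_t mvfr0)
{
    return mvfr_field(mvfr0, 4) != 0 && mvfr_field(mvfr0, 16) == 0x1u;
}

/* MVFR0 A_SIMD registers [3:0]: number of 64-bit registers (0 if none or reserved). */
unsigned mvfr0_dreg_count(uint32_t mvfr0)
{
    switch (mvfr_field(mvfr0, 0)) {
    case 0x1u: return 16;
    case 0x2u: return 32;
    default:   return 0;
    }
}

/* MVFR1 FtZ mode [3:0] == 1: full denormalized arithmetic in hardware. */
bool mvfr1_has_denormals(uint32_t mvfr1)
{
    return mvfr_field(mvfr1, 0) == 0x1u;
}

/* MVFR1 D_NaN mode [7:4] == 1: NaN propagation in hardware. */
bool mvfr1_has_nan_propagation(uint32_t mvfr1)
{
    return mvfr_field(mvfr1, 4) == 0x1u;
}

/* MVFR1 dependencies: A_SIMD SPFP == 1 requires A_SIMD integer == 1,
   and A_SIMD HPFP == 1 requires A_SIMD SPFP == 1. */
bool mvfr1_simd_fields_consistent(uint32_t mvfr1)
{
    unsigned hpfp    = mvfr_field(mvfr1, 20);
    unsigned spfp    = mvfr_field(mvfr1, 16);
    unsigned integer = mvfr_field(mvfr1, 12);
    if (spfp == 0x1u && integer != 0x1u) return false;
    if (hpfp == 0x1u && spfp != 0x1u) return false;
    return true;
}
```

A zero in FtZ mode or D_NaN mode does not rule out the feature entirely. As the field descriptions note, support code might still provide it.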
B5.3.3 Accessing the Advanced SIMD and VFP feature identification registers
You access the Advanced SIMD and VFP feature identification registers using the VMRS instruction, see VMRS on page A8-658. For example:
VMRS <Rt>, FPSID    ; Read Floating-Point System ID Register
VMRS <Rt>, MVFR1    ; Read Media and VFP Feature Register 1

Chapter B6
System Instructions
This chapter describes the instructions that are only available, or that behave differently, in privileged modes. It contains the following section:
• Alphabetical list of instructions on page B6-2.

B6.1 Alphabetical list of instructions
This section lists every instruction that behaves differently in privileged modes, or that is only available in privileged modes.
For information about privileged modes see ARM processor modes and core registers on page B1-6.

B6.1.1 CPS
Change Processor State is available only in privileged modes. It changes one or more of the A, I, and F interrupt disable bits and the mode bits of the CPSR, without changing the other CPSR bits.

Encoding T1  ARMv6*, ARMv7
CPS<effect> <iflags>    Not permitted in IT block.
Bits [15:0]: 1 0 1 1 0 1 1 0 0 1 1 | im | (0) | A | I | F
enable = (im == ‘0’);  disable = (im == ‘1’);  changemode = FALSE;
affectA = (A == ‘1’);  affectI = (I == ‘1’);  affectF = (F == ‘1’);
if InITBlock() then UNPREDICTABLE;

Encoding T2  ARMv6T2, ARMv7
CPS<effect>.W <iflags>{,#<mode>}    Not permitted in IT block.
CPS #<mode>    Not permitted in IT block.
First halfword, bits [15:0]: 1 1 1 1 0 0 1 1 1 0 1 0 | (1) (1) (1) (1)
Second halfword, bits [15:0]: 1 0 | (0) | 0 | (0) | imod | M | A | I | F | mode
if imod == ‘00’ && M == ‘0’ then SEE “Hint instructions”;
enable = (imod == ‘10’);  disable = (imod == ‘11’);  changemode = (M == ‘1’);
affectA = (A == ‘1’);  affectI = (I == ‘1’);  affectF = (F == ‘1’);
if imod == ‘01’ || InITBlock() then UNPREDICTABLE;

Encoding A1  ARMv6*, ARMv7
CPS<effect> <iflags>{,#<mode>}
CPS #<mode>
Bits [31:0]: 1 1 1 1 0 0 0 1 0 0 0 0 | imod | M | 0 | (0) (0) (0) (0) (0) (0) (0) | A | I | F | 0 | mode
enable = (imod == ‘10’);  disable = (imod == ‘11’);  changemode = (M == ‘1’);
affectA = (A == ‘1’);  affectI = (I == ‘1’);  affectF = (F == ‘1’);
if (imod == ‘00’ && M == ‘0’) || imod == ‘01’ then UNPREDICTABLE;

Assembler syntax
CPS<effect><q> <iflags>{, #<mode>}
CPS<q> #<mode>
where:
<effect>  The effect required on the A, I, and F bits in the CPSR. This is one of:
  IE  Interrupt Enable. This sets the specified bits to 0.
  ID  Interrupt Disable. This sets the specified bits to 1.
  If <effect> is specified, the bits to be affected are specified by <iflags>. The mode can optionally be changed by specifying a mode number as <mode>.
  If <effect> is not specified, then:
  • <iflags> is not specified and interrupt settings are not changed
  • <mode> specifies the new mode number.
<q>  See Standard assembler syntax fields on page A8-7. A CPS instruction must be unconditional.
<iflags>  Is a sequence of one or more of the following, specifying which interrupt disable flags are affected:
  a  Sets the A bit in the instruction, causing the specified effect on the CPSR.A (asynchronous abort) bit.
  i  Sets the I bit in the instruction, causing the specified effect on the CPSR.I (IRQ interrupt) bit.
  f  Sets the F bit in the instruction, causing the specified effect on the CPSR.F (FIQ interrupt) bit.
<mode>  The number of the mode to change to. If this option is omitted, no mode change occurs.
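The effect of the imod, M, A, I, F, and mode fields on the CPSR can be modelled in C. The sketch below is a simplification under stated assumptions. It assumes the instruction executes in a privileged mode. It returns only the new CPSR value and ignores the other architectural side effects of CPSRWriteByInstr(). The cps_apply function and the CPSR_* macro names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define CPSR_A         (1u << 8)  /* asynchronous abort disable */
#define CPSR_I         (1u << 7)  /* IRQ disable */
#define CPSR_F         (1u << 6)  /* FIQ disable */
#define CPSR_MODE_MASK 0x1Fu      /* mode field, bits [4:0] */

/* imod: 0b10 = IE (clear the selected bits), 0b11 = ID (set them),
   0b00 = no interrupt change. 0b01 is UNPREDICTABLE and rejected. */
uint32_t cps_apply(uint32_t cpsr, unsigned imod, bool m,
                   bool a, bool i, bool f, unsigned mode)
{
    assert(imod != 0x1u);

    uint32_t sel = (a ? CPSR_A : 0u) | (i ? CPSR_I : 0u) | (f ? CPSR_F : 0u);
    if (imod == 0x2u)        /* enable: interrupt disable bits go to 0 */
        cpsr &= ~sel;
    else if (imod == 0x3u)   /* disable: interrupt disable bits go to 1 */
        cpsr |= sel;
    if (m)                   /* changemode: replace bits [4:0] */
        cpsr = (cpsr & ~CPSR_MODE_MASK) | (mode & CPSR_MODE_MASK);
    return cpsr;
}
```

For example, starting in Supervisor mode with A, I, and F set (CPSR 0x1D3), CPSIE i corresponds to cps_apply(0x1D3, 2, false, false, true, false, 0) and gives 0x153.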
Operation
EncodingSpecificOperations();
if CurrentModeIsPrivileged() then
    cpsr_val = CPSR;
    if enable then
        if affectA then cpsr_val<8> = ‘0’;
        if affectI then cpsr_val<7> = ‘0’;
        if affectF then cpsr_val<6> = ‘0’;
    if disable then
        if affectA then cpsr_val<8> = ‘1’;
        if affectI then cpsr_val<7> = ‘1’;
        if affectF then cpsr_val<6> = ‘1’;
    if changemode then cpsr_val<4:0> = mode;
    CPSRWriteByInstr(cpsr_val, ‘1111’, TRUE);

Exceptions
None.

Hint instructions
If the imod field and the M bit in encoding T2 are ‘00’ and ‘0’ respectively, a hint instruction is encoded. To determine which hint instruction, see Change Processor State, and hints on page A6-21.

B6.1.2 LDM (exception return)
Load Multiple (exception return) loads multiple registers from consecutive memory locations using an address from a base register. The SPSR of the current mode is copied to the CPSR. An address adjusted by the size of the data loaded can optionally be written back to the base register.
The registers loaded include the PC. The word loaded for the PC is treated as an address and a branch occurs to that address.

Encoding A1  ARMv4*, ARMv5T*, ARMv6*, ARMv7
LDM{<amode>}<c> <Rn>{!},<registers_with_pc>^
Bits [31:0]: cond | 1 0 0 | P | U | 1 | W | 1 | Rn | 1 | register_list
n = UInt(Rn);  registers = register_list;  wback = (W == ‘1’);
increment = (U == ‘1’);  wordhigher = (P == U);
if n == 15 then UNPREDICTABLE;
if wback && registers<n> == ‘1’ && ArchVersion() >= 7 then UNPREDICTABLE;

Assembler syntax
LDM{<amode>}<c><q> <Rn>{!}, <registers_with_pc>^
where:
<c><q>  See Standard assembler syntax fields on page A8-7.
<amode>  is one of:
  DA  Decrement After. The consecutive memory addresses end at the address in the base register. For this instruction, FA, meaning Full Ascending, is equivalent to DA. Encoded as P = 0, U = 0.
  DB  Decrement Before. The consecutive memory addresses end one word below the address in the base register.
      For this instruction, EA, meaning Empty Ascending, is equivalent to DB. Encoded as P = 1, U = 0.
  IA  Increment After. The consecutive memory addresses start at the address in the base register. This is the default, and is normally omitted. For this instruction, FD, meaning Full Descending, is equivalent to IA. Encoded as P = 0, U = 1.
  IB  Increment Before. The consecutive memory addresses start one word above the address in the base register. For this instruction, ED, meaning Empty Descending, is equivalent to IB. Encoded as P = 1, U = 1.
<Rn>  The base register. This register can be the SP.
!  Causes the instruction to write a modified value back to <Rn>. Encoded as W = 1. If ! is omitted, the instruction does not change <Rn> in this way. Encoded as W = 0.
<registers_with_pc>  Is a list of one or more registers, separated by commas and surrounded by { and }. It specifies the set of registers to be loaded. The registers are loaded with the lowest-numbered register from the lowest memory address, through to the highest-numbered register from the highest memory address. The PC must be specified in the register list, and the instruction causes a branch to the address (data) loaded into the PC.
The pre-UAL syntax LDM<c>{<amode>} is equivalent to LDM{<amode>}<c>.
Note
Instructions with similar syntax but without the PC included in the register list are described in LDM (user registers) on page B6-7.
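The DA, DB, IA, and IB addressing modes determine where the transfer starts. The C sketch below mirrors the address arithmetic of the Operation pseudocode for LDM (exception return). It is illustrative only and assumes register_list holds encoding bits [14:0], because bit 15 is fixed at 1 and the PC is always loaded last. The ldm_eret_result type and the ldm_eret_addrs function are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    uint32_t start;     /* address of the lowest-numbered register loaded */
    uint32_t pc_addr;   /* address of the word loaded into the PC */
    uint32_t writeback; /* new Rn when W == 1 and Rn is not in the list
                           (the result is UNKNOWN if Rn is in the list) */
} ldm_eret_result;

ldm_eret_result ldm_eret_addrs(uint32_t rn, uint32_t register_list, bool p, bool u)
{
    assert((register_list & ~0x7FFFu) == 0);  /* encoding bits [14:0] only */

    unsigned count = 0;                        /* BitCount(registers) */
    for (uint32_t bits = register_list; bits != 0; bits >>= 1)
        count += bits & 1u;

    uint32_t length  = 4u * count + 4u;        /* + 4 for the PC word */
    bool increment   = u;
    bool wordhigher  = (p == u);

    ldm_eret_result r;
    r.start = increment ? rn : rn - length;
    if (wordhigher)
        r.start += 4u;
    r.pc_addr   = r.start + 4u * count;        /* PC comes from the highest address */
    r.writeback = increment ? rn + length : rn - length;
    return r;
}
```

With Rn = 0x1000 and R0-R3 in the list, DA ends at 0x1000 (the PC word is at the base address), while DB ends one word lower, at 0xFFC. Both write back 0xFEC.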
Operation
if ConditionPassed() then
    EncodingSpecificOperations();
    if CurrentModeIsUserOrSystem() then UNPREDICTABLE;
    length = 4*BitCount(registers) + 4;
    address = if increment then R[n] else R[n]-length;
    if wordhigher then address = address+4;
    for i = 0 to 14
        if registers<i> == ‘1’ then
            R[i] = MemA[address,4];  address = address + 4;
    new_pc_value = MemA[address,4];
    if wback && registers<n> == ‘0’ then R[n] = if increment then R[n]+length else R[n]-length;
    if wback && registers<n> == ‘1’ then R[n] = bits(32) UNKNOWN;
    CPSRWriteByInstr(SPSR[], ‘1111’, TRUE);
    BranchWritePC(new_pc_value);

Exceptions
Data Abort.

B6.1.3 LDM (user registers)
Load Multiple (user registers) is UNPREDICTABLE in User or System modes. In exception modes, it loads multiple User mode registers from consecutive memory locations using an address from a banked base register. Writeback to the base register is not available with this instruction.
The registers loaded cannot include the PC.

Encoding A1  ARMv4*, ARMv5T*, ARMv6*, ARMv7
LDM{<amode>}<c> <Rn>,<registers_without_pc>^
Bits [31:0]: cond | 1 0 0 | P | U | 1 | (0) | 1 | Rn | 0 | register_list
n = UInt(Rn);  registers = register_list;  increment = (U == ‘1’);  wordhigher = (P == U);
if n == 15 || BitCount(registers) < 1 then UNPREDICTABLE;

Assembler syntax
LDM{<amode>}<c><q> <Rn>, <registers_without_pc>^
where:
<c><q>  See Standard assembler syntax fields on page A8-7.
<amode>  is one of:
  DA  Decrement After. The consecutive memory addresses end at the address in the base register. For this instruction, FA, meaning Full Ascending, is equivalent to DA. Encoded as P = 0, U = 0.
  DB  Decrement Before. The consecutive memory addresses end one word below the address in the base register. For this instruction, EA, meaning Empty Ascending, is equivalent to DB. Encoded as P = 1, U = 0.
  IA  Increment After. The consecutive memory addresses start at the address in the base register.
      This is the default, and is normally omitted. For this instruction, FD, meaning Full Descending, is equivalent to IA. Encoded as P = 0, U = 1.
  IB  Increment Before. The consecutive memory addresses start one word above the address in the base register. For this instruction, ED, meaning Empty Descending, is equivalent to IB. Encoded as P = 1, U = 1.
<Rn>  The base register. This register can be the SP.
<registers_without_pc>  Is a list of one or more registers, separated by commas and surrounded by { and }. It specifies the set of registers to be loaded by the LDM instruction. The registers are loaded with the lowest-numbered register from the lowest memory address, through to the highest-numbered register from the highest memory address. The PC must not be in the register list.
The pre-UAL syntax LDM<c>{<amode>} is equivalent to LDM{<amode>}<c>.
Note
Instructions with similar syntax but with the PC included in the register list are described in LDM (exception return) on page B6-5.

Operation
if ConditionPassed() then
    EncodingSpecificOperations();
    if CurrentModeIsUserOrSystem() then UNPREDICTABLE;
    length = 4*BitCount(registers);
    address = if increment then R[n] else R[n]-length;
    if wordhigher then address = address+4;
    for i = 0 to 14
        if registers<i> == ‘1’ then  // Load User mode (‘10000’) register
            Rmode[i, ‘10000’] = MemA[address,4];  address = address + 4;

Exceptions
Data Abort.

B6.1.4 LDRBT, LDRHT, LDRSBT, LDRSHT, and LDRT
Even in privileged modes, loads from memory by these instructions are restricted in the same way as loads from memory in User mode. This is encapsulated in the MemA_unpriv[] and MemU_unpriv[] pseudocode functions. For details see Aligned memory accesses on page B2-31 and Unaligned memory accesses on page B2-32.
For details of the instructions see:
• LDRBT on page A8-134
• LDRHT on page A8-158
• LDRSBT on page A8-166
• LDRSHT on page A8-174
• LDRT on page A8-176.

B6.1.5 MRS
Move to Register from Special Register moves the value from the CPSR or SPSR of the current mode into a general-purpose register.

Encoding T1  ARMv6T2, ARMv7
MRS<c> <Rd>,<spec_reg>
First halfword, bits [15:0]: 1 1 1 1 0 0 1 1 1 1 1 | R | (1) (1) (1) (1)
Second halfword, bits [15:0]: 1 0 | (0) | 0 | Rd | (0) (0) (0) (0) (0) (0) (0) (0)
d = UInt(Rd);  read_spsr = (R == ‘1’);
if BadReg(d) then UNPREDICTABLE;

Encoding A1  ARMv4*, ARMv5T*, ARMv6*, ARMv7
MRS<c> <Rd>,<spec_reg>
Bits [31:0]: cond | 0 0 0 1 0 | R | 0 0 | (1) (1) (1) (1) | Rd | (0) (0) (0) (0) | 0 0 0 0 | (0) (0) (0) (0)
d = UInt(Rd);  read_spsr = (R == ‘1’);
if d == 15 then UNPREDICTABLE;

Assembler syntax
MRS<c><q> <Rd>, <spec_reg>
where:
<c><q>  See Standard assembler syntax fields on page A8-7.
<Rd>  The destination register.
<spec_reg>  Is one of:
  • APSR
  • CPSR
  • SPSR.
  ARM recommends the APSR form when only the N, Z, C, V, Q, or GE[3:0] bits of the read value are going to be used (see The Application Program Status Register (APSR) on page A2-14).

Operation
if ConditionPassed() then
    EncodingSpecificOperations();
    if read_spsr then
        if CurrentModeIsUserOrSystem() then
            UNPREDICTABLE;
        else
            R[d] = SPSR[];
    else
        // CPSR is read with execution state bits other than E masked out.
        R[d] = CPSR AND ‘11111000 11111111 00000011 11011111’;

Exceptions
None.

B6.1.6 MSR (immediate)
Move immediate value to Special Register moves selected bits of an immediate value to the CPSR or the SPSR of the current mode.
Encoding A1  ARMv4*, ARMv5T*, ARMv6*, ARMv7
MSR<c> <spec_reg>,#<const>
Bits [31:0]: cond | 0 0 1 1 0 | R | 1 0 | mask | (1) (1) (1) (1) | imm12
if mask == ‘0000’ && R == ‘0’ then SEE “Related encodings”;
imm32 = ARMExpandImm(imm12);  write_spsr = (R == ‘1’);
if mask == ‘0000’ then UNPREDICTABLE;

Assembler syntax
MSR<c><q> <spec_reg>, #<const>
where:
<c><q>  See Standard assembler syntax fields on page A8-7.
<spec_reg>  Is one of:
  • APSR_<bits>
  • CPSR_<fields>
  • SPSR_<fields>.
  ARM recommends the APSR forms when only the N, Z, C, V, Q, and GE[3:0] bits are being written. For more information, see The Application Program Status Register (APSR) on page A2-14.
<const>  The immediate value to be transferred to <spec_reg>. See Modified immediate constants in ARM instructions on page A5-9 for the range of values.
<bits>  Is one of nzcvq, g, or nzcvqg. In the A and R profiles:
  • APSR_nzcvq is the same as CPSR_f (mask == ‘1000’)
  • APSR_g is the same as CPSR_s (mask == ‘0100’)
  • APSR_nzcvqg is the same as CPSR_fs (mask == ‘1100’).
<fields>  Is a sequence of one or more of the following:
  c  mask<0> = ‘1’ to enable writing of bits<7:0> of the destination PSR
  x  mask<1> = ‘1’ to enable writing of bits<15:8> of the destination PSR
  s  mask<2> = ‘1’ to enable writing of bits<23:16> of the destination PSR
  f  mask<3> = ‘1’ to enable writing of bits<31:24> of the destination PSR.

Operation
if ConditionPassed() then
    EncodingSpecificOperations();
    if write_spsr then
        SPSRWriteByInstr(imm32, mask);
    else
        CPSRWriteByInstr(imm32, mask, FALSE);  // Does not affect execution state bits other than E

Exceptions
None.

E bit
The CPSR.E bit is writable from any mode using an MSR instruction. Use of this to change its value is deprecated. Use the SETEND instruction instead.
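Each bit of the 4-bit mask field enables writing of one byte of the destination PSR. The following C sketch shows that mapping and a naive byte-wise merge. It is illustrative only. The msr_byte_mask and msr_merge names are hypothetical. msr_merge deliberately ignores the mode-dependent and execution-state restrictions that CPSRWriteByInstr() and SPSRWriteByInstr() apply, so it is not a model of the full instruction.

```c
#include <assert.h>
#include <stdint.h>

/* Map mask<3:0> (f s x c) to the PSR bits it enables for writing. */
uint32_t msr_byte_mask(unsigned mask)
{
    assert(mask <= 0xFu);
    uint32_t bits = 0;
    if (mask & 0x1u) bits |= 0x000000FFu;  /* c: bits<7:0>   */
    if (mask & 0x2u) bits |= 0x0000FF00u;  /* x: bits<15:8>  */
    if (mask & 0x4u) bits |= 0x00FF0000u;  /* s: bits<23:16> */
    if (mask & 0x8u) bits |= 0xFF000000u;  /* f: bits<31:24> */
    return bits;
}

/* Simplified: copy the enabled bytes of value into psr, keep the rest. */
uint32_t msr_merge(uint32_t psr, uint32_t value, unsigned mask)
{
    uint32_t bits = msr_byte_mask(mask);
    return (psr & ~bits) | (value & bits);
}
```

For example, APSR_nzcvq (CPSR_f, mask 0b1000) selects only bits [31:24], so msr_merge(0x1D3, 0xF0000000, 0x8) sets N, Z, C, and V and leaves the low bytes unchanged.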
B6.1.7 MSR (register)
Move to Special Register from ARM core register moves the value of a general-purpose register to the CPSR or the SPSR of the current mode.

Encoding T1  ARMv6T2, ARMv7
MSR<c> <spec_reg>,<Rn>
First halfword, bits [15:0]: 1 1 1 1 0 0 1 1 1 0 0 | R | Rn
Second halfword, bits [15:0]: 1 0 | (0) | 0 | mask | (0) (0) (0) (0) (0) (0) (0) (0)
n = UInt(Rn);  write_spsr = (R == ‘1’);
if mask == ‘0000’ then UNPREDICTABLE;
if BadReg(n) then UNPREDICTABLE;

Encoding A1  ARMv4*, ARMv5T*, ARMv6*, ARMv7
MSR<c> <spec_reg>,<Rn>
Bits [31:0]: cond | 0 0 0 1 0 | R | 1 0 | mask | (1) (1) (1) (1) | (0) (0) (0) (0) | 0 0 0 0 | Rn
n = UInt(Rn);  write_spsr = (R == ‘1’);
if mask == ‘0000’ then UNPREDICTABLE;
if n == 15 then UNPREDICTABLE;

Assembler syntax
MSR<c><q> <spec_reg>, <Rn>
where:
<c><q>  See Standard assembler syntax fields on page A8-7.
<spec_reg>  Is one of:
  • APSR_<bits>
  • CPSR_<fields>
  • SPSR_<fields>.
  ARM recommends the APSR forms when only the N, Z, C, V, Q, and GE[3:0] bits are being written. For more information, see The Application Program Status Register (APSR) on page A2-14.
<Rn>  Is the general-purpose register to be transferred to <spec_reg>.
<bits>  Is one of nzcvq, g, or nzcvqg. In the A and R profiles:
  • APSR_nzcvq is the same as CPSR_f (mask == ‘1000’)
  • APSR_g is the same as CPSR_s (mask == ‘0100’)
  • APSR_nzcvqg is the same as CPSR_fs (mask == ‘1100’).
<fields>  Is a sequence of one or more of the following:
  c  mask<0> = ‘1’ to enable writing of bits<7:0> of the destination PSR
  x  mask<1> = ‘1’ to enable writing of bits<15:8> of the destination PSR
  s  mask<2> = ‘1’ to enable writing of bits<23:16> of the destination PSR
  f  mask<3> = ‘1’ to enable writing of bits<31:24> of the destination PSR.
Operation

if ConditionPassed() then
    EncodingSpecificOperations();
    if write_spsr then
        SPSRWriteByInstr(R[n], mask);
    else
        CPSRWriteByInstr(R[n], mask, FALSE);  // Does not affect execution state bits other than E

Exceptions

None.

E bit

The CPSR.E bit is writable from any mode using an MSR instruction. Using MSR to change the value of this bit is deprecated; use the SETEND instruction instead.

B6.1.8 RFE

Return From Exception loads the PC and the CPSR from the word at the specified address and the following word respectively. For information about memory accesses see Memory accesses on page A8-13.

Encoding T1   ARMv6T2, ARMv7
RFEDB<c> <Rn>{!}    Outside or last in IT block

  First halfword:   15-6 1 1 1 0 1 0 0 0 0 0 | 5 W | 4 1 | 3-0 Rn
  Second halfword:  15-0 (1)(1)(0)(0)(0)(0)(0)(0)(0)(0)(0)(0)(0)(0)(0)(0)

  n = UInt(Rn);  wback = (W == '1');  increment = FALSE;  wordhigher = FALSE;
  if n == 15 then UNPREDICTABLE;
  if InITBlock() && !LastInITBlock() then UNPREDICTABLE;

Encoding T2   ARMv6T2, ARMv7
RFE{IA}<c> <Rn>{!}    Outside or last in IT block

  First halfword:   15-6 1 1 1 0 1 0 0 1 1 0 | 5 W | 4 1 | 3-0 Rn
  Second halfword:  15-0 (1)(1)(0)(0)(0)(0)(0)(0)(0)(0)(0)(0)(0)(0)(0)(0)

  n = UInt(Rn);  wback = (W == '1');  increment = TRUE;  wordhigher = FALSE;
  if n == 15 then UNPREDICTABLE;
  if InITBlock() && !LastInITBlock() then UNPREDICTABLE;

Encoding A1   ARMv6*, ARMv7
RFE{<amode>} <Rn>{!}

  Bits:  31-25 1 1 1 1 1 0 0 | 24 P | 23 U | 22 0 | 21 W | 20 1 | 19-16 Rn | 15-0 (0)(0)(0)(0)(1)(0)(1)(0)(0)(0)(0)(0)(0)(0)(0)(0)

  n = UInt(Rn);  wback = (W == '1');  increment = (U == '1');  wordhigher = (P == U);
  if n == 15 then UNPREDICTABLE;

Assembler syntax

RFE{<amode>}<c><q>  <Rn>{!}

where:
<amode> is one of:
  DA    Decrement After.
        ARM code only. The consecutive memory addresses end at the address in the base register. Encoded as P = 0, U = 0 in encoding A1.
  DB    Decrement Before. The consecutive memory addresses end one word below the address in the base register. Encoding T1, or encoding A1 with P = 1, U = 0.
  IA    Increment After. The consecutive memory addresses start at the address in the base register. This is the default, and is normally omitted. Encoding T2, or encoding A1 with P = 0, U = 1.
  IB    Increment Before. ARM code only. The consecutive memory addresses start one word above the address in the base register. Encoded as P = 1, U = 1 in encoding A1.
<c>, <q>   See Standard assembler syntax fields on page A8-7. An ARM RFE instruction must be unconditional.
<Rn>       The base register.
!          Causes the instruction to write a modified value back to <Rn>. If ! is omitted, the instruction does not change <Rn>.

RFEFA, RFEEA, RFEFD, and RFEED are pseudo-instructions for RFEDA, RFEDB, RFEIA, and RFEIB respectively, referring to their use for popping data from Full Ascending, Empty Ascending, Full Descending, and Empty Descending stacks.

Operation

if ConditionPassed() then
    EncodingSpecificOperations();
    if !CurrentModeIsPrivileged() || CurrentInstrSet() == InstrSet_ThumbEE then
        UNPREDICTABLE;
    else
        address = if increment then R[n] else R[n]-8;
        if wordhigher then address = address+4;
        CPSRWriteByInstr(MemA[address+4,4], '1111', TRUE);
        BranchWritePC(MemA[address,4]);
        if wback then R[n] = if increment then R[n]+8 else R[n]-8;

Exceptions

Data Abort.

B6.1.9 SMC (previously SMI)

Secure Monitor Call causes a Secure Monitor exception. It is available only in privileged modes. An attempt to execute this instruction in User mode causes an Undefined Instruction exception. For details of the effects of a Secure Monitor exception see Secure Monitor Call (SMC) exception on page B1-53.
Encoding T1   Security Extensions (not in ARMv6K)
SMC<c> #<imm4>

  First halfword:   15-4 1 1 1 1 0 1 1 1 1 1 1 1 | 3-0 imm4
  Second halfword:  15-12 1 0 0 0 | 11-0 (0)(0)(0)(0)(0)(0)(0)(0)(0)(0)(0)(0)

  imm32 = ZeroExtend(imm4, 32);
  // imm32 is for assembly/disassembly only and is ignored by hardware
  if InITBlock() && !LastInITBlock() then UNPREDICTABLE;

Encoding A1   Security Extensions
SMC<c> #<imm4>

  Bits:  31-28 cond | 27-20 0 0 0 1 0 1 1 0 | 19-8 (0)(0)(0)(0)(0)(0)(0)(0)(0)(0)(0)(0) | 7-4 0 1 1 1 | 3-0 imm4

  imm32 = ZeroExtend(imm4, 32);
  // imm32 is for assembly/disassembly only and is ignored by hardware

Assembler syntax

SMC<c><q>  #<imm4>

where:
<c>, <q>   See Standard assembler syntax fields on page A8-7.
<imm4>     Is a 4-bit immediate value. This is ignored by the ARM processor. It can be used by the SMC exception handler (Secure Monitor code) to determine what service is being requested, but this is not recommended.

The pre-UAL syntax SMI<c> is equivalent to SMC<c>.

Operation

if ConditionPassed() then
    EncodingSpecificOperations();
    if HaveSecurityExt() && CurrentModeIsPrivileged() then
        TakeSMCException();  // Secure Monitor Call if privileged
    else
        UNDEFINED;

Exceptions

Secure Monitor Call.

B6.1.10 SRS

Store Return State stores the LR and SPSR of the current mode to the stack of a specified mode. For information about memory accesses see Memory accesses on page A8-13.
Encoding T1   ARMv6T2, ARMv7
SRSDB<c> SP{!},#<mode>

  First halfword:   15-6 1 1 1 0 1 0 0 0 0 0 | 5 W | 4 0 | 3-0 (1)(1)(0)(1)
  Second halfword:  15-5 (1)(1)(0)(0)(0)(0)(0)(0)(0)(0)(0) | 4-0 mode

  wback = (W == '1');  increment = FALSE;  wordhigher = FALSE;
  // In Non-secure state, check for attempts to access Monitor mode ('10110'), or FIQ
  // mode ('10001') when the Security Extensions are reserving the FIQ registers. The
  // definition of UNPREDICTABLE does not permit this to be a security hole.
  if !IsSecure() && mode == '10110' then UNPREDICTABLE;
  if !IsSecure() && mode == '10001' && NSACR.RFR == '1' then UNPREDICTABLE;

Encoding T2   ARMv6T2, ARMv7
SRS{IA}<c> SP{!},#<mode>

  First halfword:   15-6 1 1 1 0 1 0 0 1 1 0 | 5 W | 4 0 | 3-0 (1)(1)(0)(1)
  Second halfword:  15-5 (1)(1)(0)(0)(0)(0)(0)(0)(0)(0)(0) | 4-0 mode

  wback = (W == '1');  increment = TRUE;  wordhigher = FALSE;
  // In Non-secure state, check for attempts to access Monitor mode ('10110'), or FIQ
  // mode ('10001') when the Security Extensions are reserving the FIQ registers. The
  // definition of UNPREDICTABLE does not permit this to be a security hole.
  if !IsSecure() && mode == '10110' then UNPREDICTABLE;
  if !IsSecure() && mode == '10001' && NSACR.RFR == '1' then UNPREDICTABLE;

Encoding A1   ARMv6*, ARMv7
SRS{<amode>} SP{!},#<mode>

  Bits:  31-25 1 1 1 1 1 0 0 | 24 P | 23 U | 22 1 | 21 W | 20 0 | 19-16 (1)(1)(0)(1) | 15-5 (0)(0)(0)(0)(0)(1)(0)(1)(0)(0)(0) | 4-0 mode

  wback = (W == '1');  increment = (U == '1');  wordhigher = (P == U);
  // In Non-secure state, check for attempts to access Monitor mode ('10110'), or FIQ
  // mode ('10001') when the Security Extensions are reserving the FIQ registers. The
  // definition of UNPREDICTABLE does not permit this to be a security hole.
  if !IsSecure() && mode == '10110' then UNPREDICTABLE;
  if !IsSecure() && mode == '10001' && NSACR.RFR == '1' then UNPREDICTABLE;
Assembler syntax

SRS{<amode>}<c><q>  SP{!}, #<mode>

where:
<amode> is one of:
  DA    Decrement After. ARM code only. The consecutive memory addresses end at the address in the base register. Encoded as P = 0, U = 0 in encoding A1.
  DB    Decrement Before. The consecutive memory addresses end one word below the address in the base register. Encoding T1, or encoding A1 with P = 1, U = 0.
  IA    Increment After. The consecutive memory addresses start at the address in the base register. This is the default, and is normally omitted. Encoding T2, or encoding A1 with P = 0, U = 1.
  IB    Increment Before. ARM code only. The consecutive memory addresses start one word above the address in the base register. Encoded as P = 1, U = 1 in encoding A1.
<c>, <q>   See Standard assembler syntax fields on page A8-7. An ARM SRS instruction must be unconditional.
!          Causes the instruction to write a modified value back to the base register (encoded as W = 1). If ! is omitted, the instruction does not change the base register (encoded as W = 0).
<mode>     The number of the mode whose banked SP is used as the base register. For details of processor modes and their numbers see ARM processor modes on page B1-6.

SRSFA, SRSEA, SRSFD, and SRSED are pseudo-instructions for SRSIB, SRSIA, SRSDB, and SRSDA respectively, referring to their use for pushing data onto Full Ascending, Empty Ascending, Full Descending, and Empty Descending stacks.

Operation

if ConditionPassed() then
    EncodingSpecificOperations();
    if CurrentModeIsUserOrSystem() || CurrentInstrSet() == InstrSet_ThumbEE then
        UNPREDICTABLE;
    else
        base = Rmode[13,mode];
        address = if increment then base else base-8;
        if wordhigher then address = address+4;
        MemA[address,4] = LR;
        MemA[address+4,4] = SPSR[];
        if wback then Rmode[13,mode] = if increment then base+8 else base-8;

Exceptions

Data Abort.
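RFE and SRS share the same address arithmetic: the first word is at base (increment) or base-8 (decrement), plus 4 when wordhigher (P == U), and the written-back base is base+8 or base-8. The following Python sketch, using the hypothetical function name return_state_addresses, reproduces that arithmetic for each <amode> and applies the stack-oriented aliases listed for both instructions. It models addresses only: it performs no memory accesses and does not model the privilege, mode, or ThumbEE checks in the pseudocode.

```python
# P and U bits of encoding A1 for each <amode>.
AMODE_PU = {"DA": (0, 0), "DB": (1, 0), "IA": (0, 1), "IB": (1, 1)}

# Stack-oriented pseudo-instruction suffixes: SRS pushes, RFE pops.
SRS_STACK_ALIASES = {"FA": "IB", "EA": "IA", "FD": "DB", "ED": "DA"}
RFE_STACK_ALIASES = {"FA": "DA", "EA": "DB", "FD": "IA", "ED": "IB"}

MASK32 = 0xFFFFFFFF


def return_state_addresses(base: int, amode: str) -> tuple[int, int, int]:
    """Model the address arithmetic shared by RFE and SRS.

    Returns (first_word, second_word, written_back_base). For RFE the first
    word is loaded into the PC and the second into the CPSR; for SRS the
    first word receives LR and the second receives SPSR.
    """
    p, u = AMODE_PU[amode]
    increment = u == 1
    wordhigher = p == u
    address = base if increment else base - 8
    if wordhigher:
        address += 4
    writeback = base + 8 if increment else base - 8
    return address & MASK32, (address + 4) & MASK32, writeback & MASK32


# SRSFD pushes to a Full Descending stack, so it is SRSDB:
print([hex(a) for a in return_state_addresses(0x8000, SRS_STACK_ALIASES["FD"])])
# ['0x7ff8', '0x7ffc', '0x7ff8']
```

Pairing SRSFD with RFEFD (which maps to RFEIA) restores the same two words and returns the base register to its original value, which is why the Full Descending aliases are the usual choice with a conventional descending stack.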
B6.1.11 STM (user registers)

Store Multiple (user registers) is UNPREDICTABLE in User or System modes. In exception modes, it stores multiple User mode registers to consecutive memory locations using an address from a banked base register. Writeback to the base register is not available with this instruction.

Encoding A1   ARMv4*, ARMv5T*, ARMv6*, ARMv7
STM{amode}<c> <Rn>,<registers>^

  Bits:  31-28 cond | 27-25 1 0 0 | 24 P | 23 U | 22 1 | 21 (0) | 20 0 | 19-16 Rn | 15-0 register_list

  n = UInt(Rn);  registers = register_list;  increment = (U == '1');  wordhigher = (P == U);
  if n == 15 || BitCount(registers) < 1 then UNPREDICTABLE;

Assembler syntax

STM{amode}<c><q>  <Rn>, <registers>^

where:
<c>, <q>   See Standard assembler syntax fields on page A8-7.
amode      is one of:
  DA    Decrement After. The consecutive memory addresses end at the address in the base register. For this instruction, ED, meaning Empty Descending, is equivalent to DA. Encoded as P = 0, U = 0.
  DB    Decrement Before. The consecutive memory addresses end one word below the address in the base register. For this instruction, FD, meaning Full Descending, is equivalent to DB. Encoded as P = 1, U = 0.
  IA    Increment After. The consecutive memory addresses start at the address in the base register. This is the default, and is normally omitted. For this instruction, EA, meaning Empty Ascending, is equivalent to IA. Encoded as P = 0, U = 1.
  IB    Increment Before. The consecutive memory addresses start one word above the address in the base register. For this instruction, FA, meaning Full Ascending, is equivalent to IB. Encoded as P = 1, U = 1.
<Rn>         The base register. This register can be the SP.
<registers>  Is a list of one or more registers, separated by commas and surrounded by { and }. It specifies the set of registers to be stored by the STM instruction. The registers are stored with the lowest-numbered register to the lowest memory address, through to the highest-numbered register to the highest memory address.
The pre-UAL syntax STM<c>{amode} is equivalent to STM{amode}<c>.

Operation

if ConditionPassed() then
    EncodingSpecificOperations();
    if CurrentModeIsUserOrSystem() then UNPREDICTABLE;
    length = 4*BitCount(registers);
    address = if increment then R[n] else R[n]-length;
    if wordhigher then address = address+4;
    for i = 0 to 14
        if registers<i> == '1' then  // Store User mode ('10000') register
            MemA[address,4] = Rmode[i, '10000'];
            address = address + 4;
    if registers<15> == '1' then
        MemA[address,4] = PCStoreValue();

Exceptions

Data Abort.

B6.1.12 STRBT, STRHT, and STRT

Even in privileged modes, stores to memory by these instructions are restricted in the same way as stores to memory in User mode. This is encapsulated in the MemA_unpriv[] and MemU_unpriv[] pseudocode functions. For details see Aligned memory accesses on page B2-31 and Unaligned memory accesses on page B2-32.

For details of the instructions see:
  • STRBT on page A8-394
  • STRHT on page A8-414
  • STRT on page A8-416.

B6.1.13 SUBS PC, LR and related instructions

The SUBS PC, LR, #<const> instruction provides an exception return without the use of the stack. It subtracts the immediate constant from LR, branches to the resulting address, and also copies the SPSR to the CPSR.

The ARM instruction set contains similar instructions based on other data-processing operations, or with a wider range of operands, or both. The use of these other instructions is deprecated, except for MOVS PC, LR.
Encoding T1   ARMv6T2, ARMv7
SUBS<c> PC,LR,#<imm8>    Outside or last in IT block

  First halfword:   15-4 1 1 1 1 0 0 1 1 1 1 0 1 | 3-0 (1)(1)(1)(0)
  Second halfword:  15-14 1 0 | 13 (0) | 12 0 | 11-8 (1)(1)(1)(1) | 7-0 imm8

  n = 14;  imm32 = ZeroExtend(imm8, 32);  register_form = FALSE;  opcode = '0010';  // = SUB
  if InITBlock() && !LastInITBlock() then UNPREDICTABLE;

Encoding A1   ARMv4*, ARMv5T*, ARMv6*, ARMv7
<opc1>S<c> PC,<Rn>,#<const>
<opc2>S<c> PC,#<const>

  Bits:  31-28 cond | 27-25 0 0 1 | 24-21 opcode | 20 1 | 19-16 Rn | 15-12 1 1 1 1 | 11-0 imm12

  n = UInt(Rn);  imm32 = ARMExpandImm(imm12);  register_form = FALSE;

Encoding A2   ARMv4*, ARMv5T*, ARMv6*, ARMv7
<opc1>S<c> PC,<Rn>,<Rm>{,<shift>}
<opc2>S<c> PC,<Rm>{,<shift>}

  Bits:  31-28 cond | 27-25 0 0 0 | 24-21 opcode | 20 1 | 19-16 Rn | 15-12 1 1 1 1 | 11-7 imm5 | 6-5 type | 4 0 | 3-0 Rm

  n = UInt(Rn);  m = UInt(Rm);  register_form = TRUE;
  (shift_t, shift_n) = DecodeImmShift(type, imm5);

Assembler syntax

SUBS<c><q>     PC, LR, #<const>                Encodings T1, A1
<opc1>S<c><q>  PC, <Rn>, #<const>              Encoding A1
<opc1>S<c><q>  PC, <Rn>, <Rm> {, <shift>}      Encoding A2, deprecated
<opc2>S<c><q>  PC, #<const>                    Encoding A1, deprecated
<opc2>S<c><q>  PC, <Rm> {, <shift>}            Encoding A2

where:
<c>, <q>   See Standard assembler syntax fields on page A8-7.
<opc1>     The operation. <opc1> is one of ADC, ADD, AND, BIC, EOR, ORR, RSB, RSC, SBC, and SUB. Use of all of these operations except SUB is deprecated.
<opc2>     The operation. <opc2> is MOV or MVN. Use of MVN is deprecated.
<Rn>       The first operand register. Use of any register except LR is deprecated.
<const>    The immediate constant. For encoding T1, <const> is in the range 0-255. See Modified immediate constants in ARM instructions on page A5-9 for the range of available values in encoding A1.
<Rm>       The optionally shifted second or only operand register. Use of any register except LR is deprecated.
<shift>    The shift to apply to the value read from <Rm>. If absent, no shift is applied. The shifts and how they are encoded are described in Shifts applied to a register on page A8-10. Use of <shift> is deprecated.
The value of the operation <opc1> or <opc2> is encoded in the opcode field of the instruction. For the opcode values for different operations see Operation on page B6-6.

In Thumb code, MOVS PC,LR is a pseudo-instruction for SUBS PC,LR,#0.

The pre-UAL syntax <opc1><c>S is equivalent to <opc1>S<c>. The pre-UAL syntax <opc2><c>S is equivalent to <opc2>S<c>.

Operation

if ConditionPassed() then
    EncodingSpecificOperations();
    if CurrentInstrSet() == InstrSet_ThumbEE then UNPREDICTABLE;
    operand2 = if register_form then Shift(R[m], shift_t, shift_n, APSR.C) else imm32;
    case opcode of
        when '0000' result = R[n] AND operand2;                                  // AND
        when '0001' result = R[n] EOR operand2;                                  // EOR
        when '0010' (result, -, -) = AddWithCarry(R[n], NOT(operand2), '1');     // SUB
        when '0011' (result, -, -) = AddWithCarry(NOT(R[n]), operand2, '1');     // RSB
        when '0100' (result, -, -) = AddWithCarry(R[n], operand2, '0');          // ADD
        when '0101' (result, -, -) = AddWithCarry(R[n], operand2, APSR.C);       // ADC
        when '0110' (result, -, -) = AddWithCarry(R[n], NOT(operand2), APSR.C);  // SBC
        when '0111' (result, -, -) = AddWithCarry(NOT(R[n]), operand2, APSR.C);  // RSC
        when '1100' result = R[n] OR operand2;                                   // ORR
        when '1101' result = operand2;                                           // MOV
        when '1110' result = R[n] AND NOT(operand2);                             // BIC
        when '1111' result = NOT(operand2);                                      // MVN
    CPSRWriteByInstr(SPSR[], '1111', TRUE);
    BranchWritePC(result);

Exceptions

None.

B6.1.14 VMRS

Move to ARM core register from Advanced SIMD and VFP extension System Register moves the value of an extension system register to a general-purpose register.
Encoding T1 / A1   VFPv2, VFPv3, Advanced SIMD
VMRS<c> <Rt>,<spec_reg>

  T1 first halfword:   15-4 1 1 1 0 1 1 1 0 1 1 1 1 | 3-0 reg
  T1 second halfword:  15-12 Rt | 11-8 1 0 1 0 | 7-5 (0)(0)(0) | 4 1 | 3-0 (0)(0)(0)(0)
  A1:  31-28 cond | 27-20 1 1 1 0 1 1 1 1 | 19-16 reg | 15-12 Rt | 11-8 1 0 1 0 | 7-5 (0)(0)(0) | 4 1 | 3-0 (0)(0)(0)(0)

  t = UInt(Rt);
  if t == 13 && CurrentInstrSet() != InstrSet_ARM then UNPREDICTABLE;
  if t == 15 && reg != '0001' then UNPREDICTABLE;

Assembler syntax

VMRS<c><q>  <Rt>, <spec_reg>

where:
<c>, <q>     See Standard assembler syntax fields on page A8-7.
<Rt>         The destination ARM core register. This register can be R0-R14. If <spec_reg> is FPSCR, it is also permitted to be APSR_nzcv, encoded as Rt = '1111'. In that form, the instruction transfers the FPSCR N, Z, C, and V flags to the APSR N, Z, C, and V flags.
<spec_reg>   Is one of:
               FPSID   reg = '0000'
               FPSCR   reg = '0001'
               MVFR1   reg = '0110'
               MVFR0   reg = '0111'
               FPEXC   reg = '1000'.
             If the Common VFP subarchitecture is implemented, see Subarchitecture additions to the VFP system registers on page AppxB-15 for additional values of <spec_reg>.

The pre-UAL instruction FMSTAT<c> is equivalent to VMRS<c> APSR_nzcv, FPSCR.

Operation

if ConditionPassed() then
    EncodingSpecificOperations();
    if reg == '0001' then  // FPSCR
        CheckVFPEnabled(TRUE);  SerializeVFP();  VFPExcBarrier();
        if t == 15 then
            APSR.N = FPSCR.N;  APSR.Z = FPSCR.Z;  APSR.C = FPSCR.C;  APSR.V = FPSCR.V;
        else
            R[t] = FPSCR;
    else
        // Non-FPSCR registers are privileged-only and not affected by FPEXC.EN
        CheckVFPEnabled(FALSE);
        if !CurrentModeIsPrivileged() then UNDEFINED;
        case reg of
            when '0000'  SerializeVFP();  R[t] = FPSID;
            // '0001' already dealt with above
            when '001x'  UNPREDICTABLE;
            when '010x'  UNPREDICTABLE;
            when '0110'  SerializeVFP();  R[t] = MVFR1;
            when '0111'  SerializeVFP();  R[t] = MVFR0;
            when '1000'  SerializeVFP();  R[t] = FPEXC;
            otherwise    SUBARCHITECTURE_DEFINED register access;

Exceptions

Undefined Instruction.
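The APSR_nzcv form copies only the four condition flags, which occupy bits [31:28] in both the FPSCR and the APSR, and leaves every other APSR bit unchanged. A minimal Python model of that transfer, assuming 32-bit register values held as integers (the function name vmrs_apsr_nzcv is hypothetical):

```python
NZCV_MASK = 0xF000_0000  # N, Z, C, V are bits 31:28 in both the APSR and the FPSCR


def vmrs_apsr_nzcv(apsr: int, fpscr: int) -> int:
    """Return the new APSR value after VMRS APSR_nzcv, FPSCR (pre-UAL FMSTAT)."""
    # Clear the APSR flags, then insert the FPSCR flags; other APSR bits survive.
    return (apsr & ~NZCV_MASK & 0xFFFF_FFFF) | (fpscr & NZCV_MASK)


# Example: FPSCR has Z and C set (0b0110 in bits 31:28); APSR bit 4 is preserved.
print(hex(vmrs_apsr_nzcv(0x8000_0010, 0x6000_0000)))  # 0x60000010
```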
B6.1.15 VMSR

Move to Advanced SIMD and VFP extension System Register from ARM core register moves the value of a general-purpose register to a VFP system register.

Encoding T1 / A1   VFPv2, VFPv3, Advanced SIMD
VMSR<c> <spec_reg>,<Rt>

  T1 first halfword:   15-4 1 1 1 0 1 1 1 0 1 1 1 0 | 3-0 reg
  T1 second halfword:  15-12 Rt | 11-8 1 0 1 0 | 7-5 (0)(0)(0) | 4 1 | 3-0 (0)(0)(0)(0)
  A1:  31-28 cond | 27-20 1 1 1 0 1 1 1 0 | 19-16 reg | 15-12 Rt | 11-8 1 0 1 0 | 7-5 (0)(0)(0) | 4 1 | 3-0 (0)(0)(0)(0)

  t = UInt(Rt);
  if t == 15 || (t == 13 && CurrentInstrSet() != InstrSet_ARM) then UNPREDICTABLE;

Assembler syntax

VMSR<c><q>  <spec_reg>, <Rt>

where:
<c>, <q>     See Standard assembler syntax fields on page A8-7.
<spec_reg>   Is one of:
               FPSID   reg = '0000'
               FPSCR   reg = '0001'
               FPEXC   reg = '1000'.
             If the Common VFP subarchitecture is implemented, see Subarchitecture additions to the VFP system registers on page AppxB-15 for additional values of <spec_reg>.
<Rt>         The general-purpose register to be transferred to <spec_reg>.

Operation

if ConditionPassed() then
    EncodingSpecificOperations();
    if reg == '0001' then  // FPSCR
        CheckVFPEnabled(TRUE);  SerializeVFP();  VFPExcBarrier();
        FPSCR = R[t];
    else
        // Non-FPSCR registers are privileged-only and not affected by FPEXC.EN
        CheckVFPEnabled(FALSE);
        if !CurrentModeIsPrivileged() then UNDEFINED;
        case reg of
            when '0000'  SerializeVFP();
            // '0001' already dealt with above
            when '001x'  UNPREDICTABLE;
            when '01xx'  UNPREDICTABLE;
            when '1000'  SerializeVFP();  FPEXC = R[t];
            otherwise    SUBARCHITECTURE_DEFINED register access;

Exceptions

Undefined Instruction.
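VMRS and VMSR share one encoding layout and differ only in bit 20 (bits [27:20] are 0xEF for VMRS and 0xEE for VMSR), so a disassembler can handle both in one routine. The following Python sketch decodes encoding A1 words using the reg values from the two <spec_reg> tables. The function name is hypothetical; the condition field is checked but not printed, should-be-zero bits are not checked, and reg values outside the tables are printed as placeholders rather than decoded as subarchitecture-defined registers.

```python
# reg field values listed in the <spec_reg> tables for VMRS and VMSR.
VFP_SYSREGS = {0b0000: "FPSID", 0b0001: "FPSCR", 0b0110: "MVFR1",
               0b0111: "MVFR0", 0b1000: "FPEXC"}


def decode_vmrs_vmsr(word: int) -> str:
    """Disassemble an encoding-A1 VMRS or VMSR word (condition suffix omitted)."""
    if (word >> 28) == 0xF:
        raise ValueError("cond == '1111' is not VMRS/VMSR")
    opc = (word >> 20) & 0xFF            # bits 27:20: 0xEF is VMRS, 0xEE is VMSR
    if opc not in (0xEE, 0xEF) or ((word >> 8) & 0xF) != 0b1010 or ((word >> 4) & 1) == 0:
        raise ValueError("not a VMRS/VMSR encoding")
    reg = (word >> 16) & 0xF
    t = (word >> 12) & 0xF
    name = VFP_SYSREGS.get(reg, f"<reg {reg:04b}>")
    if opc == 0xEF:                      # VMRS: system register to core register
        if t == 15:
            if reg != 0b0001:
                raise ValueError("UNPREDICTABLE: Rt == 15 requires FPSCR")
            return f"VMRS APSR_nzcv, {name}"
        return f"VMRS R{t}, {name}"
    if t == 15:                          # VMSR: core register to system register
        raise ValueError("UNPREDICTABLE: Rt == 15")
    return f"VMSR {name}, R{t}"


print(decode_vmrs_vmsr(0xEEF1FA10))  # VMRS APSR_nzcv, FPSCR
```

The word 0xEEF1FA10 is the always-executed form of the pre-UAL FMSTAT instruction.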
Part C
Debug Architecture

Chapter C1
Introduction to the ARM Debug Architecture

This chapter introduces part C of this manual, and the ARM Debug architecture. It contains the following sections:
  • Scope of part C of this manual on page C1-2
  • About the ARM Debug architecture on page C1-3
  • Security Extensions and debug on page C1-8
  • Register interfaces on page C1-9.

C1.1 Scope of part C of this manual

Part C of this manual defines the debug features of ARMv7. However, ARM recognizes that many debuggers require compatibility with previous versions of the ARM Debug architecture. Therefore, this part includes information about three versions of the ARM Debug architecture:
  • v7 Debug
  • v6.1 Debug
  • v6 Debug.

These three versions of the Debug architecture are introduced in Major differences between the ARMv6 and ARMv7 Debug architectures on page C1-7.

In part C of this manual:
  • ARMv6 is sometimes used to refer to an implementation that includes either v6.1 Debug or v6 Debug.
  • ARMv7 is sometimes used to refer to an implementation that includes v7 Debug.

Note
  • v6.1 Debug and v6 Debug are two different versions of the Debug architecture for the ARMv6 architecture. They might be described as:
      - ARMv6, v6.1 Debug
      - ARMv6, v6 Debug.
    Throughout this part the descriptions v6.1 Debug and v6 Debug are used, for brevity.
  • Any processor that implements the ARMv7 architecture must implement v7 Debug. Information about v6.1 Debug and v6 Debug is given:
      - to enable developers to produce debuggers that are backwards compatible with these Debug architecture versions
      - as reference material for processors that implement the ARMv6 architecture.
C1.2 About the ARM Debug architecture

ARM processors implement two types of debug support:

Invasive debug       All debug features that permit modification of processor state. For more information, see Invasive debug.
Non-invasive debug   All debug features that permit data and program flow observation, especially trace support. For more information, see Non-invasive debug on page C1-5.

The following sections introduce invasive and non-invasive debug. Summary of the ARM debug component descriptions on page C1-7 gives a quick reference summary of the rest of this part of this manual.

C1.2.1 Invasive debug

The invasive debug component of the ARM Debug architecture is intended primarily for run-control debugging.

Note
In this part of this manual, invasive debug is often referred to simply as debug. For example, debug events, debug exceptions, and Debug state are all part of the invasive debug implementation.

The programmers' model can be used to manage and control debug events. Watchpoints and breakpoints are two examples of debug events. Debug events are described in Chapter C3 Debug Events.

You can use the DBGDSCR to configure the processor into one of two debug-modes:

Monitor debug-mode
  In Monitor debug-mode, a debug event causes a debug exception to occur:
    • a debug exception that relates to instruction execution generates a Prefetch Abort exception
    • a debug exception that relates to a data access generates a Data Abort exception.
  Debug exceptions are described in Chapter C4 Debug Exceptions.

Halting debug-mode
  In Halting debug-mode, a debug event causes the processor to enter a special Debug state. When the processor is in Debug state, it stops executing instructions from the program counter location, and is instead controlled through the external debug interface, in particular the Instruction Transfer Register (DBGITR).
  This enables an external agent, such as a debugger, to interrogate processor context and control all subsequent instruction execution. Because the processor is stopped, it ignores the system and cannot service interrupts.
  Debug state is described in Chapter C5 Debug State.

A debug solution can use a mixture of the two methods, for example to support an OS or RTOS with both:
  • Running System Debug (RSD) using Monitor debug-mode
  • Halting debug-mode support available as a fallback for system failure and boot-time debug.

The architecture supports the ability to switch between these two debug-modes.

When no debug-mode is selected, debug is restricted to simple monitor solutions. These are usually ROM-based or Flash-based. Such a monitor might use standard system features, such as a UART or Ethernet connection, to communicate with a debug host. Alternatively, it might use the Debug Communications Channel (DCC) as an out-of-band communications channel to the host. This minimizes the debug requirement on system resources.

All versions of the Debug architecture provide a software interface that includes:
  • a Debug Identification Register (DBGDIDR)
  • status and control registers, including the Debug Status and Control Register (DBGDSCR)
  • hardware breakpoint and watchpoint support
  • the DCC.

In addition, the v7 Debug software interface includes reset, power-down, and operating system debug support features.

The Debug architecture requires an external debug interface that supports access to the programmers' model. This forms the basis of the Debug Programmers' Model (DPM) for ARMv6 and ARMv7.

Description of invasive debug features

The following chapters describe the invasive debug implementation:
  • Chapter C2 Invasive Debug Authentication
  • Chapter C3 Debug Events
  • Chapter C4 Debug Exceptions
  • Chapter C5 Debug State.
In addition, see:
  • Chapter C6 Debug Register Interfaces for a description of the register interfaces to the debug components
  • Chapter C10 Debug Registers Reference for descriptions of the registers used to configure and control debug operations
  • Appendix A Recommended External Debug Interface for a description of the recommended external interface to the debug components.

C1.2.2 Non-invasive debug

Non-invasive debug includes all debug features that permit data and program flow to be observed, but that do not permit modification of the main processor state.

The v7 Debug architecture defines three areas of non-invasive debug:
  • Instruction trace and, in some implementations, data trace. Trace support is an architecture extension typically implemented using a trace macrocell, see Trace.
  • Sample-based profiling, see Sample-based profiling on page C1-6.
  • Performance monitors, see Performance monitors on page C1-6.

A processor implementation might include other forms of non-invasive debug.

Chapter C7 Non-invasive Debug Authentication describes the authentication of non-invasive debug operations.

Trace

Trace support is an architecture extension. This manual describes such an extension as a trace macrocell. A trace macrocell constructs a real-time trace stream corresponding to the operation of the processor. It is IMPLEMENTATION DEFINED whether the trace stream is:
  • stored locally in an Embedded Trace Buffer (ETB) for independent download and analysis
  • exported directly through a trace port to a Trace Port Analyzer (TPA) and its associated host-based trace debug tools.

Typically, use of a trace macrocell is non-invasive. Development tools can connect to the trace macrocell, configure it, capture trace, and download the trace without affecting the operation of the processor in any way.
A trace macrocell provides an enhanced level of runtime system observation and debug granularity. It is particularly useful in cases where:
  • Stopping the processor affects the behavior of the system.
  • There is insufficient state visible in a system by the time a problem is detected to be able to determine its cause. Trace provides a mechanism for system logging and back tracing of faults.

Trace might also be used to perform analysis of code running on the processor, such as performance analysis or code coverage analysis.

Typically, a trace architecture defines:
  • the trace macrocell programmers' model
  • permitted trace protocol formats
  • the physical trace port connector.

The following documents define the ARM trace architectures:
  • Embedded Trace Macrocell Architecture Specification
  • CoreSight Program Flow Trace Architecture Specification.

The ARM trace architectures have a common identification mechanism. This means development tools can detect which architecture is implemented.

Sample-based profiling

Sample-based profiling is an optional non-invasive component of the Debug architecture that enables debug software to profile a program. For more information, see Chapter C8 Sample-based Profiling.

Performance monitors

Performance monitors were implemented in several processors before ARMv7, but before ARMv7 they did not form part of the architecture. The ARMv7 form of the monitors, described here, follows those implementations with minor modifications to enable future expansion.

The basic form of the performance monitors is:
  • A cycle counter, with the ability to count every cycle or every sixty-fourth cycle.
  • A number of event counters. The event counted by each counter is programmable:
      - Previous implementations provided up to four counters.
      - In ARMv7, space is provided for up to 31 counters.
    The actual number of counters is IMPLEMENTATION DEFINED, and an identification mechanism is provided.
  • Controls for:
      - enabling and resetting counters
      - flagging overflows
      - enabling interrupts on overflow.

The cycle counter can be enabled independently from the event counters.

The set of events that can be monitored is divided into:
  • events that are likely to be consistent across many microarchitectures
  • other events, that are likely to be implementation specific.

As a result, the architecture defines a common set of events to be used across many microarchitectures, and a large space reserved for IMPLEMENTATION DEFINED events. The full set of events for any given implementation is IMPLEMENTATION DEFINED. There is no requirement to implement any of the common set of events, but the numbers allocated for the common set of events must not be used except as defined.

Chapter C9 Performance Monitors describes the performance monitors.

C1.2.3 Major differences between the ARMv6 and ARMv7 Debug architectures

ARMv6 is the first version of the ARM architecture to include debug. The introduction of the ARM architecture Security Extensions extended the ARMv6 Debug architecture:
  • ARMv6 processors without the Security Extensions implement v6 Debug
  • ARMv6 processors with the Security Extensions implement v6.1 Debug.

ARMv7 introduces additional extensions to support developments in the debug environment. The main change in the Debug architecture is the specification of new forms of external debug interface. ARMv6 Debug does not require a particular debug interface, but can be implemented with access from a JTAG interface as defined in IEEE Standard Test Access Port and Boundary Scan Architecture (JTAG). However, systems such as the ARM CoreSight™ architecture require changes in the debug interface.
For more information about the CoreSight architecture, see the CoreSight Architecture Specification.
ARMv7 Debug addresses some of the aims of the CoreSight architecture, such as a more system-centric view of debug and improved debug of powered-down systems. v7 Debug also introduces an architecture extension that provides performance monitors.
C1.2.4 Summary of the ARM debug component descriptions
Table C1-1 shows the main components of v7 Debug, and where they are described.
Table C1-1 v7 Debug subarchitectures
Component | Status | Type | Reference
Run-control Debug | Required | Invasive | Chapter C2 Invasive Debug Authentication; Chapter C3 Debug Events; Chapter C4 Debug Exceptions; Chapter C5 Debug State; Chapter C6 Debug Register Interfaces
Trace | Optional | Non-invasiveᵃ | Trace on page C1-5
Sample-based profiling | Optional | Non-invasiveᵃ | Chapter C8 Sample-based Profiling
Performance monitors | Optional | Non-invasiveᵃ | Chapter C9 Performance Monitors
a. For information about authentication of these components see Chapter C7 Non-invasive Debug Authentication.
For more information, see:
• Chapter C10 Debug Registers Reference
• Appendix A Recommended External Debug Interface.
C1.3 Security Extensions and debug
When the Security Extensions are implemented, you can independently configure invasive debug events and non-invasive debug operations so that they are not permitted either:
• in all processor modes in Secure state
• in Secure privileged modes, but not in Secure User mode.
In v7 Debug, for invasive debug events that cause entry to Debug state, the second option is restricted:
• support for not permitting these events in Secure privileged modes while permitting them in Secure User mode is optional
• if an implementation supports this option, its use is deprecated.
This configuration is controlled by two control bits in the Secure Debug Enable Register and, in the recommended external debug interface, by four input signals:
• the Secure User Invasive Debug Enable bit, SDER.SUIDEN
• the Secure User Non-invasive Debug Enable bit, SDER.SUNIDEN
• in the recommended external debug interface:
  – the Debug Enable signal, DBGEN
  – the Non-Invasive Debug Enable signal, NIDEN
  – the Secure Privileged Invasive Debug Enable signal, SPIDEN
  – the Secure Privileged Non-Invasive Debug Enable signal, SPNIDEN.
For more information, see:
• Chapter C2 Invasive Debug Authentication
• Chapter C7 Non-invasive Debug Authentication
• c1, Secure Debug Enable Register (SDER) on page B3-108 for details of the SUIDEN and SUNIDEN control bits
• Authentication signals on page AppxA-3 for details of the DBGEN, NIDEN, SPIDEN and SPNIDEN signals.
C1.4 Register interfaces
This section gives a brief description of the different debug register interfaces defined by v7 Debug. The most important distinction is between:
• the external debug interface, which defines how an external debugger can access the v7 Debug resources
• the processor interface, which defines how an ARMv7 processor can access its own debug resources.
For v7 Debug, ARM recommends an external debug interface based on the ARM Debug Interface v5 Architecture Specification (ADIv5). The most significant difference between ADIv5 and the interface recommended by v6 Debug and v6.1 Debug is that ADIv5 supports debug over power-down of the processor. Although the ADIv5 interface is not required for compliance with ARMv7, the ARM RealView® tools require this interface to be implemented.
ADIv5 supports both a JTAG wire interface and a low pin-count Serial Wire (SW) interface. The RealView tools support either wire interface.
An ADIv5 interface enables a debug object, such as an ARMv7 processor, to abstract a set of resources as a memory-mapped peripheral. Accesses to debug resources are made as 32-bit read/write transfers. Power-down debug is supported by introducing the abstraction that accesses to certain resources can return an error response when those resources are unavailable, just as a memory-mapped peripheral can return a slave-generated error response in exceptional circumstances.
v7 Debug requires software executing on the processor to be able to access all debug registers. To provide access to a basic subset of the debug registers, v7 Debug requires implementation of the Baseline Coprocessor 14 (CP14) Interface, see The Baseline CP14 debug register interface on page C6-32. To provide access to the rest of the debug registers, v7 Debug permits one of two options:
• An Extended CP14 interface. This is similar to the requirement of v6 Debug and v6.1 Debug.
• A memory-mapped interface.
An implementation can include both of these options.
ARMv7 does not permit all combinations of debug, trace, and performance monitor register interfaces. There are three options for ARMv7 implementations, shown in Table C1-2 on page C1-10. In several cases an optional memory-mapped interface is permitted, indicated by brackets. ARM recommends that if the optional memory-mapped interface is implemented for either the debug interface or the trace interface, then it is implemented for both of these interfaces.
Table C1-2 Options for interfacing to debug in ARMv7
Processor interface to debug registers | Processor interface to trace registers | Processor interface to performance monitors
Baseline CP14 + Memory-mapped | (Memory-mapped)ᵃ | CP15
Baseline CP14 + Extended CP14 (+ Memory-mapped)ᵃ | (Memory-mapped)ᵃ | CP15
Baseline CP14 + Extended CP14 (+ Memory-mapped)ᵃ | CP14 (+ Memory-mapped)ᵃ | CP15
a. Interfaces shown in brackets are optional, see text for more information.
Chapter C2 Invasive Debug Authentication
This chapter describes the authentication controls on invasive debug operations. It contains the following section:
• About invasive debug authentication on page C2-2.
Note
The recommended external debug interface provides an authentication interface that controls both invasive debug and non-invasive debug, as described in Authentication signals on page AppxA-3. This chapter describes how you can use this interface to control invasive debug. For information about using the interface to control non-invasive debug see Chapter C7 Non-invasive Debug Authentication.
C2.1 About invasive debug authentication
Invasive debug can be enabled or disabled. If it is disabled, the processor ignores all debug events except the BKPT Instruction debug event. This means that debug events other than the BKPT Instruction debug event do not cause the processor to enter Debug state or to take a debug exception.
In addition, if a processor implements the Security Extensions, invasive debug can be permitted or not permitted. When invasive debug is not permitted, no debug events are permitted.
When a debug event is not permitted:
• if the debug event is not a BKPT Instruction debug event, it is ignored
• if the debug event is a BKPT Instruction debug event, it causes a debug exception.
Note
The BKPT Instruction debug event is never ignored.
The difference between enabled and permitted is that whether a debug event is permitted depends on both the security state and the operating mode of the processor.
For debug events that cause entry to Debug state, Secure User halting debug refers to permitting these events in Secure User mode when invasive debug is not permitted in Secure privileged modes. The debug events that cause entry to Debug state are:
• Halting debug events
• Software debug events, if Halting debug-mode is selected.
Support for Secure User halting debug is required in v6.1 Debug. In v7 Debug it is IMPLEMENTATION DEFINED whether Secure User halting debug is supported. On an implementation that does not support Secure User halting debug, the DBGDIDR.nSUHD_imp bit is RAO, see Debug ID Register (DBGDIDR) on page C10-3. ARM deprecates the use of Secure User halting debug.
If the Security Extensions are implemented, when invasive debug is not permitted in Secure privileged modes it must be possible to permit, in Secure User mode, the debug events that do not cause entry to Debug state. The debug events that do not cause entry to Debug state are Software debug events when Monitor debug-mode is selected.
Note
When the Security Extensions are implemented, the Debug architecture distinguishes between permitting invasive halting debug and permitting invasive non-halting debug. However, in Non-secure state and in Secure privileged modes, whether a debug event is permitted does not depend on whether the event would cause entry to Debug state. Therefore, the distinction between permitting invasive halting debug and invasive non-halting debug applies only in Secure User mode.
When Secure User halting debug is supported, the processor can be configured so that both invasive halting debug and invasive non-halting debug are permitted in Secure User mode when invasive debug is not permitted in Secure privileged modes. Therefore, a debug event can be permitted:
• in all processor modes, in both Secure and Non-secure security states
• only in Non-secure state
• in Non-secure state and also in Secure User mode.
When Secure User halting debug is not supported, the processor can be configured only so that invasive non-halting debug is permitted in Secure User mode when invasive debug is not permitted in Secure privileged modes. In Secure User mode, any debug event that would cause entry to Debug state is then ignored, unless it is a BKPT Instruction debug event. Therefore, a debug event can be permitted:
• in all processor modes, in both Secure and Non-secure security states
• only in Non-secure state
• in Non-secure state and also, if it would not cause entry to Debug state, in Secure User mode.
In v6.1 Debug and v7 Debug, invasive debug authentication can be controlled dynamically, meaning that whether a debug event is permitted can change while the processor is running or while it is in Debug state. For more information, see Generation of debug events on page C3-40. In v6 Debug, invasive debug authentication can be changed only while the processor is in reset.
In the recommended external debug interface, the signals that control the enabling and permitting of debug events are DBGEN and SPIDEN. SPIDEN is implemented only on processors that implement the Security Extensions. See Authentication signals on page AppxA-3. Part C of this manual assumes that the recommended external debug interface is implemented.
Note
• DBGEN and SPIDEN also control non-invasive debug, see About non-invasive debug authentication on page C7-2.
• For more information about use of the authentication signals, see Changing the authentication signals on page AppxA-4.
If DBGEN is LOW, all invasive debug is disabled.
On processors that do not implement the Security Extensions, if DBGEN is HIGH, invasive debug is enabled and permitted in all modes, see Table C2-1.
Table C2-1 Invasive debug authentication, Security Extensions not implemented
DBGEN | Modes in which invasive debug is permitted
LOW | None. Invasive debug is disabled.
HIGH | All modes.
On processors that implement the Security Extensions, if both DBGEN and SPIDEN are HIGH, invasive debug is enabled and all debug events are permitted in all modes and in both Secure and Non-secure security states.
If DBGEN is HIGH and SPIDEN is LOW:
• invasive debug is enabled
• all debug events are permitted in Non-secure state
• no debug events are permitted in Secure privileged modes
• whether invasive debug is permitted in Secure User mode depends on:
  – the value of the SDER.SUIDEN bit, see c1, Secure Debug Enable Register (SDER) on page B3-108
  – if Secure User halting debug is not supported, whether the debug event would cause entry to Debug state.
This is shown in Table C2-2.
Table C2-2 Invasive debug authentication, Security Extensions implemented
DBGENᵃ | SPIDENᵃ | SUIDENᵇ | Mode | Security state | Invasive debug
LOW | X | X | Any | Either | Disabled
HIGH | LOW | 0 | Any | Non-secure | Enabled and permitted
HIGH | LOW | 0 | Any | Secure | Enabled but not permitted
HIGH | LOW | 1 | Any | Non-secure | Enabled and permitted
HIGH | LOW | 1 | User | Secure | See note c
HIGH | LOW | 1 | Privileged | Secure | Enabled but not permitted
HIGH | HIGH | X | Any | Either | Enabled and permitted
a. Authentication signals, see Authentication signals on page AppxA-3.
b. SDER.SUIDEN bit, see c1, Secure Debug Enable Register (SDER) on page B3-108.
c. Invasive non-halting debug is permitted. If Secure User halting debug is not supported, invasive halting debug is enabled but not permitted. Otherwise, invasive halting debug is enabled and permitted.
Note
Invasive and non-invasive debug authentication enable you to protect Secure processing from direct observation or invasion by an untrusted debugger. If you are designing a system, you must be aware that the invasive and non-invasive debug facilities can aid security attacks. For example, Debug state or the DBGDSCR.INTdis bit might be used for a denial of service attack, and the Non-secure performance monitors might be used to measure the side-effects of Secure processing on Non-secure code. ARM recommends that, where you are concerned about such attacks, you disable invasive and non-invasive debug in all modes. However, you must be aware of the limits of the protection that debug authentication can provide, because similar attacks can be made by running malicious code on the processor in Non-secure state.
Chapter C3 Debug Events
This chapter describes debug events. Debug events trigger invasive debug operations. It contains the following sections:
• About debug events on page C3-2
• Software debug events on page C3-5
• Halting debug events on page C3-38
• Generation of debug events on page C3-40
• Debug event prioritization on page C3-43.
C3.1 About debug events
A debug event can be either:
• a Software debug event, see Software debug events on page C3-5
• a Halting debug event, see Halting debug events on page C3-38.
A processor responds to a debug event in one of the following ways:
• it ignores the debug event
• it takes a debug exception, see Chapter C4 Debug Exceptions
• it enters Debug state, see Chapter C5 Debug State.
The response depends on the configuration. This is shown in Table C3-1 and in:
• Figure C3-1 on page C3-3 for v7 Debug
• Figure C3-2 on page C3-4 for v6 Debug and v6.1 Debug.
Table C3-1 Processor behavior on debug events
The first three columns give the configuration; the last three give the behavior for the specified debug event.
Enabled and permittedᵃ | DBGDSCR[15:14]ᵇ | Debug-mode selected and enabled | BKPT Instruction debug event | Other Software debug event | Halting debug event
No | xxᶜ | Disabled or not permitted | Debug exceptionᵈ | Ignore | Ignoreᵉ
Yes | 00 | None | Debug exceptionᵈ | Ignore | Debug state entryᶠ
Yes | x1 | Halting | Debug state entry | Debug state entry | Debug state entry
Yes | 10 | Monitor | Debug exception | Debug exception or UNPREDICTABLEᵍ | Debug state entryᶠ
a. Invasive debug is enabled and the debug event is permitted. Whether a debug event is permitted might depend on the type of debug event as well as on the configuration of the processor, see Chapter C2 Invasive Debug Authentication.
b. See Debug Status and Control Register (DBGDSCR) on page C10-10.
c. The value of DBGDSCR[15:14] is ignored when invasive debug is disabled or the debug event is not permitted. If debug is disabled, these bits are RAZ.
d. When debug is disabled or the debug event is not permitted, the BKPT instruction generates a debug exception rather than being ignored. The DBGDSCR, IFSR and IFAR are set as if a BKPT Instruction debug exception occurred. See Effects of debug exceptions on CP15 registers and the DBGWFAR on page C4-4.
e. The processor might enter Debug state later, see Halting debug events on page C3-38.
f. In v6 Debug, it is IMPLEMENTATION DEFINED whether the processor enters Debug state or ignores the event.
g. Be careful when programming debug events when Monitor debug-mode is selected and enabled, because certain conditions can lead to UNPREDICTABLE behavior, see Unpredictable behavior on Software debug events on page C3-24. In v6 Debug and v6.1 Debug, some events are ignored in this state.
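The authentication rules of Tables C2-1 and C2-2 supply the first column of Table C3-1. The following C sketch models both tables for v7 Debug as small pure functions, so the two steps (is the event permitted, then how does the processor respond) can be traced together. It is illustrative only: all type and function names (auth_config, enabled_and_permitted, respond and so on) are hypothetical, the v6-specific behavior of note f is not modeled, and the Monitor debug-mode cases that note g says can be UNPREDICTABLE are reported simply as a debug exception.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names: the manual defines these rules as tables, not as C. */
typedef enum { EV_BKPT, EV_OTHER_SOFTWARE, EV_HALTING } event_kind;
typedef enum { RESP_IGNORE, RESP_DEBUG_EXCEPTION, RESP_DEBUG_STATE } response;

typedef struct {
    bool security_ext;   /* processor implements the Security Extensions */
    bool dbgen;          /* DBGEN authentication signal is HIGH */
    bool spiden;         /* SPIDEN authentication signal is HIGH */
    bool suiden;         /* SDER.SUIDEN == 1 */
    bool suhd_supported; /* Secure User halting debug is supported */
} auth_config;

/* DBGDSCR[15:14] == x1 selects Halting debug-mode. */
static bool halting_mode(unsigned dscr_15_14) {
    assert(dscr_15_14 <= 3u);
    return (dscr_15_14 & 1u) != 0u;
}

/* An event causes entry to Debug state if it is a Halting debug event,
   or a Software debug event while Halting debug-mode is selected. */
static bool enters_debug_state(event_kind ev, unsigned dscr_15_14) {
    return ev == EV_HALTING || halting_mode(dscr_15_14);
}

/* Tables C2-1 and C2-2: is invasive debug enabled and the event permitted?
   'halting' is true if the event would cause entry to Debug state. */
static bool enabled_and_permitted(const auth_config *c, bool secure,
                                  bool user_mode, bool halting) {
    if (!c->dbgen) return false;                  /* invasive debug disabled */
    if (!c->security_ext || !secure) return true; /* Table C2-1, or Non-secure */
    if (c->spiden) return true;                   /* every Secure mode */
    if (!user_mode || !c->suiden) return false;   /* Secure privileged, or SUIDEN 0 */
    return !halting || c->suhd_supported;         /* note c of Table C2-2 */
}

/* Table C3-1, v7 Debug behavior. */
static response respond(bool permitted, unsigned dscr_15_14, event_kind ev) {
    if (!permitted) /* DBGDSCR[15:14] is ignored; BKPT is never ignored */
        return ev == EV_BKPT ? RESP_DEBUG_EXCEPTION : RESP_IGNORE;
    if (enters_debug_state(ev, dscr_15_14)) /* x1, or any Halting debug event */
        return RESP_DEBUG_STATE;
    if (dscr_15_14 == 2u)                   /* 10: Monitor debug-mode */
        return RESP_DEBUG_EXCEPTION;
    return ev == EV_BKPT ? RESP_DEBUG_EXCEPTION : RESP_IGNORE; /* 00: none */
}
```

For example, with DBGEN HIGH, SPIDEN LOW, SDER.SUIDEN set and no support for Secure User halting debug, a Breakpoint debug event in Secure User mode is not permitted under Halting debug-mode (DBGDSCR[15:14] = 0b01) and is therefore ignored, whereas the same event under Monitor debug-mode (0b10) is permitted and generates a debug exception.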
[Figure C3-1 Processor behavior on debug events, for v7 Debug. Panels: Debug disabled or debug event not permitted; Debug-mode: none; Debug-mode: Halting; Debug-mode: Monitor. Each panel maps the Software debug events (BKPT Instruction, Breakpoint, Vector Catch, Watchpoint) and the Halting debug events (External Debug Request, Halt Request, OS Unlock Catch) to one of: Debug exception (Prefetch Abort), Debug exception (Data Abort), Debug state entry, Ignored, or UNPREDICTABLE. ‡ Some cases are UNPREDICTABLE, see text.]
[Figure C3-2 Processor behavior on debug events, for v6 Debug and v6.1 Debug. Panels: Debug disabled or debug event not permitted; Debug-mode: none; Debug-mode: Halting; Debug-mode: Monitor. The events and outcomes are as in Figure C3-1, without OS Unlock Catch. ‡ IMPLEMENTATION DEFINED alternative behavior for v6 Debug. † Some cases are ignored, see text.]
C3.2 Software debug events
A Software debug event can be any of the following:
• a Breakpoint debug event, see Breakpoint debug events
• a Watchpoint debug event, see Watchpoint debug events on page C3-15
• a BKPT Instruction debug event, see BKPT Instruction debug events on page C3-20
• a Vector Catch debug event, see Vector Catch debug events on page C3-20.
Memory addresses on page C3-23 describes the addresses used for generating Software debug events in different memory system implementations.
If Monitor debug-mode is selected and enabled, the behavior of certain types of Software debug event is UNPREDICTABLE. For more information, see Unpredictable behavior on Software debug events on page C3-24.
Pseudocode details of Software debug events on page C3-27 gives pseudocode for the operation of the Software debug events.
C3.2.1 Breakpoint debug events
A Breakpoint debug event is defined by a pair of registers referred to as a Breakpoint Register Pair (BRP), comprising a Breakpoint Control Register (DBGBCR) and a Breakpoint Value Register (DBGBVR). BRPs, DBGBCRs, and DBGBVRs are numbered upwards from 0, with BRPn comprising DBGBCRn and DBGBVRn. For details of the breakpoint registers see:
• Breakpoint Control Registers (DBGBCR) on page C10-49
• Breakpoint Value Registers (DBGBVR) on page C10-48.
The DBGDIDR.BRPs field specifies the number of BRPs implemented, see Debug ID Register (DBGDIDR) on page C10-3. The maximum number of BRPs is 16.
You can define a Breakpoint debug event:
• Based on comparison of an Instruction Virtual Address (IVA) with the value held in a DBGBVR. See Memory addresses on page C3-23 for the definition of an IVA.
• Based on comparison of the Context ID with the value held in a DBGBVR. Some BRPs might not support Context ID comparison. The DBGDIDR.CTX_CMPs field specifies the number of BRPs that support Context ID comparison, see Debug ID Register (DBGDIDR) on page C10-3.
• By linking a BRP to a second BRP, to define a single Breakpoint debug event. One BRP holds an IVA for comparison, and the other holds a Context ID value.
In all cases, the DBGBCR defines some additional conditions that must be met for the BRP to generate a Breakpoint debug event, including whether the BRP is enabled. The terms hit and miss describe whether the conditions defined in the BRP are met:
• a hit occurs when all of the conditions are met
• a miss occurs when any condition is not met, meaning the processor does not generate a debug event.
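As a sketch of how debug software might size its breakpoint resources before programming any BRPs, the C fragment below decodes the DBGDIDR resource-count fields. The bit positions (WRPs in [31:28], BRPs in [27:24], CTX_CMPs in [23:20]) and the count-minus-one encoding are assumptions here; confirm them against Debug ID Register (DBGDIDR) on page C10-3. A four-bit count-minus-one field is at least consistent with the maximum of 16 BRPs stated above. The struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical decoded view of the DBGDIDR resource counts. */
typedef struct {
    unsigned wrps;     /* watchpoint register pairs implemented */
    unsigned brps;     /* breakpoint register pairs implemented, 1 to 16 */
    unsigned ctx_cmps; /* BRPs that support Context ID comparison */
} debug_resources;

/* Assumed layout: each 4-bit field holds (number of resources - 1). */
static debug_resources decode_dbgdidr(uint32_t dbgdidr) {
    debug_resources r;
    r.wrps     = ((dbgdidr >> 28) & 0xFu) + 1u;
    r.brps     = ((dbgdidr >> 24) & 0xFu) + 1u;
    r.ctx_cmps = ((dbgdidr >> 20) & 0xFu) + 1u;
    /* Context ID comparison is supported by a subset of the BRPs. */
    assert(r.ctx_cmps <= r.brps);
    return r;
}

/* BRPn comprises DBGBCRn and DBGBVRn; BRPs are numbered upwards from 0. */
typedef struct {
    uint32_t dbgbcr; /* Breakpoint Control Register */
    uint32_t dbgbvr; /* Breakpoint Value Register */
} brp;
```

Under these assumptions, a hypothetical DBGDIDR value of 0x15100000 decodes to 2 WRPs, 6 BRPs, and 2 BRPs that support Context ID comparison.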
The following sections describe Breakpoint debug events:
• Generation of Breakpoint debug events
• Debug event generation conditions defined by the DBGBCR on page C3-7
• IVA comparisons for Debug event generation on page C3-8
• IVA comparisons and instruction length on page C3-10
• Context ID comparisons for Debug event generation on page C3-13
• Additional considerations for IVA mismatch breakpoints on page C3-13
• Additional conditions for linked BRPs on page C3-15.
Generation of Breakpoint debug events
For each instruction in the program flow, the debug logic tests all the BRPs. For each BRP, the debug logic generates a Breakpoint debug event only if all of the following apply:
• When the BRP is tested, the conditions specified in the DBGBCR are met, see Debug event generation conditions defined by the DBGBCR on page C3-7.
• The comparison with the value in the DBGBVR is successful. When two BRPs are linked to define a single Breakpoint debug event, both comparisons must succeed. For more information see:
  – IVA comparisons for Debug event generation on page C3-8
  – Context ID comparisons for Debug event generation on page C3-13.
• The instruction is committed for execution.
Note
The processor must test for any possible Breakpoint debug events before it executes the instruction. The debug logic might test the BRPs when an instruction is prefetched. However, it must not generate a Breakpoint debug event if the instruction is not committed for execution.
If all of these conditions are met, the debug logic generates the Breakpoint debug event regardless of whether the instruction passes its condition code test. In ARMv6 and the ARMv7-A and ARMv7-R architecture profiles, the debug logic generates the debug event regardless of the type of instruction.
Breakpoint debug events are synchronous.
That is, the debug event acts like an exception that cancels the breakpointed instruction.
When invasive debug is enabled and Monitor debug-mode is selected, if Breakpoint debug events are permitted, a Breakpoint debug event generates a Prefetch Abort exception.
Debug event generation conditions defined by the DBGBCR
For each BRP, the DBGBCR defines some conditions for generating a Breakpoint debug event, using the following register fields:
Breakpoint enable
Controls whether this BRP is enabled.
Privileged mode control
Controls whether this BRP defines a Breakpoint debug event that can occur:
• only in User mode
• only in a privileged mode
• only in User, System or Supervisor modes
• in any mode.
Security state control
If the processor implements the Security Extensions, this field controls whether this BRP defines a Breakpoint debug event that can occur only in Secure state, only in Non-secure state, or in either security state.
For more information, including the differences between versions of the Debug architecture, see Breakpoint Control Registers (DBGBCR) on page C10-49.
When two BRPs are linked to define a single Breakpoint debug event, the BRP that defines the IVA comparison also defines the privileged mode control and security state control, see Additional conditions for linked BRPs on page C3-15.
Other information in the DBGBCR
In addition to defining these conditions for generating a Breakpoint debug event, the DBGBCR controls the following:
• The DBGBVR meaning field defines the breakpoint type. The following sections describe all of the breakpoint types:
  – IVA comparisons for Debug event generation on page C3-8
  – Context ID comparisons for Debug event generation on page C3-13.
• The Linked BRP number field specifies whether the BRP is linked to another BRP.
If this BRP is linked, this field gives the number of the linked BRP. For more information see Additional conditions for linked BRPs on page C3-15.
• For an IVA comparison, the DBGBVR defines a word-aligned address, and the Byte address select field specifies the bytes in that word that comprise the breakpointed instruction, see IVA comparisons for Debug event generation on page C3-8.
• For an IVA comparison in v7 Debug, the Address range mask field optionally specifies a bitmask that defines the low-order bits of the IVA and DBGBVR values that are excluded from the comparison, see IVA comparisons for Debug event generation on page C3-8.
Note
For IVA comparison in v7 Debug, you must restrict the comparison using either byte address selection or address range masking. You cannot use both at the same time.
IVA comparisons for Debug event generation
The result of an IVA comparison depends on whether the value in the DBGBVR matches or mismatches the IVA value. In each case, you can link the BRP to a second BRP that defines a Context ID comparison. This means that the breakpoint types that depend on an IVA comparison are:
• Unlinked IVA match
• Unlinked IVA mismatch
• Linked IVA match
• Linked IVA mismatch.
When the DBGBCR is programmed for one of these breakpoint types, the debug logic generates a Breakpoint debug event only if all the other conditions for the breakpoint are met and the IVA comparison is successful. That is, all other conditions are met and, taking account of any masking:
• for an IVA match, the IVA value equals the value in the DBGBVR
• for an IVA mismatch, the IVA value does not equal the value in the DBGBVR.
In the linked cases, the debug logic generates a Breakpoint debug event only if all the other conditions for the breakpoint are met, the IVA comparison is successful, and the Context ID comparison in the linked BRP is successful, see Context ID comparisons for Debug event generation on page C3-13. See Additional conditions for linked BRPs on page C3-15 for more information.
All versions of the Debug architecture support byte address selection, to specify which bytes of the word addressed by the DBGBVR comprise the breakpointed instruction. v7 Debug supports an alternative bit masking scheme, referred to as address range masking. The following subsections give more information about IVA comparisons:
• Condition for breakpoint generation on IVA match, without address range masking on page C3-9
• Condition for breakpoint generation on IVA mismatch, without address range masking on page C3-9
• Breakpoint address range masking behavior, v7 Debug on page C3-9.
DBGBVR values must be word-aligned, and DBGBVR[1:0] are never used for IVA comparison. ARM instructions are always word-aligned, and therefore a DBGBVR value can specify exactly the IVA of an ARM instruction. See IVA comparisons and instruction length on page C3-10 for more information about how the instruction length affects how you must define a breakpoint.
Note
• v6 Debug does not support IVA mismatch.
• Where IVA mismatch is supported, you can use it to generate a Breakpoint debug event when the processor executes an instruction other than the instruction indicated by the DBGBVR. You can use this for single-stepping, or for breakpointing all instructions outside a range of instruction addresses.
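The four IVA breakpoint types differ only in whether the result of the IVA comparison is inverted and whether a linked Context ID comparison must also succeed. The C sketch below expresses that combination, together with the requirement that the instruction is committed for execution. It assumes that the IVA comparison (including byte address selection and any masking) and the DBGBCR conditions have already been evaluated and are passed in as booleans, and it does not model the additional conditions for linked BRPs described on page C3-15. All names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names; the manual describes these rules in prose only. */
typedef enum { IVA_MATCH, IVA_MISMATCH } iva_compare_kind;

typedef struct {
    iva_compare_kind kind; /* match or mismatch, from the DBGBVR meaning field */
    bool linked;           /* linked to a BRP that holds a Context ID */
    uint32_t linked_ctxid; /* Context ID value held by the linked BRP */
} iva_breakpoint;

/* conditions_met: DBGBCR conditions (enable, mode, security state) are met.
   committed:      the instruction is committed for execution.
   iva_equal:      IVA equals DBGBVR after byte address selection and masking. */
static bool breakpoint_generated(const iva_breakpoint *bp, bool conditions_met,
                                 bool committed, bool iva_equal,
                                 uint32_t current_ctxid) {
    assert(bp != NULL);
    if (!conditions_met || !committed)
        return false;
    /* A match hits on equality; a mismatch hits on inequality. */
    bool iva_hit = (bp->kind == IVA_MATCH) ? iva_equal : !iva_equal;
    if (!iva_hit)
        return false;
    /* Linked types also need the Context ID comparison to succeed. */
    return !bp->linked || current_ctxid == bp->linked_ctxid;
}
```

In this model, a step-off breakpoint for single-stepping is an unlinked IVA_MISMATCH: it fires on every committed instruction except the one whose address the DBGBVR holds.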
Condition for breakpoint generation on IVA match, without address range masking
When BRPn is programmed for IVA match, without address range masking, and all other conditions for generating a breakpoint are met, a Breakpoint debug event is generated only if both:
• bits [31:2] of the IVA are equal to bits [31:2] of DBGBVRn
• the Byte address select field, bits [8:5], of DBGBCRn is programmed for an IVA match for the current instruction set state and IVA[1:0] value, see Byte address selection behavior on IVA match or mismatch on page C10-55.
Note
In v7 Debug, to perform IVA comparison without address range masking you must set DBGBCR[28:24], the Address range mask field, to zero.
Condition for breakpoint generation on IVA mismatch, without address range masking
When BRPn is programmed for IVA mismatch, without address range masking, and all other conditions for generating a breakpoint are met, a Breakpoint debug event is generated only if either:
• bits [31:2] of the IVA are not equal to bits [31:2] of DBGBVRn
• the Byte address select field, bits [8:5], of DBGBCRn is programmed for an IVA mismatch for the current instruction set state and IVA[1:0] value, see Byte address selection behavior on IVA match or mismatch on page C10-55.
Note
In v7 Debug, to perform IVA comparison without address range masking you must set DBGBCR[28:24], the Address range mask field, to zero.
Breakpoint address range masking behavior, v7 Debug
When BRPn is programmed for IVA matching, the comparison is masked using the value held in the Address range mask field, DBGBCRn[28:24]. You can also use the Address range mask field when programming the BRP for IVA mismatch, that is, when DBGBCR[28:24] != 0b00000 and DBGBCR[22] == 1. In this case, the address comparison portion of breakpoint generation hits for all addresses outside the masked address region.
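The IVA comparison rules above can be sketched in C as a single function that reports whether the address comparison portion of a BRP hits. This is a simplified model, not the architected comparison. It assumes that DBGBCR bit (5 + IVA[1:0]) decides whether the byte at IVA[1:0] matches. That gives the expected result for ARM state with Byte address select 0b1111, and for Thumb state with 0b0011 or 0b1100, but the exact per-instruction-set rules are those in Byte address selection behavior on IVA match or mismatch on page C10-55. The model also does not check for reserved Address range mask values. The function name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified v7 Debug IVA comparison for one BRP (hypothetical helper).
   Returns true when the address comparison portion of the BRP hits. */
static bool iva_comparison_hits(uint32_t dbgbcr, uint32_t dbgbvr, uint32_t iva) {
    unsigned bas      = (dbgbcr >> 5) & 0xFu;       /* Byte address select, [8:5] */
    unsigned mask     = (dbgbcr >> 24) & 0x1Fu;     /* Address range mask, [28:24] */
    bool     mismatch = ((dbgbcr >> 22) & 1u) != 0; /* [22] == 1 selects mismatch */
    bool     equal;

    if (mask != 0u) {
        /* Range masking requires Byte address select to be 0b1111. */
        assert(bas == 0xFu);
        uint32_t keep = ~((UINT32_C(1) << mask) - 1u); /* drop the low 'mask' bits */
        equal = (iva & keep) == (dbgbvr & keep);
    } else {
        /* Compare IVA[31:2] with DBGBVR[31:2]; DBGBVR[1:0] are never used.
           Assumed: the BAS bit selected by IVA[1:0] must be set to match. */
        equal = (iva >> 2) == (dbgbvr >> 2) && ((bas >> (iva & 3u)) & 1u) != 0u;
    }
    /* A mismatch breakpoint hits for every address that does not match. */
    return mismatch ? !equal : equal;
}
```

With an Address range mask value of 12, for instance, an IVA match breakpoint at 0x8000 covers IVAs 0x8000 to 0x8FFF, and additionally setting DBGBCR[22] turns the same BRP into one that hits for every address outside that region. Setting only DBGBCR[22], with Byte address select 0b0000, makes the comparison hit for every address.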
If an implementation does not support breakpoint address range masking, the Address range mask field is RAZ.
Note
There is no encoding for a full 32-bit mask. Such a mask would set a breakpoint that hits on every address comparison. You can achieve the same effect by setting:
• DBGBCR[22] to 1, to select an IVA mismatch
• DBGBCR[8:5] to 0b0000.
To use address range masking, you must also set DBGBCR[8:5], the Byte address select field, to 0b1111.
IVA comparisons and instruction length
An instruction set is fixed-length if all of its instructions have the same length, and variable-length otherwise. In a variable-length instruction set, a single instruction comprises one or more units of memory.
The ARM instruction set is an example of a fixed-length instruction set. In the ARM instruction set the size of each instruction is one word, and ARM instructions are always word-aligned.
The following are examples of variable-length instruction sets:
• The ThumbEE instruction set, and the Thumb instruction set from ARMv6T2 onwards. In these instruction sets an instruction comprises one or two halfwords.
• Java bytecodes. A single Java bytecode comprises one or more bytes.
Before ARMv6T2, an implementation can treat the Thumb instruction set as a fixed-length 16-bit instruction set, as described in BL and BLX (immediate) instructions, before ARMv6T2 on page AppxG-4. An implementation that does this can permit an exception to be taken between the two halfwords of a BL or BLX (immediate) instruction.
In a variable-length instruction set, for an instruction consisting of more than one unit of memory, the first unit of the instruction is defined as the unit of the instruction with the lowest address in memory. In a fixed-length instruction set, an instruction consists of a single unit of memory. This unit is also the first unit of the instruction.
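To make the idea of a first unit concrete for Thumb code from ARMv6T2 onwards, the C sketch below finds the first halfword of the instruction that contains a given halfword. It relies on the Thumb encoding rule that a halfword whose bits [15:11] are 0b11101, 0b11110 or 0b11111 is the first halfword of a 32-bit instruction. It must start from a known instruction boundary, because the second halfword of a 32-bit instruction can look like a first halfword. The function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Thumb (ARMv6T2 onwards): bits [15:11] of 0b11101, 0b11110 or 0b11111 mark
   the first halfword of a 32-bit instruction; anything else is 16-bit. */
static bool thumb_is_32bit(uint16_t halfword) {
    unsigned top5 = ((unsigned)halfword >> 11) & 0x1Fu;
    return top5 == 0x1Du || top5 == 0x1Eu || top5 == 0x1Fu;
}

/* Hypothetical helper: walk from a known instruction boundary at code[0] and
   return the index of the first unit of the instruction containing code[target]. */
static size_t first_unit_of(const uint16_t *code, size_t n, size_t target) {
    size_t i = 0;
    assert(code != NULL && target < n);
    while (i <= target) {
        size_t len = thumb_is_32bit(code[i]) ? 2u : 1u; /* length in halfwords */
        if (target < i + len)
            return i; /* target lies inside this instruction */
        i += len;
    }
    return target; /* not reached: the loop always returns */
}
```

In the halfword stream 0xF000, 0xF800, 0x4770, the first two halfwords form one 32-bit instruction (a BL), so halfword 1 belongs to the instruction whose first unit is halfword 0, while 0x4770 is a 16-bit instruction on its own. Read out of context, 0xF800 would itself decode as the first halfword of a 32-bit instruction.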
Instruction length considerations depend on the Debug architecture version, as described in the following subsections:
• Effect of instruction length in v7 Debug
• Effect of instruction length in v6 Debug and v6.1 Debug on page C3-11.

IVA comparison programming examples on page C3-12 gives examples of Breakpoint programming, taking account of possible instruction length effects, for all versions of the Debug architecture.

Effect of instruction length in v7 Debug

In v7 Debug there are four types of IVA breakpoint:
• IVA match with no address range mask, described as a regular IVA breakpoint
• IVA mismatch with no address range mask, described as a step-off IVA breakpoint
• IVA match with an address range mask, described as an included range IVA breakpoint
• IVA mismatch with an address range mask, described as an excluded range IVA breakpoint.

Note
Support for address range masks on breakpoints is IMPLEMENTATION DEFINED.

For all types of IVA breakpoint, if the conditions in the DBGBCR are met, and the instruction is committed for execution, the BRP generates a Breakpoint debug event if the required DBGBVR comparison, taking account of the byte address selection and any address range masking, hits for the first unit of the instruction.

Table C3-2 shows the conditions for Breakpoint debug event generation by an instruction that comprises more than one unit of memory, assuming that the conditions in the DBGBCR are met and that the instruction is committed for execution.

Table C3-2 Breakpoint debug event generation for instructions of more than one unit of memory

DBGBVR comparison result (a)
First unit (b)   Any subsequent unit (b)   IVA breakpoint type                          Breakpoint debug event generated?
Hit              -                         Any                                          Yes
Miss             Hit                       Regular, included range, or excluded range   UNPREDICTABLE
Miss             Hit                       Step-off                                     No
Miss             Miss                      Any                                          No

a. Taking account of the byte address selection and any address range masking.
b.
Of the instruction whose IVA is being compared.

Effect of instruction length in v6 Debug and v6.1 Debug

If the conditions in the DBGBCR are met, and the instruction is committed for execution, the BRP generates a Breakpoint debug event if the required DBGBVR comparison, taking account of the byte address selection, hits for the first unit of the instruction.

In v6 Debug and v6.1 Debug, it is IMPLEMENTATION DEFINED whether an IVA comparison on an instruction memory unit other than the first unit, following a breakpoint miss on the first unit of the instruction, can cause a Breakpoint debug event.

For Java bytecodes, v6 Debug and v6.1 Debug specify that a BRP comparison on an operand does not generate a Breakpoint debug event. A Breakpoint debug event is generated only if the BRP hits on the opcode. For Java bytecodes the instruction memory unit is a byte, and the opcode is always the first byte of the instruction.

Note
• v6 Debug does not support IVA mismatch breakpoints.
• v6.1 Debug and v6 Debug do not support address range masks on breakpoints.

IVA comparison programming examples

In all Debug architecture versions, a debugger must configure the BRP so that it matches on all bytes of the first unit of the instruction, otherwise the generation of Breakpoint debug events is UNPREDICTABLE.

Before ARMv6T2, on a processor that implements the Thumb instruction set and can take an exception between the two halfwords of a Thumb BL or BLX (immediate) instruction, a debugger must treat the two halfwords as separate instructions, and set breakpoints on both halfwords. This might require two BRPs.

Note
• To ensure compatibility across ARMv6 implementations, a debugger can always treat BL or BLX (immediate) as two instructions when debugging code on an ARMv6 processor before ARMv6T2.
• The examples that follow include setting breakpoints on ThumbEE instructions.
These are supported only in ARMv7.

For example, if BRPn and BRPm are two breakpoint register pairs, then:
• On any ARMv6 or ARMv7 processor:
  - To breakpoint on a Java bytecode at address 0x8001, the debugger must set DBGBVRn to 0x8000 and DBGBCRn[8:5] to 0b0010.
  - To breakpoint on a 16-bit Thumb or ThumbEE instruction starting at address 0x8002, a debugger must set DBGBVRn to 0x8000 and DBGBCRn[8:5] to 0b1100.
  - To breakpoint on an ARM instruction starting at address 0x8004, a debugger must set DBGBVRn to 0x8004 and DBGBCRn[8:5] to 0b1111.
• On an ARMv7 or ARMv6T2 processor, a debugger sets breakpoints on a 32-bit Thumb instruction, or on a 16-bit or a 32-bit ThumbEE instruction, in exactly the same way as on a 16-bit Thumb instruction. For example:
  - To breakpoint on a 16-bit or a 32-bit Thumb or ThumbEE instruction starting at address 0x8000, the debugger must set DBGBVRn to 0x8000 and DBGBCRn[8:5] to 0b0011.
  These are the settings for breakpointing on any Thumb or ThumbEE instruction, including BL and BLX (immediate).
• On an ARMv6 or ARMv6K processor:
  - To breakpoint on a Thumb BL or BLX instruction at address 0x8000, a debugger must set DBGBVRn to 0x8000, and DBGBCRn[8:5] to 0b1111.
  - To breakpoint on a Thumb BL or BLX instruction at address 0x8002, a debugger must set DBGBVRn to 0x8000, DBGBVRm to 0x8004, DBGBCRn[8:5] to 0b1100, and DBGBCRm[8:5] to 0b0011.

Note
When programming DBGBVR for IVA match or mismatch, the debugger must program DBGBVR[1:0] to 0b00, otherwise Breakpoint debug event generation is UNPREDICTABLE.

Context ID comparisons for Debug event generation

A Context ID comparison depends on the value in the DBGBVR matching the Context ID, held in the Context ID Register, when the instruction is committed for execution. The breakpoint types that depend on a Context ID comparison are:
• Unlinked Context ID match
• Linked Context ID match.
When the DBGBCR is programmed for one of these breakpoint types, the debug logic generates a Breakpoint debug event only if all the other conditions for the breakpoint are met, and the Context ID equals the value in the DBGBVR.

In the linked case, the BRP that is programmed for a Context ID match is linked to at least one of:
• a BRP programmed for Linked IVA match or mismatch
• a Watchpoint Register Pair (WRP) programmed for linked Data Virtual Address (DVA) match.

In the linked IVA cases, the debug logic generates a Breakpoint debug event only if all the other conditions for the breakpoint are met, the Context ID comparison is successful, and the IVA comparison in the linked BRP is successful, see IVA comparisons for Debug event generation on page C3-8. See Additional conditions for linked BRPs on page C3-15 for more information.

In the linked DVA case, the debug logic generates a Watchpoint debug event only if all the other conditions for the watchpoint are met, the Context ID comparison is successful, and the DVA comparison in the linked WRP is successful. See Watchpoint debug events on page C3-15 for more information.

Note
• You cannot define a Breakpoint debug event based on a Context ID mismatch.
• You can link a BRP programmed for linked Context ID match to any number of:
  - BRPs programmed for Linked IVA match or mismatch
  - WRPs programmed for Linked DVA match.
  This means you can use a single BRP to define the Context ID match for multiple breakpoints and watchpoints.

Additional considerations for IVA mismatch breakpoints

The following subsections describe additional considerations for IVA mismatch breakpoints:
• Interaction of IVA mismatch breakpoints with other breakpoints and Vector Catch on page C3-14
• Generation of IVA mismatch breakpoints on branch to self instructions on page C3-14.
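Returning to the IVA comparison programming examples earlier in this section, every DBGBVR and Byte address select value in them follows one rule: DBGBVR holds the word-aligned address, so DBGBVR[1:0] is 0b00, and the BAS bits select the bytes of the first unit within that word, with a second BRP when those bytes cross a word boundary. The C sketch below expresses that rule; the struct, the function name and the convention that BAS bit 0 selects the byte at DBGBVR + 0 are assumptions inferred from the examples, not definitions taken from the register descriptions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical: one DBGBVR value and one BAS value to program into a BRP. */
struct brp_setting {
    uint32_t dbgbvr;  /* word-aligned, so DBGBVR[1:0] = 0b00 */
    unsigned bas;     /* DBGBCR[8:5]; bit 0 selects the byte at DBGBVR + 0 */
};

/* Fills out[] with settings that make BRPs match every byte of
 * [addr, addr + nbytes) and returns how many BRPs are needed (1 or 2). */
unsigned brp_settings_for(uint32_t addr, unsigned nbytes,
                          struct brp_setting out[2])
{
    assert(nbytes >= 1u && nbytes <= 4u);
    unsigned count = 0;
    while (nbytes > 0u) {
        unsigned offset = addr & 3u;     /* byte position within the word */
        unsigned room = 4u - offset;     /* bytes left in this word */
        unsigned take = nbytes < room ? nbytes : room;
        out[count].dbgbvr = addr & ~3u;
        out[count].bas = ((1u << take) - 1u) << offset;
        count++;
        addr += take;                    /* continue in the next word */
        nbytes -= take;
    }
    return count;
}
```

Pass the size of the first unit: 1 for a Java bytecode, 2 for any Thumb or ThumbEE instruction on ARMv6T2 or ARMv7 (including 32-bit ones), and 4 for an ARM instruction. Passing 4 for a Thumb BL or BLX on ARMv6 before ARMv6T2 reproduces the examples: one BRP at 0x8000, and two BRPs (0x8000 with 0b1100, 0x8004 with 0b0011) at 0x8002.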
Interaction of IVA mismatch breakpoints with other breakpoints and Vector Catch

When a BRPn is programmed for IVA mismatch and does not generate a Breakpoint debug event because the general conditions specified in DBGBCRn are not met, this does not affect the generation of:
• Breakpoint debug events by other BRPs
• Vector Catch debug events.

Note
In this context, the general conditions specified in DBGBCR not being met means that at least one of the following applies:
• the BRP is not enabled
• the Privileged mode control bits of the DBGBCR do not match the mode of the processor
• DBGBCR is configured for linked Context ID matching but the linked BRP either is not enabled or does not match the current Context ID
• the Security Extensions are implemented, and the Security state control field of DBGBCR does not match the security state of the processor.

However, if the general conditions specified in DBGBCRn are met, and BRPn does not generate a Breakpoint debug event only because the IVA fails the comparison required for an IVA mismatch, then the failure of this comparison can affect the generation of other debug events:
• if any other BRP, BRPm, hits on its required comparison with the IVA and meets the general conditions specified in DBGBCRm, it is UNPREDICTABLE whether BRPm generates a Breakpoint debug event
• if the Vector Catch Register defines a Vector Catch that matches the IVA, it is UNPREDICTABLE whether a Vector Catch debug event is generated.

Generation of IVA mismatch breakpoints on branch to self instructions

This section describes the generation of Breakpoint debug events when the IVA of an instruction that branches to itself misses a BRP programmed for IVA mismatch, and all the general conditions specified in the DBGBCR are met. See the IVA mismatch column of Table C10-11 on page C10-56 for details of when an IVA mismatch comparison misses. In this case:
1.
The first time the instruction is committed for execution the BRP does not generate a Breakpoint debug event.
2. Because the instruction branches to itself, if no exception is generated, the instruction is committed for execution again. On this and any subsequent execution, it is UNPREDICTABLE whether the BRP generates a Breakpoint debug event.

Note
Instructions that branch to themselves include:
• a branch instruction that specifies itself as the branch destination
• a load instruction that loads the PC from a memory location that holds the address of that load instruction.

Additional conditions for linked BRPs

When you link two BRPs to define a single Linked IVA match or mismatch breakpoint, if BRPn defines the IVA match or mismatch and BRPm defines the Context ID match:
• for the DBGBCR fields described in Debug event generation conditions defined by the DBGBCR on page C3-7, you must program DBGBCRn and DBGBCRm as follows:
  - in DBGBCRn, program the Security state control and Privileged mode control fields to define the required conditions for Debug event generation
  - in DBGBCRm, program the Security state control field to 0b00, and the Privileged mode control field to 0b11
• you must program the Linked BRP number field:
  - of DBGBCRn with the value of m
  - of DBGBCRm to zero
• you must program the DBGBVR meaning field:
  - of DBGBCRn for Linked IVA match or mismatch
  - of DBGBCRm for Linked Context ID match
• BRPm must support Context ID comparisons.

Breakpoint debug event generation is UNPREDICTABLE if you do not meet all these conditions.

You must also set the Breakpoint enable bits in DBGBCRn and DBGBCRm to 1, to enable both BRPs.

Note
If you fail to enable either or both of the BRPs, BRPn never generates any Breakpoint debug events.
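The linked-pair rules above read naturally as a checklist. The sketch below applies them to a hypothetical decoded view of the two DBGBCRs rather than to raw register values, because this part of the section does not give bit positions for every field involved; the enum deliberately does not claim the architectural encodings of the DBGBVR meaning field. It returns false both when generation would be UNPREDICTABLE and when either BRP is disabled, so it only answers whether BRPn can generate Breakpoint debug events as intended.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical decoded DBGBVR meaning values (not the architectural encodings). */
enum bvr_meaning {
    UNLINKED_IVA_MATCH,
    LINKED_IVA_MATCH,
    UNLINKED_CONTEXT_ID,
    LINKED_CONTEXT_ID,
    UNLINKED_IVA_MISMATCH,
    LINKED_IVA_MISMATCH
};

/* Hypothetical decoded view of the DBGBCR fields used by the linking rules. */
struct bcr_view {
    bool enable;              /* Breakpoint enable */
    unsigned pmc;             /* Privileged mode control, 2 bits */
    unsigned ssc;             /* Security state control, 2 bits */
    unsigned linked_brp;      /* Linked BRP number */
    enum bvr_meaning meaning; /* DBGBVR meaning */
};

/* True if BRPn (IVA part) linked to BRPm (Context ID part) is programmed so
 * that BRPn can generate Breakpoint debug events. The SSC and PMC of BRPn
 * are not checked: they define the conditions for event generation. */
bool linked_pair_usable(const struct bcr_view *bcr_n, unsigned m,
                        const struct bcr_view *bcr_m, bool brp_m_has_cid)
{
    assert(bcr_n && bcr_m);
    bool n_ok = (bcr_n->meaning == LINKED_IVA_MATCH ||
                 bcr_n->meaning == LINKED_IVA_MISMATCH) &&
                bcr_n->linked_brp == m;
    bool m_ok = bcr_m->meaning == LINKED_CONTEXT_ID &&
                bcr_m->ssc == 0u && bcr_m->pmc == 3u &&
                bcr_m->linked_brp == 0u && brp_m_has_cid;
    /* A rule violation makes generation UNPREDICTABLE; a disabled BRP means
       BRPn never generates events. Both are reported as "not usable". */
    return n_ok && m_ok && bcr_n->enable && bcr_m->enable;
}
```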
For more information see Linked comparisons on page C10-59.

C3.2.2 Watchpoint debug events

A Watchpoint debug event is defined by a pair of registers described as a Watchpoint Register Pair (WRP), comprising a Watchpoint Control Register (DBGWCR) and a Watchpoint Value Register (DBGWVR). WRPs, DBGWCRs, and DBGWVRs are numbered upwards from 0, with WRPn comprising DBGWCRn and DBGWVRn. For details of the Watchpoint registers see:
• Watchpoint Control Registers (DBGWCR) on page C10-61
• Watchpoint Value Registers (DBGWVR) on page C10-60.

The DBGDIDR.WRPs field specifies the number of WRPs implemented, see Debug ID Register (DBGDIDR) on page C10-3.

A WRP can be linked to a BRP, to define a single watchpoint event. The WRP holds a virtual address for comparison, and the BRP holds a Context ID value. For more information, see Linked comparisons on page C10-59.

A Watchpoint debug event is defined based on comparisons of a Data Virtual Address (DVA) with the value held in a DBGWVR. See Memory addresses on page C3-23 for the definition of a DVA.

For a given Watchpoint Register Pair, WRPn, a Watchpoint debug event occurs when all of the following are true:
• The watchpoint is enabled, in DBGWCRn.
• The DVA matches the value in DBGWVRn.
• When the processor tests the WRP, all the conditions of DBGWCRn are met.
• If linking is enabled in DBGWCRn, when the processor tests the WRP, the Linked Context ID matching BRP, BRPm, meets the following conditions:
  - the BRP is enabled, in DBGBCRm
  - the value held in DBGBVRm matches the Context ID held in the CONTEXTIDR.
  For more information about BRPs see Breakpoint debug events on page C3-5.
• The instruction that initiated the memory access is committed for execution. A Watchpoint debug event is generated only if the instruction passes its condition code check.
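The conditions listed above combine with a logical AND, and the Context ID check applies only when linking is enabled. The following C sketch expresses that; the struct and its field names are hypothetical stand-ins for the state the processor examines, and the DVA comparison itself is assumed to have been done already and passed in as `dva_matches`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical snapshot of the state the watchpoint conditions depend on. */
struct wp_state {
    bool wcr_enabled;        /* watchpoint enabled in DBGWCRn */
    bool dva_matches;        /* DVA matches DBGWVRn (compared elsewhere) */
    bool wcr_conditions_met; /* all other DBGWCRn conditions met */
    bool linked;             /* linking enabled in DBGWCRn */
    bool bcr_m_enabled;      /* linked BRPm enabled in DBGBCRm */
    uint32_t dbgbvr_m;       /* Context ID value held in DBGBVRm */
    uint32_t contextidr;     /* current CONTEXTIDR value */
    bool committed;          /* access instruction committed for execution */
    bool cond_passed;        /* instruction passed its condition code check */
};

/* True if every listed condition holds, so a Watchpoint debug event occurs. */
bool watchpoint_event(const struct wp_state *s)
{
    assert(s);
    if (!s->wcr_enabled || !s->dva_matches || !s->wcr_conditions_met)
        return false;
    /* The Context ID check applies only when linking is enabled. */
    if (s->linked && !(s->bcr_m_enabled && s->dbgbvr_m == s->contextidr))
        return false;
    return s->committed && s->cond_passed;
}
```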
Note
A watchpoint match does not require the access to match the watched address exactly. A match is generated on any access to any watched byte or bytes. For example, a match is generated on an unaligned word access that includes a byte that is being watched, even when the watched byte is not in the same word as the start address of the unaligned word.

All instructions that are defined as memory access instructions can generate Watchpoint debug events. For information about which instructions are memory accesses see Alphabetical list of instructions on page A8-14. Watchpoint debug event generation can be conditional on whether the memory access is a load access or a store access.

For a Store-Exclusive instruction, if the target address of the instruction would generate a Watchpoint debug event, but the check of whether the Store-Exclusive operation has control of the exclusive monitors returns FALSE, then it is IMPLEMENTATION DEFINED whether the processor generates the Watchpoint debug event.

For each of the memory hint instructions, PLD and PLI, it is IMPLEMENTATION DEFINED whether the instruction generates Watchpoint debug events. If either or both of the PLD and PLI instructions normally generate Watchpoint debug events, the behavior must be:
• For the PLI instruction:
  - no watchpoint is generated in a situation where, if the instruction was a real fetch rather than a hint, the real fetch would generate a Prefetch Abort exception
  - in all other situations a Watchpoint debug event is generated.
• For the PLD instruction:
  - no watchpoint is generated in a situation where, if the instruction was a real memory access rather than a hint, the real memory access would generate a Data Abort exception
  - in all other situations a Watchpoint debug event is generated.
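The Note above says a watchpoint hits on any access that touches a watched byte, even an unaligned access that starts in a different word. A minimal sketch of that overlap test follows. It assumes the watched bytes are described as a hypothetical word address plus a 4-bit byte mask (bit i selects the byte at word + i); this section does not describe how DBGWCR actually selects bytes, so treat the representation as illustrative.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if an access of `size` bytes starting at `addr` touches any watched
 * byte. Watched bytes are modelled (hypothetically) as a word address plus
 * a 4-bit byte mask, bit i selecting the byte at watched_word + i. */
bool access_hits_watched_bytes(uint32_t addr, unsigned size,
                               uint32_t watched_word, unsigned byte_mask)
{
    assert((watched_word & 3u) == 0u && byte_mask <= 0xFu);
    for (unsigned i = 0; i < size; i++) {
        uint32_t byte = addr + i;  /* wraps modulo 2^32 */
        if ((byte & ~3u) == watched_word && ((byte_mask >> (byte & 3u)) & 1u))
            return true;
    }
    return false;
}
```

For example, an unaligned word access at 0x1002 covers bytes 0x1002 to 0x1005, so it hits a watch on the single byte at 0x1004 even though that byte lies in the next word.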
When watchpoint generation is conditional on the type of memory access, a memory hint instruction is treated as generating a load access.

It is IMPLEMENTATION DEFINED whether the following cache maintenance operations generate Watchpoint debug events:
• Clean data or unified cache line by MVA to PoU, DCCMVAU
• Clean data or unified cache line by MVA to PoC, DCCMVAC
• Invalidate data or unified cache line by MVA to PoC, DCIMVAC
• Invalidate instruction cache line by MVA to PoU, ICIMVAU
• Clean and Invalidate data or unified cache line by MVA to PoC, DCCIMVAC.

When Watchpoint debug event generation by these cache maintenance operations is implemented, the behavior must be:
• the cache maintenance operation must generate a Watchpoint debug event on a DVA match, regardless of whether the data is stored in any cache
• when watchpoint generation is conditional on the type of memory access, a cache maintenance operation is treated as generating a store access.

For regular data accesses, the size of the access is considered when determining whether a watched byte is being accessed. The size of the access is IMPLEMENTATION DEFINED for:
• memory hint instructions, PLD and PLI
• cache maintenance operations.

Watchpoint debug events are precise and can be synchronous or asynchronous:
• a synchronous Watchpoint debug event acts like a synchronous abort exception on the memory access instruction itself
• an asynchronous Watchpoint debug event acts like a precise asynchronous abort exception that cancels a later instruction.
For more information, see Synchronous and Asynchronous Watchpoint debug events on page C3-18.

For the ordering of debug events, ARMv7 requires that:
• Regardless of the actual ordering of memory accesses, Watchpoint debug events must be taken in program order. See Debug event prioritization on page C3-43.
• Watchpoint debug events must behave as if the processor tested for any possible Watchpoint debug event before the memory access was observed, regardless of whether the Watchpoint debug event is synchronous or asynchronous. See Generation of debug events on page C3-40.

Synchronous and Asynchronous Watchpoint debug events

ARMv7 permits watchpoints to be either synchronous or asynchronous. An implementation can implement synchronous watchpoints, asynchronous watchpoints, or both. It is IMPLEMENTATION DEFINED under what circumstances a watchpoint is synchronous or asynchronous.

ARMv6 only permits asynchronous watchpoints.

Synchronous Watchpoint debug events

A synchronous Watchpoint debug event acts like a synchronous abort:
• The debug event occurs before any following instructions or exceptions have altered the state of the processor.
• The value in the base register for the memory access is not updated.
  Note
  The Base Updated Abort Model is not permitted in ARMv7.
• If the instruction was a register load, the data returned is marked as invalid and:
  - if the instruction was a single register load, the destination is not updated
  - if the instruction loaded multiple registers, the values in the destination registers, other than the PC and base register, are UNKNOWN.
• If the instruction is a coprocessor load, the values left in the coprocessor registers are UNKNOWN.
• If the instruction is a store, the content of the memory location written to is unchanged.

When invasive debug is enabled and Monitor debug-mode is selected, if Watchpoint debug events are permitted, a synchronous Watchpoint debug event generates a synchronous Data Abort exception.

On a synchronous Watchpoint debug event, the DBGDSCR.MOE field is set to Synchronous Watchpoint occurred.
When an instruction that causes multiple memory operations is addressing Device or Strongly-ordered memory, if a synchronous Watchpoint debug event is signaled by a memory operation other than the first operation of the instruction, the memory access rules might not be maintained. Examples of instructions that cause multiple memory operations are the LDM and LDC instructions.

For example, if the second memory operation of an STM instruction signals a synchronous Watchpoint debug event, then when the instruction is re-tried following processing of the debug event, the first memory operation is repeated. This behavior is not normally permitted for accesses to Device or Strongly-ordered memory.

To avoid this circumstance, debuggers must not set watchpoints on addresses in regions of Device or Strongly-ordered memory that might be accessed in this way. The address range masking features of watchpoints can be used to set a watchpoint on an entire region, ensuring the synchronous Watchpoint debug event is taken on the first operation of such an instruction.

Asynchronous Watchpoint debug events

An asynchronous Watchpoint debug event acts like a precise asynchronous abort. Its behavior is:
• The watchpointed instruction must have completed, and other instructions that followed it, in program order, might have completed. For more information, see Recognizing asynchronous Watchpoint debug events.
• The watchpoint must be taken before any exceptions that occur in program order after the watchpoint is triggered.
• All the registers written by the watchpointed instruction are updated.
• Any memory accessed by the watchpointed instruction is updated.

When invasive debug is enabled and Monitor debug-mode is selected, if Watchpoint debug events are permitted, an asynchronous Watchpoint debug event generates a precise asynchronous Data Abort exception.
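For the Device or Strongly-ordered case described earlier, a debugger that watches a whole region needs the smallest naturally aligned block that covers every byte of it. The helper below is a sketch with a hypothetical name: it returns how many low address bits must be ignored so that one aligned block contains all of [base, base + len). How that count is encoded in the watchpoint registers is outside this section, so the helper stops at the arithmetic.

```c
#include <assert.h>
#include <stdint.h>

/* Smallest number of low address bits to ignore so that one naturally
 * aligned block covers every byte of [base, base + len). A result of 32
 * means only the whole address space covers the region. */
unsigned covering_mask_bits(uint32_t base, uint32_t len)
{
    assert(len >= 1u && len - 1u <= UINT32_MAX - base);  /* no wrap-around */
    uint32_t last = base + len - 1u;
    unsigned bits = 0;
    /* Grow the ignored bits until the first and last byte share a block. */
    while (bits < 32u && (base >> bits) != (last >> bits))
        bits++;
    return bits;
}
```

When the result `bits` is below 32, the aligned block starts at `base & ~((1u << bits) - 1u)`. Note that a region straddling a large power-of-two boundary needs a much larger block than its length suggests: 0x100 bytes at 0x40001080 need 9 ignored bits, not 8.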
An asynchronous Watchpoint debug event is not an abort and is not affected by architectural rules about aborts, including the rules about external aborts and asynchronous aborts. An asynchronous Watchpoint debug event:
• is not affected by the SCR.EA bit
• is not ignored when the CPSR.A bit is set to 1.

On an asynchronous Watchpoint debug event, the DBGDSCR.MOE field is set to Asynchronous Watchpoint occurred.

Recognizing asynchronous Watchpoint debug events

When an instruction that consists of multiple memory operations is accessing Device or Strongly-ordered memory, and an asynchronous Watchpoint debug event is signaled by a memory operation other than the first operation of the instruction, the debug event must not cause Debug state entry or a debug exception until all the operations have completed. This ensures the memory access rules for Device and Strongly-ordered memory are preserved. Examples of instructions that cause multiple memory operations are the LDM and LDC instructions.

Note
To understand why the architecture does not permit the asynchronous Watchpoint debug event to be taken before the watchpointed instruction completes, consider an LDM instruction accessing Device or Strongly-ordered memory, with an asynchronous Watchpoint debug event signaled after the first word of memory is accessed. If the debug event was taken immediately, the LDM would be re-executed on return from the event handler. This would cause a new access to the first word of memory, breaking the rule that, for Device or Strongly-ordered memory, each memory operation of an instruction is issued precisely once.

C3.2.3 BKPT Instruction debug events

A BKPT Instruction debug event occurs when a BKPT instruction is committed for execution. BKPT is an unconditional instruction.

BKPT Instruction debug events are synchronous.
That is, the debug event acts like an exception that cancels the BKPT instruction.

For details of the BKPT instruction and its encodings in the ARM and Thumb instruction sets see BKPT on page A8-56.

C3.2.4 Vector Catch debug events

The Vector Catch Register (DBGVCR) controls Vector Catch debug events, see Vector Catch Register (DBGVCR) on page C10-67.

A Vector Catch debug event occurs when:
• The IVA of an instruction matches a vector address for the current security state. See Memory addresses on page C3-23 for a definition of the IVA.
• When the processor tests for the possible vector catch, the corresponding bit of the DBGVCR is set to 1, indicating that vector catch is enabled.
• The instruction is committed for execution. The debug event is generated whether the instruction passes or fails its condition code check.

If all the conditions for a Vector Catch debug event are met, the processor generates the event regardless of the mode in which it is executing.

The processor must test for any possible Vector Catch debug events before it executes the instruction.

If the Security Extensions are not implemented, the debug logic uses only one set of vector addresses to generate Vector Catch debug events, and these are called the Local vector addresses.

If the Security Extensions are implemented, the debug logic uses three sets of vector addresses to generate Vector Catch debug events:
• One set for exceptions taken in the Non-secure exception modes. These are called the Non-secure Local vector addresses.
• One set for exceptions taken in the Secure exception modes other than Monitor mode. These are called the Secure Local vector addresses.
• One set for exceptions taken in Monitor mode. These are called the Monitor vector addresses.
You enable vector catch independently for each of these vector addresses, by setting a bit in the DBGVCR to 1, see Vector Catch Register (DBGVCR) on page C10-67.

If the Security Extensions are not implemented, the debug logic determines whether to generate a Vector Catch debug event by comparing every instruction fetch with the Local vector addresses.

If the Security Extensions are implemented, the debug logic determines whether to generate a Vector Catch debug event by comparing every Secure instruction fetch with the Secure Local and Monitor vector addresses, and by comparing every Non-secure instruction fetch with the Non-secure Local vector addresses.

Note
Any instruction fetched from an exception vector address and committed for execution triggers a Vector Catch debug event if the appropriate bit in the DBGVCR is set to 1. Testing for possible Vector Catch debug events does not check whether the instruction is executed as a result of an exception entry.

Whether a Vector Catch debug event is generated for an instruction is UNPREDICTABLE if either:
• The exception vector address is word-aligned and one of the following applies:
  - the first unit of the instruction is in the word at the exception vector address but is not at the exception vector address
  - the first unit of the instruction is not in the word at the exception vector address but another unit of the instruction is in that word. This can occur when the processor is executing a variable-length instruction set, that is, in Thumb, ThumbEE or Jazelle state.
• The exception vector address is not word-aligned but is halfword-aligned and one of the following applies:
  - The first unit of the instruction is in the halfword at the exception vector address but is not at the exception vector address. This can occur only in Jazelle state, where instructions consist of one or more byte-sized units.
  - The first unit of the instruction includes the halfword at the exception vector address but is not at the exception vector address. This can occur only in ARM state, where all instructions are a single word and are word-aligned.
  - The first unit of the instruction is not in the halfword at the exception vector address but another unit of the instruction is in that halfword. This can occur in variable-length instruction set states, that is, in Thumb, ThumbEE or Jazelle state.

Note
Normally, exception vector addresses must be word-aligned. However, when SCTLR.VE == 1, enabling vectored interrupt support, the exception vector address for one or both of the IRQ and FIQ vectors might not be word-aligned. Support for exception vector addresses that are not word-aligned is IMPLEMENTATION DEFINED, see Vectored interrupt support on page B1-32.

If Monitor debug-mode is selected and enabled, and the vector is either the Prefetch Abort vector or the Data Abort vector, the debug event is:
• UNPREDICTABLE in v7 Debug
• ignored in v6 Debug and v6.1 Debug.

Vector Catch debug events are synchronous. That is, the debug event acts like an exception that cancels the instruction at the caught vector. When invasive debug is enabled and Monitor debug-mode is selected, if Vector Catch debug events are permitted, a Vector Catch debug event generates a Prefetch Abort exception. For more information, see Generation of debug events on page C3-40.

Note
A Vector Catch debug event is taken only when the instruction is committed for execution and therefore might not be taken if another exception occurs, see Debug event prioritization on page C3-43.

For more information, see Vector Catch Register (DBGVCR) on page C10-67.
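The UNPREDICTABLE cases listed above collapse into one test, provided each unit of an instruction is aligned to its own size (as ARM words, Thumb and ThumbEE halfwords and Jazelle bytes are). The event is generated when the first unit is at the vector address; it is UNPREDICTABLE when the first unit is elsewhere but some byte of the instruction falls in the word (word-aligned vector) or halfword (halfword-aligned vector) at the vector address; otherwise no event is generated. The sketch below assumes the vector catch is enabled in DBGVCR and the instruction is committed for execution, ignores the Prefetch Abort and Data Abort special case in Monitor debug-mode, and uses hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

enum vc_result { VC_NO_EVENT, VC_EVENT, VC_UNPREDICTABLE };

/* vector: exception vector address (word- or halfword-aligned).
 * first:  address of the first (lowest-addressed) unit of the instruction.
 * length: total instruction length in bytes (1 to 4).
 * Assumes the vector catch is enabled and the instruction is committed. */
enum vc_result vector_catch_result(uint32_t vector, uint32_t first,
                                   unsigned length)
{
    assert((vector & 1u) == 0u);
    assert(length >= 1u && length <= 4u);
    if (first == vector)
        return VC_EVENT;

    /* The region checked is the word at a word-aligned vector, otherwise the
     * halfword at a halfword-aligned vector. 64-bit arithmetic avoids wrap. */
    uint64_t region = (vector & 3u) == 0u ? 4u : 2u;
    uint64_t v = vector, f = first;
    int overlaps = f < v + region && v < f + length;
    return overlaps ? VC_UNPREDICTABLE : VC_NO_EVENT;
}
```

For example, a 32-bit Thumb instruction at 0x0E with a vector at 0x10 gives VC_UNPREDICTABLE, because its second halfword lies in the word at the vector address, while a 16-bit Thumb instruction at 0x0E gives VC_NO_EVENT.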
Vector catch debug events and vectored interrupt support

The ARM architecture provides support for vectored interrupts, where an interrupt controller provides the interrupt vector address directly to the processor. The mechanism for defining the vectors is IMPLEMENTATION DEFINED. You enable the use of vectored interrupts by setting the SCTLR.VE bit to 1. For more information see Vectored interrupt support on page B1-32.

Vectored interrupt support affects Vector Catch debug event generation for the IRQ and FIQ exception vectors. These two vectors are described as the interrupt vectors. The details of Vector Catch debug event generation on the interrupt vectors depend on whether the Security Extensions are implemented:

If the Security Extensions are not implemented
• If the SCTLR.VE bit is set to 0, then the Local vector addresses for IRQ and FIQ vector catch are determined by the exception base address.
• If the SCTLR.VE bit is set to 1, then the Local vector address for an IRQ or FIQ vector catch is the interrupt vector address supplied by the interrupt controller on taking the interrupt.

If the Security Extensions are implemented
The Secure Local and Non-secure Local vector addresses for IRQ and FIQ vector catch are determined by the appropriate banked copy of the SCTLR.VE bit:
• If the SCTLR.VE bit is set to 0, then the corresponding Local vector addresses for IRQ and FIQ vector catch are determined by the banked exception base address.
• If the SCTLR.VE bit is set to 1, then for each of IRQ and FIQ vector catch:
  - if the interrupt is taken in Secure or Non-secure IRQ mode or FIQ mode, then the corresponding Local vector address is the interrupt vector address supplied by the interrupt controller on taking the interrupt.
  - if the interrupt is taken in Monitor mode, then it is IMPLEMENTATION DEFINED whether the IRQ and FIQ Vector Catch debug events generated from the Local vector addresses can occur, and if they can occur the Secure and Non-secure Local vector addresses for the vector catches are IMPLEMENTATION DEFINED.
The Monitor vector addresses for IRQ and FIQ vector catch are determined by the Monitor exception base address.

When the Vector Catch debug logic uses addresses supplied by the interrupt controller, then:
• if the interrupt controller has not supplied an interrupt address to the processor since vectored interrupt support was enabled, then no Vector Catch debug events using Local vector addresses are generated
• if Vector Catch debug events were not enabled when the interrupt controller supplied a vector address to the processor, but have been enabled since, an implementation must consistently either:
  - generate a Vector Catch debug event if the IVA of an instruction matches the Local vector address
  - not generate Vector Catch debug events using any Local vector address.

C3.2.5 Memory addresses

On processors that implement the Virtual Memory System Architecture (VMSA), and also implement the Fast Context Switch Extension (FCSE):
• It is IMPLEMENTATION DEFINED whether the Instruction Virtual Address (IVA) used in generating Breakpoint debug events is the Modified Virtual Address (MVA) or Virtual Address (VA) of the instruction.
• It is IMPLEMENTATION DEFINED whether the Data Virtual Address (DVA) used in generating Watchpoint debug events is the MVA or VA of the data access.
• The IVA used in generating Vector Catch debug events is always the VA of the instruction.
• The Watchpoint Fault Address Register (DBGWFAR) reads a VA plus an offset that depends on the processor instruction set state.
• The Program Counter Sampling Register (DBGPCSR), if implemented, reads a VA plus an offset that depends on the processor instruction set state.
Note
The FCSE is optional in ARMv7, and ARM deprecates use of the FCSE.

On processors that implement the VMSA and do not implement the FCSE:
• The IVA used in generating Breakpoint debug events is the VA of the instruction.
• The DVA used in generating Watchpoint debug events is the VA of the data access.
• The IVA used in generating Vector Catch debug events is the VA of the instruction.
• The DBGWFAR reads a VA plus an offset that depends on the processor instruction set state.
• The DBGPCSR reads a VA plus an offset that depends on the processor instruction set state.

On processors that implement the Protected Memory System Architecture (PMSA), the Virtual Address is identical to the Physical Address (PA), and therefore:
• The IVA used in generating Breakpoint debug events is the PA of the instruction.
• The DVA used in generating Watchpoint debug events is the PA of the data access.
• The IVA used in generating Vector Catch debug events is the PA of the instruction.
• The DBGWFAR reads a PA plus an offset that depends on the processor instruction set state.
• The DBGPCSR reads a PA plus an offset that depends on the processor instruction set state.

For more information about the DBGWFAR, see:
• Effects of debug exceptions on CP15 registers and the DBGWFAR on page C4-4
• Effect of entering Debug state on CP15 registers and the DBGWFAR on page C5-4
• Watchpoint Fault Address Register (DBGWFAR) on page C10-28.

For more information about the DBGPCSR, see Program Counter sampling on page C8-2 and Program Counter Sampling Register (DBGPCSR) on page C10-38.
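To make the VA/MVA distinction concrete, the sketch below models the FCSE relocation, assuming the usual FCSE rule: a VA whose bits [31:25] are all zero is relocated by the process ID held in FCSEIDR[31:25], and any other VA passes through unchanged. The function names fcse_mva and breakpoint_matches are hypothetical, and the breakpoint comparison ignores byte lane selection, masking and context checks. It shows why a breakpoint programmed with a VA can fail to match on an implementation that compares MVAs.

```python
MASK32 = 0xFFFFFFFF
PID_MASK = 0xFE000000  # FCSEIDR[31:25]: the FCSE process ID field


def fcse_mva(va: int, fcseidr: int) -> int:
    """Return the Modified Virtual Address for a 32-bit VA (sketch)."""
    va &= MASK32
    if (va & PID_MASK) == 0:
        # VA in the bottom 32MB: relocated into the 32MB slot chosen by the PID.
        return va | (fcseidr & PID_MASK)
    # Any other VA is passed through unchanged.
    return va


def breakpoint_matches(bvr: int, va: int, fcseidr: int, compare_mva: bool) -> bool:
    """Compare a word-aligned breakpoint value with the IVA of the instruction at `va`.

    `compare_mva` stands for the IMPLEMENTATION DEFINED choice between MVA and VA.
    """
    iva = fcse_mva(va, fcseidr) if compare_mva else (va & MASK32)
    return (iva & ~0x3) == bvr


# Process ID 1 is running; the instruction is at VA 0x8000.
fcseidr = 1 << 25
print(hex(fcse_mva(0x8000, fcseidr)))  # 0x2008000
```

With a process ID of zero the MVA equals the VA, so the IMPLEMENTATION DEFINED choice only matters while the FCSE is relocating addresses.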
C3.2.6 UNPREDICTABLE behavior on Software debug events

In ARMv6, the following events are ignored if Monitor debug-mode is configured, because they could lead to an unrecoverable state:
• Vector Catch debug events on the Prefetch Abort and Data Abort vectors
• Unlinked Context ID Breakpoint debug events, if the processor is running in a privileged mode
• Linked or Unlinked Instruction Virtual Address mismatch Breakpoint debug events, if the processor is running in a privileged mode.

In ARMv7, if Monitor debug-mode is configured, the generation of the following events is UNPREDICTABLE and can lead to an unrecoverable state:
• Vector Catch debug events on the Prefetch Abort and Data Abort vectors
• Unlinked Context ID Breakpoint debug events that are configured to be generated in any mode, or to be generated only in privileged modes
• Linked or Unlinked Instruction Virtual Address mismatch Breakpoint debug events that are configured to be generated in any mode, or to be generated only in privileged modes.

When Monitor debug-mode is configured, debuggers must avoid these cases by restricting the programming of the debug event control registers:
• DBGVCR[28,27,12,11,4,3] must be programmed as zero, see Vector Catch Register (DBGVCR) on page C10-67.
• The permitted values of the Privileged Mode control bits, DBGBCR[2:1], must be restricted in the following cases:
  — if DBGBCR[22:20] is set to 0b010, selecting an Unlinked Context ID breakpoint
  — if DBGBCR[22:20] is set to 0b100 or 0b101, selecting an IVA mismatch breakpoint.
  For these cases, DBGBCR[2:1] must be programmed to one of:
  — 0b00, selecting match only in User, Supervisor or System mode
  — 0b10, selecting match only in User mode.
  See Debug exceptions in abort handlers for additional points that must be considered before using the 0b00 setting.
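A debugger could check the restrictions above before enabling Monitor debug-mode. The sketch below is a minimal, hypothetical validator: register values are plain integers, the helper names are illustrative, and only the mandatory restrictions listed above are checked (not the additional abort-handler considerations).

```python
FORBIDDEN_VCR_BITS = (3, 4, 11, 12, 27, 28)  # abort vector catches

RESTRICTED_TYPES = {                          # DBGBCR[22:20] encodings
    0b010: "Unlinked Context ID",
    0b100: "Unlinked IVA mismatch",
    0b101: "Linked IVA mismatch",
}
SAFE_MODE_CONTROL = (0b00, 0b10)              # DBGBCR[2:1]: USR/SVC/SYS, or USR only


def field(value: int, hi: int, lo: int) -> int:
    """Extract bits [hi:lo] of a register value."""
    return (value >> lo) & ((1 << (hi - lo + 1)) - 1)


def monitor_mode_violations(dbgvcr: int, dbgbcrs: list) -> list:
    """List the Monitor debug-mode programming restrictions that are broken."""
    problems = []
    for bit in FORBIDDEN_VCR_BITS:
        if (dbgvcr >> bit) & 1:
            problems.append(f"DBGVCR[{bit}] must be 0")
    for n, bcr in enumerate(dbgbcrs):
        bp_type = field(bcr, 22, 20)
        if bp_type in RESTRICTED_TYPES and field(bcr, 2, 1) not in SAFE_MODE_CONTROL:
            problems.append(
                f"DBGBCR{n}: {RESTRICTED_TYPES[bp_type]} breakpoint needs "
                "DBGBCR[2:1] = 0b00 or 0b10"
            )
    return problems


# An enabled Unlinked Context ID breakpoint set to match in privileged modes only:
bad = (0b010 << 20) | (0b01 << 1) | 1
print(monitor_mode_violations(1 << 3, [bad]))
```

Unlinked IVA match breakpoints (DBGBCR[22:20] = 0b000) are not restricted, so any DBGBCR[2:1] value passes for them.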
For details of programming the DBGBCR, see Breakpoint Control Registers (DBGBCR) on page C10-49.

If these restrictions are not followed, processor behavior on a resulting debug event is UNPREDICTABLE.

When the Security Extensions are implemented, Vector Catch debug events on the Secure Monitor Call vector are not ignored and are not UNPREDICTABLE. However, DBGVCR[10] is normally also programmed as zero, see Monitor debug-mode vector catch on Secure Monitor Call on page C3-26.

Debug exceptions in abort handlers

The previous section indicated that, in ARMv7, a debugger might set DBGBCR[2:1] to 0b00, match in User, Supervisor and System modes, to avoid the possibility of reaching an unrecoverable state in the Unlinked Context ID and IVA mismatch breakpoint cases when Monitor debug-mode is selected. However, DBGBCR[2:1] must only be programmed to 0b00 if you are confident that the abort handler will not switch to one of these modes before saving context that might be corrupted by an additional debug event. The context that might be corrupted by such an event includes LR_abt, SPSR_abt, IFAR, DFAR, and DFSR. It is unlikely that an abort handler would switch to User mode to process an abort before saving these registers, so setting DBGBCR[2:1] to 0b10, match only in User mode, is safer.

Also, take care when setting a Breakpoint or BKPT Instruction debug event inside a Prefetch Abort or Data Abort handler, or when setting a Watchpoint debug event on a data address that might be accessed by any of these handlers. In general, Breakpoint or BKPT Instruction debug events must be set inside an abort handler only at a point after the context that a debug event would corrupt has been saved. Breakpoint debug events in code that might be run by an abort handler can be avoided by setting DBGBCR[2:1] to 0b00 or 0b10, as appropriate.
Watchpoint debug events in abort handlers can be avoided by setting DBGWCR[2:1] for the watchpoint to 0b10, match only unprivileged accesses, if the code being debugged is not running in a privileged mode.

If these guidelines are not followed, a debug event might occur before the handler has saved the context of the abort, causing the context to be overwritten. This loss of context results in UNPREDICTABLE software behavior. The context that might be corrupted by such an event includes LR_abt, SPSR_abt, IFAR, DFAR, and DFSR.

Debug events in the debug monitor

Because debug exceptions generate Data Abort or Prefetch Abort exceptions, the precautions outlined in the section Debug exceptions in abort handlers on page C3-25 also apply to debug monitors. The suggested settings for breakpoints and watchpoints that can avoid taking debug exceptions in a Data Abort handler can be used to avoid taking debug exceptions in the debug monitor.

In addition, particularly on ARMv7 processors that do not implement the Extended CP14 interface, and particularly those that implement synchronous Watchpoint debug events, when Monitor debug-mode is enabled debuggers must avoid:
• setting Watchpoint debug events on the addresses of debug registers
• setting Breakpoint and Vector Catch debug events on the addresses of instructions in the debug monitor.

In particular, it is unwise to set a watchpoint on the address of the Watchpoint Control Register (DBGWCR) for that watchpoint, or to set a breakpoint on the address of an instruction that disables the breakpoint. The section Generation of debug events on page C3-40 identifies two problem cases:
• A write to the DBGWCR for a watchpoint set on the address of that DBGWCR, to disable that watchpoint, triggers the watchpoint.
  In this case:
  — if watchpoints are asynchronous, the write to the DBGWCR still takes place and the watchpoint is disabled. The debug software must then deal with the re-entrant debug exception.
  — if watchpoints are synchronous, the value in the DBGWCR after the watchpoint is signaled is unchanged, and the debug event is left enabled.
• An instruction that disables a breakpoint on that instruction triggers the breakpoint. In this case, the debug exception is taken before the debug event is disabled.

In both of these cases it might be impossible to recover.

Monitor debug-mode vector catch on Secure Monitor Call

Debuggers must be cautious about programming a Vector Catch debug event on the Secure Monitor Call (SMC) vector when Monitor debug-mode is configured. If such an event is programmed, the following sequence can occur:
1. Non-secure code executes an SMC instruction.
2. The processor takes the SMC exception, branching to the Monitor vector in Monitor mode. The SCR.NS bit is 1, indicating that the SMC originated in the Non-secure state.
3. The Vector Catch debug event is taken. Although SCR.NS is 1, the processor is in the Secure state because it is in Monitor mode.
4. The processor jumps to the Secure Prefetch Abort vector, and sets SCR.NS to 0.
   Note
   Aborts taken in Secure state cause SCR.NS to be set to 0.
5. The Secure Prefetch Abort handler can tell that a Vector Catch debug event occurred, and can determine the address of the SMC instruction from LR_mon. However, it cannot determine whether that is a Secure or Non-secure address.

Therefore, ARM recommends that you do not program a Vector Catch debug event on the SMC vector when Monitor debug-mode is enabled.

Note
This is not a security issue, because the sequence given here can only occur if SPIDEN is HIGH.
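The loss of information in steps 1 to 5 can be illustrated with a small model of what the Secure Prefetch Abort handler observes. This is a sketch under stated assumptions: the names AbortHandlerView and smc_vector_catch are hypothetical, the only state modeled is SCR.NS and the SMC address recoverable from LR_mon, and the Secure-origin case assumes SCR.NS is 0 when Secure code executes the SMC.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AbortHandlerView:
    """The state the Secure Prefetch Abort handler can inspect (simplified)."""
    scr_ns: int
    smc_address: int  # recoverable from LR_mon, as step 5 states


def smc_vector_catch(smc_address: int, from_non_secure: bool) -> AbortHandlerView:
    # Steps 1-2: the SMC is taken to Monitor mode. SCR.NS is 1 when the SMC came
    # from Non-secure code; the model assumes 0 for Secure code.
    scr_ns = 1 if from_non_secure else 0
    # Step 3: Monitor mode is always Secure, so the vector catch is taken in the
    # Secure state whatever the value of SCR.NS.
    # Step 4: the Prefetch Abort is taken in Secure state, which sets SCR.NS to 0,
    # overwriting the only record of where the SMC came from.
    scr_ns = 0
    # Step 5: in this model only the cleared SCR.NS and the SMC address remain.
    return AbortHandlerView(scr_ns=scr_ns, smc_address=smc_address)


# The two origins are indistinguishable by the time the abort handler runs:
print(smc_vector_catch(0x8000, True) == smc_vector_catch(0x8000, False))  # True
```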
Possible effect of the Security Extensions on FIQ vector catch

When the Security Extensions are implemented, a debugger might need to consider the implications of the SCR on a Vector Catch event set on the FIQ vector, when the SCR is configured with both:
• the SCR.FW bit set to 0, so the CPSR.F bit cannot be modified in Non-secure state
• the SCR.FIQ bit set to 0, so that FIQs are handled in FIQ mode.

With this configuration, if an FIQ occurs in Non-secure state, the processor does not set CPSR.F to disable FIQs, and so the processor repeatedly takes the FIQ exception. It might not be possible to debug this situation using the vector catch on FIQ, because the instruction at the FIQ exception vector is never committed for execution and therefore the debug event never occurs.

C3.2.7 Pseudocode details of Software debug events

The following subsections give pseudocode details of Software debug events:
• Debug events
• Breakpoints and Vector Catches on page C3-28
• Watchpoints on page C3-35.

Debug events

The following functions cause the corresponding debug events to occur:

BKPTInstrDebugEvent()
BreakpointDebugEvent()
VectorCatchDebugEvent()
WatchpointDebugEvent()

If the debug event is not permitted, it is ignored by the processor.

Breakpoints and Vector Catches

If invasive debug is enabled, on each instruction the Debug_CheckInstruction() function checks for BRP and DBGVCR matches. If a match is found, the function calls BreakpointDebugEvent() or VectorCatchDebugEvent(). If the debug event is not permitted, it is ignored by the processor.

On a simple sequential execution model, the Debug_CheckInstruction() call for an instruction occurs just before the Operation pseudocode for the instruction is executed, and any call it generates to BreakpointDebugEvent() or VectorCatchDebugEvent() must happen at that time.
However, the architecture does not define when the checks for BRP and DBGVCR matches are made, other than that they must be made at or before that time. Therefore an implementation can perform the checks much earlier in an instruction pipeline, marking the instruction as breakpointed, and cause a marked instruction to call BreakpointDebugEvent() or VectorCatchDebugEvent() if and when it is about to execute.

The BRPMatch() function checks an individual BRP match, calling the BRPLinkMatch() function if necessary to check whether a linked BRP matches.

The VCRMatch() function checks for a Vector Catch debug event. When vectored interrupt support is enabled, it uses variables to hold the IRQ and FIQ interrupt vector addresses supplied to the processor by the interrupt controller on taking an interrupt in IRQ mode or FIQ mode. These variables are updated by the VCR_OnTakingInterrupt() function, which is called each time the processor takes an IRQ or FIQ interrupt.

For all of these functions, between a context changing operation and an exception entry, exception return or explicit Instruction Synchronization Barrier (ISB) operation, it is UNPREDICTABLE whether the values of CurrentModeIsPrivileged(), CPSR.M, CurrentInstrSet(), FindSecure(), and the CONTEXTIDR used by BRPMatch(), BRPLinkMatch(), and VCRMatch() are the old or the new values.

// Debug_CheckInstruction()
// ========================

Debug_CheckInstruction(bits(32) address, integer length)

    // Do nothing if debug disabled.
    if DBGDSCR<15:14> == ‘00’ then return;

    case CurrentInstrSet() of
        when InstrSet_ARM
            step = 4;
        when InstrSet_Thumb, InstrSet_ThumbEE
            step = 2;
        when InstrSet_Jazelle
            step = 1;
    length = length / step;

    vcr_match = FALSE;
    brp_match = FALSE;

    // Each unit of the instruction is checked against the VCR and the BRPs.
    // VCRMatch() and BRPMatch() might return UNKNOWN for units other than the first unit
    // of the instruction, as in some cases the generation of Debug events is UNPREDICTABLE.
    for W = 0 to length-1
        vcr_match = VCRMatch(address, W == 0) || vcr_match;
        // This code does not take into account the case where a mismatch breakpoint
        // does not match the address of an instruction but another breakpoint or
        // vector catch does match the instruction. In that situation, generation of
        // the Debug event is UNPREDICTABLE.
        for N = 0 to UInt(DBGDIDR.BRPs)
            brp_match = BRPMatch(N, address, W == 0) || brp_match;
        address = address + step;

    // A suitable debug event occurs if there has been a BRP match or a VCR match. If
    // both have occurred, just one debug event occurs, and its type is IMPLEMENTATION
    // DEFINED.
    if vcr_match || brp_match then
        if !vcr_match then
            BreakpointDebugEvent();
        elsif !brp_match then
            VectorCatchDebugEvent();
        else
            IMPLEMENTATION_DEFINED either BreakpointDebugEvent() or VectorCatchDebugEvent();

    return;

// BRPMatch()
// ==========

boolean BRPMatch(integer N, bits(32) address, boolean first)

    assert N <= UInt(DBGDIDR.BRPs);

    // If this breakpoint is not enabled, return immediately.
    if DBGBCR[N]<0> == ‘0’ then return FALSE;

    unk_match = FALSE;

    // Mode control match
    case DBGBCR[N]<2:1> of
        when ‘00’
            if UInt(DBGDIDR.Version) < 3 then
                UNPREDICTABLE;
            else
                case CPSR.M of
                    when ‘10000’ mode_control_match = TRUE;     // User mode
                    when ‘10011’ mode_control_match = TRUE;     // Supervisor mode
                    when ‘11111’ mode_control_match = TRUE;     // System mode
                    otherwise    mode_control_match = FALSE;    // Any other mode
        when ‘01’ mode_control_match = CurrentModeIsPrivileged();    // Privileged mode
        when ‘10’ mode_control_match = !CurrentModeIsPrivileged();   // Unprivileged mode
        when ‘11’ mode_control_match = TRUE;                         // Any mode

    // Byte lane select
    case CurrentInstrSet() of
        when InstrSet_ARM
            byte_select_match = (DBGBCR[N]<8:5> != ‘0000’);
        when InstrSet_Thumb, InstrSet_ThumbEE
            case address<1> of
                when ‘0’ byte_select_match = (DBGBCR[N]<6:5> != ‘00’);
                when ‘1’ byte_select_match = (DBGBCR[N]<8:7> != ‘00’);
        when InstrSet_Jazelle
            case address<1:0> of
                when ‘00’ byte_select_match = (DBGBCR[N]<5> == ‘1’);
                when ‘01’ byte_select_match = (DBGBCR[N]<6> == ‘1’);
                when ‘10’ byte_select_match = (DBGBCR[N]<7> == ‘1’);
                when ‘11’ byte_select_match = (DBGBCR[N]<8> == ‘1’);

    // Address mask
    case DBGBCR[N]<28:24> of
        when ‘00000’
            // This implies no mask, but the byte address is always dealt with by
            // byte_select_match, so the mask always has the bottom two bits set.
            mask = ZeroExtend(‘11’, 32);
        when ‘00001’, ‘00010’
            UNPREDICTABLE;
        otherwise
            mask = ZeroExtend(Ones(UInt(DBGBCR[N]<28:24>)), 32);
            if DBGBCR[N]<8:5> != ‘1111’ then unk_match = TRUE;

    // Meaning of BVR
    case DBGBCR[N]<22:20> of
        when ‘000’    // Unlinked IVA match
            cmp_in = address;
            linked = FALSE;
            mismatch = FALSE;
            mon_debug_ok = TRUE;
        when ‘001’    // Linked IVA match
            cmp_in = address;
            linked = TRUE;
            mismatch = FALSE;
            mon_debug_ok = TRUE;
        when ‘010’    // Unlinked context ID match
            if N < UInt(DBGDIDR.BRPs) - UInt(DBGDIDR.CTX_CMPs) then UNPREDICTABLE;
            if DBGBCR[N]<8:5> != ‘1111’ || DBGBCR[N]<28:24> != ‘00000’ then unk_match = TRUE;
            mask = Zeros(32);
            cmp_in = CONTEXTIDR;
            linked = FALSE;
            mismatch = FALSE;
            mon_debug_ok = FALSE;
        when ‘011’    // Linked context ID match (does not match directly, only via link)
            return FALSE;
        when ‘100’    // Unlinked IVA mismatch
            if UInt(DBGDIDR.Version) < 2 then UNPREDICTABLE;
            cmp_in = address;
            linked = FALSE;
            mismatch = TRUE;
            mon_debug_ok = FALSE;
        when ‘101’    // Linked IVA mismatch
            if UInt(DBGDIDR.Version) < 2 then UNPREDICTABLE;
            cmp_in = address;
            linked = TRUE;
            mismatch = TRUE;
            mon_debug_ok = FALSE;
        otherwise     // Reserved
            unk_match = TRUE;

    if !IsZero(DBGBVR[N] AND mask) then unk_match = TRUE;
    BVR_match = byte_select_match && (cmp_in AND NOT(mask)) == DBGBVR[N];
    if mismatch then BVR_match = !BVR_match;

    // If this is not the first unit of the instruction and there is an address match,
    // then the breakpoint match is UNPREDICTABLE, except in the “single-step” case where
    // it is a mismatch breakpoint without a range set. If there is a match on the first
    // unit of the instruction, that will override the UNKNOWN case here. In the
    // single-step case, matches on the subsequent units of the instruction are ignored.
    if BVR_match && !first then
        if mismatch && DBGBCR[N]<28:24> == ‘00000’ then    // Single-step case
            BVR_match = FALSE;
        else
            BVR_match = boolean UNKNOWN;

    // Security state
    case DBGBCR[N]<15:14> of
        when ‘00’ secure_state_match = TRUE;          // Any state (or no Security Extensions)
        when ‘01’ secure_state_match = !IsSecure();   // Non-secure only
        when ‘10’ secure_state_match = IsSecure();    // Secure only
        when ‘11’ UNPREDICTABLE;                      // Reserved

    match = mode_control_match && BVR_match && secure_state_match;

    // If linked, check the linked BRP.
    if linked then
        match = match && BRPLinkMatch(UInt(DBGBCR[N]<19:16>));
    elsif DBGBCR[N]<19:16> != ‘0000’ then
        unk_match = TRUE;

    // When Monitor debug-mode is configured:
    // * some types of event are ignored in v6 Debug and v6.1 Debug in privileged modes
    // * some types of event are UNPREDICTABLE in v7 Debug.
    if !mon_debug_ok && DBGDSCR<15:14> == ‘10’ then
        if UInt(DBGDIDR.Version) < 3 then
            if CurrentModeIsPrivileged() then return FALSE;
        else
            if DBGBCR[N]<2:1> == ‘01’ || DBGBCR[N]<2:1> == ‘11’ then UNPREDICTABLE;

    if unk_match then
        return boolean UNKNOWN;
    else
        return match;

// BRPLinkMatch()
// ==============

boolean BRPLinkMatch(integer M)

    assert M <= UInt(DBGDIDR.BRPs);
    if M < UInt(DBGDIDR.BRPs) - UInt(DBGDIDR.CTX_CMPs) then UNPREDICTABLE;

    // If this breakpoint is not enabled, return immediately.
    if DBGBCR[M]<0> == ‘0’ then return FALSE;

    unk_match = FALSE;
    if DBGBCR[M]<2:1> != ‘11’ then unk_match = TRUE;
    if DBGBCR[M]<8:5> != ‘1111’ then unk_match = TRUE;
    if DBGBCR[M]<15:14> != ‘00’ then unk_match = TRUE;
    if DBGBCR[M]<19:16> != ‘0000’ then unk_match = TRUE;
    if DBGBCR[M]<22:20> != ‘011’ then unk_match = TRUE;
    if DBGBCR[M]<28:24> != ‘00000’ then unk_match = TRUE;

    if unk_match then
        return boolean UNKNOWN;
    else
        return (CONTEXTIDR == DBGBVR[M]);

// Variables used to record most recent interrupts of various types.

bits(32) VCR_Recent_IRQ_S;
bits(32) VCR_Recent_IRQ_NS;
bits(32) VCR_Recent_FIQ_S;
bits(32) VCR_Recent_FIQ_NS;
boolean VCR_Recent_IRQ_S_Valid;
boolean VCR_Recent_IRQ_NS_Valid;
boolean VCR_Recent_FIQ_S_Valid;
boolean VCR_Recent_FIQ_NS_Valid;

// VCR_OnTakingInterrupt()
// =======================

VCR_OnTakingInterrupt(bits(32) vector, boolean FIQnIRQ)

    if SCTLR.VE == ‘1’ then
        if FIQnIRQ then
            if IsSecure() then
                if DBGVCR<7> == ‘0’ || (HaveSecurityExt() && SCR.FIQ == ‘1’) then
                    IMPLEMENTATION_DEFINED whether the variables are updated;
                else
                    VCR_Recent_FIQ_S = vector;
                    VCR_Recent_FIQ_S_Valid = TRUE;
            else
                if DBGVCR<31> == ‘0’ || (HaveSecurityExt() && SCR.FIQ == ‘1’) then
                    IMPLEMENTATION_DEFINED whether the variables are updated;
                else
                    VCR_Recent_FIQ_NS = vector;
                    VCR_Recent_FIQ_NS_Valid = TRUE;
        else
            if IsSecure() then
                if DBGVCR<6> == ‘0’ || (HaveSecurityExt() && SCR.IRQ == ‘1’) then
                    IMPLEMENTATION_DEFINED whether the variables are updated;
                else
                    VCR_Recent_IRQ_S = vector;
                    VCR_Recent_IRQ_S_Valid = TRUE;
            else
                if DBGVCR<30> == ‘0’ || (HaveSecurityExt() && SCR.IRQ == ‘1’) then
                    IMPLEMENTATION_DEFINED whether the variables are updated;
                else
                    VCR_Recent_IRQ_NS = vector;
                    VCR_Recent_IRQ_NS_Valid = TRUE;
    return;

// VCRVectorMatch()
// ================
//
// The result of this function says whether iaddr and eaddr match for vector catch purposes:
//     TRUE              if they definitely match
//     boolean UNKNOWN   if it is UNPREDICTABLE whether they match
//     FALSE             if they definitely do not match

boolean VCRVectorMatch(bits(32) iaddr, boolean first, bits(32) eaddr)

    match = FALSE;
    unpred = FALSE;

    if eaddr<31:2> == iaddr<31:2> then
        if eaddr<1:0> == iaddr<1:0> then
            // Exact address match is a definite match if on the first unit of the
            // instruction, otherwise an UNPREDICTABLE match.
            if first then
                match = TRUE;
            else
                unpred = TRUE;
        else
            // Check for other cases of UNPREDICTABLE matches.
            case CurrentInstrSet() of
                when InstrSet_ARM
                    unpred = TRUE;
                when InstrSet_Thumb, InstrSet_ThumbEE
                    if iaddr<1> == eaddr<1> then unpred = TRUE;
                    if iaddr<1:0> == ‘10’ && eaddr<1:0> == ‘00’ then unpred = TRUE;
                when InstrSet_Jazelle
                    if eaddr<1:0> == ‘00’ then unpred = TRUE;
                    if eaddr<1:0> == ‘10’ && iaddr<1:0> == ‘11’ then unpred = TRUE;

    if match then
        return TRUE;
    elsif unpred then
        return boolean UNKNOWN;
    else
        return FALSE;

// VCRMatch()
// ==========

boolean VCRMatch(bits(32) address, boolean first)

    // Determine addresses for IRQ and FIQ comparisons.
    if SCTLR.VE == ‘0’ then
        VCR_Recent_IRQ_S_Valid = FALSE;
        VCR_Recent_IRQ_NS_Valid = FALSE;
        VCR_Recent_FIQ_S_Valid = FALSE;
        VCR_Recent_FIQ_NS_Valid = FALSE;
        irq_addr = ExcVectorBase() + 24;
        irq_addr_v = TRUE;
        fiq_addr = ExcVectorBase() + 28;
        fiq_addr_v = TRUE;
    else
        if IsSecure() then
            irq_addr = VCR_Recent_IRQ_S;
            irq_addr_v = VCR_Recent_IRQ_S_Valid;
            fiq_addr = VCR_Recent_FIQ_S;
            fiq_addr_v = VCR_Recent_FIQ_S_Valid;
        else
            irq_addr = VCR_Recent_IRQ_NS;
            irq_addr_v = VCR_Recent_IRQ_NS_Valid;
            fiq_addr = VCR_Recent_FIQ_NS;
            fiq_addr_v = VCR_Recent_FIQ_NS_Valid;

    a_match = FALSE;    // Boolean for a match on an abort vector
    match = FALSE;      // Boolean for a match on any other vector

    // Check for non-monitor, non-reset matches, using DBGVCR<7:1> if no Security
    // Extensions or in Secure state, or DBGVCR<31:25> if in Non-secure state.
    start = if IsSecure() then 0 else 24;
    if DBGVCR<start+1> == ‘1’ then
        match = match || VCRVectorMatch(address, first, ExcVectorBase()+4);
    if DBGVCR<start+2> == ‘1’ then
        match = match || VCRVectorMatch(address, first, ExcVectorBase()+8);
    if DBGVCR<start+3> == ‘1’ then
        a_match = a_match || VCRVectorMatch(address, first, ExcVectorBase()+12);
    if DBGVCR<start+4> == ‘1’ then
        a_match = a_match || VCRVectorMatch(address, first, ExcVectorBase()+16);
    if DBGVCR<start+6> == ‘1’ then
        if HaveSecurityExt() && SCR.IRQ == ‘1’ && SCTLR.VE == ‘1’ then
            IMPLEMENTATION_DEFINED what test is made, if any;
        else
            if irq_addr_v then
                match = match || VCRVectorMatch(address, first, irq_addr);
    if DBGVCR<start+7> == ‘1’ then
        if HaveSecurityExt() && SCR.FIQ == ‘1’ && SCTLR.VE == ‘1’ then
            IMPLEMENTATION_DEFINED what test is made, if any;
        else
            if fiq_addr_v then
                match = match || VCRVectorMatch(address, first, fiq_addr);

    // If we have the Security Extensions and are in Secure state, check for monitor matches.
    if HaveSecurityExt() && IsSecure() then
        if DBGVCR<10> == ‘1’ then
            match = match || VCRVectorMatch(address, first, MVBAR+8);
        if DBGVCR<11> == ‘1’ then
            a_match = a_match || VCRVectorMatch(address, first, MVBAR+12);
        if DBGVCR<12> == ‘1’ then
            a_match = a_match || VCRVectorMatch(address, first, MVBAR+16);
        if DBGVCR<14> == ‘1’ then
            match = match || VCRVectorMatch(address, first, MVBAR+24);
        if DBGVCR<15> == ‘1’ then
            match = match || VCRVectorMatch(address, first, MVBAR+28);

    // Check for reset matches.
    // In v7 Debug this check is made regardless of the security state.
    // In v6 Debug and v6.1 Debug this check is only made in Secure state.
    vector = if SCTLR.V == ‘1’ then Ones(16):Zeros(16) else Zeros(32);
    if DBGVCR<0> == ‘1’ && (UInt(DBGDIDR.Version) >= 3 || IsSecure()) then
        match = match || VCRVectorMatch(address, first, vector);

    // When Monitor debug-mode is configured, abort vector catches are ignored in v6 Debug
    // and v6.1 Debug, but UNPREDICTABLE in v7 Debug.
    if a_match && DBGDSCR<15:14> == ‘10’ then
        if UInt(DBGDIDR.Version) < 3 then
            a_match = FALSE;
        else
            UNPREDICTABLE;

    return match || a_match;

Watchpoints

If invasive debug is enabled, the Debug_CheckDataAccess() function checks WRP matches for each data access. If the implementation includes IMPLEMENTATION DEFINED support for watchpoint generation on memory hint operations, or on cache maintenance operations, the function also checks for WRP matches on the appropriate operations. If a match is found, the function calls WatchpointDebugEvent(). If the debug event is not permitted, it is ignored by the processor.

On a simple sequential execution model:
• for a synchronous watchpoint, the Debug_CheckDataAccess() test is made before the data access
• for an asynchronous watchpoint, the Debug_CheckDataAccess() test is made after the data access.
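The per-byte WRP check and the before/after ordering on the sequential model can be sketched as follows. This is a deliberately reduced model, not the WRPMatch() pseudocode: SimpleWRP, check_data_access and do_store are hypothetical names, only four byte lanes are modeled, and address masking, privilege, security state and linking are left out. Only stores are exercised; for swaps the text sets read = write = TRUE.

```python
from dataclasses import dataclass

LOADS, STORES, ALL_ACCESSES = 0b01, 0b10, 0b11  # DBGWCR[4:3] encodings


@dataclass
class SimpleWRP:
    """Hypothetical, simplified Watchpoint Register Pair (four byte lanes)."""
    wvr: int           # word-aligned DBGWVR value
    bas: int           # DBGWCR[8:5]: bit i selects byte wvr + i
    lsc: int           # DBGWCR[4:3]: load/store control
    enabled: bool = True

    def matches(self, byte_addr: int, read: bool, write: bool) -> bool:
        if not self.enabled:
            return False
        if self.lsc not in (LOADS, STORES, ALL_ACCESSES):
            raise ValueError("DBGWCR[4:3] = 0b00 is reserved")
        load_store_match = {LOADS: read, STORES: write, ALL_ACCESSES: True}[self.lsc]
        word_match = (byte_addr & ~0x3) == self.wvr
        lane_selected = ((self.bas >> (byte_addr & 0x3)) & 1) == 1
        return load_store_match and word_match and lane_selected


def check_data_access(wrps, address: int, size: int, read: bool, write: bool) -> bool:
    """Check every byte of an access against every WRP, as Debug_CheckDataAccess() does."""
    return any(w.matches(b, read, write)
               for b in range(address, address + size) for w in wrps)


def do_store(memory: dict, wrps, address: int, data: bytes, synchronous: bool) -> bool:
    """Store `data`; return True if a Watchpoint debug event is signaled."""
    if synchronous and check_data_access(wrps, address, len(data), False, True):
        return True  # synchronous: checked before the access, store not performed
    for offset, value in enumerate(data):
        memory[address + offset] = value
    # Asynchronous: checked after the access has completed.
    return (not synchronous) and check_data_access(wrps, address, len(data), False, True)
```

For example, with SimpleWRP(wvr=0x1000, bas=0b0100, lsc=STORES), a one-byte store to 0x1002 signals the event in both models, but only the asynchronous model leaves the new value in memory. This matches the earlier DBGWCR case, where a synchronous watchpoint leaves the register unchanged.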
For more information see Synchronous and Asynchronous Watchpoint debug events on page C3-18.

The WRPMatch() function checks an individual WRP match. In ARMv7, it is IMPLEMENTATION DEFINED whether WRP matches use eight byte lanes or four. The WRPUsesEightByteLanes() function returns TRUE if they use eight byte lanes and FALSE if they use four. Using eight byte lanes is permitted only in ARMv7.

boolean WRPUsesEightByteLanes()

For these functions, the parameters read, write, privileged and secure are determined at the point the access is made, and not from the state of the processor at the point where WRPMatch() is executed. For swaps, read = write = TRUE.

// Debug_CheckDataAccess()
// =======================

boolean Debug_CheckDataAccess(bits(32) address, integer size, boolean read, boolean write,
                              boolean privileged, boolean secure)

    // Do nothing if debug disabled.
    if DBGDSCR<15:14> == ‘00’ then return;

    match = FALSE;
    // Each byte accessed by the data access is checked
    for byte = address to address + size - 1
        for N = 0 to UInt(DBGDIDR.WRPs)
            if WRPMatch(N, byte, read, write, privileged, secure) then match = TRUE;

    if match then WatchpointDebugEvent();
    return;

// WRPMatch()
// ==========

boolean WRPMatch(integer N, bits(32) address, boolean read, boolean write,
                 boolean privileged, boolean secure)

    assert N <= UInt(DBGDIDR.WRPs);

    // If watchpoint is not enabled, return immediately.
    if DBGWCR[N]<0> == ‘0’ then return FALSE;

    // Access privilege match
    case DBGWCR[N]<2:1> of
        when ‘00’ UNPREDICTABLE;                    // Reserved
        when ‘01’ privilege_match = privileged;     // Only privileged accesses
        when ‘10’ privilege_match = !privileged;    // Only unprivileged accesses
        when ‘11’ privilege_match = TRUE;           // Any access

    // Load/Store access control match
    case DBGWCR[N]<4:3> of
        when ‘00’ UNPREDICTABLE;                    // Reserved
        when ‘01’ load_store_match = read;          // Only load, load exclusive or swap
        when ‘10’ load_store_match = write;         // Only store, store exclusive or swap
        when ‘11’ load_store_match = TRUE;          // All accesses

    // Address match
    case DBGWCR[N]<28:24> of
        when ‘00000’    // No mask
            // If implementation uses 8 byte lanes, DBGWVR[N]<2> == ‘1’ selects 4 byte lane
            // behavior.
            if DBGWVR[N]<2> == ‘1’ then
                bits = 2;
                if DBGWCR[N]<12:9> != ‘0000’ then UNPREDICTABLE;
            else
                bits = if WRPUsesEightByteLanes() then 3 else 2;
            mask = ZeroExtend(Ones(bits), 32);
            if !IsZero(DBGWVR[N]<1:0>) then UNPREDICTABLE;
            byte = UInt(address<bits-1:0>);
            WVR_match = ((address AND NOT(mask)) == DBGWVR[N]) && (DBGWCR[N]<5+byte> == ‘1’);
        when ‘00001’, ‘00010’    // Reserved
            UNPREDICTABLE;
        otherwise    // Masked address check
            mask = ZeroExtend(Ones(UInt(DBGWCR[N]<28:24>)), 32);
            if !IsZero(DBGWVR[N] AND mask) then UNPREDICTABLE;
            if DBGWCR[N]<8:5> != ‘1111’ then UNPREDICTABLE;
            if WRPUsesEightByteLanes() && (DBGWCR[N]<12:9> != ‘1111’) then UNPREDICTABLE;
            WVR_match = ((address AND NOT(mask)) == DBGWVR[N]);

    // Security state
    case DBGWCR[N]<15:14> of
        when ‘00’ secure_state_match = TRUE;        // Any access (or no Security Extensions)
        when ‘01’ secure_state_match = !secure;     // Only non-secure accesses
        when ‘10’ secure_state_match = secure;      // Only secure accesses
        when ‘11’ UNPREDICTABLE;                    // Reserved

    match = privilege_match && load_store_match && WVR_match && secure_state_match;

    // Check for linking
    linked = (DBGWCR[N]<22> == ‘1’);
    if linked then
        match = match && BRPLinkMatch(UInt(DBGWCR[N]<19:16>));
    elsif DBGWCR[N]<19:16> != ‘0000’ then
        UNPREDICTABLE;

    return match;

C3.3 Halting debug events

A Halting debug event is one of the following:
• An External Debug Request debug event. This is a request from the system for the processor to enter Debug state. The method of generating an External Debug Request is IMPLEMENTATION DEFINED. Typically it is by asserting an External Debug Request input to the processor.
• A Halt Request debug event. This occurs when the debug logic receives a Halt request command. In v7 Debug, a debugger generates a Halt request command by writing 1 to the DBGDRCR Halt request bit, see Debug Run Control Register (DBGDRCR), v7 Debug only on page C10-29.
• An OS Unlock Catch debug event. This occurs when both of the following are true:
  — the OS Unlock Catch is enabled in the Event Catch Register
  — the OS Lock transitions from the locked to the unlocked condition.
  For details see Event Catch Register (DBGECR) on page C10-78 and OS Lock Access Register (DBGOSLAR) on page C10-75.

If invasive debug is disabled when one of these events is detected, the request is ignored and no Halting debug event occurs. Invasive debug is disabled when the external debug interface signal DBGEN is LOW.

If DBGEN is HIGH, meaning that invasive debug is enabled, and a Halting debug event occurs when it is not permitted, the Halting debug event is pended. This means that the processor enters Debug state when it transitions to a security state or processor mode where the Halting debug event is permitted.
However, if DBGEN goes LOW before the processor enters the security state or processor mode where the Halting debug event is permitted, it is UNPREDICTABLE whether the event remains pended.

If the debug logic is reset before the processor enters the permitted security state or processor mode, the processor must remove pending Halt Request and OS Unlock Catch debug events. Whether a pending External Debug Request debug event is removed is IMPLEMENTATION DEFINED.

Note
The IMPLEMENTATION DEFINED details of External Debug Request might specify that it is pended externally by the peripheral that is driving it until the processor acknowledges the request by entering Debug state. In such a system the pending request is typically held over a debug logic reset.

If a Halting debug event occurs when debug is enabled and the event is permitted, or the Halting debug event becomes permitted while it is pending, it is guaranteed that Debug state is entered by the end of the next Instruction Synchronization Barrier (ISB) operation, exception entry, or exception return.

See Run-control and cross-triggering signals on page AppxA-5 for details of the recommended external debug interface.

In v6 Debug and v6.1 Debug:
• if the processor implements the recommended ARM Debug Interface v4, the Halt request command is issued through the JTAG interface, by placing the HALT instruction in the IR and taking the Debug Test Access Port State Machine (Debug TAP State Machine) through the Run-Test/Idle state
• the OS Unlock Catch debug event is not supported.

In v6 Debug it is IMPLEMENTATION DEFINED whether Halting debug events cause entry to Debug state when Halting debug-mode is not configured and enabled.
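The ignore, pend and reset rules above can be summarized as a small state model. This is a sketch, not an implementation: HaltingEvent and HaltingDebugModel are hypothetical names, `permitted` stands in for the mode and security-state permission check, entering Debug state is recorded immediately rather than at the next ISB, exception entry or exception return, and `edbgrq_kept_on_reset` picks one answer to the IMPLEMENTATION DEFINED reset question. The model assumes DBGEN stays HIGH while an event is pending, because the architecture leaves the other case UNPREDICTABLE.

```python
from enum import Enum, auto


class HaltingEvent(Enum):
    EXTERNAL_DEBUG_REQUEST = auto()
    HALT_REQUEST = auto()
    OS_UNLOCK_CATCH = auto()


class HaltingDebugModel:
    """Hypothetical model of the pending rules for Halting debug events."""

    def __init__(self, edbgrq_kept_on_reset: bool = True):
        self.dbgen = True
        self.permitted = True
        self.pending = set()
        self.in_debug_state = False
        self.edbgrq_kept_on_reset = edbgrq_kept_on_reset

    def signal(self, event: HaltingEvent) -> None:
        if not self.dbgen:
            return                      # invasive debug disabled: request ignored
        if self.permitted:
            self.in_debug_state = True  # architecturally, by the next ISB/exception entry/return
        else:
            self.pending.add(event)     # pended until the event becomes permitted

    def set_permitted(self, permitted: bool) -> None:
        self.permitted = permitted
        if permitted and self.pending:
            self.pending.clear()
            self.in_debug_state = True

    def debug_logic_reset(self) -> None:
        # Pending Halt Request and OS Unlock Catch events must be removed;
        # keeping an External Debug Request is the assumed IMP DEF choice.
        keep = {HaltingEvent.EXTERNAL_DEBUG_REQUEST} if self.edbgrq_kept_on_reset else set()
        self.pending &= keep


m = HaltingDebugModel()
m.set_permitted(False)
m.signal(HaltingEvent.HALT_REQUEST)   # pended, not yet in Debug state
m.set_permitted(True)                 # becomes permitted: Debug state entered
print(m.in_debug_state)               # True
```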
C3.4 Generation of debug events

The generation of Breakpoint and Watchpoint debug events can be dependent on the context of the processor, including:
• the current processor mode
• the contents of the CONTEXTIDR
• the Secure security state setting, if the processor implements the Security Extensions.

The generation of debug events is also dependent on the state of the debug event generation logic:
• Breakpoint debug events are dependent on the contents of the relevant Breakpoint Register Pair (BRP)
• Watchpoint debug events are dependent on the contents of the relevant Watchpoint Register Pair (WRP)
• Linked Breakpoint or Watchpoint debug events are dependent on the settings of a second BRP
• Vector Catch debug events are dependent on the settings in the Vector Catch Register (DBGVCR)
• OS Unlock Catch debug events are dependent on the setting of the Event Catch Register (DBGECR).

In addition, as shown in Table C3-1 on page C3-2, the generation of debug events is dependent on:
• the invasive debug authentication settings, see Chapter C2 Invasive Debug Authentication
• the values of the DBGDSCR.HDBGen and DBGDSCR.MDBGen bits, see Debug Status and Control Register (DBGDSCR) on page C10-10.

The following events are guaranteed to take effect on the debug event generation logic by the end of the next ISB operation, exception entry, or exception return:
• Context changing operations, including:
  — mode changes
  — writes to the CONTEXTIDR
  — security state changes.
• Operations that change the state of the debug event generation logic, including:
  — writes to BRP registers, for Breakpoint debug events, or Linked Breakpoint or Watchpoint debug events
  — writes to WRP registers, for Watchpoint debug events
  — writes to the DBGVCR, for Vector Catch debug events
  — writes to the DBGECR, for OS Unlock Catch debug events
  — changes to the authentication signals
  — writes to the DBGDSCR.
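The guarantee above means that, until the next ISB, exception entry or exception return, the debug event generation logic might still be using either the old or the new settings. A minimal sketch of that window, assuming a toy model in which every write is buffered until a synchronization point; the class name and the setting key `"DBGBCR0.E"` are hypothetical labels, not register accessors.

```python
class DebugEventGenerationLogic:
    """Toy model: a change is only guaranteed to be used after a synchronization point."""

    def __init__(self, settings: dict):
        self.effective = dict(settings)   # settings guaranteed to be in use
        self.written = dict(settings)     # latest values written by software

    def write(self, name: str, value) -> None:
        # For example a write to a BRP, a WRP, the DBGVCR, the DBGECR or the DBGDSCR.
        self.written[name] = value

    def possible_values(self, name: str) -> set:
        # Before the next synchronization point either value might be used.
        return {self.effective[name], self.written[name]}

    def synchronize(self) -> None:
        # Models an ISB, an exception entry, or an exception return.
        self.effective = dict(self.written)
```

After `write("DBGBCR0.E", 1)` both 0 and 1 are possible; only after `synchronize()` is the new value the single possibility.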
Usually, exception return sequences are also context changing operations, and hence the context change operation is guaranteed to take effect on the breakpoint matching logic by the end of that exception return sequence.

To ensure a change in the debug event generation logic has completed before a particular event or piece of code is debugged, you must include an ISB, exception entry or exception return after the change in the Debug settings. In the absence of an ISB, exception entry or exception return, it is UNPREDICTABLE when the changes take place.

Between a context change operation and the end of the next ISB, exception entry or exception return, it is UNPREDICTABLE whether the processing of a debug event depends on the old or the new context.

Between operations that change the state of the debug event generation logic and the end of the next ISB, exception entry or exception return, it is UNPREDICTABLE whether debug event generation depends on the old or the new settings. Example C3-1 describes such a case.

Example C3-1 Unpredictability in debug event generation

A breakpoint is set at an address programmed in its Breakpoint Value Register (DBGBVR) and is configured through its Breakpoint Control Register (DBGBCR). In this example:
• DBGBCR is programmed to match only in User, Supervisor or System modes
• the address in the DBGBVR is the address of an instruction in an abort handler routine that is normally entered from the Prefetch Abort exception vector in Abort mode, but the instruction is located after the point where that handler switches from Abort mode to Supervisor mode using a CPS instruction.

If there is no ISB, exception entry or exception return between the CPS instruction and the instruction at the breakpoint address, it is UNPREDICTABLE whether the breakpoint matches, even though the instruction is executed in Supervisor mode.

Such an ISB, exception entry or exception return is usually not required to ensure correct operation of the program.
In this example, because the program is switching between two privileged modes, it is not required to ensure correct operation of the memory system.

ARMv7 does not require that such changes take effect on instruction fetches from the memory system, or on memory accesses made by the processor, at the same point as they take effect on the debug logic. The only architectural requirement is that such a change executed before an ISB operation must be visible to both the memory system and the debug logic for all instructions executed after the ISB operation. This requirement is described earlier in this section.

The processor must test for any possible:
• Watchpoint debug event before a memory access operation is observed.
• Breakpoint or Vector Catch debug event before the instruction is executed, that is, before the instruction has any effect on the architectural state of the processor.

As a result, for an instruction that modifies the context in which the processor tests for debug events, the processor must test for all possible debug events in terms of the context before the memory access operation is observed or the instruction executes. For example:
• In a v7 Debug implementation that uses the memory-mapped interface, a write to the DBGWCR to enable a watchpoint on a Data Virtual Address (DVA) of the DBGWCR itself must not trigger the watchpoint. Conversely, a write to the DBGWCR to disable the same watchpoint must trigger the watchpoint. For more information, see Debug events in the debug monitor on page C3-26.
• An instruction that writes to a Breakpoint Control Register (DBGBCR) or Vector Catch Register (DBGVCR) to enable a debug event on the Instruction Virtual Address (IVA) of the instruction itself must not trigger the debug event. Conversely, a write to the DBGBCR or DBGVCR to disable the same debug event must trigger the debug event.
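The DBGWCR example follows directly from testing in the old context: the match is evaluated before the store takes effect. A minimal sketch, assuming a hypothetical memory-mapped address for the DBGWCR and ignoring every other watchpoint qualifier (byte address select, load/store control, privilege and security checks); the `Watchpoint` type and `store_to_dbgwcr` function are illustrative only.

```python
from dataclasses import dataclass


@dataclass
class Watchpoint:
    enabled: bool
    watched_address: int  # hypothetical DVA programmed into the matching DBGWVR


def store_to_dbgwcr(wp: Watchpoint, dbgwcr_address: int, new_enable: bool) -> bool:
    """Model a memory-mapped store to DBGWCR; return True if the watchpoint fires.

    The match uses the settings in force before the store, so the store that
    enables the watchpoint cannot trigger it, but the store that disables it does.
    """
    fired = wp.enabled and wp.watched_address == dbgwcr_address  # old context
    wp.enabled = new_enable                                       # then the write lands
    return fired
```

The same ordering applies to a DBGBCR or DBGVCR write that targets the IVA of the writing instruction itself.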
C3.5 Debug event prioritization

Debug events can be synchronous or asynchronous:
• Breakpoint, Vector Catch, BKPT Instruction, and synchronous Watchpoint debug events are all synchronous debug events
• asynchronous Watchpoint debug events and all Halting debug events are asynchronous debug events.

A single instruction can generate a number of synchronous debug events. It can also generate a number of synchronous exceptions. The principles given in Exception priority order on page B1-33 apply to those exceptions and debug events, in addition to the following:
• An instruction fetch that generates an MMU fault, MPU fault, or external abort does not generate a Breakpoint or Vector Catch debug event.
• Breakpoint and Vector Catch debug events are associated with the instruction and are taken before the instruction executes. Therefore, when a Breakpoint or Vector Catch debug event occurs no other synchronous exception or debug event that would have occurred as a result of executing the instruction is generated.
• If a single instruction has more than one of the following debug events associated with it, it is UNPREDICTABLE which is taken:
  — Breakpoint
  — Vector Catch.
• No instruction is valid if it has a Prefetch Abort exception associated with it. Therefore, if an instruction causes a Prefetch Abort exception no other synchronous exception or debug event that would have occurred as a result of executing the instruction is generated.
• An instruction that generates an Undefined Instruction exception does not cause any memory access, and therefore cannot cause a Data Abort exception or a Watchpoint debug event.
• A memory access that generates an MMU fault or an MPU fault must not generate a Watchpoint debug event.
• A memory access that generates an MMU fault, an MPU fault, or a synchronous Watchpoint debug event must not generate an external abort.
• All other synchronous exceptions and synchronous debug events are mutually exclusive, and are derived from a decode of the instruction.

The ARM architecture does not define when asynchronous debug events other than asynchronous Watchpoint debug events are taken. Therefore the prioritization of asynchronous debug events other than asynchronous Watchpoint debug events is IMPLEMENTATION DEFINED.

Debug events must be taken in the execution order of the sequential execution model. This means that if an instruction causes a debug event then that event must be taken before any debug event on any instruction that would execute after that instruction, in the sequential execution model.

In particular, if the execution of an instruction generates an asynchronous Watchpoint debug event:
• the asynchronous Watchpoint debug event must not be taken if the instruction also generates any synchronous debug event
• if the instruction does not generate any synchronous debug event, then the asynchronous Watchpoint debug event must be taken before any subsequent:
  — synchronous or asynchronous debug event
  — synchronous or asynchronous precise exception.

Chapter C4 Debug Exceptions

This chapter describes debug exceptions, which are used to handle debug events when the processor is configured for Monitor debug-mode. It contains the following sections:
• About debug exceptions on page C4-2
• Effects of debug exceptions on CP15 registers and the DBGWFAR on page C4-4.
C4.1 About debug exceptions

A debug exception is taken when:
• a permitted Software debug event occurs when invasive debug is enabled and Monitor debug-mode is selected
• a BKPT instruction is executed when one of:
  — invasive debug is disabled
  — the debug event is not permitted
  — no debug-mode is selected.

For more information, see Table C3-1 on page C3-2.

You must be careful when programming certain events because you might leave the processor in an unrecoverable state. See Unpredictable behavior on Software debug events on page C3-24.

How the processor handles the debug exception depends on the cause of the exception, and is described in:
• Debug exception on Breakpoint, BKPT Instruction or Vector Catch debug events
• Debug exception on Watchpoint debug event on page C4-3.

Halting debug events never cause a debug exception. The Halting debug events are:
• External Debug Request debug event
• Halt Request debug event
• OS Unlock Catch debug event.

C4.1.1 Debug exception on Breakpoint, BKPT Instruction or Vector Catch debug events

If the cause of the debug exception is a Breakpoint, BKPT Instruction, or a Vector Catch debug event, the processor performs the following actions:
• Sets the DBGDSCR.MOE bits according to Table C10-3 on page C10-26.
• Sets the IFSR and IFAR as described in Effects of debug exceptions on CP15 registers and the DBGWFAR on page C4-4.
• Generates a Prefetch Abort exception, see Prefetch Abort exception on page B1-54.

The Prefetch Abort handler is responsible for checking the IFSR bits to find out whether the exception entry was caused by a debug exception. If it was, typically the handler branches to the debug monitor.
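The conditions for taking a debug exception, together with the Debug state entry conditions given in Chapter C5, can be read as a small decision function for Software debug events. This is a v7 Debug sketch only, with hypothetical names; it assumes, following Table C3-1, that Software debug events other than BKPT are ignored when debug cannot handle them, and it does not model the v6 Debug IMPLEMENTATION DEFINED cases. Halting debug events are outside its scope because they never cause a debug exception.

```python
from enum import Enum
from typing import Optional


class Outcome(Enum):
    DEBUG_EXCEPTION = "debug exception"  # Prefetch Abort or Data Abort into a monitor
    DEBUG_STATE = "Debug state"          # halt for an external debugger
    IGNORED = "ignored"


def software_event_outcome(is_bkpt: bool, invasive_enabled: bool,
                           permitted: bool, mode: Optional[str]) -> Outcome:
    """mode is "monitor", "halting", or None when no debug-mode is selected."""
    if invasive_enabled and permitted and mode is not None:
        return Outcome.DEBUG_STATE if mode == "halting" else Outcome.DEBUG_EXCEPTION
    # A BKPT instruction still generates a debug exception in these cases.
    # Assumption (Table C3-1): other Software debug events are ignored.
    return Outcome.DEBUG_EXCEPTION if is_bkpt else Outcome.IGNORED
```

The BKPT special case is what lets a monitor-less system still trap on a BKPT instruction through its Prefetch Abort handler.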
C4.1.2 Debug exception on Watchpoint debug event

If the cause of the debug exception is a Watchpoint debug event, the processor performs the following actions:
• Sets the DBGDSCR.MOE bits either to Asynchronous Watchpoint Occurred or to Synchronous Watchpoint Occurred.
• Sets the DFSR, DFAR, and DBGWFAR as described in Effects of debug exceptions on CP15 registers and the DBGWFAR on page C4-4.
• Generates a precise Data Abort exception, see Data Abort exception on page B1-55.

For more information, see Synchronous and Asynchronous Watchpoint debug events on page C3-18.

The Data Abort handler is responsible for checking the DFSR bits to find out whether the exception entry was caused by a debug exception. If it was, typically the handler branches to the debug monitor:
• The DBGWFAR indicates the address of the instruction that caused the Watchpoint debug event. See Watchpoint Fault Address Register (DBGWFAR) on page C10-28.
• LR_abt holds the address of (instruction to restart at + 8). If the watchpoint is synchronous, the instruction to restart at is the instruction that triggered the watchpoint.

C4.2 Effects of debug exceptions on CP15 registers and the DBGWFAR

There are four CP15 registers that are used to record abort information:

DFAR    Data Fault Address Register, see:
        • c6, Data Fault Address Register (DFAR) on page B3-124 for a VMSA implementation
        • c6, Data Fault Address Register (DFAR) on page B4-57 for a PMSA implementation.

IFAR    Instruction Fault Address Register, see:
        • c6, Instruction Fault Address Register (IFAR) on page B3-125 for a VMSA implementation
        • c6, Instruction Fault Address Register (IFAR) on page B4-58 for a PMSA implementation.
DFSR    Data Fault Status Register, see:
        • c5, Data Fault Status Register (DFSR) on page B3-121 for a VMSA implementation
        • c5, Data Fault Status Register (DFSR) on page B4-55 for a PMSA implementation.

IFSR    Instruction Fault Status Register, see:
        • c5, Instruction Fault Status Register (IFSR) on page B3-122 for a VMSA implementation
        • c5, Instruction Fault Status Register (IFSR) on page B4-56 for a PMSA implementation.

Their usage model for normal operation is described in:
• Fault Status and Fault Address registers in a VMSA implementation on page B3-48 for a VMSA implementation
• Fault Status and Fault Address registers in a PMSA implementation on page B4-18 for a PMSA implementation.

Additional registers might be used to return additional IMPLEMENTATION DEFINED fault status information, see:
• c5, Auxiliary Data and Instruction Fault Status Registers (ADFSR and AIFSR) on page B3-123 for a VMSA implementation
• c5, Auxiliary Data and Instruction Fault Status Registers (ADFSR and AIFSR) on page B4-56 for a PMSA implementation.

Also, information can be returned in the Watchpoint Fault Address Register (DBGWFAR). The implementation of the DBGWFAR depends on the Debug architecture version:
• In v6 Debug it is implemented as a register in CP15 c6.
• In v6.1 Debug it is implemented in CP14, and use of the CP15 alias is deprecated.
• In v7 Debug it can be implemented in the Extended CP14 interface, and has no alias in CP15.
For more information, see Watchpoint Fault Address Register (DBGWFAR) on page C10-28.

In Monitor debug-mode the behavior on the exception generated as a result of a Breakpoint, BKPT Instruction, or Vector Catch debug event is as follows:
• the IFSR is updated with the encoding for a debug event, IFSR[10,3:0] = 0b00010
• the IFAR is UNKNOWN following these debug exceptions
• the DFSR, DFAR and DBGWFAR are unchanged.
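The notation IFSR[10,3:0] means the 5-bit fault status is formed from bit 10 (as FS[4]) followed by bits 3:0. A handler therefore has to reassemble the field before comparing it with 0b00010. A minimal sketch with hypothetical helper names; the DFSR uses the same field layout for a Watchpoint debug event (DFSR[10,3:0] = 0b00010), and because bits 11 and 7:4 are not part of the field, the UNKNOWN Domain and Write fields do not affect the test.

```python
DEBUG_EVENT_FS = 0b00010  # IFSR/DFSR[10,3:0] encoding for a debug event


def fault_status(fsr: int) -> int:
    """Return the 5-bit fault status FSR[10,3:0]; bit 10 supplies FS[4]."""
    return (((fsr >> 10) & 1) << 4) | (fsr & 0xF)


def is_debug_exception(fsr: int) -> bool:
    """True if an IFSR or DFSR value reports a debug event."""
    return fault_status(fsr) == DEBUG_EVENT_FS
```

A Prefetch Abort handler would call `is_debug_exception(ifsr)`, and a Data Abort handler `is_debug_exception(dfsr)`, before deciding whether to branch to the debug monitor.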
In Monitor debug-mode the behavior on the exception generated as a result of a Watchpoint debug event is as follows:
• the IFSR and IFAR are unchanged.
• the DFSR is updated with the encoding for a debug event, DFSR[10,3:0] = 0b00010.
• the Domain and Write fields in the DFSR, DFSR[11,7:4], are UNKNOWN. However, an ARMv6 watchpoint sets the Domain field.
• the DFAR is UNKNOWN.
• the DBGWFAR is updated with the Instruction Virtual Address (IVA) of the instruction that accessed the watchpointed address, plus an offset that depends on the instruction set state of the processor for that instruction:
  — 8 in ARM state
  — 4 in Thumb and ThumbEE states
  — IMPLEMENTATION DEFINED in Jazelle state.
  See Memory addresses on page C3-23 for a definition of the IVA used to update the DBGWFAR.

Chapter C5 Debug State

This chapter describes Debug state, which is entered if a debug event occurs when the processor is configured for Halting debug-mode. It contains the following sections:
• About Debug state on page C5-2
• Entering Debug state on page C5-3
• Behavior of the PC and CPSR in Debug state on page C5-7
• Executing instructions in Debug state on page C5-9
• Privilege in Debug state on page C5-13
• Behavior of non-invasive debug in Debug state on page C5-19
• Exceptions in Debug state on page C5-20
• Memory system behavior in Debug state on page C5-24
• Leaving Debug state on page C5-28.
C5.1 About Debug state

When invasive debug is enabled, the processor switches to a special state called Debug state if one of:
• a permitted Software debug event occurs and Halting debug-mode is selected
• a permitted Halting debug event occurs
• a Halting debug event becomes permitted while it is pending.

For more information, see State on page B1-3.

In Debug state, control passes to an external agent.

Note
The external agent is usually a debugger. However it might be some other agent connecting to the debug port of the processor. This could be another processor in the same System on Chip (SoC) device. In part C of this manual this agent is often referred to as a debugger.

In v6 Debug, when debug is enabled and Halting debug-mode is not selected it is IMPLEMENTATION DEFINED whether a Halting debug event causes entry to Debug state. For more information, see Table C3-1 on page C3-2.

Halting debug-mode is configured by setting DBGDSCR[14] to 1, see Debug Status and Control Register (DBGDSCR) on page C10-10.

Parts A and B of this manual describe how an ARMv7 processor behaves when it is not in Debug state, that is, when it is in Non-debug state. In Debug state, the processor behavior changes as follows:
• The PC and CPSR behave as described in Behavior of the PC and CPSR in Debug state on page C5-7.
• Instructions are prefetched from the Instruction Transfer Register (DBGITR), see Executing instructions in Debug state on page C5-9.
• The processor can execute only instructions from the ARM instruction set.
• The rules about modes and privileges are different from those in Non-debug state, see Privilege in Debug state on page C5-13.
• Non-invasive debug features are disabled, see Behavior of non-invasive debug in Debug state on page C5-19.
• Exceptions are treated as described in Exceptions in Debug state on page C5-20. Other Software debug events, Halting debug events, and interrupts are ignored.
• If the processor implements a DMA engine, its behavior is IMPLEMENTATION DEFINED.
• If the processor implements a cache or other local memory that it keeps coherent with other memories in the system during normal operation, it must continue to service coherency requests from the other memories.

Leaving Debug state on page C5-28 describes how to leave Debug state.

C5.2 Entering Debug state

When invasive debug is enabled, the processor switches to a special state called Debug state if one of:
• a permitted Software debug event occurs and Halting debug-mode is selected
• a permitted Halting debug event occurs
• a Halting debug event becomes permitted while it is pending.

In v6 Debug, when debug is enabled and Halting debug-mode is not selected it is IMPLEMENTATION DEFINED whether a Halting debug event causes entry to Debug state. For more information, see Table C3-1 on page C3-2.

Note
Entering Debug state does not ensure that the effect of any context altering operation performed before Debug state entry is visible to instructions executed in Debug state.

On entering Debug state the processor follows this sequence:
1. The processor signals to the system that it is entering Debug state. Details of the signalling method, including whether it is implemented, are IMPLEMENTATION DEFINED.
2. Processing is halted, meaning:
   • The instruction pipeline is flushed and no more instructions are prefetched from memory.
   • The values of the following are not changed on entering Debug state:
     — the PC and CPSR
     — all general-purpose and program status registers, including SPSR_abt and LR_abt.
   • The values of the PC and CPSR remain unchanged while the processor is in Debug state.
   • Instructions can be executed in Debug state, see Executing instructions in Debug state on page C5-9, but when an instruction is executed in this way the normal effects of incrementing the PC and updating the CPSR are masked.
   • The effect of Debug state entry on CP15 registers and debug registers is described in Effect of entering Debug state on CP15 registers and the DBGWFAR on page C5-4.
   • The processor signals to the system that it is in Debug state. Details of this signalling method, including whether it is implemented, are IMPLEMENTATION DEFINED.
   • The processor might:
     — ensure that all Non-debug state memory operations complete and signal this to the system
     — set the DBGDSCR.ADAdiscard bit to 1.
     However, processor behavior regarding memory accesses outstanding at Debug state entry is IMPLEMENTATION DEFINED, see Asynchronous aborts and entry to Debug state on page C5-5. Details of the method used to signal to the system that Non-debug state memory operations are complete, including whether any such method is implemented, are IMPLEMENTATION DEFINED.
3. The processor signals that it has entered Debug state and is ready for an external agent to take control:
   • the DBGDSCR.HALTED bit is set to 1
   • the DBGDSCR.MOE field is set according to Table C10-3 on page C10-26.

For details of the recommended external debug interface, see Run-control and cross-triggering signals on page AppxA-5 and DBGACK and DBGCPUDONE on page AppxA-7.
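From the debugger side, step 3 is usually observed by polling DBGDSCR until HALTED reads as 1 and then reading MOE to find out why the processor halted. A minimal sketch, assuming HALTED is DBGDSCR bit [0] and MOE is DBGDSCR[5:2] (check these against Debug Status and Control Register (DBGDSCR) on page C10-10), and assuming a hypothetical `read_dbgdscr` callable supplied by whatever probe or memory-mapped access layer is in use. MOE values are returned raw; their meanings are given in Table C10-3.

```python
import time
from typing import Callable

DBGDSCR_HALTED = 1 << 0                       # assumption: HALTED is DBGDSCR[0]
DBGDSCR_MOE_SHIFT, DBGDSCR_MOE_MASK = 2, 0xF  # assumption: MOE is DBGDSCR[5:2]


def wait_for_halt(read_dbgdscr: Callable[[], int], timeout_s: float = 1.0,
                  poll_s: float = 0.001) -> int:
    """Poll DBGDSCR until HALTED reads as 1, then return the raw MOE field.

    read_dbgdscr is a hypothetical accessor provided by the debug access layer.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        dscr = read_dbgdscr()
        if dscr & DBGDSCR_HALTED:
            return (dscr >> DBGDSCR_MOE_SHIFT) & DBGDSCR_MOE_MASK
        if time.monotonic() >= deadline:
            raise TimeoutError("processor did not signal entry to Debug state")
        time.sleep(poll_s)
```

The timeout matters because the signalling in steps 1 and 2 is IMPLEMENTATION DEFINED, so DBGDSCR.HALTED is the portable indication that the processor is ready.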
C5.2.1 Effect of entering Debug state on CP15 registers and the DBGWFAR

The actions taken on entering Debug state depend on what caused the Debug state entry:
• If Debug state was entered following a Watchpoint debug event, then the DBGWFAR is updated with the Instruction Virtual Address (IVA) of the instruction that accessed the watchpointed address, plus an offset that depends on the instruction set state of the processor when the debug event was generated:
  — 8 in ARM state
  — 4 in Thumb and ThumbEE states
  — IMPLEMENTATION DEFINED in Jazelle state.
  See Memory addresses on page C3-23 for a definition of the IVA used to update the DBGWFAR.
• Otherwise, the DBGWFAR is unchanged on entry to Debug state.

Note
• The implementation of the DBGWFAR depends on the Debug architecture version:
  — In v6 Debug it is implemented as a register in CP15 c6.
  — In v6.1 Debug it is implemented in CP14, and use of the CP15 alias is deprecated.
  — In v7 Debug it can be implemented in the Extended CP14 interface, and has no alias in CP15.
  For more information, see Watchpoint Fault Address Register (DBGWFAR) on page C10-28.
• In all cases, on Debug state entry the DBGWFAR is set as described in this section.

In ARMv7, all CP15 registers are unchanged on entry to Debug state. In ARMv6, all CP15 registers except for the DBGWFAR are unchanged on entry to Debug state. The unchanged registers include the IFSR, DFSR, DFAR, and IFAR.

On a processor that implements the Security Extensions, the SCR.NS bit is not changed on entry to Debug state.

C5.2.2 Asynchronous aborts and entry to Debug state

On entry to Debug state, it is IMPLEMENTATION DEFINED whether a processor ensures that all memory operations complete and that all possible outstanding asynchronous aborts have been recognized before it signals that it has entered Debug state.
Behavior in ARMv7

In ARMv7 the behavior on entry to Debug state is signaled by the value of the DBGDSCR.ADAdiscard bit:

If DBGDSCR.ADAdiscard == 1
        The processor has already ensured that all possible outstanding asynchronous aborts have been recognized, and the debugger has no additional action to take.
        If the processor logic always automatically sets DBGDSCR.ADAdiscard to 1 on entry to Debug state, then DBGDSCR.ADAdiscard is implemented as a read-only bit.

If DBGDSCR.ADAdiscard == 0
        The following sequence must occur:
        1. The debugger must execute an IMPLEMENTATION DEFINED sequence to determine whether all possible outstanding asynchronous aborts have been recognized. An asynchronous abort recognized as a result of this sequence is not acted on immediately. Instead, the processor latches the abort event and its type. The asynchronous abort is acted on when the processor leaves Debug state.
        2. DBGDSCR.ADAdiscard is set to 1. There are two ways this requirement can be implemented:
           • The processor automatically sets this bit to 1 on detecting the execution of the IMPLEMENTATION DEFINED sequence. In this case, DBGDSCR.ADAdiscard is implemented as a read-only bit.
           • The IMPLEMENTATION DEFINED sequence sets DBGDSCR.ADAdiscard to 1, using the processor interface to the debug resources. In this case, DBGDSCR.ADAdiscard is implemented as a read/write bit.

When the processor has completed all Non-debug state memory operations it signals this to the system. It is IMPLEMENTATION DEFINED whether the processor ensures that all Non-debug state memory operations are complete on entry to Debug state. If not, the processor does not signal the system until all Non-debug state memory operations are complete. This might be linked to the debugger executing the IMPLEMENTATION DEFINED sequence to determine whether all possible outstanding asynchronous aborts have been recognized.
Details of the method used to signal to the system that Non-debug state memory operations are complete, including whether any such method is implemented, are IMPLEMENTATION DEFINED.

While the processor is in Debug state and DBGDSCR.ADAdiscard is 1, any memory access that causes an asynchronous abort has the effect of setting DBGDSCR.ADABORT_l, the Sticky Asynchronous Data Abort bit, to 1, but has no other effect on the state of the processor. The cause and type of the abort are not recorded. Because the abort is not pended, if the asynchronous abort is an external asynchronous abort and the Interrupt Status Register (ISR) is implemented, the ISR.A bit is not updated. For more information, see c12, Interrupt Status Register (ISR) on page B3-150. The ISR is implemented only on processors that include the Security Extensions.

Any asynchronous abort that is latched before or during the entry to Debug state sequence is not overwritten by any new asynchronous abort. This means the latched abort is not discarded if the processor detects another asynchronous abort while DBGDSCR.ADAdiscard is set to 1. The processor acts on the latched abort on exit from Debug state. If the asynchronous abort is an external asynchronous abort and the ISR is implemented, the ISR.A bit reads as 1, indicating that an external abort is pending.

If the debugger has executed any memory access instructions, before exiting Debug state it must issue an IMPLEMENTATION DEFINED sequence of operations to ensure that any asynchronous aborts have been recognized and discarded.

On exit from Debug state, the processor automatically clears DBGDSCR.ADAdiscard to 0.
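The ARMv7 rules for asynchronous aborts around DBGDSCR.ADAdiscard can be captured in a small model: before ADAdiscard is 1 an abort is latched, after it is 1 an abort only sets the sticky ADABORT_l bit, and the latched abort is acted on at exit according to CPSR.A. This is a minimal sketch with hypothetical names; abort types are plain strings, the ISR.A bit and the IMPLEMENTATION DEFINED synchronization sequence are not modelled, and the v6 Debug and v6.1 Debug behavior differs.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DebugStateAbortModel:
    """Toy ARMv7 model of asynchronous aborts around DBGDSCR.ADAdiscard."""

    adadiscard: bool = False
    adabort_l: bool = False          # DBGDSCR.ADABORT_l, the sticky flag
    latched: Optional[str] = None    # abort type latched for action on exit

    def async_abort(self, abort_type: str) -> None:
        if self.adadiscard:
            self.adabort_l = True    # only the sticky bit records it; the type is lost
        elif self.latched is None:
            self.latched = abort_type  # latched; a later abort never overwrites it

    def set_adadiscard(self) -> None:
        self.adadiscard = True

    def exit_debug_state(self, cpsr_a: bool) -> str:
        self.adadiscard = False      # cleared automatically on exit
        abort, self.latched = self.latched, None
        if abort is None:
            return "none"
        return "pended" if cpsr_a else "taken"  # CPSR.A == 1 pends the abort
```

In the model, an abort signalled before `set_adadiscard()` survives a second abort that arrives afterwards, matching the rule that a latched abort is not overwritten.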
If an asynchronous abort is signalled to the processor before entry to Debug state, or between entry to Debug state and DBGDSCR.ADAdiscard transitioning from 0 to 1, then the processor acts on the asynchronous abort on exit from Debug state:
• if the CPSR.A bit is 1, the abort is pended, and is taken when the A bit is cleared to 0
• if the CPSR.A bit is 0, the abort is taken by the processor.

For details of the recommended external debug interface, see Run-control and cross-triggering signals on page AppxA-5 and DBGACK and DBGCPUDONE on page AppxA-7.

Behavior in ARMv6

The behavior of asynchronous aborts on entry to Debug state differs between v6 Debug and v6.1 Debug:

v6 Debug
        The DBGDSCR.ADAdiscard bit is not defined. A debugger must always perform a Data Synchronization Barrier (DSB) following entry to Debug state. If the CPSR.A bit is 0 and an asynchronous abort is signalled, the processor takes a Data Abort exception as described in Undefined Instruction and Data Abort exceptions in Debug state in v6 Debug on page C5-23. A subsequent read of the processor state by the debugger returns the updated values of CPSR, LR_abt and SPSR_abt. The value of DBGDSCR.ADABORT_l is UNKNOWN when in Non-debug state.

v6.1 Debug
        A debugger must always perform a DSB following entry to Debug state. This DSB causes DBGDSCR.ADAdiscard to be set to 1. DBGDSCR.ADABORT_l is set to 1 on any asynchronous abort detected while the processor is in Debug state, regardless of the setting of DBGDSCR.ADAdiscard.

C5.3 Behavior of the PC and CPSR in Debug state

Processing is halted on entry to Debug state, see Entering Debug state on page C5-3. After the processor has entered Debug state, a read of the PC returns a return address plus an offset. The return address depends on the type of debug event, and the offset depends on the instruction set state of the processor when Debug state was entered.
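Because a PC read in Debug state returns the return address (RA) plus an offset that depends only on the instruction set state at Debug entry, a debugger recovers RA by subtracting that offset. The offsets used here (8 for ARM, 4 for Thumb and ThumbEE) are the ones listed in Table C5-1; the Jazelle offset is IMPLEMENTATION DEFINED, so the sketch requires the caller to supply it. The DBGWFAR uses the same 8 and 4 offsets relative to the IVA, so the same subtraction applies there. The function name, the state strings, and the `jazelle_offset` parameter are hypothetical, and the sketch assumes the instruction set state has already been decoded from CPSR.J and CPSR.T.

```python
from typing import Optional

PC_READ_OFFSET = {"ARM": 8, "Thumb": 4, "ThumbEE": 4}


def return_address_from_pc(pc_read: int, iset: str,
                           jazelle_offset: Optional[int] = None) -> int:
    """Recover the return address (RA) from a Debug state PC read."""
    if iset == "Jazelle":
        if jazelle_offset is None:
            raise ValueError("Jazelle offset is IMPLEMENTATION DEFINED; supply it")
        return pc_read - jazelle_offset
    return pc_read - PC_READ_OFFSET[iset]
```

For example, a PC read of 0x8008 in ARM state and a PC read of 0x8004 in Thumb state both correspond to a return address of 0x8000.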
Table C5-1 shows the values returned by a read of the PC.

Table C5-1 PC value while in Debug state

                          PC value, for instruction set state on Debug entry
Debug event               ARM      Thumb or ThumbEE   Jazelle (b)    Meaning of return address (RA) (a) obtained from PC read
Breakpoint                RA + 8   RA + 4             RA + Offset    Breakpointed instruction address
Synchronous Watchpoint    RA + 8   RA + 4             RA + Offset    Address of the instruction that triggered the watchpoint (c)
Asynchronous Watchpoint   RA + 8   RA + 4             RA + Offset    Address of the instruction for the execution to resume (d)
BKPT instruction          RA + 8   RA + 4             RA + Offset    BKPT instruction address
Vector Catch              RA + 8   RA + 4             RA + Offset    Vector address
External Debug Request    RA + 8   RA + 4             RA + Offset    Address of the instruction for the execution to resume
Halt Request              RA + 8   RA + 4             RA + Offset    Address of the instruction for the execution to resume
OS Unlock Catch           RA + 8   RA + 4             RA + Offset    Address of the instruction for the execution to resume

a. Return address (RA) is the address of the first instruction that the processor must execute on exit from Debug state. This enables program execution to continue from where it stopped.
b. Offset is an IMPLEMENTATION DEFINED value that is constant and documented.
c. Returning to RA has the effect of retrying the instruction. This can have implications under the memory order model. See Synchronous and Asynchronous Watchpoint debug events on page C3-18.
d. RA is not the address of the instruction that triggered the watchpoint, but one that was executed some number of instructions later. The address of the instruction that triggered the watchpoint can be discovered from the value in the DBGWFAR. See Watchpoint Fault Address Register (DBGWFAR) on page C10-28.

On entry to Debug state, the value of the CPSR is the value that the instruction at the return address would have been executed with, if it had not been cancelled by the debug event.
Note
This rule also applies to the CPSR.IT bits. On entry to Debug state these bits apply to the instruction at the return address.

The behavior of the PC and CPSR registers in Debug state is:
• The PC does not increment on instruction execution.
• The CPSR.IT status bits do not change on instruction execution.
• Predictable instructions that explicitly modify the PC or CPSR operate normally, updating the PC or CPSR.
• After the processor has entered Debug state, if 0b1111 (the PC) is specified as a source operand for an instruction it returns a value as described in Table C5-1 on page C5-7. The value read from the PC is aligned according to the rules of the instruction set state indicated by the CPSR.J and CPSR.T execution state bits, regardless of the fact that the processor only executes the ARM instruction set in Debug state. For more information, see Executing instructions in Debug state on page C5-9.
• If an instruction sequence for writing a particular value to the PC is executed while in Debug state, and the processor is later forced to restart without any additional write to the PC or CPSR, the execution starts at the address corresponding to the written value.
• If the CPSR is written to while in Debug state, subsequent reads of the PC return an UNKNOWN value, and if the processor is later forced to restart without having performed a write to the PC, the restart address is UNKNOWN. However, the CPSR can be read correctly while in Debug state.

  Note
  In v6 Debug, the CPSR and PC can be written in a single instruction, for example, MOVS pc,lr. In this case, the behavior is as if the CPSR is written first, followed by the PC. That is, if the processor is later forced to restart, the restart address is predictable. This does not apply to v6.1 Debug or v7 Debug because in these versions of the Debug architecture such instructions are themselves UNPREDICTABLE in Debug state.
• If the processor is forced to restart without having performed a write to the PC, the restart address is UNKNOWN.
• If the PC is written to while in Debug state, later reads of the PC return an UNKNOWN value.

See also Executing instructions in Debug state on page C5-9, for more restrictions on instructions that might be executed in Debug state, including those that access the PC and CPSR.

C5.4 Executing instructions in Debug state

In Debug state the processor executes instructions issued through the Instruction Transfer Register, see Instruction Transfer Register (DBGITR) on page C10-46. This mechanism is enabled through DBGDSCR[13], see Debug Status and Control Register (DBGDSCR) on page C10-10.

The following rules and restrictions apply to instructions that can be executed in this manner in Debug state:
• The processor instruction set state always corresponds to the state indicated by the CPSR.J and CPSR.T execution state bits. However, the processor always interprets the instructions issued through the DBGITR as ARM instruction set opcodes, regardless of the setting of the CPSR.J and CPSR.T execution state bits.
  Some ARM instructions are UNPREDICTABLE if executed in Debug state. These instructions are either:
  - identified as UNPREDICTABLE in this list
  - shown as UNPREDICTABLE in Table C5-2 on page C5-10.
  Otherwise, except for the value read from the PC, instructions executed in Debug state operate as specified for ARM state. Behavior of the PC and CPSR in Debug state on page C5-7 specifies the value read from the PC.
• The CPSR.IT execution state bits are ignored. This means that instructions issued through the DBGITR do not fail their condition tests unexpectedly. However, the condition code field in an ARM instruction is honored.
  The CPSR.IT execution state bits are preserved and do not change when instructions are executed, unless an instruction that explicitly modifies those bits is executed.
• The branch instructions B, BL, BLX (immediate), and BLX (register) are UNPREDICTABLE in Debug state.
• The hint instructions WFI, WFE and YIELD are UNPREDICTABLE in Debug state.
• All memory read and memory write instructions with the PC as the base address register read an UNKNOWN value for the base address.
• Certain instructions that normally update the CPSR can be UNPREDICTABLE in Debug state, see Writing to the CPSR in Debug state on page C5-10.
• Instructions that load a value from memory into the PC are UNPREDICTABLE in Debug state.
• Conditional instructions that write explicitly to the PC are UNPREDICTABLE in Debug state.
• There are additional restrictions on data-processing instructions that write to the PC. See Data-processing instructions with the PC as the target in Debug state on page C5-12.
• The exception-generating instructions SVC, SMC and BKPT are UNPREDICTABLE in Debug state.
• A coprocessor can impose additional constraints or usage guidelines for executing coprocessor instructions in Debug state. For example, a coprocessor that signals internal exception conditions asynchronously using the Undefined Instruction exception, as described in Undefined Instruction exception on page B1-49, might require particular sequences of instructions to avoid corrupting the coprocessor state associated with the exception condition. In the case of the VFP coprocessors, these sequences are defined by the VFP subarchitecture. Other coprocessors must define any sequences that they require.
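The rules in this list that depend only on the instruction type can be checked by decoding the ARM opcode. The following C sketch is a hypothetical debugger-side helper, not part of the architecture. It recognizes the ARM (A1) encodings of B, BL, BLX (immediate), BLX (register), the YIELD, WFE and WFI hints, SVC, BKPT and SMC. It does not check the operand-dependent rules, such as PC-relative memory accesses, loads into the PC or conditional writes to the PC, and it does not cover the CPSR-modifying instructions of Table C5-2, such as BX.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Returns true if the ARM opcode is one of the instruction classes that the
   list calls UNPREDICTABLE in Debug state purely by instruction type. */
bool debug_state_unpredictable_by_class(uint32_t insn)
{
    uint32_t cond = insn >> 28;

    /* B, BL (cond != 0b1111) and BLX (immediate) (cond == 0b1111): bits[27:25] == 0b101 */
    if ((insn & 0x0E000000u) == 0x0A000000u)
        return true;

    if (cond != 0xFu) {
        /* BLX (register): cond 0001 0010 1111 1111 1111 0011 Rm */
        if ((insn & 0x0FFFFFF0u) == 0x012FFF30u)
            return true;
        /* Hints: cond 0011 0010 0000 1111 0000 op2, with YIELD = 1, WFE = 2, WFI = 3 */
        uint32_t hint = insn & 0x0FFFFFFFu;
        if (hint == 0x0320F001u || hint == 0x0320F002u || hint == 0x0320F003u)
            return true;
        /* SVC: cond 1111 imm24 */
        if ((insn & 0x0F000000u) == 0x0F000000u)
            return true;
        /* BKPT: cond 0001 0010 imm12 0111 imm4 */
        if ((insn & 0x0FF000F0u) == 0x01200070u)
            return true;
        /* SMC: cond 0001 0110 0000 0000 0000 0111 imm4 */
        if ((insn & 0x0FFFFFF0u) == 0x01600070u)
            return true;
    }
    return false;
}
```

A debugger could run such a check before writing an opcode to the DBGITR, and refuse or emulate the rejected instructions instead.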
Note
The definition of UNPREDICTABLE implies that an UNPREDICTABLE instruction executed in Debug state must not put the processor into a state or mode in which debug is not permitted, or change the state of any register that cannot be accessed from the current state and mode.

C5.4.1 Writing to the CPSR in Debug state

Table C5-2 lists all the instructions that normally update the CPSR, and shows their behavior in Debug state. Which instructions are permitted in Debug state depends on the version of the Debug architecture.

Table C5-2 Instructions that modify the CPSR, and their behavior in Debug state

Instruction                  v6 Debug                                              v6.1 Debug, v7 Debug
BX                           UNPREDICTABLE if CPSR.J is 1. Can be used to set      UNPREDICTABLE.
                             or clear the CPSR.T bit.
BXJ                          UNPREDICTABLE if either CPSR.J or CPSR.T is 1.        UNPREDICTABLE.
                             Can be used to set CPSR.J to 1.
SETEND                       UNPREDICTABLE.                                        UNPREDICTABLE.
CPS                          UNPREDICTABLE.                                        UNPREDICTABLE.
<op>S PC, <operands> a       Can be used to set the CPSR to any value by copying   UNPREDICTABLE.
                             it from the SPSR of the current mode.
<op> PC, <operands> a        Do not update the CPSR. See Data-processing instructions with the PC as the target in Debug state on page C5-12.
MSR CPSR_fsxc                Use for setting the CPSR bits other than the          Use for setting the CPSR to any value.
                             execution state bits.
MSR CPSR_<fields>            Use for setting the CPSR bits other than the          UNPREDICTABLE.
                             execution state bits.
LDM (exception return), RFE  UNPREDICTABLE.                                        UNPREDICTABLE.

a. <op> is one of ADC, ADD, AND, ASR, BIC, EOR, LSL, LSR, MOV, MVN, ORR, ROR, RRX, RSB, RSC, SBC, or SUB.

Note
Table C5-2 on page C5-10 does not:
• Include instructions that only update the CPSR bits that are available in the APSR, that is the N, Z, C, V, Q, and GE[3:0] bits. These instructions have their normal behavior when executed in Debug state.
• Include instructions that cause exceptions, such as SVC, SMC, and memory access instructions that cause aborts.
  The behavior of these instructions is described in Exceptions in Debug state on page C5-20.
• Show what values can be written to the CPSR. For more information, see Altering CPSR privileged bits in Debug state on page C5-14.

MRS and MSR instructions in Debug state, in v6.1 Debug and v7 Debug

In v6.1 Debug and v7 Debug, if the debugger has to update bits in the CPSR that are not available in the APSR then it must use the MSR instruction to do so, writing to CPSR_fsxc.

The behavior of the CPSR forms of the MSR and MRS instructions in Debug state differs from their behavior in Non-debug state. In the CPSR:
• in Non-debug state:
  - the execution state bits, other than the E bit, are RAZ when read by an MRS instruction
  - writes to the execution state bits, other than the E bit, by an MSR instruction are ignored
• in Debug state:
  - the execution state bits return their correct values when read by an MRS instruction
  - writes to the execution state bits by an MSR instruction update the execution state bits.

MRS and MSR instructions that read and write an SPSR behave as they do in Non-debug state.

In addition, in Debug state in v6.1 Debug and v7 Debug:
• if you use an MSR instruction to directly modify the execution state bits of the CPSR, you must then perform an Instruction Synchronization Barrier (ISB) operation
• an MSR instruction that does not write to all fields of the CPSR is UNPREDICTABLE
• if an MRS instruction reads the CPSR after an MSR writes the execution state bits, and before an ISB, the value returned is UNKNOWN
• if the processor leaves Debug state after an MSR writes the execution state bits, and before an ISB, the behavior of the processor is UNPREDICTABLE.
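A debugger that must set the whole CPSR in v6.1 Debug or v7 Debug therefore issues MSR CPSR_fsxc followed by an ISB before it reads the CPSR back or leaves Debug state. The sketch below builds that sequence as ARM opcodes, assuming the ARM (A1) encodings with condition AL and the SY option for ISB. The helper names are hypothetical, and how each opcode is written to the DBGITR and its completion awaited is not shown.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* MSR CPSR_fsxc, <Rn>: all four field mask bits set, so every CPSR field is written. */
static uint32_t arm_msr_cpsr_fsxc(unsigned rn)
{
    assert(rn < 15);                              /* Rn == PC is UNPREDICTABLE */
    return 0xE12FF000u | rn;
}

/* MRS <Rd>, CPSR */
static uint32_t arm_mrs_cpsr(unsigned rd)
{
    assert(rd < 15);                              /* Rd == PC is UNPREDICTABLE */
    return 0xE10F0000u | ((uint32_t)rd << 12);
}

#define ARM_ISB_SY 0xF57FF06Fu                    /* ISB with the SY option */

/* Fills out[] with: write r<rn> to the whole CPSR, synchronize, then read
   the CPSR back into r<rd>. Returns the number of opcodes produced. */
size_t build_cpsr_write_sequence(unsigned rn, unsigned rd, uint32_t out[3])
{
    out[0] = arm_msr_cpsr_fsxc(rn);   /* a partial-field MSR would be UNPREDICTABLE */
    out[1] = ARM_ISB_SY;              /* required before MRS or leaving Debug state */
    out[2] = arm_mrs_cpsr(rd);        /* after the ISB, MRS returns a known value */
    return 3;
}
```

Issuing out[2] before out[1] would, by the rules above, return an UNKNOWN CPSR value, which is why the ISB sits between the write and the read.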
C5.4.2 Data-processing instructions with the PC as the target in Debug state

The ARM encodings of the instructions ADC, ADD, AND, ASR, BIC, EOR, LSL, LSR, MOV, MVN, ORR, ROR, RRX, RSB, RSC, SBC, and SUB write to the PC if their Rd field is 0b1111. When in Non-debug state, these ARM instruction encodings can be executed only in the ARM instruction set state, and their behavior is described in:
• SUBS PC, LR and related instructions on page B6-25, if the S bit of the instruction is 1
• Chapter A8 Instruction Details, if the S bit of the instruction is 0.

These ARM instructions cause interworking branches in ARMv7, and simple branches in earlier versions of the architecture. The ALUWritePC() pseudocode function describes this operation, see Pseudocode details of operations on ARM core registers on page A2-12.

In Debug state, these ARM instruction encodings can be executed in any instruction set state, and the following additional restrictions apply:
• If the S bit of the instruction is 1:
  - in v7 Debug and v6.1 Debug, behavior is UNPREDICTABLE
  - in v6 Debug, behavior is as in Non-debug state.
• If the S bit of the instruction is 0, the behavior is always either a simple branch without changing instruction set state or UNPREDICTABLE. Table C5-3 shows how this behavior depends on the instruction set state, the value alu<1:0> written to the PC, and the architecture version.

Table C5-3 Debug state rules for data-processing instructions that write to the PC

CPSR.J  CPSR.T  Instruction set state  Architecture version  alu<1:0>  Operation a
0       0       ARM                    ARMv7                 00        BranchTo(alu<31:2>:'00')
                                                             x1        UNPREDICTABLE b
                                                             10        UNPREDICTABLE
                                       ARMv6                 xx        BranchTo(alu<31:2>:'00')
X       1       Thumb or ThumbEE       ARMv7                 x0        UNPREDICTABLE b
                                                             x1        BranchTo(alu<31:1>:'0')
                                       ARMv6                 xx        BranchTo(alu<31:1>:'0')
1       0       Jazelle                ARMv7 or ARMv6        xx        BranchTo(alu<31:0>)

a. Pseudocode description of behavior, when the behavior is not UNPREDICTABLE.
b. This behavior is changed from the behavior in Non-debug state.
   In all other rows, the behavior described is unchanged from the behavior in Non-debug state.

C5.5 Privilege in Debug state

In Debug state, instructions issued to the processor have the privileges to access and modify processor registers, memory and coprocessor registers that they would have if issued in the same mode and security state in Non-debug state.

In User mode and Debug state, instructions have additional privileges to access or modify some registers and fields that cannot be accessed in User mode in Non-debug state. However, on processors that implement the Security Extensions and support Secure User halting debug, these additional privileges are restricted when all the following conditions are true:
• the processor is in Debug state
• the processor is in Secure User mode
• invasive debug is not permitted in Secure privileged modes, because either DBGEN or SPIDEN is LOW, see Chapter C2 Invasive Debug Authentication.

The following sections describe the instruction privileges, and the restrictions on them when these conditions are all true:
• Accessing registers and memory in Debug state
• Altering CPSR privileged bits in Debug state on page C5-14
• Changing the SCR.NS bit in Debug state on page C5-15
• Coprocessor and Advanced SIMD instructions in Debug state on page C5-16.

C5.5.1 Accessing registers and memory in Debug state

The rules for accessing ARM core registers and memory are the same in Debug state as in Non-debug state. For example, if the CPSR mode bits indicate the processor is in Supervisor mode:
• reads of ARM core registers return the Supervisor mode registers
• normal load and store operations make privileged accesses to memory
• a load or store operation with User mode privilege, for example LDRT, makes a User mode privilege access.
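Returning to Table C5-3 in section C5.4.2, the rules for a data-processing instruction that writes to the PC with the S bit set to 0 can be expressed as a small C function. This is a sketch under stated assumptions: the names are hypothetical, arch_v7 selects the ARMv7 rows (false selects the ARMv6 rows), and an UNPREDICTABLE case is reported as a false return value rather than modeled further.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Instruction set state from CPSR.J and CPSR.T (Thumb and ThumbEE share rows). */
typedef enum { ISA_ARM, ISA_THUMB_OR_THUMBEE, ISA_JAZELLE } isa_state;

/* Models Table C5-3. alu is the value written to the PC. Returns false for the
   UNPREDICTABLE rows; otherwise stores the branch target in *target. */
bool debug_alu_write_pc(isa_state s, bool arch_v7, uint32_t alu, uint32_t *target)
{
    assert(target != NULL);
    switch (s) {
    case ISA_ARM:
        if (arch_v7 && (alu & 3u) != 0u)
            return false;                 /* alu<1:0> == x1 or 10 */
        *target = alu & ~3u;              /* BranchTo(alu<31:2>:'00') */
        return true;
    case ISA_THUMB_OR_THUMBEE:
        if (arch_v7 && (alu & 1u) == 0u)
            return false;                 /* alu<1:0> == x0 */
        *target = alu & ~1u;              /* BranchTo(alu<31:1>:'0') */
        return true;
    case ISA_JAZELLE:
        *target = alu;                    /* BranchTo(alu<31:0>) */
        return true;
    }
    return false;
}
```

Note that no row changes the instruction set state: even in ARMv7, a value with alu<0> == 1 in ARM state is UNPREDICTABLE here rather than an interworking branch.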
C5.5.2 Altering CPSR privileged bits in Debug state

On processors that implement the Security Extensions, the processor:
• prevents attempts to set the CPSR.M field to a value that would place the processor in a mode or security state where debug is not permitted
• prevents updates to the privileged bits of the CPSR in cases where Secure User halting debug is supported, the processor is in Secure User mode, and invasive debug is not permitted in Secure privileged modes
• prevents attempts to set the CPSR.M field to 0b10001, FIQ mode, if NSACR.RFR == 1 and the processor is in Non-secure state.

On processors that do not implement the Security Extensions, all CPSR updates that are permitted in a privileged mode in Non-debug state are permitted in Debug state.

Table C5-4 defines the behavior on writes to the CPSR in Debug state.

Table C5-4 Permitted updates to the CPSR in Debug state

Mode         Secure   Logical (DBGEN   SU halting debug   Update privileged   Modify CPSR.M to
             state    AND SPIDEN)      supported a        CPSR bits b         Monitor mode
User         Yes      0                Yes                Update ignored      UNPREDICTABLE c
User         Yes      0                No                 Permitted d         Permitted
Privileged   Yes      0                X                  Permitted d         Permitted
Any          No       0                X                  Permitted d         UNPREDICTABLE e
Any          X        1                X                  Permitted d         Permitted

a. Secure User halting debug support.
b. This column does not apply to changing CPSR.M to Monitor mode. Apart from this, the CPSR bits are defined in Program Status Registers (PSRs) on page B1-14, and this column does apply to changing CPSR.M to any other value.
c. The definition of UNPREDICTABLE implies the processor must not enter a privileged mode.
d. Except that, regardless of the state of SPIDEN:
   - The SCR.AW, SCR.FW and SCTLR.NMFI bits have the same effects on writes to CPSR.A and CPSR.F as they do in Non-debug state, see Control of exception handling by the Security Extensions on page B1-41 and Non-maskable fast interrupts on page B1-18.
   - The NSACR.RFR bit has the same effect on writes to CPSR.M as it does in Non-debug state, see c1, Non-Secure Access Control Register (NSACR) on page B3-110.
e. The definition of UNPREDICTABLE implies the processor must not enter Monitor mode, and must not enter FIQ mode when NSACR.RFR == 1.

Being in Debug state when invasive halting debug is not permitted

A processor can be in a Secure privileged mode with SPIDEN LOW, see Generation of debug events on page C3-40 and Changing the authentication signals on page AppxA-4. More generally, it is possible to be in Debug state when the current mode, security state or debug authentication signals indicate that, in Non-debug state, debug events would be ignored. There are two situations where this can occur:
• Between a change in the debug authentication signals and the end of the next Instruction Synchronization Barrier operation, exception entry, or exception return. During this interval it is UNPREDICTABLE whether the behavior of any debug events that are generated follows the old or the new authentication signal settings.
• Because it is possible to change the authentication signals while in Debug state. For example, the following sequence of events can occur:
  1. The processor is in a Secure privileged mode. SPIDEN and DBGEN are both HIGH.
  2. An instruction is prefetched that matches all the conditions for a breakpoint to occur.
  3. That instruction is committed for execution.
  4. At the same time, an external device writes to the peripheral that controls SPIDEN and DBGEN, causing SPIDEN to be deasserted to LOW.
  5. SPIDEN changes, but the processor is already committed to entering Debug state.
  6. The processor enters Debug state and is in a Secure privileged mode, even though SPIDEN is LOW.
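Table C5-4 can be read as two decision functions, one per result column. The C sketch below is an illustrative model for a processor that implements the Security Extensions; the type and function names are hypothetical, and the footnote d restrictions (SCR.AW, SCR.FW, SCTLR.NMFI and NSACR.RFR) are not modeled. The example sequence of events above ends in the table row for a privileged mode in Secure state with (DBGEN AND SPIDEN) equal to 0, where both kinds of update are permitted.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum { CPSR_PERMITTED, CPSR_IGNORED, CPSR_UNPREDICTABLE } cpsr_update;

/* One field per input column of Table C5-4. */
typedef struct {
    bool user_mode;         /* Mode: User (true) or a privileged mode (false) */
    bool secure;            /* Secure state */
    bool dbgen_and_spiden;  /* Logical (DBGEN AND SPIDEN) */
    bool su_halting_debug;  /* Secure User halting debug supported */
} debug_ctx;

/* "Update privileged CPSR bits" column (excludes changing CPSR.M to Monitor mode). */
cpsr_update update_privileged_bits(debug_ctx c)
{
    if (c.user_mode && c.secure && !c.dbgen_and_spiden && c.su_halting_debug)
        return CPSR_IGNORED;
    return CPSR_PERMITTED;
}

/* "Modify CPSR.M to Monitor mode" column. */
cpsr_update modify_mode_to_monitor(debug_ctx c)
{
    if (c.dbgen_and_spiden)
        return CPSR_PERMITTED;                    /* last row */
    if (!c.secure)
        return CPSR_UNPREDICTABLE;                /* Non-secure row, footnote e */
    if (c.user_mode && c.su_halting_debug)
        return CPSR_UNPREDICTABLE;                /* Secure User row, footnote c */
    return CPSR_PERMITTED;
}
```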
If the sequence of events in this example occurs, leaving the processor in Debug state in a Secure privileged mode with SPIDEN LOW, the processor can change to other Secure privileged modes, including Monitor mode, and update privileged bits in the CPSR, because it is in a privileged mode. However, if the processor leaves Secure state or moves to Secure User mode, it might not be able to return to a Secure privileged mode.

C5.5.3 Changing the SCR.NS bit in Debug state

SCR.NS is the Non-secure state bit, see c1, Secure Configuration Register (SCR) on page B3-106. Because this bit is part of a coprocessor register, the rules for executing coprocessor instructions in Debug state apply, see Coprocessor and Advanced SIMD instructions in Debug state on page C5-16.

In Debug state, the SCR can be written to:
• when Secure User halting debug is supported:
  - in any Secure privileged mode, including Monitor mode, regardless of the state of DBGEN and SPIDEN
  - in Secure User mode only if DBGEN and SPIDEN are both HIGH
• when Secure User halting debug is not supported, in any Secure mode, including Monitor mode, regardless of the state of DBGEN and SPIDEN.

A write to the SCR in any other case is treated as an Undefined Instruction exception. For details of how Undefined Instruction exceptions are handled in Debug state see Exceptions in Debug state on page C5-20.

This is a particular case of the rules for accessing CP15 registers described in Coprocessor and Advanced SIMD instructions in Debug state.

Note
Normally, in Monitor mode, any exception automatically clears the SCR.NS bit to 0. However, an exception while in Debug state in Monitor mode does not have any effect on the value of the SCR.NS bit.

C5.5.4 Coprocessor and Advanced SIMD instructions in Debug state

The following sections describe the coprocessor and Advanced SIMD instructions in Debug state:
• Instructions for CP0 to CP13, and Advanced SIMD instructions
• Instructions for CP14 and CP15 on page C5-17.
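Before turning to the individual coprocessor rules, the SCR write conditions of section C5.5.3 can be summarized as a single predicate. In this hypothetical C sketch, a false result corresponds to the write being treated as an Undefined Instruction exception; the parameter names are assumptions chosen to mirror the text.

```c
#include <assert.h>
#include <stdbool.h>

/* Returns true if a write to the SCR is permitted in Debug state under the
   rules of C5.5.3. */
bool debug_scr_write_permitted(bool secure, bool user_mode,
                               bool su_halting_debug, bool dbgen, bool spiden)
{
    if (!secure)
        return false;              /* every permitted case is in Secure state */
    if (!su_halting_debug)
        return true;               /* any Secure mode, including Monitor mode */
    if (!user_mode)
        return true;               /* any Secure privileged mode, including Monitor mode */
    return dbgen && spiden;        /* Secure User mode needs both signals HIGH */
}
```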
Instructions for CP0 to CP13, and Advanced SIMD instructions

This subsection describes:
• Coprocessor instructions for CP0 to CP13. These include the VFP instructions.
• If the Advanced SIMD extension is implemented, the instruction encodings described in Advanced SIMD data-processing instructions on page A7-10 and Advanced SIMD element or structure load/store instructions on page A7-27.

Access controls for these instructions are determined:
• by the CPACR, see:
  - c1, Coprocessor Access Control Register (CPACR) on page B3-104, for a VMSA implementation
  - c1, Coprocessor Access Control Register (CPACR) on page B4-51, for a PMSA implementation
• additionally, if the Security Extensions are implemented, by the NSACR, see c1, Non-Secure Access Control Register (NSACR) on page B3-110.

In v6.1 Debug and v7 Debug, in Debug state the current mode and security state define the privilege and access controls for these instructions.

In v6 Debug, in Debug state it is IMPLEMENTATION DEFINED whether these instructions are executed using the privilege and access controls for the current mode and security state, or using the privilege and access controls for a privileged mode in the current security state.

Instructions for CP14 and CP15

This subsection describes the coprocessor instructions for the internal coprocessors CP14 and CP15. The two groups of registers provided by CP14 are:
• The CP14 debug registers, accessed by MCR and MRC instructions with <opc1> == 0b000. Some of these registers can also be accessed by CP14 LDC and STC instructions.
• The CP14 non-debug registers, accessed by MCR and MRC instructions with <opc1> != 0b000. These include the trace registers.

Accesses to CP14 and CP15 are as follows:
• Instructions that access CP14 or CP15 registers that are permitted (not UNDEFINED) in User mode when in Non-debug state are always permitted in Debug state.
• Instructions that access CP14 debug registers that are permitted (not UNDEFINED) in privileged modes when in Non-debug state are permitted in Debug state, regardless of the debug authentication and the processor mode and security state.
• If Secure User halting debug is supported, ARM recommends that certain CP15 instructions that a debugger requires to maintain memory coherency are permitted in Debug state regardless of debug permissions and the processor mode, see Access to specific cache management functions in Debug state on page C5-25.
• If the processor is in a privileged mode or the debugger can write to the CPSR.M bits to change to a privileged mode, then instructions that access CP14 or CP15 registers that are permitted (not UNDEFINED) in privileged modes when in Non-debug state are permitted in Debug state. If the processor is in User mode there is no requirement to change to a privileged mode first.
  Note
  - Two particular cases are where the Security Extensions are not implemented and where Secure User halting debug is not supported. In these cases the CPSR.M bits can always be changed to a privileged mode and, therefore, the debugger is able to access all CP14 and CP15 registers at all times.
  - Except for accesses to the Baseline CP14 debug registers, ARM deprecates accessing any CP14 or CP15 register from User mode in Debug state if that register cannot be accessed from User mode in Non-debug state.
• In every case, permissions to access CP14 and CP15 registers while in Debug state are never greater than the permissions granted to any privileged mode when in Non-debug state in the current security state.
• If the processor is in Secure User mode and the debugger cannot write to the CPSR.M bits to change to a privileged mode, then any instruction that accesses a CP14 non-debug register or a CP15 register is not permitted (UNDEFINED) in Debug state if it is not permitted in Secure User mode in Non-debug state.
• Any CP14 or CP15 register access that is not permitted generates an Undefined Instruction exception. For details of how Undefined Instruction exceptions are handled in Debug state see Exceptions in Debug state on page C5-20.
• If the processor is in a privileged mode or the debugger can write to the CPSR.M bits to change to a privileged mode, then any CP14 or CP15 instruction is UNPREDICTABLE in Debug state if that instruction is UNPREDICTABLE in Non-debug state.
• On processors that implement the Security Extensions, any access to a Banked CP15 register accesses the copy for the current security state. If the processor is in Monitor mode, the Non-debug state rules for accessing CP15 registers in Monitor mode apply.

This means that, for example:
• If the processor is stopped in Non-secure state and invasive debug is not permitted in Secure privileged modes, then the debugger has access only to those CP15 registers accessible in Non-secure state in Non-debug state.
• If the processor is stopped with invasive debug permitted in Secure privileged modes, then the debugger has access to all CP15 registers. If the processor is in Non-secure state, the debugger can switch the processor to Monitor mode to access the SCR.NS bit, to give access to all CP15 registers.

Invasive debug is permitted in Secure privileged modes when both SPIDEN and DBGEN are HIGH.

In Debug state, the CP15SDISABLE input to the processor operates in exactly the same way as in Non-debug state, see The CP15SDISABLE input on page B3-76:
• if CP15SDISABLE is HIGH, any operation affected by CP15SDISABLE in Non-debug state results in an Undefined Instruction exception in Debug state
• if CP15SDISABLE is LOW, it has no effect on any register access.
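The CP14 and CP15 access rules above can be approximated by the following C sketch. It is a simplified, hypothetical model: the caller supplies whether the access is permitted in User mode and in privileged modes in Non-debug state for the current security state, with the effect of CP15SDISABLE already taken into account. The sketch does not model the recommended cache maintenance operations of Table C5-5 or the UNPREDICTABLE cases.

```c
#include <assert.h>
#include <stdbool.h>

typedef struct {
    bool cp14;       /* true for CP14, false for CP15 */
    unsigned opc1;   /* <opc1> field of the MCR/MRC; 0 selects CP14 debug registers */
    bool user_ok;    /* permitted in User mode in Non-debug state */
    bool priv_ok;    /* permitted in privileged modes in Non-debug state,
                        current security state */
} cp_access;

/* can_be_privileged: the processor is in a privileged mode, or the debugger
   can write CPSR.M to reach one. Returns false where the access would
   generate an Undefined Instruction exception. */
bool debug_cp_access_permitted(cp_access a, bool can_be_privileged)
{
    if (a.user_ok)
        return true;                          /* always permitted in Debug state */
    if (!a.priv_ok)
        return false;                         /* never more than a privileged mode gets */
    bool cp14_debug_reg = a.cp14 && a.opc1 == 0u;
    return cp14_debug_reg || can_be_privileged;
}
```

The model makes the key asymmetry visible: a CP14 debug register that is accessible from privileged modes is reachable even from Secure User mode without debug permission, whereas a CP14 trace register or a CP15 register is not.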
C5.6 Behavior of non-invasive debug in Debug state

If any non-invasive debug features exist, their behavior in Debug state is broadly the same as when non-invasive debug is not permitted. For details see About non-invasive debug authentication on page C7-2.

Note
When the DBGDSCR.DBGack bit, Force Debug Acknowledge, is set to 1 and the processor is in Non-debug state, the behavior of non-invasive debug features is IMPLEMENTATION DEFINED. However, in this case non-invasive debug features must behave either as if in Debug state or as if in Non-debug state.

C5.7 Exceptions in Debug state

This section describes how exceptions are handled when the processor is in Debug state:
• exception handling is the same in Debug state in v7 Debug and v6.1 Debug, except for some slight differences in when asynchronous aborts are recognized
• there are some differences in exception handling in Debug state in v6 Debug, and these are indicated.

Exceptions are handled as follows when the processor is in Debug state:

Reset
    On a Reset exception, the processor leaves Debug state. The reset handler runs in Non-debug state, see Reset on page B1-48.
    Note
    This only applies to a reset that in Non-debug state would cause a Reset exception. It does not apply to a debug logic reset. For more information on debug logic reset, see Recommended reset scheme for v7 Debug on page C6-16.
Prefetch Abort
    A Prefetch Abort exception cannot be generated because no instructions are prefetched in Debug state.
SVC
    The SVC instruction is UNPREDICTABLE.
SMC
    The SMC instruction is UNPREDICTABLE.
BKPT
    The BKPT instruction is UNPREDICTABLE.
Debug events
    Debug events are ignored in Debug state.
Interrupts
    IRQ and FIQ exceptions are disabled and not taken in Debug state.
    Note
    This behavior does not depend on the values of the I and F bits in the CPSR, and the values of these bits are not changed on entering Debug state. However, if the Interrupt Status Register (ISR) is implemented, the ISR.I and ISR.F bits continue to reflect the values of the IRQ and FIQ inputs to the processor. For more information, see c12, Interrupt Status Register (ISR) on page B3-150.
Undefined Instruction
    Undefined Instruction exceptions are generated for the same reasons in Debug state as in Non-debug state. The behavior depends on the Debug architecture version:
    v6.1 Debug, v7 Debug
        When an Undefined Instruction exception is generated in Debug state, the processor takes the exception as follows:
        • PC, CPSR, SPSR_und, LR_und, SCR.NS, and DBGDSCR.MOE are unchanged.
        • The processor remains in Debug state.
        • DBGDSCR.UND_l, the Sticky Undefined Instruction bit, is set to 1.
        For more information, see the description of the UND_l bit in Debug Status and Control Register (DBGDSCR) on page C10-10.
    v6 Debug
        See Undefined Instruction and Data Abort exceptions in Debug state in v6 Debug on page C5-23.
Synchronous data abort
    Data Abort exceptions are generated by synchronous data aborts in Debug state. The behavior depends on the Debug architecture version:
    v6.1 Debug, v7 Debug
        When a Data Abort exception is generated by a synchronous data abort in Debug state, the processor takes the exception as follows:
        • PC, CPSR, SPSR_abt, LR_abt, SCR.NS, and DBGDSCR.MOE are unchanged.
        • The processor remains in Debug state.
        • DBGDSCR.SDABORT_l, the Sticky Synchronous Data Abort bit, is set to 1.
        • The DFSR and DFAR are updated if any of the following is true:
          - Secure User halting debug is not supported
          - the processor is not in Secure User mode
          - invasive debug is permitted in Secure privileged modes.
          Otherwise it is IMPLEMENTATION DEFINED whether the DFSR and DFAR are updated.
        • If the ISR is implemented, the ISR.A bit is not changed, because no abort is pended.
        See also the description of the SDABORT_l bit in Debug Status and Control Register (DBGDSCR) on page C10-10.
    v6 Debug
        See Undefined Instruction and Data Abort exceptions in Debug state in v6 Debug on page C5-23.
Asynchronous abort
    The behavior depends on the Debug architecture version:
    v6.1 Debug, v7 Debug
        When an asynchronous abort is signalled in Debug state, no Data Abort exception is generated and the processor behaves as follows:
        • The setting of the CPSR.A bit is ignored.
        • PC, CPSR, SPSR_abt, LR_abt, SCR.NS, and DBGDSCR.MOE are unchanged.
        • The processor remains in Debug state.
        • The DFSR is unchanged.
        • If DBGDSCR.ADAdiscard is 1:
          - DBGDSCR.ADABORT_l, the Sticky Asynchronous Data Abort bit, is set to 1.
          - On exit from Debug state, this asynchronous abort is not acted on.
          - If the ISR is implemented, the ISR.A bit is not changed, because no abort is pended.
        • If DBGDSCR.ADAdiscard is 0:
          - In v7 Debug, DBGDSCR.ADABORT_l is unchanged.
          - In v6.1 Debug, DBGDSCR.ADABORT_l is set to 1.
          - On exit from Debug state, this asynchronous abort is acted on.
          - If the asynchronous abort is an external asynchronous abort, and the ISR is implemented, the ISR.A bit is set to 1, indicating that an external abort is pending.
        See also:
        • Asynchronous aborts and entry to Debug state on page C5-5
        • the descriptions of the ADABORT_l and ADAdiscard bits in Debug Status and Control Register (DBGDSCR) on page C10-10.
    v6 Debug
        When an asynchronous abort is signalled in Debug state, then:
        • if the CPSR.A bit is 1, the abort is generated when the CPSR.A bit is cleared to 0
        • if the CPSR.A bit is 0, a Data Abort exception is generated, see Undefined Instruction and Data Abort exceptions in Debug state in v6 Debug on page C5-23.
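For v6.1 Debug and v7 Debug, the effect of these exceptions on the sticky DBGDSCR bits can be sketched in C as follows. The struct fields are named after the DBGDSCR fields but are modeled as booleans rather than at their bit positions, abort_pending_on_exit is a model-only flag that is not an architectural register bit, and the DFSR, DFAR and ISR behavior is not modeled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum { DBG_V6_1, DBG_V7 } debug_version;
typedef enum { EXC_UNDEFINED, EXC_SYNC_DATA_ABORT, EXC_ASYNC_ABORT } debug_exception;

typedef struct {
    bool und_l;                  /* DBGDSCR.UND_l, Sticky Undefined Instruction */
    bool sdabort_l;              /* DBGDSCR.SDABORT_l, Sticky Synchronous Data Abort */
    bool adabort_l;              /* DBGDSCR.ADABORT_l, Sticky Asynchronous Data Abort */
    bool adadiscard;             /* DBGDSCR.ADAdiscard */
    bool abort_pending_on_exit;  /* model only: abort acted on after leaving Debug state */
} dscr_model;

/* PC, CPSR, the banked LR and SPSR, SCR.NS and DBGDSCR.MOE never change and the
   processor stays in Debug state, so only the sticky bits are updated here. */
void debug_state_exception(dscr_model *d, debug_version v, debug_exception e)
{
    assert(d != NULL);
    switch (e) {
    case EXC_UNDEFINED:
        d->und_l = true;
        break;
    case EXC_SYNC_DATA_ABORT:
        d->sdabort_l = true;
        break;
    case EXC_ASYNC_ABORT:
        if (d->adadiscard) {
            d->adabort_l = true;               /* abort is discarded */
        } else {
            if (v == DBG_V6_1)
                d->adabort_l = true;           /* v7 Debug leaves ADABORT_l unchanged */
            d->abort_pending_on_exit = true;
        }
        break;
    }
}
```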
C5.7.1 Undefined Instruction and Data Abort exceptions in Debug state in v6 Debug

In v6 Debug, if an Undefined Instruction exception is generated when the processor is in Jazelle state and Debug state, the result is UNPREDICTABLE.

Otherwise, in v6 Debug, Undefined Instruction and Data Abort exceptions generated in Debug state are taken by the processor as follows:
• The PC, CPSR, and SPSR_<mode> are set in the same way as in a normal Non-debug state exception entry. In addition:
  - if the exception is an asynchronous abort, and the PC has not yet been written, LR_abt is set as for exception entry in Non-debug state
  - in all other cases, LR_<mode> is set to an UNKNOWN value.
• The processor remains in Debug state, and does not prefetch the exception vector.

In addition, for a Data Abort exception:
• The DFSR and DFAR are set in the same way as in a normal Non-debug state exception entry. The DBGWFAR is set to an UNKNOWN value. The IFSR is not modified.
• DBGDSCR.ADABORT_l or DBGDSCR.SDABORT_l is set to 1.
• The DBGDSCR.MOE bits are set to 0b0110, D-side abort occurred.

For more information about asynchronous aborts in ARMv6 see Behavior in ARMv6 on page C5-6.

Debuggers must take care when processing a debug event that occurred when the processor was executing an exception handler. The debugger must save the values of SPSR_und and LR_und before performing any operation that might result in an Undefined Instruction exception being generated in Debug state. The debugger must also save the values of SPSR_abt and LR_abt, and of the DFSR, DFAR and DBGWFAR, before performing an operation that might generate a Data Abort exception when in Debug state. If this is not done, register values might be overwritten, resulting in UNPREDICTABLE software behavior.
C5.8 Memory system behavior in Debug state

The Debug architecture places requirements on the memory system. There are two general guidelines:
• Memory coherency has to be maintained during debugging.
• Debugging should be as non-intrusive as possible. This requires a way to preserve, for example, the contents of memory caches and translation lookaside buffers (TLBs), so that the state of the target application is not altered.

In Debug state, it is strongly recommended that the caches and TLBs, where implemented, behave as described here. For preservation purposes, it is strongly recommended that it is possible to:
• disable cache evictions and linefills, so that cache accesses, on read or write, do not cause the contents of caches to change
• disable TLB evictions and replacements, so that translations do not cause the contents of TLBs to change.

The mechanisms for disabling these operations:
• must be accessible by the external debugger
• are only required when in Debug state.

In v6.1 Debug and v7 Debug, the Debug State Cache Control Register (DBGDSCCR) and the Debug State MMU Control Register (DBGDSMCR) are used for this purpose.

While the processor is in Debug state, no instruction fetches occur and therefore:
• if the system implements separate instruction and data caches, there might be no instruction cache evictions or replacements
• if the system implements separate instruction and data TLBs, there might be no instruction TLB evictions or replacements.

In Debug state, reads must behave as in Non-debug state:
• cache reads return data from the cache
• cache misses fetch from external memory.

A debugger must be able to maintain coherency between instruction and data memory, and maintain coherency in a multiprocessor system. This means that in Debug state a debugger must be able to force all writes to update all levels of memory to the point of coherency.
It must be possible to reset the memory system of the processor to a known safe and coherent state. It must also be possible to reset any caches of meta-information, such as branch predictor arrays, to a safe and coherent state.

For debugging purposes, ARM recommends that TLBs can be disabled so that all TLB accesses are read from the main translation tables, not from the TLB. This enables a debugger to access memory without using any virtual-to-physical memory mapping that is implemented for the application.

C5.8.1 Access to specific cache management functions in Debug state

If a processor includes the Security Extensions and supports Secure User halting debug, it must implement mechanisms that enable the memory system requirements to be met when debugging in Secure User mode while invasive debug is not permitted in Secure privileged modes. In this situation, executing the CP15 cache and TLB control operations would otherwise be prohibited.

To meet these requirements, ARM recommends that, on a processor that implements the Security Extensions and supports Secure User halting debug, when the processor is in Debug state:
• the rules for accessing CP15 registers do not apply for a certain set of register access operations
• the set of operations depends on the Debug architecture version, as shown in Table C5-5.
Table C5-5 CP15 operations permitted from User mode in Debug state

Versions              Operation                    Description
v7 Debug              MCR p15,0,<Rt>,c7,c5,0       Invalidate entire instruction cache and flush branch predictor arrays (a)
                      MCR p15,0,<Rt>,c7,c5,1       Invalidate instruction cache by MVA (a)
                      MCR p15,0,<Rt>,c7,c5,7       Invalidate MVA from branch predictor array
                      MCR p15,0,<Rt>,c7,c10,1      Clean data or unified cache line by MVA to point of coherency (b)
                      MCR p15,0,<Rt>,c7,c10,2      Clean data or unified cache line by set/way (b)
                      MCR p15,0,<Rt>,c7,c11,1      Clean data or unified cache line by MVA to point of unification (b)
                      MCR p15,0,<Rt>,c7,c1,0       Invalidate entire instruction cache Inner Shareable (c)
                      MCR p15,0,<Rt>,c7,c1,6       Invalidate entire branch predictor array Inner Shareable (c)
v6.1 Debug            MCRR p15,0,<Rt>,<Rt2>,c5     Invalidate instruction cache by VA range
v6.1 Debug, v7 Debug  MCR p15,0,<Rt>,c7,c5,6       Flush entire branch predictor array

a. See also v7 Debug restrictions on instruction cache invalidation in Secure User debug on page C5-26.
b. A debugger does not have to perform cache cleaning operations if DBGDSCCR.nWT is implemented and is set to 0, see Debug State Cache Control Register (DBGDSCCR) on page C10-81. This is because, when nWT is set to 0, writes do not leave dirty data in the cache that is not coherent with outer levels of memory. However, the instruction cache is not updated, so instruction cache invalidate operations are still required.
c. These instructions are part of the Multiprocessing Extensions. See Multiprocessor effects on cache maintenance operations on page B2-23.

These instructions must be executable in Debug state regardless of any processor setting. However, use of an operation can generate an abort if instruction cache lockdown is in use.

For more information about debug access to coprocessor instructions, see Coprocessor and Advanced SIMD instructions in Debug state on page C5-16.
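A debugger that runs the Table C5-5 operations in Debug state needs their ARM instruction encodings, for example to issue them through the DBGITR. The sketch below assumes the ARM-state (A1) MCR encoding with condition AL:
• cond in bits [31:28], with 0b1110 in bits [27:24]
• opc1 in [23:21], and 0 in bit [20] (MCR rather than MRC)
• CRn in [19:16], Rt in [15:12], and coproc in [11:8]
• opc2 in [7:5], 1 in bit [4], and CRm in [3:0].

`encode_mcr` and the `table_c5_5` array are illustrative names, not an ARM API. The v6.1 Debug MCRR range operation is omitted.

```c
#include <stdint.h>

/* ARM-state MCR with condition AL:
 * MCR p<coproc>, <opc1>, <Rt>, <CRn>, <CRm>, <opc2> */
static uint32_t encode_mcr(unsigned coproc, unsigned opc1, unsigned rt,
                           unsigned crn, unsigned crm, unsigned opc2) {
    return 0xEE000010u              /* cond=AL, bits[27:24]=0b1110, bit[20]=0, bit[4]=1 */
         | ((opc1   & 0x7u) << 21)
         | ((crn    & 0xFu) << 16)
         | ((rt     & 0xFu) << 12)
         | ((coproc & 0xFu) << 8)
         | ((opc2   & 0x7u) << 5)
         |  (crm    & 0xFu);
}

/* The MCR rows of Table C5-5 (all use coproc 15, opc1 = 0, CRn = c7). */
typedef struct {
    const char *name;
    unsigned crm, opc2;
} cp15_op;

static const cp15_op table_c5_5[] = {
    { "Invalidate entire I-cache, flush branch predictors",  5, 0 },
    { "Invalidate I-cache by MVA",                           5, 1 },
    { "Invalidate MVA from branch predictor array",          5, 7 },
    { "Clean D/unified line by MVA to PoC",                 10, 1 },
    { "Clean D/unified line by set/way",                    10, 2 },
    { "Clean D/unified line by MVA to PoU",                 11, 1 },
    { "Invalidate entire I-cache Inner Shareable",           1, 0 },
    { "Invalidate entire branch predictor Inner Shareable",  1, 6 },
    { "Flush entire branch predictor array",                 5, 6 },
};

/* Encoding of table row i, with the operand passed in register rt. */
static uint32_t table_c5_5_insn(unsigned i, unsigned rt) {
    return encode_mcr(15, 0, rt, 7, table_c5_5[i].crm, table_c5_5[i].opc2);
}
```

For example, `table_c5_5_insn(0, 0)` yields 0xEE070F15, the encoding of MCR p15,0,R0,c7,c5,0.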
For more information about the ARMv7 cache maintenance operations, see:
• CP15 c7, Cache and branch predictor maintenance functions on page B3-126 for a VMSA implementation
• CP15 c7, Cache and branch predictor maintenance functions on page B4-68 for a PMSA implementation.

In v6 Debug, on any processor that does not implement the Security Extensions, or when debugging in a state and mode where privileged CP15 operations can be executed, the debugger can use any CP15 operation. These include, but are not limited to, the operations listed in Table C5-5 on page C5-25.

v7 Debug restrictions on instruction cache invalidation in Secure User debug

An ARMv7 implementation that includes the Security Extensions and supports Secure User halting debug must support Secure User debug access to at least one of these instruction cache invalidation operations:
• Invalidate entire instruction cache, and flush branch predictor arrays, MCR p15,0,<Rt>,c7,c5,0
• Invalidate instruction cache by MVA, MCR p15,0,<Rt>,c7,c5,1.

An implementation might support both of these operations.

If the DBGDSCCR.nWT bit is not implemented, the implementation must also support Secure User debug access to at least the Clean data or unified cache line by MVA to point of coherency operation.

A debugger requires access to an instruction cache invalidation operation so that it can maintain coherency between instruction memory and data memory, and between processors in a multiprocessor system. However, the architecture imposes restrictions on the operation of these instructions in Debug state that do not apply when the instructions are used in normal operation. In Secure User mode in Debug state, when invasive debug is not permitted in Secure privileged modes:
• If the Invalidate all instruction caches operation is supported, it must:
  - invalidate all unlocked lines in the cache
  - leave any locked lines in the cache unchanged.
  If there are locked lines in the cache, the instruction can abort, but only after it has invalidated all unlocked lines. However, there is no requirement for the operation to abort if there are locked lines.
• If the Invalidate instruction caches by MVA operation is supported, it must not invalidate a locked line. If an instruction attempts to invalidate a locked line in Secure User mode debug, the implementation must either:
  - ignore the instruction
  - abort the instruction.

These requirements mean that these instructions might operate differently in Debug state from how they operate in Non-debug state.

Note
In ARMv7, it is IMPLEMENTATION DEFINED whether instruction cache locking is supported.

C5.8.2 Debug state Cache and MMU Control Registers

In v6 Debug, the Debug state MMU Control Register (DBGDSMCR) and Debug state Cache Control Register (DBGDSCCR) are not defined.

In v6.1 Debug, ARM recommends that the DBGDSMCR and DBGDSCCR debug registers are implemented.

v7 Debug requires DBGDSMCR and DBGDSCCR, but there can be IMPLEMENTATION DEFINED limits on their behavior. For descriptions of these registers, see Memory system control registers on page C10-80.

In all debug implementations there can be IMPLEMENTATION DEFINED support for cache behavior override and, on a VMSA implementation, for TLB debug control.

C5.9 Leaving Debug state

The processor leaves Debug state when a restart request command is received. A restart request can be one of the following:
• An External Restart request. This is a request from the system for the processor to leave Debug state. The External Restart request enables multiple processors to be restarted synchronously. The External Restart request is generated by IMPLEMENTATION DEFINED means.
  Typically, this is by asserting an External Restart request input to the processor.
• A restart request command. In v7 Debug, the restart request command is made by a debugger writing 1 to the DBGDRCR Restart request bit, see Debug Run Control Register (DBGDRCR), v7 Debug only on page C10-29.

A number of flags in the Debug Status and Control Register (DBGDSCR) must be set correctly before leaving Debug state, see Debug Status and Control Register (DBGDSCR) on page C10-10. These flags must be set as follows:
• the sticky exception flags, DBGDSCR[8:6], must be set to 0b000
• the Execute ARM Instruction Enable bit, DBGDSCR.ITRen, must be set to 0
• the Latched Instruction Complete flag, DBGDSCR.InstrCompl_l, must be set to 1.

In v7 Debug the sticky exception flags are cleared to 0 by writing 1 to the Clear Sticky Exceptions bit of the DBGDRCR. This operation can be combined with the restart request command. For more information see Debug Run Control Register (DBGDRCR), v7 Debug only on page C10-29.

If the processor is signaled to leave Debug state without all of these flags set to the correct values, the results are UNPREDICTABLE.

On receipt of a restart request, the processor performs a sequence of operations to leave Debug state. If DBGDSCR is read during the restart sequence, DBGDSCR.RESTARTED must read as 0 and DBGDSCR.HALTED must read as 1. At all other times DBGDSCR.RESTARTED must read as 1. On completion of the restart sequence, the processor leaves Debug state:
• DBGDSCR.HALTED is set to 0.
• The processor stops ignoring debug events and starts executing instructions from the address held in the PC, in the mode and instruction set state indicated by the current value of the CPSR. The execution state bits of the CPSR are honored, and the IT bits state machine is restarted, with the current value applying to the first instruction executed.
• Unless the DBGDSCR.DBGack bit is set to 1, the processor signals to the system that it is in Non-debug state.
  Details of this signalling method, including whether it is implemented, are IMPLEMENTATION DEFINED.

Note
Leaving Debug state is not a memory barrier operation. This means that:
• If a debugger executes any context-altering operations in Debug state, it must issue an Instruction Synchronization Barrier (ISB) instruction before leaving Debug state.
• If the debugger executes any memory access instructions in Debug state, it must execute a Data Synchronization Barrier (DSB) instruction before leaving Debug state, to ensure those accesses are complete. This DSB might form part of the IMPLEMENTATION DEFINED sequence of instructions required to ensure that the processor has recognized any asynchronous aborts, as described in Asynchronous aborts and entry to Debug state on page C5-5.

For details of the recommended external debug interface, see Run-control and cross-triggering signals on page AppxA-5 and DBGACK and DBGCPUDONE on page AppxA-7.

In v6 Debug and v6.1 Debug:
• the DBGDRCR and External Restart request are not supported
• if the processor implements the recommended ARM Debug Interface v4, the restart request command is issued through the JTAG interface by placing the RESTART instruction in the IR and taking the Debug TAP State Machine through the Run-Test/Idle state. Connecting multiple JTAG interfaces in series enables multiple processors to be restarted synchronously.

Chapter C6
Debug Register Interfaces

This chapter describes the debug register interfaces.
It contains the following sections:
• About the debug register interfaces on page C6-2
• Reset and power-down support on page C6-4
• Debug register map on page C6-18
• Synchronization of debug register updates on page C6-24
• Access permissions on page C6-26
• The CP14 debug register interfaces on page C6-32
• The memory-mapped and recommended external debug interfaces on page C6-43.

C6.1 About the debug register interfaces

The Debug architecture defines a set of debug registers. The debug register interfaces provide access to these registers. This chapter describes the different ways of implementing the debug register interfaces.

The debug register interfaces provide access to the debug registers from:
• software running on the processor, see Processor interface to the debug registers
• an external debugger, see External interface to the debug registers.

The debug register interfaces always include the Debug Communications Channel, see The Debug Communications Channel (DCC) on page C6-3.

C6.1.1 Processor interface to the debug registers

Table C6-4 on page C6-32 lists the CP14 debug instructions that must be implemented for accessing the debug registers.

The possible interfaces between the software running on the processor and the debug registers are:
• The Baseline CP14 interface. This provides access to a small set of the debug registers through a set of coprocessor instructions. It must be implemented by all processors.
• The Extended CP14 interface. This provides access to the remaining debug registers through a coprocessor interface. It is required in v6 Debug and v6.1 Debug, and is optional in v7 Debug.
• The memory-mapped interface. This provides memory-mapped access to the debug registers. It is introduced in v7 Debug, and is an optional interface.
  When it is implemented:
  - some of the registers that are accessed through the Baseline CP14 interface are not available through the memory-mapped interface
  - it is IMPLEMENTATION DEFINED whether the memory-mapped interface is visible only to the processor in which the debug registers are implemented, or is also visible to other processors in the system.

An ARMv7 implementation must include the Baseline CP14 interface and at least one of:
• the Extended CP14 interface
• the memory-mapped interface.

C6.1.2 External interface to the debug registers

Every ARMv6 and ARMv7 implementation must include an external debug interface. This interface provides access to the debug registers from an external debugger through a Debug Access Port (DAP). This interface is IMPLEMENTATION DEFINED. For details of the interface recommended by ARM:
• for an ARMv7 implementation, see the ARM Debug Interface v5 Architecture Specification
• for an ARMv6 implementation, contact ARM.

The Debug architecture does not require implementation of the recommended interface. However:
• the ARM RealView tools require the recommended interface
• ARM recommends this interface for compatibility with other tool chains.

C6.1.3 The Debug Communications Channel (DCC)

The debug register interface includes the Debug Communications Channel (DCC). This is accessed through two physical registers:
• DBGDTRTX, for data transfers from the processor to an external debugger
• DBGDTRRX, for data transfers from the external debugger to the processor.

In addition, there are four DCC status flags in the DBGDSCR:
• TXfull and TXfull_l, indicating the DBGDTRTX status
• RXfull and RXfull_l, indicating the DBGDTRRX status.
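The DCC handshake can be modelled in a few lines. This is a simulation sketch under stated assumptions, not a driver. `dcc_sim` is a hypothetical stand-in for the DBGDSCR and the two DCC data registers. It models only the TXfull (bit [29]) and RXfull (bit [30]) flags, and ignores the latched TXfull_l and RXfull_l flags and the distinction between the internal and external views. On real hardware, processor-side accesses would use the CP14 instructions listed in Table C6-4.

```c
#include <stdint.h>
#include <stdbool.h>

#define DSCR_TXFULL (1u << 29)   /* DBGDSCR.TXfull */
#define DSCR_RXFULL (1u << 30)   /* DBGDSCR.RXfull */

/* Hypothetical simulated DCC state. */
typedef struct {
    uint32_t dscr;
    uint32_t dtrtx;   /* DBGDTRTX: processor -> debugger */
    uint32_t dtrrx;   /* DBGDTRRX: debugger -> processor */
} dcc_sim;

/* Processor side: send a word only if DBGDTRTX is empty. */
static bool dcc_put(dcc_sim *d, uint32_t word) {
    if (d->dscr & DSCR_TXFULL)
        return false;             /* debugger has not collected the last word */
    d->dtrtx = word;
    d->dscr |= DSCR_TXFULL;       /* modelled side effect of the write */
    return true;
}

/* Processor side: receive a word only if DBGDTRRX holds data. */
static bool dcc_get(dcc_sim *d, uint32_t *word) {
    if (!(d->dscr & DSCR_RXFULL))
        return false;
    *word = d->dtrrx;
    d->dscr &= ~DSCR_RXFULL;
    return true;
}

/* Debugger side: collect a word the processor sent. */
static bool host_read(dcc_sim *d, uint32_t *word) {
    if (!(d->dscr & DSCR_TXFULL))
        return false;
    *word = d->dtrtx;
    d->dscr &= ~DSCR_TXFULL;
    return true;
}

/* Debugger side: pass a word to the processor. */
static bool host_write(dcc_sim *d, uint32_t word) {
    if (d->dscr & DSCR_RXFULL)
        return false;
    d->dtrrx = word;
    d->dscr |= DSCR_RXFULL;
    return true;
}
```

Returning `false` instead of spinning leaves the polling policy (retry, timeout or yield) to the caller.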
There are separate internal and external views of the DBGDSCR, and of the DBGDTRTX and DBGDTRRX Registers:
• DBGDTRTXint, DBGDTRRXint and DBGDSCRint provide the internal view
• DBGDTRTXext, DBGDTRRXext and DBGDSCRext provide the external view.

For more information, see Internal and external views of the DBGDSCR and the DCC registers on page C6-21.

Note
In previous descriptions of the DCC, the term DTR (Data Transfer Register) is used to describe the DCC data registers. In those descriptions, the DBGDTRTX Register is named wDTR, and the DBGDTRRX Register is named rDTR.

C6.2 Reset and power-down support

This section contains the following subsections:
• Debug guidelines for systems with energy management capability
• Power domains and debug on page C6-5
• The OS Save and Restore mechanism on page C6-8
• Recommended reset scheme for v7 Debug on page C6-16.

C6.2.1 Debug guidelines for systems with energy management capability

ARMv7 processors can be built with energy management capabilities. This section describes how to use the v7 Debug features to debug software running on these systems.

v7 Debug permits debugging only of software running on a system where:
• energy-saving measures are taken only when the processor is in an idle state
• it is a function of the operating system, or other supervisor code, to take any implemented energy-saving measures.

The measures that the OS can take to save energy during an idle state can be split into two groups:

Standby
    The OS takes some measures, including using IMPLEMENTATION DEFINED features, to reduce energy consumption. The processor preserves the processor state, including the debug logic state. Changing from standby to normal operation does not involve a reset of the processor.

Power-down
    The OS takes some measures to reduce energy consumption.
    These measures mean that the processor cannot preserve the processor state, and therefore the measures taken must include the OS saving any processor state that must not be lost. Changing from power-down to normal operation must include:
    • a reset of the processor, after the power level has been restored
    • reinstallation of the processor state by the OS.

Standby is the least invasive OS energy-saving state. It implies only that the processor is unavailable, and does not clear any of the debug settings. For standby, v7 Debug prescribes only the following:
• If the processor is in standby and a Halting debug event is triggered, the processor must leave standby to handle the debug event. If the processor executed a WFI or WFE instruction to enter standby, that instruction is retired.
• If the processor is in standby and the external debug or memory-mapped interface is accessed, the processor must automatically:
  - leave standby
  - respond to the debug transaction
  - go back to standby.
  This is possible because the external debug and memory-mapped interfaces can insert wait states, for example by holding PREADYDBG LOW, until the processor has left standby.

The protocol for communicating between the debug logic and the power controller, enabling the processor to leave and return to standby automatically, is IMPLEMENTATION DEFINED.

v7 Debug includes features that can aid software debugging in a system that dynamically powers down the processor. These techniques are described in greater detail in the following sections.

C6.2.2 Power domains and debug

This section does not apply to v6 Debug and v6.1 Debug, which support only a single power domain.

This section discusses how, in v7 Debug, some registers can be split between different power domains to implement support for debug over power-down and re-powering of the processor.
In v7 Debug, it is IMPLEMENTATION DEFINED whether a processor supports debug over power-down:
• debug over power-down can be supported only if the processor implements the features summarized in this section
• when a processor implements the features required for debug over power-down, it is IMPLEMENTATION DEFINED whether a system that includes that processor supports debug over power-down
• usually, a system that does not support debug over power-down implements a single power domain.

An ARMv7 processor with a single power domain cannot support debug over power-down. This means that the number of power domains supported by an ARMv7 processor is IMPLEMENTATION DEFINED. However, ARM recommends that at least two are implemented, to provide support for debug over power-down. The two power domains required for this are:
• a debug power domain
• a core power domain.

The debug power domain contains the external debug interface control logic and a subset of the debug resources. This subset is determined by physical placement constraints and other considerations that are explained later in this chapter. Figure C6-1 on page C6-7 shows an example of such a system.

This arrangement is useful, for example, for debugging systems where several processors are connected to the same debug bus and where one or more of the processors can power down at any time. It has two advantages:
• The debug bus is not made unavailable by the core power domain powering down:
  - if the debugger tries to access the processor while the core power domain is powered down, the external debug interface can return a slave-generated error response instead of locking the system
  - if the debugger tries to access another processor, the access can proceed normally.
  The debug bus might be, for example, an APBv3 or internal debug bus.
• Some debug registers are unaffected by power-down. This means that a debugger can, for example, identify the processor while the core power domain is powered down.
For full debug support of power-down and re-powering of the processor, the following registers and individual bits must be in the debug power domain:

DBGECR
    This enables the debugger to set the OS Unlock Catch bit to 1 at any time and still break on completion of the power-up sequence. If this register were in the core power domain, the power-down event would clear this catch bit to 0. For more information, see Event Catch Register (DBGECR) on page C10-78.

DBGDRCR[0] Halt request bit
    This enables the debugger to request Debug state entry even if the processor is powered down. Also, if the debugger makes this request before powering down but the request cannot be satisfied, for example because the processor is in Secure state but (DBGEN AND SPIDEN) = 0, the request remains pending through power-down.
    Note
    The processor has to be powered up to respond to a pending DBGDRCR[0] Halt request or External Debug request.

OS Lock Access Register
    This enables the lock that the OS sets before saving the debug registers to remain set through power-down. For details see OS Lock Access Register (DBGOSLAR) on page C10-75.

Device Power-down and Reset registers
    These registers must be in the debug power domain because some of their functions are used for debugging power-down events. See Device Power-down and Reset Control Register (DBGPRCR), v7 Debug only on page C10-31, and Device Power-down and Reset Status Register (DBGPRSR), v7 Debug only on page C10-34.

Lock Access Register, if implemented
    If implemented, this register must be in the debug power domain because it is used to enable certain accesses by the external debug interface, and this functionality is required when debugging power-down events.

Identification registers and the DBGDIDR
    The identification registers are at addresses 0xD00-0xDFC and 0xFD0-0xFEC.
    For details of these registers see Management registers, ARMv7 only on page C10-88.

Debugger operation requires only the registers and bits listed above to be in the debug power domain. However, to rationalize the split between the debug and core power domains in the register map, ARMv7 requires an implementation that supports debug over power-down to have all bits of the following registers in the debug power domain:

DBGDIDR, DBGECR, and DBGDRCR
    No error response is returned on read or write accesses when the core power domain is powered down.

OS Save and Restore registers, and Device Power-down and Reset registers
    No error response is returned on read or write accesses when the core power domain is powered down. However, accesses to the OS Lock Access Register (DBGOSLAR) and OS Save and Restore Register (DBGOSSRR) are UNPREDICTABLE when the core power domain is powered down.

All of the management registers, except for the IMPLEMENTATION DEFINED integration registers
    The management registers are registers 832-1023, in the address range 0xD00-0xFFC. Requiring all these registers to be in the debug power domain simplifies the decoding of register addresses for the registers in the debug power domain.
    Note
    The CP15 c0 registers (0xD00-0xDFC) are included in this category.

For all other registers, including any IMPLEMENTATION DEFINED registers, it is IMPLEMENTATION DEFINED whether the register is implemented in the core or the debug power domain.

Figure C6-1 shows the recommended power domain split.
Figure C6-1 Recommended power domain split between core and debug power domains
[Figure: a power domain boundary divides the processor into a core power domain, supplied by Core domain Vdd and containing the remainder of the processor logic and all other debug registers, and a debug power domain, supplied by Debug domain Vdd and containing the external debug interface and the DBGDIDR, DBGECR, DBGDRCR, OS Save and Restore registers, DBGPRCR, DBGPRSR, and management registers. The figure also shows a bridge, and the DBGPWRDUP and DBGNOPWRDWN signals connecting to the power controller.]

The signals DBGNOPWRDWN and DBGPWRDUP shown in Figure C6-1 form an interface between the power controller and the processor debug logic that is in the debug power domain. With this interface:
• the external debugger can request the power controller to emulate power-down, which simplifies the requirements on software at the cost of less realistic behavior
• the external debug interface knows when the core power domain is powered down, and can communicate this information to the external debugger.

For details of these signals see DBGNOPWRDWN on page AppxA-9 and DBGPWRDUP on page AppxA-10.

If the core power domain is not being powered down at the same time as the debug power domain, the authentication signal DBGEN must be pulled LOW before power is removed from the debug power domain. The behavior of the debug logic, and in particular the generation of debug events, is UNPREDICTABLE when the debug power domain is not powered if DBGEN is not LOW. Pulling DBGEN LOW ensures that debug events are ignored by the processor. For more information, see Changing the authentication signals on page AppxA-4.

Reads and writes of debug registers when the debug logic is powered down are UNPREDICTABLE.

The performance monitors must be implemented in the core power domain, and must continue to operate when debug power is removed.
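The ordering rule for removing debug power can be expressed as a small state check. This is a hypothetical model, not a description of any real power controller. `power_state`, `remove_debug_power` and `debug_behavior_unpredictable` are illustrative names. A real system would drive DBGEN and the supply switches through IMPLEMENTATION DEFINED logic.

```c
#include <stdbool.h>

/* Hypothetical model of the signals involved. */
typedef struct {
    bool core_powered;   /* core power domain supply on */
    bool debug_powered;  /* debug power domain supply on */
    bool dbgen;          /* authentication signal DBGEN (true = HIGH) */
} power_state;

/* The combination the text makes UNPREDICTABLE: the debug logic is
 * unpowered while the core keeps running with DBGEN still HIGH. */
static bool debug_behavior_unpredictable(const power_state *p) {
    return p->core_powered && !p->debug_powered && p->dbgen;
}

/* Remove debug power. If the core domain stays powered, pull DBGEN LOW
 * first so that the processor ignores debug events. */
static void remove_debug_power(power_state *p) {
    if (p->core_powered)
        p->dbgen = false;        /* step 1: DBGEN LOW */
    p->debug_powered = false;    /* step 2: remove debug power */
}
```

Doing both steps in one routine means a caller cannot remove debug power while DBGEN is still HIGH.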
The rest of this part of this manual assumes that two power domains are implemented as described in this section, and therefore that the implementation supports debug over power-down. Features that are not required for an ARMv7 implementation with a single power domain are identified as SinglePower, with a description of the differences in behavior. A SinglePower implementation cannot support debug over power-down.

C6.2.3 The OS Save and Restore mechanism

The OS Save and Restore mechanism enables an operating system to save the debug registers before power-down and restore them when power is restored. This extends the support for debug over power-down, and permits debug tools to work at a higher level of abstraction when there are no power-down events.

In v7 Debug:
• If an implementation supports debug over power-down, it must implement the OS Save and Restore mechanism.
• On a SinglePower implementation, and on any other implementation that does not support debug over power-down, it is IMPLEMENTATION DEFINED whether the OS Save and Restore mechanism is implemented.
• If the OS Save and Restore mechanism is not implemented, the DBGOSLSR must be implemented as RAZ, and the other OS Save and Restore mechanism register encodings must be RAZ/WI.

In v6 Debug and v6.1 Debug, these registers are not defined.

Two of the requirements for an implementation that supports debug over power-down are:
• An operating system must be able to save and restore much of the debug logic state over a power-down. This requirement is met by the OS Save and Restore mechanism.
• A debugger must be able to detect that a processor has powered down. For more information, see Permissions in relation to power-down on page C6-28.
The OS Save and Restore mechanism is provided by the following registers:
• OS Save and Restore Register (DBGOSSRR), see OS Save and Restore Register (DBGOSSRR) on page C10-77
• OS Lock Access Register (DBGOSLAR), see OS Lock Access Register (DBGOSLAR) on page C10-75
• OS Lock Status Register (DBGOSLSR), see OS Lock Status Register (DBGOSLSR) on page C10-76
• Event Catch Register (DBGECR), see Event Catch Register (DBGECR) on page C10-78.

You can read the DBGOSLSR to detect whether the OS Save and Restore mechanism is implemented. If it is not implemented, the read of the DBGOSLSR returns zero.

The DBGOSSRR works in conjunction with an internal sequence counter, so that a series of reads or writes of this register saves or restores the complete debug logic state of the processor that would be lost when the processor is powered down. Writing the key, 0xC5ACCE55, to the DBGOSLAR resets the internal sequence counter to the start of the sequence. The number of accesses required, and the order and interpretation of the data, are IMPLEMENTATION DEFINED.

The first access to the DBGOSSRR following the reset of the internal sequence counter must be a read:
• when performing an OS Save sequence, this read returns the number of reads from the DBGOSSRR that are needed to save the entire debug logic state
• when performing an OS Restore sequence, the value returned by this read is UNKNOWN.

Issuing a write to the DBGOSSRR as the first access following a reset of the internal sequence counter is UNPREDICTABLE.

Note
• If the OS Save and Restore mechanism is not implemented, this first read returns zero, correctly indicating to software that no registers are to be saved.
• An implementation that includes the OS Save and Restore mechanism might not provide access to the DBGOSSRR through the external debug interface. In this case:
  - the DBGOSLSR, DBGOSLAR, and DBGECR are accessible through the external debug interface
  - through the external debug interface, the DBGOSSRR is RAZ/WI
  - the first read of the DBGOSSRR through the external debug interface returns zero, indicating that the OS registers cannot be saved or restored through the external debug interface.

The subsequent accesses to the DBGOSSRR must be either all reads or all writes. UNPREDICTABLE behavior results if:
• reads and writes are mixed
• more accesses are performed than the number of registers to be saved or restored, as returned by the first read in the OS Save sequence
• the subsequent accesses are writes, but the OS Lock is cleared after fewer writes than the number of registers to be restored.

The debug logic state of the processor is unchanged if the OS Lock is cleared during or following an OS Save sequence. The sequence is restarted the next time the OS Lock is set.

When the core power domain is powered down, or when the OS Lock is not set, reads of the DBGOSSRR return an UNKNOWN value and writes are UNPREDICTABLE.

See Example OS Save and Restore sequences on page C6-12 for software examples of the OS Save and Restore processes.

The debug logic state preserved by the OS Save and Restore mechanism

If debug over power-down is supported, the OS Save and Restore mechanism permits the following debug logic state to be preserved:
• The registers that must be in the debug power domain, see Power domains and debug on page C6-5.
• The DBGWFAR.
• The DBGBVRs, DBGBCRs, DBGWVRs, DBGWCRs, and DBGVCR.
• The DBGDSCCR and DBGDSMCR.
• The data transfer registers DBGDTRTX and DBGDTRRX, subject to the values of DBGDSCR.TXfull and DBGDSCR.RXfull when the OS Save sequence is performed:
  - If DBGDSCR.TXfull is set to 1, the value of DBGDTRTX is guaranteed to be saved and restored.
  - If DBGDSCR.RXfull is set to 1, the value of DBGDTRRX is guaranteed to be saved and restored.
  - If either of these flags is not set to 1 when the OS Save sequence is performed, the value of the corresponding register is UNKNOWN after the OS Restore sequence.
  Note
  The OS Save and Restore sequences must not stall reading the values of DBGDTRTX and DBGDTRRX, and must not cause any instructions to be issued, regardless of the settings of the DBGDSCR.ExtDCCmode access mode bits.
• The DCC status flags themselves:
  - DBGDSCR.TXfull, bit [29]
  - DBGDSCR.TXfull_l, bit [26]
  - DBGDSCR.RXfull, bit [30]
  - DBGDSCR.RXfull_l, bit [27].
  Note
  Reading DBGDSCR through the DBGOSSRR has no side-effects, that is, the values of TXfull_l and RXfull_l are unchanged.
• All other writable flags in the DBGDSCR:
  - Method of Debug Entry bits, MOE, bits [5:2]
  - Force Debug Acknowledge bit, DBGack, bit [10]
  - Interrupts Disable bit, INTdis, bit [11]
  - User mode Access to Communication Channel Disable bit, UDCCdis, bit [12]
  - Execute ARM Instruction Enable bit, ITRen, bit [13]
  - Halting debug-mode Enable bit, HDBGen, bit [14]
  - Monitor debug-mode Enable bit, MDBGen, bit [15]
  - External DCC access mode field, ExtDCCmode, bits [21:20].
• If vectored interrupt support is implemented and enabled, all state required to ensure the correct generation of Vector Catch debug events. For more information, see Vector catch debug events and vectored interrupt support on page C3-22.

The OS Save sequence must preserve at least all of this debug logic state that is lost when the core power domain is powered down. The OS Save sequence does not have to preserve any debug logic state that is not lost when the core power domain is powered down. That is, it does not have to preserve any debug logic state that is in the debug power domain.
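The DBGDSCR fields listed above can be collected into a bit mask, for example to check which bits of a saved DBGDSCR value should survive an OS Save and Restore. This is a sketch with illustrative macro names. The preserved mask uses the bit positions given above. The not-preserved mask uses the sticky exception flags, DBGDSCR[8:6], and the read-only status flags that the following paragraph lists as not preserved.

```c
#include <stdint.h>

#define BIT(n)        (1u << (n))
#define FIELD(hi, lo) (((1u << ((hi) - (lo) + 1)) - 1u) << (lo))

/* DBGDSCR fields preserved by the OS Save and Restore mechanism. */
#define DSCR_SAVED_MASK ( FIELD(5, 2)    /* MOE */          \
                        | BIT(10)        /* DBGack */       \
                        | BIT(11)        /* INTdis */       \
                        | BIT(12)        /* UDCCdis */      \
                        | BIT(13)        /* ITRen */        \
                        | BIT(14)        /* HDBGen */       \
                        | BIT(15)        /* MDBGen */       \
                        | FIELD(21, 20)  /* ExtDCCmode */   \
                        | BIT(26)        /* TXfull_l */     \
                        | BIT(27)        /* RXfull_l */     \
                        | BIT(29)        /* TXfull */       \
                        | BIT(30))       /* RXfull */

/* DBGDSCR fields that the mechanism does not preserve. */
#define DSCR_NOT_SAVED_MASK ( BIT(0) | BIT(1)  /* HALTED, RESTARTED */          \
                            | FIELD(8, 6)      /* sticky exception flags */     \
                            | FIELD(19, 16)    /* SPIDdis, SPNIDdis, NS, ADAdiscard */ \
                            | BIT(24)          /* InstrCompl_l */               \
                            | BIT(25))         /* PipeAdv */

/* Keep only the DBGDSCR bits an OS Restore is expected to reinstate. */
static uint32_t dscr_saved_bits(uint32_t dscr) {
    return dscr & DSCR_SAVED_MASK;
}
```

The two masks are disjoint, which matches the text: no field appears in both the preserved and the not-preserved lists.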
The OS Save and Restore mechanism does not preserve:
• The sticky exception flags in the DBGDSCR, and the contents of the DBGITR.
• The read-only processor status flags in the DBGDSCR:
— HALTED, bit [0]
— RESTARTED, bit [1]
— SPIDdis, bit [16]
— SPNIDdis, bit [17]
— NS, bit [18]
— ADAdiscard, bit [19]
— InstrCompl_l, bit [24]
— PipeAdv, bit [25].
• The performance monitor registers described in Chapter C9 Performance Monitors.
• The trace registers.
The OS Restore sequence always overwrites the debug registers with the values that were saved. In particular, after the OS Restore sequence the values of the DBGDTRTX and DBGDTRRX Registers, and of the DCC status flags TXfull, TXfull_l, RXfull, and RXfull_l, are the saved values. Any valid values in the DBGDTRTX or DBGDTRRX Registers immediately before the OS Restore sequence are lost.
Example OS Save and Restore sequences
Example OS Save and Restore sequences are described in:
• Example OS Save and Restore sequences using the memory-mapped interface
• Example OS Save and Restore sequences using the Extended CP14 interface on page C6-14.
Example OS Save and Restore sequences using the memory-mapped interface
On an implementation that includes the OS Save and Restore mechanism and a memory-mapped interface:
• Example C6-1 shows the correct sequence for saving the debug logic state, using the memory-mapped interface, before powering down
• Example C6-2 on page C6-13 shows the correct sequence for restoring the debug logic state, using the memory-mapped interface, when the system is powered on again.
When the debug logic state is restored, if the OS Unlock Catch bit in the Event Catch Register is set to 1, a debug event is triggered when the DBGOSLAR is cleared. An external debugger might use this event to restart a debugging session.
See Event Catch Register (DBGECR) on page C10-78.
Example C6-1 OS debug register save sequence, memory-mapped interface

        ; On entry, R0 points to a block to save the debug registers in.
SaveDebugRegisters
        PUSH    {R4, LR}
        MOV     R4, R0                  ; Save pointer

        ; (1) Set OS Lock Access Register (DBGOSLAR). The architecture requires that DBGOSLAR
        ;     and the other debug registers have at least the Device memory attribute.
        BL      GetDebugRegisterBase    ; Returns base in R0
        LDR     R1, =0xC5ACCE55
        STR     R1, [R0, #0x300]        ; Write DBGOSLAR

        ; (2) Get the number of words to save.
        LDR     R1, [R0, #0x308]        ; DBGOSSRR returns size
        STR     R1, [R4], #4            ; Push on to the save stack

        ; (3) Loop reading words from the DBGOSSRR.
        CMP     R1, #0                  ; Check for zero
SaveDebugRegisters_Loop
        ITTT    NE
        LDRNE   R2, [R0, #0x308]        ; Load a word of data
        STRNE   R2, [R4], #4            ; Push on to the save stack
        SUBSNE  R1, R1, #1
        BNE     SaveDebugRegisters_Loop

        ; (4) Return the pointer to first word not written to. Leave DBGOSLAR set, because
        ;     from now on we do not want any changes.
        MOV     R0, R4
        POP     {R4, PC}

Example C6-2 OS debug register restore sequence, memory-mapped interface

        ; On entry, R0 points to a block of saved debug registers.
RestoreDebugRegisters
        PUSH    {R4, LR}
        MOV     R4, R0                  ; Save pointer

        ; (1) Set the OS Lock Access Register (DBGOSLAR) and reset pointer. The lock will
        ;     already be set, but this write is needed to reset the pointer. The architecture
        ;     requires that DBGOSLAR and the other debug registers have at least the Device
        ;     memory attribute.
        BL      GetDebugRegisterBase    ; Returns base in R0
        LDR     R1, =0xC5ACCE55
        STR     R1, [R0, #0x300]        ; Write DBGOSLAR

        ; (2) Clear the Sticky Power-down Status bit.
        LDR     R1, [R0, #0x314]        ; Read DBGPRSR to clear StickyPD

        ; (3) Get the number of words saved.
        LDR     R1, [R0, #0x308]        ; Dummy read of DBGOSSRR
        LDR     R1, [R4], #4            ; Get register count from the save stack

        ; (4) Loop writing words to the DBGOSSRR.
        CMP     R1, #0                  ; Check for zero
RestoreDebugRegisters_Loop
        ITTT    NE
        LDRNE   R2, [R4], #4            ; Load a word from the save stack
        STRNE   R2, [R0, #0x308]        ; Store a word of data
        SUBSNE  R1, R1, #1
        BNE     RestoreDebugRegisters_Loop

        ; (5) Clear the DBGOSLAR. Writing any non-key value clears the lock, so use the
        ;     zero value in R1.
        STR     R1, [R0, #0x300]        ; Write DBGOSLAR

        ; (6) A final DSB ensures the restore is complete and an ISB ensures
        ;     the restored register values are visible to subsequent instructions.
        DSB
        ISB

        ; (7) Return the pointer to first word not read.
        MOV     R0, R4
        POP     {R4, PC}

Example OS Save and Restore sequences using the Extended CP14 interface
On an implementation that includes the OS Save and Restore mechanism and the Extended CP14 interface:
• Example C6-3 shows the correct sequence for saving the debug logic state, using the Extended CP14 interface, before powering down
• Example C6-4 on page C6-15 shows the correct sequence, using the Extended CP14 interface, for restoring the debug logic state when the system is powered on again.
When the debug logic state is restored, if the OS Unlock Catch bit in the Event Catch Register is set to 1, a debug event is triggered when the DBGOSLAR is cleared. An external debugger might use this event to restart a debugging session. See Event Catch Register (DBGECR) on page C10-78.
Example C6-3 OS debug register save sequence, Extended CP14 interface

        ; On entry, R0 points to a block to save the debug registers in.
SaveDebugRegisters
        ; (1) Set OS Lock Access Register (DBGOSLAR).
        LDR     R1, =0xC5ACCE55
        MCR     p14, 0, R1, c1, c0, 4   ; Write DBGOSLAR
        ISB

        ; (2) Get the number of words to save.
        MRC     p14, 0, R1, c1, c2, 4   ; DBGOSSRR returns size
        STR     R1, [R0], #4            ; Push on to the save stack

        ; (3) Loop reading words from the DBGOSSRR.
        CMP     R1, #0                  ; Check for zero
SaveDebugRegisters_Loop
        ITTT    NE
        MRCNE   p14, 0, R2, c1, c2, 4   ; Load a word of data
        STRNE   R2, [R0], #4            ; Push on to the save stack
        SUBSNE  R1, R1, #1
        BNE     SaveDebugRegisters_Loop

        ; (4) Return the pointer to first word not written to. This pointer is already in R0, so
        ;     all that is needed is to return from this function.
        ;
        ;     Leave DBGOSLAR set, because from now on we do not want any changes.
        BX      LR

Example C6-4 OS debug register restore sequence, Extended CP14 interface

        ; On entry, R0 points to a block of saved debug registers.
RestoreDebugRegisters
        ; (1) Set OS Lock Access Register (DBGOSLAR) and reset pointer. The lock
        ;     will already be set, but this write is needed to reset the pointer.
        LDR     R1, =0xC5ACCE55
        MCR     p14, 0, R1, c1, c0, 4   ; Write DBGOSLAR
        ISB

        ; (2) Clear the Sticky Power-down Status bit.
        MRC     p14, 0, R1, c1, c5, 4   ; Read DBGPRSR to clear StickyPD
        ISB

        ; (3) Get the number of words saved.
        MRC     p14, 0, R1, c1, c2, 4   ; Dummy read of DBGOSSRR
        LDR     R1, [R0], #4            ; Load size from the save stack

        ; (4) Loop writing words to the DBGOSSRR.
        CMP     R1, #0                  ; Check for zero
RestoreDebugRegisters_Loop
        ITTT    NE
        LDRNE   R2, [R0], #4            ; Load a word from the save stack
        MCRNE   p14, 0, R2, c1, c2, 4   ; Store a word of data
        SUBSNE  R1, R1, #1
        BNE     RestoreDebugRegisters_Loop

        ; (5) Clear the OS Lock Access Register (DBGOSLAR). Writing any non-key value
        ;     clears the lock, so use the zero value in R1.
        ISB
        MCR     p14, 0, R1, c1, c0, 4   ; Write DBGOSLAR

        ; (6) A final ISB guarantees the restored register values are visible to subsequent
        ;     instructions.
        ISB

        ; (7) Return the pointer to first word not read. This pointer is already in R0, so
        ;     all that is needed is to return from this function.
        BX      LR
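When these sequences are modelled in a host-side test harness, it can help to restate Examples C6-1 and C6-2 in C. The following is a sketch under stated assumptions, not driver code: register access goes through a hypothetical dbg_io callback structure instead of real volatile accesses, the barrier step is only noted in a comment, and toy_dbg is a hypothetical three-word model of the debug logic used only to exercise the protocol. The offsets 0x300, 0x308 and 0x314 and the key 0xC5ACCE55 are taken from the examples above.

```c
#include <stdint.h>

#define DBGOSLAR_OFF 0x300u      /* OS Lock Access Register */
#define DBGOSSRR_OFF 0x308u      /* OS Save and Restore Register */
#define DBGPRSR_OFF  0x314u      /* Device Power-down and Reset Status Register */
#define OS_LOCK_KEY  0xC5ACCE55u

/* Hypothetical accessor interface. On hardware these would be volatile
   accesses to a Device or Strongly-ordered mapping of the debug registers. */
typedef struct {
    uint32_t (*rd)(void *ctx, uint32_t off);
    void (*wr)(void *ctx, uint32_t off, uint32_t val);
    void *ctx;
} dbg_io;

/* Example C6-1 in C. Returns a pointer to the first word not written. */
uint32_t *save_debug_registers(const dbg_io *io, uint32_t *buf)
{
    io->wr(io->ctx, DBGOSLAR_OFF, OS_LOCK_KEY);   /* (1) set OS Lock, reset pointer */
    uint32_t n = io->rd(io->ctx, DBGOSSRR_OFF);   /* (2) first read returns the size */
    *buf++ = n;
    while (n-- != 0)                              /* (3) only reads from now on */
        *buf++ = io->rd(io->ctx, DBGOSSRR_OFF);
    return buf;                                   /* (4) OS Lock is left set */
}

/* Example C6-2 in C. Returns a pointer to the first word not read. */
const uint32_t *restore_debug_registers(const dbg_io *io, const uint32_t *buf)
{
    io->wr(io->ctx, DBGOSLAR_OFF, OS_LOCK_KEY);   /* (1) reset the DBGOSSRR pointer */
    (void)io->rd(io->ctx, DBGPRSR_OFF);           /* (2) on hardware, clears StickyPD */
    (void)io->rd(io->ctx, DBGOSSRR_OFF);          /* (3) dummy read of DBGOSSRR */
    uint32_t n = *buf++;
    while (n-- != 0)                              /* (4) only writes from now on */
        io->wr(io->ctx, DBGOSSRR_OFF, *buf++);
    io->wr(io->ctx, DBGOSLAR_OFF, 0);             /* (5) a non-key value clears the lock */
    /* (6) On hardware, DSB and ISB follow here; portable C cannot express them. */
    return buf;
}

/* Hypothetical toy model of the debug logic, for host testing only. */
typedef struct {
    uint32_t state[3];  /* state that would be lost at core power-down */
    uint32_t idx;       /* DBGOSSRR pointer: 0 means the next access returns the size */
    int locked;         /* OS Lock status */
} toy_dbg;

static uint32_t toy_rd(void *ctx, uint32_t off)
{
    toy_dbg *d = ctx;
    if (off != DBGOSSRR_OFF || !d->locked)
        return 0;       /* other registers, and an unlocked DBGOSSRR (UNKNOWN on hardware) */
    uint32_t i = d->idx++;
    if (i == 0)
        return 3u;      /* number of words to save or restore */
    return i <= 3u ? d->state[i - 1] : 0;
}

static void toy_wr(void *ctx, uint32_t off, uint32_t val)
{
    toy_dbg *d = ctx;
    if (off == DBGOSLAR_OFF) {
        d->locked = (val == OS_LOCK_KEY);
        d->idx = 0;     /* in this model, any DBGOSLAR write resets the pointer */
    } else if (off == DBGOSSRR_OFF && d->locked && d->idx >= 1u && d->idx <= 3u) {
        d->state[d->idx - 1] = val;
        d->idx++;
    }
}
```

Routing every access through callbacks keeps the ordering rules for the DBGOSSRR visible: one key write to reset the pointer, one size read, then only reads (save) or only writes (restore).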
C6.2.4 Recommended reset scheme for v7 Debug
The processor reset scheme is IMPLEMENTATION DEFINED. The ARM architecture, described in parts A and B of this manual, does not distinguish different levels of reset. However, in a typical system, there are a number of reasons why multiple levels of reset might exist. In particular, for debug:
• It is desirable to be able to debug the reset sequence. This requires support for:
— setting the debug register values before performing a processor reset
— a processor reset not resetting the debug register values.
• Providing separate power domains means you might need to reset the debug logic independently from the logic in the core power domain.
For these reasons, v7 Debug introduces a distinction between debug logic reset and non-debug logic reset. These resets can be applied independently. The reset descriptions in parts A and B of this manual describe the non-debug logic reset. Part C describes the debug logic reset and its interaction with the non-debug logic reset. The non-debug logic reset is sometimes referred to as a core logic reset.
ARM recommends use of the following reset signals for an implementation that supports these independent resets:
nSYSPORESET
This signal must be driven LOW on power-up of both the core and debug power domains. It sets parts of the processor logic, including debug logic, to a known state.
nCOREPORESET
If the core power domain is powered down while the system is still powered up, this signal must be driven LOW when the core power domain is powered back up. It sets parts of the processor logic in the core power domain to a known state. This reset also initializes the debug registers that are in the core power domain.
nRESET
This signal is driven LOW to generate a warm reset, that is, when the system wants to set the processor to a known state but the reset has nothing to do with any power-down, for example a watchdog reset.
It sets parts of the non-debug processor logic to a known state. A debug session must be unaffected by this reset.
PRESETDBGn
The debugger drives this signal LOW to set parts of the debug logic to a known state. This signal must be driven LOW on power-up of the debug logic.
v6 Debug and v6.1 Debug systems do not support multiple power domains, and therefore ARM recommends a less flexible reset scheme for them, consisting of only nSYSPORESET and nRESET. The debug logic is reset only on nSYSPORESET and has no independent reset signal.
In the v7 Debug recommended reset scheme, a separate PRESETDBGn reset signal can be asserted at any time, not just at power-up. This new signal has similar effects to nSYSPORESET, that is, it clears all debug registers, unless otherwise noted by the register definition. For more information, see Appendix A Recommended External Debug Interface.
Asynchronously asserting PRESETDBGn can lead to UNPREDICTABLE behavior. For example, the reset might change the values of debug registers that are in use or will be used by software. For more information about this reset scheme, contact ARM.
Table C6-1 summarizes the v7 Debug recommended reset scheme.
Table C6-1 Recommended reset scheme, v7 Debug
Signal | Debug power domain: Debug logic | Core power domain: Debug logic | Core power domain: Non-debug logic
nSYSPORESET | Reset | Reset (a) | Reset
nCOREPORESET | Not reset | Reset (a) | Reset
nRESET | Not reset | Not reset | Reset
PRESETDBGn | Reset | Reset (a) | Not reset
a. If the core power domain is not powered, or the Sticky Power-down status bit DBGPRSR[1] is set to 1, it is UNPREDICTABLE whether the registers are reset. If power is not applied to the core power domain, nCOREPORESET must be driven LOW when power is restored to the core power domain. This resets these registers.
For ARMv7 SinglePower systems, ARM recommends only nSYSPORESET, nRESET, and PRESETDBGn.
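A platform model or test bench might need Table C6-1 in executable form. The following C sketch encodes the table as data. All identifiers are hypothetical, and the RESET_IF_CORE_UP value is this sketch's own way of representing footnote a.

```c
#include <stdbool.h>

/* Effect of one reset signal on one block of logic, per Table C6-1. */
typedef enum {
    NOT_RESET,
    RESET,
    RESET_IF_CORE_UP  /* footnote a: UNPREDICTABLE whether reset if the core power
                         domain is unpowered or DBGPRSR[1] (StickyPD) is 1 */
} reset_effect;

typedef enum {
    SIG_nSYSPORESET,
    SIG_nCOREPORESET,
    SIG_nRESET,
    SIG_PRESETDBGn,
    NUM_RESET_SIGNALS
} reset_signal;

typedef struct {
    reset_effect debug_domain_debug;    /* debug power domain: debug logic */
    reset_effect core_domain_debug;     /* core power domain: debug logic */
    reset_effect core_domain_nondebug;  /* core power domain: non-debug logic */
} reset_row;

static const reset_row reset_table[NUM_RESET_SIGNALS] = {
    [SIG_nSYSPORESET]  = { RESET,     RESET_IF_CORE_UP, RESET     },
    [SIG_nCOREPORESET] = { NOT_RESET, RESET_IF_CORE_UP, RESET     },
    [SIG_nRESET]       = { NOT_RESET, NOT_RESET,        RESET     },
    [SIG_PRESETDBGn]   = { RESET,     RESET_IF_CORE_UP, NOT_RESET },
};

/* True if sig leaves the debug logic in the debug power domain untouched. */
static bool preserves_debug_domain(reset_signal sig)
{
    return reset_table[sig].debug_domain_debug == NOT_RESET;
}

/* True only if sig is guaranteed to reset the debug logic in the core power
   domain, given the core power state and the StickyPD value. */
static bool core_debug_reset_guaranteed(reset_signal sig, bool core_powered, bool sticky_pd)
{
    switch (reset_table[sig].core_domain_debug) {
    case RESET:            return true;
    case RESET_IF_CORE_UP: return core_powered && !sticky_pd;
    default:               return false;
    }
}
```

Keeping the table as data, rather than as branching code, makes it easy to check each row against Table C6-1 by eye.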
Debug behavior when the processor is in debug logic reset
The implementation of separate debug and core power domains with a separate debug logic reset signal means that a processor can access debug registers and the DCC while in the debug logic reset state. When in debug logic reset:
• The behavior of the DCC is UNPREDICTABLE. In particular, the values of the DBGDSCR.RXfull and DBGDSCR.TXfull flags are UNKNOWN.
• It is UNPREDICTABLE whether a debug event that would have been generated by the state of the debug logic immediately before the debug logic reset is generated.
• The debug logic must not generate any debug event that would not have been generated if the system was not in debug logic reset.
• Accesses to the debug registers through the Extended CP14, memory-mapped and external debug interfaces are UNPREDICTABLE.
C6.3 Debug register map
Table C6-2 lists all of the debug registers. Full details of each register can be found in the referenced section.
The number of DBGBVR/DBGBCR and DBGWVR/DBGWCR pairs is IMPLEMENTATION DEFINED, see the BRPs and WRPs fields of the Debug ID Register (DBGDIDR) on page C10-3. An implementation can have up to 16 of each.
The interpretation of the information in the Access column depends on whether the coprocessor or memory-mapped interface is used to access the register.
Collectively, registers 832-1023 are known as the management registers.
Table C6-2 Debug register map
Register number | Offset | Access (a) | Versions (b) | Name and reference to description
0 | 0x000 | Read-only | All | Debug ID Register (DBGDIDR) on page C10-3.
Not applicable (c) | - | Read-only | v7 only | Debug ROM Address Register (DBGDRAR) on page C10-7.
Not applicable (c) | - | Read-only | v7 only | Debug Self Address Offset Register (DBGDSAR) on page C10-8.
1-5 | - | - | - | Reserved.
6 | 0x018 | Read/write | v7 (d) | Watchpoint Fault Address Register (DBGWFAR) on page C10-28.
7 | 0x01C | Read/write | All | Vector Catch Register (DBGVCR) on page C10-67.
8 | - | - | - | Reserved.
9 | 0x024 | Read/write | v7 only | Event Catch Register (DBGECR) on page C10-78.
10 | 0x028 | Read/write | v6.1, v7 | Debug State Cache Control Register (DBGDSCCR) on page C10-81.
11 | 0x02C | Read/write | v6.1, v7 | Debug State MMU Control Register (DBGDSMCR) on page C10-84.
12-31 | - | - | - | Reserved.
32 | 0x080 | Read/write | v7 (e) | DBGDTRRX external view (f). See Host to Target Data Transfer Register (DBGDTRRX) on page C10-40.
33 | 0x084 | Write-only | v7 (e) | Instruction Transfer Register (DBGITR) on page C10-46.
33 | 0x084 | Read-only | v7 (e) | Program Counter Sampling Register (DBGPCSR) on page C10-38.
34 | 0x088 | Read/write | v7 (e) | DBGDSCR external view (f). See Debug Status and Control Register (DBGDSCR) on page C10-10.
35 | 0x08C | Read/write | v7 (e) | DBGDTRTX external view (f). See Target to Host Data Transfer Register (DBGDTRTX) on page C10-43.
36 | 0x090 | Write-only | v7 only | Debug Run Control Register (DBGDRCR), v7 Debug only on page C10-29.
37-39 | - | - | - | Reserved.
40 | 0x0A0 | Read-only | v7 only | Program Counter Sampling Register (DBGPCSR) on page C10-38.
41 | 0x0A4 | Read-only | v7 only | Context ID Sampling Register (DBGCIDSR) on page C10-39.
42-63 | - | - | - | Reserved.
64-79 | 0x100-0x13C | Read/write or - | All | Breakpoint Value Registers (DBGBVR) on page C10-48 or Reserved.
80-95 | 0x140-0x17C | Read/write or - | All | Breakpoint Control Registers (DBGBCR) on page C10-49 or Reserved.
96-111 | 0x180-0x1BC | Read/write or - | All | Watchpoint Value Registers (DBGWVR) on page C10-60 or Reserved.
112-127 | 0x1C0-0x1FC | Read/write or - | All | Watchpoint Control Registers (DBGWCR) on page C10-61 or Reserved.
128-191 | - | - | - | Reserved.
192 | 0x300 | Write-only | v7 only | OS Lock Access Register (DBGOSLAR) on page C10-75.
193 | 0x304 | Read-only | v7 only | OS Lock Status Register (DBGOSLSR) on page C10-76.
194 | 0x308 | Read/write | v7 only | OS Save and Restore Register (DBGOSSRR) on page C10-77.
195 | - | - | - | Reserved.
196 | 0x310 | Read/write | v7 only | Device Power-down and Reset Control Register (DBGPRCR), v7 Debug only on page C10-31.
197 | 0x314 | Read-only | v7 only | Device Power-down and Reset Status Register (DBGPRSR), v7 Debug only on page C10-34.
198-511 | - | - | - | Reserved.
512-575 | 0x800-0x8FC | - | v7 only | IMPLEMENTATION DEFINED.
576-831 | - | - | - | Reserved.
832-895 | 0xD00-0xDFC | Read-only | v7 only | Processor identification registers on page C10-88.
896-927 | - | - | - | Reserved.
928-959 | 0xE80-0xEFC | - | v7 only | IMPLEMENTATION DEFINED integration registers. See the CoreSight Architecture Specification.
960 | 0xF00 | Read/write | v7 only | Integration Mode Control Register (DBGITCTRL) on page C10-91.
961-999 | 0xF04-0xF9C | - | v7 only | Reserved for management registers expansion.
1000 | 0xFA0 | Read/write | v7 only | Claim Tag Set Register (DBGCLAIMSET) on page C10-92.
1001 | 0xFA4 | Read/write | v7 only | Claim Tag Clear Register (DBGCLAIMCLR) on page C10-93.
1002-1003 | - | - | - | Reserved.
1004 | 0xFB0 | Write-only | v7 only | Lock Access Register (DBGLAR) on page C10-94.
1005 | 0xFB4 | Read-only | v7 only | Lock Status Register (DBGLSR) on page C10-95.
1006 | 0xFB8 | Read-only | v7 only | Authentication Status Register (DBGAUTHSTATUS) on page C10-96.
1007-1009 | - | - | - | Reserved.
1010 | 0xFC8 | Read-only | v7 only | Debug Device ID Register (DBGDEVID) on page C10-6.
1011 | 0xFCC | Read-only | v7 only | Device Type Register (DBGDEVTYPE) on page C10-98.
1012-1019 | 0xFD0-0xFEC | Read-only | v7 only | Debug Peripheral Identification Registers (DBGPID0 to DBGPID4) on page C10-98.
1020-1023 | 0xFF0-0xFFC | Read-only | v7 only | Debug Component Identification Registers (DBGCID0 to DBGCID3) on page C10-102.
a. For more information, see CP14 debug registers access permissions on page C6-36 and Permission summaries for memory-mapped and external debug interfaces on page C6-45.
b. An entry of All in the Versions column indicates that the register is implemented in v6 Debug, v6.1 Debug, and v7 Debug.
c. These registers are only implemented through the Baseline CP14 interface and do not have register numbers or offsets.
d. The method of accessing the DBGWFAR is different in v6 Debug, v6.1 Debug and v7 Debug. For details see Watchpoint Fault Address Register (DBGWFAR) on page C10-28.
e. In v6 Debug and v6.1 Debug, ARM recommends these registers as part of the external debug interface, and they are not implemented through the Extended CP14 interface. In v7 Debug these registers are required.
f. Internal views of the DBGDTRRX, DBGDTRTX, and DBGDSCR are implemented through the Baseline CP14 interface. This is explained in Internal and external views of the DBGDSCR and the DCC registers.
C6.3.1 Internal and external views of the DBGDSCR and the DCC registers
For each of the three registers DBGDSCR, DBGDTRTX and DBGDTRRX there are two views, denoted by int and ext suffixes. The differences between these aliases relate to the handling of the Debug Communications Channel (DCC), and in particular the TXfull and RXfull status flags. The nomenclature internal and external derives from the intended usage model.
Accesses to DBGDSCRint, DBGDTRRXint or DBGDTRTXint are always made through the Baseline CP14 interface described in The Baseline CP14 debug register interface on page C6-32. DBGDSCRint is read-only in v7 Debug.
Accesses to DBGDSCRext, DBGDTRRXext or DBGDTRTXext can be made through:
• the Extended CP14 interface, if implemented
• the memory-mapped interface, if implemented
• the external debug interface.
However, if at any given time you attempt to access the DBGDSCRext, DBGDTRRXext and DBGDTRTXext registers through more than one interface, the behavior is UNPREDICTABLE. If an implementation provides a single port to handle external debug interface and memory-mapped interface accesses, that port might serialize accesses to the registers from the two interfaces. However, the effects of reads and writes to these registers are such that the behavior observed from either interface appears as UNPREDICTABLE.
[Figure C6-2 v7 Debug Internal (int) and External (ext) views of the DCC registers: internal views DBGDSCRint (read-only), DBGDTRTXint (write-only) and DBGDTRRXint (read-only), and external views DBGDSCRext, DBGDTRTXext and DBGDTRRXext (read/write), onto the underlying DBGDSCR, DBGDTRTX and DBGDTRRX. RXfull_l and TXfull_l are shown as copied from RXfull and TXfull on reads.]
Note
• DBGDSCRint and DBGDSCRext only provide different views onto the underlying DBGDSCR
• DBGDTRRXint and DBGDTRRXext only provide different views onto the underlying DBGDTRRX Register
• DBGDTRTXint and DBGDTRTXext only provide different views onto the underlying DBGDTRTX Register.
See also:
• Debug Status and Control Register (DBGDSCR) on page C10-10
• Host to Target Data Transfer Register (DBGDTRRX) on page C10-40
• Target to Host Data Transfer Register (DBGDTRTX) on page C10-43.
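Every row of Table C6-2 that has an offset satisfies offset = 4 × register number; for example, register 194 (DBGOSSRR) is at 0x308 and register 1004 (DBGLAR) is at 0xFB0. The following C sketch encodes that rule for computing breakpoint and watchpoint register offsets and for recognizing the management registers. The helper names are hypothetical, and the code does not check how many breakpoint or watchpoint pairs an implementation actually has.

```c
#include <stdbool.h>
#include <stdint.h>

/* Offset of a debug register from the debug register base address:
   each register number selects one 32-bit word. */
static inline uint32_t dbg_reg_offset(unsigned regnum)
{
    return (uint32_t)regnum * 4u;
}

/* Breakpoint and watchpoint registers, n = 0 to 15. The number of pairs is
   IMPLEMENTATION DEFINED (DBGDIDR.BRPs and DBGDIDR.WRPs); not checked here. */
static inline uint32_t dbgbvr_offset(unsigned n) { return dbg_reg_offset(64u + n); }
static inline uint32_t dbgbcr_offset(unsigned n) { return dbg_reg_offset(80u + n); }
static inline uint32_t dbgwvr_offset(unsigned n) { return dbg_reg_offset(96u + n); }
static inline uint32_t dbgwcr_offset(unsigned n) { return dbg_reg_offset(112u + n); }

/* Registers 832-1023 are collectively the management registers. */
static inline bool is_management_reg(unsigned regnum)
{
    return regnum >= 832u && regnum <= 1023u;
}
```

The registers marked Not applicable in Table C6-2 (DBGDRAR and DBGDSAR) have no register number, so this rule does not apply to them.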
C6.3.2 Effect of the Security Extensions on the debug registers
When the Security Extensions are implemented, all debug registers are Common registers, meaning they are common to the Secure and Non-secure states. For more information, see Common CP15 registers on page B3-74.
C6.4 Synchronization of debug register updates
Software running on the processor can program the debug registers through at least one of:
• a CP14 coprocessor interface
• the memory-mapped interface, if it is implemented.
It is IMPLEMENTATION DEFINED which interfaces are implemented.
For the CP14 coprocessor interface, the following synchronization rules apply:
• All changes to CP14 debug registers that appear in program order after any explicit memory operations are guaranteed not to affect those memory operations.
• Any change to CP14 debug registers is guaranteed to be visible to subsequent instructions only after one of:
— performing an ISB operation
— taking an exception
— returning from an exception.
However, for CP14 coprocessor register accesses, all MRC and MCR instructions to the same register using the same register number appear to occur in program order relative to each other without context synchronization.
For the memory-mapped interface, the following synchronization rules apply:
• All memory-mapped debug registers must be mapped to Strongly-ordered or Device memory, otherwise the effect of any access to the memory-mapped debug registers is UNPREDICTABLE.
• Changes to memory-mapped debug registers that appear in program order after an explicit memory operation are guaranteed not to affect that previous memory operation only if the order is guaranteed by the memory order model or by the use of a DMB or DSB operation between the memory operation and the register change.
• A DSB operation causes all writes to memory-mapped debug registers appearing in program order before the DSB to be completed.
• With respect to other accesses by the same processor to the memory-mapped debug registers, all accesses to memory-mapped debug registers have their effect in the order in which the accesses occur, as governed by the memory order model and the use of DSB and DMB operations.
• All accesses to memory-mapped debug registers that are completed are only guaranteed to affect subsequent instructions after one of:
— performing an ISB operation
— taking an exception
— returning from an exception.
Some memory-mapped debug registers are not idempotent for reads or writes. Therefore, the region of memory occupied by the debug registers must not be marked as Normal memory, because the memory order model permits accesses to Normal memory locations that are not appropriate for such registers.
Synchronization between register updates made through the external debug interface and updates made by software running on the processor is IMPLEMENTATION DEFINED. However, if the external debug interface is implemented through the same port as the memory-mapped interface, then updates made through the external debug interface have the same properties as updates made through the memory-mapped interface.
C6.5 Access permissions
This section describes the basic concepts of the access permissions model for debug registers on ARMv7 processors. The actual rules for each interface, and for ARMv6 implementations, are given in the section describing the register interface:
• CP14 debug registers access permissions on page C6-36
• Permission summaries for memory-mapped and external debug interfaces on page C6-45.
The restrictions for accessing the registers can be divided into three categories:
Privilege of the access
Accesses from processors in the system to the memory-mapped registers, and accesses to coprocessor registers, can be required to be privileged.
Locks
Can be used to lock out different parts of the register map so they cannot be accessed.
Power-down
Access to registers in the core power domain is not possible when that domain is powered down.
When permission to access a register is not granted, an error is returned. The nature of this error depends on the interface:
• For coprocessor interfaces, the error is an Undefined Instruction exception.
• For the memory-mapped interface, the error is a slave-generated error response, for example PSLVERRDBG. The error is normally signaled to the processor as an external abort.
• For the external debug interface, the error is signaled to the debugger by the Debug Access Port.
Holding the processor in warm reset, whether by using an external warm reset signal or by using the Device Power-down and Reset Control Register (DBGPRCR), does not affect the behavior of the memory-mapped or external debug interface. The Hold non-debug reset control bit of the DBGPRCR enables an external debugger to keep the processor in warm reset while programming other debug registers. For details see Device Power-down and Reset Control Register (DBGPRCR), v7 Debug only on page C10-31.
C6.5.1 Permissions in relation to the privilege of the access
The majority of debug registers can only be accessed by privileged code. The exception to this general requirement is a small subset of the registers, defined in The Baseline CP14 debug register interface on page C6-32. Using the coprocessor interface, privileged code can disable User mode access to this subset of registers.
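The error-reporting rules at the start of this section, together with the User mode restriction on the Baseline CP14 subset, can be sketched as two small decision functions. This is a simplified illustration, not the full permission model: it assumes that UDCCdis (DBGDSCR bit [12]) is the control privileged code uses to disable User mode access to the Baseline subset, and it ignores locks and power-down. The authoritative rules are in CP14 debug registers access permissions on page C6-36. All identifiers are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define DBGDSCR_UDCCDIS (UINT32_C(1) << 12)   /* UDCCdis, DBGDSCR bit [12] */

typedef enum { IF_CP14, IF_MEMORY_MAPPED, IF_EXTERNAL } debug_if;

typedef enum {
    ERR_UNDEFINED_INSTRUCTION,  /* coprocessor interfaces */
    ERR_SLAVE_ERROR_RESPONSE,   /* memory-mapped, e.g. PSLVERRDBG; usually an external abort */
    ERR_SIGNALED_BY_DAP         /* external debug interface */
} denial_error;

/* Error returned when permission to access a register is not granted. */
static denial_error denial_error_for(debug_if via)
{
    switch (via) {
    case IF_CP14:          return ERR_UNDEFINED_INSTRUCTION;
    case IF_MEMORY_MAPPED: return ERR_SLAVE_ERROR_RESPONSE;
    default:               return ERR_SIGNALED_BY_DAP;
    }
}

/* Simplified check for an access to the Baseline CP14 subset: privileged
   code is always permitted; User mode is permitted only while UDCCdis is 0. */
static bool baseline_cp14_permitted(uint32_t dbgdscr, bool privileged)
{
    return privileged || (dbgdscr & DBGDSCR_UDCCDIS) == 0;
}
```

A denied User mode access through the Baseline CP14 interface would then be reported as denial_error_for(IF_CP14), that is, an Undefined Instruction exception.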
For the memory-mapped interface, it is IMPLEMENTATION DEFINED whether restricting debug register access to privileged code is implemented by the processor or must be implemented by the system designer at the system level. The behavior of an access that is not permitted is IMPLEMENTATION DEFINED; however, the access must be either ignored or aborted.
Note
• The recommended memory-mapped interface port is based on the AMBA® Advanced Peripheral Bus (APBv3), which does not support signaling of access privileges. Therefore, in this case the system must prevent the access.
• This access restriction applies to the privilege of the initiator of the access, not the current mode of the processor being accessed.
The privilege of accesses made by a Debug Access Port is IMPLEMENTATION DEFINED.
The system designer can impose additional restrictions. However, ARM strongly recommends that designers do not impose restrictions such as only permitting Secure privileged accesses, and ARM does not support such restrictions in its debug tools.
C6.5.2 Permissions in relation to locks
The registers can be locked by a debugger or by an operating system so that access to debug registers is restricted. There are three locks, although some of these locks only apply to certain interfaces:
Software Lock
The Software Lock only applies to accesses made through the memory-mapped interface. By default, software is locked out so the debug register settings cannot be modified. A debug monitor must leave this lock set when not accessing the debug registers, to reduce the chance of erratic code modifying debug settings. When this lock is set, writes to the debug registers from the memory-mapped interface are ignored.
For more information about this lock, see Lock Access Register (DBGLAR) on page C10-94 and Lock Status Register (DBGLSR) on page C10-95.
OS Lock
An OS must set this lock on the debug registers before starting an OS Save or Restore sequence, so that the debug registers cannot be read or written during the sequence. When this lock is set, accesses to some registers return errors. Only the OS Save and Restore mechanism registers can be accessed safely.
Note
An external debugger can clear this lock at any time, even if an OS Save or Restore operation is in progress.
For more information about this lock, see OS Lock Access Register (DBGOSLAR) on page C10-75 and OS Lock Status Register (DBGOSLSR) on page C10-76.
Debug Software Enable
An external debugger can use the Debug Software Enable function to prevent modification of the debug registers by a debug monitor or other software running on the system. The Debug Software Enable is a required function of the Debug Access Port, and is implemented as part of the ARM Debug Interface v5. For more information see the ARM Debug Interface v5 Architecture Specification. See also DBGSWENABLE on page AppxA-11.
Note
• The states of the Software Lock and the OS Lock are held in the debug power domain, and the Debug Software Enable is in the Debug Access Port. Therefore, these locks are unaffected by the core power domain powering down. Also, all of these locks are set to their reset values only on reset of the debug power domain, that is, on a PRESETDBGn or nSYSPORESET reset.
• On SinglePower systems, the Software Lock and OS Lock are lost over a power-down. It is IMPLEMENTATION DEFINED whether the single processor power domain also includes the Debug Access Port, and therefore also whether the Debug Software Enable is lost over a power-down.
C6.5.3 Permissions in relation to power-down
Accesses cannot be made through the coprocessor interface when the core power domain is powered down.
Access to registers in the core power domain is not possible when the domain is powered down, and accesses return an error response.
Note
Returning this error response, rather than simply ignoring writes, means that the debugger and the debug monitor detect the debug session interruption as soon as it occurs. This makes restarting the session, after power-up, considerably easier.
When the core power domain powers down, the Sticky Power-down status bit, bit [1] of the Device Power-down and Reset Status Register, is set to 1. This bit remains set to 1 until it is cleared to 0 by a read of this register after the core power domain has powered up. If the register is read while the core power domain is still powered down, the bit remains set to 1.
When this bit is 1 the behavior is as if the core power domain is powered down, meaning the processor ignores accesses to registers inside the core power domain and the system returns an error. This applies whether the register is accessed through the Extended CP14 interface, the memory-mapped interface, or the external debug interface.
This behavior is useful because when the external debugger tries to access a register whose contents might have been lost by a power-down, it gets the same response regardless of whether the core power domain is currently powered down or has powered back up. This means that, even if the external debugger does not access the external debug interface during the window where the core power domain is powered down, the processor still reports the occurrence of the power-down event.
Access to all debug registers is not possible if the debug logic is powered down.
In this situation:
• the system must respond to any access made through the memory-mapped or external debug interface when the debug power domain is powered down, and ARM recommends that the system generates an error response
• accesses through the coprocessor interface are UNPREDICTABLE.
The debug logic is powered down:
• when the debug power domain is powered down, in an implementation with separate debug and core power domains
• when the processor is powered down, in a SinglePower implementation.
C6.5.4 Access to IMPLEMENTATION DEFINED and reserved registers
The following subsections describe the responses to accesses to IMPLEMENTATION DEFINED and reserved registers:
• Access to IMPLEMENTATION DEFINED registers
• Access to reserved registers on page C6-30.
Note
There are no IMPLEMENTATION DEFINED or reserved registers in the Baseline CP14 interface, and therefore these subsections do not describe accesses through the Baseline CP14 interface.
Any unused registers in the spaces for IMPLEMENTATION DEFINED registers must behave as reserved registers. These spaces are register numbers 512-575 and 928-959.
Access to IMPLEMENTATION DEFINED registers
When the Debug Software Enable function, described in Permissions in relation to locks on page C6-27, is disabling software access to the debug registers, the response to an access to an IMPLEMENTATION DEFINED register depends on the debug interface used for the access, as shown in Table C6-3.
Table C6-3 Accesses to IMPLEMENTATION DEFINED registers when Debug Software Enable disables access
Debug interface used for access (a) | Response
Memory-mapped interface | Error response
Extended CP14 interface | Undefined Instruction exception
External debug interface | IMPLEMENTATION DEFINED
a. There are no IMPLEMENTATION DEFINED registers in the Baseline CP14 interface.
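The register-number ranges and Table C6-3 can be restated as a small lookup. The following C sketch is illustrative only: the enum and function names are hypothetical, not defined by this manual, and the case where the Debug Software Enable function is not disabling software access simply returns IMPLEMENTATION DEFINED, following the rule stated in the paragraph after the table.

```c
#include <assert.h>
#include <stdbool.h>

/* Interfaces that can reach IMPLEMENTATION DEFINED registers (Table C6-3).
 * The Baseline CP14 interface has no such registers, so it is omitted. */
typedef enum {
    IF_MEMORY_MAPPED,
    IF_EXTENDED_CP14,
    IF_EXTERNAL_DEBUG
} debug_interface;

/* Possible responses listed in Table C6-3. */
typedef enum {
    RESP_ERROR,                  /* error response */
    RESP_UNDEFINED_INSTRUCTION,  /* Undefined Instruction exception */
    RESP_IMPL_DEFINED            /* IMPLEMENTATION DEFINED */
} impdef_response;

/* True if regnum lies in an IMPLEMENTATION DEFINED register space:
 * 512-575, or the integration registers 928-959. */
bool is_impdef_regnum(unsigned regnum)
{
    assert(regnum <= 1023);  /* register numbers are 10 bits */
    return (regnum >= 512 && regnum <= 575) ||
           (regnum >= 928 && regnum <= 959);
}

/* Response to an access to an IMPLEMENTATION DEFINED register.
 * sw_access_disabled: the Debug Software Enable function is disabling
 * software access to the debug registers. */
impdef_response impdef_access_response(debug_interface via,
                                       bool sw_access_disabled)
{
    if (!sw_access_disabled)
        return RESP_IMPL_DEFINED;  /* always IMPLEMENTATION DEFINED */

    switch (via) {
    case IF_MEMORY_MAPPED:  return RESP_ERROR;
    case IF_EXTENDED_CP14:  return RESP_UNDEFINED_INSTRUCTION;
    case IF_EXTERNAL_DEBUG: return RESP_IMPL_DEFINED;
    }
    return RESP_IMPL_DEFINED;  /* unreachable for valid enum values */
}
```

Unused registers inside these ranges still behave as reserved registers, so a real model would also need to know which IMPLEMENTATION DEFINED registers a given implementation actually provides.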
If the Debug Software Enable function is not disabling software access to the debug registers, the response to any access to an IMPLEMENTATION DEFINED register is IMPLEMENTATION DEFINED. This means the response is IMPLEMENTATION DEFINED even if any of the following apply:
• the core power domain is powered down
• the Sticky Power-down status bit is set to 1
• the OS Lock is implemented and is locked
• the attempted access is using the memory-mapped interface and the Software Lock is locked.
Note
The IMPLEMENTATION DEFINED registers include the IMPLEMENTATION DEFINED integration registers, register numbers 928-959.
Access to reserved registers
The response to an access to a reserved register depends on the interface you are using to attempt the access, as follows:
Memory-mapped interface
When the Debug Software Enable function, described in Permissions in relation to locks on page C6-27, is disabling software access to the debug registers, any access to a reserved register through the memory-mapped interface returns an error response. This includes accesses to reserved registers in the management registers space, register numbers 832-1023.
When the Debug Software Enable function is not disabling software access to the debug registers:
• Reserved registers in the management registers space, except for reserved registers in the IMPLEMENTATION DEFINED integration registers space, are UNK/SBZP.
• For all other reserved registers, it is UNPREDICTABLE whether a register access returns an error response if any of the following applies:
  - the core power domain is powered down
  - the Sticky Power-down status bit is set to 1
  - the OS Lock is implemented and is locked
  - the Software Lock is locked.
  If none of these applies, these reserved registers are UNK/SBZP.
Extended CP14 interface
In v6 Debug and v6.1 Debug, any attempt to access a reserved register causes an Undefined Instruction exception.
In v7 Debug:
• When the Debug Software Enable function is disabling software access to the debug registers, any attempt to access a reserved register causes an Undefined Instruction exception.
• When the Debug Software Enable function enables software access to the debug registers, any attempt to access a reserved register:
  - causes an Undefined Instruction exception if the access is from User mode
  - is UNPREDICTABLE if the access is from a privileged mode.
External debug interface
Reserved registers in the management registers space, except for reserved registers in the IMPLEMENTATION DEFINED integration registers space, are UNK/SBZP. For all other reserved registers:
• It is UNPREDICTABLE whether a register access returns an error response if any of the following applies:
  - the core power domain is powered down
  - the Sticky Power-down status bit is set to 1
  - the OS Lock is implemented and is locked.
• If none of these applies, these reserved registers are UNK/SBZP.
Note
• There are no reserved registers in the Baseline CP14 interface.
• Unimplemented breakpoint and watchpoint registers are reserved registers.
C6.6 The CP14 debug register interfaces
This section contains the following subsections:
• The Baseline CP14 debug register interface
• Extended CP14 interface on page C6-33
• CP14 debug registers access permissions on page C6-36.
C6.6.1 The Baseline CP14 debug register interface
Table C6-4 lists the set of CP14 debug instructions for accessing the debug registers that must be implemented.
All MRC and MCR instructions with <coproc> = 0b1110 and <opc1> = 0b000 are debug instructions:
• Some of these instructions are defined in Table C6-4.
• Additional instructions are defined in Extended CP14 interface on page C6-33.
• All other instructions are reserved for use by the Debug architecture. The behavior of reserved instructions is defined in CP14 debug registers access permissions on page C6-36.
All MRC and MCR instructions with <coproc> = 0b1110 and <opc1> = 0b001 are used by the trace extension. Other values of <opc1> are not used by the Debug architecture.
All LDC and STC instructions with <coproc> = 0b1110 that are not listed in Table C6-4 are reserved for use by the Debug architecture and are currently UNDEFINED.
All CDP, MRC2, MCR2, LDC2, STC2, LDCL, STCL, LDC2L, and STC2L instructions with <coproc> = 0b1110 are UNDEFINED.
Instructions that access registers that are only available in v7 Debug are UNDEFINED in earlier versions of the Debug architecture. For example, the read from DBGDRAR performed by MRC p14,0,<Rt>,c1,c0,0 is UNDEFINED in v6 Debug and v6.1 Debug, but is permitted in v7 Debug.
<Rt> refers to any of the general-purpose registers R0-R14. Use of APSR_nzcv is UNPREDICTABLE except where stated. Use of R13 is UNPREDICTABLE in Thumb and ThumbEE state, and is deprecated in ARM state.
Table C6-4 Baseline CP14 debug instructions
Instruction | Mnemonic | Version | Name and reference to description
MRC p14,0,<Rt>,c0,c0,0 | DBGDIDR | All | Debug ID Register (DBGDIDR) on page C10-3
MRC p14,0,<Rt>,c1,c0,0 | DBGDRAR | v7 only | Debug ROM Address Register (DBGDRAR) on page C10-7
MRC p14,0,<Rt>,c2,c0,0 | DBGDSAR | v7 only | Debug Self Address Offset Register (DBGDSAR) on page C10-8
MRC p14,0,<Rt>,c0,c5,0 / STC p14,c5, | DBGDTRRXint | All (a) | DBGDTRRX internal view. See Host to Target Data Transfer Register (DBGDTRRX) on page C10-40
MCR p14,0,<Rt>,c0,c5,0 / LDC p14,c5, | DBGDTRTXint | All (a) | DBGDTRTX internal view. See Target to Host Data Transfer Register (DBGDTRTX) on page C10-43
MRC p14,0,<Rt>,c0,c1,0 / MRC p14,0,APSR_nzcv,c0,c1,0 (b) | DBGDSCRint | All (a) | DBGDSCR internal view. See Debug Status and Control Register (DBGDSCR) on page C10-10
a. For more information, see the register description.
b. DBGDSCR[31:28] are transferred to the N, Z, C and V condition flags. For more information, see Program Status Registers (PSRs) on page B1-14.
C6.6.2 Extended CP14 interface
The architectural requirements for the Extended CP14 interface depend on the Debug architecture version:
v6 Debug and v6.1 Debug
All debug registers can be accessed through CP14, and implementations must provide an external access mechanism for debuggers. The details of this mechanism are not covered by the architecture specification. See Features specific to v6 Debug and v6.1 Debug on page C6-35.
v7 Debug
The Extended CP14 interface to the debug registers is optional. The Baseline CP14 interface is sufficient to bootstrap access to the register file, and enables software to distinguish between the Extended CP14 and memory-mapped interfaces. See Features specific to v7 Debug on page C6-34.
If the Extended CP14 interface is not implemented, the memory-mapped interface must be implemented. See The memory-mapped and recommended external debug interfaces on page C6-43.
Note
This section does not apply to a v7 Debug implementation that does not implement the Extended CP14 interface.
The full list of debug registers is given in Table C6-2 on page C6-18 and is not repeated here.
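Footnote b of Table C6-4 describes MRC p14,0,APSR_nzcv,c0,c1,0, which copies DBGDSCR[31:28] into the condition flags and so lets software test those bits with conditionally executed instructions. A minimal C model of that transfer, assuming only the bit-to-flag order given in the footnote (N from bit [31] down to V from bit [28]), might look like this; the struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Condition flags as set by MRC p14,0,APSR_nzcv,c0,c1,0. */
typedef struct {
    bool n, z, c, v;
} nzcv_flags;

/* Copy DBGDSCR[31:28] into N, Z, C and V, in that order. */
nzcv_flags dbgdscr_to_nzcv(uint32_t dbgdscr)
{
    nzcv_flags f;
    f.n = ((dbgdscr >> 31) & 1u) != 0;  /* DBGDSCR[31] -> N */
    f.z = ((dbgdscr >> 30) & 1u) != 0;  /* DBGDSCR[30] -> Z */
    f.c = ((dbgdscr >> 29) & 1u) != 0;  /* DBGDSCR[29] -> C */
    f.v = ((dbgdscr >> 28) & 1u) != 0;  /* DBGDSCR[28] -> V */
    return f;
}
```

Bits [27:0] of DBGDSCR play no part in the transfer, which is why only the top four bits of the argument affect the result.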
With some exceptions, listed in Features specific to v7 Debug and Features specific to v6 Debug and v6.1 Debug on page C6-35, all the debug registers, including those in the IMPLEMENTATION DEFINED space, are accessed by the following coprocessor instructions, with <CRn> <= 0b0111 and the mapping shown in Figure C6-3:
MRC p14,0,<Rt>,<CRn>,<CRm>,<opc2> ; Read
MCR p14,0,<Rt>,<CRn>,<CRm>,<opc2> ; Write
Figure C6-3 Mapping from register number to CP14 instruction
The register number[9:0] (0-1023) is extended with a bit [10] of value 0, and the resulting value is split into instruction fields:
• bits [10:7] form CRn[3:0], so CRn[3] is always 0
• bits [6:4] form opc2[2:0]
• bits [3:0] form CRm[3:0].
For example, the instruction:
MRC p14,0,<Rt>,c0,c0,5
reads the value of DBGBCR0, that is register 80, 0b0001010000.
Features specific to v7 Debug
Table C6-5 lists the exceptions, in the v7 Debug Extended CP14 interface, to the standard mapping. In the v7 Debug Extended CP14 interface, all the instructions listed in Table C6-5 are UNDEFINED in User mode and UNPREDICTABLE in privileged modes.
Table C6-5 Exceptions to the standard mapping, v7 Debug with Extended CP14 interface
Register number | Name | Access | Standard mapping
33 | Program Counter Sampling Register | Read-only | MRC p14,0,<Rt>,c0,c1,2
33 | Instruction Transfer Register | Write-only | MCR p14,0,<Rt>,c0,c1,2
40 | Program Counter Sampling Register | Read-only | MRC p14,0,<Rt>,c0,c8,2
41 | Context ID Sampling Register | Read-only | MRC p14,0,<Rt>,c0,c9,2
832-895 | Processor identification registers | Read-only | MRC p14,0,<Rt>,c6,c0,4 to MRC p14,0,<Rt>,c6,c15,7
1004 | Lock Access Register | Write-only | MCR p14,0,<Rt>,c7,c12,6
1005 | Lock Status Register | Read-only | MRC p14,0,<Rt>,c7,c13,6
Accesses to the external views DBGDSCRext, DBGDTRRXext and DBGDTRTXext can be made through the standard mapping of these registers, in addition to the instructions to access the internal views DBGDSCRint, DBGDTRRXint and DBGDTRTXint provided in the Baseline CP14 interface. See Internal and external views of the DBGDSCR and the DCC registers on page C6-21.
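The standard mapping of Figure C6-3 is a fixed bit-field split of the register number, so it can be computed rather than looked up. This C sketch (hypothetical type and function names) converts in both directions for CRn <= 0b0111. It does not model the exceptions listed in Table C6-5 or in Features specific to v6 Debug and v6.1 Debug, so an encoding it produces is not necessarily a permitted access.

```c
#include <assert.h>

/* CP14 operand fields of MRC/MCR p14,0,<Rt>,<CRn>,<CRm>,<opc2>. */
typedef struct {
    unsigned crn;   /* CRn[3:0]; CRn[3] is 0 in the standard mapping */
    unsigned crm;   /* CRm[3:0] */
    unsigned opc2;  /* opc2[2:0] */
} cp14_encoding;

/* Standard mapping (Figure C6-3):
 *   register number[9:7] -> CRn[2:0]
 *   register number[6:4] -> opc2[2:0]
 *   register number[3:0] -> CRm[3:0] */
cp14_encoding regnum_to_cp14(unsigned regnum)
{
    assert(regnum <= 1023);
    cp14_encoding e;
    e.crn  = (regnum >> 7) & 0x7u;
    e.opc2 = (regnum >> 4) & 0x7u;
    e.crm  = regnum & 0xFu;
    return e;
}

/* Inverse of regnum_to_cp14, valid only for CRn <= 0b0111. */
unsigned cp14_to_regnum(cp14_encoding e)
{
    assert(e.crn <= 7 && e.opc2 <= 7 && e.crm <= 15);
    return (e.crn << 7) | (e.opc2 << 4) | e.crm;
}
```

For example, regnum_to_cp14(80) yields CRn = 0, CRm = 0, opc2 = 5, matching the DBGBCR0 example above, and regnum_to_cp14(1004) yields c7, c12, 6, the standard mapping listed for the Lock Access Register in Table C6-5.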
Features specific to v6 Debug and v6.1 Debug
Table C6-6 lists the exceptions, in the Extended CP14 interface in v6 Debug and v6.1 Debug, to the standard mapping. All the instructions listed are UNDEFINED in ARMv6.
Table C6-6 Exceptions to the standard mapping, v6 Debug and v6.1 Debug
Register number | Name | Access | Standard mapping, all UNDEFINED
32 | Host to Target Data Transfer Register | Read/write | MRC p14,0,<Rt>,c0,c0,2 / MCR p14,0,<Rt>,c0,c0,2
33 | Program Counter Sampling Register | Read-only | MRC p14,0,<Rt>,c0,c1,2
33 | Instruction Transfer Register | Write-only | MCR p14,0,<Rt>,c0,c1,2
34 | Debug Status and Control Register | Read/write | MRC p14,0,<Rt>,c0,c2,2 / MCR p14,0,<Rt>,c0,c2,2
35 | Target to Host Data Transfer Register | Read/write | MRC p14,0,<Rt>,c0,c3,2 / MCR p14,0,<Rt>,c0,c3,2
See also footnote e of Table C6-2 on page C6-18, regarding registers 32, 33, 34, and 35.
In v6 Debug and v6.1 Debug, no debug registers map to CP14 instructions with <CRn> != 0b0000. All instruction encodings with <CRn> != 0b0000 and <opc1> = 0 are UNDEFINED in User mode and UNPREDICTABLE in privileged modes. All reserved encodings with <CRn> = 0b0000 are UNDEFINED in all modes.
Table C6-7 defines an additional ARMv6 instruction for making an internal access write to the DBGDSCR.
Table C6-7 Additional ARMv6 CP14 debug instruction
Instruction | Mnemonic | Name
MCR p14,0,<Rt>,c0,c1,0 | DBGDSCRint | Debug Status and Control Register (DBGDSCR) on page C10-10
C6.6.3 CP14 debug registers access permissions
By default, certain CP14 debug registers can be accessed from User mode. However, the processor can be programmed to prevent User mode access to these CP14 debug registers. For more information, see the description of the UDCCdis bit in Debug Status and Control Register (DBGDSCR) on page C10-10.
All CP14 debug registers can be accessed if the processor is in Debug state.
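In v7 Debug, these controls, together with the OS Lock, combine as tabulated in Table C6-9. As a sketch only, with hypothetical names, that table can be written as a C decision function. It models only the rows of Table C6-9 and footnote c, not the MCR-specific behavior of footnotes a and d, and not the power-down and debug logic reset cases described after the table.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum {
    CP14_PROCEED,
    CP14_UNDEFINED,      /* Undefined Instruction exception */
    CP14_UNPREDICTABLE
} cp14_outcome;

/* Outcome of a Baseline CP14 instruction in v7 Debug (Table C6-9).
 * user_mode == false means a privileged mode.
 * is_dbgdidr_read: the instruction is a read of DBGDIDR (footnote c). */
cp14_outcome v7_baseline_access(bool debug_state, bool user_mode,
                                bool udccdis, bool os_lock,
                                bool is_dbgdidr_read)
{
    /* Non-debug state, User mode, UDCCdis = 1: UNDEFINED whatever the
     * OS Lock state. */
    if (!debug_state && user_mode && udccdis)
        return CP14_UNDEFINED;

    /* OS Lock set: UNPREDICTABLE, apart from reads of DBGDIDR. */
    if (os_lock && !is_dbgdidr_read)
        return CP14_UNPREDICTABLE;

    return CP14_PROCEED;
}
```

Checking the User mode and UDCCdis case first mirrors the table, where that row has an X in the OS Lock column and so takes precedence over the OS Lock rows.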
Note
When the Software Lock (DBGLAR) is implemented for a memory-mapped interface, it does not affect the behavior of CP14 instructions.
Baseline CP14 debug registers access permissions
Access to the Baseline CP14 debug registers is governed by the processor mode, Debug state and the value of DBGDSCR.UDCCdis. In addition, when the OS Lock is set, accesses to the baseline registers are UNPREDICTABLE.
Note
OS Lock is implemented only in v7 Debug.
These access permissions are shown:
• in Table C6-8 for v6 Debug and v6.1 Debug
• in Table C6-9 on page C6-37 for v7 Debug.
Table C6-8 Access to Baseline CP14 debug registers in v6 Debug and v6.1 Debug
Debug state | Processor mode | DBGDSCR.UDCCdis (b) | Baseline CP14 instructions (a) | DBGDSCRint writes
Yes | X | X | Proceed | Proceed
No | User | 0 | Proceed | UNDEFINED
No | User | 1 | UNDEFINED | UNDEFINED
No | Privileged | X | Proceed | Proceed
a. Read DBGDIDR, DBGDSCRint, DBGDTRRXint, or write DBGDTRTXint. Attempting to use an MCR instruction to access the DBGDIDR always causes an Undefined Instruction exception.
b. DCC User mode accesses disable bit, see Debug Status and Control Register (DBGDSCR) on page C10-10.
Table C6-9 Access to Baseline CP14 debug registers in v7 Debug
Debug state | Processor mode | DBGDSCR.UDCCdis (b) | OS Lock | Baseline CP14 instructions (a)
Yes | X | X | 0 | Proceed
Yes | X | X | 1 | UNPREDICTABLE (c)
No | User | 0 | 0 | Proceed
No | User | 0 | 1 | UNPREDICTABLE (c)
No | User | 1 | X | UNDEFINED (d)
No | Privileged | X | 0 | Proceed
No | Privileged | X | 1 | UNPREDICTABLE (c)
a. Read DBGDIDR, DBGDSAR, DBGDRAR, DBGDSCRint, DBGDTRRXint, or write DBGDTRTXint. Attempting to use an MCR instruction to write to DBGDIDR, DBGDSAR, DBGDRAR, or DBGDSCRint is UNPREDICTABLE, except in the case shown by footnote d.
b. DCC User mode accesses disable bit, see Debug Status and Control Register (DBGDSCR) on page C10-10.
c. Apart from reads of DBGDIDR, which proceed.
d.
Under these conditions, attempting to use an MCR instruction to write to DBGDIDR, DBGDSAR, DBGDRAR, or DBGDSCRint always causes an Undefined Instruction exception.
Note
The Baseline CP14 instructions are not affected by:
• the recommended Debug Software Enable control in the Debug Access Port, see Permissions in relation to locks on page C6-27
• the Sticky Power-down status bit in the Device Power-down and Reset Status Register (DBGPRSR), see Device Power-down and Reset Status Register (DBGPRSR), v7 Debug only on page C10-34.
For more information on access permissions and restrictions see Access permissions on page C6-26. In addition:
• if the debug power domain is powered down, instructions that access the debug registers are UNPREDICTABLE
• when the processor is in debug logic reset, reads of the debug registers return UNKNOWN values.
v7 Debug CP14 debug registers access permissions, Extended CP14 interface not implemented
Table C6-10 summarizes the behavior of the CP14 debug instructions, other than the Baseline CP14 instructions, if the Extended CP14 interface is not implemented. In this situation, only the Baseline CP14 interface is implemented.
Table C6-10 Access to unallocated CP14 debug registers, v7 Debug with no Extended CP14 interface
Debug state | Processor mode | CP14 debug MCR and MRC instructions, other than Baseline CP14 instructions
Yes | X | UNPREDICTABLE
No | User | UNDEFINED
No | Privileged | UNPREDICTABLE
v7 Debug CP14 debug registers access permissions, Extended CP14 interface implemented
If the Extended CP14 interface is implemented, the Debug Software Enable function can be used to prevent access to registers other than the DBGDIDR, DBGDSCR, DBGDTRRX, DBGDTRTX, DBGDSAR, DBGDRAR, DBGOSLAR, DBGOSLSR and DBGOSSRR.
For more information, see Permissions in relation to locks on page C6-27.
For a v7 Debug implementation with the Extended CP14 interface:
• Table C6-9 on page C6-37 shows the access permissions for the Baseline CP14 debug registers
• Table C6-11 summarizes the access permissions for the other CP14 debug registers
• Table C6-12 on page C6-40 gives more information about access to the Extended CP14 interface debug registers.
Table C6-11 Access to CP14 debug registers, v7 Debug with Extended CP14 interface
Debug state | Processor mode | Enable (c) | Other CP14 debug instructions (b), CRn <= 0b0111 (d) | Other CP14 debug instructions (b), CRn >= 0b1000
Yes | X | 0 | UNDEFINED (e) | UNPREDICTABLE
Yes | X | 1 | See Table C6-12 on page C6-40 (f) | UNPREDICTABLE
No | User | X | UNDEFINED | UNDEFINED
No | Privileged | 0 | UNDEFINED (e) | UNPREDICTABLE
No | Privileged | 1 | See Table C6-12 on page C6-40 | UNPREDICTABLE
a. The accesses in this table are not affected by the value of the DBGDSCR.UDCCdis bit.
b. All MRC and MCR instructions with <coproc> == 0b1110 and <opc1> == 0b000, except for read accesses to DBGDIDR, DBGDSAR, DBGDRAR, DBGDSCRint, and DBGDTRRXint, and write accesses to DBGDTRTXint.
c. The Debug Software Enable function is enabled.
d. Where indicated in this column, see Table C6-12 on page C6-40 for a more detailed description of access permissions to the other registers defined by the Debug architecture. In addition, there is more information about access to reserved and IMPLEMENTATION DEFINED registers in Access to IMPLEMENTATION DEFINED and reserved registers on page C6-29.
e. Except for the OS Save and Restore mechanism registers DBGOSLAR, DBGOSLSR, and DBGOSSRR, and the DBGPRSR. The state of the Debug Software Enable function does not affect access to these registers. Access to these registers must always be provided, even on implementations that do not support debug over power-down. If the implementation does not support debug over power-down, the DBGOSLAR, DBGOSLSR, and DBGOSSRR are RAZ/WI.
f.
ARM deprecates the use of these instructions from User mode in Debug state.
Table C6-12 Access to Extended CP14 interface debug registers
Sticky Power-down set | OS Lock set | DBGECR, DBGDRCR, DBGOSLAR (a), DBGOSLSR (a), DBGPRCR, DBGPRSR | DBGOSSRR (a) | Other debug (b) | All reserved (c) | Other mgmt (d)
No | No | OK | UNPREDICTABLE | OK | UNPREDICTABLE | OK
No | Yes | OK | OK | UNDEFINED | UNPREDICTABLE | OK
Yes | X | OK | UNPREDICTABLE | UNDEFINED | UNPREDICTABLE | OK
a. If the OS Save and Restore mechanism is not implemented, these register addresses behave as reserved locations.
b. Debug register numbers 0 to 127, except for the DBGECR, DBGDRCR, the registers defined as baseline registers, and reserved registers. For details of the baseline registers see Table C6-4 on page C6-32.
c. See also Access to IMPLEMENTATION DEFINED and reserved registers on page C6-29.
d. Other management registers. This means debug register numbers 832 to 1023, except for the IMPLEMENTATION DEFINED locations, see Access to IMPLEMENTATION DEFINED and reserved registers on page C6-29.
In v7 Debug the behavior of Extended CP14 interface MRC and MCR instructions also depends on the access type of the register, as shown in Table C6-2 on page C6-18. Table C6-13 summarizes the behavior of these instructions, for:
• read accesses, using MRC p14,0,<Rt>,<CRn>,<CRm>,<opc2>
• write accesses, using MCR p14,0,<Rt>,<CRn>,<CRm>,<opc2>.
Table C6-13 Behavior of CP14 MRC and MCR instructions, v7 Debug with Extended CP14 interface
Access type (a) | Read access (b) | Write access (b)
- (Reserved) | UNPREDICTABLE | UNPREDICTABLE
Read-only | Returns register value in Rt | UNPREDICTABLE
Write-only | UNPREDICTABLE | Writes value in Rt to register
Read/write | Returns register value in Rt | Writes value in Rt to register
a. Register access type, as shown in Table C6-2 on page C6-18.
b. In a privileged mode, or in Debug state.
Some read/write registers include bits that are read-only.
These bits ignore writes.
When the processor is in Non-debug state, all User mode accesses to the Extended CP14 interface registers are UNDEFINED.
For example, in privileged modes the following instruction reads the value of DBGWVR7, register 103, if at least 8 watchpoints are implemented, and is UNPREDICTABLE otherwise:
MRC p14,0,<Rt>,c0,c7,6
Note
The access permissions in Table C6-11 on page C6-39 and Table C6-12 on page C6-40 have precedence over the behavior in Table C6-13 on page C6-40. For example, even if at least 8 watchpoints are implemented, the following instruction is UNDEFINED in all processor modes when the Debug Software Enable function is disabling software access:
MRC p14,0,<Rt>,c0,c7,6
v6 Debug and v6.1 Debug CP14 debug registers access permissions
In v6 Debug and v6.1 Debug, access to registers other than the DBGDIDR, DBGDSCR, DBGDTRRX, and DBGDTRTX is not permitted if Halting debug-mode is selected. The Debug Software Enable function, the Sticky Power-down status bit and the OS Lock are not implemented, and there are fewer CP14 debug registers than in the v7 Debug Extended CP14 interface.
For v6 Debug and v6.1 Debug:
• Table C6-8 on page C6-36 shows the access permissions for the Baseline CP14 debug registers
• Table C6-14 shows the access permissions for the other CP14 debug registers.
Table C6-14 Access to CP14 debug registers, v6 Debug and v6.1 Debug
Debug state | Processor mode | DBGDSCR[15:14] (c) | Other CP14 debug instructions (b)
Yes | X | XX | Proceed
No | User | XX | UNDEFINED
No | Privileged | 00 (None) | UNDEFINED
No | Privileged | X1 (Halting) | UNDEFINED
No | Privileged | 10 (Monitor) | Proceed
a. The accesses in this table are not affected by the value of the DBGDSCR.UDCCdis bit.
b. All instructions with <opc1> == 0b000 and <CRn> == 0b0000, except for read accesses to DBGDIDR, DBGDSAR, DBGDRAR, DBGDSCRint, and DBGDTRRXint, and write accesses to DBGDSCRint and DBGDTRTXint.
See also Table C6-15 on page C6-42.
c. MDBGen and HDBGen bits, the debug-mode enable and select bits.
In v6 Debug and v6.1 Debug the behavior of CP14 MRC and MCR instructions also depends on the access type of the register, as shown in Table C6-2 on page C6-18. Table C6-15 summarizes the behavior, for:
• read accesses, using MRC p14,0,<Rt>,<CRn>,<CRm>,<opc2>
• write accesses, using MCR p14,0,<Rt>,<CRn>,<CRm>,<opc2>.
Table C6-15 Behavior of CP14 MRC and MCR instructions in v6 Debug and v6.1 Debug
Access type (a) | Read access | Write access
- (Reserved) | UNDEFINED | UNDEFINED
Read-only (DBGDIDR) (b) | Returns register value in Rt | UNDEFINED
Write-only (c) | - | -
Read/write | Returns register value in Rt | Writes value in Rt to register
a. Register access type, as shown in Table C6-2 on page C6-18.
b. The DBGDIDR is the only read-only register in v6 Debug and v6.1 Debug.
c. There are no write-only registers in v6 Debug and v6.1 Debug.
Some read/write registers include bits that are read-only. These bits ignore writes.
For example, the following instruction reads the value of DBGWVR7, register 103, if at least 8 watchpoints are implemented, and is UNDEFINED otherwise:
MRC p14,0,<Rt>,c0,c7,6
Note
The access permissions in Table C6-14 on page C6-41 have precedence over those in Table C6-15. For example, even if at least 8 watchpoints are implemented, the following instruction is UNDEFINED in User mode, and is also UNDEFINED in privileged modes when Halting debug-mode is enabled:
MRC p14,0,<Rt>,c0,c7,6
C6.7 The memory-mapped and recommended external debug interfaces
The external debug interface is IMPLEMENTATION DEFINED in all versions of the ARM Debug architecture. This manual describes only the v7 Debug recommendations for this interface.
For details of the external debug interface recommendations for v6 Debug and v6.1 Debug, contact ARM.
The memory-mapped interface to the debug registers is optional in v7 Debug. The Baseline CP14 interface is sufficient to bootstrap access to the register file, and permits software to distinguish between the Extended CP14 and memory-mapped interfaces.
Both the memory-mapped interface and the recommended external debug interface are defined in terms of an addressable register file mapped onto a region of memory. This section describes:
• the view of the debug registers from the processor through the memory-mapped interface
• the recommended external debug interface.
If the memory-mapped interface is not implemented, the Extended CP14 interface must be implemented, see Extended CP14 interface on page C6-33.
C6.7.1 Register map
The register map occupies 4KB of physical address space. The base address is IMPLEMENTATION DEFINED and must be aligned to a 4KB boundary.
Note
All memory-mapped debug registers must be mapped to Strongly-ordered or Device memory, see Synchronization of debug register updates on page C6-24. In systems with the ARMv7 PMSA this requirement applies even when the MPU is disabled.
Each register is mapped at an offset that is the register number multiplied by 4, the size of a word. For example, DBGWVR7, register 103, is mapped at offset 0x19C (412). The complete list of registers is defined in Debug register map on page C6-18, and is not repeated here.
C6.7.2 Shared interface port for the memory-mapped and external debug interfaces
Which components in a system can access the memory-mapped interface is IMPLEMENTATION DEFINED. Typically, the processor itself and other processors in the system can access this interface. An external debugger might be able to access the debug registers through the memory-mapped interface, as well as through the external debug interface.
Because the memory-mapped interface and external debug interface share the same memory map and many of the same properties, both interfaces can be implemented as a single physical interface port to the processor.
If the memory-mapped interface and external debug interface are implemented as a single physical interface port, external debugger accesses must be distinguishable from those of software running on a processor, including the ARM processor itself, in the target system. For example, accesses by an external debugger are not affected by the Software Lock. For the recommended memory-mapped or external debug interface this is achieved using the PADDRDBG[31] signal, see PADDRDBG on page AppxA-15.
C6.7.3 Endianness
The recommended memory-mapped and external debug interface port, referred to as the debug port, only supports word accesses. The data presented or returned on the interface is always 32 bits and is in a fixed byte order:
• bits [7:0] of the debug register are mapped to bits [7:0] of the connected data bus
• bits [15:8] of the debug register are mapped to bits [15:8] of the connected data bus
• bits [23:16] of the debug register are mapped to bits [23:16] of the connected data bus
• bits [31:24] of the debug register are mapped to bits [31:24] of the connected data bus.
The debug port ignores bits [1:0] of the address, and these address signals are not present in the debug port interface.
The Debug Access Port (DAP) and the interface between it and the debug port together form part of the external debug interface, and must support word accesses from the external debugger to these registers. The recommended ARM Debug Interface v5 (ADIv5) supports word accesses; see the ARM Debug Interface v5 Architecture Specification for more information.
Where this interface is used, the implementation must ensure that a 32-bit access by the debugger through the Debug Access Port has the same 32-bit value, in the same bit order, as the corresponding access to the debug registers. This is a requirement for tools support using ADIv5.
If a memory-mapped interface is implemented, the debug port connects to the system interconnect fabric either directly or through some form of bridge component. Such system interconnect fabrics normally support byte accesses. The system must support word-sized accesses to the debug registers. When accessing the debug registers, the behavior of an access that is smaller than word-sized is UNPREDICTABLE. The detailed behavior of this bridge and of the system interconnect is outside the scope of the architecture.
Accesses to registers made through the debug port are not affected by the endianness configuration of the processor in which the registers reside. However, they are affected by the endianness configuration of the bus master making the access, and by the nature and configuration of the fabric that connects the two.
In an ARMv7 processor, the CPSR.E bit controls the endianness of data accesses. With some assumptions, described later in this section, the operation of the CPSR.E bit is:
CPSR.E bit set to 0, for little-endian operation
If the processor reads its own DBGDIDR with an LDR instruction, the system ensures that the value returned in the destination register is in the same bit order as the DBGDIDR itself.
CPSR.E bit set to 1, for big-endian operation
If the processor reads its own DBGDIDR with an LDR instruction, the system ensures that:
• bits [7:0] of DBGDIDR are read into bits [31:24] of the destination register
• bits [15:8] of DBGDIDR are read into bits [23:16] of the destination register
• bits [23:16] of DBGDIDR are read into bits [15:8] of the destination register
• bits [31:24] of DBGDIDR are read into bits [7:0] of the destination register.
Similarly, the bytes of a data value written to a debug register, for example the DBGDSCR, are reversed in big-endian configuration.
If an ARMv7 processor, with the E bit set for little-endian operation, reads the DBGDIDR of a second ARMv7 processor with an LDR instruction, then bits [7:0] of the DBGDIDR of the second processor are read into bits [7:0] of the destination register of the LDR, on the first processor. Similarly, the other bytes of the DBGDIDR are copied to the corresponding bytes of the destination register. However, if the E bit of the first processor is set for big-endian operation, the bytes are reversed during the LDR operation, with bits [31:24] of the DBGDIDR of the second processor being read into bits [7:0] of the destination register of the LDR.
Note
The ordering of the bytes in the destination register on the first processor is not affected in any way by the setting of the CPSR.E bit of the second processor.
These examples assume that no additional manipulation of the data occurs in the interconnect fabric of the system. For example, an interconnect might perform byte transposition for accesses made across a boundary between a little-endian subsystem and a big-endian subsystem. Such transformations are beyond the scope of the architecture.
C6.7.4 Permission summaries for memory-mapped and external debug interfaces
This section gives summaries of the permission controls and their effects for different implementations of v7 Debug systems. The following subsections describe the access permissions for the two interfaces:
• Access permissions for the external debug interface on page C6-47
• Access permissions for the memory-mapped interface on page C6-48.
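The register-map rule of C6.7.1 and the byte-lane behavior of C6.7.3 are both simple arithmetic. The following C sketch, using hypothetical function names and a made-up base address in its usage, computes a register's address from a 4KB-aligned base and models the value an LDR returns depending on the reading processor's CPSR.E. Like the examples in C6.7.3, it assumes the interconnect performs no byte transposition.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Address of a debug register in the 4KB memory-mapped register file:
 * offset = register number * 4. debug_base is IMPLEMENTATION DEFINED
 * and must be 4KB-aligned. */
uint32_t debug_reg_address(uint32_t debug_base, unsigned regnum)
{
    assert((debug_base & 0xFFFu) == 0);  /* 4KB alignment */
    assert(regnum <= 1023);              /* 1024 words fill 4KB */
    return debug_base + (uint32_t)regnum * 4u;
}

/* Value placed in the destination register of an LDR that reads a debug
 * register. Only the reading processor's CPSR.E matters; the E bit of the
 * processor that owns the register has no effect. */
uint32_t ldr_view(uint32_t reg_value, bool reader_big_endian)
{
    if (!reader_big_endian)
        return reg_value;                        /* same bit order */
    return ((reg_value & 0x000000FFu) << 24) |   /* [7:0]   -> [31:24] */
           ((reg_value & 0x0000FF00u) << 8)  |   /* [15:8]  -> [23:16] */
           ((reg_value & 0x00FF0000u) >> 8)  |   /* [23:16] -> [15:8]  */
           ((reg_value & 0xFF000000u) >> 24);    /* [31:24] -> [7:0]   */
}
```

For example, with a hypothetical base of 0x80010000, debug_reg_address(0x80010000, 103) gives 0x8001019C for DBGWVR7, and a big-endian reader sees a register value of 0x11223344 as 0x44332211. The same reversal applies to data written by a big-endian master.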
Note
For more information about access permissions in an implementation that includes the OS Save and Restore mechanism but does not provide access to the DBGOSSRR through the external debug interface, see the Note in The OS Save and Restore mechanism on page C6-8.

The remaining subsections apply to both interfaces:
• Meanings of terms and abbreviations used in this section on page C6-46
• Permissions summary for separate debug and core power domains on page C6-48
• Permissions summary for SinglePower (debug and core in single power domain) on page C6-50.

Meanings of terms and abbreviations used in this section

The following terms and abbreviations are used in the tables that summarize the access permissions:

X             Don't care. The outcome does not depend on this condition.
0             The condition is false.
1             The condition is true. For more information, see Table C6-16.
IG/ABT        The access is ignored or aborted.
              Note: The IG/ABT response might be implemented outside the processor, for example, by the system or DAP.
Proceed       The access must not be ignored, but the processor or system might return an error response. For more information about the response returned, see:
              • Permissions summary for separate debug and core power domains on page C6-48
              • Permissions summary for SinglePower (debug and core in single power domain) on page C6-50.
Not possible  When the debug logic is powered down, accessing the debug registers is not possible. The system must respond to the access, and the response is IMPLEMENTATION DEFINED. ARM recommends that the system returns an error response.
Error         Error response. Writes are ignored and reads return an UNKNOWN value.
OK            Read or write access succeeds. Writes to read-only locations are ignored. Reads from RAZ or write-only locations return zero. Some read/write registers include bits that are read-only. Unless otherwise stated in the bit description, these bits ignore writes.
UNP           The access has UNPREDICTABLE results. Reads return an UNKNOWN value.
DBGLAR        Lock Access Register, see Lock Access Register (DBGLAR) on page C10-94. This is one of the management registers.

Table C6-16 lists the control conditions used in this section, and tells you where you can find more information about each of these controls. These conditions can be given an argument of X, 0 or 1, as defined at the start of this section. The table gives more information about the meaning of each condition when its argument is 1.

Table C6-16 Meaning of (Argument = 1) for the control condition

Control condition      Meaning of (Argument = 1)                        For details see
Debug logic powered    The debug power domain is powered up (a)         Permissions in relation to power-down on page C6-28
Core logic powered     The core power domain is powered up (a)          Permissions in relation to power-down on page C6-28
Processor powered      The single power domain is powered up (a)        Permissions in relation to power-down on page C6-28
Sticky power-down      DBGPRSR[1] = 1                                   Permissions in relation to power-down on page C6-28
OS Lock                DBGOSLSR[1] = 1                                  Permissions in relation to locks on page C6-27
Software Lock          DBGLSR[1] = 1                                    Permissions in relation to locks on page C6-27
Debug Software Enable  The recommended function of the DAP is enabled

a. On a SinglePower system, the Processor powered control condition is equivalent to having both Debug logic powered and Core logic powered on a system with the recommended separate debug and core power domains.

Access permissions for the external debug interface

Table C6-17 summarizes the access permissions for the external debug interface. When the debug logic is not powered, external debug accesses must be prohibited. An implementation can either ignore or abort these accesses.

Table C6-17 Register access permissions for the external debug interface (a)

Debug logic powered? (b)   Response       Writes or has other side-effects?
No                         Not possible   -
Yes                        Proceed        Yes

a. See Meanings of terms and abbreviations used in this section on page C6-46 when using this table.
b. Or Processor powered, on a SinglePower system.

Access permissions for the memory-mapped interface

Table C6-18 summarizes the access permissions for the memory-mapped interface. At the system level, certain memory-mapped accesses must be prohibited. An implementation can either ignore or abort these accesses.

Table C6-18 Register access permissions for the memory-mapped interface (a)

Debug logic    Debug Software  Access      Software  Response      Writes or has other
powered? (b)   Enable          privilege   Lock                    side-effects?
No             X               X           X         Not possible  -
Yes            0               X           X         IG/ABT        -
Yes            X               User        X         IG/ABT        -
Yes            1               Privileged  0         Proceed       Yes
Yes            1               Privileged  1         Proceed       DBGLAR only (c)

a. See Meanings of terms and abbreviations used in this section on page C6-46 when using this table.
b. Or Processor powered, on a SinglePower system.
c. Writes are ignored and reads, such as reads of DBGDSCRext, have no side-effects. Writes to the DBGLAR are permitted.

Note
If an implementation permits an external debugger to access the memory-mapped interface, it is IMPLEMENTATION DEFINED whether those accesses are controlled by the Debug Software Enable control in the debug access port.

Permissions summary for separate debug and core power domains

For implementations with separate debug and core power domains, the following tables show the effects of permissions on access to memory-mapped debug registers:
• Table C6-19 on page C6-49 for access to debug and management registers
• Table C6-20 on page C6-49 for access to the OS Save and Restore and Power-down registers.

For more information about the conditions that control access to these registers, see Table C6-16 on page C6-46.
Table C6-19 Debug and management register access for separate debug and core power domains (a)

Conditions                         Registers
Core logic  Sticky      OS Lock    DBGDIDR, DBGECR,  Other debug  Management  Reserved (d)
powered?    power-down             DBGDRCR           (b, d)       (c, d)
No          X           X          OK                Error        OK          UNP
Yes         0           0          OK                OK           OK          OK
Yes         0           1          OK                Error        OK          UNP
Yes         1           X          OK                Error        OK          UNP

a. See Meanings of terms and abbreviations used in this section on page C6-46 when using this table.
b. Registers in the memory region 0x000 - 0x1FC, except for the DBGDIDR, DBGECR, and DBGDRCR, and reserved locations.
c. Registers in the memory region 0xD00 - 0xFFC, except for IMPLEMENTATION DEFINED registers.
d. For details of the behavior of accesses to reserved and IMPLEMENTATION DEFINED registers see Access to implementation defined and reserved registers on page C6-29.

Table C6-20 OS Save and Restore and Power-down register access for separate debug and core power domains (a)

Conditions                         Registers
Core logic  Sticky      OS Lock    DBGOSLSR (b),     DBGOSLAR (b)  DBGOSSRR (b)
powered?    power-down             DBGPRCR, DBGPRSR
No          X           X          OK                UNP           UNP
Yes         0           0          OK                OK            UNP
Yes         0           1          OK                OK            OK
Yes         1           X          OK                OK            UNP

a. See Meanings of terms and abbreviations used in this section on page C6-46 when using this table.
b. If the OS Save and Restore mechanism is not implemented, these registers behave as reserved locations. For details of the behavior of accesses to reserved and IMPLEMENTATION DEFINED registers see Access to implementation defined and reserved registers on page C6-29.

Permissions summary for SinglePower (debug and core in single power domain)

For implementations with a single debug and core power domain, when the processor is powered down the system response is IMPLEMENTATION DEFINED.
ARM recommends that the system returns an error response, but the processor cannot generate any response. The Sticky Power-down status bit is RAZ.

Table C6-21 and Table C6-22 show the effects of permissions on access to memory-mapped debug registers. For more information about the conditions that control access to these registers, see Table C6-16 on page C6-46.

Table C6-21 Register accesses for single debug and core power domain, part 1 (a)

Conditions            Registers
Processor  OS Lock    DBGDIDR, DBGECR, DBGDRCR,       DBGOSLAR (b)   DBGOSSRR (b)
powered?              DBGOSLSR (b), DBGPRCR, DBGPRSR
No         X          Not possible                    Not possible   Not possible
Yes        0          OK                              OK             UNP
Yes        1          OK                              OK             OK

a. See Meanings of terms and abbreviations used in this section on page C6-46 when using this table.
b. If the OS Save and Restore mechanism is not implemented, these registers behave as reserved locations.

Table C6-22 Register accesses for single debug and core power domain, part 2 (a)

Conditions            Registers
Processor  OS Lock    Other debug (b, d)  Management (c, d)  Reserved (d)
powered?
No         X          Not possible        Not possible       Not possible
Yes        0          OK                  OK                 OK
Yes        1          Error               OK                 UNP

a. See Meanings of terms and abbreviations used in this section on page C6-46 when using this table.
b. Registers in the memory region 0x000 - 0x1FC, except for the DBGDIDR, DBGECR, and DBGDRCR, and reserved locations.
c. Management registers, that is, registers in the memory region 0xD00 - 0xFFC, except for IMPLEMENTATION DEFINED registers.
d. For details of the behavior of accesses to reserved and IMPLEMENTATION DEFINED registers see Access to implementation defined and reserved registers on page C6-29.
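The two-stage lookup in Table C6-18 and Table C6-19 can be modeled in C as a reading aid. The first stage, from Table C6-18, decides whether a memory-mapped access proceeds at all. The second stage, from Table C6-19, gives the response for each register group on a system with separate debug and core power domains. This is a minimal sketch, not an authoritative model. All type and function names are hypothetical, and the Table C6-16 control conditions are passed in as booleans. The sketch does not model IMPLEMENTATION DEFINED registers, or the IMPLEMENTATION DEFINED response that the system gives when the debug logic is powered down.

```c
#include <stdbool.h>

/* Hypothetical model of the Table C6-16 control conditions (true = argument 1). */
typedef struct {
    bool debug_logic_powered;
    bool core_logic_powered;
    bool sticky_powerdown;   /* DBGPRSR[1] */
    bool os_lock;            /* DBGOSLSR[1] */
    bool software_lock;      /* DBGLSR[1] */
    bool debug_sw_enable;    /* Debug Software Enable from the DAP */
    bool privileged;         /* access privilege: false means User */
} dbg_conditions_t;

/* Stage 1: Table C6-18, memory-mapped interface. */
typedef enum { GATE_NOT_POSSIBLE, GATE_IG_ABT, GATE_PROCEED } dbg_gate_t;

dbg_gate_t mm_gate(const dbg_conditions_t *c)
{
    if (!c->debug_logic_powered)
        return GATE_NOT_POSSIBLE;   /* system response is IMPLEMENTATION DEFINED */
    if (!c->debug_sw_enable || !c->privileged)
        return GATE_IG_ABT;         /* access is ignored or aborted */
    return GATE_PROCEED;
}

/* Only meaningful once mm_gate() returns GATE_PROCEED: with the Software
   Lock set, only writes to the DBGLAR take effect (footnote c). */
bool mm_write_takes_effect(const dbg_conditions_t *c, bool target_is_dbglar)
{
    return !c->software_lock || target_is_dbglar;
}

/* Stage 2: Table C6-19, register groups for separate power domains. */
typedef enum {
    REG_DIDR_ECR_DRCR,   /* DBGDIDR, DBGECR, DBGDRCR */
    REG_OTHER_DEBUG,     /* footnote b */
    REG_MANAGEMENT,      /* footnote c */
    REG_RESERVED
} dbg_reg_group_t;

typedef enum { RESP_OK, RESP_ERROR, RESP_UNP } dbg_resp_t;

dbg_resp_t separate_domain_response(const dbg_conditions_t *c, dbg_reg_group_t g)
{
    /* Only the row (Yes, 0, 0) gives OK for every register group. */
    bool core_accessible =
        c->core_logic_powered && !c->sticky_powerdown && !c->os_lock;

    switch (g) {
    case REG_DIDR_ECR_DRCR:
    case REG_MANAGEMENT:
        return RESP_OK;                                 /* OK in every row */
    case REG_OTHER_DEBUG:
        return core_accessible ? RESP_OK : RESP_ERROR;
    case REG_RESERVED:
        return core_accessible ? RESP_OK : RESP_UNP;
    }
    return RESP_UNP;  /* not reached for valid register groups */
}
```

Splitting the model into two functions mirrors the tables. A User-mode access stops at the first stage whatever register it targets. The per-register response from Table C6-19 applies only to accesses that proceed.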
C6.7.5 Registers not implemented in the memory-mapped or external debug interface

In any Debug architecture version, the following registers are not implemented through the memory-mapped or external debug interfaces:

DBGDRAR   Debug ROM Address Register (DBGDRAR) on page C10-7
DBGDSAR   Debug Self Address Offset Register (DBGDSAR) on page C10-8.

These registers are not required by an external debugger.

In addition, there is no interface for accessing DBGDSCRint, DBGDTRRXint or DBGDTRTXint through the memory-mapped or external debug interface. These accesses are only available through the Baseline CP14 interface.

Chapter C7 Non-invasive Debug Authentication

This chapter describes the authentication controls on non-invasive debug operations. It contains the following sections:
• About non-invasive debug authentication on page C7-2
• v7 Debug non-invasive debug authentication on page C7-4
• Effects of non-invasive debug authentication on page C7-6
• ARMv6 non-invasive debug authentication on page C7-8.

Note
The recommended external debug interface provides an authentication interface that controls both invasive debug and non-invasive debug, as described in Authentication signals on page AppxA-3. This chapter describes how you can use this interface to control non-invasive debug. For information about using the interface to control invasive debug see Chapter C2 Invasive Debug Authentication.

C7.1 About non-invasive debug authentication

Non-invasive debug can be enabled or disabled through the external debug interface.
In addition, if a processor implements the Security Extensions, non-invasive debug operations can be permitted or not permitted. The difference between enabled and permitted is that the permitted non-invasive debug operations depend on both the security state and the operating mode of the processor. The alternatives for when non-invasive debug is permitted are:
• in all processor modes, in both Secure and Non-secure security states
• only in Non-secure state
• in Non-secure state and in Secure User mode.

Whether non-invasive debug operations are permitted in Secure User mode depends on the value of the SDER.SUNIDEN bit, see c1, Secure Debug Enable Register (SDER) on page B3-108.

In v6.1 Debug and v7 Debug, non-invasive debug authentication can be controlled dynamically. This means that whether non-invasive debug is permitted can change while the processor is running, or while the processor is in Debug state. For more information, see Generation of debug events on page C3-40. In v6 Debug, non-invasive debug authentication can be changed only while the processor is in reset.

In the recommended external debug interface, the signals that control the enabling and permitting of non-invasive debug are DBGEN, SPIDEN, NIDEN and SPNIDEN, see Authentication signals on page AppxA-3. Part C of this manual assumes that the recommended external debug interface is implemented. SPIDEN and SPNIDEN are only implemented on processors that implement the Security Extensions. NIDEN is an optional signal in v6 Debug and v6.1 Debug.

Note
• DBGEN and SPIDEN also control invasive debug, see About invasive debug authentication on page C2-2.
• In v6 Debug and v6.1 Debug, NIDEN might be implemented on some non-invasive debug components and not on others. For example, the performance monitoring unit for a processor might implement NIDEN when the trace macrocell for the same processor does not.
• For more information about use of the authentication signals see Changing the authentication signals on page AppxA-4.
• For more information about ARMv6 non-invasive debug see ARMv6 non-invasive debug authentication on page C7-8.

If both DBGEN and NIDEN are LOW, no non-invasive debug is permitted. Non-invasive debug authentication in v7 Debug is described in v7 Debug non-invasive debug authentication on page C7-4.

The following sections describe the behavior of the non-invasive debug components when non-invasive debug is not enabled or not permitted. They also describe the behavior when the processor is in Debug state:
• Performance monitors on page C7-6
• Trace on page C7-7
• Reads of the Program Counter sampling registers on page C8-3.

ARMv6 non-invasive debug authentication on page C7-8 describes the architectural requirements for a v6 Debug or v6.1 Debug implementation.

Note
Invasive and non-invasive debug authentication enable you to protect Secure processing from direct observation or invasion by a debugger that you do not trust. If you are designing a system, you must be aware that the invasive and non-invasive debug facilities can aid security attacks. For example, Debug state or the DBGDSCR.INTdis bit might be used for a denial of service attack, and the Non-secure performance monitors might be used to measure the side-effects of Secure processing on Non-secure code. ARM recommends that where you are concerned about such attacks you disable invasive and non-invasive debug in all modes. However, you must be aware of the limits on the protection that debug authentication can provide, because similar attacks can be made by running malicious code on the processor in Non-secure state.
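The v7 Debug rules described in v7 Debug non-invasive debug authentication on page C7-4 (Table C7-1 and Table C7-2) reduce to a single predicate. In that predicate, DBGEN HIGH acts as NIDEN HIGH, and SPIDEN HIGH acts as SPNIDEN HIGH. The sketch below is a hedged reading aid, not a specified interface. The struct and function names are hypothetical, signal levels are modeled as booleans (true = HIGH), and the separate Debug state effects described in Effects of non-invasive debug authentication are not modeled.

```c
#include <stdbool.h>

/* Authentication signal levels: true = HIGH, false = LOW. */
typedef struct {
    bool dbgen;
    bool niden;
    bool spiden;   /* only meaningful with the Security Extensions */
    bool spniden;  /* only meaningful with the Security Extensions */
} nid_signals_t;

/* Processor context in which a non-invasive debug operation occurs. */
typedef struct {
    bool has_security_ext;  /* Security Extensions implemented */
    bool secure;            /* processor is in Secure state */
    bool user_mode;         /* processor is in User mode */
    bool sder_suniden;      /* value of SDER.SUNIDEN */
} nid_context_t;

/* Returns true if non-invasive debug is enabled and permitted in v7 Debug. */
bool v7_nid_permitted(nid_signals_t s, nid_context_t ctx)
{
    bool niden_eff   = s.niden || s.dbgen;     /* DBGEN HIGH acts as NIDEN HIGH */
    bool spniden_eff = s.spniden || s.spiden;  /* SPIDEN HIGH acts as SPNIDEN HIGH */

    if (!niden_eff)
        return false;            /* non-invasive debug is disabled */
    if (!ctx.has_security_ext || !ctx.secure)
        return true;             /* Table C7-1, or any mode in Non-secure state */
    if (spniden_eff)
        return true;             /* all modes in both security states */
    return ctx.user_mode && ctx.sder_suniden;  /* Secure User mode only via SUNIDEN */
}
```

Folding DBGEN into NIDEN, and SPIDEN into SPNIDEN, before any other test is what collapses the nine rows of Table C7-2 into three decisions.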
C7.2 v7 Debug non-invasive debug authentication

On processors that do not implement the Security Extensions, if NIDEN is asserted HIGH, non-invasive debug is enabled and permitted in all modes. If DBGEN is asserted HIGH, the system behaves as if NIDEN is asserted HIGH, regardless of the actual state of the NIDEN signal. Table C7-1 shows the required behavior in v7 Debug when the Security Extensions are not implemented.

Table C7-1 v7 Debug non-invasive debug authentication, Security Extensions not implemented

DBGEN  NIDEN  Modes in which non-invasive debug is permitted
LOW    LOW    None. Non-invasive debug is disabled.
x      HIGH   All modes.
HIGH   LOW    All modes.

On a processor that implements the Security Extensions:
• If both NIDEN and SPNIDEN are asserted HIGH, non-invasive debug is enabled and permitted in all modes and security states.
• If NIDEN is HIGH and SPNIDEN is LOW:
  — non-invasive debug is enabled and permitted in Non-secure state
  — non-invasive debug is not permitted in Secure privileged modes
  — whether non-invasive debug is permitted in Secure User mode depends on the value of the SDER.SUNIDEN bit.

If DBGEN is HIGH, the system behaves as if NIDEN is HIGH, regardless of the actual state of the NIDEN signal. If SPIDEN is HIGH, the system behaves as if SPNIDEN is HIGH, regardless of the actual state of the SPNIDEN signal.

Table C7-2 shows the non-invasive debug authentication for ARMv7 processors that implement the Security Extensions.

Table C7-2 v7 Debug non-invasive debug authentication, Security Extensions implemented

DBGEN  NIDEN  SPIDEN  SPNIDEN  SDER.SUNIDEN  Modes in which non-invasive debug is permitted
LOW    LOW    x       x        x             None. Non-invasive debug is disabled.
LOW    HIGH   LOW     LOW      0             All modes in Non-secure state.
LOW    HIGH   LOW     LOW      1             All modes in Non-secure state, Secure User mode.
LOW    HIGH   LOW     HIGH     x             All modes in both security states.
LOW    HIGH   HIGH    x        x             All modes in both security states.
HIGH   x      LOW     LOW      0             All modes in Non-secure state.
HIGH   x      LOW     LOW      1             All modes in Non-secure state, Secure User mode.
HIGH   x      LOW     HIGH     x             All modes in both security states.
HIGH   x      HIGH    x        x             All modes in both security states.

Note
The value of the SDER.SUIDEN bit does not have any effect on non-invasive debug.

C7.3 Effects of non-invasive debug authentication

The following sections describe the effects of the non-invasive debug authentication on the non-invasive debug components:
• Performance monitors
• Trace on page C7-7
• Reads of the Program Counter sampling registers on page C8-3.

C7.3.1 Performance monitors

Performance monitors provide a non-invasive debug feature, and are controlled by the non-invasive debug authentication signals. For more information, see Chapter C9 Performance Monitors.

The cycle counter, PMCCNTR, is not controlled by the non-invasive debug authentication signals. However, setting the PMCR.DP flag to 1 disables PMCCNTR counting in regions of code where the event counters are disabled. For details see c9, Performance Monitor Control Register (PMCR) on page C10-105.

Table C7-3 describes the behavior of the performance monitors when non-invasive debug is disabled or not permitted, and in Debug state.

Table C7-3 Behavior of performance monitors when non-invasive debug not permitted

Debug  Non-invasive debug     PMCR.DP (a)  Event counters enabled      PMCCNTR
state  permitted and enabled               and events exported (a, b)  enabled
Yes    x                      x            No                          No
No     Yes                    x            Yes                         Yes
No     No                     0            No                          Yes
No     No                     1            No                          No

a. See c9, Performance Monitor Control Register (PMCR) on page C10-105.
b. The events are exported only if the PMCR.X bit is set to 1.

The performance monitors are not intended to be completely accurate, see Accuracy of the performance monitors on page C9-5.
In particular, some inaccuracy is permitted at the point of changing security state. However, to avoid the leaking of information from the Secure state, the permitted inaccuracy is that non-prohibited transactions might not be counted. Prohibited transactions must not be counted.

Entry to and exit from Debug state can also disturb the normal running of the processor, causing additional inaccuracy in the performance monitors. Disabling the counters while in Debug state limits the extent of this inaccuracy. Implementations can limit this inaccuracy further, for example by disabling the counters as soon as possible during the Debug state entry sequence.

C7.3.2 Trace

All instructions and data transfers are ignored by the trace device when:
• non-invasive debug is disabled
• the processor is in a mode or state where non-invasive debug is not permitted
• the processor is in Debug state.

C7.4 ARMv6 non-invasive debug authentication

An ARMv6 processor might implement the v7 Debug non-invasive debug authentication signaling described in v7 Debug non-invasive debug authentication on page C7-4. In general, non-invasive debug authentication in ARMv6 Debug is IMPLEMENTATION DEFINED. For details of the implemented authentication scheme you must see the appropriate product documentation. In particular:
• it is IMPLEMENTATION DEFINED whether the NIDEN signal is implemented
• the exact roles of the following signals are IMPLEMENTATION DEFINED:
  — DBGEN, SPIDEN, and SPNIDEN
  — NIDEN, if it is implemented.

However, an ARMv6 non-invasive debug authentication scheme must obey the following rules:
• If NIDEN is implemented then tying NIDEN and DBGEN both LOW guarantees that non-invasive debug is disabled.
• If NIDEN is not implemented then the mechanism for disabling non-invasive debug is IMPLEMENTATION DEFINED. An implementation might not support any mechanism for disabling non-invasive debug.
• When the Security Extensions are implemented, tying SPIDEN and SPNIDEN both LOW guarantees that non-invasive debug is not permitted in Secure privileged modes. In addition, if SPIDEN and SPNIDEN are both LOW then setting SDER.SUNIDEN to 0 guarantees that non-invasive debug is not permitted in Secure User mode. If non-invasive debug is enabled and SDER.SUNIDEN is 1, non-invasive debug is permitted in Secure User mode.
• If NIDEN is implemented then tying NIDEN and SPNIDEN both HIGH is guaranteed to enable and permit non-invasive debug in all modes in both security states. If NIDEN is not implemented then tying SPNIDEN HIGH is guaranteed to enable and permit non-invasive debug in all modes in both security states.

Table C7-4 shows the architectural requirements for non-invasive debug behavior in an ARMv6 Debug implementation that does not include the Security Extensions.

Table C7-4 ARMv6 non-invasive debug authentication requirements, Security Extensions not implemented

NIDEN                 DBGEN  Non-invasive debug behavior
Implemented and LOW   LOW    Disabled.
Implemented and HIGH  x      Enabled.

Table C7-5 shows the architectural requirements for non-invasive debug behavior in an ARMv6 Debug implementation that includes the Security Extensions.

Table C7-5 ARMv6 non-invasive debug authentication requirements, Security Extensions implemented

NIDEN                 DBGEN  SPIDEN  SPNIDEN  SDER.SUNIDEN  Non-invasive debug behavior
Implemented and LOW   LOW    x       x        x             Disabled.
x                     x      LOW     LOW      0             Not permitted in any mode in Secure state.
x                     x      LOW     LOW      1             Not permitted in Secure privileged modes. Permitted in Secure User mode if enabled.
Implemented and HIGH  x      x       x        x             Permitted in all modes in Non-secure state. Might also be permitted in Secure state.
Implemented and HIGH  x      x       HIGH     x             Permitted in all modes and security states.
Not implemented       x      x       HIGH     x             Permitted in all modes and security states.

An ARMv6 Debug implementation that includes the Security Extensions might have other signal combinations that permit non-invasive debug in Secure privileged modes. You must take care to avoid unknowingly permitting non-invasive debug.

There is no mechanism that a debugger can use to determine the implemented mechanism for controlling non-invasive debug on an ARMv6 processor. You must see the product documentation for this information.

Chapter C8 Sample-based Profiling

This chapter describes sample-based profiling. Sample-based profiling is an optional non-invasive debug component. It contains the following section:
• Program Counter sampling on page C8-2.

C8.1 Program Counter sampling

In ARMv6, the Program Counter Sampling Register (DBGPCSR) is an optional part of the recommended external debug interface. It is not defined by the architecture.

In v7 Debug, Program Counter sampling is an optional feature defined by the architecture. The following sections describe this feature:
• Implemented Program Counter sampling registers
• Reads of the Program Counter sampling registers on page C8-3.

C8.1.1 Implemented Program Counter sampling registers

In v7 Debug, it is IMPLEMENTATION DEFINED whether the DBGPCSR is implemented.
It is an optional extension to the Debug architecture that provides a mechanism for coarse-grained profiling of code executing on the processor without changing the behavior of that code. For details see Program Counter Sampling Register (DBGPCSR) on page C10-38.

If the DBGPCSR is implemented, it is IMPLEMENTATION DEFINED whether a second sampling register is also implemented. This register is the Context ID Sampling Register (DBGCIDSR) and is described in Context ID Sampling Register (DBGCIDSR) on page C10-39. If a processor does not implement DBGPCSR it does not implement DBGCIDSR.

If a processor implements only DBGPCSR, it is IMPLEMENTATION DEFINED whether it is implemented as register 33, as register 40, or as both register 33 and register 40.

If a processor implements both DBGPCSR and DBGCIDSR:
• it must implement:
  — DBGPCSR as register 40
  — DBGCIDSR as register 41
• it is IMPLEMENTATION DEFINED whether it also implements DBGPCSR as register 33.

If a processor implements DBGPCSR as both register 33 and register 40, the two register numbers are aliases of a single register. ARM deprecates reading DBGPCSR as register 33 on an implementation that also implements it as register 40.

To determine which, if any, of the Program Counter sampling registers are implemented, and the register numbers used for any implemented registers, read:
• the DEVID_imp and PCSR_imp bits of the DBGDIDR, see Debug ID Register (DBGDIDR) on page C10-3
• the DBGDEVID.PCsample field, see Debug Device ID Register (DBGDEVID) on page C10-6.

Note
ARM recommends that an implementation that supports sample-based profiling:
• implements both DBGPCSR and DBGCIDSR
• implements DBGPCSR as register 40
• also implements DBGPCSR as register 33, for backwards compatibility with implementations that implement it only as register 33.

C8.1.2
Reads of the Program Counter sampling registers

A read of the DBGPCSR:
• Normally:
  — returns the address of an instruction recently executed by the processor
  — sets the DBGCIDSR, if implemented, to the current value of the CONTEXTIDR.
  For more information about the CONTEXTIDR, see:
  — c13, Context ID Register (CONTEXTIDR) on page B3-153, for a VMSA implementation
  — c13, Context ID Register (CONTEXTIDR) on page B4-76, for a PMSA implementation.
• Alternatively, when any of the following is true, returns 0xFFFFFFFF and sets the DBGCIDSR, if implemented, to an UNKNOWN value:
  — non-invasive debug is disabled
  — the processor is in a mode or state where non-invasive debug is not permitted
  — the processor is in Debug state.

If the DBGCIDSR is implemented, reading it returns the last value to which it was set.

Note
The ARM architecture does not define "recently executed". The delay between an instruction being executed by the processor and its address appearing in the DBGPCSR is not defined. For example, if a piece of code reads the DBGPCSR of the processor it is running on, there is no guaranteed relationship between the program counter for that piece of code and the value read. The DBGPCSR is intended only for use by an external agent to provide statistical information for code profiling.

The value in the DBGPCSR always references a committed instruction. An implementation must not sample values that reference instructions that are fetched but not committed for execution.

If DBGPCSR is implemented, it must be possible to sample references to branch targets. It is IMPLEMENTATION DEFINED whether references to other instructions can be sampled. ARM recommends that a reference to any instruction can be sampled.

The branch target for a conditional branch instruction that fails its condition code check is the instruction that follows the conditional branch instruction. The branch target for an exception is the exception vector address.
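The DBGPCSR is intended for an external agent that builds statistics from many samples. The C sketch below shows one way such an agent might bucket samples it has already collected. How the samples are read, through the memory-mapped or the external debug interface, is outside the sketch. The function name and fixed-size bucket scheme are hypothetical. The sketch treats each value as a plain address and does not decode any additional fields that the register format on page C10-38 might define. It relies only on the rule that a read made while sampling is not allowed returns 0xFFFFFFFF.

```c
#include <stddef.h>
#include <stdint.h>

/* Value returned by a DBGPCSR read when non-invasive debug is disabled,
   not permitted, or the processor is in Debug state. */
#define DBGPCSR_NO_SAMPLE 0xFFFFFFFFu

/* Accumulate PC samples into nbuckets buckets of bucket_size bytes each,
   starting at base. bucket_size must be non-zero. Samples outside the
   profiled region are skipped. Returns how many samples were 0xFFFFFFFF,
   which carry no address information. */
size_t pcsr_histogram(const uint32_t *samples, size_t nsamples,
                      uint32_t base, uint32_t bucket_size,
                      uint32_t *buckets, size_t nbuckets)
{
    size_t no_sample = 0;

    for (size_t i = 0; i < nsamples; i++) {
        uint32_t pc = samples[i];

        if (pc == DBGPCSR_NO_SAMPLE) {
            no_sample++;           /* count it, but do not attribute it to code */
            continue;
        }
        if (pc < base)
            continue;              /* below the profiled region */

        uint32_t idx = (pc - base) / bucket_size;
        if (idx < nbuckets)
            buckets[idx]++;        /* samples beyond the last bucket are skipped */
    }
    return no_sample;
}
```

The delay between execution and sampling is not defined, so a histogram like this is meaningful only statistically, over many samples. Counting the 0xFFFFFFFF reads separately shows how much of the sampling period fell in regions where sampling was not allowed.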
If an instruction writes to the CONTEXTIDR, it is UNPREDICTABLE whether the DBGCIDSR is set to the original or new value of CONTEXTIDR when a read of the DBGPCSR samples a subsequent instruction that occurs before the earliest of:
• the execution of an ISB instruction or an ISB operation
• the taking of an exception
• the execution of an exception return instruction.

Chapter C9 Performance Monitors

This chapter describes the performance monitors, which are a non-invasive debug component. It contains the following sections:
• About the performance monitors on page C9-2
• Status in the ARM architecture on page C9-4
• Accuracy of the performance monitors on page C9-5
• Behavior on overflow on page C9-6
• Interaction with Security Extensions on page C9-7
• Interaction with trace on page C9-8
• Interaction with power saving operations on page C9-9
• CP15 c9 register map on page C9-10
• Access permissions on page C9-12
• Event numbers on page C9-13.

C9.1 About the performance monitors

The basic organization of the performance monitors is:
• A cycle counter. This can be programmed to increment either on every cycle, or once every 64 cycles.
• A number of event counters. Each counter is configured to select the event that increments the counter. Space is provided in the architecture for up to 31 counters. The actual number of counters is IMPLEMENTATION DEFINED, and there is an identification mechanism for the counters.
• Controls for enabling the counters, resetting the counters, flagging overflows, and enabling interrupts on counter overflow.

The cycle counter can be enabled independently of the event counters.
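The two counting rates of the cycle counter can be illustrated with a small software model. In ARMv7 the PMCR.D bit selects the divide-by-64 rate. In this sketch that choice is simply a boolean parameter, and the model itself is hypothetical: it does not reproduce hardware timing, the counter enables, or overflow interrupts. The overflow flag that it sets on a 32-bit wrap follows the behavior described in C9.4 Behavior on overflow.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical software model of the cycle counter (PMCCNTR). */
typedef struct {
    uint32_t count;     /* 32-bit counter value */
    uint32_t prescale;  /* cycles seen since the last increment, divide-by-64 mode */
    bool     overflow;  /* set when count wraps from 0xFFFFFFFF to 0 */
} cycle_counter_t;

/* Advance the model by one processor clock cycle. */
void cycle_counter_tick(cycle_counter_t *cc, bool divide_by_64)
{
    if (divide_by_64) {
        if (++cc->prescale < 64)
            return;            /* fewer than 64 cycles since the last increment */
        cc->prescale = 0;
    }
    if (++cc->count == 0)
        cc->overflow = true;   /* counter wrapped: set the overflow flag */
}
```

With the divider selected, 128 cycles advance the count by 2 rather than 128. This trades resolution for a longer interval before the 32-bit counter overflows.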
The counters are held in a set of registers that can be accessed in coprocessor space. This means the operating system running on the processor can access the counters, enabling a number of uses, including:
• dynamic compilation techniques
• energy management.

In addition, you can provide access to the counters from application code, if required. This enables applications to monitor their own performance with fine-grained control without requiring operating system support. For example, an application might implement per-function performance monitoring.

There are many situations where performance monitoring features integrated into the processor are valuable for applications and for application development. When an operating system does not use the performance monitors itself, ARM recommends that it enables application code access to the performance monitors.

However, an implementation can choose not to implement any performance monitors.

To enable interaction with external monitoring, an implementation might consider additional enhancements, including:
• Providing a set of events, from which a selection can be exported onto a bus for use as external events. For very high frequency operation, this might introduce unacceptable timing requirements, but the bus could be interfaced to the trace macrocell or another closely coupled resource.
• Providing the ability to count external events. Here, again, there are clock frequency issues between the processor and the system. A suitable approach might be to edge-detect changes in the signals and to use those changes to increment a counter. This enhancement requires the processor to implement a set of external event input pins.
• Providing memory-mapped and external debug access to the performance monitor registers, to enable the counter resources to be used for system monitoring in systems where they are not used by the software running on the processor. Such access is not described in this manual.
Contact ARM if you require more information about this option.

The set of events that might be monitored splits into:
• events that are likely to be consistent across many microarchitectures
• implementation-specific events.

Therefore, this architecture defines a common set of events to be used across many microarchitectures, and reserves a large space for IMPLEMENTATION DEFINED events. The full set of events for any given implementation is IMPLEMENTATION DEFINED, and there is no requirement to implement any of the common set of events. ARM recommends that ARMv7 processors implement as many of the events as are feasible, given the architecture profile and microarchitecture of the implementation.

The event numbers of the common set of events are reserved for the specified events. Each event number in this set must either:
• be used for its assigned event
• not be used.

When an ARMv7 processor supports monitoring of an event that is assigned a number in the range allocated to the common set of events, it must, if possible, use that number for the event. However, ARM might introduce additional event definitions in this range in future editions of this manual. Therefore, software might encounter implementations where an event that is assigned a number in this range is monitored using an event number from the IMPLEMENTATION DEFINED range.

C9.2 Status in the ARM architecture

The architecturally-defined performance monitors block is an IMPLEMENTATION DEFINED space in ARMv7, but ARM recommends that implementers use the approach described in this chapter to implement the performance monitors.
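Following on from the application-level access described in C9.1, this C sketch shows how an application might measure the cycle count of a single function. It assumes that privileged code has already set PMUSERENR.EN to 1 and enabled the cycle counter, and that the code is built for a 32-bit ARMv7 target with GCC-style inline assembly; on other hosts a hypothetical stand-in counter is used so that the sketch still builds. The helper names are illustrative and not part of any library.

```c
#include <stdint.h>

/* Read the cycle counter, PMCCNTR (MRC p15,0,<Rt>,c9,c13,0).
 * In User mode this MRC is UNDEFINED unless PMUSERENR.EN == 1. */
static inline uint32_t read_pmccntr(void)
{
#if defined(__arm__) && !defined(__aarch64__)
    uint32_t value;
    __asm__ volatile("mrc p15, 0, %0, c9, c13, 0" : "=r"(value));
    return value;
#else
    /* Hypothetical stand-in so the sketch builds on non-ARMv7 hosts. */
    static uint32_t fake_ticks = 0u;
    return fake_ticks += 100u;
#endif
}

/* Ticks elapsed between two samples. Unsigned subtraction is modulo 2^32,
 * so the result is correct across at most one wrap of the counter. */
static inline uint32_t pmu_elapsed(uint32_t start, uint32_t end)
{
    return end - start;
}

/* Hypothetical per-function instrumentation helper. */
uint32_t measure_function(void (*fn)(void))
{
    uint32_t start = read_pmccntr();
    fn();
    uint32_t end = read_pmccntr();
    return pmu_elapsed(start, end);
}
```

Because PMCCNTR is 32 bits wide, the unsigned subtraction gives the right answer even if the counter wraps once between the two reads, but not if more than 2^32 ticks elapse. When PMCR.D is set, each tick represents 64 cycles.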
C9.3 Accuracy of the performance monitors

The performance monitors provide approximately accurate performance count information. To keep the implementation and validation cost low, a reasonable degree of inaccuracy in the counts is acceptable. There is no exact definition of a reasonable degree of inaccuracy, but ARM recommends the following guidelines:
• Under normal operating conditions, the counters must present an accurate value of the count.
• In exceptional circumstances, such as changes in security state or other boundary conditions, it is acceptable for the count to be inaccurate.
• In very unusual, non-repeating pathological cases, counts can be inaccurate. These cases are likely to occur as a result of asynchronous exceptions, such as interrupts, where the chance of a systematic error in the count is vanishingly small.

Note
An implementation must not introduce inaccuracies that can be triggered systematically by the execution of normal code. For example, dropping a branch count in a loop because of the structure of the loop gives a systematic error that makes the count of branch behavior very inaccurate, and this is not reasonable. However, dropping a single branch count as the result of a rare interaction with an interrupt is acceptable.

The permitted inaccuracy limits the possible uses of the performance monitors. In particular, the point in a pipeline where an event counter is incremented is not defined relative to the point where a read of the event counters is made. This means that pipelining effects can cause some imprecision.

An implementation must document any particular scenarios where significant inaccuracies are expected.

C9.4 Behavior on overflow

On counter overflow:
• An overflow status flag is set to 1. See c9, Overflow Flag Status Register (PMOVSR) on page C10-110.
• An interrupt request is generated if the processor is configured to generate counter overflow interrupts. For details see c9, Interrupt Enable Set Register (PMINTENSET) on page C10-118 and c9, Interrupt Enable Clear Register (PMINTENCLR) on page C10-119.
• The counter wraps to zero and continues counting events. Counting continues as long as the counters are enabled, regardless of any overflows.

The counter always wraps to zero, and overflows after 32 bits of increment, that is, after 2^32 increments from zero. To generate interrupt requests more frequently, software can write to the counters. For example, an interrupt handler might reset the overflowed counter to 0xFFFF0000, so that it generates another overflow interrupt after 16 bits of increment, that is, after a further 0x10000 events.

Note
The mechanism by which an interrupt request from the performance monitors generates an FIQ or IRQ exception is IMPLEMENTATION DEFINED.
The interrupt handler for the counter interrupt must cancel the interrupt by clearing the overflow flag.

C9.5 Interaction with Security Extensions

The performance monitors provide a non-invasive debug feature, and therefore are controlled by the non-invasive debug authentication signals. About non-invasive debug authentication on page C7-2 describes how non-invasive debug interacts with Security Extensions. Performance monitors on page C7-6 describes the behavior of the performance monitors when:
• non-invasive debug is disabled
• the processor is in a mode or state where non-invasive debug is not permitted
• the processor is in Debug state.

Note
Additional controls in the PMCR can also disable the event counters and the PMCCNTR. Disabling the event counters and the PMCCNTR in the PMCR takes precedence over the authentication controls.

The performance monitor registers are Common registers, see Common CP15 registers on page B3-74.
They are always accessible, regardless of the values of the authentication signals and SUNIDEN. Authentication controls whether the counters count events, not whether software can access the performance monitor registers.

C9.6 Interaction with trace

It is IMPLEMENTATION DEFINED whether counter events are exported to a trace macrocell or other external monitoring agents to provide triggering information. The form of the exporting is also IMPLEMENTATION DEFINED. If implemented, this exporting might be enabled as part of the performance monitoring control functionality.

Similarly, ARM recommends that system designers include a mechanism for importing a set of external events to be counted, but such a feature is IMPLEMENTATION DEFINED. When implemented, this feature enables the trace module to pass in events to be counted.

C9.7 Interaction with power saving operations

All counters are subject to any changes in clock frequency, including clock stopping caused by the WFI and WFE instructions.

C9.8 CP15 c9 register map

The performance monitor registers are mapped into part of the CP15 register map. The registers are described in Performance monitor registers on page C10-105.
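The overflow rules in C9.4 can be modelled in C without touching hardware. This sketch keeps a 32-bit counter and a bit mask that stands in for the overflow flags, shows that a preload of 0xFFFF0000 overflows after exactly 0x10000 increments, and models an interrupt handler that clears the flag and re-arms the counter. It assumes, from the PMOVSR register description rather than this chapter, that a flag is cleared by writing 1 to its bit. All names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Software model (no hardware access) of one 32-bit event counter and its
 * bit in the overflow flags, following the rules in C9.4. */
typedef struct {
    uint32_t count;     /* current counter value */
    uint32_t overflow;  /* models the overflow flags, one bit per counter */
    unsigned index;     /* this counter's bit in the flags, 0 to 30 */
} counter_model;

/* Value to write so that the counter overflows after exactly n further
 * increments: 2^32 - n, computed modulo 2^32. n == 0 yields 0, that is,
 * a full 2^32 increments. */
static uint32_t preload_for(uint32_t n)
{
    return 0u - n;
}

/* Count one event. Returns true if this increment overflowed the counter,
 * which is when an interrupt would be requested, if enabled. */
static bool counter_increment(counter_model *c)
{
    c->count += 1u;                     /* wraps to zero and keeps counting */
    if (c->count == 0u) {
        c->overflow |= 1u << c->index;  /* overflow flag set to 1 */
        return true;
    }
    return false;
}

/* Model of an overflow handler: clear the flag to cancel the interrupt,
 * then re-arm the counter for another `period` events. The clear models
 * an assumed write-one-to-clear PMOVSR. */
static void overflow_handler(counter_model *c, uint32_t period)
{
    c->overflow &= ~(1u << c->index);
    c->count = preload_for(period);
}
```

In real handler code, the clear would be an MCR to PMOVSR, and the re-arm an MCR to PMXEVCNTR after selecting the counter with PMSELR; the model only captures the arithmetic of wrapping and preloading.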
Figure C9-1 shows the CP15 c9 encodings for the recommended performance monitor registers, and the reserved encodings for IMPLEMENTATION DEFINED performance monitors:

[Figure C9-1 Recommended CP15 performance monitor registers. All encodings use CRn = c9 and opc1 = 0. CRm = c12, opc2 = 0-5: PMCR, PMCNTENSET, PMCNTENCLR, PMOVSR, PMSWINC, PMSELR. CRm = c13, opc2 = 0-2: PMCCNTR, PMXEVTYPER, PMXEVCNTR. CRm = c14, opc2 = 0-2: PMUSERENR, PMINTENSET, PMINTENCLR. CRm = c15, opc2 = {0-7}: reserved for IMPLEMENTATION DEFINED performance monitors. The figure key marks each register as Read-only, Read/Write, Write-only, or "access depends on the operation".]

Table C9-1 lists the instructions used to access the recommended performance monitor registers.

Table C9-1 Recommended performance monitor registers

Instruction a                Description or notes
MRC p15,0,<Rt>,c9,c12,0      c9, Performance Monitor Control Register (PMCR) on page C10-105.
MCR p15,0,<Rt>,c9,c12,0
MRC p15,0,<Rt>,c9,c12,1      c9, Count Enable Set Register (PMCNTENSET) on page C10-108.
MCR p15,0,<Rt>,c9,c12,1
MRC p15,0,<Rt>,c9,c12,2      c9, Count Enable Clear Register (PMCNTENCLR) on page C10-109.
MCR p15,0,<Rt>,c9,c12,2
MRC p15,0,<Rt>,c9,c12,3      c9, Overflow Flag Status Register (PMOVSR) on page C10-110.
MCR p15,0,<Rt>,c9,c12,3
MRC p15,0,<Rt>,c9,c12,4      UNPREDICTABLE. PMSWINC is a write-only register.
MCR p15,0,<Rt>,c9,c12,4      c9, Software Increment Register (PMSWINC) on page C10-112.
MRC p15,0,<Rt>,c9,c12,5      c9, Event Counter Selection Register (PMSELR) on page C10-113.
MCR p15,0,<Rt>,c9,c12,5
MRC p15,0,<Rt>,c9,c13,0      c9, Cycle Count Register (PMCCNTR) on page C10-114.
MCR p15,0,<Rt>,c9,c13,0
MRC p15,0,<Rt>,c9,c13,1      c9, Event Type Select Register (PMXEVTYPER) on page C10-115.
MCR p15,0,<Rt>,c9,c13,1
MRC p15,0,<Rt>,c9,c13,2      c9, Event Count Register (PMXEVCNTR) on page C10-116.
MCR p15,0,<Rt>,c9,c13,2
MRC p15,0,<Rt>,c9,c14,0      c9, User Enable Register (PMUSERENR) on page C10-117.
MCR p15,0,<Rt>,c9,c14,0
MRC p15,0,<Rt>,c9,c14,1      c9, Interrupt Enable Set Register (PMINTENSET) on page C10-118.
MCR p15,0,<Rt>,c9,c14,1
MRC p15,0,<Rt>,c9,c14,2      c9, Interrupt Enable Clear Register (PMINTENCLR) on page C10-119.
MCR p15,0,<Rt>,c9,c14,2

a. CP15 c9 encodings with CRm == {c12-c14} not listed in the table are reserved. For details of the behavior of accesses to these encodings see Unallocated CP15 encodings on page B3-69.

C9.8.1 Power domains and performance monitor registers reset

For ARMv7 implementations, ARM recommends that the performance monitors are implemented as part of the core power domain, not as part of a separate debug power domain. There is no interface for accessing the performance monitor registers when the core power domain is powered down.

The performance monitor registers must be set to their reset values on a processor reset by nSYSPORESET, nCOREPORESET or nRESET. A debug logic reset by PRESETDBGn does not change the performance monitor registers.

For more information about the reset scheme recommended for a v7 Debug implementation see Recommended reset scheme for v7 Debug on page C6-16.

C9.9 Access permissions

Normally, the performance monitor registers are accessible from privileged modes only. Setting the PMUSERENR.EN flag to 1 permits access from User mode code, for example for instrumentation and profiling purposes, see c9, User Enable Register (PMUSERENR) on page C10-117. However, PMUSERENR does not provide access to the registers that control interrupt generation.
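Table C9-1 and the access rules of C9.9, which Table C9-2 sets out in full, can be captured as a small lookup in C. This sketch records each register's CRm and opc2, returns the outcome of an MRC or MCR from a privileged mode or from User mode, and builds the 32-bit ARM instruction word for the access (A1 encoding with the AL condition). The enum, struct and function names are hypothetical; the outcomes are transcribed from Table C9-2.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef enum { ACC_PROCEED, ACC_UNDEFINED, ACC_UNPREDICTABLE } pmu_access;

/* One row of Table C9-1. CRn is always c9 and opc1 is always 0. */
typedef struct {
    const char *name;
    uint8_t crm, opc2;
    bool user_if_en;  /* User-mode access proceeds when PMUSERENR.EN == 1 */
    bool write_only;  /* MRC is UNPREDICTABLE */
} pmu_reg;

static const pmu_reg pmu_regs[] = {
    {"PMCR", 12, 0, true, false},        {"PMCNTENSET", 12, 1, true, false},
    {"PMCNTENCLR", 12, 2, true, false},  {"PMOVSR", 12, 3, true, false},
    {"PMSWINC", 12, 4, true, true},      {"PMSELR", 12, 5, true, false},
    {"PMCCNTR", 13, 0, true, false},     {"PMXEVTYPER", 13, 1, true, false},
    {"PMXEVCNTR", 13, 2, true, false},   {"PMUSERENR", 14, 0, false, false},
    {"PMINTENSET", 14, 1, false, false}, {"PMINTENCLR", 14, 2, false, false},
};

static const pmu_reg *pmu_find(const char *name)
{
    for (size_t i = 0; i < sizeof pmu_regs / sizeof pmu_regs[0]; i++)
        if (strcmp(pmu_regs[i].name, name) == 0)
            return &pmu_regs[i];
    return NULL;  /* not one of the recommended registers */
}

/* Outcome of an MRC (is_write false) or MCR (is_write true), following
 * Table C9-2. reg == NULL stands for a reserved encoding. */
static pmu_access pmu_check(const pmu_reg *reg, bool is_write,
                            bool user_mode, bool pmuserenr_en)
{
    if (reg == NULL)
        return user_mode ? ACC_UNDEFINED : ACC_UNPREDICTABLE;
    if (strcmp(reg->name, "PMUSERENR") == 0) {
        if (!is_write)
            return ACC_PROCEED;            /* readable from any mode */
        return user_mode ? ACC_UNDEFINED : ACC_PROCEED;
    }
    if (user_mode && !(pmuserenr_en && reg->user_if_en))
        return ACC_UNDEFINED;
    if (reg->write_only && !is_write)
        return ACC_UNPREDICTABLE;          /* MRC of PMSWINC */
    return ACC_PROCEED;
}

/* ARM (A1) encoding of MRC/MCR p15,0,<Rt>,c9,<CRm>,<opc2>, cond = AL. */
static uint32_t pmu_encode(const pmu_reg *reg, bool is_mrc, unsigned rt)
{
    return 0xEE000000u                 /* cond = 1110, bits[27:24] = 1110, opc1 = 0 */
         | ((is_mrc ? 1u : 0u) << 20)  /* 1 = MRC (read), 0 = MCR (write) */
         | (9u << 16)                  /* CRn = c9 */
         | ((rt & 0xFu) << 12)         /* Rt */
         | (15u << 8)                  /* coprocessor p15 */
         | ((uint32_t)(reg->opc2 & 7u) << 5)
         | (1u << 4)
         | (uint32_t)(reg->crm & 0xFu);
}
```

For example, pmu_encode(pmu_find("PMCCNTR"), true, 0) gives 0xEE190F1D, the instruction word for MRC p15,0,r0,c9,c13,0.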
Table C9-2 Performance monitor access permissions

Register      Operation     Access from         Access from User mode a
                            a privileged mode   PMUSERENR.EN == 0   PMUSERENR.EN == 1
PMCR          MRC or MCR    Proceed             UNDEFINED           Proceed
PMCNTENSET    MRC or MCR    Proceed             UNDEFINED           Proceed
PMCNTENCLR    MRC or MCR    Proceed             UNDEFINED           Proceed
PMOVSR        MRC or MCR    Proceed             UNDEFINED           Proceed
PMSWINC       MRC           UNPREDICTABLE       UNDEFINED           UNPREDICTABLE
              MCR           Proceed             UNDEFINED           Proceed
PMSELR        MRC or MCR    Proceed             UNDEFINED           Proceed
PMCCNTR       MRC or MCR    Proceed             UNDEFINED           Proceed
PMXEVTYPER    MRC or MCR    Proceed             UNDEFINED           Proceed
PMXEVCNTR     MRC or MCR    Proceed             UNDEFINED           Proceed
PMUSERENR     MRC           Proceed             Proceed             Proceed
              MCR           Proceed             UNDEFINED           UNDEFINED
PMINTENSET    MRC or MCR    Proceed             UNDEFINED           UNDEFINED
PMINTENCLR    MRC or MCR    Proceed             UNDEFINED           UNDEFINED
Reserved b    MRC or MCR    UNPREDICTABLE       UNDEFINED           UNDEFINED

a. For details of the EN flag see c9, User Enable Register (PMUSERENR) on page C10-117.
b. All the registers marked as reserved in Table C9-1 on page C9-10.

C9.10 Event numbers

The event numbers are described in the following subsections:
• Common feature event numbers
• Implementation defined feature event numbers on page C9-16.

C9.10.1 Common feature event numbers

For the common features, normally the counters must increment only once for each event. Exceptions to this rule are stated in the individual definitions.

In these definitions, the term architecturally executed means that the instruction flow is such that the counted instruction would have been executed in a simple sequential execution model.

Note
An instruction is architecturally executed if the behavior of the program on the processor is consistent with the instruction having been executed on a simple execution model of the architecture. Therefore an instruction that has been executed and retired is defined to be architecturally executed.
In processors that perform speculative execution, an instruction is not architecturally executed if the results of the speculative execution are discarded. Where an instruction has no visible effect, for example a NOP, the point where the instruction is retired is IMPLEMENTATION DEFINED.

The common feature event number assignments are:

0x00 Software increment. The counter is incremented only on writes to the Software Increment Register. For details see c9, Software Increment Register (PMSWINC) on page C10-112.

0x01 Instruction fetch that causes a refill of at least the level of instruction or unified cache closest to the processor. Each instruction fetch that causes a refill from outside the cache is counted. Accesses that do not cause a new cache refill, but are satisfied from refilling data of a previous miss, are not counted. Where an instruction fetch fetches multiple instructions, the fetch counts as a single event. CP15 cache maintenance operations do not count as events. This counter increments on speculative instruction fetches as well as on fetches of instructions that reach execution.

0x02 Instruction fetch that causes a TLB refill of at least the level of TLB closest to the processor. Each instruction fetch that causes an access to a level of the memory system because of a translation table walk, or an access to another level of TLB caching, is counted. CP15 TLB maintenance operations do not count as events. This counter increments on speculative instruction fetches as well as on fetches of instructions that reach execution.

0x03 Memory Read or Write operation that causes a refill of at least the level of data or unified cache closest to the processor. Each memory read or write that causes a refill from outside the cache is counted. Accesses that do not cause a new cache refill, but are satisfied
from refilling data of a previous miss, are not counted. Each access to a cache line that causes a new linefill is counted, including each of the multiple accesses made by load or store multiple instructions such as PUSH and POP. Write-Through writes that hit in the cache do not cause a linefill and so are not counted. CP15 cache maintenance operations do not count as events. This counter increments on speculative memory accesses as well as on memory accesses that are explicitly made by instructions.

0x04 Memory Read or Write operation that causes a cache access to at least the level of data or unified cache closest to the processor. Each access to a cache line is counted, including each of the multiple accesses made by instructions such as LDM or STM. CP15 cache maintenance operations do not count as events. This counter increments on speculative memory accesses as well as on memory accesses that are explicitly made by instructions.

0x05 Memory Read or Write operation that causes a TLB refill of at least the level of TLB closest to the processor. Each memory read or write operation that causes a translation table walk or an access to another level of TLB caching is counted. CP15 TLB maintenance operations do not count as events. This counter increments on speculative memory accesses as well as on memory accesses that are explicitly made by instructions.

0x06 Memory-reading instruction architecturally executed. This counter increments for every instruction that explicitly reads data, including SWP. This counter does not increment for a conditional instruction that fails its condition code check.

0x07 Memory-writing instruction architecturally executed. This counter increments for every instruction that explicitly writes data, including SWP. This counter does not increment for a Store-Exclusive instruction that fails, or for a conditional instruction that fails its condition code check.

0x08 Instruction architecturally executed.
This counter counts all instructions, including conditional instructions that fail their condition code check.

0x09 Exception taken. This counter increments for each exception taken.
Note
This event number counts the processor exceptions described in Exceptions on page B1-30. It does not count floating-point exceptions or ThumbEE null and index checks.

0x0A Exception return architecturally executed. This counts the exception return instructions described in Exception return on page B1-38. This counter does not increment for a conditional instruction that fails its condition code check.

0x0B Instruction that writes to the CONTEXTIDR architecturally executed. This counter does not increment for a conditional instruction that fails its condition code check.

0x0C Software change of PC, except by an exception, architecturally executed. This counter does not increment for a conditional instruction that fails its condition code check.

0x0D Immediate branch architecturally executed: B{L}
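The difference between events 0x06, 0x07 and 0x08 can be checked against a small trace model. This C sketch lists the common event numbers given above as an enum, with hypothetical enumerator names, and counts events 0x06 to 0x08 over a hypothetical record of architecturally executed instructions. It assumes that speculatively executed instructions whose results were discarded have already been removed from the trace.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Common event numbers from C9.10.1 (enumerator names are illustrative). */
enum pmu_common_event {
    EV_SW_INCR = 0x00,          /* software increment */
    EV_L1I_CACHE_REFILL = 0x01, /* instruction cache refill */
    EV_L1I_TLB_REFILL = 0x02,   /* instruction TLB refill */
    EV_L1D_CACHE_REFILL = 0x03, /* data or unified cache refill */
    EV_L1D_CACHE = 0x04,        /* data or unified cache access */
    EV_L1D_TLB_REFILL = 0x05,   /* data TLB refill */
    EV_LD_RETIRED = 0x06,       /* memory-reading instruction executed */
    EV_ST_RETIRED = 0x07,       /* memory-writing instruction executed */
    EV_INST_RETIRED = 0x08,     /* instruction executed */
    EV_EXC_TAKEN = 0x09,        /* exception taken */
    EV_EXC_RETURN = 0x0A,       /* exception return executed */
    EV_CID_WRITE = 0x0B,        /* write to CONTEXTIDR executed */
    EV_PC_WRITE = 0x0C,         /* software change of PC executed */
    EV_BR_IMMED = 0x0D          /* immediate branch executed */
};

/* Hypothetical record of one architecturally executed instruction. */
typedef struct {
    bool cond_passed;   /* false: conditional instruction failed its check */
    bool reads_data;    /* explicitly reads data, for example LDR or SWP */
    bool writes_data;   /* explicitly writes data, for example STR or SWP */
    bool failed_strex;  /* Store-Exclusive that failed */
} insn_record;

typedef struct { uint32_t ld, st, inst; } retired_counts;

/* Count events 0x06, 0x07 and 0x08 over a trace of n instructions. */
static retired_counts count_retired(const insn_record *trace, size_t n)
{
    retired_counts c = {0u, 0u, 0u};
    for (size_t i = 0; i < n; i++) {
        c.inst++;                     /* 0x08 includes failed conditionals */
        if (!trace[i].cond_passed)
            continue;                 /* 0x06 and 0x07 exclude them */
        if (trace[i].reads_data)
            c.ld++;
        if (trace[i].writes_data && !trace[i].failed_strex)
            c.st++;                   /* 0x07 also excludes failed Store-Exclusives */
    }
    return c;
}
```

For a trace of LDR, SWP, a failing STRNE, a failing STREX and an ADD, the sketch counts 5 instructions for event 0x08, 2 memory-reading instructions for 0x06 (LDR and SWP) and 1 memory-writing instruction for 0x07 (SWP).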
