Page Count: 2736

ARM Architecture Reference Manual ARMv7-A and ARMv7-R edition
Contents
Preface
- About this manual
- Using this manual
- Conventions
- Additional reading
  - ARM publications
  - Other publications
- Feedback
  - Feedback on this manual
Part A: Application Level Architecture
A1: Introduction to the ARM Architecture
- A1.1 About the ARM architecture
- A1.2 The instruction sets
  - A1.2.1 Execution environment support
- A1.3 Architecture versions, profiles, and variants
  - A1.3.1 Debug architecture versions
- A1.4 Architecture extensions
  - A1.4.1 Instruction set architecture extensions
  - A1.4.2 Architecture extensions
- A1.5 The ARM memory model
A2: Application Level Programmers’ Model
- A2.1 About the Application level programmers’ model
  - A2.1.1 Instruction sets, arithmetic operations, and register files
- A2.2 ARM core data types and arithmetic
  - A2.2.1 Integer arithmetic
- A2.3 ARM core registers
  - A2.3.1 Writing to the PC
  - A2.3.2 Pseudocode details of operations on ARM core registers
- A2.4 The Application Program Status Register (APSR)
- A2.5 Execution state registers
- A2.6 Advanced SIMD and Floating-point Extensions
- A2.7 Floating-point data types and arithmetic
- A2.8 Polynomial arithmetic over {0, 1}
  - A2.8.1 Pseudocode details of polynomial multiplication
- A2.9 Coprocessor support
- A2.10 Thumb Execution Environment
  - A2.10.1 ThumbEE instructions
  - A2.10.2 ThumbEE configuration
    - Use of HandlerBase
- A2.11 Jazelle direct bytecode execution support
- A2.12 Exceptions, debug events and checks
A3: Application Level Memory Model
- A3.1 Address space
  - A3.1.1 Address space overflow or underflow
- A3.2 Alignment support
- A3.3 Endian support
- A3.4 Synchronization and semaphores
- A3.5 Memory types and attributes and the memory order model
- A3.6 Access rights
- A3.7 Virtual and physical addressing
- A3.8 Memory access order
- A3.9 Caches and memory hierarchy
A4: The Instruction Sets
- A4.1 About the instruction sets
- A4.2 Unified Assembler Language
  - A4.2.1 Conditional instructions
  - A4.2.2 Use of labels in UAL instruction syntax
- A4.3 Branch instructions
- A4.4 Data-processing instructions
- A4.5 Status register access instructions
  - A4.5.1 Banked register access instructions
- A4.6 Load/store instructions
- A4.7 Load/store multiple instructions
  - A4.7.1 Loads to the PC
- A4.8 Miscellaneous instructions
  - A4.8.1 The Yield instruction
- A4.9 Exception-generating and exception-handling instructions
- A4.10 Coprocessor instructions
- A4.11 Advanced SIMD and Floating-point load/store instructions
  - A4.11.1 Element and structure load/store instructions
- A4.12 Advanced SIMD and Floating-point register transfer instructions
- A4.13 Advanced SIMD data-processing instructions
- A4.14 Floating-point data-processing instructions
A5: ARM Instruction Set Encoding
- A5.1 ARM instruction set encoding
- A5.2 Data-processing and miscellaneous instructions
- A5.3 Load/store word and unsigned byte
- A5.4 Media instructions
- A5.5 Branch, branch with link, and block data transfer
- A5.6 Coprocessor instructions, and Supervisor Call
- A5.7 Unconditional instructions
  - A5.7.1 Memory hints, Advanced SIMD instructions, and miscellaneous instructions
A6: Thumb Instruction Set Encoding
- A6.1 Thumb instruction set encoding
- A6.2 16-bit Thumb instruction encoding
- A6.3 32-bit Thumb instruction encoding
A7: Advanced SIMD and Floating-point Instruction Encoding
- A7.1 Overview
  - A7.1.1 Advanced SIMD
  - A7.1.2 Floating-point
- A7.2 Advanced SIMD and Floating-point instruction syntax
- A7.3 Register encoding
  - A7.3.1 Advanced SIMD scalars
- A7.4 Advanced SIMD data-processing instructions
- A7.5 Floating-point data-processing instructions
  - A7.5.1 Operation of modified immediate constants, Floating-point
- A7.6 Extension register load/store instructions
- A7.7 Advanced SIMD element or structure load/store instructions
  - A7.7.1 Advanced SIMD addressing mode
- A7.8 8, 16, and 32-bit transfer between ARM core and extension registers
- A7.9 64-bit transfers between ARM core and extension registers
A8: Instruction Descriptions
- A8.1 Format of instruction descriptions
- A8.2 Standard assembler syntax fields
- A8.3 Conditional execution
  - A8.3.1 Pseudocode details of conditional execution
- A8.4 Shifts applied to a register
- A8.5 Memory accesses
- A8.6 Encoding of lists of ARM core registers
- A8.7 Additional pseudocode support for instruction descriptions
  - A8.7.1 Pseudocode details of coprocessor operations
  - A8.7.2 Calling the supervisor
- A8.8 Alphabetical list of instructions
A9: The ThumbEE Instruction Set
- A9.1 About the ThumbEE instruction set
- A9.2 ThumbEE instruction set encoding
  - A9.2.1 16-bit ThumbEE instructions
- A9.3 Additional instructions in Thumb and ThumbEE instruction sets
  - A9.3.1 ENTERX, LEAVEX
- A9.4 ThumbEE instructions with modified behavior
- A9.5 Additional ThumbEE instructions
Part B: System Level Architecture
B1: System Level Programmers’ Model
- B1.1 About the System level programmers’ model
- B1.2 System level concepts and terminology
  - B1.2.1 Mode, state, and privilege level
  - B1.2.2 Exceptions
    - Terminology for describing exceptions
    - Exceptions, privilege, and security state
- B1.3 ARM processor modes and ARM core registers
- B1.4 Instruction set states
  - B1.4.1 Exceptions and instruction set state
  - B1.4.2 Unimplemented instruction sets
- B1.5 The Security Extensions
- B1.6 The Large Physical Address Extension
- B1.7 The Virtualization Extensions
  - B1.7.1 Impact of the Virtualization Extensions on the modes and exception model
    - The virtual exceptions
- B1.8 Exception handling
- B1.9 Exception descriptions
- B1.10 Coprocessors and system control
  - B1.10.1 CP14 and CP15 system control registers
    - Access to CP14 and CP15 registers
  - B1.10.2 Access controls on CP0 to CP13
- B1.11 Advanced SIMD and floating-point support
- B1.12 Thumb Execution Environment
  - B1.12.1 ThumbEE and the Security Extensions and Virtualization Extensions
  - B1.12.2 Aborts, exceptions, and checks
- B1.13 Jazelle direct bytecode execution
- B1.14 Traps to the hypervisor
B2: Common Memory System Architecture Features
- B2.1 About the memory system architecture
- B2.2 Caches and branch predictors
- B2.3 IMPLEMENTATION DEFINED memory system features
  - B2.3.1 ARMv7 CP15 register support for IMPLEMENTATION DEFINED features
- B2.4 Pseudocode details of general memory system operations
B3: Virtual Memory System Architecture (VMSA)
- B3.1 About the VMSA
- B3.2 The effects of disabling MMUs on VMSA behavior
- B3.3 Translation tables
- B3.4 Secure and Non-secure address spaces
- B3.5 Short-descriptor translation table format
- B3.6 Long-descriptor translation table format
- B3.7 Memory access control
- B3.8 Memory region attributes
- B3.9 Translation Lookaside Buffers (TLBs)
- B3.10 TLB maintenance requirements
- B3.11 Caches in a VMSA implementation
- B3.12 VMSA memory aborts
- B3.13 Exception reporting in a VMSA implementation
- B3.14 Virtual Address to Physical Address translation operations
- B3.15 About the system control registers for VMSA
- B3.16 Organization of the CP14 registers in a VMSA implementation
- B3.17 Organization of the CP15 registers in a VMSA implementation
- B3.18 Functional grouping of VMSAv7 system control registers
- B3.19 Pseudocode details of VMSA memory system operations
B4: System Control Registers in a VMSA implementation
- B4.1 VMSA System control registers descriptions, in register order
- B4.2 VMSA system control operations described by function
B5: Protected Memory System Architecture (PMSA)
- B5.1 About the PMSA
- B5.2 Memory access control
  - B5.2.1 Access permissions
    - The XN (Execute-never) attribute and instruction fetching
- B5.3 Memory region attributes
- B5.4 PMSA memory aborts
- B5.5 Exception reporting in a PMSA implementation
- B5.6 About the system control registers for PMSA
- B5.7 Organization of the CP14 registers in a PMSA implementation
- B5.8 Organization of the CP15 registers in a PMSA implementation
- B5.9 Functional grouping of PMSAv7 system control registers
- B5.10 Pseudocode details of PMSA memory system operations
B6: System Control Registers in a PMSA implementation
- B6.1 PMSA System control registers descriptions, in register order
- B6.2 PMSA system control operations described by function
B7: The CPUID Identification Scheme
- B7.1 Introduction to the CPUID scheme
- B7.2 The CPUID registers
  - B7.2.1 Organization of the CPUID registers
    - General properties of the CPUID registers
  - B7.2.2 About the Instruction Set Attribute registers
    - Instruction set descriptions in the CPUID scheme
    - Summary of Instruction Set Attribute register attribute fields
- B7.3 Advanced SIMD and Floating-point Extension feature identification registers
  - B7.3.1 About the Media and VFP Feature registers
B8: The Generic Timer
- B8.1 About the Generic Timer
- B8.2 Generic Timer registers summary
  - B8.2.1 Status of the CNTVOFF register
B9: System Instructions
- B9.1 General restrictions on system instructions
  - B9.1.1 Restrictions on exception return instructions
  - B9.1.2 Restrictions on updates to the CPSR.M field
- B9.2 Encoding and use of Banked register transfer instructions
- B9.3 Alphabetical list of instructions
Part C: Debug Architecture
C1: Introduction to the ARM Debug Architecture
- C1.1 Scope of part C of this manual
  - C1.1.1 Support for Secure User halting debug
- C1.2 About the ARM Debug architecture
- C1.3 Security Extensions and debug
- C1.4 Register interfaces
C2: Invasive Debug Authentication
- C2.1 About invasive debug authentication
- C2.2 Invasive debug with no Security Extensions
- C2.3 Invasive debug with the Security Extensions
- C2.4 Invasive debug authentication security considerations
C3: Debug Events
- C3.1 About debug events
- C3.2 BKPT instruction debug events
- C3.3 Breakpoint debug events
- C3.4 Watchpoint debug events
- C3.5 Vector catch debug events
- C3.6 Halting debug events
- C3.7 Generation of debug events
- C3.8 Debug event prioritization
- C3.9 Pseudocode details of Software debug events
C4: Debug Exceptions
- C4.1 About debug exceptions
  - C4.1.1 Debug exception on BKPT instruction, Breakpoint, or Vector catch debug events
  - C4.1.2 Debug exception on Watchpoint debug event
- C4.2 Avoiding debug exceptions that might cause UNPREDICTABLE behavior
  - C4.2.1 Debug exceptions in exception handlers
  - C4.2.2 Debug exceptions in debug monitors
C5: Debug State
- C5.1 About Debug state
- C5.2 Entering Debug state
- C5.3 Executing instructions in Debug state
- C5.4 Behavior of non-invasive debug in Debug state
- C5.5 Exceptions in Debug state
  - C5.5.1 Handling of synchronous Data Aborts in Non-secure state, Virtualization Extensions
  - C5.5.2 Effect of asynchronous aborts when the processor is in Debug state
- C5.6 Memory system behavior in Debug state
- C5.7 Exiting Debug state
  - C5.7.1 CPSR and PC values on exit from Debug state
  - C5.7.2 Effect of asynchronous aborts on exiting Debug state
C6: Debug Register Interfaces
- C6.1 About the debug register interfaces
  - C6.1.1 Processor interfaces to the debug registers
  - C6.1.2 External debug interface to the debug registers
- C6.2 Synchronization of debug register updates
  - C6.2.1 Synchronization of accesses to the Debug Communications Channel
  - C6.2.2 Synchronization requirements for memory-mapped register interfaces
- C6.3 Access permissions
- C6.4 The CP14 debug register interface
- C6.5 The memory-mapped and recommended external debug interfaces
- C6.6 Summary of the v7 Debug register interfaces
- C6.7 Summary of the v7.1 Debug register interfaces
C7: Debug Reset and Powerdown Support
- C7.1 Debug guidelines for systems with energy management capability
- C7.2 Power domains and debug
- C7.3 The OS Save and Restore mechanism
- C7.4 Reset and debug
C8: The Debug Communications Channel and Instruction Transfer Register
- C8.1 About the DCC and DBGITR
- C8.2 Operation of the DCC and Instruction Transfer Register
  - C8.2.1 General operation of the DCC and Instruction Transfer Register
  - C8.2.2 Operation of the External DCC access modes
- C8.3 Behavior of accesses to the DCC registers and DBGITR
- C8.4 Synchronization of accesses to the DCC and the DBGITR
C9: Non-invasive Debug Authentication
- C9.1 About non-invasive debug authentication
- C9.2 Non-invasive debug authentication
- C9.3 Effects of non-invasive debug authentication
  - C9.3.1 Trace
C10: Sample-based Profiling
- C10.1 Sample-based profiling
  - C10.1.1 The implemented Sample-based profiling registers
  - C10.1.2 Reads of the Program Counter Sampling Register
C11: The Debug Registers
- C11.1 About the debug registers
- C11.2 Debug register summary
- C11.3 Debug identification registers
  - C11.3.1 About the Debug identification registers
- C11.4 Control and status registers
  - C11.4.1 About the Debug control and status registers
- C11.5 Instruction and data transfer registers
  - C11.5.1 About the Debug instruction transfer and data transfer registers
- C11.6 Software debug event registers
  - C11.6.1 About the Software debug event registers
- C11.7 Sample-based profiling registers
  - C11.7.1 About the sample-based profiling registers
- C11.8 OS Save and Restore registers
  - C11.8.1 About the OS Save and Restore registers
- C11.9 Memory system control registers
  - C11.9.1 About the Debug memory system control registers
- C11.10 Management registers
- C11.11 Register descriptions, in register order
C12: The Performance Monitors Extension
- C12.1 About the Performance Monitors
- C12.2 Accuracy of the Performance Monitors
- C12.3 Behavior on overflow
  - C12.3.1 Generating overflow interrupt requests
  - C12.3.2 Pseudocode details of overflow interrupt requests
- C12.4 Effect of the Security Extensions and Virtualization Extensions
  - C12.4.1 Interaction with Security Extensions
  - C12.4.2 Interaction with Virtualization Extensions
- C12.5 Event filtering, PMUv2
  - C12.5.1 Accuracy of event filtering
    - Exception-related events
    - Software increment events
  - C12.5.2 Pseudocode details of event filtering
- C12.6 Counter enables
- C12.7 Counter access
  - C12.7.1 PMNx event counters
  - C12.7.2 CCNT cycle counter
- C12.8 Event numbers and mnemonics
- C12.9 Performance Monitors registers
  - C12.9.1 CP15 c9 performance monitors registers
    - Power domains and Performance Monitors registers reset
    - Access permissions
Part D: Appendixes
D1: Recommended External Debug Interface
- D1.1 About the recommended external debug interface
- D1.2 Authentication signals
  - D1.2.1 Changing the authentication signals
- D1.3 Run-control and cross-triggering signals
- D1.4 Recommended debug slave port
  - D1.4.1 PADDRDBG
  - D1.4.2 PSLVERRDBG
- D1.5 Other debug signals
D2: Recommended Memory-mapped and External Debug Interfaces for the Performance Monitors
- D2.1 About the memory-mapped views of the Performance Monitors registers
- D2.2 PMU register descriptions for memory-mapped register views
D3: Recommendations for Performance Monitors Event Numbers for IMPLEMENTATION DEFINED Events
- D3.1 ARM recommendations for IMPLEMENTATION DEFINED event numbers
  - D3.1.1 Effect of selecting an unused or reserved event number
D4: Example OS Save and Restore Sequences for External Debug Over Powerdown
- D4.1 Example OS Save and Restore sequences for v7 Debug
  - D4.1.1 v7 Debug OS Save and Restore sequences using memory-mapped interface, v7 Debug
  - D4.1.2 v7 Debug OS Save and Restore sequences using the CP14 interface, v7 Debug
- D4.2 Example OS Save and Restore sequences for v7.1 Debug
D5: System Level Implementation of the Generic Timer
- D5.1 About the Generic Timer specification
  - D5.1.1 The memory-mapped view of the Generic Timer
- D5.2 Memory-mapped counter module
  - D5.2.1 Control of counter operating frequency and increment
    - The frequency modes table
    - Changing the system counter frequency and increment
- D5.3 Counter module control and status register summary
- D5.4 About the memory-mapped view of the counter and timer
- D5.5 The CNTBaseN and CNTPL0BaseN frames
- D5.6 The CNTCTLBase frame
- D5.7 System level Generic Timer register descriptions, in register order
- D5.8 Providing a complete set of counter and timer features
- D5.9 Gray-count scheme for timer distribution scheme
D6: Common VFP Subarchitecture Specification
- D6.1 Scope of this appendix
- D6.2 Introduction to the Common VFP subarchitecture
- D6.3 Exception processing
- D6.4 Support code requirements
- D6.5 Context switching
- D6.6 Subarchitecture additions to the Floating-point Extension system registers
- D6.7 Earlier versions of the Common VFP subarchitecture
  - D6.7.1 Differences between version 2 and version 3 of the Common VFP subarchitecture
  - D6.7.2 Differences between version 1 and version 2 of the Common VFP subarchitecture
    - Subarchitecture v1 exception handling when FPSCR.IXE is set to 1
D7: Barrier Litmus Tests
- D7.1 Introduction
- D7.2 Simple ordering and barrier cases
- D7.3 Exclusive accesses and barriers
- D7.4 Using a mailbox to send an interrupt
- D7.5 Cache and TLB maintenance operations and barriers
D8: Legacy Instruction Mnemonics
- D8.1 Thumb instruction mnemonics
- D8.2 Other UAL mnemonic changes
- D8.3 Pre-UAL pseudo-instruction NOP
D9: Deprecated and Obsolete Features
- D9.1 Deprecated features
- D9.2 Obsolete features
- D9.3 Use of the SP as a general-purpose register
- D9.4 Explicit use of the PC in ARM instructions
- D9.5 Deprecated Thumb instructions
D10: Fast Context Switch Extension (FCSE)
- D10.1 About the FCSE
- D10.2 Modified virtual addresses
- D10.3 Debug and trace
  - D10.3.1 Addresses used for the generation of debug events
D11: VFP Vector Operation Support
- D11.1 About VFP vector mode
  - D11.1.1 Affected instructions
- D11.2 Vector length and stride control
- D11.3 VFP register banks
- D11.4 VFP instruction type selection
D12: ARMv6 Differences
- D12.1 Introduction to ARMv6
- D12.2 Application level register support
  - D12.2.1 APSR support
  - D12.2.2 Instruction set state
    - Interworking
    - BL and BLX (immediate) instructions, before ARMv6T2
- D12.3 Application level memory support
- D12.4 Instruction set support
- D12.5 System level register support
- D12.6 System level memory model
- D12.7 System Control coprocessor, CP15, support
D13: v6 Debug and v6.1 Debug Differences
- D13.1 About v6 Debug and v6.1 Debug
  - D13.1.1 Major differences between the ARMv6 and ARMv7 Debug architectures
- D13.2 Invasive debug authentication, v6 Debug and v6.1 Debug
- D13.3 Debug events, v6 Debug and v6.1 Debug
  - D13.3.1 Software debug events
  - D13.3.2 Halting debug events
- D13.4 Debug exceptions, v6 Debug and v6.1 Debug
- D13.5 Debug state, v6 Debug and v6.1 Debug
- D13.6 Debug register interfaces, v6 Debug and v6.1 Debug
  - D13.6.1 v6 Debug and v6.1 Debug register visibility
    - ARMv6 debug features not defined by the Debug architecture
  - D13.6.2 v6 Debug and v6.1 Debug register accesses in the CP14 interface
    - Accesses to reserved registers
- D13.7 Reset and powerdown support
- D13.8 The Debug Communications Channel and Instruction Transfer Register
- D13.9 Non-invasive debug authentication, v6 Debug and v6.1 Debug
  - D13.9.1 ARMv6 non-invasive debug authentication
- D13.10 Sample-based profiling, v6 Debug and v6.1 Debug
- D13.11 The debug registers, v6 Debug and v6.1 Debug
- D13.12 Performance monitors, v6 Debug and v6.1 Debug
D14: Secure User Halting Debug
- D14.1 About Secure User halting debug
- D14.2 Invasive debug authentication in an implementation that supports SUHD
  - D14.2.1 Effect of SUHD support on invasive debug authentication
- D14.3 Effects of SUHD on Debug state
D15: ARMv4 and ARMv5 Differences
- D15.1 Introduction to ARMv4 and ARMv5
  - D15.1.1 Debug
  - D15.1.2 ARMv6 and ARMv7
- D15.2 Application level register support
  - D15.2.1 APSR support
  - D15.2.2 Instruction set state
    - Interworking
- D15.3 Application level memory support
- D15.4 Instruction set support
- D15.5 System level register support
- D15.6 System level memory model
- D15.7 System Control coprocessor, CP15 support
D16: Pseudocode Definition
- D16.1 About the ARMv7 pseudocode
  - D16.1.1 General limitations of ARMv7 pseudocode
- D16.2 Pseudocode for instruction descriptions
  - D16.2.1 Instruction encoding diagrams and instruction pseudocode
  - D16.2.2 Limitations of the instruction pseudocode
- D16.3 Data types
- D16.4 Expressions
- D16.5 Operators and built-in functions
- D16.6 Statements and program structure
- D16.7 Miscellaneous helper procedures and functions
D17: Pseudocode Index
- D17.1 Pseudocode operators and keywords
- D17.2 Pseudocode functions and procedures
D18: Registers Index
- D18.1 Alphabetic index of ARMv7 registers, by register name
- D18.2 Full registers index
Glossary

ARM DDI 0406C.c (ID051414)

ARM® Architecture Reference Manual

ARMv7-A and ARMv7-R edition

Non-Confidential ID051414


Release Information

The following changes have been made to this document.

Note that issue C.a, the first publication of issue C of this manual, was originally identified as issue C.

From ARMv7, the ARM® architecture defines different architectural profiles, and this edition of this manual describes only the A and R profiles. For details of the documentation of the ARMv7-M profile, see Additional reading on page xxiii. Before ARMv7 there was only a single ARM Architecture Reference Manual, with document number DDI 0100. The first issue of this was in February 1996, and the final issue, issue I, was in July 2005. For more information, see Additional reading on page xxiii.

Proprietary Notice

This ARM Architecture Reference Manual is protected by copyright and the practice or implementation of the information herein may be protected by one or more patents or pending applications. No part of this ARM Architecture Reference Manual may be reproduced in any form by any means without the express prior written permission of ARM. No license, express or implied, by estoppel or otherwise to any intellectual property rights is granted by this ARM Architecture Reference Manual.

Your access to the information in this ARM Architecture Reference Manual is conditional upon your acceptance that you will not use or permit others to use the information for the purposes of determining whether implementations of the ARM architecture infringe any third party patents.

This ARM Architecture Reference Manual is provided “as is”. ARM makes no representations or warranties, either express or implied, including but not limited to, warranties of merchantability, fitness for a particular purpose, or non-infringement, that the content of this ARM Architecture Reference Manual is suitable for any particular purpose or that any practice or implementation of the contents of the ARM Architecture Reference Manual will not infringe any third party patents, copyrights, trade secrets, or other rights.

This ARM Architecture Reference Manual may include technical inaccuracies or typographical errors.

To the extent not prohibited by law, in no event will ARM be liable for any damages, including without limitation any direct loss, lost revenue, lost profits or data, special, indirect, consequential, incidental or punitive damages, however caused and regardless of the theory of liability, arising out of or related to any furnishing, practicing, modifying or any use of this ARM Architecture Reference Manual, even if ARM has been advised of the possibility of such damages.

Words and logos marked with ® or TM are registered trademarks or trademarks of ARM Limited, except as otherwise stated below in this proprietary notice. Other brands and names mentioned herein may be the trademarks of their respective owners.

110 Fulbourn Road, Cambridge, England CB1 9NJ

This document consists solely of commercial items. You shall be responsible for ensuring that any use, duplication or disclosure of this document complies fully with any relevant export laws and regulations to assure that this document or any portion thereof is not exported, directly or indirectly, in violation of such export laws.

This document is Non-Confidential, but any disclosure by you is subject to you providing notice to, and the acceptance by the recipient of, the conditions set out above.

In this document, where the term ARM is used to refer to the company, it means “ARM or any of its subsidiaries as appropriate”.

Change History

| Date | Issue | Confidentiality | Change |
|---|---|---|---|
| 05 April 2007 | A | Non-Confidential | New edition for ARMv7-A and ARMv7-R architecture profiles. Document number changed from ARM DDI 0100 to ARM DDI 0406 and contents restructured. |
| 29 April 2008 | B | Non-Confidential | Addition of the VFP Half-precision and Multiprocessing Extensions, and many clarifications and enhancements. |
| 23 November 2011 | C (C.a) | Non-Confidential | Addition of the Virtualization Extensions, Large Physical Address Extension, Generic Timer Extension, and other additions. Many other clarifications and enhancements. |
| 24 July 2012 | C.b | Non-Confidential | Errata release for issue C.a. |
| 20 May 2014 | C.c | Non-Confidential | Second errata release for issue C.a. |


Note

The term ARM can refer to versions of the ARM architecture; for example, ARMv6 refers to version 6 of the ARM architecture. The context makes it clear when the term is used in this way.


Contents

ARM Architecture Reference Manual ARMv7-A and ARMv7-R edition

Preface

About this manual ..................................................................................................... xiv

Using this manual ...................................................................................................... xvi

Conventions .............................................................................................................. xxi

Additional reading ................................................................................................... xxiii

Feedback ................................................................................................................ xxiv

Part A Application Level Architecture

Chapter A1 Introduction to the ARM Architecture

A1.1 About the ARM architecture ................................................................................ A1-28

A1.2 The instruction sets ............................................................................................. A1-29

A1.3 Architecture versions, profiles, and variants ........................................................ A1-30

A1.4 Architecture extensions ....................................................................................... A1-32

A1.5 The ARM memory model .................................................................................... A1-35

Chapter A2 Application Level Programmers’ Model

A2.1 About the Application level programmers’ model ................................................ A2-38

A2.2 ARM core data types and arithmetic ................................................................... A2-40

A2.3 ARM core registers ............................................................................................. A2-45

A2.4 The Application Program Status Register (APSR) .............................................. A2-49

A2.5 Execution state registers ..................................................................................... A2-50

A2.6 Advanced SIMD and Floating-point Extensions .................................................. A2-54

A2.7 Floating-point data types and arithmetic ............................................................. A2-63

A2.8 Polynomial arithmetic over {0, 1} ......................................................................... A2-93

A2.9 Coprocessor support ........................................................................................... A2-94


A2.10 Thumb Execution Environment ........................................................................... A2-95

A2.11 Jazelle direct bytecode execution support .......................................................... A2-97

A2.12 Exceptions, debug events and checks .............................................................. A2-102

Chapter A3 Application Level Memory Model

A3.1 Address space .................................................................................................. A3-106

A3.2 Alignment support ............................................................................................. A3-108

A3.3 Endian support .................................................................................................. A3-110

A3.4 Synchronization and semaphores ..................................................................... A3-114

A3.5 Memory types and attributes and the memory order model .............................. A3-126

A3.6 Access rights ..................................................................................................... A3-142

A3.7 Virtual and physical addressing ........................................................................ A3-145

A3.8 Memory access order ........................................................................................ A3-146

A3.9 Caches and memory hierarchy ......................................................................... A3-156

Chapter A4 The Instruction Sets

A4.1 About the instruction sets .................................................................................. A4-160

A4.2 Unified Assembler Language ............................................................................ A4-162

A4.3 Branch instructions ............................................................................................ A4-164

A4.4 Data-processing instructions ............................................................................. A4-165

A4.5 Status register access instructions .................................................................... A4-174

A4.6 Load/store instructions ...................................................................................... A4-175

A4.7 Load/store multiple instructions ......................................................................... A4-177

A4.8 Miscellaneous instructions ................................................................................ A4-178

A4.9 Exception-generating and exception-handling instructions ............................... A4-179

A4.10 Coprocessor instructions ................................................................................... A4-180

A4.11 Advanced SIMD and Floating-point load/store instructions ............................... A4-181

A4.12 Advanced SIMD and Floating-point register transfer instructions ..................... A4-183

A4.13 Advanced SIMD data-processing instructions ................................................... A4-184

A4.14 Floating-point data-processing instructions ....................................................... A4-191

Chapter A5 ARM Instruction Set Encoding

A5.1 ARM instruction set encoding ........................................................................... A5-194

A5.2 Data-processing and miscellaneous instructions .............................................. A5-196

A5.3 Load/store word and unsigned byte .................................................................. A5-208

A5.4 Media instructions ............................................................................................. A5-209

A5.5 Branch, branch with link, and block data transfer .............................................. A5-214

A5.6 Coprocessor instructions, and Supervisor Call ................................................. A5-215

A5.7 Unconditional instructions ................................................................................. A5-216

Chapter A6 Thumb Instruction Set Encoding

A6.1 Thumb instruction set encoding ........................................................................ A6-220

A6.2 16-bit Thumb instruction encoding .................................................................... A6-223

A6.3 32-bit Thumb instruction encoding .................................................................... A6-230

Chapter A7 Advanced SIMD and Floating-point Instruction Encoding

A7.1 Overview ........................................................................................................... A7-254

A7.2 Advanced SIMD and Floating-point instruction syntax ...................................... A7-255

A7.3 Register encoding ............................................................................................. A7-259

A7.4 Advanced SIMD data-processing instructions ................................................... A7-261

A7.5 Floating-point data-processing instructions ....................................................... A7-272

A7.6 Extension register load/store instructions .......................................................... A7-274

A7.7 Advanced SIMD element or structure load/store instructions ........................... A7-275

A7.8 8, 16, and 32-bit transfer between ARM core and extension registers ............. A7-278

A7.9 64-bit transfers between ARM core and extension registers ............................. A7-279

Chapter A8 Instruction Descriptions

A8.1 Format of instruction descriptions ..................................................................... A8-282

ID051414 Non-Confidential

A8.2 Standard assembler syntax fields ..................................................................... A8-287

A8.3 Conditional execution ........................................................................................ A8-288

A8.4 Shifts applied to a register ................................................................................. A8-291

A8.5 Memory accesses ............................................................................................. A8-294

A8.6 Encoding of lists of ARM core registers ............................................................ A8-295

A8.7 Additional pseudocode support for instruction descriptions .............................. A8-296

A8.8 Alphabetical list of instructions .......................................................................... A8-300

Chapter A9 The ThumbEE Instruction Set

A9.1 About the ThumbEE instruction set ................................................................. A9-1112

A9.2 ThumbEE instruction set encoding ................................................................. A9-1115

A9.3 Additional instructions in Thumb and ThumbEE instruction sets .................... A9-1116

A9.4 ThumbEE instructions with modified behavior ................................................ A9-1117

A9.5 Additional ThumbEE instructions .................................................................... A9-1123

Part B System Level Architecture

Chapter B1 System Level Programmers’ Model

B1.1 About the System level programmers’ model .................................................. B1-1134

B1.2 System level concepts and terminology .......................................................... B1-1135

B1.3 ARM processor modes and ARM core registers ............................................. B1-1139

B1.4 Instruction set states ....................................................................................... B1-1155

B1.5 The Security Extensions ................................................................................. B1-1156

B1.6 The Large Physical Address Extension ........................................................... B1-1160

B1.7 The Virtualization Extensions .......................................................................... B1-1162

B1.8 Exception handling .......................................................................................... B1-1165

B1.9 Exception descriptions .................................................................................... B1-1205

B1.10 Coprocessors and system control ................................................................... B1-1226

B1.11 Advanced SIMD and floating-point support ..................................................... B1-1229

B1.12 Thumb Execution Environment ....................................................................... B1-1240

B1.13 Jazelle direct bytecode execution ................................................................... B1-1241

B1.14 Traps to the hypervisor ................................................................................... B1-1248

Chapter B2 Common Memory System Architecture Features

B2.1 About the memory system architecture ........................................................... B2-1264

B2.2 Caches and branch predictors ........................................................................ B2-1266

B2.3 IMPLEMENTATION DEFINED memory system features ............................... B2-1292

B2.4 Pseudocode details of general memory system operations ............................ B2-1293

Chapter B3 Virtual Memory System Architecture (VMSA)

B3.1 About the VMSA .............................................................................................. B3-1308

B3.2 The effects of disabling MMUs on VMSA behavior ......................................... B3-1314

B3.3 Translation tables ............................................................................................ B3-1318

B3.4 Secure and Non-secure address spaces ........................................................ B3-1323

B3.5 Short-descriptor translation table format ......................................................... B3-1324

B3.6 Long-descriptor translation table format .......................................................... B3-1338

B3.7 Memory access control ................................................................................... B3-1356

B3.8 Memory region attributes ................................................................................ B3-1366

B3.9 Translation Lookaside Buffers (TLBs) ............................................................. B3-1378

B3.10 TLB maintenance requirements ...................................................................... B3-1381

B3.11 Caches in a VMSA implementation ................................................................. B3-1392

B3.12 VMSA memory aborts ..................................................................................... B3-1395

B3.13 Exception reporting in a VMSA implementation .............................................. B3-1409

B3.14 Virtual Address to Physical Address translation operations ............................ B3-1438

B3.15 About the system control registers for VMSA .................................................. B3-1444

B3.16 Organization of the CP14 registers in a VMSA implementation ...................... B3-1468

B3.17 Organization of the CP15 registers in a VMSA implementation ...................... B3-1469


B3.18 Functional grouping of VMSAv7 system control registers ............................... B3-1491

B3.19 Pseudocode details of VMSA memory system operations .............................. B3-1503

Chapter B4 System Control Registers in a VMSA implementation

B4.1 VMSA System control registers descriptions, in register order ....................... B4-1522

B4.2 VMSA system control operations described by function ................................. B4-1743

Chapter B5 Protected Memory System Architecture (PMSA)

B5.1 About the PMSA .............................................................................................. B5-1756

B5.2 Memory access control ................................................................................... B5-1761

B5.3 Memory region attributes ................................................................................ B5-1762

B5.4 PMSA memory aborts ..................................................................................... B5-1765

B5.5 Exception reporting in a PMSA implementation .............................................. B5-1769

B5.6 About the system control registers for PMSA .................................................. B5-1774

B5.7 Organization of the CP14 registers in a PMSA implementation ...................... B5-1786

B5.8 Organization of the CP15 registers in a PMSA implementation ...................... B5-1787

B5.9 Functional grouping of PMSAv7 system control registers ............................... B5-1799

B5.10 Pseudocode details of PMSA memory system operations .............................. B5-1806

Chapter B6 System Control Registers in a PMSA implementation

B6.1 PMSA System control registers descriptions, in register order ....................... B6-1810

B6.2 PMSA system control operations described by function ................................. B6-1943

Chapter B7 The CPUID Identification Scheme

B7.1 Introduction to the CPUID scheme .................................................................. B7-1950

B7.2 The CPUID registers ....................................................................................... B7-1951

B7.3 Advanced SIMD and Floating-point Extension feature identification registers B7-1957

Chapter B8 The Generic Timer

B8.1 About the Generic Timer ................................................................................. B8-1960

B8.2 Generic Timer registers summary ................................................................... B8-1969

Chapter B9 System Instructions

B9.1 General restrictions on system instructions ..................................................... B9-1972

B9.2 Encoding and use of Banked register transfer instructions ............................. B9-1973

B9.3 Alphabetical list of instructions ........................................................................ B9-1978

Part C Debug Architecture

Chapter C1 Introduction to the ARM Debug Architecture

C1.1 Scope of part C of this manual ........................................................................ C1-2022

C1.2 About the ARM Debug architecture ................................................................ C1-2023

C1.3 Security Extensions and debug ....................................................................... C1-2027

C1.4 Register interfaces .......................................................................................... C1-2028

Chapter C2 Invasive Debug Authentication

C2.1 About invasive debug authentication .............................................................. C2-2030

C2.2 Invasive debug with no Security Extensions ................................................... C2-2031

C2.3 Invasive debug with the Security Extensions .................................................. C2-2033

C2.4 Invasive debug authentication security considerations ................................... C2-2035

Chapter C3 Debug Events

C3.1 About debug events ........................................................................................ C3-2038

C3.2 BKPT instruction debug events ....................................................................... C3-2040

C3.3 Breakpoint debug events ................................................................................ C3-2041

C3.4 Watchpoint debug events ................................................................................ C3-2059


C3.5 Vector catch debug events .............................................................................. C3-2067

C3.6 Halting debug events ...................................................................................... C3-2075

C3.7 Generation of debug events ............................................................................ C3-2076

C3.8 Debug event prioritization ............................................................................... C3-2078

C3.9 Pseudocode details of Software debug events ............................................... C3-2080

Chapter C4 Debug Exceptions

C4.1 About debug exceptions .................................................................................. C4-2090

C4.2 Avoiding debug exceptions that might cause UNPREDICTABLE behavior .... C4-2092

Chapter C5 Debug State

C5.1 About Debug state .......................................................................................... C5-2094

C5.2 Entering Debug state ...................................................................................... C5-2095

C5.3 Executing instructions in Debug state ............................................................. C5-2098

C5.4 Behavior of non-invasive debug in Debug state .............................................. C5-2106

C5.5 Exceptions in Debug state .............................................................................. C5-2107

C5.6 Memory system behavior in Debug state ........................................................ C5-2111

C5.7 Exiting Debug state ......................................................................................... C5-2112

Chapter C6 Debug Register Interfaces

C6.1 About the debug register interfaces ................................................................ C6-2116

C6.2 Synchronization of debug register updates ..................................................... C6-2117

C6.3 Access permissions ........................................................................................ C6-2119

C6.4 The CP14 debug register interface ................................................................. C6-2123

C6.5 The memory-mapped and recommended external debug interfaces .............. C6-2128

C6.6 Summary of the v7 Debug register interfaces ................................................. C6-2130

C6.7 Summary of the v7.1 Debug register interfaces .............................................. C6-2139

Chapter C7 Debug Reset and Powerdown Support

C7.1 Debug guidelines for systems with energy management capability ................ C7-2150

C7.2 Power domains and debug ............................................................................. C7-2151

C7.3 The OS Save and Restore mechanism ........................................................... C7-2154

C7.4 Reset and debug ............................................................................................. C7-2162

Chapter C8 The Debug Communications Channel and Instruction Transfer Register

C8.1 About the DCC and DBGITR .......................................................................... C8-2166

C8.2 Operation of the DCC and Instruction Transfer Register ................................ C8-2169

C8.3 Behavior of accesses to the DCC registers and DBGITR ............................... C8-2173

C8.4 Synchronization of accesses to the DCC and the DBGITR ............................ C8-2178

Chapter C9 Non-invasive Debug Authentication

C9.1 About non-invasive debug authentication ....................................................... C9-2184

C9.2 Non-invasive debug authentication ................................................................. C9-2185

C9.3 Effects of non-invasive debug authentication .................................................. C9-2187

Chapter C10 Sample-based Profiling

C10.1 Sample-based profiling ................................................................................. C10-2190

Chapter C11 The Debug Registers

C11.1 About the debug registers ............................................................................. C11-2194

C11.2 Debug register summary ............................................................................... C11-2195

C11.3 Debug identification registers ........................................................................ C11-2198

C11.4 Control and status registers .......................................................................... C11-2199

C11.5 Instruction and data transfer registers ........................................................... C11-2200

C11.6 Software debug event registers .................................................................... C11-2201

C11.7 Sample-based profiling registers ................................................................... C11-2202

C11.8 OS Save and Restore registers .................................................................... C11-2203


C11.9 Memory system control registers .................................................................. C11-2204

C11.10 Management registers .................................................................................. C11-2205

C11.11 Register descriptions, in register order .......................................................... C11-2211

Chapter C12 The Performance Monitors Extension

C12.1 About the Performance Monitors .................................................................. C12-2302

C12.2 Accuracy of the Performance Monitors ......................................................... C12-2306

C12.3 Behavior on overflow ..................................................................................... C12-2307

C12.4 Effect of the Security Extensions and Virtualization Extensions ................... C12-2309

C12.5 Event filtering, PMUv2 ................................................................................... C12-2311

C12.6 Counter enables ............................................................................................ C12-2313

C12.7 Counter access ............................................................................................. C12-2314

C12.8 Event numbers and mnemonics .................................................................... C12-2315

C12.9 Performance Monitors registers .................................................................... C12-2328

Part D Appendixes

Appendix A Recommended External Debug Interface

D1.1 About the recommended external debug interface ......................................... D1-2338

D1.2 Authentication signals ..................................................................................... D1-2340

D1.3 Run-control and cross-triggering signals ......................................................... D1-2342

D1.4 Recommended debug slave port .................................................................... D1-2346

D1.5 Other debug signals ........................................................................................ D1-2348

Appendix B Recommended Memory-mapped and External Debug Interfaces for the Performance Monitors

D2.1 About the memory-mapped views of the Performance Monitors registers ...... D2-2354

D2.2 PMU register descriptions for memory-mapped register views ....................... D2-2363

Appendix C Recommendations for Performance Monitors Event Numbers for IMPLEMENTATION DEFINED Events

D3.1 ARM recommendations for IMPLEMENTATION DEFINED event numbers ... D3-2378

Appendix D Example OS Save and Restore Sequences for External Debug Over Powerdown

D4.1 Example OS Save and Restore sequences for v7 Debug .............................. D4-2390

D4.2 Example OS Save and Restore sequences for v7.1 Debug ........................... D4-2394

Appendix E System Level Implementation of the Generic Timer

D5.1 About the Generic Timer specification ............................................................ D5-2398

D5.2 Memory-mapped counter module ................................................................... D5-2399

D5.3 Counter module control and status register summary ..................................... D5-2402

D5.4 About the memory-mapped view of the counter and timer .............................. D5-2404

D5.5 The CNTBaseN and CNTPL0BaseN frames .................................................. D5-2405

D5.6 The CNTCTLBase frame ................................................................................. D5-2407

D5.7 System level Generic Timer register descriptions, in register order ................ D5-2408

D5.8 Providing a complete set of counter and timer features .................................. D5-2425

D5.9 Gray-count scheme for timer distribution scheme ........................................... D5-2427

Appendix F Common VFP Subarchitecture Specification

D6.1 Scope of this appendix .................................................................................... D6-2431

D6.2 Introduction to the Common VFP subarchitecture .......................................... D6-2432

D6.3 Exception processing ...................................................................................... D6-2434

D6.4 Support code requirements ............................................................................. D6-2438

D6.5 Context switching ............................................................................................ D6-2440

D6.6 Subarchitecture additions to the Floating-point Extension system registers ... D6-2441


D6.7 Earlier versions of the Common VFP subarchitecture .................................... D6-2448

Appendix G Barrier Litmus Tests

D7.1 Introduction ..................................................................................................... D7-2450

D7.2 Simple ordering and barrier cases .................................................................. D7-2453

D7.3 Exclusive accesses and barriers ..................................................................... D7-2460

D7.4 Using a mailbox to send an interrupt ............................................................... D7-2462

D7.5 Cache and TLB maintenance operations and barriers .................................... D7-2463

Appendix H Legacy Instruction Mnemonics

D8.1 Thumb instruction mnemonics ........................................................................ D8-2470

D8.2 Other UAL mnemonic changes ....................................................................... D8-2471

D8.3 Pre-UAL pseudo-instruction NOP ................................................................... D8-2474

Appendix I Deprecated and Obsolete Features

D9.1 Deprecated features ........................................................................................ D9-2476

D9.2 Obsolete features ............................................................................................ D9-2485

D9.3 Use of the SP as a general-purpose register .................................................. D9-2486

D9.4 Explicit use of the PC in ARM instructions ...................................................... D9-2487

D9.5 Deprecated Thumb instructions ...................................................................... D9-2488

Appendix J Fast Context Switch Extension (FCSE)

D10.1 About the FCSE ............................................................................................ D10-2490

D10.2 Modified virtual addresses ............................................................................ D10-2491

D10.3 Debug and trace ............................................................................................ D10-2493

Appendix K VFP Vector Operation Support

D11.1 About VFP vector mode ................................................................................ D11-2496

D11.2 Vector length and stride control .................................................................... D11-2497

D11.3 VFP register banks ........................................................................................ D11-2498

D11.4 VFP instruction type selection ....................................................................... D11-2499

Appendix L ARMv6 Differences

D12.1 Introduction to ARMv6 ................................................................................... D12-2502

D12.2 Application level register support .................................................................. D12-2503

D12.3 Application level memory support ................................................................. D12-2506

D12.4 Instruction set support ................................................................................... D12-2510

D12.5 System level register support ........................................................................ D12-2515

D12.6 System level memory model ......................................................................... D12-2518

D12.7 System Control coprocessor, CP15, support ................................................ D12-2525

Appendix M v6 Debug and v6.1 Debug Differences

D13.1 About v6 Debug and v6.1 Debug .................................................................. D13-2550

D13.2 Invasive debug authentication, v6 Debug and v6.1 Debug ........................... D13-2551

D13.3 Debug events, v6 Debug and v6.1 Debug .................................................... D13-2552

D13.4 Debug exceptions, v6 Debug and v6.1 Debug .............................................. D13-2556

D13.5 Debug state, v6 Debug and v6.1 Debug ....................................................... D13-2557

D13.6 Debug register interfaces, v6 Debug and v6.1 Debug .................................. D13-2561

D13.7 Reset and powerdown support ..................................................................... D13-2564

D13.8 The Debug Communications Channel and Instruction Transfer Register ..... D13-2565

D13.9 Non-invasive debug authentication, v6 Debug and v6.1 Debug ................... D13-2566

D13.10 Sample-based profiling, v6 Debug and v6.1 Debug ...................................... D13-2568

D13.11 The debug registers, v6 Debug and v6.1 Debug .......................................... D13-2569

D13.12 Performance monitors, v6 Debug and v6.1 Debug ....................................... D13-2580

Appendix N Secure User Halting Debug

D14.1 About Secure User halting debug ................................................................. D14-2582


D14.2 Invasive debug authentication in an implementation that supports SUHD .... D14-2583

D14.3 Effects of SUHD on Debug state ................................................................... D14-2584

Appendix O ARMv4 and ARMv5 Differences

D15.1 Introduction to ARMv4 and ARMv5 ............................................................... D15-2590

D15.2 Application level register support .................................................................. D15-2591

D15.3 Application level memory support ................................................................. D15-2592

D15.4 Instruction set support ................................................................................... D15-2597

D15.5 System level register support ........................................................................ D15-2603

D15.6 System level memory model ......................................................................... D15-2606

D15.7 System Control coprocessor, CP15 support ................................................. D15-2614

Appendix P Pseudocode Definition

D16.1 About the ARMv7 pseudocode ..................................................................... D16-2644

D16.2 Pseudocode for instruction descriptions ........................................................ D16-2645

D16.3 Data types ..................................................................................................... D16-2647

D16.4 Expressions ................................................................................................... D16-2651

D16.5 Operators and built-in functions .................................................................... D16-2653

D16.6 Statements and program structure ................................................................ D16-2658

D16.7 Miscellaneous helper procedures and functions ........................................... D16-2662

Appendix Q Pseudocode Index

D17.1 Pseudocode operators and keywords ........................................................... D17-2668

D17.2 Pseudocode functions and procedures ......................................................... D17-2671

Appendix R Registers Index

D18.1 Alphabetic index of ARMv7 registers, by register name ................................ D18-2688

D18.2 Full registers index ........................................................................................ D18-2699

Glossary


Preface

This preface introduces the ARM® Architecture Reference Manual, ARM®v7-A and ARM®v7-R edition. It contains the following sections:

• About this manual on page xiv

• Using this manual on page xvi

• Conventions on page xxi

• Additional reading on page xxiii

• Feedback on page xxiv.


About this manual

This manual describes the A and R profiles of the ARM® architecture v7, ARMv7. It includes descriptions of:

• The processor instruction sets:

— the original ARM® instruction set

— the high code density Thumb® instruction set

— the ThumbEE instruction set, that includes specific support for Just-In-Time (JIT) or Ahead-Of-Time (AOT) compilation.

• The modes and states that determine how a processor operates, including the current execution privilege and security.

• The exception model.

• The memory model, that defines memory ordering and memory management:

— the ARMv7-A architecture profile defines a Virtual Memory System Architecture (VMSA)

— the ARMv7-R architecture profile defines a Protected Memory System Architecture (PMSA).

• The programmers’ model, and its use of a coprocessor interface to access system control registers that control most processor and memory system features.

• The OPTIONAL Floating-point (VFP) Extension, that provides high-performance floating-point instructions that support:

— single-precision and double-precision operations

— conversions between double-precision, single-precision, and half-precision floating-point values.

• The OPTIONAL Advanced SIMD Extension, that provides high-performance integer and single-precision floating-point vector operations.

• The OPTIONAL Security Extensions, that facilitate the development of secure applications.

• The OPTIONAL Virtualization Extensions, that support the virtualization of Non-secure operation.

• The Debug architecture, that provides software access to debug features in the processor.
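The Floating-point Extension item above mentions conversions involving half-precision values. To make that format concrete, the following portable C sketch (not a model of the hardware conversion instructions) decodes an IEEE 754 half-precision bit pattern, with 1 sign bit, 5 exponent bits biased by 15, and 10 fraction bits, into a single-precision float. The function name `half_to_float` is hypothetical, and the sketch assumes the IEEE half-precision format rather than the ARM alternative half-precision format selected by FPSCR.AHP, which has no infinities or NaNs.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: decode an IEEE 754 half-precision bit pattern into a
   float. Assumes the IEEE format only; the ARM alternative half-precision
   format (FPSCR.AHP == 1) is NOT handled. */
static float half_to_float(uint16_t h)
{
    uint32_t sign = (uint32_t)(h >> 15) << 31;
    int32_t  e    = (h >> 10) & 0x1F;   /* 5-bit biased exponent (bias 15) */
    uint32_t frac = h & 0x3FFu;         /* 10-bit fraction */
    uint32_t bits;
    float out;

    if (e == 0x1F) {
        /* Infinity (frac == 0) or NaN (frac != 0): all-ones exponent */
        bits = sign | 0x7F800000u | (frac << 13);
    } else if (e != 0) {
        /* Normal number: rebias the exponent from 15 to 127 */
        bits = sign | ((uint32_t)(e - 15 + 127) << 23) | (frac << 13);
    } else if (frac == 0) {
        /* Signed zero */
        bits = sign;
    } else {
        /* Subnormal half: every such value is a normal single-precision
           number, so shift until the leading 1 reaches bit 10 */
        e = -14;
        while ((frac & 0x400u) == 0) {
            frac <<= 1;
            e--;
        }
        frac &= 0x3FFu;                 /* drop the now-implicit leading 1 */
        bits = sign | ((uint32_t)(e + 127) << 23) | (frac << 13);
    }
    memcpy(&out, &bits, sizeof out);    /* reinterpret the 32-bit pattern */
    return out;
}
```

For example, `half_to_float(0x3C00)` yields 1.0f and `half_to_float(0x7BFF)` yields 65504.0f, the largest finite IEEE half-precision value. Widening is always exact, because every half-precision value is representable in single precision; only the opposite direction needs rounding.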

Note

ARMv7 introduces the architecture profiles. A separate Architecture Reference Manual describes the third profile, the Microcontroller profile, ARMv7-M. For more information see Architecture versions, profiles, and variants on page A1-30.

This manual gives the assembler syntax for the instructions it describes, so that instructions can be specified in textual form. However, this manual is not a tutorial for ARM assembler language, and it describes ARM assembler language only at a very basic level. To make effective use of ARM assembler language, read the documentation supplied with the assembler being used.

This manual is organized into parts:

Part A Describes the application level view of the architecture. It describes the application level view of the programmers’ model and the memory model. It also describes the precise effects of each instruction in User mode, the normal operating mode, including any restrictions on its use. This information is of primary importance to authors and users of compilers, assemblers, and other programs that generate ARM machine code. Software execution in User mode is at the PL0 privilege level, also described as unprivileged.

Note

User mode is the only mode where software execution is unprivileged.

Preface

About this manual

ID051414 Non-Confidential

Part B Describes the system level view of the architecture. It gives details of system registers, most of which are not accessible from PL0, and the system level view of the memory model. It also gives full details of the effects of instructions executed with some level of privilege, where these are different from their effects in unprivileged execution.

Part C Describes the Debug architecture. This is an extension to the ARM architecture that provides configuration, breakpoint and watchpoint support, and a Debug Communications Channel (DCC) to a debug host.

Appendixes Provide additional information that is not part of the ARMv7 architectural requirements, including descriptions of:

• features that are recommended but not required

• differences in previous versions of the architecture.
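The Part A note that User mode is the only unprivileged mode can be illustrated with a minimal C sketch. It assumes the ARMv7 encoding of the mode field, CPSR.M[4:0], in which User mode is 0b10000; that encoding table is given in Part B rather than in this section, and the names arm_mode and is_unprivileged are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed ARMv7 encodings of the mode field, CPSR.M[4:0] (see Part B). */
enum arm_mode {
    MODE_USR = 0x10, /* 0b10000: User mode, PL0 (unprivileged) */
    MODE_FIQ = 0x11, /* 0b10001: FIQ */
    MODE_IRQ = 0x12, /* 0b10010: IRQ */
    MODE_SVC = 0x13, /* 0b10011: Supervisor */
    MODE_MON = 0x16, /* 0b10110: Monitor, Security Extensions only */
    MODE_ABT = 0x17, /* 0b10111: Abort */
    MODE_HYP = 0x1A, /* 0b11010: Hyp, Virtualization Extensions only */
    MODE_UND = 0x1B, /* 0b11011: Undefined */
    MODE_SYS = 0x1F  /* 0b11111: System */
};

/* Hypothetical helper: true if the mode field of a program status value
   selects User mode, the only mode in which execution is unprivileged. */
bool is_unprivileged(uint32_t psr) {
    return (psr & 0x1Fu) == (uint32_t)MODE_USR;
}
```

Every other mode value in the table executes with some level of privilege, so a single comparison against the User mode encoding is sufficient for this check.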

Using this manual

The information in this manual is organized into parts, as described in this section.

Part A, Application Level Architecture

Part A describes the application level view of the architecture. It contains the following chapters:

Chapter A1 Introduction to the ARM Architecture

Gives an overview of the ARM architecture, and the ARM and Thumb instruction sets.

Chapter A2 Application Level Programmers’ Model

Describes the application level view of the ARM programmers’ model, including the application level view of the Advanced SIMD and Floating-point Extensions. It describes the types of values that ARM instructions operate on, the ARM core registers that contain those values, and the Application Program Status Register.

Chapter A3 Application Level Memory Model

Describes the application level view of the memory model, including the ARM memory types and attributes, and memory access control.

Chapter A4 The Instruction Sets

Describes the range of instructions available in the ARM, Thumb, Advanced SIMD, and VFP instruction sets. It also contains some details of instruction operation that are common to several instructions.

Chapter A5 ARM Instruction Set Encoding

Describes the encoding of the ARM instruction set.

Chapter A6 Thumb Instruction Set Encoding

Describes the encoding of the Thumb instruction set.

Chapter A7 Advanced SIMD and Floating-point Instruction Encoding

Describes the encoding of the Advanced SIMD and Floating-point Extension (VFP) instruction sets.

Chapter A8 Instruction Descriptions

Gives a full description of every instruction available in the Thumb, ARM, Advanced SIMD, and Floating-point Extension instruction sets, with the exception of information only relevant to execution with some level of privilege.

Chapter A9 The ThumbEE Instruction Set

Gives a full description of the Thumb Execution Environment variant of the Thumb instruction set, that is, of the ThumbEE instruction set.

Part B, System Level Architecture

Part B describes the system level view of the architecture. It contains the following chapters:

Chapter B1 System Level Programmers’ Model

Describes the system level view of the programmers’ model.

Chapter B2 Common Memory System Architecture Features

Describes the system level view of the memory model features that are common to all memory systems.

Chapter B3 Virtual Memory System Architecture (VMSA)

Describes the system level view of the Virtual Memory System Architecture (VMSA) that is part of all ARMv7-A implementations. This chapter includes a description of the organization and general properties of the system control registers in a VMSA implementation.

Chapter B4 System Control Registers in a VMSA implementation

Describes all of the system control registers in a VMSA implementation, including the registers that are part of the OPTIONAL extensions to a VMSA implementation. The registers are described in alphabetical order.

Chapter B5 Protected Memory System Architecture (PMSA)

Describes the system level view of the Protected Memory System Architecture (PMSA) that is part of all ARMv7-R implementations. This chapter includes a description of the organization and general properties of the system control registers in a PMSA implementation.

Chapter B6 System Control Registers in a PMSA implementation

Describes all of the system control registers in a PMSA implementation, including the registers that are part of the OPTIONAL extensions to a PMSA implementation. The registers are described in alphabetical order.

Chapter B7 The CPUID Identification Scheme

Describes the CPUID scheme. This provides registers that identify the architecture version and many features of the processor implementation. This chapter also describes the registers that identify the implemented Advanced SIMD and VFP features, if any.

Chapter B8 The Generic Timer

Describes the OPTIONAL Generic Timer architecture extension.

Chapter B9 System Instructions

Provides detailed reference information about system instructions, and more information about instructions that behave differently when executed with some level of privilege.

Part C, Debug Architecture

Part C describes the Debug architecture. It contains the following chapters:

Chapter C1 Introduction to the ARM Debug Architecture

Introduces the Debug architecture, defining the scope of this part of the manual.

Chapter C2 Invasive Debug Authentication

Describes the authentication of invasive debug.

Chapter C3 Debug Events

Describes the debug events.

Chapter C4 Debug Exceptions

Describes the debug exceptions that handle debug events when the processor is configured for Monitor debug-mode.

Chapter C5 Debug State

Describes Debug state that is entered if a debug event occurs when the processor is configured for Halting debug-mode.

Chapter C6 Debug Register Interfaces

Describes the permitted debug register interfaces and the options for their implementation.

Chapter C7 Debug Reset and Powerdown Support

Describes the reset and powerdown support in the Debug architecture, including support for debug over powerdown.

Chapter C8 The Debug Communications Channel and Instruction Transfer Register

Describes the Debug Communications Channel (DCC) and Instruction Transfer Register (ITR), and how an external debugger uses these features to communicate with the debug logic.

Chapter C9 Non-invasive Debug Authentication

Describes the authentication of non-invasive debug.

Chapter C10 Sample-based Profiling

Describes sample-based profiling, that provides sampling of the program counter.

Chapter C11 The Debug Registers

Describes the debug registers.

Chapter C12 The Performance Monitors Extension

Describes the OPTIONAL Performance Monitors Extension.

Part D, Appendixes

This manual contains the following appendixes:

Appendix D1 Recommended External Debug Interface

Describes the recommended external interface to the ARM debug architecture.

Note

This description is not part of the ARM architecture specification. It is included here as supplementary information, for the convenience of developers and users who might require this information.

Appendix D2 Recommended Memory-mapped and External Debug Interfaces for the Performance Monitors

Describes the recommended external interfaces to the Performance Monitors Extension.

Note

This description is not part of the ARM architecture specification. It is included here as supplementary information, for the convenience of developers and users who might require this information.

Appendix D3 Recommendations for Performance Monitors Event Numbers for IMPLEMENTATION DEFINED Events

Gives the ARM recommendations for the use of the event numbers in the IMPLEMENTATION DEFINED event number space.

Note

This description is not part of the ARM architecture specification. It is included here as supplementary information, for the convenience of developers and users who might require this information.

Appendix D4 Example OS Save and Restore Sequences for External Debug Over Powerdown

Gives software examples that perform the OS Save and Restore sequences, for v7 Debug and v7.1 Debug implementations.

Note

Chapter C7 Debug Reset and Powerdown Support describes the OS Save and Restore mechanism, for both v7 Debug and v7.1 Debug.

Appendix D5 System Level Implementation of the Generic Timer

Contains the ARM Generic Timer architecture specification for the memory-mapped interface to the Generic Timer.

Note

This description is not part of the ARM architecture specification. It is included here as supplementary information, for the convenience of developers and users who might require this information.

Appendix D6 Common VFP Subarchitecture Specification

Defines version 2 of the Common VFP Subarchitecture.

Note

This specification is not part of the ARM architecture specification. This sub-architectural information is included here as supplementary information, for the convenience of developers and users who might require this information.

Appendix D7 Barrier Litmus Tests

Gives examples of the use of the barrier instructions provided by the ARMv7 architecture.

Note

These examples are not part of the ARM architecture specification. They are included here as supplementary information, for the convenience of developers and users who might require this information.

Appendix D8 Legacy Instruction Mnemonics

Describes the legacy mnemonics and their Unified Assembler Language equivalents.

Appendix D9 Deprecated and Obsolete Features

Lists the deprecated architectural features, with references to their descriptions in parts A to C of the manual.

Appendix D10 Fast Context Switch Extension (FCSE)

Describes the Fast Context Switch Extension (FCSE). See the appendix for information about the status of this in different versions of the ARM architecture.

Appendix D11 VFP Vector Operation Support

Describes the VFP vector operations. ARM deprecates the use of these operations.

Appendix D12 ARMv6 Differences

Describes how the ARMv6 architecture differs from the description given in parts A and B of this manual.

Appendix D13 v6 Debug and v6.1 Debug Differences

Describes how the two debug architectures for ARMv6 differ from the description given in part C of this manual.

Appendix D14 Secure User Halting Debug

Describes the Secure User halting debug (SUHD) feature.

Appendix D15 ARMv4 and ARMv5 Differences

Describes how the ARMv4 and ARMv5 architectures differ from the description given in parts A and B of this manual.

Appendix D16 Pseudocode Definition

The formal definition of the pseudocode used in this manual.

Appendix D17 Pseudocode Index

Gives indexes to definitions of pseudocode operators, keywords, functions, and procedures.

Appendix D18 Registers Index

Gives indexes to register descriptions in the manual.

Conventions

The following sections describe conventions that this book can use:

• Typographic conventions

• Signals

• Numbers on page xxii

• Pseudocode descriptions on page xxii

• Assembler syntax descriptions on page xxii.

Typographic conventions

The typographic conventions are:

italic Introduces special terminology, and denotes citations.

bold Denotes signal names, and is used for terms in descriptive lists, where appropriate.

monospace Used for assembler syntax descriptions, pseudocode, and source code examples.

Also used in the main text for instruction mnemonics and for references to other items appearing in assembler syntax descriptions, pseudocode, and source code examples.

SMALL CAPITALS Used in body text for a few terms that have specific technical meanings, and are defined in the Glossary.

Colored text Indicates a link. This can be:

• a URL, for example http://infocenter.arm.com

• a cross-reference, that includes the page number of the referenced information if it is not on the current page, for example, Pseudocode descriptions on page xxii

• a link, to a chapter or appendix, or to a glossary entry, or to the section of the document that defines the colored term, for example Simple sequential execution or SCTLR.

Note

Many links are to a register or instruction definition. Remember that:

• many system control registers are defined both in Chapter B4 System Control Registers in a VMSA implementation and in Chapter B6 System Control Registers in a PMSA implementation

• many instructions are defined in multiple forms, and in some cases the ARM encodings of an instruction are defined separately to the Thumb encodings.

Ensure that any linked definition you refer to is appropriate to your context.

Signals

In general this specification does not define processor signals, but it does include some signal examples and recommendations. The signal conventions are:

Signal level The level of an asserted signal depends on whether the signal is active-HIGH or active-LOW. Asserted means:

• HIGH for active-HIGH signals

• LOW for active-LOW signals.

Lower-case n At the start or end of a signal name denotes an active-LOW signal.
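The signal conventions above can be expressed as two small C helpers. This is a sketch only: the function names are hypothetical, and the name test assumes that, apart from the n marker, signal names are written in upper case, as in example names such as nRESET and PRESETn.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: a lower-case 'n' at the start or end of a signal
   name marks an active-LOW signal. Assumes the rest of the name is in
   upper case, so no other lower-case 'n' can appear at either end. */
bool is_active_low(const char *name) {
    size_t len = strlen(name);
    if (len == 0)
        return false;
    return name[0] == 'n' || name[len - 1] == 'n';
}

/* A signal is asserted when its level is HIGH (non-zero) for an
   active-HIGH signal, or LOW (zero) for an active-LOW signal. */
bool is_asserted(const char *name, int level) {
    bool high = (level != 0);
    return is_active_low(name) ? !high : high;
}
```

For example, under these assumptions nRESET is asserted when its level is LOW, while an active-HIGH signal such as DBGEN is asserted when its level is HIGH.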

Numbers

Numbers are normally written in decimal. Binary numbers are preceded by 0b, and hexadecimal numbers by 0x. In both cases, the prefix and the associated value are written in a monospace font, for example 0xFFFF0000.
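As an illustration of the number conventions, the following C sketch converts a literal written in the style of this manual to an integer: decimal by default, binary after a 0b prefix, and hexadecimal after a 0x prefix. The function name parse_manual_number is hypothetical, and the sketch does not detect values that overflow 64 bits.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical parser for numbers written in this manual's convention.
   Returns false for an empty value, a bare prefix, or a digit that is
   not valid in the selected base. Overflow is not checked. */
bool parse_manual_number(const char *s, uint64_t *out) {
    unsigned base = 10;
    if (s[0] == '0' && (s[1] == 'b' || s[1] == 'x')) {
        base = (s[1] == 'b') ? 2u : 16u;
        s += 2;                               /* skip the 0b or 0x prefix */
    }
    if (*s == '\0')
        return false;                         /* no digits after the prefix */
    uint64_t value = 0;
    for (; *s != '\0'; s++) {
        unsigned digit;
        if (*s >= '0' && *s <= '9')
            digit = (unsigned)(*s - '0');
        else if (*s >= 'A' && *s <= 'F')
            digit = (unsigned)(*s - 'A' + 10);
        else if (*s >= 'a' && *s <= 'f')
            digit = (unsigned)(*s - 'a' + 10);
        else
            return false;
        if (digit >= base)
            return false;                     /* for example, '2' in a binary value */
        value = value * base + digit;
    }
    *out = value;
    return true;
}
```

For example, parsing "0xFFFF0000" yields 4294901760, and parsing "0b1010" yields 10.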

Pseudocode descriptions

This manual uses a form of pseudocode to provide precise descriptions of the specified functionality. This pseudocode is written in a monospace font, and is described in Appendix D16 Pseudocode Definition.

Assembler syntax descriptions

This manual contains numerous syntax descriptions for assembler instructions and for components of assembler instructions. These are shown in a monospace font, and use the conventions described in Assembler syntax on page A8-283.

Additional reading

This section lists relevant publications from ARM and third parties.

See the Infocenter, http://infocenter.arm.com, for access to ARM documentation.

ARM publications

• ARM® Debug Interface v5 Architecture Specification (ARM IHI 0031).

• ARM®v7-M Architecture Reference Manual (ARM DDI 0403).

• ARM CoreSight™ Architecture Specification (ARM IHI 0029).

• ARM® Architecture Reference Manual (ARM DDI 0100I).

Note

— Issue I of the ARM Architecture Reference Manual (DDI 0100I) was issued in July 2005 and describes the first version of the ARMv6 architecture, and all previous architecture versions.

— Addison-Wesley Professional publish ARM Architecture Reference Manual, Second Edition (December 27, 2000). The contents of this are identical to issue E of the ARM Architecture Reference Manual (DDI 0100E). It describes ARMv5TE and earlier versions of the ARM architecture, and is superseded by DDI 0100I.

• ARM Embedded Trace Macrocell Architecture Specification (ARM IHI 0014).

• ARM CoreSight™ Program Flow Trace Architecture Specification (ARM IHI 0035).

• ARM® Generic Interrupt Controller Architecture Specification (ARM IHI 0048).

Other publications

The following books are referred to in this manual, or provide more information:

• IEEE Std 1596.5-1993, IEEE Standard for Shared-Data Formats Optimized for Scalable Coherent Interface (SCI) Processors, ISBN 1-55937-354-7.

• IEEE Std 1149.1-2001, IEEE Standard Test Access Port and Boundary Scan Architecture (JTAG).

• ANSI/IEEE Std 754-2008, and ANSI/IEEE Std 754-1985, IEEE Standard for Binary Floating-Point Arithmetic. See also Floating-point standards, and terminology on page A2-55.

• JEDEC Solid State Technology Association, Standard Manufacturer’s Identification Code, JEP106.

• Tim Lindholm and Frank Yellin, The Java Virtual Machine Specification, Second Edition, Addison Wesley, ISBN: 0-201-43294-3.

• Kourosh Gharachorloo, Memory Consistency Models for Shared-Memory Multiprocessors, 1995, Stanford University Technical Report CSL-TR-95-685.

Feedback

ARM welcomes feedback on its documentation.

Feedback on this manual

If you have comments on the content of this manual, send e-mail to errata@arm.com. Give:

• the title

• the number, ARM DDI 0406C.c

• the page numbers to which your comments apply

• a concise explanation of your comments.

ARM also welcomes general suggestions for additions and improvements.

Part A

Application Level Architecture

Chapter A1

Introduction to the ARM Architecture

This chapter introduces the ARM architecture and contains the following sections:

• About the ARM architecture on page A1-28

• The instruction sets on page A1-29

• Architecture versions, profiles, and variants on page A1-30

• Architecture extensions on page A1-32

• The ARM memory model on page A1-35.

A1.1 About the ARM architecture

The ARM architecture supports implementations across a wide range of performance points. The architectural simplicity of ARM processors leads to very small implementations, and small implementations mean devices can have very low power consumption. Implementation size, performance, and very low power consumption are key attributes of the ARM architecture.

The ARM architecture is a Reduced Instruction Set Computer (RISC) architecture, as it incorporates these RISC architecture features:

• a large uniform register file

• a load/store architecture, where data-processing operations only operate on register contents, not directly on memory contents

• simple addressing modes, with all load/store addresses being determined from register contents and instruction fields only.

In addition, the ARM architecture provides:

• instructions that combine a shift with an arithmetic or logical operation

• auto-increment and auto-decrement addressing modes to optimize program loops

• Load and Store Multiple instructions to maximize data throughput

• conditional execution of many instructions to maximize execution throughput.

These enhancements to a basic RISC architecture mean ARM processors achieve a good balance of high performance, small program size, low power consumption, and small silicon area.
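Conditional execution depends on the N, Z, C, and V condition flags held in the APSR. The C sketch below models how a 4-bit condition field is tested against those flags, following the standard ARM condition-code table (EQ is 0b0000, AL is 0b1110) that Part A defines in detail. The type flags_t and the function condition_passed are hypothetical names, not architectural definitions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical container for the APSR condition flags. */
typedef struct { bool n, z, c, v; } flags_t;

/* Return true if an instruction with condition field cond (0 to 15)
   would execute, given the current flags. */
bool condition_passed(uint8_t cond, flags_t f) {
    bool result;
    switch ((cond >> 1) & 0x7u) {                 /* cond[3:1] selects the base test */
    case 0:  result = f.z; break;                 /* EQ / NE */
    case 1:  result = f.c; break;                 /* CS / CC */
    case 2:  result = f.n; break;                 /* MI / PL */
    case 3:  result = f.v; break;                 /* VS / VC */
    case 4:  result = f.c && !f.z; break;         /* HI / LS */
    case 5:  result = (f.n == f.v); break;        /* GE / LT */
    case 6:  result = (f.n == f.v) && !f.z; break;/* GT / LE */
    default: result = true; break;                /* AL */
    }
    /* cond[0] == 1 inverts the base test, except for cond == 0b1111. */
    if ((cond & 1u) != 0 && cond != 0xF)
        result = !result;
    return result;
}
```

For example, with Z set, condition 0b0000 (EQ) passes and 0b0001 (NE) fails. Decoding cond[3:1] first and then inverting on cond[0] reflects the way the condition codes are arranged in complementary pairs.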

This Architecture Reference Manual defines a set of behaviors to which an implementation must conform, and a set of rules for software to use the implementation. It does not describe how to build an implementation.

Except where the architecture specifies differently, the programmer-visible behavior of an implementation must be the same as a simple sequential execution of the program. This programmer-visible behavior does not include the execution time of the program.

The ARM architecture includes definitions of:

• An associated debug architecture, see Debug architecture versions on page A1-31 and Part C of this manual.

• Associated trace architectures, that define trace macrocells that implementers can implement with the associated processor. For more information see the ARM Embedded Trace Macrocell Architecture Specification and the ARM CoreSight™ Program Flow Trace Architecture Specification.

A1.2 The instruction sets

The ARM instruction set is a set of 32-bit instructions providing comprehensive data-processing and control functions.

The Thumb instruction set was developed as a 16-bit instruction set with a subset of the functionality of the ARM instruction set. It provides significantly improved code density, at a cost of some reduction in performance. A processor executing Thumb instructions can change to executing ARM instructions for performance critical segments, in particular for handling interrupts.

ARMv6T2 introduced Thumb-2 technology. This technology extends the original Thumb instruction set with many 32-bit instructions. The range of 32-bit Thumb instructions included in ARMv6T2 permits Thumb code to achieve performance similar to ARM code, with code density better than that of earlier Thumb code.

From ARMv6T2, the ARM and Thumb instruction sets provide almost identical functionality. For more information, see Chapter A4 The Instruction Sets.
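Because Thumb code mixes 16-bit and 32-bit instructions, a decoder must determine the length of each instruction from its first halfword. The C sketch below assumes the ARMv7 Thumb rule, defined in the Thumb encoding chapter rather than in this section, that a first halfword with bits [15:11] equal to 0b11101, 0b11110, or 0b11111 starts a 32-bit instruction. The function name thumb_insn_size is hypothetical.

```c
#include <stdint.h>

/* Return the size in bytes (2 or 4) of the Thumb instruction whose first
   halfword is hw. Bits [15:11] of 0b11101, 0b11110, or 0b11111 mark the
   first halfword of a 32-bit instruction; any other value is a complete
   16-bit instruction. */
unsigned thumb_insn_size(uint16_t hw) {
    unsigned top5 = (hw >> 11) & 0x1Fu;
    return (top5 == 0x1Du || top5 == 0x1Eu || top5 == 0x1Fu) ? 4u : 2u;
}
```

For example, 0x4770 (BX LR) is a 16-bit instruction, while 0xE92D is the first halfword of a 32-bit STMDB instruction. Checking only the first halfword means a decoder never needs to read past the current instruction to find its length.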

A1.2.1 Execution environment support

Two additional instruction sets support execution environments:

• The architecture can provide hardware acceleration of Java bytecodes. For more information, see:

— Jazelle direct bytecode execution support on page A2-97, for application level information

— Jazelle direct bytecode execution on page B1-1241, for system level information.

The Virtualization Extensions do not support hardware acceleration of Java bytecodes. That is, they support only a trivial implementation of the Jazelle® extension.

• The ThumbEE instruction set is a variant of the Thumb instruction set that minimizes the code size overhead of a Just-In-Time (JIT) or Ahead-Of-Time (AOT) compiler. JIT and AOT compilers convert execution environment source code to a native executable. For more information, see:

— Thumb Execution Environment on page A2-95, for application level information

— Thumb Execution Environment on page B1-1240, for system level information.

From the publication of issue C.a of this manual, ARM deprecates any use of the ThumbEE instruction set.

A1.3 Architecture versions, profiles, and variants

The ARM architecture has evolved significantly since its introduction, and ARM continues to develop it. Seven major versions of the architecture have been defined to date, denoted by the version numbers 1 to 7. Of these, the first three versions are now obsolete.

ARMv7 provides three profiles:

ARMv7-A Application profile, described in this manual:

• Implements a traditional ARM architecture with multiple modes.

• Supports a Virtual Memory System Architecture (VMSA) based on a Memory Management Unit (MMU). An ARMv7-A implementation can be called a VMSAv7 implementation.

• Supports the ARM and Thumb instruction sets.

ARMv7-R Real-time profile, described in this manual:

• Implements a traditional ARM architecture with multiple modes.

• Supports a Protected Memory System Architecture (PMSA) based on a Memory Protection Unit (MPU). An ARMv7-R implementation can be called a PMSAv7 implementation.

• Supports the ARM and Thumb instruction sets.

ARMv7-M Microcontroller profile, described in the ARMv7-M Architecture Reference Manual:

• Implements a programmers' model designed for low-latency interrupt processing, with hardware stacking of registers and support for writing interrupt handlers in high-level languages.

• Implements a variant of the ARMv7 PMSA.

• Supports a variant of the Thumb instruction set.

Note

Parts A, B, and C of this Architecture Reference Manual describe the ARMv7-A and ARMv7-R profiles:

• Appendixes describe how the ARMv4-ARMv6 architecture versions differ from ARMv7.

• Separate Architecture Reference Manuals define the M-profile architectures, see Additional reading on page xxiii.

Architecture versions can be qualified with variant letters to specify additional instructions and other functionality that are included as an architecture extension.

Some extensions are described separately instead of using a variant letter. For details of these extensions see Architecture extensions on page A1-32.

The valid variants of ARMv4, ARMv5, and ARMv6 are as follows:

ARMv4 The earliest architecture variant covered by this manual. It includes only the ARM instruction set.

ARMv4T Adds the Thumb instruction set.

ARMv5T Improves interworking of ARM and Thumb instructions. Adds Count Leading Zeros (CLZ) and software Breakpoint (BKPT) instructions.

ARMv5TE Enhances arithmetic support for digital signal processing (DSP) algorithms. Adds Preload Data (PLD), Load Register Dual (LDRD), Store Register Dual (STRD), and 64-bit coprocessor register transfer (MCRR, MRRC) instructions.

ARMv5TEJ Adds the BXJ instruction and other support for the Jazelle® architecture extension.

ARMv6 Adds many new instructions to the ARM instruction set. Formalizes and revises the memory model and the Debug architecture.

ARMv6K Adds instructions to support multiprocessing to the ARM instruction set, and some extra memory model features.

ARMv6T2 Introduces Thumb-2 technology, that supports a major development of the Thumb instruction set to provide a similar level of functionality to the ARM instruction set.

Note

Where appropriate, the terms ARMv6KZ or ARMv6Z describe the ARMv6K architecture with the ARMv6 Security Extensions, that were an OPTIONAL addition to the VMSAv6 architecture.
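The CLZ instruction listed under ARMv5T returns the number of leading zero bits in a 32-bit register value, and returns 32 when the value is zero. The loop below is a portable C model of that result, not of how an implementation computes it; the function name clz32 is hypothetical.

```c
#include <stdint.h>

/* Model of the CLZ result: the number of zero bits above the most
   significant 1 bit of x, or 32 if x is zero. */
unsigned clz32(uint32_t x) {
    unsigned n = 0;
    if (x == 0)
        return 32;
    while ((x & 0x80000000u) == 0) {  /* shift left until bit 31 is set */
        x <<= 1;
        n++;
    }
    return n;
}
```

For example, clz32(0x00010000) is 15, because bit 16 is the highest set bit and bits 31 to 17 are zero.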

For detailed information about how earlier versions of the ARM architecture differ from ARMv7, see Appendix D12 ARMv6 Differences and Appendix D15 ARMv4 and ARMv5 Differences.

The following architecture variants are now obsolete:

ARMv1, ARMv2, ARMv2a, ARMv3, ARMv3G, ARMv3M, ARMv4xM, ARMv4TxM, ARMv5, ARMv5xM, ARMv5TxM, and ARMv5TExP.

Contact ARM if you require details of obsolete variants.

Each instruction description in this manual specifies the architecture versions that include the instruction.

A1.3.1 Debug architecture versions

Before ARMv6, the debug implementation for an ARM processor was IMPLEMENTATION DEFINED. ARMv6 defined the first debug architecture.

The debug architecture versions are:

v6 Debug Introduced with the original ARMv6 architecture definition.

v6.1 Debug Introduced to ARMv6K with the OPTIONAL Security Extensions, described in Architecture extensions on page A1-33. A VMSAv6 implementation that includes the Security Extensions must implement v6.1 Debug.

v7 Debug First defined in issue A of this manual, and required by any ARMv7-R implementation.

An ARMv7-A implementation that does not include the Virtualization Extensions must implement either v7 Debug or v7.1 Debug.

For more information about the Virtualization Extensions, see Architecture extensions on page A1-33.

v7.1 Debug First defined in issue C.a of this manual, and required by any ARMv7-A implementation that includes the Virtualization Extensions.
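The requirements listed for v7 Debug and v7.1 Debug can be summarized as a small decision function. This C sketch encodes only the rules stated in the list, and treats the Virtualization Extensions as applicable to ARMv7-A only, because they are an extension to VMSAv7. The type and function names are hypothetical.

```c
#include <stdbool.h>

typedef enum { PROFILE_A, PROFILE_R } profile_t;   /* ARMv7-A or ARMv7-R */
typedef enum { DEBUG_V7, DEBUG_V7_1 } debug_ver_t; /* v7 Debug or v7.1 Debug */

/* Hypothetical check: may an ARMv7 implementation with the given profile
   and Virtualization Extensions option implement debug version d? */
bool debug_version_permitted(profile_t p, bool has_virt_ext, debug_ver_t d) {
    if (p == PROFILE_R)
        /* The Virtualization Extensions extend VMSAv7, so they do not
           apply to ARMv7-R, which requires v7 Debug. */
        return !has_virt_ext && d == DEBUG_V7;
    if (has_virt_ext)
        return d == DEBUG_V7_1;   /* ARMv7-A with the Virtualization Extensions */
    return true;                  /* ARMv7-A without them: v7 Debug or v7.1 Debug */
}
```

Only the ARMv7-A implementation without the Virtualization Extensions has a choice; every other valid combination fixes the debug version.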

For more information, see:

• Chapter C1 Introduction to the ARM Debug Architecture, for v7 Debug and v7.1 Debug

• About v6 Debug and v6.1 Debug on page D13-2550, for v6 Debug and v6.1 Debug.

Note

In this manual:

• debug usually refers to invasive debug, that permits modification of the state of the processor

• trace usually refers to non-invasive debug, that does not permit modification of the state of the processor.

For more information see About the ARM Debug architecture on page C1-2023.

A1.4 Architecture extensions

Instruction set architecture extensions summarizes the extensions that mainly affect the Instruction Set Architecture (ISA), either extending the instructions implemented in the ARM and Thumb instruction sets, or implementing an additional instruction set.

Architecture extensions on page A1-33 describes other extensions to the architecture.

A1.4.1 Instruction set architecture extensions

This manual describes the following extensions to the ISA:

Jazelle Is the Java bytecode execution extension that extended ARMv5TE to ARMv5TEJ. From ARMv6, the architecture requires at least the trivial Jazelle implementation, but a Jazelle implementation is still often described as a Jazelle extension.

The Virtualization Extensions require that the Jazelle implementation is the trivial Jazelle implementation.

ThumbEE Is an extension that provides the ThumbEE instruction set, a variant of the Thumb instruction set that is designed as a target for dynamically generated code. In the original release of the ARMv7 architecture, the ThumbEE extension was:

• A required extension to the ARMv7-A profile.

• An optional extension to the ARMv7-R profile.

From publication of issue C.a of this manual, ARM deprecates any use of ThumbEE instructions. However, ARMv7-A implementations must continue to include ThumbEE support, for backwards compatibility.

Floating-point Is a floating-point coprocessor extension to the instruction set architectures. For historic reasons, the Floating-point Extension is also called the VFP Extension. There have been the following versions of the Floating-point (VFP) Extension:

VFPv1 Obsolete. Details are available on request from ARM.

VFPv2 An optional extension to:

• the ARM instruction set in the ARMv5TE, ARMv5TEJ, ARMv6, and ARMv6K architectures

• the ARM and Thumb instruction sets in the ARMv6T2 architecture.

VFPv3 An OPTIONAL extension to the ARM, Thumb, and ThumbEE instruction sets in the ARMv7-A and ARMv7-R profiles.

VFPv3 can be implemented with either thirty-two or sixteen doubleword registers, as described in Advanced SIMD and Floating-point Extension registers on page A2-56. Where necessary, the terms VFPv3-D32 and VFPv3-D16 distinguish between these two implementation options. Where the term VFPv3 is used it covers both options.

VFPv3U is a variant of VFPv3 that supports the trapping of floating-point exceptions to support code, see VFPv3U and VFPv4U on page A2-62.

VFPv3 with Half-precision Extension

VFPv3 and VFPv3U can be extended by the OPTIONAL Half-precision Extension, that provides conversion functions in both directions between half-precision floating-point and single-precision floating-point.

VFPv4 An OPTIONAL extension to the ARM, Thumb, and ThumbEE instruction sets in the ARMv7-A and ARMv7-R profiles.

VFPv4U is a variant of VFPv4 that supports the trapping of floating-point exceptions to support code, see VFPv3U and VFPv4U on page A2-62.

VFPv4 and VFPv4U add both the Half-precision Extension and the fused multiply-add instructions to the features of VFPv3. VFPv4 can be implemented with either thirty-two or sixteen doubleword registers, see Advanced SIMD and Floating-point Extension registers on page A2-56. Where necessary, these implementation options are distinguished using the terms:

• VFPv4-D32, or VFPv4U-D32, for a thirty-two register implementation

• VFPv4-D16, or VFPv4U-D16, for a sixteen register implementation.

Where the term VFPv4 is used it covers both options.

If an implementation includes both the Floating-point and Advanced SIMD Extensions:

• It must implement the corresponding versions of the extensions:

— if the implementation includes VFPv3 it must include Advanced SIMDv1

— if the implementation includes VFPv3 with the Half-precision Extension it must include Advanced SIMDv1 with the half-precision extensions

— if the implementation includes VFPv4 it must include Advanced SIMDv2.

• The two extensions use the same register bank. This means VFP must be implemented as VFPv3-D32, or as VFPv4-D32.

• Some instructions apply to both extensions.
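The pairing rules for an implementation that includes both the Floating-point and Advanced SIMD Extensions can be checked mechanically. In this C sketch, which uses hypothetical type and function names, the VFPv3U and VFPv4U variants are not modelled separately; only the VFP register count and the version pairing are checked.

```c
#include <stdbool.h>

/* Hypothetical version tags. VFPV3_HP means VFPv3 with the Half-precision
   Extension; SIMDV1_HP means Advanced SIMDv1 with the half-precision
   extensions. */
typedef enum { VFPV3, VFPV3_HP, VFPV4 } vfp_ver_t;
typedef enum { SIMDV1, SIMDV1_HP, SIMDV2 } simd_ver_t;

/* True if the combination satisfies the rules for implementing both
   extensions: the shared register bank requires the D32 option, and each
   VFP version must be paired with its corresponding Advanced SIMD version. */
bool fp_simd_combination_valid(vfp_ver_t vfp, int vfp_dregs, simd_ver_t simd) {
    if (vfp_dregs != 32)
        return false;               /* VFPv3-D16 and VFPv4-D16 are not permitted */
    switch (vfp) {
    case VFPV3:    return simd == SIMDV1;
    case VFPV3_HP: return simd == SIMDV1_HP;
    case VFPV4:    return simd == SIMDV2;
    }
    return false;                   /* unreachable for valid enum values */
}
```

For example, VFPv4-D32 with Advanced SIMDv2 passes the check, while VFPv4-D16 with Advanced SIMDv2 fails because the two extensions share one register bank.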

Advanced SIMD Is an instruction set extension that provides Single Instruction Multiple Data (SIMD) integer and single-precision floating-point vector operations on doubleword and quadword registers. There have been the following versions of Advanced SIMD:

Advanced SIMDv1

It is an OPTIONAL extension to the ARMv7-A and ARMv7-R profiles.

Advanced SIMDv1 with Half-precision Extension

Advanced SIMDv1 can be extended by the OPTIONAL Half-precision Extension,

that provides conversion functions in both directions between half-precision

floating-point and single-precision floating-point.

Advanced SIMDv2

It is an OPTIONAL extension to the ARMv7-A and ARMv7-R profiles.

Advanced SIMDv2 adds both the Half-precision Extension and the fused

multiply-add instructions to the features of Advanced SIMDv1.

See the description of the Floating-point Extension for more information about

implementations that include both the Floating-point Extension and the Advanced SIMD

Extension.

A1.4.2 Architecture extensions

This manual also describes the following extensions to the ARMv7 architecture:

Security Extensions

Are an OPTIONAL set of extensions to VMSAv6 implementations of the ARMv6K architecture, and

to the ARMv7-A architecture profile, that provide a set of security features that facilitate the

development of secure applications.

Multiprocessing Extensions

Are an OPTIONAL set of extensions to the ARMv7-A and ARMv7-R profiles, that provides a set of

features that enhance multiprocessing functionality.

Large Physical Address Extension

Is an OPTIONAL extension to VMSAv7 that provides an address translation system supporting

physical addresses of up to 40 bits at a fine grain of translation.

The Large Physical Address Extension requires implementation of the Multiprocessing Extensions.


Virtualization Extensions

Are an OPTIONAL set of extensions to VMSAv7 that provides hardware support for virtualizing the

Non-secure state of a VMSAv7 implementation. This supports system use of a virtual machine

monitor, also called a hypervisor, to switch Guest operating systems.

The Virtualization Extensions require implementation of:

• the Security Extensions

• the Large Physical Address Extension

• the v7.1 Debug architecture, see Scope of part C of this manual on page C1-2022.

If an implementation that includes the Virtualization Extensions also implements:

• The Performance Monitors Extension, then it must implement version 2 of that extension,

PMUv2, see About the Performance Monitors on page C12-2302.

• A trace macrocell, that trace macrocell must support the Virtualization Extensions. In

particular, if the trace macrocell is:

— an Embedded Trace Macrocell (ETM), the macrocell must implement ETMv3.5 or later, see the Embedded Trace Macrocell Architecture Specification

— a Program Trace Macrocell (PTM), the macrocell must implement PFTv1.1 or later, see the CoreSight Program Flow Trace Architecture Specification.

In some tables in this manual, an ARMv7-A implementation that includes the Virtualization

Extensions is described as ARMv7VE, or as v7VE.

Generic Timer Extension

Is an OPTIONAL extension to any ARMv7-A or ARMv7-R implementation, that provides a system timer, and a low-latency register interface to it.

This extension is introduced with the Large Physical Address Extension and Virtualization

Extensions, but can be implemented with any earlier version of the ARMv7 architecture. The

Generic Timer Extension does not require the implementation of any of the extensions described in

this subsection.

For more information see Chapter B8 The Generic Timer.

Performance Monitors Extension

The ARMv7 architecture:

• reserves CP15 register space for IMPLEMENTATION DEFINED performance monitors

• defines a recommended performance monitors implementation.

From issue C.a of this manual, this recommended implementation is called the Performance

Monitors Extension.

The Performance Monitors Extension does not require the implementation of any of the extensions

described in this subsection.

If an ARMv7 implementation that includes v7.1 Debug also includes the Performance Monitors

Extension, it must implement PMUv2.

For more information see Chapter C12 The Performance Monitors Extension.

Note

The Fast Context Switch Extension (FCSE) is an older ARM extension, described in Appendix D10:

• ARM deprecates any use of this extension. This means that, in ARMv7 implementations before the introduction of the Multiprocessing Extensions, the FCSE is OPTIONAL and deprecated.

• The Multiprocessing Extensions obsolete the FCSE. This means that any processor that includes the

Multiprocessing Extensions cannot include the FCSE. This includes all processors that implement the Large

Physical Address Extension.
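The dependency rules stated in this subsection can be collected into one consistency check. The following Python sketch is illustrative only: the feature labels and the function name `check_extensions` are hypothetical, and it omits the trace macrocell requirements that apply with the Virtualization Extensions.

```python
def check_extensions(features):
    """Return the ARMv7 extension dependency rules that `features` violates.

    `features` is a set of hypothetical labels: "Security", "MP", "LPAE",
    "Virtualization", "Debugv7.1", "PMU", "PMUv2", "FCSE", "GenericTimer".
    "PMU" means a Performance Monitors Extension is present, and "PMUv2"
    marks it as version 2.
    """
    errors = []

    def require(feature, *deps):
        # Record each missing dependency of `feature`, if `feature` is present.
        if feature in features:
            for dep in deps:
                if dep not in features:
                    errors.append(f"{feature} requires {dep}")

    require("LPAE", "MP")
    require("Virtualization", "Security", "LPAE", "Debugv7.1")

    # With v7.1 Debug or the Virtualization Extensions, a PMU must be PMUv2.
    if "PMU" in features and features & {"Debugv7.1", "Virtualization"}:
        if "PMUv2" not in features:
            errors.append("PMU must be PMUv2")

    # The Multiprocessing Extensions obsolete the FCSE.
    if "MP" in features and "FCSE" in features:
        errors.append("FCSE cannot be combined with MP")

    # The Generic Timer needs no other extension, so it is not checked.
    return errors
```

Because the Large Physical Address Extension requires the Multiprocessing Extensions, the model also rejects any set that combines LPAE with the FCSE.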


A1.5 The ARM memory model

The ARM instruction sets address a single, flat address space of 2^32 8-bit bytes. This address space is also regarded as 2^30 32-bit words or 2^31 16-bit halfwords.

The architecture provides facilities for:

• generating an exception on an unaligned memory access

• restricting access by applications to specified areas of memory

• translating virtual addresses provided by executing instructions into physical addresses

• altering the interpretation of word and halfword data between big-endian and little-endian

• controlling the order of accesses to memory

• controlling caches

• synchronizing access to shared memory by multiple processors.

For more information, see:

• Chapter A3 Application Level Memory Model

• Chapter B2 Common Memory System Architecture Features

• Chapter B3 Virtual Memory System Architecture (VMSA)

• Chapter B5 Protected Memory System Architecture (PMSA).
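Two of these facilities, alignment checking and the choice of endianness, can be illustrated with a small Python model of a flat byte-addressed memory. This is a sketch under simplifying assumptions: `load_word` is a hypothetical helper, it treats every misaligned word access as faulting (the case where the system is configured to generate that exception), and it models big-endian data as the byte at the lowest address being the most significant.

```python
import struct


def load_word(memory: bytes, address: int, big_endian: bool = False) -> int:
    """Read a 32-bit word from a flat, byte-addressed memory image.

    A misaligned address raises ValueError, standing in for the alignment
    exception. Endianness only changes how the four bytes are combined.
    """
    if address % 4 != 0:
        raise ValueError(f"unaligned word access at {address:#010x}")
    fmt = ">I" if big_endian else "<I"  # '>' big-endian, '<' little-endian
    return struct.unpack_from(fmt, memory, address)[0]
```

With the bytes 0x78, 0x56, 0x34, 0x12 stored from address 0, a little-endian read returns 0x12345678 and a big-endian read returns 0x78563412.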


Chapter A2

Application Level Programmers’ Model

This chapter gives an application level view of the ARM programmers’ model. It contains the following sections:

• About the Application level programmers’ model on page A2-38

• ARM core data types and arithmetic on page A2-40

• ARM core registers on page A2-45

• The Application Program Status Register (APSR) on page A2-49

• Execution state registers on page A2-50

• Advanced SIMD and Floating-point Extensions on page A2-54

• Floating-point data types and arithmetic on page A2-63

• Polynomial arithmetic over {0, 1} on page A2-93

• Coprocessor support on page A2-94

• Thumb Execution Environment on page A2-95

• Jazelle direct bytecode execution support on page A2-97

• Exceptions, debug events and checks on page A2-102.

Note

In this chapter, system register names usually link to the description of the register in Chapter B4 System Control

Registers in a VMSA implementation, for example FPSCR. If the register is included in a PMSA implementation,

then it is also described in Chapter B6 System Control Registers in a PMSA implementation.


A2.1 About the Application level programmers’ model

This chapter contains the programmers’ model information required for application development.

The information in this chapter is distinct from the system information required to service and support application

execution under an operating system, or higher level of system software. However, some knowledge of that system

information is needed to put the Application level programmers' model into context.

Depending on the implemented architecture extensions, the architecture supports multiple levels of execution

privilege, that number upwards from PL0, where PL0 is the lowest privilege level and is often described as

unprivileged. The Application level programmers’ model is the programmers’ model for software executing at PL0.

For more information see Processor privilege levels, execution privilege, and access privilege on page A3-142.

System software determines the privilege level at which application software runs. When an operating system

supports execution at both PL1 and PL0, an application usually runs unprivileged. This:

• permits the operating system to allocate system resources to an application in a unique or shared manner

• provides a degree of protection from other processes and tasks, and so helps protect the operating system

from malfunctioning applications.

This chapter indicates where some system level understanding is helpful, and if appropriate it gives a reference to

the system level description in Chapter B1 System Level Programmers’ Model, or elsewhere.

The Security Extensions extend the architecture to provide hardware security features that support the development

of secure applications, by providing two Security states. The Virtualization Extensions further extend the

architecture to provide virtualization of operation in Non-secure state. However, application level software is

generally unaware of these extensions. For more information, see The Security Extensions on page B1-1156 and The

Virtualization Extensions on page B1-1162.

Note

• When an implementation includes the Security Extensions, application and operating system software

normally executes in Non-secure state.

• The virtualization features accessible only at PL2 are implemented only in Non-secure state. Secure state has

only two privilege levels, PL0 and PL1.

• Older documentation, describing implementations or architecture versions that support only two privilege

levels, often refers to execution at PL1 as privileged execution.

• In this manual, the following terms have special meanings, defined in the Glossary:

— IMPLEMENTATION DEFINED, see IMPLEMENTATION DEFINED.

— OPTIONAL, see OPTIONAL.

— SUBARCHITECTURE DEFINED, see SUBARCHITECTURE DEFINED.

— UNDEFINED, see UNDEFINED.

— UNKNOWN, see UNKNOWN.

— UNPREDICTABLE, see UNPREDICTABLE.


A2.1.1 Instruction sets, arithmetic operations, and register files

The ARM and Thumb instruction sets both provide a wide range of integer arithmetic and logical operations, that operate on a register file of sixteen 32-bit registers, the ARM core registers. As described in ARM core registers on page A2-45, these registers include the special registers SP, LR, and PC. ARM core data types and arithmetic on page A2-40 gives more information about these operations.

In addition, if an implementation includes:

• the Floating-point (VFP) Extension, the ARM and Thumb instruction sets include floating-point instructions

• the Advanced SIMD Extension, the ARM and Thumb instruction sets include vector instructions.

Floating-point and vector instructions operate on an independent register file, described in Advanced SIMD and

Floating-point Extension registers on page A2-56. In an implementation that includes both of these extensions, they

share a common register file. The following sections give more information about these extensions and the

instructions they provide:

•Advanced SIMD and Floating-point Extensions on page A2-54

•Floating-point data types and arithmetic on page A2-63

•Polynomial arithmetic over {0, 1} on page A2-93.


A2.2 ARM core data types and arithmetic

All ARMv7-A and ARMv7-R processors support the following data types in memory:

Byte 8 bits

Halfword 16 bits

Word 32 bits

Doubleword 64 bits.

Processor registers are 32 bits in size. The instruction set contains instructions supporting the following data types

held in registers:

• 32-bit pointers

• unsigned or signed 32-bit integers

• unsigned 16-bit or 8-bit integers, held in zero-extended form

• signed 16-bit or 8-bit integers, held in sign-extended form

• two 16-bit integers packed into a register

• four 8-bit integers packed into a register

• unsigned or signed 64-bit integers held in two registers.

Load and store operations can transfer bytes, halfwords, or words to and from memory. Loads of bytes or halfwords

zero-extend or sign-extend the data as it is loaded, as specified in the appropriate load instruction.
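The difference between zero-extending and sign-extending loads can be modelled as follows. This Python sketch is illustrative: the helper names `zero_extend` and `sign_extend` are hypothetical, and each returns the unsigned 32-bit pattern that the destination register would hold.

```python
def zero_extend(value: int, bits: int) -> int:
    """Zero-extend the low `bits` bits of `value` to a 32-bit register value."""
    return value & ((1 << bits) - 1)


def sign_extend(value: int, bits: int) -> int:
    """Sign-extend the low `bits` bits of `value` to a 32-bit register value."""
    value &= (1 << bits) - 1
    if value & (1 << (bits - 1)):  # top bit set: negative in two's complement
        value -= 1 << bits
    return value & 0xFFFFFFFF      # the 32-bit pattern held in the register
```

Under this model, loading the byte 0x80 gives 0x00000080 with zero-extension (as an unsigned byte load such as LDRB does) and 0xFFFFFF80 with sign-extension (as a signed byte load such as LDRSB does).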

The instruction sets include load and store operations that transfer two or more words to and from memory. Software

can load and store doublewords using these instructions.

Note

For information about the atomicity of memory accesses see Atomicity in the ARM architecture on page A3-128.

When any of the data types is described as unsigned, the N-bit data value represents a non-negative integer in the range 0 to 2^N - 1, using normal binary format.

When any of these types is described as signed, the N-bit data value represents an integer in the range -2^(N-1) to +2^(N-1) - 1, using two's complement format.
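These two interpretations correspond to the UInt() and SInt() pseudocode functions described in Converting bitstrings to integers on page D16-2655. A minimal Python sketch, assuming the bitstring is passed as a non-negative integer together with its width N (the names `uint` and `sint` are hypothetical):

```python
def uint(x: int, n: int) -> int:
    """Interpret an n-bit pattern as unsigned: range 0 to 2^n - 1."""
    return x & ((1 << n) - 1)


def sint(x: int, n: int) -> int:
    """Interpret an n-bit pattern as two's complement: -2^(n-1) to 2^(n-1) - 1."""
    x = uint(x, n)
    return x - (1 << n) if x >> (n - 1) else x  # top bit set means negative
```

For example, the 8-bit pattern 0xFF reads as 255 unsigned and as -1 signed.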

The instructions that operate on packed halfwords or bytes include some multiply instructions that use just one of

two halfwords, and SIMD instructions that perform parallel addition or subtraction on all of the halfwords or bytes.

Direct instruction support for 64-bit integers is limited, and most 64-bit operations require sequences of two or more

instructions to synthesize them.
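As an example of such a sequence, a 64-bit addition is typically built from a flag-setting add of the low words followed by an add-with-carry of the high words (for example ADDS then ADC). The Python sketch below models that pairing on (low, high) register pairs. The name `add64` is hypothetical, and the model tracks only the carry, not the N, Z, or V flags.

```python
MASK32 = 0xFFFFFFFF


def add64(a_lo, a_hi, b_lo, b_hi):
    """Add two 64-bit integers held as (low, high) 32-bit register pairs."""
    lo_sum = a_lo + b_lo
    lo = lo_sum & MASK32                 # low word, as a flag-setting add writes it
    carry = lo_sum >> 32                 # carry flag from the low-word add: 0 or 1
    hi = (a_hi + b_hi + carry) & MASK32  # add-with-carry on the high words
    return lo, hi
```

For example, adding 1 to the 64-bit value 0x00000000FFFFFFFF produces a carry out of the low word, giving (0, 1).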


A2.2.1 Integer arithmetic

The instruction set provides a wide variety of operations on the values in registers, including bitwise logical

operations, shifts, additions, subtractions, multiplications, and many others. The pseudocode described in

Appendix D16 Pseudocode Definition defines these operations, usually in one of three ways:

• By direct use of the pseudocode operators and built-in functions defined in Operators and built-in functions

on page D16-2653.

• By use of pseudocode helper functions defined in the main text. These can be located using the table in

Appendix D17 Pseudocode Index.

• By a sequence of the form:

1. Use of the SInt(), UInt(), and Int() built-in functions defined in Converting bitstrings to integers on page D16-2655 to convert the bitstring contents of the instruction operands to the unbounded integers that they represent as two's complement or unsigned integers.

2. Use of mathematical operators, built-in functions and helper functions on those unbounded integers to

calculate other such integers.

3. Use of either the bitstring extraction operator defined in Bitstring extraction on page D16-2654 or of

the saturation helper functions described in Pseudocode details of saturation on page A2-44 to convert

an unbounded integer result into a bitstring result that can be written to a register.
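The three steps can be illustrated with a signed 32 x 32 to 64-bit multiply, in the style of a signed multiply long instruction such as SMULL. This Python sketch is a labelled model, not the manual's pseudocode for any instruction: `sint` and `signed_multiply_long` are hypothetical names, and Python's unbounded `int` stands in for the pseudocode's unbounded integers.

```python
def sint(x: int, n: int) -> int:
    """Step 1 helper: interpret an n-bit pattern as a two's complement integer."""
    x &= (1 << n) - 1
    return x - (1 << n) if x >> (n - 1) else x


def signed_multiply_long(rn: int, rm: int):
    """Signed 32 x 32 -> 64-bit multiply following the three-step pattern."""
    product = sint(rn, 32) * sint(rm, 32)     # steps 1 and 2: unbounded integers
    bits64 = product & ((1 << 64) - 1)        # step 3: extract result<63:0>
    return bits64 & 0xFFFFFFFF, bits64 >> 32  # (low word, high word)
```

For example, multiplying 0xFFFFFFFF (-1) by 2 gives the 64-bit pattern for -2, returned as (0xFFFFFFFE, 0xFFFFFFFF).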

Shift and rotate operations

The following types of shift and rotate operations are used in instructions:

Logical Shift Left (LSL) moves each bit of a bitstring left by a specified number of bits. Zeros are shifted in at the right end of the bitstring. Bits that are shifted off the left end of the bitstring are discarded, except that the last such bit can be produced as a carry output.

Logical Shift Right (LSR) moves each bit of a bitstring right by a specified number of bits. Zeros are shifted in at the left end of the bitstring. Bits that are shifted off the right end of the bitstring are discarded, except that the last such bit can be produced as a carry output.

Arithmetic Shift Right (ASR) moves each bit of a bitstring right by a specified number of bits. Copies of the leftmost bit are shifted in at the left end of the bitstring. Bits that are shifted off the right end of the bitstring are discarded, except that the last such bit can be produced as a carry output.

Rotate Right (ROR) moves each bit of a bitstring right by a specified number of bits. Each bit that is shifted off the right end of the bitstring is re-introduced at the left end. The last bit shifted off the right end of the bitstring can be produced as a carry output.

Rotate Right with Extend (RRX) moves each bit of a bitstring right by one bit. A carry input is shifted in at the left end of the bitstring. The bit shifted off the right end of the bitstring can be produced as a carry output.

Pseudocode details of shift and rotate operations

These shift and rotate operations are supported in pseudocode by the following functions:

// LSL_C()
// =======

(bits(N), bit) LSL_C(bits(N) x, integer shift)
    assert shift > 0;
    extended_x = x : Zeros(shift);
    result = extended_x<N-1:0>;
    carry_out = extended_x<N>;
    return (result, carry_out);

// LSL()
// =====

bits(N) LSL(bits(N) x, integer shift)
    assert shift >= 0;
    if shift == 0 then
        result = x;
    else
        (result, -) = LSL_C(x, shift);
    return result;

// LSR_C()
// =======

(bits(N), bit) LSR_C(bits(N) x, integer shift)
    assert shift > 0;
    extended_x = ZeroExtend(x, shift+N);
    result = extended_x<shift+N-1:shift>;
    carry_out = extended_x<shift-1>;
    return (result, carry_out);

// LSR()
// =====

bits(N) LSR(bits(N) x, integer shift)
    assert shift >= 0;
    if shift == 0 then
        result = x;
    else
        (result, -) = LSR_C(x, shift);
    return result;

// ASR_C()
// =======

(bits(N), bit) ASR_C(bits(N) x, integer shift)
    assert shift > 0;
    extended_x = SignExtend(x, shift+N);
    result = extended_x<shift+N-1:shift>;
    carry_out = extended_x<shift-1>;
    return (result, carry_out);

// ASR()
// =====

bits(N) ASR(bits(N) x, integer shift)
    assert shift >= 0;
    if shift == 0 then
        result = x;
    else
        (result, -) = ASR_C(x, shift);
    return result;

// ROR_C()
// =======

(bits(N), bit) ROR_C(bits(N) x, integer shift)
    assert shift != 0;
    m = shift MOD N;
    result = LSR(x,m) OR LSL(x,N-m);
    carry_out = result<N-1>;
    return (result, carry_out);

// ROR()
// =====

bits(N) ROR(bits(N) x, integer shift)
    if shift == 0 then
        result = x;
    else
        (result, -) = ROR_C(x, shift);
    return result;

// RRX_C()
// =======

(bits(N), bit) RRX_C(bits(N) x, bit carry_in)
    result = carry_in : x<N-1:1>;
    carry_out = x<0>;
    return (result, carry_out);

// RRX()
// =====

bits(N) RRX(bits(N) x, bit carry_in)
    (result, -) = RRX_C(x, carry_in);
    return result;
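For experimentation, the carry-producing shift and rotate functions above can be transcribed into Python. This is a sketch, not part of the architecture: the lower-case function names are hypothetical, bitstrings are represented as non-negative integers with an explicit width `n` (default 32), and each function returns a `(result, carry_out)` tuple as the `_C` pseudocode functions do.

```python
def _mask(n):
    """All-ones mask for an n-bit bitstring."""
    return (1 << n) - 1


def lsl_c(x, shift, n=32):
    """LSL_C(): zeros enter at the right; carry is extended_x<N>."""
    assert shift > 0
    extended = (x & _mask(n)) << shift         # x : Zeros(shift)
    return extended & _mask(n), (extended >> n) & 1


def lsr_c(x, shift, n=32):
    """LSR_C(): zeros enter at the left; carry is extended_x<shift-1>."""
    assert shift > 0
    x &= _mask(n)                              # ZeroExtend(x, shift+N)
    return x >> shift, (x >> (shift - 1)) & 1


def asr_c(x, shift, n=32):
    """ASR_C(): copies of the sign bit enter at the left."""
    assert shift > 0
    x &= _mask(n)
    signed = x - (1 << n) if x >> (n - 1) else x
    # Python's >> on a negative int is arithmetic, matching SignExtend().
    return (signed >> shift) & _mask(n), (signed >> (shift - 1)) & 1


def ror_c(x, shift, n=32):
    """ROR_C(): bits leaving the right end re-enter at the left end."""
    assert shift != 0
    x &= _mask(n)
    m = shift % n                              # shift MOD N
    result = ((x >> m) | (x << (n - m))) & _mask(n)
    return result, (result >> (n - 1)) & 1     # carry_out = result<N-1>


def rrx_c(x, carry_in, n=32):
    """RRX_C(): carry_in enters at the left; carry is the old bit 0."""
    x &= _mask(n)
    return (carry_in << (n - 1)) | (x >> 1), x & 1
```

The wrappers without `_C` follow the same pattern as the pseudocode: return `x` unchanged when `shift` is 0, and otherwise discard the carry.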

Pseudocode details of addition and subtraction

In pseudocode, addition and subtraction can be performed on any combination of unbounded integers and bitstrings,

provided that if they are performed on two bitstrings, the bitstrings must be identical in length. The result is another

unbounded integer if both operands are unbounded integers, and a bitstring of the same length as the bitstring

operand(s) otherwise. For the precise definition of these operations, see Addition and subtraction on

page D16-2656.

The main addition and subtraction instructions can produce status information about both unsigned carry and signed overflow conditions. When necessary, multi-word additions and subtractions are synthesized from this status information. In pseudocode the AddWithCarry() function provides an addition with a carry input and carry and overflow outputs:

// AddWithCarry()
// ==============

(bits(N), bit, bit) AddWithCarry(bits(N) x, bits(N) y, bit carry_in)
    unsigned_sum = UInt(x) + UInt(y) + UInt(carry_in);
    signed_sum = SInt(x) + SInt(y) + UInt(carry_in);
    result = unsigned_sum<N-1:0>; // same value as signed_sum<N-1:0>
    carry_out = if UInt(result) == unsigned_sum then '0' else '1';
    overflow = if SInt(result) == signed_sum then '0' else '1';
    return (result, carry_out, overflow);

An important property of the AddWithCarry() function is that if:

(result, carry_out, overflow) = AddWithCarry(x, NOT(y), carry_in)

then:

• if carry_in == '1', then result == x-y with:

— overflow == '1' if signed overflow occurred during the subtraction

— carry_out == '1' if unsigned borrow did not occur during the subtraction, that is, if x >= y

• if carry_in == '0', then result == x-y-1 with:

— overflow == '1' if signed overflow occurred during the subtraction

— carry_out == '1' if unsigned borrow did not occur during the subtraction, that is, if x > y.

Together, these mean that the carry_in and carry_out bits in AddWithCarry() calls can act as NOT borrow flags for subtractions as well as carry flags for additions.
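The AddWithCarry() function and its subtraction property can be checked numerically with a direct Python transcription. This is a sketch only: `add_with_carry` is a hypothetical name, `n` defaults to 32, and NOT(y) is formed by passing `~y`, which the function masks to N bits.

```python
def add_with_carry(x: int, y: int, carry_in: int, n: int = 32):
    """AddWithCarry(): return (result, carry_out, overflow) for n-bit inputs."""
    mask = (1 << n) - 1

    def sint(v):
        # Two's complement interpretation of an n-bit pattern.
        return v - (1 << n) if v >> (n - 1) else v

    x &= mask
    y &= mask                                  # also turns ~y into NOT(y)
    unsigned_sum = x + y + carry_in
    signed_sum = sint(x) + sint(y) + carry_in
    result = unsigned_sum & mask               # unsigned_sum<N-1:0>
    carry_out = 0 if result == unsigned_sum else 1
    overflow = 0 if sint(result) == signed_sum else 1
    return result, carry_out, overflow
```

For example, `add_with_carry(5, ~3, 1)` gives `(2, 1, 0)`: the result is 5 - 3 and `carry_out` is 1 because no borrow occurred. By contrast, `add_with_carry(3, ~5, 1)` gives `(0xFFFFFFFE, 0, 0)` because 3 - 5 borrows.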


Pseudocode details of saturation

Some instructions perform saturating arithmetic, that is, if the result of the arithmetic overflows the destination signed or unsigned N-bit integer range, the result produced is the largest or smallest value in that range, rather than wrapping around modulo 2^N. This is supported in pseudocode by:

• the SignedSatQ() and UnsignedSatQ() functions when an operation requires, in addition to the saturated result, a Boolean result that indicates whether saturation occurred

• the SignedSat() and UnsignedSat() functions when only the saturated result is required.

// SignedSatQ()
// ============

(bits(N), boolean) SignedSatQ(integer i, integer N)
    if i > 2^(N-1) - 1 then
        result = 2^(N-1) - 1; saturated = TRUE;
    elsif i < -(2^(N-1)) then
        result = -(2^(N-1)); saturated = TRUE;
    else
        result = i; saturated = FALSE;
    return (result<N-1:0>, saturated);

// UnsignedSatQ()
// ==============

(bits(N), boolean) UnsignedSatQ(integer i, integer N)
    if i > 2^N - 1 then
        result = 2^N - 1; saturated = TRUE;
    elsif i < 0 then
        result = 0; saturated = TRUE;
    else
        result = i; saturated = FALSE;
    return (result<N-1:0>, saturated);

// SignedSat()
// ===========

bits(N) SignedSat(integer i, integer N)
    (result, -) = SignedSatQ(i, N);
    return result;

// UnsignedSat()
// =============

bits(N) UnsignedSat(integer i, integer N)
    (result, -) = UnsignedSatQ(i, N);
    return result;

SatQ(i, N, unsigned) returns either UnsignedSatQ(i, N) or SignedSatQ(i, N) depending on the value of its third argument, and Sat(i, N, unsigned) returns either UnsignedSat(i, N) or SignedSat(i, N) depending on the value of its third argument:

// SatQ()
// ======

(bits(N), boolean) SatQ(integer i, integer N, boolean unsigned)
    (result, sat) = if unsigned then UnsignedSatQ(i, N) else SignedSatQ(i, N);
    return (result, sat);

// Sat()
// =====

bits(N) Sat(integer i, integer N, boolean unsigned)
    result = if unsigned then UnsignedSat(i, N) else SignedSat(i, N);
    return result;
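As an illustration, the saturation helpers can be transcribed into Python and combined in the style of a signed saturating 32-bit addition, where a saturated result would be reported through the Q flag. The names below are hypothetical lower-case counterparts of the pseudocode functions, plus a hypothetical `saturating_add32`, and results are returned as N-bit patterns held in non-negative integers.

```python
def signed_sat_q(i: int, n: int):
    """SignedSatQ(): clamp i to [-2^(n-1), 2^(n-1) - 1]; return (bits, saturated)."""
    hi, lo = 2 ** (n - 1) - 1, -(2 ** (n - 1))
    if i > hi:
        result, saturated = hi, True
    elif i < lo:
        result, saturated = lo, True
    else:
        result, saturated = i, False
    return result & ((1 << n) - 1), saturated  # result<N-1:0>


def unsigned_sat_q(i: int, n: int):
    """UnsignedSatQ(): clamp i to [0, 2^n - 1]; return (bits, saturated)."""
    if i > 2 ** n - 1:
        return 2 ** n - 1, True
    if i < 0:
        return 0, True
    return i, False


def sat_q(i: int, n: int, unsigned: bool):
    """SatQ(): select the helper from the third argument."""
    return unsigned_sat_q(i, n) if unsigned else signed_sat_q(i, n)


def saturating_add32(a: int, b: int):
    """Signed saturating add of two 32-bit patterns; returns (bits, saturated)."""
    def sint32(v):
        v &= 0xFFFFFFFF
        return v - (1 << 32) if v >> 31 else v
    return signed_sat_q(sint32(a) + sint32(b), 32)
```

For example, `saturating_add32(0x7FFFFFFF, 1)` returns `(0x7FFFFFFF, True)` instead of wrapping to 0x80000000.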


A2.3 ARM core registers

In the application level view, an ARM processor has:

• thirteen general-purpose 32-bit registers, R0 to R12

• three 32-bit registers with special uses, SP, LR, and PC, that can be described as R13 to R15.

The special registers are:

SP, the stack pointer

The processor uses SP as a pointer to the active stack.

In the Thumb instruction set, most instructions cannot access SP. The only instructions that can

access SP are those designed to use SP as a stack pointer.

The ARM instruction set provides more general access to the SP, and it can be used as a

general-purpose register. However, ARM deprecates the use of SP for any purpose other than as a

stack pointer.

Note

Using SP for any purpose other than as a stack pointer is likely to break the requirements of

operating systems, debuggers, and other software systems, causing them to malfunction.

Software can refer to SP as R13.

LR, the link register

The link register is a special register that can hold return link information. Some cases described in this manual require this use of the LR. When software does not require the LR for linking, it can use it for other purposes. Software can refer to LR as R14.

PC, the program counter

• When executing an ARM instruction, PC reads as the address of the current instruction

plus 8.

• When executing a Thumb instruction, PC reads as the address of the current instruction

plus 4.

• Writing an address to PC causes a branch to that address.

Most Thumb instructions cannot access PC.

The ARM instruction set provides more general access to the PC, and many ARM instructions can

use the PC as a general-purpose register. However, ARM deprecates the use of PC for any purpose

other than as the program counter. See Writing to the PC on page A2-46 for more information.

Software can refer to PC as R15.
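The PC read offsets can be expressed as a one-line model. In this Python sketch, `pc_read_value` is a hypothetical helper, the instruction set is passed as a Boolean, and only the ARM and Thumb cases listed above are covered.

```python
def pc_read_value(instr_address: int, thumb: bool) -> int:
    """Value an instruction at instr_address reads from the PC.

    ARM state: address of the current instruction plus 8.
    Thumb state: address of the current instruction plus 4.
    """
    return (instr_address + (4 if thumb else 8)) & 0xFFFFFFFF
```

For example, under this model an ARM instruction at 0x8000 that adds 4 to the PC value it reads computes 0x800C.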

See ARM core registers on page B1-1143 for the system level view of these registers.

Note

In general, ARM strongly recommends using the names SP, LR and PC instead of R13, R14 and R15. However,

sometimes it is simpler to use the R13-R15 names when referring to a group of registers. For example, it is simpler

to refer to Registers R8 to R15, rather than to Registers R8 to R12, the SP, LR and PC. These two descriptions of the

group of registers have exactly the same meaning.


A2.3.1 Writing to the PC

In ARMv7, many data-processing instructions can write to the PC. Writes to the PC are handled as follows:

• In Thumb state, the following 16-bit Thumb instruction encodings branch to the value written to the PC:

— encoding T2 of ADD (register, Thumb) on page A8-310

— encoding T1 of MOV (register, Thumb) on page A8-486.

The value written to the PC is forced to be halfword-aligned by ignoring its least significant bit, treating that

bit as being 0.

• The CBNZ, CBZ, CHKA, HBL, HBLP, HBP, TBB, and TBH instructions remain in the same instruction set state and branch to the value written to the PC.

The definition of each of these instructions ensures that the value written to the PC is correctly aligned for

the current instruction set state.

• The BLX (immediate) instruction switches between ARM and Thumb states and branches to the value written to the PC. Its definition ensures that the value written to the PC is correctly aligned for the new instruction set state.

• The following instructions write a value to the PC, treating that value as an interworking address to branch to, with low-order bits that determine the new instruction set state:

— BLX (register), BX, and BXJ

— LDR instructions with <Rt> equal to the PC

— POP and all forms of LDM except LDM (exception return), when the register list includes the PC

— in ARM state only, ADC, ADD, ADR, AND, ASR (immediate), BIC, EOR, LSL (immediate), LSR (immediate), MOV, MVN, ORR, ROR (immediate), RRX, RSB, RSC, SBC, and SUB instructions with <Rd> equal to the PC and without flag-setting specified.

For details of how an interworking address specifies the new instruction set state and instruction address, see

Pseudocode details of operations on ARM core registers on page A2-47.

Note

— The register-shifted register instructions, that are available only in the ARM instruction set and are summarized in Data-processing (register-shifted register) on page A5-198, cannot write to the PC.

— The LDR, POP, and LDM instructions first have interworking branch behavior in ARMv5T.

— The instructions listed as having interworking branch behavior in ARM state only first have this behavior in ARMv7.

In the cases where later versions of the architecture introduce interworking branch behavior, the behavior in earlier architecture versions is a branch that remains in the same instruction set state. For more information, see:

— Interworking on page D12-2503, for ARMv6

— Interworking on page D15-2591, for ARMv5 and ARMv4.

• Some instructions are treated as exception return instructions, and write both the PC and the CPSR. For more

information, including which instructions are exception return instructions, see Exception return on

page B1-1194.

• Some instructions cause an exception, and the exception handler address is written to the PC as part of the

exception entry. Similarly, in ThumbEE state, an instruction that fails its null check causes the address of the

null check handler to be written to the PC, see Null checking on page A9-1113.


A2.3.2 Pseudocode details of operations on ARM core registers

In pseudocode, the uses of the R[] function are:

• reading or writing R0-R12, SP, and LR, using n == 0-12, 13, and 14 respectively

• reading the PC, using n == 15.

This function has prototypes:

bits(32) R[integer n]
    assert n >= 0 && n <= 15;

R[integer n] = bits(32) value
    assert n >= 0 && n <= 14;

Pseudocode details of ARM core register operations on page B1-1144 explains the full operation of this function.

Descriptions of ARM store instructions that store the PC value use the PCStoreValue() pseudocode function to specify the PC value stored by the instruction:

// PCStoreValue()
// ==============

bits(32) PCStoreValue()
    // This function returns the PC value. On architecture versions before ARMv7, it
    // is permitted to instead return PC+4, provided it does so consistently. It is
    // used only to describe ARM instructions, so it returns the address of the current
    // instruction plus 8 (normally) or 12 (when the alternative is permitted).
    return PC;

Writing an address to the PC causes either a simple branch to that address or an interworking branch that also selects the instruction set to execute after the branch. A simple branch is performed by the BranchWritePC() function:

// BranchWritePC()
// ===============

BranchWritePC(bits(32) address)
    if CurrentInstrSet() == InstrSet_ARM then
        if ArchVersion() < 6 && address<1:0> != '00' then UNPREDICTABLE;
        BranchTo(address<31:2>:'00');
    elsif CurrentInstrSet() == InstrSet_Jazelle then
        if JazelleAcceptsExecution() then
            BranchTo(address<31:0>);
        else
            newaddress = address;
            newaddress<1:0> = bits(2) UNKNOWN;
            BranchTo(newaddress);
    else
        BranchTo(address<31:1>:'0');
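Ignoring the Jazelle case and the pre-ARMv6 UNPREDICTABLE check, the target-address calculation in BranchWritePC() reduces to clearing low-order bits. The Python sketch below assumes hypothetical string labels for the current instruction set. It returns the branch target and leaves the instruction set state unchanged.

```python
def branch_write_pc(address: int, instr_set: str) -> int:
    """Simplified BranchWritePC(): return the branch target address.

    instr_set is a hypothetical label: "ARM", "Thumb", or "ThumbEE".
    The Jazelle case and the pre-ARMv6 alignment check are omitted.
    """
    address &= 0xFFFFFFFF
    if instr_set == "ARM":
        return address & 0xFFFFFFFC  # address<31:2>:'00'
    return address & 0xFFFFFFFE      # address<31:1>:'0'
```

For example, writing 0x8003 gives a target of 0x8000 in ARM state and 0x8002 in Thumb state.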

An interworking branch is performed by the BXWritePC() function:

// BXWritePC()
// ===========

BXWritePC(bits(32) address)
    if CurrentInstrSet() == InstrSet_ThumbEE then
        if address<0> == '1' then
            BranchTo(address<31:1>:'0'); // Remaining in ThumbEE state
        else
            UNPREDICTABLE;
    else
        if address<0> == '1' then
            SelectInstrSet(InstrSet_Thumb);
            BranchTo(address<31:1>:'0');
        elsif address<1> == '0' then
            SelectInstrSet(InstrSet_ARM);
            BranchTo(address);
        else // address<1:0> == '10'
            UNPREDICTABLE;
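The decoding of an interworking address by BXWritePC() can be sketched in Python as follows. The instruction set labels are hypothetical strings, and the cases that the pseudocode marks as UNPREDICTABLE raise ValueError only so that the model makes them visible; this does not describe what hardware does in those cases.

```python
def bx_write_pc(address: int, instr_set: str):
    """Simplified BXWritePC(): return (new_instr_set, target_address)."""
    address &= 0xFFFFFFFF
    if instr_set == "ThumbEE":
        if address & 1:
            return "ThumbEE", address & 0xFFFFFFFE  # remain in ThumbEE state
        raise ValueError("UNPREDICTABLE: address<0> == '0' in ThumbEE state")
    if address & 1:
        return "Thumb", address & 0xFFFFFFFE        # bit[0] == 1 selects Thumb
    if (address & 2) == 0:
        return "ARM", address                       # bits[1:0] == '00' selects ARM
    raise ValueError("UNPREDICTABLE: address<1:0> == '10'")
```

For example, writing 0x8001 from ARM state selects Thumb state and branches to 0x8000, whereas 0x8002 reaches the UNPREDICTABLE address<1:0> == '10' case.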

The LoadWritePC() and ALUWritePC() functions are used for two cases where the behavior was systematically modified between architecture versions:

// LoadWritePC()
// =============

LoadWritePC(bits(32) address)
    if ArchVersion() >= 5 then
        BXWritePC(address);
    else
        BranchWritePC(address);

// ALUWritePC()
// ============

ALUWritePC(bits(32) address)
    if ArchVersion() >= 7 && CurrentInstrSet() == InstrSet_ARM then
        BXWritePC(address);
    else
        BranchWritePC(address);

Note

The behavior of the PC writes performed by the ALUWritePC() function is different in Debug state, where there are more UNPREDICTABLE cases. The pseudocode in this section only handles the non-debug cases. For more information, see Behavior of Data-processing instructions that access the PC in Debug state on page C5-2102.


A2.4 The Application Program Status Register (APSR)

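Program status is reported in the 32-bit Application Program Status Register (APSR). The APSR bit assignments are:

N bit[31], Z bit[30], C bit[29], V bit[28], Q bit[27], RAZ/SBZP bits[26:24], Reserved bits[23:20], GE[3:0] bits[19:16], Reserved bits[15:0].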

The APSR bit categories are:

• Reserved bits, which are allocated to system features or are available for future expansion. Unprivileged execution ignores writes to fields that are accessible only at PL1 or higher. However, application level software that writes to the APSR must treat reserved bits as Do-Not-Modify (DNM) bits. For more information about the reserved bits, see Format of the CPSR and SPSRs on page B1-1148.

Although bits[15:0] are UNKNOWN on reads, it is permitted that, on a read of the APSR:

— Bit[9] returns the value of CPSR.E.

— Bits[8:6] return the value of CPSR.{A, I, F}, the mask bits.

This is an exception to the general rule that an UNKNOWN field must not return information that cannot be obtained, at the current privilege level, by an architected mechanism.

ARM recommends that these bits do not return the CPSR bit values on a read of the APSR.

• Bits that can be set by many instructions:

— The Condition flags:

N, bit[31] Negative condition flag. Set to bit[31] of the result of the instruction. If the result is regarded as a two's complement signed integer, then the processor sets N to 1 if the result is negative, and sets N to 0 if it is positive or zero.

Z, bit[30] Zero condition flag. Set to 1 if the result of the instruction is zero, and to 0 otherwise. A result of zero often indicates an equal result from a comparison.

C, bit[29] Carry condition flag. Set to 1 if the instruction results in a carry condition, for example an unsigned overflow on an addition.

V, bit[28] Overflow condition flag. Set to 1 if the instruction results in an overflow condition, for example a signed overflow on an addition.

— The Overflow or saturation flag:

Q, bit[27] Set to 1 to indicate that overflow or saturation occurred in some instructions, normally related to digital signal processing (DSP). For more information, see Pseudocode details of saturation on page A2-44.

— The Greater than or Equal flags:

GE[3:0], bits[19:16] The instructions described in Parallel addition and subtraction instructions on page A4-171 update these flags to indicate the results from individual bytes or halfwords of the operation. These flags can control a later SEL instruction. For more information, see SEL on page A8-602.

• Bits[26:24] are RAZ/SBZP. Therefore, software can use MSR instructions that write the top byte of the APSR without using a read, modify, write sequence. If it does this, it must write zeros to bits[26:24].
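The N, Z, C, and V definitions above can be checked against a small Python model of a 32-bit addition. This sketch computes the sum both as an unsigned and as a signed integer: C is set when the unsigned sum does not fit in 32 bits, and V when the signed sum does not fit. The function name add_with_carry is an assumption, loosely modelled on the AddWithCarry() helper used in the architecture pseudocode, and the dictionary return format is purely illustrative.

```python
from __future__ import annotations


def to_signed32(v: int) -> int:
    """Interpret a 32-bit pattern as a two's complement signed integer."""
    return v - (1 << 32) if v & 0x8000_0000 else v


def add_with_carry(x: int, y: int, carry_in: int) -> tuple[int, dict[str, int]]:
    """Return the 32-bit result of x + y + carry_in and the N, Z, C, V flags."""
    x &= 0xFFFF_FFFF
    y &= 0xFFFF_FFFF
    unsigned_sum = x + y + carry_in
    signed_sum = to_signed32(x) + to_signed32(y) + carry_in
    result = unsigned_sum & 0xFFFF_FFFF
    flags = {
        "N": (result >> 31) & 1,                      # bit[31] of the result
        "Z": int(result == 0),                        # result is zero
        "C": int(unsigned_sum != result),             # unsigned overflow (carry out)
        "V": int(to_signed32(result) != signed_sum),  # signed overflow
    }
    return result, flags
```

A comparison of x with y can be modelled as add_with_carry(x, ~y & 0xFFFFFFFF, 1), which computes x - y; Z is then 1 exactly when the operands are equal.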

Instructions can test the N, Z, C, and V condition flags, combining these with the condition code for the instruction to determine whether the instruction must be executed. In this way, execution of the instruction is conditional on the result of a previous operation. For more information about conditional execution, see Conditional execution on page A4-161 and Conditional execution on page A8-288.
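How a condition code combines with the flags can be sketched as a single function. This Python model is hypothetical and uses the standard ARM 4-bit condition encodings (EQ = 0b0000 through AL = 0b1110): it evaluates the condition selected by the top three bits, then inverts the result when the least significant bit is 1, except for 0b1111.

```python
def condition_passed(cond: int, n: int, z: int, c: int, v: int) -> bool:
    """Evaluate a 4-bit condition code against the N, Z, C and V flags."""
    base = (cond >> 1) & 0b111
    if base == 0b000:
        result = z == 1                  # EQ (inverted: NE)
    elif base == 0b001:
        result = c == 1                  # CS (inverted: CC)
    elif base == 0b010:
        result = n == 1                  # MI (inverted: PL)
    elif base == 0b011:
        result = v == 1                  # VS (inverted: VC)
    elif base == 0b100:
        result = c == 1 and z == 0       # HI (inverted: LS)
    elif base == 0b101:
        result = n == v                  # GE (inverted: LT)
    elif base == 0b110:
        result = n == v and z == 0       # GT (inverted: LE)
    else:
        result = True                    # AL
    # An odd condition code inverts the result, except for 0b1111.
    if (cond & 1) and cond != 0b1111:
        result = not result
    return result
```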

In ARMv7-A and ARMv7-R, the APSR is the same register as the CPSR, but the APSR must be used only to access the N, Z, C, V, Q, and GE[3:0] bits. For more information, see Program Status Registers (PSRs) on page B1-1147.

APSR bit assignments (from the register diagram):

Bits     Field
[31]     N
[30]     Z
[29]     C
[28]     V
[27]     Q
[26:24]  RAZ/SBZP
[23:20]  Reserved, UNK/SBZP
[19:16]  GE[3:0]
[15:0]   Reserved, UNKNOWN/SBZP


A2.5 Execution state registers

The execution state registers modify the execution of instructions. They control:

• Whether instructions are interpreted as Thumb instructions, ARM instructions, ThumbEE instructions, or Java bytecodes. For more information, see Instruction set state register, ISETSTATE.

• In Thumb state and ThumbEE state only, the condition codes that apply to the next one to four instructions. For more information, see IT block state register, ITSTATE on page A2-51.

• Whether data is interpreted as big-endian or little-endian. For more information, see Endianness mapping register, ENDIANSTATE.

In ARMv7-A and ARMv7-R, the execution state registers are part of the Current Program Status Register. For more information, see Program Status Registers (PSRs) on page B1-1147.

There is no direct access to the execution state registers from application level instructions, but they can be changed by side-effects of application level instructions.

A2.5.1 Instruction set state register, ISETSTATE

The instruction set state register, ISETSTATE, is a 2-bit field: bit[1] is the J bit and bit[0] is the T bit. The J bit and the T bit determine the current instruction set state for the processor. Table A2-1 shows the encoding of these bits.

ARM state The processor executes the ARM instruction set described in Chapter A5 ARM Instruction Set Encoding.

Thumb state The processor executes the Thumb instruction set as described in Chapter A6 Thumb Instruction Set Encoding.

Jazelle state The processor executes Java bytecodes as part of a Java Virtual Machine (JVM). For more information, see:

• Jazelle direct bytecode execution support on page A2-97, for application level information

• Jazelle direct bytecode execution on page B1-1241, for system level information.

ThumbEE state The processor executes a variation of the Thumb instruction set specifically targeted for use with dynamic compilation techniques associated with an execution environment. This can be Java or other execution environments. This feature is required in ARMv7-A, and optional in ARMv7-R. For more information, see:

• Thumb Execution Environment on page A2-95, for application level information

• Thumb Execution Environment on page B1-1240, for system level information.

Table A2-1 J and T bit encoding in ISETSTATE

J  T  Instruction set state
0  0  ARM
0  1  Thumb
1  0  Jazelle
1  1  ThumbEE


Pseudocode details of ISETSTATE operations

The following pseudocode functions return the current instruction set and select a new instruction set:

enumeration InstrSet {InstrSet_ARM, InstrSet_Thumb, InstrSet_Jazelle, InstrSet_ThumbEE};

// CurrentInstrSet()
// =================

InstrSet CurrentInstrSet()
    case ISETSTATE of
        when '00' result = InstrSet_ARM;
        when '01' result = InstrSet_Thumb;
        when '10' result = InstrSet_Jazelle;
        when '11' result = InstrSet_ThumbEE;
    return result;

// SelectInstrSet()
// ================

SelectInstrSet(InstrSet iset)
    case iset of
        when InstrSet_ARM
            if CurrentInstrSet() == InstrSet_ThumbEE then
                UNPREDICTABLE;
            else
                ISETSTATE = '00';
        when InstrSet_Thumb
            ISETSTATE = '01';
        when InstrSet_Jazelle
            ISETSTATE = '10';
        when InstrSet_ThumbEE
            ISETSTATE = '11';
    return;
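The J and T encoding in Table A2-1 and the two functions above can be mirrored in Python for experimentation. In this sketch, ISETSTATE is held as a 2-bit integer with J in bit[1] and T in bit[0]. The function names, the string names for the instruction sets, and the use of an exception for the UNPREDICTABLE ThumbEE-to-ARM case are assumptions of the model, not architectural behavior.

```python
# Table A2-1: (J, T) -> instruction set state.
ISET_BY_JT = {
    (0, 0): "ARM",
    (0, 1): "Thumb",
    (1, 0): "Jazelle",
    (1, 1): "ThumbEE",
}


class Unpredictable(Exception):
    """Stands in for an UNPREDICTABLE outcome in this model."""


def current_instr_set(isetstate: int) -> str:
    """Decode a 2-bit ISETSTATE value (J in bit[1], T in bit[0])."""
    j = (isetstate >> 1) & 1
    t = isetstate & 1
    return ISET_BY_JT[(j, t)]


def select_instr_set(isetstate: int, iset: str) -> int:
    """Return the new ISETSTATE value for a change to instruction set iset."""
    if iset == "ARM" and current_instr_set(isetstate) == "ThumbEE":
        raise Unpredictable("SelectInstrSet(InstrSet_ARM) from ThumbEE state")
    for (j, t), name in ISET_BY_JT.items():
        if name == iset:
            return (j << 1) | t
    raise ValueError(f"unknown instruction set: {iset!r}")
```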

A2.5.2 IT block state register, ITSTATE

The IT block state register, ITSTATE, is an 8-bit field, IT[7:0]. This field holds the If-Then execution state bits for the Thumb IT instruction, which applies to the IT block of one to four instructions that immediately follow the IT instruction. See IT on page A8-390 for a description of the IT instruction and the associated IT block.

ITSTATE divides into two subfields:

IT[7:5] Holds the base condition for the current IT block. The base condition is the top 3 bits of the condition code specified by the firstcond field of the IT instruction.

This subfield is 0b000 when no IT block is active.

IT[4:0] Encodes:

• The size of the IT block. This is the number of instructions that are to be conditionally executed. The size of the block is implied by the position of the least significant 1 in this field, as shown in Table A2-2 on page A2-52.

• The value of the least significant bit of the condition code for each instruction in the block.

This subfield is 0b00000 when no IT block is active.

Note

Changing the value of the least significant bit of a condition code from 0 to 1 has the effect of inverting the condition code.

When an IT instruction is executed, these bits are set according to the condition code in the instruction and the Then and Else (T and E) parameters in the instruction. For more information, see IT on page A8-390.

When permitted, an instruction in an IT block is conditional, see Conditional instructions on page A4-162 and Conditional execution on page A8-288. The condition code used is the current value of IT[7:4]. When an instruction in an IT block completes its execution normally, ITSTATE advances to the next line of Table A2-2. A few instructions, for example BKPT, cannot be conditional and therefore are always executed, ignoring the current ITSTATE.

For details of what happens if an instruction in an IT block:

• Takes an exception, see Overview of exception entry on page B1-1171.

• In ThumbEE state, causes a branch to a check handler, see IT block and check handlers on page A9-1114.

An instruction that might complete its normal execution by branching is permitted in an IT block only as the last instruction in the block. This means that normal execution of the instruction always results in ITSTATE advancing to normal execution.

On a branch or an exception return, if ITSTATE is set to a value that is not consistent with the instruction stream being branched to or returned to, then instruction execution is UNPREDICTABLE.

ITSTATE affects instruction execution only in Thumb and ThumbEE states. In ARM and Jazelle states, ITSTATE must be '00000000', otherwise the behavior is UNPREDICTABLE.

Pseudocode details of ITSTATE operations

ITSTATE advances after normal execution of an IT block instruction. This is described by the ITAdvance() pseudocode function:

// ITAdvance()
// ===========

ITAdvance()
    if ITSTATE.IT<2:0> == '000' then
        ITSTATE.IT = '00000000';
    else
        ITSTATE.IT<4:0> = LSL(ITSTATE.IT<4:0>, 1);

The following functions test whether the current instruction is in an IT block, and whether it is the last instruction of an IT block:

// InITBlock()
// ===========

boolean InITBlock()
    return (ITSTATE.IT<3:0> != '0000');

Table A2-2 Effect of IT execution state bits

IT bits (a)
[7:5]      [4]  [3]  [2]  [1]  [0]
cond_base  P1   P2   P3   P4   1    Entry point for 4-instruction IT block
cond_base  P1   P2   P3   1    0    Entry point for 3-instruction IT block
cond_base  P1   P2   1    0    0    Entry point for 2-instruction IT block
cond_base  P1   1    0    0    0    Entry point for 1-instruction IT block
000        0    0    0    0    0    Normal execution, not in an IT block

a. Combinations of the IT bits not shown in this table are reserved.


// LastInITBlock()
// ===============

boolean LastInITBlock()
    return (ITSTATE.IT<3:0> == '1000');
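The ITSTATE functions above can be exercised with a short Python model. As a hypothetical worked example, the block for ITTE EQ (Then, Then, Else, with base condition EQ = 0b0000) is taken to start with ITSTATE = 0x06, a value worked out by hand from Table A2-2: cond_base = 0b000 and P1 = 0 give EQ, P2 = 0 for the second (Then) instruction, P3 = 1 for the third (Else) instruction, followed by the 1, 0 entry-point bits of a 3-instruction block. The function names are assumptions that mirror the pseudocode.

```python
def in_it_block(itstate: int) -> bool:
    """InITBlock(): True while IT[3:0] is nonzero."""
    return (itstate & 0x0F) != 0


def last_in_it_block(itstate: int) -> bool:
    """LastInITBlock(): True when IT[3:0] == 0b1000."""
    return (itstate & 0x0F) == 0b1000


def current_cond(itstate: int) -> int:
    """Condition code applied to the current IT block instruction: IT[7:4]."""
    return (itstate >> 4) & 0x0F


def it_advance(itstate: int) -> int:
    """ITAdvance(): clear ITSTATE after the last instruction, else shift IT[4:0] left."""
    if (itstate & 0b111) == 0:
        return 0
    # Keep IT[7:5]; IT[4:0] = LSL(IT[4:0], 1), truncated to 5 bits.
    return (itstate & 0xE0) | ((itstate << 1) & 0x1F)


# ITTE EQ, starting value worked out by hand from Table A2-2.
itstate = 0x06
conditions = []
while in_it_block(itstate):
    conditions.append(current_cond(itstate))
    itstate = it_advance(itstate)
print([f"{c:04b}" for c in conditions])  # prints ['0000', '0000', '0001'] (EQ, EQ, NE)
```

The third instruction executes under NE because its least significant condition bit has been inverted, and ITSTATE returns to 0 after it completes.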

A2.5.3 Endianness mapping register, ENDIANSTATE

ARMv7-A and ARMv7-R support configuration between little-endian and big-endian interpretations of data memory, as shown in Table A2-3. The endianness is controlled by ENDIANSTATE.

The ARM and Thumb instruction sets both include an instruction to manipulate ENDIANSTATE:

SETEND BE Sets ENDIANSTATE to 1, for big-endian operation.

SETEND LE Sets ENDIANSTATE to 0, for little-endian operation.

The SETEND instruction is unconditional. For more information, see SETEND on page A8-604.

Pseudocode details of ENDIANSTATE operations

The BigEndian() pseudocode function tests whether big-endian memory accesses are currently selected.

// BigEndian()
// ===========

boolean BigEndian()
    return (ENDIANSTATE == '1');

Table A2-3 ENDIANSTATE encoding of endianness

ENDIANSTATE  Endian mapping
0            Little-endian
1            Big-endian
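To see what ENDIANSTATE changes in practice, the following Python sketch reads the same four bytes as a 32-bit word under each setting. The byte-string memory, the load_word helper, and the simplification of ignoring alignment and the rest of the memory system are assumptions made for illustration.

```python
def big_endian(endianstate: int) -> bool:
    """Model of BigEndian(): True when ENDIANSTATE is 1."""
    return endianstate == 1


def load_word(memory: bytes, address: int, endianstate: int) -> int:
    """Read a 32-bit word from byte-addressed memory using the selected mapping."""
    data = memory[address:address + 4]
    if len(data) != 4:
        raise IndexError("word access runs past the end of the modelled memory")
    return int.from_bytes(data, "big" if big_endian(endianstate) else "little")
```

After SETEND BE, the bytes 0x11, 0x22, 0x33, 0x44 at address 0 read as 0x11223344; after SETEND LE they read as 0x44332211.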


A2.6 Advanced SIMD and Floating-point Extensions

Advanced SIMD and Floating-point (VFP) are two OPTIONAL extensions to ARMv7.

The Advanced SIMD Extension performs packed Single Instruction Multiple Data (SIMD) operations, either integer or single-precision floating-point. The Floating-point Extension performs single-precision or double-precision floating-point operations.

Both extensions permit floating-point exceptions, such as overflow or division by zero, to be handled without trapping. When handled in this way, a floating-point exception causes a cumulative status register bit to be set to 1 and a default result to be produced by the operation.

The ARMv7 Floating-point Extension implementation can be VFPv3 or VFPv4, see Architecture extensions on page A1-32. ARMv7 also defines variants of VFPv3 and VFPv4, VFPv3U and VFPv4U, that support the trapping of floating-point exceptions, see VFPv3U and VFPv4U on page A2-62. VFPv2 also supports the trapping of floating-point exceptions.

The Advanced SIMD implementation can be Advanced SIMDv1 or Advanced SIMDv2.

If an implementation includes both the Advanced SIMD and the Floating-point Extensions then the versions of the two extensions must align, as described in Instruction set architecture extensions on page A1-32.

For more information about floating-point exceptions see Floating-point exceptions on page A2-70.

Each version of these extensions can be implemented at a number of levels. Table A2-4 shows the permitted combinations of implementations of the two extensions.

The Half-precision Extension provides conversion functions in both directions between half-precision floating-point and single-precision floating-point. This extension:

• Can be implemented with any Advanced SIMDv1 or VFPv3 implementation that supports single-precision floating-point, and the Half-precision extension applies to both VFP and Advanced SIMD if they are both implemented.

• Is included in any Advanced SIMDv2 or VFPv4 implementation that supports single-precision floating-point.

For system level information about the Advanced SIMD and Floating-point Extensions see Advanced SIMD and floating-point support on page B1-1229.

Table A2-4 Permitted combinations of Advanced SIMD and Floating-point Extensions

Advanced SIMD                                 Floating-point (VFP)
Not implemented                               Not implemented
Integer only                                  Not implemented
Integer and single-precision floating-point   Single-precision floating-point only (a)
Integer and single-precision floating-point   Single-precision and double-precision floating-point
Not implemented                               Single-precision floating-point only (a)
Not implemented                               Single-precision and double-precision floating-point

a. Must be able to load and store double-precision data using the bottom 16 double-precision registers, D0-D15.


Note

Before ARMv7, the Floating-point Extension was called the Vector Floating-point Architecture, and was used for vector operations. For details of these deprecated operations see Appendix D11 VFP Vector Operation Support. In ARMv7:

• ARM recommends that the Advanced SIMD Extension is used for single-precision vector floating-point operations.

• An implementation that requires support for vector operations must implement the Advanced SIMD Extension.

A2.6.1 Floating-point standards, and terminology

The ARM floating-point implementation includes support for all the required features of ANSI/IEEE Std 754-2008, IEEE Standard for Binary Floating-Point Arithmetic, referred to as IEEE 754-2008. However, the original implementation was based on the 1985 version of this standard, referred to as IEEE 754-1985. In this manual:

• Floating-point terminology generally uses the IEEE 754-1985 terms. This section summarizes how IEEE 754-2008 changes these terms.

• References to IEEE 754 that do not include the issue year apply to either issue of the standard.

Table A2-5 shows how the terminology in this manual differs from that used in IEEE 754-2008.

The fused multiply add operations were first defined in IEEE 754-2008, and are introduced in VFPv4 and Advanced SIMDv2. The following sections describe the instructions that perform these operations:

• VFMA, VFMS on page A8-892

• VFNMA, VFNMS on page A8-894.

All other ARMv7 floating-point operations are defined in both issues of IEEE 754.

Note

ARMv7 does not support the IEEE 754-2008 roundTiesToAway rounding mode. However, IEEE 754 compliance does not require support for this mode.

Table A2-5 Floating-point terminology

This manual, based on IEEE 754-1985 (a)   IEEE 754-2008
Normalized                                Normal
Denormal, or denormalized                 Subnormal
Round towards Minus Infinity              roundTowardNegative
Round towards Plus Infinity               roundTowardPositive
Round towards Zero                        roundTowardZero
Round to Nearest                          roundTiesToEven
Rounding mode                             Rounding-direction attribute

a. Except that normalized number is used in preference to normal number, because of the other specific uses of normal in this manual.


A2.6.2 Advanced SIMD and Floating-point Extension registers

From VFPv3, the Advanced SIMD and Floating-point (VFP) Extensions use the same register set. This is distinct from the ARM core register set. These registers are generally referred to as the extension registers.

The extension register set consists of either thirty-two or sixteen doubleword registers, as follows:

• If VFPv2 is implemented, it consists of sixteen doubleword registers.

• If VFPv3 is implemented, it consists of either thirty-two or sixteen doubleword registers. Where necessary, these two implementation options are distinguished using the terms:

— VFPv3-D32, for an implementation with thirty-two doubleword registers

— VFPv3-D16, for an implementation with sixteen doubleword registers.

• If VFPv4 is implemented, it consists of either thirty-two or sixteen doubleword registers. Where necessary, these two implementation options are distinguished using the terms:

— VFPv4-D32, for an implementation with thirty-two doubleword registers

— VFPv4-D16, for an implementation with sixteen doubleword registers.

• If Advanced SIMD is implemented, it consists of thirty-two doubleword registers.

• If Advanced SIMD and Floating-point are both implemented, Floating-point must be implemented as VFPv3-D32 or VFPv4-D32.

The Advanced SIMD and Floating-point views of the extension register set are not identical. The following sections describe these different views.

Figure A2-1 on page A2-57 shows the views of the extension register set, and the way the word, doubleword, and quadword registers overlap.

Advanced SIMD views of the extension register set

Advanced SIMD can view this register set as:

• Sixteen 128-bit quadword registers, Q0-Q15.

• Thirty-two 64-bit doubleword registers, D0-D31. This view is also available in VFPv3-D32 and VFPv4-D32.

These views can be used simultaneously. For example, a program might hold 64-bit vectors in D0 and D1 and a 128-bit vector in Q1.

Floating-point views of the extension register set

In VFPv4-D32 or VFPv3-D32, the extension register set consists of thirty-two doubleword registers, which VFP can view as:

• Thirty-two 64-bit doubleword registers, D0-D31. This view is also available in Advanced SIMD.

• Thirty-two 32-bit single word registers, S0-S31. Only half of the set is accessible in this view.

In VFPv4-D16, VFPv3-D16, and VFPv2, the extension register set consists of sixteen doubleword registers, which VFP can view as:

• Sixteen 64-bit doubleword registers, D0-D15.

• Thirty-two 32-bit single word registers, S0-S31.

In each case, the two views can be used simultaneously.


Advanced SIMD and Floating-point register mapping

Figure A2-1 shows the different views of Advanced SIMD and Floating-point register banks, and the relationship between them.

Figure A2-1 Advanced SIMD and Floating-point Extensions register set

The mapping between the registers is as follows:

• S<2n> maps to the least significant half of D<n>

• S<2n+1> maps to the most significant half of D<n>

• D<2n> maps to the least significant half of Q<n>

• D<2n+1> maps to the most significant half of Q<n>

For example, software can access the least significant half of the elements of a vector in Q6 by referring to D12, and the most significant half of the elements by referring to D13.
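The aliasing rules above can be demonstrated with a small Python model of the register file, in which one list of thirty-two 64-bit values backs the S, D, and Q views. The class and method names are hypothetical, and the model assumes a 32-register implementation (VFPSmallRegisterBank() returning FALSE), so it does not model UNDEFINED accesses to D16-D31 in a 16-register configuration.

```python
MASK32 = 0xFFFF_FFFF
MASK64 = 0xFFFF_FFFF_FFFF_FFFF


class ExtensionRegisters:
    """Hypothetical model: thirty-two 64-bit D registers shared by the S, D and Q views."""

    def __init__(self) -> None:
        self._d = [0] * 32

    def read_d(self, n: int) -> int:
        assert 0 <= n <= 31
        return self._d[n]

    def write_d(self, n: int, value: int) -> None:
        assert 0 <= n <= 31
        self._d[n] = value & MASK64

    def read_s(self, n: int) -> int:
        # S<2n> is the low half of D<n>; S<2n+1> is the high half.
        assert 0 <= n <= 31
        d = self._d[n // 2]
        return (d >> 32) & MASK32 if n % 2 else d & MASK32

    def write_s(self, n: int, value: int) -> None:
        assert 0 <= n <= 31
        d = self._d[n // 2]
        value &= MASK32
        if n % 2:
            d = (d & MASK32) | (value << 32)
        else:
            d = (d & (MASK32 << 32)) | value
        self._d[n // 2] = d

    def read_q(self, n: int) -> int:
        # Q<n> is D<2n+1>:D<2n>.
        assert 0 <= n <= 15
        return (self._d[2 * n + 1] << 64) | self._d[2 * n]

    def write_q(self, n: int, value: int) -> None:
        assert 0 <= n <= 15
        self._d[2 * n] = value & MASK64
        self._d[2 * n + 1] = (value >> 64) & MASK64
```

Writing Q6 and then reading D12 and D13 returns its low and high halves, matching the example in the text.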

Pseudocode details of Advanced SIMD and Floating-point Extension registers

The pseudocode function VFPSmallRegisterBank() returns FALSE if all of the 32 registers D0-D31 can be accessed, and TRUE if only the 16 registers D0-D15 can be accessed:

boolean VFPSmallRegisterBank()

In more detail, VFPSmallRegisterBank():

• returns TRUE for a VFPv2, VFPv3-D16, or VFPv4-D16 implementation

• for a VFPv3-D32 or VFPv4-D32 implementation:

— returns FALSE if CPACR.D32DIS is set to 0

— returns TRUE if CPACR.D32DIS and CPACR.ASEDIS are both set to 1

— results in UNPREDICTABLE behavior if CPACR.D32DIS is set to 1 and CPACR.ASEDIS is set to 0.

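Figure A2-1 (summary of the diagram): in VFPv2, VFPv3-D16, or VFPv4-D16, the VFP-only single word registers S0-S31 overlay D0-D15. In VFPv3-D32, VFPv4-D32, or Advanced SIMD, the doubleword registers D0-D31 are available, and D16-D31 have no S register aliases. The Advanced SIMD-only quadword registers Q0-Q15 overlay D0-D31.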


For details of the CPACR, see either:

• CPACR, Coprocessor Access Control Register, VMSA on page B4-1551

• CPACR, Coprocessor Access Control Register, PMSA on page B6-1831.
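The case analysis for VFPSmallRegisterBank() given above can be written directly as a function. In this sketch, the CPACR.D32DIS and CPACR.ASEDIS values are passed in as plain integers rather than read from the register, has_32_registers is a hypothetical flag meaning a VFPv3-D32 or VFPv4-D32 implementation, and the UNPREDICTABLE combination is modelled by raising an exception.

```python
class Unpredictable(Exception):
    """Stands in for UNPREDICTABLE behavior in this model."""


def vfp_small_register_bank(has_32_registers: bool, d32dis: int, asedis: int) -> bool:
    """Return True when only D0-D15 are accessible, following the rules above."""
    if not has_32_registers:
        # VFPv2, VFPv3-D16 or VFPv4-D16.
        return True
    # VFPv3-D32 or VFPv4-D32.
    if d32dis == 0:
        return False
    if asedis == 1:
        return True
    raise Unpredictable("CPACR.D32DIS == 1 with CPACR.ASEDIS == 0")
```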

The following functions provide the S0-S31, D0-D31, and Q0-Q15 views of the registers:

// The 64-bit extension register bank for Advanced SIMD and VFP.
array bits(64) _D[0..31];

// Clone the 64-bit Advanced SIMD and VFP extension register bank for use as input to
// instruction pseudocode, to avoid read-after-write for Advanced SIMD and VFP operations.
array bits(64) _Dclone[0..31];

// S[] - non-assignment form
// =========================

bits(32) S[integer n]
    assert n >= 0 && n <= 31;
    if (n MOD 2) == 0 then
        result = D[n DIV 2]<31:0>;
    else
        result = D[n DIV 2]<63:32>;
    return result;

// S[] - assignment form
// =====================

S[integer n] = bits(32) value
    assert n >= 0 && n <= 31;
    if (n MOD 2) == 0 then
        D[n DIV 2]<31:0> = value;
    else
        D[n DIV 2]<63:32> = value;
    return;

// D[] - non-assignment form
// =========================

bits(64) D[integer n]
    assert n >= 0 && n <= 31;
    if n >= 16 && VFPSmallRegisterBank() then UNDEFINED;
    return _D[n];

// D[] - assignment form
// =====================

D[integer n] = bits(64) value
    assert n >= 0 && n <= 31;
    if n >= 16 && VFPSmallRegisterBank() then UNDEFINED;
    _D[n] = value;
    return;

// Q[] - non-assignment form
// =========================

bits(128) Q[integer n]
    assert n >= 0 && n <= 15;
    return D[2*n+1]:D[2*n];

// Q[] - assignment form
// =====================

Q[integer n] = bits(128) value
    assert n >= 0 && n <= 15;
    D[2*n] = value<63:0>;
    D[2*n+1] = value<127:64>;
    return;

The Din[] function returns a Doubleword register from the _Dclone[] copy of the Advanced SIMD and Floating-point register bank, and the Qin[] function returns a Quadword register from that register bank.

Note

The CheckAdvancedSIMDEnabled() function copies the _D[] register bank to _Dclone[], see Pseudocode details of enabling the Advanced SIMD and Floating-point Extensions on page B1-1235.

// Din[] - non-assignment form
// ===========================

bits(64) Din[integer n]
    assert n >= 0 && n <= 31;
    if n >= 16 && VFPSmallRegisterBank() then UNDEFINED;
    return _Dclone[n];

// Qin[] - non-assignment form
// ===========================

bits(128) Qin[integer n]
    assert n >= 0 && n <= 15;
    return Din[2*n+1]:Din[2*n];

A2.6.3 Data types supported by the Advanced SIMD Extension

In an implementation that includes the Advanced SIMD Extension, the Advanced SIMD instructions can operate on integer and floating-point data, and the extension defines a set of data types to represent the different data formats. Table A2-6 shows the available formats. Each instruction description specifies the data types that the instruction supports.

Polynomial arithmetic over {0, 1} on page A2-93 describes the polynomial data type.

The .F16 data type is the half-precision data type selected by the FPSCR.AHP bit. It is supported only if an implementation includes the Half-precision extension.

The .F32 data type is the ARM standard single-precision floating-point data type, see Advanced SIMD and Floating-point single-precision format on page A2-64.

Table A2-6 Advanced SIMD data types

Data type specifier  Meaning
.<size>              Any element of <size> bits
.F<size>             Floating-point number of <size> bits
.I<size>             Signed or unsigned integer of <size> bits
.P<size>             Polynomial over {0, 1} of degree less than <size>
.S<size>             Signed integer of <size> bits
.U<size>             Unsigned integer of <size> bits


The instruction definitions use a data type specifier to define the data types appropriate to the operation. Figure A2-2 shows the hierarchy of Advanced SIMD data types.

Figure A2-2 Advanced SIMD data type hierarchy

For example, a multiply instruction must distinguish between integer and floating-point data types.

An integer multiply instruction that generates a double-width (long) result must specify the input data types as signed or unsigned. However, some integer multiply instructions use modulo arithmetic, and therefore do not have to distinguish between signed and unsigned inputs.

A2.6.4 Advanced SIMD vectors

In an implementation that includes the Advanced SIMD Extension, a register can hold one or more packed elements, all of the same size and type. The combination of a register and a data type describes a vector of elements. The vector is considered to be an array of elements of the data type specified in the instruction. The number of elements in the vector is implied by the size of the data elements and the size of the register.

Vector indices are in the range 0 to (number of elements - 1). An index of 0 refers to the least significant end of the vector. Figure A2-3 on page A2-61 shows examples of Advanced SIMD vectors:

Figure A2-2 content (data type hierarchy, summarized from the diagram):

.64: .I64 (.S64, .U64)
.32: .I32 (.S32, .U32), .F32
.16: .I16 (.S16, .U16), .P16 †, .F16 ‡
.8: .I8 (.S8, .U8), .P8

† Output format only. See VMULL instruction description.

‡ Supported only if the implementation includes the Half-precision Extension.


Figure A2-3 Examples of Advanced SIMD vectors

Pseudocode details of Advanced SIMD vectors

The pseudocode function Elem[] accesses the element of a specified index and size in a vector:

// Elem[] - non-assignment form
// ============================

bits(size) Elem[bits(N) vector, integer e, integer size]
    assert e >= 0 && (e+1)*size <= N;
    return vector<(e+1)*size-1:e*size>;

// Elem[] - assignment form
// ========================

Elem[bits(N) vector, integer e, integer size] = bits(size) value
    assert e >= 0 && (e+1)*size <= N;
    vector<(e+1)*size-1:e*size> = value;
    return;
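Because Python integers behave as unbounded bit vectors, Elem[] translates almost directly. In this sketch the vector is an int of width n bits, and the assignment form returns a new integer instead of updating the vector in place; the names elem and set_elem are hypothetical.

```python
def elem(vector: int, e: int, size: int, n: int) -> int:
    """Non-assignment form: return element e, of width size, from an n-bit vector."""
    assert e >= 0 and (e + 1) * size <= n
    return (vector >> (e * size)) & ((1 << size) - 1)


def set_elem(vector: int, e: int, size: int, n: int, value: int) -> int:
    """Assignment form: return a copy of vector with element e replaced by value."""
    assert e >= 0 and (e + 1) * size <= n
    mask = ((1 << size) - 1) << (e * size)
    return (vector & ~mask) | ((value << (e * size)) & mask)
```

For example, viewing the 64-bit value 0x1122334455667788 as .16 elements, element [0] is 0x7788 and element [3] is 0x1122, consistent with index 0 being the least significant end of the vector.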

A2.6.5 Advanced SIMD and Floating-point system registers

The Advanced SIMD and Floating-point (VFP) Extensions have a shared register space for system registers. Only one register in this space is accessible at the Application level, see either:

• FPSCR, Floating-point Status and Control Register, VMSA on page B4-1570

• FPSCR, Floating-point Status and Control Register, PMSA on page B6-1847.

Note

In this chapter, short links to the FPSCR are to the description in Chapter B4 System Control Registers in a VMSA implementation. The FPSCR description in Chapter B6 System Control Registers in a PMSA implementation is identical to this description.

Writes to the FPSCR can have side-effects on various aspects of processor operation. All of these side-effects are synchronous to the FPSCR write. This means they are guaranteed not to be visible to earlier instructions in the execution stream, and they are guaranteed to be visible to later instructions in the execution stream.

See Advanced SIMD and Floating-point Extension system registers on page B1-1236 for the system level view of the registers.

Figure A2-3 content (summarized from the diagram): a 128-bit vector of four single-precision (32-bit) floating-point numbers, .F32 elements [3] to [0]; a 128-bit vector of eight 16-bit signed integers, .S16 elements [7] to [0]; a 64-bit vector of two 32-bit signed integers, .S32 elements [1] and [0]; and a 64-bit vector of four 16-bit unsigned integers, .U16 elements [3] to [0]. In each case, element [0] occupies the least significant bits.


A2.6.6 VFPv3U and VFPv4U

The VFPv3 and VFPv4 versions of the Floating-point Extension do not support the exception trap enable bits in the FPSCR. With these versions of the Floating-point Extension, all floating-point exceptions are untrapped.

The VFPv3U variant of the VFPv3 extension, and the VFPv4U variant of the VFPv4 extension, implement exception trap enable bits in the FPSCR, and provide exception handling as described in Floating-point support code on page B1-1237. There is a separate trap enable bit for each of the six floating-point exceptions described in Floating-point exceptions on page A2-70. Except for support for this trapping mechanism:

• the VFPv3U architecture is identical to VFPv3

• the VFPv4U architecture is identical to VFPv4.

Trapped exception handling never causes the corresponding cumulative exception bit of the FPSCR to be set to 1. If this behavior is desired, the trap handler routine must use a read, modify, write sequence on the FPSCR to set the cumulative exception bit.
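The read, modify, write step that a trap handler needs can be modelled with the FPSCR as a plain 32-bit integer. The cumulative exception bit positions used below (IOC bit[0], DZC bit[1], OFC bit[2], UFC bit[3], IXC bit[4], IDC bit[7]) come from the FPSCR register description rather than from this section, and the function name is hypothetical. On hardware, the read and the write would be performed with the VMRS and VMSR instructions.

```python
# Cumulative exception bits in the FPSCR (layout assumed from the FPSCR description).
CUMULATIVE_BIT = {
    "IOC": 0,  # Invalid Operation
    "DZC": 1,  # Division by Zero
    "OFC": 2,  # Overflow
    "UFC": 3,  # Underflow
    "IXC": 4,  # Inexact
    "IDC": 7,  # Input Denormal
}


def set_cumulative_bit(fpscr: int, flag: str) -> int:
    """Return the FPSCR value with one cumulative exception bit set to 1.

    Models the read (the fpscr argument), the modify (OR in one bit) and the
    write (the returned value); every other FPSCR bit is preserved.
    """
    return (fpscr | (1 << CUMULATIVE_BIT[flag])) & 0xFFFF_FFFF
```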

Both VFPv3U and VFPv4U can be implemented with either thirty-two or sixteen doubleword registers. That is:

• VFPv3U can be implemented as VFPv3U-D32, or as VFPv3U-D16

• VFPv4U can be implemented as VFPv4U-D32, or as VFPv4U-D16.

VFPv3U-D16 and VFPv4U-D16 are backwards compatible with VFPv2.


A2.7 Floating-point data types and arithmetic

The Floating-point (VFP) Extension supports single-precision (32-bit) and double-precision (64-bit) floating-point data types and arithmetic as defined by the IEEE 754 floating-point standard. It also supports the half-precision (16-bit) floating-point data type for data storage only, by supporting conversions between single-precision and half-precision data types.

ARM standard floating-point arithmetic means IEEE 754 floating-point arithmetic with the following restrictions:

• denormalized numbers are flushed to zero, see Flush-to-zero on page A2-68

• only default NaNs are supported, see NaN handling and the Default NaN on page A2-69

• the Round to Nearest rounding mode is selected, by setting FPSCR.RMode to 0b00

• untrapped exception handling is selected for all floating-point exceptions, by setting FPSCR[15, 12:8] to 0b000000.
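Whether an FPSCR value selects ARM standard floating-point arithmetic can be checked mechanically. This Python sketch uses the RMode field and the FPSCR[15, 12:8] trap enable bits named in the list above. The bit positions of RMode (bits[23:22]), FZ (bit[24], flush-to-zero) and DN (bit[25], default NaN) are taken from the FPSCR register description and are assumptions of this sketch, as is the function name.

```python
FZ_BIT = 24          # flush-to-zero mode (assumed position)
DN_BIT = 25          # default NaN mode (assumed position)
RMODE_SHIFT = 22     # RMode is bits[23:22] (assumed position)
TRAP_ENABLE_MASK = (1 << 15) | (0b11111 << 8)  # FPSCR[15, 12:8]


def is_arm_standard(fpscr: int) -> bool:
    """True if fpscr selects ARM standard floating-point arithmetic."""
    flush_to_zero = ((fpscr >> FZ_BIT) & 1) == 1
    default_nan = ((fpscr >> DN_BIT) & 1) == 1
    round_to_nearest = ((fpscr >> RMODE_SHIFT) & 0b11) == 0b00
    untrapped = (fpscr & TRAP_ENABLE_MASK) == 0
    return flush_to_zero and default_nan and round_to_nearest and untrapped
```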

In ARMv7 implementations, trapped floating-point exception handling is supported in the VFPv3U and VFPv4U variants of the Floating-point Extension, see VFPv3U and VFPv4U on page A2-62. In implementations of previous architecture versions, it is supported in VFPv2.

The Advanced SIMD Extension supports only single-precision ARM standard floating-point arithmetic.

Note

Implementations of the Floating-point Extension require support code to be installed in the system if trapped floating-point exception handling is required. See Floating-point support code on page B1-1237.

Some implementations might also require support code to support other aspects of their floating-point arithmetic. However, with the ARMv7 architecture, ARM deprecates using support code in this way.

It is IMPLEMENTATION DEFINED which aspects of Floating-point Extension floating-point arithmetic are supported in a system without support code installed.

Aspects of floating-point arithmetic that are implemented in support code are likely to run much more slowly than those that are executed in hardware.

ARM recommends that:

• To maximize the chance of getting high floating-point performance, software developers use ARM standard floating-point arithmetic.

• Software developers check whether their systems have support code installed, and if not, observe the IMPLEMENTATION DEFINED restrictions on what operations their Floating-point Extension implementation can handle without support code.

• Floating-point Extension implementation developers implement at least ARM standard floating-point arithmetic in hardware, so that it can be executed without any need for support code.

The following sections give more information about ARM floating-point data types and arithmetic:
• ARM standard floating-point input and output values on page A2-64.
• Advanced SIMD and Floating-point single-precision format on page A2-64.
• Floating-point double-precision format on page A2-65.
• Advanced SIMD and Floating-point half-precision formats on page A2-66.
• Flush-to-zero on page A2-68.
• NaN handling and the Default NaN on page A2-69.
• Floating-point exceptions on page A2-70.
• Pseudocode details of floating-point operations on page A2-73.

A2 Application Level Programmers’ Model

A2.7 Floating-point data types and arithmetic

Non-Confidential ID051414

A2.7.1 ARM standard floating-point input and output values

ARM standard floating-point arithmetic supports the following input formats defined by the IEEE 754 floating-point standard:
• Zeros.
• Normalized numbers.
• Denormalized numbers are flushed to 0 before floating-point operations, see Flush-to-zero on page A2-68.
• NaNs.
• Infinities.

ARM standard floating-point arithmetic supports the Round to Nearest rounding mode defined by the IEEE 754 standard.

ARM standard floating-point arithmetic supports the following output result formats defined by the IEEE 754 standard:
• Zeros.
• Normalized numbers.
• Results that are less than the minimum normalized number are flushed to zero, see Flush-to-zero on page A2-68.
• NaNs produced in floating-point operations are always the default NaN, see NaN handling and the Default NaN on page A2-69.
• Infinities.

A2.7.2 Advanced SIMD and Floating-point single-precision format

The single-precision floating-point format used by the Advanced SIMD and Floating-point Extensions is as defined by the IEEE 754 standard.

This description includes ARM-specific details that are left open by the standard. It is only intended as an introduction to the formats and to the values they can contain. For full details, especially of the handling of infinities, NaNs and signed zeros, see the IEEE 754 standard.

A single-precision value is a 32-bit word with the format:

bit[31]: S (sign); bits[30:23]: exponent; bits[22:0]: fraction.

The interpretation of the format depends on the value of the exponent field, bits[30:23]:

0 < exponent < 0xFF

The value is a normalized number and is equal to:

$$(-1)^S \times 2^{(\text{exponent}-127)} \times (1.\text{fraction})$$

The minimum positive normalized number is $2^{-126}$, or approximately $1.175 \times 10^{-38}$.

The maximum positive normalized number is $(2 - 2^{-23}) \times 2^{127}$, or approximately $3.403 \times 10^{38}$.

exponent == 0

The value is either a zero or a denormalized number, depending on the fraction bits:

fraction == 0

The value is a zero. There are two distinct zeros:

+0 When S==0.

–0 When S==1.


These usually behave identically. In particular, +0 and –0 compare as equal when they are compared as floating-point numbers. However, they yield different results in some circumstances. For example, the sign of the infinity produced by dividing by zero depends on the sign of the zero. The two zeros can be distinguished from each other by performing an integer comparison of the two words.

fraction != 0

The value is a denormalized number and is equal to:

$$(-1)^S \times 2^{-126} \times (0.\text{fraction})$$

The minimum positive denormalized number is $2^{-149}$, or approximately $1.401 \times 10^{-45}$.

Denormalized numbers are always flushed to zero in the Advanced SIMD Extension. They are optionally flushed to zero in the Floating-point Extension. For details see Flush-to-zero on page A2-68.

exponent == 0xFF

The value is either an infinity or a Not a Number (NaN), depending on the fraction bits:

fraction == 0

The value is an infinity. There are two distinct infinities:

+infinity When S==0. This represents all positive numbers that are too big to be represented accurately as a normalized number.

-infinity When S==1. This represents all negative numbers with an absolute value that is too big to be represented accurately as a normalized number.

fraction != 0

The value is a NaN, and is either a quiet NaN or a signaling NaN.

In the Floating-point Extension, the two types of NaN are distinguished on the basis of their most significant fraction bit, bit[22]:

bit[22] == 0 The NaN is a signaling NaN. The sign bit can take any value, and the remaining fraction bits can take any value except all zeros.

bit[22] == 1 The NaN is a quiet NaN. The sign bit and remaining fraction bits can take any value.

For details of the default NaN see NaN handling and the Default NaN on page A2-69.

Note

NaNs with different sign or fraction bits are distinct NaNs, but this does not mean software can use floating-point comparison instructions to distinguish them. This is because the IEEE 754 standard specifies that a NaN compares as unordered with everything, including itself.
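The following Python sketch applies the single-precision interpretation rules above to a raw 32-bit word. It is illustrative only: `decode_single` and `bits_of` are hypothetical names, the value is returned as an exact `Fraction`, and the host's `struct` module is assumed to perform IEEE 754 single-precision packing.

```python
import struct
from fractions import Fraction


def decode_single(word: int):
    """Classify a 32-bit single-precision word, returning (kind, sign, value).

    value is an exact Fraction for zeros, normalized and denormalized numbers,
    and None for infinities and NaNs.
    """
    sign = (word >> 31) & 1
    exponent = (word >> 23) & 0xFF   # bits[30:23]
    fraction = word & 0x7FFFFF       # bits[22:0]
    s = -1 if sign else 1
    if exponent == 0xFF:
        if fraction == 0:
            return ("infinity", sign, None)
        # The most significant fraction bit, bit[22], separates quiet from signaling NaNs.
        return ("qnan" if fraction >> 22 else "snan", sign, None)
    if exponent == 0:
        if fraction == 0:
            return ("zero", sign, Fraction(0))
        # Denormalized: (-1)^S * 2^-126 * (0.fraction)
        return ("denormal", sign, s * Fraction(fraction, 1 << 23) / (1 << 126))
    # Normalized: (-1)^S * 2^(exponent-127) * (1.fraction)
    return ("normal", sign, s * (1 + Fraction(fraction, 1 << 23)) * Fraction(2) ** (exponent - 127))


def bits_of(x: float) -> int:
    """Round a Python float to single precision and return its 32-bit word."""
    return struct.unpack("<I", struct.pack("<f", x))[0]
```

`0.0 == -0.0` is true, yet `bits_of(0.0)` is `0x00000000` and `bits_of(-0.0)` is `0x80000000`, which is the integer comparison of the two words that distinguishes the zeros.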

A2.7.3 Floating-point double-precision format

The double-precision floating-point format used by the Floating-point Extension is as defined by the IEEE 754 standard.

This description includes Floating-point Extension-specific details that are left open by the standard. It is only intended as an introduction to the formats and to the values they can contain. For full details, especially of the handling of infinities, NaNs and signed zeros, see the IEEE 754 standard.

A double-precision value is a 64-bit doubleword, with the format:

bit[63]: S (sign); bits[62:52]: exponent; bits[51:0]: fraction. Fraction bits[51:32] are in the most significant word and fraction bits[31:0] are in the least significant word.


Double-precision values represent numbers, infinities and NaNs in a similar way to single-precision values, with the interpretation of the format depending on the value of the exponent:

0 < exponent < 0x7FF

The value is a normalized number and is equal to:

$$(-1)^S \times 2^{(\text{exponent}-1023)} \times (1.\text{fraction})$$

The minimum positive normalized number is $2^{-1022}$, or approximately $2.225 \times 10^{-308}$.

The maximum positive normalized number is $(2 - 2^{-52}) \times 2^{1023}$, or approximately $1.798 \times 10^{308}$.

exponent == 0

The value is either a zero or a denormalized number, depending on the fraction bits:

fraction == 0

The value is a zero. There are two distinct zeros that behave analogously to the two

single-precision zeros:

+0 when S==0

–0 when S==1.

fraction != 0

The value is a denormalized number and is equal to:

$$(-1)^S \times 2^{-1022} \times (0.\text{fraction})$$

The minimum positive denormalized number is $2^{-1074}$, or approximately $4.941 \times 10^{-324}$.

Optionally, denormalized numbers are flushed to zero in the Floating-point Extension. For details see Flush-to-zero on page A2-68.

exponent == 0x7FF

The value is either an infinity or a NaN, depending on the fraction bits:

fraction == 0

The value is an infinity. As for single-precision, there are two infinities:

+infinity When S==0.

-infinity When S==1.

fraction != 0

The value is a NaN, and is either a quiet NaN or a signaling NaN.

In the Floating-point Extension, the two types of NaN are distinguished on the basis of their most significant fraction bit, bit[51] of the doubleword (bit[19] of the most significant word):

bit[51] == 0 The NaN is a signaling NaN. The sign bit can take any value, and the remaining fraction bits can take any value except all zeros.

bit[51] == 1 The NaN is a quiet NaN. The sign bit and the remaining fraction bits can take any value.

For details of the default NaN see NaN handling and the Default NaN on page A2-69.

Note

NaNs with different sign or fraction bits are distinct NaNs, but this does not mean software can use floating-point comparison instructions to distinguish them. This is because the IEEE 754 standard specifies that a NaN compares as unordered with everything, including itself.
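Python's `float` is an IEEE 754 binary64 value on common platforms, so the double-precision limits quoted above can be checked directly. This is a sketch under that assumption; `double_fields` and `is_quiet_nan64` are hypothetical helpers that test the exponent and bit[51] rules on raw 64-bit patterns.

```python
import math
import struct
import sys


def double_fields(x: float):
    """Split a Python float into its (sign, exponent, fraction) fields."""
    word = struct.unpack("<Q", struct.pack("<d", x))[0]
    return word >> 63, (word >> 52) & 0x7FF, word & ((1 << 52) - 1)


def is_quiet_nan64(word: int) -> bool:
    """True for a quiet NaN: exponent == 0x7FF, fraction != 0 and bit[51] == 1."""
    exponent = (word >> 52) & 0x7FF
    fraction = word & ((1 << 52) - 1)
    return exponent == 0x7FF and fraction != 0 and bool((fraction >> 51) & 1)


# The limits from the text; math.ldexp(m, e) computes m * 2**e exactly here.
MIN_NORMAL = math.ldexp(1.0, -1022)
MAX_NORMAL = math.ldexp(2.0 - math.ldexp(1.0, -52), 1023)
MIN_DENORMAL = math.ldexp(1.0, -1074)

# On an IEEE 754 binary64 platform these match the interpreter's own limits.
assert MIN_NORMAL == sys.float_info.min
assert MAX_NORMAL == sys.float_info.max
```

`double_fields(MIN_DENORMAL)` returns `(0, 0, 1)`: an all-zero exponent with the least significant fraction bit set.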

A2.7.4 Advanced SIMD and Floating-point half-precision formats

The Half-precision Extension to the Advanced SIMD and Floating-point Extensions uses two half-precision floating-point formats:
• IEEE half-precision, as described in the IEEE 754-2008 standard
• Alternative half-precision.

The description of IEEE half-precision includes ARM-specific details that are left open by the standard, and is only an introduction to the formats and to the values they can contain. For more information, especially on the handling of infinities, NaNs and signed zeros, see the IEEE 754 standard.

For both half-precision floating-point formats, the layout of the 16-bit number is the same. The format is:

bit[15]: S (sign); bits[14:10]: exponent; bits[9:0]: fraction.

The interpretation of the format depends on the value of the exponent field, bits[14:10] and on which half-precision format is being used.

0 < exponent < 0x1F

The value is a normalized number and is equal to:

$$(-1)^S \times 2^{(\text{exponent}-15)} \times (1.\text{fraction})$$

The minimum positive normalized number is $2^{-14}$, or approximately $6.104 \times 10^{-5}$.

The maximum positive normalized number is $(2 - 2^{-10}) \times 2^{15}$, or 65504.

Larger normalized numbers can be expressed using the alternative format when exponent == 0x1F.

exponent == 0

The value is either a zero or a denormalized number, depending on the fraction bits:

fraction == 0

The value is a zero. There are two distinct zeros:

+0 when S==0

–0 when S==1.

fraction != 0

The value is a denormalized number and is equal to:

$$(-1)^S \times 2^{-14} \times (0.\text{fraction})$$

The minimum positive denormalized number is $2^{-24}$, or approximately $5.960 \times 10^{-8}$.

exponent == 0x1F

The value depends on which half-precision format is being used:

IEEE half-precision

The value is either an infinity or a Not a Number (NaN), depending on the fraction bits:

fraction == 0

The value is an infinity. There are two distinct infinities:

+infinity When S==0. This represents all positive numbers that are too big to be represented accurately as a normalized number.

-infinity When S==1. This represents all negative numbers with an absolute value that is too big to be represented accurately as a normalized number.

fraction != 0

The value is a NaN, and is either a quiet NaN or a signaling NaN. The two types of NaN are distinguished by their most significant fraction bit, bit[9]:

bit[9] == 0 The NaN is a signaling NaN. The sign bit can take any value, and the remaining fraction bits can take any value except all zeros.

bit[9] == 1 The NaN is a quiet NaN. The sign bit and remaining fraction bits can take any value.


Alternative half-precision

The value is a normalized number and is equal to:

$$(-1)^S \times 2^{16} \times (1.\text{fraction})$$

The maximum positive normalized number is $(2 - 2^{-10}) \times 2^{16}$, or 131008.
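A sketch of how the two half-precision formats interpret the same 16 bits follows. `decode_half` is a hypothetical helper whose `alternative` flag plays the role of selecting Alternative half-precision. The IEEE path is cross-checked against the `struct` module's `'e'` format, which packs IEEE 754 half-precision values.

```python
import struct
from fractions import Fraction


def decode_half(h: int, alternative: bool = False):
    """Decode a 16-bit half-precision value, returning (kind, sign, value).

    alternative=True selects Alternative half-precision, where exponent == 0x1F
    encodes normalized numbers instead of infinities and NaNs.
    """
    sign = (h >> 15) & 1
    exponent = (h >> 10) & 0x1F   # bits[14:10]
    fraction = h & 0x3FF          # bits[9:0]
    s = -1 if sign else 1
    if exponent == 0:
        if fraction == 0:
            return ("zero", sign, Fraction(0))
        # Denormalized: (-1)^S * 2^-14 * (0.fraction)
        return ("denormal", sign, s * Fraction(fraction, 1 << 10) / (1 << 14))
    if exponent == 0x1F and not alternative:
        if fraction == 0:
            return ("infinity", sign, None)
        # bit[9] == 1 marks a quiet NaN
        return ("qnan" if fraction >> 9 else "snan", sign, None)
    # Normalized. With exponent == 0x1F (alternative format only) the scale
    # 2^(exponent-15) is 2^16, matching the alternative-format formula.
    return ("normal", sign, s * (1 + Fraction(fraction, 1 << 10)) * Fraction(2) ** (exponent - 15))


def reference_ieee_half(h: int) -> float:
    """Cross-check: decode an IEEE half-precision value with struct's 'e' format."""
    return struct.unpack("<e", struct.pack("<H", h))[0]
```

The pattern `0x7C00` is +infinity in IEEE half-precision but 65536 in the alternative format, and `0x7FFF` reaches the alternative maximum of 131008.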

A2.7.5 Flush-to-zero

The performance of floating-point implementations can be significantly reduced when performing calculations involving denormalized numbers and Underflow exceptions. In particular this occurs for implementations that only handle normalized numbers and zeros in hardware, and invoke support code to handle any other types of value. For an algorithm where a significant number of the operands and intermediate results are denormalized numbers, this can result in a considerable loss of performance.

In many of these algorithms, this performance can be recovered, without significantly affecting the accuracy of the final result, by replacing the denormalized operands and intermediate results with zeros. To permit this optimization, Floating-point Extension implementations have a special processing mode called Flush-to-zero mode. Advanced SIMD implementations always use Flush-to-zero mode.

Behavior in Flush-to-zero mode differs from normal IEEE 754 arithmetic in the following ways:

• All inputs to floating-point operations that are double-precision denormalized numbers or single-precision denormalized numbers are treated as though they were zero. This causes an Input Denormal exception, but does not cause an Inexact exception. The Input Denormal exception occurs only in Flush-to-zero mode.

Note

Combinations of exceptions on page A2-71 defines the floating-point operations.

The FPSCR contains a cumulative exception bit FPSCR.IDC and trap enable bit FPSCR.IDE corresponding to the Input Denormal exception.

The occurrence of all exceptions except Input Denormal is determined using the input values after flush-to-zero processing has occurred.

• The result of a floating-point operation is flushed to zero if the result of the operation before rounding satisfies the condition:

$$0 < \text{Abs}(\text{result}) < \text{MinNorm}$$

where:
– MinNorm is $2^{-126}$ for single-precision
– MinNorm is $2^{-1022}$ for double-precision.

This causes the FPSCR.UFC bit to be set to 1, and prevents any Inexact exception from occurring for the operation.

Underflow exceptions occur only when a result is flushed to zero.

In a VFPv2, VFPv3U, or VFPv4U implementation Underflow exceptions that occur in Flush-to-zero mode are always treated as untrapped, even when the Underflow trap enable bit, FPSCR.UFE, is set to 1.

• An Inexact exception does not occur if the result is flushed to zero, even though the final result of zero is not equivalent to the value that would be produced if the operation were performed with unbounded precision and exponent range.

When an input or a result is flushed to zero the value of the sign bit of the zero is determined as follows:
• In VFPv4, VFPv4U, VFPv3, or VFPv3U, it is preserved. That is, the sign bit of the zero matches the sign bit of the input or result that is being flushed to zero.
• In VFPv2, it is IMPLEMENTATION DEFINED whether it is preserved or always positive. The same choice must be made for all cases of flushing an input or result to zero.

Flush-to-zero mode has no effect on half-precision numbers that are inputs to floating-point operations, or results from floating-point operations.
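The two Flush-to-zero rules can be sketched for single precision as follows. The function names are hypothetical. The sign-preserving behavior is the VFPv3 and VFPv4 rule above. The `value` passed to `flush_result` stands in for the unrounded result, which a Python float only approximates, so treat this as an illustration of the conditions rather than a bit-exact model.

```python
import math

MIN_NORM_SINGLE = math.ldexp(1.0, -126)    # MinNorm for single precision


def flush_input_single(word: int, fz: bool):
    """Flush-to-zero processing of one single-precision input operand.

    Returns (word, idc). A denormalized input becomes a zero with the same sign
    bit (the VFPv3/VFPv4 rule), and idc reports an Input Denormal exception.
    """
    exponent = (word >> 23) & 0xFF
    fraction = word & 0x7FFFFF
    if fz and exponent == 0 and fraction != 0:
        return word & 0x80000000, True
    return word, False


def flush_result(value: float, min_norm: float, fz: bool):
    """Flush-to-zero processing of a result before rounding.

    Returns (value, ufc). If 0 < |value| < min_norm the result becomes a zero
    of the same sign and Underflow is reported; no Inexact is signalled.
    """
    if fz and 0 < abs(value) < min_norm:
        return math.copysign(0.0, value), True
    return value, False
```

With `fz=False` both helpers return their input unchanged and report no exception, which corresponds to normal IEEE 754 handling of denormalized numbers.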


Note

Flush-to-zero mode is incompatible with the IEEE 754 standard, and must not be used when IEEE 754 compatibility is a requirement. Flush-to-zero mode must be used with care. Although it can improve performance on some algorithms, there are significant limitations on its use. These are application dependent:
• On many algorithms, it has no noticeable effect, because the algorithm does not normally use denormalized numbers.
• On other algorithms, it can cause exceptions to occur or seriously reduce the accuracy of the results of the algorithm.

A2.7.6 NaN handling and the Default NaN

The IEEE 754 standard specifies that:
• an operation that produces an Invalid Operation floating-point exception generates a quiet NaN as its result if that exception is untrapped
• an operation involving a quiet NaN operand, but not a signaling NaN operand, returns an input NaN as its result.

The Floating-point Extension behavior when Default NaN mode is disabled adheres to this, with the following additions:
• If an untrapped Invalid Operation floating-point exception is produced, the quiet NaN result is derived from:
  – the first signaling NaN operand, if the exception was produced because at least one of the operands is a signaling NaN
  – otherwise, the default NaN.
• If an untrapped Invalid Operation floating-point exception is not produced, but at least one of the operands is a quiet NaN, the result is derived from the first quiet NaN operand.

Depending on the operation, the exact value of a derived quiet NaN result might differ in both sign and number of fraction bits from its source. For a quiet NaN result derived from a signaling NaN operand, the most significant fraction bit is set to 1.

Note
• In these descriptions, first operand relates to the left-to-right ordering of the arguments to the pseudocode function that describes the operation.
• The IEEE 754 standard specifies that the sign bit of a NaN has no significance.

The Floating-point Extension behavior when Default NaN mode is enabled, and the Advanced SIMD behavior in all circumstances, is that the Default NaN is the result of all floating-point operations that either:
• generate untrapped Invalid Operation floating-point exceptions
• have one or more quiet NaN inputs, but no signaling NaN inputs.

Table A2-7 on page A2-70 shows the format of the default NaN for ARM floating-point processors.

Default NaN mode is selected for the Floating-point Extension by setting the FPSCR.DN bit to 1.

Other aspects of the functionality of the Invalid Operation exception are not affected by Default NaN mode. These are that:
• If untrapped, it causes the FPSCR.IOC bit to be set to 1.
• If trapped, it causes a user trap handler to be invoked. This is only possible in VFPv2, VFPv3U, and VFPv4U.
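The NaN selection rules above can be sketched for single-precision operands, assuming the exception is untrapped. `propagate_nan` is a hypothetical helper. It derives a quiet NaN from a signaling NaN by setting only bit[22], which is one valid derivation for a same-precision operation; the text allows the derived value to differ in sign and fraction bits depending on the operation. It covers only NaN operands, not Invalid Operation cases such as (infinity) × 0 that produce the default NaN from non-NaN inputs.

```python
DEFAULT_NAN_SINGLE = 0x7FC00000   # Table A2-7: sign 0, exponent 0xFF, bit[22] == 1


def is_nan(word: int) -> bool:
    return ((word >> 23) & 0xFF) == 0xFF and (word & 0x7FFFFF) != 0


def is_snan(word: int) -> bool:
    return is_nan(word) and ((word >> 22) & 1) == 0


def propagate_nan(operands, default_nan_mode: bool):
    """NaN result of a single-precision operation, returned as (result, ioc).

    operands are 32-bit words in left-to-right pseudocode argument order.
    Returns (None, False) when no operand is a NaN.
    """
    snans = [w for w in operands if is_snan(w)]
    qnans = [w for w in operands if is_nan(w) and not is_snan(w)]
    if snans:
        # Invalid Operation: quiet the first signaling NaN by setting bit[22].
        result, ioc = snans[0] | (1 << 22), True
    elif qnans:
        result, ioc = qnans[0], False
    else:
        return None, False
    if default_nan_mode:
        result = DEFAULT_NAN_SINGLE   # IOC is still reported in Default NaN mode
    return result, ioc
```

A signaling NaN takes precedence even when a quiet NaN appears earlier in the operand list, because the signaling NaN is what causes the Invalid Operation exception.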


A2.7.7 Floating-point exceptions

The Advanced SIMD and Floating-point Extensions record the following floating-point exceptions in the FPSCR cumulative bits:

FPSCR.IOC Invalid Operation. The bit is set to 1 if the result of an operation has no mathematical value or cannot be represented. Cases include, for example:
• (infinity) × 0
• (+infinity) + (–infinity).
These tests are made after flush-to-zero processing. For example, if flush-to-zero mode is selected, multiplying a denormalized number and an infinity is treated as (0 × infinity), and causes an Invalid Operation floating-point exception.
IOC is also set on any floating-point operation with one or more signaling NaNs as operands, except for negation and absolute value, as described in Floating-point negation and absolute value on page A2-75.

FPSCR.DZC Division by Zero. The bit is set to 1 if a divide operation has a zero divisor and a dividend that is not zero, an infinity or a NaN. These tests are made after flush-to-zero processing, so if flush-to-zero processing is selected, a denormalized dividend is treated as zero and prevents Division by Zero from occurring, and a denormalized divisor is treated as zero and causes Division by Zero to occur if the dividend is a normalized number.
For the reciprocal and reciprocal square root estimate functions the dividend is assumed to be +1.0. This means that a zero or denormalized operand to these functions sets the DZC bit.

FPSCR.OFC Overflow. The bit is set to 1 if the absolute value of the result of an operation, produced after rounding, is greater than the maximum positive normalized number for the destination precision.

FPSCR.UFC Underflow. The bit is set to 1 if the absolute value of the result of an operation, produced before rounding, is less than the minimum positive normalized number for the destination precision, and the rounded result is inexact.
The criteria for the Underflow exception to occur are different in Flush-to-zero mode. For details, see Flush-to-zero on page A2-68.

FPSCR.IXC Inexact. The bit is set to 1 if the result of an operation is not equivalent to the value that would be produced if the operation were performed with unbounded precision and exponent range.
The criteria for the Inexact exception to occur are different in Flush-to-zero mode. For details, see Flush-to-zero on page A2-68.

FPSCR.IDC Input Denormal. The bit is set to 1 if a denormalized input operand is replaced in the computation by a zero, as described in Flush-to-zero on page A2-68.

With the Advanced SIMD Extension and the VFPv3 or VFPv4 versions of the Floating-point Extension these are non-trapping exceptions and the data-processing instructions do not generate any trapped exceptions.
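Software that polls these cumulative bits typically reads the FPSCR, inspects the flags, and writes them back as zero, because they stay set until cleared. The sketch below assumes the ARMv7 FPSCR bit positions IOC = 0, DZC = 1, OFC = 2, UFC = 3, IXC = 4 and IDC = 7, which this section does not list. The helper names are hypothetical.

```python
# Cumulative exception bits, assuming the ARMv7 FPSCR layout.
CUMULATIVE_BITS = {"IOC": 0, "DZC": 1, "OFC": 2, "UFC": 3, "IXC": 4, "IDC": 7}
CUMULATIVE_MASK = sum(1 << bit for bit in CUMULATIVE_BITS.values())


def cumulative_exceptions(fpscr: int) -> set:
    """Names of the cumulative exception bits that are set in fpscr."""
    return {name for name, bit in CUMULATIVE_BITS.items() if (fpscr >> bit) & 1}


def clear_cumulative(fpscr: int) -> int:
    """FPSCR value with every cumulative exception bit cleared and other bits kept."""
    return fpscr & ~CUMULATIVE_MASK & 0xFFFFFFFF
```

For example, a value of `0x91` reports Invalid Operation, Inexact and Input Denormal.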

Table A2-7 Default NaN encoding

| | Half-precision, IEEE format | Single-precision | Double-precision |
|---|---|---|---|
| Sign bit | 0 | 0 (a) | 0 (a) |
| Exponent | 0x1F | 0xFF | 0x7FF |
| Fraction | bit[9] == 1, bits[8:0] == 0 | bit[22] == 1, bits[21:0] == 0 | bit[51] == 1, bits[50:0] == 0 |

a. In VFPv2, the sign bit of the Default NaN is UNKNOWN.


With the VFPv2, VFPv3U, and VFPv4U versions of the Floating-point Extension:
• These exceptions can be trapped, by setting trap enable bits in the FPSCR, see VFPv3U and VFPv4U on page A2-62. The way in which trapped floating-point exceptions are delivered to user software is IMPLEMENTATION DEFINED. However, ARM recommends use of the VFP subarchitecture defined in Appendix D6 Common VFP Subarchitecture Specification.
• The definition of the Underflow exception is different in the trapped and cumulative exception cases. In the trapped case, meaning for VFPv2, VFPv3U, or VFPv4U, the definition is:
  – the trapped Underflow exception occurs if the absolute value of the result of an operation, produced before rounding, is less than the minimum positive normalized number for the destination precision, regardless of whether the rounded result is inexact.
• As with cumulative exceptions, higher priority trapped exceptions can prevent lower priority exceptions from occurring, as described in Combinations of exceptions.

Table A2-8 shows the results of untrapped floating-point exceptions:

Table A2-8 Results of untrapped floating-point exceptions

| Exception type | Default result for positive sign | Default result for negative sign |
|---|---|---|
| IOC, Invalid Operation | Quiet NaN | Quiet NaN |
| DZC, Division by Zero | +infinity | -infinity |
| OFC, Overflow | RN, RP: +infinity; RM, RZ: +MaxNorm | RN, RM: -infinity; RP, RZ: -MaxNorm |
| UFC, Underflow | Normal rounded result | Normal rounded result |
| IXC, Inexact | Normal rounded result | Normal rounded result |
| IDC, Input Denormal | Normal rounded result | Normal rounded result |

In Table A2-8:

MaxNorm The maximum normalized number of the destination precision.
RM Round towards Minus Infinity mode, as defined in the IEEE 754 standard.
RN Round to Nearest mode, as defined in the IEEE 754 standard.
RP Round towards Plus Infinity mode, as defined in the IEEE 754 standard.
RZ Round towards Zero mode, as defined in the IEEE 754 standard.

• For Invalid Operation exceptions, for details of which quiet NaN is produced as the default result see NaN handling and the Default NaN on page A2-69.
• For Division by Zero exceptions, the sign bit of the default result is determined normally for a division. This means it is the exclusive OR of the sign bits of the two operands.
• For Overflow exceptions, the sign bit of the default result is determined normally for the overflowing operation.

Combinations of exceptions

The following pseudocode functions perform floating-point operations:

FixedToFP()
FPAdd()
FPCompare()
FPCompareEQ()
FPCompareGE()
FPCompareGT()
FPDiv()
FPDoubleToSingle()
FPHalfToSingle()
FPMax()
FPMin()
FPMul()
FPMulAdd()
FPRecipEstimate()
FPRecipStep()
FPRSqrtEstimate()
FPRSqrtStep()
FPSingleToDouble()
FPSingleToHalf()
FPSqrt()
FPSub()
FPToFixed()

All of these operations can generate floating-point exceptions.

Note

FPAbs() and FPNeg() are not classified as floating-point operations because:
• they cannot generate floating-point exceptions
• the floating-point operation behavior described in the following sections does not apply to them:
  – Flush-to-zero on page A2-68
  – NaN handling and the Default NaN on page A2-69.

More than one exception can occur on the same operation. The only combinations of exceptions that can occur are:
• Overflow with Inexact
• Underflow with Inexact
• Input Denormal with other exceptions.

When none of the exceptions caused by an operation are trapped, any exception that occurs causes the associated cumulative bit in the FPSCR to be set.

When one or more exceptions caused by an operation are trapped, the behavior of the instruction depends on the priority of the exceptions. The Inexact exception is treated as lowest priority, and Input Denormal as highest priority:
• If the higher priority exception is trapped, its trap handler is called. It is IMPLEMENTATION DEFINED whether the parameters to the trap handler include information about the lower priority exception. Apart from this, the lower priority exception is ignored in this case.
• If the higher priority exception is untrapped, its cumulative bit is set to 1 and its default result is evaluated. Then the lower priority exception is handled normally, using this default result.
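As an illustration of the untrapped case, the sketch below evaluates the Overflow default result from Table A2-8 and then reports the accompanying Inexact exception. It assumes, as IEEE 754 does, that an overflowed result is always inexact. `untrapped_overflow` is a hypothetical helper, and the rounding modes are passed as the strings RN, RP, RM and RZ.

```python
import math

MAX_NORM_SINGLE = (2 - 2.0 ** -23) * 2.0 ** 127   # MaxNorm for single precision


def untrapped_overflow(sign: int, rmode: str, max_norm: float):
    """Default result and cumulative flags for an untrapped Overflow.

    Follows Table A2-8. Overflow is combined with Inexact: Overflow is handled
    first, then Inexact is handled normally using Overflow's default result,
    so both cumulative flags end up set.
    """
    if rmode not in ("RN", "RP", "RM", "RZ"):
        raise ValueError("unknown rounding mode: " + rmode)
    if sign == 0:
        to_infinity = rmode in ("RN", "RP")
    else:
        to_infinity = rmode in ("RN", "RM")
    magnitude = math.inf if to_infinity else max_norm
    return (-magnitude if sign else magnitude), {"OFC", "IXC"}
```

Rounding towards zero never produces an infinity from a finite overflow, so in RZ mode both signs saturate at ±MaxNorm.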

Some floating-point instructions specify more than one floating-point operation, as indicated by the pseudocode descriptions of the instruction. In such cases, an exception on one operation is treated as higher priority than an exception on another operation if the occurrence of the second exception depends on the result of the first operation. Otherwise, it is UNPREDICTABLE which exception is treated as higher priority.

For example, a VMLA.F32 instruction specifies a floating-point multiplication followed by a floating-point addition. The addition can generate Overflow, Underflow and Inexact exceptions, all of which depend on both operands to the addition and so are treated as lower priority than any exception on the multiplication. The same applies to Invalid Operation exceptions on the addition caused by adding opposite-signed infinities. The addition can also generate an Input Denormal exception, caused by the addend being a denormalized number while in Flush-to-zero mode. It is UNPREDICTABLE which of an Input Denormal exception on the addition and an exception on the multiplication is treated as higher priority, because the occurrence of the Input Denormal exception does not depend on the result of the multiplication. The same applies to an Invalid Operation exception on the addition caused by the addend being a signaling NaN.


Note
• The VFMA instruction performs a vector addition and a vector multiplication as a single operation. The VFMS instruction performs a vector subtraction and a vector multiplication as a single operation.
• Like other details of Floating-point instruction execution, these rules about exception handling apply to the overall results produced by an instruction when the system uses a combination of hardware and support code to implement it. See Floating-point support code on page B1-1237 for more information.
  These principles also apply to the multiple floating-point operations generated by Floating-point instructions in the deprecated VFP vector mode of operation. For details of this mode of operation see Appendix D11 VFP Vector Operation Support.

A2.7.8 Pseudocode details of floating-point operations

The following subsections contain pseudocode definitions of the floating-point functionality supported by the ARMv7 architecture:
• Generation of specific floating-point values
• Floating-point negation and absolute value on page A2-75
• Floating-point value unpacking on page A2-75
• Floating-point exception and NaN handling on page A2-76
• Floating-point rounding on page A2-78
• Selection of ARM standard floating-point arithmetic on page A2-80
• Floating-point comparisons on page A2-80
• Floating-point maximum and minimum on page A2-81
• Floating-point addition and subtraction on page A2-82
• Floating-point multiplication and division on page A2-83
• Floating-point fused multiply-add on page A2-84
• Floating-point reciprocal estimate and step on page A2-85
• Floating-point square root on page A2-87
• Floating-point reciprocal square root estimate and step on page A2-87
• Floating-point conversions on page A2-90.

Generation of specific floating-point values

The following pseudocode functions generate specific floating-point values. The sign argument of FPInfinity(), FPMaxNormal(), and FPZero() is '0' for the positive version and '1' for the negative version.

// FPZero()
// ========

bits(N) FPZero(bit sign, integer N)
    assert N IN {16,32,64};
    if N == 16 then
        E = 5;
    elsif N == 32 then
        E = 8;
    else E = 11;
    F = N - E - 1;
    exp = Zeros(E);
    frac = Zeros(F);
    return sign:exp:frac;

// FPTwo()
// =======

bits(N) FPTwo(integer N)
    assert N IN {32,64};
    if N == 16 then
        E = 5;
    elsif N == 32 then
        E = 8;
    else E = 11;
    F = N - E - 1;
    sign = '0';
    exp = '1':Zeros(E-1);
    frac = Zeros(F);
    return sign:exp:frac;

// FPThree()
// =========

bits(N) FPThree(integer N)
    assert N IN {32,64};
    if N == 16 then
        E = 5;
    elsif N == 32 then
        E = 8;
    else E = 11;
    F = N - E - 1;
    sign = '0';
    exp = '1':Zeros(E-1);
    frac = '1':Zeros(F-1);
    return sign:exp:frac;

// FPMaxNormal()
// =============

bits(N) FPMaxNormal(bit sign, integer N)
    assert N IN {16,32,64};
    if N == 16 then
        E = 5;
    elsif N == 32 then
        E = 8;
    else E = 11;
    F = N - E - 1;
    exp = Ones(E-1):'0';
    frac = Ones(F);
    return sign:exp:frac;

// FPInfinity()
// ============

bits(N) FPInfinity(bit sign, integer N)
    assert N IN {16,32,64};
    if N == 16 then
        E = 5;
    elsif N == 32 then
        E = 8;
    else E = 11;
    F = N - E - 1;
    exp = Ones(E);
    frac = Zeros(F);
    return sign:exp:frac;

// FPDefaultNaN()
// ==============

bits(N) FPDefaultNaN(integer N)
    assert N IN {16,32,64};
    if N == 16 then
        E = 5;
    elsif N == 32 then
        E = 8;
    else E = 11;
    F = N - E - 1;
    sign = '0';
    exp = Ones(E);
    frac = '1':Zeros(F-1);
    return sign:exp:frac;

Note

This definition of FPDefaultNaN() applies to VFPv4, VFPv4U, VFPv3, and VFPv3U implementations. For VFPv2, the sign bit of the result is a single-bit UNKNOWN value, instead of 0.
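The value-generation functions above can be transcribed into Python as integer bit patterns, which makes the encodings easy to check. This is a sketch with hypothetical snake_case names; `fp_pack` performs the `sign:exp:frac` concatenation. `fp_two` and `fp_three` keep the pseudocode's restriction to N = 32 or 64, and `fp_default_nan` uses the VFPv3 and later sign bit of 0.

```python
def fp_widths(n: int):
    """(E, F): exponent and fraction widths, as computed in the pseudocode."""
    if n not in (16, 32, 64):
        raise ValueError("N must be 16, 32 or 64")
    e = {16: 5, 32: 8, 64: 11}[n]
    return e, n - e - 1


def fp_pack(sign: int, exp: int, frac: int, n: int) -> int:
    """Concatenate sign:exp:frac into an N-bit integer."""
    e, f = fp_widths(n)
    return (sign << (e + f)) | (exp << f) | frac


def fp_zero(sign: int, n: int) -> int:
    return fp_pack(sign, 0, 0, n)


def fp_infinity(sign: int, n: int) -> int:
    e, _ = fp_widths(n)
    return fp_pack(sign, (1 << e) - 1, 0, n)                # exp = Ones(E)


def fp_max_normal(sign: int, n: int) -> int:
    e, f = fp_widths(n)
    return fp_pack(sign, (1 << e) - 2, (1 << f) - 1, n)     # exp = Ones(E-1):'0'


def fp_default_nan(n: int) -> int:
    e, f = fp_widths(n)
    return fp_pack(0, (1 << e) - 1, 1 << (f - 1), n)        # frac = '1':Zeros(F-1)


def fp_two(n: int) -> int:
    assert n in (32, 64)
    e, _ = fp_widths(n)
    return fp_pack(0, 1 << (e - 1), 0, n)                   # exp = '1':Zeros(E-1)


def fp_three(n: int) -> int:
    assert n in (32, 64)
    e, f = fp_widths(n)
    return fp_pack(0, 1 << (e - 1), 1 << (f - 1), n)        # 1.5 * 2^1
```

For example, `fp_default_nan(32)` is `0x7FC00000` and `fp_three(64)` is `0x4008000000000000`, the binary64 encoding of 3.0.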

Floating-point negation and absolute value

The floating-point negation and absolute value operations only affect the sign bit. They do not treat NaN operands specially, nor denormalized number operands when flush-to-zero is selected.

// FPNeg()
// =======

bits(N) FPNeg(bits(N) operand)
    assert N IN {32,64};
    return NOT(operand<N-1>) : operand<N-2:0>;

// FPAbs()
// =======

bits(N) FPAbs(bits(N) operand)
    assert N IN {32,64};
    return '0' : operand<N-2:0>;
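A direct bit-level transcription of FPNeg() and FPAbs() is shown below, using hypothetical snake_case names. It makes the point of the preceding text concrete: a signaling NaN keeps its fraction bits and is not quieted, and a denormalized number is not flushed, because only bit[N-1] is touched.

```python
def fp_neg(operand: int, n: int) -> int:
    """FPNeg(): NOT(operand<N-1>) : operand<N-2:0>."""
    assert n in (32, 64)
    return operand ^ (1 << (n - 1))


def fp_abs(operand: int, n: int) -> int:
    """FPAbs(): '0' : operand<N-2:0>."""
    assert n in (32, 64)
    return operand & ((1 << (n - 1)) - 1)
```

Negating the single-precision signaling NaN `0x7F800001` gives `0xFF800001`, which is still a signaling NaN.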

Floating-point value unpacking

The FPUnpack() function determines the type and numerical value of a floating-point number. It also does flush-to-zero processing on input operands.

enumeration FPType {FPType_Nonzero, FPType_Zero, FPType_Infinity, FPType_QNaN, FPType_SNaN};

// FPUnpack()

// ==========

// Unpack a floating-point number into its type, sign bit and the real number

// that it represents. The real number result has the correct sign for numbers

// and infinities, is very large in magnitude for infinities, and is 0.0 for

// NaNs. (These values are chosen to simplify the description of comparisons

// and conversions.)

// The 'fpscr_val' argument supplies FPSCR control bits. Status information is

// updated directly in the FPSCR where appropriate.

(FPType, bit, real) FPUnpack(bits(N) fpval, bits(32) fpscr_val)

assert N IN {16,32,64};

if N == 16 then

sign = fpval<15>;

exp16 = fpval<14:10>;

frac16 = fpval<9:0>;

if IsZero(exp16) then

// Produce zero if value is zero

if IsZero(frac16) then

type = FPType_Zero; value = 0.0;

else

type = FPType_Nonzero; value = 2.0^-14 * (UInt(frac16) * 2.0^-10);

A2 Application Level Programmers’ Model

A2.7 Floating-point data types and arithmetic

Non-Confidential ID051414

elsif IsOnes(exp16) && fpscr_val<26> == '0' then // Infinity or NaN in IEEE format

if IsZero(frac16) then

type = FPType_Infinity; value = 2.0^1000000;

else

type = if frac16<9> == '1' then FPType_QNaN else FPType_SNaN;

value = 0.0;

else

type = FPType_Nonzero; value = 2.0^(UInt(exp16)-15) * (1.0 + UInt(frac16) * 2.0^-10);

elsif N == 32 then

sign = fpval<31>;

exp32 = fpval<30:23>;

frac32 = fpval<22:0>;

if IsZero(exp32) then

// Produce zero if value is zero or flush-to-zero is selected.

if IsZero(frac32) || fpscr_val<24> == '1' then

type = FPType_Zero; value = 0.0;

if !IsZero(frac32) then // Denormalized input flushed to zero

FPProcessException(FPExc_InputDenorm, fpscr_val);

else

type = FPType_Nonzero; value = 2.0^-126 * (UInt(frac32) * 2.0^-23);

elsif IsOnes(exp32) then

if IsZero(frac32) then

type = FPType_Infinity; value = 2.0^1000000;

else

type = if frac32<22> == '1' then FPType_QNaN else FPType_SNaN;

value = 0.0;

else

type = FPType_Nonzero; value = 2.0^(UInt(exp32)-127) * (1.0 + UInt(frac32) * 2.0^-23);

else // N == 64

sign = fpval<63>;

exp64 = fpval<62:52>;

frac64 = fpval<51:0>;

if IsZero(exp64) then

// Produce zero if value is zero or flush-to-zero is selected.

if IsZero(frac64) || fpscr_val<24> == '1' then

type = FPType_Zero; value = 0.0;

if !IsZero(frac64) then // Denormalized input flushed to zero

FPProcessException(FPExc_InputDenorm, fpscr_val);

else

type = FPType_Nonzero; value = 2.0^-1022 * (UInt(frac64) * 2.0^-52);

elsif IsOnes(exp64) then

if IsZero(frac64) then

type = FPType_Infinity; value = 2.0^1000000;

else

type = if frac64<51> == '1' then FPType_QNaN else FPType_SNaN;

value = 0.0;

else

type = FPType_Nonzero; value = 2.0^(UInt(exp64)-1023) * (1.0 + UInt(frac64) * 2.0^-52);

if sign == '1' then value = -value;

return (type, sign, value);
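
The following C sketch restates only the N == 32 branch of FPUnpack() on a raw bit pattern, returning the FPType classification and the sign but not the real value. The function name unpack32 is hypothetical, and its parameters fz and idc are assumptions of this sketch: fz stands for FPSCR bit 24, and idc stands in for the FPSCR.IDC update made when the Input Denormal exception is not trapped.

```c
#include <assert.h>
#include <stdint.h>

/* Mirrors the FPType enumeration above. */
typedef enum { FPType_Nonzero, FPType_Zero, FPType_Infinity,
               FPType_QNaN, FPType_SNaN } FPType;

/* Classify a single-precision bit pattern as the N == 32 branch of
   FPUnpack() does. *idc is set when a denormal is flushed to zero. */
FPType unpack32(uint32_t v, int fz, int *sign, int *idc) {
    assert(sign && idc);
    uint32_t exp  = (v >> 23) & 0xFFu;   /* exp32  = fpval<30:23> */
    uint32_t frac = v & 0x7FFFFFu;       /* frac32 = fpval<22:0>  */
    *sign = (int)(v >> 31);              /* sign   = fpval<31>    */
    if (exp == 0) {
        if (frac == 0) return FPType_Zero;
        if (fz) { *idc = 1; return FPType_Zero; }  /* denormal flushed */
        return FPType_Nonzero;                     /* denormal kept    */
    }
    if (exp == 0xFFu) {
        if (frac == 0) return FPType_Infinity;
        return (frac >> 22) ? FPType_QNaN : FPType_SNaN;  /* frac32<22> */
    }
    return FPType_Nonzero;
}
```

Note that a flushed denormal keeps its sign: unpack32(0x80000001, 1, ...) reports FPType_Zero with sign 1, matching the pseudocode, which sets value to 0.0 but still returns the original sign bit.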

Floating-point exception and NaN handling

The FPProcessException() procedure checks whether a floating-point exception is trapped, and handles it accordingly:

enumeration FPExc {FPExc_InvalidOp, FPExc_DivideByZero, FPExc_Overflow,
                   FPExc_Underflow, FPExc_Inexact, FPExc_InputDenorm};

// FPProcessException()
// ====================
// The 'fpscr_val' argument supplies FPSCR control bits. Status information is
// updated directly in the FPSCR where appropriate.

FPProcessException(FPExc exception, bits(32) fpscr_val)
    // Get appropriate FPSCR bit numbers
    case exception of
        when FPExc_InvalidOp    enable = 8;  cumul = 0;
        when FPExc_DivideByZero enable = 9;  cumul = 1;
        when FPExc_Overflow     enable = 10; cumul = 2;
        when FPExc_Underflow    enable = 11; cumul = 3;
        when FPExc_Inexact      enable = 12; cumul = 4;
        when FPExc_InputDenorm  enable = 15; cumul = 7;
    if fpscr_val<enable> == '1' then
        IMPLEMENTATION_DEFINED floating-point trap handling;
    else
        FPSCR<cumul> = '1';
    return;
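
A minimal C sketch of FPProcessException(), assuming the FPSCR is modelled as a plain 32-bit word. Because trap handling is IMPLEMENTATION DEFINED, the hypothetical process_exception() only reports that a trap would be taken and leaves the status word unchanged in that case.

```c
#include <assert.h>
#include <stdint.h>

typedef enum { FPExc_InvalidOp, FPExc_DivideByZero, FPExc_Overflow,
               FPExc_Underflow, FPExc_Inexact, FPExc_InputDenorm } FPExc;

/* Returns 1 if the exception is trapped (enable bit set in fpscr_val);
   otherwise sets the cumulative status bit in *fpscr and returns 0. */
int process_exception(FPExc exc, uint32_t fpscr_val, uint32_t *fpscr) {
    static const int enable_bit[] = { 8, 9, 10, 11, 12, 15 };
    static const int cumul_bit[]  = { 0, 1, 2, 3, 4, 7 };
    assert((int)exc >= 0 && (int)exc <= (int)FPExc_InputDenorm);
    if ((fpscr_val >> enable_bit[exc]) & 1u)
        return 1;                    /* IMPLEMENTATION DEFINED trap handling */
    *fpscr |= 1u << cumul_bit[exc];  /* FPSCR<cumul> = '1' */
    return 0;
}
```

As in the pseudocode, the control value and the status word being updated are separate. For Advanced SIMD operations the control value is StandardFPSCRValue(), whose trap-enable bits are all 0, so those exceptions only ever set cumulative bits.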

The FPProcessNaN() function processes a NaN operand, producing the correct result value and generating an Invalid Operation exception if necessary:

// FPProcessNaN()
// ==============
// The 'fpscr_val' argument supplies FPSCR control bits. Status information is
// updated directly in the FPSCR where appropriate.

bits(N) FPProcessNaN(FPType type, bits(N) operand, bits(32) fpscr_val)
    assert N IN {32,64};
    topfrac = if N == 32 then 22 else 51;
    result = operand;
    if type == FPType_SNaN then
        result<topfrac> = '1';
        FPProcessException(FPExc_InvalidOp, fpscr_val);
    if fpscr_val<25> == '1' then // DefaultNaN requested
        result = FPDefaultNaN(N);
    return result;

The FPProcessNaNs() function performs the standard NaN processing for a two-operand operation:

// FPProcessNaNs()
// ===============
// The boolean part of the return value says whether a NaN has been found and
// processed. The bits(N) part is only relevant if it has and supplies the
// result of the operation.
// The 'fpscr_val' argument supplies FPSCR control bits. Status information is
// updated directly in the FPSCR where appropriate.

(boolean, bits(N)) FPProcessNaNs(FPType type1, FPType type2,
                                 bits(N) op1, bits(N) op2,
                                 bits(32) fpscr_val)
    assert N IN {32,64};
    if type1 == FPType_SNaN then
        done = TRUE; result = FPProcessNaN(type1, op1, fpscr_val);
    elsif type2 == FPType_SNaN then
        done = TRUE; result = FPProcessNaN(type2, op2, fpscr_val);
    elsif type1 == FPType_QNaN then
        done = TRUE; result = FPProcessNaN(type1, op1, fpscr_val);
    elsif type2 == FPType_QNaN then
        done = TRUE; result = FPProcessNaN(type2, op2, fpscr_val);
    else
        done = FALSE; result = Zeros(N); // 'Don't care' result
    return (done, result);
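
The NaN selection rules of FPProcessNaN() and FPProcessNaNs() can be sketched in C for single precision. The names process_nan32 and process_nans32 are hypothetical; dn stands for FPSCR bit 25 (default NaN), and ioc is an assumed stand-in for the FPSCR.IOC update made by an untrapped Invalid Operation exception.

```c
#include <assert.h>
#include <stdint.h>

typedef enum { FPType_Nonzero, FPType_Zero, FPType_Infinity,
               FPType_QNaN, FPType_SNaN } FPType;

/* FPProcessNaN() for N == 32. */
uint32_t process_nan32(FPType type, uint32_t op, int dn, int *ioc) {
    uint32_t result = op;
    if (type == FPType_SNaN) {
        result |= 1u << 22;        /* quieten: set fraction bit 22 (topfrac) */
        *ioc = 1;                  /* Invalid Operation, assumed untrapped   */
    }
    if (dn) result = 0x7FC00000u;  /* FPDefaultNaN(32) */
    return result;
}

/* FPProcessNaNs(): signalling NaNs take priority over quiet NaNs, and op1
   over op2. Returns 1 and writes *result if a NaN was found. */
int process_nans32(FPType t1, FPType t2, uint32_t op1, uint32_t op2,
                   int dn, int *ioc, uint32_t *result) {
    assert(result != 0 && ioc != 0);
    if (t1 == FPType_SNaN)      *result = process_nan32(t1, op1, dn, ioc);
    else if (t2 == FPType_SNaN) *result = process_nan32(t2, op2, dn, ioc);
    else if (t1 == FPType_QNaN) *result = process_nan32(t1, op1, dn, ioc);
    else if (t2 == FPType_QNaN) *result = process_nan32(t2, op2, dn, ioc);
    else return 0;
    return 1;
}
```

For example, with a quiet NaN in op1 and a signalling NaN 0x7F800002 in op2, the signalling operand wins and the result is 0x7FC00002, unless default NaN mode replaces it with 0x7FC00000.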

The FPProcessNaNs3() function performs the standard NaN processing for a three-operand operation:

// FPProcessNaNs3()
// ================
// The boolean part of the return value says whether a NaN has been found and
// processed. The bits(N) part is only relevant if it has and supplies the
// result of the operation.
// The 'fpscr_val' argument supplies FPSCR control bits. Status information is
// updated directly in the FPSCR where appropriate.

(boolean, bits(N)) FPProcessNaNs3(FPType type1, FPType type2, FPType type3,
                                  bits(N) op1, bits(N) op2, bits(N) op3,
                                  bits(32) fpscr_val)
    assert N IN {32,64};
    if type1 == FPType_SNaN then
        done = TRUE; result = FPProcessNaN(type1, op1, fpscr_val);
    elsif type2 == FPType_SNaN then
        done = TRUE; result = FPProcessNaN(type2, op2, fpscr_val);
    elsif type3 == FPType_SNaN then
        done = TRUE; result = FPProcessNaN(type3, op3, fpscr_val);
    elsif type1 == FPType_QNaN then
        done = TRUE; result = FPProcessNaN(type1, op1, fpscr_val);
    elsif type2 == FPType_QNaN then
        done = TRUE; result = FPProcessNaN(type2, op2, fpscr_val);
    elsif type3 == FPType_QNaN then
        done = TRUE; result = FPProcessNaN(type3, op3, fpscr_val);
    else
        done = FALSE; result = Zeros(N); // 'Don't care' result
    return (done, result);

Floating-point rounding

The FPRound() function rounds and encodes a supplied floating-point value to a specified destination format. This includes processing Overflow, Underflow and Inexact floating-point exceptions and performing flush-to-zero processing on the resulting value.

// FPRound()
// =========
// The 'fpscr_val' argument supplies FPSCR control bits. Status information is
// updated directly in the FPSCR where appropriate.

bits(N) FPRound(real value, integer N, bits(32) fpscr_val)
    assert N IN {16,32,64};
    assert value != 0.0;
    // Obtain format parameters - minimum exponent, numbers of exponent and fraction bits.
    if N == 16 then
        E = 5;
    elsif N == 32 then
        E = 8;
    else E = 11;
    minimum_exp = 2 - 2^(E-1);
    F = N - E - 1;
    // Split value into sign, unrounded mantissa and exponent.
    if value < 0.0 then
        sign = '1'; mantissa = -value;
    else
        sign = '0'; mantissa = value;
    exponent = 0;
    while mantissa < 1.0 do
        mantissa = mantissa * 2.0; exponent = exponent - 1;
    while mantissa >= 2.0 do
        mantissa = mantissa / 2.0; exponent = exponent + 1;
    // Deal with flush-to-zero.
    if fpscr_val<24> == '1' && N != 16 && exponent < minimum_exp then
        result = FPZero(sign, N);
        FPSCR.UFC = '1'; // Flush-to-zero never generates a trapped exception
    else
        // Start creating the exponent value for the result. Start by biasing the actual exponent
        // so that the minimum exponent becomes 1, lower values 0 (indicating possible underflow).
        biased_exp = Max(exponent - minimum_exp + 1, 0);
        if biased_exp == 0 then mantissa = mantissa / 2^(minimum_exp - exponent);
        // Get the unrounded mantissa as an integer, and the "units in last place" rounding error.
        int_mant = RoundDown(mantissa * 2^F); // < 2^F if biased_exp == 0, >= 2^F if not
        error = mantissa * 2^F - int_mant;
        // Underflow occurs if exponent is too small before rounding, and result is inexact or
        // the Underflow exception is trapped.
        if biased_exp == 0 && (error != 0.0 || fpscr_val<11> == '1') then
            FPProcessException(FPExc_Underflow, fpscr_val);
        // Round result according to rounding mode.
        case fpscr_val<23:22> of
            when '00' // Round to Nearest (rounding to even if exactly halfway)
                round_up = (error > 0.5 || (error == 0.5 && int_mant<0> == '1'));
                overflow_to_inf = TRUE;
            when '01' // Round towards Plus Infinity
                round_up = (error != 0.0 && sign == '0');
                overflow_to_inf = (sign == '0');
            when '10' // Round towards Minus Infinity
                round_up = (error != 0.0 && sign == '1');
                overflow_to_inf = (sign == '1');
            when '11' // Round towards Zero
                round_up = FALSE;
                overflow_to_inf = FALSE;
        if round_up then
            int_mant = int_mant + 1;
            if int_mant == 2^F then // Rounded up from denormalized to normalized
                biased_exp = 1;
            if int_mant == 2^(F+1) then // Rounded up to next exponent
                biased_exp = biased_exp + 1;
                int_mant = int_mant DIV 2;
        // Deal with overflow and generate result.
        if N != 16 || fpscr_val<26> == '0' then // Single, double or IEEE half precision
            if biased_exp >= 2^E - 1 then
                result = if overflow_to_inf then FPInfinity(sign, N) else FPMaxNormal(sign, N);
                FPProcessException(FPExc_Overflow, fpscr_val);
                error = 1.0; // Ensure that an Inexact exception occurs
            else
                result = sign:biased_exp<E-1:0>:int_mant<F-1:0>;
        else // Alternative half precision (with N==16)
            if biased_exp >= 2^E then
                result = sign : Ones(15);
                FPProcessException(FPExc_InvalidOp, fpscr_val);
                error = 0.0; // Ensure that an Inexact exception does not occur
            else
                result = sign:biased_exp<E-1:0>:int_mant<F-1:0>;
        // Deal with Inexact exception.
        if error != 0.0 then
            FPProcessException(FPExc_Inexact, fpscr_val);
    return result;
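
The rounding-mode decision is the part of FPRound() that is easiest to misread, so the following C sketch isolates it. round_mantissa() is a hypothetical helper that takes int_mant, error and sign as computed in the pseudocode, and applies the FPSCR<23:22> case analysis. Normalization, the exponent adjustment after rounding, overflow and the exception calls are deliberately omitted.

```c
#include <assert.h>
#include <stdint.h>

/* error is the discarded part in units in the last place (0 <= error < 1),
   sign is 0 or 1, and rmode is FPSCR<23:22>. Returns the rounded integer
   mantissa and writes overflow_to_inf to *to_inf. */
uint64_t round_mantissa(uint64_t int_mant, double error, int sign,
                        unsigned rmode, int *to_inf) {
    int round_up;
    assert(error >= 0.0 && error < 1.0 && rmode <= 3);
    switch (rmode) {
    case 0:  /* Round to Nearest, ties to even */
        round_up = error > 0.5 || (error == 0.5 && (int_mant & 1u));
        *to_inf = 1;
        break;
    case 1:  /* Round towards Plus Infinity */
        round_up = error != 0.0 && sign == 0;
        *to_inf = (sign == 0);
        break;
    case 2:  /* Round towards Minus Infinity */
        round_up = error != 0.0 && sign == 1;
        *to_inf = (sign == 1);
        break;
    default: /* Round towards Zero */
        round_up = 0;
        *to_inf = 0;
        break;
    }
    return int_mant + (round_up ? 1u : 0u);
}
```

For example, an exactly halfway error of 0.5 leaves an even int_mant of 4 unchanged but rounds an odd int_mant of 5 up to 6 in Round to Nearest mode.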

Selection of ARM standard floating-point arithmetic

The StandardFPSCRValue() function returns the FPSCR value that selects ARM standard floating-point arithmetic. This value enables default NaN mode and flush-to-zero mode, selects Round to Nearest, and disables all floating-point exception traps; only the FPSCR.AHP bit, bit 26, is taken from the current FPSCR. Most of the arithmetic functions have a Boolean fpscr_controlled argument that is TRUE for Floating-point operations and FALSE for Advanced SIMD operations, and that selects between using the real FPSCR value and the value returned by StandardFPSCRValue().

// StandardFPSCRValue()
// ====================

bits(32) StandardFPSCRValue()
    return '00000' : FPSCR<26> : '11000000000000000000000000';
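
In bit terms, StandardFPSCRValue() keeps FPSCR.AHP (bit 26), sets DN (bit 25) and FZ (bit 24), and clears everything else, including RMode (bits 23:22) and all trap-enable bits. The hypothetical C helper below computes the same word from a given FPSCR value.

```c
#include <assert.h>
#include <stdint.h>

/* '00000' : FPSCR<26> : '11' : Zeros(24) */
uint32_t standard_fpscr_value(uint32_t fpscr) {
    uint32_t v = (fpscr & (1u << 26)) | (1u << 25) | (1u << 24);
    assert((v & 0x9F00u) == 0);  /* IOE, DZE, OFE, UFE, IXE and IDE all clear */
    return v;
}
```

For example, an FPSCR value of 0 gives 0x03000000, and any value with bit 26 set gives 0x07000000.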

Floating-point comparisons

The FPCompare() function compares two floating-point numbers, producing a {N, Z, C, V} condition flags result as shown in Table A2-9:

Table A2-9 Effect of a Floating-point comparison on the condition flags

Comparison result   N  Z  C  V
Equal               0  1  1  0
Less than           1  0  0  0
Greater than        0  0  1  0
Unordered           0  0  1  1

This result defines the operation of the VCMP instruction in the Floating-point Extension. The VCMP instruction writes these flag values in the FPSCR. After using a VMRS instruction to transfer them to the APSR, they can control conditional execution as shown in Table A8-1 on page A8-288.

// FPCompare()
// ===========

(bit, bit, bit, bit) FPCompare(bits(N) op1, bits(N) op2, boolean quiet_nan_exc,
                               boolean fpscr_controlled)
    assert N IN {32,64};
    fpscr_val = if fpscr_controlled then FPSCR else StandardFPSCRValue();
    (type1,sign1,value1) = FPUnpack(op1, fpscr_val);
    (type2,sign2,value2) = FPUnpack(op2, fpscr_val);
    if type1==FPType_SNaN || type1==FPType_QNaN || type2==FPType_SNaN || type2==FPType_QNaN then
        result = ('0','0','1','1');
        if type1==FPType_SNaN || type2==FPType_SNaN || quiet_nan_exc then
            FPProcessException(FPExc_InvalidOp, fpscr_val);
    else
        // All non-NaN cases can be evaluated on the values produced by FPUnpack()
        if value1 == value2 then
            result = ('0','1','1','0');
        elsif value1 < value2 then
            result = ('1','0','0','0');
        else // value1 > value2
            result = ('0','0','1','0');
    return result;

The FPCompareEQ(), FPCompareGE(), and FPCompareGT() functions describe the operation of Advanced SIMD instructions that perform floating-point comparisons.

// FPCompareEQ()
// =============

boolean FPCompareEQ(bits(32) op1, bits(32) op2, boolean fpscr_controlled)
    fpscr_val = if fpscr_controlled then FPSCR else StandardFPSCRValue();
    (type1,sign1,value1) = FPUnpack(op1, fpscr_val);
    (type2,sign2,value2) = FPUnpack(op2, fpscr_val);
    if type1==FPType_SNaN || type1==FPType_QNaN || type2==FPType_SNaN || type2==FPType_QNaN then
        result = FALSE;
        if type1==FPType_SNaN || type2==FPType_SNaN then
            FPProcessException(FPExc_InvalidOp, fpscr_val);
    else
        // All non-NaN cases can be evaluated on the values produced by FPUnpack()
        result = (value1 == value2);
    return result;

// FPCompareGE()
// =============

boolean FPCompareGE(bits(32) op1, bits(32) op2, boolean fpscr_controlled)
    fpscr_val = if fpscr_controlled then FPSCR else StandardFPSCRValue();
    (type1,sign1,value1) = FPUnpack(op1, fpscr_val);
    (type2,sign2,value2) = FPUnpack(op2, fpscr_val);
    if type1==FPType_SNaN || type1==FPType_QNaN || type2==FPType_SNaN || type2==FPType_QNaN then
        result = FALSE;
        FPProcessException(FPExc_InvalidOp, fpscr_val);
    else
        // All non-NaN cases can be evaluated on the values produced by FPUnpack()
        result = (value1 >= value2);
    return result;

// FPCompareGT()
// =============

boolean FPCompareGT(bits(32) op1, bits(32) op2, boolean fpscr_controlled)
    fpscr_val = if fpscr_controlled then FPSCR else StandardFPSCRValue();
    (type1,sign1,value1) = FPUnpack(op1, fpscr_val);
    (type2,sign2,value2) = FPUnpack(op2, fpscr_val);
    if type1==FPType_SNaN || type1==FPType_QNaN || type2==FPType_SNaN || type2==FPType_QNaN then
        result = FALSE;
        FPProcessException(FPExc_InvalidOp, fpscr_val);
    else
        // All non-NaN cases can be evaluated on the values produced by FPUnpack()
        result = (value1 > value2);
    return result;
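
The flag encodings of Table A2-9 and the Advanced SIMD comparison results can be mimicked with C doubles, as in the following sketch. It assumes an IEEE 754 host; isnan() stands in for the QNaN and SNaN type tests, and the Invalid Operation side effects are not modelled. fp_compare_nzcv() and fp_compare_ge() are hypothetical names.

```c
#include <assert.h>
#include <math.h>

/* Returns the Table A2-9 flags packed as N:Z:C:V in bits 3..0.
   In C, -0.0 == +0.0, which matches comparing FPUnpack() values. */
unsigned fp_compare_nzcv(double a, double b) {
    if (isnan(a) || isnan(b)) return 0x3u;  /* Unordered    0 0 1 1 */
    if (a == b)               return 0x6u;  /* Equal        0 1 1 0 */
    if (a < b)                return 0x8u;  /* Less than    1 0 0 0 */
    return 0x2u;                            /* Greater than 0 0 1 0 */
}

/* FPCompareGE(): FALSE whenever either operand is a NaN. The C >=
   operator already yields 0 for NaN operands; the explicit test simply
   mirrors the structure of the pseudocode. */
int fp_compare_ge(double a, double b) {
    return !isnan(a) && !isnan(b) && a >= b;
}
```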

Floating-point maximum and minimum

// FPMax()
// =======

bits(N) FPMax(bits(N) op1, bits(N) op2, boolean fpscr_controlled)
    assert N IN {32,64};
    fpscr_val = if fpscr_controlled then FPSCR else StandardFPSCRValue();
    (type1,sign1,value1) = FPUnpack(op1, fpscr_val);
    (type2,sign2,value2) = FPUnpack(op2, fpscr_val);
    (done,result) = FPProcessNaNs(type1, type2, op1, op2, fpscr_val);
    if !done then
        if value1 > value2 then
            (type,sign,value) = (type1,sign1,value1);
        else
            (type,sign,value) = (type2,sign2,value2);
        if type == FPType_Infinity then
            result = FPInfinity(sign, N);
        elsif type == FPType_Zero then
            sign = sign1 AND sign2; // Use most positive sign
            result = FPZero(sign, N);
        else
            result = FPRound(value, N, fpscr_val);
    return result;

// FPMin()
// =======

bits(N) FPMin(bits(N) op1, bits(N) op2, boolean fpscr_controlled)
    assert N IN {32,64};
    fpscr_val = if fpscr_controlled then FPSCR else StandardFPSCRValue();
    (type1,sign1,value1) = FPUnpack(op1, fpscr_val);
    (type2,sign2,value2) = FPUnpack(op2, fpscr_val);
    (done,result) = FPProcessNaNs(type1, type2, op1, op2, fpscr_val);
    if !done then
        if value1 < value2 then
            (type,sign,value) = (type1,sign1,value1);
        else
            (type,sign,value) = (type2,sign2,value2);
        if type == FPType_Infinity then
            result = FPInfinity(sign, N);
        elsif type == FPType_Zero then
            sign = sign1 OR sign2; // Use most negative sign
            result = FPZero(sign, N);
        else
            result = FPRound(value, N, fpscr_val);
    return result;
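
The signed-zero rule in FPMax() and FPMin() is stricter than the C library, where fmax() and fmin() may return either zero for a (+0, -0) pair. The following C sketch, with hypothetical names fp_max and fp_min, applies the sign-bit AND and OR rules from the pseudocode to non-NaN doubles; NaN operands are assumed to have been handled already, as FPProcessNaNs() does.

```c
#include <assert.h>
#include <math.h>

/* When the selected value is a zero, combine the operands' sign bits with
   AND (most positive) as FPMax() does. If the other operand is nonzero it
   is smaller, so its sign bit is 1 and the zero's own sign survives. */
double fp_max(double a, double b) {
    assert(!isnan(a) && !isnan(b));
    double r = (a > b) ? a : b;
    if (r == 0.0)
        return (signbit(a) && signbit(b)) ? -0.0 : 0.0;
    return r;
}

/* FPMin(): the same selection, combining sign bits with OR (most negative). */
double fp_min(double a, double b) {
    assert(!isnan(a) && !isnan(b));
    double r = (a < b) ? a : b;
    if (r == 0.0)
        return (signbit(a) || signbit(b)) ? -0.0 : 0.0;
    return r;
}
```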

Floating-point addition and subtraction

// FPAdd()
// =======

bits(N) FPAdd(bits(N) op1, bits(N) op2, boolean fpscr_controlled)
    assert N IN {32,64};
    fpscr_val = if fpscr_controlled then FPSCR else StandardFPSCRValue();
    (type1,sign1,value1) = FPUnpack(op1, fpscr_val);
    (type2,sign2,value2) = FPUnpack(op2, fpscr_val);
    (done,result) = FPProcessNaNs(type1, type2, op1, op2, fpscr_val);
    if !done then
        inf1 = (type1 == FPType_Infinity); inf2 = (type2 == FPType_Infinity);
        zero1 = (type1 == FPType_Zero); zero2 = (type2 == FPType_Zero);
        if inf1 && inf2 && sign1 == NOT(sign2) then
            result = FPDefaultNaN(N);
            FPProcessException(FPExc_InvalidOp, fpscr_val);
        elsif (inf1 && sign1 == '0') || (inf2 && sign2 == '0') then
            result = FPInfinity('0', N);
        elsif (inf1 && sign1 == '1') || (inf2 && sign2 == '1') then
            result = FPInfinity('1', N);
        elsif zero1 && zero2 && sign1 == sign2 then
            result = FPZero(sign1, N);
        else
            result_value = value1 + value2;
            if result_value == 0.0 then // Sign of exact zero result depends on rounding mode
                result_sign = if fpscr_val<23:22> == '10' then '1' else '0';
                result = FPZero(result_sign, N);
            else
                result = FPRound(result_value, N, fpscr_val);
    return result;

// FPSub()
// =======

bits(N) FPSub(bits(N) op1, bits(N) op2, boolean fpscr_controlled)
    assert N IN {32,64};
    fpscr_val = if fpscr_controlled then FPSCR else StandardFPSCRValue();
    (type1,sign1,value1) = FPUnpack(op1, fpscr_val);
    (type2,sign2,value2) = FPUnpack(op2, fpscr_val);
    (done,result) = FPProcessNaNs(type1, type2, op1, op2, fpscr_val);
    if !done then
        inf1 = (type1 == FPType_Infinity); inf2 = (type2 == FPType_Infinity);
        zero1 = (type1 == FPType_Zero); zero2 = (type2 == FPType_Zero);
        if inf1 && inf2 && sign1 == sign2 then
            result = FPDefaultNaN(N);
            FPProcessException(FPExc_InvalidOp, fpscr_val);
        elsif (inf1 && sign1 == '0') || (inf2 && sign2 == '1') then
            result = FPInfinity('0', N);
        elsif (inf1 && sign1 == '1') || (inf2 && sign2 == '0') then
            result = FPInfinity('1', N);
        elsif zero1 && zero2 && sign1 == NOT(sign2) then
            result = FPZero(sign1, N);
        else
            result_value = value1 - value2;
            if result_value == 0.0 then // Sign of exact zero result depends on rounding mode
                result_sign = if fpscr_val<23:22> == '10' then '1' else '0';
                result = FPZero(result_sign, N);
            else
                result = FPRound(result_value, N, fpscr_val);
    return result;
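
On a host whose double arithmetic is IEEE 754 binary64 in its default state (Round to Nearest, no flush-to-zero), ordinary C addition and subtraction follow the same special cases as FPAdd() and FPSub(). The following sketch checks several of them. It is an illustration under that assumption, not a model of the ARM implementation, and it does not cover NaN propagation or default NaN mode. The function name is hypothetical.

```c
#include <assert.h>
#include <math.h>

/* Returns 1 if host arithmetic reproduces the FPAdd()/FPSub() special
   cases listed below. volatile keeps the compiler from folding the
   operations at compile time. */
int add_sub_special_cases_hold(void) {
    volatile double inf = INFINITY, pz = 0.0, nz = -0.0, one = 1.0;
    return isnan(inf + -inf)      /* FPAdd: +inf + -inf -> NaN (Invalid) */
        && isnan(inf - inf)       /* FPSub: +inf - +inf -> NaN (Invalid) */
        && isinf(inf + one)       /* FPAdd: an infinity dominates        */
        && signbit(nz + nz)       /* FPAdd: -0 + -0 -> -0                */
        && signbit(nz - pz)       /* FPSub: -0 - +0 -> -0                */
        && !signbit(nz + pz)      /* exact zero in Round to Nearest: +0  */
        && !signbit(one - one);   /* exact zero in Round to Nearest: +0  */
}
```

The last two checks correspond to the result_value == 0.0 branch: only Round towards Minus Infinity ('10') would give -0 there.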

Floating-point multiplication and division

// FPMul()
// =======

bits(N) FPMul(bits(N) op1, bits(N) op2, boolean fpscr_controlled)
    assert N IN {32,64};
    fpscr_val = if fpscr_controlled then FPSCR else StandardFPSCRValue();
    (type1,sign1,value1) = FPUnpack(op1, fpscr_val);
    (type2,sign2,value2) = FPUnpack(op2, fpscr_val);
    (done,result) = FPProcessNaNs(type1, type2, op1, op2, fpscr_val);
    if !done then
        inf1 = (type1 == FPType_Infinity); inf2 = (type2 == FPType_Infinity);
        zero1 = (type1 == FPType_Zero); zero2 = (type2 == FPType_Zero);
        if (inf1 && zero2) || (zero1 && inf2) then
            result = FPDefaultNaN(N);
            FPProcessException(FPExc_InvalidOp, fpscr_val);
        elsif inf1 || inf2 then
            result_sign = if sign1 == sign2 then '0' else '1';
            result = FPInfinity(result_sign, N);
        elsif zero1 || zero2 then
            result_sign = if sign1 == sign2 then '0' else '1';
            result = FPZero(result_sign, N);
        else
            result = FPRound(value1*value2, N, fpscr_val);
    return result;

// FPDiv()
// =======

bits(N) FPDiv(bits(N) op1, bits(N) op2, boolean fpscr_controlled)
    assert N IN {32,64};
    fpscr_val = if fpscr_controlled then FPSCR else StandardFPSCRValue();
    (type1,sign1,value1) = FPUnpack(op1, fpscr_val);
    (type2,sign2,value2) = FPUnpack(op2, fpscr_val);
    (done,result) = FPProcessNaNs(type1, type2, op1, op2, fpscr_val);
    if !done then
        inf1 = (type1 == FPType_Infinity); inf2 = (type2 == FPType_Infinity);
        zero1 = (type1 == FPType_Zero); zero2 = (type2 == FPType_Zero);
        if (inf1 && inf2) || (zero1 && zero2) then
            result = FPDefaultNaN(N);
            FPProcessException(FPExc_InvalidOp, fpscr_val);
        elsif inf1 || zero2 then
            result_sign = if sign1 == sign2 then '0' else '1';
            result = FPInfinity(result_sign, N);
            if !inf1 then FPProcessException(FPExc_DivideByZero, fpscr_val);
        elsif zero1 || inf2 then
            result_sign = if sign1 == sign2 then '0' else '1';
            result = FPZero(result_sign, N);
        else
            result = FPRound(value1/value2, N, fpscr_val);
    return result;
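
Under the same assumption of IEEE 754 binary64 host arithmetic in its default state, C multiplication and division reproduce the FPMul() and FPDiv() special cases. The following hypothetical check illustrates them; it does not observe the Divide by Zero flag and does not cover NaN propagation or default NaN mode.

```c
#include <assert.h>
#include <math.h>

/* Returns 1 if host arithmetic reproduces the FPMul()/FPDiv() special
   cases listed below. volatile keeps the compiler from folding them. */
int mul_div_special_cases_hold(void) {
    volatile double inf = INFINITY, pz = 0.0, nz = -0.0, two = 2.0;
    return isnan(inf * pz)                          /* FPMul: inf * 0 -> NaN       */
        && signbit(-two * pz)                       /* FPMul: -2 * +0 -> -0        */
        && isinf(two * -inf) && signbit(two * -inf) /* FPMul: 2 * -inf -> -inf     */
        && isnan(pz / nz)                           /* FPDiv: 0 / 0 -> NaN         */
        && isnan(inf / inf)                         /* FPDiv: inf / inf -> NaN     */
        && isinf(two / nz) && signbit(two / nz)     /* FPDiv: 2 / -0 -> -inf       */
        && nz / -inf == 0.0 && !signbit(nz / -inf); /* FPDiv: -0 / -inf -> +0      */
}
```

In each infinity or zero result, the sign is the exclusive OR of the operand signs, which is what the result_sign expression in the pseudocode computes.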

Floating-point fused multiply-add

// FPMulAdd()
// ==========
// Calculates addend + op1*op2 with a single rounding.

bits(N) FPMulAdd(bits(N) addend, bits(N) op1, bits(N) op2,
                 boolean fpscr_controlled)
    assert N IN {32,64};
    fpscr_val = if fpscr_controlled then FPSCR else StandardFPSCRValue();
    (typeA,signA,valueA) = FPUnpack(addend, fpscr_val);
    (type1,sign1,value1) = FPUnpack(op1, fpscr_val);
    (type2,sign2,value2) = FPUnpack(op2, fpscr_val);
    inf1 = (type1 == FPType_Infinity); zero1 = (type1 == FPType_Zero);
    inf2 = (type2 == FPType_Infinity); zero2 = (type2 == FPType_Zero);
    (done,result) = FPProcessNaNs3(typeA, type1, type2, addend, op1, op2, fpscr_val);
    if typeA == FPType_QNaN && ((inf1 && zero2) || (zero1 && inf2)) then
        result = FPDefaultNaN(N);
        FPProcessException(FPExc_InvalidOp, fpscr_val);
    if !done then
        infA = (typeA == FPType_Infinity); zeroA = (typeA == FPType_Zero);
        // Determine sign and type product will have if it does not cause an Invalid
        // Operation.
        signP = if sign1 == sign2 then '0' else '1';
        infP = inf1 || inf2;
        zeroP = zero1 || zero2;
        // Non SNaN-generated Invalid Operation cases are multiplies of zero by infinity and
        // additions of opposite-signed infinities.
        if (inf1 && zero2) || (zero1 && inf2) || (infA && infP && signA == NOT(signP)) then
            result = FPDefaultNaN(N);
            FPProcessException(FPExc_InvalidOp, fpscr_val);
        // Other cases involving infinities produce an infinity of the same sign.
        elsif (infA && signA == '0') || (infP && signP == '0') then
            result = FPInfinity('0', N);
        elsif (infA && signA == '1') || (infP && signP == '1') then
            result = FPInfinity('1', N);
        // Cases where the result is exactly zero and its sign is not determined by the
        // rounding mode are additions of same-signed zeros.
        elsif zeroA && zeroP && signA == signP then
            result = FPZero(signA, N);
        // Otherwise calculate numerical result and round it.
        else
            result_value = valueA + (value1 * value2);
            if result_value == 0.0 then // Sign of exact zero result depends on rounding mode
                result_sign = if fpscr_val<23:22> == '10' then '1' else '0';
                result = FPZero(result_sign, N);
            else
                result = FPRound(result_value, N, fpscr_val);
    return result;
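
FPMulAdd() differs from the two-operand functions in one respect that is easy to miss: a quiet NaN addend does not suppress the Invalid Operation exception raised by a zero-times-infinity product. The following C sketch reports which path the pseudocode takes for given operand types and sign bits, without computing a value. The Path labels and fmuladd_path() are hypothetical names.

```c
#include <assert.h>

/* Mirrors the FPType enumeration; QNaN and SNaN are the last two values. */
typedef enum { FPType_Nonzero, FPType_Zero, FPType_Infinity,
               FPType_QNaN, FPType_SNaN } FPType;

typedef enum { PATH_PROPAGATE_NAN,        /* result from FPProcessNaNs3()    */
               PATH_DEFAULT_NAN_INVALID,  /* default NaN, Invalid Operation  */
               PATH_OTHER } Path;         /* infinity, zero, or FPRound()    */

Path fmuladd_path(FPType ta, int signA, FPType t1, int sign1,
                  FPType t2, int sign2) {
    assert((signA == 0 || signA == 1) && (sign1 == 0 || sign1 == 1) &&
           (sign2 == 0 || sign2 == 1));
    int signP = sign1 != sign2;                          /* sign of op1*op2 */
    int any_nan = ta >= FPType_QNaN || t1 >= FPType_QNaN || t2 >= FPType_QNaN;
    int zero_times_inf = (t1 == FPType_Infinity && t2 == FPType_Zero) ||
                         (t1 == FPType_Zero && t2 == FPType_Infinity);
    int infA = ta == FPType_Infinity;
    int infP = t1 == FPType_Infinity || t2 == FPType_Infinity;
    /* A quiet NaN addend does not hide an invalid 0 * infinity product. */
    if (ta == FPType_QNaN && zero_times_inf) return PATH_DEFAULT_NAN_INVALID;
    if (any_nan) return PATH_PROPAGATE_NAN;
    /* Remaining Invalid cases: 0 * infinity, or adding opposite infinities. */
    if (zero_times_inf || (infA && infP && signA != signP))
        return PATH_DEFAULT_NAN_INVALID;
    return PATH_OTHER;
}
```

A signalling NaN addend, by contrast, still takes priority: FPProcessNaNs3() quietens and returns it, and the quiet-NaN override does not apply.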

Floating-point reciprocal estimate and step

The Advanced SIMD Extension includes instructions that support Newton-Raphson calculation of the reciprocal of a number.

The VRECPE instruction produces the initial estimate of the reciprocal. It uses the following pseudocode functions:

// FPRecipEstimate()
// =================

bits(32) FPRecipEstimate(bits(32) operand)
    (type,sign,value) = FPUnpack(operand, StandardFPSCRValue());
    if type == FPType_SNaN || type == FPType_QNaN then
        result = FPProcessNaN(type, operand, StandardFPSCRValue());
    elsif type == FPType_Infinity then
        result = FPZero(sign, 32);
    elsif type == FPType_Zero then
        result = FPInfinity(sign, 32);
        FPProcessException(FPExc_DivideByZero, StandardFPSCRValue());
    elsif Abs(value) >= 2^126 then // Result underflows to zero of correct sign
        result = FPZero(sign, 32);
        FPProcessException(FPExc_Underflow, StandardFPSCRValue());
    else
        // Operand must be normalized, since denormalized numbers are flushed to zero. Scale to a
        // double-precision value in the range 0.5 <= x < 1.0, and calculate result exponent.
        // Scaled value is positive, with:
        //    exponent = 1022 = double-precision representation of 2^(-1)
        //    fraction = original fraction extended with zeros.
        scaled = '0 01111111110' : operand<22:0> : Zeros(29);
        result_exp = 253 - UInt(operand<30:23>); // In range 253-252 = 1 to 253-1 = 252
        // Call C function to get reciprocal estimate of scaled value.
        estimate = recip_estimate(scaled);
        // Result is double-precision and a multiple of 1/256 in the range 1 to 511/256. Convert
        // to scaled single-precision result with the original sign bit, the copied high-order
        // fraction bits, and the exponent calculated above.
        result = sign : result_exp<7:0> : estimate<51:29>;
    return result;

// UnsignedRecipEstimate()
// =======================

bits(32) UnsignedRecipEstimate(bits(32) operand)
    if operand<31> == '0' then // Operands <= 0x7FFFFFFF produce 0xFFFFFFFF
        result = Ones(32);
    else
        // Generate double-precision value = operand * 2^(-32). This has zero sign bit, with:
        //    exponent = 1022 = double-precision representation of 2^(-1)
        //    fraction taken from operand, excluding its most significant bit.
        dp_operand = '0 01111111110' : operand<30:0> : Zeros(21);
        // Call C function to get reciprocal estimate of scaled value.
        estimate = recip_estimate(dp_operand);
        // Result is double-precision and a multiple of 1/256 in the range 1 to 511/256.
        // Multiply by 2^31 and convert to an unsigned integer - this just involves
        // concatenating the implicit units bit with the top 31 fraction bits.
        result = '1' : estimate<51:21>;
    return result;

where recip_estimate() is defined by the following C function:

double recip_estimate(double a)
{
    int q, s;
    double r;
    q = (int)(a * 512.0);                   /* a in units of 1/512 rounded down */
    r = 1.0 / (((double)q + 0.5) / 512.0);  /* reciprocal r */
    s = (int)(256.0 * r + 0.5);             /* r in units of 1/256 rounded to nearest */
    return (double)s / 256.0;
}
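
The following C sketch reproduces the bit manipulations that FPRecipEstimate() and UnsignedRecipEstimate() perform around recip_estimate(), using memcpy() to move between a double and its bit pattern. It assumes the host double is IEEE 754 binary64. fp_recip_estimate_normal() and unsigned_recip_estimate() are hypothetical names, and the former covers only the final else branch of FPRecipEstimate(), that is, normalized operands with an exponent field of 1 to 252.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* recip_estimate() as defined above. */
double recip_estimate(double a) {
    int q, s;
    double r;
    q = (int)(a * 512.0);                   /* a in units of 1/512 rounded down */
    r = 1.0 / (((double)q + 0.5) / 512.0);  /* reciprocal r */
    s = (int)(256.0 * r + 0.5);             /* r in units of 1/256 rounded to nearest */
    return (double)s / 256.0;
}

/* Reinterpret between a double and its 64-bit encoding. */
static uint64_t dbl_bits(double d) { uint64_t u; memcpy(&u, &d, sizeof u); return u; }
static double bits_dbl(uint64_t u) { double d; memcpy(&d, &u, sizeof d); return d; }

/* FPRecipEstimate(), final 'else' branch only. */
uint32_t fp_recip_estimate_normal(uint32_t operand) {
    uint32_t exp = (operand >> 23) & 0xFFu;
    assert(exp >= 1 && exp <= 252);
    /* scaled = '0 01111111110' : operand<22:0> : Zeros(29) */
    uint64_t scaled = (UINT64_C(0x3FE) << 52) | ((uint64_t)(operand & 0x7FFFFFu) << 29);
    uint32_t result_exp = 253 - exp;
    uint64_t estimate = dbl_bits(recip_estimate(bits_dbl(scaled)));
    /* result = sign : result_exp<7:0> : estimate<51:29> */
    return (operand & 0x80000000u) | (result_exp << 23)
         | (uint32_t)((estimate >> 29) & 0x7FFFFFu);
}

/* UnsignedRecipEstimate(). */
uint32_t unsigned_recip_estimate(uint32_t operand) {
    if ((operand >> 31) == 0)
        return 0xFFFFFFFFu;                /* operands <= 0x7FFFFFFF */
    /* dp_operand = '0 01111111110' : operand<30:0> : Zeros(21) */
    uint64_t dp = (UINT64_C(0x3FE) << 52) | ((uint64_t)(operand & 0x7FFFFFFFu) << 21);
    uint64_t estimate = dbl_bits(recip_estimate(bits_dbl(dp)));
    /* result = '1' : estimate<51:21> */
    return 0x80000000u | (uint32_t)((estimate >> 21) & 0x7FFFFFFFu);
}
```

For 2.0 (0x40000000) the sketch gives 0x3EFF8000, that is 0.4990234375, and for the unsigned operand 0x80000000 it gives 0xFF800000.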

Table A2-10 shows the results where input values are out of range.

The Newton-Raphson iteration:

$$x_{n+1} = x_n (2 - d\,x_n)$$

converges to $1/d$ if $x_0$ is the result of VRECPE applied to $d$.

The VRECPS instruction performs a (2 - op1×op2) calculation and can be used with a multiplication to perform a step of this iteration. The functionality of this instruction is defined by the following pseudocode function:

// FPRecipStep()
// =============

bits(32) FPRecipStep(bits(32) op1, bits(32) op2)
    (type1,sign1,value1) = FPUnpack(op1, StandardFPSCRValue());
    (type2,sign2,value2) = FPUnpack(op2, StandardFPSCRValue());
    (done,result) = FPProcessNaNs(type1, type2, op1, op2, StandardFPSCRValue());
    if !done then
        inf1 = (type1 == FPType_Infinity); inf2 = (type2 == FPType_Infinity);
        zero1 = (type1 == FPType_Zero); zero2 = (type2 == FPType_Zero);
        if (inf1 && zero2) || (zero1 && inf2) then
            product = FPZero('0', 32);
        else
            product = FPMul(op1, op2, FALSE);
        result = FPSub(FPTwo(32), product, FALSE);
    return result;

Table A2-11 shows the results where input values are out of range.
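
To see the iteration converge, the following C sketch starts from recip_estimate() and applies x = x * (2 - d*x), where the bracketed term is the arithmetic that VRECPS performs and the outer product is a separate multiplication. It works on host doubles, assumes 0.5 <= d < 1.0 (the input range recip_estimate() is designed for), and omits the special cases of Table A2-11. recip_step() and reciprocal() are hypothetical names.

```c
#include <assert.h>

/* recip_estimate() as defined earlier, repeated so that the sketch is
   self-contained. */
double recip_estimate(double a) {
    int q = (int)(a * 512.0);
    double r = 1.0 / (((double)q + 0.5) / 512.0);
    int s = (int)(256.0 * r + 0.5);
    return (double)s / 256.0;
}

/* The arithmetic VRECPS performs, 2 - op1*op2, ignoring special cases. */
double recip_step(double op1, double op2) { return 2.0 - op1 * op2; }

/* Estimate 1/d, then refine it with 'steps' Newton-Raphson iterations. */
double reciprocal(double d, int steps) {
    assert(d >= 0.5 && d < 1.0);
    double x = recip_estimate(d);
    for (int i = 0; i < steps; i++)
        x = x * recip_step(d, x);   /* one VRECPS and one multiply per step */
    return x;
}
```

If d*x = 1 + e, one step gives d*x = 1 - e*e, so each step roughly squares the relative error. For d = 0.75 the estimate is 341/256, and two steps bring d*x to within about 1e-12 of 1.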

Table A2-10 VRECPE results for out of range inputs

Number type      Input Vm[i]                   Result Vd[i]
Integer          <= 0x7FFFFFFF                 0xFFFFFFFF
Floating-point   NaN                           Default NaN
Floating-point   ±0 or denormalized number     ±infinity (a)
Floating-point   ±infinity                     ±0
Floating-point   Absolute value >= 2^126       ±0

a. FPSCR.DZC is set to 1.

Table A2-11 VRECPS results for out of range inputs

Input Vn[i]                    Input Vm[i]                    Result Vd[i]
Any NaN                        -                              Default NaN
-                              Any NaN                        Default NaN
±0.0 or denormalized number    ±infinity                      2.0
±infinity                      ±0.0 or denormalized number    2.0

Floating-point square root

// FPSqrt()
// ========

bits(N) FPSqrt(bits(N) operand)
    assert N IN {32,64};
    (type,sign,value) = FPUnpack(operand, FPSCR);
    if type == FPType_SNaN || type == FPType_QNaN then
        result = FPProcessNaN(type, operand, FPSCR);
    elsif type == FPType_Zero then
        result = FPZero(sign, N);
    elsif type == FPType_Infinity && sign == '0' then
        result = FPInfinity(sign, N);
    elsif sign == '1' then
        result = FPDefaultNaN(N);
        FPProcessException(FPExc_InvalidOp, FPSCR);
    else
        result = FPRound(Sqrt(value), N, FPSCR);
    return result;

Floating-point reciprocal square root estimate and step

The Advanced SIMD Extension includes instructions that support Newton-Raphson calculation of the reciprocal of the square root of a number.

The VRSQRTE instruction produces the initial estimate of the reciprocal of the square root. It uses the following pseudocode functions:

// FPRSqrtEstimate()
// =================

bits(32) FPRSqrtEstimate(bits(32) operand)
    (type,sign,value) = FPUnpack(operand, StandardFPSCRValue());
    if type == FPType_SNaN || type == FPType_QNaN then
        result = FPProcessNaN(type, operand, StandardFPSCRValue());
    elsif type == FPType_Zero then
        result = FPInfinity(sign, 32);
        FPProcessException(FPExc_DivideByZero, StandardFPSCRValue());
    elsif sign == '1' then
        result = FPDefaultNaN(32);
        FPProcessException(FPExc_InvalidOp, StandardFPSCRValue());
    elsif type == FPType_Infinity then
        result = FPZero('0', 32);
    else
        // Operand must be normalized, since denormalized numbers are flushed to zero. Scale to a
        // double-precision value in the range 0.25 <= x < 1.0, with the evenness or oddness of
        // the exponent unchanged, and calculate result exponent.
        // Scaled value has positive sign bit, with:
        //    exponent = 1022 or 1021 = double-precision representation of 2^(-1) or 2^(-2)
        //    fraction = original fraction extended with zeros.
        if operand<23> == '0' then
            scaled = '0 01111111110' : operand<22:0> : Zeros(29);
        else
            scaled = '0 01111111101' : operand<22:0> : Zeros(29);
        result_exp = (380 - UInt(operand<30:23>)) DIV 2;
        // Call C function to get reciprocal estimate of scaled value.
        estimate = recip_sqrt_estimate(scaled);
        // Result is double-precision and a multiple of 1/256 in the range 1 to 511/256. Convert
        // to scaled single-precision result with positive sign bit and high-order fraction bits,
        // and exponent calculated above.
        result = '0' : result_exp<7:0> : estimate<51:29>;
    return result;

// UnsignedRSqrtEstimate()
// =======================

bits(32) UnsignedRSqrtEstimate(bits(32) operand)
    if operand<31:30> == '00' then // Operands <= 0x3FFFFFFF produce 0xFFFFFFFF
        result = Ones(32);
    else
        // Generate double-precision value = operand * 2^(-32). This has zero sign bit, with:
        //    exponent = 1022 or 1021 = double-precision representation of 2^(-1) or 2^(-2)
        //    fraction taken from operand, excluding its most significant one or two bits.
        if operand<31> == '1' then
            dp_operand = '0 01111111110' : operand<30:0> : Zeros(21);
        else // operand<31:30> == '01'
            dp_operand = '0 01111111101' : operand<29:0> : Zeros(22);
        // Call C function to get reciprocal estimate of scaled value.
        estimate = recip_sqrt_estimate(dp_operand);
        // Result is double-precision and a multiple of 1/256 in the range 1 to 511/256.
        // Multiply by 2^31 and convert to an unsigned integer - this just involves
        // concatenating the implicit units bit with the top 31 fraction bits.
        result = '1' : estimate<51:21>;
    return result;

where recip_sqrt_estimate() is defined by the following C function, which requires <math.h> for sqrt():

#include <math.h>

double recip_sqrt_estimate(double a)
{
    int q0, q1, s;
    double r;
    if (a < 0.5) /* range 0.25 <= a < 0.5 */
    {
        q0 = (int)(a * 512.0);                        /* a in units of 1/512 rounded down */
        r = 1.0 / sqrt(((double)q0 + 0.5) / 512.0);   /* reciprocal root r */
    }
    else /* range 0.5 <= a < 1.0 */
    {
        q1 = (int)(a * 256.0);                        /* a in units of 1/256 rounded down */
        r = 1.0 / sqrt(((double)q1 + 0.5) / 256.0);   /* reciprocal root r */
    }
    s = (int)(256.0 * r + 0.5);                       /* r in units of 1/256 rounded to nearest */
    return (double)s / 256.0;
}
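
A similar C sketch for the reciprocal square root: recip_sqrt_estimate() supplies x0, and each step computes x = x * (3 - (d*x)*x)/2, matching the description of VRSQRTS plus two multiplications given below. To keep the sketch free of a libm dependency, sqrt() is replaced by a hypothetical Newton's-method helper, sqrt_newton(), which can differ from the library function in the last bit. rsqrt_step() and reciprocal_sqrt() are also hypothetical names, and special cases are omitted.

```c
#include <assert.h>

/* Stand-in for sqrt() on the arguments used here (0.25 to 1.0):
   Heron's method, eight iterations from 1.0. */
double sqrt_newton(double a) {
    double y = 1.0;
    for (int i = 0; i < 8; i++)
        y = 0.5 * (y + a / y);
    return y;
}

/* recip_sqrt_estimate() as given above, with sqrt() replaced. */
double recip_sqrt_estimate(double a) {
    int q0, q1, s;
    double r;
    if (a < 0.5) {                        /* range 0.25 <= a < 0.5 */
        q0 = (int)(a * 512.0);
        r = 1.0 / sqrt_newton(((double)q0 + 0.5) / 512.0);
    } else {                              /* range 0.5 <= a < 1.0 */
        q1 = (int)(a * 256.0);
        r = 1.0 / sqrt_newton(((double)q1 + 0.5) / 256.0);
    }
    s = (int)(256.0 * r + 0.5);
    return (double)s / 256.0;
}

/* The arithmetic VRSQRTS performs, (3 - op1*op2)/2. */
double rsqrt_step(double op1, double op2) { return (3.0 - op1 * op2) / 2.0; }

/* Estimate 1/sqrt(d) for 0.25 <= d < 1.0, then refine it. Each step is
   two multiplications (d*x and the final product) and one VRSQRTS. */
double reciprocal_sqrt(double d, int steps) {
    assert(d >= 0.25 && d < 1.0);
    double x = recip_sqrt_estimate(d);
    for (int i = 0; i < steps; i++)
        x = x * rsqrt_step(d * x, x);
    return x;
}
```

For d = 0.5 the estimate is 361/256, about 0.3% below the true value, and three steps reduce d*x*x - 1 to the level of double-precision rounding.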

Table A2-12 shows the results where input values are out of range.

The Newton-Raphson iteration:

$$x_{n+1} = \frac{x_n\,(3 - d\,x_n^2)}{2}$$

converges to $1/\sqrt{d}$ if $x_0$ is the result of VRSQRTE applied to $d$.

The VRSQRTS instruction performs a (3 – op1×op2)/2 calculation and can be used with two multiplications to perform a step of this iteration. The FPRSqrtStep() pseudocode function defines the functionality of this instruction:

// FPRSqrtStep()
// =============

bits(32) FPRSqrtStep(bits(32) op1, bits(32) op2)
    (type1,sign1,value1) = FPUnpack(op1, StandardFPSCRValue());
    (type2,sign2,value2) = FPUnpack(op2, StandardFPSCRValue());
    (done,result) = FPProcessNaNs(type1, type2, op1, op2, StandardFPSCRValue());
    if !done then
        inf1 = (type1 == FPType_Infinity);  inf2 = (type2 == FPType_Infinity);
        zero1 = (type1 == FPType_Zero);     zero2 = (type2 == FPType_Zero);
        if (inf1 && zero2) || (zero1 && inf2) then
            product = FPZero('0', 32);
        else
            product = FPMul(op1, op2, FALSE);
        result = FPHalvedSub(FPThree(32), product, FALSE);
    return result;

Table A2-13 shows the results where input values are out of range.

FPRSqrtStep() calls the FPHalvedSub() pseudocode function:

// FPHalvedSub()
// =============

bits(N) FPHalvedSub(bits(N) op1, bits(N) op2, boolean fpscr_controlled)
    assert N IN {32,64};
    fpscr_val = if fpscr_controlled then FPSCR else StandardFPSCRValue();
    (type1,sign1,value1) = FPUnpack(op1, fpscr_val);
    (type2,sign2,value2) = FPUnpack(op2, fpscr_val);
    (done,result) = FPProcessNaNs(type1, type2, op1, op2, fpscr_val);
    if !done then
        inf1 = (type1 == FPType_Infinity);  inf2 = (type2 == FPType_Infinity);
        zero1 = (type1 == FPType_Zero);     zero2 = (type2 == FPType_Zero);
        if inf1 && inf2 && sign1 == sign2 then
            result = FPDefaultNaN(N);
            FPProcessException(FPExc_InvalidOp, fpscr_val);
        elsif (inf1 && sign1 == '0') || (inf2 && sign2 == '1') then
            result = FPInfinity('0', N);
        elsif (inf1 && sign1 == '1') || (inf2 && sign2 == '0') then
            result = FPInfinity('1', N);
        elsif zero1 && zero2 && sign1 == NOT(sign2) then
            result = FPZero(sign1, N);
        else
            result_value = (value1 - value2) / 2.0;
            if result_value == 0.0 then  // Sign of exact zero result depends on rounding mode
                result_sign = if fpscr_val<23:22> == '10' then '1' else '0';
                result = FPZero(result_sign, N);
            else
                result = FPRound(result_value, N, fpscr_val);
    return result;

Table A2-12 VRSQRTE results for out of range inputs

Number type      Input Vm[i]                            Result Vd[i]
Integer          <= 0x3FFFFFFF                          0xFFFFFFFF
Floating-point   NaN, –(normalized number), –infinity   Default NaN
Floating-point   –0 or –(denormalized number)           –infinity (a)
Floating-point   +0 or +(denormalized number)           +infinity (a)
Floating-point   +infinity                              +0

a. FPSCR.DZC is set to 1.

Table A2-13 VRSQRTS results for out of range inputs

Input Vn[i]                    Input Vm[i]                    Result Vd[i]
Any NaN                        -                              Default NaN
-                              Any NaN                        Default NaN
±0.0 or denormalized number    ±infinity                      1.5
±infinity                      ±0.0 or denormalized number    1.5
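The Newton-Raphson iteration that VRSQRTE and VRSQRTS support can be sketched in C to show how one VRSQRTS-style step and two multiplications combine. This is a host-side model under stated assumptions: rsqrt_step() follows FPRSqrtStep() only for ordinary values and for the 0 × infinity case, which yields 1.5 as in Table A2-13. It ignores NaN propagation, the treatment of denormalized inputs as zero, the fixed StandardFPSCRValue() rounding, and exception flags. Both function names are hypothetical.

```c
#include <math.h>

/* Hypothetical model of the VRSQRTS step: (3 - op1*op2) / 2. */
static float rsqrt_step(float op1, float op2)
{
    int zero_times_inf = (op1 == 0.0f && isinf(op2)) || (isinf(op1) && op2 == 0.0f);
    float product = zero_times_inf ? 0.0f : op1 * op2;  /* FPZero('0') for 0 x infinity */
    return (3.0f - product) / 2.0f;                     /* FPHalvedSub(3.0, product) */
}

/* Newton-Raphson refinement of x ~ 1/sqrt(d): x_{n+1} = x_n * (3 - d*x_n^2) / 2. */
static float refine_rsqrt(float d, float x0, int steps)
{
    float x = x0;
    for (int i = 0; i < steps; i++) {
        float dx = d * x;              /* first multiplication */
        x = x * rsqrt_step(dx, x);     /* VRSQRTS-style step, then second multiplication */
    }
    return x;
}
```

For example, refine_rsqrt(2.0f, 0.7f, 3) moves from the rough estimate 0.7 to within about 1e-6 of 1/√2 ≈ 0.70710678, because the error roughly squares at each step. On hardware, x0 would come from VRSQRTE rather than being chosen by hand.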

Floating-point conversions

The following functions perform conversions between half-precision and single-precision floating-point numbers.

// FPHalfToSingle()
// ================

bits(32) FPHalfToSingle(bits(16) operand, boolean fpscr_controlled)
    fpscr_val = if fpscr_controlled then FPSCR else StandardFPSCRValue();
    (type,sign,value) = FPUnpack(operand, fpscr_val);
    if type == FPType_SNaN || type == FPType_QNaN then
        if fpscr_val<25> == '1' then  // DN bit set
            result = FPDefaultNaN(32);
        else
            result = sign : '11111111 1' : operand<8:0> : Zeros(13);
        if type == FPType_SNaN then
            FPProcessException(FPExc_InvalidOp, fpscr_val);
    elsif type == FPType_Infinity then
        result = FPInfinity(sign, 32);
    elsif type == FPType_Zero then
        result = FPZero(sign, 32);
    else
        result = FPRound(value, 32, fpscr_val);  // Rounding will be exact
    return result;
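To make the bit manipulation in FPHalfToSingle() concrete, here is a C sketch of the same conversion on raw encodings. It assumes the FPSCR AH bit is clear (IEEE half-precision format) and the DN bit is clear, and it does not report exceptions, so a signaling NaN is simply returned quieted. The function name is hypothetical.

```c
#include <stdint.h>

/* Hypothetical bit-level model of FPHalfToSingle(): AH and DN bits assumed clear,
   no exception reporting. */
static uint32_t half_to_single_bits(uint16_t operand)
{
    uint32_t sign = (uint32_t)(operand >> 15) << 31;
    uint32_t exponent = (operand >> 10) & 0x1Fu;
    uint32_t fraction = operand & 0x3FFu;

    if (exponent == 0x1Fu) {
        if (fraction == 0)                           /* infinity */
            return sign | 0x7F800000u;
        /* NaN: sign : '11111111 1' : operand<8:0> : Zeros(13), quiet bit forced to 1 */
        return sign | 0x7FC00000u | ((uint32_t)(operand & 0x1FFu) << 13);
    }
    if (exponent == 0) {
        if (fraction == 0)                           /* signed zero */
            return sign;
        /* Denormalized half: value = fraction * 2^-24. Normalize; exact in single. */
        int e = -14;
        while ((fraction & 0x400u) == 0) {
            fraction <<= 1;
            e--;
        }
        return sign | ((uint32_t)(e + 127) << 23) | ((fraction & 0x3FFu) << 13);
    }
    /* Normal: rebias the exponent from 15 to 127 and widen the fraction; exact. */
    return sign | ((exponent + 112u) << 23) | (fraction << 13);
}
```

Every finite half-precision value is exactly representable in single precision, which is why the pseudocode can note that rounding will be exact; the sketch reflects this by never discarding fraction bits.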

// FPSingleToHalf()
// ================

bits(16) FPSingleToHalf(bits(32) operand, boolean fpscr_controlled)
    fpscr_val = if fpscr_controlled then FPSCR else StandardFPSCRValue();
    (type,sign,value) = FPUnpack(operand, fpscr_val);
    if type == FPType_SNaN || type == FPType_QNaN then
        if fpscr_val<26> == '1' then       // AH bit set
            result = FPZero(sign, 16);
        elsif fpscr_val<25> == '1' then    // DN bit set
            result = FPDefaultNaN(16);
        else
            result = sign : '11111 1' : operand<21:13>;
        if type == FPType_SNaN || fpscr_val<26> == '1' then
            FPProcessException(FPExc_InvalidOp, fpscr_val);
    elsif type == FPType_Infinity then
        if fpscr_val<26> == '1' then       // AH bit set
            result = sign : Ones(15);
            FPProcessException(FPExc_InvalidOp, fpscr_val);
        else
            result = FPInfinity(sign, 16);
    elsif type == FPType_Zero then
        result = FPZero(sign, 16);
    else
        result = FPRound(value, 16, fpscr_val);
    return result;

The following functions perform conversions between single-precision and double-precision floating-point numbers.

// FPSingleToDouble()
// ==================

bits(64) FPSingleToDouble(bits(32) operand, boolean fpscr_controlled)
    fpscr_val = if fpscr_controlled then FPSCR else StandardFPSCRValue();
    (type,sign,value) = FPUnpack(operand, fpscr_val);
    if type == FPType_SNaN || type == FPType_QNaN then
        if fpscr_val<25> == '1' then  // DN bit set
            result = FPDefaultNaN(64);
        else
            result = sign : '11111111111 1' : operand<21:0> : Zeros(29);
        if type == FPType_SNaN then
            FPProcessException(FPExc_InvalidOp, fpscr_val);
    elsif type == FPType_Infinity then
        result = FPInfinity(sign, 64);
    elsif type == FPType_Zero then
        result = FPZero(sign, 64);
    else
        result = FPRound(value, 64, fpscr_val);  // Rounding will be exact
    return result;

// FPDoubleToSingle()
// ==================

bits(32) FPDoubleToSingle(bits(64) operand, boolean fpscr_controlled)
    fpscr_val = if fpscr_controlled then FPSCR else StandardFPSCRValue();
    (type,sign,value) = FPUnpack(operand, fpscr_val);
    if type == FPType_SNaN || type == FPType_QNaN then
        if fpscr_val<25> == '1' then  // DN bit set
            result = FPDefaultNaN(32);
        else
            result = sign : '11111111 1' : operand<50:29>;
        if type == FPType_SNaN then
            FPProcessException(FPExc_InvalidOp, fpscr_val);
    elsif type == FPType_Infinity then
        result = FPInfinity(sign, 32);
    elsif type == FPType_Zero then
        result = FPZero(sign, 32);
    else
        result = FPRound(value, 32, fpscr_val);
    return result;
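The NaN branches of FPSingleToDouble() and FPDoubleToSingle() can be written directly as bit operations. This C sketch assumes the DN bit is clear and leaves out the Invalid Operation exception for signaling NaNs. It shows that the quiet bit is forced to 1, that a single-precision payload survives a round trip through double precision, and that the low 29 fraction bits of a double-precision NaN are discarded when it is narrowed. The function names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical bit-level models of the NaN paths, DN bit assumed clear. */
static uint64_t single_nan_to_double_bits(uint32_t operand)
{
    uint64_t sign = (uint64_t)(operand >> 31) << 63;
    /* sign : '11111111111 1' : operand<21:0> : Zeros(29) */
    return sign | 0x7FF8000000000000ull | ((uint64_t)(operand & 0x3FFFFFu) << 29);
}

static uint32_t double_nan_to_single_bits(uint64_t operand)
{
    uint32_t sign = (uint32_t)(operand >> 63) << 31;
    /* sign : '11111111 1' : operand<50:29> */
    return sign | 0x7FC00000u | (uint32_t)((operand >> 29) & 0x3FFFFFu);
}
```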

The following functions perform conversions between floating-point numbers and integers or fixed-point numbers:

// FPToFixed()
// ===========

bits(M) FPToFixed(bits(N) operand, integer M, integer fraction_bits, boolean unsigned,
                  boolean round_towards_zero, boolean fpscr_controlled)
    assert N IN {32,64};
    fpscr_val = if fpscr_controlled then FPSCR else StandardFPSCRValue();
    if round_towards_zero then fpscr_val<23:22> = '11';
    (type,sign,value) = FPUnpack(operand, fpscr_val);

    // For NaNs and infinities, FPUnpack() has produced a value that will round to the
    // required result of the conversion. Also, the value produced for infinities will
    // cause the conversion to overflow and signal an Invalid Operation floating-point
    // exception as required. NaNs must also generate such a floating-point exception.
    if type == FPType_SNaN || type == FPType_QNaN then
        FPProcessException(FPExc_InvalidOp, fpscr_val);

    // Scale value by specified number of fraction bits, then start rounding to an integer
    // and determine the rounding error.
    value = value * 2^fraction_bits;
    int_result = RoundDown(value);
    error = value - int_result;

    // Apply the specified rounding mode.
    case fpscr_val<23:22> of
        when '00'  // Round to Nearest (rounding to even if exactly halfway)
            round_up = (error > 0.5 || (error == 0.5 && int_result<0> == '1'));
        when '01'  // Round towards Plus Infinity
            round_up = (error != 0.0);
        when '10'  // Round towards Minus Infinity
            round_up = FALSE;
        when '11'  // Round towards Zero
            round_up = (error != 0.0 && int_result < 0);
    if round_up then int_result = int_result + 1;

    // Bitstring result is the integer result saturated to the destination size, with
    // saturation indicating overflow of the conversion (signaled as an Invalid
    // Operation floating-point exception).
    (result, overflow) = SatQ(int_result, M, unsigned);
    if overflow then
        FPProcessException(FPExc_InvalidOp, fpscr_val);
    elsif error != 0.0 then
        FPProcessException(FPExc_Inexact, fpscr_val);
    return result;
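A host-side C sketch of FPToFixed() for a double-precision operand and a signed 32-bit result (N = 64, M = 32, unsigned = FALSE) shows how the rounding-mode table and SatQ() interact. Assumptions: the FPSCR<23:22> encoding is passed in as rmode, two boolean flags stand in for FPProcessException(), a NaN produces 0 with Invalid Operation (FPUnpack() supplies a value that rounds to the required result), and 0 <= fraction_bits <= 32. The function, enum, and parameter names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* FPSCR<23:22> rounding-mode encodings. */
enum { RMODE_NEAREST = 0, RMODE_PLUS_INF = 1, RMODE_MINUS_INF = 2, RMODE_ZERO = 3 };

/* Hypothetical model of FPToFixed() with N = 64, M = 32, unsigned = FALSE.
   *invalid_op and *inexact stand in for FPProcessException(). */
static int32_t fp_to_fixed_s32(double value, int fraction_bits, int rmode,
                               bool *invalid_op, bool *inexact)
{
    *invalid_op = false;
    *inexact = false;
    if (value != value) {                    /* NaN: result 0, Invalid Operation */
        *invalid_op = true;
        return 0;
    }
    /* value * 2^fraction_bits: exact, because the scale is a power of two. */
    double scaled = value * (double)(1ULL << fraction_bits);
    /* Values this large (including infinities) always saturate; this also keeps
       the int64_t conversion below well defined. */
    if (scaled >= 4294967296.0)  { *invalid_op = true; return INT32_MAX; }
    if (scaled <= -4294967296.0) { *invalid_op = true; return INT32_MIN; }

    int64_t int_result = (int64_t)scaled;               /* truncates towards zero */
    if ((double)int_result > scaled) int_result -= 1;   /* RoundDown() */
    double error = scaled - (double)int_result;         /* exact, 0 <= error < 1 */

    bool round_up = false;
    switch (rmode) {
    case RMODE_NEAREST:   /* ties to even */
        round_up = error > 0.5 || (error == 0.5 && (int_result & 1) != 0);
        break;
    case RMODE_PLUS_INF:  round_up = error != 0.0;                   break;
    case RMODE_MINUS_INF: round_up = false;                          break;
    case RMODE_ZERO:      round_up = error != 0.0 && int_result < 0; break;
    }
    if (round_up) int_result += 1;

    /* SatQ(int_result, 32, FALSE): saturate, reporting overflow as Invalid Operation. */
    if (int_result > INT32_MAX) { *invalid_op = true; return INT32_MAX; }
    if (int_result < INT32_MIN) { *invalid_op = true; return INT32_MIN; }
    if (error != 0.0) *inexact = true;
    return (int32_t)int_result;
}
```

With rmode = RMODE_NEAREST, 2.5 converts to 2 and 3.5 to 4 (ties to even), while RMODE_ZERO turns -2.5 into -2: RoundDown() gives -3 and the "round towards zero" rule then adds 1 because the integer part is negative.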

// FixedToFP()
// ===========

bits(N) FixedToFP(bits(M) operand, integer N, integer fraction_bits, boolean unsigned,
                  boolean round_to_nearest, boolean fpscr_controlled)
    assert N IN {32,64};
    fpscr_val = if fpscr_controlled then FPSCR else StandardFPSCRValue();
    if round_to_nearest then fpscr_val<23:22> = '00';
    int_operand = if unsigned then UInt(operand) else SInt(operand);
    real_operand = int_operand / 2^fraction_bits;
    if real_operand == 0.0 then
        result = FPZero('0', N);
    else
        result = FPRound(real_operand, N, fpscr_val);
    return result;
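Similarly, FixedToFP() with round_to_nearest = TRUE can be sketched in C for a signed 32-bit operand and a single-precision result. The sketch relies on the C environment's default round-to-nearest mode for the final conversion, and on int_operand / 2^fraction_bits being exact in double precision, so that the only rounding step is the one FPRound() performs. The function name is hypothetical.

```c
#include <stdint.h>

/* Hypothetical model of FixedToFP() with M = 32, N = 32, unsigned = FALSE and
   round_to_nearest = TRUE. Assumes 0 <= fraction_bits <= 63. */
static float fixed_s32_to_float(int32_t operand, int fraction_bits)
{
    /* real_operand = int_operand / 2^fraction_bits, exact in double precision. */
    double real_operand = (double)operand / (double)(1ULL << fraction_bits);
    /* A zero operand gives +0.0, matching FPZero('0', N). */
    return (float)real_operand;   /* single rounding step, as FPRound() performs */
}
```

For example, 16777217 (2^24 + 1) lies exactly halfway between two single-precision values and rounds to the even one, 16777216.0.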


A2.8 Polynomial arithmetic over {0, 1}

Some Advanced SIMD instructions can operate on polynomials over {0, 1}, see Data types supported by the Advanced SIMD Extension on page A2-59. The polynomial data type represents a polynomial in x of the form $b_{n-1}x^{n-1} + \dots + b_1 x + b_0$, where $b_k$ is bit[k] of the value.

The coefficients 0 and 1 are manipulated using the rules of Boolean arithmetic:

• 0 + 0 = 1 + 1 = 0

• 0 + 1 = 1 + 0 = 1

• 0 × 0 = 0 × 1 = 1 × 0 = 0

• 1 × 1 = 1.

That is:

• adding two polynomials over {0, 1} is the same as a bitwise exclusive OR
• multiplying two polynomials over {0, 1} is the same as integer multiplication except that partial products are exclusive-ORed instead of being added.

Note

The instructions that can perform polynomial arithmetic over {0, 1} are VMUL and VMULL, see VMUL, VMULL (integer and polynomial) on page A8-958.

A2.8.1 Pseudocode details of polynomial multiplication

In pseudocode, polynomial addition is described by the EOR operation on bitstrings. Polynomial multiplication is described by the PolynomialMult() function:

// PolynomialMult()
// ================

bits(M+N) PolynomialMult(bits(M) op1, bits(N) op2)
    result = Zeros(M+N);
    extended_op2 = Zeros(M) : op2;
    for i=0 to M-1
        if op1<i> == '1' then
            result = result EOR LSL(extended_op2, i);
    return result;
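PolynomialMult() maps directly onto a shift-and-XOR loop. The C sketch below instantiates it for M = N = 8, that is, two 8-bit polynomials producing a 16-bit product; the function name is hypothetical. The example values show where the result differs from integer multiplication.

```c
#include <stdint.h>

/* PolynomialMult() with M = N = 8: carry-less (shift-and-XOR) multiplication. */
static uint16_t poly_mult8(uint8_t op1, uint8_t op2)
{
    uint16_t result = 0;                              /* Zeros(M+N) */
    uint16_t extended_op2 = op2;                      /* Zeros(M) : op2 */
    for (int i = 0; i < 8; i++)
        if ((op1 >> i) & 1u)                          /* op1<i> == '1' */
            result ^= (uint16_t)(extended_op2 << i);  /* result EOR LSL(extended_op2, i) */
    return result;
}
```

For instance, poly_mult8(0x03, 0x03) is 0x05, because (x + 1)(x + 1) = x^2 + 1 over {0, 1}: the two middle partial products cancel under exclusive OR, whereas the integer product 3 × 3 is 9.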


A2.9 Coprocessor support

The ARM architecture supports coprocessors, to extend the functionality of an ARM processor. The coprocessor instructions summarized in Coprocessor instructions on page A4-180 provide access to sixteen coprocessors, described as CP0 to CP15. The following coprocessors are reserved by ARM for specific purposes:

• Coprocessor 15 (CP15) provides system control functionality. This includes architecture and feature identification, as well as control, status information and configuration support.
  For a VMSA implementation, the following sections give a general description of CP15:
  — About the system control registers for VMSA on page B3-1444
  — Organization of the CP15 registers in a VMSA implementation on page B3-1469
  — Functional grouping of VMSAv7 system control registers on page B3-1491.
  For a PMSA implementation, the following sections give a general description of CP15:
  — About the system control registers for PMSA on page B5-1774
  — Organization of the CP15 registers in a PMSA implementation on page B5-1787
  — Functional grouping of PMSAv7 system control registers on page B5-1799.
  CP15 also provides performance monitor registers, see Chapter C12 The Performance Monitors Extension.
• Coprocessor 14 (CP14) supports:
  — debug, see Chapter C6 Debug Register Interfaces
  — the Thumb Execution Environment, see Thumb Execution Environment on page A2-95
  — direct Java bytecode execution, see Jazelle direct bytecode execution support on page A2-97.
• Coprocessors 10 and 11 (CP10 and CP11) together support floating-point and vector operations, and the control and configuration of the Floating-point and Advanced SIMD architecture extensions.
• Coprocessors 8, 9, 12, and 13 are reserved for future use by ARM. Any coprocessor access instruction attempting to access one of these coprocessors is UNDEFINED.

Note

In an implementation that includes either or both of the Advanced SIMD Extension and the Floating-point (VFP) Extension, to permit execution of any floating-point or Advanced SIMD instructions, software must enable access to both CP10 and CP11, see Enabling Advanced SIMD and floating-point support on page B1-1229.

The following sections give more information about permitted accesses to coprocessors CP14 and CP15:

• UNPREDICTABLE and UNDEFINED behavior for CP14 and CP15 accesses on page B3-1446, for a VMSA implementation
• UNPREDICTABLE and UNDEFINED behavior for CP14 and CP15 accesses on page B5-1776, for a PMSA implementation.

Most CP14 and CP15 functions cannot be accessed by software executing at PL0. This manual clearly identifies those functions that can be accessed at PL0.

Software executing at PL1 can enable the unprivileged execution of all load, store, branch and data operation instructions associated with floating-point, Advanced SIMD and execution environment support.

Coprocessors 0 to 7 can provide vendor-specific features.
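The allocation of coprocessor numbers described in this section can be summarized as a lookup. The C sketch below is illustrative only: the enum and function names are hypothetical, and CP14 and CP15 accesses remain subject to the privilege rules and the UNPREDICTABLE and UNDEFINED behaviors referenced above.

```c
/* Hypothetical classification of coprocessor numbers, following the text above. */
typedef enum {
    CP_VENDOR_SPECIFIC,        /* CP0-CP7 */
    CP_RESERVED,               /* CP8, CP9, CP12, CP13: accesses are UNDEFINED */
    CP_FP_SIMD,                /* CP10 and CP11, which must both be enabled */
    CP_DEBUG_THUMBEE_JAZELLE,  /* CP14 */
    CP_SYSTEM_CONTROL,         /* CP15 */
    CP_INVALID                 /* outside CP0-CP15 */
} CoprocRole;

static CoprocRole coproc_role(unsigned cp)
{
    switch (cp) {
    case 15:          return CP_SYSTEM_CONTROL;
    case 14:          return CP_DEBUG_THUMBEE_JAZELLE;
    case 10: case 11: return CP_FP_SIMD;
    case 8: case 9:
    case 12: case 13: return CP_RESERVED;
    default:          return cp <= 7 ? CP_VENDOR_SPECIFIC : CP_INVALID;
    }
}
```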


A2.10 Thumb Execution Environment

Thumb Execution Environment (ThumbEE) is a variant of the Thumb instruction set designed as a target for dynamically generated code. This is code that is compiled on the device, from a portable bytecode or other intermediate or native representation, either shortly before or during execution. ThumbEE provides support for Just-In-Time (JIT), Dynamic Adaptive Compilation (DAC), and Ahead-Of-Time (AOT) compilers, but cannot interwork freely with the ARM and Thumb instruction sets.

From the publication of issue C.a of this manual, ARM deprecates any use of the ThumbEE instruction set.

ThumbEE is particularly suited to languages that feature managed pointers and array types. The processor executes ThumbEE instructions when it is in the ThumbEE instruction set state. For information about instruction set states see Instruction set state register, ISETSTATE on page A2-50.

ThumbEE is both the name of the instruction set and the name of the extension that provides support for that

instruction set. The ThumbEE Extension is:

• required in implementations of the ARMv7-A profile

• optional in implementations of the ARMv7-R profile.

See Thumb Execution Environment on page B1-1240 for system level information about ThumbEE.

A2.10.1 ThumbEE instructions

In ThumbEE state, the processor executes almost the same instruction set as in Thumb state. However, some instructions behave differently, some are removed, and some ThumbEE instructions are added.

The key differences are:

• additional instructions to change instruction set in both Thumb state and ThumbEE state

• new ThumbEE instructions to branch to handlers

• null pointer checking on load/store instructions executed in ThumbEE state

• an additional instruction in ThumbEE state to check array bounds

• some other modifications to load, store, and control flow instructions.

For more information about the ThumbEE instructions see Chapter A9 The ThumbEE Instruction Set.

A2.10.2 ThumbEE configuration

ThumbEE introduces two new 32-bit CP14 registers, shown in Table A2-14.

ThumbEE is an unprivileged, user-level facility, and there are no special provisions for using it securely. For more information, see ThumbEE and the Security Extensions and Virtualization Extensions on page B1-1240.

Table A2-14 ThumbEE register summary

Name, VMSA (a)   Name, PMSA (a)   CRn   opc1   CRm   opc2   Width    Type   Description
TEECR            TEECR            c0    6      c0    0      32-bit   RW     ThumbEE Configuration Register
TEEHBR           TEEHBR           c1    6      c0    0      32-bit   RW     ThumbEE Handler Base Register

a. VMSA and PMSA definitions of the register fields are identical. These columns link to the descriptions in Chapter B4 and in Chapter B6.


Use of HandlerBase

ThumbEE handlers are entered by reference to a HandlerBase address, defined by the TEEHBR. In addition to the handlers for IndexCheck and NullCheck, there are 256 handlers, Handler_00 to Handler_FF, at 32-byte offsets from HandlerBase. Table A2-15 shows the arrangement of handlers relative to the value of HandlerBase.

The IndexCheck occurs when a CHKA instruction detects an index out of range. For more information, see CHKA on page A9-1124.

The NullCheck occurs when any memory access instruction is executed with a value of 0 in the base register. For more information, see Null checking on page A9-1113.

Note

Checks are similar to conditional branches, with the added property that they clear the IT bits when taken.

The other handlers are called using explicit handler call instructions:

• HB and HBL can call any handler, that is, can call Handler_00-Handler_FF
• HBLP and HBP can call only Handler_00-Handler_31.

For more information see the following instruction descriptions:

• HB, HBL on page A9-1125
• HBLP on page A9-1126
• HBP on page A9-1127.

Table A2-15 Access to ThumbEE handlers

Offset from HandlerBase   Name         Value stored
-0x0008                   IndexCheck   Branch to IndexCheck handler
-0x0004                   NullCheck    Branch to NullCheck handler
+0x0000                   Handler_00   Implementation of Handler_00
+0x0020                   Handler_01   Implementation of Handler_01
…                         …            …
+0x1FC0                   Handler_FE   Implementation of Handler_FE
+0x1FE0                   Handler_FF   Implementation of Handler_FF
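The handler layout in Table A2-15 reduces to simple address arithmetic on the TEEHBR value. In this C sketch the IndexCheck and NullCheck entries are taken to lie 8 and 4 bytes below HandlerBase, so that they do not overlap the 32-byte Handler_00 slot that starts at HandlerBase. The function names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical helpers: entry addresses derived from HandlerBase (the TEEHBR value). */
static uint32_t thumbee_indexcheck_entry(uint32_t handler_base)
{
    return handler_base - 8u;                  /* IndexCheck: CHKA detected an out-of-range index */
}

static uint32_t thumbee_nullcheck_entry(uint32_t handler_base)
{
    return handler_base - 4u;                  /* NullCheck: base register was 0 */
}

static uint32_t thumbee_handler_entry(uint32_t handler_base, uint8_t n)
{
    return handler_base + 32u * (uint32_t)n;   /* Handler_00 to Handler_FF, 32 bytes apart */
}
```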


A2.11 Jazelle direct bytecode execution support

From ARMv5TEJ, the architecture requires every system to include an implementation of the Jazelle extension. The Jazelle extension provides architectural support for hardware acceleration of bytecode execution by a Java Virtual Machine (JVM).

In the simplest implementations of the Jazelle extension, the processor does not accelerate the execution of any bytecodes, and the JVM uses software routines to execute all bytecodes. Such an implementation is called a trivial implementation of the Jazelle extension, and has minimal additional cost compared with not implementing the Jazelle extension at all. An implementation that provides hardware acceleration of bytecode execution is a non-trivial Jazelle implementation.

The Virtualization Extensions require that the Jazelle implementation is the trivial Jazelle implementation.

These requirements for the Jazelle extension mean a JVM can be written to both:

• function correctly on all processors that include a Jazelle extension implementation
• automatically take advantage of the accelerated bytecode execution provided by a processor that includes a non-trivial implementation.

A non-trivial implementation of the Jazelle extension implements a subset of the bytecodes in hardware, choosing bytecodes that:

• can have simple hardware implementations
• account for a large percentage of bytecode execution time.

The required features of a non-trivial implementation are:

• provision of the Jazelle state
• a new instruction, BXJ, to enter Jazelle state
• system support that enables an operating system to regulate the use of the Jazelle extension hardware
• system support that enables a JVM to configure the Jazelle extension hardware to its specific needs.

The required features of a trivial implementation are:

• Normally, the Jazelle instruction set state is never entered. In some implementations, an incorrect exception return can cause entry to the Jazelle instruction set state. If this happens, the next instruction executed is treated as UNDEFINED. For more information, see Unimplemented instruction sets on page B1-1155.
• The BXJ instruction behaves as a BX instruction.
• Configuration support that maintains the interface to the Jazelle extension is permanently disabled.

For more information about trivial implementations see Trivial implementation of the Jazelle extension on page B1-1245.

A JVM that has been written to take advantage automatically of hardware-accelerated bytecode execution is called an Enabled JVM (EJVM).

A2.11.1 Subarchitectures

A processor implementation that includes the Jazelle extension expects the ARM core register values and other resources of the ARM processor to conform to an interface standard defined by the Jazelle implementation when Jazelle state is entered and exited. For example, a specific ARM core register might be reserved for use as the pointer to the current bytecode.

For an EJVM, and any associated debug support, to function correctly, it must be written to comply with the interface standard defined by the acceleration hardware at Jazelle state execution entry and exit points.

An implementation of the Jazelle extension might define other configuration registers in addition to the architecturally defined ones.


The interface standard and any additional configuration registers used for communication with the Jazelle extension are known collectively as the subarchitecture of the implementation. They are not described in this manual. Only EJVM implementations and debug or similar software can depend on the subarchitecture. All other software must rely only on the architectural definition of the Jazelle extension given in this manual. A particular subarchitecture is identified by reading the JIDR.

A2.11.2 Jazelle state

While the processor is in Jazelle state, it executes bytecode programs. A bytecode program is defined as an executable object that comprises one or more class files, or is derived from and functionally equivalent to one or more class files. See The Java Virtual Machine Specification for the definition of class files.

While the processor is in Jazelle state, the PC identifies the next JVM bytecode to be executed. A JVM bytecode is a bytecode defined in The Java Virtual Machine Specification, or a functionally equivalent transformed version of a bytecode defined in that specification.

For the Jazelle extension, the functionality of Native methods, as described in The Java Virtual Machine Specification, must be specified using only instructions from the ARM, Thumb, and ThumbEE instruction sets.

An implementation of the Jazelle extension must not be documented or promoted as performing any task while it is in Jazelle state other than the acceleration of bytecode programs in accordance with this section and the descriptions in The Java Virtual Machine Specification.

A2.11.3 Jazelle state entry instruction, BXJ

ARMv7 includes an ARM instruction similar to BX. The BXJ instruction has a single register operand that specifies a target instruction set state, ARM state or Thumb state, and a branch target address for use if entry to Jazelle state is not available. For more information, see BXJ on page A8-354.

Correct entry into Jazelle state involves the EJVM executing the BXJ instruction at a time when both:

• the Jazelle extension Control and Configuration registers are initialized correctly, see Application level configuration and control of the Jazelle extension on page A2-99
• application level registers and any additional configuration registers are initialized as required by the subarchitecture of the implementation.

Executing BXJ with Jazelle extension enabled

Executing a BXJ instruction when the JMCR.JE bit is 1 causes the Jazelle hardware to do one of the following:

• enter Jazelle state and start executing bytecodes directly from a SUBARCHITECTURE DEFINED address
• branch to a SUBARCHITECTURE DEFINED handler.

Which of these occurs is SUBARCHITECTURE DEFINED.

The Jazelle subarchitecture can use Application level registers, but not System level registers, to transfer information between the Jazelle extension and the EJVM. There are SUBARCHITECTURE DEFINED restrictions on what Application level registers must contain when a BXJ instruction is executed, and Application level registers have SUBARCHITECTURE DEFINED values when Jazelle state execution ends and ARM or Thumb state execution resumes.

Jazelle subarchitectures and implementations must not use any unallocated bits in Application level registers such as the CPSR or FPSCR. All such bits are reserved for future expansion of the ARM architecture.

Executing BXJ with Jazelle extension disabled

If a BXJ instruction is executed when the JMCR.JE bit is 0, it is executed identically to a BX instruction with the same register operand.

This means that BXJ instructions can be executed freely when the JMCR.JE bit is 0. In particular, if an EJVM determines that it is executing on a processor whose Jazelle extension implementation is trivial or uses an incompatible subarchitecture, it can set JE to 0 and execute correctly. In this case it executes without the benefit of any Jazelle hardware acceleration that might be present.
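The JE = 0 behavior, which is also the behavior of BXJ in a trivial implementation, can be modeled with the interworking rule of BX. That rule (bit 0 of the register operand selects Thumb state when 1 and ARM state when 0, and the branch target has bit 0 cleared) is summarized from the BX instruction description rather than from this section. The JE = 1 case is reported only as SUBARCHITECTURE DEFINED, because the manual leaves it to the subarchitecture. The enum and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

typedef enum { BXJ_TO_ARM, BXJ_TO_THUMB, BXJ_SUBARCH_DEFINED } BxjOutcome;

/* Hypothetical model of BXJ <Rm>. With JMCR.JE == 0 it behaves exactly as BX <Rm>. */
static BxjOutcome bxj_outcome(bool jmcr_je, uint32_t rm, uint32_t *target)
{
    if (jmcr_je)
        return BXJ_SUBARCH_DEFINED;   /* enter Jazelle state or branch to a handler */
    if (rm & 1u) {                    /* Rm<0> == 1: continue in Thumb state */
        *target = rm & ~1u;
        return BXJ_TO_THUMB;
    }
    /* Rm<0> == 0: continue in ARM state. For BX, Rm<1:0> == '10' is
       UNPREDICTABLE; that case is not checked in this sketch. */
    *target = rm;
    return BXJ_TO_ARM;
}
```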

A2.11.4 Application level configuration and control of the Jazelle extension

The Jazelle extension registers are implemented as CP14 registers. Table A2-16 summarizes the architecturally-defined Jazelle registers. Additional SUBARCHITECTURE DEFINED configuration registers might be provided.

An EJVM can read the JIDR to determine the architecture and subarchitecture under which it is running, and:

• the JMCR gives application level control of Jazelle operation
• the JOSCR gives OS level control of Jazelle operation.

The following rules apply to all Jazelle extension control and configuration registers, including any SUBARCHITECTURE DEFINED registers:

• Registers are accessed by CP14 MRC and MCR instructions with <opc1> set to 7.
• The values contained in configuration registers are changed only by the execution of MCR instructions. In particular, they are never changed by Jazelle state execution of bytecodes.
• The access policy for each architecturally-defined register is fully defined in the register description. The access policy of other configuration registers is SUBARCHITECTURE DEFINED.
  When execution is unprivileged, MRC and MCR accesses that are restricted to execution at PL1 or higher are UNDEFINED.
  For more information see Access to Jazelle registers on page A2-100.
• In an implementation that includes the Security Extensions, the registers are Common registers, meaning they are common to the Secure and Non-secure security states. For more information, see Classification of system control registers on page B3-1451.
• When a configuration register is readable, reading the register:
  — returns the last value written to it
  — has no side-effects.
  When a configuration register is not readable, attempting to read it returns an UNKNOWN value.
• When a configuration register can be written, the effect of writing to it must be idempotent. That is, the overall effect of writing the same value more than once must not differ from the effect of writing it once.

Changes to these CP14 registers have the same synchronization requirements as changes to the CP15 registers. These are described in:

• Synchronization of changes to system control registers on page B3-1461 for a VMSA implementation
• Synchronization of changes to system control registers on page B5-1779 for a PMSA implementation.

For more information, see Jazelle state configuration and control on page B1-1243.

Table A2-16 Jazelle architecturally-defined registers summary

Name, VMSA (a)   Name, PMSA (a)   CRn   opc1   CRm   opc2   Width    Type (b)   Description
JIDR             JIDR             c0    7      c0    0      32-bit   RO         Jazelle ID Register
JOSCR            JOSCR            c1    7      c0    0      32-bit   RW         Jazelle OS Control Register
JMCR             JMCR             c2    7      c0    0      32-bit   RW         Jazelle Main Configuration Register

a. VMSA and PMSA definitions of the register fields are identical. These columns link to the descriptions in Chapter B4 and Chapter B6.
b. Type, for a non-trivial Jazelle implementation. Trivial implementation of the Jazelle extension on page B1-1245 describes the register requirements for a trivial Jazelle implementation.
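Because the Jazelle registers are accessed with CP14 MRC and MCR instructions using opc1 = 7 and the CRn, CRm, and opc2 values in Table A2-16, the instruction words can be assembled from those fields. The C sketch below uses the ARM (A1) encoding of MRC and MCR with the AL condition; that bit layout is stated from general knowledge of the instruction set rather than from this section, so check it against the MRC and MCR instruction descriptions. The function name is hypothetical, and its parameters follow the assembler operand order MRC p<coproc>, <opc1>, <Rt>, <CRn>, <CRm>, <opc2>.

```c
#include <stdint.h>

/* Hypothetical assembler helper for MRC/MCR (A1 encoding, condition AL):
   cond[31:28] 1110[27:24] opc1[23:21] L[20] CRn[19:16] Rt[15:12]
   coproc[11:8] opc2[7:5] 1[4] CRm[3:0], with L = 1 for MRC and 0 for MCR. */
static uint32_t encode_cp_transfer(int is_mrc, unsigned coproc, unsigned opc1,
                                   unsigned rt, unsigned crn, unsigned crm, unsigned opc2)
{
    return 0xEE000010u                       /* cond = AL, bits[27:24] = 1110, bit[4] = 1 */
         | ((opc1 & 7u) << 21)
         | ((is_mrc ? 1u : 0u) << 20)
         | ((crn & 0xFu) << 16)
         | ((rt & 0xFu) << 12)
         | ((coproc & 0xFu) << 8)
         | ((opc2 & 7u) << 5)
         | (crm & 0xFu);
}
```

For example, encode_cp_transfer(1, 14, 7, 0, 0, 0, 0) assembles MRC p14, 7, r0, c0, c0, 0, a read of the JIDR. The same helper covers the ThumbEE registers in Table A2-14, which use opc1 = 6.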


A2.11.5 Access to Jazelle registers

For a non-trivial Jazelle implementation, Table A2-17 shows the access permissions for the Jazelle registers, and how unprivileged access to the registers depends on the value of the JOSCR.

Table A2-17 Access to Jazelle registers in a non-trivial Jazelle implementation

Jazelle register        Unprivileged access,              Unprivileged access,              Access at PL1
(VMSA, PMSA)            JOSCR.CD is 0                     JOSCR.CD is 1
JOSCR, JOSCR            Read and write access UNDEFINED   Read and write access UNDEFINED   Read and write access permitted
JIDR, JIDR              Read access permitted             Read access UNDEFINED             Read access permitted
                        Write access UNDEFINED            Write access UNDEFINED            Write access UNPREDICTABLE
JMCR, JMCR              Read access UNDEFINED             Read and write access UNDEFINED   Read and write access permitted
                        Write access permitted
SUBARCHITECTURE         Read access UNDEFINED             Read and write access UNDEFINED   Read access SUBARCHITECTURE DEFINED
DEFINED configuration   Write access permitted                                              Write access permitted
registers

Trivial implementation of the Jazelle extension on page B1-1245 describes the required behavior of Jazelle register accesses for a trivial Jazelle implementation.

A2.11.6 EJVM operation

The following subsections summarize how an EJVM must operate, to meet the requirements of the architecture:

• Initialization
• Bytecode execution
• Jazelle exception conditions on page A2-101
• Other considerations on page A2-101.

Initialization

During initialization, the EJVM must first check which subarchitecture is present, by checking the Implementer and Subarchitecture codes in the value read from the JIDR.

If the EJVM is incompatible with the subarchitecture, it must do one of the following:

• write to the JMCR with JE set to 0
• if unaccelerated bytecode execution is unacceptable, generate an error.

If the EJVM is compatible with the subarchitecture, it must write its required configuration to the JMCR and any SUBARCHITECTURE DEFINED configuration registers.

Bytecode execution

The EJVM must contain a handler for each bytecode.

The EJVM initiates bytecode execution by executing a BXJ instruction with:

• the register operand specifying the target address of the bytecode handler for the first bytecode of the program
• the Application level registers set up in accordance with the SUBARCHITECTURE DEFINED interface standard.

The bytecode handler:

• performs the data-processing operations required by the bytecode indicated
• determines the address of the next bytecode to be executed
• determines the address of the handler for that bytecode
• performs a BXJ to that handler address with the registers again set up to the SUBARCHITECTURE DEFINED interface standard.

Jazelle exception conditions

During bytecode execution, the EJVM might encounter SUBARCHITECTURE DEFINED Jazelle exception conditions that must be resolved by a software handler. For example, in the case of a configuration invalid handler, the handler rewrites the desired configuration to the JMCR and to any SUBARCHITECTURE DEFINED configuration registers.

On entry to a Jazelle exception condition handler the contents of the Application level registers are SUBARCHITECTURE DEFINED. This interface to the Jazelle exception condition handler might differ from the interface standard for the bytecode handler, in order to supply information about the Jazelle exception condition.

The Jazelle exception condition handler:

• resolves the Jazelle exception condition

• determines the address of the next bytecode to be executed

• determines the address of the handler for that bytecode

• performs a BXJ to that handler address with the registers again set up to the SUBARCHITECTURE DEFINED interface standard.

Other considerations

To ensure application execution and correct interaction with an operating system, an EJVM must only perform operations that are permitted in unprivileged operation. In particular, for register accesses it must only:

• read the JIDR

• write to the JMCR and to other configuration registers.

An EJVM must not attempt to access the JOSCR.


A2.12 Exceptions, debug events and checks

ARMv7 uses the following terms to describe various types of exceptional condition:

Exceptions

In the ARM architecture, an exception causes entry into a processor mode that executes software at PL1 or PL2, and execution of a software handler for the exception.

Note

The terms floating-point exception and Jazelle exception condition do not use this meaning of exception. These terms are described later in this list.

Exceptions include:

• reset

• interrupts

• memory system aborts

• undefined instructions

• supervisor calls (SVCs), Secure Monitor calls (SMCs), and hypervisor calls (HVCs).

Most details of exception handling are not visible to application level software, and are described in Exception handling on page B1-1165. Aspects that are visible to application level software are:

• The SVC instruction causes a Supervisor Call exception. This provides a mechanism for unprivileged software to make a call to the operating system, or other system component that is accessible only at PL1.

• In an implementation that includes the Security Extensions, the SMC instruction causes a Secure Monitor Call exception, but only if software execution is at PL1 or higher. Unprivileged software can only cause a Secure Monitor Call exception by methods defined by the operating system, or by another component of the software system that executes at PL1 or higher.

• In an implementation that includes the Virtualization Extensions, the HVC instruction causes a Hypervisor Call exception, but only if software execution is at PL1 or higher. Unprivileged software can only cause a Hypervisor Call exception by methods defined by the hypervisor, or by another component of the software system that executes at PL1 or higher.

• The WFI instruction provides a hint that nothing needs to be done until the processor takes an interrupt or similar exception, see Wait For Interrupt on page B1-1203. This permits the processor to enter a low-power state until that happens.

• The WFE instruction provides a hint that nothing needs to be done until either an SEV instruction generates an event, or the processor takes an interrupt or similar exception, see Wait For Event and Send Event on page B1-1200. This permits the processor to enter a low-power state until one of these happens.

Floating-point exceptions

These relate to exceptional conditions encountered during floating-point arithmetic, such as division by zero or overflow. For more information see:

• Floating-point exceptions on page A2-70

• FPSCR, Floating-point Status and Control Register, VMSA on page B4-1570, or FPSCR, Floating-point Status and Control Register, PMSA on page B6-1847

• ANSI/IEEE Std. 754, IEEE Standard for Binary Floating-Point Arithmetic.

Jazelle exception conditions

These are conditions that cause Jazelle hardware acceleration to exit into a software handler, as described in Jazelle exception conditions on page A2-101.


Debug events

These are conditions that cause a debug system to take action. Most aspects of debug events are not visible to application level software, and are described in Chapter C3 Debug Events. Aspects that are visible to application level software include:

• The BKPT instruction causes a BKPT instruction debug event to occur, see BKPT instruction debug events on page C3-2040.

• The DBG instruction provides a hint to the debug system.

Checks

These are provided in the ThumbEE Extension. A check causes an unconditional branch to a specific handler entry point. The base address of the ThumbEE check handlers is held in the TEEHBR.


Chapter A3

Application Level Memory Model

This chapter gives an application level view of the memory model. It contains the following sections:

• Address space on page A3-106

• Alignment support on page A3-108

• Endian support on page A3-110

• Synchronization and semaphores on page A3-114

• Memory types and attributes and the memory order model on page A3-126

• Access rights on page A3-142

• Virtual and physical addressing on page A3-145

• Memory access order on page A3-146

• Caches and memory hierarchy on page A3-156.

Note

In this chapter, system register names usually link to the description of the register in Chapter B4 System Control Registers in a VMSA implementation, for example SCTLR. If the register is included in a PMSA implementation, then it is also described in Chapter B6 System Control Registers in a PMSA implementation.

A3 Application Level Memory Model


A3.1 Address space

The ARM architecture Application level memory model uses a single, flat address space of 2^32 8-bit bytes, covering 4GBytes. Byte addresses are treated as unsigned numbers, running from 0 to 2^32 - 1. The address space is also regarded as:

• 2^30 32-bit words:

— the address of each word is word-aligned, meaning that the address is divisible by 4 and the two least significant bits of the address are 0b00

— the word at word-aligned address A consists of the four bytes with addresses A, A+1, A+2 and A+3.

• 2^31 16-bit halfwords:

— the address of each halfword is halfword-aligned, meaning that the address is divisible by 2 and the least significant bit of the address is 0

— the halfword at halfword-aligned address A consists of the two bytes with addresses A and A+1.
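These definitions can be checked with a short Python sketch that treats addresses as plain unsigned integers; the helper names are illustrative only.

```python
ADDRESS_SPACE_BYTES = 1 << 32  # 2^32 byte addresses, 0 to 2^32 - 1


def is_word_aligned(address: int) -> bool:
    # Divisible by 4: the two least significant bits are 0b00.
    return (address & 0b11) == 0


def is_halfword_aligned(address: int) -> bool:
    # Divisible by 2: the least significant bit is 0.
    return (address & 0b1) == 0


def word_bytes(address: int) -> list[int]:
    """Byte addresses that make up the word at a word-aligned address A."""
    if not 0 <= address < ADDRESS_SPACE_BYTES or not is_word_aligned(address):
        raise ValueError("expected a word-aligned 32-bit address")
    return [address + i for i in range(4)]  # A, A+1, A+2, A+3


def halfword_bytes(address: int) -> list[int]:
    """Byte addresses that make up the halfword at a halfword-aligned address A."""
    if not 0 <= address < ADDRESS_SPACE_BYTES or not is_halfword_aligned(address):
        raise ValueError("expected a halfword-aligned 32-bit address")
    return [address, address + 1]  # A, A+1
```

Because A is aligned, the bytes of a word or halfword never cross 0xFFFFFFFF, so no wraparound handling is needed here.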

In some situations the ARM architecture supports accesses to halfwords and words that are not aligned to the appropriate access size, see Alignment support on page A3-108.

Normally, address calculations are performed using ordinary integer instructions. This means that the address wraps around if the calculation overflows or underflows the address space. Another way of describing this is that any address calculation is reduced modulo 2^32.

A3.1.1 Address space overflow or underflow

Address space overflow occurs when the memory address increments beyond the top byte of the address space at 0xFFFFFFFF. When this happens, the address wraps round, so that, for example, incrementing 0xFFFFFFFF by 2 gives a result of 0x00000001.

Address space underflow occurs when the memory address decrements below the first byte of the address space at 0x00000000. When this happens, the address wraps round, so that, for example, decrementing 0x00000002 by 4 gives a result of 0xFFFFFFFE.
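Python integers are unbounded, so a model of this behavior has to apply the modulo 2^32 reduction explicitly. A minimal sketch that reproduces the two examples above; the function name is illustrative:

```python
ADDRESS_MASK = 0xFFFF_FFFF  # 2^32 - 1


def wrap_address(value: int) -> int:
    """Reduce an address calculation modulo 2^32, as 32-bit integer arithmetic does."""
    return value & ADDRESS_MASK


overflow = wrap_address(0xFFFFFFFF + 2)   # 0x00000001
underflow = wrap_address(0x00000002 - 4)  # 0xFFFFFFFE
```

Masking a negative Python integer with 0xFFFFFFFF yields its 32-bit two's complement bit pattern, which is why the underflow case needs no special handling.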

When a processor performs normal sequential execution of instructions, after each instruction it finds the address of the next instruction by calculating:

(address_of_current_instruction) + (size_of_executed_instruction)

This calculation can result in address space overflow.

Note

The size of the executed instruction depends on the current instruction set, and can depend on the instruction executed.
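The sequential-address calculation can be sketched in Python with an explicit overflow flag, which a simulator might use to detect the UNPREDICTABLE case, described next, of relying on sequential execution past 0xFFFFFFFF. The function name is hypothetical, and the instruction size is passed in because it depends on the instruction set and the instruction.

```python
ADDRESS_MASK = 0xFFFF_FFFF


def next_sequential_address(current: int, size: int) -> tuple[int, bool]:
    """Return (next address, overflowed) for normal sequential execution.

    `size` is the size in bytes of the executed instruction.
    """
    if size <= 0:
        raise ValueError("instruction size must be positive")
    total = current + size
    # The address wraps modulo 2^32; report whether it did.
    return total & ADDRESS_MASK, total > ADDRESS_MASK
```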

Any multi-byte memory access that depends on address space overflow or underflow is UNPREDICTABLE. This applies to both data and instruction accesses.

The following rules define the accesses that are UNPREDICTABLE:

1. If the processor executes an instruction for which the instruction address, size, and alignment mean it contains the bytes 0xFFFFFFFF and 0x00000000, the result is UNPREDICTABLE.

Examples of this UNPREDICTABLE behavior include:

• relying on sequential execution of the instruction at 0x00000000 after any of:

— executing a 4-byte instruction at 0xFFFFFFFC

— executing a 2-byte instruction at 0xFFFFFFFE

— executing a 1-byte instruction at 0xFFFFFFFF


• attempting to execute an instruction that spans the top of memory, for example:

— a 4-byte instruction at 0xFFFFFFFE

— a 2-byte instruction at 0xFFFFFFFF

2. If the processor executes a load or store instruction for which the computed address, total access size, and alignment mean it accesses the bytes 0xFFFFFFFF and 0x00000000, the result is UNPREDICTABLE.

Examples of this UNPREDICTABLE behavior include:

• attempting to perform an unaligned load or store operation that spans the top of memory, for example:

— a word load or store from or to address 0xFFFFFFFD

— a halfword load or store from or to address 0xFFFFFFFF

• attempting to perform a multiple load or store operation that spans the top of memory, for example:

— a two-word load or store from or to addresses 0xFFFFFFFC and 0x00000000

— an Advanced SIMD multiple-element load or store that includes bytes 0xFFFFFFFF and 0x00000000
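For an access that covers a contiguous range of incrementing addresses, rule 2, and the spanning examples under rule 1, reduce to one test on the first byte and the total size. A Python sketch under that assumption; the function name is illustrative:

```python
TOP_BYTE = 0xFFFF_FFFF


def spans_top_of_memory(address: int, total_size: int) -> bool:
    """True if total_size bytes starting at address include both byte
    0xFFFFFFFF and byte 0x00000000 (assumes incrementing addresses)."""
    if not 0 <= address <= TOP_BYTE or total_size < 1:
        raise ValueError("invalid address or size")
    last_byte = address + total_size - 1
    # The access reaches 0xFFFFFFFF and wraps to 0x00000000 only if its
    # last byte lies beyond the top of the address space.
    return last_byte > TOP_BYTE
```

A 4-byte access at 0xFFFFFFFC does not span the top of memory by this test; the UNPREDICTABLE case for that address is relying on sequential execution after it, covered separately under rule 1.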

This UNPREDICTABLE behavior only applies to instructions that are executed, including those that fail their condition code check. Most ARM implementations fetch instructions ahead of the currently-executing instruction. If this prefetching overflows the top of the address space, it does not cause UNPREDICTABLE behavior unless the prefetched instruction with an overflowed address is executed.

Note

In some cases, instructions that operate on multiple words can decrement the memory address by 4 after each word access. If this calculation underflows the address space, the result is UNPREDICTABLE.


A3.2 Alignment support

Instructions in the ARM architecture are aligned as follows:

• ARM instructions are word-aligned

• Thumb and ThumbEE instructions are halfword-aligned

• Java bytecodes are byte-aligned.

In the ARMv7 architecture, some load and store instructions support unaligned data accesses, as described in Unaligned data access.

For more information about the alignment support in previous versions of the ARM architecture, see Alignment on page D12-2506.

A3.2.1 Unaligned data access

An ARMv7 implementation must support unaligned data accesses to Normal memory by some load and store instructions, as Table A3-1 shows. Software can set the SCTLR.A bit to control whether a misaligned access to Normal memory by one of these instructions causes an Alignment fault Data Abort exception.

Table A3-1 Alignment requirements of load/store instructions

| Instructions | Alignment check | Result if check fails when SCTLR.A is 0 | Result if check fails when SCTLR.A is 1 |
|---|---|---|---|
| LDRB, LDREXB, LDRBT, LDRSB, LDRSBT, STRB, STREXB, STRBT, SWPB, TBB | None | - | - |
| LDRH, LDRHT, LDRSH, LDRSHT, STRH, STRHT, TBH | Halfword | Unaligned access | Alignment fault |
| LDREXH, STREXH | Halfword | Alignment fault | Alignment fault |
| LDR, LDRT, STR, STRT, PUSH (encodings T3 and A2 only), POP (encodings T3 and A2 only) | Word | Unaligned access | Alignment fault |
| LDREX, STREX | Word | Alignment fault | Alignment fault |
| LDREXD, STREXD | Doubleword | Alignment fault | Alignment fault |
| All forms of LDM and STM, LDRD, RFE, SRS, STRD, SWP, PUSH (except for encodings T3 and A2), POP (except for encodings T3 and A2) | Word | Alignment fault | Alignment fault |
| LDC, LDC2, STC, STC2 | Word | Alignment fault | Alignment fault |
| VLDM, VLDR, VPOP, VPUSH, VSTM, VSTR | Word | Alignment fault | Alignment fault |
| VLD1, VLD2, VLD3, VLD4, VST1, VST2, VST3, VST4, all with standard alignment (note a) | Element size | Unaligned access | Alignment fault |
| VLD1, VLD2, VLD3, VLD4, VST1, VST2, VST3, VST4, all with :<align> specified (notes a, b) | As specified by :<align> | Alignment fault | Alignment fault |

a. These element and structure load/store instructions are only in the Advanced SIMD Extension to the ARMv7 ARM and Thumb instruction sets. ARMv7 does not support the pre-ARMv6 alignment model, so software cannot use that model with these instructions.

b. Previous versions of this document used @<align> to specify alignment. Both forms are supported, see Advanced SIMD addressing mode on page A7-277 for more information.
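For an access to Normal memory, the table reduces to a lookup plus a modulo test. The Python sketch below encodes a representative subset of rows only, and deliberately ignores the UNPREDICTABLE cases listed under Cases where unaligned accesses are UNPREDICTABLE (unaligned loads of the PC, and Device or Strongly-ordered memory). The dictionary and function names are illustrative.

```python
# Subset of Table A3-1: mnemonic -> (alignment check in bytes,
# result if the check fails when SCTLR.A is 0). A check of 1 byte means "None".
ALIGNMENT_RULES = {
    "LDRB": (1, None),
    "LDRH": (2, "Unaligned access"),
    "LDREXH": (2, "Alignment fault"),
    "LDR": (4, "Unaligned access"),
    "LDREX": (4, "Alignment fault"),
    "LDREXD": (8, "Alignment fault"),
    "LDRD": (4, "Alignment fault"),
    "LDM": (4, "Alignment fault"),
    "VLDR": (4, "Alignment fault"),
}


def alignment_outcome(mnemonic: str, address: int, sctlr_a: int) -> str:
    """Outcome of the Table A3-1 alignment check for an access to Normal memory."""
    check_bytes, result_if_a_is_0 = ALIGNMENT_RULES[mnemonic]
    if address % check_bytes == 0:
        return "Check passes"
    # Every row that has an alignment check faults when SCTLR.A is 1.
    return "Alignment fault" if sctlr_a else result_if_a_is_0
```

Note that LDRD is checked only for word alignment, while LDREXD requires doubleword alignment.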


Note

In an implementation that includes the Virtualization Extensions, an unaligned access to Device or Strongly-ordered memory always causes an Alignment fault Data Abort exception. For implementations that do not include the Virtualization Extensions, see Cases where unaligned accesses are UNPREDICTABLE.

A3.2.2 Cases where unaligned accesses are UNPREDICTABLE

The following cases cause the resulting unaligned accesses to be UNPREDICTABLE, and overrule any permitted load or store behavior shown in Table A3-1 on page A3-108:

• Any load instruction that is not faulted by the alignment restrictions shown in Table A3-1 on page A3-108 and that loads the PC has UNPREDICTABLE behavior if the address it loads from is not word-aligned.

• In an implementation that does not include the Virtualization Extensions, any unaligned access that is not faulted by the alignment restrictions shown in Table A3-1 on page A3-108 and that accesses memory with the Strongly-ordered or Device memory attribute has UNPREDICTABLE behavior.

Note

— In an implementation that includes the Virtualization Extensions, such an unaligned access to Device or Strongly-ordered memory generates an Alignment fault, see Alignment faults on page B3-1402.

— Memory types and attributes and the memory order model on page A3-126 describes the Strongly-ordered and Device memory attributes.

A3.2.3 Unaligned data access restrictions in ARMv7 and ARMv6

ARMv7 and ARMv6 have the following restrictions on unaligned data accesses:

• Accesses are not guaranteed to be single-copy atomic except at the byte access level, see Atomicity in the ARM architecture on page A3-128. An access can be synthesized out of a series of aligned operations in a shared memory system without guaranteeing locked transaction cycles.

• Unaligned accesses typically take a number of additional cycles to complete compared to a naturally aligned transfer. The real-time implications must be analyzed carefully and key data structures might need to have their alignment adjusted for optimum performance.

• An operation that performs an unaligned access can abort on any memory access that it makes, and can abort on more than one access. This means that an unaligned access that occurs across a page boundary can generate an abort on either side of the boundary, or on both sides of the boundary.

Shared memory schemes must not rely on seeing single-copy atomic updates of unaligned data of loads and stores for data items larger than byte wide. For more information, see Atomicity in the ARM architecture on page A3-128.

Unaligned access operations must not be used for accessing memory-mapped registers in a Device or Strongly-ordered memory region.
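To see why an unaligned access can abort on either or both sides of a page boundary, it helps to list the pages the access touches: each page has its own translation and permissions, so each can potentially generate its own abort. A Python sketch; the 4KB default page size and the function name are assumptions for illustration.

```python
def pages_touched(address: int, size: int, page_size: int = 4096) -> list[int]:
    """Page numbers covered by an access of `size` bytes starting at `address`.

    page_size defaults to 4KB purely as an illustrative assumption.
    """
    if size < 1 or page_size < 1:
        raise ValueError("size and page_size must be positive")
    first_page = address // page_size
    last_page = (address + size - 1) // page_size
    return list(range(first_page, last_page + 1))
```

For example, an unaligned word access at 0x0FFE touches pages 0 and 1, so it can abort on either page or on both, whereas the aligned word at 0x1000 touches only page 1.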


A3.3 Endian support

The rules in Address space on page A3-106 require that for a word-aligned address A:

• the doubleword at address A comprises the bytes at addresses A, A+1, A+2, A+3, A+4, A+5, A+6, and A+7

• the word:

— at address A comprises the bytes at addresses A, A+1, A+2 and A+3

— at address A+4 comprises the bytes at addresses A+4, A+5, A+6 and A+7

• the halfword:

— at address A comprises the bytes at addresses A and A+1

— at address A+2 comprises the bytes at addresses A+2 and A+3

— at address A+4 comprises the bytes at addresses A+4 and A+5

— at address A+6 comprises the bytes at addresses A+6 and A+7

• this means that:

— the doubleword at address A comprises the words at addresses A and A+4

— the word at address A comprises the halfwords at addresses A and A+2

— the word at address A+4 comprises the halfwords at addresses A+4 and A+6.

However, this does not completely specify the mappings between words, halfwords, and bytes.

A memory system uses one of the two following mapping schemes. This choice is called the endianness of the memory system.

In a little-endian memory system:

• the byte, halfword, or word at an address is the least significant byte, halfword, or word in the doubleword at that address

• the byte or halfword at an address is the least significant byte or halfword in the word at that address

• the byte at an address is the least significant byte in the halfword at that address.

In a big-endian memory system:

• the byte, halfword, or word at an address is the most significant byte, halfword, or word in the doubleword at that address

• the byte or halfword at an address is the most significant byte or halfword in the word at that address

• the byte at an address is the most significant byte in the halfword at that address.

For an address A, Figure A3-1 on page A3-111 shows, for big-endian and little-endian memory systems, the relationship between:

• the doubleword at address A

• the words at addresses A and A+4

• the halfwords at addresses A, A+2, A+4, and A+6

• the bytes at addresses A, A+1, A+2, A+3, A+4, A+5, A+6, and A+7.


Figure A3-1 Endianness relationships

The big-endian and little-endian mapping schemes determine the order in which the bytes of a doubleword, word, or halfword are interpreted. For example, a load of a word from address 0x1000 always results in an access to the bytes at memory locations 0x1000, 0x1001, 0x1002, and 0x1003. The endianness mapping scheme determines the significance of these four bytes.
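Python's `int.from_bytes` applies exactly these two mappings, so the rules above, including the 0x1000 example, can be checked in a few lines. In this sketch the memory contents are arbitrary example values, offset 0 stands for address A, and the helper name `load` is illustrative.

```python
# Bytes at addresses A, A+1, ..., A+7 (arbitrary example values).
memory = bytes([0x11, 0x22, 0x33, 0x44, 0x55, 0x66, 0x77, 0x88])


def load(mem: bytes, offset: int, size: int, endianness: str) -> int:
    """Interpret `size` bytes starting at `offset` as an unsigned integer.

    endianness is "little" or "big", as accepted by int.from_bytes.
    """
    return int.from_bytes(mem[offset:offset + size], endianness)


word_le = load(memory, 0, 4, "little")  # 0x44332211: byte A is least significant
word_be = load(memory, 0, 4, "big")     # 0x11223344: byte A is most significant
```

In the little-endian case the word at A is the less significant half of the doubleword at A; in the big-endian case it is the more significant half.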

A3.3.1 Instruction endianness

In ARMv7-A, the mapping of instruction memory is always little-endian. In ARMv7-R, instruction endianness can be controlled at the system level, see Instruction endianness static configuration, ARMv7-R only on page A3-112.

Note

For information about data memory endianness control, see Endianness mapping register, ENDIANSTATE on page A2-53.

Before ARMv7, the ARM architecture included legacy support for an alternative big-endian memory model, described as BE-32 and controlled by the SCTLR.B bit, bit[7] of the register, see Endian configuration and control on page D12-2518. ARMv7 does not support BE-32 operation, and SCTLR bit[7] is RAZ/SBZP.

Where legacy object code for ARM processors contains instructions with a big-endian byte order, the removal of support for BE-32 operation requires the instructions in the object files to have their bytes reversed for the code to be executed on an ARMv7 processor. This means that:

• each Thumb instruction, whether a 32-bit Thumb instruction or a 16-bit Thumb instruction, must have the byte order of each halfword of instruction reversed

• each ARM instruction must have the byte order of each word of instruction reversed.

For most situations, this can be handled in the link stage of a tool-flow, provided the object files include sufficient information to permit this to happen. In practice, this is the situation for all applications with the ARMv7-A profile. For applications of the ARMv7-R profile, there are some legacy code situations where the arrangement of the bytes in the object files cannot be adjusted by the linker. For these object files to be used by an ARMv7-R processor the byte order of the instructions must be reversed by the processor at runtime. Therefore, the ARMv7-R profile permits configuration of the instruction endianness.
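The byte reversal described above can be sketched in Python as follows. The sketch assumes the input is a contiguous run of instructions from a single instruction set with no embedded data, since literal data might need different handling; the function names are hypothetical.

```python
def reverse_in_units(code: bytes, unit: int) -> bytes:
    """Reverse the byte order within each `unit`-byte chunk of `code`."""
    if len(code) % unit:
        raise ValueError(f"code length must be a multiple of {unit}")
    return b"".join(code[i:i + unit][::-1] for i in range(0, len(code), unit))


def convert_thumb_be32(code: bytes) -> bytes:
    # Thumb: reverse each halfword, for 16-bit and 32-bit instructions alike.
    return reverse_in_units(code, 2)


def convert_arm_be32(code: bytes) -> bytes:
    # ARM: reverse each word.
    return reverse_in_units(code, 4)
```

Because a 32-bit Thumb instruction is handled as two halfwords, the same four bytes are reordered differently by the two functions.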

Figure A3-1 panels: Little-endian memory system, in which Byte, A is the LSByte and Byte, A+7 is the MSByte of the doubleword at address A; Big-endian memory system, in which Byte, A is the MSByte and Byte, A+7 is the LSByte. In this figure, Byte, A+1 is an abbreviation for Byte at address A+1.


Instruction endianness static configuration, ARMv7-R only

To provide support for legacy big-endian object code, the ARMv7-R profile supports optional byte order reversal hardware as a static option from reset. The ARMv7-R profile includes a read-only bit in the CP15 Control Register, SCTLR.IE, bit[31], that indicates the instruction endianness configuration.

A3.3.2 Element size and endianness

The effect of the endianness mapping on data transfers depends on the size of the data element or elements transferred by the load/store instructions. Table A3-2 lists the element sizes of all the load/store instructions, for all instruction sets.

A3.3.3 Instructions to reverse bytes in an ARM core register

An application or device driver might have to interface to memory-mapped peripheral registers or shared memory structures that are not the same endianness as the internal data structures. Similarly, the endianness of the operating system might not match that of the peripheral registers or shared memory. In these cases, the processor requires an efficient method to transform explicitly the endianness of the data.

In ARMv7, in the ARM and Thumb instruction sets, the following instructions provide this functionality:

REV

Reverse word (four bytes) register, for transforming big-endian and little-endian 32-bit representations, see REV on page A8-562.

REVSH

Reverse halfword and sign-extend, for transforming signed 16-bit representations, see REVSH on page A8-566.

REV16

Reverse packed halfwords in a register, for transforming big-endian and little-endian 16-bit representations, see REV16 on page A8-564.
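As an informal reference for what these instructions compute, the following Python models operate on 32-bit register values held as unsigned integers. They are illustrative sketches, not the architectural pseudocode on the pages cited above.

```python
MASK32 = 0xFFFF_FFFF


def rev(x: int) -> int:
    """REV: reverse the four bytes of a 32-bit value."""
    return int.from_bytes((x & MASK32).to_bytes(4, "little"), "big")


def rev16(x: int) -> int:
    """REV16: reverse the bytes within each of the two halfwords."""
    x &= MASK32
    return ((x & 0x00FF_00FF) << 8) | ((x >> 8) & 0x00FF_00FF)


def revsh(x: int) -> int:
    """REVSH: reverse the bytes of the low halfword, then sign-extend to 32 bits."""
    half = ((x & 0xFF) << 8) | ((x >> 8) & 0xFF)
    if half & 0x8000:
        half -= 0x1_0000  # interpret as a negative 16-bit value
    return half & MASK32  # 32-bit two's complement register value
```

For example, `rev(0x12345678)` gives 0x78563412 and `rev16(0x12345678)` gives 0x34127856.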

A3.3.4 Endianness in Advanced SIMD

Advanced SIMD element load/store instructions transfer vectors of elements between memory and the Advanced SIMD register bank. An instruction specifies both the length of the transfer and the size of the data elements being transferred. This information is used by the processor to load and store data correctly in both big-endian and little-endian systems.

Consider, for example, the instruction:

VLD1.16 {D0}, [R1]

This loads a 64-bit register with four 16-bit values. The four elements appear in the register in array order, with the lowest indexed element fetched from the lowest address. The order of bytes in the elements depends on the endianness configuration, as shown in Figure A3-2. Therefore, the order of the elements in the registers is the same regardless of the endianness configuration.

Table A3-2 Element size of load/store instructions

| Instructions | Element size |
|---|---|
| LDRB, LDREXB, LDRBT, LDRSB, LDRSBT, STRB, STREXB, STRBT, SWPB, TBB | Byte |
| LDRH, LDREXH, LDRHT, LDRSH, LDRSHT, STRH, STREXH, STRHT, TBH | Halfword |
| LDR, LDRT, LDREX, STR, STRT, STREX | Word |
| LDRD, LDREXD, STRD, STREXD | Word |
| All forms of LDM, PUSH, POP, RFE, SRS, all forms of STM, SWP | Word |
| LDC, LDC2, STC, STC2 | Word |
| Forms of VLDM, VLDR, VPOP, VPUSH, VSTM, VSTR that transfer 32-bit Si registers | Word |
| Forms of VLDM, VLDR, VPOP, VPUSH, VSTM, VSTR that transfer 64-bit Di registers | Doubleword |
| VLD1, VLD2, VLD3, VLD4, VST1, VST2, VST3, VST4 | Element size of the Advanced SIMD access |

Figure A3-2 Advanced SIMD byte order example

For information about the alignment of Advanced SIMD instructions see Unaligned data access on page A3-108.

Figure A3-2 panels: VLD1.16 {D0}, [R1] on a memory system with little-endian addressing (LE) and on a memory system with big-endian addressing (BE). In both cases the 64-bit register contains four 16-bit elements, D, C, B, and A, from most to least significant. In LE memory, address 0 holds A[7:0] and address 1 holds A[15:8]; in BE memory, address 0 holds A[15:8] and address 1 holds A[7:0]; elements B, C, and D follow in the same way at addresses 2 to 7.
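A Python sketch of this behavior, assuming a memory buffer whose first byte is at the address in R1: it reads each 16-bit element with the memory system's byte order and packs element i into bits [16i+15:16i] of D0. The function and variable names are hypothetical.

```python
def vld1_16(memory: bytes, endianness: str) -> int:
    """Model VLD1.16 {D0}, [R1] with R1 addressing memory[0].

    Element i comes from address R1 + 2*i, is interpreted using the memory
    system's endianness ("little" or "big"), and goes to bits [16*i+15:16*i] of D0.
    """
    d0 = 0
    for i in range(4):
        element = int.from_bytes(memory[2 * i:2 * i + 2], endianness)
        d0 |= element << (16 * i)
    return d0


# Elements A=0x1122, B=0x3344, C=0x5566, D=0x7788 as each system stores them.
le_memory = bytes([0x22, 0x11, 0x44, 0x33, 0x66, 0x55, 0x88, 0x77])
be_memory = bytes([0x11, 0x22, 0x33, 0x44, 0x55, 0x66, 0x77, 0x88])
```

Both `vld1_16(le_memory, "little")` and `vld1_16(be_memory, "big")` produce the same register value, which is the point of the figure: only the byte order within each element in memory differs.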


A3.4 Synchronization and semaphores

In architecture versions before ARMv6, support for the synchronization of shared memory depends on the SWP and SWPB instructions. These are read-locked-write operations that swap register contents with memory, and are described in SWP, SWPB on page A8-722. These instructions support basic busy/free semaphore mechanisms, but do not support mechanisms that require calculation to be performed on the semaphore between the read and write phases.

From ARMv6, ARM deprecates any use of SWP or SWPB, and the ARMv7 Virtualization Extensions make these instructions OPTIONAL and deprecated.

Note

• ARM strongly recommends that all software uses the synchronization primitives described in this section, rather than SWP or SWPB.

• If an implementation does not support the SWP and SWPB instructions, the ID_ISAR0.Swap_instrs and ID_ISAR4.SWP_frac fields are zero, see About the Instruction Set Attribute registers on page B7-1952.

ARMv6 introduced a new mechanism to support more comprehensive non-blocking synchronization of shared memory, using synchronization primitives that scale for multiprocessor system designs. ARMv7 extends support for this mechanism, and provides the following synchronization primitives in the ARM and Thumb instruction sets:

• Load-Exclusives:

— LDREX, see LDREX on page A8-432

— LDREXB, see LDREXB on page A8-434

— LDREXD, see LDREXD on page A8-436

— LDREXH, see LDREXH on page A8-438

• Store-Exclusives:

— STREX, see STREX on page A8-690

— STREXB, see STREXB on page A8-692

— STREXD, see STREXD on page A8-694

— STREXH, see STREXH on page A8-696

• Clear-Exclusive, CLREX, see CLREX on page A8-360.

Note

This section describes the operation of a Load-Exclusive/Store-Exclusive pair of synchronization primitives using, as examples, the LDREX and STREX instructions. The same description applies to any other pair of synchronization primitives:

• LDREXB used with STREXB

• LDREXD used with STREXD

• LDREXH used with STREXH.

Software must use a Load-Exclusive instruction only with the corresponding Store-Exclusive instruction.

The model for the use of a Load-Exclusive/Store-Exclusive instruction pair, accessing a non-aborting memory address x is:

• The Load-Exclusive instruction reads a value from memory address x.

• The corresponding Store-Exclusive instruction succeeds in writing back to memory address x only if no other observer, process, or thread has performed a more recent store to address x. The Store-Exclusive operation returns a status bit that indicates whether the memory write succeeded.

A Load-Exclusive instruction tags a small block of memory for exclusive access. The size of the tagged block is IMPLEMENTATION DEFINED, see Tagging and the size of the tagged memory block on page A3-121. A Store-Exclusive instruction to the same address clears the tag.
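Software typically uses the pair in a retry loop: read with a Load-Exclusive, compute, and repeat if the Store-Exclusive returns status 1. The Python sketch below shows that loop against a toy memory model. The class, its single-tag behavior, and the `interfere` hook that simulates another observer's store are all hypothetical, chosen to illustrate the model above rather than any particular implementation.

```python
class ToyExclusiveMemory:
    """Hypothetical single-tag model of Load-Exclusive/Store-Exclusive.

    Assumed choices (one of several the architecture permits): an ordinary
    store to the tagged address clears the tag, and a Store-Exclusive to a
    different address does not update memory.
    """

    def __init__(self):
        self.memory = {}
        self.tag = None  # tagged address, or None when nothing is tagged

    def load_exclusive(self, address):
        self.tag = address
        return self.memory.get(address, 0)

    def store_exclusive(self, address, value):
        """Return 0 if the store happened, 1 otherwise."""
        succeeded = self.tag == address
        if succeeded:
            self.memory[address] = value
        self.tag = None
        return 0 if succeeded else 1

    def store(self, address, value):
        """Ordinary store, for example by another observer."""
        self.memory[address] = value
        if self.tag == address:
            self.tag = None


def atomic_add(mem, address, delta, interfere=None):
    """Retry loop in the LDREX/STREX style; returns (old value, attempts)."""
    attempts = 0
    while True:
        attempts += 1
        old = mem.load_exclusive(address)
        if interfere is not None:
            interfere(mem, attempts)  # simulate a store by another observer
        if mem.store_exclusive(address, (old + delta) & 0xFFFF_FFFF) == 0:
            return old, attempts
```

If another observer stores to the address between the load and the Store-Exclusive, the first attempt fails and the loop recomputes from the new value.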


Note

In this section, the term processor includes any observer that can generate a Load-Exclusive or a Store-Exclusive.

A3.4.1 Exclusive access instructions and Non-shareable memory regions

For memory regions that do not have the Shareable attribute, the exclusive access instructions rely on a local monitor that tags any address from which the processor executes a Load-Exclusive. Any non-aborted attempt by the same processor to use a Store-Exclusive to modify any address is guaranteed to clear the tag.

A Load-Exclusive performs a load from memory, and:

• the executing processor tags the physical memory address for exclusive access

• the local monitor of the executing processor transitions to the Exclusive Access state.

A Store-Exclusive performs a conditional store to memory that depends on the state of the local monitor:

If the local monitor is in the Exclusive Access state

• If the address of the Store-Exclusive is the same as the address that has been tagged in the monitor by an earlier Load-Exclusive, then the store occurs, otherwise it is IMPLEMENTATION DEFINED whether the store occurs.

• A status value is returned to a register:

— if the store took place the status value is 0

— otherwise, the status value is 1.

• The local monitor of the executing processor transitions to the Open Access state.

If the local monitor is in the Open Access state

• no store takes place

• a status value of 1 is returned to a register

• the local monitor remains in the Open Access state.

The Store-Exclusive instruction defines the register to which the status value is returned.

When a processor writes using any instruction other than a Store-Exclusive it is IMPLEMENTATION DEFINED whether the write affects the state of the local monitor.

If the local monitor is in the Exclusive Access state and the processor performs a Store-Exclusive to any address other than the last one from which it performed a Load-Exclusive, it is IMPLEMENTATION DEFINED whether the store updates memory, but in all cases the local monitor is reset to the Open Access state. This mechanism:

• is used on a context switch, see Context switch support on page A3-122

• must be treated as a software programming error in all other cases.

Note

It is IMPLEMENTATION DEFINED whether a store to a tagged physical address causes a tag in the local monitor to be cleared if that store is by an observer other than the one that caused the physical address to be tagged.

Figure A3-3 on page A3-116 shows the state machine for the local monitor. Table A3-3 on page A3-116 shows the effect of each of the operations shown in the figure.


Figure A3-3 Local monitor state machine diagram

For more information about tagging see Tagging and the size of the tagged memory block on page A3-121.

Note

For the local monitor state machine, as shown in Figure A3-3:

• The IMPLEMENTATION DEFINED options for the local monitor are consistent with the local monitor being constructed so that it does not hold any physical address, but instead treats any access as matching the address of the previous LoadExcl.

• A local monitor implementation can be unaware of Load-Exclusive and Store-Exclusive operations from other processors.

• The architecture does not require a load instruction by another processor, that is not a Load-Exclusive instruction, to have any effect on the local monitor.

• It is IMPLEMENTATION DEFINED whether the transition from Exclusive Access to Open Access state occurs when the Store or StoreExcl is from another observer.

Table A3-3 shows the effect of the operations shown in Figure A3-3.

Figure A3-3 transitions: the local monitor has two states, Open Access and Exclusive Access.

• From Open Access, LoadExcl(x) moves the monitor to Exclusive Access; CLREX, StoreExcl(x), and Store(x) leave it in Open Access.

• From Exclusive Access, CLREX, StoreExcl(Tagged_address), StoreExcl(!Tagged_address), and Store(Tagged_address)* move the monitor to Open Access; LoadExcl(x), Store(!Tagged_address)*, and Store(Tagged_address)* leave it in Exclusive Access.

Operations marked * are possible alternative IMPLEMENTATION DEFINED options.

In the diagram: LoadExcl represents any Load-Exclusive instruction, StoreExcl represents any Store-Exclusive instruction, and Store represents any other store instruction.

Any LoadExcl operation updates the tagged address to the most significant bits of the address x used for the operation.

Table A3-3 Effect of Exclusive instructions and write operations on the local monitor

Initial state     Operation(a)   Effect                                             Final state
Open Access       CLREX          No effect                                          Open Access
                  StoreExcl(x)   Does not update memory, returns status 1           Open Access
                  LoadExcl(x)    Loads value from memory, tags address x            Exclusive Access
                  Store(x)       Updates memory, no effect on monitor               Open Access
Exclusive Access  CLREX          Clears tagged address                              Open Access
                  StoreExcl(t)   Updates memory, returns status 0                   Open Access
                  StoreExcl(!t)  Updates memory, returns status 0(b)                Open Access
                                 Does not update memory, returns status 1(b)        Open Access
                  LoadExcl(x)    Loads value from memory, changes tag to address x  Exclusive Access
                  Store(!t)      Updates memory                                     Exclusive Access(b) or Open Access(b)
                  Store(t)       Updates memory                                     Exclusive Access(b) or Open Access(b)

a. In the table:
   LoadExcl represents any Load-Exclusive instruction.
   StoreExcl represents any Store-Exclusive instruction.
   Store represents any store operation other than a Store-Exclusive operation.
   t is the tagged address, bits[31:a] of the address of the last Load-Exclusive instruction. For more information, see Tagging and the size of the tagged memory block on page A3-121.

b. IMPLEMENTATION DEFINED alternative actions.

Note

Normal memory that is Inner Non-cacheable, Outer Non-cacheable is inherently coherent between different processors, and it is IMPLEMENTATION DEFINED whether such memory, if it does not have the Shareable attribute, is treated as Non-shareable or as Shareable.

Changes to the local monitor state resulting from speculative execution

The architecture permits a local monitor to transition to the Open Access state as a result of speculation, or from some other cause. This is in addition to the transitions to Open Access state caused by the architectural execution of an operation shown in Table A3-3 on page A3-116.

An implementation must ensure that:

• the local monitor cannot be seen to transition to the Exclusive Access state except as a result of the architectural execution of one of the operations shown in Table A3-3 on page A3-116

• any transition of the local monitor to the Open Access state not caused by the architectural execution of an operation shown in Table A3-3 on page A3-116 must not indefinitely delay forward progress of execution.

A3.4.2 Exclusive access instructions and Shareable memory regions

For memory regions that have the Shareable attribute, exclusive access instructions rely on:

• A local monitor for each processor in the system, that tags any address from which the processor executes a Load-Exclusive. The local monitor operates as described in Exclusive access instructions and Non-shareable memory regions on page A3-115, except that for Shareable memory any Store-Exclusive is then subject to checking by the global monitor if it is described in that section as doing at least one of:

— updating memory

— returning a status value of 0.

The local monitor can ignore accesses from other processors in the system.

• A global monitor that tags a physical address as exclusive access for a particular processor. This tag is used later to determine whether a Store-Exclusive to that address that has not been failed by the local monitor can occur. Any successful write to the tagged address by any other observer in the shareability domain of the memory location is guaranteed to clear the tag. For each processor in the system, the global monitor:

— can hold at least one tagged address

— maintains a state machine for each tagged address it can hold.
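The local monitor transitions listed in Table A3-3 can be captured in a small executable model. The following Python sketch is illustrative only and is not ARM pseudocode: the names `State`, `LocalMonitor`, `strex_checks_address` and `store_clear_policy` are hypothetical. Setting `strex_checks_address=False` models the permitted monitor that holds no address (see the Note to Figure A3-3), and `store_clear_policy` selects one of the IMPLEMENTATION DEFINED outcomes marked (b) for Store(t) and Store(!t).

```python
from enum import Enum


class State(Enum):
    OPEN = "Open Access"
    EXCLUSIVE = "Exclusive Access"


class LocalMonitor:
    """Hypothetical model of the local monitor transitions in Table A3-3.

    a: the tag is address bits[31:a], with 3 <= a <= 11.
    strex_checks_address: if False, the monitor holds no address and treats any
        StoreExcl as matching the previous LoadExcl (a permitted option).
    store_clear_policy: IMPLEMENTATION DEFINED effect of an ordinary Store in
        Exclusive Access state: "never", "tagged" (only Store(t)) or "any".
    """

    def __init__(self, a=4, strex_checks_address=True, store_clear_policy="tagged"):
        if not 3 <= a <= 11:
            raise ValueError("a must be between 3 and 11")
        self.a = a
        self.strex_checks_address = strex_checks_address
        self.store_clear_policy = store_clear_policy
        self.state = State.OPEN
        self.tag = None

    def _tag_of(self, addr):
        return (addr & 0xFFFFFFFF) >> self.a

    def load_excl(self, addr):
        # From either state: tag the address, replacing any earlier tag.
        self.tag = self._tag_of(addr)
        self.state = State.EXCLUSIVE

    def clrex(self):
        # Clears any tag; from Open Access this has no effect.
        self.tag = None
        self.state = State.OPEN

    def store_excl(self, addr):
        """Return the status value: 0 if memory is updated, 1 if it is not."""
        if self.state is State.OPEN:
            return 1
        matches = (not self.strex_checks_address) or self._tag_of(addr) == self.tag
        self.clrex()  # every StoreExcl from Exclusive Access ends in Open Access
        return 0 if matches else 1

    def store(self, addr):
        # An ordinary store always updates memory; only the monitor state varies.
        if self.state is State.EXCLUSIVE:
            tagged = self._tag_of(addr) == self.tag
            if self.store_clear_policy == "any" or (
                self.store_clear_policy == "tagged" and tagged
            ):
                self.clrex()
```

With `a=4`, a LoadExcl of 0x000341B4 followed by a StoreExcl of 0x000341B8 returns status 0 in this model, because both addresses share the tag 0x000341B; a second StoreExcl with no intervening LoadExcl returns 1.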


Note

For each processor, the architecture only requires global monitor support for a single tagged address. Any situation that might benefit from the use of multiple tagged addresses on a single processor is UNPREDICTABLE, see Load-Exclusive and Store-Exclusive usage restrictions on page A3-122.

In addition, in an implementation that includes the Large Physical Address Extension, when the implementation is using the Short-descriptor translation table format, it is IMPLEMENTATION DEFINED whether Load-Exclusive and Store-Exclusive accesses to Non-shareable regions with the Normal, Inner Non-cacheable, Outer Non-cacheable attribute use the global monitor in addition to the local monitor.

Note

The global monitor can either reside in a processor block or exist as a secondary monitor at the memory interfaces. The IMPLEMENTATION DEFINED aspects of the monitors mean that the global monitor and local monitor can be combined into a single unit, provided that unit performs the global monitor and local monitor functions defined in this manual.

For Shareable regions of memory, in some implementations and for some memory types, the properties of the global monitor can be met only by functionality outside the processor. Some system implementations might not implement this functionality for all regions of memory. In particular, this can apply to:

• any type of memory in the system implementation that does not support hardware cache coherency

• Non-cacheable memory, or memory treated as Non-cacheable, in an implementation that does support hardware cache coherency.

In such a system, it is defined by the system:

• whether the global monitor is implemented

• if the global monitor is implemented, which address ranges or memory types it monitors.

The behavior of Load-Exclusive and Store-Exclusive instructions when accessing a memory address not monitored by the global monitor is UNPREDICTABLE.

Note

An implementation can combine the functionality of the global and local monitors into a single unit.

Operation of the global monitor

A Load-Exclusive from Shareable memory performs a load from memory, and causes the physical address of the access to be tagged as exclusive access for the requesting processor. This access also causes the exclusive access tag to be removed from any other physical address that has been tagged by the requesting processor.

The global monitor only supports a single outstanding exclusive access to Shareable memory per processor. A Load-Exclusive by one processor has no effect on the global monitor state for any other processor.

Store-Exclusive performs a conditional store to memory:

• The store is guaranteed to succeed only if the physical address accessed is tagged as exclusive access for the requesting processor and both the local monitor and the global monitor state machines for the requesting processor are in the Exclusive Access state. In this case:

— a status value of 0 is returned to a register to acknowledge the successful store

— the final state of the global monitor state machine for the requesting processor is IMPLEMENTATION DEFINED

— if the address accessed is tagged for exclusive access in the global monitor state machine for any other processor then that state machine transitions to Open Access state.


• If no address is tagged as exclusive access for the requesting processor, the store does not succeed:

— a status value of 1 is returned to a register to indicate that the store failed

— the global monitor is not affected and remains in Open Access state for the requesting processor.

• If a different physical address is tagged as exclusive access for the requesting processor, it is IMPLEMENTATION DEFINED whether the store succeeds or not:

— if the store succeeds, a status value of 0 is returned to a register, otherwise a value of 1 is returned

— if the global monitor state machine for the processor was in the Exclusive Access state before the Store-Exclusive, it is IMPLEMENTATION DEFINED whether that state machine transitions to the Open Access state.

The Store-Exclusive instruction defines the register to which the status value is returned.
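The three Store-Exclusive cases above can be written as a single decision function. This hedged Python sketch considers only the requesting processor: `store_exclusive_outcome` and its parameters are hypothetical, and treating "a tag is held" as "the global monitor is in Exclusive Access state" is a simplification. `local_ok` is False when the local monitor has already failed the store, and `different_tag_succeeds` stands in for the IMPLEMENTATION DEFINED choice when a different physical address is tagged.

```python
from typing import Optional, Tuple


def store_exclusive_outcome(
    global_tag: Optional[int],
    addr_tag: int,
    local_ok: bool = True,
    different_tag_succeeds: bool = False,
) -> Tuple[int, bool]:
    """Return (status, memory_updated) for a Store-Exclusive to Shareable memory.

    global_tag: the requesting processor's tag in the global monitor, or None if
        no address is tagged (Open Access).
    addr_tag: the tag value, address bits[31:a], of the Store-Exclusive address.
    local_ok: False if the local monitor has already failed the store.
    different_tag_succeeds: the IMPLEMENTATION DEFINED outcome when a different
        address is tagged for the requesting processor.
    """
    if not local_ok:
        return 1, False  # failed by the local monitor; memory is not updated
    if global_tag is None:
        return 1, False  # no tagged address: the store does not succeed
    if global_tag == addr_tag:
        return 0, True   # tagged address matches: guaranteed to succeed
    # A different address is tagged: success is IMPLEMENTATION DEFINED.
    return (0, True) if different_tag_succeeds else (1, False)
```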

In a shared memory system, the global monitor implements a separate state machine for each processor in the system. The state machine for accesses to Shareable memory by processor (n) can respond to all the Shareable memory accesses visible to it. This means it responds to:

• accesses generated by the associated processor (n)

• accesses generated by the other observers in the shareability domain of the memory location (!n).

In a shared memory system, the global monitor implements a separate state machine for each observer that can generate a Load-Exclusive or a Store-Exclusive in the system.

Figure A3-4 shows the state machine for processor(n) in a global monitor. Table A3-4 on page A3-120 shows the effect of each of the operations shown in the figure.

Figure A3-4 Global monitor state machine diagram for processor(n) in a multiprocessor system

[Figure A3-4: state machine for processor(n) with two states, Open Access and Exclusive Access. In the diagram, LoadExcl represents any Load-Exclusive instruction, StoreExcl represents any Store-Exclusive instruction, and Store represents any other store instruction. Any LoadExcl operation updates the tagged address to the most significant bits of the address x used for the operation. Operations marked * are possible alternative IMPLEMENTATION DEFINED options. ‡ StoreExcl(Tagged_address,!n) clears the monitor only if the StoreExcl updates memory.]

For more information about tagging see Tagging and the size of the tagged memory block on page A3-121.


Note

For the global monitor state machine, as shown in Figure A3-4 on page A3-119:

• The architecture does not require a load instruction by another processor, that is not a Load-Exclusive instruction, to have any effect on the global monitor.

• Whether a Store-Exclusive successfully updates memory or not depends on whether the address accessed matches the tagged Shareable memory address for the processor issuing the Store-Exclusive instruction. For this reason, Figure A3-4 on page A3-119 and Table A3-4 only show how the (!n) entries cause state transitions of the state machine for processor(n).

• A Load-Exclusive can only update the tagged Shareable memory address for the processor issuing the Load-Exclusive instruction.

• The effect of the CLREX instruction on the global monitor is IMPLEMENTATION DEFINED.

• It is IMPLEMENTATION DEFINED:

— whether a modification to a non-shareable memory location can cause a global monitor to transition from Exclusive Access to Open Access state

— whether a Load-Exclusive to a non-shareable memory location can cause a global monitor to transition from Open Access to Exclusive Access state.
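The cross-processor behavior described in this section can be explored with a model that keeps one tag per processor. In the following Python sketch, `GlobalMonitor` and its methods are hypothetical, the local monitor is assumed never to fail a store, and wherever Table A3-4 permits IMPLEMENTATION DEFINED behavior the sketch makes one fixed choice, noted in the comments.

```python
from typing import Dict, Optional


class GlobalMonitor:
    """Hypothetical global monitor with one state machine per processor.

    tags[n] holds processor n's tag (address bits[31:a]); None is Open Access.
    The local monitor is assumed never to fail a store in this sketch.
    """

    def __init__(self, num_processors: int, a: int = 4):
        self.a = a
        self.tags: Dict[int, Optional[int]] = {n: None for n in range(num_processors)}
        self.memory: Dict[int, int] = {}

    def _tag_of(self, addr: int) -> int:
        return (addr & 0xFFFFFFFF) >> self.a

    def _clear_other_processors(self, n: int, tag: int) -> None:
        # A write to a tagged address by another observer clears that tag.
        for m in self.tags:
            if m != n and self.tags[m] == tag:
                self.tags[m] = None

    def load_excl(self, n: int, addr: int) -> int:
        # LoadExcl(x, n) retags processor n only; other processors are unaffected.
        self.tags[n] = self._tag_of(addr)
        return self.memory.get(addr, 0)

    def store_excl(self, n: int, addr: int, value: int) -> int:
        tag = self._tag_of(addr)
        if self.tags[n] != tag:
            # No tag, or a different tag. Failing here, and leaving the state
            # unchanged, is one permitted choice for the different-tag case.
            return 1
        self.memory[addr] = value
        self._clear_other_processors(n, tag)
        # Processor n's final state is IMPLEMENTATION DEFINED; choose Open Access.
        self.tags[n] = None
        return 0

    def store(self, n: int, addr: int, value: int) -> None:
        self.memory[addr] = value
        # Store(t, !n) moves other processors to Open Access. Store(t, n) is
        # IMPLEMENTATION DEFINED; this sketch leaves processor n's tag in place.
        self._clear_other_processors(n, self._tag_of(addr))
```

In this model, if two processors load-exclusive the same location and one completes its Store-Exclusive first, the other processor's tag is cleared, so its Store-Exclusive returns status 1 and must be retried.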

Table A3-4 shows the effect of the operations shown in Figure A3-4 on page A3-119.

Table A3-4 Effect of load/store operations on global monitor for processor(n)

Initial state     Operation(a)        Effect                                        Final state
Exclusive Access  LoadExcl(x, n)      Loads value from memory, tags address x       Exclusive Access
                  CLREX(n)            None. Effect on the final state is            Exclusive Access(d) or Open Access(d)
                                      IMPLEMENTATION DEFINED.
                  CLREX(!n)           None                                          Exclusive Access
                  StoreExcl(t, !n)    Updates memory, returns status 0(b)           Open Access
                                      Does not update memory, returns status 1(b)   Exclusive Access
                  StoreExcl(t, n)     Updates memory, returns status 0(c)           Open Access or Exclusive Access
                  StoreExcl(!t, n)    Updates memory, returns status 0(d)           Open Access or Exclusive Access
                                      Does not update memory, returns status 1(d)   Open Access or Exclusive Access
                  StoreExcl(!t, !n)   Depends on state machine and tag address for  Exclusive Access
                                      processor issuing STREX
                  Store(t, n)         Updates memory                                Exclusive Access(d) or Open Access(d)
                  Store(t, !n)        Updates memory                                Open Access
                  Store(!t, n)        Updates memory, no effect on monitor          Exclusive Access
                  Store(!t, !n)       Updates memory, no effect on monitor          Exclusive Access
Open Access       CLREX(n)            None                                          Open Access
                  CLREX(!n)           None                                          Open Access
                  StoreExcl(x, n)     Does not update memory, returns status 1      Open Access
                  LoadExcl(x, !n)     Loads value from memory, no effect on tag     Open Access
                                      address for processor(n)
                  StoreExcl(x, !n)    Depends on state machine and tag address for  Open Access
                                      processor issuing STREX(b)
                  Store(x, n)         Updates memory, no effect on monitor          Open Access
                  Store(x, !n)        Updates memory, no effect on monitor          Open Access
                  LoadExcl(x, n)      Loads value from memory, tags address x       Exclusive Access

a. In the table:
   LoadExcl represents any Load-Exclusive instruction.
   StoreExcl represents any Store-Exclusive instruction.
   Store represents any store operation other than a Store-Exclusive operation.
   t is the tagged address for processor(n), bits[31:a] of the address of the last Load-Exclusive instruction issued by processor(n), see Tagging and the size of the tagged memory block.

b. The result of a STREX(x, !n) or a STREX(t, !n) operation depends on the state machine and tagged address for the processor issuing the STREX instruction. This table shows how each possible outcome affects the state machine for processor(n).

c. After a successful STREX to the tagged address, the state of the state machine is IMPLEMENTATION DEFINED. However, this state has no effect on the subsequent operation of the global monitor.

d. Effect is IMPLEMENTATION DEFINED. The table shows all permitted implementations.

A3.4.3 Tagging and the size of the tagged memory block

As stated in the footnotes to Table A3-3 on page A3-116 and Table A3-4 on page A3-120, when a Load-Exclusive instruction is executed, the resulting tag address ignores the least significant bits of the memory address.

Tagged_address = Memory_address[31:a]

The value of a in this assignment is IMPLEMENTATION DEFINED, between a minimum value of 3 and a maximum value of 11. For example, in an implementation where a is 4, a successful LDREX of address 0x000341B4 gives a tag value of bits[31:4] of the address, giving 0x000341B. This means that the four words of memory from 0x000341B0 to 0x000341BF are tagged for exclusive access.

The size of the tagged memory block is called the Exclusives Reservation Granule. The Exclusives Reservation Granule is IMPLEMENTATION DEFINED in the range 2-512 words:

• 2 words in an implementation where a is 3

• 512 words in an implementation where a is 11.

In some implementations the CTR identifies the Exclusives Reservation Granule, see either:

• CTR, Cache Type Register, VMSA on page B4-1556

• CTR, Cache Type Register, PMSA on page B6-1835.
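The tag and granule arithmetic in this section can be checked with a few helper functions. This Python sketch is illustrative and the function names are hypothetical. In particular, `erg_words_from_ctr` assumes that CTR.ERG occupies bits[23:20] and holds log2 of the granule size in words, and treats a zero field as "not reported"; confirm that encoding against the CTR descriptions referenced above before relying on it.

```python
from typing import Tuple


def tagged_address(addr: int, a: int) -> int:
    """Tagged_address = Memory_address[31:a], where 3 <= a <= 11."""
    if not 3 <= a <= 11:
        raise ValueError("a must be between 3 and 11")
    return (addr & 0xFFFFFFFF) >> a


def granule_range(addr: int, a: int) -> Tuple[int, int]:
    """First and last byte address of the tagged block that contains addr."""
    base = tagged_address(addr, a) << a
    return base, base + (1 << a) - 1


def granule_words(a: int) -> int:
    """Exclusives Reservation Granule size in 32-bit words (2**a bytes)."""
    return (1 << a) // 4


def same_granule(addr1: int, addr2: int, a: int) -> bool:
    """True if exclusive accesses to the two addresses share one tag."""
    return tagged_address(addr1, a) == tagged_address(addr2, a)


def erg_words_from_ctr(ctr: int) -> int:
    """Granule size in words from a CTR value.

    ASSUMPTION: CTR.ERG is bits[23:20] and holds log2(words). A zero field is
    treated as 'not reported', falling back to the 512-word (2048-byte)
    architectural maximum.
    """
    erg = (ctr >> 20) & 0xF
    return 512 if erg == 0 else 1 << erg
```

For the example in the text, `tagged_address(0x000341B4, 4)` gives 0x000341B and `granule_range(0x000341B4, 4)` gives the pair 0x000341B0 and 0x000341BF. `same_granule` can also help check that objects accessed by exclusive accesses are kept in separate granules, as recommended in Load-Exclusive and Store-Exclusive usage restrictions.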


A3.4.4 Context switch support

After a context switch, software must ensure that the local monitor is in the Open Access state. This requires it to either:

• execute a CLREX instruction

• execute a dummy STREX to a memory address allocated for this purpose.

Note

• Using a dummy STREX for this purpose is backwards-compatible with the ARMv6 implementation of the exclusive operations. The CLREX instruction is introduced in ARMv6K.

• Context switching is not an application level operation. However, this information is included here to complete the description of the exclusive operations.

The STREX or CLREX instruction that follows a context switch might cause a subsequent Store-Exclusive to fail, requiring a Load-Exclusive … Store-Exclusive sequence to be repeated. To minimize the possibility of this happening, ARM recommends that the Store-Exclusive instruction is kept as close as possible to the associated Load-Exclusive instruction, see Load-Exclusive and Store-Exclusive usage restrictions.
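The following Python sketch shows why the local monitor must be cleared on a context switch. It models two threads on one processor that share a single local monitor which compares whole addresses; `Monitor`, `increment`, `finish` and `run` are hypothetical names, and the preemption points are fixed by hand. Without a CLREX or dummy STREX at each switch, thread B's STREX succeeds using the tag left by thread A's LDREX and overwrites A's update with a stale value.

```python
class Monitor:
    """Hypothetical local monitor for one processor: one tag, or None for Open Access."""

    def __init__(self):
        self.tag = None

    def ldrex(self, mem, addr):
        self.tag = addr
        return mem[addr]

    def strex(self, mem, addr, value):
        ok = self.tag == addr
        self.tag = None  # any STREX leaves this monitor in Open Access
        if ok:
            mem[addr] = value
        return 0 if ok else 1

    def clrex(self):
        self.tag = None


def increment(mon, mem, addr):
    """Atomic increment as an LDREX/STREX retry loop.

    The bare yield marks where the thread can be preempted.
    """
    while True:
        old = mon.ldrex(mem, addr)
        yield
        if mon.strex(mem, addr, old + 1) == 0:
            return


def finish(thread):
    """Resume a thread and run it until its increment completes."""
    for _ in thread:
        pass


def run(switch_op=None):
    """Three increments (two by thread A, one by B) with a fixed interleaving."""
    x, dummy = 0x100, 0x200  # dummy: location reserved for a dummy STREX
    mem, mon = {x: 0, dummy: 0}, Monitor()

    def context_switch():
        if switch_op == "clrex":
            mon.clrex()
        elif switch_op == "dummy_strex":
            mon.strex(mem, dummy, 0)  # expected to fail; it only clears the monitor

    b = increment(mon, mem, x)
    next(b)                         # B: LDREX reads 0, then B is preempted
    context_switch()
    finish(increment(mon, mem, x))  # A: a complete increment, X becomes 1
    a = increment(mon, mem, x)
    next(a)                         # A: LDREX reads 1, then A is preempted
    context_switch()
    finish(b)                       # B resumes at its STREX, still holding old value 0
    context_switch()
    finish(a)                       # A resumes at its STREX
    return mem[x]
```

In this model, `run()` ends with the counter at 2 after three increments, while `run("clrex")` and `run("dummy_strex")` both end at 3.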

A3.4.5 Load-Exclusive and Store-Exclusive usage restrictions

The Load-Exclusive and Store-Exclusive instructions are intended to work together, as a pair, for example an LDREX/STREX pair or an LDREXB/STREXB pair. To support different implementations of these functions, software must follow the notes and restrictions given here.

These notes describe use of an LDREX/STREX pair, but apply equally to any other Load-Exclusive/Store-Exclusive pair:

• The exclusives support a single outstanding exclusive access for each processor thread that is executed. The architecture makes use of this by not requiring an address or size check as part of the IsExclusiveLocal() function. If the target virtual address of an STREX is different from the virtual address of the preceding LDREX in the same thread of execution, behavior can be UNPREDICTABLE. As a result, an LDREX/STREX pair can only be relied upon to eventually succeed if they are executed with the same virtual address. Where a context switch or exception might change the thread of execution, a CLREX instruction or a dummy STREX instruction must be executed to avoid unwanted effects, as described in Context switch support. Using an STREX in this way is the only occasion where software can program an STREX with a different address from the previously executed LDREX.

• If two STREX instructions are executed without an intervening LDREX, the second STREX returns a status value of 1. This means that:

— ARM recommends that, in a given thread of execution, every STREX has a preceding LDREX associated with it

— it is not necessary for every LDREX to have a subsequent STREX.

• An implementation of the Load-Exclusive and Store-Exclusive instructions can require that, in any thread of execution, the transaction size of a Store-Exclusive is the same as the transaction size of the preceding Load-Exclusive executed in that thread. If the transaction size of a Store-Exclusive is different from the preceding Load-Exclusive in the same thread of execution, behavior can be UNPREDICTABLE. As a result, software can rely on an LDREX/STREX pair to eventually succeed only if they have the same size. Where a context switch or exception might change the thread of execution, the software must execute a CLREX instruction, or a dummy STREX instruction, to avoid unwanted effects, as described in Context switch support. Using an STREX in this way is the only occasion where software can use a Store-Exclusive instruction with a different transaction size from the previously executed Load-Exclusive instruction.

• An implementation might clear an exclusive monitor between the LDREX and the STREX, without any application-related cause. For example, this might happen because of cache evictions. Software written for such an implementation must, in any single thread of execution, avoid having any explicit memory accesses, System control register updates, or cache maintenance operations between the LDREX instruction and the associated STREX instruction.


• In some implementations, an access to Strongly-ordered or Device memory might clear the exclusive monitor. Therefore, software must not place a load or a store to Strongly-ordered or Device memory between an LDREX and an STREX in a single thread of execution.

• Implementations can benefit from keeping the LDREX and STREX operations close together in a single thread of execution. This minimizes the likelihood of the exclusive monitor state being cleared between the LDREX instruction and the STREX instruction. Therefore, for best performance, ARM strongly recommends a limit of 128 bytes between LDREX and STREX instructions in a single thread of execution.

• The architecture sets an upper limit of 2048 bytes on the size of a region that can be marked as exclusive. Software can read the implemented size of the Exclusives Reservation Granule from the CTR.ERG field, see:

— CTR, Cache Type Register, VMSA on page B4-1556 for a VMSA implementation

— CTR, Cache Type Register, PMSA on page B6-1835 for a PMSA implementation.

In a heavily contended system, having multiple objects that are in the same Exclusives Reservation Granule accessed by exclusive accesses can lead to starvation of a process accessing that granule. Therefore, in such systems, ARM recommends that objects that are accessed by exclusive accesses are separated by the size of the Exclusives Reservation Granule.

• It is IMPLEMENTATION DEFINED whether LDREX and STREX operations can be performed to a memory region with the Device or Strongly-ordered memory attribute. Unless the implementation documentation explicitly states that LDREX and STREX operations to a memory region with the Device or Strongly-ordered attribute are permitted, the effect of such operations is UNPREDICTABLE.

• After taking a Data Abort exception, the state of the exclusive monitors is UNKNOWN. Therefore ARM strongly recommends that the abort handling software performs a CLREX instruction, or a dummy STREX instruction, to clear the monitor state.

• For the memory location being accessed by a LoadExcl/StoreExcl pair, if the memory attributes for the LoadExcl instruction differ from the memory attributes for the StoreExcl instruction, behavior is UNPREDICTABLE.

This can occur either:

— Because the translation of the accessed address changes between the LoadExcl and the StoreExcl

— As a result of using different virtual addresses, with different attributes, that point to the same physical address. This case is covered by another bullet point in this list.

If the memory attributes for the memory being accessed by an LDREX/STREX pair are changed between the LDREX and the STREX, behavior is UNPREDICTABLE.

• The effect of a data or unified cache invalidate, cache clean, or cache clean and invalidate instruction on a local or global exclusive monitor that is in the Exclusive Access state is UNPREDICTABLE. Execution of the instruction might clear the monitor, or it might leave it in the Exclusive Access state. For address-based maintenance instructions this also applies to the monitors of other processors in the same shareability domain as the processor executing the cache maintenance instruction, as determined by the shareability domain of the address being maintained.

Note

ARM strongly recommends that implementations ensure that the use of such maintenance operations by a processor in the Non-secure state cannot cause a denial of service on a processor in the Secure state.


• If the mapping of the virtual to physical address is changed between the LDREX instruction and the STREX instruction, and the change is performed using a break-before-make sequence as described in General TLB maintenance requirements on page B3-1381, if the STREX is performed after another write to the same physical address as the STREX, and that other write was performed after the old translation was properly invalidated and that invalidation was properly synchronized, then the STREX will not pass its monitor check.

Note

ARM expects that, in many implementations, either:

— The TLB invalidation will clear either the local or global monitor.

— The physical address will be checked between the LDREX and STREX.

Note

In the event of repeatedly-contending Load-Exclusive/Store-Exclusive sequences from multiple processors, an implementation must ensure that forward progress is made by at least one processor.
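Several of the rules in this list can be turned into a simple checker for a recorded single-thread instruction trace. This Python sketch is hypothetical throughout: the `Op` record, the `kind` names, and the use of instruction byte offsets to measure the recommended 128-byte limit are assumptions made for illustration. It does not special-case the dummy STREX permitted after a context switch, so such an STREX is reported as having no preceding LDREX.

```python
from dataclasses import dataclass
from typing import List, Optional

# Kinds that the usage rules say to avoid between an LDREX and its STREX.
DISRUPTIVE = {"load", "store", "cache_maint", "sysreg"}


@dataclass
class Op:
    offset: int                  # byte offset of the instruction in the code
    kind: str                    # "ldrex", "strex", "clrex", a DISRUPTIVE kind, or "other"
    addr: Optional[int] = None   # virtual address of a memory access
    size: Optional[int] = None   # transaction size in bytes


def check_exclusive_pairs(trace: List[Op], max_distance: int = 128) -> List[str]:
    """List departures from the LDREX/STREX usage rules in one thread's trace."""
    problems = []
    pending = None  # most recent LDREX that has not yet been paired
    for op in trace:
        where = f"{op.offset:#x}"
        if op.kind == "ldrex":
            pending = op
        elif op.kind == "clrex":
            pending = None
        elif op.kind == "strex":
            if pending is None:
                problems.append(f"{where}: STREX with no preceding LDREX")
            else:
                if op.addr != pending.addr:
                    problems.append(f"{where}: STREX address differs from LDREX")
                if op.size != pending.size:
                    problems.append(f"{where}: STREX size differs from LDREX")
                if op.offset - pending.offset > max_distance:
                    problems.append(f"{where}: STREX more than {max_distance} bytes after LDREX")
            pending = None
        elif pending is not None and op.kind in DISRUPTIVE:
            problems.append(f"{where}: {op.kind} between LDREX and STREX")
    return problems
```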

A3.4.6 Semaphores

The Swap (SWP) and Swap Byte (SWPB) instructions must be used with care to ensure that expected behavior is observed. Two examples are as follows:

1. A system with multiple bus masters that uses Swap instructions to implement semaphores that control interactions between different bus masters.

In this case, the semaphores must be placed in an uncached region of memory, where any buffering of writes occurs at a point common to all bus masters using the mechanism. The Swap instruction then causes a locked read-write bus transaction.

2. A system with multiple threads running on a uniprocessor that uses Swap instructions to implement semaphores that control interaction of the threads.

In this case, the semaphores can be placed in a cached region of memory, and a locked read-write bus transaction might or might not occur. The Swap and Swap Byte instructions are likely to have better performance on such a system than they do on a system with multiple bus masters, such as that described in example 1.

Note

From ARMv6, ARM deprecates use of the Swap and Swap Byte instructions, and strongly recommends that all new software uses the Load-Exclusive and Store-Exclusive synchronization primitives described in Synchronization and semaphores on page A3-114, for example LDREX and STREX.
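As a sketch of the recommended replacement, the exchange that SWP performs can be rebuilt from a Load-Exclusive/Store-Exclusive retry loop. The Python model below is hypothetical: `ExclusiveMemory` keeps one exclusive tag, and its `spurious_failures` counter imitates a monitor that is cleared without an application-related cause, which the retry loop absorbs. `swap` and `try_lock` are illustrative names.

```python
class ExclusiveMemory:
    """Hypothetical memory with a single exclusive tag.

    spurious_failures: number of upcoming STREX attempts that fail as though the
    monitor had been cleared, for example by a cache eviction.
    """

    def __init__(self, spurious_failures=0):
        self.mem = {}
        self.tag = None
        self.spurious_failures = spurious_failures
        self.strex_attempts = 0

    def ldrex(self, addr):
        self.tag = addr
        return self.mem.get(addr, 0)

    def strex(self, addr, value):
        self.strex_attempts += 1
        ok = self.tag == addr and self.spurious_failures == 0
        if self.spurious_failures:
            self.spurious_failures -= 1
        self.tag = None
        if ok:
            self.mem[addr] = value
        return 0 if ok else 1


def swap(m, addr, new):
    """SWP-style exchange built from an LDREX/STREX retry loop; returns the old value."""
    while True:
        old = m.ldrex(addr)
        if m.strex(addr, new) == 0:
            return old


def try_lock(m, addr):
    """Test-and-set semaphore: True if the lock word was 0 and is now 1."""
    return swap(m, addr, 1) == 0
```

A real spin-lock built this way also needs the barriers described in Synchronization primitives and the memory order model; the model above has no notion of memory ordering.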

A3.4.7 Synchronization primitives and the memory order model

The synchronization primitives follow the memory order model of the memory type accessed by the instructions. For this reason:

• Portable software for claiming a spin-lock must include a Data Memory Barrier (DMB) operation, performed by a DMB instruction, between claiming the spin-lock and making any access that makes use of the spin-lock.

• Portable software for releasing a spin-lock must include a DMB instruction before writing to clear the spin-lock.

This requirement applies to software using:

• the Load-Exclusive/Store-Exclusive instruction pairs, for example LDREX/STREX

• the deprecated synchronization primitives, SWP/SWPB.
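The two barrier rules can be stated as a check on one thread's sequence of spin-lock operations. In this hypothetical Python sketch, "claim" is the successful exclusive sequence (or swap) that takes the lock, "access" is any access to data the lock protects, "dmb" is a DMB instruction, and "release" is the store that clears the lock; this trace vocabulary is an assumption made for illustration.

```python
from typing import List


def barrier_violations(trace: List[str]) -> List[str]:
    """Check one thread's spin-lock trace against the two DMB rules.

    Trace items (hypothetical vocabulary): "claim", "access", "dmb", "release".
    """
    problems = []
    held = False
    dmb_since_claim = False     # rule 1: DMB between claiming and using the lock
    dmb_before_release = False  # rule 2: DMB before the store that clears the lock
    for step, op in enumerate(trace):
        if op == "claim":
            held, dmb_since_claim, dmb_before_release = True, False, False
        elif op == "dmb":
            dmb_since_claim = dmb_before_release = True
        elif op == "access":
            if held and not dmb_since_claim:
                problems.append(f"step {step}: access before DMB after claim")
            dmb_before_release = False
        elif op == "release":
            if not dmb_before_release:
                problems.append(f"step {step}: release without preceding DMB")
            held = False
        else:
            raise ValueError(f"unknown trace item: {op}")
    return problems
```

A sequence such as claim, dmb, access, access, dmb, release satisfies both rules; dropping either DMB produces a report at the step where the rule is broken.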


A3.4.8 Use of WFE and SEV instructions by spin-locks

ARMv7 and ARMv6K provide Wait For Event and Send Event instructions, WFE and SEV, that can assist with reducing power consumption and bus contention caused by processors repeatedly attempting to obtain a spin-lock. These instructions can be used at the application level, but a complete understanding of what they do depends on system level understanding of exceptions. They are described in Wait For Event and Send Event on page B1-1200.


A3.5 Memory types and attributes and the memory order model

ARMv6 defined a set of memory attributes with the characteristics required to support the memory and devices in the system memory map. In ARMv7 this set of attributes is extended by the addition of the Outer Shareable attribute for Normal memory and, in an implementation that does not include the Large Physical Address Extension, for Device memory.

Note

Whether an ARMv7 implementation distinguishes between Inner Shareable and Outer Shareable memory is IMPLEMENTATION DEFINED.

The ordering of accesses for regions of memory, referred to as the memory order model, is defined by the memory attributes. This model is described in the following sections:

• Memory types

• Summary of ARMv7 memory attributes on page A3-127

• Atomicity in the ARM architecture on page A3-128

• Concurrent modification and execution of instructions on page A3-130

• Normal memory on page A3-132

• Device and Strongly-ordered memory on page A3-136

• Memory access restrictions on page A3-138

• The effect of the Security Extensions on page A3-141.

A3.5.1 Memory types

For each memory region, the most significant memory attribute specifies the memory type. There are three mutually exclusive memory types:

• Normal

• Device

• Strongly-ordered.

Normal and Device memory regions have additional attributes.

Usually, memory used for programs and for data storage is suitable for access using the Normal memory attribute. Examples of memory technologies for which the Normal memory attribute is appropriate are:

• programmed Flash ROM

Note

During programming, Flash memory can be ordered more strictly than Normal memory.

• ROM

• SRAM

• DRAM and DDR memory.

System peripherals (I/O) generally conform to different access rules. Examples of I/O accesses are:

• FIFOs where consecutive accesses:

— add queued values on write accesses

— remove queued values on read accesses.

• interrupt controller registers where an access can be used as an interrupt acknowledge, changing the state of the controller itself

• memory controller configuration registers that are used for setting up the timing and correctness of areas of Normal memory

• memory-mapped peripherals, where accessing a memory location can cause side-effects in the system.


In ARMv7, the Strongly-ordered or Device memory attribute provides suitable access control for such peripherals. To ensure correct system behavior, the access rules for Device and Strongly-ordered memory are more restrictive than those for Normal memory, so that:

• Neither read nor write accesses can be performed speculatively.

Note

However, translation table walks can be made speculatively to memory marked as Device or Strongly-ordered, see Device and Strongly-ordered memory on page A3-136.

• Read and write accesses cannot be repeated, for example, on return from an exception.

• The number, order and sizes of the accesses are maintained.

For more information, see Device and Strongly-ordered memory on page A3-136.
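The access restrictions listed above can be tabulated per memory type. This Python sketch is illustrative: `MemoryType`, `AccessRules`, `RULES` and `suits_side_effect_registers` are hypothetical names. Each flag records only whether the corresponding restriction from the list above is guaranteed for that type, so False for Normal memory means "not guaranteed", not "never happens". Translation table walks, covered by the Note above, are outside the model.

```python
from dataclasses import dataclass
from enum import Enum


class MemoryType(Enum):
    NORMAL = "Normal"
    DEVICE = "Device"
    STRONGLY_ORDERED = "Strongly-ordered"


@dataclass(frozen=True)
class AccessRules:
    no_speculative_access: bool        # neither reads nor writes performed speculatively
    no_repeated_access: bool           # not repeated, e.g. on return from an exception
    preserves_number_order_size: bool  # number, order and sizes of accesses maintained


# True means the restriction is guaranteed; False means it is not guaranteed.
RULES = {
    MemoryType.NORMAL: AccessRules(False, False, False),
    MemoryType.DEVICE: AccessRules(True, True, True),
    MemoryType.STRONGLY_ORDERED: AccessRules(True, True, True),
}


def suits_side_effect_registers(t: MemoryType) -> bool:
    """True if every restriction above holds, as peripherals such as FIFOs need."""
    r = RULES[t]
    return r.no_speculative_access and r.no_repeated_access and r.preserves_number_order_size
```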

A3.5.2 Summary of ARMv7 memory attributes

Table A3-5 summarizes the memory attributes. For more information about these attributes see:

• Normal memory on page A3-132 and Shareable attribute for Device memory regions on page A3-137, for the shareability attribute

• Write-Through Cacheable, Write-Back Cacheable and Non-cacheable Normal memory on page A3-134, for cacheability and cache allocation hint attributes.

Note

The cacheability and cache allocation hint attributes apply only to Normal memory. Device and Strongly-ordered memory regions are Non-cacheable.

In this table:

Shareability    Applies only to Normal memory, and to Device memory in an implementation that does not include the Large Physical Address Extension. In an implementation that includes the Large Physical Address Extension, Device memory is always Outer Shareable.

When it is possible to assign a shareability attribute to Device memory, ARM deprecates assigning any attribute other than Shareable or Outer Shareable, see Shareable attribute for Device memory regions on page A3-137.

Whether an ARMv7 implementation distinguishes between Inner Shareable and Outer Shareable memory is IMPLEMENTATION DEFINED.

Cacheability    Applies only to Normal memory, and can be defined independently for Inner and Outer cache regions. Some cacheability attributes can be complemented by a cache allocation hint. This is an indication to the memory system of whether allocating a value to a cache is likely to improve performance. For more information see Cacheability and cache allocation hint attributes on page B2-1264.

An implementation might not make any distinction between memory regions with attributes that differ only in their cache allocation hint.

Table A3-5 Memory attribute summary

Memory type        Implementation includes LPAE(a)?    Shareability    Cacheability
Strongly-ordered   -                                   -               -

A3 Application Level Memory Model

A3.5 Memory types and attributes and the memory order model

Non-Confidential ID051414

Memory model and memory ordering on page D15-2595 compares these attributes with the memory attributes in

architecture versions before ARMv6.

A3.5.3 Atomicity in the ARM architecture

Atomicity is a feature of memory accesses, described as atomic accesses. The ARM architecture description refers

to two types of atomicity, defined in:

•Single-copy atomicity

•Multi-copy atomicity on page A3-130.

Single-copy atomicity

A read or write operation is single-copy atomic only if it meets the following conditions:

• For a single-copy atomic store, if the store overlaps another single-copy atomic store, then all of the writes

from one of the stores are inserted into the Coherence order of each overlapping byte before any of the writes

of the other store are inserted into the Coherence order of the overlapping bytes.

• If a single-copy atomic load overlaps a single-copy atomic store and for any of the overlapping bytes the load

returns the data written by the write inserted into the Coherence order of that byte by the single-copy atomic

store then the load must return data from a point in the Coherence order no earlier than the writes inserted

into the Coherence order by the single-copy atomic store of all of the overlapping bytes.

In ARMv7, the single-copy atomic processor accesses are:

• All byte accesses.

• All halfword accesses to halfword-aligned locations.

• All word accesses to word-aligned locations.

• Memory accesses caused by a

LDREXD

STREXD

to a doubleword-aligned location for which the

STREXD

succeeds

cause single-copy atomic updates of the doubleword being accessed.

Note

The way to atomically load two 32-bit quantities is to perform a

LDREXD

STREXD

sequence, reading and writing

the same value, for which the

STREXD

succeeds, and use the read values.

LDM

LDC

LDC2

LDRD

STM

STC

STC2

STRD

PUSH

POP

RFE

SRS

VLDM

VLDR

VSTM

, and

VSTR

instructions are executed as a

sequence of word-aligned word accesses. Each 32-bit word access is guaranteed to be single-copy atomic. The

architecture does not require subsequences of two or more word accesses from the sequence to be single-copy

atomic.

Device Yes Outer Shareable -

No Outer Shareable

Inner Shareable

Non-shareable

Normal - Outer Shareable One of:

• Non-cacheable

• Write-Through Cacheable

• Write-Back Cacheable.

Inner Shareable

Non-shareable

a. LPAE means the Large Physical Address Extension.

Table A3-5 Memory attribute summary (continued)

Memory type Implementation includes LPAEa?Shareability Cacheability

A3 Application Level Memory Model

A3.5 Memory types and attributes and the memory order model

ID051414 Non-Confidential
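A minimal sketch of these ARMv7 rules for a single explicit processor access. The function name and parameters are hypothetical; the Large Physical Address Extension rule for LDRD and STRD as seen by translation table walks is not modeled, and an 8-byte access is treated as atomic only for a successful LDREXD/STREXD pair.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model: is an explicit access of `size` bytes at `addr`
   single-copy atomic in ARMv7? `exclusive_pair_ok` is true only for an
   LDREXD/STREXD pair whose STREXD succeeded. */
bool armv7_single_copy_atomic(uint32_t addr, unsigned size, bool exclusive_pair_ok)
{
    switch (size) {
    case 1:
        return true;                          /* all byte accesses */
    case 2:
        return (addr & 1u) == 0;              /* halfword-aligned halfwords */
    case 4:
        return (addr & 3u) == 0;              /* word-aligned words */
    case 8:
        /* Doubleword-aligned LDREXD/STREXD with a successful STREXD.
           An 8-byte LDRD or VLDR is atomic only per 32-bit word. */
        return exclusive_pair_ok && (addr & 7u) == 0;
    default:
        return false;                         /* larger accesses: per word at best */
    }
}
```

For example, an LDRD from a doubleword-aligned address returns false here: each of its two word accesses is single-copy atomic, but the pair is not.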

In an implementation that includes the Large Physical Address Extension, LDRD and STRD accesses to 64-bit aligned locations are 64-bit single-copy atomic as seen by translation table walks and accesses to translation tables.

Note

The Large Physical Address Extension adds this requirement to remove the need for complex measures to avoid atomicity issues when changing translation table entries, without creating a requirement that all locations in the memory system are 64-bit single-copy atomic. This addition means:

• The system designer must ensure that all writable memory locations that might be used to hold translations, such as bulk SDRAM, can be accessed with 64-bit single-copy atomicity.
• Software must ensure that translation tables are not held in memory locations that cannot meet this atomicity requirement, such as peripherals that are typically accessed using a narrow bus.

This requirement places no burden on read-only memory locations for which reads have no side effects, since it is impossible to detect the size of memory accesses to such locations.
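To see why this requirement matters, consider a Long-descriptor entry rewritten with two separate 32-bit stores. This is a hypothetical sketch: the descriptor values are arbitrary bit patterns, and the low word is assumed to be written first.

```c
#include <stdint.h>

/* Value a translation table walk would read if it observed the entry after
   the low-word store but before the high-word store. The result is neither
   the old nor the new descriptor. */
uint64_t torn_descriptor(uint64_t old_desc, uint64_t new_desc)
{
    uint64_t high = old_desc & 0xFFFFFFFF00000000ull;  /* not yet written */
    uint64_t low  = new_desc & 0x00000000FFFFFFFFull;  /* already written */
    return high | low;
}
```

A single STRD to the 64-bit aligned entry, which is 64-bit single-copy atomic as seen by the walk, leaves no such intermediate value to observe.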

Advanced SIMD element and structure loads and stores are executed as a sequence of accesses of the element or structure size. The architecture requires the element accesses to be single-copy atomic if and only if both:

• the element size is 32 bits, or smaller
• the elements are naturally aligned.

Accesses to 64-bit elements or structures that are at least word-aligned are executed as a sequence of 32-bit accesses, each of which is single-copy atomic. The architecture does not require subsequences of two or more 32-bit accesses from the sequence to be single-copy atomic.
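The Advanced SIMD rules can be summarized as a three-way classification. This is a minimal sketch; the enum and function names are hypothetical, and element_bytes is assumed to be 1, 2, 4, or 8.

```c
#include <stdint.h>

/* Hypothetical classification of one Advanced SIMD element access. */
typedef enum {
    SIMD_ELEMENT_ATOMIC,    /* whole element is single-copy atomic */
    SIMD_TWO_ATOMIC_WORDS,  /* 64-bit element: two atomic 32-bit accesses */
    SIMD_BYTE_ATOMIC_ONLY   /* only byte-level atomicity is guaranteed */
} simd_atomicity;

simd_atomicity simd_element_atomicity(uint32_t addr, unsigned element_bytes)
{
    /* Elements of 32 bits or smaller that are naturally aligned. */
    if (element_bytes >= 1 && element_bytes <= 4 && addr % element_bytes == 0)
        return SIMD_ELEMENT_ATOMIC;
    /* 64-bit elements that are at least word-aligned. */
    if (element_bytes == 8 && (addr & 3u) == 0)
        return SIMD_TWO_ATOMIC_WORDS;
    /* Anything else: a sequence that is single-copy atomic at least per byte. */
    return SIMD_BYTE_ATOMIC_ONLY;
}
```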

When an access is not single-copy atomic by the rules described in this section, it is executed as a sequence of one or more accesses that aggregate to the size of the access. Each of the accesses in this sequence is single-copy atomic, at least at the byte level.

Note

In this section, the terms before the write operation and after the write operation mean before or after the write operation has had its effect on the coherence order of the bytes of the memory location accessed by the write operation.

If, according to these rules, an instruction is executed as a sequence of accesses, some exceptions can be taken during that sequence. Such an exception causes execution of the instruction to be abandoned. These exceptions are:

• Synchronous Data Abort exceptions.
• The following, if low interrupt latency configuration is selected and the accesses are to Normal memory:
  — IRQ interrupts
  — FIQ interrupts
  — asynchronous aborts.

For more information about this configuration, see Low interrupt latency configuration on page B1-1198.

If such an instruction is abandoned as a result of an asynchronous exception, then:

• For a load:
  — Any register being loaded, other than one used in the generation of the address by the instruction, might contain an UNKNOWN value.
  — Registers used in the generation of the address are restored to their initial value.
• For a store, any data location being stored to can contain an UNKNOWN value.

If such an instruction is abandoned as a result of a Synchronous Data Abort exception, see Data Abort exception on page B1-1215.


If any of these exceptions are returned from using their preferred return address, the instruction that generated the sequence of accesses is re-executed and so any access that had been performed before the exception was taken is repeated.

Note

The exception behavior for these multiple access instructions means they are not suitable for use for writes to memory for the purpose of software synchronization.
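The following hypothetical simulation shows how re-execution repeats accesses. It models a store-multiple (such as STM) of n words that is abandoned after `completed` word stores and then re-executed from the start; the function and the per-word write counters are illustrative only.

```c
#include <stddef.h>
#include <stdint.h>

/* dst[i] receives src[i]; write_counts[i] records how many times word i
   was written across the abandoned attempt and the re-execution. */
void stm_with_restart(uint32_t *dst, const uint32_t *src, size_t n,
                      size_t completed, unsigned *write_counts)
{
    for (size_t i = 0; i < n; i++)
        write_counts[i] = 0;

    /* First attempt: an exception abandons the instruction after `completed` stores. */
    for (size_t i = 0; i < completed && i < n; i++) {
        dst[i] = src[i];
        write_counts[i]++;
    }

    /* Returning to the preferred return address re-executes the whole instruction. */
    for (size_t i = 0; i < n; i++) {
        dst[i] = src[i];
        write_counts[i]++;
    }
}
```

With n = 4 and completed = 2, words 0 and 1 are written twice. If another observer wrote word 0 between the two attempts, the repeated store would overwrite that value, which is the hazard the Note above describes.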

For implicit accesses:

• Cache linefills and evictions have no effect on the single-copy atomicity of explicit transactions or instruction fetches.
• Instruction fetches are single-copy atomic:
  — at 32-bit granularity in ARM state
  — at 16-bit granularity in Thumb and ThumbEE states
  — at 8-bit granularity in Jazelle state.
  Concurrent modification and execution of instructions describes additional constraints on the behavior of instruction fetches.
• Translation table walks are performed using accesses that are single-copy atomic:
  — at 32-bit granularity when using Short-descriptor format translation tables
  — at 64-bit granularity when using Long-descriptor format translation tables.
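A minimal lookup of the implicit-access granularities listed above, using hypothetical enum and function names.

```c
/* Hypothetical enums for the execution state and translation table format. */
typedef enum { STATE_ARM, STATE_THUMB, STATE_THUMBEE, STATE_JAZELLE } isa_state;
typedef enum { TABLE_SHORT_DESCRIPTOR, TABLE_LONG_DESCRIPTOR } table_format;

/* Granularity, in bits, at which instruction fetches are single-copy atomic. */
unsigned fetch_atomic_bits(isa_state s)
{
    switch (s) {
    case STATE_ARM:     return 32;
    case STATE_THUMB:
    case STATE_THUMBEE: return 16;
    case STATE_JAZELLE: return 8;
    }
    return 0;  /* not reached for valid states */
}

/* Granularity, in bits, of the single-copy atomic accesses made by table walks. */
unsigned walk_atomic_bits(table_format f)
{
    return f == TABLE_LONG_DESCRIPTOR ? 64 : 32;
}
```

Because Thumb fetches are atomic only per halfword, a 32-bit Thumb instruction might be observed as two separately fetched halfwords, which is consistent with the per-halfword rules given for Thumb in Concurrent modification and execution of instructions.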

Multi-copy atomicity

In a multiprocessing system, writes to a memory location are multi-copy atomic if the following conditions are both true:

• All writes to the same location are serialized, meaning they are observed in the same order by all observers, although some observers might not observe all of the writes.
• A read of a location does not return the value of a write until all observers observe that write.

Writes to Normal memory are not multi-copy atomic.

All writes to Device and Strongly-ordered memory that are single-copy atomic are also multi-copy atomic.

Note

All coherent write accesses to the same location are serialized, regardless of whether or not they are multi-copy atomic. For Normal memory, write accesses can be repeated up to the point that another write to the same address is observed, and serialization does not prohibit the merging of writes.

A3.5.4 Concurrent modification and execution of instructions

The ARMv7 architecture limits the set of instructions that can be executed by one thread of execution as they are being modified by another thread of execution without requiring explicit synchronization.

Except for the instructions identified in this section, the effect of the concurrent modification and execution of an instruction is UNPREDICTABLE.

For the following instructions only, the architecture guarantees that, after modification of the instruction, behavior is consistent with execution of either:

• The instruction originally fetched.
• A fetch of the new instruction. That is, a fetch of the instruction that results from the modification.


The instructions to which this guarantee applies are:

In the Thumb instruction set

The 16-bit encodings of the B, NOP, BKPT, and SVC instructions.

In addition:

• The most-significant halfword of a BL instruction can be concurrently modified to the most-significant halfword of another BL instruction. The most-significant halfword of a BLX instruction can be concurrently modified to the most-significant halfword of another BLX instruction. These cases mean that the most significant bits of the immediate value can be modified.
• The most-significant halfword of a BLX instruction can be concurrently modified to a 16-bit B, BKPT, or SVC instruction.
• The least-significant halfword of a BL instruction can be concurrently modified to the least-significant halfword of another BL instruction. The least-significant halfword of a BLX instruction can be concurrently modified to the least-significant halfword of another BLX instruction. These cases mean that the least significant bits of the immediate value can be modified.
• The least-significant halfword of a 32-bit B immediate instruction:
  — with a condition field can be concurrently modified to the least-significant halfword of another 32-bit B immediate instruction with a condition field
  — without a condition field can be concurrently modified to the least-significant halfword of another 32-bit B immediate instruction without a condition field.
  These cases mean that the least significant bits of the immediate value can be modified.
• A 16-bit B, BKPT, or SVC instruction can be concurrently modified to the most-significant halfword of a BL instruction.

Note

In the Thumb instruction set:
• the only encodings of BKPT and SVC are 16-bit
• the only encoding of BL is 32-bit.

In the ARM instruction set

The B, BL, NOP, BKPT, SVC, HVC, and SMC instructions.
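For a code patcher working in ARM state, the list above can be turned into a simple gate. This is a hypothetical sketch: the function names are invented, and it assumes the old-or-new guarantee requires both the original and the replacement instruction to be in the list. Any other change needs the explicit synchronization described next.

```c
#include <stdbool.h>
#include <string.h>

/* ARM-state instructions covered by the old-or-new guarantee. */
static const char *const arm_patchable[] = {
    "B", "BL", "NOP", "BKPT", "SVC", "HVC", "SMC"
};

static bool in_arm_patchable(const char *mnemonic)
{
    for (size_t i = 0; i < sizeof arm_patchable / sizeof arm_patchable[0]; i++)
        if (strcmp(mnemonic, arm_patchable[i]) == 0)
            return true;
    return false;
}

/* Hypothetical check: true if replacing old_mnemonic with new_mnemonic may be
   done without explicit synchronization (assumes both must be in the list). */
bool arm_patch_without_sync_ok(const char *old_mnemonic, const char *new_mnemonic)
{
    return in_arm_patchable(old_mnemonic) && in_arm_patchable(new_mnemonic);
}
```

Even when this check passes, changing both the condition and the target of a conditional branch has the additional hazard described later in this section.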

For all other instructions, to avoid UNPREDICTABLE behavior, instruction modifications must be explicitly synchronized before they are executed. The required synchronization is as follows:

1. To ensure that the modified instructions are observable, the thread of execution that is modifying the instructions must issue the following sequence of instructions and operations:

    DCCMVAU [instruction location]  ; Clean data cache by MVA to point of unification
    DSB                             ; Ensure visibility of the data cleaned from the cache
    ICIMVAU [instruction location]  ; Invalidate instruction cache by MVA to PoU
    BPIMVAU [instruction location]  ; Invalidate branch predictor by MVA to PoU
    DSB                             ; Ensure completion of the invalidations

2. Once the modified instructions are observable, the thread of execution that is executing the modified instructions must issue the following instructions or operations to ensure execution of the modified instructions:

    ISB                             ; Synchronize fetched instruction stream

Note

Issue C.a of this manual first describes this behavior, but the description applies to all ARMv7 implementations.
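For a patch that spans several cache lines, the per-location operations in step 1 apply to each line. The sketch below builds that sequence as a list of operation names. The function, the fixed line size, and the choice to batch all cleans before one DSB and all invalidations before a second DSB are assumptions; real code would take the minimum line sizes from the Cache Type Register.

```c
#include <stddef.h>
#include <stdint.h>

typedef const char *cache_op;

/* Appends op to ops[] if there is room; always counts it. */
static void emit(cache_op *ops, size_t max_ops, size_t *n, cache_op op)
{
    if (*n < max_ops)
        ops[*n] = op;
    (*n)++;
}

/* Hypothetical model of step 1 for the patched bytes [start, end).
   `line` is an assumed cache line size and must be a power of two.
   Returns the number of operations. The executing thread must still
   issue an ISB (step 2). */
size_t code_patch_sequence(uint32_t start, uint32_t end, uint32_t line,
                           cache_op *ops, size_t max_ops)
{
    size_t n = 0;
    uint32_t first = start & ~(line - 1u);   /* align down to a line boundary */

    for (uint32_t a = first; a < end; a += line)
        emit(ops, max_ops, &n, "DCCMVAU");   /* clean D-cache line to PoU */
    emit(ops, max_ops, &n, "DSB");           /* cleaned data is visible */

    for (uint32_t a = first; a < end; a += line) {
        emit(ops, max_ops, &n, "ICIMVAU");   /* invalidate I-cache line to PoU */
        emit(ops, max_ops, &n, "BPIMVAU");   /* invalidate branch predictor entry */
    }
    emit(ops, max_ops, &n, "DSB");           /* invalidations have completed */
    return n;
}
```

For example, patching bytes 0x8004 to 0x8043 with 32-byte lines touches three lines and produces 11 operations.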


In addition, for both instruction sets, if one thread of execution changes a conditional branch instruction to another conditional branch instruction, and the change affects both the condition field and the branch target, execution of the changed instruction by another thread of execution before the change is synchronized can lead to either:

• the old condition being associated with the new target address
• the new condition being associated with the old target address.

These possibilities apply regardless of whether the condition, either before or after the change to the branch instruction, is the always condition.
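A hypothetical model of these outcomes, representing a conditional branch only by its condition code and target address. It enumerates the distinct (condition, target) pairs another thread might act on before the change is synchronized: the old and new instructions plus the two mixed pairs.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical representation of a conditional branch. */
typedef struct { uint8_t cond; uint32_t target; } cond_branch;

/* Writes the distinct possible views into out[] and returns how many (<= 4). */
size_t possible_branch_views(cond_branch old_b, cond_branch new_b, cond_branch out[4])
{
    const cond_branch cand[4] = {
        { old_b.cond, old_b.target },   /* old instruction */
        { new_b.cond, new_b.target },   /* new instruction */
        { old_b.cond, new_b.target },   /* old condition, new target */
        { new_b.cond, old_b.target },   /* new condition, old target */
    };
    size_t n = 0;
    for (size_t i = 0; i < 4; i++) {
        bool dup = false;
        for (size_t j = 0; j < n; j++)
            if (out[j].cond == cand[i].cond && out[j].target == cand[i].target)
                dup = true;
        if (!dup)
            out[n++] = cand[i];
    }
    return n;
}
```

When only the target changes, the mixed pairs collapse onto the old and new instructions, leaving two possibilities.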

A3.5.5 Normal memory

Accesses to Normal memory regions are idempotent, meaning that they exhibit the following properties:

• read accesses can be repeated with no side-effects
• repeated read accesses return the last value written to the resource being read
• read accesses can fetch additional memory locations with no side-effects
• write accesses can be repeated with no side-effects in the following cases:
  — if the contents of the location accessed are unchanged between the repeated writes
  — as the result of an exception, as described in this section
• unaligned accesses can be supported
• accesses can be merged before accessing the target memory system.

Normal memory can be read/write or read-only, and a Normal memory region is defined as being either Shareable or Non-shareable. For Shareable Normal memory, whether a VMSA implementation distinguishes between Inner Shareable and Outer Shareable is IMPLEMENTATION DEFINED. A PMSA implementation makes no distinction between Inner Shareable and Outer Shareable regions.

The Normal memory type attribute applies to most memory used in a system.

Accesses to Normal memory have a weakly consistent model of memory ordering. See a standard text describing memory ordering issues for a description of weakly consistent memory models, for example chapter 2 of Memory Consistency Models for Shared-Memory Multiprocessors. In general, for Normal memory, barrier operations are required where the order of memory accesses observed by other observers must be controlled. This requirement applies regardless of the cacheability and shareability attributes of the Normal memory region.

The ordering requirements of accesses described in Ordering requirements for memory accesses on page A3-149 apply to all explicit accesses.

An instruction that generates a sequence of accesses as described in Atomicity in the ARM architecture on page A3-128 might be abandoned as a result of an exception being taken during the sequence of accesses. On return from the exception the instruction is restarted, and therefore one or more of the memory locations might be accessed multiple times. This can result in repeated write accesses to a location that has been changed between the write accesses.

The architecture permits speculative accesses to memory locations marked as Normal if the access permissions and domain permit an access to the locations.

A Normal memory region has shareability attributes that define the data coherency properties of the region. These attributes do not affect the coherency requirements of:

• Instruction fetches, see Instruction coherency issues on page A3-158.
• Translation table walks for VMSA implementations of:
  — ARMv7-A without the Multiprocessing Extensions
  — versions of the architecture before ARMv7.
  For more information, see TLB maintenance operations and the memory order model on page B3-1383.

Non-shareable Normal memory

For a Normal memory region, the Non-shareable attribute identifies Normal memory that is likely to be accessed only by a single processor.


A region of Normal memory with the Non-shareable attribute does not have any requirement to make data accesses by different observers coherent, unless the memory is Non-cacheable. If other observers share the memory system, software must use cache maintenance operations if the presence of caches might lead to coherency issues when communicating between the observers. This cache maintenance requirement is in addition to the barrier operations that are required to ensure memory ordering.

For Non-shareable Normal memory, it is IMPLEMENTATION DEFINED whether the Load-Exclusive and Store-Exclusive synchronization primitives take account of the possibility of accesses by more than one observer.

Shareable, Inner Shareable, and Outer Shareable Normal memory

For Normal memory, the Shareable and Outer Shareable memory attributes describe Normal memory that is expected to be accessed by multiple processors or other system masters:

• In a VMSA implementation, Normal memory that has the Shareable attribute but not the Outer Shareable attribute assigned is described as having the Inner Shareable attribute.
• In a PMSA implementation, no distinction is made between Inner Shareable and Outer Shareable Normal memory.

A region of Normal memory with the Shareable attribute is one for which data accesses to memory by different observers within the same shareability domain are coherent.

The Outer Shareable attribute is introduced in ARMv7, and can be applied only to a Normal memory region in a VMSA implementation that has the Shareable attribute assigned. It creates three levels of shareability for a Normal memory region:

Non-shareable: A Normal memory region that does not have the Shareable attribute assigned.

Inner Shareable: A Normal memory region that has the Shareable attribute assigned, but not the Outer Shareable attribute.

Outer Shareable: A Normal memory region that has both the Shareable and the Outer Shareable attributes assigned.
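These three levels follow directly from the two attributes. A minimal sketch, with hypothetical names, for a Normal memory region in a VMSA implementation:

```c
#include <stdbool.h>

typedef enum {
    NON_SHAREABLE,
    INNER_SHAREABLE,
    OUTER_SHAREABLE,
    SHAREABILITY_INVALID   /* Outer Shareable without Shareable */
} normal_shareability;

/* Hypothetical mapping from the Shareable and Outer Shareable attributes
   to the shareability level of a Normal memory region. */
normal_shareability classify_normal(bool shareable, bool outer_shareable)
{
    if (!shareable)
        return outer_shareable ? SHAREABILITY_INVALID : NON_SHAREABLE;
    return outer_shareable ? OUTER_SHAREABLE : INNER_SHAREABLE;
}
```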

These attributes can define sets of observers for which the shareability attributes make the data or unified caches transparent for data accesses. The sets of observers that are affected by the shareability attributes are described as shareability domains. The details of the use of these attributes are system-specific. Example A3-1 shows how they might be used:

Example A3-1 Use of shareability attributes

In a VMSA implementation, a particular subsystem with two clusters of processors has the requirement that:

• in each cluster, the data or unified caches of the processors in the cluster are transparent for all data accesses with the Inner Shareable attribute
• however, between the two clusters, the caches:
  — are not transparent for data accesses that have only the Inner Shareable attribute
  — are transparent for data accesses that have the Outer Shareable attribute.

In this system, each cluster is in a different shareability domain for the Inner Shareable attribute, but all components of the subsystem are in the same shareability domain for the Outer Shareable attribute.

A system might implement two such subsystems. If the data or unified caches of one subsystem are not transparent to the accesses from the other subsystem, this system has two Outer Shareable shareability domains.
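Example A3-1 can be modeled as a small predicate over a hypothetical topology in which each processor is identified by its subsystem and its cluster within that subsystem. The types and names are illustrative, not part of the architecture.

```c
#include <stdbool.h>

/* Hypothetical location of a processor in the Example A3-1 system. */
typedef struct { int subsystem; int cluster; } cpu_location;

typedef enum { ATTR_INNER_SHAREABLE, ATTR_OUTER_SHAREABLE } share_attr;

/* True if caches are transparent between a and b for data accesses with
   the given attribute, that is, a and b share a shareability domain. */
bool same_domain(cpu_location a, cpu_location b, share_attr attr)
{
    if (a.subsystem != b.subsystem)
        return false;               /* two separate Outer Shareable domains */
    if (attr == ATTR_OUTER_SHAREABLE)
        return true;                /* the whole subsystem */
    return a.cluster == b.cluster;  /* Inner Shareable: one cluster */
}
```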

However, for a Normal memory region that is Non-cacheable, as described in Write-Through Cacheable, Write-Back Cacheable and Non-cacheable Normal memory on page A3-134, the only significance of the Shareability attribute is the behavior of Load-Exclusive and Store-Exclusive instructions. For more information about this behavior see Synchronization and semaphores on page A3-114.


Having two levels of shareability attribute means system designers can reduce the performance and power overhead for shared memory regions that do not need to be part of the Outer Shareable shareability domain.

In a VMSA implementation, for Shareable Normal memory, whether there is a distinction between Inner Shareable and Outer Shareable is IMPLEMENTATION DEFINED.

For Shareable Normal memory, the Load-Exclusive and Store-Exclusive synchronization primitives take account of the possibility of accesses by more than one observer in the same Shareability domain.

Note

• System designers can use the Shareable concept to specify the locations in Normal memory that must have coherency requirements. However, to facilitate porting of software, software developers must not assume that specifying a memory region as Non-shareable permits software to make assumptions about the incoherency of memory locations between different processors in a shared memory system. Such assumptions are not portable between different multiprocessing implementations that make use of the Shareable concept. Any multiprocessing implementation might implement caches that, inherently, are shared between different processing elements.
• This architecture is written with an expectation that all processors using the same operating system or hypervisor are in the same Inner Shareable shareability domain.

Write-Through Cacheable, Write-Back Cacheable and Non-cacheable Normal memory

In addition to being Outer Shareable, Inner Shareable or Non-shareable, each region of Normal memory is assigned a cacheability attribute that is one of:

• Write-Through Cacheable
• Write-Back Cacheable
• Non-cacheable.

Also, for cacheable Normal memory regions:

• a region might be assigned a cache allocation hint
• in an ARMv7-A implementation that includes the Large Physical Address Extension, it is IMPLEMENTATION DEFINED whether the Write-Through Cacheable and Write-Back Cacheable attributes can have an additional attribute of Transient or Non-transient, see Transient cacheability attribute, Large Physical Address Extension on page A3-135.

A memory location can be marked as having different cacheability attributes, for example when using aliases in a virtual to physical address mapping:

• if the attributes differ only in the cache allocation hint, this does not affect the behavior of accesses to that location
• for other cases see Mismatched memory attributes on page A3-139.

The cacheability attributes provide a mechanism of coherency control with observers that lie outside the shareability domain of a region of memory. In some cases, the use of Write-Through Cacheable or Non-cacheable regions of memory might provide a better mechanism for controlling coherency than the use of hardware coherency mechanisms or the use of cache maintenance routines. To this end, the architecture requires the following properties for Non-cacheable or Write-Through Cacheable memory:

• a completed write to a memory location that is Non-cacheable or Write-Through Cacheable for a level of cache made by an observer accessing the memory system inside the level of cache is visible to all observers accessing the memory system outside the level of cache without the need for explicit cache maintenance
• a completed write to a memory location that is Non-cacheable for a level of cache made by an observer accessing the memory system outside the level of cache is visible to all observers accessing the memory system inside the level of cache without the need for explicit cache maintenance.


Note

Implementations can use the cache allocation hints to indicate a probable performance benefit of caching. For example, a programmer might know that a piece of memory is not going to be accessed again and would be better treated as Non-cacheable. The distinction between memory regions with attributes that differ only in the cache allocation hints exists only as a hint for performance.

The ARM architecture provides independent cacheability attributes for Normal memory for two conceptual levels of cache, the inner and the outer cache. The relationship between these conceptual levels of cache and the implemented physical levels of cache is IMPLEMENTATION DEFINED, and can differ from the boundaries between the Inner and Outer Shareability domains. However:

• inner refers to the innermost caches, and always includes the lowest level of cache
• no cache controlled by the Inner cacheability attributes can lie outside a cache controlled by the Outer cacheability attributes
• an implementation might not have any outer cache.

Example A3-2, Example A3-3, and Example A3-4 describe the possible ways of implementing a system with three levels of cache, level 1 (L1) to level 3 (L3).

Note

• L1 cache is the level closest to the processor, see Memory hierarchy on page A3-156.
• When managing coherency, system designs must consider both the inner and outer cacheability attributes, as well as the shareability attributes. This is because hardware might have to manage the coherency of caches at one conceptual level, even when another conceptual level has the Non-cacheable attribute.

Example A3-2 Implementation with two inner and one outer cache levels

Implement the three levels of cache in the system, L1 to L3, with:

• the Inner cacheability attribute applied to L1 and L2 cache
• the Outer cacheability attribute applied to L3 cache.

Example A3-3 Implementation with three inner and no outer cache levels

Implement the three levels of cache in the system, L1 to L3, with the Inner cacheability attribute applied to L1, L2, and L3 cache. Do not use the Outer cacheability attribute.

Example A3-4 Implementation with one inner and two outer cache levels

Implement the three levels of cache in the system, L1 to L3, with:

• the Inner cacheability attribute applied to L1 cache
• the Outer cacheability attribute applied to L2 and L3 cache.
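The three rules about inner and outer caches constrain which assignments like those in Examples A3-2 to A3-4 are possible. A minimal sketch, with hypothetical names, where levels[0] is L1:

```c
#include <stdbool.h>
#include <stddef.h>

typedef enum { LEVEL_INNER, LEVEL_OUTER } cache_level_attr;

/* Hypothetical check: Inner must include L1, and no Inner-controlled cache
   may lie outside an Outer-controlled one. An all-Inner system is allowed. */
bool cache_split_valid(const cache_level_attr *levels, size_t n)
{
    if (n == 0)
        return true;                 /* no caches: nothing to check */
    if (levels[0] != LEVEL_INNER)
        return false;                /* inner always includes the lowest level */
    bool seen_outer = false;
    for (size_t i = 1; i < n; i++) {
        if (levels[i] == LEVEL_OUTER)
            seen_outer = true;
        else if (seen_outer)
            return false;            /* Inner cache outside an Outer cache */
    }
    return true;
}
```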

Transient cacheability attribute, Large Physical Address Extension

For an ARMv7-A implementation that includes the Large Physical Address Extension, it is IMPLEMENTATION DEFINED whether a Transient attribute is supported for cacheable Normal memory regions. If an implementation supports this attribute, the set of possible cacheability attributes for a Normal memory region becomes:

• Write-Through Cacheable, Non-transient


• Write-Back Cacheable, Non-transient
• Write-Through Cacheable, Transient
• Write-Back Cacheable, Transient
• Non-cacheable.

The cacheability attribute can be defined independently for the inner and outer levels of caching.

The Transient attribute indicates that the benefit of caching is for a relatively short period, and that therefore it might be better to restrict allocation, to avoid possibly casting out other, less transient, entries.

Note

The architecture does not specify what is meant by a relatively short period.

The description of the MAIRn registers includes the assignment of the Transient attribute in an implementation that supports this option.

A3.5.6 Device and Strongly-ordered memory

The Device and Strongly-ordered memory type attributes define memory locations where an access to the location can cause side-effects, or where the value returned for a load can vary depending on the number of loads performed. In ARMv7, Device and Strongly-ordered memory differ only in their shareability options, as this section describes.

Note

See Ordering of instructions that change the CPSR interrupt masks on page D12-2508 for additional requirements that apply to accesses to Strongly-ordered memory in ARMv6.

Examples of memory regions normally marked as being Device or Strongly-ordered memory are memory-mapped peripherals and I/O locations.

For explicit accesses from the processor to memory marked as Device or Strongly-ordered:

• all accesses occur at their program size
• the number of accesses is the number specified by the program.

An implementation must not perform more accesses to a Device or Strongly-ordered memory location than are specified by a simple sequential execution of the program, except as a result of an exception. This section describes this permitted effect of an exception.

The architecture does not permit speculative data accesses to memory marked as Device or Strongly-ordered. However, it does not prohibit speculative translation table walks to Device or Strongly-ordered memory.

Note

• For an implementation that includes the Virtualization Extensions, for accesses from an application running in Non-secure state, a speculative translation table walk to Device or Strongly-ordered memory might result from the second stage of address translation defined by a hypervisor. For more information, see Overlaying the memory type attribute on page B3-1376.
• For information about restrictions on speculative instruction fetching see:
  — Execute-never restrictions on instruction fetching on page B3-1359 for a VMSA implementation
  — The XN (Execute-never) attribute and instruction fetching on page B5-1761 for a PMSA implementation.

The architecture permits an Advanced SIMD element or structure load instruction to access bytes in Device or Strongly-ordered memory that are not explicitly accessed by the instruction, provided the bytes accessed are in a 16-byte window, aligned to 16 bytes, that contains at least one byte that is explicitly accessed by the instruction.
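The 16-byte window rule reduces to an overlap test between the explicitly accessed bytes and the aligned window containing the extra byte. A minimal sketch; the function name and the inclusive [lo, hi] range convention are assumptions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical check: may an Advanced SIMD load whose explicit accesses
   cover bytes [lo, hi] (inclusive) also touch byte b of Device or
   Strongly-ordered memory? */
bool simd_extra_byte_permitted(uint32_t lo, uint32_t hi, uint32_t b)
{
    uint32_t win = b & ~15u;        /* start of b's 16-byte aligned window */
    uint32_t win_end = win + 15u;   /* last byte of that window */
    /* Permitted if the window contains at least one explicitly accessed byte. */
    return lo <= win_end && hi >= win;
}
```

For an explicit access to bytes 0x1004 to 0x1007, bytes 0x1000 to 0x100F may be touched, but byte 0x1010 may not.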

Address locations marked as Device or Strongly-ordered are never held in a cache.


Address locations marked as Strongly-ordered, and on an implementation that includes the Large Physical Address Extension, address locations marked as Device, are always treated as Shareable. For more information about the effect of the Large Physical Address Extension on the shareability of these locations see Device and Strongly-ordered memory shareability, Large Physical Address Extension on page A3-138.

On an implementation that does not include the Large Physical Address Extension, the shareability of an address location marked as Device is configurable, as described in Shareable attribute for Device memory regions.

All explicit accesses to Device or Strongly-ordered memory must comply with the ordering requirements of accesses described in Ordering requirements for memory accesses on page A3-149. On an implementation that does not include the Large Physical Address Extension, the requirements for Device memory depend on the shareability of the Device memory locations.

An instruction that generates a sequence of accesses as described in Atomicity in the ARM architecture on page A3-128 might be abandoned as a result of an exception being taken during the sequence of accesses. On return from the exception the instruction is restarted, and therefore one or more of the memory locations might be accessed multiple times. This can result in repeated write accesses to a location that has been changed between the write accesses.

Note

Software must not use an instruction that generates a sequence of accesses to access Device or Strongly-ordered memory if the instruction might generate a synchronous Data Abort exception on any access other than the first one.

The only architecturally-required difference between Device and Strongly-ordered memory is that:

• a write to Strongly-ordered memory can complete only when it reaches the peripheral or memory component accessed by the write
• a write to Device memory is permitted to complete before it reaches the peripheral or memory component accessed by the write.

Note

In addition, as described in Shareable attribute for Device memory regions, in an implementation that does not include the Large Physical Address Extension, Device memory has Shareability attributes, the interpretation of which is IMPLEMENTATION DEFINED, and might mean a Device memory region is not shareable.

The architecture does not permit unaligned accesses to Strongly-ordered or Device memory. Memory access restrictions on page A3-138 summarizes the behavior of such accesses.

Shareable attribute for Device memory regions

In an implementation that does not include the Large Physical Address Extension, Device memory regions can be given the Shareable attribute. When a Device memory region is given the Shareable attribute it can also be given the Outer Shareable attribute. This means that a region of Device memory can be described as one of:
• Outer Shareable Device memory
• Inner Shareable Device memory
• Non-shareable Device memory.

Some implementations make no distinction between Outer Shareable Device memory and Inner Shareable Device memory, and refer to both memory types as Shareable Device memory.

Some implementations make no distinction between Shareable Device memory and Non-shareable Device memory, and refer to both memory types as Shareable Device memory.

For Device memory regions, the significance of shareability is IMPLEMENTATION DEFINED. However, an example of a system supporting Shareable and Non-shareable Device memory is an implementation that supports both:
• a local bus for its private peripherals
• system peripherals implemented on the main shared system bus.

A3 Application Level Memory Model

A3.5 Memory types and attributes and the memory order model

Non-Confidential ID051414

Such a system might have more predictable access times for local peripherals such as watchdog timers or interrupt controllers. In particular, a specific address in a Non-shareable Device memory region might access a different physical peripheral for each processor.

ARM deprecates the marking of Device memory with a shareability attribute other than Outer Shareable or Shareable. This means ARM strongly recommends that Device memory is never assigned a shareability attribute of Non-shareable or Inner Shareable.

Device and Strongly-ordered memory shareability, Large Physical Address Extension

In an implementation that includes the Large Physical Address Extension, the Long-descriptor translation table format does not distinguish between Shareable and Non-shareable Device memory.

In an implementation that includes the Large Physical Address Extension and is using the Short-descriptor translation table format:
• An address-based cache maintenance operation for an address in a region with the Strongly-ordered or Device memory type applies to all processors in the same Outer Shareable domain, regardless of any shareability attributes applied to the region.
• Device memory transactions to a single peripheral must not be reordered, regardless of any shareability attributes that are applied to the corresponding Device memory region.

Any single peripheral has an IMPLEMENTATION DEFINED size of not less than 1KB.

A3.5.7 Memory access restrictions

The following restrictions apply to memory accesses:

• For accesses to any two bytes, p and q, that are generated by the same instruction:
— The bytes p and q must have the same memory type and shareability attributes, otherwise the results are UNPREDICTABLE. For example, an LDC, LDM, LDRD, STC, STM, STRD, or unaligned load or store that spans a boundary between Normal and Device memory is UNPREDICTABLE.
— Except for possible differences in the cache allocation hints, ARM deprecates having different cacheability attributes for the bytes p and q.

• Unaligned data access on page A3-108 identifies the instructions that can make an unaligned memory access, and the required configuration setting. If such an access is to Device or Strongly-ordered memory then:
— if the implementation does not include the Virtualization Extensions, the effect is UNPREDICTABLE
— if the implementation includes the Virtualization Extensions, the access generates an Alignment fault.

• The accesses of an instruction that causes multiple accesses to Device or Strongly-ordered memory must not cross a 4KB address boundary, otherwise the effect is UNPREDICTABLE. For this reason, it is important that an access to a volatile memory device is not made using a single instruction that crosses a 4KB address boundary.

Note

This situation is UNPREDICTABLE even if the cause of the accesses is an unaligned access to Device or Strongly-ordered memory in an implementation that includes the Virtualization Extensions.

ARM expects this restriction to impose constraints on the placing of volatile memory devices in the memory map of a system, rather than expecting a compiler to be aware of the alignment of memory accesses.

• For any instruction that generates accesses to Device or Strongly-ordered memory, implementations must not change the sequence of accesses specified by the pseudocode of the instruction. This includes not changing:
— how many accesses there are
— the time order of the accesses at any particular memory-mapped peripheral
— the data size and other properties of each access.


In addition, processor implementations expect any attached memory system to be able to identify the memory type of accesses, and to obey similar restrictions with regard to the number, time order, data sizes and other properties of the accesses.

Exceptions to this rule are:
— An implementation of a processor can break this rule, provided that the original number, time order, and other details of the accesses can be reconstructed from the information it supplies to the memory system. In addition, the implementation must place a requirement on attached memory systems to do this reconstruction when the accesses are to Device or Strongly-ordered memory.
For example, an implementation with a 64-bit bus might pair the word loads generated by an LDM into 64-bit accesses. This is because the instruction semantics ensure that the 64-bit access is always a word load from the lower address followed by a word load from the higher address. However, the implementation must permit the memory systems to unpack the two word loads when the access is to Device or Strongly-ordered memory.

— An Advanced SIMD element or structure load instruction can access bytes in Device or Strongly-ordered memory that are not explicitly accessed by the instruction, provided the bytes accessed are within a 16-byte window, aligned to 16 bytes, that contains at least one byte that is explicitly accessed by the instruction.
— There is no requirement for the memory system to be able to identify the size of the elements accessed by an Advanced SIMD element or structure load/store instruction.

• In a PMSA implementation, and in a VMSA implementation when any associated MMU is enabled, any multi-access instruction that loads or stores the PC must access only Normal memory. If the instruction accesses Device or Strongly-ordered memory the result is UNPREDICTABLE.
• Any instruction fetch must access only Normal memory. If it accesses Device or Strongly-ordered memory, the result is UNPREDICTABLE.
• If a single physical memory location has more than one set of attributes assigned to it, ARM strongly recommends that software ensures that the sets of attributes are identical. For more information see Mismatched memory attributes.
An example of where multiple sets of attributes might be assigned to the same physical memory location is the use of aliases in a virtual to physical address mapping.
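Two of the restrictions above reduce to simple address arithmetic. The minimal C sketch below uses hypothetical helper names. The first helper checks whether the bytes accessed by a single instruction cross a 4KB address boundary. The second checks whether a byte that an Advanced SIMD element or structure load touches without explicitly accessing it lies in the same 16-byte-aligned window as a byte that the load does access explicitly. The sketch assumes 32-bit addresses, a length of at least one byte, and a range that does not wrap past 0xFFFFFFFF.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: true if the byte range [addr, addr + len) touches
 * more than one 4KB-aligned block. Assumes len >= 1 and no address wrap. */
static bool crosses_4kb_boundary(uint32_t addr, uint32_t len)
{
    uint32_t first_block = addr >> 12;            /* 4KB block of first byte */
    uint32_t last_block = (addr + len - 1u) >> 12; /* 4KB block of last byte */
    return first_block != last_block;
}

/* Hypothetical helper: true if `other_byte` lies in the same 16-byte window,
 * aligned to 16 bytes, as `explicit_byte`, a byte the instruction explicitly
 * accesses. */
static bool in_same_16b_window(uint32_t explicit_byte, uint32_t other_byte)
{
    return (explicit_byte & ~0xFu) == (other_byte & ~0xFu);
}
```

For example, an 8-byte access starting at 0x1FFC covers 0x1FFC to 0x2003 and so crosses a 4KB boundary, while the same access starting at 0x1FF8 does not.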

Mismatched memory attributes

A physical memory location is accessed with mismatched attributes if the accesses to the location do not all use a common definition of all of the following attributes of that location:
• memory type: Strongly-ordered, Device, or Normal
• shareability
• cacheability, for both the inner and outer levels of cache, but excluding any cache allocation hints.
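The definition above compares three attributes and deliberately ignores cache allocation hints. A minimal C sketch of that comparison follows. The enum and struct encodings are hypothetical and are not the descriptor encodings defined in the system level part of this manual. The sketch also assumes that Strongly-ordered and Device entries are stored with both cacheability fields set to CA_NON_CACHEABLE, so that those fields cannot produce spurious mismatches.

```c
#include <stdbool.h>

/* Hypothetical encodings of the attributes listed in the definition. */
typedef enum { MT_STRONGLY_ORDERED, MT_DEVICE, MT_NORMAL } mem_type;
typedef enum { SH_NON, SH_INNER, SH_OUTER } shareability;
typedef enum { CA_NON_CACHEABLE, CA_WRITE_THROUGH, CA_WRITE_BACK } cacheability;

typedef struct {
    mem_type type;
    shareability share;
    cacheability inner;    /* inner cacheability */
    cacheability outer;    /* outer cacheability */
    unsigned alloc_hints;  /* cache allocation hints: not compared */
} mem_attrs;

/* True if two aliases of the same location use mismatched attributes. */
static bool attrs_mismatched(const mem_attrs *a, const mem_attrs *b)
{
    return a->type != b->type ||
           a->share != b->share ||
           a->inner != b->inner ||
           a->outer != b->outer;
}
```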

The following rules apply when a physical memory location is accessed with mismatched attributes:

1. When a memory location is accessed with mismatched attributes the only software visible effects are one or more of the following:
• Uniprocessor semantics for reads and writes to that memory location might be lost. This means:
— a read of the memory location by a thread of execution might not return the value most recently written to that memory location by that thread of execution
— multiple writes to the memory location by a thread of execution, that use different memory attributes, might not be ordered in program order.
• There might be a loss of coherency when multiple threads of execution attempt to access a memory location.
• There might be a loss of properties derived from the memory type, see rule 2.
• If multiple threads of execution attempt to use Load-Exclusive or Store-Exclusive instructions to access a location with different memory attributes, the exclusive monitor state becomes UNKNOWN.


2. The loss of properties associated with mismatched memory type attributes refers only to the following properties of Strongly-ordered or Device memory, that are additional to the properties of Normal memory:
• prohibition of speculative accesses
• preservation of the size of accesses
• preservation of the order of accesses
• the guarantee that the write acknowledgement comes from the endpoint of the access.
If the only memory type mismatch is between Strongly-ordered and Device memory, then the only property that can be lost is:
• the guarantee that the write acknowledgement comes from the endpoint of the access.

3. If all aliases of a memory location that permit write access to the location assign the same shareability and cacheability attributes to that location, and all these aliases use a definition of the shareability attribute that includes all the threads of execution that can access the location, then any thread of execution that reads the memory location using these shareability and cacheability attributes accesses it coherently, to the extent required by that common definition of the memory attributes.

4. The possible loss of properties caused by mismatched attributes for a memory location is defined more precisely if all of the mismatched attributes define the memory location as one of:
• Strongly-ordered memory
• Device memory
• Normal Inner Non-cacheable, Outer Non-cacheable memory.
In these cases, the only possible software-visible effects of the mismatched attributes are one or more of:
• possible loss of properties derived from the memory type when multiple threads of execution attempt to access the memory location
• possible re-ordering of memory transactions to the memory location that use different memory attributes, potentially leading to a loss of coherency or uniprocessor semantics. Any possible loss of coherency or uniprocessor semantics can be avoided by inserting DMB barrier instructions between accesses to the same memory location that might use different attributes.

5. If the mismatched attributes for a memory location all assign the same shareability attribute to the location, any loss of coherency within a shareability domain can be avoided. To do so, software must use the techniques that are required for the software management of the coherency of cacheable locations between threads of execution in different shareability domains. This means:
• If any thread of execution might have written to the location with the write-back attribute, before writing to the location not using the write-back attribute, a thread of execution must invalidate, or clean, the location from the caches. This avoids the possibility of overwriting the location with stale data.
• After writing to the location with the write-back attribute, a thread of execution must clean the location from the caches, to make the write visible to external memory.
• Before reading the location with a cacheable attribute, a thread of execution must invalidate the location from the caches, to ensure that any value held in the caches reflects the last value made visible in external memory.
In all cases:
• location refers to any byte within the current coherency granule
• a clean and invalidate operation can be used instead of a clean operation, or instead of an invalidate operation
• to ensure coherency, all cache maintenance and memory transactions must be completed, or ordered by the use of barrier operations.

Note

With software management of coherency, race conditions can cause loss of data. A race condition occurs when different threads of execution write simultaneously to bytes that are in the same location, and the (invalidate or clean), write, clean sequence of one thread overlaps the equivalent sequence of another thread.


6. If the mismatched attributes for a location mean that multiple cacheable accesses to the location might be made with different shareability attributes, then coherency is guaranteed only if each thread of execution that accesses the location with a cacheable attribute performs a clean and invalidate of the location.

Note

The Note in rule 5, about possible race conditions, also applies to this rule.

ARM strongly recommends that software does not use mismatched attributes for aliases of the same location. An implementation might not optimize the performance of a system that uses mismatched aliases.
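Rule 2 can be restated as a small function that returns which of the additional Strongly-ordered and Device properties might be lost for a pair of aliases. The bit-flag and function names in this C sketch are hypothetical. The sketch looks only at the memory type attribute. It therefore reports no lost type-derived properties when the types match, although mismatched shareability or cacheability can still have the other effects listed in rule 1.

```c
/* Hypothetical flags for the properties listed in rule 2. */
enum {
    PROP_NO_SPECULATION = 1u << 0, /* prohibition of speculative accesses */
    PROP_ACCESS_SIZE    = 1u << 1, /* preservation of the size of accesses */
    PROP_ACCESS_ORDER   = 1u << 2, /* preservation of the order of accesses */
    PROP_ENDPOINT_ACK   = 1u << 3  /* write acknowledgement from the endpoint */
};

typedef enum { MT_STRONGLY_ORDERED, MT_DEVICE, MT_NORMAL } mem_type;

/* Properties that might be lost when two aliases use memory types a and b. */
static unsigned props_possibly_lost(mem_type a, mem_type b)
{
    if (a == b)
        return 0u;                    /* no memory type mismatch */
    if (a != MT_NORMAL && b != MT_NORMAL)
        return PROP_ENDPOINT_ACK;     /* Strongly-ordered versus Device only */
    return PROP_NO_SPECULATION | PROP_ACCESS_SIZE |
           PROP_ACCESS_ORDER | PROP_ENDPOINT_ACK; /* Normal versus either */
}
```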

A3.5.8 The effect of the Security Extensions

The Security Extensions can be included as part of an ARMv7-A implementation, with a VMSA. They provide two distinct 4GByte virtual memory spaces:
• a Secure virtual memory space
• a Non-secure virtual memory space.

The Secure virtual memory space is accessed by memory accesses in the Secure state, and the Non-secure virtual memory space is accessed by memory accesses in the Non-secure state.

By providing different virtual memory spaces, the Security Extensions permit memory accesses made from the Non-secure state to be distinguished from those made from the Secure state.


A3.6 Access rights

ARMv7 defines additional memory region attributes, that define access permissions that can:
• Restrict data accesses, based on the privilege level of the access. See Privilege level access controls for data accesses on page A3-143.
• Restrict instruction fetches, based on the privilege level of the process or thread making the fetch. See Privilege level access controls for instruction accesses on page A3-143.
• On a system that implements the Security Extensions, restrict accesses so that only memory accesses with the Secure memory attribute are permitted. See Memory region security status on page A3-144.

These attributes are defined:
• In a VMSA implementation, in the MMU, see Memory access control on page B3-1356, Memory region attributes on page B3-1366, and The effects of disabling MMUs on VMSA behavior on page B3-1314.
• In a PMSA implementation, in the MPU, see Memory access control on page B5-1761 and Memory region attributes on page B5-1762.

A3.6.1 Processor privilege levels, execution privilege, and access privilege

As introduced in About the Application level programmers’ model on page A2-38, within a security state, the ARMv7 architecture defines different levels of execution privilege:
• in Secure state, the privilege levels are PL1 and PL0
• in Non-secure state, the privilege levels are PL2, PL1, and PL0.

PL0 indicates unprivileged execution in the current security state.

The current processor mode determines the execution privilege level, and therefore the execution privilege level can be described as the processor privilege level.

Every memory access has an access privilege, that is either unprivileged or privileged.

The characteristics of the privilege levels are:

PL0 The privilege level of application software, that executes in User mode. Therefore, software executed in User mode is described as unprivileged software. This software cannot access some features of the architecture. In particular, it cannot change many of the configuration settings.
Software executing at PL0 makes only unprivileged memory accesses.

PL1 Software execution in all modes other than User mode and Hyp mode is at PL1. Normally, operating system software executes at PL1. Software executing at PL1 can access all features of the architecture, and can change the configuration settings for those features, except for some features added by the Virtualization Extensions that are only accessible at PL2.

Note

In many implementation models, system software is unaware of the PL2 level of privilege, and of whether the implementation includes the Virtualization Extensions.

The term PL1 modes refers to all the modes other than User mode and Hyp mode.

Software executing at PL1 makes privileged memory accesses by default, but can also make unprivileged accesses.

PL2 Software executing in Hyp mode executes at PL2.
Software executing at PL2 can perform all of the operations accessible at PL1, and can access some additional functionality.
Hyp mode is normally used by a hypervisor that controls, and can switch between, Guest OSs that execute at PL1.


Hyp mode is implemented only as part of the Virtualization Extensions, and only in Non-secure state. This means that:
• implementations that do not include the Virtualization Extensions have only two privilege levels, PL0 and PL1
• execution in Secure state has only two privilege levels, PL0 and PL1.
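As a minimal sketch of how the processor mode determines the privilege level, the C function below decodes a mode field value. The mode encodings are those of the CPSR.M field defined in the system level part of this manual: User 0x10, FIQ 0x11, IRQ 0x12, Supervisor 0x13, Monitor 0x16, Abort 0x17, Hyp 0x1A, Undefined 0x1B, System 0x1F. The function name and the -1 result for unrecognized encodings are assumptions of this sketch. It does not check whether Hyp mode (Virtualization Extensions) or Monitor mode (Security Extensions) is actually implemented.

```c
#include <stdint.h>

/* Returns 0, 1 or 2 for an ARMv7 mode encoding, or -1 if the encoding is
 * not one this sketch recognizes. */
static int privilege_level(uint32_t mode)
{
    switch (mode & 0x1Fu) {
    case 0x10: return 0;  /* User: unprivileged */
    case 0x1A: return 2;  /* Hyp: Non-secure state only */
    case 0x11:            /* FIQ */
    case 0x12:            /* IRQ */
    case 0x13:            /* Supervisor */
    case 0x16:            /* Monitor */
    case 0x17:            /* Abort */
    case 0x1B:            /* Undefined */
    case 0x1F:            /* System */
        return 1;         /* all modes other than User and Hyp */
    default:
        return -1;
    }
}
```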

In an implementation that includes the Security Extensions, the execution privilege levels are defined independently in each security state, and there is no relationship between the Secure and Non-secure privilege levels.

Note

The fact that Non-secure Hyp mode executes at PL2 does not indicate that it is more privileged than the Secure PL1 modes. Secure PL1 modes can change the configuration and control settings for Non-secure operation in all modes, but Non-secure modes can never change the configuration and control settings for Secure operation.

Memory access permissions can be assigned:
• at PL1, for accesses made at PL1 and at PL0
• in Non-secure state, at PL2, independently for:
— Non-secure accesses made at PL2
— Non-secure accesses made at PL1, and at PL0.

A3.6.2 Privilege level access controls for data accesses

The memory access permissions assigned at PL1 can define that a memory region is:
• Not accessible to any accesses.
• Accessible only to accesses at PL1.
• Accessible to accesses at any level of privilege.

In Non-secure state, separate memory access permissions can be assigned at PL2 for:
• Accesses made at PL1 and PL0.
• Accesses made at PL2.

The access privilege level is defined separately for explicit read and explicit write accesses. However, a system that specifies the memory attributes is not required to support all combinations of memory attributes for read and write accesses.

A privileged memory access is an access made during execution at PL1 or higher, as a result of a load or store operation other than LDRT, STRT, LDRBT, STRBT, LDRHT, STRHT, LDRSHT, or LDRSBT.

An unprivileged memory access is an access made as a result of a load or store operation performed in one of these cases:
• When the processor is at PL0.
• When the processor is at PL1, and the access is made as a result of an LDRT, STRT, LDRBT, STRBT, LDRHT, STRHT, LDRSHT, or LDRSBT instruction.

A Data Abort exception is generated if the processor attempts a data access that the access rights do not permit. For example, a Data Abort exception is generated if the processor is at PL0 and attempts to access a memory region that is marked as only accessible to privileged memory accesses.
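The data access rules of this section can be sketched in C as follows. The permission enumeration and the function names are hypothetical. The is_t_variant flag is also hypothetical: it is true for LDRT, STRT, LDRBT, STRBT, LDRHT, STRHT, LDRSHT, and LDRSBT. The sketch covers only PL0 and PL1 with permissions assigned at PL1. It models a single access direction, whereas the architecture defines read and write permissions separately.

```c
#include <stdbool.h>

/* Hypothetical encoding of the three PL1-assigned permission choices. */
typedef enum { AP_NO_ACCESS, AP_PL1_ONLY, AP_ANY } data_perm;

/* Sketch covers pl == 0 or pl == 1 only. An access is unprivileged at PL0,
 * or at PL1 when it is made by one of the LDRT/STRT family instructions. */
static bool access_is_unprivileged(int pl, bool is_t_variant)
{
    return pl == 0 || is_t_variant;
}

/* Returns false where the text says a Data Abort exception is generated. */
static bool data_access_permitted(data_perm perm, int pl, bool is_t_variant)
{
    switch (perm) {
    case AP_ANY:
        return true;
    case AP_PL1_ONLY:
        return !access_is_unprivileged(pl, is_t_variant);
    case AP_NO_ACCESS:
    default:
        return false;
    }
}
```

The last test case below shows why the LDRT-style instructions exist: PL1 software can use them to check that a buffer passed by an application is accessible to the application itself.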

A3.6.3 Privilege level access controls for instruction accesses

The memory access permissions assigned at PL1 can define that a memory region is:
• Not accessible for execution.
• Not accessible for execution at PL1. Only implementations that include the Large Physical Address Extension support this attribute.


• Accessible for execution only at PL1.
• Accessible for execution at any level of privilege.

In Non-secure state, in an implementation that includes the Virtualization Extensions, separate memory access permissions can be assigned at PL2 for:
• Accesses made at PL1 and PL0.
• Accesses made at PL2.

To define the instruction access rights to a memory region, the memory attributes describe, separately, for the region:
• Its read access rights. These are equivalent to the read access rights described in Privilege level access controls for data accesses on page A3-143.
• Whether software can be executed from the region. This is indicated by whether or not an Execute-never (XN) attribute is assigned to the region.
• For an implementation that includes the Large Physical Address Extension, whether software can be executed at PL1 from the region. This is indicated by whether or not a Privileged execute-never (PXN) attribute is assigned to the region.

This means there is a linkage between the memory attributes that define the accessibility of a region to data accesses, and those that define whether instructions can be executed from the region. For example, a region that is accessible for execution only at PL1 or higher:
• Has the memory attribute indicating that it is accessible only to read accesses at PL1 or higher.
• Does not have the Execute-never attribute.
• If the implementation includes the Large Physical Address Extension, does not have the Privileged execute-never attribute.

Any attempt to execute an instruction from a memory location with an applicable execute-never attribute generates a memory fault.
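The linkage between read permission, XN, and PXN described above can be sketched in C. The region_attrs structure and the can_execute function are hypothetical. The sketch handles only PL0 and PL1 fetches. On an implementation without the Large Physical Address Extension, pxn should be left false.

```c
#include <stdbool.h>

/* Hypothetical summary of the attributes that govern instruction fetch. */
typedef struct {
    bool read_pl0;  /* readable by unprivileged accesses */
    bool read_pl1;  /* readable by accesses at PL1 */
    bool xn;        /* Execute-never */
    bool pxn;       /* Privileged execute-never (LPAE only) */
} region_attrs;

/* Sketch covers pl == 0 or pl == 1: execution needs read permission at
 * that level, and no applicable execute-never attribute. */
static bool can_execute(const region_attrs *r, int pl)
{
    bool readable = (pl == 0) ? r->read_pl0 : r->read_pl1;
    if (!readable || r->xn)
        return false;
    if (pl == 1 && r->pxn)
        return false;   /* PXN applies only to execution at PL1 */
    return true;
}
```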

A3.6.4 Memory region security status

If an implementation includes the Security Extensions, an additional memory attribute determines whether the memory region is Secure or Non-secure. Such an implementation checks this attribute, to ensure that a region of memory that the system designates as Secure is not accessed by memory accesses with the Non-secure memory attribute. For more information, see Memory region attributes on page B3-1366.


A3.7 Virtual and physical addressing

ARMv7 provides three alternative architectural profiles, ARMv7-A, ARMv7-R and ARMv7-M. Each of the profiles specifies a different memory system. This manual describes two of these profiles:

ARMv7-A profile
The ARMv7-A memory system incorporates a Memory Management Unit (MMU), controlled by CP15 registers. The memory system supports virtual addressing, with the MMU performing virtual to physical address translation, in hardware, as part of program execution.
An ARMv7-A processor that implements the Virtualization Extensions provides two stages of address translation for processes running at the Application level:
• The operating system defines the mappings from virtual addresses to intermediate physical addresses (IPAs). When it does this, it believes it is mapping virtual addresses to physical addresses.
• The hypervisor defines the mappings from IPAs to physical addresses. These translations are invisible to the operating system.
For more information see About address translation on page B3-1311.

ARMv7-R profile
The ARMv7-R memory system incorporates a Memory Protection Unit (MPU), controlled by CP15 registers. The MPU does not support virtual addressing.
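The two stages of address translation described for the ARMv7-A profile can be pictured with a toy model. In the C sketch below, a stage 1 table owned by the operating system maps virtual pages to IPA pages, and a stage 2 table owned by the hypervisor maps IPA pages to physical pages. The flat single-level tables, the 16-page address spaces, and all names are hypothetical simplifications. They are not the VMSA translation table formats described in Chapter B3. The 4KB page size matches the smallest page size those formats support.

```c
#include <stdint.h>

#define PAGE_SHIFT 12u                        /* 4KB pages */
#define PAGE_MASK  ((1u << PAGE_SHIFT) - 1u)
#define NUM_PAGES  16u                        /* toy address spaces */
#define UNMAPPED   UINT32_MAX                 /* marks a missing mapping */

/* Stage 1 (guest OS): VA page -> IPA page.
 * Stage 2 (hypervisor): IPA page -> PA page. */
static uint32_t stage1[NUM_PAGES];
static uint32_t stage2[NUM_PAGES];

/* Returns the physical address for va, or UNMAPPED on a translation fault. */
static uint32_t translate(uint32_t va)
{
    uint32_t vpage = va >> PAGE_SHIFT;
    if (vpage >= NUM_PAGES || stage1[vpage] == UNMAPPED)
        return UNMAPPED;                      /* stage 1 fault */
    uint32_t ipage = stage1[vpage];
    if (ipage >= NUM_PAGES || stage2[ipage] == UNMAPPED)
        return UNMAPPED;                      /* stage 2 fault */
    return (stage2[ipage] << PAGE_SHIFT) | (va & PAGE_MASK);
}
```

The guest OS only ever writes stage1, and the page offset passes through both stages unchanged. This is why the second stage can be invisible to the operating system.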

At the Application level, the difference between the ARMv7-A and ARMv7-R memory systems is transparent. Regardless of which profile is implemented, an application accesses the memory map described in Address space on page A3-106, and the implemented memory system makes the features described in this chapter available to the application.

For a system level description of the ARMv7-A and ARMv7-R memory models see:
• Chapter B2 Common Memory System Architecture Features
• Chapter B3 Virtual Memory System Architecture (VMSA)
• Chapter B5 Protected Memory System Architecture (PMSA).

Note

This manual does not describe the ARMv7-M profile. For details of this profile see the ARMv7-M Architecture Reference Manual.


A3.8 Memory access order

ARMv7 provides a set of three memory types, Normal, Device, and Strongly-ordered, with well-defined memory access properties.

The ARMv7 application level view of the memory attributes is described in:
• Memory types and attributes and the memory order model on page A3-126
• Access rights on page A3-142.

When considering memory access ordering, an important feature of the ARMv7 memory model is the Shareable memory attribute, that indicates whether a region of memory appears coherent for data accesses made by multiple observers.

The key issues with the memory order model depend on the target audience:
• For software programmers, considering the model at the Application level, the key factor is that, for accesses to Normal memory, barriers are required in some situations where the order of accesses observed by other observers must be controlled.
• For silicon implementers, considering the model at the system level, the Strongly-ordered and Device memory attributes place certain restrictions on the system designer in terms of what can be built and when to indicate completion of an access.

Note

Implementations remain free to choose the mechanisms required to implement the functionality of the memory model.

More information about the memory order model is given in the following subsections:
• Reads and writes
• Ordering requirements for memory accesses on page A3-149
• Memory barriers on page A3-151.

Additional attributes and behaviors relate to the memory system architecture. These features are defined in the system level section of this manual:
• Virtual memory systems based on an MMU, described in Chapter B3 Virtual Memory System Architecture (VMSA).
• Protected memory systems based on an MPU, described in Chapter B5 Protected Memory System Architecture (PMSA).
• Caches, described in Caches and branch predictors on page B2-1266.

Note

In these system level descriptions, some attributes are described in relation to an MMU. In general, these descriptions can also be applied to an MPU-based system.

A3.8.1 Reads and writes

Each memory access is either a read or a write. Explicit memory accesses are the memory accesses required by the function of an instruction. The following can cause memory accesses that are not explicit:
• instruction fetches
• cache loads and write-backs
• translation table walks.

Except where otherwise stated, the memory ordering requirements only apply to explicit memory accesses.


Reads

Reads are defined as memory operations that have the semantics of a load.

The memory accesses of the following instructions are reads:
• LDR, LDRB, LDRH, LDRSB, and LDRSH
• LDRT, LDRBT, LDRHT, LDRSBT, and LDRSHT
• LDREX, LDREXB, LDREXD, and LDREXH
• LDM, LDRD, POP, and RFE
• LDC, LDC2, VLDM, VLDR, VLD1, VLD2, VLD3, VLD4, and VPOP
• The return of status values by STREX, STREXB, STREXD, and STREXH
• SWP and SWPB. These instructions are available only in the ARM instruction set.
• TBB and TBH. These instructions are available only in the Thumb instruction set.

Hardware-accelerated opcode execution by the Jazelle extension can cause a number of reads to occur, according to the state of the operand stack and the implementation of the Jazelle hardware acceleration.

Writes

Writes are defined as memory operations that have the semantics of a store.

The memory accesses of the following instructions are writes:
• STR, STRB, and STRH
• STRT, STRBT, and STRHT
• STREX, STREXB, STREXD, and STREXH
• STM, STRD, PUSH, and SRS
• STC, STC2, VPUSH, VSTM, VSTR, VST1, VST2, VST3, and VST4
• SWP and SWPB. These instructions are available only in the ARM instruction set.

Hardware-accelerated opcode execution by the Jazelle extension can cause a number of writes to occur, according to the state of the operand stack and the implementation of the Jazelle hardware acceleration.
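The read and write lists above can be captured in a lookup table. The C sketch below is an illustration only. The function name, the flag values, and the decision to match bare mnemonics are assumptions: condition codes, size qualifiers such as .32, and encoding suffixes are ignored. Accesses caused by Jazelle hardware acceleration are not modeled. STREX and its variants are classified as both reads and writes, because the return of their status values is listed as a read. SWP and SWPB appear in both lists.

```c
#include <stdbool.h>
#include <string.h>

enum { ACC_READ = 1u, ACC_WRITE = 2u };  /* hypothetical result flags */

#define COUNT(a) (sizeof(a) / sizeof((a)[0]))

static const char *const READS[] = {
    "LDR", "LDRB", "LDRH", "LDRSB", "LDRSH",
    "LDRT", "LDRBT", "LDRHT", "LDRSBT", "LDRSHT",
    "LDREX", "LDREXB", "LDREXD", "LDREXH",
    "LDM", "LDRD", "POP", "RFE",
    "LDC", "LDC2", "VLDM", "VLDR", "VLD1", "VLD2", "VLD3", "VLD4", "VPOP",
    "TBB", "TBH"
};
static const char *const WRITES[] = {
    "STR", "STRB", "STRH", "STRT", "STRBT", "STRHT",
    "STM", "STRD", "PUSH", "SRS",
    "STC", "STC2", "VPUSH", "VSTM", "VSTR", "VST1", "VST2", "VST3", "VST4"
};
/* Listed as both: stores whose status return is a read, and swaps. */
static const char *const READ_WRITE[] = {
    "STREX", "STREXB", "STREXD", "STREXH", "SWP", "SWPB"
};

static bool in_list(const char *m, const char *const *list, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (strcmp(m, list[i]) == 0)
            return true;
    return false;
}

/* Returns ACC_READ, ACC_WRITE, both, or 0 for a non-memory mnemonic. */
static unsigned access_kind(const char *m)
{
    unsigned k = 0u;
    bool both = in_list(m, READ_WRITE, COUNT(READ_WRITE));
    if (both || in_list(m, READS, COUNT(READS)))
        k |= ACC_READ;
    if (both || in_list(m, WRITES, COUNT(WRITES)))
        k |= ACC_WRITE;
    return k;
}
```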

Synchronization primitives

Synchronization primitives must ensure correct operation of system semaphores in the memory order model. The synchronization primitive instructions are defined as those instructions that are executed to ensure memory synchronization. They are the following instructions:
• LDREX, STREX, LDREXB, STREXB, LDREXD, STREXD, LDREXH, and STREXH
• SWP and SWPB. From ARMv6, ARM deprecates the use of these instructions.

Observability and completion

An observer is an agent in the system that can access memory. For a processor, the following mechanisms must be treated as independent observers:
• the mechanism that performs reads or writes to memory
• a mechanism that causes an instruction cache to be filled from memory or that fetches instructions to be executed directly from memory
• a mechanism that performs translation table walks.

The set of observers that can observe a memory access is defined by the system.

In the definitions in this subsection, subsequent means whichever of the following is appropriate to the context:
• after the point in time where the location is observed by that observer
• after the point in time where the location is globally observed.


For all memory:
• a write to a location in memory is said to be observed by an observer when:
— a subsequent read of the location by the same observer will return the value written by the observed write, or written by a write to that location by any observer that is sequenced in the Coherence order of the location after the observed write
— a subsequent write of the location by the same observer will be sequenced in the Coherence order of the location after the observed write
• a write to a location in memory is said to be globally observed for a shareability domain when:
— a subsequent read of the location by any observer in that shareability domain will return the value written by the globally observed write, or written by a write to that location by any observer that is sequenced in the Coherence order of the location after the globally observed write
— a subsequent write of the location by any observer in that shareability domain will be sequenced in the Coherence order of the location after the globally observed write
• a read of a location in memory is said to be observed by an observer when a subsequent write to the location by the same observer will have no effect on the value returned by the read
• a read of a location in memory is said to be globally observed for a shareability domain when a subsequent write to the location by any observer in that shareability domain will have no effect on the value returned by the read.

Additionally, for Strongly-ordered memory:
• A read or write of a memory-mapped location in a peripheral that exhibits side-effects is said to be observed, and globally observed, only when the read or write:
— meets the general conditions listed
— can begin to affect the state of the memory-mapped peripheral
— can trigger all associated side-effects, whether they affect other peripheral devices, processors, or memory.

Note

This definition is consistent with the memory access having reached the peripheral.

For all memory, the completion rules are defined as:

• A read or write is complete for a shareability domain when all of the following are true:

— the read or write is globally observed for that shareability domain

— any translation table walks associated with the read or write are complete for that shareability domain.

• A translation table walk is complete for a shareability domain when the memory accesses associated with the

translation table walk are globally observed for that shareability domain, and the TLB is updated.

• A cache, branch predictor, or TLB maintenance operation is complete for a shareability domain when the

effects of the operation are globally observed for that shareability domain, and any translation table walks

that arise from the operation are complete for that shareability domain.

The completion of any cache, branch predictor or TLB maintenance operation includes its completion on all

processors that are affected by both the operation and the DSB operation that is required to guarantee

visibility of the maintenance operation.

Completion of side-effects of accesses to Strongly-ordered and Device memory

The completion of a memory access to Strongly-ordered or Device memory is not guaranteed to be sufficient to determine that the side-effects of the memory access are visible to all observers. The mechanism that ensures the visibility of side-effects of a memory access is IMPLEMENTATION DEFINED.


A3.8.2 Ordering requirements for memory accesses

ARMv7 and ARMv6 define access restrictions in the permitted ordering of memory accesses. These restrictions depend on the memory attributes of the accesses involved.

Two terms used in describing the memory access ordering requirements are:

Address dependency
An address dependency exists when the value returned by a read access is used for the computation of the virtual address of a subsequent read or write access. An address dependency exists even if the value read by the first read access does not change the virtual address of the second read or write access. This might be the case if the value returned is masked off before it is used, or if it has no effect on the predicted address value for the second access.

Control dependency
A control dependency exists when the data value returned by a read access determines the condition flags, and the values of the flags are used in the condition code checking that determines the address of a subsequent read access. This address determination might be through conditional execution, or through the evaluation of a branch.

Figure A3-5 shows the memory ordering between two explicit accesses A1 and A2, where A1 occurs before A2 in program order. In the figure, an access refers to a read or a write access to the specified memory type. For example, Normal access refers to a read or write access to Normal memory. The symbols used in the figure are as follows:

<   Accesses must arrive at any particular memory-mapped peripheral or block of memory in program order, that is, A1 must arrive before A2. There are no ordering restrictions about when accesses arrive at different peripherals or blocks of memory, provided that accesses follow the general ordering rules given in this section.

-   Accesses can arrive at any memory-mapped peripheral or block of memory in any order, provided that the accesses follow the general ordering rules given in this section.

The size of a memory-mapped peripheral, or a block of memory, is IMPLEMENTATION DEFINED, but is not smaller than 1KB.

Note

This implies that the maximum memory-mapped peripheral size for which the architecture guarantees order for all implementations is 1KB.
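Because the architecture only promises that a peripheral or block of memory is at least 1KB, one reading of the Note above is that portable software can rely on < ordering only between accesses that fall within a single 1KB region. The following Python sketch checks that condition. It additionally assumes, although this section does not say so, that blocks are power-of-two sized and naturally aligned; the function name is hypothetical.

```python
MIN_BLOCK_SIZE = 1024  # architectural minimum size of a peripheral or memory block


def same_guaranteed_block(addr_a: int, addr_b: int) -> bool:
    """Return True if addr_a and addr_b lie in the same naturally aligned
    1KB region.

    Assumption (not stated by the architecture): every block is a naturally
    aligned power-of-two region of at least 1KB, so it always contains the
    whole aligned 1KB region around any address inside it.
    """
    return addr_a // MIN_BLOCK_SIZE == addr_b // MIN_BLOCK_SIZE


# Two registers 0x100 bytes apart in the same 1KB region:
print(same_guaranteed_block(0x4000_0000, 0x4000_0100))  # True
# Two registers that straddle a 1KB boundary:
print(same_guaranteed_block(0x4000_03FC, 0x4000_0400))  # False
```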

Figure A3-5 Memory ordering restrictions

[Figure not reproduced: a matrix with A1 (rows) and A2 (columns) each taken from Normal access, Device access‡, and Strongly-ordered access‡, with each cell marked < or -.]

‡ The ordering requirements for Device and Strongly-ordered accesses are identical.

There are no ordering requirements for implicit accesses to any type of memory.

The following additional restrictions apply to the ordering of all memory accesses:

• For all accesses from a single observer, the requirements of uniprocessor semantics must be maintained, for example:
  — respecting dependencies between instructions in a single processor
  — coherency.


• If there is an address dependency then the two memory accesses are observed in program order by any observer in the common shareability domain of the two accesses.
  This ordering restriction does not apply if there is only a control dependency between the two read accesses.
  If there is both an address dependency and a control dependency between two read accesses the ordering requirements of the address dependency apply.
• If the value returned by a read access is used as data written by a subsequent write access, then the two memory accesses are observed in program order by any observer in the common shareability domain of the two accesses.
• It is impossible for an observer in the shareability domain of a memory location to observe an access by a store instruction that has not been architecturally executed.
• It is impossible for an observer in the shareability domain of a memory location to observe two reads to the same memory location performed by the same observer in an order that would not occur in a sequential execution of a program.
• For an implementation that does not include the Multiprocessing Extensions, it is IMPLEMENTATION DEFINED whether all writes complete in a finite period of time, or whether some writes require the execution of a DSB instruction to guarantee their completion.
• For an implementation that includes the Multiprocessing Extensions, all writes complete in a finite period of time.

Note

This applies for all writes, including repeated writes to the same location.
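The dependency rules in the list above can be summarized as a small predicate. This Python sketch is illustrative only: `ordered_by_dependency` and its string encoding of dependency kinds are hypothetical, and it models only the address, data, and control dependency rules. A result of False means only that these rules give no guarantee; other rules, such as the one about stores that have not been architecturally executed, can still apply.

```python
def ordered_by_dependency(deps: set) -> bool:
    """Return True if the dependency rules guarantee that a read A1 and a
    later access A2 are observed in program order by any observer in their
    common shareability domain.

    deps is a hypothetical encoding: a subset of {"address", "data",
    "control"}, where "data" means the value read by A1 is the data
    written by A2.
    """
    unknown = deps - {"address", "data", "control"}
    if unknown:
        raise ValueError(f"unknown dependency kinds: {sorted(unknown)}")
    # An address dependency, or a read value used as write data, orders the
    # pair. A control dependency alone gives no ordering between two reads,
    # and when address and control dependencies coexist the address rule
    # applies, so "control" never changes the result.
    return "address" in deps or "data" in deps


print(ordered_by_dependency({"address", "control"}))  # True
print(ordered_by_dependency({"control"}))              # False
```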

Program order for instruction execution

The program order of instruction execution is the order of the instructions in a simple sequential execution of the program.

Explicit memory accesses in an execution can be either:

Strictly Ordered
Denoted by <. Must occur strictly in order.

Ordered
Denoted by <=. Can occur either in order or simultaneously.

Load/store multiple instructions, such as LDM, LDRD, STM, and STRD, generate multiple word accesses, each of which is a separate access for the purpose of determining ordering.

The rules for determining program order for two accesses A1 and A2 are:

If A1 and A2 are generated by two different instructions:

• A1 < A2 if the instruction that generates A1 occurs before the instruction that generates A2 in program order
• A2 < A1 if the instruction that generates A2 occurs before the instruction that generates A1 in program order.

If A1 and A2 are generated by the same instruction:

• If A1 and A2 are the load and store generated by a SWP or SWPB instruction:
  — A1 < A2 if A1 is the load and A2 is the store
  — A2 < A1 if A2 is the load and A1 is the store.


• In these descriptions:
  — an LDM-class instruction is any form of LDM, LDMDA, LDMDB, or LDMIB, or a POP instruction that operates on more than one register
  — an LDC-class instruction is an LDC, VLDM, VLDR, or VPOP instruction
  — an STM-class instruction is any form of STM, STMDA, STMDB, or STMIB, or a PUSH instruction that operates on more than one register
  — an STC-class instruction is an STC, VSTM, VSTR, or VPUSH instruction.

  If A1 and A2 are two word loads generated by an LDC-class or LDM-class instruction, or two word stores generated by an STC-class or STM-class instruction, excluding LDM-class and STM-class instructions with a register list that includes the PC:
  — A1 <= A2 if the address of A1 is less than the address of A2
  — A2 <= A1 if the address of A2 is less than the address of A1.

  If A1 and A2 are two word loads generated by an LDM-class instruction with a register list that includes the PC or two word stores generated by an STM-class instruction with a register list that includes the PC, the program order of the memory accesses is not defined.
• If A1 and A2 are two word loads generated by an LDRD instruction or two word stores generated by an STRD instruction, the program order of the memory accesses is not defined.
• If A1 and A2 are load or store accesses generated by Advanced SIMD element or structure load/store instructions, the program order of the memory accesses is not defined.
• For any instruction or operation not explicitly mentioned in this section, if the single-copy atomicity rules described in Single-copy atomicity on page A3-128 mean the operation becomes a sequence of accesses, then the time-ordering of those accesses is not defined.
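The program order rules above can be expressed as a function over pairs of accesses. In this Python sketch, the `Access` record, its `klass` tags, and `program_order` are hypothetical names chosen for illustration. The function returns the relation in the notation of this section, or `None` where the rules leave the order undefined.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Access:
    instr: int                # program-order position of the generating instruction
    kind: str                 # "load" or "store"
    addr: int                 # address of this word access
    klass: str                # hypothetical tag: "SWP", "LDM", "STM", "LDC", "STC",
                              # "LDRD", "STRD", "SIMD", or "OTHER"
    pc_in_list: bool = False  # LDM/STM-class only: register list includes the PC


def program_order(a1: Access, a2: Access) -> Optional[str]:
    """Return "A1 < A2", "A2 < A1", "A1 <= A2", "A2 <= A1", or None when
    the program order of the two accesses is not defined."""
    if a1.instr != a2.instr:
        return "A1 < A2" if a1.instr < a2.instr else "A2 < A1"
    klass = a1.klass  # same instruction, so both accesses share its class
    if klass == "SWP":
        # SWP/SWPB: the load is ordered before the store.
        return "A1 < A2" if a1.kind == "load" else "A2 < A1"
    if klass in ("LDM", "STM", "LDC", "STC"):
        if klass in ("LDM", "STM") and a1.pc_in_list:
            return None  # register list includes the PC
        if a1.addr == a2.addr:
            raise ValueError("word accesses of one instruction use distinct addresses")
        return "A1 <= A2" if a1.addr < a2.addr else "A2 <= A1"
    # LDRD/STRD, Advanced SIMD element/structure accesses, and any other
    # instruction split into a sequence of accesses: not defined.
    return None


ldm = [Access(0, "load", 0x1000 + 4 * i, "LDM") for i in range(3)]
print(program_order(ldm[2], ldm[0]))  # A2 <= A1
```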

A3.8.3 Memory barriers

Memory barrier is the general term applied to an instruction, or sequence of instructions, that forces synchronization events by a processor with respect to retiring load/store instructions. The ARM architecture defines a number of memory barriers that provide a range of functionality, including:

• ordering of load/store instructions
• completion of load/store instructions
• context synchronization.

ARMv7 and ARMv6 require three explicit memory barriers to support the memory order model described in this chapter. In ARMv7 the memory barriers are provided as instructions that are available in the ARM and Thumb instruction sets, and in ARMv6 the memory barriers are performed by CP15 register writes. The three memory barriers are:

• Data Memory Barrier, see Data Memory Barrier (DMB) on page A3-152
• Data Synchronization Barrier, see Data Synchronization Barrier (DSB) on page A3-153
• Instruction Synchronization Barrier, see Instruction Synchronization Barrier (ISB) on page A3-153.

Note

Depending on the required synchronization, a program might use memory barriers on their own, or it might use them in conjunction with cache and memory management maintenance operations that are only available when software execution is at PL1 or higher.

The DMB and DSB memory barriers affect reads and writes to the memory system generated by load/store instructions and data or unified cache maintenance operations being executed by the processor. Instruction fetches or accesses caused by a hardware translation table access are not explicit accesses.


Data Memory Barrier (DMB)

The DMB instruction is a data memory barrier. The processor that executes the DMB instruction is referred to as the executing processor, Pe. The DMB instruction takes the required shareability domain and required access types as arguments, see Shareability and access limitations on the data barrier operations on page A3-153. If the required shareability is Full system then the operation applies to all observers within the system.

A DMB creates two groups of memory accesses, Group A and Group B:

Group A
Contains:
• All explicit memory accesses of the required access types from observers in the same required shareability domain as Pe that are observed by Pe before the DMB instruction. These accesses include any accesses of the required access types performed by Pe.
• All loads of required access types from an observer Px in the same required shareability domain as Pe that have been observed by any given different observer, Py, in the same required shareability domain as Pe before Py has performed a memory access that is a member of Group A.

Group B
Contains:
• All explicit memory accesses of the required access types by Pe that occur in program order after the DMB instruction.
• All explicit memory accesses of the required access types by any given observer Px in the same required shareability domain as Pe that can only occur after a load by Px has returned the result of a store that is a member of Group B.

Any observer with the same required shareability domain as Pe observes all members of Group A before it observes any member of Group B to the extent that those group members are required to be observed, as determined by the shareability and cacheability of the memory locations accessed by the group members.

Where members of Group A and members of Group B access the same memory-mapped peripheral of arbitrary system-defined size, then members of Group A that are accessing Strongly-ordered, Device, or Normal Non-cacheable memory arrive at that peripheral before members of Group B that are accessing Strongly-ordered, Device, or Normal Non-cacheable memory. If the memory accesses are not to a peripheral, then there are no restrictions from this paragraph.

Note

• Where the members of Group A and Group B that must be ordered are from the same processor, a DMB NSH is sufficient for this guarantee.
• A memory access might be in neither Group A nor Group B. The DMB does not affect the order of observation of such a memory access.
• The second part of the definition of Group A is recursive. Ultimately, membership of Group A derives from the observation by Py of a load before Py performs an access that is a member of Group A as a result of the first part of the definition of Group A.
• The second part of the definition of Group B is recursive. Ultimately, membership of Group B derives from the observation by any observer of an access by Pe that is a member of Group B as a result of the first part of the definition of Group B.

DMB only affects memory accesses and data and unified cache maintenance operations, see Cache and branch predictor maintenance operations on page B2-1278. It has no effect on the ordering of any other instructions executing on the processor.

For details of the DMB instruction in the Thumb and ARM instruction sets see DMB on page A8-378.
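A common use of DMB is message passing: a producer writes a payload, executes DMB, then writes a flag. The Python sketch below is a simplification under stated assumptions: it models only Pe's own accesses (the first part of each group definition), treats the required access types as reads and writes, and checks the order in which one other observer in the same shareability domain saw the accesses. `dmb_groups` and `respects_dmb` are hypothetical helpers, not architectural functions.

```python
def dmb_groups(stream, barrier_index):
    """Split Pe's own program-order stream around the DMB at barrier_index.

    Simplifications (hypothetical): only Pe's own accesses are modeled, so
    only the first part of each group definition applies, and the required
    access types are reads and writes.
    """
    if stream[barrier_index] != "DMB":
        raise ValueError("barrier_index must point at a DMB")
    group_a = {a for a in stream[:barrier_index] if a != "DMB"}
    group_b = {a for a in stream[barrier_index + 1:] if a != "DMB"}
    return group_a, group_b


def respects_dmb(observed_sequence, group_a, group_b):
    """True if, in the order another observer saw them, no member of Group B
    comes before a member of Group A. Accesses in neither group are ignored,
    because the DMB does not order them."""
    seen_b = False
    for access in observed_sequence:
        if access in group_b:
            seen_b = True
        elif access in group_a and seen_b:
            return False
    return True


# Message passing: Pe writes a payload, executes DMB, then sets a flag.
pe_stream = ["STR payload", "DMB", "STR flag"]
group_a, group_b = dmb_groups(pe_stream, 1)
print(respects_dmb(["STR payload", "STR flag"], group_a, group_b))  # True
print(respects_dmb(["STR flag", "STR payload"], group_a, group_b))  # False
```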


Data Synchronization Barrier (DSB)

The DSB instruction is a special memory barrier that synchronizes the execution stream with memory accesses. The DSB instruction takes the required shareability domain and required access types as arguments, see Shareability and access limitations on the data barrier operations. If the required shareability is Full system then the operation applies to all observers within the system.

DSB behaves as a DMB with the same arguments, and also has the additional properties defined here.

DSB completes when:

• all explicit memory accesses that are observed by Pe before the DSB is executed, that are of the required access types, and that are from observers in the same required shareability domain as Pe, are complete for the set of observers in the required shareability domain
• if the required access types of the DSB are reads and writes, all cache and branch predictor maintenance operations issued by Pe before the DSB are complete for the required shareability domain
• if the required access types of the DSB are reads and writes, all TLB maintenance operations issued by Pe before the DSB are complete for the required shareability domain.

In addition, no instruction that appears in program order after the DSB instruction can execute until the DSB completes.

For details of the DSB instruction in the Thumb and ARM instruction sets see DSB on page A8-380.

Note

Historically, this operation was referred to as Drain Write Buffer or Data Write Barrier (DWB). From ARMv6, these names and the use of DWB were deprecated in favor of the new Data Synchronization Barrier name and DSB abbreviation. DSB better reflects the functionality provided from ARMv6, because DSB is architecturally defined to include all cache, TLB and branch prediction maintenance operations as well as explicit memory operations.
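The DSB completion conditions can be read as a check over the operations that precede the DSB. This Python sketch collapses the shareability domain and the set of observers into a single `complete` flag per operation, and does not model the stall of later instructions; `PendingOp` and `dsb_conditions_met` are hypothetical names. It shows that, under the conditions listed above, outstanding maintenance operations only hold up a DSB whose required access types are reads and writes.

```python
from dataclasses import dataclass

VALID_KINDS = ("read", "write", "maintenance")


@dataclass
class PendingOp:
    kind: str       # "read", "write", or "maintenance" (cache, branch
                    # predictor, or TLB maintenance issued by Pe)
    complete: bool  # True if complete for the required shareability domain


def dsb_conditions_met(ops_before_dsb, required_types="all"):
    """Return True when the listed DSB completion conditions hold.

    ops_before_dsb holds the accesses observed by Pe, and the maintenance
    operations issued by Pe, before the DSB. required_types is "all"
    (reads and writes) or "writes".
    """
    if required_types not in ("all", "writes"):
        raise ValueError("required_types must be 'all' or 'writes'")
    for op in ops_before_dsb:
        if op.kind not in VALID_KINDS:
            raise ValueError(f"unknown operation kind: {op.kind!r}")
        # Writes must be complete for either variant; reads and maintenance
        # operations only for a DSB whose required types are reads and writes.
        relevant = op.kind == "write" or required_types == "all"
        if relevant and not op.complete:
            return False
    return True


pending = [PendingOp("write", True), PendingOp("maintenance", False)]
print(dsb_conditions_met(pending, "writes"))  # True
print(dsb_conditions_met(pending, "all"))     # False
```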

Instruction Synchronization Barrier (ISB)

An ISB instruction flushes the pipeline in the processor, so that all instructions that come after the ISB instruction in program order are fetched from cache or memory only after the ISB instruction has completed. Using an ISB ensures that the effects of context-changing operations executed before the ISB are visible to the instructions fetched after the ISB instruction. Examples of context-changing operations that require the insertion of an ISB instruction to ensure the effects of the operation are visible to instructions fetched after the ISB instruction are:

• completed cache, TLB, and branch predictor maintenance operations
• changes to system control registers.

Any context-changing operations appearing in program order after the ISB instruction only take effect after the ISB has been executed.

For more information about the ISB instruction in the Thumb and ARM instruction sets, see ISB on page A8-389.

Shareability and access limitations on the data barrier operations

The DMB and DSB instructions can each take an optional limitation argument that specifies:

• the shareability domain over which the instruction must operate, as one of:
  — full system
  — Outer Shareable
  — Inner Shareable
  — Non-shareable
• the accesses for which the instruction operates, as one of:
  — read and write accesses
  — write accesses only.


By default, each instruction operates for read and write accesses, over the full system, and whether an implementation supports any other options is IMPLEMENTATION DEFINED. See the instruction descriptions for more information about these arguments.

Note

ISB also supports an optional limitation argument, but supports only one value for that argument, that corresponds to full system operation.
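In assembler source the limitation argument is written as an option after the DMB or DSB mnemonic. The mapping below uses the option names from the DMB and DSB instruction descriptions (DMB on page A8-378, DSB on page A8-380); treat it as an assumption to check against those pages. `BARRIER_OPTIONS` and `decode_barrier` are hypothetical helpers for illustration, and whether an implementation honors options other than the default remains IMPLEMENTATION DEFINED.

```python
# Option names for the DMB/DSB limitation argument, as (domain, types).
# Assumed from the DMB and DSB instruction descriptions; check against them.
BARRIER_OPTIONS = {
    "SY": ("FullSystem", "All"),
    "ST": ("FullSystem", "Writes"),
    "ISH": ("InnerShareable", "All"),
    "ISHST": ("InnerShareable", "Writes"),
    "NSH": ("Nonshareable", "All"),
    "NSHST": ("Nonshareable", "Writes"),
    "OSH": ("OuterShareable", "All"),
    "OSHST": ("OuterShareable", "Writes"),
}


def decode_barrier(text: str):
    """Return (domain, types) for assembler text such as "DMB ISHST".
    With no option, the default is full system, reads and writes."""
    parts = text.upper().split()
    if not parts or parts[0] not in ("DMB", "DSB"):
        raise ValueError(f"not a DMB or DSB: {text!r}")
    option = parts[1] if len(parts) > 1 else "SY"
    if option not in BARRIER_OPTIONS:
        raise ValueError(f"unknown barrier option: {option!r}")
    return BARRIER_OPTIONS[option]


print(decode_barrier("DMB"))        # ('FullSystem', 'All')
print(decode_barrier("dsb ishst"))  # ('InnerShareable', 'Writes')
```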

In an implementation that includes the Virtualization Extensions, and supports shareability limitations on the data barrier operations, the HCR.BSU field can upgrade the required shareability of the operation for an instruction that is executed in a Non-secure PL1 or PL0 mode. Table A3-6 shows the encoding of this field:

Table A3-6 HCR.BSU encoding

HCR.BSU   Minimum shareability of instruction
00        No effect, shareability is as specified by the instruction
01        Inner Shareable
10        Outer Shareable
11        Full system

For an instruction executed in a Non-secure PL1 or PL0 mode, Table A3-7 shows how HCR.BSU upgrades the shareability specified by the argument of the DMB or DSB instruction:

Table A3-7 Upgrading the shareability of data barrier operations

Shareability from DMB or DSB argument   HCR.BSU                                          Resultant shareability
Full system                             Any                                              Full system
Outer Shareable                         No effect, Inner Shareable, or Outer Shareable   Outer Shareable
Outer Shareable                         Full system                                      Full system
Inner Shareable                         No effect, or Inner Shareable                    Inner Shareable
Inner Shareable                         Outer Shareable                                  Outer Shareable
Inner Shareable                         Full system                                      Full system
Non-shareable                           No effect                                        Non-shareable
Non-shareable                           Inner Shareable                                  Inner Shareable
Non-shareable                           Outer Shareable                                  Outer Shareable
Non-shareable                           Full system                                      Full system

Pseudocode details of memory barriers

The following types define the required shareability domains and required access types used as arguments for DMB and DSB instructions:

enumeration MBReqDomain {MBReqDomain_FullSystem, MBReqDomain_OuterShareable,
                         MBReqDomain_InnerShareable, MBReqDomain_Nonshareable};

enumeration MBReqTypes {MBReqTypes_All, MBReqTypes_Writes};


The following procedures perform the memory barriers:

DataMemoryBarrier(MBReqDomain domain, MBReqTypes types)

DataSynchronizationBarrier(MBReqDomain domain, MBReqTypes types)

InstructionSynchronizationBarrier()
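Table A3-7 amounts to taking the wider of the shareability requested by the instruction and the minimum selected by HCR.BSU in Table A3-6. The Python sketch below reuses the MBReqDomain member names from the pseudocode, but orders them from narrowest to widest so that `max()` selects the wider domain; that numeric ordering, the `effective_domain` helper, and its `nonsecure_pl1_or_pl0` flag are assumptions made for illustration. It also assumes an implementation that includes the Virtualization Extensions and supports shareability limitations.

```python
from enum import IntEnum


class MBReqDomain(IntEnum):
    # Ordered narrowest to widest (an assumption of this sketch) so that
    # max() returns the wider of two shareability domains.
    Nonshareable = 0
    InnerShareable = 1
    OuterShareable = 2
    FullSystem = 3


# Table A3-6: HCR.BSU value -> minimum shareability (None means no effect).
BSU_MINIMUM = {
    0b00: None,
    0b01: MBReqDomain.InnerShareable,
    0b10: MBReqDomain.OuterShareable,
    0b11: MBReqDomain.FullSystem,
}


def effective_domain(requested, hcr_bsu, nonsecure_pl1_or_pl0=True):
    """Resultant shareability of a DMB or DSB, following Table A3-7."""
    minimum = BSU_MINIMUM[hcr_bsu]
    if not nonsecure_pl1_or_pl0 or minimum is None:
        return requested  # HCR.BSU does not apply, or has no effect
    return max(requested, minimum)


print(effective_domain(MBReqDomain.Nonshareable, 0b10).name)  # OuterShareable
print(effective_domain(MBReqDomain.FullSystem, 0b01).name)    # FullSystem
```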


A3.9 Caches and memory hierarchy

The implementation of a memory system depends heavily on the microarchitecture and therefore the details of the system are IMPLEMENTATION DEFINED. ARMv7 defines the application level interface to the memory system, and supports a hierarchical memory system with multiple levels of cache. This section provides an application level view of this system. It contains the subsections:

• Introduction to caches
• Memory hierarchy
• Implication of caches for the application programmer on page A3-157
• Preloading caches on page A3-158.

A3.9.1 Introduction to caches

A cache is a block of high-speed memory that contains a number of entries, each consisting of:

• main memory address information, commonly called a tag
• the associated data.

Caches increase the average speed of a memory access. Cache operation takes account of two principles of locality:

Spatial locality
An access to one location is likely to be followed by accesses to adjacent locations. Examples of this principle are:
• sequential instruction execution
• accessing a data structure.

Temporal locality
An access to an area of memory is likely to be repeated in a short time period. An example of this principle is the execution of a software loop.

To minimize the quantity of control information stored, the spatial locality property groups several locations together under the same tag. This logical block is commonly called a cache line. When data is loaded into a cache, access times for subsequent loads and stores are reduced, resulting in overall performance benefits. An access to information already in a cache is called a cache hit, and other accesses are called cache misses.

Normally, caches are self-managing, with the updates occurring automatically. Whenever the processor wants to access a cacheable location, the cache is checked. If the access is a cache hit, the access occurs in the cache; otherwise, a location is allocated and the cache line is loaded from memory. Different cache topologies and access policies are possible; however, they must comply with the memory coherency model of the underlying architecture.
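The tag, cache line, hit, miss, and allocate-on-miss behavior described above can be illustrated with a toy cache. This Python sketch assumes a hypothetical direct-mapped geometry of 64 lines of 32 bytes, which this section does not specify; real cache organizations are IMPLEMENTATION DEFINED, and the data itself is not modeled. It shows spatial locality (one miss per line for sequential accesses) and temporal locality (a repeated loop hits every time).

```python
class DirectMappedCache:
    """Toy direct-mapped cache (hypothetical geometry: 64 lines of 32 bytes).

    Each entry holds only a tag. On a miss the line is allocated and
    'loaded from memory', replacing any previous line at that index.
    """

    def __init__(self, num_lines=64, line_size=32):
        self.num_lines = num_lines
        self.line_size = line_size
        self.tags = [None] * num_lines
        self.hits = 0
        self.misses = 0

    def access(self, addr):
        line_addr = addr // self.line_size   # locations grouped under one tag
        index = line_addr % self.num_lines   # which entry to check
        tag = line_addr // self.num_lines    # address information stored
        if self.tags[index] == tag:
            self.hits += 1
            return True
        self.misses += 1
        self.tags[index] = tag               # allocate on miss
        return False


cache = DirectMappedCache()
# Spatial locality: 32 sequential word reads touch only 4 lines.
for addr in range(0x8000, 0x8000 + 32 * 4, 4):
    cache.access(addr)
print(cache.hits, cache.misses)  # 28 4
# Temporal locality: repeating the loop hits every time.
for addr in range(0x8000, 0x8000 + 32 * 4, 4):
    cache.access(addr)
print(cache.hits, cache.misses)  # 60 4
```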

Caches introduce a number of potential problems, mainly because of:

• memory accesses occurring at times other than when the programmer would otherwise expect them
• there being multiple physical locations where a data item can be held.

A3.9.2 Memory hierarchy

Memory close to a processor has very low latency, but is limited in size and expensive to implement. Further from the processor it is easier to implement larger blocks of memory but these have increased latency. To optimize overall performance, an ARMv7 memory system can include multiple levels of cache in a hierarchical memory system. Figure A3-6 on page A3-157 shows such a system, in an ARMv7-A implementation of a VMSA, supporting virtual addressing.


Figure A3-6 Multiple levels of cache in a memory hierarchy

[Figure not reproduced. Components shown: the processor (R15, instruction fetch, load, store); address translation, with CP15 configuration and control, mapping virtual addresses to physical addresses; Level 1 and Level 2 caches; Level 3 (DRAM, SRAM, Flash, ROM); and Level 4 (for example, CF card, disk).]

Note

In this manual, in a hierarchical memory system, Level 1 refers to the level closest to the processor, as shown in Figure A3-6.

A3.9.3 Implication of caches for the application programmer

In normal operation, the caches are largely invisible to the application programmer. However, they can become visible when there is a breakdown in the coherency of the caches. Such a breakdown can occur:

• when memory locations are updated by other agents in the system
• when memory updates made from the application software must be made visible to other agents in the system.

For example:

• In a system with a DMA controller that reads memory locations that are held in the data cache of a processor, a breakdown of coherency occurs when the processor has written new data in the data cache, but the DMA controller reads the old data held in memory.
• In a Harvard architecture of caches, where there are separate instruction and data caches, a breakdown of coherency occurs when new instruction data has been written into the data cache, but the instruction cache still contains the old instruction data.

Data coherency issues

Software can ensure the data coherency of caches in the following ways:

• By not using the caches in situations where coherency issues can arise. This can be achieved by:
  — using Non-cacheable or, in some cases, Write-Through Cacheable memory
  — not enabling caches in the system.
• By using cache maintenance operations to manage the coherency issues in software, see About ARMv7 cache and branch predictor maintenance functionality on page B2-1274. Many of these operations are only available to system software.
• By using hardware coherency mechanisms to ensure the coherency of data accesses to memory for cacheable locations by observers within the different shareability domains, see Non-shareable Normal memory on page A3-132 and Shareable, Inner Shareable, and Outer Shareable Normal memory on page A3-133.
  The performance of these hardware coherency mechanisms is highly implementation-specific. In some implementations the mechanism suppresses the ability to cache shareable locations. In other implementations, cache coherency hardware can hold data in caches while managing coherency between observers within the shareability domains.
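The DMA example above can be reproduced with a toy write-back data cache. In this Python sketch, `WriteBackDataCache`, its `clean` method, and `dma_read` are hypothetical; `clean` stands in for a cache clean maintenance operation, which, as noted above, is often available only to system software. On real hardware the clean must also be completed, for example by a DSB, before the DMA transfer starts; the toy model has no notion of completion.

```python
class WriteBackDataCache:
    """Toy write-back data cache in front of memory shared with a DMA
    controller (hypothetical model for illustration)."""

    def __init__(self, memory):
        self.memory = memory   # dict: address -> value, as seen by the DMA
        self.lines = {}        # address -> (value, dirty)

    def write(self, addr, value):
        self.lines[addr] = (value, True)   # write-back: memory not updated

    def read(self, addr):
        if addr in self.lines:
            return self.lines[addr][0]
        return self.memory.get(addr)

    def clean(self, addr):
        """Write a dirty line back to memory, keeping it in the cache."""
        if addr in self.lines:
            value, dirty = self.lines[addr]
            if dirty:
                self.memory[addr] = value
                self.lines[addr] = (value, False)


def dma_read(memory, addr):
    # The DMA controller reads memory directly and never looks in the cache.
    return memory.get(addr)


memory = {0x100: "old"}
dcache = WriteBackDataCache(memory)
dcache.write(0x100, "new")
print(dcache.read(0x100))        # new  (the processor sees its own write)
print(dma_read(memory, 0x100))   # old  (coherency breakdown)
dcache.clean(0x100)
print(dma_read(memory, 0x100))   # new
```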

Instruction coherency issues

How far ahead of the current point of execution instructions are fetched is IMPLEMENTATION DEFINED. Such prefetching can be either a fixed or a dynamically varying number of instructions, and can follow any or all possible future execution paths. For all types of memory:

• the processor might have fetched the instructions from memory at any time since the last context synchronization operation on that processor
• any instructions fetched in this way might be executed multiple times, if this is required by the execution of the program, without being refetched from memory.

Note

See Context synchronization operation for the definition of this term.

In addition, the ARM architecture does not require the hardware to ensure coherency between instruction caches and memory, even for regions of memory with Shareable attributes. This means that for cacheable regions of memory, an instruction cache can hold instructions that were fetched from memory before the context synchronization operation.

If software requires coherency between instruction execution and memory, it must manage this coherency using the ISB and DSB memory barriers and cache maintenance operations, see Ordering of cache and branch predictor maintenance operations on page B2-1289. Many of these operations are only available to system software.
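The Harvard-cache example, and the need for cache maintenance before executing newly written instructions, can be sketched in the same way. This Python model is hypothetical: `HarvardCaches` and its methods are illustrative names, the data cache is write-back, and the instruction cache is filled from memory and never updated by hardware, matching the statement above that the architecture does not require instruction cache coherency. On hardware, the sequence also needs DSB and ISB barriers between the steps (see page B2-1289), which this toy model cannot represent.

```python
class HarvardCaches:
    """Toy model of separate instruction and data caches over one memory.

    Assumptions (hypothetical): the data cache is write-back, and the
    instruction cache is filled from memory on a miss and never updated by
    hardware when memory or the data cache changes.
    """

    def __init__(self, memory):
        self.memory = memory
        self.dcache = {}   # address -> value written through the D-side
        self.icache = {}   # address -> instruction fetched earlier

    def store(self, addr, value):
        self.dcache[addr] = value

    def fetch(self, addr):
        if addr not in self.icache:
            self.icache[addr] = self.memory[addr]
        return self.icache[addr]

    def clean_dcache(self, addr):
        if addr in self.dcache:
            self.memory[addr] = self.dcache[addr]

    def invalidate_icache(self, addr):
        self.icache.pop(addr, None)


mem = {0x2000: "MOV r0, #1"}
cpu = HarvardCaches(mem)
cpu.fetch(0x2000)                  # old instruction now in the I-cache
cpu.store(0x2000, "MOV r0, #2")    # new code written through the D-side
print(cpu.fetch(0x2000))           # MOV r0, #1  (stale)
cpu.clean_dcache(0x2000)
cpu.invalidate_icache(0x2000)
# On hardware, a DSB after each maintenance step and a final ISB are also
# needed; this toy model has no pipeline, so they are not modeled.
print(cpu.fetch(0x2000))           # MOV r0, #2
```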

A3.9.4 Preloading caches

The ARM architecture provides memory system hints PLD (Preload Data), PLDW (Preload Data with intent to write), and PLI (Preload Instruction) to permit software to communicate the expected use of memory locations to the hardware. The memory system can respond by taking actions that are expected to speed up the memory accesses if and when they do occur. The effect of these memory system hints is IMPLEMENTATION DEFINED. Typically, implementations use this information to bring the data or instruction locations into caches that have faster access times than normal memory.

The Preload instructions are hints, and so implementations can treat them as NOPs without affecting the functional behavior of the device. The instructions do not generate synchronous Data Abort exceptions, but the memory system operations might, under exceptional circumstances, generate asynchronous aborts. For more information, see Data Abort exception on page B1-1215.

For more information about the operation of these instructions see Behavior of Preload Data (PLD, PLDW) and Preload Instruction (PLI) with caches on page B2-1269.

Hardware implementations can provide other implementation-specific mechanisms to fetch memory locations into the cache. These must comply with the general cache behavior described in Cache behavior on page B2-1267.


Chapter A4

The Instruction Sets

This chapter describes the ARM and Thumb instruction sets. It contains the following sections:

• About the instruction sets on page A4-160
• Unified Assembler Language on page A4-162
• Branch instructions on page A4-164
• Data-processing instructions on page A4-165
• Status register access instructions on page A4-174
• Load/store instructions on page A4-175
• Load/store multiple instructions on page A4-177
• Miscellaneous instructions on page A4-178
• Exception-generating and exception-handling instructions on page A4-179
• Coprocessor instructions on page A4-180
• Advanced SIMD and Floating-point load/store instructions on page A4-181
• Advanced SIMD and Floating-point register transfer instructions on page A4-183
• Advanced SIMD data-processing instructions on page A4-184
• Floating-point data-processing instructions on page A4-191.


A4.1 About the instruction sets

ARMv7 contains two main instruction sets, the ARM and Thumb instruction sets. Much of the functionality available is identical in the two instruction sets. This chapter describes the functionality available in the instruction sets, and the Unified Assembler Language (UAL) that can be assembled to either instruction set.

The two instruction sets differ in how instructions are encoded:

• Thumb instructions are either 16-bit or 32-bit, and are aligned on a two-byte boundary. 16-bit and 32-bit instructions can be intermixed freely. Many common operations are most efficiently executed using 16-bit instructions. However:
  — Most 16-bit instructions can only access the first eight of the ARM core registers, R0-R7. These are called the low registers. A small number of 16-bit instructions can also access the high registers, R8-R15.
  — Many operations that would require two or more 16-bit instructions can be more efficiently executed with a single 32-bit instruction.
  — All 32-bit instructions can access all of the ARM core registers, R0-R15.
• ARM instructions are always 32-bit, and are aligned on a four-byte boundary.
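How a decoder tells 16-bit and 32-bit Thumb instructions apart is defined in Chapter A6, not in this section. Assuming the rule given there, that a first halfword whose bits[15:11] are 0b11101, 0b11110, or 0b11111 starts a 32-bit instruction, the following Python sketch splits a stream of halfwords into instructions. The function names are hypothetical.

```python
def thumb_instruction_size(first_halfword: int) -> int:
    """Return 2 or 4: the size in bytes of the Thumb instruction that starts
    with this halfword. Assumed rule (Chapter A6): bits[15:11] equal to
    0b11101, 0b11110, or 0b11111 mark the first halfword of a 32-bit
    instruction; any other value is a complete 16-bit instruction."""
    if not 0 <= first_halfword <= 0xFFFF:
        raise ValueError("expected a 16-bit halfword")
    return 4 if (first_halfword >> 11) in (0b11101, 0b11110, 0b11111) else 2


def split_thumb_stream(halfwords):
    """Group a sequence of halfwords, in program order, into instructions."""
    instrs, i = [], 0
    while i < len(halfwords):
        count = thumb_instruction_size(halfwords[i]) // 2
        instrs.append(tuple(halfwords[i:i + count]))
        i += count
    return instrs


stream = [0x4770, 0xF000, 0xF800]
print([tuple(hex(h) for h in ins) for ins in split_thumb_stream(stream)])
# [('0x4770',), ('0xf000', '0xf800')]
```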

The ARM and Thumb instruction sets can interwork freely, that is, different procedures can be compiled or assembled to different instruction sets, and still be able to call each other efficiently.

ThumbEE is a variant of the Thumb instruction set that is designed as a target for dynamically generated code. However, it cannot interwork freely with the ARM and Thumb instruction sets.

In an implementation that includes a non-trivial Jazelle extension, the processor can execute some Java bytecodes in hardware. For more information see Jazelle direct bytecode execution support on page A2-97. The processor executes Java bytecodes when it is in Jazelle state. However, this execution is outside the scope of this manual.

See:

• Chapter A5 ARM Instruction Set Encoding for encoding details of the ARM instruction set
• Chapter A6 Thumb Instruction Set Encoding for encoding details of the Thumb instruction set
• Chapter A8 Instruction Descriptions for detailed descriptions of the instructions
• Chapter A9 The ThumbEE Instruction Set for encoding details of the ThumbEE instruction set.

A4.1.1 Changing between Thumb state and ARM state

A processor in ARM state executes ARM instructions, and a processor in Thumb state executes Thumb instructions. A processor in Thumb state can enter ARM state by executing any of the following instructions: BX, BLX, or an LDR or LDM that loads the PC.

A processor in ARM state can enter Thumb state by executing any of the same instructions.

In ARMv7, a processor in ARM state can also enter Thumb state by executing an

ADC

ADD

AND

ASR

BIC

EOR

LSL

LSR

MOV

MVN

ORR

ROR

RRX

RSB

RSC

SBC

, or

SUB

instruction that has the PC as destination register and does not set the

condition flags.

Note

This permits calls and returns between ARM code written for ARMv4 processors and Thumb code running on

ARMv7 processors to function correctly. ARM recommends that new software uses

BLX

instructions instead.

In particular, ARM recommends that software uses

BX LR

to return from a procedure, not

MOV PC, LR

The target instruction set is either encoded directly in the instruction (for the immediate offset version of

BLX

), or is

held as bit[0] of an interworking address. For details, see the description of the

BXWritePC()

function in Pseudocode

details of operations on ARM core registers on page A2-47.
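A minimal Python model of how an interworking address selects the target instruction set, assuming the behavior of BXWritePC() as we understand it: bit[0] set selects Thumb state and is cleared from the branch target; bits[1:0] equal to 0b00 select ARM state; bits[1:0] equal to 0b10 are UNPREDICTABLE, which this sketch reports as an error. The ThumbEE case is omitted, and the name bx_write_pc is hypothetical.

```python
def bx_write_pc(address: int) -> tuple[str, int]:
    """Model how an interworking address selects the instruction set.

    Returns (instruction_set, branch_target). ThumbEE handling is omitted.
    """
    address &= 0xFFFFFFFF
    if address & 0b01:
        # bit[0] == 1: switch to Thumb state; bit[0] is cleared from the target.
        return "Thumb", address & 0xFFFFFFFE
    if (address & 0b10) == 0:
        # bits[1:0] == 0b00: switch to ARM state at a word-aligned target.
        return "ARM", address
    # bits[1:0] == 0b10 is UNPREDICTABLE; this sketch reports it as an error.
    raise ValueError(f"UNPREDICTABLE interworking address {address:#010x}")


print(bx_write_pc(0x00008001))  # ('Thumb', 32768)
```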

Exception entries and returns can also change between ARM and Thumb states. For details see Exception handling on page B1-1165.

A4 The Instruction Sets

A4.1 About the instruction sets

ID051414 Non-Confidential

A4.1.2 Conditional execution

In the ARM and Thumb instruction sets, most instructions can be conditionally executed.

In the ARM instruction set, conditional execution means that an instruction only has its normal effect on the programmers’ model operation, memory and coprocessors if the N, Z, C and V condition flags in the APSR satisfy a condition specified by the cond field in the instruction encoding. If the flags do not satisfy this condition, the instruction acts as a NOP, that is, execution advances to the next instruction as normal, including any relevant checks for exceptions being taken, but has no other effect.
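The condition test itself can be sketched in Python. This assumes the standard 4-bit cond encodings (EQ = 0b0000 through AL = 0b1110) described under Conditional execution on page A8-288: the top three bits select a test on the N, Z, C and V flags, and an odd cond value inverts that test, except for 0b1111, which this sketch treats as always passing. The name condition_passed is hypothetical.

```python
def condition_passed(cond: int, n: bool, z: bool, c: bool, v: bool) -> bool:
    """Return True if flags N, Z, C, V satisfy the 4-bit condition cond."""
    base = (cond >> 1) & 0b111
    if base == 0b000:
        result = z                    # EQ (0b0000) / NE (0b0001)
    elif base == 0b001:
        result = c                    # CS / CC
    elif base == 0b010:
        result = n                    # MI / PL
    elif base == 0b011:
        result = v                    # VS / VC
    elif base == 0b100:
        result = c and not z          # HI / LS
    elif base == 0b101:
        result = n == v               # GE / LT
    elif base == 0b110:
        result = n == v and not z     # GT / LE
    else:
        result = True                 # AL (0b1110), and 0b1111
    # An odd condition code tests the inverse, except for 0b1111.
    if cond & 1 and cond != 0b1111:
        result = not result
    return result


# EQ passes when Z is set; NE (its inverse) then fails.
print(condition_passed(0b0000, n=False, z=True, c=False, v=False))  # True
print(condition_passed(0b0001, n=False, z=True, c=False, v=False))  # False
```

If condition_passed returns False, the instruction behaves as the NOP described above.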

In the Thumb instruction set, different mechanisms control conditional execution:

• For the following Thumb encodings, conditional execution is controlled in a similar way to the ARM instructions:
  - A 16-bit conditional branch instruction encoding, with a branch range of -256 to +254 bytes. Before ARMv6T2, this was the only mechanism for conditional execution in Thumb code.
  - A 32-bit conditional branch instruction encoding, with a branch range of approximately ±1MB.
  For more information about these encodings see B on page A8-334.

• The CBZ and CBNZ instructions, Compare and Branch on Zero and Compare and Branch on Nonzero, are 16-bit conditional instructions with a branch range of +4 to +130 bytes. For details see CBNZ, CBZ on page A8-356.

• The 16-bit If-Then (IT) instruction makes up to four following instructions conditional, and can make most other Thumb instructions conditional. For details see IT on page A8-390. The instructions that are made conditional by an IT instruction are called its IT block. For any IT block, either:
  - all instructions have the same condition
  - some instructions have one condition, and the other instructions have the inverse condition.
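The branch ranges quoted above follow from the widths of the offset fields. The Python sketch below assumes, from our reading of the B and CBNZ/CBZ descriptions, that the 16-bit conditional branch holds a signed imm8 scaled by 2, the 32-bit conditional branch a signed 20-bit field (S:J2:J1:imm6:imm11) scaled by 2, and CBZ/CBNZ an unsigned 6-bit field (i:imm5) scaled by 2, all added to the PC value (instruction address plus 4). On that reading, the -256 to +254 figure is measured from the PC value, while +4 to +130 for CBZ/CBNZ is measured from the instruction address (0 to 126 bytes from the PC value). Names are hypothetical.

```python
THUMB_PC_OFFSET = 4  # the PC value of a Thumb instruction is its address + 4


def offset_range(bits: int, signed: bool, scale: int) -> tuple[int, int]:
    """Smallest and largest byte offset an immediate field can encode."""
    if signed:
        lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    else:
        lo, hi = 0, (1 << bits) - 1
    return lo * scale, hi * scale


def from_instruction_address(rng: tuple[int, int]) -> tuple[int, int]:
    """Re-express a range measured from the PC value relative to the instruction address."""
    return rng[0] + THUMB_PC_OFFSET, rng[1] + THUMB_PC_OFFSET


B_16BIT_COND = offset_range(8, signed=True, scale=2)    # imm8:'0'
B_32BIT_COND = offset_range(20, signed=True, scale=2)   # S:J2:J1:imm6:imm11:'0'
CBZ_CBNZ = offset_range(6, signed=False, scale=2)       # i:imm5:'0'

print(B_16BIT_COND)                        # (-256, 254)
print(B_32BIT_COND)                        # (-1048576, 1048574), about ±1MB
print(CBZ_CBNZ)                            # (0, 126)
print(from_instruction_address(CBZ_CBNZ))  # (4, 130)
```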

ARM deprecates the conditional execution of any instruction encoding provided by the Advanced SIMD Extension that is not also provided by the Floating-point (VFP) Extension, and strongly recommends that any such instruction that can be conditionally executed is specified with the <c> field omitted or set to AL.

For more information about conditional execution see Conditional execution on page A8-288.

A4.1.3 Writing to the PC

Writing to the PC on page A2-46 gives an overview of instructions that write to the PC, including the required behavior of these writes. This information is also given in the appropriate sections of this chapter.

A4.1.4 Permanently UNDEFINED encodings

All versions of the ARM architecture define some encodings as permanently UNDEFINED. Such encodings exist in the ARM instruction set encodings, and in both the 16-bit and 32-bit Thumb encodings. From issue C.a of this manual, ARM defines an assembler mnemonic, UDF, for the unconditional forms of these instructions; see UDF on page A8-758.


A4.2 Unified Assembler Language

This document uses the ARM Unified Assembler Language (UAL). This assembly language syntax provides a canonical form for all ARM and Thumb instructions.

UAL describes the syntax for the mnemonic and the operands of each instruction. In addition, it assumes that instructions and data items can be given labels. It does not specify the syntax to be used for labels, nor what assembler directives and options are available. See your assembler documentation for these details.

Most earlier ARM assembly language mnemonics are still supported as synonyms, as described in the instruction details.

Note

Most earlier Thumb assembly language mnemonics are not supported. For more information, see Appendix D8 Legacy Instruction Mnemonics.

UAL includes instruction selection rules that specify which instruction encoding is selected when more than one can provide the required functionality. For example, both 16-bit and 32-bit encodings exist for an ADD R0, R1, R2 instruction. The most common instruction selection rule is that when both a 16-bit encoding and a 32-bit encoding are available, the 16-bit encoding is selected, to optimize code density.

Syntax options exist to override the normal instruction selection rules and ensure that a particular encoding is selected. These are useful when disassembling code, to ensure that subsequent assembly produces the original code, and in some other situations.

A4.2.1 Conditional instructions

For maximum portability of UAL assembly language between the ARM and Thumb instruction sets, ARM recommends that:

• IT instructions are written before conditional instructions in the correct way for the Thumb instruction set.

• When assembling to the ARM instruction set, assemblers check that any IT instructions are correct, but do not generate any code for them.

Although other Thumb instructions are unconditional, all instructions that are made conditional by an IT instruction must be written with a condition. These conditions must match the conditions imposed by the IT instruction. For example, an ITTEE EQ instruction imposes the EQ condition on the first two following instructions, and the NE condition on the next two. Those four instructions must be written with EQ, EQ, NE and NE conditions respectively.
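The matching rule can be checked mechanically. This Python sketch, with the hypothetical name it_block_conditions, expands an IT mnemonic and its first condition into the condition that each instruction in the IT block must be written with. It assumes that each T repeats the first condition and each E uses its inverse, that AL has no inverse (so E cannot follow IT AL), and that only the two-letter condition names are used (synonyms such as HS and LO are not handled).

```python
# Inverse of each condition that has one (AL has none).
INVERSE = {
    "EQ": "NE", "NE": "EQ", "CS": "CC", "CC": "CS", "MI": "PL", "PL": "MI",
    "VS": "VC", "VC": "VS", "HI": "LS", "LS": "HI", "GE": "LT", "LT": "GE",
    "GT": "LE", "LE": "GT",
}


def it_block_conditions(mnemonic: str, firstcond: str) -> list[str]:
    """Conditions required on each instruction of the IT block.

    mnemonic is e.g. "IT", "ITE" or "ITTEE"; firstcond is e.g. "EQ".
    """
    if not mnemonic.startswith("IT"):
        raise ValueError("not an IT mnemonic")
    pattern = mnemonic[2:]  # the optional x, y, z letters
    if len(pattern) > 3 or any(ch not in "TE" for ch in pattern):
        raise ValueError("IT takes at most three further T or E letters")
    conditions = [firstcond]  # the first instruction always uses firstcond
    for ch in pattern:
        if ch == "T":
            conditions.append(firstcond)
        elif firstcond in INVERSE:
            conditions.append(INVERSE[firstcond])
        else:
            raise ValueError(f"{firstcond} has no inverse, so E is not allowed")
    return conditions


print(it_block_conditions("ITTEE", "EQ"))  # ['EQ', 'EQ', 'NE', 'NE']
```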

Some instructions cannot be made conditional by an IT instruction. Some instructions can be conditional if they are the last instruction in the IT block, but not otherwise.

The branch instruction encodings that include a condition code field cannot be made conditional by an IT instruction. If the assembler syntax indicates a conditional branch that correctly matches a preceding IT instruction, it is assembled using a branch instruction encoding that does not include a condition code field.

A4.2.2 Use of labels in UAL instruction syntax

The UAL syntax for some instructions includes the label of an instruction or a literal data item that is at a fixed offset from the instruction being specified. The assembler must:

1. Calculate the PC or Align(PC, 4) value of the instruction. The PC value of an instruction is its address plus 4 for a Thumb instruction, or plus 8 for an ARM instruction. The Align(PC, 4) value of an instruction is its PC value ANDed with 0xFFFFFFFC to force it to be word-aligned. There is no difference between the PC and Align(PC, 4) values for an ARM instruction, but there can be for a Thumb instruction.

2. Calculate the offset from the PC or Align(PC, 4) value of the instruction to the address of the labelled instruction or literal data item.

3. Assemble a PC-relative encoding of the instruction, that is, one that reads its PC or Align(PC, 4) value and adds the calculated offset to form the required address.
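The three steps can be sketched in Python as follows. The function names are hypothetical; aligned=True models instructions that use the Align(PC, 4) value, such as the literal loads listed in this section, and aligned=False those that use the PC value, such as B. The example shows why the two values can differ for a Thumb instruction at an address that is halfword-aligned but not word-aligned.

```python
MASK32 = 0xFFFFFFFF


def pc_value(address: int, thumb: bool) -> int:
    """PC value of an instruction: its address plus 4 (Thumb) or plus 8 (ARM)."""
    return (address + (4 if thumb else 8)) & MASK32


def align_pc4(address: int, thumb: bool) -> int:
    """Align(PC, 4): the PC value ANDed with 0xFFFFFFFC."""
    return pc_value(address, thumb) & 0xFFFFFFFC


def offset_to_label(address: int, label: int, thumb: bool, aligned: bool) -> int:
    """Step 2: offset from the PC or Align(PC, 4) value to the label."""
    base = align_pc4(address, thumb) if aligned else pc_value(address, thumb)
    return label - base


# A Thumb instruction at a halfword-aligned (but not word-aligned) address:
print(hex(pc_value(0x1002, thumb=True)))   # 0x1006
print(hex(align_pc4(0x1002, thumb=True)))  # 0x1004
print(offset_to_label(0x1002, 0x1010, thumb=True, aligned=True))  # 12
```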


Note

For instructions that can encode a subtraction operation, if the instruction cannot encode the calculated offset but can encode minus the calculated offset, the instruction encoding specifies a subtraction of minus the calculated offset.

The syntax of the following instructions includes a label:

• B, BL, and BLX (immediate). The assembler syntax for these instructions always specifies the label of the instruction that they branch to. Their encodings specify a sign-extended immediate offset that is added to the PC value of the instruction to form the target address of the branch.

• CBNZ and CBZ. The assembler syntax for these instructions always specifies the label of the instruction that they branch to. Their encodings specify a zero-extended immediate offset that is added to the PC value of the instruction to form the target address of the branch. They do not support backward branches.

• LDC, LDC2, LDR, LDRB, LDRD, LDRH, LDRSB, LDRSH, PLD, PLDW, PLI, and VLDR. The normal assembler syntax of these load instructions can specify the label of a literal data item that is to be loaded. The encodings of these instructions specify a zero-extended immediate offset that is either added to or subtracted from the Align(PC, 4) value of the instruction to form the address of the data item. A few such encodings perform a fixed addition or a fixed subtraction and must only be used when that operation is required, but most contain a bit that specifies whether the offset is to be added or subtracted.
  When the assembler calculates an offset of 0 for the normal syntax of these instructions, it must assemble an encoding that adds 0 to the Align(PC, 4) value of the instruction. Encodings that subtract 0 from the Align(PC, 4) value cannot be specified by the normal syntax.
  There is an alternative syntax for these instructions that specifies the addition or subtraction and the immediate offset explicitly. In this syntax, the label is replaced by [PC, #+/-<imm>], where +/- is + or omitted to specify that the immediate offset is to be added to the Align(PC, 4) value, or - if it is to be subtracted, and <imm> is the immediate offset.
  This alternative syntax makes it possible to assemble the encodings that subtract 0 from the Align(PC, 4) value, and to disassemble them to a syntax that can be re-assembled correctly.

• ADR. The normal assembler syntax for this instruction can specify the label of an instruction or literal data item whose address is to be calculated. Its encoding specifies a zero-extended immediate offset that is either added to or subtracted from the Align(PC, 4) value of the instruction to form the address of the data item, and some opcode bits that determine whether it is an addition or subtraction.
  When the assembler calculates an offset of 0 for the normal syntax of this instruction, it must assemble the encoding that adds 0 to the Align(PC, 4) value of the instruction. The encoding that subtracts 0 from the Align(PC, 4) value cannot be specified by the normal syntax.
  There is an alternative syntax for this instruction that specifies the addition or subtraction and the immediate value explicitly, by writing them as additions ADD <Rd>, PC, #<imm> or subtractions SUB <Rd>, PC, #<imm>.
  This alternative syntax makes it possible to assemble the encoding that subtracts 0 from the Align(PC, 4) value, and to disassemble it to a syntax that can be re-assembled correctly.
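For encodings that carry an add/subtract choice, the offset-of-0 rule and the explicit [PC, #+/-<imm>] form can be sketched in Python. The function names and the max_imm parameter are hypothetical; the usage below assumes a 12-bit immediate field (max_imm=4095), and encodings that only add or only subtract are not modeled.

```python
def choose_add_or_subtract(offset: int, max_imm: int) -> tuple[bool, int]:
    """Pick (add, imm) for an encoding with an add/subtract choice.

    A calculated offset of 0 always yields the adding form, because the
    normal syntax must never select the 'subtract 0' encoding.
    """
    add = offset >= 0
    imm = abs(offset)
    if imm > max_imm:
        raise ValueError(f"offset {offset} does not fit in the immediate field")
    return add, imm


def alternative_syntax(add: bool, imm: int) -> str:
    """Render the explicit [PC, #+/-<imm>] form, omitting the '+'."""
    sign = "" if add else "-"
    return f"[PC, #{sign}{imm}]"


# Assuming a 12-bit immediate field (max_imm=4095):
print(choose_add_or_subtract(-8, 4095))  # (False, 8)
print(choose_add_or_subtract(0, 4095))   # (True, 0)
print(alternative_syntax(False, 0))      # [PC, #-0]
```

Only the explicit syntax can produce the "subtract 0" form, which is why a disassembler needs it to round-trip such encodings.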

Note

ARM recommends that where possible, software avoids using:

• The alternative syntax for the ADR, LDC, LDC2, LDR, LDRB, LDRD, LDRH, LDRSB, LDRSH, PLD, PLI, PLDW, and VLDR instructions.

• The encodings of these instructions that subtract 0 from the Align(PC, 4) value.


A4.3 Branch instructions

Table A4-1 summarizes the branch instructions in the ARM and Thumb instruction sets. In addition to providing for changes in the flow of execution, some branch instructions can change instruction set.

Branches to loaded and calculated addresses can be performed by LDR, LDM and data-processing instructions. For details see Load/store instructions on page A4-175, Load/store multiple instructions on page A4-177, Standard data-processing instructions on page A4-165, and Shift instructions on page A4-167.

In addition to the branch instructions shown in Table A4-1:

• In the ARM instruction set, a data-processing instruction that targets the PC behaves as a branch instruction. For more information, see Data-processing instructions on page A4-165.

• In the ARM and Thumb instruction sets, a load instruction that targets the PC behaves as a branch instruction. For more information, see Load/store instructions on page A4-175.

Table A4-1 Branch instructions

| Instruction | See | Range, Thumb | Range, ARM |
|---|---|---|---|
| Branch to target address | B on page A8-334 | ±16MB | ±32MB |
| Compare and Branch on Nonzero, Compare and Branch on Zero | CBNZ, CBZ on page A8-356 | 0-126 bytes | - (a) |
| Call a subroutine; Call a subroutine, change instruction set (b) | BL, BLX (immediate) on page A8-348 | ±16MB | ±32MB |
| Call a subroutine, optionally change instruction set | BLX (register) on page A8-350 | Any | Any |
| Branch to target address, change instruction set | BX on page A8-352 | Any | Any |
| Change to Jazelle state | BXJ on page A8-354 | - | - |
| Table Branch (byte offsets); Table Branch (halfword offsets) | TBB, TBH on page A8-736 | 0-510 bytes; 0-131070 bytes | - (a) |

a. These instructions do not exist in the ARM instruction set.

b. The range is determined by the instruction set of the BLX instruction, not of the instruction it branches to.
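The ranges in Table A4-1 can be reproduced from the widths of the offset fields. This Python sketch assumes, from our reading of the encodings, a signed imm24 scaled by 4 for ARM B/BL, a signed 24-bit field (S:I1:I2:imm10:imm11) scaled by 2 for 32-bit Thumb B/BL, and unsigned byte or halfword table entries scaled by 2 for TBB/TBH. All ranges are measured from the PC value; the helper name is hypothetical.

```python
MB = 1 << 20


def offset_range(bits: int, signed: bool, scale: int) -> tuple[int, int]:
    """Smallest and largest byte offset an immediate field of `bits` bits can encode."""
    if signed:
        lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    else:
        lo, hi = 0, (1 << bits) - 1
    return lo * scale, hi * scale


ARM_B_BL = offset_range(24, signed=True, scale=4)     # imm24:'00'
THUMB_B_BL = offset_range(24, signed=True, scale=2)   # S:I1:I2:imm10:imm11:'0'
TBB = offset_range(8, signed=False, scale=2)          # byte table entry x 2
TBH = offset_range(16, signed=False, scale=2)         # halfword table entry x 2

print(ARM_B_BL == (-32 * MB, 32 * MB - 4))    # True
print(THUMB_B_BL == (-16 * MB, 16 * MB - 2))  # True
print(TBB, TBH)                               # (0, 510) (0, 131070)
```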


A4.4 Data-processing instructions

Core data-processing instructions belong to one of the following groups:

• Standard data-processing instructions. These instructions perform basic data-processing operations, and share a common format with some variations.

• Shift instructions on page A4-167.

• Multiply instructions on page A4-167.

• Saturating instructions on page A4-169.

• Saturating addition and subtraction instructions on page A4-169.

• Packing and unpacking instructions on page A4-170.

• Parallel addition and subtraction instructions on page A4-171.

• Divide instructions on page A4-172.

• Miscellaneous data-processing instructions on page A4-173.

For extension data-processing instructions, see Advanced SIMD data-processing instructions on page A4-184 and Floating-point data-processing instructions on page A4-191.

A4.4.1 Standard data-processing instructions

These instructions generally have a destination register Rd, a first operand register Rn, and a second operand. The second operand can be another register Rm, or an immediate constant.

If the second operand is an immediate constant, it can be:

• Encoded directly in the instruction.

• A modified immediate constant that uses 12 bits of the instruction to encode a range of constants. Thumb and ARM instructions have slightly different ranges of modified immediate constants. For more information, see Modified immediate constants in Thumb instructions on page A6-232 and Modified immediate constants in ARM instructions on page A5-200.
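The ARM form of a modified immediate constant can be sketched in Python. As we understand the ARM scheme, the 12 bits hold an 8-bit value and a 4-bit rotation, and the constant is the 8-bit value rotated right by twice the rotation. This sketch does not cover the Thumb scheme, which also allows replicated byte patterns such as 0x00XY00XY. Function names are hypothetical, and arm_encode_imm returns the encoding with the smallest rotation when several exist.

```python
from typing import Optional

MASK32 = 0xFFFFFFFF


def ror32(value: int, amount: int) -> int:
    """Rotate a 32-bit value right by amount bits."""
    amount %= 32
    value &= MASK32
    return ((value >> amount) | (value << (32 - amount))) & MASK32


def arm_expand_imm(imm12: int) -> int:
    """Decode an ARM modified immediate: imm8 rotated right by 2 * imm12[11:8]."""
    return ror32(imm12 & 0xFF, 2 * ((imm12 >> 8) & 0xF))


def arm_encode_imm(value: int) -> Optional[int]:
    """Return an imm12 that decodes to value, or None if it is not encodable."""
    value &= MASK32
    for rot in range(16):
        # Undo a right rotation by 2*rot with a right rotation by 32 - 2*rot.
        imm8 = ror32(value, 32 - 2 * rot)
        if imm8 <= 0xFF:
            return (rot << 8) | imm8
    return None


print(hex(arm_encode_imm(0xFF000000)))  # 0x4ff
print(arm_encode_imm(0x101))            # None: the set bits span 9 bit positions
```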

If the second operand is another register, it can optionally be shifted in any of the following ways:

• LSL: Logical Shift Left by 1-31 bits.

• LSR: Logical Shift Right by 1-32 bits.

• ASR: Arithmetic Shift Right by 1-32 bits.

• ROR: Rotate Right by 1-31 bits.

• RRX: Rotate Right with Extend. For details see Shift and rotate operations on page A2-41.

In Thumb code, the amount to shift by is always a constant encoded in the instruction. In ARM code, the amount to shift by is either a constant encoded in the instruction, or the value of a register, Rs.
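A Python sketch of these operand shifts, including the carry-out that a flag-setting instruction can use, loosely modeled on the Shift_C() pseudocode referred to in Shift and rotate operations on page A2-41. The name shift_c and its string shift-type argument are hypothetical. It accepts any non-negative amount, which also covers register-specified shifts in ARM code; an amount of 0 returns the value and carry unchanged.

```python
MASK32 = 0xFFFFFFFF


def shift_c(value: int, shift_type: str, amount: int, carry_in: bool) -> tuple[int, bool]:
    """Return (result, carry_out) for a shifted register operand."""
    value &= MASK32
    if shift_type == "RRX":
        # Rotate right by one bit through the carry flag.
        return (value >> 1) | (int(carry_in) << 31), bool(value & 1)
    if amount == 0:
        return value, carry_in  # no shift: value and carry unchanged
    if shift_type == "LSL":
        extended = value << amount
        return extended & MASK32, bool((extended >> 32) & 1)
    if shift_type == "LSR":
        return value >> amount, bool((value >> (amount - 1)) & 1)
    if shift_type == "ASR":
        signed = value - (1 << 32) if value & 0x80000000 else value
        # Python's >> on a negative int is an arithmetic (sign-filling) shift.
        return (signed >> amount) & MASK32, bool((signed >> (amount - 1)) & 1)
    if shift_type == "ROR":
        m = amount % 32
        result = ((value >> m) | (value << (32 - m))) & MASK32
        return result, bool(result >> 31)
    raise ValueError(f"unknown shift type {shift_type!r}")


print(shift_c(0x80000001, "LSL", 1, False))  # (2, True)
```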

For instructions other than CMN, CMP, TEQ, and TST, the result of the data-processing operation is placed in the destination register. In the ARM instruction set, the destination register can be the PC, causing the result to be treated as a branch address. In the Thumb instruction set, this is only permitted for some 16-bit forms of the ADD and MOV instructions.

These instructions can optionally set the condition flags, according to the result of the operation. If they do not set the flags, existing flag settings from a previous instruction are preserved.

Table A4-2 on page A4-166 summarizes the main data-processing instructions in the Thumb and ARM instruction sets. Generally, each of these instructions is described in three sections in Chapter A8 Instruction Descriptions, one section for each of the following:

• INSTRUCTION (immediate) where the second operand is a modified immediate constant.

• INSTRUCTION (register) where the second operand is a register, or a register shifted by a constant.

• INSTRUCTION (register-shifted register) where the second operand is a register shifted by a value obtained from another register. These are only available in the ARM instruction set.


Table A4-2 Standard data-processing instructions

| Instruction | Mnemonic | Notes |
|---|---|---|
| Add with Carry | ADC | - |
| Add | ADD | Thumb instruction set permits use of a modified immediate constant or a zero-extended 12-bit immediate constant. |
| Form PC-relative Address | ADR | First operand is the PC. Second operand is an immediate constant. Thumb instruction set uses a zero-extended 12-bit immediate constant. Operation is an addition or a subtraction. |
| Bitwise AND | AND | - |
| Bitwise Bit Clear | BIC | - |
| Compare Negative | CMN | Sets flags. Like ADD but with no destination register. |
| Compare | CMP | Sets flags. Like SUB but with no destination register. |
| Bitwise Exclusive OR | EOR | - |
| Copy operand to destination | MOV | Has only one operand, with the same options as the second operand in most of these instructions. If the operand is a shifted register, the instruction is an LSL, LSR, ASR, or ROR instruction instead. For details see Shift instructions on page A4-167. The ARM and Thumb instruction sets permit use of a modified immediate constant or a zero-extended 16-bit immediate constant. |
| Bitwise NOT | MVN | Has only one operand, with the same options as the second operand in most of these instructions. |
| Bitwise OR NOT | ORN | Not available in the ARM instruction set. |
| Bitwise OR | ORR | - |
| Reverse Subtract | RSB | Subtracts first operand from second operand. This permits subtraction from constants and shifted registers. |
| Reverse Subtract with Carry | RSC | Not available in the Thumb instruction set. |
| Subtract with Carry | SBC | - |
| Subtract | SUB | Thumb instruction set permits use of a modified immediate constant or a zero-extended 12-bit immediate constant. |
| Test Equivalence | TEQ | Sets flags. Like EOR but with no destination register. |
| Test | TST | Sets flags. Like AND but with no destination register. |


A4.4.2 Shift instructions

Table A4-3 lists the shift instructions in the ARM and Thumb instruction sets.

In the ARM instruction set only, the destination register of these instructions can be the PC, causing the result to be treated as an address to branch to.

A4.4.3 Multiply instructions

These instructions can operate on signed or unsigned quantities. In some types of operation, the results are the same whether the operands are signed or unsigned.

• Table A4-4 summarizes the multiply instructions where there is no distinction between signed and unsigned quantities. The least significant 32 bits of the result are used. More significant bits are discarded.

• Table A4-5 on page A4-168 summarizes the signed multiply instructions.

• Table A4-6 on page A4-168 summarizes the unsigned multiply instructions.

Table A4-3 Shift instructions

| Instruction | See |
|---|---|
| Arithmetic Shift Right | ASR (immediate) on page A8-330 |
| Arithmetic Shift Right | ASR (register) on page A8-332 |
| Logical Shift Left | LSL (immediate) on page A8-468 |
| Logical Shift Left | LSL (register) on page A8-470 |
| Logical Shift Right | LSR (immediate) on page A8-472 |
| Logical Shift Right | LSR (register) on page A8-474 |
| Rotate Right | ROR (immediate) on page A8-568 |
| Rotate Right | ROR (register) on page A8-570 |
| Rotate Right with Extend | RRX on page A8-572 |

Table A4-4 General multiply instructions

| Instruction | See | Operation (number of bits) |
|---|---|---|
| Multiply Accumulate | MLA on page A8-480 | 32 = 32 + 32 × 32 |
| Multiply and Subtract | MLS on page A8-482 | 32 = 32 - 32 × 32 |
| Multiply | MUL on page A8-502 | 32 = 32 × 32 |
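Why one set of instructions serves both signed and unsigned operands: the low 32 bits of a product do not depend on how the operands are interpreted. A minimal Python sketch, with hypothetical function names, keeps only bits [31:0] of each result as the text describes and compares the two interpretations.

```python
MASK32 = 0xFFFFFFFF


def to_signed32(x: int) -> int:
    """Interpret the low 32 bits of x as a two's-complement integer."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x


def mul(rn: int, rm: int) -> int:
    return (rn * rm) & MASK32  # keep bits [31:0] of the product


def mla(rn: int, rm: int, ra: int) -> int:
    return (ra + rn * rm) & MASK32


def mls(rn: int, rm: int, ra: int) -> int:
    return (ra - rn * rm) & MASK32


a, b = 0xFFFFFFFF, 3  # a is 4294967295 unsigned, -1 signed
unsigned_low = mul(a, b)
signed_low = (to_signed32(a) * to_signed32(b)) & MASK32
print(hex(unsigned_low), unsigned_low == signed_low)  # 0xfffffffd True
```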


Table A4-5 Signed multiply instructions

| Instruction | See | Operation (number of bits) |
|---|---|---|
| Signed Multiply Accumulate (halfwords) | SMLABB, SMLABT, SMLATB, SMLATT on page A8-620 | 32 = 32 + 16 × 16 |
| Signed Multiply Accumulate Dual | SMLAD on page A8-622 | 32 = 32 + 16 × 16 + 16 × 16 |
| Signed Multiply Accumulate Long | SMLAL on page A8-624 | 64 = 64 + 32 × 32 |
| Signed Multiply Accumulate Long (halfwords) | SMLALBB, SMLALBT, SMLALTB, SMLALTT on page A8-626 | 64 = 64 + 16 × 16 |
| Signed Multiply Accumulate Long Dual | SMLALD on page A8-628 | 64 = 64 + 16 × 16 + 16 × 16 |
| Signed Multiply Accumulate (word by halfword) | SMLAWB, SMLAWT on page A8-630 | 32 = 32 + 32 × 16 (a) |
| Signed Multiply Subtract Dual | SMLSD on page A8-632 | 32 = 32 + 16 × 16 - 16 × 16 |
| Signed Multiply Subtract Long Dual | SMLSLD on page A8-634 | 64 = 64 + 16 × 16 - 16 × 16 |
| Signed Most Significant Word Multiply Accumulate | SMMLA on page A8-636 | 32 = 32 + 32 × 32 (b) |
| Signed Most Significant Word Multiply Subtract | SMMLS on page A8-638 | 32 = 32 - 32 × 32 (b) |
| Signed Most Significant Word Multiply | SMMUL on page A8-640 | 32 = 32 × 32 (b) |
| Signed Dual Multiply Add | SMUAD on page A8-642 | 32 = 16 × 16 + 16 × 16 |
| Signed Multiply (halfwords) | SMULBB, SMULBT, SMULTB, SMULTT on page A8-644 | 32 = 16 × 16 |
| Signed Multiply Long | SMULL on page A8-646 | 64 = 32 × 32 |
| Signed Multiply (word by halfword) | SMULWB, SMULWT on page A8-648 | 32 = 32 × 16 (a) |
| Signed Dual Multiply Subtract | SMUSD on page A8-650 | 32 = 16 × 16 - 16 × 16 |

a. The most significant 32 bits of the 48-bit product are used. Less significant bits are discarded.

b. The most significant 32 bits of the 64-bit product are used. Less significant bits are discarded.
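Footnotes a and b can be illustrated in Python. smulw keeps bits [47:16] of the 48-bit product of a word and a signed halfword, and smmul keeps bits [63:32] of the 64-bit product. The optional rounding in smmul models the R-suffixed form (SMMULR) as we understand it from the SMMUL description; function names are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def s32(x: int) -> int:
    """Interpret the low 32 bits of x as a signed integer."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x


def s16(x: int) -> int:
    """Interpret the low 16 bits of x as a signed integer."""
    x &= 0xFFFF
    return x - (1 << 16) if x & 0x8000 else x


def smulw(rn: int, rm: int, top: bool = False) -> int:
    """SMULWB (top=False) / SMULWT (top=True): bits [47:16] of a 48-bit product."""
    half = (rm >> 16) if top else rm
    product = s32(rn) * s16(half)
    return (product >> 16) & MASK32


def smmul(rn: int, rm: int, round_result: bool = False) -> int:
    """SMMUL (and SMMULR if round_result): bits [63:32] of the 64-bit product."""
    product = s32(rn) * s32(rm)
    if round_result:
        product += 0x80000000  # add half of one unit of the retained result
    return (product >> 32) & MASK32


print(hex(smmul(1, 0x80000000)))                     # 0xffffffff
print(hex(smmul(1, 0x80000000, round_result=True)))  # 0x0
```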

Table A4-6 Unsigned multiply instructions

| Instruction | See | Operation (number of bits) |
|---|---|---|
| Unsigned Multiply Accumulate Accumulate Long | UMAAL on page A8-774 | 64 = 32 + 32 + 32 × 32 |
| Unsigned Multiply Accumulate Long | UMLAL on page A8-776 | 64 = 64 + 32 × 32 |
| Unsigned Multiply Long | UMULL on page A8-778 | 64 = 32 × 32 |
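The UMAAL entry, 64 = 32 + 32 + 32 × 32, implies that its result always fits in 64 bits, because (2^32 - 1)^2 + 2(2^32 - 1) = 2^64 - 1. A minimal Python model of the three unsigned long multiplies, using hypothetical names and returning (RdLo, RdHi) pairs, checks this worst case. UMLAL is modeled as wrapping modulo 2^64.

```python
MASK32 = 0xFFFFFFFF
MASK64 = (1 << 64) - 1


def split64(value: int) -> tuple[int, int]:
    """Return (RdLo, RdHi) for a 64-bit result."""
    return value & MASK32, (value >> 32) & MASK32


def umull(rn: int, rm: int) -> tuple[int, int]:
    return split64((rn & MASK32) * (rm & MASK32))


def umlal(rdlo: int, rdhi: int, rn: int, rm: int) -> tuple[int, int]:
    acc = ((rdhi & MASK32) << 32) | (rdlo & MASK32)
    return split64((acc + (rn & MASK32) * (rm & MASK32)) & MASK64)  # wraps modulo 2^64


def umaal(rdlo: int, rdhi: int, rn: int, rm: int) -> tuple[int, int]:
    # Two independent 32-bit addends plus a 32 x 32 product.
    total = (rn & MASK32) * (rm & MASK32) + (rdlo & MASK32) + (rdhi & MASK32)
    return split64(total)


worst = MASK32 * MASK32 + MASK32 + MASK32
print(worst == MASK64)  # True: UMAAL never overflows 64 bits
```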


A4.4.4 Saturating instructions

Table A4-7 lists the saturating instructions in the ARM and Thumb instruction sets. For more information, see Pseudocode details of saturation on page A2-44.

A4.4.5 Saturating addition and subtraction instructions

Table A4-8 lists the saturating addition and subtraction instructions in the ARM and Thumb instruction sets. For more information, see Pseudocode details of saturation on page A2-44.

Table A4-7 Saturating instructions

| Instruction | See | Operation |
|---|---|---|
| Signed Saturate | SSAT on page A8-652 | Saturates optionally shifted 32-bit value to selected range |
| Signed Saturate 16 | SSAT16 on page A8-654 | Saturates two 16-bit values to selected range |
| Unsigned Saturate | USAT on page A8-796 | Saturates optionally shifted 32-bit value to selected range |
| Unsigned Saturate 16 | USAT16 on page A8-798 | Saturates two 16-bit values to selected range |

Table A4-8 Saturating addition and subtraction instructions

| Instruction | See | Operation |
|---|---|---|
| Saturating Add | QADD on page A8-540 | Add, saturating result to the 32-bit signed integer range |
| Saturating Subtract | QSUB on page A8-554 | Subtract, saturating result to the 32-bit signed integer range |
| Saturating Double and Add | QDADD on page A8-548 | Doubles one value and adds a second value, saturating the doubling and the addition to the 32-bit signed integer range |
| Saturating Double and Subtract | QDSUB on page A8-550 | Doubles one value and subtracts the result from a second value, saturating the doubling and the subtraction to the 32-bit signed integer range |
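The saturation used by Tables A4-7 and A4-8 can be sketched in Python in the style of the SignedSatQ() and UnsignedSatQ() pseudocode functions. The boolean each function returns reports that saturation occurred, which is the condition under which these instructions set the APSR.Q flag; the flag itself is not modeled. Names are hypothetical, and qdadd takes its operands by role rather than in assembler operand order.

```python
MASK32 = 0xFFFFFFFF


def s32(x: int) -> int:
    """Interpret the low 32 bits of x as a signed integer."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x


def signed_sat(value: int, bits: int) -> tuple[int, bool]:
    """Clamp to [-2^(bits-1), 2^(bits-1) - 1]; report whether clamping happened."""
    hi, lo = (1 << (bits - 1)) - 1, -(1 << (bits - 1))
    if value > hi:
        return hi, True
    if value < lo:
        return lo, True
    return value, False


def unsigned_sat(value: int, bits: int) -> tuple[int, bool]:
    """Clamp to [0, 2^bits - 1]; report whether clamping happened."""
    hi = (1 << bits) - 1
    if value > hi:
        return hi, True
    if value < 0:
        return 0, True
    return value, False


def qadd(a: int, b: int) -> tuple[int, bool]:
    result, sat = signed_sat(s32(a) + s32(b), 32)
    return result & MASK32, sat


def qdadd(addend: int, to_double: int) -> tuple[int, bool]:
    """Saturate the doubling, then saturate the addition."""
    doubled, sat1 = signed_sat(2 * s32(to_double), 32)
    result, sat2 = signed_sat(s32(addend) + doubled, 32)
    return result & MASK32, sat1 or sat2


print(qadd(0x7FFFFFFF, 1))  # (2147483647, True)
```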


A4.4.6 Packing and unpacking instructions

Table A4-9 lists the packing and unpacking instructions in the ARM and Thumb instruction sets. These are all available from ARMv6T2 in the Thumb instruction set, and from ARMv6 onwards in the ARM instruction set.

Table A4-9 Packing and unpacking instructions

| Instruction | See | Operation |
|---|---|---|
| Pack Halfword | PKH on page A8-522 | Combine halfwords |
| Signed Extend and Add Byte | SXTAB on page A8-724 | Extend 8 bits to 32 and add |
| Signed Extend and Add Byte 16 | SXTAB16 on page A8-726 | Dual extend 8 bits to 16 and add |
| Signed Extend and Add Halfword | SXTAH on page A8-728 | Extend 16 bits to 32 and add |
| Signed Extend Byte | SXTB on page A8-730 | Extend 8 bits to 32 |
| Signed Extend Byte 16 | SXTB16 on page A8-732 | Dual extend 8 bits to 16 |
| Signed Extend Halfword | SXTH on page A8-734 | Extend 16 bits to 32 |
| Unsigned Extend and Add Byte | UXTAB on page A8-806 | Extend 8 bits to 32 and add |
| Unsigned Extend and Add Byte 16 | UXTAB16 on page A8-808 | Dual extend 8 bits to 16 and add |
| Unsigned Extend and Add Halfword | UXTAH on page A8-810 | Extend 16 bits to 32 and add |
| Unsigned Extend Byte | UXTB on page A8-812 | Extend 8 bits to 32 |
| Unsigned Extend Byte 16 | UXTB16 on page A8-814 | Dual extend 8 bits to 16 |
| Unsigned Extend Halfword | UXTH on page A8-816 | Extend 16 bits to 32 |
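A few of the extend operations in Table A4-9, sketched in Python. These instructions can optionally rotate the source register right by 8, 16 or 24 bits before extracting the byte, as the individual instruction descriptions explain. The rotation parameter models that option and is not range-checked here. Function names are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def ror32(value: int, amount: int) -> int:
    """Rotate a 32-bit value right by amount bits."""
    amount %= 32
    value &= MASK32
    return ((value >> amount) | (value << (32 - amount))) & MASK32


def sign_extend(value: int, from_bits: int, to_bits: int) -> int:
    """Sign-extend the low from_bits of value to a to_bits-wide result."""
    value &= (1 << from_bits) - 1
    if value >> (from_bits - 1):
        value -= 1 << from_bits
    return value & ((1 << to_bits) - 1)


def sxtb(rm: int, rotation: int = 0) -> int:
    """SXTB: sign-extend byte 0 of the rotated value to 32 bits."""
    return sign_extend(ror32(rm, rotation), 8, 32)


def uxtab(rn: int, rm: int, rotation: int = 0) -> int:
    """UXTAB: zero-extend byte 0 of the rotated value and add rn."""
    return (rn + (ror32(rm, rotation) & 0xFF)) & MASK32


def sxtb16(rm: int, rotation: int = 0) -> int:
    """SXTB16: sign-extend bytes 0 and 2 of the rotated value into two halfwords."""
    rotated = ror32(rm, rotation)
    low = sign_extend(rotated, 8, 16)
    high = sign_extend(rotated >> 16, 8, 16)
    return (high << 16) | low


print(hex(sxtb16(0x00800001)))  # 0xff800001
```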


A4.4.7 Parallel addition and subtraction instructions

These instructions perform additions and subtractions on the values of two registers and write the result to a destination register, treating the register values as sets of two halfwords or four bytes. That is, they perform SIMD additions or subtractions on the registers. They are available in ARMv6 and above.

These instructions consist of a prefix followed by a main instruction mnemonic. The prefixes are as follows:

• S: Signed arithmetic modulo 2^8 or 2^16.

• Q: Signed saturating arithmetic.

• SH: Signed arithmetic, halving the results.

• U: Unsigned arithmetic modulo 2^8 or 2^16.

• UQ: Unsigned saturating arithmetic.

• UH: Unsigned arithmetic, halving the results.

The main instruction mnemonics are as follows:

• ADD16: Adds the top halfwords of two operands to form the top halfword of the result, and the bottom halfwords of the same two operands to form the bottom halfword of the result.

• ASX: Exchanges halfwords of the second operand, and then adds top halfwords and subtracts bottom halfwords.

• SAX: Exchanges halfwords of the second operand, and then subtracts top halfwords and adds bottom halfwords.

• SUB16: Subtracts each halfword of the second operand from the corresponding halfword of the first operand to form the corresponding halfword of the result.

• ADD8: Adds each byte of the second operand to the corresponding byte of the first operand to form the corresponding byte of the result.

• SUB8: Subtracts each byte of the second operand from the corresponding byte of the first operand to form the corresponding byte of the result.

The instruction set permits all 36 combinations of prefix and main instruction mnemonic, as Table A4-10 shows.

See also Advanced SIMD parallel addition and subtraction on page A4-185.

Table A4-10 Parallel addition and subtraction instructions

| Main instruction | Signed | Saturating | Signed halving | Unsigned | Unsigned saturating | Unsigned halving |
|---|---|---|---|---|---|---|
| ADD16, add, two halfwords | SADD16 | QADD16 | SHADD16 | UADD16 | UQADD16 | UHADD16 |
| ASX, add and subtract with exchange | SASX | QASX | SHASX | UASX | UQASX | UHASX |
| SAX, subtract and add with exchange | SSAX | QSAX | SHSAX | USAX | UQSAX | UHSAX |
| SUB16, subtract, two halfwords | SSUB16 | QSUB16 | SHSUB16 | USUB16 | UQSUB16 | UHSUB16 |
| ADD8, add, four bytes | SADD8 | QADD8 | SHADD8 | UADD8 | UQADD8 | UHADD8 |
| SUB8, subtract, four bytes | SSUB8 | QSUB8 | SHSUB8 | USUB8 | UQSUB8 | UHSUB8 |
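A Python sketch of how each prefix changes the per-lane arithmetic, together with the 6 × 6 = 36 mnemonics in Table A4-10. lane_op covers only the ADD16, SUB16, ADD8 and SUB8 forms (not ASX/SAX), and it does not model the APSR.GE flags that the S and U forms also set. Halving is taken as the floor of the full-precision result divided by 2. All names are hypothetical.

```python
from itertools import product

PREFIXES = ["S", "Q", "SH", "U", "UQ", "UH"]
MAIN = ["ADD16", "ASX", "SAX", "SUB16", "ADD8", "SUB8"]


def parallel_mnemonics() -> list[str]:
    """Every prefix/main-instruction combination in Table A4-10."""
    return [prefix + main for main, prefix in product(MAIN, PREFIXES)]


def lane_op(prefix: str, a: int, b: int, width: int, subtract: bool) -> int:
    """Apply one prefix to each 8-bit or 16-bit lane of a and b."""
    mask = (1 << width) - 1
    signed = prefix in ("S", "Q", "SH")
    result = 0
    for i in range(32 // width):
        x = (a >> (i * width)) & mask
        y = (b >> (i * width)) & mask
        if signed:
            x -= (x >> (width - 1)) << width  # two's-complement interpretation
            y -= (y >> (width - 1)) << width
        r = x - y if subtract else x + y
        if prefix in ("SH", "UH"):
            r >>= 1  # halving: floor of r / 2
        elif prefix == "Q":
            r = max(-(1 << (width - 1)), min(r, (1 << (width - 1)) - 1))
        elif prefix == "UQ":
            r = max(0, min(r, mask))
        result |= (r & mask) << (i * width)  # S and U simply wrap
    return result


print(len(parallel_mnemonics()))                                      # 36
print(hex(lane_op("UQ", 0xFFFF0001, 0x00010001, 16, subtract=False)))  # 0xffff0002
```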


A4.4.8 Divide instructions

The ARMv7-R profile introduces support for signed and unsigned integer divide instructions, implemented in hardware, in the Thumb instruction set. For more information see ARMv7 implementation requirements and options for the divide instructions.

For descriptions of the instructions see:

• SDIV on page A8-600

• UDIV on page A8-760.

Note

• The Virtualization Extensions introduce the requirement for an ARMv7-A implementation to include SDIV and UDIV.

• The ARMv7-M profile also includes the SDIV and UDIV instructions.

In the ARMv7-R profile, the SCTLR.DZ bit enables divide-by-zero fault detection:

• SCTLR.DZ == 0: Divide-by-zero returns a zero result.

• SCTLR.DZ == 1: SDIV and UDIV generate an Undefined Instruction exception on a divide-by-zero.

The SCTLR.DZ bit is cleared to zero on reset.

In an ARMv7-A profile implementation that supports the SDIV and UDIV instructions, divide-by-zero always returns a zero result.
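The divide-by-zero behavior above can be modeled in Python. Two details are our reading of the SDIV and UDIV descriptions rather than statements in this section, and should be checked against pages A8-600 and A8-760: the quotient rounds toward zero, and only its low 32 bits are kept, so 0x80000000 divided by -1 gives 0x80000000. The dz_trap flag models SCTLR.DZ == 1 by raising ZeroDivisionError in place of the Undefined Instruction exception. Function names are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def s32(x: int) -> int:
    """Interpret the low 32 bits of x as a signed integer."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x


def sdiv(rn: int, rm: int, dz_trap: bool = False) -> int:
    """Signed divide, rounding toward zero; divide-by-zero returns 0 unless dz_trap."""
    n, m = s32(rn), s32(rm)
    if m == 0:
        if dz_trap:
            raise ZeroDivisionError("models the Undefined Instruction exception")
        return 0
    q = abs(n) // abs(m)  # Python's // floors, so divide the magnitudes
    if (n < 0) != (m < 0):
        q = -q
    return q & MASK32  # keep the low 32 bits of the quotient


def udiv(rn: int, rm: int, dz_trap: bool = False) -> int:
    """Unsigned divide; divide-by-zero returns 0 unless dz_trap."""
    n, m = rn & MASK32, rm & MASK32
    if m == 0:
        if dz_trap:
            raise ZeroDivisionError("models the Undefined Instruction exception")
        return 0
    return n // m


print(hex(sdiv(0x80000000, 0xFFFFFFFF)))  # 0x80000000
```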

ARMv7 implementation requirements and options for the divide instructions

Any implementation of the ARMv7-R profile must include the SDIV and UDIV instructions in the Thumb instruction set.

Any implementation of the Virtualization Extensions must include the SDIV and UDIV instructions in the Thumb and ARM instruction sets.

In the ARMv7-R profile, the implementation of SDIV and UDIV in the ARM instruction set is OPTIONAL.

In an ARMv7-A implementation that does not include the Virtualization Extensions, the implementation of SDIV and UDIV in both instruction sets is OPTIONAL; that is, the architecture permits such an implementation not to implement SDIV and UDIV.

Note

Previous issues of this document have stated that a VMSAv7 implementation might implement SDIV and UDIV in the Thumb instruction set but not in the ARM instruction set. ARM strongly recommends against this implementation option.

The ID_ISAR0.Divide_instrs field indicates the level of support for these instructions, see ID_ISAR0, Instruction Set Attribute Register 0, VMSA on page B4-1608 or ID_ISAR0, Instruction Set Attribute Register 0, PMSA on page B6-1856:

• a field value of 0b0001 indicates they are implemented in the Thumb instruction set

• a field value of 0b0010 indicates they are implemented in both the Thumb and ARM instruction sets.
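Software that has already read ID_ISAR0 could decode the field as sketched below in Python. Two details are assumptions to check against the register descriptions on pages B4-1608 and B6-1856: that Divide_instrs occupies bits [27:24], and that 0b0000 means the instructions are not implemented. Reading the register itself is a system-level operation and is not shown. The function name is hypothetical.

```python
# Meaning of ID_ISAR0.Divide_instrs values; 0b0000 (no divide instructions)
# is assumed here, the other two values are as listed above.
DIVIDE_SUPPORT = {
    0b0000: "not implemented",
    0b0001: "Thumb only",
    0b0010: "Thumb and ARM",
}


def divide_support(id_isar0: int) -> str:
    """Decode ID_ISAR0.Divide_instrs, assumed to be bits [27:24]."""
    field = (id_isar0 >> 24) & 0xF
    return DIVIDE_SUPPORT.get(field, f"reserved value {field:#06b}")


print(divide_support(0x02000000))  # Thumb and ARM
```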


A4.4.9 Miscellaneous data-processing instructions

Table A4-11 lists the miscellaneous data-processing instructions in the ARM and Thumb instruction sets.

Immediate values in these instructions are simple binary numbers.

Table A4-11 Miscellaneous data-processing instructions

| Instruction | See | Notes |
|---|---|---|
| Bit Field Clear | BFC on page A8-336 | - |
| Bit Field Insert | BFI on page A8-338 | - |
| Count Leading Zeros | CLZ on page A8-362 | - |
| Move Top | MOVT on page A8-491 | Moves 16-bit immediate value to top halfword. Bottom halfword unchanged. |
| Reverse Bits | RBIT on page A8-560 | - |
| Byte-Reverse Word | REV on page A8-562 | - |
| Byte-Reverse Packed Halfword | REV16 on page A8-564 | - |
| Byte-Reverse Signed Halfword | REVSH on page A8-566 | - |
| Signed Bit Field Extract | SBFX on page A8-598 | - |
| Select Bytes using GE flags | SEL on page A8-602 | - |
| Unsigned Bit Field Extract | UBFX on page A8-756 | - |
| Unsigned Sum of Absolute Differences | USAD8 on page A8-792 | - |
| Unsigned Sum of Absolute Differences and Accumulate | USADA8 on page A8-794 | - |
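Several of the bit-field operations in Table A4-11 reduce to simple mask arithmetic. In this Python sketch, lsb and width play the role of the instructions' immediate operands, and the function names are hypothetical. movt follows the table note: the top halfword is replaced and the bottom halfword is unchanged.

```python
MASK32 = 0xFFFFFFFF


def ubfx(rn: int, lsb: int, width: int) -> int:
    """Unsigned Bit Field Extract: zero-extend rn[lsb+width-1:lsb]."""
    return (rn >> lsb) & ((1 << width) - 1)


def sbfx(rn: int, lsb: int, width: int) -> int:
    """Signed Bit Field Extract: sign-extend rn[lsb+width-1:lsb]."""
    field = ubfx(rn, lsb, width)
    if field >> (width - 1):  # top bit of the field is set
        field -= 1 << width
    return field & MASK32


def bfi(rd: int, rn: int, lsb: int, width: int) -> int:
    """Bit Field Insert: copy rn[width-1:0] into rd[lsb+width-1:lsb]."""
    mask = ((1 << width) - 1) << lsb
    return (rd & ~mask & MASK32) | ((rn << lsb) & mask)


def bfc(rd: int, lsb: int, width: int) -> int:
    """Bit Field Clear: the same as inserting zeros."""
    return bfi(rd, 0, lsb, width)


def clz(rm: int) -> int:
    """Count Leading Zeros in a 32-bit value (32 for zero)."""
    return 32 - (rm & MASK32).bit_length()


def movt(rd: int, imm16: int) -> int:
    """Move Top: imm16 to the top halfword, bottom halfword unchanged."""
    return ((imm16 & 0xFFFF) << 16) | (rd & 0xFFFF)


print(hex(ubfx(0x12345678, 8, 8)))    # 0x56
print(hex(movt(0x00001234, 0xABCD)))  # 0xabcd1234
```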


A4.5 Status register access instructions

The MRS and MSR instructions move the contents of the Application Program Status Register (APSR) to or from an ARM core register, see:

•MRS on page A8-496

•MSR (immediate) on page A8-498

•MSR (register) on page A8-500.

The Application Program Status Register (APSR) on page A2-49 describes the APSR.

The condition flags in the APSR are normally set by executing data-processing instructions, and normally control the execution of conditional instructions. However, software can set the condition flags explicitly using the MSR instruction, and can read the current state of the condition flags explicitly using the MRS instruction.
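As an illustration only, the C sketch below decodes a value that software might have read with MRS, and builds a value that it might write with MSR. It assumes the APSR bit positions given in The Application Program Status Register (APSR) on page A2-49 (N bit 31, Z bit 30, C bit 29, V bit 28, Q bit 27, GE[3:0] bits [19:16]); no real MRS or MSR is executed, and the type and function names are hypothetical. It also models SEL from Table A4-11, which chooses each result byte according to one GE bit.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical decoded view of the APSR fields. */
typedef struct {
    bool n, z, c, v, q;
    uint8_t ge;          /* GE[3:0] */
} apsr_flags;

static apsr_flags apsr_decode(uint32_t apsr)
{
    apsr_flags f;
    f.n = (apsr >> 31) & 1u;
    f.z = (apsr >> 30) & 1u;
    f.c = (apsr >> 29) & 1u;
    f.v = (apsr >> 28) & 1u;
    f.q = (apsr >> 27) & 1u;
    f.ge = (uint8_t)((apsr >> 16) & 0xFu);
    return f;
}

static uint32_t apsr_encode(apsr_flags f)
{
    return ((uint32_t)f.n << 31) | ((uint32_t)f.z << 30) | ((uint32_t)f.c << 29) |
           ((uint32_t)f.v << 28) | ((uint32_t)f.q << 27) |
           ((uint32_t)(f.ge & 0xFu) << 16);
}

/* SEL: result byte i comes from rn if GE[i] is 1, otherwise from rm. */
static uint32_t sel(uint8_t ge, uint32_t rn, uint32_t rm)
{
    uint32_t result = 0;
    for (int i = 0; i < 4; i++) {
        uint32_t byte_mask = 0xFFu << (8 * i);
        result |= ((ge >> i) & 1u) ? (rn & byte_mask) : (rm & byte_mask);
    }
    return result;
}
```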

At system level, software can also:

• use these instructions to access the SPSR of the current mode

• use the CPS instruction to change the CPSR.M field and the CPSR.{A, I, F} interrupt mask bits.

For details of the system level use of the status register access instructions CPS, MRS, and MSR, see:

•CPS (Thumb) on page B9-1978

•CPS (ARM) on page B9-1980

•MRS on page B9-1990

•MSR (immediate) on page B9-1996

•MSR (register) on page B9-1998.

A4.5.1 Banked register access instructions

In a processor that implements the Virtualization Extensions, in all modes except User mode, the MRS (Banked register) and MSR (Banked register) instructions move the contents of a Banked ARM core register, the SPSR, or the ELR_hyp, to or from an ARM core register. For instruction descriptions see:

•MRS (Banked register) on page B9-1992

•MSR (Banked register) on page B9-1994.

Note

These are system level instructions.


A4.6 Load/store instructions

Table A4-12 summarizes the ARM core register load/store instructions in the ARM and Thumb instruction sets. See

also:

•Load/store multiple instructions on page A4-177

•Advanced SIMD and Floating-point load/store instructions on page A4-181.

Load/store instructions have several options for addressing memory. For more information, see Addressing modes

on page A4-176.

A4.6.1 Loads to the PC

The LDR instruction can load a value into the PC. The value loaded is treated as an interworking address, as described by the LoadWritePC() pseudocode function in Pseudocode details of operations on ARM core registers on page A2-47.
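The interworking rule can be sketched in C as follows. This is a simplified model that assumes the ARMv7 behavior, in which LoadWritePC() defers to BXWritePC(): if bit 0 of the loaded value is 1, execution continues in Thumb state at the value with bit 0 cleared; if bits [1:0] are 0b00, execution continues in ARM state; a value ending in 0b10 is UNPREDICTABLE. ThumbEE state is not modelled, and the type and function names are hypothetical.

```c
#include <stdint.h>

typedef enum { ISET_ARM, ISET_THUMB, ISET_UNPREDICTABLE } iset;

typedef struct {
    iset set;          /* instruction set selected by the load */
    uint32_t target;   /* address execution continues from */
} pc_load_result;

/* Simplified LoadWritePC() for ARMv7, outside ThumbEE state. */
static pc_load_result load_write_pc(uint32_t value)
{
    pc_load_result r;
    if (value & 1u) {                /* bit 0 set: Thumb, clear bit 0 */
        r.set = ISET_THUMB;
        r.target = value & ~1u;
    } else if ((value & 2u) == 0) {  /* bits [1:0] == 0b00: ARM */
        r.set = ISET_ARM;
        r.target = value;
    } else {                         /* bits [1:0] == 0b10 */
        r.set = ISET_UNPREDICTABLE;
        r.target = value;
    }
    return r;
}
```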

A4.6.2 Halfword and byte loads and stores

Halfword and byte stores store the least significant halfword or byte from the register, to 16 or 8 bits of memory

respectively. There is no distinction between signed and unsigned stores.

Halfword and byte loads load 16 or 8 bits from memory into the least significant halfword or byte of a register.

Unsigned loads zero-extend the loaded value to 32 bits, and signed loads sign-extend the value to 32 bits.
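The extension rules can be shown with a small C model of byte-addressed memory. This sketch assumes little-endian data accesses and aligned halfwords, and the emu_ function names are hypothetical stand-ins for LDRB, LDRSB, LDRH, LDRSH, STRB, and STRH.

```c
#include <stdint.h>

/* Loads: unsigned forms zero-extend, signed forms sign-extend to 32 bits. */
static uint32_t emu_ldrb(const uint8_t *mem, uint32_t addr) { return mem[addr]; }
static int32_t emu_ldrsb(const uint8_t *mem, uint32_t addr) { return (int8_t)mem[addr]; }

static uint32_t emu_ldrh(const uint8_t *mem, uint32_t addr)
{
    return (uint32_t)mem[addr] | ((uint32_t)mem[addr + 1] << 8);  /* little-endian */
}

static int32_t emu_ldrsh(const uint8_t *mem, uint32_t addr)
{
    return (int16_t)emu_ldrh(mem, addr);
}

/* Stores: only the least significant byte or halfword of rt is written. */
static void emu_strb(uint8_t *mem, uint32_t addr, uint32_t rt)
{
    mem[addr] = (uint8_t)rt;
}

static void emu_strh(uint8_t *mem, uint32_t addr, uint32_t rt)
{
    mem[addr] = (uint8_t)rt;
    mem[addr + 1] = (uint8_t)(rt >> 8);
}
```

The same bytes 0xFE, 0xFF load as 0xFFFE with the unsigned halfword form, and as -2 with the signed form.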

A4.6.3 Load unprivileged and Store unprivileged

When executing at PL0, a Load unprivileged or Store unprivileged instruction operates in exactly the same way as the corresponding ordinary load or store instruction. For example, an LDRT instruction executes in exactly the same way as the equivalent LDR instruction. When executed at PL1, Load unprivileged and Store unprivileged instructions behave as they would if they were executed at PL0. For example, an LDRT instruction executes in exactly the same way as the equivalent LDR instruction would execute at PL0. In particular, the instructions make unprivileged memory accesses.

The Load unprivileged and Store unprivileged instructions are UNPREDICTABLE if executed at PL2.

For more information, see Privilege level access controls for data accesses on page A3-143.

Table A4-12 Load/store instructions

Data type                 Load    Store   Load          Store         Load-      Store-
                                          unprivileged  unprivileged  Exclusive  Exclusive
32-bit word               LDR     STR     LDRT          STRT          LDREX      STREX
16-bit halfword           -       STRH    -             STRHT         -          STREXH
16-bit unsigned halfword  LDRH    -       LDRHT         -             LDREXH     -
16-bit signed halfword    LDRSH   -       LDRSHT        -             -          -
8-bit byte                -       STRB    -             STRBT         -          STREXB
8-bit unsigned byte       LDRB    -       LDRBT         -             LDREXB     -
8-bit signed byte         LDRSB   -       LDRSBT        -             -          -
Two 32-bit words          LDRD    STRD    -             -             -          -
64-bit doubleword         -       -       -             -             LDREXD     STREXD


A4.6.4 Exclusive loads and stores

Exclusive loads and stores provide shared memory synchronization. For more information, see Synchronization and

semaphores on page A3-114.
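Real Load-Exclusive and Store-Exclusive sequences are written in assembler or generated by a compiler, but the retry pattern they support can be modelled in portable C. This sketch assumes a single, simplified local monitor that tracks one address: it does not model the global monitor or the reservation granule, and it always fails a Store-Exclusive to an address other than the tagged one, whereas the architecture leaves some of these cases IMPLEMENTATION DEFINED. STREX is modelled as returning 0 when the store is performed and 1 when it is not. All names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    bool exclusive;     /* true: Exclusive Access state, false: Open Access state */
    uint32_t *tagged;   /* address marked by the last Load-Exclusive */
} monitor;

static uint32_t emu_ldrex(monitor *m, uint32_t *addr)
{
    m->exclusive = true;
    m->tagged = addr;
    return *addr;
}

static uint32_t emu_strex(monitor *m, uint32_t *addr, uint32_t value)
{
    bool ok = m->exclusive && m->tagged == addr;
    m->exclusive = false;   /* the monitor returns to Open Access state either way */
    if (!ok)
        return 1;           /* store not performed */
    *addr = value;
    return 0;               /* store performed */
}

static void emu_clrex(monitor *m) { m->exclusive = false; }

/* Atomic increment: retry until the Store-Exclusive succeeds. `lost_attempts`
   is a hypothetical test knob: the first `lost_attempts` iterations clear the
   monitor between the load and the store, as a context switch that executes
   CLREX might. Returns the number of retries. */
static unsigned atomic_increment(monitor *m, uint32_t *counter, unsigned lost_attempts)
{
    unsigned retries = 0;
    for (;;) {
        uint32_t old = emu_ldrex(m, counter);
        if (retries < lost_attempts)
            emu_clrex(m);
        if (emu_strex(m, counter, old + 1) == 0)
            return retries;
        retries++;
    }
}
```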

A4.6.5 Addressing modes

The address for a load or store is formed from two parts: a value from a base register, and an offset.

The base register can be any one of the ARM core registers R0-R12, SP, or LR.

For loads, the base register can be the PC. This permits PC-relative addressing for position-independent code.

Instructions marked (literal) in their title in Chapter A8 Instruction Descriptions are PC-relative loads.

The offset takes one of three formats:

Immediate The offset is an unsigned number that can be added to or subtracted from the base register

value. Immediate offset addressing is useful for accessing data elements that are a fixed

distance from the start of the data object, such as structure fields, stack offsets and

input/output registers.

Register The offset is a value from an ARM core register, other than the PC, that can be added to, or subtracted from, the base register value. Register offsets are useful for accessing arrays or blocks of data.

Scaled register The offset is an ARM core register, other than the PC, shifted by an immediate value, then

added to or subtracted from the base register. This means an array index can be scaled by

the size of each array element.

The offset and base register can be used in three different ways to form the memory address. The addressing modes

are described as follows:

Offset The offset is added to or subtracted from the base register to form the memory address.

Pre-indexed The offset is added to or subtracted from the base register to form the memory address. The

base register is then updated with this new address, to permit automatic indexing through an

array or memory block.

Post-indexed The value of the base register alone is used as the memory address. The offset is then added

to or subtracted from the base register. The result is stored back in the base register, to permit

automatic indexing through an array or memory block.

Note

Not every variant is available for every instruction, and the range of permitted immediate values and the options for

scaled registers vary from instruction to instruction. See Chapter A8 Instruction Descriptions for full details for

each instruction.
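The three addressing modes differ only in which address is used and whether the base register is written back. The C sketch below shows this under the assumption that the offset has already been formed (an immediate, a register value, or a scaled register value); the bracket syntax in the comments is the usual assembler form, and the function and enum names are hypothetical.

```c
#include <stdint.h>

typedef enum { MODE_OFFSET, MODE_PRE_INDEXED, MODE_POST_INDEXED } addr_mode;

/* Returns the address used for the access and updates *rn as the mode requires.
   `add` selects adding (nonzero) or subtracting (zero) the offset. */
static uint32_t effective_address(uint32_t *rn, uint32_t offset, int add, addr_mode mode)
{
    uint32_t base = *rn;
    uint32_t offset_addr = add ? base + offset : base - offset;
    switch (mode) {
    case MODE_OFFSET:          /* [Rn, #off]  : no writeback */
        return offset_addr;
    case MODE_PRE_INDEXED:     /* [Rn, #off]! : use the new address, write it back */
        *rn = offset_addr;
        return offset_addr;
    case MODE_POST_INDEXED:    /* [Rn], #off  : use the old address, then write back */
    default:
        *rn = offset_addr;
        return base;
    }
}

/* Scaled register offset, for example [Rn, Rm, LSL #2] indexing an array of words. */
static uint32_t scaled_offset(uint32_t rm, unsigned lsl)
{
    return rm << lsl;
}
```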


A4.7 Load/store multiple instructions

Load Multiple instructions load a subset, or possibly all, of the ARM core registers from memory.

Store Multiple instructions store a subset, or possibly all, of the ARM core registers to memory.

The memory locations are consecutive word-aligned words. The addresses used are obtained from a base register,

and can be either above or below the value in the base register. The base register can optionally be updated by the

total size of the data transferred.

Table A4-13 summarizes the load/store multiple instructions in the ARM and Thumb instruction sets.

When executing at PL1, variants of the LDM and STM instructions load and store User mode registers. Another system level variant of the LDM instruction performs an exception return. For details of these variants, see Chapter B9 System Instructions.

A4.7.1 Loads to the PC

The LDM, LDMDA, LDMDB, LDMIB, and POP instructions can load a value into the PC. The value loaded is treated as an interworking address, as described by the LoadWritePC() pseudocode function in Pseudocode details of operations on ARM core registers on page A2-47.

Table A4-13 Load/store multiple instructions

Instruction See

Load Multiple, Increment After or Full Descending LDM/LDMIA/LDMFD (Thumb) on page A8-396

LDM/LDMIA/LDMFD (ARM) on page A8-398

Load Multiple, Decrement After or Full Ascending (a) LDMDA/LDMFA on page A8-400

Load Multiple, Decrement Before or Empty Ascending LDMDB/LDMEA on page A8-402

Load Multiple, Increment Before or Empty Descending (a) LDMIB/LDMED on page A8-404

Pop multiple registers off the stack (b) POP (Thumb) on page A8-534

POP (ARM) on page A8-536

Push multiple registers onto the stack (c) PUSH on page A8-538

Store Multiple, Increment After or Empty Ascending STM (STMIA, STMEA) on page A8-664

Store Multiple, Decrement After or Empty Descending (a) STMDA (STMED) on page A8-666

Store Multiple, Decrement Before or Full Descending STMDB (STMFD) on page A8-668

Store Multiple, Increment Before or Full Ascending (a) STMIB (STMFA) on page A8-670

a. Not available in the Thumb instruction set.

b. This instruction is equivalent to an LDM instruction with the SP as base register, and base register updating.

c. This instruction is equivalent to an STMDB instruction with the SP as base register, and base register updating.
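Footnotes b and c can be made concrete with a C model of STMDB and LDMIA with base register update, which behave like PUSH and POP when the base register is the SP. This sketch assumes word-aligned addresses in a small word array. As for these instructions, the lowest-numbered register is transferred at the lowest address. It does not model loads to the PC or the restrictions on including the base register in the list. The function names are hypothetical.

```c
#include <stdint.h>

/* Number of registers in a 16-bit register list (bit n set means Rn is listed). */
static unsigned reg_count(uint16_t list)
{
    unsigned n = 0;
    while (list) {
        list &= (uint16_t)(list - 1u);  /* clear the lowest set bit */
        n++;
    }
    return n;
}

/* STMDB Rn!, {list}: store to the words just below Rn, then write back
   Rn - 4*count (a Full Descending push). */
static void stmdb_wb(uint32_t *mem, uint32_t *rn, uint16_t list, const uint32_t regs[16])
{
    uint32_t addr = *rn - 4u * reg_count(list);
    *rn = addr;
    for (int r = 0; r < 16; r++) {
        if (list & (1u << r)) {
            mem[addr / 4] = regs[r];
            addr += 4;
        }
    }
}

/* LDMIA Rn!, {list}: load upwards from Rn, then write back Rn + 4*count (a pop). */
static void ldmia_wb(const uint32_t *mem, uint32_t *rn, uint16_t list, uint32_t regs[16])
{
    uint32_t addr = *rn;
    for (int r = 0; r < 16; r++) {
        if (list & (1u << r)) {
            regs[r] = mem[addr / 4];
            addr += 4;
        }
    }
    *rn = addr;
}
```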


A4.8 Miscellaneous instructions

Table A4-14 summarizes the miscellaneous instructions in the ARM and Thumb instruction sets.

A4.8.1 The Yield instruction

In a Symmetric Multi-Threading (SMT) design, a thread can use the YIELD instruction to give a hint to the processor on which it is running. The YIELD hint indicates that whatever the thread is currently doing is of low importance, and so the thread could yield. For example, the thread might be sitting in a spin-lock. A similar use might be in modifying the arbitration priority of the snoop bus in a multiprocessor (MP) system. Defining such an instruction permits binary compatibility between SMT and SMP systems.

ARMv7 defines a YIELD instruction as a specific NOP (No Operation) hint instruction.

The YIELD instruction has no effect in a single-threaded system, but developers of such systems can use the instruction to flag its intended use on migration to a multiprocessor or multithreading system. Operating systems can use YIELD in places where a yield hint is wanted, knowing that it will be treated as a NOP if there is no implementation benefit.
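A spin-lock is the typical place for this hint. The C sketch below assumes GCC or Clang inline assembly and an ARM target that accepts the YIELD mnemonic, such as ARMv7. On other hosts the hint compiles to nothing, so the lock still works without it. The lock itself uses C11 atomic_flag, and how a compiler implements that is a toolchain detail, not something this section specifies. The macro and function names are hypothetical.

```c
#include <stdatomic.h>

/* Spin-wait hint: YIELD on 32-bit ARM, nothing elsewhere. */
#if defined(__arm__)
#define CPU_YIELD() __asm__ volatile("yield" ::: "memory")
#else
#define CPU_YIELD() ((void)0)
#endif

typedef struct { atomic_flag locked; } spinlock;

static void spin_lock(spinlock *l)
{
    /* Loop while another thread holds the lock, hinting the processor each time. */
    while (atomic_flag_test_and_set_explicit(&l->locked, memory_order_acquire))
        CPU_YIELD();
}

static int spin_trylock(spinlock *l)
{
    return !atomic_flag_test_and_set_explicit(&l->locked, memory_order_acquire);
}

static void spin_unlock(spinlock *l)
{
    atomic_flag_clear_explicit(&l->locked, memory_order_release);
}
```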

Table A4-14 Miscellaneous instructions

Instruction See

Clear-Exclusive CLREX on page A8-360

Debug Hint DBG on page A8-377

Data Memory Barrier DMB on page A8-378

Data Synchronization Barrier DSB on page A8-380

Instruction Synchronization Barrier ISB on page A8-389

If-Then IT on page A8-390

No Operation NOP on page A8-510

Preload Data PLD, PLDW (immediate) on page A8-524

PLD (literal) on page A8-526

PLD, PLDW (register) on page A8-528

Preload Instruction PLI (immediate, literal) on page A8-530

PLI (register) on page A8-532

Set Endianness SETEND on page A8-604

Send Event SEV on page A8-606

Swap, Swap Byte. Deprecated. (a) SWP, SWPB on page A8-722

Wait For Event WFE on page A8-1104

Wait For Interrupt WFI on page A8-1106

Yield YIELD on page A8-1108

a. Use Load/Store-Exclusive instructions instead, see Load/store instructions on page A4-175.


A4.9 Exception-generating and exception-handling instructions

The following instructions are intended specifically to cause a synchronous processor exception to occur:

• The SVC instruction generates a Supervisor Call exception. For more information, see Supervisor Call (SVC) exception on page B1-1210.

• The Breakpoint instruction BKPT provides software breakpoints. For more information, see About debug events on page C3-2038.

• In a processor that implements the Security Extensions, when executing at PL1 or higher, the SMC instruction generates a Secure Monitor Call exception. For more information, see Secure Monitor Call (SMC) exception on page B1-1211.

• In a processor that implements the Virtualization Extensions, in software executing in a Non-secure PL1 mode, the HVC instruction generates a Hypervisor Call exception. For more information, see Hypervisor Call (HVC) exception on page B1-1212.

For an exception taken to a PL1 mode:

• The system level variants of the SUBS and LDM instructions perform a return from an exception.

Note

The variants of SUBS include MOVS. See the references to SUBS PC, LR in Table A4-15 for more information.

• From ARMv6, the SRS instruction can be used near the start of the handler, to store return information. The RFE instruction can then perform a return from the exception using the stored return information.

In a processor that implements the Virtualization Extensions, the ERET instruction performs a return from an exception taken to Hyp mode.

For more information, see Exception return on page B1-1194.

Table A4-15 summarizes the instructions, in the ARM and Thumb instruction sets, for generating or handling an exception. Except for BKPT and SVC, these are system level instructions.

Table A4-15 Exception-generating and exception-handling instructions

Instruction See

Supervisor Call SVC (previously SWI) on page A8-720

Breakpoint BKPT on page A8-346

Secure Monitor Call SMC (previously SMI) on page B9-2002

Return From Exception RFE on page B9-2000

Subtract (exception return) SUBS PC, LR (Thumb) on page B9-2010

SUBS PC, LR and related instructions (ARM) on page B9-2012

Hypervisor Call HVC on page B9-1984

Exception Return ERET on page B9-1982

Load Multiple (exception return) LDM (exception return) on page B9-1986

Store Return State SRS (Thumb) on page B9-2004

SRS (ARM) on page B9-2006


A4.10 Coprocessor instructions

There are three types of instruction for communicating with coprocessors. These permit the processor to:

• Initiate a coprocessor data-processing operation. For details see CDP, CDP2 on page A8-358.

• Transfer ARM core registers to and from coprocessor registers. For details, see:

—MCR, MCR2 on page A8-476

—MCRR, MCRR2 on page A8-478

—MRC, MRC2 on page A8-492

—MRRC, MRRC2 on page A8-494.

• Load or store the values of coprocessor registers. For details, see:

—LDC, LDC2 (immediate) on page A8-392

—LDC, LDC2 (literal) on page A8-394

—STC, STC2 on page A8-662.

The instruction set distinguishes up to 16 coprocessors with a 4-bit field in each coprocessor instruction, so each

coprocessor is assigned a particular number.

Note

One coprocessor can use more than one of the 16 numbers if a large coprocessor instruction set is required.

Coprocessors 10 and 11 are used, together, for Floating-point Extension and some Advanced SIMD Extension

functionality. There are different instructions for accessing these coprocessors, of similar types to the instructions

for the other coprocessors, that is, to:

• Initiate a coprocessor data-processing operation. For details see Floating-point data-processing instructions

on page A4-191.

• Transfer ARM core registers to and from coprocessor registers. For details, see Advanced SIMD and

Floating-point register transfer instructions on page A4-183.

• Load or store the values of coprocessor registers. For details, see Advanced SIMD and Floating-point

load/store instructions on page A4-181.

Coprocessors execute the same instruction stream as the processor, ignoring non-coprocessor instructions and

coprocessor instructions for other coprocessors. Coprocessor instructions that cannot be executed by any

coprocessor hardware cause an Undefined Instruction exception.

Coprocessors 8, 9, 12, and 13 are reserved for future use by ARM. Any coprocessor access instruction attempting

to access one of these coprocessors is UNDEFINED.

For more information about specific coprocessors see Coprocessor support on page A2-94.
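The C sketch below extracts and classifies the coprocessor number. It assumes the 4-bit field is bits [11:8] of the ARM encoding of CDP, MCR, MCRR, MRC, MRRC, LDC, and STC, and classifies only the numbers that this section mentions. The names are hypothetical, and the two encodings in the test are MCR p15, 0, R0, c1, c0, 0 and VMRS R0, FPSCR.

```c
#include <stdint.h>

typedef enum { CP_FP_SIMD, CP_RESERVED, CP_OTHER } cp_class;

/* Assumption: the coprocessor number is instruction bits [11:8]. */
static unsigned coproc_number(uint32_t insn)
{
    return (insn >> 8) & 0xFu;
}

static cp_class classify_coproc(unsigned cp)
{
    switch (cp) {
    case 10: case 11:
        return CP_FP_SIMD;    /* Floating-point and Advanced SIMD */
    case 8: case 9: case 12: case 13:
        return CP_RESERVED;   /* reserved by ARM: access is UNDEFINED */
    default:
        return CP_OTHER;      /* not classified by this section */
    }
}
```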


A4.11 Advanced SIMD and Floating-point load/store instructions

Table A4-16 summarizes the extension register load/store instructions in the Advanced SIMD and Floating-point

(VFP) instruction sets.

Advanced SIMD also provides instructions for loading and storing multiple elements, or structures of elements, see

Element and structure load/store instructions.

A4.11.1 Element and structure load/store instructions

Table A4-17 shows the element and structure load/store instructions available in the Advanced SIMD instruction

set. Loading and storing structures of more than one element automatically de-interleaves or interleaves the

elements, see Figure A4-1 on page A4-182 for an example of de-interleaving. Interleaving is the inverse process.

Table A4-16 Extension register load/store instructions

Instruction See Operation

Vector Load Multiple VLDM on page A8-922 Load 1-16 consecutive 64-bit registers, Advanced SIMD and Floating-point

Load 1-16 consecutive 32-bit registers, Floating-point only

Vector Load Register VLDR on page A8-924 Load one 64-bit register, Advanced SIMD and Floating-point

Load one 32-bit register, Floating-point only

Vector Store Multiple VSTM on page A8-1080 Store 1-16 consecutive 64-bit registers, Advanced SIMD and Floating-point

Store 1-16 consecutive 32-bit registers, Floating-point only

Vector Store Register VSTR on page A8-1082 Store one 64-bit register, Advanced SIMD and Floating-point

Store one 32-bit register, Floating-point only

Table A4-17 Element and structure load/store instructions

Instruction See

Load single element

Multiple elements VLD1 (multiple single elements) on page A8-898

To one lane VLD1 (single element to one lane) on page A8-900

To all lanes VLD1 (single element to all lanes) on page A8-902

Load 2-element structure

Multiple structures VLD2 (multiple 2-element structures) on page A8-904

To one lane VLD2 (single 2-element structure to one lane) on page A8-906

To all lanes VLD2 (single 2-element structure to all lanes) on page A8-908

Load 3-element structure

Multiple structures VLD3 (multiple 3-element structures) on page A8-910

To one lane VLD3 (single 3-element structure to one lane) on page A8-912

To all lanes VLD3 (single 3-element structure to all lanes) on page A8-914

Load 4-element structure

Multiple structures VLD4 (multiple 4-element structures) on page A8-916

To one lane VLD4 (single 4-element structure to one lane) on page A8-918

To all lanes VLD4 (single 4-element structure to all lanes) on page A8-920

Store single element

Multiple elements VST1 (multiple single elements) on page A8-1064

From one lane VST1 (single element from one lane) on page A8-1066

Store 2-element structure

Multiple structures VST2 (multiple 2-element structures) on page A8-1068

From one lane VST2 (single 2-element structure from one lane) on page A8-1070

Store 3-element structure

Multiple structures VST3 (multiple 3-element structures) on page A8-1072

From one lane VST3 (single 3-element structure from one lane) on page A8-1074

Store 4-element structure

Multiple structures VST4 (multiple 4-element structures) on page A8-1076

From one lane VST4 (single 4-element structure from one lane) on page A8-1078

Figure A4-1 shows the de-interleaving of a VLD3.16 (multiple 3-element structures) instruction.

Figure A4-1 De-interleaving an array of 3-element structures

[Figure: in memory, A is a packed array of 3-element structures A[0] to A[3], and each element is a 16-bit halfword. The instruction loads the x elements (X0 to X3) into D0, the y elements (Y0 to Y3) into D1, and the z elements (Z0 to Z3) into D2.]

Figure A4-1 shows the VLD3.16 instruction operating on three 64-bit registers, each of which holds four 16-bit elements:

• Different instructions in this group would produce similar figures, but operate on different numbers of registers. For example, VLD4 and VST4 instructions operate on four registers.

• Different element sizes would produce similar figures but with 8-bit or 32-bit elements.

• These instructions operate only on doubleword (64-bit) registers.
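The de-interleaving in Figure A4-1 and its inverse can be modelled in C. This sketch represents each D register as four uint16_t lanes and covers only the VLD3.16 and VST3.16 multiple-structure forms shown in the figure. The function names are hypothetical.

```c
#include <stdint.h>

/* VLD3.16 {D0, D1, D2}, [Rn]: read four x,y,z structures (12 halfwords) and
   de-interleave them, so d[0] gets the x elements, d[1] the y elements, and
   d[2] the z elements. */
static void vld3_16(const uint16_t mem[12], uint16_t d[3][4])
{
    for (int lane = 0; lane < 4; lane++)
        for (int e = 0; e < 3; e++)
            d[e][lane] = mem[3 * lane + e];
}

/* VST3.16: the inverse operation, interleaving the three registers into memory. */
static void vst3_16(uint16_t d[3][4], uint16_t mem[12])
{
    for (int lane = 0; lane < 4; lane++)
        for (int e = 0; e < 3; e++)
            mem[3 * lane + e] = d[e][lane];
}
```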


A4.12 Advanced SIMD and Floating-point register transfer instructions

Table A4-18 summarizes the extension register transfer instructions in the Advanced SIMD and Floating-point

(VFP) instruction sets. These instructions transfer data from ARM core registers to extension registers, or from

extension registers to ARM core registers.

Advanced SIMD vectors, and single-precision and double-precision Floating-point registers, are all views of the

same extension register set. For details see Advanced SIMD and Floating-point Extension registers on page A2-56.
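A hedged C model of these overlapping views follows. It assumes the usual mapping: D<n> is S<2n+1>:S<2n> for S0 to S31, and Q<n> is D<2n+1>:D<2n>. It also assumes that VMOV from two ARM core registers puts the first register in the low word of the doubleword. The register file holds 32 doubleword registers, as in an implementation that includes Advanced SIMD. All names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical register file: 32 doubleword registers D0-D31. */
typedef struct { uint64_t d[32]; } ext_regs;

/* S<n> (n < 32) is one half of D<n/2>; even-numbered S registers are the low half. */
static uint32_t read_s(const ext_regs *r, unsigned n)
{
    uint64_t dw = r->d[n / 2];
    return (n & 1u) ? (uint32_t)(dw >> 32) : (uint32_t)dw;
}

static void write_s(ext_regs *r, unsigned n, uint32_t v)
{
    uint64_t *dw = &r->d[n / 2];
    if (n & 1u)
        *dw = (*dw & 0x00000000FFFFFFFFull) | ((uint64_t)v << 32);
    else
        *dw = (*dw & 0xFFFFFFFF00000000ull) | v;
}

/* Q<n> (n < 16) is D<2n+1>:D<2n>. */
static void read_q(const ext_regs *r, unsigned n, uint64_t *lo, uint64_t *hi)
{
    *lo = r->d[2 * n];
    *hi = r->d[2 * n + 1];
}

/* VMOV Dm, Rt, Rt2: Rt supplies the low word and Rt2 the high word. */
static void vmov_d_from_core(ext_regs *r, unsigned m, uint32_t rt, uint32_t rt2)
{
    r->d[m] = ((uint64_t)rt2 << 32) | rt;
}
```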

Table A4-18 Extension register transfer instructions

Instruction See

Copy element from ARM core register to every element of Advanced SIMD vector VDUP (ARM core register) on page A8-886

Copy byte, halfword, or word from ARM core register to extension register VMOV (ARM core register to scalar) on page A8-940

Copy byte, halfword, or word from extension register to ARM core register VMOV (scalar to ARM core register) on page A8-942

Copy from single-precision Floating-point register to ARM core register, or from ARM core register to single-precision Floating-point register VMOV (between ARM core register and single-precision register) on page A8-944

Copy two words from ARM core registers to consecutive single-precision Floating-point registers, or from consecutive single-precision Floating-point registers to ARM core registers VMOV (between two ARM core registers and two single-precision registers) on page A8-946

Copy two words from ARM core registers to doubleword extension register, or from doubleword extension register to ARM core registers VMOV (between two ARM core registers and a doubleword extension register) on page A8-948

Copy from Advanced SIMD and Floating-point Extension System Register to ARM core register VMRS on page A8-954

VMRS on page B9-2014 (system level view)

Copy from ARM core register to Advanced SIMD and Floating-point Extension System Register VMSR on page A8-956

VMSR on page B9-2016 (system level view)


A4.13 Advanced SIMD data-processing instructions

Advanced SIMD data-processing instructions process registers containing vectors of elements of the same type

packed together, enabling the same operation to be performed on multiple items in parallel.

Instructions operate on vectors held in 64-bit or 128-bit registers. Figure A4-2 shows an operation on two 64-bit

operand vectors, generating a 64-bit vector result.

Note

Figure A4-2 and other similar figures show 64-bit vectors that consist of four 16-bit elements, and 128-bit vectors

that consist of four 32-bit elements. Other element sizes produce similar figures, but with one, two, eight, or sixteen

operations performed in parallel instead of four.

Figure A4-2 Advanced SIMD instruction operating on 64-bit registers

Many Advanced SIMD instructions have variants that produce vectors of elements double the size of the inputs. In

this case, the number of elements in the result vector is the same as the number of elements in the operand vectors,

but each element, and the whole vector, is double the size.

Figure A4-3 shows an example of an Advanced SIMD instruction operating on 64-bit registers, and generating a

128-bit result.

Figure A4-3 Advanced SIMD instruction producing wider result

There are also Advanced SIMD instructions that have variants that produce vectors containing elements half the

size of the inputs. Figure A4-4 on page A4-185 shows an example of an Advanced SIMD instruction operating on

one 128-bit register, and generating a 64-bit result.


Figure A4-4 Advanced SIMD instruction producing narrower result

Some Advanced SIMD instructions do not conform to these standard patterns. Their operation patterns are

described in the individual instruction descriptions.

Advanced SIMD instructions that perform floating-point arithmetic use the ARM standard floating-point arithmetic

defined in Floating-point data types and arithmetic on page A2-63.

A4.13.1 Advanced SIMD parallel addition and subtraction

Table A4-19 shows the Advanced SIMD parallel add and subtract instructions.


Table A4-19 Advanced SIMD parallel add and subtract instructions

Instruction See

Vector Add VADD (integer) on page A8-828

VADD (floating-point) on page A8-830

Vector Add and Narrow, returning High Half VADDHN on page A8-832

Vector Add Long, Vector Add Wide VADDL, VADDW on page A8-834

Vector Halving Add, Vector Halving Subtract VHADD, VHSUB on page A8-896

Vector Pairwise Add and Accumulate Long VPADAL on page A8-978

Vector Pairwise Add VPADD (integer) on page A8-980

VPADD (floating-point) on page A8-982

Vector Pairwise Add Long VPADDL on page A8-984

Vector Rounding Add and Narrow, returning High Half VRADDHN on page A8-1022

Vector Rounding Halving Add VRHADD on page A8-1030

Vector Rounding Subtract and Narrow, returning High Half VRSUBHN on page A8-1044

Vector Saturating Add VQADD on page A8-996

Vector Saturating Subtract VQSUB on page A8-1020

Vector Subtract VSUB (integer) on page A8-1084

VSUB (floating-point) on page A8-1086

Vector Subtract and Narrow, returning High Half VSUBHN on page A8-1088

Vector Subtract Long, Vector Subtract Wide VSUBL, VSUBW on page A8-1090
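The following C sketch models one lane-wise operation for several rows of Table A4-19: ordinary wrapping addition, saturating addition, halving addition, the widening (Long) form, and the narrowing (high half) form. It assumes four 16-bit lanes for a doubleword register and four 32-bit lanes for a quadword register, as in Figure A4-2 and Figure A4-3. Setting of the FPSCR.QC saturation flag is not modelled. The function names are hypothetical.

```c
#include <stdint.h>

#define LANES 4  /* four 16-bit lanes in a D register, or four 32-bit lanes in a Q register */

/* VADD.I16: lane-wise add, wrapping modulo 2^16. */
static void vadd_i16(const uint16_t a[LANES], const uint16_t b[LANES], uint16_t r[LANES])
{
    for (int i = 0; i < LANES; i++)
        r[i] = (uint16_t)(a[i] + b[i]);
}

/* VQADD.S16: lane-wise add, saturating to the signed 16-bit range. */
static void vqadd_s16(const int16_t a[LANES], const int16_t b[LANES], int16_t r[LANES])
{
    for (int i = 0; i < LANES; i++) {
        int32_t s = (int32_t)a[i] + b[i];
        r[i] = (int16_t)(s > INT16_MAX ? INT16_MAX : s < INT16_MIN ? INT16_MIN : s);
    }
}

/* VHADD.U16: (a + b) >> 1, formed in 32 bits so the carry is not lost. */
static void vhadd_u16(const uint16_t a[LANES], const uint16_t b[LANES], uint16_t r[LANES])
{
    for (int i = 0; i < LANES; i++)
        r[i] = (uint16_t)(((uint32_t)a[i] + b[i]) >> 1);
}

/* VADDL.U16: widening add, D + D -> Q (four 16-bit lanes -> four 32-bit lanes). */
static void vaddl_u16(const uint16_t a[LANES], const uint16_t b[LANES], uint32_t r[LANES])
{
    for (int i = 0; i < LANES; i++)
        r[i] = (uint32_t)a[i] + b[i];
}

/* VADDHN.I32: add 32-bit lanes and keep the high half of each sum, Q + Q -> D. */
static void vaddhn_i32(const uint32_t a[LANES], const uint32_t b[LANES], uint16_t r[LANES])
{
    for (int i = 0; i < LANES; i++)
        r[i] = (uint16_t)((uint32_t)(a[i] + b[i]) >> 16);
}
```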


A4.13.2 Bitwise Advanced SIMD data-processing instructions

Table A4-20 shows bitwise Advanced SIMD data-processing instructions. These operate on the doubleword

(64-bit) or quadword (128-bit) extension registers, and there is no division into vector elements.

Table A4-20 Bitwise Advanced SIMD data-processing instructions

Instruction See

Vector Bitwise AND VAND (register) on page A8-836

Vector Bitwise Bit Clear (AND complement) VBIC (immediate) on page A8-838

VBIC (register) on page A8-840

Vector Bitwise Exclusive OR VEOR on page A8-888

Vector Bitwise Insert if False VBIF, VBIT, VBSL on page A8-842

Vector Bitwise Insert if True VBIF, VBIT, VBSL on page A8-842

Vector Bitwise Move VMOV (immediate) on page A8-936

VMOV (register) on page A8-938

Vector Bitwise NOT VMVN (immediate) on page A8-964

VMVN (register) on page A8-966

Vector Bitwise OR VORR (immediate) on page A8-974

VORR (register) on page A8-976

Vector Bitwise OR NOT VORN (register) on page A8-972

Vector Bitwise Select VBIF, VBIT, VBSL on page A8-842

A4.13.3 Advanced SIMD comparison instructions

Table A4-21 shows Advanced SIMD comparison instructions.

Table A4-21 Advanced SIMD comparison instructions

Instruction See

Vector Absolute Compare VACGE, VACGT, VACLE, VACLT on page A8-826

Vector Compare Equal VCEQ (register) on page A8-844

Vector Compare Equal to Zero VCEQ (immediate #0) on page A8-846

Vector Compare Greater Than or Equal VCGE (register) on page A8-848

Vector Compare Greater Than or Equal to Zero VCGE (immediate #0) on page A8-850

Vector Compare Greater Than VCGT (register) on page A8-852

Vector Compare Greater Than Zero VCGT (immediate #0) on page A8-854

Vector Compare Less Than or Equal to Zero VCLE (immediate #0) on page A8-856

Vector Compare Less Than Zero VCLT (immediate #0) on page A8-860

Vector Test Bits VTST on page A8-1098
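Comparison results are lanes of all ones or all zeros, so they combine naturally with the bitwise select instructions in Table A4-20. The C sketch below models VCGT.S16 and VBSL on 16-bit lanes. In VBSL, the original value of the destination register acts as the mask: each result bit comes from the first operand where the mask bit is 1, and from the second operand where it is 0. The sketch then uses them to form a lane-wise maximum. This is one illustrative use, not the only one, and the function names are hypothetical.

```c
#include <stdint.h>

#define LANES 4

/* VCGT.S16: each result lane is all ones if a > b, otherwise all zeros. */
static void vcgt_s16(const int16_t a[LANES], const int16_t b[LANES], uint16_t mask[LANES])
{
    for (int i = 0; i < LANES; i++)
        mask[i] = (a[i] > b[i]) ? 0xFFFFu : 0x0000u;
}

/* VBSL: result bits come from x where the mask bit is 1, from y where it is 0. */
static void vbsl_16(const uint16_t mask[LANES], const uint16_t x[LANES],
                    const uint16_t y[LANES], uint16_t r[LANES])
{
    for (int i = 0; i < LANES; i++)
        r[i] = (uint16_t)((x[i] & mask[i]) | (y[i] & ~mask[i]));
}

/* Lane-wise signed maximum built from a compare followed by a select. */
static void max_via_select(const int16_t a[LANES], const int16_t b[LANES], int16_t r[LANES])
{
    uint16_t mask[LANES], ua[LANES], ub[LANES], ur[LANES];
    vcgt_s16(a, b, mask);
    for (int i = 0; i < LANES; i++) {
        ua[i] = (uint16_t)a[i];
        ub[i] = (uint16_t)b[i];
    }
    vbsl_16(mask, ua, ub, ur);
    for (int i = 0; i < LANES; i++)
        r[i] = (int16_t)ur[i];
}
```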


A4.13.4 Advanced SIMD shift instructions

Table A4-22 lists the shift instructions in the Advanced SIMD instruction set.

Table A4-22 Advanced SIMD shift instructions

Instruction See

Vector Saturating Rounding Shift Left VQRSHL on page A8-1010

Vector Saturating Rounding Shift Right and Narrow VQRSHRN, VQRSHRUN on page A8-1012

Vector Saturating Shift Left VQSHL (register) on page A8-1014

VQSHL, VQSHLU (immediate) on page A8-1016

Vector Saturating Shift Right and Narrow VQSHRN, VQSHRUN on page A8-1018

Vector Rounding Shift Left VRSHL on page A8-1032

Vector Rounding Shift Right VRSHR on page A8-1034

Vector Rounding Shift Right and Accumulate VRSRA on page A8-1042

Vector Rounding Shift Right and Narrow VRSHRN on page A8-1036

Vector Shift Left VSHL (immediate) on page A8-1046

VSHL (register) on page A8-1048

Vector Shift Left Long VSHLL on page A8-1050

Vector Shift Right VSHR on page A8-1052

Vector Shift Right and Narrow VSHRN on page A8-1054

Vector Shift Left and Insert VSLI on page A8-1056

Vector Shift Right and Accumulate VSRA on page A8-1060

Vector Shift Right and Insert VSRI on page A8-1062
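The C sketch below models several rows of Table A4-22 on one unsigned 16-bit lane, to show how the plain, rounding, accumulating, inserting, and saturating variants differ. It assumes shift amounts within each instruction's immediate range (1 to 16 for the right shifts, 0 to 15 for VQSHL). Setting of FPSCR.QC on saturation is not modelled, and the function names are hypothetical.

```c
#include <stdint.h>

/* VSHR.U16 #n: logical shift right. */
static uint16_t vshr_u16(uint16_t x, unsigned n)
{
    return (uint16_t)((uint32_t)x >> n);
}

/* VRSHR.U16 #n: add the rounding constant 2^(n-1) before shifting.
   The sum is formed in 32 bits so it cannot overflow. */
static uint16_t vrshr_u16(uint16_t x, unsigned n)
{
    return (uint16_t)(((uint32_t)x + (1u << (n - 1))) >> n);
}

/* VSRA.U16 #n: shift right, then add into the destination lane. */
static uint16_t vsra_u16(uint16_t acc, uint16_t x, unsigned n)
{
    return (uint16_t)(acc + vshr_u16(x, n));
}

/* VSRI.16 #n: shift x right and insert it into d; the top n bits of d are kept. */
static uint16_t vsri_16(uint16_t d, uint16_t x, unsigned n)
{
    uint16_t keep = (uint16_t)~(0xFFFFu >> n);
    return (uint16_t)((d & keep) | ((uint32_t)x >> n));
}

/* VQSHL.U16 #n (immediate): shift left, saturating to 0xFFFF on overflow. */
static uint16_t vqshl_u16(uint16_t x, unsigned n)
{
    uint32_t v = (uint32_t)x << n;
    return (uint16_t)(v > 0xFFFFu ? 0xFFFFu : v);
}
```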


A4.13.5 Advanced SIMD multiply instructions

Table A4-23 summarizes the Advanced SIMD multiply instructions.

Advanced SIMD multiply instructions can operate on vectors of:

• 8-bit, 16-bit, or 32-bit unsigned integers.

• 8-bit, 16-bit, or 32-bit signed integers.

• 8-bit polynomials over {0, 1}. VMUL and VMULL are the only instructions that operate on polynomials. VMULL produces a 16-bit polynomial over {0, 1}.

• Single-precision (32-bit) floating-point numbers.

They can also act on one vector and one scalar.

Long instructions have doubleword (64-bit) operands, and produce quadword (128-bit) results. Other Advanced

SIMD multiply instructions can have either doubleword or quadword operands, and produce results of the same

size.

Floating-point multiply instructions can operate on:

• single-precision (32-bit) floating-point numbers

• double-precision (64-bit) floating-point numbers.

Some Floating-point Extension implementations do not support double-precision numbers.
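Polynomial multiplication over {0, 1} differs from integer multiplication because addition is exclusive OR, so no carries propagate. The C sketch below models one lane of VMULL.P8 and VMUL.P8, and one lane of VQDMULH.S16 for comparison. It assumes arithmetic right shift of negative values, as GCC and Clang provide, and does not model FPSCR.QC. The function names are hypothetical.

```c
#include <stdint.h>

/* VMULL.P8 lane: carry-less product of two polynomials of degree at most 7,
   giving a polynomial of degree at most 14 in a 16-bit result. */
static uint16_t pmull8(uint8_t a, uint8_t b)
{
    uint16_t result = 0;
    for (int i = 0; i < 8; i++)
        if ((b >> i) & 1u)
            result ^= (uint16_t)((uint16_t)a << i);  /* add partial product with XOR */
    return result;
}

/* VMUL.P8 lane: the same product, keeping only the low 8 bits. */
static uint8_t pmul8(uint8_t a, uint8_t b)
{
    return (uint8_t)pmull8(a, b);
}

/* VQDMULH.S16 lane: saturate((2 * a * b) >> 16). Only -32768 * -32768 saturates. */
static int16_t vqdmulh_s16(int16_t a, int16_t b)
{
    int64_t p = (2 * (int64_t)a * b) >> 16;
    if (p > INT16_MAX) return INT16_MAX;
    if (p < INT16_MIN) return INT16_MIN;
    return (int16_t)p;
}
```

For example, pmull8(0x03, 0x03) is 0x05, because (x + 1)^2 = x^2 + 1 when addition is exclusive OR.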

Table A4-23 Advanced SIMD multiply instructions

Instruction See

Vector Multiply Accumulate, Vector Multiply Accumulate Long, Vector Multiply Subtract, Vector Multiply Subtract Long VMLA, VMLAL, VMLS, VMLSL (integer) on page A8-930

VMLA, VMLS (floating-point) on page A8-932

VMLA, VMLAL, VMLS, VMLSL (by scalar) on page A8-934

Vector Multiply, Vector Multiply Long VMUL, VMULL (integer and polynomial) on page A8-958

VMUL (floating-point) on page A8-960

VMUL, VMULL (by scalar) on page A8-962

Vector Fused Multiply Accumulate, Vector Fused Multiply Subtract VFMA, VFMS on page A8-892

Vector Saturating Doubling Multiply Accumulate Long, Vector Saturating Doubling Multiply Subtract Long VQDMLAL, VQDMLSL on page A8-998

Vector Saturating Doubling Multiply Returning High Half VQDMULH on page A8-1000

Vector Saturating Rounding Doubling Multiply Returning High Half VQRDMULH on page A8-1008

Vector Saturating Doubling Multiply Long VQDMULL on page A8-1002


A4.13.6 Miscellaneous Advanced SIMD data-processing instructions

Table A4-24 shows miscellaneous Advanced SIMD data-processing instructions.

Table A4-24 Miscellaneous Advanced SIMD data-processing instructions

Instruction See

Vector Absolute Difference and Accumulate VABA, VABAL on page A8-818

Vector Absolute Difference VABD, VABDL (integer) on page A8-820

VABD (floating-point) on page A8-822

Vector Absolute VABS on page A8-824

Vector Convert between floating-point and fixed-point VCVT (between floating-point and fixed-point, Advanced SIMD) on page A8-872

Vector Convert between floating-point and integer VCVT (between floating-point and integer, Advanced SIMD) on page A8-868

Vector Convert between half-precision and single-precision VCVT (between half-precision and single-precision, Advanced SIMD) on page A8-878

Vector Count Leading Sign Bits VCLS on page A8-858

Vector Count Leading Zeros VCLZ on page A8-862

Vector Count Set Bits VCNT on page A8-866

Vector Duplicate scalar VDUP (scalar) on page A8-884

Vector Extract VEXT on page A8-890

Vector Move and Narrow VMOVN on page A8-952

Vector Move Long VMOVL on page A8-950

Vector Maximum, Minimum VMAX, VMIN (integer) on page A8-926

VMAX, VMIN (floating-point) on page A8-928

Vector Negate VNEG on page A8-968

Vector Pairwise Maximum, Minimum VPMAX, VPMIN (integer) on page A8-986

VPMAX, VPMIN (floating-point) on page A8-988

Vector Reciprocal Estimate VRECPE on page A8-1024

Vector Reciprocal Step VRECPS on page A8-1026

Vector Reciprocal Square Root Estimate VRSQRTE on page A8-1038

Vector Reciprocal Square Root Step VRSQRTS on page A8-1040

Vector Reverse VREV16, VREV32, VREV64 on page A8-1028

Vector Saturating Absolute VQABS on page A8-994

Vector Saturating Move and Narrow VQMOVN, VQMOVUN on page A8-1004

Vector Saturating Negate VQNEG on page A8-1006

Vector Swap VSWP on page A8-1092

Vector Table Lookup VTBL, VTBX on page A8-1094


Vector Transpose VTRN on page A8-1096

Vector Unzip VUZP on page A8-1100

Vector Zip VZIP on page A8-1102


A4.14 Floating-point data-processing instructions

Table A4-25 summarizes the data-processing instructions in the Floating-point (VFP) instruction set.

For details of the floating-point arithmetic used by Floating-point instructions, see Floating-point data types and arithmetic on page A2-63.

Table A4-25 Floating-point data-processing instructions

Instruction See

Absolute value VABS on page A8-824

Add VADD (floating-point) on page A8-830

Compare, optionally with exceptions enabled VCMP, VCMPE on page A8-864

Convert between floating-point and integer VCVT, VCVTR (between floating-point and integer, Floating-point) on page A8-870

Convert between floating-point and fixed-point VCVT (between floating-point and fixed-point, Floating-point) on page A8-874

Convert between double-precision and single-precision VCVT (between double-precision and single-precision) on page A8-876

Convert between half-precision and single-precision VCVTB, VCVTT on page A8-880

Divide VDIV on page A8-882

Multiply Accumulate VMLA, VMLS (floating-point) on page A8-932

Multiply Subtract

Fused Multiply Accumulate VFMA, VFMS on page A8-892

Fused Multiply Subtract

Move immediate value to extension register VMOV (immediate) on page A8-936

Copy from one extension register to another VMOV (register) on page A8-938

Multiply VMUL (floating-point) on page A8-960

Negate, by inverting the sign bit VNEG on page A8-968

Multiply Accumulate and Negate VNMLA, VNMLS, VNMUL on page A8-970

Multiply Subtract and Negate

Multiply and Negate

Fused Negate Multiply Accumulate VFNMA, VFNMS on page A8-894

Fused Negate Multiply Subtract

Square Root VSQRT on page A8-1058

Subtract VSUB (floating-point) on page A8-1086


Chapter A5

ARM Instruction Set Encoding

This chapter describes the encoding of the ARM instruction set. It contains the following sections:

•ARM instruction set encoding on page A5-194

•Data-processing and miscellaneous instructions on page A5-196

•Load/store word and unsigned byte on page A5-208

•Media instructions on page A5-209

•Branch, branch with link, and block data transfer on page A5-214

•Coprocessor instructions, and Supervisor Call on page A5-215

•Unconditional instructions on page A5-216.

Note

• Architecture variant information in this chapter describes the architecture variant or extension in which the instruction encoding was introduced into the ARM instruction set. All means that the instruction encoding was introduced in ARMv4 or earlier, and so is in all variants of the ARM instruction set covered by this manual.

• In the decode tables in this chapter, an entry of - for a field value means the value of the field does not affect the decoding.


A5.1 ARM instruction set encoding

The ARM instruction stream is a sequence of word-aligned words. Each ARM instruction is a single 32-bit word in that stream. The encoding of an ARM instruction is:

Table A5-1 shows the major subdivisions of the ARM instruction set, determined by bits[31:25, 4].

Most ARM instructions can be conditional, with a condition determined by bits[31:28] of the instruction, the cond field. For more information see The condition code field. This applies to all instructions except those with the cond field equal to 0b1111.

A5.1.1 The condition code field

Every conditional instruction contains a 4-bit condition code field, the cond field, in bits 31 to 28.

This field contains one of the values 0b0000-0b1110, as shown in Table A8-1 on page A8-288. Most instruction mnemonics can be extended with the letters defined in the mnemonic extension column of this table.

If the always (AL) condition is specified, the instruction is executed irrespective of the value of the condition flags. The absence of a condition code on an instruction mnemonic implies the AL condition code.
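The cond field can be read as a three-bit test selector, cond<3:1>, plus an inversion bit, cond<0>. The following Python sketch evaluates a condition against the APSR N, Z, C, and V flags. The flag tests follow the standard ARM condition table (Table A8-1, which is not reproduced in this section); the function name `condition_passed` and its boolean-flags interface are illustrative assumptions, not part of the architecture.

```python
def condition_passed(cond: int, n: bool, z: bool, c: bool, v: bool) -> bool:
    """Return True if an instruction with this 4-bit cond field would execute."""
    if not 0 <= cond <= 0b1110:
        raise ValueError("cond must be 0b0000-0b1110; 0b1111 is not a condition")
    test = cond >> 1                 # cond<3:1> selects the flag test
    if test == 0b000:
        result = z                   # EQ (Z set) / NE
    elif test == 0b001:
        result = c                   # CS (C set) / CC
    elif test == 0b010:
        result = n                   # MI (N set) / PL
    elif test == 0b011:
        result = v                   # VS (V set) / VC
    elif test == 0b100:
        result = c and not z         # HI / LS
    elif test == 0b101:
        result = n == v              # GE / LT
    elif test == 0b110:
        result = n == v and not z    # GT / LE
    else:
        result = True                # AL (0b1110)
    if cond & 1:                     # odd values invert the test (AL is even)
        result = not result
    return result
```

Rejecting 0b1111 mirrors the rule above that encodings with cond == 0b1111 are not conditional instructions.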

Encoding diagram: cond = bits[31:28], op1 = bits[27:25], op = bit[4].

Table A5-1 ARM instruction encoding

cond op1 op Instruction classes

not 1111 00x - Data-processing and miscellaneous instructions on page A5-196.

010 - Load/store word and unsigned byte on page A5-208.

011 0 Load/store word and unsigned byte on page A5-208.

1 Media instructions on page A5-209.

10x - Branch, branch with link, and block data transfer on page A5-214.

11x - Coprocessor instructions, and Supervisor Call on page A5-215. Includes Floating-point instructions and Advanced SIMD data transfers, see Chapter A7 Advanced SIMD and Floating-point Instruction Encoding.

1111 - - If the cond field is 0b1111, the instruction can only be executed unconditionally, see Unconditional instructions on page A5-216. Includes Advanced SIMD instructions, see Chapter A7 Advanced SIMD and Floating-point Instruction Encoding.
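A minimal Python sketch of the top-level decode in Table A5-1, assuming the 32-bit instruction word is held in a Python int. The function name and the returned class strings are hypothetical; a real decoder would dispatch into the per-class tables rather than return text.

```python
def classify_arm(instr: int) -> str:
    """Return the Table A5-1 instruction class of a 32-bit ARM instruction."""
    cond = (instr >> 28) & 0xF       # bits[31:28]
    op1 = (instr >> 25) & 0b111      # bits[27:25]
    op = (instr >> 4) & 1            # bit[4]
    if cond == 0b1111:
        return "Unconditional instructions"
    if op1 in (0b000, 0b001):        # 00x
        return "Data-processing and miscellaneous instructions"
    if op1 == 0b010 or (op1 == 0b011 and op == 0):
        return "Load/store word and unsigned byte"
    if op1 == 0b011:                 # op == 1
        return "Media instructions"
    if op1 in (0b100, 0b101):        # 10x
        return "Branch, branch with link, and block data transfer"
    return "Coprocessor instructions, and Supervisor Call"   # 11x
```

For example, 0xE5912000 (an LDR with an immediate offset) has op1 == 0b010 and is classified as a load/store word and unsigned byte instruction.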


A5.1.2 UNDEFINED and UNPREDICTABLE instruction set space

An attempt to execute an unallocated instruction results in either:

• Unpredictable behavior. The instruction is described as UNPREDICTABLE.

• An Undefined Instruction exception. The instruction is described as UNDEFINED.

An instruction is UNDEFINED if it is declared as UNDEFINED in an instruction description, or in this chapter.

An instruction is UNPREDICTABLE if:

• it is declared as UNPREDICTABLE in an instruction description or in this chapter

• the pseudocode for that encoding does not indicate that a different special case applies, and a bit marked (0) or (1) in the encoding diagram of an instruction is not 0 or 1 respectively.
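The second UNPREDICTABLE condition can be checked with two masks. This is a Python sketch, assuming the (1) and (0) positions of an encoding diagram are available as bit masks; the function name and the example masks are hypothetical, and the sketch ignores the case where the encoding's pseudocode specifies a different special case.

```python
def fixed_bits_ok(instr: int, ones_mask: int, zeros_mask: int) -> bool:
    """Check the bits an encoding diagram marks (1) and (0).

    ones_mask:  bits marked (1); each must be 1 in instr.
    zeros_mask: bits marked (0); each must be 0 in instr.
    Returns False when the encoding would be UNPREDICTABLE for this reason.
    """
    return (instr & ones_mask) == ones_mask and (instr & zeros_mask) == 0
```

For a hypothetical diagram that marks bits[11:8] as (1)(1)(1)(1), `fixed_bits_ok(instr, 0x00000F00, 0)` returns False whenever any of those four bits is 0.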

For more information about UNDEFINED and UNPREDICTABLE instruction behavior, see Undefined Instruction exception on page B1-1206.

Unless otherwise specified:

• ARM instructions introduced in an architecture variant are UNDEFINED in earlier architecture variants.

• ARM instructions introduced in one or more architecture extensions are UNDEFINED in an implementation that does not include any of those extensions.

A5.1.3 The PC and the use of 0b1111 as a register specifier

In ARM instructions, the use of 0b1111 as a register specifier specifies the PC.

Many instructions are UNPREDICTABLE if they use 0b1111 as a register specifier. This is specified by pseudocode in the instruction description.

Note

In ARMv7, ARM deprecates use of the PC as the base register in any store instruction.

A5.1.4 The SP and the use of 0b1101 as a register specifier

In ARM instructions, the use of 0b1101 as a register specifier specifies the SP.

ARM deprecates using SP for any purpose other than as a stack pointer.


A5.2 Data-processing and miscellaneous instructions

The encoding of ARM data-processing instructions, and some miscellaneous instructions, is:

Table A5-2 shows the allocation of encodings in this space.

Table A5-2 Data-processing and miscellaneous instructions

op op1 op2 Instruction or instruction class Variant

0 not 10xx0 xxx0 Data-processing (register) on page A5-197 -

0xx1 Data-processing (register-shifted register) on page A5-198 -

10xx0 0xxx Miscellaneous instructions on page A5-207 -

1xx0 Halfword multiply and multiply accumulate on page A5-203 -

0xxxx 1001 Multiply and multiply accumulate on page A5-202 -

1xxxx 1001 Synchronization primitives on page A5-205 -

not 0xx1x 1011 Extra load/store instructions on page A5-203 -

11x1 Extra load/store instructions on page A5-203 -

0xx10 11x1 Extra load/store instructions on page A5-203 -

0xx1x 1011 Extra load/store instructions, unprivileged on page A5-204 -

0xx11 11x1 Extra load/store instructions, unprivileged on page A5-204 -

1 not 10xx0 - Data-processing (immediate) on page A5-199 -

10000 - 16-bit immediate load, MOV (immediate) on page A8-484 v6T2

10100 - High halfword 16-bit immediate load, MOVT on page A8-491 v6T2

10x10 - MSR (immediate), and hints on page A5-206 -

Encoding diagram: cond = bits[31:28], bits[27:26] = 0b00, op = bit[25], op1 = bits[24:20], op2 = bits[7:4].


A5.2.1 Data-processing (register)

The encoding of ARM data-processing (register) instructions is:

Table A5-3 shows the allocation of encodings in this space. These encodings are in all architecture variants.

Table A5-3 Data-processing (register) instructions

op op2 imm5 Instruction See

0000x - - Bitwise AND AND (register) on page A8-326

0001x - - Bitwise Exclusive OR EOR (register) on page A8-384

0010x - - Subtract SUB (register) on page A8-712

0011x - - Reverse Subtract RSB (register) on page A8-576

0100x - - Add ADD (register, ARM) on page A8-312

0101x - - Add with Carry ADC (register) on page A8-302

0110x - - Subtract with Carry SBC (register) on page A8-594

0111x - - Reverse Subtract with Carry RSC (register) on page A8-582

10xx0 - - See Data-processing and miscellaneous instructions on page A5-196

10001 - - Test TST (register) on page A8-746

10011 - - Test Equivalence TEQ (register) on page A8-740

10101 - - Compare CMP (register) on page A8-372

10111 - - Compare Negative CMN (register) on page A8-366

1100x - - Bitwise OR ORR (register) on page A8-518

1101x 00 00000 Move MOV (register, ARM) on page A8-488

not 00000 Logical Shift Left LSL (immediate) on page A8-468

01 - Logical Shift Right LSR (immediate) on page A8-472

10 - Arithmetic Shift Right ASR (immediate) on page A8-330

11 00000 Rotate Right with Extend RRX on page A8-572

not 00000 Rotate Right ROR (immediate) on page A8-568

1110x - - Bitwise Bit Clear BIC (register) on page A8-342

1111x - - Bitwise NOT MVN (register) on page A8-506

Encoding diagram: cond = bits[31:28], bits[27:25] = 0b000, op = bits[24:20], imm5 = bits[11:7], op2 = bits[6:5], bit[4] = 0.
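For op == 1101x, Table A5-3 uses op2 and imm5 to distinguish MOV from the immediate shifts. A minimal Python sketch of that sub-decode, assuming the field positions op = bits[24:20], imm5 = bits[11:7], and op2 = bits[6:5]; the function name is hypothetical:

```python
def decode_dp_register_1101x(instr: int) -> str:
    """Decode the op == 1101x rows of Table A5-3 (MOV and immediate shifts)."""
    op = (instr >> 20) & 0x1F        # bits[24:20]
    if (op & 0b11110) != 0b11010:
        raise ValueError("op field is not 1101x")
    imm5 = (instr >> 7) & 0x1F       # bits[11:7], the shift amount
    op2 = (instr >> 5) & 0b11        # bits[6:5], the shift type
    if op2 == 0b00:
        return "MOV (register, ARM)" if imm5 == 0 else "LSL (immediate)"
    if op2 == 0b01:
        return "LSR (immediate)"
    if op2 == 0b10:
        return "ASR (immediate)"
    return "RRX" if imm5 == 0 else "ROR (immediate)"
```

A left shift by zero is the same operation as MOV, which is why the op2 == 00 row splits on imm5 == 00000; in the same way, a rotate with imm5 == 00000 is RRX rather than ROR.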


A5.2.2 Data-processing (register-shifted register)

The encoding of ARM data-processing (register-shifted register) instructions is:

Table A5-4 shows the allocation of encodings in this space. These encodings are in all architecture variants.

Table A5-4 Data-processing (register-shifted register) instructions

op1 op2 Instruction See

0000x - Bitwise AND AND (register-shifted register) on page A8-328

0001x - Bitwise Exclusive OR EOR (register-shifted register) on page A8-386

0010x - Subtract SUB (register-shifted register) on page A8-714

0011x - Reverse Subtract RSB (register-shifted register) on page A8-578

0100x - Add ADD (register-shifted register) on page A8-314

0101x - Add with Carry ADC (register-shifted register) on page A8-304

0110x - Subtract with Carry SBC (register-shifted register) on page A8-596

0111x - Reverse Subtract with Carry RSC (register-shifted register) on page A8-584

10xx0 - See Data-processing and miscellaneous instructions on page A5-196

10001 - Test TST (register-shifted register) on page A8-748

10011 - Test Equivalence TEQ (register-shifted register) on page A8-742

10101 - Compare CMP (register-shifted register) on page A8-374

10111 - Compare Negative CMN (register-shifted register) on page A8-368

1100x - Bitwise OR ORR (register-shifted register) on page A8-520

1101x 00 Logical Shift Left LSL (register) on page A8-470

01 Logical Shift Right LSR (register) on page A8-474

10 Arithmetic Shift Right ASR (register) on page A8-332

11 Rotate Right ROR (register) on page A8-570

1110x - Bitwise Bit Clear BIC (register-shifted register) on page A8-344

1111x - Bitwise NOT MVN (register-shifted register) on page A8-508

Encoding diagram: cond = bits[31:28], bits[27:25] = 0b000, op1 = bits[24:20], bit[7] = 0, op2 = bits[6:5], bit[4] = 1.


A5.2.3 Data-processing (immediate)

The encoding of ARM data-processing (immediate) instructions is:

Table A5-5 shows the allocation of encodings in this space. These encodings are in all architecture variants.

These instructions all have modified immediate constants, rather than a simple 12-bit binary number. This provides a more useful range of values. For details see Modified immediate constants in ARM instructions on page A5-200.

Table A5-5 Data-processing (immediate) instructions

op Rn Instruction See

0000x - Bitwise AND AND (immediate) on page A8-324

0001x - Bitwise Exclusive OR EOR (immediate) on page A8-382

0010x not 1111 Subtract SUB (immediate, ARM) on page A8-710

1111 Form PC-relative address ADR on page A8-322

0011x - Reverse Subtract RSB (immediate) on page A8-574

0100x not 1111 Add ADD (immediate, ARM) on page A8-308

1111 Form PC-relative address ADR on page A8-322

0101x - Add with Carry ADC (immediate) on page A8-300

0110x - Subtract with Carry SBC (immediate) on page A8-592

0111x - Reverse Subtract with Carry RSC (immediate) on page A8-580

10xx0 - See Data-processing and miscellaneous instructions on page A5-196

10001 - Test TST (immediate) on page A8-744

10011 - Test Equivalence TEQ (immediate) on page A8-738

10101 - Compare CMP (immediate) on page A8-370

10111 - Compare Negative CMN (immediate) on page A8-364

1100x - Bitwise OR ORR (immediate) on page A8-516

1101x - Move MOV (immediate) on page A8-484

1110x - Bitwise Bit Clear BIC (immediate) on page A8-340

1111x - Bitwise NOT MVN (immediate) on page A8-504

Encoding diagram: cond = bits[31:28], bits[27:25] = 0b001, op = bits[24:20], Rn = bits[19:16].


A5.2.4 Modified immediate constants in ARM instructions

The encoding of a modified immediate constant in an ARM instruction is:

Table A5-6 shows the range of modified immediate constants available in ARM data-processing instructions, and their encoding in the a, b, c, d, e, f, g, and h bits and the rotation field in the instruction.

Note

The range of values available in ARM modified immediate constants is slightly different from the range of values available in 32-bit Thumb instructions. See Modified immediate constants in Thumb instructions on page A6-232.

Carry out

A logical instruction with the rotation field set to 0b0000 does not affect APSR.C. Otherwise, a logical flag-setting instruction sets APSR.C to the value of bit[31] of the modified immediate constant.

Constants with multiple encodings

Some constant values have multiple possible encodings. In this case, a UAL assembler must select the encoding with the lowest unsigned value of the rotation field. This is the encoding that appears first in Table A5-6. For example, the constant 3 must be encoded with (rotation, abcdefgh) == (0b0000, 0b00000011), not (0b0001, 0b00001100), (0b0010, 0b00110000), or (0b0011, 0b11000000).

Table A5-6 Encoding of modified immediates in ARM processing instructions

rotation <const> a

0000 00000000 00000000 00000000 abcdefgh

0001 gh000000 00000000 00000000 00abcdef

0010 efgh0000 00000000 00000000 0000abcd

0011 cdefgh00 00000000 00000000 000000ab

0100 abcdefgh 00000000 00000000 00000000

8-bit values shifted to other even-numbered positions

1001 00000000 00abcdef gh000000 00000000

8-bit values shifted to other even-numbered positions

1110 00000000 00000000 0000abcd efgh0000

1111 00000000 00000000 000000ab cdefgh00

a. This table shows the immediate constant value in binary form, to relate abcdefgh to the encoding diagram. In assembly syntax, the immediate value is specified in the usual way (a decimal number by default).

Encoding diagram: rotation = bits[11:8], abcdefgh = bits[7:0] (a is bit[7], h is bit[0]).


In particular, this means that all constants in the range 0-255 are encoded with rotation == 0b0000, and permitted constants outside that range are encoded with rotation != 0b0000. A flag-setting logical instruction with a modified immediate constant therefore leaves APSR.C unchanged if the constant is in the range 0-255 and sets it to the most significant bit of the constant otherwise. This matches the behavior of Thumb modified immediate constants for all constants that are permitted in both the ARM and Thumb instruction sets.
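The lowest-rotation rule can be implemented by trying rotations in increasing order and keeping the first one whose un-rotated value fits in eight bits. A minimal Python sketch, assuming constants are passed as non-negative ints; the helper names `ror32` and `encode_modified_imm` are illustrative and do not come from this manual.

```python
from typing import Optional


def ror32(value: int, amount: int) -> int:
    """Rotate a 32-bit value right by amount bits."""
    amount %= 32
    value &= 0xFFFFFFFF
    if amount == 0:
        return value
    return ((value >> amount) | (value << (32 - amount))) & 0xFFFFFFFF


def encode_modified_imm(const: int) -> Optional[int]:
    """Return imm12 (rotation:abcdefgh) for const, or None if it is not encodable.

    Rotations are tried from 0 upward, so the encoding with the lowest
    rotation value is the one returned.
    """
    const &= 0xFFFFFFFF
    for rotation in range(16):
        # const == ROR(byte, 2*rotation), so byte == ROL(const, 2*rotation).
        byte = ror32(const, 32 - 2 * rotation)
        if byte <= 0xFF:
            return (rotation << 8) | byte
    return None
```

With this search order, `encode_modified_imm(3)` returns 0x003 (rotation 0b0000), matching the example above, while a value such as 0x101, whose set bits span nine positions, has no encoding and yields None.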

An alternative syntax is available for a modified immediate constant that permits the programmer to specify the encoding directly. In this syntax, #<const> is instead written as #<byte>, #<rot>, where:

<byte> is the numeric value of abcdefgh, in the range 0-255

<rot> is twice the numeric value of rotation, an even number in the range 0-30.

This syntax permits all ARM data-processing instructions with modified immediate constants to be disassembled to assembler syntax that assembles to the original instruction.

This syntax also makes it possible to write variants of some flag-setting logical instructions that have different effects on APSR.C from those obtained with the normal #<const> syntax. For example, ANDS R1, R2, #12, #2 has the same behavior as ANDS R1, R2, #3 except that it sets APSR.C to 0 instead of leaving it unchanged. Such variants of flag-setting logical instructions do not have equivalents in the Thumb instruction set, and ARM deprecates their use.

Operation of modified immediate constants, ARM instructions

// ARMExpandImm()
// ==============

bits(32) ARMExpandImm(bits(12) imm12)

    // APSR.C argument to following function call does not affect the imm32 result.
    (imm32, -) = ARMExpandImm_C(imm12, APSR.C);

    return imm32;

// ARMExpandImm_C()
// ================

(bits(32), bit) ARMExpandImm_C(bits(12) imm12, bit carry_in)

    unrotated_value = ZeroExtend(imm12<7:0>, 32);
    (imm32, carry_out) = Shift_C(unrotated_value, SRType_ROR, 2*UInt(imm12<11:8>), carry_in);

    return (imm32, carry_out);
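The pseudocode above can be rendered in Python to experiment with the carry behavior. This is a sketch, not the architectural definition: it assumes imm12 is passed as an int and inlines the ROR case of Shift_C(), where a zero shift amount returns the input value and carry unchanged. The names `arm_expand_imm_c` and `arm_expand_imm` are hypothetical transliterations.

```python
from typing import Tuple


def arm_expand_imm_c(imm12: int, carry_in: int) -> Tuple[int, int]:
    """Python rendering of ARMExpandImm_C(): returns (imm32, carry_out)."""
    unrotated_value = imm12 & 0xFF              # ZeroExtend(imm12<7:0>, 32)
    amount = 2 * ((imm12 >> 8) & 0xF)           # 2*UInt(imm12<11:8>)
    if amount == 0:
        # Shift_C() with a zero shift amount leaves the value and carry unchanged.
        return unrotated_value, carry_in
    imm32 = ((unrotated_value >> amount)
             | (unrotated_value << (32 - amount))) & 0xFFFFFFFF
    carry_out = imm32 >> 31                      # ROR carry is bit[31] of the result
    return imm32, carry_out


def arm_expand_imm(imm12: int) -> int:
    """ARMExpandImm(): the carry input does not affect imm32."""
    imm32, _ = arm_expand_imm_c(imm12, 0)
    return imm32
```

For example, `arm_expand_imm_c(0x10C, 1)` (byte 12, rotation 1, that is #12, #2) returns (3, 0), which is why ANDS R1, R2, #12, #2 clears APSR.C, whereas `arm_expand_imm_c(0x003, 1)` returns (3, 1), leaving the carry unchanged.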


A5.2.5 Multiply and multiply accumulate

The encoding of ARM multiply and multiply accumulate instructions is:

Table A5-7 shows the allocation of encodings in this space.

A5.2.6 Saturating addition and subtraction

The encoding of ARM saturating addition and subtraction instructions is:

Table A5-8 shows the allocation of encodings in this space. These encodings are all available in ARMv5TE and above, and are UNDEFINED in earlier variants of the architecture.

Table A5-7 Multiply and multiply accumulate instructions

op Instruction See Variant

000x Multiply MUL on page A8-502 All

001x Multiply Accumulate MLA on page A8-480 All

0100 Unsigned Multiply Accumulate Accumulate Long UMAAL on page A8-774 v6

0101 UNDEFINED - -

0110 Multiply and Subtract MLS on page A8-482 v6T2

0111 UNDEFINED - -

100x Unsigned Multiply Long UMULL on page A8-778 All

101x Unsigned Multiply Accumulate Long UMLAL on page A8-776 All

110x Signed Multiply Long SMULL on page A8-646 All

111x Signed Multiply Accumulate Long SMLAL on page A8-624 All

Encoding diagram (multiply and multiply accumulate): cond = bits[31:28], bits[27:24] = 0b0000, op = bits[23:20], bits[7:4] = 0b1001.

Table A5-8 Saturating addition and subtraction instructions

op Instruction See

00 Saturating Add QADD on page A8-540

01 Saturating Subtract QSUB on page A8-554

10 Saturating Double and Add QDADD on page A8-548

11 Saturating Double and Subtract QDSUB on page A8-550

Encoding diagram (saturating addition and subtraction): cond = bits[31:28], bits[27:23] = 0b00010, op = bits[22:21], bit[20] = 0, bits[7:4] = 0b0101.


A5.2.7 Halfword multiply and multiply accumulate

The encoding of ARM halfword multiply and multiply accumulate instructions is:

Table A5-9 shows the allocation of encodings in this space.

These encodings are signed multiply (SMUL) and signed multiply accumulate (SMLA) instructions, operating on 16-bit values, or mixed 16-bit and 32-bit values. The results and accumulators are 32-bit or 64-bit.

These encodings are all available in ARMv5TE and above, and are UNDEFINED in earlier variants of the architecture.

A5.2.8 Extra load/store instructions

The encoding of extra ARM load/store instructions is:

If (op2 == 0b00), then see Data-processing and miscellaneous instructions on page A5-196.

If ((op1 == 0b0xx10) && (op2 == 0b01)) or ((op1 == 0b0xx11) && (op2 != 0b00)), then see Extra load/store instructions, unprivileged on page A5-204.

Otherwise, Table A5-10 shows the allocation of encodings in this space.
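The two routing rules above translate directly into code. A Python sketch, assuming the instruction already lies in this space (bits[27:25] == 0b000, bit[7] == 1, bit[4] == 1); the pattern matcher `matches()`, which treats x as a don't-care bit in the way the decode tables do, and the other names are hypothetical.

```python
def matches(value: int, pattern: str) -> bool:
    """Match value against a decode-table pattern such as '0xx10' ('x' = don't care)."""
    width = len(pattern)
    for i, ch in enumerate(pattern):
        bit = (value >> (width - 1 - i)) & 1
        if ch != "x" and int(ch) != bit:
            return False
    return True


def extra_load_store_group(instr: int) -> str:
    """Route an extra load/store encoding to the table that decodes it."""
    op1 = (instr >> 20) & 0x1F    # bits[24:20]
    op2 = (instr >> 5) & 0b11     # bits[6:5]
    if op2 == 0b00:
        return "Data-processing and miscellaneous instructions"
    if (matches(op1, "0xx10") and op2 == 0b01) or (matches(op1, "0xx11") and op2 != 0b00):
        return "Extra load/store instructions, unprivileged"
    return "Extra load/store instructions"
```

The same `matches()` helper can be reused for the other pattern-based rows in this chapter, such as the not 0x010 style entries in the load/store tables.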

Encoding diagram (halfword multiply and multiply accumulate): cond = bits[31:28], bits[27:23] = 0b00010, op1 = bits[22:21], bit[20] = 0, bit[7] = 1, op = bit[5], bit[4] = 0.

Table A5-9 Halfword multiply and multiply accumulate instructions

op1 op Instruction See

00 - Signed 16-bit multiply, 32-bit accumulate SMLABB, SMLABT, SMLATB, SMLATT on page A8-620

01 0 Signed 16-bit × 32-bit multiply, 32-bit accumulate SMLAWB, SMLAWT on page A8-630

1 Signed 16-bit × 32-bit multiply, 32-bit result SMULWB, SMULWT on page A8-648

10 - Signed 16-bit multiply, 64-bit accumulate SMLALBB, SMLALBT, SMLALTB, SMLALTT on page A8-626

11 - Signed 16-bit multiply, 32-bit result SMULBB, SMULBT, SMULTB, SMULTT on page A8-644

Encoding diagram (extra load/store instructions): cond = bits[31:28], bits[27:25] = 0b000, op1 = bits[24:20], Rn = bits[19:16], bit[7] = 1, op2 = bits[6:5], bit[4] = 1.

Table A5-10 Extra load/store instructions

op2 op1 Rn Instruction See Variant

01 xx0x0 - Store Halfword STRH (register) on page A8-702 All

xx0x1 - Load Halfword LDRH (register) on page A8-446 All

xx1x0 - Store Halfword STRH (immediate, ARM) on page A8-700 All

xx1x1 not 1111 Load Halfword LDRH (immediate, ARM) on page A8-442 All

1111 Load Halfword LDRH (literal) on page A8-444 All


A5.2.9 Extra load/store instructions, unprivileged

The encoding of unprivileged extra ARM load/store instructions is:

If op2 == 0b00, then see Data-processing and miscellaneous instructions on page A5-196.

If (op == 0b0 && op2 == 0b1x), then see Extra load/store instructions on page A5-203.

Otherwise, Table A5-11 shows the allocation of encodings in this space.

Table A5-10 Extra load/store instructions (continued)

op2 op1 Rn Instruction See Variant

10 xx0x0 - Load Dual LDRD (register) on page A8-430 v5TE

xx0x1 - Load Signed Byte LDRSB (register) on page A8-454 All

xx1x0 not 1111 Load Dual LDRD (immediate) on page A8-426 v5TE

1111 Load Dual LDRD (literal) on page A8-428 v5TE

xx1x1 not 1111 Load Signed Byte LDRSB (immediate) on page A8-450 All

1111 Load Signed Byte LDRSB (literal) on page A8-452 All

11 xx0x0 - Store Dual STRD (register) on page A8-688 v5TE

xx0x1 - Load Signed Halfword LDRSH (register) on page A8-462 All

xx1x0 - Store Dual STRD (immediate) on page A8-686 v5TE

xx1x1 not 1111 Load Signed Halfword LDRSH (immediate) on page A8-458 All

1111 Load Signed Halfword LDRSH (literal) on page A8-460 All

Encoding diagram (extra load/store instructions, unprivileged): cond = bits[31:28], bits[27:24] = 0b0000, bit[21] = 1, op = bit[20], bit[7] = 1, op2 = bits[6:5], bit[4] = 1.

Table A5-11 Extra load/store instructions, unprivileged

op2 op Instruction See Variant

01 0 Store Halfword Unprivileged STRHT on page A8-704 v6T2

1 Load Halfword Unprivileged LDRHT on page A8-448 v6T2

10 1 Load Signed Byte Unprivileged LDRSBT on page A8-456 v6T2

11 1 Load Signed Halfword Unprivileged LDRSHT on page A8-464 v6T2


A5.2.10 Synchronization primitives

The encoding of ARM synchronization primitive instructions is:

Table A5-12 shows the allocation of encodings in this space.

Other encodings in this space are UNDEFINED.

Table A5-12 Synchronization primitives

op Instruction See Variant

0x00 Swap Word, Swap Byte SWP, SWPB on page A8-722 a All

1000 Store Register Exclusive STREX on page A8-690 v6

1001 Load Register Exclusive LDREX on page A8-432 v6

1010 Store Register Exclusive Doubleword STREXD on page A8-694 v6K

1011 Load Register Exclusive Doubleword LDREXD on page A8-436 v6K

1100 Store Register Exclusive Byte STREXB on page A8-692 v6K

1101 Load Register Exclusive Byte LDREXB on page A8-434 v6K

1110 Store Register Exclusive Halfword STREXH on page A8-696 v6K

1111 Load Register Exclusive Halfword LDREXH on page A8-438 v6K

a. ARM deprecates the use of these instructions.

Encoding diagram: cond = bits[31:28], bits[27:24] = 0b0001, op = bits[23:20], bits[7:4] = 0b1001.


A5.2.11 MSR (immediate), and hints

The encoding of ARM MSR (immediate) and hint instructions is:

Table A5-13 shows the allocation of encodings in this space. Encodings with op set to 0, op1 set to 0b0000, and a value of op2 that is not shown in the table, are unallocated hints and behave as if op2 is set to 0b00000000. These unallocated hint encodings are reserved and software must not use them.

Encoding diagram: cond = bits[31:28], bits[27:23] = 0b00110, op = bit[22], bits[21:20] = 0b10, op1 = bits[19:16], op2 = bits[7:0].

Table A5-13 MSR (immediate), and hints

op op1 op2 Instruction See Variant

0 0000 00000000 No Operation hint NOP on page A8-510 v6K, v6T2

00000001 Yield hint YIELD on page A8-1108 v6K

00000010 Wait For Event hint WFE on page A8-1104 v6K

00000011 Wait For Interrupt hint WFI on page A8-1106 v6K

00000100 Send Event hint SEV on page A8-606 v6K

1111xxxx Debug hint DBG on page A8-377 v7

0100, 1x00 - Move to Special register, Application level MSR (immediate) on page A8-498 All

xx01, xx1x - Move to Special register, System level MSR (immediate) on page B9-1996 All

1 - - Move to Special register, System level MSR (immediate) on page B9-1996 All
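A sketch of the Table A5-13 decode in Python, including the rule that unallocated hint encodings behave as NOP. It assumes the instruction lies in the space shown by the encoding diagram (bits[27:23] == 0b00110, bits[21:20] == 0b10), does not distinguish application-level from system-level MSR, and uses hypothetical names throughout.

```python
HINT_OP2 = {
    0b00000000: "NOP",
    0b00000001: "YIELD",
    0b00000010: "WFE",
    0b00000011: "WFI",
    0b00000100: "SEV",
}


def decode_msr_imm_or_hint(instr: int) -> str:
    """Decode an MSR (immediate) or hint encoding using Table A5-13."""
    if ((instr >> 23) & 0x1F) != 0b00110 or ((instr >> 20) & 0b11) != 0b10:
        raise ValueError("not an MSR (immediate) or hint encoding")
    op = (instr >> 22) & 1            # bit[22]
    op1 = (instr >> 16) & 0xF         # bits[19:16]
    op2 = instr & 0xFF                # bits[7:0]
    if op == 1 or op1 != 0b0000:
        return "MSR (immediate)"
    if (op2 & 0xF0) == 0xF0:          # 1111xxxx: DBG, option in op2<3:0>
        return f"DBG #{op2 & 0xF}"
    # Unallocated hints behave as NOP, although software must not use them.
    return HINT_OP2.get(op2, "NOP")
```

Falling back to "NOP" through `dict.get()` models the architectural behavior of the reserved encodings; it does not make them safe for software to use.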


A5.2.12 Miscellaneous instructions

The encoding of some miscellaneous ARM instructions is:

Table A5-14 shows the allocation of encodings in this space. Other encodings in this space are UNDEFINED.

Encoding diagram: cond = bits[31:28], bits[27:23] = 0b00010, op = bits[22:21], bit[20] = 0, op1 = bits[19:16], B = bit[9], bit[7] = 0, op2 = bits[6:4].

Table A5-14 Miscellaneous instructions

op2 B op op1 Instruction or instruction class See Variant

000 1 x0 xxxx Move from Banked or Special register MRS (Banked register) on page B9-1992 v7VE

x1 xxxx Move to Banked or Special register MSR (Banked register) on page B9-1994 v7VE

0 x0 xxxx Move from Special register MRS on page A8-496, MRS on page B9-1990 All

01 xx00 Move to Special register, Application level MSR (register) on page A8-500 All

xx01, xx1x Move to Special register, System level MSR (register) on page B9-1998 All

11 - Move to Special register, System level MSR (register) on page B9-1998 All

001 - 01 - Branch and Exchange BX on page A8-352 v4T

11 - Count Leading Zeros CLZ on page A8-362 v5T

010 - 01 - Branch and Exchange Jazelle BXJ on page A8-354 v5TEJ

011 - 01 - Branch with Link and Exchange BLX (register) on page A8-350 v5T

101 - - - Saturating addition and subtraction Saturating addition and subtraction on page A5-202

110 - 11 - Exception Return ERET on page B9-1982 v7VE

111 - 01 - Breakpoint BKPT on page A8-346 v5T

10 - Hypervisor Call HVC on page B9-1984 v7VE

11 - Secure Monitor Call SMC (previously SMI) on page B9-2002 Security Extensions


A5.3 Load/store word and unsigned byte

The encoding of ARM load/store word and unsigned byte instructions is:

These instructions have either A == 0 or B == 0. For instructions with A == 1 and B == 1, see Media instructions on page A5-209.

Otherwise, Table A5-15 shows the allocation of encodings in this space. These encodings are in all architecture variants.

Encoding diagram: cond = bits[31:28], bits[27:26] = 0b01, A = bit[25], op1 = bits[24:20], Rn = bits[19:16], B = bit[4].

Table A5-15 Single data transfer instructions

A op1 B Rn Instruction See

0 xx0x0 not 0x010 - - Store Register STR (immediate, ARM) on page A8-674

1 xx0x0 not 0x010 0 - Store Register STR (register) on page A8-676

0 0x010 - - Store Register Unprivileged STRT on page A8-706

1 0x010 0 -

0 xx0x1 not 0x011 - not 1111 Load Register (immediate) LDR (immediate, ARM) on page A8-408

1111 Load Register (literal) LDR (literal) on page A8-410

1 xx0x1 not 0x011 0 - Load Register LDR (register, ARM) on page A8-414

0 0x011 - - Load Register Unprivileged LDRT on page A8-466

1 0x011 0 -

0 xx1x0 not 0x110 - - Store Register Byte (immediate) STRB (immediate, ARM) on page A8-680

1 xx1x0 not 0x110 0 - Store Register Byte (register) STRB (register) on page A8-682

0 0x110 - - Store Register Byte Unprivileged STRBT on page A8-684

1 0x110 0 -

0 xx1x1 not 0x111 - not 1111 Load Register Byte (immediate) LDRB (immediate, ARM) on page A8-418

1111 Load Register Byte (literal) LDRB (literal) on page A8-420

1 xx1x1 not 0x111 0 - Load Register Byte (register) LDRB (register) on page A8-422

0 0x111 - - Load Register Byte Unprivileged LDRBT on page A8-424

1 0x111 0 -
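A Python sketch of the Table A5-15 decode, assuming the instruction already has bits[27:26] == 0b01. The returned strings abbreviate the table's names (for example, `LDR (immediate)` stands for LDR (immediate, ARM)), and the function name is hypothetical. The unprivileged test relies on the table's op1 patterns 0x010, 0x011, 0x110, and 0x111, which all have op1<4> == 0 and op1<1> == 1.

```python
def decode_ldst_word_byte(instr: int) -> str:
    """Decode a load/store word and unsigned byte encoding using Table A5-15."""
    a = (instr >> 25) & 1             # bit[25]
    op1 = (instr >> 20) & 0x1F        # bits[24:20]
    rn = (instr >> 16) & 0xF          # bits[19:16]
    b = (instr >> 4) & 1              # bit[4]
    if a == 1 and b == 1:
        return "Media instructions"
    is_load = op1 & 0b00001           # op1 = xxxx1 is a load
    is_byte = op1 & 0b00100           # op1 = xx1xx is a byte access
    name = ("LDR" if is_load else "STR") + ("B" if is_byte else "")
    if (op1 & 0b10010) == 0b00010:    # op1 = 0x01x: unprivileged forms
        return name + "T"
    if a == 1:
        return name + " (register)"
    if is_load and rn == 0b1111:
        return name + " (literal)"
    return name + " (immediate)"
```

Only loads have a literal form in the table, which is why the Rn == 1111 check is applied after `is_load`.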


A5.4 Media instructions

The encoding of ARM media instructions is:

Table A5-16 shows the allocation of encodings in this space.

Other encodings in this space are UNDEFINED.

Encoding diagram: cond = bits[31:28], bits[27:25] = 0b011, op1 = bits[24:20], Rd = bits[15:12], op2 = bits[7:5], bit[4] = 1, Rn = bits[3:0].

Table A5-16 Media instructions

op1 op2 Rd Rn cond Instructions See Variant

000xx - - - - - Parallel addition and subtraction, signed on page A5-210

001xx - - - - - Parallel addition and subtraction, unsigned on page A5-211

01xxx - - - - - Packing, unpacking, saturation, and reversal on page A5-212

10xxx - - - - - Signed multiply, signed and unsigned divide on page A5-213

11000 000 1111 - - Unsigned Sum of Absolute Differences USAD8 on page A8-792 v6

000 not 1111 - - Unsigned Sum of Absolute Differences and Accumulate USADA8 on page A8-794 v6

1101x x10 - - - Signed Bit Field Extract SBFX on page A8-598 v6T2

1110x x00 - 1111 - Bit Field Clear BFC on page A8-336 v6T2

not 1111 - Bit Field Insert BFI on page A8-338 v6T2

1111x x10 - - - Unsigned Bit Field Extract UBFX on page A8-756 v6T2

11111 111 - - 1110 Permanently UNDEFINED UDF on page A8-758 Alla

not

1110 -aAll

a. Issue C.a of this manual first defines an assembler mnemonic for this encoding. This mnemonic applies only to the unconditional encoding,

with cond set to

0b1110
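As an illustration of how Table A5-16 is read, the following C sketch classifies a word that is already known to lie in the media space (bits[27:25] = 0b011, bit[4] = 1, cond not 0b1111). The field positions (op1 = bits[24:20], Rd = bits[15:12], op2 = bits[7:5], Rn = bits[3:0]) come from the encoding diagram for this space; the enum and function names are hypothetical, and this is a decoding aid rather than a complete decoder.

```c
#include <stdint.h>

/* Hypothetical names for the rows of Table A5-16. */
enum media_kind {
    MEDIA_PAR_SIGNED,     /* op1 = 000xx */
    MEDIA_PAR_UNSIGNED,   /* op1 = 001xx */
    MEDIA_PACK_SAT_REV,   /* op1 = 01xxx */
    MEDIA_MUL_DIV,        /* op1 = 10xxx */
    MEDIA_USAD8, MEDIA_USADA8, MEDIA_SBFX,
    MEDIA_BFC, MEDIA_BFI, MEDIA_UBFX,
    MEDIA_UDF,            /* cond = 0b1110: the UDF mnemonic applies */
    MEDIA_PERM_UNDEF,     /* same encoding with any other cond value */
    MEDIA_UNDEFINED       /* every other encoding in the space */
};

/* Assumes bits[27:25] == 0b011, bit[4] == 1 and cond != 0b1111. */
enum media_kind media_decode(uint32_t insn)
{
    uint32_t cond = insn >> 28;
    uint32_t op1  = (insn >> 20) & 0x1F;  /* bits[24:20] */
    uint32_t rd   = (insn >> 12) & 0xF;   /* bits[15:12] */
    uint32_t op2  = (insn >> 5) & 0x7;    /* bits[7:5]   */
    uint32_t rn   = insn & 0xF;           /* bits[3:0]   */

    switch (op1 >> 3) {                   /* op1[4:3] */
    case 0:  return (op1 & 0x4) ? MEDIA_PAR_UNSIGNED : MEDIA_PAR_SIGNED;
    case 1:  return MEDIA_PACK_SAT_REV;
    case 2:  return MEDIA_MUL_DIV;
    default: break;                       /* op1 = 11xxx, decoded below */
    }
    if (op1 == 0x18 && op2 == 0x0)                   /* 11000, 000 */
        return rd == 0xF ? MEDIA_USAD8 : MEDIA_USADA8;
    if ((op1 & 0x1E) == 0x1A && (op2 & 0x3) == 0x2)  /* 1101x, x10 */
        return MEDIA_SBFX;
    if ((op1 & 0x1E) == 0x1C && (op2 & 0x3) == 0x0)  /* 1110x, x00 */
        return rn == 0xF ? MEDIA_BFC : MEDIA_BFI;
    if ((op1 & 0x1E) == 0x1E && (op2 & 0x3) == 0x2)  /* 1111x, x10 */
        return MEDIA_UBFX;
    if (op1 == 0x1F && op2 == 0x7)                   /* 11111, 111 */
        return cond == 0xE ? MEDIA_UDF : MEDIA_PERM_UNDEF;
    return MEDIA_UNDEFINED;
}
```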


A5.4.1 Parallel addition and subtraction, signed

The encoding of ARM signed parallel addition and subtraction instructions is:

bits[31:28] = cond, bits[27:22] = 0b011000, bits[21:20] = op1, bits[7:5] = op2, bit[4] = 1

Table A5-17 shows the allocation of encodings in this space. These encodings are all available in ARMv6 and above, and are UNDEFINED in earlier variants of the architecture.

Other encodings in this space are UNDEFINED.

Table A5-17 Signed parallel addition and subtraction instructions

op1 op2 Instruction See
01 000 Add 16-bit SADD16 on page A8-586
01 001 Add and Subtract with Exchange, 16-bit SASX on page A8-590
01 010 Subtract and Add with Exchange, 16-bit SSAX on page A8-656
01 011 Subtract 16-bit SSUB16 on page A8-658
01 100 Add 8-bit SADD8 on page A8-588
01 111 Subtract 8-bit SSUB8 on page A8-660

Saturating instructions
10 000 Saturating Add 16-bit QADD16 on page A8-542
10 001 Saturating Add and Subtract with Exchange, 16-bit QASX on page A8-546
10 010 Saturating Subtract and Add with Exchange, 16-bit QSAX on page A8-552
10 011 Saturating Subtract 16-bit QSUB16 on page A8-556
10 100 Saturating Add 8-bit QADD8 on page A8-544
10 111 Saturating Subtract 8-bit QSUB8 on page A8-558

Halving instructions
11 000 Halving Add 16-bit SHADD16 on page A8-608
11 001 Halving Add and Subtract with Exchange, 16-bit SHASX on page A8-612
11 010 Halving Subtract and Add with Exchange, 16-bit SHSAX on page A8-614
11 011 Halving Subtract 16-bit SHSUB16 on page A8-616
11 100 Halving Add 8-bit SHADD8 on page A8-610
11 111 Halving Subtract 8-bit SHSUB8 on page A8-618


A5.4.2 Parallel addition and subtraction, unsigned

The encoding of ARM unsigned parallel addition and subtraction instructions is:

bits[31:28] = cond, bits[27:22] = 0b011001, bits[21:20] = op1, bits[7:5] = op2, bit[4] = 1

Table A5-18 shows the allocation of encodings in this space. These encodings are all available in ARMv6 and above, and are UNDEFINED in earlier variants of the architecture.

Other encodings in this space are UNDEFINED.

Table A5-18 Unsigned parallel addition and subtraction instructions

op1 op2 Instruction See
01 000 Add 16-bit UADD16 on page A8-750
01 001 Add and Subtract with Exchange, 16-bit UASX on page A8-754
01 010 Subtract and Add with Exchange, 16-bit USAX on page A8-800
01 011 Subtract 16-bit USUB16 on page A8-802
01 100 Add 8-bit UADD8 on page A8-752
01 111 Subtract 8-bit USUB8 on page A8-804

Saturating instructions
10 000 Saturating Add 16-bit UQADD16 on page A8-780
10 001 Saturating Add and Subtract with Exchange, 16-bit UQASX on page A8-784
10 010 Saturating Subtract and Add with Exchange, 16-bit UQSAX on page A8-786
10 011 Saturating Subtract 16-bit UQSUB16 on page A8-788
10 100 Saturating Add 8-bit UQADD8 on page A8-782
10 111 Saturating Subtract 8-bit UQSUB8 on page A8-790

Halving instructions
11 000 Halving Add 16-bit UHADD16 on page A8-762
11 001 Halving Add and Subtract with Exchange, 16-bit UHASX on page A8-766
11 010 Halving Subtract and Add with Exchange, 16-bit UHSAX on page A8-768
11 011 Halving Subtract 16-bit UHSUB16 on page A8-770
11 100 Halving Add 8-bit UHADD8 on page A8-764
11 111 Halving Subtract 8-bit UHSUB8 on page A8-772
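Tables A5-17 and A5-18 share the same op1/op2 grid and differ only in bit[22] (0 for signed, 1 for unsigned), so a decoder can build the mnemonic from a prefix chosen by op1 and an operation chosen by op2. The C sketch below assumes the caller has already matched bits[27:23] = 0b01100 and bit[4] = 1; the function name is hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Builds the mnemonic of an ARM parallel add/subtract instruction.
 * Returns 0 on success, or -1 for the UNDEFINED op1/op2 combinations
 * (op1 = 00, op2 = 101 or 110). `out` must hold at least 8 bytes. */
int parallel_addsub_mnemonic(uint32_t insn, char out[8])
{
    static const char *const prefix[2][4] = {
        { NULL, "S", "Q",  "SH" },   /* bit[22] = 0: op1 = 01, 10, 11 */
        { NULL, "U", "UQ", "UH" },   /* bit[22] = 1: op1 = 01, 10, 11 */
    };
    static const char *const op[8] = {
        "ADD16", "ASX", "SAX", "SUB16", "ADD8", NULL, NULL, "SUB8",
    };
    uint32_t u   = (insn >> 22) & 0x1;   /* bit[22]     */
    uint32_t op1 = (insn >> 20) & 0x3;   /* bits[21:20] */
    uint32_t op2 = (insn >> 5) & 0x7;    /* bits[7:5]   */

    if (prefix[u][op1] == NULL || op[op2] == NULL)
        return -1;
    strcpy(out, prefix[u][op1]);         /* longest result is 7 characters */
    strcat(out, op[op2]);
    return 0;
}
```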


A5.4.3 Packing, unpacking, saturation, and reversal

The encoding of ARM packing, unpacking, saturation, and reversal instructions is:

bits[31:28] = cond, bits[27:23] = 0b01101, bits[22:20] = op1, bits[19:16] = A, bits[7:5] = op2, bit[4] = 1

Table A5-19 shows the allocation of encodings in this space.

Other encodings in this space are UNDEFINED.

Table A5-19 Packing, unpacking, saturation, and reversal instructions

op1 op2 A Instructions See Variant
000 xx0 - Pack Halfword PKH on page A8-522 v6
000 011 not 1111 Signed Extend and Add Byte 16-bit SXTAB16 on page A8-726 v6
000 011 1111 Signed Extend Byte 16-bit SXTB16 on page A8-732 v6
000 101 - Select Bytes SEL on page A8-602 v6
01x xx0 - Signed Saturate SSAT on page A8-652 v6
010 001 - Signed Saturate, two 16-bit SSAT16 on page A8-654 v6
010 011 not 1111 Signed Extend and Add Byte SXTAB on page A8-724 v6
010 011 1111 Signed Extend Byte SXTB on page A8-730 v6
011 001 - Byte-Reverse Word REV on page A8-562 v6
011 011 not 1111 Signed Extend and Add Halfword SXTAH on page A8-728 v6
011 011 1111 Signed Extend Halfword SXTH on page A8-734 v6
011 101 - Byte-Reverse Packed Halfword REV16 on page A8-564 v6
100 011 not 1111 Unsigned Extend and Add Byte 16-bit UXTAB16 on page A8-808 v6
100 011 1111 Unsigned Extend Byte 16-bit UXTB16 on page A8-814 v6
11x xx0 - Unsigned Saturate USAT on page A8-796 v6
110 001 - Unsigned Saturate, two 16-bit USAT16 on page A8-798 v6
110 011 not 1111 Unsigned Extend and Add Byte UXTAB on page A8-806 v6
110 011 1111 Unsigned Extend Byte UXTB on page A8-812 v6
111 001 - Reverse Bits RBIT on page A8-560 v6T2
111 011 not 1111 Unsigned Extend and Add Halfword UXTAH on page A8-810 v6
111 011 1111 Unsigned Extend Halfword UXTH on page A8-816 v6
111 101 - Byte-Reverse Signed Halfword REVSH on page A8-566 v6


A5.4.4 Signed multiply, signed and unsigned divide

The encoding of ARM signed multiply, signed divide, and unsigned divide instructions is:

bits[31:28] = cond, bits[27:23] = 0b01110, bits[22:20] = op1, bits[15:12] = A, bits[7:5] = op2, bit[4] = 1

Table A5-20 shows the allocation of encodings in this space.

Other encodings in this space are UNDEFINED.

Table A5-20 Signed multiply instructions

op1 op2 A Instruction See Variant
000 00x not 1111 Signed Multiply Accumulate Dual SMLAD on page A8-622 v6
000 00x 1111 Signed Dual Multiply Add SMUAD on page A8-642 v6
000 01x not 1111 Signed Multiply Subtract Dual SMLSD on page A8-632 v6
000 01x 1111 Signed Dual Multiply Subtract SMUSD on page A8-650 v6
001 000 - Signed Divide SDIV on page A8-600 v7 (a)
011 000 - Unsigned Divide UDIV on page A8-760 v7 (a)
100 00x - Signed Multiply Accumulate Long Dual SMLALD on page A8-628 v6
100 01x - Signed Multiply Subtract Long Dual SMLSLD on page A8-634 v6
101 00x not 1111 Signed Most Significant Word Multiply Accumulate SMMLA on page A8-636 v6
101 00x 1111 Signed Most Significant Word Multiply SMMUL on page A8-640 v6
101 11x - Signed Most Significant Word Multiply Subtract SMMLS on page A8-638 v6

a. Optional in some ARMv7 implementations, see ARMv7 implementation requirements and options for the divide instructions on page A4-172.
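In Table A5-20 the A field (bits[15:12]) selects between the accumulating and non-accumulating forms: A = 0b1111 gives SMUAD, SMUSD, or SMMUL instead of SMLAD, SMLSD, or SMMLA. The C sketch below implements the table under the assumption that bits[27:23] = 0b01110 and bit[4] = 1 have already been matched. The enum names are illustrative, and a real decoder would also have to treat SDIV and UDIV as UNDEFINED on implementations that omit them.

```c
#include <stdint.h>

enum mul_kind {
    SMLAD, SMUAD, SMLSD, SMUSD, SDIV, UDIV,
    SMLALD, SMLSLD, SMMLA, SMMUL, SMMLS, MUL_UNDEFINED
};

enum mul_kind mul_decode(uint32_t insn)
{
    uint32_t op1 = (insn >> 20) & 0x7;            /* bits[22:20] */
    uint32_t op2 = (insn >> 5) & 0x7;             /* bits[7:5]   */
    int a_is_15  = ((insn >> 12) & 0xF) == 0xF;   /* A == 0b1111: no accumulator */

    switch (op1) {
    case 0x0:
        if ((op2 & 0x6) == 0x0) return a_is_15 ? SMUAD : SMLAD;   /* 00x */
        if ((op2 & 0x6) == 0x2) return a_is_15 ? SMUSD : SMLSD;   /* 01x */
        break;
    case 0x1:
        if (op2 == 0x0) return SDIV;
        break;
    case 0x3:
        if (op2 == 0x0) return UDIV;
        break;
    case 0x4:
        if ((op2 & 0x6) == 0x0) return SMLALD;                    /* 00x */
        if ((op2 & 0x6) == 0x2) return SMLSLD;                    /* 01x */
        break;
    case 0x5:
        if ((op2 & 0x6) == 0x0) return a_is_15 ? SMMUL : SMMLA;   /* 00x */
        if ((op2 & 0x6) == 0x6) return SMMLS;                     /* 11x */
        break;
    default:
        break;
    }
    return MUL_UNDEFINED;
}
```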


A5.5 Branch, branch with link, and block data transfer

The encoding of ARM branch, branch with link, and block data transfer instructions is:

bits[31:28] = cond, bits[27:26] = 0b10, bits[25:20] = op, bits[19:16] = Rn, bit[15] = R

Table A5-21 shows the allocation of encodings in this space. These encodings are available in all architecture variants.

Table A5-21 Branch, branch with link, and block data transfer instructions

op R Rn Instructions See
0000x0 - - Store Multiple Decrement After STMDA (STMED) on page A8-666
0000x1 - - Load Multiple Decrement After LDMDA/LDMFA on page A8-400
0010x0 - - Store Multiple Increment After STM (STMIA, STMEA) on page A8-664
001001 - - Load Multiple Increment After LDM/LDMIA/LDMFD (ARM) on page A8-398
001011 - not 1101 Load Multiple Increment After LDM/LDMIA/LDMFD (ARM) on page A8-398
001011 - 1101 Pop multiple registers POP (ARM) on page A8-536
010000 - - Store Multiple Decrement Before STMDB (STMFD) on page A8-668
010010 - not 1101 Store Multiple Decrement Before STMDB (STMFD) on page A8-668
010010 - 1101 Push multiple registers PUSH on page A8-538
0100x1 - - Load Multiple Decrement Before LDMDB/LDMEA on page A8-402
0110x0 - - Store Multiple Increment Before STMIB (STMFA) on page A8-670
0110x1 - - Load Multiple Increment Before LDMIB/LDMED on page A8-404
0xx1x0 - - Store Multiple (user registers) STM (User registers) on page B9-2008
0xx1x1 0 - Load Multiple (user registers) LDM (User registers) on page B9-1988
0xx1x1 1 - Load Multiple (exception return) LDM (exception return) on page B9-1986
10xxxx - - Branch B on page A8-334
11xxxx - - Branch with Link BL, BLX (immediate) on page A8-348
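The load/store multiple rows of Table A5-21 follow a regular pattern in op: op[0] separates loads from stores, op[4:3] selects Decrement After, Increment After, Decrement Before, or Increment Before, op[2] selects the User-registers and exception-return forms, and the x in patterns such as 0000x0 is op[1]. The C sketch below treats op[1] as the writeback (W) bit, which is an assumption taken from the individual instruction descriptions rather than from this table. It also flags the PUSH and POP rows, which differ from STMDB and LDM only in having Rn = 0b1101 (SP). All names are hypothetical.

```c
#include <stdint.h>

enum ldm_mode { MODE_DA, MODE_IA, MODE_DB, MODE_IB };   /* op[4:3] = 00..11 */

struct block_xfer {
    int load;            /* op[0], bit[20]: 1 = load multiple, 0 = store */
    int writeback;       /* op[1], bit[21]: assumed to be the W bit */
    int user_or_eret;    /* op[2], bit[22]: User registers / exception return */
    enum ldm_mode mode;  /* op[4:3], bits[24:23] */
    int is_push_pop;     /* op = 010010 (PUSH) or 001011 (POP) with Rn = SP */
};

/* Assumes bits[27:25] == 0b100, that is, op = 0xxxxx. */
struct block_xfer block_xfer_decode(uint32_t insn)
{
    struct block_xfer d;
    uint32_t op = (insn >> 20) & 0x3F;   /* bits[25:20] */
    uint32_t rn = (insn >> 16) & 0xF;    /* bits[19:16] */

    d.load         = op & 0x1;
    d.writeback    = (op >> 1) & 0x1;
    d.user_or_eret = (op >> 2) & 0x1;
    d.mode         = (enum ldm_mode)((op >> 3) & 0x3);
    d.is_push_pop  = rn == 13 && (op == 0x12 || op == 0x0B);
    return d;
}
```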


A5.6 Coprocessor instructions, and Supervisor Call

The encoding of ARM coprocessor instructions and the Supervisor Call instruction is:

bits[31:28] = cond, bits[27:26] = 0b11, bits[25:20] = op1, bits[19:16] = Rn, bits[11:8] = coproc, bit[4] = op

Table A5-22 shows the allocation of encodings in this space.

For more information about specific coprocessors see Coprocessor support on page A2-94.

Table A5-22 Coprocessor instructions, and Supervisor Call

coproc op1 op Rn Instructions See Variant
- 00000x - - UNDEFINED - -
- 11xxxx - - Supervisor Call SVC (previously SWI) on page A8-720 All
not 101x 0xxxx0, not 000x00 - - Store Coprocessor STC, STC2 on page A8-662 All
not 101x 0xxxx1, not 000x01 - not 1111 Load Coprocessor (immediate) LDC, LDC2 (immediate) on page A8-392 All
not 101x 0xxxx1, not 000x01 - 1111 Load Coprocessor (literal) LDC, LDC2 (literal) on page A8-394 All
not 101x 000100 - - Move to Coprocessor from two ARM core registers MCRR, MCRR2 on page A8-478 v5TE
not 101x 000101 - - Move to two ARM core registers from Coprocessor MRRC, MRRC2 on page A8-494 v5TE
not 101x 10xxxx 0 - Coprocessor data operations CDP, CDP2 on page A8-358 All
not 101x 10xxx0 1 - Move to Coprocessor from ARM core register MCR, MCR2 on page A8-476 All
not 101x 10xxx1 1 - Move to ARM core register from Coprocessor MRC, MRC2 on page A8-492 All
101x 0xxxxx, not 000x0x - - Advanced SIMD, Floating-point: Extension register load/store instructions on page A7-274
101x 00010x - - Advanced SIMD, Floating-point: 64-bit transfers between ARM core and extension registers on page A7-279
101x 10xxxx 0 - Floating-point data processing: Floating-point data-processing instructions on page A7-272
101x 10xxxx 1 - Advanced SIMD, Floating-point: 8, 16, and 32-bit transfer between ARM core and extension registers on page A7-278


A5.7 Unconditional instructions

The encoding of ARM unconditional instructions is:

bits[31:28] = 0b1111, bits[27:20] = op1, bits[19:16] = Rn, bit[4] = op

Table A5-23 shows the allocation of encodings in this space.

Other encodings in this space are UNDEFINED in ARMv5 and above.

All encodings in this space are UNPREDICTABLE in ARMv4 and ARMv4T.

Table A5-23 Unconditional instructions

op1 op Rn Instruction See Variant
0xxxxxxx - - Memory hints, Advanced SIMD instructions, and miscellaneous instructions on page A5-217 -
100xx1x0 - - Store Return State SRS (ARM) on page B9-2006 v6
100xx0x1 - - Return From Exception RFE on page B9-2000 v6
101xxxxx - - Branch with Link and Exchange BL, BLX (immediate) on page A8-348 v5
110xxxx0, not 11000x00 - - Store Coprocessor STC, STC2 on page A8-662 v5
110xxxx1, not 11000x01 - not 1111 Load Coprocessor (immediate) LDC, LDC2 (immediate) on page A8-392 v5
110xxxx1, not 11000x01 - 1111 Load Coprocessor (literal) LDC, LDC2 (literal) on page A8-394 v5
11000100 - - Move to Coprocessor from two ARM core registers MCRR, MCRR2 on page A8-478 v6
11000101 - - Move to two ARM core registers from Coprocessor MRRC, MRRC2 on page A8-494 v6
1110xxxx 0 - Coprocessor data operations CDP, CDP2 on page A8-358 v5
1110xxx0 1 - Move to Coprocessor from ARM core register MCR, MCR2 on page A8-476 v5
1110xxx1 1 - Move to ARM core register from Coprocessor MRC, MRC2 on page A8-492 v5


A5.7.1 Memory hints, Advanced SIMD instructions, and miscellaneous instructions

The encoding of ARM memory hint and Advanced SIMD instructions, and some miscellaneous instructions, is:

bits[31:28] = 0b1111, bit[27] = 0, bits[26:20] = op1, bits[19:16] = Rn, bits[7:4] = op2

Table A5-24 shows the allocation of encodings in this space.

Other encodings in this space are UNDEFINED in ARMv5 and above. All these encodings are UNPREDICTABLE in ARMv4 and ARMv4T.

Table A5-24 Hints, and Advanced SIMD instructions

op1 op2 Rn Instruction See Variant
0010000 xx0x xxx0 Change Processor State CPS (ARM) on page B9-1980 v6
0010000 0000 xxx1 Set Endianness SETEND on page A8-604 v6
0010010 0111 - UNPREDICTABLE - v5T
01xxxxx - - See Advanced SIMD data-processing instructions on page A7-261 v7
100xxx0 - - See Advanced SIMD element or structure load/store instructions on page A7-275 v7
100x001 - - Unallocated memory hint (treat as NOP) MP Ext (a)
100x101 - - Preload Instruction PLI (immediate, literal) on page A8-530 v7
100xx11 - - UNPREDICTABLE - -
101x001 - not 1111 Preload Data with intent to Write PLD, PLDW (immediate) on page A8-524 MP Ext (a)
101x001 - 1111 UNPREDICTABLE - -
101x101 - not 1111 Preload Data PLD, PLDW (immediate) on page A8-524 v5TE
101x101 - 1111 Preload Data PLD (literal) on page A8-526 v5TE
1010011 - - UNPREDICTABLE - -
1010111 0000 - UNPREDICTABLE - -
1010111 0001 - Clear-Exclusive CLREX on page A8-360 v6K
1010111 001x - UNPREDICTABLE - -
1010111 0100 - Data Synchronization Barrier DSB on page A8-380 v6T2
1010111 0101 - Data Memory Barrier DMB on page A8-378 v7
1010111 0110 - Instruction Synchronization Barrier ISB on page A8-389 v6T2
1010111 0111 - UNPREDICTABLE - -
1010111 1xxx - UNPREDICTABLE - -
1011x11 - - UNPREDICTABLE - -
110x001 xxx0 - Unallocated memory hint (treat as NOP) MP Ext (a)
110x101 xxx0 - Preload Instruction PLI (register) on page A8-532 v7
111x001 xxx0 - Preload Data with intent to Write PLD, PLDW (register) on page A8-528 MP Ext (a)
111x101 xxx0 - Preload Data PLD, PLDW (register) on page A8-528 v5TE
11xxx11 xxx0 - UNPREDICTABLE - -
1111111 1111 - Permanently UNDEFINED (b) - v5

a. Multiprocessing Extensions.

b. See Table A5-16 on page A5-209 for the full range of encodings in this permanently UNDEFINED group.
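The op1 = 0b1010111 rows of Table A5-24 contain the barrier and Clear-Exclusive instructions, selected by op2 = bits[7:4]. A short C sketch of that part of the table, assuming the word has already been identified as belonging to this space (bits[31:27] = 0b11110); the enum and function names are hypothetical.

```c
#include <stdint.h>

enum misc_kind { K_CLREX, K_DSB, K_DMB, K_ISB, K_UNPREDICTABLE, K_OTHER };

enum misc_kind barrier_decode(uint32_t insn)
{
    uint32_t op1 = (insn >> 20) & 0x7F;   /* bits[26:20] */
    uint32_t op2 = (insn >> 4) & 0xF;     /* bits[7:4]   */

    if (op1 != 0x57)                      /* not 0b1010111: some other row */
        return K_OTHER;
    switch (op2) {
    case 0x1: return K_CLREX;
    case 0x4: return K_DSB;
    case 0x5: return K_DMB;
    case 0x6: return K_ISB;
    default:  return K_UNPREDICTABLE;     /* 0000, 001x, 0111, 1xxx */
    }
}
```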

Chapter A6

Thumb Instruction Set Encoding

This chapter introduces the Thumb instruction set and describes how it uses the ARM programmers’ model. It contains the following sections:

• Thumb instruction set encoding on page A6-220
• 16-bit Thumb instruction encoding on page A6-223
• 32-bit Thumb instruction encoding on page A6-230.

For details of the differences between the Thumb and ThumbEE instruction sets see Chapter A9 The ThumbEE Instruction Set.

Note

• Architecture variant information in this chapter describes the architecture variant or extension in which the instruction encoding was introduced into the Thumb instruction set.
• In the decode tables in this chapter, an entry of - for a field value means the value of the field does not affect the decoding.


A6.1 Thumb instruction set encoding

The Thumb instruction stream is a sequence of halfword-aligned halfwords. Each Thumb instruction is either a single 16-bit halfword in that stream, or a 32-bit instruction consisting of two consecutive halfwords in that stream.

If the value of bits[15:11] of the halfword being decoded is one of the following, the halfword is the first halfword of a 32-bit instruction:

• 0b11101
• 0b11110
• 0b11111

Otherwise, the halfword is a 16-bit instruction.

For details of the encoding of 16-bit Thumb instructions see 16-bit Thumb instruction encoding on page A6-223.

For details of the encoding of 32-bit Thumb instructions see 32-bit Thumb instruction encoding on page A6-230.
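The 16-bit/32-bit rule needs only bits[15:11] of the halfword being decoded. The following C sketch applies it and walks a buffer of halfwords; the function names are illustrative, and the buffer is assumed to start on an instruction boundary.

```c
#include <stdint.h>
#include <stddef.h>

/* 1 if hw is the first halfword of a 32-bit Thumb instruction,
 * that is, bits[15:11] are 0b11101, 0b11110 or 0b11111. */
int thumb_is_32bit(uint16_t hw)
{
    unsigned top5 = hw >> 11;
    return top5 == 0x1D || top5 == 0x1E || top5 == 0x1F;
}

/* Counts complete instructions in a stream of n halfwords. A trailing
 * first halfword whose second halfword is missing is not counted. */
size_t thumb_count_instructions(const uint16_t *stream, size_t n)
{
    size_t count = 0, i = 0;
    while (i < n) {
        size_t len = thumb_is_32bit(stream[i]) ? 2 : 1;
        if (i + len > n)
            break;
        i += len;
        count++;
    }
    return count;
}
```

For example, the stream 0xB500, 0xF000, 0xF800, 0x4770 holds three instructions, because 0xF000 starts a 32-bit instruction that consumes the next halfword.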

A6.1.1 UNDEFINED and UNPREDICTABLE instruction set space

An attempt to execute an unallocated instruction results in either:

• Unpredictable behavior. The instruction is described as UNPREDICTABLE.
• An Undefined Instruction exception. The instruction is described as UNDEFINED.

An instruction is UNDEFINED if it is declared as UNDEFINED in an instruction description, or in this chapter.

An instruction is UNPREDICTABLE if:

• a bit marked (0) in the encoding diagram of an instruction is not 0, and the pseudocode for that encoding does not indicate that a different special case applies when that bit is not 0
• a bit marked (1) in the encoding diagram of an instruction is not 1, and the pseudocode for that encoding does not indicate that a different special case applies when that bit is not 1
• it is declared as UNPREDICTABLE in an instruction description or in this chapter.

For more information about UNDEFINED and UNPREDICTABLE instruction behavior, see Undefined Instruction exception on page B1-1206.

Unless otherwise specified:

• Thumb instructions introduced in an architecture variant are either UNPREDICTABLE or UNDEFINED in earlier architecture variants.
• A Thumb instruction that is provided by one or more of the architecture extensions is either UNPREDICTABLE or UNDEFINED in an implementation that does not include any of those extensions.

In both cases, the instruction is UNPREDICTABLE if it is a 32-bit instruction in an architecture variant before ARMv6T2, and UNDEFINED otherwise.
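The (0)/(1) rule and the pre-ARMv6T2 rule can each be written as a small predicate. In this C sketch, sbz_mask and sbo_mask are hypothetical per-encoding inputs that a decoder would derive from an encoding diagram (bits drawn as (0) and (1) respectively), and is_v6t2_or_later is a hypothetical flag describing the implementation. The sketch ignores encodings whose pseudocode defines a special case for those bits, so a caller must check for such cases first.

```c
#include <stdint.h>

/* 1 if a bit drawn as (0) is set, or a bit drawn as (1) is clear. */
int violates_should_be_bits(uint32_t insn, uint32_t sbz_mask, uint32_t sbo_mask)
{
    return (insn & sbz_mask) != 0 || (insn & sbo_mask) != sbo_mask;
}

enum absent_kind { ABSENT_UNPREDICTABLE, ABSENT_UNDEFINED };

/* Behavior of a Thumb instruction that the implementation does not
 * provide, under the "Unless otherwise specified" rule. */
enum absent_kind absent_instruction_behavior(int is_32bit, int is_v6t2_or_later)
{
    return (is_32bit && !is_v6t2_or_later) ? ABSENT_UNPREDICTABLE
                                           : ABSENT_UNDEFINED;
}
```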

A6.1.2 Use of the PC, and use of 0b1111 as a register specifier

The use of 0b1111 as a register specifier is not normally permitted in Thumb instructions. When a value of 0b1111 is permitted, a variety of meanings is possible. For register reads, these meanings include:

• Read the PC value, that is, the address of the current instruction + 4. The base register of the table branch instructions TBB and TBH can be the PC. This means branch tables can be placed in memory immediately after the instruction.
• Read the word-aligned PC value, that is, the address of the current instruction + 4, with bits[1:0] forced to zero. The base register of LDC, LDR, LDRB, LDRD (pre-indexed, no writeback), LDRH, LDRSB, and LDRSH instructions can be the word-aligned PC. This provides PC-relative data addressing. In addition, some encodings of the ADD and SUB instructions permit their source registers to be 0b1111 for the same purpose.


• Read zero. This is done in some cases when one instruction is a special case of another, more general instruction, but with one operand zero. In these cases, the instructions are listed on separate pages, with a special case in the pseudocode for the more general instruction cross-referencing the other page.
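A minimal C sketch of the two PC read values for a Thumb instruction at address addr. The function names are hypothetical, and thumb_literal_address only covers a positive byte offset, as an illustration of PC-relative data addressing.

```c
#include <stdint.h>

/* PC value read by a Thumb instruction: current instruction address + 4. */
uint32_t thumb_pc_read(uint32_t addr)
{
    return addr + 4;
}

/* Word-aligned PC value: the same value with bits[1:0] forced to zero. */
uint32_t thumb_pc_read_aligned(uint32_t addr)
{
    return (addr + 4) & ~(uint32_t)3;
}

/* Address accessed by a PC-relative load at addr with byte offset imm. */
uint32_t thumb_literal_address(uint32_t addr, uint32_t imm)
{
    return thumb_pc_read_aligned(addr) + imm;
}
```

The aligned form matters only when the instruction sits at an address with bit[1] set. For example, an instruction at 0x8002 reads the PC as 0x8006 but the word-aligned PC as 0x8004.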

For register writes, these meanings include:

• The PC can be specified as the destination register of an LDR instruction. This is done by encoding Rt as 0b1111. The loaded value is treated as an address, and the effect of execution is a branch to that address. Bit[0] of the loaded value selects whether to execute ARM or Thumb instructions after the branch.
• Some other instructions write the PC in similar ways. An instruction can specify that the PC is written:
  — implicitly, for example, branch instructions
  — explicitly by a register specifier of 0b1111, for example 16-bit MOV (register) instructions
  — explicitly by using a register mask, for example LDM instructions.
  The address to branch to can be:
  — a loaded value, for example, RFE
  — a register value
  — the result of a calculation, for example, TBB or TBH.
  The method of choosing the instruction set used after the branch can be:
  — similar to the LDR case, for example, LDM
  — a fixed instruction set other than the one currently being used, for example, the immediate form of BLX
  — unchanged, for example, branch instructions or 16-bit MOV (register) instructions
  — set from the {J, T} bits of the SPSR, for RFE and SUBS PC, LR, #imm8.
• Discard the result of a calculation. This is done in some cases when one instruction is a special case of another, more general instruction, but with the result discarded. In these cases, the instructions are listed on separate pages, with a special case in the pseudocode for the more general instruction cross-referencing the other page.
• If the destination register specifier of an LDRB, LDRH, LDRSB, or LDRSH instruction is 0b1111, the instruction is a memory hint instead of a load operation.
• If the destination register specifier of an MRC instruction is 0b1111, bits[31:28] of the value transferred from the coprocessor are written to the N, Z, C, and V condition flags in the APSR, and bits[27:0] are discarded.
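The following C sketch models two of the write cases listed above: a value loaded into the PC, where bit[0] selects the instruction set, and an MRC with Rt = 0b1111. Treating an ARM target with bits[1:0] = 0b10 as UNPREDICTABLE is an assumption taken from the manual's interworking-branch pseudocode, not from this section. All names are hypothetical.

```c
#include <stdint.h>

enum iset { ISET_ARM, ISET_THUMB, ISET_UNPREDICTABLE };

struct branch_target {
    uint32_t address;
    enum iset set;
};

/* Interprets a value loaded into the PC, for example by LDR with Rt = PC. */
struct branch_target load_to_pc(uint32_t value)
{
    struct branch_target t;
    if (value & 1u) {                 /* bit[0] = 1: continue in Thumb state */
        t.set = ISET_THUMB;
        t.address = value & ~(uint32_t)1;
    } else if ((value & 2u) == 0) {   /* bits[1:0] = 00: word-aligned ARM target */
        t.set = ISET_ARM;
        t.address = value;
    } else {                          /* bits[1:0] = 10: assumed UNPREDICTABLE */
        t.set = ISET_UNPREDICTABLE;
        t.address = value;
    }
    return t;
}

/* MRC with Rt = 0b1111: returns the 4-bit NZCV value (N in bit[3]) taken
 * from bits[31:28] of the transferred value; bits[27:0] are discarded. */
uint32_t mrc_to_apsr_nzcv(uint32_t value)
{
    return value >> 28;
}
```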

A6.1.3 Use of the SP, and use of 0b1101 as a register specifier

R13 is defined in the Thumb instruction set so that its use is primarily as a stack pointer, and R13 is normally

identified as SP in Thumb instructions. In 32-bit Thumb instructions, if software uses R13 as a general-purpose

The restrictions applicable to R13 are described in:

• R13[1:0] definition
• 32-bit Thumb instruction support for R13 on page A6-222.

See also 16-bit Thumb instruction support for R13 on page A6-222.

R13[1:0] definition

Bits[1:0] of R13 are SBZP. Writing a nonzero value to bits[1:0] causes UNPREDICTABLE behavior.


32-bit Thumb instruction support for R13

R13 instruction support is restricted to the following:

• R13 as the source or destination register of a MOV instruction. Only register to register transfers without shifts are supported, with no flag-setting:

    MOV SP, <Rm>
    MOV <Rn>, SP

• Using the following instructions to adjust R13 up or down by a multiple of 4:

    ADD{W} SP, SP, #<imm>
    SUB{W} SP, SP, #<imm>
    ADD SP, SP, <Rm>
    ADD SP, SP, <Rm>, LSL #<n> ; For <n> = 1, 2, 3
    SUB SP, SP, <Rm>
    SUB SP, SP, <Rm>, LSL #<n> ; For <n> = 1, 2, 3

• R13 as a base register <Rn> of any load/store instruction. This supports SP-based addressing for load, store, or memory hint instructions, with positive or negative offsets, with and without writeback.
• R13 as the first operand <Rn> in any ADD{S}, CMN, CMP, or SUB{S} instruction. The add and subtract instructions support SP-based address generation, with the address going into an ARM core register, R0-R12 or R14. CMN and CMP are useful for stack checking in some circumstances.
• R13 as the transferred register <Rt> in any LDR or STR instruction.

16-bit Thumb instruction support for R13

For 16-bit data-processing instructions that affect high registers, R13 can only be used as described in 32-bit Thumb instruction support for R13. ARM deprecates any other use. This affects the high register forms of CMP and ADD, where ARM deprecates the use of R13 as <Rm>.


A6.2 16-bit Thumb instruction encoding

The encoding of a 16-bit Thumb instruction is:

bits[15:10] = Opcode

Table A6-1 shows the allocation of 16-bit instruction encodings.

Table A6-1 16-bit Thumb instruction encoding

Opcode Instruction or instruction class Variant
00xxxx Shift (immediate), add, subtract, move, and compare on page A6-224 -
010000 Data-processing on page A6-225 -
010001 Special data instructions and branch and exchange on page A6-226 -
01001x Load from Literal Pool, see LDR (literal) on page A8-410 v4T
0101xx, 011xxx, 100xxx Load/store single data item on page A6-227 -
10100x Generate PC-relative address, see ADR on page A8-322 v4T
10101x Generate SP-relative address, see ADD (SP plus immediate) on page A8-316 v4T
1011xx Miscellaneous 16-bit instructions on page A6-228 -
11000x Store multiple registers, see STM (STMIA, STMEA) on page A8-664 (a) v4T
11001x Load multiple registers, see LDM/LDMIA/LDMFD (Thumb) on page A8-396 (a) v4T
1101xx Conditional branch, and Supervisor Call on page A6-229 -
11100x Unconditional Branch, see B on page A8-334 v4T

a. In ThumbEE, 16-bit load/store multiple instructions are not available. This encoding is used for special ThumbEE instructions. For details see Chapter A9 The ThumbEE Instruction Set.
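Table A6-1 can be implemented as a chain of mask-and-compare tests on Opcode = bits[15:10]. This C sketch assumes the halfword has already been identified as a 16-bit instruction, so Opcode is never 11101x or 1111xx; the enum names are illustrative only.

```c
#include <stdint.h>

enum t16_class {
    T16_SHIFT_ADD_SUB_MOV_CMP, T16_DATA_PROCESSING, T16_SPECIAL_BX,
    T16_LDR_LITERAL, T16_LOAD_STORE_SINGLE, T16_ADR, T16_ADD_SP,
    T16_MISC, T16_STM, T16_LDM, T16_BCOND_SVC, T16_B
};

enum t16_class t16_classify(uint16_t hw)
{
    unsigned op = hw >> 10;                                    /* bits[15:10] */

    if ((op & 0x30) == 0x00) return T16_SHIFT_ADD_SUB_MOV_CMP; /* 00xxxx */
    if (op == 0x10)          return T16_DATA_PROCESSING;       /* 010000 */
    if (op == 0x11)          return T16_SPECIAL_BX;            /* 010001 */
    if ((op & 0x3E) == 0x12) return T16_LDR_LITERAL;           /* 01001x */
    if ((op & 0x3C) == 0x14 || (op & 0x38) == 0x18 || (op & 0x38) == 0x20)
        return T16_LOAD_STORE_SINGLE;              /* 0101xx, 011xxx, 100xxx */
    if ((op & 0x3E) == 0x28) return T16_ADR;                   /* 10100x */
    if ((op & 0x3E) == 0x2A) return T16_ADD_SP;                /* 10101x */
    if ((op & 0x3C) == 0x2C) return T16_MISC;                  /* 1011xx */
    if ((op & 0x3E) == 0x30) return T16_STM;                   /* 11000x */
    if ((op & 0x3E) == 0x32) return T16_LDM;                   /* 11001x */
    if ((op & 0x3C) == 0x34) return T16_BCOND_SVC;             /* 1101xx */
    return T16_B;                                              /* 11100x */
}
```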


A6.2.1 Shift (immediate), add, subtract, move, and compare

The encoding of 16-bit Thumb shift (immediate), add, subtract, move, and compare instructions is:

bits[15:14] = 0b00, bits[13:9] = Opcode

Table A6-2 shows the allocation of encodings in this space.

All these instructions are available since the Thumb instruction set was introduced in ARMv4T.

Table A6-2 16-bit Thumb shift (immediate), add, subtract, move, and compare instructions

Opcode Instruction See
000xx Logical Shift Left (a) LSL (immediate) on page A8-468
001xx Logical Shift Right LSR (immediate) on page A8-472
010xx Arithmetic Shift Right ASR (immediate) on page A8-330
01100 Add register ADD (register, Thumb) on page A8-310
01101 Subtract register SUB (register) on page A8-712
01110 Add 3-bit immediate ADD (immediate, Thumb) on page A8-306
01111 Subtract 3-bit immediate SUB (immediate, Thumb) on page A8-708
100xx Move MOV (immediate) on page A8-484
101xx Compare CMP (immediate) on page A8-370
110xx Add 8-bit immediate ADD (immediate, Thumb) on page A8-306
111xx Subtract 8-bit immediate SUB (immediate, Thumb) on page A8-708

a. When Opcode is 0b00000 and bits[8:6] are 0b000, this is an encoding for MOV, see MOV (register, Thumb) on page A8-486.
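The MOV special case in footnote a is easy to miss in a decoder: LSL (immediate) with Opcode = 0b00000 and bits[8:6] = 0b000 decodes as MOV (register). A C sketch of Table A6-2, assuming bits[15:14] = 0b00 has already been checked; the names are hypothetical.

```c
#include <stdint.h>

enum t16_sasmc {
    SASMC_LSL_IMM, SASMC_MOV_REG, SASMC_LSR_IMM, SASMC_ASR_IMM,
    SASMC_ADD_REG, SASMC_SUB_REG, SASMC_ADD_IMM3, SASMC_SUB_IMM3,
    SASMC_MOV_IMM, SASMC_CMP_IMM, SASMC_ADD_IMM8, SASMC_SUB_IMM8
};

enum t16_sasmc t16_sasmc_decode(uint16_t hw)
{
    unsigned opcode = (hw >> 9) & 0x1F;   /* bits[13:9] */

    switch (opcode >> 2) {                /* Opcode[4:2] */
    case 0:                               /* 000xx, with the footnote a case */
        return (opcode == 0 && ((hw >> 6) & 0x7) == 0) ? SASMC_MOV_REG
                                                       : SASMC_LSL_IMM;
    case 1: return SASMC_LSR_IMM;         /* 001xx */
    case 2: return SASMC_ASR_IMM;         /* 010xx */
    case 3: {                             /* 011xx: Opcode[1:0] picks the row */
        static const enum t16_sasmc row[4] = {
            SASMC_ADD_REG, SASMC_SUB_REG, SASMC_ADD_IMM3, SASMC_SUB_IMM3
        };
        return row[opcode & 0x3];
    }
    case 4: return SASMC_MOV_IMM;         /* 100xx */
    case 5: return SASMC_CMP_IMM;         /* 101xx */
    case 6: return SASMC_ADD_IMM8;        /* 110xx */
    default: return SASMC_SUB_IMM8;       /* 111xx */
    }
}
```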


A6.2.2 Data-processing

The encoding of 16-bit Thumb data-processing instructions is:

bits[15:10] = 0b010000, bits[9:6] = Opcode

Table A6-3 shows the allocation of encodings in this space.

All these instructions are available since the Thumb instruction set was introduced in ARMv4T.

Table A6-3 16-bit Thumb data-processing instructions

Opcode Instruction See
0000 Bitwise AND AND (register) on page A8-326
0001 Bitwise Exclusive OR EOR (register) on page A8-384
0010 Logical Shift Left LSL (register) on page A8-470
0011 Logical Shift Right LSR (register) on page A8-474
0100 Arithmetic Shift Right ASR (register) on page A8-332
0101 Add with Carry ADC (register) on page A8-302
0110 Subtract with Carry SBC (register) on page A8-594
0111 Rotate Right ROR (register) on page A8-570
1000 Test TST (register) on page A8-746
1001 Reverse Subtract from 0 RSB (immediate) on page A8-574
1010 Compare CMP (register) on page A8-372
1011 Compare Negative CMN (register) on page A8-366
1100 Bitwise OR ORR (register) on page A8-518
1101 Multiply MUL on page A8-502
1110 Bitwise Bit Clear BIC (register) on page A8-342
1111 Bitwise NOT MVN (register) on page A8-506


A6.2.3 Special data instructions and branch and exchange

The encoding of 16-bit Thumb special data instructions and branch and exchange instructions is:

bits[15:10] = 0b010001, bits[9:6] = Opcode

Table A6-4 shows the allocation of encodings in this space.

Table A6-4 16-bit Thumb special data instructions and branch and exchange

Opcode Instruction See Variant
0000 Add Low Registers ADD (register, Thumb) on page A8-310 v6T2 (a)
0001, 001x Add High Registers ADD (register, Thumb) on page A8-310 v4T
01xx Compare High Registers CMP (register) on page A8-372 v4T
1000 Move Low Registers MOV (register, Thumb) on page A8-486 v6 (a)
1001, 101x Move High Registers MOV (register, Thumb) on page A8-486 v4T
110x Branch and Exchange BX on page A8-352 v4T
111x Branch with Link and Exchange BLX (register) on page A8-350 v5T (a)

a. UNPREDICTABLE in earlier variants.


A6.2.4 Load/store single data item

The encoding of 16-bit Thumb instructions that load or store a single data item is:

bits[15:12] = opA, bits[11:9] = opB

These instructions have one of the following values of opA:

• 0b0101
• 0b011x
• 0b100x

Table A6-5 shows the allocation of encodings in this space.

All these instructions are available since the Thumb instruction set was introduced in ARMv4T.

Table A6-5 16-bit Thumb Load/store single data item instructions

opA opB Instruction See
0101 000 Store Register STR (register) on page A8-676
0101 001 Store Register Halfword STRH (register) on page A8-702
0101 010 Store Register Byte STRB (register) on page A8-682
0101 011 Load Register Signed Byte LDRSB (register) on page A8-454
0101 100 Load Register LDR (register, Thumb) on page A8-412
0101 101 Load Register Halfword LDRH (register) on page A8-446
0101 110 Load Register Byte LDRB (register) on page A8-422
0101 111 Load Register Signed Halfword LDRSH (register) on page A8-462
0110 0xx Store Register STR (immediate, Thumb) on page A8-672
0110 1xx Load Register LDR (immediate, Thumb) on page A8-406
0111 0xx Store Register Byte STRB (immediate, Thumb) on page A8-678
0111 1xx Load Register Byte LDRB (immediate, Thumb) on page A8-416
1000 0xx Store Register Halfword STRH (immediate, Thumb) on page A8-698
1000 1xx Load Register Halfword LDRH (immediate, Thumb) on page A8-440
1001 0xx Store Register SP relative STR (immediate, Thumb) on page A8-672
1001 1xx Load Register SP relative LDR (immediate, Thumb) on page A8-406
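Rather than returning one of sixteen mnemonics, the following C sketch decodes Table A6-5 into the properties a simulator would typically need: load or store, access size, signedness, and addressing form. The struct, enum, and function names are hypothetical.

```c
#include <stdint.h>

enum ls_form { LS_REGISTER, LS_IMMEDIATE, LS_SP_RELATIVE };

struct ls_single {
    int is_load;        /* 1 for loads, 0 for stores */
    int size;           /* access size in bytes: 1, 2 or 4 */
    int is_signed;      /* 1 for LDRSB and LDRSH */
    enum ls_form form;  /* register offset, immediate offset, or SP-relative */
};

/* Returns 0 on success, or -1 (leaving *out untouched) if opA is not
 * 0b0101, 0b011x or 0b100x. */
int t16_ls_decode(uint16_t hw, struct ls_single *out)
{
    /* opB order for opA = 0101: STR STRH STRB LDRSB LDR LDRH LDRB LDRSH */
    static const int reg_load[8] = {0, 0, 0, 1, 1, 1, 1, 1};
    static const int reg_size[8] = {4, 2, 1, 1, 4, 2, 1, 2};
    static const int reg_sign[8] = {0, 0, 0, 1, 0, 0, 0, 1};
    /* Sizes for opA = 0110 (word), 0111 (byte), 1000 (halfword), 1001 (word) */
    static const int imm_size[4] = {4, 1, 2, 4};
    unsigned opA = hw >> 12;              /* bits[15:12] */
    unsigned opB = (hw >> 9) & 0x7;       /* bits[11:9]  */

    if (opA == 0x5) {
        out->is_load = reg_load[opB];
        out->size = reg_size[opB];
        out->is_signed = reg_sign[opB];
        out->form = LS_REGISTER;
        return 0;
    }
    if (opA < 0x6 || opA > 0x9)
        return -1;
    out->is_load = (opB >> 2) & 1;        /* opB = 0xx store, 1xx load */
    out->size = imm_size[opA - 0x6];
    out->is_signed = 0;
    out->form = (opA == 0x9) ? LS_SP_RELATIVE : LS_IMMEDIATE;
    return 0;
}
```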


A6.2.5 Miscellaneous 16-bit instructions

The encoding of 16-bit Thumb miscellaneous instructions is:

bits[15:12] = 0b1011, bits[11:5] = Opcode

Table A6-6 shows the allocation of encodings in this space. Other encodings in this space are UNDEFINED.

Table A6-6 Miscellaneous 16-bit instructions

Opcode Instruction See Variant
00000xx Add Immediate to SP ADD (SP plus immediate) on page A8-316 v4T
00001xx Subtract Immediate from SP SUB (SP minus immediate) on page A8-716 v4T
0001xxx Compare and Branch on Zero CBNZ, CBZ on page A8-356 v6T2
001000x Signed Extend Halfword SXTH on page A8-734 v6
001001x Signed Extend Byte SXTB on page A8-730 v6
001010x Unsigned Extend Halfword UXTH on page A8-816 v6
001011x Unsigned Extend Byte UXTB on page A8-812 v6
0011xxx Compare and Branch on Zero CBNZ, CBZ on page A8-356 v6T2
010xxxx Push Multiple Registers PUSH on page A8-538 v4T
0110010 Set Endianness SETEND on page A8-604 v6
0110011 Change Processor State CPS (Thumb) on page B9-1978 v6
1001xxx Compare and Branch on Nonzero CBNZ, CBZ on page A8-356 v6T2
101000x Byte-Reverse Word REV on page A8-562 v6
101001x Byte-Reverse Packed Halfword REV16 on page A8-564 v6
101011x Byte-Reverse Signed Halfword REVSH on page A8-566 v6
1011xxx Compare and Branch on Nonzero CBNZ, CBZ on page A8-356 v6T2
110xxxx Pop Multiple Registers POP (Thumb) on page A8-534 v4T
1110xxx Breakpoint BKPT on page A8-346 v5
1111xxx If-Then, and hints If-Then, and hints on page A6-229 -
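CBZ and CBNZ occupy four Opcode patterns in Table A6-6 (0001xxx, 0011xxx, 1001xxx, 1011xxx). The C sketch below recognizes them and computes the branch target. The field positions (op = bit[11], i = bit[9], imm5 = bits[7:3], Rn = bits[2:0]) and the zero-extended offset i:imm5:'0' are assumptions taken from the CBNZ, CBZ instruction description, which is not part of this section; the function name is hypothetical.

```c
#include <stdint.h>

/* Returns 1 and fills the outputs if hw is CBZ or CBNZ, otherwise 0.
 * addr is the address of the instruction itself. */
int t16_cbz_decode(uint16_t hw, uint32_t addr, int *nonzero, unsigned *rn,
                   uint32_t *target)
{
    /* Fixed bits: bits[15:12] = 1011, bit[10] = 0, bit[8] = 1. */
    if ((hw & 0xF500) != 0xB100)
        return 0;
    uint32_t offset = (((hw >> 9) & 1u) << 6)      /* i    -> offset bit[6]   */
                    | (((hw >> 3) & 0x1Fu) << 1);  /* imm5 -> offset bits[5:1] */
    *nonzero = (hw >> 11) & 1;                     /* op: 1 = CBNZ, 0 = CBZ */
    *rn = hw & 0x7;
    *target = addr + 4 + offset;                   /* PC reads as addr + 4; forward only */
    return 1;
}
```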


If-Then, and hints

The encoding of 16-bit Thumb If-Then and hint instructions is:

Bits[15:8] = 1011 1111; opA = bits[7:4]; opB = bits[3:0].

Table A6-7 shows the allocation of encodings in this space. Other encodings in this space are unallocated hints. They execute as NOPs, but software must not use them.

Table A6-7 16-bit If-Then and hint instructions

opA opB Instruction See Variant

- not 0000 If-Then IT on page A8-390 v6T2

0000 0000 No Operation hint NOP on page A8-510 v6T2

0001 0000 Yield hint YIELD on page A8-1108 v7

0010 0000 Wait For Event hint WFE on page A8-1104 v7

0011 0000 Wait For Interrupt hint WFI on page A8-1106 v7

0100 0000 Send Event hint SEV on page A8-606 v7

A6.2.6 Conditional branch, and Supervisor Call

The encoding of 16-bit Thumb conditional branch and Supervisor Call instructions is:

Bits[15:12] = 1101; Opcode = bits[11:8].

Table A6-8 shows the allocation of encodings in this space. All of these encodings have been available since the Thumb instruction set was introduced in ARMv4T.

Table A6-8 Conditional branch and Supervisor Call instructions

Opcode Instruction See

not 111x Conditional branch B on page A8-334

1110 Permanently UNDEFINED UDF on page A8-758 (a)

1111 Supervisor Call SVC (previously SWI) on page A8-720

a. Issue C.a of this manual first defines an assembler mnemonic for this encoding.
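Table A6-7 and Table A6-8 can be read the same way. In this hypothetical Python sketch, decode_it_or_hint and decode_cond_branch_svc return only the instruction name from each table. They do not decode the IT mask, the branch condition, or the branch offset.

```python
HINTS_16 = {0b0000: "NOP", 0b0001: "YIELD", 0b0010: "WFE", 0b0011: "WFI", 0b0100: "SEV"}


def decode_it_or_hint(hw: int) -> str:
    """Table A6-7; hw must have bits[15:8] == 10111111."""
    if hw >> 8 != 0b10111111:
        raise ValueError("not an If-Then or hint encoding")
    op_a = (hw >> 4) & 0xF  # bits[7:4]
    op_b = hw & 0xF         # bits[3:0]
    if op_b != 0b0000:
        return "IT"
    # Unallocated hints execute as NOPs, but software must not use them.
    return HINTS_16.get(op_a, "unallocated hint")


def decode_cond_branch_svc(hw: int) -> str:
    """Table A6-8; hw must have bits[15:12] == 1101."""
    if hw >> 12 != 0b1101:
        raise ValueError("not a conditional branch or Supervisor Call encoding")
    opcode = (hw >> 8) & 0xF  # bits[11:8]
    if opcode == 0b1110:
        return "UDF"
    if opcode == 0b1111:
        return "SVC"
    return "B"  # Opcode is not 111x: conditional branch
```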


A6.3 32-bit Thumb instruction encoding

The encoding of a 32-bit Thumb instruction is:

First halfword: bits[15:13] = 111; op1 = bits[12:11]; op2 = bits[10:4]. Second halfword: op = bit[15].

If op1 == 0b00, a 16-bit instruction is encoded, see 16-bit Thumb instruction encoding on page A6-223.

Otherwise, Table A6-9 shows the allocation of encodings in this space.

Table A6-9 32-bit Thumb instruction encoding

op1 op2 op Instruction class, see

01 00xx0xx - Load/store multiple on page A6-237

01 00xx1xx - Load/store dual, load/store exclusive, table branch on page A6-238

01 01xxxxx - Data-processing (shifted register) on page A6-243

01 1xxxxxx - Coprocessor, Advanced SIMD, and Floating-point instructions on page A6-251

10 x0xxxxx 0 Data-processing (modified immediate) on page A6-231

10 x1xxxxx 0 Data-processing (plain binary immediate) on page A6-234

10 - 1 Branches and miscellaneous control on page A6-235

11 000xxx0 - Store single data item on page A6-242

11 00xx001 - Load byte, memory hints on page A6-241

11 00xx011 - Load halfword, memory hints on page A6-240

11 00xx101 - Load word on page A6-239

11 00xx111 - UNDEFINED

11 001xxx0 - Advanced SIMD element or structure load/store instructions on page A7-275

11 010xxxx - Data-processing (register) on page A6-245

11 0110xxx - Multiply, multiply accumulate, and absolute difference on page A6-249

11 0111xxx - Long multiply, long multiply accumulate, and divide on page A6-250

11 1xxxxxx - Coprocessor, Advanced SIMD, and Floating-point instructions on page A6-251
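The Python sketch below shows both the length test on the first halfword and the class lookup in Table A6-9. It is illustrative only and assumes the caller supplies both halfwords of the instruction. The names is_32bit_thumb, classify_32bit, matches, and TABLE_A6_9 are hypothetical.

```python
# Table A6-9 rows: (op1, pattern for op2 = first halfword bits[10:4], op or None, class)
TABLE_A6_9 = [
    (0b01, "00xx0xx", None, "Load/store multiple"),
    (0b01, "00xx1xx", None, "Load/store dual, load/store exclusive, table branch"),
    (0b01, "01xxxxx", None, "Data-processing (shifted register)"),
    (0b01, "1xxxxxx", None, "Coprocessor, Advanced SIMD, and Floating-point instructions"),
    (0b10, "x0xxxxx", 0, "Data-processing (modified immediate)"),
    (0b10, "x1xxxxx", 0, "Data-processing (plain binary immediate)"),
    (0b10, "xxxxxxx", 1, "Branches and miscellaneous control"),
    (0b11, "000xxx0", None, "Store single data item"),
    (0b11, "00xx001", None, "Load byte, memory hints"),
    (0b11, "00xx011", None, "Load halfword, memory hints"),
    (0b11, "00xx101", None, "Load word"),
    (0b11, "00xx111", None, "UNDEFINED"),
    (0b11, "001xxx0", None, "Advanced SIMD element or structure load/store instructions"),
    (0b11, "010xxxx", None, "Data-processing (register)"),
    (0b11, "0110xxx", None, "Multiply, multiply accumulate, and absolute difference"),
    (0b11, "0111xxx", None, "Long multiply, long multiply accumulate, and divide"),
    (0b11, "1xxxxxx", None, "Coprocessor, Advanced SIMD, and Floating-point instructions"),
]


def matches(value: int, pattern: str) -> bool:
    """True if value matches pattern, written MSB first using '0', '1', or 'x'."""
    for bit_index, ch in enumerate(reversed(pattern)):
        if ch != "x" and int(ch) != (value >> bit_index) & 1:
            return False
    return True


def is_32bit_thumb(first_hw: int) -> bool:
    """bits[15:13] == 111 with op1 (bits[12:11]) != 00 starts a 32-bit instruction."""
    return (first_hw >> 13) == 0b111 and ((first_hw >> 11) & 0b11) != 0b00


def classify_32bit(hw1: int, hw2: int) -> str:
    if not is_32bit_thumb(hw1):
        raise ValueError("hw1 is not the first halfword of a 32-bit instruction")
    op1 = (hw1 >> 11) & 0b11
    op2 = (hw1 >> 4) & 0x7F
    op = (hw2 >> 15) & 1
    for row_op1, op2_pattern, row_op, name in TABLE_A6_9:
        if row_op1 == op1 and matches(op2, op2_pattern) and row_op in (None, op):
            return name
    return "UNDEFINED"
```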


A6.3.1 Data-processing (modified immediate)

The encoding of the 32-bit Thumb data-processing (modified immediate) instructions is:

First halfword: bits[15:11] = 11110; i = bit[10]; bit[9] = 0; op = bits[8:5]; S = bit[4]; Rn = bits[3:0]. Second halfword: bit[15] = 0; imm3 = bits[14:12]; Rd = bits[11:8]; imm8 = bits[7:0].

Table A6-10 shows the allocation of encodings in this space. Other encodings in this space are UNDEFINED. In the Rd:S column, 11111 means that Rd is 0b1111 and S is 1.

These encodings are all available in ARMv6T2 and above.

These instructions all have modified immediate constants, rather than a simple 12-bit binary number. This provides a more useful range of values. For details see Modified immediate constants in Thumb instructions on page A6-232.

Table A6-10 32-bit modified immediate data-processing instructions

op Rn Rd:S Instruction See

0000 - not 11111 Bitwise AND AND (immediate) on page A8-324

0000 - 11111 Test TST (immediate) on page A8-744

0001 - - Bitwise Bit Clear BIC (immediate) on page A8-340

0010 not 1111 - Bitwise OR ORR (immediate) on page A8-516

0010 1111 - Move MOV (immediate) on page A8-484

0011 not 1111 - Bitwise OR NOT ORN (immediate) on page A8-512

0011 1111 - Bitwise NOT MVN (immediate) on page A8-504

0100 - not 11111 Bitwise Exclusive OR EOR (immediate) on page A8-382

0100 - 11111 Test Equivalence TEQ (immediate) on page A8-738

1000 - not 11111 Add ADD (immediate, Thumb) on page A8-306

1000 - 11111 Compare Negative CMN (immediate) on page A8-364

1010 - - Add with Carry ADC (immediate) on page A8-300

1011 - - Subtract with Carry SBC (immediate) on page A8-592

1101 - not 11111 Subtract SUB (immediate, Thumb) on page A8-708

1101 - 11111 Compare CMP (immediate) on page A8-370

1110 - - Reverse Subtract RSB (immediate) on page A8-574
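Here is a hypothetical Python sketch that shows how the Rn and Rd:S columns of Table A6-10 choose between related instructions, for example AND versus TST, or ORR versus MOV. It extracts op, S, Rn, and Rd from the two halfwords and returns the table's mnemonic. It does not expand the immediate.

```python
def decode_mod_imm_dp(hw1: int, hw2: int) -> str:
    """Table A6-10 lookup for a 32-bit data-processing (modified immediate) instruction."""
    op = (hw1 >> 5) & 0xF    # first halfword bits[8:5]
    s = (hw1 >> 4) & 1       # first halfword bit[4]
    rn = hw1 & 0xF           # first halfword bits[3:0]
    rd = (hw2 >> 8) & 0xF    # second halfword bits[11:8]
    rd_s_ones = rd == 0b1111 and s == 1   # Rd:S == 11111

    if op == 0b0000:
        return "TST" if rd_s_ones else "AND"
    if op == 0b0001:
        return "BIC"
    if op == 0b0010:
        return "MOV" if rn == 0b1111 else "ORR"
    if op == 0b0011:
        return "MVN" if rn == 0b1111 else "ORN"
    if op == 0b0100:
        return "TEQ" if rd_s_ones else "EOR"
    if op == 0b1000:
        return "CMN" if rd_s_ones else "ADD"
    if op == 0b1010:
        return "ADC"
    if op == 0b1011:
        return "SBC"
    if op == 0b1101:
        return "CMP" if rd_s_ones else "SUB"
    if op == 0b1110:
        return "RSB"
    return "UNDEFINED"
```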


A6.3.2 Modified immediate constants in Thumb instructions

The encoding of a modified immediate constant in a 32-bit Thumb instruction is:

First halfword: i = bit[10]. Second halfword: imm3 = bits[14:12]; a, b, c, d, e, f, g, h = bits[7:0], with a in bit[7] and h in bit[0].

Table A6-11 shows the range of modified immediate constants available in Thumb data-processing instructions, and their encoding in the a, b, c, d, e, f, g, h, and i bits, and the imm3 field, in the instruction.

Note

As the footnotes to Table A6-11 show, the range of values available in Thumb modified immediate constants is slightly different from the range of values available in ARM instructions. See Modified immediate constants in ARM instructions on page A5-200 for the ARM values.

Carry out

A logical instruction with i:imm3:a == '00xxx' does not affect the Carry flag. Otherwise, a logical flag-setting instruction sets the Carry flag to the value of bit[31] of the modified immediate constant.

Table A6-11 Encoding of modified immediates in Thumb data-processing instructions

i:imm3:a <const> (a)

0000x 00000000 00000000 00000000 abcdefgh

0001x 00000000 abcdefgh 00000000 abcdefgh (b)

0010x abcdefgh 00000000 abcdefgh 00000000 (b)

0011x abcdefgh abcdefgh abcdefgh abcdefgh (b)

01000 1bcdefgh 00000000 00000000 00000000

01001 01bcdefg h0000000 00000000 00000000 (c)

01010 001bcdef gh000000 00000000 00000000

01011 0001bcde fgh00000 00000000 00000000 (c)

... 8-bit values shifted to other positions

11101 00000000 00000000 000001bc defgh000 (c)

11110 00000000 00000000 0000001b cdefgh00

11111 00000000 00000000 00000001 bcdefgh0 (c)

a. This table shows the immediate constant value in binary form, to relate abcdefgh to the encoding diagram. In assembly syntax, the immediate value is specified in the usual way (a decimal number by default).

b. Not available in ARM instructions. UNPREDICTABLE if abcdefgh == 00000000.

c. Not available in ARM instructions if h == 1.
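Table A6-11 can also be read in reverse, to check whether a given 32-bit constant is encodable. The Python sketch below does a straightforward search. It is not necessarily the method any particular assembler uses, and encode_thumb_mod_imm and ror32 are hypothetical names. It tries the replicated forms first, then every rotation from 8 to 31. It returns the 12-bit field i:imm3:abcdefgh for the first matching row, or None if no row matches.

```python
def ror32(value: int, shift: int) -> int:
    """Rotate a 32-bit value right by shift bits."""
    shift %= 32
    return ((value >> shift) | (value << (32 - shift))) & 0xFFFFFFFF


def encode_thumb_mod_imm(value: int):
    """Return imm12 = i:imm3:abcdefgh that encodes value, or None if there is none."""
    value &= 0xFFFFFFFF
    low = value & 0xFF          # abcdefgh for rows 0000x, 0001x, and 0011x
    mid = (value >> 8) & 0xFF   # abcdefgh for row 0010x
    if value == low:                                    # 0000x
        return low
    if low != 0 and value == (low << 16) | low:         # 0001x (abcdefgh must be nonzero)
        return 0x100 | low
    if mid != 0 and value == (mid << 24) | (mid << 8):  # 0010x
        return 0x200 | mid
    if low != 0 and value == low * 0x01010101:          # 0011x
        return 0x300 | low
    # Rows 01000 to 11111: '1':bcdefgh rotated right by i:imm3:a, which is 8 to 31.
    for rotation in range(8, 32):
        unrotated = ror32(value, 32 - rotation)  # undo the rotation
        if 0x80 <= unrotated <= 0xFF:            # leading bit of the 8-bit value must be 1
            return (rotation << 7) | (unrotated & 0x7F)
    return None
```

For example, encode_thumb_mod_imm(0x00FF00FF) returns 0x1FF. The value 0x101 has no encoding, so the function returns None for it.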


Operation of modified immediate constants, Thumb instructions

// ThumbExpandImm()
// ================

bits(32) ThumbExpandImm(bits(12) imm12)

    // APSR.C argument to following function call does not affect the imm32 result.
    (imm32, -) = ThumbExpandImm_C(imm12, APSR.C);

    return imm32;

// ThumbExpandImm_C()
// ==================

(bits(32), bit) ThumbExpandImm_C(bits(12) imm12, bit carry_in)

    if imm12<11:10> == '00' then

        case imm12<9:8> of
            when '00'
                imm32 = ZeroExtend(imm12<7:0>, 32);
            when '01'
                if imm12<7:0> == '00000000' then UNPREDICTABLE;
                imm32 = '00000000' : imm12<7:0> : '00000000' : imm12<7:0>;
            when '10'
                if imm12<7:0> == '00000000' then UNPREDICTABLE;
                imm32 = imm12<7:0> : '00000000' : imm12<7:0> : '00000000';
            when '11'
                if imm12<7:0> == '00000000' then UNPREDICTABLE;
                imm32 = imm12<7:0> : imm12<7:0> : imm12<7:0> : imm12<7:0>;
        carry_out = carry_in;

    else

        unrotated_value = ZeroExtend('1':imm12<6:0>, 32);
        (imm32, carry_out) = ROR_C(unrotated_value, UInt(imm12<11:7>));

    return (imm32, carry_out);
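For experimentation, the ThumbExpandImm pseudocode translates fairly directly into Python. This is an unofficial transcription, not ARM's reference code. Bit strings become integers, UNPREDICTABLE becomes a hypothetical Unpredictable exception, and ror_c models ROR_C only for the 32-bit, nonzero shifts (8 to 31) that occur here.

```python
class Unpredictable(Exception):
    """Raised where the pseudocode says UNPREDICTABLE (hypothetical modelling choice)."""


def ror_c(value: int, shift: int) -> tuple[int, int]:
    """ROR_C on 32 bits for 0 < shift < 32: returns (result, result<31>)."""
    result = ((value >> shift) | (value << (32 - shift))) & 0xFFFFFFFF
    return result, result >> 31


def thumb_expand_imm_c(imm12: int, carry_in: int) -> tuple[int, int]:
    imm8 = imm12 & 0xFF
    if (imm12 >> 10) & 0b11 == 0b00:
        pattern = (imm12 >> 8) & 0b11          # imm12<9:8>
        if pattern != 0b00 and imm8 == 0:
            raise Unpredictable(f"imm12={imm12:#05x}")
        if pattern == 0b00:
            imm32 = imm8
        elif pattern == 0b01:
            imm32 = (imm8 << 16) | imm8
        elif pattern == 0b10:
            imm32 = (imm8 << 24) | (imm8 << 8)
        else:
            imm32 = imm8 * 0x01010101
        return imm32, carry_in                  # carry is unaffected
    unrotated = 0x80 | (imm12 & 0x7F)           # '1':imm12<6:0>
    return ror_c(unrotated, (imm12 >> 7) & 0x1F)  # rotate by imm12<11:7>, 8 to 31


def thumb_expand_imm(imm12: int) -> int:
    imm32, _ = thumb_expand_imm_c(imm12, 0)     # carry_in does not affect imm32
    return imm32
```

As the Carry out rule states, the carry result equals carry_in only when imm12<11:10> is 00. For the rotated forms, it is bit[31] of the constant.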


A6.3.3 Data-processing (plain binary immediate)

The encoding of the 32-bit Thumb data-processing (plain binary immediate) instructions is:

First halfword: bits[15:11] = 11110; bit[9] = 1; op = bits[8:4]; Rn = bits[3:0]. Second halfword: bit[15] = 0.

Table A6-12 shows the allocation of encodings in this space. Other encodings in this space are UNDEFINED.

These encodings are all available in ARMv6T2 and above.

Table A6-12 32-bit unmodified immediate data-processing instructions

op Rn Instruction See

00000 not 1111 Add Wide (12-bit) ADD (immediate, Thumb) on page A8-306

00000 1111 Form PC-relative Address ADR on page A8-322

00100 - Move Wide (16-bit) MOV (immediate) on page A8-484

01010 not 1111 Subtract Wide (12-bit) SUB (immediate, Thumb) on page A8-708

01010 1111 Form PC-relative Address ADR on page A8-322

01100 - Move Top (16-bit) MOVT on page A8-491

10000, 10010 (a) - Signed Saturate SSAT on page A8-652

10010 (b) - Signed Saturate, two 16-bit SSAT16 on page A8-654

10100 - Signed Bit Field Extract SBFX on page A8-598

10110 not 1111 Bit Field Insert BFI on page A8-338

10110 1111 Bit Field Clear BFC on page A8-336

11000, 11010 (a) - Unsigned Saturate USAT on page A8-796

11010 (b) - Unsigned Saturate, two 16-bit USAT16 on page A8-798

11100 - Unsigned Bit Field Extract UBFX on page A8-756

a. In the second halfword of the instruction, bits[14:12, 7:6] != 0b00000.

b. In the second halfword of the instruction, bits[14:12, 7:6] == 0b00000.
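The following hypothetical Python sketch decodes Table A6-12. It includes the footnote test on bits[14:12, 7:6] of the second halfword, which separates SSAT and USAT from SSAT16 and USAT16. It returns only the mnemonic from the See column.

```python
def decode_plain_binary_imm(hw1: int, hw2: int) -> str:
    """Table A6-12 lookup for a 32-bit data-processing (plain binary immediate) instruction."""
    op = (hw1 >> 4) & 0x1F   # first halfword bits[8:4]
    rn = hw1 & 0xF           # first halfword bits[3:0]
    # Second halfword bits[14:12, 7:6]; all zero selects SSAT16/USAT16 (footnote b).
    sat_shift = (((hw2 >> 12) & 0b111) << 2) | ((hw2 >> 6) & 0b11)

    if op == 0b00000:
        return "ADR" if rn == 0xF else "ADD"
    if op == 0b00100:
        return "MOV"
    if op == 0b01010:
        return "ADR" if rn == 0xF else "SUB"
    if op == 0b01100:
        return "MOVT"
    if op == 0b10000:
        return "SSAT"
    if op == 0b10010:
        return "SSAT" if sat_shift != 0 else "SSAT16"
    if op == 0b10100:
        return "SBFX"
    if op == 0b10110:
        return "BFC" if rn == 0xF else "BFI"
    if op == 0b11000:
        return "USAT"
    if op == 0b11010:
        return "USAT" if sat_shift != 0 else "USAT16"
    if op == 0b11100:
        return "UBFX"
    return "UNDEFINED"
```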


A6.3.4 Branches and miscellaneous control

The encoding of the 32-bit Thumb branch instructions and miscellaneous control instructions is:

First halfword: bits[15:11] = 11110; op = bits[10:4]. Second halfword: bit[15] = 1; op1 = bits[14:12]; op2 = bits[11:8]; imm8 = bits[7:0].

Table A6-13 shows the allocation of encodings in this space. Other encodings in this space are UNDEFINED.

Table A6-13 Branches and miscellaneous control instructions

op1 imm8 op op2 Instruction See Variant

0x0 - not x111xxx - Conditional branch B on page A8-334 v6T2

0x0 xx1xxxxx 011100x - Move to Banked or Special register MSR (Banked register) on page B9-1994 v7VE

0x0 xx0xxxxx 0111000 xx00 Move to Special register, Application level MSR (register) on page A8-500 All

0x0 xx0xxxxx 0111000 xx01, xx1x Move to Special register, System level MSR (register) on page B9-1998 All

0x0 xx0xxxxx 0111001 - Move to Special register, System level MSR (register) on page B9-1998 All

0x0 - 0111010 - - Change Processor State, and hints on page A6-236

0x0 - 0111011 - - Miscellaneous control instructions on page A6-237

0x0 - 0111100 - Branch and Exchange Jazelle BXJ on page A8-354 v6T2

0x0 00000000 0111101 - Exception Return ERET on page B9-1982 v6T2 (a)

0x0 not 00000000 0111101 - Exception Return SUBS PC, LR (Thumb) on page B9-2010 v6T2

0x0 xx1xxxxx 011111x - Move from Banked or Special register MRS (Banked register) on page B9-1992 v7VE

0x0 xx0xxxxx 0111110 - Move from Special register, Application level MRS on page A8-496 v6T2

0x0 xx0xxxxx 0111111 - Move from Special register, System level MRS on page B9-1990 v6T2

000 - 1111110 - Hypervisor Call HVC on page B9-1984 v7VE

000 - 1111111 - Secure Monitor Call SMC (previously SMI) on page B9-2002 Security Extensions

0x1 - - - Branch B on page A8-334 v6T2

010 - 1111111 - Permanently UNDEFINED UDF on page A8-758 All (b)

1x0 - - - Branch with Link and Exchange BL, BLX (immediate) on page A8-348 v5T (c)

1x1 - - - Branch with Link BL, BLX (immediate) on page A8-348 v4T

a. v7VE, that is, ARMv7 with the Virtualization Extensions, first defines ERET as an assembler mnemonic for this encoding. From ARMv6T2 this is an encoding for SUBS PC, LR (Thumb) on page B9-2010 with an imm8 value of zero. The Virtualization Extensions do not change the behavior of the encoded instruction when it is executed at PL1.

b. Issue C.a of this manual first defines an assembler mnemonic for this encoding.

c. UNDEFINED in ARMv4T.

Change Processor State, and hints

The encoding of 32-bit Thumb Change Processor State and hint instructions is:

First halfword: bits[15:4] = 1111 0011 1010. Second halfword: bits[15:14] = 10; bit[12] = 0; op1 = bits[10:8]; op2 = bits[7:0].

Table A6-14 shows the allocation of encodings in this space. Encodings with op1 set to 0b000 and a value of op2 that is not shown in the table are unallocated hints, and behave as if op2 is set to 0b00000000. These unallocated hint encodings are reserved and software must not use them.

Table A6-14 Change Processor State, and hint instructions

op1 op2 Instruction See Variant

not 000 - Change Processor State CPS (Thumb) on page B9-1978 v6T2

000 00000000 No Operation hint NOP on page A8-510 v6T2

000 00000001 Yield hint YIELD on page A8-1108 v7

000 00000010 Wait For Event hint WFE on page A8-1104 v7

000 00000011 Wait For Interrupt hint WFI on page A8-1106 v7

000 00000100 Send Event hint SEV on page A8-606 v7

000 1111xxxx Debug hint DBG on page A8-377 v7
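Table A6-13 selects on several fields at once, so a decoder written out in code can be easier to follow. The Python sketch below is illustrative only. The function names are hypothetical, and the sketch assumes both halfwords are already known to be in this space. It also includes a lookup for Table A6-14 on the second halfword.

```python
def decode_branch_misc(hw1: int, hw2: int) -> str:
    """Table A6-13 lookup; returns the instruction or the sub-table name."""
    op = (hw1 >> 4) & 0x7F       # first halfword bits[10:4]
    op1 = (hw2 >> 12) & 0b111    # second halfword bits[14:12]
    op2 = (hw2 >> 8) & 0xF       # second halfword bits[11:8]
    imm8 = hw2 & 0xFF            # second halfword bits[7:0]
    banked = (imm8 >> 5) & 1     # imm8 == xx1xxxxx selects the Banked register forms

    if (op1 & 0b101) == 0b001:
        return "B"
    if (op1 & 0b101) == 0b100:
        return "BLX (immediate)"
    if (op1 & 0b101) == 0b101:
        return "BL"
    # From here on, op1 == 0x0.
    if ((op >> 3) & 0b111) != 0b111:       # op is not x111xxx
        return "B (conditional)"
    if (op >> 1) == 0b011100 and banked:
        return "MSR (Banked register)"
    if op == 0b0111000:
        return ("MSR (register), application level" if (op2 & 0b11) == 0b00
                else "MSR (register), system level")
    if op == 0b0111001:
        return "MSR (register), system level"
    if op == 0b0111010:
        return "Change Processor State, and hints"
    if op == 0b0111011:
        return "Miscellaneous control instructions"
    if op == 0b0111100:
        return "BXJ"
    if op == 0b0111101:
        return "ERET" if imm8 == 0 else "SUBS PC, LR"
    if (op >> 1) == 0b011111 and banked:
        return "MRS (Banked register)"
    if op == 0b0111110:
        return "MRS, application level"
    if op == 0b0111111:
        return "MRS, system level"
    if op1 == 0b000 and op == 0b1111110:
        return "HVC"
    if op1 == 0b000 and op == 0b1111111:
        return "SMC"
    if op1 == 0b010 and op == 0b1111111:
        return "UDF"
    return "UNDEFINED"


def decode_cps_or_hint(hw2: int) -> str:
    """Table A6-14 lookup on the second halfword."""
    op1 = (hw2 >> 8) & 0b111   # bits[10:8]
    op2 = hw2 & 0xFF           # bits[7:0]
    if op1 != 0b000:
        return "CPS"
    if (op2 >> 4) == 0b1111:
        return "DBG"
    return {0: "NOP", 1: "YIELD", 2: "WFE", 3: "WFI", 4: "SEV"}.get(op2, "unallocated hint")
```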


Miscellaneous control instructions

The encoding of some 32-bit Thumb miscellaneous control instructions is:

First halfword: bits[15:4] = 1111 0011 1011. Second halfword: bits[15:14] = 10; bit[12] = 0; op = bits[7:4].

Table A6-15 shows the allocation of encodings in this space. Other encodings in this space are UNDEFINED in ARMv7. They are UNPREDICTABLE in ARMv6T2.

Table A6-15 Miscellaneous control instructions

op Instruction See Variant

0000 Exit ThumbEE state (a) ENTERX, LEAVEX on page A9-1116 ThumbEE

0001 Enter ThumbEE state ENTERX, LEAVEX on page A9-1116 ThumbEE

0010 Clear-Exclusive CLREX on page A8-360 v7

0100 Data Synchronization Barrier DSB on page A8-380 v7

0101 Data Memory Barrier DMB on page A8-378 v7

0110 Instruction Synchronization Barrier ISB on page A8-389 v7

a. This instruction is a NOP in Thumb state.

A6.3.5 Load/store multiple

The encoding of 32-bit Thumb load/store multiple instructions is:

First halfword: bits[15:9] = 1110100; op = bits[8:7]; bit[6] = 0; W = bit[5]; L = bit[4]; Rn = bits[3:0].

Table A6-16 shows the allocation of encodings in this space. In the W:Rn column, 11101 means that W is 1 and Rn is 0b1101, the SP.

These encodings are all available in ARMv6T2 and above.

Table A6-16 Load/store multiple instructions

op L W:Rn Instruction See

00 0 - Store Return State SRS (Thumb) on page B9-2004

00 1 - Return From Exception RFE on page B9-2000

01 0 - Store Multiple (Increment After, Empty Ascending) STM (STMIA, STMEA) on page A8-664

01 1 not 11101 Load Multiple (Increment After, Full Descending) LDM/LDMIA/LDMFD (Thumb) on page A8-396

01 1 11101 Pop Multiple Registers from the stack POP (Thumb) on page A8-534

10 0 not 11101 Store Multiple (Decrement Before, Full Descending) STMDB (STMFD) on page A8-668

10 0 11101 Push Multiple Registers to the stack PUSH on page A8-538

10 1 - Load Multiple (Decrement Before, Empty Ascending) LDMDB/LDMEA on page A8-402

11 0 - Store Return State SRS (Thumb) on page B9-2004

11 1 - Return From Exception RFE on page B9-2000
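Below is a sketch of the Table A6-16 lookup in Python, using a hypothetical function name. It shows how W:Rn == 11101, that is, writeback with Rn as the SP, turns LDM into POP and STMDB into PUSH. Only the first halfword is needed.

```python
def decode_ldst_multiple(hw1: int) -> str:
    """Table A6-16 lookup on the first halfword of a 32-bit load/store multiple."""
    op = (hw1 >> 7) & 0b11   # bits[8:7]
    w = (hw1 >> 5) & 1       # bit[5]
    load = (hw1 >> 4) & 1    # L, bit[4]
    rn = hw1 & 0xF           # bits[3:0]
    sp_writeback = w == 1 and rn == 0b1101   # W:Rn == 11101

    if op in (0b00, 0b11):
        return "RFE" if load else "SRS"
    if op == 0b01:
        if not load:
            return "STM"
        return "POP" if sp_writeback else "LDM"
    # op == 0b10
    if load:
        return "LDMDB"
    return "PUSH" if sp_writeback else "STMDB"
```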


A6.3.6 Load/store dual, load/store exclusive, table branch

The encoding of 32-bit Thumb load/store dual, load/store exclusive, and table branch instructions is:

First halfword: bits[15:9] = 1110100; op1 = bits[8:7]; bit[6] = 1; op2 = bits[5:4]; Rn = bits[3:0]. Second halfword: op3 = bits[7:4].

Table A6-17 shows the allocation of encodings in this space. Other encodings in this space are UNDEFINED.

Table A6-17 Load/store double or exclusive, table branch

op1 op2 op3 Rn Instruction See Variant

00 00 - - Store Register Exclusive STREX on page A8-690 v6T2

01 - - Load Register Exclusive LDREX on page A8-432 v6T2

0x 10 - - Store Register Dual STRD (immediate) on page A8-686 v6T2

1x x0 - -

0x 11 - not 1111 Load Register Dual (immediate) LDRD (immediate) on page A8-426 v6T2

1x x1 - not 1111

0x 11 - 1111 Load Register Dual (literal) LDRD (literal) on page A8-428 v6T2

1x x1 - 1111

01 00 0100 - Store Register Exclusive Byte STREXB on page A8-692 v7

0101 - Store Register Exclusive Halfword STREXH on page A8-696 v7

0111 - Store Register Exclusive Doubleword STREXD on page A8-694 v7

01 0000 - Table Branch Byte TBB, TBH on page A8-736 v6T2

0001 - Table Branch Halfword TBB, TBH on page A8-736 v6T2

0100 - Load Register Exclusive Byte LDREXB on page A8-434 v7

0101 - Load Register Exclusive Halfword LDREXH on page A8-438 v7

0111 - Load Register Exclusive Doubleword LDREXD on page A8-436 v7
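Here is the Table A6-17 lookup as an illustrative Python sketch with a hypothetical function name. It assumes both halfwords are already known to belong to this space. The rows with op1 == 01 and op2 == 00 or 01 are the only ones that also depend on op3.

```python
def decode_dual_excl_tb(hw1: int, hw2: int) -> str:
    """Table A6-17 lookup for load/store dual, load/store exclusive, and table branch."""
    op1 = (hw1 >> 7) & 0b11   # first halfword bits[8:7]
    op2 = (hw1 >> 4) & 0b11   # first halfword bits[5:4]
    rn = hw1 & 0xF            # first halfword bits[3:0]
    op3 = (hw2 >> 4) & 0xF    # second halfword bits[7:4]

    if op1 == 0b00 and op2 == 0b00:
        return "STREX"
    if op1 == 0b00 and op2 == 0b01:
        return "LDREX"
    # Store Register Dual: op1 == 0x with op2 == 10, or op1 == 1x with op2 == x0
    if (op1 >> 1 == 0 and op2 == 0b10) or (op1 >> 1 == 1 and op2 & 1 == 0):
        return "STRD (immediate)"
    # Load Register Dual: op1 == 0x with op2 == 11, or op1 == 1x with op2 == x1
    if (op1 >> 1 == 0 and op2 == 0b11) or (op1 >> 1 == 1 and op2 & 1 == 1):
        return "LDRD (literal)" if rn == 0xF else "LDRD (immediate)"
    if op1 == 0b01 and op2 == 0b00:
        return {0b0100: "STREXB", 0b0101: "STREXH", 0b0111: "STREXD"}.get(op3, "UNDEFINED")
    if op1 == 0b01 and op2 == 0b01:
        return {0b0000: "TBB", 0b0001: "TBH", 0b0100: "LDREXB",
                0b0101: "LDREXH", 0b0111: "LDREXD"}.get(op3, "UNDEFINED")
    return "UNDEFINED"
```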


A6.3.7 Load word

The encoding of 32-bit Thumb load word instructions is:

First halfword: bits[15:9] = 1111100; op1 = bits[8:7]; bits[6:4] = 101; Rn = bits[3:0]. Second halfword: op2 = bits[11:6].

Table A6-18 shows the allocation of encodings in this space. Other encodings in this space are UNDEFINED.

These encodings are all available in ARMv6T2 and above.

Table A6-18 Load word

op1 op2 Rn Instruction See

00 000000 not 1111 Load Register LDR (register, Thumb) on page A8-412

00 1xx1xx not 1111 Load Register LDR (immediate, Thumb) on page A8-406

00 1100xx not 1111 Load Register LDR (immediate, Thumb) on page A8-406

01 - not 1111 Load Register LDR (immediate, Thumb) on page A8-406

00 1110xx not 1111 Load Register Unprivileged LDRT on page A8-466

0x - 1111 Load Register LDR (literal) on page A8-410
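The following illustrative Python sketch, with hypothetical names, shows how Table A6-18 uses the op2 patterns in the second halfword to separate the register, immediate, and unprivileged forms when op1 is 00.

```python
def matches(value: int, pattern: str) -> bool:
    """True if value matches pattern, written MSB first using '0', '1', or 'x'."""
    for bit_index, ch in enumerate(reversed(pattern)):
        if ch != "x" and int(ch) != (value >> bit_index) & 1:
            return False
    return True


def decode_load_word(hw1: int, hw2: int) -> str:
    """Table A6-18 lookup for a 32-bit load word instruction."""
    op1 = (hw1 >> 7) & 0b11    # first halfword bits[8:7]
    rn = hw1 & 0xF             # first halfword bits[3:0]
    op2 = (hw2 >> 6) & 0x3F    # second halfword bits[11:6]

    if op1 >> 1 == 0 and rn == 0xF:
        return "LDR (literal)"
    if op1 == 0b01:
        return "LDR (immediate, Thumb)"
    if op1 == 0b00:
        if op2 == 0b000000:
            return "LDR (register, Thumb)"
        if matches(op2, "1xx1xx") or matches(op2, "1100xx"):
            return "LDR (immediate, Thumb)"
        if matches(op2, "1110xx"):
            return "LDRT"
    return "UNDEFINED"
```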


A6.3.8 Load halfword, memory hints

The encoding of 32-bit Thumb load halfword instructions and some memory hint instructions is:

First halfword: bits[15:9] = 1111100; op1 = bits[8:7]; bits[6:4] = 011; Rn = bits[3:0]. Second halfword: Rt = bits[15:12]; op2 = bits[11:6].

Table A6-19 shows the allocation of encodings in this space. Other encodings in this space are UNDEFINED.

Except where otherwise noted, these encodings are available in ARMv6T2 and above.

Table A6-19 Load halfword, preload

op1 op2 Rn Rt Instruction See

0x - 1111 not 1111 Load Register Halfword LDRH (literal) on page A8-444

0x - 1111 1111 Preload Data PLD (literal) on page A8-526

00 1xx1xx not 1111 - Load Register Halfword LDRH (immediate, Thumb) on page A8-440

00 1100xx not 1111 not 1111 Load Register Halfword LDRH (immediate, Thumb) on page A8-440

01 - not 1111 not 1111 Load Register Halfword LDRH (immediate, Thumb) on page A8-440

00 000000 not 1111 not 1111 Load Register Halfword LDRH (register) on page A8-446

00 1110xx not 1111 - Load Register Halfword Unprivileged LDRHT on page A8-448

00 000000 not 1111 1111 Preload Data with intent to Write (a) PLD, PLDW (register) on page A8-528

00 1100xx not 1111 1111 Preload Data with intent to Write (a) PLD, PLDW (immediate) on page A8-524

01 - not 1111 1111 Preload Data with intent to Write (a) PLD, PLDW (immediate) on page A8-524

10 1xx1xx not 1111 - Load Register Signed Halfword LDRSH (immediate) on page A8-458

10 1100xx not 1111 not 1111 Load Register Signed Halfword LDRSH (immediate) on page A8-458

11 - not 1111 not 1111 Load Register Signed Halfword LDRSH (immediate) on page A8-458

1x - 1111 not 1111 Load Register Signed Halfword LDRSH (literal) on page A8-460

10 000000 not 1111 not 1111 Load Register Signed Halfword LDRSH (register) on page A8-462

10 1110xx not 1111 - Load Register Signed Halfword Unprivileged LDRSHT on page A8-464

10 000000 not 1111 1111 Unallocated memory hint (treat as NOP) -

10 1100xx not 1111 1111 Unallocated memory hint (treat as NOP) -

1x - 1111 1111 Unallocated memory hint (treat as NOP) -

11 - not 1111 1111 Unallocated memory hint (treat as NOP) -

a. Available in ARMv7 with the Multiprocessing Extensions. In an ARMv7 implementation that does not include the Multiprocessing Extensions, and in ARMv6T2, these are unallocated memory hints that are treated as NOPs.


A6.3.9 Load byte, memory hints

The encoding of 32-bit Thumb load byte instructions and some memory hint instructions is:

First halfword: bits[15:9] = 1111100; op1 = bits[8:7]; bits[6:4] = 001; Rn = bits[3:0]. Second halfword: Rt = bits[15:12]; op2 = bits[11:6].

Table A6-20 shows the allocation of encodings in this space. Other encodings in this space are UNDEFINED.

These encodings are all available in ARMv6T2 and above.

Table A6-20 Load byte, memory hints

op1 op2 Rn Rt Instruction See

00 000000 not 1111 not 1111 Load Register Byte LDRB (register) on page A8-422

1111 Preload Data PLD, PLDW (register) on page A8-528

0x - 1111 not 1111 Load Register Byte LDRB (literal) on page A8-420

1111 Preload Data PLD (literal) on page A8-526

00 1xx1xx not 1111 - Load Register Byte LDRB (immediate, Thumb) on page A8-416

1100xx not 1111 not 1111 Load Register Byte LDRB (immediate, Thumb) on page A8-416

1111 Preload Data PLD, PLDW (immediate) on page A8-524

1110xx not 1111 - Load Register Byte Unprivileged LDRBT on page A8-424

01 - not 1111 not 1111 Load Register Byte LDRB (immediate, Thumb) on page A8-416

1111 Preload Data PLD, PLDW (immediate) on page A8-524

10 000000 not 1111 not 1111 Load Register Signed Byte LDRSB (register) on page A8-454

1111 Preload Instruction PLI (register) on page A8-532

1x - 1111 not 1111 Load Register Signed Byte LDRSB (literal) on page A8-452

1111 Preload Instruction PLI (immediate, literal) on page A8-530

10 1xx1xx not 1111 - Load Register Signed Byte LDRSB (immediate) on page A8-450

1100xx not 1111 not 1111 Load Register Signed Byte LDRSB (immediate) on page A8-450

1111 Preload Instruction PLI (immediate, literal) on page A8-530

1110xx not 1111 - Load Register Signed Byte Unprivileged LDRSBT on page A8-456

11 - not 1111 not 1111 Load Register Signed Byte LDRSB (immediate) on page A8-450

1111 Preload Instruction PLI (immediate, literal) on page A8-530


A6.3.10 Store single data item

The encoding of 32-bit Thumb store single data item instructions is:

First halfword: bits[15:8] = 11111000; op1 = bits[7:5]; bit[4] = 0. Second halfword: op2 = bits[11:6].

Table A6-21 shows the allocation of encodings in this space. Other encodings in this space are UNDEFINED.

These encodings are all available in ARMv6T2 and above.

Table A6-21 Store single data item

op1 op2 Instruction See

000 1xx1xx Store Register Byte STRB (immediate, Thumb) on page A8-678

000 1100xx Store Register Byte STRB (immediate, Thumb) on page A8-678

100 - Store Register Byte STRB (immediate, Thumb) on page A8-678

000 000000 Store Register Byte STRB (register) on page A8-682

000 1110xx Store Register Byte Unprivileged STRBT on page A8-684

001 1xx1xx Store Register Halfword STRH (immediate, Thumb) on page A8-698

001 1100xx Store Register Halfword STRH (immediate, Thumb) on page A8-698

101 - Store Register Halfword STRH (immediate, Thumb) on page A8-698

001 000000 Store Register Halfword STRH (register) on page A8-702

001 1110xx Store Register Halfword Unprivileged STRHT on page A8-704

010 1xx1xx Store Register STR (immediate, Thumb) on page A8-672

010 1100xx Store Register STR (immediate, Thumb) on page A8-672

110 - Store Register STR (immediate, Thumb) on page A8-672

010 000000 Store Register STR (register) on page A8-676

010 1110xx Store Register Unprivileged STRT on page A8-706


A6.3.11 Data-processing (shifted register)

The encoding of 32-bit Thumb data-processing (shifted register) instructions is:

First halfword: bits[15:9] = 1110101; op = bits[8:5]; S = bit[4]; Rn = bits[3:0]. Second halfword: Rd = bits[11:8].

Table A6-22 shows the allocation of encodings in this space. Other encodings in this space are UNDEFINED.

These encodings are all available in ARMv6T2 and above.

Table A6-22 Data-processing (shifted register)

op Rn Rd:S Instruction See

0000 - not 11111 Bitwise AND AND (register) on page A8-326

0000 - 11111 Test TST (register) on page A8-746

0001 - - Bitwise Bit Clear BIC (register) on page A8-342

0010 not 1111 - Bitwise OR ORR (register) on page A8-518

0010 1111 - - Move register and immediate shifts on page A6-244

0011 not 1111 - Bitwise OR NOT ORN (register) on page A8-514

0011 1111 - Bitwise NOT MVN (register) on page A8-506

0100 - not 11111 Bitwise Exclusive OR EOR (register) on page A8-384

0100 - 11111 Test Equivalence TEQ (register) on page A8-740

0110 - - Pack Halfword PKH on page A8-522

1000 - not 11111 Add ADD (register, Thumb) on page A8-310

1000 - 11111 Compare Negative CMN (register) on page A8-366

1010 - - Add with Carry ADC (register) on page A8-302

1011 - - Subtract with Carry SBC (register) on page A8-594

1101 - not 11111 Subtract SUB (register) on page A8-712

1101 - 11111 Compare CMP (register) on page A8-372

1110 - - Reverse Subtract RSB (register) on page A8-576


Move register and immediate shifts

The encoding of the 32-bit Thumb move register and immediate shift instructions is:

First halfword: bits[15:5] = 11101010010; bits[3:0] = 1111. Second halfword: imm3 = bits[14:12]; imm2 = bits[7:6]; type = bits[5:4].

Table A6-23 shows the allocation of encodings in this space.

These encodings are all available in ARMv6T2 and above.

Table A6-23 Move register and immediate shifts

type imm3:imm2 Instruction See

00 00000 Move MOV (register, Thumb) on page A8-486

00 not 00000 Logical Shift Left LSL (immediate) on page A8-468

01 - Logical Shift Right LSR (immediate) on page A8-472

10 - Arithmetic Shift Right ASR (immediate) on page A8-330

11 00000 Rotate Right with Extend RRX on page A8-572

11 not 00000 Rotate Right ROR (immediate) on page A8-568
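The following hypothetical Python sketch, decode_move_shift, looks up Table A6-23 from the second halfword. It shows that type == 00 and type == 11 each depend on whether imm3:imm2 is zero. It returns the mnemonic together with the raw imm3:imm2 value and does not interpret that value as a shift amount.

```python
def decode_move_shift(hw2: int) -> tuple[str, int]:
    """Table A6-23 lookup; returns (mnemonic, raw imm3:imm2)."""
    imm3 = (hw2 >> 12) & 0b111       # bits[14:12]
    imm2 = (hw2 >> 6) & 0b11         # bits[7:6]
    shift_type = (hw2 >> 4) & 0b11   # type, bits[5:4]
    imm5 = (imm3 << 2) | imm2        # imm3:imm2

    if shift_type == 0b00:
        return ("MOV" if imm5 == 0 else "LSL", imm5)
    if shift_type == 0b01:
        return ("LSR", imm5)
    if shift_type == 0b10:
        return ("ASR", imm5)
    return ("RRX" if imm5 == 0 else "ROR", imm5)
```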


A6.3.12 Data-processing (register)

The encoding of 32-bit Thumb data-processing (register) instructions is:

First halfword: bits[15:8] = 11111010; op1 = bits[7:4]; Rn = bits[3:0]. Second halfword: bits[15:12] = 1111; op2 = bits[7:4].

If, in the second halfword of the instruction, bits[15:12] != 0b1111, the instruction is UNDEFINED.

Table A6-24 shows the allocation of encodings in this space. Other encodings in this space are UNDEFINED.

These encodings are all available in ARMv6T2 and above.

Table A6-24 Data-processing (register)

op1 op2 Rn Instruction See

000x 0000 - Logical Shift Left LSL (register) on page A8-470

001x 0000 - Logical Shift Right LSR (register) on page A8-474

010x 0000 - Arithmetic Shift Right ASR (register) on page A8-332

011x 0000 - Rotate Right ROR (register) on page A8-570

0000 1xxx not 1111 Signed Extend and Add Halfword SXTAH on page A8-728

1111 Signed Extend Halfword SXTH on page A8-734

0001 1xxx not 1111 Unsigned Extend and Add Halfword UXTAH on page A8-810

1111 Unsigned Extend Halfword UXTH on page A8-816

0010 1xxx not 1111 Signed Extend and Add Byte 16-bit SXTAB16 on page A8-726

1111 Signed Extend Byte 16-bit SXTB16 on page A8-732

0011 1xxx not 1111 Unsigned Extend and Add Byte 16-bit UXTAB16 on page A8-808

1111 Unsigned Extend Byte 16-bit UXTB16 on page A8-814

0100 1xxx not 1111 Signed Extend and Add Byte SXTAB on page A8-724

1111 Signed Extend Byte SXTB on page A8-730

0101 1xxx not 1111 Unsigned Extend and Add Byte UXTAB on page A8-806

1111 Unsigned Extend Byte UXTB on page A8-812

1xxx 00xx - - Parallel addition and subtraction, signed on page A6-246

1xxx 01xx - - Parallel addition and subtraction, unsigned on page A6-247

10xx 10xx - - Miscellaneous operations on page A6-248


A6.3.13 Parallel addition and subtraction, signed

The encoding of 32-bit Thumb signed parallel addition and subtraction instructions is:

First halfword: bits[15:7] = 111110101; op1 = bits[6:4]. Second halfword: bits[15:12] = 1111; bits[7:6] = 00; op2 = bits[5:4].

If, in the second halfword of the instruction, bits[15:12] != 0b1111, the instruction is UNDEFINED.

Table A6-25 shows the allocation of encodings in this space. Other encodings in this space are UNDEFINED. These encodings are all available in ARMv6T2 and above.

Table A6-25 Signed parallel addition and subtraction instructions

op1 op2 Instruction See

001 00 Add 16-bit SADD16 on page A8-586

010 00 Add and Subtract with Exchange, 16-bit SASX on page A8-590

110 00 Subtract and Add with Exchange, 16-bit SSAX on page A8-656

101 00 Subtract 16-bit SSUB16 on page A8-658

000 00 Add 8-bit SADD8 on page A8-588

100 00 Subtract 8-bit SSUB8 on page A8-660

Saturating instructions

001 01 Saturating Add 16-bit QADD16 on page A8-542

010 01 Saturating Add and Subtract with Exchange, 16-bit QASX on page A8-546

110 01 Saturating Subtract and Add with Exchange, 16-bit QSAX on page A8-552

101 01 Saturating Subtract 16-bit QSUB16 on page A8-556

000 01 Saturating Add 8-bit QADD8 on page A8-544

100 01 Saturating Subtract 8-bit QSUB8 on page A8-558

Halving instructions

001 10 Halving Add 16-bit SHADD16 on page A8-608

010 10 Halving Add and Subtract with Exchange, 16-bit SHASX on page A8-612

110 10 Halving Subtract and Add with Exchange, 16-bit SHSAX on page A8-614

101 10 Halving Subtract 16-bit SHSUB16 on page A8-616

000 10 Halving Add 8-bit SHADD8 on page A8-610

100 10 Halving Subtract 8-bit SHSUB8 on page A8-618


A6.3.14 Parallel addition and subtraction, unsigned

The encoding of 32-bit Thumb unsigned parallel addition and subtraction instructions is:

First halfword: bits[15:7] = 111110101; op1 = bits[6:4]. Second halfword: bits[15:12] = 1111; bits[7:6] = 01; op2 = bits[5:4].

If, in the second halfword of the instruction, bits[15:12] != 0b1111, the instruction is UNDEFINED.

Table A6-26 shows the allocation of encodings in this space. Other encodings in this space are UNDEFINED. These encodings are all available in ARMv6T2 and above.

Table A6-26 Unsigned parallel addition and subtraction instructions

op1 op2 Instruction See

001 00 Add 16-bit UADD16 on page A8-750

010 00 Add and Subtract with Exchange, 16-bit UASX on page A8-754

110 00 Subtract and Add with Exchange, 16-bit USAX on page A8-800

101 00 Subtract 16-bit USUB16 on page A8-802

000 00 Add 8-bit UADD8 on page A8-752

100 00 Subtract 8-bit USUB8 on page A8-804

Saturating instructions

001 01 Saturating Add 16-bit UQADD16 on page A8-780

010 01 Saturating Add and Subtract with Exchange, 16-bit UQASX on page A8-784

110 01 Saturating Subtract and Add with Exchange, 16-bit UQSAX on page A8-786

101 01 Saturating Subtract 16-bit UQSUB16 on page A8-788

000 01 Saturating Add 8-bit UQADD8 on page A8-782

100 01 Saturating Subtract 8-bit UQSUB8 on page A8-790

Halving instructions

001 10 Halving Add 16-bit UHADD16 on page A8-762

010 10 Halving Add and Subtract with Exchange, 16-bit UHASX on page A8-766

110 10 Halving Subtract and Add with Exchange, 16-bit UHSAX on page A8-768

101 10 Halving Subtract 16-bit UHSUB16 on page A8-770

000 10 Halving Add 8-bit UHADD8 on page A8-764

100 10 Halving Subtract 8-bit UHSUB8 on page A8-772
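Table A6-25 and Table A6-26 share their op1 column. Each mnemonic can therefore be built from a prefix, chosen by the space (signed or unsigned) and by op2, plus a suffix chosen by op1. The Python sketch below uses this observation. It is an illustration with hypothetical names, not a construction the tables themselves define.

```python
# Suffix from op1 (first halfword bits[6:4]); prefix from (unsigned?, op2).
SUFFIX_BY_OP1 = {0b001: "ADD16", 0b010: "ASX", 0b110: "SAX",
                 0b101: "SUB16", 0b000: "ADD8", 0b100: "SUB8"}
PREFIX = {(False, 0b00): "S", (False, 0b01): "Q", (False, 0b10): "SH",
          (True, 0b00): "U", (True, 0b01): "UQ", (True, 0b10): "UH"}


def decode_parallel_add_sub(hw1: int, hw2: int) -> str:
    """Table A6-25 / A6-26 lookup for 32-bit parallel addition and subtraction."""
    if (hw2 >> 12) != 0b1111:            # second halfword bits[15:12] must be 1111
        return "UNDEFINED"
    if (hw2 >> 7) & 1:
        raise ValueError("second halfword bits[7:6] must be 00 or 01 in these spaces")
    unsigned = bool((hw2 >> 6) & 1)      # bits[7:6]: 00 signed, 01 unsigned
    op1 = (hw1 >> 4) & 0b111             # first halfword bits[6:4]
    op2 = (hw2 >> 4) & 0b11              # second halfword bits[5:4]
    prefix = PREFIX.get((unsigned, op2))
    suffix = SUFFIX_BY_OP1.get(op1)
    if prefix is None or suffix is None:
        return "UNDEFINED"
    return prefix + suffix
```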


A6.3.15 Miscellaneous operations

The encoding of some 32-bit Thumb miscellaneous instructions is:

First halfword: bits[15:6] = 1111101010; op1 = bits[5:4]. Second halfword: bits[15:12] = 1111; bits[7:6] = 10; op2 = bits[5:4].

If, in the second halfword of the instruction, bits[15:12] != 0b1111, the instruction is UNDEFINED.

Table A6-27 shows the allocation of encodings in this space. Other encodings in this space are UNDEFINED. These encodings are all available in ARMv6T2 and above.

Table A6-27 Miscellaneous operations

op1 op2 Instruction See

00 00 Saturating Add QADD on page A8-540

00 01 Saturating Double and Add QDADD on page A8-548

00 10 Saturating Subtract QSUB on page A8-554

00 11 Saturating Double and Subtract QDSUB on page A8-550

01 00 Byte-Reverse Word REV on page A8-562

01 01 Byte-Reverse Packed Halfword REV16 on page A8-564

01 10 Reverse Bits RBIT on page A8-560

01 11 Byte-Reverse Signed Halfword REVSH on page A8-566

10 00 Select Bytes SEL on page A8-602

11 00 Count Leading Zeros CLZ on page A8-362


A6.3.16 Multiply, multiply accumulate, and absolute difference

The encoding of 32-bit Thumb multiply, multiply accumulate, and absolute difference instructions is:

First halfword: bits[15:7] = 111110110; op1 = bits[6:4]. Second halfword: Ra = bits[15:12]; bits[7:6] = 00; op2 = bits[5:4].

If, in the second halfword of the instruction, bits[7:6] != 0b00, the instruction is UNDEFINED.

Table A6-28 shows the allocation of encodings in this space. Other encodings in this space are UNDEFINED. These encodings are all available in ARMv6T2 and above.

Table A6-28 Multiply, multiply accumulate, and absolute difference operations

op1 op2 Ra Instruction See

000 00 not 1111 Multiply Accumulate MLA on page A8-480

1111 Multiply MUL on page A8-502

01 - Multiply and Subtract MLS on page A8-482

001 - not 1111 Signed Multiply Accumulate (Halfwords) SMLABB, SMLABT, SMLATB, SMLATT on page A8-620

1111 Signed Multiply (Halfwords) SMULBB, SMULBT, SMULTB, SMULTT on page A8-644

010 0x not 1111 Signed Multiply Accumulate Dual SMLAD on page A8-622

1111 Signed Dual Multiply Add SMUAD on page A8-642

011 0x not 1111 Signed Multiply Accumulate (Word by halfword) SMLAWB, SMLAWT on page A8-630

1111 Signed Multiply (Word by halfword) SMULWB, SMULWT on page A8-648

100 0x not 1111 Signed Multiply Subtract Dual SMLSD on page A8-632

1111 Signed Dual Multiply Subtract SMUSD on page A8-650

101 0x not 1111 Signed Most Significant Word Multiply Accumulate SMMLA on page A8-636

1111 Signed Most Significant Word Multiply SMMUL on page A8-640

110 0x - Signed Most Significant Word Multiply Subtract SMMLS on page A8-638

111 00 not 1111 Unsigned Sum of Absolute Differences, Accumulate USADA8 on page A8-794

1111 Unsigned Sum of Absolute Differences USAD8 on page A8-792
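In Table A6-28, Ra == 0b1111 selects the non-accumulating form of several instructions. The Python sketch below models that, assuming op1, op2 and Ra are already extracted and that the bits[7:6] != 0b00 check has been made separately. The names `ROWS_A6_28` and `decode_mul` are hypothetical, and `<x><y>` stands for the BB, BT, TB and TT variants listed in the table.

```python
def _match(pattern: str, value: int) -> bool:
    """True if value, written as len(pattern) bits, matches a pattern of '0', '1' and 'x'."""
    bits = format(value, f"0{len(pattern)}b")
    return all(p in ("x", b) for p, b in zip(pattern, bits))


# (op1, op2, mnemonic when Ra != 0b1111, mnemonic when Ra == 0b1111); "xx" stands for "-".
ROWS_A6_28 = [
    ("000", "00", "MLA", "MUL"),
    ("000", "01", "MLS", "MLS"),
    ("001", "xx", "SMLA<x><y>", "SMUL<x><y>"),
    ("010", "0x", "SMLAD", "SMUAD"),
    ("011", "0x", "SMLAW<y>", "SMULW<y>"),
    ("100", "0x", "SMLSD", "SMUSD"),
    ("101", "0x", "SMMLA", "SMMUL"),
    ("110", "0x", "SMMLS", "SMMLS"),
    ("111", "00", "USADA8", "USAD8"),
]


def decode_mul(op1: int, op2: int, ra: int) -> str:
    for p1, p2, accumulate, plain in ROWS_A6_28:
        if _match(p1, op1) and _match(p2, op2):
            return plain if ra == 0b1111 else accumulate
    return "UNDEFINED"


print(decode_mul(0b000, 0b00, 0b1111))  # MUL
print(decode_mul(0b000, 0b00, 0b0011))  # MLA
print(decode_mul(0b111, 0b01, 0b0000))  # UNDEFINED
```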


A6.3.17 Long multiply, long multiply accumulate, and divide

The encoding of 32-bit Thumb long multiply, long multiply accumulate, and divide instructions is:

1 1 1 1 0 1 1 1 op1 op2

Table A6-29 shows the allocation of encodings in this space. Other encodings in this space are UNDEFINED.

Table A6-29 Long multiply, long multiply accumulate, and divide operations

op1  op2   Instruction                                   See                                                Variant
000  0000  Signed Multiply Long                          SMULL on page A8-646                               v6T2
001  1111  Signed Divide                                 SDIV on page A8-600                                v7-R [a]
010  0000  Unsigned Multiply Long                        UMULL on page A8-778                               v6T2
011  1111  Unsigned Divide                               UDIV on page A8-760                                v7-R [a]
100  0000  Signed Multiply Accumulate Long               SMLAL on page A8-624                               v6T2
     10xx  Signed Multiply Accumulate Long (Halfwords)   SMLALBB, SMLALBT, SMLALTB, SMLALTT on page A8-626  v6T2
     110x  Signed Multiply Accumulate Long Dual          SMLALD on page A8-628                              v6T2
101  110x  Signed Multiply Subtract Long Dual            SMLSLD on page A8-634                              v6T2
110  0000  Unsigned Multiply Accumulate Long             UMLAL on page A8-776                               v6T2
     0110  Unsigned Multiply Accumulate Accumulate Long  UMAAL on page A8-774                               v6T2

a. Optional in some ARMv7 implementations, see ARMv7 implementation requirements and options for the divide instructions on page A4-172.
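Table A6-29 uses 'x' don't-care bits in op2, so a lookup needs pattern matching instead of exact keys. A Python sketch, assuming op1 (3 bits) and op2 (4 bits) are already extracted. The helper names are hypothetical, and the check that note a implies (whether this ARMv7 implementation includes the divide instructions at all) is deliberately left out.

```python
from typing import Optional, Tuple


def _match(pattern: str, value: int) -> bool:
    """True if value, written as len(pattern) bits, matches a pattern of '0', '1' and 'x'."""
    bits = format(value, f"0{len(pattern)}b")
    return all(p in ("x", b) for p, b in zip(pattern, bits))


# (op1, op2, mnemonic, variant) transcribed from Table A6-29.
LONG_MUL_ROWS = [
    ("000", "0000", "SMULL", "v6T2"),
    ("001", "1111", "SDIV", "v7-R"),
    ("010", "0000", "UMULL", "v6T2"),
    ("011", "1111", "UDIV", "v7-R"),
    ("100", "0000", "SMLAL", "v6T2"),
    ("100", "10xx", "SMLALBB/SMLALBT/SMLALTB/SMLALTT", "v6T2"),
    ("100", "110x", "SMLALD", "v6T2"),
    ("101", "110x", "SMLSLD", "v6T2"),
    ("110", "0000", "UMLAL", "v6T2"),
    ("110", "0110", "UMAAL", "v6T2"),
]


def decode_long_mul(op1: int, op2: int) -> Tuple[str, Optional[str]]:
    """Return (mnemonic, variant), or ("UNDEFINED", None) for unallocated encodings."""
    for p1, p2, name, variant in LONG_MUL_ROWS:
        if _match(p1, op1) and _match(p2, op2):
            return name, variant
    return "UNDEFINED", None


print(decode_long_mul(0b100, 0b1011))  # ('SMLALBB/SMLALBT/SMLALTB/SMLALTT', 'v6T2')
print(decode_long_mul(0b001, 0b0000))  # ('UNDEFINED', None)
```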


A6.3.18 Coprocessor, Advanced SIMD, and Floating-point instructions

The encoding of 32-bit Thumb coprocessor instructions is:

1 1 1 1 op1 Rn coproc op

Table A6-30 shows the allocation of encodings in this space. These encodings are all available in ARMv6T2 and above.

For more information about specific coprocessors see Coprocessor support on page A2-94.

Table A6-30 Coprocessor, Advanced SIMD, and Floating-point instructions

coproc    op1                 op  Rn        Instructions                                     See
-         00000x              -   -         UNDEFINED                                        -
          11xxxx              -   -         Advanced SIMD                                    Advanced SIMD data-processing instructions on page A7-261
not 101x  0xxxx0, not 000x0x  -   -         Store Coprocessor                                STC, STC2 on page A8-662
          0xxxx1, not 000x0x  -   not 1111  Load Coprocessor (immediate)                     LDC, LDC2 (immediate) on page A8-392
                                  1111      Load Coprocessor (literal)                       LDC, LDC2 (literal) on page A8-394
          000100              -   -         Move to Coprocessor from two ARM core registers  MCRR, MCRR2 on page A8-478
          000101              -   -         Move to two ARM core registers from Coprocessor  MRRC, MRRC2 on page A8-494
          10xxxx              0   -         Coprocessor data operations                      CDP, CDP2 on page A8-358
          10xxx0              1   -         Move to Coprocessor from ARM core register       MCR, MCR2 on page A8-476
          10xxx1              1   -         Move to ARM core register from Coprocessor       MRC, MRC2 on page A8-492
101x      0xxxxx, not 000x0x  -   -         Advanced SIMD, Floating-point                    Extension register load/store instructions on page A7-274
          00010x              -   -         Advanced SIMD, Floating-point                    64-bit transfers between ARM core and extension registers on page A7-279
          10xxxx              0   -         Floating-point data processing                   Floating-point data-processing instructions on page A7-272
          10xxxx              1   -         Advanced SIMD, Floating-point                    8, 16, and 32-bit transfer between ARM core and extension registers on page A7-278
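Several rows of Table A6-30 carry a "not 000x0x" or "not 1111" condition, so a decoder has to test the rows in a fixed order. The Python sketch below walks the rows as listed, assuming coproc, op1, op and Rn are already extracted as integers. `decode_coproc` and `_match` are hypothetical helpers, and they return the row's description, not a full instruction decode.

```python
def _match(pattern: str, value: int) -> bool:
    """True if value, written as len(pattern) bits, matches a pattern of '0', '1' and 'x'."""
    bits = format(value, f"0{len(pattern)}b")
    return all(p in ("x", b) for p, b in zip(pattern, bits))


def decode_coproc(coproc: int, op1: int, op: int, rn: int) -> str:
    """Classify an encoding following the rows of Table A6-30.

    coproc: 4 bits, op1: 6 bits, op: 1 bit, rn: 4 bits (all assumed pre-extracted).
    """
    if _match("00000x", op1):
        return "UNDEFINED"
    if _match("11xxxx", op1):
        return "Advanced SIMD data-processing"
    excluded = _match("000x0x", op1)  # the "not 000x0x" condition
    if not _match("101x", coproc):
        if _match("0xxxx0", op1) and not excluded:
            return "STC, STC2"
        if _match("0xxxx1", op1) and not excluded:
            return "LDC, LDC2 (literal)" if rn == 0b1111 else "LDC, LDC2 (immediate)"
        if op1 == 0b000100:
            return "MCRR, MCRR2"
        if op1 == 0b000101:
            return "MRRC, MRRC2"
        if _match("10xxxx", op1) and op == 0:
            return "CDP, CDP2"
        if _match("10xxx0", op1) and op == 1:
            return "MCR, MCR2"
        if _match("10xxx1", op1) and op == 1:
            return "MRC, MRC2"
    else:
        if _match("0xxxxx", op1) and not excluded:
            return "Extension register load/store"
        if _match("00010x", op1):
            return "64-bit transfers between ARM core and extension registers"
        if _match("10xxxx", op1):
            if op == 0:
                return "Floating-point data-processing"
            return "8, 16, and 32-bit transfer between ARM core and extension registers"
    return "UNDEFINED"  # defensive: every listed row returns above


print(decode_coproc(0b1111, 0b100001, 1, 0))  # MRC, MRC2
print(decode_coproc(0b1011, 0b000100, 0, 0))  # 64-bit transfers between ARM core and extension registers
```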


Chapter A7

Advanced SIMD and Floating-point Instruction Encoding

This chapter gives an overview of the Advanced SIMD and Floating-point (VFP) instruction sets. It contains the following sections:

• Overview on page A7-254
• Advanced SIMD and Floating-point instruction syntax on page A7-255
• Register encoding on page A7-259
• Advanced SIMD data-processing instructions on page A7-261
• Floating-point data-processing instructions on page A7-272
• Extension register load/store instructions on page A7-274
• Advanced SIMD element or structure load/store instructions on page A7-275
• 8, 16, and 32-bit transfer between ARM core and extension registers on page A7-278
• 64-bit transfers between ARM core and extension registers on page A7-279.

Note

• The Advanced SIMD architecture extension, its associated implementations, and supporting software, are commonly referred to as NEON™ technology.
• In the decode tables in this chapter, an entry of - for a field value means the value of the field does not affect the decoding.

A7 Advanced SIMD and Floating-point Instruction Encoding


A7.1 Overview

All Advanced SIMD and Floating-point instructions are available in both ARM state and Thumb state.

A7.1.1 Advanced SIMD

The following sections describe the classes of instruction in the Advanced SIMD Extension:

• Advanced SIMD data-processing instructions on page A7-261
• Advanced SIMD element or structure load/store instructions on page A7-275
• Extension register load/store instructions on page A7-274
• 8, 16, and 32-bit transfer between ARM core and extension registers on page A7-278
• 64-bit transfers between ARM core and extension registers on page A7-279.

A7.1.2 Floating-point

The following sections describe the classes of instruction in the Floating-point Extension:

• Extension register load/store instructions on page A7-274
• 8, 16, and 32-bit transfer between ARM core and extension registers on page A7-278
• 64-bit transfers between ARM core and extension registers on page A7-279
• Floating-point data-processing instructions on page A7-272.


A7.2 Advanced SIMD and Floating-point instruction syntax

Advanced SIMD and Floating-point (VFP) instructions use the general conventions of the ARM instruction set.

Advanced SIMD and Floating-point data-processing instructions use the following general format:

V{<modifier>}<operation>{<shape>}{<c>}{<q>}{.<dt>} {<dest>,} <src1>, <src2>

All Advanced SIMD and Floating-point instructions begin with a V. This distinguishes Advanced SIMD vector and Floating-point instructions from ARM scalar instructions.

The main operation is specified in the <operation> field. It is usually a three-letter mnemonic that is the same as, or similar to, the corresponding scalar integer instruction.

The <c> and <q> fields are standard assembler syntax fields. For details see Standard assembler syntax fields on page A8-287.

A7.2.1 Advanced SIMD instruction modifiers

The <modifier> field provides additional variants of some instructions. Table A7-1 provides definitions of the modifiers. Modifiers are not available for every instruction.

A7.2.2 Advanced SIMD operand shapes

The <shape> field provides additional variants of some instructions. Table A7-2 provides definitions of the shapes. Operand shapes are not available for every instruction.

Note

• Some assemblers support a Q shape specifier that requires all operands to be Q registers, for example VADDQ.S32 q0, q1, q2. This is not standard UAL, and ARM recommends that programmers do not use a Q shape specifier.
• A disassembler must not generate any shape specifier not shown in Table A7-2.

Table A7-1 Advanced SIMD instruction modifiers

<modifier>  Meaning
Q           The operation uses saturating arithmetic.
R           The operation performs rounding.
D           The operation doubles the result (before accumulation, if any).
H           The operation halves the result.
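To make the Q, R and H modifiers of Table A7-1 concrete, the Python sketch below applies plain, saturating (Q), halving (H) and rounding-halving (R with H) addition to signed 8-bit lanes, in the spirit of VADD, VQADD, VHADD and VRHADD. It is an illustrative model on Python lists, not ARM pseudocode, and the function names are hypothetical.

```python
S8_MIN, S8_MAX = -128, 127


def wrap_s8(x: int) -> int:
    """Keep the low 8 bits and reinterpret them as signed (plain modular addition)."""
    x &= 0xFF
    return x - 0x100 if x & 0x80 else x


def vadd_s8(a, b):    # no modifier: the sum wraps
    return [wrap_s8(x + y) for x, y in zip(a, b)]


def vqadd_s8(a, b):   # Q: saturate to the signed 8-bit range instead of wrapping
    return [max(S8_MIN, min(S8_MAX, x + y)) for x, y in zip(a, b)]


def vhadd_s8(a, b):   # H: halve the full-precision sum (arithmetic shift, truncates)
    return [(x + y) >> 1 for x, y in zip(a, b)]


def vrhadd_s8(a, b):  # R with H: add a rounding constant before halving
    return [(x + y + 1) >> 1 for x, y in zip(a, b)]


a = [100, -100, 3, -3]
b = [100, -100, 0, 0]
print(vadd_s8(a, b))    # [-56, 56, 3, -3]
print(vqadd_s8(a, b))   # [127, -128, 3, -3]
print(vhadd_s8(a, b))   # [100, -100, 1, -2]
print(vrhadd_s8(a, b))  # [100, -100, 2, -1]
```

The halving forms never overflow because the sum is formed at full precision before the shift, which is why they need no saturation.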

Table A7-2 Advanced SIMD operand shapes

<shape>  Meaning                                                                              Typical register shape
(none)   The operands and result are all the same width.                                      Dd, Dn, Dm or Qd, Qn, Qm
L        Long operation - result is twice the width of both operands                          Qd, Dn, Dm
N        Narrow operation - result is half the width of both operands                         Dd, Qn, Qm
W        Wide operation - result and first operand are twice the width of the second operand  Qd, Qn, Dm


A7.2.3 Data type specifiers

The <dt> field normally contains one data type specifier. Unless the assembler syntax description for the instruction indicates otherwise, this indicates the data type contained in:

• the second operand, if any
• the operand, if there is no second operand
• the result, if there are no operand registers.

The data types of the other operand and result are implied by the <dt> field combined with the instruction shape. For information about data type formats see Data types supported by the Advanced SIMD Extension on page A2-59.

In the instruction syntax descriptions in Chapter A8 Instruction Descriptions, the <dt> field is usually specified as a single field. However, where more convenient, it is sometimes specified as a concatenation of two fields, <type><size>.

Syntax flexibility

There is some flexibility in the data type specifier syntax:

• Software can specify three data types, specifying the result and both operand data types. For example, VSUBW.I16.I16.S8 Q3, Q5, D0 instead of VSUBW.S8 Q3, Q5, D0.
• Software can specify two data types, specifying the data types of the two operands. The data type of the result is implied by the instruction shape. For example, VSUBW.I16.S8 Q3, Q5, D0 instead of VSUBW.S8 Q3, Q5, D0.
• Software can specify two data types, specifying the data types of the single operand and the result. For example, VMOVN.I16.I32 D0, Q1 instead of VMOVN.I32 D0, Q1.
• Where an instruction requires a less specific data type, software can instead specify a more specific type, as shown in Table A7-3.
• Where an instruction does not require a data type, software can provide one.
• The F32 data type can be abbreviated to F.
• The F64 data type can be abbreviated to D.

In all cases, if software provides additional information, the additional information must match the instruction shape. Disassembly does not regenerate this additional information.

Table A7-3 Data type specification flexibility

Specified data type  Permitted more specific data types
None                 Any
.I<size>             .S<size> .U<size>
.8                   .I8 .S8 .U8 .P8
.16                  .I16 .S16 .U16 .P16 .F16
.32                  .I32 .S32 .U32 .F32
.64                  .I64 .S64 .U64 .F64
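Table A7-3 can be read as a permission check that an assembler performs. The Python sketch below transcribes the table rows; the function name `is_permitted` and the convention of passing type names without the leading dot are assumptions made for illustration.

```python
from typing import Optional

# Rows of Table A7-3 keyed by a bare <size> specifier.
SIZED = {
    "8": {"I8", "S8", "U8", "P8"},
    "16": {"I16", "S16", "U16", "P16", "F16"},
    "32": {"I32", "S32", "U32", "F32"},
    "64": {"I64", "S64", "U64", "F64"},
}


def is_permitted(specified: Optional[str], written: str) -> bool:
    """Can `written` be used where the instruction specifies `specified`?

    specified is None (no data type), "I<size>" or "<size>"; names omit the leading dot.
    """
    if specified is None:          # row "None": any data type may be supplied
        return True
    if written == specified:       # the specified type itself is always accepted
        return True
    if specified.startswith("I"):  # row ".I<size>": .S<size> or .U<size>
        size = specified[1:]
        return written in {"S" + size, "U" + size}
    return written in SIZED.get(specified, set())


print(is_permitted("I16", "S16"))  # True
print(is_permitted("I16", "F16"))  # False
print(is_permitted("32", "F32"))   # True
```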


A7.2.4 Register specifiers

The <dest>, <src1>, and <src2> fields contain register specifiers, or in some cases scalar specifiers or register lists. Table A7-4 shows the register and scalar specifier formats that appear in the instruction descriptions.

If <dest> is omitted, it is the same as <src1>.

Table A7-4 Advanced SIMD and Floating-point register specifier formats

<specifier>  Usual meaning [a]                                                  Used in
<Qd>         A quadword destination register for the result vector.             Advanced SIMD
<Qn>         A quadword source register for the first operand vector.           Advanced SIMD
<Qm>         A quadword source register for the second operand vector.          Advanced SIMD
<Dd>         A doubleword destination register for the result vector.           Both
<Dn>         A doubleword source register for the first operand vector.         Both
<Dm>         A doubleword source register for the second operand vector.        Both
<Sd>         A singleword destination register for the result vector.           Floating-point
<Sn>         A singleword source register for the first operand vector.         Floating-point
<Sm>         A singleword source register for the second operand vector.        Floating-point
<Dd[x]>      A destination scalar for the result. Element x of vector <Dd>.     Advanced SIMD
<Dn[x]>      A source scalar for the first operand. Element x of vector <Dn>.   Both [b]
<Dm[x]>      A source scalar for the second operand. Element x of vector <Dm>.  Advanced SIMD
<Rt>         An ARM core register, used for a source or destination address.    Both
<Rt2>        An ARM core register, used for a source or destination address.    Both
<Rn>         An ARM core register, used as a load or store base address.        Both
<Rm>         An ARM core register, used as a post-indexed address source.       Both

a. In some instructions the roles of registers are different.
b. In the Floating-point Extension, <Dn[x]> is used only in VMOV (scalar to ARM core register), see VMOV (scalar to ARM core register) on page A8-942.


A7.2.5 Register lists

A register list is a list of register specifiers separated by commas and enclosed in brackets { and }. There are restrictions on what registers can appear in a register list. These restrictions are described in the individual instruction descriptions. Table A7-5 shows some register list formats, with examples of actual register lists corresponding to those formats.

Note

Syntax flexibility

There is some flexibility in the register list syntax:

• Where a register list contains consecutive registers, they can be specified as a range, instead of listing every register, for example {D0-D3} instead of {D0, D1, D2, D3}.
• Where a register list contains an even number of consecutive doubleword registers starting with an even numbered register, it can be written as a list of quadword registers instead, for example {Q1, Q2} instead of {D2-D5}.
• Where a register list contains only one register, the enclosing braces can be omitted, for example VLD1.8 D0, [R0] instead of VLD1.8 {D0}, [R0].

Table A7-5 Example register lists

Format                   Example          Alternative
{<Dd>}                   {D3}             D3
{<Dd>, <Dd+1>, <Dd+2>}   {D3, D4, D5}     {D3-D5}
{<Dd[x]>, <Dd+2[x]>}     {D0[3], D2[3]}
{<Dd[]>}                 {D7[]}           D7[]
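The register list rules in this section (ranges, quadword shorthand where Qn covers D2n and D2n+1, as in {Q1, Q2} for {D2-D5}, and optional braces for a single register) can be sketched as a small Python expander. The function `expand_register_list` is hypothetical and deliberately ignores scalar forms such as D0[3].

```python
import re


def expand_register_list(text: str) -> list[int]:
    """Expand '{D0-D3}', '{Q1, Q2}', 'D3' and similar lists into D register numbers."""
    body = text.strip()
    if body.startswith("{") and body.endswith("}"):
        body = body[1:-1]  # braces are optional for a single register
    regs: list[int] = []
    for item in (part.strip().upper() for part in body.split(",")):
        # One register (D3, Q1) or a range of the same kind (D0-D3).
        m = re.fullmatch(r"([DQ])(\d+)(?:-\1(\d+))?", item)
        if m is None:
            raise ValueError(f"unsupported register list element: {item!r}")
        kind, first = m.group(1), int(m.group(2))
        last = int(m.group(3)) if m.group(3) else first
        for n in range(first, last + 1):
            # Qn overlaps the two doubleword registers D(2n) and D(2n+1).
            regs.extend([2 * n, 2 * n + 1] if kind == "Q" else [n])
    return regs


print(expand_register_list("{D0-D3}"))   # [0, 1, 2, 3]
print(expand_register_list("{Q1, Q2}"))  # [2, 3, 4, 5], the same as {D2-D5}
print(expand_register_list("D3"))        # [3]
```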


A7.3 Register encoding

An Advanced SIMD register is either:

• quadword, meaning it is 128 bits wide
• doubleword, meaning it is 64 bits wide.

Some instructions have options for either doubleword or quadword registers. This is normally encoded in Q, bit[6], as Q = 0 for doubleword operations, or Q = 1 for quadword operations.

A Floating-point register is either:

• double-precision, meaning it is 64 bits wide
• single-precision, meaning it is 32 bits wide.

This is encoded in the sz field, bit[8], as sz = 1 for double-precision operations, or sz = 0 for single-precision operations.

The Thumb instruction encoding of Advanced SIMD or Floating-point registers is:

D Vn Vd sz N Q M Vm

The ARM instruction encoding of Advanced SIMD or Floating-point registers is:

D Vn Vd sz N Q M Vm

Some instructions use only one or two registers, and use the unused register fields as additional opcode bits.

Table A7-6 shows the encodings for the registers.

Table A7-6 Encoding of register numbers

mnemonic    Usual usage                        Register number encoded in [a]  Notes [a]         Used in
<Qd>        Destination (quadword)             D, Vd (bits[22, 15:13])         bit[12] == 0 [b]  Advanced SIMD
<Qn>        First operand (quadword)           N, Vn (bits[7, 19:17])          bit[16] == 0 [b]  Advanced SIMD
<Qm>        Second operand (quadword)          M, Vm (bits[5, 3:1])            bit[0] == 0 [b]   Advanced SIMD
<Dd>        Destination (doubleword)           D, Vd (bits[22, 15:12])         -                 Both
<Dn>        First operand (doubleword)         N, Vn (bits[7, 19:16])          -                 Both
<Dm>        Second operand (doubleword)        M, Vm (bits[5, 3:0])            -                 Both
<Sd>        Destination (single-precision)     Vd, D (bits[15:12, 22])         -                 Floating-point
<Sn>        First operand (single-precision)   Vn, N (bits[19:16, 7])          -                 Floating-point
<Sm>        Second operand (single-precision)  Vm, M (bits[3:0, 5])            -                 Floating-point

a. Bit numbers given for the ARM instruction encoding. See the figures in this section for the equivalent bits in the Thumb encoding.
b. If this bit is 1, the instruction is UNDEFINED.
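Table A7-6 shows that the D bit is the most significant bit of a doubleword register number but the least significant bit of a single-precision register number. The Python sketch below decodes the destination fields of an ARM encoding using the bit positions given in the table (D = bit[22], Vd = bits[15:12]). `dest_registers` is a hypothetical helper; the same pattern applies to N, Vn and M, Vm at their listed positions.

```python
def dest_registers(instr: int) -> dict:
    """Decode <Dd>, <Sd> and <Qd> from the D and Vd fields of an ARM encoding."""
    d = (instr >> 22) & 0x1
    vd = (instr >> 12) & 0xF
    regs = {
        "Dd": (d << 4) | vd,  # D, Vd: D is the top bit, giving D0-D31
        "Sd": (vd << 1) | d,  # Vd, D: D is the bottom bit, giving S0-S31
    }
    # Qd uses D:Vd[3:1]; bit[12] (Vd[0]) must be 0, otherwise the instruction is UNDEFINED.
    regs["Qd"] = None if vd & 1 else ((d << 4) | vd) >> 1
    return regs


print(dest_registers(0x00404000))  # {'Dd': 20, 'Sd': 9, 'Qd': 10}
print(dest_registers(0x00005000))  # {'Dd': 5, 'Sd': 10, 'Qd': None}
```

In the second call Vd is odd, so a quadword form of that encoding would be UNDEFINED, which the sketch reports as None.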


A7.3.1 Advanced SIMD scalars

Advanced SIMD scalars can be 8-bit, 16-bit, 32-bit, or 64-bit. Instructions other than multiply instructions can access any element in the register set. The instruction syntax refers to the scalars using an index into a doubleword vector. The descriptions of the individual instructions contain details of the encodings.

Table A7-7 shows the form of encoding for scalars used in multiply instructions. These instructions cannot access scalars in some registers. The descriptions of the individual instructions contain cross references to this section where appropriate.

32-bit Advanced SIMD scalars, when used as single-precision floating-point numbers, are equivalent to Floating-point single-precision registers. That is, Dm[x] in a 32-bit context (0 <= m <= 15, 0 <= x <= 1) is equivalent to S[2m + x].

Table A7-7 Encoding of scalars in multiply instructions

Scalar mnemonic  Usual usage     Scalar size  Register specifier  Index specifier  Accessible registers
<Dm[x]>          Second operand  16-bit       Vm[2:0]             M, Vm[3]         D0-D7
                                 32-bit       Vm[3:0]             M                D0-D15
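The following Python sketch applies Table A7-7 and the Dm[x] == S[2m + x] equivalence stated above. It assumes the M bit and the 4-bit Vm field are already extracted, and reads "M, Vm[3]" as the two-bit index M:Vm[3] with M as the more significant bit. Both function names are hypothetical.

```python
def decode_mul_scalar(m_bit: int, vm: int, size: int) -> tuple[int, int]:
    """Return (register number m, index x) of the <Dm[x]> operand of a multiply."""
    if size == 16:
        return vm & 0b111, (m_bit << 1) | (vm >> 3)  # register Vm[2:0], index M:Vm[3]
    if size == 32:
        return vm, m_bit                              # register Vm[3:0], index M
    raise ValueError("Table A7-7 covers only 16-bit and 32-bit scalars")


def scalar_as_single(m: int, x: int) -> int:
    """Number of the S register that aliases the 32-bit scalar Dm[x]."""
    if not (0 <= m <= 15 and x in (0, 1)):
        raise ValueError("the equivalence holds for 0 <= m <= 15 and 0 <= x <= 1")
    return 2 * m + x


print(decode_mul_scalar(1, 0b1101, 16))  # (5, 3): D5[3], within D0-D7
print(decode_mul_scalar(1, 0b1101, 32))  # (13, 1): D13[1]
print(scalar_as_single(13, 1))           # 27, so D13[1] is S27
```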


A7.4 Advanced SIMD data-processing instructions

The Thumb encoding of Advanced SIMD data processing instructions is:

1 1 U 1 1 1 1 A B C

The ARM encoding of Advanced SIMD data processing instructions is:

1 1 1 0 0 1 U A B C

Table A7-8 shows the encoding for Advanced SIMD data-processing instructions. Other encodings in this space are UNDEFINED.

In these instructions, the U bit is in a different location in ARM and Thumb instructions. This is bit[12] of the first halfword in the Thumb encoding, and bit[24] in the ARM encoding. Other variable bits are in identical locations in the two encodings, after adjusting for the fact that the ARM encoding is held in memory as a single word and the Thumb encoding is held as two consecutive halfwords.

The ARM instructions can only be executed unconditionally. The Thumb instructions can be executed conditionally by using the IT instruction. For details see IT on page A8-390.

Table A7-8 Data-processing instructions

U  A      B     C     See
-  0xxxx  -     -     Three registers of the same length on page A7-262
   1x000  -     0xx1  One register and a modified immediate value on page A7-269
   1x001  -     0xx1  Two registers and a shift amount on page A7-266
   1x01x  -     0xx1
   1x1xx  -     0xx1
   1xxxx  -     1xx1
   1x0xx  -     x0x0  Three registers of different lengths on page A7-264
   1x10x  -     x0x0
   1x0xx  -     x1x0  Two registers and a scalar on page A7-265
   1x10x  -     x1x0
0  1x11x  -     xxx0  Vector Extract, VEXT on page A8-890
1  1x11x  0xxx  xxx0  Two registers, miscellaneous on page A7-267
          10xx  xxx0  Vector Table Lookup, VTBL, VTBX on page A8-1094
          1100  0xx0  Vector Duplicate, VDUP (scalar) on page A8-884
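The two facts above, that U moves between ARM bit[24] and bit[12] of the Thumb first halfword, and that Table A7-8 then selects a subsection from U, A, B and C, can be combined in a short Python sketch. It holds a Thumb instruction as one 32-bit value with the first halfword in bits[31:16]. The fixed prefixes (ARM bits[31:25] = 0b1111001, Thumb bits[31:24] = 1 1 1 U 1 1 1 1) and the field positions A = bits[23:19], B = bits[11:8], C = bits[7:4] of the ARM encoding are assumptions, because the extracted diagrams above do not show exact bit columns. All helper names are hypothetical.

```python
def matches(pattern: str, value: int) -> bool:
    """True if value, written as len(pattern) bits, matches a pattern of '0', '1' and 'x'."""
    bits = format(value, f"0{len(pattern)}b")
    return all(p in ("x", b) for p, b in zip(pattern, bits))


# Rows of Table A7-8 as (U, A, B, C, destination); a "-" entry is written as all 'x'.
TABLE_A7_8 = [
    ("x", "0xxxx", "xxxx", "xxxx", "Three registers of the same length"),
    ("x", "1x000", "xxxx", "0xx1", "One register and a modified immediate value"),
    ("x", "1x001", "xxxx", "0xx1", "Two registers and a shift amount"),
    ("x", "1x01x", "xxxx", "0xx1", "Two registers and a shift amount"),
    ("x", "1x1xx", "xxxx", "0xx1", "Two registers and a shift amount"),
    ("x", "1xxxx", "xxxx", "1xx1", "Two registers and a shift amount"),
    ("x", "1x0xx", "xxxx", "x0x0", "Three registers of different lengths"),
    ("x", "1x10x", "xxxx", "x0x0", "Three registers of different lengths"),
    ("x", "1x0xx", "xxxx", "x1x0", "Two registers and a scalar"),
    ("x", "1x10x", "xxxx", "x1x0", "Two registers and a scalar"),
    ("0", "1x11x", "xxxx", "xxx0", "VEXT"),
    ("1", "1x11x", "0xxx", "xxx0", "Two registers, miscellaneous"),
    ("1", "1x11x", "10xx", "xxx0", "VTBL, VTBX"),
    ("1", "1x11x", "1100", "0xx0", "VDUP (scalar)"),
]


def thumb_from_arm(arm: int) -> int:
    """Re-encode an ARM word as a Thumb word by moving U from bit[24] to bit[28]."""
    if arm >> 25 != 0b1111001:
        raise ValueError("not in the Advanced SIMD data-processing space")
    u = (arm >> 24) & 1
    return 0xEF000000 | (u << 28) | (arm & 0x00FFFFFF)


def classify(arm: int) -> str:
    """Return the Table A7-8 destination for an ARM-encoded word."""
    u, a = (arm >> 24) & 0x1, (arm >> 19) & 0x1F
    b, c = (arm >> 8) & 0xF, (arm >> 4) & 0xF
    for pu, pa, pb, pc, name in TABLE_A7_8:
        if matches(pu, u) and matches(pa, a) and matches(pb, b) and matches(pc, c):
            return name
    return "UNDEFINED"


word = 0xF2200800                 # assumed ARM encoding of VADD.I32 D0, D0, D0
print(hex(thumb_from_arm(word)))  # 0xef200800
print(classify(word))             # Three registers of the same length
```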


A7.4.1 Three registers of the same length

The Thumb encoding of these instructions is:

1 1 U 1 1 1 1 0 C A B

The ARM encoding of these instructions is:

1 1 1 0 0 1 U 0 C A B

Table A7-9 shows the allocation of encodings in this space. Other encodings in this space are UNDEFINED.

Table A7-9 Three registers of the same length

A     B  U  C   Instruction                                                       See                                                  Variant [a]
0000  0  -  -   Vector Halving Add                                                VHADD, VHSUB on page A8-896                          ASIMD
      1  -  -   Vector Saturating Add                                             VQADD on page A8-996                                 ASIMD
0001  0  -  -   Vector Rounding Halving Add                                       VRHADD on page A8-1030                               ASIMD
      1  0  00  Vector Bitwise AND                                                VAND (register) on page A8-836                       ASIMD
            01  Vector Bitwise Bit Clear, AND complement                          VBIC (register) on page A8-840                       ASIMD
            10  Vector Bitwise OR, if source registers differ                     VORR (register) on page A8-976                       ASIMD
                Vector Move, if source registers identical                        VMOV (register) on page A8-938                       ASIMD
            11  Vector Bitwise OR NOT                                             VORN (register) on page A8-972                       ASIMD
      1  1  00  Vector Bitwise Exclusive OR                                       VEOR on page A8-888                                  ASIMD
            01  Vector Bitwise Select                                             VBIF, VBIT, VBSL on page A8-842                      ASIMD
            10  Vector Bitwise Insert if True                                     VBIF, VBIT, VBSL on page A8-842                      ASIMD
            11  Vector Bitwise Insert if False                                    VBIF, VBIT, VBSL on page A8-842                      ASIMD
0010  0  -  -   Vector Halving Subtract                                           VHADD, VHSUB on page A8-896                          ASIMD
      1  -  -   Vector Saturating Subtract                                        VQSUB on page A8-1020                                ASIMD
0011  0  -  -   Vector Compare Greater Than                                       VCGT (register) on page A8-852                       ASIMD
      1  -  -   Vector Compare Greater Than or Equal                              VCGE (register) on page A8-848                       ASIMD
0100  0  -  -   Vector Shift Left                                                 VSHL (register) on page A8-1048                      ASIMD
      1  -  -   Vector Saturating Shift Left                                      VQSHL (register) on page A8-1014                     ASIMD
0101  0  -  -   Vector Rounding Shift Left                                        VRSHL on page A8-1032                                ASIMD
      1  -  -   Vector Saturating Rounding Shift Left                             VQRSHL on page A8-1010                               ASIMD
0110  -  -  -   Vector Maximum or Minimum                                         VMAX, VMIN (integer) on page A8-926                  ASIMD
0111  0  -  -   Vector Absolute Difference                                        VABD, VABDL (integer) on page A8-820                 ASIMD
      1  -  -   Vector Absolute Difference and Accumulate                         VABA, VABAL on page A8-818                           ASIMD
1000  0  0  -   Vector Add                                                        VADD (integer) on page A8-828                        ASIMD
         1  -   Vector Subtract                                                   VSUB (integer) on page A8-1084                       ASIMD
      1  0  -   Vector Test Bits                                                  VTST on page A8-1098                                 ASIMD
         1  -   Vector Compare Equal                                              VCEQ (register) on page A8-844                       ASIMD
1001  0  -  -   Vector Multiply Accumulate or Subtract                            VMLA, VMLAL, VMLS, VMLSL (integer) on page A8-930    ASIMD
      1  -  -   Vector Multiply                                                   VMUL, VMULL (integer and polynomial) on page A8-958  ASIMD
1010  -  -  -   Vector Pairwise Maximum or Minimum                                VPMAX, VPMIN (integer) on page A8-986                ASIMD
1011  0  0  -   Vector Saturating Doubling Multiply Returning High Half           VQDMULH on page A8-1000                              ASIMD
         1  -   Vector Saturating Rounding Doubling Multiply Returning High Half  VQRDMULH on page A8-1008                             ASIMD
      1  0  -   Vector Pairwise Add                                               VPADD (integer) on page A8-980                       ASIMD
1100  1  0  -   Vector Fused Multiply Accumulate or Subtract                      VFMA, VFMS on page A8-892                            ASIMDv2
1101  0  0  0x  Vector Add                                                        VADD (floating-point) on page A8-830                 ASIMD
            1x  Vector Subtract                                                   VSUB (floating-point) on page A8-1086                ASIMD
         1  0x  Vector Pairwise Add                                               VPADD (floating-point) on page A8-982                ASIMD
            1x  Vector Absolute Difference                                        VABD (floating-point) on page A8-822                 ASIMD
      1  0  -   Vector Multiply Accumulate or Subtract                            VMLA, VMLS (floating-point) on page A8-932           ASIMD
         1  0x  Vector Multiply                                                   VMUL (floating-point) on page A8-960                 ASIMD
1110  0  0  0x  Vector Compare Equal                                              VCEQ (register) on page A8-844                       ASIMD
         1  0x  Vector Compare Greater Than or Equal                              VCGE (register) on page A8-848                       ASIMD
            1x  Vector Compare Greater Than                                       VCGT (register) on page A8-852                       ASIMD
      1  1  -   Vector Absolute Compare Greater or Less Than (or Equal)           VACGE, VACGT, VACLE, VACLT on page A8-826            ASIMD
1111  0  0  -   Vector Maximum or Minimum                                         VMAX, VMIN (floating-point) on page A8-928           ASIMD
         1  -   Vector Pairwise Maximum or Minimum                                VPMAX, VPMIN (floating-point) on page A8-988         ASIMD
      1  0  0x  Vector Reciprocal Step                                            VRECPS on page A8-1026                               ASIMD
            1x  Vector Reciprocal Square Root Step                                VRSQRTS on page A8-1040                              ASIMD

a. In this column, ASIMD indicates Advanced SIMD, and ASIMDv2 indicates Advanced SIMDv2.
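Most rows of Table A7-9 are selected by A, B, U and C alone, but the VORR/VMOV row also depends on whether the two source registers are identical. The Python sketch below models only the bitwise rows (A == 0b0001, B == 1), assuming U, C and the full source register numbers (N:Vn and M:Vm) are already decoded. `BITWISE_A7_9` and `decode_bitwise` are hypothetical names.

```python
# Rows of Table A7-9 with A == 0b0001 and B == 1, keyed by (U, C).
BITWISE_A7_9 = {
    (0, 0b00): "VAND",
    (0, 0b01): "VBIC",
    (0, 0b11): "VORN",
    (1, 0b00): "VEOR",
    (1, 0b01): "VBSL",
    (1, 0b10): "VBIT",
    (1, 0b11): "VBIF",
}


def decode_bitwise(u: int, c: int, n_reg: int, m_reg: int) -> str:
    """n_reg and m_reg are the full source register numbers (N:Vn and M:Vm)."""
    if (u, c) == (0, 0b10):
        # One encoding, two names: VMOV (register) when both sources are the same register.
        return "VMOV" if n_reg == m_reg else "VORR"
    return BITWISE_A7_9[(u, c)]


print(decode_bitwise(0, 0b10, 5, 5))  # VMOV
print(decode_bitwise(0, 0b10, 5, 6))  # VORR
print(decode_bitwise(1, 0b01, 0, 1))  # VBSL
```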


A7.4.2 Three registers of different lengths

The Thumb encoding of these instructions is:

1 1 U 1 1 1 1 1 B A 0 0

The ARM encoding of these instructions is:

1 1 1 0 0 1 U 1 B A 0 0

If B == 0b11, see Advanced SIMD data-processing instructions on page A7-261.

Otherwise, Table A7-10 shows the allocation of encodings in this space. Other encodings in this space are UNDEFINED.

Table A7-10 Data-processing instructions with three registers of different lengths

A     U  Instruction                                                      See
000x  -  Vector Add Long or Wide                                          VADDL, VADDW on page A8-834
001x  -  Vector Subtract Long or Wide                                     VSUBL, VSUBW on page A8-1090
0100  0  Vector Add and Narrow, returning High Half                       VADDHN on page A8-832
      1  Vector Rounding Add and Narrow, returning High Half              VRADDHN on page A8-1022
0101  -  Vector Absolute Difference and Accumulate                        VABA, VABAL on page A8-818
0110  0  Vector Subtract and Narrow, returning High Half                  VSUBHN on page A8-1088
      1  Vector Rounding Subtract and Narrow, returning High Half         VRSUBHN on page A8-1044
0111  -  Vector Absolute Difference                                       VABD, VABDL (integer) on page A8-820
10x0  -  Vector Multiply Accumulate or Subtract                           VMLA, VMLAL, VMLS, VMLSL (integer) on page A8-930
10x1  0  Vector Saturating Doubling Multiply Accumulate or Subtract Long  VQDMLAL, VQDMLSL on page A8-998
1100  -  Vector Multiply (integer)                                        VMUL, VMULL (integer and polynomial) on page A8-958
1101  0  Vector Saturating Doubling Multiply Long                         VQDMULL on page A8-1002
1110  -  Vector Multiply (polynomial)                                     VMUL, VMULL (integer and polynomial) on page A8-958
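The L, W and N shapes from Table A7-2 are easiest to see with the add instructions in Table A7-10. The Python sketch below models VADDL and VADDW on signed 8-bit inputs and VADDHN/VRADDHN on 16-bit inputs. It works on Python lists instead of registers, and the function names are hypothetical.

```python
def to_s16(x: int) -> int:
    """Keep the low 16 bits and reinterpret them as a signed value."""
    x &= 0xFFFF
    return x - 0x10000 if x & 0x8000 else x


def vaddl_s8(n, m):
    """L shape: Qd = Dn + Dm, with both 8-bit lanes widened to 16 bits first."""
    return [to_s16(a + b) for a, b in zip(n, m)]


def vaddw_s8(n16, m):
    """W shape: Qd = Qn + Dm, where only the second operand is widened."""
    return [to_s16(a + b) for a, b in zip(n16, m)]


def vaddhn_i16(n16, m16, rounding=False):
    """N shape: Dd = high half of each 16-bit sum (VRADDHN when rounding=True)."""
    out = []
    for a, b in zip(n16, m16):
        total = (a + b + ((1 << 7) if rounding else 0)) & 0xFFFF
        out.append(total >> 8)  # high byte of the 16-bit sum
    return out


print(vaddl_s8([127, -128], [127, -128]))                              # [254, -256]
print(vaddw_s8([1000, -1000], [-1, 1]))                                # [999, -999]
print(vaddhn_i16([0x1280, 0x0100], [0x0000, 0x00FF]))                  # [18, 1]
print(vaddhn_i16([0x1280, 0x0100], [0x0000, 0x00FF], rounding=True))   # [19, 2]
```

The long form cannot overflow because the 16-bit result has room for any sum of two 8-bit values, while the narrow form deliberately discards the low half.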


A7.4.3 Two registers and a scalar

The Thumb encoding of these instructions is:

1 1 U 1 1 1 1 1 B A 1 0

The ARM encoding of these instructions is:

1 1 1 0 0 1 U 1 B A 1 0

If B == 0b11, see Advanced SIMD data-processing instructions on page A7-261.

Otherwise, Table A7-11 shows the allocation of encodings in this space. Other encodings in this space are UNDEFINED.

Table A7-11 Data-processing instructions with two registers and a scalar

A     U  Instruction                                                       See
0x0x  -  Vector Multiply Accumulate or Subtract                            VMLA, VMLAL, VMLS, VMLSL (by scalar) on page A8-934
0x10  -  Vector Multiply Accumulate or Subtract Long                       VMLA, VMLAL, VMLS, VMLSL (by scalar) on page A8-934
0x11  0  Vector Saturating Doubling Multiply Accumulate or Subtract Long   VQDMLAL, VQDMLSL on page A8-998
100x  -  Vector Multiply                                                   VMUL, VMULL (by scalar) on page A8-962
1010  -  Vector Multiply Long                                              VMUL, VMULL (by scalar) on page A8-962
1011  0  Vector Saturating Doubling Multiply Long                          VQDMULL on page A8-1002
1100  -  Vector Saturating Doubling Multiply returning High Half           VQDMULH on page A8-1000
1101  -  Vector Saturating Rounding Doubling Multiply returning High Half  VQRDMULH on page A8-1008


A7.4.4 Two registers and a shift amount

The Thumb encoding of these instructions is:

1 1 U 1 1 1 1 1 imm3 A L B 1

The ARM encoding of these instructions is:

1 1 1 0 0 1 U 1 imm3 A L B 1

If [L, imm3] == 0b0000, see One register and a modified immediate value on page A7-269.

Otherwise, Table A7-12 shows the allocation of encodings in this space. Other encodings in this space are UNDEFINED.

Table A7-12 Data-processing instructions with two registers and a shift amount

A     U  B  L  Instruction                                             See
0000  -  -  -  Vector Shift Right                                      VSHR on page A8-1052
0001  -  -  -  Vector Shift Right and Accumulate                       VSRA on page A8-1060
0010  -  -  -  Vector Rounding Shift Right                             VRSHR on page A8-1034
0011  -  -  -  Vector Rounding Shift Right and Accumulate              VRSRA on page A8-1042
0100  1  -  -  Vector Shift Right and Insert                           VSRI on page A8-1062
0101  0  -  -  Vector Shift Left                                       VSHL (immediate) on page A8-1046
      1  -  -  Vector Shift Left and Insert                            VSLI on page A8-1056
011x  -  -  -  Vector Saturating Shift Left                            VQSHL, VQSHLU (immediate) on page A8-1016
1000  0  0  0  Vector Shift Right Narrow                               VSHRN on page A8-1054
         1  0  Vector Rounding Shift Right Narrow                      VRSHRN on page A8-1036
      1  0  0  Vector Saturating Shift Right, Unsigned Narrow          VQSHRN, VQSHRUN on page A8-1018
         1  0  Vector Saturating Shift Right, Rounded Unsigned Narrow  VQRSHRN, VQRSHRUN on page A8-1012
1001  -  0  0  Vector Saturating Shift Right, Narrow                   VQSHRN, VQSHRUN on page A8-1018
         1  0  Vector Saturating Shift Right, Rounded Narrow           VQRSHRN, VQRSHRUN on page A8-1012
1010  -  0  0  Vector Shift Left Long                                  VSHLL on page A8-1050
               Vector Move Long                                        VMOVL on page A8-950
111x  -  -  0  Vector Convert                                          VCVT (between floating-point and fixed-point, Advanced SIMD) on page A8-872


A7.4.5 Two registers, miscellaneous

The Thumb encoding of these instructions is:

1 1 1 1 1 1 1 1 1 1 A 0 B 0

The ARM encoding of these instructions is:

1 1 1 0 0 1 1 1 1 1 A 0 B 0

The allocation of encodings in this space is shown in Table A7-13. Other encodings in this space are UNDEFINED.

Table A7-13 Instructions with two registers, miscellaneous

A   B      Instruction                                   See
00  0000x  Vector Reverse in doublewords                 VREV16, VREV32, VREV64 on page A8-1028
    0001x  Vector Reverse in words                       VREV16, VREV32, VREV64 on page A8-1028
    0010x  Vector Reverse in halfwords                   VREV16, VREV32, VREV64 on page A8-1028
    010xx  Vector Pairwise Add Long                      VPADDL on page A8-984
    1000x  Vector Count Leading Sign Bits                VCLS on page A8-858
    1001x  Vector Count Leading Zeros                    VCLZ on page A8-862
    1010x  Vector Count                                  VCNT on page A8-866
    1011x  Vector Bitwise NOT                            VMVN (register) on page A8-966
    110xx  Vector Pairwise Add and Accumulate Long       VPADAL on page A8-978
    1110x  Vector Saturating Absolute                    VQABS on page A8-994
    1111x  Vector Saturating Negate                      VQNEG on page A8-1006
01  x000x  Vector Compare Greater Than Zero              VCGT (immediate #0) on page A8-854
    x001x  Vector Compare Greater Than or Equal to Zero  VCGE (immediate #0) on page A8-850
    x010x  Vector Compare Equal to zero                  VCEQ (immediate #0) on page A8-846
    x011x  Vector Compare Less Than or Equal to Zero     VCLE (immediate #0) on page A8-856
    x100x  Vector Compare Less Than Zero                 VCLT (immediate #0) on page A8-860
    x110x  Vector Absolute                               VABS on page A8-824
    x111x  Vector Negate                                 VNEG on page A8-968
10  0000x  Vector Swap                                   VSWP on page A8-1092
    0001x  Vector Transpose                              VTRN on page A8-1096
    0010x  Vector Unzip                                  VUZP on page A8-1100
    0011x  Vector Zip                                    VZIP on page A8-1102
    01000  Vector Move and Narrow                        VMOVN on page A8-952
    01001  Vector Saturating Move and Unsigned Narrow    VQMOVN, VQMOVUN on page A8-1004
    0101x  Vector Saturating Move and Narrow             VQMOVN, VQMOVUN on page A8-1004
    01100  Vector Shift Left Long (maximum shift)        VSHLL on page A8-1050
    11x00  Vector Convert                                VCVT (between half-precision and single-precision, Advanced SIMD) on page A8-878
11  10x0x  Vector Reciprocal Estimate                    VRECPE on page A8-1024
    10x1x  Vector Reciprocal Square Root Estimate        VRSQRTE on page A8-1038
    11xxx  Vector Convert                                VCVT (between floating-point and integer, Advanced SIMD) on page A8-868


A7.4.6 One register and a modified immediate value

The Thumb encoding of these instructions is:

first halfword: bits<15:13> = 111, bit<12> = a, bits<11:7> = 11111, bits<5:3> = 000, bits<2:0> = bcd

second halfword: bits<11:8> = cmode, bit<7> = 0, bit<5> = op, bit<4> = 1, bits<3:0> = efgh

The ARM encoding of these instructions is:

bits<31:25> = 1111001, bit<24> = a, bit<23> = 1, bits<21:19> = 000, bits<18:16> = bcd, bits<11:8> = cmode, bit<7> = 0, bit<5> = op, bit<4> = 1, bits<3:0> = efgh

Bits not listed do not affect the allocation of encodings in this space.

Table A7-14 shows the allocation of encodings in this space.

Table A7-15 shows the modified immediate constants available with these instructions, and how they are encoded.

Table A7-14 Data-processing instructions with one register and a modified immediate value

op cmode Instruction See

0 0xx0 Vector Move VMOV (immediate) on page A8-936

0 0xx1 Vector Bitwise OR VORR (immediate) on page A8-974

0 10x0 Vector Move VMOV (immediate) on page A8-936

0 10x1 Vector Bitwise OR VORR (immediate) on page A8-974

0 11xx Vector Move VMOV (immediate) on page A8-936

1 0xx0 Vector Bitwise NOT VMVN (immediate) on page A8-964

1 0xx1 Vector Bit Clear VBIC (immediate) on page A8-838

1 10x0 Vector Bitwise NOT VMVN (immediate) on page A8-964

1 10x1 Vector Bit Clear VBIC (immediate) on page A8-838

1 110x Vector Bitwise NOT VMVN (immediate) on page A8-964

1 1110 Vector Move VMOV (immediate) on page A8-936

1 1111 UNDEFINED -
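As an illustration of how the x (don't care) bits in Table A7-14 are matched, the following Python sketch maps op and cmode to the instruction that the table selects. The names TABLE_A7_14, matches() and decode_one_reg_imm() are hypothetical, and the sketch examines only op and cmode; it does not check any other field.

```python
# Hypothetical lookup of Table A7-14: (op, cmode pattern, instruction).
# 'x' in a pattern matches either bit value.
TABLE_A7_14 = [
    ("0", "0xx0", "VMOV (immediate)"),
    ("0", "0xx1", "VORR (immediate)"),
    ("0", "10x0", "VMOV (immediate)"),
    ("0", "10x1", "VORR (immediate)"),
    ("0", "11xx", "VMOV (immediate)"),
    ("1", "0xx0", "VMVN (immediate)"),
    ("1", "0xx1", "VBIC (immediate)"),
    ("1", "10x0", "VMVN (immediate)"),
    ("1", "10x1", "VBIC (immediate)"),
    ("1", "110x", "VMVN (immediate)"),
    ("1", "1110", "VMOV (immediate)"),
    ("1", "1111", "UNDEFINED"),
]

def matches(pattern: str, bits: str) -> bool:
    """True if bits agrees with pattern at every position that is not 'x'."""
    return len(pattern) == len(bits) and all(p in ("x", b) for p, b in zip(pattern, bits))

def decode_one_reg_imm(op: int, cmode: int) -> str:
    """Return the Table A7-14 entry selected by op (0 or 1) and cmode (0..15)."""
    bits = format(cmode, "04b")
    for row_op, pattern, name in TABLE_A7_14:
        if row_op == str(op) and matches(pattern, bits):
            return name
    raise ValueError("no matching row")  # not reached: the table covers all 32 cases

print(decode_one_reg_imm(1, 0b1110))  # VMOV (immediate)
```

For example, decode_one_reg_imm(1, 0b1110) returns the op == 1 VMOV entry, which is the row that Table A7-15 uses for the I64 constant.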

Table A7-15 Modified immediate values for Advanced SIMD instructions

op cmode Constant (a) <dt> (b) Notes

- 000x 00000000 00000000 00000000 abcdefgh 00000000 00000000 00000000 abcdefgh I32 c

- 001x 00000000 00000000 abcdefgh 00000000 00000000 00000000 abcdefgh 00000000 I32 c, d

- 010x 00000000 abcdefgh 00000000 00000000 00000000 abcdefgh 00000000 00000000 I32 c, d

- 011x abcdefgh 00000000 00000000 00000000 abcdefgh 00000000 00000000 00000000 I32 c, d

- 100x 00000000 abcdefgh 00000000 abcdefgh 00000000 abcdefgh 00000000 abcdefgh I16 c

- 101x abcdefgh 00000000 abcdefgh 00000000 abcdefgh 00000000 abcdefgh 00000000 I16 c, d

- 1100 00000000 00000000 abcdefgh 11111111 00000000 00000000 abcdefgh 11111111 I32 d, e

- 1101 00000000 abcdefgh 11111111 11111111 00000000 abcdefgh 11111111 11111111 I32 d, e

0 1110 abcdefgh abcdefgh abcdefgh abcdefgh abcdefgh abcdefgh abcdefgh abcdefgh I8 f

0 1111 aBbbbbbc defgh000 00000000 00000000 aBbbbbbc defgh000 00000000 00000000 F32 f, g

1 1110 aaaaaaaa bbbbbbbb cccccccc dddddddd eeeeeeee ffffffff gggggggg hhhhhhhh I64 f

1 1111 UNDEFINED - -

a. In this table, the immediate value is shown in binary form, to relate abcdefgh to the encoding diagram. In assembler syntax, the constant is specified by a data type and a value of that type. That value is specified in the normal way (a decimal number by default) and is replicated enough times to fill the 64-bit immediate. For example, a data type of I32 and a value of 10 specify the 64-bit constant 0x0000000A0000000A.

b. This specifies the data type used when the instruction is disassembled. On assembly, the data type must be matched in the table if possible. Other data types are permitted as pseudo-instructions when a program is assembled, provided the 64-bit constant specified by the data type and value is available for the instruction. If a constant is available in more than one way, the first entry in this table that can produce it is used. For example, VMOV.I64 D0, #0x8000000080000000 does not specify a 64-bit constant that is available from the I64 line of the table, but does specify one that is available from the fourth I32 line. It is therefore assembled using that line, and is disassembled as VMOV.I32 D0, #0x80000000.

c. This constant is available for the VBIC, VMOV, VMVN, and VORR instructions.

d. UNPREDICTABLE if abcdefgh == 00000000.

e. This constant is available for the VMOV and VMVN instructions only.

f. This constant is available for the VMOV instruction only.

g. In this entry, B = NOT(b). The bit pattern represents the floating-point number $(-1)^{S} \times 2^{exp} \times mantissa$, where S = UInt(a), exp = UInt(NOT(b):c:d) - 3, and mantissa = (16 + UInt(e:f:g:h))/16.
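The footnote g formula for the F32 row of Table A7-15 can be checked against that row's bit pattern, aBbbbbbc defgh000 00000000 00000000. The following Python sketch, with hypothetical function names, evaluates the formula exactly using fractions.Fraction and compares the result with the single-precision value obtained by building the bit pattern and unpacking it with struct.

```python
import struct
from fractions import Fraction

def imm8_value(imm8: int) -> Fraction:
    """(-1)^S * 2^exp * mantissa, as defined in footnote g of Table A7-15."""
    a = (imm8 >> 7) & 1
    b = (imm8 >> 6) & 1
    cd = (imm8 >> 4) & 0b11
    efgh = imm8 & 0xF
    exp = (((1 - b) << 2) | cd) - 3        # UInt(NOT(b):c:d) - 3, in the range -3..4
    mantissa = Fraction(16 + efgh, 16)     # 1 to 31/16 in steps of 1/16
    return (-1) ** a * Fraction(2) ** exp * mantissa

def f32_pattern_value(imm8: int) -> float:
    """Build aBbbbbbc defgh000 00000000 00000000 (B = NOT(b)) and read it as an IEEE single."""
    a = (imm8 >> 7) & 1
    b = (imm8 >> 6) & 1
    word = (a << 31) | ((1 - b) << 30) | ((0b11111 if b else 0) << 25) | ((imm8 & 0x3F) << 19)
    return struct.unpack(">f", word.to_bytes(4, "big"))[0]

# The formula and the bit pattern agree for all 256 encodings.
assert all(float(imm8_value(i)) == f32_pattern_value(i) for i in range(256))
print(imm8_value(0x70), imm8_value(0x00))  # 1 2
```

The same exhaustive loop also shows that the smallest magnitude available is 0.125 and the largest is 31.0, so zero cannot be expressed in this form.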

Advanced SIMD expand immediate pseudocode

    // AdvSIMDExpandImm()
    // ==================

    bits(64) AdvSIMDExpandImm(bit op, bits(4) cmode, bits(8) imm8)
        case cmode<3:1> of
            when '000'
                testimm8 = FALSE; imm64 = Replicate(Zeros(24):imm8, 2);
            when '001'
                testimm8 = TRUE; imm64 = Replicate(Zeros(16):imm8:Zeros(8), 2);
            when '010'
                testimm8 = TRUE; imm64 = Replicate(Zeros(8):imm8:Zeros(16), 2);
            when '011'
                testimm8 = TRUE; imm64 = Replicate(imm8:Zeros(24), 2);
            when '100'
                testimm8 = FALSE; imm64 = Replicate(Zeros(8):imm8, 4);
            when '101'
                testimm8 = TRUE; imm64 = Replicate(imm8:Zeros(8), 4);
            when '110'
                testimm8 = TRUE;
                if cmode<0> == '0' then
                    imm64 = Replicate(Zeros(16):imm8:Ones(8), 2);
                else
                    imm64 = Replicate(Zeros(8):imm8:Ones(16), 2);
            when '111'
                testimm8 = FALSE;
                if cmode<0> == '0' && op == '0' then
                    imm64 = Replicate(imm8, 8);
                if cmode<0> == '0' && op == '1' then
                    imm8a = Replicate(imm8<7>, 8); imm8b = Replicate(imm8<6>, 8);
                    imm8c = Replicate(imm8<5>, 8); imm8d = Replicate(imm8<4>, 8);
                    imm8e = Replicate(imm8<3>, 8); imm8f = Replicate(imm8<2>, 8);
                    imm8g = Replicate(imm8<1>, 8); imm8h = Replicate(imm8<0>, 8);
                    imm64 = imm8a:imm8b:imm8c:imm8d:imm8e:imm8f:imm8g:imm8h;
                if cmode<0> == '1' && op == '0' then
                    imm32 = imm8<7>:NOT(imm8<6>):Replicate(imm8<6>,5):imm8<5:0>:Zeros(19);
                    imm64 = Replicate(imm32, 2);
                if cmode<0> == '1' && op == '1' then
                    UNDEFINED;
        if testimm8 && imm8 == '00000000' then
            UNPREDICTABLE;
        return imm64;
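The AdvSIMDExpandImm() pseudocode translates almost line for line into Python, which makes footnotes a and b of Table A7-15 easy to check. This is an illustrative sketch, not a reference implementation: UNDEFINED and UNPREDICTABLE are modelled as ValueError, and VMOV_ROWS is a hypothetical list of the Table A7-15 rows that VMOV can use, in table order, because footnote b says the first row that can produce a constant is the one used.

```python
def replicate(value: int, width: int, times: int) -> int:
    """Replicate(): concatenate `times` copies of a `width`-bit value."""
    result = 0
    for _ in range(times):
        result = (result << width) | value
    return result

def adv_simd_expand_imm(op: int, cmode: int, imm8: int) -> int:
    """Port of AdvSIMDExpandImm(); UNDEFINED and UNPREDICTABLE raise ValueError."""
    top = cmode >> 1                      # cmode<3:1>
    low = cmode & 1                       # cmode<0>
    if top == 0b000:
        testimm8, imm64 = False, replicate(imm8, 32, 2)
    elif top == 0b001:
        testimm8, imm64 = True, replicate(imm8 << 8, 32, 2)
    elif top == 0b010:
        testimm8, imm64 = True, replicate(imm8 << 16, 32, 2)
    elif top == 0b011:
        testimm8, imm64 = True, replicate(imm8 << 24, 32, 2)
    elif top == 0b100:
        testimm8, imm64 = False, replicate(imm8, 16, 4)
    elif top == 0b101:
        testimm8, imm64 = True, replicate(imm8 << 8, 16, 4)
    elif top == 0b110:
        testimm8 = True
        if low == 0:
            imm64 = replicate((imm8 << 8) | 0xFF, 32, 2)       # Zeros(16):imm8:Ones(8)
        else:
            imm64 = replicate((imm8 << 16) | 0xFFFF, 32, 2)    # Zeros(8):imm8:Ones(16)
    else:                                 # cmode<3:1> == '111'
        testimm8 = False
        if low == 0 and op == 0:
            imm64 = replicate(imm8, 8, 8)
        elif low == 0 and op == 1:
            imm64 = 0
            for i in range(7, -1, -1):    # imm8<7> selects the most significant byte
                imm64 = (imm64 << 8) | (0xFF if (imm8 >> i) & 1 else 0x00)
        elif op == 0:
            a, b = (imm8 >> 7) & 1, (imm8 >> 6) & 1
            imm32 = (a << 31) | ((b ^ 1) << 30) | ((0b11111 if b else 0) << 25) | ((imm8 & 0x3F) << 19)
            imm64 = replicate(imm32, 32, 2)
        else:
            raise ValueError("UNDEFINED")
    if testimm8 and imm8 == 0:
        raise ValueError("UNPREDICTABLE")
    return imm64

# Rows of Table A7-15 usable by VMOV, in table order: (op, cmode, <dt>).
VMOV_ROWS = [
    (0, 0b0000, "I32"), (0, 0b0010, "I32"), (0, 0b0100, "I32"), (0, 0b0110, "I32"),
    (0, 0b1000, "I16"), (0, 0b1010, "I16"), (0, 0b1100, "I32"), (0, 0b1101, "I32"),
    (0, 0b1110, "I8"), (0, 0b1111, "F32"), (1, 0b1110, "I64"),
]

def first_vmov_encoding(imm64: int):
    """Return (<dt>, op, cmode, imm8) for the first row that produces imm64, or None."""
    for op, cmode, dt in VMOV_ROWS:
        for imm8 in range(256):
            try:
                if adv_simd_expand_imm(op, cmode, imm8) == imm64:
                    return dt, op, cmode, imm8
            except ValueError:            # skip the UNPREDICTABLE imm8 == 0 cases
                continue
    return None

print(hex(adv_simd_expand_imm(0, 0b0000, 10)))   # 0xa0000000a (footnote a: I32, value 10)
print(first_vmov_encoding(0x8000000080000000))   # ('I32', 0, 6, 128)
```

The last call returns the cmode 011x (fourth I32) row with imm8 = 0x80, which is why VMOV.I64 D0, #0x8000000080000000 disassembles as VMOV.I32 D0, #0x80000000.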


A7.5 Floating-point data-processing instructions

The Thumb encoding of Floating-point (VFP) data-processing instructions is:

first halfword: bits<15:13> = 111, bit<12> = T, bits<11:8> = 1110, bits<7:4> = opc1, bits<3:0> = opc2

second halfword: bits<11:9> = 101, bits<7:6> = opc3, bit<4> = 0, bits<3:0> = opc4

The ARM encoding of Floating-point (VFP) data-processing instructions is:

bits<31:28> = cond, bits<27:24> = 1110, bits<23:20> = opc1, bits<19:16> = opc2, bits<11:9> = 101, bits<7:6> = opc3, bit<4> = 0, bits<3:0> = opc4

Bits not listed do not affect the allocation of encodings in this space.

If T == 1 in the Thumb encoding or cond == 0b1111 in the ARM encoding, the instruction is UNDEFINED.

Otherwise:

• Table A7-16 shows the encodings for three-register Floating-point data-processing instructions. Other encodings in this space are UNDEFINED.

• Table A7-17 applies only if Table A7-16 indicates that it does. It shows the encodings for Floating-point data-processing instructions with two registers or a register and an immediate. Other encodings in this space are UNDEFINED.

• Table A7-18 on page A7-273 shows the immediate constants available in the VMOV (immediate) instruction.

These instructions are CDP instructions for coprocessors 10 and 11.

Table A7-16 Three-register Floating-point data-processing instructions

opc1 opc3 Instruction See Variant

0x00 - Vector Multiply Accumulate or Subtract VMLA, VMLS (floating-point) on page A8-932 VFPv2

0x01 - Vector Negate Multiply Accumulate or Subtract VNMLA, VNMLS, VNMUL on page A8-970 VFPv2

0x10 x1 Vector Negate Multiply Accumulate or Subtract VNMLA, VNMLS, VNMUL on page A8-970 VFPv2

0x10 x0 Vector Multiply VMUL (floating-point) on page A8-960 VFPv2

0x11 x0 Vector Add VADD (floating-point) on page A8-830 VFPv2

0x11 x1 Vector Subtract VSUB (floating-point) on page A8-1086 VFPv2

1x00 x0 Vector Divide VDIV on page A8-882 VFPv2

1x01 - Vector Fused Negate Multiply Accumulate or Subtract VFNMA, VFNMS on page A8-894 VFPv4

1x10 - Vector Fused Multiply Accumulate or Subtract VFMA, VFMS on page A8-892 VFPv4

1x11 - Other Floating-point data-processing instructions Table A7-17 -

Table A7-17 Other Floating-point data-processing instructions

opc2 opc3 Instruction See Variant

- x0 Vector Move VMOV (immediate) on page A8-936 VFPv3

0000 01 Vector Move VMOV (register) on page A8-938 VFPv2

0000 11 Vector Absolute VABS on page A8-824 VFPv2

0001 01 Vector Negate VNEG on page A8-968 VFPv2

0001 11 Vector Square Root VSQRT on page A8-1058 VFPv2

001x x1 Vector Convert VCVTB, VCVTT on page A8-880 VFPv3HP (a)

010x x1 Vector Compare VCMP, VCMPE on page A8-864 VFPv2

0111 11 Vector Convert VCVT (between double-precision and single-precision) on page A8-876 VFPv2

1000 x1 Vector Convert VCVT, VCVTR (between floating-point and integer, Floating-point) on page A8-870 VFPv2

101x x1 Vector Convert VCVT (between floating-point and fixed-point, Floating-point) on page A8-874 VFPv3

110x x1 Vector Convert VCVT, VCVTR (between floating-point and integer, Floating-point) on page A8-870 VFPv2

111x x1 Vector Convert VCVT (between floating-point and fixed-point, Floating-point) on page A8-874 VFPv3

a. VFPv3 Half-precision Extension.

A7.5.1 Operation of modified immediate constants, Floating-point

The VFPExpandImm() pseudocode function describes the operation of an immediate constant in a floating-point instruction.

    // VFPExpandImm()
    // ==============

    bits(N) VFPExpandImm(bits(8) imm8, integer N)
        assert N IN {32,64};
        if N == 32 then
            E = 8;
        else
            E = 11;
        F = N - E - 1;
        sign = imm8<7>;
        exp = NOT(imm8<6>):Replicate(imm8<6>,E-3):imm8<5:4>;
        frac = imm8<3:0>:Zeros(F-4);
        return sign:exp:frac;

Table A7-18 Floating-point modified immediate constants

Data type opc2 opc4 Constant (a)

F32 abcd efgh aBbbbbbc defgh000 00000000 00000000

F64 abcd efgh aBbbbbbb bbcdefgh 00000000 00000000 00000000 00000000 00000000 00000000

a. In this column, B = NOT(b). The bit pattern represents the floating-point number $(-1)^{S} \times 2^{exp} \times mantissa$, where S = UInt(a), exp = UInt(NOT(b):c:d) - 3, and mantissa = (16 + UInt(e:f:g:h))/16.
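A direct Python transcription of VFPExpandImm() can be used to confirm that one imm8 value denotes the same number in both the F32 and F64 rows of Table A7-18. The helper names vfp_expand_imm() and as_float() are hypothetical; struct is used only to reinterpret the returned bit pattern as an IEEE 754 value.

```python
import struct

def vfp_expand_imm(imm8: int, n: int) -> int:
    """Port of VFPExpandImm(): returns the N-bit constant as an unsigned integer."""
    assert n in (32, 64)
    e = 8 if n == 32 else 11              # exponent width E
    f = n - e - 1                         # fraction width F
    sign = (imm8 >> 7) & 1
    b = (imm8 >> 6) & 1
    # exp = NOT(imm8<6>):Replicate(imm8<6>, E-3):imm8<5:4>
    exp = ((b ^ 1) << (e - 1)) | ((((1 << (e - 3)) - 1) if b else 0) << 2) | ((imm8 >> 4) & 0b11)
    # frac = imm8<3:0>:Zeros(F-4)
    frac = (imm8 & 0xF) << (f - 4)
    return (sign << (n - 1)) | (exp << f) | frac

def as_float(bits: int, n: int) -> float:
    """Reinterpret an N-bit pattern as an IEEE 754 single (N=32) or double (N=64)."""
    return struct.unpack(">f" if n == 32 else ">d", bits.to_bytes(n // 8, "big"))[0]

print(hex(vfp_expand_imm(0x70, 32)), hex(vfp_expand_imm(0x70, 64)))  # 0x3f800000 0x3ff0000000000000
print(as_float(vfp_expand_imm(0x70, 64), 64))                         # 1.0
```

Because the fraction never has more than four significant bits, every one of the 256 constants is exactly representable in both widths, so the F32 and F64 forms of a given imm8 always compare equal.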


A7.6 Extension register load/store instructions

The Thumb encoding of Advanced SIMD and Floating-point (VFP) Extension register load and store instructions is:

first halfword: bits<15:13> = 111, bit<12> = T, bits<11:9> = 110, bits<8:4> = Opcode, bits<3:0> = Rn

second halfword: bits<11:9> = 101

The ARM encoding of Advanced SIMD and Floating-point (VFP) Extension register load and store instructions is:

bits<31:28> = cond, bits<27:25> = 110, bits<24:20> = Opcode, bits<19:16> = Rn, bits<11:9> = 101

Bits not listed do not affect the allocation of encodings in this space.

If T == 1 in the Thumb encoding or cond == 0b1111 in the ARM encoding, the instruction is UNDEFINED.

Otherwise, the allocation of encodings in this space is shown in Table A7-19. Other encodings in this space are UNDEFINED.

These instructions are LDC and STC instructions for coprocessors 10 and 11.

Table A7-19 Extension register load/store instructions

Opcode Rn Instruction See

0010x - - 64-bit transfers between ARM core and extension registers on page A7-279

01x00 - Vector Store Multiple (Increment After, no writeback) VSTM on page A8-1080

01x10 - Vector Store Multiple (Increment After, writeback) VSTM on page A8-1080

1xx00 - Vector Store Register VSTR on page A8-1082

10x10 not 1101 Vector Store Multiple (Decrement Before, writeback) VSTM on page A8-1080

10x10 1101 Vector Push Registers VPUSH on page A8-992

01x01 - Vector Load Multiple (Increment After, no writeback) VLDM on page A8-922

01x11 not 1101 Vector Load Multiple (Increment After, writeback) VLDM on page A8-922

01x11 1101 Vector Pop Registers VPOP on page A8-990

1xx01 - Vector Load Register VLDR on page A8-924

10x11 - Vector Load Multiple (Decrement Before, writeback) VLDM on page A8-922
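The Rn column of Table A7-19 is what separates VPUSH and VPOP from the general writeback forms of VSTM and VLDM. A small Python sketch of the table lookup follows; TABLE_A7_19 and decode_ext_load_store() are hypothetical names, and only the Opcode and Rn fields are examined.

```python
# Hypothetical lookup of Table A7-19.
# Each row: (Opcode pattern, Rn condition, result). Rn condition:
# None = any Rn, True = Rn must be 0b1101 (SP), False = Rn must not be 0b1101.
TABLE_A7_19 = [
    ("0010x", None,  "64-bit transfer (see A7.9)"),
    ("01x00", None,  "VSTM (Increment After, no writeback)"),
    ("01x10", None,  "VSTM (Increment After, writeback)"),
    ("1xx00", None,  "VSTR"),
    ("10x10", False, "VSTM (Decrement Before, writeback)"),
    ("10x10", True,  "VPUSH"),
    ("01x01", None,  "VLDM (Increment After, no writeback)"),
    ("01x11", False, "VLDM (Increment After, writeback)"),
    ("01x11", True,  "VPOP"),
    ("1xx01", None,  "VLDR"),
    ("10x11", None,  "VLDM (Decrement Before, writeback)"),
]

def decode_ext_load_store(opcode: int, rn: int) -> str:
    """Decode the 5-bit Opcode field and 4-bit Rn field; other encodings are UNDEFINED."""
    bits = format(opcode, "05b")
    for pattern, rn_is_sp, name in TABLE_A7_19:
        if all(p in ("x", c) for p, c in zip(pattern, bits)):
            if rn_is_sp is None or rn_is_sp == (rn == 0b1101):
                return name
    return "UNDEFINED"

print(decode_ext_load_store(0b10010, 13))  # VPUSH
print(decode_ext_load_store(0b10010, 0))   # VSTM (Decrement Before, writeback)
```

In other words, VPUSH is the decrement-before store with writeback and VPOP is the increment-after load with writeback, in both cases with Rn == SP.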


A7.7 Advanced SIMD element or structure load/store instructions

The Thumb encoding of Advanced SIMD element load and store instructions is:

first halfword: bits<15:8> = 11111001, bit<7> = A, bit<5> = L, bit<4> = 0

second halfword: bits<11:8> = B

The ARM encoding of Advanced SIMD element load and store instructions is:

bits<31:24> = 11110100, bit<23> = A, bit<21> = L, bit<20> = 0, bits<11:8> = B

Bits not listed do not affect the allocation of encodings in this space.

The allocation of encodings in this space is shown in:

• Table A7-20 if L == 0. These are the encodings for store instructions.

• Table A7-21 on page A7-276 if L == 1. These are the encodings for load instructions.

Other encodings in this space are UNDEFINED.

The variable bits are in identical locations in the two encodings, after adjusting for the fact that the ARM encoding is held in memory as a single word and the Thumb encoding is held as two consecutive halfwords.

The ARM instructions can only be executed unconditionally. The Thumb instructions can be executed conditionally by using the IT instruction. For details see IT on page A8-390.

Table A7-20 Element and structure store instructions (L == 0)

A B Instruction See

0 0010, 011x, 1010 Vector Store VST1 (multiple single elements) on page A8-1064

0 0011, 100x Vector Store VST2 (multiple 2-element structures) on page A8-1068

0 010x Vector Store VST3 (multiple 3-element structures) on page A8-1072

0 000x Vector Store VST4 (multiple 4-element structures) on page A8-1076

1 0x00, 1000 Vector Store VST1 (single element from one lane) on page A8-1066

1 0x01, 1001 Vector Store VST2 (single 2-element structure from one lane) on page A8-1070

1 0x10, 1010 Vector Store VST3 (single 3-element structure from one lane) on page A8-1074

1 0x11, 1011 Vector Store VST4 (single 4-element structure from one lane) on page A8-1078


Table A7-21 Element and structure load instructions (L == 1)

A B Instruction See

0 0010, 011x, 1010 Vector Load VLD1 (multiple single elements) on page A8-898

0 0011, 100x Vector Load VLD2 (multiple 2-element structures) on page A8-904

0 010x Vector Load VLD3 (multiple 3-element structures) on page A8-910

0 000x Vector Load VLD4 (multiple 4-element structures) on page A8-916

1 0x00, 1000 Vector Load VLD1 (single element to one lane) on page A8-900

1 1100 Vector Load VLD1 (single element to all lanes) on page A8-902

1 0x01, 1001 Vector Load VLD2 (single 2-element structure to one lane) on page A8-906

1 1101 Vector Load VLD2 (single 2-element structure to all lanes) on page A8-908

1 0x10, 1010 Vector Load VLD3 (single 3-element structure to one lane) on page A8-912

1 1110 Vector Load VLD3 (single 3-element structure to all lanes) on page A8-914

1 0x11, 1011 Vector Load VLD4 (single 4-element structure to one lane) on page A8-918

1 1111 Vector Load VLD4 (single 4-element structure to all lanes) on page A8-920
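Tables A7-20 and A7-21 share their A and B patterns, except that the A == 1, B == 11xx rows exist only for loads. The following Python sketch, using the hypothetical function decode_element_load_store(), decodes A, L and B. It relies on an observation read from the tables rather than stated by them: for A == 1, B<1:0> + 1 gives the number of elements in the structure.

```python
def matches(pattern: str, bits: str) -> bool:
    """True if bits agrees with pattern at every position that is not 'x'."""
    return all(p in ("x", c) for p, c in zip(pattern, bits))

# A == 0 rows: (B patterns, n), shared by VSTn (L == 0) and VLDn (L == 1).
MULTIPLE_ROWS = [
    (("0010", "011x", "1010"), 1),
    (("0011", "100x"), 2),
    (("010x",), 3),
    (("000x",), 4),
]

def decode_element_load_store(a: int, l: int, b: int) -> str:
    """Decode A (1 bit), L (1 bit) and B (4 bits) using Tables A7-20 and A7-21."""
    base = "VLD" if l else "VST"
    bits = format(b, "04b")
    if a == 0:
        for patterns, n in MULTIPLE_ROWS:
            if any(matches(p, bits) for p in patterns):
                return f"{base}{n} (multiple)"
        return "UNDEFINED"
    n = (b & 0b11) + 1                    # B<1:0> selects 1 to 4 elements
    if b >> 2 == 0b11:                    # B == 11xx: "to all lanes", loads only
        return f"VLD{n} (all lanes)" if l else "UNDEFINED"
    return f"{base}{n} (one lane)"        # B == 0xnn or 10nn

print(decode_element_load_store(1, 1, 0b1110))  # VLD3 (all lanes)
print(decode_element_load_store(1, 0, 0b1110))  # UNDEFINED
```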


A7.7.1 Advanced SIMD addressing mode

All the element and structure load/store instructions use this addressing mode. There is a choice of three formats:

[<Rn>{:<align>}]

The address is contained in ARM core register Rn. Rn is not updated by this instruction.

Encoded as Rm = 0b1111. If Rn is encoded as 0b1111, the instruction is UNPREDICTABLE.

[<Rn>{:<align>}]!

The address is contained in ARM core register Rn. Rn is updated by this instruction: Rn = Rn + transfer_size

Encoded as Rm = 0b1101. transfer_size is the number of bytes transferred by the instruction. This means that, after the instruction is executed, Rn points to the address in memory immediately following the last address loaded from or stored to. If Rn is encoded as 0b1111, the instruction is UNPREDICTABLE.

This addressing mode can also be written as:

[<Rn>{:<align>}], #<transfer_size>

However, disassembly produces the [<Rn>{:<align>}]! form.

[<Rn>{:<align>}], <Rm>

The address is contained in ARM core register Rn. Rn is updated by this instruction: Rn = Rn + Rm

Encoded as Rm = Rm. Rm must not be encoded as 0b1111 or 0b1101, the PC or the SP. If Rn is encoded as 0b1111, the instruction is UNPREDICTABLE.

In all cases, <align> specifies an alignment. Details are given in the individual instruction descriptions.

Previous versions of the document used the @ character for alignment. So, for example, the first format in this section was shown as [<Rn>{@<align>}]. Both @ and : are supported. However, to ensure portability of code to assemblers that treat @ as a comment character, : is preferred.
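The three formats differ only in how Rm is encoded and in what is written back to Rn. The sketch below is a hypothetical Python model, not part of the architecture's pseudocode: advsimd_address() returns the address used for the transfer together with the new value of Rn, and transfer_size is supplied by the caller. For example, an instruction such as VLD1.32 {D0, D1}, [R0]! transfers 16 bytes.

```python
from typing import List, Tuple

SP, PC = 13, 15  # register numbers 0b1101 and 0b1111

def advsimd_address(regs: List[int], rn: int, rm: int, transfer_size: int) -> Tuple[int, int]:
    """Return (address used for the transfer, value written back to Rn)."""
    if rn == PC:
        raise ValueError("UNPREDICTABLE: Rn encoded as 0b1111")
    address = regs[rn]
    if rm == PC:                          # [<Rn>{:<align>}]: Rn not updated
        new_rn = address
    elif rm == SP:                        # [<Rn>{:<align>}]!: Rn = Rn + transfer_size
        new_rn = address + transfer_size
    else:                                 # [<Rn>{:<align>}], <Rm>: Rn = Rn + Rm
        new_rn = address + regs[rm]
    return address, new_rn & 0xFFFFFFFF   # ARM core registers are 32 bits wide

regs = [0] * 16
regs[0], regs[2] = 0x1000, 0x20
print([hex(v) for v in advsimd_address(regs, 0, 0b1101, 16)])  # ['0x1000', '0x1010']
```

In every format the transfer itself starts at the original value of Rn; only the writeback differs.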


A7.8 8, 16, and 32-bit transfer between ARM core and extension registers

The Thumb encoding of Advanced SIMD and Floating-point 8-bit, 16-bit, and 32-bit register data transfer instructions is:

first halfword: bits<15:13> = 111, bit<12> = T, bits<11:8> = 1110, bits<7:5> = A, bit<4> = L

second halfword: bits<11:9> = 101, bit<8> = C, bits<6:5> = B, bit<4> = 1

The ARM encoding of Advanced SIMD and Floating-point 8-bit, 16-bit, and 32-bit register data transfer instructions is:

bits<31:28> = cond, bits<27:24> = 1110, bits<23:21> = A, bit<20> = L, bits<11:9> = 101, bit<8> = C, bits<6:5> = B, bit<4> = 1

Bits not listed do not affect the allocation of encodings in this space.

If T == 1 in the Thumb encoding or cond == 0b1111 in the ARM encoding, the instruction is UNDEFINED.

Otherwise, the allocation of encodings in this space is shown in Table A7-22. Other encodings in this space are UNDEFINED.

These instructions are MRC and MCR instructions for coprocessors 10 and 11.

Table A7-22 8-bit, 16-bit and 32-bit data transfer instructions

L C A B Instruction See

0 0 000 - Vector Move VMOV (between ARM core register and single-precision register) on page A8-944

0 0 111 - Move to Floating-point Special register VMSR on page A8-956; VMSR on page B9-2016, System level view

0 1 0xx - Vector Move VMOV (ARM core register to scalar) on page A8-940

0 1 1xx 0x Vector Duplicate VDUP (ARM core register) on page A8-886

1 0 000 - Vector Move VMOV (between ARM core register and single-precision register) on page A8-944

1 0 111 - Move to ARM core register from Floating-point Special register VMRS on page A8-954; VMRS on page B9-2014, System level view

1 1 xxx - Vector Move VMOV (scalar to ARM core register) on page A8-942


A7.9 64-bit transfers between ARM core and extension registers

The Thumb encoding of Advanced SIMD and Floating-point 64-bit register data transfer instructions is:

first halfword: bits<15:13> = 111, bit<12> = T, bits<11:5> = 1100010

second halfword: bits<11:9> = 101, bit<8> = C, bits<7:4> = op

The ARM encoding of Advanced SIMD and Floating-point 64-bit register data transfer instructions is:

bits<31:28> = cond, bits<27:21> = 1100010, bits<11:9> = 101, bit<8> = C, bits<7:4> = op

Bits not listed do not affect the allocation of encodings in this space.

If T == 1 in the Thumb encoding or cond == 0b1111 in the ARM encoding, the instruction is UNDEFINED.

Otherwise, the allocation of encodings in this space is shown in Table A7-23. Other encodings in this space are UNDEFINED.

These instructions are MRRC and MCRR instructions for coprocessors 10 and 11.

Table A7-23 64-bit data transfer instructions

C op Instruction

0 00x1 VMOV (between two ARM core registers and two single-precision registers) on page A8-946

1 00x1 VMOV (between two ARM core registers and a doubleword extension register) on page A8-948


Chapter A8

Instruction Descriptions

This chapter describes each instruction. It contains the following sections:

• Format of instruction descriptions on page A8-282

• Standard assembler syntax fields on page A8-287

• Conditional execution on page A8-288

• Shifts applied to a register on page A8-291

• Memory accesses on page A8-294

• Encoding of lists of ARM core registers on page A8-295

• Additional pseudocode support for instruction descriptions on page A8-296

• Alphabetical list of instructions on page A8-300.

Note

The Floating-point Extension was previously described as the VFP Extension, and:

• Different versions of this extension, and the instructions they introduce, are identified using the abbreviation

VFP, for example VFPv3.

• The deprecated vector features of the Floating-point Extension are identified as VFP vectors.


A8.1 Format of instruction descriptions

The instruction descriptions in Alphabetical list of instructions on page A8-300 normally use the following format:

• instruction section title

• introduction to the instruction

• instruction encoding(s) with architecture information

• assembler syntax

• pseudocode describing how the instruction operates

• exception information

• notes (where applicable).

Each of these items is described in more detail in the following subsections.

A few instruction descriptions describe alternative mnemonics for other instructions and use an abbreviated and

modified version of this format.

A8.1.1 Instruction section title

The instruction section title gives the base mnemonic for the instructions described in the section. When one

mnemonic has multiple forms described in separate instruction sections, this is followed by a short description of

the form in parentheses. The most common use of this is to distinguish between forms of an instruction in which

one of the operands is an immediate value and forms in which it is a register.

Another use of parenthesized text is to indicate the former mnemonic in some cases where a mnemonic has been

replaced entirely by another mnemonic in the new assembler syntax.

A8.1.2 Introduction to the instruction

The instruction section title is followed by text that briefly describes the main features of the instruction. This

description is not necessarily complete and is not definitive. If there is any conflict between it and the more detailed

information that follows, the latter takes priority.

A8.1.3 Instruction encodings

This is a list of one or more instruction encodings. Each instruction encoding is labelled as:

• T1, T2, T3 … for the first, second, third and any additional Thumb encodings

• A1, A2, A3 … for the first, second, third and any additional ARM encodings

• E1, E2, E3 … for the first, second, third and any additional ThumbEE encodings that are not also Thumb

encodings.

Where Thumb and ARM encodings are very closely related, the two encodings are described together, for example

as encoding T1/A1.

Each instruction encoding description consists of:

• Information about which architecture variants include the particular encoding of the instruction. This is

presented in one of two ways:

— For instruction encodings that are in the main instruction set architecture, as a list of the architecture

variants that include the encoding. See Architecture versions, profiles, and variants on page A1-30 for

a summary of these variants.

— For instruction encodings that are in the architecture extensions, as a list of the architecture extensions

that include the encoding. See Architecture extensions on page A1-32 for a summary of the

architecture extensions and the architecture variants that they can extend.

In architecture variant lists:

— ARMv7 means ARMv7-A and ARMv7-R profiles. The architecture variant information in this manual

does not cover the ARMv7-M profile.


— * is used as a wildcard. For example, ARMv5T* means ARMv5T, ARMv5TE, and ARMv5TEJ.

• An assembly syntax that ensures that the assembler selects the encoding in preference to any other encoding.

In some cases, multiple syntaxes are given. The correct one to use is sometimes indicated by annotations to

the syntax, such as Inside IT block and Outside IT block. In other cases, the correct one to use can be

determined by looking at the assembler syntax description and using it to determine which syntax

corresponds to the instruction being disassembled.

There is usually more than one syntax that ensures re-assembly to any particular encoding, and the exact set

of syntaxes that do so usually depends on the register numbers, immediate constants and other operands to

the instruction. For example, when assembling to the Thumb instruction set, the syntax AND R0, R0, R8 ensures selection of a 32-bit encoding but AND R0, R0, R1 selects a 16-bit encoding.

The assembly syntax documented for the encoding is chosen to be the simplest one that ensures selection of that encoding for all operand combinations supported by that encoding. This often means that it includes elements that are only necessary for a small subset of operand combinations. For example, the assembler syntax documented for the 32-bit Thumb AND (register) encoding includes the .W qualifier to ensure that the 32-bit encoding is selected even for the small proportion of operand combinations for which the 16-bit encoding is also available.

The assembly syntax given for an encoding is therefore a suitable one for a disassembler to disassemble that

encoding to. However, disassemblers might wish to use simpler syntaxes when they are suitable for the

operand combination, in order to produce more readable disassembled code.

• An encoding diagram, or a Thumb encoding diagram followed by an ARM encoding diagram when they are

being described together. This is half-width for 16-bit Thumb encodings and full-width for 32-bit Thumb and

ARM encodings. The 32-bit ARM encoding diagrams number the bits from 31 to 0, while the 32-bit Thumb

encoding diagrams number the bits from 15 to 0 for each halfword, to distinguish them from ARM encodings

and to act as a reminder that a 32-bit Thumb instruction consists of two consecutive halfwords rather than a

word.

In particular, if instructions are stored using the standard little-endian instruction endianness, the encoding

diagram for an ARM instruction at address A shows the bytes at addresses A+3, A+2, A+1, A from left to

right, but the encoding diagram for a 32-bit Thumb instruction shows them in the order A+1, A for the first

halfword, followed by A+3, A+2 for the second halfword.
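The byte ordering described above can be shown with a short Python sketch that reads an instruction from a little-endian byte buffer. arm_word() and thumb32() are hypothetical helper names; the sketch assumes the standard little-endian instruction endianness and does not check whether the first halfword actually starts a 32-bit Thumb instruction.

```python
def arm_word(mem: bytes, a: int) -> int:
    """ARM instruction at address a: the diagram shows bytes A+3, A+2, A+1, A."""
    return int.from_bytes(mem[a:a + 4], "little")

def thumb32(mem: bytes, a: int) -> int:
    """32-bit Thumb instruction: halfword (A+1, A) followed by halfword (A+3, A+2)."""
    first = int.from_bytes(mem[a:a + 2], "little")
    second = int.from_bytes(mem[a + 2:a + 4], "little")
    return (first << 16) | second

mem = bytes([0xAF, 0xF3, 0x00, 0x80])
print(hex(thumb32(mem, 0)))    # 0xf3af8000: first halfword 0xf3af, second halfword 0x8000
print(hex(arm_word(mem, 0)))   # 0x8000f3af: the same four bytes read as one word
```

Reading the same four bytes both ways gives different 32-bit values, which is why the Thumb diagrams number each halfword separately.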

• Encoding-specific pseudocode. This is pseudocode that translates the encoding-specific instruction fields

into inputs to the encoding-independent pseudocode in the later Operation subsection, and that picks out any

special cases in the encoding. For a detailed description of the pseudocode used and of the relationship

between the encoding diagram, the encoding-specific pseudocode and the encoding-independent

pseudocode, see Appendix D16 Pseudocode Definition.

A8.1.4 Assembler syntax

The Assembly syntax subsection describes the standard UAL syntax for the instruction.

Each syntax description consists of the following elements:

• One or more syntax prototype lines written in a typewriter font, using the conventions described in Assembler syntax prototype line conventions on page A8-284. Each prototype line documents the mnemonic and (where appropriate) operand parts of a full line of assembler code. When there is more than one such line, each prototype line is annotated to indicate required results of the encoding-specific pseudocode.

For each instruction encoding belonging to a target instruction set, an assembler can use this information to

determine whether it can use that encoding to encode the instruction requested by the UAL source. If multiple

encodings can encode the instruction then:

— If both a 16-bit encoding and a 32-bit encoding can encode the instruction, the architecture prefers the

16-bit encoding. This means the assembler must use the 16-bit encoding rather than the 32-bit

encoding.

Software can use the .W and .N qualifiers to specify the required encoding width; see Standard assembler syntax fields on page A8-287.

— If multiple encodings of the same length can encode the instruction, the Assembler syntax subsection

says which encoding is preferred, and how software can, instead, select the other encodings.


Each encoding also documents UAL syntax that selects it in preference to any other encoding.

If no encodings of the target instruction set can encode the instruction requested by the UAL source, normally

the assembler generates an error saying that the instruction is not available in that instruction set.

Note

Often, an instruction is available in one instruction set but not in another. The Assembler syntax subsection

identifies many of these cases. For example, the ARM instructions with bits<31:28> == 0b1111 described in Unconditional instructions on page A5-216 cannot have a condition code, but the equivalent Thumb instructions often can, and this usually appears in the Assembler syntax subsection as a statement that the ARM instruction cannot be conditional.

However, some such cases are too complex to describe in the available space, so the definitive test of whether

an instruction is available in a given instruction set is whether there is an available encoding for it in that

instruction set.

• The line where: followed by descriptions of all of the variable or optional fields of the prototype syntax line.

Some syntax fields are standardized across all or most instructions. Standard assembler syntax fields on

page A8-287 describes these fields.

By default, syntax fields that specify registers, such as <Rd>, <Rn>, or <Rt>, can be any of R0-R12 or LR in Thumb instructions, and any of R0-R12, SP or LR in ARM instructions. These require that the encoding-specific pseudocode set the corresponding integer variable (such as d, n, or t) to the corresponding register number (0-12 for R0-R12, 13 for SP, 14 for LR):

— Normally, software can do this by setting the corresponding field in the instruction, typically named

Rd, Rn, Rt, to the binary encoding of that number.

— In the case of 16-bit Thumb encodings, the field is normally of length 3, and so the encoding is only

available when the assembler syntax specifies one of R0-R7. Such encodings often use a register field

name like Rdn. This indicates that the encoding is only available if <Rd> and <Rn> specify the same register.

The description of a syntax field that specifies a register sometimes extends or restricts the permitted range

of registers or documents other differences from the default rules for such fields. Examples of extensions are

permitting the use of the SP in a Thumb instruction, or permitting the use of the PC, identified using register

number 15.

• Where appropriate, text that briefly describes changes from the pre-UAL ARM assembler syntax. Where

present, this usually consists of an alternative pre-UAL form of the assembler mnemonic. The pre-UAL

ARM assembler syntax does not conflict with UAL. ARM recommends that it is supported, as an optional

extension to UAL, so that pre-UAL ARM assembler source files can be assembled.

Note

The pre-UAL Thumb assembler syntax is incompatible with UAL and is not documented in the instruction sections.

For details see Appendix D8 Legacy Instruction Mnemonics.

Assembler syntax prototype line conventions

The following conventions are used in assembler syntax prototype lines and their subfields:

< >

Any item bracketed by < and > is a short description of a type of value to be supplied by the user in that position. A longer description of the item is normally supplied by subsequent text. Such items often correspond to a similarly named field in an encoding diagram for an instruction. When the correspondence only requires the binary encoding of an integer value or register number to be substituted into the instruction encoding, it is not described explicitly. For example, if the assembler syntax for an ARM instruction contains an item <Rn> and the instruction encoding diagram contains a 4-bit field named Rn, the number of the register specified in the assembler syntax is encoded in binary in the instruction field.

A8 Instruction Descriptions

A8.1 Format of instruction descriptions

ID051414 Non-Confidential

If the correspondence between the assembler syntax item and the instruction encoding is more complex than simple binary encoding of an integer or register number, the item description indicates how it is encoded. This is often done by specifying a required output from the encoding-specific pseudocode, such as add = TRUE. The assembler must only use encodings that produce that output.

{ }   Any item bracketed by { and } is optional. A description of the item and of how its presence or absence is encoded in the instruction is normally supplied by subsequent text.

Many instructions have an optional destination register. Unless otherwise stated, if such a destination register is omitted, it is the same as the immediately following source register in the instruction syntax.

#     In the assembler syntax, numeric constants are normally preceded by a #. Some UAL instruction syntax descriptions explicitly show this # as optional. Any UAL assembler:

      • must treat the # as optional where an instruction syntax description shows it as optional

      • can treat the # either as mandatory or as optional where an instruction syntax description does not show it as optional.

Note

ARM recommends that UAL assemblers treat all uses of # shown in this manual as optional.

spaces Single spaces are used for clarity, to separate items. When a space is obligatory in the assembler

syntax, two or more consecutive spaces are used.

+/-   This indicates an optional + or - sign. If neither is coded, + is assumed.

All other characters must be encoded precisely as they appear in the assembler syntax. Apart from { and }, the special characters described above do not appear in the basic forms of assembler instructions documented in this manual. The { and } characters need to be encoded in a few places as part of a variable item. When this happens, the long description of the variable item indicates how they must be used.

A8.1.5 Pseudocode describing how the instruction operates

The Operation subsection contains encoding-independent pseudocode that describes the main operation of the instruction. For a detailed description of the pseudocode used and of the relationship between the encoding diagram, the encoding-specific pseudocode and the encoding-independent pseudocode, see Appendix D16 Pseudocode Definition.


A8.1.6 Exception information

The Exceptions subsection contains a list of the exceptional conditions that can be caused by execution of the instruction.

Processor exceptions are listed as follows:

• Resets and interrupts (both IRQs and FIQs) are not listed. They can occur before or after the execution of any instruction, and in some cases during the execution of an instruction, but they are not in general caused by the instruction concerned.

• Prefetch Abort exceptions are normally caused by a memory abort when an instruction is fetched, followed by an attempt to execute that instruction. This can happen for any instruction, but is caused by the aborted attempt to fetch the instruction rather than by the instruction itself, and so is not listed. A special case is the BKPT instruction, which is defined as causing a Prefetch Abort exception in some circumstances.

• Data Abort exceptions are listed for all instructions that perform data memory accesses.

• Undefined Instruction exceptions are listed when they are part of the effects of a defined instruction. For example, all coprocessor instructions are defined to produce the Undefined Instruction exception if not accepted by their coprocessor. Undefined Instruction exceptions caused by the execution of an undefined instruction are not listed, even when the undefined instruction is a special case of one or more of the encodings of the instruction. Such special cases are instead indicated in the encoding-specific pseudocode for the encoding.

• Supervisor Call and Secure Monitor Call exceptions are listed for the SVC and SMC instructions respectively. Supervisor Call exceptions and the SVC instruction were previously called Software Interrupt exceptions and the SWI instruction. Secure Monitor Call exceptions and the SMC instruction were previously called Secure Monitor interrupts and the SMI instruction.

• Floating-point exceptions are listed for instructions that can produce them. Floating-point exceptions on page A2-70 describes these exceptions. They do not normally result in processor exceptions.

A8.1.7 Notes

Where appropriate, other notes about the instruction appear under additional subheadings.

Note

Information that was documented in notes in previous versions of the ARM Architecture Reference Manual and its supplements has often been moved elsewhere. For example, operand restrictions on the values of fields in an instruction encoding are now normally documented in the encoding-specific pseudocode for that encoding.


A8.2 Standard assembler syntax fields

The following assembler syntax fields are standard across all or most instructions:

<c>   Is an optional field. It specifies the condition under which the instruction is executed. See Conditional execution on page A8-288 for the range of available conditions and their encoding. If <c> is omitted, it defaults to always (AL).

<q>   Specifies optional assembler qualifiers on the instruction. The following qualifiers are defined:

      .N   Meaning narrow, specifies that the assembler must select a 16-bit encoding for the instruction. If this is not possible, an assembler error is produced.

      .W   Meaning wide, specifies that the assembler must select a 32-bit encoding for the instruction. If this is not possible, an assembler error is produced.

      If neither .W nor .N is specified, the assembler can select either 16-bit or 32-bit encodings. If both are available, it must select a 16-bit encoding. In a few cases, more than one encoding of the same length can be available for an instruction. The rules for selecting between such encodings are instruction-specific and are part of the instruction description.

Note

When assembling to the ARM instruction set, the .N qualifier produces an assembler error and the .W qualifier has no effect.
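The width-selection rules for <q> can be sketched as a small decision function. This is a minimal Python illustration, not part of any real assembler: the function names and the has_16bit/has_32bit inputs are hypothetical, and they assume the assembler has already worked out which encodings exist for the given operands. The instruction-specific rules for choosing between encodings of the same width are not modelled.

```python
from typing import Optional


def select_thumb_width(qualifier: Optional[str], has_16bit: bool, has_32bit: bool) -> int:
    """Return 16 or 32: the Thumb encoding width chosen for a <q> qualifier.

    has_16bit / has_32bit are hypothetical inputs saying which encodings exist
    for the operands written in the source.
    """
    if qualifier == ".N":
        # Narrow: a 16-bit encoding is mandatory.
        if not has_16bit:
            raise ValueError("assembler error: .N but no 16-bit encoding")
        return 16
    if qualifier == ".W":
        # Wide: a 32-bit encoding is mandatory.
        if not has_32bit:
            raise ValueError("assembler error: .W but no 32-bit encoding")
        return 32
    if qualifier is not None:
        raise ValueError(f"unknown qualifier {qualifier!r}")
    # No qualifier: either width is allowed, but 16-bit is required when both exist.
    if has_16bit:
        return 16
    if has_32bit:
        return 32
    raise ValueError("assembler error: no encoding available")


def check_arm_qualifier(qualifier: Optional[str]) -> None:
    """ARM state: .N is an error and .W has no effect."""
    if qualifier == ".N":
        raise ValueError("assembler error: .N is not permitted for ARM instructions")


print(select_thumb_width(None, True, True))   # 16
print(select_thumb_width(".W", True, True))   # 32
```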


A8.3 Conditional execution

Most ARM instructions, and most Thumb instructions from ARMv6T2 onwards, can be executed conditionally, based on the values of the APSR condition flags. Before ARMv6T2, the only conditional Thumb instruction was the 16-bit conditional branch instruction. Table A8-1 lists the available conditions.

In Thumb instructions, the condition, if it is not AL, is normally encoded in a preceding IT instruction. For more information see Conditional instructions on page A4-162 and IT on page A8-390. Some conditional branch instructions do not require a preceding IT instruction, because they include a condition code in their encoding.

In ARM instructions, bits[31:28] of the instruction contain the condition code, or contain 0b1111 for some ARM instructions that can only be executed unconditionally.

ARM deprecates the conditional execution of any instruction encoding provided by the Advanced SIMD Extension that is not also provided by the Floating-point (VFP) extension, and strongly recommends that:

• For ARM instructions, any such Advanced SIMD instruction that can be conditionally executed is executed with the <c> field omitted or set to AL.

Note

This applies only to VDUP, see VDUP (ARM core register) on page A8-886. The other instructions do not permit conditional execution in ARM state.

Table A8-1 Condition codes

cond | Mnemonic extension | Meaning (integer) | Meaning (floating-point) a | Condition flags
0000 | EQ | Equal | Equal | Z == 1
0001 | NE | Not equal | Not equal, or unordered | Z == 0
0010 | CS b | Carry set | Greater than, equal, or unordered | C == 1
0011 | CC c | Carry clear | Less than | C == 0
0100 | MI | Minus, negative | Less than | N == 1
0101 | PL | Plus, positive or zero | Greater than, equal, or unordered | N == 0
0110 | VS | Overflow | Unordered | V == 1
0111 | VC | No overflow | Not unordered | V == 0
1000 | HI | Unsigned higher | Greater than, or unordered | C == 1 and Z == 0
1001 | LS | Unsigned lower or same | Less than or equal | C == 0 or Z == 1
1010 | GE | Signed greater than or equal | Greater than or equal | N == V
1011 | LT | Signed less than | Less than, or unordered | N != V
1100 | GT | Signed greater than | Greater than | Z == 0 and N == V
1101 | LE | Signed less than or equal | Less than, equal, or unordered | Z == 1 or N != V
1110 | None (AL) d | Always (unconditional) | Always (unconditional) | Any

a. Unordered means at least one NaN operand.
b. HS (unsigned higher or same) is a synonym for CS.
c. LO (unsigned lower) is a synonym for CC.
d. AL is an optional mnemonic extension for always, except in IT instructions. For details see IT on page A8-390.


• For Thumb instructions, such Advanced SIMD instructions are never included in an IT block. This means they must be specified with the <c> field omitted or set to AL.

This deprecation does not apply to Advanced SIMD instruction encodings that are also available as Floating-point instruction encodings. That is, it does not apply to the Advanced SIMD encodings of the instructions described in the following sections:

• VLDM on page A8-922.

• VLDR on page A8-924.

• VMOV (ARM core register to scalar) on page A8-940.

• VMOV (between two ARM core registers and a doubleword extension register) on page A8-948.

• VMRS on page A8-954.

• VMSR on page A8-956.

• VPOP on page A8-990.

• VPUSH on page A8-992.

• VSTM on page A8-1080.

• VSTR on page A8-1082.

See also Conditional execution of undefined instructions on page B1-1209.

A8.3.1 Pseudocode details of conditional execution

The CurrentCond() pseudocode function has prototype:

bits(4) CurrentCond()

This function returns a 4-bit condition specifier as follows:

• For ARM instructions, it returns bits[31:28] of the instruction.

• For the T1 and T3 encodings of the Branch instruction (see B on page A8-334), it returns the 4-bit cond field of the encoding.

• For all other Thumb and ThumbEE instructions:

  — if ITSTATE.IT<3:0> != '0000' it returns ITSTATE.IT<7:4>

  — if ITSTATE.IT<7:0> == '00000000' it returns '1110'

  — otherwise, execution of the instruction is UNPREDICTABLE.

For more information, see IT block state register, ITSTATE on page A2-51.
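A minimal Python sketch of the ARM and ITSTATE cases of CurrentCond(), assuming the instruction word and the 8-bit ITSTATE.IT<7:0> value are passed in as plain integers (the function names are illustrative only). The T1/T3 branch case simply reads the cond field of that encoding and is omitted.

```python
def arm_current_cond(instr: int) -> int:
    """ARM instructions: the condition is bits[31:28] of the instruction word."""
    return (instr >> 28) & 0xF


def thumb_current_cond(itstate: int) -> int:
    """Thumb/ThumbEE instructions other than B (T1/T3): derive cond from ITSTATE.IT<7:0>."""
    itstate &= 0xFF
    if (itstate & 0x0F) != 0:
        # Inside an IT block: the current condition is IT<7:4>.
        return itstate >> 4
    if itstate == 0:
        # Outside any IT block: always ('1110').
        return 0b1110
    # IT<3:0> == '0000' but IT<7:4> != '0000'.
    raise ValueError("UNPREDICTABLE ITSTATE value")


print(hex(arm_current_cond(0xE3A00001)))  # 0xe
print(thumb_current_cond(0x18))           # 1
```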

The ConditionPassed() function uses this condition specifier and the APSR condition flags to determine whether the instruction must be executed:

// ConditionPassed()
// =================

boolean ConditionPassed()
    cond = CurrentCond();

    // Evaluate base condition.
    case cond<3:1> of
        when '000' result = (APSR.Z == '1');                          // EQ or NE
        when '001' result = (APSR.C == '1');                          // CS or CC
        when '010' result = (APSR.N == '1');                          // MI or PL
        when '011' result = (APSR.V == '1');                          // VS or VC
        when '100' result = (APSR.C == '1') && (APSR.Z == '0');       // HI or LS
        when '101' result = (APSR.N == APSR.V);                       // GE or LT
        when '110' result = (APSR.N == APSR.V) && (APSR.Z == '0');    // GT or LE
        when '111' result = TRUE;                                     // AL

    // Condition flag values in the set '111x' indicate the instruction is always executed.
    // Otherwise, invert condition if necessary.
    if cond<0> == '1' && cond != '1111' then
        result = !result;

    return result;
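The same decision can be sketched in Python, assuming the four APSR flags are passed as 0/1 integers. It follows the pseudocode directly: cond<3:1> selects a base test and cond<0> inverts it, except for '1111'.

```python
def condition_passed(cond: int, n: int, z: int, c: int, v: int) -> bool:
    """Evaluate a 4-bit condition code against APSR.N, Z, C and V (each 0 or 1)."""
    base = (cond >> 1) & 0b111  # cond<3:1>
    if base == 0b000:
        result = z == 1                  # EQ / NE
    elif base == 0b001:
        result = c == 1                  # CS / CC
    elif base == 0b010:
        result = n == 1                  # MI / PL
    elif base == 0b011:
        result = v == 1                  # VS / VC
    elif base == 0b100:
        result = c == 1 and z == 0       # HI / LS
    elif base == 0b101:
        result = n == v                  # GE / LT
    elif base == 0b110:
        result = n == v and z == 0       # GT / LE
    else:
        result = True                    # AL ('111x')
    # An odd condition code selects the inverse test, except for '1111'.
    if (cond & 1) and cond != 0b1111:
        result = not result
    return result
```

For example, with N=0, Z=0, C=1, V=0, condition_passed(0b1000, 0, 0, 1, 0) (HI) returns True and condition_passed(0b1001, 0, 0, 1, 0) (LS) returns False.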

Undefined Instruction exception on page B1-1206 describes the handling of conditional instructions that are UNDEFINED or UNPREDICTABLE. The pseudocode in the manual, as a sequential description of the instructions, has limitations in this respect. For more information, see Limitations of the instruction pseudocode on page D16-2646.


A8.4 Shifts applied to a register

ARM register offset load/store word and unsigned byte instructions can apply a wide range of different constant shifts to the offset register. Both Thumb and ARM data-processing instructions can apply the same range of different constant shifts to the second operand register. For details see Constant shifts.

ARM data-processing instructions can apply a register-controlled shift to the second operand register.

A8.4.1 Constant shifts

These are the same in Thumb and ARM instructions, except that the input bits come from different positions.

<shift> is an optional shift to be applied to <Rm>. It can be any one of:

(omitted)   No shift.

LSL #<n>    Logical shift left <n> bits. 1 <= <n> <= 31.

LSR #<n>    Logical shift right <n> bits. 1 <= <n> <= 32.

ASR #<n>    Arithmetic shift right <n> bits. 1 <= <n> <= 32.

ROR #<n>    Rotate right <n> bits. 1 <= <n> <= 31.

RRX         Rotate right one bit, with extend. Bit[0] is written to shifter_carry_out, bits[31:1] are shifted right one bit, and the Carry flag is shifted into bit[31].

Note

Assemblers can permit the use of some or all of ASR #0, LSL #0, LSR #0, and ROR #0 to specify that no shift is to be performed. This is not standard UAL, and the encoding selected for Thumb instructions might vary between UAL assemblers if it is used. To ensure disassembled code assembles to the original instructions, disassemblers must omit the shift specifier when the instruction specifies no shift.

Similarly, assemblers can permit the use of #0 in the immediate forms of ASR, LSL, LSR, and ROR instructions to specify that no shift is to be performed, that is, that a MOV (register) instruction is wanted. Again, this is not standard UAL, and the encoding selected for Thumb instructions might vary between UAL assemblers if it is used. To ensure disassembled code assembles to the original instructions, disassemblers must use the MOV (register) syntax when the instruction specifies no shift.

Encoding

The assembler encodes <shift> into two type bits and five immediate bits, as follows:

(omitted)   type = 0b00, immediate = 0.

LSL #<n>    type = 0b00, immediate = <n>.

LSR #<n>    type = 0b01. If <n> < 32, immediate = <n>. If <n> == 32, immediate = 0.

ASR #<n>    type = 0b10. If <n> < 32, immediate = <n>. If <n> == 32, immediate = 0.

ROR #<n>    type = 0b11, immediate = <n>.

RRX         type = 0b11, immediate = 0.
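An assembler's use of this table can be sketched as follows. This is a hypothetical Python helper, not an existing API; kind is one of the shift names, or an empty string for an omitted shift, and the range checks follow the <n> limits given for each constant shift.

```python
def encode_imm_shift(kind: str, n: int = 0) -> tuple[int, int]:
    """Return (type, imm5) for a constant <shift>; kind '' means the shift is omitted."""
    if kind == "":
        return 0b00, 0                    # no shift: same bits as LSL #0
    if kind == "RRX":
        return 0b11, 0                    # shares type 0b11 with ROR, imm5 = 0
    limits = {"LSL": 31, "LSR": 32, "ASR": 32, "ROR": 31}
    types = {"LSL": 0b00, "LSR": 0b01, "ASR": 0b10, "ROR": 0b11}
    if kind not in types:
        raise ValueError(f"unknown shift {kind!r}")
    if not 1 <= n <= limits[kind]:
        raise ValueError(f"{kind} #{n} is out of range")
    # A shift of 32 (LSR/ASR only) does not fit in five bits and is encoded as 0.
    return types[kind], (0 if n == 32 else n)
```

Because imm5 = 0 is reused for the omitted shift, for LSR #32 and ASR #32, and for RRX, the decoder must use the type field to tell them apart.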


A8.4.2 Register controlled shifts

These are only available in ARM instructions.

<type> is the type of shift to apply to the value read from <Rm>. It must be one of:

ASR   Arithmetic shift right, encoded as type = 0b10.

LSL   Logical shift left, encoded as type = 0b00.

LSR   Logical shift right, encoded as type = 0b01.

ROR   Rotate right, encoded as type = 0b11.

The bottom byte of <Rs> contains the shift amount.
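Because only the bottom byte of <Rs> is used, a register-controlled shift amount can be anywhere from 0 to 255, so amounts of 32 or more must be handled. The following Python sketch shows this for LSR, assuming 32-bit register values passed as integers; lsr_by_register is an illustrative name, and the results follow the behavior of Shift_C() and LSR_C() (an amount of 0 leaves the value and carry unchanged).

```python
def lsr_by_register(rm: int, rs: int, carry_in: int) -> tuple[int, int]:
    """Return (result, carry_out) for <Rm>, LSR <Rs> on 32-bit values."""
    rm &= 0xFFFFFFFF
    amount = rs & 0xFF                     # only the bottom byte of Rs is used
    if amount == 0:
        return rm, carry_in                # no shift: carry flag is unchanged
    result = rm >> amount                  # 0 for any amount >= 32
    carry_out = (rm >> (amount - 1)) & 1   # last bit shifted out; 0 once amount > 32
    return result, carry_out
```

For example, an Rs value of 0x104 shifts by 4, not by 260, and an Rs value of 0x100 performs no shift at all.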

A8.4.3 Pseudocode details of instruction-specified shifts and rotates

enumeration SRType {SRType_LSL, SRType_LSR, SRType_ASR, SRType_ROR, SRType_RRX};

// DecodeImmShift()
// ================

(SRType, integer) DecodeImmShift(bits(2) type, bits(5) imm5)
    case type of
        when '00'
            shift_t = SRType_LSL;  shift_n = UInt(imm5);
        when '01'
            shift_t = SRType_LSR;  shift_n = if imm5 == '00000' then 32 else UInt(imm5);
        when '10'
            shift_t = SRType_ASR;  shift_n = if imm5 == '00000' then 32 else UInt(imm5);
        when '11'
            if imm5 == '00000' then
                shift_t = SRType_RRX;  shift_n = 1;
            else
                shift_t = SRType_ROR;  shift_n = UInt(imm5);
    return (shift_t, shift_n);
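For experimenting with encodings, DecodeImmShift() translates almost line for line into Python. This sketch returns the shift type as a string rather than an SRType value; that substitution is the only liberty taken.

```python
def decode_imm_shift(type_: int, imm5: int) -> tuple[str, int]:
    """Return (shift type, shift amount) for the 2-bit type and 5-bit imm5 fields."""
    type_ &= 0b11
    imm5 &= 0b11111
    if type_ == 0b00:
        return "LSL", imm5                       # imm5 == 0 means no shift
    if type_ == 0b01:
        return "LSR", 32 if imm5 == 0 else imm5  # 0 encodes a shift of 32
    if type_ == 0b10:
        return "ASR", 32 if imm5 == 0 else imm5  # 0 encodes a shift of 32
    if imm5 == 0:
        return "RRX", 1                          # type 0b11 with imm5 == 0
    return "ROR", imm5
```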

// DecodeRegShift()
// ================

SRType DecodeRegShift(bits(2) type)
    case type of
        when '00' shift_t = SRType_LSL;
        when '01' shift_t = SRType_LSR;
        when '10' shift_t = SRType_ASR;
        when '11' shift_t = SRType_ROR;
    return shift_t;

// Shift()
// =======

bits(N) Shift(bits(N) value, SRType type, integer amount, bit carry_in)
    (result, -) = Shift_C(value, type, amount, carry_in);
    return result;

// Shift_C()
// =========

(bits(N), bit) Shift_C(bits(N) value, SRType type, integer amount, bit carry_in)
    assert !(type == SRType_RRX && amount != 1);

    if amount == 0 then
        (result, carry_out) = (value, carry_in);
    else
        case type of
            when SRType_LSL
                (result, carry_out) = LSL_C(value, amount);
            when SRType_LSR
                (result, carry_out) = LSR_C(value, amount);
            when SRType_ASR
                (result, carry_out) = ASR_C(value, amount);
            when SRType_ROR
                (result, carry_out) = ROR_C(value, amount);
            when SRType_RRX
                (result, carry_out) = RRX_C(value, carry_in);

    return (result, carry_out);
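The helpers LSL_C(), LSR_C(), ASR_C(), ROR_C() and RRX_C() are defined elsewhere in the manual. The following Python sketch is one way to model Shift_C() for N = 32: each case returns the result and the last bit shifted out (for ROR, bit[31] of the result), amounts above 32 (possible for register-controlled shifts) are handled, and an amount of 0 passes the value and carry through unchanged. It is an illustration, not the manual's definition.

```python
MASK32 = 0xFFFFFFFF


def shift_c(value: int, kind: str, amount: int, carry_in: int) -> tuple[int, int]:
    """Return (result, carry_out) for a 32-bit shift; kind is LSL, LSR, ASR, ROR or RRX."""
    value &= MASK32
    assert not (kind == "RRX" and amount != 1)
    if amount == 0:
        return value, carry_in
    if kind == "LSL":
        extended = value << amount                  # value : Zeros(amount)
        return extended & MASK32, (extended >> 32) & 1
    if kind == "LSR":
        return value >> amount, (value >> (amount - 1)) & 1
    if kind == "ASR":
        signed = value - (1 << 32) if value & 0x80000000 else value
        # Python's >> on negative ints is arithmetic, so sign bits fill from the left.
        return (signed >> amount) & MASK32, (signed >> (amount - 1)) & 1
    if kind == "ROR":
        m = amount % 32
        result = ((value >> m) | (value << (32 - m))) & MASK32
        return result, result >> 31                 # carry_out = result<31>
    if kind == "RRX":
        return (carry_in << 31) | (value >> 1), value & 1
    raise ValueError(f"unknown shift type {kind!r}")
```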


A8.5 Memory accesses

Commonly, the following addressing modes are permitted for memory access instructions:

Offset addressing
    The offset value is applied to an address obtained from the base register. The result is used as the address for the memory access. The value of the base register is unchanged.
    The assembly language syntax for this mode is:
    [<Rn>, <offset>]

Pre-indexed addressing
    The offset value is applied to an address obtained from the base register. The result is used as the address for the memory access, and written back into the base register.
    The assembly language syntax for this mode is:
    [<Rn>, <offset>]!

Post-indexed addressing
    The address obtained from the base register is used, unchanged, as the address for the memory access. The offset value is applied to the address, and written back into the base register.
    The assembly language syntax for this mode is:
    [<Rn>], <offset>

In each case, <Rn> is the base register. <offset> can be:

• an immediate constant, such as <imm8> or <imm12>

• an index register, <Rm>

• a shifted index register, such as <Rm>, LSL #<imm>.
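The three modes differ only in which address is used for the access and what is written back. A minimal Python sketch, assuming 32-bit addresses and an offset that has already been evaluated (immediate, register, or shifted register, with its sign applied); the mode names are illustrative:

```python
MASK32 = 0xFFFFFFFF


def address_for_access(mode: str, base: int, offset: int) -> tuple[int, int]:
    """Return (address used for the access, value left in the base register)."""
    offset_addr = (base + offset) & MASK32
    if mode == "offset":      # [<Rn>, <offset>]   base unchanged
        return offset_addr, base
    if mode == "pre":         # [<Rn>, <offset>]!  offset address written back
        return offset_addr, offset_addr
    if mode == "post":        # [<Rn>], <offset>   access at base, then write back
        return base, offset_addr
    raise ValueError(f"unknown addressing mode {mode!r}")


# A shifted index register such as <Rm>, LSL #2 is evaluated first:
rm = 3
print([hex(x) for x in address_for_access("post", 0x1000, rm << 2)])  # ['0x1000', '0x100c']
```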

For information about unaligned access, endianness, and exclusive access, see:

• Alignment support on page A3-108

• Endian support on page A3-110

• Synchronization and semaphores on page A3-114.


A8.6 Encoding of lists of ARM core registers

A number of instructions operate on lists of ARM core registers. For these instructions, the assembler syntax includes a <registers> field that provides a list of the registers to be operated on, with list entries separated by commas.

The register list is encoded in the instruction encoding. Most often, this is done using an 8-bit, 13-bit, or 16-bit register_list field. This section gives more information about these and other possible register list encodings.

In a register_list field, each bit corresponds to a single register, and if the <registers> field of the assembler instruction includes Rt then register_list<t> is set to 1, otherwise it is set to 0.

The full rules for the encoding of lists of ARM core registers are:

• Except for the cases listed here, 16-bit Thumb encodings use an 8-bit register list, and can access only registers R0-R7.

  The exceptions to this rule are:

  — The T1 encoding of POP uses an 8-bit register list, and an additional bit, P, that corresponds to the PC. This means it can access any of R0-R7 and the PC.

  — The T1 encoding of PUSH uses an 8-bit register list, and an additional bit, M, that corresponds to the LR. This means it can access any of R0-R7 and the LR.

• 32-bit Thumb encodings of load operations use a 13-bit register list, and two additional bits, M, corresponding to the LR, and P, corresponding to the PC. This means these instructions can access any of R0-R12 and the LR and PC.

• 32-bit Thumb encodings of store operations use a 13-bit register list, and one additional bit, M, corresponding to the LR. This means these instructions can access any of R0-R12 and the LR.

• Except for the case listed here, ARM encodings use a 16-bit register list. This means these instructions can access any of R0-R12 and the SP, LR, and PC.

  The exception to this rule is:

  — The system instructions LDM (exception return) and LDM (User registers) use a 15-bit register list. This means these instructions can access any of R0-R12 and the SP and LR.

• The T3 and A2 encodings of POP, and the T3 and A2 encodings of PUSH, access a single register from the set of registers {R0-R12, LR, PC} and encode the register number in the Rt field.

Note

POP is a load operation, and PUSH is a store operation.

In every case, the encoding-specific pseudocode converts the register list into a 16-bit variable, registers, with a bit corresponding to each of the registers R0-R12, SP, LR, and PC.
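As an illustration of the register_list bit mapping, the following Python sketch builds the mask for a register list (bit t set for each Rt) and then splits it into the M bit and 8-bit list of the 16-bit PUSH T1 encoding. The helper names are hypothetical, and register names are assumed to be given as R0-R12, SP, LR and PC.

```python
# Register number for each name: R0-R12 are 0-12, then SP, LR and PC.
REG_NUMBERS = {**{f"R{i}": i for i in range(13)}, "SP": 13, "LR": 14, "PC": 15}


def registers_mask(names: list[str]) -> int:
    """Set bit t of the mask for every register Rt named in the list."""
    mask = 0
    for name in names:
        mask |= 1 << REG_NUMBERS[name.upper()]
    return mask


def push_t1_fields(names: list[str]) -> tuple[int, int]:
    """Return (M, register_list) for PUSH T1, which can only encode R0-R7 and LR."""
    mask = registers_mask(names)
    if mask & ~(0xFF | (1 << 14)):
        raise ValueError("PUSH T1 can only encode R0-R7 and LR")
    return (mask >> 14) & 1, mask & 0xFF
```

A list such as {R4, R5, LR} fits PUSH T1; a list containing R8 does not, so an assembler would have to fall back to a 32-bit encoding.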

Note

Some Floating-point and Advanced SIMD instructions operate on lists of Advanced SIMD and Floating-point extension registers. The assembler syntax of these instructions includes a <list> field that specifies the registers to be operated on, and the description of the instruction in Alphabetical list of instructions on page A8-300 defines the use and encoding of this field.


A8.7 Additional pseudocode support for instruction descriptions

Earlier sections of this chapter include pseudocode that describes features of the execution of ARM and Thumb instructions, see:

• Pseudocode details of conditional execution on page A8-289

• Pseudocode details of instruction-specified shifts and rotates on page A8-292

The following subsection gives additional pseudocode support functions for some of the instructions described in Alphabetical list of instructions on page A8-300:

A8.7.1 Pseudocode details of coprocessor operations

The Coproc_Accepted() pseudocode function determines whether a coprocessor instruction is accepted for execution.

// Coproc_Accepted()
// =================
// Determines whether the coprocessor instruction is accepted.

boolean Coproc_Accepted(integer cp_num, bits(32) instr)
    // Not called for CP10 and CP11 coprocessors
    assert !(cp_num IN {10,11});

    if !(cp_num IN {14,15}) then
        // Check against NSACR/CPACR/HCPTR
        if HaveSecurityExt() then
            // Check Non-Secure Access Control Register for permission to use cp_num.
            if !IsSecure() && NSACR<cp_num> == '0' then UNDEFINED;

        // Check Coprocessor Access Control Register for permission to use cp_num.
        if !HaveVirtExt() || !CurrentModeIsHyp() then
            case CPACR<2*cp_num+1:2*cp_num> of
                when '00' UNDEFINED;
                when '01' if !CurrentModeIsNotUser() then UNDEFINED;
                          // else CPACR permits access
                when '10' UNPREDICTABLE;
                when '11' // CPACR permits access

        if HaveSecurityExt() && HaveVirtExt() && !IsSecure() && HCPTR<cp_num> == '1' then
            HSRString = Zeros(25);
            HSRString<5> = '0';
            HSRString<3:0> = cp_num<3:0>;
            WriteHSR('000111', HSRString);
            if !CurrentModeIsHyp() then
                TakeHypTrapException();
            else
                UNDEFINED;

        return CPxInstrDecode(instr);

    elsif cp_num == 14 then
        // CP14 space
        // Unpack the basic classes based on Opc1
        if instr<27:24> == '1110' && instr<4> == '1' && instr<31:28> != '1111' then
            // MCR/MRC
            opc1 = UInt(instr<23:21>);
            two_reg = FALSE;
            if instr<15:12> == '1111' &&
               !(instr<23:16> == '00010000' && instr<7:0> == '00010001') then
                // every case using APSR except the DBGDSCRint
                UNPREDICTABLE;
        elsif instr<27:20> == '11000101' && instr<31:28> != '1111' then
            // MRRC
            opc1 = UInt(instr<7:4>);
            if opc1 != 0 then UNDEFINED;
            two_reg = TRUE;
        elsif instr<27:25> == '110' && instr<31:28> != '1111' && instr<22> == '0' then
            // LDC/STC
            opc1 = 0;   // only use of LDC/STC is for Debug
            if UInt(instr<15:12>) != 5 then UNDEFINED;
        else
            UNDEFINED;

        case opc1 of
            // Does not consider possible traps of Debug and Trace registers from
            // Non-secure modes to Hyp mode here.
            when 0 return CP14DebugInstrDecode(instr);
            when 1 return CP14TraceInstrDecode(instr);
            when 6
                // ThumbEE registers - fully decoded here
                if two_reg then UNDEFINED;
                if instr<7:5> != '000' || instr<3:1> != '000' ||
                   instr<15:12> == '1111' then
                    UNPREDICTABLE;
                else
                    if instr<0> == '0' then
                        if !CurrentModeIsNotUser() then UNDEFINED;
                    if instr<1> == '1' then
                        if !CurrentModeIsNotUser() && TEECR.XED == '1' then UNDEFINED;
                    if HaveSecurityExt() && HaveVirtExt() && !IsSecure() &&
                       !CurrentModeIsHyp() && HSTR.TTEE == '1' then
                        HSRString = Zeros(25);
                        HSRString<19:17> = instr<7:5>;
                        HSRString<16:14> = instr<23:21>;
                        HSRString<13:10> = instr<19:16>;
                        HSRString<8:5> = instr<15:12>;
                        HSRString<4:1> = instr<3:0>;
                        HSRString<0> = instr<20>;
                        WriteHSR('000101', HSRString);
                        TakeHypTrapException();
                    return TRUE;
            when 7 return CP14JazelleInstrDecode(instr);
            otherwise
                UNDEFINED;

    elsif cp_num == 15 then
        // Only MCR/MCRR/MRRC/MRC are supported in CP15
        if instr<27:24> == '1110' && instr<4> == '1' && instr<31:28> != '1111' then
            // MCR/MRC
            CrNnum = UInt(instr<19:16>);
            two_reg = FALSE;
            if instr<15:12> == '1111' then UNPREDICTABLE;   // don't support use of the PC
        elsif instr<27:21> == '1100010' && instr<31:28> != '1111' then
            // MCRR/MRRC
            CrNnum = UInt(instr<3:0>);
            two_reg = TRUE;
        else
            UNDEFINED;

        if CrNnum == 4 then UNPREDICTABLE;

        // Check for coarse-grained Hyp traps
        // Check against HSTR for PL1 accesses
        if HaveSecurityExt() && HaveVirtExt() && !IsSecure() && !CurrentModeIsHyp() &&
           CrNnum != 14 && HSTR<CrNnum> == '1' then
            if !CurrentModeIsNotUser() && InstrIsPL0Undefined(instr) then
                IMPLEMENTATION_CHOICE to be UNDEFINED;
            HSRString = Zeros(25);
            if two_reg then
                HSRString<19:16> = instr<7:4>;
                HSRString<13:10> = instr<19:16>;
                HSRString<8:5> = instr<15:12>;
                HSRString<4:1> = instr<3:0>;
                HSRString<0> = instr<20>;
                WriteHSR('000100', HSRString);
            else
                HSRString<19:17> = instr<7:5>;
                HSRString<16:14> = instr<23:21>;
                HSRString<13:10> = instr<19:16>;
                HSRString<8:5> = instr<15:12>;
                HSRString<4:1> = instr<3:0>;
                HSRString<0> = instr<20>;
                WriteHSR('000011', HSRString);
            TakeHypTrapException();

        // Check for TIDCP as a coarse-grain check for PL1 accesses
        if HaveSecurityExt() && HaveVirtExt() && !IsSecure() && !CurrentModeIsHyp() &&
           HCR.TIDCP == '1' && !two_reg then
            CrMnum = UInt(instr<3:0>);
            if (CrNnum == 9  && CrMnum IN {0,1,2,5,6,7,8}) ||
               (CrNnum == 10 && CrMnum IN {0,1,4,8}) ||
               (CrNnum == 11 && CrMnum IN {0,1,2,3,4,5,6,7,8,15}) then
                if !CurrentModeIsNotUser() && InstrIsPL0Undefined(instr) then
                    IMPLEMENTATION_CHOICE to be UNDEFINED;
                HSRString = Zeros(25);
                HSRString<19:17> = instr<7:5>;
                HSRString<16:14> = instr<23:21>;
                HSRString<13:10> = instr<19:16>;
                HSRString<8:5> = instr<15:12>;
                HSRString<4:1> = instr<3:0>;
                HSRString<0> = instr<20>;
                WriteHSR('000011', HSRString);
                TakeHypTrapException();

        return CP15InstrDecode(instr);
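The CPACR step of Coproc_Accepted() can be isolated as a small Python function. This sketch deliberately ignores the NSACR and HCPTR checks and the Hyp mode case, and models CurrentModeIsNotUser() as a hypothetical privileged flag:

```python
def cpacr_decision(cpacr: int, cp_num: int, privileged: bool) -> str:
    """Classify a coprocessor access using only the CPACR field for cp_num."""
    if cp_num in (10, 11, 14, 15):
        raise ValueError("CP10/CP11 and CP14/CP15 are not checked against CPACR here")
    field = (cpacr >> (2 * cp_num)) & 0b11  # CPACR<2*cp_num+1:2*cp_num>
    if field == 0b00:
        return "UNDEFINED"                               # no access
    if field == 0b01:
        return "ACCEPT" if privileged else "UNDEFINED"   # PL1 access only
    if field == 0b10:
        return "UNPREDICTABLE"
    return "ACCEPT"                                      # '11': access permitted
```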

The Coproc_DoneLoading() pseudocode function determines, for an LDC instruction, whether enough words have been loaded:

boolean Coproc_DoneLoading(integer cp_num, bits(32) instr)

The Coproc_DoneStoring() function determines, for an STC instruction, whether enough words have been stored:

boolean Coproc_DoneStoring(integer cp_num, bits(32) instr)

The Coproc_GetOneWord() function obtains the word for an MRC instruction from the coprocessor:

bits(32) Coproc_GetOneWord(integer cp_num, bits(32) instr)

The Coproc_GetTwoWords() function obtains the two words for an MRRC instruction from the coprocessor:

(bits(32), bits(32)) Coproc_GetTwoWords(integer cp_num, bits(32) instr)

Note

The relative significance of the two words returned is IMPLEMENTATION DEFINED, but all uses within this manual present the two words in the order (most significant, least significant).

The Coproc_GetWordToStore() function obtains the next word to store for an STC instruction from the coprocessor:

bits(32) Coproc_GetWordToStore(integer cp_num, bits(32) instr)

The Coproc_InternalOperation() procedure instructs a coprocessor to perform the internal operation requested by a CDP instruction:

Coproc_InternalOperation(integer cp_num, bits(32) instr)

The Coproc_SendLoadedWord() procedure sends a loaded word for an LDC instruction to the coprocessor:


Coproc_SendLoadedWord(bits(32) word, integer cp_num, bits(32) instr)

The Coproc_SendOneWord() procedure sends the word for an MCR instruction to the coprocessor:

Coproc_SendOneWord(bits(32) word, integer cp_num, bits(32) instr)

The Coproc_SendTwoWords() procedure sends the two words for an MCRR instruction to the coprocessor:

Coproc_SendTwoWords(bits(32) word2, bits(32) word1, integer cp_num, bits(32) instr)

Note

The relative significance of word2 and word1 is IMPLEMENTATION DEFINED, but all uses within this manual treat word2 as more significant than word1.
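Under the ordering convention used in this manual (word2, and the first word returned by Coproc_GetTwoWords(), are the more significant), a 64-bit coprocessor value relates to its two words as in this Python sketch. The helper names are illustrative, and the ordering for any real coprocessor remains IMPLEMENTATION DEFINED.

```python
MASK32 = 0xFFFFFFFF


def join_words(most_significant: int, least_significant: int) -> int:
    """Combine (word2, word1) into one 64-bit value, word2 in bits[63:32]."""
    return ((most_significant & MASK32) << 32) | (least_significant & MASK32)


def split_doubleword(value: int) -> tuple[int, int]:
    """Inverse of join_words: return (most significant, least significant) words."""
    return (value >> 32) & MASK32, value & MASK32
```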

The CPxInstrDecode() pseudocode function decodes an accepted access to a coprocessor other than CP10, CP11, CP14, or CP15:

boolean CPxInstrDecode(bits(32) instr)

The CP14DebugInstrDecode() pseudocode function decodes an accepted access to a CP14 debug register:

boolean CP14DebugInstrDecode(bits(32) instr)

The CP14JazelleInstrDecode() pseudocode function decodes an accepted access to a CP14 Jazelle register:

boolean CP14JazelleInstrDecode(bits(32) instr)

The CP14TraceInstrDecode() pseudocode function decodes an accepted access to a CP14 Trace register:

boolean CP14TraceInstrDecode(bits(32) instr)

The CP15InstrDecode() pseudocode function decodes an accepted access to a CP15 register:

boolean CP15InstrDecode(bits(32) instr)
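These decode functions, like Coproc_Accepted(), pick fields out of the 32-bit instruction word by bit position. For the ARM MCR/MRC encoding those positions are cond<31:28>, opc1<23:21>, the read/write bit<20>, CRn<19:16>, Rt<15:12>, coproc<11:8>, opc2<7:5> and CRm<3:0>, matching the instr<...> slices used in Coproc_Accepted(). A hypothetical Python field extractor:

```python
def decode_mcr_mrc(instr: int) -> dict[str, int]:
    """Split an ARM MCR/MRC instruction word into its fields."""
    if ((instr >> 24) & 0xF) != 0b1110 or ((instr >> 4) & 1) != 1:
        raise ValueError("not an MCR/MRC encoding")
    return {
        "cond": (instr >> 28) & 0xF,
        "opc1": (instr >> 21) & 0x7,
        "is_mrc": (instr >> 20) & 1,   # 1 = MRC (coprocessor to ARM core register)
        "crn": (instr >> 16) & 0xF,
        "rt": (instr >> 12) & 0xF,
        "coproc": (instr >> 8) & 0xF,
        "opc2": (instr >> 5) & 0x7,
        "crm": instr & 0xF,
    }


# MRC p15, 0, R0, c0, c0, 0 (a read of the Main ID Register) encodes as 0xEE100F10.
print(decode_mcr_mrc(0xEE100F10))
```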

A8.7.2 Calling the supervisor

The CallSupervisor() pseudocode function generates a Supervisor Call exception, after setting up the HSR if the exception must be taken to Hyp mode. Valid execution of the SVC instruction calls this function.

// CallSupervisor()
// ================
// Calls the Supervisor, with appropriate trapping etc

CallSupervisor(bits(16) immediate)
    if CurrentModeIsHyp() ||
       (HaveVirtExt() && !IsSecure() && !CurrentModeIsNotUser() && HCR.TGE == '1') then
        // will be taken to Hyp mode so must set HSR
        HSRString = Zeros(25);
        HSRString<15:0> = if CurrentCond() == '1110' then immediate else bits(16) UNKNOWN;
        WriteHSR('010001', HSRString);

    // This will go to Hyp mode if necessary
    TakeSVCException();
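The value that CallSupervisor() hands to WriteHSR() can be modelled in Python. This sketch assumes the HSR layout described elsewhere in this manual (EC in bits[31:26], IL in bit[25], and the 25-bit ISS, which is what HSRString holds, in bits[24:0]); hsr_value and svc_iss are illustrative names, and the UNKNOWN immediate of a conditional SVC is arbitrarily modelled as 0.

```python
def svc_iss(imm16: int, cond: int) -> int:
    """ISS for a Supervisor Call taken to Hyp mode: imm16 only if unconditional."""
    if cond == 0b1110:
        return imm16 & 0xFFFF
    return 0  # architecturally UNKNOWN; 0 is an arbitrary stand-in


def hsr_value(ec: int, il: int, iss: int) -> int:
    """Pack an exception class, instruction-length bit and ISS into a 32-bit HSR (assumed layout)."""
    return ((ec & 0x3F) << 26) | ((il & 1) << 25) | (iss & 0x1FFFFFF)


SVC_EC = 0b010001  # the exception class CallSupervisor() passes to WriteHSR()
print(hex(hsr_value(SVC_EC, 1, svc_iss(0x1234, 0b1110))))  # 0x46001234
```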


A8.8 Alphabetical list of instructions

This section lists every instruction. For details of the format used see Format of instruction descriptions on page A8-282.

This section is formatted so that a full description of an instruction uses a double page.

A8.8.1 ADC (immediate)

Add with Carry (immediate) adds an immediate value and the Carry flag value to a register value, and writes the result to the destination register. It can optionally update the condition flags based on the result.
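The arithmetic is performed by AddWithCarry(), which the Operation pseudocode for this instruction calls and which is defined with the integer arithmetic functions in A2.2.1. The following Python sketch models it for N-bit values: carry_out is set when the unsigned sum does not fit in N bits, and overflow is set when the signed sum does not. It is an illustration, not a transcription of the manual's pseudocode.

```python
def add_with_carry(x: int, y: int, carry_in: int, n: int = 32) -> tuple[int, int, int]:
    """Return (result, carry_out, overflow) for x + y + carry_in on n-bit values."""
    mask = (1 << n) - 1

    def sint(v: int) -> int:
        # Interpret the low n bits of v as a two's complement signed integer.
        v &= mask
        return v - (1 << n) if v >> (n - 1) else v

    unsigned_sum = (x & mask) + (y & mask) + carry_in
    signed_sum = sint(x) + sint(y) + carry_in
    result = unsigned_sum & mask
    carry_out = 0 if result == unsigned_sum else 1
    overflow = 0 if sint(result) == signed_sum else 1
    return result, carry_out, overflow
```

For example, 0x7FFFFFFF + 0 with the carry set gives 0x80000000 with overflow set and carry clear, because the signed result wraps but the unsigned one does not.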

Encoding T1   ARMv6T2, ARMv7

ADC{S}<c> <Rd>, <Rn>, #<const>

First halfword, bits:    15-11     | 10 | 9 | 8-5     | 4 | 3-0
                         1 1 1 1 0 | i  | 0 | 1 0 1 0 | S | Rn
Second halfword, bits:   15 | 14-12 | 11-8 | 7-0
                         0  | imm3  | Rd   | imm8

d = UInt(Rd); n = UInt(Rn); setflags = (S == '1'); imm32 = ThumbExpandImm(i:imm3:imm8);
if d IN {13,15} || n IN {13,15} then UNPREDICTABLE;

Encoding A1   ARMv4*, ARMv5T*, ARMv6*, ARMv7

ADC{S}<c> <Rd>, <Rn>, #<const>

Bits:   31-28 | 27-25 | 24-21   | 20 | 19-16 | 15-12 | 11-0
        cond  | 0 0 1 | 0 1 0 1 | S  | Rn    | Rd    | imm12

For the case when cond is 0b1111, see Unconditional instructions on page A5-216.

if Rd == '1111' && S == '1' then SEE SUBS PC, LR and related instructions;
d = UInt(Rd); n = UInt(Rn); setflags = (S == '1'); imm32 = ARMExpandImm(imm12);


Assembler syntax

ADC{S}{<c>}{<q>} {<Rd>,} <Rn>, #<const>

where:

S
If present, the instruction updates the flags. Otherwise, the flags are not updated.

<c>, <q>
See Standard assembler syntax fields on page A8-287.

<Rd>
The destination register. If S is specified and <Rd> is the PC, see SUBS PC, LR (Thumb) on page B9-2010 or SUBS PC, LR and related instructions (ARM) on page B9-2012.
In ARM instructions, if S is not specified and <Rd> is the PC, the instruction is a branch to the address calculated by the operation. This is an interworking branch, see Pseudocode details of operations on ARM core registers on page A2-47. ARM deprecates this use of the PC.
Note
Before ARMv7, this was a simple branch.

<Rn>
The first operand register. The PC can be used in ARM instructions. ARM deprecates this use of the PC.

<const>
The immediate value to be added to the value obtained from <Rn>. See Modified immediate constants in Thumb instructions on page A6-232 or Modified immediate constants in ARM instructions on page A5-200 for the range of values.

The pre-UAL syntax ADC<c>S is equivalent to ADCS<c>.

Operation

if ConditionPassed() then
    EncodingSpecificOperations();
    (result, carry, overflow) = AddWithCarry(R[n], imm32, APSR.C);
    if d == 15 then         // Can only occur for ARM encoding
        ALUWritePC(result); // setflags is always FALSE here
    else
        R[d] = result;
        if setflags then
            APSR.N = result<31>;
            APSR.Z = IsZeroBit(result);
            APSR.C = carry;
            APSR.V = overflow;

Exceptions

None.
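The ADC (immediate) decode uses two immediate expansions: ThumbExpandImm() for encoding T1 and ARMExpandImm() for encoding A1. The Python below is a sketch of both. It assumes they behave as described in the modified immediate constant sections on pages A6-232 and A5-200. Carry-out is not modelled, and UNPREDICTABLE encodings simply raise ValueError.

```python
# Sketch of the modified-immediate expansions used by ADC (immediate).
# Assumption: mirrors ThumbExpandImm() and ARMExpandImm() (pages A6-232 and
# A5-200); the carry-out variants are not modelled here.

MASK32 = 0xFFFFFFFF


def ror32(value: int, amount: int) -> int:
    """Rotate a 32-bit value right by amount bits (amount taken modulo 32)."""
    amount %= 32
    return ((value >> amount) | (value << (32 - amount))) & MASK32


def arm_expand_imm(imm12: int) -> int:
    """Encoding A1: imm8 rotated right by twice the 4-bit rotation field."""
    imm8 = imm12 & 0xFF
    rotation = (imm12 >> 8) & 0xF
    return ror32(imm8, 2 * rotation)


def thumb_expand_imm(imm12: int) -> int:
    """Encoding T1: imm12 is the concatenation i:imm3:imm8."""
    imm8 = imm12 & 0xFF
    if (imm12 >> 10) & 0b11 == 0:
        pattern = (imm12 >> 8) & 0b11
        if pattern != 0 and imm8 == 0:
            raise ValueError("UNPREDICTABLE: replicated pattern of zero")
        if pattern == 0b00:
            return imm8                          # 0x000000XY
        if pattern == 0b01:
            return (imm8 << 16) | imm8           # 0x00XY00XY
        if pattern == 0b10:
            return (imm8 << 24) | (imm8 << 8)    # 0xXY00XY00
        return imm8 * 0x01010101                 # 0xXYXYXYXY
    # Otherwise '1':imm12<6:0> is rotated right by imm12<11:7>
    unrotated = 0x80 | (imm12 & 0x7F)
    return ror32(unrotated, (imm12 >> 7) & 0x1F)
```

The Thumb scheme can produce replicated byte patterns such as 0xABABABAB, which the ARM rotate-only scheme cannot. The ARM scheme, in turn, can produce values such as 0x3FC (0xFF rotated right by 30) that the Thumb scheme reaches only through a different field value.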


A8.8.2 ADC (register)

Add with Carry (register) adds a register value, the Carry flag value, and an optionally-shifted register value, and writes the result to the destination register. It can optionally update the condition flags based on the result.

Encoding T1   ARMv4T, ARMv5T*, ARMv6*, ARMv7
ADCS <Rdn>, <Rm>        Outside IT block.
ADC<c> <Rdn>, <Rm>      Inside IT block.
Bits: [15:6] 0100000101  [5:3] Rm  [2:0] Rdn
d = UInt(Rdn); n = UInt(Rdn); m = UInt(Rm); setflags = !InITBlock();
(shift_t, shift_n) = (SRType_LSL, 0);

Encoding T2   ARMv6T2, ARMv7
ADC{S}<c>.W <Rd>, <Rn>, <Rm>{, <shift>}
Halfword 1: [15:5] 11101011010  [4] S  [3:0] Rn
Halfword 2: [15] (0)  [14:12] imm3  [11:8] Rd  [7:6] imm2  [5:4] type  [3:0] Rm
d = UInt(Rd); n = UInt(Rn); m = UInt(Rm); setflags = (S == ‘1’);
(shift_t, shift_n) = DecodeImmShift(type, imm3:imm2);
if d IN {13,15} || n IN {13,15} || m IN {13,15} then UNPREDICTABLE;

Encoding A1   ARMv4*, ARMv5T*, ARMv6*, ARMv7
ADC{S}<c> <Rd>, <Rn>, <Rm>{, <shift>}
Bits: [31:28] cond  [27:21] 0000101  [20] S  [19:16] Rn  [15:12] Rd  [11:7] imm5  [6:5] type  [4] 0  [3:0] Rm
For the case when cond is 0b1111, see Unconditional instructions on page A5-216.
if Rd == ‘1111’ && S == ‘1’ then SEE SUBS PC, LR and related instructions;
d = UInt(Rd); n = UInt(Rn); m = UInt(Rm); setflags = (S == ‘1’);
(shift_t, shift_n) = DecodeImmShift(type, imm5);


Assembler syntax

ADC{S}{<c>}{<q>} {<Rd>,} <Rn>, <Rm> {, <shift>}

where:

S
If present, the instruction updates the flags. Otherwise, the flags are not updated.

<c>, <q>
See Standard assembler syntax fields on page A8-287.

<Rd>
The destination register. If S is specified and <Rd> is the PC, see SUBS PC, LR (Thumb) on page B9-2010 or SUBS PC, LR and related instructions (ARM) on page B9-2012.
In ARM instructions, if S is not specified and <Rd> is the PC, the instruction is a branch to the address calculated by the operation. This is an interworking branch, see Pseudocode details of operations on ARM core registers on page A2-47. ARM deprecates this use of the PC.
Note
Before ARMv7, this was a simple branch.

<Rn>
The first operand register. The PC can be used in ARM instructions. ARM deprecates this use of the PC.

<Rm>
The optionally shifted second operand register. The PC can be used in ARM instructions. ARM deprecates this use of the PC.

<shift>
The shift to apply to the value read from <Rm>. If present, encoding T1 is not permitted. If absent, no shift is applied and any encoding is permitted. Shifts applied to a register on page A8-291 describes the shifts and how they are encoded.

In Thumb assembly:
• outside an IT block, if ADCS <Rd>, <Rn>, <Rd> has <Rd> and <Rn> both in the range R0-R7, it is assembled using encoding T1 as though ADCS <Rd>, <Rn> had been written.
• inside an IT block, if ADC<c> <Rd>, <Rn>, <Rd> has <Rd> and <Rn> both in the range R0-R7, it is assembled using encoding T1 as though ADC<c> <Rd>, <Rn> had been written.
To prevent either of these happening, use the .W qualifier.

The pre-UAL syntax ADC<c>S is equivalent to ADCS<c>.

Operation

if ConditionPassed() then
    EncodingSpecificOperations();
    shifted = Shift(R[m], shift_t, shift_n, APSR.C);
    (result, carry, overflow) = AddWithCarry(R[n], shifted, APSR.C);
    if d == 15 then         // Can only occur for ARM encoding
        ALUWritePC(result); // setflags is always FALSE here
    else
        R[d] = result;
        if setflags then
            APSR.N = result<31>;
            APSR.Z = IsZeroBit(result);
            APSR.C = carry;
            APSR.V = overflow;

Exceptions

None.
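A minimal Python model of AddWithCarry() shows why ADC is the building block for multi-word arithmetic. It assumes the definition used throughout this chapter, which compares the unsigned and signed sums against the truncated result. The add64() helper is hypothetical: it pairs an ADDS on the low words with an ADC on the high words, so the carry out of the low half feeds the high half.

```python
# Sketch of AddWithCarry() for N-bit operands, plus a hypothetical add64()
# helper showing the ADDS (low words) / ADC (high words) carry chain.


def add_with_carry(x: int, y: int, carry_in: int, n: int = 32):
    """Return (result, carry_out, overflow) as in the AddWithCarry() pseudocode."""
    mask = (1 << n) - 1

    def sint(v: int) -> int:
        # Interpret an n-bit value as a two's complement signed integer
        return v - (1 << n) if v >> (n - 1) else v

    unsigned_sum = (x & mask) + (y & mask) + carry_in
    signed_sum = sint(x & mask) + sint(y & mask) + carry_in
    result = unsigned_sum & mask
    carry_out = 0 if result == unsigned_sum else 1
    overflow = 0 if sint(result) == signed_sum else 1
    return result, carry_out, overflow


def add64(a: int, b: int) -> int:
    """Model ADDS lo, a_lo, b_lo followed by ADC hi, a_hi, b_hi."""
    lo, c, _ = add_with_carry(a & 0xFFFFFFFF, b & 0xFFFFFFFF, 0)  # ADDS sets APSR.C
    hi, _, _ = add_with_carry(a >> 32, b >> 32, c)                # ADC consumes APSR.C
    return (hi << 32) | lo
```

For example, `add_with_carry(0x7FFFFFFF, 1, 0)` returns `(0x80000000, 0, 1)`: no unsigned carry, but a signed overflow.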


A8.8.3 ADC (register-shifted register)

Add with Carry (register-shifted register) adds a register value, the Carry flag value, and a register-shifted register value. It writes the result to the destination register, and can optionally update the condition flags based on the result.

Encoding A1   ARMv4*, ARMv5T*, ARMv6*, ARMv7
ADC{S}<c> <Rd>, <Rn>, <Rm>, <type> <Rs>
Bits: [31:28] cond  [27:21] 0000101  [20] S  [19:16] Rn  [15:12] Rd  [11:8] Rs  [7] 0  [6:5] type  [4] 1  [3:0] Rm
For the case when cond is 0b1111, see Unconditional instructions on page A5-216.
d = UInt(Rd); n = UInt(Rn); m = UInt(Rm); s = UInt(Rs);
setflags = (S == ‘1’); shift_t = DecodeRegShift(type);
if d == 15 || n == 15 || m == 15 || s == 15 then UNPREDICTABLE;


Assembler syntax

ADC{S}{<c>}{<q>} {<Rd>,} <Rn>, <Rm>, <type> <Rs>

where:

S
If present, the instruction updates the flags. Otherwise, the flags are not updated.

<c>, <q>
See Standard assembler syntax fields on page A8-287.

<Rd>
The destination register.

<Rn>
The first operand register.

<Rm>
The register that is shifted and used as the second operand.

<type>
The type of shift to apply to the value read from <Rm>. It must be one of:
ASR   Arithmetic shift right, encoded as type = 0b10
LSL   Logical shift left, encoded as type = 0b00
LSR   Logical shift right, encoded as type = 0b01
ROR   Rotate right, encoded as type = 0b11

<Rs>
The register whose bottom byte contains the amount to shift by.

The pre-UAL syntax ADC<c>S is equivalent to ADCS<c>.

Operation

if ConditionPassed() then
    EncodingSpecificOperations();
    shift_n = UInt(R[s]<7:0>);
    shifted = Shift(R[m], shift_t, shift_n, APSR.C);
    (result, carry, overflow) = AddWithCarry(R[n], shifted, APSR.C);
    R[d] = result;
    if setflags then
        APSR.N = result<31>;
        APSR.Z = IsZeroBit(result);
        APSR.C = carry;
        APSR.V = overflow;

Exceptions

None.
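In the register-shifted register form, shift_n comes from the bottom byte of Rs, so shift amounts up to 255 are possible. The Python sketch below is an assumed model of Shift_C() for such amounts; it is not reproduced from this section. register_shifted_operand() is a hypothetical helper that applies the UInt(R[s]<7:0>) step first. The shift-type constants follow the <type> encodings listed above.

```python
# Assumed model of Shift_C() for 32-bit values and shift amounts 0-255, as
# reached through the register-shifted register forms.

MASK32 = 0xFFFFFFFF
LSL, LSR, ASR, ROR = 0b00, 0b01, 0b10, 0b11   # <type> encodings


def shift_c(value: int, shift_t: int, amount: int, carry_in: int):
    """Return (result, carry_out); an amount of 0 leaves both unchanged."""
    if amount == 0:
        return value, carry_in
    if shift_t == LSL:
        extended = value << amount
        return extended & MASK32, (extended >> 32) & 1   # last bit shifted out
    if shift_t == LSR:
        return (value >> amount) & MASK32, (value >> (amount - 1)) & 1
    if shift_t == ASR:
        signed = value - (1 << 32) if value & 0x80000000 else value
        return (signed >> amount) & MASK32, (signed >> (amount - 1)) & 1
    # ROR rotates by amount MOD 32; carry_out is bit 31 of the result
    m = amount % 32
    result = ((value >> m) | (value << (32 - m))) & MASK32
    return result, result >> 31


def register_shifted_operand(rm: int, rs: int, shift_t: int, carry_in: int):
    shift_n = rs & 0xFF                           # shift_n = UInt(R[s]<7:0>)
    return shift_c(rm, shift_t, shift_n, carry_in)
```

Only the bottom byte of Rs matters. For example, an Rs value of 0x104 shifts by 4, not by 260.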


A8.8.4 ADD (immediate, Thumb)

This instruction adds an immediate value to a register value, and writes the result to the destination register. It can optionally update the condition flags based on the result.

Encoding T1   ARMv4T, ARMv5T*, ARMv6*, ARMv7
ADDS <Rd>, <Rn>, #<imm3>      Outside IT block.
ADD<c> <Rd>, <Rn>, #<imm3>    Inside IT block.
Bits: [15:9] 0001110  [8:6] imm3  [5:3] Rn  [2:0] Rd
d = UInt(Rd); n = UInt(Rn); setflags = !InITBlock(); imm32 = ZeroExtend(imm3, 32);

Encoding T2   ARMv4T, ARMv5T*, ARMv6*, ARMv7
ADDS <Rdn>, #<imm8>           Outside IT block.
ADD<c> <Rdn>, #<imm8>         Inside IT block.
Bits: [15:11] 00110  [10:8] Rdn  [7:0] imm8
d = UInt(Rdn); n = UInt(Rdn); setflags = !InITBlock(); imm32 = ZeroExtend(imm8, 32);

Encoding T3   ARMv6T2, ARMv7
ADD{S}<c>.W <Rd>, <Rn>, #<const>
Halfword 1: [15:11] 11110  [10] i  [9:5] 01000  [4] S  [3:0] Rn
Halfword 2: [15] 0  [14:12] imm3  [11:8] Rd  [7:0] imm8
if Rd == ‘1111’ && S == ‘1’ then SEE CMN (immediate);
if Rn == ‘1101’ then SEE ADD (SP plus immediate);
d = UInt(Rd); n = UInt(Rn); setflags = (S == ‘1’); imm32 = ThumbExpandImm(i:imm3:imm8);
if d == 13 || (d == 15 && S == ‘0’) || n == 15 then UNPREDICTABLE;

Encoding T4   ARMv6T2, ARMv7
ADDW<c> <Rd>, <Rn>, #<imm12>
Halfword 1: [15:11] 11110  [10] i  [9:4] 100000  [3:0] Rn
Halfword 2: [15] 0  [14:12] imm3  [11:8] Rd  [7:0] imm8
if Rn == ‘1111’ then SEE ADR;
if Rn == ‘1101’ then SEE ADD (SP plus immediate);
d = UInt(Rd); n = UInt(Rn); setflags = FALSE; imm32 = ZeroExtend(i:imm3:imm8, 32);
if d IN {13,15} then UNPREDICTABLE;


Assembler syntax

ADD{S}{<c>}{<q>} {<Rd>,} <Rn>, #<const>    All encodings permitted
ADDW{<c>}{<q>} {<Rd>,} <Rn>, #<const>      Only encoding T4 permitted

where:

S
If present, the instruction updates the flags. Otherwise, the flags are not updated.

<c>, <q>
See Standard assembler syntax fields on page A8-287.

<Rd>
The destination register.

<Rn>
The first operand register. If <Rn> is SP, see ADD (SP plus immediate) on page A8-316. If <Rn> is the PC, see ADR on page A8-322.

<const>
The immediate value to be added to the value obtained from <Rn>. The range of values is 0-7 for encoding T1, 0-255 for encoding T2 and 0-4095 for encoding T4. See Modified immediate constants in Thumb instructions on page A6-232 for the range of values for encoding T3.

When multiple encodings of the same length are available for an instruction, encoding T3 is preferred to encoding T4 (if encoding T4 is required, use the ADDW syntax). Encoding T1 is preferred to encoding T2 if <Rd> is specified, and encoding T2 is preferred to encoding T1 if <Rd> is omitted.

The pre-UAL syntax ADD<c>S is equivalent to ADDS<c>.

Operation

if ConditionPassed() then
    EncodingSpecificOperations();
    (result, carry, overflow) = AddWithCarry(R[n], imm32, ‘0’);
    R[d] = result;
    if setflags then
        APSR.N = result<31>;
        APSR.Z = IsZeroBit(result);
        APSR.C = carry;
        APSR.V = overflow;

Exceptions

None.


A8.8.5 ADD (immediate, ARM)

This instruction adds an immediate value to a register value, and writes the result to the destination register. It can optionally update the condition flags based on the result.

Encoding A1   ARMv4*, ARMv5T*, ARMv6*, ARMv7
ADD{S}<c> <Rd>, <Rn>, #<const>
Bits: [31:28] cond  [27:21] 0010100  [20] S  [19:16] Rn  [15:12] Rd  [11:0] imm12
For the case when cond is 0b1111, see Unconditional instructions on page A5-216.
if Rn == ‘1111’ && S == ‘0’ then SEE ADR;
if Rn == ‘1101’ then SEE ADD (SP plus immediate);
if Rd == ‘1111’ && S == ‘1’ then SEE SUBS PC, LR and related instructions;
d = UInt(Rd); n = UInt(Rn); setflags = (S == ‘1’); imm32 = ARMExpandImm(imm12);


Assembler syntax

ADD{S}{<c>}{<q>} {<Rd>,} <Rn>, #<const>

where:

S
If present, the instruction updates the flags. Otherwise, the flags are not updated.

<c>, <q>
See Standard assembler syntax fields on page A8-287.

<Rd>
The destination register. If S is specified and <Rd> is the PC, see SUBS PC, LR and related instructions (ARM) on page B9-2012.
If S is not specified and <Rd> is the PC, the instruction is a branch to the address calculated by the operation. This is an interworking branch, see Pseudocode details of operations on ARM core registers on page A2-47.
Note
Before ARMv7, this was a simple branch.

<Rn>
The first operand register. If the SP is specified for <Rn>, see ADD (SP plus immediate) on page A8-316. If the PC is specified for <Rn>, see ADR on page A8-322.

<const>
The immediate value to be added to the value obtained from <Rn>. See Modified immediate constants in ARM instructions on page A5-200 for the range of values.

The pre-UAL syntax ADD<c>S is equivalent to ADDS<c>.

Operation

if ConditionPassed() then
    EncodingSpecificOperations();
    (result, carry, overflow) = AddWithCarry(R[n], imm32, ‘0’);
    if d == 15 then
        ALUWritePC(result); // setflags is always FALSE here
    else
        R[d] = result;
        if setflags then
            APSR.N = result<31>;
            APSR.Z = IsZeroBit(result);
            APSR.C = carry;
            APSR.V = overflow;

Exceptions

None.
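ARMExpandImm() rotates an 8-bit value right by an even amount. An assembler can therefore test whether a constant is encodable in A1 by trying all 16 rotations. This Python sketch is an illustration only. encode_arm_imm() is a hypothetical helper that returns the first imm12 that works (the smallest rotation), and a real assembler might choose differently when several encodings exist.

```python
# Hypothetical inverse of ARMExpandImm(): find an imm12 field whose
# expansion equals a given 32-bit constant, or report that none exists.

MASK32 = 0xFFFFFFFF


def rol32(value: int, amount: int) -> int:
    """Rotate a 32-bit value left by amount bits (amount taken modulo 32)."""
    amount %= 32
    return ((value << amount) | (value >> (32 - amount))) & MASK32


def encode_arm_imm(const: int):
    """Return imm12 such that ARMExpandImm(imm12) == const, or None."""
    const &= MASK32
    for rotation in range(16):
        imm8 = rol32(const, 2 * rotation)   # undo a right rotation by 2*rotation
        if imm8 <= 0xFF:
            return (rotation << 8) | imm8
    return None
```

For example, 0xFF000000 encodes as imm12 = 0x4FF. 0x101 has no encoding, because its set bits span 9 bit positions.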


A8.8.6 ADD (register, Thumb)

This instruction adds a register value and an optionally-shifted register value, and writes the result to the destination register. It can optionally update the condition flags based on the result.

Encoding T1   ARMv4T, ARMv5T*, ARMv6*, ARMv7
ADDS <Rd>, <Rn>, <Rm>       Outside IT block.
ADD<c> <Rd>, <Rn>, <Rm>     Inside IT block.
Bits: [15:9] 0001100  [8:6] Rm  [5:3] Rn  [2:0] Rd
d = UInt(Rd); n = UInt(Rn); m = UInt(Rm); setflags = !InITBlock();
(shift_t, shift_n) = (SRType_LSL, 0);

Encoding T2   ARMv6T2, ARMv7 if <Rdn> and <Rm> are both from R0-R7
              ARMv4T, ARMv5T*, ARMv6*, ARMv7 otherwise
ADD<c> <Rdn>, <Rm>          If <Rdn> is the PC, must be outside or last in IT block.
Bits: [15:8] 01000100  [7] DN  [6:3] Rm  [2:0] Rdn
if (DN:Rdn) == ‘1101’ || Rm == ‘1101’ then SEE ADD (SP plus register);
d = UInt(DN:Rdn); n = d; m = UInt(Rm); setflags = FALSE; (shift_t, shift_n) = (SRType_LSL, 0);
if n == 15 && m == 15 then UNPREDICTABLE;
if d == 15 && InITBlock() && !LastInITBlock() then UNPREDICTABLE;

Encoding T3   ARMv6T2, ARMv7
ADD{S}<c>.W <Rd>, <Rn>, <Rm>{, <shift>}
Halfword 1: [15:5] 11101011000  [4] S  [3:0] Rn
Halfword 2: [15] (0)  [14:12] imm3  [11:8] Rd  [7:6] imm2  [5:4] type  [3:0] Rm
if Rd == ‘1111’ && S == ‘1’ then SEE CMN (register);
if Rn == ‘1101’ then SEE ADD (SP plus register);
d = UInt(Rd); n = UInt(Rn); m = UInt(Rm); setflags = (S == ‘1’);
(shift_t, shift_n) = DecodeImmShift(type, imm3:imm2);
if d == 13 || (d == 15 && S == ‘0’) || n == 15 || m IN {13,15} then UNPREDICTABLE;


Assembler syntax

ADD{S}{<c>}{<q>} {<Rd>,} <Rn>, <Rm> {, <shift>}

where:

S
If present, the instruction updates the flags. Otherwise, the flags are not updated.

<c>, <q>
See Standard assembler syntax fields on page A8-287.

<Rd>
The destination register. If S is specified and <Rd> is the PC, see CMN (register) on page A8-366. If omitted, <Rd> is the same as <Rn> and encoding T2 is preferred to encoding T1 inside an IT block. If <Rd> is present, encoding T1 is preferred to encoding T2.
If <Rd> is the PC and S is not specified, encoding T2 is used and the instruction is a branch to the address calculated by the operation. This is a simple branch, see Pseudocode details of operations on ARM core registers on page A2-47.

<Rn>
The first operand register. The PC can be used in encoding T2. If <Rn> is SP, see ADD (SP plus register, Thumb) on page A8-318.

<Rm>
The register that is optionally shifted and used as the second operand. The PC can be used in encoding T2.

<shift>
The shift to apply to the value read from <Rm>. If present, only encoding T3 is permitted. If omitted, no shift is applied and any encoding is permitted. Shifts applied to a register on page A8-291 describes the shifts and how they are encoded.

Inside an IT block, if ADD<c> <Rd>, <Rn>, <Rd> cannot be assembled using encoding T1, it is assembled using encoding T2 as though ADD<c> <Rd>, <Rn> had been written. To prevent this happening, use the .W qualifier.

The pre-UAL syntax ADD<c>S is equivalent to ADDS<c>.

Operation

if ConditionPassed() then
    EncodingSpecificOperations();
    shifted = Shift(R[m], shift_t, shift_n, APSR.C);
    (result, carry, overflow) = AddWithCarry(R[n], shifted, ‘0’);
    if d == 15 then
        ALUWritePC(result); // setflags is always FALSE here
    else
        R[d] = result;
        if setflags then
            APSR.N = result<31>;
            APSR.Z = IsZeroBit(result);
            APSR.C = carry;
            APSR.V = overflow;

Exceptions

None.
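Encoding T2 is the only 16-bit form that reaches R8-R15. It does this by splitting the 4-bit destination into DN:Rdn. The Python helpers below are a hypothetical sketch that packs T1 and T2 from their field layouts. They do not check the SP, PC, or IT-block restrictions listed in the decode pseudocode.

```python
# Hypothetical packers for the 16-bit ADD (register, Thumb) encodings.


def encode_add_reg_t1(rd: int, rn: int, rm: int) -> int:
    """ADDS <Rd>, <Rn>, <Rm>: bits 0001100 | Rm | Rn | Rd (R0-R7 only)."""
    if not all(0 <= r <= 7 for r in (rd, rn, rm)):
        raise ValueError("encoding T1 needs R0-R7")
    return (0b0001100 << 9) | (rm << 6) | (rn << 3) | rd


def encode_add_reg_t2(rdn: int, rm: int) -> int:
    """ADD <Rdn>, <Rm>: bits 01000100 | DN | Rm | Rdn<2:0>."""
    if not (0 <= rdn <= 15 and 0 <= rm <= 15):
        raise ValueError("register number out of range")
    dn = rdn >> 3                      # top bit of the 4-bit destination
    return (0b01000100 << 8) | (dn << 7) | (rm << 3) | (rdn & 0b111)
```

For example, `encode_add_reg_t2(8, 1)` gives `0x4488` (ADD R8, R1). Here DN = 1 supplies the high bit of R8.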


A8.8.7 ADD (register, ARM)

This instruction adds a register value and an optionally-shifted register value, and writes the result to the destination register. It can optionally update the condition flags based on the result.

Encoding A1   ARMv4*, ARMv5T*, ARMv6*, ARMv7
ADD{S}<c> <Rd>, <Rn>, <Rm>{, <shift>}
Bits: [31:28] cond  [27:21] 0000100  [20] S  [19:16] Rn  [15:12] Rd  [11:7] imm5  [6:5] type  [4] 0  [3:0] Rm
For the case when cond is 0b1111, see Unconditional instructions on page A5-216.
if Rd == ‘1111’ && S == ‘1’ then SEE SUBS PC, LR and related instructions;
if Rn == ‘1101’ then SEE ADD (SP plus register);
d = UInt(Rd); n = UInt(Rn); m = UInt(Rm); setflags = (S == ‘1’);
(shift_t, shift_n) = DecodeImmShift(type, imm5);


Assembler syntax

ADD{S}{<c>}{<q>} {<Rd>,} <Rn>, <Rm> {, <shift>}

where:

S
If present, the instruction updates the flags. Otherwise, the flags are not updated.

<c>, <q>
See Standard assembler syntax fields on page A8-287.

<Rd>
The destination register. If S is specified and <Rd> is the PC, see SUBS PC, LR and related instructions (ARM) on page B9-2012. If omitted, <Rd> is the same as <Rn>.
If <Rd> is the PC and S is not specified, the instruction is a branch to the address calculated by the operation. This is an interworking branch, see Pseudocode details of operations on ARM core registers on page A2-47.
Note
Before ARMv7, this was a simple branch.

<Rn>
The first operand register. The PC can be used. If <Rn> is SP, see ADD (SP plus register, ARM) on page A8-320.

<Rm>
The register that is optionally shifted and used as the second operand. The PC can be used.

<shift>
The shift to apply to the value read from <Rm>. If omitted, no shift is applied. Shifts applied to a register on page A8-291 describes the shifts and how they are encoded.

The pre-UAL syntax ADD<c>S is equivalent to ADDS<c>.

Operation

if ConditionPassed() then
    EncodingSpecificOperations();
    shifted = Shift(R[m], shift_t, shift_n, APSR.C);
    (result, carry, overflow) = AddWithCarry(R[n], shifted, ‘0’);
    if d == 15 then
        ALUWritePC(result); // setflags is always FALSE here
    else
        R[d] = result;
        if setflags then
            APSR.N = result<31>;
            APSR.Z = IsZeroBit(result);
            APSR.C = carry;
            APSR.V = overflow;

Exceptions

None.
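The immediate-shift register forms rely on DecodeImmShift(), which is defined in Shifts applied to a register on page A8-291 rather than in this section. The Python sketch below assumes its usual behaviour: an amount field of zero means 32 for LSR and ASR, and selects RRX for ROR. On that basis it decodes the fields of an A1 word. decode_add_reg_a1() is a hypothetical helper.

```python
# Sketch of DecodeImmShift() and a hypothetical field decoder for the
# ADD (register, ARM) A1 encoding.


def decode_imm_shift(shift_type: int, imm5: int):
    """Return (shift_t, shift_n) for a 2-bit type field and a 5-bit amount."""
    if shift_type == 0b00:
        return "LSL", imm5
    if shift_type == 0b01:
        return "LSR", 32 if imm5 == 0 else imm5
    if shift_type == 0b10:
        return "ASR", 32 if imm5 == 0 else imm5
    # type 0b11: a zero amount selects RRX (rotate right by one through carry)
    return ("RRX", 1) if imm5 == 0 else ("ROR", imm5)


def decode_add_reg_a1(word: int) -> dict:
    """Split an ADD (register) A1 word into its fields."""
    if ((word >> 21) & 0x7F) != 0b0000100 or ((word >> 4) & 1) != 0:
        raise ValueError("not an ADD (register) A1 encoding")
    return {
        "cond": word >> 28,
        "S": (word >> 20) & 1,
        "Rn": (word >> 16) & 0xF,
        "Rd": (word >> 12) & 0xF,
        "Rm": word & 0xF,
        "shift": decode_imm_shift((word >> 5) & 0b11, (word >> 7) & 0x1F),
    }
```

For example, 0xE0810102 decodes as ADD R0, R1, R2, LSL #2 with the AL condition.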


A8.8.8 ADD (register-shifted register)

Add (register-shifted register) adds a register value and a register-shifted register value. It writes the result to the destination register, and can optionally update the condition flags based on the result.

Encoding A1   ARMv4*, ARMv5T*, ARMv6*, ARMv7
ADD{S}<c> <Rd>, <Rn>, <Rm>, <type> <Rs>
Bits: [31:28] cond  [27:21] 0000100  [20] S  [19:16] Rn  [15:12] Rd  [11:8] Rs  [7] 0  [6:5] type  [4] 1  [3:0] Rm
For the case when cond is 0b1111, see Unconditional instructions on page A5-216.
d = UInt(Rd); n = UInt(Rn); m = UInt(Rm); s = UInt(Rs);
setflags = (S == ‘1’); shift_t = DecodeRegShift(type);
if d == 15 || n == 15 || m == 15 || s == 15 then UNPREDICTABLE;


Assembler syntax

ADD{S}{<c>}{<q>} {<Rd>,} <Rn>, <Rm>, <type> <Rs>

where:

S
If present, the instruction updates the flags. Otherwise, the flags are not updated.

<c>, <q>
See Standard assembler syntax fields on page A8-287.

<Rd>
The destination register.

<Rn>
The first operand register.

<Rm>
The register that is shifted and used as the second operand.

<type>
The type of shift to apply to the value read from <Rm>. It must be one of:
ASR   Arithmetic shift right, encoded as type = 0b10
LSL   Logical shift left, encoded as type = 0b00
LSR   Logical shift right, encoded as type = 0b01
ROR   Rotate right, encoded as type = 0b11

<Rs>
The register whose bottom byte contains the amount to shift by.

The pre-UAL syntax ADD<c>S is equivalent to ADDS<c>.

Operation

if ConditionPassed() then
    EncodingSpecificOperations();
    shift_n = UInt(R[s]<7:0>);
    shifted = Shift(R[m], shift_t, shift_n, APSR.C);
    (result, carry, overflow) = AddWithCarry(R[n], shifted, ‘0’);
    R[d] = result;
    if setflags then
        APSR.N = result<31>;
        APSR.Z = IsZeroBit(result);
        APSR.C = carry;
        APSR.V = overflow;

Exceptions

None.
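To illustrate the A1 field layout, the Python below packs an ADD (register-shifted register) word. encode_add_rsr_a1() and its argument names are hypothetical. It rejects R15 in any register field, matching the UNPREDICTABLE check in the decode, and it defaults to the AL condition (0b1110).

```python
# Hypothetical packer for ADD{S}<c> <Rd>, <Rn>, <Rm>, <type> <Rs> (encoding A1).

SHIFT_TYPES = {"LSL": 0b00, "LSR": 0b01, "ASR": 0b10, "ROR": 0b11}


def encode_add_rsr_a1(rd: int, rn: int, rm: int, shift: str, rs: int,
                      s: int = 0, cond: int = 0b1110) -> int:
    """Place each field at the bit position shown in the A1 diagram."""
    if 15 in (rd, rn, rm, rs):
        raise ValueError("UNPREDICTABLE: R15 in a register field")
    return ((cond << 28) | (0b0000100 << 21) | (s << 20) | (rn << 16)
            | (rd << 12) | (rs << 8) | (SHIFT_TYPES[shift] << 5)
            | (1 << 4) | rm)          # bit 7 is 0 and bit 4 is 1 for this form
```

For example, `encode_add_rsr_a1(0, 1, 2, "LSL", 3)` gives `0xE0810312`, the word for ADD R0, R1, R2, LSL R3.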


A8.8.9 ADD (SP plus immediate)

This instruction adds an immediate value to the SP value, and writes the result to the destination register.

Encoding T1   ARMv4T, ARMv5T*, ARMv6*, ARMv7
ADD<c> <Rd>, SP, #<imm>
Bits: [15:11] 10101  [10:8] Rd  [7:0] imm8
d = UInt(Rd); setflags = FALSE; imm32 = ZeroExtend(imm8:‘00’, 32);

Encoding T2   ARMv4T, ARMv5T*, ARMv6*, ARMv7
ADD<c> SP, SP, #<imm>
Bits: [15:7] 101100000  [6:0] imm7
d = 13; setflags = FALSE; imm32 = ZeroExtend(imm7:‘00’, 32);

Encoding T3   ARMv6T2, ARMv7
ADD{S}<c>.W <Rd>, SP, #<const>
Halfword 1: [15:11] 11110  [10] i  [9:5] 01000  [4] S  [3:0] 1101
Halfword 2: [15] 0  [14:12] imm3  [11:8] Rd  [7:0] imm8
if Rd == ‘1111’ && S == ‘1’ then SEE CMN (immediate);
d = UInt(Rd); setflags = (S == ‘1’); imm32 = ThumbExpandImm(i:imm3:imm8);
if d == 15 && S == ‘0’ then UNPREDICTABLE;

Encoding T4   ARMv6T2, ARMv7
ADDW<c> <Rd>, SP, #<imm12>
Halfword 1: [15:11] 11110  [10] i  [9:4] 100000  [3:0] 1101
Halfword 2: [15] 0  [14:12] imm3  [11:8] Rd  [7:0] imm8
d = UInt(Rd); setflags = FALSE; imm32 = ZeroExtend(i:imm3:imm8, 32);
if d == 15 then UNPREDICTABLE;

Encoding A1   ARMv4*, ARMv5T*, ARMv6*, ARMv7
ADD{S}<c> <Rd>, SP, #<const>
Bits: [31:28] cond  [27:21] 0010100  [20] S  [19:16] 1101  [15:12] Rd  [11:0] imm12
For the case when cond is 0b1111, see Unconditional instructions on page A5-216.
if Rd == ‘1111’ && S == ‘1’ then SEE SUBS PC, LR and related instructions;
d = UInt(Rd); setflags = (S == ‘1’); imm32 = ARMExpandImm(imm12);


Assembler syntax

ADD{S}{<c>}{<q>} {<Rd>,} SP, #<const>    All encodings permitted
ADDW{<c>}{<q>} {<Rd>,} SP, #<const>      Only encoding T4 is permitted

where:

S
If present, the instruction updates the flags. Otherwise, the flags are not updated.

<c>, <q>
See Standard assembler syntax fields on page A8-287.

<Rd>
The destination register. If S is specified and <Rd> is the PC, see SUBS PC, LR (Thumb) on page B9-2010 or SUBS PC, LR and related instructions (ARM) on page B9-2012. If omitted, <Rd> is SP.
In ARM instructions, if S is not specified and <Rd> is the PC, the instruction is a branch to the address calculated by the operation. This is an interworking branch, see Pseudocode details of operations on ARM core registers on page A2-47.
Note
Before ARMv7, this was a simple branch.

<const>
The immediate value to be added to the value obtained from SP. Values are multiples of 4 in the range 0-1020 for encoding T1, multiples of 4 in the range 0-508 for encoding T2 and any value in the range 0-4095 for encoding T4. See Modified immediate constants in Thumb instructions on page A6-232 or Modified immediate constants in ARM instructions on page A5-200 for the range of values for encodings T3 and A1.

When both 32-bit encodings are available for an instruction, encoding T3 is preferred to encoding T4.
Note
If encoding T4 is required, use the ADDW syntax.

The pre-UAL syntax ADD<c>S is equivalent to ADDS<c>.

Operation

if ConditionPassed() then
    EncodingSpecificOperations();
    (result, carry, overflow) = AddWithCarry(SP, imm32, ‘0’);
    if d == 15 then         // Can only occur for ARM encoding
        ALUWritePC(result); // setflags is always FALSE here
    else
        R[d] = result;
        if setflags then
            APSR.N = result<31>;
            APSR.Z = IsZeroBit(result);
            APSR.C = carry;
            APSR.V = overflow;

Exceptions

None.
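Both 16-bit SP encodings store the offset divided by 4, because the decode forms imm32 as immN:'00'. The Python helpers below are a hypothetical sketch that packs T1 and T2 with the range checks given in the <const> description.

```python
# Hypothetical packers for the 16-bit ADD (SP plus immediate) encodings.
# The stored field is imm >> 2, so only multiples of 4 are encodable.


def encode_add_rd_sp_t1(rd: int, imm: int) -> int:
    """ADD <Rd>, SP, #<imm>: bits 10101 | Rd | imm8, imm in 0-1020 step 4."""
    if not (0 <= rd <= 7 and 0 <= imm <= 1020 and imm % 4 == 0):
        raise ValueError("not encodable as T1")
    return (0b10101 << 11) | (rd << 8) | (imm >> 2)


def encode_add_sp_sp_t2(imm: int) -> int:
    """ADD SP, SP, #<imm>: bits 101100000 | imm7, imm in 0-508 step 4."""
    if not (0 <= imm <= 508 and imm % 4 == 0):
        raise ValueError("not encodable as T2")
    return (0b101100000 << 7) | (imm >> 2)
```

For example, `encode_add_sp_sp_t2(8)` gives `0xB002`, and `encode_add_rd_sp_t1(0, 16)` gives `0xA804`. An offset such as 6 needs encoding T3, T4 or A1.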


A8.8.10 ADD (SP plus register, Thumb)

This instruction adds an optionally-shifted register value to the SP value, and writes the result to the destination register.

Encoding T1   ARMv4T, ARMv5T*, ARMv6*, ARMv7
ADD<c> <Rdm>, SP, <Rdm>
Bits: [15:8] 01000100  [7] DM  [6:3] 1101  [2:0] Rdm
d = UInt(DM:Rdm); m = UInt(DM:Rdm); setflags = FALSE;
if d == 15 && InITBlock() && !LastInITBlock() then UNPREDICTABLE;
(shift_t, shift_n) = (SRType_LSL, 0);

Encoding T2   ARMv4T, ARMv5T*, ARMv6*, ARMv7
ADD<c> SP, <Rm>
Bits: [15:7] 010001001  [6:3] Rm  [2:0] 101
if Rm == ‘1101’ then SEE encoding T1;
d = 13; m = UInt(Rm); setflags = FALSE;
(shift_t, shift_n) = (SRType_LSL, 0);

Encoding T3   ARMv6T2, ARMv7
ADD{S}<c>.W <Rd>, SP, <Rm>{, <shift>}
Halfword 1: [15:5] 11101011000  [4] S  [3:0] 1101
Halfword 2: [15] (0)  [14:12] imm3  [11:8] Rd  [7:6] imm2  [5:4] type  [3:0] Rm
if Rd == ‘1111’ && S == ‘1’ then SEE CMN (register);
d = UInt(Rd); m = UInt(Rm); setflags = (S == ‘1’);
(shift_t, shift_n) = DecodeImmShift(type, imm3:imm2);
if d == 13 && (shift_t != SRType_LSL || shift_n > 3) then UNPREDICTABLE;
if (d == 15 && S == ‘0’) || m IN {13,15} then UNPREDICTABLE;


Assembler syntax

ADD{S}{<c>}{<q>} {<Rd>,} SP, <Rm>{, <shift>}

where:

S
If present, the instruction updates the flags. Otherwise, the flags are not updated.

<c>, <q>
See Standard assembler syntax fields on page A8-287.

<Rd>
The destination register. If S is specified and <Rd> is the PC, see CMN (register) on page A8-366.
This register can be SP. If omitted, <Rd> is SP. This register can be the PC, but if it is, encoding T3 is not permitted. ARM deprecates using the PC.
If <Rd> is the PC and S is not specified, encoding T1 is used and the instruction is a branch to the address calculated by the operation. This is a simple branch, see Pseudocode details of operations on ARM core registers on page A2-47.

<Rm>
The register that is optionally shifted and used as the second operand. This register can be the PC, but if it is, encoding T3 is not permitted. ARM deprecates using the PC. This register can be the SP, but:
• ARM deprecates using the SP
• only encoding T1 is available and so the instruction can only be ADD SP, SP, SP.

<shift>
The shift to apply to the value read from <Rm>. If omitted, no shift is applied and any encoding is permitted. If present, only encoding T3 is permitted. Shifts applied to a register on page A8-291 describes the shifts and how they are encoded.
If <Rd> is SP or omitted, <shift> is only permitted to be omitted, LSL #1, LSL #2, or LSL #3.

The pre-UAL syntax ADD<c>S is equivalent to ADDS<c>.

Operation

if ConditionPassed() then
    EncodingSpecificOperations();
    shifted = Shift(R[m], shift_t, shift_n, APSR.C);
    (result, carry, overflow) = AddWithCarry(SP, shifted, ‘0’);
    if d == 15 then
        ALUWritePC(result); // setflags is always FALSE here
    else
        R[d] = result;
        if setflags then
            APSR.N = result<31>;
            APSR.Z = IsZeroBit(result);
            APSR.C = carry;
            APSR.V = overflow;

Exceptions

None.
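The T3 decode adds two UNPREDICTABLE conditions on top of the usual register checks. The Python predicate below is a hypothetical assembler-side check that mirrors those two lines. It assumes the caller has already applied the SEE CMN (register) redirection for Rd == PC with S set.

```python
# Hypothetical operand check mirroring the two UNPREDICTABLE lines of the
# ADD (SP plus register) encoding T3 decode.


def t3_sp_operands_ok(rd: int, rm: int, s: int, shift_t: str, shift_n: int) -> bool:
    """Return False where the encoding T3 decode would be UNPREDICTABLE."""
    # if d == 13 && (shift_t != SRType_LSL || shift_n > 3) then UNPREDICTABLE;
    if rd == 13 and (shift_t != "LSL" or shift_n > 3):
        return False
    # if (d == 15 && S == '0') || m IN {13,15} then UNPREDICTABLE;
    if (rd == 15 and s == 0) or rm in (13, 15):
        return False
    return True
```

This is the decode-level reason why, when the destination is SP, only LSL #0 to LSL #3 can be written.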


A8.8.11 ADD (SP plus register, ARM)

This instruction adds an optionally-shifted register value to the SP value, and writes the result to the destination register.

Encoding A1   ARMv4*, ARMv5T*, ARMv6*, ARMv7
ADD{S}<c> <Rd>, SP, <Rm>{, <shift>}
Bits: [31:28] cond  [27:21] 0000100  [20] S  [19:16] 1101  [15:12] Rd  [11:7] imm5  [6:5] type  [4] 0  [3:0] Rm
For the case when cond is 0b1111, see Unconditional instructions on page A5-216.
if Rd == ‘1111’ && S == ‘1’ then SEE SUBS PC, LR and related instructions;
d = UInt(Rd); m = UInt(Rm); setflags = (S == ‘1’);
(shift_t, shift_n) = DecodeImmShift(type, imm5);


Assembler syntax

ADD{S}{<c>}{<q>} {<Rd>,} SP, <Rm>{, <shift>}

where:

S
If present, the instruction updates the flags. Otherwise, the flags are not updated.

<c>, <q>
See Standard assembler syntax fields on page A8-287.

<Rd>
The destination register. If S is specified and <Rd> is the PC, see SUBS PC, LR and related instructions (ARM) on page B9-2012. This register can be SP. If omitted, <Rd> is SP. This register can be the PC, but ARM deprecates using the PC.
If S is not specified and <Rd> is the PC, the instruction is a branch to the address calculated by the operation. This is an interworking branch, see Pseudocode details of operations on ARM core registers on page A2-47. ARM deprecates this use of the PC.
Note
Before ARMv7, this was a simple branch.

<Rm>
The register that is optionally shifted and used as the second operand. This register can be the PC, but ARM deprecates using the PC. This register can be the SP, but ARM deprecates using the SP.

<shift>
The shift to apply to the value read from <Rm>. If omitted, no shift is applied and any encoding is permitted. Shifts applied to a register on page A8-291 describes the shifts and how they are encoded.

The pre-UAL syntax ADD<c>S is equivalent to ADDS<c>.

Operation

if ConditionPassed() then
    EncodingSpecificOperations();
    shifted = Shift(R[m], shift_t, shift_n, APSR.C);
    (result, carry, overflow) = AddWithCarry(SP, shifted, ‘0’);
    if d == 15 then
        ALUWritePC(result); // setflags is always FALSE here
    else
        R[d] = result;
        if setflags then
            APSR.N = result<31>;
            APSR.Z = IsZeroBit(result);
            APSR.C = carry;
            APSR.V = overflow;

Exceptions

None.
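In ARM state in ARMv7, a write to the PC through ALUWritePC() is an interworking branch. The Python sketch below assumes the BXWritePC() behaviour described in Pseudocode details of operations on ARM core registers on page A2-47: bit 0 selects Thumb state, and an ARM target must be word-aligned. It models only the ARM-state case, and interworking_target() is a hypothetical name.

```python
# Sketch of the interworking decision applied to the computed address when,
# for example, ADD PC, SP, <Rm> executes in ARM state.


def interworking_target(address: int):
    """Return (instruction_set, new_pc) for a 32-bit branch address."""
    address &= 0xFFFFFFFF
    if address & 1:
        # bit 0 set: switch to Thumb state and clear bit 0 of the target
        return "Thumb", address & 0xFFFFFFFE
    if (address & 2) == 0:
        # bits 1:0 == '00': stay in ARM state at a word-aligned target
        return "ARM", address
    raise ValueError("UNPREDICTABLE: address<1:0> == '10'")
```

Before ARMv7, as the Note states, the same write was a simple branch and did not change instruction set.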


A8.8.12 ADR

This instruction adds an immediate value to the PC value to form a PC-relative address, and writes the result to the destination register.

d = UInt(Rd); imm32 = ZeroExtend(imm8:’00’, 32); add = TRUE;

d = UInt(Rd); imm32 = ZeroExtend(i:imm3:imm8, 32); add = FALSE;

if d IN {13,15} then UNPREDICTABLE;

d = UInt(Rd); imm32 = ZeroExtend(i:imm3:imm8, 32); add = TRUE;

if d IN {13,15} then UNPREDICTABLE;

For the case when

cond

0b1111

, see Unconditional instructions on page A5-216.

d = UInt(Rd); imm32 = ARMExpandImm(imm12); add = TRUE;

For the case when

cond

0b1111

, see Unconditional instructions on page A5-216.

d = UInt(Rd); imm32 = ARMExpandImm(imm12); add = FALSE;

Encoding T1 ARMv4T, ARMv5T*, ARMv6*, ARMv7
ADR<c> <Rd>, <label>

Encoding T2 ARMv6T2, ARMv7
ADR<c>.W <Rd>, <label>    (<label> before current instruction)
SUB <Rd>, PC, #0    (Special case for subtraction of zero)

Encoding T3 ARMv6T2, ARMv7
ADR<c>.W <Rd>, <label>    (<label> after current instruction)

Encoding A1 ARMv4*, ARMv5T*, ARMv6*, ARMv7
ADR<c> <Rd>, <label>    (<label> after current instruction)

Encoding A2 ARMv4*, ARMv5T*, ARMv6*, ARMv7
ADR<c> <Rd>, <label>    (<label> before current instruction)
SUB <Rd>, PC, #0    (Special case for subtraction of zero)

Bit layouts, most significant bit first ("|" separates the two halfwords of a 32-bit Thumb encoding):
T1 (16-bit): 1 0 1 0 0 Rd imm8
T2 (32-bit): 1 1 1 1 0 i 1 0 1 0 1 0 1 1 1 1 | 0 imm3 Rd imm8
T3 (32-bit): 1 1 1 1 0 i 1 0 0 0 0 0 1 1 1 1 | 0 imm3 Rd imm8
A1 (32-bit): cond 0 0 1 0 1 0 0 0 1 1 1 1 Rd imm12
A2 (32-bit): cond 0 0 1 0 0 1 0 0 1 1 1 1 Rd imm12


Assembler syntax

ADR{<c>}{<q>} <Rd>, <label>    Normal syntax
ADD{<c>}{<q>} <Rd>, PC, #<const>    Alternative for encodings T1, T3, A1
SUB{<c>}{<q>} <Rd>, PC, #<const>    Alternative for encodings T2, A2

where:

<c>, <q>  See Standard assembler syntax fields on page A8-287.

<Rd>  The destination register. In ARM instructions, if <Rd> is the PC, the instruction is a branch to the address calculated by the operation. This is an interworking branch, see Pseudocode details of operations on ARM core registers on page A2-47.

Note
Before ARMv7, this was a simple branch.

<label>  The label of an instruction or literal data item whose address is to be loaded into <Rd>. The assembler calculates the required value of the offset from the Align(PC, 4) value of the ADR instruction to this label.

If the offset is zero or positive, encodings T1, T3, and A1 are permitted, with imm32 equal to the offset.

If the offset is negative, encodings T2 and A2 are permitted, with imm32 equal to the size of the offset. That is, the use of encoding T2 or A2 indicates that the required offset is minus the value of imm32.

Permitted values of the size of the offset are:
Encoding T1  Multiples of 4 in the range 0 to 1020.
Encodings T2, T3  Any value in the range 0 to 4095.
Encodings A1, A2  Any of the constants described in Modified immediate constants in ARM instructions on page A5-200.

The alternative syntax permits the choice between addition and subtraction and the immediate offset to be specified separately, including a subtraction of 0, which cannot be specified using the normal syntax. For more information, see Use of labels in UAL instruction syntax on page A4-162.

Operation

if ConditionPassed() then
    EncodingSpecificOperations();
    result = if add then (Align(PC,4) + imm32) else (Align(PC,4) - imm32);
    if d == 15 then // Can only occur for ARM encodings
        ALUWritePC(result);
    else
        R[d] = result;

Exceptions

None.
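A minimal C sketch of the ADR address computation and of the encoding T1 offset rule. It assumes the pc argument is the value the instruction reads as the PC (the instruction address plus 4 in Thumb state, plus 8 in ARM state); align4, adr_result, and adr_t1_offset_ok are hypothetical names, not part of any ARM library.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Align(PC, 4): clear bits 1:0 of the PC value. */
static uint32_t align4(uint32_t pc)
{
    return pc & ~UINT32_C(3);
}

/* Hypothetical model of the ADR result. 'add' is TRUE for encodings
 * T1, T3, A1 and FALSE for encodings T2, A2. */
uint32_t adr_result(uint32_t pc, uint32_t imm32, bool add)
{
    return add ? align4(pc) + imm32 : align4(pc) - imm32;
}

/* Encoding T1 only reaches forward: multiples of 4 from 0 to 1020. */
bool adr_t1_offset_ok(int32_t offset)
{
    return offset >= 0 && offset <= 1020 && offset % 4 == 0;
}
```

For a Thumb ADR at address 0x8002 with imm32 = 8, the PC reads as 0x8006, Align(PC, 4) is 0x8004, and the result is 0x800C.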

A8 Instruction Descriptions

A8.8 Alphabetical list of instructions

Non-Confidential ID051414

A8.8.13 AND (immediate)

This instruction performs a bitwise AND of a register value and an immediate value, and writes the result to the destination register.

Decode for encoding T1:
if Rd == '1111' && S == '1' then SEE TST (immediate);
d = UInt(Rd); n = UInt(Rn); setflags = (S == '1');
(imm32, carry) = ThumbExpandImm_C(i:imm3:imm8, APSR.C);
if d == 13 || (d == 15 && S == '0') || n IN {13,15} then UNPREDICTABLE;

Decode for encoding A1:
For the case when cond is 0b1111, see Unconditional instructions on page A5-216.
if Rd == '1111' && S == '1' then SEE SUBS PC, LR and related instructions;
d = UInt(Rd); n = UInt(Rn); setflags = (S == '1');
(imm32, carry) = ARMExpandImm_C(imm12, APSR.C);

Encoding T1 ARMv6T2, ARMv7

AND{S}<c> <Rd>, <Rn>, #<const>

Encoding A1 ARMv4*, ARMv5T*, ARMv6*, ARMv7

AND{S}<c> <Rd>, <Rn>, #<const>

Bit layouts, most significant bit first ("|" separates the two halfwords of a 32-bit Thumb encoding):
T1 (32-bit): 1 1 1 1 0 i 0 0 0 0 0 S Rn | 0 imm3 Rd imm8
A1 (32-bit): cond 0 0 1 0 0 0 0 S Rn Rd imm12


Assembler syntax

AND{S}{<c>}{<q>} {<Rd>,} <Rn>, #<const>

where:

S  If S is present, the instruction updates the flags. Otherwise, the flags are not updated.

<c>, <q>  See Standard assembler syntax fields on page A8-287.

<Rd>  The destination register. If S is specified and <Rd> is the PC, see SUBS PC, LR (Thumb) on page B9-2010 or SUBS PC, LR and related instructions (ARM) on page B9-2012.

In ARM instructions, if S is not specified and <Rd> is the PC, the instruction is a branch to the address calculated by the operation. This is an interworking branch, see Pseudocode details of operations on ARM core registers on page A2-47. ARM deprecates this use of the PC.

Note
Before ARMv7, this was a simple branch.

<Rn>  The first operand register. The PC can be used in ARM instructions. ARM deprecates this use of the PC.

<const>  The immediate value to be ANDed with the value obtained from <Rn>. See Modified immediate constants in Thumb instructions on page A6-232 or Modified immediate constants in ARM instructions on page A5-200 for the range of values.

The pre-UAL syntax AND<c>S is equivalent to ANDS<c>.

Operation

if ConditionPassed() then
    EncodingSpecificOperations();
    result = R[n] AND imm32;
    if d == 15 then // Can only occur for ARM encoding
        ALUWritePC(result); // setflags is always FALSE here
    else
        R[d] = result;
        if setflags then
            APSR.N = result<31>;
            APSR.Z = IsZeroBit(result);
            APSR.C = carry;
            // APSR.V unchanged

Exceptions

None.
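In ARM state the constant and the carry-out come from ARMExpandImm_C(imm12, APSR.C). The C sketch below models that expansion: imm12<7:0> rotated right by twice imm12<11:8>, with the carry taken from bit 31 of the result when the rotation is nonzero and passed through unchanged otherwise. It does not model the Thumb ThumbExpandImm_C scheme, and the names arm_expand_imm_c and ExpandedImm are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    uint32_t imm32;
    bool carry;
} ExpandedImm;

/* Hypothetical C model of ARMExpandImm_C(imm12, carry_in). */
ExpandedImm arm_expand_imm_c(uint32_t imm12, bool carry_in)
{
    assert(imm12 <= 0xFFFu);
    uint32_t unrotated = imm12 & 0xFFu;               /* imm12<7:0> */
    unsigned rotation = 2u * ((imm12 >> 8) & 0xFu);   /* 2 * UInt(imm12<11:8>) */
    ExpandedImm e;
    if (rotation == 0) {
        /* No rotation: value unchanged, carry passes through. */
        e.imm32 = unrotated;
        e.carry = carry_in;
    } else {
        /* rotation is 2..30 here, so neither shift is by 32. */
        e.imm32 = (unrotated >> rotation) | (unrotated << (32u - rotation));
        /* Carry out is bit 31 of the rotated value. */
        e.carry = (e.imm32 >> 31) & 1u;
    }
    return e;
}
```

For example, imm12 = 0x4FF encodes 0xFF rotated right by 8, giving 0xFF000000 with a carry-out of 1.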


A8.8.14 AND (register)

This instruction performs a bitwise AND of a register value and an optionally-shifted register value, and writes the result to the destination register. It can optionally update the condition flags based on the result.

Decode for encoding T1:
d = UInt(Rdn); n = UInt(Rdn); m = UInt(Rm); setflags = !InITBlock();
(shift_t, shift_n) = (SRType_LSL, 0);

Decode for encoding T2:
if Rd == '1111' && S == '1' then SEE TST (register);
d = UInt(Rd); n = UInt(Rn); m = UInt(Rm); setflags = (S == '1');
(shift_t, shift_n) = DecodeImmShift(type, imm3:imm2);
if d == 13 || (d == 15 && S == '0') || n IN {13,15} || m IN {13,15} then UNPREDICTABLE;

Decode for encoding A1:
For the case when cond is 0b1111, see Unconditional instructions on page A5-216.
if Rd == '1111' && S == '1' then SEE SUBS PC, LR and related instructions;
d = UInt(Rd); n = UInt(Rn); m = UInt(Rm); setflags = (S == '1');
(shift_t, shift_n) = DecodeImmShift(type, imm5);

Encoding T1 ARMv4T, ARMv5T*, ARMv6*, ARMv7

ANDS <Rdn>, <Rm>

Outside IT block.

AND<c> <Rdn>, <Rm>

Inside IT block.

Encoding T2 ARMv6T2, ARMv7

AND{S}<c>.W <Rd>, <Rn>, <Rm>{, <shift>}

Encoding A1 ARMv4*, ARMv5T*, ARMv6*, ARMv7

AND{S}<c> <Rd>, <Rn>, <Rm>{, <shift>}

Bit layouts, most significant bit first ("|" separates the two halfwords of a 32-bit Thumb encoding):
T1 (16-bit): 0 1 0 0 0 0 0 0 0 0 Rm Rdn
T2 (32-bit): 1 1 1 0 1 0 1 0 0 0 0 S Rn | (0) imm3 Rd imm2 type Rm
A1 (32-bit): cond 0 0 0 0 0 0 0 S Rn Rd imm5 type 0 Rm


Assembler syntax

AND{S}{<c>}{<q>} {<Rd>,} <Rn>, <Rm> {, <shift>}

where:

S  If S is present, the instruction updates the flags. Otherwise, the flags are not updated.

<c>, <q>  See Standard assembler syntax fields on page A8-287.

<Rd>  The destination register. If S is specified and <Rd> is the PC, see SUBS PC, LR (Thumb) on page B9-2010 or SUBS PC, LR and related instructions (ARM) on page B9-2012.

In ARM instructions, if S is not specified and <Rd> is the PC, the instruction is a branch to the address calculated by the operation. This is an interworking branch, see Pseudocode details of operations on ARM core registers on page A2-47. ARM deprecates this use of the PC.

Note
Before ARMv7, this was a simple branch.

<Rn>  The first operand register. The PC can be used in ARM instructions. ARM deprecates this use of the PC.

<Rm>  The register that is optionally shifted and used as the second operand. The PC can be used in ARM instructions. ARM deprecates this use of the PC.

<shift>  The shift to apply to the value read from <Rm>. If present, encoding T1 is not permitted. If absent, no shift is applied and all encodings are permitted. Shifts applied to a register on page A8-291 describes the shifts and how they are encoded.

In Thumb assembly:
• outside an IT block, if ANDS <Rd>, <Rn>, <Rd> has <Rd> and <Rn> both in the range R0-R7, it is assembled using encoding T1 as though ANDS <Rd>, <Rn> had been written
• inside an IT block, if AND<c> <Rd>, <Rn>, <Rd> has <Rd> and <Rn> both in the range R0-R7, it is assembled using encoding T1 as though AND<c> <Rd>, <Rn> had been written.

To prevent either of these happening, use the .W qualifier.

The pre-UAL syntax AND<c>S is equivalent to ANDS<c>.

Operation

if ConditionPassed() then
    EncodingSpecificOperations();
    (shifted, carry) = Shift_C(R[m], shift_t, shift_n, APSR.C);
    result = R[n] AND shifted;
    if d == 15 then // Can only occur for ARM encoding
        ALUWritePC(result); // setflags is always FALSE here
    else
        R[d] = result;
        if setflags then
            APSR.N = result<31>;
            APSR.Z = IsZeroBit(result);
            APSR.C = carry;
            // APSR.V unchanged

Exceptions

None.
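AND (register) obtains its second operand from DecodeImmShift() and Shift_C(). The C sketch below models both for immediate shift amounts, including the special cases that the 5-bit amount field encodes: 0 means 32 for LSR and ASR, and selects RRX (a one-bit rotate through the carry) for the ROR type. All type and function names here are hypothetical, and the model only covers amounts 0 to 32.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef enum { SR_LSL, SR_LSR, SR_ASR, SR_ROR, SR_RRX } SRType;
typedef struct { SRType type; unsigned amount; } DecodedShift;
typedef struct { uint32_t value; bool carry; } ShiftResult;

/* Sketch of DecodeImmShift(type, imm5) for the 2-bit type field. */
DecodedShift decode_imm_shift(unsigned type, unsigned imm5)
{
    assert(type <= 3u && imm5 <= 31u);
    switch (type) {
    case 0:
        return (DecodedShift){SR_LSL, imm5};
    case 1:
        return (DecodedShift){SR_LSR, imm5 == 0 ? 32u : imm5};
    case 2:
        return (DecodedShift){SR_ASR, imm5 == 0 ? 32u : imm5};
    default: /* type 3: ROR, or RRX when imm5 is 0 */
        if (imm5 == 0)
            return (DecodedShift){SR_RRX, 1u};
        return (DecodedShift){SR_ROR, imm5};
    }
}

/* Sketch of Shift_C(x, type, amount, carry_in) for amounts 0 to 32. */
ShiftResult shift_c(uint32_t x, DecodedShift s, bool carry_in)
{
    unsigned n = s.amount;
    assert(n <= 32u);
    if (n == 0)
        return (ShiftResult){x, carry_in};     /* no shift: carry unchanged */
    uint32_t fill = (x & 0x80000000u) ? 0xFFFFFFFFu : 0u;  /* sign-bit copies */
    switch (s.type) {
    case SR_LSL:
        return (ShiftResult){n == 32u ? 0u : x << n, (x >> (32u - n)) & 1u};
    case SR_LSR:
        return (ShiftResult){n == 32u ? 0u : x >> n, (x >> (n - 1u)) & 1u};
    case SR_ASR:
        return (ShiftResult){n == 32u ? fill : (x >> n) | (fill << (32u - n)),
                             (x >> (n - 1u)) & 1u};
    case SR_ROR: {
        unsigned m = n % 32u;
        uint32_t v = m == 0 ? x : (x >> m) | (x << (32u - m));
        return (ShiftResult){v, (v >> 31) & 1u};
    }
    default: /* SR_RRX: shift right by one, old carry enters at bit 31 */
        return (ShiftResult){(x >> 1) | ((uint32_t)carry_in << 31), x & 1u};
    }
}
```

Under this model, AND (register) would compute R[n] & shift_c(R[m], decode_imm_shift(type, imm5), APSR.C).value and take the C flag from the same call.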


A8.8.15 AND (register-shifted register)

This instruction performs a bitwise AND of a register value and a register-shifted register value. It writes the result to the destination register, and can optionally update the condition flags based on the result.

Decode for encoding A1:
For the case when cond is 0b1111, see Unconditional instructions on page A5-216.
d = UInt(Rd); n = UInt(Rn); m = UInt(Rm); s = UInt(Rs);
setflags = (S == '1'); shift_t = DecodeRegShift(type);
if d == 15 || n == 15 || m == 15 || s == 15 then UNPREDICTABLE;

Encoding A1 ARMv4*, ARMv5T*, ARMv6*, ARMv7

AND{S}<c> <Rd>, <Rn>, <Rm>, <type> <Rs>

Bit layout of encoding A1, most significant bit first:
A1 (32-bit): cond 0 0 0 0 0 0 0 S Rn Rd Rs 0 type 1 Rm


Assembler syntax

AND{S}{<c>}{<q>} {<Rd>,} <Rn>, <Rm>, <type> <Rs>

where:

S  If S is present, the instruction updates the flags. Otherwise, the flags are not updated.

<c>, <q>  See Standard assembler syntax fields on page A8-287.

<Rd>  The destination register.

<Rn>  The first operand register.

<Rm>  The register that is shifted and used as the second operand.

<type>  The type of shift to apply to the value read from <Rm>. It must be one of:
ASR  Arithmetic shift right, encoded as type = 0b10
LSL  Logical shift left, encoded as type = 0b00
LSR  Logical shift right, encoded as type = 0b01
ROR  Rotate right, encoded as type = 0b11

<Rs>  The register whose bottom byte contains the amount to shift by.

The pre-UAL syntax AND<c>S is equivalent to ANDS<c>.

Operation

if ConditionPassed() then
    EncodingSpecificOperations();
    shift_n = UInt(R[s]<7:0>);
    (shifted, carry) = Shift_C(R[m], shift_t, shift_n, APSR.C);
    result = R[n] AND shifted;
    R[d] = result;
    if setflags then
        APSR.N = result<31>;
        APSR.Z = IsZeroBit(result);
        APSR.C = carry;
        // APSR.V unchanged

Exceptions

None.
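For the register-shifted form, the shift amount is UInt(R[s]<7:0>), so it ranges from 0 to 255 and the upper bits of Rs are ignored. The C sketch below illustrates the LSL case only, including amounts of 32 and above, together with the N, Z, and C flags that ANDS would write (V is not modified). The names lsl_by_register, ands_lsl_reg, ShiftOut, and AndFlags are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    uint32_t value;
    bool carry;
} ShiftOut;

/* Hypothetical model of Shift_C(R[m], SRType_LSL, UInt(R[s]<7:0>), APSR.C). */
ShiftOut lsl_by_register(uint32_t rm, uint32_t rs, bool carry_in)
{
    unsigned n = rs & 0xFFu;               /* only bits 7:0 of Rs are used */
    if (n == 0)
        return (ShiftOut){rm, carry_in};   /* no shift: value and carry unchanged */
    if (n < 32u)
        return (ShiftOut){rm << n, (rm >> (32u - n)) & 1u};
    if (n == 32u)
        return (ShiftOut){0u, rm & 1u};    /* bit 0 is the last bit shifted out */
    return (ShiftOut){0u, false};          /* all bits shifted out; carry is 0 */
}

typedef struct {
    uint32_t result;
    bool n, z, c;                          /* APSR.V is not modified */
} AndFlags;

/* Hypothetical model of ANDS <Rd>, <Rn>, <Rm>, LSL <Rs>. */
AndFlags ands_lsl_reg(uint32_t rn, uint32_t rm, uint32_t rs, bool carry_in)
{
    ShiftOut s = lsl_by_register(rm, rs, carry_in);
    AndFlags f;
    f.result = rn & s.value;
    f.n = (f.result >> 31) & 1u;
    f.z = (f.result == 0);
    f.c = s.carry;
    return f;
}
```

Note that an Rs value of 0x101 shifts by 1, because only its bottom byte is read.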


A8.8.16 ASR (immediate)

Arithmetic Shift Right (immediate) shifts a register value right by an immediate number of bits, shifting in copies of its sign bit, and writes the result to the destination register. It can optionally update the condition flags based on the result.

Decode for encoding T1:
d = UInt(Rd); m = UInt(Rm); setflags = !InITBlock();
(-, shift_n) = DecodeImmShift('10', imm5);

Decode for encoding T2:
d = UInt(Rd); m = UInt(Rm); setflags = (S == '1');
(-, shift_n) = DecodeImmShift('10', imm3:imm2);
if d IN {13,15} || m IN {13,15} then UNPREDICTABLE;

Decode for encoding A1:
For the case when cond is 0b1111, see Unconditional instructions on page A5-216.
if Rd == '1111' && S == '1' then SEE SUBS PC, LR and related instructions;
d = UInt(Rd); m = UInt(Rm); setflags = (S == '1');
(-, shift_n) = DecodeImmShift('10', imm5);

Encoding T1 ARMv4T, ARMv5T*, ARMv6*, ARMv7

ASRS <Rd>, <Rm>, #<imm>

Outside IT block.

ASR<c> <Rd>, <Rm>, #<imm>

Inside IT block.

Encoding T2 ARMv6T2, ARMv7

ASR{S}<c>.W <Rd>, <Rm>, #<imm>

Encoding A1 ARMv4*, ARMv5T*, ARMv6*, ARMv7

ASR{S}<c> <Rd>, <Rm>, #<imm>

Bit layouts, most significant bit first ("|" separates the two halfwords of a 32-bit Thumb encoding):
T1 (16-bit): 0 0 0 1 0 imm5 Rm Rd
T2 (32-bit): 1 1 1 0 1 0 1 0 0 1 0 S 1 1 1 1 | (0) imm3 Rd imm2 1 0 Rm
A1 (32-bit): cond 0 0 0 1 1 0 1 S (0) (0) (0) (0) Rd imm5 1 0 0 Rm


Assembler syntax

ASR{S}{<c>}{<q>} {<Rd>,} <Rm>, #<imm>

where:

S  If S is present, the instruction updates the flags. Otherwise, the flags are not updated.

<c>, <q>  See Standard assembler syntax fields on page A8-287.

<Rd>  The destination register.

In ARM instructions, if S is not specified and <Rd> is the PC, the instruction is a branch to the address calculated by the operation. This is an interworking branch, see Pseudocode details of operations on ARM core registers on page A2-47. ARM deprecates this use of the PC.

Note
Before ARMv7, this was a simple branch.

<Rm>  The first operand register. The PC can be used in ARM instructions. ARM deprecates this use of the PC.

<imm>  The shift amount, in the range 1 to 32. See Shifts applied to a register on page A8-291.

Operation

if ConditionPassed() then
    EncodingSpecificOperations();
    (result, carry) = Shift_C(R[m], SRType_ASR, shift_n, APSR.C);
    if d == 15 then // Can only occur for ARM encoding
        ALUWritePC(result); // setflags is always FALSE here
    else
        R[d] = result;
        if setflags then
            APSR.N = result<31>;
            APSR.Z = IsZeroBit(result);
            APSR.C = carry;
            // APSR.V unchanged

Exceptions

None.
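The <imm> range of 1 to 32 fits in the 5-bit imm5 field because DecodeImmShift('10', imm5) interprets 0 as a shift of 32. The C sketch below shows that encoding rule and the resulting arithmetic shift with its carry-out. The sign fill is done explicitly, because right-shifting a negative signed integer is implementation-defined in C; the function and type names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical: the imm5 value placed in the encoding for ASR #<imm>,
 * where <imm> is 1 to 32 and 32 is stored as 0. */
unsigned asr_imm5_field(unsigned imm)
{
    assert(imm >= 1u && imm <= 32u);
    return imm & 31u;
}

typedef struct {
    uint32_t result;
    bool carry;
} AsrOut;

/* Hypothetical model of ASR (immediate): decode imm5 as
 * DecodeImmShift('10', imm5) does, then shift right arithmetically. */
AsrOut asr_imm(uint32_t rm, unsigned imm5)
{
    assert(imm5 <= 31u);
    unsigned n = (imm5 == 0) ? 32u : imm5;                   /* shift_n: 1..32 */
    uint32_t fill = (rm & 0x80000000u) ? 0xFFFFFFFFu : 0u;   /* sign-bit copies */
    AsrOut out;
    out.result = (n == 32u) ? fill : (rm >> n) | (fill << (32u - n));
    out.carry = (rm >> (n - 1u)) & 1u;                        /* last bit shifted out */
    return out;
}
```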


A8.8.17 ASR (register)

Arithmetic Shift Right (register) shifts a register value right by a variable number of bits, shifting in copies of its sign bit, and writes the result to the destination register. The variable number of bits is read from the bottom byte of a register. It can optionally update the condition flags based on the result.

Decode for encoding T1:
d = UInt(Rdn); n = UInt(Rdn); m = UInt(Rm); setflags = !InITBlock();

Decode for encoding T2:
d = UInt(Rd); n = UInt(Rn); m = UInt(Rm); setflags = (S == '1');
if d IN {13,15} || n IN {13,15} || m IN {13,15} then UNPREDICTABLE;

Decode for encoding A1:
For the case when cond is 0b1111, see Unconditional instructions on page A5-216.
d = UInt(Rd); n = UInt(Rn); m = UInt(Rm); setflags = (S == '1');
if d == 15 || n == 15 || m == 15 then UNPREDICTABLE;

Encoding T1 ARMv4T, ARMv5T*, ARMv6*, ARMv7

ASRS <Rdn>, <Rm>

Outside IT block.

ASR<c> <Rdn>, <Rm>

Inside IT block.

Encoding T2 ARMv6T2, ARMv7

ASR{S}<c>.W <Rd>, <Rn>, <Rm>

Encoding A1 ARMv4*, ARMv5T*, ARMv6*, ARMv7

ASR{S}<c> <Rd>, <Rn>, <Rm>

Bit layouts, most significant bit first ("|" separates the two halfwords of a 32-bit Thumb encoding):
T1 (16-bit): 0 1 0 0 0 0 0 1 0 0 Rm Rdn
T2 (32-bit): 1 1 1 1 1 0 1 0 0 1 0 S Rn | 1 1 1 1 Rd 0 0 0 0 Rm
A1 (32-bit): cond 0 0 0 1 1 0 1 S (0) (0) (0) (0) Rd Rm 0 1 0 1 Rn


Assembler syntax

ASR{S}{<c>}{<q>} {<Rd>,} <Rn>, <Rm>

where:

S  If S is present, the instruction updates the flags. Otherwise, the flags are not updated.

<c>, <q>  See Standard assembler syntax fields on page A8-287.

<Rd>  The destination register.

<Rn>  The first operand register.

<Rm>  The register whose bottom byte contains the amount to shift by.

Operation

if ConditionPassed() then
    EncodingSpecificOperations();
    shift_n = UInt(R[m]<7:0>);
    (result, carry) = Shift_C(R[n], SRType_ASR, shift_n, APSR.C);
    R[d] = result;
    if setflags then
        APSR.N = result<31>;
        APSR.Z = IsZeroBit(result);
        APSR.C = carry;
        // APSR.V unchanged

Exceptions

None.
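With ASR (register) the amount comes from the bottom byte of Rm, so it can exceed 32. A C sketch of the shift and carry rules under that reading of the pseudocode: an amount of 0 leaves the value and the carry unchanged, and any amount of 32 or more fills the result with the sign bit and sets the carry to the sign bit. The function asr_reg and its result type are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    uint32_t result;
    bool carry;
} AsrRegOut;

/* Hypothetical model of Shift_C(R[n], SRType_ASR, UInt(R[m]<7:0>), APSR.C). */
AsrRegOut asr_reg(uint32_t rn, uint32_t rm, bool carry_in)
{
    unsigned n = rm & 0xFFu;                                 /* bottom byte of Rm */
    uint32_t fill = (rn & 0x80000000u) ? 0xFFFFFFFFu : 0u;   /* sign-bit copies */
    if (n == 0)
        return (AsrRegOut){rn, carry_in};                    /* unchanged */
    if (n >= 32u)
        return (AsrRegOut){fill, fill != 0};                 /* all sign bits */
    return (AsrRegOut){(rn >> n) | (fill << (32u - n)), (rn >> (n - 1u)) & 1u};
}
```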


A8.8.18 B

Branch causes a branch to a target address.

Decode for encoding T1:
if cond == '1110' then SEE UDF;
if cond == '1111' then SEE SVC;
imm32 = SignExtend(imm8:'0', 32);
if InITBlock() then UNPREDICTABLE;

Decode for encoding T2:
imm32 = SignExtend(imm11:'0', 32);
if InITBlock() && !LastInITBlock() then UNPREDICTABLE;

Decode for encoding T3:
if cond<3:1> == '111' then SEE "Related encodings";
imm32 = SignExtend(S:J2:J1:imm6:imm11:'0', 32);
if InITBlock() then UNPREDICTABLE;

Decode for encoding T4:
I1 = NOT(J1 EOR S); I2 = NOT(J2 EOR S); imm32 = SignExtend(S:I1:I2:imm10:imm11:'0', 32);
if InITBlock() && !LastInITBlock() then UNPREDICTABLE;

Decode for encoding A1:
For the case when cond is 0b1111, see Unconditional instructions on page A5-216.
imm32 = SignExtend(imm24:'00', 32);

Encoding T1 ARMv4T, ARMv5T*, ARMv6*, ARMv7

B<c> <label>

Not permitted in IT block.

Encoding T2 ARMv4T, ARMv5T*, ARMv6*, ARMv7

B<c> <label>

Outside or last in IT block

Encoding T3 ARMv6T2, ARMv7

B<c>.W <label>

Not permitted in IT block.

Encoding T4 ARMv6T2, ARMv7

B<c>.W <label>

Outside or last in IT block

Encoding A1 ARMv4*, ARMv5T*, ARMv6*, ARMv7

B<c> <label>

Related encodings See Branches and miscellaneous control on page A6-235.

Bit layouts, most significant bit first ("|" separates the two halfwords of a 32-bit Thumb encoding):
T1 (16-bit): 1 1 0 1 cond imm8
T2 (16-bit): 1 1 1 0 0 imm11
T3 (32-bit): 1 1 1 1 0 S cond imm6 | 1 0 J1 0 J2 imm11
T4 (32-bit): 1 1 1 1 0 S imm10 | 1 0 J1 1 J2 imm11
A1 (32-bit): cond 1 0 1 0 imm24


Assembler syntax

B{<c>}{<q>} <label>

where:

<c>, <q>  See Standard assembler syntax fields on page A8-287.

Note
Encodings T1 and T3 are conditional in their own right, and do not require an IT instruction to make them conditional.

For encodings T1 and T3, <c> must not be AL or omitted. The 4-bit encoding of the condition is placed in the instruction and not in a preceding IT instruction, and the instruction must not be in an IT block. As a result, encodings T1 and T2 are never both available to the assembler, nor are encodings T3 and T4.

<label>  The label of the instruction that is to be branched to. The assembler calculates the required value of the offset from the PC value of the B instruction to this label, then selects an encoding that sets imm32 to that offset.

Permitted offsets are:
Encoding T1  Even numbers in the range –256 to 254
Encoding T2  Even numbers in the range –2048 to 2046
Encoding T3  Even numbers in the range –1048576 to 1048574
Encoding T4  Even numbers in the range –16777216 to 16777214
Encoding A1  Multiples of 4 in the range –33554432 to 33554428.

Operation

if ConditionPassed() then
    EncodingSpecificOperations();
    BranchWritePC(PC + imm32);

Exceptions

None.
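Encoding T4 stores its offset as S, J1, J2, imm10, and imm11, and the decode derives I1 and I2 from J1, J2, and S before sign-extending. The C sketch below reproduces that decode so the permitted range of -16777216 to 16777214 can be checked at its end points. The branch target is then PC + imm32, where the PC reads as the instruction address plus 4 in Thumb state; b_t4_offset is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical decoder for the imm32 of branch encoding T4:
 *   I1 = NOT(J1 EOR S); I2 = NOT(J2 EOR S);
 *   imm32 = SignExtend(S:I1:I2:imm10:imm11:'0', 32); */
int32_t b_t4_offset(unsigned s, unsigned j1, unsigned j2,
                    unsigned imm10, unsigned imm11)
{
    assert(s <= 1u && j1 <= 1u && j2 <= 1u);
    assert(imm10 <= 0x3FFu && imm11 <= 0x7FFu);
    unsigned i1 = !(j1 ^ s);
    unsigned i2 = !(j2 ^ s);
    /* Concatenate the 25-bit value S:I1:I2:imm10:imm11:'0'. */
    uint32_t raw = ((uint32_t)s << 24) | ((uint32_t)i1 << 23) | ((uint32_t)i2 << 22)
                 | ((uint32_t)imm10 << 12) | ((uint32_t)imm11 << 1);
    /* Sign-extend from bit 24: flip the sign bit, then subtract its weight. */
    return (int32_t)(raw ^ 0x01000000u) - 0x01000000;
}
```

For example, S = 0, J1 = J2 = 1, imm10 = 0, and imm11 = 2 decode to imm32 = 4. Note that small offsets in either direction have J1 = J2 = 1.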


A8.8.19 BFC

Bit Field Clear clears any number of adjacent bits at any position in a register, without affecting the other bits in the register.

Decode for encoding T1:
d = UInt(Rd); msbit = UInt(msb); lsbit = UInt(imm3:imm2);
if d IN {13,15} then UNPREDICTABLE;

Decode for encoding A1:
For the case when cond is 0b1111, see Unconditional instructions on page A5-216.
d = UInt(Rd); msbit = UInt(msb); lsbit = UInt(lsb);
if d == 15 then UNPREDICTABLE;

Encoding T1 ARMv6T2, ARMv7

BFC<c> <Rd>, #<lsb>, #<width>

Encoding A1 ARMv6T2, ARMv7

BFC<c> <Rd>, #<lsb>, #<width>

Bit layouts, most significant bit first ("|" separates the two halfwords of a 32-bit Thumb encoding):
T1 (32-bit): 1 1 1 1 0 (0) 1 1 0 1 1 0 1 1 1 1 | 0 imm3 Rd imm2 (0) msb
A1 (32-bit): cond 0 1 1 1 1 1 0 msb Rd lsb 0 0 1 1 1 1 1


Assembler syntax

BFC{<c>}{<q>} <Rd>, #<lsb>, #<width>

where:

<c>, <q>  See Standard assembler syntax fields on page A8-287.

<Rd>  The destination register.

<lsb>  The least significant bit that is to be cleared, in the range 0 to 31. This determines the required value of lsbit.

<width>  The number of bits to be cleared, in the range 1 to 32-<lsb>. The required value of msbit is <lsb>+<width>-1.

Operation

if ConditionPassed() then
    EncodingSpecificOperations();
    if msbit >= lsbit then
        R[d]<msbit:lsbit> = Replicate('0', msbit-lsbit+1);
        // Other bits of R[d] are unchanged
    else
        UNPREDICTABLE;

Exceptions

None.
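A C sketch of the BFC effect on a register value, plus the msb field value that the encoding stores (msbit = <lsb>+<width>-1). The function names are hypothetical, and the assertion mirrors the documented operand ranges rather than modeling the UNPREDICTABLE case of the pseudocode.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical: a mask of 'width' one bits starting at bit 'lsb'. */
static uint32_t field_mask(unsigned lsb, unsigned width)
{
    assert(lsb <= 31u && width >= 1u && width <= 32u - lsb);
    /* Shifting a 32-bit value by 32 is undefined in C, so width 32 is special. */
    uint32_t ones = (width == 32u) ? 0xFFFFFFFFu : ((UINT32_C(1) << width) - 1u);
    return ones << lsb;
}

/* Effect of BFC <Rd>, #<lsb>, #<width> on the value of Rd. */
uint32_t bfc(uint32_t rd, unsigned lsb, unsigned width)
{
    return rd & ~field_mask(lsb, width);
}

/* The msb field stored in the encoding: <lsb> + <width> - 1. */
unsigned bfc_msb_field(unsigned lsb, unsigned width)
{
    return lsb + width - 1u;
}
```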


A8.8.20 BFI

Bit Field Insert copies any number of low-order bits from a register into the same number of adjacent bits at any position in the destination register.

Decode for encoding T1:
if Rn == '1111' then SEE BFC;
d = UInt(Rd); n = UInt(Rn); msbit = UInt(msb); lsbit = UInt(imm3:imm2);
if d IN {13,15} || n == 13 then UNPREDICTABLE;

Decode for encoding A1:
For the case when cond is 0b1111, see Unconditional instructions on page A5-216.
if Rn == '1111' then SEE BFC;
d = UInt(Rd); n = UInt(Rn); msbit = UInt(msb); lsbit = UInt(lsb);
if d == 15 then UNPREDICTABLE;

Encoding T1 ARMv6T2, ARMv7

BFI<c> <Rd>, <Rn>, #<lsb>, #<width>

Encoding A1 ARMv6T2, ARMv7

BFI<c> <Rd>, <Rn>, #<lsb>, #<width>

Bit layouts, most significant bit first ("|" separates the two halfwords of a 32-bit Thumb encoding):
T1 (32-bit): 1 1 1 1 0 (0) 1 1 0 1 1 0 Rn | 0 imm3 Rd imm2 (0) msb
A1 (32-bit): cond 0 1 1 1 1 1 0 msb Rd lsb 0 0 1 Rn


Assembler syntax

BFI{<c>}{<q>} <Rd>, <Rn>, #<lsb>, #<width>

where:

<c>, <q>  See Standard assembler syntax fields on page A8-287.

<Rd>  The destination register.

<Rn>  The source register.

<lsb>  The least significant destination bit, in the range 0 to 31. This determines the required value of lsbit.

<width>  The number of bits to be copied, in the range 1 to 32-<lsb>. The required value of msbit is <lsb>+<width>-1.

Operation

if ConditionPassed() then
    EncodingSpecificOperations();
    if msbit >= lsbit then
        R[d]<msbit:lsbit> = R[n]<(msbit-lsbit):0>;
        // Other bits of R[d] are unchanged
    else
        UNPREDICTABLE;

Exceptions

None.
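The same field arithmetic applies to BFI, except that the cleared field is refilled from the low-order bits of Rn. A minimal C sketch of the effect on Rd, with a hypothetical function name and an assertion that mirrors the documented operand ranges:

```c
#include <assert.h>
#include <stdint.h>

/* Effect of BFI <Rd>, <Rn>, #<lsb>, #<width> on the value of Rd. */
uint32_t bfi(uint32_t rd, uint32_t rn, unsigned lsb, unsigned width)
{
    assert(lsb <= 31u && width >= 1u && width <= 32u - lsb);
    /* Shifting a 32-bit value by 32 is undefined in C, so width 32 is special. */
    uint32_t ones = (width == 32u) ? 0xFFFFFFFFu : ((UINT32_C(1) << width) - 1u);
    /* R[d]<msbit:lsbit> = R[n]<(msbit-lsbit):0>; other bits of Rd unchanged. */
    return (rd & ~(ones << lsb)) | ((rn & ones) << lsb);
}
```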


A8.8.21 BIC (immediate)

Bitwise Bit Clear (immediate) performs a bitwise AND of a register value and the complement of an immediate value, and writes the result to the destination register. It can optionally update the condition flags based on the result.

Decode for encoding T1:
d = UInt(Rd); n = UInt(Rn); setflags = (S == '1');
(imm32, carry) = ThumbExpandImm_C(i:imm3:imm8, APSR.C);
if d IN {13,15} || n IN {13,15} then UNPREDICTABLE;

Decode for encoding A1:
For the case when cond is 0b1111, see Unconditional instructions on page A5-216.
if Rd == '1111' && S == '1' then SEE SUBS PC, LR and related instructions;
d = UInt(Rd); n = UInt(Rn); setflags = (S == '1');
(imm32, carry) = ARMExpandImm_C(imm12, APSR.C);

Encoding T1 ARMv6T2, ARMv7

BIC{S}<c> <Rd>, <Rn>, #<const>

Encoding A1 ARMv4*, ARMv5T*, ARMv6*, ARMv7

BIC{S}<c> <Rd>, <Rn>, #<const>

Bit layouts, most significant bit first ("|" separates the two halfwords of a 32-bit Thumb encoding):
T1 (32-bit): 1 1 1 1 0 i 0 0 0 0 1 S Rn | 0 imm3 Rd imm8
A1 (32-bit): cond 0 0 1 1 1 1 0 S Rn Rd imm12


Assembler syntax

BIC{S}{<c>}{<q>} {<Rd>,} <Rn>, #<const>

where:

S  If S is present, the instruction updates the flags. Otherwise, the flags are not updated.

<c>, <q>  See Standard assembler syntax fields on page A8-287.

<Rd>  The destination register. If S is specified and <Rd> is the PC, see SUBS PC, LR (Thumb) on page B9-2010 or SUBS PC, LR and related instructions (ARM) on page B9-2012.

In ARM instructions, if S is not specified and <Rd> is the PC, the instruction is a branch to the address calculated by the operation. This is an interworking branch, see Pseudocode details of operations on ARM core registers on page A2-47. ARM deprecates this use of the PC.

Note
Before ARMv7, this was a simple branch.

<Rn>  The register that contains the operand. The PC can be used in ARM instructions. ARM deprecates this use of the PC.

<const>  The immediate value to be bitwise inverted and ANDed with the value obtained from <Rn>. See Modified immediate constants in Thumb instructions on page A6-232 or Modified immediate constants in ARM instructions on page A5-200 for the range of values.

The pre-UAL syntax BIC<c>S is equivalent to BICS<c>.

Operation

if ConditionPassed() then
    EncodingSpecificOperations();
    result = R[n] AND NOT(imm32);
    if d == 15 then // Can only occur for ARM encoding
        ALUWritePC(result); // setflags is always FALSE here
    else
        R[d] = result;
        if setflags then
            APSR.N = result<31>;
            APSR.Z = IsZeroBit(result);
            APSR.C = carry;
            // APSR.V unchanged

Exceptions

None.
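Because BIC inverts its constant before the AND, a mask whose complement is a valid ARM modified immediate can be applied with BIC even when AND cannot encode the mask directly: clearing the low byte uses BIC with 0xFF rather than AND with 0xFFFFFF00. The C sketch below checks only the ARM-state rule, an 8-bit value rotated right by an even amount from 0 to 30; the Thumb rule on page A6-232 differs, and the function name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical check for an ARM-state modified immediate constant: true if
 * 'value' equals an 8-bit constant rotated right by 0, 2, 4, ..., or 30 bits. */
bool is_arm_modified_imm(uint32_t value)
{
    for (unsigned rot = 0; rot < 32u; rot += 2u) {
        /* Rotating left by 'rot' undoes a right rotation by 'rot'. */
        uint32_t v = (rot == 0) ? value : (value << rot) | (value >> (32u - rot));
        if (v <= 0xFFu)
            return true;
    }
    return false;
}
```

Some assemblers perform this AND-to-BIC substitution automatically; that is assembler behavior and is not specified in this section.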


A8.8.22 BIC (register)

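Bitwise Bit Clear (register) performs a bitwise AND of a register value and the complement of an optionally-shifted register value, and writes the result to the destination register. It can optionally update the condition flags based on the result.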

Decode for encoding T1:
d = UInt(Rdn); n = UInt(Rdn); m = UInt(Rm); setflags = !InITBlock();
(shift_t, shift_n) = (SRType_LSL, 0);

Decode for encoding T2:
d = UInt(Rd); n = UInt(Rn); m = UInt(Rm); setflags = (S == '1');
(shift_t, shift_n) = DecodeImmShift(type, imm3:imm2);
if d IN {13,15} || n IN {13,15} || m IN {13,15} then UNPREDICTABLE;

Decode for encoding A1:
For the case when cond is 0b1111, see Unconditional instructions on page A5-216.
if Rd == '1111' && S == '1' then SEE SUBS PC, LR and related instructions;
d = UInt(Rd); n = UInt(Rn); m = UInt(Rm); setflags = (S == '1');
(shift_t, shift_n) = DecodeImmShift(type, imm5);

Encoding T1 ARMv4T, ARMv5T*, ARMv6*, ARMv7

BICS <Rdn>, <Rm>

Outside IT block.

BIC<c> <Rdn>, <Rm>

Inside IT block.

Encoding T2 ARMv6T2, ARMv7

BIC{S}<c>.W <Rd>, <Rn>, <Rm>{, <shift>}

Encoding A1 ARMv4*, ARMv5T*, ARMv6*, ARMv7

BIC{S}<c> <Rd>, <Rn>, <Rm>{, <shift>}

Bit layouts, most significant bit first ("|" separates the two halfwords of a 32-bit Thumb encoding):
T1 (16-bit): 0 1 0 0 0 0 1 1 1 0 Rm Rdn
T2 (32-bit): 1 1 1 0 1 0 1 0 0 0 1 S Rn | (0) imm3 Rd imm2 type Rm
A1 (32-bit): cond 0 0 0 1 1 1 0 S Rn Rd imm5 type 0 Rm


Assembler syntax

BIC{S}{<c>}{<q>} {<Rd>,} <Rn>, <Rm> {, <shift>}

where:

S  If S is present, the instruction updates the flags. Otherwise, the flags are not updated.

<c>, <q>  See Standard assembler syntax fields on page A8-287.

<Rd>  The destination register. If S is specified and <Rd> is the PC, see SUBS PC, LR (Thumb) on page B9-2010 or SUBS PC, LR and related instructions (ARM) on page B9-2012.

In ARM instructions, if S is not specified and <Rd> is the PC, the instruction is a branch to the address calculated by the operation. This is an interworking branch, see Pseudocode details of operations on ARM core registers on page A2-47. ARM deprecates this use of the PC.

Note
Before ARMv7, this was a simple branch.

<Rn>  The first operand register. The PC can be used in ARM instructions. ARM deprecates this use of the PC.

<Rm>  The register that is optionally shifted and used as the second operand. The PC can be used in ARM instructions. ARM deprecates this use of the PC.

<shift>  The shift to apply to the value read from <Rm>. If present, encoding T1 is not permitted. If absent, no shift is applied and all encodings are permitted. Shifts applied to a register on page A8-291 describes the shifts and how they are encoded.

The pre-UAL syntax BIC<c>S is equivalent to BICS<c>.

Operation

if ConditionPassed() then
    EncodingSpecificOperations();
    (shifted, carry) = Shift_C(R[m], shift_t, shift_n, APSR.C);
    result = R[n] AND NOT(shifted);
    if d == 15 then // Can only occur for ARM encoding
        ALUWritePC(result); // setflags is always FALSE here
    else
        R[d] = result;
        if setflags then
            APSR.N = result<31>;
            APSR.Z = IsZeroBit(result);
            APSR.C = carry;
            // APSR.V unchanged

Exceptions

None.
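A C sketch of BICS with an LSL immediate shift, showing how the result and the N, Z, and C flags follow from the pseudocode above (APSR.V is left unchanged). Only the LSL case is modeled, an imm5 of 0 is treated as no shift, and the names bics_lsl and BicsResult are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    uint32_t result;
    bool n, z, c;   /* APSR.N, APSR.Z, APSR.C after BICS; V is not written */
} BicsResult;

/* Hypothetical model of BICS <Rd>, <Rn>, <Rm>, LSL #<imm5>. */
BicsResult bics_lsl(uint32_t rn, uint32_t rm, unsigned imm5, bool carry_in)
{
    assert(imm5 <= 31u);
    uint32_t shifted = rm << imm5;
    /* LSL carry-out is the last bit shifted out, bit (32 - imm5) of Rm;
     * with no shift the carry flag is unchanged. */
    bool carry = (imm5 == 0) ? carry_in : ((rm >> (32u - imm5)) & 1u);
    BicsResult r;
    r.result = rn & ~shifted;      /* R[n] AND NOT(shifted) */
    r.n = (r.result >> 31) & 1u;
    r.z = (r.result == 0);
    r.c = carry;
    return r;
}
```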


A8.8.23 BIC (register-shifted register)

Bitwise Bit Clear (register-shifted register) performs a bitwise AND of a register value and the complement of a register-shifted register value. It writes the result to the destination register, and can optionally update the condition flags based on the result.

Decode for encoding A1:
For the case when cond is 0b1111, see Unconditional instructions on page A5-216.
d = UInt(Rd); n = UInt(Rn); m = UInt(Rm); s = UInt(Rs);
setflags = (S == '1'); shift_t = DecodeRegShift(type);
if d == 15 || n == 15 || m == 15 || s == 15 then UNPREDICTABLE;

Encoding A1 ARMv4*, ARMv5T*, ARMv6*, ARMv7

BIC{S}<c> <Rd>, <Rn>, <Rm>, <type> <Rs>

Bit layout of encoding A1, most significant bit first:
A1 (32-bit): cond 0 0 0 1 1 1 0 S Rn Rd Rs 0 type 1 Rm


Assembler syntax

BIC{S}{<c>}{<q>} {<Rd>,} <Rn>, <Rm>, <type> <Rs>

where:

S  If S is present, the instruction updates the flags. Otherwise, the flags are not updated.

<c>, <q>  See Standard assembler syntax fields on page A8-287.

<Rd>  The destination register.

<Rn>  The first operand register.

<Rm>  The register that is shifted and used as the second operand.

<type>  The type of shift to apply to the value read from <Rm>. It must be one of:
ASR  Arithmetic shift right, encoded as type = 0b10
LSL  Logical shift left, encoded as type = 0b00
LSR  Logical shift right, encoded as type = 0b01
ROR  Rotate right, encoded as type = 0b11

<Rs>  The register whose bottom byte contains the amount to shift by.

The pre-UAL syntax BIC<c>S is equivalent to BICS<c>.

Operation

if ConditionPassed() then
    EncodingSpecificOperations();
    shift_n = UInt(R[s]<7:0>);
    (shifted, carry) = Shift_C(R[m], shift_t, shift_n, APSR.C);
    result = R[n] AND NOT(shifted);
    R[d] = result;
    if setflags then
        APSR.N = result<31>;
        APSR.Z = IsZeroBit(result);
        APSR.C = carry;
        // APSR.V unchanged

Exceptions

None.
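For ROR by register, Shift_C leaves the value and the carry unchanged when the bottom byte of Rs is 0. Otherwise the rotation uses the amount modulo 32 and copies bit 31 of the rotated value to the carry, so an amount of 32 returns Rm unchanged but still updates the carry. A hypothetical C sketch of BIC{S} <Rd>, <Rn>, <Rm>, ROR <Rs> under those rules (bic_ror_reg and BicRorOut are illustrative names):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    uint32_t result;
    bool carry;     /* value BICS would write to APSR.C */
} BicRorOut;

/* Hypothetical model of BIC{S} <Rd>, <Rn>, <Rm>, ROR <Rs>. */
BicRorOut bic_ror_reg(uint32_t rn, uint32_t rm, uint32_t rs, bool carry_in)
{
    unsigned amount = rs & 0xFFu;          /* shift_n = UInt(R[s]<7:0>) */
    uint32_t shifted = rm;
    bool carry = carry_in;                 /* amount 0: value and carry unchanged */
    if (amount != 0) {
        unsigned m = amount % 32u;         /* rotate by the amount modulo 32 */
        if (m != 0)
            shifted = (rm >> m) | (rm << (32u - m));
        carry = (shifted >> 31) & 1u;      /* carry out is bit 31 of the result */
    }
    BicRorOut out = { rn & ~shifted, carry };
    return out;
}
```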


A8.8.24 BKPT

Breakpoint causes a software breakpoint to occur.

Breakpoint is always unconditional, even when inside an IT block.

Decode for encoding T1:
imm32 = ZeroExtend(imm8, 32);
// imm32 is for assembly/disassembly only and is ignored by hardware.

Decode for encoding A1:
imm32 = ZeroExtend(imm12:imm4, 32);
// imm32 is for assembly/disassembly only and is ignored by hardware.
if cond != '1110' then UNPREDICTABLE; // BKPT must be encoded with AL condition

Encoding T1 ARMv5T*, ARMv6*, ARMv7

BKPT #<imm8>

Encoding A1 ARMv5T*, ARMv6*, ARMv7

BKPT #<imm16>

Bit layouts, most significant bit first:
T1 (16-bit): 1 0 1 1 1 1 1 0 imm8
A1 (32-bit): cond 0 0 0 1 0 0 1 0 imm12 0 1 1 1 imm4


Assembler syntax

BKPT{<q>} {#}<imm>

where:

<q>  See Standard assembler syntax fields on page A8-287. A BKPT instruction must be unconditional.

<imm>  Specifies a value that is stored in the instruction, in the range 0-255 for a Thumb instruction or 0-65535 for an ARM instruction. This value is ignored by the processor, but can be used by a debugger to store more information about the breakpoint.

Operation

EncodingSpecificOperations();

BKPTInstrDebugEvent();

Exceptions

Prefetch Abort.
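The 16-bit ARM immediate is split across the imm12 and imm4 fields, and the decode rejoins them as imm12:imm4. A C sketch of both encoders, assembled from the bit layouts of this instruction; the function names are hypothetical, and the ARM form always uses the AL condition (cond = 1110) because the decode treats any other condition as UNPREDICTABLE.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical encoder for Thumb encoding T1: 1 0 1 1 1 1 1 0 imm8. */
uint16_t bkpt_thumb(unsigned imm8)
{
    assert(imm8 <= 0xFFu);
    return (uint16_t)(0xBE00u | imm8);
}

/* Hypothetical encoder for ARM encoding A1:
 * cond(1110) 0001 0010 imm12 0111 imm4, where imm16 = imm12:imm4. */
uint32_t bkpt_arm(unsigned imm16)
{
    assert(imm16 <= 0xFFFFu);
    uint32_t imm12 = imm16 >> 4;     /* high 12 bits go to bits 19:8 */
    uint32_t imm4 = imm16 & 0xFu;    /* low 4 bits go to bits 3:0 */
    return UINT32_C(0xE1200070) | (imm12 << 8) | imm4;
}
```

For example, BKPT #0x1234 in ARM state encodes as 0xE1212374, and BKPT #0 in Thumb state as 0xBE00.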


A8.8.25 BL, BLX (immediate)

Branch with Link calls a subroutine at a PC-relative address.

Branch with Link and Exchange Instruction Sets (immediate) calls a subroutine at a PC-relative address, and changes instruction set from ARM to Thumb, or from Thumb to ARM.

// Encoding T1
I1 = NOT(J1 EOR S);  I2 = NOT(J2 EOR S);  imm32 = SignExtend(S:I1:I2:imm10:imm11:'0', 32);
targetInstrSet = CurrentInstrSet();
if InITBlock() && !LastInITBlock() then UNPREDICTABLE;

// Encoding T2
if CurrentInstrSet() == InstrSet_ThumbEE || H == '1' then UNDEFINED;
I1 = NOT(J1 EOR S);  I2 = NOT(J2 EOR S);  imm32 = SignExtend(S:I1:I2:imm10H:imm10L:'00', 32);
targetInstrSet = InstrSet_ARM;
if InITBlock() && !LastInITBlock() then UNPREDICTABLE;

// Encoding A1 (for the case when cond is 0b1111, see the A2 encoding)
imm32 = SignExtend(imm24:'00', 32);  targetInstrSet = InstrSet_ARM;

// Encoding A2
imm32 = SignExtend(imm24:H:'0', 32);  targetInstrSet = InstrSet_Thumb;

Encoding T1 ARMv4T, ARMv5T*, ARMv6*, ARMv7 if J1 == J2 == 1

ARMv6T2, ARMv7 otherwise

BL<c> <label>

Outside or last in IT block

1 1 1 1 0 S imm10 | 1 1 J1 1 J2 imm11

15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0 | 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0

Encoding T2 ARMv5T*, ARMv6*, ARMv7 if J1 == J2 == 1

ARMv6T2, ARMv7 otherwise

BLX<c> <label>

Outside or last in IT block

1 1 1 1 0 S imm10H | 1 1 J1 0 J2 imm10L H

15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0 | 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0

Encoding A1 ARMv4*, ARMv5T*, ARMv6*, ARMv7

BL<c> <label>

cond 1 0 1 1 imm24

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0

Encoding A2 ARMv5T*, ARMv6*, ARMv7

BLX <label>

1 1 1 1 1 0 1 H imm24

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0


Assembler syntax

BL{X}{<c>}{<q>} <label>

where:

<c>, <q>

See Standard assembler syntax fields on page A8-287. An ARM BLX (immediate) instruction must be unconditional.

X

If present, specifies a change of instruction set (from ARM to Thumb or from Thumb to ARM). If X is omitted, the processor remains in the same state. For ThumbEE instructions, specifying X is not permitted.

<label>

The label of the instruction that is to be branched to.

BL uses encoding T1 or A1. The assembler calculates the required value of the offset from the PC value of the BL instruction to this label, then selects an encoding with imm32 set to that offset.

BLX uses encoding T2 or A2. The assembler calculates the required value of the offset from the Align(PC, 4) value of the BLX instruction to this label, then selects an encoding with imm32 set to that offset.

Permitted offsets are:

Encoding T1 Even numbers in the range –16777216 to 16777214.

Encoding T2 Multiples of 4 in the range –16777216 to 16777212.

Encoding A1 Multiples of 4 in the range –33554432 to 33554428.

Encoding A2 Even numbers in the range –33554432 to 33554430.
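As a sketch of the T1 offset arithmetic (the I1/I2 recovery and sign extension in the decode pseudocode) and of the target calculation in the Operation pseudocode, the following C fragment may help. The helper names are hypothetical, and the PC values assume the usual read offsets of +4 in Thumb state and +8 in ARM state.

```c
#include <stdint.h>
#include <stdbool.h>

/* Decode imm32 from BL encoding T1 (hw1 = 11110 S imm10, hw2 = 11 J1 1 J2 imm11). */
static int32_t bl_t1_offset(uint16_t hw1, uint16_t hw2)
{
    uint32_t s     = (hw1 >> 10) & 1u;
    uint32_t imm10 = hw1 & 0x3FFu;
    uint32_t j1    = (hw2 >> 13) & 1u;
    uint32_t j2    = (hw2 >> 11) & 1u;
    uint32_t imm11 = hw2 & 0x7FFu;
    uint32_t i1    = (~(j1 ^ s)) & 1u;   /* I1 = NOT(J1 EOR S) */
    uint32_t i2    = (~(j2 ^ s)) & 1u;   /* I2 = NOT(J2 EOR S) */

    /* S:I1:I2:imm10:imm11:'0' is a 25-bit two's-complement value. */
    uint32_t imm = (s << 24) | (i1 << 23) | (i2 << 22) | (imm10 << 12) | (imm11 << 1);
    int32_t off = (int32_t)imm;          /* imm < 2^25, so this fits */
    if (s)
        off -= 0x2000000;                /* SignExtend(..., 32) */
    return off;
}

/* Branch target: BLX to ARM state uses Align(PC, 4); everything else uses PC. */
static uint32_t bl_target(uint32_t insn_addr, int32_t imm32, bool in_thumb, bool to_arm)
{
    uint32_t pc = insn_addr + (in_thumb ? 4u : 8u);
    if (to_arm)
        pc &= ~3u;
    return pc + (uint32_t)imm32;         /* modulo-2^32 addition */
}
```

For instance, the halfword pair 0xF7FF, 0xFFFE decodes to an offset of -4, which in Thumb state branches back to the BL instruction itself.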

Operation

if ConditionPassed() then
    EncodingSpecificOperations();
    if CurrentInstrSet() == InstrSet_ARM then
        LR = PC - 4;
    else
        LR = PC<31:1> : '1';
    if targetInstrSet == InstrSet_ARM then
        targetAddress = Align(PC,4) + imm32;
    else
        targetAddress = PC + imm32;
    SelectInstrSet(targetInstrSet);
    BranchWritePC(targetAddress);

Exceptions

None.

Branch range before ARMv6T2

Before ARMv6T2, J1 and J2 in encodings T1 and T2 were both 1, resulting in a smaller branch range. The instructions could be executed as two separate 16-bit instructions, as described in BL and BLX (immediate) instructions, before ARMv6T2 on page D12-2504.


A8.8.26 BLX (register)

Branch with Link and Exchange (register) calls a subroutine at an address and instruction set specified by a register.

// Encoding T1
m = UInt(Rm);
if m == 15 then UNPREDICTABLE;
if InITBlock() && !LastInITBlock() then UNPREDICTABLE;

// Encoding A1 (for the case when cond is 0b1111, see Unconditional instructions on page A5-216)
m = UInt(Rm);
if m == 15 then UNPREDICTABLE;

Encoding T1 ARMv5T*, ARMv6*, ARMv7

BLX<c> <Rm>

Outside or last in IT block

0 1 0 0 0 1 1 1 1 Rm (0) (0) (0)

15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0

Encoding A1 ARMv5T*, ARMv6*, ARMv7

BLX<c> <Rm>

cond 0 0 0 1 0 0 1 0 (1) (1) (1) (1) (1) (1) (1) (1) (1) (1) (1) (1) 0 0 1 1 Rm

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0


Assembler syntax

BLX{<c>}{<q>} <Rm>

where:

<c>, <q>

See Standard assembler syntax fields on page A8-287.

<Rm>

The register that contains the branch target address and instruction set selection bit. This register can be the SP in both ARM and Thumb instructions, but ARM deprecates this use of the SP.

Operation

if ConditionPassed() then
    EncodingSpecificOperations();
    target = R[m];
    if CurrentInstrSet() == InstrSet_ARM then
        next_instr_addr = PC - 4;
        LR = next_instr_addr;
    else
        next_instr_addr = PC - 2;
        LR = next_instr_addr<31:1> : '1';
    BXWritePC(target);

Exceptions

None.
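The LR value written by the Operation pseudocode can be checked with a small C helper. The name blx_reg_lr is hypothetical; the sketch assumes the usual PC read offsets (+8 in ARM state, +4 in Thumb state) and models only encodings T1 and A1.

```c
#include <stdint.h>
#include <stdbool.h>

/* Return address that BLX (register) writes to LR. */
static uint32_t blx_reg_lr(uint32_t insn_addr, bool in_thumb)
{
    if (!in_thumb)
        return (insn_addr + 8u) - 4u;                 /* LR = PC - 4 */
    uint32_t next_instr_addr = (insn_addr + 4u) - 2u;  /* PC - 2: BLX <Rm> is 16-bit */
    return (next_instr_addr & ~1u) | 1u;              /* next_instr_addr<31:1> : '1' */
}
```

Setting bit 0 of the Thumb return address means that a later BX LR returns to Thumb state.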


A8.8.27 BX

Branch and Exchange causes a branch to an address and instruction set specified by a register.

// Encoding T1
m = UInt(Rm);
if InITBlock() && !LastInITBlock() then UNPREDICTABLE;

// Encoding A1 (for the case when cond is 0b1111, see Unconditional instructions on page A5-216)
m = UInt(Rm);

Encoding T1 ARMv4T, ARMv5T*, ARMv6*, ARMv7

BX<c> <Rm>

Outside or last in IT block

0 1 0 0 0 1 1 1 0 Rm (0) (0) (0)

15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0

Encoding A1 ARMv4T, ARMv5T*, ARMv6*, ARMv7

BX<c> <Rm>

cond 0 0 0 1 0 0 1 0 (1) (1) (1) (1) (1) (1) (1) (1) (1) (1) (1) (1) 0 0 0 1 Rm

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0


Assembler syntax

BX{<c>}{<q>} <Rm>

where:

<c>, <q>

See Standard assembler syntax fields on page A8-287.

<Rm>

The register that contains the branch target address and instruction set selection bit. The PC can be used. This register can be the SP in both ARM and Thumb instructions, but ARM deprecates this use of the SP.

Note

If <Rm> is the PC in a Thumb instruction at a non word-aligned address, it results in UNPREDICTABLE behavior because the address passed to the BXWritePC() pseudocode function has bits<1:0> = '10'.
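The target-address rules that the Note relies on can be sketched in C. This is a simplified model, not the BXWritePC() definition itself: it assumes the non-ThumbEE behavior of that function (bit 0 set selects Thumb state, an address with bits<1:0> = '00' selects ARM state, and bits<1:0> = '10' is UNPREDICTABLE), and the names are hypothetical.

```c
#include <stdint.h>

enum bx_result { BX_ARM, BX_THUMB, BX_UNPREDICTABLE };

/* Classify a BX target and compute the new PC, ignoring ThumbEE state. */
static enum bx_result bx_decode(uint32_t target, uint32_t *new_pc)
{
    if (target & 1u) {                 /* bit 0 set: continue in Thumb state */
        *new_pc = target & ~1u;
        return BX_THUMB;
    }
    if ((target & 2u) == 0) {          /* bits<1:0> == '00': ARM state */
        *new_pc = target;
        return BX_ARM;
    }
    return BX_UNPREDICTABLE;           /* bits<1:0> == '10' */
}
```

A Thumb BX PC at address 0x8002 reads PC as 0x8006, which is exactly the bits<1:0> = '10' case.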

Operation

if ConditionPassed() then
    EncodingSpecificOperations();
    BXWritePC(R[m]);

Exceptions

None.


A8.8.28 BXJ

Branch and Exchange Jazelle attempts to change to Jazelle state. If the attempt fails, it branches to an address and instruction set specified by a register as though it were a BX instruction.

In an implementation that includes the Virtualization Extensions, if HSTR.TJDBX is set to 1, execution of a BXJ instruction in a Non-secure mode other than Hyp mode generates a Hyp Trap exception. For more information see Trapping accesses to Jazelle functionality on page B1-1256.

// Encoding T1
m = UInt(Rm);
if m IN {13,15} then UNPREDICTABLE;
if InITBlock() && !LastInITBlock() then UNPREDICTABLE;

// Encoding A1 (for the case when cond is 0b1111, see Unconditional instructions on page A5-216)
m = UInt(Rm);
if m == 15 then UNPREDICTABLE;

Encoding T1 ARMv6T2, ARMv7

BXJ<c> <Rm>

Outside or last in IT block

1 1 1 1 0 0 1 1 1 1 0 0 Rm | 1 0 (0) 0 (1) (1) (1) (1) (0) (0) (0) (0) (0) (0) (0) (0)

15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0 | 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0

Encoding A1 ARMv5TEJ, ARMv6*, ARMv7

BXJ<c> <Rm>

cond 0 0 0 1 0 0 1 0 (1) (1) (1) (1) (1) (1) (1) (1) (1) (1) (1) (1) 0 0 1 0 Rm

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0


Assembler syntax

BXJ{<c>}{<q>} <Rm>

where:

<c>, <q>

See Standard assembler syntax fields on page A8-287.

<Rm>

The register that specifies the branch target address and instruction set selection bit to be used if the attempt to switch to Jazelle state fails.

Operation

if ConditionPassed() then
    EncodingSpecificOperations();
    if HaveVirtExt() && !IsSecure() && !CurrentModeIsHyp() && HSTR.TJDBX == '1' then
        HSRString = Zeros(25);
        HSRString<3:0> = m;
        WriteHSR('001010', HSRString);
        TakeHypTrapException();
    elsif JMCR.JE == '0' || CurrentInstrSet() == InstrSet_ThumbEE then
        BXWritePC(R[m]);
    else
        if JazelleAcceptsExecution() then
            SwitchToJazelleExecution();
        else
            SUBARCHITECTURE_DEFINED handler call;

Exceptions

Hyp Trap.
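The order of the tests in the Operation pseudocode determines which outcome wins. The following C sketch mirrors that order; the struct fields are hypothetical stand-ins for the pseudocode predicates (HaveVirtExt(), IsSecure(), CurrentModeIsHyp(), HSTR.TJDBX, JMCR.JE, JazelleAcceptsExecution()).

```c
#include <stdbool.h>

enum bxj_outcome { BXJ_HYP_TRAP, BXJ_BX, BXJ_JAZELLE, BXJ_SUBARCH_HANDLER };

/* Hypothetical snapshot of the state that the BXJ pseudocode consults. */
struct bxj_state {
    bool have_virt_ext, secure, hyp_mode, hstr_tjdbx;
    bool jmcr_je, in_thumbee, jazelle_accepts;
};

static enum bxj_outcome bxj_decide(const struct bxj_state *s)
{
    if (s->have_virt_ext && !s->secure && !s->hyp_mode && s->hstr_tjdbx)
        return BXJ_HYP_TRAP;          /* Hyp Trap, HSR EC = 0b001010 */
    if (!s->jmcr_je || s->in_thumbee)
        return BXJ_BX;                /* behaves like BX <Rm> */
    if (s->jazelle_accepts)
        return BXJ_JAZELLE;           /* enter Jazelle state */
    return BXJ_SUBARCH_HANDLER;       /* SUBARCHITECTURE_DEFINED handler call */
}
```

With the Virtualization Extensions present and HSTR.TJDBX set, a Non-secure BXJ outside Hyp mode traps even when Jazelle is enabled.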


A8.8.29 CBNZ, CBZ

Compare and Branch on Nonzero and Compare and Branch on Zero compare the value in a register with zero, and conditionally branch forward a constant value. They do not affect the condition flags.

n = UInt(Rn);  imm32 = ZeroExtend(i:imm5:'0', 32);  nonzero = (op == '1');
if InITBlock() then UNPREDICTABLE;

Encoding T1 ARMv6T2, ARMv7

CB{N}Z <Rn>, <label>

Not permitted in IT block.

1 0 1 1 op 0 i 1 imm5 Rn

15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0


Assembler syntax

CB{N}Z{<q>} <Rn>, <label>

where:

N

If specified, causes the branch to occur when the contents of <Rn> are nonzero (encoded as op = 1). If omitted, causes the branch to occur when the contents of <Rn> are zero (encoded as op = 0).

<q>

See Standard assembler syntax fields on page A8-287. A CBZ or CBNZ instruction must be unconditional.

<Rn>

The operand register.

<label>

The label of the instruction that is to be branched to. The assembler calculates the required value of the offset from the PC value of the CBZ or CBNZ instruction to this label, then selects an encoding that sets imm32 to that offset. Permitted offsets are even numbers in the range 0 to 126.
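The decode of i:imm5:'0' and the branch condition can be sketched in C as follows. The helper names are hypothetical. Because the target is PC + imm32, with PC reading as the instruction address plus 4, CBZ and CBNZ can only branch forward.

```c
#include <stdint.h>
#include <stdbool.h>

/* Fields of CBZ/CBNZ encoding T1: 1011 op 0 i 1 imm5 Rn. */
struct cbz_fields { unsigned rn; bool nonzero; uint32_t imm32; };

static struct cbz_fields cbz_decode(uint16_t hw)
{
    struct cbz_fields f;
    f.rn      = hw & 7u;                  /* Rn: R0-R7 only */
    f.nonzero = ((hw >> 11) & 1u) != 0;   /* op == 1 means CBNZ */
    uint32_t i    = (hw >> 9) & 1u;
    uint32_t imm5 = (hw >> 3) & 0x1Fu;
    f.imm32 = (i << 6) | (imm5 << 1);     /* ZeroExtend(i:imm5:'0', 32) */
    return f;
}

/* The branch is taken when nonzero != IsZero(R[n]). */
static bool cbz_taken(struct cbz_fields f, uint32_t rn_value)
{
    return f.nonzero != (rn_value == 0);
}
```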

Operation

EncodingSpecificOperations();
if nonzero != IsZero(R[n]) then
    BranchWritePC(PC + imm32);

Exceptions

None.


A8.8.30 CDP, CDP2

Coprocessor Data Processing tells a coprocessor to perform an operation that is independent of ARM core registers and memory. If no coprocessor can execute the instruction, an Undefined Instruction exception is generated.

This is a generic coprocessor instruction. Some of the fields have no functionality defined by the architecture and are free for use by the coprocessor instruction set designer. These are the opc1, opc2, CRd, CRn, and CRm fields.

However, coprocessors CP8-CP15 are reserved for use by ARM, and this manual defines the valid CDP and CDP2 instructions when coproc is in the range p8-p15. For more information see Coprocessor support on page A2-94 and General behavior of system control registers on page B5-1776.

// Encoding T1/A1 (for the case when cond is 0b1111, see the T2/A2 encoding)
if coproc IN "101x" then SEE "Floating-point instructions";
cp = UInt(coproc);

// Encoding T2/A2
if coproc IN "101x" then UNDEFINED;
cp = UInt(coproc);

Encoding T1/A1 ARMv6T2, ARMv7 for encoding T1

ARMv4*, ARMv5T*, ARMv6*, ARMv7 for encoding A1

CDP<c> <coproc>, <opc1>, <CRd>, <CRn>, <CRm>, <opc2>

T1: 1 1 1 0 1 1 1 0 opc1 CRn | CRd coproc opc2 0 CRm

15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0 | 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0

A1: cond 1 1 1 0 opc1 CRn CRd coproc opc2 0 CRm

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0

Encoding T2/A2 ARMv6T2, ARMv7 for encoding T2

ARMv5T*, ARMv6*, ARMv7 for encoding A2

CDP2<c> <coproc>, <opc1>, <CRd>, <CRn>, <CRm>, <opc2>

T2: 1 1 1 1 1 1 1 0 opc1 CRn | CRd coproc opc2 0 CRm

15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0 | 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0

A2: 1 1 1 1 1 1 1 0 opc1 CRn CRd coproc opc2 0 CRm

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0

Floating-point instructions See Floating-point data-processing instructions on page A7-272.


Assembler syntax

CDP{2}{<c>}{<q>} <coproc>, {#}<opc1>, <CRd>, <CRn>, <CRm> {, {#}<opc2>}

where:

2

If specified, selects encoding T2/A2. If omitted, selects encoding T1/A1.

<c>, <q>

See Standard assembler syntax fields on page A8-287. An ARM CDP2 instruction must be unconditional.

<coproc>

The name of the coprocessor. The corresponding coprocessor number is placed in the coproc field of the instruction. The generic coprocessor names are p0-p15.

<opc1>

Is a coprocessor-specific opcode, in the range 0 to 15.

<CRd>

The destination coprocessor register for the instruction.

<CRn>

The coprocessor register that contains the first operand.

<CRm>

The coprocessor register that contains the second operand.

<opc2>

Is a coprocessor-specific opcode in the range 0 to 7. If it is omitted, <opc2> is 0.
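A sketch of how the assembler operands map onto the A1 encoding fields, written in C. The function is hypothetical; it rejects coproc values 0b101x because those decode as floating-point instructions, and rejects cond 0b1111, which selects the CDP2 A2 encoding instead.

```c
#include <stdint.h>
#include <stdbool.h>

/* Pack CDP encoding A1: cond 1110 opc1 CRn CRd coproc opc2 0 CRm.
   Arguments follow the assembler operand order. */
static bool cdp_a1(unsigned cond, unsigned coproc, unsigned opc1, unsigned crd,
                   unsigned crn, unsigned crm, unsigned opc2, uint32_t *out)
{
    if (cond > 14 || coproc > 15 || opc1 > 15 || crd > 15 ||
        crn > 15 || crm > 15 || opc2 > 7)
        return false;                 /* field out of range, or cond = 0b1111 */
    if ((coproc & 0xEu) == 0xAu)
        return false;                 /* coproc IN "101x": floating-point space */
    *out = ((uint32_t)cond << 28) | (0xEu << 24) | ((uint32_t)opc1 << 20) |
           ((uint32_t)crn << 16) | ((uint32_t)crd << 12) | ((uint32_t)coproc << 8) |
           ((uint32_t)opc2 << 5) | (uint32_t)crm;
    return true;
}
```

For example, CDP p7, #1, c2, c3, c4, #5 with the AL condition packs to 0xEE1327A4.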

Operation

if ConditionPassed() then
    EncodingSpecificOperations();
    if !Coproc_Accepted(cp, ThisInstr()) then
        GenerateCoprocessorException();
    else
        Coproc_InternalOperation(cp, ThisInstr());

Exceptions

Undefined Instruction.

Uses of these instructions by specific coprocessors might generate other exceptions.


A8.8.31 CHKA

CHKA is a ThumbEE instruction, see CHKA on page A9-1124.

A8.8.32 CLREX

Clear-Exclusive clears the local record of the executing processor that an address has had a request for an exclusive access.
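To illustrate what clearing the local record means, the following C toy model tracks one exclusive request per processor. It is a deliberate simplification under stated assumptions: the names are hypothetical, the monitor tracks a single exact address rather than an IMPLEMENTATION DEFINED granule, and a store-exclusive to a different address is modelled as failing, although the architecture does not require it to fail.

```c
#include <stdint.h>
#include <stdbool.h>

/* Toy local exclusive monitor for one processor. */
struct local_monitor { bool exclusive; uint32_t addr; };

static void model_ldrex(struct local_monitor *m, uint32_t addr)
{
    m->exclusive = true;    /* record an exclusive request for addr */
    m->addr = addr;
}

/* Returns true if the store-exclusive succeeds; the record is cleared either way. */
static bool model_strex(struct local_monitor *m, uint32_t addr)
{
    bool ok = m->exclusive && m->addr == addr;
    m->exclusive = false;
    return ok;
}

static void model_clrex(struct local_monitor *m)
{
    m->exclusive = false;   /* ClearExclusiveLocal(): forget the outstanding request */
}
```

In this model, a CLREX between the load-exclusive and the store-exclusive makes the store-exclusive fail, which is the behavior software relies on when it abandons an exclusive sequence, for example on a context switch.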

// No additional decoding required

Encoding T1 ARMv7

CLREX<c>

1 1 1 1 0 0 1 1 1 0 1 1 (1) (1) (1) (1) | 1 0 (0) 0 (1) (1) (1) (1) 0 0 1 0 (1) (1) (1) (1)

15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0 | 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0

Encoding A1 ARMv6K, ARMv7

CLREX

1 1 1 1 0 1 0 1 0 1 1 1 (1) (1) (1) (1) (1) (1) (1) (1) (0) (0) (0) (0) 0 0 0 1 (1) (1) (1) (1)

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0


Assembler syntax

CLREX{<c>}{<q>}

where:

<c>, <q>

See Standard assembler syntax fields on page A8-287. An ARM CLREX instruction must be unconditional.

Operation

if ConditionPassed() then
    EncodingSpecificOperations();
    ClearExclusiveLocal(ProcessorID());

Exceptions

None.


A8.8.33 CLZ

Count Leading Zeros returns the number of binary zero bits before the first binary one bit in a value.
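A portable C sketch of CountLeadingZeroBits(), the value that CLZ computes, is shown below. It favors clarity over speed, and the function name is hypothetical.

```c
#include <stdint.h>

/* Number of zero bits above the most significant set bit; 32 for zero. */
static unsigned clz32(uint32_t x)
{
    if (x == 0)
        return 32;                    /* CLZ of 0 is 32 */
    unsigned n = 0;
    while ((x & 0x80000000u) == 0) {  /* shift left until bit 31 is set */
        x <<= 1;
        n++;
    }
    return n;
}
```

GCC and Clang provide __builtin_clz, but its result is undefined for a zero argument, whereas CLZ is defined to return 32 for zero.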

// Encoding T1
if !Consistent(Rm) then UNPREDICTABLE;
d = UInt(Rd);  m = UInt(Rm);
if d IN {13,15} || m IN {13,15} then UNPREDICTABLE;

// Encoding A1 (for the case when cond is 0b1111, see Unconditional instructions on page A5-216)
d = UInt(Rd);  m = UInt(Rm);
if d == 15 || m == 15 then UNPREDICTABLE;

Encoding T1 ARMv6T2, ARMv7

CLZ<c> <Rd>, <Rm>

1 1 1 1 1 0 1 0 1 0 1 1 Rm | 1 1 1 1 Rd 1 0 0 0 Rm

15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0 | 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0

Encoding A1 ARMv5T*, ARMv6*, ARMv7

CLZ<c> <Rd>, <Rm>

cond 0 0 0 1 0 1 1 0 (1) (1) (1) (1) Rd (1) (1) (1) (1) 0 0 0 1 Rm

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0


Assembler syntax

CLZ{<c>}{<q>} <Rd>, <Rm>

where:

<c>, <q>

See Standard assembler syntax fields on page A8-287.

<Rd>

The destination register.

<Rm>

The register that contains the operand. Its number must be encoded twice in encoding T1.

Operation

if ConditionPassed() then
    EncodingSpecificOperations();
    result = CountLeadingZeroBits(R[m]);
    R[d] = result<31:0>;

Exceptions

None.


A8.8.34 CMN (immediate)

Compare Negative (immediate) adds a register value and an immediate value. It updates the condition flags based on the result, and discards the result.

// Encoding T1
n = UInt(Rn);  imm32 = ThumbExpandImm(i:imm3:imm8);
if n == 15 then UNPREDICTABLE;

// Encoding A1 (for the case when cond is 0b1111, see Unconditional instructions on page A5-216)
n = UInt(Rn);  imm32 = ARMExpandImm(imm12);

Encoding T1 ARMv6T2, ARMv7

CMN<c> <Rn>, #<const>

1 1 1 1 0 i 0 1 0 0 0 1 Rn | 0 imm3 1 1 1 1 imm8

15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0 | 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0

Encoding A1 ARMv4*, ARMv5T*, ARMv6*, ARMv7

CMN<c> <Rn>, #<const>

cond 0 0 1 1 0 1 1 1 Rn (0) (0) (0) (0) imm12

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0


Assembler syntax

CMN{<c>}{<q>} <Rn>, #<const>

where:

<c>, <q>

See Standard assembler syntax fields on page A8-287.

<Rn>

The register that contains the operand. SP can be used in Thumb and ARM instructions. The PC can be used in ARM instructions, but ARM deprecates this use of the PC.

<const>

The immediate value to be added to the value obtained from <Rn>. See Modified immediate constants in Thumb instructions on page A6-232 or Modified immediate constants in ARM instructions on page A5-200 for the range of values.

Operation

if ConditionPassed() then
    EncodingSpecificOperations();
    (result, carry, overflow) = AddWithCarry(R[n], imm32, '0');
    APSR.N = result<31>;
    APSR.Z = IsZeroBit(result);
    APSR.C = carry;
    APSR.V = overflow;

Exceptions

None.
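The flag computation in the Operation pseudocode can be sketched in C with a model of AddWithCarry(). The helper names are hypothetical. The overflow flag is computed with the common sign-bit test (operands of equal sign, result of different sign) instead of the signed-sum comparison that the pseudocode uses; for 32-bit addition with a carry-in, the two give the same answer.

```c
#include <stdint.h>
#include <stdbool.h>

struct nzcv { bool n, z, c, v; };

/* Model of AddWithCarry(x, y, carry_in), returning the 32-bit result. */
static uint32_t add_with_carry(uint32_t x, uint32_t y, bool carry_in, struct nzcv *f)
{
    uint64_t wide = (uint64_t)x + y + carry_in;
    uint32_t result = (uint32_t)wide;
    f->n = (result >> 31) != 0;
    f->z = result == 0;
    f->c = (wide >> 32) != 0;                          /* carry out of bit 31 */
    f->v = ((~(x ^ y) & (x ^ result)) >> 31) != 0;     /* signed overflow */
    return result;
}

/* CMN Rn, #imm32: flags from Rn + imm32; the sum itself is discarded. */
static struct nzcv cmn_flags(uint32_t rn, uint32_t imm32)
{
    struct nzcv f;
    (void)add_with_carry(rn, imm32, false, &f);
    return f;
}
```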


A8.8.35 CMN (register)

Compare Negative (register) adds a register value and an optionally-shifted register value. It updates the condition flags based on the result, and discards the result.

// Encoding T1
n = UInt(Rn);  m = UInt(Rm);
(shift_t, shift_n) = (SRType_LSL, 0);

// Encoding T2
n = UInt(Rn);  m = UInt(Rm);
(shift_t, shift_n) = DecodeImmShift(type, imm3:imm2);
if n == 15 || m IN {13,15} then UNPREDICTABLE;

// Encoding A1 (for the case when cond is 0b1111, see Unconditional instructions on page A5-216)
n = UInt(Rn);  m = UInt(Rm);
(shift_t, shift_n) = DecodeImmShift(type, imm5);

Encoding T1 ARMv4T, ARMv5T*, ARMv6*, ARMv7

CMN<c> <Rn>, <Rm>

0 1 0 0 0 0 1 0 1 1 Rm Rn

15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0

Encoding T2 ARMv6T2, ARMv7

CMN<c>.W <Rn>, <Rm>{, <shift>}

1 1 1 0 1 0 1 1 0 0 0 1 Rn | (0) imm3 1 1 1 1 imm2 type Rm

15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0 | 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0

Encoding A1 ARMv4*, ARMv5T*, ARMv6*, ARMv7

CMN<c> <Rn>, <Rm>{, <shift>}

cond 0 0 0 1 0 1 1 1 Rn (0) (0) (0) (0) imm5 type 0 Rm

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0


Assembler syntax

CMN{<c>}{<q>} <Rn>, <Rm> {, <shift>}

where:

<c>, <q>

See Standard assembler syntax fields on page A8-287.

<Rn>

The first operand register. SP can be used in Thumb instructions (encoding T2) and in ARM instructions. The PC can be used in ARM instructions, but ARM deprecates this use of the PC.

<Rm>

The register that is optionally shifted and used as the second operand. The PC can be used in ARM instructions, but ARM deprecates this use of the PC.

<shift>

The shift to apply to the value read from <Rm>. If present, encoding T1 is not permitted. If absent, no shift is applied and all encodings are permitted. Shifts applied to a register on page A8-291 describes the shifts and how they are encoded.

Operation

if ConditionPassed() then
    EncodingSpecificOperations();
    shifted = Shift(R[m], shift_t, shift_n, APSR.C);
    (result, carry, overflow) = AddWithCarry(R[n], shifted, '0');
    APSR.N = result<31>;
    APSR.Z = IsZeroBit(result);
    APSR.C = carry;
    APSR.V = overflow;

Exceptions

None.


A8.8.36 CMN (register-shifted register)

Compare Negative (register-shifted register) adds a register value and a register-shifted register value. It updates the condition flags based on the result, and discards the result.

// Encoding A1 (for the case when cond is 0b1111, see Unconditional instructions on page A5-216)
n = UInt(Rn);  m = UInt(Rm);  s = UInt(Rs);
shift_t = DecodeRegShift(type);
if n == 15 || m == 15 || s == 15 then UNPREDICTABLE;

Encoding A1 ARMv4*, ARMv5T*, ARMv6*, ARMv7

CMN<c> <Rn>, <Rm>, <type> <Rs>

cond 0 0 0 1 0 1 1 1 Rn (0) (0) (0) (0) Rs 0 type 1 Rm

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0


Assembler syntax

CMN{<c>}{<q>} <Rn>, <Rm>, <type> <Rs>

where:

<c>, <q>

See Standard assembler syntax fields on page A8-287.

<Rn>

The first operand register.

<Rm>

The register that is shifted and used as the second operand.

<type>

The type of shift to apply to the value read from <Rm>. It must be one of:

ASR

Arithmetic shift right, encoded as type = 0b10

LSL

Logical shift left, encoded as type = 0b00

LSR

Logical shift right, encoded as type = 0b01

ROR

Rotate right, encoded as type = 0b11

<Rs>

The register whose bottom byte contains the amount to shift by.
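The effect of using only the bottom byte of <Rs> can be sketched in C. This models only the shifted value, not the carry output of Shift_C(). The enum and function names are hypothetical, and the enumerator values follow the type encodings listed above.

```c
#include <stdint.h>

enum shift_type { SHIFT_LSL = 0, SHIFT_LSR = 1, SHIFT_ASR = 2, SHIFT_ROR = 3 };

/* Shift(R[m], shift_t, UInt(R[s]<7:0>), ...) for register-shifted register forms. */
static uint32_t shift_by_register(uint32_t value, enum shift_type t, uint32_t rs)
{
    uint32_t n = rs & 0xFFu;              /* only the bottom byte of Rs is used */
    if (n == 0)
        return value;                     /* a zero shift leaves the value unchanged */
    switch (t) {
    case SHIFT_LSL:
        return n >= 32 ? 0 : value << n;
    case SHIFT_LSR:
        return n >= 32 ? 0 : value >> n;
    case SHIFT_ASR:
        if (n >= 32)
            n = 31;                       /* every bit becomes a copy of bit 31 */
        return (value & 0x80000000u) ? ~(~value >> n) : value >> n;
    case SHIFT_ROR: {
        uint32_t m = n % 32;              /* rotation amount is taken modulo 32 */
        return m == 0 ? value : (value >> m) | (value << (32 - m));
    }
    }
    return value;
}
```

Note that a shift amount of 256 in Rs leaves the operand unshifted, because only bits<7:0> of Rs are read.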

Operation

if ConditionPassed() then
    EncodingSpecificOperations();
    shift_n = UInt(R[s]<7:0>);
    shifted = Shift(R[m], shift_t, shift_n, APSR.C);
    (result, carry, overflow) = AddWithCarry(R[n], shifted, '0');
    APSR.N = result<31>;
    APSR.Z = IsZeroBit(result);
    APSR.C = carry;
    APSR.V = overflow;

Exceptions

None.


A8.8.37 CMP (immediate)

Compare (immediate) subtracts an immediate value from a register value. It updates the condition flags based on the result, and discards the result.

// Encoding T1
n = UInt(Rn);  imm32 = ZeroExtend(imm8, 32);

// Encoding T2
n = UInt(Rn);  imm32 = ThumbExpandImm(i:imm3:imm8);
if n == 15 then UNPREDICTABLE;

// Encoding A1 (for the case when cond is 0b1111, see Unconditional instructions on page A5-216)
n = UInt(Rn);  imm32 = ARMExpandImm(imm12);

Encoding T1 ARMv4T, ARMv5T*, ARMv6*, ARMv7

CMP<c> <Rn>, #<imm8>

0 0 1 0 1 Rn imm8

15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0

Encoding T2 ARMv6T2, ARMv7

CMP<c>.W <Rn>, #<const>

1 1 1 1 0 i 0 1 1 0 1 1 Rn | 0 imm3 1 1 1 1 imm8

15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0 | 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0

Encoding A1 ARMv4*, ARMv5T*, ARMv6*, ARMv7

CMP<c> <Rn>, #<const>

cond 0 0 1 1 0 1 0 1 Rn (0) (0) (0) (0) imm12

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0


Assembler syntax

CMP{<c>}{<q>} <Rn>, #<const>

where:

<c>, <q>

See Standard assembler syntax fields on page A8-287.

<Rn>

The first operand register. SP can be used in Thumb instructions (encoding T2) and in ARM instructions. The PC can be used in ARM instructions, but ARM deprecates this use of the PC.

<const>

The immediate value to be compared with the value obtained from <Rn>. The range of values is 0-255 for encoding T1. See Modified immediate constants in Thumb instructions on page A6-232 or Modified immediate constants in ARM instructions on page A5-200 for the range of values for encodings T2 and A1.

Operation

if ConditionPassed() then
    EncodingSpecificOperations();
    (result, carry, overflow) = AddWithCarry(R[n], NOT(imm32), '1');
    APSR.N = result<31>;
    APSR.Z = IsZeroBit(result);
    APSR.C = carry;
    APSR.V = overflow;

Exceptions

None.
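Because CMP computes AddWithCarry(R[n], NOT(imm32), '1'), which equals R[n] - imm32, the C flag is set when no borrow occurs. A C sketch under that reading follows; the names are hypothetical.

```c
#include <stdint.h>
#include <stdbool.h>

struct nzcv { bool n, z, c, v; };

/* CMP Rn, #imm32 as AddWithCarry(Rn, NOT(imm32), '1'). */
static struct nzcv cmp_imm(uint32_t rn, uint32_t imm32)
{
    uint64_t wide = (uint64_t)rn + (uint32_t)~imm32 + 1u;
    uint32_t r = (uint32_t)wide;
    struct nzcv f;
    f.n = (r >> 31) != 0;
    f.z = r == 0;
    f.c = (wide >> 32) != 0;                        /* 1 means no borrow: rn >= imm32 unsigned */
    f.v = (((rn ^ imm32) & (rn ^ r)) >> 31) != 0;   /* operand signs differ, result sign flips */
    return f;
}
```

After a CMP, the HS (CS) condition therefore means unsigned greater than or equal, and GE (N == V) means signed greater than or equal.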


A8.8.38 CMP (register)

Compare (register) subtracts an optionally-shifted register value from a register value. It updates the condition flags based on the result, and discards the result.

// Encoding T1
n = UInt(Rn);  m = UInt(Rm);
(shift_t, shift_n) = (SRType_LSL, 0);

// Encoding T2
n = UInt(N:Rn);  m = UInt(Rm);
(shift_t, shift_n) = (SRType_LSL, 0);
if n < 8 && m < 8 then UNPREDICTABLE;
if n == 15 || m == 15 then UNPREDICTABLE;

// Encoding T3
n = UInt(Rn);  m = UInt(Rm);
(shift_t, shift_n) = DecodeImmShift(type, imm3:imm2);
if n == 15 || m IN {13,15} then UNPREDICTABLE;

// Encoding A1 (for the case when cond is 0b1111, see Unconditional instructions on page A5-216)
n = UInt(Rn);  m = UInt(Rm);
(shift_t, shift_n) = DecodeImmShift(type, imm5);

Encoding T1 ARMv4T, ARMv5T*, ARMv6*, ARMv7

CMP<c> <Rn>, <Rm>

<Rn> and <Rm> both from R0-R7

0 1 0 0 0 0 1 0 1 0 Rm Rn

15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0

Encoding T2 ARMv4T, ARMv5T*, ARMv6*, ARMv7

CMP<c> <Rn>, <Rm>

<Rn> and <Rm> not both from R0-R7

0 1 0 0 0 1 0 1 N Rm Rn

15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0

Encoding T3 ARMv6T2, ARMv7

CMP<c>.W <Rn>, <Rm> {, <shift>}

1 1 1 0 1 0 1 1 1 0 1 1 Rn | (0) imm3 1 1 1 1 imm2 type Rm

15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0 | 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0

Encoding A1 ARMv4*, ARMv5T*, ARMv6*, ARMv7

CMP<c> <Rn>, <Rm>{, <shift>}

cond 0 0 0 1 0 1 0 1 Rn (0) (0) (0) (0) imm5 type 0 Rm

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0


Assembler syntax

CMP{<c>}{<q>} <Rn>, <Rm> {, <shift>}

where:

<c>, <q>

See Standard assembler syntax fields on page A8-287.

<Rn>

The first operand register. The SP can be used. The PC can be used in ARM instructions, but ARM deprecates this use of the PC.

<Rm>

The register that is optionally shifted and used as the second operand. The PC can be used in ARM instructions, but ARM deprecates this use of the PC. The SP can be used in both ARM and Thumb instructions, but:

• ARM deprecates the use of SP

• when assembling for the Thumb instruction set, only encoding T2 is available.

<shift>

The shift to apply to the value read from <Rm>. If present, encodings T1 and T2 are not permitted. If absent, no shift is applied and all encodings are permitted. Shifts applied to a register on page A8-291 describes the shifts and how they are encoded.
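The choice between the 16-bit encodings T1 and T2 for CMP <Rn>, <Rm> with no shift can be sketched as a small C encoder. It is hypothetical and covers only the 16-bit encodings; T3 and A1 are omitted.

```c
#include <stdint.h>
#include <stdbool.h>

/* T1 when both registers are R0-R7; otherwise T2 (0100 0101 N Rm Rn),
   where N supplies bit 3 of Rn. Returns false when neither applies. */
static bool cmp_reg_16bit(unsigned rn, unsigned rm, uint16_t *out)
{
    if (rn > 15 || rm > 15)
        return false;
    if (rn < 8 && rm < 8) {                               /* T1: 0100 0010 10 Rm Rn */
        *out = (uint16_t)(0x4280u | (rm << 3) | rn);
        return true;
    }
    if (rn == 15 || rm == 15)                             /* UNPREDICTABLE in T2 */
        return false;
    *out = (uint16_t)(0x4500u | ((rn >> 3) << 7) | (rm << 3) | (rn & 7u));
    return true;
}
```

For example, CMP R8, R1 needs T2 and packs to 0x4588, while CMP R0, R1 uses T1 and packs to 0x4288.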

Operation

if ConditionPassed() then
    EncodingSpecificOperations();
    shifted = Shift(R[m], shift_t, shift_n, APSR.C);
    (result, carry, overflow) = AddWithCarry(R[n], NOT(shifted), '1');
    APSR.N = result<31>;
    APSR.Z = IsZeroBit(result);
    APSR.C = carry;
    APSR.V = overflow;

Exceptions

None.


A8.8.39 CMP (register-shifted register)

Compare (register-shifted register) subtracts a register-shifted register value from a register value. It updates the condition flags based on the result, and discards the result.

// Encoding A1 (for the case when cond is 0b1111, see Unconditional instructions on page A5-216)
n = UInt(Rn);  m = UInt(Rm);  s = UInt(Rs);
shift_t = DecodeRegShift(type);
if n == 15 || m == 15 || s == 15 then UNPREDICTABLE;

Encoding A1 ARMv4*, ARMv5T*, ARMv6*, ARMv7

CMP<c> <Rn>, <Rm>, <type> <Rs>

cond 0 0 0 1 0 1 0 1 Rn (0) (0) (0) (0) Rs 0 type 1 Rm

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0


Assembler syntax

CMP{<c>}{<q>} <Rn>, <Rm>, <type> <Rs>

where:

<c>, <q>

See Standard assembler syntax fields on page A8-287.

<Rn>

The first operand register.

<Rm>

The register that is shifted and used as the second operand.

<type>

The type of shift to apply to the value read from <Rm>. It must be one of:

ASR

Arithmetic shift right, encoded as type = 0b10

LSL

Logical shift left, encoded as type = 0b00

LSR

Logical shift right, encoded as type = 0b01

ROR

Rotate right, encoded as type = 0b11

<Rs>

The register whose bottom byte contains the amount to shift by.

Operation

if ConditionPassed() then
    EncodingSpecificOperations();
    shift_n = UInt(R[s]<7:0>);
    shifted = Shift(R[m], shift_t, shift_n, APSR.C);
    (result, carry, overflow) = AddWithCarry(R[n], NOT(shifted), '1');
    APSR.N = result<31>;
    APSR.Z = IsZeroBit(result);
    APSR.C = carry;
    APSR.V = overflow;

Exceptions

None.


A8.8.40 CPS

Change Processor State is a system instruction, see CPS (Thumb) on page B9-1978 and CPS (ARM) on page B9-1980.

A8.8.41 CPY

Copy is a pre-UAL synonym for MOV (register).

Assembler syntax

CPY <Rd>, <Rn>

This is equivalent to:

MOV <Rd>, <Rn>

Exceptions

None.


A8.8.42 DBG

Debug Hint provides a hint to debug and related systems. See their documentation for what use (if any) they make of this instruction.

// Encoding T1
// Any decoding of 'option' is specified by the debug system

// Encoding A1 (for the case when cond is 0b1111, see Unconditional instructions on page A5-216)
// Any decoding of 'option' is specified by the debug system

Assembler syntax

DBG{<c>}{<q>} #<option>

where:

<c>, <q>

See Standard assembler syntax fields on page A8-287.

<option>

Provides extra information about the hint, and is in the range 0 to 15.
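The fixed bits of the two DBG encodings can be checked with a C sketch. The function names are hypothetical; the constants come from the encoding diagrams for this instruction, with the (1) and (0) should-be bits set to their listed values.

```c
#include <stdint.h>

/* A1: cond 0011 0010 0000 (1111) (0000) 1111 option */
static uint32_t dbg_a1(uint32_t cond, uint32_t option)
{
    return ((cond & 0xFu) << 28) | 0x0320F0F0u | (option & 0xFu);
}

/* T1: first halfword 0xF3AF, second halfword 1000 0000 1111 option */
static void dbg_t1(uint32_t option, uint16_t hw[2])
{
    hw[0] = 0xF3AFu;
    hw[1] = (uint16_t)(0x80F0u | (option & 0xFu));
}
```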

Operation

if ConditionPassed() then
    EncodingSpecificOperations();
    Hint_Debug(option);

Exceptions

None.

Encoding T1 ARMv7 (executes as NOP in ARMv6T2)

DBG<c> #<option>

1 1 1 1 0 0 1 1 1 0 1 0 (1) (1) (1) (1) | 1 0 (0) 0 (0) 0 0 0 1 1 1 1 option

15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0 | 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0

Encoding A1 ARMv7 (executes as NOP in ARMv6K and ARMv6T2)

DBG<c> #<option>

cond 0 0 1 1 0 0 1 0 0 0 0 0 (1) (1) (1) (1) (0) (0) (0) (0) 1 1 1 1 option

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0


A8.8.43 DMB

Data Memory Barrier is a memory barrier that ensures the ordering of observations of memory accesses, see Data Memory Barrier (DMB) on page A3-152.

// No additional decoding required

Assembler syntax

DMB{<c>}{<q>} {<option>}

where:

<c>, <q>

See Standard assembler syntax fields on page A8-287. An ARM DMB instruction must be unconditional.

<option>

Specifies an optional limitation on the DMB operation. Values are:

SY

Full system is the required shareability domain, reads and writes are the required access types. Can be omitted. This option is referred to as the full system DMB. Encoded as option = 0b1111.

ST

Full system is the required shareability domain, writes are the required access type. SYST is a synonym for ST. Encoded as option = 0b1110.

ISH

Inner Shareable is the required shareability domain, reads and writes are the required access types. Encoded as option = 0b1011.

ISHST

Inner Shareable is the required shareability domain, writes are the required access type. Encoded as option = 0b1010.

NSH

Non-shareable is the required shareability domain, reads and writes are the required access types. Encoded as option = 0b0111.

NSHST

Non-shareable is the required shareability domain, writes are the required access type. Encoded as option = 0b0110.

OSH

Outer Shareable is the required shareability domain, reads and writes are the required access types. Encoded as option = 0b0011.

OSHST

Outer Shareable is the required shareability domain, writes are the required access type. Encoded as option = 0b0010.

All other encodings of option are reserved. It is IMPLEMENTATION DEFINED whether options other than SY are implemented. All unsupported and reserved options must execute as a full system DMB operation, but software must not rely on this behavior.

Encoding T1 ARMv7

DMB<c> <option>

1 1 1 1 0 0 1 1 1 0 1 1 (1) (1) (1) (1) | 1 0 (0) 0 (1) (1) (1) (1) 0 1 0 1 option

15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0 | 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0

Encoding A1 ARMv7

DMB <option>

1 1 1 1 0 1 0 1 0 1 1 1 (1) (1) (1) (1) (1) (1) (1) (1) (0) (0) (0) (0) 0 1 0 1 option

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0


Note

The instruction supports the following alternative <option> values, but ARM recommends that software does not use these alternative values:

• SH as an alias for ISH

• SHST as an alias for ISHST

• UN as an alias for NSH

• UNST as an alias for NSHST

Operation

if ConditionPassed() then
    EncodingSpecificOperations();
    case option of
        when '0010'  domain = MBReqDomain_OuterShareable;  types = MBReqTypes_Writes;
        when '0011'  domain = MBReqDomain_OuterShareable;  types = MBReqTypes_All;
        when '0110'  domain = MBReqDomain_Nonshareable;    types = MBReqTypes_Writes;
        when '0111'  domain = MBReqDomain_Nonshareable;    types = MBReqTypes_All;
        when '1010'  domain = MBReqDomain_InnerShareable;  types = MBReqTypes_Writes;
        when '1011'  domain = MBReqDomain_InnerShareable;  types = MBReqTypes_All;
        when '1110'  domain = MBReqDomain_FullSystem;      types = MBReqTypes_Writes;
        otherwise    domain = MBReqDomain_FullSystem;      types = MBReqTypes_All;
    if HaveVirtExt() && !IsSecure() && !CurrentModeIsHyp() then
        if HCR.BSU == '11' then
            domain = MBReqDomain_FullSystem;
        if HCR.BSU == '10' && domain != MBReqDomain_FullSystem then
            domain = MBReqDomain_OuterShareable;
        if HCR.BSU == '01' && domain == MBReqDomain_Nonshareable then
            domain = MBReqDomain_InnerShareable;
    DataMemoryBarrier(domain, types);

Exceptions

None.
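The Operation pseudocode above translates almost line for line into C. In the sketch below, the `cpu_state` fields are hypothetical stand-ins for HaveVirtExt(), IsSecure(), CurrentModeIsHyp(), and HCR.BSU, and the function only computes the domain and types that would be passed to DataMemoryBarrier(); it does not issue a barrier. The point to notice is that HCR.BSU can only widen the shareability domain seen by Non-secure, non-Hyp code, never narrow it.

```c
#include <stdbool.h>
#include <stdint.h>

typedef enum { MB_NONSHAREABLE, MB_INNER, MB_OUTER, MB_FULL } mb_domain;
typedef enum { MB_WRITES, MB_ALL } mb_types;

/* Hypothetical snapshot of the state the pseudocode queries. */
typedef struct {
    bool have_virt_ext;  /* HaveVirtExt() */
    bool is_secure;      /* IsSecure() */
    bool in_hyp_mode;    /* CurrentModeIsHyp() */
    uint8_t hcr_bsu;     /* HCR.BSU, a 2-bit field */
} cpu_state;

void dmb_decode(uint8_t option, const cpu_state *s, mb_domain *domain, mb_types *types) {
    switch (option & 0xF) {
    case 0x2: *domain = MB_OUTER;        *types = MB_WRITES; break;
    case 0x3: *domain = MB_OUTER;        *types = MB_ALL;    break;
    case 0x6: *domain = MB_NONSHAREABLE; *types = MB_WRITES; break;
    case 0x7: *domain = MB_NONSHAREABLE; *types = MB_ALL;    break;
    case 0xA: *domain = MB_INNER;        *types = MB_WRITES; break;
    case 0xB: *domain = MB_INNER;        *types = MB_ALL;    break;
    case 0xE: *domain = MB_FULL;         *types = MB_WRITES; break;
    default:  *domain = MB_FULL;         *types = MB_ALL;    break; /* SY and reserved */
    }
    /* HCR.BSU upgrades the domain for Non-secure code outside Hyp mode. */
    if (s->have_virt_ext && !s->is_secure && !s->in_hyp_mode) {
        uint8_t bsu = s->hcr_bsu & 0x3;
        if (bsu == 0x3)
            *domain = MB_FULL;
        if (bsu == 0x2 && *domain != MB_FULL)
            *domain = MB_OUTER;
        if (bsu == 0x1 && *domain == MB_NONSHAREABLE)
            *domain = MB_INNER;
    }
}
```

With HCR.BSU = 0b01, for example, a guest's DMB NSH is treated as Inner Shareable, while the same instruction executed in Hyp mode keeps its Non-shareable domain.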


A8.8.44 DSB

Data Synchronization Barrier is a memory barrier that ensures the completion of memory accesses, see Data Synchronization Barrier (DSB) on page A3-153.

// No additional decoding required

Assembler syntax

DSB{<c>}{<q>} {<option>}

where:

<c>, <q>

See Standard assembler syntax fields on page A8-287. An ARM DSB instruction must be unconditional.

<option>

Specifies an optional limitation on the DSB operation. Values are:

SY

Full system is the required shareability domain, reads and writes are the required access types. Can be omitted. This option is referred to as the full system DSB. Encoded as option = 0b1111.

ST

Full system is the required shareability domain, writes are the required access type. SYST is a synonym for ST. Encoded as option = 0b1110.

ISH

Inner Shareable is the required shareability domain, reads and writes are the required access types. Encoded as option = 0b1011.

ISHST

Inner Shareable is the required shareability domain, writes are the required access type. Encoded as option = 0b1010.

NSH

Non-shareable is the required shareability domain, reads and writes are the required access types. Encoded as option = 0b0111.

NSHST

Non-shareable is the required shareability domain, writes are the required access type. Encoded as option = 0b0110.

OSH

Outer Shareable is the required shareability domain, reads and writes are the required access types. Encoded as option = 0b0011.

OSHST

Outer Shareable is the required shareability domain, writes are the required access type. Encoded as option = 0b0010.

All other encodings of option are reserved. It is IMPLEMENTATION DEFINED whether options other than SY are implemented. All unsupported and reserved options must execute as a full system DSB operation, but software must not rely on this behavior.
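Going the other way, a disassembler prints a canonical name for each implemented encoding and may drop SY, since that option can be omitted. The sketch below is a hypothetical C formatter (`format_dsb` and its `omit_sy` flag are illustrative). It refuses reserved encodings instead of guessing a syntax for them, and it always emits the canonical names, never the discouraged aliases.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Canonical names indexed by the 4-bit option field; NULL marks reserved values. */
static const char *const kCanonicalNames[16] = {
    [0x2] = "OSHST", [0x3] = "OSH",
    [0x6] = "NSHST", [0x7] = "NSH",
    [0xA] = "ISHST", [0xB] = "ISH",
    [0xE] = "ST",    [0xF] = "SY",
};

/* Writes "DSB <option>" into buf. When omit_sy is true, the SY option is left
   out because it can be omitted. Returns the snprintf result, or -1 if the
   option is reserved. */
int format_dsb(uint8_t option, bool omit_sy, char *buf, size_t len) {
    const char *name = kCanonicalNames[option & 0xF];
    if (name == NULL)
        return -1;
    if (omit_sy && strcmp(name, "SY") == 0)
        return snprintf(buf, len, "DSB");
    return snprintf(buf, len, "DSB %s", name);
}
```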

Encoding T1 ARMv7

DSB<c> <option>

First halfword (bits 15-0):  1 1 1 1 0 0 1 1 1 0 1 1 (1) (1) (1) (1)
Second halfword (bits 15-0): 1 0 (0) 0 (1) (1) (1) (1) 0 1 0 0 option

Encoding A1 ARMv7

DSB <option>

Bits 31-0: 1 1 1 1 0 1 0 1 0 1 1 1 (1) (1) (1) (1) (1) (1) (1) (1) (0) (0) (0) (0) 0 1 0 0 option

In both encodings, option occupies bits 3-0 (of the second halfword in T1).
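Comparing these diagrams with the DMB encodings shows that the two instructions differ only in bits 7-4 (0b0100 for DSB, 0b0101 for DMB). The hypothetical C decoder below relies on that to classify an instruction and extract its option. It is deliberately strict: it requires every (1) and (0) bit to hold its listed value, and it does not model how the manual treats instructions in which those bits differ.

```c
#include <stdint.h>

typedef enum { BARRIER_NONE, BARRIER_DMB, BARRIER_DSB } barrier_kind;

/* A1: compare bits 31-4 against the fixed pattern; bits 3-0 are the option. */
barrier_kind decode_a1_barrier(uint32_t insn, uint8_t *option) {
    uint32_t fixed = insn & 0xFFFFFFF0u;
    *option = (uint8_t)(insn & 0xFu);
    if (fixed == 0xF57FF050u) return BARRIER_DMB;
    if (fixed == 0xF57FF040u) return BARRIER_DSB;
    return BARRIER_NONE;
}

/* T1: hw1 is the first halfword in the instruction stream; the option is in hw2. */
barrier_kind decode_t1_barrier(uint16_t hw1, uint16_t hw2, uint8_t *option) {
    *option = (uint8_t)(hw2 & 0xFu);
    if (hw1 != 0xF3BFu)
        return BARRIER_NONE;
    switch (hw2 & 0xFFF0u) {
    case 0x8F50u: return BARRIER_DMB;
    case 0x8F40u: return BARRIER_DSB;
    default:      return BARRIER_NONE;
    }
}
```

For example, the A1 word 0xF57FF04F decodes as DSB with option 0b1111 (DSB SY), and the T1 pair 0xF3BF, 0x8F4A decodes as DSB ISHST.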