ARM v7-M Architecture Application Level Reference Manual
Page Count: 534
ARM v7-M Architecture Application Level Reference Manual
Beta
Copyright © 2006 ARM Limited. All rights reserved.
ARM DDI 0405A-01

Release Information
The following changes have been made to this document.

Change History
Date          Issue   Change
21-Mar-2006   A       first beta release

Proprietary Notice
ARM, the ARM Powered logo, Thumb, and StrongARM are registered trademarks of ARM Limited.
The ARM logo, AMBA, Angel, ARMulator, EmbeddedICE, ModelGen, Multi-ICE, PrimeCell, ARM7TDMI, ARM7TDMI-S, ARM9TDMI, ARM9E-S, ETM7, ETM9, TDMI, STRONG, are trademarks of ARM Limited.
All other products or services mentioned herein may be trademarks of their respective owners.
The product described in this document is subject to continuous developments and improvements. All particulars of the product and its use contained in this document are given by ARM in good faith.

1. Subject to the provisions set out below, ARM hereby grants to you a perpetual, non-exclusive, nontransferable, royalty free, worldwide licence to use this ARM Architecture Reference Manual for the purposes of developing;
(i) software applications or operating systems which are targeted to run on microprocessor cores distributed under licence from ARM;
(ii) tools which are designed to develop software programs which are targeted to run on microprocessor cores distributed under licence from ARM;
(iii) integrated circuits which incorporate a microprocessor core manufactured under licence from ARM.

2. Except as expressly licensed in Clause 1 you acquire no right, title or interest in the ARM Architecture Reference Manual, or any Intellectual Property therein. In no event shall the licences granted in Clause 1, be construed as granting you expressly or by implication, estoppel or otherwise, licences to any ARM technology other than the ARM Architecture Reference Manual.
The licence grant in Clause 1 expressly excludes any rights for you to use or take into use any ARM patents. No right is granted to you under the provisions of Clause 1 to;
(i) use the ARM Architecture Reference Manual for the purposes of developing or having developed microprocessor cores or models thereof which are compatible in whole or part with either or both the instructions or programmer's models described in this ARM Architecture Reference Manual; or
(ii) develop or have developed models of any microprocessor cores designed by or for ARM; or
(iii) distribute in whole or in part this ARM Architecture Reference Manual to third parties without the express written permission of ARM; or
(iv) translate or have translated this ARM Architecture Reference Manual into any other languages.

3. THE ARM ARCHITECTURE REFERENCE MANUAL IS PROVIDED "AS IS" WITH NO WARRANTIES EXPRESS, IMPLIED OR STATUTORY, INCLUDING BUT NOT LIMITED TO ANY WARRANTY OF SATISFACTORY QUALITY, NONINFRINGEMENT OR FITNESS FOR A PARTICULAR PURPOSE.

4. No licence, express, implied or otherwise, is granted to LICENSEE, under the provisions of Clause 1, to use the ARM tradename, in connection with the use of the ARM Architecture Reference Manual or any products based thereon. Nothing in Clause 1 shall be construed as authority for you to make any representations on behalf of ARM in respect of the ARM Architecture Reference Manual or any products based thereon.

Copyright © 2005, 2006 ARM Limited
110 Fulbourn Road Cambridge, England CB1 9NJ

Restricted Rights Legend: Use, duplication or disclosure by the United States Government is subject to the restrictions set forth in DFARS 252.227-7013 (c)(1)(ii) and FAR 52.227-19

The right to use and copy this document is subject to the licence set out above.
Contents
ARM v7-M Architecture Application Level Reference Manual

Preface
    About this manual
    Unified Assembler Language
    Using this manual
    Conventions
    Further reading
    Feedback

Part A Application
Chapter A1 Introduction
    A1.1 The ARM Architecture – M profile
    A1.2 Introduction to Pseudocode
Chapter A2 Application Level Programmer’s Model
    A2.1 The register model
    A2.2 Exceptions, faults and interrupts
    A2.3 Coprocessor support
Chapter A3 ARM Architecture Memory Model
    A3.1 Address space
    A3.2 Alignment Support
    A3.3 Endian Support
    A3.4 Synchronization and semaphores
    A3.5 Memory types
    A3.6 Access rights
    A3.7 Memory access order
    A3.8 Caches and memory hierarchy
    A3.9 Bit banding
Chapter A4 The Thumb Instruction Set
    A4.1 Instruction set encoding
    A4.2 Instruction encoding for 16-bit Thumb instructions
    A4.3 Instruction encoding for 32-bit Thumb instructions
    A4.4 Conditional execution
    A4.5 UNDEFINED and UNPREDICTABLE instruction set space
    A4.6 Usage of 0b1111 as a register specifier
    A4.7 Usage of 0b1101 as a register specifier
Chapter A5 Thumb Instructions
    A5.1 Format of instruction descriptions
    A5.2 Immediate constants
    A5.3 Constant shifts applied to a register
    A5.4 Memory accesses
    A5.5 Memory hints
    A5.6 NOP-compatible hints
    A5.7 Alphabetical list of Thumb instructions

Part B System
Chapter B1 System Level Programmer’s Model
    B1.1 Introduction to the system level
    B1.2 System programmer’s model
Chapter B2 System Address Map
    B2.1 The system address map
    B2.2 Bit Banding
    B2.3 System Control Space (SCS)
    B2.4 System timer - SysTick
    B2.5 Nested Vectored Interrupt Controller (NVIC)
    B2.6 Protected Memory System Architecture
Chapter B3 ARMv7-M System Instructions
    B3.1 Alphabetical list of ARMv7-M system instructions

Part C Debug
Chapter C1 Debug
    C1.1 Introduction to debug
    C1.2 The Debug Access Port (DAP)
    C1.3 Overview of the ARMv7-M debug features
    C1.4 Debug and reset
    C1.5 Debug event behavior
    C1.6 Debug register support in the SCS
    C1.7 Instrumentation Trace Macrocell (ITM) support
    C1.8 Data Watchpoint and Trace (DWT) support
    C1.9 Embedded Trace (ETM) support
    C1.10 Trace Port Interface Unit (TPIU)
    C1.11 Flash Patch and Breakpoint (FPB) support

Appendix A Pseudo-code definition
    A.1 Instruction encoding diagrams and pseudo-code
    A.2 Data Types
    A.3 Expressions
    A.4 Operators and built-in functions
    A.5 Statements and program structure
    A.6 Helper procedures and functions
Appendix B Legacy Instruction Mnemonics
Appendix C CPUID
    C.1 Core Feature ID Registers
Appendix D Deprecated Features in ARMv7M
Glossary

Preface
This preface describes the contents of this manual, then lists the conventions and terminology it uses.
• About this manual on page x
• Unified Assembler Language on page xi
• Using this manual on page xii
• Conventions on page xiv
• Further reading on page xv
• Feedback on page xvi.

About this manual
This manual documents the Microcontroller profile associated with version 7 of the ARM Architecture (ARMv7-M). For short-form definitions of all the ARMv7 profiles see page A1-1.

The manual consists of three parts:

Part A    The application level programming model and memory model information along with the instruction set as visible to the application programmer. This is the information required to program applications or to develop the toolchain components (compiler, linker, assembler and disassembler) excluding the debugger.
For ARMv7-M, this is almost entirely a subset of material common to the other two profiles. Instruction set details which differ between profiles are clearly stated.

Note
All ARMv7 profiles support a common procedure calling standard, the ARM Architecture Procedure Calling Standard (AAPCS).

Part B    The system level programming model and system level support instructions required for system correctness. The system level supports the ARMv7-M exception model. It also provides features for configuration and control of processor resources and management of memory access rights. This is the information in addition to Part A required for an operating system (OS) and/or system support software. It includes details of register banking, the exception model, memory protection (management of access rights) and cache support. Part B is profile specific. ARMv7-M introduces a new programmer’s model and as such has some fundamental differences at the system level from the other profiles. As ARMv7-M is a memory-mapped architecture, the system memory map is documented here.

Part C    The debug features to support the ARMv7-M debug architecture, and the programmer’s interface to the debug environment. This is the information required in addition to Parts A and B to write a debugger. Part C covers details of the different types of debug:
• halting debug and the related debug state
• exception-based monitor debug
• non-invasive support for event generation and signalling of the events to an external agent.
This part is profile specific and includes several debug features unique within the ARMv7 architecture to this profile.

Unified Assembler Language
Unified Assembler Language (UAL) provides a canonical form for all ARM and Thumb instructions. This replaces the earlier Thumb assembler language. The syntax of Thumb instructions is now the same as the syntax of ARM instructions.
For details on the changes from the old Thumb syntax, see page AppxB-1.

UAL describes the syntax for the mnemonic and the operands of each instruction. In addition, it assumes that instructions and data items can be given labels. It does not specify the syntax to be used for labels, nor what assembler directives and options are available. See your assembler documentation for these details.

UAL includes instruction selection rules that specify which instruction encoding is selected when more than one can provide the required functionality. For example, both 16-bit and 32-bit encodings exist for an ADD R0,R1,R2 instruction. The most common instruction selection rule is that when both a 16-bit encoding and a 32-bit encoding are available, the 16-bit encoding is selected, to optimize code density.

Syntax options exist to override the normal instruction selection rules and ensure that a particular encoding is selected. These are useful when disassembling code, to ensure that subsequent assembly produces the original code, and in some other situations.

Note
The precise effects of each instruction are described, including any restrictions on its use. This information is of primary importance to authors of compilers, assemblers, and other programs that generate Thumb machine code.

This manual is restricted to UAL and not intended as tutorial material for ARM assembler language, nor does it describe ARM assembler language at anything other than a very basic level. To make effective use of ARM assembler language, consult the documentation supplied with the assembler being used. Different assemblers vary considerably with respect to many aspects of assembler language, such as which assembler directives are accepted and how they are coded.

Assembler syntax is given for the instructions described in this manual, allowing instructions to be specified in textual form.
This is of considerable use to assembly code writers, and also when debugging either assembler or high-level language code at the single instruction level.

Using this manual
The information in this manual is organized into nine chapters and a set of supporting appendices, as described below:

Chapter A1 Introduction
ARMv7 overview, the different architecture profiles and the background to the Microcontroller (M) profile.

Chapter A2 Application Level Programmer’s Model
Details on the registers and status bits available at the application level along with a summary of the exception support.

Chapter A3 ARM Architecture Memory Model
Details of the ARM architecture memory attributes and memory order model.

Chapter A4 The Thumb Instruction Set
Encoding diagrams for the Thumb instruction set along with general details on bit field usage, UNDEFINED and UNPREDICTABLE terminology.

Chapter A5 Thumb Instructions
Contains detailed reference material on each Thumb instruction, arranged alphabetically by instruction mnemonic. Summary information for system instructions is included and referenced for detailed definition in Part B.

Chapter B1 System Level Programmer’s Model
Details of the registers, status and control mechanisms available at the system level.

Chapter B2 System Address Map
Overview of the system address map, and details of the architecturally defined features within the Private Peripheral Bus region. This chapter includes details of the memory-mapped support for a protected memory system.

Chapter B3 ARMv7-M System Instructions
Contains detailed reference material on the system level instructions.
Chapter C1 Debug
ARMv7-M debug support.

Appendix A Pseudo-code definition
Definition of terms, format and helper functions used by the pseudo-code to describe the memory model and instruction operations.

Appendix B Legacy Instruction Mnemonics
A cross reference of Unified Assembler Language forms of the instruction syntax to the Thumb format used in earlier versions of the ARM architecture.

Appendix C CPUID
A summary of the ID attribute registers used for ARM architecture feature identification.

Appendix D Deprecated Features in ARMv7M
Deprecated features that software is advised to avoid for future-proofing. It is ARM’s intent to remove this functionality in a future version of the ARM architecture.

Glossary
Glossary of terms - not including those associated with pseudo-code.

Conventions
This manual employs typographic and other conventions intended to improve its ease of use.

General typographic conventions

typewriter    Is used for assembler syntax descriptions, pseudo-code descriptions of instructions, and source code examples. For more details of the conventions used in assembler syntax descriptions see Assembler syntax on page A5-3. For more details of pseudo-code conventions see Appendix A Pseudo-code definition. The typewriter font is also used in the main text for instruction mnemonics and for references to other items appearing in assembler syntax descriptions, pseudo-code descriptions of instructions and source code examples.

italic    Highlights important notes, introduces special terminology, and denotes internal cross-references and citations.

bold    Is used for emphasis in descriptive lists and elsewhere, where appropriate.

SMALL CAPITALS    Are used for a few terms which have specific technical meanings.
Further reading
This section lists publications that provide additional information on the ARM family of processors.

This manual provides architecture information. It is designed to be read in conjunction with a Technical Reference Manual (TRM) for the implementation of interest. The TRM provides details of the IMPLEMENTATION DEFINED architecture features in the ARM compliant core. The silicon partner’s device specification should be used for additional system details.

ARM periodically provides updates and corrections to its documentation. For the latest information and errata, some materials are published at http://www.arm.com. Alternatively, contact your distributor or silicon partner who will have access to the latest published ARM information, as well as information specific to the device of interest.

ARM publications
The first ARMv7-M implementation is described in the Cortex-M3 Technical Reference Manual (ARM DDI 0337).

Feedback
ARM Limited welcomes feedback on its documentation.

Feedback on this book
If you notice any errors or omissions in this book, send email to errata@arm.com giving:
• the document title
• the document number
• the page number(s) to which your comments apply
• a concise explanation of the problem.
General suggestions for additions and improvements are also welcome.

Part A Application

Chapter A1 Introduction
Due to the explosive growth in recent years associated with the ARM architecture into many market areas, along with the need to maintain high levels of architecture consistency, ARMv7 is documented as a set of architecture profiles. The ARM architecture specification is re-structured accordingly.
Three profiles have been defined as follows:

ARMv7-A    the application profile for systems supporting the ARM and Thumb instruction sets, and requiring virtual address support in the memory management model.

ARMv7-R    the realtime profile for systems supporting the ARM and Thumb instruction sets, and requiring physical address only support in the memory management model.

ARMv7-M    the microcontroller profile for systems supporting only the Thumb instruction set, and where overall size and deterministic operation for an implementation are more important than absolute performance.

While profiles were formally introduced with the ARMv7 development, the A-profile and R-profile have implicitly existed in earlier versions, associated with the Virtual Memory System Architecture (VMSA) and Protected Memory System Architecture (PMSA) respectively.

Instruction Set Architecture (ISA)
ARMv7-M only supports Thumb instructions, and specifically a subset of the ARMv7 Thumb-2 instruction set, where Thumb-2 indicates general support of both 16-bit and 32-bit instructions in the Thumb execution state.

A1.1 The ARM Architecture – M profile
The ARM architecture has evolved through several major revisions to a point where it supports implementations across a wide spectrum of performance points, with over a billion parts per annum being produced. The latest version (ARMv7) has seen the diversity formally recognised in a set of architecture profiles, the profiles used to tailor the architecture to different market requirements. A key factor is that the application level is consistent across all profiles, and the bulk of the variation is at the system level.

The introduction of Thumb-2 in ARMv6T2 provided a balance to the ARM and Thumb instruction sets, and the opportunity for the ARM architecture to be extended into new markets, in particular the microcontroller marketplace.
To take maximum advantage of this opportunity a Thumb-only profile with a new programmer’s model (a system level consideration) has been introduced as a unique profile, complementing ARM’s strengths in the high performance and real-time embedded markets.

Key criteria for ARMv7-M implementations are as follows:
• Enable implementations with industry leading power, performance and area constraints
  - Opportunities for simple pipeline designs offering leading edge system performance levels in a broad range of markets and applications
• Highly deterministic operation
  - Single/low cycle execution
  - Minimal interrupt latency (short pipelines)
  - Cacheless operation
• Excellent C/C++ target – aligns with ARM’s programming standards in this area
  - Exception handlers are standard C/C++ functions, entered using standard calling conventions
• Designed for deeply embedded systems
  - Low pincount devices
  - Enable new entry level opportunities for the ARM architecture
• Debug and software profiling support for event driven systems

This manual is specific to the ARMv7-M profile.

A1.2 Introduction to Pseudocode
Pseudo-code is used to describe the exception model, memory system behaviour, and the instruction set architecture. The general format rules for pseudo-code used throughout this manual are described in Appendix A Pseudo-code definition. This appendix includes information on data types and the operations (logical and arithmetic) supported by the ARM architecture.

Chapter A2 Application Level Programmer’s Model
This chapter provides an application level view of the programmer’s model.
This is the information necessary for application development, as distinct from the system information required to service and support application execution under an operating system. It contains the following sections:
• The register model on page A2-2
• Exceptions, faults and interrupts on page A2-5
• Coprocessor support on page A2-6
System related information is provided in overview form and/or with references to the system information part of the architecture specification as appropriate.

A2.1 The register model
The application level programmer’s model provides details of the general-purpose and special-purpose registers visible to the application programmer, the ARM memory model, and the instruction set used to load registers from memory, store registers to memory, or manipulate data (data operations) within the registers.

Applications often interact with external events. A summary of the types of events recognized in the architecture, along with the mechanisms provided in the architecture to interact with events, is included in Exceptions, faults and interrupts on page A2-5. How events are handled is a system level topic described in Exception model on page B1-9.

A2.1.1 Registers
There are thirteen general-purpose 32-bit registers (R0-R12), and an additional three 32-bit registers which have special names and usage models.

SP    stack pointer (R13), used as a pointer to the active stack. For usage restrictions see Chapter A5 Thumb Instructions. This is preset to the top of the Main stack on reset. See The SP registers on page B1-7 for additional information.

LR    link register (R14), used to store a value (the Return Link) relating to the return address from a subroutine which is entered using a Branch with Link instruction. This register is set to an illegal value (all 1’s) on reset.
The reset value will cause a fault condition to occur if a subroutine return call is attempted from it.

PC    program counter. For details on the usage model of the PC see Chapter A5 Thumb Instructions. The PC is loaded with the Reset handler start address on reset.

Program status is reported in the 32-bit Application Program Status Register (APSR), where the defined bits break down into a set of flags as follows:

Bit      31   30   29   28   27   26 ... 0
Field    N    Z    C    V    Q    RESERVED

APSR bit fields fall into two categories:
• Reserved bits are allocated to system features or are available for future expansion. Further information on currently allocated reserved bits is available in The special-purpose processor status registers (xPSR) on page B1-7. Software must ignore values read from reserved bits, and preserve their value on a write, to ensure future compatibility. The bits are defined as SBZP/UNP.
• User-writeable bits NZCVQ, collectively known as the flags.

NZCV (the Negative, Zero, Carry and oVerflow flags) are sometimes referred to as the condition code flags, and are written on execution of a flag-setting instruction:
• N is set to bit<31> of the result of the instruction. If this result is regarded as a two’s complement signed integer, then N = 1 if the result is negative and N = 0 if the result is positive or zero.
• Z is set if the result of the instruction is zero, otherwise it is cleared.
• C is set in one of four ways on an instruction:
  - For an addition, including the comparison instruction CMN, C is set if the addition produced a carry (that is, an unsigned overflow), otherwise it is cleared.
  - For a subtraction, including the comparison instruction CMP, C is cleared if the subtraction produced a borrow (that is, an unsigned underflow), otherwise it is set.
  - For non-additions/subtractions that include a shift, C is set or cleared to the last bit shifted out of the value by the shifter.
  - For other non-additions/subtractions, C is normally unchanged (special cases are listed as part of the instruction definition).
• V is set in one of two ways on an instruction:
  - For an addition or subtraction, V is set if a signed overflow occurred, regarding the operands and result as two’s complement signed integers.
  - For non-additions/subtractions, V is normally unchanged (special cases are listed as part of the instruction definition).

The Q flag is set if the result of an SSAT, SSAT16, USAT or USAT16 instruction changes (saturates) the input value for the signed or unsigned range of results.

A2.1.2 Execution state support
ARMv7-M only executes Thumb instructions, and therefore always executes instructions in Thumb state. See Chapter A5 Thumb Instructions for a list of the instructions supported.

In addition to normal program execution, there is a Debug state – see Chapter C1 Debug for more details.

A2.1.3 Privileged execution
Good system design practice requires the application developer to have a degree of knowledge of the underlying system architecture and the services it offers. System support requires a level of access generally referred to as privileged operation. The system support code determines whether applications run in a privileged or unprivileged manner. Where both privileged and unprivileged support is provided by an operating system, applications usually run unprivileged, allowing the operating system to allocate system resources for sole or shared use by the application, and to provide a degree of protection with respect to other processes and tasks.

Thread mode is the fundamental mode for application execution in ARMv7-M. Thread mode is selected on reset, and can execute in a privileged or non-privileged manner depending on the system environment.
Privileged execution is required to manage system resources in many cases. When code is executing unprivileged, Thread mode can execute an SVC instruction to generate a supervisor call exception. Privileged execution in Thread mode can raise a supervisor call using SVC or handle system access and control directly.

All exceptions execute as privileged code in Handler mode. See Exception model on page B1-9 for details. Supervisor call handlers manage resources on behalf of the application such as memory allocation and management of software stacks.

A2.2 Exceptions, faults and interrupts
An exception can be caused by the execution of an exception generating instruction or triggered as a response to a system behavior such as an interrupt, memory management, alignment or bus fault, or a debug event. Synchronous and asynchronous exceptions can occur within the architecture.

A2.2.1 System related events
The following types of exception are system related. Where there is direct correlation with an instruction, reference to the associated instruction is made.

Supervisor calls are used by application code to request a service from the underlying operating system. Using the SVC instruction, the application can instigate a supervisor call for a service requiring privileged access to the system.

Several forms of Fault can occur:
• Instruction execution related errors
• Data memory access errors can occur on any load or store
• Usage faults from a variety of execution state related errors. Execution of an UNDEFINED instruction is an example cause of a UsageFault exception.
• Debug events can generate a DebugMonitor exception.

Faults in general are synchronous with respect to the associated executing instruction.
Some system errors can cause an imprecise exception, which is reported at a time bearing no fixed relationship to the instruction that caused it.

Interrupts are always treated as asynchronous events with respect to the program flow. A system timer (SysTick), a pended service call (PendSV), and an external interrupt controller (NVIC) are all defined.

A BKPT instruction generates a debug event. See Debug event behavior on page C1-9 for more information.

For power or performance reasons it can be desirable to either notify the system that an action is complete, or provide a hint to the system that it can suspend operation of the current task. Instruction support is provided for the following:
• Send Event and Wait for Event instructions. See WFE on page A5-317.
• Wait For Interrupt. See WFI on page A5-319.

A2.3 Coprocessor support

An ARMv7-M implementation can optionally support coprocessors. If it does not support them, it treats all coprocessors as non-existent. Coprocessors 8 to 15 (CP8 to CP15) are reserved by ARM. Coprocessors 0 to 7 (CP0 to CP7) are IMPLEMENTATION DEFINED, subject to the coprocessor instruction constraints of the instruction set architecture.

Where a coprocessor instruction is issued to a non-existent or disabled coprocessor, a NOCP UsageFault is generated (see Fault behavior on page B1-14).

Unknown instructions issued to an enabled coprocessor generate an UNDEFINSTR UsageFault.

Chapter A3 ARM Architecture Memory Model

This chapter covers the general principles which apply to the ARM memory model.
The chapter contains the following sections:
• Address space on page A3-2
• Alignment Support on page A3-3
• Endian Support on page A3-5
• Synchronization and semaphores on page A3-8
• Memory types on page A3-19
• Access rights on page A3-26
• Memory access order on page A3-27
• Caches and memory hierarchy on page A3-32

ARMv7-M is a memory-mapped architecture. The address map specific details that apply to ARMv7-M are described in The system address map on page B2-2.

The chapter includes one feature unique to the M profile:
• Bit banding on page A3-34

A3.1 Address space

The ARM architecture uses a single, flat address space of 2^32 8-bit bytes. Byte addresses are treated as unsigned numbers, running from 0 to 2^32 - 1.

This address space is regarded as consisting of 2^30 32-bit words, each of whose addresses is word-aligned, which means that the address is divisible by 4. The word whose word-aligned address is A consists of the four bytes with addresses A, A+1, A+2 and A+3.

The address space can also be considered as consisting of 2^31 16-bit halfwords, each of whose addresses is halfword-aligned, which means that the address is divisible by 2. The halfword whose halfword-aligned address is A consists of the two bytes with addresses A and A+1.

While instruction fetches are always halfword-aligned, some load and store instructions support unaligned addresses. This affects the access address A, such that A<1:0> in the case of a word access and A<0> in the case of a halfword access can have non-zero values.

Address calculations are normally performed using ordinary integer instructions. This means that they normally wrap around if they overflow or underflow the address space, so that the result of the calculation is reduced modulo 2^32.
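The alignment and wrap-around rules above can be illustrated with a small model. This is a sketch only, not part of the architecture definition: the helper names (`is_word_aligned`, `is_halfword_aligned`, `add_address`, `word_bytes`) are hypothetical, and because Python integers are unbounded the reduction modulo 2^32 is applied explicitly.

```python
ADDRESS_SPACE_SIZE = 1 << 32   # 2^32 byte addresses, numbered 0 to 2^32 - 1


def is_word_aligned(address):
    """A word-aligned address is divisible by 4."""
    return address % 4 == 0


def is_halfword_aligned(address):
    """A halfword-aligned address is divisible by 2."""
    return address % 2 == 0


def add_address(base, offset):
    """Address calculation that wraps around: the result is reduced modulo 2^32."""
    return (base + offset) % ADDRESS_SPACE_SIZE


def word_bytes(address):
    """Byte addresses A, A+1, A+2 and A+3 forming the word at word-aligned address A."""
    if not is_word_aligned(address):
        raise ValueError("address is not word-aligned")
    return [address + i for i in range(4)]
```

For instance, under this model adding 8 to 0xFFFFFFFC overflows the top of the address space and wraps to 0x00000004, and subtracting 4 from 0 underflows to 0xFFFFFFFC.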
Normal sequential execution of instructions effectively calculates:

    (address_of_current_instruction) + (2 or 4)   /* 16- and 32-bit instr mix */

after each instruction to determine which instruction to execute next. If this calculation overflows the top of the address space, the result is UNPREDICTABLE. In ARMv7-M this condition cannot occur because the top of memory is defined to always have the eXecute Never (XN) memory attribute associated with it. See The system address map on page B2-2 for more details. An access violation will be reported if this scenario occurs.

The above only applies to instructions that are executed, including those which fail their condition code check. Most ARM implementations prefetch instructions ahead of the currently-executing instruction.

LDC, LDM, LDRD, POP, PUSH, STC, STRD, and STM instructions access a sequence of words at increasing memory addresses, effectively incrementing a memory address by 4 for each register load or store. If this calculation overflows the top of the address space, the result is UNPREDICTABLE.

Any unaligned load or store whose calculated address is such that it would access the byte at 0xFFFFFFFF and the byte at address 0x00000000 as part of the instruction is UNPREDICTABLE.

A3.1.1 Virtual versus Physical Addressing

Virtual memory is not supported in ARMv7-M.

A3.2 Alignment Support

The system architecture can choose one of two policies for alignment checking in ARMv7-M:
• Support the unaligned access
• Generate a fault when an unaligned access occurs.

The policy varies with the type of access. An implementation can be configured to force alignment faults for all unaligned accesses (see below).

Writes to the PC are restricted according to the rules outlined in Usage of 0b1111 as a register specifier on page A4-39.

A3.2.1 Alignment Behavior

Address alignment affects data accesses and updates to the PC.
Alignment and data access

The following data accesses always generate an alignment fault:
• Non halfword-aligned LDREXH and STREXH
• Non word-aligned LDREX and STREX
• Non word-aligned LDRD, LDMIA, LDMDB, POP, and LDC
• Non word-aligned STRD, STMIA, STMDB, PUSH, and STC

The following data accesses support unaligned addressing, and only generate alignment faults when the ALIGN_TRP bit is set (see The System Control Block (SCB) on page B2-8):
• Non halfword-aligned LDR{S}H{T} and STRH{T}
• Non halfword-aligned TBH
• Non word-aligned LDR{T} and STR{T}

Note
LDREXD and STREXD are not supported in ARMv7-M.
Accesses to Strongly Ordered and Device memory types must always be naturally aligned (see Memory access restrictions on page A3-24).

Alignment and updates to the PC

All instruction fetches must be halfword-aligned. Any exception return irregularities are captured as an INVSTATE or INVPC UsageFault by the exception return mechanism. See Fault behavior on page B1-14.

For exception entry and return:
• Exception entry using a vector with bit<0> clear causes an INVSTATE UsageFault
• A reserved EXC_RETURN value causes an INVPC UsageFault
• Loading an unaligned value from the stack into the PC on an exception return is UNPREDICTABLE

For all other cases where the PC is updated:
• If bit<0> of the value loaded to the PC using an ADD or MOV instruction is zero, the result is UNPREDICTABLE
• A BLX, BX, LDR to the PC, POP or LDM including the PC instruction will cause an INVSTATE UsageFault if bit<0> of the value loaded is zero
• Loading the PC with a value from a memory location whose address is not word aligned is UNPREDICTABLE
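The two data-access lists in A3.2.1 can be expressed as a lookup, which may help when reasoning about whether a given access faults. The following is a minimal sketch under stated assumptions: the function name `alignment_fault`, the table names, and the `align_trp` parameter are hypothetical, the optional `{S}` and `{T}` forms are spelled out as separate mnemonics, and the model ignores the separate requirement that Strongly Ordered and Device accesses are naturally aligned.

```python
# Required alignment in bytes for accesses that always fault when unaligned.
ALWAYS_FAULT = {
    "LDREXH": 2, "STREXH": 2,
    "LDREX": 4, "STREX": 4,
    "LDRD": 4, "LDMIA": 4, "LDMDB": 4, "POP": 4, "LDC": 4,
    "STRD": 4, "STMIA": 4, "STMDB": 4, "PUSH": 4, "STC": 4,
}

# Accesses that support unaligned addresses unless ALIGN_TRP is set.
FAULT_ONLY_IF_TRAPPED = {
    "LDRH": 2, "LDRSH": 2, "LDRHT": 2, "LDRSHT": 2,
    "STRH": 2, "STRHT": 2, "TBH": 2,
    "LDR": 4, "LDRT": 4, "STR": 4, "STRT": 4,
}


def alignment_fault(mnemonic, address, align_trp=False):
    """Return True if this model predicts an alignment fault for the access."""
    if mnemonic in ALWAYS_FAULT:
        return address % ALWAYS_FAULT[mnemonic] != 0
    if mnemonic in FAULT_ONLY_IF_TRAPPED:
        return align_trp and address % FAULT_ONLY_IF_TRAPPED[mnemonic] != 0
    raise ValueError("instruction not covered by this sketch: " + mnemonic)
```

In this model an LDR from 0x2001 is permitted when `align_trp` is False and faults when it is True, whereas an LDREX from 0x2002 faults in both cases.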
A3.3 Endian Support

The address space rules (Address space on page A3-2) require that for a word-aligned address A:
• The word at address A consists of the bytes at addresses A, A+1, A+2 and A+3
• The halfword at address A consists of the bytes at addresses A and A+1
• The halfword at address A+2 consists of the bytes at addresses A+2 and A+3
• The word at address A therefore consists of the halfwords at addresses A and A+2

However, this does not fully specify the mappings between words, halfwords and bytes. A memory system uses one of the following mapping schemes. This choice is known as the endianness of the memory system.

In a little-endian memory system:
• A byte or halfword at a word-aligned address is the least significant byte or halfword within the word at that address
• A byte at a halfword-aligned address is the least significant byte within the halfword at that address

In a big-endian memory system:
• A byte or halfword at a word-aligned address is the most significant byte or halfword within the word at that address
• A byte at a halfword-aligned address is the most significant byte within the halfword at that address

For a word-aligned address A, Table A3-1 and Table A3-2 show how the word at address A, the halfwords at address A and A+2, and the bytes at addresses A, A+1, A+2 and A+3 map onto each other for each endianness.

Table A3-1 Little-endian memory system

MSByte                MSByte - 1            LSByte + 1            LSByte
Word at Address A (all four bytes)
Halfword at Address A+2                     Halfword at Address A
Byte at Address A+3   Byte at Address A+2   Byte at Address A+1   Byte at Address A

Table A3-2 Big-endian memory system

MSByte                MSByte - 1            LSByte + 1            LSByte
Word at Address A (all four bytes)
Halfword at Address A                       Halfword at Address A+2
Byte at Address A     Byte at Address A+1   Byte at Address A+2   Byte at Address A+3

The big-endian and little-endian mapping schemes determine the order in which the bytes of a word or halfword are interpreted.

As an example, a load of a word (4 bytes) from address 0x1000 will result in an access of the bytes contained at memory locations 0x1000, 0x1001, 0x1002 and 0x1003, regardless of the mapping scheme used. The mapping scheme determines the significance of those bytes.

A3.3.1 Control of the Endian Mapping in ARMv7-M

ARMv7-M supports a selectable endian model, which is configured to be big endian (BE-8) or little endian (LE-8) by a control input on system reset. The endian mapping has the following restrictions:
• The endian setting only applies to data accesses; instruction fetches are always little endian
• Loads and stores to the System Control Space (System Control Space (SCS) on page B2-7) are always little endian

Where big endian format instruction support is required, it can be implemented in the bus fabric. See Endian support on page AppxG-2 for more details.

Instruction alignment and byte ordering

Thumb-2 enforces 16-bit alignment on all instructions. This means that 32-bit instructions are treated as two halfwords, hw1 and hw2, with hw1 at the lower address.

In instruction encoding diagrams, hw1 is shown to the left of hw2. This results in the encoding diagrams reading more naturally. The byte order of a 32-bit Thumb instruction is shown in Figure A3-1.

Thumb 32-bit instruction order in memory

32-bit Thumb instruction, hw1               32-bit Thumb instruction, hw2
bits 15:8             bits 7:0              bits 15:8             bits 7:0
Byte at Address A+1   Byte at Address A     Byte at Address A+3   Byte at Address A+2

Figure A3-1 Instruction byte order in memory

A3.3.2 Element size and Endianness

The effect of the endianness mapping on data applies to the size of the element(s) being transferred in the load and store instructions.
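As an illustration of the element-size rule, the sketch below assembles elements from individual bytes under each mapping scheme. It is a simplified model, not the architecture's own pseudocode: `EXAMPLE_MEMORY`, `load_element` and `load_two_words` are hypothetical names, and memory is represented as a plain dictionary from byte address to byte value.

```python
# Eight example bytes at addresses 0x1000 to 0x1007 (values chosen for illustration).
EXAMPLE_MEMORY = {
    0x1000 + i: value
    for i, value in enumerate([0x11, 0x22, 0x33, 0x44, 0x55, 0x66, 0x77, 0x88])
}


def load_element(memory, address, size, big_endian):
    """Assemble one element of `size` bytes starting at `address`.

    The same bytes are accessed for either endianness; only their
    significance within the element changes.
    """
    data = bytes(memory[address + i] for i in range(size))
    return int.from_bytes(data, "big" if big_endian else "little")


def load_two_words(memory, address, big_endian):
    """Two-word transfer: the mapping applies to each word element separately."""
    return (
        load_element(memory, address, 4, big_endian),
        load_element(memory, address + 4, 4, big_endian),
    )
```

With the example data, the word at 0x1000 reads as 0x44332211 in a little-endian system and as 0x11223344 in a big-endian system. A two-word transfer is treated as two word elements rather than one 64-bit quantity, so the bytes of each word are ordered independently.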
Table A3-3 shows the element size of each of the load and store instructions.

Table A3-3 Load-store and element size association

Instruction class          Instructions                                  Element size
Load/store byte            LDR{S}B{T}, STRB{T}, LDREXB, STREXB           byte
Load/store halfword        LDR{S}H{T}, STRH{T}, TBH, LDREXH, STREXH      halfword
Load/store word            LDR{T}, STR{T}, LDREX, STREX                  word
Load/store two words       LDRD, STRD                                    word
Load/store multiple words  LDM{IA,DB}, STM{IA,DB}, PUSH, POP, LDC, STC   word

A3.3.3 Instructions to reverse bytes in a general-purpose register

When an application or device driver has to interface to memory-mapped peripheral registers or shared-memory structures that are not the same endianness as that of the internal data structures, or the endianness of the Operating System, an efficient way of being able to explicitly transform the endianness of the data is required.

Thumb-2 provides instructions for the following byte transformations (see the instruction definitions in Chapter A5 Thumb Instructions for details):

REV      Reverse word (four bytes) register, for transforming 32-bit representations.
REVSH    Reverse halfword and sign extend, for transforming signed 16-bit representations.
REV16    Reverse packed halfwords in a register for transforming unsigned 16-bit representations.

A3.4 Synchronization and semaphores

Exclusive access instructions support non-blocking shared-memory synchronization primitives that allow calculation to be performed on the semaphore between the read and write phases, and scale for multiple-processor system designs.
In ARMv7-M, the synchronization primitives provided are:
• Load-Exclusives:
  - LDREX, see LDREX on page A5-119
  - LDREXB, see LDREXB on page A5-121
  - LDREXH, see LDREXH on page A5-123
• Store-Exclusives:
  - STREX, see STREX on page A5-262
  - STREXB, see STREXB on page A5-264
  - STREXH, see STREXH on page A5-266
• Clear-Exclusive:
  - CLREX, see CLREX on page A5-65.

Note
This section describes the operation of a Load-Exclusive/Store-Exclusive pair of synchronization primitives using, as examples, the LDREX and STREX instructions. The same description applies to any other pair of synchronization primitives:
• LDREXB used with STREXB
• LDREXH used with STREXH.
Each Load-Exclusive instruction must be used only with the corresponding Store-Exclusive instruction.

STREXD and LDREXD are not supported in ARMv7-M.

The model for the use of a Load-Exclusive/Store-Exclusive instruction pair, accessing memory address x is:
• The Load-Exclusive instruction always successfully reads a value from memory address x
• The corresponding Store-Exclusive instruction succeeds in writing back to memory address x only if no other processor or process has performed a more recent Load-Exclusive of address x.

The Store-Exclusive operation returns a status bit that indicates whether the memory write succeeded.

A Load-Exclusive instruction tags a small block of memory for exclusive access. The size of the tagged block is IMPLEMENTATION DEFINED, see Size of the tagged memory block on page A3-16. A Store-Exclusive instruction to the same address clears the tag.

These instructions operate with an address monitor that provides the state machine and associated system control for memory accesses. Two different monitor models exist, depending on whether the memory has the sharable or non-sharable memory attribute, see Shared Normal memory on page A3-22. Uniprocessor systems are only required to support the non-shared memory model.
This means they can support synchronization primitives with the minimum amount of hardware overhead. Figure A3-2 on page A3-9 shows an example minimal system.

[Figure A3-2 Example uniprocessor system, with non-shared monitor. Blocks shown: CPU 1, Monitor, Routing matrix, L2 RAM, L2 Cache, Bridge to L3.]

Multiprocessor systems are required to implement an address monitor for each processor. Logically, a multiprocessor system must implement:
• A local monitor for each processor, that monitors Load-Exclusive and Store-Exclusive accesses to Non Shared memory by that processor. A local monitor can be unaware of all Load-Exclusive and Store-Exclusive accesses made by the other processors.
• A single global monitor, that monitors all Load-Exclusive and Store-Exclusive accesses to Shared memory, by all processors. The global monitor must maintain an exclusive access state machine for each processor.

However, it is IMPLEMENTATION DEFINED:
• where the monitors reside in the memory system hierarchy
• whether the monitors are implemented:
  - as a single entity for each processor, visible to all shared accesses
  - as a distributed entity.

Figure A3-3 shows a single entity approach in which the monitor supports state machines for both Shared and Non Shared memory accesses. Only the Shared memory case needs to snoop.

[Figure A3-3 Global monitoring using write snoop monitor approach. Blocks shown: CPU 1, CPU 2, two Monitor blocks, Routing matrix, L2 RAM, L2 Cache, Bridge to L3.]

Figure A3-4 shows a distributed model with local monitors in each processor block, and global monitoring distributed across the targets of interest.
[Figure A3-4 Global monitoring using monitor-at-target approach. Blocks shown: CPU 1 and CPU 2, each with a Local Monitor, Routing matrix, Shared L2 RAM, Nonshared L2 RAM, L2 Cache, Bridge to L3, with monitors Mon 1 and Mon 2 at the targets.]

A3.4.1 Exclusive access instructions and Non Shared memory regions

For memory regions that do not have the Shared attribute, the exclusive access instructions rely on a local monitor that tags any address from which the processor executes a Load-Exclusive. Any non-aborted attempt by the same processor to use a Store-Exclusive to modify any address is guaranteed to clear the tag.

Load-Exclusive performs a load from memory, and:
• the executing processor tags the fact that it has an outstanding tagged physical address to non-sharable memory
• the local monitor of the executing processor transitions to its Exclusive Access state.

Store-Exclusive performs a conditional store to memory:
• if the local monitor of the executing processor is in its Exclusive Access state:
  - the store takes place
  - a status value of 0 is returned to a register
  - the local monitor of the executing processor transitions to its Open Access state.
• if the local monitor of the executing processor is not in its Exclusive Access state:
  - no store takes place
  - a status value of 1 is returned to a register.

The Store-Exclusive instruction defines the register to which the status value is returned.

When a processor writes using any instruction other than a Store-Exclusive:
• if the write is to a physical address that is not covered by its local monitor the write does not affect the state of the local monitor
• if the write is to a physical address that is covered by its local monitor it is IMPLEMENTATION DEFINED whether the write affects the state of the local monitor.
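The Load-Exclusive and Store-Exclusive behavior described above can be sketched as a two-state model. This is an illustrative simplification rather than the architecture's own definition: the class and method names are hypothetical, the monitor is assumed to hold no address (so a Store-Exclusive to any address succeeds while in Exclusive Access), an ordinary store is assumed not to affect the monitor (one of the IMPLEMENTATION DEFINED choices), and Clear-Exclusive is modelled as forcing the Open Access state.

```python
OPEN_ACCESS = "Open Access"
EXCLUSIVE_ACCESS = "Exclusive Access"


class LocalMonitor:
    """Toy single-processor model of a local monitor plus Non Shared memory."""

    def __init__(self):
        self.state = OPEN_ACCESS
        self.memory = {}

    def load_exclusive(self, address):
        """Load from memory and move the monitor to Exclusive Access."""
        self.state = EXCLUSIVE_ACCESS
        return self.memory.get(address, 0)

    def store_exclusive(self, address, value):
        """Conditional store: returns status 0 on success, 1 on failure."""
        if self.state != EXCLUSIVE_ACCESS:
            return 1                      # no store takes place
        self.memory[address] = value      # the store takes place
        self.state = OPEN_ACCESS
        return 0

    def store(self, address, value):
        """Ordinary store, assumed here to leave the monitor state unchanged."""
        self.memory[address] = value

    def clear_exclusive(self):
        """Clear-Exclusive: force the monitor to Open Access."""
        self.state = OPEN_ACCESS


def atomic_increment(monitor, address):
    """Retry loop in the style of a Load-Exclusive/Store-Exclusive sequence."""
    while True:
        value = monitor.load_exclusive(address)
        if monitor.store_exclusive(address, value + 1) == 0:
            return value + 1
```

In this model a Store-Exclusive issued without a preceding Load-Exclusive returns status 1 and leaves memory unchanged, and a second Store-Exclusive immediately after a successful one also returns 1 because the monitor has returned to Open Access.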
If the local monitor is in its Exclusive Access state and a processor performs a Store-Exclusive to any address in Non Shared memory other than the last one from which it has performed a Load-Exclusive, it is IMPLEMENTATION DEFINED whether the store succeeds. This mechanism:
• is used on a context switch, see Context switch support on page A3-16
• should be treated as a software programming error in all other cases.

Note
In non-shared memory, it is UNPREDICTABLE whether a store to a tagged physical address will cause a tag to be cleared if that store is by a processor other than the one that caused the physical address to be tagged.

The state machine for the local monitor is shown in Figure A3-5 on page A3-12.

[Figure A3-5 Local monitor state machine diagram. States: Open Access, Exclusive Access. Transition labels: CLREX, STREX(x), STR(x), LDREX(x), LDREX(x1), STR(Tagged_address), STREX(Tagged_address), STREX(!Tagged_address), STR(!Tagged_address). The operations in italics show possible alternative IMPLEMENTATION DEFINED options.]

Note
The IMPLEMENTATION DEFINED options for the local monitor are consistent with the local monitor being constructed so that it does not hold any physical address, but instead treats any access as matching the address of the previous LDREX.

Table A3-4 shows the effect of the Load-Exclusive and Store-Exclusive instructions shown in Figure A3-5.
Table A3-4 Effect of Exclusive instructions on local monitor

Initial state     Operation (a)  Effect                                              Final state
Open access       CLREX          No effect                                           Open access
Open access       STREX(x)       Does not update memory, returns status 1            Open access
Open access       LDREX(x)       Loads value from memory, tags address x             Exclusive access
Exclusive access  CLREX          Clears tagged address                               Open access
Exclusive access  STREX(t)       Updates memory, returns status 0                    Open access
Exclusive access  STREX(!t)      Updates memory, returns status 0                    Open access
Exclusive access  LDREX(x1)      Loads value from memory, changes tag address to x1  Exclusive access

a. STREX and LDREX are used as examples of the exclusive access instructions. t is the tagged address, bits[31:a] of the address of the last Load-Exclusive instruction, see Size of the tagged memory block on page A3-16.

Figure A3-5 shows the behavior of the local address monitor associated with the processor issuing the LDREX, STREX and STR instructions. It is UNPREDICTABLE whether the transition from Exclusive Access to Open Access occurs when the STR or STREX is from a different processor. A local monitor implementation can be unaware of Load-Exclusive and Store-Exclusive operations from other processors.

A3.4.2 Exclusive access instructions and shared memory regions

For memory regions that have the Shared attribute, exclusive access instructions rely on:
• A local monitor for each processor in the system, that tags any address from which the processor executes a Load-Exclusive. The local monitor operates as described in Exclusive access instructions and Non Shared memory regions on page A3-11, and can ignore exclusive accesses from other processors in the system.
• A single global monitor that tags a physical address as exclusive access for a particular processor. This tag is used later to determine whether a Store-Exclusive to that address can occur.
Any non-aborted attempt to modify the tagged address by any processor is guaranteed to clear the tag. For each processor in the system, the global monitor:
  - holds a single tagged address
  - maintains a state machine.

The global monitor can either:
• reside in a processor block, as illustrated in Figure A3-3 on page A3-10
• exist as a secondary monitor at the memory interfaces, as shown in Figure A3-4 on page A3-10.

An implementation can combine the functionality of the global and local monitors into a single unit.

Operation of the global monitor

Load-Exclusive from shared memory performs a load from memory, and causes the physical address of the access to be tagged as exclusive access for the requesting processor. This access also causes the exclusive access tag to be removed from any other physical address that has been tagged by the requesting processor. The global monitor only supports a single outstanding exclusive access to sharable memory per processor.

Store-Exclusive performs a conditional store to memory:
• The store is guaranteed to succeed only if the physical address accessed is tagged as exclusive access for the requesting processor and both the local monitor and the global monitor state machine for the requesting processor are in the Exclusive Access state. In this case:
  - a status value of 0 is returned to a register to acknowledge the successful store
  - the final state of the global monitor for the requesting processor is IMPLEMENTATION DEFINED
  - the global monitor for any other processor that has tagged the address accessed transitions to Open Access.
• If no address is tagged as exclusive access for the requesting processor, the store does not succeed:
  - a status value of 1 is returned to a register to indicate that the store failed
  - the global monitor is not affected and remains Open Access for the requesting processor.
• If a different physical address is tagged as exclusive access for the requesting processor, it is IMPLEMENTATION DEFINED whether the store succeeds or not:
  - if the store succeeds a status value of 0 is returned to a register, otherwise a value of 1 is returned
  - if the global monitor for the processor was in the Open Access state before the Store-Exclusive it remains in the Open Access state
  - if the global monitor for the processor was in the Exclusive Access state before the Store-Exclusive it is IMPLEMENTATION DEFINED whether the global monitor transitions to the Open Access state.

The Store-Exclusive instruction defines the register to which the status value is returned.

In a shared memory system, the global monitor must implement a separate state machine for each processor in the system. In this context, the term processor includes any independent DMA agent. The state machine for Shared memory accesses by processor(n) can respond to all the Shared memory transactions visible to it:
• transactions generated by the associated processor (n)
• transactions generated by the other processors in the shared memory system (!n).

The state machine behavior is illustrated in Figure A3-6.

[State machine diagram with two states, Open Access and Exclusive Access. Label groups recovered from the diagram, consistent with Table A3-5:
  Remaining in Open Access: CLREX(n), CLREX(!n), STREX(x,n), STR(x,n), LDREX(x,!n), STREX(x,!n), STR(x,!n)
  Open Access to Exclusive Access: LDREX(x,n)
  Remaining in Exclusive Access: CLREX(n), CLREX(!n), LDREX(x1,n)
  Other labels attached to the Exclusive Access state: STREX(Tagged_address,!n), STR(Tagged_address,!n), STREX(Tagged_address,n), STREX(!Tagged_address,n), STR(Tagged_address,n), STR(!Tagged_address,n), STREX(!Tagged_address,!n), STR(!Tagged_address,!n)
STREX(Tagged_address,!n) only clears the monitor if the STREX updates memory. The operations in italics show possible alternative IMPLEMENTATION DEFINED options.]
Figure A3-6 Global monitor state machine diagram for processor(n) in a multiprocessor system

Note
• Whether a Store-Exclusive successfully updates memory or not depends on whether the address accessed matches the tagged shared memory address for the processor issuing the Store-Exclusive instruction. For this reason, Figure A3-6 and Table A3-5 on page A3-15 only show how the (!n) entries cause state transitions of the state machine for processor n.
• A Load-Exclusive can only update the tagged shared memory address for the processor issuing the Load-Exclusive instruction.
• CLREX instructions do not affect the global monitor.

Table A3-5 shows the effect of the Load-Exclusive and Store-Exclusive instructions shown in Figure A3-6 on page A3-14.

Table A3-5 Effect of Exclusive instructions on global monitor for processor n

Initial state (a)  Operation (b)        Effect                                                                Final state (a)
Open               CLREX(n), CLREX(!n)  None                                                                  Open
Open               STREX(x,n)           Does not update memory, returns status 1                              Open
Open               LDREX(x,!n)          Loads value from memory, no effect on tag address for processor n     Open
Open               STREX(x,!n)          Depends on state machine and tag address for processor issuing STREX  Open
Open               LDREX(x,n)           Loads value from memory, tags address x                               Exclusive
Exclusive          LDREX(x1,n)          Loads value from memory, tags address x1                              Exclusive
Exclusive          CLREX(n), CLREX(!n)  None                                                                  Exclusive
Exclusive          STREX(t,!n)          Updates memory, returns status 0 (c)                                  Open
Exclusive          STREX(t,!n)          Does not update memory, returns status 1 (c)                          Exclusive
Exclusive          STREX(t,n)           Updates memory, returns status 0 (d)                                  Open or Exclusive
Exclusive          STREX(!t,n)          Updates memory, returns status 0 (e)                                  Open or Exclusive
Exclusive          STREX(!t,n)          Does not update memory, returns status 1 (e)                          Open or Exclusive
Exclusive          STREX(!t,!n)         Depends on state machine and tag address for processor issuing STREX  Exclusive

a. Open = Open access, Exclusive = Exclusive access.
b. STREX and LDREX are used as examples of the exclusive access instructions. t is the tagged address for processor n, bits[31:a] of the address of the last LDREX instruction issued by processor n, see Size of the tagged memory block on page A3-16.
c. The result of the STREX(t,!n) operation depends on the state machine and tagged address for the processor issuing the STREX instruction. This table shows how each possible outcome affects the state machine for processor n.
d. After a successful STREX to the tagged address, the state of the state machine is IMPLEMENTATION DEFINED. However, this state has no effect on the subsequent operation of the global monitor.
e. Effect is IMPLEMENTATION DEFINED. The table shows all permitted implementations.

A3.4.3 Size of the tagged memory block

As shown in Figure A3-5 on page A3-12 and Figure A3-6 on page A3-14, when a LDREX instruction is executed, the resulting tag address ignores the least significant bits of the memory address:

    Tagged_address == Memory_address[31:a]

The value of a in this assignment is IMPLEMENTATION DEFINED, between a minimum value of 2 and a maximum value of 7. For example, in an implementation where a = 4, a successful LDREX of address 0x000341B4 gives a tag value of bits [31:4] of the address, giving 0x000341B. This means that the four words of memory from 0x000341B0 to 0x000341BF are tagged for exclusive access. Subsequently, a valid STREX to any address in this block will remove the tag.

Therefore, the size of the tagged memory block is IMPLEMENTATION DEFINED between:
• one word, in an implementation with a = 2
• 32 words, in an implementation with a = 7.

A3.4.4 Context switch support

It is necessary to ensure that the local monitor is in the Open Access state after a context switch. In ARMv7-M, the local monitor is changed to Open Access automatically as part of an exception entry or exit sequence.
The local monitor can also be forced to the Open Access state by a CLREX instruction. Note Context switching is not an application level operation. However, this information is included here to complete the description of the exclusive operations. The STREX or CLREX instruction following a context switch might cause a subsequent Store-Exclusive to fail, requiring a load … store sequence to be replayed. To minimize the possibility of this happening, ARM Limited recommends that the Store-Exclusive instruction is kept as close as possible to the associated Load-Exclusive instruction, see Load-Exclusive and Store-Exclusive usage restrictions. A3.4.5 Load-Exclusive and Store-Exclusive usage restrictions The Load-Exclusive and Store-Exclusive instructions are designed to work together, as a pair, for example a LDREX/STREX pair or a LDREXB/STREXB pair. As mentioned in Context switch support, ARM Limited recommends that the Store-Exclusive instruction always follows within a few instructions of its associated Load-Exclusive instructions. In order to support different implementations of these functions, software must follow the notes and restrictions given here. A3-16 Copyright © 2006 ARM Limited. All rights reserved. Beta ARM DDI 0405A-01 ARM Architecture Memory Model These notes describe use of a LDREX/STREX pair, but apply equally to any other Load-Exclusive/Store-Exclusive pair: 1. The exclusives are designed to support a single outstanding exclusive access for each processor thread that is executed. The architecture makes use of this by not mandating an address or size check as part of the IsExclusiveLocal() function. If the target address of an STREX is different from the preceding LDREX within the same execution thread, behavior can be UNPREDICTABLE. As a result, an LDREX/STREX pair can only be relied upon to eventually succeed if they are executed with the same address. 
Where a context switch or exception might result in a change of execution thread, a CLREX instruction or a dummy STREX instruction must be executed to avoid unwanted effects, as described in Context switch support on page A3-16 Using an STREX in this way is the only occasion where software can program an STREX with a different address from the previously executed LDREX. 2. An explicit store to memory can cause the clearing of exclusive monitors associated with other processors, therefore, performing a store between the LDREX and the STREX can result in a livelock situation. As a result, code must avoid placing an explicit store between an LDREX and an STREX within a single code sequence. 3. Two STREX instructions executed without an intervening LDREX will also result in the second STREX returning a status value of 1. As a result: • each STREX must have a preceding LDREX associated with it within a given thread of execution • it is not necessary for each LDREX to have a subsequent STREX. 4. Implementations can cause apparently spurious clearing of the exclusive monitor between the LDREX and the STREX, as a result of, for example, cache evictions. Code designed to run on such implementations should avoid having any explicit memory transactions or cache maintenance operations between the LDREX and STREX instructions. 5. Implementations can benefit from keeping the LDREX and STREX operations close together in a single code sequence. This minimizes the likelihood of the exclusive monitor state being cleared between the LDREX instruction and the STREX instruction. Therefore, ARM Limited strongly recommends a limit of 128 bytes between LDREX and STREX instructions in a single code sequence, for best performance. 6. Implementations that implement coherent protocols, or have only a single master, might combine the local and global monitors for a given processor. The IMPLEMENTATION DEFINED and UNPREDICTABLE parts of the definitions in are designed to cover this behavior. 7. 
The architecture sets an upper limit of 128 bytes on the regions that can be marked as exclusive. Therefore, for performance reasons, ARM Limited recommends that software separates objects that will be accessed by exclusive accesses by at least 128 bytes. This is a performance guideline rather than a functional requirement.

8. LDREX and STREX operations must be performed only on memory with the Normal memory attribute.

A3.4.6 Synchronization primitives and the memory order model

The synchronization primitives follow the memory ordering model of the memory type accessed by the instructions. For this reason:
• Portable code for claiming a spinlock must include a DMB instruction between claiming the spinlock and making any access that makes use of the spinlock.
• Portable code for releasing a spinlock must include a DMB instruction before writing to clear the spinlock.

This requirement applies to code using the Load-Exclusive/Store-Exclusive instruction pairs, for example LDREX/STREX.

A3.5 Memory types

ARMv7 defines a set of memory attributes with the characteristics required to support all memory and devices in the system memory map. The ordering of accesses for regions of memory is also defined by the memory attributes.

There are three mutually exclusive main memory type attributes to describe the memory regions:
• Normal
• Device
• Strongly Ordered.

Memory used for program execution and data storage generally complies with Normal memory. Examples of Normal memory technology are:
• preprogrammed Flash (updating Flash memory can impose stricter ordering rules)
• ROM
• SRAM
• SDRAM and DDR memory.

System peripherals (I/O) generally conform to different access rules, defined as Strongly Ordered or Device memory.
Examples of I/O accesses are:
• FIFOs where consecutive accesses add (write) or remove (read) queued values
• interrupt controller registers where an access can be used as an interrupt acknowledge, changing the state of the controller itself
• memory controller configuration registers that are used to set up the timing (and correctness) of areas of normal memory
• memory-mapped peripherals where the accessing of memory locations causes side effects within the system.

In addition, the Shared attribute indicates whether Normal or Device memory is private to a single processor, or accessible from multiple processors or other bus master resources, for example, an intelligent peripheral with DMA capability.

Strongly Ordered memory is required where it is necessary to ensure strict ordering of the access with respect to what occurred in program order before the access and after it. Strongly Ordered memory always assumes the resource is shared.

Table A3-6 provides a summary of the memory attributes.

Table A3-6 Summary of memory attributes

Memory type attribute   Shared attribute   Other attribute                                               Description
Strongly Ordered        -                  -                                                             All memory accesses to Strongly Ordered memory occur in program order. All Strongly Ordered accesses are assumed to be Shared.
Device                  Shared             -                                                             Designed to handle memory mapped peripherals that are shared by several processors.
Device                  Non-shared         -                                                             Designed to handle memory mapped peripherals that are used only by a single processor.
Normal                  Shared             Non-cacheable/Write-Through cacheable/Write-Back cacheable    Designed to handle normal memory which is shared between several processors.
Normal                  Non-shared         Non-cacheable/Write-Through cacheable/Write-Back cacheable    Designed to handle normal memory which is used only by a single processor.

A3.5.1 Atomicity

The terms Atomic and Atomicity are used within computer science to describe a number of properties for memory accesses.
Within the ARM architecture, the following definitions are used:

Single-copy atomicity

The property of Single-copy atomicity is exhibited for read and write operations if the following conditions are met:

1. After every two write operations to an operand, either the value of the first write operation or the value of the second write operation remains in the operand. Thus, it is impossible for part of the value of the first write operation and part of the second write operation to remain in the operand.

2. When a read operation and a write operation occur to the same operand, the value obtained by the read operation is either the value of the operand before the write operation or the value of the operand after the write operation. It is never the case that the value of the read operation is partly the value of the operand before the write operation and partly the value of the operand after the write operation.

The only ARMv7-M explicit accesses made by the ARM processor which exhibit single-copy atomicity are:
• All byte transactions
• All halfword transactions to 16-bit aligned locations
• All word transactions to 32-bit aligned locations

LDM, LDC, LDRD, STM, STC, STRD, PUSH and POP operations are seen to be a sequence of 32-bit transactions aligned to 32 bits. Each of these 32-bit transactions is guaranteed to exhibit single-copy atomicity. Sub-sequences of two or more 32-bit transactions from the sequence do not exhibit single-copy atomicity.

Where a transaction does not exhibit single-copy atomicity, it is seen as a sequence of transactions of bytes which do exhibit single-copy atomicity.

For implicit accesses:
• Cache linefills/evictions are seen to be a sequence of 32-bit transactions aligned to 32 bits. Each of these 32-bit transactions exhibits single-copy atomicity.
Sub-sequences of two or more 32-bit transactions from the sequence do not exhibit single-copy atomicity.
• Instruction fetches exhibit single-copy atomicity for the individual instructions being fetched. Note that 32-bit Thumb instructions consist of two 16-bit quantities.

Multi-copy atomicity

In a multiprocessing system, writes to a memory location exhibit Multi-copy atomicity if:

1. All writes to the same location are serialised, that is, they are observed in the same order by all copies of the location.

2. The value of a write is not returned by a read until all copies of the location have seen that write.

No writes to Normal memory exhibit Multi-copy atomicity.

All writes to Device and Strongly-Ordered memory which exhibit Single-copy atomicity also exhibit Multi-copy atomicity.

All write transactions to the same location are serialised. Write transactions to Normal memory can be repeated up to the point that another write to the same address is observed.

Serialisation of writes does not prohibit the merging of writes for Normal memory.

A3.5.2 Normal memory attribute

Normal memory is idempotent, exhibiting the following properties:
• read transactions can be repeated with no side effects
• repeated read transactions return the last value written to the resource being read
• read transactions can prefetch additional memory locations with no side effects
• write transactions can be repeated with no side effects, provided that the location is unchanged between the repeated writes
• unaligned accesses can be supported
• transactions can be merged prior to accessing the target memory system.

Normal memory can be read/write or read-only. The Normal memory attribute can be further defined as being Shared or Non-Shared, and describes most memory used in a system.

Accesses to Normal Memory conform to the weakly-ordered model of memory ordering.
A description of the weakly-ordered model can be found in standard texts describing memory ordering issues. A recommended text is chapter 2 of Memory Consistency Models for Shared Memory-Multiprocessors, Kourosh Gharachorloo, Stanford University Technical Report CSL-TR-95-685.

All explicit accesses must correspond to the ordering requirements of accesses described in Memory access order on page A3-27.

Instructions which conform to the 32-bit sequence of transactions classification as defined in Atomicity on page A3-20 can be abandoned if a MemManage or BusFault exception occurs during the sequence of transactions. The instruction will be restarted on return from the exception, and one or more of the memory locations can be accessed multiple times. For Normal memory, this can result in repeated write transactions to a location which has been changed between the repeated writes.

Non-shared Normal memory

The Non-Shared Normal memory attribute is designed to describe normal memory that can be accessed only by a single processor. A region of memory marked as Non-Shared Normal does not have any requirement to make the effect of a cache transparent.

For regions of memory marked as Non-shared Non-cacheable, a Data Synchronization Barrier (DSB) instruction must be used in situations where previous accesses must be made visible to other observers. See Memory barriers on page A3-30 for more details.

Shared Normal memory

The Shared Normal memory attribute is designed to describe normal memory that can be accessed by multiple processors or other system masters.

A region of memory marked as Shared Normal is one in which the effect of interposing a cache (or caches) on the memory system is entirely transparent to data accesses. Explicit software management is still required to ensure coherency of instruction caches.
Implementations can use a variety of mechanisms to support this, from simply not caching accesses in shared regions to more complex hardware schemes for cache coherency for those regions.

Cacheable write-through, cacheable write-back and non-cacheable memory

In addition to marking a region of Normal memory as being Shared or Non-Shared, regions can also be marked as being one of:
• cacheable write-through
• cacheable write-back
• non-cacheable.

This marking is independent of the marking of a region of memory as being Shared or Non-Shared. It indicates the required handling of the data region for reasons other than those needed to handle the requirements of shared data. As a result, it is acceptable for a region of memory that is marked as being cacheable and shared not to be held in the cache in an implementation which handles shared regions by not caching the data.

A3.5.3 Device memory attribute

The Device memory attribute is defined for memory locations where an access to the location can cause side effects, or where the value returned for a load can vary depending on the number of loads performed. Memory mapped peripherals and I/O locations are typical examples of areas of memory that should be marked as being Device.

Explicit accesses from the processor to regions of memory marked as Device occur at the size and order defined by the instruction. The number of accesses that occur to such locations is the number that is specified by the program. Implementations must not repeat accesses to such locations when there is only one access in the program, that is, the accesses are not restartable. An example where an implementation might want to repeat an access is before and after an interrupt, in order to allow the interrupt to cause a slow access to be abandoned. Such implementation optimizations must not be performed for regions of memory marked as Device.
In addition, address locations marked as Device are non-cacheable. While writes to Device memory can be buffered, writes can be merged only where the number, order, and size of the accesses are maintained. Multiple accesses to the same address cannot change the number or order of accesses to that address. Coalescing of accesses is not permitted in this case.

Accesses to Device memory can exhibit side effects. Device memory operations that have side effects that apply to Normal memory locations require Memory Barriers to ensure correct execution. An example is the programming of the configuration registers of a memory controller with respect to the memory accesses it controls.

All explicit accesses to memory marked as Device must correspond to the ordering requirements of accesses described in Memory access order on page A3-27.

Shared attribute

The Shared attribute is defined by memory region and can be referred to as:
• memory marked as Shared Device
• memory marked as Non-Shared Device.

Memory marked as Non-Shared Device is defined as only accessible by a single processor. An example of a system supporting Shared and Non-shared Device memory is an implementation that supports a local bus for its private peripherals, whereas system peripherals are situated on the main (Shared) system bus. Such a system can have more predictable access times for local peripherals such as watchdog timers or interrupt controllers.

A3.5.4 Strongly Ordered memory attribute

Accesses to memory marked as Strongly Ordered have a strong memory-ordering model for all explicit memory accesses from that processor. An access to memory marked as Strongly Ordered is required to act as if a DMB memory barrier instruction were inserted before and after the access from that processor. See Data Memory Barrier (DMB) on page A3-31 for DMB details.
Explicit accesses from the processor to memory marked as Strongly Ordered occur at their program size, and the number of accesses that occur to such locations is the number that is specified by the program. Implementations must not repeat accesses to such locations when there is only one access in the program, that is, the accesses are not restartable.

Address locations marked as Strongly Ordered are not held in a cache, and are always treated as Shared memory locations.

All explicit accesses to memory marked as Strongly Ordered must correspond to the ordering requirements of accesses described in Memory access order on page A3-27.

A3.5.5 Memory access restrictions

The following restrictions apply to memory accesses:
• For any access X, the bytes accessed by X must all have the same memory type attribute, otherwise, the behavior of the access is UNPREDICTABLE. That is, unaligned accesses that span a boundary between different memory types are UNPREDICTABLE.
• For any two memory accesses X and Y, such that X and Y are generated by the same instruction, X and Y must both have the same memory type attribute, otherwise, the results are UNPREDICTABLE. For example, an LDC, LDM, LDRD, STC, STM, or STRD that spans a boundary between Normal and Device memory is UNPREDICTABLE.
• Instructions that generate unaligned memory accesses to Device or Strongly Ordered memory are UNPREDICTABLE.
• For instructions that generate accesses to Device or Strongly Ordered memory, implementations do not change the sequence of accesses specified by the pseudo-code of the instruction. This includes not changing how many accesses there are, nor their time order, nor the data sizes and other properties of each individual access. Furthermore, processor core implementations expect any attached memory system to be able to identify accesses by memory type, and to obey similar restrictions with regard to the number, time order, data sizes and other properties of the accesses.
Exceptions to this rule are:
- An implementation of a processor core can break this rule, provided that the information it does supply to the memory system enables the original number, time order, and other details of the accesses to be reconstructed. In addition, the implementation must place a requirement on attached memory systems to do this reconstruction when the accesses are to Device or Strongly Ordered memory.
  For example, the word loads generated by an LDM can be paired into 64-bit accesses by an implementation with a 64-bit bus. This is because the instruction semantics ensure that the 64-bit access is always a word load from the lower address followed by a word load from the higher address, provided a requirement is placed on memory systems to unpack the two word loads where the access is to Device or Strongly Ordered memory.
- Any implementation technique that produces results that cannot be observed to be different from those described above is legitimate.
• Multi-access instructions that load or store the PC must only access Normal memory. If they access Device or Strongly Ordered memory, the results are UNPREDICTABLE.
• Instruction fetches must only access Normal memory. If they access Device or Strongly Ordered memory, the results are UNPREDICTABLE. For example, instruction fetches must not be performed to areas of memory containing read-sensitive devices, because there is no ordering requirement between instruction fetches and explicit accesses.

Implementations can prefetch by an IMPLEMENTATION DEFINED amount down a sequential path from the instruction currently being executed. To ensure correctness, read-sensitive locations must be marked as non-executable (see User/privileged access and Read/Write access control for Instruction Accesses on page A3-26).
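The first and third restrictions in A3.5.5 can be illustrated with a small model. This is only a sketch and not part of the architecture: the `MemType` enum, the `REGIONS` map and the `check_access` function are hypothetical names, and the address ranges are invented for the example. It reports an access as UNPREDICTABLE when its bytes do not all share one memory type, or when it is an unaligned access to Device or Strongly Ordered memory. A result of "OK" means only that neither of these two rules is violated.

```python
from enum import Enum


class MemType(Enum):
    NORMAL = "Normal"
    DEVICE = "Device"
    STRONGLY_ORDERED = "Strongly Ordered"


# Hypothetical region map, invented for this example:
# (start address, end address exclusive, memory type).
REGIONS = [
    (0x20000000, 0x40000000, MemType.NORMAL),
    (0x40000000, 0x60000000, MemType.DEVICE),
    (0xE0000000, 0xE0100000, MemType.STRONGLY_ORDERED),
]


def mem_type(addr):
    """Look up the memory type attribute of a single byte address."""
    for start, end, kind in REGIONS:
        if start <= addr < end:
            return kind
    raise ValueError(f"address {addr:#010x} is not in the example map")


def check_access(addr, size):
    """Classify one access of `size` bytes (1, 2 or 4) starting at `addr`."""
    types = {mem_type(addr + offset) for offset in range(size)}
    if len(types) > 1:
        # The bytes of the access do not all have the same memory type.
        return "UNPREDICTABLE"
    (kind,) = types
    if addr % size != 0 and kind is not MemType.NORMAL:
        # Unaligned access to Device or Strongly Ordered memory.
        return "UNPREDICTABLE"
    return "OK"
```

In this model, `check_access(0x3FFFFFFE, 4)` straddles the invented Normal/Device boundary and `check_access(0x40000002, 4)` is an unaligned Device access, so both are classified as UNPREDICTABLE, while an unaligned word access inside the Normal region is not.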
A3.6 Access rights

Access rights split into two classes:
• Rights for data accesses
• Rights for instruction prefetching

Furthermore, the access right can be restricted to privileged execution only.

A3.6.1 User/privileged access and Read/Write access control for Data Accesses

The memory attributes can define, separately for explicit reads and for explicit writes, that a region of memory is:
• Not accessible to any accesses
• Accessible only to Privileged accesses
• Accessible to Privileged and Non-Privileged accesses

Not all combinations of memory attributes for reads and writes are supported by all systems which define the memory attributes. If an attempt is made to read or write non-accessible data, a data access error will occur.

Privileged accesses are accesses made as a result of a load or store operation (other than LDRT, STRT, LDRBT, STRBT, LDRHT, STRHT, LDRSHT, LDRSBT) during privileged execution.

Unprivileged accesses are accesses made as a result of a load or store operation when the processor is executing unprivileged code, or any accesses made as a result of the following instructions: LDRT, STRT, LDRBT, STRBT, LDRHT, STRHT, LDRSHT, LDRSBT.

A3.6.2 User/privileged access and Read/Write access control for Instruction Accesses

The memory attributes can define that a region of memory is:
• Not accessible for execution. Prefetching must not occur from locations marked as non-executable.
• Accessible for execution by Privileged processes only
• Accessible for execution by Privileged and Non-Privileged processes

The mechanism by which this is described is that the region is described as accessible for reads by a privileged read access (or by privileged and non-privileged read access) and is suitable for execution. As a result, there is some linkage between the memory attributes which define the accessibility to explicit memory accesses, and those which define that a region can be executed.
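The classification of data accesses in A3.6.1 can be sketched as a simple check. The code below is illustrative only: the three region labels and both function names are hypothetical, and the model checks a single right at a time, whereas a real system holds separate rights for reads and for writes. The one rule taken directly from the text is that the listed `T` forms always make unprivileged accesses, even during privileged execution.

```python
# Instructions whose accesses are always unprivileged (as listed in A3.6.1).
UNPRIVILEGED_FORMS = frozenset(
    {"LDRT", "STRT", "LDRBT", "STRBT", "LDRHT", "STRHT", "LDRSHT", "LDRSBT"}
)

# Hypothetical labels for the three kinds of region described in A3.6.1.
NO_ACCESS = "no access"
PRIVILEGED_ONLY = "privileged only"
PRIVILEGED_AND_UNPRIVILEGED = "privileged and unprivileged"


def is_privileged_access(mnemonic, privileged_execution):
    """True if the load/store makes a Privileged access."""
    return privileged_execution and mnemonic.upper() not in UNPRIVILEGED_FORMS


def data_access_permitted(region, mnemonic, privileged_execution):
    """Decide whether the access is allowed; False models a data access error."""
    if region == NO_ACCESS:
        return False
    if region == PRIVILEGED_ONLY:
        return is_privileged_access(mnemonic, privileged_execution)
    return True
```

For example, `data_access_permitted(PRIVILEGED_ONLY, "LDRT", True)` is false in this model, because LDRT makes an unprivileged access even when it is executed by privileged code.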
If execution is attempted to any memory locations for which the attributes are not permitted, an instruction execution error will occur.

A3.7 Memory access order

The memory types defined in Memory types on page A3-19 have associated memory ordering rules to provide system compatibility for software between different implementations. The rules are defined to accommodate the increasing difficulty of ensuring linkage between the completion of memory accesses and the execution of instructions within a complex high-performance system, while also allowing simple systems and implementations to meet the criteria with predictable behaviour.

The memory order model determines:
• when side effects are guaranteed to be visible
• the requirements for memory consistency.

A Shared memory attribute, indicating whether a region of memory is shared between multiple processors (and therefore requires an appearance of cache transparency in the ordering model), is supported. Implementations remain free to choose the mechanisms to implement this functionality.

Additional attributes and behaviors relate to the memory system architecture. These features are defined in other areas of this manual (see Access rights on page A3-26, Chapter C1 Debug and Chapter B1 System Level Programmer’s Model on access permissions, the system memory map and the Protected Memory System Architecture respectively).

A3.7.1 Read and write definitions

Memory accesses can be either reads or writes.
Reads

Reads are defined as memory operations that have the semantics of a load. For ARMv7-M and Thumb-2 these are:
• LDR{S}B{T}, LDR{S}H{T}, LDR{T}
• LDMIA, LDMDB, LDRD, POP, LDC
• LDREX{B,H}, STREX{B,H}
• TBB, TBH

Writes

Writes are defined as operations that have the semantics of a store. For ARMv7-M and Thumb-2 these are:
• STRB{T}, STRH{T}, STR{T}
• STMIA, STMDB, STRD, PUSH, STC, STREX{B,H}

Memory synchronization primitives

Synchronization primitives are required to ensure correct operation of system semaphores within the memory order model. The memory synchronization primitive instructions are defined as those instructions that are used to ensure memory synchronization:

LDREX{B,H}, STREX{B,H}

These instructions are supported for both Shared and Non-shared memory. Non-shared memory can be used when the processes to be synchronized are running on the same processor. When the processes to be synchronized are running on different processors, Shared memory must be used.

A3.7.2 Observability and completion

The concept of observability applies to all memory. However, the concept of global observability applies only to Shared memory. Normal, Device and Strongly Ordered memory are defined in Memory types on page A3-19.

For all memory:
• A write to a location in memory is said to be observed by a memory system agent (the observer) when a subsequent read of the location by the observer returns the value written by the write.
• A write to a location in memory is said to be globally observed when a subsequent read of the location by any memory system agent returns the value written by the write.
• A read to a location in memory is said to be observed by a memory system agent (the observer) when a subsequent write of the location by the observer has no effect on the value returned by the read.
• A read to a location in memory is said to be globally observed when a subsequent write of the location by any memory system agent has no effect on the value returned by the read.

Additionally, for Strongly Ordered memory:
• A read or write to a memory mapped location in a peripheral which exhibits side-effects is said to be observed, and globally observed, only when the read or write meets the general conditions listed, can begin to affect the state of the memory-mapped peripheral, and can trigger any side effects that affect other peripheral devices, cores and/or memory.

For all memory, a read or write is defined to be complete when it is globally observed.
• A branch predictor maintenance operation is defined to be complete when the effects of the operation are globally observed.

To determine when any side effects have completed, it is necessary to poll a location associated with the device, for example, a status register.

Side effect completion in Strongly Ordered and Device memory

For all memory-mapped peripherals, where the side-effects of a peripheral are required to be visible to the entire system, the peripheral must provide an IMPLEMENTATION DEFINED location which can be read to determine when all side effects are complete.

This is a key element of the architected memory order model.

A3.7.3 Ordering requirements for memory accesses

ARMv7-M defines access restrictions in the memory ordering allowed, depending on the memory attributes of the accesses involved. Table A3-7 shows the memory ordering between two explicit accesses A1 and A2, where A1 (as listed in the first column) occurs before A2 (as listed in the first row) in program order.

The symbols used in Table A3-7 are as follows:

<  Accesses must be globally observed in program order, that is, A1 must be globally observed strictly before A2.
(blank)  Accesses can be globally observed in any order, provided that the requirements of uniprocessor semantics, for example respecting dependencies between instructions within a single processor, are maintained.

Table A3-7 Memory order restrictions

                                                          A2
A1                            Normal Access   Device Access   Device Access   Strongly Ordered
                                              (Non-Shared)    (Shared)        Access
Normal Access                                                                 <
Device Access (Non-Shared)                    <                               <
Device Access (Shared)                                        <               <
Strongly Ordered Access       <               <               <               <

There are no ordering requirements for implicit accesses to any type of memory.

A3.7.4 Program order for instruction execution

Program order of instruction execution is the order of the instructions in the control flow trace. Explicit memory accesses in an execution can be either:

Strictly Ordered  Denoted by <. Must occur strictly in order.
Ordered           Denoted by <=. Must occur either in order, or simultaneously.

Multiple load and store instructions, such as LDM{IA || DB}, LDRD, POP, STM{IA || DB}, PUSH and STRD, generate multiple word accesses, each of which is a separate access for the purpose of determining ordering.

The rules for determining program order for two accesses A1 and A2 are:

If A1 and A2 are generated by two different instructions:
• A1 < A2 if the instruction that generates A1 occurs before the instruction that generates A2 in program order
• A2 < A1 if the instruction that generates A2 occurs before the instruction that generates A1 in program order.

If A1 and A2 are generated by the same instruction:
• If A1 and A2 are the load and store generated by a SWP or SWPB instruction:
  - A1 < A2 if A1 is the load and A2 is the store
  - A2 < A1 if A2 is the load and A1 is the store.
• If A1 and A2 are two word loads generated by an LDC, LDRD, or LDM instruction, or two word stores generated by an STC, STRD, or STM instruction, excluding LDM or STM instructions whose register list includes the PC:
  - A1 <= A2 if the address of A1 is less than the address of A2
  - A2 <= A1 if the address of A2 is less than the address of A1.
• If A1 and A2 are two word loads generated by an LDM instruction whose register list includes the PC or two word stores generated by an STM instruction whose register list includes the PC, the program order of the memory operations is not defined.
• If A1 and A2 are two word loads generated by an LDRD instruction or two word stores generated by an STRD instruction whose register list includes the PC, the instruction is UNPREDICTABLE.

A3.7.5 Memory barriers

Memory barrier is the general term applied to an instruction, or sequence of instructions, used to force synchronization events by a processor with respect to retiring load/store instructions in a processor core. A memory barrier is used to guarantee completion of preceding load/store instructions to the programmer’s model, flushing of any prefetched instructions prior to the event, or both. ARMv7-M includes three explicit barrier instructions to support the memory order model. The instructions can execute from privileged or unprivileged code.
• Data Memory Barrier (DMB) as described in Data Memory Barrier (DMB) on page A3-31
• Data Synchronization Barrier (DSB) as described in Data Synchronization Barrier (DSB) on page A3-31
• Instruction Synchronization Barrier (ISB) as described in Instruction Synchronization Barrier (ISB) on page A3-31

Explicit memory barriers affect reads and writes to the memory system generated by load and store instructions being executed in the CPU. Reads and writes generated by DMA transactions and instruction fetches are not explicit accesses.
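Explicit barriers matter most for the pairs of accesses that Table A3-7 (in A3.7.3) leaves unordered. The sketch below encodes one reading of that table as a set of (A1, A2) pairs marked `<`; the entries should be checked against the table itself before being relied on. The constant names and both functions are hypothetical and exist only for this illustration.

```python
NORMAL = "Normal"
DEVICE_NS = "Device (Non-Shared)"
DEVICE_S = "Device (Shared)"
STRONGLY_ORDERED = "Strongly Ordered"

# (A1, A2) pairs marked "<": A1 must be globally observed strictly before A2.
# Assumption: this is one reading of Table A3-7, transcribed by hand.
ORDERED_PAIRS = {
    (NORMAL, STRONGLY_ORDERED),
    (DEVICE_NS, DEVICE_NS),
    (DEVICE_NS, STRONGLY_ORDERED),
    (DEVICE_S, DEVICE_S),
    (DEVICE_S, STRONGLY_ORDERED),
    (STRONGLY_ORDERED, NORMAL),
    (STRONGLY_ORDERED, DEVICE_NS),
    (STRONGLY_ORDERED, DEVICE_S),
    (STRONGLY_ORDERED, STRONGLY_ORDERED),
}


def observed_in_program_order(a1, a2):
    """True if the table entry for A1 followed by A2 is "<"."""
    return (a1, a2) in ORDERED_PAIRS


def barrier_needed_for_order(a1, a2):
    """True if the entry is blank, so software that relies on the order
    of the two accesses has to impose it with a barrier such as DMB."""
    return not observed_in_program_order(a1, a2)
```

Under this reading, two accesses to Normal memory, or a Non-Shared Device access followed by a Shared Device access, have a blank entry, so `barrier_needed_for_order` returns true for them.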
A3.7.6 Data Memory Barrier (DMB)

DMB acts as a data memory barrier, exhibiting the following behavior:
• All explicit memory accesses by instructions occurring in program order before this instruction are globally observed before any explicit memory accesses due to instructions occurring in program order after this instruction are observed.
• The DMB instruction has no effect on the ordering of other instructions executing on the processor.

As such, DMB ensures the apparent order of the explicit memory operations before and after the instruction, without ensuring their completion. For details on the DMB instruction, see Chapter A5 Thumb Instructions.

A3.7.7 Data Synchronization Barrier (DSB)

The DSB instruction acts as a special kind of Data Memory Barrier. The DSB operation completes when all explicit memory accesses before this instruction complete.

In addition, no instruction subsequent to the DSB can execute until the DSB completes. For details on the DSB instruction, see Chapter A5 Thumb Instructions.

A3.7.8 Instruction Synchronization Barrier (ISB)

The ISB instruction flushes the pipeline in the processor, so that all instructions following the pipeline flush are fetched from memory after the instruction has been completed. It ensures that the effects of context altering operations, such as branch predictor maintenance operations, as well as all changes to the special-purpose registers where applicable (see The special-purpose control register on page B1-9) executed before the ISB instruction are visible to the instructions fetched after the ISB.

In addition, the ISB instruction ensures that any branches which appear in program order after the ISB are always written into the branch prediction logic with the context that is visible after the ISB. This is required to ensure correct execution of the instruction stream.
Any context altering operations appearing in program order after the ISB only take effect after the ISB has been executed. This is due to the behavior of the context altering instructions.

ARM implementations are free to choose how far ahead of the current point of execution they prefetch instructions; either a fixed or a dynamically varying number of instructions. As well as being free to choose how many instructions to prefetch, an ARM implementation can choose which possible future execution path to prefetch along. For example, after a branch instruction, it can choose to prefetch either the instruction following the branch or the instruction at the branch target. This is known as branch prediction.

A potential problem with all forms of instruction prefetching is that the instruction in memory can be changed after it was prefetched but before it is executed. If this happens, the modification to the instruction in memory does not normally prevent the already prefetched copy of the instruction from executing to completion. The ISB and memory barrier instructions (DMB or DSB as appropriate) are used to force execution ordering where necessary.

For details on the ISB instruction, see Chapter A5 Thumb Instructions.

A3.8 Caches and memory hierarchy

Support for caches in ARMv7-M is limited to memory attributes. These can be exported on a supporting bus protocol such as AMBA (AHB or AXI protocols) to support system caches.

In situations where a breakdown in coherency can occur, software must manage the caches using cache maintenance operations which are memory mapped and IMPLEMENTATION DEFINED.

A3.8.1 Introduction to caches

A cache is a block of high-speed memory locations containing both address information (commonly known as a TAG) and the associated data. The purpose is to increase the average speed of a memory access.
Caches operate on two principles of locality:

Spatial locality: an access to one location is likely to be followed by accesses to adjacent locations, for example, sequential instruction execution or usage of a data structure.

Temporal locality: an access to an area of memory is likely to be repeated within a short time period, for example, execution of a code loop.

To minimise the quantity of control information stored, the spatial locality property is used to group several locations together under the same TAG. This logical block is commonly known as a cache line. When data is loaded into a cache, access times for subsequent loads and stores are reduced, resulting in overall performance benefits. An access to information already in a cache is known as a cache hit, and other accesses are called cache misses.

Normally, caches are self-managing, with the updates occurring automatically. Whenever the processor wants to access a cacheable location, the cache is checked. If the access is a cache hit, the access occurs immediately; otherwise a location is allocated and the cache line is loaded from memory. Different cache topologies and access policies are possible; however, they must comply with the memory coherency model of the underlying architecture.

Caches introduce a number of potential problems, mainly because of:

• Memory accesses occurring at times other than when the programmer would normally expect them
• There being multiple physical locations where a data item can be held

A3.8.2 Implication of caches to the application programmer

Caches are largely invisible to the application programmer, but can become visible due to a breakdown in coherency. Such a breakdown can occur:

• When memory locations are updated by other agents in the system
• When memory updates made from the application code must be made visible to other agents in the system

For example:
• In systems with a DMA controller that reads memory locations which are held in the data cache of a processor, a breakdown of coherency occurs when the processor has written new data in the data cache, but the DMA controller reads the old data held in memory.
• In a Harvard architecture of caches, a breakdown of coherency occurs when new instruction data has been written into the data cache and/or to memory, but the instruction cache still contains the old instruction data.

A3.9 Bit banding

ARMv7-M supports bit-banding. This feature is designed to be used with on-chip RAM or peripherals.

A bit-band address space is one that supports bit-wise as well as (multi)byte accesses, the bit-wise accesses being made through an aliased address range. Byte, halfword, word and multi-byte accesses are supported by the primary address range associated with the bit-band memory, in accordance with the memory type attribute associated with the address range. The aliased address range maps a word of address space (four bytes) to each bit of the primary range.

For a detailed explanation of bit banding, see Bit Banding on page B2-5.

Note
Where a primary bit-band region is supported in the system memory map, the associated aliased bit-band region must be supported. Where no primary region support exists, the corresponding aliased region must also behave as non-existent memory.

Chapter A4 The Thumb Instruction Set

This chapter describes the Thumb® instruction set.
It contains the following sections:

• Instruction set encoding on page A4-2
• Instruction encoding for 32-bit Thumb instructions on page A4-12
• Conditional execution on page A4-33
• UNDEFINED and UNPREDICTABLE instruction set space on page A4-37
• Usage of 0b1111 as a register specifier on page A4-39
• Usage of 0b1101 as a register specifier on page A4-41

A4.1 Instruction set encoding

Thumb instructions are either 16-bit or 32-bit. Bits<15:11> of the halfword that the PC points to determine whether it is a 16-bit instruction, or whether the following halfword is the second part of a 32-bit instruction.

Table A4-1 shows how the instruction set space is divided between 16-bit and 32-bit instructions. An x in the encoding indicates any bit, except that any combination of bits already defined is excluded.

Table A4-1 Determination of instruction length

hw1<15:11> | Function
0b11100 | Thumb 16-bit unconditional branch instruction, defined in all Thumb architectures.
0b111xx | Thumb 32-bit instructions, defined in Thumb-2, see Instruction encoding for 32-bit Thumb instructions on page A4-12.
0bxxxxx | Thumb 16-bit instructions.

A4.2 Instruction encoding for 16-bit Thumb instructions

Figure A4-1 shows the main divisions of the Thumb 16-bit instruction set space. An entry in square brackets, for example [1], indicates a note below the table.
Instruction class | Encoding, bits<15:0>, most significant bit first
Shift by immediate, move register | 0 0 0, opcode [1] <12:11>, imm5 <10:6>, Rm <5:3>, Rd <2:0>
Add/subtract register | 0 0 0 1 1 0, opc <9>, Rm <8:6>, Rn <5:3>, Rd <2:0>
Add/subtract immediate | 0 0 0 1 1 1, opc <9>, imm3 <8:6>, Rn <5:3>, Rd <2:0>
Add/subtract/compare/move immediate | 0 0 1, opcode <12:11>, Rdn <10:8>, imm8 <7:0>
Data-processing register | 0 1 0 0 0 0, opcode <9:6>, Rm <5:3>, Rdn <2:0>
Special data processing | 0 1 0 0 0 1, opcode [1] <9:8>, DN <7>, Rm <6:3>, Rdn <2:0>
Branch/exchange instruction set | 0 1 0 0 0 1 1 1, L <7>, Rm <6:3>, (0) (0) (0) <2:0>
Load from literal pool | 0 1 0 0 1, Rd <10:8>, PC-relative imm8 <7:0>
Load/store register offset | 0 1 0 1, opcode <11:9>, Rm <8:6>, Rn <5:3>, Rd <2:0>
Load/store word/byte immediate offset | 0 1 1, B <12>, L <11>, imm5 <10:6>, Rn <5:3>, Rd <2:0>
Load/store halfword immediate offset | 1 0 0 0, L <11>, imm5 <10:6>, Rn <5:3>, Rd <2:0>
Load from or store to stack | 1 0 0 1, L <11>, Rd <10:8>, SP-relative imm8 <7:0>
Add to SP or PC | 1 0 1 0, SP <11>, Rd <10:8>, imm8 <7:0>
Miscellaneous [3] | 1 0 1 1, x <11:0>
Load/store multiple | 1 1 0 0, L <11>, Rn <10:8>, register list <7:0>
Conditional branch | 1 1 0 1, cond [2] <11:8>, imm8 <7:0>
Undefined instruction | 1 1 0 1 1 1 1 0, x <7:0>
Service (system) call | 1 1 0 1 1 1 1 1, imm8 <7:0>
Unconditional branch | 1 1 1 0 0, imm11 <10:0>
32-bit instruction | 1 1 1 0 1, x <10:0>
32-bit instruction | 1 1 1 1, x <11:0>

Figure A4-1 Thumb instruction set overview

1. opcode != 0b11.
2. cond != 0b111x.
3. See Miscellaneous instructions on page A4-9.
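As a rough illustration of how Table A4-1 and Figure A4-1 are applied, the following Python sketch classifies the first halfword of an instruction. The names `FIGURE_A4_1`, `classify_halfword`, and `is_32bit` are hypothetical, and the mask/value pairs are simply the fixed bits of each figure row written as hexadecimal constants; this is a sketch under those assumptions, not a reference decoder.

```python
# Each row is (mask, value, class name). Rows with more fixed bits are listed
# first so that, for example, BX/BLX is matched before "Special data processing"
# (whose note [1] excludes opcode == 0b11).
FIGURE_A4_1 = [
    (0xFF00, 0xDE00, "Undefined instruction"),
    (0xFF00, 0xDF00, "Service (system) call"),
    (0xFF00, 0x4700, "Branch/exchange instruction set"),
    (0xFC00, 0x4000, "Data-processing register"),
    (0xFC00, 0x4400, "Special data processing"),
    (0xFC00, 0x1800, "Add/subtract register"),
    (0xFC00, 0x1C00, "Add/subtract immediate"),
    (0xF800, 0x4800, "Load from literal pool"),
    (0xF800, 0xE000, "Unconditional branch"),
    (0xF800, 0xE800, "32-bit instruction"),
    (0xF000, 0xF000, "32-bit instruction"),
    (0xF000, 0x5000, "Load/store register offset"),
    (0xF000, 0x8000, "Load/store halfword immediate offset"),
    (0xF000, 0x9000, "Load from or store to stack"),
    (0xF000, 0xA000, "Add to SP or PC"),
    (0xF000, 0xB000, "Miscellaneous"),
    (0xF000, 0xC000, "Load/store multiple"),
    (0xF000, 0xD000, "Conditional branch"),
    (0xE000, 0x0000, "Shift by immediate, move register"),
    (0xE000, 0x2000, "Add/subtract/compare/move immediate"),
    (0xE000, 0x6000, "Load/store word/byte immediate offset"),
]


def classify_halfword(hw1):
    """Return the Figure A4-1 class name for a 16-bit halfword."""
    for mask, value, name in FIGURE_A4_1:
        if (hw1 & mask) == value:
            return name
    raise ValueError("halfword out of range: %r" % (hw1,))


def is_32bit(hw1):
    """Table A4-1: hw1<15:11> of 0b111xx, other than 0b11100, starts a 32-bit instruction."""
    return ((hw1 >> 11) & 0x1F) in (0b11101, 0b11110, 0b11111)
```

A decoder would typically call `is_32bit` first to decide whether a second halfword must be fetched, and only then dispatch on the class name.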
For further information about these instructions, see:

• Table A4-2 for shift (by immediate) and move (register) instructions
• Table A4-3 for add and subtract (register) instructions
• Table A4-4 on page A4-5 for add and subtract (3-bit immediate) instructions
• Table A4-5 on page A4-5 for add, subtract, compare and move (8-bit immediate) instructions
• Table A4-6 on page A4-6 for data processing (register) instructions
• Table A4-7 on page A4-6 for special data processing instructions
• Table A4-8 on page A4-7 for branch and exchange instruction set instructions
• LDR (literal) on page A5-105 for load from literal pool instructions
• Table A4-9 on page A4-7 for load and store (register offset) instructions
• Table A4-10 on page A4-7 for load and store, word or byte (immediate offset) instructions
• Table A4-11 on page A4-7 for load and store, halfword (immediate offset) instructions
• Table A4-12 on page A4-8 for load from or store to stack instructions
• Table A4-13 on page A4-8 for add 8-bit immediate to SP or PC instructions
• Miscellaneous instructions on page A4-9 for miscellaneous instructions
• Table A4-14 on page A4-8 for load and store multiple instructions
• B on page A5-40 for conditional branch instructions
• SVC (formerly SWI) on page A5-285 for service (system) call instructions
• B on page A5-40 for unconditional branch instructions.

Table A4-2 Shift by immediate and move (register) instructions

Function | Instruction | opcode | imm5
Move register (not in IT block) | MOV (register) on page A5-169 | 0b00 | 0b00000
Logical shift left | LSL (immediate) on page A5-151 | 0b00 | != 0b00000
Logical shift right | LSR (immediate) on page A5-155 | 0b01 | any
Arithmetic shift right | ASR (immediate) on page A5-36 | 0b10 | any

Table A4-3 Add and subtract (register) instructions

Function | Instruction | opc
Add register | ADD (register) on page A5-24 | 0b0
Subtract register | SUB (register) on page A5-279 | 0b1
Table A4-4 Add and subtract (3-bit immediate) instructions

Function | Instruction | opc
Add immediate | ADD (immediate) on page A5-21 | 0b0
Subtract immediate | SUB (immediate) on page A5-276 | 0b1

Table A4-5 Add, subtract, compare, and move (8-bit immediate) instructions

Function | Instruction | opcode
Move immediate | MOV (immediate) on page A5-167 | 0b00
Compare immediate | CMP (immediate) on page A5-73 | 0b01
Add immediate | ADD (immediate) on page A5-21 | 0b10
Subtract immediate | SUB (immediate) on page A5-276 | 0b11

Table A4-6 Data processing (register) instructions

Function | Instruction | opcode
Bitwise AND | AND (register) on page A5-34 | 0b0000
Bitwise Exclusive OR | EOR (register) on page A5-88 | 0b0001
Logical Shift Left | LSL (register) on page A5-153 | 0b0010
Logical Shift Right | LSR (register) on page A5-157 | 0b0011
Arithmetic Shift Right | ASR (register) on page A5-38 | 0b0100
Add with Carry | ADC (register) on page A5-19 | 0b0101
Subtract with Carry | SBC (register) on page A5-230 | 0b0110
Rotate Right | ROR (register) on page A5-220 | 0b0111
Test | TST (register) on page A5-301 | 0b1000
Reverse subtract (from zero) | RSB (immediate) on page A5-224 | 0b1001
Compare | CMP (register) on page A5-75 | 0b1010
Compare Negative | CMN (register) on page A5-71 | 0b1011
Logical OR | ORR (register) on page A5-196 | 0b1100
Multiply | MUL on page A5-180 | 0b1101
Bit Clear | BIC (register) on page A5-49 | 0b1110
Move Negative | MVN (register) on page A5-184 | 0b1111

Table A4-7 Special data processing instructions

Function | Instruction | opcode
Add (register, including high registers) | ADD (register) on page A5-24 | 0b00
Compare (register, including high registers) | CMP (register) on page A5-75 | 0b01
Move (register, including high registers) | MOV (register) on page A5-169 | 0b10
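The opcode column of Table A4-6 lends itself to a direct table lookup. The Python sketch below shows one way this might be done; `DP_REGISTER_MNEMONICS` and `decode_dp_register` are hypothetical names, and the field positions (opcode in bits<9:6>, Rm in bits<5:3>, Rdn in bits<2:0>, with bits<15:10> fixed at 0b010000) are assumed from the Data-processing register row of Figure A4-1.

```python
# Mnemonics in Table A4-6 order, indexed by the 4-bit opcode field.
DP_REGISTER_MNEMONICS = [
    "AND", "EOR", "LSL", "LSR", "ASR", "ADC", "SBC", "ROR",
    "TST", "RSB", "CMP", "CMN", "ORR", "MUL", "BIC", "MVN",
]


def decode_dp_register(hw):
    """Decode a 16-bit data-processing (register) instruction.

    Returns (mnemonic, Rm, Rdn). Raises ValueError if bits<15:10> do not
    match the 0b010000 prefix assumed for this instruction class.
    """
    if (hw >> 10) != 0b010000:
        raise ValueError("not a 16-bit data-processing (register) encoding")
    opcode = (hw >> 6) & 0xF   # bits<9:6>
    rm = (hw >> 3) & 0x7       # bits<5:3>
    rdn = hw & 0x7             # bits<2:0>
    return DP_REGISTER_MNEMONICS[opcode], rm, rdn
```

For instance, under these assumptions the halfword 0x4348 has opcode 0b1101 and so decodes as a multiply with Rm = 1 and Rdn = 0.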
Table A4-8 Branch and exchange instruction set instructions

Function | Instruction | L
Branch and Exchange | BX on page A5-57 | 0b0
Branch with Link and Exchange | BLX (register) on page A5-55 | 0b1

Table A4-9 Load and store (register offset) instructions

Function | Instruction | opcode
Store word | STR (register) on page A5-252 | 0b000
Store halfword | STRH (register) on page A5-270 | 0b001
Store byte | STRB (register) on page A5-256 | 0b010
Load signed byte | LDRSB (register) on page A5-137 | 0b011
Load word | LDR (register) on page A5-107 | 0b100
Load unsigned halfword | LDRH (register) on page A5-129 | 0b101
Load unsigned byte | LDRB (register) on page A5-113 | 0b110
Load signed halfword | LDRSH (register) on page A5-145 | 0b111

Table A4-10 Load and store, word or byte (5-bit immediate offset) instructions

Function | Instruction | B | L
Store word | STR (immediate) on page A5-250 | 0b0 | 0b0
Load word | LDR (immediate) on page A5-102 | 0b0 | 0b1
Store byte | STRB (immediate) on page A5-254 | 0b1 | 0b0
Load byte | LDRB (immediate) on page A5-109 | 0b1 | 0b1

Table A4-11 Load and store halfword (5-bit immediate offset) instructions

Function | Instruction | L
Store halfword | STRH (immediate) on page A5-268 | 0b0
Load halfword | LDRH (immediate) on page A5-125 | 0b1

Table A4-12 Load from stack and store to stack instructions

Function | Instruction | L
Store to stack | STR (immediate) on page A5-250 | 0b0
Load from stack | LDR (immediate) on page A5-102 | 0b1

Table A4-13 Add 8-bit immediate to SP or PC instructions

Function | Instruction | SP
Add (PC plus immediate) | ADR on page A5-30 | 0b0
Add (SP plus immediate) | ADD (SP plus immediate) on page A5-26 | 0b1

Table A4-14 Load and store multiple instructions

Function | Instruction | L
Store multiple | STMIA / STMEA on page A5-248 | 0b0
Load multiple | LDMIA / LDMFD on page A5-99 | 0b1
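As an informal illustration of Table A4-9, the Python sketch below maps the 3-bit opcode of a load/store (register offset) instruction to its mnemonic and extracts the register fields. The name `decode_ldst_register` is hypothetical, and the field positions (bits<15:12> fixed at 0b0101, opcode in bits<11:9>, Rm in bits<8:6>, Rn in bits<5:3>, and the transfer register in bits<2:0>) are assumed from the Load/store register offset row of Figure A4-1.

```python
# Mnemonics in Table A4-9 order, indexed by the 3-bit opcode field.
LDST_REGISTER_MNEMONICS = [
    "STR", "STRH", "STRB", "LDRSB", "LDR", "LDRH", "LDRB", "LDRSH",
]


def decode_ldst_register(hw):
    """Decode a 16-bit load/store (register offset) instruction.

    Returns (mnemonic, Rt, Rn, Rm), where the access address is Rn + Rm.
    """
    if (hw >> 12) != 0b0101:
        raise ValueError("not a 16-bit load/store (register offset) encoding")
    opcode = (hw >> 9) & 0x7   # bits<11:9>
    rm = (hw >> 6) & 0x7       # bits<8:6>
    rn = (hw >> 3) & 0x7       # bits<5:3>
    rt = hw & 0x7              # bits<2:0>
    return LDST_REGISTER_MNEMONICS[opcode], rt, rn, rm
```

Note that, unlike Tables A4-10 to A4-12, this class has no separate L bit: the load or store direction is implied by the opcode value itself.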
Beta ARM DDI 0405A-01 The Thumb Instruction Set A4.2.1 Miscellaneous instructions Figure A4-2 lists miscellaneous Thumb instructions. An entry in square brackets, for example [1], indicates a note below the figure. 15 14 13 12 11 10 9 8 7 Adjust stack pointer 1 0 1 1 0 0 0 0 opc 6 5 4 3 2 1 0 Sign/zero extend 1 0 1 1 0 0 1 0 Compare and Branch on (Non-)Zero 1 0 1 1 N 0 i 1 Push/pop register list 1 0 1 1 L 1 0 R UNPREDICTABLE 1 0 1 1 0 1 1 0 0 1 0 0 x x x x Change Processor State 1 0 1 1 0 1 1 0 0 1 1 im 0 (0) I F UNPREDICTABLE 1 0 1 1 0 1 1 0 0 1 1 x 1 x x x Reverse bytes 1 0 1 1 1 0 1 0 Software breakpoint 1 0 1 1 1 1 1 0 If-Then instructions 1 0 1 1 1 1 1 1 cond NOP-compatible hints 1 0 1 1 1 1 1 1 hint imm7 opc Rm Rd imm5 Rn register list opc Rn Rd imm8 mask (!= 0b0000) 0 0 0 0 Figure A4-2 Miscellaneous Thumb instructions Note Any instruction with bits<15:12> = 1011, that is not shown in Figure A4-2, is an UNDEFINED instruction. For further information about these instructions, see: • Table A4-15 on page A4-10 for adjust stack pointer instructions • Table A4-16 on page A4-10 for sign or zero extend instructions • Table A4-17 on page A4-10 for compare (non-)zero and branch instructions • Table A4-18 on page A4-10 for push and pop instructions • CPS on page A5-77 for the change processor state instruction • Table A4-19 on page A4-11 for reverse bytes instructions • BKPT on page A5-51 for the software breakpoint instruction • IT on page A5-92 for the If-Then instruction • Table A4-20 on page A4-11 for NOP-compatible hint instructions. ARM DDI 0405A-01 Copyright © 2006 ARM Limited. All rights reserved. 
Table A4-15 Adjust stack pointer instructions

Function | Instruction | opc
Increment stack pointer | ADD (SP plus immediate) on page A5-26 | 0b0
Decrement stack pointer | SUB (SP minus immediate) on page A5-281 | 0b1

Table A4-16 Sign or zero extend instructions

Function | Instruction | opc
Signed Extend Halfword | SXTH on page A5-289 | 0b00
Signed Extend Byte | SXTB on page A5-287 | 0b01
Unsigned Extend Halfword | UXTH on page A5-315 | 0b10
Unsigned Extend Byte | UXTB on page A5-313 | 0b11

Table A4-17 Compare and branch on (non-)zero instructions

Function | Instruction | N
Compare and branch on zero | CBZ on page A5-61 | 0b0
Compare and branch on non-zero | CBNZ on page A5-59 | 0b1

Table A4-18 Push and pop instructions

Function | Instruction | L
Push registers | PUSH on page A5-208 | 0b0
Pop registers | POP on page A5-206 | 0b1

Table A4-19 Reverse byte instructions

Function | Instruction | opc
Byte-Reverse Word | REV on page A5-212 | 0b00
Byte-Reverse Packed Halfword | REV16 on page A5-214 | 0b01
UNDEFINED | - | 0b10
Byte-Reverse Signed Halfword | REVSH on page A5-216 | 0b11

Table A4-20 NOP-compatible hint instructions

Function | Instruction | hint
No operation | NOP on page A5-188 | 0b0000
Yield | YIELD on page A5-321 | 0b0001
Wait For Event | WFE on page A5-317 | 0b0010
Wait For Interrupt | WFI on page A5-319 | 0b0011
Send event | SEV on page A5-236 | 0b0100

A4.3 Instruction encoding for 32-bit Thumb instructions

Figure A4-3 shows the main divisions of the Thumb 32-bit instruction set space. The following sections give further details of each instruction type shown in Figure A4-3.
hw1 15 14 13 12 11 10 9 8 hw2 7 6 Data processing: immediate, 1 1 1 1 0 including bitfield, and saturate Data processing, no immediate operand 1 1 1 5 4 3 2 1 0 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0 0 1 0 1 Load and store single data item, memory hints 1 1 1 1 1 0 0 Load and Store, Double and Exclusive, and Table Branch 1 1 1 0 1 0 0 1 Load and Store Multiple, RFE and SRS 1 1 1 0 1 0 0 0 Branches, 1 1 1 1 0 miscellaneous control Coprocessor 1 1 1 1 1 1 1 1 Figure A4-3 Thumb 32-bit instruction set summary This section contains the following subsections: • Data processing instructions: immediate, including bitfield and saturate on page A4-13 • Data processing instructions, non-immediate on page A4-18 • Load and store single data item, and memory hints on page A4-25 • Load/store double and exclusive, and table branch on page A4-27 • Load and store multiple on page A4-29 • Branches, miscellaneous control instructions on page A4-30 • Coprocessor instructions on page A4-32. A4-12 Copyright © 2006 ARM Limited. All rights reserved. Beta ARM DDI 0405A-01 The Thumb Instruction Set A4.3.1 Data processing instructions: immediate, including bitfield and saturate Figure A4-4 shows the encodings for: • data processing instructions with an immediate operand • data processing instructions with bitfield or saturating operations. 
hw1 15 14 13 12 11 10 9 8 hw2 7 6 5 4 3 2 1 General format 1 1 1 1 0 Data processing, modified 12-bit immediate 1 1 1 1 0 Add, Subtract, plain 12-bit immediate 1 1 1 1 0 Move, plain 16-bit immediate 1 1 1 1 0 9 8 7 6 5 4 3 2 1 0 0 S Rn 0 imm3 Rd imm8 i 1 0 O 0 OP2 P Rn 0 imm3 Rd imm8 imm4 0 imm3 Rd imm8 Rn 0 imm3 Rd i 0 i OP 1 0 O 1 OP2 P Bit field operations, Saturation with shift 1 1 1 1 0 (0) 1 1 Reserved 1 1 1 1 0 0 15 14 13 12 11 10 1 1 OP 0 1 imm2 (0) imm5 0 Figure A4-4 Data processing instructions: immediate, bitfield, and saturating This section contains the following subsections: • Data processing instructions with modified 12-bit immediate on page A4-14 • Data processing instructions with plain 12-bit immediate on page A4-16 • Data processing instructions with plain 16-bit immediate on page A4-16 • Data processing instructions, bitfield and saturate on page A4-17. ARM DDI 0405A-01 Copyright © 2006 ARM Limited. All rights reserved. Beta A4-13 The Thumb Instruction Set Data processing instructions with modified 12-bit immediate Table A4-21 gives the opcodes and locations of further information about the data processing instructions with modified 12-bit immediate data. For information about modified 12-bit immediate data, see Immediate constants on page A5-8. In these instructions, if the S bit is set, the instruction updates the condition code flags according to the results of the instruction, see Conditional execution on page A4-33. 
Table A4-21 Data processing instructions with modified 12-bit immediate

Function | Instruction | OP | Notes
Add with carry | ADC (immediate) on page A5-17 | 0b1010 |
Add | ADD (immediate) on page A5-21 | 0b1000 |
Logical AND | AND (immediate) on page A5-32 | 0b0000 |
Bit clear | BIC (immediate) on page A5-47 | 0b0001 |
Compare negative | CMN (immediate) on page A5-69 | 0b1000 | ADD with Rd == 0b1111, S == 1
Compare | CMP (immediate) on page A5-73 | 0b1101 | SUB with Rd == 0b1111, S == 1
Exclusive OR | EOR (immediate) on page A5-86 | 0b0100 |
Move | MOV (immediate) on page A5-167 | 0b0010 | ORR with Rn == 0b1111
Move negative | MVN (immediate) on page A5-182 | 0b0011 | ORN with Rn == 0b1111
Logical OR NOT | ORN (immediate) on page A5-190 | 0b0011 |
Logical OR | ORR (immediate) on page A5-194 | 0b0010 |
Reverse subtract | RSB (immediate) on page A5-224 | 0b1110 |
Subtract with carry | SBC (immediate) on page A5-228 | 0b1011 |
Subtract | SUB (immediate) on page A5-276 | 0b1101 |
Test equal | TEQ (immediate) on page A5-295 | 0b0100 | EOR with Rd == 0b1111, S == 1
Test | TST (immediate) on page A5-299 | 0b0000 | AND with Rd == 0b1111, S == 1

Instructions of this format using any other combination of the OP bits are UNDEFINED.

Data processing instructions with plain 12-bit immediate

Table A4-22 gives the opcodes and locations of further information about the data processing instructions with plain 12-bit immediate data. In these instructions, the immediate value is in i:imm3:imm8.
Table A4-22 Data processing instructions with plain 12-bit immediate

Function | Instruction | OP | OP2
Add wide | ADD (immediate) on page A5-21, encoding T4 | 0 | 0b00
Subtract wide | SUB (immediate) on page A5-276, encoding T4 | 1 | 0b10
Address (before current instruction) | ADR on page A5-30, encoding T2 | 0 | 0b10
Address (after current instruction) | ADR on page A5-30, encoding T3 | 1 | 0b00

Instructions of this format using any other combination of the OP and OP2 bits are UNDEFINED.

Data processing instructions with plain 16-bit immediate

Table A4-23 gives the opcodes and locations of further information about the data processing instructions with plain 16-bit immediate data. In these instructions, the immediate value is in imm4:i:imm3:imm8.

Table A4-23 Data processing instructions with plain 16-bit immediate

Function | Instruction | OP | OP2
Move top | MOVT on page A5-172 | 1 | 0b00
Move wide | MOV (immediate) on page A5-167, encoding T3 | 0 | 0b00

Instructions of this format using any other combination of the OP and OP2 bits are UNDEFINED.

Data processing instructions, bitfield and saturate

Table A4-24 gives the opcodes and locations of further information about saturation, bitfield extract, clear, and insert instructions.

Table A4-24 Miscellaneous data processing instructions

Function | Instruction | OP | Notes
Bit Field Clear | BFC on page A5-43 | 0b011 | Rn == 0b1111, meaning #0
Bit Field Insert | BFI on page A5-45 | 0b011 |
Signed Bit Field extract | SBFX on page A5-232 | 0b010 |
Signed saturate, LSL | SSAT on page A5-242 | 0b000 |
Signed saturate, ASR | SSAT on page A5-242 | 0b001 |
Unsigned Bit Field extract | UBFX on page A5-303 | 0b110 |
Unsigned saturate, LSL | USAT on page A5-311 | 0b100 |
Unsigned saturate, ASR | USAT on page A5-311 | 0b101 |

Instructions of this format using any other combination of the OP bits are UNDEFINED.
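The plain immediates described above are concatenations of fields scattered across the two halfwords: i:imm3:imm8 for the 12-bit forms and imm4:i:imm3:imm8 for the 16-bit forms. The Python sketch below shows how such a value might be reassembled. The function names are hypothetical, and the field positions (i in hw1<10>, imm4 in hw1<3:0>, imm3 in hw2<14:12>, imm8 in hw2<7:0>) are assumptions taken from the layout in Figure A4-4 rather than anything stated in the tables.

```python
def plain_imm12(hw1, hw2):
    """Assemble i:imm3:imm8 from the two halfwords of a 32-bit instruction."""
    i = (hw1 >> 10) & 0x1      # assumed position: hw1<10>
    imm3 = (hw2 >> 12) & 0x7   # assumed position: hw2<14:12>
    imm8 = hw2 & 0xFF          # assumed position: hw2<7:0>
    return (i << 11) | (imm3 << 8) | imm8


def plain_imm16(hw1, hw2):
    """Assemble imm4:i:imm3:imm8; imm4 is assumed to occupy hw1<3:0>."""
    imm4 = hw1 & 0xF
    return (imm4 << 12) | plain_imm12(hw1, hw2)


# Illustrative halfword pairs, chosen only to exercise every field.
print(hex(plain_imm12(0xF601, 0x70FF)))
print(hex(plain_imm16(0xF241, 0x2034)))
```

Keeping the 12-bit assembly as a helper mirrors the way the 16-bit form simply prefixes imm4 to the same three fields.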
Beta A4-17 The Thumb Instruction Set A4.3.2 Data processing instructions, non-immediate Figure A4-5 shows the encodings for data processing instructions without an immediate operand. In these instructions, if the S bit is set, the instruction updates the condition code flags according to the results of the instruction, see Conditional execution on page A4-33. hw1 15 14 13 12 11 10 General format 1 1 1 9 8 hw2 7 6 5 4 3 2 1 0 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0 1 0 1 Data processing: 1 1 1 0 1 0 1 constant shift OP Register-controlled shift 1 1 1 1 1 0 1 0 0 OP S Rn S Rn Rd imm2 type Rm 1 1 1 1 Rd 0 Rm (0) imm3 OP2 Sign or zero extension, 1 1 1 1 1 0 1 0 0 with optional addition OP Rn 1 1 1 1 Rd 1 (0) rot Rm SIMD 1 1 1 1 1 0 1 0 1 add or subtract OP Rn 1 1 1 1 Rd 0 prefix Rm Other three register 1 1 1 1 1 0 1 0 1 data processing OP Rn 1 1 1 1 Rd 1 OP2 Rm Not 1111 Reserved 1 1 1 1 1 0 1 0 32-bit multiplies and Sum of absolute 1 1 1 1 1 0 1 1 0 differences, with or without accumulate OP Rn Racc Rd OP2 Rm 64-bit multiplies and multiply-accumulates. Divides. 1 1 1 1 1 0 1 1 1 OP Rn RdLo RdHi OP2 Rm Figure A4-5 Data processing instructions, non-immediate This section contains the following subsections: • Data processing instructions with constant shift on page A4-19 • Register-controlled shift instructions on page A4-20 • Signed and unsigned extend instructions with optional addition on page A4-21 • Other three-register data processing instructions on page A4-22 • 32-bit multiplies, with or without accumulate on page A4-23 • 64-bit multiply, multiply-accumulate, and divide instructions on page A4-24. A4-18 Copyright © 2006 ARM Limited. All rights reserved. Beta ARM DDI 0405A-01 The Thumb Instruction Set Data processing instructions with constant shift Table A4-25 gives the opcodes and locations of further information about the data processing instructions with a constant shift applied to the second operand register. 
For information about constant shifts, see Constant shifts applied to a register on page A5-10.

In these instructions, if the S bit is set, the instruction updates the condition code flags according to the results of the instruction, see Conditional execution on page A4-33. The shift type is encoded in hw2<5:4>. The shift value is encoded in hw2<14:12,7:6>.

Table A4-25 Data processing instructions with constant shift

Function | Instruction | OP | Notes
Add with carry | ADC (register) on page A5-19 | 0b1010 |
Add | ADD (register) on page A5-24 | 0b1000 |
Logical AND | AND (register) on page A5-34 | 0b0000 |
Bit clear | BIC (register) on page A5-49 | 0b0001 |
Compare negative | CMN (register) on page A5-71 | 0b1000 | ADD with Rd == 0b1111, S == 1
Compare | CMP (register) on page A5-75 | 0b1101 | SUB with Rd == 0b1111, S == 1
Exclusive OR | EOR (register) on page A5-88 | 0b0100 |
Move, and immediate shift | Move, and immediate shift instructions on page A4-20 | 0b0010 | ORR with Rn == 0b1111
Move negative | MVN (register) on page A5-184 | 0b0011 | ORN with Rn == 0b1111
Logical OR NOT | ORN (register) on page A5-192 | 0b0011 |
Logical OR | ORR (register) on page A5-196 | 0b0010 |
Reverse subtract | RSB (register) on page A5-226 | 0b1110 |
Subtract with carry | SBC (register) on page A5-230 | 0b1011 |
Subtract | SUB (register) on page A5-279 | 0b1101 |
Test equal | TEQ (register) on page A5-297 | 0b0100 | EOR with Rd == 0b1111, S == 1
Test | TST (register) on page A5-301 | 0b0000 | AND with Rd == 0b1111, S == 1

Instructions of this format using any other combination of the OP bits are UNDEFINED. Instructions of this format with OP == 0b0110 are UNDEFINED if S == 1 or shift_type == 0b01 or shift_type == 0b11.

Move, and immediate shift instructions

Table A4-26 gives the locations of further information about the move, and immediate shift instructions.
In these instructions, if the S bit is set, the instruction updates the condition code flags according to the results of the instruction, see Conditional execution on page A4-33.

Table A4-26 Move, and immediate shift instructions

Function | Instruction | type | imm5
Move | MOV (register) on page A5-169 | 0b00 | 0b00000
Logical Shift Left | LSL (immediate) on page A5-151 | 0b00 | not 0b00000
Logical Shift Right | LSR (immediate) on page A5-155 | 0b01 | any
Arithmetic Shift Right | ASR (immediate) on page A5-36 | 0b10 | any
Rotate Right | ROR (immediate) on page A5-218 | 0b11 | not 0b00000
Rotate Right with Extend | RRX on page A5-222 | 0b11 | 0b00000

Register-controlled shift instructions

Table A4-27 gives the opcodes and locations of further information about the register-controlled shift instructions. In these instructions, if the S bit is set, the instruction updates the condition code flags according to the results of the instruction, see Conditional execution on page A4-33.

Table A4-27 Register-controlled shift instructions

Function | Instruction | type | OP
Logical Shift Left | LSL (register) on page A5-153 | 0b00 | 0b000
Logical Shift Right | LSR (register) on page A5-157 | 0b01 | 0b000
Arithmetic Shift Right | ASR (register) on page A5-38 | 0b10 | 0b000
Rotate Right | ROR (register) on page A5-220 | 0b11 | 0b000

Instructions of this format using any other combination of the OP and OP2 bits are UNDEFINED.

Signed and unsigned extend instructions with optional addition

Table A4-28 gives the opcodes and locations of further information about the signed and unsigned (zero) extend instructions with optional addition.
Table A4-28 Signed and unsigned extend instructions with optional addition

Function | Instruction | OP | Rn
Signed extend byte | SXTB on page A5-287 | 0b100 | 0b1111
Signed extend halfword | SXTH on page A5-289 | 0b000 | 0b1111
Unsigned extend byte | UXTB on page A5-313 | 0b101 | 0b1111
Unsigned extend halfword | UXTH on page A5-315 | 0b001 | 0b1111

Instructions of this format using any other combination of the OP bits are UNDEFINED.

Other three-register data processing instructions

Table A4-29 gives the opcodes and locations of further information about other three-register data processing instructions.

Table A4-29 Other three-register data processing instructions

Function | Instruction | OP | OP2
Count Leading Zeros | CLZ on page A5-67 | 0b011 | 0b000
Reverse Bits | RBIT on page A5-210 | 0b001 | 0b010
Byte-Reverse Word | REV on page A5-212 | 0b001 | 0b000
Byte-Reverse Packed Halfword | REV16 on page A5-214 | 0b001 | 0b001
Byte-Reverse Signed Halfword | REVSH on page A5-216 | 0b001 | 0b011

Instructions of this format using any other combination of the OP and OP2 bits are UNDEFINED.

32-bit multiplies, with or without accumulate

Table A4-30 gives the opcodes and locations of further information about multiply and multiply-accumulate instructions with 32-bit results, and absolute difference and accumulate absolute difference instructions.

Table A4-30 Other two-register data processing instructions

Function | Instruction | OP | OP2 | Ra
32 + 32 x 32-bit, least significant word | MLA on page A5-163 | 0b000 | 0b0000 | not R15
32 – 32 x 32-bit, least significant word | MLS on page A5-165 | 0b000 | 0b0001 | not R15
32 x 32-bit, least significant word | MUL on page A5-180 | 0b000 | 0b0000 | 0b1111

Instructions of this format using any other combination of the OP and OP2 bits are UNDEFINED.
An instruction that matches the OP and OP2 fields, but not the Ra column, is UNPREDICTABLE under the usage rules for R15.

64-bit multiply, multiply-accumulate, and divide instructions

Table A4-31 gives the opcodes and locations of further information about multiply and multiply accumulate instructions with 64-bit results, and divide instructions.

Table A4-31 Other two-register data processing instructions

Function | Instruction | OP | OP2
Signed 32 x 32 | SMULL on page A5-240 | 0b000 | 0b0000
Signed divide | SDIV on page A5-234 | 0b001 | 0b1111
Unsigned 32 x 32 | UMULL on page A5-309 | 0b010 | 0b0000
Unsigned divide | UDIV on page A5-305 | 0b011 | 0b1111
Signed 64 + 32 x 32 | SMLAL on page A5-238 | 0b100 | 0b0000
Unsigned 64 + 32 x 32 | UMLAL on page A5-307 | 0b110 | 0b0000

Instructions of this format using any other combination of the OP and OP2 bits are UNDEFINED.

Divide by Zero

ARMv7-M supports signed and unsigned integer divide instructions SDIV and UDIV. The divide instructions can have divide-by-zero trapping enabled. Trapping is controlled by the DIV_0_TRP bit (see The System Control Block (SCB) on page B2-8).

• If DIV_0_TRP is clear, a division by zero produces a result of zero.
• If DIV_0_TRP is set, a division by zero causes a UsageFault exception to occur on the SDIV or UDIV instruction.

A4.3.3 Load and store single data item, and memory hints

Figure A4-6 shows the encodings for loads and stores with single data items.
Addressing form                 hw1 (bits 15 to 0)                       hw2 (bits 15 to 0)
General format                  1 1 1 1 1 0 0 ...
Rn + imm12                      1 1 1 1 1 0 0 S 1 size L Rn              Rt imm12
Rn - imm8                       1 1 1 1 1 0 0 S 0 size L Rn              Rt 1 1 0 0 imm8
Rn + imm8, User privilege       1 1 1 1 1 0 0 S 0 size L Rn              Rt 1 1 1 0 imm8
Rn post-indexed by +/- imm8     1 1 1 1 1 0 0 S 0 size L Rn              Rt 1 0 U 1 imm8
Rn pre-indexed by +/- imm8      1 1 1 1 1 0 0 S 0 size L Rn              Rt 1 1 U 1 imm8
Rn + shifted register           1 1 1 1 1 0 0 S 0 size L Rn              Rt 0 0 0 0 0 0 shift Rm
PC +/- imm12                    1 1 1 1 1 0 0 S U size 1 1 1 1 1         Rt imm12
RESERVED (three encodings)      1 1 1 1 1 0 0 ...

Figure A4-6 Load and store instructions, single data item

In these instructions:
L specifies whether the instruction is a load (L == 1) or a store (L == 0)
S specifies whether a load is sign extended (S == 1) or zero extended (S == 0)
U specifies whether the indexing is upwards (U == 1) or downwards (U == 0)
Rn cannot be r15 (if it is, the instruction is PC +/- imm12)
Rm cannot be r13 or r15 (if it is, the instruction is UNPREDICTABLE).

Table A4-32 gives the encoding and locations of further information about load and store single data item instructions, and memory hints.

Table A4-32 Load and store single data item, and memory hints

Instruction                                                    Format       S   size       L   Rt
LDR, LDRB, LDRSB, LDRH, LDRSH (immediate offset)               2            X   0b0X       1   Not R15
                                                                            0   0b10       1   Any, including R15
LDR, LDRB, LDRSB, LDRH, LDRSH (negative immediate offset)      3            X   0b0X       1   Not R15
                                                                            0   0b10       1   Any, including R15
LDR, LDRB, LDRSB, LDRH, LDRSH (post-indexed)                   5            X   0b0X       1   Not R15
                                                                            0   0b10       1   Any, including R15
LDR, LDRB, LDRSB, LDRH, LDRSH (pre-indexed)                    6            X   0b0X       1   Not R15
                                                                            0   0b10       1   Any, including R15
LDR, LDRB, LDRSB, LDRH, LDRSH (register offset)                7            X   0b0X       1   Not R15
                                                                            0   0b10       1   Any, including R15
LDR, LDRB, LDRSB, LDRH, LDRSH (PC-relative)                    1            X   0b0X       1   Not R15
                                                                            0   0b10       1   Any, including R15
LDRT, LDRBT, LDRSBT, LDRHT, LDRSHT                             4            X   0b0X       1   Not R15
                                                                            0   0b10       1   Not R15
PLD                                                            1, 2, 3, 7   0   0b00       1   R15
PLI                                                            1, 2, 3, 7   1   0b00       1   R15
Unallocated memory hints (execute as NOP)                      1, 2, 3, 7   X   0b01       1   R15
UNPREDICTABLE                                                  4, 5, 6      X   0b0X       1   R15
STR, STRB, STRH (immediate offset) on page 4-245               2            0   Not 0b11   0   Not R15
STR, STRB, STRH (negative immediate offset) on page 4-247      3            0   Not 0b11   0   Not R15
STR, STRB, STRH (post-indexed) on page 4-249                   5            0   Not 0b11   0   Not R15
STR, STRB, STRH (pre-indexed) on page 4-251                    6            0   Not 0b11   0   Not R15
STR, STRB, STRH (register offset) on page 4-253                7            0   Not 0b11   0   Not R15
STRT, STRBT, STRHT on page 4-268                               4            0   Not 0b11   0   Not R15

Instruction encodings using any combination of Format, S, size, and L bits that is not covered by Table A4-32 on page A4-25 are UNDEFINED. An instruction that matches the Format, S, and L fields, but not the Rt column, is UNPREDICTABLE under the usage rules for R15.

A4.3.4 Load/store double and exclusive, and table branch

Figure A4-7 shows the encodings for load and store double, load and store exclusive, and table branch instructions.

Instruction class                                         hw1 (bits 15 to 0)              hw2 (bits 15 to 0)
General format                                            1 1 1 0 1 0 0 x x 1 ...
Load and Store Double (only if PW != 0b00)                1 1 1 0 1 0 0 P U 1 W L Rn      Rt Rt2 imm8
Load and Store Exclusive                                  1 1 1 0 1 0 0 0 0 1 0 L Rn      Rt Rd imm8
Load and Store Exclusive Byte, Halfword, and Table Branch 1 1 1 0 1 0 0 0 1 1 0 L Rn      Rt Rt2 OP Rm

Figure A4-7 Load and store double, load and store exclusive, and table branch

In these instructions:
L specifies whether the instruction is a load (L == 1) or a store (L == 0)
P specifies pre-indexed addressing (P == 1) or post-indexed addressing (P == 0)
U specifies whether the indexing is upwards (U == 1) or downwards (U == 0)
W specifies whether the address is written back to the base register (W == 1) or not (W == 0).

For further details about the load and store double instructions, see:
• LDRD (immediate) on page A5-117
• STRD (immediate) on page A5-260.

For further details about the load and store exclusive word instructions, see:
• LDREX on page A5-119
• STREX on page A5-262.

Table A4-33 on page A4-28 gives details of the encoding of load and store exclusive byte and halfword, and the table branch instructions.

Table A4-33 Load and store exclusive byte, halfword, and doubleword, and table branch instructions

Instruction              L   OP       Rn                  Rt        Rt2   Rm
LDREXB on page A5-121    1   0b0100   Not R15             Not R15   SBO   SBO
LDREXH on page A5-123    1   0b0101   Not R15             Not R15   SBO   SBO
STREXB on page A5-264    0   0b0100   Not R15             Not R15   SBO   Not R15
STREXH on page A5-266    0   0b0101   Not R15             Not R15   SBO   Not R15
TBB on page A5-291       1   0b0000   Any including R15   SBO       SBZ   Not R15
TBH on page A5-293       1   0b0001   Any including R15   SBO       SBZ   Not R15

Instructions of this format using any other combination of the L and OP bits are UNDEFINED. An instruction that matches the OP and L fields, but not the Rn, Rm, Rt, or Rt2 columns, is UNPREDICTABLE under the usage rules for R15.

A4.3.5 Load and store multiple

Figure A4-8 shows encodings for the load and store multiple instructions, together with the RFE and SRS instructions.
Instruction class         hw1 (bits 15 to 0)              hw2 (bits 15 to 0)
General format            1 1 1 0 1 0 0 x x 0 ...
Load and Store Multiple   1 1 1 0 1 0 0 V U 0 W L Rn      P M (0) mask
RESERVED                  1 1 1 0 1 0 0 U U 0 ...

Figure A4-8 Load and store multiple, RFE, and SRS

In these instructions:
L specifies whether the instruction is a load (L == 1) or a store (L == 0)
mask specifies which registers, in the range R0-R12, must be loaded or stored
M specifies whether R14 is to be loaded or stored
P specifies whether the PC is to be loaded (the PC cannot be stored)
U specifies whether the indexing is upwards (U == 1) or downwards (U == 0)
V is NOT U
W specifies whether the address is written back to the base register (W == 1) or not (W == 0).

For further details about these instructions, see:
• LDMDB / LDMEA on page A5-97 (Load Multiple Decrement Before / Empty Ascending)
• LDMIA / LDMFD on page A5-99 (Load Multiple Increment After / Full Descending)
• POP on page A5-206
• PUSH on page A5-208
• STMDB / STMFD on page A5-246 (Store Multiple Decrement Before / Full Descending)
• STMIA / STMEA on page A5-248 (Store Multiple Increment After / Empty Ascending).

A4.3.6 Branches, miscellaneous control instructions

Figure A4-9 shows the encodings for branches and various control instructions.
Instruction class               hw1 (bits 15 to 0)                          hw2 (bits 15 to 0)
General format                  1 1 1 1 0 ...                               1 ...
Branch                          1 1 1 1 0 S offset[21:12]                   1 0 J1 1 J2 offset[11:1]
Branch with link                1 1 1 1 0 S offset[21:12]                   1 1 J1 1 J2 offset[11:1]
Conditional branch              1 1 1 1 0 S cond offset[17:12]              1 0 J1 0 J2 offset[11:1]
Move to status from register    1 1 1 1 0 0 1 1 1 0 0 0 Rn                  1 0 (0) 0 1 0 0 0 SYSm
No operation, hints             1 1 1 1 0 0 1 1 1 0 1 0 (1)(1)(1)(1)        1 0 (0) 0 (0) 0 0 0 hint
Special control operations      1 1 1 1 0 0 1 1 1 0 1 1 (1)(1)(1)(1)        1 0 (0) 0 (1)(1)(1)(1) OP option
Move to register from status    1 1 1 1 0 0 1 1 1 1 1 0 (1)(1)(1)(1)        1 0 (0) 0 Rd SYSm
Permanently UNDEFINED           1 1 1 1 0 1 1 1 1 1 1 1 ...                 1 0 1 0 ...

Figure A4-9 Branches and miscellaneous control instructions

In these instructions:
I,F specifies which interrupt disable flags a CPS instruction must alter
I1,I2 contain bits<23:22> of the offset, exclusive ORed with the S bit
J1,J2 contain bits<19:18> of the offset
M specifies whether a CPS instruction modifies the mode (M == 1) or not (M == 0)
R specifies whether an MRS instruction accesses the SPSR (R == 1) or the CPSR (R == 0)
S contains the sign bit, duplicated to bits<31:24> of the offset, or to bits<31:20> of the offset for conditional branches.

For further details about the No operation and hint instructions, see Table A4-34 on page A4-31.

For further details about the Special control operation instructions, see Table A4-35 on page A4-31.
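The way the S, J1, J2, and offset fields of Figure A4-9 combine into a signed byte offset can be sketched in Python as follows. This is an illustrative sketch, not the architectural pseudocode: the function names are hypothetical, and the bit assembly assumes the rules given on the B and BL instruction pages, where the unconditional forms derive offset bits<23:22> as NOT(J1 EOR S) and NOT(J2 EOR S), and the conditional form places J2 in bit<19> and J1 in bit<18>. Check those pages before relying on the inversion or the J1/J2 ordering.

```python
def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as a two's complement number."""
    sign = 1 << (bits - 1)
    return (value & (sign - 1)) - (value & sign)


def decode_branch_offset(hw1, hw2):
    """Return the signed byte offset of a 32-bit Thumb B, BL, or conditional B.

    hw1 and hw2 are the two halfwords of the instruction. The caller is
    assumed to have already checked that the encoding is a branch.
    """
    s = (hw1 >> 10) & 1
    j1 = (hw2 >> 13) & 1
    j2 = (hw2 >> 11) & 1
    low = hw2 & 0x7FF                      # offset[11:1]
    if (hw2 >> 12) & 1:                    # Branch or Branch with link
        high = hw1 & 0x3FF                 # offset[21:12]
        i1 = 1 - (j1 ^ s)                  # assumed: I1 = NOT(J1 EOR S)
        i2 = 1 - (j2 ^ s)                  # assumed: I2 = NOT(J2 EOR S)
        raw = (s << 24) | (i1 << 23) | (i2 << 22) | (high << 12) | (low << 1)
        return sign_extend(raw, 25)
    high = hw1 & 0x3F                      # conditional branch: offset[17:12]
    raw = (s << 20) | (j2 << 19) | (j1 << 18) | (high << 12) | (low << 1)
    return sign_extend(raw, 21)
```

The offset is relative to the PC value read by the branch, that is, the address of the branch instruction + 4.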
For further details about the other branch and miscellaneous control instructions, see:
• B on page A5-40 (Branch)
• BL on page A5-53 (Branch with Link)
• BLX (register) on page A5-55 (Branch with Link and eXchange)
• BX on page A5-57 (Branch eXchange)
• MRS on page A5-178 (Move from Special-purpose register to ARM Register)
• MSR (register) on page A5-179 (Move from ARM register to Special-purpose Register).

Table A4-34 NOP-compatible hint instructions

Function             Hint number   For details see
No operation         0b00000000    NOP on page A5-188
Yield                0b00000001    YIELD on page A5-321
Wait for event       0b00000010    WFE on page A5-317
Wait for interrupt   0b00000011    WFI on page A5-319
Send event           0b00000100    SEV on page A5-236
Debug hint           0b1111xxxx    DBG on page A5-80

The remainder of this space is RESERVED. The instructions must execute as No Operations, and must not be used.

Table A4-35 Special control operations

Function                              OP       For details see
Clear Exclusive                       0b0010   CLREX on page A5-65
Data Synchronization Barrier          0b0100   DSB on page A5-84
Data Memory Barrier                   0b0101   DMB on page A5-82
Instruction Synchronization Barrier   0b0110   ISB on page A5-90

Instructions of this format using any other combination of the OP bits are UNDEFINED.

A4.3.7 Coprocessor instructions

Figure A4-10 shows the encodings for coprocessor instructions.
Instruction class                                  hw1 (bits 15 to 0)                   hw2 (bits 15 to 0)
General format                                     1 1 1 ...
MRRC and MCRR coprocessor register transfers       1 1 1 C 1 1 0 0 0 1 0 L Rt2          Rt coproc opcode CRm
Load and store coprocessor                         1 1 1 C 1 1 0 P U N W L Rn           CRd coproc imm8
Coprocessor data processing                        1 1 1 C 1 1 1 0 opc1 CRn             CRd coproc opc2 0 CRm
MRC and MCR coprocessor register transfers         1 1 1 C 1 1 1 0 opc1 L CRn           Rxf coproc opc2 1 CRm
RESERVED                                           1 1 1 0 1 1 ...

Figure A4-10 Coprocessor instructions

A4.4 Conditional execution

Most Thumb instructions are unconditional. Before Thumb-2, the only conditional Thumb instruction was a 16-bit conditional branch instruction, B, with a branch range of –256 to +254 bytes. Thumb-2 adds the following instructions:
• A 32-bit conditional branch, with a branch range of approximately ± 1MB, see B on page A5-40.
• A 16-bit If-Then instruction, IT. IT makes up to four following instructions conditional, see IT on page A5-92. The instructions that are made conditional by an IT instruction are called its IT block.
• A 16-bit Compare and Branch on Zero instruction, with a branch range of +4 to +130 bytes, see CBZ on page A5-61.
• A 16-bit Compare and Branch on Non-Zero instruction, with a branch range of +4 to +130 bytes, see CBNZ on page A5-59.

The condition codes that the conditional branch and IT instructions use are shown in Table A4-36 on page A4-34. The conditions of the instructions in an IT block are either all the same, or some of them are the inverse of the first condition.

A4.4.1 Assembly language syntax

Although Thumb instructions are unconditional, all instructions that are made conditional by an IT instruction must be written with a condition. These conditions must match the conditions imposed by the IT instruction.
For example, an ITTEE EQ instruction imposes the EQ condition on the first two following instructions, and the NE condition on the next two. Those four instructions must be written with EQ, EQ, NE and NE conditions respectively.

Some instructions are not allowed to be made conditional by an IT instruction, or are only allowed to be if they are the last instruction in the IT block.

The branch instruction encodings that include a condition field are not allowed to be made conditional by an IT instruction. If the assembler syntax indicates a conditional branch that correctly matches a preceding IT instruction, it must be assembled using a branch instruction encoding that does not include a condition field.

Table A4-36 Condition codes

Opcode   Mnemonic    Meaning                         Condition flag state
         extension
0000     EQ          Equal                           Z set
0001     NE          Not equal                       Z clear
0010     CS (a)      Carry set                       C set
0011     CC (b)      Carry clear                     C clear
0100     MI          Minus/negative                  N set
0101     PL          Plus/positive or zero           N clear
0110     VS          Overflow                        V set
0111     VC          No overflow                     V clear
1000     HI          Unsigned higher                 C set and Z clear
1001     LS          Unsigned lower or same          C clear or Z set
1010     GE          Signed greater than or equal    N set and V set, or N clear and V clear (N == V)
1011     LT          Signed less than                N set and V clear, or N clear and V set (N != V)
1100     GT          Signed greater than             Z clear, and either N set and V set, or N clear and V clear (Z == 0, N == V)
1101     LE          Signed less than or equal       Z set, or N set and V clear, or N clear and V set (Z == 1 or N != V)
1110     AL          Always (unconditional). AL      -
                     can only be used with IT
                     instructions.
1111     -           Alternative instruction,        -
                     always (unconditional).

a. HS (unsigned Higher or Same) is a synonym for CS.
b. LO (unsigned Lower) is a synonym for CC.
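The flag tests in Table A4-36 can be sketched in Python as follows. This is a minimal illustration rather than the architectural pseudocode: the function name and the boolean flag parameters are hypothetical. It relies on a pattern visible in the table, namely that each odd opcode below 0b1110 is the inverse of the even opcode before it, and it treats 0b1111 as unconditional, as the table states.

```python
def condition_passed(cond, n, z, c, v):
    """Evaluate a 4-bit condition opcode from Table A4-36 against the flags.

    cond is the opcode (0b0000 to 0b1111); n, z, c, v are the flag values.
    """
    base = (cond >> 1) & 0b111            # top three bits select the test
    if base == 0b000:
        result = z                        # EQ / NE
    elif base == 0b001:
        result = c                        # CS / CC
    elif base == 0b010:
        result = n                        # MI / PL
    elif base == 0b011:
        result = v                        # VS / VC
    elif base == 0b100:
        result = c and not z              # HI / LS
    elif base == 0b101:
        result = n == v                   # GE / LT
    elif base == 0b110:
        result = (n == v) and not z       # GT / LE
    else:
        result = True                     # AL, and 0b1111 (unconditional)
    # The low bit inverts the test, except for 0b1111.
    if (cond & 1) and cond != 0b1111:
        result = not result
    return bool(result)
```

For example, `condition_passed(0b1001, n=False, z=True, c=True, v=False)` evaluates LS with Z set, which passes because LS is "C clear or Z set".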
A4.4.2 The IT execution state bits

The IT bits are a system level resource defined in The special-purpose processor status registers (xPSR) on page B1-7. They are system level state bits associated with the IT instruction and conditional execution in Thumb-2. Their behavior is described here alongside the other information associated with conditional execution for completeness.

IT<7:5> encodes the base condition (that is, the top 3 bits of the condition specified by the IT instruction) for the current IT block, if any. It contains 0b000 when no IT block is active.

IT<4:0> encodes the number of instructions that are due to be conditionally executed, and whether the condition for each is the base condition code or the inverse of the base condition code. It contains 0b00000 when no IT block is active.

When an IT instruction is executed, these bits are set according to the condition in the instruction, and the Then and Else (T and E) parameters in the instruction (see IT on page A5-92 for details).

During execution of an IT block, IT<4:0> is shifted:
• to reduce the number of instructions to be conditionally executed by one
• to move the next bit into position to form the least significant bit of the condition code.

See Table A4-37 for the way the shift operates.

Table A4-37 Shifting of IT execution state bits

Old state                                          New state
IT[7:5]     IT[4]  IT[3]  IT[2]  IT[1]  IT[0]      IT[7:5]     IT[4]  IT[3]  IT[2]  IT[1]  IT[0]
cond_base   P1     P2     P3     P4     1          cond_base   P2     P3     P4     1      0
cond_base   P1     P2     P3     1      0          cond_base   P2     P3     1      0      0
cond_base   P1     P2     1      0      0          cond_base   P2     1      0      0      0
cond_base   P1     1      0      0      0          0b000       0      0      0      0      0

See Table A4-38 for the effect of each state.
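The shift in Table A4-37 can be sketched in Python as follows. This is a minimal model for illustration only: the function names are hypothetical, the 8-bit state is held in an ordinary integer, and the entry-state helper simply packs cond_base and the P bits followed by a terminating 1, as the old-state rows of the table show. The condition applied to the next instruction is taken to be IT<7:4>, that is, cond_base followed by P1.

```python
def it_entry_state(cond_base, p_bits):
    """Pack IT<7:0> for a block of len(p_bits) instructions (1 to 4).

    cond_base is the 3-bit base condition; p_bits is [P1, ..., Pn].
    """
    if not 1 <= len(p_bits) <= 4:
        raise ValueError("an IT block covers one to four instructions")
    low = 0
    for p in p_bits:
        low = (low << 1) | (p & 1)
    low = (low << 1) | 1                  # terminating 1 after the last P bit
    low <<= 4 - len(p_bits)               # pad with zeros to five bits
    return ((cond_base & 0b111) << 5) | low


def it_current_condition(itstate):
    """Condition for the next instruction: cond_base followed by P1."""
    return (itstate >> 4) & 0xF


def it_advance(itstate):
    """Apply one row of Table A4-37 to the state."""
    if (itstate & 0b111) == 0:            # P1 1 0 0 0: last instruction of the block
        return 0
    return (itstate & 0b11100000) | ((itstate << 1) & 0b11111)


def it_block_conditions(itstate):
    """List the condition opcodes applied to each instruction of the block."""
    conditions = []
    while itstate & 0b11111:
        conditions.append(it_current_condition(itstate))
        itstate = it_advance(itstate)
    return conditions
```

With cond_base 0b000 and P bits 0, 0, 1, 1, `it_block_conditions` returns the opcodes 0b0000, 0b0000, 0b0001, 0b0001, which Table A4-36 lists as EQ, EQ, NE, NE. This matches the ITTEE EQ example in A4.4.1.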
Table A4-38 Effect of IT execution state bits

Entry point for:          IT[7:5]     IT[4]  IT[3]  IT[2]  IT[1]  IT[0]   Effect
4-instruction IT block    cond_base   P1     P2     P3     P4     1       Next instruction has condition cond_base, P1
3-instruction IT block    cond_base   P1     P2     P3     1      0       Next instruction has condition cond_base, P1
2-instruction IT block    cond_base   P1     P2     1      0      0       Next instruction has condition cond_base, P1
1-instruction IT block    cond_base   P1     1      0      0      0       Next instruction has condition cond_base, P1
                          0b000       0      0      0      0      0       Normal execution (not in an IT block)
                          non-zero    0      0      0      0      0       UNPREDICTABLE
                          0bxxx       1      0      0      0      0       UNPREDICTABLE

A4.5 UNDEFINED and UNPREDICTABLE instruction set space

An attempt to execute an unallocated instruction results in either:
• Unpredictable behavior. The instruction is described as UNPREDICTABLE.
• An Undefined Instruction exception. The instruction is described as UNDEFINED.

This section describes the general rules that determine whether an unallocated instruction in the Thumb instruction set space is UNDEFINED or UNPREDICTABLE.

Note
See Usage of 0b1111 as a register specifier on page A4-39 and Usage of 0b1101 as a register specifier on page A4-41 for additional information on UNPREDICTABLE behavior.

A4.5.1 16-bit instruction set space

Instruction bits<15:6> are used for decode.

Instructions where bits<15:10> == 0b010001 are special data processing operations. Unallocated instructions in this space are UNPREDICTABLE. In ARMv6 this is where bits<9:6> == 0b0000 or 0b0100. In ARMv7 this is where bits<9:6> == 0b0100.

All other unallocated instructions are UNDEFINED.

Permanently undefined space

The part of the instruction set space where bits<15:8> == 0b11011110 is architecturally undefined. This space is available for instruction emulation, or for other purposes where software wants to force an Undefined Instruction exception to occur.
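The two bit-field tests named in this subsection can be sketched in Python as follows. The function name and the returned labels are hypothetical and chosen only for illustration; the sketch checks just the fields quoted above and does not decode which encodings inside each space are allocated.

```python
def classify_thumb16_space(halfword):
    """Name the region of the 16-bit space that a halfword falls in.

    Only the two regions described in A4.5.1 are recognised; everything
    else is reported as "other".
    """
    halfword &= 0xFFFF
    if (halfword >> 8) == 0b11011110:     # bits<15:8>
        return "permanently undefined"
    if (halfword >> 10) == 0b010001:      # bits<15:10>
        return "special data processing"
    return "other"
```

Software that wants to force an Undefined Instruction exception can therefore use any halfword in the range 0xDE00 to 0xDEFF, since all of them satisfy the bits<15:8> test.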
A4.5.2 32-bit instruction set space

The following general rules apply to all 32-bit Thumb instructions:
• The hw1<15:11> bit-field is always in the range 0b11101 to 0b11111 inclusive.
• Instruction classes are determined by hw1<15:8,6> and hw2<15>. For details see Figure A4-3 on page A4-12.
• Instructions are made up of three types of bit field:
  - opcode fields
  - register specifiers
  - immediate fields specifying shifts or immediate values.
• Opcode fields are defined in Figure A4-4 on page A4-13 to Figure A4-10 on page A4-32 inclusive.

An instruction is UNDEFINED if:
• it corresponds to any Undefined or Reserved encoding in Figure A4-4 on page A4-13 to Figure A4-10 on page A4-32 inclusive
• it corresponds to an opcode bit pattern that is missing from the tables associated with the figures (that is, Table A4-21 on page A4-14 to Table A4-35 on page A4-31 inclusive), or noted in the subsection text
• it is declared as UNDEFINED within an instruction description.

An instruction is UNPREDICTABLE if:
• a register specifier is 0b1111 or 0b1101 and the instruction does not specifically describe this case
• an SBZ bit or multi-bit field is not zero or all zeros
• an SBO bit or multi-bit field is not one or all ones
• it is declared as UNPREDICTABLE within an instruction description.

A4.6 Usage of 0b1111 as a register specifier

When a value of 0b1111 is permitted as a register specifier in Thumb-2, a variety of meanings is possible. For register reads, these meanings are:
• Read the PC value, that is, the address of the current instruction + 4. The base register of the table branch instructions TBB and TBH is allowed to be the PC. This allows branch tables to be placed in memory immediately after the instruction.
(Some instructions read the PC value implicitly, without the use of a register specifier, for example the conditional branch instruction B<cond>.)

Note
Use of the PC as the base register in the STC instruction is deprecated in ARMv7.

• Read the word-aligned PC value, that is, the address of the current instruction + 4, with bits<1:0> forced to zero. The base register of the LDC, STC, LDR, LDRB, LDRD (pre-indexed, no writeback), LDRH, LDRSB, and LDRSH instructions is allowed to be the word-aligned PC. This allows PC-relative data addressing. In addition, the ADDW and SUBW instructions allow their source registers to be 0b1111 for the same purpose.
• Read zero. Where this occurs, the instruction can be a special case of another, more general instruction, but with one operand zero. In these cases, the instructions are listed on separate pages. This is the case for the following instructions:
  BFC   special case of BFI
  MOV   special case of ORR
  MUL   special case of MLA
  MVN   special case of ORN

For register writes, these meanings are:
• The PC can be specified as the destination register of an LDR instruction. This is done by encoding Rt as 0b1111. The loaded value is treated as an address, and the effect of execution is a branch to that address. Bit<0> of the loaded value determines the instruction execution state and must be 1.
Some other instructions write the PC in similar ways, either implicitly (for example, B<cond>) or by using a register mask rather than a register specifier (LDM). The address to branch to can be a loaded value (for example, LDM), a register value (for example, BX), or the result of a calculation (for example, TBB or TBH).
• Discard the result of a calculation. This is done when one instruction is a special case of another, more general instruction, but with the result discarded. In these cases, the instructions are listed on separate pages, with Encoding notes for each instruction cross-referencing the other.
This is the case for the following instructions:
  CMN   special case of ADDS
  CMP   special case of SUBS
  TEQ   special case of EORS
  TST   special case of ANDS.
• If the destination register specifier of an LDRB, LDRH, LDRSB, or LDRSH instruction is 0b1111, the instruction is a memory hint instead of a load operation. This is the case for the following instructions:
  PLD   uses LDRB encoding
  PLI   uses LDRSB encoding.
The unallocated memory hint instruction encodings (LDRH and LDRSH encodings) execute as NOP, instead of being UNDEFINED or UNPREDICTABLE like most other unallocated instruction encodings. See Memory hints on page A5-14 for further details.
• If the destination register specifier of an MRC instruction is 0b1111, bits<31:28> of the value transferred from the coprocessor are written to the (N,Z,C,V) flags in the APSR, and bits<27:0> are discarded.

A4.6.1 M profile interworking support

Thumb interworking uses bit<0> on some writes to the PC to determine the instruction execution state. ARMv7-M only supports the Thumb instruction execution state, therefore the value must be 1 in interworking instructions, otherwise a fault occurs.

For 16-bit instructions, interworking behavior is as follows:
• ADD (register) and MOV (register) branch within Thumb state, ignoring bit<0>.
• B, or the B<cond> instruction, branches without interworking.
• BKPT and SVC cause exceptions, with the exception mechanism responsible for any state transition. They are not considered to be interworking instructions.
• BLX (register) and BX interwork on the value in Rm.

For 32-bit instructions, interworking behavior is as follows:
• B, or the B<cond> instruction, branches without interworking.
• BL branches to Thumb state based on the instruction encoding, not due to bit<0> of the value written to the PC.
• LDM and LDR support interworking using the value written to the PC.
• TBB and TBH branch without interworking.

A4.7 Usage of 0b1101 as a register specifier

R13 is defined in Thumb-2 such that its usage model is required to be that of a stack pointer, aligning R13 with the ARM Architecture Procedure Call Standard (AAPCS), the architecture usage model supported by the PUSH and POP instructions in the 16-bit instruction set.

In the 32-bit Thumb instruction set, if R13 is used as a general purpose register beyond the architecturally defined constraints described in this section, the results are UNPREDICTABLE.

A4.7.1 R13<1:0> definition

For bits<1:0> of R13, software must adopt a SBZP (Should Be Zero or Preserved) write policy, that is, it is permitted to write zeros or values read from them. Writing anything else to bits<1:0> results in UNPREDICTABLE values. Reading bits<1:0> returns the value written earlier, unless the value read is UNPREDICTABLE.

This definition means that R13 can be set to a word-aligned address. This supports ADD/SUB R13,R13,#4 without either a requirement that R13<1:0> must always read as zero or a need to use ADD/SUB Rt,R13,#4; BIC R13,Rt,#3 to force word-alignment of the write to R13.

A4.7.2 Thumb-2 ISA support for R13

R13 instruction support is restricted to the following:
• R13 as the source or destination register of a MOV instruction. Only register <=> register (no shift) transfers are supported, with no flag setting:
  MOV     SP,Rm
  MOV     Rn,SP
• Adjusting R13 up or down by a multiple of its alignment:
  ADD{W}  SP,SP,#N              ; For N a multiple of 4
  SUB{W}  SP,SP,#N              ; For N a multiple of 4
  ADD     SP,SP,Rm,LSL #shft    ; For shft=0,1,2,3
  SUB     SP,SP,Rm,LSL #shft    ; For shft=0,1,2,3
• R13 as a base register of any load or store instruction. This supports SP-based addressing for load, store, or memory hint instructions, with positive or negative offsets, with and without writeback.
• R13 as the first operand in any ADD{S}, ADDW, CMN, CMP, SUB{S}, or SUBW instruction. The add/subtract instructions support SP-based address generation, with the address going into a general-purpose register. CMN and CMP are useful for stack checking in some circumstances. • R13 as the transferred register