ARM Architecture Reference Manual

Copyright © 1996-1998, 2000, 2004, 2005 ARM Limited. All rights reserved.
ARM DDI 0100I

Release Information
The following changes have been made to this document.
Change History

Date            Issue   Change
February 1996   A       First edition
July 1997       B       Updated and index added
April 1998      C       Updated
February 2000   D       Updated for ARM architecture v5
June 2000       E       Updated for ARM architecture v5TE and corrections to Part B
July 2004       F       Updated for ARM architecture v6 (Confidential)
December 2004   G       Updated to incorporate corrections to errata
March 2005      H       Updated to incorporate corrections to errata
July 2005       I       Updated to incorporate corrections to pseudocode and graphics

Proprietary Notice
ARM, the ARM Powered logo, Thumb, and StrongARM are registered trademarks of ARM Limited.
The ARM logo, AMBA, Angel, ARMulator, EmbeddedICE, ModelGen, Multi-ICE, PrimeCell, ARM7TDMI,
ARM7TDMI-S, ARM9TDMI, ARM9E-S, ETM7, ETM9, TDMI, STRONG, are trademarks of ARM Limited.
All other products or services mentioned herein may be trademarks of their respective owners.
The product described in this document is subject to continuous developments and improvements. All particulars of the
product and its use contained in this document are given by ARM in good faith.
1. Subject to the provisions set out below, ARM hereby grants to you a perpetual, non-exclusive, nontransferable, royalty
free, worldwide licence to use this ARM Architecture Reference Manual for the purposes of developing; (i) software
applications or operating systems which are targeted to run on microprocessor cores distributed under licence from ARM;
(ii) tools which are designed to develop software programs which are targeted to run on microprocessor cores distributed
under licence from ARM; (iii) or having developed integrated circuits which incorporate a microprocessor core
manufactured under licence from ARM.
2. Except as expressly licensed in Clause 1 you acquire no right, title or interest in the ARM Architecture Reference
Manual, or any Intellectual Property therein. In no event shall the licences granted in Clause 1, be construed as granting
you expressly or by implication, estoppel or otherwise, licences to any ARM technology other than the ARM Architecture
Reference Manual. The licence grant in Clause 1 expressly excludes any rights for you to use or take into use any ARM
patents. No right is granted to you under the provisions of Clause 1 to; (i) use the ARM Architecture Reference Manual
for the purposes of developing or having developed microprocessor cores or models thereof which are compatible in
whole or part with either or both the instructions or programmer's models described in this ARM Architecture Reference
Manual; or (ii) develop or have developed models of any microprocessor cores designed by or for ARM; or (iii) distribute
in whole or in part this ARM Architecture Reference Manual to third parties, other than to your subcontractors for the
purposes of having developed products in accordance with the licence grant in Clause 1 without the express written
permission of ARM; or (iv) translate or have translated this ARM Architecture Reference Manual into any other
languages.
3. THE ARM ARCHITECTURE REFERENCE MANUAL IS PROVIDED "AS IS" WITH NO WARRANTIES
EXPRESS, IMPLIED OR STATUTORY, INCLUDING BUT NOT LIMITED TO ANY WARRANTY OF
SATISFACTORY QUALITY, NONINFRINGEMENT OR FITNESS FOR A PARTICULAR PURPOSE.
4. No licence, express, implied or otherwise, is granted to LICENSEE, under the provisions of Clause 1, to use the ARM
tradename, in connection with the use of the ARM Architecture Reference Manual or any products based thereon.
Nothing in Clause 1 shall be construed as authority for you to make any representations on behalf of ARM in respect of
the ARM Architecture Reference Manual or any products based thereon.
Copyright © 1996-1998, 2000, 2004, 2005 ARM Limited
110 Fulbourn Road Cambridge, England CB1 9NJ
Restricted Rights Legend: Use, duplication or disclosure by the United States Government is subject to the restrictions
set forth in DFARS 252.227-7013 (c)(1)(ii) and FAR 52.227-19
This document is Non-Confidential. The right to use, copy and disclose this document is subject to the licence set out
above.

Contents
ARM Architecture Reference Manual

Preface
    About this manual .... xii
    Architecture versions and variants .... xiii
    Using this manual .... xviii
    Conventions .... xxi
    Further reading .... xxiii
    Feedback .... xxiv

Part A  CPU Architecture

Chapter A1  Introduction to the ARM Architecture
    A1.1   About the ARM architecture .... A1-2
    A1.2   ARM instruction set .... A1-6
    A1.3   Thumb instruction set .... A1-11

Chapter A2  Programmers’ Model
    A2.1   Data types .... A2-2
    A2.2   Processor modes .... A2-3
    A2.3   Registers .... A2-4
    A2.4   General-purpose registers .... A2-6
    A2.5   Program status registers .... A2-11
    A2.6   Exceptions .... A2-16
    A2.7   Endian support .... A2-30
    A2.8   Unaligned access support .... A2-38
    A2.9   Synchronization primitives .... A2-44
    A2.10  The Jazelle Extension .... A2-53
    A2.11  Saturated integer arithmetic .... A2-69

Chapter A3  The ARM Instruction Set
    A3.1   Instruction set encoding .... A3-2
    A3.2   The condition field .... A3-3
    A3.3   Branch instructions .... A3-5
    A3.4   Data-processing instructions .... A3-7
    A3.5   Multiply instructions .... A3-10
    A3.6   Parallel addition and subtraction instructions .... A3-14
    A3.7   Extend instructions .... A3-16
    A3.8   Miscellaneous arithmetic instructions .... A3-17
    A3.9   Other miscellaneous instructions .... A3-18
    A3.10  Status register access instructions .... A3-19
    A3.11  Load and store instructions .... A3-21
    A3.12  Load and Store Multiple instructions .... A3-26
    A3.13  Semaphore instructions .... A3-28
    A3.14  Exception-generating instructions .... A3-29
    A3.15  Coprocessor instructions .... A3-30
    A3.16  Extending the instruction set .... A3-32

Chapter A4  ARM Instructions
    A4.1   Alphabetical list of ARM instructions .... A4-2
    A4.2   ARM instructions and architecture versions .... A4-286

Chapter A5  ARM Addressing Modes
    A5.1   Addressing Mode 1 - Data-processing operands .... A5-2
    A5.2   Addressing Mode 2 - Load and Store Word or Unsigned Byte .... A5-18
    A5.3   Addressing Mode 3 - Miscellaneous Loads and Stores .... A5-33
    A5.4   Addressing Mode 4 - Load and Store Multiple .... A5-41
    A5.5   Addressing Mode 5 - Load and Store Coprocessor .... A5-49

Chapter A6  The Thumb Instruction Set
    A6.1   About the Thumb instruction set .... A6-2
    A6.2   Instruction set encoding .... A6-4
    A6.3   Branch instructions .... A6-6
    A6.4   Data-processing instructions .... A6-8
    A6.5   Load and Store Register instructions .... A6-15
    A6.6   Load and Store Multiple instructions .... A6-18
    A6.7   Exception-generating instructions .... A6-20
    A6.8   Undefined Instruction space .... A6-21

Chapter A7  Thumb Instructions
    A7.1   Alphabetical list of Thumb instructions .... A7-2
    A7.2   Thumb instructions and architecture versions .... A7-125

Part B  Memory and System Architectures

Chapter B1  Introduction to Memory and System Architectures
    B1.1   About the memory system .... B1-2
    B1.2   Memory hierarchy .... B1-4
    B1.3   L1 cache .... B1-6
    B1.4   L2 cache .... B1-7
    B1.5   Write buffers .... B1-8
    B1.6   Tightly Coupled Memory .... B1-9
    B1.7   Asynchronous exceptions .... B1-10
    B1.8   Semaphores .... B1-12

Chapter B2  Memory Order Model
    B2.1   About the memory order model .... B2-2
    B2.2   Read and write definitions .... B2-4
    B2.3   Memory attributes prior to ARMv6 .... B2-7
    B2.4   ARMv6 memory attributes - introduction .... B2-8
    B2.5   Ordering requirements for memory accesses .... B2-16
    B2.6   Memory barriers .... B2-18
    B2.7   Memory coherency and access issues .... B2-20

Chapter B3  The System Control Coprocessor
    B3.1   About the System Control coprocessor .... B3-2
    B3.2   Registers .... B3-3
    B3.3   Register 0: ID codes .... B3-7
    B3.4   Register 1: Control registers .... B3-12
    B3.5   Registers 2 to 15 .... B3-18

Chapter B4  Virtual Memory System Architecture
    B4.1   About the VMSA .... B4-2
    B4.2   Memory access sequence .... B4-4
    B4.3   Memory access control .... B4-8
    B4.4   Memory region attributes .... B4-11
    B4.5   Aborts .... B4-14
    B4.6   Fault Address and Fault Status registers .... B4-19
    B4.7   Hardware page table translation .... B4-23
    B4.8   Fine page tables and support of tiny pages .... B4-35
    B4.9   CP15 registers .... B4-39

Chapter B5  Protected Memory System Architecture
    B5.1   About the PMSA .... B5-2
    B5.2   Memory access sequence .... B5-4
    B5.3   Memory access control .... B5-8
    B5.4   Memory access attributes .... B5-10
    B5.5   Memory aborts (PMSAv6) .... B5-13
    B5.6   Fault Status and Fault Address register support .... B5-16
    B5.7   CP15 registers .... B5-18

Chapter B6  Caches and Write Buffers
    B6.1   About caches and write buffers .... B6-2
    B6.2   Cache organization .... B6-4
    B6.3   Types of cache .... B6-7
    B6.4   L1 cache .... B6-10
    B6.5   Considerations for additional levels of cache .... B6-12
    B6.6   CP15 registers .... B6-13

Chapter B7  Tightly Coupled Memory
    B7.1   About TCM .... B7-2
    B7.2   TCM configuration and control .... B7-3
    B7.3   Accesses to TCM and cache .... B7-7
    B7.4   Level 1 (L1) DMA model .... B7-8
    B7.5   L1 DMA control using CP15 Register 11 .... B7-9

Chapter B8  Fast Context Switch Extension
    B8.1   About the FCSE .... B8-2
    B8.2   Modified virtual addresses .... B8-3
    B8.3   Enabling the FCSE .... B8-5
    B8.4   Debug and Trace .... B8-6
    B8.5   CP15 registers .... B8-7

Part C  Vector Floating-point Architecture

Chapter C1  Introduction to the Vector Floating-point Architecture
    C1.1   About the Vector Floating-point architecture .... C1-2
    C1.2   Overview of the VFP architecture .... C1-4
    C1.3   Compliance with the IEEE 754 standard .... C1-9
    C1.4   IEEE 754 implementation choices .... C1-10

Chapter C2  VFP Programmer’s Model
    C2.1   Floating-point formats .... C2-2
    C2.2   Rounding .... C2-9
    C2.3   Floating-point exceptions .... C2-10
    C2.4   Flush-to-zero mode .... C2-14
    C2.5   Default NaN mode .... C2-16
    C2.6   Floating-point general-purpose registers .... C2-17
    C2.7   System registers .... C2-21
    C2.8   Reset behavior and initialization .... C2-29

Chapter C3  VFP Instruction Set Overview
    C3.1   Data-processing instructions .... C3-2
    C3.2   Load and Store instructions .... C3-14
    C3.3   Single register transfer instructions .... C3-18
    C3.4   Two-register transfer instructions .... C3-22

Chapter C4  VFP Instructions
    C4.1   Alphabetical list of VFP instructions .... C4-2

Chapter C5  VFP Addressing Modes
    C5.1   Addressing Mode 1 - Single-precision vectors (non-monadic) .... C5-2
    C5.2   Addressing Mode 2 - Double-precision vectors (non-monadic) .... C5-8
    C5.3   Addressing Mode 3 - Single-precision vectors (monadic) .... C5-14
    C5.4   Addressing Mode 4 - Double-precision vectors (monadic) .... C5-18
    C5.5   Addressing Mode 5 - VFP load/store multiple .... C5-22

Part D  Debug Architecture

Chapter D1  Introduction to the Debug Architecture
    D1.1   Introduction .... D1-2
    D1.2   Trace .... D1-4
    D1.3   Debug and ARMv6 .... D1-5

Chapter D2  Debug Events and Exceptions
    D2.1   Introduction .... D2-2
    D2.2   Monitor debug-mode .... D2-5
    D2.3   Halting debug-mode .... D2-8
    D2.4   External Debug Interface .... D2-13

Chapter D3  Coprocessor 14, the Debug Coprocessor
    D3.1   Coprocessor 14 debug registers .... D3-2
    D3.2   Coprocessor 14 debug instructions .... D3-5
    D3.3   Debug register reference .... D3-8
    D3.4   Reset values of the CP14 debug registers .... D3-24
    D3.5   Access to CP14 debug registers from the external debug interface .... D3-25

Glossary

Preface

This preface describes the versions of the ARM® architecture and the contents of this manual, then lists the
conventions and terminology it uses.
•   About this manual on page xii
•   Architecture versions and variants on page xiii
•   Using this manual on page xviii
•   Conventions on page xxi
•   Further reading on page xxiii
•   Feedback on page xxiv.

About this manual
The purpose of this manual is to describe the ARM instruction set architecture, including its high code
density Thumb® subset, and three of its standard coprocessor extensions:
•   The standard System Control coprocessor (coprocessor 15), which is used to control memory system
    components such as caches, write buffers, Memory Management Units, and Protection Units.

•   The Vector Floating-point (VFP) architecture, which uses coprocessors 10 and 11 to supply a
    high-performance floating-point instruction set.

•   The debug architecture interface (coprocessor 14), formally added to the architecture in ARMv6 to
    provide software access to debug features in ARM cores (for example, breakpoint and watchpoint
    control).

The 32-bit ARM and 16-bit Thumb instruction sets are described separately in Part A. The precise effects
of each instruction are described, including any restrictions on its use. This information is of primary
importance to authors of compilers, assemblers, and other programs that generate ARM machine code.
Assembler syntax is given for most of the instructions described in this manual, allowing instructions to be
specified in textual form.
However, this manual is not intended as tutorial material for ARM assembler language, nor does it describe
ARM assembler language at anything other than a very basic level. To make effective use of ARM assembler
language, consult the documentation supplied with the assembler being used.
The memory and system architecture definition is significantly improved in ARM architecture version 6 (the
latest version). For earlier versions, the definition usually needs to be supplemented by detailed
implementation-specific information from the technical reference manual of the device being used.

Architecture versions and variants
The ARM instruction set architecture has evolved significantly since it was first developed, and will
continue to be developed in the future. Six major versions of the instruction set have been defined to date,
denoted by the version numbers 1 to 6. Of these, the first three versions including the original 26-bit
architecture (the 32-bit architecture was introduced at ARMv3) are now OBSOLETE. All bits and encodings
that were used for 26-bit features become RESERVED for future expansion by ARM Ltd.
Versions can be qualified with variant letters to specify collections of additional instructions that are
included as an architecture extension. Extensions are typically included in the base architecture of the next
version number, ARMv5T being the notable exception. Provision is also made to exclude variants by
prefixing the variant letter with x, for example the xP variant described below in the summary of version 5
features.

Note
The xM variant, which indicates that long multiplies (32 x 32 multiplies with 64-bit results) are not
supported, has been withdrawn.
The valid architecture variants are as follows (variant in brackets for legacy reasons only):
ARMv4, ARMv4T, ARMv5T, (ARMv5TExP), ARMv5TE, ARMv5TEJ, and ARMv6
The following architecture variants are now OBSOLETE:
ARMv1, ARMv2, ARMv2a, ARMv3, ARMv3G, ARMv3M, ARMv4xM, ARMv4TxM, ARMv5,
ARMv5xM, and ARMv5TxM
Details on OBSOLETE versions are available on request from ARM.
The ARM and Thumb instruction sets are summarized by architecture variant in ARM instructions and
architecture versions on page A4-286 and Thumb instructions and architecture versions on page A7-125
respectively. The key differences introduced since ARMv4 are listed below.

Version 4 and the introduction of Thumb (T variant)
The Thumb instruction set is a re-encoded subset of the ARM instruction set. Thumb instructions execute
in their own processor state, with the architecture defining the mechanisms required to transition between
ARM and Thumb states. The key difference is that Thumb instructions are half the size of ARM instructions
(16 bits compared with 32 bits). Greater code density can usually be achieved by using the Thumb
instruction set in preference to the ARM instruction set. However, the Thumb instruction set does have some
limitations:
•   Thumb code usually uses more instructions for a given task, making ARM code best for maximizing
    performance of time-critical code.

•   ARM state and some associated ARM instructions are required for exception handling.

The Thumb instruction set is always used in conjunction with a version of the ARM instruction set.

New features in Version 5T
This version extended architecture version 4T as follows:
•   Improved efficiency of ARM/Thumb interworking

•   Count leading zeros (CLZ, ARM only) and software breakpoint (BKPT, ARM and Thumb) instructions
    added

•   Additional options for coprocessor designers (coprocessor support is ARM only)

•   Tighter definition of flag setting on multiplies (ARM and Thumb)

•   Introduction of the E variant, adding ARM instructions which enhance performance of an ARM
    processor on typical digital signal processing (DSP) algorithms:

    —   Several multiply and multiply-accumulate instructions that act on 16-bit data items.

    —   Addition and subtraction instructions that perform saturated signed arithmetic. Saturated
        arithmetic produces the maximum positive or negative value instead of wrapping the result if
        the calculation overflows the normal integer range.

    —   Load (LDRD), store (STRD) and coprocessor register transfer (MCRR and MRRC) instructions that act
        on two words of data.

    —   A preload data instruction PLD.

•   Introduction of the J variant, adding the BXJ instruction and the other provisions required to support
    the Jazelle® architecture extension.

Note
Some early implementations of the E variant omitted the LDRD, STRD, MCRR, MRRC and PLD instructions. These
are designated as conforming to the ExP variant, and the variant is defined for legacy reasons only.
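
The difference between saturated and wrapping arithmetic, as described for the E variant above, can be illustrated with a small Python sketch. It assumes a signed 32-bit integer range, and the function names are illustrative only; this is not the manual's pseudocode for any particular instruction.

```python
# Illustrative model only: signed 32-bit range assumed, names are hypothetical.
INT32_MAX = 2**31 - 1
INT32_MIN = -(2**31)


def saturating_add(a, b):
    """Return (result, saturated) for a signed 32-bit saturating addition."""
    total = a + b
    if total > INT32_MAX:
        return INT32_MAX, True   # clamp to the maximum positive value
    if total < INT32_MIN:
        return INT32_MIN, True   # clamp to the maximum negative value
    return total, False


def wrapping_add(a, b):
    """Conventional 32-bit addition: the result wraps modulo 2**32."""
    total = (a + b) & 0xFFFFFFFF
    # Reinterpret the 32-bit pattern as a signed two's complement value.
    return total - 2**32 if total & 0x80000000 else total
```

Adding 1 to the largest positive value wraps to the most negative value with `wrapping_add`, whereas `saturating_add` returns the largest positive value and reports that saturation occurred.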

New features in Version 6
The following ARM instructions are added:
•   CPS, SRS and RFE instructions for improved exception handling

•   REV, REV16 and REVSH byte reversal instructions

•   SETEND for a revised endian (memory) model

•   LDREX and STREX exclusive access instructions

•   SXTB, SXTH, UXTB, UXTH byte/halfword extend instructions

•   A set of Single Instruction Multiple Data (SIMD) media instructions

•   Additional forms of multiply instructions with accumulation into a 64-bit result.

The following Thumb instructions are added:

•   CPS, CPY (a form of MOV), REV, REV16, REVSH, SETEND, SXTB, SXTH, UXTB, UXTH
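
As a rough illustration of what the byte reversal and byte/halfword extend instructions listed above compute, the following Python sketch models their basic data transformations on 32-bit values. The function names simply mirror the mnemonics; the sketch ignores condition codes and any operand options, so the instruction descriptions in Part A remain the authority on exact behavior.

```python
# Illustrative models of the data transformations only (32-bit values assumed).

def sign_extend(value, bits):
    """Sign-extend the low `bits` bits of value to a 32-bit result."""
    value &= (1 << bits) - 1
    if value & (1 << (bits - 1)):
        value -= 1 << bits
    return value & 0xFFFFFFFF


def rev(x):
    """Reverse the byte order of a 32-bit word."""
    return int.from_bytes((x & 0xFFFFFFFF).to_bytes(4, "little"), "big")


def rev16(x):
    """Reverse the byte order within each 16-bit halfword."""
    x &= 0xFFFFFFFF
    return ((x & 0x00FF00FF) << 8) | ((x & 0xFF00FF00) >> 8)


def revsh(x):
    """Reverse the bytes of the low halfword, then sign-extend to 32 bits."""
    half = ((x & 0xFF) << 8) | ((x >> 8) & 0xFF)
    return sign_extend(half, 16)


def sxtb(x):
    """Sign-extend the low byte."""
    return sign_extend(x, 8)


def sxth(x):
    """Sign-extend the low halfword."""
    return sign_extend(x, 16)


def uxtb(x):
    """Zero-extend the low byte."""
    return x & 0xFF


def uxth(x):
    """Zero-extend the low halfword."""
    return x & 0xFFFF
```

For example, `rev(0x12345678)` gives `0x78563412` and `rev16(0x12345678)` gives `0x34127856`.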

Other changes to ARMv6 are as follows:
•   The architecture name ARMv6 implies the presence of all preceding features, that is, ARMv5TEJ
    compliance.

•   Revised Virtual and Protected Memory System Architectures.

•   Provision of a Tightly Coupled Memory model.

•   New hardware support for word and halfword unaligned accesses.

•   Formalized adoption of a debug architecture with external and Coprocessor 14 based interfaces.

•   Prior to ARMv6, the System Control coprocessor (CP15) described in Chapter B3 was a
    recommendation only. Support for this coprocessor is now mandated in ARMv6.

•   For historical reasons, the rules relating to unaligned values written to the PC are somewhat complex
    prior to ARMv6. These rules are made simpler and more consistent in ARMv6.

•   The high vectors extension prior to ARMv6 is an optional (IMPLEMENTATION DEFINED) part of the
    architecture. This extension becomes obligatory in ARMv6.

•   Prior to ARMv6, a processor may use either of two abort models. ARMv6 requires that the Base
    Restored Abort Model (BRAM) is used. The two abort models supported previously were:

    —   The BRAM, in which the base register of any valid load/store instruction that causes a memory
        system abort is always restored to its pre-instruction value.

    —   The Base Updated Abort Model (BUAM), in which the base register of any valid load/store
        instruction that causes a memory system abort will have been modified by the base register
        writeback (if any) of that instruction.

•   The restriction that multiplication destination registers should be different from their source registers
    is removed in ARMv6.

•   In ARMv5, the LDM(2) and STM(2) ARM instructions have restrictions on the use of banked registers
    by the immediately following instruction. These restrictions are removed from ARMv6.

•   The rules determining which PSR bits are updated by an MSR instruction are clarified and extended to
    cover the new PSR bits defined in ARMv6.

•   In ARMv5, the Thumb MOV instruction behavior varies according to the registers used (see note). Two
    changes are made in ARMv6.

    —   The restriction about the use of low register numbers in the MOV (3) instruction encoding is
        removed.

    —   In order to make the new side-effect-free MOV instructions available to the assembler language
        programmer without changing the meaning of existing assembler sources, a new assembler
        syntax CPY Rd,Rn is introduced. This always assembles to the MOV (3) instruction regardless of
        whether Rd and Rn are high or low registers.

Note
In ARMv5, the Thumb MOV Rd,Rn instructions have the following properties:

•   If both Rd and Rn are low registers, the instruction is the MOV (2) instruction. This instruction sets the
    N and Z flags according to the value transferred, and sets the C and V flags to 0.

•   If either Rd or Rn is a high register, the instruction is the MOV (3) instruction. This instruction leaves
    the condition flags unchanged.

This situation results in behavior that varies according to the registers used. The MOV (2) side-effects also limit
compiler flexibility on use of pseudo-registers in a global register allocator.
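
The register-dependent behavior described in this Note can be made concrete with a toy Python model. It is a sketch under stated assumptions: low registers are taken to be R0 to R7, the register file is a plain list of 16 integers, special handling of writes to the PC is ignored, and the names `Flags`, `thumb_mov_armv5` and `thumb_cpy` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Flags:
    """Condition flags, each held as 0 or 1."""
    n: int = 0
    z: int = 0
    c: int = 0
    v: int = 0


def is_low(reg):
    """Assumption of this sketch: low registers are R0-R7."""
    return 0 <= reg <= 7


def thumb_mov_armv5(regs, flags, rd, rn):
    """Model Thumb MOV Rd,Rn in ARMv5; returns the encoding selected."""
    value = regs[rn] & 0xFFFFFFFF
    regs[rd] = value
    if is_low(rd) and is_low(rn):
        # MOV (2): N and Z follow the value moved, C and V are set to 0.
        flags.n = value >> 31
        flags.z = int(value == 0)
        flags.c = 0
        flags.v = 0
        return "MOV (2)"
    # MOV (3): the condition flags are left unchanged.
    return "MOV (3)"


def thumb_cpy(regs, flags, rd, rn):
    """Model the ARMv6 CPY Rd,Rn syntax: always MOV (3), flags untouched."""
    regs[rd] = regs[rn] & 0xFFFFFFFF
    return "MOV (3)"
```

In this model, moving between two low registers clears C and V as a side effect, while the same source line written with a high register, or written as CPY, leaves all four flags alone.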

Naming of ARM/Thumb architecture versions
To name a precise version and variant of the ARM/Thumb architecture, the following strings are
concatenated:
1.  The string ARMv.
2.  The version number of the ARM instruction set.
3.  Variant letters of the included variants.
4.  In addition, the letter P is used after x to denote the exclusion of several instructions in the
    ARMv5TExP variant.
The table Architecture versions on page xvii lists the standard names of the current (not obsolete)
ARM/Thumb architecture versions described in this manual. These names provide a shorthand way of
describing the precise instruction set implemented by an ARM processor. However, this manual normally
uses descriptive phrases such as T variants of architecture version 4 and above to avoid the use of lists of
architecture names.
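
The concatenation rule above can be sketched in a few lines of Python. The helper names and the `excluded` parameter are hypothetical conveniences of this sketch; the set of current names is copied from the list of valid architecture variants given earlier in this preface.

```python
# Names listed in this preface as valid (not obsolete) architecture variants.
CURRENT_NAMES = {
    "ARMv4", "ARMv4T", "ARMv5T", "ARMv5TExP", "ARMv5TE", "ARMv5TEJ", "ARMv6",
}


def architecture_name(version, variants="", excluded=""):
    """Build a name: 'ARMv' + version number + variant letters,
    followed by 'x' and the excluded variant letters, if any."""
    name = "ARMv" + str(version) + variants
    if excluded:
        name += "x" + excluded
    return name


def is_current(name):
    """True if the name is one of the current names listed above."""
    return name in CURRENT_NAMES
```

For instance, `architecture_name(5, "TE", "P")` yields `ARMv5TExP`, which is a current name, while `architecture_name(4, "T", "M")` yields `ARMv4TxM`, which the preface lists as obsolete.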

All architecture names prior to ARMv4 are now OBSOLETE. The term all is used throughout this manual to
refer to all architecture versions from ARMv4 onwards.

Architecture versions

Name        ARM instruction   Thumb instruction   Notes
            set version       set version
ARMv4       4                 None                -
ARMv4T      4                 1                   -
ARMv5T      5                 2                   -
ARMv5TExP   5                 2                   Enhanced DSP instructions except LDRD, MCRR, MRRC, PLD,
                                                  and STRD
ARMv5TE     5                 2                   Enhanced DSP instructions
ARMv5TEJ    5                 2                   Addition of BXJ instruction and Jazelle Extension support
                                                  over ARMv5TE
ARMv6       6                 3                   Additional instructions as listed in Table A4-2 on
                                                  page A4-286 and Table A7-1 on page A7-125.

Using this manual
The information in this manual is organized into four parts, as described below.

Part A - CPU Architectures
Part A describes the ARM and Thumb instruction sets, and contains the following chapters:

Chapter A1

Gives a brief overview of the ARM architecture, and the ARM and Thumb instruction sets.

Chapter A2

Describes the types of value that ARM instructions operate on, the general-purpose registers
that contain those values, and the Program Status Registers. This chapter also describes how
ARM processors handle interrupts and other exceptions, endian and unaligned support,
information on synchronization primitives, and the Jazelle® extension.

Chapter A3

Gives a description of the ARM instruction set, organized by type of instruction.

Chapter A4

Contains detailed reference material on each ARM instruction, arranged alphabetically by
instruction mnemonic.

Chapter A5

Contains detailed reference material on the addressing modes used by ARM instructions.
The term addressing mode is interpreted broadly in this manual, to mean a procedure shared
by many different instructions, for generating values used by the instructions. For four of the
addressing modes described in this chapter, the values generated are memory addresses
(which is the traditional role of an addressing mode). The remaining addressing mode
generates values to be used as operands by data-processing instructions.

Chapter A6

Gives a description of the Thumb instruction set, organized by type of instruction. This
chapter also contains information about how to switch between the ARM and Thumb
instruction sets, and how exceptions that arise during Thumb state execution are handled.

Chapter A7

Contains detailed reference material on each Thumb instruction, arranged alphabetically by
instruction mnemonic.

Part B - Memory and System Architectures
Part B describes standard memory system features that are normally implemented by the System Control
coprocessor (coprocessor 15) in an ARM-based system. It contains the following chapters:
Chapter B1

Gives a brief overview of this part of the manual.

Chapter B2

Describes the memory order model.

Chapter B3

Gives a general description of the System Control coprocessor and its use.

Chapter B4

Describes the standard ARM memory and system architecture, the Virtual Memory System
Architecture (VMSA), which is based on a Memory Management Unit (MMU).

Chapter B5

Gives a description of the simpler Protected Memory System Architecture (PMSA) based on
a Memory Protection Unit (MPU).

Chapter B6

Gives a description of the standard ways to control caches and write buffers in ARM
memory systems. This chapter is relevant both to systems based on an MMU and to systems
based on an MPU.

Chapter B7

Describes the Tightly Coupled Memory (TCM) architecture option for level 1 memory.

Chapter B8

Describes the Fast Context Switch Extension and Context ID support (ARMv6 only).

Part C - Vector Floating-point Architecture
Part C describes the Vector Floating-point (VFP) architecture. This is a coprocessor extension to the ARM
architecture designed for high floating-point performance on typical graphics and DSP algorithms.
Chapter C1

Gives a brief overview of the VFP architecture and information about its compliance with
the IEEE 754-1985 floating-point arithmetic standard.

Chapter C2

Describes the floating-point formats supported by the VFP instruction set, the floating-point
general-purpose registers that hold those values, and the VFP system registers.

Chapter C3

Describes the VFP coprocessor instruction set, organized by type of instruction.

Chapter C4

Contains detailed reference material on the VFP coprocessor instruction set, organized
alphabetically by instruction mnemonic.

Chapter C5

Contains detailed reference material on the addressing modes used by VFP instructions.
One of these is a traditional addressing mode, generating addresses for load/store
instructions. The remainder specify how the floating-point general-purpose registers and
instructions can be used to hold and perform calculations on vectors of floating-point values.

Part D - Debug Architecture
Part D describes the debug architecture. This is a coprocessor extension to the ARM architecture designed
to provide configuration, breakpoint and watchpoint support, and a Debug Communications Channel (DCC)
to a debug host.

Chapter D1

Gives a brief introduction to the debug architecture.

Chapter D2

Describes the key features of the debug architecture.

Chapter D3

Describes the Coprocessor Debug Register support (cp14) for the debug architecture.

Conventions
This manual employs typographic and other conventions intended to improve its ease of use.

General typographic conventions
typewriter

Is used for assembler syntax descriptions, pseudo-code descriptions of instructions,
and source code examples. In the cases of assembler syntax descriptions and
pseudo-code descriptions, see the additional conventions below.
The typewriter font is also used in the main text for instruction mnemonics and for
references to other items appearing in assembler syntax descriptions, pseudo-code
descriptions of instructions and source code examples.

italic

Highlights important notes, introduces special terminology, and denotes internal
cross-references and citations.

bold

Is used for emphasis in descriptive lists and elsewhere, where appropriate.

SMALL CAPITALS

Are used for a few terms which have specific technical meanings. Their meanings
can be found in the Glossary.

Pseudo-code descriptions of instructions
A form of pseudo-code is used to provide precise descriptions of what instructions do. This pseudo-code is
written in a typewriter font, and uses the following conventions for clarity and brevity:
•
Indentation is used to indicate structure. For example, the range of statements that a for statement
loops over, goes from the for statement to the next statement at the same or lower indentation level
as the for statement (both ends exclusive).
•
Comments are bracketed by /* and */, as in the C language.
•
English text is occasionally used outside comments to describe functionality that is hard to describe
otherwise.
•
All keywords and special functions used in the pseudo-code are described in the Glossary.
•
Assignment and equality tests are distinguished by using = for an assignment and == for an equality
test, as in the C language.
•
Instruction fields are referred to by the names shown in the encoding diagram for the instruction.
When an instruction field denotes a register, a reference to it means the value in that register, rather
than the register number, unless the context demands otherwise. For example, a Rn == 0 test is
checking whether the value in the specified register is 0, but a Rd is R15 test is checking whether the
specified register is register 15.
•
When an instruction uses an addressing mode, the pseudo-code for that addressing mode generates
one or more values that are used in the pseudo-code for the instruction. For example, the AND
instruction described in AND on page A4-8 uses ARM addressing mode 1 (see Addressing Mode 1 -
Data-processing operands on page A5-2). The pseudo-code for the addressing mode generates two
values shifter_operand and shifter_carry_out, which are used by the pseudo-code for the AND
instruction.

Assembler syntax descriptions
This manual contains numerous syntax descriptions for assembler instructions and for components of
assembler instructions. These are shown in a typewriter font, and are as follows:
< >

Any item bracketed by < and > is a short description of a type of value to be supplied by the
user in that position. A longer description of the item is normally supplied by subsequent
text. Such items often correspond to a similarly named field in an encoding diagram for an
instruction. When the correspondence simply requires the binary encoding of an integer
value or register number to be substituted into the instruction encoding, it is not described
explicitly. For example, if the assembler syntax for an ARM instruction contains an item
<Rn> and the instruction encoding diagram contains a 4-bit field named Rn, the number of
the register specified in the assembler syntax is encoded in binary in the instruction field.
If the correspondence between the assembler syntax item and the instruction encoding is
more complex than simple binary encoding of an integer or register number, the item
description indicates how it is encoded.

{ }

Any item bracketed by { and } is optional. A description of the item and of how its presence
or absence is encoded in the instruction is normally supplied by subsequent text.

|

This indicates an alternative character string. For example, LDM|STM is either LDM or STM.

spaces

Single spaces are used for clarity, to separate items. When a space is obligatory in the
assembler syntax, two or more consecutive spaces are used.

+/-

This indicates an optional + or - sign. If neither is coded, + is assumed.

*

When used in a combination like <immed_8> * 4, this describes an immediate value which
must be a specified multiple of a value taken from a numeric range. In this instance, the
numeric range is 0 to 255 (the set of values that can be represented as an 8-bit immediate)
and the specified multiple is 4, so the value described must be a multiple of 4 in the range
4*0 = 0 to 4*255 = 1020.
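
The multiple-of-4 convention above can be checked mechanically. The following is a minimal sketch, assuming the 8-bit field simply holds the value divided by 4; the function name is hypothetical and the exact field layout depends on the individual instruction.

```python
def encode_times4_immediate(value):
    """Return the 8-bit field for an immediate written as (8-bit value) * 4.

    Raises ValueError when the value is not a multiple of 4 in the range 0 to 1020.
    """
    if value % 4 != 0 or not 0 <= value <= 4 * 255:
        raise ValueError(f"{value} is not a multiple of 4 in the range 0 to 1020")
    return value // 4
```

For example, an offset of 16 would be held as the field value 4, while 6 and 1024 would both be rejected.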

All other characters must be encoded precisely as they appear in the assembler syntax. Apart from { and },
the special characters described above do not appear in the basic forms of assembler instructions
documented in this manual. The { and } characters need to be encoded in a few places as part of a variable
item. When this happens, the long description of the variable item indicates how they must be used.

Note
This manual only attempts to describe the most basic forms of assembler instruction syntax. In practice,
assemblers normally recognize a much wider range of instruction syntaxes, as well as various directives to
control the assembly process and additional features such as symbolic manipulation and macro expansion.
All of these are beyond the scope of this manual.

Further reading
This section lists publications from both ARM Limited and third parties that provide additional information
on the ARM family of processors.
ARM periodically provides updates and corrections to its documentation. See http://www.arm.com for
current errata sheets and addenda, and the ARM Frequently Asked Questions.

ARM publications
ARM External Debug Interface Specification.

External publications
The following books are referred to in this manual, or provide additional information:
•

IEEE Standard for Shared-Data Formats Optimized for Scalable Coherent Interface (SCI)
Processors, IEEE Std 1596.5-1993, ISBN 1-55937-354-7, IEEE.

•

The Java™ Virtual Machine Specification Second Edition, Tim Lindholm and Frank Yellin,
published by Addison Wesley (ISBN: 0-201-43294-3)

•

JTAG Specification IEEE1149.1

Feedback
ARM Limited welcomes feedback on its documentation.

Feedback on this book
If you notice any errors or omissions in this book, send email to errata@arm giving:
•
the document title
•
the document number
•
the page number(s) to which your comments apply
•
a concise explanation of the problem.
General suggestions for additions and improvements are also welcome.

Part A
CPU Architecture

Chapter A1
Introduction to the ARM Architecture

This chapter introduces the ARM® architecture and contains the following sections:
•
About the ARM architecture on page A1-2
•
ARM instruction set on page A1-6
•
Thumb instruction set on page A1-11.

A1.1 About the ARM architecture
The ARM architecture has evolved to a point where it supports implementations across a wide spectrum of
performance points. Over two billion parts have shipped, establishing it as the dominant architecture across
many market segments. The architectural simplicity of ARM processors has traditionally led to very small
implementations, and small implementations allow devices with very low power consumption.
Implementation size, performance, and very low power consumption remain key attributes in the
development of the ARM architecture.
The ARM is a Reduced Instruction Set Computer (RISC), as it incorporates these typical RISC architecture
features:
•

a large uniform register file

•

a load/store architecture, where data-processing operations only operate on register contents, not
directly on memory contents

•

simple addressing modes, with all load/store addresses being determined from register contents and
instruction fields only

•

uniform and fixed-length instruction fields, to simplify instruction decode.

In addition, the ARM architecture provides:
•

control over both the Arithmetic Logic Unit (ALU) and shifter in most data-processing instructions
to maximize the use of an ALU and a shifter

•

auto-increment and auto-decrement addressing modes to optimize program loops

•

Load and Store Multiple instructions to maximize data throughput

•

conditional execution of almost all instructions to maximize execution throughput.

These enhancements to a basic RISC architecture allow ARM processors to achieve a good balance of high
performance, small code size, low power consumption, and small silicon area.

A1.1.1 ARM registers
ARM has 31 general-purpose 32-bit registers. At any one time, 16 of these registers are visible. The other
registers are used to speed up exception processing. All the register specifiers in ARM instructions can
address any of the 16 visible registers.
The main bank of 16 registers is used by all unprivileged code. These are the User mode registers. User
mode is different from all other modes as it is unprivileged, which means:
•

User mode can only switch to another processor mode by generating an exception. The SWI
instruction provides this facility from program control.

•

Memory systems and coprocessors might allow User mode less access to memory and coprocessor
functionality than a privileged mode.

Three of the 16 visible registers have special roles:
Stack pointer

Software normally uses R13 as a Stack Pointer (SP). R13 is used by the PUSH and POP
instructions in T variants, and by the SRS and RFE instructions from ARMv6.

Link register

Register 14 is the Link Register (LR). This register holds the address of the next
instruction after a Branch and Link (BL or BLX) instruction, which is the instruction
used to make a subroutine call. It is also used for return address information on entry
to exception modes. At all other times, R14 can be used as a general-purpose
register.

Program counter

Register 15 is the Program Counter (PC). It can be used in most instructions as
a pointer to the instruction which is two instructions after the instruction being
executed. In ARM state, all ARM instructions are four bytes long (one 32-bit word)
and are always aligned on a word boundary. This means that the bottom two bits of
the PC are always zero, and therefore the PC contains only 30 non-constant bits.
Two other processor states are supported by some versions of the architecture.
Thumb® state is supported on T variants, and Jazelle® state on J variants. The PC can
be halfword (16-bit) and byte aligned respectively in these states.

The remaining 13 registers have no special hardware purpose. Their uses are defined purely by software.
For more details on registers, refer to Registers on page A2-4.
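
The Program Counter description above implies a fixed relationship between the address of an instruction and the value it sees when it reads R15. A minimal sketch, assuming instruction sizes of 4 bytes in ARM state and 2 bytes in Thumb state (the function and table names are illustrative, not part of the architecture):

```python
INSTRUCTION_SIZE = {"ARM": 4, "Thumb": 2}  # bytes per instruction in each state


def pc_read_value(instruction_address, state="ARM"):
    """Value seen when an instruction at `instruction_address` reads the PC.

    The PC points two instructions ahead of the instruction being executed.
    """
    return (instruction_address + 2 * INSTRUCTION_SIZE[state]) & 0xFFFFFFFF
```

In ARM state this gives the instruction address plus 8, and the result keeps its bottom two bits clear whenever the instruction itself is word-aligned.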

A1.1.2 Exceptions
ARM supports seven types of exception, each of which is handled in a privileged processing mode. The
seven types of exception are:
•
reset
•
attempted execution of an Undefined instruction
•
software interrupt (SWI) instructions, which can be used to make a call to an operating system
•
Prefetch Abort, an instruction fetch memory abort
•
Data Abort, a data access memory abort
•
IRQ, normal interrupt
•
FIQ, fast interrupt.
When an exception occurs, some of the standard registers are replaced with registers specific to the
exception mode. All exception modes have replacement banked registers for R13 and R14. The fast
interrupt mode has additional banked registers for fast interrupt processing.
When an exception handler is entered, R14 holds the return address for exception processing. This is used
to return after the exception is processed and to address the instruction that caused the exception.
Register 13 is banked across exception modes to provide each exception handler with a private stack pointer.
The fast interrupt mode also banks registers 8 to 12 so that interrupt processing can begin without the need
to save or restore these registers.
There is a sixth privileged processing mode, System mode, which uses the User mode registers. This is used
to run tasks that require privileged access to memory and/or coprocessors, without limitations on which
exceptions can occur during the task.
In addition to the above, reset shares the same privileged mode as SWIs.
For more details on exceptions, refer to Exceptions on page A2-16.

The exception process
When an exception occurs, the ARM processor halts execution in a defined manner and begins execution at
one of a number of fixed addresses in memory, known as the exception vectors. There is a separate vector
location for each exception, including reset. Behavior is defined for normal running systems (see section
A2.6) and debug events (see Chapter D3 Coprocessor 14, the Debug Coprocessor).
An operating system installs a handler on every exception at initialization. Privileged operating system tasks
are normally run in System mode to allow exceptions to occur within the operating system without state loss.
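
The relationship between exceptions, vectors, and modes can be summarized as a lookup table. This sketch is illustrative: the vector offsets and mode names are taken from the standard ARM exception model described in Chapter A2 rather than from this overview, and the function name and the `high_vectors` parameter (modelling the optional vector base of 0xFFFF0000) are assumptions.

```python
# Exception name -> (offset of its vector from the vector base, mode entered)
EXCEPTIONS = {
    "Reset": (0x00, "Supervisor"),
    "Undefined instruction": (0x04, "Undefined"),
    "Software interrupt": (0x08, "Supervisor"),
    "Prefetch Abort": (0x0C, "Abort"),
    "Data Abort": (0x10, "Abort"),
    "IRQ": (0x18, "IRQ"),
    "FIQ": (0x1C, "FIQ"),
}


def vector_address(exception, high_vectors=False):
    """Address at which execution begins when `exception` is taken."""
    base = 0xFFFF0000 if high_vectors else 0x00000000
    return base + EXCEPTIONS[exception][0]


def exception_modes():
    """The distinct privileged modes entered by exceptions."""
    return {mode for _, mode in EXCEPTIONS.values()}
```

The table makes the sharing visible: seven exceptions map onto five exception modes, with reset and SWI both entering Supervisor mode, and System mode is the sixth privileged mode that no exception enters.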

A1.1.3 Status registers
All processor state other than the general-purpose register contents is held in status registers. The current
operating processor status is in the Current Program Status Register (CPSR). The CPSR holds:
•
four condition code flags (Negative, Zero, Carry and oVerflow).
•
one sticky (Q) flag (ARMv5 and above only). This encodes whether saturation has occurred in
saturated arithmetic instructions, or signed overflow in some specific multiply accumulate
instructions.
•
four GE (Greater than or Equal) flags (ARMv6 and above only). These encode the following
conditions separately for each operation in parallel instructions:
—
whether the results of signed operations were non-negative
—
whether unsigned operations produced a carry or a borrow.
•
two interrupt disable bits, one for each type of interrupt (two in ARMv5 and below).
•
one imprecise abort mask (A) bit (from ARMv6).
•
five bits that encode the current processor mode.
•
two bits that encode whether ARM instructions, Thumb instructions, or Jazelle opcodes are being
executed.
•
one bit that controls the endianness of load and store operations (ARMv6 and above only).
Each exception mode also has a Saved Program Status Register (SPSR) which holds the CPSR of the task
immediately before the exception occurred. The CPSR and the SPSRs are accessed with special
instructions.
For more details on status registers, refer to Program status registers on page A2-11.
Table A1-1 Status register summary

Field      Description            Architecture
NZCV       Condition code flags   All
J          Jazelle state flag     5TEJ and above
GE[3:0]    SIMD condition flags   6
E          Endian Load/Store      6
A          Imprecise Abort Mask   6
I          IRQ Interrupt Mask     All
F          FIQ Interrupt Mask     All
T          Thumb state flag       4T and above
Mode[4:0]  Processor mode         All
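
As an illustration of how the status register fields are packed, the sketch below decodes a 32-bit CPSR value. The bit positions and mode encodings are those given in Program status registers in Chapter A2, not in this summary; the function name and the dictionary layout are assumptions made for the example.

```python
MODES = {
    0b10000: "User",
    0b10001: "FIQ",
    0b10010: "IRQ",
    0b10011: "Supervisor",
    0b10111: "Abort",
    0b11011: "Undefined",
    0b11111: "System",
}

# (J, T) -> instruction set state
STATES = {(0, 0): "ARM", (0, 1): "Thumb", (1, 0): "Jazelle"}


def decode_cpsr(cpsr):
    """Split a CPSR value into its named fields (illustrative model)."""
    def bit(n):
        return (cpsr >> n) & 1

    return {
        "N": bit(31), "Z": bit(30), "C": bit(29), "V": bit(28),
        "Q": bit(27),
        "GE": (cpsr >> 16) & 0xF,
        "E": bit(9), "A": bit(8), "I": bit(7), "F": bit(6),
        "state": STATES.get((bit(24), bit(5)), "reserved"),
        "mode": MODES.get(cpsr & 0x1F, "reserved"),
    }
```

For example, the value 0x600000D3 decodes as Z and C set, IRQ and FIQ both masked, ARM state, Supervisor mode.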

A1.2 ARM instruction set
The ARM instruction set can be divided into six broad classes of instruction:
•
Branch instructions
•
Data-processing instructions on page A1-7
•
Status register transfer instructions on page A1-8
•
Load and store instructions on page A1-8
•
Coprocessor instructions on page A1-10
•
Exception-generating instructions on page A1-10.
Most data-processing instructions and one type of coprocessor instruction can update the four condition
code flags in the CPSR (Negative, Zero, Carry and oVerflow) according to their result.
Almost all ARM instructions contain a 4-bit condition field. One value of this field specifies that the
instruction is executed unconditionally.
Fourteen other values specify conditional execution of the instruction. If the condition code flags indicate
that the corresponding condition is true when the instruction starts executing, it executes normally.
Otherwise, the instruction does nothing. The 14 available conditions allow:
•
tests for equality and non-equality
•
tests for <, <=, >, and >= inequalities, in both signed and unsigned arithmetic
•
each condition code flag to be tested individually.
The sixteenth value of the condition field encodes alternative instructions. These do not allow conditional
execution. Before ARMv5 these instructions were UNPREDICTABLE.
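
The condition field can be modelled as a table of tests on the four condition code flags. This is a sketch under stated assumptions: the encodings and mnemonics in the comments are those of the standard ARM condition field described in Chapter A3, and the function name and argument order are illustrative.

```python
def condition_passed(cond, n, z, c, v):
    """Evaluate a 4-bit condition field against the N, Z, C and V flags."""
    if cond == 0b1111:
        # The sixteenth value does not encode a condition at all.
        raise ValueError("0b1111 encodes alternative instructions, not a condition")
    tests = [
        z,                    # 0000 EQ  equal
        not z,                # 0001 NE  not equal
        c,                    # 0010 CS  unsigned higher or same
        not c,                # 0011 CC  unsigned lower
        n,                    # 0100 MI  negative
        not n,                # 0101 PL  positive or zero
        v,                    # 0110 VS  overflow
        not v,                # 0111 VC  no overflow
        c and not z,          # 1000 HI  unsigned higher
        (not c) or z,         # 1001 LS  unsigned lower or same
        n == v,               # 1010 GE  signed greater than or equal
        n != v,               # 1011 LT  signed less than
        (not z) and n == v,   # 1100 GT  signed greater than
        z or n != v,          # 1101 LE  signed less than or equal
        True,                 # 1110 AL  always (unconditional)
    ]
    return bool(tests[cond])
```

The first fourteen entries are the conditional tests, the fifteenth is the unconditional value, and the sixteenth encoding is rejected because it selects instructions that cannot be executed conditionally.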

A1.2.1 Branch instructions
As well as allowing many data-processing or load instructions to change control flow by writing the PC, a
standard Branch instruction is provided with a 24-bit signed word offset, allowing forward and backward
branches of up to 32MB.
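
The 32MB range follows from the arithmetic on the offset field. A minimal sketch, assuming the offset is sign-extended, shifted left by two bits to form a byte offset, and added to the PC value, which reads as the branch instruction's address plus 8 (the function name is illustrative):

```python
def branch_target(instruction_address, imm24):
    """Target of an ARM-state branch with a 24-bit signed word offset."""
    offset = imm24 & 0xFFFFFF
    if offset & 0x800000:          # sign-extend from 24 bits
        offset -= 1 << 24
    # Word offset -> byte offset, relative to the PC (instruction address + 8).
    return (instruction_address + 8 + (offset << 2)) & 0xFFFFFFFF
```

The largest forward byte offset is 0x7FFFFF * 4, which is 4 bytes short of 32MB, and the largest backward byte offset is exactly 32MB. An offset field of 0xFFFFFE (that is, -2 words) branches back to the branch instruction itself.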
There is a Branch and Link (BL) option that also preserves the address of the instruction after the branch in
R14, the LR. This provides a subroutine call which can be returned from by copying the LR into the PC.
There are also branch instructions which can switch instruction set, so that execution continues at the branch
target using the Thumb instruction set or Jazelle opcodes. Thumb support allows ARM code to call Thumb
subroutines, and ARM subroutines to return to a Thumb caller. Similar instructions in the Thumb instruction
set allow the corresponding Thumb → ARM switches. An overview of the Thumb instruction set is
provided in Chapter A6 The Thumb Instruction Set.
The BXJ instruction introduced with the J variant of ARMv5, and present in ARMv6, provides the
architected mechanism for entry to Jazelle state, and the associated assertion of the J flag in the CPSR.

A1.2.2 Data-processing instructions
The data-processing instructions perform calculations on the general-purpose registers. There are five types
of data-processing instructions:
•
Arithmetic/logic instructions
•
Comparison instructions
•
Single Instruction Multiple Data (SIMD) instructions
•
Multiply instructions on page A1-8
•
Miscellaneous Data Processing instructions on page A1-8.

Arithmetic/logic instructions
The following arithmetic/logic instructions share a common instruction format. These perform an arithmetic
or logical operation on up to two source operands, and write the result to a destination register. They can
also optionally update the condition code flags, based on the result.
Of the two source operands:
•
one is always a register
•
the other has two basic forms:
—
an immediate value
—
a register value, optionally shifted.
If the operand is a shifted register, the shift amount can be either an immediate value or the value of another
register. Five types of shift can be specified. Every arithmetic/logic instruction can therefore perform an
arithmetic/logic operation and a shift operation. As a result, ARM does not have dedicated shift instructions.
The Program Counter (PC) is a general-purpose register, and therefore arithmetic/logic instructions can
write their results directly to the PC. This allows easy implementation of a variety of jump instructions.
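
To illustrate how a shift is folded into an arithmetic/logic instruction, the sketch below models the five shift types on 32-bit values. The shift names (LSL, LSR, ASR, ROR, RRX) come from the addressing mode 1 description in Chapter A5 rather than from this overview. The model is deliberately simplified: it only accepts shift amounts of 1 to 31, does not compute the shifter carry-out, and its function names are assumptions.

```python
MASK = 0xFFFFFFFF


def shift(value, kind, amount=1, carry_in=0):
    """Apply one of the five shift types to a 32-bit value (amounts 1..31 only)."""
    value &= MASK
    if kind == "RRX":
        # Rotate right by one bit through the carry flag.
        return (carry_in << 31) | (value >> 1)
    if not 1 <= amount <= 31:
        raise ValueError("this simplified model only handles amounts 1..31")
    if kind == "LSL":
        return (value << amount) & MASK
    if kind == "LSR":
        return value >> amount
    if kind == "ASR":
        signed = value - (1 << 32) if value & 0x80000000 else value
        return (signed >> amount) & MASK
    if kind == "ROR":
        return ((value >> amount) | (value << (32 - amount))) & MASK
    raise ValueError(f"unknown shift type {kind!r}")


def add_with_shifted_register(rn, rm, kind, amount):
    """Model of an instruction such as ADD Rd, Rn, Rm, LSL #2."""
    return (rn + shift(rm, kind, amount)) & MASK
```

A single instruction can therefore compute, for example, a base address plus an index scaled by 4, which is why no separate shift instruction is needed.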

Comparison instructions
The comparison instructions use the same instruction format as the arithmetic/logic instructions. These
perform an arithmetic or logical operation on two source operands, but do not write the result to a register.
They always update the condition flags, based on the result.
The source operands of comparison instructions take the same forms as those of arithmetic/logic
instructions, including the ability to incorporate a shift operation.

Single Instruction Multiple Data (SIMD) instructions
The add and subtract instructions treat each operand as two parallel 16-bit numbers, or four parallel 8-bit
numbers. They can be treated as signed or unsigned. The operations can optionally be saturating, wrap
around, or the results can be halved to avoid overflow.
These instructions are available in ARMv6.
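
The three overflow treatments can be compared on a signed 16-bit example. This is an illustrative model of the behavior described above, not the manual's pseudo-code: the function names and the `mode` parameter are assumptions, and the GE flag updates are not modelled.

```python
def _signed16(halfword):
    """Interpret a 16-bit pattern as a signed value."""
    return halfword - 0x10000 if halfword & 0x8000 else halfword


def parallel_add16(a, b, mode="wrap"):
    """Add two 32-bit operands as two independent signed 16-bit lanes."""
    result = 0
    for lane in (0, 1):
        x = _signed16((a >> (16 * lane)) & 0xFFFF)
        y = _signed16((b >> (16 * lane)) & 0xFFFF)
        total = x + y
        if mode == "saturate":
            total = max(-0x8000, min(0x7FFF, total))  # clamp to the 16-bit range
        elif mode == "halve":
            total >>= 1                               # halved sum always fits
        elif mode != "wrap":
            raise ValueError(f"unknown mode {mode!r}")
        result |= (total & 0xFFFF) << (16 * lane)
    return result
```

Adding 0x7FFF0001 and 0x00010001 shows the difference in the upper lane: the wrap-around result is 0x8000, the saturated result stays at 0x7FFF, and the halved result is 0x4000, while the lower lane is unaffected by the upper lane's overflow.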

Multiply instructions
There are several classes of multiply instructions, introduced at different times into the architecture. See
Multiply instructions on page A3-10 for details.

Miscellaneous Data Processing instructions
These include Count Leading Zeros (CLZ) and Unsigned Sum of Absolute Differences with optional
Accumulate (USAD8 and USADA8).
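
A rough functional model of these operations, assuming 32-bit operands and four unsigned byte lanes (the Python function names are illustrative; see the instruction descriptions in Chapter A4 for the exact definitions):

```python
def clz(value):
    """Count Leading Zeros of a 32-bit value; returns 32 for zero."""
    return 32 - (value & 0xFFFFFFFF).bit_length()


def usada8(a, b, accumulator=0):
    """Sum of absolute differences of the four unsigned bytes of a and b,
    added to an accumulator. With accumulator=0 this models USAD8."""
    total = sum(
        abs(((a >> shift) & 0xFF) - ((b >> shift) & 0xFF))
        for shift in (0, 8, 16, 24)
    )
    return (accumulator + total) & 0xFFFFFFFF
```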

A1.2.3 Status register transfer instructions
The status register transfer instructions transfer the contents of the CPSR or an SPSR to or from a
general-purpose register. Writing to the CPSR can:
•
set the values of the condition code flags
•
set the values of the interrupt disable bits
•
set the processor mode and state
•
alter the endianness of Load and Store operations.

A1.2.4 Load and store instructions
The following load and store instructions are available:
•
Load and Store Register
•
Load and Store Multiple registers on page A1-9
•
Load and Store Register Exclusive on page A1-9.
There are also swap and swap byte instructions, but their use is deprecated in ARMv6. It is recommended
that all software migrates to using the load and store register exclusive instructions.

Load and Store Register
Load Register instructions can load a 64-bit doubleword, a 32-bit word, a 16-bit halfword, or an 8-bit byte
from memory into a register or registers. Byte and halfword loads can be automatically zero-extended or
sign-extended as they are loaded.
Store Register instructions can store a 64-bit doubleword, a 32-bit word, a 16-bit halfword, or an 8-bit byte
from a register or registers to memory.
From ARMv6, unaligned loads and stores of words and halfwords are supported, accessing the specified
byte addresses. Prior to ARMv6, unaligned 32-bit loads rotated data, all 32-bit stores were aligned, and the
other affected instructions were UNPREDICTABLE.

Load and Store Register instructions have three primary addressing modes, all of which use a base register
and an offset specified by the instruction:
•

In offset addressing, the memory address is formed by adding or subtracting an offset to or from the
base register value.

•

In pre-indexed addressing, the memory address is formed in the same way as for offset addressing.
As a side effect, the memory address is also written back to the base register.

•

In post-indexed addressing, the memory address is the base register value. As a side effect, an offset
is added to or subtracted from the base register value and the result is written back to the base register.

In each case, the offset can be either an immediate or the value of an index register. Register-based offsets
can also be scaled with shift operations.
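
The three addressing modes differ only in which value is used as the memory address and whether the base register is updated. A minimal sketch, with a hypothetical function name and a signed `offset` standing in for the add-or-subtract choice:

```python
MASK = 0xFFFFFFFF


def address_and_writeback(base, offset, mode):
    """Return (memory address used, base register value afterwards)."""
    updated = (base + offset) & MASK
    if mode == "offset":
        return updated, base       # base register is left unchanged
    if mode == "pre-indexed":
        return updated, updated    # address is written back to the base
    if mode == "post-indexed":
        return base, updated       # access at base, then update the base
    raise ValueError(f"unknown addressing mode {mode!r}")
```

With a base of 0x1000 and an offset of 4, offset addressing accesses 0x1004 and leaves the base alone, pre-indexed addressing accesses 0x1004 and leaves the base at 0x1004, and post-indexed addressing accesses 0x1000 and leaves the base at 0x1004.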
As the PC is a general-purpose register, a 32-bit value can be loaded directly into the PC to perform a jump
to any address in the 4GB memory space.

Load and Store Multiple registers
Load Multiple (LDM) and Store Multiple (STM) instructions perform a block transfer of any number of
the general-purpose registers to or from memory. Four addressing modes are provided:
•
pre-increment
•
post-increment
•
pre-decrement
•
post-decrement.
The base address is specified by a register value, which can be optionally updated after the transfer. As the
subroutine return address and PC values are in general-purpose registers, very efficient subroutine entry and
exit sequences can be constructed with LDM and STM:
•

A single STM instruction at subroutine entry can push register contents and the return address onto the
stack, updating the stack pointer in the process.

•

A single LDM instruction at subroutine exit can restore register contents from the stack, load the PC
with the return address, and update the stack pointer.

LDM and STM instructions also allow very efficient code for block copies and similar data movement
algorithms.
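
The four block-transfer addressing modes can be sketched as address calculations. The mode abbreviations used here (IA, IB, DA, DB for increment/decrement after/before) and the rule that lower-numbered registers use lower addresses come from the detailed LDM and STM descriptions, not from this overview; the function name is an assumption.

```python
def block_transfer_addresses(base, registers, mode):
    """Return ({register number: word address}, base value after writeback)."""
    count = len(registers)
    start = {
        "IA": base,                   # post-increment
        "IB": base + 4,               # pre-increment
        "DA": base - 4 * count + 4,   # post-decrement
        "DB": base - 4 * count,       # pre-decrement
    }[mode]
    new_base = base + 4 * count if mode in ("IA", "IB") else base - 4 * count
    # The lowest-numbered register is always transferred at the lowest address.
    addresses = {reg: start + 4 * i for i, reg in enumerate(sorted(registers))}
    return addresses, new_base
```

As a usage example, a subroutine entry that stores R4, R5, and R14 with pre-decrement from a stack pointer of 0x1000 places them at 0xFF4, 0xFF8, and 0xFFC and leaves the stack pointer at 0xFF4. The matching exit, a post-increment load of R4, R5, and R15 from 0xFF4, reads the same three words, so the saved return address lands in the PC and the stack pointer returns to 0x1000.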

Load and Store Register Exclusive
These instructions support cooperative memory synchronization. They are designed to provide the atomic
behavior required for semaphores without locking all system resources between the load and store phases.
See LDREX on page A4-52 and STREX on page A4-202 for details.
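
The load/store exclusive pairing can be pictured with a toy model. This is a heavily simplified sketch, not the architected monitor behavior described in Synchronization primitives in Chapter A2: the class, its one-tag-per-processor bookkeeping, and the `try_lock` helper are all assumptions made for illustration. The only architectural detail relied on is that the store exclusive reports 0 on success and 1 on failure.

```python
class ExclusiveMonitorModel:
    """Toy model: one tagged address per processor, no timing or granule details."""

    def __init__(self):
        self.memory = {}
        self.tagged = {}  # processor id -> address tagged by its last load exclusive

    def ldrex(self, cpu, address):
        """Load a word and tag the address for this processor."""
        self.tagged[cpu] = address
        return self.memory.get(address, 0)

    def strex(self, cpu, address, value):
        """Return 0 if the store happened, 1 if it was refused."""
        if self.tagged.get(cpu) != address:
            return 1
        self.memory[address] = value
        # A successful store clears every processor's tag on this address.
        self.tagged = {c: a for c, a in self.tagged.items() if a != address}
        return 0


def try_lock(monitor, cpu, address):
    """Attempt to take a semaphore held as 0 (free) or 1 (taken)."""
    if monitor.ldrex(cpu, address) != 0:
        return False
    return monitor.strex(cpu, address, 1) == 0
```

If two processors both load the semaphore exclusively and then both attempt the store, only the first store succeeds; the second is refused and that processor must retry, without any bus-wide lock having been held in between.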

A1.2.5 Coprocessor instructions
There are three types of coprocessor instructions:
Data-processing instructions
These start a coprocessor-specific internal operation.
Data transfer instructions
These transfer coprocessor data to or from memory. The address of the transfer is calculated
by the ARM processor.
Register transfer instructions
These allow a coprocessor value to be transferred to or from an ARM register, or a pair of
ARM registers.

A1.2.6 Exception-generating instructions
Two types of instruction are designed to cause specific exceptions to occur.
Software interrupt instructions
SWI instructions cause a software interrupt exception to occur. These are normally used to
make calls to an operating system, to request an OS-defined service. The exception entry
caused by a SWI instruction also changes to a privileged processor mode. This allows an
unprivileged task to gain access to privileged functions, but only in ways permitted by the
OS.
Software breakpoint instructions
BKPT instructions cause an abort exception to occur. If suitable debugger software is installed
on the abort vector, an abort exception generated in this fashion is treated as a breakpoint.
If debug hardware is present in the system, it can instead treat a BKPT instruction directly as
a breakpoint, preventing the abort exception from occurring.

In addition to the above, the following types of instruction cause an Undefined Instruction exception to
occur:
•
coprocessor instructions which are not recognized by any hardware coprocessor
•
most instruction words that have not yet been allocated a meaning as an ARM instruction.
In each case, this exception is normally used either to generate a suitable error or to initiate software
emulation of the instruction.

A1.3 Thumb instruction set
The Thumb instruction set is a subset of the ARM instruction set, with each instruction encoded in 16 bits
instead of 32 bits. For details see Chapter A6 The Thumb Instruction Set.

Chapter A2
Programmers’ Model

This chapter introduces the ARM® Programmers’ Model. It contains the following sections:
• Data types
• Processor modes
• Registers
• General-purpose registers
• Program status registers
• Exceptions
• Endian support
• Unaligned access support
• Synchronization primitives
• The Jazelle Extension
• Saturated integer arithmetic.

A2.1 Data types
ARM processors support the following data types:

Byte        8 bits
Halfword    16 bits
Word        32 bits

Note
• Support for halfwords was introduced in version 4.

• ARMv6 has introduced unaligned data support for words and halfwords. See Unaligned access
  support on page A2-38 for more information.

• When any of these types is described as unsigned, the N-bit data value represents a non-negative
  integer in the range 0 to $+2^{N}-1$, using normal binary format.

• When any of these types is described as signed, the N-bit data value represents an integer in the range
  $-2^{N-1}$ to $+2^{N-1}-1$, using two's complement format.

• Most data operations, for example ADD, are performed on word quantities. Long multiplies support
  64-bit results with or without accumulation. ARMv5TE introduced some halfword multiply
  operations. ARMv6 introduced a variety of Single Instruction Multiple Data (SIMD) instructions
  operating on two halfwords or four bytes in parallel.

• Load and store operations can transfer bytes, halfwords, or words to and from memory, automatically
  zero-extending or sign-extending bytes or halfwords as they are loaded. Load and store operations
  that transfer two or more words to and from memory are also provided.

• ARM instructions are exactly one word and are aligned on a four-byte boundary. Thumb® instructions
  are exactly one halfword and are aligned on a two-byte boundary. Jazelle® opcodes are a variable
  number of bytes in length and can appear at any byte alignment.
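The value ranges and load extension behavior described in the notes above can be checked with a small model. The following Python sketch is illustrative only and is not part of the architecture definition. The function names are assumptions chosen for this example.

```python
def unsigned_range(n_bits):
    """Range of an unsigned N-bit value: 0 to 2**N - 1."""
    return 0, (1 << n_bits) - 1


def signed_range(n_bits):
    """Range of a two's complement N-bit value: -2**(N-1) to 2**(N-1) - 1."""
    return -(1 << (n_bits - 1)), (1 << (n_bits - 1)) - 1


def zero_extend(value, n_bits):
    """Zero-extend an N-bit value to a word, as an unsigned load does."""
    return value & ((1 << n_bits) - 1)


def sign_extend(value, n_bits):
    """Sign-extend an N-bit value to a 32-bit word, as a signed load does."""
    value &= (1 << n_bits) - 1
    if value & (1 << (n_bits - 1)):   # sign bit of the N-bit value is set
        value -= 1 << n_bits
    return value & 0xFFFFFFFF


for name, bits in (("Byte", 8), ("Halfword", 16), ("Word", 32)):
    print(name, unsigned_range(bits), signed_range(bits))
```

For example, a byte holding 0x80 loads as 0x00000080 when zero-extended and as 0xFFFFFF80 when sign-extended.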

A2.2 Processor modes
The ARM architecture supports the seven processor modes shown in Table A2-1.

Table A2-1 ARM processor modes

Processor mode      Mode number   Description
User        usr     0b10000       Normal program execution mode
FIQ         fiq     0b10001       Supports a high-speed data transfer or channel process
IRQ         irq     0b10010       Used for general-purpose interrupt handling
Supervisor  svc     0b10011       A protected mode for the operating system
Abort       abt     0b10111       Implements virtual memory and/or memory protection
Undefined   und     0b11011       Supports software emulation of hardware coprocessors
System      sys     0b11111       Runs privileged operating system tasks (ARMv4 and above)

Mode changes can be made under software control, or can be caused by external interrupts or exception
processing.
Most application programs execute in User mode. When the processor is in User mode, the program being
executed is unable to access some protected system resources or to change mode, other than by causing an
exception to occur (see Exceptions on page A2-16). This allows a suitably-written operating system to
control the use of system resources.
The modes other than User mode are known as privileged modes. They have full access to system resources
and can change mode freely. Five of them are known as exception modes:
• FIQ
• IRQ
• Supervisor
• Abort
• Undefined.
These are entered when specific exceptions occur. Each of them has some additional registers to avoid
corrupting User mode state when the exception occurs (see Registers on page A2-4 for details).
The remaining mode is System mode, which is not entered by any exception and has exactly the same
registers available as User mode. However, it is a privileged mode and is therefore not subject to the User
mode restrictions. It is intended for use by operating system tasks that need access to system resources, but
wish to avoid using the additional registers associated with the exception modes. Avoiding such use ensures
that the task state is not corrupted by the occurrence of any exception.
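The classification of modes described above can be summarized in a short model. This Python sketch is illustrative only. The mode numbers come from Table A2-1, while the dictionary and function names are assumptions made for this example.

```python
# Mode number -> (mode name, abbreviation), as listed in Table A2-1.
MODES = {
    0b10000: ("User", "usr"),
    0b10001: ("FIQ", "fiq"),
    0b10010: ("IRQ", "irq"),
    0b10011: ("Supervisor", "svc"),
    0b10111: ("Abort", "abt"),
    0b11011: ("Undefined", "und"),
    0b11111: ("System", "sys"),
}

# The five modes that are entered when specific exceptions occur.
EXCEPTION_MODES = {"fiq", "irq", "svc", "abt", "und"}


def is_privileged(mode_number):
    """Every mode other than User mode is a privileged mode."""
    return MODES[mode_number][1] != "usr"


def is_exception_mode(mode_number):
    """True for FIQ, IRQ, Supervisor, Abort and Undefined modes."""
    return MODES[mode_number][1] in EXCEPTION_MODES


for number, (name, abbreviation) in MODES.items():
    print(f"{name:<10} {abbreviation} {number:#07b} "
          f"privileged={is_privileged(number)} "
          f"exception={is_exception_mode(number)}")
```

System mode is the one mode that is privileged without being an exception mode.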

A2.3 Registers
The ARM processor has a total of 37 registers:

• Thirty-one general-purpose registers, including a program counter. These registers are 32 bits wide
  and are described in General-purpose registers on page A2-6.

• Six status registers. These registers are also 32 bits wide, but only some of the 32 bits are allocated
  or need to be implemented. The subset depends on the architecture variant supported. These are
  described in Program status registers on page A2-11.

Registers are arranged in partially overlapping banks, with the current processor mode controlling which
bank is available, as shown in Figure A2-1 on page A2-5. At any time, 15 general-purpose registers (R0 to
R14), one or two status registers, and the program counter are visible. Each column of Figure A2-1 on
page A2-5 shows which general-purpose and status registers are visible in the indicated processor mode.
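The banking arrangement can be made concrete by mapping each (register number, mode) pair to a physical register name and counting the distinct names. The following Python sketch is illustrative only. It uses the unsuffixed names R0 to R14 for the User and System mode registers, as Figure A2-1 does, and the function names are assumptions made for this example.

```python
EXCEPTION_MODES = ("svc", "abt", "und", "irq", "fiq")
ALL_MODES = ("usr", "sys") + EXCEPTION_MODES


def physical_register(n, mode):
    """Name the physical register that Rn refers to in the given mode."""
    if not 0 <= n <= 15:
        raise ValueError("register number must be 0 to 15")
    if mode not in ALL_MODES:
        raise ValueError("unknown mode")
    if n == 15:
        return "PC"                      # one program counter for all modes
    if mode == "fiq" and 8 <= n <= 14:
        return f"R{n}_fiq"               # FIQ mode banks R8 to R14
    if mode in EXCEPTION_MODES and n in (13, 14):
        return f"R{n}_{mode}"            # other exception modes bank R13, R14
    return f"R{n}"


def status_registers(mode):
    """Status registers visible in the given mode."""
    if mode in EXCEPTION_MODES:
        return ["CPSR", f"SPSR_{mode}"]
    return ["CPSR"]


general = {physical_register(n, m) for m in ALL_MODES for n in range(16)}
status = {s for m in ALL_MODES for s in status_registers(m)}
print(len(general), len(status), len(general) + len(status))  # 31 6 37
```

In every mode the model yields 16 general-purpose register names (R0 to R14 and the PC) and one or two status registers, which matches the visibility rule stated above.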

                                        Modes
            |--------------------------- Privileged modes ---------------------------|
                        |--------------------- Exception modes ----------------------|
User        System      Supervisor  Abort       Undefined   Interrupt   Fast interrupt
R0          R0          R0          R0          R0          R0          R0
R1          R1          R1          R1          R1          R1          R1
R2          R2          R2          R2          R2          R2          R2
R3          R3          R3          R3          R3          R3          R3
R4          R4          R4          R4          R4          R4          R4
R5          R5          R5          R5          R5          R5          R5
R6          R6          R6          R6          R6          R6          R6
R7          R7          R7          R7          R7          R7          R7
R8          R8          R8          R8          R8          R8          R8_fiq
R9          R9          R9          R9          R9          R9          R9_fiq
R10         R10         R10         R10         R10         R10         R10_fiq
R11         R11         R11         R11         R11         R11         R11_fiq
R12         R12         R12         R12         R12         R12         R12_fiq
R13         R13         R13_svc     R13_abt     R13_und     R13_irq     R13_fiq
R14         R14         R14_svc     R14_abt     R14_und     R14_irq     R14_fiq
PC          PC          PC          PC          PC          PC          PC

CPSR        CPSR        CPSR        CPSR        CPSR        CPSR        CPSR
                        SPSR_svc    SPSR_abt    SPSR_und    SPSR_irq    SPSR_fiq

A register name with a mode suffix (for example R13_svc) indicates that the normal register used by
User or System mode has been replaced by an alternative register specific to the exception mode.

Figure A2-1 Register organization

A2.4 General-purpose registers
The general-purpose registers R0 to R15 can be split into three groups. These groups differ in the way they
are banked and in their special-purpose uses:
• The unbanked registers, R0 to R7
• The banked registers, R8 to R14
• Register 15, the PC, is described in Register 15 and the program counter on page A2-9.

A2.4.1 The unbanked registers, R0 to R7
Registers R0 to R7 are unbanked registers. This means that each of them refers to the same 32-bit physical
register in all processor modes. They are completely general-purpose registers, with no special uses implied
by the architecture, and can be used wherever an instruction allows a general-purpose register to be
specified.

A2.4.2 The banked registers, R8 to R14
Registers R8 to R14 are banked registers. The physical register referred to by each of them depends on the
current processor mode. Where a particular physical register is intended, without depending on the current
processor mode, a more specific name (as described below) is used. Almost all instructions allow the banked
registers to be used wherever a general-purpose register is allowed.

Note
There are a few exceptions to this rule for processors pre-ARMv6, and they are noted in the individual
instruction descriptions. Where a restriction exists on the use of banked registers, it always applies to all of
R8 to R14. For example, R8 to R12 are subject to such restrictions even in systems in which FIQ mode is
never used and so only one physical version of the register is ever in use.

Registers R8 to R12 have two banked physical registers each. One is used in all processor modes other than
FIQ mode, and the other is used in FIQ mode. Where it is necessary to be specific about which version is
being referred to, the first group of physical registers are referred to as R8_usr to R12_usr and the second
group as R8_fiq to R12_fiq.
Registers R8 to R12 do not have any dedicated special purposes in the architecture. However, for interrupts
that are simple enough to be processed using registers R8 to R14 only, the existence of separate FIQ mode
versions of these registers allows very fast interrupt processing.
Registers R13 and R14 have six banked physical registers each. One is used in User and System modes, and
each of the remaining five is used in one of the five exception modes. Where it is necessary to be specific
about which version is being referred to, you use names of the form:

    R13_<mode>
    R14_<mode>

where <mode> is the appropriate one of usr, svc (for Supervisor mode), abt, und, irq and fiq.

Register R13 is normally used as a stack pointer and is also known as the SP. The SRS instruction, introduced
in ARMv6, is the only ARM instruction that uses R13 in a special-case manner. There are other such
instructions in the Thumb instruction set, as described in Chapter A6 The Thumb Instruction Set.
Each exception mode has its own banked version of R13. Suitable uses for these banked versions of R13
depend on the architecture version:
• In architecture versions earlier than ARMv6, each banked version of R13 will normally be initialized
  to point to a stack dedicated to that exception mode. On entry, the exception handler typically stores
  the values of other registers that it wants to use on this stack. By reloading these values into the
  registers when it returns, the exception handler can ensure that it does not corrupt the state of the
  program that was being executed when the exception occurred.
  If fewer exception-handling stacks are desired in a system than this implies, it is possible instead to
  initialize the banked version of R13 for an exception mode to point to a small area of memory that is
  used for temporary storage while transferring to another exception mode and its stack. For example,
  suppose that there is a requirement for an IRQ handler to use the Supervisor mode stack to store
  SPSR_irq, R0 to R3, R12, R14_irq, and then to execute in Supervisor mode with IRQs enabled. This
  can be achieved by initializing R13_irq to point to a four-word temporary storage area, and using the
  following code sequence on entry to the handler:

      STMIA   R13, {R0-R3}        ; Put R0-R3 into temporary storage
      MRS     R0, SPSR            ; Move banked SPSR and R12-R14 into
      MOV     R1, R12             ; unbanked registers
      MOV     R2, R13
      MOV     R3, R14
      MRS     R12, CPSR           ; Use read/modify/write sequence
      BIC     R12, R12, #0x1F     ; on CPSR to switch to Supervisor
      ORR     R12, R12, #0x13     ; mode
      MSR     CPSR_c, R12
      STMFD   R13!, {R1,R3}       ; Push original {R12, R14_irq}, then
      STR     R0, [R13,#-20]!     ; SPSR_irq with a gap for R0-R3
      LDMIA   R2, {R0-R3}         ; Reload R0-R3 from temporary storage
      BIC     R12, R12, #0x80     ; Modify and write CPSR again to
      MSR     CPSR_c, R12         ; re-enable IRQs
      STMIB   R13, {R0-R3}        ; Store R0-R3 in the gap left on the
                                  ; stack for them

• In ARMv6 and above, it is recommended that the OS designer should decide how many
  exception-handling stacks are required in the system, and select a suitable processor mode in which
  to handle the exceptions that use each stack. For example, one exception-handling stack might be
  required to be locked into real memory and be used for aborts and high-priority interrupts, while
  another could use virtual memory and be used for SWIs, Undefined instructions and low-priority
  interrupts. Suitable processor modes in this example might be Abort mode and Supervisor mode
  respectively.
  The banked version of R13 for each of the selected modes is then initialized to point to the
  corresponding stack, and the other banked versions of R13 are normally not used. Each exception
  handler starts with an SRS instruction to store the exception return information to the appropriate
  stack, followed (if necessary) by a CPS instruction to switch to the appropriate mode and possibly
  re-enable interrupts, after which other registers can be saved on that stack. So in the above example,
  an Undefined Instruction handler that wants to re-enable interrupts immediately would start with the
  following two instructions:

      SRSFD   #svc_mode!
      CPSIE   i, #svc_mode

  The handler can then operate entirely in Supervisor mode, using the virtual memory stack pointed to
  by R13_svc.
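The CPSR manipulation and the stack frame built by the pre-ARMv6 entry sequence above can be traced with a small model. The following Python sketch is illustrative only. The constant and function names, and the example stack pointer value, are assumptions made for this example.

```python
MODE_MASK = 0x1F   # M[4:0] field of the CPSR
MODE_SVC = 0x13    # Supervisor mode number
I_BIT = 0x80       # IRQ disable bit


def switch_to_supervisor(cpsr):
    """BIC R12, R12, #0x1F followed by ORR R12, R12, #0x13."""
    return (cpsr & ~MODE_MASK & 0xFFFFFFFF) | MODE_SVC


def enable_irqs(cpsr):
    """BIC R12, R12, #0x80 clears the I bit to re-enable IRQs."""
    return cpsr & ~I_BIT & 0xFFFFFFFF


def supervisor_frame(sp):
    """Addresses used on the Supervisor stack, given R13_svc on entry."""
    sp -= 8                                   # STMFD R13!, {R1,R3}
    frame = {"R12": sp, "R14_irq": sp + 4}
    sp -= 20                                  # STR R0, [R13,#-20]!
    frame["SPSR_irq"] = sp
    for i in range(4):                        # STMIB R13, {R0-R3}
        frame[f"R{i}"] = sp + 4 * (i + 1)
    return sp, frame


cpsr = 0x60000092                 # Z and C set, IRQs disabled, IRQ mode
cpsr = switch_to_supervisor(cpsr)
print(hex(cpsr))                  # 0x60000093
print(hex(enable_irqs(cpsr)))     # 0x60000013

final_sp, frame = supervisor_frame(0x8000)
for name, address in sorted(frame.items(), key=lambda item: item[1]):
    print(f"{address:#x}  {name}")
```

The resulting frame is seven contiguous words: SPSR_irq at the final stack pointer, then R0 to R3, R12 and R14_irq at increasing addresses.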
Register R14 (also known as the Link Register or LR) has two special functions in the architecture:
• In each mode, the mode's own version of R14 is used to hold subroutine return addresses. When a
  subroutine call is performed by a BL or BLX instruction, R14 is set to the subroutine return address. The
  subroutine return is performed by copying R14 back to the program counter. This is typically done
  in one of the two following ways:

  — Execute a BX LR instruction.

    Note
    An MOV PC,LR instruction will perform the same function as BX LR if the code to which it returns
    uses the current instruction set, but will not return correctly from an ARM subroutine called
    by Thumb code, or from a Thumb subroutine called by ARM code. The use of MOV PC,LR
    instructions for subroutine return is therefore deprecated.

  — On subroutine entry, store R14 to the stack with an instruction of the form:

        STMFD SP!,{<registers>,LR}

    and use a matching instruction to return:

        LDMFD SP!,{<registers>,PC}

• When an exception occurs, the appropriate exception mode's version of R14 is set to the exception
  return address (offset by a small constant for some exceptions). The exception return is performed in
  a similar way to a subroutine return, but using slightly different instructions to ensure full restoration
  of the state of the program that was being executed when the exception occurred. See Exceptions on
  page A2-16 for more details.

Register R14 can be treated as a general-purpose register at all other times.

Note
When nested exceptions are possible, the two special-purpose uses might conflict. For example, if an IRQ
interrupt occurs when a program is being executed in User mode, none of the User mode registers are
necessarily corrupted. But if an interrupt handler running in IRQ mode re-enables IRQ interrupts and a
nested IRQ interrupt occurs, any value the outer interrupt handler is holding in R14_irq at the time is
overwritten by the return address of the nested interrupt.
System programmers need to be careful about such interactions. The usual way to deal with them is to
ensure that the appropriate version of R14 does not hold anything significant at times when nested
exceptions can occur. When this is hard to do in a straightforward way, it is usually best to change to another
processor mode during entry to the exception handler, before re-enabling interrupts or otherwise allowing
nested exceptions to occur. (In ARMv4 and above, System mode is often the best mode to use for this
purpose.)

A2.4.3 Register 15 and the program counter
Register 15 (R15) is often used in place of the other general-purpose registers to produce various
special-case effects. These are instruction-specific and so are described in the individual instruction
descriptions.
There are also many instruction-specific restrictions on the use of R15. These are also noted in the individual
instruction descriptions. Usually, the instruction is UNPREDICTABLE if R15 is used in a manner that breaks
these restrictions.
If an instruction description neither describes a special-case effect when R15 is used nor places restrictions
on its use, R15 is used to read or write the Program Counter (PC), as described in:
• Reading the program counter
• Writing the program counter on page A2-10.

Reading the program counter
When an instruction reads the PC, the value read depends on which instruction set it comes from:

• For an ARM instruction, the value read is the address of the instruction plus 8 bytes. Bits [1:0] of this
  value are always zero, because ARM instructions are always word-aligned.

• For a Thumb instruction, the value read is the address of the instruction plus 4 bytes. Bit [0] of this
  value is always zero, because Thumb instructions are always halfword-aligned.

This way of reading the PC is primarily used for quick, position-independent addressing of nearby
instructions and data, including position-independent branching within a program.
An exception to the above rule occurs when an ARM STR or STM instruction stores R15. Such instructions
can store either the address of the instruction plus 8 bytes, like other instructions that read R15, or the
address of the instruction plus 12 bytes. Whether the offset of 8 or the offset of 12 is used is
IMPLEMENTATION DEFINED. An implementation must use the same offset for all ARM STR and STM
instructions that store R15. It cannot use 8 for some of them and 12 for others.
Because of this exception, it is usually best to avoid the use of STR and STM instructions that store R15. If this
is difficult, use a suitable instruction sequence in the program to ascertain which offset the implementation
uses. For example, if R0 points to an available word of memory, then the following instructions put the offset
of the implementation in R0:
    SUB     R1, PC, #4      ; R1 = address of following STR instruction
    STR     PC, [R0]        ; Store address of STR instruction + offset,
    LDR     R0, [R0]        ; then reload it
    SUB     R0, R0, R1      ; Calculate the offset as the difference
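The arithmetic behind this sequence can be traced with a small model. The following Python sketch is illustrative only. It simulates the four instructions for an implementation whose stores of R15 use a given offset. The function names and the example address are assumptions made for this example.

```python
def pc_read_arm(instruction_address):
    """Value read from the PC by an ARM instruction: its address plus 8."""
    return (instruction_address + 8) & 0xFFFFFFFF


def pc_read_thumb(instruction_address):
    """Value read from the PC by a Thumb instruction: its address plus 4."""
    return (instruction_address + 4) & 0xFFFFFFFF


def detect_store_offset(sub_address, implementation_offset):
    """Trace the SUB/STR/LDR/SUB sequence that starts at sub_address."""
    if implementation_offset not in (8, 12):
        raise ValueError("the offset is either 8 or 12")
    str_address = sub_address + 4
    r1 = pc_read_arm(sub_address) - 4             # SUB R1, PC, #4
    stored = str_address + implementation_offset  # STR PC, [R0]
    r0 = stored                                   # LDR R0, [R0]
    return r0 - r1                                # SUB R0, R0, R1


print(detect_store_offset(0x8000, 8))    # 8
print(detect_store_offset(0x8000, 12))   # 12
```

Because R1 holds the address of the STR instruction itself, the final subtraction leaves only the implementation's offset in R0.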

Note
The rules about how R15 is read apply only to reads by instructions. In particular, they do not necessarily
describe the values placed on a hardware address bus during instruction fetches. Like all other details of
hardware interfaces, such values are IMPLEMENTATION DEFINED.

Writing the program counter
When an instruction writes the PC, the normal result is that the value written to the PC is treated as an
instruction address and a branch occurs to that address.
Since ARM instructions are required to be word-aligned, values they write to the PC are normally expected
to have bits[1:0] == 0b00. Similarly, Thumb instructions are required to be halfword-aligned and so values
they write to the PC are normally expected to have bit[0] == 0.
The precise rules depend on the current instruction set state and the architecture version:
• In T variants of ARMv4 and above, including all variants of ARMv6 and above, bit[0] of a value
  written to R15 in Thumb state is ignored unless the instruction description says otherwise. If bit[0]
  of the PC is implemented (which depends on whether and how the Jazelle Extension is implemented),
  then zero must be written to it regardless of the value written to bit[0] of R15.

• In ARMv6 and above, bits[1:0] of a value written to R15 in ARM state are ignored unless the
  instruction description says otherwise. Bit[1] of the PC must be written as zero regardless of the value
  written to bit[1] of R15. If bit[0] of the PC is implemented (which depends on how the Jazelle
  Extension is implemented), then zero must be written to it.

• In all variants of ARMv4 and ARMv5, bits[1:0] of a value written to R15 in ARM state must be 0b00.
  If they are not, the results are UNPREDICTABLE.

Several instructions have their own rules for interpreting values written to R15. For example, BX and other
instructions designed to transfer between ARM and Thumb states use bit[0] of the value to select whether
to execute the code at the destination address in ARM state or Thumb state. Special rules of this type are
described on the individual instruction pages, and override the general rules in this section.
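The general rules above can be expressed as a small decision function. The following Python sketch is illustrative only. It models UNPREDICTABLE results as None, assumes that Thumb state is implemented, and simplifies BX to the bit[0] rule described in this section. The individual instruction pages remain the authority. The function names are assumptions made for this example.

```python
def write_pc(value, state, architecture_version):
    """Branch target for a general write to R15, or None if UNPREDICTABLE.

    state is "arm" or "thumb"; architecture_version is 4, 5 or 6.
    """
    value &= 0xFFFFFFFF
    if state == "thumb":
        return value & 0xFFFFFFFE        # bit[0] is ignored
    if architecture_version >= 6:
        return value & 0xFFFFFFFC        # bits[1:0] are ignored
    if value & 0b11:
        return None                      # ARMv4, ARMv5: must be 0b00
    return value


def bx_target(value):
    """BX uses bit[0] of the value to select the destination state."""
    state = "thumb" if value & 1 else "arm"
    return value & 0xFFFFFFFE, state


print(write_pc(0x8002, "arm", 6))   # 32768, that is 0x8000
print(write_pc(0x8002, "arm", 5))   # None
print(bx_target(0x9001))            # (36864, 'thumb')
```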

A2.5 Program status registers
The Current Program Status Register (CPSR) is accessible in all processor modes. It contains condition
code flags, interrupt disable bits, the current processor mode, and other status and control information. Each
exception mode also has a Saved Program Status Register (SPSR), that is used to preserve the value of the
CPSR when the associated exception occurs.

Note
User mode and System mode do not have an SPSR, because they are not exception modes. All instructions
that read or write the SPSR are UNPREDICTABLE when executed in User mode or System mode.

The format of the CPSR and the SPSRs is shown below.

Bit(s)  31  30  29  28  27  26:25  24  23:20     19:16    15:10     9  8  7  6  5  4:0
Field   N   Z   C   V   Q   Res    J   RESERVED  GE[3:0]  RESERVED  E  A  I  F  T  M[4:0]

A2.5.1 Types of PSR bits
PSR bits fall into four categories, depending on the way in which they can be updated:

Reserved bits
    Reserved for future expansion. Implementations must read these bits as 0 and ignore
    writes to them. For maximum compatibility with future extensions to the
    architecture, they must be written with values read from the same bits.

User-writable bits
    Can be written from any mode. The N, Z, C, V, Q, GE[3:0], and E bits are
    user-writable.

Privileged bits
    Can be written from any privileged mode. Writes to privileged bits in User mode are
    ignored. The A, I, F, and M[4:0] bits are privileged.

Execution state bits
    Can be written from any privileged mode. Writes to execution state bits in User
    mode are ignored. The J and T bits are execution state bits, and are always zero in
    ARM state.
    Privileged MSR instructions that write to the CPSR execution state bits must write
    zeros to them, in order to avoid changing them. If ones are written to either or both
    of them, the resulting behavior is UNPREDICTABLE. This restriction applies only to
    the CPSR execution state bits, not the SPSR execution state bits.
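The PSR layout and the write categories above can be captured as bit masks. The following Python sketch is illustrative only. It assumes the ARMv6 layout, leaves the reserved and execution state bits unchanged on every write, and does not model UNPREDICTABLE behavior. All names are assumptions made for this example.

```python
# Field name -> (highest bit, lowest bit) in the ARMv6 PSR format.
FIELDS = {
    "N": (31, 31), "Z": (30, 30), "C": (29, 29), "V": (28, 28),
    "Q": (27, 27), "J": (24, 24), "GE": (19, 16), "E": (9, 9),
    "A": (8, 8), "I": (7, 7), "F": (6, 6), "T": (5, 5), "M": (4, 0),
}


def field_mask(name):
    """Bit mask covering the named PSR field."""
    high, low = FIELDS[name]
    return ((1 << (high - low + 1)) - 1) << low


def decode_psr(psr):
    """Split a PSR value into its named fields."""
    return {name: (psr & field_mask(name)) >> low
            for name, (high, low) in FIELDS.items()}


USER_WRITABLE = sum(field_mask(n) for n in ("N", "Z", "C", "V", "Q", "GE", "E"))
PRIVILEGED = sum(field_mask(n) for n in ("A", "I", "F", "M"))
EXECUTION_STATE = field_mask("J") | field_mask("T")


def write_cpsr(old, new, privileged):
    """CPSR value after a write, applying the category rules."""
    writable = USER_WRITABLE
    if privileged:
        writable |= PRIVILEGED
    return ((old & ~writable) | (new & writable)) & 0xFFFFFFFF


print(hex(USER_WRITABLE), hex(PRIVILEGED), hex(EXECUTION_STATE))
print(hex(write_cpsr(0x10, 0xF00000D3, privileged=False)))  # 0xf0000010
print(hex(write_cpsr(0x10, 0xF00000D3, privileged=True)))   # 0xf00000d3
```

In the unprivileged case the flags change but the interrupt disable and mode bits keep their old values, which is how writes to privileged bits are ignored in User mode.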

A2.5.2 The condition code flags
The N, Z, C, and V (Negative, Zero, Carry and oVerflow) bits are collectively known as the condition code
flags, often referred to as flags. The condition code flags in the CPSR can be tested by most instructions to
determine whether the instruction is to be executed.

The condition code flags are usually modified by:

• Execution of a comparison instruction (CMN, CMP, TEQ or TST).

• Execution of some other arithmetic, logical or move instruction, where the destination register of the
  instruction is not R15. Most of these instructions have both a flag-preserving and a flag-setting
  variant, with the latter being selected by adding an S qualifier to the instruction mnemonic. Some of
  these instructions only have a flag-preserving version. This is noted in the individual instruction
  descriptions.

In either case, the new condition code flags (after the instruction has been executed) usually mean:

N   Is set to bit 31 of the result of the instruction. If this result is regarded as a two's complement
    signed integer, then N = 1 if the result is negative and N = 0 if it is positive or zero.

Z   Is set to 1 if the result of the instruction is zero (this often indicates an equal result from a
    comparison), and to 0 otherwise.

C   Is set in one of four ways:

    • For an addition, including the comparison instruction CMN, C is set to 1 if the addition
      produced a carry (that is, an unsigned overflow), and to 0 otherwise.

    • For a subtraction, including the comparison instruction CMP, C is set to 0 if the
      subtraction produced a borrow (that is, an unsigned underflow), and to 1 otherwise.

    • For non-addition/subtractions that incorporate a shift operation, C is set to the last bit
      shifted out of the value by the shifter.

    • For other non-addition/subtractions, C is normally left unchanged (but see the
      individual instruction descriptions for any special cases).

V   Is set in one of two ways:

    • For an addition or subtraction, V is set to 1 if signed overflow occurred, regarding the
      operands and result as two's complement signed integers.

    • For non-addition/subtractions, V is normally left unchanged (but see the individual
      instruction descriptions for any special cases).
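The flag definitions above for additions and subtractions can be checked with a small model. The following Python sketch is illustrative only. It covers only the addition and subtraction cases for 32-bit operands, and the function names are assumptions made for this example.

```python
MASK = 0xFFFFFFFF


def flags_add(a, b):
    """Result and (N, Z, C, V) for a flag-setting addition such as CMN."""
    a &= MASK
    b &= MASK
    total = a + b
    result = total & MASK
    n = result >> 31
    z = int(result == 0)
    c = int(total > MASK)                       # carry out: unsigned overflow
    v = ((a ^ result) & (b ^ result)) >> 31     # operands agree, result differs
    return result, (n, z, c, v)


def flags_sub(a, b):
    """Result and (N, Z, C, V) for a flag-setting subtraction such as CMP."""
    a &= MASK
    b &= MASK
    result = (a - b) & MASK
    n = result >> 31
    z = int(result == 0)
    c = int(a >= b)                             # C is 0 only when a borrow occurs
    v = ((a ^ b) & (a ^ result)) >> 31          # signs differ, result leaves a
    return result, (n, z, c, v)


print(flags_add(0x7FFFFFFF, 1))   # (2147483648, (1, 0, 0, 1))
print(flags_sub(5, 5))            # (0, (0, 1, 1, 0))
print(flags_sub(0, 1))            # (4294967295, (1, 0, 0, 0))
```

Note that a subtraction with equal operands sets both Z and C, because no borrow occurs.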

The flags can be modified in these additional ways:

• Execution of an MSR instruction, as part of its function of writing a new value to the CPSR or SPSR.

• Execution of MRC instructions with destination register R15. The purpose of such instructions is to
  transfer coprocessor-generated condition code flag values to the ARM processor.

• Execution of some variants of the LDM instruction. These variants copy the SPSR to the CPSR, and
  their main intended use is for returning from exceptions.

• Execution of an RFE instruction in a privileged mode that loads a new value into the CPSR from
  memory.

• Execution of flag-setting variants of arithmetic and logical instructions whose destination register is
  R15. These also copy the SPSR to the CPSR, and are intended for returning from exceptions.

A2.5.3 The Q flag
In E variants of ARMv5 and above, bit[27] of the CPSR is known as the Q flag and is used to indicate
whether overflow and/or saturation has occurred in some DSP-oriented instructions. Similarly, bit[27] of
each SPSR is a Q flag, and is used to preserve and restore the CPSR Q flag if an exception occurs. See
Saturated integer arithmetic on page A2-69 for more information.
In architecture versions prior to ARMv5, and in non-E variants of ARMv5, bit[27] of the CPSR and SPSRs
must be treated as a reserved bit, as described in Types of PSR bits on page A2-11.

A2.5.4 The GE[3:0] bits
In ARMv6, the SIMD instructions use bits[19:16] as Greater than or Equal (GE) flags for individual bytes
or halfwords of the result. You can use these flags to control a later SEL instruction, see SEL on page A4-127
for more details.
Instructions that operate on halfwords:
• set or clear GE[3:2] together, based on the result of the top halfword calculation
• set or clear GE[1:0] together, based on the result of the bottom halfword calculation.
Instructions that operate on bytes:
• set or clear GE[3] according to the result of the top byte calculation
• set or clear GE[2] according to the result of the second byte calculation
• set or clear GE[1] according to the result of the third byte calculation
• set or clear GE[0] according to the result of the bottom byte calculation.
Each bit is set (otherwise cleared) if the results of the corresponding calculation are as follows:
• for unsigned byte addition, if the result is greater than or equal to $2^{8}$
• for unsigned halfword addition, if the result is greater than or equal to $2^{16}$
• for unsigned subtraction, if the result is greater than or equal to zero
• for signed arithmetic, if the result is greater than or equal to zero.

In architecture versions prior to ARMv6, bits[19:16] of the CPSR and SPSRs must be treated as reserved
bits, as described in Types of PSR bits on page A2-11.
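The way byte-wise instructions set the GE bits, and the way a later SEL instruction consumes them, can be modelled as follows. The following Python sketch is illustrative only. It models an unsigned byte addition in the style of UADD8 and the byte selection performed by SEL. The function names are assumptions made for this example, and the instruction pages remain the authority for exact behavior.

```python
def unsigned_add_bytes(rn, rm):
    """Four parallel unsigned byte additions; returns (result, GE[3:0])."""
    result = 0
    ge = 0
    for lane in range(4):                  # lane 0 is the bottom byte
        shift = 8 * lane
        total = ((rn >> shift) & 0xFF) + ((rm >> shift) & 0xFF)
        result |= (total & 0xFF) << shift
        if total >= 1 << 8:                # result >= 2**8 sets the GE bit
            ge |= 1 << lane
    return result, ge


def select_bytes(rn, rm, ge):
    """Per byte, take rn where the GE bit is set and rm where it is clear."""
    result = 0
    for lane in range(4):
        mask = 0xFF << (8 * lane)
        source = rn if ge & (1 << lane) else rm
        result |= source & mask
    return result


result, ge = unsigned_add_bytes(0xFF01FF01, 0x01010101)
print(hex(result), bin(ge))                               # 0x20002 0b1010
print(hex(select_bytes(0xAAAAAAAA, 0x55555555, ge)))      # 0xaa55aa55
```

In this example the top byte and the third byte overflow, so GE[3] and GE[1] are set and the selection takes those two bytes from the first operand.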

A2.5.5 The E bit
From ARMv6, bit[9] controls load and store endianness for data handling. See Instructions to change CPSR
E bit on page A2-36. This bit is ignored by instruction fetches.
In architecture versions prior to ARMv6, bit[9] of the CPSR and SPSRs must be treated as a reserved bit,
as described in Types of PSR bits on page A2-11.

A2.5.6 The interrupt disable bits
A, I, and F are the interrupt disable bits:

A bit   Disables imprecise data aborts when it is set. This is available only in ARMv6 and above.
        In earlier versions, bit[8] of CPSR and SPSRs must be treated as a reserved bit, as described
        in Types of PSR bits on page A2-11.

I bit   Disables IRQ interrupts when it is set.

F bit   Disables FIQ interrupts when it is set.

A2.5.7 The mode bits
M[4:0] are the mode bits. These determine the mode in which the processor operates. Their interpretation
is shown in Table A2-2.
Table A2-2 The mode bits

M[4:0]    Mode        Accessible registers
0b10000   User        PC, R14 to R0, CPSR
0b10001   FIQ         PC, R14_fiq to R8_fiq, R7 to R0, CPSR, SPSR_fiq
0b10010   IRQ         PC, R14_irq, R13_irq, R12 to R0, CPSR, SPSR_irq
0b10011   Supervisor  PC, R14_svc, R13_svc, R12 to R0, CPSR, SPSR_svc
0b10111   Abort       PC, R14_abt, R13_abt, R12 to R0, CPSR, SPSR_abt
0b11011   Undefined   PC, R14_und, R13_und, R12 to R0, CPSR, SPSR_und
0b11111   System      PC, R14 to R0, CPSR (ARMv4 and above)

Not all combinations of the mode bits define a valid processor mode. Only those combinations explicitly
described can be used. If any other value is programmed into the mode bits M[4:0], the result is
UNPREDICTABLE.

A2.5.8 The T and J bits
The T and J bits select the current instruction set, as shown in Table A2-3.

Table A2-3 The T and J bits

J   T   Instruction set
0   0   ARM
0   1   Thumb
1   0   Jazelle
1   1   RESERVED

The T bit exists on T variants of ARMv4, and on all variants of ARMv5 and above. On non-T variants of
ARMv4, the T bit must be treated as a reserved bit, as described in Types of PSR bits on page A2-11.
The Thumb instruction set is implemented on T variants of ARMv4 and ARMv5, and on all variants of
ARMv6 and above. Instructions that switch between ARM and Thumb state execution can be used freely
on implementations of these architectures.
The Thumb instruction set is not implemented on non-T variants of ARMv5. If the Thumb instruction set is
selected by setting T == 1 on these architecture variants, the next instruction executed will cause an
Undefined Instruction exception (see Undefined Instruction exception on page A2-19). Instructions that
switch between ARM and Thumb state execution can be used on implementations of these architecture
variants, but only function correctly as long as the program remains in ARM state. If the program attempts
to switch to Thumb state, the first instruction executed after that switch causes an Undefined Instruction
exception. Entry into that exception then switches back to ARM state. The exception handler can detect that
this was the cause of the exception from the fact that the T bit of SPSR_und is set.
The J bit exists on ARMv5TEJ and on all variants of ARMv6 and above. On variants of ARMv4 and
ARMv5, other than ARMv5TEJ, the J bit must be treated as a reserved bit, as described in Types of PSR bits
on page A2-11.
Hardware acceleration for Jazelle opcode execution can be implemented on ARMv5TEJ and on ARMv6
and above. On these architecture variants, the BXJ instruction is used to switch from ARM state into Jazelle
state when the hardware accelerator is present and enabled. If the hardware accelerator is disabled, or not
present, the BXJ instruction behaves as a BX instruction, and the J bit remains clear. For more details, see The
Jazelle Extension on page A2-53.
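
As an illustration of Table A2-3, the following Python sketch decodes the instruction set selected by a PSR value. It is a model for explanation only, not part of the architecture definition. The T bit is taken as bit[5], consistent with the exception entry pseudocode in this chapter, and the J bit is assumed to be bit[24]. The function name instruction_set is hypothetical.

```python
# Illustrative model only. T is bit[5]; J is assumed to be bit[24].
T_BIT = 1 << 5
J_BIT = 1 << 24

# Table A2-3, keyed by (J, T).
INSTRUCTION_SETS = {
    (0, 0): "ARM",
    (0, 1): "Thumb",
    (1, 0): "Jazelle",
    (1, 1): "RESERVED",
}


def instruction_set(psr: int) -> str:
    """Return the instruction set selected by the J and T bits of a PSR value."""
    j = 1 if psr & J_BIT else 0
    t = 1 if psr & T_BIT else 0
    return INSTRUCTION_SETS[(j, t)]
```

An Undefined Instruction handler on a non-T variant of ARMv5 could apply the same test to SPSR_und to recognize an attempted switch to Thumb state.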

A2.5.9 Other bits
Other bits in the program status registers are reserved for future expansion. In general, programmers must
take care to write code in such a way that these bits are never modified. Failure to do this might result in
code that has unexpected side effects on future versions of the architecture. See Types of PSR bits on
page A2-11, and the usage notes for the MSR instruction on page A4-76 for more details.


A2.6 Exceptions
Exceptions are generated by internal and external sources to cause the processor to handle an event, such as
an externally generated interrupt or an attempt to execute an Undefined instruction. The processor state just
before handling the exception is normally preserved so that the original program can be resumed when the
exception routine has completed. More than one exception can arise at the same time.
The ARM architecture supports seven types of exception. Table A2-4 lists the types of exception and the
processor mode that is used to process each type. When an exception occurs, execution is forced from a fixed
memory address corresponding to the type of exception. These fixed addresses are called the exception
vectors.

Note
The normal vector at address 0x00000014 and the high vector at address 0xFFFF0014 are reserved for future
expansion.

Table A2-4 Exception processing modes

Exception type                                    Mode         VE(a)  Normal address            High vector address
Reset                                             Supervisor          0x00000000                0xFFFF0000
Undefined instructions                            Undefined           0x00000004                0xFFFF0004
Software interrupt (SWI)                          Supervisor          0x00000008                0xFFFF0008
Prefetch Abort (instruction fetch memory abort)   Abort               0x0000000C                0xFFFF000C
Data Abort (data access memory abort)             Abort               0x00000010                0xFFFF0010
IRQ (interrupt)                                   IRQ          0      0x00000018                0xFFFF0018
                                                               1      IMPLEMENTATION DEFINED
FIQ (fast interrupt)                              FIQ          0      0x0000001C                0xFFFF001C
                                                               1      IMPLEMENTATION DEFINED

a. VE = vectored interrupt enable (CP15 control); RAZ when not implemented.
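
The mapping in Table A2-4 can be sketched as a small lookup. This Python model is illustrative only. The function name vector_address and its parameters are hypothetical, and None is used here to stand for an IMPLEMENTATION DEFINED address.

```python
# Illustrative model of Table A2-4. Offsets are relative to the vector base.
VECTOR_OFFSETS = {
    "Reset": 0x00,
    "Undefined instruction": 0x04,
    "SWI": 0x08,
    "Prefetch Abort": 0x0C,
    "Data Abort": 0x10,
    "IRQ": 0x18,
    "FIQ": 0x1C,
}
NORMAL_BASE = 0x00000000
HIGH_BASE = 0xFFFF0000


def vector_address(exception, high_vectors=False, ve=False):
    """Return the vector address, or None when it is IMPLEMENTATION DEFINED."""
    if ve and exception in ("IRQ", "FIQ"):
        return None  # supplied by the interrupt controller when VE == 1
    base = HIGH_BASE if high_vectors else NORMAL_BASE
    return base + VECTOR_OFFSETS[exception]
```

Offset 0x14 does not appear in the lookup because that vector is reserved.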


When an exception occurs, the banked versions of R14 and the SPSR for the exception mode are used to
save state as follows:
R14_<exception_mode>  = return link
SPSR_<exception_mode> = CPSR
CPSR[4:0] = exception mode number
CPSR[5] = 0                            /* Execute in ARM state */
if <exception_mode> == Reset or FIQ then
    CPSR[6] = 1                        /* Disable fast interrupts */
/* else CPSR[6] is unchanged */
CPSR[7] = 1                            /* Disable normal interrupts */
if <exception_mode> != UNDEF or SWI then
    CPSR[8] = 1                        /* Disable imprecise aborts (v6 only) */
/* else CPSR[8] is unchanged */
CPSR[9] = CP15_reg1_EEbit              /* Endianness on exception entry */
PC = exception vector address
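
The CPSR update above can be modeled in a few lines of Python. This is a sketch for illustration, assuming an ARMv6 implementation (so that CPSR[8] and CPSR[9] are written). The mode numbers are those given in the per-exception sections that follow. The function name enter_exception and the short exception labels are hypothetical.

```python
# Illustrative model of the CPSR changes made on exception entry (ARMv6 assumed).
MODE_NUMBERS = {
    "Reset": 0b10011,  # Supervisor
    "UNDEF": 0b11011,  # Undefined
    "SWI": 0b10011,    # Supervisor
    "PABT": 0b10111,   # Abort
    "DABT": 0b10111,   # Abort
    "IRQ": 0b10010,
    "FIQ": 0b10001,
}


def enter_exception(cpsr, exception, ee_bit=0):
    """Return (new CPSR, value saved in the SPSR) for the given exception."""
    new = (cpsr & ~0x1F) | MODE_NUMBERS[exception]  # CPSR[4:0] = mode number
    new &= ~(1 << 5)                                # CPSR[5] = 0, ARM state
    if exception in ("Reset", "FIQ"):
        new |= 1 << 6                               # disable fast interrupts
    new |= 1 << 7                                   # disable normal interrupts
    if exception not in ("UNDEF", "SWI"):
        new |= 1 << 8                               # disable imprecise aborts
    new = (new & ~(1 << 9)) | (ee_bit << 9)         # endianness from CP15 EE bit
    return new & 0xFFFFFFFF, cpsr
```

For example, an IRQ taken from User mode in Thumb state (CPSR = 0x00000030) gives a new CPSR of 0x00000192, with 0x00000030 saved in SPSR_irq.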

To return after handling the exception, the SPSR is moved into the CPSR, and R14 is moved to the PC. This
can be done atomically in two ways:
•    using a data-processing instruction with the S bit set, and the PC as the destination
•    using the Load Multiple with Restore CPSR instruction, as described in LDM (3) on page A4-40.
In addition, in ARMv6, the RFE instruction (see RFE on page A4-113) can be used to load the CPSR and PC
from memory, thereby atomically returning from an exception to a PC and CPSR that were previously saved in
memory.
Together, these are all of the mechanisms that perform a return from exception.
The following sections show what happens automatically when the exception occurs, and also show the
recommended data-processing instruction to use to return from each exception. This instruction is always a
MOVS or SUBS instruction with the PC as its destination.

Note
When the recommended data-processing instruction is a SUBS and a Load Multiple with Restore CPSR
instruction is used to return from the exception handler, the subtraction must still be performed. This is
usually done at the start of the exception handler, before the return link is stored to memory.
For example, an interrupt handler that wishes to store its return link on the stack might use instructions of
the following form at its entry point:
    SUB     R14, R14, #4
    STMFD   SP!, {<other_registers>, R14}

and return using the instruction:

    LDMFD   SP!, {<other_registers>, PC}^

A2.6.1 ARMv6 extensions to the exception model
In ARMv6 and above, the exception model is extended as follows:
•    An imprecise data abort mechanism that allows some types of data abort to be treated
     asynchronously. The resulting exceptions behave like interrupts, except that they use Abort mode and
     its banked registers. This mechanism includes a mask bit (the A bit) in the PSRs, in order to ensure
     that imprecise data aborts do not occur while another abort is being handled. The mechanism is
     described in Imprecise data aborts on page A2-23.
•    Support for vectored interrupts controlled by the VE bit in the system control coprocessor (see
     Vectored interrupt support on page A2-26). It is IMPLEMENTATION DEFINED whether support for this
     mechanism is included in earlier versions of the architecture.
•    Support for a low interrupt latency configuration controlled by the FI bit in the system control
     coprocessor (see Low interrupt latency configuration on page A2-27). It is IMPLEMENTATION
     DEFINED whether support for this mechanism is included in earlier versions of the architecture.
•    Three new instructions (CPS, SRS, RFE) to improve nested stack handling of different exceptions in a
     common mode. CPS can also be used to efficiently enable or disable the interrupt and imprecise abort
     masks, either within a mode, or while transitioning from a privileged mode to any other mode. See
     New instructions to improve exception handling on page A2-28 for a brief description.

A2.6.2 Reset
When the Reset input is asserted on the processor, the ARM processor immediately stops execution of the
current instruction. When Reset is de-asserted, the following actions are performed:
R14_svc   = UNPREDICTABLE value
SPSR_svc  = UNPREDICTABLE value
CPSR[4:0] = 0b10011              /* Enter Supervisor mode */
CPSR[5]   = 0                    /* Execute in ARM state */
CPSR[6]   = 1                    /* Disable fast interrupts */
CPSR[7]   = 1                    /* Disable normal interrupts */
CPSR[8]   = 1                    /* Disable Imprecise Aborts (v6 only) */
CPSR[9]   = CP15_reg1_EEbit      /* Endianness on exception entry */
if high vectors configured then
    PC    = 0xFFFF0000
else
    PC    = 0x00000000

After Reset, the ARM processor begins execution at address 0x00000000 or 0xFFFF0000 in Supervisor mode
with interrupts disabled.

Note
There is no architecturally defined way of returning from a Reset.


A2.6.3 Undefined Instruction exception
If the ARM processor executes a coprocessor instruction, it waits for any external coprocessor
to acknowledge that it can execute the instruction. If no coprocessor responds, an Undefined Instruction
exception occurs.
If an attempt is made to execute an instruction that is UNDEFINED, an Undefined Instruction exception occurs
(see Extending the instruction set on page A3-32).
The Undefined Instruction exception can be used for software emulation of a coprocessor in a system that
does not have the physical coprocessor (hardware), or for general-purpose instruction set extension by
software emulation.
When an Undefined Instruction exception occurs, the following actions are performed:
R14_und   = address of next instruction after the Undefined instruction
SPSR_und  = CPSR
CPSR[4:0] = 0b11011              /* Enter Undefined Instruction mode */
CPSR[5]   = 0                    /* Execute in ARM state */
                                 /* CPSR[6] is unchanged */
CPSR[7]   = 1                    /* Disable normal interrupts */
                                 /* CPSR[8] is unchanged */
CPSR[9]   = CP15_reg1_EEbit      /* Endianness on exception entry */
if high vectors configured then
    PC    = 0xFFFF0004
else
    PC    = 0x00000004

To return after emulating the Undefined instruction use:
MOVS PC,R14

This restores the PC (from R14_und) and CPSR (from SPSR_und) and returns to the instruction following
the Undefined instruction.
In some coprocessor designs, an internal exceptional condition caused by one coprocessor instruction is
signaled imprecisely by refusing to respond to a later coprocessor instruction. In these circumstances, the
Undefined Instruction handler takes whatever action is necessary to clear the exceptional condition, then
returns to the second coprocessor instruction. To do this use:
SUBS PC,R14,#4


A2.6.4 Software Interrupt exception
The Software Interrupt instruction (SWI) enters Supervisor mode to request a particular supervisor (operating
system) function. When a SWI is executed, the following actions are performed:
R14_svc   = address of next instruction after the SWI instruction
SPSR_svc  = CPSR
CPSR[4:0] = 0b10011              /* Enter Supervisor mode */
CPSR[5]   = 0                    /* Execute in ARM state */
                                 /* CPSR[6] is unchanged */
CPSR[7]   = 1                    /* Disable normal interrupts */
                                 /* CPSR[8] is unchanged */
CPSR[9]   = CP15_reg1_EEbit      /* Endianness on exception entry */
if high vectors configured then
    PC    = 0xFFFF0008
else
    PC    = 0x00000008

To return after performing the SWI operation, use the following instruction to restore the PC
(from R14_svc) and CPSR (from SPSR_svc) and return to the instruction following the SWI:
MOVS PC,R14

A2.6.5 Prefetch Abort (instruction fetch memory abort)
A memory abort is signaled by the memory system. Activating an abort in response to an instruction fetch
marks the fetched instruction as invalid. A Prefetch Abort exception is generated if the processor tries to
execute the invalid instruction. If the instruction is not executed (for example, as a result of a branch being
taken while it is in the pipeline), no Prefetch Abort occurs.
In ARMv5 and above, a Prefetch Abort exception can also be generated as the result of executing a BKPT
instruction. For details, see BKPT on page A4-14 (ARM instruction) and BKPT on page A7-24 (Thumb
instruction).
When an attempt is made to execute an aborted instruction, the following actions are performed:

R14_abt   = address of the aborted instruction + 4
SPSR_abt  = CPSR
CPSR[4:0] = 0b10111              /* Enter Abort mode */
CPSR[5]   = 0                    /* Execute in ARM state */
                                 /* CPSR[6] is unchanged */
CPSR[7]   = 1                    /* Disable normal interrupts */
CPSR[8]   = 1                    /* Disable Imprecise Data Aborts (v6 only) */
CPSR[9]   = CP15_reg1_EEbit      /* Endianness on exception entry */
if high vectors configured then
    PC    = 0xFFFF000C
else
    PC    = 0x0000000C


To return after fixing the reason for the abort, use:
SUBS PC,R14,#4

This restores both the PC (from R14_abt) and CPSR (from SPSR_abt), and returns to the aborted
instruction.

A2.6.6 Data Abort (data access memory abort)
A memory abort is signaled by the memory system. Activating an abort in response to a data access (load
or store) marks the data as invalid. A Data Abort exception occurs before any following instructions or
exceptions have altered the state of the CPU. The following actions are performed:
R14_abt   = address of the aborted instruction + 8
SPSR_abt  = CPSR
CPSR[4:0] = 0b10111              /* Enter Abort mode */
CPSR[5]   = 0                    /* Execute in ARM state */
                                 /* CPSR[6] is unchanged */
CPSR[7]   = 1                    /* Disable normal interrupts */
CPSR[8]   = 1                    /* Disable Imprecise Data Aborts (v6 only) */
CPSR[9]   = CP15_reg1_EEbit      /* Endianness on exception entry */
if high vectors configured then
    PC    = 0xFFFF0010
else
    PC    = 0x00000010

To return after fixing the reason for the abort, use:
SUBS PC,R14,#8

This restores both the PC (from R14_abt) and CPSR (from SPSR_abt), and returns to re-execute the aborted
instruction.
If the aborted instruction does not need to be re-executed use:
SUBS PC,R14,#4

Effects of data-aborted instructions
Instructions that access data memory can modify memory by storing one or more values. If a Data Abort
occurs in such an instruction, the value of each memory location that the instruction stores to is:
•    unchanged if the memory system does not permit write access to the memory location
•    UNPREDICTABLE otherwise.


Instructions that access data memory can modify registers in the following ways:
•    By loading values into one or more of the general-purpose registers, that can include the PC.
•    By specifying base register write-back, in which the base register used in the address calculation has
     a modified value written to it. All instructions that allow this to be specified have UNPREDICTABLE
     results if base register write-back is specified and the base register is the PC, so only general-purpose
     registers other than the PC can legitimately be modified in this way.
•    By loading values into coprocessor registers.
•    By modifying the CPSR.

If a Data Abort occurs, the values left in these registers are determined by the following rules:
1.   The PC value on entry to the Data Abort handler is 0x00000010 or 0xFFFF0010, and the R14_abt value
     is determined from the address of the aborted instruction. Neither is affected in any way by the results
     of any PC load specified by the instruction.
2.   If base register write-back is not specified, the base register value is unchanged. This applies even if
     the instruction loaded its own base register and the memory access to load the base register occurred
     earlier than the aborting access.
     For example, suppose the instruction is:
         LDMIA R0,{R0,R1,R2}
     and the implementation loads the new R0 value, then the new R1 value and finally the new R2 value.
     If a Data Abort occurs on any of the accesses, the value in the base register R0 of the instruction is
     unchanged. This applies even if it was the load of R1 or R2 that aborted, rather than the load of R0.
3.   If base register write-back is specified, the value left in the base register is determined by the abort
     model of the implementation, as described in Abort models on page A2-23.
4.   If the instruction only loads one general-purpose register, the value in that register is unchanged.
5.   If the instruction loads more than one general-purpose register, UNPREDICTABLE values are left in
     destination registers that are neither the PC nor the base register of the instruction.
6.   If the instruction loads coprocessor registers, UNPREDICTABLE values are left in the destination
     coprocessor registers, unless otherwise specified in the instruction set description of the specific
     coprocessor.
7.   CPSR bits not defined as updated on exception entry maintain their current value.
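
Rules 2 to 5 can be summarized, for general-purpose registers only, by the following Python sketch. It is an informal reading of the rules rather than a definition. The function name outcome_after_data_abort, the register names as strings, and the result labels are all hypothetical, and coprocessor registers and the CPSR are not modeled.

```python
# Illustrative summary of rules 2 to 5 for an aborted load instruction.
def outcome_after_data_abort(base, loaded, writeback):
    """Map each affected general-purpose register to its state after the abort.

    base      -- name of the base register, for example "R0"
    loaded    -- list of destination register names
    writeback -- True if base register write-back is specified
    """
    result = {}
    for reg in loaded:
        if reg == "PC" or reg == base:
            continue  # rule 1 covers the PC; the base register is handled below
        # Rule 4: a single loaded register is unchanged.
        # Rule 5: with several destinations, the others are UNPREDICTABLE.
        result[reg] = "unchanged" if len(loaded) == 1 else "UNPREDICTABLE"
    # Rule 2 (no write-back) or rule 3 (write-back) for the base register.
    result[base] = "determined by abort model" if writeback else "unchanged"
    return result
```

For the LDMIA R0,{R0,R1,R2} example in rule 2, the sketch reports R0 as unchanged and R1 and R2 as UNPREDICTABLE.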


Abort models
The abort model used by an ARM implementation is IMPLEMENTATION DEFINED, and is one of the
following:
Base Restored Abort Model
If a precise Data Abort occurs in an instruction that specifies base register write-back, the
value in the base register is unchanged. This is the only abort model permitted in ARMv6
and above.
Base Updated Abort Model
If a precise Data Abort occurs in an instruction that specifies base register write-back, the
base register write-back still occurs. This model is prohibited in ARMv6 and above.
In either case, the abort model applies uniformly across all instructions. An implementation does not use the
Base Restored Abort Model for some instructions and the Base Updated Abort Model for others.
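
The difference between the two models, as seen by a Data Abort handler, can be sketched as follows. This Python model is illustrative only. The function names and the "restored" and "updated" labels are hypothetical, and the write-back is represented simply as the base plus a signed offset.

```python
# Illustrative model of the base register value left by each abort model.
def base_after_abort(base, offset, model):
    """Base register value after a precise Data Abort on an instruction
    that specifies base register write-back."""
    if model == "restored":
        return base                        # write-back does not occur
    if model == "updated":
        return (base + offset) & 0xFFFFFFFF  # write-back still occurs
    raise ValueError("unknown abort model: " + model)


def original_base(value_seen, offset, model):
    """Recover the pre-instruction base value from what the handler sees."""
    if model == "updated":
        return (value_seen - offset) & 0xFFFFFFFF
    return value_seen
```

Under the Base Updated Abort Model, a handler that intends to re-execute the aborted instruction would typically need to undo the write-back first, which is what original_base illustrates.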

A2.6.7 Imprecise data aborts
An imprecise data abort, caused, for example, by an external error on a write that has been held in a Write
Buffer, is asynchronous to the execution of the causing instruction and might occur many cycles
after the instruction that caused the memory access has retired. For this reason, the imprecise data abort
might occur at a time when the processor is in Abort mode because of a precise abort, or when it has live state
in Abort mode but is handling an interrupt.
To avoid the loss of the Abort mode state (R14_abt and SPSR_abt) in these cases, which would lead to the
processor entering an unrecoverable state, the existence of a pending imprecise data abort must be held by
the system until such time as Abort mode can safely be entered.
From ARMv6, a mask is added into the CPSR (CPSR[8]) to control when an imprecise abort cannot be
accepted. This bit is referred to as the A bit. The imprecise data abort causes a Data Abort to be taken when
imprecise data aborts are not masked. When imprecise data aborts are masked, the implementation is
responsible for holding the presence of a pending imprecise abort until the mask is cleared and the abort is
taken. It is IMPLEMENTATION DEFINED whether more than one imprecise abort can be pended.
The A bit is set automatically on taking a Prefetch Abort, a Data Abort, an IRQ or FIQ interrupt, and on
reset.
The A bit can only be changed from a privileged mode.
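
The masking behavior described above can be sketched as a small state machine. This Python model is illustrative only: it pends at most one imprecise abort (whether more can be pended is IMPLEMENTATION DEFINED), and the class and method names are hypothetical.

```python
# Illustrative model of a single pended imprecise data abort and the A bit.
class ImpreciseAbortModel:
    def __init__(self):
        self.a_bit = True     # the A bit is set on reset
        self.pending = False

    def signal(self):
        """The memory system reports an imprecise abort."""
        self.pending = True
        return self._take_if_unmasked()

    def set_mask(self, masked):
        """Privileged software writes the A bit."""
        self.a_bit = masked
        return self._take_if_unmasked()

    def _take_if_unmasked(self):
        """Return True if a Data Abort is taken as a result of this event."""
        if self.pending and not self.a_bit:
            self.pending = False
            self.a_bit = True  # set automatically on taking a Data Abort
            return True
        return False
```

An abort signaled while the A bit is set remains pending and is taken only when the mask is cleared.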


A2.6.8 Interrupt request (IRQ) exception
The IRQ exception is generated externally by asserting the IRQ input on the processor. It has a lower priority
than FIQ (see Table A2-1 on page A2-25), and is masked out when an FIQ sequence is entered.
Interrupts are disabled when the I bit in the CPSR is set. If the I bit is clear, ARM checks for an IRQ at
instruction boundaries.

Note
The I bit can only be changed from a privileged mode.
When an IRQ is detected, the following actions are performed:
R14_irq   = address of next instruction to be executed + 4
SPSR_irq  = CPSR
CPSR[4:0] = 0b10010              /* Enter IRQ mode */
CPSR[5]   = 0                    /* Execute in ARM state */
                                 /* CPSR[6] is unchanged */
CPSR[7]   = 1                    /* Disable normal interrupts */
CPSR[8]   = 1                    /* Disable Imprecise Data Aborts (v6 only) */
CPSR[9]   = CP15_reg1_EEbit      /* Endianness on exception entry */
if VE==0 then
    if high vectors configured then
        PC = 0xFFFF0018
    else
        PC = 0x00000018
else
    PC = IMPLEMENTATION DEFINED  /* see page A2-26 */

To return after servicing the interrupt, use:
SUBS PC,R14,#4

This restores both the PC (from R14_irq) and CPSR (from SPSR_irq), and resumes execution of the
interrupted code.

A2.6.9 Fast interrupt request (FIQ) exception
The FIQ exception is generated externally by asserting the FIQ input on the processor. FIQ is designed to
support a data transfer or channel process, and has sufficient private registers to remove the need for register
saving in such applications, therefore minimizing the overhead of context switching.
Fast interrupts are disabled when the F bit in the CPSR is set. If the F bit is clear, ARM checks for an FIQ
at instruction boundaries.

Note
The F bit can only be changed from a privileged mode.
When an FIQ is detected, the following actions are performed:


R14_fiq   = address of next instruction to be executed + 4
SPSR_fiq  = CPSR
CPSR[4:0] = 0b10001              /* Enter FIQ mode */
CPSR[5]   = 0                    /* Execute in ARM state */
CPSR[6]   = 1                    /* Disable fast interrupts */
CPSR[7]   = 1                    /* Disable normal interrupts */
CPSR[8]   = 1                    /* Disable Imprecise Data Aborts (v6 only) */
CPSR[9]   = CP15_reg1_EEbit      /* Endianness on exception entry */
if VE==0 then
    if high vectors configured then
        PC = 0xFFFF001C
    else
        PC = 0x0000001C
else
    PC = IMPLEMENTATION DEFINED  /* see page A2-26 */

To return after servicing the interrupt, use:
SUBS PC,R14,#4

This restores both the PC (from R14_fiq) and CPSR (from SPSR_fiq), and resumes execution of the
interrupted code.
The FIQ vector is deliberately the last vector to allow the FIQ exception-handler software to be placed
directly at address 0x0000001C or 0xFFFF001C, without requiring a branch instruction from the vector.
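
The return instructions recommended in the preceding sections differ only in the value subtracted from the banked R14. The following Python sketch collects those offsets for illustration. The table name RETURN_OFFSET and the function return_address are hypothetical, and the Data Abort entry assumes that the aborted instruction is to be re-executed.

```python
# Illustrative summary of the recommended return instructions.
# Value subtracted from the banked R14 to form the return address.
RETURN_OFFSET = {
    "Undefined Instruction": 0,  # MOVS PC,R14
    "SWI": 0,                    # MOVS PC,R14
    "Prefetch Abort": 4,         # SUBS PC,R14,#4
    "Data Abort": 8,             # SUBS PC,R14,#8 (re-execute the instruction)
    "IRQ": 4,                    # SUBS PC,R14,#4
    "FIQ": 4,                    # SUBS PC,R14,#4
}


def return_address(r14, exception):
    """Address at which execution resumes after the recommended return."""
    return (r14 - RETURN_OFFSET[exception]) & 0xFFFFFFFF
```

For a Data Abort on the instruction at 0x8000, R14_abt holds 0x8008 and the recommended return resumes at 0x8000.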

A2.6.10 Exception priorities
Table A2-1 shows the exception priorities:

Table A2-1 Exception priorities

Priority        Exception
1 (Highest)     Reset
2               Data Abort (including data TLB miss)
3               FIQ
4               IRQ
5               Imprecise Abort (external abort) - ARMv6
6               Prefetch Abort (including prefetch TLB miss)
7 (Lowest)      Undefined instruction
                SWI


Undefined instruction and software interrupt cannot occur at the same time, because they each correspond
to particular (non-overlapping) decodings of the current instruction. Both must be lower priority than
Prefetch Abort, because a Prefetch Abort indicates that no valid instruction was fetched.
The priority of a Data Abort exception is higher than FIQ, which ensures that the Data Abort handler is
entered before the FIQ handler is entered (so that the Data Abort is resolved after the FIQ handler has
completed).
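
The ordering in Table A2-1 can be expressed as a simple selection among exceptions that arise at the same time. The following Python sketch is illustrative only, and the names PRIORITY and highest_priority are hypothetical.

```python
# Illustrative model of Table A2-1. A lower number means a higher priority.
PRIORITY = {
    "Reset": 1,
    "Data Abort": 2,
    "FIQ": 3,
    "IRQ": 4,
    "Imprecise Abort": 5,
    "Prefetch Abort": 6,
    "Undefined instruction": 7,  # shares the lowest priority with SWI
    "SWI": 7,
}


def highest_priority(pending):
    """Return the exception that is entered first among those pending."""
    return min(pending, key=PRIORITY.__getitem__)
```

Undefined instruction and SWI share a priority level because they cannot occur at the same time.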

A2.6.11 High vectors
High vectors were introduced into some implementations of ARMv4 and are required in ARMv6
implementations. High vectors allow the exception vector locations to be moved from their normal address
range 0x00000000-0x0000001C at the bottom of the 32-bit address space, to an alternative address range
0xFFFF0000-0xFFFF001C near the top of the address space. These alternative locations are known as the high
vectors.
Prior to ARMv6, it is IMPLEMENTATION DEFINED whether the high vectors are supported. When they are, a
hardware configuration input selects whether the normal vectors or the high vectors are to be used from
reset.
The ARM instruction set does not contain any instructions that can directly change whether normal or high
vectors are configured. However, if the standard System Control coprocessor is attached to an ARM
processor that supports the high vectors, bit[13] of coprocessor 15 register 1 can be used to switch between
using the normal vectors and the high vectors (see Register 1: Control registers on page B3-12).
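
As a sketch of how bit[13] selects the vector range, the following Python model operates on a value of CP15 register 1. It is illustrative only and models the register as a plain integer. Access to the coprocessor itself is not shown, and the function names are hypothetical.

```python
# Illustrative model of the high vectors control, bit[13] of CP15 register 1.
V_BIT = 1 << 13


def vector_base(cp15_reg1):
    """Base address of the exception vectors selected by bit[13]."""
    return 0xFFFF0000 if cp15_reg1 & V_BIT else 0x00000000


def set_high_vectors(cp15_reg1, enable):
    """Return the register value with bit[13] set or cleared."""
    if enable:
        return cp15_reg1 | V_BIT
    return cp15_reg1 & ~V_BIT & 0xFFFFFFFF
```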

A2.6.12 Vectored interrupt support
Historically, the IRQ and FIQ exception vectors are affected by whether high vectors are enabled, and are
otherwise fixed. The result is that interrupt handlers typically have to start with an instruction sequence to
determine the cause of the interrupt and branch to a routine to handle it. Support of vectored interrupts
allows an interrupt controller to prioritize interrupts, and provide the required interrupt handler address
directly to the core. The vectored interrupt behavior is explicitly enabled by the setting of a bit, the VE bit,
in the system coprocessor CP15 register 1. See Register 1: Control registers on page B3-12. For backwards
compatibility, the vectored interrupt mechanism is disabled on reset. The details of the hardware to support
vectored interrupts are IMPLEMENTATION DEFINED.
A vectored interrupt controller (VIC) can reduce effective interrupt latency considerably, by eliminating the
need for an interrupt handler to identify the source of an interrupt and acknowledge it before re-enabling the
interrupts. Furthermore, if the VIC and core implement an appropriate handshake as the interrupt handler
routine is entered, the VIC can automatically mask out the interrupt source associated with that handler and
any lower priority sources. This allows the interrupts concerned to be re-enabled by the processor core as
soon as their return information (that is, the R14 and SPSR values) has been saved, reducing the period during
which higher priority interrupts are disabled.


A2.6.13 Low interrupt latency configuration
The FI bit (bit[21]) in the system control register (CP15 register 1) enables the interrupt latency
configuration logic in an implementation. See Register 1: Control registers on page B3-12. The purpose of
this configuration is to reduce the interrupt latency of the processor. The exact mechanisms that are used to
perform this are IMPLEMENTATION DEFINED.
In order to ensure that a change between normal and low interrupt latency configurations is synchronized
correctly, the FI bit must only be changed in IMPLEMENTATION DEFINED circumstances. It is recommended
that software systems should only change the FI bit shortly after reset, while interrupts are disabled.
When interrupt latency is reduced, this may result in reduced performance overall. Examples of the
mechanisms which may be used are disabling Hit-Under-Miss functionality within a core, and the
abandoning of restartable external accesses, allowing the core to react to a pending interrupt faster than
would otherwise be the case. Low interrupt latency configuration may have IMPLEMENTATION DEFINED
effects in the memory system or elsewhere outside the processor core. It is legal for the interrupt to be seen
as being taken before a store to a restartable memory location, but for the memory to have been updated
when in low interrupt latency configuration.
In low interrupt latency configuration, software must only use multi-word load/store instructions in ways
that are fully restartable. This allows (but does not require) implementations to make multi-word
instructions interruptible when in low interrupt latency configuration. The multi-access instructions to
which this rule currently applies are:
ARM      LDC, all forms of LDM, LDRD, STC, all forms of STM, STRD
Thumb    LDMIA, PUSH, POP, STMIA

Note
If the instruction is interrupted before it is complete, the result may be that one or more of the words are
accessed twice. Idempotent memory (multiple reads or writes of the same information exhibit identical
system results) is a requirement of system correctness.
In ARMv6, memory with the normal attribute is guaranteed to behave this way. However, memory marked
as Device or Strongly Ordered is not (for example, a FIFO). It is IMPLEMENTATION DEFINED whether
multi-word accesses are supported for Device and Strongly Ordered memory types in the low interrupt
latency configuration.
A similar situation exists with regard to multi-word load/store instructions that access memory locations that
can abort in a recoverable way, since an abort on one of the words accessed may cause a previously-accessed
word to be accessed twice – once before the abort, and a second time after the abort handler has returned.
The requirement in this case is either that all side-effects are idempotent, or that the abort must either occur
on the first word accessed or not at all.
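
The effect of restarting a multi-word load can be illustrated with a simple model. In the following Python sketch, an interrupted instruction is abandoned after some of its accesses and then restarted from the beginning. All names are hypothetical, and the FIFO stands for any location whose reads have side effects.

```python
# Illustrative model of a restarted multi-word load.
from collections import deque


def load_multiple(read_word, count, interrupted_after=None):
    """Load `count` words. If `interrupted_after` is given, that many words
    are read and discarded before the instruction is restarted."""
    if interrupted_after is not None:
        for index in range(interrupted_after):
            read_word(index)  # accesses made before the interrupt is taken
    return [read_word(index) for index in range(count)]


normal_memory = [10, 20, 30, 40]


def read_normal(index):
    """Idempotent read: repeating it returns the same value."""
    return normal_memory[index]


def make_fifo_reader(values):
    """Non-idempotent read: every access removes a word from the FIFO."""
    fifo = deque(values)
    return lambda index: fifo.popleft()
```

With read_normal, the restarted load returns the same words as an uninterrupted one. With a FIFO reader, the words consumed before the interrupt are lost, which is why such locations must not be accessed in this way in low interrupt latency configuration.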


A2.6.14 New instructions to improve exception handling
ARMv6 adds an instruction to simplify changes of processor mode and the disabling and enabling of
interrupts. New instructions are also added to reduce the processing cost of handling exceptions in a
different mode to the exception entry mode, by removing any need to use the original mode’s stack. Two
examples are:
•    IRQ routines may wish to execute in System or Supervisor mode, so that they can both re-enable
     IRQs and use BL instructions. This is not possible in IRQ mode, because a nested IRQ could corrupt
     the BL’s return link at any time. Using the new instructions, the system can store the return state (R14
     link register and SPSR_irq) to the System/User or Supervisor mode stack, switch to System or
     Supervisor mode and re-enable IRQs efficiently, without making any use of R13_irq or the IRQ stack.
•    FIQ mode is designed for efficient use by a single owner, using R8_fiq – R13_fiq as global variables.
     In addition, unlike IRQs, FIQs are not disabled by other exceptions (apart from reset), making them
     the preferred type for real time interrupts, when other exceptions are being used routinely, such as
     virtual memory or instruction emulation. IRQs may be disabled for unacceptably long periods of time
     while these needs are being serviced.
     However, if more than one real-time interrupt source is required, there is a conflict of interest. The
     new mechanism allows multiple FIQ sources and minimizes the period with FIQs disabled, greatly
     reducing the interrupt latency penalty. The FIQ mode registers can be allocated to the highest priority
     FIQ as a single owner.

SRS – Store Return State
This instruction stores R14_<current_mode> and SPSR_<current_mode> to sequential addresses, using the
banked version of R13 for a specified mode to supply the base address (and to be written back to if base
register writeback is specified). This allows an exception handler to store its return state on a stack other
than the one automatically selected by its exception entry sequence.
The addressing mode used is a version of ARM addressing mode 4 (see Addressing Mode 4 - Load and Store
Multiple on page A5-41), modified so as to assume a {R14,SPSR} register list rather than using a list
specified by a bit mask in the instruction. This allows the SRS instruction to access stacks in a manner
compatible with the normal use of STM instructions for stack accesses. See SRS on page A4-174 for the
instruction details.

RFE – Return From Exception
This instruction loads the PC and CPSR from sequential addresses. This is used to return from an exception
which has had its return state saved using the SRS instruction, and again uses a version of ARM addressing
mode 4, modified this time to assume a {PC,CPSR} register list. See RFE on page A4-113 for the
instruction details.
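
The pairing of SRS and RFE can be sketched with a small stack model. The following Python code is illustrative only. It assumes a Full Descending stack (decrement before on store, increment after on load) with base register write-back, represents memory as a dictionary keyed by byte address, and places R14 (or the PC) at the lower address and the SPSR (or CPSR) at the higher address. The function names are hypothetical.

```python
# Illustrative model of SRS and RFE on a Full Descending stack.
def srs_db(memory, sp, r14, spsr):
    """Store {R14, SPSR} below `sp` and return the written-back stack pointer."""
    sp -= 8
    memory[sp] = r14       # lower address: return link
    memory[sp + 4] = spsr  # higher address: saved status register
    return sp


def rfe_ia(memory, sp):
    """Load {PC, CPSR} from `sp`; return (pc, cpsr, written-back stack pointer)."""
    pc = memory[sp]
    cpsr = memory[sp + 4]
    return pc, cpsr, sp + 8
```

An IRQ handler that saves its return state on the Supervisor mode stack with srs_db can later resume the interrupted code with rfe_ia, after which the stack pointer is back at its original value.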


CPS – Change Processor State
This instruction provides new values for the CPSR interrupt masks, mode bits, or both, and is designed to
shorten and speed up the read/modify/write instruction sequence used in earlier architecture variants to
perform such tasks. Together with the SRS instruction, it allows an exception handler to save its return
information on the stack of another mode and then switch to that other mode, without modifying the stack
belonging to the original mode or any registers other than the stack pointer of the new mode.
The instruction also streamlines interrupt mask handling and mode switches in other code, and in particular
allows short, efficient, atomic code sequences in a uniprocessor system by disabling interrupts at their start
and re-enabling interrupts at their end. See CPS on page A4-29 for the instruction details.
A CPS Thumb instruction that allows mask updates within the current mode is also provided, see section CPS
on page A7-39.

Note
The Thumb instruction cannot change the mode due to instruction space usage constraints.
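
The effect of the ARM CPS instruction on the CPSR can be sketched as follows. This Python model is illustrative only. It assumes that the instruction has no effect in User mode, consistent with the A, I, and F bits being changeable only from a privileged mode. The function name cps and its parameters are hypothetical and do not correspond to the assembler syntax.

```python
# Illustrative model of the CPSR update performed by CPS.
A_BIT, I_BIT, F_BIT = 1 << 8, 1 << 7, 1 << 6
MODE_MASK = 0x1F
USER_MODE = 0b10000


def cps(cpsr, disable=None, flags=0, mode=None):
    """Return the CPSR after a CPS operation.

    disable -- True to set the masks in `flags`, False to clear them,
               None to leave the masks unchanged
    flags   -- any combination of A_BIT, I_BIT, F_BIT
    mode    -- new mode number, or None to keep the current mode
    """
    if (cpsr & MODE_MASK) == USER_MODE:
        return cpsr  # assumed to have no effect in User mode
    if disable is True:
        cpsr |= flags
    elif disable is False:
        cpsr &= ~flags
    if mode is not None:
        cpsr = (cpsr & ~MODE_MASK) | mode
    return cpsr & 0xFFFFFFFF
```

For example, an IRQ handler with CPSR = 0x00000192 that re-enables IRQs while switching to Supervisor mode obtains 0x00000113 in a single operation.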


A2.7 Endian support
This section discusses memory and memory-mapped I/O, with regard to the assumptions ARM processor
implementations make about endianness.
ARMv6 introduces several architectural extensions to support mixed-endian access in hardware:

•    Byte reverse instructions that operate on general-purpose register contents to support word, and
     signed and unsigned halfword data quantities.
•    Separate instruction and data endianness, with instructions fixed as little-endian format, naturally
     aligned, but with legacy support for 32-bit word-invariant binary images/ROM.
•    A PSR Endian control flag, the E bit, which dictates the byte order used for the entire load and store
     instruction space when data is loaded into, and stored back out of the register file. In previous
     architectures this PSR bit was specified as 0 and is never set in legacy code written to conform to
     architectures prior to ARMv6.
•    ARM and Thumb instructions to set and clear the E bit explicitly.
•    A byte-invariant addressing scheme to support fine-grain big-endian and little-endian shared data
     structures, to conform to the IEEE Standard for Shared-Data Formats Optimized for Scalable
     Coherent Interface (SCI) Processors, IEEE Std 1596.5-1993 (ISBN 1-55937-354-7, IEEE).
•    Bus interface endianness is IMPLEMENTATION DEFINED. However, it must support byte lane controls
     for unaligned word and halfword data access.

A2.7.1 Address space
The ARM architecture uses a single, flat address space of 2^32 8-bit bytes. Byte addresses are treated as
unsigned numbers, running from 0 to 2^32 - 1.
This address space is regarded as consisting of 2^30 32-bit words, each of whose addresses is word-aligned,
which means that the address is divisible by 4. The word whose word-aligned address is A consists of the
four bytes with addresses A, A+1, A+2 and A+3.
In ARMv4 and above, the address space is also regarded as consisting of 2^31 16-bit halfwords, each of whose
addresses is halfword-aligned (divisible by 2). The halfword whose halfword-aligned address is A consists
of the two bytes with addresses A and A+1.
In ARMv5E and above, the address space supports 64-bit doubleword operations. Doubleword operations
can be considered as two-word load/store operations, each word addressed as follows:
• A, A+1, A+2, and A+3 for the first word
• A+4, A+5, A+6, and A+7 for the second word.
Prior to ARMv6, doubleword operations at addresses that are word-aligned but not doubleword-aligned are
UNPREDICTABLE; doubleword-aligned addresses are always supported. ARMv6 mandates support of both
modulo 4 and modulo 8 alignment of doublewords, and introduces support for unaligned word and halfword
data accesses, all controlled through the standard System Control coprocessor.

Jazelle state (see The T and J bits on page A2-15), introduced with ARM architecture variant v5J, supports
byte addressing.
Address calculations are normally performed using ordinary integer instructions. This means that they
normally wrap around if they overflow or underflow the address space; that is, the result of the
calculation is reduced modulo 2^32.
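The address-space rules above can be illustrated with a minimal Python sketch. The function names are hypothetical and chosen only for this illustration; the sketch models addresses as unsigned 32-bit integers and is not part of the architecture definition.

```python
ADDRESS_MASK = 0xFFFFFFFF  # byte addresses run from 0 to 2^32 - 1


def add_address(base, offset):
    """Return base + offset reduced modulo 2^32, as an integer instruction would."""
    return (base + offset) & ADDRESS_MASK


def is_aligned(address, size):
    """True if address is divisible by size (2 = halfword, 4 = word, 8 = doubleword)."""
    return address % size == 0


def word_bytes(address):
    """Byte addresses A, A+1, A+2, A+3 that make up the word at word-aligned address A."""
    if not is_aligned(address, 4):
        raise ValueError("address is not word-aligned")
    return [address + i for i in range(4)]
```

For instance, `add_address(0xFFFFFFFC, 8)` wraps around to `0x00000004`, and `add_address(0, -4)` underflows to `0xFFFFFFFC`.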
Normal sequential execution of instructions effectively calculates:
(address_of_current_instruction) + 4

after each instruction to determine which instruction to execute next. If this calculation overflows the top of
the address space, the result is UNPREDICTABLE. In other words, programs should not rely on sequential
execution of the instruction at address 0x00000000 after the instruction at address 0xFFFFFFFC.
The above only applies to instructions that are executed, including those which fail their condition code
check. Most ARM implementations prefetch instructions ahead of the currently-executing instruction. If
this prefetching overflows the top of the address space, it does not cause the implementation's behavior to
become UNPREDICTABLE until and unless the prefetched instructions are actually executed.
LDC, LDM, LDRD, POP, PUSH, STC, STRD, and STM instructions access a sequence of words at increasing memory
addresses, effectively incrementing a memory address by 4 for each load or store. If this calculation
overflows the top of the address space, the result is UNPREDICTABLE. In other words, programs should not
use these instructions in such a way that they access the word at address 0x00000000 sequentially after the
word at address 0xFFFFFFFC.

Any unaligned load or store whose calculated address is such that it would access the byte at 0xFFFFFFFF and
the byte at address 0x00000000 as part of the instruction is UNPREDICTABLE.
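As a rough aid for checking the UNPREDICTABLE cases just described, the following Python sketch tests whether a run of bytes would continue from the top of the address space into address 0x00000000. The helper name is an assumption made for this example only.

```python
TOP_OF_ADDRESS_SPACE = 1 << 32  # one past the last byte address, 0xFFFFFFFF


def wraps_top_of_address_space(address, length):
    """True if `length` bytes starting at `address` would run past 0xFFFFFFFF
    and continue at 0x00000000, which the text classes as UNPREDICTABLE."""
    return address + length > TOP_OF_ADDRESS_SPACE
```

A single word access at 0xFFFFFFFC does not wrap, whereas a two-word transfer starting there (or an unaligned word access at 0xFFFFFFFE) does.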

A2.7.2 Endianness - an overview
The rules in Address space on page A2-30 require that for a word-aligned address A:
• the word at address A consists of the bytes at addresses A, A+1, A+2 and A+3
• the halfword at address A consists of the bytes at addresses A and A+1
• the halfword at address A+2 consists of the bytes at addresses A+2 and A+3
• the word at address A therefore consists of the halfwords at addresses A and A+2.
However, this does not totally specify the mappings between words, halfwords, and bytes.
A memory system uses one of the two following mapping schemes. This choice is known as the endianness
of the memory system.
In a little-endian memory system:
• a byte or halfword at a word-aligned address is the least significant byte or halfword within the word
  at that address
• a byte at a halfword-aligned address is the least significant byte within the halfword at that address.

In a big-endian memory system:
• a byte or halfword at a word-aligned address is the most significant byte or halfword within the word
  at that address
• a byte at a halfword-aligned address is the most significant byte within the halfword at that address.

For a word-aligned address A, Table A2-2 and Table A2-3 show how the word at address A, the halfwords
at addresses A and A+2, and the bytes at addresses A, A+1, A+2 and A+3 map on to each other for each
endianness.
Table A2-2 Big-endian memory system
  31               24   23               16   15                8   7                 0
| Word at Address A                                                                     |
| Halfword at Address A                     | Halfword at Address A+2                   |
| Byte at Address A   | Byte at Address A+1 | Byte at Address A+2 | Byte at Address A+3 |

Table A2-3 Little-endian memory system
  31               24   23               16   15                8   7                 0
| Word at Address A                                                                     |
| Halfword at Address A+2                   | Halfword at Address A                     |
| Byte at Address A+3 | Byte at Address A+2 | Byte at Address A+1 | Byte at Address A   |
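The two mappings in Table A2-2 and Table A2-3 can be checked with a short Python sketch. It assumes memory is modelled as a `bytes` object indexed by byte address, and the helper names are illustrative only.

```python
def read_word(memory, address, big_endian):
    """Combine the four bytes at A, A+1, A+2, A+3 into a 32-bit word."""
    order = "big" if big_endian else "little"
    return int.from_bytes(memory[address:address + 4], order)


def read_halfword(memory, address, big_endian):
    """Combine the two bytes at A and A+1 into a 16-bit halfword."""
    order = "big" if big_endian else "little"
    return int.from_bytes(memory[address:address + 2], order)


# Four bytes stored at addresses 0..3.
memory = bytes([0x11, 0x22, 0x33, 0x44])
```

With this memory image, the little-endian word at address 0 is 0x44332211 and the halfword at address 0 is its least significant half (0x2211); the big-endian word is 0x11223344 and the halfword at address 0 is its most significant half (0x1122).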

On memory systems wider than 32 bits, the ARM architecture has traditionally supported a word-invariant
memory model, meaning that a word aligned address will fetch the same data in both big endian and little
endian systems. This is illustrated for a 64-bit data path in Table A2-4 and Table A2-5 on page A2-33.
Table A2-4 Big-endian word invariant case
  63                                             32    31                                              0
| Word at Address A+4                               | Word at Address A                                 |
| Halfword at Address A+4 | Halfword at Address A+6 | Halfword at Address A   | Halfword at Address A+2 |

Table A2-5 Little-endian word invariant case
  63                                             32    31                                              0
| Word at Address A+4                               | Word at Address A                                 |
| Halfword at Address A+6 | Halfword at Address A+4 | Halfword at Address A+2 | Halfword at Address A   |

New provisions in ARMv6
ARMv6 has introduced new configurations known as mixed endian support. These use a byte-invariant
address model, affecting the order that bytes are transferred to and from ARM registers. Byte invariance
means that the address of a byte in memory is the same irrespective of whether that byte is being accessed
in a big endian or little endian manner.
Byte, halfword, and word accesses access the same one, two or four bytes in memory for both big and little
endian configuration. Doubleword and multiple-word accesses in the ARM architecture are treated as a
series of word accesses from incrementing word addresses, and hence each word returns the same bytes
of information in these cases too.

Note
When an implementation is configured in mixed endian mode, this only affects data accesses and how they
are loaded/stored to/from the register file. Instruction fetches always assume a little endian byte order model.
• When configured for big endian load/store, the lowest address provides the most significant byte of
  the requested word or halfword. For LDRD/STRD this is the most significant byte of the first word
  accessed.
• When configured for little endian load/store, the lowest address provides the least significant byte of
  the requested word or halfword. For LDRD/STRD this is the least significant byte of the first word
  accessed.
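A minimal Python sketch of the byte-invariant behavior described above follows. It assumes memory is a `bytes` object and that `e_bit` stands for the load/store endian control (1 for big endian, 0 for little endian); the function names are hypothetical.

```python
def load(memory, address, size, e_bit):
    """Load 1, 2 or 4 bytes starting at `address`. The same bytes are read
    either way; only the order in which they fill the register differs."""
    order = "big" if e_bit else "little"
    return int.from_bytes(memory[address:address + size], order)


def load_doubleword(memory, address, e_bit):
    """Model LDRD as two word loads from incrementing word addresses."""
    return load(memory, address, 4, e_bit), load(memory, address + 4, 4, e_bit)


memory = bytes([0x10, 0x11, 0x12, 0x13, 0x14, 0x15, 0x16, 0x17])
```

In both configurations the first register of the doubleword load comes from addresses 0 to 3 and the second from addresses 4 to 7; only the byte order within each word changes. A byte load returns the same value regardless of `e_bit`.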

The convention adopted in this book is to identify the different endian models as follows:
• the word invariant big endian model is known as BE-32
• the byte invariant big endian model is referred to as BE-8
• little endian data is identical in both models and referred to as LE.

A2.7.3 Endian configuration and control
Prior to ARMv6, a single bit (B bit) provides endian control. It is IMPLEMENTATION DEFINED whether
implementations of ARMv5 and below support little-endian memory systems, big-endian memory systems,
or both. If a standard System Control coprocessor is attached to an ARM implementation supporting the B
bit, this configuration input can be changed by writing to bit[7] of register 1 of the System Control
coprocessor (see Register 1: Control registers on page B3-12). An implementation may preset the B bit on
reset. If an ARM processor configures for little-endian operation on reset, and it is attached to a big-endian
memory system, one of the first things the reset handler must do is switch the configured endianness to
big-endian, using an instruction sequence like:
    MRC p15, 0, r0, c1, c0      ; r0 := CP15 register 1
    ORR r0, r0, #0x80           ; Set bit[7] in r0
    MCR p15, 0, r0, c1, c0      ; CP15 register 1 := r0

This must be done before there is any possibility of a byte or halfword data access occurring, or instruction
execution in Thumb or Jazelle state.
ARMv6 supports big-endian, little-endian, and byte-invariant hybrid systems. LE and BE-8 formats must
be supported. Support of BE-32 is IMPLEMENTATION DEFINED.
Features are provided in the System Control coprocessor and CPSR/SPSR to support hybrid operation. The
System Control Coprocessor register (CP15 register 1) and CPSR bits used are:
• Bit[1] - A bit - used to enable alignment checking. Always reset to zero (alignment checking OFF).
• Bit[7] - B bit - OPTIONAL, retained for backwards compatibility.
• Bit[22] - the U bit - enables ARMv6 unaligned data support, and used with Bit[1] - the A bit - to
  determine alignment checking behavior.
• Bit[25] - the EE bit - Exception Endian bit.
• CPSR/SPSR[9] - the E bit - load/store endian control.
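To make the bit positions listed above concrete, here is a small Python sketch that extracts them from raw register values. The constant and function names are illustrative assumptions, not architectural mnemonics, and the register values are plain integers rather than real coprocessor accesses.

```python
A_BIT = 1 << 1        # CP15 register 1, bit[1]: alignment checking
B_BIT = 1 << 7        # CP15 register 1, bit[7]: legacy endian control
U_BIT = 1 << 22       # CP15 register 1, bit[22]: unaligned data support
EE_BIT = 1 << 25      # CP15 register 1, bit[25]: Exception Endian
CPSR_E_BIT = 1 << 9   # CPSR/SPSR bit[9]: load/store endian control


def decode_endian_control(cp15_r1, cpsr):
    """Return the endian and alignment control bits as 0/1 values."""
    return {
        "A": int(bool(cp15_r1 & A_BIT)),
        "B": int(bool(cp15_r1 & B_BIT)),
        "U": int(bool(cp15_r1 & U_BIT)),
        "EE": int(bool(cp15_r1 & EE_BIT)),
        "E": int(bool(cpsr & CPSR_E_BIT)),
    }


def set_b_bit(cp15_r1):
    """Same effect on the value as ORR r0, r0, #0x80 in the reset sequence."""
    return cp15_r1 | B_BIT
```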

The behavior of the memory system with respect to the U and A bits is summarized in Table A2-6.
Table A2-6
U   A   Description
0   0   Legacy (32-bit word invariant only)
0   1   Modulo 8 alignment checking: LDRD/STRD (8-bit and 32-bit invariant memory models)
1   0   Unaligned access support (8-bit byte invariant data accesses only)
1   1   Modulo 4 alignment checking: LDRD/STRD (8-bit and 32-bit invariant memory models)

The EE-bit value is used to overwrite the CPSR_E bit on exception entry and for page table lookups. These
are asynchronous events with respect to normal control of the CPSR E bit.
A 2-bit configuration (CFGEND[1:0]) replaces the BigEndinit configuration pin to provide hardware
system configuration on reset. CFGEND[1] maps to the U bit, while CFGEND[0] sets either the B bit or EE
bit and CPSR_E on reset.
Table A2-7 defines the CFGEND[1:0] encoding and associated configurations.
Table A2-7
              Coprocessor 15 System Control Register (register 1)   CPSR/SPSR
CFGEND[1:0]   EE bit[25]   U bit[22]   A bit[1]   B bit[7]          E bit
00            0            0           0          0                 0
01 (a)        0            0           0          1                 0
10            0            1           0          0                 0
11            1            1           0          0                 1
a. This configuration is RESERVED in implementations which do not support BE-32. In this case, the B bit
   must read as zero (RAZ).
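The mapping from CFGEND[1:0] to the reset values can be written out as a short Python sketch. It follows the description that CFGEND[1] maps to the U bit and CFGEND[0] sets either the B bit or the EE and E bits; the function name and the `supports_be32` parameter are assumptions introduced for illustration.

```python
def reset_state(cfgend, supports_be32=True):
    """Return the reset values of EE, U, A, B and E for a 2-bit CFGEND value."""
    u = (cfgend >> 1) & 1
    low = cfgend & 1
    if u == 0:
        # Legacy configuration: CFGEND[0] selects the B bit (BE-32).
        if low and not supports_be32:
            raise ValueError("CFGEND = 0b01 is RESERVED without BE-32 support")
        b, ee, e = low, 0, 0
    else:
        # ARMv6 configuration: CFGEND[0] sets the EE bit and the CPSR E bit.
        b, ee, e = 0, low, low
    return {"EE": ee, "U": u, "A": 0, "B": b, "E": e}
```

Each of the four CFGEND values then reproduces the corresponding row of Table A2-7.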

Where an implementation does not include configuration pins, the U bit and A bit shall clear on reset.
The usage model for the U bit and A bit with respect to the B bit and E bit is summarized in Table A2-8.
Where BE-32 is not supported, the B bit must read as zero, and all entries indicated by B==1 are RESERVED.
Interaction of these control bits with data alignment is discussed in Unaligned access support on
page A2-38.
Table A2-8 Endian and Alignment Control Bit Usage Summary
U  A  B  E  Instruction  Data        Unaligned    Description
            Endianness   Endianness  Behavior
0  0  0  0  LE           LE          Rotated LDR  Legacy LE / programmed BE configuration
0  0  0  1  -            -           -            RESERVED (no E bit in legacy code)
0  0  1  0  BE-32        BE-32       Rotated LDR  Legacy BE (32-bit word-invariant)
0  0  1  1  -            -           -            RESERVED (no E bit in legacy code)
0  1  0  0  LE           LE          Data Abort   modulo 8 LDRD/STRD doubleword alignment checking. LE Data
0  1  0  1  LE           BE-8        Data Abort   modulo 8 LDRD/STRD doubleword alignment checking. BE Data
0  1  1  0  BE-32        BE-32       Data Abort   modulo 8 LDRD/STRD doubleword alignment checking, legacy BE
0  1  1  1  -            -           -            RESERVED
1  0  0  0  LE           LE          Unaligned    LE instructions, LE mixed-endian data, unaligned access permitted
1  0  0  1  LE           BE-8        Unaligned    LE instructions, BE mixed-endian data, unaligned access permitted
1  0  1  x  -            -           -            RESERVED
1  1  0  0  LE           LE          Data Abort   modulo 4 alignment checking, LE Data
1  1  0  1  LE           BE-8        Data Abort   modulo 4 alignment checking, BE data
1  1  1  0  BE-32        BE-32       Data Abort   modulo 4 alignment checking, legacy BE
1  1  1  1  -            -           -            RESERVED

BE-32 and BE-8 are as defined in Endianness - an overview on page A2-31. Data aborts cause an alignment
error to be reported in the Fault Status Register in the system coprocessor.

Note
The U, A and B bits are System Control Coprocessor bits, while the E bit is a CPSR/SPSR flag.
The behavior of SETEND instructions (or any other instruction that modifies the CPSR) is UNPREDICTABLE
when setting the E bit would result in a RESERVED state.

A2.7.4 Instructions to change CPSR E bit
ARM and Thumb instructions are provided to set and clear the E bit efficiently:
SETEND BE    Set the CPSR E bit.
SETEND LE    Reset the CPSR E bit.
These are unconditional instructions. See ARM SETEND on page A4-129 and Thumb SETEND on
page A7-95.

A2.7.5 Instructions to reverse bytes in a general-purpose register
When an application or device driver has to interface to memory-mapped peripheral registers or
shared-memory DMA structures whose endianness differs from that of the internal data structures or
of the Operating System, it needs an efficient way to transform the endianness of the data explicitly.
The ARMv6 ARM and Thumb instruction sets provide this functionality:
• Reverse word (four bytes) register, for transforming big and little-endian 32-bit representations. See
  ARM REV on page A4-109 and Thumb REV on page A7-88.
• Reverse halfword and sign-extend, for transforming signed 16-bit representations. See ARM REVSH
  on page A4-111 and Thumb REVSH on page A7-90.
• Reverse packed halfwords in a register for transforming big- and little-endian 16-bit representations.
  See ARM REV16 on page A4-110 and Thumb REV16 on page A7-89.
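The effect of the three byte-reverse operations on a 32-bit register value can be sketched in Python as follows. This models only the data transformation described above; the lower-case function names are chosen for the example and the instruction descriptions remain the authoritative definition.

```python
def rev(x):
    """REV: reverse the order of the four bytes in a 32-bit word."""
    return int.from_bytes((x & 0xFFFFFFFF).to_bytes(4, "little"), "big")


def rev16(x):
    """REV16: reverse the two bytes within each halfword independently."""
    x &= 0xFFFFFFFF
    return ((x & 0x00FF00FF) << 8) | ((x & 0xFF00FF00) >> 8)


def revsh(x):
    """REVSH: reverse the bytes of the bottom halfword, then sign-extend
    the 16-bit result to 32 bits."""
    half = ((x & 0xFF) << 8) | ((x >> 8) & 0xFF)
    if half & 0x8000:
        half |= 0xFFFF0000
    return half
```

For example, `rev(0x11223344)` gives `0x44332211`, `rev16(0x11223344)` gives `0x22114433`, and `revsh(0x00001280)` gives `0xFFFF8012`.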

A2.8 Unaligned access support
The ARM architecture traditionally expects all memory accesses to be suitably aligned. In particular, the
address used for a halfword access should normally be halfword-aligned, and the address used for a word
access should normally be word-aligned.
Prior to ARMv6, doubleword (LDRD/STRD) accesses to memory, where the address is not doubleword-aligned,
are UNPREDICTABLE. Also, data accesses to non-aligned word and halfword data are treated as aligned from
the memory interface perspective. That is:
• the address is treated as truncated, with address bits[1:0] treated as zero for word accesses, and
  address bit[0] treated as zero for halfword accesses.
• load single word ARM instructions are architecturally defined to rotate right the word-aligned data
  transferred by a non word-aligned address by one, two or three bytes, depending on the value of the two
  least significant address bits.
• alignment checking is defined for implementations supporting a System Control coprocessor using
  the A bit in CP15 register 1. When this bit is set, a Data Abort indicating an alignment fault is reported
  for unaligned accesses.
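The legacy rotated-load behavior in the list above can be modelled with a brief Python sketch. It assumes a little-endian memory image held in a `bytes` object and alignment checking disabled; `ror32` and `legacy_ldr` are hypothetical helper names.

```python
def ror32(value, bits):
    """Rotate a 32-bit value right by `bits` bit positions."""
    bits %= 32
    return ((value >> bits) | (value << (32 - bits))) & 0xFFFFFFFF


def legacy_ldr(memory, address):
    """Word load with a possibly unaligned address: fetch the word at the
    truncated (word-aligned) address, then rotate it right by 8 bits for
    each unit of address bits[1:0]."""
    aligned = address & ~0b11
    word = int.from_bytes(memory[aligned:aligned + 4], "little")
    return ror32(word, 8 * (address & 0b11))


memory = bytes([0x11, 0x22, 0x33, 0x44])
```

With this image, an aligned load from address 0 returns 0x44332211, while loads from addresses 1, 2 and 3 return 0x11443322, 0x22114433 and 0x33221144 respectively.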

ARMv6 introduces unaligned word and halfword load and store data access support. When this is enabled,
the processor uses one or more memory accesses to generate the required transfer of adjacent bytes
transparently to the programmer, apart from a potential access time penalty where the transaction crosses an
IMPLEMENTATION DEFINED cache-line, bus-width or page boundary condition. Doubleword accesses must
be word-aligned in this configuration.

A2.8.1 Unaligned instruction fetches
All instruction fetches must be aligned. Specifically they must be:
• word aligned in ARM state
• halfword aligned in Thumb state.
Writing an unaligned address to R15 is UNPREDICTABLE, except in the specific cases where the instructions
are associated with a Thumb to ARM state transition, bit[1] providing a valid address bit on transition to
Thumb state, and bit[0] indicating whether a transition needs to occur. The BX instruction in ARM state (see
BX on page A4-20) and POP instruction in Thumb state (see POP on page A7-82) are examples of
instructions providing state transition support.
The general rules for reading and writing the program counter are defined in Register 15 and the program
counter on page A2-9.

A2.8.2 Unaligned data access in ARMv6 systems
ARMv6 uses the U bit (CP15 register 1 bit[22]) and A bit (CP15 register 1 bit[1]) to provide a configuration
supporting the following unaligned memory accesses:
• Unaligned halfword accesses for LDRH, LDRSH and STRH.
• Unaligned word accesses for LDR, LDRT, STR and STRT.

The U bit and A bit are also used to configure endian support as described in Endian configuration and
control on page A2-34. All other multi-byte load and store accesses shall be word aligned.
Instructions must always be aligned (and in little endian format):
• ARM instructions must be word-aligned
• Thumb instructions must be halfword-aligned.
In addition, an ARMv6 system shall reset to the CFGEND[1:0] condition as described in Table A2-7 on
page A2-35.
For ARMv6, Table A2-10 on page A2-40 defines when an alignment fault must occur for an access, and
when the behavior of an access is architecturally UNPREDICTABLE. It also gives details of precisely which
memory locations are returned for valid accesses.
The access type descriptions used in this section are determined from the load/store instructions as described
in Table A2-9:
Table A2-9
Access type   ARM instructions                                    Thumb instructions
Byte          LDRB LDRBT LDRSB STRB STRBT SWPB (either access)    LDRB LDRSB STRB
Halfword      LDRH LDRSH STRH                                     LDRH LDRSH STRH
WLoad         LDR LDRT SWP (load access, if U == 0)               LDR
WStore        STR STRT SWP (store access, if U == 0)              STR
WSync         LDREX STREX SWP (either access, if U == 1)          -
Two-word      LDRD STRD                                           -
Multi-word    LDC LDM RFE SRS STC STM                             LDMIA POP PUSH STMIA

The following terminology is used to describe the memory locations accessed:
Byte[X]      Means the byte whose address is X in the current endianness model. The correspondence
             between the endianness models is that Byte[A] in the LE endianness model, Byte[A] in the
             BE-8 endianness model, and Byte[A EOR 3] in the BE-32 endianness model are the same
             actual byte of memory.
Halfword[X]  Means the halfword consisting of the bytes whose addresses are X and X+1 in the current
             endianness model, combined to form a halfword in little-endian order in the LE endianness
             model or in big-endian order in the BE-8 or BE-32 endianness model.
Word[X]      Means the word consisting of the bytes whose addresses are X, X+1, X+2, and X+3 in the
             current endianness model, combined to form a word in little-endian order in the LE
             endianness model or in big-endian order in the BE-8 or BE-32 endianness model.
             Note
             It is a consequence of these definitions that if X is word-aligned, Word[X] consists of the
             same four bytes of actual memory in the same order in the LE and BE-32 endianness
             models.
Align[X]     Means (X AND 0xFFFFFFFC) - that is, X with its least significant two bits forced to zero to make
             it word-aligned.
             Note
             There is no difference between Addr and Align(Addr) on lines for which Addr[1:0] == 0b00
             anyway. This can be exploited by implementations to simplify the control of when the least
             significant bits are forced to zero.
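The terminology above translates directly into a small Python sketch. Here `memory` is assumed to be a `bytes` object holding the actual bytes of memory indexed by their LE/BE-8 address, and `model` is one of the strings "LE", "BE-8" or "BE-32"; both conventions, and the function names, are choices made for this illustration.

```python
def align(x):
    """Align(X): X with its two least significant bits forced to zero."""
    return x & 0xFFFFFFFC


def byte(memory, x, model):
    """Byte[X]: Byte[A] in LE and BE-8 is the same actual byte as
    Byte[A EOR 3] in BE-32."""
    return memory[x ^ 3] if model == "BE-32" else memory[x]


def halfword(memory, x, model):
    """Halfword[X]: bytes X and X+1 combined in the model's byte order."""
    b0, b1 = byte(memory, x, model), byte(memory, x + 1, model)
    return (b1 << 8) | b0 if model == "LE" else (b0 << 8) | b1


def word(memory, x, model):
    """Word[X]: bytes X to X+3 combined in the model's byte order."""
    data = bytes(byte(memory, x + i, model) for i in range(4))
    return int.from_bytes(data, "little" if model == "LE" else "big")


memory = bytes([0x11, 0x22, 0x33, 0x44, 0x55, 0x66, 0x77, 0x88])
```

The sketch reproduces the consequence stated in the Note on Word[X]: for a word-aligned X, `word(memory, X, "LE")` and `word(memory, X, "BE-32")` are equal, whereas the BE-8 result has the bytes in the opposite order.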
For the Two-word and Multi-word access types, the Memory accessed column only specifies the lowest
word accessed. Subsequent words have addresses constructed by successively incrementing the address of
the lowest word by 4, and are constructed using the same endianness model as the lowest word.
Table A2-10 Data Access Behavior in ARMv6 Systems
U  A  Addr[2:0]      Access types                        Behavior         Memory accessed    Notes
0  0  LEGACY, NO ALIGNMENT FAULTING
0  0  xxx            Byte                                Normal           Byte[Addr]         -
0  0  xx0            Halfword                            Normal           Halfword[Addr]     -
0  0  xx1            Halfword                            UNPREDICTABLE    -                  -
0  0  xxx            WLoad                               Normal           Word[Align(Addr)]  Loaded data rotated right by 8 * Addr[1:0] bits
0  0  xxx            WStore                              Normal           Word[Align(Addr)]  Operation unaffected by Addr[1:0]
0  0  x00            WSync                               Normal           Word[Addr]         -
0  0  xx1, x1x       WSync                               UNPREDICTABLE    -                  -
0  0  xxx            Multi-word                          Normal           Word[Align(Addr)]  Operation unaffected by Addr[1:0]
0  0  000            Two-word                            Normal           Word[Addr]         -
0  0  xx1, x1x, 1xx  Two-word                            UNPREDICTABLE    -                  -
1  0  NEW ARMv6 UNALIGNED SUPPORT
1  0  xxx            Byte                                Normal           Byte[Addr]         -
1  0  xxx            Halfword                            Normal           Halfword[Addr]     -
1  0  xxx            WLoad, WStore                       Normal           Word[Addr]         -
1  0  x00            WSync, Multi-word, Two-word         Normal           Word[Addr]         -
1  0  xx1, x1x       WSync, Multi-word, Two-word         Alignment Fault  -                  -
x  1  FULL ALIGNMENT FAULTING
x  1  xxx            Byte                                Normal           Byte[Addr]         -
x  1  xx0            Halfword                            Normal           Halfword[Addr]     -
x  1  xx1            Halfword                            Alignment Fault  -                  -
x  1  x00            WLoad, WStore, WSync, Multi-word    Normal           Word[Addr]         -
x  1  xx1, x1x       WLoad, WStore, WSync, Multi-word    Alignment Fault  -                  -
x  1  000            Two-word                            Normal           Word[Addr]         -
0  1  100            Two-word                            Alignment Fault  -                  -
1  1  100            Two-word                            Normal           Word[Addr]         -
x  1  xx1, x1x       Two-word                            Alignment Fault  -                  -
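The Behavior column of Table A2-10 can be expressed as a lookup function. The following Python sketch is one possible encoding, assuming the access type is passed as one of the strings used in Table A2-9; the function name is hypothetical, and the sketch ignores the additional UNPREDICTABLE cases listed after the table.

```python
def data_access_behavior(u, a, addr, access):
    """Return 'Normal', 'UNPREDICTABLE' or 'Alignment Fault' for an access."""
    low2 = addr & 0b11
    low3 = addr & 0b111
    if access == "Byte":
        return "Normal"
    if access not in ("Halfword", "WLoad", "WStore", "WSync", "Two-word", "Multi-word"):
        raise ValueError("unknown access type: " + access)
    if a == 1:
        # Full alignment faulting (U is a don't-care except for Two-word at 0b100).
        if access == "Halfword":
            return "Normal" if addr & 1 == 0 else "Alignment Fault"
        if access == "Two-word":
            if low2 != 0:
                return "Alignment Fault"
            if low3 == 0 or u == 1:
                return "Normal"
            return "Alignment Fault"
        return "Normal" if low2 == 0 else "Alignment Fault"
    if u == 1:
        # New ARMv6 unaligned support.
        if access in ("Halfword", "WLoad", "WStore"):
            return "Normal"
        return "Normal" if low2 == 0 else "Alignment Fault"
    # U == 0, A == 0: legacy behavior, no alignment faulting.
    if access == "Halfword":
        return "Normal" if addr & 1 == 0 else "UNPREDICTABLE"
    if access == "WSync":
        return "Normal" if low2 == 0 else "UNPREDICTABLE"
    if access == "Two-word":
        return "Normal" if low3 == 0 else "UNPREDICTABLE"
    return "Normal"  # WLoad, WStore, Multi-word use Word[Align(Addr)]
```

For example, a Two-word access with Addr[2:0] == 0b100 faults when U == 0 and A == 1, but is Normal when U == 1 and A == 1.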

Other reasons for unaligned accesses to be UNPREDICTABLE
The following exceptions to the behavior described in Table A2-10 on page A2-40 apply, causing the
resultant unaligned accesses to be UNPREDICTABLE:
• An LDR instruction that loads the PC, has Addr[1:0] != 0b00, and is specified in the table as having
  Normal behavior instead has UNPREDICTABLE behavior.
  Note
  The reason this applies only to LDR is that most other load instructions are UNPREDICTABLE regardless
  of alignment if the PC is specified as their destination register. The exceptions are LDM, RFE and Thumb
  POP. If Addr[1:0] != 0b00 for these instructions, the effective address of the transfer has its two least
  significant bits forced to 0 if A == 0 and U == 0, and otherwise the behavior specified in the table is
  either UNPREDICTABLE or Alignment Fault regardless of the destination register.
• Any WLoad, WStore, WSync, Two-word or Multi-word instruction that accesses memory with the
  Strongly Ordered or Device memory attribute, has Addr[1:0] != 0b00, and is specified in the table
  as having Normal behavior instead has UNPREDICTABLE behavior.
• Any Halfword instruction that accesses memory with the Strongly Ordered or Device memory
  attribute, has Addr[0] != 0, and is specified in the table as having Normal behavior instead has
  UNPREDICTABLE behavior.

If any of these reasons applies, it overrides the behavior specified in the table.

Note
These reasons never cause Alignment Fault behavior to be overridden.
ARM implementations are not required to ensure that the low-order address bits that make an access
unaligned are cleared from the address they send to memory. They can instead send the address as calculated
by the load/store instruction unchanged to memory, and require the memory system to ignore address[0] for
a halfword access and address[1:0] for a word access.

When an instruction ignores the low-order address bits that make an access unaligned, the pseudo-code in
the instruction description does not mask them out explicitly. Instead, the Memory[,] function used in the
pseudo-code masks them out implicitly.

ARMv6 unaligned data access restrictions
ARMv6 has the following restrictions on unaligned data accesses:
• Accesses are not guaranteed atomic. They can be synthesized out of a series of aligned operations in a
  shared memory system without guaranteeing locked transaction cycles.
• Accesses typically take a number of cycles to complete compared to a naturally aligned transfer. The
  real-time implications must be carefully analyzed and key data structures might need to have their
  alignment adjusted for optimum performance.
• Accesses can abort on either or both halves of an access where this occurs over a page boundary. The
  Data Abort handler must handle restartable aborts carefully after an Alignment Fault Status Code is
  signaled.
Therefore shared memory schemes should not rely on seeing monotonic updates of non-aligned data of loads,
stores, and swaps for data items greater than byte width. Unaligned access operations should not be used
for accessing Device memory-mapped registers. They must also be used with care in shared memory
structures that are protected by aligned semaphores or synchronization variables.

A2.9 Synchronization primitives
Historically, support for shared memory synchronization has been with the read-locked-write operations
that swap register contents with memory; the SWP and SWPB instructions described in SWP on page A4-212
and SWPB on page A4-214. These support basic busy/free semaphore mechanisms, but not mechanisms that
require calculation to be performed on the semaphore between the read and write phases. ARMv6 provides a
new mechanism to support more comprehensive non-blocking shared-memory synchronization primitives that
scale for multiple-processor system designs.

Note
The swap and swap byte instructions are deprecated in ARMv6.
It is recommended that all software migrates to using the new synchronization primitives.

Two instructions are introduced to the ARM instruction set:
• Load-Exclusive described in LDREX on page A4-52
• Store-Exclusive described in STREX on page A4-202.
The instructions operate in concert with an address monitor, which provides the state machine and
associated system control for memory accesses. Two different monitor models exist, depending on whether
the memory has the sharable or non-sharable memory attribute. See Shared attribute on page B2-12.
Uniprocessor systems are only required to support the non-shared memory model, allowing them to support
synchronization primitives with the minimum amount of hardware overhead. An example minimal system is
illustrated in Figure A2-2.

Figure A2-2 Example uniprocessor (non-shared) monitor (blocks shown: L2 RAM, L2 Cache, Bridge to L3,
Routing matrix, Monitor, CPU 1)

Multi-processor systems are required to implement an address monitor for each processor. It is
IMPLEMENTATION DEFINED where the monitors reside in the memory system hierarchy, whether they are
implemented as a single entity for each processor visible to all shared accesses, or as a distributed entity.
Figure A2-3 on page A2-45 illustrates a single entity approach in which the monitor supports state machines
for both the shared and non-shared cases. Only the shared attribute case needs to snoop.

Figure A2-3 Write snoop monitor approach (blocks shown: L2 RAM, L2 Cache, Bridge to L3, Routing matrix,
Monitor with CPU 1, Monitor with CPU 2)

Figure A2-4 illustrates a distributed model with local monitors residing in the processor blocks, and global
monitors distributed across the targets of interest.

Figure A2-4 Monitor-at-target approach (blocks shown: Shared L2 RAM, Nonshared L2 RAM, L2 Cache,
Bridge to L3, Mon 1 and Mon 2 (three of each), Routing matrix, Local Monitor with CPU 1, Local Monitor
with CPU 2)

A2.9.1 Exclusive access instructions: non-shared memory
For memory regions that do not have the Shared TLB attribute, the exclusive-access instructions rely on the
ability to tag the fact that an exclusive load has been executed. Any non-aborted attempt by the processor
that executed the exclusive load to modify any address using an exclusive store is guaranteed to clear this
tag.

Note
In non-shared memory, it is UNPREDICTABLE whether a store to a tagged physical address will cause a tag to
be cleared when that store is by a processor other than the one that caused the physical address to be tagged.

Load-Exclusive performs a load from memory, and causes the executing processor to tag the fact that it has
an outstanding tagged physical address to non-sharable memory; the monitor transitions state to Exclusive
Access.
Store-Exclusive performs a conditional store to memory, the store only taking place if the local monitor of
the executing processor is in the Exclusive Access state. A status value of 0b0 is returned to a register, and
the executing processor's monitor transitions to the Open Access state. If the store is prevented, a value of
0b1 is returned in the instruction defined register.
A write to a physical address not covered by the local monitor by that processor using any instruction other
than a Store-Exclusive will not affect the state of the local monitor. It is IMPLEMENTATION DEFINED whether
a write (other than with a Store-Exclusive) to the physical address which is covered by the monitor will
affect the state of the local monitor.
If a processor performs a Store-Exclusive to any address in non-shared memory other than the last one from
which it has performed a Load-Exclusive, and the monitor is in the exclusive state, it is IMPLEMENTATION
DEFINED whether the store will succeed in this case. This mechanism is used on a context switch (see section
Context switch support on page A2-48). It should be treated as a software programming error in all other
cases.
The state machine for the associated data monitor is illustrated in Figure A2-5.

Figure A2-5 State diagram - local monitor (states shown: Open Access, Exclusive Access)
The arcs in italics show allowable alternative (IMPLEMENTATION DEFINED) options.
The Tagged_address value of 'a' is IMPLEMENTATION DEFINED to a value between 2 and 7 inclusive.

Note
The IMPLEMENTATION DEFINED options for the local monitor are consistent with the local monitor being
constructed in a manner that it does not hold any physical address, but instead treats all accesses as
matching the address of the previous LDREX.

The behavior illustrated is for the local address monitor associated with the processor issuing the LDREX,
STREX and STR instructions. The transition from Exclusive Access to Open Access is UNPREDICTABLE when
the STR or STREX is from a different processor. Transactions from other processors need not be visible to
this monitor.
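
The local monitor behavior for non-shared memory can be sketched as a two-state machine in Python. This is an illustrative model only: the class and method names are hypothetical, memory is a dictionary, and wherever the text leaves a choice IMPLEMENTATION DEFINED the sketch picks one option (a Store-Exclusive to a non-tagged address fails, and an ordinary store never changes the monitor state).

```python
class LocalMonitor:
    """Single-processor model of the local monitor for non-shared memory."""

    def __init__(self, granule_bits=2):
        self.a = granule_bits      # Tagged_address holds x[31:a]; 'a' is 2 to 7
        self.exclusive = False     # False = Open Access, True = Exclusive Access
        self.tag = None
        self.memory = {}

    def ldrex(self, address):
        """Load-Exclusive: load, tag the address, move to Exclusive Access."""
        self.exclusive = True
        self.tag = address >> self.a
        return self.memory.get(address, 0)

    def strex(self, address, value):
        """Store-Exclusive: return 0 and update memory on success, 1 otherwise.
        The monitor always ends in the Open Access state."""
        matched = self.exclusive and self.tag == (address >> self.a)
        self.exclusive = False
        if not matched:            # assumed choice for the non-tagged case
            return 1
        self.memory[address] = value
        return 0

    def store(self, address, value):
        """Ordinary STR: updates memory; assumed not to affect the monitor."""
        self.memory[address] = value


def atomic_increment(monitor, address):
    """Retry loop of the kind the exclusive-access instructions are meant for."""
    while True:
        value = monitor.ldrex(address)
        if monitor.strex(address, value + 1) == 0:
            return value + 1
```

A Store-Exclusive issued while the monitor is in the Open Access state returns 1 and leaves memory unchanged, so software is expected to loop back to the Load-Exclusive, as `atomic_increment` does.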
A2.9.2 Exclusive access instructions: shared memory

For memory regions that have the Shared TLB attribute, the exclusive-access instructions rely on the ability of a global monitor to tag a physical address as exclusive-access for a particular processor. This tag will later be used to determine whether an exclusive store to that address should occur. Any non-aborted attempt to modify that address by any processor is guaranteed to clear this tag.

A global monitor can reside in a processor block as illustrated in Figure A2-3 on page A2-45, or as a secondary monitor at the memory interface, as shown in Figure A2-4 on page A2-45. The functionality of the global and local monitors can be combined into a single monitor in implementations.

Load-Exclusive from shared memory performs a load from memory, and causes the physical address of the access to be tagged as exclusive-access for the requesting processor. This also causes any other physical address that has been tagged by the requesting processor to no longer be tagged as exclusive access; only a single outstanding exclusive access to sharable memory per processor is supported.

Store-Exclusive performs a conditional store to memory. The store is only guaranteed to take place if the physical address is tagged as exclusive-access for the requesting processor. If no address is tagged as exclusive-access, the store will not succeed. If a different physical address is tagged as exclusive-access for the requesting processor, it is IMPLEMENTATION DEFINED whether the store will succeed or not. A status value of 0b0 is returned to a register to acknowledge a successful store, otherwise a value of 0b1 is returned. In the case where the physical address is tagged as exclusive-access for the requesting processor, the state of the exclusive monitor transitions to the Open Access state, and if the monitor was originally in the Open Access state, it remains in this state.
Otherwise, it is IMPLEMENTATION DEFINED whether the monitor remains in the Exclusive Access state or transitions to the Open Access state.

Every processor (or independent DMA agent) in a shared memory system requires its own address monitor. The state machine for the global address monitor associated with a processor (n) in a multiprocessing environment interacts with all the memory accesses visible to it:
• transactions generated by the associated processor (n)
• transactions associated with other processors in the shared memory system (!n).

The behavior is illustrated in Figure A2-6 on page A2-48.

[Figure A2-6: state diagram with two states, Open Access and Exclusive Access. Arcs are labelled with LDREX, STREX and STR accesses by the associated processor (n) and by other processors (!n), with Tagged_address <= x[31:a] on LDREX(x,n), and with the STREX results (Rm <= 1’b0 AND update memory) or (Rm <= 1’b1 AND do not update memory).]

* STREX(Tagged_Address,!n) only clears monitor if the STREX updates memory.

The arcs in italics show allowable alternative (IMPLEMENTATION DEFINED) options. The Tagged_address value of ‘a’ is IMPLEMENTATION DEFINED to a value between 2 and 7 inclusive.

Figure A2-6 State diagram - global monitor

Note
Whether a STREX successfully updates memory or not is dependent on a tag address match with its associated global monitor, hence the (!n) entries are only shown with respect to how they influence state transitions of the state machine.
Similarly, an LDREX can only update the tag of its associated global monitor.

A2.9.3 Context switch support

It is necessary to ensure that the local monitor is in the Open Access state after a context switch. This requires execution of a dummy STREX to an address in memory allocated for this purpose.

For reasons of performance, it is recommended that the store-exclusive instruction be within a few instructions of the load-exclusive instruction. This minimizes the opportunity for context switch overhead or multiprocessor access conflicts causing an exclusive store to fail, and requiring the load/store sequence to be replayed.

A2.9.4 Summary of operation

The following pseudo-functions can be used to describe the exclusive access operations:
• TLB(Rm)
• Shared(Rm)
• ExecutingProcessor()
• MarkExclusiveGlobal(physical_address,processor_id,size)
• MarkExclusiveLocal(physical_address,processor_id,size)
• IsExclusiveGlobal(physical_address,processor_id,size)
• IsExclusiveLocal(physical_address,processor_id,size)
• ClearExclusiveByAddress(physical_address,processor_id,size)
• ClearExclusiveLocal(processor_id).

1. If CP15 register 1 bit[0] (Mbit) is set, TLB(Rm) returns the physical address corresponding to the virtual address in Rm for the executing processor's current process ID and TLB entries. If Mbit is not set, or the system does not implement a virtual to physical translation, it returns the value in Rm.

2. If CP15 register 1 bit[0] (Mbit) is set, Shared(Rm) returns the value of the shared memory region attribute corresponding to the virtual address in Rm for the executing processor's current process ID and TLB entries for the VMSA, or the PMSA region descriptors. If Mbit is not set, the value returned is a function of the memory system behavior (see Chapter B4 Virtual Memory System Architecture and Chapter B5 Protected Memory System Architecture).

3. ExecutingProcessor() returns a value distinct amongst all processors in a given system, corresponding to the processor executing the operation.

4. MarkExclusiveGlobal(physical_address,processor_id,size) records the fact that processor processor_id has requested exclusive access covering at least size bytes from address physical_address. The size of region marked as exclusive is IMPLEMENTATION DEFINED, up to a limit of 128 bytes, and no smaller than size, and aligned in the address space to the size of the region. It is UNPREDICTABLE whether this causes any previous request for exclusive access to any other address by the same processor to be cleared.

5. MarkExclusiveLocal(physical_address,processor_id,size) records in a local record the fact that processor processor_id has requested exclusive access to an address covering at least size bytes from address physical_address. The size of the region marked as exclusive is IMPLEMENTATION DEFINED, and can at its largest cover the whole of memory, but is no smaller than size, and is aligned in the address space to the size of the region. It is IMPLEMENTATION DEFINED whether this also performs a MarkExclusiveGlobal(physical_address,processor_id,size).

6. IsExclusiveGlobal(physical_address,processor_id,size) returns TRUE if the processor processor_id has marked in a global record an address range as exclusive access requested which covers at least the size bytes from address physical_address. It is IMPLEMENTATION DEFINED whether it returns TRUE or FALSE if a global record has marked a different address as exclusive access requested. If no address is marked in a global record as exclusive access, IsExclusiveGlobal(physical_address,processor_id,size) will return FALSE.

7. IsExclusiveLocal(physical_address,processor_id,size) returns TRUE if the processor processor_id has marked an address range as exclusive access requested which covers at least the size bytes from address physical_address. It is IMPLEMENTATION DEFINED whether this function returns TRUE or FALSE if the address marked as exclusive access requested does not cover all of the size bytes from address physical_address. If no address is marked as exclusive access requested, then this function returns FALSE. It is IMPLEMENTATION DEFINED whether this result is ANDed with the result of an IsExclusiveGlobal(physical_address,processor_id,size).

8. ClearExclusiveByAddress(physical_address,processor_id,size) clears the global records of all processors, other than processor_id, that an address region including any of the bytes between physical_address and (physical_address+size-1) has had a request for an exclusive access. It is IMPLEMENTATION DEFINED whether the equivalent global record of the processor processor_id is also cleared if any of the bytes between physical_address and (physical_address+size-1) have had a request for an exclusive access, or if any other address has had a request for an exclusive access.

9. ClearExclusiveLocal(processor_id) clears the local record of processor processor_id that an address has had a request for an exclusive access. It is IMPLEMENTATION DEFINED whether this operation also clears the global record of processor processor_id that an address has had a request for an exclusive access.

For the purpose of this definition, a processor is defined as a system component, including virtual system components, which is capable of generating memory transactions. The processor_id is defined as a unique identifier for a processor.

Effects on other store operations

All executed store operations gain the following functional behavior to their pseudo-code operation:

    processor_id = ExecutingProcessor()
    if Shared(address) then /* from ARMv6 */
        physical_address = TLB(address)
        ClearExclusiveByAddress(physical_address,processor_id,size)

Load and store operation

The exclusive accesses can be described in terms of their register file usage:
• Rd: the destination register, for data on loads, status on stores
• Rm: the source data register for stores
• Rn: the memory address register for loads and stores.

A pseudo-code representation is as follows.
LDREX operation:

    if ConditionPassed(cond) then
        processor_id = ExecutingProcessor()
        Rd = Memory[Rn,4]
        physical_address = TLB(Rn)
        if Shared(Rn) == 1 then
            MarkExclusiveGlobal(physical_address,processor_id,4)
        MarkExclusiveLocal(physical_address,processor_id,4)

STREX operation:

    if ConditionPassed(cond) then
        processor_id = ExecutingProcessor()
        physical_address = TLB(Rn)
        if IsExclusiveLocal(physical_address,processor_id,4) then
            if Shared(Rn) == 1 then
                if IsExclusiveGlobal(physical_address,processor_id,4) then
                    Memory[Rn,4] = Rm
                    Rd = 0
                    ClearExclusiveByAddress(physical_address,processor_id,4)
                else
                    Rd = 1
            else
                Memory[Rn,4] = Rm
                Rd = 0
        else
            Rd = 1
        ClearExclusiveLocal(processor_id)

Note
The behavior of STREX in regions of shared memory that do not support exclusives (for example, have no exclusives monitor implemented) is UNPREDICTABLE.

For a complete definition of the instruction behavior see LDREX on page A4-52 and STREX on page A4-202.

Usage restrictions

The LDREX and STREX instructions are designed to work in tandem. In order to support a number of different implementations of these functions, the following notes and restrictions must be followed:

1. The exclusives are designed to support a single outstanding exclusive access for each processor thread that is executed. The architecture makes use of this by not mandating an address or size check as part of the IsExclusiveLocal() function. If the target address of an STREX is different from the preceding LDREX within the same execution thread, it can lead to UNPREDICTABLE behavior. As a result, an LDREX/STREX pair can only be relied upon to eventually succeed if they are executed with the same address. Where a context switch or exception might result in a change of execution thread, a dummy STREX instruction, as described in Context switch support on page A2-48, should be executed to avoid unwanted effects.
   This is the only occasion where an STREX is expected to be programmed with a different address from the previously executed LDREX.

2. An explicit store to memory can cause the clearing of exclusive monitors associated with other processors, therefore, performing a store between the LDREX and the STREX can result in livelock situations. As a result, code should avoid placing an explicit store between an LDREX and an STREX within a single code sequence.

3. Two STREX instructions executed without an intervening LDREX will also result in the second STREX returning FALSE. As a result, it is expected that each STREX should have a preceding LDREX associated with it within a given thread of execution, but it is not necessary that each LDREX must have a subsequent STREX.

4. Implementations can cause apparently spurious clearing of the exclusive monitor between the LDREX and the STREX, as a result of, for example, cache evictions. Code designed to run on such implementations should avoid having any explicit memory transactions or cache maintenance operations between the LDREX and STREX instructions.

5. Implementations can benefit from keeping the LDREX and STREX operations close together in a single code sequence. This reduces the likelihood of spurious clearing of the exclusive monitor state occurring, and as a result, a limit of 128 bytes between LDREX and STREX instructions in a single code sequence is strongly recommended for best performance.

6. Implementations which implement coherent protocols, or have only a single master, may combine the local and global monitors for a given processor. The IMPLEMENTATION DEFINED and UNPREDICTABLE parts of the definitions in Summary of operation on page A2-49 are designed to cover this behavior.

7. The architecture sets an upper limit of 128 bytes on the regions that may be marked as exclusive.
   Therefore, for performance reasons, software is recommended to separate objects that will be accessed by exclusive accesses by at least 128 bytes. This is a performance guideline rather than a functional requirement.

8. LDREX and STREX operations shall only be performed on memory supporting the Normal memory attribute.

9. The effect of data aborts on the state of monitors is UNPREDICTABLE. It is recommended that abort handling code performs a dummy STREX instruction to clear down the monitor state.

A2.10 The Jazelle Extension

The Jazelle Extension was first introduced in ARMv5TEJ, a variant of ARMv5, and is a mandated feature in ARMv6. The Jazelle Extension enables architectural support for hardware acceleration of opcode execution by Java Virtual Machines (JVMs).

It is designed in such a way that JVMs can be written to automatically take advantage of any accelerated opcode execution supplied by the processor, without relying upon it being present. In the simplest implementations, the processor does not accelerate the execution of any opcodes, and all opcodes are executed by software routines. This is known as a trivial implementation of the Jazelle Extension, and has minimal costs compared with not implementing the Jazelle Extension at all. Non-trivial implementations of the Jazelle Extension will typically implement a subset of the opcodes in hardware, choosing opcodes that can have simple hardware implementations and that account for a large percentage of Jazelle execution time.
The required features of a non-trivial implementation are:
• provision of an additional state bit (the J bit) in the CPSR and each SPSR
• a new instruction to enter Jazelle state (BXJ)
• extension of the PC to support full 32-bit byte addressing
• changes to the exception model
• mechanisms to allow a JVM to configure the Jazelle Extension hardware to its specific needs
• mechanisms to allow OSes to regulate use of the Jazelle Extension hardware.

The required features of a trivial implementation are:
• Only ARM and Thumb execution states shall exist. The J bit may always read and write as zero. Should the J bit update to one, execution of the following instruction is UNDEFINED.
• The BXJ instruction shall behave as a BX instruction.
• Configuration support that maintains the interface as permanently disabled.

A JVM that has been written to automatically take advantage of hardware-accelerated opcode execution is known as an Enabled JVM (EJVM).

A2.10.1 Subarchitectures

ARM implementations that include the Jazelle Extension expect the ARM processor’s general-purpose registers and other resources to obey a calling convention when Jazelle state execution is entered and exited. For example, a specific general-purpose register may be reserved for use as the pointer to the current opcode. In order for an EJVM or associated debug support to function correctly, it must be written to comply with the calling convention expected by the acceleration hardware at Jazelle state execution entry and exit points.

The calling convention is relied upon by an EJVM, but not in general by other system software. This limits the cost of changing the convention to the point that it can be considered worthwhile to change it if a sufficient technical advantage is obtained by doing so, such as a significant performance improvement in opcode execution.
Multiple conventions are known collectively as the subarchitecture of the implementation. They are not described in this document, and must only be relied upon by EJVM implementations and debug/similar software as described above. All other software must only rely upon the general architectural definition of the Jazelle Extension described in this section. A particular subarchitecture is identified by reading the Jazelle ID register described in Jazelle ID register on page A2-62.

A2.10.2 Jazelle state

The Jazelle Extension makes use of an extra state bit (J) in the processor status registers (the CPSR and the banked SPSRs). This is bit[24] of the registers concerned:

Bit(s)   Field
31       N
30       Z
29       C
28       V
27       Q
26:25    Rsrvd
24       J
23:20    RESERVED
19:16    GE[3:0]
15:10    RESERVED
9        E
8        A
7        I
6        F
5        T
4:0      Mode

The other bit fields are described in Program status registers on page A2-11.

Note
The placement of the J bit in the flags byte was to avoid any usage of the status or extension bytes in code run on ARMv5TE or earlier processors. This ensures that OS code written using the deprecated CPSR, SPSR, CPSR_all, or SPSR_all syntax for the destination of an MSR instruction only ceases to work when features introduced in ARMv6 are used, namely the E, A and GE bit fields.

In addition, J is always 0 at times that an MSR instruction is executed. This ensures there are no unexpected side-effects of existing instructions such as MSR CPSR_f,#0xF0000000, that are used to put the flags into a known state.

The J bit is used in conjunction with the T bit to determine the execution state of the processor, as shown in Table A2-11.
Table A2-11

J   T   Execution state
0   0   ARM state, executing 32-bit ARM instructions
0   1   Thumb state, executing 16-bit Thumb instructions
1   0   Jazelle state, executing variable-length Jazelle opcodes
1   1   UNDEFINED, and reserved for future expansion

The J bit is treated similarly to the T bit in the following respects:
• On exception entry, both bits are copied from the CPSR to the exception mode’s SPSR, and then cleared in the CPSR to put the processor into the ARM state.
• Data processing instructions with Rd = R15 and the S bit set cause these bits to be copied from the SPSR to the CPSR and execution to resume in the resulting state. This ensures that these instructions have their normal exception return functionality.
  Such exception returns are expected to use the SPSR and R14 values generated by a processor exception entry and to use the appropriate return instruction for the exception concerned, as described in Exceptions on page A2-16.
  If return values are used with J == 1 and T == 0 in the SPSR value, then the results are SUBARCHITECTURE DEFINED.
• Similarly, LDM instructions with the PC in the register list and ^ specified (that is, LDM (3) instructions, as described in LDM (3) on page A4-40) cause both bits to be copied from the SPSR to the CPSR and execution to resume in the resulting state. These instructions are also used for exception returns, and the considerations in the previous bullet point also apply to them.
• In privileged modes, execution of an MSR instruction that attempts to set the J or T bit of the CPSR to 1 has UNPREDICTABLE results.
• In unprivileged (User) mode, execution of an MSR instruction that attempts to set the J or T bit of the CPSR to 1 will not modify the bit.
• Setting J == 1 and T == 1 causes similar effects to setting T == 1 on a non Thumb-aware processor.
  That is, the next instruction executed will cause entry to the Undefined Instruction exception. Entry to the exception handler will cause the processor to re-enter ARM state, and the handler can detect that this was the cause of the exception because J and T are both set in SPSR_und.

While in Jazelle state, the processor executes opcode programs. An opcode program is defined to be an executable object comprising one or more class files, as defined in Lindholm and Yellin, The Java Virtual Machine Specification 2nd Edition, or derived from and functionally equivalent to one or more class files. While in Jazelle state, the PC acts as a program counter which identifies the next JVM opcode to be executed, where JVM opcodes are the opcodes defined in Lindholm and Yellin, or a functionally equivalent transformed version of them.

Native methods, as described in Lindholm and Yellin, for the Jazelle Extension must use only the ARM and/or Thumb instruction sets to specify their functionality.

An implementation of the Jazelle Extension must not be documented or promoted as performing any task while it is in Jazelle state other than the acceleration of opcode programs in accordance with this section and Lindholm and Yellin.

Extension of the PC to 32 bits

In order to allow the PC to point to an arbitrary opcode, all 32 bits of the PC are defined in non-trivial implementations. Bit[0] of the PC always reads as zero when in ARM or Thumb state. Bit[1] reflects the word-alignment, or halfword-alignment of ARM and Thumb instructions respectively. The existence of bit[0] in the PC is only visible in ARM or Thumb state when an exception occurs in Jazelle state and the exception return address is odd-byte aligned.

The main architectural implication of this is that exception handlers must ensure that they restore all 32 bits of R15. The recommended ways to handle exception returns behave correctly.
A2.10.3 New Jazelle state entry instruction (BXJ)

An ARM instruction similar to BX is added. The BXJ instruction has a single register operand that specifies a target execution state (ARM or Thumb) and branch target address for use if entry to Jazelle state is not available. See BXJ on page A4-21 for more details.

Compliant Java execution involves the EJVM using the BXJ instruction, the usage model of the standard ARM registers, and the Jazelle Extension Control and Configuration registers described in Configuration and control on page A2-62.

Executing BXJ with Jazelle Extension enabled

Executing a BXJ instruction when the JE bit is 1 gives the Jazelle Extension hardware an opportunity to enter Jazelle state and start executing opcodes directly. The circumstances in which Jazelle state execution is entered are IMPLEMENTATION DEFINED. If Jazelle state execution is not entered, the instruction is executed in the same way as a BX instruction to a SUBARCHITECTURE DEFINED register usage model. This is required to ensure the Jazelle Extension hardware and the EJVM software communicate effectively with each other. Similarly, various registers will contain SUBARCHITECTURE DEFINED values when Jazelle state execution is terminated and ARM or Thumb state execution is resumed. The precise set of registers affected by these requirements is a SUBARCHITECTURE DEFINED subset of the process registers, which are defined to be:
• the ARM general-purpose registers R0-R14
• the PC
• the CPSR
• the VFP general-purpose registers S0-S31 and D0-D15, subject to the VFP architecture’s restrictions on their use and subject to the VFP architecture being present
• the FPSCR, subject to the VFP architecture being present.

All processor state that can be modified by Jazelle state execution must be kept in process registers, in order to ensure that it is preserved and restored correctly when processor exceptions and process swaps occur.
Configuration state (that is, state that affects Jazelle state execution but is not modified by it) can be kept either in process registers or in configuration registers.

EJVM implementations should only set JE == 1 after determining that the processor’s Jazelle Extension subarchitecture is compatible with their usage of the process registers. Otherwise, they should leave JE == 0 and execute without hardware acceleration.

Executing BXJ with Jazelle Extension disabled

If a BXJ instruction is executed when the JE bit is 0, it is executed identically to a BX instruction with the same register operand.

BXJ instructions can therefore be freely executed when the JE bit is 0. In particular, if an EJVM determines that it is executing on a processor whose Jazelle Extension implementation is trivial or uses an incompatible subarchitecture, it can set JE == 0 and execute correctly, without the benefit of any Jazelle hardware acceleration that may be present.

Jazelle state exit

The processor exits Jazelle state in IMPLEMENTATION DEFINED circumstances. This is typically due to attempted execution of an opcode that the implementation cannot handle in hardware, or that generates a Jazelle exception (such as a Null-Pointer exception). When this occurs, various processor registers will contain SUBARCHITECTURE DEFINED values, allowing the EJVM to resume software execution of the opcode program correctly.

The processor also exits Jazelle state when a processor exception occurs. The CPSR is copied to the exception mode’s banked SPSR as normal, so the banked SPSR contains J == 1 and T == 0, and Jazelle state is restored on return from the exception when the SPSR is copied back into the CPSR.
Coupled with the restriction that only process registers can be modified by Jazelle state execution, this ensures that all registers are correctly preserved and restored by processor exception handlers. Configuration and control registers may be modified in the exception handler itself as described in Configuration and control on page A2-62.

Considerations specific to execution of opcodes apply to processor exceptions. For details of these, see Jazelle Extension exception handling on page A2-58.

It is IMPLEMENTATION DEFINED whether Jazelle Extension hardware contains state that is modified during Jazelle state execution, and is held outside the process registers during Jazelle state execution. If such state exists, the implementation shall:
• Initialize the state from one or more of the process registers whenever Jazelle state is entered, either as a result of execution of a BXJ instruction or of returning from a processor exception.
• Write the state into one or more of the process registers whenever Jazelle state is exited, either as a result of taking a processor exception or of IMPLEMENTATION DEFINED circumstances.
• Ensure that the ways in which it is written into process registers on taking a processor exception, and initialized from process registers on returning from that exception, result in it being correctly preserved and restored over the exception.

Additional Jazelle state restrictions

The Jazelle Extension hardware shall obey the following restrictions:
• It must not change processor mode other than by taking one of the standard ARM processor exceptions.
• It must not access banked versions of registers other than the ones belonging to the processor mode in which it is entered.
• It must not do anything that is illegal for an UNPREDICTABLE instruction. That is, it must not generate a security loophole, nor halt or hang the processor or any other part of the system.
As a result of these requirements, Jazelle state can be entered from User mode without risking a breach of OS security. In addition:
• Entering Jazelle state from FIQ mode has UNPREDICTABLE results.
• Jazelle Extension subarchitectures and implementations must not make use of otherwise-unallocated CPSR and SPSR bits. All such bits are reserved for future expansion of the ARM and Thumb architectures.

A2.10.4 Jazelle Extension exception handling

All exceptions copy the J bit from the CPSR to the SPSR, and all instructions that have the side-effect of copying the SPSR to the CPSR must copy the J bit along with all the other bits.

When an exception occurs in Jazelle state, the R14 register for the exception mode is calculated as follows:

Exception               R14 value
IRQ/FIQ                 Address of opcode to be executed on return from interrupt + 4.
Prefetch Abort          Address of the opcode causing the abort + 4.
Data Abort              Address of the opcode causing the abort + 8.
Undefined instruction   Must not occur. See Undefined Instruction exceptions on page A2-60.
SWI                     Must not occur. See SWI exceptions on page A2-60.

Interrupts (IRQ and FIQ)

In order for the standard mechanism for handling interrupts to work correctly, Jazelle Extension hardware implementations must take care that whenever an interrupt is allowed to occur during Jazelle state execution, one of the following occurs:
• Execution has reached an opcode instruction boundary. That is, all operations required to implement one opcode have completed, and none of the operations required to implement the next opcode have completed. The R14 value on entry to the interrupt handler must be the address of the next opcode, plus 4.
• The sequence of operations performed from the start of the current opcode’s execution up to any point where an interrupt can occur is idempotent: that is, it can be repeated from its start without changing the overall result of executing the opcode.
  The R14 value on entry to the interrupt handler must be the address of the current opcode, plus 4.
• If an interrupt does occur during an opcode’s execution, corrective action is taken either directly by the Jazelle Extension hardware or indirectly by it calling a SUBARCHITECTURE DEFINED handler in the EJVM, and that corrective action re-creates a situation in which the opcode can be re-executed from its start. The R14 value on entry to the interrupt handler must be the address of the opcode, plus 4.

Data aborts

The value saved in R14_abt on a data abort shall ensure that a virtual memory data abort handler can read the system coprocessor (CP15) Fault Status and Fault Address registers, fix the reason for the abort and return using SUBS PC,R14,#8 or its equivalent, without looking at the instruction that caused the abort or which state it was executed in.

Note
This assumes that the intention is to return to and retry the opcode that caused the data abort. If the intention is instead to return to the opcode after the one that caused the abort, then the return address will need to be modified by the length of the opcode that caused the abort.

In order for the standard mechanism for handling data aborts to work correctly, Jazelle Extension hardware implementations must ensure that one of the following applies where an opcode might generate a data abort:
• The sequence of operations performed from the start of the opcode’s execution up to the point where the data abort occurs is idempotent. That is, it can be repeated from its start without changing the overall result of executing the opcode.
• If the data abort occurs during opcode execution, corrective action is taken either directly by the Jazelle Extension hardware or indirectly by it calling a SUBARCHITECTURE DEFINED handler in the EJVM, and that corrective action re-creates a situation in which the opcode can be re-executed from its start.

Note
In ARMv6, the Base Updated Abort Model is no longer allowed (see Abort models on page A2-23). This removes one potential obstacle to the first of these solutions.

Prefetch aborts

The value saved in R14_abt on a prefetch abort shall ensure that a virtual memory prefetch abort handler can locate the start of the instruction that caused the abort simply and without looking at the state in which its execution was attempted. It is always at address (R14_abt – 4).

However, a multi-byte opcode may cross a page boundary, in which case the ARM processor’s prefetch abort handler cannot determine directly which of the two pages caused the abort. It is SUBARCHITECTURE DEFINED how this situation is handled, subject to the requirement that if it is handled by calling the ARM processor’s prefetch abort handler, (R14_abt – 4) must point to the first byte of the opcode concerned.

In order to ensure subarchitecture-independence, OS designers should write prefetch abort handlers in such a way that they can handle a prefetch abort generated in either of the two pages spanned by such an opcode. A suggested simple technique is:

    IF the page pointed to by (R14_abt – 4) is not mapped THEN
        map the page
    ELSE
        map the page following the page including (R14_abt – 4)
    ENDIF
    retry the instruction

SWI exceptions

SWI exceptions must not occur during Jazelle state execution, for the following reasons:
• ARM and Thumb state SWIs are supported in the ARM architecture. Opcode SWIs are not supported, due to the additional complexity they would introduce in the SWI usage model.
• Jazelle Extension subarchitectures and implementations need to have a mechanism to return to ARM or Thumb state handlers in order to execute the more complex opcodes. If an opcode needs to make an OS call, it can make use of this mechanism to cause an ARM or Thumb SWI instruction to be executed, with a small overhead in percentage terms compared with the cost of the OS call itself.
• SWI calling conventions are highly OS-dependent, and would potentially require the subarchitecture to be OS aware.

Undefined Instruction exceptions

Undefined Instruction exceptions must not occur during Jazelle state execution.

When the Jazelle Extension hardware synthesizes a coprocessor instruction and passes it to a hardware coprocessor (most likely, a VFP coprocessor), and the coprocessor rejects the instruction, there would be considerable complications if this were allowed to result in the ARM processor’s Undefined Instruction trap. These include:
• The coprocessor instruction is not available to be loaded from memory (something that is relied upon by most Undefined Instruction handlers).
• The coprocessor instruction cannot typically be determined from the opcode that is loadable from memory without considerable knowledge of implementation and subarchitecture details of the Jazelle Extension hardware.
• The coprocessor-generated Undefined Instruction exceptions (and VFP-generated ones in particular) can typically be either precise (that is, caused by the instruction at (R14_und – 4)) or imprecise (that is, caused by a pending exceptional condition generated by some earlier instruction and nothing to do with the instruction at (R14_und – 4)).
  Precise Undefined Instruction exceptions typically must be handled by emulating the instruction at (R14_und – 4), followed by returning to the instruction that follows it.
  Imprecise Undefined Instruction exceptions typically need to be handled by getting details of the exceptional condition and/or the earlier instruction from the coprocessor, fixing things up in some way, and then returning to the instruction at (R14_und – 4).
  This means that there are two different possible return addresses, not necessarily at a fixed offset from each other as they are when dealing with coprocessor instructions in memory, making it difficult to define the value R14_und should have on entry to the Undefined Instruction handler.
• The return address for the Undefined Instruction handler places idempotency requirements and/or completion requirements (that is, that once the coprocessor operation has been completed, everything necessary for execution of the opcode has been done) on the sequences of operations performed by the Jazelle Extension hardware.

The restrictions require cooperation and limit the design freedom for both the Jazelle acceleration and coprocessor designers. To avoid the need for undefined exceptions, the following coprocessor interworking model for Jazelle Extension hardware applies.

Coprocessor Interworking

If while executing in Jazelle state, the Jazelle Extension hardware synthesizes a coprocessor instruction and passes it to a hardware coprocessor for execution, then it must be prepared for the coprocessor to reject the instruction.

If a coprocessor rejects an instruction issued by Jazelle Extension hardware, the Jazelle Extension hardware and coprocessor must cooperate to:
• Prevent the Undefined Instruction exception that would occur if the coprocessor had rejected a coprocessor instruction in ARM state from occurring.
• Take suitable SUBARCHITECTURE DEFINED corrective action, probably involving exiting Jazelle state, and executing a suitable ARM code handler that contains further coprocessor instructions.
To ensure that this is a practical technique and does not result in inadequate or excessive handling of coprocessor instruction rejections, coprocessors designed for use with the Jazelle Extension must behave as follows:
• When there is an exceptional condition generated by an earlier instruction, the coprocessor shall keep track of that exceptional condition and keep trying to cause an imprecise Undefined Instruction exception whenever an attempt is made to execute one of its coprocessor instructions until the exceptional condition is cleared by its Undefined Instruction handler.
• When it tries to cause a precise Undefined Instruction exception, for reasons to do with the coprocessor instruction it is currently being asked to execute, the coprocessor shall act in a memoryless way. That is, if it is subsequently asked to execute a different coprocessor instruction, it must ignore the instruction it first tried to reject precisely and instead determine whether the new instruction needs to be rejected precisely.

A2.10.5 Configuration and control

All registers associated with the Jazelle Extension are implemented in coprocessor space as part of coprocessor fourteen (CP14). The registers are accessed using the MCR (MCR on page A4-62) and MRC (MRC on page A4-70) instructions.

The general instruction formats for Jazelle Extension control and configuration are as follows:

    MCR{<cond>} p14, 7, <Rd>, CRn, CRm{, opcode_2}*
    MRC{<cond>} p14, 7, <Rd>, CRn, CRm{, opcode_2}*

    *opcode_2 can be omitted if opcode_2 == 0

The following rules apply to the Jazelle Extension control and configuration registers:
• All SUBARCHITECTURE DEFINED configuration registers are accessed by coprocessor 14 MRC and MCR instructions with <opcode_1> set to 7.
• The values contained by configuration registers are only changed by the execution of MCR instructions, and in particular are not changed by Jazelle state execution of opcodes.
• The access policy for the required registers is fully defined in their descriptions. All MCR accesses to the Jazelle ID register, and MRC or MCR accesses which are restricted to privileged modes only are UNDEFINED if executed in User mode. The access policy of other configuration registers is SUBARCHITECTURE DEFINED.
• When a configuration register is readable, the result of reading it will be the last value written to it, with no side-effects. When a configuration register is not readable, the result of attempting to read it is UNPREDICTABLE.
• When a configuration register can be written, the effect must be idempotent. That is, the overall effect of writing the value more than once must not differ from the effect of writing it once.

A minimum of three registers are required in a non-trivial implementation. Additional registers may be provided and are SUBARCHITECTURE DEFINED.

Jazelle ID register

The Jazelle Identity register allows EJVMs to determine the architecture and subarchitecture under which they are running. This is a coprocessor 14 read-only register, accessed by the MRC instruction:

    MRC{<cond>} p14, 7, <Rd>, c0, c0 {, 0}    ; <Rd> := Jazelle Identity register

The Jazelle ID register is normally accessible from both privileged and User modes. See Operating System (OS) control register on page A2-64 for User mode access restrictions.

The format of the Jazelle Identity register is:

    Bits[31:28]     Bits[27:20]    Bits[19:12]        Bits[11:0]
    Architecture    Implementor    Subarchitecture    SUBARCHITECTURE DEFINED

Bits[31:28] Contain an architecture code. This uses the same architecture code that appears in the Main ID register in coprocessor 15.
Bits[27:20] Contain the implementor code of the designer of the subarchitecture. This uses the same implementor code that appears in the Main ID register in coprocessor 15, as documented in Main ID register on page B3-7.
As a special case, if the trivial implementation of the Jazelle Extension is used, this implementor code is 0x00.
Bits[19:12] Contain the subarchitecture code. The following subarchitecture code is defined:
    0x00 = Jazelle V1 subarchitecture, or trivial implementation of Jazelle Extension if implementor code is 0x00.
Bits[11:0] Contain further SUBARCHITECTURE DEFINED information.

Main configuration register

A Main Configuration register is added to control the Jazelle Extension. This is a coprocessor 14 register, accessed by MRC and MCR instructions as follows:

    MRC{<cond>} p14, 7, <Rd>, c2, c0 {, 0}    ; <Rd> := Main Configuration register
    MCR{<cond>} p14, 7, <Rd>, c2, c0 {, 0}    ; Main Configuration register := <Rd>

This register is normally write-only from User mode. See Operating System (OS) control register on page A2-64 for additional User mode access restrictions.

The format of the Main Configuration register is:

    Bits[31:1]                 Bit[0]
    SUBARCHITECTURE DEFINED    JE

Bit[31:1] SUBARCHITECTURE DEFINED information.
Bit[0] The Jazelle Enable (JE) bit, which is cleared to 0 on reset.
    When the JE bit is 0, the Jazelle Extension is disabled and the BXJ instruction does not cause Jazelle state execution; instead, BXJ behaves exactly as a BX instruction. See BXJ on page A4-21.
    When the JE bit is 1, the Jazelle Extension is enabled.

Operating System (OS) control register

The Jazelle OS Control register provides the operating system with process usage control of the Jazelle Extension. This is a coprocessor 14 register, accessed by MRC and MCR instructions as follows:

    MRC{<cond>} p14, 7, <Rd>, c1, c0 {, 0}    ; <Rd> := Jazelle OS Control register
    MCR{<cond>} p14, 7, <Rd>, c1, c0 {, 0}    ; Jazelle OS Control register := <Rd>

This register can only be accessed from privileged modes; these instructions are UNDEFINED when executed in User mode.
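As an illustration of the Jazelle Identity register layout described earlier in this section, the following Python sketch splits a 32-bit register value into its four fields. The names JazelleId, decode_jazelle_id and is_trivial_implementation are hypothetical, and the sample value in the usage note is made up for the example. Reading the register itself requires an MRC instruction and is not modelled here.

```python
from typing import NamedTuple


class JazelleId(NamedTuple):
    """Fields of the Jazelle Identity register (hypothetical helper type)."""
    architecture: int      # Bits[31:28]: architecture code
    implementor: int       # Bits[27:20]: implementor code
    subarchitecture: int   # Bits[19:12]: subarchitecture code
    subarch_defined: int   # Bits[11:0]: SUBARCHITECTURE DEFINED information


def decode_jazelle_id(value: int) -> JazelleId:
    """Split a 32-bit Jazelle Identity register value into its fields."""
    value &= 0xFFFFFFFF
    return JazelleId(
        architecture=(value >> 28) & 0xF,
        implementor=(value >> 20) & 0xFF,
        subarchitecture=(value >> 12) & 0xFF,
        subarch_defined=value & 0xFFF,
    )


def is_trivial_implementation(value: int) -> bool:
    """A trivial implementation reports implementor code 0x00."""
    return decode_jazelle_id(value).implementor == 0x00
```

For example, the made-up value 0x64100004 decodes to architecture code 0x6, implementor code 0x41, subarchitecture code 0x00 and a SUBARCHITECTURE DEFINED field of 0x004, while a register that reads as zero is reported as a trivial implementation.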
EJVMs will normally never access the Jazelle OS Control register, and EJVMs that are intended to run in User mode cannot do so.

The purpose of the Jazelle OS Control register is primarily to allow operating systems to control access to the Jazelle Extension hardware in a subarchitecture-independent fashion. It is expected to be used in conjunction with the JE bit of the Main Configuration register.

The format of the Jazelle OS Control register is:

    Bits[31:2]        Bit[1]    Bit[0]
    RESERVED (RAZ)    CV        CD

Bits[31:2] Reserved for future expansion. Prior to such expansion, they must read as zero. To maximize future compatibility, software should preserve their contents, using a read modify write method to update the other control bits.
CV Bit[1] The Configuration Valid bit, which can be used by an operating system to signal to an EJVM that it needs to re-write its configuration to the configuration registers.
    When CV == 0, re-writing of the configuration registers is required before an opcode is next executed.
    When CV == 1, no re-writing of the configuration registers is required, other than re-writing that is certain to occur before an opcode is next executed.
CD Bit[0] The Configuration Disabled bit, which can be used by an operating system to monitor and/or control User mode access to the configuration registers and the Jazelle Identity register.
    When CD == 0, MCR instructions that write to configuration registers and MRC instructions that read the Jazelle Identity register execute normally.
    When CD == 1, all of these instructions only behave normally when executed in a privileged mode, and are UNDEFINED when executed in User mode.

When the JE bit of the Main Configuration register is 0, the Jazelle OS Control register has no effect on how BXJ instructions are executed. They always execute as a BX instruction.
When the JE bit of the Main Configuration register is 1, the CV bit affects BXJ instructions as follows:
• If CV == 1, the Jazelle Extension hardware configuration is considered enabled and valid, allowing the processor to enter Jazelle state and execute opcodes as described in Executing BXJ with Jazelle Extension enabled on page A2-56.
• If CV == 0, then in all of the IMPLEMENTATION DEFINED circumstances in which the Jazelle Extension hardware would have entered Jazelle state if CV had been 1, it instead enters a configuration invalid handler and sets CV to 1. A configuration invalid handler is a sequence of ARM instructions that includes MCR instructions to write the configuration required by the EJVM, ending with a BXJ instruction to re-attempt execution of the opcode concerned. The method by which the configuration invalid handler’s address is determined and its entry and exit conditions are all SUBARCHITECTURE DEFINED.
  In circumstances in which the Jazelle Extension hardware would not have entered Jazelle state if CV had been 1, it is IMPLEMENTATION DEFINED whether the configuration invalid handler is entered as described in the last paragraph, or the BXJ instruction is treated as a BX instruction with possible SUBARCHITECTURE DEFINED restrictions.

The intended use of the CV bit is that when a process swap occurs, the operating system sets CV to 0. The result is that before the new process can execute an opcode in the Jazelle Extension hardware, it must execute its configuration invalid handler. This ensures that the Jazelle Extension hardware’s configuration registers are correct for the EJVM concerned. The CV bit is set to 1 on entry to the configuration invalid handler, allowing the opcode to be executed in hardware when the configuration invalid handler re-attempts its execution.
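The interaction of the JE and CV bits described above can be summarized in a small behavioural model. This is a minimal sketch, not a description of any implementation: the function names and the outcome strings are hypothetical, and the model covers only the circumstances in which the hardware would enter Jazelle state when CV == 1 (the other circumstances are IMPLEMENTATION DEFINED and are left out).

```python
from typing import Tuple

CD_BIT = 1 << 0   # Configuration Disabled, Bit[0] of the Jazelle OS Control register
CV_BIT = 1 << 1   # Configuration Valid, Bit[1] of the Jazelle OS Control register


def bxj_outcome(je: int, cv: int) -> Tuple[str, int]:
    """Return (what the BXJ instruction does, value of CV afterwards)."""
    if je == 0:
        # Jazelle Extension disabled: BXJ behaves exactly as BX, CV is untouched.
        return ("BX", cv)
    if cv == 1:
        # Configuration enabled and valid: opcodes execute in Jazelle state.
        return ("JAZELLE_STATE", 1)
    # CV == 0: the hardware enters the configuration invalid handler
    # and sets CV to 1 on entry to that handler.
    return ("CONFIG_INVALID_HANDLER", 1)


def os_control_on_process_swap(os_control: int) -> int:
    """Clear CV with a read-modify-write, preserving every other bit."""
    return os_control & ~CV_BIT & 0xFFFFFFFF
```

With JE == 1 and CV == 0, the first BXJ after a process swap reaches the configuration invalid handler and leaves CV == 1, so the BXJ at the end of that handler then enters Jazelle state, provided no further process swap has cleared CV in the meantime.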
Note
It may seem counterintuitive that the CV bit is set to 1 on entry to the configuration invalid handler, rather than after it has completed writing the configuration registers. This is correct. Otherwise, the configuration invalid handler might partially configure the hardware before a process swap occurs, causing another EJVM-using process to write its configuration to the hardware. When the original process is resumed, CV will have been cleared (CV == 0) by the operating system. If the handler then finished writing its configuration to the hardware and set CV to 1, the opcode would be executed with the hardware configured for a hybrid of the two configurations. Because CV is instead set to 1 on entry to the configuration invalid handler, CV is 0 in this example when execution of the opcode is re-attempted, and the configuration invalid handler will execute again (and if necessary, recursively) until it finally completes execution without a process swap occurring.

The CD bit has multiple possible uses for monitoring and controlling User mode access to the Jazelle Extension hardware. Among them are:
• By setting CD == 1 and JE == 0, an OS can prevent all User mode access to the Jazelle Extension hardware: any attempt to use the BXJ instruction will produce the same result as a BX instruction, and any attempt to configure the hardware (including setting the JE bit) will result in an Undefined Instruction exception.
• To provide User mode access to the Jazelle Extension hardware in a simple manner, while protecting EJVMs from conflicting use of the hardware by other processes, the OS should set CD == 0 and should preserve and restore the Main Configuration register on process swaps, initializing its value to 0 for new processes. In addition, it should set the CV bit to 0 on every process swap, to ensure that EJVMs reconfigure the Jazelle Extension hardware to match their requirements when necessary.
• The technique described in the previous bullet point may result in large numbers of unnecessary reconfigurations of the Jazelle Extension hardware if only a few processes are using the hardware. This can be improved by the OS keeping track of which User mode processes are known to be using an EJVM.
  The OS should set CD == 1 and JE == 0 for any new processes or on a context switch to an existing process that is not using an EJVM. Any User mode instruction that attempts to access a configuration register will take an UNDEFINED exception. The Undefined Instruction handler can then identify the EJVM need, mark the process as using an EJVM, then return to retry the instruction with CD == 0.
  A further refinement is to clear the CV bit to 0 only if the context switch is to an EJVM-using process that is different from the last EJVM-using process which ran. This avoids redundant reconfiguration of the hardware. That is, the operating system maintains a “process currently owning the Jazelle Extension hardware” variable, that gets updated with a process_ID when swapping to an EJVM-using process. The context switch software sets CV to 0 if the process_ID update results in a change to the saved variable.
  Context switch software implementing the CV-bit scheme should also save and restore the Main Configuration register (in its entirety) on a process swap where the EJVM-using process changes. This ensures that the restored EJVM can use the JE bit reliably for its own purpose.

  Note
  This technique will not identify privileged EJVM-using processes. However, it is assumed that operating systems are aware of the needs of their privileged processes.

• The OS can impose a single Jazelle Extension configuration on all User mode code by writing that configuration to the hardware, then setting CD == 1 and JE == 1.

The CV and CD bits are both set to 0 on reset.
This ensures that subject to some conditions, an EJVM can operate correctly under an OS that does not support the Jazelle Extension. The main such condition is that a process swap never swaps between two EJVM-using processes that require different settings of the configuration registers. This condition holds, for example, in either of the following two cases:
• if there is only ever one EJVM-using process in the system.
• if all of the EJVM-using processes in the system use the same static settings of the configuration registers.

A2.10.6 EJVM operation

This section summarizes how EJVMs should operate in order to meet the architecture requirements.

Initialization

During initialization, the EJVM should first check which subarchitecture is present, using the implementor and subarchitecture codes in the value read from the Jazelle Identity register.

If the EJVM is incompatible with the subarchitecture, it should either write a value with JE == 0 to the Main Configuration register, or (if unaccelerated opcode execution is unacceptable) generate an error.

If the EJVM is compatible with the subarchitecture, it should write its desired configuration to the Main Configuration register and any other configuration registers. The EJVM should not skip this step on the assumption that the CV bit of the Jazelle OS Control register will be 0: it should not rely on CV == 0 triggering the configuration invalid handler before any opcode is executed by the Jazelle Extension hardware.

Opcode execution

The EJVM should contain a handler for each opcode and for each exception condition specified by the subarchitecture it is designed for (the exception conditions always include configuration invalid).
It should initiate opcode execution by executing a BXJ instruction with the register operand specifying the target address of the opcode handler for the first opcode of the program, and the process registers set up in accordance with the SUBARCHITECTURE DEFINED register usage model.

The opcode handler performs the data-processing operations required by the opcode concerned, determines the address of the next opcode to be executed, determines the address of the handler for that opcode, and performs a BXJ to that handler address with the registers again set up to the SUBARCHITECTURE DEFINED register usage model.

The register usage model on entry to exception condition handlers is SUBARCHITECTURE DEFINED, and may differ from the register usage model defined for BXJ instruction execution. The handlers then resolve the exception condition. For example, in the case of the configuration invalid handler, the handler rewrites the desired configuration to the Main Configuration register and any other configuration registers.

Further considerations

To ensure application execution and correct interaction with an operating system, EJVMs should only perform operations that are allowed in User mode. In particular, they should only read the Jazelle ID register and write to the configuration registers, and should not attempt to access the Jazelle OS Control register.

A2.10.7 Trivial implementations

This section summarizes what needs to be implemented in trivial implementations of the Jazelle Extension.
• Implement the Jazelle Identity register with the implementor and subarchitecture fields set to zero; the whole register may RAZ (read as zero).
• Implement the Main Configuration register to read as zero and ignore writes.
• Implement the Jazelle OS control register such that it can be read and written, but its effects are ignored.
  The register may be implemented as RAZ/DNM (read as zero, do not modify on writes). This allows operating systems supporting an EJVM to execute correctly.
• Implement the BXJ instruction to behave identically to the BX instruction in all circumstances, as implied by the fact that the JE bit is always zero. In particular, this means that Jazelle state will never be entered normally on a trivial implementation.
• In ARMv6, a trivial implementation can implement the J bit in the CPSR/SPSRs as RAZ/DNM (read as zero, do not modify on writes). This is allowed because there is no legitimate way to set the J bit and enter Jazelle state, hence any return routine that tries to do so is issuing an UNPREDICTABLE instruction.
  Otherwise, implement J bits in the CPSR and each SPSR, and ensure that they are read, written and copied correctly when exceptions are entered and when MSR, MRS and exception return instructions are executed.
• In all cases when J == 1 in the CPSR, it is IMPLEMENTATION DEFINED whether the next instruction is fetched, which could result in a prefetch abort, or is assumed to be UNDEFINED.

Note
The PC does not need to be extended to 32 bits in the trivial implementation, since the only way that bit[0] of the PC is visible in ARM or Thumb state is as a result of a processor exception occurring during Jazelle state execution, and Jazelle state execution does not occur on a trivial implementation.

A2.11 Saturated integer arithmetic

When viewed as a signed number, the value of a general-purpose register lies in the range from –2^31 (or 0x80000000) to +2^31 – 1 (or 0x7FFFFFFF). If an addition or subtraction is performed on such numbers and the correct mathematical result lies outside this range, it would require more than 32 bits to represent.
In these circumstances, the surplus bits are normally discarded, which has the effect that the result obtained is equal to the correct mathematical result reduced modulo 2^32.

For example, 0x60000000 could be used to represent +3 × 2^29 as a signed integer. If you add this number to itself, you get +3 × 2^30, which lies outside the representable range, but could be represented as the 33-bit signed number 0x0C0000000. The actual result obtained will be the right-most 32 bits of this, which are 0xC0000000. This represents –2^30, which is smaller than the correct mathematical result by 2^32, and does not even have the same sign as the correct result.

This kind of inaccuracy is unacceptable in many DSP applications. For example, if it occurred while processing an audio signal, the abrupt change of sign would be likely to result in a loud click. To avoid this sort of effect, many DSP algorithms use saturated signed arithmetic. This modifies the way normal integer arithmetic behaves as follows:
• If the correct mathematical result lies within the available range from –2^31 to +2^31 – 1, the result of the operation is equal to the correct mathematical result.
• If the correct mathematical result is greater than +2^31 – 1 and so overflows the upper end of the representable range, the result of the operation is equal to +2^31 – 1.
• If the correct mathematical result is less than –2^31 and so overflows the lower end of the representable range, the result of the operation is equal to –2^31.

Put another way, the result of a saturated arithmetic operation is the closest representable number to the correct mathematical result of the operation.

Saturated signed 32-bit integer additions and subtractions are provided by the QADD and QSUB instructions (the Q prefix denotes saturation). Variants of these instructions (QDADD and QDSUB) perform a saturated doubling of one of the operands before the saturated addition or subtraction.
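The difference between wrapping and saturated arithmetic can be checked with a short Python model. This is only a sketch of the arithmetic described above: the function names are hypothetical, Python integers stand in for 32-bit registers, the choice of the second argument as the doubled operand is an assumption of the example, and any status flag updates performed by the real instructions are not modelled.

```python
INT32_MIN = -(1 << 31)       # 0x80000000 viewed as a signed number
INT32_MAX = (1 << 31) - 1    # 0x7FFFFFFF


def to_signed32(value: int) -> int:
    """Keep the right-most 32 bits and interpret them as a signed number."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value & 0x80000000 else value


def saturate32(value: int) -> int:
    """Return the closest representable signed 32-bit number to value."""
    return max(INT32_MIN, min(INT32_MAX, value))


def wrapping_add(a: int, b: int) -> int:
    """Normal integer addition: the result is reduced modulo 2^32."""
    return to_signed32(a + b)


def qadd(a: int, b: int) -> int:
    """Saturated addition, as performed by QADD."""
    return saturate32(a + b)


def qsub(a: int, b: int) -> int:
    """Saturated subtraction, as performed by QSUB."""
    return saturate32(a - b)


def qdadd(a: int, b: int) -> int:
    """Saturated doubling of b, then saturated addition, as in QDADD."""
    return saturate32(a + saturate32(2 * b))


def qdsub(a: int, b: int) -> int:
    """Saturated doubling of b, then saturated subtraction, as in QDSUB."""
    return saturate32(a - saturate32(2 * b))
```

Using the example from the text, wrapping_add(0x60000000, 0x60000000) gives –2^30 (the bit pattern 0xC0000000), whereas qadd(0x60000000, 0x60000000) gives 0x7FFFFFFF.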
Saturated integer multiplications are not supported, because the product of two values of widths A and B bits never overflows an (A+B)-bit destination.

A2.11.1 Saturated Q15 and Q31 arithmetic

A 32-bit signed value can be treated as having a binary point immediately after its sign bit. This is equivalent to dividing its signed integer value by 2^31, so that it can now represent numbers from –1 to +1 – 2^–31. When a 32-bit value is used to represent a fractional number in this fashion, it is known as a Q31 number.

Saturated additions, subtractions, and doublings can be performed on Q31 numbers using the same instructions as are used for saturated integer arithmetic, since everything is simply scaled down by a factor of 2^–31.

Similarly, a 16-bit value can be treated as having a binary point immediately after its sign bit, which effectively divides its signed integer value by 2^15. When a 16-bit value is used in this fashion, it can represent numbers from –1 to +1 – 2^–15 and is known as a Q15 number.

If two Q15 numbers are multiplied together as integers, the resulting integer needs to be scaled down by a factor of 2^–15 × 2^–15 == 2^–30. For example, multiplying the Q15 number 0x8000 (representing –1) by itself using an integer multiplication instruction yields the value 0x40000000, which is 2^30 times the desired result of +1. This means that the result of the integer multiplication instruction is not quite in Q31 form. To get it into Q31 form, it must be doubled, so that the required scaling factor becomes 2^–31. Furthermore, it is possible that the doubling will cause integer overflow, so the result should in fact be doubled with saturation. In particular, the result 0x40000000 from the multiplication of 0x8000 by itself should be doubled with saturation to produce 0x7FFFFFFF (the closest possible Q31 number to the correct mathematical result of –1 × –1 == +1).
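The multiply-then-double step just described can be sketched in Python as follows. The function names are hypothetical and Python integers stand in for registers; the saturated doubling is written as a saturated addition of the product to itself. A wrapping variant is included only for contrast.

```python
INT32_MIN = -(1 << 31)
INT32_MAX = (1 << 31) - 1


def q15_to_int(bits: int) -> int:
    """Interpret the low 16 bits as a signed integer (Q15 value times 2^15)."""
    bits &= 0xFFFF
    return bits - 0x10000 if bits & 0x8000 else bits


def saturate32(value: int) -> int:
    """Clamp to the signed 32-bit range."""
    return max(INT32_MIN, min(INT32_MAX, value))


def q15_mul_saturated(a: int, b: int) -> int:
    """Q15 x Q15 -> Q31: integer multiply, then saturated doubling."""
    product = q15_to_int(a) * q15_to_int(b)   # scaled by 2^30
    return saturate32(product + product)      # scaled by 2^31, saturated


def q15_mul_wrapping(a: int, b: int) -> int:
    """Same multiply, but the doubling keeps only the low 32 bits."""
    product = q15_to_int(a) * q15_to_int(b)
    return (product + product) & 0xFFFFFFFF
```

For the case in the text, q15_mul_saturated(0x8000, 0x8000) returns 0x7FFFFFFF, and multiplying 0x4000 (representing 0.5) by itself returns 0x20000000 (representing 0.25).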
If it were doubled without saturation, it would instead produce 0x80000000, which is the Q31 representation of –1.

To implement a saturated Q15 × Q15 → Q31 multiplication, therefore, an integer multiply instruction should be followed by a saturated integer doubling. The latter can be performed by a QADD instruction adding the multiply result to itself.

Similarly, a saturated Q15 × Q15 + Q31 → Q31 multiply-accumulate can be performed using an integer multiply instruction followed by the use of a QDADD instruction.

Some other examples of arithmetic on Q15 and Q31 numbers are described in the Usage sections for the individual instructions.

Chapter A3 The ARM Instruction Set

This chapter describes the ARM® instruction set and contains the following sections:
• Instruction set encoding on page A3-2
• The condition field on page A3-3
• Branch instructions on page A3-5
• Data-processing instructions on page A3-7
• Multiply instructions on page A3-10
• Parallel addition and subtraction instructions on page A3-14
• Extend instructions on page A3-16
• Miscellaneous arithmetic instructions on page A3-17
• Other miscellaneous instructions on page A3-18
• Status register access instructions on page A3-19
• Load and store instructions on page A3-21
• Load and Store Multiple instructions on page A3-26
• Semaphore instructions on page A3-28
• Exception-generating instructions on page A3-29
• Coprocessor instructions on page A3-30
• Extending the instruction set on page A3-32.

A3.1 Instruction set encoding

Figure A3-1 shows the ARM instruction set encoding. All other bit patterns are UNPREDICTABLE or UNDEFINED. See Extending the instruction set on page A3-32 for a description of the cases where instructions are UNDEFINED.
An entry in square brackets, for example [1], indicates that more information is given after the figure.

Figure A3-1 ARM instruction set summary
(Each row lists its fields from bit[31] down to bit[0]. Fixed bits are shown with their positions; bits not listed for a row are x, that is, they can take either value.)

Data processing immediate shift: cond [1], bits[27:25] = 000, opcode, S, Rn, Rd, shift amount, shift, bit[4] = 0, Rm
Miscellaneous instructions (see Figure A3-4): cond [1], bits[27:25] = 000, bits[24:20] = 10xx0, bit[4] = 0
Data processing register shift [2]: cond [1], bits[27:25] = 000, opcode, S, Rn, Rd, Rs, bit[7] = 0, shift, bit[4] = 1, Rm
Miscellaneous instructions (see Figure A3-4): cond [1], bits[27:25] = 000, bits[24:20] = 10xx0, bit[7] = 0, bit[4] = 1
Multiplies (see Figure A3-3) and Extra load/stores (see Figure A3-5): cond [1], bits[27:25] = 000, bit[7] = 1, bit[4] = 1
Data processing immediate [2]: cond [1], bits[27:25] = 001, opcode, S, Rn, Rd, rotate, immediate
Undefined instruction: cond [1], bits[27:25] = 001, bits[24:20] = 10x00
Move immediate to status register: cond [1], bits[27:25] = 001, bits[24:20] = 10R10, Mask, SBO, rotate, immediate
Load/store immediate offset: cond [1], bits[27:25] = 010, P, U, B, W, L, Rn, Rd, immediate
Load/store register offset: cond [1], bits[27:25] = 011, P, U, B, W, L, Rn, Rd, shift amount, shift, bit[4] = 0, Rm
Media instructions [4] (see Figure A3-2): cond [1], bits[27:25] = 011, bit[4] = 1
Architecturally undefined: cond [1], bits[27:25] = 011, bits[24:20] = 11111, bits[7:4] = 1111
Load/store multiple: cond [1], bits[27:25] = 100, P, U, S, W, L, Rn, register list
Branch and branch with link: cond [1], bits[27:25] = 101, L, 24-bit offset
Coprocessor load/store and double register transfers: cond [3], bits[27:25] = 110, P, U, N, W, L, Rn, CRd, cp_num, 8-bit offset
Coprocessor data processing: cond [3], bits[27:24] = 1110, opcode1, CRn, CRd, cp_num, opcode2, bit[4] = 0, CRm
Coprocessor register transfers: cond [3], bits[27:24] = 1110, opcode1, L, CRn, Rd, cp_num, opcode2, bit[4] = 1, CRm
Software interrupt: cond [1], bits[27:24] = 1111, swi number
Unconditional instructions (see Figure A3-6): bits[31:28] = 1111

1. The cond field is not allowed to be 1111 in this line. Other lines deal with the cases where bits[31:28] of the instruction are 1111.
2. If the opcode field is of the form 10xx and the S field is 0, one of the following lines applies instead.
3. If the cond field is 1111, this instruction is UNPREDICTABLE prior to ARMv5.
4. The architecturally Undefined instruction uses a small number of these instruction encodings.

A3.2 The condition field

Most ARM instructions can be conditionally executed, which means that they only have their normal effect on the programmers’ model state, memory and coprocessors if the N, Z, C and V flags in the CPSR satisfy a condition specified in the instruction. If the flags do not satisfy this condition, the instruction acts as a NOP: that is, execution advances to the next instruction as normal, including any relevant checks for interrupts and Prefetch Aborts, but has no other effect.

Prior to ARMv5, all ARM instructions could be conditionally executed. A few instructions have been introduced subsequently which can only be executed unconditionally. See Unconditional instruction extension space on page A3-41 for details.

Every instruction contains a 4-bit condition code field in bits 31 to 28:

    Bits[31:28]    Bits[27:0]
    cond           (remainder of the instruction)

This field contains one of the 16 values described in Table A3-1 on page A3-4. Most instruction mnemonics can be extended with the letters defined in the mnemonic extension field.

If the always (AL) condition is specified, the instruction is executed irrespective of the value of the condition code flags. The absence of a condition code on an instruction mnemonic implies the AL condition code.
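The condition check described in this section can be sketched as a lookup from the 4-bit cond field to a test on the N, Z, C and V flags, following the flag states listed in Table A3-1. The function name condition_passed and the decision to raise an error for 0b1111 are choices made for this sketch; they are not part of the architecture.

```python
def condition_passed(cond: int, n: bool, z: bool, c: bool, v: bool) -> bool:
    """Return True if an instruction with this cond field has its normal effect."""
    if not 0 <= cond <= 0b1111:
        raise ValueError("cond must be a 4-bit value")
    if cond == 0b1111:
        # 0b1111 does not encode a condition on the flags (see Table A3-1).
        raise ValueError("0b1111 does not encode a condition")
    table = {
        0b0000: z,                     # EQ: Z set
        0b0001: not z,                 # NE: Z clear
        0b0010: c,                     # CS/HS: C set
        0b0011: not c,                 # CC/LO: C clear
        0b0100: n,                     # MI: N set
        0b0101: not n,                 # PL: N clear
        0b0110: v,                     # VS: V set
        0b0111: not v,                 # VC: V clear
        0b1000: c and not z,           # HI: C set and Z clear
        0b1001: (not c) or z,          # LS: C clear or Z set
        0b1010: n == v,                # GE: N == V
        0b1011: n != v,                # LT: N != V
        0b1100: (not z) and n == v,    # GT: Z == 0 and N == V
        0b1101: z or n != v,           # LE: Z == 1 or N != V
        0b1110: True,                  # AL: always
    }
    return bool(table[cond])
```

An instruction whose condition is not satisfied acts as a NOP, so an instruction-set model would typically call a function like this before applying any of the instruction's effects.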
A3.2.1 Condition code 0b1111

If the condition field is 0b1111, the behavior depends on the architecture version:
• In ARMv4, any instruction with a condition field of 0b1111 is UNPREDICTABLE.
• In ARMv5 and above, a condition field of 0b1111 is used to encode various additional instructions which can only be executed unconditionally (see Unconditional instruction extension space on page A3-41).

All instruction encoding diagrams which show bits[31:28] as cond only match instructions in which these bits are not equal to 0b1111.

Table A3-1 Condition codes

Opcode [31:28]  Mnemonic extension  Meaning                            Condition flag state
0000            EQ                  Equal                              Z set
0001            NE                  Not equal                          Z clear
0010            CS/HS               Carry set/unsigned higher or same  C set
0011            CC/LO               Carry clear/unsigned lower         C clear
0100            MI                  Minus/negative                     N set
0101            PL                  Plus/positive or zero              N clear
0110            VS                  Overflow                           V set
0111            VC                  No overflow                        V clear
1000            HI                  Unsigned higher                    C set and Z clear
1001            LS                  Unsigned lower or same             C clear or Z set
1010            GE                  Signed greater than or equal       N set and V set, or N clear and V clear (N == V)
1011            LT                  Signed less than                   N set and V clear, or N clear and V set (N != V)
1100            GT                  Signed greater than                Z clear, and either N set and V set, or N clear and V clear (Z == 0, N == V)
1101            LE                  Signed less than or equal          Z set, or N set and V clear, or N clear and V set (Z == 1 or N != V)
1110            AL                  Always (unconditional)             -
1111            -                   See Condition code 0b1111          -

A3.3 Branch instructions

All ARM processors support a branch instruction that allows a conditional branch forwards or backwards up to 32MB. As the PC is one of the general-purpose registers (R15), a branch or jump can also be generated by writing a value to R15.

A subroutine call can be performed by a variant of the standard branch instruction.
As well as allowing a branch forward or backward up to 32MB, the Branch with Link (BL) instruction preserves the address of the instruction after the branch (the return address) in the LR (R14). In T variants of ARMv4 and above, the Branch and Exchange (BX) instruction copies the contents of a general-purpose register Rm to the PC (like a MOV PC,Rm instruction), with the additional functionality that if bit[0] of the transferred value is 1, the processor shifts to Thumb® state. Together with the corresponding Thumb instructions, this allows interworking branches between ARM and Thumb code. Interworking subroutine calls can be generated by combining BX with an instruction to write a suitable return address to the LR, such as an immediately preceding MOV LR,PC instruction. In ARMv5 and above, there are also two types of Branch with Link and Exchange (BLX) instruction: • One type takes a register operand Rm, like a BX instruction. This instruction behaves like a BX instruction, and additionally writes the address of the next instruction into the LR. This provides a more efficient interworking subroutine call than a sequence of MOV LR,PC followed by BX Rm. • The other type behaves like a BL instruction, branching backwards or forwards by up to 32MB and writing a return link to the LR, but shifts to Thumb state rather than staying in ARM state as BL does. This provides a more efficient alternative to loading the subroutine address into Rm followed by a BLX Rm instruction when it is known that a Thumb subroutine is being called and that the subroutine lies within the 32MB range. A load instruction provides a way to branch anywhere in the 4GB address space (known as a long branch). A 32-bit value is loaded directly from memory into the PC, causing a branch. A long branch can be preceded by MOV LR,PC or another instruction that writes the LR to generate a long subroutine call. 
In ARMv5 and above, bit[0] of the value loaded by a long branch controls whether the subroutine is executed in ARM state or Thumb state, just like bit[0] of the value moved to the PC by a BX instruction. Prior to ARMv5, bits[1:0] of the value loaded into the PC are ignored, and a load into the PC can only be used to call a subroutine in ARM state.

In non-T variants of ARMv5, the instructions described above can cause an entry into Thumb state despite the fact that the Thumb instruction set is not present. This causes the instruction at the branch target to enter the Undefined Instruction exception. See The interrupt disable bits on page A2-14 for more details.

In ARMv6 and above, and in J variants of ARMv5, there is an additional Branch and Exchange Jazelle® instruction, see BXJ on page A4-21.

A3.3.1 Examples

        B       label           ; branch unconditionally to label
        BCC     label           ; branch to label if carry flag is clear
        BEQ     label           ; branch to label if zero flag is set
        MOV     PC, #0          ; R15 = 0, branch to location zero
        BL      func            ; subroutine call to function

func
        .
        .
        MOV     PC, LR          ; R15=R14, return to instruction after the BL

        MOV     LR, PC          ; store the address of the instruction
                                ; after the next one into R14 ready to return
        LDR     PC, =func       ; load a 32-bit value into the program counter

A3.3.2 List of branch instructions

B, BL   Branch, and Branch with Link. See B, BL on page A4-10.
BLX     Branch with Link and Exchange. See BLX (1) on page A4-16 and BLX (2) on page A4-18.
BX      Branch and Exchange Instruction Set. See BX on page A4-20.
BXJ     Branch and change to Jazelle state. See BXJ on page A4-21.

A3.4 Data-processing instructions

ARM has 16 data-processing instructions, shown in Table A3-2.
Table A3-2 Data-processing instructions

Opcode  Mnemonic  Operation                   Action
0000    AND       Logical AND                 Rd := Rn AND shifter_operand
0001    EOR       Logical Exclusive OR        Rd := Rn EOR shifter_operand
0010    SUB       Subtract                    Rd := Rn - shifter_operand
0011    RSB       Reverse Subtract            Rd := shifter_operand - Rn
0100    ADD       Add                         Rd := Rn + shifter_operand
0101    ADC       Add with Carry              Rd := Rn + shifter_operand + Carry Flag
0110    SBC       Subtract with Carry         Rd := Rn - shifter_operand - NOT(Carry Flag)
0111    RSC       Reverse Subtract with Carry Rd := shifter_operand - Rn - NOT(Carry Flag)
1000    TST       Test                        Update flags after Rn AND shifter_operand
1001    TEQ       Test Equivalence            Update flags after Rn EOR shifter_operand
1010    CMP       Compare                     Update flags after Rn - shifter_operand
1011    CMN       Compare Negated             Update flags after Rn + shifter_operand
1100    ORR       Logical (inclusive) OR      Rd := Rn OR shifter_operand
1101    MOV       Move                        Rd := shifter_operand (no first operand)
1110    BIC       Bit Clear                   Rd := Rn AND NOT(shifter_operand)
1111    MVN       Move Not                    Rd := NOT shifter_operand (no first operand)

Most data-processing instructions take two source operands, though Move and Move Not take only one. The compare and test instructions only update the condition flags. Other data-processing instructions store a result to a register and optionally update the condition flags as well.

Of the two source operands, one is always a register. The other is called a shifter operand and is either an immediate value or a register. If the second operand is a register value, it can have a shift applied to it.

CMP, CMN, TST and TEQ always update the condition code flags. The assembler automatically sets the S bit in the instruction for them, and the corresponding instruction with the S bit clear is not a data-processing instruction, but instead lies in one of the instruction extension spaces (see Extending the instruction set on page A3-32). The remaining instructions update the flags if an S is appended to the instruction mnemonic (which sets the S bit in the instruction).
See The condition code flags on page A2-11 for more details.

A3.4.1 Instruction encoding

<opcode1>{<cond>}{S} <Rd>, <shifter_operand>
<opcode1> := MOV | MVN

<opcode2>{<cond>} <Rn>, <shifter_operand>
<opcode2> := CMP | CMN | TST | TEQ

<opcode3>{<cond>}{S} <Rd>, <Rn>, <shifter_operand>
<opcode3> := ADD | SUB | RSB | ADC | SBC | RSC | AND | BIC | EOR | ORR

31   28 27 26 25 24     21 20 19   16 15   12 11                 0
  cond   0  0  I   opcode   S    Rn      Rd     shifter_operand

I bit             Distinguishes between the immediate and register forms of <shifter_operand>.
S bit             Signifies that the instruction updates the condition codes.
Rn                Specifies the first source operand register.
Rd                Specifies the destination register.
shifter_operand   Specifies the second source operand. See Addressing Mode 1 - Data-processing operands on page A5-2 for details of the shifter operands.

A3.4.2 List of data-processing instructions

ADC   Add with Carry. See ADC on page A4-4.
ADD   Add. See ADD on page A4-6.
AND   Logical AND. See AND on page A4-8.
BIC   Logical Bit Clear. See BIC on page A4-12.
CMN   Compare Negative. See CMN on page A4-26.
CMP   Compare. See CMP on page A4-28.
EOR   Logical EOR. See EOR on page A4-32.
MOV   Move. See MOV on page A4-68.
MVN   Move Not. See MVN on page A4-82.
ORR   Logical OR. See ORR on page A4-84.
RSB   Reverse Subtract. See RSB on page A4-115.
RSC   Reverse Subtract with Carry. See RSC on page A4-117.
SBC   Subtract with Carry. See SBC on page A4-125.
SUB   Subtract. See SUB on page A4-208.
TEQ   Test Equivalence. See TEQ on page A4-228.
TST   Test. See TST on page A4-230.
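The field layout given under Instruction encoding in A3.4.1 can be illustrated by splitting an instruction word into its fields. The Python sketch below is a hedged illustration, not a decoder: the function name decode_data_processing and the dictionary it returns are assumptions for this example, and it only checks that bits[27:26] are 00. It does not exclude the multiply, miscellaneous and extra load/store encodings that share that space.

```python
# Mnemonics indexed by the 4-bit opcode field, in the order of Table A3-2.
OPCODES = ["AND", "EOR", "SUB", "RSB", "ADD", "ADC", "SBC", "RSC",
           "TST", "TEQ", "CMP", "CMN", "ORR", "MOV", "BIC", "MVN"]

def decode_data_processing(instr: int) -> dict:
    """Split a 32-bit word into the data-processing fields of A3.4.1."""
    if (instr >> 26) & 0b11 != 0b00:
        raise ValueError("bits[27:26] must be 00 for data-processing")
    return {
        "cond": (instr >> 28) & 0xF,             # bits[31:28]
        "I": (instr >> 25) & 1,                  # immediate/register form
        "opcode": OPCODES[(instr >> 21) & 0xF],  # bits[24:21]
        "S": (instr >> 20) & 1,                  # update condition codes
        "Rn": (instr >> 16) & 0xF,               # first source operand
        "Rd": (instr >> 12) & 0xF,               # destination register
        "shifter_operand": instr & 0xFFF,        # bits[11:0], still encoded
    }
```

For instance, the word 0xE2910001 (ADDS R0, R1, #1) splits into cond 0b1110, I = 1, opcode ADD, S = 1, Rn = 1, Rd = 0 and a shifter_operand field of 0x001.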
A3.5 Multiply instructions

ARM has several classes of Multiply instruction:

Normal                  32-bit x 32-bit, bottom 32-bit result
Long                    32-bit x 32-bit, 64-bit result
Halfword                16-bit x 16-bit, 32-bit result
Word × halfword         32-bit x 16-bit, top 32-bit result
Most significant word   32-bit x 32-bit, top 32-bit result
Dual halfword           dual 16-bit x 16-bit, 32-bit result.

All Multiply instructions take two register operands as the input to the multiplier. The ARM processor does not directly support a multiply-by-constant instruction because of the efficiency of shift and add, or shift and reverse subtract instructions.

A3.5.1 Normal multiply

There are two 32-bit x 32-bit Multiply instructions that produce bottom 32-bit results:

MUL   Multiplies the values of two registers together, truncates the result to 32 bits, and stores the result in a third register.
MLA   Multiplies the values of two registers together, adds the value of a third register, truncates the result to 32 bits, and stores the result in a fourth register. This can be used to perform multiply-accumulate operations.

Both Normal Multiply instructions can optionally set the N (Negative) and Z (Zero) condition code flags. No distinction is made between signed and unsigned variants. Only the least significant 32 bits of the result are stored in the destination register, and the sign of the operands does not affect this value.

A3.5.2 Long multiply

There are five 32-bit x 32-bit Multiply instructions that produce 64-bit results.

Two of the variants multiply the values of two registers together and store the 64-bit result in third and fourth registers. There are signed (SMULL) and unsigned (UMULL) variants. The signed variants produce a different result in the most significant 32 bits if either or both of the source operands is negative.
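The difference between the signed and unsigned long multiplies can be seen in a small model. The following Python sketch is an illustration under stated assumptions: the helper names to_signed32, umull and smull are invented for this example, registers are modelled as unsigned 32-bit integers, and each function returns the pair (low word, high word) of the 64-bit product.

```python
MASK32 = 0xFFFFFFFF

def to_signed32(value: int) -> int:
    """Interpret a 32-bit register value as a two's complement integer."""
    value &= MASK32
    return value - (1 << 32) if value & 0x80000000 else value

def umull(rm: int, rs: int) -> tuple:
    """Unsigned 32 x 32 -> 64-bit product as (low word, high word)."""
    product = (rm & MASK32) * (rs & MASK32)
    return product & MASK32, (product >> 32) & MASK32

def smull(rm: int, rs: int) -> tuple:
    """Signed 32 x 32 -> 64-bit product as (low word, high word)."""
    product = (to_signed32(rm) * to_signed32(rs)) & ((1 << 64) - 1)
    return product & MASK32, (product >> 32) & MASK32
```

With operands 0xFFFFFFFF and 2, both functions return 0xFFFFFFFE as the low word, but the high word is 0x00000001 for umull and 0xFFFFFFFF for smull. This also shows why the Normal multiplies, which keep only the low word, need no signed and unsigned variants.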
Two variants multiply the values of two registers together, add the 64-bit value from the third and fourth registers, and store the 64-bit result back into those registers (third and fourth). There are signed (SMLAL) and unsigned (UMLAL) variants. These instructions perform a long multiply and accumulate.

UMAAL multiplies the unsigned values of two registers together, adds the two unsigned 32-bit values from the third and fourth registers, and stores the 64-bit unsigned result back into those registers (third and fourth).

All the Long Multiply instructions except UMAAL can optionally set the N (Negative) and Z (Zero) condition code flags. UMAAL does not affect any flags.

UMAAL is available in ARMv6 and above.

A3.5.3 Halfword multiply

There are three signed 16-bit x 16-bit Multiply instructions that produce 32-bit results:

SMULxy    Multiplies the 16-bit values of two half-registers together, and stores the signed 32-bit result in a third register.
SMLAxy    Multiplies the 16-bit values of two half-registers together, adds the 32-bit value from a third register, and stores the signed 32-bit result in a fourth register.
SMLALxy   Multiplies the 16-bit values of two half-registers together, adds the 64-bit value from a third and fourth register, and stores the 64-bit result back into those registers (third and fourth).

SMULxy and SMLALxy do not affect any flags. SMLAxy can set the Q flag if overflow occurs in the accumulation. The x and y designators indicate whether the top (T) or bottom (B) halfword of each source register is used as the operand.

They are available in ARMv5TE and above.
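The effect of the x and y designators can be sketched as follows. This Python model is illustrative only: the names signed_half and smulxy are assumptions for this example, registers are modelled as unsigned 32-bit integers, and the x and y selections are passed as booleans (True for the top halfword, False for the bottom halfword).

```python
def signed_half(reg: int, top: bool) -> int:
    """Select the top or bottom halfword of a register as a signed value."""
    half = (reg >> 16) & 0xFFFF if top else reg & 0xFFFF
    return half - 0x10000 if half & 0x8000 else half

def smulxy(rm: int, rs: int, x_top: bool, y_top: bool) -> int:
    """Model of SMULxy: signed 16 x 16 -> 32-bit product.

    x_top selects the halfword of the first operand and y_top the
    halfword of the second. The result is returned as a 32-bit
    register value.
    """
    return (signed_half(rm, x_top) * signed_half(rs, y_top)) & 0xFFFFFFFF
```

With a first operand of 0xFFFF0003 and a second operand of 0x00020004, the BB form multiplies 3 by 4, while the TB form multiplies -1 by 4 and yields 0xFFFFFFFC.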
A3.5.4 Word × halfword multiply

There are two signed Multiply instructions that produce top 32-bit results:

SMULWy   Multiplies the 32-bit value of one register with the 16-bit value of either halfword of a second register, and stores the top 32 bits of the signed 48-bit result in a third register.
SMLAWy   Multiplies the 32-bit value of one register with the 16-bit value of either halfword of a second register, extracts the top 32 bits, adds the 32-bit value from a third register, and stores the signed 32-bit result in a fourth register.

SMLAWy sets the Q flag if overflow occurs in the accumulation. SMULWy does not affect any flags.

These instructions are available in ARMv5TE and above.

A3.5.5 Most significant word multiply

There are three signed 32-bit x 32-bit Multiply instructions that produce top 32-bit results:

SMMUL   Multiplies the 32-bit values of two registers together, and stores the top 32 bits of the signed 64-bit result in a third register.
SMMLA   Multiplies the 32-bit values of two registers together, extracts the top 32 bits, adds the 32-bit value from a third register, and stores the signed 32-bit result in a fourth register.
SMMLS   Multiplies the 32-bit values of two registers together, extracts the top 32 bits, subtracts this from a 32-bit value from a third register, and stores the signed 32-bit result in a fourth register.

These instructions do not affect any flags.

They are available in ARMv6 and above.

A3.5.6 Dual halfword multiply

There are six dual, signed 16-bit x 16-bit Multiply instructions:

SMUAD   Multiplies the values of the top halfwords of two registers together, multiplies the values of the bottom halfwords of the same two registers together, adds the products, and stores the 32-bit result in a third register.
SMUSD   Multiplies the values of the top halfwords of two registers together, multiplies the values of the bottom halfwords of the same two registers together, subtracts one product from the other, and stores the 32-bit result in a third register.
SMLAD   Multiplies the values of the top halfwords of two registers together, multiplies the values of the bottom halfwords of the same two registers together, adds both products to the 32-bit value from a third register, and stores the 32-bit result in a fourth register.
SMLSD   Multiplies the values of the top halfwords of two registers together, multiplies the values of the bottom halfwords of the same two registers together, subtracts one product from the other, adds the difference to the 32-bit value from a third register, and stores the 32-bit result in a fourth register.
SMLALD  Multiplies the values of the top halfwords of two registers together, multiplies the values of the bottom halfwords of the same two registers together, adds both products to the 64-bit value from a third and fourth register, and stores the 64-bit result back into those registers (third and fourth).
SMLSLD  Multiplies the values of the top halfwords of two registers together, multiplies the values of the bottom halfwords of the same two registers together, subtracts one product from the other, adds the difference to the 64-bit value from a third and fourth register, and stores the 64-bit result back into those registers (third and fourth).

SMUAD, SMLAD, and SMLSD can set the Q flag if overflow occurs in the operation. All other instructions do not affect any flags.

They are available in ARMv6 and above.

A3.5.7 Examples

        MUL     R4, R2, R1      ; Set R4 to value of R2 multiplied by R1
        MULS    R4, R2, R1      ; R4 = R2 x R1, set N and Z flags
        MLA     R7, R8, R9, R3  ; R7 = R8 x R9 + R3
        SMULL   R4, R8, R2, R3  ; R4 = bits 0 to 31 of R2 x R3
                                ; R8 = bits 32 to 63 of R2 x R3
        UMULL   R6, R8, R0, R1  ; R8, R6 = R0 x R1
        UMLAL   R5, R8, R0, R1  ; R8, R5 = R0 x R1 + R8, R5

A3.5.8 List of multiply instructions

MLA      Multiply Accumulate. See MLA on page A4-66.
MUL      Multiply. See MUL on page A4-80.
SMLAxy   Signed halfword Multiply Accumulate. See SMLA on page A4-141.
SMLAD    Signed halfword Multiply Accumulate, Dual. See SMLAD on page A4-144.
SMLAL    Signed Multiply Accumulate Long. See SMLAL on page A4-146.
SMLALxy  Signed halfword Multiply Accumulate Long. See SMLAL on page A4-148.
SMLALD   Signed halfword Multiply Accumulate Long, Dual. See SMLALD on page A4-150.
SMLAWy   Signed halfword by word Multiply Accumulate. See SMLAW on page A4-152.
SMLSD    Signed halfword Multiply Subtract, Dual. See SMLAD on page A4-144.
SMLSLD   Signed halfword Multiply Subtract Long Dual. See SMLALD on page A4-150.
SMMLA    Signed Most significant word Multiply Accumulate. See SMMLA on page A4-158.
SMMLS    Signed Most significant word Multiply Subtract. See SMMLA on page A4-158.
SMMUL    Signed Most significant word Multiply. See SMMUL on page A4-162.
SMUAD    Signed halfword Multiply, Add, Dual. See SMUAD on page A4-164.
SMULxy   Signed halfword Multiply. See SMUL on page A4-166.
SMULL    Signed Multiply Long. See SMULL on page A4-168.
SMULWy   Signed halfword by word Multiply. See SMULW on page A4-170.
SMUSD    Signed halfword Multiply, Subtract, Dual. See SMUSD on page A4-172.
UMAAL    Unsigned Multiply Accumulate significant Long. See UMAAL on page A4-247.
UMLAL    Unsigned Multiply Accumulate Long. See UMLAL on page A4-249.
UMULL    Unsigned Multiply Long. See UMULL on page A4-251.

A3.6 Parallel addition and subtraction instructions

In addition to the normal data-processing and multiply instructions, ARMv6 introduces a set of parallel addition and subtraction instructions.

There are six basic instructions:

ADD16     Adds the top halfwords of two registers to form the top halfword of the result. Adds the bottom halfwords of the same two registers to form the bottom halfword of the result.
ADDSUBX   Does the following:
          1. Exchanges halfwords of the second operand register.
          2. Adds top halfwords and subtracts bottom halfwords.
SUBADDX   Does the following:
          1. Exchanges halfwords of the second operand register.
          2. Subtracts top halfwords and adds bottom halfwords.
SUB16     Subtracts the top halfword of the second operand register from the top halfword of the first operand register to form the top halfword of the result. Subtracts the bottom halfword of the second operand register from the bottom halfword of the first operand register to form the bottom halfword of the result.
ADD8      Adds each byte of the second operand register to the corresponding byte of the first operand register to form the corresponding byte of the result.
SUB8      Subtracts each byte of the second operand register from the corresponding byte of the first operand register to form the corresponding byte of the result.

Each of the six instructions is available in the following variations, indicated by the prefixes shown:

S    Signed arithmetic modulo 2^8 or 2^16. Sets the CPSR GE bits (see The GE[3:0] bits on page A2-13).
Q    Signed saturating arithmetic.
SH   Signed arithmetic, halving the results to avoid overflow.
U    Unsigned arithmetic modulo 2^8 or 2^16. Sets the CPSR GE bits (see The GE[3:0] bits on page A2-13).
UQ   Unsigned saturating arithmetic.
UH   Unsigned arithmetic, halving the results to avoid overflow.

A3.6.1 List of parallel arithmetic instructions

QADD16      Dual 16-bit signed saturating addition. See QADD16 on page A4-94.
QADD8       Quad 8-bit signed saturating addition. See QADD8 on page A4-95.
QADDSUBX    16-bit exchange, signed saturating addition, subtraction. See QADDSUBX on page A4-97.
QSUB16      Dual 16-bit signed saturating subtraction. See QSUB16 on page A4-104.
QSUB8       Quad 8-bit signed saturating subtraction. See QSUB8 on page A4-105.
QSUBADDX    16-bit exchange, signed saturating subtraction, addition. See QSUBADDX on page A4-107.
SADD16      Dual 16-bit signed addition. See SADD16 on page A4-119.
SADD8       Quad 8-bit signed addition. See SADD8 on page A4-121.
SADDSUBX    16-bit exchange, signed addition, subtraction. See SADDSUBX on page A4-123.
SSUB16      Dual 16-bit signed subtraction. See SSUB16 on page A4-180.
SSUB8       Quad 8-bit signed subtraction. See SSUB8 on page A4-182.
SSUBADDX    16-bit exchange, signed subtraction, addition. See SSUBADDX on page A4-184.
SHADD16     Dual 16-bit signed half addition. See SHADD16 on page A4-130.
SHADD8      Quad 8-bit signed half addition. See SHADD8 on page A4-131.
SHADDSUBX   16-bit exchange, signed half addition, subtraction. See SHADDSUBX on page A4-133.
SHSUB16     Dual 16-bit signed half subtraction. See SHSUB16 on page A4-135.
SHSUB8      Quad 8-bit signed half subtraction. See SHSUB8 on page A4-137.
SHSUBADDX   16-bit exchange, signed half subtraction, addition. See SHSUBADDX on page A4-139.
UADD16      Dual 16-bit unsigned addition. See UADD16 on page A4-232.
UADD8       Quad 8-bit unsigned addition. See UADD8 on page A4-233.
UADDSUBX    16-bit exchange, unsigned addition, subtraction. See UADDSUBX on page A4-235.
USUB16      Dual 16-bit unsigned subtraction. See USUB16 on page A4-269.
USUB8       Quad 8-bit unsigned subtraction. See USUB8 on page A4-270.
USUBADDX    16-bit exchange, unsigned subtraction, addition. See USUBADDX on page A4-272.
UHADD16     Dual 16-bit unsigned half addition. See UHADD16 on page A4-237.
UHADD8      Quad 8-bit unsigned half addition. See UHADD8 on page A4-238.
UHADDSUBX   16-bit exchange, unsigned half addition, subtraction. See UHADDSUBX on page A4-240.
UHSUB16     Dual 16-bit unsigned half subtraction. See UHSUB16 on page A4-242.
UHSUB8      Quad 8-bit unsigned half subtraction. See UHSUB16 on page A4-242.
UHSUBADDX   16-bit exchange, unsigned half subtraction, addition. See UHSUBADDX on page A4-245.
UQADD16     Dual 16-bit unsigned saturating addition. See UQADD16 on page A4-253.
UQADD8      Quad 8-bit unsigned saturating addition. See UQADD8 on page A4-254.
UQADDSUBX   16-bit exchange, unsigned saturating addition, subtraction. See UQADDSUBX on page A4-255.
UQSUB16     Dual 16-bit unsigned saturating subtraction. See UQSUB16 on page A4-257.
UQSUB8      Quad 8-bit unsigned saturating subtraction. See UQSUB8 on page A4-258.
UQSUBADDX   16-bit exchange, unsigned saturating subtraction, addition. See UQSUBADDX on page A4-259.

A3.7 Extend instructions

ARMv6 and above provide several instructions for unpacking data by sign or zero extending bytes to halfwords or words, and halfwords to words. You can optionally add the result to the contents of another register. You can rotate the operand register by any multiple of 8 bits before extending.

There are six basic instructions:

XTAB16   Extend bits[23:16] and bits[7:0] of one register to 16 bits, and add corresponding halfwords to the values in another register.
XTAB     Extend bits[7:0] of one register to 32 bits, and add to the value in another register.
XTAH     Extend bits[15:0] of one register to 32 bits, and add to the value in another register.
XTB16    Extend bits[23:16] and bits[7:0] to 16 bits each.
XTB      Extend bits[7:0] to 32 bits.
XTH      Extend bits[15:0] to 32 bits.

Each of the six instructions is available in the following variations, indicated by the prefixes shown:

S   Sign extension, with or without addition modulo 2^16 or 2^32.
U   Zero (unsigned) extension, with or without addition modulo 2^16 or 2^32.

A3.7.1 List of sign/zero extend and add instructions

SXTAB16   Sign extend bytes to halfwords, add halfwords. See SXTAB16 on page A4-218.
SXTAB     Sign extend byte to word, add. See SXTAB on page A4-216.
SXTAH     Sign extend halfword to word, add. See SXTAH on page A4-220.
SXTB16    Sign extend bytes to halfwords. See SXTB16 on page A4-224.
SXTB      Sign extend byte to word. See SXTB on page A4-222.
SXTH      Sign extend halfword to word. See SXTH on page A4-226.
UXTAB16   Zero extend bytes to halfwords, add halfwords. See UXTAB16 on page A4-276.
UXTAB     Zero extend byte to word, add. See UXTAB on page A4-274.
UXTAH     Zero extend halfword to word, add. See UXTAH on page A4-278.
UXTB16    Zero extend bytes to halfwords. See UXTB16 on page A4-282.
UXTB      Zero extend byte to word. See UXTB on page A4-280.
UXTH      Zero extend halfword to word. See UXTH on page A4-284.

A3.8 Miscellaneous arithmetic instructions

ARMv5 and above include several miscellaneous arithmetic instructions.

A3.8.1 Count leading zeros

ARMv5 and above include a Count Leading Zeros (CLZ) instruction. This instruction returns the number of 0 bits at the most significant end of its operand before the first 1 bit is encountered (or 32 if its operand is 0). Two typical applications for this are:
• To determine how many bits the operand should be shifted left to normalize it, so that its most significant bit is 1. (This can be used in integer division routines.)
• To locate the highest priority bit in a bit mask.

For details see CLZ on page A4-25.

A3.8.2 Unsigned sum of absolute differences

ARMv6 introduces an Unsigned Sum of Absolute Differences (USAD8) instruction, and an Unsigned Sum of Absolute Differences and Accumulate (USADA8) instruction.

These instructions do the following:
1. Take corresponding bytes from two registers.
2. Find the absolute differences between the unsigned values of each pair of bytes.
3. Sum the four absolute values.
4. Optionally, accumulate the sum of the absolute differences with the value in a third register.

For details see USAD8 on page A4-261 and USADA8 on page A4-263.

A3.9 Other miscellaneous instructions

ARMv6 and above provide several other miscellaneous instructions:

PKHBT    (Pack Halfword Bottom Top) combines the bottom, least significant, halfword of its first operand with the top (most significant) halfword of its shifted second operand. The shift is a left shift, by any amount from 0 to 31. See PKHBT on page A4-86.
PKHTB    (Pack Halfword Top Bottom) combines the top, most significant, halfword of its first operand with the bottom (least significant) halfword of its shifted second operand. The shift is an arithmetic right shift, by any amount from 1 to 32. See PKHTB on page A4-88.
REV      (Byte-Reverse Word) reverses the byte order in a 32-bit register. See REV on page A4-109.
REV16    (Byte-Reverse Packed Halfword) reverses the byte order in each 16-bit halfword of a 32-bit register. See REV16 on page A4-110.
REVSH    (Byte-Reverse Signed Halfword) reverses the byte order in the lower 16-bit halfword of a 32-bit register, and sign extends the result to 32 bits. See REVSH on page A4-111.
SEL      (Select) selects each byte of its result from either its first operand or its second operand, according to the values of the GE flags. The GE flags record the results of parallel additions or subtractions, see Parallel addition and subtraction instructions on page A3-14. See SEL on page A4-127.
SSAT     (Signed Saturate) saturates a signed value to a signed range. You can choose the bit position at which saturation occurs. You can apply a shift to the value before the saturation occurs. See SSAT on page A4-176.
SSAT16   Saturates two 16-bit signed values to a signed range. You can choose the bit position at which saturation occurs. See SSAT16 on page A4-178.
USAT     (Unsigned Saturate) saturates a signed value to an unsigned range. You can choose the bit position at which saturation occurs. You can apply a shift to the value before the saturation occurs. See USAT on page A4-265.
USAT16   Saturates two signed 16-bit values to an unsigned range. You can choose the bit position at which saturation occurs. See USAT16 on page A4-267.

A3.10 Status register access instructions

There are two instructions for moving the contents of a program status register to or from a general-purpose register.
Both the CPSR and SPSR can be accessed.

In addition, in ARMv6, there are several instructions that can write directly to specific bits, or groups of bits, in the CPSR.

Each status register is traditionally split into four 8-bit fields that can be individually written:

Bits[31:24]   The flags field.
Bits[23:16]   The status field.
Bits[15:8]    The extension field.
Bits[7:0]     The control field.

From ARMv6, the ARM architecture uses the status and extension fields. The usage model of the bit fields no longer reflects the byte-wide definitions. The revised categories are defined in Types of PSR bits on page A2-11.

A3.10.1 CPSR value

Altering the value of the CPSR has five uses:
• sets the value of the condition code flags (and of the Q flag when it exists) to a known value
• enables or disables interrupts
• changes processor mode (for instance, to initialize stack pointers)
• changes the endianness of load and store operations
• changes the processor state (J and T bits).

Note
The T and J bits must not be changed directly by writing to the CPSR, but only via the BX, BLX, or BXJ instructions, and in the implicit SPSR to CPSR moves in instructions designed for exception return. Attempts to enter or leave Thumb or Jazelle state by directly altering the T or J bits have UNPREDICTABLE consequences.

A3.10.2 Examples

These examples assume that the ARM processor is already in a privileged mode. If the ARM processor starts in User mode, only the flag update has any effect.
        MRS     R0, CPSR                ; Read the CPSR
        BIC     R0, R0, #0xF0000000     ; Clear the N, Z, C and V bits
        MSR     CPSR_f, R0              ; Update the flag bits in the CPSR
                                        ; N, Z, C and V flags now all clear

        MRS     R0, CPSR                ; Read the CPSR
        ORR     R0, R0, #0x80           ; Set the interrupt disable bit
        MSR     CPSR_c, R0              ; Update the control bits in the CPSR
                                        ; interrupts (IRQ) now disabled

        MRS     R0, CPSR                ; Read the CPSR
        BIC     R0, R0, #0x1F           ; Clear the mode bits
        ORR     R0, R0, #0x11           ; Set the mode bits to FIQ mode
        MSR     CPSR_c, R0              ; Update the control bits in the CPSR
                                        ; now in FIQ mode

A3.10.3 List of status register access instructions

MRS      Move PSR to General-purpose Register. See MRS on page A4-74.
MSR      Move General-purpose Register to PSR. See MSR on page A4-76.
CPS      Change Processor State. Changes one or more of the processor mode and interrupt enable bits of the CPSR, without changing the other CPSR bits. See CPS on page A4-29.
SETEND   Modifies the CPSR endianness, E, bit, without changing any other bits in the CPSR. See SETEND on page A4-129.

The processor state bits can also be updated by a variety of branch, load and return instructions which update the PC. Changes occur when they are used for Jazelle state entry/exit and Thumb interworking.

A3.11 Load and store instructions

The ARM architecture supports two broad types of instruction which load or store the value of a single register, or a pair of registers, from or to memory:
• The first type can load or store a 32-bit word or an 8-bit unsigned byte.
• The second type can load or store a 16-bit unsigned halfword, and can load and sign extend a 16-bit halfword or an 8-bit byte. In ARMv5TE and above, it can also load or store a pair of 32-bit words.

A3.11.1 Addressing modes

In both types of instruction, the addressing mode is formed from two parts:
• the base register
• the offset.
The base register can be any one of the general-purpose registers (including the PC, which allows PC-relative addressing for position-independent code).

The offset takes one of three formats:

Immediate         The offset is an unsigned number that can be added to or subtracted from the base register. Immediate offset addressing is useful for accessing data elements that are a fixed distance from the start of the data object, such as structure fields, stack offsets and input/output registers. For the word and unsigned byte instructions, the immediate offset is a 12-bit number. For the halfword and signed byte instructions, it is an 8-bit number.

Register          The offset is a general-purpose register (not the PC), that can be added to or subtracted from the base register. Register offsets are useful for accessing arrays or blocks of data.

Scaled register   The offset is a general-purpose register (not the PC) shifted by an immediate value, then added to or subtracted from the base register. The same shift operations used for data-processing instructions can be used (Logical Shift Left, Logical Shift Right, Arithmetic Shift Right and Rotate Right), but Logical Shift Left is the most useful as it allows an array index to be scaled by the size of each array element. Scaled register offsets are only available for the word and unsigned byte instructions.

As well as the three types of offset, the offset and base register are used in three different ways to form the memory address. The addressing modes are described as follows:

Offset            The base register and offset are added or subtracted to form the memory address.

Pre-indexed       The base register and offset are added or subtracted to form the memory address. The base register is then updated with this new address, to allow automatic indexing through an array or memory block.
Post-indexed      The value of the base register alone is used as the memory address. The base register and offset are added or subtracted and this value is stored back in the base register, to allow automatic indexing through an array or memory block.

A3.11.2 Load and store word or unsigned byte instructions

Load instructions load a single value from memory and write it to a general-purpose register. Store instructions read a value from a general-purpose register and store it to memory. These instructions have a single instruction format:

    LDR|STR{<cond>}{B}{T} Rd, <addressing_mode>

Encoding, from bit[31] down to bit[0]:

    [31:28] cond | [27:26] 0 1 | [25] I | [24] P | [23] U | [22] B | [21] W | [20] L | [19:16] Rn | [15:12] Rd | [11:0] addressing_mode_specific

I, P, U, W   Are bits that distinguish between different types of <addressing_mode>. See Addressing Mode 2 - Load and Store Word or Unsigned Byte on page A5-18.
L bit        Distinguishes between a Load (L==1) and a Store instruction (L==0).
B bit        Distinguishes between an unsigned byte (B==1) and a word (B==0) access.
Rn           Specifies the base register used by <addressing_mode>.
Rd           Specifies the register whose contents are to be loaded or stored.

A3.11.3 Load and store halfword or doubleword, and load signed byte instructions

Load instructions load a single value from memory and write it to a general-purpose register, or to a pair of general-purpose registers. Store instructions read a value from a general-purpose register, or from a pair of general-purpose registers, and store it to memory. These instructions have a single instruction format:

    LDR|STR{<cond>}D|H|SH|SB Rd, <addressing_mode>

Encoding, from bit[31] down to bit[0]:

    [31:28] cond | [27:25] 0 0 0 | [24] P | [23] U | [22] I | [21] W | [20] L | [19:16] Rn | [15:12] Rd | [11:8] addr_mode | [7] 1 | [6] S | [5] H | [4] 1 | [3:0] addr_mode

addr_mode    Are addressing-mode-specific bits.
I, P, U, W   Are bits that specify the type of addressing mode (see Addressing Mode 3 - Miscellaneous Loads and Stores on page A5-33).
L, S, H      These bits combine to specify signed or unsigned loads or stores, and doubleword, halfword, or byte accesses. See Addressing Mode 3 - Miscellaneous Loads and Stores on page A5-33 for details.
Rn           Specifies the base register used by the addressing mode.
Rd           Specifies the register whose contents are to be loaded or stored.

A3.11.4 Examples

    LDR    R1, [R0]               ; Load R1 from the address in R0
    LDR    R8, [R3, #4]           ; Load R8 from the address in R3 + 4
    LDR    R12, [R13, #-4]        ; Load R12 from R13 - 4
    STR    R2, [R1, #0x100]       ; Store R2 to the address in R1 + 0x100

    LDRB   R5, [R9]               ; Load byte into R5 from R9
                                  ; (zero top 3 bytes)
    LDRB   R3, [R8, #3]           ; Load byte to R3 from R8 + 3
                                  ; (zero top 3 bytes)
    STRB   R4, [R10, #0x200]      ; Store byte from R4 to R10 + 0x200

    LDR    R11, [R1, R2]          ; Load R11 from the address in R1 + R2
    STRB   R10, [R7, -R4]         ; Store byte from R10 to addr in R7 - R4

    LDR    R11, [R3, R5, LSL #2]  ; Load R11 from R3 + (R5 x 4)
    LDR    R1, [R0, #4]!          ; Load R1 from R0 + 4, then R0 = R0 + 4
    STRB   R7, [R6, #-1]!         ; Store byte from R7 to R6 - 1,
                                  ; then R6 = R6 - 1

    LDR    R3, [R9], #4           ; Load R3 from R9, then R9 = R9 + 4
    STR    R2, [R5], #8           ; Store R2 to R5, then R5 = R5 + 8

    LDR    R0, [PC, #0x40]        ; Load R0 from PC + 0x40 (= address of
                                  ; the LDR instruction + 8 + 0x40)
    LDR    R0, [R1], R2           ; Load R0 from R1, then R1 = R1 + R2

    LDRH   R1, [R0]               ; Load halfword to R1 from R0
                                  ; (zero top 2 bytes)
    LDRH   R8, [R3, #2]           ; Load halfword into R8 from R3 + 2
    LDRH   R12, [R13, #-6]        ; Load halfword into R12 from R13 - 6
    STRH   R2, [R1, #0x80]        ; Store halfword from R2 to R1 + 0x80

    LDRSH  R5, [R9]               ; Load signed halfword to R5 from R9
    LDRSB  R3, [R8, #3]           ; Load signed byte to R3 from R8 + 3
    LDRSB  R4, [R10, #0xC1]       ; Load signed byte to R4 from R10 + 0xC1

    LDRH   R11, [R1, R2]          ; Load halfword into R11 from address
                                  ; in R1 + R2
    STRH   R10, [R7, -R4]         ; Store halfword from R10 to R7 - R4

    LDRSH  R1, [R0, #2]!          ; Load signed halfword R1 from R0 + 2,
                                  ; then R0 = R0 + 2
    LDRSB  R7, [R6, #-1]!         ; Load signed byte to R7 from R6 - 1,
                                  ; then R6 = R6 - 1

    LDRH   R3, [R9], #2           ; Load halfword to R3 from R9,
                                  ; then R9 = R9 + 2
    STRH   R2, [R5], #8           ; Store halfword from R2 to R5,
                                  ; then R5 = R5 + 8

    LDRD   R4, [R9]               ; Load word into R4 from
                                  ; the address in R9
                                  ; Load word into R5 from
                                  ; the address in R9 + 4
    STRD   R8, [R2, #0x2C]        ; Store R8 at the address in
                                  ; R2 + 0x2C
                                  ; Store R9 at the address in
                                  ; R2 + 0x2C+4

A3.11.5 List of load and store instructions

LDR      Load Word. See LDR on page A4-43.
LDRB     Load Byte. See LDRB on page A4-46.
LDRBT    Load Byte with User Mode Privilege. See LDRBT on page A4-48.
LDRD     Load Doubleword. See LDRD on page A4-50.
LDREX    Load Exclusive. See LDREX on page A4-52.
LDRH     Load Unsigned Halfword. See LDRH on page A4-54.
LDRSB    Load Signed Byte. See LDRSB on page A4-56.
LDRSH    Load Signed Halfword. See LDRSH on page A4-58.
LDRT     Load Word with User Mode Privilege. See LDRT on page A4-60.
STR      Store Word. See STR on page A4-193.
STRB     Store Byte. See STRB on page A4-195.
STRBT    Store Byte with User Mode Privilege. See STRBT on page A4-197.
STRD     Store Doubleword. See STRD on page A4-199.
STREX    Store Exclusive. See STREX on page A4-202.
STRH     Store Halfword. See STRH on page A4-204.
STRT     Store Word with User Mode Privilege. See STRT on page A4-206.

A3.12 Load and Store Multiple instructions

Load Multiple instructions load a subset, or possibly all, of the general-purpose registers from memory. Store Multiple instructions store a subset, or possibly all, of the general-purpose registers to memory.
Load and Store Multiple instructions have a single instruction format:

    LDM{<cond>}<addressing_mode> Rn{!}, <registers>{^}
    STM{<cond>}<addressing_mode> Rn{!}, <registers>{^}

where:

    <addressing_mode> = IA | IB | DA | DB | FD | FA | ED | EA

Encoding, from bit[31] down to bit[0]:

    [31:28] cond | [27:25] 1 0 0 | [24] P | [23] U | [22] S | [21] W | [20] L | [19:16] Rn | [15:0] register list

register list      The list of <registers> has one bit for each general-purpose register. Bit 0 is for R0, and bit 15 is for R15 (the PC). The register list syntax is an opening brace, followed by a comma-separated list of registers, followed by a closing brace. A sequence of consecutive registers can be specified by separating the first and last registers in the range with a minus sign.
P, U, and W bits   These distinguish between the different types of addressing mode (see Addressing Mode 4 - Load and Store Multiple on page A5-41).
S bit              For LDMs that load the PC, the S bit indicates that the CPSR is loaded from the SPSR after all the registers have been loaded. For all STMs, and LDMs that do not load the PC, it indicates that when the processor is in a privileged mode, the User mode banked registers are transferred and not the registers of the current mode.
L bit              This distinguishes between a Load (L==1) and a Store (L==0) instruction.
Rn                 This specifies the base register used by the addressing mode.

A3.12.1 Examples

    STMFD  R13!, {R0 - R12, LR}
    LDMFD  R13!, {R0 - R12, PC}
    LDMIA  R0, {R5 - R8}
    STMDA  R1!, {R2, R5, R7 - R9, R11}

A3.12.2 List of Load and Store Multiple instructions

LDM                  Load Multiple. See LDM (1) on page A4-36.
LDM                  User Registers Load Multiple. See LDM (2) on page A4-38.
LDM                  Load Multiple with Restore CPSR. See LDM (3) on page A4-40.
STM                  Store Multiple. See STM (1) on page A4-189.
STM                  User Registers Store Multiple. See STM (2) on page A4-191.
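The mapping from the assembler register list to the 16-bit register list field can be illustrated with a short Python sketch. This is a model for explanation only, not an assembler: the function names reg_number and register_list_mask are hypothetical, and accepting SP, LR and PC as names for R13, R14 and R15 is an assumption of the sketch.

```python
ALIASES = {"SP": 13, "LR": 14, "PC": 15}  # assumed alternative register names


def reg_number(name):
    """Return the register number (0 to 15) for a name such as R7 or LR."""
    name = name.strip().upper()
    if name in ALIASES:
        return ALIASES[name]
    if name.startswith("R") and name[1:].isdigit() and int(name[1:]) <= 15:
        return int(name[1:])
    raise ValueError("not a general-purpose register: %r" % name)


def register_list_mask(text):
    """Convert a list such as '{R0 - R12, LR}' into the register list bits.

    Bit 0 stands for R0 and bit 15 for R15 (the PC).
    """
    text = text.strip()
    if not (text.startswith("{") and text.endswith("}")):
        raise ValueError("register list must be enclosed in braces")
    mask = 0
    for item in text[1:-1].split(","):
        if "-" in item:
            # A range: first and last register separated by a minus sign.
            first, last = (reg_number(part) for part in item.split("-"))
            if first > last:
                raise ValueError("descending range: %r" % item)
            for number in range(first, last + 1):
                mask |= 1 << number
        else:
            mask |= 1 << reg_number(item)
    return mask
```

For the register lists used in the examples of A3.12.1, the sketch gives 0x5FFF for {R0 - R12, LR} and 0x0BA4 for {R2, R5, R7 - R9, R11}.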
A3.13 Semaphore instructions

The ARM instruction set has two semaphore instructions:
• Swap (SWP)
• Swap Byte (SWPB).

These instructions are provided for process synchronization. Both instructions generate an atomic load and store operation, allowing a memory semaphore to be loaded and altered without interruption.

SWP and SWPB have a single addressing mode, whose address is the contents of a register. Separate registers are used to specify the value to store and the destination of the load. If the same register is specified for both of these, SWP exchanges the value in the register and the value in memory.

The semaphore instructions do not provide a compare and conditional write facility. If wanted, this must be done explicitly.

Note
The swap and swap byte instructions are deprecated in ARMv6. It is recommended that all software migrates to using the new LDREX and STREX synchronization primitives listed in List of load and store instructions on page A3-25.

A3.13.1 Examples

    SWP   R12, R10, [R9]   ; load R12 from address R9 and
                           ; store R10 to address R9
    SWPB  R3, R4, [R8]     ; load byte to R3 from address R8 and
                           ; store byte from R4 to address R8
    SWP   R1, R1, [R2]     ; Exchange the value in R1 with the word
                           ; at the address in R2

A3.13.2 List of semaphore instructions

SWP      Swap. See SWP on page A4-212.
SWPB     Swap Byte. See SWPB on page A4-214.

A3.14 Exception-generating instructions

The ARM instruction set provides two types of instruction whose main purpose is to cause a processor exception to occur:
• The Software Interrupt (SWI) instruction is used to cause a SWI exception to occur (see Software Interrupt exception on page A2-20). This is the main mechanism in the ARM instruction set by which User mode code can make calls to privileged Operating System code.
• The Breakpoint (BKPT) instruction is used for software breakpoints in ARMv5 and above. Its default behavior is to cause a Prefetch Abort exception to occur (see Prefetch Abort (instruction fetch memory abort) on page A2-20). A debug monitor program which has previously been installed on the Prefetch Abort vector can handle this exception. If debug hardware is present in the system, it is allowed to override this default behavior. Details of whether and how this happens are IMPLEMENTATION DEFINED.

A3.14.1 Instruction encodings

    SWI{<cond>} <immed_24>

    [31:28] cond | [27:24] 1 1 1 1 | [23:0] immed_24

    BKPT <immediate>

    [31:28] 1 1 1 0 | [27:20] 0 0 0 1 0 0 1 0 | [19:8] immed | [7:4] 0 1 1 1 | [3:0] immed

In both SWI and BKPT, the immediate fields of the instruction are ignored by the ARM processor. The SWI or Prefetch Abort handler can optionally be written to load the instruction that caused the exception and extract these fields. This allows them to be used to communicate extra information about the Operating System call or breakpoint to the handler.

A3.14.2 List of exception-generating instructions

BKPT     Breakpoint. See BKPT on page A4-14.
SWI      Software Interrupt. See SWI on page A4-210.

A3.15 Coprocessor instructions

The ARM instruction set provides three types of instruction for communicating with coprocessors. These allow:
• the ARM processor to initiate a coprocessor data processing operation
• ARM registers to be transferred to and from coprocessor registers
• the ARM processor to generate addresses for the coprocessor Load and Store instructions.

The instruction set distinguishes up to 16 coprocessors with a 4-bit field in each coprocessor instruction, so each coprocessor is assigned a particular number.

Note
One coprocessor can use more than one of the 16 numbers if a large coprocessor instruction set is required.

Coprocessors execute the same instruction stream as ARM, ignoring ARM instructions and coprocessor instructions for other coprocessors.
Coprocessor instructions that cannot be executed by coprocessor hardware cause an Undefined Instruction exception, allowing software emulation of coprocessor hardware.

A coprocessor can partially execute an instruction and then cause an exception. This is useful for handling run-time-generated exceptions, like divide-by-zero or overflow. However, the partial execution is internal to the coprocessor and is not visible to the ARM processor. As far as the ARM processor is concerned, the instruction is held at the start of its execution and completes without exception if allowed to begin execution. Any decision on whether to execute the instruction or cause an exception is taken within the coprocessor before the ARM processor is allowed to start executing the instruction.

Not all fields in coprocessor instructions are used by the ARM processor. Coprocessor register specifiers and opcodes are defined by individual coprocessors. Therefore, only generic instruction mnemonics are provided for coprocessor instructions. Assembler macros can be used to transform custom coprocessor mnemonics into these generic mnemonics, or to regenerate the opcodes manually.

A3.15.1 Examples

    CDP   p5, 2, c12, c10, c3, 4   ; Coproc 5 data operation
                                   ; opcode 1 = 2, opcode 2 = 4
                                   ; destination register is 12
                                   ; source registers are 10 and 3

    MRC   p15, 5, R4, c0, c2, 3    ; Coproc 15 transfer to ARM register
                                   ; opcode 1 = 5, opcode 2 = 3
                                   ; ARM destination register = R4
                                   ; coproc source registers are 0 and 2

    MCR   p14, 1, R7, c7, c12, 6   ; ARM register transfer to Coproc 14
                                   ; opcode 1 = 1, opcode 2 = 6
                                   ; ARM source register = R7
                                   ; coproc dest registers are 7 and 12

    LDC   p6, CR1, [R4]            ; Load from memory to coprocessor 6
                                   ; ARM register 4 contains the address
                                   ; Load to CP reg 1

    LDC   p6, CR4, [R2, #4]        ; Load from memory to coprocessor 6
                                   ; ARM register R2 + 4 is the address
                                   ; Load to CP reg 4

    STC   p8, CR8, [R2, #4]!       ; Store from coprocessor 8 to memory
                                   ; ARM register R2 + 4 is the address
                                   ; after the transfer R2 = R2 + 4
                                   ; Store from CP reg 8

    STC   p8, CR9, [R2], #-16      ; Store from coprocessor 8 to memory
                                   ; ARM register R2 holds the address
                                   ; after the transfer R2 = R2 - 16
                                   ; Store from CP reg 9

A3.15.2 List of coprocessor instructions

CDP      Coprocessor Data Operations. See CDP on page A4-23.
LDC      Load Coprocessor Register. See LDC on page A4-34.
MCR      Move to Coprocessor from ARM Register. See MCR on page A4-62.
MCRR     Move to Coprocessor from two ARM Registers. See MCRR on page A4-64.
MRC      Move to ARM Register from Coprocessor. See MRC on page A4-70.
MRRC     Move to two ARM Registers from Coprocessor. See MRRC on page A4-72.
STC      Store Coprocessor Register. See STC on page A4-186.

Note
MCRR and MRRC are only available in ARMv5TE and above.

A3.16 Extending the instruction set

Successive versions of the ARM architecture have extended the instruction set in a number of areas. This section describes the seven areas where extensions have occurred, and where further extensions can occur in the future:
• Media instruction space on page A3-33
• Multiply instruction extension space on page A3-35
• Control and DSP instruction extension space on page A3-36
• Load/store instruction extension space on page A3-38
• Architecturally Undefined Instruction space on page A3-39
• Coprocessor instruction extension space on page A3-40
• Unconditional instruction extension space on page A3-41.

Instructions in these areas which have not yet been allocated a meaning are either UNDEFINED or UNPREDICTABLE. To determine which, use the following rules:

1. The decode bits of an instruction are defined to be bits[27:20] and bits[7:4]. In ARMv5 and above, the result of ANDing bits[31:28] together is also a decode bit. This bit determines whether the condition field is 0b1111, which is used in ARMv5 and above to encode various instructions which can only be executed unconditionally. See Condition code 0b1111 on page A3-4 and Unconditional instruction extension space on page A3-41 for more information.

2. If the decode bits of an instruction are equal to those of a defined instruction, but the whole instruction is not a defined instruction, then the instruction is UNPREDICTABLE. For example, suppose an instruction has:
   • bits[31:28] not equal to 0b1111
   • bits[27:20] equal to 0b00010000
   • bits[7:4] equal to 0b0000
   but where:
   • bit[11] of the instruction is 1.
   Here, the instruction is in the control instruction extension space and has the same decode bits as an MRS instruction, but is not a valid MRS instruction because bit[11] of an MRS instruction should be zero. Using the above rule, this instruction is UNPREDICTABLE.

3. If the decode bits of an instruction are not equal to those of any defined instruction, then the instruction is UNDEFINED.

Rules 2 and 3 above apply separately to each ARM architecture version. As a result, the status of an instruction might differ between architecture versions. Usually, this happens because an instruction which was UNPREDICTABLE or UNDEFINED in an earlier architecture version becomes a defined instruction in a later version.

For the purposes of this section, all coprocessor instructions described in Chapter A4 ARM Instructions as appearing in a version of the architecture have been allocated. The definitions of any coprocessors using the coprocessor instructions determine the function of the instructions. Such coprocessors can define UNPREDICTABLE and UNDEFINED behaviours.
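The decode-bit rule can be illustrated with a short Python sketch. It is a model for explanation only: the function name decode_bits and its armv5_or_above parameter are hypothetical, and the two instruction words are the encoding of MRS R0, CPSR and the same word with bit[11] set, as in the example given for rule 2.

```python
def decode_bits(instr, armv5_or_above=True):
    """Return the decode bits of a 32-bit ARM instruction word as a tuple.

    The tuple holds bits[27:20] and bits[7:4]. In ARMv5 and above it is
    preceded by one extra bit that is 1 only when bits[31:28] are all ones.
    """
    instr &= 0xFFFFFFFF
    bits_27_20 = (instr >> 20) & 0xFF
    bits_7_4 = (instr >> 4) & 0xF
    if armv5_or_above:
        cond_is_1111 = int((instr >> 28) == 0xF)
        return (cond_is_1111, bits_27_20, bits_7_4)
    return (bits_27_20, bits_7_4)


MRS_R0_CPSR = 0xE10F0000             # a defined MRS instruction
BAD_MRS = MRS_R0_CPSR | (1 << 11)    # same word with bit[11] set

# Both words have the same decode bits, so rule 2 applies to BAD_MRS.
SAME_DECODE_BITS = decode_bits(MRS_R0_CPSR) == decode_bits(BAD_MRS)
```

Because BAD_MRS shares its decode bits with a defined instruction but is not itself a defined instruction, it falls under rule 2 and is UNPREDICTABLE rather than UNDEFINED.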
A3.16.1 Media instruction space

Instructions with the following opcodes are defined as residing in the media instruction space:

    opcode[27:25] == 0b011
    opcode[4]     == 1

    [31:28] cond | [27:25] 0 1 1 | [24:5] x (instruction-specific bits, including an op field) | [4] 1 | [3:0] x

The meaning of unallocated instructions in the media instruction space is UNDEFINED on all versions of the ARM architecture.

Table A3-3 summarizes the instructions that have already been allocated in this area.

Table A3-3 Media instruction space

Instructions                                                     Architecture versions
Parallel additions, subtractions, and addition with              ARMv6 and above
subtractions. See Parallel addition and subtraction
instructions on page A3-14.
PKH, SSAT, SSAT16, USAT, USAT16, SEL. Also sign/zero extend      ARMv6 and above
and add instructions. See Extend instructions on page A3-16.
SMLAD, SMLSD, SMLALD, SMUAD, SMUSD                               ARMv6 and above
USAD8, USADA8                                                    ARMv6 and above
REV, REV16, REVSH                                                ARMv6 and above

Figure A3-2 on page A3-34 provides details of these instructions.
Figure A3-2 Media instructions

Each row lists the fields from bit[31] down to bit[0]. Unless a bit range is given, cond is bits[31:28] and register fields occupy bits[19:16], bits[15:12], bits[11:8] and bits[3:0] in the order shown.

Parallel add/subtract                       cond | 0 1 1 0 0 | opc1 [22:20] | Rn | Rd | SBO | opc2 [7:5] | 1 | Rm
Halfword pack                               cond | 0 1 1 0 1 0 0 0 | Rn | Rd | shift_imm [11:7] | op | 0 1 | Rm
Word saturate                               cond | 0 1 1 0 1 | U | 1 | sat_imm [20:16] | Rd | shift_imm [11:7] | sh | 0 1 | Rm
Parallel halfword saturate                  cond | 0 1 1 0 1 | U | 1 0 | sat_imm [19:16] | Rd | SBO | 0 0 1 1 | Rm
Byte reverse word                           cond | 0 1 1 0 1 0 1 1 | SBO | Rd | SBO | 0 0 1 1 | Rm
Byte reverse packed halfword                cond | 0 1 1 0 1 0 1 1 | SBO | Rd | SBO | 1 0 1 1 | Rm
Byte reverse signed halfword                cond | 0 1 1 0 1 1 1 1 | SBO | Rd | SBO | 1 0 1 1 | Rm
Select bytes                                cond | 0 1 1 0 1 0 0 0 | Rn | Rd | SBO | 1 0 1 1 | Rm
Sign/zero extend (add)                      cond | 0 1 1 0 1 | op [22:20] | Rn | Rd | rotate [11:10] | SBZ [9:8] | 0 1 1 1 | Rm
Multiplies (type 3)                         cond | 0 1 1 1 0 | opc1 [22:20] | Rd/RdHi | Rn/RdLo | Rs | opc2 [7:5] | 1 | Rm
Unsigned sum of absolute differences        cond | 0 1 1 1 1 0 0 0 | Rd | 1 1 1 1 | Rs | 0 0 0 1 | Rm
Unsigned sum of absolute differences, acc   cond | 0 1 1 1 1 0 0 0 | Rd | Rn* | Rs | 0 0 0 1 | Rm

Rn*   Rn != R15.

A3.16.2 Multiply instruction extension space

Instructions with the following opcodes are the multiply instruction extension space:

    opcode[27:24] == 0b0000
    opcode[7:4]   == 0b1001
    opcode[31:28] != 0b1111   /* Only required for version 5 and above */

The field names given are guidelines suggested to simplify implementation.

    [31:28] cond | [27:24] 0 0 0 0 | [23:20] op1 | [19:16] Rn | [15:12] Rd | [11:8] Rs | [7:4] 1 0 0 1 | [3:0] Rm

Table A3-4 summarizes the instructions that have already been allocated in this area.

Table A3-4 Multiply instruction extension space

Instructions                                                     Architecture versions
MUL, MULS, MLA, MLAS                                             All
UMULL, UMULLS, UMLAL, UMLALS, SMULL, SMULLS, SMLAL, SMLALS       All
UMAAL                                                            ARMv6 and above

Figure A3-3 provides details of these instructions.
Figure A3-3 Multiply instructions

Each row lists the fields from bit[31] down to bit[0].

Multiply (acc)                   cond | 0 0 0 0 0 0 | A | S | Rd | Rn | Rs | 1 0 0 1 | Rm
Unsigned multiply acc acc long   cond | 0 0 0 0 0 1 0 0 | RdHi | RdLo | Rs | 1 0 0 1 | Rm
Multiply (acc) long              cond | 0 0 0 0 1 | Un | A | S | RdHi | RdLo | Rs | 1 0 0 1 | Rm

A     Accumulate
Un    0 = Unsigned, 1 = Signed
S     Status register update (SPSR => CPSR)

A3.16.3 Control and DSP instruction extension space

Instructions with the following opcodes are the control instruction space.

    opcode[27:26] == 0b00
    opcode[24:23] == 0b10
    opcode[20]    == 0
    opcode[31:28] != 0b1111   /* Only required for version 5 and above */

and not:

    opcode[25] == 0
    opcode[7]  == 1
    opcode[4]  == 1

The field names given are guidelines suggested to simplify implementation.

    cond | 0 0 0 1 0 | op1 [22:21] | 0 | Rn | Rd | Rs | op2 [7:5] | 0 | Rm
    cond | 0 0 0 1 0 | op1 [22:21] | 0 | Rn | Rd | Rs | 0 | op2 [6:5] | 1 | Rm
    cond | 0 0 1 1 0 | R | 1 0 | Rn | Rd | rotate_imm [11:8] | immed_8 [7:0]

Table A3-5 summarizes the instructions that have already been allocated in this area.

Table A3-5 Control and DSP extension space instructions

Instruction             Architecture versions
MRS                     All
MSR (register form)     All
BX                      ARMv5 and above, plus T variants of ARMv4
CLZ                     ARMv5 and above
BXJ                     ARMv5EJ and above
BLX (register form)     ARMv5 and above
QADD                    E variants of ARMv5 and above
QSUB                    E variants of ARMv5 and above
QDADD                   E variants of ARMv5 and above
QDSUB                   E variants of ARMv5 and above
BKPT                    ARMv5 and above
SMLA<x><y>              E variants of ARMv5 and above
SMLAW<y>                E variants of ARMv5 and above
SMULW<y>                E variants of ARMv5 and above
SMLAL<x><y>             E variants of ARMv5 and above
SMUL<x><y>              E variants of ARMv5 and above
MSR (immediate form)    All

Figure A3-4 provides details of these instructions.

Figure A3-4 Miscellaneous instructions

Each row lists the fields from bit[31] down to bit[0].

Move status register to register                  cond | 0 0 0 1 0 | R | 0 0 | SBO | Rd | SBZ | 0 0 0 0 | SBZ
Move register to status register                  cond | 0 0 0 1 0 | R | 1 0 | mask | SBO | SBZ | 0 0 0 0 | Rm
Move immediate to status register                 cond | 0 0 1 1 0 | R | 1 0 | mask | SBO | rot_imm | immed [7:0]
Branch/exchange instruction set Thumb             cond | 0 0 0 1 0 0 1 0 | SBO | SBO | SBO | 0 0 0 1 | Rm
Branch/exchange instruction set Java              cond | 0 0 0 1 0 0 1 0 | SBO | SBO | SBO | 0 0 1 0 | Rm
Count leading zeros                               cond | 0 0 0 1 0 1 1 0 | SBO | Rd | SBO | 0 0 0 1 | Rm
Branch and link/exchange instruction set Thumb    cond | 0 0 0 1 0 0 1 0 | SBO | SBO | SBO | 0 0 1 1 | Rm
Saturating add/subtract                           cond | 0 0 0 1 0 | op [22:21] | 0 | Rn | Rd | SBZ | 0 1 0 1 | Rm
Software breakpoint                               cond | 0 0 0 1 0 0 1 0 | immed [19:8] | 0 1 1 1 | immed [3:0]
Signed multiplies (type 2)                        cond | 0 0 0 1 0 | op [22:21] | 0 | Rd | Rn | Rs | 1 | y | x | 0 | Rm

A3.16.4 Load/store instruction extension space

Instructions with the following opcodes are the load/store instruction extension space:

    opcode[27:25] == 0b000
    opcode[7]     == 1
    opcode[4]     == 1
    opcode[31:28] != 0b1111   /* Only required for version 5 and above */

and not:

    opcode[24]  == 0
    opcode[6:5] == 0

The field names given are guidelines suggested to simplify implementation.

    [31:28] cond | [27:25] 0 0 0 | [24] P | [23] U | [22] B | [21] W | [20] L | [19:16] Rn | [15:12] Rd | [11:8] Rs | [7] 1 | [6:5] op1 | [4] 1 | [3:0] Rm

Table A3-6 summarizes the instructions that have already been allocated in this area.
Table A3-6 Load/store instructions

Instruction   Architecture versions
SWP/SWPB      All (deprecated in ARMv6)
LDREX         ARMv6 and above
STREX         ARMv6 and above
STRH          All
LDRD          E variants of ARMv5 and above, except ARMv5TExP
STRD          E variants of ARMv5 and above, except ARMv5TExP
LDRH          All
LDRSB         All
LDRSH         All

Figure A3-5 on page A3-39 provides details of these extra load/store instructions.

Figure A3-5 Extra Load/store instructions

Each row lists the fields from bit[31] down to bit[0].

Swap/swap byte                               cond | 0 0 0 1 0 | B | 0 0 | Rn | Rd | SBZ | 1 0 0 1 | Rm
Load/store register exclusive                cond | 0 0 0 1 1 0 0 | L | Rn | Rd | SBO | 1 0 0 1 | SBO
Load/store halfword register offset          cond | 0 0 0 | P | U | 0 | W | L | Rn | Rd | SBZ | 1 0 1 1 | Rm
Load/store halfword immediate offset         cond | 0 0 0 | P | U | 1 | W | L | Rn | Rd | HiOffset | 1 0 1 1 | LoOffset
Load signed halfword/byte immediate offset   cond | 0 0 0 | P | U | 1 | W | 1 | Rn | Rd | HiOffset | 1 1 H 1 | LoOffset
Load signed halfword/byte register offset    cond | 0 0 0 | P | U | 0 | W | 1 | Rn | Rd | SBZ | 1 1 H 1 | Rm
Load/store doubleword register offset        cond | 0 0 0 | P | U | 0 | W | 0 | Rn | Rd | SBZ | 1 1 St 1 | Rm
Load/store doubleword immediate offset       cond | 0 0 0 | P | U | 1 | W | 0 | Rn | Rd | HiOffset | 1 1 St 1 | LoOffset

B            1 = Byte, 0 = Word
P, U, I, W   Pre/post indexing or offset, Up/down, Immediate/register offset, and address Write-back fields for the address mode. See Chapter A5 ARM Addressing Modes for more details.
L            1 = Load, 0 = Store
H            1 = Halfword, 0 = Byte
St           1 = Store, 0 = Load

A3.16.5 Architecturally Undefined Instruction space

In general, Undefined instructions might be used to extend the ARM instruction set in the future.
However, it is intended that instructions with the following encoding will not be used for this:

    [31:28] cond | [27:20] 0 1 1 1 1 1 1 1 | [19:8] x | [7:4] 1 1 1 1 | [3:0] x

If a programmer wants to use an Undefined instruction for software purposes, with minimal risk that future hardware will treat it as a defined instruction, one of the instructions with this encoding must be used.

A3.16.6 Coprocessor instruction extension space

Instructions with the following opcodes are the coprocessor instruction extension space:

    opcode[27:23] == 0b11000
    opcode[21]    == 0

The field names given are guidelines suggested to simplify implementation.

    [31:28] cond | [27:23] 1 1 0 0 0 | [22] x | [21] 0 | [20] x | [19:16] Rn | [15:12] CRd | [11:8] cp_num | [7:0] offset

In all variants of ARMv4, and in non-E variants of ARMv5, all instructions in the coprocessor instruction extension space are UNDEFINED. It is IMPLEMENTATION DEFINED how an ARM processor achieves this. The options are:
• The ARM processor might take the Undefined Instruction exception directly.
• The ARM processor might require attached coprocessors not to respond to such instructions. This causes the Undefined Instruction exception to be taken (see Undefined Instruction exception on page A2-19).

From E variants of ARMv5, instructions in the coprocessor instruction extension space are treated as follows:
• Instructions with bit[22] == 0 are UNDEFINED and are handled in precisely the same way as described above for non-E variants.
• Instructions with bit[22] == 1 are the MCRR and MRRC instructions, see MCRR on page A4-64 and MRRC on page A4-72.
A3.16.7 Unconditional instruction extension space

In ARMv5 and above, instructions with the following opcode are the unconditional instruction space:

    opcode[31:28] == 0b1111

    [31:28] 1 1 1 1 | [27:20] opcode1 | [19:8] x | [7:4] opcode2 | [3:0] x

Table A3-7 summarizes the instructions that have already been allocated in this area.

Table A3-7 Unconditional instruction extension space

Instruction            Architecture versions
CPS/SETEND             ARMv6 and above
PLD                    E variants of ARMv5 and above, except ARMv5TExP
RFE                    ARMv6
SRS                    ARMv6
BLX (address form)     ARMv5 and above
MCRR2                  ARMv6 and above
MRRC2                  ARMv6 and above
STC2                   ARMv5 and above
LDC2                   ARMv5 and above
CDP2                   ARMv5 and above
MCR2                   ARMv5 and above
MRC2                   ARMv5 and above

Figure A3-6 on page A3-42 provides details of the unconditional instructions.

Figure A3-6 Unconditional instructions

Each row lists the fields from bit[31] down to bit[0].

Change Processor State                            1 1 1 1 | 0 0 0 1 0 0 0 0 | imod [19:18] | M | 0 | SBZ [15:9] | A | I | F | 0 | mode [4:0]
Set Endianness                                    1 1 1 1 | 0 0 0 1 0 0 0 0 | 0 0 0 1 | SBZ [15:10] | E | SBZ | 0 0 0 0 | SBZ
Cache Preload                                     1 1 1 1 | 0 1 | X | 1 | U | 1 0 1 | Rn | 1 1 1 1 | addr_mode [11:0]
Save Return State                                 1 1 1 1 | 1 0 0 | P | U | 1 | W | 0 | 1 1 0 1 | SBZ | 0 1 0 1 | SBZ [7:5] | mode [4:0]
Return From Exception                             1 1 1 1 | 1 0 0 | P | U | 0 | W | 1 | Rn | SBZ | 1 0 1 0 | SBZ [7:0]
Branch with Link and change to Thumb              1 1 1 1 | 1 0 1 | H | 24-bit offset
Additional coprocessor double register transfer   1 1 1 1 | 1 1 0 0 0 1 0 | L | Rn | Rd | cp_num | opcode | CRm
Additional coprocessor register transfer          1 1 1 1 | 1 1 1 0 | opc1 [23:21] | L | CRn | Rd | cp_num | opc2 [7:5] | 1 | CRm
Undefined instruction                             1 1 1 1 | 1 1 1 1 | x [23:0]

M     mmod
X     In addressing mode 2, X=0 implies an immediate offset/index, and X=1 a register based offset/index.
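The opcode tests that define the extension spaces in A3.16.1 to A3.16.7 can be collected into a single Python sketch. It is an illustrative model only: the names field and extension_spaces are hypothetical, and treating a condition field of 0b1111 as selecting only the unconditional space in ARMv5 and above is stated in the text for the multiply, control and load/store spaces but is an assumption of the sketch for the media and coprocessor spaces.

```python
def field(word, hi, lo):
    """Extract bits[hi:lo] of a 32-bit instruction word."""
    return (word >> lo) & ((1 << (hi - lo + 1)) - 1)


def extension_spaces(word, armv5_or_above=True):
    """Return the names of the A3.16 spaces whose opcode tests match word."""
    word &= 0xFFFFFFFF
    if armv5_or_above and field(word, 31, 28) == 0b1111:
        return ["unconditional"]  # assumption: exclusive of the other spaces
    spaces = []
    if field(word, 27, 25) == 0b011 and field(word, 4, 4) == 1:
        spaces.append("media")
        # The architecturally Undefined encodings lie inside the media space.
        if field(word, 27, 20) == 0b01111111 and field(word, 7, 4) == 0b1111:
            spaces.append("architecturally undefined")
    if field(word, 27, 24) == 0b0000 and field(word, 7, 4) == 0b1001:
        spaces.append("multiply")
    if (field(word, 27, 26) == 0b00 and field(word, 24, 23) == 0b10
            and field(word, 20, 20) == 0
            # "and not": all three excluded conditions must hold together.
            and not (field(word, 25, 25) == 0 and field(word, 7, 7) == 1
                     and field(word, 4, 4) == 1)):
        spaces.append("control and DSP")
    if (field(word, 27, 25) == 0b000 and field(word, 7, 7) == 1
            and field(word, 4, 4) == 1
            and not (field(word, 24, 24) == 0 and field(word, 6, 5) == 0)):
        spaces.append("load/store")
    if field(word, 27, 23) == 0b11000 and field(word, 21, 21) == 0:
        spaces.append("coprocessor")
    return spaces
```

With this model, the encoding of MRS R0, CPSR (0xE10F0000) is reported in the control and DSP space, MUL R0, R1, R2 (0xE0000291) in the multiply space, and SWP R12, R10, [R9] (0xE109C09A) in the load/store space. The "and not" clauses are what keep the multiply and load/store encodings out of the control space.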
Chapter A4 ARM Instructions

This chapter describes the syntax and usage of every ARM® instruction, in the sections:
• Alphabetical list of ARM instructions on page A4-2
• ARM instructions and architecture versions on page A4-286.

A4.1 Alphabetical list of ARM instructions

Every ARM instruction is listed on the following pages. Each instruction description shows:
• the instruction encoding
• the instruction syntax
• the version of the ARM architecture where the instruction is valid
• any exceptions that apply
• an example in pseudo-code of how the instruction operates
• notes on usage and special cases.

A4.1.1 General notes

These notes explain the types of information and abbreviations used on the instruction pages.

Addressing modes

Many instructions refer to one of the addressing modes described in Chapter A5 ARM Addressing Modes. The description of the referenced addressing mode should be considered an intrinsic part of the instruction description. In particular:
• The addressing mode's encoding diagram and assembler syntax provide additional details over and above the instruction's encoding diagram and assembler syntax.
• The addressing mode's Operation pseudo-code calculates values used in the instruction's pseudo-code, and in some cases specifies additional effects of the instruction.
• All usage notes, operand restrictions, and other notes about the addressing mode apply to the instruction.

Syntax abbreviations

The following abbreviations are used in the instruction pages:

immed_n    This is an immediate value, where n is the number of bits. For example, an 8-bit immediate value is represented by: immed_8
offset_n   This is an offset value, where n is the number of bits. For example, an 8-bit offset value is represented by: offset_8
           The same construction is used for signed offsets.
            For example, an 8-bit signed offset is represented by: signed_offset_8

Encoding diagram and assembler syntax

For the conventions used, see Assembler syntax descriptions on page xxii.

Architecture versions

This gives details of architecture versions where the instruction is valid. For further information on architecture versions, see Architecture versions and variants on page xiii.

Exceptions

This gives details of which exceptions can occur during the execution of the instruction. Prefetch Abort is not listed in general, both because it can occur for any instruction and because if an abort occurred during instruction fetch, the instruction bit pattern is not known. (Prefetch Abort is however listed for BKPT, since it can generate a Prefetch Abort exception without these considerations applying.)

Operation

This gives a pseudo-code description of what the instruction does. For details of conventions used in this pseudo-code, see Pseudo-code descriptions of instructions on page xxi.

Information on usage

Usage sections are included where appropriate to supply suggestions and other information about how to use the instruction effectively.

A4.1.2 ADC

Encoding:
    bits[31:28] cond | bits[27:26] 0 0 | bit[25] I | bits[24:21] 0 1 0 1 | bit[20] S | bits[19:16] Rn | bits[15:12] Rd | bits[11:0] shifter_operand

ADC (Add with Carry) adds two values and the Carry flag. The first value comes from a register. The second value can be either an immediate value or a value from a register, and can be shifted before the addition. ADC can optionally update the condition code flags, based on the result.

Syntax

    ADC{<cond>}{S} <Rd>, <Rn>, <shifter_operand>

where:

<cond>      Is the condition under which the instruction is executed. The conditions are defined in The condition field on page A3-3. If <cond> is omitted, the AL (always) condition is used.
S           Causes the S bit (bit[20]) in the instruction to be set to 1 and specifies that the instruction updates the CPSR. If S is omitted, the S bit is set to 0 and the CPSR is not changed by the instruction. Two types of CPSR update can occur when S is specified:
            • If <Rd> is not R15, the N and Z flags are set according to the result of the addition, and the C and V flags are set according to whether the addition generated a carry (unsigned overflow) and a signed overflow, respectively. The rest of the CPSR is unchanged.
            • If <Rd> is R15, the SPSR of the current mode is copied to the CPSR. This form of the instruction is UNPREDICTABLE if executed in User mode or System mode, because these modes do not have an SPSR.

<Rd>        Specifies the destination register.

<Rn>        Specifies the register that contains the first operand.

<shifter_operand>
            Specifies the second operand. The options for this operand are described in Addressing Mode 1 - Data-processing operands on page A5-2, including how each option causes the I bit (bit[25]) and the shifter_operand bits (bits[11:0]) to be set in the instruction.
            If the I bit is 0 and both bit[7] and bit[4] of shifter_operand are 1, the instruction is not ADC. Instead, see Extending the instruction set on page A3-32 to determine which instruction it is.

Architecture version

All.

Exceptions

None.

Operation

    if ConditionPassed(cond) then
        Rd = Rn + shifter_operand + C Flag
        if S == 1 and Rd == R15 then
            if CurrentModeHasSPSR() then
                CPSR = SPSR
            else UNPREDICTABLE
        else if S == 1 then
            N Flag = Rd[31]
            Z Flag = if Rd == 0 then 1 else 0
            C Flag = CarryFrom(Rn + shifter_operand + C Flag)
            V Flag = OverflowFrom(Rn + shifter_operand + C Flag)

Usage

Use ADC to synthesize multi-word addition.
If register pairs R0, R1 and R2, R3 hold 64-bit values (where R0 and R2 hold the least significant words) the following instructions leave the 64-bit sum in R4, R5:

    ADDS    R4,R0,R2
    ADC     R5,R1,R3

If the second instruction is changed from:

    ADC     R5,R1,R3

to:

    ADCS    R5,R1,R3

the resulting values of the flags indicate:

    N    The 64-bit addition produced a negative result.
    C    An unsigned overflow occurred.
    V    A signed overflow occurred.
    Z    The most significant 32 bits are all zero.

The following instruction produces a single-bit Rotate Left with Extend operation (33-bit rotate through the Carry flag) on R0:

    ADCS    R0,R0,R0

See Data-processing operands - Rotate right with extend on page A5-17 for information on how to perform a similar rotation to the right.

A4.1.3 ADD

Encoding:
    bits[31:28] cond | bits[27:26] 0 0 | bit[25] I | bits[24:21] 0 1 0 0 | bit[20] S | bits[19:16] Rn | bits[15:12] Rd | bits[11:0] shifter_operand

ADD adds two values. The first value comes from a register. The second value can be either an immediate value or a value from a register, and can be shifted before the addition. ADD can optionally update the condition code flags, based on the result.

Syntax

    ADD{<cond>}{S} <Rd>, <Rn>, <shifter_operand>

where:

<cond>      Is the condition under which the instruction is executed. The conditions are defined in The condition field on page A3-3. If <cond> is omitted, the AL (always) condition is used.

S           Causes the S bit (bit[20]) in the instruction to be set to 1 and specifies that the instruction updates the CPSR. If S is omitted, the S bit is set to 0 and the CPSR is not changed by the instruction. Two types of CPSR update can occur when S is specified:
            • If <Rd> is not R15, the N and Z flags are set according to the result of the addition, and the C and V flags are set according to whether the addition generated a carry (unsigned overflow) and a signed overflow, respectively. The rest of the CPSR is unchanged.
            • If <Rd> is R15, the SPSR of the current mode is copied to the CPSR.
              This form of the instruction is UNPREDICTABLE if executed in User mode or System mode, because these modes do not have an SPSR.

<Rd>        Specifies the destination register.

<Rn>        Specifies the register that contains the first operand.

<shifter_operand>
            Specifies the second operand. The options for this operand are described in Addressing Mode 1 - Data-processing operands on page A5-2, including how each option causes the I bit (bit[25]) and the shifter_operand bits (bits[11:0]) to be set in the instruction.
            If the I bit is 0 and both bit[7] and bit[4] of shifter_operand are 1, the instruction is not ADD. Instead, see Extending the instruction set on page A3-32 to determine which instruction it is.

Architecture version

All.

Exceptions

None.

Operation

    if ConditionPassed(cond) then
        Rd = Rn + shifter_operand
        if S == 1 and Rd == R15 then
            if CurrentModeHasSPSR() then
                CPSR = SPSR
            else UNPREDICTABLE
        else if S == 1 then
            N Flag = Rd[31]
            Z Flag = if Rd == 0 then 1 else 0
            C Flag = CarryFrom(Rn + shifter_operand)
            V Flag = OverflowFrom(Rn + shifter_operand)

Usage

Use ADD to add two values together.

To increment a register value in Rx use:

    ADD Rx, Rx, #1

You can perform constant multiplication of Rx by 2^n + 1 into Rd with:

    ADD Rd, Rx, Rx, LSL #n

To form a PC-relative address use:

    ADD Rd, PC, #offset

where the offset must be the difference between the required address and the address held in the PC, where the PC is the address of the ADD instruction itself plus 8 bytes.

A4.1.4 AND

Encoding:
    bits[31:28] cond | bits[27:26] 0 0 | bit[25] I | bits[24:21] 0 0 0 0 | bit[20] S | bits[19:16] Rn | bits[15:12] Rd | bits[11:0] shifter_operand

AND performs a bitwise AND of two values. The first value comes from a register. The second value can be either an immediate value or a value from a register, and can be shifted before the AND operation.
AND can optionally update the condition code flags, based on the result.

Syntax

    AND{<cond>}{S} <Rd>, <Rn>, <shifter_operand>

where:

<cond>      Is the condition under which the instruction is executed. The conditions are defined in The condition field on page A3-3. If <cond> is omitted, the AL (always) condition is used.

S           Causes the S bit (bit[20]) in the instruction to be set to 1 and specifies that the instruction updates the CPSR. If S is omitted, the S bit is set to 0 and the CPSR is not changed by the instruction. Two types of CPSR update can occur when S is specified:
            • If <Rd> is not R15, the N and Z flags are set according to the result of the operation, and the C flag is set to the carry output bit generated by the shifter (see Addressing Mode 1 - Data-processing operands on page A5-2). The V flag and the rest of the CPSR are unaffected.
            • If <Rd> is R15, the SPSR of the current mode is copied to the CPSR. This form of the instruction is UNPREDICTABLE if executed in User mode or System mode, because these modes do not have an SPSR.

<Rd>        Specifies the destination register.

<Rn>        Specifies the register that contains the first operand.

<shifter_operand>
            Specifies the second operand. The options for this operand are described in Addressing Mode 1 - Data-processing operands on page A5-2, including how each option causes the I bit (bit[25]) and the shifter_operand bits (bits[11:0]) to be set in the instruction.
            If the I bit is 0 and both bit[7] and bit[4] of shifter_operand are 1, the instruction is not AND. Instead, see Extending the instruction set on page A3-32 to determine which instruction it is.

Architecture version

All.

Exceptions

None.
Operation

    if ConditionPassed(cond) then
        Rd = Rn AND shifter_operand
        if S == 1 and Rd == R15 then
            if CurrentModeHasSPSR() then
                CPSR = SPSR
            else UNPREDICTABLE
        else if S == 1 then
            N Flag = Rd[31]
            Z Flag = if Rd == 0 then 1 else 0
            C Flag = shifter_carry_out
            V Flag = unaffected

Usage

AND is most useful for extracting a field from a register, by ANDing the register with a mask value that has 1s in the field to be extracted, and 0s elsewhere.

A4.1.5 B, BL

Encoding:
    bits[31:28] cond | bits[27:25] 1 0 1 | bit[24] L | bits[23:0] signed_immed_24

B (Branch) and BL (Branch and Link) cause a branch to a target address, and provide both conditional and unconditional changes to program flow. BL also stores a return address in the link register, R14 (also known as LR).

Syntax

    B{L}{<cond>} <target_address>

where:

L           Causes the L bit (bit 24) in the instruction to be set to 1. The resulting instruction stores a return address in the link register (R14). If L is omitted, the L bit is 0 and the instruction simply branches without storing a return address.

<cond>      Is the condition under which the instruction is executed. The conditions are defined in The condition field on page A3-3. If <cond> is omitted, the AL (always) condition is used.

<target_address>
            Specifies the address to branch to. The branch target address is calculated by:
            1. Sign-extending the 24-bit signed (two's complement) immediate to 30 bits.
            2. Shifting the result left two bits to form a 32-bit value.
            3. Adding this to the contents of the PC, which contains the address of the branch instruction plus 8 bytes.
            The instruction can therefore specify a branch of approximately ±32MB (see Usage on page A4-11 for precise range).

Architecture version

All.

Exceptions

None.
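The three-step branch target calculation given for B and BL can be sketched in Python as follows. This is an illustrative model only, not part of the architecture definition: the function name branch_target is hypothetical, and the final wrap to 32 bits is an assumption made for the sketch, since branching past either end of the address space is UNPREDICTABLE.

```python
def branch_target(instr_addr: int, instruction: int) -> int:
    """Model the B/BL target address calculation (hypothetical helper).

    instr_addr  -- address of the branch instruction itself
    instruction -- the 32-bit instruction word
    """
    imm24 = instruction & 0x00FFFFFF      # signed_immed_24, bits[23:0]
    if imm24 & 0x00800000:                # step 1: sign-extend the 24-bit value
        imm24 -= 1 << 24
    byte_offset = imm24 << 2              # step 2: shift left two bits
    pc = instr_addr + 8                   # PC reads as the branch address plus 8
    # Step 3: add to the PC. The mask is an assumption of this sketch;
    # real wrap-around behavior is UNPREDICTABLE.
    return (pc + byte_offset) & 0xFFFFFFFF


# 0xEAFFFFFE is "B ." (offset field of -2 words): it branches to itself.
print(hex(branch_target(0x8000, 0xEAFFFFFE)))
```

An offset field of 0 lands on the second word after the branch, because the PC value used already includes the 8-byte adjustment.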
Operation

    if ConditionPassed(cond) then
        if L == 1 then
            LR = address of the instruction after the branch instruction
        PC = PC + (SignExtend_30(signed_immed_24) << 2)

Usage

Use BL to perform a subroutine call. The return from subroutine is achieved by copying R14 to the PC. Typically, this is done by one of the following methods:
• Executing a BX R14 instruction, on architecture versions that support that instruction.
• Executing a MOV PC,R14 instruction.
• Storing a group of registers and R14 to the stack on subroutine entry, using an instruction of the form:
      STMFD R13!,{<registers>,R14}
  and then restoring the register values and returning with an instruction of the form:
      LDMFD R13!,{<registers>,PC}

To calculate the correct value of signed_immed_24, the assembler (or other toolkit component) must:
1. Form the base address for this branch instruction. This is the address of the instruction, plus 8. In other words, this base address is equal to the PC value used by the instruction.
2. Subtract the base address from the target address to form a byte offset. This offset is always a multiple of four, because all ARM instructions are word-aligned.
3. If the byte offset is outside the range −33554432 to +33554428, use an alternative code-generation strategy or produce an error as appropriate.
4. Otherwise, set the signed_immed_24 field of the instruction to bits[25:2] of the byte offset.

Notes

Memory bounds
            Branching backwards past location zero and forwards over the end of the 32-bit address space is UNPREDICTABLE.

A4.1.6 BIC

Encoding:
    bits[31:28] cond | bits[27:26] 0 0 | bit[25] I | bits[24:21] 1 1 1 0 | bit[20] S | bits[19:16] Rn | bits[15:12] Rd | bits[11:0] shifter_operand

BIC (Bit Clear) performs a bitwise AND of one value with the complement of a second value. The first value comes from a register.
The second value can be either an immediate value or a value from a register, and can be shifted before the BIC operation. BIC can optionally update the condition code flags, based on the result.

Syntax

    BIC{<cond>}{S} <Rd>, <Rn>, <shifter_operand>

where:

<cond>      Is the condition under which the instruction is executed. The conditions are defined in The condition field on page A3-3. If <cond> is omitted, the AL (always) condition is used.

S           Causes the S bit (bit[20]) in the instruction to be set to 1 and specifies that the instruction updates the CPSR. If S is omitted, the S bit is set to 0 and the CPSR is not changed by the instruction. Two types of CPSR update can occur when S is specified:
            • If <Rd> is not R15, the N and Z flags are set according to the result of the operation, and the C flag is set to the carry output bit generated by the shifter (see Addressing Mode 1 - Data-processing operands on page A5-2). The V flag and the rest of the CPSR are unaffected.
            • If <Rd> is R15, the SPSR of the current mode is copied to the CPSR. This form of the instruction is UNPREDICTABLE if executed in User mode or System mode, because these modes do not have an SPSR.

<Rd>        Specifies the destination register.

<Rn>        Specifies the register that contains the first operand.

<shifter_operand>
            Specifies the second operand. The options for this operand are described in Addressing Mode 1 - Data-processing operands on page A5-2, including how each option causes the I bit (bit[25]) and the shifter_operand bits (bits[11:0]) to be set in the instruction.
            If the I bit is 0 and both bit[7] and bit[4] of shifter_operand are 1, the instruction is not BIC. Instead, see Extending the instruction set on page A3-32 to determine which instruction it is.

Architecture version

All.

Exceptions

None.
Operation

    if ConditionPassed(cond) then
        Rd = Rn AND NOT shifter_operand
        if S == 1 and Rd == R15 then
            if CurrentModeHasSPSR() then
                CPSR = SPSR
            else UNPREDICTABLE
        else if S == 1 then
            N Flag = Rd[31]
            Z Flag = if Rd == 0 then 1 else 0
            C Flag = shifter_carry_out
            V Flag = unaffected

Usage

Use BIC to clear selected bits in a register. For each bit, BIC with 1 clears the bit, and BIC with 0 leaves it unchanged.

A4.1.7 BKPT

Encoding:
    bits[31:28] 1 1 1 0 | bits[27:20] 0 0 0 1 0 0 1 0 | bits[19:8] immed | bits[7:4] 0 1 1 1 | bits[3:0] immed

BKPT (Breakpoint) causes a software breakpoint to occur. This breakpoint can be handled by an exception handler installed on the Prefetch Abort vector. In implementations that also include debug hardware, the hardware can optionally override this behavior and handle the breakpoint itself. When this occurs, the Prefetch Abort exception context is presented to the debugger.

Syntax

    BKPT <immed_16>

where:

<immed_16>  Is a 16-bit immediate value. The top 12 bits of <immed_16> are placed in bits[19:8] of the instruction, and the bottom 4 bits are placed in bits[3:0] of the instruction. This value is ignored by the ARM hardware, but can be used by a debugger to store additional information about the breakpoint.

Architecture version

Version 5 and above.

Exceptions

Prefetch Abort.

Operation

    if (not overridden by debug hardware)
        R14_abt   = address of BKPT instruction + 4
        SPSR_abt  = CPSR
        CPSR[4:0] = 0b10111          /* Enter Abort mode */
        CPSR[5]   = 0                /* Execute in ARM state */
                                     /* CPSR[6] is unchanged */
        CPSR[7]   = 1                /* Disable normal interrupts */
        CPSR[8]   = 1                /* Disable imprecise aborts - v6 only */
        CPSR[9]   = CP15_reg1_EEbit
        if high vectors configured then
            PC = 0xFFFF000C
        else
            PC = 0x0000000C

Usage

The exact usage of BKPT depends on the debug system being used.
A debug system can use the BKPT instruction in two ways:
• Monitor debug-mode. Debug hardware (optional prior to ARMv6) does not override the normal behavior of the BKPT instruction, and so the Prefetch Abort vector is entered. The IFSR is updated to indicate a debug event, allowing software to distinguish debug events due to BKPT instruction execution from other system Prefetch Aborts.
  When used in this manner, the BKPT instruction must be avoided within abort handlers, as it corrupts R14_abt and SPSR_abt. For the same reason, it must also be avoided within FIQ handlers, since an FIQ interrupt can occur within an abort handler.
• Halting debug-mode. Debug hardware does override the normal behavior of the BKPT instruction and handles the software breakpoint itself. When finished, it typically either resumes execution at the instruction following the BKPT, or replaces the BKPT in memory with another instruction and resumes execution at that instruction.
  When BKPT is used in this manner, R14_abt and SPSR_abt are not corrupted, and so the above restrictions about its use in abort and FIQ handlers do not apply.

Notes

Condition field
            BKPT is unconditional. If bits[31:28] of the instruction encode a valid condition other than the AL (always) condition, the instruction is UNPREDICTABLE.

Hardware override
            Debug hardware in an implementation is specifically permitted to override the normal behavior of the BKPT instruction. Because of this, software must not use this instruction for purposes other than those documented by the debug system being used (if any). In particular, software cannot rely on the Prefetch Abort exception occurring, unless either there is guaranteed to be no debug hardware in the system or the debug system specifies that it occurs.
            For more information, consult the documentation for the debug system being used.
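As an illustration of how the 16-bit BKPT immediate is split across bits[19:8] and bits[3:0], the following Python sketch builds and unpacks the instruction word. The helper names encode_bkpt and decode_bkpt_immediate and the constant BKPT_BASE are hypothetical, introduced only for this example; the fixed bits are taken from the BKPT encoding (condition 0b1110, bits[27:20] = 0b00010010, bits[7:4] = 0b0111).

```python
# Fixed bits of BKPT: cond = 0b1110, bits[27:20] = 0b00010010, bits[7:4] = 0b0111.
BKPT_BASE = 0xE1200070


def encode_bkpt(immed_16: int) -> int:
    """Build a BKPT instruction word from a 16-bit immediate (hypothetical helper)."""
    if not 0 <= immed_16 <= 0xFFFF:
        raise ValueError("immediate must fit in 16 bits")
    top12 = immed_16 >> 4          # goes to bits[19:8]
    bottom4 = immed_16 & 0xF       # goes to bits[3:0]
    return BKPT_BASE | (top12 << 8) | bottom4


def decode_bkpt_immediate(instruction: int) -> int:
    """Recover the 16-bit immediate, as a debugger might (hypothetical helper)."""
    return (((instruction >> 8) & 0xFFF) << 4) | (instruction & 0xF)


word = encode_bkpt(0xABCD)
print(hex(word), hex(decode_bkpt_immediate(word)))
```

Because the hardware ignores the immediate, the round trip above matters only to the debug system that chooses to store information there.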
A4.1.8 BLX (1)

Encoding:
    bits[31:28] 1 1 1 1 | bits[27:25] 1 0 1 | bit[24] H | bits[23:0] signed_immed_24

BLX (1) (Branch with Link and Exchange) calls a Thumb® subroutine from the ARM instruction set at an address specified in the instruction. This form of BLX is unconditional (always causing a change in program flow) and preserves the address of the instruction following the branch in the link register (R14). Execution of Thumb instructions begins at the target address.

Syntax

    BLX <target_addr>

where:

<target_addr>
            Specifies the address of the Thumb instruction to branch to. The branch target address is calculated by:
            1. Sign-extending the 24-bit signed (two's complement) immediate to 30 bits.
            2. Shifting the result left two bits to form a 32-bit value.
            3. Setting bit[1] of the result of step 2 to the H bit.
            4. Adding the result of step 3 to the contents of the PC, which contains the address of the branch instruction plus 8.
            The instruction can therefore specify a branch of approximately ±32MB (see Usage on page A4-17 for precise range).

Architecture version

Version 5 and above. See The T and J bits on page A2-15 for further details of operation on non-T variants.

Exceptions

None.

Operation

    LR = address of the instruction after the BLX instruction
    CPSR T bit = 1
    PC = PC + (SignExtend(signed_immed_24) << 2) + (H << 1)

Usage

To return from a Thumb subroutine called via BLX to the ARM caller, use the Thumb instruction:

    BX R14

as described in BX on page A7-32, or use this instruction on subroutine entry:

    PUSH {<registers>,R14}

and this instruction to return:

    POP {<registers>,PC}

To calculate the correct value of signed_immed_24, the assembler (or other toolkit component) must:
1. Form the base address for this branch instruction. This is the address of the instruction, plus 8. In other words, this base address is equal to the PC value used by the instruction.
2.
Subtract the base address from the target address to form a byte offset. This offset is always even, because all ARM instructions are word-aligned and all Thumb instructions are halfword-aligned.
3. If the byte offset is outside the range −33554432 to +33554430, use an alternative code-generation strategy or produce an error as appropriate.
4. Otherwise, set the signed_immed_24 field of the instruction to bits[25:2] of the byte offset, and the H bit of the instruction to bit[1] of the byte offset.

Notes

Condition   Unlike most other ARM instructions, this instruction cannot be executed conditionally.

Bit[24]     This bit is used as bit[1] of the target address.

A4.1.9 BLX (2)

Encoding:
    bits[31:28] cond | bits[27:20] 0 0 0 1 0 0 1 0 | bits[19:16] SBO | bits[15:12] SBO | bits[11:8] SBO | bits[7:4] 0 0 1 1 | bits[3:0] Rm

BLX (2) calls an ARM or Thumb subroutine from the ARM instruction set, at an address specified in a register. It sets the CPSR T bit to bit[0] of Rm. This selects the instruction set to be used in the subroutine. The branch target address is the value of register Rm, with its bit[0] forced to zero. It sets R14 to a return address. To return from the subroutine, use a BX R14 instruction, or store R14 on the stack and reload the stored value into the PC.

Syntax

    BLX{<cond>} <Rm>

where:

<cond>      Is the condition under which the instruction is executed. The conditions are defined in The condition field on page A3-3. If <cond> is omitted, the AL (always) condition is used.

<Rm>        Is the register containing the address of the target instruction. Bit[0] of Rm is 0 to select a target ARM instruction, or 1 to select a target Thumb instruction. If R15 is specified for <Rm>, the results are UNPREDICTABLE.

Architecture version

Version 5 and above. See The T and J bits on page A2-15 for further details of operation on non-T variants.

Exceptions

None.
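The way BLX (2) derives both the instruction set and the branch target from a single register value can be sketched in Python as below. The function name blx_register_target is hypothetical and the sketch models only the T bit and PC selection described above, not the link register update or condition check.

```python
def blx_register_target(rm: int) -> tuple[int, int]:
    """Split an Rm value into (T bit, branch target) as BLX (2) does.

    Hypothetical helper for illustration: bit[0] selects the instruction
    set (0 = ARM, 1 = Thumb) and is forced to zero in the target address.
    """
    rm &= 0xFFFFFFFF
    t_bit = rm & 1
    target = rm & 0xFFFFFFFE
    return t_bit, target


# An odd value selects Thumb state at the even address just below it.
print(blx_register_target(0x00008001))
```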
Operation

    if ConditionPassed(cond) then
        target = Rm
        LR = address of instruction after the BLX instruction
        CPSR T bit = target[0]
        PC = target AND 0xFFFFFFFE

Notes

ARM/Thumb state transfers
            If Rm[1:0] == 0b10, the result is UNPREDICTABLE, as branches to non word-aligned addresses are impossible in ARM state.

A4.1.10 BX

Encoding:
    bits[31:28] cond | bits[27:20] 0 0 0 1 0 0 1 0 | bits[19:16] SBO | bits[15:12] SBO | bits[11:8] SBO | bits[7:4] 0 0 0 1 | bits[3:0] Rm

BX (Branch and Exchange) branches to an address, with an optional switch to Thumb state.

Syntax

    BX{<cond>} <Rm>

where:

<cond>      Is the condition under which the instruction is executed. The conditions are defined in The condition field on page A3-3. If <cond> is omitted, the AL (always) condition is used.

<Rm>        Holds the value of the branch target address. Bit[0] of Rm is 0 to select a target ARM instruction, or 1 to select a target Thumb instruction.

Architecture version

Version 5 and above, and T variants of version 4. See The T and J bits on page A2-15 for further details of operation on non-T variants of version 5.

Exceptions

None.

Operation

    if ConditionPassed(cond) then
        CPSR T bit = Rm[0]
        PC = Rm AND 0xFFFFFFFE

Notes

ARM/Thumb state transfers
            If Rm[1:0] == 0b10, the result is UNPREDICTABLE, as branches to non word-aligned addresses are impossible in ARM state.

Use of R15  Register 15 can be specified for <Rm>, but doing so is discouraged. In a BX R15 instruction, R15 is read as normal for ARM code, that is, it is the address of the BX instruction itself plus 8. The result is to branch to the second following word, executing in ARM state. This is precisely the same effect that would have been obtained if a B instruction with an offset field of 0 had been executed, or an ADD PC,PC,#0 or MOV PC,PC instruction.
            In new code, use these instructions in preference to the more complex BX PC instruction.

A4.1.11 BXJ

Encoding:
    bits[31:28] cond | bits[27:20] 0 0 0 1 0 0 1 0 | bits[19:16] SBO | bits[15:12] SBO | bits[11:8] SBO | bits[7:4] 0 0 1 0 | bits[3:0] Rm

BXJ (Branch and change to Jazelle® state) enters Jazelle state if Jazelle is available and enabled. Otherwise BXJ behaves exactly as BX (see BX on page A4-20).

Syntax

    BXJ{<cond>} <Rm>

where:

<cond>      Is the condition under which the instruction is executed. The conditions are defined in The condition field on page A3-3. If <cond> is omitted, the AL (always) condition is used.

<Rm>        Holds the value of the branch target address for use if Jazelle state is not available. Bit[0] of Rm is 0 to select a target ARM instruction, or 1 to select a target Thumb instruction.

Architecture version

Version 6 and above, plus ARMv5TEJ.

Exceptions

None.

Operation

    if ConditionPassed(cond) then
        if (JE bit of Main Configuration register) == 0 then
            T Flag = Rm[0]
            PC = Rm AND 0xFFFFFFFE
        else
            jpc = SUB-ARCHITECTURE DEFINED value
            invalidhandler = SUB-ARCHITECTURE DEFINED value
            if (Jazelle Extension accepts opcode at jpc) then
                if (CV bit of Jazelle OS Control register) == 0 then
                    PC = invalidhandler
                else
                    J Flag = 1
                    Start opcode execution at jpc
            else
                if ((CV bit of Jazelle OS Control register) == 0) AND
                   (IMPLEMENTATION DEFINED CONDITION) then
                    PC = invalidhandler
                else
                    /* Subject to SUB-ARCHITECTURE DEFINED restrictions on Rm: */
                    T Flag = Rm[0]
                    PC = Rm AND 0xFFFFFFFE

Usage

This instruction must only be used if one of the following conditions is true:
• The JE bit of the Main Configuration Register is 0.
• The Enabled Java Virtual Machine in use conforms to all the SUB-ARCHITECTURE DEFINED restrictions of the Jazelle Extension hardware being used.
Notes

ARM/Thumb state transfers
            If (JE bit of Main Configuration register) == 0 AND Rm[1:0] == 0b10, the result is UNPREDICTABLE, as branches to non word-aligned addresses are impossible in ARM state.

Use of R15  If register 15 is specified for <Rm>, the result is UNPREDICTABLE.

Jazelle opcode address
            The Jazelle opcode address is determined in a SUB-ARCHITECTURE DEFINED manner, typically from the contents of a specific general-purpose register, the Jazelle Program Counter (jpc).

A4.1.12 CDP

Encoding:
    bits[31:28] cond | bits[27:24] 1 1 1 0 | bits[23:20] opcode_1 | bits[19:16] CRn | bits[15:12] CRd | bits[11:8] cp_num | bits[7:5] opcode_2 | bit[4] 0 | bits[3:0] CRm

CDP (Coprocessor Data Processing) tells the coprocessor whose number is cp_num to perform an operation that is independent of ARM registers and memory. If no coprocessors indicate that they can execute the instruction, an Undefined Instruction exception is generated.

Syntax

    CDP{<cond>} <coproc>, <opcode_1>, <CRd>, <CRn>, <CRm>, <opcode_2>
    CDP2        <coproc>, <opcode_1>, <CRd>, <CRn>, <CRm>, <opcode_2>

where:

<cond>      Is the condition under which the instruction is executed. The conditions are defined in The condition field on page A3-3. If <cond> is omitted, the AL (always) condition is used.

CDP2        Causes the condition field of the instruction to be set to 0b1111. This provides additional opcode space for coprocessor designers. The resulting instructions can only be executed unconditionally.

<coproc>    Specifies the name of the coprocessor, and causes the corresponding coprocessor number to be placed in the cp_num field of the instruction. The standard generic coprocessor names are p0, p1, ..., p15.

<opcode_1>  Specifies (in a coprocessor-specific manner) which coprocessor operation is to be performed.

<CRd>       Specifies the destination coprocessor register for the instruction.

<CRn>       Specifies the coprocessor register that contains the first operand.

<CRm>       Specifies the coprocessor register that contains the second operand.

<opcode_2>  Specifies (in a coprocessor-specific manner) which coprocessor operation is to be performed.
Architecture version

CDP is in all versions. CDP2 is in version 5 and above.

Exceptions

Undefined Instruction.

Operation

    if ConditionPassed(cond) then
        Coprocessor[cp_num]-dependent operation

Usage

Use CDP to initiate coprocessor instructions that do not operate on values in ARM registers or in main memory. An example is a floating-point multiply instruction for a floating-point coprocessor.

Notes

Coprocessor fields
            Only instruction bits[31:24], bits[11:8], and bit[4] are architecturally defined. The remaining fields are recommendations, for compatibility with ARM Development Systems.

Unimplemented coprocessor instructions
            Hardware coprocessor support is optional for coprocessors 0-13, regardless of the architecture version, and is optional for coprocessors 14 and 15 before ARMv6. An implementation can choose to implement a subset of the coprocessor instructions, or no coprocessor instructions at all. Any coprocessor instructions that are not implemented instead cause an Undefined Instruction exception.

A4.1.13 CLZ

Encoding:
    bits[31:28] cond | bits[27:20] 0 0 0 1 0 1 1 0 | bits[19:16] SBO | bits[15:12] Rd | bits[11:8] SBO | bits[7:4] 0 0 0 1 | bits[3:0] Rm

CLZ (Count Leading Zeros) returns the number of binary zero bits before the first binary one bit in a value. CLZ does not update the condition code flags.

Syntax

    CLZ{<cond>} <Rd>, <Rm>

where:

<cond>      Is the condition under which the instruction is executed. The conditions are defined in The condition field on page A3-3. If <cond> is omitted, the AL (always) condition is used.

<Rd>        Specifies the destination register for the operation. If R15 is specified for <Rd>, the result is UNPREDICTABLE.

<Rm>        Specifies the source register for this operation. If R15 is specified for <Rm>, the result is UNPREDICTABLE.

Architecture version

Version 5 and above.

Exceptions

None.
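For readers who want to check expected CLZ results off-target, the count of leading zeros in a 32-bit value can be modelled in a few lines of Python. This is a sketch, not the hardware definition: the function name clz is chosen here for illustration, and it relies on Python's built-in int.bit_length(), which gives the position of the most significant 1 bit plus one.

```python
def clz(rm: int) -> int:
    """Model the CLZ result for a 32-bit register value (illustrative helper)."""
    rm &= 0xFFFFFFFF               # treat the input as a 32-bit register
    # bit_length() is 0 for zero, so a zero input yields 32.
    return 32 - rm.bit_length()


for value in (0, 1, 0x00010000, 0x80000000):
    print(hex(value), clz(value))
```

Shifting a non-zero value left by its CLZ result moves its most significant 1 bit into bit[31], which is the normalization step the instruction is typically used for.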
Operation

    if Rm == 0
        Rd = 32
    else
        Rd = 31 - (bit position of most significant '1' in Rm)

Usage

Use CLZ followed by a left shift of Rm by the resulting Rd value to normalize the value of register Rm. This shifts Rm so that its most significant 1 bit is in bit[31]. Using MOVS rather than MOV sets the Z flag in the special case that Rm is zero and so does not have a most significant 1 bit:

    CLZ     Rd, Rm
    MOVS    Rm, Rm, LSL Rd

A4.1.14 CMN

Encoding:
    bits[31:28] cond | bits[27:26] 0 0 | bit[25] I | bits[24:21] 1 0 1 1 | bit[20] 1 | bits[19:16] Rn | bits[15:12] SBZ | bits[11:0] shifter_operand

CMN (Compare Negative) compares one value with the two's complement of a second value. The first value comes from a register. The second value can be either an immediate value or a value from a register, and can be shifted before the comparison. CMN updates the condition flags, based on the result of adding the two values.

Syntax

    CMN{<cond>} <Rn>, <shifter_operand>

where:

<cond>      Is the condition under which the instruction is executed. The conditions are defined in The condition field on page A3-3. If <cond> is omitted, the AL (always) condition is used.

<Rn>        Specifies the register that contains the first operand.

<shifter_operand>
            Specifies the second operand. The options for this operand are described in Addressing Mode 1 - Data-processing operands on page A5-2, including how each option causes the I bit (bit[25]) and the shifter_operand bits (bits[11:0]) to be set in the instruction.
            If the I bit is 0 and both bit[7] and bit[4] of shifter_operand are 1, the instruction is not CMN. Instead, see Multiply instruction extension space on page A3-35 to determine which instruction it is.

Architecture version

All.

Exceptions

None.

Operation

    if ConditionPassed(cond) then
        alu_out = Rn + shifter_operand
        N Flag = alu_out[31]
        Z Flag = if alu_out == 0 then 1 else 0
        C Flag = CarryFrom(Rn + shifter_operand)
        V Flag = OverflowFrom(Rn + shifter_operand)
Usage

CMN performs a comparison by adding the value of <shifter_operand> to the value of register <Rn>, and updates the condition code flags (based on the result). This is almost equivalent to subtracting the negative of the second operand from the first operand, and setting the flags on the result.

The difference is that the flag values generated can differ when the second operand is 0 or 0x80000000. For example, this instruction always leaves the C flag = 1:

    CMP Rn, #0

and this instruction always leaves the C flag = 0:

    CMN Rn, #0

A4.1.15 CMP

[31:28] cond | [27:26] 0 0 | [25] I | [24:21] 1 0 1 0 | [20] 1 | [19:16] Rn | [15:12] SBZ | [11:0] shifter_operand

CMP (Compare) compares two values. The first value comes from a register. The second value can be either an immediate value or a value from a register, and can be shifted before the comparison.

CMP updates the condition flags, based on the result of subtracting the second value from the first.

Syntax

CMP{<cond>} <Rn>, <shifter_operand>

where:

<cond>    Is the condition under which the instruction is executed. The conditions are defined in The condition field on page A3-3. If <cond> is omitted, the AL (always) condition is used.

<Rn>      Specifies the register that contains the first operand.

<shifter_operand>
          Specifies the second operand. The options for this operand are described in Addressing Mode 1 - Data-processing operands on page A5-2, including how each option causes the I bit (bit[25]) and the shifter_operand bits (bits[11:0]) to be set in the instruction.

          If the I bit is 0 and both bit[7] and bit[4] of shifter_operand are 1, the instruction is not CMP. Instead, see Multiply instruction extension space on page A3-35 to determine which instruction it is.

Architecture version

All.

Exceptions

None.
Operation

if ConditionPassed(cond) then
    alu_out = Rn - shifter_operand
    N Flag = alu_out[31]
    Z Flag = if alu_out == 0 then 1 else 0
    C Flag = NOT BorrowFrom(Rn - shifter_operand)
    V Flag = OverflowFrom(Rn - shifter_operand)

A4.1.16 CPS

[31:20] 1 1 1 1 0 0 0 1 0 0 0 0 | [19:18] imod | [17] mmod | [16] 0 | [15:9] SBZ | [8] A | [7] I | [6] F | [5] 0 | [4:0] mode

CPS (Change Processor State) changes one or more of the mode, A, I, and F bits of the CPSR, without changing the other CPSR bits.

Syntax

CPS<effect> <iflags> {, #<mode>}
CPS #<mode>

where:

<effect>  Specifies what effect is wanted on the interrupt disable bits A, I, and F in the CPSR. This is one of:

          IE    Interrupt Enable, encoded by imod == 0b10. This sets the specified bits to 0.
          ID    Interrupt Disable, encoded by imod == 0b11. This sets the specified bits to 1.

          If <effect> is specified, the bits to be affected are specified by <iflags>. These are encoded in the A, I, and F bits of the instruction. The mode can optionally be changed by specifying a mode number as <mode>.

          If <effect> is not specified, then:
          • <iflags> is not specified and the A, I, and F mask settings are not changed
          • the A, I, and F bits of the instruction are zero
          • imod = 0b00
          • mmod = 0b1
          • <mode> specifies the new mode number.

<iflags>  Is a sequence of one or more of the following, specifying which interrupt disable flags are affected:

          a     Sets the A bit in the instruction, causing the specified effect on the CPSR A (imprecise data abort) bit.
          i     Sets the I bit in the instruction, causing the specified effect on the CPSR I (IRQ interrupt) bit.
          f     Sets the F bit in the instruction, causing the specified effect on the CPSR F (FIQ interrupt) bit.

<mode>    Specifies the number of the mode to change to. If it is present, then mmod == 1 and the mode number is encoded in the mode field of the instruction. If it is omitted, then mmod == 0 and the mode field of the instruction is zero. See The mode bits on page A2-14 for details.
Architecture version

Version 6 and above.

Exceptions

None.

Operation

if InAPrivilegedMode() then
    if imod[1] == 1 then
        if A == 1 then CPSR[8] = imod[0]
        if I == 1 then CPSR[7] = imod[0]
        if F == 1 then CPSR[6] = imod[0]
    /* else no change to the mask */
    if mmod == 1 then
        CPSR[4:0] = mode

Notes

User mode
    CPS has no effect in User mode.

Meaningless bit combinations
    The following combinations of imod and mmod are meaningless:
    • imod == 0b00, mmod == 0
    • imod == 0b01, mmod == 0
    • imod == 0b01, mmod == 1
    An assembler must not generate them. The effects are UNPREDICTABLE on execution.

Condition
    Unlike most other ARM instructions, CPS cannot be executed conditionally.

Reserved modes
    An attempt to change mode to a reserved value is UNPREDICTABLE.

Examples

    CPSIE   a,#31       ; enable imprecise data aborts, change to System mode
    CPSID   if          ; disable interrupts and fast interrupts
    CPS     #16         ; change to User mode

A4.1.17 CPY

[31:28] cond | [27:20] 0 0 0 1 1 0 1 0 | [19:16] SBZ | [15:12] Rd | [11:4] 0 0 0 0 0 0 0 0 | [3:0] Rm

CPY (Copy) copies a value from one register to another. It is a synonym for MOV, with no flag setting and no shift. See MOV on page A4-68.

Syntax

CPY{<cond>} <Rd>, <Rm>

where:

<cond>    Is the condition under which the instruction is executed. The conditions are defined in The condition field on page A3-3. If <cond> is omitted, the AL (always) condition is used.

<Rd>      Specifies the destination register.

<Rm>      Specifies the source register.

Architecture version

Version 6 and above.

Exceptions

None.

Operation

if ConditionPassed(cond) then
    Rd = Rm
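Returning to CPS (A4.1.16), its Operation pseudocode can be sketched as a function on a CPSR value. This is an illustrative model only: the name `cps` and its argument layout are hypothetical, InAPrivilegedMode() is assumed to mean "the current mode field is not User mode (0b10000)", and the meaningless imod/mmod combinations are not checked.

```python
USER_MODE = 0b10000  # mode field value for User mode


def cps(cpsr, imod, mmod, a=0, i=0, f=0, mode=0):
    """Return the CPSR value after a CPS instruction with the given fields."""
    if (cpsr & 0x1F) == USER_MODE:
        return cpsr                       # CPS has no effect in User mode
    if imod & 0b10:                       # imod[1] == 1: change the mask bits
        new_bit = imod & 1                # imod[0]: 0 for IE, 1 for ID
        for selected, position in ((a, 8), (i, 7), (f, 6)):
            if selected:
                cpsr = (cpsr & ~(1 << position)) | (new_bit << position)
    if mmod:                              # mmod == 1: change the mode field
        cpsr = (cpsr & ~0x1F) | (mode & 0x1F)
    return cpsr
```

For instance, `cps(0x1D3, imod=0b10, mmod=1, a=1, mode=31)` mirrors `CPSIE a,#31` executed in Supervisor mode with A, I, and F all set.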
A4.1.18 EOR

[31:28] cond | [27:26] 0 0 | [25] I | [24:21] 0 0 0 1 | [20] S | [19:16] Rn | [15:12] Rd | [11:0] shifter_operand

EOR (Exclusive OR) performs a bitwise Exclusive-OR of two values. The first value comes from a register. The second value can be either an immediate value or a value from a register, and can be shifted before the exclusive OR operation.

EOR can optionally update the condition code flags, based on the result.

Syntax

EOR{<cond>}{S} <Rd>, <Rn>, <shifter_operand>

where:

<cond>    Is the condition under which the instruction is executed. The conditions are defined in The condition field on page A3-3. If <cond> is omitted, the AL (always) condition is used.

S         Sets the S bit (bit[20]) in the instruction to 1 and specifies that the instruction updates the CPSR. If S is omitted, the S bit is set to 0 and the CPSR is not changed by the instruction. Two types of CPSR update can occur when S is specified:
          • If <Rd> is not R15, the N and Z flags are set according to the result of the operation, and the C flag is set to the carry output bit generated by the shifter (see Addressing Mode 1 - Data-processing operands on page A5-2). The V flag and the rest of the CPSR are unaffected.
          • If <Rd> is R15, the SPSR of the current mode is copied to the CPSR. This form of the instruction is UNPREDICTABLE if executed in User mode or System mode, because these modes do not have an SPSR.

<Rd>      Specifies the destination register.

<Rn>      Specifies the register that contains the first operand.

<shifter_operand>
          Specifies the second operand. The options for this operand are described in Addressing Mode 1 - Data-processing operands on page A5-2, including how each option causes the I bit (bit[25]) and the shifter_operand bits (bits[11:0]) to be set in the instruction.

          If the I bit is 0 and both bit[7] and bit[4] of shifter_operand are 1, the instruction is not EOR. Instead, see Extending the instruction set on page A3-32 to determine which instruction it is.

Architecture version

All.
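The flag update rule listed under the S option above, for the case where the destination is not R15, can be modelled as follows. This is a sketch under stated assumptions: the name `eor` and the dictionary used for the flags are hypothetical, and the second operand and the shifter carry output are assumed to have been computed already.

```python
MASK32 = 0xFFFFFFFF


def eor(rn, shifter_operand, s=False, shifter_carry_out=0, flags=None):
    """Return (result, flags) for EOR with a destination other than R15."""
    flags = dict(flags or {"N": 0, "Z": 0, "C": 0, "V": 0})  # copy, do not mutate
    result = (rn ^ shifter_operand) & MASK32
    if s:
        flags["N"] = result >> 31
        flags["Z"] = int(result == 0)
        flags["C"] = shifter_carry_out
        # V is intentionally not written: the text says it is unaffected.
    return result, flags
```

A bit of the result differs from the corresponding bit of `rn` exactly where the second operand has a 1, so `eor(0b1100, 0b1010)` gives `0b0110`.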
Exceptions

None.

Operation

if ConditionPassed(cond) then
    Rd = Rn EOR shifter_operand
    if S == 1 and Rd == R15 then
        if CurrentModeHasSPSR() then
            CPSR = SPSR
        else UNPREDICTABLE
    else if S == 1 then
        N Flag = Rd[31]
        Z Flag = if Rd == 0 then 1 else 0
        C Flag = shifter_carry_out
        V Flag = unaffected

Usage

Use EOR to invert selected bits in a register. For each bit, EOR with 1 inverts that bit, and EOR with 0 leaves it unchanged.

A4.1.19 LDC

[31:28] cond | [27:25] 1 1 0 | [24] P | [23] U | [22] N | [21] W | [20] 1 | [19:16] Rn | [15:12] CRd | [11:8] cp_num | [7:0] 8_bit_word_offset

LDC (Load Coprocessor) loads memory data from a sequence of consecutive memory addresses to a coprocessor.

If no coprocessors indicate that they can execute the instruction, an Undefined Instruction exception is generated.

Syntax

LDC{<cond>}{L} <coproc>, <CRd>, <addressing_mode>
LDC2{L} <coproc>, <CRd>, <addressing_mode>

where:

<cond>    Is the condition under which the instruction is executed. The conditions are defined in The condition field on page A3-3. If <cond> is omitted, the AL (always) condition is used.

LDC2      Causes the condition field of the instruction to be set to 0b1111. This provides additional opcode space for coprocessor designers. The resulting instructions can only be executed unconditionally.

L         Sets the N bit (bit[22]) in the instruction to 1 and specifies a long load (for example, double-precision instead of single-precision data transfer). If L is omitted, the N bit is 0 and the instruction specifies a short load.

<coproc>  Specifies the name of the coprocessor, and causes the corresponding coprocessor number to be placed in the cp_num field of the instruction. The standard generic coprocessor names are p0, p1, ..., p15.

<CRd>     Specifies the coprocessor destination register.

<addressing_mode>
          Is described in Addressing Mode 5 - Load and Store Coprocessor on page A5-49. It determines the P, U, Rn, W and 8_bit_word_offset bits of the instruction.
          The syntax of all forms of <addressing_mode> includes a base register <Rn>. Some forms also specify that the instruction modifies the base register value (this is known as base register write-back).

Architecture version

LDC is in all versions. LDC2 is in version 5 and above.

Exceptions

Undefined Instruction, Data Abort.

Operation

MemoryAccess(B-bit, E-bit)
if ConditionPassed(cond) then
    address = start_address
    load Memory[address,4] for Coprocessor[cp_num]
    while (NotFinished(Coprocessor[cp_num]))
        address = address + 4
        load Memory[address,4] for Coprocessor[cp_num]
    assert address == end_address

Usage

LDC is useful for loading coprocessor data from memory.

Notes

Coprocessor fields
    Only instruction bits[31:23], bits[21:16], and bits[11:0] are ARM architecture-defined. The remaining fields (bit[22] and bits[15:12]) are recommendations, for compatibility with ARM Development Systems.

    In the case of the Unindexed addressing mode (P==0, U==1, W==0), instruction bits[7:0] are also not defined by the ARM architecture, and can be used to specify additional coprocessor options.

Data Abort
    For details of the effects of the instruction if a Data Abort occurs, see Effects of data-aborted instructions on page A2-21.

Non word-aligned addresses
    For CP15_reg1_Ubit == 0, the load coprocessor register instruction ignores the least significant two bits of the address. If an implementation includes a System Control coprocessor (see Chapter B3 The System Control Coprocessor), and alignment checking is enabled, an address with bits[1:0] != 0b00 causes an alignment exception.

    For CP15_reg1_Ubit == 1, all non-word aligned accesses cause an alignment fault.

Unimplemented coprocessor instructions
    Hardware coprocessor support is optional, regardless of the architecture version. An implementation can choose to implement a subset of the coprocessor instructions, or no coprocessor instructions at all.
    Any coprocessor instructions that are not implemented instead cause an Undefined Instruction exception.

A4.1.20 LDM (1)

[31:28] cond | [27:25] 1 0 0 | [24] P | [23] U | [22] 0 | [21] W | [20] 1 | [19:16] Rn | [15:0] register_list

LDM (1) (Load Multiple) loads a non-empty subset, or possibly all, of the general-purpose registers from sequential memory locations. It is useful for block loads, stack operations and procedure exit sequences.

The general-purpose registers loaded can include the PC. If they do, the word loaded for the PC is treated as an address and a branch occurs to that address. In ARMv5 and above, bit[0] of the loaded value determines whether execution continues after this branch in ARM state or in Thumb state, as though a BX (loaded_value) instruction had been executed (but see also The T and J bits on page A2-15 for operation on non-T variants of ARMv5). In earlier versions of the architecture, bits[1:0] of the loaded value are ignored and execution continues in ARM state, as though the instruction MOV PC,(loaded_value) had been executed.

Syntax

LDM{<cond>}<addressing_mode> <Rn>{!}, <registers>

where:

<cond>    Is the condition under which the instruction is executed. The conditions are defined in The condition field on page A3-3. If <cond> is omitted, the AL (always) condition is used.

<addressing_mode>
          Is described in Addressing Mode 4 - Load and Store Multiple on page A5-41. It determines the P, U, and W bits of the instruction.

<Rn>      Specifies the base register used by <addressing_mode>. Using R15 as the base register <Rn> gives an UNPREDICTABLE result.

!         Sets the W bit, causing the instruction to write a modified value back to its base register Rn as specified in Addressing Mode 4 - Load and Store Multiple on page A5-41. If ! is omitted, the W bit is 0 and the instruction does not change its base register in this way. (However, if the base register is included in <registers>, it changes when a value is loaded into it.)
<registers>
          Is a list of registers, separated by commas and surrounded by { and }. It specifies the set of registers to be loaded by the LDM instruction.

          The registers are loaded in sequence, the lowest-numbered register from the lowest memory address (start_address), through to the highest-numbered register from the highest memory address (end_address). If the PC is specified in the register list (opcode bit[15] is set), the instruction causes a branch to the address (data) loaded into the PC.

          For each of i=0 to 15, bit[i] in the register_list field of the instruction is 1 if Ri is in the list and 0 otherwise. If bits[15:0] are all zero, the result is UNPREDICTABLE.

Architecture version

All.

Exceptions

Data Abort.

Operation

MemoryAccess(B-bit, E-bit)
if ConditionPassed(cond) then
    address = start_address
    for i = 0 to 14
        if register_list[i] == 1 then
            Ri = Memory[address,4]
            address = address + 4
    if register_list[15] == 1 then
        value = Memory[address,4]
        if (architecture version 5 or above) then
            pc = value AND 0xFFFFFFFE
            T Bit = value[0]
        else
            pc = value AND 0xFFFFFFFC
        address = address + 4
    assert end_address == address - 4

Notes

Operand restrictions
    If the base register <Rn> is specified in <registers>, and base register write-back is specified, the final value of <Rn> is UNPREDICTABLE.

Data Abort
    For details of the effects of the instruction if a Data Abort occurs, see Effects of data-aborted instructions on page A2-21.

Non word-aligned addresses
    For CP15_reg1_Ubit == 0, the Load Multiple instructions ignore the least significant two bits of the address. If an implementation includes a System Control coprocessor (see Chapter B3 The System Control Coprocessor), an address with bits[1:0] != 0b00 causes an alignment exception if alignment checking is enabled.

    For CP15_reg1_Ubit == 1, all non-word aligned accesses cause an alignment fault.
ARM/Thumb state transfers (ARM architecture version 5 and above)
    If bits[1:0] of a value loaded for R15 are 0b10, the result is UNPREDICTABLE, as branches to non word-aligned addresses are impossible in ARM state.

Time order
    The time order of the accesses to individual words of memory generated by this instruction is only defined in some circumstances. See Memory access restrictions on page B2-13 for details.

A4.1.21 LDM (2)

[31:28] cond | [27:25] 1 0 0 | [24] P | [23] U | [22] 1 | [21] 0 | [20] 1 | [19:16] Rn | [15] 0 | [14:0] register_list

LDM (2) loads User mode registers when the processor is in a privileged mode. This is useful when performing process swaps, and in instruction emulators. LDM (2) loads a non-empty subset of the User mode general-purpose registers from sequential memory locations.

Syntax

LDM{<cond>}<addressing_mode> <Rn>, <registers_without_pc>^

where:

<cond>    Is the condition under which the instruction is executed. The conditions are defined in The condition field on page A3-3. If <cond> is omitted, the AL (always) condition is used.

<addressing_mode>
          Is described in Addressing Mode 4 - Load and Store Multiple on page A5-41. It determines the P and U bits of the instruction. Only the forms of this addressing mode with W == 0 are available for this form of the LDM instruction.

<Rn>      Specifies the base register used by <addressing_mode>. Using R15 as <Rn> gives an UNPREDICTABLE result.

<registers_without_pc>
          Is a list of registers, separated by commas and surrounded by { and }. This list must not include the PC, and specifies the set of registers to be loaded by the LDM instruction.

          The registers are loaded in sequence, the lowest-numbered register from the lowest memory address (start_address), through to the highest-numbered register from the highest memory address (end_address).

          For each of i=0 to 14, bit[i] in the register_list field of the instruction is 1 if Ri is in the list and 0 otherwise. If bits[15:0] are all zero, the result is UNPREDICTABLE.
^         For an LDM instruction that does not load the PC, this indicates that User mode registers are to be loaded.

Architecture version

All.

Exceptions

Data Abort.

Operation

MemoryAccess(B-bit, E-bit)
if ConditionPassed(cond) then
    address = start_address
    for i = 0 to 14
        if register_list[i] == 1 then
            Ri_usr = Memory[address,4]
            address = address + 4
    assert end_address == address - 4

Notes

Write-back
    Setting bit[21] (the W bit) has UNPREDICTABLE results.

User and System mode
    This form of LDM is UNPREDICTABLE in User mode or System mode.

Base register mode
    The base register is read from the current processor mode registers, not the User mode registers.

Data Abort
    For details of the effects of the instruction if a Data Abort occurs, see Effects of data-aborted instructions on page A2-21.

Non word-aligned addresses
    For CP15_reg1_Ubit == 0, the Load Multiple instructions ignore the least significant two bits of the address. If an implementation includes a System Control coprocessor (see Chapter B3 The System Control Coprocessor), an address with bits[1:0] != 0b00 causes an alignment exception if alignment checking is enabled.

    For CP15_reg1_Ubit == 1, all non-word aligned accesses cause an alignment fault.

Time order
    The time order of the accesses to individual words of memory generated by this instruction is only defined in some circumstances. See Memory access restrictions on page B2-13 for details.

Banked registers
    In ARM architecture versions earlier than ARMv6, this form of LDM must not be followed by an instruction that accesses banked registers. A following NOP is a good way to ensure this.
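The register-list walk in the LDM (2) Operation pseudocode above can be sketched as follows. The function name `ldm_user`, the word-addressed dictionary used for memory, and the list used for the User mode register bank are all hypothetical. The sketch raises an exception for the encodings that the text calls UNPREDICTABLE or that belong to LDM (3); that is a modelling choice, not architectural behaviour.

```python
def ldm_user(register_list, start_address, memory, user_regs):
    """Load User mode registers R0-R14 selected by register_list.

    memory maps word-aligned addresses to 32-bit values; user_regs is a
    15-element list that is updated in place. Returns end_address.
    """
    if register_list & 0x8000:
        raise ValueError("bit[15] set: this encoding is LDM (3), not LDM (2)")
    if (register_list & 0x7FFF) == 0:
        raise ValueError("empty register list is UNPREDICTABLE")
    address = start_address
    for i in range(15):                   # lowest register, lowest address
        if (register_list >> i) & 1:
            user_regs[i] = memory[address]
            address += 4
    return address - 4                    # address of the last word loaded
```

With registers R1, R4, and R13 selected, three consecutive words are consumed and the returned end address is `start_address + 8`.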
A4.1.22 LDM (3)

[31:28] cond | [27:25] 1 0 0 | [24] P | [23] U | [22] 1 | [21] W | [20] 1 | [19:16] Rn | [15] 1 | [14:0] register_list

LDM (3) loads a subset, or possibly all, of the general-purpose registers and the PC from sequential memory locations. Also, the SPSR of the current mode is copied to the CPSR. This is useful for returning from an exception.

The value loaded for the PC is treated as an address and a branch occurs to that address. In ARMv5 and above, and in T variants of version 4, the value copied from the SPSR T bit to the CPSR T bit determines whether execution continues after the branch in ARM state or in Thumb state (but see also The T and J bits on page A2-15 for operation on non-T variants of ARMv5). In earlier architecture versions, it continues after the branch in ARM state (the only possibility in those architecture versions).

Syntax

LDM{<cond>}<addressing_mode> <Rn>{!}, <registers_and_pc>^

where:

<cond>    Is the condition under which the instruction is executed. The conditions are defined in The condition field on page A3-3. If <cond> is omitted, the AL (always) condition is used.

<addressing_mode>
          Is described in Addressing Mode 4 - Load and Store Multiple on page A5-41. It determines the P, U, and W bits of the instruction.

<Rn>      Specifies the base register used by <addressing_mode>. Using R15 as <Rn> gives an UNPREDICTABLE result.

!         Sets the W bit, and the instruction writes a modified value back to its base register Rn (see Addressing Mode 4 - Load and Store Multiple on page A5-41). If ! is omitted, the W bit is 0 and the instruction does not change its base register in this way. (However, if the base register is included in <registers_and_pc>, it changes when a value is loaded into it.)

<registers_and_pc>
          Is a list of registers, separated by commas and surrounded by { and }. This list must include the PC, and specifies the set of registers to be loaded by the LDM instruction.

          The registers are loaded in sequence, the lowest-numbered register from the lowest memory address (start_address), through to the highest-numbered register from the highest memory address (end_address).
          For each of i=0 to 15, bit[i] in the register_list field of the instruction is 1 if Ri is in the list and 0 otherwise.

^         For an LDM instruction that loads the PC, this indicates that the SPSR of the current mode is copied to the CPSR.

Architecture version

All.

Exceptions

Data Abort.

Operation

MemoryAccess(B-bit, E-bit)
if ConditionPassed(cond) then
    address = start_address
    for i = 0 to 14
        if register_list[i] == 1 then
            Ri = Memory[address,4]
            address = address + 4
    if CurrentModeHasSPSR() then
        CPSR = SPSR
    else
        UNPREDICTABLE
    value = Memory[address,4]
    PC = value
    address = address + 4
    assert end_address == address - 4

Notes

User and System mode
    This instruction is UNPREDICTABLE in User or System mode.

Operand restrictions
    If the base register <Rn> is specified in <registers_and_pc>, and base register write-back is specified, the final value of <Rn> is UNPREDICTABLE.

Data Abort
    For details of the effects of the instruction if a Data Abort occurs, see Effects of data-aborted instructions on page A2-21.

Non word-aligned addresses
    For CP15_reg1_Ubit == 0, the Load Multiple instructions ignore the least significant two bits of the address. If an implementation includes a System Control coprocessor (see Chapter B3 The System Control Coprocessor), an address with bits[1:0] != 0b00 causes an alignment exception if alignment checking is enabled.

    For CP15_reg1_Ubit == 1, all non-word aligned accesses cause an alignment fault.

ARM/Thumb state transfers (ARM architecture versions 4T, 5 and above)
    If the SPSR T bit is 0 and bit[1] of the value loaded into the PC is 1, the results are UNPREDICTABLE because it is not possible to branch to an ARM instruction at a non word-aligned address.
    Note that no special precautions against this are needed on normal exception returns, because exception entries always either set the T bit of the SPSR to 1 or bit[1] of the return link value in R14 to 0.

Time order
    The time order of the accesses to individual words of memory generated by this instruction is not defined. See Memory access restrictions on page B2-13 for details.

A4.1.23 LDR

[31:28] cond | [27:26] 0 1 | [25] I | [24] P | [23] U | [22] 0 | [21] W | [20] 1 | [19:16] Rn | [15:12] Rd | [11:0] addr_mode

LDR (Load Register) loads a word from a memory address.

If the PC is specified as register <Rd>, the instruction loads a data word which it treats as an address, then branches to that address. In ARMv5T and above, bit[0] of the loaded value determines whether execution continues after this branch in ARM state or in Thumb state, as though a BX (loaded_value) instruction had been executed. In earlier versions of the architecture, bits[1:0] of the loaded value are ignored and execution continues in ARM state, as though a MOV PC,(loaded_value) instruction had been executed.

Syntax

LDR{<cond>} <Rd>, <addressing_mode>

where:

<cond>    Is the condition under which the instruction is executed. The conditions are defined in The condition field on page A3-3. If <cond> is omitted, the AL (always) condition is used.

<Rd>      Specifies the destination register for the loaded value.

<addressing_mode>
          Is described in Addressing Mode 2 - Load and Store Word or Unsigned Byte on page A5-18. It determines the I, P, U, W, Rn and addr_mode bits of the instruction.

          The syntax of all forms of <addressing_mode> includes a base register <Rn>. Some forms also specify that the instruction modifies the base register value (this is known as base register write-back).

Architecture version

All.

Exceptions

Data Abort.
Operation

MemoryAccess(B-bit, E-bit)
if ConditionPassed(cond) then
    if (CP15_reg1_Ubit == 0) then
        data = Memory[address,4] Rotate_Right (8 * address[1:0])
    else /* CP15_reg1_Ubit == 1 */
        data = Memory[address,4]
    if (Rd is R15) then
        if (ARMv5 or above) then
            PC = data AND 0xFFFFFFFE
            T Bit = data[0]
        else
            PC = data AND 0xFFFFFFFC
    else
        Rd = data

Usage

Using the PC as the base register allows PC-relative addressing, which facilitates position-independent code. Combined with a suitable addressing mode, LDR allows 32-bit memory data to be loaded into a general-purpose register where its value can be manipulated. If the destination register is the PC, this instruction loads a 32-bit address from memory and branches to that address.

To synthesize a Branch with Link, precede the LDR instruction with MOV LR, PC.

Alignment

ARMv5 and below
    If the address is not word-aligned, the loaded value is rotated right by 8 times the value of bits[1:0] of the address. For a little-endian memory system, this rotation causes the addressed byte to occupy the least significant byte of the register. For a big-endian memory system, it causes the addressed byte to occupy bits[31:24] or bits[15:8] of the register, depending on whether bit[0] of the address is 0 or 1 respectively.

    If an implementation includes a System Control coprocessor (see Chapter B3 The System Control Coprocessor), and alignment checking is enabled, an address with bits[1:0] != 0b00 causes an alignment exception.

ARMv6 and above
    From ARMv6, a byte-invariant mixed-endian format is supported, along with an alignment-checking option. The pseudo-code for the ARMv6 case assumes that unaligned mixed-endian support is configured, with the endianness of the transfer defined by the CPSR E-bit.

    For more details on endianness and alignment see Endian support on page A2-30 and Unaligned access support on page A2-38.
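The rotation applied to non word-aligned loads when CP15_reg1_Ubit == 0, as described under Alignment above, can be illustrated for a little-endian memory system. This is a minimal sketch: the names `rotate_right` and `ldr_legacy` are hypothetical, memory is modelled as a Python bytes object, and the word access is assumed to use the address with bits[1:0] cleared.

```python
MASK32 = 0xFFFFFFFF


def rotate_right(value, amount):
    """Rotate a 32-bit value right by amount bits."""
    amount %= 32
    value &= MASK32
    return ((value >> amount) | (value << (32 - amount))) & MASK32


def ldr_legacy(memory, address):
    """Model of a word load with CP15_reg1_Ubit == 0, little-endian memory."""
    base = address & ~3                   # assumed: low two bits ignored for the access
    word = int.from_bytes(memory[base:base + 4], "little")
    return rotate_right(word, 8 * (address & 3))
```

In this model the byte at the requested address always ends up in bits[7:0] of the result, which matches the little-endian statement in the text.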
Notes

Data Abort
    For details of the effects of the instruction if a Data Abort occurs, see Effects of data-aborted instructions on page A2-21.

Operand restrictions
    If <addressing_mode> specifies base register write-back, and the same register is specified for <Rd> and <Rn>, the results are UNPREDICTABLE.

Use of R15
    If R15 is specified for <Rd>, the value of the address of the loaded value must be word aligned. That is, address[1:0] must be 0b00. In addition, for Thumb interworking reasons, R15[1:0] must not be loaded with the value 0b10. If these constraints are not met, the result is UNPREDICTABLE.

ARM/Thumb state transfers (ARM architecture version 5 and above)
    If bits[1:0] of a value loaded for R15 are 0b10, the result is UNPREDICTABLE, as branches to non word-aligned addresses are impossible in ARM state.

A4.1.24 LDRB

[31:28] cond | [27:26] 0 1 | [25] I | [24] P | [23] U | [22] 1 | [21] W | [20] 1 | [19:16] Rn | [15:12] Rd | [11:0] addr_mode

LDRB (Load Register Byte) loads a byte from memory and zero-extends the byte to a 32-bit word.

Syntax

LDR{<cond>}B <Rd>, <addressing_mode>

where:

<cond>    Is the condition under which the instruction is executed. The conditions are defined in The condition field on page A3-3. If <cond> is omitted, the AL (always) condition is used.

<Rd>      Specifies the destination register for the loaded value. If register 15 is specified for <Rd>, the result is UNPREDICTABLE.

<addressing_mode>
          Is described in Addressing Mode 2 - Load and Store Word or Unsigned Byte on page A5-18. It determines the I, P, U, W, Rn and addr_mode bits of the instruction.

          The syntax of all forms of <addressing_mode> includes a base register <Rn>. Some forms also specify that the instruction modifies the base register value (this is known as base register write-back).

Architecture version

All.

Exceptions

Data Abort.

Operation

MemoryAccess(B-bit, E-bit)
if ConditionPassed(cond) then
    Rd = Memory[address,1]
Usage

Combined with a suitable addressing mode, LDRB allows 8-bit memory data to be loaded into a general-purpose register where it can be manipulated.

Using the PC as the base register allows PC-relative addressing, to facilitate position-independent code.

Notes

Operand restrictions
    If <addressing_mode> specifies base register write-back, and the same register is specified for <Rd> and <Rn>, the results are UNPREDICTABLE.

Data Abort
    For details of the effects of the instruction if a Data Abort occurs, see Effects of data-aborted instructions on page A2-21.

A4.1.25 LDRBT

[31:28] cond | [27:26] 0 1 | [25] I | [24] 0 | [23] U | [22] 1 | [21] 1 | [20] 1 | [19:16] Rn | [15:12] Rd | [11:0] addr_mode

LDRBT (Load Register Byte with Translation) loads a byte from memory and zero-extends the byte to a 32-bit word.

If LDRBT is executed when the processor is in a privileged mode, the memory system is signaled to treat the access as if the processor were in User mode.

Syntax

LDR{<cond>}BT <Rd>, <post_indexed_addressing_mode>

where:

<cond>    Is the condition under which the instruction is executed. The conditions are defined in The condition field on page A3-3. If <cond> is omitted, the AL (always) condition is used.

<Rd>      Specifies the destination register for the loaded value. If R15 is specified for <Rd>, the result is UNPREDICTABLE.

<post_indexed_addressing_mode>
          Is described in Addressing Mode 2 - Load and Store Word or Unsigned Byte on page A5-18. It determines the I, U, Rn and addr_mode bits of the instruction. Only post-indexed forms of Addressing Mode 2 are available for this instruction. These forms have P == 0 and W == 0, where P and W are bit[24] and bit[21] respectively. This instruction uses P == 0 and W == 1 instead, but the addressing mode is the same in all other respects.

          The syntax of all forms of <post_indexed_addressing_mode> includes a base register <Rn>. All forms also specify that the instruction modifies the base register value (this is known as base register write-back).

Architecture version

All.
Exceptions

Data Abort.

Operation

if ConditionPassed(cond) then
    Rd = Memory[address,1]
    Rn = address

Usage

LDRBT can be used by a (privileged) exception handler that is emulating a memory access instruction that would normally execute in User mode. The access is restricted as if it had User mode privilege.

Notes

User mode
    If this instruction is executed in User mode, an ordinary User mode access is performed.

Operand restrictions
    If the same register is specified for <Rd> and <Rn>, the results are UNPREDICTABLE.

Data Abort
    For details of the effects of the instruction if a Data Abort occurs, see Effects of data-aborted instructions on page A2-21.

A4.1.26 LDRD

[31:28] cond | [27:25] 0 0 0 | [24] P | [23] U | [22] I | [21] W | [20] 0 | [19:16] Rn | [15:12] Rd | [11:8] addr_mode | [7:4] 1 1 0 1 | [3:0] addr_mode

LDRD (Load Registers Doubleword) loads a pair of ARM registers from two consecutive words of memory. The pair of registers is restricted to being an even-numbered register and the odd-numbered register that immediately follows it (for example, R10 and R11).

A greater variety of addressing modes is available than for a two-register LDM.

Syntax

LDR{<cond>}D <Rd>, <addressing_mode>

where:

<cond>    Is the condition under which the instruction is executed. The conditions are defined in The condition field on page A3-3. If <cond> is omitted, the AL (always) condition is used.

<Rd>      Specifies the even-numbered destination register for the memory word addressed by <addressing_mode>. The immediately following odd-numbered register is the destination register for the next memory word. If <Rd> is R14, which would specify R15 as the second destination register, the instruction is UNPREDICTABLE. If <Rd> specifies an odd-numbered register, the instruction is UNDEFINED.

<addressing_mode>
          Is described in Addressing Mode 3 - Miscellaneous Loads and Stores on page A5-33.
    It determines the P, U, I, W, Rn, and addr_mode bits of the instruction. The syntax of all forms of <addressing_mode> includes a base register <Rn>. Some forms also specify that the instruction modifies the base register value (this is known as base register write-back).
    The address generated by <addressing_mode> is the address of the lower of the two words loaded by the LDRD instruction. The address of the higher word is generated by adding 4 to this address.

Architecture version

    Version 5TE and above, excluding ARMv5TExP.

Exceptions

    Data Abort.

Operation

    MemoryAccess(B-bit, E-bit)
    if ConditionPassed(cond) then
        if (Rd is even-numbered) and (Rd is not R14) and
                (address[1:0] == 0b00) and
                ((CP15_reg1_Ubit == 1) or (address[2] == 0)) then
            Rd = Memory[address,4]
            R(d+1) = Memory[address+4,4]
        else
            UNPREDICTABLE

Notes

Operand restrictions
    If <addressing_mode> performs base register write-back and the base register <Rn> is one of the two destination registers of the instruction, the results are UNPREDICTABLE.
    If <addressing_mode> specifies an index register <Rm>, and <Rm> is one of the two destination registers of the instruction, the results are UNPREDICTABLE.

Data Abort
    For details of the effects of the instruction if a Data Abort occurs, see Effects of data-aborted instructions on page A2-21.

Alignment
    Prior to ARMv6, if the memory address is not 64-bit aligned, the data read from memory is UNPREDICTABLE. Alignment checking (taking a data abort), and support for a big-endian (BE-32) data format are implementation options.
    From ARMv6, a byte-invariant mixed-endian format is supported, along with two alignment checking options: modulo4 and modulo8. The pseudo-code for the ARMv6 case assumes that unaligned mixed-endian support is configured, with the endianness of the transfer defined by the CPSR E-bit.
    For more details on endianness and alignment see Endian support on page A2-30 and Unaligned access support on page A2-38.
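The register and alignment conditions tested by the LDRD Operation pseudo-code can be illustrated with a small Python model. This is only a sketch under stated assumptions: the function name ldrd, the list-of-registers and dictionary-of-words memory representation, and the use of a False return value to stand for the UNPREDICTABLE branch are all hypothetical conveniences, not part of the architecture.

```python
def ldrd(regs, memory, d, address, u_bit):
    """Toy model of the LDRD Operation pseudo-code (hypothetical helper).

    regs    : list of 16 register values
    memory  : dict mapping word addresses to 32-bit values
    d       : number of the first destination register, Rd
    u_bit   : value of the CP15 register 1 U bit
    Returns False where the pseudo-code takes the UNPREDICTABLE branch.
    """
    valid = (
        d % 2 == 0                                      # Rd is even-numbered
        and d != 14                                     # Rd is not R14
        and (address & 0b11) == 0                       # address[1:0] == 0b00
        and (u_bit == 1 or ((address >> 2) & 1) == 0)   # U bit set, or address[2] == 0
    )
    if not valid:
        return False
    regs[d] = memory[address]          # lower word goes to Rd
    regs[d + 1] = memory[address + 4]  # next word goes to R(d+1)
    return True
```

For example, with the U bit clear a load from an address that is word-aligned but not doubleword-aligned falls into the UNPREDICTABLE branch of the model, while the same load with the U bit set proceeds.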
Time order
    The time order of the accesses to the two memory words is not architecturally defined. In particular, an implementation is allowed to perform the two 32-bit memory accesses in either order, or to combine them into a single 64-bit memory access.

A4.1.27 LDREX

    31:28   27 26 25 24 23 22 21 20   19:16   15:12   11:8   7 6 5 4   3:0
    cond     0  0  0  1  1  0  0  1   Rn      Rd      SBO    1 0 0 1   SBO

LDREX (Load Register Exclusive) loads a register from memory, and:
•   if the address has the Shared memory attribute, marks the physical address as exclusive access for the executing processor in a shared monitor
•   causes the executing processor to indicate an active exclusive access in the local monitor.

Syntax

    LDREX{<cond>} <Rd>, [<Rn>]

where:

<cond>
    Is the condition under which the instruction is executed. The conditions are defined in The condition field on page A3-3. If <cond> is omitted, the AL (always) condition is used.

<Rd>
    Specifies the destination register for the memory word addressed by <Rn>.

<Rn>
    Specifies the register containing the address.

Architecture version

    Version 6 and above.

Exceptions

    Data Abort.

Operation

    MemoryAccess(B-bit, E-bit)
    if ConditionPassed(cond) then
        processor_id = ExecutingProcessor()
        Rd = Memory[Rn,4]
        physical_address = TLB(Rn)
        if Shared(Rn) == 1 then
            MarkExclusiveGlobal(physical_address,processor_id,4)
        MarkExclusiveLocal(physical_address,processor_id,4)
        /* See Summary of operation on page A2-49 */

Usage

Use LDREX in combination with STREX to implement inter-process communication in shared memory multiprocessor systems. For more information see Synchronization primitives on page A2-44. The mechanism can also be used locally to ensure that an atomic load-store sequence occurs with no intervening context switch.
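The way LDREX pairs with STREX can be illustrated with a deliberately simplified Python model of an exclusive monitor. This is a sketch, not the architectural definition: the class ExclusiveMonitor, the function atomic_increment, and the single-mark-per-processor bookkeeping are hypothetical, and the model ignores the distinction between local and shared monitors, address translation, and granule size. It assumes only that a store-exclusive reports 0 on success and 1 on failure, as described for STREX.

```python
class ExclusiveMonitor:
    """Toy exclusive monitor: at most one marked word address per processor."""

    def __init__(self):
        self.memory = {}   # word address -> 32-bit value
        self.tagged = {}   # processor id -> address marked by its last LDREX

    def ldrex(self, cpu, addr):
        # Load the word and mark the address as exclusive for this processor.
        self.tagged[cpu] = addr
        return self.memory.get(addr, 0)

    def strex(self, cpu, addr, value):
        # Succeed only if this processor still holds the mark for addr.
        if self.tagged.get(cpu) != addr:
            return 1                       # failed: memory is not updated
        self.memory[addr] = value & 0xFFFFFFFF
        # In this model a successful store clears every mark on the address.
        for other in [c for c, a in self.tagged.items() if a == addr]:
            del self.tagged[other]
        return 0                           # succeeded


def atomic_increment(mon, cpu, addr):
    # Retry loop: reload and try again whenever the exclusive store fails.
    while True:
        old = mon.ldrex(cpu, addr)
        if mon.strex(cpu, addr, old + 1) == 0:
            return old + 1
```

The retry loop is the important part of the pattern: software never assumes the store-exclusive succeeded, and repeats the load-modify-store sequence until it does.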
Notes

Use of R15
    If register 15 is specified for <Rd> or <Rn>, the result is UNPREDICTABLE.

Data Abort
    If a data abort occurs during a LDREX it is UNPREDICTABLE whether the MarkExclusiveGlobal() and MarkExclusiveLocal() operations are executed. Rd is not updated.

Alignment
    If CP15 register 1(A,U) != (0,0) and Rn<1:0> != 0b00, an alignment exception will be taken.
    There is no support for unaligned Load Exclusive. If Rn<1:0> != 0b00 and (A,U) = (0,0), the result is UNPREDICTABLE.

Memory support for exclusives
    The behavior of LDREX in regions of shared memory that do not support exclusives (for example, have no exclusives monitor implemented) is UNPREDICTABLE.

A4.1.28 LDRH

    31:28   27 26 25 24 23 22 21 20   19:16   15:12   11:8        7 6 5 4   3:0
    cond     0  0  0  P  U  I  W  1   Rn      Rd      addr_mode   1 0 1 1   addr_mode

LDRH (Load Register Halfword) loads a halfword from memory and zero-extends it to a 32-bit word.

Syntax

    LDR{<cond>}H <Rd>, <addressing_mode>

where:

<cond>
    Is the condition under which the instruction is executed. The conditions are defined in The condition field on page A3-3. If <cond> is omitted, the AL (always) condition is used.

<Rd>
    Specifies the destination register for the loaded value. If R15 is specified for <Rd>, the result is UNPREDICTABLE.

<addressing_mode>
    Is described in Addressing Mode 3 - Miscellaneous Loads and Stores on page A5-33. It determines the P, U, I, W, Rn and addr_mode bits of the instruction.
    The syntax of all forms of <addressing_mode> includes a base register <Rn>. Some forms also specify that the instruction modifies the base register value (this is known as base register write-back).

Architecture version

    All.

Exceptions

    Data Abort.
Operation

    MemoryAccess(B-bit, E-bit)
    if ConditionPassed(cond) then
        if (CP15_reg1_Ubit == 0) then
            if address[0] == 0 then
                data = Memory[address,2]
            else
                data = UNPREDICTABLE
        else /* CP15_reg1_Ubit == 1 */
            data = Memory[address,2]
        Rd = ZeroExtend(data[15:0])

Usage

Used with a suitable addressing mode, LDRH allows 16-bit memory data to be loaded into a general-purpose register where its value can be manipulated. Using the PC as the base register allows PC-relative addressing to facilitate position-independent code.

Notes

Operand restrictions
    If <addressing_mode> specifies base register write-back, and the same register is specified for <Rd> and <Rn>, the results are UNPREDICTABLE.

Data Abort
    For details of the effects of the instruction if a Data Abort occurs, see Effects of data-aborted instructions on page A2-21.

Alignment
    Prior to ARMv6, if the memory address is not halfword aligned, the data read from memory is UNPREDICTABLE. Alignment checking (taking a data abort when address[0] != 0), and support for a big-endian (BE-32) data format are implementation options.
    From ARMv6, a byte-invariant mixed-endian format is supported, along with an alignment checking option. The pseudo-code for the ARMv6 case assumes that mixed-endian support is configured, with the endianness of the transfer defined by the CPSR E-bit.
    For more details on endianness and alignment, see Endian support on page A2-30 and Unaligned access support on page A2-38.

A4.1.29 LDRSB

    31:28   27 26 25 24 23 22 21 20   19:16   15:12   11:8        7 6 5 4   3:0
    cond     0  0  0  P  U  I  W  1   Rn      Rd      addr_mode   1 1 0 1   addr_mode

LDRSB (Load Register Signed Byte) loads a byte from memory and sign-extends the byte to a 32-bit word.

Syntax

    LDR{<cond>}SB <Rd>, <addressing_mode>

where:

<cond>
    Is the condition under which the instruction is executed. The conditions are defined in The condition field on page A3-3. If <cond> is omitted, the AL (always) condition is used.
<Rd>
    Specifies the destination register for the loaded value. If R15 is specified for <Rd>, the result is UNPREDICTABLE.

<addressing_mode>
    Is described in Addressing Mode 3 - Miscellaneous Loads and Stores on page A5-33. It determines the P, U, I, W, Rn and addr_mode bits of the instruction.
    The syntax of all forms of <addressing_mode> includes a base register <Rn>. Some forms also specify that the instruction modifies the base register value (this is known as base register write-back).

Architecture version

    Version 4 and above.

Exceptions

    Data Abort.

Operation

    MemoryAccess(B-bit, E-bit)
    if ConditionPassed(cond) then
        data = Memory[address,1]
        Rd = SignExtend(data)

Usage

Use LDRSB with a suitable addressing mode to load 8-bit signed memory data into a general-purpose register where it can be manipulated.

You can perform PC-relative addressing by using the PC as the base register. This facilitates position-independent code.

Notes

Operand restrictions
    If <addressing_mode> specifies base register write-back, and the same register is specified for <Rd> and <Rn>, the results are UNPREDICTABLE.

Data Abort
    For details of the effects of the instruction if a Data Abort occurs, see Effects of data-aborted instructions on page A2-21.

A4.1.30 LDRSH

    31:28   27 26 25 24 23 22 21 20   19:16   15:12   11:8        7 6 5 4   3:0
    cond     0  0  0  P  U  I  W  1   Rn      Rd      addr_mode   1 1 1 1   addr_mode

LDRSH (Load Register Signed Halfword) loads a halfword from memory and sign-extends the halfword to a 32-bit word.

If the address is not halfword-aligned, the result is UNPREDICTABLE.

Syntax

    LDR{<cond>}SH <Rd>, <addressing_mode>

where:

<cond>
    Is the condition under which the instruction is executed. The conditions are defined in The condition field on page A3-3. If <cond> is omitted, the AL (always) condition is used.

<Rd>
    Specifies the destination register for the loaded value.
    If R15 is specified for <Rd>, the result is UNPREDICTABLE.

<addressing_mode>
    Is described in Addressing Mode 3 - Miscellaneous Loads and Stores on page A5-33. It determines the P, U, I, W, Rn and addr_mode bits of the instruction.
    The syntax of all forms of <addressing_mode> includes a base register <Rn>. Some forms also specify that the instruction modifies the base register value (this is known as base register write-back).

Architecture version

    Version 4 and above.

Exceptions

    Data Abort.

Operation

    MemoryAccess(B-bit, E-bit)
    if ConditionPassed(cond) then
        if (CP15_reg1_Ubit == 0) then
            if address[0] == 0 then
                data = Memory[address,2]
            else
                data = UNPREDICTABLE
        else /* CP15_reg1_Ubit == 1 */
            data = Memory[address,2]
        Rd = SignExtend(data[15:0])

Usage

Used with a suitable addressing mode, LDRSH allows 16-bit signed memory data to be loaded into a general-purpose register where its value can be manipulated. Using the PC as the base register allows PC-relative addressing, which facilitates position-independent code.

Notes

Operand restrictions
    If <addressing_mode> specifies base register write-back, and the same register is specified for <Rd> and <Rn>, the results are UNPREDICTABLE.

Data Abort
    For details of the effects of the instruction if a Data Abort occurs, see Effects of data-aborted instructions on page A2-21.

Alignment
    Prior to ARMv6, if the memory address is not halfword aligned, the data read from memory is UNPREDICTABLE. Alignment checking (taking a data abort when address[0] != 0), and support for a big-endian (BE-32) data format are implementation options.
    From ARMv6, a byte-invariant mixed-endian format is supported, along with an alignment checking option. The pseudo-code for the ARMv6 case assumes that mixed-endian support is configured, with the endianness of the transfer defined by the CPSR E-bit.
    For more details on endianness and alignment, see Endian support on page A2-30 and Unaligned access support on page A2-38.

A4.1.31 LDRT

    31:28   27 26 25 24 23 22 21 20   19:16   15:12   11:0
    cond     0  1  I  0  U  0  1  1   Rn      Rd      addr_mode

LDRT (Load Register with Translation) loads a word from memory. If LDRT is executed when the processor is in a privileged mode, the memory system is signaled to treat the access as if the processor were in User mode.

Syntax

    LDR{<cond>}T <Rd>, <post_indexed_addressing_mode>

where:

<cond>
    Is the condition under which the instruction is executed. The conditions are defined in The condition field on page A3-3. If <cond> is omitted, the AL (always) condition is used.

<Rd>
    Specifies the destination register for the loaded value. If R15 is specified for <Rd>, the result is UNPREDICTABLE.

<post_indexed_addressing_mode>
    Is described in Addressing Mode 2 - Load and Store Word or Unsigned Byte on page A5-18. It determines the I, U, Rn and addr_mode bits of the instruction. Only post-indexed forms of Addressing Mode 2 are available for this instruction. These forms have P == 0 and W == 0, where P and W are bit[24] and bit[21] respectively. This instruction uses P == 0 and W == 1 instead, but the addressing mode is the same in all other respects.
    The syntax of all forms of <post_indexed_addressing_mode> includes a base register <Rn>. All forms also specify that the instruction modifies the base register value (this is known as base register write-back).

Architecture version

    All.

Exceptions

    Data Abort.

Operation

    MemoryAccess(B-bit, E-bit)
    if ConditionPassed(cond) then
        if (CP15_reg1_Ubit == 0) then
            Rd = Memory[address,4] Rotate_Right (8 * address[1:0])
        else /* CP15_reg1_Ubit == 1 */
            Rd = Memory[address,4]

Usage

LDRT can be used by a (privileged) exception handler that is emulating a memory access instruction that would normally execute in User mode.
The access is restricted as if it had User mode privilege.

Notes

User mode
    If this instruction is executed in User mode, an ordinary User mode access is performed.

Operand restrictions
    If the same register is specified for <Rd> and <Rn>, the results are UNPREDICTABLE.

Data Abort
    For details of the effects of the instruction if a Data Abort occurs, see Effects of data-aborted instructions on page A2-21.

Alignment
    As for LDR, see LDR on page A4-43.

A4.1.32 MCR

    31:28   27 26 25 24   23:21      20   19:16   15:12   11:8     7:5        4   3:0
    cond     1  1  1  0   opcode_1    0   CRn     Rd      cp_num   opcode_2   1   CRm

MCR (Move to Coprocessor from ARM Register) passes the value of register <Rd> to the coprocessor whose number is cp_num.

If no coprocessors indicate that they can execute the instruction, an Undefined Instruction exception is generated.

Syntax

    MCR{<cond>} <coproc>, <opcode_1>, <Rd>, <CRn>, <CRm>{, <opcode_2>}
    MCR2        <coproc>, <opcode_1>, <Rd>, <CRn>, <CRm>{, <opcode_2>}

where:

<cond>
    Is the condition under which the instruction is executed. The conditions are defined in The condition field on page A3-3. If <cond> is omitted, the AL (always) condition is used.

MCR2
    Causes the condition field of the instruction to be set to 0b1111. This provides additional opcode space for coprocessor designers. The resulting instructions can only be executed unconditionally.

<coproc>
    Specifies the name of the coprocessor, and causes the corresponding coprocessor number to be placed in the cp_num field of the instruction. The standard generic coprocessor names are p0, p1, ..., p15.

<opcode_1>
    Is a coprocessor-specific opcode.

<Rd>
    Is the ARM register whose value is transferred to the coprocessor. If R15 is specified for <Rd>, the result is UNPREDICTABLE.

<CRn>
    Is the destination coprocessor register.

<CRm>
    Is an additional destination or source coprocessor register.

<opcode_2>
    Is a coprocessor-specific opcode. If it is omitted, <opcode_2> is assumed to be 0.

Architecture version

    MCR is in all versions.
    MCR2 is in version 5 and above.

Exceptions

    Undefined Instruction.
Operation

    if ConditionPassed(cond) then
        send Rd value to Coprocessor[cp_num]

Usage

Use MCR to initiate a coprocessor operation that acts on a value from an ARM register. An example is a fixed-point to floating-point conversion instruction for a floating-point coprocessor.

Notes

Coprocessor fields
    Only instruction bits[31:24], bit[20], bits[15:8], and bit[4] are defined by the ARM architecture. The remaining fields are recommendations, for compatibility with ARM Development Systems.

Unimplemented coprocessor instructions
    Hardware coprocessor support is optional for coprocessors 0-13, regardless of the architecture version, and is optional for coprocessors 14 and 15 before ARMv6. An implementation can choose to implement a subset of the coprocessor instructions, or no coprocessor instructions at all. Any coprocessor instructions that are not implemented instead cause an Undefined Instruction exception.

A4.1.33 MCRR

    31:28   27 26 25 24 23 22 21 20   19:16   15:12   11:8     7:4      3:0
    cond     1  1  0  0  0  1  0  0   Rn      Rd      cp_num   opcode   CRm

MCRR (Move to Coprocessor from two ARM Registers) passes the values of two ARM registers to a coprocessor.

If no coprocessors indicate that they can execute the instruction, an Undefined Instruction exception is generated.

Syntax

    MCRR{<cond>} <coproc>, <opcode>, <Rd>, <Rn>, <CRm>
    MCRR2        <coproc>, <opcode>, <Rd>, <Rn>, <CRm>

where:

<cond>
    Is the condition under which the instruction is executed. The conditions are defined in The condition field on page A3-3. If <cond> is omitted, the AL (always) condition is used.

MCRR2
    Causes the condition field of the instruction to be set to 0b1111. This provides additional opcode space for coprocessor designers. The resulting instructions can only be executed unconditionally.
<coproc>
    Specifies the name of the coprocessor, and causes the corresponding coprocessor number to be placed in the cp_num field of the instruction. The standard generic coprocessor names are p0, p1, …, p15.

<opcode>
    Is a coprocessor-specific opcode.

<Rd>
    Is the first ARM register whose value is transferred to the coprocessor. If R15 is specified for <Rd>, the result is UNPREDICTABLE.

<Rn>
    Is the second ARM register whose value is transferred to the coprocessor. If R15 is specified for <Rn>, or Rn = Rd, the result is UNPREDICTABLE.

<CRm>
    Is the destination coprocessor register.

Architecture version

    MCRR is in version 5TE and above, excluding ARMv5TExP.
    MCRR2 is in version 6 and above.

Exceptions

    Undefined Instruction.

Operation

    if ConditionPassed(cond) then
        send Rd value to Coprocessor[cp_num]
        send Rn value to Coprocessor[cp_num]

Usage

Use MCRR to initiate a coprocessor operation that acts on values from two ARM registers. An example for a floating-point coprocessor is an instruction to transfer a double-precision floating-point number held in two ARM registers to a floating-point register.

Notes

Coprocessor fields
    Only instruction bits[31:8] are defined by the ARM architecture. The remaining fields are recommendations, for compatibility with ARM Development Systems.

Unimplemented coprocessor instructions
    Hardware coprocessor support is optional for coprocessors 0-13, regardless of the architecture version, and is optional for coprocessors 14 and 15 before ARMv6. An implementation can choose to implement a subset of the coprocessor instructions, or no coprocessor instructions at all. Any coprocessor instructions that are not implemented instead cause an Undefined Instruction exception.

Order of transfers
    If a coprocessor uses these instructions, it defines how each of the values of <Rd> and <Rn> is used.
    There is no architectural requirement for the two register transfers to occur in any particular time order. It is IMPLEMENTATION DEFINED whether Rd is transferred before Rn, after Rn, or at the same time as Rn.

A4.1.34 MLA

    31:28   27 26 25 24 23 22 21   20   19:16   15:12   11:8   7 6 5 4   3:0
    cond     0  0  0  0  0  0  1   S    Rd      Rn      Rs     1 0 0 1   Rm

MLA (Multiply Accumulate) multiplies two signed or unsigned 32-bit values, and adds a third 32-bit value. The least significant 32 bits of the result are written to the destination register.

MLA can optionally update the condition code flags, based on the result.

Syntax

    MLA{<cond>}{S} <Rd>, <Rm>, <Rs>, <Rn>

where:

<cond>
    Is the condition under which the instruction is executed. The conditions are defined in The condition field on page A3-3. If <cond> is omitted, the AL (always) condition is used.

S
    Causes the S bit (bit[20]) in the instruction to be set to 1 and specifies that the instruction updates the CPSR by setting the N and Z flags according to the result of the multiply-accumulate. If S is omitted, the S bit of the instruction is set to 0 and the entire CPSR is unaffected by the instruction.

<Rd>
    Specifies the destination register.

<Rm>
    Holds the value to be multiplied with the value of <Rs>.

<Rs>
    Holds the value to be multiplied with the value of <Rm>.

<Rn>
    Contains the value that is added to the product of <Rs> and <Rm>.

Architecture version

    All.

Exceptions

    None.

Operation

    if ConditionPassed(cond) then
        Rd = (Rm * Rs + Rn)[31:0]
        if S == 1 then
            N Flag = Rd[31]
            Z Flag = if Rd == 0 then 1 else 0
            C Flag = unaffected in v5 and above, UNPREDICTABLE in v4 and earlier
            V Flag = unaffected

Notes

Use of R15
    Specifying R15 for register <Rd>, <Rm>, <Rs>, or <Rn> has UNPREDICTABLE results.

Early termination
    If the multiplier implementation supports early termination, it must be implemented on the value of the <Rs> operand.
    The type of early termination used (signed or unsigned) is IMPLEMENTATION DEFINED.

Signed and unsigned
    The MLA instruction produces only the lower 32 bits of the 64-bit product. Therefore, MLA gives the same answer for multiplication of both signed and unsigned numbers.

C flag
    The MLAS instruction is defined to leave the C flag unchanged in ARMv5 and above. In earlier versions of the architecture, the value of the C flag was UNPREDICTABLE after an MLAS instruction.

Operand restriction
    Specifying the same register for <Rd> and <Rm> was previously described as producing UNPREDICTABLE results. There is no restriction in ARMv6, and it is believed that all relevant ARMv4 and ARMv5 implementations do not require this restriction either, because high performance multipliers read all their operands prior to writing back any results.

A4.1.35 MOV

    31:28   27 26   25   24 23 22 21   20   19:16   15:12   11:0
    cond     0  0   I     1  1  0  1   S    SBZ     Rd      shifter_operand

MOV (Move) writes a value to the destination register. The value can be either an immediate value or a value from a register, and can be shifted before the write.

MOV can optionally update the condition code flags, based on the result.

Syntax

    MOV{<cond>}{S} <Rd>, <shifter_operand>

where:

<cond>
    Is the condition under which the instruction is executed. The conditions are defined in The condition field on page A3-3. If <cond> is omitted, the AL (always) condition is used.

S
    Sets the S bit (bit[20]) in the instruction to 1 and specifies that the instruction updates the CPSR. If S is omitted, the S bit is set to 0 and the CPSR is not changed by the instruction. Two types of CPSR update can occur when S is specified:
    •   If <Rd> is not R15, the N and Z flags are set according to the value moved (post-shift if a shift is specified), and the C flag is set to the carry output bit generated by the shifter (see Addressing Mode 1 - Data-processing operands on page A5-2).
        The V flag and the rest of the CPSR are unaffected.
    •   If <Rd> is R15, the SPSR of the current mode is copied to the CPSR. This form of the instruction is UNPREDICTABLE if executed in User mode or System mode, because these modes do not have an SPSR.

<Rd>
    Specifies the destination register.

<shifter_operand>
    Specifies the operand. The options for this operand are described in Addressing Mode 1 - Data-processing operands on page A5-2, including how each option causes the I bit (bit[25]) and the shifter_operand bits (bits[11:0]) to be set in the instruction.
    If the I bit is 0 and both bit[7] and bit[4] of shifter_operand are 1, the instruction is not MOV. Instead, see Extending the instruction set on page A3-32 to determine which instruction it is.

Architecture version

    All.

Exceptions

    None.

Operation

    if ConditionPassed(cond) then
        Rd = shifter_operand
        if S == 1 and Rd == R15 then
            if CurrentModeHasSPSR() then
                CPSR = SPSR
            else UNPREDICTABLE
        else if S == 1 then
            N Flag = Rd[31]
            Z Flag = if Rd == 0 then 1 else 0
            C Flag = shifter_carry_out
            V Flag = unaffected

Usage

Use MOV to:
•   Move a value from one register to another.
•   Put a constant value into a register.
•   Perform a shift without any other arithmetic or logical operation. Use a left shift by n to multiply by 2^n.
•   When the PC is the destination of the instruction, a branch occurs. The instruction:
        MOV PC, LR
    can therefore be used to return from a subroutine (see instructions B, BL on page A4-10). In T variants of architecture 4 and in architecture 5 and above, the instruction BX LR must be used in place of MOV PC, LR, as the BX instruction automatically switches back to Thumb state if appropriate (but see also The T and J bits on page A2-15 for operation on non-T variants of ARM architecture version 5).
•   When the PC is the destination of the instruction and the S bit is set, a branch occurs and the SPSR of the current mode is copied to the CPSR. This means that you can use a MOVS PC, LR instruction to return from some types of exception (see Exceptions on page A2-16).

A4.1.36 MRC

    31:28   27 26 25 24   23:21      20   19:16   15:12   11:8     7:5        4   3:0
    cond     1  1  1  0   opcode_1    1   CRn     Rd      cp_num   opcode_2   1   CRm

MRC (Move to ARM Register from Coprocessor) causes a coprocessor to transfer a value to an ARM register or to the condition flags.

If no coprocessors can execute the instruction, an Undefined Instruction exception is generated.

Syntax

    MRC{<cond>} <coproc>, <opcode_1>, <Rd>, <CRn>, <CRm>{, <opcode_2>}
    MRC2        <coproc>, <opcode_1>, <Rd>, <CRn>, <CRm>{, <opcode_2>}

where:

<cond>
    Is the condition under which the instruction is executed. The conditions are defined in The condition field on page A3-3. If <cond> is omitted, the AL (always) condition is used.

MRC2
    Causes the condition field of the instruction to be set to 0b1111. This provides additional opcode space for coprocessor designers. The resulting instructions can only be executed unconditionally.

<coproc>
    Specifies the name of the coprocessor, and causes the corresponding coprocessor number to be placed in the cp_num field of the instruction. The standard generic coprocessor names are p0, p1, ..., p15.

<opcode_1>
    Is a coprocessor-specific opcode.

<Rd>
    Specifies the destination ARM register for the instruction. If R15 is specified for <Rd>, the condition code flags are updated instead of a general-purpose register.

<CRn>
    Specifies the coprocessor register that contains the first operand.

<CRm>
    Is an additional coprocessor source or destination register.

<opcode_2>
    Is a coprocessor-specific opcode. If it is omitted, <opcode_2> is assumed to be 0.

Architecture version

    MRC is in all versions.
    MRC2 is in version 5 and above.

Exceptions

    Undefined Instruction.
Operation

    if ConditionPassed(cond) then
        data = value from Coprocessor[cp_num]
        if Rd is R15 then
            N flag = data[31]
            Z flag = data[30]
            C flag = data[29]
            V flag = data[28]
        else /* Rd is not R15 */
            Rd = data

Usage

MRC has two uses:
1.  If <Rd> specifies R15, the condition code flags bits are updated from the top four bits of the value from the coprocessor specified by <coproc> (to allow conditional branching on the status of a coprocessor) and the other 28 bits are ignored.
    An example of this use would be to transfer the result of a comparison performed by a floating-point coprocessor to the ARM's condition flags.
2.  Otherwise the instruction writes into register <Rd> a value from the coprocessor specified by <coproc>.
    An example of this use is a floating-point to integer conversion instruction in a floating-point coprocessor.

Notes

Coprocessor fields
    Only instruction bits[31:24], bit[20], bits[15:8] and bit[4] are defined by the ARM architecture. The remaining fields are recommendations, for compatibility with ARM Development Systems.

Unimplemented coprocessor instructions
    Hardware coprocessor support is optional for coprocessors 0-13, regardless of the architecture version, and is optional for coprocessors 14 and 15 before ARMv6. An implementation can choose to implement a subset of the coprocessor instructions, or no coprocessor instructions at all. Any coprocessor instructions that are not implemented instead cause an Undefined Instruction exception.

A4.1.37 MRRC

    31:28   27 26 25 24 23 22 21 20   19:16   15:12   11:8     7:4      3:0
    cond     1  1  0  0  0  1  0  1   Rn      Rd      cp_num   opcode   CRm

MRRC (Move to two ARM registers from Coprocessor) causes a coprocessor to transfer values to two ARM registers.

If no coprocessors can execute the instruction, an Undefined Instruction exception is generated.
Syntax

    MRRC{<cond>} <coproc>, <opcode>, <Rd>, <Rn>, <CRm>
    MRRC2        <coproc>, <opcode>, <Rd>, <Rn>, <CRm>

where:

<cond>
    Is the condition under which the instruction is executed. The conditions are defined in The condition field on page A3-3. If <cond> is omitted, the AL (always) condition is used.

MRRC2
    Causes the condition field of the instruction to be set to 0b1111. This provides additional opcode space for coprocessor designers. The resulting instructions can only be executed unconditionally.

<coproc>
    Specifies the name of the coprocessor, and causes the corresponding coprocessor number to be placed in the cp_num field of the instruction. The standard generic coprocessor names are p0, p1, …, p15.

<opcode>
    Is a coprocessor-specific opcode.

<Rd>
    Is the first destination ARM register. If R15 is specified for <Rd>, the result is UNPREDICTABLE.

<Rn>
    Is the second destination ARM register. If R15 is specified for <Rn>, the result is UNPREDICTABLE.

<CRm>
    Is the coprocessor register which supplies the data to be transferred.

Architecture version

    MRRC is in version 5TE and above, excluding ARMv5TExP.
    MRRC2 is in version 6 and above.

Exceptions

    Undefined Instruction.

Operation

    if ConditionPassed(cond) then
        Rd = first value from Coprocessor[cp_num]
        Rn = second value from Coprocessor[cp_num]

Usage

Use MRRC to initiate a coprocessor operation that writes values to two ARM registers. An example for a floating-point coprocessor is an instruction to transfer a double-precision floating-point number held in a floating-point register to two ARM registers.

Notes

Operand restrictions
    Specifying the same register for <Rd> and <Rn> has UNPREDICTABLE results.

Coprocessor fields
    Only instruction bits[31:8] are defined by the ARM architecture. The remaining fields are recommendations, for compatibility with ARM Development Systems.
Unimplemented coprocessor instructions
    Hardware coprocessor support is optional for coprocessors 0-13, regardless of the architecture version, and is optional for coprocessors 14 and 15 before ARMv6. An implementation can choose to implement a subset of the coprocessor instructions, or no coprocessor instructions at all. Any coprocessor instructions that are not implemented instead cause an Undefined Instruction exception.

Order of transfers
    If a coprocessor uses these instructions, it defines which value is written to <Rd> and which value to <Rn>. There is no architectural requirement for the two register transfers to occur in any particular time order. It is IMPLEMENTATION DEFINED whether Rd is transferred before Rn, after Rn, or at the same time as Rn.

A4.1.38 MRS

    31:28   27 26 25 24 23   22   21 20   19:16   15:12   11:0
    cond     0  0  0  1  0   R     0  0   SBO     Rd      SBZ

MRS (Move PSR to general-purpose register) moves the value of the CPSR or the SPSR of the current mode into a general-purpose register. In the general-purpose register, the value can be examined or manipulated with normal data-processing instructions.

Syntax

    MRS{<cond>} <Rd>, CPSR
    MRS{<cond>} <Rd>, SPSR

where:

<cond>
    Is the condition under which the instruction is executed. The conditions are defined in The condition field on page A3-3. If <cond> is omitted, the AL (always) condition is used.

<Rd>
    Specifies the destination register. If R15 is specified for <Rd>, the result is UNPREDICTABLE.

Architecture version

    All.

Exceptions

    None.

Operation

    if ConditionPassed(cond) then
        if R == 1 then
            Rd = SPSR
        else
            Rd = CPSR

Usage

The MRS instruction is commonly used for three purposes:
•   As part of a read/modify/write sequence for updating a PSR. For more details, see MSR on page A4-76.

• When an exception occurs and there is a possibility of a nested exception of the same type occurring, the SPSR of the exception mode is in danger of being corrupted. To deal with this, the SPSR value must be saved before the nested exception can occur, and later restored in preparation for the exception return. The saving is normally done by using an MRS instruction followed by a store instruction. Restoring the SPSR uses the reverse sequence of a load instruction followed by an MSR instruction.

• In process swap code, the programmers’ model state of the process being swapped out must be saved, including relevant PSR contents, and similar state of the process being swapped in must be restored. Again, this involves the use of MRS/store and load/MSR instruction sequences.

Notes

User mode SPSR
    Accessing the SPSR when in User mode or System mode is UNPREDICTABLE.

A4.1.39 MSR

Immediate operand:

    Bits:   31-28  27-23      22  21-20  19-16       15-12  11-8        7-0
    Field:  cond   0 0 1 1 0  R   1 0    field_mask  SBO    rotate_imm  8_bit_immediate

Register operand:

    Bits:   31-28  27-23      22  21-20  19-16       15-12  11-8  7-4      3-0
    Field:  cond   0 0 0 1 0  R   1 0    field_mask  SBO    SBZ   0 0 0 0  Rm

MSR (Move to Status Register from ARM Register) transfers the value of a general-purpose register or an immediate constant to the CPSR or the SPSR of the current mode.

Syntax

MSR{<cond>} CPSR_<fields>, #<immediate>
MSR{<cond>} CPSR_<fields>, <Rm>
MSR{<cond>} SPSR_<fields>, #<immediate>
MSR{<cond>} SPSR_<fields>, <Rm>

where:

<cond>      Is the condition under which the instruction is executed. The conditions are defined in The condition field on page A3-3. If <cond> is omitted, the AL (always) condition is used.

<fields>    Is a sequence of one or more of the following:
            c    sets the control field mask bit (bit 16)
            x    sets the extension field mask bit (bit 17)
            s    sets the status field mask bit (bit 18)
            f    sets the flags field mask bit (bit 19).

<immediate> Is the immediate value to be transferred to the CPSR or SPSR.
            Allowed immediate values are 8-bit immediates (in the range 0x00 to 0xFF) and values that can be obtained by rotating them right by an even amount in the range 0 to 30. These immediate values are the same as those allowed in the immediate form as shown in Data-processing operands - Immediate on page A5-6.

<Rm>        Is the general-purpose register to be transferred to the CPSR or SPSR.

Architecture version

All.

Exceptions

None.

Operation

There are four categories of PSR bits, according to rules about updating them, see Types of PSR bits on page A2-11 for details. The pseudo-code uses four bit mask constants to identify these categories of PSR bits. The values of these masks depend on the architecture version, see Table A4-1.

Table A4-1 Bit mask constants

    Architecture versions   UnallocMask   UserMask     PrivMask     StateMask
    4                       0x0FFFFF20    0xF0000000   0x0000000F   0x00000000
    4T, 5T                  0x0FFFFF00    0xF0000000   0x0000000F   0x00000020
    5TE, 5TExP              0x07FFFF00    0xF8000000   0x0000000F   0x00000020
    5TEJ                    0x06FFFF00    0xF8000000   0x0000000F   0x01000020
    6                       0x06F0FC00    0xF80F0200   0x000001DF   0x01000020

if ConditionPassed(cond) then
    if opcode[25] == 1 then
        operand = 8_bit_immediate Rotate_Right (rotate_imm * 2)
    else
        operand = Rm
    if (operand AND UnallocMask) != 0 then
        UNPREDICTABLE /* Attempt to set reserved bits */
    byte_mask = (if field_mask[0] == 1 then 0x000000FF else 0x00000000) OR
                (if field_mask[1] == 1 then 0x0000FF00 else 0x00000000) OR
                (if field_mask[2] == 1 then 0x00FF0000 else 0x00000000) OR
                (if field_mask[3] == 1 then 0xFF000000 else 0x00000000)
    if R == 0 then
        if InAPrivilegedMode() then
            if (operand AND StateMask) != 0 then
                UNPREDICTABLE /* Attempt to set non-ARM execution state */
            else
                mask = byte_mask AND (UserMask OR PrivMask)
        else
            mask = byte_mask AND UserMask
        CPSR = (CPSR AND NOT mask) OR (operand AND mask)
    else /* R == 1 */
        if CurrentModeHasSPSR() then
            mask = byte_mask AND (UserMask OR PrivMask OR StateMask)
            SPSR = (SPSR AND NOT mask) OR (operand AND mask)
        else
            UNPREDICTABLE

Usage

Use MSR to update the value of the condition code flags, interrupt enables, or the processor mode.

You must normally update the value of a PSR by moving the PSR to a general-purpose register (using the MRS instruction), modifying the relevant bits of the general-purpose register, and restoring the updated general-purpose register value back into the PSR (using the MSR instruction). For example, a good way to switch the ARM to Supervisor mode from another privileged mode is:

    MRS   R0,CPSR           ; Read CPSR
    BIC   R0,R0,#0x1F       ; Modify by removing current mode
    ORR   R0,R0,#0x13       ; and substituting Supervisor mode
    MSR   CPSR_c,R0         ; Write the result back to CPSR

For maximum efficiency, MSR instructions should only write to those fields that they can potentially change. For example, the last instruction in the above code can only change the CPSR control field, as all bits in the other fields are unchanged since they were read from the CPSR by the first instruction. So it writes to CPSR_c, not CPSR_fsxc or some other combination of fields.

However, if the only reason that an MSR instruction cannot change a field is that no bits are currently allocated to the field, then the field must be written, to ensure future compatibility.

You can use the immediate form of MSR to set any of the fields of a PSR, but you must take care to use the read-modify-write technique described above. The immediate form of the instruction is equivalent to reading the PSR concerned, replacing all the bits in the fields concerned by the corresponding bits of the immediate constant and writing the result back to the PSR.
The immediate form must therefore only be used when the intention is to modify all the bits in the specified fields and, in particular, must not be used if the specified fields include any as-yet-unallocated bits. Failure to observe this rule might result in code which has unanticipated side effects on future versions of the ARM architecture.

As an exception to the above rule, it is legitimate to use the immediate form of the instruction to modify the flags byte, despite the fact that bits[26:25] of the PSRs have no allocated function at present. For example, you can use MSR to set all four flags (and clear the Q flag if the processor implements the Enhanced DSP extension):

    MSR   CPSR_f,#0xF0000000

Any functionality allocated to bits[26:25] in a future version of the ARM architecture will be designed so that such code does not have unexpected side effects. Several bits must not be changed to reserved values or the results are UNPREDICTABLE. Examples are an attempt to write a reserved value to the mode bits (bits[4:0]), or a change to the J bit (bit[24]).

Notes

The R bit
    Bit[22] of the instruction is 0 if the CPSR is to be written and 1 if the SPSR is to be written.

User mode CPSR
    Any writes to privileged or execution state bits are ignored.

User mode SPSR
    Accessing the SPSR when in User mode is UNPREDICTABLE.

System mode SPSR
    Accessing the SPSR when in System mode is UNPREDICTABLE.

Obsolete field specification
    The CPSR, CPSR_flg, CPSR_ctl, CPSR_all, SPSR, SPSR_flg, SPSR_ctl and SPSR_all forms of PSR field specification have been superseded by the csxf format shown on page A4-76.
    CPSR, SPSR, CPSR_all and SPSR_all produce a field mask of 0b1001.
    CPSR_flg and SPSR_flg produce a field mask of 0b1000.
    CPSR_ctl and SPSR_ctl produce a field mask of 0b0001.

The T bit or J bit
    The MSR instruction must not be used to alter the T bit or the J bit in the CPSR.
    If such an attempt is made, the results are UNPREDICTABLE.

Addressing modes
    The immediate and register forms are specified in precisely the same way as the immediate and unshifted register forms of Addressing Mode 1 (see Addressing Mode 1 - Data-processing operands on page A5-2). All other forms of Addressing Mode 1 yield UNPREDICTABLE results.

A4.1.40 MUL

    Bits:   31-28  27-21          20  19-16  15-12  11-8  7-4      3-0
    Field:  cond   0 0 0 0 0 0 0  S   Rd     SBZ    Rs    1 0 0 1  Rm

MUL (Multiply) multiplies two signed or unsigned 32-bit values. The least significant 32 bits of the result are written to the destination register.

MUL can optionally update the condition code flags, based on the result.

Syntax

MUL{<cond>}{S} <Rd>, <Rm>, <Rs>

where:

<cond>      Is the condition under which the instruction is executed. The conditions are defined in The condition field on page A3-3. If <cond> is omitted, the AL (always) condition is used.

S           Causes the S bit (bit[20]) in the instruction to be set to 1 and specifies that the instruction updates the CPSR by setting the N and Z flags according to the result of the multiplication. If S is omitted, the S bit of the instruction is set to 0 and the entire CPSR is unaffected by the instruction.

<Rd>        Specifies the destination register for the instruction.

<Rm>        Specifies the register that contains the first value to be multiplied.

<Rs>        Holds the value to be multiplied with the value of <Rm>.

Architecture version

All.

Exceptions

None.

Operation

if ConditionPassed(cond) then
    Rd = (Rm * Rs)[31:0]
    if S == 1 then
        N Flag = Rd[31]
        Z Flag = if Rd == 0 then 1 else 0
        C Flag = unaffected in v5 and above, UNPREDICTABLE in v4 and earlier
        V Flag = unaffected

Notes

Use of R15
    Specifying R15 for register <Rd>, <Rm>, or <Rs> has UNPREDICTABLE results.

Early termination
    If the multiplier implementation supports early termination, it must be implemented on the value of the <Rs> operand. The type of early termination used (signed or unsigned) is IMPLEMENTATION DEFINED.

Signed and unsigned
    Because the MUL instruction produces only the lower 32 bits of the 64-bit product, MUL gives the same answer for multiplication of both signed and unsigned numbers.

C flag
    The MULS instruction is defined to leave the C flag unchanged in ARM architecture version 5 and above. In earlier versions of the architecture, the value of the C flag was UNPREDICTABLE after a MULS instruction.

Operand restriction
    Specifying the same register for <Rd> and <Rm> was previously described as producing UNPREDICTABLE results. There is no restriction in ARMv6, and it is believed all relevant ARMv4 and ARMv5 implementations do not require this restriction either, because high performance multipliers read all their operands prior to writing back any results.

A4.1.41 MVN

    Bits:   31-28  27-26  25  24-21    20  19-16  15-12  11-0
    Field:  cond   0 0    I   1 1 1 1  S   SBZ    Rd     shifter_operand

MVN (Move Not) generates the logical ones complement of a value. The value can be either an immediate value or a value from a register, and can be shifted before the MVN operation.

MVN can optionally update the condition code flags, based on the result.

Syntax

MVN{<cond>}{S} <Rd>, <shifter_operand>

where:

<cond>      Is the condition under which the instruction is executed. The conditions are defined in The condition field on page A3-3. If <cond> is omitted, the AL (always) condition is used.

S           Sets the S bit (bit[20]) in the instruction to 1 and specifies that the instruction updates the CPSR. If S is omitted, the S bit is set to 0 and the CPSR is not changed by the instruction.
            Two types of CPSR update can occur when S is specified:
            • If <Rd> is not R15, the N and Z flags are set according to the result of the operation, and the C flag is set to the carry output bit generated by the shifter (see Addressing Mode 1 - Data-processing operands on page A5-2). The V flag and the rest of the CPSR are unaffected.
            • If <Rd> is R15, the SPSR of the current mode is copied to the CPSR. This form of the instruction is UNPREDICTABLE if executed in User mode or System mode, because these modes do not have an SPSR.

<Rd>        Specifies the destination register.

<shifter_operand>
            Specifies the operand. The options for this operand are described in Addressing Mode 1 - Data-processing operands on page A5-2, including how each option causes the I bit (bit[25]) and the shifter_operand bits (bits[11:0]) to be set in the instruction.
            If the I bit is 0 and both bit[7] and bit[4] of shifter_operand are 1, the instruction is not MVN. Instead, see Extending the instruction set on page A3-32 to determine which instruction it is.

Architecture version

All.

Exceptions

None.

Operation

if ConditionPassed(cond) then
    Rd = NOT shifter_operand
    if S == 1 and Rd == R15 then
        if CurrentModeHasSPSR() then
            CPSR = SPSR
        else UNPREDICTABLE
    else if S == 1 then
        N Flag = Rd[31]
        Z Flag = if Rd == 0 then 1 else 0
        C Flag = shifter_carry_out
        V Flag = unaffected

Usage

Use MVN to:
• form a bit mask
• take the ones complement of a value.

A4.1.42 ORR

    Bits:   31-28  27-26  25  24-21    20  19-16  15-12  11-0
    Field:  cond   0 0    I   1 1 0 0  S   Rn     Rd     shifter_operand

ORR (Logical OR) performs a bitwise (inclusive) OR of two values. The first value comes from a register. The second value can be either an immediate value or a value from a register, and can be shifted before the OR operation.
ORR can optionally update the condition code flags, based on the result.

Syntax

ORR{<cond>}{S} <Rd>, <Rn>, <shifter_operand>

where:

<cond>      Is the condition under which the instruction is executed. The conditions are defined in The condition field on page A3-3. If <cond> is omitted, the AL (always) condition is used.

S           Sets the S bit (bit[20]) in the instruction to 1 and specifies that the instruction updates the CPSR. If S is omitted, the S bit is set to 0 and the CPSR is not changed by the instruction. Two types of CPSR update can occur when S is specified:
            • If <Rd> is not R15, the N and Z flags are set according to the result of the operation, and the C flag is set to the carry output bit generated by the shifter (see Addressing Mode 1 - Data-processing operands on page A5-2). The V flag and the rest of the CPSR are unaffected.
            • If <Rd> is R15, the SPSR of the current mode is copied to the CPSR. This form of the instruction is UNPREDICTABLE if executed in User mode or System mode, because these modes do not have an SPSR.

<Rd>        Specifies the destination register.

<Rn>        Specifies the register that contains the first operand.

<shifter_operand>
            Specifies the second operand. The options for this operand are described in Addressing Mode 1 - Data-processing operands on page A5-2, including how each option causes the I bit (bit[25]) and the shifter_operand bits (bits[11:0]) to be set in the instruction.
            If the I bit is 0 and both bit[7] and bit[4] of shifter_operand are 1, the instruction is not ORR. Instead, see Extending the instruction set on page A3-32 to determine which instruction it is.

Architecture version

All.

Exceptions

None.

Operation

if ConditionPassed(cond) then
    Rd = Rn OR shifter_operand
    if S == 1 and Rd == R15 then
        if CurrentModeHasSPSR() then
            CPSR = SPSR
        else UNPREDICTABLE
    else if S == 1 then
        N Flag = Rd[31]
        Z Flag = if Rd == 0 then 1 else 0
        C Flag = shifter_carry_out
        V Flag = unaffected

Usage

Use ORR to set selected bits in a register. For each bit, OR with 1 sets the bit, and OR with 0 leaves it unchanged.

A4.1.43 PKHBT

    Bits:   31-28  27-20            19-16  15-12  11-7       6-4    3-0
    Field:  cond   0 1 1 0 1 0 0 0  Rn     Rd     shift_imm  0 0 1  Rm

PKHBT (Pack Halfword Bottom Top) combines the bottom (least significant) halfword of its first operand with the top (most significant) halfword of its shifted second operand. The shift is a left shift, by any amount from 0 to 31.

Syntax

PKHBT{<cond>} <Rd>, <Rn>, <Rm> {, LSL #<shift_imm>}

where:

<cond>      Is the condition under which the instruction is executed. The conditions are defined in The condition field on page A3-3. If <cond> is omitted, the AL (always) condition is used.

<Rd>        Specifies the destination register.

<Rn>        Specifies the register that contains the first operand. Bits[15:0] of this operand become bits[15:0] of the result of the operation.

<Rm>        Specifies the register that contains the second operand. This is shifted left by the specified amount, then bits[31:16] of this operand become bits[31:16] of the result of the operation.

<shift_imm> Specifies the amount by which <Rm> is to be shifted left. This is a value from 0 to 31. If the shift specifier is omitted, a left shift by 0 is used.

Architecture version

Version 6 and above.

Exceptions

None.

Operation

if ConditionPassed(cond) then
    Rd[15:0] = Rn[15:0]
    Rd[31:16] = (Rm Logical_Shift_Left shift_imm)[31:16]
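
The PKHBT Operation pseudocode can be sketched as a small executable model. This is an illustration only, not part of the architecture definition: it is written in Python rather than ARM assembly, the names pkhbt and MASK32 are hypothetical helpers introduced here, and register values are assumed to be held as unsigned 32-bit integers.

```python
MASK32 = 0xFFFFFFFF  # hypothetical helper: keeps values within 32 bits


def pkhbt(rn: int, rm: int, shift_imm: int = 0) -> int:
    """Model of PKHBT Rd, Rn, Rm, LSL #shift_imm; returns the value of Rd."""
    if not 0 <= shift_imm <= 31:
        raise ValueError("shift_imm must be in the range 0 to 31")
    shifted = (rm << shift_imm) & MASK32   # Rm Logical_Shift_Left shift_imm
    bottom = rn & 0x0000FFFF               # Rd[15:0]  = Rn[15:0]
    top = shifted & 0xFFFF0000             # Rd[31:16] = shifted Rm[31:16]
    return top | bottom


ra, rb = 0xAAAA1111, 0xBBBB2222
# Top half of Ra over bottom half of Rb: PKHBT Rd, Rb, Ra
print(hex(pkhbt(rb, ra)))        # 0xaaaa2222
# Bottom half of Ra over bottom half of Rb: PKHBT Rd, Rb, Ra, LSL #16
print(hex(pkhbt(rb, ra, 16)))    # 0x11112222
```

The model takes its arguments in assembler operand order (Rn, then Rm), so each call reads the same way as the instruction it stands for.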

Usage

To construct the word in Rd consisting of the top half of register Ra and the bottom half of register Rb as its most and least significant halfwords respectively, use:

    PKHBT Rd, Rb, Ra

To construct the word in Rd consisting of the bottom half of register Ra and the bottom half of register Rb as its most and least significant halfwords respectively, use:

    PKHBT Rd, Rb, Ra, LSL #16

Notes

Use of R15
    Specifying R15 for register <Rd>, <Rm>, or <Rn> has UNPREDICTABLE results.

A4.1.44 PKHTB

    Bits:   31-28  27-20            19-16  15-12  11-7       6-4    3-0
    Field:  cond   0 1 1 0 1 0 0 0  Rn     Rd     shift_imm  1 0 1  Rm

PKHTB (Pack Halfword Top Bottom) combines the top (most significant) halfword of its first operand with the bottom (least significant) halfword of its shifted second operand. The shift is an arithmetic right shift, by any amount from 1 to 32.

Syntax

PKHTB{<cond>} <Rd>, <Rn>, <Rm> {, ASR #<shift_imm>}

where:

<cond>      Is the condition under which the instruction is executed. The conditions are defined in The condition field on page A3-3. If <cond> is omitted, the AL (always) condition is used.

<Rd>        Specifies the destination register.

<Rn>        Specifies the register that contains the first operand. Bits[31:16] of this operand become bits[31:16] of the result of the operation.

<Rm>        Specifies the register that contains the second operand. This is shifted right arithmetically by the specified amount, then bits[15:0] of this operand become bits[15:0] of the result of the operation.

<shift_imm> Specifies the amount by which <Rm> is to be shifted right. A shift by 32 is encoded as shift_imm == 0.
            If the shift specifier is omitted, the assembler converts the instruction to PKHBT Rd, Rm, Rn. This produces the same effect as an arithmetic shift right by 0.

            Note
            If your assembler accepts shifts by #0 and treats them as equivalent to no shift or LSL #0, then it must accept ASR #0 here. It is equivalent to omitting the shift specifier.
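
The shift encoding described in the PKHTB syntax above (ASR #32 held as shift_imm == 0) can be sketched as follows. This is a hedged illustration in Python, not an architectural definition: encode_asr_amount, to_signed32 and pkhtb are hypothetical names introduced here, and registers are assumed to be unsigned 32-bit integers.

```python
MASK32 = 0xFFFFFFFF  # hypothetical helper: keeps values within 32 bits


def to_signed32(value: int) -> int:
    """Interpret a 32-bit register pattern as a signed integer."""
    value &= MASK32
    return value - (1 << 32) if value & 0x80000000 else value


def encode_asr_amount(shift: int) -> int:
    """Return the 5-bit shift_imm field for ASR #shift; ASR #32 becomes 0."""
    if not 1 <= shift <= 32:
        raise ValueError("ASR amount must be in the range 1 to 32")
    return shift % 32


def pkhtb(rn: int, rm: int, shift_imm: int) -> int:
    """Model of the encoded PKHTB; shift_imm is the 5-bit instruction field."""
    if not 0 <= shift_imm <= 31:
        raise ValueError("shift_imm is a 5-bit field")
    amount = 32 if shift_imm == 0 else shift_imm
    shifted = to_signed32(rm) >> amount      # Python >> on int is arithmetic
    return (rn & 0xFFFF0000) | (shifted & 0x0000FFFF)


ra, rb = 0xAAAA1111, 0xBBBB2222
# Top half of Ra over top half of Rb: PKHTB Rd, Ra, Rb, ASR #16
print(hex(pkhtb(ra, rb, encode_asr_amount(16))))   # 0xaaaabbbb
# ASR #32 leaves only copies of the sign bit of Rb in the bottom half.
print(hex(pkhtb(ra, rb, encode_asr_amount(32))))   # 0xaaaaffff
```

Handling the ASR #32 case by widening the shift amount, rather than by testing the sign bit separately, relies on Python integers being unbounded; a fixed-width language would need the explicit sign test.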

Architecture version

Version 6 and above.

Exceptions

None.

Operation

if ConditionPassed(cond) then
    if shift_imm == 0 then /* ASR #32 case */
        if Rm[31] == 0 then
            Rd[15:0] = 0x0000
        else
            Rd[15:0] = 0xFFFF
    else
        Rd[15:0] = (Rm Arithmetic_Shift_Right shift_imm)[15:0]
    Rd[31:16] = Rn[31:16]

Usage

To construct the word in Rd consisting of the top half of register Ra and the top half of register Rb as its most and least significant halfwords respectively, use:

    PKHTB Rd, Ra, Rb, ASR #16

You can use this to truncate a Q31 number in Rb, and put the result into the bottom half of Rd. You can scale the Rb value by using a different shift amount.

To construct the word in Rd consisting of the top half of register Ra and the bottom half of register Rb as its most and least significant halfwords respectively, you can use:

    PKHTB Rd, Ra, Rb

The assembler converts this into:

    PKHBT Rd, Rb, Ra

Notes

Use of R15
    Specifying R15 for register <Rd>, <Rm>, or <Rn> has UNPREDICTABLE results.

A4.1.45 PLD

    Bits:   31-26        25  24  23  22  21-20  19-16  15-12    11-0
    Field:  1 1 1 1 0 1  I   1   U   1   0 1    Rn     1 1 1 1  addr_mode

PLD (Preload Data) signals the memory system that memory accesses from a specified address are likely in the near future. The memory system can respond by taking actions which are expected to speed up the memory accesses when they do occur, such as pre-loading the cache line containing the specified address into the cache.

PLD is a hint instruction, aimed at optimizing memory system performance. It has no architecturally-defined effect, and memory systems that do not support this optimization can ignore it. On such memory systems, PLD acts as a NOP.

Syntax

PLD <addressing_mode>

where:

<addressing_mode>
            Is described in Addressing Mode 2 - Load and Store Word or Unsigned Byte on page A5-18.
            It specifies the I, U, Rn, and addr_mode bits of the instruction. Only addressing modes with P == 1 and W == 0 are available for this instruction. Pre-indexed and post-indexed addressing modes have P == 0 or W == 1 and so are not available.

Architecture version

Version 5TE and above, excluding ARMv5TExP.

Exceptions

None.

Operation

/* No change occurs to programmer's model state, but where
 * appropriate, the memory system is signaled that memory accesses
 * to the specified address are likely in the near future.
 */

Notes

Condition
    Unlike most other ARM instructions, PLD cannot be executed conditionally.

Write-back
    Clearing bit[24] (the P bit) or setting bit[21] (the W bit) has UNPREDICTABLE results.

Data Aborts
    This instruction never signals a precise Data Abort generated by the VMSA MMU, PMSA MPU or by the rest of the memory system. Other memory system exceptions caused as a side-effect of this operation might be reported using an imprecise Data Abort or by some other exception mechanism.

Alignment
    There are no alignment restrictions on the address generated by <addressing_mode>. If an implementation contains a System Control coprocessor (see Chapter B3 The System Control Coprocessor), it must not generate an alignment exception for any PLD instruction.

A4.1.46 QADD

    Bits:   31-28  27-20            19-16  15-12  11-8  7-4      3-0
    Field:  cond   0 0 0 1 0 0 0 0  Rn     Rd     SBZ   0 1 0 1  Rm

QADD (Saturating Add) performs integer addition. It saturates the result to the 32-bit signed integer range -2^31 ≤ x ≤ 2^31 - 1. If saturation occurs, QADD sets the Q flag in the CPSR.

Syntax

QADD{<cond>} <Rd>, <Rm>, <Rn>

where:

<cond>      Is the condition under which the instruction is executed. The conditions are defined in The condition field on page A3-3. If <cond> is omitted, the AL (always) condition is used.

<Rd>        Specifies the destination register.

<Rm>        Specifies the register that contains the first operand.

<Rn>        Specifies the register that contains the second operand.

Architecture version

Version 5TE and above.

Exceptions

None.

Operation

if ConditionPassed(cond) then
    Rd = SignedSat(Rm + Rn, 32)
    if SignedDoesSat(Rm + Rn, 32) then
        Q Flag = 1

Usage

As well as performing saturated integer and Q31 additions, you can use QADD in combination with an SMUL<x><y>, SMULW<y>, or SMULL instruction to produce multiplications of Q15 and Q31 numbers. Three examples are:

• To multiply the Q15 numbers in the bottom halves of R0 and R1 and place the Q31 result in R2, use:

    SMULBB  R2, R0, R1
    QADD    R2, R2, R2

• To multiply the Q31 number in R0 by the Q15 number in the top half of R1 and place the Q31 result in R2, use:

    SMULWT  R2, R0, R1
    QADD    R2, R2, R2

• To multiply the Q31 numbers in R0 and R1 and place the Q31 result in R2, use:

    SMULL   R3, R2, R0, R1
    QADD    R2, R2, R2

Notes

Use of R15
    Specifying R15 for register <Rd>, <Rm>, or <Rn> has UNPREDICTABLE results.

Condition flags
    QADD does not affect the N, Z, C, or V flags.

A4.1.47 QADD16

    Bits:   31-28  27-20            19-16  15-12  11-8  7-4      3-0
    Field:  cond   0 1 1 0 0 0 1 0  Rn     Rd     SBO   0 0 0 1  Rm

QADD16 performs two 16-bit integer additions. It saturates the results to the 16-bit signed integer range -2^15 ≤ x ≤ 2^15 - 1. QADD16 does not affect any flags.

Syntax

QADD16{<cond>} <Rd>, <Rn>, <Rm>

where:

<cond>      Is the condition under which the instruction is executed. The conditions are defined in The condition field on page A3-3. If <cond> is omitted, the AL (always) condition is used.

<Rd>        Specifies the destination register.

<Rn>        Specifies the register that contains the first operand.

<Rm>        Specifies the register that contains the second operand.

Architecture version

Version 6 and above.

Exceptions

None.

Operation

if ConditionPassed(cond) then
    Rd[15:0] = SignedSat(Rn[15:0] + Rm[15:0], 16)
    Rd[31:16] = SignedSat(Rn[31:16] + Rm[31:16], 16)

Usage

Use QADD16 in similar ways to the SADD16 instruction, but for signed saturated arithmetic. QADD16 does not set the GE bits for use with SEL. See SADD16 on page A4-119 for more details.

Notes

Use of R15
    Specifying R15 for register <Rd>, <Rm>, or <Rn> has UNPREDICTABLE results.

A4.1.48 QADD8

    Bits:   31-28  27-20            19-16  15-12  11-8  7-4      3-0
    Field:  cond   0 1 1 0 0 0 1 0  Rn     Rd     SBO   1 0 0 1  Rm

QADD8 performs four 8-bit integer additions. It saturates the results to the 8-bit signed integer range -2^7 ≤ x ≤ 2^7 - 1. QADD8 does not affect any flags.

Syntax

QADD8{<cond>} <Rd>, <Rn>, <Rm>

where:

<cond>      Is the condition under which the instruction is executed. The conditions are defined in The condition field on page A3-3. If <cond> is omitted, the AL (always) condition is used.

<Rd>        Specifies the destination register.

<Rn>        Specifies the register that contains the first operand.

<Rm>        Specifies the register that contains the second operand.

Architecture version

Version 6 and above.

Exceptions

None.

Operation

if ConditionPassed(cond) then
    Rd[7:0]   = SignedSat(Rn[7:0]   + Rm[7:0],   8)
    Rd[15:8]  = SignedSat(Rn[15:8]  + Rm[15:8],  8)
    Rd[23:16] = SignedSat(Rn[23:16] + Rm[23:16], 8)
    Rd[31:24] = SignedSat(Rn[31:24] + Rm[31:24], 8)

Usage

Use QADD8 in similar ways to the SADD8 instruction, but for signed saturated arithmetic. QADD8 does not set the GE bits for use with SEL. See SADD8 on page A4-121 for more details.

Notes

Use of R15
    Specifying R15 for register <Rd>, <Rm>, or <Rn> has UNPREDICTABLE results.
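
The lane-wise saturation performed by QADD16 and QADD8 can be sketched with one shared helper. This is a minimal model under stated assumptions, not the architecture's own definition: it is written in Python, the names signed_sat, lane and parallel_qadd are hypothetical, and each lane is assumed to be read as a two's complement signed value before the addition.

```python
def signed_sat(value: int, bits: int) -> int:
    """Clamp value to the signed range -2^(bits-1) .. 2^(bits-1) - 1."""
    low, high = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    return max(low, min(high, value))


def lane(word: int, index: int, bits: int) -> int:
    """Extract lane number index (0 is least significant) as a signed value."""
    raw = (word >> (index * bits)) & ((1 << bits) - 1)
    return raw - (1 << bits) if raw >> (bits - 1) else raw


def parallel_qadd(rn: int, rm: int, bits: int) -> int:
    """Add corresponding lanes of two 32-bit words with signed saturation."""
    result = 0
    for index in range(32 // bits):
        total = signed_sat(lane(rn, index, bits) + lane(rm, index, bits), bits)
        result |= (total & ((1 << bits) - 1)) << (index * bits)
    return result


def qadd16(rn: int, rm: int) -> int:
    return parallel_qadd(rn, rm, 16)


def qadd8(rn: int, rm: int) -> int:
    return parallel_qadd(rn, rm, 8)


# Top lane saturates at 0x7FFF, bottom lane adds normally.
print(hex(qadd16(0x7FFF0001, 0x00010001)))   # 0x7fff0002
# Lanes from the top: 127+1 saturates, -128-1 saturates, -1+1 = 0, 1+1 = 2.
print(hex(qadd8(0x7F80FF01, 0x01FF0101)))    # 0x7f800002
```

No flag state is modelled, matching the statement that these instructions do not affect any flags.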

A4.1.49 QADDSUBX

    Bits:   31-28  27-20            19-16  15-12  11-8  7-4      3-0
    Field:  cond   0 1 1 0 0 0 1 0  Rn     Rd     SBO   0 0 1 1  Rm

QADDSUBX (Saturating Add and Subtract with Exchange) performs one 16-bit integer addition and one 16-bit subtraction. It saturates the results to the 16-bit signed integer range -2^15 ≤ x ≤ 2^15 - 1. QADDSUBX exchanges the two halfwords of the second operand before it performs the arithmetic. QADDSUBX does not affect any flags.

Syntax

QADDSUBX{<cond>} <Rd>, <Rn>, <Rm>

where:

<cond>      Is the condition under which the instruction is executed. The conditions are defined in The condition field on page A3-3. If <cond> is omitted, the AL (always) condition is used.

<Rd>        Specifies the destination register.

<Rn>        Specifies the register that contains the first operand.

<Rm>        Specifies the register that contains the second operand.

Architecture version

Version 6 and above.

Exceptions

None.

Operation

if ConditionPassed(cond) then
    Rd[31:16] = SignedSat(Rn[31:16] + Rm[15:0], 16)
    Rd[15:0] = SignedSat(Rn[15:0] - Rm[31:16], 16)

Usage

You can use QADDSUBX for operations on complex numbers that are held as pairs of 16-bit integers or Q15 numbers. If you hold the real and imaginary parts of a complex number in the bottom and top half of a register respectively, then the instruction:

    QADDSUBX Rd, Ra, Rb

performs the complex arithmetic operation Rd = (Ra + i * Rb).

QADDSUBX does not set the Q flag, even if saturation occurs on either operation.

Notes

Use of R15
    Specifying R15 for register <Rd>, <Rm>, or <Rn> has UNPREDICTABLE results.

A4.1.50 QDADD

    Bits:   31-28  27-20            19-16  15-12  11-8  7-4      3-0
    Field:  cond   0 0 0 1 0 1 0 0  Rn     Rd     SBZ   0 1 0 1  Rm

QDADD (Saturating Double and Add) doubles its second operand, then adds the result to its first operand.
Both the doubling and the addition have their results saturated to the 32-bit signed integer range -2^31 ≤ x ≤ 2^31 - 1. If saturation occurs in either operation, the instruction sets the Q flag in the CPSR.

Syntax

QDADD{<cond>} <Rd>, <Rm>, <Rn>

where:

<cond>      Is the condition under which the instruction is executed. The conditions are defined in The condition field on page A3-3. If <cond> is omitted, the AL (always) condition is used.

<Rd>        Specifies the destination register.

<Rm>        Specifies the register that contains the first operand.

<Rn>        Specifies the register whose value is to be doubled, saturated, and used as the second operand for the saturated addition.

Architecture version

Version 5TE and above.

Exceptions

None.

Operation

if ConditionPassed(cond) then
    Rd = SignedSat(Rm + SignedSat(Rn*2, 32), 32)
    if SignedDoesSat(Rm + SignedSat(Rn*2, 32), 32) or
       SignedDoesSat(Rn*2, 32) then
        Q Flag = 1

Usage

The primary use for this instruction is to generate multiply-accumulate operations on Q15 and Q31 numbers, by placing it after an integer multiply instruction. Three examples are:

• To multiply the Q15 numbers in the top halves of R4 and R5 and add the product to the Q31 number in R6, use:

    SMULTT  R0, R4, R5
    QDADD   R6, R6, R0

• To multiply the Q15 number in the bottom half of R2 by the Q31 number in R3 and add the product to the Q31 number in R7, use:

    SMULWB  R0, R3, R2
    QDADD   R7, R7, R0

• To multiply the Q31 numbers in R2 and R3 and add the product to the Q31 number in R4, use:

    SMULL   R0, R1, R2, R3
    QDADD   R4, R4, R1

Notes

Use of R15
    Specifying R15 for register <Rd>, <Rm>, or <Rn> has UNPREDICTABLE results.

Condition flags
    The QDADD instruction does not affect the N, Z, C, or V flags.
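
The first multiply-accumulate example above (SMULTT followed by QDADD) can be traced with a small model of the Operation pseudocode. This is a sketch under stated assumptions rather than a definitive implementation: it is written in Python, signed_sat32, qdadd, top_half and smultt are hypothetical names, the SMULTT model is assumed to be the signed product of the two top halfwords, and accumulator values are held as signed Python integers.

```python
INT32_MIN, INT32_MAX = -(1 << 31), (1 << 31) - 1


def signed_sat32(value: int):
    """Return the pair (SignedSat(value, 32), SignedDoesSat(value, 32))."""
    clamped = max(INT32_MIN, min(INT32_MAX, value))
    return clamped, clamped != value


def qdadd(rm: int, rn: int, q_flag: bool = False):
    """Model of QDADD Rd, Rm, Rn on signed values; returns (Rd, Q flag)."""
    doubled, sat_double = signed_sat32(rn * 2)
    total, sat_add = signed_sat32(rm + doubled)
    # The instruction only ever sets the Q flag; it never clears it.
    return total, q_flag or sat_double or sat_add


def top_half(word: int) -> int:
    """Signed value of bits[31:16] of a 32-bit register pattern."""
    half = (word >> 16) & 0xFFFF
    return half - 0x10000 if half & 0x8000 else half


def smultt(rm: int, rs: int) -> int:
    """Assumed model of SMULTT: signed product of the two top halfwords."""
    return top_half(rm) * top_half(rs)


# Q15 0.5 * 0.5 accumulated onto Q31 0.5 gives Q31 0.75.
r6, q = qdadd(0x40000000, smultt(0x40000000, 0x40000000))
print(hex(r6), q)   # 0x60000000 False

# Q15 (-1) * (-1) overflows when doubled, so the result saturates and Q is set.
r6, q = qdadd(0, smultt(0x80000000, 0x80000000))
print(hex(r6), q)   # 0x7fffffff True
```

The doubling step is what converts the Q30 product of two Q15 numbers into Q31, which is why the saturating double is paired with the multiply.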

A4.1.51 QDSUB

    Bits:   31-28  27-20            19-16  15-12  11-8  7-4      3-0
    Field:  cond   0 0 0 1 0 1 1 0  Rn     Rd     SBZ   0 1 0 1  Rm

QDSUB (Saturating Double and Subtract) doubles its second operand, then subtracts the result from its first operand.

Both the doubling and the subtraction have their results saturated to the 32-bit signed integer range -2^31 ≤ x ≤ 2^31 - 1. If saturation occurs in either operation, QDSUB sets the Q flag in the CPSR.

Syntax

QDSUB{<cond>} <Rd>, <Rm>, <Rn>

where:

<cond>      Is the condition under which the instruction is executed. The conditions are defined in The condition field on page A3-3. If <cond> is omitted, the AL (always) condition is used.

<Rd>        Specifies the destination register.

<Rm>        Specifies the register that contains the first operand.

<Rn>        Specifies the register whose value is to be doubled, saturated, and used as the second operand for the saturated subtraction.

Rm and Rn are in reversed order in the assembler syntax, compared with the majority of ARM instructions.

Architecture version

Version 5TE and above.

Exceptions

None.

Operation

if ConditionPassed(cond) then
    Rd = SignedSat(Rm - SignedSat(Rn*2, 32), 32)
    if SignedDoesSat(Rm - SignedSat(Rn*2, 32), 32) or
       SignedDoesSat(Rn*2, 32) then
        Q Flag = 1

Usage

The primary use for this instruction is to generate multiply-subtract operations on Q15 and Q31 numbers, by placing it after an integer multiply instruction.
Three examples are:

• To multiply the Q15 numbers in the top half of R4 and the bottom half of R5, and subtract the product from the Q31 number in R6, use:

    SMULTB  R0, R4, R5
    QDSUB   R6, R6, R0

• To multiply the Q15 number in the bottom half of R2 by the Q31 number in R3 and subtract the product from the Q31 number in R7, use:

    SMULWB  R0, R3, R2
    QDSUB   R7, R7, R0

• To multiply the Q31 numbers in R2 and R3 and subtract the product from the Q31 number in R4, use:

    SMULL   R0, R1, R2, R3
    QDSUB   R4, R4, R1

Notes

Use of R15
    Specifying R15 for register <Rd>, <Rm>, or <Rn> has UNPREDICTABLE results.

Condition flags
    The QDSUB instruction does not affect the N, Z, C, or V flags.

A4.1.52 QSUB

    Bits:   31-28  27-20            19-16  15-12  11-8  7-4      3-0
    Field:  cond   0 0 0 1 0 0 1 0  Rn     Rd     SBZ   0 1 0 1  Rm

QSUB (Saturating Subtract) performs integer subtraction. It saturates the result to the 32-bit signed integer range -2^31 ≤ x ≤ 2^31 - 1. If saturation occurs, QSUB sets the Q flag in the CPSR.

Syntax

QSUB{<cond>} <Rd>, <Rm>, <Rn>

where:

<cond>      Is the condition under which the instruction is executed. The conditions are defined in The condition field on page A3-3. If <cond> is omitted, the AL (always) condition is used.

<Rd>        Specifies the destination register.

<Rm>        Specifies the register that contains the first operand.

<Rn>        Specifies the register that contains the second operand.

Rm and Rn are in reversed order in the assembler syntax, compared with the majority of ARM instructions.

Architecture version

Version 5TE and above.

Exceptions

None.

Operation

if ConditionPassed(cond) then
    Rd = SignedSat(Rm - Rn, 32)
    if SignedDoesSat(Rm - Rn, 32) then
        Q Flag = 1

Notes

Use of R15
    Specifying R15 for register <Rd>, <Rm>, or <Rn> has UNPREDICTABLE results.

Condition flags
    QSUB does not affect the N, Z, C, or V flags.
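
The operand order of QSUB and QDSUB, and the second multiply-subtract example above (SMULWB followed by QDSUB), can be sketched as follows. This is an illustrative Python model, not an architectural definition: signed_sat32, qsub, qdsub and smulwb are hypothetical names, the SMULWB model is assumed to return bits[47:16] of the signed 48-bit product, and register values are held as signed Python integers.

```python
INT32_MIN, INT32_MAX = -(1 << 31), (1 << 31) - 1


def signed_sat32(value: int):
    """Return the pair (SignedSat(value, 32), SignedDoesSat(value, 32))."""
    clamped = max(INT32_MIN, min(INT32_MAX, value))
    return clamped, clamped != value


def qsub(rm: int, rn: int):
    """Model of QSUB Rd, Rm, Rn: Rm is the minuend; returns (Rd, saturated)."""
    return signed_sat32(rm - rn)


def qdsub(rm: int, rn: int):
    """Model of QDSUB Rd, Rm, Rn: Rn is doubled, then subtracted from Rm."""
    doubled, sat_double = signed_sat32(rn * 2)
    diff, sat_sub = signed_sat32(rm - doubled)
    return diff, sat_double or sat_sub


def smulwb(rm: int, rs: int) -> int:
    """Assumed model of SMULWB: (Rm * signed Rs[15:0]) with the low 16 bits dropped."""
    low = rs & 0xFFFF
    if low & 0x8000:
        low -= 0x10000
    return (rm * low) >> 16


# Operand order matters: QSUB Rd, Rm, Rn computes Rm - Rn.
print(qsub(5, 3))                 # (2, False)
print(qsub(INT32_MIN, 1))         # (-2147483648, True)

# Q31 0.75 minus (Q31 0.5 * Q15 0.5) leaves Q31 0.5.
r0 = smulwb(0x40000000, 0x4000)   # SMULWB R0, R3, R2
r7, saturated = qdsub(0x60000000, r0)
print(hex(r7), saturated)         # 0x40000000 False
```

Returning the saturation indicator alongside the result keeps the model free of global state; a caller that tracks the Q flag would OR the indicator into its own copy.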
A4.1.53 QSUB16

Bits    31-28  27-20            19-16  15-12  11-8  7-4      3-0
Field   cond   0 1 1 0 0 0 1 0  Rn     Rd     SBO   0 1 1 1  Rm

QSUB16 performs two 16-bit subtractions. It saturates the results to the 16-bit signed integer range –2^15 ≤ x ≤ 2^15 – 1. QSUB16 does not affect any flags.

Syntax

QSUB16{<cond>} <Rd>, <Rn>, <Rm>

where:

<cond>
    Is the condition under which the instruction is executed. The conditions are defined in The condition field on page A3-3. If <cond> is omitted, the AL (always) condition is used.

<Rd>
    Specifies the destination register.

<Rn>
    Specifies the register that contains the first operand.

<Rm>
    Specifies the register that contains the second operand.

Architecture version

Version 6 and above.

Exceptions

None.

Operation

if ConditionPassed(cond) then
    Rd[15:0]  = SignedSat(Rn[15:0]  - Rm[15:0],  16)
    Rd[31:16] = SignedSat(Rn[31:16] - Rm[31:16], 16)

Usage

Use QSUB16 in similar ways to the SSUB16 instruction, but for signed saturated arithmetic. QSUB16 does not set the GE bits for use with SEL. See SSUB16 on page A4-180 for more details.

Notes

Use of R15
    Specifying R15 for register <Rd>, <Rm>, or <Rn> has UNPREDICTABLE results.

A4.1.54 QSUB8

Bits    31-28  27-20            19-16  15-12  11-8  7-4      3-0
Field   cond   0 1 1 0 0 0 1 0  Rn     Rd     SBO   1 1 1 1  Rm

QSUB8 performs four 8-bit subtractions. It saturates the results to the 8-bit signed integer range –2^7 ≤ x ≤ 2^7 – 1. QSUB8 does not affect any flags.

Syntax

QSUB8{<cond>} <Rd>, <Rn>, <Rm>

where:

<cond>
    Is the condition under which the instruction is executed. The conditions are defined in The condition field on page A3-3. If <cond> is omitted, the AL (always) condition is used.

<Rd>
    Specifies the destination register.

<Rn>
    Specifies the register that contains the first operand.

<Rm>
    Specifies the register that contains the second operand.

Architecture version

Version 6 and above.

Exceptions

None.
Operation

if ConditionPassed(cond) then
    Rd[7:0]   = SignedSat(Rn[7:0]   - Rm[7:0],   8)
    Rd[15:8]  = SignedSat(Rn[15:8]  - Rm[15:8],  8)
    Rd[23:16] = SignedSat(Rn[23:16] - Rm[23:16], 8)
    Rd[31:24] = SignedSat(Rn[31:24] - Rm[31:24], 8)

Usage

Use QSUB8 in similar ways to SSUB8, but for signed saturated arithmetic. QSUB8 does not set the GE bits for use with SEL. See SSUB8 on page A4-182 for more details.

Notes

Use of R15
    Specifying R15 for register <Rd>, <Rm>, or <Rn> has UNPREDICTABLE results.

A4.1.55 QSUBADDX

Bits    31-28  27-20            19-16  15-12  11-8  7-4      3-0
Field   cond   0 1 1 0 0 0 1 0  Rn     Rd     SBO   0 1 0 1  Rm

QSUBADDX (Saturating Subtract and Add with Exchange) performs one 16-bit signed integer addition and one 16-bit signed integer subtraction, saturating the results to the 16-bit signed integer range –2^15 ≤ x ≤ 2^15 – 1. It exchanges the two halfwords of the second operand before it performs the arithmetic. QSUBADDX does not affect any flags.

Syntax

QSUBADDX{<cond>} <Rd>, <Rn>, <Rm>

where:

<cond>
    Is the condition under which the instruction is executed. The conditions are defined in The condition field on page A3-3. If <cond> is omitted, the AL (always) condition is used.

<Rd>
    Specifies the destination register.

<Rn>
    Specifies the register that contains the first operand.

<Rm>
    Specifies the register that contains the second operand.

Architecture version

Version 6 and above.

Exceptions

None.

Operation

if ConditionPassed(cond) then
    Rd[31:16] = SignedSat(Rn[31:16] - Rm[15:0],  16)
    Rd[15:0]  = SignedSat(Rn[15:0]  + Rm[31:16], 16)

Usage

You can use QSUBADDX for operations on complex numbers that are held as pairs of 16-bit integers or Q15 numbers.
If you hold the real and imaginary parts of a complex number in the bottom and top half of a register respectively, then the instruction:

    QSUBADDX Rd, Ra, Rb

performs the complex arithmetic operation Rd = (Ra – i * Rb).

QSUBADDX does not set the Q flag, even if saturation occurs on either operation.

Notes

Use of R15
    Specifying R15 for register <Rd>, <Rm>, or <Rn> has UNPREDICTABLE results.

A4.1.56 REV

Bits    31-28  27-20            19-16  15-12  11-8  7-4      3-0
Field   cond   0 1 1 0 1 0 1 1  SBO    Rd     SBO   0 0 1 1  Rm

REV (Byte-Reverse Word) reverses the byte order in a 32-bit register.

Syntax

REV{<cond>} <Rd>, <Rm>

where:

<cond>
    Is the condition under which the instruction is executed. The conditions are defined in The condition field on page A3-3. If <cond> is omitted, the AL (always) condition is used.

<Rd>
    Specifies the destination register.

<Rm>
    Specifies the register that contains the operand.

Architecture version

Version 6 and above.

Exceptions

None.

Operation

if ConditionPassed(cond) then
    Rd[31:24] = Rm[ 7: 0]
    Rd[23:16] = Rm[15: 8]
    Rd[15: 8] = Rm[23:16]
    Rd[ 7: 0] = Rm[31:24]

Usage

Use REV to convert 32-bit big-endian data into little-endian data, or 32-bit little-endian data into big-endian data.

Notes

Use of R15
    Specifying R15 for register <Rd> or <Rm> has UNPREDICTABLE results.

A4.1.57 REV16

Bits    31-28  27-20            19-16  15-12  11-8  7-4      3-0
Field   cond   0 1 1 0 1 0 1 1  SBO    Rd     SBO   1 0 1 1  Rm

REV16 (Byte-Reverse Packed Halfword) reverses the byte order in each 16-bit halfword of a 32-bit register.

Syntax

REV16{<cond>} <Rd>, <Rm>

where:

<cond>
    Is the condition under which the instruction is executed. The conditions are defined in The condition field on page A3-3. If <cond> is omitted, the AL (always) condition is used.

<Rd>
    Specifies the destination register.

<Rm>
    Specifies the register that contains the operand.

Architecture version

Version 6 and above.
Exceptions

None.

Operation

if ConditionPassed(cond) then
    Rd[15: 8] = Rm[ 7: 0]
    Rd[ 7: 0] = Rm[15: 8]
    Rd[31:24] = Rm[23:16]
    Rd[23:16] = Rm[31:24]

Usage

Use REV16 to convert 16-bit big-endian data into little-endian data, or 16-bit little-endian data into big-endian data.

Notes

Use of R15
    Specifying R15 for register <Rd> or <Rm> has UNPREDICTABLE results.

A4.1.58 REVSH

Bits    31-28  27-20            19-16  15-12  11-8  7-4      3-0
Field   cond   0 1 1 0 1 1 1 1  SBO    Rd     SBO   1 0 1 1  Rm

REVSH (Byte-Reverse Signed Halfword) reverses the byte order in the lower 16-bit halfword of a 32-bit register, and sign extends the result to 32 bits.

Syntax

REVSH{<cond>} <Rd>, <Rm>

where:

<cond>
    Is the condition under which the instruction is executed. The conditions are defined in The condition field on page A3-3. If <cond> is omitted, the AL (always) condition is used.

<Rd>
    Specifies the destination register.

<Rm>
    Specifies the register that contains the operand.

Architecture version

Version 6 and above.

Exceptions

None.

Operation

if ConditionPassed(cond) then
    Rd[15: 8] = Rm[ 7: 0]
    Rd[ 7: 0] = Rm[15: 8]
    if Rm[7] == 1 then
        Rd[31:16] = 0xFFFF
    else
        Rd[31:16] = 0x0000

Usage

Use REVSH to convert either:

• 16-bit signed big-endian data into 32-bit signed little-endian data
• 16-bit signed little-endian data into 32-bit signed big-endian data.

Notes

Use of R15
    Specifying R15 for register <Rd> or <Rm> has UNPREDICTABLE results.

A4.1.59 RFE

Bits    31-28    27-25  24  23  22  21  20  19-16  15-12  11-8     7-0
Field   1 1 1 1  1 0 0  P   U   0   W   1   Rn     SBZ    1 0 1 0  SBZ

RFE (Return From Exception) loads the PC and the CPSR from the word at the specified address and the following word respectively.
Syntax

RFE<addressing_mode> <Rn>{!}

where:

<addressing_mode>
    Is similar to the <addressing_mode> in LDM and STM instructions, see Addressing Mode 4 - Load and Store Multiple on page A5-41, but with the following differences:
    • The number of registers to load is 2.
    • The register list is {PC, CPSR}.

<Rn>
    Specifies the base register to be used by <addressing_mode>. If R15 is specified as the base register, the result is UNPREDICTABLE.

!
    If present, sets the W bit. This causes the instruction to write a modified value back to its base register, in a manner similar to that specified for Addressing Mode 4 - Load and Store Multiple on page A5-41. If ! is omitted, the W bit is 0 and the instruction does not change the base register.

Architecture version

Version 6 and above.

Exceptions

Data Abort.

Usage

RFE supports any base register other than R15, but the typical case is Rn == SP (the stack pointer, held in R13). The instruction can then be used as the return method associated with the SRS and CPS instructions. See New instructions to improve exception handling on page A2-28 for more details.

Operation

address = start_address
value = Memory[address,4]
if InAPrivilegedMode() then
    CPSR = Memory[address+4,4]
else
    UNPREDICTABLE
PC = value
assert end_address == address + 8

where start_address and end_address are determined as described in Addressing Mode 4 - Load and Store Multiple on page A5-41, except that Number_Of_Set_Bits_in(register_list) evaluates to 2, rather than depending on bits[15:0] of the instruction.

Notes

Data Abort
    For details of the effects of this instruction if a Data Abort occurs, see Data Abort (data access memory abort) on page A2-21.

Non word-aligned addresses
    In ARMv6, an address with bits[1:0] != 0b00 causes an alignment exception if the CP15 register 1 bits U==1 or A==1, otherwise RFE behaves as if bits[1:0] are 0b00.
    In earlier implementations, if they include a System Control coprocessor (see Chapter B3 The System Control Coprocessor), an address with bits[1:0] != 0b00 causes an alignment exception if the CP15 register 1 bit A==1, otherwise RFE behaves as if bits[1:0] are 0b00.

Time order
    The time order of the accesses to individual words of memory generated by RFE is not architecturally defined. Do not use this instruction on memory-mapped I/O locations where access order matters.

User mode
    RFE is UNPREDICTABLE in User mode.

Condition
    Unlike most other ARM instructions, RFE cannot be executed conditionally.

ARM/Thumb State transfers
    If the CPSR T bit as loaded is 0 and bit[1] of the value loaded into the PC is 1, the results are UNPREDICTABLE because it is not possible to branch to an ARM instruction at a non word-aligned address.

A4.1.60 RSB

Bits    31-28  27-26  25  24-21    20  19-16  15-12  11-0
Field   cond   0 0    I   0 0 1 1  S   Rn     Rd     shifter_operand

RSB (Reverse Subtract) subtracts a value from a second value.

The first value comes from a register. The second value can be either an immediate value or a value from a register, and can be shifted before the subtraction. This is the reverse of the normal order of operands in ARM assembler language.

RSB can optionally update the condition code flags, based on the result.

Syntax

RSB{<cond>}{S} <Rd>, <Rn>, <shifter_operand>

where:

<cond>
    Is the condition under which the instruction is executed. The conditions are defined in The condition field on page A3-3. If <cond> is omitted, the AL (always) condition is used.

S
    Sets the S bit (bit[20]) in the instruction to 1 and specifies that the instruction updates the CPSR. If S is omitted, the S bit is set to 0 and the CPSR is not changed by the instruction.
    Two types of CPSR update can occur when S is specified:
    • If <Rd> is not R15, the N and Z flags are set according to the result of the subtraction, and the C and V flags are set according to whether the subtraction generated a borrow (unsigned underflow) and a signed overflow, respectively. The rest of the CPSR is unchanged.
    • If <Rd> is R15, the SPSR of the current mode is copied to the CPSR. This form of the instruction is UNPREDICTABLE if executed in User mode or System mode, because these modes do not have an SPSR.

<Rd>
    Specifies the destination register.

<Rn>
    Specifies the register that contains the second operand.

<shifter_operand>
    Specifies the first operand. The options for this operand are described in Addressing Mode 1 - Data-processing operands on page A5-2, including how each option causes the I bit (bit[25]) and the shifter_operand bits (bits[11:0]) to be set in the instruction.
    If the I bit is 0 and both bit[7] and bit[4] of shifter_operand are 1, the instruction is not RSB. Instead, see Extending the instruction set on page A3-32 to determine which instruction it is.

Architecture version

All.

Exceptions

None.

Operation

if ConditionPassed(cond) then
    Rd = shifter_operand - Rn
    if S == 1 and Rd == R15 then
        if CurrentModeHasSPSR() then
            CPSR = SPSR
        else UNPREDICTABLE
    else if S == 1 then
        N Flag = Rd[31]
        Z Flag = if Rd == 0 then 1 else 0
        C Flag = NOT BorrowFrom(shifter_operand - Rn)
        V Flag = OverflowFrom(shifter_operand - Rn)

Usage

The following instruction stores the negation (twos complement) of Rx in Rd:

    RSB Rd, Rx, #0

You can perform constant multiplication (of Rx) by 2^n – 1 (into Rd) with:

    RSB Rd, Rx, Rx, LSL #n

Notes

C flag
    If S is specified, the C flag is set to:
    1    if no borrow occurs
    0    if a borrow does occur.
    In other words, the C flag is used as a NOT(borrow) flag.
    This inversion of the borrow condition is used by subsequent instructions: SBC and RSC use the C flag as a NOT(borrow) operand, performing a normal subtraction if C == 1 and subtracting one more than usual if C == 0.
    The HS (unsigned higher or same) and LO (unsigned lower) conditions are equivalent to CS (carry set) and CC (carry clear) respectively.

A4.1.61 RSC

Bits    31-28  27-26  25  24-21    20  19-16  15-12  11-0
Field   cond   0 0    I   0 1 1 1  S   Rn     Rd     shifter_operand

RSC (Reverse Subtract with Carry) subtracts one value from another, taking account of any borrow from a preceding less significant subtraction. The normal order of the operands is reversed, to allow subtraction from a shifted register value, or from an immediate value.

RSC can optionally update the condition code flags, based on the result.

Syntax

RSC{<cond>}{S} <Rd>, <Rn>, <shifter_operand>

where:

<cond>
    Is the condition under which the instruction is executed. The conditions are defined in The condition field on page A3-3. If <cond> is omitted, the AL (always) condition is used.

S
    Sets the S bit (bit[20]) in the instruction to 1 and specifies that the instruction updates the CPSR. If S is omitted, the S bit is set to 0 and the CPSR is not changed by the instruction. Two types of CPSR update can occur when S is specified:
    • If <Rd> is not R15, the N and Z flags are set according to the result of the subtraction, and the C and V flags are set according to whether the subtraction generated a borrow (unsigned underflow) and a signed overflow, respectively. The rest of the CPSR is unchanged.
    • If <Rd> is R15, the SPSR of the current mode is copied to the CPSR. This form of the instruction is UNPREDICTABLE if executed in User mode or System mode, because these modes do not have an SPSR.

<Rd>
    Specifies the destination register.

<Rn>
    Specifies the register that contains the second operand.

<shifter_operand>
    Specifies the first operand.
    The options for this operand are described in Addressing Mode 1 - Data-processing operands on page A5-2, including how each option causes the I bit (bit[25]) and the shifter_operand bits (bits[11:0]) to be set in the instruction.
    If the I bit is 0 and both bit[7] and bit[4] of shifter_operand are 1, the instruction is not RSC. Instead, see Extending the instruction set on page A3-32 to determine which instruction it is.

Architecture version

All.

Exceptions

None.

Operation

if ConditionPassed(cond) then
    Rd = shifter_operand - Rn - NOT(C Flag)
    if S == 1 and Rd == R15 then
        if CurrentModeHasSPSR() then
            CPSR = SPSR
        else UNPREDICTABLE
    else if S == 1 then
        N Flag = Rd[31]
        Z Flag = if Rd == 0 then 1 else 0
        C Flag = NOT BorrowFrom(shifter_operand - Rn - NOT(C Flag))
        V Flag = OverflowFrom(shifter_operand - Rn - NOT(C Flag))

Usage

Use RSC to synthesize multi-word subtraction, in cases where you need the order of the operands reversed to allow subtraction from a shifted register value, or from an immediate value.

Example

You can negate the 64-bit value in R0,R1 using the following sequence (R0 holds the least significant word), which stores the result in R2,R3:

    RSBS    R2,R0,#0
    RSC     R3,R1,#0

Notes

C flag
    If S is specified, the C flag is set to:
    1    if no borrow occurs
    0    if a borrow does occur.
    In other words, the C flag is used as a NOT(borrow) flag. This inversion of the borrow condition is used by subsequent instructions: SBC and RSC use the C flag as a NOT(borrow) operand, performing a normal subtraction if C == 1 and subtracting one more than usual if C == 0.
    The HS (unsigned higher or same) and LO (unsigned lower) conditions are equivalent to CS (carry set) and CC (carry clear) respectively.
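The 64-bit negation example and the NOT(borrow) convention for the C flag can be checked with a small Python model. This is a sketch under stated assumptions: the helper names rsbs, rsc and negate64 are invented for illustration, register values are unsigned 32-bit integers, and only the C flag is modelled (N, Z and V are ignored).

```python
MASK32 = 0xFFFFFFFF


def rsbs(rn, operand):
    """Model of RSBS Rd, Rn, #operand: Rd = operand - Rn.

    Returns (Rd, C), where C = 1 if no borrow occurred and 0 if it did.
    """
    diff = operand - rn
    return diff & MASK32, 1 if diff >= 0 else 0


def rsc(rn, operand, c_flag):
    """Model of RSC Rd, Rn, #operand: Rd = operand - Rn - NOT(C)."""
    diff = operand - rn - (1 - c_flag)
    return diff & MASK32


def negate64(r0, r1):
    """Negate the 64-bit value held in R0 (low word) and R1 (high word).

    Mirrors:  RSBS R2,R0,#0
              RSC  R3,R1,#0
    Returns (R2, R3).
    """
    r2, c_flag = rsbs(r0, 0)
    r3 = rsc(r1, 0, c_flag)
    return r2, r3
```

For instance, negating 1 (R0 = 1, R1 = 0) borrows in the low word, so C is 0 and RSC subtracts one more than usual, giving 0xFFFFFFFF in both result words.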
A4.1.62 SADD16

Bits    31-28  27-20            19-16  15-12  11-8  7-4      3-0
Field   cond   0 1 1 0 0 0 0 1  Rn     Rd     SBO   0 0 0 1  Rm

SADD16 (Signed Add) performs two 16-bit signed integer additions. It sets the GE bits in the CPSR according to the results of the additions.

Syntax

SADD16{<cond>} <Rd>, <Rn>, <Rm>

where:

<cond>
    Is the condition under which the instruction is executed. The conditions are defined in The condition field on page A3-3. If <cond> is omitted, the AL (always) condition is used.

<Rd>
    Specifies the destination register.

<Rn>
    Specifies the register that contains the first operand.

<Rm>
    Specifies the register that contains the second operand.

Architecture version

ARMv6 and above.

Exceptions

None.

Operation

if ConditionPassed(cond) then
    sum = Rn[15:0] + Rm[15:0]       /* Signed addition */
    Rd[15:0] = sum[15:0]
    GE[1:0] = if sum >= 0 then 0b11 else 0
    sum = Rn[31:16] + Rm[31:16]     /* Signed addition */
    Rd[31:16] = sum[15:0]
    GE[3:2] = if sum >= 0 then 0b11 else 0

Usage

Use the SADD16 instruction to speed up operations on arrays of halfword data. For example, consider the instruction sequence:

    LDR     R3, [R0], #4
    LDR     R5, [R1], #4
    SADD16  R3, R3, R5
    STR     R3, [R2], #4

This performs the same operations as the instruction sequence:

    LDRH    R3, [R0], #2
    LDRH    R4, [R1], #2
    ADD     R3, R3, R4
    STRH    R3, [R2], #2
    LDRH    R3, [R0], #2
    LDRH    R4, [R1], #2
    ADD     R3, R3, R4
    STRH    R3, [R2], #2

The first sequence uses half as many instructions and typically half as many cycles as the second sequence.

You can also use SADD16 for operations on complex numbers that are held as pairs of 16-bit integers or Q15 numbers. If you hold the real and imaginary parts of a complex number in the bottom and top half of a register respectively, then the instruction:

    SADD16 Rd, Ra, Rb

performs the complex arithmetic operation Rd = Ra + Rb.

SADD16 sets the GE flags according to the results of each addition.
You can use these in a following SEL instruction. See SEL on page A4-127.

Notes

Use of R15
    Specifying R15 for register <Rd>, <Rm>, or <Rn> has UNPREDICTABLE results.

A4.1.63 SADD8

Bits    31-28  27-20            19-16  15-12  11-8  7-4      3-0
Field   cond   0 1 1 0 0 0 0 1  Rn     Rd     SBO   1 0 0 1  Rm

SADD8 performs four 8-bit signed integer additions. It sets the GE bits in the CPSR according to the results of the additions.

Syntax

SADD8{<cond>} <Rd>, <Rn>, <Rm>

where:

<cond>
    Is the condition under which the instruction is executed. The conditions are defined in The condition field on page A3-3. If <cond> is omitted, the AL (always) condition is used.

<Rd>
    Specifies the destination register.

<Rn>
    Specifies the register that contains the first operand.

<Rm>
    Specifies the register that contains the second operand.

Architecture version

ARMv6 and above.

Exceptions

None.

Operation

if ConditionPassed(cond) then
    sum = Rn[7:0] + Rm[7:0]         /* Signed addition */
    Rd[7:0] = sum[7:0]
    GE[0] = if sum >= 0 then 1 else 0
    sum = Rn[15:8] + Rm[15:8]       /* Signed addition */
    Rd[15:8] = sum[7:0]
    GE[1] = if sum >= 0 then 1 else 0
    sum = Rn[23:16] + Rm[23:16]     /* Signed addition */
    Rd[23:16] = sum[7:0]
    GE[2] = if sum >= 0 then 1 else 0
    sum = Rn[31:24] + Rm[31:24]     /* Signed addition */
    Rd[31:24] = sum[7:0]
    GE[3] = if sum >= 0 then 1 else 0

Usage

Use SADD8 to speed up operations on arrays of byte data. This is similar to the way you can use the SADD16 instruction. See the usage subsection for SADD16 on page A4-119 for details.

SADD8 sets the GE flags according to the results of each addition. You can use these in a following SEL instruction, see SEL on page A4-127.

Notes

Use of R15
    Specifying R15 for register <Rd>, <Rm>, or <Rn> has UNPREDICTABLE results.
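The lane-wise behaviour of SADD16 and SADD8, including how the GE bits are derived from the sign of each full-precision sum, can be illustrated with a short Python model. This is a sketch only: the names to_signed, signed_add_lanes, sadd16 and sadd8 are hypothetical, register values are unsigned 32-bit integers, and the GE field is returned as a 4-bit integer rather than written to a CPSR.

```python
def to_signed(value, bits):
    """Interpret the low `bits` bits of value as a twos complement number."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def signed_add_lanes(rn, rm, width):
    """Add rn and rm lane by lane; return (Rd, GE).

    Each lane result wraps to `width` bits. A lane's GE bits are set when
    the untruncated signed sum is >= 0: two GE bits per 16-bit lane, one
    GE bit per 8-bit lane.
    """
    lane_mask = (1 << width) - 1
    ge_per_lane = width // 8
    rd, ge = 0, 0
    for lane in range(32 // width):
        shift = lane * width
        total = to_signed(rn >> shift, width) + to_signed(rm >> shift, width)
        rd |= (total & lane_mask) << shift
        if total >= 0:
            ge |= ((1 << ge_per_lane) - 1) << (lane * ge_per_lane)
    return rd, ge


def sadd16(rn, rm):
    """Model of SADD16: two 16-bit signed additions."""
    return signed_add_lanes(rn, rm, 16)


def sadd8(rn, rm):
    """Model of SADD8: four 8-bit signed additions."""
    return signed_add_lanes(rn, rm, 8)
```

Note that a lane can wrap while its GE bit is still set: in sadd8, 0x7F + 0x01 stores 0x80 in the lane, but the full sum (128) is non-negative, so the corresponding GE bit is 1.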
A4.1.64 SADDSUBX

Bits    31-28  27-20            19-16  15-12  11-8  7-4      3-0
Field   cond   0 1 1 0 0 0 0 1  Rn     Rd     SBO   0 0 1 1  Rm

SADDSUBX (Signed Add and Subtract with Exchange) performs one 16-bit signed integer addition and one 16-bit signed integer subtraction. It exchanges the two halfwords of the second operand before it performs the arithmetic. It sets the GE bits in the CPSR according to the results of the addition and the subtraction.

Syntax

SADDSUBX{<cond>} <Rd>, <Rn>, <Rm>

where:

<cond>
    Is the condition under which the instruction is executed. The conditions are defined in The condition field on page A3-3. If <cond> is omitted, the AL (always) condition is used.

<Rd>
    Specifies the destination register.

<Rn>
    Specifies the register that contains the first operand.

<Rm>
    Specifies the register that contains the second operand.

Architecture version

ARMv6 and above.

Exceptions

None.

Operation

if ConditionPassed(cond) then
    sum = Rn[31:16] + Rm[15:0]      /* Signed addition */
    Rd[31:16] = sum[15:0]
    GE[3:2] = if sum >= 0 then 0b11 else 0
    diff = Rn[15:0] - Rm[31:16]     /* Signed subtraction */
    Rd[15:0] = diff[15:0]
    GE[1:0] = if diff >= 0 then 0b11 else 0

Usage

You can use SADDSUBX for operations on complex numbers that are held as pairs of 16-bit integers or Q15 numbers. If you hold the real and imaginary parts of a complex number in the bottom and top half of a register respectively, then the instruction:

    SADDSUBX Rd, Ra, Rb

performs the complex arithmetic operation Rd = Ra + (i * Rb).

SADDSUBX sets the GE flags according to the results of the operation. You can use these in a following SEL instruction, see SEL on page A4-127.

Notes

Use of R15
    Specifying R15 for register <Rd>, <Rm>, or <Rn> has UNPREDICTABLE results.
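To see why SADDSUBX computes Ra + (i * Rb) for packed complex numbers, the Operation pseudocode can be modelled in Python and compared against Python's built-in complex type. The helpers to_signed, pack, unpack and saddsubx are names assumed for this sketch, and the model applies only while neither 16-bit lane overflows.

```python
def to_signed(value, bits):
    """Interpret the low `bits` bits of value as a twos complement number."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def pack(real, imag):
    """Pack a complex number: real part in bits[15:0], imaginary in bits[31:16]."""
    return ((imag & 0xFFFF) << 16) | (real & 0xFFFF)


def unpack(word):
    """Unpack a register value into a Python complex number."""
    return complex(to_signed(word, 16), to_signed(word >> 16, 16))


def saddsubx(rn, rm):
    """Model of SADDSUBX; returns (Rd, GE) with GE as a 4-bit integer."""
    total = to_signed(rn >> 16, 16) + to_signed(rm, 16)   # top: Rn[31:16] + Rm[15:0]
    diff = to_signed(rn, 16) - to_signed(rm >> 16, 16)    # bottom: Rn[15:0] - Rm[31:16]
    rd = ((total & 0xFFFF) << 16) | (diff & 0xFFFF)
    ge = (0b1100 if total >= 0 else 0) | (0b0011 if diff >= 0 else 0)
    return rd, ge
```

With Ra = 3 + 4i and Rb = 1 + 2i, the model produces 1 + 5i, which matches (3 + 4i) + i(1 + 2i) = (3 - 2) + (4 + 1)i.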
A4.1.65 SBC

Bits    31-28  27-26  25  24-21    20  19-16  15-12  11-0
Field   cond   0 0    I   0 1 1 0  S   Rn     Rd     shifter_operand

SBC (Subtract with Carry) subtracts the value of its second operand and the value of NOT(Carry flag) from the value of its first operand. The first operand comes from a register. The second operand can be either an immediate value or a value from a register, and can be shifted before the subtraction.

Use SBC to synthesize multi-word subtraction.

SBC can optionally update the condition code flags, based on the result.

Syntax

SBC{<cond>}{S} <Rd>, <Rn>, <shifter_operand>

where:

<cond>
    Is the condition under which the instruction is executed. The conditions are defined in The condition field on page A3-3. If <cond> is omitted, the AL (always) condition is used.

S
    Sets the S bit (bit[20]) in the instruction to 1 and specifies that the instruction updates the CPSR. If S is omitted, the S bit is set to 0 and the CPSR is not changed by the instruction. Two types of CPSR update can occur when S is specified:
    • If <Rd> is not R15, the N and Z flags are set according to the result of the subtraction, and the C and V flags are set according to whether the subtraction generated a borrow (unsigned underflow) and a signed overflow, respectively. The rest of the CPSR is unchanged.
    • If <Rd> is R15, the SPSR of the current mode is copied to the CPSR. This form of the instruction is UNPREDICTABLE if executed in User mode or System mode, because these modes do not have an SPSR.

<Rd>
    Specifies the destination register.

<Rn>
    Specifies the register that contains the first operand.

<shifter_operand>
    Specifies the second operand. The options for this operand are described in Addressing Mode 1 - Data-processing operands on page A5-2, including how each option causes the I bit (bit[25]) and the shifter_operand bits (bits[11:0]) to be set in the instruction.
    If the I bit is 0 and both bit[7] and bit[4] of shifter_operand are 1, the instruction is not SBC.
    Instead, see Extending the instruction set on page A3-32 to determine which instruction it is.

Architecture version

All.

Exceptions

None.

Operation

if ConditionPassed(cond) then
    Rd = Rn - shifter_operand - NOT(C Flag)
    if S == 1 and Rd == R15 then
        if CurrentModeHasSPSR() then
            CPSR = SPSR
        else UNPREDICTABLE
    else if S == 1 then
        N Flag = Rd[31]
        Z Flag = if Rd == 0 then 1 else 0
        C Flag = NOT BorrowFrom(Rn - shifter_operand - NOT(C Flag))
        V Flag = OverflowFrom(Rn - shifter_operand - NOT(C Flag))

Usage

If register pairs R0,R1 and R2,R3 hold 64-bit values (R0 and R2 hold the least significant words), the following instructions leave the 64-bit difference in R4,R5:

    SUBS    R4,R0,R2
    SBC     R5,R1,R3

Notes

C flag
    If S is specified, the C flag is set to:
    1    if no borrow occurs
    0    if a borrow does occur.
    In other words, the C flag is used as a NOT(borrow) flag. This inversion of the borrow condition is used by subsequent instructions: SBC and RSC use the C flag as a NOT(borrow) operand, performing a normal subtraction if C == 1 and subtracting one more than usual if C == 0.
    The HS (unsigned higher or same) and LO (unsigned lower) conditions are equivalent to CS (carry set) and CC (carry clear) respectively.

A4.1.66 SEL

Bits    31-28  27-20            19-16  15-12  11-8  7-4      3-0
Field   cond   0 1 1 0 1 0 0 0  Rn     Rd     SBO   1 0 1 1  Rm

SEL (Select) selects each byte of its result from either its first operand or its second operand, according to the values of the GE flags.

Syntax

SEL{<cond>} <Rd>, <Rn>, <Rm>

where:

<cond>
    Is the condition under which the instruction is executed. The conditions are defined in The condition field on page A3-3. If <cond> is omitted, the AL (always) condition is used.

<Rd>
    Specifies the destination register.

<Rn>
    Specifies the register that contains the first operand.
<Rm>
    Specifies the register that contains the second operand.

Architecture version

ARMv6 and above.

Exceptions

None.

Operation

if ConditionPassed(cond) then
    Rd[7:0]   = if GE[0] == 1 then Rn[7:0]   else Rm[7:0]
    Rd[15:8]  = if GE[1] == 1 then Rn[15:8]  else Rm[15:8]
    Rd[23:16] = if GE[2] == 1 then Rn[23:16] else Rm[23:16]
    Rd[31:24] = if GE[3] == 1 then Rn[31:24] else Rm[31:24]

Usage

Use SEL after instructions such as SADD8, SADD16, SSUB8, SSUB16, UADD8, UADD16, USUB8, USUB16, SADDSUBX, SSUBADDX, UADDSUBX and USUBADDX, that set the GE flags.

For example, the following sequence of instructions sets each byte of Rd equal to the unsigned minimum of the corresponding bytes of Ra and Rb:

    USUB8   Rd, Ra, Rb
    SEL     Rd, Rb, Ra

Notes

Use of R15
    Specifying R15 for register <Rd>, <Rm>, or <Rn> has UNPREDICTABLE results.

A4.1.67 SETEND

Bits    31-28    27-20            19-16    15-10  9  8    7-4      3-0
Field   1 1 1 1  0 0 0 1 0 0 0 0  0 0 0 1  SBZ    E  SBZ  0 0 0 0  SBZ

SETEND modifies the CPSR E bit, without changing any other bits in the CPSR.

Syntax

SETEND <endian_specifier>

where:

<endian_specifier>
    Is one of:
    BE    Sets the E bit in the instruction. This sets the CPSR E bit.
    LE    Clears the E bit in the instruction. This clears the CPSR E bit.

Architecture version

ARMv6 and above.

Exceptions

None.

Operation

CPSR = CPSR with specified E bit modification

Usage

Use SETEND to change the byte order for data accesses. You can use SETEND to increase the efficiency of access to a series of big-endian data fields in an otherwise little-endian application, or to a series of little-endian data fields in an otherwise big-endian application.

Notes

Condition
    Unlike most other ARM instructions, SETEND cannot be executed conditionally.
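Returning to the SEL usage example in A4.1.66, the USUB8 and SEL pair can be modelled in Python to show why it yields a bytewise unsigned minimum. This is a sketch with assumptions: USUB8 is specified elsewhere in the manual, and the model here assumes it sets GE[i] to 1 when byte lane i subtracts without a borrow; the function names usub8, sel and bytewise_unsigned_min are invented for illustration.

```python
def usub8(rn, rm):
    """Assumed model of USUB8: four unsigned byte subtractions.

    Returns (Rd, GE). GE bit i is 1 when Rn's byte i >= Rm's byte i,
    that is, when lane i does not borrow.
    """
    rd, ge = 0, 0
    for lane in range(4):
        shift = 8 * lane
        diff = ((rn >> shift) & 0xFF) - ((rm >> shift) & 0xFF)
        rd |= (diff & 0xFF) << shift
        if diff >= 0:
            ge |= 1 << lane
    return rd, ge


def sel(rn, rm, ge):
    """Model of SEL: byte i comes from Rn if GE[i] == 1, otherwise from Rm."""
    rd = 0
    for lane in range(4):
        mask = 0xFF << (8 * lane)
        source = rn if (ge >> lane) & 1 else rm
        rd |= source & mask
    return rd


def bytewise_unsigned_min(ra, rb):
    """Mirrors:  USUB8 Rd, Ra, Rb
                 SEL   Rd, Rb, Ra
    """
    _, ge = usub8(ra, rb)      # only the GE flags of USUB8 are used
    return sel(rb, ra, ge)     # GE[i] == 1 means Ra >= Rb, so take Rb's byte
```

The difference written to Rd by USUB8 is discarded; SEL overwrites Rd, and only the GE flags carry information between the two instructions.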
A4.1.68 SHADD16

Bits    31-28  27-20            19-16  15-12  11-8  7-4      3-0
Field   cond   0 1 1 0 0 0 1 1  Rn     Rd     SBO   0 0 0 1  Rm

SHADD16 (Signed Halving Add) performs two 16-bit signed integer additions, and halves the results. It has no effect on the GE flags.

Syntax

SHADD16{<cond>} <Rd>, <Rn>, <Rm>

where:

<cond>
    Is the condition under which the instruction is executed. The conditions are defined in The condition field on page A3-3. If <cond> is omitted, the AL (always) condition is used.

<Rd>
    Specifies the destination register.

<Rn>
    Specifies the register that contains the first operand.

<Rm>
    Specifies the register that contains the second operand.

Architecture version

ARMv6 and above.

Exceptions

None.

Operation

if ConditionPassed(cond) then
    sum = Rn[15:0] + Rm[15:0]       /* Signed addition */
    Rd[15:0] = sum[16:1]
    sum = Rn[31:16] + Rm[31:16]     /* Signed addition */
    Rd[31:16] = sum[16:1]

Usage

Use SHADD16 for similar purposes to SADD16 (see SADD16 on page A4-119). SHADD16 averages the operands. It does not set any flags, as overflow is not possible.

Notes

Use of R15
    Specifying R15 for register <Rd>, <Rm>, or <Rn> has UNPREDICTABLE results.

A4.1.69 SHADD8

Bits    31-28  27-20            19-16  15-12  11-8  7-4      3-0
Field   cond   0 1 1 0 0 0 1 1  Rn     Rd     SBO   1 0 0 1  Rm

SHADD8 performs four 8-bit signed integer additions, and halves the results. It has no effect on the GE flags.

Syntax

SHADD8{<cond>} <Rd>, <Rn>, <Rm>

where:

<cond>
    Is the condition under which the instruction is executed. The conditions are defined in The condition field on page A3-3. If <cond> is omitted, the AL (always) condition is used.

<Rd>
    Specifies the destination register.

<Rn>
    Specifies the register that contains the first operand.

<Rm>
    Specifies the register that contains the second operand.

Architecture version

ARMv6 and above.

Exceptions

None.
Operation

if ConditionPassed(cond) then
    sum = Rn[7:0] + Rm[7:0]         /* Signed addition */
    Rd[7:0] = sum[8:1]
    sum = Rn[15:8] + Rm[15:8]       /* Signed addition */
    Rd[15:8] = sum[8:1]
    sum = Rn[23:16] + Rm[23:16]     /* Signed addition */
    Rd[23:16] = sum[8:1]
    sum = Rn[31:24] + Rm[31:24]     /* Signed addition */
    Rd[31:24] = sum[8:1]

Usage

Use SHADD8 for similar purposes to SADD16 (see SADD16 on page A4-119). SHADD8 averages the operands. It does not set any flags, as overflow is not possible.

Notes

Use of R15
    Specifying R15 for register <Rd>, <Rm>, or <Rn> has UNPREDICTABLE results.

A4.1.70 SHADDSUBX

Bits    31-28  27-20            19-16  15-12  11-8  7-4      3-0
Field   cond   0 1 1 0 0 0 1 1  Rn     Rd     SBO   0 0 1 1  Rm

SHADDSUBX (Signed Halving Add and Subtract with Exchange) performs one 16-bit signed integer addition and one 16-bit signed integer subtraction, and halves the results. It exchanges the two halfwords of the second operand before it performs the arithmetic.

SHADDSUBX has no effect on the GE flags.

Syntax

SHADDSUBX{<cond>} <Rd>, <Rn>, <Rm>

where:

<cond>
    Is the condition under which the instruction is executed. The conditions are defined in The condition field on page A3-3. If <cond> is omitted, the AL (always) condition is used.

<Rd>
    Specifies the destination register.

<Rn>
    Specifies the register that contains the first operand.

<Rm>
    Specifies the register that contains the second operand.

Architecture version

ARMv6 and above.

Exceptions

None.

Operation

if ConditionPassed(cond) then
    sum = Rn[31:16] + Rm[15:0]      /* Signed addition */
    Rd[31:16] = sum[16:1]
    diff = Rn[15:0] - Rm[31:16]     /* Signed subtraction */
    Rd[15:0] = diff[16:1]

Usage

Use SHADDSUBX for similar purposes to SADDSUBX, but when you want the results halved. See SADDSUBX on page A4-123 for further details.

SHADDSUBX does not set any flags, as overflow is not possible.
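The halving instructions keep bits[16:1] of a 17-bit signed intermediate result, which amounts to an arithmetic shift right by one. The Python sketch below models SHADD16 and SHADDSUBX on that basis; the helper names to_signed, halve, shadd16 and shaddsubx are assumptions made for illustration, and register values are unsigned 32-bit integers.

```python
def to_signed(value, bits):
    """Interpret the low `bits` bits of value as a twos complement number."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def halve(value):
    """Return bits[16:1] of a signed intermediate result as a 16-bit lane.

    Python's >> on a negative int is an arithmetic shift, so this
    rounds towards minus infinity.
    """
    return (value >> 1) & 0xFFFF


def shadd16(rn, rm):
    """Model of SHADD16: two 16-bit signed additions, each halved."""
    low = to_signed(rn, 16) + to_signed(rm, 16)
    high = to_signed(rn >> 16, 16) + to_signed(rm >> 16, 16)
    return (halve(high) << 16) | halve(low)


def shaddsubx(rn, rm):
    """Model of SHADDSUBX: exchange Rm's halfwords, add top, subtract bottom, halve."""
    total = to_signed(rn >> 16, 16) + to_signed(rm, 16)
    diff = to_signed(rn, 16) - to_signed(rm >> 16, 16)
    return (halve(total) << 16) | halve(diff)
```

Because the sum or difference of two 16-bit values always fits in 17 bits, the halved result always fits in 16 bits, which is why these instructions never overflow. For example, 0x7FFF + 0x7FFF halves back to 0x7FFF.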
Notes

Use of R15  Specifying R15 for register <Rd>, <Rm>, or <Rn> has UNPREDICTABLE results.

A4.1.71 SHSUB16

    31-28  27-20            19-16  15-12  11-8  7-4      3-0
    cond   0 1 1 0 0 0 1 1  Rn     Rd     SBO   0 1 1 1  Rm

SHSUB16 (Signed Halving Subtract) performs two 16-bit signed integer subtractions, and halves the results. SHSUB16 has no effect on the GE flags.

Syntax

    SHSUB16{<cond>} <Rd>, <Rn>, <Rm>

where:

<cond>  Is the condition under which the instruction is executed. The conditions are defined in The condition field on page A3-3. If <cond> is omitted, the AL (always) condition is used.
<Rd>    Specifies the destination register.
<Rn>    Specifies the register that contains the first operand.
<Rm>    Specifies the register that contains the second operand.

Architecture version

ARMv6 and above.

Exceptions

None.

Operation

    if ConditionPassed(cond) then
        diff = Rn[15:0] - Rm[15:0]       /* Signed subtraction */
        Rd[15:0] = diff[16:1]
        diff = Rn[31:16] - Rm[31:16]     /* Signed subtraction */
        Rd[31:16] = diff[16:1]

Usage

Use SHSUB16 to speed up operations on arrays of halfword data. This is similar to the way you can use SADD16. See the usage subsection for SADD16 on page A4-119 for details.

You can also use SHSUB16 for operations on complex numbers that are held as pairs of 16-bit integers or Q15 numbers. If you hold the real and imaginary parts of a complex number in the bottom and top half of a register respectively, then the instruction:

    SHSUB16 Rd, Ra, Rb

performs the complex arithmetic operation Rd = (Ra - Rb)/2.

SHSUB16 does not set any flags, as overflow is not possible.

Notes

Use of R15  Specifying R15 for register <Rd>, <Rm>, or <Rn> has UNPREDICTABLE results.
A4.1.72 SHSUB8

    31-28  27-20            19-16  15-12  11-8  7-4      3-0
    cond   0 1 1 0 0 0 1 1  Rn     Rd     SBO   1 1 1 1  Rm

SHSUB8 performs four 8-bit signed integer subtractions, and halves the results. SHSUB8 has no effect on the GE flags.

Syntax

    SHSUB8{<cond>} <Rd>, <Rn>, <Rm>

where:

<cond>  Is the condition under which the instruction is executed. The conditions are defined in The condition field on page A3-3. If <cond> is omitted, the AL (always) condition is used.
<Rd>    Specifies the destination register.
<Rn>    Specifies the register that contains the first operand.
<Rm>    Specifies the register that contains the second operand.

Architecture version

ARMv6 and above.

Exceptions

None.

Operation

    if ConditionPassed(cond) then
        diff = Rn[7:0] - Rm[7:0]         /* Signed subtraction */
        Rd[7:0] = diff[8:1]
        diff = Rn[15:8] - Rm[15:8]       /* Signed subtraction */
        Rd[15:8] = diff[8:1]
        diff = Rn[23:16] - Rm[23:16]     /* Signed subtraction */
        Rd[23:16] = diff[8:1]
        diff = Rn[31:24] - Rm[31:24]     /* Signed subtraction */
        Rd[31:24] = diff[8:1]

Usage

Use SHSUB8 to speed up operations on arrays of byte data. This is similar to the way you can use SADD16 to speed up operations on halfword data. See the usage subsection for SADD16 on page A4-119 for details. SHSUB8 does not set any flags, as overflow is not possible.

Notes

Use of R15  Specifying R15 for register <Rd>, <Rm>, or <Rn> has UNPREDICTABLE results.

A4.1.73 SHSUBADDX

    31-28  27-20            19-16  15-12  11-8  7-4      3-0
    cond   0 1 1 0 0 0 1 1  Rn     Rd     SBO   0 1 0 1  Rm

SHSUBADDX (Signed Halving Subtract and Add with Exchange) performs one 16-bit signed integer subtraction and one 16-bit signed integer addition, and halves the results. It exchanges the two halfwords of the second operand before it performs the arithmetic.
SHSUBADDX has no effect on the GE flags.

Syntax

    SHSUBADDX{<cond>} <Rd>, <Rn>, <Rm>

where:

<cond>  Is the condition under which the instruction is executed. The conditions are defined in The condition field on page A3-3. If <cond> is omitted, the AL (always) condition is used.
<Rd>    Specifies the destination register.
<Rn>    Specifies the register that contains the first operand.
<Rm>    Specifies the register that contains the second operand.

Architecture version

ARMv6 and above.

Exceptions

None.

Operation

    if ConditionPassed(cond) then
        diff = Rn[31:16] - Rm[15:0]      /* Signed subtraction */
        Rd[31:16] = diff[16:1]
        sum = Rn[15:0] + Rm[31:16]       /* Signed addition */
        Rd[15:0] = sum[16:1]

Usage

Use SHSUBADDX for similar purposes to SSUBADDX, but when you want the results halved. See SSUBADDX on page A4-184 for further details. SHSUBADDX does not set any flags, as overflow is not possible.

Notes

Use of R15  Specifying R15 for register <Rd>, <Rm>, or <Rn> has UNPREDICTABLE results.

A4.1.74 SMLA<x><y>

    31-28  27-20            19-16  15-12  11-8  7-4      3-0
    cond   0 0 0 1 0 0 0 0  Rd     Rn     Rs    1 y x 0  Rm

SMLA<x><y> (Signed multiply-accumulate BB, BT, TB, and TT) performs a signed multiply-accumulate operation. The multiply acts on two signed 16-bit quantities, taken from either the bottom or the top half of their respective source registers. The other halves of these source registers are ignored. The 32-bit product is added to a 32-bit accumulate value and the result is written to the destination register.

If overflow occurs during the addition of the accumulate value, the instruction sets the Q flag in the CPSR. It is not possible for overflow to occur during the multiplication.

Syntax

    SMLA<x><y>{<cond>} <Rd>, <Rm>, <Rs>, <Rn>

where:

<x>     Specifies which half of the source register <Rm> is used as the first multiply operand.
        If <x> is B, then x == 0 in the instruction encoding and the bottom half (bits[15:0]) of <Rm> is used. If <x> is T, then x == 1 in the instruction encoding and the top half (bits[31:16]) of <Rm> is used.
<y>     Specifies which half of the source register <Rs> is used as the second multiply operand. If <y> is B, then y == 0 in the instruction encoding and the bottom half (bits[15:0]) of <Rs> is used. If <y> is T, then y == 1 in the instruction encoding and the top half (bits[31:16]) of <Rs> is used.
<cond>  Is the condition under which the instruction is executed. The conditions are defined in The condition field on page A3-3. If <cond> is omitted, the AL (always) condition is used.
<Rd>    Specifies the destination register.
<Rm>    Specifies the source register whose bottom or top half (selected by <x>) is the first multiply operand.
<Rs>    Specifies the source register whose bottom or top half (selected by <y>) is the second multiply operand.
<Rn>    Specifies the register which contains the accumulate value.

Architecture version

Version 5TE and above.

Exceptions

None.

Operation

    if ConditionPassed(cond) then
        if (x == 0) then
            operand1 = SignExtend(Rm[15:0])
        else /* x == 1 */
            operand1 = SignExtend(Rm[31:16])
        if (y == 0) then
            operand2 = SignExtend(Rs[15:0])
        else /* y == 1 */
            operand2 = SignExtend(Rs[31:16])
        Rd = (operand1 * operand2) + Rn
        if OverflowFrom((operand1 * operand2) + Rn) then
            Q Flag = 1

Usage

In addition to their straightforward uses for integer multiply-accumulates, these instructions sometimes provide a faster alternative to Q15 × Q15 + Q31 → Q31 multiply-accumulates synthesized from SMUL<x><y> and QDADD instructions.
The main circumstances under which this is possible are:

• if it is known that saturation and/or overflow cannot occur during the calculation
• if saturation and/or overflow can occur during the calculation but the Q flag is going to be used to detect this and take remedial action if it does occur.

For example, the following code produces the dot product of the four Q15 numbers in R0 and R1 by the four Q15 numbers in R2 and R3:

    SMULBB  R4, R0, R2
    QADD    R4, R4, R4
    SMULTT  R5, R0, R2
    QDADD   R4, R4, R5
    SMULBB  R5, R1, R3
    QDADD   R4, R4, R5
    SMULTT  R5, R1, R3
    QDADD   R4, R4, R5

In the absence of saturation, the following code provides a faster alternative:

    SMULBB  R4, R0, R2
    SMLATT  R4, R0, R2, R4
    SMLABB  R4, R1, R3, R4
    SMLATT  R4, R1, R3, R4
    QADD    R4, R4, R4

Furthermore, if saturation and/or overflow occurs in this second sequence, it sets the Q flag. This allows remedial action to be taken, such as scaling down the data values and repeating the calculation.

Notes

Use of R15       Specifying R15 for register <Rd>, <Rm>, <Rs>, or <Rn> has UNPREDICTABLE results.
Condition flags  The SMLA<x><y> instructions do not affect the N, Z, C, or V flags.

A4.1.75 SMLAD

    31-28  27-20            19-16  15-12  11-8  7-4      3-0
    cond   0 1 1 1 0 0 0 0  Rd     Rn     Rs    0 0 X 1  Rm

SMLAD (Signed Multiply Accumulate Dual) performs two signed 16 x 16-bit multiplications. It adds the products to a 32-bit accumulate operand.

Optionally, you can exchange the halfwords of the second operand before performing the arithmetic. This produces top x bottom and bottom x top multiplication.

This instruction sets the Q flag if the accumulate operation overflows. Overflow cannot occur during the multiplications.

Syntax

    SMLAD{X}{<cond>} <Rd>, <Rm>, <Rs>, <Rn>

where:

X       Sets the X bit of the instruction to 1, and the multiplications are bottom x top and top x bottom. If the X is omitted, sets the X bit to 0, and the multiplications are bottom x bottom and top x top.
<cond>  Is the condition under which the instruction is executed. The conditions are defined in The condition field on page A3-3. If <cond> is omitted, the AL (always) condition is used.
<Rd>    Specifies the destination register.
<Rm>    Specifies the register that contains the first operand.
<Rs>    Specifies the register that contains the second operand.
<Rn>    Specifies the register that contains the accumulate operand.

Architecture version

ARMv6 and above.

Exceptions

None.

Operation

    if ConditionPassed(cond) then
        if X == 1 then
            operand2 = Rs Rotate_Right 16
        else
            operand2 = Rs
        product1 = Rm[15:0] * operand2[15:0]        /* Signed multiplication */
        product2 = Rm[31:16] * operand2[31:16]      /* Signed multiplication */
        Rd = Rn + product1 + product2
        if OverflowFrom(Rn + product1 + product2) then
            Q flag = 1

Usage

Use SMLAD to accumulate the sums of products of 16-bit data, with a 32-bit accumulator. This instruction enables you to do this at approximately twice the speed otherwise possible. This is useful in many applications, for example in filters.

You can use the X option for calculating the imaginary part for similar filters acting on complex numbers with 16-bit real and 16-bit imaginary parts.

Notes

Use of R15          Specifying R15 for register <Rd>, <Rm>, or <Rs> has UNPREDICTABLE results.

                    Note
                    Your assembler must fault the use of R15 for register <Rn>.

Encoding            If the <Rn> field of the instruction contains 0b1111, the instruction is an SMUAD instruction instead, see SMUAD on page A4-164.
Early termination   If the multiplier implementation supports early termination, it must be implemented on the value of the <Rs> operand. The type of early termination used (signed or unsigned) is IMPLEMENTATION DEFINED.
N, Z, C and V flags The SMLAD instruction leaves the N, Z, C and V flags unchanged.
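As an informal illustration of the SMLAD Operation pseudocode, the following Python sketch computes the destination value and reports whether the Q flag would be set. It is an assumption-laden model rather than a definitive implementation: the function name `smlad`, the helpers `_s16` and `_s32`, the `exchange` parameter (standing in for the X option), and the returned `(rd, q_set)` pair are all hypothetical, and condition evaluation is omitted.

```python
def _s16(value):
    """Sign-extend the low 16 bits of value."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def _s32(value):
    """Sign-extend the low 32 bits of value."""
    value &= 0xFFFFFFFF
    return value - 0x100000000 if value & 0x80000000 else value


def smlad(rm, rs, rn, exchange=False):
    """Model of SMLAD{X}: returns (rd, q_set) for 32-bit register inputs."""
    rs &= 0xFFFFFFFF
    if exchange:
        # Rs Rotate_Right 16 swaps the two halfwords of the second operand.
        rs = ((rs >> 16) | (rs << 16)) & 0xFFFFFFFF
    product1 = _s16(rm) * _s16(rs)              # bottom x bottom (or bottom x top)
    product2 = _s16(rm >> 16) * _s16(rs >> 16)  # top x top (or top x bottom)
    total = _s32(rn) + product1 + product2
    # OverflowFrom: the exact sum does not fit in a signed 32-bit result.
    q_set = not (-2**31 <= total <= 2**31 - 1)
    return total & 0xFFFFFFFF, q_set
```

For example, with Rm holding the halfwords (2, 3) and Rs holding (4, 5), top first, `smlad(0x00020003, 0x00040005, 10)` accumulates 3*5 + 2*4 onto 10. The only way the model reports overflow with a zero accumulator is when all four halfwords are 0x8000, where the two products sum to 2^31.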
A4.1.76 SMLAL

    31-28  27-20            19-16  15-12  11-8  7-4      3-0
    cond   0 0 0 0 1 1 1 S  RdHi   RdLo   Rs    1 0 0 1  Rm

SMLAL (Signed Multiply Accumulate Long) multiplies two signed 32-bit values to produce a 64-bit value, and accumulates this with a 64-bit value.

SMLAL can optionally update the condition code flags, based on the result.

Syntax

    SMLAL{<cond>}{S} <RdLo>, <RdHi>, <Rm>, <Rs>

where:

<cond>  Is the condition under which the instruction is executed. The conditions are defined in The condition field on page A3-3. If <cond> is omitted, the AL (always) condition is used.
S       Causes the S bit (bit[20]) in the instruction to be set to 1 and specifies that the instruction updates the CPSR by setting the N and Z flags according to the result of the multiply-accumulate. If S is omitted, the S bit of the instruction is set to 0 and the entire CPSR is unaffected by the instruction.
<RdLo>  Supplies the lower 32 bits of the value to be added to the product of <Rm> and <Rs>, and is the destination register for the lower 32 bits of the result.
<RdHi>  Supplies the upper 32 bits of the value to be added to the product of <Rm> and <Rs>, and is the destination register for the upper 32 bits of the result.
<Rm>    Holds the signed value to be multiplied with the value of <Rs>.
<Rs>    Holds the signed value to be multiplied with the value of <Rm>.

Architecture version

All.

Exceptions

None.

Operation

    if ConditionPassed(cond) then
        RdLo = (Rm * Rs)[31:0] + RdLo        /* Signed multiplication */
        RdHi = (Rm * Rs)[63:32] + RdHi + CarryFrom((Rm * Rs)[31:0] + RdLo)
        if S == 1 then
            N Flag = RdHi[31]
            Z Flag = if (RdHi == 0) and (RdLo == 0) then 1 else 0
            C Flag = unaffected              /* See "C and V flags" note */
            V Flag = unaffected              /* See "C and V flags" note */

Usage

SMLAL multiplies signed variables to produce a 64-bit result, which is added to the 64-bit value in the two destination general-purpose registers.
The result is written back to the two destination general-purpose registers.

Notes

Use of R15          Specifying R15 for register <RdHi>, <RdLo>, <Rm>, or <Rs> has UNPREDICTABLE results.
Operand restriction <RdHi> and <RdLo> must be distinct registers, or the results are UNPREDICTABLE.

                    Specifying the same register for either <RdHi> and <Rm>, or <RdLo> and <Rm>, was previously described as producing UNPREDICTABLE results. There is no restriction in ARMv6, and it is believed all relevant ARMv4 and ARMv5 implementations do not require this restriction either, because high performance multipliers read all their operands prior to writing back any results.
Early termination   If the multiplier implementation supports early termination, it must be implemented on the value of the <Rs> operand. The type of early termination used (signed or unsigned) is IMPLEMENTATION DEFINED.
C and V flags       SMLALS is defined to leave the C and V flags unchanged in ARMv5 and above. In earlier versions of the architecture, the values of the C and V flags were UNPREDICTABLE after an SMLALS instruction.

A4.1.77 SMLAL<x><y>

    31-28  27-20            19-16  15-12  11-8  7-4      3-0
    cond   0 0 0 1 0 1 0 0  RdHi   RdLo   Rs    1 y x 0  Rm

SMLAL<x><y> (Signed Multiply-Accumulate Long BB, BT, TB, and TT) performs a signed multiply-accumulate operation. The multiply acts on two signed 16-bit quantities, taken from either the bottom or the top half of their respective source registers. The other halves of these source registers are ignored. The 32-bit product is sign-extended and added to the 64-bit accumulate value held in <RdHi> and <RdLo>, and the result is written back to <RdHi> and <RdLo>.

Overflow is possible during this instruction, but only as a result of the 64-bit addition. This overflow is not detected if it occurs. Instead, the result wraps around modulo 2^64.

Syntax

    SMLAL<x><y>{<cond>} <RdLo>, <RdHi>, <Rm>, <Rs>

where:

<x>     Specifies which half of the source register <Rm> is used as the first multiply operand.
        If <x> is B, then x == 0 in the instruction encoding and the bottom half (bits[15:0]) of <Rm> is used. If <x> is T, then x == 1 in the instruction encoding and the top half (bits[31:16]) of <Rm> is used.
<y>     Specifies which half of the source register <Rs> is used as the second multiply operand. If <y> is B, then y == 0 in the instruction encoding and the bottom half (bits[15:0]) of <Rs> is used. If <y> is T, then y == 1 in the instruction encoding and the top half (bits[31:16]) of <Rs> is used.
<cond>  Is the condition under which the instruction is executed. The conditions are defined in The condition field on page A3-3. If <cond> is omitted, the AL (always) condition is used.
<RdLo>  Supplies the lower 32 bits of the 64-bit accumulate value to be added to the product, and is the destination register for the lower 32 bits of the 64-bit result.
<RdHi>  Supplies the upper 32 bits of the 64-bit accumulate value to be added to the product, and is the destination register for the upper 32 bits of the 64-bit result.
<Rm>    Specifies the source register whose bottom or top half (selected by <x>) is the first multiply operand.
<Rs>    Specifies the source register whose bottom or top half (selected by <y>) is the second multiply operand.

Architecture version

Version 5TE and above.

Exceptions

None.

Operation

    if ConditionPassed(cond) then
        if (x == 0) then
            operand1 = SignExtend(Rm[15:0])
        else /* x == 1 */
            operand1 = SignExtend(Rm[31:16])
        if (y == 0) then
            operand2 = SignExtend(Rs[15:0])
        else /* y == 1 */
            operand2 = SignExtend(Rs[31:16])
        RdLo = RdLo + (operand1 * operand2)
        RdHi = RdHi + (if (operand1*operand2) < 0 then 0xFFFFFFFF else 0)
                    + CarryFrom(RdLo + (operand1 * operand2))

Usage

These instructions allow a long sequence of multiply-accumulates of signed 16-bit integers or Q15 numbers to be performed, with sufficient guard bits to ensure that the result cannot overflow the 64-bit destination in practice.
It would take more than 2^33 consecutive multiply-accumulates to cause such overflow, because each product has a magnitude of at most 2^30 and the accumulator is a signed 64-bit value.

If the overall calculation does not overflow a signed 32-bit number, then <RdLo> holds the result of the calculation. A simple test to determine whether such a calculation has overflowed is to execute the instruction:

    CMP <RdHi>, <RdLo>, ASR #31

at the end of the calculation. If the Z flag is set, <RdLo> holds an accurate final result. If the Z flag is clear, the final result has overflowed a signed 32-bit destination.

Notes

Use of R15          Specifying R15 for register <RdLo>, <RdHi>, <Rm>, or <Rs> has UNPREDICTABLE results.
Operand restriction If <RdLo> and <RdHi> are the same register, the results are UNPREDICTABLE.
Early termination   If the multiplier implementation supports early termination, it must be implemented on the value of the <Rs> operand. The type of early termination used (signed or unsigned) is IMPLEMENTATION DEFINED.
Condition flags     The SMLAL<x><y> instructions do not affect the N, Z, C, V, or Q flags.

A4.1.78 SMLALD

    31-28  27-20            19-16  15-12  11-8  7-4      3-0
    cond   0 1 1 1 0 1 0 0  RdHi   RdLo   Rs    0 0 X 1  Rm

SMLALD (Signed Multiply Accumulate Long Dual) performs two signed 16 x 16-bit multiplications. It adds the products to a 64-bit accumulate operand.

Optionally, you can exchange the halfwords of the second operand before performing the arithmetic. This produces top x bottom and bottom x top multiplication.

Syntax

    SMLALD{X}{<cond>} <RdLo>, <RdHi>, <Rm>, <Rs>

where:

X       Sets the X bit of the instruction to 1, and the multiplications are bottom x top and top x bottom. If the X is omitted, sets the X bit to 0, and the multiplications are bottom x bottom and top x top.
<cond>  Is the condition under which the instruction is executed. The conditions are defined in The condition field on page A3-3. If <cond> is omitted, the AL (always) condition is used.
<RdLo>  Supplies the lower 32 bits of the 64-bit accumulate value to be added to the product, and is the destination register for the lower 32 bits of the 64-bit result.
<RdHi>  Supplies the upper 32 bits of the 64-bit accumulate value to be added to the product, and is the destination register for the upper 32 bits of the 64-bit result.
<Rm>    Specifies the register that contains the first multiply operand.
<Rs>    Specifies the register that contains the second multiply operand.

Architecture version

ARMv6 and above.

Exceptions

None.

Operation

    if ConditionPassed(cond) then
        if X == 1 then
            operand2 = Rs Rotate_Right 16
        else
            operand2 = Rs
        accvalue[31:0] = RdLo
        accvalue[63:32] = RdHi
        product1 = Rm[15:0] * operand2[15:0]        /* Signed multiplication */
        product2 = Rm[31:16] * operand2[31:16]      /* Signed multiplication */
        result = accvalue + product1 + product2     /* Signed addition */
        RdLo = result[31:0]
        RdHi = result[63:32]

Usage

Use SMLALD in similar ways to SMLAD, but when you require a 64-bit accumulator instead of a 32-bit accumulator. On most implementations, this runs more slowly. See the usage section for SMLAD on page A4-144 for further details.

Notes

Use of R15          Specifying R15 for register <RdLo>, <RdHi>, <Rm>, or <Rs> has UNPREDICTABLE results.
Operand restriction If <RdLo> and <RdHi> are the same register, the results are UNPREDICTABLE.
Early termination   If the multiplier implementation supports early termination, it must be implemented on the value of the <Rs> operand. The type of early termination used (signed or unsigned) is IMPLEMENTATION DEFINED.
Flags               SMLALD leaves all the flags unchanged.

A4.1.79 SMLAW<y>

    31-28  27-20            19-16  15-12  11-8  7-4      3-0
    cond   0 0 0 1 0 0 1 0  Rd     Rn     Rs    1 y 0 0  Rm

SMLAW<y> (Signed Multiply-Accumulate Word B and T) performs a signed multiply-accumulate operation.
The multiply acts on a signed 32-bit quantity and a signed 16-bit quantity, with the latter being taken from either the bottom or the top half of its source register. The other half of the second source register is ignored. The top 32 bits of the 48-bit product are added to a 32-bit accumulate value and the result is written to the destination register. The bottom 16 bits of the 48-bit product are ignored.

If overflow occurs during the addition of the accumulate value, the instruction sets the Q flag in the CPSR. No overflow can occur during the multiplication, because of the use of the top 32 bits of the 48-bit product.

Syntax

    SMLAW<y>{<cond>} <Rd>, <Rm>, <Rs>, <Rn>

where:

<y>     Specifies which half of the source register <Rs> is used as the second multiply operand. If <y> is B, then y == 0 in the instruction encoding and the bottom half (bits[15:0]) of <Rs> is used. If <y> is T, then y == 1 in the instruction encoding and the top half (bits[31:16]) of <Rs> is used.
<cond>  Is the condition under which the instruction is executed. The conditions are defined in The condition field on page A3-3. If <cond> is omitted, the AL (always) condition is used.
<Rd>    Specifies the destination register.
<Rm>    Specifies the source register which contains the 32-bit first multiply operand.
<Rs>    Specifies the source register whose bottom or top half (selected by <y>) is the second multiply operand.
<Rn>    Specifies the register which contains the accumulate value.

Architecture version

Version 5TE and above.

Exceptions

None.
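The description above, in which the top 32 bits of a 48-bit product are accumulated, can be sketched in Python as follows. This is a hedged illustration only: the function name `smlaw`, the `top` parameter (standing in for the B or T selector), the helpers `_s16` and `_s32`, and the `(rd, q_set)` return value are hypothetical names chosen for this sketch, and the condition check is left out.

```python
def _s16(value):
    """Sign-extend the low 16 bits of value."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def _s32(value):
    """Sign-extend the low 32 bits of value."""
    value &= 0xFFFFFFFF
    return value - 0x100000000 if value & 0x80000000 else value


def smlaw(rm, rs, rn, top=False):
    """Model of SMLAWB (top=False) or SMLAWT (top=True): returns (rd, q_set)."""
    operand2 = _s16(rs >> 16) if top else _s16(rs)
    # The signed 32 x 16 product fits in 48 bits; an arithmetic shift by 16
    # keeps bits [47:16] and discards the bottom 16 bits.
    product_high = (_s32(rm) * operand2) >> 16
    total = product_high + _s32(rn)
    q_set = not (-2**31 <= total <= 2**31 - 1)
    return total & 0xFFFFFFFF, q_set
```

Because the retained part of the product always fits in 32 bits, the only step that can set the Q flag in this model is the final addition of the accumulate value.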
Operation

    if ConditionPassed(cond) then
        if (y == 0) then
            operand2 = SignExtend(Rs[15:0])
        else /* y == 1 */
            operand2 = SignExtend(Rs[31:16])
        Rd = (Rm * operand2)[47:16] + Rn     /* Signed multiplication */
        if OverflowFrom((Rm * operand2)[47:16] + Rn) then
            Q Flag = 1

Usage

In addition to their straightforward uses for integer multiply-accumulates, these instructions sometimes provide a faster alternative to Q31 × Q15 + Q31 → Q31 multiply-accumulates synthesized from SMULW<y> and QDADD instructions. The circumstances under which this is possible and the benefits it provides are very similar to those for the SMLA<x><y> instructions. See Usage on page A4-143 for more details.

Notes

Use of R15         Specifying R15 for register <Rd>, <Rm>, <Rs>, or <Rn> has UNPREDICTABLE results.
Early termination  If the multiplier implementation supports early termination, it must be implemented on the value of the <Rs> operand. The type of early termination used (signed or unsigned) is IMPLEMENTATION DEFINED.
Condition flags    The SMLAW<y> instructions do not affect the N, Z, C, or V flags.

A4.1.80 SMLSD

    31-28  27-20            19-16  15-12  11-8  7-4      3-0
    cond   0 1 1 1 0 0 0 0  Rd     Rn     Rs    0 1 X 1  Rm

SMLSD (Signed Multiply Subtract accumulate Dual) performs two signed 16 x 16-bit multiplications. It adds the difference of the products to a 32-bit accumulate operand.

Optionally, you can exchange the halfwords of the second operand before performing the arithmetic. This produces top x bottom and bottom x top multiplication.

This instruction sets the Q flag if the accumulate operation overflows. Overflow cannot occur during the multiplications or subtraction.

Syntax

    SMLSD{X}{<cond>} <Rd>, <Rm>, <Rs>, <Rn>

where:

X       Sets the X bit of the instruction to 1, and the multiplications are bottom x top and top x bottom. If the X is omitted, sets the X bit to 0, and the multiplications are bottom x bottom and top x top.
<cond>  Is the condition under which the instruction is executed. The conditions are defined in The condition field on page A3-3. If <cond> is omitted, the AL (always) condition is used.
<Rd>    Specifies the destination register.
<Rm>    Specifies the register that contains the first multiply operand.
<Rs>    Specifies the register that contains the second multiply operand.
<Rn>    Specifies the register that contains the accumulate operand.

Architecture version

ARMv6 and above.

Exceptions

None.

Operation

    if ConditionPassed(cond) then
        if X == 1 then
            operand2 = Rs Rotate_Right 16
        else
            operand2 = Rs
        product1 = Rm[15:0] * operand2[15:0]        /* Signed multiplication */
        product2 = Rm[31:16] * operand2[31:16]      /* Signed multiplication */
        diffofproducts = product1 - product2        /* Signed subtraction */
        Rd = Rn + diffofproducts
        if OverflowFrom(Rn + diffofproducts) then
            Q flag = 1

Usage

You can use SMLSD for calculating the real part in filters with 32-bit accumulators, acting on complex numbers with 16-bit real and 16-bit imaginary parts.

See also the usage section for SMLAD on page A4-144.

Notes

Use of R15          Specifying R15 for register <Rd>, <Rm>, or <Rs> has UNPREDICTABLE results.

                    Note
                    Your assembler must fault the use of R15 for register <Rn>.

Encoding            If the <Rn> field of the instruction contains 0b1111, the instruction is an SMUSD instruction instead, see SMUSD on page A4-172.
Early termination   If the multiplier implementation supports early termination, it must be implemented on the value of the <Rs> operand. The type of early termination used (signed or unsigned) is IMPLEMENTATION DEFINED.
N, Z, C and V flags SMLSD leaves the N, Z, C and V flags unchanged.
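The usage notes for SMLSD and SMLAD mention real and imaginary parts of complex filters. The following Python sketch shows one way the two instructions might be paired for a complex multiply-accumulate. It assumes the real part is held in the bottom halfword and the imaginary part in the top halfword, and that SMLSD supplies the real part while SMLAD with the X option supplies the imaginary part. All names (`smlsd`, `smlad`, `pack_complex`, `complex_mac`, and the helpers) are hypothetical, and the Q flag is returned but not acted on.

```python
def _s16(value):
    """Sign-extend the low 16 bits of value."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def _s32(value):
    """Sign-extend the low 32 bits of value."""
    value &= 0xFFFFFFFF
    return value - 0x100000000 if value & 0x80000000 else value


def _dual_products(rm, rs, exchange):
    """Return (product1, product2) as in the SMLAD/SMLSD pseudocode."""
    rs &= 0xFFFFFFFF
    if exchange:
        rs = ((rs >> 16) | (rs << 16)) & 0xFFFFFFFF  # Rs Rotate_Right 16
    return _s16(rm) * _s16(rs), _s16(rm >> 16) * _s16(rs >> 16)


def _accumulate(rn, delta):
    """Add delta to the 32-bit accumulator; return (rd, q_set)."""
    total = _s32(rn) + delta
    return total & 0xFFFFFFFF, not (-2**31 <= total <= 2**31 - 1)


def smlsd(rm, rs, rn, exchange=False):
    """Model of SMLSD{X}: Rn + (product1 - product2)."""
    p1, p2 = _dual_products(rm, rs, exchange)
    return _accumulate(rn, p1 - p2)


def smlad(rm, rs, rn, exchange=False):
    """Model of SMLAD{X}: Rn + product1 + product2."""
    p1, p2 = _dual_products(rm, rs, exchange)
    return _accumulate(rn, p1 + p2)


def pack_complex(re, im):
    """Pack a complex sample: real in bits[15:0], imaginary in bits[31:16]."""
    return ((im & 0xFFFF) << 16) | (re & 0xFFFF)


def complex_mac(acc_re, acc_im, a, b):
    """Accumulate the complex product a*b onto (acc_re, acc_im)."""
    re, _ = smlsd(a, b, acc_re)                 # re(a)*re(b) - im(a)*im(b)
    im, _ = smlad(a, b, acc_im, exchange=True)  # re(a)*im(b) + im(a)*re(b)
    return _s32(re), _s32(im)
```

With this packing, (1 + 2i)(3 + 4i) = -5 + 10i is obtained from `complex_mac(0, 0, pack_complex(1, 2), pack_complex(3, 4))`.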
A4.1.81 SMLSLD

    31-28  27-20            19-16  15-12  11-8  7-4      3-0
    cond   0 1 1 1 0 1 0 0  RdHi   RdLo   Rs    0 1 X 1  Rm

SMLSLD (Signed Multiply Subtract accumulate Long Dual) performs two signed 16 x 16-bit multiplications. It adds the difference of the products to a 64-bit accumulate operand.

Optionally, you can exchange the halfwords of the second operand before performing the arithmetic. This produces top x bottom and bottom x top multiplication.

Syntax

    SMLSLD{X}{<cond>} <RdLo>, <RdHi>, <Rm>, <Rs>

where:

X       Sets the X bit of the instruction to 1, and the multiplications are bottom x top and top x bottom. If the X is omitted, sets the X bit to 0, and the multiplications are bottom x bottom and top x top.
<cond>  Is the condition under which the instruction is executed. The conditions are defined in The condition field on page A3-3. If <cond> is omitted, the AL (always) condition is used.
<RdLo>  Supplies the lower 32 bits of the 64-bit accumulate value to be added to the product, and is the destination register for the lower 32 bits of the 64-bit result.
<RdHi>  Supplies the upper 32 bits of the 64-bit accumulate value to be added to the product, and is the destination register for the upper 32 bits of the 64-bit result.
<Rm>    Specifies the register that contains the first multiply operand.
<Rs>    Specifies the register that contains the second multiply operand.

Architecture version

ARMv6 and above.

Exceptions

None.
Operation

    if ConditionPassed(cond) then
        if X == 1 then
            operand2 = Rs Rotate_Right 16
        else
            operand2 = Rs
        accvalue[31:0] = RdLo
        accvalue[63:32] = RdHi
        product1 = Rm[15:0] * operand2[15:0]        /* Signed multiplication */
        product2 = Rm[31:16] * operand2[31:16]      /* Signed multiplication */
        result = accvalue + product1 - product2     /* Signed subtraction */
        RdLo = result[31:0]
        RdHi = result[63:32]

Usage

The instruction has similar uses to those of the SMLSD instruction (see the Usage section for SMLSD on page A4-154), but when 64-bit accumulators are required rather than 32-bit accumulators. On most implementations, the resulting filter will not run as fast as a version using SMLSD, but it has many more guard bits against overflow.

See also the usage section for SMLAD on page A4-144.

Notes

Use of R15          Specifying R15 for any of the register operands has UNPREDICTABLE results.
Operand restriction If <RdLo> and <RdHi> are the same register, the results are UNPREDICTABLE.
Early termination   If the multiplier implementation supports early termination, it must be implemented on the value of the <Rs> operand. The type of early termination used (signed or unsigned) is IMPLEMENTATION DEFINED.
Flags               SMLSLD leaves all the flags unchanged.

A4.1.82 SMMLA

    31-28  27-20            19-16  15-12  11-8  7-4      3-0
    cond   0 1 1 1 0 1 0 1  Rd     Rn     Rs    0 0 R 1  Rm

SMMLA (Signed Most significant word Multiply Accumulate) multiplies two signed 32-bit values, extracts the most significant 32 bits of the result, and adds an accumulate value.

Optionally, you can specify that the result is rounded instead of being truncated. In this case, the constant 0x80000000 is added to the product before the high word is extracted.

Syntax

    SMMLA{R}{<cond>} <Rd>, <Rm>, <Rs>, <Rn>

where:

R       Sets the R bit of the instruction to 1. The multiplication is rounded. If the R is omitted, sets the R bit to 0. The multiplication is truncated.
<cond>  Is the condition under which the instruction is executed. The conditions are defined in The condition field on page A3-3. If <cond> is omitted, the AL (always) condition is used.
<Rd>    Specifies the destination register.
<Rm>    Specifies the register that contains the first multiply operand.
<Rs>    Specifies the register that contains the second multiply operand.
<Rn>    Specifies the register that contains the accumulate operand.

Architecture version

ARMv6 and above.

Exceptions

None.

Operation

    if ConditionPassed(cond) then
        value = Rm * Rs                              /* Signed multiplication */
        if R == 1 then
            Rd = ((Rn<<32) + value + 0x80000000)[63:32]
        else
            Rd = ((Rn<<32) + value)[63:32]

Usage

Provides fast multiplication for 32-bit fractional arithmetic. For example, the multiplies take two Q31 inputs and give a Q30 result (where Qn is a fixed point number with n bits of fraction).

A short discussion on fractional arithmetic is provided in Saturated Q15 and Q31 arithmetic on page A2-69.

Notes

Use of R15         Specifying R15 for register <Rd>, <Rm>, or <Rs> has UNPREDICTABLE results.

                   Note
                   Your assembler must fault the use of R15 for register <Rn>.

Encoding           If the <Rn> field of the instruction contains 0b1111, the instruction is an SMMUL instruction instead, see SMMUL on page A4-162.
Early termination  If the multiplier implementation supports early termination, it must be implemented on the value of the <Rs> operand. The type of early termination used (signed or unsigned) is IMPLEMENTATION DEFINED.
Flags              SMMLA leaves all the flags unchanged.

A4.1.83 SMMLS

    31-28  27-20            19-16  15-12  11-8  7-4      3-0
    cond   0 1 1 1 0 1 0 1  Rd     Rn     Rs    1 1 R 1  Rm

SMMLS (Signed Most significant word Multiply Subtract) multiplies two signed 32-bit values, extracts the most significant 32 bits of the result, and subtracts it from an accumulate value.
Optionally, you can specify that the result is rounded instead of being truncated. In this case, the constant 0x80000000 is added to the accumulated value before the high word is extracted.

Syntax

    SMMLS{R}{<cond>} <Rd>, <Rm>, <Rs>, <Rn>

where:

R         Sets the R bit of the instruction to 1. The multiplication is rounded. If the R is omitted, sets the R bit to 0. The multiplication is truncated.
<cond>    Is the condition under which the instruction is executed. The conditions are defined in The condition field on page A3-3. If <cond> is omitted, the AL (always) condition is used.
<Rd>      Specifies the destination register.
<Rm>      Specifies the register that contains the first multiply operand.
<Rs>      Specifies the register that contains the second multiply operand.
<Rn>      Specifies the register that contains the accumulate operand.

Architecture version

ARMv6 and above.

Exceptions

None.

Operation

    if ConditionPassed(cond) then
        value = Rm * Rs                                     /* Signed multiplication */
        if R == 1 then
            Rd = ((Rn<<32) - value + 0x80000000)[63:32]
        else
            Rd = ((Rn<<32) - value)[63:32]

Usage

Provides fast multiplication for 32-bit fractional arithmetic. For example, the multiplies take two Q31 inputs and give a Q30 result (where Qn is a fixed-point number with n bits of fraction).

Notes

Use of R15: Specifying R15 for register <Rd>, <Rm>, <Rs>, or <Rn> has UNPREDICTABLE results.

Early termination: If the multiplier implementation supports early termination, it must be implemented on the value of the <Rs> operand. The type of early termination used (signed or unsigned) is IMPLEMENTATION DEFINED.

Flags: SMMLS leaves all the flags unchanged.
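The SMMLA and SMMLS Operation pseudocode above can be sketched in Python as follows. This is a minimal illustrative model, not part of the architecture: the function names `to_signed32`, `smmla` and `smmls` are assumptions made for this example, and register values are assumed to be passed as unsigned 32-bit integers.

```python
def to_signed32(value):
    """Interpret the low 32 bits of value as a two's-complement integer."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value & 0x80000000 else value


def smmla(rm, rs, rn, rounded=False):
    """Model of SMMLA{R}: bits[63:32] of (Rn << 32) + Rm * Rs."""
    acc = (to_signed32(rn) << 32) + to_signed32(rm) * to_signed32(rs)
    if rounded:
        acc += 0x80000000  # rounding constant, added before the high word is taken
    return (acc >> 32) & 0xFFFFFFFF


def smmls(rm, rs, rn, rounded=False):
    """Model of SMMLS{R}: bits[63:32] of (Rn << 32) - Rm * Rs."""
    acc = (to_signed32(rn) << 32) - to_signed32(rm) * to_signed32(rs)
    if rounded:
        acc += 0x80000000
    return (acc >> 32) & 0xFFFFFFFF


# Two Q31 inputs of 0.5 give 0x10000000, which is 0.25 when read as Q30.
quarter_q30 = smmla(0x40000000, 0x40000000, 0)
```

Note that in this model `smmls(1, 1, 5)` gives 4 rather than 5: the subtraction is applied to the full 64-bit value, so a non-zero low word of the product borrows from the high word unless rounding is requested.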
A4.1.84 SMMUL

    Bits    31-28  27-20            19-16  15-12    11-8  7-4      3-0
    Field   cond   0 1 1 1 0 1 0 1  Rd     1 1 1 1  Rs    0 0 R 1  Rm

SMMUL (Signed Most significant word Multiply) multiplies two signed 32-bit values, and extracts the most significant 32 bits of the result.

Optionally, you can specify that the result is rounded instead of being truncated. In this case, the constant 0x80000000 is added to the product before the high word is extracted.

Syntax

    SMMUL{R}{<cond>} <Rd>, <Rm>, <Rs>

where:

R         Sets the R bit of the instruction to 1. The multiplication is rounded. If the R is omitted, sets the R bit to 0. The multiplication is truncated.
<cond>    Is the condition under which the instruction is executed. The conditions are defined in The condition field on page A3-3. If <cond> is omitted, the AL (always) condition is used.
<Rd>      Specifies the destination register.
<Rm>      Specifies the register that contains the first multiply operand.
<Rs>      Specifies the register that contains the second multiply operand.

Architecture version

ARMv6 and above.

Exceptions

None.

Operation

    if ConditionPassed(cond) then
        if R == 1 then
            value = Rm * Rs + 0x80000000    /* Signed multiplication */
        else
            value = Rm * Rs                 /* Signed multiplication */
        Rd = value[63:32]

Usage

You can use SMMUL in combination with QADD or QDADD to perform Q31 multiplies and multiply-accumulates. It has two advantages over a combination of SMULL with QADD or QDADD:

• you can round the product
• no scratch register is required for the least significant half of the product.

You can also use SMMUL in optimized Fast Fourier Transforms and similar algorithms.

Notes

Use of R15: Specifying R15 for register <Rd>, <Rm>, or <Rs> has UNPREDICTABLE results.

Early termination: If the multiplier implementation supports early termination, it must be implemented on the value of the <Rs> operand. The type of early termination used (signed or unsigned) is IMPLEMENTATION DEFINED.
Flags: SMMUL leaves all the flags unchanged.

A4.1.85 SMUAD

    Bits    31-28  27-20            19-16  15-12    11-8  7-4      3-0
    Field   cond   0 1 1 1 0 0 0 0  Rd     1 1 1 1  Rs    0 0 X 1  Rm

SMUAD (Signed Dual Multiply Add) performs two signed 16 x 16-bit multiplications. It adds the products together, giving a 32-bit result.

Optionally, you can exchange the halfwords of the second operand before performing the arithmetic. This produces top x bottom and bottom x top multiplication.

This instruction sets the Q flag if the addition overflows. The multiplications cannot overflow.

Syntax

    SMUAD{X}{<cond>} <Rd>, <Rm>, <Rs>

where:

X         Sets the X bit of the instruction to 1, and the multiplications are bottom x top and top x bottom. If the X is omitted, sets the X bit to 0, and the multiplications are bottom x bottom and top x top.
<cond>    Is the condition under which the instruction is executed. The conditions are defined in The condition field on page A3-3. If <cond> is omitted, the AL (always) condition is used.
<Rd>      Specifies the destination register.
<Rm>      Specifies the register that contains the first operand.
<Rs>      Specifies the register that contains the second operand.

Architecture version

ARMv6 and above.

Exceptions

None.

Operation

    if ConditionPassed(cond) then
        if X == 1 then
            operand2 = Rs Rotate_Right 16
        else
            operand2 = Rs
        product1 = Rm[15:0] * operand2[15:0]        /* Signed multiplication */
        product2 = Rm[31:16] * operand2[31:16]      /* Signed multiplication */
        Rd = product1 + product2
        if OverflowFrom(product1 + product2) then
            Q flag = 1

Usage

Use SMUAD for the first pair of multiplications in a sequence that uses the SMLAD instruction for the following multiplications, see SMLAD on page A4-144.
You can use the X option for calculating the imaginary part of a product of complex numbers with 16-bit real and 16-bit imaginary parts.

Notes

Use of R15: Specifying R15 for register <Rd>, <Rm>, or <Rs> has UNPREDICTABLE results.

Early termination: If the multiplier implementation supports early termination, it must be implemented on the value of the <Rs> operand. The type of early termination used (signed or unsigned) is IMPLEMENTATION DEFINED.

N, Z, C and V flags: SMUAD leaves the N, Z, C and V flags unchanged.

A4.1.86 SMUL<x><y>

    Bits    31-28  27-20            19-16  15-12  11-8  7-4      3-0
    Field   cond   0 0 0 1 0 1 1 0  Rd     SBZ    Rs    1 y x 0  Rm

SMUL<x><y> (Signed Multiply BB, BT, TB, or TT) performs a signed multiply operation. The multiply acts on two signed 16-bit quantities, taken from either the bottom or the top half of their respective source registers. The other halves of these source registers are ignored. No overflow is possible during this instruction.

Syntax

    SMUL<x><y>{<cond>} <Rd>, <Rm>, <Rs>

where:

<x>       Specifies which half of the source register <Rm> is used as the first multiply operand. If <x> is B, then x == 0 in the instruction encoding and the bottom half (bits[15:0]) of <Rm> is used. If <x> is T, then x == 1 in the instruction encoding and the top half (bits[31:16]) of <Rm> is used.
<y>       Specifies which half of the source register <Rs> is used as the second multiply operand. If <y> is B, then y == 0 in the instruction encoding and the bottom half (bits[15:0]) of <Rs> is used. If <y> is T, then y == 1 in the instruction encoding and the top half (bits[31:16]) of <Rs> is used.
<cond>    Is the condition under which the instruction is executed. The conditions are defined in The condition field on page A3-3. If <cond> is omitted, the AL (always) condition is used.
<Rd>      Specifies the destination register.
<Rm>      Specifies the source register whose bottom or top half (selected by <x>) is the first multiply operand.
<Rs>      Specifies the source register whose bottom or top half (selected by <y>) is the second multiply operand.

Architecture version

ARMv5TE and above.

Exceptions

None.

Operation

    if ConditionPassed(cond) then
        if (x == 0) then
            operand1 = SignExtend(Rm[15:0])
        else /* x == 1 */
            operand1 = SignExtend(Rm[31:16])
        if (y == 0) then
            operand2 = SignExtend(Rs[15:0])
        else /* y == 1 */
            operand2 = SignExtend(Rs[31:16])
        Rd = operand1 * operand2

Usage

In addition to its straightforward uses for integer multiplies, this instruction can be used in combination with QADD, QDADD, and QDSUB to perform multiplies, multiply-accumulates, and multiply-subtracts on Q15 numbers. See the Usage sections on page A4-93, page A4-100, and page A4-102 for examples.

Notes

Use of R15: Specifying R15 for register <Rd>, <Rm>, or <Rs> has UNPREDICTABLE results.

Early termination: If the multiplier implementation supports early termination, it must be implemented on the value of the <Rs> operand. The type of early termination used (signed or unsigned) is IMPLEMENTATION DEFINED.

Condition flags: SMUL<x><y> does not affect the N, Z, C, V, or Q flags.

A4.1.87 SMULL

    Bits    31-28  27-20            19-16  15-12  11-8  7-4      3-0
    Field   cond   0 0 0 0 1 1 0 S  RdHi   RdLo   Rs    1 0 0 1  Rm

SMULL (Signed Multiply Long) multiplies two 32-bit signed values to produce a 64-bit result.

SMULL can optionally update the condition code flags, based on the 64-bit result.

Syntax

    SMULL{<cond>}{S} <RdLo>, <RdHi>, <Rm>, <Rs>

where:

<cond>    Is the condition under which the instruction is executed. The conditions are defined in The condition field on page A3-3. If <cond> is omitted, the AL (always) condition is used.
S         Causes the S bit (bit[20]) in the instruction to be set to 1 and specifies that the instruction updates the CPSR by setting the N and Z flags according to the result of the multiplication.
          If S is omitted, the S bit of the instruction is set to 0 and the entire CPSR is unaffected by the instruction.
<RdLo>    Stores the lower 32 bits of the result.
<RdHi>    Stores the upper 32 bits of the result.
<Rm>      Holds the signed value to be multiplied with the value of <Rs>.
<Rs>      Holds the signed value to be multiplied with the value of <Rm>.

Architecture version

All.

Exceptions

None.

Operation

    if ConditionPassed(cond) then
        RdHi = (Rm * Rs)[63:32]     /* Signed multiplication */
        RdLo = (Rm * Rs)[31:0]
        if S == 1 then
            N Flag = RdHi[31]
            Z Flag = if (RdHi == 0) and (RdLo == 0) then 1 else 0
            C Flag = unaffected     /* See "C and V flags" note */
            V Flag = unaffected     /* See "C and V flags" note */

Usage

SMULL multiplies signed variables to produce a 64-bit result in two general-purpose registers.

Notes

Use of R15: Specifying R15 for register <RdHi>, <RdLo>, <Rm>, or <Rs> has UNPREDICTABLE results.

Operand restriction: <RdHi> and <RdLo> must be distinct registers, or the results are UNPREDICTABLE. Specifying the same register for either <RdHi> and <Rm>, or <RdLo> and <Rm>, was previously described as producing UNPREDICTABLE results. There is no restriction in ARMv6, and it is believed all relevant ARMv4 and ARMv5 implementations do not require this restriction either, because high performance multipliers read all their operands prior to writing back any results.

Early termination: If the multiplier implementation supports early termination, it must be implemented on the value of the <Rs> operand. The type of early termination used (signed or unsigned) is IMPLEMENTATION DEFINED.

C and V flags: SMULLS is defined to leave the C and V flags unchanged in ARMv5 and above. In earlier versions of the architecture, the values of the C and V flags were UNPREDICTABLE after an SMULLS instruction.
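The SMULL Operation above, including the optional N and Z flag update, can be sketched in Python as follows. This is an illustrative model only: the names `to_signed32` and `smull` and the dictionary used to report flags are assumptions made for this example, and registers are assumed to be passed as unsigned 32-bit integers. The C and V flags are not modelled because the instruction leaves them unaffected in ARMv5 and above.

```python
def to_signed32(value):
    """Interpret the low 32 bits of value as a two's-complement integer."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value & 0x80000000 else value


def smull(rm, rs, set_flags=False):
    """Model of SMULL{S}: returns (RdLo, RdHi, flags).

    flags is None when the S bit is clear, otherwise a dict holding the
    N and Z values derived from the 64-bit result.
    """
    product = to_signed32(rm) * to_signed32(rs)   # signed multiplication
    rd_lo = product & 0xFFFFFFFF                  # result[31:0]
    rd_hi = (product >> 32) & 0xFFFFFFFF          # result[63:32]
    flags = None
    if set_flags:
        flags = {"N": rd_hi >> 31, "Z": int(rd_hi == 0 and rd_lo == 0)}
    return rd_lo, rd_hi, flags


# -2 * 3 = -6, spread across the two result registers.
lo, hi, flags = smull(0xFFFFFFFE, 3, set_flags=True)
```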
A4.1.88 SMULW<y>

    Bits    31-28  27-20            19-16  15-12  11-8  7-4      3-0
    Field   cond   0 0 0 1 0 0 1 0  Rd     SBZ    Rs    1 y 1 0  Rm

SMULW<y> (Signed Multiply Word B and T) performs a signed multiply operation. The multiply acts on a signed 32-bit quantity and a signed 16-bit quantity, with the latter being taken from either the bottom or the top half of its source register. The other half of the second source register is ignored. The top 32 bits of the 48-bit product are written to the destination register. The bottom 16 bits of the 48-bit product are ignored. No overflow is possible during this instruction.

Syntax

    SMULW<y>{<cond>} <Rd>, <Rm>, <Rs>

where:

<y>       Specifies which half of the source register <Rs> is used as the second multiply operand. If <y> is B, then y == 0 in the instruction encoding and the bottom half (bits[15:0]) of <Rs> is used. If <y> is T, then y == 1 in the instruction encoding and the top half (bits[31:16]) of <Rs> is used.
<cond>    Is the condition under which the instruction is executed. The conditions are defined in The condition field on page A3-3. If <cond> is omitted, the AL (always) condition is used.
<Rd>      Specifies the destination register.
<Rm>      Specifies the source register which contains the 32-bit first operand.
<Rs>      Specifies the source register whose bottom or top half (selected by <y>) is the second operand.

Architecture version

ARMv5TE and above.

Exceptions

None.

Operation

    if ConditionPassed(cond) then
        if (y == 0) then
            operand2 = SignExtend(Rs[15:0])
        else /* y == 1 */
            operand2 = SignExtend(Rs[31:16])
        Rd = (Rm * operand2)[47:16]     /* Signed multiplication */

Usage

In addition to its straightforward uses for integer multiplies, this instruction can be used in combination with QADD, QDADD, and QDSUB to perform multiplies, multiply-accumulates and multiply-subtracts between Q31 and Q15 numbers. See the Usage sections on page A4-93, page A4-100, and page A4-102 for examples.
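The 32 x 16-bit multiply described for SMULW can be sketched in Python as follows. This is a minimal model under stated assumptions: the names `to_signed32` and `smulw` are illustrative only, the `top` argument stands in for the y bit of the encoding, and registers are assumed to be passed as unsigned 32-bit integers.

```python
def to_signed32(value):
    """Interpret the low 32 bits of value as a two's-complement integer."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value & 0x80000000 else value


def smulw(rm, rs, top=False):
    """Model of SMULWB (top=False) or SMULWT (top=True)."""
    half = (rs >> 16) & 0xFFFF if top else rs & 0xFFFF
    operand2 = half - 0x10000 if half & 0x8000 else half   # SignExtend
    product = to_signed32(rm) * operand2                   # fits in 48 bits
    return (product >> 16) & 0xFFFFFFFF                    # product[47:16]


# Q31 0.5 times Q15 0.5: the 48-bit product is 2**44, so bits[47:16] are 2**28.
result = smulw(0x40000000, 0x00004000)
```

Because only bits[47:16] of the product are kept, the low 16 bits of the product are discarded without rounding in this model, matching the description above.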
Notes

Use of R15: Specifying R15 for register <Rd>, <Rm>, or <Rs> has UNPREDICTABLE results.

Early termination: If the multiplier implementation supports early termination, it must be implemented on the value of the <Rs> operand. The type of early termination used (signed or unsigned) is IMPLEMENTATION DEFINED.

Flags: SMULW<y> leaves all the flags unchanged.

A4.1.89 SMUSD

    Bits    31-28  27-20            19-16  15-12    11-8  7-4      3-0
    Field   cond   0 1 1 1 0 0 0 0  Rd     1 1 1 1  Rs    0 1 X 1  Rm

SMUSD (Signed Dual Multiply Subtract) performs two signed 16 x 16-bit multiplications. It subtracts one product from the other, giving a 32-bit result.

Optionally, you can exchange the halfwords of the second operand before performing the arithmetic. This produces top x bottom and bottom x top multiplication.

Overflow cannot occur.

Syntax

    SMUSD{X}{<cond>} <Rd>, <Rm>, <Rs>

where:

X         Sets the X bit of the instruction to 1. The multiplications are bottom x top and top x bottom. If the X is omitted, sets the X bit to 0. The multiplications are bottom x bottom and top x top.
<cond>    Is the condition under which the instruction is executed. The conditions are defined in The condition field on page A3-3. If <cond> is omitted, the AL (always) condition is used.
<Rd>      Specifies the destination register.
<Rm>      Specifies the register that contains the first multiply operand.
<Rs>      Specifies the register that contains the second multiply operand.

Architecture version

ARMv6 and above.

Exceptions

None.
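The dual 16 x 16-bit multiplies described for SMUAD and SMUSD can be sketched in Python as follows, together with their use for a complex product. This is an illustrative model only: the names `s16`, `pack`, `smuad` and `smusd` are assumptions made for this example, and the packing convention (real part in the bottom halfword, imaginary part in the top halfword) is an assumption chosen so that SMUSD yields the real part and SMUADX the imaginary part.

```python
def s16(value):
    """Sign-extend the low 16 bits of value."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def pack(real, imag):
    """Pack a complex number: real in bits[15:0], imaginary in bits[31:16]."""
    return ((imag & 0xFFFF) << 16) | (real & 0xFFFF)


def _products(rm, rs, exchange):
    if exchange:  # X == 1: operand2 = Rs Rotate_Right 16
        rs = ((rs >> 16) | (rs << 16)) & 0xFFFFFFFF
    return s16(rm) * s16(rs), s16(rm >> 16) * s16(rs >> 16)


def smuad(rm, rs, exchange=False):
    """Model of SMUAD{X}: returns (Rd, q) where q is True if the add overflows."""
    product1, product2 = _products(rm, rs, exchange)
    total = product1 + product2
    overflow = not (-(1 << 31) <= total < (1 << 31))
    return total & 0xFFFFFFFF, overflow


def smusd(rm, rs, exchange=False):
    """Model of SMUSD{X}: returns Rd = product1 - product2."""
    product1, product2 = _products(rm, rs, exchange)
    return (product1 - product2) & 0xFFFFFFFF


# (3 + 4i) * (1 - 2i) = 11 - 2i
a, b = pack(3, 4), pack(1, -2)
real_part = smusd(a, b)
imag_part, q_flag = smuad(a, b, exchange=True)
```

In this model the only inputs that make the SMUAD addition overflow are those where both products equal 0x40000000, that is, all four halfwords are 0x8000.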
Operation

    if ConditionPassed(cond) then
        if X == 1 then
            operand2 = Rs Rotate_Right 16
        else
            operand2 = Rs
        product1 = Rm[15:0] * operand2[15:0]        /* Signed multiplication */
        product2 = Rm[31:16] * operand2[31:16]      /* Signed multiplication */
        Rd = product1 - product2                    /* Signed subtraction */

Usage

You can use SMUSD for calculating the real part of a product of complex numbers with 16-bit real and 16-bit imaginary parts.

Notes

Use of R15: Specifying R15 for register <Rd>, <Rm>, or <Rs> has UNPREDICTABLE results.

Early termination: If the multiplier implementation supports early termination, it must be implemented on the value of the <Rs> operand. The type of early termination used (signed or unsigned) is IMPLEMENTATION DEFINED.

Flags: SMUSD leaves all the flags unchanged.

A4.1.90 SRS

    Bits    31-28    27-25  24  23  22  21  20  19-16    15-12  11-8     7-5  4-0
    Field   1 1 1 1  1 0 0  P   U   1   W   0   1 1 0 1  SBZ    0 1 0 1  SBZ  mode

SRS (Store Return State) stores the R14 and SPSR of the current mode to the word at the specified address and the following word respectively. The address is determined from the banked version of R13 belonging to a specified mode.

Syntax

    SRS<addressing_mode> #<mode>{!}

where:

<addressing_mode>
          Is similar to the <addressing_mode> in LDM and STM instructions, see Addressing Mode 4 - Load and Store Multiple on page A5-41, but with the following differences:
          • The base register, Rn, is the banked version of R13 for the mode specified by <mode>, rather than the current mode.
          • The number of registers to store is 2.
          • The register list is {R14, SPSR}, with both R14 and the SPSR being the versions belonging to the current mode.
<mode>    Specifies the number of the mode whose banked register is used as the base register for <addressing_mode>. The mode number is the 5-bit encoding of the chosen mode in a PSR, as described in The mode bits on page A2-14.
!         If present, sets the W bit.
          This causes the instruction to write a modified value back to its base register, in a manner similar to that specified for Addressing Mode 4 - Load and Store Multiple on page A5-41. If ! is omitted, the W bit is 0 and the instruction does not change the base register.

Architecture version

ARMv6 and above.

Exceptions

Data Abort.

Operation

    MemoryAccess(B-bit, E-bit)
    processor_id = ExecutingProcessor()
    address = start_address
    Memory[address,4] = R14
    if Shared(address) then                 /* from ARMv6 */
        physical_address = TLB(address)
        ClearExclusiveByAddress(physical_address,processor_id,4)
    if CurrentModeHasSPSR() then
        Memory[address+4,4] = SPSR
        if Shared(address+4) then           /* from ARMv6 */
            physical_address = TLB(address+4)
            ClearExclusiveByAddress(physical_address,processor_id,4)
    else
        UNPREDICTABLE
    assert end_address == address + 8

where start_address and end_address are determined as described in Addressing Mode 4 - Load and Store Multiple on page A5-41, with the following modifications:

• Number_Of_Set_Bits_in(register_list) evaluates to 2, rather than depending on bits[15:0] of the instruction.
• Rn is the banked version of R13 belonging to the mode specified by the instruction, rather than being the version of R13 of the current mode.

Notes

Data Abort: For details of the effects of this instruction if a Data Abort occurs, see Data Abort (data access memory abort) on page A2-21.

Non word-aligned addresses: In ARMv6, an address with bits[1:0] != 0b00 causes an alignment exception if CP15 register 1 bits U==1 or A==1. Otherwise, SRS behaves as if bits[1:0] are 0b00.

Time order: The time order of the accesses to individual words of memory generated by SRS is not architecturally defined. Do not use this instruction on memory-mapped I/O locations where access order matters.

User and System modes: SRS is UNPREDICTABLE in User and System modes, because they do not have SPSRs.
Note: In User mode, SRS must not give access to any banked registers belonging to other modes. This would constitute a security hole.

Condition: Unlike most other ARM instructions, SRS cannot be executed conditionally.

A4.1.91 SSAT

    Bits    31-28  27-21          20-16    15-12  11-7       6   5-4  3-0
    Field   cond   0 1 1 0 1 0 1  sat_imm  Rd     shift_imm  sh  0 1  Rm

SSAT (Signed Saturate) saturates a signed value to a signed range. You can choose the bit position at which saturation occurs.

You can apply a shift to the value before the saturation occurs.

The Q flag is set if the operation saturates.

Syntax

    SSAT{<cond>} <Rd>, #<immed>, <Rm>{, <shift>}

where:

<cond>    Is the condition under which the instruction is executed. The conditions are defined in The condition field on page A3-3. If <cond> is omitted, the AL (always) condition is used.
<Rd>      Specifies the destination register.
<immed>   Specifies the bit position for saturation, in the range 1 to 32. It is encoded in the sat_imm field of the instruction as <immed>-1.
<Rm>      Specifies the register that contains the signed value to be saturated.
<shift>   Specifies the optional shift. If present, it must be one of:
          • LSL #N. N must be in the range 0 to 31. This is encoded as sh == 0 and shift_imm == N.
          • ASR #N. N must be in the range 1 to 32. This is encoded as sh == 1 and either shift_imm == 0 for N == 32, or shift_imm == N otherwise.
          If <shift> is omitted, LSL #0 is used.

Return

The value returned in Rd is:

    -2^(n-1)        if X < -2^(n-1)
    X               if -2^(n-1) <= X <= 2^(n-1) - 1
    2^(n-1) - 1     if X > 2^(n-1) - 1

where n is <immed>, and X is the shifted value from Rm.

Architecture version

ARMv6 and above.

Exceptions

None.
Operation

    if ConditionPassed(cond) then
        if sh == 1 then
            if shift_imm == 0 then
                operand = (Rm Arithmetic_Shift_Right 32)[31:0]
            else
                operand = (Rm Arithmetic_Shift_Right shift_imm)[31:0]
        else
            operand = (Rm Logical_Shift_Left shift_imm)[31:0]
        Rd = SignedSat(operand, sat_imm + 1)
        if SignedDoesSat(operand, sat_imm + 1) then
            Q Flag = 1

Usage

You can use SSAT in various DSP algorithms that require scaling and saturation of signed data.

Notes

Use of R15: Specifying R15 for register <Rd> or <Rm> has UNPREDICTABLE results.

A4.1.92 SSAT16

    Bits    31-28  27-20            19-16    15-12  11-8  7-4      3-0
    Field   cond   0 1 1 0 1 0 1 0  sat_imm  Rd     SBO   0 0 1 1  Rm

SSAT16 saturates two 16-bit signed values to a signed range. You can choose the bit position at which saturation occurs. The Q flag is set if either halfword operation saturates.

Syntax

    SSAT16{<cond>} <Rd>, #<immed>, <Rm>

where:

<cond>    Is the condition under which the instruction is executed. The conditions are defined in The condition field on page A3-3. If <cond> is omitted, the AL (always) condition is used.
<Rd>      Specifies the destination register.
<immed>   Specifies the bit position for saturation. This lies in the range 1 to 16. It is encoded in the sat_imm field of the instruction as <immed>-1.
<Rm>      Specifies the register that contains the signed values to be saturated.

Return

The value returned in each half of Rd is:

    -2^(n-1)        if X < -2^(n-1)
    X               if -2^(n-1) <= X <= 2^(n-1) - 1
    2^(n-1) - 1     if X > 2^(n-1) - 1

where n is <immed>, and X is the value from the corresponding half of Rm.

Architecture version

ARMv6 and above.

Exceptions

None.
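The saturation rule given in the Return sections for SSAT and SSAT16 can be sketched in Python as follows. This is a minimal model under stated assumptions: `signed_sat` plays the role of SignedSat and SignedDoesSat combined, and the names `ssat`, `ssat16`, `to_signed32` and `s16`, as well as the string arguments used to select the shift, are illustrative choices for this example. Registers are assumed to be passed as unsigned 32-bit integers, and the returned boolean indicates whether the Q flag would be set.

```python
def to_signed32(value):
    """Interpret the low 32 bits of value as a two's-complement integer."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value & 0x80000000 else value


def s16(value):
    """Sign-extend the low 16 bits of value."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def signed_sat(value, n):
    """Clamp value to the n-bit signed range; also report whether it changed."""
    low, high = -(1 << (n - 1)), (1 << (n - 1)) - 1
    clamped = max(low, min(high, value))
    return clamped, clamped != value


def ssat(rm, immed, shift="LSL", amount=0):
    """Model of SSAT: returns (Rd, q). amount is N from LSL #N or ASR #N."""
    x = to_signed32(rm)
    if shift == "ASR":
        operand = x >> amount                 # arithmetic shift; ASR #32 gives 0 or -1
    else:
        operand = to_signed32(x << amount)    # keep bits[31:0] of the shifted value
    result, saturated = signed_sat(operand, immed)
    return result & 0xFFFFFFFF, saturated


def ssat16(rm, immed):
    """Model of SSAT16: saturates each halfword independently."""
    low, q_low = signed_sat(s16(rm), immed)
    high, q_high = signed_sat(s16(rm >> 16), immed)
    return ((high & 0xFFFF) << 16) | (low & 0xFFFF), q_low or q_high


# 0x12345 does not fit in 16 signed bits, so it saturates to 0x7FFF.
clipped = ssat(0x00012345, 16)
```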
Operation

    if ConditionPassed(cond) then
        Rd[15:0] = SignedSat(Rm[15:0], sat_imm + 1)
        Rd[31:16] = SignedSat(Rm[31:16], sat_imm + 1)
        if SignedDoesSat(Rm[15:0], sat_imm + 1) OR SignedDoesSat(Rm[31:16], sat_imm + 1) then
            Q Flag = 1

Usage

You can use SSAT16 in various DSP algorithms that require saturation of signed data.

Notes

Use of R15: Specifying R15 for register <Rd> or <Rm> has UNPREDICTABLE results.

A4.1.93 SSUB16

    Bits    31-28  27-20            19-16  15-12  11-8  7-4      3-0
    Field   cond   0 1 1 0 0 0 0 1  Rn     Rd     SBO   0 1 1 1  Rm

SSUB16 (Signed Subtract) performs two 16-bit signed integer subtractions. It sets the GE bits in the CPSR according to the results of the subtractions.

Syntax

    SSUB16{<cond>} <Rd>, <Rn>, <Rm>

where:

<cond>    Is the condition under which the instruction is executed. The conditions are defined in The condition field on page A3-3. If <cond> is omitted, the AL (always) condition is used.
<Rd>      Specifies the destination register.
<Rn>      Specifies the register that contains the first operand.
<Rm>      Specifies the register that contains the second operand.

Architecture version

ARMv6 and above.

Exceptions

None.

Operation

    if ConditionPassed(cond) then
        diff = Rn[15:0] - Rm[15:0]              /* Signed subtraction */
        Rd[15:0] = diff[15:0]
        GE[1:0] = if diff >= 0 then 0b11 else 0
        diff = Rn[31:16] - Rm[31:16]            /* Signed subtraction */
        Rd[31:16] = diff[15:0]
        GE[3:2] = if diff >= 0 then 0b11 else 0

Usage

Use SSUB16 to speed up operations on arrays of halfword data. This is similar to the way you can use SADD16. See the usage subsection for SADD16 on page A4-119 for details.

You can also use SSUB16 for operations on complex numbers that are held as pairs of 16-bit integers or Q15 numbers.
If you hold the real and imaginary parts of a complex number in the bottom and top half of a register respectively, then the instruction:

    SSUB16 Rd, Ra, Rb

performs the complex arithmetic operation Rd = Ra - Rb.

SSUB16 sets the GE flags according to the results of each subtraction. You can use these in a following SEL instruction. See SEL on page A4-127 for further information.

Notes

Use of R15: Specifying R15 for register <Rd>, <Rm>, or <Rn> has UNPREDICTABLE results.

A4.1.94 SSUB8

    Bits    31-28  27-20            19-16  15-12  11-8  7-4      3-0
    Field   cond   0 1 1 0 0 0 0 1  Rn     Rd     SBO   1 1 1 1  Rm

SSUB8 performs four 8-bit signed integer subtractions. It sets the GE bits in the CPSR according to the results of the subtractions.

Syntax

    SSUB8{<cond>} <Rd>, <Rn>, <Rm>

where:

<cond>    Is the condition under which the instruction is executed. The conditions are defined in The condition field on page A3-3. If <cond> is omitted, the AL (always) condition is used.
<Rd>      Specifies the destination register.
<Rn>      Specifies the register that contains the first operand.
<Rm>      Specifies the register that contains the second operand.

Architecture version

ARMv6 and above.

Exceptions

None.

Operation

    if ConditionPassed(cond) then
        diff = Rn[7:0] - Rm[7:0]                /* Signed subtraction */
        Rd[7:0] = diff[7:0]
        GE[0] = if diff >= 0 then 1 else 0
        diff = Rn[15:8] - Rm[15:8]              /* Signed subtraction */
        Rd[15:8] = diff[7:0]
        GE[1] = if diff >= 0 then 1 else 0
        diff = Rn[23:16] - Rm[23:16]            /* Signed subtraction */
        Rd[23:16] = diff[7:0]
        GE[2] = if diff >= 0 then 1 else 0
        diff = Rn[31:24] - Rm[31:24]            /* Signed subtraction */
        Rd[31:24] = diff[7:0]
        GE[3] = if diff >= 0 then 1 else 0

Usage

Use SSUB8 to speed up operations on arrays of byte data. This is similar to the way you can use SADD16 to speed up operations on halfword data.
See the usage subsection for SADD16 on page A4-119 for details.

Notes

Use of R15: Specifying R15 for register <Rd>, <Rm>, or <Rn> has UNPREDICTABLE results.

A4.1.95 SSUBADDX

    Bits    31-28  27-20            19-16  15-12  11-8  7-4      3-0
    Field   cond   0 1 1 0 0 0 0 1  Rn     Rd     SBO   0 1 0 1  Rm

SSUBADDX (Signed Subtract and Add with Exchange) performs one 16-bit signed integer subtraction and one 16-bit signed integer addition. It exchanges the two halfwords of the second operand before it performs the arithmetic.

SSUBADDX sets the GE bits in the CPSR according to the results.

Syntax

    SSUBADDX{<cond>} <Rd>, <Rn>, <Rm>

where:

<cond>    Is the condition under which the instruction is executed. The conditions are defined in The condition field on page A3-3. If <cond> is omitted, the AL (always) condition is used.
<Rd>      Specifies the destination register.
<Rn>      Specifies the register that contains the first operand.
<Rm>      Specifies the register that contains the second operand.

Architecture version

ARMv6 and above.

Exceptions

None.

Operation

    if ConditionPassed(cond) then
        diff = Rn[31:16] - Rm[15:0]             /* Signed subtraction */
        Rd[31:16] = diff[15:0]
        GE[3:2] = if diff >= 0 then 0b11 else 0
        sum = Rn[15:0] + Rm[31:16]              /* Signed addition */
        Rd[15:0] = sum[15:0]
        GE[1:0] = if sum >= 0 then 0b11 else 0

Usage

You can use SSUBADDX for operations on complex numbers that are held as pairs of 16-bit integers or Q15 numbers. If you hold the real and imaginary parts of a complex number in the bottom and top half of a register respectively, then the instruction:

    SSUBADDX Rd, Ra, Rb

performs the complex arithmetic operation Rd = Ra - i * Rb.

Notes

Use of R15: Specifying R15 for register <Rd>, <Rm>, or <Rn> has UNPREDICTABLE results.
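The complex-number interpretation of SSUB16 and SSUBADDX can be checked with a small Python model. This is an illustrative sketch only: the names `s16`, `pack`, `ssub16` and `ssubaddx` are assumptions made for this example, registers are assumed to be passed as unsigned 32-bit integers, and the GE bits are returned as a 4-bit integer rather than written to a CPSR.

```python
def s16(value):
    """Sign-extend the low 16 bits of value."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def pack(real, imag):
    """Pack a complex number: real in bits[15:0], imaginary in bits[31:16]."""
    return ((imag & 0xFFFF) << 16) | (real & 0xFFFF)


def ssub16(rn, rm):
    """Model of SSUB16: returns (Rd, GE[3:0])."""
    rd, ge = 0, 0
    for lane in (0, 1):
        diff = s16(rn >> (16 * lane)) - s16(rm >> (16 * lane))
        rd |= (diff & 0xFFFF) << (16 * lane)
        if diff >= 0:
            ge |= 0b11 << (2 * lane)
    return rd, ge


def ssubaddx(rn, rm):
    """Model of SSUBADDX: returns (Rd, GE[3:0])."""
    diff = s16(rn >> 16) - s16(rm)       # top of Rn minus bottom of Rm
    total = s16(rn) + s16(rm >> 16)      # bottom of Rn plus top of Rm
    rd = ((diff & 0xFFFF) << 16) | (total & 0xFFFF)
    ge = (0b1100 if diff >= 0 else 0) | (0b0011 if total >= 0 else 0)
    return rd, ge


# With a = 5 + 7i and b = 2 + 3i:
#   a - b     = 3 + 4i
#   a - i * b = 8 + 5i
a, b = pack(5, 7), pack(2, 3)
difference = ssub16(a, b)
rotated = ssubaddx(a, b)
```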
A4.1.96 STC

    Bits    31-28  27-25  24  23  22  21  20  19-16  15-12  11-8    7-0
    Field   cond   1 1 0  P   U   N   W   0   Rn     CRd    cp_num  8_bit_word_offset

STC (Store Coprocessor) stores data from a coprocessor to a sequence of consecutive memory addresses. If no coprocessors indicate that they can execute the instruction, an Undefined Instruction exception is generated.

Syntax

    STC{<cond>}{L} <coproc>, <CRd>, <addressing_mode>
    STC2{L} <coproc>, <CRd>, <addressing_mode>

where:

<cond>    Is the condition under which the instruction is executed. The conditions are defined in The condition field on page A3-3. If <cond> is omitted, the AL (always) condition is used.
STC2      Causes the condition field of the instruction to be set to 0b1111. This provides additional opcode space for coprocessor designers. The resulting instructions can only be executed unconditionally.
L         Sets the N bit (bit[22]) in the instruction to 1 and specifies a long store (for example, double-precision instead of single-precision data transfer). If L is omitted, the N bit is 0 and the instruction specifies a short store.
<coproc>  Specifies the name of the coprocessor, and causes the corresponding coprocessor number to be placed in the cp_num field of the instruction. The standard generic coprocessor names are p0, p1, ..., p15.
<CRd>     Specifies the coprocessor source register.
<addressing_mode>
          Is described in Addressing Mode 5 - Load and Store Coprocessor on page A5-49. It determines the P, U, Rn, W and 8_bit_word_offset bits of the instruction. The syntax of all forms of <addressing_mode> includes a base register <Rn>. Some forms also specify that the instruction modifies the base register value (this is known as base register write-back).

Architecture version

STC is in all versions. STC2 is in ARMv5 and above.

Exceptions

Undefined Instruction, Data Abort.
Operation

    MemoryAccess(B-bit, E-bit)
    processor_id = ExecutingProcessor()
    if ConditionPassed(cond) then
        address = start_address
        Memory[address,4] = value from Coprocessor[cp_num]
        if Shared(address) then             /* from ARMv6 */
            physical_address = TLB(address)
            ClearExclusiveByAddress(physical_address,processor_id,4)
        while (NotFinished(coprocessor[cp_num]))
            address = address + 4
            Memory[address,4] = value from Coprocessor[cp_num]
            if Shared(address) then         /* from ARMv6 */
                physical_address = TLB(address)
                ClearExclusiveByAddress(physical_address,processor_id,4)
                /* See Summary of operation on page A2-49 */
        assert address == end_address

Usage

STC is useful for storing coprocessor data to memory. The L (long) option controls the N bit and could be used to distinguish between a single- and double-precision transfer for a floating-point store instruction.

Notes

Coprocessor fields: Only instruction bits[31:23], bits[21:16] and bits[11:0] are defined by the ARM architecture. The remaining fields (bit[22] and bits[15:12]) are recommendations, for compatibility with ARM Development Systems. In the case of the Unindexed addressing mode (P==0, U==1, W==0), instruction bits[7:0] are also not ARM architecture-defined, and can be used to specify additional coprocessor options.

Data Abort: For details of the effects of the instruction if a Data Abort occurs, see Effects of data-aborted instructions on page A2-21.

Non word-aligned addresses: For CP15_reg1_Ubit == 0, the store coprocessor register instructions ignore the least significant two bits of address. For CP15_reg1_Ubit == 1, all non-word aligned accesses cause an alignment fault.

Alignment: If an implementation includes a System Control coprocessor (see Chapter B3 The System Control Coprocessor), and alignment checking is enabled, an address with bits[1:0] != 0b00 causes an alignment exception.
Unimplemented coprocessor instructions: Hardware coprocessor support is optional, regardless of the architecture version. An implementation can choose to implement a subset of the coprocessor instructions, or no coprocessor instructions at all. Any coprocessor instructions that are not implemented instead cause an Undefined Instruction exception.

A4.1.97 STM (1)

    Bits    31-28  27-25  24  23  22  21  20  19-16  15-0
    Field   cond   1 0 0  P   U   0   W   0   Rn     register_list

STM (1) (Store Multiple) stores a non-empty subset (or possibly all) of the general-purpose registers to sequential memory locations.

Syntax

    STM{<cond>}<addressing_mode> <Rn>{!}, <registers>

where:

<cond>    Is the condition under which the instruction is executed. The conditions are defined in The condition field on page A3-3. If <cond> is omitted, the AL (always) condition is used.
<addressing_mode>
          Is described in Addressing Mode 4 - Load and Store Multiple on page A5-41. It determines the P, U, and W bits of the instruction.
<Rn>      Specifies the base register used by <addressing_mode>. If R15 is specified as <Rn>, the result is UNPREDICTABLE.
!         Sets the W bit, causing the instruction to write a modified value back to its base register Rn as specified in Addressing Mode 4 - Load and Store Multiple on page A5-41. If ! is omitted, the W bit is 0 and the instruction does not change its base register in this way.
<registers>
          Is a list of registers, separated by commas and surrounded by { and }. It specifies the set of registers to be stored by the STM instruction. The registers are stored in sequence, the lowest-numbered register to the lowest memory address (start_address), through to the highest-numbered register to the highest memory address (end_address). For each of i=0 to 15, bit[i] in the register_list field of the instruction is 1 if Ri is in the list and 0 otherwise. If bits[15:0] are all zero, the result is UNPREDICTABLE. If R15 is specified in <registers>, the value stored is IMPLEMENTATION DEFINED.
For more details, see Reading the program counter on page A2-9.

Architecture version
All.

Exceptions
Data Abort.

Operation

    MemoryAccess(B-bit, E-bit)
    processor_id = ExecutingProcessor()
    if ConditionPassed(cond) then
        address = start_address
        for i = 0 to 15
            if register_list[i] == 1 then
                Memory[address,4] = Ri
                address = address + 4
                if Shared(address) then /* from ARMv6 */
                    physical_address = TLB(address)
                    ClearExclusiveByAddress(physical_address,processor_id,4)
                    /* See Summary of operation on page A2-49 */
        assert end_address == address - 4

Usage

STM is useful as a block store instruction (combined with LDM it allows efficient block copy) and for stack operations. A single STM used in the sequence of a procedure can push the return address and general-purpose register values on to the stack, updating the stack pointer in the process.

Notes

Operand restrictions
If <Rn> is specified in <registers> and base register write-back is specified:
• If <Rn> is the lowest-numbered register specified in <registers>, the original value of <Rn> is stored.
• Otherwise, the stored value of <Rn> is UNPREDICTABLE.

Data Abort
For details of the effects of the instruction if a Data Abort occurs, see Effects of data-aborted instructions on page A2-21.

Non word-aligned addresses
For CP15_reg1_Ubit == 0, the STM[1] instruction ignores the least significant two bits of address. For CP15_reg1_Ubit == 1, all non-word aligned accesses cause an alignment fault.

Alignment
If an implementation includes a System Control coprocessor (see Chapter B3 The System Control Coprocessor), and alignment checking is enabled, an address with bits[1:0] != 0b00 causes an alignment exception.

Time order
The time order of the accesses to individual words of memory generated by this instruction is only defined in some circumstances. See Memory access restrictions on page B2-13 for details.
A4.1.98 STM (2)

    Bits    31:28  27  26  25  24  23  22  21  20  19:16  15:0
    Field   cond   1   0   0   P   U   1   0   0   Rn     register_list

STM (2) stores a subset (or possibly all) of the User mode general-purpose registers to sequential memory locations.

Syntax

    STM{<cond>}<addressing_mode> <Rn>, <registers>^

where:

<cond>
Is the condition under which the instruction is executed. The conditions are defined in The condition field on page A3-3. If <cond> is omitted, the AL (always) condition is used.

<addressing_mode>
Is described in Addressing Mode 4 - Load and Store Multiple on page A5-41. It determines the P and U bits of the instruction. Only the forms of this addressing mode with W == 0 are available for this form of the STM instruction.

<Rn>
Specifies the base register used by <addressing_mode>. If R15 is specified as the base register <Rn>, the result is UNPREDICTABLE.

<registers>
Is a list of registers, separated by commas and surrounded by { and }. It specifies the set of registers to be stored by the STM instruction. The registers are stored in sequence, the lowest-numbered register to the lowest memory address (start_address), through to the highest-numbered register to the highest memory address (end_address). For each of i=0 to 15, bit[i] in the register_list field of the instruction is 1 if Ri is in the list and 0 otherwise. If bits[15:0] are all zero, the result is UNPREDICTABLE. If R15 is specified in <registers>, the value stored is IMPLEMENTATION DEFINED. For more details, see Reading the program counter on page A2-9.

^
For an STM instruction, indicates that User mode registers are to be stored.

Architecture version
All.

Exceptions
Data Abort.
Operation

    MemoryAccess(B-bit, E-bit)
    processor_id = ExecutingProcessor()
    if ConditionPassed(cond) then
        address = start_address
        for i = 0 to 15
            if register_list[i] == 1 then
                Memory[address,4] = Ri_usr
                address = address + 4
                if Shared(address) then /* from ARMv6 */
                    physical_address = TLB(address)
                    ClearExclusiveByAddress(physical_address,processor_id,4)
                    /* See Summary of operation on page A2-49 */
        assert end_address == address - 4

Usage

Use STM (2) to store the User mode registers when the processor is in a privileged mode (useful when performing process swaps, and in instruction emulators).

Notes

Write-back
Setting bit 21, the W bit, has UNPREDICTABLE results.

User and System mode
This instruction is UNPREDICTABLE in User or System mode.

Base register mode
For the purpose of address calculation, the base register is read from the current processor mode registers, not the User mode registers.

Data Abort
For details of the effects of the instruction if a Data Abort occurs, see Effects of data-aborted instructions on page A2-21.

Non word-aligned addresses
For CP15_reg1_Ubit == 0, the STM[2] instruction ignores the least significant two bits of address. For CP15_reg1_Ubit == 1, all non-word aligned accesses cause an alignment fault.

Alignment
If an implementation includes a System Control coprocessor (see Chapter B3 The System Control Coprocessor), and alignment checking is enabled, an address with bits[1:0] != 0b00 causes an alignment exception.

Time order
The time order of the accesses to individual words of memory generated by this instruction is only defined in some circumstances. See Memory access restrictions on page B2-13 for details.

Banked registers
In ARM architecture versions earlier than ARMv6, this form of STM must not be followed by an instruction that accesses banked registers (a following NOP is a good way to ensure this).
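The register-to-address ordering used by both forms of STM can be sketched in a few lines of Python. This is a minimal illustration only, assuming an increment-after addressing mode, a word-aligned start address, and a dictionary standing in for memory. The function name `stm_increment_after` and its parameters are hypothetical and are not part of the architecture.

```python
def stm_increment_after(regs, register_list, start_address, memory):
    """Sketch of STM with increment-after addressing (illustrative only).

    regs          -- list of 16 register values (R0..R15)
    register_list -- 16-bit mask; bit i set means Ri is stored
    start_address -- address of the first word stored (assumed word-aligned)
    memory        -- dict mapping word address -> 32-bit value
    Returns the address just past the last word stored.
    """
    if register_list & 0xFFFF == 0:
        # The manual says an empty list is UNPREDICTABLE; the sketch rejects it.
        raise ValueError("empty register list is UNPREDICTABLE")
    address = start_address
    for i in range(16):
        if (register_list >> i) & 1:
            # Lowest-numbered register goes to the lowest address.
            memory[address] = regs[i] & 0xFFFFFFFF
            address += 4
    return address


regs = [0x100 + i for i in range(16)]
mem = {}
# Store {R1, R2, R5} starting at 0x8000.
end = stm_increment_after(regs, 0b0000_0000_0010_0110, 0x8000, mem)
```

After the call, the last word written sits at `end - 4`, which corresponds to the `assert end_address == address - 4` check in the pseudo-code.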
A4.1.99 STR

    Bits    31:28  27  26  25  24  23  22  21  20  19:16  15:12  11:0
    Field   cond   0   1   I   P   U   0   W   0   Rn     Rd     addr_mode

STR (Store Register) stores a word from a register to memory.

Syntax

    STR{<cond>} <Rd>, <addressing_mode>

where:

<cond>
Is the condition under which the instruction is executed. The conditions are defined in The condition field on page A3-3. If <cond> is omitted, the AL (always) condition is used.

<Rd>
Specifies the source register for the operation. If R15 is specified for <Rd>, the value stored is IMPLEMENTATION DEFINED. For more details, see Reading the program counter on page A2-9.

<addressing_mode>
Is described in Addressing Mode 2 - Load and Store Word or Unsigned Byte on page A5-18. It determines the I, P, U, W, Rn and addr_mode bits of the instruction. The syntax of all forms of <addressing_mode> includes a base register <Rn>. Some forms also specify that the instruction modifies the base register value (this is known as base register write-back).

Architecture version
All.

Exceptions
Data Abort.

Operation

    MemoryAccess(B-bit, E-bit)
    processor_id = ExecutingProcessor()
    if ConditionPassed(cond) then
        Memory[address,4] = Rd
        if Shared(address) then /* from ARMv6 */
            physical_address = TLB(address)
            ClearExclusiveByAddress(physical_address,processor_id,4)
            /* See Summary of operation on page A2-49 */

Usage

Combined with a suitable addressing mode, STR stores 32-bit data from a general-purpose register into memory. Using the PC as the base register allows PC-relative addressing, which facilitates position-independent code.

Notes

Operand restrictions
If <addressing_mode> specifies base register write-back, and the same register is specified for <Rd> and <Rn>, the results are UNPREDICTABLE.

Data Abort
For details of the effects of the instruction if a Data Abort occurs, see Effects of data-aborted instructions on page A2-21.

Alignment
Prior to ARMv6, STR ignores the least significant two bits of the address. This is different from the LDR behavior.
Alignment checking (taking a data abort when address[1:0] != 0b00), and support for a big-endian (BE-32) data format are implementation options. From ARMv6, a byte-invariant mixed-endian format is supported, along with an alignment checking option. The pseudo-code for the ARMv6 case assumes that unaligned mixed-endian support is configured, with the endianness of the transfer defined by the CPSR E-bit. For more details on endianness and alignment, see Endian support on page A2-30 and Unaligned access support on page A2-38.

A4.1.100 STRB

    Bits    31:28  27  26  25  24  23  22  21  20  19:16  15:12  11:0
    Field   cond   0   1   I   P   U   1   W   0   Rn     Rd     addr_mode

STRB (Store Register Byte) stores a byte from the least significant byte of a register to memory.

Syntax

    STR{<cond>}B <Rd>, <addressing_mode>

where:

<cond>
Is the condition under which the instruction is executed. The conditions are defined in The condition field on page A3-3. If <cond> is omitted, the AL (always) condition is used.

<Rd>
Specifies the source register for the operation. If R15 is specified for <Rd>, the result is UNPREDICTABLE.

<addressing_mode>
Is described in Addressing Mode 2 - Load and Store Word or Unsigned Byte on page A5-18. It determines the I, P, U, W, Rn and addr_mode bits of the instruction. The syntax of all forms of <addressing_mode> includes a base register <Rn>. Some forms also specify that the instruction modifies the base register value (this is known as base register write-back).

Architecture version
All.

Exceptions
Data Abort.

Operation

    processor_id = ExecutingProcessor()
    if ConditionPassed(cond) then
        Memory[address,1] = Rd[7:0]
        if Shared(address) then /* from ARMv6 */
            physical_address = TLB(address)
            ClearExclusiveByAddress(physical_address,processor_id,1)
            /* See Summary of operation on page A2-49 */
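The difference between a byte store and a pre-ARMv6 word store can be illustrated with a small Python model. This is a sketch under stated assumptions: memory is a `bytearray`, the data format is little-endian, and alignment checking is disabled. The helper names `strb` and `str_pre_v6` are hypothetical.

```python
def strb(memory, address, rd):
    """Sketch of STRB: Memory[address,1] = Rd[7:0]."""
    memory[address] = rd & 0xFF


def str_pre_v6(memory, address, rd):
    """Sketch of a pre-ARMv6 STR, assuming a little-endian configuration.

    The two least significant address bits are ignored, so the word
    always lands on a word-aligned address.
    """
    aligned = address & ~0x3
    memory[aligned:aligned + 4] = (rd & 0xFFFFFFFF).to_bytes(4, "little")


mem = bytearray(16)
strb(mem, 5, 0x12345678)          # only the low byte of the register is written
str_pre_v6(mem, 10, 0xAABBCCDD)   # address 10 is treated as address 8
```

In a big-endian (BE-32) configuration the byte order within the stored word would differ; the sketch does not model that case.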
Usage

Combined with a suitable addressing mode, STRB writes the least significant byte of a general-purpose register to memory. Using the PC as the base register allows PC-relative addressing, which facilitates position-independent code.

Notes

Operand restrictions
If <addressing_mode> specifies base register write-back, and the same register is specified for <Rd> and <Rn>, the results are UNPREDICTABLE.

Data Abort
For details of the effects of the instruction if a Data Abort occurs, see Effects of data-aborted instructions on page A2-21.

A4.1.101 STRBT

    Bits    31:28  27  26  25  24  23  22  21  20  19:16  15:12  11:0
    Field   cond   0   1   I   0   U   1   1   0   Rn     Rd     addr_mode

STRBT (Store Register Byte with Translation) stores a byte from the least significant byte of a register to memory. If the instruction is executed when the processor is in a privileged mode, the memory system is signaled to treat the access as if the processor were in User mode.

Syntax

    STR{<cond>}BT <Rd>, <post_indexed_addressing_mode>

where:

<cond>
Is the condition under which the instruction is executed. The conditions are defined in The condition field on page A3-3. If <cond> is omitted, the AL (always) condition is used.

<Rd>
Specifies the source register for the operation. If R15 is specified for <Rd>, the result is UNPREDICTABLE.

<post_indexed_addressing_mode>
Is described in Addressing Mode 2 - Load and Store Word or Unsigned Byte on page A5-18. It determines the I, U, Rn and addr_mode bits of the instruction. Only post-indexed forms of Addressing Mode 2 are available for this instruction. These forms have P == 0 and W == 0, where P and W are bit[24] and bit[21] respectively. This instruction uses P == 0 and W == 1 instead, but the addressing mode is the same in all other respects. The syntax of all forms of <post_indexed_addressing_mode> includes a base register <Rn>. All forms also specify that the instruction modifies the base register value (this is known as base register write-back).

Architecture version
All.

Exceptions
Data Abort.
Operation

    processor_id = ExecutingProcessor()
    if ConditionPassed(cond) then
        Memory[address,1] = Rd[7:0]
        if Shared(address) then /* from ARMv6 */
            physical_address = TLB(address)
            ClearExclusiveByAddress(physical_address,processor_id,1)
            /* See Summary of operation on page A2-49 */

Usage

STRBT can be used by a (privileged) exception handler that is emulating a memory access instruction that would normally execute in User mode. The access is restricted as if it had User mode privilege.

Notes

User mode
If this instruction is executed in User mode, an ordinary User mode access is performed.

Operand restrictions
If the same register is specified for <Rd> and <Rn>, the results are UNPREDICTABLE.

Data Abort
For details of the effects of the instruction if a Data Abort occurs, see Effects of data-aborted instructions on page A2-21.

A4.1.102 STRD

    Bits    31:28  27  26  25  24  23  22  21  20  19:16  15:12  11:8       7:4      3:0
    Field   cond   0   0   0   P   U   I   W   0   Rn     Rd     addr_mode  1 1 1 1  addr_mode

STRD (Store Registers Doubleword) stores a pair of ARM registers to two consecutive words of memory. The pair of registers is restricted to being an even-numbered register and the odd-numbered register that immediately follows it (for example, R10 and R11). A greater variety of addressing modes is available than for a two-register STM.

Syntax

    STR{<cond>}D <Rd>, <addressing_mode>

where:

<cond>
Is the condition under which the instruction is executed. The conditions are defined in The condition field on page A3-3. If <cond> is omitted, the AL (always) condition is used.

<Rd>
Specifies the even-numbered register that is stored to the memory word addressed by <addressing_mode>. The immediately following odd-numbered register is stored to the next memory word. If <Rd> is R14, which would specify R15 as the second source register, the instruction is UNPREDICTABLE.
If <Rd> specifies an odd-numbered register, the instruction is UNDEFINED.

<addressing_mode>
Is described in Addressing Mode 3 - Miscellaneous Loads and Stores on page A5-33. It determines the P, U, I, W, Rn, and addr_mode bits of the instruction. The syntax of all forms of <addressing_mode> includes a base register <Rn>. Some forms also specify that the instruction modifies the base register value (this is known as base register write-back). The address generated by <addressing_mode> is the address of the lower of the two words stored by the STRD instruction. The address of the higher word is generated by adding 4 to this address.

Architecture version
ARMv5TE and above, excluding ARMv5TExP.

Exceptions
Data Abort.

Operation

    MemoryAccess(B-bit, E-bit)
    processor_id = ExecutingProcessor()
    if ConditionPassed(cond) then
        if (Rd is even-numbered) and (Rd is not R14) and
              (address[1:0] == 0b00) and
              ((CP15_reg1_Ubit == 1) or (address[2] == 0)) then
            Memory[address,4] = Rd
            Memory[address+4,4] = R(d+1)
        else
            UNPREDICTABLE
        if Shared(address) then /* from ARMv6 */
            physical_address = TLB(address)
            ClearExclusiveByAddress(physical_address,processor_id,4)
        if Shared(address+4) then
            physical_address = TLB(address+4)
            ClearExclusiveByAddress(physical_address,processor_id,4)

Notes

Operand restrictions
If <addressing_mode> performs base register write-back and the base register is one of the two source registers of the instruction, the results are UNPREDICTABLE.

Data Abort
For details of the effects of the instruction if a Data Abort occurs, see Effects of data-aborted instructions on page A2-21.

Alignment
Prior to ARMv6, if the memory address is not 64-bit aligned, the instruction is UNPREDICTABLE. Alignment checking (taking a data abort), and support for a big-endian (BE-32) data format are implementation options.
From ARMv6, a byte-invariant mixed-endian format is supported, along with two alignment checking options: modulo4 and modulo8. The pseudo-code for the ARMv6 case assumes that unaligned mixed-endian support is configured, with the endianness of the transfer defined by the CPSR E-bit. For more details on endianness and alignment, see Endian support on page A2-30 and Unaligned access support on page A2-38.

Time order
The time order of the accesses to the two memory words is not architecturally defined. In particular, an implementation is allowed to perform the two 32-bit memory accesses in either order, or to combine them into a single 64-bit memory access.

A4.1.103 STREX

    Bits    31:28  27:20            19:16  15:12  11:8  7:4      3:0
    Field   cond   0 0 0 1 1 0 0 0  Rn     Rd     SBO   1 0 0 1  Rm

STREX (Store Register Exclusive) performs a conditional store to memory. The store only occurs if the executing processor has exclusive access to the memory addressed.

Syntax

    STREX{<cond>} <Rd>, <Rm>, [<Rn>]

where:

<cond>
Is the condition under which the instruction is executed. The conditions are defined in The condition field on page A3-3. If <cond> is omitted, the AL (always) condition is used.

<Rd>
Specifies the destination register for the returned status value. The value returned is:
0 if the operation updates memory
1 if the operation fails to update memory.

<Rm>
Specifies the register containing the word to be stored to memory.

<Rn>
Specifies the register containing the address.

Architecture version
ARMv6 and above.

Exceptions
Data Abort.
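The meaning of the returned status value (0 when memory is updated, 1 when it is not) can be illustrated with a toy model of a single processor's local exclusive monitor. This is a deliberately simplified sketch: it assumes non-shared memory, ignores the global monitor, and tags exactly one word address. The class `LocalMonitor` and its methods are hypothetical names invented for the illustration and do not correspond to any architectural interface.

```python
class LocalMonitor:
    """Toy model of one processor's local exclusive monitor (illustrative)."""

    def __init__(self):
        self.tagged = None  # address tagged by the most recent LDREX, if any

    def ldrex(self, memory, address):
        """Load a word and tag its address for exclusive access."""
        self.tagged = address
        return memory.get(address, 0)

    def strex(self, memory, address, value):
        """Return 0 if memory was updated, 1 if it was not."""
        if self.tagged == address:
            memory[address] = value & 0xFFFFFFFF
            status = 0
        else:
            status = 1
        # Assumption of this sketch: the local monitor is cleared
        # whether or not the store succeeded.
        self.tagged = None
        return status


mem = {0x100: 41}
mon = LocalMonitor()
old = mon.ldrex(mem, 0x100)
first = mon.strex(mem, 0x100, old + 1)   # monitor is tagged, so the store happens
second = mon.strex(mem, 0x100, 99)       # monitor was cleared, so the store fails
```

Software typically retries the load-modify-store sequence while the status value is 1.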
Operation

    MemoryAccess(B-bit, E-bit)
    if ConditionPassed(cond) then
        processor_id = ExecutingProcessor()
        physical_address = TLB(Rn)
        if IsExclusiveLocal(physical_address, processor_id, 4) then
            if Shared(Rn) == 1 then
                if IsExclusiveGlobal(physical_address, processor_id, 4) then
                    Memory[Rn,4] = Rm
                    Rd = 0
                    ClearExclusiveByAddress(physical_address,processor_id,4)
                else
                    Rd = 1
            else
                Memory[Rn,4] = Rm
                Rd = 0
        else
            Rd = 1
        ClearExclusiveLocal(processor_id)
        /* See Summary of operation on page A2-49 */
        /* The notes take precedence over any implied atomicity or
           order of events indicated in the pseudo-code */

Usage

Use STREX in combination with LDREX to implement inter-process communication in multiprocessor and shared memory systems. See LDREX on page A4-52 for further information.

Notes

Use of R15
Specifying R15 for register <Rd>, <Rn>, or <Rm> has UNPREDICTABLE results.

Operand restrictions
<Rd> must be distinct from both <Rm> and <Rn>, otherwise the results are UNPREDICTABLE.

Data Abort
For details of the effects of the instruction if a Data Abort occurs, see Effects of data-aborted instructions on page A2-21. If a Data Abort occurs during execution of a STREX instruction:
• memory is not updated
• <Rd> is not updated.

Alignment
If CP15 register 1(A,U) != (0,0) and Rn[1:0] != 0b00, an alignment exception will be taken. There is no support for unaligned Load Exclusive. If Rn[1:0] != 0b00 and (A,U) = (0,0), the result is UNPREDICTABLE.

A4.1.104 STRH

    Bits    31:28  27  26  25  24  23  22  21  20  19:16  15:12  11:8       7:4      3:0
    Field   cond   0   0   0   P   U   I   W   0   Rn     Rd     addr_mode  1 0 1 1  addr_mode

STRH (Store Register Halfword) stores a halfword from the least significant halfword of a register to memory. If the address is not halfword-aligned, the result is UNPREDICTABLE.

Syntax

    STR{<cond>}H <Rd>, <addressing_mode>

where:

<cond>
Is the condition under which the instruction is executed.
The conditions are defined in The condition field on page A3-3. If <cond> is omitted, the AL (always) condition is used.

<Rd>
Specifies the source register for the operation. If R15 is specified for <Rd>, the result is UNPREDICTABLE.

<addressing_mode>
Is described in Addressing Mode 3 - Miscellaneous Loads and Stores on page A5-33. It determines the P, U, I, W, Rn and addr_mode bits of the instruction. The syntax of all forms of <addressing_mode> includes a base register <Rn>. Some forms also specify that the instruction modifies the base register value (this is known as base register write-back).

Architecture version
All.

Exceptions
Data Abort.

Operation

    MemoryAccess(B-bit, E-bit)
    processor_id = ExecutingProcessor()
    if ConditionPassed(cond) then
        if (CP15_reg1_Ubit == 0) then
            if address[0] == 0b0 then
                Memory[address,2] = Rd[15:0]
            else
                Memory[address,2] = UNPREDICTABLE
        else /* CP15_reg1_Ubit == 1 */
            Memory[address,2] = Rd[15:0]
        if Shared(address) then /* ARMv6 */
            physical_address = TLB(address)
            ClearExclusiveByAddress(physical_address,processor_id,2)
            /* See Summary of operation on page A2-49 */

Usage

Combined with a suitable addressing mode, STRH allows 16-bit data from a general-purpose register to be stored to memory. Using the PC as the base register allows PC-relative addressing, to facilitate position-independent code.

Notes

Operand restrictions
If <addressing_mode> specifies base register write-back, and the same register is specified for <Rd> and <Rn>, the results are UNPREDICTABLE.

Data Abort
For details of the effects of the instruction if a Data Abort occurs, see Effects of data-aborted instructions on page A2-21.

Alignment
Prior to ARMv6, if the memory address is not halfword aligned, the instruction is UNPREDICTABLE. Alignment checking (taking a data abort when address[0] != 0), and support for a big-endian (BE-32) data format are implementation options.
From ARMv6, a byte-invariant mixed-endian format is supported, along with an alignment checking option. The pseudo-code for the ARMv6 case assumes that mixed-endian support is configured, with the endianness of the transfer defined by the CPSR E-bit. For more details on endianness and alignment, see Endian support on page A2-30 and Unaligned access support on page A2-38.

A4.1.105 STRT

    Bits    31:28  27  26  25  24  23  22  21  20  19:16  15:12  11:0
    Field   cond   0   1   I   0   U   0   1   0   Rn     Rd     addr_mode

STRT (Store Register with Translation) stores a word from a register to memory. If the instruction is executed when the processor is in a privileged mode, the memory system is signaled to treat the access as if the processor were in User mode.

Syntax

    STR{<cond>}T <Rd>, <post_indexed_addressing_mode>

where:

<cond>
Is the condition under which the instruction is executed. The conditions are defined in The condition field on page A3-3. If <cond> is omitted, the AL (always) condition is used.

<Rd>
Specifies the source register for the operation. If R15 is specified for <Rd>, the value stored is IMPLEMENTATION DEFINED. For more details, see Reading the program counter on page A2-9.

<post_indexed_addressing_mode>
Is described in Addressing Mode 2 - Load and Store Word or Unsigned Byte on page A5-18. It determines the I, U, Rn and addr_mode bits of the instruction. Only post-indexed forms of Addressing Mode 2 are available for this instruction. These forms have P == 0 and W == 0, where P and W are bit[24] and bit[21] respectively. This instruction uses P == 0 and W == 1 instead, but the addressing mode is the same in all other respects. The syntax of all forms of <post_indexed_addressing_mode> includes a base register <Rn>. All forms also specify that the instruction modifies the base register value (this is known as base register write-back).

Architecture version
All.

Exceptions
Data Abort.
Operation

    MemoryAccess(B-bit, E-bit)
    processor_id = ExecutingProcessor()
    if ConditionPassed(cond) then
        Memory[address,4] = Rd
        if Shared(address) then /* ARMv6 */
            physical_address = TLB(address)
            ClearExclusiveByAddress(physical_address,processor_id,4)
            /* See Summary of operation on page A2-49 */

Usage

STRT can be used by a (privileged) exception handler that is emulating a memory access instruction that would normally execute in User mode. The access is restricted as if it had User mode privilege.

Notes

User mode
If this instruction is executed in User mode, an ordinary User mode access is performed.

Operand restrictions
If the same register is specified for <Rd> and <Rn>, the results are UNPREDICTABLE.

Data Abort
For details of the effects of the instruction if a Data Abort occurs, see Effects of data-aborted instructions on page A2-21.

Alignment
As for STR, see STR on page A4-193. If an implementation includes a System Control coprocessor (see Chapter B3 The System Control Coprocessor), and alignment checking is enabled, an address with bits[1:0] != 0b00 causes an alignment exception.

A4.1.106 SUB

    Bits    31:28  27  26  25  24:21    20  19:16  15:12  11:0
    Field   cond   0   0   I   0 0 1 0  S   Rn     Rd     shifter_operand

SUB (Subtract) subtracts one value from a second value. The second value comes from a register. The first value can be either an immediate value or a value from a register, and can be shifted before the subtraction. SUB can optionally update the condition code flags, based on the result.

Syntax

    SUB{<cond>}{S} <Rd>, <Rn>, <shifter_operand>

where:

<cond>
Is the condition under which the instruction is executed. The conditions are defined in The condition field on page A3-3. If <cond> is omitted, the AL (always) condition is used.

S
Sets the S bit (bit[20]) in the instruction to 1 and specifies that the instruction updates the CPSR.
If S is omitted, the S bit is set to 0 and the CPSR is not changed by the instruction. Two types of CPSR update can occur when S is specified:
• If <Rd> is not R15, the N and Z flags are set according to the result of the subtraction, and the C and V flags are set according to whether the subtraction generated a borrow (unsigned underflow) and a signed overflow, respectively. The rest of the CPSR is unchanged.
• If <Rd> is R15, the SPSR of the current mode is copied to the CPSR. This form of the instruction is UNPREDICTABLE if executed in User mode or System mode, because these modes do not have an SPSR.

<Rd>
Specifies the destination register.

<Rn>
Specifies the register that contains the first operand.

<shifter_operand>
Specifies the second operand. The options for this operand are described in Addressing Mode 1 - Data-processing operands on page A5-2, including how each option causes the I bit (bit[25]) and the shifter_operand bits (bits[11:0]) to be set in the instruction. If the I bit is 0 and both bit[7] and bit[4] of shifter_operand are 1, the instruction is not SUB. Instead, see Extending the instruction set on page A3-32 to determine which instruction it is.

Architecture version
All.

Exceptions
None.

Operation

    if ConditionPassed(cond) then
        Rd = Rn - shifter_operand
        if S == 1 and Rd == R15 then
            if CurrentModeHasSPSR() then
                CPSR = SPSR
            else UNPREDICTABLE
        else if S == 1 then
            N Flag = Rd[31]
            Z Flag = if Rd == 0 then 1 else 0
            C Flag = NOT BorrowFrom(Rn - shifter_operand)
            V Flag = OverflowFrom(Rn - shifter_operand)

Usage

Use SUB to subtract one value from another.
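The flag updates in the Operation pseudo-code can be checked against a small Python model of a 32-bit subtraction. This is a sketch only: the function name `subs` is hypothetical, the second operand is taken as an already-computed 32-bit value rather than a full shifter operand, and the case where the destination is R15 is not modeled.

```python
MASK32 = 0xFFFFFFFF


def subs(rn, operand):
    """Return (result, N, Z, C, V) for a 32-bit subtraction with S == 1."""
    rn &= MASK32
    operand &= MASK32
    result = (rn - operand) & MASK32
    n = result >> 31                      # N Flag = Rd[31]
    z = 1 if result == 0 else 0           # Z Flag = 1 when the result is zero
    c = 1 if rn >= operand else 0         # C Flag = NOT borrow
    # Signed overflow: the operands have different signs and the sign of
    # the result differs from the sign of the first operand.
    v = ((rn ^ operand) & (rn ^ result)) >> 31
    return result, n, z, c, v
```

For example, `subs(3, 5)` produces a borrow, so C is 0 and the result wraps to 0xFFFFFFFE, while `subs(0x80000000, 1)` sets V because the most negative signed value minus one is not representable in 32 bits.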
To decrement a register value (in Ri) use:

    SUB Ri, Ri, #1

SUBS is useful as a loop counter decrement, as the loop branch can test the flags for the appropriate termination condition, without the need for a separate compare instruction:

    SUBS Ri, Ri, #1

This both decrements the loop counter in Ri and checks whether it has reached zero.

You can use SUB, with the PC as its destination register and the S bit set, to return from interrupts and various other types of exception. See Exceptions on page A2-16 for more details.

Notes

C flag
If S is specified, the C flag is set to:
1 if no borrow occurs
0 if a borrow does occur.
In other words, the C flag is used as a NOT(borrow) flag. This inversion of the borrow condition is used by subsequent instructions: SBC and RSC use the C flag as a NOT(borrow) operand, performing a normal subtraction if C == 1 and subtracting one more than usual if C == 0. The HS (unsigned higher or same) and LO (unsigned lower) conditions are equivalent to CS (carry set) and CC (carry clear) respectively.

A4.1.107 SWI

    Bits    31:28  27:24    23:0
    Field   cond   1 1 1 1  immed_24

SWI (Software Interrupt) causes a SWI exception (see Exceptions on page A2-16).

Syntax

    SWI{<cond>} <immed_24>

where:

<cond>
Is the condition under which the instruction is executed. The conditions are defined in The condition field on page A3-3. If <cond> is omitted, the AL (always) condition is used.

<immed_24>
Is a 24-bit immediate value that is put into bits[23:0] of the instruction. This value is ignored by the ARM processor, but can be used by an operating system SWI exception handler to determine what operating system service is being requested (see Usage on page A4-211 below for more details).

Architecture version
All.

Exceptions
Software interrupt.
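Because the processor ignores the 24-bit immediate, a SWI handler that wants to use it has to read the instruction word and extract bits[23:0] itself. The following Python sketch shows that field extraction on a 32-bit instruction word. The function name `decode_swi` is hypothetical, and how the handler obtains the instruction word is outside the scope of the sketch.

```python
def decode_swi(instruction):
    """Split a 32-bit ARM instruction word into SWI fields.

    Returns (cond, immed_24), or None if bits[27:24] are not 0b1111.
    This sketch does not treat cond == 0b1111 specially.
    """
    if (instruction >> 24) & 0xF != 0xF:
        return None
    cond = (instruction >> 28) & 0xF       # bits[31:28]
    immed_24 = instruction & 0x00FFFFFF    # bits[23:0]
    return cond, immed_24


swi_fields = decode_swi(0xEF000011)   # cond = 0b1110 (AL), immed_24 = 0x11
not_a_swi = decode_swi(0xE1A00000)    # bits[27:24] are 0b0001, so not a SWI
```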
Operation

    if ConditionPassed(cond) then
        R14_svc   = address of next instruction after the SWI instruction
        SPSR_svc  = CPSR
        CPSR[4:0] = 0b10011    /* Enter Supervisor mode */
        CPSR[5]   = 0          /* Execute in ARM state */
                               /* CPSR[6] is unchanged */
        CPSR[7]   = 1          /* Disable normal interrupts */
                               /* CPSR[8] is unchanged */
        CPSR[9]   = CP15_reg1_EEbit
        if high vectors configured then
            PC = 0xFFFF0008
        else
            PC = 0x00000008

Usage

SWI is used as an operating system service call. The method used to select which operating system service is required is specified by the operating system, and the SWI exception handler for the operating system determines and provides the requested service. Two typical methods are:
• The 24-bit immediate in the instruction specifies which service is required, and any parameters needed by the selected service are passed in general-purpose registers.
• The 24-bit immediate in the instruction is ignored, general-purpose register R0 is used to select which service is wanted, and any parameters needed by the selected service are passed in other general-purpose registers.

A4.1.108 SWP

    Bits    31:28  27:20            19:16  15:12  11:8  7:4      3:0
    Field   cond   0 0 0 1 0 0 0 0  Rn     Rd     SBZ   1 0 0 1  Rm

SWP (Swap) swaps a word between registers and memory. SWP loads a word from the memory address given by the value of register <Rn>. The value of register <Rm> is then stored to the memory address given by the value of <Rn>, and the original loaded value is written to register <Rd>. If the same register is specified for <Rd> and <Rm>, this instruction swaps the value of the register and the value at the memory address.

Syntax

    SWP{<cond>} <Rd>, <Rm>, [<Rn>]

where:

<cond>
Is the condition under which the instruction is executed. The conditions are defined in The condition field on page A3-3.
If <cond> is omitted, the AL (always) condition is used.

<Rd>
Specifies the destination register for the instruction.

<Rm>
Contains the value that is stored to memory.

<Rn>
Contains the memory address to load from.

Architecture version
All (deprecated in ARMv6).

Exceptions
Data Abort.

Operation

    MemoryAccess(B-bit, E-bit)
    processor_id = ExecutingProcessor()
    if ConditionPassed(cond) then
        if (CP15_reg1_Ubit == 0) then
            temp = Memory[address,4] Rotate_Right (8 * address[1:0])
            Memory[address,4] = Rm
            Rd = temp
        else /* CP15_reg1_Ubit == 1 */
            temp = Memory[address,4]
            Memory[address,4] = Rm
            Rd = temp
        if Shared(address) then /* ARMv6 */
            physical_address = TLB(address)
            ClearExclusiveByAddress(physical_address,processor_id,4)
            /* See Summary of operation on page A2-49 */

Usage

You can use SWP to implement semaphores. This instruction is deprecated in ARMv6. Software should migrate to using the Load/Store exclusive instructions described in Synchronization primitives on page A2-44.

Notes

Use of R15
If R15 is specified for <Rd>, <Rn>, or <Rm>, the result is UNPREDICTABLE.

Operand restrictions
If the same register is specified as <Rn> and <Rm>, or <Rn> and <Rd>, the result is UNPREDICTABLE.

Data Abort
If a precise Data Abort is signaled on either the load access or the store access, the loaded value is not written to <Rd>. If a precise Data Abort is signaled on the load access, the store access does not occur.

Alignment
Prior to ARMv6, the alignment rules are the same as for an LDR on the read (see LDR on page A4-43) and an STR on the write (see STR on page A4-193). Alignment checking (taking a data abort when address[1:0] != 0b00), and support for a big-endian (BE-32) data format are implementation options. From ARMv6, if CP15 register 1(A,U) != (0,0) and Rn[1:0] != 0b00, an alignment exception is taken. If CP15 register 1(A,U) == (0,0), the behavior is the same as the behavior before ARMv6.
    For more details on endianness and alignment see Endian support on page A2-30 and Unaligned access support on page A2-38.

Memory model considerations
    Swap is an atomic operation for all accesses, cached and non-cached. The swap operation does not include any memory barrier guarantees. For example, it does not guarantee flushing of write buffers, which is an important consideration on multiprocessor systems.

A4.1.109 SWPB

    31-28  27-20            19-16  15-12  11-8  7-4      3-0
    cond   0 0 0 1 0 1 0 0  Rn     Rd     SBZ   1 0 0 1  Rm

SWPB (Swap Byte) swaps a byte between registers and memory. SWPB loads a byte from the memory address given by the value of register <Rn>. The value of the least significant byte of register <Rm> is stored to the memory address given by <Rn>, the original loaded value is zero-extended to a 32-bit word, and the word is written to register <Rd>. If the same register is specified for <Rd> and <Rm>, this instruction swaps the value of the least significant byte of the register and the byte value at the memory address.

Syntax

    SWP{<cond>}B <Rd>, <Rm>, [<Rn>]

where:

<cond>    Is the condition under which the instruction is executed. The conditions are defined in The condition field on page A3-3. If <cond> is omitted, the AL (always) condition is used.

<Rd>      Specifies the destination register for the instruction.

<Rm>      Contains the value that is stored to memory.

<Rn>      Contains the memory address to load from.

Architecture version

All (deprecated in ARMv6).

Exceptions

Data Abort.

Operation

    MemoryAccess(B-bit, E-bit)
    processor_id = ExecutingProcessor()
    if ConditionPassed(cond) then
        temp = Memory[address,1]
        Memory[address,1] = Rm[7:0]
        Rd = temp
        if Shared(address) then /* ARMv6 */
            physical_address = TLB(address)
            ClearExclusiveByAddress(physical_address,processor_id,1)
            /* See Summary of operation on page A2-49 */
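The data flow of the SWPB Operation pseudocode can be sketched in C as follows. This is a minimal illustration only: the function name `swpb_model` and the byte array standing in for memory are assumptions made for the example, and the sketch does not model atomicity, Data Aborts, or the ARMv6 `ClearExclusiveByAddress` step.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of the SWPB data flow. `mem` stands in for memory and
 * `address` is an index into it. Atomicity, aborts, and the exclusive
 * monitor update are NOT modeled. Returns the value written to Rd. */
uint32_t swpb_model(uint8_t *mem, size_t address, uint32_t rm)
{
    assert(mem != NULL);
    uint8_t temp = mem[address];            /* temp = Memory[address,1]    */
    mem[address] = (uint8_t)(rm & 0xFFu);   /* Memory[address,1] = Rm[7:0] */
    return (uint32_t)temp;                  /* Rd = temp, zero-extended    */
}
```

Only the least significant byte of the register value reaches memory, and the loaded byte is zero-extended rather than sign-extended.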
Usage

You can use SWPB to implement semaphores. This instruction is deprecated in ARMv6. Software should migrate to using the Load/Store exclusive instructions described in Synchronization primitives on page A2-44.

Notes

Use of R15
    If R15 is specified for <Rd>, <Rn>, or <Rm>, the result is UNPREDICTABLE.

Operand restrictions
    If the same register is specified as <Rn> and <Rm>, or <Rn> and <Rd>, the result is UNPREDICTABLE.

Data Abort
    If a precise Data Abort is signaled on either the load access or the store access, the loaded value is not written to <Rd>. If a precise Data Abort is signaled on the load access, the store access does not occur.

Memory model considerations
    Swap is an atomic operation for all accesses, cached and non-cached. The swap operation does not include any memory barrier guarantees. For example, it does not guarantee flushing of write buffers, which is an important consideration on multiprocessor systems.

A4.1.110 SXTAB

    31-28  27-20            19-16  15-12  11-10   9-8  7-4      3-0
    cond   0 1 1 0 1 0 1 0  Rn     Rd     rotate  SBZ  0 1 1 1  Rm

SXTAB extracts an 8-bit value from a register, sign extends it to 32 bits, and adds the result to the value in another register. You can specify a rotation by 0, 8, 16, or 24 bits before extracting the 8-bit value.

Syntax

    SXTAB{<cond>} <Rd>, <Rn>, <Rm>{, <rotation>}

where:

<cond>    Is the condition under which the instruction is executed. The conditions are defined in The condition field on page A3-3. If <cond> is omitted, the AL (always) condition is used.

<Rd>      Specifies the destination register.

<Rn>      Specifies the register that contains the first operand.

<Rm>      Specifies the register that contains the second operand.

<rotation>
          This can be any one of:
          • ROR #8. This is encoded as 0b01 in the rotate field.
          • ROR #16. This is encoded as 0b10 in the rotate field.
          • ROR #24. This is encoded as 0b11 in the rotate field.
          • Omitted. This is encoded as 0b00 in the rotate field.
          Note
          If your assembler accepts shifts by #0 and treats them as equivalent to no shift or LSL #0, then it must accept ROR #0 here. It is equivalent to omitting <rotation>.

Architecture version

ARMv6 and above.

Exceptions

None.

Operation

    if ConditionPassed(cond) then
        operand2 = Rm Rotate_Right(8 * rotate)
        Rd = Rn + SignExtend(operand2[7:0])

Usage

You can use SXTAB to eliminate a separate sign-extension instruction in many instruction sequences that act on signed char values in C/C++.

Notes

Use of R15
    Specifying R15 for register <Rd> or <Rm> has UNPREDICTABLE results.

    Note
    Your assembler must fault the use of R15 for register <Rn>.

Encoding
    If the <Rn> field of the instruction contains 0b1111, the instruction is an SXTB instruction instead, see SXTB on page A4-222.

A4.1.111 SXTAB16

    31-28  27-20            19-16  15-12  11-10   9-8  7-4      3-0
    cond   0 1 1 0 1 0 0 0  Rn     Rd     rotate  SBZ  0 1 1 1  Rm

SXTAB16 extracts two 8-bit values from a register, sign extends them to 16 bits each, and adds the results to two 16-bit values from another register. You can specify a rotation by 0, 8, 16, or 24 bits before extracting the 8-bit values.

Syntax

    SXTAB16{<cond>} <Rd>, <Rn>, <Rm>{, <rotation>}

where:

<cond>    Is the condition under which the instruction is executed. The conditions are defined in The condition field on page A3-3. If <cond> is omitted, the AL (always) condition is used.

<Rd>      Specifies the destination register.

<Rn>      Specifies the register that contains the first operand.

<Rm>      Specifies the register that contains the second operand.

<rotation>
          This can be any one of:
          • ROR #8. This is encoded as 0b01 in the rotate field.
          • ROR #16. This is encoded as 0b10 in the rotate field.
          • ROR #24. This is encoded as 0b11 in the rotate field.
          • Omitted. This is encoded as 0b00 in the rotate field.
          Note
          If your assembler accepts shifts by #0 and treats them as equivalent to no shift or LSL #0, then it must accept ROR #0 here. It is equivalent to omitting <rotation>.

Architecture version

ARMv6 and above.

Exceptions

None.

Operation

    if ConditionPassed(cond) then
        operand2 = Rm Rotate_Right(8 * rotate)
        Rd[15:0]  = Rn[15:0] + SignExtend(operand2[7:0])
        Rd[31:16] = Rn[31:16] + SignExtend(operand2[23:16])

Usage

Use SXTAB16 when you need to keep intermediate values to higher precision while working on arrays of signed byte values. See UXTAB16 on page A4-276 for an example of a similar usage.

Notes

Use of R15
    Specifying R15 for register <Rd> or <Rm> has UNPREDICTABLE results.

    Note
    Your assembler must fault the use of R15 for register <Rn>.

Encoding
    If the <Rn> field of the instruction contains 0b1111, the instruction is an SXTB16 instruction instead, see SXTB16 on page A4-224.

A4.1.112 SXTAH

    31-28  27-20            19-16  15-12  11-10   9-8  7-4      3-0
    cond   0 1 1 0 1 0 1 1  Rn     Rd     rotate  SBZ  0 1 1 1  Rm

SXTAH extracts a 16-bit value from a register, sign extends it to 32 bits, and adds the result to a value in another register. You can specify a rotation by 0, 8, 16, or 24 bits before extracting the 16-bit value.

Syntax

    SXTAH{<cond>} <Rd>, <Rn>, <Rm>{, <rotation>}

where:

<cond>    Is the condition under which the instruction is executed. The conditions are defined in The condition field on page A3-3. If <cond> is omitted, the AL (always) condition is used.

<Rd>      Specifies the destination register.

<Rn>      Specifies the register that contains the first operand.

<Rm>      Specifies the register that contains the second operand.

<rotation>
          This can be any one of:
          • ROR #8. This is encoded as 0b01 in the rotate field.
          • ROR #16. This is encoded as 0b10 in the rotate field.
          • ROR #24. This is encoded as 0b11 in the rotate field.
          • Omitted.
            This is encoded as 0b00 in the rotate field.

          Note
          If your assembler accepts shifts by #0 and treats them as equivalent to no shift or LSL #0, then it must accept ROR #0 here. It is equivalent to omitting <rotation>.

Architecture version

ARMv6 and above.

Exceptions

None.

Operation

    if ConditionPassed(cond) then
        operand2 = Rm Rotate_Right(8 * rotate)
        Rd = Rn + SignExtend(operand2[15:0])

Usage

You can use SXTAH to eliminate a separate sign-extension instruction in many instruction sequences that act on signed short values in C/C++.

Notes

Use of R15
    Specifying R15 for register <Rd> or <Rm> has UNPREDICTABLE results.

    Note
    Your assembler must fault the use of R15 for register <Rn>.

Encoding
    If the <Rn> field of the instruction contains 0b1111, the instruction is an SXTH instruction instead, see SXTH on page A4-226.

A4.1.113 SXTB

    31-28  27-20            19-16    15-12  11-10   9-8  7-4      3-0
    cond   0 1 1 0 1 0 1 0  1 1 1 1  Rd     rotate  SBZ  0 1 1 1  Rm

SXTB extracts an 8-bit value from a register and sign extends it to 32 bits. You can specify a rotation by 0, 8, 16, or 24 bits before extracting the 8-bit value.

Syntax

    SXTB{<cond>} <Rd>, <Rm>{, <rotation>}

where:

<cond>    Is the condition under which the instruction is executed. The conditions are defined in The condition field on page A3-3. If <cond> is omitted, the AL (always) condition is used.

<Rd>      Specifies the destination register.

<Rm>      Specifies the register that contains the operand.

<rotation>
          This can be any one of:
          • ROR #8. This is encoded as 0b01 in the rotate field.
          • ROR #16. This is encoded as 0b10 in the rotate field.
          • ROR #24. This is encoded as 0b11 in the rotate field.
          • Omitted. This is encoded as 0b00 in the rotate field.

          Note
          If your assembler accepts shifts by #0 and treats them as equivalent to no shift or LSL #0, then it must accept ROR #0 here.
          It is equivalent to omitting <rotation>.

Architecture version

ARMv6 and above.

Exceptions

None.

Operation

    if ConditionPassed(cond) then
        operand2 = Rm Rotate_Right(8 * rotate)
        Rd[31:0] = SignExtend(operand2[7:0])

Usage

Use SXTB to sign-extend a byte to a word, for example in instruction sequences acting on signed char values in C/C++.

Notes

Use of R15
    Specifying R15 for register <Rd> or <Rm> has UNPREDICTABLE results.

A4.1.114 SXTB16

    31-28  27-20            19-16    15-12  11-10   9-8  7-4      3-0
    cond   0 1 1 0 1 0 0 0  1 1 1 1  Rd     rotate  SBZ  0 1 1 1  Rm

SXTB16 extracts two 8-bit values from a register and sign extends them to 16 bits each. You can specify a rotation by 0, 8, 16, or 24 bits before extracting the 8-bit values.

Syntax

    SXTB16{<cond>} <Rd>, <Rm>{, <rotation>}

where:

<cond>    Is the condition under which the instruction is executed. The conditions are defined in The condition field on page A3-3. If <cond> is omitted, the AL (always) condition is used.

<Rd>      Specifies the destination register.

<Rm>      Specifies the register that contains the operand.

<rotation>
          This can be any one of:
          • ROR #8. This is encoded as 0b01 in the rotate field.
          • ROR #16. This is encoded as 0b10 in the rotate field.
          • ROR #24. This is encoded as 0b11 in the rotate field.
          • Omitted. This is encoded as 0b00 in the rotate field.

          Note
          If your assembler accepts shifts by #0 and treats them as equivalent to no shift or LSL #0, then it must accept ROR #0 here. It is equivalent to omitting <rotation>.

Architecture version

ARMv6 and above.

Exceptions

None.
Operation

    if ConditionPassed(cond) then
        operand2 = Rm Rotate_Right(8 * rotate)
        Rd[15:0]  = SignExtend(operand2[7:0])
        Rd[31:16] = SignExtend(operand2[23:16])

Usage

Use SXTB16 when you need to keep intermediate values to higher precision while working on arrays of signed byte values. See UXTAB16 on page A4-276 for an example of a similar usage.

Notes

Use of R15
    Specifying R15 for register <Rd> or <Rm> has UNPREDICTABLE results.

A4.1.115 SXTH

    31-28  27-20            19-16    15-12  11-10   9-8  7-4      3-0
    cond   0 1 1 0 1 0 1 1  1 1 1 1  Rd     rotate  SBZ  0 1 1 1  Rm

SXTH extracts a 16-bit value from a register and sign extends it to 32 bits. You can specify a rotation by 0, 8, 16, or 24 bits before extracting the 16-bit value.

Syntax

    SXTH{<cond>} <Rd>, <Rm>{, <rotation>}

where:

<cond>    Is the condition under which the instruction is executed. The conditions are defined in The condition field on page A3-3. If <cond> is omitted, the AL (always) condition is used.

<Rd>      Specifies the destination register.

<Rm>      Specifies the register that contains the operand.

<rotation>
          This can be any one of:
          • ROR #8. This is encoded as 0b01 in the rotate field.
          • ROR #16. This is encoded as 0b10 in the rotate field.
          • ROR #24. This is encoded as 0b11 in the rotate field.
          • Omitted. This is encoded as 0b00 in the rotate field.

          Note
          If your assembler accepts shifts by #0 and treats them as equivalent to no shift or LSL #0, then it must accept ROR #0 here. It is equivalent to omitting <rotation>.

Architecture version

ARMv6 and above.

Exceptions

None.

Operation

    if ConditionPassed(cond) then
        operand2 = Rm Rotate_Right(8 * rotate)
        Rd[31:0] = SignExtend(operand2[15:0])

Usage

Use SXTH to sign-extend a halfword to a word, for example in instruction sequences acting on signed short values in C/C++.
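The rotate-then-extend behavior shared by SXTB, SXTB16, and SXTH can be sketched in C as follows. This is an illustrative model, not part of the architecture definition: the function names (`ror8`, `sxtb_model`, `sxtb16_model`, `sxth_model`) are assumptions, and `rotate` is the 2-bit rotate field value (0 to 3) rather than the ROR amount written in assembler syntax.

```c
#include <assert.h>
#include <stdint.h>

/* Rotate right by 8 * rotate bits, where rotate is the 2-bit field (0..3). */
static uint32_t ror8(uint32_t value, unsigned rotate)
{
    assert(rotate <= 3u);
    unsigned amount = 8u * rotate;
    return amount == 0u ? value : (value >> amount) | (value << (32u - amount));
}

/* SXTB: Rd[31:0] = SignExtend(operand2[7:0]) */
uint32_t sxtb_model(uint32_t rm, unsigned rotate)
{
    uint32_t low = ror8(rm, rotate) & 0xFFu;
    return (low ^ 0x80u) - 0x80u;       /* sign-extend from bit 7 */
}

/* SXTH: Rd[31:0] = SignExtend(operand2[15:0]) */
uint32_t sxth_model(uint32_t rm, unsigned rotate)
{
    uint32_t low = ror8(rm, rotate) & 0xFFFFu;
    return (low ^ 0x8000u) - 0x8000u;   /* sign-extend from bit 15 */
}

/* SXTB16: bytes [7:0] and [23:16] are each sign-extended to 16 bits. */
uint32_t sxtb16_model(uint32_t rm, unsigned rotate)
{
    uint32_t op = ror8(rm, rotate);
    uint32_t lo = (((op & 0xFFu) ^ 0x80u) - 0x80u) & 0xFFFFu;
    uint32_t hi = ((((op >> 16) & 0xFFu) ^ 0x80u) - 0x80u) & 0xFFFFu;
    return (hi << 16) | lo;
}
```

For example, under these assumptions `sxth_model(0x12345678, 1)` models `SXTH Rd, Rm, ROR #8` and selects bits[23:8] of the original register value before extending them.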
Notes

Use of R15
    Specifying R15 for register <Rd> or <Rm> has UNPREDICTABLE results.

A4.1.116 TEQ

    31-28  27-26  25  24-21    20  19-16  15-12  11-0
    cond   0 0    I   1 0 0 1  1   Rn     SBZ    shifter_operand

TEQ (Test Equivalence) compares a register value with another arithmetic value. The condition flags are updated, based on the result of logically exclusive-ORing the two values, so that subsequent instructions can be conditionally executed.

Syntax

    TEQ{<cond>} <Rn>, <shifter_operand>

where:

<cond>    Is the condition under which the instruction is executed. The conditions are defined in The condition field on page A3-3. If <cond> is omitted, the AL (always) condition is used.

<Rn>      Specifies the register that contains the first operand.

<shifter_operand>
          Specifies the second operand. The options for this operand are described in Addressing Mode 1 - Data-processing operands on page A5-2, including how each option sets the I bit (bit[25]) and the shifter_operand bits (bits[11:0]) in the instruction.

          If the I bit is 0 and both bit[7] and bit[4] of shifter_operand are 1, the instruction is not TEQ. Instead, see Multiply instruction extension space on page A3-35 to determine which instruction it is.

Architecture version

All.

Exceptions

None.

Operation

    if ConditionPassed(cond) then
        alu_out = Rn EOR shifter_operand
        N Flag = alu_out[31]
        Z Flag = if alu_out == 0 then 1 else 0
        C Flag = shifter_carry_out
        V Flag = unaffected

Usage

Use TEQ to test if two values are equal, without affecting the V flag (as CMP does). The C flag is also unaffected in many cases. TEQ is also useful for testing whether two values have the same sign. After the comparison, the N flag is the logical Exclusive OR of the sign bits of the two operands.
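The two uses of TEQ described above, testing equality through the Z flag and comparing signs through the N flag, can be sketched in C as follows. The names `teq_flags`, `teq_model`, and `same_sign` are hypothetical, and the sketch deliberately omits the C flag (which comes from the shifter) and the V flag (which is unaffected).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* N and Z as TEQ would set them. C and V are not modeled here. */
struct teq_flags {
    bool n;
    bool z;
};

struct teq_flags teq_model(uint32_t rn, uint32_t operand)
{
    uint32_t alu_out = rn ^ operand;    /* alu_out = Rn EOR shifter_operand */
    struct teq_flags f;
    f.n = (alu_out >> 31) != 0u;        /* N Flag = alu_out[31]             */
    f.z = (alu_out == 0u);              /* Z Flag = (alu_out == 0)          */
    return f;
}

/* Two values have the same sign exactly when N is clear after TEQ. */
bool same_sign(int32_t a, int32_t b)
{
    return !teq_model((uint32_t)a, (uint32_t)b).n;
}

/* Small usage example. */
void teq_example(void)
{
    assert(teq_model(5u, 5u).z);        /* equal values set Z       */
    assert(!same_sign(-3, 7));          /* differing signs set N    */
}
```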
A4.1.117 TST

    31-28  27-26  25  24-21    20  19-16  15-12  11-0
    cond   0 0    I   1 0 0 0  1   Rn     SBZ    shifter_operand

TST (Test) compares a register value with another arithmetic value. The condition flags are updated, based on the result of logically ANDing the two values, so that subsequent instructions can be conditionally executed.

Syntax

    TST{<cond>} <Rn>, <shifter_operand>

where:

<cond>    Is the condition under which the instruction is executed. The conditions are defined in The condition field on page A3-3. If <cond> is omitted, the AL (always) condition is used.

<Rn>      Specifies the register that contains the first operand.

<shifter_operand>
          Specifies the second operand. The options for this operand are described in Addressing Mode 1 - Data-processing operands on page A5-2, including how each option causes the I bit (bit[25]) and the shifter_operand bits (bits[11:0]) to be set in the instruction.

          If the I bit is 0 and both bit[7] and bit[4] of shifter_operand are 1, the instruction is not TST. Instead, see Multiply instruction extension space on page A3-35 to determine which instruction it is.

Architecture version

All.

Exceptions

None.

Operation

    if ConditionPassed(cond) then
        alu_out = Rn AND shifter_operand
        N Flag = alu_out[31]
        Z Flag = if alu_out == 0 then 1 else 0
        C Flag = shifter_carry_out
        V Flag = unaffected

Usage

Use TST to determine whether a particular subset of register bits includes at least one set bit. A very common use for TST is to test whether a single bit is set or clear.

A4.1.118 UADD16

    31-28  27-20            19-16  15-12  11-8  7-4      3-0
    cond   0 1 1 0 0 1 0 1  Rn     Rd     SBO   0 0 0 1  Rm

UADD16 (Unsigned Add) performs two 16-bit unsigned integer additions. It sets the GE bits in the CPSR as carry flags for the additions.
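As an illustration of the UADD16 behavior, the following C sketch models the two halfword additions and the GE bits they produce, with each carry reported on a pair of GE bits (GE[1:0] for the low halfword, GE[3:2] for the high halfword). The struct and function names are assumptions made for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical result type: the destination value plus GE[3:0]. */
struct uadd16_result {
    uint32_t rd;
    uint32_t ge;
};

struct uadd16_result uadd16_model(uint32_t rn, uint32_t rm)
{
    uint32_t lo = (rn & 0xFFFFu) + (rm & 0xFFFFu);  /* 17-bit sum, low halves  */
    uint32_t hi = (rn >> 16) + (rm >> 16);          /* 17-bit sum, high halves */
    struct uadd16_result r;
    r.rd = ((hi & 0xFFFFu) << 16) | (lo & 0xFFFFu);
    r.ge = ((lo >> 16) ? 0x3u : 0u)                 /* GE[1:0] = carry of low  */
         | ((hi >> 16) ? 0xCu : 0u);                /* GE[3:2] = carry of high */
    return r;
}

/* Small usage example: only the high halfword addition carries. */
void uadd16_example(void)
{
    struct uadd16_result r = uadd16_model(0xFFFF0001u, 0x00010002u);
    assert(r.rd == 0x00000003u);
    assert(r.ge == 0xCu);
}
```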
Syntax

    UADD16{<cond>} <Rd>, <Rn>, <Rm>

where:

<cond>    Is the condition under which the instruction is executed. The conditions are defined in The condition field on page A3-3. If <cond> is omitted, the AL (always) condition is used.

<Rd>      Specifies the destination register.

<Rn>      Specifies the register that contains the first operand.

<Rm>      Specifies the register that contains the second operand.

Architecture version

ARMv6 and above.

Exceptions

None.

Operation

    if ConditionPassed(cond) then
        Rd[15:0]  = Rn[15:0] + Rm[15:0]
        GE[1:0]   = if CarryFrom16(Rn[15:0] + Rm[15:0]) == 1 then 0b11 else 0
        Rd[31:16] = Rn[31:16] + Rm[31:16]
        GE[3:2]   = if CarryFrom16(Rn[31:16] + Rm[31:16]) == 1 then 0b11 else 0

Usage

UADD16 produces the same result value as SADD16. However, the GE flag values are based on unsigned arithmetic instead of signed arithmetic.

Notes

Use of R15
    Specifying R15 for register <Rd>, <Rm>, or <Rn> has UNPREDICTABLE results.

A4.1.119 UADD8

    31-28  27-20            19-16  15-12  11-8  7-4      3-0
    cond   0 1 1 0 0 1 0 1  Rn     Rd     SBO   1 0 0 1  Rm

UADD8 performs four 8-bit unsigned integer additions. It sets the GE bits in the CPSR as carry flags for the additions.

Syntax

    UADD8{<cond>} <Rd>, <Rn>, <Rm>

where:

<cond>    Is the condition under which the instruction is executed. The conditions are defined in The condition field on page A3-3. If <cond> is omitted, the AL (always) condition is used.

<Rd>      Specifies the destination register.

<Rn>      Specifies the register that contains the first operand.

<Rm>      Specifies the register that contains the second operand.

Architecture version

ARMv6 and above.

Exceptions

None.
Operation

    if ConditionPassed(cond) then
        Rd[7:0]   = Rn[7:0] + Rm[7:0]
        GE[0]     = CarryFrom8(Rn[7:0] + Rm[7:0])
        Rd[15:8]  = Rn[15:8] + Rm[15:8]
        GE[1]     = CarryFrom8(Rn[15:8] + Rm[15:8])
        Rd[23:16] = Rn[23:16] + Rm[23:16]
        GE[2]     = CarryFrom8(Rn[23:16] + Rm[23:16])
        Rd[31:24] = Rn[31:24] + Rm[31:24]
        GE[3]     = CarryFrom8(Rn[31:24] + Rm[31:24])

Usage

UADD8 produces the same result value as SADD8. However, the GE flag values are based on unsigned arithmetic instead of signed arithmetic.

Notes

Use of R15
    Specifying R15 for register <Rd>, <Rm>, or <Rn> has UNPREDICTABLE results.

A4.1.120 UADDSUBX

    31-28  27-20            19-16  15-12  11-8  7-4      3-0
    cond   0 1 1 0 0 1 0 1  Rn     Rd     SBO   0 0 1 1  Rm

UADDSUBX (Unsigned Add and Subtract with Exchange) performs one 16-bit unsigned integer addition and one 16-bit unsigned integer subtraction. It exchanges the two halfwords of the second operand before it performs the arithmetic. It sets the GE bits in the CPSR according to the results of the addition and subtraction.

Syntax

    UADDSUBX{<cond>} <Rd>, <Rn>, <Rm>

where:

<cond>    Is the condition under which the instruction is executed. The conditions are defined in The condition field on page A3-3. If <cond> is omitted, the AL (always) condition is used.

<Rd>      Specifies the destination register.

<Rn>      Specifies the register that contains the first operand.

<Rm>      Specifies the register that contains the second operand.

Architecture version

ARMv6 and above.

Exceptions

None.
Operation

    if ConditionPassed(cond) then
        sum = Rn[31:16] + Rm[15:0]       /* unsigned addition */
        Rd[31:16] = sum[15:0]
        GE[3:2] = if CarryFrom16(Rn[31:16] + Rm[15:0]) then 0b11 else 0
        diff = Rn[15:0] - Rm[31:16]      /* unsigned subtraction */
        Rd[15:0] = diff[15:0]
        GE[1:0] = if BorrowFrom(Rn[15:0] - Rm[31:16]) then 0b11 else 0

Usage

UADDSUBX produces the same result value as SADDSUBX. However, the GE flag values are based on unsigned arithmetic instead of signed arithmetic.

Notes

Use of R15
    Specifying R15 for register <Rd>, <Rm>, or <Rn> has UNPREDICTABLE results.

A4.1.121 UHADD16

    31-28  27-20            19-16  15-12  11-8  7-4      3-0
    cond   0 1 1 0 0 1 1 1  Rn     Rd     SBO   0 0 0 1  Rm

UHADD16 (Unsigned Halving Add) performs two 16-bit unsigned integer additions, and halves the results. It has no effect on the GE flags.

Syntax

    UHADD16{<cond>} <Rd>, <Rn>, <Rm>

where:

<cond>    Is the condition under which the instruction is executed. The conditions are defined in The condition field on page A3-3. If <cond> is omitted, the AL (always) condition is used.

<Rd>      Specifies the destination register.

<Rn>      Specifies the register that contains the first operand.

<Rm>      Specifies the register that contains the second operand.

Architecture version

ARMv6 and above.

Exceptions

None.

Operation

    if ConditionPassed(cond) then
        sum = Rn[15:0] + Rm[15:0]        /* Unsigned addition */
        Rd[15:0] = sum[16:1]
        sum = Rn[31:16] + Rm[31:16]      /* Unsigned addition */
        Rd[31:16] = sum[16:1]

Usage

Use UHADD16 for similar purposes to UADD16 (see UADD16 on page A4-232). UHADD16 averages the operands.

Notes

Use of R15
    Specifying R15 for register <Rd>, <Rm>, or <Rn> has UNPREDICTABLE results.
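The UHADD16 Operation pseudocode can be sketched in C as shown below. The point of the sketch is that each halfword sum is formed at 17-bit precision before it is halved, so the average cannot overflow. The function names are assumptions made for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of UHADD16: per-halfword unsigned average. */
uint32_t uhadd16_model(uint32_t rn, uint32_t rm)
{
    uint32_t lo = ((rn & 0xFFFFu) + (rm & 0xFFFFu)) >> 1;  /* Rd[15:0]  = sum[16:1] */
    uint32_t hi = ((rn >> 16) + (rm >> 16)) >> 1;          /* Rd[31:16] = sum[16:1] */
    return (hi << 16) | lo;
}

/* Small usage example: 0xFFFF averaged with 0xFFFF stays 0xFFFF. */
void uhadd16_example(void)
{
    assert(uhadd16_model(0xFFFF0004u, 0xFFFF0002u) == 0xFFFF0003u);
}
```

The halving discards the least significant bit of each sum, so odd sums round down.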
A4.1.122 UHADD8

    31-28  27-20            19-16  15-12  11-8  7-4      3-0
    cond   0 1 1 0 0 1 1 1  Rn     Rd     SBO   1 0 0 1  Rm

UHADD8 performs four 8-bit unsigned integer additions, and halves the results. It has no effect on the GE flags.

Syntax

    UHADD8{<cond>} <Rd>, <Rn>, <Rm>

where:

<cond>    Is the condition under which the instruction is executed. The conditions are defined in The condition field on page A3-3. If <cond> is omitted, the AL (always) condition is used.

<Rd>      Specifies the destination register.

<Rn>      Specifies the register that contains the first operand.

<Rm>      Specifies the register that contains the second operand.

Architecture version

ARMv6 and above.

Exceptions

None.

Operation

    if ConditionPassed(cond) then
        sum = Rn[7:0] + Rm[7:0]          /* Unsigned addition */
        Rd[7:0] = sum[8:1]
        sum = Rn[15:8] + Rm[15:8]        /* Unsigned addition */
        Rd[15:8] = sum[8:1]
        sum = Rn[23:16] + Rm[23:16]      /* Unsigned addition */
        Rd[23:16] = sum[8:1]
        sum = Rn[31:24] + Rm[31:24]      /* Unsigned addition */
        Rd[31:24] = sum[8:1]

Usage

Use UHADD8 for similar purposes to UADD8 (see UADD8 on page A4-233). UHADD8 averages the operands.

Notes

Use of R15
    Specifying R15 for register <Rd>, <Rm>, or <Rn> has UNPREDICTABLE results.

A4.1.123 UHADDSUBX

    31-28  27-20            19-16  15-12  11-8  7-4      3-0
    cond   0 1 1 0 0 1 1 1  Rn     Rd     SBO   0 0 1 1  Rm

UHADDSUBX (Unsigned Halving Add and Subtract with Exchange) performs one 16-bit unsigned integer addition and one 16-bit unsigned integer subtraction, and halves the results. It exchanges the two halfwords of the second operand before it performs the arithmetic. It has no effect on the GE flags.

Syntax

    UHADDSUBX{<cond>} <Rd>, <Rn>, <Rm>

where:

<cond>    Is the condition under which the instruction is executed. The conditions are defined in The condition field on page A3-3.
If <cond> is omitted, the AL (always) condition is used.

<Rd>      Specifies the destination register.

<Rn>      Specifies the register that contains the first operand.

<Rm>      Specifies the register that contains the second operand.

Architecture version

ARMv6 and above.

Exceptions

None.

Operation

    if ConditionPassed(cond) then
        sum = Rn[31:16] + Rm[15:0]       /* Unsigned addition */
        Rd[31:16] = sum[16:1]
        diff = Rn[15:0] - Rm[31:16]      /* Unsigned subtraction */
        Rd[15:0] = diff[16:1]

Usage

Use UHADDSUBX for similar purposes to UADDSUBX (see UADDSUBX on page A4-235). UHADDSUBX halves the results.

Notes

Use of R15
    Specifying R15 for register <Rd>, <Rm>, or <Rn> has UNPREDICTABLE results.

A4.1.124 UHSUB16

    31-28  27-20            19-16  15-12  11-8  7-4      3-0
    cond   0 1 1 0 0 1 1 1  Rn     Rd     SBO   0 1 1 1  Rm

UHSUB16 (Unsigned Halving Subtract) performs two 16-bit unsigned integer subtractions, and halves the results. It has no effect on the GE flags.

Syntax

    UHSUB16{<cond>} <Rd>, <Rn>, <Rm>

where:

<cond>    Is the condition under which the instruction is executed. The conditions are defined in The condition field on page A3-3. If <cond> is omitted, the AL (always) condition is used.

<Rd>      Specifies the destination register.

<Rn>      Specifies the register that contains the first operand.

<Rm>      Specifies the register that contains the second operand.

Architecture version

ARMv6 and above.

Exceptions

None.

Operation

    if ConditionPassed(cond) then
        diff = Rn[15:0] - Rm[15:0]       /* Unsigned subtraction */
        Rd[15:0] = diff[16:1]
        diff = Rn[31:16] - Rm[31:16]     /* Unsigned subtraction */
        Rd[31:16] = diff[16:1]

Usage

Use UHSUB16 for similar purposes to USUB16 (see USUB16 on page A4-269). UHSUB16 gives half the difference instead of the full difference.

Notes

Use of R15
    Specifying R15 for register <Rd>, <Rm>, or <Rn> has UNPREDICTABLE results.
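The UHSUB16 Operation pseudocode takes bits[16:1] of each difference, which matters when the second operand is larger than the first. The following C sketch shows one way to model this, under the assumption that wrapping 32-bit unsigned subtraction gives the same bits[16:0] as the 17-bit difference in the pseudocode. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Halve a 16-bit difference: returns diff[16:1] for a - b.
 * a and b must already be masked to 16 bits. */
static uint32_t halve_diff16(uint32_t a, uint32_t b)
{
    uint32_t diff = a - b;          /* wraps modulo 2^32 when b > a */
    return (diff >> 1) & 0xFFFFu;   /* keep bits [16:1]             */
}

/* Hypothetical model of UHSUB16. */
uint32_t uhsub16_model(uint32_t rn, uint32_t rm)
{
    uint32_t lo = halve_diff16(rn & 0xFFFFu, rm & 0xFFFFu);
    uint32_t hi = halve_diff16(rn >> 16, rm >> 16);
    return (hi << 16) | lo;
}

/* Small usage example: 1 - 3 = -2, halved to -1, which is 0xFFFF in 16 bits. */
void uhsub16_example(void)
{
    assert(uhsub16_model(0x00060001u, 0x00020003u) == 0x0002FFFFu);
}
```

A negative half-difference therefore appears in the destination halfword as a two's complement value, even though both operands are treated as unsigned.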
A4.1.125 UHSUB8

    31-28  27-20            19-16  15-12  11-8  7-4      3-0
    cond   0 1 1 0 0 1 1 1  Rn     Rd     SBO   1 1 1 1  Rm

UHSUB8 performs four 8-bit unsigned integer subtractions, and halves the results. It has no effect on the GE flags.

Syntax

    UHSUB8{<cond>} <Rd>, <Rn>, <Rm>

where:

<cond>    Is the condition under which the instruction is executed. The conditions are defined in The condition field on page A3-3. If <cond> is omitted, the AL (always) condition is used.

<Rd>      Specifies the destination register.

<Rn>      Specifies the register that contains the first operand.

<Rm>      Specifies the register that contains the second operand.

Architecture version

ARMv6 and above.

Exceptions

None.

Operation

    if ConditionPassed(cond) then
        diff = Rn[7:0] - Rm[7:0]         /* Unsigned subtraction */
        Rd[7:0] = diff[8:1]
        diff = Rn[15:8] - Rm[15:8]       /* Unsigned subtraction */
        Rd[15:8] = diff[8:1]
        diff = Rn[23:16] - Rm[23:16]     /* Unsigned subtraction */
        Rd[23:16] = diff[8:1]
        diff = Rn[31:24] - Rm[31:24]     /* Unsigned subtraction */
        Rd[31:24] = diff[8:1]

Usage

Use UHSUB8 for similar purposes to USUB8 (see USUB8 on page A4-270). UHSUB8 gives half the difference instead of the full difference.

Notes

Use of R15
    Specifying R15 for register <Rd>, <Rm>, or <Rn> has UNPREDICTABLE results.

A4.1.126 UHSUBADDX

    31-28  27-20            19-16  15-12  11-8  7-4      3-0
    cond   0 1 1 0 0 1 1 1  Rn     Rd     SBO   0 1 0 1  Rm

UHSUBADDX (Unsigned Halving Subtract and Add with Exchange) performs one 16-bit unsigned integer subtraction and one 16-bit unsigned integer addition, and halves the results. It exchanges the two halfwords of the second operand before it performs the arithmetic. It has no effect on the GE flags.
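The exchange step in UHSUBADDX means that the top halfword of the first operand is paired with the bottom halfword of the second operand, and the bottom halfword of the first with the top of the second. A minimal C sketch of this pairing follows. The function names are assumptions, and the 32-bit wrapping subtraction is assumed to match bits[16:0] of the instruction's 17-bit difference.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of UHSUBADDX. */
uint32_t uhsubaddx_model(uint32_t rn, uint32_t rm)
{
    uint32_t diff = (rn >> 16) - (rm & 0xFFFFu);    /* Rn[31:16] - Rm[15:0]  */
    uint32_t sum  = (rn & 0xFFFFu) + (rm >> 16);    /* Rn[15:0]  + Rm[31:16] */
    uint32_t hi = (diff >> 1) & 0xFFFFu;            /* Rd[31:16] = diff[16:1] */
    uint32_t lo = (sum >> 1) & 0xFFFFu;             /* Rd[15:0]  = sum[16:1]  */
    return (hi << 16) | lo;
}

/* Small usage example: (8 - 2) / 2 = 3 on top, (4 + 6) / 2 = 5 on the bottom. */
void uhsubaddx_example(void)
{
    assert(uhsubaddx_model(0x00080004u, 0x00060002u) == 0x00030005u);
}
```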
Syntax

    UHSUBADDX{<cond>} <Rd>, <Rn>, <Rm>

where:

<cond>    Is the condition under which the instruction is executed. The conditions are defined in The condition field on page A3-3. If <cond> is omitted, the AL (always) condition is used.

<Rd>      Specifies the destination register.

<Rn>      Specifies the register that contains the first operand.

<Rm>      Specifies the register that contains the second operand.

Architecture version

ARMv6 and above.

Exceptions

None.

Operation

    if ConditionPassed(cond) then
        diff = Rn[31:16] - Rm[15:0]      /* Unsigned subtraction */
        Rd[31:16] = diff[16:1]
        sum = Rn[15:0] + Rm[31:16]       /* Unsigned addition */
        Rd[15:0] = sum[16:1]

Usage

Use UHSUBADDX for similar purposes to USUBADDX (see USUBADDX on page A4-272). UHSUBADDX gives half the difference and the average instead of the full difference and sum.

Notes

Use of R15
    Specifying R15 for register <Rd>, <Rm>, or <Rn> has UNPREDICTABLE results.

A4.1.127 UMAAL

    31-28  27-20            19-16  15-12  11-8  7-4      3-0
    cond   0 0 0 0 0 1 0 0  RdHi   RdLo   Rs    1 0 0 1  Rm

UMAAL (Unsigned Multiply Accumulate Accumulate Long) multiplies the unsigned value of register <Rs> with the unsigned value of register <Rm> to produce a 64-bit product. Both the unsigned 32-bit value held in <RdHi> and the unsigned 32-bit value held in <RdLo> are added to this product, and the sum is written back to <RdHi> and <RdLo> as a 64-bit value. The flags are not updated.

Syntax

    UMAAL{<cond>} <RdLo>, <RdHi>, <Rm>, <Rs>

where:

<cond>    Is the condition under which the instruction is executed. The conditions are defined in The condition field on page A3-3. If <cond> is omitted, the AL (always) condition is used.

<RdLo>    Supplies one of the 32-bit values to be added to the product of <Rm> and <Rs>, and is the destination register for the lower 32 bits of the result.
<RdHi>    Supplies the other 32-bit value to be added to the product of <Rm> and <Rs>, and is the destination register for the upper 32 bits of the result.

<Rm>      Holds the unsigned value to be multiplied with the value of <Rs>.

<Rs>      Holds the unsigned value to be multiplied with the value of <Rm>.

Architecture version

ARMv6 and above.

Exceptions

None.

Operation

    if ConditionPassed(cond) then
        result = Rm * Rs + RdLo + RdHi   /* Unsigned multiplication and additions */
        RdLo = result[31:0]
        RdHi = result[63:32]

Usage

Adding two 32-bit values to a 32-bit unsigned multiply is a useful function in cryptographic applications.

Notes

Use of R15
    Specifying R15 for register <RdHi>, <RdLo>, <Rm>, or <Rs> has UNPREDICTABLE results.

Operand restriction
    If <RdLo> and <RdHi> are the same register, the results are UNPREDICTABLE.

Early termination
    If the multiplier implementation supports early termination, it must be implemented on the value of the <Rs> operand. The type of early termination used (signed or unsigned) is IMPLEMENTATION DEFINED.

A4.1.128 UMLAL

    31-28  27-21          20  19-16  15-12  11-8  7-4      3-0
    cond   0 0 0 0 1 0 1  S   RdHi   RdLo   Rs    1 0 0 1  Rm

UMLAL (Unsigned Multiply Accumulate Long) multiplies the unsigned value of register <Rs> with the unsigned value of register <Rm> to produce a 64-bit product. This product is added to the 64-bit value held in <RdHi> and <RdLo>, and the sum is written back to <RdHi> and <RdLo>. The condition code flags are optionally updated, based on the result.

Syntax

    UMLAL{<cond>}{S} <RdLo>, <RdHi>, <Rm>, <Rs>

where:

<cond>    Is the condition under which the instruction is executed. The conditions are defined in The condition field on page A3-3. If <cond> is omitted, the AL (always) condition is used.
S    Causes the S bit (bit[20]) in the instruction to be set to 1 and specifies that the instruction updates the CPSR by setting the N and Z flags according to the result of the multiply-accumulate. If S is omitted, the S bit of the instruction is set to 0 and the entire CPSR is unaffected by the instruction.
<RdLo>    Supplies the lower 32 bits of the value to be added to the product of <Rm> and <Rs>, and is the destination register for the lower 32 bits of the result.
<RdHi>    Supplies the upper 32 bits of the value to be added to the product of <Rm> and <Rs>, and is the destination register for the upper 32 bits of the result.
<Rm>    Holds the unsigned value to be multiplied with the value of <Rs>.
<Rs>    Holds the unsigned value to be multiplied with the value of <Rm>.

Architecture version

All.

Exceptions

None.

Operation

    if ConditionPassed(cond) then
        RdLo = (Rm * Rs)[31:0] + RdLo      /* Unsigned multiplication */
        RdHi = (Rm * Rs)[63:32] + RdHi + CarryFrom((Rm * Rs)[31:0] + RdLo)
        if S == 1 then
            N Flag = RdHi[31]
            Z Flag = if (RdHi == 0) and (RdLo == 0) then 1 else 0
            C Flag = unaffected            /* See "C and V flags" note */
            V Flag = unaffected            /* See "C and V flags" note */

Usage

UMLAL multiplies unsigned variables to produce a 64-bit result, which is added to the 64-bit value in the two destination general-purpose registers. The result is written back to the two destination general-purpose registers.

Notes

Use of R15    Specifying R15 for register <RdHi>, <RdLo>, <Rm>, or <Rs> has UNPREDICTABLE results.
Operand restriction    <RdHi> and <RdLo> must be distinct registers, or the results are UNPREDICTABLE. Specifying the same register for either <RdHi> and <Rm>, or <RdLo> and <Rm>, was previously described as producing UNPREDICTABLE results. There is no restriction in ARMv6, and it is believed all relevant ARMv4 and ARMv5 implementations do not require this restriction either, because high performance multipliers read all their operands prior to writing back any results.
Early termination    If the multiplier implementation supports early termination, it must be implemented on the value of the <Rs> operand. The type of early termination used (signed or unsigned) is IMPLEMENTATION DEFINED.
C and V flags    UMLALS is defined to leave the C and V flags unchanged in ARMv5 and above. In earlier versions of the architecture, the values of the C and V flags were UNPREDICTABLE after a UMLALS instruction.

A4.1.129 UMULL

    31:28   27:21           20   19:16   15:12   11:8   7:4       3:0
    cond    0 0 0 0 1 0 0   S    RdHi    RdLo    Rs     1 0 0 1   Rm

UMULL (Unsigned Multiply Long) multiplies the unsigned value of register <Rm> with the unsigned value of register <Rs> to produce a 64-bit result. The upper 32 bits of the result are stored in <RdHi>. The lower 32 bits are stored in <RdLo>. The condition code flags are optionally updated, based on the 64-bit result.

Syntax

    UMULL{<cond>}{S} <RdLo>, <RdHi>, <Rm>, <Rs>

where:

<cond>    Is the condition under which the instruction is executed. The conditions are defined in The condition field on page A3-3. If <cond> is omitted, the AL (always) condition is used.
S    Causes the S bit (bit[20]) in the instruction to be set to 1 and specifies that the instruction updates the CPSR by setting the N and Z flags according to the result of the multiplication. If S is omitted, the S bit of the instruction is set to 0 and the entire CPSR is unaffected by the instruction.
<RdLo>    Stores the lower 32 bits of the result.
<RdHi>    Stores the upper 32 bits of the result.
<Rm>    Holds the unsigned value to be multiplied with the value of <Rs>.
<Rs>    Holds the unsigned value to be multiplied with the value of <Rm>.

Architecture version

All.

Exceptions

None.
Operation

    if ConditionPassed(cond) then
        RdHi = (Rm * Rs)[63:32]            /* Unsigned multiplication */
        RdLo = (Rm * Rs)[31:0]
        if S == 1 then
            N Flag = RdHi[31]
            Z Flag = if (RdHi == 0) and (RdLo == 0) then 1 else 0
            C Flag = unaffected            /* See "C and V flags" note */
            V Flag = unaffected            /* See "C and V flags" note */

Usage

UMULL multiplies unsigned variables to produce a 64-bit result in two general-purpose registers.

Notes

Use of R15    Specifying R15 for register <RdHi>, <RdLo>, <Rm>, or <Rs> has UNPREDICTABLE results.
Operand restriction    <RdHi> and <RdLo> must be distinct registers, or the results are UNPREDICTABLE. Specifying the same register for either <RdHi> and <Rm>, or <RdLo> and <Rm>, was previously described as producing UNPREDICTABLE results. There is no restriction in ARMv6, and it is believed all relevant ARMv4 and ARMv5 implementations do not require this restriction either, because high performance multipliers read all their operands prior to writing back any results.
Early termination    If the multiplier implementation supports early termination, it must be implemented on the value of the <Rs> operand. The type of early termination used (signed or unsigned) is IMPLEMENTATION DEFINED.
C and V flags    UMULLS is defined to leave the C and V flags unchanged in ARMv5 and above. In earlier versions of the architecture, the values of the C and V flags were UNPREDICTABLE after a UMULLS instruction.

A4.1.130 UQADD16

    31:28   27:20             19:16   15:12   11:8   7:4       3:0
    cond    0 1 1 0 0 1 1 0   Rn      Rd      SBO    0 0 0 1   Rm

UQADD16 (Unsigned Saturating Add) performs two 16-bit integer additions. It saturates the results to the 16-bit unsigned integer range 0 <= x <= 2^16 - 1. It has no effect on the GE flags.

Syntax

    UQADD16{<cond>} <Rd>, <Rn>, <Rm>

where:

<cond>    Is the condition under which the instruction is executed. The conditions are defined in The condition field on page A3-3. If <cond> is omitted, the AL (always) condition is used.
<Rd>    Specifies the destination register.
<Rn>    Specifies the register that contains the first operand.
<Rm>    Specifies the register that contains the second operand.

Architecture version

ARMv6 and above.

Exceptions

None.

Operation

    if ConditionPassed(cond) then
        Rd[15:0]  = UnsignedSat(Rn[15:0] + Rm[15:0], 16)
        Rd[31:16] = UnsignedSat(Rn[31:16] + Rm[31:16], 16)

Usage

Use UQADD16 in similar ways to UADD16, but for unsigned saturated arithmetic. UQADD16 does not set the GE bits for use with SEL. See UADD16 on page A4-232 for more details.

Notes

Use of R15    Specifying R15 for register <Rd>, <Rn>, or <Rm> has UNPREDICTABLE results.

A4.1.131 UQADD8

    31:28   27:20             19:16   15:12   11:8   7:4       3:0
    cond    0 1 1 0 0 1 1 0   Rn      Rd      SBO    1 0 0 1   Rm

UQADD8 performs four 8-bit integer additions. It saturates the results to the 8-bit unsigned integer range 0 <= x <= 2^8 - 1. It has no effect on the GE flags.

Syntax

    UQADD8{<cond>} <Rd>, <Rn>, <Rm>

where:

<cond>    Is the condition under which the instruction is executed. The conditions are defined in The condition field on page A3-3. If <cond> is omitted, the AL (always) condition is used.
<Rd>    Specifies the destination register.
<Rn>    Specifies the register that contains the first operand.
<Rm>    Specifies the register that contains the second operand.

Architecture version

ARMv6 and above.

Exceptions

None.

Operation

    if ConditionPassed(cond) then
        Rd[7:0]   = UnsignedSat(Rn[7:0]   + Rm[7:0],   8)
        Rd[15:8]  = UnsignedSat(Rn[15:8]  + Rm[15:8],  8)
        Rd[23:16] = UnsignedSat(Rn[23:16] + Rm[23:16], 8)
        Rd[31:24] = UnsignedSat(Rn[31:24] + Rm[31:24], 8)

Usage

Use UQADD8 in similar ways to UADD8, but for unsigned saturated arithmetic. UQADD8 does not set the GE bits for use with SEL. See UADD8 on page A4-233 for more details.

Notes

Use of R15    Specifying R15 for register <Rd>, <Rn>, or <Rm> has UNPREDICTABLE results.
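The lane-wise saturating additions of UQADD16 and UQADD8 can be modelled in a few lines of Python. This is only an illustrative sketch of the Operation pseudocode, not part of the architecture definition: the helper names unsigned_sat, split_lanes, join_lanes, uqadd16 and uqadd8 are assumptions made for this example, and registers are represented as plain 32-bit integers.

```python
def unsigned_sat(value, bits):
    """Clamp an integer to the unsigned range 0 .. 2**bits - 1."""
    return max(0, min(value, (1 << bits) - 1))


def split_lanes(word, width):
    """Split a 32-bit word into unsigned lanes of `width` bits, lowest lane first."""
    mask = (1 << width) - 1
    return [(word >> shift) & mask for shift in range(0, 32, width)]


def join_lanes(lanes, width):
    """Pack lanes (lowest lane first) back into a 32-bit word."""
    word = 0
    for index, lane in enumerate(lanes):
        word |= lane << (index * width)
    return word


def uqadd(rn, rm, width):
    """Add corresponding lanes of rn and rm, saturating each sum separately."""
    sums = [unsigned_sat(a + b, width)
            for a, b in zip(split_lanes(rn, width), split_lanes(rm, width))]
    return join_lanes(sums, width)


def uqadd16(rn, rm):
    """Model of UQADD16: two saturating 16-bit additions."""
    return uqadd(rn, rm, 16)


def uqadd8(rn, rm):
    """Model of UQADD8: four saturating 8-bit additions."""
    return uqadd(rn, rm, 8)
```

In this model, a lane that overflows is clamped to all ones while the neighbouring lanes are unaffected, which is the difference from a plain 32-bit ADD.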
A4.1.132 UQADDSUBX

    31:28   27:20             19:16   15:12   11:8   7:4       3:0
    cond    0 1 1 0 0 1 1 0   Rn      Rd      SBO    0 0 1 1   Rm

UQADDSUBX (Unsigned Saturating Add and Subtract with Exchange) performs one 16-bit integer addition and one 16-bit subtraction. It saturates the results to the 16-bit unsigned integer range 0 <= x <= 2^16 - 1. It exchanges the two halfwords of the second operand before it performs the arithmetic. It has no effect on the GE flags.

Syntax

    UQADDSUBX{<cond>} <Rd>, <Rn>, <Rm>

where:

<cond>    Is the condition under which the instruction is executed. The conditions are defined in The condition field on page A3-3. If <cond> is omitted, the AL (always) condition is used.
<Rd>    Specifies the destination register.
<Rn>    Specifies the register that contains the first operand.
<Rm>    Specifies the register that contains the second operand.

Architecture version

ARMv6 and above.

Exceptions

None.

Operation

    if ConditionPassed(cond) then
        Rd[15:0]  = UnsignedSat(Rn[15:0] - Rm[31:16], 16)
        Rd[31:16] = UnsignedSat(Rn[31:16] + Rm[15:0], 16)

Usage

Use UQADDSUBX in similar ways to UADDSUBX, but for unsigned saturated arithmetic. UQADDSUBX does not set the GE bits for use with SEL. See UADDSUBX on page A4-235 for more details.

Notes

Use of R15    Specifying R15 for register <Rd>, <Rn>, or <Rm> has UNPREDICTABLE results.

A4.1.133 UQSUB16

    31:28   27:20             19:16   15:12   11:8   7:4       3:0
    cond    0 1 1 0 0 1 1 0   Rn      Rd      SBO    0 1 1 1   Rm

UQSUB16 (Unsigned Saturating Subtract) performs two 16-bit subtractions. It saturates the results to the 16-bit unsigned integer range 0 <= x <= 2^16 - 1. It has no effect on the GE flags.

Syntax

    UQSUB16{<cond>} <Rd>, <Rn>, <Rm>

where:

<cond>    Is the condition under which the instruction is executed. The conditions are defined in The condition field on page A3-3.
          If <cond> is omitted, the AL (always) condition is used.
<Rd>    Specifies the destination register.
<Rn>    Specifies the register that contains the first operand.
<Rm>    Specifies the register that contains the second operand.

Architecture version

ARMv6 and above.

Exceptions

None.

Operation

    if ConditionPassed(cond) then
        Rd[15:0]  = UnsignedSat(Rn[15:0] - Rm[15:0], 16)
        Rd[31:16] = UnsignedSat(Rn[31:16] - Rm[31:16], 16)

Usage

Use UQSUB16 in similar ways to USUB16, but for unsigned saturated arithmetic. UQSUB16 does not set the GE bits for use with SEL. See SSUB16 on page A4-180 for more details.

Notes

Use of R15    Specifying R15 for register <Rd>, <Rn>, or <Rm> has UNPREDICTABLE results.

A4.1.134 UQSUB8

    31:28   27:20             19:16   15:12   11:8   7:4       3:0
    cond    0 1 1 0 0 1 1 0   Rn      Rd      SBO    1 1 1 1   Rm

UQSUB8 performs four 8-bit subtractions. It saturates the results to the 8-bit unsigned integer range 0 <= x <= 2^8 - 1. It has no effect on the GE flags.

Syntax

    UQSUB8{<cond>} <Rd>, <Rn>, <Rm>

where:

<cond>    Is the condition under which the instruction is executed. The conditions are defined in The condition field on page A3-3. If <cond> is omitted, the AL (always) condition is used.
<Rd>    Specifies the destination register.
<Rn>    Specifies the register that contains the first operand.
<Rm>    Specifies the register that contains the second operand.

Architecture version

ARMv6 and above.

Exceptions

None.

Operation

    if ConditionPassed(cond) then
        Rd[7:0]   = UnsignedSat(Rn[7:0]   - Rm[7:0],   8)
        Rd[15:8]  = UnsignedSat(Rn[15:8]  - Rm[15:8],  8)
        Rd[23:16] = UnsignedSat(Rn[23:16] - Rm[23:16], 8)
        Rd[31:24] = UnsignedSat(Rn[31:24] - Rm[31:24], 8)

Usage

Use UQSUB8 in similar ways to USUB8, but for unsigned saturated arithmetic. UQSUB8 does not set the GE bits for use with SEL. See SSUB8 on page A4-182 for more details.

Notes

Use of R15    Specifying R15 for register <Rd>, <Rn>, or <Rm> has UNPREDICTABLE results.
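For the saturating subtractions, the only saturation that can occur is at the lower bound, because the difference of two unsigned lanes never exceeds the lane maximum. The following Python sketch models the UQSUB16 and UQSUB8 Operation pseudocode under that reading. The function names uqsub, uqsub16 and uqsub8 are assumptions for illustration, and registers are modelled as 32-bit integers.

```python
def uqsub(rn, rm, width):
    """Subtract each `width`-bit lane of rm from the matching lane of rn.

    A negative difference saturates to 0, mirroring UnsignedSat().
    """
    mask = (1 << width) - 1
    rd = 0
    for shift in range(0, 32, width):
        a = (rn >> shift) & mask
        b = (rm >> shift) & mask
        rd |= max(0, a - b) << shift
    return rd


def uqsub16(rn, rm):
    """Model of UQSUB16: two saturating 16-bit subtractions."""
    return uqsub(rn, rm, 16)


def uqsub8(rn, rm):
    """Model of UQSUB8: four saturating 8-bit subtractions."""
    return uqsub(rn, rm, 8)
```

A typical use of this behaviour is subtracting a bias from packed pixel bytes without letting dark values wrap round to bright ones.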
A4.1.135 UQSUBADDX

    31:28   27:20             19:16   15:12   11:8   7:4       3:0
    cond    0 1 1 0 0 1 1 0   Rn      Rd      SBO    0 1 0 1   Rm

UQSUBADDX (Unsigned Saturating Subtract and Add with Exchange) performs one 16-bit integer subtraction and one 16-bit integer addition. It saturates the results to the 16-bit unsigned integer range 0 <= x <= 2^16 - 1. It exchanges the two halfwords of the second operand before it performs the arithmetic. It has no effect on the GE flags.

Syntax

    UQSUBADDX{<cond>} <Rd>, <Rn>, <Rm>

where:

<cond>    Is the condition under which the instruction is executed. The conditions are defined in The condition field on page A3-3. If <cond> is omitted, the AL (always) condition is used.
<Rd>    Specifies the destination register.
<Rn>    Specifies the register that contains the first operand.
<Rm>    Specifies the register that contains the second operand.

Architecture version

ARMv6 and above.

Exceptions

None.

Operation

    if ConditionPassed(cond) then
        Rd[31:16] = UnsignedSat(Rn[31:16] - Rm[15:0], 16)
        Rd[15:0]  = UnsignedSat(Rn[15:0] + Rm[31:16], 16)

Usage

You can use UQSUBADDX in similar ways to USUBADDX, but for unsigned saturated arithmetic. UQSUBADDX does not set the GE bits for use with SEL. See UADDSUBX on page A4-235 for more details.

Notes

Use of R15    Specifying R15 for register <Rd>, <Rn>, or <Rm> has UNPREDICTABLE results.

A4.1.136 USAD8

    31:28   27:20             19:16   15:12     11:8   7:4       3:0
    cond    0 1 1 1 1 0 0 0   Rd      1 1 1 1   Rs     0 0 0 1   Rm

USAD8 (Unsigned Sum of Absolute Differences) performs four unsigned 8-bit subtractions, and adds the absolute values of the differences together.

Syntax

    USAD8{<cond>} <Rd>, <Rm>, <Rs>

where:

<cond>    Is the condition under which the instruction is executed. The conditions are defined in The condition field on page A3-3.
          If <cond> is omitted, the AL (always) condition is used.
<Rd>    Specifies the destination register.
<Rm>    Specifies the register that contains the first operand.
<Rs>    Specifies the register that contains the second operand.

Architecture version

ARMv6 and above.

Exceptions

None.

Operation

    if ConditionPassed(cond) then
        if Rm[7:0] < Rs[7:0] then              /* Unsigned comparison */
            diff1 = Rs[7:0] - Rm[7:0]
        else
            diff1 = Rm[7:0] - Rs[7:0]
        if Rm[15:8] < Rs[15:8] then            /* Unsigned comparison */
            diff2 = Rs[15:8] - Rm[15:8]
        else
            diff2 = Rm[15:8] - Rs[15:8]
        if Rm[23:16] < Rs[23:16] then          /* Unsigned comparison */
            diff3 = Rs[23:16] - Rm[23:16]
        else
            diff3 = Rm[23:16] - Rs[23:16]
        if Rm[31:24] < Rs[31:24] then          /* Unsigned comparison */
            diff4 = Rs[31:24] - Rm[31:24]
        else
            diff4 = Rm[31:24] - Rs[31:24]
        Rd = ZeroExtend(diff1) + ZeroExtend(diff2)
           + ZeroExtend(diff3) + ZeroExtend(diff4)

Usage

You can use USAD8 to process the first four bytes in a video motion estimation calculation.

Notes

Use of R15    Specifying R15 for register <Rd>, <Rm>, or <Rs> has UNPREDICTABLE results.

A4.1.137 USADA8

    31:28   27:20             19:16   15:12   11:8   7:4       3:0
    cond    0 1 1 1 1 0 0 0   Rd      Rn      Rs     0 0 0 1   Rm

USADA8 (Unsigned Sum of Absolute Differences and Accumulate) performs four unsigned 8-bit subtractions, and adds the absolute values of the differences to a 32-bit accumulate operand.

Syntax

    USADA8{<cond>} <Rd>, <Rm>, <Rs>, <Rn>

where:

<cond>    Is the condition under which the instruction is executed. The conditions are defined in The condition field on page A3-3. If <cond> is omitted, the AL (always) condition is used.
<Rd>    Specifies the destination register.
<Rm>    Specifies the register that contains the first main operand.
<Rs>    Specifies the register that contains the second main operand.
<Rn>    Specifies the register that contains the accumulate operand.
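The sum-of-absolute-differences computation shared by USAD8 and USADA8 can be sketched in Python as follows. This is a hedged illustration only: the function names usad8 and usada8 are assumptions, registers are modelled as 32-bit integers, and the final mask in usada8 reflects the assumption that the accumulated sum is truncated to the 32-bit destination register.

```python
def usad8(rm, rs):
    """Model of USAD8: sum of the absolute differences of four unsigned bytes."""
    total = 0
    for shift in (0, 8, 16, 24):
        a = (rm >> shift) & 0xFF
        b = (rs >> shift) & 0xFF
        total += abs(a - b)  # same as the compare-and-subtract in the pseudocode
    return total


def usada8(rm, rs, rn):
    """Model of USADA8: USAD8 result added to the accumulate operand rn.

    Assumption: the sum is truncated to 32 bits, as it is written to a
    32-bit register.
    """
    return (rn + usad8(rm, rs)) & 0xFFFFFFFF
```

A block-matching loop would typically start with one usad8 call and then chain usada8 calls, passing each result as the next accumulate operand.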
Architecture version

ARMv6 and above.

Exceptions

None.

Operation

    if ConditionPassed(cond) then
        if Rm[7:0] < Rs[7:0] then              /* Unsigned comparison */
            diff1 = Rs[7:0] - Rm[7:0]
        else
            diff1 = Rm[7:0] - Rs[7:0]
        if Rm[15:8] < Rs[15:8] then            /* Unsigned comparison */
            diff2 = Rs[15:8] - Rm[15:8]
        else
            diff2 = Rm[15:8] - Rs[15:8]
        if Rm[23:16] < Rs[23:16] then          /* Unsigned comparison */
            diff3 = Rs[23:16] - Rm[23:16]
        else
            diff3 = Rm[23:16] - Rs[23:16]
        if Rm[31:24] < Rs[31:24] then          /* Unsigned comparison */
            diff4 = Rs[31:24] - Rm[31:24]
        else
            diff4 = Rm[31:24] - Rs[31:24]
        Rd = Rn + ZeroExtend(diff1) + ZeroExtend(diff2)
                + ZeroExtend(diff3) + ZeroExtend(diff4)

Usage

You can use USADA8 in video motion estimation calculations.

Notes

Use of R15    Specifying R15 for register <Rd>, <Rm>, or <Rs> has UNPREDICTABLE results.
Encoding    If the <Rn> field of the instruction contains 0b1111, the instruction is a USAD8 instruction instead, see USAD8 on page A4-261.

A4.1.138 USAT

    31:28   27:21           20:16     15:12   11:7        6    5:4   3:0
    cond    0 1 1 0 1 1 1   sat_imm   Rd      shift_imm   sh   0 1   Rm

USAT (Unsigned Saturate) saturates a signed value to an unsigned range. You can choose the bit position at which saturation occurs. You can apply a shift to the value before the saturation occurs. The Q flag is set if the operation saturates.

Syntax

    USAT{<cond>} <Rd>, #<immed>, <Rm>{, <shift>}

where:

<cond>    Is the condition under which the instruction is executed. The conditions are defined in The condition field on page A3-3. If <cond> is omitted, the AL (always) condition is used.
<Rd>    Specifies the destination register.
<immed>    Specifies the bit position for saturation. This lies in the range 0 to 31. It is encoded in the sat_imm field of the instruction.
<Rm>    Specifies the register that contains the signed value to be saturated.
<shift>    Specifies the optional shift.
          If present, it must be one of:
          • LSL #N. N must be in the range 0 to 31. This is encoded as sh == 0 and shift_imm == N.
          • ASR #N. N must be in the range 1 to 32. This is encoded as sh == 1 and either shift_imm == 0 for N == 32, or shift_imm == N otherwise.
          If <shift> is omitted, LSL #0 is used.

Return

The value returned in Rd is:

    0          if X < 0
    X          if 0 <= X < 2^n
    2^n - 1    if X > 2^n - 1

where n is <immed>, and X is the shifted value from Rm.

Architecture version

ARMv6 and above.

Exceptions

None.

Operation

    if ConditionPassed(cond) then
        if sh == 1 then
            if shift_imm == 0 then
                operand = (Rm Arithmetic_Shift_Right 32)[31:0]
            else
                operand = (Rm Arithmetic_Shift_Right shift_imm)[31:0]
        else
            operand = (Rm Logical_Shift_Left shift_imm)[31:0]
        Rd = UnsignedSat(operand, sat_imm)     /* operand treated as signed */
        if UnsignedDoesSat(operand, sat_imm) then
            Q Flag = 1

Usage

You can use USAT in various DSP algorithms, such as calculating a pixel color component, that require scaling and saturation of signed data to an unsigned destination.

Notes

Use of R15    Specifying R15 for register <Rd> or <Rm> has UNPREDICTABLE results.

A4.1.139 USAT16

    31:28   27:20             19:16     15:12   11:8   7:4       3:0
    cond    0 1 1 0 1 1 1 0   sat_imm   Rd      SBO    0 0 1 1   Rm

USAT16 saturates two signed 16-bit values to an unsigned range. You can choose the bit position at which saturation occurs. The Q flag is set if either halfword operation saturates.

Syntax

    USAT16{<cond>} <Rd>, #<immed>, <Rm>

where:

<cond>    Is the condition under which the instruction is executed. The conditions are defined in The condition field on page A3-3. If <cond> is omitted, the AL (always) condition is used.
<Rd>    Specifies the destination register.
<immed>    Specifies the bit position for saturation. This lies in the range 0 to 15. It is encoded in the sat_imm field of the instruction.
<Rm>    Specifies the register that contains the signed value to be saturated.

Return

The value returned in each half of Rd is:

    0          if X < 0
    X          if 0 <= X < 2^n
    2^n - 1    if X > 2^n - 1

where n is <immed>, and X is the value from the corresponding half of Rm.

Architecture version

ARMv6 and above.

Exceptions

None.

Operation

    if ConditionPassed(cond) then
        Rd[15:0]  = UnsignedSat(Rm[15:0], sat_imm)      // Rm[15:0] treated as signed
        Rd[31:16] = UnsignedSat(Rm[31:16], sat_imm)     // Rm[31:16] treated as signed
        if UnsignedDoesSat(Rm[15:0], sat_imm) OR UnsignedDoesSat(Rm[31:16], sat_imm) then
            Q Flag = 1

Usage

You can use USAT16 in various DSP algorithms, such as calculating a pixel color component, that require saturation of signed data to an unsigned destination.

Notes

Use of R15    Specifying R15 for register <Rd> or <Rm> has UNPREDICTABLE results.

A4.1.140 USUB16

    31:28   27:20             19:16   15:12   11:8   7:4       3:0
    cond    0 1 1 0 0 1 0 1   Rn      Rd      SBO    0 1 1 1   Rm

USUB16 (Unsigned Subtract) performs two 16-bit unsigned integer subtractions. It sets the GE bits in the CPSR as borrow bits for the subtractions.

Syntax

    USUB16{<cond>} <Rd>, <Rn>, <Rm>

where:

<cond>    Is the condition under which the instruction is executed. The conditions are defined in The condition field on page A3-3. If <cond> is omitted, the AL (always) condition is used.
<Rd>    Specifies the destination register.
<Rn>    Specifies the register that contains the first operand.
<Rm>    Specifies the register that contains the second operand.

Architecture version

ARMv6 and above.

Exceptions

None.
Operation

    if ConditionPassed(cond) then
        Rd[15:0]  = Rn[15:0] - Rm[15:0]
        GE[1:0]   = if BorrowFrom(Rn[15:0] - Rm[15:0]) then 0 else 0b11
        Rd[31:16] = Rn[31:16] - Rm[31:16]
        GE[3:2]   = if BorrowFrom(Rn[31:16] - Rm[31:16]) then 0 else 0b11

Usage

USUB16 produces the same result as SSUB16 (see SSUB16 on page A4-180), but produces GE bit values based on unsigned arithmetic instead of signed arithmetic.

Notes

Use of R15    Specifying R15 for register <Rd>, <Rn>, or <Rm> has UNPREDICTABLE results.

A4.1.141 USUB8

    31:28   27:20             19:16   15:12   11:8   7:4       3:0
    cond    0 1 1 0 0 1 0 1   Rn      Rd      SBO    1 1 1 1   Rm

USUB8 performs four 8-bit unsigned integer subtractions. It sets the GE bits in the CPSR as borrow bits for the subtractions.

Syntax

    USUB8{<cond>} <Rd>, <Rn>, <Rm>

where:

<cond>    Is the condition under which the instruction is executed. The conditions are defined in The condition field on page A3-3. If <cond> is omitted, the AL (always) condition is used.
<Rd>    Specifies the destination register.
<Rn>    Specifies the register that contains the first operand.
<Rm>    Specifies the register that contains the second operand.

Architecture version

ARMv6 and above.

Exceptions

None.

Operation

    if ConditionPassed(cond) then
        Rd[7:0]   = Rn[7:0] - Rm[7:0]
        GE[0]     = NOT BorrowFrom(Rn[7:0] - Rm[7:0])
        Rd[15:8]  = Rn[15:8] - Rm[15:8]
        GE[1]     = NOT BorrowFrom(Rn[15:8] - Rm[15:8])
        Rd[23:16] = Rn[23:16] - Rm[23:16]
        GE[2]     = NOT BorrowFrom(Rn[23:16] - Rm[23:16])
        Rd[31:24] = Rn[31:24] - Rm[31:24]
        GE[3]     = NOT BorrowFrom(Rn[31:24] - Rm[31:24])

Usage

USUB8 produces the same result as SSUB8 (see SSUB8 on page A4-182), but produces GE bit values based on unsigned arithmetic instead of signed arithmetic.

Notes

Use of R15    Specifying R15 for register <Rd>, <Rn>, or <Rm> has UNPREDICTABLE results.
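The relationship between the lane results and the GE bits in USUB16 and USUB8 can be seen in a small Python model. This sketch is illustrative only: the names usub, usub16 and usub8 are assumptions, registers are 32-bit integers, and the GE field is returned as a separate 4-bit integer rather than as part of a CPSR value.

```python
def usub(rn, rm, width):
    """Lane-wise unsigned subtraction returning (rd, ge).

    Each lane result wraps modulo 2**width. The GE bits that belong to a
    lane are set when that lane's subtraction did NOT borrow.
    """
    mask = (1 << width) - 1
    lanes = 32 // width          # 2 lanes for USUB16, 4 lanes for USUB8
    ge_per_lane = 4 // lanes     # 2 GE bits per halfword, 1 per byte
    rd = 0
    ge = 0
    for lane in range(lanes):
        shift = lane * width
        a = (rn >> shift) & mask
        b = (rm >> shift) & mask
        rd |= ((a - b) & mask) << shift
        if a >= b:  # no borrow
            ge |= ((1 << ge_per_lane) - 1) << (lane * ge_per_lane)
    return rd, ge


def usub16(rn, rm):
    """Model of USUB16."""
    return usub(rn, rm, 16)


def usub8(rn, rm):
    """Model of USUB8."""
    return usub(rn, rm, 8)
```

Because a set GE bit means "first operand lane is greater than or equal to the second", a following SEL can use the flags to pick the larger or smaller of the two lanes.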
A4.1.142 USUBADDX

    31:28   27:20             19:16   15:12   11:8   7:4       3:0
    cond    0 1 1 0 0 1 0 1   Rn      Rd      SBO    0 1 0 1   Rm

USUBADDX (Unsigned Subtract and Add with Exchange) performs one 16-bit unsigned integer subtraction and one 16-bit unsigned integer addition. It exchanges the two halfwords of the second operand before it performs the arithmetic. It sets the GE bits in the CPSR as borrow and carry bits.

Syntax

    USUBADDX{<cond>} <Rd>, <Rn>, <Rm>

where:

<cond>    Is the condition under which the instruction is executed. The conditions are defined in The condition field on page A3-3. If <cond> is omitted, the AL (always) condition is used.
<Rd>    Specifies the destination register.
<Rn>    Specifies the register that contains the first operand.
<Rm>    Specifies the register that contains the second operand.

Architecture version

ARMv6 and above.

Exceptions

None.

Operation

    if ConditionPassed(cond) then
        diff = Rn[31:16] - Rm[15:0]            /* unsigned subtraction */
        Rd[31:16] = diff[15:0]
        GE[3:2] = if BorrowFrom(Rn[31:16] - Rm[15:0]) then 0 else 0b11
        sum = Rn[15:0] + Rm[31:16]             /* unsigned addition */
        Rd[15:0] = sum[15:0]
        GE[1:0] = if CarryFrom16(Rn[15:0] + Rm[31:16]) then 0b11 else 0

Usage

USUBADDX produces the same result as SSUBADDX (see SSUBADDX on page A4-184), but produces GE bit values based on unsigned arithmetic instead of signed arithmetic.

Notes

Use of R15    Specifying R15 for register <Rd>, <Rn>, or <Rm> has UNPREDICTABLE results.

A4.1.143 UXTAB

    31:28   27:20             19:16   15:12   11:10    9:8   7:4       3:0
    cond    0 1 1 0 1 1 1 0   Rn      Rd      rotate   SBZ   0 1 1 1   Rm

UXTAB extracts an 8-bit value from a register, zero extends it to 32 bits, and adds the result to the value in another register. You can specify a rotation by 0, 8, 16, or 24 bits before extracting the 8-bit value.
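As a rough illustration of the UXTAB behaviour just described, the following Python sketch rotates the second operand, keeps the low byte, and adds it to the first operand. The names ror32 and uxtab are assumptions made for this example, the rotate argument stands for the 2-bit rotate field (0 to 3), and the final mask assumes the sum wraps to the 32-bit destination register.

```python
def ror32(value, amount):
    """Rotate a 32-bit value right by `amount` bits."""
    amount %= 32
    return ((value >> amount) | (value << (32 - amount))) & 0xFFFFFFFF


def uxtab(rn, rm, rotate=0):
    """Model of UXTAB: rn plus the zero-extended byte selected from rm.

    `rotate` is the encoded rotate field: 0, 1, 2 or 3 selects a rotation
    of 0, 8, 16 or 24 bits.
    """
    operand2 = ror32(rm, 8 * rotate) & 0x000000FF
    return (rn + operand2) & 0xFFFFFFFF
```

With rm = 0x12345678, the four rotate values select the bytes 0x78, 0x56, 0x34 and 0x12 in turn.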
Syntax

    UXTAB{<cond>} <Rd>, <Rn>, <Rm>{, <rotation>}

where:

<cond>    Is the condition under which the instruction is executed. The conditions are defined in The condition field on page A3-3. If <cond> is omitted, the AL (always) condition is used.
<Rd>    Specifies the destination register.
<Rn>    Specifies the register that contains the first operand.
<Rm>    Specifies the register that contains the second operand.
<rotation>    This can be any one of:
          • ROR #8. This is encoded as 0b01 in the rotate field.
          • ROR #16. This is encoded as 0b10 in the rotate field.
          • ROR #24. This is encoded as 0b11 in the rotate field.
          • Omitted. This is encoded as 0b00 in the rotate field.

Note    If your assembler accepts shifts by #0 and treats them as equivalent to no shift or LSL #0, then it must accept ROR #0 here. It is equivalent to omitting <rotation>.

Architecture version

ARMv6 and above.

Exceptions

None.

Operation

    if ConditionPassed(cond) then
        operand2 = (Rm Rotate_Right(8 * rotate)) AND 0x000000ff
        Rd = Rn + operand2

Usage

You can use UXTAB to eliminate a separate zero-extension instruction in many instruction sequences that act on unsigned char values in C/C++.

Notes

Use of R15    Specifying R15 for register <Rd> or <Rm> has UNPREDICTABLE results.

Note    Your assembler must fault the use of R15 for register <Rn>.

Encoding    If the <Rn> field of the instruction contains 0b1111, the instruction is a UXTB instruction instead, see UXTB on page A4-280.

A4.1.144 UXTAB16

    31:28   27:20             19:16   15:12   11:10    9:8   7:4       3:0
    cond    0 1 1 0 1 1 0 0   Rn      Rd      rotate   SBZ   0 1 1 1   Rm

UXTAB16 extracts two 8-bit values from a register, zero extends them to 16 bits each, and adds the results to the two values from another register. You can specify a rotation by 0, 8, 16, or 24 bits before extracting the 8-bit values.
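The two-lane behaviour of UXTAB16 described above can be sketched in Python as follows. This is an illustrative model under stated assumptions rather than a definitive implementation: ror32 and uxtab16 are names invented for the example, rotate stands for the 2-bit rotate field, and each halfword sum is assumed to wrap independently within its 16-bit lane.

```python
def ror32(value, amount):
    """Rotate a 32-bit value right by `amount` bits."""
    amount %= 32
    return ((value >> amount) | (value << (32 - amount))) & 0xFFFFFFFF


def uxtab16(rn, rm, rotate=0):
    """Model of UXTAB16.

    Bytes 0 and 2 of the rotated rm are zero extended to 16 bits and added
    to the low and high halfwords of rn. No carry crosses between lanes.
    """
    operand2 = ror32(rm, 8 * rotate) & 0x00FF00FF
    low = ((rn & 0xFFFF) + (operand2 & 0xFFFF)) & 0xFFFF
    high = ((rn >> 16) + (operand2 >> 16)) & 0xFFFF
    return (high << 16) | low
```

Calling the model twice on the same rm, once with rotate 0 and once with rotate 1, accumulates all four bytes of rm into two pairs of 16-bit running totals.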
Syntax

    UXTAB16{<cond>} <Rd>, <Rn>, <Rm>{, <rotation>}

where:

<cond>    Is the condition under which the instruction is executed. The conditions are defined in The condition field on page A3-3. If <cond> is omitted, the AL (always) condition is used.
<Rd>    Specifies the destination register.
<Rn>    Specifies the register that contains the first operand.
<Rm>    Specifies the register that contains the second operand.
<rotation>    This can be any one of:
          • ROR #8. This is encoded as 0b01 in the rotate field.
          • ROR #16. This is encoded as 0b10 in the rotate field.
          • ROR #24. This is encoded as 0b11 in the rotate field.
          • Omitted. This is encoded as 0b00 in the rotate field.

Note    If your assembler accepts shifts by #0 and treats them as equivalent to no shift or LSL #0, then it must accept ROR #0 here. It is equivalent to omitting <rotation>.

Architecture version

ARMv6 and above.

Exceptions

None.

Operation

    if ConditionPassed(cond) then
        operand2 = (Rm Rotate_Right(8 * rotate)) AND 0x00ff00ff
        Rd[15:0]  = Rn[15:0] + operand2[15:0]
        Rd[31:16] = Rn[31:16] + operand2[23:16]

Usage

Use UXTAB16 to keep intermediate values to higher precision while working on arrays of unsigned byte values.

Notes

Use of R15    Specifying R15 for register <Rd> or <Rm> has UNPREDICTABLE results.

Note    Your assembler must fault the use of R15 for register <Rn>.

Encoding    If the <Rn> field of the instruction contains 0b1111, the instruction is a UXTB16 instruction instead, see UXTB16 on page A4-282.

A4.1.145 UXTAH

    31:28   27:20             19:16   15:12   11:10    9:8   7:4       3:0
    cond    0 1 1 0 1 1 1 1   Rn      Rd      rotate   SBZ   0 1 1 1   Rm

UXTAH extracts a 16-bit value from a register, zero extends it to 32 bits, and adds the result to a value in another register. You can specify a rotation by 0, 8, 16, or 24 bits before extracting the 16-bit value.
Syntax

    UXTAH{<cond>} <Rd>, <Rn>, <Rm>{, <rotation>}

where:

<cond>    Is the condition under which the instruction is executed. The conditions are defined in The condition field on page A3-3. If <cond> is omitted, the AL (always) condition is used.
<Rd>    Specifies the destination register.
<Rn>    Specifies the register that contains the first operand.
<Rm>    Specifies the register that contains the second operand.
<rotation>    This can be any one of:
          • ROR #8. This is encoded as 0b01 in the rotate field.
          • ROR #16. This is encoded as 0b10 in the rotate field.
          • ROR #24. This is encoded as 0b11 in the rotate field.
          • Omitted. This is encoded as 0b00 in the rotate field.

Note    If your assembler accepts shifts by #0 and treats them as equivalent to no shift or LSL #0, then it must accept ROR #0 here. It is equivalent to omitting <rotation>.

Architecture version

ARMv6 and above.

Exceptions

None.

Operation

    if ConditionPassed(cond) then
        operand2 = (Rm Rotate_Right(8 * rotate)) AND 0x0000ffff
        Rd = Rn + operand2

Usage

You can use UXTAH to eliminate a separate zero-extension instruction in many instruction sequences that act on unsigned short values in C/C++.

Notes

Use of R15    Specifying R15 for register <Rd> or <Rm> has UNPREDICTABLE results.

Note    Your assembler must fault the use of R15 for register <Rn>.

Encoding    If the <Rn> field of the instruction contains 0b1111, the instruction is a UXTH instruction instead, see UXTH on page A4-284.

A4.1.146 UXTB

    31:28   27:20             19:16     15:12   11:10    9:8   7:4       3:0
    cond    0 1 1 0 1 1 1 0   1 1 1 1   Rd      rotate   SBZ   0 1 1 1   Rm

UXTB extracts an 8-bit value from a register and zero extends it to 32 bits. You can specify a rotation by 0, 8, 16, or 24 bits before extracting the 8-bit value.

Syntax

    UXTB{<cond>} <Rd>, <Rm>{, <rotation>}

where:

<cond>    Is the condition under which the instruction is executed.
          The conditions are defined in The condition field on page A3-3. If <cond> is omitted, the AL (always) condition is used.
<Rd>    Specifies the destination register.
<Rm>    Specifies the register that contains the operand.
<rotation>    This can be any one of:
          • ROR #8. This is encoded as 0b01 in the rotate field.
          • ROR #16. This is encoded as 0b10 in the rotate field.
          • ROR #24. This is encoded as 0b11 in the rotate field.
          • Omitted. This is encoded as 0b00 in the rotate field.

Note    If your assembler accepts shifts by #0 and treats them as equivalent to no shift or LSL #0, then it must accept ROR #0 here. It is equivalent to omitting <rotation>.

Architecture version

ARMv6 and above.

Exceptions

None.

Operation

    if ConditionPassed(cond) then
        Rd[31:0] = (Rm Rotate_Right(8 * rotate)) AND 0x000000ff

Usage

Use UXTB to zero extend a byte to a word, for example in instruction sequences acting on unsigned char values in C/C++.

Notes

Use of R15    Specifying R15 for register <Rd> or <Rm> has UNPREDICTABLE results.

A4.1.147 UXTB16

    31:28   27:20             19:16     15:12   11:10    9:8   7:4       3:0
    cond    0 1 1 0 1 1 0 0   1 1 1 1   Rd      rotate   SBZ   0 1 1 1   Rm

UXTB16 extracts two 8-bit values from a register and zero extends them to 16 bits each. You can specify a rotation by 0, 8, 16, or 24 bits before extracting the 8-bit values.

Syntax

    UXTB16{<cond>} <Rd>, <Rm>{, <rotation>}

where:

<cond>    Is the condition under which the instruction is executed. The conditions are defined in The condition field on page A3-3. If <cond> is omitted, the AL (always) condition is used.
<Rd>    Specifies the destination register.
<Rm>    Specifies the register that contains the operand.
<rotation>    This can be any one of:
          • ROR #8. This is encoded as 0b01 in the rotate field.
          • ROR #16. This is encoded as 0b10 in the rotate field.
          • ROR #24. This is encoded as 0b11 in the rotate field.
          • Omitted.
                This is encoded as 0b00 in the rotate field.

              Note
              If your assembler accepts shifts by #0 and treats them as equivalent to no shift or LSL #0, then it must accept ROR #0 here. It is equivalent to omitting <rotation>.

Architecture version

ARMv6 and above.

Exceptions

None.

Operation

if ConditionPassed(cond) then
    Rd[31:0] = (Rm Rotate_Right(8 * rotate)) AND 0x00ff00ff

Usage

Use UXTB16 to zero extend a byte to a halfword, for example in instruction sequences acting on unsigned char values in C/C++.

Notes

Use of R15    Specifying R15 for register <Rd> or <Rm> has UNPREDICTABLE results.

A4.1.148 UXTH

    31-28  27-20            19-16    15-12  11-10   9-8  7-4      3-0
    cond   0 1 1 0 1 1 1 1  1 1 1 1  Rd     rotate  SBZ  0 1 1 1  Rm

UXTH extracts a 16-bit value from a register and zero extends it to 32 bits. You can specify a rotation by 0, 8, 16, or 24 bits before extracting the 16-bit value.

Syntax

    UXTH{<cond>} <Rd>, <Rm>{, <rotation>}

where:

<cond>        Is the condition under which the instruction is executed. The conditions are defined in The condition field on page A3-3. If <cond> is omitted, the AL (always) condition is used.

<Rd>          Specifies the destination register.

<Rm>          Specifies the register that contains the operand.

<rotation>    This can be any one of:
              • ROR #8. This is encoded as 0b01 in the rotate field.
              • ROR #16. This is encoded as 0b10 in the rotate field.
              • ROR #24. This is encoded as 0b11 in the rotate field.
              • Omitted. This is encoded as 0b00 in the rotate field.

              Note
              If your assembler accepts shifts by #0 and treats them as equivalent to no shift or LSL #0, then it must accept ROR #0 here. It is equivalent to omitting <rotation>.

Architecture version

ARMv6 and above.

Exceptions

None.
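The rotate-then-mask behaviour shared by the UXT instructions can be sketched in a few lines of Python. This is an illustrative model only, not part of the architecture definition: the helper names (ror32, uxtb, uxtb16, uxth, uxtah) are hypothetical, the rotate argument stands for the 2-bit rotate field, condition checking is omitted, and the 32-bit wrap-around of the UXTAH addition is an assumption based on the 32-bit register width.

```python
MASK32 = 0xFFFFFFFF

def ror32(value, amount):
    """Rotate a 32-bit value right by `amount` bits (taken modulo 32)."""
    value &= MASK32
    amount %= 32
    if amount == 0:
        return value
    return ((value >> amount) | (value << (32 - amount))) & MASK32

def uxtb(rm, rotate=0):
    # rotate is the 2-bit field: 0b00 = none, 0b01 = ROR #8, 0b10 = ROR #16, 0b11 = ROR #24
    return ror32(rm, 8 * rotate) & 0x000000FF

def uxtb16(rm, rotate=0):
    # Keeps bytes 0 and 2 of the rotated value, each zero extended to 16 bits
    return ror32(rm, 8 * rotate) & 0x00FF00FF

def uxth(rm, rotate=0):
    return ror32(rm, 8 * rotate) & 0x0000FFFF

def uxtah(rn, rm, rotate=0):
    # Assumption: the addition wraps modulo 2**32, as for a 32-bit register
    return (rn + uxth(rm, rotate)) & MASK32
```

For example, with Rm holding 0x12345678, uxtb16(0x12345678, 1) models UXTB16 with ROR #8: the rotated value is 0x78123456 and the masked result is 0x00120056.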
Operation

if ConditionPassed(cond) then
    Rd[31:0] = (Rm Rotate_Right(8 * rotate)) AND 0x0000ffff

Usage

Use UXTH to zero extend a halfword to a word, for example in instruction sequences acting on unsigned short values in C/C++.

Notes

Use of R15    Specifying R15 for register <Rd> or <Rm> has UNPREDICTABLE results.

A4.2 ARM instructions and architecture versions

Table A4-2 shows which ARM instructions are present in each current ARM architecture version.

Table A4-2 ARM instructions by architecture version

Instruction         v4    v4T   v5T   v5TE, v5TEJ, v5TExP   v6
ADC                 Yes   Yes   Yes   Yes                   Yes
ADD                 Yes   Yes   Yes   Yes                   Yes
AND                 Yes   Yes   Yes   Yes                   Yes
B                   Yes   Yes   Yes   Yes                   Yes
BIC                 Yes   Yes   Yes   Yes                   Yes
BKPT                No    No    Yes   Yes                   Yes
BL                  Yes   Yes   Yes   Yes                   Yes
BLX (both forms)    No    No    Yes   Yes                   Yes
BX                  No    Yes   Yes   Yes                   Yes
BXJ                 No    No    No    Only v5TEJ            Yes
CDP                 Yes   Yes   Yes   Yes                   Yes
CDP2                No    No    Yes   Yes                   Yes
CLZ                 No    No    Yes   Yes                   Yes
CMN                 Yes   Yes   Yes   Yes                   Yes
CMP                 Yes   Yes   Yes   Yes                   Yes
CPS                 No    No    No    No                    Yes
CPY                 No    No    No    No                    Yes
EOR                 Yes   Yes   Yes   Yes                   Yes
LDC                 Yes   Yes   Yes   Yes                   Yes
LDC2                No    No    Yes   Yes                   Yes
LDM (all forms)     Yes   Yes   Yes   Yes                   Yes
LDR                 Yes   Yes   Yes   Yes                   Yes
Table A4-2 ARM instructions by architecture version (continued)

Instruction         v4    v4T   v5T   v5TE, v5TEJ, v5TExP   v6
LDRB                Yes   Yes   Yes   Yes                   Yes
LDRD                No    No    No    Only v5TE, v5TEJ      Yes
LDRBT               Yes   Yes   Yes   Yes                   Yes
LDREX               No    No    No    No                    Yes
LDRH                Yes   Yes   Yes   Yes                   Yes
LDRSB               Yes   Yes   Yes   Yes                   Yes
LDRSH               Yes   Yes   Yes   Yes                   Yes
LDRT                Yes   Yes   Yes   Yes                   Yes
MCR                 Yes   Yes   Yes   Yes                   Yes
MCR2                No    No    Yes   Yes                   Yes
MCRR                No    No    No    Only v5TE, v5TEJ      Yes
MCRR2               No    No    No    No                    Yes
MLA                 Yes   Yes   Yes   Yes                   Yes
MOV                 Yes   Yes   Yes   Yes                   Yes
MRC                 Yes   Yes   Yes   Yes                   Yes
MRC2                No    No    Yes   Yes                   Yes
MRRC                No    No    No    Only v5TE, v5TEJ      Yes
MRRC2               No    No    No    No                    Yes
MRS                 Yes   Yes   Yes   Yes                   Yes
MSR                 Yes   Yes   Yes   Yes                   Yes
MUL                 Yes   Yes   Yes   Yes                   Yes
MVN                 Yes   Yes   Yes   Yes                   Yes
ORR                 Yes   Yes   Yes   Yes                   Yes
PKH (both forms)    No    No    No    No                    Yes
PLD                 No    No    No    Only v5TE, v5TEJ      Yes
QADD                No    No    No    Yes                   Yes
QADD16              No    No    No    No                    Yes
QADD8               No    No    No    No                    Yes
QADDSUBX            No    No    No    No                    Yes
QDADD               No    No    No    Yes                   Yes
QDSUB               No    No    No    Yes                   Yes
QSUB                No    No    No    Yes                   Yes
QSUB16              No    No    No    No                    Yes
QSUB8               No    No    No    No                    Yes
QSUBADDX            No    No    No    No                    Yes
REV (all forms)     No    No    No    No                    Yes
RFE                 No    No    No    No                    Yes
RSB                 Yes   Yes   Yes   Yes                   Yes
RSC                 Yes   Yes   Yes   Yes                   Yes
SADD (all forms)    No    No    No    No                    Yes
SBC                 Yes   Yes   Yes   Yes                   Yes
SEL                 No    No    No    No                    Yes
SETEND              No    No    No    No                    Yes
SHADD (all forms)   No    No    No    No                    Yes
SHSUB (all forms)   No    No    No    No                    Yes
SMLAD               No    No    No    No                    Yes
SMLAL               Yes   Yes   Yes   Yes                   Yes
SMLALD              No    No    No    No                    Yes
Table A4-2 ARM instructions by architecture version (continued)

Instruction         v4    v4T   v5T   v5TE, v5TEJ, v5TExP   v6
SMLA<x><y>          No    No    No    Yes                   Yes
SMLAL<x><y>         No    No    No    Yes                   Yes
SMLAW<y>            No    No    No    Yes                   Yes
SMLSD               No    No    No    No                    Yes
SMLSLD              No    No    No    No                    Yes
SMMLA               No    No    No    No                    Yes
SMMLS               No    No    No    No                    Yes
SMMUL               No    No    No    No                    Yes
SMUAD               No    No    No    No                    Yes
SMULL               Yes   Yes   Yes   Yes                   Yes
SMUL<x><y>          No    No    No    Yes                   Yes
SMULW<y>            No    No    No    Yes                   Yes
SMUSD               No    No    No    No                    Yes
SRS                 No    No    No    No                    Yes
SSAT (both forms)   No    No    No    No                    Yes
SSUB (all forms)    No    No    No    No                    Yes
STC                 Yes   Yes   Yes   Yes                   Yes
STC2                No    No    Yes   Yes                   Yes
STM (both forms)    Yes   Yes   Yes   Yes                   Yes
STR                 Yes   Yes   Yes   Yes                   Yes
STRB                Yes   Yes   Yes   Yes                   Yes
STRBT               Yes   Yes   Yes   Yes                   Yes
STRD                No    No    No    Only v5TE, v5TEJ      Yes
STREX               No    No    No    No                    Yes
STRH                Yes   Yes   Yes   Yes                   Yes
STRT                Yes   Yes   Yes   Yes                   Yes
SUB                 Yes   Yes   Yes   Yes                   Yes
SWI                 Yes   Yes   Yes   Yes                   Yes
SWP                 Yes   Yes   Yes   Yes                   Deprecated
SWPB                Yes   Yes   Yes   Yes                   Deprecated
SXT (all forms)     No    No    No    No                    Yes
TEQ                 Yes   Yes   Yes   Yes                   Yes
TST                 Yes   Yes   Yes   Yes                   Yes
UADD (all forms)    No    No    No    No                    Yes
UHADD (all forms)   No    No    No    No                    Yes
UMAAL               No    No    No    No                    Yes
UMLAL               Yes   Yes   Yes   Yes                   Yes
UMULL               Yes   Yes   Yes   Yes                   Yes
UQADD (all forms)   No    No    No    No                    Yes
UQSUB (all forms)   No    No    No    No                    Yes
USAD (both forms)   No    No    No    No                    Yes
USAT (both forms)   No    No    No    No                    Yes
USUB (all forms)    No    No    No    No                    Yes
UXT (all forms)     No    No    No    No                    Yes

Chapter A5 ARM Addressing Modes

This chapter describes each of the five addressing modes used with ARM® instructions.
The chapter contains the following sections:
• Addressing Mode 1 - Data-processing operands on page A5-2
• Addressing Mode 2 - Load and Store Word or Unsigned Byte on page A5-18
• Addressing Mode 3 - Miscellaneous Loads and Stores on page A5-33
• Addressing Mode 4 - Load and Store Multiple on page A5-41
• Addressing Mode 5 - Load and Store Coprocessor on page A5-49.

Note
All valid architecture variants (from v4, see Architecture versions and variants on page xiii) support address modes 1 to 5 inclusive.

A5.1 Addressing Mode 1 - Data-processing operands

There are 11 formats used to calculate the <shifter_operand> in an ARM data-processing instruction. The general instruction syntax is:

    <opcode>{<cond>}{S} <Rd>, <Rn>, <shifter_operand>

where <shifter_operand> is one of the following:

1.  #<immediate>
    See Data-processing operands - Immediate on page A5-6.
2.  <Rm>
    See Data-processing operands - Register on page A5-8.
3.  <Rm>, LSL #<shift_imm>
    See Data-processing operands - Logical shift left by immediate on page A5-9.
4.  <Rm>, LSL <Rs>
    See Data-processing operands - Logical shift left by register on page A5-10.
5.  <Rm>, LSR #<shift_imm>
    See Data-processing operands - Logical shift right by immediate on page A5-11.
6.  <Rm>, LSR <Rs>
    See Data-processing operands - Logical shift right by register on page A5-12.
7.  <Rm>, ASR #<shift_imm>
    See Data-processing operands - Arithmetic shift right by immediate on page A5-13.
8.  <Rm>, ASR <Rs>
    See Data-processing operands - Arithmetic shift right by register on page A5-14.
9.  <Rm>, ROR #<shift_imm>
    See Data-processing operands - Rotate right by immediate on page A5-15.
10. <Rm>, ROR <Rs>
    See Data-processing operands - Rotate right by register on page A5-16.
11. <Rm>, RRX
    See Data-processing operands - Rotate right with extend on page A5-17.
A5.1.1 Encoding

The following diagrams show the encodings for this addressing mode:

32-bit immediate

    31-28  27-25  24-21   20  19-16  15-12  11-8        7-0
    cond   0 0 1  opcode  S   Rn     Rd     rotate_imm  immed_8

Immediate shifts

    31-28  27-25  24-21   20  19-16  15-12  11-7       6-5    4  3-0
    cond   0 0 0  opcode  S   Rn     Rd     shift_imm  shift  0  Rm

Register shifts

    31-28  27-25  24-21   20  19-16  15-12  11-8  7  6-5    4  3-0
    cond   0 0 0  opcode  S   Rn     Rd     Rs    0  shift  1  Rm

opcode        Specifies the operation of the instruction.

S bit         Indicates that the instruction updates the condition codes.

Rd            Specifies the destination register.

Rn            Specifies the first source operand register.

Bits[11:0]    The fields within bits[11:0] are collectively called a shifter operand. This is described in The shifter operand on page A5-4.

Bit[25]       Is referred to as the I bit, and is used to distinguish between an immediate shifter operand and a register-based shifter operand.

If all three of the following bits have the values shown, the instruction is not a data-processing instruction, but lies in the arithmetic or Load/Store instruction extension space:

    bit[25] == 0
    bit[4]  == 1
    bit[7]  == 1

See Extending the instruction set on page A3-32 for more information. Addressing mode 3, MCRR{2}, MRRC{2}, STC{2} are examples of instructions that reside in this space.

A5.1.2 The shifter operand

As well as producing the shifter operand, the shifter produces a carry-out which some instructions write into the Carry Flag. The default register operand (register Rm specified with no shift) uses the form register shift left by immediate, with the immediate set to zero.

The shifter operand takes one of the following three basic formats.

Immediate operand value

An immediate operand value is formed by rotating an 8-bit constant (in a 32-bit word) by an even number of bits (0, 2, 4, 6, ... 26, 28, 30).
Therefore, each instruction contains an 8-bit constant and a 4-bit rotate to be applied to that constant.

Some valid constants are:

    0xFF, 0x104, 0xFF0, 0xFF00, 0xFF000, 0xFF000000, 0xF000000F

Some invalid constants are:

    0x101, 0x102, 0xFF1, 0xFF04, 0xFF003, 0xFFFFFFFF, 0xF000001F

For example:

    MOV   R0, #0              ; Move zero to R0
    ADD   R3, R3, #1          ; Add one to the value of register 3
    CMP   R7, #1000           ; Compare value of R7 with 1000
    BIC   R9, R8, #0xFF00     ; Clear bits 8-15 of R8 and store in R9

Register operand value

A register operand value is simply the value of a register. The value of the register is used directly as the operand to the data-processing instruction. For example:

    MOV   R2, R0              ; Move the value of R0 to R2
    ADD   R4, R3, R2          ; Add R2 to R3, store result in R4
    CMP   R7, R8              ; Compare the value of R7 and R8

Shifted register operand value

A shifted register operand value is the value of a register, shifted (or rotated) before it is used as the data-processing operand. There are five types of shift:

ASR    Arithmetic shift right
LSL    Logical shift left
LSR    Logical shift right
ROR    Rotate right
RRX    Rotate right with extend.

The number of bits to shift by is specified either as an immediate or as the value of a register. For example:

    MOV   R2, R0, LSL #2      ; Shift R0 left by 2, write to R2, (R2=R0x4)
    ADD   R9, R5, R5, LSL #3  ; R9 = R5 + R5 x 8 or R9 = R5 x 9
    RSB   R9, R5, R5, LSL #3  ; R9 = R5 x 8 - R5 or R9 = R5 x 7
    SUB   R10, R9, R8, LSR #4 ; R10 = R9 - R8 / 16
    MOV   R12, R4, ROR R3     ; R12 = R4 rotated right by value of R3

A5.1.3 Data-processing operands - Immediate

    31-28  27-25  24-21   20  19-16  15-12  11-8        7-0
    cond   0 0 1  opcode  S   Rn     Rd     rotate_imm  immed_8

This data-processing operand provides a constant (defined in the instruction) operand to a data-processing instruction.
The value is formed by rotating (to the right) an 8-bit immediate value to any even bit position in a 32-bit word. If the rotate immediate is zero, the carry-out from the shifter is the value of the C flag. Otherwise, it is set to bit[31] of the value of <shifter_operand>.

Syntax

    #<immediate>

where:

<immediate>   Specifies the immediate constant wanted. It is encoded in the instruction as an 8-bit immediate (immed_8) and a 4-bit immediate (rotate_imm), so that <immediate> is equal to the result of rotating immed_8 right by (2 × rotate_imm) bits.

Operation

shifter_operand = immed_8 Rotate_Right (rotate_imm * 2)
if rotate_imm == 0 then
    shifter_carry_out = C flag
else /* rotate_imm != 0 */
    shifter_carry_out = shifter_operand[31]

Notes

Legitimate immediates
              Not all 32-bit immediates are legitimate. Only those that can be formed by rotating an 8-bit immediate right by an even amount are valid 32-bit immediates for this format.

Encoding      Some values of <immediate> have more than one possible encoding. For example, a value of 0x3F0 could be encoded as:

                  immed_8 == 0x3F, rotate_imm == 0xE

              or as:

                  immed_8 == 0xFC, rotate_imm == 0xF

              When more than one encoding is available, an assembler must choose the correct one to use, as follows:
              • If <immediate> lies in the range 0 to 0xFF, an encoding with rotate_imm == 0 is available. The assembler must choose that encoding. (Choosing another encoding would affect how some instructions set the C flag.)
              • Otherwise, it is recommended that the encoding with the smallest value of rotate_imm is chosen. (This choice does not affect instruction functionality.)

              For more precise control of the encoding, the instruction fields can be specified directly by using the syntax:

                  #<immed_8>, <rotate_amount>

              where <rotate_amount> = 2 * rotate_imm.

Use of R15    If R15 is specified as register Rn, the value used is the address of the current instruction plus eight.
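The encoding rule above (an 8-bit constant rotated right by an even amount, preferring the smallest rotate_imm) can be sketched as follows. This is an illustrative model, not assembler source: the function names encode_immediate and decode_immediate are hypothetical, and the search order is simply one way to satisfy the selection rules stated in the Encoding note.

```python
MASK32 = 0xFFFFFFFF

def ror32(value, amount):
    """Rotate a 32-bit value right by `amount` bits (taken modulo 32)."""
    value &= MASK32
    amount %= 32
    if amount == 0:
        return value
    return ((value >> amount) | (value << (32 - amount))) & MASK32

def encode_immediate(value):
    """Return (immed_8, rotate_imm) for a 32-bit constant, or None if it is not encodable.

    rotate_imm is tried in ascending order, so values 0 to 0xFF get rotate_imm == 0
    and other values get the smallest usable rotate_imm.
    """
    value &= MASK32
    for rotate_imm in range(16):
        # Rotating left by 2*rotate_imm undoes the right rotation applied by the shifter
        immed_8 = ror32(value, 32 - 2 * rotate_imm)
        if immed_8 <= 0xFF:
            return immed_8, rotate_imm
    return None

def decode_immediate(immed_8, rotate_imm, c_flag):
    """Return (shifter_operand, shifter_carry_out) following the Operation pseudocode."""
    shifter_operand = ror32(immed_8, 2 * rotate_imm)
    if rotate_imm == 0:
        carry_out = c_flag
    else:
        carry_out = (shifter_operand >> 31) & 1
    return shifter_operand, carry_out
```

For the value 0x3F0 discussed above, encode_immediate returns immed_8 == 0x3F with rotate_imm == 0xE, the smaller of the two rotate_imm values. Constants such as 0x101 or 0xFFFFFFFF yield None.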
A5.1.4 Data-processing operands - Register

    31-28  27-25  24-21   20  19-16  15-12  11-4             3-0
    cond   0 0 0  opcode  S   Rn     Rd     0 0 0 0 0 0 0 0  Rm

This data-processing operand provides the value of a register directly. The carry-out from the shifter is the C flag.

Syntax

    <Rm>

where:

<Rm>          Specifies the register whose value is the instruction operand.

Operation

shifter_operand = Rm
shifter_carry_out = C Flag

Notes

Encoding      This instruction is encoded as a logical shift left by immediate (see Data-processing operands - Logical shift left by immediate on page A5-9) with a shift of zero (shift_imm == 0).

Use of R15    If R15 is specified as register Rm or Rn, the value used is the address of the current instruction plus 8.

A5.1.5 Data-processing operands - Logical shift left by immediate

    31-28  27-25  24-21   20  19-16  15-12  11-7       6-4    3-0
    cond   0 0 0  opcode  S   Rn     Rd     shift_imm  0 0 0  Rm

This data-processing operand is used to provide either the value of a register directly (lone register operand, as described in Data-processing operands - Register on page A5-8), or the value of a register shifted left (multiplied by a constant power of two).

This instruction operand is the value of register Rm, logically shifted left by an immediate value in the range 0 to 31. Zeros are inserted into the vacated bit positions. The carry-out from the shifter is the last bit shifted out, or the C flag if no shift is specified.

Syntax

    <Rm>, LSL #<shift_imm>

where:

<Rm>          Specifies the register whose value is to be shifted.

LSL           Indicates a logical shift left.

<shift_imm>   Specifies the shift. This is a value between 0 and 31.
Operation

if shift_imm == 0 then /* Register Operand */
    shifter_operand = Rm
    shifter_carry_out = C Flag
else /* shift_imm > 0 */
    shifter_operand = Rm Logical_Shift_Left shift_imm
    shifter_carry_out = Rm[32 - shift_imm]

Notes

Default shift If the value of <shift_imm> == 0, the operand can be written as just <Rm> (see Data-processing operands - Register on page A5-8).

Use of R15    If R15 is specified as register Rm or Rn, the value used is the address of the current instruction plus 8.

A5.1.6 Data-processing operands - Logical shift left by register

    31-28  27-25  24-21   20  19-16  15-12  11-8  7-4      3-0
    cond   0 0 0  opcode  S   Rn     Rd     Rs    0 0 0 1  Rm

This data-processing operand is used to provide the value of a register multiplied by a variable power of two.

This instruction operand is the value of register Rm, logically shifted left by the value in the least significant byte of register Rs. Zeros are inserted into the vacated bit positions. The carry-out from the shifter is the last bit shifted out, which is zero if the shift amount is more than 32, or the C flag if the shift amount is zero.

Syntax

    <Rm>, LSL <Rs>

where:

<Rm>          Specifies the register whose value is to be shifted.

LSL           Indicates a logical shift left.

<Rs>          Is the register containing the value of the shift.

Operation

if Rs[7:0] == 0 then
    shifter_operand = Rm
    shifter_carry_out = C Flag
else if Rs[7:0] < 32 then
    shifter_operand = Rm Logical_Shift_Left Rs[7:0]
    shifter_carry_out = Rm[32 - Rs[7:0]]
else if Rs[7:0] == 32 then
    shifter_operand = 0
    shifter_carry_out = Rm[0]
else /* Rs[7:0] > 32 */
    shifter_operand = 0
    shifter_carry_out = 0

Notes

Use of R15    Specifying R15 as register Rd, register Rm, register Rn, or register Rs has UNPREDICTABLE results.
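The four cases of the logical shift left by register Operation can be traced with a small Python model. This is a sketch under stated assumptions: the function name lsl_by_register is hypothetical, register values are passed as plain integers, and the function returns the pair (shifter_operand, shifter_carry_out).

```python
MASK32 = 0xFFFFFFFF

def lsl_by_register(rm, rs, c_flag):
    """Model of <Rm>, LSL <Rs>: returns (shifter_operand, shifter_carry_out)."""
    rm &= MASK32
    shift = rs & 0xFF                  # only Rs[7:0] takes part in the shift
    if shift == 0:
        return rm, c_flag              # operand unchanged, carry-out is the C flag
    if shift < 32:
        # The last bit shifted out is Rm[32 - shift]
        return (rm << shift) & MASK32, (rm >> (32 - shift)) & 1
    if shift == 32:
        return 0, rm & 1               # everything shifted out, carry-out is Rm[0]
    return 0, 0                        # shift amounts above 32
```

Note that only the least significant byte of Rs matters: a value such as 0x100 in Rs behaves as a shift of zero, so the operand is Rm and the carry-out is the existing C flag.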
A5.1.7 Data-processing operands - Logical shift right by immediate

    31-28  27-25  24-21   20  19-16  15-12  11-7       6-4    3-0
    cond   0 0 0  opcode  S   Rn     Rd     shift_imm  0 1 0  Rm

This data-processing operand is used to provide the unsigned value of a register shifted right (divided by a constant power of two).

This instruction operand is the value of register Rm, logically shifted right by an immediate value in the range 1 to 32. Zeros are inserted into the vacated bit positions. The carry-out from the shifter is the last bit shifted out.

Syntax

    <Rm>, LSR #<shift_imm>

where:

<Rm>          Specifies the register whose value is to be shifted.

LSR           Indicates a logical shift right.

<shift_imm>   Specifies the shift. This is an immediate value between 1 and 32. (A shift by 32 is encoded by shift_imm == 0.)

Operation

if shift_imm == 0 then
    shifter_operand = 0
    shifter_carry_out = Rm[31]
else /* shift_imm > 0 */
    shifter_operand = Rm Logical_Shift_Right shift_imm
    shifter_carry_out = Rm[shift_imm - 1]

Notes

Use of R15    If R15 is specified as register Rm or Rn, the value used is the address of the current instruction plus 8.

A5.1.8 Data-processing operands - Logical shift right by register

    31-28  27-25  24-21   20  19-16  15-12  11-8  7-4      3-0
    cond   0 0 0  opcode  S   Rn     Rd     Rs    0 0 1 1  Rm

This data-processing operand is used to provide the unsigned value of a register shifted right (divided by a variable power of two).

It is produced by the value of register Rm, logically shifted right by the value in the least significant byte of register Rs. Zeros are inserted into the vacated bit positions. The carry-out from the shifter is the last bit shifted out, which is zero if the shift amount is more than 32, or the C flag if the shift amount is zero.

Syntax

    <Rm>, LSR <Rs>

where:

<Rm>          Specifies the register whose value is to be shifted.

LSR           Indicates a logical shift right.
<Rs>          Is the register containing the value of the shift.

Operation

if Rs[7:0] == 0 then
    shifter_operand = Rm
    shifter_carry_out = C Flag
else if Rs[7:0] < 32 then
    shifter_operand = Rm Logical_Shift_Right Rs[7:0]
    shifter_carry_out = Rm[Rs[7:0] - 1]
else if Rs[7:0] == 32 then
    shifter_operand = 0
    shifter_carry_out = Rm[31]
else /* Rs[7:0] > 32 */
    shifter_operand = 0
    shifter_carry_out = 0

Notes

Use of R15    Specifying R15 as register Rd, register Rm, register Rn, or register Rs has UNPREDICTABLE results.

A5.1.9 Data-processing operands - Arithmetic shift right by immediate

    31-28  27-25  24-21   20  19-16  15-12  11-7       6-4    3-0
    cond   0 0 0  opcode  S   Rn     Rd     shift_imm  1 0 0  Rm

This data-processing operand is used to provide the signed value of a register arithmetically shifted right (divided by a constant power of two).

This instruction operand is the value of register Rm, arithmetically shifted right by an immediate value in the range 1 to 32. The sign bit of Rm (Rm[31]) is inserted into the vacated bit positions. The carry-out from the shifter is the last bit shifted out.

Syntax

    <Rm>, ASR #<shift_imm>

where:

<Rm>          Specifies the register whose value is to be shifted.

ASR           Indicates an arithmetic shift right.

<shift_imm>   Specifies the shift. This is an immediate value between 1 and 32. (A shift by 32 is encoded by shift_imm == 0.)

Operation

if shift_imm == 0 then
    if Rm[31] == 0 then
        shifter_operand = 0
        shifter_carry_out = Rm[31]
    else /* Rm[31] == 1 */
        shifter_operand = 0xFFFFFFFF
        shifter_carry_out = Rm[31]
else /* shift_imm > 0 */
    shifter_operand = Rm Arithmetic_Shift_Right <shift_imm>
    shifter_carry_out = Rm[shift_imm - 1]

Notes

Use of R15    If R15 is specified as register Rm or Rn, the value used is the address of the current instruction plus 8.
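The right-shift cases above differ mainly in what fills the vacated bits and in how a shift of 32 is handled. The following Python sketch models the logical shift right by register and the arithmetic shift right by immediate. The function names are hypothetical, and shift_imm is the raw 5-bit encoded field, so a value of 0 stands for ASR #32.

```python
MASK32 = 0xFFFFFFFF

def lsr_by_register(rm, rs, c_flag):
    """Model of <Rm>, LSR <Rs>: returns (shifter_operand, shifter_carry_out)."""
    rm &= MASK32
    shift = rs & 0xFF                      # only Rs[7:0] is used
    if shift == 0:
        return rm, c_flag
    if shift < 32:
        return rm >> shift, (rm >> (shift - 1)) & 1
    if shift == 32:
        return 0, (rm >> 31) & 1           # last bit shifted out is Rm[31]
    return 0, 0

def asr_by_immediate(rm, shift_imm):
    """Model of <Rm>, ASR #<shift_imm>, taking the encoded 5-bit field (0 means #32)."""
    rm &= MASK32
    sign = (rm >> 31) & 1
    if shift_imm == 0:
        # ASR #32: every result bit is a copy of the sign bit
        return (MASK32 if sign else 0), sign
    # Reinterpret Rm as a signed value so that Python's >> replicates the sign bit
    signed = rm - (1 << 32) if sign else rm
    return (signed >> shift_imm) & MASK32, (rm >> (shift_imm - 1)) & 1
```

For example, asr_by_immediate(0x80000000, 4) gives an operand of 0xF8000000 with a carry-out of 0, while lsr_by_register(0x80000000, 4, 0) gives 0x08000000 with the same carry-out.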
A5.1.10 Data-processing operands - Arithmetic shift right by register

    31-28  27-25  24-21   20  19-16  15-12  11-8  7-4      3-0
    cond   0 0 0  opcode  S   Rn     Rd     Rs    0 1 0 1  Rm

This data-processing operand is used to provide the signed value of a register arithmetically shifted right (divided by a variable power of two).

This instruction operand is the value of register Rm arithmetically shifted right by the value in the least significant byte of register Rs. The sign bit of Rm (Rm[31]) is inserted into the vacated bit positions. The carry-out from the shifter is the last bit shifted out, which is the sign bit of Rm if the shift amount is more than 32, or the C flag if the shift amount is zero.

Syntax

    <Rm>, ASR <Rs>

where:

<Rm>          Specifies the register whose value is to be shifted.

ASR           Indicates an arithmetic shift right.

<Rs>          Is the register containing the value of the shift.

Operation

if Rs[7:0] == 0 then
    shifter_operand = Rm
    shifter_carry_out = C Flag
else if Rs[7:0] < 32 then
    shifter_operand = Rm Arithmetic_Shift_Right Rs[7:0]
    shifter_carry_out = Rm[Rs[7:0] - 1]
else /* Rs[7:0] >= 32 */
    if Rm[31] == 0 then
        shifter_operand = 0
        shifter_carry_out = Rm[31]
    else /* Rm[31] == 1 */
        shifter_operand = 0xFFFFFFFF
        shifter_carry_out = Rm[31]

Notes

Use of R15    Specifying R15 as register Rd, register Rm, register Rn, or register Rs has UNPREDICTABLE results.

A5.1.11 Data-processing operands - Rotate right by immediate

    31-28  27-25  24-21   20  19-16  15-12  11-7       6-4    3-0
    cond   0 0 0  opcode  S   Rn     Rd     shift_imm  1 1 0  Rm

This data-processing operand is used to provide the value of a register rotated by a constant value.

This instruction operand is the value of register Rm rotated right by an immediate value in the range 1 to 31. As bits are rotated off the right end, they are inserted into the vacated bit positions on the left.
The carry-out from the shifter is the last bit rotated off the right end.

Syntax

    <Rm>, ROR #<shift_imm>

where:

<Rm>          Specifies the register whose value is to be rotated.

ROR           Indicates a rotate right.

<shift_imm>   Specifies the rotation. This is an immediate value between 1 and 31. When shift_imm == 0, an RRX operation (rotate right with extend) is performed. This is described in Data-processing operands - Rotate right with extend on page A5-17.

Operation

if shift_imm == 0 then
    See “Data-processing operands - Rotate right with extend” on page A5-17
else /* shift_imm > 0 */
    shifter_operand = Rm Rotate_Right shift_imm
    shifter_carry_out = Rm[shift_imm - 1]

Notes

Use of R15    If R15 is specified as register Rm or Rn, the value used is the address of the current instruction plus 8.

A5.1.12 Data-processing operands - Rotate right by register

    31-28  27-25  24-21   20  19-16  15-12  11-8  7-4      3-0
    cond   0 0 0  opcode  S   Rn     Rd     Rs    0 1 1 1  Rm

This data-processing operand is used to provide the value of a register rotated by a variable value.

This instruction operand is produced by the value of register Rm rotated right by the value in the least significant byte of register Rs. As bits are rotated off the right end, they are inserted into the vacated bit positions on the left. The carry-out from the shifter is the last bit rotated off the right end, or the C flag if the shift amount is zero.

Syntax

    <Rm>, ROR <Rs>

where:

<Rm>          Specifies the register whose value is to be rotated.

ROR           Indicates a rotate right.

<Rs>          Is the register containing the value of the rotation.
Operation

if Rs[7:0] == 0 then
    shifter_operand = Rm
    shifter_carry_out = C Flag
else if Rs[4:0] == 0 then
    shifter_operand = Rm
    shifter_carry_out = Rm[31]
else /* Rs[4:0] > 0 */
    shifter_operand = Rm Rotate_Right Rs[4:0]
    shifter_carry_out = Rm[Rs[4:0] - 1]

Notes

Use of R15    Specifying R15 as register Rd, register Rm, register Rn, or register Rs has UNPREDICTABLE results.

A5.1.13 Data-processing operands - Rotate right with extend

    31-28  27-25  24-21   20  19-16  15-12  11-7       6-4    3-0
    cond   0 0 0  opcode  S   Rn     Rd     0 0 0 0 0  1 1 0  Rm

This data-processing operand can be used to perform a 33-bit rotate right using the Carry Flag as the 33rd bit.

This instruction operand is the value of register Rm shifted right by one bit, with the Carry Flag replacing the vacated bit position. The carry-out from the shifter is the bit shifted off the right end.

Syntax

    <Rm>, RRX

where:

<Rm>          Specifies the register whose value is shifted right by one bit.

RRX           Indicates a rotate right with extend.

Operation

shifter_operand = (C Flag Logical_Shift_Left 31) OR (Rm Logical_Shift_Right 1)
shifter_carry_out = Rm[0]

Notes

Encoding      The instruction encoding is in the space that would be used for ROR #0.

Use of R15    If R15 is specified as register Rm or Rn, the value used is the address of the current instruction plus 8.

ADC instruction
              A rotate left with extend can be performed with an ADC instruction. ADC , where == for the modified operand to equal the result, or ADC , , , LSL #1 where the rotate left and extend is the second operand rather than the result.

A5.2 Addressing Mode 2 - Load and Store Word or Unsigned Byte

There are nine formats used to calculate the address for a Load and Store Word or Unsigned Byte instruction.
The general instruction syntax is:

    LDR|STR{<cond>}{B}{T} <Rd>, <addressing_mode>

where <addressing_mode> is one of the nine options listed below.

All nine of the following options are available for LDR, LDRB, STR and STRB. For LDRBT, LDRT, STRBT and STRT, only the post-indexed options (the last three in the list) are available. For the PLD instruction described in PLD on page A4-90, only the offset options (the first three in the list) are available.

1.  [<Rn>, #+/-<offset_12>]
    See Load and Store Word or Unsigned Byte - Immediate offset on page A5-20.
2.  [<Rn>, +/-<Rm>]
    See Load and Store Word or Unsigned Byte - Register offset on page A5-21.
3.  [<Rn>, +/-<Rm>, <shift> #<shift_imm>]
    See Load and Store Word or Unsigned Byte - Scaled register offset on page A5-22.
4.  [<Rn>, #+/-<offset_12>]!
    See Load and Store Word or Unsigned Byte - Immediate pre-indexed on page A5-24.
5.  [<Rn>, +/-<Rm>]!
    See Load and Store Word or Unsigned Byte - Register pre-indexed on page A5-25.
6.  [<Rn>, +/-<Rm>, <shift> #<shift_imm>]!
    See Load and Store Word or Unsigned Byte - Scaled register pre-indexed on page A5-26.
7.  [<Rn>], #+/-<offset_12>
    See Load and Store Word or Unsigned Byte - Immediate post-indexed on page A5-28.
8.  [<Rn>], +/-<Rm>
    See Load and Store Word or Unsigned Byte - Register post-indexed on page A5-30.
9.  [<Rn>], +/-<Rm>, <shift> #<shift_imm>
    See Load and Store Word or Unsigned Byte - Scaled register post-indexed on page A5-31.

A5.2.1 Encoding

The following three diagrams show the encodings for this addressing mode:

Immediate offset/index

    31-28  27-25  24  23  22  21  20  19-16  15-12  11-0
    cond   0 1 0  P   U   B   W   L   Rn     Rd     offset_12

Register offset/index

    31-28  27-25  24  23  22  21  20  19-16  15-12  11-4             3-0
    cond   0 1 1  P   U   B   W   L   Rn     Rd     0 0 0 0 0 0 0 0  Rm

Scaled register offset/index

    31-28  27-25  24  23  22  21  20  19-16  15-12  11-7       6-5    4  3-0
    cond   0 1 1  P   U   B   W   L   Rn     Rd     shift_imm  shift  0  Rm

The P bit     Has two meanings:

              P == 0    Indicates the use of post-indexed addressing.
                        The base register value is used for the memory address, and the offset is then applied to the base register value and written back to the base register.

              P == 1    Indicates the use of offset addressing or pre-indexed addressing (the W bit determines which). The memory address is generated by applying the offset to the base register value.

The U bit     Indicates whether the offset is added to the base (U == 1) or is subtracted from the base (U == 0).

The B bit     Distinguishes between an unsigned byte (B == 1) and a word (B == 0) access.

The W bit     Has two meanings:

              P == 0    If W == 0, the instruction is LDR, LDRB, STR or STRB and a normal memory access is performed. If W == 1, the instruction is LDRBT, LDRT, STRBT or STRT and an unprivileged (User mode) memory access is performed.

              P == 1    If W == 0, the base register is not updated (offset addressing). If W == 1, the calculated memory address is written back to the base register (pre-indexed addressing).

The L bit     Distinguishes between a Load (L == 1) and a Store (L == 0).

A5.2.2 Load and Store Word or Unsigned Byte - Immediate offset

    31-28  27-25  24  23  22  21  20  19-16  15-12  11-0
    cond   0 1 0  1   U   B   0   L   Rn     Rd     offset_12

This addressing mode calculates an address by adding or subtracting the value of an immediate offset to or from the value of the base register Rn.

Syntax

    [<Rn>, #+/-<offset_12>]

where:

<Rn>          Specifies the register containing the base address.

<offset_12>   Specifies the immediate offset used with the value of Rn to form the address.

Operation

if U == 1 then
    address = Rn + offset_12
else /* U == 0 */
    address = Rn - offset_12

Usage

This addressing mode is useful for accessing structure (record) fields, and accessing parameters and local variables in a stack frame. With an offset of zero, the address produced is the unaltered value of the base register Rn.
Notes

Offset of zero
              The syntax [<Rn>] is treated as an abbreviation for [<Rn>, #0], unless the instruction is one that only allows post-indexed addressing modes (LDRBT, LDRT, STRBT or STRT).

The B bit     This bit distinguishes between an unsigned byte (B==1) and a word (B==0) access.

The L bit     This bit distinguishes between a Load (L==1) and a Store (L==0) instruction.

Use of R15    If R15 is specified as register Rn, the value used is the address of the instruction plus eight.

A5.2.3 Load and Store Word or Unsigned Byte - Register offset

    31-28  27-25  24  23  22  21  20  19-16  15-12  11-4             3-0
    cond   0 1 1  1   U   B   0   L   Rn     Rd     0 0 0 0 0 0 0 0  Rm

This addressing mode calculates an address by adding or subtracting the value of the index register Rm to or from the value of the base register Rn.

Syntax

    [<Rn>, +/-<Rm>]

where:

<Rn>          Specifies the register containing the base address.

<Rm>          Specifies the register containing the value to add to or subtract from Rn.

Operation

if U == 1 then
    address = Rn + Rm
else /* U == 0 */
    address = Rn - Rm

Usage

This addressing mode is used for pointer plus offset arithmetic, and accessing a single element of an array of bytes.

Notes

Encoding      This addressing mode is encoded as an LSL scaled register offset, scaled by zero.

The B bit     This bit distinguishes between an unsigned byte (B==1) and a word (B==0) access.

The L bit     This bit distinguishes between a Load (L==1) and a Store (L==0) instruction.

Use of R15    If R15 is specified as register Rn, the value used is the address of the instruction plus eight. Specifying R15 as register Rm has UNPREDICTABLE results.
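The two offset address calculations above differ only in where the offset comes from. A minimal Python sketch follows. The function names are hypothetical, and the wrap-around of the result to 32 bits is an assumption based on the 32-bit address width rather than something stated in the Operation pseudocode.

```python
MASK32 = 0xFFFFFFFF

def immediate_offset_address(rn, offset_12, u_bit):
    """Address for [<Rn>, #+/-<offset_12>]; u_bit is 1 to add, 0 to subtract."""
    if not 0 <= offset_12 <= 0xFFF:
        raise ValueError("offset_12 must fit in 12 bits")
    if u_bit == 1:
        return (rn + offset_12) & MASK32   # assumption: result wraps to 32 bits
    return (rn - offset_12) & MASK32

def register_offset_address(rn, rm, u_bit):
    """Address for [<Rn>, +/-<Rm>]; u_bit is 1 to add, 0 to subtract."""
    if u_bit == 1:
        return (rn + rm) & MASK32
    return (rn - rm) & MASK32
```

For instance, a base of 0x1000 with an immediate offset of 4 yields 0x1004 when U == 1 and 0xFFC when U == 0. In both forms the base register itself is left unchanged, because W == 0 selects offset addressing.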
A5.2.4 Load and Store Word or Unsigned Byte - Scaled register offset

    Bits:   31-28  27  26  25  24  23  22  21  20  19-16  15-12  11-7       6-5    4  3-0
    Value:  cond   0   1   1   1   U   B   0   L   Rn     Rd     shift_imm  shift  0  Rm

These five addressing modes calculate an address by adding or subtracting the shifted or rotated value of the index register Rm to or from the value of the base register Rn.

Syntax

One of:

    [<Rn>, +/-<Rm>, LSL #<shift_imm>]
    [<Rn>, +/-<Rm>, LSR #<shift_imm>]
    [<Rn>, +/-<Rm>, ASR #<shift_imm>]
    [<Rn>, +/-<Rm>, ROR #<shift_imm>]
    [<Rn>, +/-<Rm>, RRX]

where:

    <Rn>           Specifies the register containing the base address.
    <Rm>           Specifies the register containing the offset to add to or subtract from Rn.
    LSL            Specifies a logical shift left.
    LSR            Specifies a logical shift right.
    ASR            Specifies an arithmetic shift right.
    ROR            Specifies a rotate right.
    RRX            Specifies a rotate right with extend.
    <shift_imm>    Specifies the shift or rotation.

        LSL    0 to 31, encoded directly in the shift_imm field.
        LSR    1 to 32. A shift amount of 32 is encoded as shift_imm == 0. Other shift amounts are encoded directly.
        ASR    1 to 32. A shift amount of 32 is encoded as shift_imm == 0. Other shift amounts are encoded directly.
        ROR    1 to 31, encoded directly in the shift_imm field. (The shift_imm == 0 encoding is used to specify the RRX option.)
Operation

    case shift of
        0b00 /* LSL */
            index = Rm Logical_Shift_Left shift_imm
        0b01 /* LSR */
            if shift_imm == 0 then /* LSR #32 */
                index = 0
            else
                index = Rm Logical_Shift_Right shift_imm
        0b10 /* ASR */
            if shift_imm == 0 then /* ASR #32 */
                if Rm[31] == 1 then
                    index = 0xFFFFFFFF
                else
                    index = 0
            else
                index = Rm Arithmetic_Shift_Right shift_imm
        0b11 /* ROR or RRX */
            if shift_imm == 0 then /* RRX */
                index = (C Flag Logical_Shift_Left 31) OR (Rm Logical_Shift_Right 1)
            else /* ROR */
                index = Rm Rotate_Right shift_imm
    endcase
    if U == 1 then
        address = Rn + index
    else /* U == 0 */
        address = Rn - index

Usage

These addressing modes are used for accessing a single element of an array of values larger than a byte.

Notes

    The B bit    This bit distinguishes between an unsigned byte (B == 1) and a word (B == 0) access.

    The L bit    This bit distinguishes between a Load (L == 1) and a Store (L == 0) instruction.

    Use of R15    If R15 is specified as register Rn, the value used is the address of the instruction plus eight. Specifying R15 as register Rm has UNPREDICTABLE results.

A5.2.5 Load and Store Word or Unsigned Byte - Immediate pre-indexed

    Bits:   31-28  27  26  25  24  23  22  21  20  19-16  15-12  11-0
    Value:  cond   0   1   0   1   U   B   1   L   Rn     Rd     offset_12

This addressing mode calculates an address by adding or subtracting the value of an immediate offset to or from the value of the base register Rn.

If the condition specified in the instruction matches the condition code status, the calculated address is written back to the base register Rn. The conditions are defined in The condition field on page A3-3.

Syntax

    [<Rn>, #+/-<offset_12>]!

where:

    <Rn>           Specifies the register containing the base address.
    <offset_12>    Specifies the immediate offset used with the value of Rn to form the address.
    !              Sets the W bit, causing base register update.
Operation

    if U == 1 then
        address = Rn + offset_12
    else /* if U == 0 */
        address = Rn - offset_12
    if ConditionPassed(cond) then
        Rn = address

Usage

This addressing mode is used for pointer access to arrays with automatic update of the pointer value.

Notes

    Offset of zero    The syntax [<Rn>] must never be treated as an abbreviation for [<Rn>, #0]!.

    The B bit    This bit distinguishes between an unsigned byte (B == 1) and a word (B == 0) access.

    The L bit    This bit distinguishes between a Load (L == 1) and a Store (L == 0) instruction.

    Use of R15    Specifying R15 as register Rn has UNPREDICTABLE results.

A5.2.6 Load and Store Word or Unsigned Byte - Register pre-indexed

    Bits:   31-28  27  26  25  24  23  22  21  20  19-16  15-12  11-4             3-0
    Value:  cond   0   1   1   1   U   B   1   L   Rn     Rd     0 0 0 0 0 0 0 0  Rm

This addressing mode calculates an address by adding or subtracting the value of an index register Rm to or from the value of the base register Rn.

If the condition specified in the instruction matches the condition code status, the calculated address is written back to the base register Rn. The conditions are defined in The condition field on page A3-3.

Syntax

    [<Rn>, +/-<Rm>]!

where:

    <Rn>    Specifies the register containing the base address.
    <Rm>    Specifies the register containing the offset to add to or subtract from Rn.
    !       Sets the W bit, causing base register update.

Operation

    if U == 1 then
        address = Rn + Rm
    else /* U == 0 */
        address = Rn - Rm
    if ConditionPassed(cond) then
        Rn = address

Notes

    Encoding    This addressing mode is encoded as an LSL scaled register offset, scaled by zero.

    The B bit    This bit distinguishes between an unsigned byte (B == 1) and a word (B == 0) access.

    The L bit    This bit distinguishes between a Load (L == 1) and a Store (L == 0) instruction.

    Use of R15    Specifying R15 as register Rm or Rn has UNPREDICTABLE results.

    Operand restriction    There are no operand restrictions in ARMv6 and above.
In earlier versions of the architecture, if the same register is specified for Rn and Rm, the result is UNPREDICTABLE.

A5.2.7 Load and Store Word or Unsigned Byte - Scaled register pre-indexed

    Bits:   31-28  27  26  25  24  23  22  21  20  19-16  15-12  11-7       6-5    4  3-0
    Value:  cond   0   1   1   1   U   B   1   L   Rn     Rd     shift_imm  shift  0  Rm

These five addressing modes calculate an address by adding or subtracting the shifted or rotated value of the index register Rm to or from the value of the base register Rn.

If the condition specified in the instruction matches the condition code status, the calculated address is written back to the base register Rn. The conditions are defined in The condition field on page A3-3.

Syntax

One of:

    [<Rn>, +/-<Rm>, LSL #<shift_imm>]!
    [<Rn>, +/-<Rm>, LSR #<shift_imm>]!
    [<Rn>, +/-<Rm>, ASR #<shift_imm>]!
    [<Rn>, +/-<Rm>, ROR #<shift_imm>]!
    [<Rn>, +/-<Rm>, RRX]!

where:

    <Rn>           Specifies the register containing the base address.
    <Rm>           Specifies the register containing the offset to add to or subtract from Rn.
    LSL            Specifies a logical shift left.
    LSR            Specifies a logical shift right.
    ASR            Specifies an arithmetic shift right.
    ROR            Specifies a rotate right.
    RRX            Specifies a rotate right with extend.
    <shift_imm>    Specifies the shift or rotation.

        LSL    0 to 31, encoded directly in the shift_imm field.
        LSR    1 to 32. A shift amount of 32 is encoded as shift_imm == 0. Other shift amounts are encoded directly.
        ASR    1 to 32. A shift amount of 32 is encoded as shift_imm == 0. Other shift amounts are encoded directly.
        ROR    1 to 31, encoded directly in the shift_imm field. (The shift_imm == 0 encoding is used to specify the RRX option.)

    !              Sets the W bit, causing base register update.
Operation

    case shift of
        0b00 /* LSL */
            index = Rm Logical_Shift_Left shift_imm
        0b01 /* LSR */
            if shift_imm == 0 then /* LSR #32 */
                index = 0
            else
                index = Rm Logical_Shift_Right shift_imm
        0b10 /* ASR */
            if shift_imm == 0 then /* ASR #32 */
                if Rm[31] == 1 then
                    index = 0xFFFFFFFF
                else
                    index = 0
            else
                index = Rm Arithmetic_Shift_Right shift_imm
        0b11 /* ROR or RRX */
            if shift_imm == 0 then /* RRX */
                index = (C Flag Logical_Shift_Left 31) OR (Rm Logical_Shift_Right 1)
            else /* ROR */
                index = Rm Rotate_Right shift_imm
    endcase
    if U == 1 then
        address = Rn + index
    else /* U == 0 */
        address = Rn - index
    if ConditionPassed(cond) then
        Rn = address

Notes

    The B bit    This bit distinguishes between an unsigned byte (B == 1) and a word (B == 0) access.

    The L bit    This bit distinguishes between a Load (L == 1) and a Store (L == 0) instruction.

    Use of R15    Specifying R15 as register Rm or Rn has UNPREDICTABLE results.

    Operand restriction    There are no operand restrictions in ARMv6 and above. In earlier versions of the architecture, if the same register is specified for Rn and Rm, the result is UNPREDICTABLE.

A5.2.8 Load and Store Word or Unsigned Byte - Immediate post-indexed

    Bits:   31-28  27  26  25  24  23  22  21  20  19-16  15-12  11-0
    Value:  cond   0   1   0   0   U   B   0   L   Rn     Rd     offset_12

This addressing mode uses the value of the base register Rn as the address for the memory access.

If the condition specified in the instruction matches the condition code status, the value of the immediate offset is added to or subtracted from the value of the base register Rn and written back to the base register Rn. The conditions are defined in The condition field on page A3-3.

Syntax

    [<Rn>], #+/-<offset_12>

where:

    <Rn>           Specifies the register containing the base address.
    <offset_12>    Specifies the immediate offset used with the value of Rn to form the address.
Operation

    address = Rn
    if ConditionPassed(cond) then
        if U == 1 then
            Rn = Rn + offset_12
        else /* U == 0 */
            Rn = Rn - offset_12

Usage

This addressing mode is used for pointer access to arrays with automatic update of the pointer value.

Notes

    Post-indexed addressing modes    LDRBT, LDRT, STRBT, and STRT only support post-indexed addressing modes. They use a minor modification of the above bit pattern, where bit[21] (the W bit) is 1, not 0 as shown.

    Offset of zero    The syntax [<Rn>] is treated as an abbreviation for [<Rn>], #0 for instructions that only support post-indexed addressing modes (LDRBT, LDRT, STRBT, STRT), but not for other instructions.

    The B bit    This bit distinguishes between an unsigned byte (B == 1) and a word (B == 0) access.

    The L bit    This bit distinguishes between a Load (L == 1) and a Store (L == 0) instruction.

    Use of R15    Specifying R15 as register Rn has UNPREDICTABLE results.

A5.2.9 Load and Store Word or Unsigned Byte - Register post-indexed

    Bits:   31-28  27  26  25  24  23  22  21  20  19-16  15-12  11-4             3-0
    Value:  cond   0   1   1   0   U   B   0   L   Rn     Rd     0 0 0 0 0 0 0 0  Rm

This addressing mode uses the value of the base register Rn as the address for the memory access.

If the condition specified in the instruction matches the condition code status, the value of the index register Rm is added to or subtracted from the value of the base register Rn and written back to the base register Rn. The conditions are defined in The condition field on page A3-3.

Syntax

    [<Rn>], +/-<Rm>

where:

    <Rn>    Specifies the register containing the base address.
    <Rm>    Specifies the register containing the offset to add to or subtract from Rn.
Operation

    address = Rn
    if ConditionPassed(cond) then
        if U == 1 then
            Rn = Rn + Rm
        else /* U == 0 */
            Rn = Rn - Rm

Notes

    Encoding    This addressing mode is encoded as an LSL scaled register offset, scaled by zero.

    Post-indexed addressing modes    LDRBT, LDRT, STRBT, and STRT only support post-indexed addressing modes. They use a minor modification of the above bit pattern, where bit[21] (the W bit) is 1, not 0 as shown.

    The B bit    This bit distinguishes between an unsigned byte (B == 1) and a word (B == 0) access.

    The L bit    This bit distinguishes between a Load (L == 1) and a Store (L == 0) instruction.

    Use of R15    Specifying R15 as register Rn or Rm has UNPREDICTABLE results.

    Operand restriction    There are no operand restrictions in ARMv6 and above. In earlier versions of the architecture, if the same register is specified for Rn and Rm, the result is UNPREDICTABLE.

A5.2.10 Load and Store Word or Unsigned Byte - Scaled register post-indexed

    Bits:   31-28  27  26  25  24  23  22  21  20  19-16  15-12  11-7       6-5    4  3-0
    Value:  cond   0   1   1   0   U   B   0   L   Rn     Rd     shift_imm  shift  0  Rm

This addressing mode uses the value of the base register Rn as the address for the memory access.

If the condition specified in the instruction matches the condition code status, the shifted or rotated value of index register Rm is added to or subtracted from the value of the base register Rn and written back to the base register Rn. The conditions are defined in The condition field on page A3-3.

Syntax

One of:

    [<Rn>], +/-<Rm>, LSL #<shift_imm>
    [<Rn>], +/-<Rm>, LSR #<shift_imm>
    [<Rn>], +/-<Rm>, ASR #<shift_imm>
    [<Rn>], +/-<Rm>, ROR #<shift_imm>
    [<Rn>], +/-<Rm>, RRX

where:

    <Rn>           Specifies the register containing the base address.
    <Rm>           Specifies the register containing the offset to add to or subtract from Rn.
    LSL            Specifies a logical shift left.
    LSR            Specifies a logical shift right.
    ASR            Specifies an arithmetic shift right.
    ROR            Specifies a rotate right.
    RRX            Specifies a rotate right with extend.
    <shift_imm>    Specifies the shift or rotation.
        LSL    0 to 31, encoded directly in the shift_imm field.
        LSR    1 to 32. A shift amount of 32 is encoded as shift_imm == 0. Other shift amounts are encoded directly.
        ASR    1 to 32. A shift amount of 32 is encoded as shift_imm == 0. Other shift amounts are encoded directly.
        ROR    1 to 31, encoded directly in the shift_imm field. (The shift_imm == 0 encoding is used to specify the RRX option.)

Operation

    address = Rn
    case shift of
        0b00 /* LSL */
            index = Rm Logical_Shift_Left shift_imm
        0b01 /* LSR */
            if shift_imm == 0 then /* LSR #32 */
                index = 0
            else
                index = Rm Logical_Shift_Right shift_imm
        0b10 /* ASR */
            if shift_imm == 0 then /* ASR #32 */
                if Rm[31] == 1 then
                    index = 0xFFFFFFFF
                else
                    index = 0
            else
                index = Rm Arithmetic_Shift_Right shift_imm
        0b11 /* ROR or RRX */
            if shift_imm == 0 then /* RRX */
                index = (C Flag Logical_Shift_Left 31) OR (Rm Logical_Shift_Right 1)
            else /* ROR */
                index = Rm Rotate_Right shift_imm
    endcase
    if ConditionPassed(cond) then
        if U == 1 then
            Rn = Rn + index
        else /* U == 0 */
            Rn = Rn - index

Notes

    The W bit    LDRBT, LDRT, STRBT, and STRT only support post-indexed addressing modes. They use a minor modification of the above bit pattern, where bit[21] (the W bit) is 1, not 0 as shown.

    The B bit    This bit distinguishes between an unsigned byte (B == 1) and a word (B == 0) access.

    The L bit    This bit distinguishes between a Load (L == 1) and a Store (L == 0) instruction.

    Use of R15    Specifying R15 as register Rm or Rn has UNPREDICTABLE results.

    Operand restriction    There are no operand restrictions in ARMv6 and above. In earlier versions of the architecture, if the same register is specified for Rn and Rm, the result is UNPREDICTABLE.
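The scaled register index calculation shared by the scaled register offset, pre-indexed and post-indexed modes can be sketched in Python as follows. This is a minimal model for illustration, not a definitive implementation: the function names, the condition_passed parameter, and the masking of results to 32 bits are assumptions introduced here.

```python
MASK32 = 0xFFFFFFFF  # assumption: registers are modelled as 32-bit values


def scaled_index(rm, shift, shift_imm, c_flag=0):
    """Compute the index value from Rm, the shift field and shift_imm."""
    rm &= MASK32
    if shift == 0b00:  # LSL #0..31
        return (rm << shift_imm) & MASK32
    if shift == 0b01:  # LSR, shift_imm == 0 means LSR #32
        return 0 if shift_imm == 0 else rm >> shift_imm
    if shift == 0b10:  # ASR, shift_imm == 0 means ASR #32
        negative = (rm >> 31) & 1
        if shift_imm == 0:
            return MASK32 if negative else 0
        result = rm >> shift_imm
        if negative:  # replicate the sign bit into the vacated top bits
            result |= (MASK32 << (32 - shift_imm)) & MASK32
        return result
    if shift == 0b11:  # ROR, or RRX when shift_imm == 0
        if shift_imm == 0:
            return ((c_flag << 31) | (rm >> 1)) & MASK32
        return ((rm >> shift_imm) | (rm << (32 - shift_imm))) & MASK32
    raise ValueError("shift must be a 2-bit field")


def scaled_post_indexed(rn, rm, u, shift, shift_imm, c_flag=0,
                        condition_passed=True):
    """Return (access address, new Rn) for scaled register post-indexing."""
    address = rn & MASK32  # the unmodified base is used for the access
    index = scaled_index(rm, shift, shift_imm, c_flag)
    if condition_passed:
        rn = (rn + index if u == 1 else rn - index) & MASK32
    return address, rn
```

For instance, with Rn holding 0x1000 and Rm holding 3, the form [<Rn>], +<Rm>, LSL #2 accesses address 0x1000 and then advances Rn by 12, which is the usual pattern for stepping through an array of words by element index.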
A5.3 Addressing Mode 3 - Miscellaneous Loads and Stores

There are six formats used to calculate the address for load and store (signed or unsigned) halfword, load signed byte, or load and store doubleword instructions. The general instruction syntax is:

    LDR|STR{<cond>}H|SH|SB|D <Rd>, <addressing_mode>

where <addressing_mode> is one of the following six options:

    1.  [<Rn>, #+/-<offset_8>]
        See Miscellaneous Loads and Stores - Immediate offset on page A5-35.

    2.  [<Rn>, +/-<Rm>]
        See Miscellaneous Loads and Stores - Register offset on page A5-36.

    3.  [<Rn>, #+/-<offset_8>]!
        See Miscellaneous Loads and Stores - Immediate pre-indexed on page A5-37.

    4.  [<Rn>, +/-<Rm>]!
        See Miscellaneous Loads and Stores - Register pre-indexed on page A5-38.

    5.  [<Rn>], #+/-<offset_8>
        See Miscellaneous Loads and Stores - Immediate post-indexed on page A5-39.

    6.  [<Rn>], +/-<Rm>
        See Miscellaneous Loads and Stores - Register post-indexed on page A5-40.

A5.3.1 Encoding

The following diagrams show the encodings for this addressing mode:

Immediate offset/index

    Bits:   31-28  27  26  25  24  23  22  21  20  19-16  15-12  11-8    7  6  5  4  3-0
    Value:  cond   0   0   0   P   U   1   W   L   Rn     Rd     immedH  1  S  H  1  immedL

Register offset/index

    Bits:   31-28  27  26  25  24  23  22  21  20  19-16  15-12  11-8  7  6  5  4  3-0
    Value:  cond   0   0   0   P   U   0   W   L   Rn     Rd     SBZ   1  S  H  1  Rm

The P bit    Has two meanings:

        P == 0    Indicates the use of post-indexed addressing. The base register value is used for the memory address, and the offset is then applied to the base register value and written back to the base register.

        P == 1    Indicates the use of offset addressing or pre-indexed addressing (the W bit determines which). The memory address is generated by applying the offset to the base register value.

The U bit    Indicates whether the offset is added to the base (U == 1) or subtracted from the base (U == 0).

The W bit    Has two meanings:

        P == 0    The W bit must be 0 or the instruction is UNPREDICTABLE.
        P == 1    W == 1 indicates that the memory address is written back to the base register (pre-indexed addressing), and W == 0 that the base register is unchanged (offset addressing).

The L, S and H bits    These bits combine to specify signed or unsigned loads or stores, and doubleword, halfword, or byte accesses:

        L=0, S=0, H=1    Store halfword.
        L=0, S=1, H=0    Load doubleword.
        L=0, S=1, H=1    Store doubleword.
        L=1, S=0, H=1    Load unsigned halfword.
        L=1, S=1, H=0    Load signed byte.
        L=1, S=1, H=1    Load signed halfword.

    Prior to v5TE, the bits were denoted as Load/!Store (L), Signed/!Unsigned (S) and halfword/!Byte (H) bits.

    Signed bytes and halfwords can be stored with the same STRB and STRH instructions as are used for unsigned quantities, so no separate signed store instructions are provided.

Unsigned bytes    If S == 0 and H == 0, apparently indicating an unsigned byte, the instruction is not one that uses this addressing mode. Instead, it is a multiply instruction, a SWP or SWPB instruction, an LDREX or STREX instruction, or an unallocated instruction in the arithmetic or load/store instruction extension space (see Extending the instruction set on page A3-32).

    Unsigned bytes are accessed by the LDRB, LDRBT, STRB and STRBT instructions, which use addressing mode 2 rather than addressing mode 3.

Signed stores    If S == 1 and L == 0, apparently indicating a signed store instruction, the encoding along with the H bit is used to support the LDRD (H == 0) and STRD (H == 1) instructions.

A5.3.2 Miscellaneous Loads and Stores - Immediate offset

    Bits:   31-28  27  26  25  24  23  22  21  20  19-16  15-12  11-8    7  6  5  4  3-0
    Value:  cond   0   0   0   1   U   1   0   L   Rn     Rd     immedH  1  S  H  1  immedL

This addressing mode calculates an address by adding or subtracting the value of an immediate offset to or from the value of the base register Rn.

Syntax

    [<Rn>, #+/-<offset_8>]

where:

    <Rn>          Specifies the register containing the base address.
    <offset_8>    Specifies the immediate offset used with the value of Rn to form the address. The offset is encoded in immedH (top 4 bits) and immedL (bottom 4 bits).

Operation

    offset_8 = (immedH << 4) OR immedL
    if U == 1 then
        address = Rn + offset_8
    else /* U == 0 */
        address = Rn - offset_8

Usage

This addressing mode is used for accessing structure (record) fields, and accessing parameters and local variables in a stack frame. With an offset of zero, the address produced is the unaltered value of the base register Rn.

Notes

    Zero offset    The syntax [<Rn>] is treated as an abbreviation for [<Rn>, #0].

    The L, S and H bits    The L, S and H bits are defined in Encoding on page A5-33.

    Use of R15    If R15 is specified as register Rn, the value used is the address of the instruction plus eight.

A5.3.3 Miscellaneous Loads and Stores - Register offset

    Bits:   31-28  27  26  25  24  23  22  21  20  19-16  15-12  11-8  7  6  5  4  3-0
    Value:  cond   0   0   0   1   U   0   0   L   Rn     Rd     SBZ   1  S  H  1  Rm

This addressing mode calculates an address by adding or subtracting the value of the index register Rm to or from the value of the base register Rn.

Syntax

    [<Rn>, +/-<Rm>]

where:

    <Rn>    Specifies the register containing the base address.
    <Rm>    Specifies the register containing the offset to add to or subtract from Rn.

Operation

    if U == 1 then
        address = Rn + Rm
    else /* U == 0 */
        address = Rn - Rm

Usage

This addressing mode is useful for pointer plus offset arithmetic and for accessing a single element of an array.

Notes

    The L, S and H bits    The L, S and H bits are defined in Encoding on page A5-33.

    Unsigned bytes    If S == 0 and H == 0, apparently indicating an unsigned byte, the instruction is not one that uses this addressing mode. Instead, it is a multiply instruction, a SWP or SWPB instruction, or an unallocated instruction in the arithmetic or load/store instruction extension space (see Extending the instruction set on page A3-32).
    Unsigned bytes are accessed by the LDRB, LDRBT, STRB and STRBT instructions, which use addressing mode 2 rather than addressing mode 3.

    Use of R15    If R15 is specified as register Rn, the value used is the address of the instruction plus eight. Specifying R15 as register Rm has UNPREDICTABLE results.

A5.3.4 Miscellaneous Loads and Stores - Immediate pre-indexed

    Bits:   31-28  27  26  25  24  23  22  21  20  19-16  15-12  11-8    7  6  5  4  3-0
    Value:  cond   0   0   0   1   U   1   1   L   Rn     Rd     immedH  1  S  H  1  immedL

This addressing mode calculates an address by adding or subtracting the value of an immediate offset to or from the value of the base register Rn.

If the condition specified in the instruction matches the condition code status, the calculated address is written back to the base register Rn. The conditions are defined in The condition field on page A3-3.

Syntax

    [<Rn>, #+/-<offset_8>]!

where:

    <Rn>          Specifies the register containing the base address.
    <offset_8>    Specifies the immediate offset used with the value of Rn to form the address. The offset is encoded in immedH (top 4 bits) and immedL (bottom 4 bits).
    !             Sets the W bit, causing base register update.

Operation

    offset_8 = (immedH << 4) OR immedL
    if U == 1 then
        address = Rn + offset_8
    else /* U == 0 */
        address = Rn - offset_8
    if ConditionPassed(cond) then
        Rn = address

Usage

This addressing mode gives pointer access to arrays, with automatic update of the pointer value.

Notes

    Offset of zero    The syntax [<Rn>] must not be treated as an abbreviation for [<Rn>, #0]!.

    The L, S and H bits    The L, S and H bits are defined in Encoding on page A5-33.

    Use of R15    Specifying R15 as register Rn has UNPREDICTABLE results.
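As an illustration of how the split immediate and the L, S and H bits of addressing mode 3 fit together, the following Python sketch decodes an immediate-form instruction word and applies the pre-indexed calculation. The function names, the dictionary layout, the condition_passed parameter and the 32-bit masking are assumptions made for this example only.

```python
MASK32 = 0xFFFFFFFF  # assumption: registers are modelled as 32-bit values

# (L, S, H) combinations listed in the Encoding section.
ACCESS_TYPES = {
    (0, 0, 1): "store halfword",
    (0, 1, 0): "load doubleword",
    (0, 1, 1): "store doubleword",
    (1, 0, 1): "load unsigned halfword",
    (1, 1, 0): "load signed byte",
    (1, 1, 1): "load signed halfword",
}


def decode_mode3_immediate(instr):
    """Pull the addressing mode 3 fields out of an immediate-form word."""
    l = (instr >> 20) & 1
    s = (instr >> 6) & 1
    h = (instr >> 5) & 1
    immed_h = (instr >> 8) & 0xF
    immed_l = instr & 0xF
    return {
        "P": (instr >> 24) & 1,
        "U": (instr >> 23) & 1,
        "W": (instr >> 21) & 1,
        "Rn": (instr >> 16) & 0xF,
        "Rd": (instr >> 12) & 0xF,
        "offset_8": (immed_h << 4) | immed_l,
        # None when S == 0 and H == 0: not an addressing mode 3 instruction
        "access": ACCESS_TYPES.get((l, s, h)),
    }


def mode3_pre_indexed(rn_value, u, offset_8, condition_passed=True):
    """Return (access address, new Rn) for immediate pre-indexing."""
    if u == 1:
        address = (rn_value + offset_8) & MASK32
    else:
        address = (rn_value - offset_8) & MASK32
    new_rn = address if condition_passed else rn_value
    return address, new_rn
```

For example, the word 0xE1F101B2 (LDRH R0, [R1, #0x12]!) decodes to P == 1, U == 1, W == 1, an offset_8 of 0x12 and a load unsigned halfword access, so with R1 holding 0x2000 both the access address and the updated R1 are 0x2012.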
A5.3.5 Miscellaneous Loads and Stores - Register pre-indexed

    Bits:   31-28  27  26  25  24  23  22  21  20  19-16  15-12  11-8  7  6  5  4  3-0
    Value:  cond   0   0   0   1   U   0   1   L   Rn     Rd     SBZ   1  S  H  1  Rm

This addressing mode calculates an address by adding or subtracting the value of the index register Rm to or from the value of the base register Rn.

If the condition specified in the instruction matches the condition code status, the calculated address is written back to the base register Rn. The conditions are defined in The condition field on page A3-3.

Syntax

    [<Rn>, +/-<Rm>]!

where:

    <Rn>    Specifies the register containing the base address.
    <Rm>    Specifies the register containing the offset to add to or subtract from Rn.
    !       Sets the W bit, causing base register update.

Operation

    if U == 1 then
        address = Rn + Rm
    else /* U == 0 */
        address = Rn - Rm
    if ConditionPassed(cond) then
        Rn = address

Notes

    The L, S and H bits    The L, S and H bits are defined in Encoding on page A5-33.

    Use of R15    Specifying R15 as register Rm or Rn has UNPREDICTABLE results.

    Operand restriction    There are no operand restrictions in ARMv6 and above. In earlier versions of the architecture, if the same register is specified for Rn and Rm, the result is UNPREDICTABLE.

A5.3.6 Miscellaneous Loads and Stores - Immediate post-indexed

    Bits:   31-28  27  26  25  24  23  22  21  20  19-16  15-12  11-8    7  6  5  4  3-0
    Value:  cond   0   0   0   0   U   1   0   L   Rn     Rd     immedH  1  S  H  1  immedL

This addressing mode uses the value of the base register Rn as the address for the memory access.

If the condition specified in the instruction matches the condition code status, the value of the immediate offset is added to or subtracted from the value of the base register Rn and written back to the base register Rn. The conditions are defined in The condition field on page A3-3.

Syntax

    [<Rn>], #+/-<offset_8>

where:

    <Rn>          Specifies the register containing the base address.
    <offset_8>    Specifies the immediate offset used with the value of Rn to form the address. The offset is encoded in immedH (top 4 bits) and immedL (bottom 4 bits).

Operation

    address = Rn
    offset_8 = (immedH << 4) OR immedL
    if ConditionPassed(cond) then
        if U == 1 then
            Rn = Rn + offset_8
        else /* U == 0 */
            Rn = Rn - offset_8

Usage

This addressing mode gives pointer access to arrays, with automatic update of the pointer value.

Notes

    Offset of zero    The syntax [<Rn>] must not be treated as an abbreviation for [<Rn>], #0.

    The L, S and H bits    The L, S and H bits are defined in Encoding on page A5-33.

    Use of R15    Specifying R15 as register Rn has UNPREDICTABLE results.

A5.3.7 Miscellaneous Loads and Stores - Register post-indexed

    Bits:   31-28  27  26  25  24  23  22  21  20  19-16  15-12  11-8  7  6  5  4  3-0
    Value:  cond   0   0   0   0   U   0   0   L   Rn     Rd     SBZ   1  S  H  1  Rm

This addressing mode uses the value of the base register Rn as the address for the memory access.

If the condition specified in the instruction matches the condition code status, the value of the index register Rm is added to or subtracted from the value of the base register Rn and written back to the base register Rn. The conditions are defined in The condition field on page A3-3.

Syntax

    [<Rn>], +/-<Rm>

where:

    <Rn>    Specifies the register containing the base address.
    <Rm>    Specifies the register containing the offset to add to or subtract from Rn.

Operation

    address = Rn
    if ConditionPassed(cond) then
        if U == 1 then
            Rn = Rn + Rm
        else /* U == 0 */
            Rn = Rn - Rm

Notes

    The L, S and H bits    The L, S and H bits are defined in Encoding on page A5-33.

    Use of R15    Specifying R15 as register Rm or Rn has UNPREDICTABLE results.

    Operand restriction    There are no operand restrictions in ARMv6 and above. In earlier versions of the architecture, if the same register is specified for Rn and Rm, the result is UNPREDICTABLE.

A5.4 Addressing Mode 4 - Load and Store Multiple

Load Multiple instructions load a subset (possibly all) of the general-purpose registers from memory. Store Multiple instructions store a subset (possibly all) of the general-purpose registers to memory.

Load and Store Multiple addressing modes produce a sequential range of addresses. The lowest-numbered register is stored at the lowest memory address and the highest-numbered register at the highest memory address.

The general instruction syntax is:

    LDM|STM{<cond>}<addressing_mode> <Rn>{!}, <registers>{^}

where <addressing_mode> is one of the following four addressing modes:

    1.  IA (Increment After)
        See Load and Store Multiple - Increment after on page A5-43.

    2.  IB (Increment Before)
        See Load and Store Multiple - Increment before on page A5-44.

    3.  DA (Decrement After)
        See Load and Store Multiple - Decrement after on page A5-45.

    4.  DB (Decrement Before)
        See Load and Store Multiple - Decrement before on page A5-46.

There are also alternative mnemonics for these addressing modes, useful when LDM and STM are being used to access a stack. See Load and Store Multiple addressing modes (alternative names) on page A5-47.

A5.4.1 Encoding

The following diagram shows the encoding for this addressing mode:

    Bits:   31-28  27  26  25  24  23  22  21  20  19-16  15-0
    Value:  cond   1   0   0   P   U   S   W   L   Rn     register list

The P bit    Has two meanings:

        P == 0    Indicates that the word addressed by Rn is included in the range of memory locations accessed, lying at the top (U == 0) or bottom (U == 1) of that range.

        P == 1    Indicates that the word addressed by Rn is excluded from the range of memory locations accessed, and lies one word beyond the top of the range (U == 0) or one word below the bottom of the range (U == 1).

The U bit    Indicates that the transfer is made upwards (U == 1) or downwards (U == 0) from the base register.
The S bit    For LDMs that load the PC, the S bit indicates that the CPSR is loaded from the SPSR. For LDMs that do not load the PC and all STMs, the S bit indicates that when the processor is in a privileged mode, the User mode banked registers are transferred instead of the registers of the current mode.

    LDM with the S bit set is UNPREDICTABLE in User or System mode.

The W bit    Indicates that the base register is updated after the transfer. The base register is incremented (U == 1) or decremented (U == 0) by four times the number of registers in the register list.

The L bit    Distinguishes between Load (L == 1) and Store (L == 0) instructions.

Register list    The register_list field of the instruction has one bit for each general-purpose register: bit[0] for register zero through to bit[15] for register 15 (the PC). If no bits are set, the result is UNPREDICTABLE.

    The instruction syntax specifies the registers to load or store in <registers>, which is a comma-separated list of registers, surrounded by { and }.

A5.4.2 Load and Store Multiple - Increment after

    Bits:   31-28  27  26  25  24  23  22  21  20  19-16  15-0
    Value:  cond   1   0   0   0   1   S   W   L   Rn     register list

This addressing mode is for Load and Store Multiple instructions, and forms a range of addresses.

The first address formed is the <start_address>, and is the value of the base register Rn. Subsequent addresses are formed by incrementing the previous address by four. One address is produced for each register that is specified in <registers>.

The last address produced is the <end_address>. Its value is four less than the sum of the value of the base register and four times the number of registers specified in <registers>.

If the condition specified in the instruction matches the condition code status and the W bit is set, Rn is incremented by four times the number of registers in <registers>. The conditions are defined in The condition field on page A3-3.
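The Increment After address sequence can be illustrated with a short Python sketch that expands the register_list field into one address per register. This is a hedged illustration: the function name, the condition_passed parameter, the choice to raise an exception for an empty list, and the 32-bit masking are all assumptions introduced here.

```python
MASK32 = 0xFFFFFFFF  # assumption: addresses wrap modulo 2**32


def ldm_stm_ia(rn_value, register_list, w=0, condition_passed=True):
    """Return ([(register, address), ...], new Rn) for Increment After."""
    # bit[i] of register_list selects register i, lowest number first
    registers = [r for r in range(16) if (register_list >> r) & 1]
    if not registers:
        # modelling choice: the architecture calls this UNPREDICTABLE
        raise ValueError("empty register list is UNPREDICTABLE")
    # the lowest-numbered register uses the lowest address (start_address)
    transfers = [(r, (rn_value + 4 * i) & MASK32)
                 for i, r in enumerate(registers)]
    new_rn = rn_value
    if condition_passed and w == 1:
        new_rn = (rn_value + 4 * len(registers)) & MASK32
    return transfers, new_rn
```

With Rn holding 0x8000 and a register list of {R1, R2, R3} (register_list == 0x000E), the addresses are 0x8000, 0x8004 and 0x8008, and with the W bit set Rn becomes 0x800C.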
Syntax

    IA

See also the alternative syntax described in Load and Store Multiple addressing modes (alternative names) on page A5-47.

Operation

    start_address = Rn
    end_address = Rn + (Number_Of_Set_Bits_In(register_list) * 4) - 4
    if ConditionPassed(cond) and W == 1 then
        Rn = Rn + (Number_Of_Set_Bits_In(register_list) * 4)

Notes

    The L bit    This bit distinguishes between a Load Multiple and a Store Multiple.

    The S bit    For LDMs that load the PC, the S bit indicates that the CPSR is loaded from the SPSR. For LDMs that do not load the PC and all STMs, the S bit indicates that when the processor is in a privileged mode, the User mode banked registers are transferred instead of the registers of the current mode.

        LDM with the S bit set is UNPREDICTABLE in User or System mode.

A5.4.3 Load and Store Multiple - Increment before

    Bits:   31-28  27  26  25  24  23  22  21  20  19-16  15-0
    Value:  cond   1   0   0   1   1   S   W   L   Rn     register list

This addressing mode is for Load and Store Multiple instructions, and forms a range of addresses.

The first address formed is the <start_address>, and is the value of the base register Rn plus four. Subsequent addresses are formed by incrementing the previous address by four. One address is produced for each register that is specified in <registers>.

The last address produced is the <end_address>. Its value is the sum of the value of the base register and four times the number of registers specified in <registers>.

If the condition specified in the instruction matches the condition code status and the W bit is set, Rn is incremented by four times the number of registers in <registers>. The conditions are defined in The condition field on page A3-3.

Syntax

    IB

See also the alternative syntax described in Load and Store Multiple addressing modes (alternative names) on page A5-47.
Operation

    start_address = Rn + 4
    end_address = Rn + (Number_Of_Set_Bits_In(register_list) * 4)
    if ConditionPassed(cond) and W == 1 then
        Rn = Rn + (Number_Of_Set_Bits_In(register_list) * 4)

Notes

    The L bit    This bit distinguishes between a Load Multiple and a Store Multiple.

    The S bit    For LDMs that load the PC, the S bit indicates that the CPSR is loaded from the SPSR. For LDMs that do not load the PC and all STMs, the S bit indicates that when the processor is in a privileged mode, the User mode banked registers are transferred instead of the registers of the current mode.

        LDM with the S bit set is UNPREDICTABLE in User or System mode.

A5.4.4 Load and Store Multiple - Decrement after

    Bits:   31-28  27  26  25  24  23  22  21  20  19-16  15-0
    Value:  cond   1   0   0   0   0   S   W   L   Rn     register list

This addressing mode is for Load and Store Multiple instructions, and forms a range of addresses.

The first address formed is the <start_address>, and is the value of the base register minus four times the number of registers specified in <registers>, plus 4. Subsequent addresses are formed by incrementing the previous address by four. One address is produced for each register that is specified in <registers>.

The last address produced is the <end_address>. Its value is the value of the base register Rn.

If the condition specified in the instruction matches the condition code status and the W bit is set, Rn is decremented by four times the number of registers in <registers>. The conditions are defined in The condition field on page A3-3.

Syntax

    DA

See also the alternative syntax described in Load and Store Multiple addressing modes (alternative names) on page A5-47.

Operation

    start_address = Rn - (Number_Of_Set_Bits_In(register_list) * 4) + 4
    end_address = Rn
    if ConditionPassed(cond) and W == 1 then
        Rn = Rn - (Number_Of_Set_Bits_In(register_list) * 4)

Notes

    The L bit    This bit distinguishes between a Load Multiple and a Store Multiple.
The S bit    For LDMs that load the PC, the S bit indicates that the CPSR is loaded from the SPSR. For LDMs that do not load the PC and all STMs, the S bit indicates that when the processor is in a privileged mode, the User mode banked registers are transferred instead of the registers of the current mode. LDM with the S bit set is UNPREDICTABLE in User or System mode.

A5.4.5 Load and Store Multiple - Decrement before

    31-28   27  26  25  24  23  22  21  20  19-16   15-0
    cond    1   0   0   1   0   S   W   L   Rn      register list

This addressing mode is for Load and Store Multiple instructions, and forms a range of addresses.

The first address formed is the start_address, and is the value of the base register minus four times the number of registers specified in register_list. Subsequent addresses are formed by incrementing the previous address by four. One address is produced for each register that is specified in register_list.

The last address produced is the end_address. Its value is the value of the base register Rn minus four.

If the condition specified in the instruction matches the condition code status and the W bit is set, Rn is decremented by four times the number of registers in register_list.

The conditions are defined in The condition field on page A3-3.

Syntax

    DB

See also the alternative syntax described in Load and Store Multiple addressing modes (alternative names) on page A5-47.

Architecture version

    All

Operation

    start_address = Rn - (Number_Of_Set_Bits_In(register_list) * 4)
    end_address = Rn - 4
    if ConditionPassed(cond) and W == 1 then
        Rn = Rn - (Number_Of_Set_Bits_In(register_list) * 4)

Notes

The L bit    This bit distinguishes between a Load Multiple and a Store Multiple.

The S bit    For LDMs that load the PC, the S bit indicates that the CPSR is loaded from the SPSR.
For LDMs that do not load the PC and all STMs, the S bit indicates that when the processor is in a privileged mode, the User mode banked registers are transferred instead of the registers of the current mode. LDM with the S bit set is UNPREDICTABLE in User or System mode.

A5.4.6 Load and Store Multiple addressing modes (alternative names)

The four addressing mode names given in Addressing Mode 4 - Load and Store Multiple on page A5-41 (IA, IB, DA, DB) are most useful when a Load and Store Multiple instruction is being used for block data transfer, as it is likely that the Load Multiple and Store Multiple have the same addressing mode, so that the data is stored in the same way that it was loaded.

However, if Load Multiple and Store Multiple are being used to access a stack, the data is not loaded with the same addressing mode that was used to store the data, because the load (pop) and store (push) operations must adjust the stack in opposite directions.

Stack operations

Load Multiple and Store Multiple addressing modes can be specified with an alternative syntax, which is more applicable to stack operations:

Full stacks          Have stack pointers that point to the last used (full) location.

Empty stacks         Have stack pointers that point to the first unused (empty) location.

Descending stacks    Grow towards decreasing memory addresses (towards the bottom of memory).

Ascending stacks     Grow towards increasing memory addresses (towards the top of memory).

Two attributes allow four types of stack to be defined:
• Full Descending, with the syntax FD
• Empty Descending, with the syntax ED
• Full Ascending, with the syntax FA
• Empty Ascending, with the syntax EA.

Note

When defining stacks on which coprocessor data is to be placed (or might be placed in the future), programmers are advised to use the FD or EA stack types.
This is because coprocessor data can be pushed to these types of stack with a single STC instruction and popped from them with a single LDC instruction. Multi-instruction sequences are required for coprocessor access to FA or ED stacks.

Table A5-1 on page A5-48 and Table A5-2 on page A5-48 show the relationship between the four types of stack, the four types of addressing mode shown above, and the L, U, and P bits in the instruction format.

Table A5-1 shows the relationship for LDM instructions.

Table A5-1 LDM addressing modes

    Non-stack addressing mode    Stack addressing mode       L bit   P bit   U bit
    LDMDA (Decrement After)      LDMFA (Full Ascending)      1       0       0
    LDMIA (Increment After)      LDMFD (Full Descending)     1       0       1
    LDMDB (Decrement Before)     LDMEA (Empty Ascending)     1       1       0
    LDMIB (Increment Before)     LDMED (Empty Descending)    1       1       1

Table A5-2 shows the relationship for STM instructions.

Table A5-2 STM addressing modes

    Non-stack addressing mode    Stack addressing mode       L bit   P bit   U bit
    STMDA (Decrement After)      STMED (Empty Descending)    0       0       0
    STMIA (Increment After)      STMEA (Empty Ascending)     0       0       1
    STMDB (Decrement Before)     STMFD (Full Descending)     0       1       0
    STMIB (Increment Before)     STMFA (Full Ascending)      0       1       1

A5.5 Addressing Mode 5 - Load and Store Coprocessor

There are four addressing modes which are used to calculate the address of a Load or Store Coprocessor instruction. The general instruction syntax is:

    {}{L} ,,

where is one of the following four options:

1. [,#+/-*4]
   See Load and Store Coprocessor - Immediate offset on page A5-51.

2. [,#+/-*4]!
   See Load and Store Coprocessor - Immediate pre-indexed on page A5-52.

3. [],#+/-*4
   See Load and Store Coprocessor - Immediate post-indexed on page A5-53.

4. [],

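Looking back at Tables A5-1 and A5-2 in section A5.4.6, the correspondence between stack-oriented names, non-stack names, and the L, P, and U bits can be sketched in Python, together with the address range each combination produces. This is an informal model under stated assumptions: `resolve_bits`, `transfer_range`, and the dictionary names are hypothetical helpers invented for the example, the condition is assumed to have passed, write-back is assumed to be requested, and addresses are not wrapped to 32 bits.

```python
# (P bit, U bit) for each non-stack suffix, as listed in Tables A5-1 and A5-2.
MODE_BITS = {"DA": (0, 0), "IA": (0, 1), "DB": (1, 0), "IB": (1, 1)}

# Stack suffix -> non-stack suffix. The mapping differs for loads and stores.
LDM_STACK_ALIAS = {"FA": "DA", "FD": "IA", "EA": "DB", "ED": "IB"}
STM_STACK_ALIAS = {"ED": "DA", "EA": "IA", "FD": "DB", "FA": "IB"}


def resolve_bits(mnemonic):
    """Return (L, P, U) for a mnemonic such as 'LDMFD' or 'STMIA'."""
    op, suffix = mnemonic[:3].upper(), mnemonic[3:].upper()
    if op == "LDM":
        l_bit, aliases = 1, LDM_STACK_ALIAS
    elif op == "STM":
        l_bit, aliases = 0, STM_STACK_ALIAS
    else:
        raise ValueError("not a Load or Store Multiple mnemonic: " + mnemonic)
    # Stack names are translated first; non-stack names pass through unchanged.
    p_bit, u_bit = MODE_BITS[aliases.get(suffix, suffix)]
    return l_bit, p_bit, u_bit


def transfer_range(rn, count, p_bit, u_bit):
    """Return (start_address, end_address, written-back Rn) for count registers."""
    size = 4 * count
    if u_bit == 1:                                 # incrementing modes (IA, IB)
        start = rn + 4 * p_bit
        new_rn = rn + size
    else:                                          # decrementing modes (DA, DB)
        start = rn - size + 4 * (1 - p_bit)
        new_rn = rn - size
    return start, start + size - 4, new_rn


# Push three registers onto a Full Descending stack, then pop them again.
_, p, u = resolve_bits("STMFD")
push = transfer_range(0x8000, 3, p, u)
_, p, u = resolve_bits("LDMFD")
pop = transfer_range(push[2], 3, p, u)
print([hex(v) for v in push], [hex(v) for v in pop])
```

In this sketch the push and the pop cover the same three word addresses, and the pop returns the base register to its starting value, which is the behaviour the FD naming is meant to capture even though the underlying modes (DB for the store, IA for the load) differ.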