TMS320C28x Extended Instruction Sets (Rev. A) Technical Reference Manual

Table of Contents
Preface
1 Floating Point Unit (FPU)
2 C28 Viterbi, Complex Math and CRC Unit-II (VCU-II)
3 Trigonometric Math Unit (TMU)
Revision History
Important Notice

TMS320C28x Extended Instruction Sets

Technical Reference Manual

Literature Number: SPRUHS1A

March 2014–Revised December 2015

Contents

Preface
1 Floating Point Unit (FPU)
  1.1 Overview
    1.1.1 Compatibility with the C28x Fixed-Point CPU
  1.2 Components of the C28x plus Floating-Point CPU
    1.2.1 Emulation Logic
    1.2.2 Memory Map
    1.2.3 On-Chip Program and Data
    1.2.4 CPU Interrupt Vectors
    1.2.5 Memory Interface
  1.3 CPU Register Set
    1.3.1 CPU Registers
  1.4 Pipeline
    1.4.1 Pipeline Overview
    1.4.2 General Guidelines for Floating-Point Pipeline Alignment
    1.4.3 Moves from FPU Registers to C28x Registers
    1.4.4 Moves from C28x Registers to FPU Registers
    1.4.5 Parallel Instructions
    1.4.6 Invalid Delay Instructions
    1.4.7 Optimizing the Pipeline
  1.5 Floating Point Unit Instruction Set
    1.5.1 Instruction Descriptions
    1.5.2 Instructions
2 C28 Viterbi, Complex Math and CRC Unit-II (VCU-II)
  2.1 Overview
  2.2 Components of the C28x Plus VCU
    2.2.1 Emulation Logic
    2.2.2 Memory Map
    2.2.3 CPU Interrupt Vectors
    2.2.4 Memory Interface
    2.2.5 Address and Data Buses
    2.2.6 Alignment of 32-Bit Accesses to Even Addresses
  2.3 Register Set
    2.3.1 VCU Register Set
    2.3.2 VCU Status Register (VSTATUS)
    2.3.3 Repeat Block Register (RB)
  2.4 Pipeline
    2.4.1 Pipeline Overview
    2.4.2 General Guidelines for VCU Pipeline Alignment
    2.4.3 Parallel Instructions
    2.4.4 Invalid Delay Instructions
  2.5 Instruction Set
    2.5.1 Instruction Descriptions
    2.5.2 General Instructions
    2.5.3 Arithmetic Math Instructions
    2.5.4 Complex Math Instructions
    2.5.5 Cyclic Redundancy Check (CRC) Instructions
    2.5.6 Deinterleaver Instructions
    2.5.7 FFT Instructions
    2.5.8 Galois Instructions
    2.5.9 Viterbi Instructions
  2.6 Rounding Mode
3 Trigonometric Math Unit (TMU)
  3.1 Overview
  3.2 Components of the C28x+FPU Plus TMU
    3.2.1 Interrupt Context Save and Restore
  3.3 Data Format
  3.4 Pipeline
    3.4.1 Pipeline and Register Conflicts
    3.4.2 Delay Slot Requirements
    3.4.3 Effect of Delay Slot Operations on the Flags
    3.4.4 Multi-Cycle Operations in Delay Slots
    3.4.5 Moves From FPU Registers to C28x Registers
  3.5 TMU Instruction Set
    3.5.1 Instruction Descriptions
    3.5.2 Common Restrictions
    3.5.3 Instructions
Revision History

List of Figures

1-1. FPU Functional Block Diagram
1-2. C28x With Floating-Point Registers
1-3. Floating-point Unit Status Register (STF)
1-4. Repeat Block Register (RB)
1-5. FPU Pipeline
2-1. C28x + VCU Block Diagram
2-2. C28x + FPU + VCU Registers
2-3. VCU Status Register (VSTATUS)
2-4. Repeat Block Register (RB)
2-5. C28x + FPU + VCU Pipeline

List of Tables

1-1. 28x Plus Floating-Point CPU Register Summary
1-2. Floating-point Unit Status (STF) Register Field Descriptions
1-3. Repeat Block (RB) Register Field Descriptions
1-4. Operand Nomenclature
1-5. Summary of Instructions
2-1. Viterbi Decode Performance
2-2. Complex Math Performance
2-3. VCU Register Set
2-4. 28x CPU Register Summary
2-5. VCU Status (VSTATUS) Register Field Descriptions
2-6. Operation Interaction With VSTATUS Bits
2-7. Repeat Block (RB) Register Field Descriptions
2-8. Operations Requiring a Delay Slot(s)
2-9. Operand Nomenclature
2-10. INSTRUCTION dest, source1, source2 Short Description
2-11. General Instructions
2-12. Arithmetic Math Instructions
2-13. Complex Math Instructions
2-14. CRC Instructions
2-15. Deinterleaver Instructions
2-16. FFT Instructions
2-17. Galois Field Instructions
2-18. Viterbi Instructions
2-19. Example: Values Before Shift Right
2-20. Example: Values after Shift Right
2-21. Example: Addition with Right Shift and Rounding
2-22. Example: Addition with Rounding After Shift Right
2-23. Shift Right Operation With and Without Rounding
3-1. TMU Supported Instructions
3-2. IEEE 32-Bit Single Precision Floating-Point Format
3-3. Delay Slot Requirements for TMU Instructions
3-4. Operand Nomenclature
3-5. Summary of Instructions

Preface

Read This First

This document describes the architecture, pipeline, and instruction sets of the TMU, VCU-II, and FPU accelerators.

About This Manual

The TMS320C2000™ digital signal processor (DSP) platform is part of the TMS320™ DSP family.

Notational Conventions

This document uses the following conventions.

• Hexadecimal numbers are shown with the suffix h or with a leading 0x. For example, the following number is 40 hexadecimal (decimal 64): 40h or 0x40.

• Registers in this document are shown as figures and described in tables.

– Each register figure shows a rectangle divided into fields that represent the fields of the register. Each field is labeled with its bit name, its beginning and ending bit numbers above, and its read/write properties below. A legend explains the notation used for the properties.

– Reserved bits in a register figure designate bits that are reserved for future device expansion.

Related Documentation

The following books describe the TMS320x28x and related support tools that are available on the TI website:

Data Manual and Errata:

SPRS439: TMS320F28335, TMS320F28334, TMS320F28332, TMS320F28235, TMS320F28234, TMS320F28232 Digital Signal Controllers (DSCs) Data Manual contains the pinout, signal descriptions, as well as electrical and timing specifications for the F2833x/2823x devices.

SPRZ272: TMS320F28335, F28334, F28332, TMS320F28235, F28234, F28232 Digital Signal Controllers (DSCs) Silicon Errata describes the advisories and usage notes for different versions of silicon.

CPU User's Guides:

SPRU430: TMS320C28x CPU and Instruction Set Reference Guide describes the central processing unit (CPU) and the assembly language instructions of the TMS320C28x fixed-point digital signal processors (DSPs). It also describes emulation features available on these DSPs.

SPRUEO2: TMS320C28x Floating Point Unit and Instruction Set Reference Guide describes the floating-point unit and includes the instructions for the FPU.

Peripheral Guides:

SPRU566: TMS320x28xx, 28xxx DSP Peripheral Reference Guide describes the peripheral reference guides of the 28x digital signal processors (DSPs).

SPRUFB0: TMS320x2833x, 2823x System Control and Interrupts Reference Guide describes the various interrupts and system control features of the 2833x and 2823x digital signal controllers (DSCs).

SPRU812: TMS320x2833x, 2823x Analog-to-Digital Converter (ADC) Reference Guide describes how to configure and use the on-chip ADC module, which is a 12-bit pipelined ADC.

SPRU949: TMS320x2833x, 2823x DSC External Interface (XINTF) Reference Guide describes the XINTF, which is a nonmultiplexed asynchronous bus, as it is used on the 2833x and 2823x devices.

SPRU963: TMS320x2833x, 2823x Boot ROM Reference Guide describes the purpose and features of the bootloader (factory-programmed boot-loading software) and provides examples of code. It also describes other contents of the device on-chip boot ROM and identifies where all of the information is located within that memory.

SPRUFB7: TMS320x2833x, 2823x Multichannel Buffered Serial Port (McBSP) Reference Guide describes the McBSP available on the 2833x and 2823x devices. The McBSPs allow direct interface between a DSP and other devices in a system.

SPRUFB8: TMS320x2833x, 2823x Direct Memory Access (DMA) Module Reference Guide describes the DMA on the 2833x and 2823x devices.

SPRUG04: TMS320x2833x, 2823x Enhanced Pulse Width Modulator (ePWM) Module Reference Guide describes the main areas of the enhanced pulse width modulator that include digital motor control, switch mode power supply control, UPS (uninterruptible power supplies), and other forms of power conversion.

SPRUG02: TMS320x2833x, 2823x High-Resolution Pulse Width Modulator (HRPWM) Reference Guide describes the operation of the high-resolution extension to the pulse width modulator (HRPWM).

SPRUFG4: TMS320x2833x, 2823x Enhanced Capture (eCAP) Module Reference Guide describes the enhanced capture module. It includes the module description and registers.

SPRUG05: TMS320x2833x, 2823x Enhanced Quadrature Encoder Pulse (eQEP) Module Reference Guide describes the eQEP module, which is used for interfacing with a linear or rotary incremental encoder to get position, direction, and speed information from a rotating machine in high-performance motion and position control systems. It includes the module description and registers.

SPRUEU1: TMS320x2833x, 2823x Enhanced Controller Area Network (eCAN) Reference Guide describes the eCAN that uses established protocol to communicate serially with other controllers in electrically noisy environments.

SPRUFZ5: TMS320x2833x, 2823x Serial Communications Interface (SCI) Reference Guide describes the SCI, which is a two-wire asynchronous serial port, commonly known as a UART. The SCI modules support digital communications between the CPU and other asynchronous peripherals that use the standard non-return-to-zero (NRZ) format.

SPRUEU3: TMS320x2833x, 2823x DSC Serial Peripheral Interface (SPI) Reference Guide describes the SPI, a high-speed synchronous serial input/output (I/O) port that allows a serial bit stream of programmed length (one to sixteen bits) to be shifted into and out of the device at a programmed bit-transfer rate.

SPRUG03: TMS320x2833x, 2823x Inter-Integrated Circuit (I2C) Module Reference Guide describes the features and operation of the inter-integrated circuit (I2C) module.

Tools Guides:

SPRU513: TMS320C28x Assembly Language Tools v5.0.0 User's Guide describes the assembly language tools (assembler and other tools used to develop assembly language code), assembler directives, macros, common object file format, and symbolic debugging directives for the TMS320C28x device.

SPRU514: TMS320C28x Optimizing C/C++ Compiler v5.0.0 User's Guide describes the TMS320C28x™ C/C++ compiler. This compiler accepts ANSI standard C/C++ source code and produces TMS320 DSP assembly language source code for the TMS320C28x device.

SPRU608: TMS320C28x Instruction Set Simulator Technical Overview describes the simulator, available within the Code Composer Studio for TMS320C2000 IDE, that simulates the instruction set of the C28x™ core.

SPRU625: TMS320C28x DSP/BIOS 5.32 Application Programming Interface (API) Reference Guide describes development using DSP/BIOS.

Trademarks

TMS320C28x, C28x, TMS320C2000 are trademarks of Texas Instruments.

Chapter 1

Floating Point Unit (FPU)

The TMS320C2000™ DSP family consists of fixed-point and floating-point digital signal controllers (DSCs). TMS320C2000™ Digital Signal Controllers combine control peripheral integration and ease of use of a microcontroller (MCU) with the processing power and C efficiency of TI’s leading DSP technology. This chapter provides an overview of the architectural structure and components of the C28x plus floating-point unit CPU.

Topic

1.1 Overview
1.2 Components of the C28x plus Floating-Point CPU
1.3 CPU Register Set
1.4 Pipeline
1.5 Floating Point Unit Instruction Set

[Figure 1-1 diagram labels: C28x; FPU; memory bus; existing memory, peripherals, interfaces; PIE; LVF; LUF; program address bus (22); program data bus (32); read address bus (32); read data bus (32); write address bus (32); write data bus (32)]

1.1 Overview

The C28x plus floating-point (C28x+FPU) processor extends the capabilities of the C28x fixed-point CPU by adding registers and instructions to support IEEE single-precision floating-point operations. This device draws from the best features of digital signal processing; reduced instruction set computing (RISC); and microcontroller architectures, firmware, and tool sets. The DSC features include a modified Harvard architecture and circular addressing. The RISC features are single-cycle instruction execution, register-to-register operations, and modified Harvard architecture. The microcontroller features include ease of use through an intuitive instruction set, byte packing and unpacking, and bit manipulation. The modified Harvard architecture of the CPU enables instruction and data fetches to be performed in parallel. The CPU can read instructions and data while it writes data simultaneously to maintain the single-cycle instruction operation across the pipeline. The CPU does this over six separate address/data buses.

Throughout this document the following notations are used:

• C28x refers to the C28x fixed-point CPU.

• C28x plus Floating-Point and C28x+FPU both refer to the C28x CPU with enhancements to support IEEE single-precision floating-point operations.

1.1.1 Compatibility with the C28x Fixed-Point CPU

No changes have been made to the C28x base set of instructions, pipeline, or memory bus architecture. Therefore, programs written for the C28x CPU are completely compatible with the C28x+FPU, and all of the features of the C28x documented in the TMS320C28x DSP CPU and Instruction Set Reference Guide (literature number SPRU430) apply to the C28x+FPU.

Figure 1-1 shows basic functions of the FPU.

Figure 1-1. FPU Functional Block Diagram

1.1.1.1 Floating-Point Code Development

When developing C28x floating-point code, use Code Composer Studio 3.3 or later, with at least service release 8. The C28x compiler V5.0 or later is also required to generate C28x native floating-point opcodes. This compiler is available via the Code Composer Studio update advisor as a separate download. V5.0 can generate both fixed-point and floating-point code. To build floating-point code, use the compiler switches -v28 and --float_support=fpu32. In Code Composer Studio 3.3, the float_support option is in the build options under compiler -> advanced: floating point support. Without the float_support flag, or with float_support=none, the compiler generates fixed-point code.
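As a minimal sketch, source code can check at compile time which of the two modes a build uses. This assumes that the TI C28x compiler predefines the macro __TMS320C28XX_FPU32__ when --float_support=fpu32 is in effect, as listed among the predefined macros in the compiler user's guide; confirm this for your compiler version. The function name built_for_fpu32 is hypothetical.

```c
/* Returns 1 when the file was compiled for the C28x+FPU, otherwise 0.
 * Assumption: the compiler predefines __TMS320C28XX_FPU32__ for
 * --float_support=fpu32 builds. On any other compiler the macro is
 * undefined and the function reports a non-FPU build. */
static int built_for_fpu32(void)
{
#if defined(__TMS320C28XX_FPU32__)
    return 1;
#else
    return 0;
#endif
}
```

A check of this kind can guard code paths that rely on native floating-point opcodes, so that a fixed-point build of the same source still compiles.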

When building for C28x floating-point, make sure all associated libraries have also been built for floating-point. The standard run-time support (RTS) libraries built for floating-point that are included with the compiler have fpu32 in their name. For example, rts2800_fpu32.lib and rts2800_fpu32_eh.lib have been built for the floating-point unit. The "eh" version has exception handling for C++ code. Using the fixed-point RTS libraries in a floating-point project results in the linker issuing an error for incompatible object files.

To improve performance of native floating-point projects, consider using the C28x FPU Fast RTS Library (SPRC664). This library contains hand-coded optimized math routines such as division, square root, atan2, sin, and cos. Link this library into your project before the standard run-time support library so that its routines take precedence. As an example, the standard RTS library uses a polynomial expansion to calculate the sin function. The Fast RTS library, however, uses a math look-up table in the boot ROM of the device. Using this look-up table method saves approximately 20 cycles compared with the standard RTS calculation.
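The difference between the two approaches can be illustrated with a small portable sketch. It is not the implementation used by either TI library: the table size, the table layout, the degree of the polynomial, and all names (sin_poly, sin_table_init, sin_lookup) are assumptions chosen for illustration, and the table is filled at run time here, whereas a ROM table would be precomputed.

```c
#define SIN_TABLE_SIZE 64              /* hypothetical number of intervals on [0, pi/2] */
#define HALF_PI        1.57079632679f

static float sin_table[SIN_TABLE_SIZE + 1];

/* Polynomial approach: degree-9 Taylor series for sin(x) in nested form. */
static float sin_poly(float x)
{
    float x2 = x * x;
    return x * (1.0f - x2 / 6.0f * (1.0f - x2 / 20.0f *
               (1.0f - x2 / 42.0f * (1.0f - x2 / 72.0f))));
}

/* Fill the table once with samples of sin(x) on [0, pi/2]. */
static void sin_table_init(void)
{
    int i;
    for (i = 0; i <= SIN_TABLE_SIZE; i++)
        sin_table[i] = sin_poly(HALF_PI * (float)i / (float)SIN_TABLE_SIZE);
}

/* Table approach: one lookup plus linear interpolation between the two
 * neighboring entries. Valid for 0 <= x <= pi/2 after sin_table_init(). */
static float sin_lookup(float x)
{
    float pos = x * (float)SIN_TABLE_SIZE / HALF_PI;
    int i = (int)pos;
    float frac;

    if (i >= SIN_TABLE_SIZE)
        i = SIN_TABLE_SIZE - 1;        /* keep i + 1 inside the table */
    frac = pos - (float)i;
    return sin_table[i] + frac * (sin_table[i + 1] - sin_table[i]);
}
```

The table version replaces most of the multiplications of the polynomial with a memory read and one interpolation step, which is the kind of trade-off the paragraph above describes. Its accuracy depends on the table size.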

1.2 Components of the C28x plus Floating-Point CPU

The C28x+FPU contains:

• A central processing unit for generating data and program-memory addresses; decoding and executing instructions; performing arithmetic, logical, and shift operations; and controlling data transfers among CPU registers, data memory, and program memory.

• A floating-point unit for IEEE single-precision floating-point operations.

• Emulation logic for monitoring and controlling various parts and functions of the device and for testing device operation. This logic is identical to that on the C28x fixed-point CPU.

• Signals for interfacing with memory and peripherals, clocking and controlling the CPU and the emulation logic, showing the status of the CPU and the emulation logic, and using interrupts. This logic is identical to that on the C28x fixed-point CPU.

Some features of the C28x+FPU central processing unit are:

• Fixed-point instructions are pipeline protected. This pipeline for fixed-point instructions is identical to that on the C28x fixed-point CPU. The CPU implements an 8-phase pipeline that prevents a write to and a read from the same location from occurring out of order. See Figure 1-5.

• Some floating-point instructions require pipeline alignment. This alignment is done through software to allow the user to improve performance by taking advantage of required delay slots.

• Independent register space. These registers function as system-control registers, math registers, and data pointers. The system-control registers are accessed by special instructions.

• Arithmetic logic unit (ALU). The 32-bit ALU performs 2s-complement arithmetic and Boolean logic operations.

• Floating point unit (FPU). The 32-bit FPU performs IEEE single-precision floating-point operations.

• Address register arithmetic unit (ARAU). The ARAU generates data memory addresses and increments or decrements pointers in parallel with ALU operations.

• Barrel shifter. This shifter performs all left and right shifts of fixed-point data. It can shift data to the left by up to 16 bits and to the right by up to 16 bits.

• Fixed-point multiplier. The multiplier performs 32-bit × 32-bit 2s-complement multiplication with a 64-bit result. The multiplication can be performed with two signed numbers, two unsigned numbers, or one signed number and one unsigned number.
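The three operand combinations listed for the fixed-point multiplier can be modeled in portable C as a sketch. This only shows the arithmetic (a 32-bit by 32-bit product kept in 64 bits); it does not show how the multiplier is invoked on the device, and the function names are hypothetical.

```c
#include <stdint.h>

/* Each helper widens its operands to 64 bits before multiplying, so the
 * full 64-bit product is kept instead of being truncated to 32 bits. */

/* signed x signed */
static int64_t mul_signed(int32_t a, int32_t b)
{
    return (int64_t)a * (int64_t)b;
}

/* unsigned x unsigned */
static uint64_t mul_unsigned(uint32_t a, uint32_t b)
{
    return (uint64_t)a * (uint64_t)b;
}

/* signed x unsigned: the unsigned operand always fits in int64_t */
static int64_t mul_mixed(int32_t a, uint32_t b)
{
    return (int64_t)a * (int64_t)b;
}
```

For example, mul_unsigned(0xFFFFFFFF, 0xFFFFFFFF) needs all 64 result bits, while the same bit patterns interpreted as signed values (-1 and -1) give a product of 1.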

1.2.1 Emulation Logic

The emulation logic is identical to that on the C28x fixed-point CPU. This logic includes the following features:

• Debug-and-test direct memory access (DT-DMA). A debug host can gain direct access to the content of registers and memory by taking control of the memory interface during unused cycles of the instruction pipeline.

• A counter for performance benchmarking.

• Multiple debug events. Any of the following debug events can cause a break in program execution:

– A breakpoint initiated by the ESTOP0 or ESTOP1 instruction.

– An access to a specified program-space or data-space location.

When a debug event causes the C28x to enter the debug-halt state, the event is called a break event.

• Real-time mode of operation.

For more details about these features, refer to the TMS320C28x DSP CPU and Instruction Set Reference Guide (literature number SPRU430).

1.2.2 Memory Map

Like the C28x, the C28x+FPU uses 32-bit data addresses and 22-bit program addresses. This allows for a total address reach of 4G words (1 word = 16 bits) in data space and 4M words in program space. Memory blocks on all C28x+FPU designs are uniformly mapped to both program and data space. For specific details about each of the map segments, see the data sheet for your device.
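The address-reach figures follow directly from the address widths: an n-bit address selects one of 2^n words. A short sketch of the arithmetic, with a hypothetical helper name:

```c
#include <stdint.h>

/* Number of addressable 16-bit words for a given address width.
 * Assumes address_bits is less than 64. */
static uint64_t words_reachable(unsigned address_bits)
{
    return (uint64_t)1 << address_bits;
}
```

With these widths, words_reachable(22) is 4 x 1024 x 1024 words (4M) for program space and words_reachable(32) is 4 x 1024 x 1024 x 1024 words (4G) for data space.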

1.2.3 On-Chip Program and Data

All C28x+FPU based devices contain at least two blocks of single access on-chip memory referred to as M0 and M1. Each of these blocks is 1K words in size. M0 is mapped at addresses 0x0000 to 0x03FF and M1 is mapped at addresses 0x0400 to 0x07FF. Like all other memory blocks on the C28x+FPU devices, M0 and M1 are mapped to both program and data space. Therefore, you can use M0 and M1 to execute code or for data variables. At reset, the stack pointer is set to the top of block M1. Depending on the device, it may also have additional random-access memory (RAM), read-only memory (ROM), external interface zones, or flash memory.
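The M0 and M1 ranges given above can be written down as constants, for example to check which block an address falls in. This is an illustrative sketch only; the macro, type, and function names are hypothetical and are not taken from any TI header file.

```c
#include <stdint.h>

#define M0_START 0x0000u
#define M0_END   0x03FFu
#define M1_START 0x0400u
#define M1_END   0x07FFu

typedef enum { BLOCK_M0, BLOCK_M1, BLOCK_OTHER } mem_block_t;

/* Classify a word address against the M0 and M1 ranges. */
static mem_block_t classify_address(uint32_t addr)
{
    if (addr <= M0_END)
        return BLOCK_M0;
    if (addr <= M1_END)        /* addr >= M1_START here, since M1 follows M0 */
        return BLOCK_M1;
    return BLOCK_OTHER;
}

/* Size of an inclusive address range in 16-bit words. */
static uint32_t block_size_words(uint32_t start, uint32_t end)
{
    return end - start + 1u;
}
```

Both ranges span 0x400 addresses, which matches the stated block size of 1K words.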

1.2.4 CPU Interrupt Vectors

The C28x+FPU interrupt vectors are identical to those on the C28x CPU. Sixty-four addresses in program space are set aside for a table of 32 CPU interrupt vectors. The CPU vectors can be mapped to the top or bottom of program space by way of the VMAP bit. For more information about the CPU vectors, see the TMS320C28x DSP CPU and Instruction Set Reference Guide (literature number SPRU430). For devices with a peripheral interrupt expansion (PIE) block, the interrupt vectors reside in the PIE vector table, and the memory otherwise set aside for the CPU vector table can be used as program memory.

1.2.5 Memory Interface

The C28x+FPU memory interface is identical to that on the C28x. The C28x+FPU memory map is

accessible outside the CPU by the memory interface, which connects the CPU logic to memories,

peripherals, or other interfaces. The memory interface includes separate buses for program space and

data space. This means an instruction can be fetched from program memory while data memory is being

accessed. The interface also includes signals that indicate the type of read or write being requested by the

CPU. These signals can select a specified memory block or peripheral for a given bus transaction. In

addition to 16-bit and 32-bit accesses, the C28x+FPU supports special byte-access instructions that can

access the least significant byte (LSByte) or most significant byte (MSByte) of an addressed word. Strobe

signals indicate when such an access is occurring on a data bus.

1.2.5.1 Address and Data Buses

Like the C28x, the memory interface has three address buses:

• PAB: Program address bus

The PAB carries addresses for reads and writes from program space. PAB is a 22-bit bus.

• DRAB: Data-read address bus

The 32-bit DRAB carries addresses for reads from data space.

• DWAB: Data-write address bus

The 32-bit DWAB carries addresses for writes to data space.

The memory interface also has three data buses:

• PRDB: Program-read data bus

The PRDB carries instructions during reads from program space. PRDB is a 32-bit bus.

• DRDB: Data-read data bus

The DRDB carries data during reads from data space. DRDB is a 32-bit bus.

• DWDB: Data-/Program-write data bus

The 32-bit DWDB carries data during writes to data space or program space.

A program-space read and a program-space write cannot happen simultaneously because both use the

PAB. Similarly, a program-space write and a data-space write cannot happen simultaneously because

both use the DWDB. Transactions that use different buses can happen simultaneously. For example, the

CPU can read from program space (using PAB and PRDB), read from data space (using DRAB and

DRDB), and write to data space (using DWAB and DWDB) at the same time. This behavior is identical to

the C28x CPU.
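
The bus-sharing rules above can be sketched as a small C model. This is only an illustration: the enum names and the `can_run_together` helper are hypothetical and are not part of any TI header. Each transaction type is represented by the set of buses that the text says it occupies, and two transactions are treated as compatible when those sets do not overlap.

```c
#include <stdbool.h>

/* One bit per bus of the memory interface (names are illustrative). */
enum {
    BUS_PAB  = 1 << 0,  /* program address bus           */
    BUS_DRAB = 1 << 1,  /* data-read address bus         */
    BUS_DWAB = 1 << 2,  /* data-write address bus        */
    BUS_PRDB = 1 << 3,  /* program-read data bus         */
    BUS_DRDB = 1 << 4,  /* data-read data bus            */
    BUS_DWDB = 1 << 5   /* data-/program-write data bus  */
};

/* Buses occupied by each transaction type, as listed in the text. */
enum {
    XACT_PROGRAM_READ  = BUS_PAB  | BUS_PRDB,
    XACT_PROGRAM_WRITE = BUS_PAB  | BUS_DWDB,
    XACT_DATA_READ     = BUS_DRAB | BUS_DRDB,
    XACT_DATA_WRITE    = BUS_DWAB | BUS_DWDB
};

/* Two transactions can happen at the same time only if they share no bus. */
static bool can_run_together(unsigned a, unsigned b)
{
    return (a & b) == 0;
}
```

With this model, `can_run_together(XACT_PROGRAM_READ, XACT_PROGRAM_WRITE)` is false because both use the PAB, while a program read, a data read, and a data write are pairwise compatible.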

1.2.5.2 Alignment of 32-Bit Accesses to Even Addresses

The C28x+FPU CPU expects memory wrappers or peripheral-interface logic to align any 32-bit read or

write to an even address. If the address-generation logic generates an odd address, the CPU will begin

reading or writing at the previous even address. This alignment does not affect the address values

generated by the address-generation logic.

Most instruction fetches from program space are performed as 32-bit read operations and are aligned

accordingly. However, alignment of instruction fetches is effectively invisible to a programmer. When

instructions are stored to program space, they do not have to be aligned to even addresses. Instruction

boundaries are decoded within the CPU.

You need to be concerned with alignment when using instructions that perform 32-bit reads from or writes

to data space.
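
As a minimal sketch of the alignment rule, the following C function models where a 32-bit access begins for a given generated address. The function name is hypothetical, and the code only models the address arithmetic described above, not the hardware itself.

```c
#include <stdint.h>

/* Address at which a 32-bit read or write actually begins.
 * An odd generated address falls back to the previous even address;
 * an even address is used unchanged. The value produced by the
 * address-generation logic itself is not modified. */
static uint32_t aligned_32bit_address(uint32_t generated_addr)
{
    return generated_addr & ~(uint32_t)1;
}
```

For example, a 32-bit access with a generated address of 0x0401 starts at 0x0400, the same location as an access generated at 0x0400.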

1.3 CPU Register Set

The C28x+FPU architecture is the same as the C28x CPU with an extended register and instruction set to

support IEEE single-precision floating point operations. This section describes the extensions to the C28x

architecture.

1.3.1 CPU Registers

Devices with the C28x+FPU include the standard C28x register set plus an additional set of floating-point

unit registers. The additional floating-point unit registers are the following:

• Eight floating-point result registers, RnH (where n = 0 - 7)

• Floating-point Status Register (STF)

• Repeat Block Register (RB)

All of the floating-point registers except the repeat block register are shadowed. This shadowing can be

used in high priority interrupts for fast context save and restore of the floating-point registers.

Figure 1-2 shows a diagram of both register sets and Table 1-1 shows a register summary. For

information on the standard C28x register set, see the TMS320C28x DSP CPU and Instruction Set

Reference Guide (literature number SPRU430).

Figure 1-2. C28x With Floating-Point Registers

[Figure: the standard C28x register set (ACC, P, XT, XAR0-XAR7, PC, RPC, DP, SP, ST0, ST1, IER, IFR, DBGIER) shown beside the additional 32-bit FPU registers (R0H-R7H, STF, RB). FPU registers R0H-R7H and STF are shadowed for fast context save and restore.]

Table 1-1. 28x Plus Floating-Point CPU Register Summary

Register  C28x CPU  C28x+FPU  Size     Description                      Value After Reset
ACC       Yes       Yes       32 bits  Accumulator                      0x00000000
AH        Yes       Yes       16 bits  High half of ACC                 0x0000
AL        Yes       Yes       16 bits  Low half of ACC                  0x0000
XAR0      Yes       Yes       32 bits  Auxiliary register 0             0x00000000
XAR1      Yes       Yes       32 bits  Auxiliary register 1             0x00000000
XAR2      Yes       Yes       32 bits  Auxiliary register 2             0x00000000
XAR3      Yes       Yes       32 bits  Auxiliary register 3             0x00000000
XAR4      Yes       Yes       32 bits  Auxiliary register 4             0x00000000
XAR5      Yes       Yes       32 bits  Auxiliary register 5             0x00000000
XAR6      Yes       Yes       32 bits  Auxiliary register 6             0x00000000
XAR7      Yes       Yes       32 bits  Auxiliary register 7             0x00000000
AR0       Yes       Yes       16 bits  Low half of XAR0                 0x0000
AR1       Yes       Yes       16 bits  Low half of XAR1                 0x0000
AR2       Yes       Yes       16 bits  Low half of XAR2                 0x0000
AR3       Yes       Yes       16 bits  Low half of XAR3                 0x0000
AR4       Yes       Yes       16 bits  Low half of XAR4                 0x0000
AR5       Yes       Yes       16 bits  Low half of XAR5                 0x0000
AR6       Yes       Yes       16 bits  Low half of XAR6                 0x0000
AR7       Yes       Yes       16 bits  Low half of XAR7                 0x0000
DP        Yes       Yes       16 bits  Data-page pointer                0x0000
IFR       Yes       Yes       16 bits  Interrupt flag register          0x0000
IER       Yes       Yes       16 bits  Interrupt enable register        0x0000
DBGIER    Yes       Yes       16 bits  Debug interrupt enable register  0x0000
P         Yes       Yes       32 bits  Product register                 0x00000000
PH        Yes       Yes       16 bits  High half of P                   0x0000
PL        Yes       Yes       16 bits  Low half of P                    0x0000
PC        Yes       Yes       22 bits  Program counter                  0x3FFFC0
RPC       Yes       Yes       22 bits  Return program counter           0x00000000
SP        Yes       Yes       16 bits  Stack pointer                    0x0400
ST0       Yes       Yes       16 bits  Status register 0                0x0000
ST1       Yes       Yes       16 bits  Status register 1                0x080B (1)
XT        Yes       Yes       32 bits  Multiplicand register            0x00000000
T         Yes       Yes       16 bits  High half of XT                  0x0000
TL        Yes       Yes       16 bits  Low half of XT                   0x0000
R0H       No        Yes       32 bits  Floating-point result register 0 0.0
R1H       No        Yes       32 bits  Floating-point result register 1 0.0
R2H       No        Yes       32 bits  Floating-point result register 2 0.0
R3H       No        Yes       32 bits  Floating-point result register 3 0.0
R4H       No        Yes       32 bits  Floating-point result register 4 0.0
R5H       No        Yes       32 bits  Floating-point result register 5 0.0
R6H       No        Yes       32 bits  Floating-point result register 6 0.0
R7H       No        Yes       32 bits  Floating-point result register 7 0.0
STF       No        Yes       32 bits  Floating-point status register   0x00000000
RB        No        Yes       32 bits  Repeat block register            0x00000000

(1) Reset value shown is for devices without the VMAP signal and M0M1MAP signal pinned out. On these devices both of these signals are

tied high internally.

1.3.1.1 Floating-Point Status Register (STF)

The floating-point status register (STF) reflects the results of floating-point operations. There are three

basic rules for floating point operation flags:

1. Zero and negative flags are set based on moves to registers.

2. Zero and negative flags are set based on the result of compare, minimum, maximum, negative and

absolute value operations.

3. Overflow and underflow flags are set by math instructions such as multiply, add, subtract and 1/x.

These flags may also be connected to the peripheral interrupt expansion (PIE) block on your device.

This can be useful for debugging underflow and overflow conditions within an application.

As on the C28x, program flow is controlled by C28x instructions that read status flags in status register

0 (ST0). If a decision needs to be made based on a floating-point operation, the information in the STF

register must first be copied into ST0 so that the appropriate conditional branch instruction can be executed.

The MOVST0 FLAG instruction loads the current value of the specified STF flags into the respective bits of ST0.

When this instruction executes, it also clears the latched overflow and underflow flags if those flags are specified.

Example 1-1. Moving STF Flags to the ST0 Register

Loop:

MOV32 R0H,*XAR4++

MOV32 R1H,*XAR3++

CMPF32 R1H, R0H

MOVST0 ZF, NF ; Move ZF and NF to ST0

BF Loop, GT ; Loop if (R1H > R0H)

Figure 1-3. Floating-point Unit Status Register (STF)

Bits    31      30-16
Field   SHDWS   Reserved
Access  R/W-0   R-0

Bits    15-10     9      8-7       6      5      4      3      2      1      0
Field   Reserved  RND32  Reserved  TF     ZI     NI     ZF     NF     LUF    LVF
Access  R-0       R/W-0  R-0       R/W-0  R/W-0  R/W-0  R/W-0  R/W-0  R/W-0  R/W-0

LEGEND: R/W = Read/Write; R = Read only; -n= value after reset

Table 1-2. Floating-point Unit Status (STF) Register Field Descriptions

Bits Field Value Description

31 SHDWS Shadow Mode Status Bit

0 This bit is forced to 0 by the RESTORE instruction.

1 This bit is set to 1 by the SAVE instruction.

This bit is not affected by loading the status register either from memory or from the shadow values.

30 - 10 Reserved 0 Reserved for future use

9 RND32 Round 32-bit Floating-Point Mode

0 If this bit is zero, the MPYF32, ADDF32 and SUBF32 instructions will round to zero (truncate).

1 If this bit is one, the MPYF32, ADDF32 and SUBF32 instructions will round to the nearest even value.

8 - 7 Reserved 0 Reserved for future use

6 TF Test Flag

The TESTTF instruction can modify this flag based on the condition tested. The SETFLG and SAVE

instructions can also be used to modify this flag.

0 The condition tested with the TESTTF instruction is false.

1 The condition tested with the TESTTF instruction is true.

5 ZI Zero Integer Flag

The following instructions modify this flag based on the integer value stored in the destination register:

MOV32, MOVD32, MOVDD32

The SETFLG and SAVE instructions can also be used to modify this flag.

0 The integer value is not zero.

1 The integer value is zero.

4 NI Negative Integer Flag

The following instructions modify this flag based on the integer value stored in the destination register:

MOV32, MOVD32, MOVDD32

The SETFLG and SAVE instructions can also be used to modify this flag.

0 The integer value is not negative.

1 The integer value is negative.

3 ZF Zero Floating-Point Flag (1) (2)

The following instructions modify this flag based on the floating-point value stored in the destination register:

MOV32, MOVD32, MOVDD32, ABSF32, NEGF32

The CMPF32, MAXF32, and MINF32 instructions modify this flag based on the result of the operation.

The SETFLG and SAVE instructions can also be used to modify this flag.

0 The floating-point value is not zero.

1 The floating-point value is zero.

2 NF Negative Floating-Point Flag (1) (2)

The following instructions modify this flag based on the floating-point value stored in the destination register:

MOV32, MOVD32, MOVDD32, ABSF32, NEGF32

The CMPF32, MAXF32, and MINF32 instructions modify this flag based on the result of the operation.

The SETFLG and SAVE instructions can also be used to modify this flag.

0 The floating-point value is not negative.

1 The floating-point value is negative.

1 LUF Latched Underflow Floating-Point Flag

The following instructions will set this flag to 1 if an underflow occurs:

MPYF32, ADDF32, SUBF32, MACF32, EINVF32, EISQRTF32

0 An underflow condition has not been latched. If the MOVST0 instruction is used to copy this bit to ST0,

then LUF will be cleared.

1 An underflow condition has been latched.

0 LVF Latched Overflow Floating-Point Flag

The following instructions will set this flag to 1 if an overflow occurs:

MPYF32, ADDF32, SUBF32, MACF32, EINVF32, EISQRTF32

0 An overflow condition has not been latched. If the MOVST0 instruction is used to copy this bit to ST0,

then LVF will be cleared.

1 An overflow condition has been latched.

(1) A negative zero floating-point value is treated as a positive zero value when configuring the ZF and NF flags.

(2) A DeNorm floating-point value is treated as a positive zero value when configuring the ZF and NF flags.
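
The bit positions in Table 1-2 can be expressed as C masks for decoding a 32-bit STF value, for example one captured in a debugger. This is a sketch only: the macro and function names are hypothetical and do not come from a TI header file.

```c
#include <stdbool.h>
#include <stdint.h>

/* STF field masks, taken from the bit positions in Table 1-2. */
#define STF_SHDWS (UINT32_C(1) << 31) /* shadow mode status       */
#define STF_RND32 (UINT32_C(1) << 9)  /* rounding mode            */
#define STF_TF    (UINT32_C(1) << 6)  /* test flag                */
#define STF_ZI    (UINT32_C(1) << 5)  /* zero integer flag        */
#define STF_NI    (UINT32_C(1) << 4)  /* negative integer flag    */
#define STF_ZF    (UINT32_C(1) << 3)  /* zero floating-point flag */
#define STF_NF    (UINT32_C(1) << 2)  /* negative floating-point  */
#define STF_LUF   (UINT32_C(1) << 1)  /* latched underflow        */
#define STF_LVF   (UINT32_C(1) << 0)  /* latched overflow         */

/* True when every bit in 'mask' is set in the STF value. */
static bool stf_flag_set(uint32_t stf, uint32_t mask)
{
    return (stf & mask) == mask;
}

/* Rounding behavior of MPYF32, ADDF32 and SUBF32 selected by RND32. */
static const char *stf_rounding_mode(uint32_t stf)
{
    return (stf & STF_RND32) ? "nearest even" : "truncate";
}
```

For instance, an STF value of 0x0000000C has both ZF and NF set and RND32 clear, so the math instructions listed above truncate.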

1.3.1.2 Repeat Block Register (RB)

The repeat block instruction (RPTB) is a new instruction for C28x+FPU. This instruction allows you to

repeat a block of code as shown in Example 1-2.

Example 1-2. The Repeat Block (RPTB) Instruction uses the RB Register

; find the largest element and put its address in XAR6

MOV32 R0H, *XAR0++;

.align 2 ; Aligns the next instruction to an even address

NOP ; Makes RPTB odd aligned - required for a block size of 8

RPTB VECTOR_MAX_END, AR7 ; RA is set to 1

MOVL ACC,XAR0

MOV32 R1H,*XAR0++ ; RSIZE reflects the size of the RPTB block

MAXF32 R0H,R1H ; in this case the block size is 8

MOVST0 NF,ZF

MOVL XAR6,ACC,LT

VECTOR_MAX_END: ; RE indicates the end address. RA is cleared

The C28x+FPU hardware automatically populates the RB register based on the execution of an RPTB

instruction. This register is not normally read by the application and does not accept debugger writes.

Figure 1-4. Repeat Block Register (RB)

Bits    31    30    29-23   22-16
Field   RAS   RA    RSIZE   RE
Access  R-0   R-0   R-0     R-0

Bits    15-0
Field   RC
Access  R-0

LEGEND: R = Read only; -n= value after reset

Table 1-3. Repeat Block (RB) Register Field Descriptions

Bits Field Value Description

31 RAS Repeat Block Active Shadow Bit

When an interrupt occurs the repeat active, RA, bit is copied to the RAS bit and the RA bit is cleared.

When an interrupt return instruction occurs, the RAS bit is copied to the RA bit and RAS is cleared.

0 A repeat block was not active when the interrupt was taken.

1 A repeat block was active when the interrupt was taken.

30 RA Repeat Block Active Bit

0 This bit is cleared when the repeat counter, RC, reaches zero.

When an interrupt occurs the RA bit is copied to the repeat active shadow, RAS, bit and RA is cleared.

When an interrupt return, IRET, instruction is executed, the RAS bit is copied to the RA bit and RAS is

cleared.

1 This bit is set when the RPTB instruction is executed to indicate that a RPTB is currently active.

29-23 RSIZE Repeat Block Size

This 7-bit value specifies the number of 16-bit words within the repeat block. This field is initialized

when the RPTB instruction is executed. The value is calculated by the assembler and inserted into the

RPTB instruction's RSIZE opcode field.

0-7 Illegal block size.

8/9-0x7F A RPTB block that starts at an even address must include at least 9 16-bit words and a block that

starts at an odd address must include at least 8 16-bit words. The maximum block size is 127 16-bit

words. The codegen assembler will check for proper block size and alignment.

22-16 RE Repeat Block End Address

This 7-bit value specifies the end address location of the repeat block. The RE value is calculated by

hardware based on the RSIZE field and the PC value when the RPTB instruction is executed.

RE = lower 7 bits of (PC + 1 + RSIZE)

15-0 RC Repeat Count

0 The block will not be repeated; it will be executed only once. In this case the repeat active, RA, bit will

not be set.

1-0xFFFF This 16-bit value determines how many times the block will repeat. The counter is initialized when the

RPTB instruction is executed and is decremented when the PC reaches the end of the block. When

the counter reaches zero, the repeat active bit is cleared and the block will be executed one more

time. Therefore the total number of times the block is executed is RC+1.
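
The RB field rules in Table 1-3 reduce to a few lines of arithmetic, sketched below in C. The function names are hypothetical. The `pc` argument is assumed to be the PC value at the time the RPTB instruction is executed, and `block_start_addr` is assumed to be the address of the first instruction inside the repeat block.

```c
#include <stdbool.h>
#include <stdint.h>

/* RE = lower 7 bits of (PC + 1 + RSIZE), per Table 1-3. */
static uint8_t rb_end_address(uint32_t pc, uint8_t rsize)
{
    return (uint8_t)((pc + 1u + rsize) & 0x7Fu);
}

/* A block starting at an even address needs at least 9 words,
 * a block starting at an odd address needs at least 8 words,
 * and no block may exceed 127 (0x7F) words. */
static bool rb_size_is_legal(uint32_t block_start_addr, uint8_t rsize)
{
    uint8_t minimum = (block_start_addr & 1u) ? 8u : 9u;
    return rsize >= minimum && rsize <= 0x7Fu;
}

/* The block body runs RC + 1 times in total. */
static uint32_t rb_total_executions(uint16_t rc)
{
    return (uint32_t)rc + 1u;
}
```

Under these assumptions, an 8-word block is legal only at an odd start address, which is why Example 1-2 inserts a NOP after the `.align 2` directive.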

1.4 Pipeline

The pipeline flow for C28x instructions is identical to that of the C28x CPU described in TMS320C28x

DSP CPU and Instruction Set Reference Guide (SPRU430). Some floating-point instructions, however,

use additional execution phases and thus require a delay to allow the operation to complete. This pipeline

alignment is achieved by inserting NOPs or non-conflicting instructions when required. Software control of

delay slots allows you to improve performance of an application by taking advantage of the delay slots and

filling them with non-conflicting instructions. This section describes the key characteristics of the pipeline

with regard to floating-point instructions. The rules for avoiding pipeline conflicts are few and

simple to follow, and the C28x+FPU assembler will help you by issuing errors for conflicts.

1.4.1 Pipeline Overview

The C28x FPU pipeline is identical to the C28x pipeline for all standard C28x instructions. In the decode2

stage (D2), it is determined if an instruction is a C28x instruction or a floating-point unit instruction. The

pipeline flow is shown in Figure 1-5. Notice that stalls due to normal C28x pipeline stalls (D2) and memory

waitstates (R2 and W) will also stall any C28x FPU instruction. Most C28x FPU instructions are single

cycle and will complete in the FPU E1 or W stage which aligns to the C28x pipeline. Some instructions will

take an additional execute cycle (E2). For these instructions you must wait a cycle for the result from the

instruction to be available. The rest of this section will describe when delay cycles are required. Keep in

mind that the assembly tools for the C28x+FPU will issue an error if a delay slot has not been handled

correctly.

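Figure 1-5. FPU Pipeline

[Figure: the C28x pipeline stages (fetch F1-F2, decode D1-D2, read R1-R2, execute E, write W) shown alongside the FPU instruction stages (D, R, E1, E2, W), with rows for load, store, CMP/MIN/MAX/NEG/ABS, and MPY/ADD/SUB/MACF32 instructions.]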

1.4.2 General Guidelines for Floating-Point Pipeline Alignment

While the C28x+FPU assembler will issue errors for pipeline conflicts, you may still find it useful to

understand when software delays are required. This section describes three guidelines you can follow

when writing C28x+FPU assembly code.

Floating-point instructions that require delay slots have a 'p' after their cycle count. For example '2p'

stands for 2 pipelined cycles. This means that an instruction can be started every cycle, but the result of

the instruction will only be valid one instruction later.

There are three general guidelines to determine if an instruction needs a delay slot:

1. Floating-point math operations (multiply, addition, subtraction, 1/x and MAC) require 1 delay slot.

2. Conversion instructions between integer and floating-point formats require 1 delay slot.

3. Everything else does not require a delay slot. This includes minimum, maximum, compare, load, store,

negative and absolute value instructions.

There are two exceptions to these rules. First, moves between the CPU and FPU registers require special

pipeline alignment that is described later in this section. These operations are typically infrequent. Second,

the MACF32 R7H, R3H, mem32, *XAR7 instruction has special requirements that make it easier to use.

Refer to the MACF32 instruction description for details.
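
The three general guidelines can be summarized as a lookup, sketched here in C. The enum and function names are hypothetical, and the sketch deliberately ignores the two exceptions just described (CPU/FPU register moves and the special MACF32 form).

```c
/* Instruction classes used by the three general guidelines. */
enum fpu_op_class {
    FPU_OP_MATH,     /* multiply, addition, subtraction, 1/x, MAC      */
    FPU_OP_CONVERT,  /* integer <-> floating-point conversion          */
    FPU_OP_OTHER     /* min, max, compare, load, store, negative, abs  */
};

/* Number of delay slots that must follow an instruction of this class. */
static int fpu_delay_slots(enum fpu_op_class op)
{
    switch (op) {
    case FPU_OP_MATH:
    case FPU_OP_CONVERT:
        return 1;   /* 2p instructions: result valid one instruction later */
    case FPU_OP_OTHER:
    default:
        return 0;   /* single-cycle instructions need no delay slot */
    }
}
```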

An example of the 32-bit ADDF32 instruction is shown in Example 1-3. ADDF32 is a 2p instruction and

therefore requires one delay slot. The destination register for the operation, R0H, will be updated one

cycle after the instruction for a total of 2 cycles. Therefore, a NOP or instruction that does not use R0H

must follow this instruction.

Any memory stall or pipeline stall will also stall the floating-point unit. This keeps the floating-point unit

aligned with the C28x pipeline and there is no need to change the code based on the waitstates of a

memory block.

Please note that on certain devices instructions may take additional cycles to complete under specific

conditions. These exceptions will be documented in the device errata.

Example 1-3. 2p Instruction Pipeline Alignment

ADDF32 R0H, #1.5, R1H ; 2 pipeline cycles (2p)

NOP ; 1 cycle delay or non-conflicting instruction

; <-- ADDF32 completes, R0H updated

NOP ; Any instruction

1.4.3 Moves from FPU Registers to C28x Registers

When transferring from the floating-point unit registers to the C28x CPU registers, additional pipeline

alignment is required as shown in Example 1-4 and Example 1-5.

Example 1-4. Floating-Point to C28x Register Software Pipeline Alignment

; MINF32: 32-bit floating-point minimum: single-cycle operation

; An alignment cycle is required before copying R0H to ACC

MINF32 R0H, R1H ; Single-cycle instruction

; <-- R0H is valid

NOP ; Alignment cycle

MOV32 @ACC, R0H ; Copy R0H to ACC

For 1-cycle FPU instructions, one delay slot is required between a write to the floating-point register and

the transfer instruction as shown in Example 1-4. For 2p FPU instructions, two delay slots are required

between a write to the floating-point register and the transfer instruction as shown in Example 1-5.

Example 1-5. Floating-Point to C28x Register Software Pipeline Alignment

; ADDF32: 32-bit floating-point addition: 2p operation

; An alignment cycle is required before copying R0H to ACC

ADDF32 R0H, R1H, #2 ; R0H = R1H + 2, 2 pipeline cycle instruction

NOP ; 1 delay cycle or non-conflicting instruction

; <-- R0H is valid

NOP ; Alignment cycle

MOV32 @ACC, R0H ; Copy R0H to ACC

1.4.4 Moves from C28x Registers to FPU Registers

Transfers from the standard C28x CPU registers to the floating-point registers require four alignment

cycles. For the 2833x, 2834x, 2806x, 28M35xx and 28M26xx, the four alignment cycles can be filled with

NOPs or any non-conflicting instruction except for FRACF32, UI16TOF32, I16TOF32, F32TOUI32, and

F32TOI32. These instructions cannot replace any of the four alignment NOPs. On newer devices any non-

conflicting instruction can go into the four alignment cycles. Please refer to the device errata for specific

exceptions to these rules.

Example 1-6. C28x Register to Floating-Point Register Software Pipeline Alignment

; Four alignment cycles are required after copying a standard 28x CPU

; register to a floating-point register.

;

MOV32 R0H,@ACC ; Copy ACC to R0H

NOP

NOP

NOP

NOP ; Wait 4 cycles

ADDF32 R2H,R1H,R0H ; R0H is valid
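
The alignment requirements for register transfers in Sections 1.4.3 and 1.4.4 can be collected into two small C helpers. This is an illustrative sketch with hypothetical names. It does not model the device-specific restrictions on which instructions may fill the four cycles, or any errata exceptions.

```c
/* Latency class of the FPU instruction that last wrote the register. */
enum fpu_latency {
    FPU_SINGLE_CYCLE,  /* for example MINF32 */
    FPU_2P             /* for example ADDF32 */
};

/* Slots required between an FPU write to a floating-point register
 * and a move of that register to a C28x CPU register:
 * one after a single-cycle instruction, two after a 2p instruction. */
static int slots_before_fpu_to_cpu_move(enum fpu_latency producer)
{
    return (producer == FPU_2P) ? 2 : 1;
}

/* Alignment cycles required after moving a C28x CPU register
 * into a floating-point register. */
static int cycles_after_cpu_to_fpu_move(void)
{
    return 4;
}
```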

1.4.5 Parallel Instructions

Parallel instructions are single opcodes that perform two operations in parallel. This can be a math

operation in parallel with a move operation, or two math operations in parallel. Math operations with a

parallel move are referred to as 2p/1 instructions. The math portion of the operation takes two pipelined

cycles while the move portion of the operation is single cycle. This means that NOPs or other

non-conflicting instructions must be inserted to align the math portion of the operation. An example of an add

with parallel move instruction is shown in Example 1-7.

Example 1-7. 2p/1 Parallel Instruction Software Pipeline Alignment

; ADDF32 || MOV32 instruction: 32-bit floating-point add with parallel move

; ADDF32 is a 2p operation

; MOV32 is a 1 cycle operation

;

ADDF32 R0H, R1H, #2 ; R0H = R1H + 2, 2 pipeline cycle operation

|| MOV32 R1H, @Val ; R1H gets the contents of Val, single cycle operation

; <-- MOV32 completes here (R1H is valid)

NOP ; 1 cycle delay or non-conflicting instruction

; <-- ADDF32 completes here (R0H is valid)

NOP ; Any instruction

Parallel math instructions are referred to as 2p/2p instructions. Both math operations take 2 cycles to

complete. This means that NOPs or other non-conflicting instructions must be inserted to align both

math operations. An example of a multiply with parallel add instruction is shown in Example 1-8.

Example 1-8. 2p/2p Parallel Instruction Software Pipeline Alignment

; MPYF32 || ADDF32 instruction: 32-bit floating-point multiply with parallel add

; MPYF32 is a 2p operation

; ADDF32 is a 2p operation

;

MPYF32 R0H, R1H, R3H ; R0H = R1H * R3H, 2 pipeline cycle operation

|| ADDF32 R1H, R2H, R4H ; R1H = R2H + R4H, 2 pipeline cycle operation

NOP ; 1 cycle delay or non-conflicting instruction

; <-- MPYF32 and ADDF32 complete here (R0H and R1H are valid)

NOP ; Any instruction

1.4.6 Invalid Delay Instructions

Most instructions can be used in delay slots as long as source and destination register conflicts are

avoided. The C28x+FPU assembler will issue an error any time you use a conflicting instruction within a

delay slot. The following guidelines can be used to avoid these conflicts.

NOTE: Destination register conflicts in delay slots:

Any operation used for pipeline alignment delay must not use the same destination register

as the instruction requiring the delay. See Example 1-9.

In Example 1-9 the MPYF32 instruction uses R2H as its destination register. The next instruction should

not use R2H as its destination. Since the MOV32 instruction uses the R2H register a pipeline conflict will

be issued by the assembler. This conflict can be resolved by using a register other than R2H for the

MOV32 instruction as shown in Example 1-10.

Example 1-9. Destination Register Conflict

; Invalid delay instruction. Both instructions use the same destination register

MPYF32 R2H, R1H, R0H ; 2p instruction

MOV32 R2H, mem32 ; Invalid delay instruction

Example 1-10. Destination Register Conflict Resolved

; Valid delay instruction

MPYF32 R2H, R1H, R0H ; 2p instruction

MOV32 R1H, mem32 ; Valid delay

; <-- MPYF32 completes, R2H valid

NOTE: Instructions in delay slots cannot use the instruction's destination register as a source

Any operation used for pipeline alignment delay must not use the destination register of the

instruction requiring the delay as a source register as shown in Example 1-11. For parallel

instructions, the current value of a register can be used in the parallel operation before it is

overwritten as shown in Example 1-13.

In Example 1-11 the MPYF32 instruction again uses R2H as its destination register. The next instruction

should not use R2H as its source since the MPYF32 will take an additional cycle to complete. Since the

ADDF32 instruction uses the R2H register a pipeline conflict will be issued by the assembler. This conflict

can be resolved by using a register other than R2H or by inserting a non-conflicting instruction between

the MPYF32 and ADDF32 instructions. Since the SUBF32 does not use R2H this instruction can be

moved before the ADDF32 as shown in Example 1-12.

Example 1-11. Destination/Source Register Conflict

; Invalid delay instruction. ADDF32 should not use R2H as a source operand

MPYF32 R2H, R1H, R0H ; 2p instruction

ADDF32 R3H, R3H, R2H ; Invalid delay instruction

SUBF32 R4H, R1H, R0H

Example 1-12. Destination/Source Register Conflict Resolved

; Valid delay instruction.

MPYF32 R2H, R1H, R0H ; 2p instruction

SUBF32 R4H, R1H, R0H ; Valid delay for MPYF32

ADDF32 R3H, R3H, R2H ; <-- MPYF32 completes, R2H valid

NOP ; <-- SUBF32 completes, R4H valid
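
The two register-conflict rules for delay slots can be sketched as a check in C, similar in spirit to what the assembler enforces. The structure and function names are hypothetical, and this is not the assembler's actual algorithm. It covers only register conflicts between consecutive instructions and does not model parallel instructions or instructions that access the STF flags.

```c
#include <stdbool.h>

#define NO_REG (-1)  /* operand is not a floating-point register */

struct fpu_instr {
    int dest;         /* destination register index n of RnH, or NO_REG */
    int src1;         /* first source register index, or NO_REG         */
    int src2;         /* second source register index, or NO_REG        */
    int delay_slots;  /* 1 for 2p instructions, 0 for single-cycle      */
};

/* Returns true when 'next' may directly follow 'prev', that is, when it
 * neither writes nor reads the destination that 'prev' is still producing. */
static bool valid_in_delay_slot(const struct fpu_instr *prev,
                                const struct fpu_instr *next)
{
    if (prev->delay_slots == 0 || prev->dest == NO_REG)
        return true;                    /* nothing is pending */
    if (next->dest == prev->dest)
        return false;                   /* destination conflict (Example 1-9) */
    if (next->src1 == prev->dest || next->src2 == prev->dest)
        return false;                   /* source conflict (Example 1-11) */
    return true;
}
```

Applied to Example 1-11, the ADDF32 that reads R2H is rejected directly after the MPYF32, while the SUBF32 from Example 1-12 is accepted.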

It should be noted that a source register for the 2nd operation within a parallel instruction can be the same

as the destination register of the first operation. This is because the two operations are started at the

same time. The 2nd operation is not in the delay slot of the first operation. Consider Example 1-13 where

the MPYF32 uses R2H as its destination register. The MOV32 is the 2nd operation in the instruction and

can freely use R2H as a source register. The contents of R2H before the multiply will be used by MOV32.

Example 1-13. Parallel Instruction Destination/Source Exception

; Valid parallel operation.

MPYF32 R2H, R1H, R0H ; 2p/1 instruction

|| MOV32 mem32, R2H ; <-- Uses R2H before the MPYF32

; <-- mem32 updated

NOP ; <-- Delay for MPYF32

; <-- R2H updated

Likewise, the destination register of the 2nd operation within a parallel instruction can be the same as one of

the source registers of the first operation. The MPYF32 operation in Example 1-14 uses the R1H register

as one of its sources. This register is also updated by the MOV32 operation. The multiplication operation will

use the value in R1H before the MOV32 updates it.

Example 1-14. Parallel Instruction Destination/Source Exception

; Valid parallel instruction

MPYF32 R2H, R1H, R0H ; 2p/1 instruction

|| MOV32 R1H, mem32 ; Valid

NOP ; <-- MOV32 completes, R1H valid

; <-- MPYF32 completes, R2H valid

NOTE: Operations within parallel instructions cannot use the same destination register.

When two parallel operations have the same destination register, the result is invalid.

For example, see Example 1-15.

If both operations within a parallel instruction try to update the same destination register as shown in

Example 1-15 the assembler will issue an error.

Example 1-15. Invalid Destination Within a Parallel Instruction

; Invalid parallel instruction. Both operations use the same destination register

MPYF32 R2H, R1H, R0H ; 2p/1 instruction

|| MOV32 R2H, mem32 ; Invalid

Some instructions access or modify the STF flags. Because the instruction requiring a delay slot will also

be accessing the STF flags, these instructions should not be used in delay slots. These instructions are

SAVE, SETFLG, RESTORE and MOVST0.

NOTE: Do not use SAVE, SETFLG, RESTORE, or the MOVST0 instruction in a delay slot.

1.4.7 Optimizing the Pipeline

The following example shows how delay slots can be used to improve the performance of an algorithm.

The example performs two Y = MX+B operations. In Example 1-16, no optimization has been done. The Y

= MX+B calculations are sequential and each takes 7 cycles to complete. Notice there are NOPs in the

delay slots that could be filled with non-conflicting instructions. The only requirement is these instructions

must not cause a register conflict or access the STF register flags.

Example 1-16. Floating-Point Code Without Pipeline Optimization

; Using NOPs for alignment cycles, calculate the following:

;

; Y1 = M1*X1 + B1

; Y2 = M2*X2 + B2

;

; Calculate Y1

;

MOV32 R0H,@M1 ; Load R0H with M1 - single cycle

MOV32 R1H,@X1 ; Load R1H with X1 - single cycle

MPYF32 R1H,R1H,R0H ; R1H = M1 * X1 - 2p operation

|| MOV32 R0H,@B1 ; Load R0H with B1 - single cycle

NOP ; Wait for MPYF32 to complete

; <-- MPYF32 completes, R1H is valid

ADDF32 R1H,R1H,R0H ; R1H = R1H + R0H - 2p operation

NOP ; Wait for ADDF32 to complete

; <-- ADDF32 completes, R1H is valid

MOV32 @Y1,R1H ; Save R1H in Y1 - single cycle

; Calculate Y2

MOV32 R0H,@M2 ; Load R0H with M2 - single cycle

MOV32 R1H,@X2 ; Load R1H with X2 - single cycle

MPYF32 R1H,R1H,R0H ; R1H = M2 * X2 - 2p operation

|| MOV32 R0H,@B2 ; Load R0H with B2 - single cycle

NOP ; Wait for MPYF32 to complete

; <-- MPYF32 completes, R1H is valid

ADDF32 R1H,R1H,R0H ; R1H = R1H + R0H

NOP ; Wait for ADDF32 to complete

; <-- ADDF32 completes, R1H is valid

MOV32 @Y2,R1H ; Save R1H in Y2

; 14 cycles

; 48 bytes

The code shown in Example 1-17 was generated by the C28x+FPU compiler with optimization enabled. Notice that the NOPs in the first example have now been filled with other instructions. The code for the two Y = MX+B calculations is now interleaved and both calculations complete in only nine cycles.


Example 1-17. Floating-Point Code With Pipeline Optimization

; Using non-conflicting instructions for alignment cycles,

; calculate the following:

;

; Y1 = M1*X1 + B1

; Y2 = M2*X2 + B2

;

MOV32 R2H,@X1 ; Load R2H with X1 - single cycle

MOV32 R1H,@M1 ; Load R1H with M1 - single cycle

MPYF32 R3H,R2H,R1H ; R3H = M1 * X1 - 2p operation

|| MOV32 R0H,@M2 ; Load R0H with M2 - single cycle

MOV32 R1H,@X2 ; Load R1H with X2 - single cycle

; <-- MPYF32 completes, R3H is valid

MPYF32 R0H,R1H,R0H ; R0H = M2 * X2 - 2p operation

|| MOV32 R4H,@B1 ; Load R4H with B1 - single cycle

; <-- MOV32 completes, R4H is valid

ADDF32 R1H,R4H,R3H ; R1H = B1 + M1*X1 - 2p operation

|| MOV32 R2H,@B2 ; Load R2H with B2 - single cycle

; <-- MPYF32 completes, R0H is valid

ADDF32 R0H,R2H,R0H ; R0H = B2 + M2*X2 - 2p operation

; <-- ADDF32 completes, R1H is valid

MOV32 @Y1,R1H ; Store Y1

; <-- ADDF32 completes, R0H is valid

MOV32 @Y2,R0H ; Store Y2

; 9 cycles

; 36 bytes

1.5 Floating Point Unit Instruction Set

This section describes the assembly language instructions of the TMS320C28x plus floating-point processor. Also described are parallel operations, conditional operations, resource constraints, and addressing modes. The instructions listed here are an extension to the standard C28x instruction set. For information on standard C28x instructions, see the TMS320C28x DSP CPU and Instruction Set Reference Guide (literature number SPRU430).

1.5.1 Instruction Descriptions

This section gives detailed information on the instruction set. Each instruction may present the following

information:

• Operands

• Opcode

• Description

• Exceptions

• Pipeline

• Examples

• See also

The example INSTRUCTION is shown to familiarize you with the way each instruction is described. The example describes the kind of information you will find in each part of the individual instruction description and where to obtain more information. The C28x+FPU instructions follow the same format as the C28x instructions. The source operand(s) are always on the right and the destination operand(s) are on the left.

The explanations for the syntax of the operands used in the instruction descriptions for the TMS320C28x

plus floating-point processor are given in Table 1-4. For information on the operands of standard C28x

instructions, see the TMS320C28x DSP CPU and Instruction Set Reference Guide (SPRU430).


Table 1-4. Operand Nomenclature

Symbol            Description
#16FHi            16-bit immediate (hex or float) value that represents the upper 16-bits of an IEEE 32-bit floating-point value. Lower 16-bits of the mantissa are assumed to be zero.
#16FHiHex         16-bit immediate hex value that represents the upper 16-bits of an IEEE 32-bit floating-point value. Lower 16-bits of the mantissa are assumed to be zero.
#16FLoHex         A 16-bit immediate hex value that represents the lower 16-bits of an IEEE 32-bit floating-point value
#32Fhex           32-bit immediate value that represents an IEEE 32-bit floating-point value
#32F              Immediate float value represented in floating-point representation
#0.0              Immediate zero
#RC               16-bit immediate value for the repeat count
*(0:16bitAddr)    16-bit immediate address, zero extended
CNDF              Condition to test the flags in the STF register
FLAG              Selected flags from STF register (OR) 11 bit mask indicating which floating-point status flags to change
label             Label representing the end of the repeat block
mem16             Pointer (using any of the direct or indirect addressing modes) to a 16-bit memory location
mem32             Pointer (using any of the direct or indirect addressing modes) to a 32-bit memory location
RaH               R0H to R7H registers
RbH               R0H to R7H registers
RcH               R0H to R7H registers
RdH               R0H to R7H registers
ReH               R0H to R7H registers
RfH               R0H to R7H registers
RB                Repeat Block Register
STF               FPU Status Register
VALUE             Flag value of 0 or 1 for selected flag (OR) 11 bit mask indicating the flag value; 0 or 1


INSTRUCTION dest1, source1, source2 Short Description

Operands

dest1 description for the 1st operand for the instruction

source1 description for the 2nd operand for the instruction

source2 description for the 3rd operand for the instruction

Each instruction has a table that gives a list of the operands and a short description.

Instructions always have their destination operand(s) first followed by the source

operand(s).

Opcode This section shows the opcode for the instruction.

Description Describes the instruction execution in detail. Any constraints on the operands imposed by the processor or the assembler are discussed.

Restrictions Any constraints on the operands or use of the instruction imposed by the processor are

discussed.

Pipeline This section describes the instruction in terms of pipeline cycles as described in

Section 1.4.

Example Examples of instruction execution. If applicable, register and memory values are given

before and after instruction execution. All examples assume the device is running with

the OBJMODE set to 1. Normally the boot ROM or the c-code initialization will set this

bit.

See also F32TOI16 RaH, RbH

F32TOUI16 RaH, RbH

F32TOUI16R RaH, RbH

I16TOF32 RaH, RbH

I16TOF32 RaH, mem16

UI16TOF32 RaH, mem16

UI16TOF32 RaH, RbH


F32TOI32 RaH, RbH Convert 32-bit Floating-Point Value to 32-bit Integer

Operands

RaH floating-point destination register (R0H to R7H)

RbH floating-point source register (R0H to R7H)

Opcode LSW: 1110 0110 1000 1000

MSW: 0000 0000 00bb baaa

Description Convert the 32-bit floating-point value in RbH to a 32-bit integer value and truncate.

Store the result in RaH.

RaH = F32TOI32(RbH)

Flags This instruction does not affect any flags:

Flag TF ZI NI ZF NF LUF LVF

Modified No No No No No No No

Pipeline This is a 2 pipeline cycle (2p) instruction. That is:

F32TOI32 RaH, RbH ; 2 pipeline cycles (2p)

NOP ; 1 cycle delay or non-conflicting instruction

; <-- F32TOI32 completes, RaH updated

NOP

Any instruction in the delay slot must not use RaH as a destination register or use RaH

as a source operand.

Example

MOVF32 R2H, #11204005.0 ; R2H = 11204005.0 (0x4B2AF5A5)

F32TOI32 R3H, R2H ; R3H = F32TOI32 (R2H)

MOVF32 R4H, #-11204005.0 ; R4H = -11204005.0 (0xCB2AF5A5)

; <-- F32TOI32 complete,

; R3H = 11204005 (0x00AAF5A5)

F32TOI32 R5H, R4H ; R5H = F32TOI32 (R4H)

NOP ; 1 Cycle delay for F32TOI32 to complete

; <-- F32TOI32 complete,

; R5H = -11204005 (0xFF550A5B)
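The bit patterns in the example above can be reproduced on a host machine. The sketch below is an illustrative Python model, not the hardware algorithm: `f32_bits` and `f32_to_i32` are hypothetical helper names, and inputs outside the 32-bit integer range are not handled the way the FPU would handle them.

```python
import struct


def f32_bits(value: float) -> int:
    """IEEE 754 single-precision bit pattern of value."""
    return struct.unpack(">I", struct.pack(">f", value))[0]


def f32_to_i32(value: float) -> int:
    """Truncating conversion, returned as a 32-bit register image."""
    # Round to single precision first, as the register would hold it.
    single = struct.unpack(">f", struct.pack(">f", value))[0]
    # int() truncates toward zero; the mask gives the two's-complement image.
    return int(single) & 0xFFFFFFFF


print(hex(f32_bits(11204005.0)))     # 0x4b2af5a5
print(hex(f32_to_i32(11204005.0)))   # 0xaaf5a5
print(hex(f32_to_i32(-11204005.0)))  # 0xff550a5b
```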

See also F32TOI16 RaH, RbH

F32TOUI16R RaH, RbH

I16TOF32 RaH, RbH

I16TOF32 RaH, mem16

UI16TOF32 RaH, mem16

UI16TOF32 RaH, RbH


F32TOUI16R RaH, RbH Convert 32-bit Floating-Point Value to 16-bit Unsigned Integer and Round

Operands

RaH floating-point destination register (R0H to R7H)

RbH floating-point source register (R0H to R7H)

Opcode LSW: 1110 0110 1000 1110

MSW: 1000 0000 00bb baaa

Description Convert the 32-bit floating-point value in RbH to an unsigned 16-bit integer and round to

the closest even value. The result will be stored in RaH. To instead truncate the

converted value, use the F32TOUI16 instruction.

RaH(15:0) = F32ToUI16round(RbH)

RaH(31:16) = 0x0000

Flags This instruction does not affect any flags:

Flag TF ZI NI ZF NF LUF LVF

Modified No No No No No No No

Pipeline This is a 2 pipeline cycle (2p) instruction. That is:

F32TOUI16R RaH, RbH ; 2 pipeline cycles (2p)

NOP ; 1 cycle delay or non-conflicting instruction

; <-- F32TOUI16R completes, RaH updated

NOP

Any instruction in the delay slot must not use RaH as a destination register or use RaH

as a source operand.

Example MOVIZ R5H, #0x412C ; R5H[31:16] = 0x412C

MOVXI R5H, #0xCCCD ; R5H[15:0] = 0xCCCD

; R5H = 10.8 (0x412CCCCD)

F32TOUI16R R6H, R5H ; R6H (15:0) = F32TOUI16round (R5H)

; R6H (31:16) = 0x0000

MOVF32 R7H, #-10.8 ; R7H = -10.8 (0xC12CCCCD)

; <-- F32TOUI16R complete,

; R6H (15:0) = 11.0 (0x000B)

; R6H (31:16) = 0.0 (0x0000)

F32TOUI16R R0H, R7H ; R0H (15:0) = F32TOUI16round (R7H)

; R0H (31:16) = 0x0000

NOP ; 1 Cycle delay for F32TOUI16R to complete

; <-- F32TOUI16R complete,

; R0H (15:0) = 0.0 (0x0000)

; R0H (31:16) = 0.0 (0x0000)
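As a cross-check of the example above, the rounding behavior can be modelled in Python. This is a minimal sketch under stated assumptions: `f32_to_ui16_round` is a hypothetical name, the clamp of negative inputs to zero follows the -10.8 example, and the clamp at 0xFFFF for large inputs is an assumption that this section does not confirm.

```python
import struct


def f32_to_ui16_round(value: float) -> int:
    """Model of F32TOUI16R: round to nearest (ties to even), 16-bit unsigned."""
    # Round to single precision first, as the register would hold it.
    single = struct.unpack(">f", struct.pack(">f", value))[0]
    rounded = round(single)  # Python rounds halfway cases to the even integer
    # Lower clamp matches the manual's example; upper clamp is an assumption.
    return max(0, min(rounded, 0xFFFF))


print(f32_to_ui16_round(10.8))   # 11
print(f32_to_ui16_round(-10.8))  # 0
print(f32_to_ui16_round(2.5))    # 2 (tie goes to the even value)
```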

See also F32TOI16 RaH, RbH

F32TOI16R RaH, RbH

F32TOUI16 RaH, RbH

I16TOF32 RaH, RbH

I16TOF32 RaH, mem16

UI16TOF32 RaH, mem16

UI16TOF32 RaH, RbH


F32TOUI32 RaH, RbH Convert 32-bit Floating-Point Value to 32-bit Unsigned Integer

Operands

RaH floating-point destination register (R0H to R7H)

RbH floating-point source register (R0H to R7H)

Opcode LSW: 1110 0110 1000 1010

MSW: 0000 0000 00bb baaa

Description Convert the 32-bit floating-point value in RbH to an unsigned 32-bit integer and store the

result in RaH.

RaH = F32ToUI32(RbH)

Flags This instruction does not affect any flags:

Flag TF ZI NI ZF NF LUF LVF

Modified No No No No No No No

Pipeline This is a 2 pipeline cycle (2p) instruction. That is:

F32TOUI32 RaH, RbH ; 2 pipeline cycles (2p)

NOP ; 1 cycle delay or non-conflicting instruction

; <-- F32TOUI32 completes, RaH updated

NOP

Any instruction in the delay slot must not use RaH as a destination register or use RaH

as a source operand.

Example MOVIZF32 R6H, #12.5 ; R6H = 12.5 (0x41480000)

F32TOUI32 R7H, R6H ; R7H = F32TOUI32 (R6H)

MOVIZF32 R1H, #-6.5 ; R1H = -6.5 (0xC0D00000)

; <-- F32TOUI32 complete, R7H = 12.0 (0x0000000C)

F32TOUI32 R2H, R1H ; R2H = F32TOUI32 (R1H)

NOP ; 1 Cycle delay for F32TOUI32 to complete

; <-- F32TOUI32 complete, R2H = 0.0 (0x00000000)

See also F32TOI32 RaH, RbH

I32TOF32 RaH, RbH

I32TOF32 RaH, mem32

UI32TOF32 RaH, RbH

UI32TOF32 RaH, mem32


FRACF32 RaH, RbH Fractional Portion of a 32-bit Floating-Point Value

Operands

RaH floating-point destination register (R0H to R7H)

RbH floating-point source register (R0H to R7H)

Opcode LSW: 1110 0110 1111 0001

MSW: 0000 0000 00bb baaa

Description Returns in RaH the fractional portion of the 32-bit floating-point value in RbH.

Flags This instruction does not affect any flags:

Flag TF ZI NI ZF NF LUF LVF

Modified No No No No No No No

Pipeline This is a 2 pipeline cycle (2p) instruction. That is:

FRACF32 RaH, RbH ; 2 pipeline cycles (2p)

NOP ; 1 cycle delay or non-conflicting instruction

; <-- FRACF32 completes, RaH updated

NOP

Any instruction in the delay slot must not use RaH as a destination register or use RaH

as a source operand.

Example MOVIZF32 R2H, #19.625 ; R2H = 19.625 (0x419D0000)

FRACF32 R3H, R2H ; R3H = FRACF32 (R2H)

NOP ; 1 Cycle delay for FRACF32 to complete

; <-- FRACF32 complete, R3H = 0.625 (0x3F200000)
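The example result can be verified with a short Python model. The helper names `f32_bits` and `frac_f32` are hypothetical. The sketch uses `math.modf`, which keeps the sign of the input on the fractional part; whether FRACF32 does the same for negative inputs is not stated here, so only the positive case is shown.

```python
import math
import struct


def f32_bits(value: float) -> int:
    """IEEE 754 single-precision bit pattern of value."""
    return struct.unpack(">I", struct.pack(">f", value))[0]


def frac_f32(value: float) -> float:
    """Fractional portion of value after rounding it to single precision."""
    single = struct.unpack(">f", struct.pack(">f", value))[0]
    frac, _whole = math.modf(single)
    return frac


print(frac_f32(19.625))                 # 0.625
print(hex(f32_bits(frac_f32(19.625))))  # 0x3f200000
```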

See also


I16TOF32 RaH, RbH Convert 16-bit Integer to 32-bit Floating-Point Value

Operands

RaH floating-point destination register (R0H to R7H)

RbH floating-point source register (R0H to R7H)

Opcode LSW: 1110 0110 1000 1101

MSW: 0000 0000 00bb baaa

Description Convert the 16-bit signed integer in RbH to a 32-bit floating point value and store the

result in RaH.

RaH = I16ToF32(RbH)

Flags This instruction does not affect any flags:

Flag TF ZI NI ZF NF LUF LVF

Modified No No No No No No No

Pipeline This is a 2 pipeline cycle (2p) instruction. That is:

I16TOF32 RaH, RbH ; 2 pipeline cycles (2p)

NOP ; 1 cycle delay or non-conflicting instruction

; <-- I16TOF32 completes, RaH updated

NOP

Any instruction in the delay slot must not use RaH as a destination register or use RaH

as a source operand.

Example MOVIZ R0H, #0x0000 ; R0H[31:16] = 0.0 (0x0000)

MOVXI R0H, #0x0004 ; R0H[15:0] = 4.0 (0x0004)

I16TOF32 R1H, R0H ; R1H = I16TOF32 (R0H)

MOVIZ R2H, #0x0000 ; R2H[31:16] = 0.0 (0x0000)

; <--I16TOF32 complete, R1H = 4.0 (0x40800000)

MOVXI R2H, #0xFFFC ; R2H[15:0] = -4.0 (0xFFFC)

I16TOF32 R3H, R2H ; R3H = I16TOF32 (R2H)

NOP ; 1 Cycle delay for I16TOF32 to complete

; <-- I16TOF32 complete, R3H = -4.0 (0xC0800000)
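The two conversions in the example can be reproduced with the following Python sketch. `i16_to_f32_bits` is a hypothetical helper; it assumes that only the low 16 bits of the source register take part in the conversion, which matches the example but is not spelled out in the description.

```python
import struct


def i16_to_f32_bits(reg: int) -> int:
    """Sign-extend the low 16 bits of reg and return the float32 bit pattern."""
    low = reg & 0xFFFF
    signed = low - 0x10000 if low & 0x8000 else low  # two's-complement decode
    return struct.unpack(">I", struct.pack(">f", float(signed)))[0]


print(hex(i16_to_f32_bits(0x00000004)))  # 0x40800000 (4.0)
print(hex(i16_to_f32_bits(0x0000FFFC)))  # 0xc0800000 (-4.0)
```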

See also F32TOI16 RaH, RbH

F32TOI16R RaH, RbH

F32TOUI16 RaH, RbH

F32TOUI16R RaH, RbH

I16TOF32 RaH, mem16

UI16TOF32 RaH, mem16

UI16TOF32 RaH, RbH


I16TOF32 RaH, mem16 Convert 16-bit Integer to 32-bit Floating-Point Value

Operands

RaH floating-point destination register (R0H to R7H)

mem16 16-bit source memory location to be converted

Opcode LSW: 1110 0010 1100 1000

MSW: 0000 0aaa mem16

Description Convert the 16-bit signed integer indicated by the mem16 pointer to a 32-bit floating-

point value and store the result in RaH.

RaH = I16ToF32[mem16]

Flags This instruction does not affect any flags:

Flag TF ZI NI ZF NF LUF LVF

Modified No No No No No No No

Pipeline This is a 2 pipeline cycle (2p) instruction. That is:

I16TOF32 RaH, mem16 ; 2 pipeline cycles (2p)

NOP ; 1 cycle delay or non-conflicting instruction

; <-- I16TOF32 completes, RaH updated

NOP

Any instruction in the delay slot must not use RaH as a destination register or use RaH

as a source operand.

Example MOVW DP, #0x0280 ; DP = 0x0280

MOV @0, #0x0004 ; [0x00A000] = 4.0 (0x0004)

I16TOF32 R0H, @0 ; R0H = I16TOF32 [0x00A000]

MOV @1, #0xFFFC ; [0x00A001] = -4.0 (0xFFFC)

; <--I16TOF32 complete, R0H = 4.0 (0x40800000)

I16TOF32 R1H, @1 ; R1H = I16TOF32 [0x00A001]

NOP ; 1 Cycle delay for I16TOF32 to complete

; <-- I16TOF32 complete, R1H = -4.0 (0xC0800000)

See also F32TOI16 RaH, RbH

F32TOI16R RaH, RbH

F32TOUI16 RaH, RbH

F32TOUI16R RaH, RbH

I16TOF32 RaH, RbH

UI16TOF32 RaH, mem16

UI16TOF32 RaH, RbH


I32TOF32 RaH, mem32 Convert 32-bit Integer to 32-bit Floating-Point Value

Operands

RaH floating-point destination register (R0H to R7H)

mem32 32-bit source memory location to be converted. mem32 can be addressed using any of the direct or indirect addressing modes supported by the C28x CPU

Opcode LSW: 1110 0010 1000 1000

MSW: 0000 0aaa mem32

Description Convert the 32-bit signed integer indicated by the mem32 pointer to a 32-bit floating

point value and store the result in RaH.

RaH = I32ToF32[mem32]

Flags This instruction does not affect any flags:

Flag TF ZI NI ZF NF LUF LVF

Modified No No No No No No No

Pipeline This is a 2 pipeline cycle (2p) instruction. That is:

I32TOF32 RaH, mem32 ; 2 pipeline cycles (2p)

NOP ; 1 cycle delay or non-conflicting instruction

; <-- I32TOF32 completes, RaH updated

NOP

Any instruction in the delay slot must not use RaH as a destination register or use RaH

as a source operand.

Example MOVW DP, #0x0280 ; DP = 0x0280

MOV @0, #0x1111 ; [0x00A000] = 4369 (0x1111)

MOV @1, #0x1111 ; [0x00A001] = 4369 (0x1111)

; Value of the 32 bit signed integer present in

; 0x00A001 and 0x00A000 is +286331153 (0x11111111)

I32TOF32 R1H, @0 ; R1H = I32TOF32 (0x11111111)

NOP ; 1 Cycle delay for I32TOF32 to complete

; <-- I32TOF32 complete, R1H = 286331153 (0x4D888888)
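The integer 286331153 needs 29 significant bits, so it cannot be represented exactly in the 24-bit significand of a single-precision value. The Python sketch below shows both possible outcomes. The function names are hypothetical, and the word order (low word at the lower address) follows the memory layout used in the example. The value listed in the example, 0x4D888888, matches the truncating variant; which rounding the hardware applies is not stated in this section, so treat that choice as an assumption.

```python
import struct


def read_i32(memory: dict, address: int) -> int:
    """Combine two 16-bit words (low word first) into a signed 32-bit value."""
    raw = (memory[address + 1] << 16) | memory[address]
    return raw - (1 << 32) if raw & 0x80000000 else raw


def i32_to_f32_bits_truncate(n: int) -> int:
    """Convert n to a float32 bit pattern, discarding bits that do not fit."""
    if n == 0:
        return 0
    sign = 0x80000000 if n < 0 else 0
    mag = abs(n)
    exp = mag.bit_length() - 1            # position of the leading 1
    if exp > 23:
        mant = mag >> (exp - 23)          # drop low bits: round toward zero
    else:
        mant = mag << (23 - exp)
    return sign | ((exp + 127) << 23) | (mant & 0x7FFFFF)


def i32_to_f32_bits_nearest(n: int) -> int:
    """Same conversion using the host's round-to-nearest behavior."""
    return struct.unpack(">I", struct.pack(">f", float(n)))[0]


memory = {0xA000: 0x1111, 0xA001: 0x1111}
value = read_i32(memory, 0xA000)
print(value)                                 # 286331153
print(hex(i32_to_f32_bits_truncate(value)))  # 0x4d888888
print(hex(i32_to_f32_bits_nearest(value)))   # 0x4d888889
```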

See also F32TOI32 RaH, RbH

F32TOUI32 RaH, RbH

I32TOF32 RaH, RbH

UI32TOF32 RaH, RbH

UI32TOF32 RaH, mem32


I32TOF32 RaH, RbH Convert 32-bit Integer to 32-bit Floating-Point Value

Operands

RaH floating-point destination register (R0H to R7H)

RbH floating-point source register (R0H to R7H)

Opcode LSW: 1110 0110 1000 1001

MSW: 0000 0000 00bb baaa

Description Convert the signed 32-bit integer in RbH to a 32-bit floating-point value and store the

result in RaH.

RaH = I32ToF32(RbH)

Flags This instruction does not affect any flags:

Flag TF ZI NI ZF NF LUF LVF

Modified No No No No No No No

Pipeline This is a 2 pipeline cycle (2p) instruction. That is:

I32TOF32 RaH, RbH ; 2 pipeline cycles (2p)

NOP ; 1 cycle delay or non-conflicting instruction

; <-- I32TOF32 completes, RaH updated

NOP

Any instruction in the delay slot must not use RaH as a destination register or use RaH

as a source operand.

Example MOVIZ R2H, #0x1111 ; R2H[31:16] = 4369 (0x1111)

MOVXI R2H, #0x1111 ; R2H[15:0] = 4369 (0x1111)

; Value of the 32 bit signed integer present

; in R2H is +286331153 (0x11111111)

I32TOF32 R3H, R2H ; R3H = I32TOF32 (R2H)

NOP ; 1 Cycle delay for I32TOF32 to complete

; <-- I32TOF32 complete, R3H = 286331153 (0x4D888888)

See also F32TOI32 RaH, RbH

F32TOUI32 RaH, RbH

I32TOF32 RaH, mem32

UI32TOF32 RaH, RbH

UI32TOF32 RaH, mem32


MACF32 R3H, R2H, RdH, ReH, RfH 32-bit Floating-Point Multiply with Parallel Add

Operands This instruction is an alias for the parallel multiply and add instruction. The operands are

translated by the assembler such that the instruction becomes:

MPYF32 RdH, ReH, RfH

|| ADDF32 R3H, R3H, R2H

R3H floating-point destination and source register for the ADDF32

R2H floating-point source register for the ADDF32 operation (R0H to R7H)

RdH floating-point destination register for MPYF32 operation (R0H to R7H)

RdH cannot be R3H

ReH floating-point source register for MPYF32 operation (R0H to R7H)

RfH floating-point source register for MPYF32 operation (R0H to R7H)

Opcode LSW: 1110 0111 0100 00ff

MSW: feee dddc ccbb baaa

Description This instruction is an alias for the parallel multiply and add, MPYF32 || ADDF32, instruction.

RdH = ReH * RfH

R3H = R3H + R2H

Restrictions The destination register for the MPYF32 and the ADDF32 must be unique. That is, RdH

cannot be R3H.

Flags This instruction modifies the following flags in the STF register:

Flag TF ZI NI ZF NF LUF LVF

Modified No No No No No Yes Yes

The STF register flags are modified as follows:

• LUF = 1 if MPYF32 or ADDF32 generates an underflow condition.

• LVF = 1 if MPYF32 or ADDF32 generates an overflow condition.

Pipeline Both MPYF32 and ADDF32 take 2 pipeline cycles (2p). That is:

MPYF32 RaH, RbH, RcH ; 2 pipeline cycles (2p)

|| ADDF32 RdH, ReH, RfH ; 2 pipeline cycles (2p)

NOP ; 1 cycle delay or non-conflicting instruction

; <-- MPYF32, ADDF32 complete, RaH, RdH updated

NOP

Any instruction in the delay slot must not use RaH or RdH as a destination register or as

a source operand.


Example ; Perform 5 multiply and accumulate operations:

;

; 1st multiply: A = X0 * Y0

; 2nd multiply: B = X1 * Y1

; 3rd multiply: C = X2 * Y2

; 4th multiply: D = X3 * Y3

; 5th multiply: E = X4 * Y4

;

; Result=A+B+C+D+E

MOV32 R0H, *XAR4++ ; R0H = X0

MOV32 R1H, *XAR5++ ; R1H = Y0

;R2H=A=X0*Y0

MPYF32 R2H, R0H, R1H ; In parallel R0H = X1

|| MOV32 R0H, *XAR4++

MOV32 R1H, *XAR5++ ; R1H = Y1

;R3H=B=X1*Y1

MPYF32 R3H, R0H, R1H ; In parallel R0H = X2

|| MOV32 R0H, *XAR4++

MOV32 R1H, *XAR5++ ; R1H = Y2

;R3H=A+B

;R2H=C=X2*Y2

MACF32 R3H, R2H, R2H, R0H, R1H ; In parallel R0H = X3

|| MOV32 R0H, *XAR4++

MOV32 R1H, *XAR5++ ; R1H = Y3

;R3H=(A+B)+C

;R2H=D=X3*Y3

MACF32 R3H, R2H, R2H, R0H, R1H ; In parallel R0H = X4

|| MOV32 R0H, *XAR4

MOV32 R1H, *XAR5 ; R1H = Y4

; The next MACF32 is an alias for

; MPYF32 || ADDF32

;R2H=E=X4*Y4

MACF32 R3H, R2H, R2H, R0H, R1H ; in parallel R3H = (A + B + C) + D

NOP ; Wait for MPYF32 || ADDF32 to complete

ADDF32 R3H, R3H, R2H ; R3H = (A + B + C + D) + E

NOP ; Wait for ADDF32 to complete

MOV32 @Result, R3H ; Store the result
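The register usage in the example is easier to follow when written as a loop. The Python sketch below mirrors the data flow only: `pipelined_dot` is a hypothetical name, `r2` and `r3` stand in for R2H and R3H, and single-precision rounding and cycle timing are not modelled.

```python
def pipelined_dot(x, y):
    """Sum of products using the same two-register scheme as the example."""
    r2 = x[0] * y[0]  # MPYF32 R2H: first product (A)
    r3 = x[1] * y[1]  # MPYF32 R3H: second product (B)
    for xi, yi in zip(x[2:], y[2:]):
        # MACF32: the add consumes the previous product in R2H while the
        # multiply produces the next one, so both updates happen together.
        r3, r2 = r3 + r2, xi * yi
    return r3 + r2    # final ADDF32 folds in the last product


x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [6.0, 7.0, 8.0, 9.0, 10.0]
print(pipelined_dot(x, y))  # 130.0
```

The accumulation always lags the multiply by one step, which is why one extra ADDF32 is needed after the last MACF32.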

See also MACF32 R3H, R2H, RdH, ReH, RfH || MOV32 RaH, mem32

MACF32 R7H, R3H, mem32, *XAR7++

MACF32 R7H, R6H, RdH, ReH, RfH

MACF32 R7H, R6H, RdH, ReH, RfH || MOV32 RaH, mem32

MPYF32 RaH, RbH, RcH || ADDF32 RdH, ReH, RfH


MACF32 R3H, R2H, RdH, ReH, RfH ∥∥MOV32 RaH, mem32 32-bit Floating-Point Multiply and

Accumulate with Parallel Move

Operands

R3H floating-point destination/source register R3H for the add operation

R2H floating-point source register R2H for the add operation

RdH floating-point destination register (R0H to R7H) for the multiply operation

RdH cannot be the same register as RaH

ReH floating-point source register (R0H to R7H) for the multiply operation

RfH floating-point source register (R0H to R7H) for the multiply operation

RaH floating-point destination register for the MOV32 operation (R0H to R7H).

RaH cannot be R3H or the same register as RdH.

mem32 32-bit source for the MOV32 operation

Opcode LSW: 1110 0011 0011 fffe

MSW: eedd daaa mem32

Description Multiply and accumulate the contents of floating-point registers and move from memory to register. The destination register for the MOV32 cannot be the same as the destination registers for the MACF32.

R3H = R3H + R2H,

RdH = ReH * RfH,

RaH = [mem32]

Restrictions The destination registers for the MACF32 and the MOV32 must be unique. That is, RaH

cannot be R3H and RaH cannot be the same register as RdH.

Flags This instruction modifies the following flags in the STF register:

Flag TF ZI NI ZF NF LUF LVF

Modified No Yes Yes Yes Yes Yes Yes

The STF register flags are modified as follows:

• LUF = 1 if MACF32 (add or multiply) generates an underflow condition.

• LVF = 1 if MACF32 (add or multiply) generates an overflow condition.

MOV32 sets the NF, ZF, NI and ZI flags as follows:

NF = RaH(31);

ZF = 0;

if(RaH(30:23) == 0) { ZF = 1; NF = 0; }

NI = RaH(31);

ZI = 0;

if(RaH(31:0) == 0) ZI = 1;
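The flag pseudo-code above translates directly into Python. This is a sketch for experimenting with bit patterns on a host; `mov32_flags` is a hypothetical name and the input is the 32-bit register image.

```python
def mov32_flags(rah: int) -> dict:
    """Compute NF, ZF, NI and ZI for a 32-bit register image."""
    rah &= 0xFFFFFFFF
    nf = (rah >> 31) & 1
    zf = 0
    if ((rah >> 23) & 0xFF) == 0:  # exponent field RaH(30:23) is zero
        zf = 1
        nf = 0
    ni = (rah >> 31) & 1
    zi = 1 if rah == 0 else 0
    return {"NF": nf, "ZF": zf, "NI": ni, "ZI": zi}


print(mov32_flags(0xC0800000))  # -4.0: NF = 1, NI = 1
print(mov32_flags(0x00000000))  # zero: ZF = 1, ZI = 1
print(mov32_flags(0x80000000))  # negative zero: ZF = 1, NF = 0, NI = 1
```

Note that the floating-point flags (NF, ZF) and the integer flags (NI, ZI) can disagree for the same bit pattern, as the negative-zero case shows.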

Pipeline The MACF32 takes 2 pipeline cycles (2p) and the MOV32 takes a single cycle. That is:

MACF32 R3H, R2H, RdH, ReH, RfH ; 2 pipeline cycles (2p)

|| MOV32 RaH, mem32 ; 1 cycle

; <-- MOV32 completes, RaH updated

NOP ; 1 cycle delay for MACF32

; <-- MACF32 completes, R3H, RdH updated

NOP

Any instruction in the delay slot for this version of MACF32 must not use R3H or RdH as

a destination register or R3H or RdH as a source operand.


Example ; Perform 5 multiply and accumulate operations:

;

; 1st multiply: A = X0 * Y0

; 2nd multiply: B = X1 * Y1

; 3rd multiply: C = X2 * Y2

; 4th multiply: D = X3 * Y3

; 5th multiply: E = X4 * Y4

;

; Result=A+B+C+D+E

MOV32 R0H, *XAR4++ ; R0H = X0

MOV32 R1H, *XAR5++ ; R1H = Y0

;R2H=A=X0*Y0

MPYF32 R2H, R0H, R1H ; In parallel R0H = X1

|| MOV32 R0H, *XAR4++

MOV32 R1H, *XAR5++ ; R1H = Y1

;R3H=B=X1*Y1

MPYF32 R3H, R0H, R1H ; In parallel R0H = X2

|| MOV32 R0H, *XAR4++

MOV32 R1H, *XAR5++ ; R1H = Y2

;R3H=A+B

;R2H=C=X2*Y2

MACF32 R3H, R2H, R2H, R0H, R1H ; In parallel R0H = X3

|| MOV32 R0H, *XAR4++

MOV32 R1H, *XAR5++ ; R1H = Y3

;R3H=(A+B)+C

;R2H=D=X3*Y3

MACF32 R3H, R2H, R2H, R0H, R1H ; In parallel R0H = X4

|| MOV32 R0H, *XAR4

MOV32 R1H, *XAR5 ; R1H = Y4

;R2H=E=X4*Y4

MPYF32 R2H, R0H, R1H ; in parallel R3H = (A + B + C) + D

|| ADDF32 R3H, R3H, R2H

NOP ; Wait for MPYF32 || ADDF32 to complete

ADDF32 R3H, R3H, R2H ; R3H = (A + B + C + D) + E

NOP ; Wait for ADDF32 to complete

MOV32 @Result, R3H ; Store the result

See also MOVIZ RaH, #16FHiHex

MOVIZF32 RaH, #16FHi

MOVXI RaH, #16FLoHex


MOV32 *(0:16bitAddr), loc32 Move the Contents of loc32 to Memory

Operands

0:16bitAddr 16-bit immediate address, zero extended

loc32 32-bit source location

Opcode LSW: 1011 1101 loc32

MSW: IIII IIII IIII IIII

Description Move the 32-bit value in loc32 to the memory location addressed by 0:16bitAddr. The

EALLOW bit in the ST1 register is ignored by this operation.

[0:16bitAddr] = [loc32]

Flags This instruction does not modify any STF register flags.

Flag TF ZI NI ZF NF LUF LVF

Modified No No No No No No No

Pipeline This is a two-cycle instruction.

Example MOVIZ R5H, #0x1234 ; R5H[31:16] = 0x1234

MOVXI R5H, #0xABCD ; R5H[15:0] = 0xABCD

NOP ; 1 Alignment Cycle

MOV32 ACC, R5H ; ACC = 0x1234ABCD

MOV32 *(0xA000), @ACC ; [0x00A000] = ACC

NOP ; 1 Cycle delay for MOV32 to complete

; <-- MOV32 *(0:16bitAddr), loc32 complete,

; [0x00A000] = 0xABCD, [0x00A001] = 0x1234
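The memory layout in the example, with the low word at the addressed location and the high word at the next one, can be sketched in Python as follows. `store32` is a hypothetical helper and the dictionary stands in for 16-bit-wide data memory.

```python
def store32(memory: dict, imm16: int, value: int) -> None:
    """Store a 32-bit value as two 16-bit words at a zero-extended address."""
    address = imm16 & 0xFFFF                      # 0:16bitAddr, zero extended
    memory[address] = value & 0xFFFF              # low word at the address
    memory[address + 1] = (value >> 16) & 0xFFFF  # high word at address + 1


memory = {}
store32(memory, 0xA000, 0x1234ABCD)
print(hex(memory[0xA000]), hex(memory[0xA001]))  # 0xabcd 0x1234
```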

See also MOVIZ RaH, #16FHiHex

MOVXI RaH, #16FLoHex

MOVI32 RaH, #32FHex

MOVIZF32 RaH, #16FHi


MOVI32 RaH, #32FHex Load the 32-bits of a 32-bit Floating-Point Register with the immediate

Operands This instruction is an alias for MOVIZ and MOVXI instructions. The second operand is

translated by the assembler such that the instruction becomes:

MOVIZ RaH, #16FHiHex

MOVXI RaH, #16FLoHex

RaH floating-point register (R0H to R7H)

#32FHex A 32-bit immediate value that represents an IEEE 32-bit floating-point value.

Opcode

LSW: 1110 1000 0000 0III (opcode of MOVIZ RaH, #16FHiHex)

MSW: IIII IIII IIII Iaaa

LSW: 1110 1000 0000 1III (opcode of MOVXI RaH, #16FLoHex)

MSW: IIII IIII IIII Iaaa

Description Note: This instruction only accepts a hex value as the immediate operand. To specify the

immediate value with a floating-point representation use the MOVF32 RaH, #32F

instruction.

Load the 32-bits of RaH with the immediate 32-bit hex value represented by #32Fhex.

#32Fhex is a 32-bit immediate hex value that represents the IEEE 32-bit floating-point

value of a floating-point number. The assembler will only accept a hex immediate value.

That is, 3.0 can only be represented as #0x40400000. #3.0 will result in an error.

RaH = #32FHex

Flags This instruction does not modify any STF register flags:

Flag TF ZI NI ZF NF LUF LVF

Modified No No No No No No No

Pipeline Depending on #32FHex, this instruction takes one or two cycles. If all of the lower 16-

bits of #32FHex are zeros, then assembler will convert MOVI32 to the MOVIZ

instruction. If the lower 16-bits of #32FHex are not zeros, then assembler will convert

MOVI32 to a MOVIZ and a MOVXI instruction.
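The expansion rule described above can be written as a small Python function. This is an illustrative sketch of the rule as stated, not the assembler's implementation; `expand_movi32` is a hypothetical name and the output strings are formatted for readability only.

```python
def expand_movi32(reg: str, imm32: int) -> list:
    """Return the instruction(s) that MOVI32 reg, #imm32 is converted to."""
    hi = (imm32 >> 16) & 0xFFFF
    lo = imm32 & 0xFFFF
    ops = [f"MOVIZ {reg}, #0x{hi:04X}"]  # always loads the upper half
    if lo != 0:
        ops.append(f"MOVXI {reg}, #0x{lo:04X}")  # only when the low half is non-zero
    return ops


for reg, imm in [("R1H", 0x40400000), ("R3H", 0x40004001), ("R4H", 0x00004040)]:
    ops = expand_movi32(reg, imm)
    print(len(ops), "cycle(s):", "; ".join(ops))
```

The length of the returned list corresponds to the one or two cycles mentioned in the pipeline description.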

Example MOVI32 R1H, #0x40400000 ; R1H = 0x40400000

; Assembler converts this instruction as

; MOVIZ R1H, #0x4040

MOVI32 R2H, #0x00000000 ; R2H = 0x00000000

; Assembler converts this instruction as

; MOVIZ R2H, #0x0

MOVI32 R3H, #0x40004001 ; R3H = 0x40004001

; Assembler converts this instruction as

; MOVIZ R3H, #0x4000

; MOVXI R3H, #0x4001

MOVI32 R4H, #0x00004040 ; R4H = 0x00004040

; Assembler converts this instruction as

; MOVIZ R4H, #0x0000

; MOVXI R4H, #0x4040

See also MOVIZ RaH, #16FHiHex

MOVXI RaH, #16FLoHex

MOVF32 RaH, #32F

MOVIZF32 RaH, #16FHi


MOVIZ RaH, #16FHiHex Load the Upper 16-bits of a 32-bit Floating-Point Register

Operands

RaH floating-point register (R0H to R7H)

#16FHiHex A 16-bit immediate hex value that represents the upper 16-bits of an IEEE 32-bit floating-point value.

The low 16-bits of the mantissa are assumed to be all 0.

Opcode

LSW: 1110 1000 0000 0III

MSW: IIII IIII IIII Iaaa

Description Note: This instruction only accepts a hex value as the immediate operand. To specify the

immediate value with a floating-point representation use the MOVIZF32 pseudo

instruction.

Load the upper 16-bits of RaH with the immediate value #16FHiHex and clear the low

16-bits of RaH.

#16FHiHex is a 16-bit immediate value that represents the upper 16-bits of an IEEE 32-

bit floating-point value. The low 16-bits of the mantissa are assumed to be all 0. The

assembler will only accept a hex immediate value. That is, -1.5 can only be represented

as #0xBFC0. #-1.5 will result in an error.

By itself, MOVIZ is useful for loading a floating-point register with a constant in which the

lowest 16-bits of the mantissa are 0. Some examples are 2.0 (0x40000000), 4.0

(0x40800000), 0.5 (0x3F000000), and -1.5 (0xBFC00000). If a constant requires all 32-

bits of a floating-point register to be initialized, then use MOVIZ along with the MOVXI

instruction.

RaH[31:16] = #16FHiHex

RaH[15:0] = 0

Flags This instruction does not modify any STF register flags:

Flag TF ZI NI ZF NF LUF LVF

Modified No No No No No No No

Pipeline This is a single-cycle instruction.

Example

; Load R0H with -1.5 (0xBFC00000)

MOVIZ R0H, #0xBFC0 ; R0H = 0xBFC00000

; Load R0H with pi = 3.141593 (0x40490FDB)

MOVIZ R0H, #0x4049 ; R0H = 0x40490000

MOVXI R0H, #0x0FDB ; R0H = 0x40490FDB
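Because MOVIZ and MOVXI accept only hex immediates, it can help to compute the two 16-bit halves of a constant on a host machine first. The Python sketch below does this with the standard `struct` module; `f32_halves` is a hypothetical helper name.

```python
import math
import struct


def f32_halves(value: float) -> tuple:
    """Return (upper 16 bits, lower 16 bits) of the float32 encoding of value."""
    bits = struct.unpack(">I", struct.pack(">f", value))[0]
    return bits >> 16, bits & 0xFFFF


hi, lo = f32_halves(-1.5)
print(f"#0x{hi:04X} #0x{lo:04X}")  # #0xBFC0 #0x0000, so MOVIZ alone is enough

hi, lo = f32_halves(math.pi)
print(f"#0x{hi:04X} #0x{lo:04X}")  # #0x4049 #0x0FDB, so MOVIZ plus MOVXI
```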

See also MOVIZ RaH, #16FHiHex

MOVXI RaH, #16FLoHex


MOVST0 FLAG Load Selected STF Flags into ST0

Operands

FLAG Selected flag

Opcode LSW: 1010 1101 FFFF FFFF

Description Load selected flags from the STF register into the ST0 register of the 28x CPU where FLAG is one or more of TF, ZI, ZF, NI, NF, LUF or LVF. The specified flag maps to the ST0 register as follows:

• Set OV = 1 if LVF or LUF is set. Otherwise clear OV.

• Set N = 1 if NF or NI is set. Otherwise clear N.

• Set Z = 1 if ZF or ZI is set. Otherwise clear Z.

• Set C = 1 if TF is set. Otherwise clear C.

• Set TC = 1 if TF is set. Otherwise clear TC.

If any STF flag is not specified, then the corresponding ST0 register bit is not modified.
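The mapping above can be modelled in Python as shown below. This is a sketch under stated assumptions: `movst0` and `ST0_SOURCES` are hypothetical names, the registers are represented as dictionaries of 0/1 values, and when only one of the two flags feeding an ST0 bit is selected, the sketch assumes that only the selected flag is considered. Side effects on the STF register itself are not modelled.

```python
# Which STF flags feed each ST0 bit, taken from the list above.
ST0_SOURCES = {
    "OV": ("LVF", "LUF"),
    "N": ("NF", "NI"),
    "Z": ("ZF", "ZI"),
    "C": ("TF",),
    "TC": ("TF",),
}


def movst0(stf: dict, st0: dict, selected: set) -> dict:
    """Return the ST0 bits after MOVST0 with the given selected STF flags."""
    new_st0 = dict(st0)
    for bit, sources in ST0_SOURCES.items():
        chosen = [flag for flag in sources if flag in selected]
        if chosen:  # bit is left unchanged when none of its flags is selected
            new_st0[bit] = int(any(stf[flag] for flag in chosen))
    return new_st0


stf = {"TF": 0, "ZI": 0, "NI": 0, "ZF": 0, "NF": 1, "LUF": 0, "LVF": 1}
st0 = {"OV": 0, "N": 0, "Z": 1, "C": 1, "TC": 1}
print(movst0(stf, st0, {"ZF", "NF"}))  # N becomes 1, Z becomes 0, rest unchanged
```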

Restrictions Do not use the MOVST0 instruction in the delay slots for pipelined operations. Doing so

can yield invalid results. To avoid this, the proper number of NOPs or non-pipelined

instructions must be inserted before the MOVST0 operation.

; The following is INVALID

MPYF32 R2H, R1H, R0H ; 2 pipeline-cycle instruction (2p)

MOVST0 TF ; INVALID, do not use MOVST0 in a delay slot

; The following is VALID

MPYF32 R2H, R1H, R0H ; 2 pipeline-cycle instruction (2p)

NOP ; 1 delay cycle, R2H updated after this instruction

MOVST0 TF ; VALID

Flags This instruction modifies the following flags in the STF register:

Flag TF ZI NI ZF NF LUF LVF

Modified No No No No No Yes Yes

When the flags are moved to the C28x ST0 register, the LUF or LVF flags are

automatically cleared if selected.

Pipeline This is a single-cycle instruction.

Example Program flow is controlled by C28x instructions that read status flags in status

register 0 (ST0). If a decision must be based on a floating-point result, the

information in the STF register needs to be loaded into ST0 flags (Z,N,OV,TC,C) so that

the appropriate branch conditional instruction can be executed. The MOVST0 FLAG

instruction is used to load the current value of specified STF flags into the respective bits

of ST0. When this instruction executes, it will also clear the latched overflow and

underflow flags if those flags are specified.

Loop:

MOV32 R0H,*XAR4++

MOV32 R1H,*XAR3++

CMPF32 R1H, R0H

MOVST0 ZF, NF

BF Loop, GT ; Loop if (R1H > R0H)

See also MOVIZ RaH, #16FHiHex

MOVIZF32 RaH, #16FHi

MPYF32 RaH, RbH, RcH 32-bit Floating-Point Multiply

Operands

RaH floating-point destination register (R0H to R7H)

RbH floating-point source register (R0H to R7H)

RcH floating-point source register (R0H to R7H)

Opcode

LSW: 1110 0111 0000 0000

MSW: 0000 000c ccbb baaa

Description Multiply the contents of two floating-point registers.

RaH = RbH * RcH

Flags This instruction modifies the following flags in the STF register:

Flag TF ZI NI ZF NF LUF LVF

Modified No No No No No Yes Yes

The STF register flags are modified as follows:

• LUF = 1 if MPYF32 generates an underflow condition.

• LVF = 1 if MPYF32 generates an overflow condition.

Pipeline This is a 2 pipeline cycle (2p) instruction. That is:

MPYF32 RaH, RbH, RcH ; 2 pipeline cycles (2p)

NOP ; 1 cycle delay or non-conflicting instruction

; <-- MPYF32 completes, RaH updated

NOP

Any instruction in the delay slot must not use RaH as a destination register or use RaH

as a source operand.
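The delay-slot rule can be checked mechanically on a listing. The following Python sketch is a simplified model, not a TI tool: the function name find_hazards and the tuple format are hypothetical, only floating-point result registers are tracked, and a result of an N pipeline-cycle instruction is assumed to become usable N instructions after it issues.

```python
def find_hazards(program):
    """program: list of (mnemonic, dest, sources, pipeline_cycles) tuples.

    Returns the indices of instructions that read or write a register
    whose result is still in flight.
    """
    pending = {}   # register -> index of the first instruction allowed to use it
    hazards = []
    for i, (_, dest, sources, cycles) in enumerate(program):
        for reg in (dest, *sources):
            if reg is not None and pending.get(reg, 0) > i:
                hazards.append(i)
                break
        if dest is not None and cycles > 1:
            pending[dest] = i + cycles
    return hazards

valid = [
    ("MPYF32", "R0H", ("R1H", "R0H"), 2),
    ("MOVL", None, (), 1),             # delay slot, does not touch R0H
    ("MOV32", None, ("R0H",), 1),      # R0H is valid here
]
invalid = [
    ("MPYF32", "R0H", ("R1H", "R0H"), 2),
    ("MOV32", None, ("R0H",), 1),      # uses R0H inside the delay slot
]
print(find_hazards(valid), find_hazards(invalid))
```

The same check applies to any other 2p instruction in this chapter, since they all share the one-delay-slot rule.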

Example Calculate Y = A * B:

MOVL XAR4, #A

MOV32 R0H, *XAR4 ; Load R0H with A

MOVL XAR4, #B

MOV32 R1H, *XAR4 ; Load R1H with B

MPYF32 R0H,R1H,R0H ; Multiply A * B

MOVL XAR4, #Y

; <--MPYF32 complete

MOV32 *XAR4,R0H ; Save the result

See also SUBF32 RaH, RbH, RcH

SUBF32 RdH, ReH, RfH || MOV32 RaH, mem32

SUBF32 RdH, ReH, RfH || MOV32 mem32, RaH

NEGF32 RaH, RbH{, CNDF} Conditional Negation

Operands

RaH floating-point destination register (R0H to R7H)

RbH floating-point source register (R0H to R7H)

CNDF condition tested

Opcode LSW: 1110 0110 1010 CNDF

MSW: 0000 0000 00bb baaa

Description if (CNDF == true) {RaH = - RbH }

else {RaH = RbH }

CNDF is one of the following conditions:

Encode (1)   CNDF       Description                              STF Flags Tested
0000         NEQ        Not equal to zero                        ZF == 0
0001         EQ         Equal to zero                            ZF == 1
0010         GT         Greater than zero                        ZF == 0 AND NF == 0
0011         GEQ        Greater than or equal to zero            NF == 0
0100         LT         Less than zero                           NF == 1
0101         LEQ        Less than or equal to zero               ZF == 1 OR NF == 1
1010         TF         Test flag set                            TF == 1
1011         NTF        Test flag not set                        TF == 0
1100         LU         Latched underflow                        LUF == 1
1101         LV         Latched overflow                         LVF == 1
1110         UNC        Unconditional                            None
1111         UNCF (2)   Unconditional with flag modification     None

(1) Values not shown are reserved.

(2) This is the default operation if no CNDF field is specified. This condition will allow the ZF, NF, ZI, and NI flags to be modified when a conditional operation is executed. All other conditions will not modify these flags.
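The condition table can be exercised with a small Python model. This is a sketch only: the names CNDF, cmpf32_zero and negf32 are hypothetical, cmpf32_zero is a simplified stand-in for a compare against #0.0, and LEQ is modelled as ZF == 1 OR NF == 1 because a compare never sets both flags at once.

```python
CNDF = {
    "NEQ":  lambda f: f["ZF"] == 0,
    "EQ":   lambda f: f["ZF"] == 1,
    "GT":   lambda f: f["ZF"] == 0 and f["NF"] == 0,
    "GEQ":  lambda f: f["NF"] == 0,
    "LT":   lambda f: f["NF"] == 1,
    "LEQ":  lambda f: f["ZF"] == 1 or f["NF"] == 1,
    "TF":   lambda f: f["TF"] == 1,
    "NTF":  lambda f: f["TF"] == 0,
    "LU":   lambda f: f["LUF"] == 1,
    "LV":   lambda f: f["LVF"] == 1,
    "UNC":  lambda f: True,
    "UNCF": lambda f: True,
}

def cmpf32_zero(value):
    """Flags as a compare of value against 0.0 is assumed to leave them."""
    return {"ZF": int(value == 0.0), "NF": int(value < 0.0),
            "TF": 0, "LUF": 0, "LVF": 0}

def negf32(rb, flags, cndf="UNCF"):
    """RaH = -RbH if the condition holds, otherwise RaH = RbH."""
    return -rb if CNDF[cndf](flags) else rb

print(negf32(-6.0, cmpf32_zero(-6.0), "LT"))
print(negf32(20.0, cmpf32_zero(20.0), "GEQ"))
```

The two calls mirror the R4H and R5H cases in the example for this instruction.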

Flags This instruction modifies the following flags in the STF register:

Flag TF ZI NI ZF NF LUF LVF

Modified No No No Yes Yes No No

Pipeline This is a single-cycle instruction.

Example MOVIZF32 R0H, #5.0 ; R0H = 5.0 (0x40A00000)

MOVIZF32 R1H, #4.0 ; R1H = 4.0 (0x40800000)

MOVIZF32 R2H, #-1.5 ; R2H = -1.5 (0xBFC00000)

MPYF32 R4H, R1H, R2H ; R4H = -6.0

MPYF32 R5H, R0H, R1H ; R5H = 20.0

; <-- R4H valid

CMPF32 R4H, #0.0 ; NF = 1

; <-- R5H valid

NEGF32 R4H, R4H, LT ; if NF = 1, R4H = 6.0

CMPF32 R5H, #0.0 ; NF = 0

NEGF32 R5H, R5H, GEQ ; if NF = 0, R5H = -20.0

See also POP RB

RPTB #RSIZE, RC

RPTB #RSIZE, loc16

RESTORE Restore the Floating-Point Registers

Operands

none This instruction does not have any operands

Opcode LSW: 1110 0101 0110 0010

Description Restore the floating-point register set (R0H - R7H and STF) from their shadow registers.

The SAVE and RESTORE instructions should be used in high-priority interrupts, that is,

interrupts that cannot themselves be interrupted. In low-priority interrupt routines the

floating-point registers should be pushed onto the stack.

Restrictions The RESTORE instruction cannot be used in any delay slots for pipelined operations.

Doing so will yield invalid results. To avoid this, the proper number of NOPs or non-

pipelined instructions must be inserted before the RESTORE operation.

; The following is INVALID

MPYF32 R2H, R1H, R0H ; 2 pipeline-cycle instruction (2p)

RESTORE ; INVALID, do not use RESTORE in a delay slot

; The following is VALID

MPYF32 R2H, R1H, R0H ; 2 pipeline-cycle instruction (2p)

NOP ; 1 delay cycle, R2H updated after this instruction

RESTORE ; VALID

Flags Restoring the status register will overwrite all flags:

Flag TF ZI NI ZF NF LUF LVF

Modified Yes Yes Yes Yes Yes Yes Yes

Pipeline This is a single-cycle instruction.

Example The following example shows a complete context save and restore for a high-priority

interrupt. Note that the CPU automatically stores the following registers: ACC, P, XT,

ST0, ST1, IER, DP, AR0, AR1 and PC. If an interrupt is low priority (that is, it can be

interrupted), then push the floating point registers onto the stack instead of using the

SAVE and RESTORE operations.

; Interrupt Save

_HighestPriorityISR: ; Uninterruptable

ASP ; Align stack

PUSH RB ; Save RB register if used in the ISR

PUSH AR1H:AR0H ; Save other registers if used

PUSH XAR2

PUSH XAR3

PUSH XAR4

PUSH XAR5

PUSH XAR6

PUSH XAR7

PUSH XT

SPM 0 ; Set default C28 modes

CLRC AMODE

CLRC PAGE0,OVM

SAVE RNDF32=1 ; Save all FPU registers

... ; set default FPU modes

...

; Interrupt Restore

...

RESTORE ; Restore all FPU registers

POP XT ; restore other registers

POP XAR7

POP XAR6

POP XAR5

POP XAR4

POP XAR3

POP XAR2

POP AR1H:AR0H

POP RB ; restore RB register

NASP ; un-align stack

IRET ; return from interrupt

See also POP RB

PUSH RB

RPTB label, RC

RPTB label, #RC Repeat a Block of Code

Operands

label This label is used by the assembler to determine the end of the repeat block and to calculate RSIZE.

This label should be placed immediately after the last instruction included in the repeat block.

#RC 16-bit immediate value for the repeat count

Opcode LSW: 1011 0101 1bbb bbbb

MSW: cccc cccc cccc cccc

Description Repeat a block of code. The repeat count is specified as an immediate value.

Restrictions

• The maximum block size is ≤127 16-bit words.

• An even aligned block must be ≥9 16-bit words.

• An odd aligned block must be ≥8 16-bit words.

• Interrupts must be disabled when saving or restoring the RB register.

• Repeat blocks cannot be nested.

• Any discontinuity type operation is not allowed inside a repeat block. This includes all

call, branch or TRAP instructions. Interrupts are allowed.

• Conditional execution operations are allowed.

Flags This instruction does not affect any flags in the floating-point unit:

Flag TF ZI NI ZF NF LUF LVF

Modified No No No No No No No

Pipeline This instruction takes one cycle on the first iteration and zero cycles thereafter. No

special pipeline alignment is required.

Example The minimum size for the repeat block is 9 words if the block is even aligned and 8

words if the block is odd aligned. If you have a block of 8 words, as in the following

example, you can make sure the block is odd aligned by preceding it with a .align 2

directive and a NOP instruction. The .align 2 directive will make sure the NOP is even

aligned. Since a NOP is a 16-bit instruction the RPTB will be odd aligned. For blocks of

9 or more words, this is not required.

; Repeat Block (Interruptible)

;

; find the largest element and put its address in XAR6

.align 2

NOP

RPTB VECTOR_MAX_END, #(4-1) ; Execute the block 4 times

MOVL ACC,XAR0

MOV32 R1H,*XAR0++ ; 8 or 9 words <= block size <= 127 words

MAXF32 R0H,R1H

MOVST0 NF,ZF

MOVL XAR6,ACC,LT

VECTOR_MAX_END: ; RE indicates the end address

; RA is cleared
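The size limits from the Restrictions list can be expressed as a one-line check. This Python sketch is illustrative only: the function name rptb_block_ok is hypothetical, and the alignment is assumed to be decided by the word address of the first instruction in the block. Because RPTB itself occupies two words, an odd-aligned RPTB is followed by an odd-aligned block.

```python
def rptb_block_ok(start_address, size_words):
    """Check a repeat block against the listed size restrictions.

    start_address: word address of the first instruction in the block
                   (assumed here to decide even or odd alignment).
    size_words:    block length in 16-bit words.
    """
    minimum = 9 if start_address % 2 == 0 else 8
    return minimum <= size_words <= 127

print(rptb_block_ok(0x8001, 8), rptb_block_ok(0x8000, 8))
```

An 8-word block passes only at an odd address, which is why the example places a .align 2 directive and a NOP ahead of the RPTB.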

When an interrupt is taken the repeat active (RA) bit in the RB register is automatically

copied to the repeat active shadow (RAS) bit. When the interrupt exits, the RAS bit is

automatically copied back to the RA bit. This allows the hardware to keep track of whether a

repeat loop was active whenever an interrupt is taken and restore that state

automatically.

A high priority interrupt is defined as an interrupt that cannot itself be interrupted. In a

high priority interrupt, the RB register must be saved if a RPTB block is used within the

interrupt. If the interrupt service routine does not include a RPTB block, then you do not

have to save the RB register.

; Repeat Block within a High-Priority Interrupt (Non-Interruptible)

;

; Interrupt: ; RAS = RA, RA = 0

...

PUSH RB ; Save RB register only if a RPTB block is used in the ISR

...

RPTB BlockEnd, #5 ; Execute the block 5+1 times

...

BlockEnd ; End of block to be repeated

...

POP RB ; Restore RB register

...

IRET ; RA = RAS, RAS = 0

A low-priority interrupt is defined as an interrupt that allows itself to be interrupted. The

RB register must always be saved and restored in a low-priority interrupt. Save the RB

register before interrupts are enabled, and disable interrupts before restoring it.

; Repeat Block within a Low-Priority Interrupt (Interruptible)

;

; Interrupt: ; RAS = RA, RA = 0

...

PUSH RB ; Always save RB register

...

CLRC INTM ; Enable interrupts only after saving RB

...

... ; ISR may or may not include a RPTB block

...

SETC INTM ; Disable interrupts before restoring RB

...

POP RB ; Always restore RB register

...

IRET ; RA = RAS, RAS = 0

See also POP RB

PUSH RB

RPTB #RSIZE, loc16

SAVE FLAG, VALUE Save Register Set to Shadow Registers and Execute SETFLG

Operands

FLAG 11 bit mask indicating which floating-point status flags to change.

VALUE 11 bit mask indicating the flag value; 0 or 1.

Opcode LSW: 1110 0110 01FF FFFF

MSW: FFFF FVVV VVVV VVVV

Description This operation copies the current working floating-point register set (R0H to R7H and

STF) to the shadow register set and combines the SETFLG FLAG, VALUE operation in

a single cycle. The status register is copied to the shadow register before the flag values

are changed. The STF[SHDWS] flag is set to 1 when the SAVE command has been

executed. The SAVE and RESTORE instructions should be used in high-priority

interrupts, that is, interrupts that cannot themselves be interrupted. In low-priority

interrupt routines the floating-point registers should be pushed onto the stack.

Restrictions Do not use the SAVE instruction in the delay slots for pipelined operations. Doing so can

yield invalid results. To avoid this, the proper number of NOPs or non-pipelined

instructions must be inserted before the SAVE operation.

; The following is INVALID

MPYF32 R2H, R1H, R0H ; 2 pipeline-cycle instruction (2p)

SAVE RNDF32=1 ; INVALID, do not use SAVE in a delay slot

; The following is VALID

MPYF32 R2H, R1H, R0H ; 2 pipeline-cycle instruction (2p)

NOP ; 1 delay cycle, R2H updated after this instruction

SAVE RNDF32=1 ; VALID

Flags This instruction modifies the following flags in the STF register:

Flag TF ZI NI ZF NF LUF LVF

Modified Yes Yes Yes Yes Yes Yes Yes

Any flag can be modified by this instruction.

Pipeline This is a single-cycle instruction.

Example To make it easier and more legible, the assembler will accept a FLAG=VALUE syntax for

the SETFLG operation as shown below:

SAVE RNDF32=0, TF=1, ZF=0 ; FLAG = 01001001000, VALUE = X0XX1XX0XXX

MOVST0 TF, ZF, LUF ; Copy the indicated flags to ST0

; Note: X means this flag will not be modified.

; The assembler will set these X values to 0.
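The FLAG and VALUE masks can be derived from the FLAG=VALUE assignments with a short script. The Python sketch below is illustrative, not the assembler's implementation: the name encode_flags is hypothetical, and the STF bit positions in STF_BITS are an assumption (RNDF32 in bit 9, TF in bit 6, ZI, NI, ZF, NF, LUF and LVF in bits 5 down to 0) that should be checked against the STF register description.

```python
# Assumed STF bit positions; verify against the STF register description.
STF_BITS = {"RNDF32": 9, "TF": 6, "ZI": 5, "NI": 4,
            "ZF": 3, "NF": 2, "LUF": 1, "LVF": 0}

def encode_flags(**assignments):
    """Return (flag_mask, value_mask) as 11-bit integers."""
    flag = value = 0
    for name, bit_value in assignments.items():
        if bit_value not in (0, 1):
            raise ValueError(f"{name} must be 0 or 1")
        flag |= 1 << STF_BITS[name]           # mark the flag as selected
        value |= bit_value << STF_BITS[name]  # unselected bits stay 0
    return flag, value

flag, value = encode_flags(RNDF32=0, TF=1, ZF=0)
print(f"FLAG = {flag:011b}, VALUE = {value:011b}")
```

Bits that are not selected in FLAG are left at 0 in VALUE, which matches the note that the assembler sets the X positions to 0.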

The following example shows a complete context save and restore for a high priority

interrupt. Note that the CPU automatically stores the following registers: ACC, P, XT,

ST0, ST1, IER, DP, AR0, AR1 and PC.

_HighestPriorityISR:

ASP ;Align stack

PUSH RB ; Save RB register if used in the ISR

PUSH AR1H:AR0H ; Save other registers if used

PUSH XAR2

PUSH XAR3

PUSH XAR4

PUSH XAR5

PUSH XAR6

PUSH XAR7

PUSH XT

SPM 0 ; Set default C28 modes

CLRC AMODE

CLRC PAGE0,OVM

SAVE RNDF32=0 ; Save all FPU registers

... ; set default FPU modes

...

RESTORE ; Restore all FPU registers

POP XT ; restore other registers

POP XAR7

POP XAR6

POP XAR5

POP XAR4

POP XAR3

POP XAR2

POP AR1H:AR0H

POP RB ; restore RB register

NASP ; un-align stack

IRET ; return from interrupt

See also SUBF32 RaH, RbH, RcH

SUBF32 RdH, ReH, RfH || MOV32 RaH, mem32

SUBF32 RdH, ReH, RfH || MOV32 mem32, RaH

MPYF32 RaH, RbH, RcH || SUBF32 RdH, ReH, RfH

SUBF32 RdH, ReH, RfH || MOV32 RaH, mem32 32-bit Floating-Point Subtraction with Parallel Move

Operands

RdH floating-point destination register (R0H to R7H) for the SUBF32 operation

RdH cannot be the same register as RaH

ReH floating-point source register (R0H to R7H) for the SUBF32 operation

RfH floating-point source register (R0H to R7H) for the SUBF32 operation

RaH floating-point destination register (R0H to R7H) for the MOV32 operation

RaH cannot be the same register as RdH

mem32 pointer to 32-bit source memory location for the MOV32 operation

Opcode LSW: 1110 0011 0010 fffe

MSW: eedd daaa mem32

Description Subtract the contents of two floating-point registers and move from memory to a floating-

point register.

RdH = ReH - RfH, RaH = [mem32]

Restrictions The destination register for the SUBF32 and the MOV32 must be unique. That is, RaH

cannot be the same register as RdH.

Flags This instruction modifies the following flags in the STF register:

Flag TF ZI NI ZF NF LUF LVF

Modified No Yes Yes Yes Yes Yes Yes

The STF register flags are modified as follows:

• LUF = 1 if SUBF32 generates an underflow condition.

• LVF = 1 if SUBF32 generates an overflow condition.

The MOV32 Instruction will set the NF, ZF, NI and ZI flags as follows:

NF = RaH(31);

ZF = 0;

if(RaH(30:23) == 0) { ZF = 1; NF = 0; }

NI = RaH(31);

ZI = 0;

if(RaH(31:0) == 0) ZI = 1;
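The flag rules for the parallel MOV32 translate directly into code. The Python sketch below follows the pseudocode line by line; the function name mov32_flags and the dictionary return type are hypothetical choices for illustration.

```python
def mov32_flags(bits):
    """Flags produced by loading the 32-bit pattern `bits` into RaH."""
    sign = (bits >> 31) & 1
    exponent = (bits >> 23) & 0xFF     # RaH(30:23)
    nf, zf = sign, 0
    if exponent == 0:                  # zero exponent is treated as float zero
        zf, nf = 1, 0
    ni = sign                          # integer view: sign bit only
    zi = 1 if (bits & 0xFFFFFFFF) == 0 else 0
    return {"NF": nf, "ZF": zf, "NI": ni, "ZI": zi}

print(mov32_flags(0xBFC00000))   # -1.5
print(mov32_flags(0x80000000))   # sign bit set, exponent zero
```

The second call shows how the floating-point and integer views can disagree: the pattern counts as zero for ZF but as negative and non-zero for NI and ZI.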

Pipeline SUBF32 is a 2 pipeline-cycle instruction (2p) and MOV32 takes a single cycle. That is:

SUBF32 RdH, ReH, RfH ; 2 pipeline cycles (2p)

|| MOV32 RaH, mem32 ; 1 cycle

; <-- MOV32 completes, RaH updated

NOP ; 1 cycle delay or non-conflicting instruction

; <-- SUBF32 completes, RdH updated

NOP

Any instruction in the delay slot must not use RdH as a destination register or as a

source operand.

Example

MOVL XAR1, #0xC000 ; XAR1 = 0xC000

SUBF32 R0H, R1H, R2H ; (A) R0H = R1H - R2H

|| MOV32 R3H, *XAR1 ;

; <-- R3H valid

MOV32 R4H, *+XAR1[2] ;

; <-- (A) completes, R0H valid, R4H valid

ADDF32 R5H, R4H, R3H ; (B) R5H = R4H + R3H

|| MOV32 *+XAR1[4], R0H ;

; <-- R0H stored

MOVL XAR2, #0xE000 ;

; <-- (B) completes, R5H valid

MOV32 *XAR2, R5H ;

; <-- R5H stored

See also SUBF32 RaH, RbH, RcH

SUBF32 RaH, #16FHi, RbH

MPYF32 RaH, RbH, RcH || SUBF32 RdH, ReH, RfH

SUBF32 RdH, ReH, RfH || MOV32 mem32, RaH 32-bit Floating-Point Subtraction with Parallel Move

Operands

RdH floating-point destination register (R0H to R7H) for the SUBF32 operation

ReH floating-point source register (R0H to R7H) for the SUBF32 operation

RfH floating-point source register (R0H to R7H) for the SUBF32 operation

mem32 pointer to 32-bit destination memory location for the MOV32 operation

RaH floating-point source register (R0H to R7H) for the MOV32 operation

Opcode LSW: 1110 0000 0010 fffe

MSW: eedd daaa mem32

Description Subtract the contents of two floating-point registers and move from a floating-point

register to memory.

RdH = ReH - RfH,

[mem32] = RaH

Flags This instruction modifies the following flags in the STF register:

Flag TF ZI NI ZF NF LUF LVF

Modified No No No No No Yes Yes

The STF register flags are modified as follows:

• LUF = 1 if SUBF32 generates an underflow condition.

• LVF = 1 if SUBF32 generates an overflow condition.

Pipeline SUBF32 is a 2 pipeline-cycle instruction (2p) and MOV32 takes a single cycle. That is:

SUBF32 RdH, ReH, RfH ; 2 pipeline cycles (2p)

|| MOV32 mem32, RaH ; 1 cycle

; <-- MOV32 completes, mem32 updated

NOP ; 1 cycle delay or non-conflicting instruction

; <-- SUBF32 completes, RdH updated

NOP

Any instruction in the delay slot must not use RdH as a destination register or as a

source operand.

Example ADDF32 R3H, R6H, R4H ; (A) R3H = R6H + R4H and R7H = I3

|| MOV32 R7H, *-SP[2] ;

; <-- R7H valid

SUBF32 R6H, R6H, R4H ; (B) R6H = R6H - R4H

; <-- ADDF32 (A) completes, R3H valid

SUBF32 R3H, R1H, R7H ; (C) R3H = R1H - R7H and store R3H (A)

|| MOV32 *+XAR5[2], R3H ;

; <-- SUBF32 (B) completes, R6H valid

; <-- MOV32 completes, (A) stored

ADDF32 R4H, R7H, R1H ; (D) R4H = R7H + R1H and store R6H (B)

|| MOV32 *+XAR5[6], R6H ;

; <-- SUBF32 (C) completes, R3H valid

; <-- MOV32 completes, (B) stored

MOV32 *+XAR5[0], R3H ; store R3H (C)

; <-- MOV32 completes, (C) stored

; <-- ADDF32 (D) completes, R4H valid

MOV32 *+XAR5[4], R4H ; store R4H (D)

; <-- MOV32 completes, (D) stored

See also SUBF32 RaH, RbH, RcH

SUBF32 RaH, #16FHi, RbH

SUBF32 RdH, ReH, RfH || MOV32 RaH, mem32

MPYF32 RaH, RbH, RcH || SUBF32 RdH, ReH, RfH

SWAPF RaH, RbH{, CNDF} Conditional Swap

Operands

RaH floating-point register (R0H to R7H)

RbH floating-point register (R0H to R7H)

CNDF condition tested

Opcode LSW: 1110 0110 1110 CNDF

MSW: 0000 0000 00bb baaa

Description Conditional swap of RaH and RbH.

if (CNDF == true) swap RaH and RbH

CNDF is one of the following conditions:

Encode (1)   CNDF       Description                              STF Flags Tested
0000         NEQ        Not equal to zero                        ZF == 0
0001         EQ         Equal to zero                            ZF == 1
0010         GT         Greater than zero                        ZF == 0 AND NF == 0
0011         GEQ        Greater than or equal to zero            NF == 0
0100         LT         Less than zero                           NF == 1
0101         LEQ        Less than or equal to zero               ZF == 1 OR NF == 1
1010         TF         Test flag set                            TF == 1
1011         NTF        Test flag not set                        TF == 0
1100         LU         Latched underflow                        LUF == 1
1101         LV         Latched overflow                         LVF == 1
1110         UNC        Unconditional                            None
1111         UNCF (2)   Unconditional with flag modification     None

(1) Values not shown are reserved.

(2) This is the default operation if no CNDF field is specified. This condition will allow the ZF, NF, ZI, and NI flags to be modified when a conditional operation is executed. All other conditions will not modify these flags.

Flags This instruction modifies the following flags in the STF register:

Flag TF ZI NI ZF NF LUF LVF

Modified No No No No No No No

No flags affected

Pipeline This is a single-cycle instruction.

Example ;find the largest element and put it in R1H

MOVL XAR1, #0xB000 ;

MOV32 R1H, *XAR1 ; Initialize R1H

.align 2

NOP

RPTB LOOP_END, #(10-1); Execute the block 10 times

MOV32 R2H, *XAR1++ ; Update R2H with next element

CMPF32 R2H, R1H ; Compare R2H with R1H

SWAPF R1H, R2H, GT ; Swap R1H and R2H if R2 > R1

NOP ; For minimum repeat block size

LOOP_END:

TESTTF CNDF Test STF Register Flag Condition

Operands

CNDF condition to test

Opcode LSW: 1110 0101 1000 CNDF

Description Test the floating-point condition and if true, set the TF flag. If the condition is false, clear

the TF flag. This is useful for temporarily storing a condition for later use.

if (CNDF == true) TF = 1; else TF = 0;

CNDF is one of the following conditions:

Encode (1)   CNDF       Description                              STF Flags Tested
0000         NEQ        Not equal to zero                        ZF == 0
0001         EQ         Equal to zero                            ZF == 1
0010         GT         Greater than zero                        ZF == 0 AND NF == 0
0011         GEQ        Greater than or equal to zero            NF == 0
0100         LT         Less than zero                           NF == 1
0101         LEQ        Less than or equal to zero               ZF == 1 OR NF == 1
1010         TF         Test flag set                            TF == 1
1011         NTF        Test flag not set                        TF == 0
1100         LU         Latched underflow                        LUF == 1
1101         LV         Latched overflow                         LVF == 1
1110         UNC        Unconditional                            None
1111         UNCF (2)   Unconditional with flag modification     None

(1) Values not shown are reserved.

(2) This is the default operation if no CNDF field is specified. This condition will allow the ZF, NF, ZI, and NI flags to be modified when a conditional operation is executed. All other conditions will not modify these flags.

Flags This instruction modifies the following flags in the STF register:

Flag TF ZI NI ZF NF LUF LVF

Modified Yes No No No No No No

TF = 0; if (CNDF == true) TF = 1;

Note: If (CNDF == UNC or UNCF), the TF flag will be set to 1.

Pipeline This is a single-cycle instruction.

Example CMPF32 R0H, #0.0 ; Compare R0H against 0

TESTTF LT ; Set TF if R0H less than 0 (NF == 1)

ABSF32 R0H, R0H ; Get the absolute value of R0H

; Perform calculations based on ABS R0H

MOVST0 TF ; Copy TF to TC in ST0

SBF End, NTC ; Branch to end if TF was not set

NEGF32 R0H, R0H

End

See also

UI16TOF32 RaH, mem16 Convert unsigned 16-bit integer to 32-bit floating-point value

Operands

RaH floating-point destination register (R0H to R7H)

mem16 pointer to 16-bit source memory location

Opcode LSW: 1110 0010 1100 0100

MSW: 0000 0aaa mem16

Description RaH = UI16ToF32[mem16]

Flags This instruction does not affect any flags:

Flag TF ZI NI ZF NF LUF LVF

Modified No No No No No No No

Pipeline This is a 2 pipeline cycle (2p) instruction. That is:

UI16TOF32 RaH, mem16 ; 2 pipeline cycles (2p)

NOP ; 1 cycle delay or non-conflicting instruction

; <-- UI16TOF32 completes, RaH updated

NOP

Any instruction in the delay slot must not use RaH as a destination register or as a

source operand.

Example ; float32 y,m,b;

; AdcRegs.RESULT0 is an unsigned int

; Calculate: y = (float)AdcRegs.RESULT0 * m + b;

;

MOVW DP, #0x01C4

UI16TOF32 R0H, @8 ; R0H = (float)AdcRegs.RESULT0

MOV32 R1H, *-SP[6] ; R1H = M

; <-- Conversion complete, R0H valid

MPYF32 R0H, R1H, R0H ; R0H = (float)X * M

MOV32 R1H, *-SP[8] ; R1H = B

; <-- MPYF32 complete, R0H valid

ADDF32 R0H, R0H, R1H ; R0H = Y = (float)X * M + B

NOP

; <-- ADDF32 complete, R0H valid

MOV32 *-[SP], R0H ; Store Y

See also F32TOI16 RaH, RbH

F32TOI16R RaH, RbH

F32TOUI16 RaH, RbH

F32TOUI16R RaH, RbH

I16TOF32 RaH, RbH

I16TOF32 RaH, mem16

UI16TOF32 RaH, RbH

UI16TOF32 RaH, RbH Convert unsigned 16-bit integer to 32-bit floating-point value

Operands

RaH floating-point destination register (R0H to R7H)

RbH floating-point source register (R0H to R7H)

Opcode LSW: 1110 0110 1000 1111

MSW: 0000 0000 00bb baaa

Description RaH = UI16ToF32[RbH]

Flags This instruction does not affect any flags:

Flag TF ZI NI ZF NF LUF LVF

Modified No No No No No No No

Pipeline This is a 2 pipeline cycle (2p) instruction. That is:

UI16TOF32 RaH, RbH ; 2 pipeline cycles (2p)

NOP ; 1 cycle delay or non-conflicting instruction

; <-- UI16TOF32 completes, RaH updated

NOP

Any instruction in the delay slot must not use RaH as a destination register or as a

source operand.

Example MOVXI R5H, #0x800F ; R5H[15:0] = 32783 (0x800F)

UI16TOF32 R6H, R5H ; R6H = UI16TOF32 (R5H[15:0])

NOP ; 1 cycle delay for UI16TOF32 to complete

; R6H = 32783.0 (0x47000F00)
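The result pattern in the example can be reproduced off-target. The Python sketch below is an illustration only: the name ui16_to_f32_bits is hypothetical, and it assumes, as the example comment indicates, that only RbH[15:0] takes part in the conversion.

```python
import struct

def ui16_to_f32_bits(reg):
    """Convert the low 16 bits of reg (unsigned) to float32; return the bit pattern."""
    value = float(reg & 0xFFFF)
    return struct.unpack(">I", struct.pack(">f", value))[0]

print(f"0x{ui16_to_f32_bits(0x800F):08X}")
```

Every 16-bit unsigned value fits in the 24-bit significand of a single-precision number, so this conversion is always exact.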

See also F32TOI16 RaH, RbH

F32TOI16R RaH, RbH

F32TOUI16 RaH, RbH

F32TOUI16R RaH, RbH

I16TOF32 RaH, RbH

I16TOF32 RaH, mem16

UI16TOF32 RaH, mem16

UI32TOF32 RaH, mem32 Convert Unsigned 32-bit Integer to 32-bit Floating-Point Value

Operands

RaH floating-point destination register (R0H to R7H)

mem32 pointer to 32-bit source memory location

Opcode LSW: 1110 0010 1000 0100

MSW: 0000 0aaa mem32

Description RaH = UI32ToF32[mem32]

Flags This instruction does not affect any flags:

Flag TF ZI NI ZF NF LUF LVF

Modified No No No No No No No

Pipeline This is a 2 pipeline cycle (2p) instruction. That is:

UI32TOF32 RaH, mem32 ; 2 pipeline cycles (2p)

NOP ; 1 cycle delay or non-conflicting instruction

; <-- UI32TOF32 completes, RaH updated

NOP

Any instruction in the delay slot must not use RaH as a destination register or as a

source operand.

Example ; unsigned long X

; float Y, M, B

; ...

; Calculate Y = (float)X * M + B

;

UI32TOF32 R0H, *-SP[2] ; R0H = (float)X

MOV32 R1H, *-SP[6] ; R1H = M

; <-- Conversion complete, R0H valid

MPYF32 R0H, R1H, R0H ; R0H = (float)X * M

MOV32 R1H, *-SP[8] ; R1H = B

; <-- MPYF32 complete, R0H valid

ADDF32 R0H, R0H, R1H ; R0H = Y = (float)X * M + B

NOP

; <-- ADDF32 complete, R0H valid

MOV32 *-[SP], R0H ; Store Y

See also F32TOI32 RaH, RbH

F32TOUI32 RaH, RbH

I32TOF32 RaH, mem32

I32TOF32 RaH, RbH

UI32TOF32 RaH, RbH

UI32TOF32 RaH, RbH Convert Unsigned 32-bit Integer to 32-bit Floating-Point Value

Operands

RaH floating-point destination register (R0H to R7H)

RbH floating-point source register (R0H to R7H)

Opcode LSW: 1110 0110 1000 1011

MSW: 0000 0000 00bb baaa

Description RaH = UI32ToF32[RbH]

Flags This instruction does not affect any flags:

Flag TF ZI NI ZF NF LUF LVF

Modified No No No No No No No

Pipeline This is a 2 pipeline cycle (2p) instruction. That is:

UI32TOF32 RaH, RbH ; 2 pipeline cycles (2p)

NOP ; 1 cycle delay or non-conflicting instruction

; <-- UI32TOF32 completes, RaH updated

NOP

Any instruction in the delay slot must not use RaH as a destination register or as a

source operand.

Example MOVIZ R3H, #0x8000 ; R3H[31:16] = 0x8000

MOVXI R3H, #0x1111 ; R3H[15:0] = 0x1111

; R3H = 2147488017

UI32TOF32 R4H, R3H ; R4H = UI32TOF32 (R3H)

NOP ; 1 cycle delay for UI32TOF32 to complete

; R4H = 2147488017.0 (0x4F000011)
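A 32-bit integer can carry more significant bits than the 24-bit significand of a single-precision value, so the converted result may be rounded. The Python sketch below reproduces the example off-target; the name ui32_to_f32 is hypothetical, and Python's round-to-nearest conversion is assumed, which may differ from the hardware result when the RNDF32 mode selects truncation (for this input both modes give the same pattern).

```python
import struct

def ui32_to_f32(value):
    """Return (bit_pattern, represented_value) for an unsigned 32-bit integer."""
    packed = struct.pack(">f", float(value & 0xFFFFFFFF))
    return struct.unpack(">I", packed)[0], struct.unpack(">f", packed)[0]

bits, represented = ui32_to_f32(0x80001111)
print(f"0x{bits:08X} represents {represented}")
```

The pattern matches the example, while the value it represents is the nearest multiple of 256 at this magnitude rather than the original integer.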

See also F32TOI32 RaH, RbH

F32TOUI32 RaH, RbH

I32TOF32 RaH, mem32

I32TOF32 RaH, RbH

UI32TOF32 RaH, mem32

ZERO RaH Zero the Floating-Point Register RaH

Operands

RaH floating-point register (R0H to R7H)

Opcode LSW: 1110 0101 1001 0aaa

Description Zero the indicated floating-point register:

RaH = 0

Flags This instruction modifies the following flags in the STF register:

Flag TF ZI NI ZF NF LUF LVF

Modified No No No No No No No

No flags affected.

Pipeline This is a single-cycle instruction.

Example ;for(i = 0; i < n; i++)

;{

; real += (x[2*i] * y[2*i]) - (x[2*i+1] * y[2*i+1]);

; imag += (x[2*i] * y[2*i+1]) + (x[2*i+1] * y[2*i]);

;}

;Assume AR7 = n-1

ZERO R4H ; R4H = real = 0

ZERO R5H ; R5H = imag = 0

LOOP

MOV AL, AR7

MOV ACC, AL << 2

MOV AR0, ACC

MOV32 R0H, *+XAR4[AR0] ; R0H = x[2*i]

MOV32 R1H, *+XAR5[AR0] ; R1H = y[2*i]

ADD AR0, #2

MPYF32 R6H, R0H, R1H ; R6H = x[2*i] * y[2*i]

|| MOV32 R2H, *+XAR4[AR0] ; R2H = x[2*i+1]

MPYF32 R1H, R1H, R2H ; R1H = y[2*i] * x[2*i+1]

|| MOV32 R3H, *+XAR5[AR0] ; R3H = y[2*i+1]

MPYF32 R2H, R2H, R3H ; R2H = x[2*i+1] * y[2*i+1]

|| ADDF32 R4H, R4H, R6H ; R4H += x[2*i] * y[2*i]

MPYF32 R0H, R0H, R3H ; R0H = x[2*i] * y[2*i+1]

|| ADDF32 R5H, R5H, R1H ; R5H += y[2*i] * x[2*i+1]

SUBF32 R4H, R4H, R2H ; R4H -= x[2*i+1] * y[2*i+1]

ADDF32 R5H, R5H,R0H ; R5H += x[2*i] * y[2*i+1]

BANZ LOOP , AR7--
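When porting a loop like this one, a host-side reference for the commented C loop gives something to compare the FPU results against. The Python sketch below is such a reference under stated assumptions: the name complex_mac is hypothetical, x and y hold interleaved real and imaginary parts as in the comment, and Python computes in double precision, so the low-order bits can differ from the single-precision FPU result.

```python
def complex_mac(x, y, n):
    """Accumulate the complex products x[i] * y[i] for i in 0..n-1.

    x and y are flat sequences of interleaved (real, imag) pairs.
    """
    real = imag = 0.0
    for i in range(n):
        real += x[2 * i] * y[2 * i] - x[2 * i + 1] * y[2 * i + 1]
        imag += x[2 * i] * y[2 * i + 1] + x[2 * i + 1] * y[2 * i]
    return real, imag

print(complex_mac([1.0, 2.0], [3.0, 4.0], 1))
```

With one element the result equals the ordinary complex product (1 + 2j) * (3 + 4j).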

See also POP RB

164 C28 Viterbi, Complex Math and CRC Unit-II (VCU-II) SPRUHS1A–March 2014–Revised December 2015

PUSH RB — Push the RB Register onto the Stack

RPTB label, loc16

RPTB label, #RC

RPTB label, loc16 Repeat A Block of Code

Operands

label This label is used by the assembler to determine the end of the repeat block and to calculate RSIZE.

This label should be placed immediately after the last instruction included in the repeat block.

loc16 16-bit location for the repeat count value.

Opcode LSW: 1011 0101 0bbb bbbb

MSW: 0000 0000 loc16

Description Initialize repeat block loop, repeat count from [loc16]

Restrictions

• The maximum block size is ≤127 16-bit words.

• An even aligned block must be ≥9 16-bit words.

• An odd aligned block must be ≥8 16-bit words.

• Interrupts must be disabled when saving or restoring the RB register.

• Repeat blocks cannot be nested.

• Any discontinuity type operation is not allowed inside a repeat block. This includes all

call, branch or TRAP instructions. Interrupts are allowed.

• Conditional execution operations are allowed.

Flags This instruction does not affect any flags in the VSTATUS register.

Pipeline This instruction takes four cycles on the first iteration and zero cycles thereafter. No

special pipeline alignment is required.

Example The minimum size for the repeat block is 9 words if the block is even aligned and 8

words if the block is odd aligned. If you have a block of 8 words, as in the following

example, you can make sure the block is odd aligned by preceding it with a .align 2

directive and a NOP instruction. The .align 2 directive will make sure the NOP is even

aligned. Since a NOP is a 16-bit instruction the RPTB will be odd aligned. For blocks of

9 or more words, this is not required.

; Repeat Block of 8 Words (Interruptible)

;

; Note: This example makes use of floating-point (C28x+FPU) instructions

;

; find the largest element and put its address in XAR6

.align 2

NOP

RPTB _VECTOR_MAX_END, AR7

; Execute the block AR7+1 times

MOVL ACC,XAR0

MOV32 R1H,*XAR0++ ; min size = 8, 9 words

MAXF32 R0H,R1H ; max size = 127 words

MOVST0 NF,ZF

MOVL XAR6,ACC,LT

_VECTOR_MAX_END: ; label indicates the end

; RA is cleared
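The size restrictions listed for this instruction can be expressed as a small check. The following C sketch is only an illustration: the function name rptb_block_size_ok and its parameters are hypothetical, and it assumes that "even aligned" means the block starts at an even 16-bit word address.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper (not part of any TI tool): returns true if a repeat
 * block that starts at word address block_start and spans size_words
 * 16-bit words satisfies the listed restrictions:
 *   - at most 127 words,
 *   - at least 9 words when even aligned,
 *   - at least 8 words when odd aligned. */
bool rptb_block_size_ok(uint32_t block_start, unsigned size_words)
{
    bool even_aligned = (block_start % 2u) == 0u;
    unsigned min_words = even_aligned ? 9u : 8u;

    return size_words >= min_words && size_words <= 127u;
}
```

Under these assumptions an 8-word block passes only at an odd address, which is why the example places a .align 2 directive and a NOP ahead of the RPTB.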

When an interrupt is taken, the repeat active (RA) bit in the RB register is automatically copied to the repeat active shadow (RAS) bit. When the interrupt exits, the RAS bit is automatically copied back to the RA bit. This allows the hardware to keep track of whether a repeat loop was active when the interrupt was taken and to restore that state automatically.

A high priority interrupt is defined as an interrupt that cannot itself be interrupted. In a

high priority interrupt, the RB register must be saved if a RPTB block is used within the


interrupt. If the interrupt service routine does not include a RPTB block, then you do not

have to save the RB register.

; Repeat Block within a High-Priority Interrupt (Non-Interruptible)

;

; Interrupt: ; RAS = RA, RA = 0

...

PUSH RB ; Save RB register only if a RPTB block is used in the ISR

...

RPTB _BlockEnd, AL ; Execute the block AL+1 times

...

_BlockEnd ; End of block to be repeated

...

POP RB ; Restore RB register ...

IRET ; RA = RAS, RAS = 0

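A low-priority interrupt is defined as an interrupt that allows itself to be interrupted. The RB register must always be saved and restored in a low-priority interrupt. The RB register must be saved before interrupts are enabled, and interrupts must be disabled before the RB register is restored.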

; Repeat Block within a Low-Priority Interrupt (Interruptible)

;

; Interrupt:

; RAS = RA, RA = 0

...

PUSH RB ; Always save RB register

...

CLRC INTM ; Enable interrupts only after saving RB

...

; ISR may or may not include a RPTB block

...

SETC INTM ; Disable interrupts before restoring RB

...

POP RB ; Always restore RB register

...

IRET ; RA = RAS, RAS = 0

See also POP RB

PUSH RB

RPTB label, #RC


RPTB label, #RC Repeat a Block of Code

Operands

label This label is used by the assembler to determine the end of the repeat block and to calculate RSIZE.

This label should be placed immediately after the last instruction included in the repeat block.

#RC 16-bit immediate value for the repeat count.

Opcode LSW: 1011 0101 1bbb bbbb

MSW: cccc cccc cccc cccc

Description Repeat a block of code. The repeat count is specified as an immediate value.

Restrictions

• The maximum block size is ≤127 16-bit words.

• An even aligned block must be ≥9 16-bit words.

• An odd aligned block must be ≥8 16-bit words.

• Interrupts must be disabled when saving or restoring the RB register.

• Repeat blocks cannot be nested.

• Any discontinuity type operation is not allowed inside a repeat block. This includes all

call, branch or TRAP instructions. Interrupts are allowed.

• Conditional execution operations are allowed.

Flags This instruction does not affect any flags in the VSTATUS register.

Pipeline This instruction takes one cycle on the first iteration and zero cycles thereafter. No

special pipeline alignment is required.

Example The minimum size for the repeat block is 9 words if the block is even aligned and 8 words if the block is odd aligned. If you have a block of 8 words, as in the following example, you can make sure the block is odd aligned by preceding it with a .align 2 directive and a NOP instruction. The .align 2 directive makes sure the NOP is even aligned. Because a NOP is a 16-bit instruction, the RPTB will be odd aligned. For blocks of 9 or more words, this is not required.

; Repeat Block of 8 Words (Interruptible)

;

; Note: This example makes use of floating-point (C28x+FPU) instructions

;

; find the largest element and put its address in XAR6

;

.align 2

NOP

RPTB _VECTOR_MAX_END, AR7

; Execute the block AR7+1 times

MOVL ACC,XAR0

MOV32 R1H,*XAR0++ ; min size = 8, 9 words

MAXF32 R0H,R1H ; max size = 127 words

MOVST0 NF,ZF

MOVL XAR6,ACC,LT

_VECTOR_MAX_END: ; label indicates the end

; RA is cleared

When an interrupt is taken, the repeat active (RA) bit in the RB register is automatically copied to the repeat active shadow (RAS) bit. When the interrupt exits, the RAS bit is automatically copied back to the RA bit. This allows the hardware to keep track of whether a repeat loop was active when the interrupt was taken and to restore that state automatically.
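The RA/RAS hand-off can be pictured with a small behavioral model. This C sketch is an illustration only: the type rb_state_t and both function names are hypothetical, and the clearing of RA on entry and of RAS on exit is taken from the comments in the interrupt listings for this instruction (RAS = RA, RA = 0 and RA = RAS, RAS = 0).

```c
#include <stdbool.h>

/* Hypothetical model of the two RB register bits discussed in the text. */
typedef struct {
    bool ra;  /* repeat active */
    bool ras; /* repeat active shadow */
} rb_state_t;

/* Interrupt entry: RAS = RA, RA = 0. */
void rb_on_interrupt_entry(rb_state_t *rb)
{
    rb->ras = rb->ra;
    rb->ra = false;
}

/* Interrupt return (IRET): RA = RAS, RAS = 0. */
void rb_on_iret(rb_state_t *rb)
{
    rb->ra = rb->ras;
    rb->ras = false;
}
```

The model has a single shadow bit, so a nested interrupt would overwrite RAS. That is consistent with the requirement to save and restore RB in software when an interrupt service routine can itself be interrupted.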

A high priority interrupt is defined as an interrupt that cannot itself be interrupted. In a

high priority interrupt, the RB register must be saved if a RPTB block is used within the


interrupt. If the interrupt service routine does not include a RPTB block, then you do not

have to save the RB register.

; Repeat Block within a High-Priority Interrupt (Non-Interruptible)

;

; Interrupt: ; RAS = RA, RA = 0

...

PUSH RB ; Save RB register only if a RPTB block is used in the ISR

...

RPTB #_BlockEnd, #5 ; Execute the block 5+1 times

...

_BlockEnd ; End of block to be repeated

...

POP RB ; Restore RB register ...

IRET ; RA = RAS, RAS = 0

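A low-priority interrupt is defined as an interrupt that allows itself to be interrupted. The RB register must always be saved and restored in a low-priority interrupt. The RB register must be saved before interrupts are enabled, and interrupts must be disabled before the RB register is restored.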

; Repeat Block within a Low-Priority Interrupt (Interruptible)

;

; Interrupt:

; RAS = RA, RA = 0

...

PUSH RB ; Always save RB register

...

CLRC INTM ; Enable interrupts only after saving RB

...

; ISR may or may not include a RPTB block

...

SETC INTM ; Disable interrupts before restoring RB

...

POP RB ; Always restore RB register

...

IRET ; RA = RAS, RAS = 0

See also POP RB

PUSH RB

RPTB label, loc16


VCLEAR VRa Clear General Purpose Register

Operands VRa General purpose register: VR0, VR1... VR8

Opcode LSW: 1110 0110 1111 1000

MSW: 0000 0000 0000 aaaa

Description Clear the specified general purpose register.

VRa = 0x00000000;

Flags This instruction does not modify any flags in the VSTATUS register.

Pipeline This is a single-cycle instruction.

Example ;

; Code fragment from a Viterbi traceback

; For the first iteration the previous state metric must be

; initialized to zero (VR0).

;

VCLEAR VR0 ; Clear the VR0 register

MOVL XAR5,*+XAR4[0] ; Point XAR5 to an array

;

; For first stage

;

VMOV32 VT0, *--XAR3

VMOV32 VT1, *--XAR3

VTRACE *XAR5++,VR0,VT0,VT1 ; Uses VR0 (which is zero)

;

; etc...

;

See also VCLROVFI

VRNDON

VSATOFF

VSATON


VMOV16 mem16, VRaH Store General Purpose Register, High Half

Operands mem16 Pointer to a 16-bit memory location. This will be the destination of the VMOV16.

VRaH High word of a general purpose register: VR0H, VR1H...VR8H.

Opcode

LSW: 1110 0010 0001 1000

MSW: 0001 aaaa mem16

Description Store the upper 16 bits of the specified general purpose register into the 16-bit memory location.

[mem16] = VRa[31:16];

Flags This instruction does not affect any flags in the VSTATUS register.

Pipeline This is a single-cycle instruction.

Example

See also VMOV32 mem32, VSTATUS

VMOV32 mem32, VTa

VMOV32 VRa, mem32

VMOV32 VTa, mem32


VMOV32 mem32, VSTATUS Store VCU Status Register

Operands mem32 Pointer to a 32-bit memory location. This will be the destination of the VMOV32.

VSTATUS VCU status register.

Opcode LSW: 1110 0010 0000 1101

MSW: 0000 0000 mem32

Description Store the VSTATUS register into the memory location pointed to by mem32.

[mem32] = VSTATUS;

Flags This instruction does not modify any flags in the VSTATUS register.

Pipeline This is a single-cycle instruction.

Example

See also VMOV32 mem32, VRa

VMOV32 mem32, VTa

VMOV32 VRa, mem32

VMOV32 VSTATUS, mem32

VMOV32 VTa, mem32


VMOV32 mem32, VTa Store Transition Bit Register

Operands mem32 Pointer to a 32-bit memory location. This will be the destination of the VMOV32.

VTa Transition bits register VT0 or VT1

Opcode LSW: 1110 0010 0000 0101

MSW: 0000 00tt mem32

Description Store the 32 bits of the specified transition bits register into the memory location pointed to by mem32.

[mem32] = VTa;

Flags This instruction does not modify any flags in the VSTATUS register.

Pipeline This is a single-cycle instruction.

Example

See also VMOV32 mem32, VRa

VMOV32 mem32, VSTATUS

VMOV32 VRa, mem32

VMOV32 VSTATUS, mem32

VMOV32 VTa, mem32


VMOV32 VRa, mem32 Load 32-bit General Purpose Register

Operands VRa General purpose register VR0, VR1....VR8

mem32 Pointer to a 32-bit memory location. This will be the source of the VMOV32.

Opcode LSW: 1110 0011 1111 0000

MSW: 0000 aaaa mem32

Description Load the specified general purpose register with the 32-bit value in memory pointed to

by mem32.

VRa = [mem32];

Flags This instruction does not modify any flags in the VSTATUS register.

Pipeline This is a single-cycle instruction.

Example

See also VMOV32 mem32, VRa

VMOV32 mem32, VSTATUS

VMOV32 mem32, VTa

VMOV32 VSTATUS, mem32

VMOV32 VTa, mem32


VMOV32 VRb, VRa Move 32-bit Register to Register

Operands VRb General purpose destination register VR0...VR8

VRa General purpose source register VR0...VR8

Opcode LSW: 1110 0110 1111 0010

MSW: 0000 0010 bbbb aaaa

Description Move a 32-bit value from one general purpose VCU register to another.

VRb = VRa;

Flags This instruction does not affect any flags in the VSTATUS register.

Pipeline This is a single-cycle instruction.

Example ; Swap VR0 and VR1 using VR2 as temporary storage

;

VMOV32 VR2, VR1

VMOV32 VR1, VR0

VMOV32 VR0, VR2

See also VMOV32 mem32, VRa

VMOV32 mem32, VSTATUS

VMOV32 mem32, VTa

VMOV32 VTa, mem32


VMOV32 VSTATUS, mem32 Load VCU Status Register

Operands VSTATUS VCU status register

mem32 Pointer to a 32-bit memory location. This will be the source of the VMOV32.

Opcode LSW: 1110 0010 1011 0000

MSW: 0000 0000 mem32

Description Load the VSTATUS register with the 32-bit value in memory pointed to by mem32.

VSTATUS = [mem32];

Flags This instruction modifies all bits within the VSTATUS register.

Pipeline This is a single-cycle instruction.

Example

See also VMOV32 mem32, VSTATUS

VMOV32 mem32, VTa

VMOV32 VRa, mem32

VMOV32 VTa, mem32


VMOV32 VTa, mem32 Load 32-bit Transition Bit Register

Operands VTa Transition bit register: VT0, VT1

mem32 Pointer to a 32-bit memory location. This will be the source of the VMOV32.

Opcode LSW: 1110 0011 1111 0001

MSW: 0000 00tt mem32

Description Load the specified transition bit register with the 32-bit value in memory pointed to by

mem32.

VTa = [mem32];

Flags This instruction does not modify any flags in the VSTATUS register.

Pipeline This is a single-cycle instruction.

Example

See also VMOV32 mem32, VSTATUS

VMOV32 mem32, VTa

VMOV32 VRa, mem32

VMOV32 VSTATUS, mem32


VMOVD32 VRa, mem32 Load Register with Data Move

Operands VRa General purpose register: VR0, VR1... VR8

mem32 Pointer to a 32-bit memory location. This will be the source of the VMOVD32.

Opcode LSW: 1110 0010 0010 0100

MSW: 0000 aaaa mem32

Description Load the specified general purpose register with the 32-bit value in memory pointed to by mem32. In addition, copy that 32-bit value to the next 32-bit location in memory (mem32 + 2).

VRa = [mem32];

[mem32 + 2] = [mem32];
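The two assignments above can be modeled in C as follows. This is a sketch under stated assumptions, not device code: memory is represented as an array of 32-bit slots, so the word address mem32 + 2 becomes the next array element, and the name vmovd32_model is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of VMOVD32 VRa, mem32.
 * mem is an array of 32-bit slots and idx selects the slot that mem32
 * points to. The caller must make sure that slot idx + 1 exists.
 * Returns the value loaded into VRa. */
uint32_t vmovd32_model(uint32_t *mem, size_t idx)
{
    uint32_t value = mem[idx]; /* VRa = [mem32]          */
    mem[idx + 1] = value;      /* [mem32 + 2] = [mem32]  */
    return value;
}
```

In this model the source location keeps its value and the following location is overwritten, which is the behavior one would want when shifting samples along a delay line from the highest address downward.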

Flags This instruction does not modify any flags in the VSTATUS register.

Pipeline This is a single-cycle instruction.

Example

See also


VMOVIX VRa, #16I Load Upper Half of a General Purpose Register with 16-bit Immediate

Operands VRa General purpose register: VR0, VR1... VR8

#16I 16-bit immediate value

Opcode LSW: 1110 0111 1110 IIII

MSW: IIII IIII IIII aaaa

Description Load the upper 16 bits of the specified general purpose register with an immediate value. Leave the lower 16 bits of the register unchanged.

VRa[15:0] = unchanged;

VRa[31:16] = #16I;
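As a quick illustration of the bit manipulation described above, the following C sketch computes the new register value from the old one. The function name vmovix_model is hypothetical; only the masking and shifting reflect the description.

```c
#include <stdint.h>

/* Hypothetical model of VMOVIX VRa, #16I: replace bits 31:16 of the
 * register with the immediate and keep bits 15:0 unchanged. */
uint32_t vmovix_model(uint32_t vra, uint16_t imm16)
{
    return ((uint32_t)imm16 << 16) | (vra & 0x0000FFFFu);
}
```

For example, applying the immediate 0xABCD to a register holding 0x12345678 yields 0xABCD5678 in this model.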

Flags This instruction does not modify any flags in the VSTATUS register.

Pipeline This is a single-cycle instruction.

Example

See also VCLROVFI

VCLROVFR

VRNDON

VSATOFF

VSATON


VRNDON Enable Rounding

Operands none

Opcode LSW: 1110 0101 0000 1000

Description This instruction enables the rounding mode by setting the RND bit in the VSTATUS register. When rounding is enabled, the results of the affected addition and subtraction operations will be rounded instead of being truncated. The operations affected by rounding are shown in Table 2-6. Refer to the individual instruction descriptions for information on how rounding affects the operation. To disable rounding use the VRNDOFF instruction.

For more information on rounding, refer to Section 2.3.2.

VSTATUS[RND] = 1;

Flags This instruction sets the RND bit in the VSTATUS register. It does not change any flags.

Pipeline This is a single-cycle instruction.

Example

See also VCLROVFI

VCLROVFR

VRNDOFF

VSATOFF

VSATON


VSATOFF Disable Saturation

Operands none

Opcode LSW: 1110 0101 0000 0111

Description This instruction disables the saturation mode by clearing the SAT bit in the VSTATUS register. When saturation is disabled, results are allowed to overflow or underflow. When saturation is enabled, results will instead be set to a maximum or minimum value. To enable saturation use the VSATON instruction.

VSTATUS[SAT] = 0

Flags This instruction clears the SAT bit in the VSTATUS register. It does not change any flags.

Pipeline This is a single-cycle instruction.

Example

See also VCLROVFI

VCLROVFR

VRNDOFF

VRNDON

VSATON


VSATON Enable Saturation

Operands none

Opcode LSW: 1110 0101 0000 0110

Description This instruction enables the saturation mode by setting the SAT bit in the VSTATUS register. When saturation is enabled, results are not allowed to overflow or underflow. Results will, instead, be set to a maximum or minimum value. To disable saturation use the VSATOFF instruction.

VSTATUS[SAT] = 1

Flags This instruction sets the SAT bit in the VSTATUS register. It does not change any flags.

Pipeline This is a single-cycle instruction.

Example

See also VCLROVFI

VCLROVFR

VRNDOFF

VRNDON

VSATOFF


VSETCPACK Set CPACK bit in the VSTATUS Register

Operands none

Opcode LSW: 1110 0101 0010 0001

Description Set the CPACK bit in the VSTATUS register. This causes the VCU to process complex

data, in complex math operations, in the VRx registers as follows:

VRx[31:16] holds the Imaginary part, VRx[15:0] holds the Real part

Flags This instruction sets the CPACK bit in the VSTATUS register.

Pipeline This is a single-cycle instruction.

Example ; complex conjugate multiply| (a + jb)*(c + jd)=(ac+bd)+j(bc-ad)

VSETCPACK ; cpack = 1 imag part in high word

VMOV32 VR0, *XAR4++ ; load 1st complex input | a + jb

VMOV32 VR1, *XAR4++ ; load second complex input | c + jd

VCCMPY VR3, VR2, VR1, VR0

See also VMOV32 mem32, VSTATUS

VMOV32 mem32, VTa

VMOV32 VRa, mem32

VMOV32 VRb, VRa

VMOV32 VTa, mem32


VXORMOV32 VRa, mem32 32-bit Load and XOR From Memory

Operands Input Register Value

VRa General purpose register VR0...VR8

mem32 Pointer to 32-bit memory location

Opcode

LSW: 1110 0011 1111 0000

MSW: 0000 aaaa MMMM MMMM

Description XOR the contents of the VRa register with a long word from memory and store the result

back into VRa

VRa = VRa ^ [mem32]

Flags This instruction does not affect any flags in the VSTATUS register.

Pipeline This is a single-cycle instruction.

Example VXORMOV32 VR0, *+XAR4[0] ;VR0=VR0 ^ *XAR4[0]

See also


2.5.3 Arithmetic Math Instructions

The instructions are listed alphabetically, preceded by a summary.

Table 2-12. Arithmetic Math Instructions

VASHL32 VRa << #5-bit - Arithmetic Shift Left

VASHR32 VRa >> #5-bit - Arithmetic Shift Right

VBITFLIP VRa - Bit Flip

VLSHL32 VRa << #5-bit - Logical Shift Left

VLSHR32 VRa >> #5-bit - Logical Shift Right

VNEG VRa - Two's Complement Negate


VASHL32 VRa << #5-bit Arithmetic Shift Left

Operands

VRa VRa can be VR0 - VR7. VRa cannot be VR8.

#5-bit 5-bit unsigned immediate value

Opcode LSW: 1110 0110 1111 0010

MSW: 0000 0111 IIII Iaaa

Description Arithmetic left shift of VRa

If(VSTATUS[SAT] == 1){

VRa = sat(VRa << #5-bit Immediate)

}else {

VRa = VRa << #5-bit Immediate

}
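The pseudo-code above can be turned into a small C model to make the saturation case concrete. This is a sketch under stated assumptions: the name vashl32_model is hypothetical, sat() is assumed to clamp to the signed 32-bit limits 0x7FFFFFFF and 0x80000000, and the non-saturated result is assumed to wrap in two's complement.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of VASHL32 VRa << #5-bit.
 * vra   : register value before the shift
 * shift : 5-bit immediate (0..31)
 * sat   : state of VSTATUS[SAT]
 * ovfr  : set to true when the signed 32-bit result overflows,
 *         left untouched otherwise */
int32_t vashl32_model(int32_t vra, unsigned shift, bool sat, bool *ovfr)
{
    /* Multiply in 64 bits instead of shifting a possibly negative value. */
    int64_t wide = (int64_t)vra * ((int64_t)1 << (shift & 31u));

    if (wide >= INT32_MIN && wide <= INT32_MAX) {
        return (int32_t)wide; /* result fits, no overflow */
    }

    *ovfr = true;
    if (sat) {
        return (wide > 0) ? INT32_MAX : INT32_MIN; /* assumed sat() limits */
    }
    /* Keep the low 32 bits. The narrowing conversion is
     * implementation-defined before C23 but wraps on common compilers. */
    return (int32_t)(uint32_t)wide;
}
```

With SAT set, shifting 0x40000000 left by one gives 0x7FFFFFFF in this model. With SAT clear the same shift wraps to 0x80000000, and OVFR is raised in both cases.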

Flags This instruction modifies the following bits in the VSTATUS register:

• OVFR is set if the 32-bit signed result after the shift left operation overflows

Pipeline This is a single-cycle instruction

Example VASHL32 VR4 << #16 ; VR4 := VR4 << 16

See also


VLSHL32 VRa << #5-bit Logical Shift Left

Operands

VRa VRa can be VR0 - VR7. VRa cannot be VR8.

#5-bit 5-bit unsigned immediate value

Opcode LSW: 1110 0110 1111 0010

MSW: 0000 0101 IIII Iaaa

Description Logical left shift of VRa

VRa = VRa << #5-bit Immediate

Flags This instruction does not affect any flags in the VSTATUS register

Pipeline This is a single-cycle instruction

Example VLSHL32 VR0 << #16 ; VR0 := VR0 << 16

See also VCLROVFI


VCCMAC VR5, VR4, VR3, VR2, VR1, VR0 — Complex Conjugate Multiply and Accumulate


VCLROVFR

VCCMAC VR5, VR4, VR3, VR2, VR1, VR0

VSATON

VSATOFF


VCCMAC VR5, VR4, VR3, VR2, VR1, VR0 || VMOV32 VRa, mem32 Complex Conjugate Multiply and Accumulate with Parallel Load

Operands Input Register Value

VR0 First Complex Operand

VR1 Second Complex Operand

VR2 Imaginary part of the Result

VR3 Real part of the Result

VR4 Imaginary part of the accumulation

VR5 Real part of the accumulation

VRa Contents of the memory pointed to by mem32. VRa cannot be VR5, VR4 or VR8

mem32 Pointer to 32-bit memory location

Note: The user will need to do one final addition to accumulate the final multiplications (Real-VR3 and

Imaginary-VR2) into the result registers.

Opcode

LSW: 1110 0011 1111 0111

MSW: 0001 aaaa mem32

Description Complex Conjugate Multiply Operation with parallel load.

// VR5 = Accumulation of the real part

// VR4 = Accumulation of the imaginary part

// VR0 = X + jX: VR0[31:16] = X, VR0[15:0] = jX

// VR1 = Y + jY: VR1[31:16] = Y, VR1[15:0] = jY

// Perform add

if (RND == 1)

{

VR5 = VR5 + round(VR3 >> SHIFTR);

VR4 = VR4 + round(VR2 >> SHIFTR);

}

else

{

VR5 = VR5 + (VR3 >> SHIFTR);

VR4 = VR4 + (VR2 >> SHIFTR);

}

// Perform multiply (X + jX) * (Y - jY)

If(VSTATUS[CPACK] == 0){

VR3 = VR0H * VR1H + VR0L * VR1L; // Real result

VR2 = VR0H * VR1L - VR0L * VR1H; // Imaginary result

}

else

{

VR3 = VR0L * VR1L + VR0H * VR1H; // Real result

VR2 = VR0L * VR1H - VR0H * VR1L; // Imaginary result

}

if(SAT == 1)

{

sat32(VR3);

sat32(VR2);

}

VRa = [mem32];


Flags This instruction modifies the following bits in the VSTATUS register:

• OVFR is set if the VR3 computation (real part) overflows or underflows.

• OVFI is set if the VR2 computation (imaginary part) overflows or underflows.

Pipeline This is a 2p-cycle instruction.

See also VCLROVFI

VCLROVFR

VCCMAC VR5, VR4, VR3, VR2, VR1, VR0

VSATON

VSATOFF


VCCMAC VR7, VR6, VR5, VR4, mem32, *XAR7++ Complex Conjugate Multiply and Accumulate

Operands The VCCMAC alternates which registers are used between each cycle. For odd cycles (1, 3, 5, and so on) the following registers are used:

Odd Cycle Input    Value
VR5        Previous real-part total accumulation: Re(odd_sum)
VR4        Previous imaginary-part total accumulation: Im(odd_sum)
VR1        Previous real result from the multiply: Re(odd_mpy)
VR0        Previous imaginary result from the multiply: Im(odd_mpy)
[mem32]    Pointer to a 32-bit memory location representing the first input to the multiply
           If(VSTATUS[CPACK] == 0)
             [mem32][31:16] = Re(X)
             [mem32][15:0] = Im(X)
           If(VSTATUS[CPACK] == 1)
             [mem32][31:16] = Im(X)
             [mem32][15:0] = Re(X)
XAR7       Pointer to a 32-bit memory location representing the second input to the multiply
           If(VSTATUS[CPACK] == 0)
             *XAR7[31:16] = Re(Y)
             *XAR7[15:0] = Im(Y)
           If(VSTATUS[CPACK] == 1)
             *XAR7[31:16] = Im(Y)
             *XAR7[15:0] = Re(Y)

The result from the odd cycle is stored as shown below:

Odd Cycle Output Value

VR5 32-bit real part of the total accumulation

Re(odd_sum) = Re(odd_sum) + Re(odd_mpy)

VR4 32-bit imaginary part of the total accumulation

Im(odd_sum) = Im(odd_sum) + Im(odd_mpy)

VR1 32-bit real result from the multiplication:

Re(Z) = Re(X)*Re(Y) + Im(X)*Im(Y)

VR0 32-bit imaginary result from the multiplication:

Im(Z) = Re(X)*Im(Y) - Re(Y)*Im(X)

For even cycles (2, 4, 6, and so on) the following registers are used:

Even Cycle Input   Value
VR7        Previous real-part total accumulation: Re(even_sum)
VR6        Previous imaginary-part total accumulation: Im(even_sum)
VR3        Previous real result from the multiply: Re(even_mpy)
VR2        Previous imaginary result from the multiply: Im(even_mpy)
[mem32]    Pointer to a 32-bit memory location representing the first input to the multiply
           If(VSTATUS[CPACK] == 0)
             [mem32][31:16] = Re(X)
             [mem32][15:0] = Im(X)
           If(VSTATUS[CPACK] == 1)
             [mem32][31:16] = Im(X)
             [mem32][15:0] = Re(X)
XAR7       Pointer to a 32-bit memory location representing the second input to the multiply
           If(VSTATUS[CPACK] == 0)
             *XAR7[31:16] = Re(Y)
             *XAR7[15:0] = Im(Y)
           If(VSTATUS[CPACK] == 1)
             *XAR7[31:16] = Im(Y)
             *XAR7[15:0] = Re(Y)

The result from even cycles is stored as shown below:

Even Cycle Output Value

VR7 32-bit real part of the total accumulation

Re(even_sum) = Re(even_sum) + Re(even_mpy)

VR6 32-bit imaginary part of the total accumulation

Im(even_sum) = Im(even_sum) + Im(even_mpy)

VR3 32-bit real result from the multiplication:

Re(Z) = Re(X)*Re(Y) + Im(X)*Im(Y)

VR2 32-bit imaginary result from the multiplication:

Im(Z) = Re(X)*Im(Y) - Re(Y)*Im(X)

Opcode

LSW: 1110 0010 0101 0001

MSW: 0010 1111 mem32

Description Perform a repeated complex conjugate multiply and accumulate operation. This

instruction must be used with the single repeat instruction (RPT ||). The destination of

the accumulate will alternate between VR7/VR6 and VR5/VR4 on each cycle.

// Cycle 1:

// Perform accumulate

if(RND == 1)

{

VR5 = VR5 + round(VR1 >> SHIFTR)

VR4 = VR4 + round(VR0 >> SHIFTR)

}

else

{

VR5 = VR5 + (VR1 >> SHIFTR)

VR4 = VR4 + (VR0 >> SHIFTR)

}

// X and Y array element 0

VR1 = Re(X)*Re(Y) + Im(X)*Im(Y)

VR0 = Re(X)*Im(Y) - Re(Y)*Im(X)

// Cycle 2:

// Perform accumulate

if(RND == 1)

{

VR7 = VR7 + round(VR3 >> SHIFTR)

VR6 = VR6 + round(VR2 >> SHIFTR)

}


else

{

VR7 = VR7 + (VR3 >> SHIFTR)

VR6 = VR6 + (VR2 >> SHIFTR)

}

// X and Y array element 1

VR3 = Re(X)*Re(Y) + Im(X)*Im(Y)

VR2 = Re(X)*Im(Y) - Re(Y)*Im(X)

// Cycle 3:

// Perform accumulate

if(RND == 1)

{

VR5 = VR5 + round(VR1 >> SHIFTR)

VR4 = VR4 + round(VR0 >> SHIFTR)

}

else

{

VR5 = VR5 + (VR1 >> SHIFTR)

VR4 = VR4 + (VR0 >> SHIFTR)

}

// X and Y array element 2

VR1 = Re(X)*Re(Y) + Im(X)*Im(Y)

VR0 = Re(X)*Im(Y) - Re(Y)*Im(X)

etc...
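The alternating register use in the cycle listing above can be followed more easily in a loop. The C sketch below is a behavioral illustration only: the names cplx16_t, vcu_regs_t, asr32 and vccmac_rpt_model are hypothetical, it covers just the RND == 0 path, and it ignores saturation and the overflow flags.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct { int16_t re, im; } cplx16_t; /* one 16-bit complex input */
typedef struct { int32_t vr[8]; } vcu_regs_t; /* VR0..VR7 */

/* Arithmetic shift right that is well defined for negative values (s < 32). */
static int32_t asr32(int32_t v, unsigned s)
{
    return (v < 0) ? ~((~v) >> s) : (v >> s);
}

/* Hypothetical model of RPT #(n-1) || VCCMAC VR7, VR6, VR5, VR4, ... */
void vccmac_rpt_model(vcu_regs_t *r, const cplx16_t *x, const cplx16_t *y,
                      size_t n, unsigned shiftr)
{
    for (size_t k = 0; k < n; ++k) {
        /* Cycles 1, 3, 5, ... (k even) use VR1/VR0 and VR5/VR4.
         * Cycles 2, 4, 6, ... (k odd) use VR3/VR2 and VR7/VR6. */
        size_t base = (k % 2u == 0u) ? 0u : 2u;
        int32_t *mpy_im = &r->vr[base];      /* VR0 or VR2 */
        int32_t *mpy_re = &r->vr[base + 1u]; /* VR1 or VR3 */
        int32_t *sum_im = &r->vr[base + 4u]; /* VR4 or VR6 */
        int32_t *sum_re = &r->vr[base + 5u]; /* VR5 or VR7 */

        /* Accumulate the product left behind two cycles earlier. */
        *sum_re += asr32(*mpy_re, shiftr);
        *sum_im += asr32(*mpy_im, shiftr);

        /* New product for array element k, as in the listing. */
        *mpy_re = (int32_t)x[k].re * y[k].re + (int32_t)x[k].im * y[k].im;
        *mpy_im = (int32_t)x[k].re * y[k].im - (int32_t)y[k].re * x[k].im;
    }
}
```

In this model the last odd-cycle and even-cycle products are still sitting in VR1/VR0 and VR3/VR2 when the loop ends, so the complete sum is obtained only after adding them to VR5/VR4 and VR7/VR6 and then combining the two partial sums.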

Restrictions VR0, VR1, VR2, and VR3 will be used as temporary storage by this instruction.

Flags The VSTATUS register flags are modified as follows:

• OVFR is set in the case of an overflow or underflow of the real part of the addition or subtraction operations.

• OVFI is set in the case of an overflow or underflow of the imaginary part of the addition or subtraction operations.

Pipeline

The VCCMAC takes 2p + N cycles where N is the number of times the instruction is

repeated. This instruction has the following pipeline restrictions:

<instruction1> ; No restriction

<instruction2> ; Cannot be a 2p instruction that writes

; to VR0, VR1...VR7 registers

RPT #(N-1) ; Execute N times, where N is even

|| VCCMAC VR7, VR6, VR5, VR4, *XAR6++, *XAR7++

<instruction3> ; No restrictions.

; Can read VR0, VR1... VR8

See also VCMAC VR7, VR6, VR5, VR4, mem32, *XAR7++


VCCMPY VR3, VR2, VR1, VR0 Complex Conjugate Multiply

Operands Both inputs are complex numbers with a 16-bit real and 16-bit imaginary part. The result

is a complex number with a 32-bit real and a 32-bit imaginary part. The result is stored in

VR2 and VR3 as shown below:

Input Register Value

VR0 First Complex Operand

VR1 Second Complex Operand

VR2 Imaginary part of the Result

VR3 Real part of the Result


Opcode LSW: 1110 0101 0000 1110

Description Complex Conjugate 16 x 16 = 32-bit multiply operation.

If the VSTATUS[CPACK] bit is set, the low word of the input is treated as the real part

while the upper word is treated as imaginary. If the VSTATUS[SAT] bit is set, then the

result will be saturated in the event of a 32-bit overflow or underflow. The following

operation is carried out:

if(VSTATUS[CPACK] == 0){

VR3 = VR0H * VR1H + VR0L * VR1L; //Re(Z) = Re(X)*Re(Y) + Im(X)*Im(Y)

VR2 = VR0H * VR1L - VR0L * VR1H; // Im(Z) = Re(X)*Im(Y) - Im(X)*Re(Y)

}else{

VR3 = VR0L * VR1L + VR0H * VR1H; // Re(Z) = Re(X)*Re(Y) + Im(X)*Im(Y)

VR2 = VR0L * VR1H - VR0H * VR1L; // Im(Z) = Re(X)*Im(Y) - Im(X)*Re(Y)

}
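The following C sketch mirrors the pseudo-code above so that the effect of CPACK and SAT can be tried with concrete numbers. It is an illustration under stated assumptions: the names cplx32_t, s16, finish32 and vccmpy_model are hypothetical, the sign convention is copied literally from the pseudo-code, and the OVFR/OVFI flags are not modeled.

```c
#include <stdbool.h>
#include <stdint.h>

/* 32-bit results, corresponding to VR3 (real) and VR2 (imaginary). */
typedef struct {
    int32_t re; /* VR3 */
    int32_t im; /* VR2 */
} cplx32_t;

/* Interpret the low 16 bits of w as a signed 16-bit value. */
static int32_t s16(uint32_t w)
{
    w &= 0xFFFFu;
    return (w & 0x8000u) ? (int32_t)w - 0x10000 : (int32_t)w;
}

/* Return v if it fits in 32 bits; otherwise clamp (sat) or wrap. */
static int32_t finish32(int64_t v, bool sat)
{
    if (v >= INT32_MIN && v <= INT32_MAX) {
        return (int32_t)v;
    }
    if (sat) {
        return (v > 0) ? INT32_MAX : INT32_MIN;
    }
    /* Keep the low 32 bits (implementation-defined before C23). */
    return (int32_t)(uint32_t)v;
}

/* Hypothetical model of VCCMPY VR3, VR2, VR1, VR0. */
cplx32_t vccmpy_model(uint32_t vr0, uint32_t vr1, bool cpack, bool sat)
{
    /* CPACK == 0: real in [31:16], imaginary in [15:0]. CPACK == 1: swapped. */
    int32_t xr = cpack ? s16(vr0) : s16(vr0 >> 16);
    int32_t xi = cpack ? s16(vr0 >> 16) : s16(vr0);
    int32_t yr = cpack ? s16(vr1) : s16(vr1 >> 16);
    int32_t yi = cpack ? s16(vr1 >> 16) : s16(vr1);

    int64_t re = (int64_t)xr * yr + (int64_t)xi * yi; /* VR3 */
    int64_t im = (int64_t)xr * yi - (int64_t)xi * yr; /* VR2 */

    cplx32_t z = { finish32(re, sat), finish32(im, sat) };
    return z;
}
```

The only input pair that overflows the real part in this model is one where all four 16-bit values are 0x8000, which gives 2^31 and is clamped to 0x7FFFFFFF when SAT is set.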

Flags This instruction modifies the following bits in the VSTATUS register:

• OVFR is set if the VR3 computation (real part) overflows or underflows.

• OVFI is set if the VR2 computation (imaginary part) overflows or underflows.

Pipeline This is a 2p-cycle instruction. The instruction following this one should not use VR3 or

VR2.

Example
VCLRCPACK ; cpack = 0 real part in high word
VMOV32 VR0, *XAR4++ ; load 1st complex input | jb + a
VMOV32 VR1, *XAR4++ ; load second complex input | jd + c
VCCMPY VR3, VR2, VR1, VR0 ; complex conjugate multiply|
; (jb + a)*(jd + c)=(ac+bd)+j(bc-ad)
NOP
VMOV32 *XAR5++, VR3 ; store real part first
VMOV32 *XAR5++, VR2 ; store imag part next

VSETCPACK ; cpack = 1 imag part in high word
VMOV32 VR0, *XAR4++ ; load 1st complex input | a + jb
VMOV32 VR1, *XAR4++ ; load second complex input | c + jd
VCCMPY VR3, VR2, VR1, VR0 ; complex conjugate multiply|
; (a + jb)*(c + jd)=(ac+bd)+j(bc-ad)
NOP
VMOV32 *XAR5++, VR3 ; store real part first
VMOV32 *XAR5++, VR2 ; store imag part next

See also VCLROVFI

VCLROVFR

VCCMAC VR5, VR4, VR3, VR2, VR1, VR0


VCCMAC VR5, VR4, VR3, VR2, VR1, VR0 || VMOV32 VRa, mem32

VSETCPACK

VCLRCPACK

VSATON

VSATOFF


VCCMPY VR3, VR2, VR1, VR0 || VMOV32 mem32, VRa Complex Conjugate Multiply with Parallel Store

Operands Both inputs are complex numbers with a 16-bit real and a 16-bit imaginary part. The result is a complex number with a 32-bit real and a 32-bit imaginary part, stored in VR3 and VR2 as shown below:

Input Register Value

VR0 First Complex Operand

VR1 Second Complex Operand

VRa Value to be stored

Output Register Value

VR2 Imaginary part of the Result

VR3 Real part of the Result

mem32 Pointer to 32-bit memory location

Opcode LSW: 1110 0011 0000 0111

MSW: 0001 aaaa mem32

Description Complex Conjugate 16 x 16 = 32-bit multiply operation.

If the VSTATUS[CPACK] bit is set, the low word of the input is treated as the real part

while the upper word is treated as imaginary. If the VSTATUS[SAT] bit is set, then the

result will be saturated in the event of a 32-bit overflow or underflow. The following

operation is carried out:

if(VSTATUS[CPACK] == 0){

VR3 = VR0H * VR1H + VR0L * VR1L; //Re(Z) = Re(X)*Re(Y) + Im(X)*Im(Y)

VR2 = VR0H * VR1L - VR0L * VR1H; // Im(Z) = Re(X)*Im(Y) - Im(X)*Re(Y)

}else{

VR3 = VR0L * VR1L + VR0H * VR1H; // Re(Z) = Re(X)*Re(Y) + Im(X)*Im(Y)

VR2 = VR0L * VR1H - VR0H * VR1L; // Im(Z) = Re(X)*Im(Y) - Im(X)*Re(Y)

}

[mem32] = VRa;

Flags This instruction modifies the following bits in the VSTATUS register:

• OVFR is set if the VR3 computation (real part) overflows or underflows.

• OVFI is set if the VR2 computation (imaginary part) overflows or underflows.

Pipeline This is a 2p/1-cycle instruction. The multiply operation takes 2p cycles and the VMOV operation completes in a single cycle. The instruction following this one should not use VR3 or VR2.

Example VCLRCPACK ; cpack = 0 real part in high word

VMOV32 VR0, *XAR4++ ; load 1st complex input | jb + a

VMOV32 VR1, *XAR4++ ; load second complex input | jd + c

VCCMPY VR3, VR2, VR1, VR0 ; complex conjugate multiply|

||VMOV32 VR0, *XAR4++ ; (jb + a)*(jd + c)=(ac+bd)+j(bc-ad)

; load 1st complex input | a + jb

NOP ; for next VCCMPY instr |

VMOV32 *XAR5++, VR3 ; store real part first

VSETCPACK ; cpack = 1 real part in low word

VMOV32 VR1, *XAR4++ ; load second complex input | c + jd

VCCMPY VR3, VR2, VR1, VR0 ; complex conjugate multiply|

||VMOV32 *XAR5++, VR2 ; (a + jb)*(c + jd)=(ac+bd)+j(bc-ad)

; store imag part of first |

NOP ; VCCMPY instruction |

VMOV32 *XAR5++, VR3 ; store real part first

VMOV32 *XAR5++, VR2 ; store imag part next

VCLRCPACK


See also VCLROVFI

VCLROVFR

VCCMAC VR5, VR4, VR3, VR2, VR1, VR0

VCCMAC VR5, VR4, VR3, VR2, VR1, VR0 || VMOV32 VRa, mem32

VSETCPACK

VCLRCPACK

VSATON

VSATOFF


VCCMPY VR3, VR2, VR1, VR0 || VMOV32 VRa, mem32 Complex Conjugate Multiply with Parallel Load

Operands Both inputs are complex numbers with a 16-bit real and a 16-bit imaginary part. The result is a complex number with a 32-bit real and a 32-bit imaginary part, stored in VR3 and VR2 as shown below:

Input Register Value

VR0 First Complex Operand

VR1 Second Complex Operand

mem32 Pointer to 32-bit memory location

Output Register Value

VR2 Imaginary part of the Result

VR3 Real part of the Result

VRa 32-bit value pointed to by mem32. VRa cannot be VR2, VR3 or VR8.

Opcode LSW: 1110 0011 1111 0110

MSW: 0001 aaaa mem32

Description Complex Conjugate 16 x 16 = 32-bit multiply operation.

If the VSTATUS[CPACK] bit is set, the low word of the input is treated as the real part

while the upper word is treated as imaginary. If the VSTATUS[SAT] bit is set, then the

result will be saturated in the event of a 32-bit overflow or underflow. The following

operation is carried out:

if(VSTATUS[CPACK] == 0){

VR3 = VR0H * VR1H + VR0L * VR1L; // Re(Z) = Re(X)*Re(Y) + Im(X)*Im(Y)

VR2 = VR0H * VR1L - VR0L * VR1H; // Im(Z) = Re(X)*Im(Y) - Im(X)*Re(Y)

}else{

VR3 = VR0L * VR1L + VR0H * VR1H; // Re(Z) = Re(X)*Re(Y) + Im(X)*Im(Y)

VR2 = VR0L * VR1H - VR0H * VR1L; // Im(Z) = Re(X)*Im(Y) - Im(X)*Re(Y)

}

VRa = [mem32];

Flags This instruction modifies the following bits in the VSTATUS register:

• OVFR is set if the VR3 computation (real part) overflows or underflows.

• OVFI is set if the VR2 computation (imaginary part) overflows or underflows.

Pipeline This is a 2p/1-cycle instruction. The multiply operation takes 2p cycles and the VMOV

operation completes in a single cycle. The instruction following this one should not use

VR3 or VR2.

Example

VCLRCPACK ; cpack = 0 real part in high word

VMOV32 VR0, *XAR4++ ; load 1st complex input | jb + a

VMOV32 VR1, *XAR4++ ; load second complex input | jd + c

VCCMPY VR3, VR2, VR1, VR0 ; complex conjugate multiply|

||VMOV32 VR0, *XAR4++ ; (jb + a)*(jd + c)=(ac+bd)+j(bc-ad)

; load 1st complex input | a + jb

NOP ; for next VCCMPY instr |

VMOV32 *XAR5++, VR3 ; store real part first

VSETCPACK ; cpack = 1 real part in low word

VMOV32 VR1, *XAR4++ ; load second complex input | c + jd

VCCMPY VR3, VR2, VR1, VR0 ; complex conjugate multiply|

||VMOV32 *XAR5++, VR2 ; (a + jb)*(c + jd)=(ac+bd)+j(bc-ad)

; store imag part of first |

NOP ; VCCMPY instruction |

VMOV32 *XAR5++, VR3 ; store real part first

VMOV32 *XAR5++, VR2 ; store imag part next

VCLRCPACK


See also VCLROVFI

VCLROVFR

VCCMAC VR5, VR4, VR3, VR2, VR1, VR0 || VMOV32 VRa, mem32

VCCMAC VR5, VR4, VR3, VR2, VR1, VR0

VSETCPACK

VCLRCPACK

VSATON

VSATOFF


VCCMAC VR5, VR4, VR3, VR2, VR1, VR0 Complex Conjugate Multiply with Parallel Load

Operands Both inputs are complex numbers with a 16-bit real and 16-bit imaginary part. The result

is a complex number with a 32-bit real and a 32-bit imaginary part. The result is stored in

VR2 and VR3 as shown below:

Input Register Value

VR0 First Complex Operand

VR1 Second Complex Operand

VRa 32-bit value pointed to by mem32. VRa can not be VR2, VR3 or VR8.

VR2 Imaginary part of the Result

VR3 Real part of the Result

mem32 Pointer to 32-bit memory location

The result is a complex number with a 16-bit real and a 16-bit imaginary part. The result

is stored in VR5 as shown below:

Opcode LSW: 1110 0101 0000 1111

Description Complex Conjugate 16 x 16 = 32-bit multiply operation.

If the VSTATUS[CPACK] bit is set, the low word of the input is treated as the real part

while the upper word is treated as imaginary. If the VSTATUS[SAT] bit is set, then the

result will be saturated in the event of a 32-bit overflow or underflow. The following

operation is carried out:

if(VSTATUS[CPACK] == 0){

VR3 = VR0H * VR1H + VR0L * VR1L; // Re(Z) = Re(X)*Re(Y) + Im(X)*Im(Y)

VR2 = VR0H * VR1L - VR0L * VR1H; // Im(Z) = Re(X)*Im(Y) - Im(X)*Re(Y)

}else{

VR3 = VR0L * VR1L + VR0H * VR1H; // Re(Z) = Re(X)*Re(Y) + Im(X)*Im(Y)

VR2 = VR0L * VR1H - VR0H * VR1L; // Im(Z) = Re(X)*Im(Y) - Im(X)*Re(Y)

}

VRa = [mem32];

Flags This instruction modifies the following bits in the VSTATUS register:

• OVFR is set if the VR3 computation (real part) overflows or underflows.

• OVFI is set if the VR2 computation (imaginary part) overflows or underflows.

Pipeline This is a 2p/1-cycle instruction. The multiply operation takes 2p cycles and the VMOV

operation completes in a single cycle. The instruction following this one should not use

VR3 or VR2.

Example

VCLRCPACK ; cpack = 0 real part in high word

VMOV32 VR0, *XAR4++ ; load 1st complex input | jb + a

VMOV32 VR1, *XAR4++ ; load second complex input | jd + c

VCCMPY VR3, VR2, VR1, VR0 ; complex conjugate multiply|

||VMOV32 VR0, *XAR4++ ; (jb + a)*(jd + c)=(ac+bd)+j(bc-ad)

; load 1st complex input | a + jb

NOP ; for next VCCMPY instr |

VMOV32 *XAR5++, VR3 ; store real part first

VSETCPACK ; cpack = 1 real part in low word

VMOV32 VR1, *XAR4++ ; load second complex input | c + jd

VCCMPY VR3, VR2, VR1, VR0 ; complex conjugate multiply|

||VMOV32 *XAR5++, VR2 ; (a + jb)*(c + jd)=(ac+bd)+j(bc-ad)

; store imag part of first |

NOP ; VCCMPY instruction |

VMOV32 *XAR5++, VR3 ; store real part first

VMOV32 *XAR5++, VR2 ; store imag part next

VCLRCPACK

See also VCLROVFI


VCLROVFR

VCCMAC VR5, VR4, VR3, VR2, VR1, VR0

VCCMAC VR5, VR4, VR3, VR2, VR1, VR0 || VMOV32 VRa, mem32

VSETCPACK

VCLRCPACK

VSATON

VSATOFF


VCCON VRa Complex Conjugate

Operands

VRa General purpose register: VR0, VR1, ..., VR7. Cannot be VR8.

Opcode LSW: 1110 0001 0001 aaaa

Description if(VSTATUS[CPACK] == 0){

if(VSTATUS[SAT] == 1){

VRaL = sat(- VRaL)

}else {

VRaL = - VRaL

}

}else {

if(VSTATUS[SAT] == 1){

VRaH = sat(- VRaH)

}else {

VRaH = - VRaH

}

}

Flags This instruction modifies the following bits in the VSTATUS register:

• OVFI is set if negating the imaginary part in the conjugate operation overflows or underflows.

Pipeline This is a single-cycle instruction.

Example VCCON VR1 ; VR1 := VR1^*
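As a rough illustration of the description above, the following C sketch negates the imaginary half of a packed complex word. The function name vccon_model is hypothetical. The sketch assumes that OVFI is raised regardless of SAT and that the unsaturated result simply keeps its low 16 bits. The only input that can overflow is an imaginary part of -32768, whose negation does not fit in 16 bits.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of VCCON VRa: returns the new register value and
 * reports an imaginary-part overflow through *ovfi. */
uint32_t vccon_model(uint32_t vra, bool cpack, bool sat, bool *ovfi)
{
    assert(ovfi != NULL);

    uint16_t hi = (uint16_t)(vra >> 16);
    uint16_t lo = (uint16_t)(vra & 0xFFFFu);

    /* CPACK = 0: imaginary part is the low word; CPACK = 1: the high word. */
    uint16_t *imag = cpack ? &hi : &lo;

    int32_t neg = -(int32_t)(int16_t)(*imag);
    *ovfi = (neg > INT16_MAX);          /* only -(-32768) = +32768 overflows */
    if (*ovfi && sat)
        neg = INT16_MAX;                /* sat(): clamp to 0x7FFF */
    *imag = (uint16_t)neg;              /* SAT = 0: wraps back to 0x8000 (assumed) */

    return ((uint32_t)hi << 16) | lo;
}
```

With CPACK = 0, the value 0x00040003 (4 + 3j) becomes 0x0004FFFD (4 - 3j). With SAT set, 0x00018000 becomes 0x00017FFF and the overflow is reported.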

See also


VCDADD16 VR5, VR4, VR3, VR2 Complex 16 + 32 = 16 Addition

Operands Before the operation, the inputs should be loaded into registers as shown below. The

first operand is a complex number with a 16-bit real and 16-bit imaginary part. The

second operand has a 32-bit real and a 32-bit imaginary part.

Input Register Value

VR4H 16-bit integer:

if(VSTATUS[CPACK]==0)

Re(X)

else

Im(X)

VR4L 16-bit integer:

if(VSTATUS[CPACK]==0)

Im(X)

else

Re(X)

VR3 32-bit integer representing the real part of the 2nd input: Re(Y)

VR2 32-bit integer representing the imaginary part of the 2nd input: Im(Y)

The result is a complex number with a 16-bit real and a 16-bit imaginary part. The result

is stored in VR5 as shown below:

Output Register Value

VR5H 16-bit integer:

if (VSTATUS[CPACK]==0){

Re(Z) = ((Re(X) << SHIFTL) + Re(Y)) >> SHIFTR

} else {

Im(Z) = ((Im(X) << SHIFTL) + Im(Y)) >> SHIFTR

}

VR5L 16-bit integer:

if (VSTATUS[CPACK]==0){

Im(Z) = ((Im(X) << SHIFTL) + Im(Y)) >> SHIFTR

} else {

Re(Z) = ((Re(X) << SHIFTL) + Re(Y)) >> SHIFTR

}

Opcode LSW: 1110 0101 0000 0100

Description Complex 16 + 32 = 16-bit operation. This operation is useful for algorithms similar to a

complex FFT. The first operand is a complex number with a 16-bit real and 16-bit

imaginary part. The second operand has a 32-bit real and a 32-bit imaginary part.

Before the addition, the first input is sign extended to 32 bits and shifted left by VSTATUS[SHIFTL] bits. The result of the addition is right shifted by VSTATUS[SHIFTR] bits before it is stored in VR5H and VR5L. If VSTATUS[RND] is set, then bits shifted out to the right are rounded, otherwise these bits are truncated. The rounding operation is described in Section 2.3.2. If the VSTATUS[SAT] bit is set, then the result will be saturated in the event of a 16-bit overflow or underflow.

// RND is VSTATUS[RND]

// SAT is VSTATUS[SAT]

// SHIFTR is VSTATUS[SHIFTR]

// SHIFTL is VSTATUS[SHIFTL]

// VSTATUS[CPACK] = 0

// VR4H = Re(X) 16-bit

// VR4L = Im(X) 16-bit

// VR3 = Re(Y) 32-bit

// VR2 = Im(Y) 32-bit


// Calculate Z = X + Y

temp1 = sign_extend(VR4H); // 32-bit extended Re(X)

temp2 = sign_extend(VR4L); // 32-bit extended Im(X)

temp1 = (temp1 << SHIFTL) + VR3; // Re(Z) intermediate

temp2 = (temp2 << SHIFTL) + VR2; // Im(Z) intermediate

if (RND == 1)

{

temp1 = round(temp1 >> SHIFTR);

temp2 = round(temp2 >> SHIFTR);

}

else

{

temp1 = truncate(temp1 >> SHIFTR);

temp2 = truncate(temp2 >> SHIFTR);

}

if (SAT == 1)

{

VR5H = sat16(temp1);

VR5L = sat16(temp2);

}

else

{

VR5H = temp1[15:0];

VR5L = temp2[15:0];

}

Flags This instruction modifies the following bits in the VSTATUS register:

• OVFR is set if the real-part computation (VR5H) overflows or underflows.

• OVFI is set if the imaginary-part computation (VR5L) overflows or underflows.

Pipeline This is a single-cycle instruction.
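The pseudocode for this instruction can be mirrored in a host-side C model for the CPACK = 0 case. This is a sketch under assumptions, not the hardware definition: the names vstatus_cfg, vcdadd16_model, floor_shift and narrow16 are hypothetical, intermediates are held in 64 bits so they cannot overflow, and rounding is assumed to add half of the discarded range before the right shift, which reproduces the worked examples for this instruction (Section 2.3.2 remains the authoritative description).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical snapshot of the VSTATUS fields used by the instruction. */
typedef struct { bool rnd, sat; unsigned shiftl, shiftr; } vstatus_cfg;

/* Portable arithmetic right shift (rounds toward minus infinity). */
static int64_t floor_shift(int64_t v, unsigned n)
{
    return (v >= 0) ? (v >> n) : -(((-v) + ((INT64_C(1) << n) - 1)) >> n);
}

/* Keep bits [15:0], or clamp to the 16-bit range when sat is set. */
static uint16_t narrow16(int64_t v, bool sat, bool *ovf)
{
    *ovf = (v > INT16_MAX) || (v < INT16_MIN);
    if (*ovf && sat)
        v = (v > 0) ? INT16_MAX : INT16_MIN;
    return (uint16_t)v;
}

/* Returns the packed VR5 value: VR5H = Re(Z), VR5L = Im(Z). */
uint32_t vcdadd16_model(uint32_t vr4, int32_t vr3, int32_t vr2,
                        vstatus_cfg s, bool *ovfr, bool *ovfi)
{
    assert(s.shiftl < 32 && s.shiftr < 32);   /* shift amounts are 5-bit fields */

    int64_t re = (int16_t)(uint16_t)(vr4 >> 16);      /* sign-extended Re(X) */
    int64_t im = (int16_t)(uint16_t)(vr4 & 0xFFFFu);  /* sign-extended Im(X) */

    /* Multiplying by 2^SHIFTL equals the left shift and is defined for negatives. */
    re = re * (INT64_C(1) << s.shiftl) + vr3;
    im = im * (INT64_C(1) << s.shiftl) + vr2;

    /* Assumed rounding rule: add half an LSB of the result before shifting. */
    int64_t half = (s.rnd && s.shiftr > 0) ? (INT64_C(1) << (s.shiftr - 1)) : 0;
    re = floor_shift(re + half, s.shiftr);
    im = floor_shift(im + half, s.shiftr);

    uint16_t hi = narrow16(re, s.sat, ovfr);
    uint16_t lo = narrow16(im, s.sat, ovfi);
    return ((uint32_t)hi << 16) | lo;
}
```

Calling the model with X = 0xFFFC0003 (-4 + 3j), Y = 13 - 9j, SHIFTL = 2, SHIFTR = 1 and RND = 1 gives 0xFFFF0002, that is -1 + 2j.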

Example ;

;Example: Z = X + Y

;

; X = 4 + 3j (16-bit real + 16-bit imaginary)

; Y = 13 + 12j (32-bit real + 32-bit imaginary)

;

; Real:

; temp1 = 0x00000004 + 0x0000000D = 0x00000011

; VR5H = temp1[15:0] = 0x0011 = 17

; Imaginary:

; temp2 = 0x00000003 + 0x0000000C = 0x0000000F

; VR5L = temp2[15:0] = 0x000F = 15

;

VSATOFF ; VSTATUS[SAT] = 0

VRNDOFF ; VSTATUS[RND] = 0

VSETSHR #0 ; VSTATUS[SHIFTR] = 0

VSETSHL #0 ; VSTATUS[SHIFTL] = 0

VCLEARALL ; VR0, VR1...VR8 == 0

VMOVXI VR3, #13 ; VR3 = Re(Y) = 13

VMOVXI VR2, #12 ; VR2 = Im(Y) = 12

VMOVXI VR4, #3

VMOVIX VR4, #4 ; VR4 = X = 0x00040003 = 4 + 3j

VCDADD16 VR5, VR4, VR3, VR2 ; VR5 = Z = 0x0011000F = 17 + 15j

The next example illustrates the operation with a right shift value defined.

;

; Example: Z = X + Y with Right Shift


;

; X = 4 + 3j (16-bit real + 16-bit imaginary)

; Y = 13 + 12j (32-bit real + 32-bit imaginary)

;

; Real:

; temp1 = (0x00000004 + 0x0000000D ) >> 1

; temp1 = (0x00000011) >> 1 = 0x0000008.8

; VR5H = temp1[15:0] = 0x0008 = 8

; Imaginary:

; temp2 = (0x00000003 + 0x0000000C ) >> 1

; temp2 = (0x0000000F) >> 1 = 0x0000007.8

; VR5L = temp2[15:0] = 0x0007 = 7

;

VSATOFF ; VSTATUS[SAT] = 0

VRNDOFF ; VSTATUS[RND] = 0

VSETSHR #1 ; VSTATUS[SHIFTR] = 1

VSETSHL #0 ; VSTATUS[SHIFTL] = 0

VCLEARALL ; VR0, VR1...VR8 == 0

VMOVXI VR3, #13 ; VR3 = Re(Y) = 13

VMOVXI VR2, #12 ; VR2 = Im(Y) = 12

VMOVXI VR4, #3

VMOVIX VR4, #4 ; VR4 = X = 0x00040003 = 4 + 3j

VCDADD16 VR5, VR4, VR3, VR2 ; VR5 = Z = 0x00080007 = 8 + 7j

The next example illustrates the operation with a right shift value defined as well as

rounding.

;

; Example: Z = X + Y with Right Shift and Rounding

;

; X = 4 + 3j (16-bit real + 16-bit imaginary)

; Y = 13 + 12j (32-bit real + 32-bit imaginary)

;

; Real:

; temp1 = round((0x00000004 + 0x0000000D ) >> 1)

; temp1 = round(0x00000011 >> 1)

; temp1 = round(0x0000008.8) = 0x00000009

; VR5H = temp1[15:0] = 0x0009 = 9

; Imaginary:

; temp2 = round((0x00000003 + 0x0000000C ) >> 1)

; temp2 = round(0x0000000F >> 1)

; temp2 = round(0x0000007.8) = 0x00000008

; VR5L = temp2[15:0] = 0x0008 = 8

;

VSATOFF ; VSTATUS[SAT] = 0

VRNDON ; VSTATUS[RND] = 1

VSETSHR #1 ; VSTATUS[SHIFTR] = 1

VSETSHL #0 ; VSTATUS[SHIFTL] = 0

VCLEARALL ; VR0, VR1...VR8 == 0

VMOVXI VR3, #13 ; VR3 = Re(Y) = 13

VMOVXI VR2, #12 ; VR2 = Im(Y) = 12

VMOVXI VR4, #3

VMOVIX VR4, #4 ; VR4 = X = 0x00040003 = 4 + 3j

VCDADD16 VR5, VR4, VR3, VR2 ; VR5 = Z = 0x00090008 = 9 + 8j

The next example illustrates the operation with both a right and left shift value defined

along with rounding.

;

; Example: Z = X + Y with Right Shift, Left Shift and Rounding

;

; X = -4 + 3j (16-bit real + 16-bit imaginary)

; Y = 13 - 9j (32-bit real + 32-bit imaginary)

;

; Real:

; temp1 = (0xFFFFFFFC << 2) + 0x0000000D

; temp1 = 0xFFFFFFF0 + 0x0000000D = 0xFFFFFFFD

; temp1 = 0xFFFFFFFD >> 1 = 0xFFFFFFFE.8


; temp1 = round(0xFFFFFFFE.8) = 0xFFFFFFFF

; VR5H = temp1[15:0] = 0xFFFF = -1

; Imaginary:

; temp2 = (0x00000003 << 2) + 0xFFFFFFF7

; temp2 = 0x0000000C + 0xFFFFFFF7 = 0x00000003

; temp2 = 0x00000003 >> 1 = 0x00000001.8

; temp2 = round(0x00000001.8) = 0x00000002

; VR5L = temp2[15:0] = 0x0002 = 2

;

VSATOFF ; VSTATUS[SAT] = 0

VRNDON ; VSTATUS[RND] = 1

VSETSHR #1 ; VSTATUS[SHIFTR] = 1

VSETSHL #2 ; VSTATUS[SHIFTL] = 2

VCLEARALL ; VR0, VR1...VR8 == 0

VMOVXI VR3, #13 ; VR3 = Re(Y) = 13 = 0x0000000D

VMOVXI VR2, #-9 ; VR2 = Im(Y) = -9

VMOVIX VR2, #0xFFFF ; sign extend VR2 = 0xFFFFFFF7

VMOVXI VR4, #3

VMOVIX VR4, #-4 ; VR4 = X = 0xFFFC0003 = -4 + 3j

VCDADD16 VR5, VR4, VR3, VR2 ; VR5 = Z = 0xFFFF0002 = -1 + 2j
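The listings build each 32-bit register from two 16-bit immediates: VMOVXI supplies the low half and VMOVIX the high half, which is why a negative 32-bit value such as -9 needs an explicit 0xFFFF in the upper word. The C sketch below models that packing. The helper names are hypothetical, and the sketch assumes, as the listings suggest, that each instruction leaves the other half of the register unchanged.

```c
#include <assert.h>
#include <stdint.h>

/* Replace the low 16 bits of a register (the role VMOVXI plays in the listings). */
uint32_t load_low16(uint32_t reg, int16_t imm)
{
    return (reg & 0xFFFF0000u) | (uint16_t)imm;
}

/* Replace the high 16 bits of a register (the role VMOVIX plays in the listings). */
uint32_t load_high16(uint32_t reg, int16_t imm)
{
    return (reg & 0x0000FFFFu) | ((uint32_t)(uint16_t)imm << 16);
}

/* Hypothetical helper: load a signed 32-bit value as two 16-bit halves,
 * starting from a cleared register (as after VCLEARALL). */
uint32_t load_s32(int32_t value)
{
    uint32_t bits = (uint32_t)value;
    uint32_t reg = 0;
    reg = load_low16(reg, (int16_t)(uint16_t)(bits & 0xFFFFu));
    reg = load_high16(reg, (int16_t)(uint16_t)(bits >> 16));
    assert(reg == bits);   /* both halves together reproduce the full value */
    return reg;
}
```

Loading a low half of 3 and a high half of -4 yields 0xFFFC0003, the packed form of -4 + 3j used above. Loading only the low half with -9 leaves 0x0000FFF7, which reads as a positive 32-bit number until the high half is set to 0xFFFF.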

See also VCADD VR5, VR4, VR3, VR2 || VMOV32 VRa, mem32

VCADD VR7, VR6, VR5, VR4

VCDADD16 VR5, VR4, VR3, VR2 || VMOV32 VRa, mem32

VRNDOFF

VRNDON

VSATON

VSATOFF

VSETSHL #5-bit

VSETSHR #5-bit


VCDADD16 VR5, VR4, VR3, VR2 || VMOV32 VRa, mem32 Complex Double Add with Parallel Load

Operands Before the operation, the inputs should be loaded into registers as shown below. The

first operand is a complex number with a 16-bit real and 16-bit imaginary part. The

second operand has a 32-bit real and a 32-bit imaginary part.

Input Register Value

VR4H 16-bit integer:

if (VSTATUS[CPACK]==0)

Re(X)

else

Im(X)

VR4L 16-bit integer:

if (VSTATUS[CPACK]==0)

Im(X)

else

Re(X)

VR3 32-bit integer representing the real part of the 2nd input: Re(Y)

VR2 32-bit integer representing the imaginary part of the 2nd input: Im(Y)

mem32 pointer to a 32-bit memory location.

The result is a complex number with a 16-bit real and a 16-bit imaginary part. The result

is stored in VR5 as shown below:

Output Register Value

VR5H 16-bit integer:

if (VSTATUS[CPACK]==0){

Re(Z) = ((Re(X) << SHIFTL) + Re(Y)) >> SHIFTR

} else {

Im(Z) = ((Im(X) << SHIFTL) + Im(Y)) >> SHIFTR

}

VR5L 16-bit integer:

if (VSTATUS[CPACK]==0){

Im(Z) = ((Im(X) << SHIFTL) + Im(Y)) >> SHIFTR

} else {

Re(Z) = ((Re(X) << SHIFTL) + Re(Y)) >> SHIFTR

}

VRa Contents of the memory pointed to by [mem32]. VRa cannot be VR5 or VR8.

Opcode LSW: 1110 0011 1111 1010

MSW: 0000 aaaa mem32

Description Complex 16 + 32 = 16-bit operation with parallel register load. This operation is useful

for algorithms similar to a complex FFT.

The first operand is a complex number with a 16-bit real and 16-bit imaginary part. The

second operand has a 32-bit real and a 32-bit imaginary part.

Before the addition, the first input is sign extended to 32 bits and shifted left by VSTATUS[SHIFTL] bits. The result of the addition is right shifted by VSTATUS[SHIFTR] bits before it is stored in VR5H and VR5L. If VSTATUS[RND] is set, then bits shifted out to the right are rounded, otherwise these bits are truncated. The rounding operation is described in Section 2.3.2. If the VSTATUS[SAT] bit is set, then the result will be saturated in the event of a 16-bit overflow or underflow.

// RND is VSTATUS[RND]


// SAT is VSTATUS[SAT]

// SHIFTR is VSTATUS[SHIFTR]

// SHIFTL is VSTATUS[SHIFTL]

// VSTATUS[CPACK] = 0

// VR4H = Re(X) 16-bit

// VR4L = Im(X) 16-bit

// VR3 = Re(Y) 32-bit

// VR2 = Im(Y) 32-bit

temp1 = sign_extend(VR4H); // 32-bit extended Re(X)

temp2 = sign_extend(VR4L); // 32-bit extended Im(X)

temp1 = (temp1 << SHIFTL) + VR3; // Re(Z) intermediate

temp2 = (temp2 << SHIFTL) + VR2; // Im(Z) intermediate

if (RND == 1)

{

temp1 = round(temp1 >> SHIFTR);

temp2 = round(temp2 >> SHIFTR);

}

else

{

temp1 = truncate(temp1 >> SHIFTR);

temp2 = truncate(temp2 >> SHIFTR);

}

if (SAT == 1)

{

VR5H = sat16(temp1);

VR5L = sat16(temp2);

}

else

{

VR5H = temp1[15:0];

VR5L = temp2[15:0];

}

VRa = [mem32];

Flags This instruction modifies the following bits in the VSTATUS register:

• OVFR is set if the real-part (VR5H) computation overflows or underflows.

• OVFI is set if the imaginary-part (VR5L) computation overflows or underflows.

Pipeline Both operations complete in a single cycle.

Example For more information regarding the addition operation, see the examples for the

VCDADD16 VR5, VR4, VR3, VR2 instruction.

;

;Example: Right Shift, Left Shift and Rounding

;

; X = -4 + 3j (16-bit real + 16-bit imaginary)

; Y = 13 - 9j (32-bit real + 32-bit imaginary)

;

; Real:

; temp1 = (0xFFFFFFFC << 2) + 0x0000000D

; temp1 = 0xFFFFFFF0 + 0x0000000D = 0xFFFFFFFD

; temp1 = 0xFFFFFFFD >> 1 = 0xFFFFFFFE.8

; temp1 = round(0xFFFFFFFE.8) = 0xFFFFFFFF

; VR5H = temp1[15:0] = 0xFFFF = -1

; Imaginary:

; temp2 = (0x00000003 << 2) + 0xFFFFFFF7

; temp2 = 0x0000000C + 0xFFFFFFF7 = 0x00000003

; temp2 = 0x00000003 >> 1 = 0x00000001.8

; temp2 = round(0x00000001.8) = 0x00000002

; VR5L = temp2[15:0] = 0x0002 = 2


;

VSATOFF ; VSTATUS[SAT] = 0

VRNDON ; VSTATUS[RND] = 1

VSETSHR #1 ; VSTATUS[SHIFTR] = 1

VSETSHL #2 ; VSTATUS[SHIFTL] = 2

VCLEARALL ; VR0, VR1...VR8 == 0

VMOVXI VR3, #13 ; VR3 = Re(Y) = 13 = 0x0000000D

VMOVXI VR2, #-9 ; VR2 = Im(Y) = -9

VMOVIX VR2, #0xFFFF ; sign extend VR2 = 0xFFFFFFF7

VMOVXI VR4, #3

VMOVIX VR4, #-4 ; VR4 = X = 0xFFFC0003 = -4 + 3j

VCDADD16 VR5, VR4, VR3, VR2 ; VR5 = Z = 0xFFFF0002 = -1 + 2j

|| VMOV32 VR2, *XAR7 ; VR2 = value pointed to by XAR7
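In the listing above the parallel load overwrites VR2, which is also an input to the addition, yet the commented result still reflects the old VR2. The C sketch below models that ordering: every source is read first, and both destinations are written afterwards. It is only an illustration under assumptions. The name vcdadd16_par_load is hypothetical, SAT = 0 and CPACK = 0 are fixed, the rounding rule is the add-half rule that matches the worked examples, and the read-before-write ordering is inferred from the listing's comments rather than stated by the manual.

```c
#include <assert.h>
#include <stdint.h>

enum { VR_COUNT = 9 };   /* VR0 .. VR8 */

/* Portable arithmetic right shift (rounds toward minus infinity). */
static int64_t floor_shift(int64_t v, unsigned n)
{
    return (v >= 0) ? (v >> n) : -(((-v) + ((INT64_C(1) << n) - 1)) >> n);
}

/* Hypothetical model of "VCDADD16 VR5, VR4, VR3, VR2 || VMOV32 VRa, mem32". */
void vcdadd16_par_load(uint32_t vr[VR_COUNT], unsigned a, const uint32_t *mem32,
                       unsigned shl, unsigned shr, int rnd)
{
    assert(a < VR_COUNT && a != 5 && a != 8);   /* VRa cannot be VR5 or VR8 */
    assert(shl < 32 && shr < 32);

    /* 1. Read every source operand before any register is written. */
    int64_t xre = (int16_t)(uint16_t)(vr[4] >> 16);
    int64_t xim = (int16_t)(uint16_t)(vr[4] & 0xFFFFu);
    int64_t yre = (int32_t)vr[3];
    int64_t yim = (int32_t)vr[2];
    uint32_t loaded = *mem32;

    /* 2. Compute the shifted, optionally rounded sum. */
    int64_t scale = INT64_C(1) << shl;
    int64_t half = (rnd && shr > 0) ? (INT64_C(1) << (shr - 1)) : 0;
    int64_t zre = floor_shift(xre * scale + yre + half, shr);
    int64_t zim = floor_shift(xim * scale + yim + half, shr);

    /* 3. Commit both results; the load cannot disturb the sum computed above. */
    vr[5] = ((uint32_t)(uint16_t)zre << 16) | (uint16_t)zim;
    vr[a] = loaded;
}
```

With the register values from the listing and a = 2, VR5 ends up as 0xFFFF0002 while VR2 holds the newly loaded word.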

See also VCADD VR5, VR4, VR3, VR2 || VMOV32 VRa, mem32

VCADD VR7, VR6, VR5, VR4

VRNDOFF

VRNDON

VSATON

VSATOFF

VSETSHL #5-bit

VSETSHR #5-bit


VCDSUB16 VR6, VR4, VR3, VR2 Complex 16-32 = 16 Subtract

Operands Before the operation, the inputs should be loaded into registers as shown below. The

first operand is a complex number with a 16-bit real and 16-bit imaginary part. The

second operand has a 32-bit real and a 32-bit imaginary part.

Input Register Value

VR4H 16-bit integer:

if(VSTATUS[CPACK]==0)

Re(X)

else

Im(X)

VR4L 16-bit integer:

if(VSTATUS[CPACK]==0)

Im(X)

else

Re(X)

VR3 32-bit integer representing the real part of the 2nd input: Re(Y)

VR2 32-bit integer representing the imaginary part of the 2nd input: Im(Y)

The result is a complex number with a 16-bit real and a 16-bit imaginary part. The result

is stored in VR6 as shown below:

Output Register Value

VR6H 16-bit integer:

if (VSTATUS[CPACK]==0){

Re(Z) = ((Re(X) << SHIFTL) - Re(Y)) >> SHIFTR

} else {

Im(Z) = ((Im(X) << SHIFTL) - Im(Y)) >> SHIFTR

}

VR6L 16-bit integer:

if(VSTATUS[CPACK]==0){

Im(Z) = ((Im(X) << SHIFTL) - Im(Y)) >> SHIFTR

} else {

Re(Z) = ((Re(X) << SHIFTL) - Re(Y)) >> SHIFTR

}

Opcode LSW: 1110 0101 0000 0101

Description Complex 16 - 32 = 16-bit operation. This operation is useful for algorithms similar to a

complex FFT.

The first operand is a complex number with a 16-bit real and 16-bit imaginary part. The

second operand has a 32-bit real and a 32-bit imaginary part.

Before the subtraction, the first input is sign extended to 32 bits and shifted left by VSTATUS[SHIFTL] bits. The result of the subtraction is right shifted by VSTATUS[SHIFTR] bits before it is stored in VR6H and VR6L. If VSTATUS[RND] is set, then bits shifted out to the right are rounded, otherwise these bits are truncated. The rounding operation is described in Section 2.3.2. If the VSTATUS[SAT] bit is set, then the result will be saturated in the event of a 16-bit overflow or underflow.

// RND is VSTATUS[RND]

// SAT is VSTATUS[SAT]

// SHIFTR is VSTATUS[SHIFTR]

// SHIFTL is VSTATUS[SHIFTL]

// VSTATUS[CPACK] = 0

// VR4H = Re(X) 16-bit

// VR4L = Im(X) 16-bit

// VR3 = Re(Y) 32-bit


// VR2 = Im(Y) 32-bit

temp1 = sign_extend(VR4H); // 32-bit extended Re(X)

temp2 = sign_extend(VR4L); // 32-bit extended Im(X)

temp1 = (temp1 << SHIFTL) - VR3; // Re(Z) intermediate

temp2 = (temp2 << SHIFTL) - VR2; // Im(Z) intermediate

if (RND == 1)

{

temp1 = round(temp1 >> SHIFTR);

temp2 = round(temp2 >> SHIFTR);

}

else

{

temp1 = truncate(temp1 >> SHIFTR);

temp2 = truncate(temp2 >> SHIFTR);

}

if (SAT == 1)

{

VR6H = sat16(temp1);

VR6L = sat16(temp2);

}

else

{

VR6H = temp1[15:0];

VR6L = temp2[15:0];

}

Flags This instruction modifies the following bits in the VSTATUS register:

• OVFR is set if the real-part (VR6H) computation overflows or underflows.

• OVFI is set if the imaginary-part (VR6L) computation overflows or underflows.

Pipeline This is a single-cycle instruction.
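The worked examples for this instruction all run with SAT = 0 and never overflow, so the sketch below adds the saturating case to a C model of the pseudocode (CPACK = 0 only). The names vcdsub16_model, vcdsub16_result, floor_shift and store16 are hypothetical. The sketch assumes that OVFR and OVFI are raised whether or not SAT is set, and that rounding adds half of the discarded range before the right shift, which matches the worked examples.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical result record: packed VR6 plus the two overflow flags. */
typedef struct { uint32_t vr6; bool ovfr, ovfi; } vcdsub16_result;

/* Portable arithmetic right shift (rounds toward minus infinity). */
static int64_t floor_shift(int64_t v, unsigned n)
{
    return (v >= 0) ? (v >> n) : -(((-v) + ((INT64_C(1) << n) - 1)) >> n);
}

/* sat16() when sat is set, otherwise bits [15:0] of the value. */
static uint16_t store16(int64_t v, bool sat, bool *ovf)
{
    *ovf = (v > INT16_MAX) || (v < INT16_MIN);
    if (*ovf && sat)
        v = (v > 0) ? INT16_MAX : INT16_MIN;
    return (uint16_t)v;
}

vcdsub16_result vcdsub16_model(uint32_t vr4, int32_t vr3, int32_t vr2,
                               unsigned shl, unsigned shr, bool rnd, bool sat)
{
    assert(shl < 32 && shr < 32);   /* shift amounts are 5-bit fields */

    int64_t re = (int16_t)(uint16_t)(vr4 >> 16);      /* sign-extended Re(X) */
    int64_t im = (int16_t)(uint16_t)(vr4 & 0xFFFFu);  /* sign-extended Im(X) */
    int64_t half = (rnd && shr > 0) ? (INT64_C(1) << (shr - 1)) : 0;

    re = floor_shift(re * (INT64_C(1) << shl) - vr3 + half, shr);
    im = floor_shift(im * (INT64_C(1) << shl) - vr2 + half, shr);

    vcdsub16_result out;
    uint16_t hi = store16(re, sat, &out.ovfr);
    uint16_t lo = store16(im, sat, &out.ovfi);
    out.vr6 = ((uint32_t)hi << 16) | lo;
    return out;
}
```

For instance, X = 32767 - 32768j (0x7FFF8000) minus Y = -1 + 1j would be 32768 - 32769j, which does not fit in 16 bits. With SAT = 1 the model clamps to 0x7FFF8000 and reports both flags. With SAT = 0 the low 16 bits give 0x80007FFF.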

Example ;

; Example: Z = X - Y

;

; X = 4 + 6j (16-bit real + 16-bit imaginary)

; Y = 13 + 22j (32-bit real + 32-bit imaginary)

;

;Z=(4-13)+(6-22)j=-9-16j

;

VSATOFF ; VSTATUS[SAT] = 0

VRNDOFF ; VSTATUS[RND] = 0

VSETSHR #0 ; VSTATUS[SHIFTR] = 0

VSETSHL #0 ; VSTATUS[SHIFTL] = 0

VCLEARALL ; VR0, VR1...VR8 = 0

VMOVXI VR3, #13 ; VR3 = Re(Y) = 13 = 0x0000000D

VMOVXI VR2, #22 ; VR2 = Im(Y) = 22j = 0x00000016

VMOVXI VR4, #6

VMOVIX VR4, #4 ; VR4 = X = 0x00040006 = 4 + 6j

VCDSUB16 VR6, VR4, VR3, VR2 ; VR6 = Z = 0xFFF7FFF0 = -9 + -16j

The next example illustrates the operation with a right shift value defined.

;

; Example: Z = X - Y with Right Shift

; X = 4 + 6j (16-bit real + 16-bit imaginary)

; Y = 13 + 22j (32-bit real + 32-bit imaginary)

;

; Real:

; temp1 = (0x00000004 - 0x0000000D) >> 1

; temp1 = (0xFFFFFFF7) >> 1


; temp1 = 0xFFFFFFFB

; VR6H = temp1[15:0] = 0xFFFB = -5

; Imaginary:

; temp2 = (0x00000006 - 0x00000016) >> 1

; temp2 = (0xFFFFFFF0) >> 1

; temp2 = 0xFFFFFFF8

; VR6L = temp2[15:0] = 0xFFF8 = -8

;

VSATOFF ; VSTATUS[SAT] = 0

VRNDOFF ; VSTATUS[RND] = 0

VSETSHR #1 ; VSTATUS[SHIFTR] = 1

VSETSHL #0 ; VSTATUS[SHIFTL] = 0

VCLEARALL ; VR0, VR1...VR8 == 0

VMOVXI VR3, #13 ; VR3 = Re(Y) = 13 = 0x0000000D

VMOVXI VR2, #22 ; VR2 = Im(Y) = 22j = 0x00000016

VMOVXI VR4, #6

VMOVIX VR4, #4 ; VR4 = X = 0x00040006 = 4 + 6j

VCDSUB16 VR6, VR4, VR3, VR2 ; VR6 = Z = 0xFFFBFFF8 = -5 + -8j

The next example illustrates rounding with a right shift value defined.

;

; Example: Z = X-Y with Rounding and Right Shift

;

; X = 4 + 6j (16-bit real + 16-bit imaginary)

; Y = -13 + 22j (32-bit real + 32-bit imaginary)

;

; Real:

; temp1 = round((0x00000004 - 0xFFFFFFF3) >> 1)

; temp1 = round(0x00000011 >> 1)

; temp1 = round(0x00000008.8) = 0x00000009

; VR6H = temp1[15:0] = 0x0009 = 9

; Imaginary:

; temp2 = round((0x00000006 - 0x00000016) >> 1)

; temp2 = round(0xFFFFFFF0 >> 1)

; temp2 = round(0xFFFFFFF8.0) = 0xFFFFFFF8

; VR6L = temp2[15:0] = 0xFFF8 = -8

;

VSATOFF ; VSTATUS[SAT] = 0

VRNDON ; VSTATUS[RND] = 1

VSETSHR #1 ; VSTATUS[SHIFTR] = 1

VSETSHL #0 ; VSTATUS[SHIFTL] = 0

VCLEARALL ; VR0, VR1...VR8 == 0

VMOVXI VR3, #-13 ; VR3 = Re(Y)

VMOVIX VR3, #0xFFFF ; sign extend VR3 = -13 = 0xFFFFFFF3

VMOVXI VR2, #22 ; VR2 = Im(Y) = 22j = 0x00000016

VMOVXI VR4, #6

VMOVIX VR4, #4 ; VR4 = X = 0x00040006 = 4 + 6j

VCDSUB16 VR6, VR4, VR3, VR2 ; VR6 = Z = 0x0009FFF8 = 9 + -8j

The next example illustrates rounding with both a left and a right shift value defined.

;

; Example: Z = X-Y with Rounding and both Left and Right Shift

;

; X = 4 + 6j (16-bit real + 16-bit imaginary)

; Y = -13 + 22j (32-bit real + 32-bit imaginary)

;

; Real:

; temp1 = round(((0x00000004 << 2) - 0xFFFFFFF3) >> 1)

; temp1 = round((0x00000010 - 0xFFFFFFF3) >> 1)

; temp1 = round( 0x0000001D >> 1)

; temp1 = round( 0x0000000E.8) = 0x0000000F

; VR6H = temp1[15:0] = 0x000F = 15

; Imaginary:

; temp2 = round(((0x00000006 << 2) - 0x00000016) >> 1)


; temp2 = round((0x00000018 - 0x00000016) >> 1)

; temp2 = round( 0x00000002 >> 1)

; temp2 = round( 0x00000001.0) = 0x00000001

; VR6L = temp2[15:0] = 0x0001 = 1

;

VSATOFF ; VSTATUS[SAT] = 0

VRNDON ; VSTATUS[RND] = 1

VSETSHR #1 ; VSTATUS[SHIFTR] = 1

VSETSHL #2 ; VSTATUS[SHIFTL] = 2

VCLEARALL ; VR0, VR1...VR8 == 0

VMOVXI VR3, #-13 ; VR3 = Re(Y)

VMOVIX VR3, #0xFFFF ; sign extend VR3 = -13 = 0xFFFFFFF3

VMOVXI VR2, #22 ; VR2 = Im(Y) = 22j = 0x00000016

VMOVXI VR4, #6

VMOVIX VR4, #4 ; VR4 = X = 0x00040006 = 4 + 6j

VCDSUB16 VR6, VR4, VR3, VR2 ; VR6 = Z = 0x000F0001 = 15 + 1j

See also VCADD VR5, VR4, VR3, VR2 || VMOV32 VRa, mem32

VCADD VR7, VR6, VR5, VR4

VRNDOFF

VRNDON

VSATON

VSATOFF

VSETSHL #5-bit

VSETSHR #5-bit


VCDSUB16 VR6, VR4, VR3, VR2 || VMOV32 VRa, mem32 Complex 16-32 = 16 Subtract with Parallel Load

Operands Before the operation, the inputs should be loaded into registers as shown below. The

first operand is a complex number with a 16-bit real and 16-bit imaginary part. The

second operand has a 32-bit real and a 32-bit imaginary part.

Input Register Value

VR4H 16-bit integer:

if(VSTATUS[CPACK]==0)

Re(X)

else

Im(X)

VR4L 16-bit integer:

if(VSTATUS[CPACK]==0)

Im(X)

else

Re(X)

VR3 32-bit integer representing the real part of the 2nd input: Re(Y)

VR2 32-bit integer representing the imaginary part of the 2nd input: Im(Y)

mem32 pointer to a 32-bit memory location.

The result is a complex number with a 16-bit real and a 16-bit imaginary part. The result

is stored in VR6 as shown below:

Output Register Value

VR6H 16-bit integer:

if (VSTATUS[CPACK]==0){

Re(Z) = ((Re(X) << SHIFTL) - Re(Y)) >> SHIFTR

} else {

Im(Z) = ((Im(X) << SHIFTL) - Im(Y)) >> SHIFTR

}

VR6L 16-bit integer:

if(VSTATUS[CPACK]==0){

Im(Z) = ((Im(X) << SHIFTL) - Im(Y)) >> SHIFTR

} else {

Re(Z) = ((Re(X) << SHIFTL) - Re(Y)) >> SHIFTR

}

VRa Contents of the memory pointed to by [mem32]. VRa cannot be VR6 or VR8.

Opcode LSW: 1110 0011 1111 1011

MSW: 0000 aaaa mem32

Description Complex 16 - 32 = 16-bit operation with parallel load. This operation is useful for

algorithms similar to a complex FFT.