TMS320C28x Extended Instruction Sets (Rev. A) Technical Reference Manual
TMS320C28x Extended Instruction Sets
Technical Reference Manual

Literature Number: SPRUHS1A
March 2014 – Revised December 2015
Copyright © 2014–2015, Texas Instruments Incorporated
www.ti.com

Contents

Preface
1 Floating Point Unit (FPU)
  1.1 Overview
    1.1.1 Compatibility with the C28x Fixed-Point CPU
  1.2 Components of the C28x plus Floating-Point CPU
    1.2.1 Emulation Logic
    1.2.2 Memory Map
    1.2.3 On-Chip Program and Data
    1.2.4 CPU Interrupt Vectors
    1.2.5 Memory Interface
  1.3 CPU Register Set
    1.3.1 CPU Registers
  1.4 Pipeline
    1.4.1 Pipeline Overview
    1.4.2 General Guidelines for Floating-Point Pipeline Alignment
    1.4.3 Moves from FPU Registers to C28x Registers
    1.4.4 Moves from C28x Registers to FPU Registers
    1.4.5 Parallel Instructions
    1.4.6 Invalid Delay Instructions
    1.4.7 Optimizing the Pipeline
  1.5 Floating Point Unit Instruction Set
    1.5.1 Instruction Descriptions
    1.5.2 Instructions
2 C28 Viterbi, Complex Math and CRC Unit-II (VCU-II)
  2.1 Overview
  2.2 Components of the C28x Plus VCU
    2.2.1 Emulation Logic
    2.2.2 Memory Map
    2.2.3 CPU Interrupt Vectors
    2.2.4 Memory Interface
    2.2.5 Address and Data Buses
    2.2.6 Alignment of 32-Bit Accesses to Even Addresses
  2.3 Register Set
    2.3.1 VCU Register Set
    2.3.2 VCU Status Register (VSTATUS)
    2.3.3 Repeat Block Register (RB)
  2.4 Pipeline
    2.4.1 Pipeline Overview
    2.4.2 General Guidelines for VCU Pipeline Alignment
    2.4.3 Parallel Instructions
    2.4.4 Invalid Delay Instructions
  2.5 Instruction Set
    2.5.1 Instruction Descriptions
    2.5.2 General Instructions
    2.5.3 Arithmetic Math Instructions
    2.5.4 Complex Math Instructions
    2.5.5 Cyclic Redundancy Check (CRC) Instructions
    2.5.6 Deinterleaver Instructions
    2.5.7 FFT Instructions
    2.5.8 Galois Instructions
    2.5.9 Viterbi Instructions
  2.6 Rounding Mode
3 Trigonometric Math Unit (TMU)
  3.1 Overview
  3.2 Components of the C28x+FPU Plus TMU
    3.2.1 Interrupt Context Save and Restore
  3.3 Data Format
  3.4 Pipeline
    3.4.1 Pipeline and Register Conflicts
    3.4.2 Delay Slot Requirements
    3.4.3 Effect of Delay Slot Operations on the Flags
    3.4.4 Multi-Cycle Operations in Delay Slots
    3.4.5 Moves From FPU Registers to C28x Registers
  3.5 TMU Instruction Set
    3.5.1 Instruction Descriptions
    3.5.2 Common Restrictions
    3.5.3 Instructions
Revision History

List of Figures

1-1. FPU Functional Block Diagram
1-2. C28x With Floating-Point Registers
1-3. Floating-point Unit Status Register (STF)
1-4. Repeat Block Register (RB)
1-5. FPU Pipeline
2-1. C28x + VCU Block Diagram
2-2. C28x + FPU + VCU Registers
2-3. VCU Status Register (VSTATUS)
2-4. Repeat Block Register (RB)
2-5. C28x + FCU + VCU Pipeline

List of Tables

1-1. 28x Plus Floating-Point CPU Register Summary
1-2. Floating-point Unit Status (STF) Register Field Descriptions
1-3. Repeat Block (RB) Register Field Descriptions
1-4. Operand Nomenclature
1-5. Summary of Instructions
2-1. Viterbi Decode Performance
2-2. Complex Math Performance
2-3. VCU Register Set
2-4. 28x CPU Register Summary
2-5. VCU Status (VSTATUS) Register Field Descriptions
2-6. Operation Interaction With VSTATUS Bits
2-7. Repeat Block (RB) Register Field Descriptions
2-8. Operations Requiring a Delay Slot(s)
2-9. Operand Nomenclature
2-10. INSTRUCTION dest, source1, source2 Short Description
2-11. General Instructions
2-12. Arithmetic Math Instructions
2-13. Complex Math Instructions
2-14. CRC Instructions
2-15. Deinterleaver Instructions
2-16. FFT Instructions
2-17. Galois Field Instructions
2-18. Viterbi Instructions
2-19. Example: Values Before Shift Right
2-20. Example: Values after Shift Right
2-21. Example: Addition with Right Shift and Rounding
2-22. Example: Addition with Rounding After Shift Right
2-23. Shift Right Operation With and Without Rounding
3-1. TMU Supported Instructions
3-2. IEEE 32-Bit Single Precision Floating-Point Format
3-3. Delay Slot Requirements for TMU Instructions
3-4. Operand Nomenclature
3-5. Summary of Instructions

Preface
Read This First

This document describes the architecture, pipeline, and instruction sets of the TMU, VCU-II, and FPU accelerators.

About This Manual

The TMS320C2000™ digital signal processor (DSP) platform is part of the TMS320™ DSP family.

Notational Conventions

This document uses the following conventions.
• Hexadecimal numbers are shown with the suffix h or with a leading 0x. For example, the following number is 40 hexadecimal (decimal 64): 40h or 0x40.
• Registers in this document are shown as figures and described in tables.
  – Each register figure shows a rectangle divided into fields that represent the fields of the register.
    Each field is labeled with its bit name, its beginning and ending bit numbers above, and its read/write properties below. A legend explains the notation used for the properties.
  – A reserved bit in a register figure designates a bit that is reserved for future device expansion.

Related Documentation

The following books describe the TMS320x28x and related support tools that are available on the TI website:

Data Manual and Errata
SPRS439: TMS320F28335, TMS320F28334, TMS320F28332, TMS320F28235, TMS320F28234, TMS320F28232 Digital Signal Controllers (DSCs) Data Manual contains the pinout, signal descriptions, and electrical and timing specifications for the F2833x/2823x devices.
SPRZ272: TMS320F28335, F28334, F28332, TMS320F28235, F28234, F28232 Digital Signal Controllers (DSCs) Silicon Errata describes the advisories and usage notes for different versions of silicon.

CPU User's Guides
SPRU430: TMS320C28x CPU and Instruction Set Reference Guide describes the central processing unit (CPU) and the assembly language instructions of the TMS320C28x fixed-point digital signal processors (DSPs). It also describes emulation features available on these DSPs.
SPRUEO2: TMS320C28x Floating Point Unit and Instruction Set Reference Guide describes the floating-point unit and includes the instructions for the FPU.

Peripheral Guides
SPRU566: TMS320x28xx, 28xxx DSP Peripheral Reference Guide describes the peripheral reference guides of the 28x digital signal processors (DSPs).
SPRUFB0: TMS320x2833x, 2823x System Control and Interrupts Reference Guide describes the various interrupts and system control features of the 2833x and 2823x digital signal controllers (DSCs).
SPRU812: TMS320x2833x, 2823x Analog-to-Digital Converter (ADC) Reference Guide describes how to configure and use the on-chip ADC module, which is a 12-bit pipelined ADC.
SPRU949: TMS320x2833x, 2823x DSC External Interface (XINTF) Reference Guide describes the XINTF, which is a nonmultiplexed asynchronous bus, as it is used on the 2833x and 2823x devices.
SPRU963: TMS320x2833x, 2823x Boot ROM Reference Guide describes the purpose and features of the bootloader (factory-programmed boot-loading software) and provides examples of code. It also describes other contents of the device on-chip boot ROM and identifies where all of the information is located within that memory.
SPRUFB7: TMS320x2833x, 2823x Multichannel Buffered Serial Port (McBSP) Reference Guide describes the McBSP available on the 2833x and 2823x devices. The McBSPs allow direct interface between a DSP and other devices in a system.
SPRUFB8: TMS320x2833x, 2823x Direct Memory Access (DMA) Module Reference Guide describes the DMA on the 2833x and 2823x devices.
SPRUG04: TMS320x2833x, 2823x Enhanced Pulse Width Modulator (ePWM) Module Reference Guide describes the main areas of the enhanced pulse width modulator that include digital motor control, switch mode power supply control, UPS (uninterruptible power supplies), and other forms of power conversion.
SPRUG02: TMS320x2833x, 2823x High-Resolution Pulse Width Modulator (HRPWM) Reference Guide describes the operation of the high-resolution extension to the pulse width modulator (HRPWM).
SPRUFG4: TMS320x2833x, 2823x Enhanced Capture (eCAP) Module Reference Guide describes the enhanced capture module. It includes the module description and registers.
SPRUG05: TMS320x2833x, 2823x Enhanced Quadrature Encoder Pulse (eQEP) Module Reference Guide describes the eQEP module, which is used for interfacing with a linear or rotary incremental encoder to get position, direction, and speed information from a rotating machine in high-performance motion and position control systems. It includes the module description and registers.
SPRUEU1: TMS320x2833x, 2823x Enhanced Controller Area Network (eCAN) Reference Guide describes the eCAN, which uses an established protocol to communicate serially with other controllers in electrically noisy environments.
SPRUFZ5: TMS320x2833x, 2823x Serial Communications Interface (SCI) Reference Guide describes the SCI, which is a two-wire asynchronous serial port, commonly known as a UART. The SCI modules support digital communications between the CPU and other asynchronous peripherals that use the standard non-return-to-zero (NRZ) format.
SPRUEU3: TMS320x2833x, 2823x DSC Serial Peripheral Interface (SPI) Reference Guide describes the SPI, a high-speed synchronous serial input/output (I/O) port that allows a serial bit stream of programmed length (one to sixteen bits) to be shifted into and out of the device at a programmed bit-transfer rate.
SPRUG03: TMS320x2833x, 2823x Inter-Integrated Circuit (I2C) Module Reference Guide describes the features and operation of the inter-integrated circuit (I2C) module.

Tools Guides
SPRU513: TMS320C28x Assembly Language Tools v5.0.0 User's Guide describes the assembly language tools (assembler and other tools used to develop assembly language code), assembler directives, macros, common object file format, and symbolic debugging directives for the TMS320C28x device.
SPRU514: TMS320C28x Optimizing C/C++ Compiler v5.0.0 User's Guide describes the TMS320C28x™ C/C++ compiler. This compiler accepts ANSI standard C/C++ source code and produces TMS320 DSP assembly language source code for the TMS320C28x device.
SPRU608: TMS320C28x Instruction Set Simulator Technical Overview describes the simulator, available within the Code Composer Studio for TMS320C2000 IDE, that simulates the instruction set of the C28x™ core.
SPRU625: TMS320C28x DSP/BIOS 5.32 Application Programming Interface (API) Reference Guide describes development using DSP/BIOS.

Trademarks

TMS320C28x, C28x, and TMS320C2000 are trademarks of Texas Instruments.

Chapter 1
Floating Point Unit (FPU)

The TMS320C2000™ DSP family consists of fixed-point and floating-point digital signal controllers (DSCs). TMS320C2000™ Digital Signal Controllers combine the control peripheral integration and ease of use of a microcontroller (MCU) with the processing power and C efficiency of TI’s leading DSP technology. This chapter provides an overview of the architectural structure and components of the C28x plus floating-point unit CPU.

Topic
1.1 Overview
1.2 Components of the C28x plus Floating-Point CPU
1.3 CPU Register Set
1.4 Pipeline
1.5 Floating Point Unit Instruction Set

1.1 Overview

The C28x plus floating-point (C28x+FPU) processor extends the capabilities of the C28x fixed-point CPU by adding registers and instructions to support IEEE single-precision floating-point operations. This device draws from the best features of digital signal processing; reduced instruction set computing (RISC); and microcontroller architectures, firmware, and tool sets. The DSC features include a modified Harvard architecture and circular addressing. The RISC features are single-cycle instruction execution, register-to-register operations, and modified Harvard architecture (usable in von Neumann mode). The microcontroller features include ease of use through an intuitive instruction set, byte packing and unpacking, and bit manipulation.

The modified Harvard architecture of the CPU enables instruction and data fetches to be performed in parallel. The CPU can read instructions and data while it simultaneously writes data, which maintains single-cycle instruction operation across the pipeline. The CPU does this over six separate address/data buses.

Throughout this document the following notations are used:
• C28x refers to the C28x fixed-point CPU.
• C28x plus Floating-Point and C28x+FPU both refer to the C28x CPU with enhancements to support IEEE single-precision floating-point operations.

1.1.1 Compatibility with the C28x Fixed-Point CPU

No changes have been made to the C28x base set of instructions, pipeline, or memory bus architecture.
Therefore, programs written for the C28x CPU are completely compatible with the C28x+FPU, and all of the features of the C28x documented in TMS320C28x DSP CPU and Instruction Set Reference Guide (literature number SPRU430) apply to the C28x+FPU.

Figure 1-1 shows basic functions of the FPU.

Figure 1-1. FPU Functional Block Diagram
[Block diagram. Labels: C28x + FPU; existing memory, peripherals, interfaces; memory bus; program address bus (22); program data bus (32); read address bus (32); read data bus (32); write data bus (32); write address bus (32); LVF; LUF; PIE.]

1.1.1.1 Floating-Point Code Development

When developing C28x floating-point code, use Code Composer Studio 3.3, or later, with at least service release 8. The C28x compiler V5.0, or later, is also required to generate C28x native floating-point opcodes. This compiler is available via the Code Composer Studio update advisor as a separate download. V5.0 can generate both fixed-point and floating-point code. To build floating-point code, use the compiler switches -v28 and --float_support=fpu32. In Code Composer Studio 3.3 the float_support option is in the build options under Compiler -> Advanced: Floating Point Support. Without the float_support flag, or with float_support=none, the compiler generates fixed-point code.

When building for C28x floating-point, make sure all associated libraries have also been built for floating-point. The standard run-time support (RTS) libraries built for floating-point and included with the compiler have fpu32 in their name. For example, rts2800_fpu32.lib and rts2800_fpu32_eh.lib have been built for the floating-point unit. The "eh" version has exception handling for C++ code.
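Whether a source file was actually compiled with native floating-point support can be checked at compile time. The following is a minimal sketch, not TI-supplied code. It assumes that the TI C28x compiler predefines the macro __TMS320C28XX_FPU32__ when --float_support=fpu32 is in effect; confirm the macro name in the compiler user's guide for your tool version. The function names built_for_fpu32, float_support_name, and require_fpu32_build are hypothetical.

```c
#include <assert.h>

/* Returns 1 when this file was compiled for the 32-bit FPU, 0 otherwise.
 * ASSUMPTION: the compiler predefines __TMS320C28XX_FPU32__ when
 * --float_support=fpu32 is given. On a host compiler it is undefined. */
int built_for_fpu32(void)
{
#if defined(__TMS320C28XX_FPU32__)
    return 1;
#else
    return 0;
#endif
}

/* Human-readable name of the float_support setting seen by this file. */
const char *float_support_name(void)
{
    return built_for_fpu32() ? "fpu32" : "none";
}

/* Hypothetical startup check for a project that must use the FPU:
 * stops in a debug build if the file was compiled as fixed-point. */
void require_fpu32_build(void)
{
    assert(built_for_fpu32() && "rebuild with --float_support=fpu32");
}
```

Calling require_fpu32_build() once at startup makes a mismatched build configuration visible early in a debug build. It complements the linker's check on library compatibility and does not replace it.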
Using the fixed-point RTS libraries in a floating-point project will result in the linker issuing an error for incompatible object files.

To improve performance of native floating-point projects, consider using the C28x FPU Fast RTS Library (SPRC664). This library contains hand-coded optimized math routines such as division, square root, atan2, sin and cos. Linking this library into your project before the standard run-time support library gives your application a performance boost. As an example, the standard RTS library uses a polynomial expansion to calculate the sin function. The Fast RTS library, however, uses a math look-up table in the boot ROM of the device. Using this look-up table method saves approximately 20 cycles compared with the standard RTS calculation.

1.2 Components of the C28x plus Floating-Point CPU

The C28x+FPU contains:
• A central processing unit for generating data and program-memory addresses; decoding and executing instructions; performing arithmetic, logical, and shift operations; and controlling data transfers among CPU registers, data memory, and program memory.
• A floating-point unit for IEEE single-precision floating-point operations.
• Emulation logic for monitoring and controlling various parts and functions of the device and for testing device operation. This logic is identical to that on the C28x fixed-point CPU.
• Signals for interfacing with memory and peripherals, clocking and controlling the CPU and the emulation logic, showing the status of the CPU and the emulation logic, and using interrupts. This logic is identical to that on the C28x fixed-point CPU.

Some features of the C28x+FPU central processing unit are:
• Fixed-point instructions are pipeline protected. This pipeline for fixed-point instructions is identical to that on the C28x fixed-point CPU. The CPU implements an 8-phase pipeline that prevents a write to and a read from the same location from occurring out of order. See Figure 1-5.
• Some floating-point instructions require pipeline alignment. This alignment is done through software to allow the user to improve performance by taking advantage of required delay slots.
• Independent register space. These registers function as system-control registers, math registers, and data pointers. The system-control registers are accessed by special instructions.
• Arithmetic logic unit (ALU). The 32-bit ALU performs 2s-complement arithmetic and Boolean logic operations.
• Floating-point unit (FPU). The 32-bit FPU performs IEEE single-precision floating-point operations.
• Address register arithmetic unit (ARAU). The ARAU generates data memory addresses and increments or decrements pointers in parallel with ALU operations.
• Barrel shifter. This shifter performs all left and right shifts of fixed-point data. It can shift data to the left by up to 16 bits and to the right by up to 16 bits.
• Fixed-point multiplier. The multiplier performs 32-bit × 32-bit 2s-complement multiplication with a 64-bit result. The multiplication can be performed with two signed numbers, two unsigned numbers, or one signed number and one unsigned number.

1.2.1 Emulation Logic

The emulation logic is identical to that on the C28x fixed-point CPU. This logic includes the following features:
• Debug-and-test direct memory access (DT-DMA). A debug host can gain direct access to the content of registers and memory by taking control of the memory interface during unused cycles of the instruction pipeline.
• A counter for performance benchmarking.
• Multiple debug events. Any of the following debug events can cause a break in program execution:
  – A breakpoint initiated by the ESTOP0 or ESTOP1 instruction.
  – An access to a specified program-space or data-space location.
  When a debug event causes the C28x to enter the debug-halt state, the event is called a break event.
• Real-time mode of operation.

For more details about these features, refer to the TMS320C28x DSP CPU and Instruction Set Reference Guide (literature number SPRU430).

1.2.2 Memory Map

Like the C28x, the C28x+FPU uses 32-bit data addresses and 22-bit program addresses. This allows for a total address reach of 4G words (1 word = 16 bits) in data space and 4M words in program space. Memory blocks on all C28x+FPU designs are uniformly mapped to both program and data space.

For specific details about each of the map segments, see the data sheet for your device.

1.2.3 On-Chip Program and Data

All C28x+FPU based devices contain at least two blocks of single-access on-chip memory referred to as M0 and M1. Each of these blocks is 1K words in size. M0 is mapped at addresses 0x0000-0x03FF and M1 is mapped at addresses 0x0400-0x07FF. Like all other memory blocks on the C28x+FPU devices, M0 and M1 are mapped to both program and data space. Therefore, you can use M0 and M1 to execute code or for data variables. At reset, the stack pointer is set to the top of block M1. Depending on the device, it may also have additional random-access memory (RAM), read-only memory (ROM), external interface zones, or flash memory.

1.2.4 CPU Interrupt Vectors

The C28x+FPU interrupt vectors are identical to those on the C28x CPU. Sixty-four addresses in program space are set aside for a table of 32 CPU interrupt vectors. The CPU vectors can be mapped to the top or bottom of program space by way of the VMAP bit. For more information about the CPU vectors, see TMS320C28x DSP CPU and Instruction Set Reference Guide (literature number SPRU430). For devices with a peripheral interrupt expansion (PIE) block, the interrupt vectors reside in the PIE vector table and this memory can be used as program memory.

1.2.5 Memory Interface

The C28x+FPU memory interface is identical to that on the C28x.
The C28x+FPU memory map is accessible outside the CPU by the memory interface, which connects the CPU logic to memories, peripherals, or other interfaces. The memory interface includes separate buses for program space and data space. This means an instruction can be fetched from program memory while data memory is being accessed.
The interface also includes signals that indicate the type of read or write being requested by the CPU. These signals can select a specified memory block or peripheral for a given bus transaction. In addition to 16-bit and 32-bit accesses, the C28x+FPU supports special byte-access instructions that can access the least significant byte (LSByte) or most significant byte (MSByte) of an addressed word. Strobe signals indicate when such an access is occurring on a data bus.

1.2.5.1 Address and Data Buses
Like the C28x, the memory interface has three address buses:
• PAB: Program address bus
  The PAB carries addresses for reads and writes from program space. PAB is a 22-bit bus.
• DRAB: Data-read address bus
  The 32-bit DRAB carries addresses for reads from data space.
• DWAB: Data-write address bus
  The 32-bit DWAB carries addresses for writes to data space.
The memory interface also has three data buses:
• PRDB: Program-read data bus
  The PRDB carries instructions during reads from program space. PRDB is a 32-bit bus.
• DRDB: Data-read data bus
  The DRDB carries data during reads from data space. DRDB is a 32-bit bus.
• DWDB: Data-/Program-write data bus
  The 32-bit DWDB carries data during writes to data space or program space.
A program-space read and a program-space write cannot happen simultaneously because both use the PAB. Similarly, a program-space write and a data-space write cannot happen simultaneously because both use the DWDB.
Transactions that use different buses can happen simultaneously. For example, the CPU can read from program space (using PAB and PRDB), read from data space (using DRAB and DRDB), and write to data space (using DWAB and DWDB) at the same time. This behavior is identical to the C28x CPU.

1.2.5.2 Alignment of 32-Bit Accesses to Even Addresses
The C28x+FPU CPU expects memory wrappers or peripheral-interface logic to align any 32-bit read or write to an even address. If the address-generation logic generates an odd address, the CPU will begin reading or writing at the previous even address. This alignment does not affect the address values generated by the address-generation logic.
Most instruction fetches from program space are performed as 32-bit read operations and are aligned accordingly. However, alignment of instruction fetches is effectively invisible to a programmer. When instructions are stored to program space, they do not have to be aligned to even addresses. Instruction boundaries are decoded within the CPU.
You need to be concerned with alignment when using instructions that perform 32-bit reads from or writes to data space.

1.3 CPU Register Set
The C28x+FPU architecture is the same as the C28x CPU with an extended register and instruction set to support IEEE single-precision floating-point operations. This section describes the extensions to the C28x architecture.

1.3.1 CPU Registers
Devices with the C28x+FPU include the standard C28x register set plus an additional set of floating-point unit registers. The additional floating-point unit registers are the following:
• Eight floating-point result registers, RnH (where n = 0 - 7)
• Floating-point Status Register (STF)
• Repeat Block Register (RB)
All of the floating-point registers except the repeat block register are shadowed. This shadowing can be used in high-priority interrupts for fast context save and restore of the floating-point registers.
Figure 1-2 shows a diagram of both register sets and Table 1-1 shows a register summary. For information on the standard C28x register set, see the TMS320C28x DSP CPU and Instruction Set Reference Guide (literature number SPRU430).

Figure 1-2. C28x With Floating-Point Registers
Standard C28x Register Set: ACC (32-bit), P (32-bit), XT (32-bit), XAR0 to XAR7 (32-bit each), PC (22-bit), RPC (22-bit), DP (16-bit), SP (16-bit), ST0 (16-bit), ST1 (16-bit), IER (16-bit), IFR (16-bit), DBGIER (16-bit)
Additional 32-bit FPU Registers: R0H to R7H (32-bit each), FPU Status Register (STF), Repeat Block Register (RB)
FPU registers R0H - R7H and STF are shadowed for fast context save and restore.

Table 1-1.
28x Plus Floating-Point CPU Register Summary
Register  C28x CPU  C28x+FPU  Size     Description                       Value After Reset
ACC       Yes       Yes       32 bits  Accumulator                       0x00000000
AH        Yes       Yes       16 bits  High half of ACC                  0x0000
AL        Yes       Yes       16 bits  Low half of ACC                   0x0000
XAR0      Yes       Yes       32 bits  Auxiliary register 0              0x00000000
XAR1      Yes       Yes       32 bits  Auxiliary register 1              0x00000000
XAR2      Yes       Yes       32 bits  Auxiliary register 2              0x00000000
XAR3      Yes       Yes       32 bits  Auxiliary register 3              0x00000000
XAR4      Yes       Yes       32 bits  Auxiliary register 4              0x00000000
XAR5      Yes       Yes       32 bits  Auxiliary register 5              0x00000000
XAR6      Yes       Yes       32 bits  Auxiliary register 6              0x00000000
XAR7      Yes       Yes       32 bits  Auxiliary register 7              0x00000000
AR0       Yes       Yes       16 bits  Low half of XAR0                  0x0000
AR1       Yes       Yes       16 bits  Low half of XAR1                  0x0000
AR2       Yes       Yes       16 bits  Low half of XAR2                  0x0000
AR3       Yes       Yes       16 bits  Low half of XAR3                  0x0000
AR4       Yes       Yes       16 bits  Low half of XAR4                  0x0000
AR5       Yes       Yes       16 bits  Low half of XAR5                  0x0000
AR6       Yes       Yes       16 bits  Low half of XAR6                  0x0000
AR7       Yes       Yes       16 bits  Low half of XAR7                  0x0000
DP        Yes       Yes       16 bits  Data-page pointer                 0x0000
IFR       Yes       Yes       16 bits  Interrupt flag register           0x0000
IER       Yes       Yes       16 bits  Interrupt enable register         0x0000
DBGIER    Yes       Yes       16 bits  Debug interrupt enable register   0x0000
P         Yes       Yes       32 bits  Product register                  0x00000000
PH        Yes       Yes       16 bits  High half of P                    0x0000
PL        Yes       Yes       16 bits  Low half of P                     0x0000
PC        Yes       Yes       22 bits  Program counter                   0x3FFFC0
RPC       Yes       Yes       22 bits  Return program counter            0x00000000
SP        Yes       Yes       16 bits  Stack pointer                     0x0400
ST0       Yes       Yes       16 bits  Status register 0                 0x0000
ST1       Yes       Yes       16 bits  Status register 1                 0x080B (1)
XT        Yes       Yes       32 bits  Multiplicand register             0x00000000
T         Yes       Yes       16 bits  High half of XT                   0x0000
TL        Yes       Yes       16 bits  Low half of XT                    0x0000
R0H       No        Yes       32 bits  Floating-point result register 0  0.0
R1H       No        Yes       32 bits  Floating-point result register 1  0.0
R2H       No        Yes       32 bits  Floating-point result register 2  0.0
R3H       No        Yes       32 bits  Floating-point result register 3  0.0
R4H       No        Yes       32 bits  Floating-point result register 4  0.0
R5H       No        Yes       32 bits  Floating-point result register 5  0.0
R6H       No        Yes       32 bits  Floating-point result register 6  0.0
R7H       No        Yes       32 bits  Floating-point result register 7  0.0
STF       No        Yes       32 bits  Floating-point status register    0x00000000
RB        No        Yes       32 bits  Repeat block register             0x00000000
(1) Reset value shown is for devices without the VMAP signal and M0M1MAP signal pinned out. On these devices both of these signals are tied high internal to the device.

1.3.1.1 Floating-Point Status Register (STF)
The floating-point status register (STF) reflects the results of floating-point operations. There are three basic rules for floating-point operation flags:
1. Zero and negative flags are set based on moves to registers.
2. Zero and negative flags are set based on the result of compare, minimum, maximum, negative and absolute value operations.
3. Overflow and underflow flags are set by math instructions such as multiply, add, subtract and 1/x. These flags may also be connected to the peripheral interrupt expansion (PIE) block on your device. This can be useful for debugging underflow and overflow conditions within an application.
As on the C28x, program flow is controlled by C28x instructions that read status flags in the status register 0 (ST0). If a decision needs to be made based on a floating-point operation, the information in the STF register needs to be loaded into ST0 flags (Z, N, OV, TC, C) so that the appropriate conditional branch instruction can be executed. The MOVST0 FLAG instruction is used to load the current value of specified STF flags into the respective bits of ST0. When this instruction executes, it will also clear the latched overflow and underflow flags if those flags are specified.

Example 1-1.
Moving STF Flags to the ST0 Register
Loop:
    MOV32   R0H,*XAR4++
    MOV32   R1H,*XAR3++
    CMPF32  R1H, R0H
    MOVST0  ZF, NF          ; Move ZF and NF to ST0
    BF      Loop, GT        ; Loop if (R1H > R0H)

Figure 1-3. Floating-point Unit Status Register (STF)
Bits    Field     Access
31      SHDWS     R/W-0
30-16   Reserved  R-0
15-10   Reserved  R-0
9       RND32     R/W-0
8-7     Reserved  R-0
6       TF        R/W-0
5       ZI        R/W-0
4       NI        R/W-0
3       ZF        R/W-0
2       NF        R/W-0
1       LUF       R/W-0
0       LVF       R/W-0
LEGEND: R/W = Read/Write; R = Read only; -n = value after reset

Table 1-2. Floating-point Unit Status (STF) Register Field Descriptions
Bits    Field     Value  Description
31      SHDWS            Shadow Mode Status Bit
                  0      This bit is forced to 0 by the RESTORE instruction.
                  1      This bit is set to 1 by the SAVE instruction.
                         This bit is not affected by loading the status register either from memory or from the shadow values.
30-10   Reserved  0      Reserved for future use
9       RND32            Round 32-bit Floating-Point Mode
                  0      If this bit is zero, the MPYF32, ADDF32 and SUBF32 instructions will round to zero (truncate).
                  1      If this bit is one, the MPYF32, ADDF32 and SUBF32 instructions will round to the nearest even value.
8-7     Reserved  0      Reserved for future use
6       TF               Test Flag
                         The TESTTF instruction can modify this flag based on the condition tested. The SETFLG and SAVE instructions can also be used to modify this flag.
                  0      The condition tested with the TESTTF instruction is false.
                  1      The condition tested with the TESTTF instruction is true.
5       ZI               Zero Integer Flag
                         The following instructions modify this flag based on the integer value stored in the destination register: MOV32, MOVD32, MOVDD32
                         The SETFLG and SAVE instructions can also be used to modify this flag.
                  0      The integer value is not zero.
                  1      The integer value is zero.
4       NI               Negative Integer Flag
                         The following instructions modify this flag based on the integer value stored in the destination register: MOV32, MOVD32, MOVDD32
                         The SETFLG and SAVE instructions can also be used to modify this flag.
                  0      The integer value is not negative.
                  1      The integer value is negative.
3       ZF               Zero Floating-Point Flag (1) (2)
                         The following instructions modify this flag based on the floating-point value stored in the destination register: MOV32, MOVD32, MOVDD32, ABSF32, NEGF32
                         The CMPF32, MAXF32, and MINF32 instructions modify this flag based on the result of the operation.
                         The SETFLG and SAVE instructions can also be used to modify this flag.
                  0      The floating-point value is not zero.
                  1      The floating-point value is zero.
2       NF               Negative Floating-Point Flag (1) (2)
                         The following instructions modify this flag based on the floating-point value stored in the destination register: MOV32, MOVD32, MOVDD32, ABSF32, NEGF32
                         The CMPF32, MAXF32, and MINF32 instructions modify this flag based on the result of the operation.
                         The SETFLG and SAVE instructions can also be used to modify this flag.
                  0      The floating-point value is not negative.
                  1      The floating-point value is negative.
1       LUF              Latched Underflow Floating-Point Flag
                         The following instructions will set this flag to 1 if an underflow occurs: MPYF32, ADDF32, SUBF32, MACF32, EINVF32, EISQRTF32
                  0      An underflow condition has not been latched. If the MOVST0 instruction is used to copy this bit to ST0, then LUF will be cleared.
                  1      An underflow condition has been latched.
0       LVF              Latched Overflow Floating-Point Flag
                         The following instructions will set this flag to 1 if an overflow occurs: MPYF32, ADDF32, SUBF32, MACF32, EINVF32, EISQRTF32
                  0      An overflow condition has not been latched. If the MOVST0 instruction is used to copy this bit to ST0, then LVF will be cleared.
                  1      An overflow condition has been latched.
(1) A negative zero floating-point value is treated as a positive zero value when configuring the ZF and NF flags.
(2) A DeNorm floating-point value is treated as a positive zero value when configuring the ZF and NF flags.

1.3.1.2 Repeat Block Register (RB)
The repeat block instruction (RPTB) is a new instruction for C28x+FPU. This instruction allows you to repeat a block of code as shown in Example 1-2.

Example 1-2. The Repeat Block (RPTB) Instruction uses the RB Register
; find the largest element and put its address in XAR6
    MOV32  R0H, *XAR0++
    .align 2                    ; Aligns the next instruction to an even address
    NOP                         ; Makes RPTB odd aligned - required for a block size of 8
    RPTB   VECTOR_MAX_END, AR7  ; RA is set to 1
    MOVL   ACC,XAR0
    MOV32  R1H,*XAR0++          ; RSIZE reflects the size of the RPTB block
    MAXF32 R0H,R1H              ; in this case the block size is 8
    MOVST0 NF,ZF
    MOVL   XAR6,ACC,LT
VECTOR_MAX_END:                 ; RE indicates the end address. RA is cleared

The C28x+FPU hardware automatically populates the RB register based on the execution of a RPTB instruction. This register is not normally read by the application and does not accept debugger writes.

Figure 1-4. Repeat Block Register (RB)
Bits    Field   Access
31      RAS     R-0
30      RA      R-0
29-23   RSIZE   R-0
22-16   RE      R-0
15-0    RC      R-0
LEGEND: R = Read only; -n = value after reset

Table 1-3. Repeat Block (RB) Register Field Descriptions
Bits    Field   Value      Description
31      RAS                Repeat Block Active Shadow Bit
                           When an interrupt occurs the repeat active, RA, bit is copied to the RAS bit and the RA bit is cleared.
                           When an interrupt return instruction occurs, the RAS bit is copied to the RA bit and RAS is cleared.
                0          A repeat block was not active when the interrupt was taken.
                1          A repeat block was active when the interrupt was taken.
30      RA                 Repeat Block Active Bit
                0          This bit is cleared when the repeat counter, RC, reaches zero.
                           When an interrupt occurs the RA bit is copied to the repeat active shadow, RAS, bit and RA is cleared. When an interrupt return, IRET, instruction is executed, the RAS bit is copied to the RA bit and RAS is cleared.
                1          This bit is set when the RPTB instruction is executed to indicate that a RPTB is currently active.
29-23   RSIZE              Repeat Block Size
                           This 7-bit value specifies the number of 16-bit words within the repeat block. This field is initialized when the RPTB instruction is executed. The value is calculated by the assembler and inserted into the RPTB instruction's RSIZE opcode field.
                0-7        Illegal block size.
                8/9-0x7F   A RPTB block that starts at an even address must include at least 9 16-bit words and a block that starts at an odd address must include at least 8 16-bit words. The maximum block size is 127 16-bit words. The codegen assembler will check for proper block size and alignment.
22-16   RE                 Repeat Block End Address
                           This 7-bit value specifies the end address location of the repeat block. The RE value is calculated by hardware based on the RSIZE field and the PC value when the RPTB instruction is executed.
                           RE = lower 7 bits of (PC + 1 + RSIZE)
15-0    RC                 Repeat Count
                0          The block will not be repeated; it will be executed only once. In this case the repeat active, RA, bit will not be set.
                1-0xFFFF   This 16-bit value determines how many times the block will repeat. The counter is initialized when the RPTB instruction is executed and is decremented when the PC reaches the end of the block. When the counter reaches zero, the repeat active bit is cleared and the block will be executed one more time. Therefore the total number of times the block is executed is RC+1.
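
The field layout in Table 1-3 can be illustrated with a small host-side decoder. The following is a minimal sketch in C, assuming a raw 32-bit RB value has already been obtained by some other means (for example, copied from a debugger view). The type rb_fields and the functions rb_decode, rb_expected_re, rb_size_is_legal and rb_total_iterations are hypothetical names chosen for this illustration; they are not part of any TI header or tool.

```c
#include <stdint.h>

/* Hypothetical host-side view of the RB register fields (Table 1-3). */
typedef struct {
    unsigned ras;    /* bit 31:     repeat block active shadow        */
    unsigned ra;     /* bit 30:     repeat block active               */
    unsigned rsize;  /* bits 29-23: block size in 16-bit words        */
    unsigned re;     /* bits 22-16: lower 7 bits of the end address   */
    unsigned rc;     /* bits 15-0:  repeat count                      */
} rb_fields;

/* Split a raw 32-bit RB value into its fields. */
rb_fields rb_decode(uint32_t rb)
{
    rb_fields f;
    f.ras   = (rb >> 31) & 0x1u;
    f.ra    = (rb >> 30) & 0x1u;
    f.rsize = (rb >> 23) & 0x7Fu;
    f.re    = (rb >> 16) & 0x7Fu;
    f.rc    = rb & 0xFFFFu;
    return f;
}

/* RE = lower 7 bits of (PC + 1 + RSIZE), per the RE field description. */
unsigned rb_expected_re(uint32_t pc, unsigned rsize)
{
    return (unsigned)((pc + 1u + rsize) & 0x7Fu);
}

/* A block starting at an even address needs at least 9 words, a block
 * starting at an odd address at least 8; the maximum is 127 words. */
int rb_size_is_legal(uint32_t block_start, unsigned rsize)
{
    unsigned min_words = (block_start & 1u) ? 8u : 9u;
    return rsize >= min_words && rsize <= 127u;
}

/* The block executes RC + 1 times in total. */
uint32_t rb_total_iterations(unsigned rc)
{
    return (uint32_t)rc + 1u;
}
```

On the target itself the assembler performs the size and alignment checks, so a model like this is only useful for reasoning about the fields or for interpreting a captured register value.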
1.4 Pipeline
The pipeline flow for C28x instructions is identical to that of the C28x CPU described in TMS320C28x DSP CPU and Instruction Set Reference Guide (SPRU430). Some floating-point instructions, however, use additional execution phases and thus require a delay to allow the operation to complete. This pipeline alignment is achieved by inserting NOPs or non-conflicting instructions when required. Software control of delay slots allows you to improve performance of an application by taking advantage of the delay slots and filling them with non-conflicting instructions. This section describes the key characteristics of the pipeline with regard to floating-point instructions. The rules for avoiding pipeline conflicts are few and simple to follow, and the C28x+FPU assembler will help you by issuing errors for conflicts.

1.4.1 Pipeline Overview
The C28x FPU pipeline is identical to the C28x pipeline for all standard C28x instructions. In the decode2 stage (D2), it is determined if an instruction is a C28x instruction or a floating-point unit instruction. The pipeline flow is shown in Figure 1-5. Notice that normal C28x pipeline stalls (D2) and memory wait states (R2 and W) will also stall any C28x FPU instruction. Most C28x FPU instructions are single cycle and will complete in the FPU E1 or W stage, which aligns to the C28x pipeline. Some instructions will take an additional execute cycle (E2). For these instructions you must wait a cycle for the result from the instruction to be available. The rest of this section will describe when delay cycles are required. Keep in mind that the assembly tools for the C28x+FPU will issue an error if a delay slot has not been handled correctly.

Figure 1-5.
FPU Pipeline
[Figure: pipeline stage diagram. C28x pipeline stages: Fetch (F1, F2), Decode (D1, D2), Read (R1, R2), Exe (E), Write (W). FPU instruction stages: D, R, E1, E2, W. Instruction groups shown: Load; Store; CMP/MIN/MAX/NEG/ABS; MPY/ADD/SUB/MACF32.]

1.4.2 General Guidelines for Floating-Point Pipeline Alignment
While the C28x+FPU assembler will issue errors for pipeline conflicts, you may still find it useful to understand when software delays are required. This section describes three guidelines you can follow when writing C28x+FPU assembly code.
Floating-point instructions that require delay slots have a 'p' after their cycle count. For example '2p' stands for 2 pipelined cycles. This means that an instruction can be started every cycle, but the result of the instruction will only be valid one instruction later.
There are three general guidelines to determine if an instruction needs a delay slot:
1. Floating-point math operations (multiply, addition, subtraction, 1/x and MAC) require 1 delay slot.
2. Conversion instructions between integer and floating-point formats require 1 delay slot.
3. Everything else does not require a delay slot. This includes minimum, maximum, compare, load, store, negative and absolute value instructions.
There are two exceptions to these rules. First, moves between the CPU and FPU registers require special pipeline alignment that is described later in this section. These operations are typically infrequent. Second, the MACF32 R7H, R3H, mem32, *XAR7 instruction has special requirements that make it easier to use. Refer to the MACF32 instruction description for details.
An example of the 32-bit ADDF32 instruction is shown in Example 1-3. ADDF32 is a 2p instruction and therefore requires one delay slot. The destination register for the operation, R0H, will be updated one cycle after the instruction for a total of 2 cycles.
Therefore, a NOP or instruction that does not use R0H must follow this instruction.
Any memory stall or pipeline stall will also stall the floating-point unit. This keeps the floating-point unit aligned with the C28x pipeline and there is no need to change the code based on the wait states of a memory block.
Please note that on certain devices instructions may take additional cycles to complete under specific conditions. These exceptions will be documented in the device errata.

Example 1-3. 2p Instruction Pipeline Alignment
    ADDF32 R0H, #1.5, R1H   ; 2 pipeline cycles (2p)
    NOP                     ; 1 cycle delay or non-conflicting instruction
                            ; <-- ADDF32 completes, R0H updated
    NOP                     ; Any instruction

1.4.3 Moves from FPU Registers to C28x Registers
When transferring from the floating-point unit registers to the C28x CPU registers, additional pipeline alignment is required as shown in Example 1-4 and Example 1-5.

Example 1-4. Floating-Point to C28x Register Software Pipeline Alignment
; MINF32: 32-bit floating-point minimum: single-cycle operation
; An alignment cycle is required before copying R0H to ACC
    MINF32 R0H, R1H         ; Single-cycle instruction
                            ; <-- R0H is valid
    NOP                     ; Alignment cycle
    MOV32  @ACC, R0H        ; Copy R0H to ACC

For 1-cycle FPU instructions, one delay slot is required between a write to the floating-point register and the transfer instruction as shown in Example 1-4. For 2p FPU instructions, two delay slots are required between a write to the floating-point register and the transfer instruction as shown in Example 1-5.

Example 1-5.
Floating-Point to C28x Register Software Pipeline Alignment
; ADDF32: 32-bit floating-point addition: 2p operation
; An alignment cycle is required before copying R0H to ACC
    ADDF32 R0H, R1H, #2     ; R0H = R1H + 2, 2 pipeline cycle instruction
    NOP                     ; 1 delay cycle or non-conflicting instruction
                            ; <-- R0H is valid
    NOP                     ; Alignment cycle
    MOV32  @ACC, R0H        ; Copy R0H to ACC

1.4.4 Moves from C28x Registers to FPU Registers
Transfers from the standard C28x CPU registers to the floating-point registers require four alignment cycles. For the 2833x, 2834x, 2806x, 28M35xx and 28M26xx, the four alignment cycles can be filled with NOPs or any non-conflicting instruction except for FRACF32, UI16TOF32, I16TOF32, F32TOUI32, and F32TOI32. These instructions cannot replace any of the four alignment NOPs. On newer devices any non-conflicting instruction can go into the four alignment cycles. Please refer to the device errata for specific exceptions to these rules.

Example 1-6. C28x Register to Floating-Point Register Software Pipeline Alignment
; Four alignment cycles are required after copying a standard 28x CPU
; register to a floating-point register.
;
    MOV32  R0H,@ACC         ; Copy ACC to R0H
    NOP
    NOP
    NOP
    NOP                     ; Wait 4 cycles
    ADDF32 R2H,R1H,R0H      ; R0H is valid

1.4.5 Parallel Instructions
Parallel instructions are single opcodes that perform two operations in parallel. This can be a math operation in parallel with a move operation, or two math operations in parallel. Math operations with a parallel move are referred to as 2p/1 instructions. The math portion of the operation takes two pipelined cycles while the move portion of the operation is single cycle. This means that NOPs or other non-conflicting instructions must be inserted to align the math portion of the operation. An example of an add with parallel move instruction is shown in Example 1-7.

Example 1-7.
2p/1 Parallel Instruction Software Pipeline Alignment
; ADDF32 || MOV32 instruction: 32-bit floating-point add with parallel move
; ADDF32 is a 2p operation
; MOV32 is a 1 cycle operation
;
    ADDF32 R0H, R1H, #2     ; R0H = R1H + 2, 2 pipeline cycle operation
 || MOV32  R1H, @Val        ; R1H gets the contents of Val, single cycle operation
                            ; <-- MOV32 completes here (R1H is valid)
    NOP                     ; 1 cycle delay or non-conflicting instruction
                            ; <-- ADDF32 completes here (R0H is valid)
    NOP                     ; Any instruction

Parallel math instructions are referred to as 2p/2p instructions. Both math operations take 2 cycles to complete. This means that NOPs or other non-conflicting instructions must be inserted to align both math operations. An example of a multiply with parallel add instruction is shown in Example 1-8.

Example 1-8. 2p/2p Parallel Instruction Software Pipeline Alignment
; MPYF32 || ADDF32 instruction: 32-bit floating-point multiply with parallel add
; MPYF32 is a 2p operation
; ADDF32 is a 2p operation
;
    MPYF32 R0H, R1H, R3H    ; R0H = R1H * R3H, 2 pipeline cycle operation
 || ADDF32 R1H, R2H, R4H    ; R1H = R2H + R4H, 2 pipeline cycle operation
    NOP                     ; 1 cycle delay or non-conflicting instruction
                            ; <-- MPYF32 and ADDF32 complete here (R0H and R1H are valid)
    NOP                     ; Any instruction

1.4.6 Invalid Delay Instructions
Most instructions can be used in delay slots as long as source and destination register conflicts are avoided. The C28x+FPU assembler will issue an error anytime you use a conflicting instruction within a delay slot. The following guidelines can be used to avoid these conflicts.

NOTE: Destination register conflicts in delay slots: Any operation used for pipeline alignment delay must not use the same destination register as the instruction requiring the delay. See Example 1-9.
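
The destination-register rule in the note above can be modeled as a simple comparison. The following is a minimal sketch in C of a host-side check, not a description of how the C28x+FPU assembler detects conflicts. The type fpu_op, its fields, and the function dest_conflict_in_delay_slot are hypothetical names introduced only for this illustration; registers R0H to R7H are represented by the indices 0 to 7.

```c
#include <stdbool.h>

/* Hypothetical, simplified model of an FPU instruction. */
typedef struct {
    int dest;         /* destination register index 0-7, or -1 if none   */
    int delay_slots;  /* 1 for a 2p instruction, 0 for single-cycle ones */
} fpu_op;

/* Returns true if `candidate`, placed in the delay slot of `pending`,
 * writes the same destination register as `pending`. */
bool dest_conflict_in_delay_slot(fpu_op pending, fpu_op candidate)
{
    if (pending.delay_slots == 0) {
        return false;               /* no delay slot to violate */
    }
    if (pending.dest < 0 || candidate.dest < 0) {
        return false;               /* one of them writes no FPU register */
    }
    return pending.dest == candidate.dest;
}
```

With this model, a 2p multiply writing R2H followed by a move into R2H is flagged, while a move into R1H is accepted, which mirrors the invalid and valid cases of Example 1-9 and Example 1-10.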
In Example 1-9 the MPYF32 instruction uses R2H as its destination register. The next instruction should not use R2H as its destination. Since the MOV32 instruction uses the R2H register, the assembler will issue a pipeline conflict error. This conflict can be resolved by using a register other than R2H for the MOV32 instruction as shown in Example 1-10.

Example 1-9. Destination Register Conflict
; Invalid delay instruction. Both instructions use the same destination register
    MPYF32 R2H, R1H, R0H    ; 2p instruction
    MOV32  R2H, mem32       ; Invalid delay instruction

Example 1-10. Destination Register Conflict Resolved
; Valid delay instruction
    MPYF32 R2H, R1H, R0H    ; 2p instruction
    MOV32  R1H, mem32       ; Valid delay
                            ; <-- MPYF32 completes, R2H valid

NOTE: Instructions in delay slots cannot use the instruction's destination register as a source register. Any operation used for pipeline alignment delay must not use the destination register of the instruction requiring the delay as a source register as shown in Example 1-11. For parallel instructions, the current value of a register can be used in the parallel operation before it is overwritten as shown in Example 1-13.

In Example 1-11 the MPYF32 instruction again uses R2H as its destination register. The next instruction should not use R2H as its source since the MPYF32 will take an additional cycle to complete. Since the ADDF32 instruction uses the R2H register, the assembler will issue a pipeline conflict error. This conflict can be resolved by using a register other than R2H or by inserting a non-conflicting instruction between the MPYF32 and ADDF32 instructions. Since the SUBF32 does not use R2H, this instruction can be moved before the ADDF32 as shown in Example 1-12.

Example 1-11.
Destination/Source Register Conflict
; Invalid delay instruction. ADDF32 should not use R2H as a source operand
    MPYF32 R2H, R1H, R0H    ; 2p instruction
    ADDF32 R3H, R3H, R2H    ; Invalid delay instruction
    SUBF32 R4H, R1H, R0H

Example 1-12. Destination/Source Register Conflict Resolved
; Valid delay instruction.
    MPYF32 R2H, R1H, R0H    ; 2p instruction
    SUBF32 R4H, R1H, R0H    ; Valid delay for MPYF32
    ADDF32 R3H, R3H, R2H    ; <-- MPYF32 completes, R2H valid
    NOP                     ; <-- SUBF32 completes, R4H valid

It should be noted that a source register for the 2nd operation within a parallel instruction can be the same as the destination register of the first operation. This is because the two operations are started at the same time. The 2nd operation is not in the delay slot of the first operation. Consider Example 1-13 where the MPYF32 uses R2H as its destination register. The MOV32 is the 2nd operation in the instruction and can freely use R2H as a source register. The contents of R2H before the multiply will be used by MOV32.

Example 1-13. Parallel Instruction Destination/Source Exception
; Valid parallel operation.
    MPYF32 R2H, R1H, R0H    ; 2p/1 instruction
 || MOV32  mem32, R2H       ; <-- Uses R2H before the MPYF32
                            ; <-- mem32 updated
    NOP                     ; <-- Delay for MPYF32
                            ; <-- R2H updated

Likewise, the destination register of the 2nd operation within a parallel instruction can be the same as one of the source registers of the first operation. The MPYF32 operation in Example 1-14 uses the R1H register as one of its sources. This register is also updated by the MOV32 operation. The multiplication operation will use the value in R1H before the MOV32 updates it.

Example 1-14.
Parallel Instruction Destination/Source Exception
; Valid parallel instruction
    MPYF32 R2H, R1H, R0H    ; 2p/1 instruction
 || MOV32  R1H, mem32       ; Valid
    NOP                     ; <-- MOV32 completes, R1H valid
                            ; <-- MPYF32 completes, R2H valid

NOTE: Operations within parallel instructions cannot use the same destination register. When two parallel operations have the same destination register, the result is invalid. For example, see Example 1-15.

If both operations within a parallel instruction try to update the same destination register as shown in Example 1-15, the assembler will issue an error.

Example 1-15. Invalid Destination Within a Parallel Instruction
; Invalid parallel instruction. Both operations use the same destination register
    MPYF32 R2H, R1H, R0H    ; 2p/1 instruction
 || MOV32  R2H, mem32       ; Invalid

Some instructions access or modify the STF flags. Because the instruction requiring a delay slot will also be accessing the STF flags, these instructions should not be used in delay slots. These instructions are SAVE, SETFLG, RESTORE and MOVST0.

NOTE: Do not use SAVE, SETFLG, RESTORE, or the MOVST0 instruction in a delay slot.

1.4.7 Optimizing the Pipeline
The following example shows how delay slots can be used to improve the performance of an algorithm. The example performs two Y = MX+B operations. In Example 1-16, no optimization has been done. The Y = MX+B calculations are sequential and each takes 7 cycles to complete. Notice there are NOPs in the delay slots that could be filled with non-conflicting instructions. The only requirement is these instructions must not cause a register conflict or access the STF register flags.

Example 1-16.
Floating-Point Code Without Pipeline Optimization
; Using NOPs for alignment cycles, calculate the following:
;
; Y1 = M1*X1 + B1
; Y2 = M2*X2 + B2
;
; Calculate Y1
;
    MOV32  R0H,@M1          ; Load R0H with M1 - single cycle
    MOV32  R1H,@X1          ; Load R1H with X1 - single cycle
    MPYF32 R1H,R1H,R0H      ; R1H = M1 * X1 - 2p operation
 || MOV32  R0H,@B1          ; Load R0H with B1 - single cycle
    NOP                     ; Wait for MPYF32 to complete
                            ; <-- MPYF32 completes, R1H is valid
    ADDF32 R1H,R1H,R0H      ; R1H = R1H + R0H - 2p operation
    NOP                     ; Wait for ADDF32 to complete
                            ; <-- ADDF32 completes, R1H is valid
    MOV32  @Y1,R1H          ; Save R1H in Y1 - single cycle

; Calculate Y2
    MOV32  R0H,@M2          ; Load R0H with M2 - single cycle
    MOV32  R1H,@X2          ; Load R1H with X2 - single cycle
    MPYF32 R1H,R1H,R0H      ; R1H = M2 * X2 - 2p operation
 || MOV32  R0H,@B2          ; Load R0H with B2 - single cycle
    NOP                     ; Wait for MPYF32 to complete
                            ; <-- MPYF32 completes, R1H is valid
    ADDF32 R1H,R1H,R0H      ; R1H = R1H + R0H
    NOP                     ; Wait for ADDF32 to complete
                            ; <-- ADDF32 completes, R1H is valid
    MOV32  @Y2,R1H          ; Save R1H in Y2
; 14 cycles
; 48 bytes

The code shown in Example 1-17 was generated by the C28x+FPU compiler with optimization enabled. Notice that the NOPs in the first example have now been filled with other instructions. The code for the two Y = MX+B calculations is now interleaved and both calculations complete in only nine cycles.

Example 1-17.
Floating-Point Code With Pipeline Optimization

    ; Using non-conflicting instructions for alignment cycles,
    ; calculate the following:
    ;
    ; Y1 = M1*X1 + B1
    ; Y2 = M2*X2 + B2
    ;
       MOV32  R2H,@X1          ; Load R2H with X1 - single cycle
       MOV32  R1H,@M1          ; Load R1H with M1 - single cycle
       MPYF32 R3H,R2H,R1H      ; R3H = M1 * X1 - 2p operation
    || MOV32  R0H,@M2          ; Load R0H with M2 - single cycle
       MOV32  R1H,@X2          ; Load R1H with X2 - single cycle
                               ; <-- MPYF32 completes, R3H is valid
       MPYF32 R0H,R1H,R0H      ; R0H = M2 * X2 - 2p operation
    || MOV32  R4H,@B1          ; Load R4H with B1 - single cycle
                               ; <-- MOV32 completes, R4H is valid
       ADDF32 R1H,R4H,R3H      ; R1H = B1 + M1*X1 - 2p operation
    || MOV32  R2H,@B2          ; Load R2H with B2 - single cycle
                               ; <-- MPYF32 completes, R0H is valid
       ADDF32 R0H,R2H,R0H      ; R0H = B2 + M2*X2 - 2p operation
                               ; <-- ADDF32 completes, R1H is valid
       MOV32  @Y1,R1H          ; Store Y1
                               ; <-- ADDF32 completes, R0H is valid
       MOV32  @Y2,R0H          ; Store Y2
    ;
    ; 9 cycles
    ; 36 bytes

1.5 Floating Point Unit Instruction Set

This chapter describes the assembly language instructions of the TMS320C28x plus floating-point processor. Also described are parallel operations, conditional operations, resource constraints, and addressing modes. The instructions listed here are an extension to the standard C28x instruction set. For information on standard C28x instructions, see the TMS320C28x DSP CPU and Instruction Set Reference Guide (literature number SPRU430).

1.5.1 Instruction Descriptions

This section gives detailed information on the instruction set. Each instruction may present the following information:
• Operands
• Opcode
• Description
• Exceptions
• Pipeline
• Examples
• See also

The example INSTRUCTION is shown to familiarize you with the way each instruction is described. The example describes the kind of information you will find in each part of the individual instruction description and where to obtain more information. The C28x+FPU instructions follow the same format as the C28x instructions.
The source operand(s) are always on the right and the destination operand(s) are on the left.

The explanations for the syntax of the operands used in the instruction descriptions for the TMS320C28x plus floating-point processor are given in Table 1-4. For information on the operands of standard C28x instructions, see the TMS320C28x DSP CPU and Instruction Set Reference Guide (SPRU430).

Table 1-4. Operand Nomenclature

Symbol            Description
#16FHi            16-bit immediate (hex or float) value that represents the upper 16-bits of an IEEE 32-bit floating-point value. Lower 16-bits of the mantissa are assumed to be zero.
#16FHiHex         16-bit immediate hex value that represents the upper 16-bits of an IEEE 32-bit floating-point value. Lower 16-bits of the mantissa are assumed to be zero.
#16FLoHex         A 16-bit immediate hex value that represents the lower 16-bits of an IEEE 32-bit floating-point value
#32Fhex           32-bit immediate value that represents an IEEE 32-bit floating-point value
#32F              Immediate float value represented in floating-point representation
#0.0              Immediate zero
#RC               16-bit immediate value for the repeat count
*(0:16bitAddr)    16-bit immediate address, zero extended
CNDF              Condition to test the flags in the STF register
FLAG              Selected flags from STF register (OR) 11 bit mask indicating which floating-point status flags to change
label             Label representing the end of the repeat block
mem16             Pointer (using any of the direct or indirect addressing modes) to a 16-bit memory location
mem32             Pointer (using any of the direct or indirect addressing modes) to a 32-bit memory location
RaH               R0H to R7H registers
RbH               R0H to R7H registers
RcH               R0H to R7H registers
RdH               R0H to R7H registers
ReH               R0H to R7H registers
RfH               R0H to R7H registers
RB                Repeat Block Register
STF               FPU Status Register
VALUE             Flag value
                  of 0 or 1 for selected flag (OR) 11 bit mask indicating the flag value; 0 or 1

INSTRUCTION dest1, source1, source2    Short Description

Operands
    dest1      description for the 1st operand for the instruction
    source1    description for the 2nd operand for the instruction
    source2    description for the 3rd operand for the instruction

    Each instruction has a table that gives a list of the operands and a short description. Instructions always have their destination operand(s) first followed by the source operand(s).

Opcode
    This section shows the opcode for the instruction.

Description
    A detailed description of the instruction execution is given. Any constraints on the operands imposed by the processor or the assembler are discussed.

Restrictions
    Any constraints on the operands or use of the instruction imposed by the processor are discussed.

Pipeline
    This section describes the instruction in terms of pipeline cycles as described in Section 1.4.

Example
    Examples of instruction execution. If applicable, register and memory values are given before and after instruction execution. All examples assume the device is running with the OBJMODE set to 1. Normally the boot ROM or the C-code initialization will set this bit.

See Also
    Lists related instructions.

1.5.2 Instructions

The instructions are listed alphabetically, preceded by a summary.

Table 1-5. Summary of Instructions
ABSF32 RaH, RbH — 32-bit Floating-Point Absolute Value
ADDF32 RaH, #16FHi, RbH — 32-bit Floating-Point Addition
ADDF32 RaH, RbH, #16FHi — 32-bit Floating-Point Addition
ADDF32 RaH, RbH, RcH — 32-bit Floating-Point Addition
ADDF32 RdH, ReH, RfH ∥ MOV32 mem32, RaH — 32-bit Floating-Point Addition with Parallel Move
ADDF32 RdH, ReH, RfH ∥ MOV32 RaH, mem32 — 32-bit Floating-Point Addition with Parallel Move
CMPF32 RaH, RbH — 32-bit Floating-Point Compare for Equal, Less Than or Greater Than
CMPF32 RaH, #16FHi — 32-bit Floating-Point Compare for Equal, Less Than or Greater Than
CMPF32 RaH, #0.0 — 32-bit Floating-Point Compare for Equal, Less Than or Greater Than
EINVF32 RaH, RbH — 32-bit Floating-Point Reciprocal Approximation
EISQRTF32 RaH, RbH — 32-bit Floating-Point Square-Root Reciprocal Approximation
F32TOI16 RaH, RbH — Convert 32-bit Floating-Point Value to 16-bit Integer
F32TOI16R RaH, RbH — Convert 32-bit Floating-Point Value to 16-bit Integer and Round
F32TOI32 RaH, RbH — Convert 32-bit Floating-Point Value to 32-bit Integer
F32TOUI16 RaH, RbH — Convert 32-bit Floating-Point Value to 16-bit Unsigned Integer
F32TOUI16R RaH, RbH — Convert 32-bit Floating-Point Value to 16-bit Unsigned Integer and Round
F32TOUI32 RaH, RbH — Convert 32-bit Floating-Point Value to 32-bit Unsigned Integer
FRACF32 RaH, RbH — Fractional Portion of a 32-bit Floating-Point Value
I16TOF32 RaH, RbH — Convert 16-bit Integer to 32-bit Floating-Point Value
I16TOF32 RaH, mem16 — Convert 16-bit Integer to 32-bit Floating-Point Value
I32TOF32 RaH, mem32 — Convert 32-bit Integer to 32-bit Floating-Point Value
I32TOF32 RaH, RbH — Convert 32-bit Integer to 32-bit Floating-Point Value
MACF32 R3H, R2H, RdH, ReH, RfH — 32-bit Floating-Point Multiply with Parallel Add
MACF32 R3H, R2H, RdH, ReH, RfH ∥ MOV32 RaH, mem32 — 32-bit Floating-Point Multiply and Accumulate with Parallel Move
MACF32 R7H, R3H, mem32, *XAR7++ — 32-bit Floating-Point Multiply and Accumulate
MACF32 R7H, R6H, RdH, ReH, RfH — 32-bit Floating-Point Multiply with Parallel Add
MACF32 R7H, R6H, RdH, ReH, RfH ∥ MOV32 RaH, mem32 — 32-bit Floating-Point Multiply and Accumulate with Parallel Move
MAXF32 RaH, RbH — 32-bit Floating-Point Maximum
MAXF32 RaH, #16FHi — 32-bit Floating-Point Maximum
MAXF32 RaH, RbH ∥ MOV32 RcH, RdH — 32-bit Floating-Point Maximum with Parallel Move
MINF32 RaH, RbH — 32-bit Floating-Point Minimum
MINF32 RaH, #16FHi — 32-bit Floating-Point Minimum
MINF32 RaH, RbH ∥ MOV32 RcH, RdH — 32-bit Floating-Point Minimum with Parallel Move
MOV16 mem16, RaH — Move 16-bit Floating-Point Register Contents to Memory
MOV32 *(0:16bitAddr), loc32 — Move the Contents of loc32 to Memory
MOV32 ACC, RaH — Move 32-bit Floating-Point Register Contents to ACC
MOV32 loc32, *(0:16bitAddr) — Move 32-bit Value from Memory to loc32
MOV32 mem32, RaH — Move 32-bit Floating-Point Register Contents to Memory
MOV32 mem32, STF — Move 32-bit STF Register to Memory
MOV32 P, RaH — Move 32-bit Floating-Point Register Contents to P
MOV32 RaH, ACC — Move the Contents of ACC to a 32-bit Floating-Point Register
MOV32 RaH, mem32 {, CNDF} — Conditional 32-bit Move
MOV32 RaH, P — Move the Contents of P to a 32-bit Floating-Point Register
MOV32 RaH, RbH {, CNDF} — Conditional 32-bit Move
MOV32 RaH, XARn — Move the Contents of XARn to a 32-bit Floating-Point Register
MOV32 RaH, XT — Move the Contents of XT to a 32-bit Floating-Point Register
MOV32 STF, mem32 — Move 32-bit Value from Memory to the STF Register
MOV32 XARn, RaH — Move 32-bit Floating-Point Register Contents to XARn
MOV32 XT, RaH — Move 32-bit Floating-Point Register Contents to XT
MOVD32 RaH, mem32 — Move 32-bit Value from Memory with Data Copy
MOVF32 RaH, #32F — Load the 32-bits of a 32-bit Floating-Point Register
MOVI32 RaH, #32FHex — Load the 32-bits of a 32-bit Floating-Point Register with the immediate
MOVIZ RaH, #16FHiHex — Load the Upper 16-bits of a 32-bit Floating-Point Register
MOVIZF32 RaH, #16FHi — Load the Upper 16-bits of a 32-bit Floating-Point Register
MOVST0 FLAG — Load Selected STF Flags into ST0
MOVXI RaH, #16FLoHex — Move Immediate to the Low 16-bits of a Floating-Point Register
MPYF32 RaH, RbH, RcH — 32-bit Floating-Point Multiply
MPYF32 RaH, #16FHi, RbH — 32-bit Floating-Point Multiply
MPYF32 RaH, RbH, #16FHi — 32-bit Floating-Point Multiply
MPYF32 RaH, RbH, RcH ∥ ADDF32 RdH, ReH, RfH — 32-bit Floating-Point Multiply with Parallel Add
MPYF32 RdH, ReH, RfH ∥ MOV32 RaH, mem32 — 32-bit Floating-Point Multiply with Parallel Move
MPYF32 RdH, ReH, RfH ∥ MOV32 mem32, RaH — 32-bit Floating-Point Multiply with Parallel Move
MPYF32 RaH, RbH, RcH ∥ SUBF32 RdH, ReH, RfH — 32-bit Floating-Point Multiply with Parallel Subtract
NEGF32 RaH, RbH{, CNDF} — Conditional Negation
POP RB — Pop the RB Register from the Stack
PUSH RB — Push the RB Register onto the Stack
RESTORE — Restore the Floating-Point Registers
RPTB label, loc16 — Repeat a Block of Code
RPTB label, #RC — Repeat a Block of Code
SAVE FLAG, VALUE — Save Register Set to Shadow Registers and Execute SETFLG
SETFLG FLAG, VALUE — Set or Clear Selected Floating-Point Status Flags
SUBF32 RaH, RbH, RcH — 32-bit Floating-Point Subtraction
SUBF32 RaH, #16FHi, RbH — 32-bit Floating-Point Subtraction
SUBF32 RdH, ReH, RfH ∥ MOV32 RaH, mem32 — 32-bit Floating-Point Subtraction with Parallel Move
SUBF32 RdH, ReH, RfH ∥ MOV32 mem32, RaH — 32-bit Floating-Point Subtraction with Parallel Move
SWAPF RaH, RbH{, CNDF} — Conditional Swap
TESTTF CNDF — Test STF Register Flag Condition
UI16TOF32 RaH, mem16 — Convert Unsigned 16-bit Integer to 32-bit Floating-Point Value
UI16TOF32 RaH, RbH — Convert Unsigned 16-bit Integer to 32-bit Floating-Point Value
UI32TOF32 RaH, mem32 — Convert Unsigned 32-bit Integer to 32-bit Floating-Point Value
UI32TOF32 RaH, RbH — Convert Unsigned 32-bit Integer to 32-bit Floating-Point Value
ZERO RaH — Zero the Floating-Point Register RaH
ZEROA — Zero All Floating-Point Registers

ABSF32 RaH, RbH    32-bit Floating-Point Absolute Value

Operands
    RaH    floating-point destination register (R0H to R7H)
    RbH    floating-point source register (R0H to R7H)

Opcode
    LSW: 1110 0110 1001 0101
    MSW: 0000 0000 00bb baaa

Description
    The absolute value of RbH is loaded into RaH. Only the sign bit of the operand is modified by the ABSF32 instruction.

        if (RbH < 0) {RaH = -RbH}
        else {RaH = RbH}

Flags
    This instruction modifies the following flags in the STF register:

    Flag        TF   ZI   NI   ZF   NF   LUF  LVF
    Modified    No   No   No   Yes  Yes  No   No

    The STF register flags are modified as follows:

        NF = 0;
        ZF = 0;
        if ( RaH[30:23] == 0) ZF = 1;

Pipeline
    This is a single-cycle instruction.

Example
        MOVIZF32 R1H, #-2.0     ; R1H = -2.0 (0xC0000000)
        ABSF32   R1H, R1H       ; R1H = 2.0 (0x40000000), ZF = NF = 0

        MOVIZF32 R0H, #5.0      ; R0H = 5.0 (0x40A00000)
        ABSF32   R0H, R0H       ; R0H = 5.0 (0x40A00000), ZF = NF = 0

        MOVIZF32 R0H, #0.0      ; R0H = 0.0
        ABSF32   R1H, R0H       ; R1H = 0.0 ZF = 1, NF = 0

See also
    NEGF32 RaH, RbH{, CNDF}

ADDF32 RaH, #16FHi, RbH    32-bit Floating-Point Addition

Operands
    RaH       floating-point destination register (R0H to R7H)
    #16FHi    A 16-bit immediate value that represents the upper 16-bits of an IEEE 32-bit floating-point value. The low 16-bits of the mantissa are assumed to be all 0.
    RbH       floating-point source register (R0H to R7H)

Opcode
    LSW: 1110 1000 10II IIII
    MSW: IIII IIII IIbb baaa

Description
    Add RbH to the floating-point value represented by the immediate operand. Store the result of the addition in RaH.

    #16FHi is a 16-bit immediate value that represents the upper 16-bits of an IEEE 32-bit floating-point value. The low 16-bits of the mantissa are assumed to be all 0. #16FHi is most useful for representing constants where the lowest 16-bits of the mantissa are 0. Some examples are 2.0 (0x40000000), 4.0 (0x40800000), 0.5 (0x3F000000), and -1.5 (0xBFC00000). The assembler will accept either a hex or float as the immediate value. That is, the value -1.5 can be represented as #-1.5 or #0xBFC0.

        RaH = RbH + #16FHi:0

    This instruction can also be written as ADDF32 RaH, RbH, #16FHi.
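The #16FHi encoding described above can be checked on a host machine. The following is a minimal Python sketch, not part of the C28x tools: the helper names `f32_bits`, `to_16fhi`, and `from_16fhi` are hypothetical and chosen for illustration only. It takes the upper 16 bits of the IEEE 754 single-precision pattern and reports a constant as unsuitable when the low 16 bits are not all zero. How the assembler itself handles such a constant is not stated in this section, so the sketch makes no claim about that.

```python
import struct


def f32_bits(value: float) -> int:
    """Bit pattern of value after rounding to IEEE 754 single precision."""
    return struct.unpack(">I", struct.pack(">f", value))[0]


def to_16fhi(value: float):
    """Return the 16-bit immediate for value, or None when the low 16 bits
    of the single-precision pattern are not all zero (hypothetical helper)."""
    bits = f32_bits(value)
    if bits & 0xFFFF:
        return None
    return bits >> 16


def from_16fhi(imm16: int) -> float:
    """Expand a 16-bit immediate to the float it stands for (#16FHi:0)."""
    return struct.unpack(">f", struct.pack(">I", (imm16 & 0xFFFF) << 16))[0]
```

For the constants listed in the description, `to_16fhi(-1.5)` gives 0xBFC0 and `from_16fhi(0x3FC0)` gives 1.5, while a value such as 0.1 returns None because its single-precision mantissa has non-zero low bits.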
Flags
    This instruction modifies the following flags in the STF register:

    Flag        TF   ZI   NI   ZF   NF   LUF  LVF
    Modified    No   No   No   No   No   Yes  Yes

    The STF register flags are modified as follows:
    • LUF = 1 if ADDF32 generates an underflow condition.
    • LVF = 1 if ADDF32 generates an overflow condition.

Pipeline
    This is a 2 pipeline-cycle instruction (2p). That is:

        ADDF32 RaH, #16FHi, RbH   ; 2 pipeline cycles (2p)
        NOP                       ; 1 cycle delay or non-conflicting instruction
                                  ; <-- ADDF32 completes, RaH updated
        NOP

    Any instruction in the delay slot must not use RaH as a destination register or use RaH as a source operand.

Example
        ; Add to R1H the value 2.0 in 32-bit floating-point format
        ADDF32 R0H, #2.0, R1H     ; R0H = 2.0 + R1H
        NOP                       ; Delay for ADDF32 to complete
                                  ; <-- ADDF32 completes, R0H updated
        NOP                       ;

        ; Add to R3H the value -2.5 in 32-bit floating-point format
        ADDF32 R2H, #-2.5, R3H    ; R2H = -2.5 + R3H
        NOP                       ; Delay for ADDF32 to complete
                                  ; <-- ADDF32 completes, R2H updated
        NOP                       ;

        ; Add to R5H the value 0x3FC00000 (1.5)
        ADDF32 R5H, #0x3FC0, R5H  ; R5H = 1.5 + R5H
        NOP                       ; Delay for ADDF32 to complete
                                  ; <-- ADDF32 completes, R5H updated
        NOP                       ;

See also
    ADDF32 RaH, RbH, #16FHi
    ADDF32 RaH, RbH, RcH
    ADDF32 RdH, ReH, RfH || MOV32 RaH, mem32
    ADDF32 RdH, ReH, RfH || MOV32 mem32, RaH
    MACF32 R3H, R2H, RdH, ReH, RfH
    MPYF32 RaH, RbH, RcH || ADDF32 RdH, ReH, RfH

ADDF32 RaH, RbH, #16FHi    32-bit Floating-Point Addition

Operands
    RaH       floating-point destination register (R0H to R7H)
    RbH       floating-point source register (R0H to R7H)
    #16FHi    A 16-bit immediate value that represents the upper 16-bits of an IEEE 32-bit floating-point value. The low 16-bits of the mantissa are assumed to be all 0.

Opcode
    LSW: 1110 1000 10II IIII
    MSW: IIII IIII IIbb baaa

Description
    Add RbH to the floating-point value represented by the immediate operand. Store the result of the addition in RaH.

    #16FHi is a 16-bit immediate value that represents the upper 16-bits of an IEEE 32-bit floating-point value. The low 16-bits of the mantissa are assumed to be all 0. #16FHi is most useful for representing constants where the lowest 16-bits of the mantissa are 0. Some examples are 2.0 (0x40000000), 4.0 (0x40800000), 0.5 (0x3F000000), and -1.5 (0xBFC00000). The assembler will accept either a hex or float as the immediate value. That is, the value -1.5 can be represented as #-1.5 or #0xBFC0.

        RaH = RbH + #16FHi:0

    This instruction can also be written as ADDF32 RaH, #16FHi, RbH.

Flags
    This instruction modifies the following flags in the STF register:

    Flag        TF   ZI   NI   ZF   NF   LUF  LVF
    Modified    No   No   No   No   No   Yes  Yes

    The STF register flags are modified as follows:
    • LUF = 1 if ADDF32 generates an underflow condition.
    • LVF = 1 if ADDF32 generates an overflow condition.

Pipeline
    This is a 2 pipeline-cycle instruction (2p). That is:

        ADDF32 RaH, RbH, #16FHi   ; 2 pipeline cycles (2p)
        NOP                       ; 1 cycle delay or non-conflicting instruction
                                  ; <-- ADDF32 completes, RaH updated
        NOP

    Any instruction in the delay slot must not use RaH as a destination register or use RaH as a source operand.
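The delay-slot rule above lends itself to a simple mechanical check. The following Python sketch is an illustration under stated assumptions, not a description of how the TI assembler or compiler works: the `Instr` record and the `delay_slot_conflicts` function are hypothetical names, each list entry is assumed to occupy one cycle, and parallel instructions and STF flag accesses are not modeled. It flags any instruction placed in the delay slot of a 2p instruction that reads or writes that instruction's destination register.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Instr:
    text: str                      # assembly text, used only for reporting
    dest: Optional[str] = None     # destination register, if any
    srcs: Tuple[str, ...] = ()     # source registers
    delay_slots: int = 0           # 1 for a 2p instruction, 0 for single-cycle


def delay_slot_conflicts(program):
    """Return (producer, offender) text pairs where an instruction in a
    delay slot reads or writes the producer's destination register."""
    conflicts = []
    for i, producer in enumerate(program):
        if producer.dest is None:
            continue
        # Instructions issued while the producer's result is still pending.
        slots = program[i + 1 : i + 1 + producer.delay_slots]
        for other in slots:
            if other.dest == producer.dest or producer.dest in other.srcs:
                conflicts.append((producer.text, other.text))
    return conflicts
```

Given an `ADDF32 R0H, R1H, #2.0` followed immediately by a store of R0H, the function reports one conflict; placing a NOP or an unrelated load between the two clears it.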
Example
        ; Add to R1H the value 2.0 in 32-bit floating-point format
        ADDF32 R0H, R1H, #2.0     ; R0H = R1H + 2.0
        NOP                       ; Delay for ADDF32 to complete
                                  ; <-- ADDF32 completes, R0H updated
        NOP                       ;

        ; Add to R3H the value -2.5 in 32-bit floating-point format
        ADDF32 R2H, R3H, #-2.5    ; R2H = R3H + (-2.5)
        NOP                       ; Delay for ADDF32 to complete
                                  ; <-- ADDF32 completes, R2H updated
        NOP                       ;

        ; Add to R5H the value 0x3FC00000 (1.5)
        ADDF32 R5H, R5H, #0x3FC0  ; R5H = R5H + 1.5
        NOP                       ; Delay for ADDF32 to complete
                                  ; <-- ADDF32 completes, R5H updated
        NOP                       ;

See also
    ADDF32 RaH, #16FHi, RbH
    ADDF32 RaH, RbH, RcH
    ADDF32 RdH, ReH, RfH || MOV32 RaH, mem32
    ADDF32 RdH, ReH, RfH || MOV32 mem32, RaH
    MACF32 R3H, R2H, RdH, ReH, RfH
    MPYF32 RaH, RbH, RcH || ADDF32 RdH, ReH, RfH

ADDF32 RaH, RbH, RcH    32-bit Floating-Point Addition

Operands
    RaH    floating-point destination register (R0H to R7H)
    RbH    floating-point source register (R0H to R7H)
    RcH    floating-point source register (R0H to R7H)

Opcode
    LSW: 1110 0111 0001 0000
    MSW: 0000 000c ccbb baaa

Description
    Add the contents of RcH to the contents of RbH and load the result into RaH.

        RaH = RbH + RcH

Flags
    This instruction modifies the following flags in the STF register:

    Flag        TF   ZI   NI   ZF   NF   LUF  LVF
    Modified    No   No   No   No   No   Yes  Yes

    The STF register flags are modified as follows:
    • LUF = 1 if ADDF32 generates an underflow condition.
    • LVF = 1 if ADDF32 generates an overflow condition.

Pipeline
    This is a 2 pipeline-cycle instruction (2p).
    That is:

        ADDF32 RaH, RbH, RcH      ; 2 pipeline cycles (2p)
        NOP                       ; 1 cycle delay or non-conflicting instruction
                                  ; <-- ADDF32 completes, RaH updated
        NOP

    Any instruction in the delay slot must not use RaH as a destination register or use RaH as a source operand.

Example
    Calculate Y1 = M1*X1 + B1. This example assumes that M1, X1, B1 and Y1 are all on the same data page.

           MOVW   DP, #M1         ; Load the data page
           MOV32  R0H,@M1         ; Load R0H with M1
           MOV32  R1H,@X1         ; Load R1H with X1
           MPYF32 R1H,R1H,R0H     ; Multiply M1*X1
        || MOV32  R0H,@B1         ; and in parallel load R0H with B1
                                  ; <-- MOV32 complete
           NOP
                                  ; <-- MPYF32 complete
           ADDF32 R1H,R1H,R0H     ; Add M1*X1 to B1 and store in R1H
           NOP
                                  ; <-- ADDF32 complete
           MOV32  @Y1,R1H         ; Store the result

    Calculate Y = A + B

           MOVL   XAR4, #A
           MOV32  R0H, *XAR4      ; Load R0H with A
           MOVL   XAR4, #B
           MOV32  R1H, *XAR4      ; Load R1H with B
           ADDF32 R0H,R1H,R0H     ; Add A + B R0H=R0H+R1H
           MOVL   XAR4, #Y
                                  ; <-- ADDF32 complete
           MOV32  *XAR4,R0H       ; Store the result

See also
    ADDF32 RaH, RbH, #16FHi
    ADDF32 RaH, #16FHi, RbH
    ADDF32 RdH, ReH, RfH || MOV32 RaH, mem32
    ADDF32 RdH, ReH, RfH || MOV32 mem32, RaH
    MACF32 R3H, R2H, RdH, ReH, RfH
    MPYF32 RaH, RbH, RcH || ADDF32 RdH, ReH, RfH

ADDF32 RdH, ReH, RfH ∥ MOV32 mem32, RaH    32-bit Floating-Point Addition with Parallel Move

Operands
    RdH      floating-point destination register for the ADDF32 (R0H to R7H)
    ReH      floating-point source register for the ADDF32 (R0H to R7H)
    RfH      floating-point source register for the ADDF32 (R0H to R7H)
    mem32    pointer to a 32-bit memory location. This will be the destination of the MOV32.
    RaH      floating-point source register for the MOV32 (R0H to R7H)

Opcode
    LSW: 1110 0000 0001 fffe
    MSW: eedd daaa mem32

Description
    Perform an ADDF32 and a MOV32 in parallel. Add RfH to the contents of ReH and store the result in RdH. In parallel move the contents of RaH to the 32-bit location pointed to by mem32. mem32 addresses memory using any of the direct or indirect addressing modes supported by the C28x CPU.

        RdH = ReH + RfH,
        [mem32] = RaH

Flags
    This instruction modifies the following flags in the STF register:

    Flag        TF   ZI   NI   ZF   NF   LUF  LVF
    Modified    No   No   No   No   No   Yes  Yes

    The STF register flags are modified as follows:
    • LUF = 1 if ADDF32 generates an underflow condition.
    • LVF = 1 if ADDF32 generates an overflow condition.

Pipeline
    ADDF32 is a 2 pipeline-cycle instruction (2p) and MOV32 takes a single cycle. That is:

           ADDF32 RdH, ReH, RfH   ; 2 pipeline cycles (2p)
        || MOV32  mem32, RaH      ; 1 cycle
                                  ; <-- MOV32 completes, mem32 updated
           NOP                    ; 1 cycle delay or non-conflicting instruction
                                  ; <-- ADDF32 completes, RdH updated
           NOP

    Any instruction in the delay slot must not use RdH as a destination register or use RdH as a source operand.
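The timing in the pipeline description can be modeled on a host as a rough sanity check. The following Python sketch is a toy model under stated assumptions, not a cycle-accurate simulator: the class `FpuModel` and its methods are hypothetical, arithmetic uses Python doubles rather than single precision, and flags are ignored. It assumes the parallel MOV32 store reads RaH in the issue cycle, while the ADDF32 result reaches RdH only after one delay slot. The manual's example for this instruction stores a register in parallel with an operation that recomputes the same register, which is consistent with that assumption.

```python
class FpuModel:
    """Toy timing model: results of 2p operations land after one delay slot."""

    def __init__(self, **initial_regs):
        self.regs = {f"R{i}H": 0.0 for i in range(8)}
        self.regs.update(initial_regs)
        self.mem = {}
        self.cycle = 0
        self._pending = []  # (ready_at_cycle, register, value)

    def _end_cycle(self):
        # Advance one cycle, then retire every result that is now due.
        self.cycle += 1
        due = [p for p in self._pending if p[0] <= self.cycle]
        self._pending = [p for p in self._pending if p[0] > self.cycle]
        for _, reg, value in due:
            self.regs[reg] = value

    def nop(self):
        self._end_cycle()

    def addf32_par_store(self, rd, re, rf, addr, ra):
        """Model of: ADDF32 rd, re, rf || MOV32 [addr], ra"""
        total = self.regs[re] + self.regs[rf]
        self.mem[addr] = self.regs[ra]                      # store completes this cycle
        self._pending.append((self.cycle + 2, rd, total))   # ADDF32 is 2p
        self._end_cycle()
```

With R3H = 1.0, R6H = 2.0 and R4H = 3.0, issuing the parallel pair with R3H as both the ADDF32 destination and the MOV32 source stores 1.0 to memory, leaves R3H at 1.0 during the delay slot, and shows 5.0 in R3H once the delay slot has elapsed.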
Example
           ADDF32 R3H, R6H, R4H       ; (A) R3H = R6H + R4H and R7H = I3
        || MOV32  R7H, *-SP[2]        ;
                                      ; <-- R7H valid
           SUBF32 R6H, R6H, R4H       ; (B) R6H = R6H - R4H
                                      ; <-- ADDF32 (A) completes, R3H valid
           SUBF32 R3H, R1H, R7H       ; (C) R3H = R1H - R7H and store R3H (A)
        || MOV32  *+XAR5[2], R3H      ;
                                      ; <-- SUBF32 (B) completes, R6H valid
                                      ; <-- MOV32 completes, (A) stored
           ADDF32 R4H, R7H, R1H       ; R4H = D = R7H + R1H and store R6H (B)
        || MOV32  *+XAR5[6], R6H      ;
                                      ; <-- SUBF32 (C) completes, R3H valid
                                      ; <-- MOV32 completes, (B) stored
           MOV32  *+XAR5[0], R3H      ; store R3H (C)
                                      ; <-- MOV32 completes, (C) stored
                                      ; <-- ADDF32 (D) completes, R4H valid
           MOV32  *+XAR5[4], R4H      ; store R4H (D)
                                      ; <-- MOV32 completes, (D) stored

See also
    ADDF32 RaH, #16FHi, RbH
    ADDF32 RaH, RbH, #16FHi
    ADDF32 RaH, RbH, RcH
    MACF32 R3H, R2H, RdH, ReH, RfH
    MPYF32 RaH, RbH, RcH || ADDF32 RdH, ReH, RfH
    ADDF32 RdH, ReH, RfH || MOV32 RaH, mem32

ADDF32 RdH, ReH, RfH ∥ MOV32 RaH, mem32    32-bit Floating-Point Addition with Parallel Move

Operands
    RdH      floating-point destination register for the ADDF32 (R0H to R7H). RdH cannot be the same register as RaH.
    ReH      floating-point source register for the ADDF32 (R0H to R7H)
    RfH      floating-point source register for the ADDF32 (R0H to R7H)
    RaH      floating-point destination register for the MOV32 (R0H to R7H). RaH cannot be the same register as RdH.
    mem32    pointer to a 32-bit memory location. This is the source for the MOV32.
Opcode
    LSW: 1110 0011 0001 fffe
    MSW: eedd daaa mem32

Description
    Perform an ADDF32 and a MOV32 operation in parallel. Add RfH to the contents of ReH and store the result in RdH. In parallel move the contents of the 32-bit location pointed to by mem32 to RaH. mem32 addresses memory using any of the direct or indirect addressing modes supported by the C28x CPU.

        RdH = ReH + RfH,
        RaH = [mem32]

Restrictions
    The destination register for the ADDF32 and the MOV32 must be unique. That is, RaH and RdH cannot be the same register.

    Any instruction in the delay slot must not use RdH as a destination register or use RdH as a source operand.

Flags
    This instruction modifies the following flags in the STF register:

    Flag        TF   ZI   NI   ZF   NF   LUF  LVF
    Modified    No   Yes  Yes  Yes  Yes  Yes  Yes

    The STF register flags are modified as follows:
    • LUF = 1 if ADDF32 generates an underflow condition.
    • LVF = 1 if ADDF32 generates an overflow condition.

    The MOV32 instruction will set the NF, ZF, NI and ZI flags as follows:

        NF = RaH(31);
        ZF = 0;
        if(RaH(30:23) == 0) { ZF = 1; NF = 0; }
        NI = RaH(31);
        ZI = 0;
        if(RaH(31:0) == 0) ZI = 1;

Pipeline
    The ADDF32 takes 2 pipeline cycles (2p) and the MOV32 takes a single cycle.
    That is:

           ADDF32 RdH, ReH, RfH   ; 2 pipeline cycles (2p)
        || MOV32  RaH, mem32      ; 1 cycle
                                  ; <-- MOV32 completes, RaH updated
           NOP                    ; 1 cycle delay or non-conflicting instruction
                                  ; <-- ADDF32 completes, RdH updated
           NOP

Example
    Calculate Y = A + B - C:

           MOVL   XAR4, #A
           MOV32  R0H, *XAR4      ; Load R0H with A
           MOVL   XAR4, #B
           MOV32  R1H, *XAR4      ; Load R1H with B
           MOVL   XAR4, #C
           ADDF32 R0H,R1H,R0H     ; Add A + B and in parallel
        || MOV32  R2H, *XAR4      ; Load R2H with C
                                  ; <-- MOV32 complete
           MOVL   XAR4,#Y
                                  ; <-- ADDF32 complete
           SUBF32 R0H,R0H,R2H     ; Subtract C from (A + B)
           NOP
                                  ; <-- SUBF32 completes
           MOV32  *XAR4,R0H       ; Store the result

See also
    ADDF32 RaH, #16FHi, RbH
    ADDF32 RaH, RbH, #16FHi
    ADDF32 RaH, RbH, RcH
    ADDF32 RdH, ReH, RfH || MOV32 mem32, RaH
    MACF32 R3H, R2H, RdH, ReH, RfH
    MPYF32 RaH, RbH, RcH || ADDF32 RdH, ReH, RfH

CMPF32 RaH, RbH    32-bit Floating-Point Compare for Equal, Less Than or Greater Than

Operands
    RaH    floating-point source register (R0H to R7H)
    RbH    floating-point source register (R0H to R7H)

Opcode
    LSW: 1110 0110 1001 0100
    MSW: 0000 0000 00bb baaa

Description
    Set ZF and NF flags on the result of RaH - RbH. The CMPF32 instruction is performed as a logical compare operation. This is possible because of the IEEE format offsetting the exponent. Basically the bigger the binary number, the bigger the floating-point value.

    Special cases for inputs:
    • Negative zero will be treated as positive zero.
    • A denormalized value will be treated as positive zero.
    • Not-a-Number (NaN) will be treated as infinity.

Flags
    This instruction modifies the following flags in the STF register:

        Flag      TF   ZI   NI   ZF   NF   LUF  LVF
        Modified  No   No   No   Yes  Yes  No   No

    The STF register flags are modified as follows:

        If(RaH == RbH) {ZF=1, NF=0}
        If(RaH > RbH)  {ZF=0, NF=0}
        If(RaH < RbH)  {ZF=0, NF=1}

Pipeline
    This is a single-cycle instruction.

Example
        ; Behavior of ZF and NF flags for different comparisons
        ; (R1H is assumed to already hold a value less than 5.0)
        MOVIZF32 R0H, #5.0      ; R0H = 5.0 (0x40A00000)
        CMPF32 R1H, R0H         ; ZF = 0, NF = 1
        CMPF32 R0H, R1H         ; ZF = 0, NF = 0
        CMPF32 R0H, R0H         ; ZF = 1, NF = 0

        ; Using the result of a compare for loop control
    Loop:
        MOV32 R0H,*XAR4++       ; Load R0H
        MOV32 R1H,*XAR3++       ; Load R1H
        CMPF32 R1H, R0H         ; Set/clear ZF and NF
        MOVST0 ZF, NF           ; Copy ZF and NF to ST0 Z and N bits
        BF Loop, GT             ; Loop if R1H > R0H

See also
    CMPF32 RaH, #16FHi
    CMPF32 RaH, #0.0
    MAXF32 RaH, #16FHi
    MAXF32 RaH, RbH
    MINF32 RaH, #16FHi
    MINF32 RaH, RbH


CMPF32 RaH, #16FHi    32-bit Floating-Point Compare for Equal, Less Than or Greater Than

Operands
    RaH       floating-point source register (R0H to R7H)
    #16FHi    A 16-bit immediate value that represents the upper 16 bits of an IEEE 32-bit floating-point value. The low 16 bits of the mantissa are assumed to be all 0.

Opcode
    LSW: 1110 1000 0001 0III
    MSW: IIII IIII IIII Iaaa

Description
    Compare the value in RaH with the floating-point value represented by the immediate operand. Set the ZF and NF flags on (RaH - #16FHi:0).

    #16FHi is a 16-bit immediate value that represents the upper 16 bits of an IEEE 32-bit floating-point value. The low 16 bits of the mantissa are assumed to be all 0. This addressing mode is most useful for constants where the lowest 16 bits of the mantissa are 0.
    Some examples are 2.0 (0x40000000), 4.0 (0x40800000), 0.5 (0x3F000000), and -1.5 (0xBFC00000). The assembler will accept either a hex or float as the immediate value. That is, -1.5 can be represented as #-1.5 or #0xBFC0.

    The CMPF32 instruction is performed as a logical compare operation. This is possible because the IEEE floating-point format offsets the exponent: the bigger the binary number, the bigger the floating-point value.

    Special cases for inputs:
    • Negative zero will be treated as positive zero.
    • Denormalized value will be treated as positive zero.
    • Not-a-Number (NaN) will be treated as infinity.

Flags
    This instruction modifies the following flags in the STF register:

        Flag      TF   ZI   NI   ZF   NF   LUF  LVF
        Modified  No   No   No   Yes  Yes  No   No

    The STF register flags are modified as follows:

        If(RaH == #16FHi:0) {ZF=1, NF=0}
        If(RaH > #16FHi:0)  {ZF=0, NF=0}
        If(RaH < #16FHi:0)  {ZF=0, NF=1}

Pipeline
    This is a single-cycle instruction.

Example
        ; Behavior of ZF and NF flags for different comparisons
        MOVIZF32 R1H, #-2.0     ; R1H = -2.0 (0xC0000000)
        MOVIZF32 R0H, #5.0      ; R0H = 5.0 (0x40A00000)
        CMPF32 R1H, #-2.2       ; ZF = 0, NF = 0
        CMPF32 R0H, #6.5        ; ZF = 0, NF = 1
        CMPF32 R0H, #5.0        ; ZF = 1, NF = 0

        ; Using the result of a compare for loop control
    Loop:
        MOV32 R1H,*XAR3++       ; Load R1H
        CMPF32 R1H, #2.0        ; Set/clear ZF and NF
        MOVST0 ZF, NF           ; Copy ZF and NF to ST0 Z and N bits
        BF Loop, GT             ; Loop if R1H > #2.0

See also
    CMPF32 RaH, #0.0
    CMPF32 RaH, RbH
    MAXF32 RaH, #16FHi
    MAXF32 RaH, RbH
    MINF32 RaH, #16FHi
    MINF32 RaH, RbH


CMPF32 RaH, #0.0    32-bit Floating-Point Compare for Equal, Less Than or Greater Than

Operands
    RaH     floating-point source register (R0H to R7H)
    #0.0    zero

Opcode
    LSW: 1110 0101 1010 0aaa

Description
    Set the ZF and NF flags on (RaH - #0.0). The CMPF32 instruction is performed as a logical compare operation. This is possible because the IEEE floating-point format offsets the exponent: the bigger the binary number, the bigger the floating-point value.

    Special cases for inputs:
    • Negative zero will be treated as positive zero.
    • Denormalized value will be treated as positive zero.
    • Not-a-Number (NaN) will be treated as infinity.

Flags
    This instruction modifies the following flags in the STF register:

        Flag      TF   ZI   NI   ZF   NF   LUF  LVF
        Modified  No   No   No   Yes  Yes  No   No

    The STF register flags are modified as follows:

        If(RaH == #0.0) {ZF=1, NF=0}
        If(RaH > #0.0)  {ZF=0, NF=0}
        If(RaH < #0.0)  {ZF=0, NF=1}

Pipeline
    This is a single-cycle instruction.

Example
        ; Behavior of ZF and NF flags for different comparisons
        MOVIZF32 R0H, #5.0      ; R0H = 5.0 (0x40A00000)
        MOVIZF32 R1H, #-2.0     ; R1H = -2.0 (0xC0000000)
        MOVIZF32 R2H, #0.0      ; R2H = 0.0 (0x00000000)
        CMPF32 R0H, #0.0        ; ZF = 0, NF = 0
        CMPF32 R1H, #0.0        ; ZF = 0, NF = 1
        CMPF32 R2H, #0.0        ; ZF = 1, NF = 0

        ; Using the result of a compare for loop control
    Loop:
        MOV32 R1H,*XAR3++       ; Load R1H
        CMPF32 R1H, #0.0        ; Set/clear ZF and NF
        MOVST0 ZF, NF           ; Copy ZF and NF to ST0 Z and N bits
        BF Loop, GT             ; Loop if R1H > #0.0

See also
    CMPF32 RaH, RbH
    CMPF32 RaH, #16FHi
    MAXF32 RaH, #16FHi
    MAXF32 RaH, RbH
    MINF32 RaH, #16FHi
    MINF32 RaH, RbH


EINVF32 RaH, RbH    32-bit Floating-Point Reciprocal Approximation

Operands
    RaH    floating-point destination register (R0H to R7H)
    RbH
floating-point source register (R0H to R7H)

Opcode
    LSW: 1110 0110 1001 0011
    MSW: 0000 0000 00bb baaa

Description
    This operation generates an estimate of 1/X in 32-bit floating-point format accurate to approximately 8 bits. This value can be used in a Newton-Raphson algorithm to get a more accurate answer. That is:

        Ye = Estimate(1/X);
        Ye = Ye*(2.0 - Ye*X)
        Ye = Ye*(2.0 - Ye*X)

    After two iterations of the Newton-Raphson algorithm, you will get an exact answer accurate to the 32-bit floating-point format. On each iteration the mantissa bit accuracy approximately doubles. The EINVF32 operation will not generate a negative zero, DeNorm or NaN value.

        RaH = Estimate of 1/RbH

Flags
    This instruction modifies the following flags in the STF register:

        Flag      TF   ZI   NI   ZF   NF   LUF  LVF
        Modified  No   No   No   No   No   Yes  Yes

    The STF register flags are modified as follows:
    • LUF = 1 if EINVF32 generates an underflow condition.
    • LVF = 1 if EINVF32 generates an overflow condition.

Pipeline
    This is a 2 pipeline cycle (2p) instruction. That is:

        EINVF32 RaH, RbH        ; 2p
        NOP                     ; 1 cycle delay or non-conflicting instruction
                                ; <-- EINVF32 completes, RaH updated
        NOP

    Any instruction in the delay slot must not use RaH as a destination register or use RaH as a source operand.

Example
    Calculate Y = A/B. A fast division routine similar to that shown below can be found in the C28x FPU Fast RTS Library (SPRC664).

        MOVL XAR4, #A
        MOV32 R0H, *XAR4        ; Load R0H with A
        MOVL XAR4, #B
        MOV32 R1H, *XAR4        ; Load R1H with B
        LCR DIV                 ; Calculate R0H = R0H / R1H
        MOV32 *XAR4, R0H        ;
        ....
    DIV:
        EINVF32 R2H, R1H        ; R2H = Ye = Estimate(1/B)
        CMPF32 R0H, #0.0        ; Check if A == 0
        MPYF32 R3H, R2H, R1H    ; R3H = Ye*B
        NOP
        SUBF32 R3H, #2.0, R3H   ; R3H = 2.0 - Ye*B
        NOP
        MPYF32 R2H, R2H, R3H    ; R2H = Ye = Ye*(2.0 - Ye*B)
        NOP
        MPYF32 R3H, R2H, R1H    ; R3H = Ye*B
        CMPF32 R1H, #0.0        ; Check if B == 0.0
        SUBF32 R3H, #2.0, R3H   ; R3H = 2.0 - Ye*B
        NEGF32 R0H, R0H, EQ     ; Fixes sign for A/0.0
        MPYF32 R2H, R2H, R3H    ; R2H = Ye = Ye*(2.0 - Ye*B)
        NOP
        MPYF32 R0H, R0H, R2H    ; R0H = Y = A*Ye = A/B
        LRETR

See also
    EISQRTF32 RaH, RbH


EISQRTF32 RaH, RbH    32-bit Floating-Point Square-Root Reciprocal Approximation

Operands
    RaH    floating-point destination register (R0H to R7H)
    RbH    floating-point source register (R0H to R7H)

Opcode
    LSW: 1110 0110 1001 0010
    MSW: 0000 0000 00bb baaa

Description
    This operation generates an estimate of 1/sqrt(X) in 32-bit floating-point format accurate to approximately 8 bits. This value can be used in a Newton-Raphson algorithm to get a more accurate answer. That is:

        Ye = Estimate(1/sqrt(X));
        Ye = Ye*(1.5 - Ye*Ye*X/2.0)
        Ye = Ye*(1.5 - Ye*Ye*X/2.0)

    After 2 iterations of the Newton-Raphson algorithm, you will get an exact answer accurate to the 32-bit floating-point format. On each iteration the mantissa bit accuracy approximately doubles. The EISQRTF32 operation will not generate a negative zero, DeNorm or NaN value.

        RaH = Estimate of 1/sqrt (RbH)

Flags
    This instruction modifies the following flags in the STF register:

        Flag      TF   ZI   NI   ZF   NF   LUF  LVF
        Modified  No   No   No   No   No   Yes  Yes

    The STF register flags are modified as follows:
    • LUF = 1 if EISQRTF32 generates an underflow condition.
    • LVF = 1 if EISQRTF32 generates an overflow condition.

Pipeline
    This is a 2 pipeline cycle (2p) instruction. That is:

        EISQRTF32 RaH, RbH      ; 2 pipeline cycles (2p)
        NOP                     ; 1 cycle delay or non-conflicting instruction
                                ; <-- EISQRTF32 completes, RaH updated
        NOP

    Any instruction in the delay slot must not use RaH as a destination register or use RaH as a source operand.

Example
    Calculate the square root of X. A square-root routine similar to that shown below can be found in the C28x FPU Fast RTS Library (SPRC664).

        ; Y = sqrt(X)
        ; Ye = Estimate(1/sqrt(X));
        ; Ye = Ye*(1.5 - Ye*Ye*X*0.5)
        ; Ye = Ye*(1.5 - Ye*Ye*X*0.5)
        ; Y = X*Ye
    _sqrt:                      ; R0H = X on entry
        EISQRTF32 R1H, R0H      ; R1H = Ye = Estimate(1/sqrt(X))
        MPYF32 R2H, R0H, #0.5   ; R2H = X*0.5
        MPYF32 R3H, R1H, R1H    ; R3H = Ye*Ye
        NOP
        MPYF32 R3H, R3H, R2H    ; R3H = Ye*Ye*X*0.5
        NOP
        SUBF32 R3H, #1.5, R3H   ; R3H = 1.5 - Ye*Ye*X*0.5
        NOP
        MPYF32 R1H, R1H, R3H    ; R1H = Ye = Ye*(1.5 - Ye*Ye*X*0.5)
        NOP
        MPYF32 R3H, R1H, R2H    ; R3H = Ye*X*0.5
        NOP
        MPYF32 R3H, R1H, R3H    ; R3H = Ye*Ye*X*0.5
        NOP
        SUBF32 R3H, #1.5, R3H   ; R3H = 1.5 - Ye*Ye*X*0.5
        CMPF32 R0H, #0.0        ; Check if X == 0
        MPYF32 R1H, R1H, R3H    ; R1H = Ye = Ye*(1.5 - Ye*Ye*X*0.5)
        NOP
        MOV32 R1H, R0H, EQ      ; If X is zero, change the Ye estimate to 0
        MPYF32 R0H, R0H, R1H    ; R0H = Y = X*Ye = sqrt(X)
        LRETR

See also
    EINVF32 RaH, RbH


F32TOI16 RaH, RbH    Convert 32-bit Floating-Point Value to 16-bit Integer

Operands
    RaH    floating-point destination register (R0H to R7H)
    RbH    floating-point source register (R0H to R7H)

Opcode LSW:
1110 0110 1000 1100 MSW: 0000 0000 00bb baaa

Description
    Convert a 32-bit floating point value in RbH to a 16-bit integer and truncate. The result will be stored in RaH.

        RaH(15:0) = F32TOI16(RbH)
        RaH(31:16) = sign extension of RaH(15)

Flags
    This instruction does not affect any flags:

        Flag      TF   ZI   NI   ZF   NF   LUF  LVF
        Modified  No   No   No   No   No   No   No

Pipeline
    This is a 2 pipeline cycle (2p) instruction. That is:

        F32TOI16 RaH, RbH       ; 2 pipeline cycles (2p)
        NOP                     ; 1 cycle delay or non-conflicting instruction
                                ; <-- F32TOI16 completes, RaH updated
        NOP

    Any instruction in the delay slot must not use RaH as a destination register or use RaH as a source operand.

Example
        MOVIZF32 R0H, #5.0      ; R0H = 5.0 (0x40A00000)
        F32TOI16 R1H, R0H       ; R1H(15:0) = F32TOI16(R0H)
                                ; R1H(31:16) = Sign extension of R1H(15)
        MOVIZF32 R2H, #-5.0     ; R2H = -5.0 (0xC0A00000)
                                ; <-- F32TOI16 complete, R1H(15:0) = 5 (0x0005)
                                ;     R1H(31:16) = 0 (0x0000)
        F32TOI16 R3H, R2H       ; R3H(15:0) = F32TOI16(R2H)
                                ; R3H(31:16) = Sign extension of R3H(15)
        NOP                     ; 1 Cycle delay for F32TOI16 to complete
                                ; <-- F32TOI16 complete, R3H(15:0) = -5 (0xFFFB)
                                ;     R3H(31:16) = (0xFFFF)

See also
    F32TOI16R RaH, RbH
    F32TOUI16 RaH, RbH
    F32TOUI16R RaH, RbH
    I16TOF32 RaH, RbH
    I16TOF32 RaH, mem16
    UI16TOF32 RaH, mem16
    UI16TOF32 RaH, RbH


F32TOI16R RaH, RbH    Convert 32-bit Floating-Point Value to 16-bit Integer and Round

Operands
    RaH    floating-point destination register (R0H to R7H)
    RbH    floating-point source register (R0H to R7H)

Opcode
    LSW: 1110 0110 1000 1100
    MSW: 1000 0000 00bb baaa

Description
    Convert the 32-bit floating point value in RbH to a 16-bit integer and round to the nearest even value. The result is stored in RaH.
        RaH(15:0) = F32ToI16round(RbH)
        RaH(31:16) = sign extension of RaH(15)

Flags
    This instruction does not affect any flags:

        Flag      TF   ZI   NI   ZF   NF   LUF  LVF
        Modified  No   No   No   No   No   No   No

Pipeline
    This is a 2 pipeline cycle (2p) instruction. That is:

        F32TOI16R RaH, RbH      ; 2 pipeline cycles (2p)
        NOP                     ; 1 cycle delay or non-conflicting instruction
                                ; <-- F32TOI16R completes, RaH updated
        NOP

    Any instruction in the delay slot must not use RaH as a destination register or use RaH as a source operand.

Example
        MOVIZ R0H, #0x3FD9      ; R0H [31:16] = 0x3FD9
        MOVXI R0H, #0x999A      ; R0H [15:0] = 0x999A
                                ; R0H = 1.7 (0x3FD9999A)
        F32TOI16R R1H, R0H      ; R1H(15:0) = F32TOI16round (R0H)
                                ; R1H(31:16) = Sign extension of R1H(15)
        MOVF32 R2H, #-1.7       ; R2H = -1.7 (0xBFD9999A)
                                ; <-- F32TOI16R complete, R1H(15:0) = 2 (0x0002)
                                ;     R1H(31:16) = 0 (0x0000)
        F32TOI16R R3H, R2H      ; R3H(15:0) = F32TOI16round (R2H)
                                ; R3H(31:16) = Sign extension of R3H(15)
        NOP                     ; 1 Cycle delay for F32TOI16R to complete
                                ; <-- F32TOI16R complete, R3H(15:0) = -2 (0xFFFE)
                                ;     R3H(31:16) = (0xFFFF)

See also
    F32TOI16 RaH, RbH
    F32TOUI16 RaH, RbH
    F32TOUI16R RaH, RbH
    I16TOF32 RaH, RbH
    I16TOF32 RaH, mem16
    UI16TOF32 RaH, mem16
    UI16TOF32 RaH, RbH


F32TOI32 RaH, RbH    Convert 32-bit Floating-Point Value to 32-bit Integer

Operands
    RaH    floating-point destination register (R0H to R7H)
    RbH    floating-point source register (R0H to R7H)

Opcode
    LSW: 1110 0110 1000 1000
    MSW: 0000 0000 00bb baaa

Description
    Convert the 32-bit floating-point value in RbH to a 32-bit integer value and truncate. Store the result in RaH.

        RaH = F32TOI32(RbH)

Flags
    This instruction does not affect any flags:

        Flag      TF   ZI   NI   ZF   NF   LUF  LVF
        Modified  No   No   No   No   No   No   No

Pipeline
    This is a 2 pipeline cycle (2p) instruction.
    That is:

        F32TOI32 RaH, RbH       ; 2 pipeline cycles (2p)
        NOP                     ; 1 cycle delay or non-conflicting instruction
                                ; <-- F32TOI32 completes, RaH updated
        NOP

    Any instruction in the delay slot must not use RaH as a destination register or use RaH as a source operand.

Example
        MOVF32 R2H, #11204005.0     ; R2H = 11204005.0 (0x4B2AF5A5)
        F32TOI32 R3H, R2H           ; R3H = F32TOI32 (R2H)
        MOVF32 R4H, #-11204005.0    ; R4H = -11204005.0 (0xCB2AF5A5)
                                    ; <-- F32TOI32 complete,
                                    ;     R3H = 11204005 (0x00AAF5A5)
        F32TOI32 R5H, R4H           ; R5H = F32TOI32 (R4H)
        NOP                         ; 1 Cycle delay for F32TOI32 to complete
                                    ; <-- F32TOI32 complete,
                                    ;     R5H = -11204005 (0xFF550A5B)

See also
    F32TOUI32 RaH, RbH
    I32TOF32 RaH, RbH
    I32TOF32 RaH, mem32
    UI32TOF32 RaH, RbH
    UI32TOF32 RaH, mem32


F32TOUI16 RaH, RbH    Convert 32-bit Floating-Point Value to 16-bit Unsigned Integer

Operands
    RaH    floating-point destination register (R0H to R7H)
    RbH    floating-point source register (R0H to R7H)

Opcode
    LSW: 1110 0110 1000 1110
    MSW: 0000 0000 00bb baaa

Description
    Convert the 32-bit floating point value in RbH to an unsigned 16-bit integer value and truncate to zero. The result will be stored in RaH. To instead round the integer to the nearest even value use the F32TOUI16R instruction.

        RaH(15:0) = F32ToUI16(RbH)
        RaH(31:16) = 0x0000

Flags
    This instruction does not affect any flags:

        Flag      TF   ZI   NI   ZF   NF   LUF  LVF
        Modified  No   No   No   No   No   No   No

Pipeline
    This is a 2 pipeline cycle (2p) instruction. That is:

        F32TOUI16 RaH, RbH      ; 2 pipeline cycles (2p)
        NOP                     ; 1 cycle delay or non-conflicting instruction
                                ; <-- F32TOUI16 completes, RaH updated
        NOP

    Any instruction in the delay slot must not use RaH as a destination register or use RaH as a source operand.
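The truncating unsigned conversion can be checked on a host with a few lines of Python. This is a sketch, not TI code: the function name `f32_to_ui16` is hypothetical, returning 0 for negative inputs follows the result shown in this entry's example, and both the NaN handling and the clamp to 0xFFFF for large inputs are assumptions that this entry does not state.

```python
import math


def f32_to_ui16(value: float) -> int:
    """Host-side model of RaH(15:0) after F32TOUI16 (truncate toward zero)."""
    if math.isnan(value) or value <= 0.0:
        # Negative inputs give 0x0000 in the manual's example.
        # Mapping NaN to 0 is an assumption of this sketch.
        return 0x0000
    if value >= 65535.0:
        # Assumed: saturate at the top of the 16-bit unsigned range.
        return 0xFFFF
    return math.trunc(value)  # drop the fractional part


print(hex(f32_to_ui16(9.0)))
print(hex(f32_to_ui16(-9.0)))
```

RaH(31:16) is always 0x0000 for this instruction, so the full register value equals the returned integer.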
Example
        MOVIZF32 R4H, #9.0      ; R4H = 9.0 (0x41100000)
        F32TOUI16 R5H, R4H      ; R5H (15:0) = F32TOUI16 (R4H)
                                ; R5H (31:16) = 0x0000
        MOVIZF32 R6H, #-9.0     ; R6H = -9.0 (0xC1100000)
                                ; <-- F32TOUI16 complete, R5H (15:0) = 9 (0x0009)
                                ;     R5H (31:16) = 0 (0x0000)
        F32TOUI16 R7H, R6H      ; R7H (15:0) = F32TOUI16 (R6H)
                                ; R7H (31:16) = 0x0000
        NOP                     ; 1 Cycle delay for F32TOUI16 to complete
                                ; <-- F32TOUI16 complete, R7H (15:0) = 0 (0x0000)
                                ;     R7H (31:16) = 0 (0x0000)

See also
    F32TOI16 RaH, RbH
    F32TOI16R RaH, RbH
    F32TOUI16R RaH, RbH
    I16TOF32 RaH, RbH
    I16TOF32 RaH, mem16
    UI16TOF32 RaH, mem16
    UI16TOF32 RaH, RbH


F32TOUI16R RaH, RbH    Convert 32-bit Floating-Point Value to 16-bit Unsigned Integer and Round

Operands
    RaH    floating-point destination register (R0H to R7H)
    RbH    floating-point source register (R0H to R7H)

Opcode
    LSW: 1110 0110 1000 1110
    MSW: 1000 0000 00bb baaa

Description
    Convert the 32-bit floating-point value in RbH to an unsigned 16-bit integer and round to the closest even value. The result will be stored in RaH. To instead truncate the converted value, use the F32TOUI16 instruction.

        RaH(15:0) = F32ToUI16round(RbH)
        RaH(31:16) = 0x0000

Flags
    This instruction does not affect any flags:

        Flag      TF   ZI   NI   ZF   NF   LUF  LVF
        Modified  No   No   No   No   No   No   No

Pipeline
    This is a 2 pipeline cycle (2p) instruction. That is:

        F32TOUI16R RaH, RbH     ; 2 pipeline cycles (2p)
        NOP                     ; 1 cycle delay or non-conflicting instruction
                                ; <-- F32TOUI16R completes, RaH updated
        NOP

    Any instruction in the delay slot must not use RaH as a destination register or use RaH as a source operand.
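"Round to the closest even value" matters only for inputs that sit exactly halfway between two integers. The sketch below illustrates that tie rule with Python's built-in `round`, which also rounds halves to even. The name `f32_to_ui16_round` is hypothetical; clamping negatives to 0 follows this entry's example, while the upper clamp and the NaN result are assumptions.

```python
import math


def f32_to_ui16_round(value: float) -> int:
    """Host-side model of RaH(15:0) after F32TOUI16R (round half to even)."""
    if math.isnan(value):
        return 0x0000  # assumed; not stated in this entry
    if value >= 65535.0:
        return 0xFFFF  # assumed saturation at the top of the range
    rounded = round(value)  # Python rounds exact halves to the even neighbour
    return max(rounded, 0)  # negative results give 0, as in the manual's example


for x in (10.8, -10.8, 2.5, 3.5):
    print(x, "->", f32_to_ui16_round(x))
```

The 2.5 and 3.5 cases land on 2 and 4 respectively, which is the difference from a round-half-up rule.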
Example
        MOVIZ R5H, #0x412C      ; R5H [31:16] = 0x412C
        MOVXI R5H, #0xCCCD      ; R5H [15:0] = 0xCCCD
                                ; R5H = 10.8 (0x412CCCCD)
        F32TOUI16R R6H, R5H     ; R6H (15:0) = F32TOUI16round (R5H)
                                ; R6H (31:16) = 0x0000
        MOVF32 R7H, #-10.8      ; R7H = -10.8 (0xC12CCCCD)
                                ; <-- F32TOUI16R complete,
                                ;     R6H (15:0) = 11 (0x000B)
                                ;     R6H (31:16) = 0 (0x0000)
        F32TOUI16R R0H, R7H     ; R0H (15:0) = F32TOUI16round (R7H)
                                ; R0H (31:16) = 0x0000
        NOP                     ; 1 Cycle delay for F32TOUI16R to complete
                                ; <-- F32TOUI16R complete,
                                ;     R0H (15:0) = 0 (0x0000)
                                ;     R0H (31:16) = 0 (0x0000)

See also
    F32TOI16 RaH, RbH
    F32TOI16R RaH, RbH
    F32TOUI16 RaH, RbH
    I16TOF32 RaH, RbH
    I16TOF32 RaH, mem16
    UI16TOF32 RaH, mem16
    UI16TOF32 RaH, RbH


F32TOUI32 RaH, RbH    Convert 32-bit Floating-Point Value to 32-bit Unsigned Integer

Operands
    RaH    floating-point destination register (R0H to R7H)
    RbH    floating-point source register (R0H to R7H)

Opcode
    LSW: 1110 0110 1000 1010
    MSW: 0000 0000 00bb baaa

Description
    Convert the 32-bit floating-point value in RbH to an unsigned 32-bit integer and store the result in RaH.

        RaH = F32ToUI32(RbH)

Flags
    This instruction does not affect any flags:

        Flag      TF   ZI   NI   ZF   NF   LUF  LVF
        Modified  No   No   No   No   No   No   No

Pipeline
    This is a 2 pipeline cycle (2p) instruction. That is:

        F32TOUI32 RaH, RbH      ; 2 pipeline cycles (2p)
        NOP                     ; 1 cycle delay or non-conflicting instruction
                                ; <-- F32TOUI32 completes, RaH updated
        NOP

    Any instruction in the delay slot must not use RaH as a destination register or use RaH as a source operand.
Example
        MOVIZF32 R6H, #12.5     ; R6H = 12.5 (0x41480000)
        F32TOUI32 R7H, R6H      ; R7H = F32TOUI32 (R6H)
        MOVIZF32 R1H, #-6.5     ; R1H = -6.5 (0xC0D00000)
                                ; <-- F32TOUI32 complete, R7H = 12 (0x0000000C)
        F32TOUI32 R2H, R1H      ; R2H = F32TOUI32 (R1H)
        NOP                     ; 1 Cycle delay for F32TOUI32 to complete
                                ; <-- F32TOUI32 complete, R2H = 0 (0x00000000)

See also
    F32TOI32 RaH, RbH
    I32TOF32 RaH, RbH
    I32TOF32 RaH, mem32
    UI32TOF32 RaH, RbH
    UI32TOF32 RaH, mem32


FRACF32 RaH, RbH    Fractional Portion of a 32-bit Floating-Point Value

Operands
    RaH    floating-point destination register (R0H to R7H)
    RbH    floating-point source register (R0H to R7H)

Opcode
    LSW: 1110 0110 1111 0001
    MSW: 0000 0000 00bb baaa

Description
    Returns in RaH the fractional portion of the 32-bit floating-point value in RbH.

Flags
    This instruction does not affect any flags:

        Flag      TF   ZI   NI   ZF   NF   LUF  LVF
        Modified  No   No   No   No   No   No   No

Pipeline
    This is a 2 pipeline cycle (2p) instruction. That is:

        FRACF32 RaH, RbH        ; 2 pipeline cycles (2p)
        NOP                     ; 1 cycle delay or non-conflicting instruction
                                ; <-- FRACF32 completes, RaH updated
        NOP

    Any instruction in the delay slot must not use RaH as a destination register or use RaH as a source operand.
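The register contents quoted in this entry's example can be reproduced on a host by working directly with IEEE 754 single-precision bit patterns. The helper names below (`f32_bits`, `bits_f32`, `fracf32`) are hypothetical. `math.modf` keeps the sign of its input on the fractional part; whether the hardware does the same for negative inputs is not stated here, so only the positive case should be relied on.

```python
import math
import struct


def f32_bits(value: float) -> int:
    """Return the IEEE 754 single-precision bit pattern of value."""
    return struct.unpack(">I", struct.pack(">f", value))[0]


def bits_f32(bits: int) -> float:
    """Interpret a 32-bit pattern as an IEEE 754 single-precision value."""
    return struct.unpack(">f", struct.pack(">I", bits))[0]


def fracf32(rbh: int) -> int:
    """Host-side model of FRACF32: fractional part of RbH, as a bit pattern."""
    fractional, _whole = math.modf(bits_f32(rbh))
    return f32_bits(fractional)


print(f"0x{fracf32(0x419D0000):08X}")  # input pattern is 19.625
```

Passing register images as integers rather than Python floats keeps the comparison with the hexadecimal values in the manual direct.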
Example
        MOVIZF32 R2H, #19.625   ; R2H = 19.625 (0x419D0000)
        FRACF32 R3H, R2H        ; R3H = FRACF32 (R2H)
        NOP                     ; 1 Cycle delay for FRACF32 to complete
                                ; <-- FRACF32 complete, R3H = 0.625 (0x3F200000)


I16TOF32 RaH, RbH    Convert 16-bit Integer to 32-bit Floating-Point Value

Operands
    RaH    floating-point destination register (R0H to R7H)
    RbH    floating-point source register (R0H to R7H)

Opcode
    LSW: 1110 0110 1000 1101
    MSW: 0000 0000 00bb baaa

Description
    Convert the 16-bit signed integer in RbH to a 32-bit floating point value and store the result in RaH.

        RaH = I16ToF32(RbH)

Flags
    This instruction does not affect any flags:

        Flag      TF   ZI   NI   ZF   NF   LUF  LVF
        Modified  No   No   No   No   No   No   No

Pipeline
    This is a 2 pipeline cycle (2p) instruction. That is:

        I16TOF32 RaH, RbH       ; 2 pipeline cycles (2p)
        NOP                     ; 1 cycle delay or non-conflicting instruction
                                ; <-- I16TOF32 completes, RaH updated
        NOP

    Any instruction in the delay slot must not use RaH as a destination register or use RaH as a source operand.
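In this entry's example the source register holds 0x0000FFFC and the result is -4.0, so only bits 15:0 of RbH take part in the conversion and bit 15 acts as the sign. A minimal host-side sketch of that interpretation follows; the function name `i16_to_f32` is hypothetical.

```python
import struct


def i16_to_f32(rbh: int) -> int:
    """Host-side model of I16TOF32: returns the RaH bit pattern."""
    low = rbh & 0xFFFF                               # only RbH(15:0) is used
    signed = low - 0x10000 if low & 0x8000 else low  # two's-complement sign
    # Every 16-bit integer is exactly representable in single precision,
    # so no rounding decision is involved in this conversion.
    return struct.unpack(">I", struct.pack(">f", float(signed)))[0]


print(f"0x{i16_to_f32(0x00000004):08X}")
print(f"0x{i16_to_f32(0x0000FFFC):08X}")
```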
Example
        MOVIZ R0H, #0x0000      ; R0H[31:16] = 0 (0x0000)
        MOVXI R0H, #0x0004      ; R0H[15:0] = 4 (0x0004)
        I16TOF32 R1H, R0H       ; R1H = I16TOF32 (R0H)
        MOVIZ R2H, #0x0000      ; R2H[31:16] = 0 (0x0000)
                                ; <-- I16TOF32 complete, R1H = 4.0 (0x40800000)
        MOVXI R2H, #0xFFFC      ; R2H[15:0] = -4 (0xFFFC)
        I16TOF32 R3H, R2H       ; R3H = I16TOF32 (R2H)
        NOP                     ; 1 Cycle delay for I16TOF32 to complete
                                ; <-- I16TOF32 complete, R3H = -4.0 (0xC0800000)

See also
    F32TOI16 RaH, RbH
    F32TOI16R RaH, RbH
    F32TOUI16 RaH, RbH
    F32TOUI16R RaH, RbH
    I16TOF32 RaH, mem16
    UI16TOF32 RaH, mem16
    UI16TOF32 RaH, RbH


I16TOF32 RaH, mem16    Convert 16-bit Integer to 32-bit Floating-Point Value

Operands
    RaH      floating-point destination register (R0H to R7H)
    mem16    16-bit source memory location to be converted

Opcode
    LSW: 1110 0010 1100 1000
    MSW: 0000 0aaa mem16

Description
    Convert the 16-bit signed integer indicated by the mem16 pointer to a 32-bit floating-point value and store the result in RaH.

        RaH = I16ToF32[mem16]

Flags
    This instruction does not affect any flags:

        Flag      TF   ZI   NI   ZF   NF   LUF  LVF
        Modified  No   No   No   No   No   No   No

Pipeline
    This is a 2 pipeline cycle (2p) instruction. That is:

        I16TOF32 RaH, mem16     ; 2 pipeline cycles (2p)
        NOP                     ; 1 cycle delay or non-conflicting instruction
                                ; <-- I16TOF32 completes, RaH updated
        NOP

    Any instruction in the delay slot must not use RaH as a destination register or use RaH as a source operand.
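The example for this entry sets DP = 0x0280 and then refers to memory as @0 and @1, which its comments resolve to 0x00A000 and 0x00A001. In C28x direct addressing the data page pointer supplies the upper address bits and the instruction supplies a 6-bit offset. The helper below (hypothetical name `direct_address`) is a sketch of that arithmetic only, not of the full addressing-mode encoding.

```python
def direct_address(dp: int, offset: int) -> int:
    """Data address formed from a 16-bit DP value and a 6-bit @offset."""
    if not 0 <= offset <= 0x3F:
        raise ValueError("direct-mode offset must fit in 6 bits")
    return ((dp & 0xFFFF) << 6) | offset  # DP selects a 64-word page


print(f"0x{direct_address(0x0280, 0):06X}")
print(f"0x{direct_address(0x0280, 1):06X}")
```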
Example
        MOVW DP, #0x0280        ; DP = 0x0280
        MOV @0, #0x0004         ; [0x00A000] = 4 (0x0004)
        I16TOF32 R0H, @0        ; R0H = I16TOF32 [0x00A000]
        MOV @1, #0xFFFC         ; [0x00A001] = -4 (0xFFFC)
                                ; <-- I16TOF32 complete, R0H = 4.0 (0x40800000)
        I16TOF32 R1H, @1        ; R1H = I16TOF32 [0x00A001]
        NOP                     ; 1 Cycle delay for I16TOF32 to complete
                                ; <-- I16TOF32 complete, R1H = -4.0 (0xC0800000)

See also
    F32TOI16 RaH, RbH
    F32TOI16R RaH, RbH
    F32TOUI16 RaH, RbH
    F32TOUI16R RaH, RbH
    I16TOF32 RaH, RbH
    UI16TOF32 RaH, mem16
    UI16TOF32 RaH, RbH


I32TOF32 RaH, mem32    Convert 32-bit Integer to 32-bit Floating-Point Value

Operands
    RaH      floating-point destination register (R0H to R7H)
    mem32    32-bit source memory location to be converted. mem32 means that the operation can address memory using any of the direct or indirect addressing modes supported by the C28x CPU.

Opcode
    LSW: 1110 0010 1000 1000
    MSW: 0000 0aaa mem32

Description
    Convert the 32-bit signed integer indicated by the mem32 pointer to a 32-bit floating point value and store the result in RaH.

        RaH = I32ToF32[mem32]

Flags
    This instruction does not affect any flags:

        Flag      TF   ZI   NI   ZF   NF   LUF  LVF
        Modified  No   No   No   No   No   No   No

Pipeline
    This is a 2 pipeline cycle (2p) instruction. That is:

        I32TOF32 RaH, mem32     ; 2 pipeline cycles (2p)
        NOP                     ; 1 cycle delay or non-conflicting instruction
                                ; <-- I32TOF32 completes, RaH updated
        NOP

    Any instruction in the delay slot must not use RaH as a destination register or use RaH as a source operand.
Example
        MOVW DP, #0x0280        ; DP = 0x0280
        MOV @0, #0x1111         ; [0x00A000] = 4369 (0x1111)
        MOV @1, #0x1111         ; [0x00A001] = 4369 (0x1111)
                                ; Value of the 32 bit signed integer present
                                ; in 0x00A001 and 0x00A000 is
                                ; +286331153 (0x11111111)
        I32TOF32 R1H, @0        ; R1H = I32TOF32 (0x11111111)
        NOP                     ; 1 Cycle delay for I32TOF32 to complete
                                ; <-- I32TOF32 complete,
                                ;     R1H = 286331153 (0x4D888888)

See also
    F32TOI32 RaH, RbH
    F32TOUI32 RaH, RbH
    I32TOF32 RaH, RbH
    UI32TOF32 RaH, RbH
    UI32TOF32 RaH, mem32


I32TOF32 RaH, RbH    Convert 32-bit Integer to 32-bit Floating-Point Value

Operands
    RaH    floating-point destination register (R0H to R7H)
    RbH    floating-point source register (R0H to R7H)

Opcode
    LSW: 1110 0110 1000 1001
    MSW: 0000 0000 00bb baaa

Description
    Convert the signed 32-bit integer in RbH to a 32-bit floating-point value and store the result in RaH.

        RaH = I32ToF32(RbH)

Flags
    This instruction does not affect any flags:

        Flag      TF   ZI   NI   ZF   NF   LUF  LVF
        Modified  No   No   No   No   No   No   No

Pipeline
    This is a 2 pipeline cycle (2p) instruction. That is:

        I32TOF32 RaH, RbH       ; 2 pipeline cycles (2p)
        NOP                     ; 1 cycle delay or non-conflicting instruction
                                ; <-- I32TOF32 completes, RaH updated
        NOP

    Any instruction in the delay slot must not use RaH as a destination register or use RaH as a source operand.
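A 32-bit integer can carry more significant bits than the 24 that a single-precision mantissa holds, so the conversion may have to discard low-order bits. For the input 0x11111111 used in the I32TOF32 examples, discarding them by truncation yields 0x4D888888, the value quoted there, while round-to-nearest-even would yield 0x4D888889. The sketch below shows both; it is a host-side model with a hypothetical name (`i32_to_f32_bits`), and this entry does not state which rounding the hardware applies in a given configuration, so the `truncate` argument is an assumption of the sketch.

```python
def i32_to_f32_bits(value: int, truncate: bool) -> int:
    """Convert a signed 32-bit integer to an IEEE 754 single bit pattern."""
    if value == 0:
        return 0x00000000
    sign = 0x80000000 if value < 0 else 0
    magnitude = -value if value < 0 else value
    msb = magnitude.bit_length() - 1       # position of the leading 1
    exponent = msb + 127                   # biased exponent
    if msb <= 23:
        mantissa = magnitude << (23 - msb)  # fits exactly, nothing discarded
    else:
        shift = msb - 23                   # number of low bits to discard
        mantissa = magnitude >> shift
        if not truncate:
            remainder = magnitude & ((1 << shift) - 1)
            half = 1 << (shift - 1)
            # Round to nearest, ties to even.
            if remainder > half or (remainder == half and mantissa & 1):
                mantissa += 1
                if mantissa == 1 << 24:    # carry out of the mantissa
                    mantissa >>= 1
                    exponent += 1
    return sign | (exponent << 23) | (mantissa & 0x7FFFFF)


print(f"0x{i32_to_f32_bits(0x11111111, truncate=True):08X}")
print(f"0x{i32_to_f32_bits(0x11111111, truncate=False):08X}")
```

Inputs with 24 or fewer significant bits convert identically under either setting.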
Example
        MOVIZ R2H, #0x1111      ; R2H[31:16] = 4369 (0x1111)
        MOVXI R2H, #0x1111      ; R2H[15:0] = 4369 (0x1111)
                                ; Value of the 32 bit signed integer present
                                ; in R2H is +286331153 (0x11111111)
        I32TOF32 R3H, R2H       ; R3H = I32TOF32 (R2H)
        NOP                     ; 1 Cycle delay for I32TOF32 to complete
                                ; <-- I32TOF32 complete,
                                ;     R3H = 286331153 (0x4D888888)

See also
    F32TOI32 RaH, RbH
    F32TOUI32 RaH, RbH
    I32TOF32 RaH, mem32
    UI32TOF32 RaH, RbH
    UI32TOF32 RaH, mem32


MACF32 R3H, R2H, RdH, ReH, RfH    32-bit Floating-Point Multiply with Parallel Add

    This instruction is an alias for the parallel multiply and add instruction. The operands are translated by the assembler such that the instruction becomes:

           MPYF32 RdH, ReH, RfH
        || ADDF32 R3H, R3H, R2H

Operands
    R3H    floating-point destination and source register for the ADDF32
    R2H    floating-point source register for the ADDF32 operation (R0H to R7H)
    RdH    floating-point destination register for MPYF32 operation (R0H to R7H). RdH cannot be R3H.
    ReH    floating-point source register for MPYF32 operation (R0H to R7H)
    RfH    floating-point source register for MPYF32 operation (R0H to R7H)

Opcode
    LSW: 1110 0111 0100 00ff
    MSW: feee dddc ccbb baaa

Description
    This instruction is an alias for the parallel multiply and add, MPYF32 || ADDF32, instruction.

        RdH = ReH * RfH
        R3H = R3H + R2H

Restrictions
    The destination register for the MPYF32 and the ADDF32 must be unique. That is, RdH cannot be R3H.

Flags
    This instruction modifies the following flags in the STF register:

        Flag      TF   ZI   NI   ZF   NF   LUF  LVF
        Modified  No   No   No   No   No   Yes  Yes

    The STF register flags are modified as follows:
    • LUF = 1 if MPYF32 or ADDF32 generates an underflow condition.
    • LVF = 1 if MPYF32 or ADDF32 generates an overflow condition.
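In this entry's example, RdH is R2H, so one MACF32 both adds the previous product held in R2H to R3H and replaces R2H with a new product. That only works because the add reads the old R2H value. The sketch below models this ordering on a dictionary of registers and replays the five-term accumulation from the example. The names `macf32` and `dot5` are hypothetical, and Python floats are double precision, so results match the FPU only for values that are exact in single precision.

```python
def macf32(regs: dict, rd: str, re: str, rf: str) -> None:
    """Model of MACF32 R3H, R2H, RdH, ReH, RfH (MPYF32 || ADDF32)."""
    product = regs[re] * regs[rf]      # MPYF32 source operands
    total = regs["R3H"] + regs["R2H"]  # ADDF32 uses the OLD R3H and R2H
    regs[rd] = product                 # both destinations are written
    regs["R3H"] = total                # only after both reads


def dot5(xs, ys):
    """Sum of five products, scheduled as in the manual's example."""
    regs = {"R2H": xs[0] * ys[0],      # MPYF32 R2H: A = X0 * Y0
            "R3H": xs[1] * ys[1]}      # MPYF32 R3H: B = X1 * Y1
    for i in (2, 3, 4):
        regs["R0H"], regs["R1H"] = xs[i], ys[i]  # parallel MOV32 loads
        macf32(regs, "R2H", "R0H", "R1H")
    regs["R3H"] += regs["R2H"]         # final ADDF32 folds in E
    return regs["R3H"]


print(dot5([1.0, 2.0, 3.0, 4.0, 5.0], [2.0] * 5))
```

After the loop R3H holds A + B + C + D and R2H holds E, which is why one trailing ADDF32 is still needed.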
Pipeline
    Both MPYF32 and ADDF32 take 2 pipeline cycles (2p). That is:

           MPYF32 RaH, RbH, RcH     ; 2 pipeline cycles (2p)
        || ADDF32 RdH, ReH, RfH     ; 2 pipeline cycles (2p)
           NOP                      ; 1 cycle delay or non-conflicting instruction
                                    ; <-- MPYF32, ADDF32 complete, RaH, RdH updated
           NOP

    Any instruction in the delay slot must not use RaH or RdH as a destination register or as a source operand.

Example
        ; Perform 5 multiply and accumulate operations:
        ;
        ; 1st multiply: A = X0 * Y0
        ; 2nd multiply: B = X1 * Y1
        ; 3rd multiply: C = X2 * Y2
        ; 4th multiply: D = X3 * Y3
        ; 5th multiply: E = X4 * Y4
        ;
        ; Result = A + B + C + D + E

           MOV32 R0H, *XAR4++               ; R0H = X0
           MOV32 R1H, *XAR5++               ; R1H = Y0
                                            ; R2H = A = X0 * Y0
           MPYF32 R2H, R0H, R1H             ; In parallel R0H = X1
        || MOV32 R0H, *XAR4++
           MOV32 R1H, *XAR5++               ; R1H = Y1
                                            ; R3H = B = X1 * Y1
           MPYF32 R3H, R0H, R1H             ; In parallel R0H = X2
        || MOV32 R0H, *XAR4++
           MOV32 R1H, *XAR5++               ; R1H = Y2
                                            ; R3H = A + B
                                            ; R2H = C = X2 * Y2
           MACF32 R3H, R2H, R2H, R0H, R1H   ; In parallel R0H = X3
        || MOV32 R0H, *XAR4++
           MOV32 R1H, *XAR5++               ; R1H = Y3
                                            ; R3H = (A + B) + C
                                            ; R2H = D = X3 * Y3
           MACF32 R3H, R2H, R2H, R0H, R1H   ; In parallel R0H = X4
        || MOV32 R0H, *XAR4
           MOV32 R1H, *XAR5                 ; R1H = Y4
                                            ; The next MACF32 is an alias for
                                            ; MPYF32 || ADDF32
                                            ; R2H = E = X4 * Y4
           MACF32 R3H, R2H, R2H, R0H, R1H   ; in parallel R3H = (A + B + C) + D
           NOP                              ; Wait for MPYF32 || ADDF32 to complete
           ADDF32 R3H, R3H, R2H             ; R3H = (A + B + C + D) + E
           NOP                              ; Wait for ADDF32 to complete
           MOV32 @Result, R3H               ; Store the result

See also
    MACF32 R3H, R2H, RdH, ReH, RfH || MOV32 RaH, mem32
    MACF32 R7H, R3H, mem32, *XAR7++
    MACF32 R7H, R6H, RdH, ReH, RfH
    MACF32 R7H, R6H, RdH, ReH, RfH || MOV32 RaH, mem32
    MPYF32 RaH, RbH, RcH || ADDF32 RdH, ReH, RfH


MACF32 R3H, R2H, RdH, ReH, RfH || MOV32 RaH, mem32    32-bit Floating-Point Multiply and Accumulate with Parallel Move

Operands
    R3H      floating-point destination/source register R3H for the add operation
    R2H      floating-point source register R2H for the add operation
    RdH      floating-point destination register (R0H to R7H) for the multiply operation. RdH cannot be the same register as RaH.
    ReH      floating-point source register (R0H to R7H) for the multiply operation
    RfH      floating-point source register (R0H to R7H) for the multiply operation
    RaH      floating-point destination register for the MOV32 operation (R0H to R7H). RaH cannot be R3H or the same register as RdH.
    mem32    32-bit source for the MOV32 operation

Opcode
    LSW: 1110 0011 0011 fffe
    MSW: eedd daaa mem32

Description
    Multiply and accumulate the contents of floating-point registers and, in parallel, move from memory to a register. The destination register for the MOV32 cannot be the same as the destination registers for the MACF32.

        R3H = R3H + R2H,
        RdH = ReH * RfH,
        RaH = [mem32]

Restrictions
    The destination registers for the MACF32 and the MOV32 must be unique. That is, RaH cannot be R3H and RaH cannot be the same register as RdH.

Flags
    This instruction modifies the following flags in the STF register:

        Flag      TF   ZI   NI   ZF   NF   LUF  LVF
        Modified  No   Yes  Yes  Yes  Yes  Yes  Yes

    The STF register flags are modified as follows:
    • LUF = 1 if MACF32 (add or multiply) generates an underflow condition.
    • LVF = 1 if MACF32 (add or multiply) generates an overflow condition.

    MOV32 sets the NF, ZF, NI and ZI flags as follows:

        NF = RaH(31);
        ZF = 0;
        if(RaH(30:23) == 0) { ZF = 1; NF = 0; }
        NI = RaH(31);
        ZI = 0;
        if(RaH(31:0) == 0) ZI = 1;

Pipeline
    The MACF32 takes 2 pipeline cycles (2p) and the MOV32 takes a single cycle.
    That is:

       MACF32 R3H, R2H, RdH, ReH, RfH   ; 2 pipeline cycles (2p)
    || MOV32  RaH, mem32                ; 1 cycle
                                        ; <-- MOV32 completes, RaH updated
       NOP                              ; 1 cycle delay for MACF32
                                        ; <-- MACF32 completes, R3H, RdH updated
       NOP

    Any instruction in the delay slot for this version of MACF32 must not use R3H or RdH as a destination register or as a source operand.

Example
    ; Perform 5 multiply and accumulate operations:
    ;
    ; 1st multiply: A = X0 * Y0
    ; 2nd multiply: B = X1 * Y1
    ; 3rd multiply: C = X2 * Y2
    ; 4th multiply: D = X3 * Y3
    ; 5th multiply: E = X4 * Y4
    ;
    ; Result = A + B + C + D + E

       MOV32  R0H, *XAR4++              ; R0H = X0
       MOV32  R1H, *XAR5++              ; R1H = Y0
                                        ; R2H = A = X0 * Y0
       MPYF32 R2H, R0H, R1H             ; In parallel R0H = X1
    || MOV32  R0H, *XAR4++
       MOV32  R1H, *XAR5++              ; R1H = Y1
                                        ; R3H = B = X1 * Y1
       MPYF32 R3H, R0H, R1H             ; In parallel R0H = X2
    || MOV32  R0H, *XAR4++
       MOV32  R1H, *XAR5++              ; R1H = Y2
                                        ; R3H = A + B
                                        ; R2H = C = X2 * Y2
       MACF32 R3H, R2H, R2H, R0H, R1H   ; In parallel R0H = X3
    || MOV32  R0H, *XAR4++
       MOV32  R1H, *XAR5++              ; R1H = Y3
                                        ; R3H = (A + B) + C
                                        ; R2H = D = X3 * Y3
       MACF32 R3H, R2H, R2H, R0H, R1H   ; In parallel R0H = X4
    || MOV32  R0H, *XAR4
       MOV32  R1H, *XAR5                ; R1H = Y4
                                        ; R2H = E = X4 * Y4
       MPYF32 R2H, R0H, R1H             ; in parallel R3H = (A + B + C) + D
    || ADDF32 R3H, R3H, R2H
       NOP                              ; Wait for MPYF32 || ADDF32 to complete
       ADDF32 R3H, R3H, R2H             ; R3H = (A + B + C + D) + E
       NOP                              ; Wait for ADDF32 to complete
       MOV32  @Result, R3H              ; Store the result

See also
    MACF32 R3H, R2H, RdH, ReH, RfH
    MACF32 R7H, R3H, mem32, *XAR7++
    MACF32 R7H, R6H, RdH, ReH, RfH
    MACF32 R7H, R6H, RdH, ReH, RfH || MOV32 RaH, mem32
    MPYF32 RaH, RbH, RcH || ADDF32 RdH, ReH, RfH

MACF32 R7H, R3H, mem32, *XAR7++
32-bit Floating-Point Multiply and Accumulate

Operands
    R7H        floating-point destination register
    R3H        floating-point destination register
    mem32      pointer to a 32-bit source location
    *XAR7++    32-bit location pointed to by auxiliary register 7; XAR7 is post-incremented.

Opcode
    LSW: 1110 0010 0101 0000
    MSW: 0001 1111 mem32

Description
    Perform a multiply and accumulate operation. When used as a standalone operation, the MACF32 will perform a single multiply as shown below:

       Cycle 1: R3H = R3H + R2H, R2H = [mem32] * [XAR7++]

    This instruction is the only floating-point instruction that can be repeated using the single repeat instruction (RPT ||). When repeated, the destination of the accumulate alternates between R3H and R7H on each cycle, and R2H and R6H are used as temporary storage for each multiply.

       Cycle 1: R3H = R3H + R2H, R2H = [mem32] * [XAR7++]
       Cycle 2: R7H = R7H + R6H, R6H = [mem32] * [XAR7++]
       Cycle 3: R3H = R3H + R2H, R2H = [mem32] * [XAR7++]
       Cycle 4: R7H = R7H + R6H, R6H = [mem32] * [XAR7++]
       etc...

Restrictions
    R2H and R6H will be used as temporary storage by this instruction.

Flags
    This instruction modifies the following flags in the STF register:

    Flag      TF   ZI   NI   ZF   NF   LUF  LVF
    Modified  No   No   No   No   No   Yes  Yes

    The STF register flags are modified as follows:
    • LUF = 1 if MACF32 generates an underflow condition.
    • LVF = 1 if MACF32 generates an overflow condition.

Pipeline
    When repeated, the MACF32 takes 3 + N cycles, where N is the number of times the instruction is repeated. When repeated, this instruction has the following pipeline restrictions:

       <instruction1>                        ; No restriction
       <instruction2>                        ; Cannot be a 2p instruction that writes
                                             ; to R2H, R3H, R6H or R7H
       RPT #(N-1)                            ; Execute N times, where N is even
    || MACF32 R7H, R3H, *XAR6++, *XAR7++
       <instruction3>                        ; No restrictions.
                                             ; Can read R2H, R3H, R6H and R7H

    MACF32 can also be used standalone. In this case, the instruction takes 2 cycles and the following pipeline restrictions apply:

       <instruction1>                        ; No restriction
       <instruction2>                        ; Cannot be a 2p instruction that writes
                                             ; to R2H, R3H, R6H or R7H
       MACF32 R7H, R3H, *XAR6, *XAR7         ; R3H = R3H + R2H, R2H = [mem32] * [XAR7++]
                                             ; <-- R2H and R3H are valid (note: no delay required)
       NOP

Example
       ZERO R2H                              ; Zero the accumulation registers
       ZERO R3H                              ; and temporary multiply storage registers
       ZERO R6H
       ZERO R7H
       RPT #3                                ; Repeat MACF32 N+1 (4) times
    || MACF32 R7H, R3H, *XAR6++, *XAR7++
       ADDF32 R7H, R7H, R3H                  ; Final accumulate
       NOP                                   ; <-- ADDF32 completes, R7H valid
       NOP

    Cascading of RPT || MACF32 is allowed as long as the first and subsequent counts are even. Cascading is useful for creating interruptible windows so that interrupts are not delayed too long by the RPT instruction.
    For example:

       ZERO R2H                              ; Zero the accumulation registers
       ZERO R3H                              ; and temporary multiply storage registers
       ZERO R6H
       ZERO R7H
       RPT #3                                ; Execute MACF32 N+1 (4) times
    || MACF32 R7H, R3H, *XAR6++, *XAR7++
       RPT #5                                ; Execute MACF32 N+1 (6) times
    || MACF32 R7H, R3H, *XAR6++, *XAR7++
       RPT #N                                ; Repeat MACF32 N+1 times where N+1 is even
    || MACF32 R7H, R3H, *XAR6++, *XAR7++
       ADDF32 R7H, R7H, R3H                  ; Final accumulate
       NOP                                   ; <-- ADDF32 completes, R7H valid

See also
    MACF32 R3H, R2H, RdH, ReH, RfH || MOV32 RaH, mem32
    MACF32 R7H, R6H, RdH, ReH, RfH || MOV32 RaH, mem32
    MPYF32 RaH, RbH, RcH || ADDF32 RdH, ReH, RfH

MACF32 R7H, R6H, RdH, ReH, RfH
32-bit Floating-Point Multiply with Parallel Add

    This instruction is an alias for the parallel multiply and add instruction. The operands are translated by the assembler such that the instruction becomes:

       MPYF32 RdH, ReH, RfH
    || ADDF32 R7H, R7H, R6H

Operands
    R7H      floating-point destination and source register for the ADDF32
    R6H      floating-point source register for the ADDF32 operation
    RdH      floating-point destination register for the MPYF32 operation (R0H to R7H).
             RdH cannot be R7H.
    ReH      floating-point source register for the MPYF32 operation (R0H to R7H)
    RfH      floating-point source register for the MPYF32 operation (R0H to R7H)

Opcode
    LSW: 1110 0111 0100 00ff
    MSW: feee dddc ccbb baaa

Description
    This instruction is an alias for the parallel multiply and add instruction, MPYF32 || ADDF32.

       RdH = ReH * RfH
       R7H = R7H + R6H

Restrictions
    The destination registers for the MPYF32 and the ADDF32 must be unique. That is, RdH cannot be R7H.

Flags
    This instruction modifies the following flags in the STF register:

    Flag      TF   ZI   NI   ZF   NF   LUF  LVF
    Modified  No   No   No   No   No   Yes  Yes

    The STF register flags are modified as follows:
    • LUF = 1 if MPYF32 or ADDF32 generates an underflow condition.
    • LVF = 1 if MPYF32 or ADDF32 generates an overflow condition.

Pipeline
    Both MPYF32 and ADDF32 take 2 pipeline cycles (2p). That is:

       MPYF32 RaH, RbH, RcH             ; 2 pipeline cycles (2p)
    || ADDF32 RdH, ReH, RfH             ; 2 pipeline cycles (2p)
       NOP                              ; 1 cycle delay or non-conflicting instruction
                                        ; <-- MPYF32, ADDF32 complete, RaH, RdH updated
       NOP

    Any instruction in the delay slot must not use RaH or RdH as a destination register or as a source operand.

Example
    ; Perform 5 multiply and accumulate operations:
    ;
    ; 1st multiply: A = X0 * Y0
    ; 2nd multiply: B = X1 * Y1
    ; 3rd multiply: C = X2 * Y2
    ; 4th multiply: D = X3 * Y3
    ; 5th multiply: E = X4 * Y4
    ;
    ; Result = A + B + C + D + E

       MOV32  R0H, *XAR4++              ; R0H = X0
       MOV32  R1H, *XAR5++              ; R1H = Y0
                                        ; R6H = A = X0 * Y0
       MPYF32 R6H, R0H, R1H             ; In parallel R0H = X1
    || MOV32  R0H, *XAR4++
       MOV32  R1H, *XAR5++              ; R1H = Y1
                                        ; R7H = B = X1 * Y1
       MPYF32 R7H, R0H, R1H             ; In parallel R0H = X2
    || MOV32  R0H, *XAR4++
       MOV32  R1H, *XAR5++              ; R1H = Y2
                                        ; R7H = A + B
                                        ; R6H = C = X2 * Y2
       MACF32 R7H, R6H, R6H, R0H, R1H   ; In parallel R0H = X3
    || MOV32  R0H, *XAR4++
       MOV32  R1H, *XAR5++              ; R1H = Y3
                                        ; R7H = (A + B) + C
                                        ; R6H = D = X3 * Y3
       MACF32 R7H, R6H, R6H, R0H, R1H   ; In parallel R0H = X4
    || MOV32  R0H, *XAR4
       MOV32  R1H, *XAR5                ; R1H = Y4
                                        ; Next MACF32 is an alias for
                                        ; MPYF32 || ADDF32
                                        ; R6H = E = X4 * Y4
       MACF32 R7H, R6H, R6H, R0H, R1H   ; in parallel R7H = (A + B + C) + D
       NOP                              ; Wait for MPYF32 || ADDF32 to complete
       ADDF32 R7H, R7H, R6H             ; R7H = (A + B + C + D) + E
       NOP                              ; Wait for ADDF32 to complete
       MOV32  @Result, R7H              ; Store the result

See also
    MACF32 R3H, R2H, RdH, ReH, RfH
    MACF32 R3H, R2H, RdH, ReH, RfH || MOV32 RaH, mem32
    MACF32 R7H, R3H, mem32, *XAR7++
    MACF32 R7H, R6H, RdH, ReH, RfH || MOV32 RaH, mem32
    MPYF32 RaH, RbH, RcH || ADDF32 RdH, ReH, RfH

MACF32 R7H, R6H, RdH, ReH, RfH ∥ MOV32 RaH, mem32
32-bit Floating-Point Multiply and Accumulate with Parallel Move

Operands
    R7H      floating-point destination/source register R7H for the add operation
    R6H      floating-point source register R6H for the add operation
    RdH      floating-point destination register (R0H to R7H) for the multiply operation.
             RdH cannot be the same register as RaH.
    ReH      floating-point source register (R0H to R7H) for the multiply operation
    RfH      floating-point source register (R0H to R7H) for the multiply operation
    RaH      floating-point destination register for the MOV32 operation (R0H to R7H).
             RaH cannot be R7H or the same register as RdH.
    mem32    32-bit source for the MOV32 operation

Opcode
    LSW: 1110 0011 1100 fffe
    MSW: eedd daaa mem32

Description
    Multiply and accumulate the contents of floating-point registers and, in parallel, move a 32-bit value from memory to a register. The destination register for the MOV32 cannot be the same as the destination registers for the MACF32.

       R7H = R7H + R6H,
       RdH = ReH * RfH,
       RaH = [mem32]

Restrictions
    The destination registers for the MACF32 and the MOV32 must be unique. That is, RaH cannot be R7H and RaH cannot be the same register as RdH.

Flags
    This instruction modifies the following flags in the STF register:

    Flag      TF   ZI   NI   ZF   NF   LUF  LVF
    Modified  No   Yes  Yes  Yes  Yes  Yes  Yes

    The STF register flags are modified as follows:
    • LUF = 1 if MACF32 (add or multiply) generates an underflow condition.
    • LVF = 1 if MACF32 (add or multiply) generates an overflow condition.
    The MOV32 instruction sets the NF, ZF, NI and ZI flags as follows:

       NF = RaH(31);
       ZF = 0;
       if(RaH(30:23) == 0) { ZF = 1; NF = 0; }
       NI = RaH(31);
       ZI = 0;
       if(RaH(31:0) == 0) ZI = 1;

Pipeline
    The MACF32 takes 2 pipeline cycles (2p) and the MOV32 takes a single cycle. That is:

       MACF32 R7H, R6H, RdH, ReH, RfH   ; 2 pipeline cycles (2p)
    || MOV32  RaH, mem32                ; 1 cycle
                                        ; <-- MOV32 completes, RaH updated
       NOP                              ; 1 cycle delay
                                        ; <-- MACF32 completes, R7H, RdH updated
       NOP

Example
    ; Perform 5 multiply and accumulate operations:
    ;
    ; 1st multiply: A = X0 * Y0
    ; 2nd multiply: B = X1 * Y1
    ; 3rd multiply: C = X2 * Y2
    ; 4th multiply: D = X3 * Y3
    ; 5th multiply: E = X4 * Y4
    ;
    ; Result = A + B + C + D + E

       MOV32  R0H, *XAR4++              ; R0H = X0
       MOV32  R1H, *XAR5++              ; R1H = Y0
                                        ; R6H = A = X0 * Y0
       MPYF32 R6H, R0H, R1H             ; In parallel R0H = X1
    || MOV32  R0H, *XAR4++
       MOV32  R1H, *XAR5++              ; R1H = Y1
                                        ; R7H = B = X1 * Y1
       MPYF32 R7H, R0H, R1H             ; In parallel R0H = X2
    || MOV32  R0H, *XAR4++
       MOV32  R1H, *XAR5++              ; R1H = Y2
                                        ; R7H = A + B
                                        ; R6H = C = X2 * Y2
       MACF32 R7H, R6H, R6H, R0H, R1H   ; In parallel R0H = X3
    || MOV32  R0H, *XAR4++
       MOV32  R1H, *XAR5++              ; R1H = Y3
                                        ; R7H = (A + B) + C
                                        ; R6H = D = X3 * Y3
       MACF32 R7H, R6H, R6H, R0H, R1H   ; In parallel R0H = X4
    || MOV32  R0H, *XAR4
       MOV32  R1H, *XAR5                ; R1H = Y4
                                        ; R6H = E = X4 * Y4
       MPYF32 R6H, R0H, R1H             ; in parallel R7H = (A + B + C) + D
    || ADDF32 R7H, R7H, R6H
       NOP                              ; Wait for MPYF32 || ADDF32 to complete
       ADDF32 R7H, R7H, R6H             ; R7H = (A + B + C + D) + E
       NOP                              ; Wait for ADDF32 to complete
       MOV32  @Result, R7H              ; Store the result

See also
    MACF32 R7H, R3H, mem32, *XAR7++
    MACF32 R3H, R2H, RdH, ReH, RfH || MOV32 RaH, mem32
    MPYF32 RaH, RbH, RcH || ADDF32 RdH, ReH, RfH

MAXF32 RaH, RbH
32-bit Floating-Point Maximum

Operands
    RaH      floating-point source/destination register (R0H to R7H)
    RbH      floating-point source register (R0H to R7H)

Opcode
    LSW: 1110 0110 1001 0110
    MSW: 0000 0000 00bb baaa

Description
       if(RaH < RbH) RaH = RbH

    Special cases for the output from the MAXF32 operation:
    • NaN output will be converted to infinity
    • A denormalized output will be converted to positive zero.

Flags
    This instruction modifies the following flags in the STF register:

    Flag      TF   ZI   NI   ZF   NF   LUF  LVF
    Modified  No   No   No   Yes  Yes  No   No

    The ZF and NF flags are configured on the result of the operation, not the result stored in the destination register.

       if(RaH == RbH) {ZF=1, NF=0}
       if(RaH > RbH)  {ZF=0, NF=0}
       if(RaH < RbH)  {ZF=0, NF=1}

Pipeline
    This is a single-cycle instruction.

Example
       MOVIZF32 R0H, #5.0               ; R0H = 5.0 (0x40A00000)
       MOVIZF32 R1H, #-2.0              ; R1H = -2.0 (0xC0000000)
       MOVIZF32 R2H, #-1.5              ; R2H = -1.5 (0xBFC00000)
       MAXF32   R2H, R1H                ; R2H = -1.5, ZF = NF = 0
       MAXF32   R1H, R2H                ; R1H = -1.5, ZF = 0, NF = 1
       MAXF32   R2H, R0H                ; R2H = 5.0, ZF = 0, NF = 1
       MAXF32   R0H, R2H                ; R0H = 5.0, ZF = 1, NF = 0

See also
    CMPF32 RaH, RbH
    CMPF32 RaH, #16FHi
    CMPF32 RaH, #0.0
    MAXF32 RaH, RbH || MOV32 RcH, RdH
    MAXF32 RaH, #16FHi
    MINF32 RaH, RbH
    MINF32 RaH, #16FHi

MAXF32 RaH, #16FHi
32-bit Floating-Point Maximum

Operands
    RaH      floating-point source/destination register (R0H to R7H)
    #16FHi   A 16-bit immediate value that represents the upper 16-bits of an IEEE 32-bit floating-point value. The low 16-bits of the mantissa are assumed to be all 0.

Opcode
    LSW: 1110 1000 0010 0III
    MSW: IIII IIII IIII Iaaa

Description
    Compare RaH with the floating-point value represented by the immediate operand. If the immediate value is larger, then load it into RaH.

       if(RaH < #16FHi:0) RaH = #16FHi:0

    #16FHi is a 16-bit immediate value that represents the upper 16-bits of an IEEE 32-bit floating-point value. The low 16-bits of the mantissa are assumed to be all 0. This addressing mode is most useful for constants where the lowest 16-bits of the mantissa are 0. Some examples are 2.0 (0x40000000), 4.0 (0x40800000), 0.5 (0x3F000000), and -1.5 (0xBFC00000). The assembler will accept either a hex or float as the immediate value. That is, -1.5 can be represented as #-1.5 or #0xBFC0.

    Special cases for the output from the MAXF32 operation:
    • NaN output will be converted to infinity
    • A denormalized output will be converted to positive zero.

Flags
    This instruction modifies the following flags in the STF register:

    Flag      TF   ZI   NI   ZF   NF   LUF  LVF
    Modified  No   No   No   Yes  Yes  No   No

    The ZF and NF flags are configured on the result of the operation, not the result stored in the destination register.

       if(RaH == #16FHi:0) {ZF=1, NF=0}
       if(RaH > #16FHi:0)  {ZF=0, NF=0}
       if(RaH < #16FHi:0)  {ZF=0, NF=1}

Pipeline
    This is a single-cycle instruction.
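
    As an illustration only (this is not TI tooling), the #16FHi convention described above can be modelled in a few lines of Python. The helper names f32_bits, to_16fhi and from_16fhi are hypothetical. The sketch assumes only the rule stated in the Description: the immediate supplies the upper 16 bits of the IEEE single-precision pattern and the lower 16 bits are zero.

```python
import struct


def f32_bits(value: float) -> int:
    """Return the IEEE 754 single-precision bit pattern of value."""
    return struct.unpack(">I", struct.pack(">f", value))[0]


def to_16fhi(value: float) -> int:
    """Return the 16-bit #16FHi immediate for value.

    Raises ValueError when the low 16 bits of the pattern are not zero,
    because such a constant cannot be expressed as #16FHi:0.
    """
    bits = f32_bits(value)
    if bits & 0xFFFF:
        raise ValueError(f"{value!r} is not representable as #16FHi")
    return bits >> 16


def from_16fhi(imm16: int) -> float:
    """Expand a #16FHi immediate to #16FHi:0 and read it back as a float."""
    bits = (imm16 & 0xFFFF) << 16
    return struct.unpack(">f", struct.pack(">I", bits))[0]


if __name__ == "__main__":
    for constant in (2.0, 4.0, 0.5, -1.5):
        print(f"{constant:5} -> #0x{to_16fhi(constant):04X}")
```

    With these helpers, to_16fhi(-1.5) gives 0xBFC0, which matches the #0xBFC0 form mentioned above. A value such as 0.1 is rejected because its single-precision pattern has nonzero low bits.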

Example
       MOVIZF32 R0H, #5.0               ; R0H = 5.0 (0x40A00000)
       MOVIZF32 R1H, #4.0               ; R1H = 4.0 (0x40800000)
       MOVIZF32 R2H, #-1.5              ; R2H = -1.5 (0xBFC00000)
       MAXF32   R0H, #5.5               ; R0H = 5.5, ZF = 0, NF = 1
       MAXF32   R1H, #2.5               ; R1H = 4.0, ZF = 0, NF = 0
       MAXF32   R2H, #-1.0              ; R2H = -1.0, ZF = 0, NF = 1
       MAXF32   R2H, #-1.0              ; R2H = -1.0, ZF = 1, NF = 0

See also
    MAXF32 RaH, RbH
    MAXF32 RaH, RbH || MOV32 RcH, RdH
    MINF32 RaH, RbH
    MINF32 RaH, #16FHi

MAXF32 RaH, RbH ∥ MOV32 RcH, RdH
32-bit Floating-Point Maximum with Parallel Move

Operands
    RaH      floating-point source/destination register for the MAXF32 operation (R0H to R7H).
             RaH cannot be the same register as RcH.
    RbH      floating-point source register for the MAXF32 operation (R0H to R7H)
    RcH      floating-point destination register for the MOV32 operation (R0H to R7H).
             RcH cannot be the same register as RaH.
    RdH      floating-point source register for the MOV32 operation (R0H to R7H)

Opcode
    LSW: 1110 0110 1001 1100
    MSW: 0000 dddc ccbb baaa

Description
    If RaH is less than RbH, then load RaH with RbH. Thus RaH will always hold the maximum value. If RaH is less than RbH, then, in parallel, also load RcH with the contents of RdH.

       if(RaH < RbH) { RaH = RbH; RcH = RdH; }

    The MAXF32 instruction is performed as a logical compare operation. This is possible because the IEEE floating-point format offsets the exponent: the bigger the binary number, the bigger the floating-point value.

    Special cases for the output from the MAXF32 operation:
    • NaN output will be converted to infinity
    • A denormalized output will be converted to positive zero.

Restrictions
    The destination registers for the MAXF32 and the MOV32 must be unique. That is, RaH cannot be the same register as RcH.
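
    The conditional parallel move described above is what makes this instruction useful for tracking a value alongside a running maximum, for example the position of the largest sample. The following Python sketch models only the documented rule if(RaH < RbH) { RaH = RbH; RcH = RdH; }. It does not model the NaN and denormal special cases, and the function names are hypothetical.

```python
def maxf32_mov32(ra, rb, rc, rd):
    """Model of MAXF32 RaH, RbH || MOV32 RcH, RdH.

    Returns the new (RaH, RcH) pair. RcH is only overwritten when the
    maximum itself is replaced, i.e. when RaH < RbH.
    """
    if ra < rb:
        return rb, rd
    return ra, rc


def running_max_with_index(samples):
    """Use the model above to find the maximum and where it first occurs."""
    best, best_index = samples[0], 0
    for index, sample in enumerate(samples[1:], start=1):
        best, best_index = maxf32_mov32(best, sample, best_index, index)
    return best, best_index


if __name__ == "__main__":
    print(running_max_with_index([5.0, 4.0, -1.5, 7.25, 7.25]))
```

    Because the move happens only on a strict less-than, equal values leave RcH untouched, so the sketch reports the first occurrence of the maximum.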

Flags
    This instruction modifies the following flags in the STF register:

    Flag      TF   ZI   NI   ZF   NF   LUF  LVF
    Modified  No   No   No   Yes  Yes  No   No

    The ZF and NF flags are configured on the result of the operation, not the result stored in the destination register.

       if(RaH == RbH) {ZF=1, NF=0}
       if(RaH > RbH)  {ZF=0, NF=0}
       if(RaH < RbH)  {ZF=0, NF=1}

Pipeline
    This is a single-cycle instruction.

Example
       MOVIZF32 R0H, #5.0               ; R0H = 5.0 (0x40A00000)
       MOVIZF32 R1H, #4.0               ; R1H = 4.0 (0x40800000)
       MOVIZF32 R2H, #-1.5              ; R2H = -1.5 (0xBFC00000)
       MOVIZF32 R3H, #-2.0              ; R3H = -2.0 (0xC0000000)
       MAXF32   R0H, R1H                ; R0H = 5.0, R3H = -2.0 (no move), ZF = 0, NF = 0
    || MOV32    R3H, R2H
       MAXF32   R1H, R0H                ; R1H = 5.0, R3H = -1.5, ZF = 0, NF = 1
    || MOV32    R3H, R2H
       MAXF32   R0H, R1H                ; R0H = 5.0, R2H = -1.5, ZF = 1, NF = 0
    || MOV32    R2H, R1H

See also
    MAXF32 RaH, RbH
    MAXF32 RaH, #16FHi

MINF32 RaH, RbH
32-bit Floating-Point Minimum

Operands
    RaH      floating-point source/destination register (R0H to R7H)
    RbH      floating-point source register (R0H to R7H)

Opcode
    LSW: 1110 0110 1001 0111
    MSW: 0000 0000 00bb baaa

Description
       if(RaH > RbH) RaH = RbH

    Special cases for the output from the MINF32 operation:
    • NaN output will be converted to infinity
    • A denormalized output will be converted to positive zero.

Flags
    This instruction modifies the following flags in the STF register:

    Flag      TF   ZI   NI   ZF   NF   LUF  LVF
    Modified  No   No   No   Yes  Yes  No   No

    The ZF and NF flags are configured on the result of the operation, not the result stored in the destination register.

       if(RaH == RbH) {ZF=1, NF=0}
       if(RaH > RbH)  {ZF=0, NF=0}
       if(RaH < RbH)  {ZF=0, NF=1}

Pipeline
    This is a single-cycle instruction.
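
    The flag rules above depend only on how the two source values compare, not on which value ends up in RaH. A minimal Python sketch of that behaviour follows, assuming ordinary numeric inputs (the NaN and denormal special cases are not modelled). The names compare_flags, minf32 and maxf32 are hypothetical.

```python
def compare_flags(ra, rb):
    """ZF/NF as listed in the Flags section, derived from RaH versus RbH."""
    if ra == rb:
        return {"ZF": 1, "NF": 0}
    if ra > rb:
        return {"ZF": 0, "NF": 0}
    return {"ZF": 0, "NF": 1}


def minf32(ra, rb):
    """Model of MINF32 RaH, RbH: returns (new RaH, flags)."""
    flags = compare_flags(ra, rb)
    return (rb if ra > rb else ra), flags


def maxf32(ra, rb):
    """Model of MAXF32 RaH, RbH: returns (new RaH, flags)."""
    flags = compare_flags(ra, rb)
    return (rb if ra < rb else ra), flags


if __name__ == "__main__":
    print(minf32(5.0, 4.0))   # RaH is replaced, flags show RaH > RbH
    print(maxf32(-1.5, 5.0))  # RaH is replaced, flags show RaH < RbH
```

    Note that MINF32 and MAXF32 share the same flag function: NF = 1 means RaH was the smaller input, whether or not RaH was then overwritten.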

Example
       MOVIZF32 R0H, #5.0               ; R0H = 5.0 (0x40A00000)
       MOVIZF32 R1H, #4.0               ; R1H = 4.0 (0x40800000)
       MOVIZF32 R2H, #-1.5              ; R2H = -1.5 (0xBFC00000)
       MINF32   R0H, R1H                ; R0H = 4.0, ZF = 0, NF = 0
       MINF32   R1H, R2H                ; R1H = -1.5, ZF = 0, NF = 0
       MINF32   R2H, R1H                ; R2H = -1.5, ZF = 1, NF = 0
       MINF32   R1H, R0H                ; R1H = -1.5, ZF = 0, NF = 1

See also
    MAXF32 RaH, RbH
    MAXF32 RaH, #16FHi
    MINF32 RaH, #16FHi
    MINF32 RaH, RbH || MOV32 RcH, RdH

MINF32 RaH, #16FHi
32-bit Floating-Point Minimum

Operands
    RaH      floating-point source/destination register (R0H to R7H)
    #16FHi   A 16-bit immediate value that represents the upper 16-bits of an IEEE 32-bit floating-point value. The low 16-bits of the mantissa are assumed to be all 0.

Opcode
    LSW: 1110 1000 0011 0III
    MSW: IIII IIII IIII Iaaa

Description
    Compare RaH with the floating-point value represented by the immediate operand. If the immediate value is smaller, then load it into RaH.

       if(RaH > #16FHi:0) RaH = #16FHi:0

    #16FHi is a 16-bit immediate value that represents the upper 16-bits of an IEEE 32-bit floating-point value. The low 16-bits of the mantissa are assumed to be all 0. This addressing mode is most useful for constants where the lowest 16-bits of the mantissa are 0. Some examples are 2.0 (0x40000000), 4.0 (0x40800000), 0.5 (0x3F000000), and -1.5 (0xBFC00000). The assembler will accept either a hex or float as the immediate value. That is, -1.5 can be represented as #-1.5 or #0xBFC0.

    Special cases for the output from the MINF32 operation:
    • NaN output will be converted to infinity
    • A denormalized output will be converted to positive zero.

Flags
    This instruction modifies the following flags in the STF register:

    Flag      TF   ZI   NI   ZF   NF   LUF  LVF
    Modified  No   No   No   Yes  Yes  No   No

    The ZF and NF flags are configured on the result of the operation, not the result stored in the destination register.

       if(RaH == #16FHi:0) {ZF=1, NF=0}
       if(RaH > #16FHi:0)  {ZF=0, NF=0}
       if(RaH < #16FHi:0)  {ZF=0, NF=1}

Pipeline
    This is a single-cycle instruction.

Example
       MOVIZF32 R0H, #5.0               ; R0H = 5.0 (0x40A00000)
       MOVIZF32 R1H, #4.0               ; R1H = 4.0 (0x40800000)
       MOVIZF32 R2H, #-1.5              ; R2H = -1.5 (0xBFC00000)
       MINF32   R0H, #5.5               ; R0H = 5.0, ZF = 0, NF = 1
       MINF32   R1H, #2.5               ; R1H = 2.5, ZF = 0, NF = 0
       MINF32   R2H, #-1.0              ; R2H = -1.5, ZF = 0, NF = 1
       MINF32   R2H, #-1.5              ; R2H = -1.5, ZF = 1, NF = 0

See also
    MAXF32 RaH, #16FHi
    MAXF32 RaH, RbH
    MINF32 RaH, RbH
    MINF32 RaH, RbH || MOV32 RcH, RdH

MINF32 RaH, RbH ∥ MOV32 RcH, RdH
32-bit Floating-Point Minimum with Parallel Move

Operands
    RaH      floating-point source/destination register for the MINF32 operation (R0H to R7H).
             RaH cannot be the same register as RcH.
    RbH      floating-point source register for the MINF32 operation (R0H to R7H)
    RcH      floating-point destination register for the MOV32 operation (R0H to R7H).
             RcH cannot be the same register as RaH.
    RdH      floating-point source register for the MOV32 operation (R0H to R7H)

Opcode
    LSW: 1110 0110 1001 1101
    MSW: 0000 dddc ccbb baaa

Description
       if(RaH > RbH) { RaH = RbH; RcH = RdH; }

    Special cases for the output from the MINF32 operation:
    • NaN output will be converted to infinity
    • A denormalized output will be converted to positive zero.

Restrictions
    The destination registers for the MINF32 and the MOV32 must be unique. That is, RaH cannot be the same register as RcH.

Flags
    This instruction modifies the following flags in the STF register:

    Flag      TF   ZI   NI   ZF   NF   LUF  LVF
    Modified  No   No   No   Yes  Yes  No   No

    The ZF and NF flags are configured on the result of the operation, not the result stored in the destination register.

       if(RaH == RbH) {ZF=1, NF=0}
       if(RaH > RbH)  {ZF=0, NF=0}
       if(RaH < RbH)  {ZF=0, NF=1}

Pipeline
    This is a single-cycle instruction.

Example
       MOVIZF32 R0H, #5.0               ; R0H = 5.0 (0x40A00000)
       MOVIZF32 R1H, #4.0               ; R1H = 4.0 (0x40800000)
       MOVIZF32 R2H, #-1.5              ; R2H = -1.5 (0xBFC00000)
       MOVIZF32 R3H, #-2.0              ; R3H = -2.0 (0xC0000000)
       MINF32   R0H, R1H                ; R0H = 4.0, R3H = -1.5, ZF = 0, NF = 0
    || MOV32    R3H, R2H
       MINF32   R1H, R0H                ; R1H = 4.0, R3H = -1.5, ZF = 1, NF = 0
    || MOV32    R3H, R2H
       MINF32   R2H, R1H                ; R2H = -1.5, R1H = 4.0, ZF = 0, NF = 1
    || MOV32    R1H, R3H

See also
    MINF32 RaH, RbH
    MINF32 RaH, #16FHi

MOV16 mem16, RaH
Move 16-bit Floating-Point Register Contents to Memory

Operands
    mem16    points to the 16-bit destination memory
    RaH      floating-point source register (R0H to R7H)

Opcode
    LSW: 1110 0010 0001 0011
    MSW: 0000 0aaa mem16

Description
    Move the 16-bit value from the lower 16-bits of the floating-point register (RaH[15:0]) to the location pointed to by mem16.

       [mem16] = RaH[15:0]

Flags
    No STF flags are affected.

    Flag      TF   ZI   NI   ZF   NF   LUF  LVF
    Modified  No   No   No   No   No   No   No

Pipeline
    This is a single-cycle instruction.
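
    The examples for MOV16 and the MOV32 memory forms reach memory through DP-relative operands such as @0. The Python sketch below shows how those target addresses can be derived and how MOV16 keeps only RaH[15:0]. It assumes that C28x direct addressing forms the address from the 16-bit DP followed by a 6-bit offset. That rule is not stated in this section, but it is consistent with the DP/address pairs in the examples (0x02C0 and 0x00B000, 0x0280 and 0x00A000, 0x0300 and 0x00C000). The function names and the dictionary used as memory are hypothetical.

```python
def direct_address(dp: int, offset: int) -> int:
    """Address of a DP-relative operand such as @0 or @1.

    Assumption: the 16-bit data page pointer selects a 64-word page and
    the operand supplies a 6-bit offset within that page.
    """
    return ((dp & 0xFFFF) << 6) | (offset & 0x3F)


def mov16(memory: dict, address: int, rah_bits: int) -> None:
    """Model of MOV16 mem16, RaH: [mem16] = RaH[15:0]."""
    memory[address] = rah_bits & 0xFFFF


if __name__ == "__main__":
    memory = {}
    target = direct_address(0x02C0, 0)
    mov16(memory, target, 0x00000003)
    print(f"[0x{target:06X}] = 0x{memory[target]:04X}")
```

    Only the low half of the register is written; the upper 16 bits of RaH never reach memory with this instruction.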

Example
       MOVW  DP, #0x02C0                ; DP = 0x02C0
       MOVXI R4H, #0x0003               ; R4H[15:0] = 0x0003
       MOV16 @0, R4H                    ; [0x00B000] = 0x0003

See also
    MOVIZ RaH, #16FHiHex
    MOVIZF32 RaH, #16FHi
    MOVXI RaH, #16FLoHex

MOV32 *(0:16bitAddr), loc32
Move the Contents of loc32 to Memory

Operands
    0:16bitAddr    16-bit immediate address, zero extended
    loc32          32-bit source location

Opcode
    LSW: 1011 1101 loc32
    MSW: IIII IIII IIII IIII

Description
    Move the 32-bit value in loc32 to the memory location addressed by 0:16bitAddr. The EALLOW bit in the ST1 register is ignored by this operation.

       [0:16bitAddr] = [loc32]

Flags
    This instruction does not modify any STF register flags.

    Flag      TF   ZI   NI   ZF   NF   LUF  LVF
    Modified  No   No   No   No   No   No   No

Pipeline
    This is a two-cycle instruction.

Example
       MOVIZ R5H, #0x1234               ; R5H[31:16] = 0x1234
       MOVXI R5H, #0xABCD               ; R5H[15:0] = 0xABCD
       NOP                              ; 1 Alignment Cycle
       MOV32 ACC, R5H                   ; ACC = 0x1234ABCD
       MOV32 *(0xA000), @ACC            ; [0x00A000] = ACC
       NOP                              ; 1 Cycle delay for MOV32 to complete
                                        ; <-- MOV32 *(0:16bitAddr), loc32 complete,
                                        ; [0x00A000] = 0xABCD, [0x00A001] = 0x1234

See also
    MOV32 mem32, RaH
    MOV32 mem32, STF
    MOV32 loc32, *(0:16bitAddr)

MOV32 ACC, RaH
Move 32-bit Floating-Point Register Contents to ACC

Operands
    ACC      28x accumulator
    RaH      floating-point source register (R0H to R7H)

Opcode
    LSW: 1011 1111 loc32
    MSW: IIII IIII IIII IIII

Description
    Move the 32-bit value in the floating-point register RaH to the 28x accumulator, ACC.

       ACC = RaH

    No STF flags are affected.

Flags
    Flag      TF   ZI   NI   ZF   NF   LUF  LVF
    Modified  No   No   No   No   No   No   No

    The Z and N flags in status register zero (ST0) of the 28x CPU are affected.

Pipeline
    While this is a single-cycle instruction, additional pipeline alignment is required when copying a floating-point register to a C28x register. If the move follows a single-cycle floating-point instruction, a single alignment cycle must be added. For example:

       MINF32 R0H, R1H                  ; Single-cycle instruction
       NOP                              ; 1 alignment cycle
       MOV32  @ACC, R0H                 ; Copy R0H to ACC
       NOP                              ; Any instruction

    If the move follows a 2 pipeline-cycle floating-point instruction, then two alignment cycles must be used. For example:

       ADDF32 R2H, R1H, R0H             ; 2 pipeline instruction (2p)
       NOP                              ; 1 cycle delay for ADDF32 to complete
                                        ; <-- ADDF32 completes, R2H is valid
       NOP                              ; 1 alignment cycle
       MOV32  ACC, R2H                  ; copy R2H into ACC, takes 2 cycles
                                        ; <-- MOV32 completes, ACC is valid
       NOP                              ; Any instruction

Example
       MOVIZF32  R0H, #2.5              ; R0H = 2.5 = 0x40200000
       F32TOUI32 R0H, R0H
       NOP                              ; Delay for conversion instruction
                                        ; <-- Conversion complete, R0H valid
       NOP                              ; Alignment cycle
       MOV32     P, R0H                 ; P = 2 = 0x00000002

See also
    MOV32 P, RaH
    MOV32 XARn, RaH
    MOV32 XT, RaH

MOV32 loc32, *(0:16bitAddr)
Move 32-bit Value from Memory to loc32

Operands
    loc32          destination location
    0:16bitAddr    16-bit address of the 32-bit source value

Opcode
    LSW: 1011 1111 loc32
    MSW: IIII IIII IIII IIII

Description
    Copy the 32-bit value referenced by 0:16bitAddr to the location indicated by loc32.

       [loc32] = [0:16bitAddr]

Flags
    No STF flags are affected. If loc32 is the ACC register, then the Z and N flags in status register zero (ST0) of the 28x CPU are affected.

    Flag      TF   ZI   NI   ZF   NF   LUF  LVF
    Modified  No   No   No   No   No   No   No

Pipeline
    This is a 2 cycle instruction.

Example
       MOVW  DP, #0x0300                ; DP = 0x0300
       MOV   @0, #0xFFFF                ; [0x00C000] = 0xFFFF;
       MOV   @1, #0x1111                ; [0x00C001] = 0x1111;
       MOV32 @ACC, *(0xC000)            ; AL = [0x00C000], AH = [0x00C001]
       NOP                              ; 1 Cycle delay for MOV32 to complete
                                        ; <-- MOV32 complete, AL = 0xFFFF, AH = 0x1111

See also
    MOV32 RaH, mem32{, CNDF}
    MOV32 *(0:16bitAddr), loc32
    MOV32 STF, mem32
    MOVD32 RaH, mem32

MOV32 mem32, RaH
Move 32-bit Floating-Point Register Contents to Memory

Operands
    RaH      floating-point register (R0H to R7H)
    mem32    points to the 32-bit destination memory

Opcode
    LSW: 1110 0010 0000 0011
    MSW: 0000 0aaa mem32

Description
    Move the 32-bit contents of the floating-point register RaH to the memory location pointed to by mem32.

       [mem32] = RaH

Flags
    No flags affected.

    Flag      TF   ZI   NI   ZF   NF   LUF  LVF
    Modified  No   No   No   No   No   No   No

Pipeline
    This is a single-cycle instruction.
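
    The 32-bit move examples in this chapter show the low word of a 32-bit value at the lower address and the high word at the next address ([0x00A000] = 0xABCD, [0x00A001] = 0x1234 for 0x1234ABCD; AL = [0x00C000], AH = [0x00C001]). The Python sketch below models that layout for a register-to-memory move, on the assumption that MOV32 mem32, RaH uses the same word order. The function names and the dictionary standing in for 16-bit word memory are hypothetical.

```python
import struct


def f32_bits(value: float) -> int:
    """IEEE 754 single-precision bit pattern of value."""
    return struct.unpack(">I", struct.pack(">f", value))[0]


def mov32_store(memory: dict, address: int, rah_bits: int) -> None:
    """Model of [mem32] = RaH on a 16-bit word-addressed memory."""
    memory[address] = rah_bits & 0xFFFF              # low word, lower address
    memory[address + 1] = (rah_bits >> 16) & 0xFFFF  # high word, next address


def mov32_load(memory: dict, address: int) -> int:
    """Inverse of mov32_store: rebuild the 32-bit value from two words."""
    return (memory[address + 1] << 16) | memory[address]


if __name__ == "__main__":
    memory = {}
    mov32_store(memory, 0xA000, f32_bits(5.0))
    print({hex(a): hex(w) for a, w in sorted(memory.items())})
```

    Storing 5.0 (0x40A00000) this way leaves 0x0000 at the lower address and 0x40A0 at the upper one, which is why #16FHi-style constants occupy only the upper word.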
Example
    ; Perform 5 multiply and accumulate operations:
    ;
    ; 1st multiply: A = X0 * Y0
    ; 2nd multiply: B = X1 * Y1
    ; 3rd multiply: C = X2 * Y2
    ; 4th multiply: D = X3 * Y3
    ; 5th multiply: E = X4 * Y4
    ;
    ; Result = A + B + C + D + E

       MOV32  R0H, *XAR4++               ; R0H = X0
       MOV32  R1H, *XAR5++               ; R1H = Y0
                                         ; R6H = A = X0 * Y0
       MPYF32 R6H, R0H, R1H              ; In parallel R0H = X1
    || MOV32  R0H, *XAR4++
       MOV32  R1H, *XAR5++               ; R1H = Y1
                                         ; R7H = B = X1 * Y1
       MPYF32 R7H, R0H, R1H              ; In parallel R0H = X2
    || MOV32  R0H, *XAR4++
       MOV32  R1H, *XAR5++               ; R1H = Y2
                                         ; R7H = A + B
                                         ; R6H = C = X2 * Y2
       MACF32 R7H, R6H, R6H, R0H, R1H    ; In parallel R0H = X3
    || MOV32  R0H, *XAR4++
       MOV32  R1H, *XAR5++               ; R1H = Y3
                                         ; R7H = (A + B) + C
                                         ; R6H = D = X3 * Y3
       MACF32 R7H, R6H, R6H, R0H, R1H    ; In parallel R0H = X4
    || MOV32  R0H, *XAR4
       MOV32  R1H, *XAR5                 ; R1H = Y4
                                         ; R6H = E = X4 * Y4
       MPYF32 R6H, R0H, R1H              ; In parallel R7H = (A + B + C) + D
    || ADDF32 R7H, R7H, R6H
       NOP                               ; Wait for MPYF32 || ADDF32 to complete
       ADDF32 R7H, R7H, R6H              ; R7H = (A + B + C + D) + E
       NOP                               ; Wait for ADDF32 to complete
       MOV32  @Result, R7H               ; Store the result

See also
    MOV32 *(0:16bitAddr), loc32
    MOV32 mem32, STF

MOV32 mem32, STF    Move 32-bit STF Register to Memory

Operands
    STF      floating-point status register
    mem32    points to the 32-bit destination memory

Opcode
    LSW: 1110 0010 0000 0000
    MSW: 0000 0000 mem32

Description
    Copy the floating-point status register, STF, to memory.
    [mem32] = STF

    This instruction modifies the following flags in the STF register:

Flags
    Flag      TF   ZI   NI   ZF   NF   LUF  LVF
    Modified  No   No   No   No   No   No   No

    No flags affected.

Pipeline
    This is a single-cycle instruction.

Example 1
    MOVW     DP, #0x0280     ; DP = 0x0280
    MOVIZF32 R0H, #2.0       ; R0H = 2.0 (0x40000000)
    MOVIZF32 R1H, #3.0       ; R1H = 3.0 (0x40400000)
    CMPF32   R0H, R1H        ; ZF = 0, NF = 1, STF = 0x00000004
    MOV32    @0, STF         ; [0x00A000] = 0x00000004

Example 2
    MOV32  *SP++, STF        ; Store STF in stack
    MOVF32 R2H, #3.0         ; R2H = 3.0 (0x40400000)
    MOVF32 R3H, #5.0         ; R3H = 5.0 (0x40A00000)
    CMPF32 R2H, R3H          ; ZF = 0, NF = 1, STF = 0x00000004
    MOV32  R3H, R2H, LT      ; R3H = 3.0 (0x40400000)
    MOV32  STF, *--SP        ; Restore STF from stack

See also
    MOV32 mem32, RaH
    MOV32 *(0:16bitAddr), loc32
    MOVST0 FLAG

MOV32 P, RaH    Move 32-bit Floating-Point Register Contents to P

Operands
    P      28x product register P
    RaH    floating-point source register (R0H to R7H)

Opcode
    LSW: 1011 1111 loc32
    MSW: IIII IIII IIII IIII

Description
    Move the 32-bit value in RaH to the 28x product register P.

    P = RaH

    No flags affected in floating-point unit.

Flags
    Flag      TF   ZI   NI   ZF   NF   LUF  LVF
    Modified  No   No   No   No   No   No   No

Pipeline
    While this is a single-cycle instruction, additional pipeline alignment is required when copying a floating-point register to a C28x register. If the move follows a single-cycle floating-point instruction, a single alignment cycle must be added. For example:

    MINF32 R0H,R1H         ; Single-cycle instruction
    NOP                    ; 1 alignment cycle
    MOV32  @ACC,R0H        ; Copy R0H to ACC
    NOP                    ; Any instruction

    If the move follows a 2 pipeline-cycle floating-point instruction, then two alignment cycles must be used.
    For example:

    ADDF32 R2H, R1H, R0H   ; 2 pipeline instruction (2p)
    NOP                    ; 1 cycle delay for ADDF32 to complete
                           ; <-- ADDF32 completes, R2H is valid
    NOP                    ; 1 alignment cycle
    MOV32  ACC, R2H        ; copy R2H into ACC, takes 1 cycle
                           ; <-- MOV32 completes, ACC is valid
    NOP                    ; Any instruction

Example
    MOVIZF32  R0H, #2.5    ; R0H = 2.5 = 0x40200000
    F32TOUI32 R0H, R0H
    NOP                    ; Delay for conversion instruction
                           ; <-- Conversion complete, R0H valid
    NOP                    ; Alignment cycle
    MOV32     P, R0H       ; P = 2 = 0x00000002

See also
    MOV32 ACC, RaH
    MOV32 XARn, RaH
    MOV32 XT, RaH

MOV32 RaH, ACC    Move the Contents of ACC to a 32-bit Floating-Point Register

Operands
    RaH    floating-point destination register (R0H to R7H)
    ACC    accumulator

Opcode
    LSW: 1011 1101 loc32
    MSW: IIII IIII IIII IIII

Description
    Move the 32-bit value in ACC to the floating-point register RaH.

    RaH = ACC

    This instruction does not modify any STF register flags.

Flags
    Flag      TF   ZI   NI   ZF   NF   LUF  LVF
    Modified  No   No   No   No   No   No   No

Pipeline
    While this is a single-cycle instruction, additional pipeline alignment is required. Four alignment cycles are required after any copy from a standard 28x CPU register to a floating-point register. The four alignment cycles can be filled with any non-conflicting instructions except for the following: FRACF32, UI16TOF32, I16TOF32, F32TOUI32, and F32TOI32.
    MOV32 R0H,@ACC         ; Copy ACC to R0H
    NOP                    ; Wait 4 cycles
    NOP                    ; Do not use FRACF32, UI16TOF32
    NOP                    ; I16TOF32, F32TOUI32 or F32TOI32
    NOP                    ;
                           ; <-- R0H is valid

Example
    MOV   AH, #0x0000
    MOV   AL, #0x0200      ; ACC = 512
    MOV32 R0H, ACC
    NOP
    NOP
    NOP
    NOP
    UI32TOF32 R0H, R0H     ; R0H = 512.0 (0x44000000)

See also
    MOV32 RaH, P
    MOV32 RaH, XARn
    MOV32 RaH, XT

MOV32 RaH, mem32 {, CNDF}    Conditional 32-bit Move

Operands
    RaH      floating-point destination register (R0H to R7H)
    mem32    pointer to the 32-bit source memory location
    CNDF     optional condition

Opcode
    LSW: 1110 0010 1010 CNDF
    MSW: 0000 0aaa mem32

Description
    If the condition is true, then move the 32-bit value referenced by mem32 to the floating-point register indicated by RaH.

    if (CNDF == TRUE) RaH = [mem32]

    CNDF is one of the following conditions:

    Encode(1)  CNDF     Description                           STF Flags Tested
    0000       NEQ      Not equal to zero                     ZF == 0
    0001       EQ       Equal to zero                         ZF == 1
    0010       GT       Greater than zero                     ZF == 0 AND NF == 0
    0011       GEQ      Greater than or equal to zero         NF == 0
    0100       LT       Less than zero                        NF == 1
    0101       LEQ      Less than or equal to zero            ZF == 1 OR NF == 1
    1010       TF       Test flag set                         TF == 1
    1011       NTF      Test flag not set                     TF == 0
    1100       LU       Latched underflow                     LUF == 1
    1101       LV       Latched overflow                      LVF == 1
    1110       UNC      Unconditional                         None
    1111       UNCF(2)  Unconditional with flag modification  None

    (1) Values not shown are reserved.
    (2) This is the default operation if no CNDF field is specified. This condition will allow the ZF, NF, ZI, and NI flags to be modified when a conditional operation is executed. All other conditions will not modify these flags.
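The condition table can be read as a small decision function. The following is a minimal host-side C sketch of that reading, not device code: the type `stf_flags_t` and the function `cndf_is_true` are hypothetical names introduced here. LEQ is modeled as "ZF set or NF set", on the assumption that "less than or equal to zero" is the intent, and reserved encodings are treated as false purely for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical snapshot of the STF flags that the CNDF table tests. */
typedef struct {
    bool zf, nf, tf, luf, lvf;
} stf_flags_t;

/* Returns true when the 4-bit CNDF encoding holds for the given flags.
 * Reserved encodings return false here; that choice is an assumption of
 * this sketch, the table only marks them as reserved. */
bool cndf_is_true(uint8_t cndf, stf_flags_t f)
{
    switch (cndf & 0x0Fu) {
    case 0x0: return !f.zf;            /* NEQ  */
    case 0x1: return f.zf;             /* EQ   */
    case 0x2: return !f.zf && !f.nf;   /* GT   */
    case 0x3: return !f.nf;            /* GEQ  */
    case 0x4: return f.nf;             /* LT   */
    case 0x5: return f.zf || f.nf;     /* LEQ, modeled as OR (assumption) */
    case 0xA: return f.tf;             /* TF   */
    case 0xB: return !f.tf;            /* NTF  */
    case 0xC: return f.luf;            /* LU   */
    case 0xD: return f.lvf;            /* LV   */
    case 0xE:                          /* UNC  */
    case 0xF: return true;             /* UNCF */
    default:  return false;            /* reserved encodings */
    }
}
```

With ZF = 1 and NF = 0, as left by a compare of two equal values, this model reports EQ, GEQ and LEQ as true and GT as false.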
    This instruction modifies the following flags in the STF register:

Flags
    Flag      TF   ZI   NI   ZF   NF   LUF  LVF
    Modified  No   Yes  Yes  Yes  Yes  No   No

    if(CNDF == UNCF)
    {
        NF = RaH[31];
        ZF = 0;
        if(RaH[30:23] == 0) { ZF = 1; NF = 0; }
        NI = RaH[31];
        ZI = 0;
        if(RaH[31:0] == 0) ZI = 1;
    }
    else No flags modified;

Pipeline
    This is a single-cycle instruction.

Example
    MOVW     DP, #0x0300     ; DP = 0x0300
    MOV      @0, #0x5555     ; [0x00C000] = 0x5555
    MOV      @1, #0x5555     ; [0x00C001] = 0x5555
    MOVIZF32 R3H, #7.0       ; R3H = 7.0 (0x40E00000)
    MOVIZF32 R4H, #7.0       ; R4H = 7.0 (0x40E00000)
    MAXF32   R3H, R4H        ; ZF = 1, NF = 0
    MOV32    R1H, @0, EQ     ; R1H = 0x55555555

See also
    MOV32 RaH, RbH{, CNDF}
    MOVD32 RaH, mem32

MOV32 RaH, P    Move the Contents of P to a 32-bit Floating-Point Register

Operands
    RaH    floating-point register (R0H to R7H)
    P      product register

Opcode
    LSW: 1011 1101 loc32
    MSW: IIII IIII IIII IIII

Description
    Move the 32-bit value in the product register, P, to the floating-point register RaH.

    RaH = P

    This instruction does not modify any STF register flags.

Flags
    Flag      TF   ZI   NI   ZF   NF   LUF  LVF
    Modified  No   No   No   No   No   No   No

Pipeline
    While this is a single-cycle instruction, additional pipeline alignment is required. Four alignment cycles are required after any copy from a standard 28x CPU register to a floating-point register. The four alignment cycles can be filled with any non-conflicting instructions except for the following: FRACF32, UI16TOF32, I16TOF32, F32TOUI32, and F32TOI32.
    MOV32 R0H,@P           ; Copy P to R0H
    NOP                    ; Wait 4 alignment cycles
    NOP                    ; Do not use FRACF32, UI16TOF32
    NOP                    ; I16TOF32, F32TOUI32 or F32TOI32
    NOP                    ;
                           ; <-- R0H is valid
                           ; Instruction can use R0H as a source

Example
    MOV   PH, #0x0000
    MOV   PL, #0x0200      ; P = 512
    MOV32 R0H, P
    NOP
    NOP
    NOP
    NOP
    UI32TOF32 R0H, R0H     ; R0H = 512.0 (0x44000000)

See also
    MOV32 RaH, ACC
    MOV32 RaH, XARn
    MOV32 RaH, XT

MOV32 RaH, RbH {, CNDF}    Conditional 32-bit Move

Operands
    RaH     floating-point destination register (R0H to R7H)
    RbH     floating-point source register (R0H to R7H)
    CNDF    optional condition

Opcode
    LSW: 1110 0110 1100 CNDF
    MSW: 0000 0000 00bb baaa

Description
    If the condition is true, then move the 32-bit value in the floating-point register RbH to the floating-point register indicated by RaH.

    if (CNDF == TRUE) RaH = RbH

    CNDF is one of the following conditions:

    Encode(1)  CNDF     Description                           STF Flags Tested
    0000       NEQ      Not equal to zero                     ZF == 0
    0001       EQ       Equal to zero                         ZF == 1
    0010       GT       Greater than zero                     ZF == 0 AND NF == 0
    0011       GEQ      Greater than or equal to zero         NF == 0
    0100       LT       Less than zero                        NF == 1
    0101       LEQ      Less than or equal to zero            ZF == 1 OR NF == 1
    1010       TF       Test flag set                         TF == 1
    1011       NTF      Test flag not set                     TF == 0
    1100       LU       Latched underflow                     LUF == 1
    1101       LV       Latched overflow                      LVF == 1
    1110       UNC      Unconditional                         None
    1111       UNCF(2)  Unconditional with flag modification  None

    (1) Values not shown are reserved.
    (2) This is the default operation if no CNDF field is specified. This condition will allow the ZF, NF, ZI, and NI flags to be modified when a conditional operation is executed. All other conditions will not modify these flags.
    This instruction modifies the following flags in the STF register:

Flags
    Flag      TF   ZI   NI   ZF   NF   LUF  LVF
    Modified  No   Yes  Yes  Yes  Yes  No   No

    if(CNDF == UNCF)
    {
        NF = RaH[31];
        ZF = 0;
        if(RaH[30:23] == 0) { ZF = 1; NF = 0; }
        NI = RaH[31];
        ZI = 0;
        if(RaH[31:0] == 0) ZI = 1;
    }
    else No flags modified;

Pipeline
    This is a single-cycle instruction.

Example
    MOVIZF32 R3H, #8.0       ; R3H = 8.0 (0x41000000)
    MOVIZF32 R4H, #7.0       ; R4H = 7.0 (0x40E00000)
    MAXF32   R3H, R4H        ; ZF = 0, NF = 0
    MOV32    R1H, R3H, GT    ; R1H = 8.0 (0x41000000)

See also
    MOV32 RaH, mem32{, CNDF}

MOV32 RaH, XARn    Move the Contents of XARn to a 32-bit Floating-Point Register

Operands
    RaH     floating-point register (R0H to R7H)
    XARn    auxiliary register (XAR0 - XAR7)

Opcode
    LSW: 1011 1101 loc32
    MSW: IIII IIII IIII IIII

Description
    Move the 32-bit value in the auxiliary register XARn to the floating-point register RaH.

    RaH = XARn

    This instruction does not modify any STF register flags.

Flags
    Flag      TF   ZI   NI   ZF   NF   LUF  LVF
    Modified  No   No   No   No   No   No   No

Pipeline
    While this is a single-cycle instruction, additional pipeline alignment is required. Four alignment cycles are required after any copy from a standard 28x CPU register to a floating-point register. The four alignment cycles can be filled with any non-conflicting instructions except for the following: FRACF32, UI16TOF32, I16TOF32, F32TOUI32, and F32TOI32.
    MOV32  R0H,@XAR7       ; Copy XAR7 to R0H
    NOP                    ; Wait 4 alignment cycles
    NOP                    ; Do not use FRACF32, UI16TOF32
    NOP                    ; I16TOF32, F32TOUI32 or F32TOI32
    NOP                    ;
                           ; <-- R0H is valid
    ADDF32 R2H,R1H,R0H     ; Instruction can use R0H as a source

Example
    MOVL  XAR1, #0x0200    ; XAR1 = 512
    MOV32 R0H, XAR1
    NOP
    NOP
    NOP
    NOP
    UI32TOF32 R0H, R0H     ; R0H = 512.0 (0x44000000)

See also
    MOV32 RaH, ACC
    MOV32 RaH, P
    MOV32 RaH, XT

MOV32 RaH, XT    Move the Contents of XT to a 32-bit Floating-Point Register

Operands
    RaH    floating-point register (R0H to R7H)
    XT     temporary register

Opcode
    LSW: 1011 1101 loc32
    MSW: IIII IIII IIII IIII

Description
    Move the 32-bit value in the temporary register, XT, to the floating-point register RaH.

    RaH = XT

    This instruction does not modify any STF register flags.

Flags
    Flag      TF   ZI   NI   ZF   NF   LUF  LVF
    Modified  No   No   No   No   No   No   No

Pipeline
    While this is a single-cycle instruction, additional pipeline alignment is required. Four alignment cycles are required after any copy from a standard 28x CPU register to a floating-point register. The four alignment cycles can be filled with any non-conflicting instructions except for the following: FRACF32, UI16TOF32, I16TOF32, F32TOUI32, and F32TOI32.
    MOV32  R0H, XT         ; Copy XT to R0H
    NOP                    ; Wait 4 alignment cycles
    NOP                    ; Do not use FRACF32, UI16TOF32
    NOP                    ; I16TOF32, F32TOUI32 or F32TOI32
    NOP                    ;
                           ; <-- R0H is valid
    ADDF32 R2H,R1H,R0H     ; Instruction can use R0H as a source

Example
    MOVIZF32 R6H, #5.0     ; R6H = 5.0 (0x40A00000)
    NOP                    ; 1 alignment cycle
    MOV32    XT, R6H       ; XT = 5.0 (0x40A00000)
    MOV32    R1H, XT       ; R1H = 5.0 (0x40A00000)

See also
    MOV32 RaH, ACC
    MOV32 RaH, P
    MOV32 RaH, XARn

MOV32 STF, mem32    Move 32-bit Value from Memory to the STF Register

Operands
    STF      floating-point unit status register
    mem32    pointer to the 32-bit source memory location

Opcode
    LSW: 1110 0010 1000 0000
    MSW: 0000 0000 mem32

Description
    Move from memory to the floating-point unit's status register STF.

    STF = [mem32]

    This instruction modifies the following flags in the STF register:

Flags
    Flag      TF   ZI   NI   ZF   NF   LUF  LVF
    Modified  Yes  Yes  Yes  Yes  Yes  Yes  Yes

    Restoring the status register overwrites all flags.

Pipeline
    This is a single-cycle instruction.
Example 1
    MOVW  DP, #0x0300        ; DP = 0x0300
    MOV   @2, #0x020C        ; [0x00C002] = 0x020C
    MOV   @3, #0x0000        ; [0x00C003] = 0x0000
    MOV32 STF, @2            ; STF = 0x0000020C

Example 2
    MOV32  *SP++, STF        ; Store STF in stack
    MOVF32 R2H, #3.0         ; R2H = 3.0 (0x40400000)
    MOVF32 R3H, #5.0         ; R3H = 5.0 (0x40A00000)
    CMPF32 R2H, R3H          ; ZF = 0, NF = 1, STF = 0x00000004
    MOV32  R3H, R2H, LT      ; R3H = 3.0 (0x40400000)
    MOV32  STF, *--SP        ; Restore STF from stack

See also
    MOV32 mem32, STF
    MOVST0 FLAG

MOV32 XARn, RaH    Move 32-bit Floating-Point Register Contents to XARn

Operands
    XARn    28x auxiliary register (XAR0 - XAR7)
    RaH     floating-point source register (R0H to R7H)

Opcode
    LSW: 1011 1111 loc32
    MSW: IIII IIII IIII IIII

Description
    Move the 32-bit value from the floating-point register RaH to the auxiliary register XARn.

    XARn = RaH

    No flags affected in floating-point unit.

Flags
    Flag      TF   ZI   NI   ZF   NF   LUF  LVF
    Modified  No   No   No   No   No   No   No

Pipeline
    While this is a single-cycle instruction, additional pipeline alignment is required when copying a floating-point register to a C28x register. If the move follows a single-cycle floating-point instruction, a single alignment cycle must be added. For example:

    MINF32 R0H,R1H         ; Single-cycle instruction
    NOP                    ; 1 alignment cycle
    MOV32  @ACC,R0H        ; Copy R0H to ACC
    NOP                    ; Any instruction

    If the move follows a 2 pipeline-cycle floating-point instruction, then two alignment cycles must be used.
    For example:

    ADDF32 R2H, R1H, R0H   ; 2 pipeline instruction (2p)
    NOP                    ; 1 cycle delay for ADDF32 to complete
                           ; <-- ADDF32 completes, R2H is valid
    NOP                    ; 1 alignment cycle
    MOV32  ACC, R2H        ; copy R2H into ACC, takes 1 cycle
                           ; <-- MOV32 completes, ACC is valid
    NOP                    ; Any instruction

Example
    MOVIZF32  R0H, #2.5    ; R0H = 2.5 = 0x40200000
    F32TOUI32 R0H, R0H
    NOP                    ; Delay for conversion instruction
                           ; <-- Conversion complete, R0H valid
    NOP                    ; Alignment cycle
    MOV32     XAR0, R0H    ; XAR0 = 2 = 0x00000002

See also
    MOV32 ACC, RaH
    MOV32 P, RaH
    MOV32 XT, RaH

MOV32 XT, RaH    Move 32-bit Floating-Point Register Contents to XT

Operands
    XT     temporary register
    RaH    floating-point source register (R0H to R7H)

Opcode
    LSW: 1011 1111 loc32
    MSW: IIII IIII IIII IIII

Description
    Move the 32-bit value in RaH to the temporary register XT.

    XT = RaH

    No flags affected in floating-point unit.

Flags
    Flag      TF   ZI   NI   ZF   NF   LUF  LVF
    Modified  No   No   No   No   No   No   No

Pipeline
    While this is a single-cycle instruction, additional pipeline alignment is required when copying a floating-point register to a C28x register. If the move follows a single-cycle floating-point instruction, a single alignment cycle must be added. For example:

    MINF32 R0H,R1H         ; Single-cycle instruction
    NOP                    ; 1 alignment cycle
    MOV32  @XT,R0H         ; Copy R0H to XT
    NOP                    ; Any instruction

    If the move follows a 2 pipeline-cycle floating-point instruction, then two alignment cycles must be used.
    For example:

    ADDF32 R2H, R1H, R0H   ; 2 pipeline instruction (2p)
    NOP                    ; 1 cycle delay for ADDF32 to complete
                           ; <-- ADDF32 completes, R2H is valid
    NOP                    ; 1 alignment cycle
    MOV32  XT, R2H         ; copy R2H into XT, takes 1 cycle
                           ; <-- MOV32 completes, XT is valid
    NOP                    ; Any instruction

Example
    MOVIZF32  R0H, #2.5    ; R0H = 2.5 = 0x40200000
    F32TOUI32 R0H, R0H
    NOP                    ; Delay for conversion instruction
                           ; <-- Conversion complete, R0H valid
    NOP                    ; Alignment cycle
    MOV32     XT, R0H      ; XT = 2 = 0x00000002

See also
    MOV32 ACC, RaH
    MOV32 P, RaH
    MOV32 XARn, RaH

MOVD32 RaH, mem32    Move 32-bit Value from Memory with Data Copy

Operands
    RaH      floating-point register (R0H to R7H)
    mem32    pointer to the 32-bit source memory location

Opcode
    LSW: 1110 0010 0010 0011
    MSW: 0000 0aaa mem32

Description
    Move the 32-bit value referenced by mem32 to the floating-point register indicated by RaH, and copy the same value to the next 32-bit memory location, mem32+2.

    RaH = [mem32]
    [mem32+2] = [mem32]

    This instruction modifies the following flags in the STF register:

Flags
    Flag      TF   ZI   NI   ZF   NF   LUF  LVF
    Modified  No   Yes  Yes  Yes  Yes  No   No

    NF = RaH[31];
    ZF = 0;
    if(RaH[30:23] == 0) { ZF = 1; NF = 0; }
    NI = RaH[31];
    ZI = 0;
    if(RaH[31:0] == 0) ZI = 1;

Pipeline
    This is a single-cycle instruction.
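The flag pseudo-code above tests the exponent field for ZF and NF but all 32 bits for ZI, which suggests that ZF and NF treat the loaded value as a float while NI and ZI treat it as an integer. The host-side C sketch below restates that pseudo-code so the cases can be checked by hand. The names `load_flags_t` and `flags_from_load` are hypothetical and exist only for this illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical container for the four flags a load can update. */
typedef struct {
    bool nf, zf, ni, zi;
} load_flags_t;

/* Mirrors the manual's pseudo-code for a value just loaded into RaH. */
load_flags_t flags_from_load(uint32_t rah)
{
    load_flags_t f;

    f.nf = ((rah >> 31) & 1u) != 0;        /* NF = RaH[31]            */
    f.zf = false;
    if (((rah >> 23) & 0xFFu) == 0) {      /* RaH[30:23] == 0:        */
        f.zf = true;                       /* exponent field is zero, */
        f.nf = false;                      /* value counts as zero    */
    }
    f.ni = ((rah >> 31) & 1u) != 0;        /* NI = RaH[31]            */
    f.zi = (rah == 0);                     /* ZI = 1 only if all 32 bits are 0 */
    return f;
}
```

For example, the bit pattern 0x80000000 yields ZF = 1 and NF = 0 (a zero exponent field), but NI = 1 and ZI = 0, because as an integer it is negative and nonzero.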
Example
    MOVW   DP, #0x02C0       ; DP = 0x02C0
    MOV    @2, #0x0000       ; [0x00B002] = 0x0000
    MOV    @3, #0x4110       ; [0x00B003] = 0x4110
    MOVD32 R7H, @2           ; R7H = 0x41100000,
                             ; [0x00B004] = 0x0000, [0x00B005] = 0x4110

See also
    MOV32 RaH, mem32 {,CNDF}

MOVF32 RaH, #32F    Load the 32-bits of a 32-bit Floating-Point Register

    This instruction is an alias for MOVIZ and MOVXI instructions. The second operand is translated by the assembler such that the instruction becomes:

    MOVIZ RaH, #16FHiHex
    MOVXI RaH, #16FLoHex

Operands
    RaH     floating-point destination register (R0H to R7H)
    #32F    immediate float value represented in floating-point representation

Opcode
    LSW: 1110 1000 0000 0III (opcode of MOVIZ RaH, #16FHiHex)
    MSW: IIII IIII IIII Iaaa
    LSW: 1110 1000 0000 1III (opcode of MOVXI RaH, #16FLoHex)
    MSW: IIII IIII IIII Iaaa

    Note: This instruction accepts the immediate operand only in floating-point representation. To specify the immediate value as a hex value (IEEE 32-bit floating-point format) use the MOVI32 RaH, #32FHex instruction.

Description
    Load the 32-bits of RaH with the immediate float value represented by #32F.

    #32F is a float value represented in floating-point representation. The assembler will only accept a float value represented in floating-point representation. That is, 3.0 can only be represented as #3.0. #0x40400000 will result in an error.

    RaH = #32F

    This instruction modifies the following flags in the STF register:

Flags
    Flag      TF   ZI   NI   ZF   NF   LUF  LVF
    Modified  No   No   No   No   No   No   No

Pipeline
    Depending on #32F, this instruction takes one or two cycles. If all of the lower 16-bits of the IEEE 32-bit floating-point format of #32F are zeros, then the assembler will convert MOVF32 into a single MOVIZ instruction.
    If the lower 16-bits of the IEEE 32-bit floating-point format of #32F are not zeros, then the assembler will convert MOVF32 into MOVIZ and MOVXI instructions.

Example
    MOVF32 R1H, #3.0         ; R1H = 3.0 (0x40400000)
                             ; Assembler converts this instruction as
                             ; MOVIZ R1H, #0x4040
    MOVF32 R2H, #0.0         ; R2H = 0.0 (0x00000000)
                             ; Assembler converts this instruction as
                             ; MOVIZ R2H, #0x0
    MOVF32 R3H, #12.265      ; R3H = 12.265 (0x41443D71)
                             ; Assembler converts this instruction as
                             ; MOVIZ R3H, #0x4144
                             ; MOVXI R3H, #0x3D71

See also
    MOVIZ RaH, #16FHiHex
    MOVXI RaH, #16FLoHex
    MOVI32 RaH, #32FHex
    MOVIZF32 RaH, #16FHi

MOVI32 RaH, #32FHex    Load the 32-bits of a 32-bit Floating-Point Register with the immediate

    This instruction is an alias for MOVIZ and MOVXI instructions. The second operand is translated by the assembler such that the instruction becomes:

    MOVIZ RaH, #16FHiHex
    MOVXI RaH, #16FLoHex

Operands
    RaH        floating-point register (R0H to R7H)
    #32FHex    A 32-bit immediate value that represents an IEEE 32-bit floating-point value.

Opcode
    LSW: 1110 1000 0000 0III (opcode of MOVIZ RaH, #16FHiHex)
    MSW: IIII IIII IIII Iaaa
    LSW: 1110 1000 0000 1III (opcode of MOVXI RaH, #16FLoHex)
    MSW: IIII IIII IIII Iaaa

    Note: This instruction only accepts a hex value as the immediate operand. To specify the immediate value with a floating-point representation use the MOVF32 RaH, #32F instruction.

Description
    Load the 32-bits of RaH with the immediate 32-bit hex value represented by #32FHex.

    #32FHex is a 32-bit immediate hex value that represents the IEEE 32-bit floating-point value of a floating-point number. The assembler will only accept a hex immediate value. That is, 3.0 can only be represented as #0x40400000. #3.0 will result in an error.
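The MOVF32 and MOVI32 aliases both come down to splitting one 32-bit IEEE pattern into an upper half for MOVIZ and, when it is nonzero, a lower half for MOVXI. The C sketch below shows that split on a host machine. It assumes the host `float` is IEEE 754 binary32, and the names `f32_bits`, `imm_split_t` and `split_imm32` are hypothetical helpers, not part of any TI tool.

```c
#include <stdint.h>
#include <string.h>

/* Reinterpret a host float as its 32-bit pattern.
 * Assumes the host float is IEEE 754 binary32. */
uint32_t f32_bits(float x)
{
    uint32_t u;
    memcpy(&u, &x, sizeof u);
    return u;
}

/* Hypothetical result of splitting a 32-bit immediate. */
typedef struct {
    uint16_t hi;          /* operand for MOVIZ (#16FHiHex) */
    uint16_t lo;          /* operand for MOVXI (#16FLoHex) */
    int      needs_movxi; /* 1 when the low half is nonzero */
} imm_split_t;

/* Split the pattern the way the Pipeline text describes: one
 * instruction if the low 16 bits are all zero, otherwise two. */
imm_split_t split_imm32(uint32_t bits)
{
    imm_split_t s;
    s.hi = (uint16_t)(bits >> 16);
    s.lo = (uint16_t)(bits & 0xFFFFu);
    s.needs_movxi = (s.lo != 0);
    return s;
}
```

Applied to the patterns in the examples, 0x40400000 (3.0) needs only the upper half 0x4040, while 0x41443D71 (12.265) splits into 0x4144 and 0x3D71 and therefore takes two instructions.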
    RaH = #32FHex

    This instruction modifies the following flags in the STF register:

Flags
    Flag      TF   ZI   NI   ZF   NF   LUF  LVF
    Modified  No   No   No   No   No   No   No

Pipeline
    Depending on #32FHex, this instruction takes one or two cycles. If all of the lower 16-bits of #32FHex are zeros, then the assembler will convert MOVI32 to the MOVIZ instruction. If the lower 16-bits of #32FHex are not zeros, then the assembler will convert MOVI32 to a MOVIZ and a MOVXI instruction.

Example
    MOVI32 R1H, #0x40400000  ; R1H = 0x40400000
                             ; Assembler converts this instruction as
                             ; MOVIZ R1H, #0x4040
    MOVI32 R2H, #0x00000000  ; R2H = 0x00000000
                             ; Assembler converts this instruction as
                             ; MOVIZ R2H, #0x0
    MOVI32 R3H, #0x40004001  ; R3H = 0x40004001
                             ; Assembler converts this instruction as
                             ; MOVIZ R3H, #0x4000
                             ; MOVXI R3H, #0x4001
    MOVI32 R4H, #0x00004040  ; R4H = 0x00004040
                             ; Assembler converts this instruction as
                             ; MOVIZ R4H, #0x0000
                             ; MOVXI R4H, #0x4040

See also
    MOVIZ RaH, #16FHiHex
    MOVXI RaH, #16FLoHex
    MOVF32 RaH, #32F
    MOVIZF32 RaH, #16FHi

MOVIZ RaH, #16FHiHex    Load the Upper 16-bits of a 32-bit Floating-Point Register

Operands
    RaH          floating-point register (R0H to R7H)
    #16FHiHex    A 16-bit immediate hex value that represents the upper 16-bits of an IEEE 32-bit floating-point value. The low 16-bits of the mantissa are assumed to be all 0.

Opcode
    LSW: 1110 1000 0000 0III
    MSW: IIII IIII IIII Iaaa

    Note: This instruction only accepts a hex value as the immediate operand. To specify the immediate value with a floating-point representation use the MOVIZF32 pseudo instruction.

Description
    Load the upper 16-bits of RaH with the immediate value #16FHiHex and clear the low 16-bits of RaH.
    #16FHiHex is a 16-bit immediate value that represents the upper 16-bits of an IEEE 32-bit floating-point value. The low 16-bits of the mantissa are assumed to be all 0. The assembler will only accept a hex immediate value. That is, -1.5 can only be represented as #0xBFC0. #-1.5 will result in an error.

    By itself, MOVIZ is useful for loading a floating-point register with a constant in which the lowest 16-bits of the mantissa are 0. Some examples are 2.0 (0x40000000), 4.0 (0x40800000), 0.5 (0x3F000000), and -1.5 (0xBFC00000). If a constant requires all 32-bits of a floating-point register to be initialized, then use MOVIZ along with the MOVXI instruction.

    RaH[31:16] = #16FHiHex
    RaH[15:0] = 0

    This instruction modifies the following flags in the STF register:

Flags
    Flag      TF   ZI   NI   ZF   NF   LUF  LVF
    Modified  No   No   No   No   No   No   No

Pipeline
    This is a single-cycle instruction.

Example
    ; Load R0H with -1.5 (0xBFC00000)
    MOVIZ R0H, #0xBFC0       ; R0H = 0xBFC00000

    ; Load R0H with pi = 3.141593 (0x40490FDB)
    MOVIZ R0H, #0x4049       ; R0H = 0x40490000
    MOVXI R0H, #0x0FDB       ; R0H = 0x40490FDB

See also
    MOVIZF32 RaH, #16FHi
    MOVXI RaH, #16FLoHex

MOVIZF32 RaH, #16FHi    Load the Upper 16-bits of a 32-bit Floating-Point Register

Operands
    RaH       floating-point register (R0H to R7H)
    #16FHi    A 16-bit immediate value that represents the upper 16-bits of an IEEE 32-bit floating-point value. The low 16-bits of the mantissa are assumed to be all 0.

Opcode
    LSW: 1110 1000 0000 0III
    MSW: IIII IIII IIII Iaaa

Description
    Load the upper 16-bits of RaH with the value represented by #16FHi and clear the low 16-bits of RaH.

    #16FHi is a 16-bit immediate value that represents the upper 16-bits of an IEEE 32-bit floating-point value.
    The low 16-bits of the mantissa are assumed to be all 0. This addressing mode is most useful for constants where the lowest 16-bits of the mantissa are 0. Some examples are 2.0 (0x40000000), 4.0 (0x40800000), 0.5 (0x3F000000), and -1.5 (0xBFC00000). #16FHi can be specified in hex or float. That is, -1.5 can be represented as #-1.5 or #0xBFC0.

    MOVIZF32 is an alias for the MOVIZ RaH, #16FHiHex instruction. In the case of MOVIZF32 the assembler will accept either a hex or float as the immediate value and encodes it into a MOVIZ instruction. For example, MOVIZF32 RaH, #-1.5 will be encoded as MOVIZ RaH, #0xBFC0.

    RaH[31:16] = #16FHi
    RaH[15:0] = 0

    This instruction modifies the following flags in the STF register:

Flags
    Flag      TF   ZI   NI   ZF   NF   LUF  LVF
    Modified  No   No   No   No   No   No   No

Pipeline
    This is a single-cycle instruction.

Example
    MOVIZF32 R0H, #3.0       ; R0H = 3.0 = 0x40400000
    MOVIZF32 R1H, #1.0       ; R1H = 1.0 = 0x3F800000
    MOVIZF32 R2H, #2.5       ; R2H = 2.5 = 0x40200000
    MOVIZF32 R3H, #-5.5      ; R3H = -5.5 = 0xC0B00000
    MOVIZF32 R4H, #0xC0B0    ; R4H = -5.5 = 0xC0B00000
    ;
    ; Load R5H with pi = 3.141593; only the upper 16 bits (0x4049) are kept
    ;
    MOVIZF32 R5H, #3.141593  ; R5H = 3.140625 (0x40490000)
    ;
    ; Load R0H with a more accurate pi = 3.141593 (0x40490FDB)
    ;
    MOVIZF32 R0H,#0x4049     ; R0H = 0x40490000
    MOVXI    R0H,#0x0FDB     ; R0H = 0x40490FDB

See also
    MOVIZ RaH, #16FHiHex
    MOVXI RaH, #16FLoHex

MOVST0 FLAG    Load Selected STF Flags into ST0

Operands
    FLAG    Selected flag

Opcode
    LSW: 1010 1101 FFFF FFFF

Description
    Load selected flags from the STF register into the ST0 register of the 28x CPU where FLAG is one or more of TF, CI, ZI, ZF, NI, NF, LUF or LVF. The specified flag maps to the ST0 register as follows:

    • Set OV = 1 if LVF or LUF is set. Otherwise clear OV.
    • Set N = 1 if NF or NI is set. Otherwise clear N.
    • Set Z = 1 if ZF or ZI is set. Otherwise clear Z.
    • Set C = 1 if TF is set. Otherwise clear C.
    • Set TC = 1 if TF is set. Otherwise clear TC.

    If any STF flag is not specified, then the corresponding ST0 register bit is not modified.

Restrictions
    Do not use the MOVST0 instruction in the delay slots for pipelined operations. Doing so can yield invalid results. To avoid this, the proper number of NOPs or non-pipelined instructions must be inserted before the MOVST0 operation.

    ; The following is INVALID
    MPYF32 R2H, R1H, R0H     ; 2 pipeline-cycle instruction (2p)
    MOVST0 TF                ; INVALID, do not use MOVST0 in a delay slot

    ; The following is VALID
    MPYF32 R2H, R1H, R0H     ; 2 pipeline-cycle instruction (2p)
    NOP                      ; 1 delay cycle, R2H updated after this instruction
    MOVST0 TF                ; VALID

    This instruction modifies the following flags in the STF register:

Flags
    Flag      TF   ZI   NI   ZF   NF   LUF  LVF
    Modified  No   No   No   No   No   Yes  Yes

    When the flags are moved to the C28x ST0 register, the LUF or LVF flags are automatically cleared if selected.

Pipeline
    This is a single-cycle instruction.

Example
    Program flow is controlled by C28x instructions that read status flags in status register 0 (ST0). If a decision needs to be made based on a floating-point operation, the information in the STF register needs to be loaded into the ST0 flags (Z, N, OV, TC, C) so that the appropriate conditional branch instruction can be executed. The MOVST0 FLAG instruction is used to load the current value of the specified STF flags into the respective bits of ST0. When this instruction executes, it will also clear the latched overflow and underflow flags if those flags are specified.
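The mapping from STF flags to ST0 bits can be sketched as a host-side C model. This is one reading of the description and carries assumptions: the mask values below are arbitrary constants for the sketch, not hardware bit positions; an ST0 bit is taken as the OR of its selected source flags; and an ST0 bit with no selected source flag is left unchanged. All names (`movst0_model`, `movst0_result_t`, the `F_` and `S_` constants) are hypothetical.

```c
#include <stdint.h>

/* Arbitrary masks for this sketch only; NOT the hardware bit positions. */
enum { F_LVF = 1 << 0, F_LUF = 1 << 1, F_NF = 1 << 2, F_ZF = 1 << 3,
       F_NI  = 1 << 4, F_ZI  = 1 << 5, F_TF = 1 << 6 };
enum { S_OV = 1 << 0, S_N = 1 << 1, S_Z = 1 << 2, S_C = 1 << 3, S_TC = 1 << 4 };

typedef struct {
    uint32_t st0;   /* ST0 bits after the move */
    uint32_t stf;   /* STF flags after the move */
} movst0_result_t;

/* Update one ST0 bit from the selected subset of its source flags. */
static uint32_t merge_bit(uint32_t st0, uint32_t bit,
                          uint32_t stf, uint32_t sel, uint32_t sources)
{
    uint32_t chosen = sel & sources;
    if (chosen == 0)
        return st0;                       /* nothing selected: bit unchanged */
    return (stf & chosen) ? (st0 | bit) : (st0 & ~bit);
}

/* Model of MOVST0: sel is the set of STF flags named in the operand. */
movst0_result_t movst0_model(uint32_t st0, uint32_t stf, uint32_t sel)
{
    movst0_result_t r;
    st0 = merge_bit(st0, S_OV, stf, sel, F_LVF | F_LUF);
    st0 = merge_bit(st0, S_N,  stf, sel, F_NF | F_NI);
    st0 = merge_bit(st0, S_Z,  stf, sel, F_ZF | F_ZI);
    st0 = merge_bit(st0, S_C,  stf, sel, F_TF);
    st0 = merge_bit(st0, S_TC, stf, sel, F_TF);
    r.st0 = st0;
    r.stf = stf & ~(sel & (uint32_t)(F_LVF | F_LUF)); /* latched flags clear if selected */
    return r;
}
```

In this model, selecting ZF and NF after a compare that left NF = 1 sets N, clears Z, and leaves OV, C and TC as they were, which is the state a following conditional branch would test.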
    Loop:
        MOV32  R0H,*XAR4++
        MOV32  R1H,*XAR3++
        CMPF32 R1H, R0H
        MOVST0 ZF, NF
        BF     Loop, GT      ; Loop if (R1H > R0H)

See also
    MOV32 mem32, STF
    MOV32 STF, mem32

MOVXI RaH, #16FLoHex    Move Immediate to the Low 16-bits of a Floating-Point Register

Operands
    RaH          floating-point register (R0H to R7H)
    #16FLoHex    A 16-bit immediate hex value that represents the lower 16-bits of an IEEE 32-bit floating-point value. The upper 16-bits will not be modified.

Opcode
    LSW: 1110 1000 0000 1III
    MSW: IIII IIII IIII Iaaa

Description
    Load the low 16-bits of RaH with the immediate value #16FLoHex. #16FLoHex represents the lower 16-bits of an IEEE 32-bit floating-point value. The upper 16-bits of RaH will not be modified. MOVXI can be combined with the MOVIZ or MOVIZF32 instruction to initialize all 32-bits of a RaH register.

    RaH[15:0] = #16FLoHex
    RaH[31:16] = Unchanged

Flags
    Flag      TF   ZI   NI   ZF   NF   LUF  LVF
    Modified  No   No   No   No   No   No   No

Pipeline
    This is a single-cycle instruction.

Example
    ; Load R0H with pi = 3.141593 (0x40490FDB)
    MOVIZ R0H,#0x4049        ; R0H = 0x40490000
    MOVXI R0H,#0x0FDB        ; R0H = 0x40490FDB

See also
    MOVIZ RaH, #16FHiHex
    MOVIZF32 RaH, #16FHi

MPYF32 RaH, RbH, RcH    32-bit Floating-Point Multiply

Operands
    RaH    floating-point destination register (R0H to R7H)
    RbH    floating-point source register (R0H to R7H)
    RcH    floating-point source register (R0H to R7H)

Opcode
    LSW: 1110 0111 0000 0000
    MSW: 0000 000c ccbb baaa

    Multiply the contents of two floating-point registers.
Description
    RaH = RbH * RcH

Flags
This instruction modifies the following flags in the STF register:

    Flag      TF   ZI   NI   ZF   NF   LUF  LVF
    Modified  No   No   No   No   No   Yes  Yes

The STF register flags are modified as follows:
• LUF = 1 if MPYF32 generates an underflow condition.
• LVF = 1 if MPYF32 generates an overflow condition.

Pipeline
This is a 2 pipeline cycle (2p) instruction. That is:

    MPYF32 RaH, RbH, RcH    ; 2 pipeline cycles (2p)
    NOP                     ; 1 cycle delay or non-conflicting instruction
                            ; <-- MPYF32 completes, RaH updated
    NOP

Any instruction in the delay slot must not use RaH as a destination register or as a source operand.

Example
Calculate Y = A * B:

    MOVL   XAR4, #A
    MOV32  R0H, *XAR4       ; Load R0H with A
    MOVL   XAR4, #B
    MOV32  R1H, *XAR4       ; Load R1H with B
    MPYF32 R0H,R1H,R0H      ; Multiply A * B
    MOVL   XAR4, #Y
                            ; <-- MPYF32 complete
    MOV32  *XAR4,R0H        ; Save the result

See also
    MPYF32 RaH, #16FHi, RbH
    MPYF32 RaH, RbH, RcH || ADDF32 RdH, ReH, RfH
    MPYF32 RdH, ReH, RfH || MOV32 RaH, mem32
    MPYF32 RdH, ReH, RfH || MOV32 mem32, RaH
    MPYF32 RaH, RbH, RcH || SUBF32 RdH, ReH, RfH
    MACF32 R3H, R2H, RdH, ReH, RfH || MOV32 RaH, mem32

MPYF32 RaH, #16FHi, RbH    32-bit Floating-Point Multiply

Operands
    RaH      floating-point destination register (R0H to R7H)
    #16FHi   A 16-bit immediate value that represents the upper 16-bits of an IEEE 32-bit floating-point value. The low 16-bits of the mantissa are assumed to be all 0.
    RbH      floating-point source register (R0H to R7H)

Opcode
    LSW: 1110 1000 01II IIII
    MSW: IIII IIII IIbb baaa

Description
Multiply RbH with the floating-point value represented by the immediate operand. Store the result of the multiplication in RaH. #16FHi is a 16-bit immediate value that represents the upper 16-bits of an IEEE 32-bit floating-point value.
The low 16-bits of the mantissa are assumed to be all 0. #16FHi is most useful for representing constants where the lowest 16-bits of the mantissa are 0. Some examples are 2.0 (0x40000000), 4.0 (0x40800000), 0.5 (0x3F000000), and -1.5 (0xBFC00000). The assembler will accept either a hex or float as the immediate value. That is, the value -1.5 can be represented as #-1.5 or #0xBFC0.

    RaH = RbH * #16FHi:0

This instruction can also be written as MPYF32 RaH, RbH, #16FHi.

Flags
This instruction modifies the following flags in the STF register:

    Flag      TF   ZI   NI   ZF   NF   LUF  LVF
    Modified  No   No   No   No   No   Yes  Yes

The STF register flags are modified as follows:
• LUF = 1 if MPYF32 generates an underflow condition.
• LVF = 1 if MPYF32 generates an overflow condition.

Pipeline
This is a 2 pipeline cycle (2p) instruction. That is:

    MPYF32 RaH, #16FHi, RbH    ; 2 pipeline cycles (2p)
    NOP                        ; 1 cycle delay or non-conflicting instruction
                               ; <-- MPYF32 completes, RaH updated
    NOP

Any instruction in the delay slot must not use RaH as a destination register or as a source operand.
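The relationship between a constant and its #16FHi encoding can be checked offline. The following Python sketch is illustrative only: the helper names (float_to_bits, to_16fhi, fits_16fhi, from_16fhi) are hypothetical and are not part of any TI tool. It packs a value as an IEEE 754 single-precision number, takes the upper 16 bits as the immediate, and reports whether the lower 16 bits are zero, that is, whether the constant is represented exactly by #16FHi.

```python
import struct

def float_to_bits(value):
    """Return the IEEE 754 single-precision bit pattern of value as an int."""
    return struct.unpack(">I", struct.pack(">f", value))[0]

def to_16fhi(value):
    """Upper 16 bits of the 32-bit pattern, i.e. the #16FHi immediate."""
    return float_to_bits(value) >> 16

def fits_16fhi(value):
    """True if the low 16 bits are zero, so #16FHi represents value exactly."""
    return (float_to_bits(value) & 0xFFFF) == 0

def from_16fhi(imm16):
    """Value seen by the multiplier: #16FHi:0 (low 16 bits forced to 0)."""
    bits = (imm16 & 0xFFFF) << 16
    return struct.unpack(">f", struct.pack(">I", bits))[0]

if __name__ == "__main__":
    for constant in (2.0, 4.0, 0.5, -1.5, 3.0, 0.1):
        print(constant, hex(to_16fhi(constant)), fits_16fhi(constant))
```

A constant such as 0.1 does not pass fits_16fhi; in that case the immediate form would multiply by a truncated value, and loading the full constant into a register first is the safer choice.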
Example 1
    MOVIZF32 R3H, #2.0           ; R3H = 2.0 (0x40000000)
    MPYF32   R4H, #3.0, R3H      ; R4H = 3.0 * R3H
    MOVL     XAR1, #0xB006       ; <-- Non conflicting instruction
                                 ; <-- MPYF32 complete, R4H = 6.0 (0x40C00000)
    MOV32    *XAR1, R4H          ; Save the result in memory location 0xB006

Example 2
    ; Same as above example but #16FHi is represented in Hex
    MOVIZF32 R3H, #2.0           ; R3H = 2.0 (0x40000000)
    MPYF32   R4H, #0x4040, R3H   ; R4H = 0x4040 * R3H
                                 ; 3.0 is represented as 0x40400000 in
                                 ; IEEE 754 32-bit format
    MOVL     XAR1, #0xB006       ; <-- Non conflicting instruction
                                 ; <-- MPYF32 complete, R4H = 6.0 (0x40C00000)
    MOV32    *XAR1, R4H          ; Save the result in memory location 0xB006

See also
    MPYF32 RaH, RbH, #16FHi
    MPYF32 RaH, RbH, RcH
    MPYF32 RaH, RbH, RcH || ADDF32 RdH, ReH, RfH

MPYF32 RaH, RbH, #16FHi    32-bit Floating-Point Multiply

Operands
    RaH      floating-point destination register (R0H to R7H)
    RbH      floating-point source register (R0H to R7H)
    #16FHi   A 16-bit immediate value that represents the upper 16-bits of an IEEE 32-bit floating-point value. The low 16-bits of the mantissa are assumed to be all 0.

Opcode
    LSW: 1110 1000 01II IIII
    MSW: IIII IIII IIbb baaa

Description
Multiply RbH with the floating-point value represented by the immediate operand. Store the result of the multiplication in RaH. #16FHi is a 16-bit immediate value that represents the upper 16-bits of an IEEE 32-bit floating-point value. The low 16-bits of the mantissa are assumed to be all 0. #16FHi is most useful for representing constants where the lowest 16-bits of the mantissa are 0.
Some examples are 2.0 (0x40000000), 4.0 (0x40800000), 0.5 (0x3F000000), and -1.5 (0xBFC00000). The assembler will accept either a hex or float as the immediate value. That is, the value -1.5 can be represented as #-1.5 or #0xBFC0.

    RaH = RbH * #16FHi:0

This instruction can also be written as MPYF32 RaH, #16FHi, RbH.

Flags
This instruction modifies the following flags in the STF register:

    Flag      TF   ZI   NI   ZF   NF   LUF  LVF
    Modified  No   No   No   No   No   Yes  Yes

The STF register flags are modified as follows:
• LUF = 1 if MPYF32 generates an underflow condition.
• LVF = 1 if MPYF32 generates an overflow condition.

Pipeline
This is a 2 pipeline cycle (2p) instruction. That is:

    MPYF32 RaH, RbH, #16FHi    ; 2 pipeline cycles (2p)
    NOP                        ; 1 cycle delay or non-conflicting instruction
                               ; <-- MPYF32 completes, RaH updated
    NOP

Any instruction in the delay slot must not use RaH as a destination register or as a source operand.

Example 1
    MOVIZF32 R3H, #2.0           ; R3H = 2.0 (0x40000000)
    MPYF32   R4H, R3H, #3.0      ; R4H = R3H * 3.0
    MOVL     XAR1, #0xB008       ; <-- Non conflicting instruction
                                 ; <-- MPYF32 complete, R4H = 6.0 (0x40C00000)
    MOV32    *XAR1, R4H          ; Save the result in memory location 0xB008

Example 2
    ; Same as above example but #16FHi is represented in Hex
    MOVIZF32 R3H, #2.0           ; R3H = 2.0 (0x40000000)
    MPYF32   R4H, R3H, #0x4040   ; R4H = R3H * 0x4040
                                 ; 3.0 is represented as 0x40400000 in
                                 ; IEEE 754 32-bit format
    MOVL     XAR1, #0xB008       ; <-- Non conflicting instruction
                                 ; <-- MPYF32 complete, R4H = 6.0 (0x40C00000)
    MOV32    *XAR1, R4H          ; Save the result in memory location 0xB008

See also
    MPYF32 RaH, #16FHi, RbH
    MPYF32 RaH, RbH, RcH

MPYF32
RaH, RbH, RcH || ADDF32 RdH, ReH, RfH    32-bit Floating-Point Multiply with Parallel Add

Operands
    RaH    floating-point destination register for MPYF32 (R0H to R7H). RaH cannot be the same register as RdH.
    RbH    floating-point source register for MPYF32 (R0H to R7H)
    RcH    floating-point source register for MPYF32 (R0H to R7H)
    RdH    floating-point destination register for ADDF32 (R0H to R7H). RdH cannot be the same register as RaH.
    ReH    floating-point source register for ADDF32 (R0H to R7H)
    RfH    floating-point source register for ADDF32 (R0H to R7H)

Opcode
    LSW: 1110 0111 0100 00ff
    MSW: feee dddc ccbb baaa

Description
Multiply the contents of two floating-point registers with parallel addition of two registers.

    RaH = RbH * RcH
    RdH = ReH + RfH

This instruction can also be written as:

    MACF32 RaH, RbH, RcH, RdH, ReH, RfH

Restrictions
The destination register for the MPYF32 and the ADDF32 must be unique. That is, RaH cannot be the same register as RdH.

Flags
This instruction modifies the following flags in the STF register:

    Flag      TF   ZI   NI   ZF   NF   LUF  LVF
    Modified  No   No   No   No   No   Yes  Yes

The STF register flags are modified as follows:
• LUF = 1 if MPYF32 or ADDF32 generates an underflow condition.
• LVF = 1 if MPYF32 or ADDF32 generates an overflow condition.

Pipeline
Both MPYF32 and ADDF32 take 2 pipeline cycles (2p). That is:

       MPYF32 RaH, RbH, RcH    ; 2 pipeline cycles (2p)
    || ADDF32 RdH, ReH, RfH    ; 2 pipeline cycles (2p)
       NOP                     ; 1 cycle delay or non-conflicting instruction
                               ; <-- MPYF32, ADDF32 complete, RaH, RdH updated
       NOP

Any instruction in the delay slot must not use RaH or RdH as a destination register or as a source operand.
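The delay-slot rule above lends itself to a mechanical check. The Python sketch below is a hypothetical lint pass, not a TI tool: the Instr record and find_delay_slot_hazards function are names invented for this illustration, and register usage is supplied by hand rather than parsed from assembly. It assumes every destination of an instruction shares the same latency, which holds for MPYF32 || ADDF32 but not for forms that pair a 2p operation with a single-cycle MOV32.

```python
from typing import List, NamedTuple, Tuple

class Instr(NamedTuple):
    text: str                # assembly text, used only for reporting
    dests: Tuple[str, ...]   # registers written by the instruction
    srcs: Tuple[str, ...]    # registers read by the instruction
    cycles: int = 1          # 1 = single cycle, 2 = 2p instruction

def find_delay_slot_hazards(program: List[Instr]) -> List[str]:
    """Report instructions that touch a pending result inside a delay slot."""
    hazards = []
    for index, producer in enumerate(program):
        delay_slots = producer.cycles - 1
        for later in program[index + 1 : index + 1 + delay_slots]:
            clash = (set(later.dests) | set(later.srcs)) & set(producer.dests)
            if clash:
                hazards.append(
                    f"'{later.text}' uses {sorted(clash)} "
                    f"in a delay slot of '{producer.text}'"
                )
    return hazards

# A parallel pair is modelled as one record with two destinations.
PAIR = Instr("MPYF32 R2H, R0H, R1H || ADDF32 R3H, R3H, R2H",
             dests=("R2H", "R3H"), srcs=("R0H", "R1H", "R2H", "R3H"), cycles=2)
NOP = Instr("NOP", dests=(), srcs=())
FINAL_ADD = Instr("ADDF32 R3H, R3H, R2H",
                  dests=("R3H",), srcs=("R2H", "R3H"), cycles=2)

if __name__ == "__main__":
    print(find_delay_slot_hazards([PAIR, NOP, FINAL_ADD]))  # no hazards
    print(find_delay_slot_hazards([PAIR, FINAL_ADD]))       # one hazard
```

Treating the parallel pair as a single record keeps the check simple; a fuller model would track latency per destination register.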
Example
    ; Perform 5 multiply and accumulate operations:
    ;
    ; 1st multiply: A = X0 * Y0
    ; 2nd multiply: B = X1 * Y1
    ; 3rd multiply: C = X2 * Y2
    ; 4th multiply: D = X3 * Y3
    ; 5th multiply: E = X4 * Y4
    ;
    ; Result = A + B + C + D + E

       MOV32  R0H, *XAR4++               ; R0H = X0
       MOV32  R1H, *XAR5++               ; R1H = Y0

                                         ; R2H = A = X0 * Y0
       MPYF32 R2H, R0H, R1H              ; In parallel R0H = X1
    || MOV32  R0H, *XAR4++
       MOV32  R1H, *XAR5++               ; R1H = Y1

                                         ; R3H = B = X1 * Y1
       MPYF32 R3H, R0H, R1H              ; In parallel R0H = X2
    || MOV32  R0H, *XAR4++
       MOV32  R1H, *XAR5++               ; R1H = Y2

                                         ; R3H = A + B
                                         ; R2H = C = X2 * Y2
       MACF32 R3H, R2H, R2H, R0H, R1H    ; In parallel R0H = X3
    || MOV32  R0H, *XAR4++
       MOV32  R1H, *XAR5++               ; R1H = Y3

                                         ; R3H = (A + B) + C
                                         ; R2H = D = X3 * Y3
       MACF32 R3H, R2H, R2H, R0H, R1H    ; In parallel R0H = X4
    || MOV32  R0H, *XAR4
       MOV32  R1H, *XAR5                 ; R1H = Y4

                                         ; R2H = E = X4 * Y4
       MPYF32 R2H, R0H, R1H              ; in parallel R3H = (A + B + C) + D
    || ADDF32 R3H, R3H, R2H
       NOP                               ; Wait for MPYF32 || ADDF32 to complete

       ADDF32 R3H, R3H, R2H              ; R3H = (A + B + C + D) + E
       NOP                               ; Wait for ADDF32 to complete
       MOV32  @Result, R3H               ; Store the result

See also
    MACF32 R3H, R2H, RdH, ReH, RfH
    MACF32 R3H, R2H, RdH, ReH, RfH || MOV32 RaH, mem32
    MACF32 R7H, R3H, mem32, *XAR7++
    MACF32 R7H, R6H, RdH, ReH, RfH
    MACF32 R7H, R6H, RdH, ReH, RfH || MOV32 RaH, mem32

MPYF32 RdH, ReH, RfH || MOV32 RaH, mem32    32-bit Floating-Point Multiply with Parallel Move

Operands
    RdH    floating-point destination register for the MPYF32 (R0H to R7H). RdH cannot be the same register
as RaH.
    ReH    floating-point source register for the MPYF32 (R0H to R7H)
    RfH    floating-point source register for the MPYF32 (R0H to R7H)
    RaH    floating-point destination register for the MOV32 (R0H to R7H). RaH cannot be the same register as RdH.
    mem32  pointer to a 32-bit memory location. This will be the source of the MOV32.

Opcode
    LSW: 1110 0011 0000 fffe
    MSW: eedd daaa mem32

Description
Multiply the contents of two floating-point registers and load another.

    RdH = ReH * RfH
    RaH = [mem32]

Restrictions
The destination register for the MPYF32 and the MOV32 must be unique. That is, RaH cannot be the same register as RdH.

Flags
This instruction modifies the following flags in the STF register:

    Flag      TF   ZI   NI   ZF   NF   LUF  LVF
    Modified  No   Yes  Yes  Yes  Yes  Yes  Yes

The STF register flags are modified as follows:
• LUF = 1 if MPYF32 generates an underflow condition.
• LVF = 1 if MPYF32 generates an overflow condition.

The MOV32 instruction will set the NF, ZF, NI and ZI flags as follows:

    NF = RaH(31);
    ZF = 0;
    if(RaH(30:23) == 0) { ZF = 1; NF = 0; }
    NI = RaH(31);
    ZI = 0;
    if(RaH(31:0) == 0) ZI = 1;

Pipeline
MPYF32 takes 2 pipeline-cycles (2p) and MOV32 takes a single cycle. That is:

       MPYF32 RdH, ReH, RfH    ; 2 pipeline cycles (2p)
    || MOV32  RaH, mem32       ; 1 cycle
                               ; <-- MOV32 completes, RaH updated
       NOP                     ; 1 cycle delay or non-conflicting instruction
                               ; <-- MPYF32 completes, RdH updated
       NOP

Any instruction in the delay slot must not use RdH as a destination register or as a source operand.

Example
Calculate Y1 = M1*X1 + B1. This example assumes that M1, X1, B1 and Y1 are all on the same data page.
       MOVW   DP, #M1          ; Load the data page
       MOV32  R0H,@M1          ; Load R0H with M1
       MOV32  R1H,@X1          ; Load R1H with X1
       MPYF32 R1H,R1H,R0H      ; Multiply M1*X1
    || MOV32  R0H,@B1          ; and in parallel load R0H with B1
                               ; <-- MOV32 complete
       NOP                     ; Wait 1 cycle for MPYF32 to complete
                               ; <-- MPYF32 complete
       ADDF32 R1H,R1H,R0H      ; Add M1*X1 to B1 and store in R1H
       NOP                     ; Wait 1 cycle for ADDF32 to complete
                               ; <-- ADDF32 complete
       MOV32  @Y1,R1H          ; Store the result

Calculate Y = (A * B) * C:

       MOVL   XAR4, #A
       MOV32  R0H, *XAR4       ; Load R0H with A
       MOVL   XAR4, #B
       MOV32  R1H, *XAR4       ; Load R1H with B
       MOVL   XAR4, #C
       MPYF32 R1H,R1H,R0H      ; Calculate R1H = A * B
    || MOV32  R0H, *XAR4       ; and in parallel load R0H with C
                               ; <-- MOV32 complete
       MOVL   XAR4, #Y
                               ; <-- MPYF32 complete
       MPYF32 R2H,R1H,R0H      ; Calculate Y = (A * B) * C
       NOP                     ; Wait 1 cycle for MPYF32 to complete
                               ; <-- MPYF32 complete
       MOV32  *XAR4,R2H

See also
    MPYF32 RdH, ReH, RfH || MOV32 mem32, RaH
    MACF32 R3H, R2H, RdH, ReH, RfH || MOV32 RaH, mem32
    MACF32 R7H, R6H, RdH, ReH, RfH || MOV32 RaH, mem32
    MACF32 R7H, R3H, mem32, *XAR7++

MPYF32 RdH, ReH, RfH || MOV32 mem32, RaH    32-bit Floating-Point Multiply with Parallel Move

Operands
    RdH    floating-point destination register for the MPYF32 (R0H to R7H)
    ReH    floating-point source register for the MPYF32 (R0H to R7H)
    RfH    floating-point source register for the MPYF32 (R0H to R7H)
    mem32  pointer to a 32-bit memory location. This will be the destination of the MOV32.
    RaH    floating-point source register for the MOV32 (R0H to R7H)

Opcode
    LSW: 1110 0000 0000 fffe
    MSW: eedd daaa mem32

Description
Multiply the contents of two floating-point registers and move from register to memory.
    RdH = ReH * RfH, [mem32] = RaH

Flags
This instruction modifies the following flags in the STF register:

    Flag      TF   ZI   NI   ZF   NF   LUF  LVF
    Modified  No   No   No   No   No   Yes  Yes

The STF register flags are modified as follows:
• LUF = 1 if MPYF32 generates an underflow condition.
• LVF = 1 if MPYF32 generates an overflow condition.

Pipeline
MPYF32 takes 2 pipeline-cycles (2p) and MOV32 takes a single cycle. That is:

       MPYF32 RdH, ReH, RfH    ; 2 pipeline cycles (2p)
    || MOV32  mem32, RaH       ; 1 cycle
                               ; <-- MOV32 completes, mem32 updated
       NOP                     ; 1 cycle delay or non-conflicting instruction
                               ; <-- MPYF32 completes, RdH updated
       NOP

Any instruction in the delay slot must not use RdH as a destination register or as a source operand.

Example
       MOVL     XAR1, #0xC003     ; XAR1 = 0xC003
       MOVIZF32 R3H, #2.0         ; R3H = 2.0 (0x40000000)
       MPYF32   R3H, R3H, #5.0    ; R3H = R3H * 5.0
       MOVIZF32 R1H, #5.0         ; R1H = 5.0 (0x40A00000)
                                  ; <-- MPYF32 complete, R3H = 10.0 (0x41200000)
       MPYF32   R3H, R1H, R3H     ; R3H = R1H * R3H
    || MOV32    *XAR1, R3H        ; and in parallel store previous R3H value
                                  ; MOV32 complete, [0xC003] = 0x4120,
                                  ;                 [0xC002] = 0x0000
       NOP                        ; 1 cycle delay for MPYF32 to complete
                                  ; <-- MPYF32 complete, R3H = 50.0 (0x42480000)

See also
    MPYF32 RdH, ReH, RfH || MOV32 RaH, mem32
    MACF32 R3H, R2H, RdH, ReH, RfH || MOV32 RaH, mem32
    MACF32 R7H, R6H, RdH, ReH, RfH || MOV32 RaH, mem32
    MACF32 R7H, R3H, mem32, *XAR7++

MPYF32 RaH, RbH, RcH || SUBF32 RdH, ReH, RfH    32-bit Floating-Point Multiply with Parallel Subtract

Operands
    RaH    floating-point destination register for MPYF32 (R0H to R7H). RaH cannot be the same register as RdH.
    RbH    floating-point source register for MPYF32 (R0H to R7H)
    RcH    floating-point source register for MPYF32 (R0H to R7H)
    RdH    floating-point destination register for SUBF32
(R0H to R7H). RdH cannot be the same register as RaH.
    ReH    floating-point source register for SUBF32 (R0H to R7H)
    RfH    floating-point source register for SUBF32 (R0H to R7H)

Opcode
    LSW: 1110 0111 0101 00ff
    MSW: feee dddc ccbb baaa

Description
Multiply the contents of two floating-point registers with parallel subtraction of two registers.

    RaH = RbH * RcH,
    RdH = ReH - RfH

Restrictions
The destination register for the MPYF32 and the SUBF32 must be unique. That is, RaH cannot be the same register as RdH.

Flags
This instruction modifies the following flags in the STF register:

    Flag      TF   ZI   NI   ZF   NF   LUF  LVF
    Modified  No   No   No   No   No   Yes  Yes

The STF register flags are modified as follows:
• LUF = 1 if MPYF32 or SUBF32 generates an underflow condition.
• LVF = 1 if MPYF32 or SUBF32 generates an overflow condition.

Pipeline
MPYF32 and SUBF32 both take 2 pipeline-cycles (2p). That is:

       MPYF32 RaH, RbH, RcH    ; 2 pipeline cycles (2p)
    || SUBF32 RdH, ReH, RfH    ; 2 pipeline cycles (2p)
       NOP                     ; 1 cycle delay or non-conflicting instruction
                               ; <-- MPYF32, SUBF32 complete. RaH, RdH updated
       NOP

Any instruction in the delay slot must not use RaH or RdH as a destination register or as a source operand.
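The register-level behaviour of the parallel pair can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not a description of the hardware: the function name mpyf32_par_subf32 and the dictionary register file are hypothetical, ordinary Python floats stand in for single-precision values, and the LUF/LVF flags and pipeline timing are not modelled. It shows that both operations read their sources before either destination is written, and that identical destinations are rejected.

```python
def mpyf32_par_subf32(regs, ra, rb, rc, rd, re, rf):
    """Model RaH = RbH * RcH in parallel with RdH = ReH - RfH.

    regs maps register names to values; a new mapping is returned and
    the input mapping is left unchanged.
    """
    if ra == rd:
        raise ValueError("RaH and RdH must be different registers")
    # Both results are computed from the register values before the instruction.
    product = regs[rb] * regs[rc]
    difference = regs[re] - regs[rf]
    updated = dict(regs)
    updated[ra] = product
    updated[rd] = difference
    return updated

if __name__ == "__main__":
    before = {"R4H": 5.0, "R5H": 3.0, "R6H": 0.0, "R7H": 0.0}
    after = mpyf32_par_subf32(before, "R6H", "R4H", "R5H", "R7H", "R4H", "R5H")
    print(after["R6H"], after["R7H"])
```

Because the sources are sampled first, a destination may also appear as a source of the other operation without changing the result of that operation.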
Example
       MOVIZF32 R4H, #5.0         ; R4H = 5.0 (0x40A00000)
       MOVIZF32 R5H, #3.0         ; R5H = 3.0 (0x40400000)
       MPYF32   R6H, R4H, R5H     ; R6H = R4H * R5H
    || SUBF32   R7H, R4H, R5H     ; R7H = R4H - R5H
       NOP                        ; 1 cycle delay for MPYF32 || SUBF32 to complete
                                  ; <-- MPYF32 || SUBF32 complete,
                                  ; R6H = 15.0 (0x41700000), R7H = 2.0 (0x40000000)

See also
    SUBF32 RaH, RbH, RcH
    SUBF32 RdH, ReH, RfH || MOV32 RaH, mem32
    SUBF32 RdH, ReH, RfH || MOV32 mem32, RaH

NEGF32 RaH, RbH{, CNDF}    Conditional Negation

Operands
    RaH    floating-point destination register (R0H to R7H)
    RbH    floating-point source register (R0H to R7H)
    CNDF   condition tested

Opcode
    LSW: 1110 0110 1010 CNDF
    MSW: 0000 0000 00bb baaa

Description
    if (CNDF == true) {RaH = - RbH }
    else {RaH = RbH }

CNDF is one of the following conditions:

    Encode (1)  CNDF      Description                            STF Flags Tested
    0000        NEQ       Not equal to zero                      ZF == 0
    0001        EQ        Equal to zero                          ZF == 1
    0010        GT        Greater than zero                      ZF == 0 AND NF == 0
    0011        GEQ       Greater than or equal to zero          NF == 0
    0100        LT        Less than zero                         NF == 1
    0101        LEQ       Less than or equal to zero             ZF == 1 OR NF == 1
    1010        TF        Test flag set                          TF == 1
    1011        NTF       Test flag not set                      TF == 0
    1100        LU        Latched underflow                      LUF == 1
    1101        LV        Latched overflow                       LVF == 1
    1110        UNC       Unconditional                          None
    1111        UNCF (2)  Unconditional with flag modification   None

    (1) Values not shown are reserved.
    (2) This is the default operation if no CNDF field is specified. This condition will allow the ZF, NF, ZI, and NI flags to be modified when a conditional operation is executed. All other conditions will not modify these flags.

Flags
This instruction modifies the following flags in the STF register:

    Flag      TF   ZI   NI   ZF   NF   LUF  LVF
    Modified  No   No   No   Yes  Yes  No   No

Pipeline
This is a single-cycle instruction.
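The condition table can be expressed as a small lookup, which makes the conditional negation easy to experiment with. The Python sketch below is illustrative only. The names CNDF_TESTS, cmpf32_flags and negf32 are hypothetical; cmpf32_flags assumes a compare sets ZF on equality and NF when the first operand is smaller; and LEQ is treated as ZF == 1 or NF == 1, the usual less-than-or-equal test, which is an assumption of this sketch. Flag updates performed by NEGF32 itself are not modelled.

```python
# Each CNDF mnemonic maps to a predicate over a dict of STF flag values.
CNDF_TESTS = {
    "NEQ":  lambda f: f["ZF"] == 0,
    "EQ":   lambda f: f["ZF"] == 1,
    "GT":   lambda f: f["ZF"] == 0 and f["NF"] == 0,
    "GEQ":  lambda f: f["NF"] == 0,
    "LT":   lambda f: f["NF"] == 1,
    "LEQ":  lambda f: f["ZF"] == 1 or f["NF"] == 1,   # assumption, see lead-in
    "TF":   lambda f: f["TF"] == 1,
    "NTF":  lambda f: f["TF"] == 0,
    "LU":   lambda f: f["LUF"] == 1,
    "LV":   lambda f: f["LVF"] == 1,
    "UNC":  lambda f: True,
    "UNCF": lambda f: True,
}

def cmpf32_flags(a, b):
    """Assumed compare behaviour: ZF = (a == b), NF = (a < b)."""
    return {"ZF": int(a == b), "NF": int(a < b), "TF": 0, "LUF": 0, "LVF": 0}

def negf32(rb, flags, cndf="UNCF"):
    """Return the value written to RaH: -RbH if the condition holds, else RbH."""
    return -rb if CNDF_TESTS[cndf](flags) else rb

if __name__ == "__main__":
    flags = cmpf32_flags(-6.0, 0.0)          # NF = 1
    print(negf32(-6.0, flags, "LT"))         # condition true, value negated
    flags = cmpf32_flags(20.0, 0.0)          # NF = 0
    print(negf32(20.0, flags, "GEQ"))        # condition true, value negated
```

With LT after comparing a negative value against zero, the sketch behaves like an absolute-value operation, which is the pattern the instruction is commonly used for.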
Example
    MOVIZF32 R0H, #5.0         ; R0H = 5.0 (0x40A00000)
    MOVIZF32 R1H, #4.0         ; R1H = 4.0 (0x40800000)
    MOVIZF32 R2H, #-1.5        ; R2H = -1.5 (0xBFC00000)
    MPYF32   R4H, R1H, R2H     ; R4H = -6.0
    MPYF32   R5H, R0H, R1H     ; R5H = 20.0
                               ; <-- R4H valid
    CMPF32   R4H, #0.0         ; NF = 1
                               ; <-- R5H valid
    NEGF32   R4H, R4H, LT      ; if NF = 1, R4H = 6.0
    CMPF32   R5H, #0.0         ; NF = 0
    NEGF32   R5H, R5H, GEQ     ; if NF = 0, R5H = -20.0

See also
    ABSF32 RaH, RbH

POP RB    Pop the RB Register from the Stack

Operands
    RB    repeat block register

Opcode
    LSW: 1111 1111 1111 0001

Description
Restore the RB register from the stack. If a high-priority interrupt contains a RPTB instruction, then the RB register must be stored on the stack before the RPTB block and restored after the RPTB block. In a low-priority interrupt RB must always be saved and restored. This save and restore must occur when interrupts are disabled.

Flags
This instruction does not affect any flags in the floating-point unit:

    Flag      TF   ZI   NI   ZF   NF   LUF  LVF
    Modified  No   No   No   No   No   No   No

Pipeline
This is a single-cycle instruction.

Example
A high priority interrupt is defined as an interrupt that cannot itself be interrupted. In a high priority interrupt, the RB register must be saved if a RPTB block is used within the interrupt. If the interrupt service routine does not include a RPTB block, then you do not have to save the RB register.

    ; Repeat Block within a High-Priority Interrupt (Non-Interruptible)
    _Interrupt:                 ; RAS = RA, RA = 0
        ...
        PUSH RB                 ; Save RB register only if a RPTB block is used in the ISR
        ...
        ...
        RPTB #BlockEnd, AL      ; Execute the block AL+1 times
        ...
        ...
    BlockEnd                    ; End of block to be repeated
        ...
        ...
        POP RB                  ; Restore RB register
        ...
        IRET                    ; RA = RAS, RAS = 0

A low-priority interrupt is defined as an interrupt that allows itself to be interrupted.
The RB register must always be saved and restored in a low-priority interrupt. The RB register must be stored before interrupts are enabled. Likewise, interrupts must be disabled before the RB register is restored.

    ; Repeat Block within a Low-Priority Interrupt (Interruptible)
    _Interrupt:                 ; RAS = RA, RA = 0
        ...
        PUSH RB                 ; Always save RB register
        ...
        CLRC INTM               ; Enable interrupts only after saving RB
        ...
        ...                     ; ISR may or may not include a RPTB block
        ...
        SETC INTM               ; Disable interrupts before restoring RB
        ...
        POP RB                  ; Always restore RB register
        ...
        IRET                    ; RA = RAS, RAS = 0

See also
    PUSH RB
    RPTB label, #RC
    RPTB label, loc16

PUSH RB    Push the RB Register onto the Stack

Operands
    RB    repeat block register

Opcode
    LSW: 1111 1111 1111 0000

Description
Save the RB register on the stack. If a high-priority interrupt contains a RPTB instruction, then the RB register must be stored on the stack before the RPTB block and restored after the RPTB block. In a low-priority interrupt RB must always be saved and restored. This save and restore must occur when interrupts are disabled.

Flags
This instruction does not affect any flags in the floating-point unit:

    Flag      TF   ZI   NI   ZF   NF   LUF  LVF
    Modified  No   No   No   No   No   No   No

Pipeline
This is a single-cycle instruction.

Example
A high priority interrupt is defined as an interrupt that cannot itself be interrupted. In a high priority interrupt, the RB register must be saved if a RPTB block is used within the interrupt.
If the interrupt service routine does not include a RPTB block, then you do not have to save the RB register.

    ; Repeat Block within a High-Priority Interrupt (Non-Interruptible)
    _Interrupt:                 ; RAS = RA, RA = 0
        ...
        PUSH RB                 ; Save RB register only if a RPTB block is used in the ISR
        ...
        RPTB #BlockEnd, AL      ; Execute the block AL+1 times
        ...
        ...
    BlockEnd                    ; End of block to be repeated
        ...
        POP RB                  ; Restore RB register
        ...
        IRET                    ; RA = RAS, RAS = 0

A low-priority interrupt is defined as an interrupt that allows itself to be interrupted. The RB register must always be saved and restored in a low-priority interrupt. The RB register must be stored before interrupts are enabled. Likewise, interrupts must be disabled before the RB register is restored.

    ; Repeat Block within a Low-Priority Interrupt (Interruptible)
    _Interrupt:                 ; RAS = RA, RA = 0
        ...
        PUSH RB                 ; Always save RB register
        ...
        CLRC INTM               ; Enable interrupts only after saving RB
        ...
        ...                     ; ISR may or may not include a RPTB block
        ...
        SETC INTM               ; Disable interrupts before restoring RB
        ...
        POP RB                  ; Always restore RB register
        ...
        IRET                    ; RA = RAS, RAS = 0

See also
    POP RB
    RPTB label, #RC
    RPTB label, loc16

RESTORE    Restore the Floating-Point Registers

Operands
    none    This instruction does not have any operands

Opcode
    LSW: 1110 0101 0110 0010

Description
Restore the floating-point register set (R0H - R7H and STF) from their shadow registers. The SAVE and RESTORE instructions should be used in high-priority interrupts, that is, interrupts that cannot themselves be interrupted. In low-priority interrupt routines the floating-point registers should be pushed onto the stack.

Restrictions
The RESTORE instruction cannot be used in any delay slots for pipelined operations. Doing so will yield invalid results.
To avoid this, the proper number of NOPs or non-pipelined instructions must be inserted before the RESTORE operation.

    ; The following is INVALID
    MPYF32 R2H, R1H, R0H    ; 2 pipeline-cycle instruction (2p)
    RESTORE                 ; INVALID, do not use RESTORE in a delay slot

    ; The following is VALID
    MPYF32 R2H, R1H, R0H    ; 2 pipeline-cycle instruction (2p)
    NOP                     ; 1 delay cycle, R2H updated after this instruction
    RESTORE                 ; VALID

Flags
Restoring the status register will overwrite all flags:

    Flag      TF   ZI   NI   ZF   NF   LUF  LVF
    Modified  Yes  Yes  Yes  Yes  Yes  Yes  Yes

Pipeline
This is a single-cycle instruction.

Example
The following example shows a complete context save and restore for a high-priority interrupt. Note that the CPU automatically stores the following registers: ACC, P, XT, ST0, ST1, IER, DP, AR0, AR1 and PC. If an interrupt is low priority (that is, it can be interrupted), then push the floating-point registers onto the stack instead of using the SAVE and RESTORE operations.

    ; Interrupt Save
    _HighestPriorityISR:        ; Uninterruptible
        ASP                     ; Align stack
        PUSH RB                 ; Save RB register if used in the ISR
        PUSH AR1H:AR0H          ; Save other registers if used
        PUSH XAR2
        PUSH XAR3
        PUSH XAR4
        PUSH XAR5
        PUSH XAR6
        PUSH XAR7
        PUSH XT
        SPM 0                   ; Set default C28 modes
        CLRC AMODE
        CLRC PAGE0,OVM
        SAVE RNDF32=1           ; Save all FPU registers
        ...                     ; set default FPU modes
        ...
    ; Interrupt Restore
        ...
        RESTORE                 ; Restore all FPU registers
        POP XT                  ; restore other registers
        POP XAR7
        POP XAR6
        POP XAR5
        POP XAR4
        POP XAR3
        POP XAR2
        POP AR1H:AR0H
        POP RB                  ; restore RB register
        NASP                    ; un-align stack
        IRET                    ; return from interrupt

See also
    SAVE FLAG, VALUE

RPTB label, loc16    Repeat A Block of Code

Operands
    label    This label is used by the assembler to determine the end of the repeat block and to calculate RSIZE. This label should be placed immediately after the last instruction included in the repeat block.
    loc16    16-bit location for the repeat count value.

Opcode
    LSW: 1011 0101 0bbb bbbb
    MSW: 0000 0000 loc16

Description
Initialize repeat block loop, repeat count from [loc16].

Restrictions
• The maximum block size is ≤127 16-bit words.
• An even aligned block must be ≥ 9 16-bit words.
• An odd aligned block must be ≥ 8 16-bit words.
• Interrupts must be disabled when saving or restoring the RB register.
• Repeat blocks cannot be nested.
• Any discontinuity type operation is not allowed inside a repeat block. This includes all call, branch, or TRAP instructions. Interrupts are allowed.
• Conditional execution operations are allowed.

Flags
This instruction does not affect any flags in the floating-point unit:

    Flag      TF   ZI   NI   ZF   NF   LUF  LVF
    Modified  No   No   No   No   No   No   No

Pipeline
This instruction takes four cycles on the first iteration and zero cycles thereafter. No special pipeline alignment is required.

Example
The minimum size for the repeat block is 9 words if the block is even-aligned and 8 words if the block is odd-aligned.
If you have a block of 8 words, as in the following example, you can make sure the block is odd-aligned by preceding it with a .align 2 directive and a NOP instruction. The .align 2 directive will make sure the NOP is even-aligned. Since a NOP is a 16-bit instruction, the RPTB will be odd-aligned. For blocks of 9 or more words, this is not required.

    ; Repeat Block of 8 Words (Interruptible)
    ;
    ; find the largest element and put its address in XAR6
        .align 2
        NOP
        RPTB VECTOR_MAX_END, AR7    ; Execute the block AR7+1 times
        MOVL   ACC,XAR0
        MOV32  R1H,*XAR0++          ; min size = 8, 9 words
        MAXF32 R0H,R1H              ; max size = 127 words
        MOVST0 NF,ZF
        MOVL   XAR6,ACC,LT
    VECTOR_MAX_END:                 ; label indicates the end
                                    ; RA is cleared

When an interrupt is taken, the repeat active (RA) bit in the RB register is automatically copied to the repeat active shadow (RAS) bit. When the interrupt exits, the RAS bit is automatically copied back to the RA bit. This allows the hardware to keep track of whether a repeat loop was active when an interrupt is taken and to restore that state automatically.

A high priority interrupt is defined as an interrupt that cannot itself be interrupted. In a high priority interrupt, the RB register must be saved if a RPTB block is used within the interrupt. If the interrupt service routine does not include a RPTB block, then you do not have to save the RB register.

    ; Repeat Block within a High-Priority Interrupt (Non-Interruptible)
    ;
    ; Interrupt:                ; RAS = RA, RA = 0
        ...
        PUSH RB                 ; Save RB register only if a RPTB block is used in the ISR
        ...
        ...
        RPTB #BlockEnd, AL      ; Execute the block AL+1 times
        ...
        ...
        ...
    BlockEnd                    ; End of block to be repeated
        ...
        ...
        POP RB                  ; Restore RB register
        ...
        IRET                    ; RA = RAS, RAS = 0

A low-priority interrupt is defined as an interrupt that allows itself to be interrupted. The RB register must always be saved and restored in a low-priority interrupt. The RB register must be stored before interrupts are enabled. Likewise, interrupts must be disabled before the RB register is restored.

    ; Repeat Block within a Low-Priority Interrupt (Interruptible)
    ;
    ; Interrupt:                ; RAS = RA, RA = 0
        ...
        PUSH RB                 ; Always save RB register
        ...
        CLRC INTM               ; Enable interrupts only after saving RB
        ...
        ...
        ...                     ; ISR may or may not include a RPTB block
        ...
        ...
        SETC INTM               ; Disable interrupts before restoring RB
        ...
        POP RB                  ; Always restore RB register
        ...
        IRET                    ; RA = RAS, RAS = 0

See also
    POP RB
    PUSH RB
    RPTB label, #RC

RPTB label, #RC    Repeat a Block of Code

Operands
    label    This label is used by the assembler to determine the end of the repeat block and to calculate RSIZE. This label should be placed immediately after the last instruction included in the repeat block.
    #RC      16-bit immediate value for the repeat count.

Opcode
    LSW: 1011 0101 1bbb bbbb
    MSW: cccc cccc cccc cccc

Description
Repeat a block of code. The repeat count is specified as an immediate value.

Restrictions
• The maximum block size is ≤127 16-bit words.
• An even aligned block must be ≥ 9 16-bit words.
• An odd aligned block must be ≥ 8 16-bit words.
• Interrupts must be disabled when saving or restoring the RB register.
• Repeat blocks cannot be nested.
• Any discontinuity type operation is not allowed inside a repeat block. This includes all call, branch or TRAP instructions. Interrupts are allowed.
• Conditional execution operations are allowed.
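The size restrictions can be checked before assembling a loop. The Python sketch below is a hypothetical helper, not part of the TI code generation tools: the function name check_rptb_block and its parameters are invented for illustration. It assumes addresses are counted in 16-bit words and that the alignment in question is that of the RPTB instruction; because RPTB occupies two words, the block that follows has the same parity.

```python
MAX_BLOCK_WORDS = 127          # maximum block size in 16-bit words
MIN_WORDS_EVEN_ALIGNED = 9     # minimum size when even-aligned
MIN_WORDS_ODD_ALIGNED = 8      # minimum size when odd-aligned

def check_rptb_block(rptb_address, size_words):
    """Return a list of violated size restrictions (empty if none).

    rptb_address: word address of the RPTB instruction (assumed input).
    size_words:   number of 16-bit words between RPTB and the end label.
    """
    problems = []
    if size_words > MAX_BLOCK_WORDS:
        problems.append(f"block of {size_words} words exceeds {MAX_BLOCK_WORDS}")
    even_aligned = rptb_address % 2 == 0
    minimum = MIN_WORDS_EVEN_ALIGNED if even_aligned else MIN_WORDS_ODD_ALIGNED
    if size_words < minimum:
        alignment = "even" if even_aligned else "odd"
        problems.append(
            f"{alignment}-aligned block of {size_words} words is below {minimum}"
        )
    return problems

if __name__ == "__main__":
    print(check_rptb_block(0x8001, 8))   # odd-aligned, 8 words: acceptable
    print(check_rptb_block(0x8000, 8))   # even-aligned, 8 words: too small
```

An 8-word block that fails the even-aligned check can be fixed either by padding it to 9 words or by forcing odd alignment with the .align 2 and NOP sequence.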
Flags
This instruction does not affect any flags in the floating-point unit:

    Flag      TF  ZI  NI  ZF  NF  LUF  LVF
    Modified  No  No  No  No  No  No   No

Pipeline
This instruction takes one cycle on the first iteration and zero cycles thereafter. No special pipeline alignment is required.

Example
The minimum size for the repeat block is 9 words if the block is even-aligned and 8 words if the block is odd-aligned. If you have a block of 8 words, as in the following example, you can make sure the block is odd-aligned by preceding it with a .align 2 directive and a NOP instruction. The .align 2 directive makes sure the NOP is even-aligned. Since a NOP is a 16-bit instruction, the RPTB will be odd-aligned. For blocks of 9 or more words, this is not required.

    ; Repeat Block (Interruptible)
    ;
    ; find the largest element and put its address in XAR6
        .align 2
        NOP
        RPTB VECTOR_MAX_END, #(4-1)  ; Execute the block 4 times
        MOVL ACC,XAR0
        MOV32 R1H,*XAR0++            ; 8 or 9 words <= block size <= 127 words
        MAXF32 R0H,R1H
        MOVST0 NF,ZF
        MOVL XAR6,ACC,LT
    VECTOR_MAX_END:                  ; RE indicates the end address
                                     ; RA is cleared

When an interrupt is taken, the repeat active (RA) bit in the RB register is automatically copied to the repeat active shadow (RAS) bit. When the interrupt exits, the RAS bit is automatically copied back to the RA bit. This allows the hardware to keep track of whether a repeat loop was active when the interrupt was taken and to restore that state automatically.

A high-priority interrupt is defined as an interrupt that cannot itself be interrupted. In a high-priority interrupt, the RB register must be saved if a RPTB block is used within the interrupt. If the interrupt service routine does not include a RPTB block, then you do not have to save the RB register.
    ; Repeat Block within a High-Priority Interrupt (Non-Interruptible)
    ;
    ; Interrupt:                   ; RAS = RA, RA = 0
        ...
        PUSH RB                    ; Save RB register only if a RPTB block is used in the ISR
        ...
        ...
        RPTB #BlockEnd, #5         ; Execute the block 5+1 times
        ...
        ...
        ...
    BlockEnd                       ; End of block to be repeated
        ...
        ...
        POP RB                     ; Restore RB register
        ...
        IRET                       ; RA = RAS, RAS = 0

A low-priority interrupt is defined as an interrupt that allows itself to be interrupted. The RB register must always be saved and restored in a low-priority interrupt. The RB register must be stored before interrupts are enabled. Likewise, interrupts must be disabled before the RB register is restored.

    ; Repeat Block within a Low-Priority Interrupt (Interruptible)
    ;
    ; Interrupt:                   ; RAS = RA, RA = 0
        ...
        PUSH RB                    ; Always save RB register
        ...
        CLRC INTM                  ; Enable interrupts only after saving RB
        ...
        ...
        ...                        ; ISR may or may not include a RPTB block
        ...
        ...
        SETC INTM                  ; Disable interrupts before restoring RB
        ...
        POP RB                     ; Always restore RB register
        ...
        IRET                       ; RA = RAS, RAS = 0

See also
    POP RB
    PUSH RB
    RPTB label, loc16

SAVE FLAG, VALUE   Save Register Set to Shadow Registers and Execute SETFLG

Operands
    FLAG    11-bit mask indicating which floating-point status flags to change.
    VALUE   11-bit mask indicating the flag value; 0 or 1.

Opcode
    LSW: 1110 0110 01FF FFFF
    MSW: FFFF FVVV VVVV VVVV

Description
This operation copies the current working floating-point register set (R0H to R7H and STF) to the shadow register set and combines the SETFLG FLAG, VALUE operation in a single cycle. The status register is copied to the shadow register before the flag values are changed. The STF[SHDWM] flag is set to 1 when the SAVE command has been executed.
The SAVE and RESTORE instructions should be used in high-priority interrupts, that is, interrupts that cannot themselves be interrupted. In low-priority interrupt routines, the floating-point registers should be pushed onto the stack.

Restrictions
Do not use the SAVE instruction in the delay slots of pipelined operations. Doing so can yield invalid results. To avoid this, the proper number of NOPs or non-pipelined instructions must be inserted before the SAVE operation.

    ; The following is INVALID
    MPYF32 R2H, R1H, R0H   ; 2 pipeline-cycle instruction (2p)
    SAVE RNDF32=1          ; INVALID, do not use SAVE in a delay slot

    ; The following is VALID
    MPYF32 R2H, R1H, R0H   ; 2 pipeline-cycle instruction (2p)
    NOP                    ; 1 delay cycle, R2H updated after this instruction
    SAVE RNDF32=1          ; VALID

Flags
This instruction modifies the following flags in the STF register:

    Flag      TF   ZI   NI   ZF   NF   LUF  LVF
    Modified  Yes  Yes  Yes  Yes  Yes  Yes  Yes

Any flag can be modified by this instruction.

Pipeline
This is a single-cycle instruction.

Example
To make the code easier to read, the assembler accepts a FLAG=VALUE syntax for the SETFLG operation, as shown below:

    SAVE RNDF32=0, TF=1, ZF=0   ; FLAG = 01001001000, VALUE = X0XX1XX0XXX
    MOVST0 TF, ZF, LUF          ; Copy the indicated flags to ST0
                                ; Note: X means this flag will not be modified.
                                ; The assembler will set these X values to 0.

The following example shows a complete context save and restore for a high-priority interrupt. Note that the CPU automatically stores the following registers: ACC, P, XT, ST0, ST1, IER, DP, AR0, AR1 and PC.
    _HighestPriorityISR:
        ASP                  ; Align stack
        PUSH RB              ; Save RB register if used in the ISR
        PUSH AR1H:AR0H       ; Save other registers if used
        PUSH XAR2
        PUSH XAR3
        PUSH XAR4
        PUSH XAR5
        PUSH XAR6
        PUSH XAR7
        PUSH XT
        SPM 0                ; Set default C28 modes
        CLRC AMODE
        CLRC PAGE0,OVM
        SAVE RNDF32=0        ; Save all FPU registers
        ...                  ; set default FPU modes
        ...
        ...
        ...
        RESTORE              ; Restore all FPU registers
        POP XT               ; restore other registers
        POP XAR7
        POP XAR6
        POP XAR5
        POP XAR4
        POP XAR3
        POP XAR2
        POP AR1H:AR0H
        POP RB               ; restore RB register
        NASP                 ; un-align stack
        IRET                 ; return from interrupt

See also
    RESTORE
    SETFLG FLAG, VALUE

SETFLG FLAG, VALUE   Set or clear selected floating-point status flags

Operands
    FLAG    11-bit mask indicating which floating-point status flags to change.
    VALUE   11-bit mask indicating the flag value; 0 or 1.

Opcode
    LSW: 1110 0110 00FF FFFF
    MSW: FFFF FVVV VVVV VVVV

Description
The SETFLG instruction is used to set or clear selected floating-point status flags in the STF register. The FLAG field is an 11-bit value that indicates which flags will be changed. That is, if a FLAG bit is set to 1, the corresponding flag will be changed; all other flags will not be modified. The bit mapping of the FLAG field is shown below:

    Bit    10        9       8         7         6   5   4   3   2   1    0
    Field  reserved  RNDF32  reserved  reserved  TF  ZI  NI  ZF  NF  LUF  LVF

The VALUE field indicates the value each selected flag should be set to; 0 or 1.

Restrictions
Do not use the SETFLG instruction in the delay slots of pipelined operations. Doing so can yield invalid results. To avoid this, the proper number of NOPs or non-pipelined instructions must be inserted before the SETFLG operation.

    ; The following is INVALID
    MPYF32 R2H, R1H, R0H   ; 2 pipeline-cycle instruction (2p)
    SETFLG RNDF32=1        ; INVALID, do not use SETFLG in a delay slot

    ; The following is VALID
    MPYF32 R2H, R1H, R0H   ; 2 pipeline-cycle instruction (2p)
    NOP                    ; 1 delay cycle, R2H updated after this instruction
    SETFLG RNDF32=1        ; VALID

Flags
This instruction modifies the following flags in the STF register:

    Flag      TF   ZI   NI   ZF   NF   LUF  LVF
    Modified  Yes  Yes  Yes  Yes  Yes  Yes  Yes

Any flag can be modified by this instruction.

Pipeline
This is a single-cycle instruction.

Example
To make the code easier to read, the assembler accepts a FLAG=VALUE syntax for the SETFLG operation, as shown below:

    SETFLG RNDF32=0, TF=1, ZF=0   ; FLAG = 01001001000, VALUE = X0XX1XX0XXX
    MOVST0 TF, ZF, LUF            ; Copy the indicated flags to ST0
                                  ; X means this flag is not modified.
                                  ; The assembler will set X values to 0

See also
    SAVE FLAG, VALUE

SUBF32 RaH, RbH, RcH   32-bit Floating-Point Subtraction

Operands
    RaH   floating-point destination register (R0H to R7H)
    RbH   floating-point source register (R0H to R7H)
    RcH   floating-point source register (R0H to R7H)

Opcode
    LSW: 1110 0111 0010 0000
    MSW: 0000 000c ccbb baaa

Description
Subtract the contents of two floating-point registers.

    RaH = RbH - RcH

Flags
This instruction modifies the following flags in the STF register:

    Flag      TF  ZI  NI  ZF  NF  LUF  LVF
    Modified  No  No  No  No  No  Yes  Yes

The STF register flags are modified as follows:
• LUF = 1 if SUBF32 generates an underflow condition.
• LVF = 1 if SUBF32 generates an overflow condition.

Pipeline
This is a 2 pipeline cycle (2p) instruction.
That is:

    SUBF32 RaH, RbH, RcH   ; 2 pipeline cycles (2p)
    NOP                    ; 1 cycle delay or non-conflicting instruction
                           ; <-- SUBF32 completes, RaH updated
    NOP

Any instruction in the delay slot must not use RaH as a destination register or as a source operand.

Example
Calculate Y = A + B - C:

    MOVL XAR4, #A
    MOV32 R0H, *XAR4       ; Load R0H with A
    MOVL XAR4, #B
    MOV32 R1H, *XAR4       ; Load R1H with B
    MOVL XAR4, #C
    ADDF32 R0H,R1H,R0H     ; Add A + B and in parallel
    || MOV32 R2H,*XAR4     ; Load R2H with C
    NOP                    ; 1 delay cycle for ADDF32
                           ; <-- ADDF32 complete
    SUBF32 R0H,R0H,R2H     ; Subtract C from (A + B)
    NOP
                           ; <-- SUBF32 completes
    MOV32 *XAR4,R0H        ; Store the result

See also
    SUBF32 RaH, #16FHi, RbH
    SUBF32 RdH, ReH, RfH || MOV32 RaH, mem32
    SUBF32 RdH, ReH, RfH || MOV32 mem32, RaH
    MPYF32 RaH, RbH, RcH || SUBF32 RdH, ReH, RfH

SUBF32 RaH, #16FHi, RbH   32-bit Floating-Point Subtraction

Operands
    RaH      floating-point destination register (R0H to R7H)
    #16FHi   A 16-bit immediate value that represents the upper 16 bits of an IEEE 32-bit floating-point value. The low 16 bits of the mantissa are assumed to be all 0.
    RbH      floating-point source register (R0H to R7H)

Opcode
    LSW: 1110 1000 11II IIII
    MSW: IIII IIII IIbb baaa

Description
Subtract RbH from the floating-point value represented by the immediate operand. Store the result of the subtraction in RaH.

#16FHi is a 16-bit immediate value that represents the upper 16 bits of an IEEE 32-bit floating-point value. The low 16 bits of the mantissa are assumed to be all 0. #16FHi is most useful for representing constants where the lowest 16 bits of the mantissa are 0. Some examples are 2.0 (0x40000000), 4.0 (0x40800000), 0.5 (0x3F000000), and -1.5 (0xBFC00000). The assembler will accept either a hex or float as the immediate value.
That is, the value -1.5 can be represented as #-1.5 or #0xBFC0.

    RaH = #16FHi:0 - RbH

Flags
This instruction modifies the following flags in the STF register:

    Flag      TF  ZI  NI  ZF  NF  LUF  LVF
    Modified  No  No  No  No  No  Yes  Yes

The STF register flags are modified as follows:
• LUF = 1 if SUBF32 generates an underflow condition.
• LVF = 1 if SUBF32 generates an overflow condition.

Pipeline
This is a 2 pipeline cycle (2p) instruction. That is:

    SUBF32 RaH, #16FHi, RbH   ; 2 pipeline cycles (2p)
    NOP                       ; 1 cycle delay or non-conflicting instruction
                              ; <-- SUBF32 completes, RaH updated
    NOP

Any instruction in the delay slot must not use RaH as a destination register or as a source operand.

Example
Calculate Y = 2.0 - (A + B):

    MOVL XAR4, #A
    MOV32 R0H, *XAR4       ; Load R0H with A
    MOVL XAR4, #B
    MOV32 R1H, *XAR4       ; Load R1H with B
    ADDF32 R0H,R1H,R0H     ; Add A + B
    NOP
                           ; <-- ADDF32 complete
    SUBF32 R0H,#2.0,R0H    ; Subtract (A + B) from 2.0
    NOP
                           ; <-- SUBF32 completes
    MOV32 *XAR4,R0H        ; Store the result

See also
    SUBF32 RaH, RbH, RcH
    SUBF32 RdH, ReH, RfH || MOV32 RaH, mem32
    SUBF32 RdH, ReH, RfH || MOV32 mem32, RaH
    MPYF32 RaH, RbH, RcH || SUBF32 RdH, ReH, RfH

SUBF32 RdH, ReH, RfH || MOV32 RaH, mem32   32-bit Floating-Point Subtraction with Parallel Move

Operands
    RdH     floating-point destination register (R0H to R7H) for the SUBF32 operation; RdH cannot be the same register as RaH
    ReH     floating-point source register (R0H to R7H) for the SUBF32 operation
    RfH     floating-point source register (R0H to R7H) for the SUBF32 operation
    RaH     floating-point destination register (R0H to R7H) for the MOV32 operation; RaH cannot be the same register as RdH
    mem32   pointer to 32-bit source memory location for the MOV32 operation

Opcode
    LSW: 1110 0011 0010 fffe
    MSW: eedd daaa mem32

Description
Subtract the contents of two floating-point registers and move from memory to a floating-point register.

    RdH = ReH - RfH, RaH = [mem32]

Restrictions
The destination registers for the SUBF32 and the MOV32 must be unique. That is, RaH cannot be the same register as RdH.

Flags
This instruction modifies the following flags in the STF register:

    Flag      TF  ZI   NI   ZF   NF   LUF  LVF
    Modified  No  Yes  Yes  Yes  Yes  Yes  Yes

The STF register flags are modified as follows:
• LUF = 1 if SUBF32 generates an underflow condition.
• LVF = 1 if SUBF32 generates an overflow condition.

The MOV32 instruction will set the NF, ZF, NI and ZI flags as follows:

    NF = RaH(31);
    ZF = 0;
    if(RaH(30:23) == 0) { ZF = 1; NF = 0; }
    NI = RaH(31);
    ZI = 0;
    if(RaH(31:0) == 0) ZI = 1;

Pipeline
SUBF32 is a 2 pipeline-cycle instruction (2p) and MOV32 takes a single cycle. That is:

    SUBF32 RdH, ReH, RfH   ; 2 pipeline cycles (2p)
    || MOV32 RaH, mem32    ; 1 cycle
                           ; <-- MOV32 completes, RaH updated
    NOP                    ; 1 cycle delay or non-conflicting instruction
                           ; <-- SUBF32 completes, RdH updated
    NOP

Any instruction in the delay slot must not use RdH as a destination register or as a source operand.
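The MOV32 flag rules given in the Flags section of this entry translate directly into a few bit tests. The Python sketch below is an illustration only, not TI code; the function name `mov32_flags` and the dictionary it returns are assumptions made for this example.

```python
def mov32_flags(raw: int) -> dict:
    """Model the NF, ZF, NI and ZI updates for a 32-bit value loaded by MOV32."""
    raw &= 0xFFFFFFFF
    sign = (raw >> 31) & 1          # RaH(31)
    exponent = (raw >> 23) & 0xFF   # RaH(30:23)

    # Floating-point view: a zero exponent is treated as zero, never negative.
    nf, zf = sign, 0
    if exponent == 0:
        zf, nf = 1, 0

    # Integer view: sign bit, and zero only when all 32 bits are clear.
    ni = sign
    zi = 1 if raw == 0 else 0
    return {"NF": nf, "ZF": zf, "NI": ni, "ZI": zi}


print(mov32_flags(0xBFC00000))  # bit pattern of -1.5
print(mov32_flags(0x80000000))  # sign bit set, zero exponent
```

The second call shows why the floating-point and integer flag pairs can disagree: a value with only the sign bit set is zero as a float (ZF = 1, NF = 0) but negative and nonzero as an integer (NI = 1, ZI = 0).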
Example

    MOVL XAR1, #0xC000         ; XAR1 = 0xC000
    SUBF32 R0H, R1H, R2H       ; (A) R0H = R1H - R2H
    || MOV32 R3H, *XAR1        ;
                               ; <-- R3H valid
    MOV32 R4H, *+XAR1[2]       ;
                               ; <-- (A) completes, R0H valid, R4H valid
    ADDF32 R5H, R4H, R3H       ; (B) R5H = R4H + R3H
    || MOV32 *+XAR1[4], R0H    ;
                               ; <-- R0H stored
    MOVL XAR2, #0xE000         ;
                               ; <-- (B) completes, R5H valid
    MOV32 *XAR2, R5H           ;
                               ; <-- R5H stored

See also
    SUBF32 RaH, RbH, RcH
    SUBF32 RaH, #16FHi, RbH
    MPYF32 RaH, RbH, RcH || SUBF32 RdH, ReH, RfH

SUBF32 RdH, ReH, RfH || MOV32 mem32, RaH   32-bit Floating-Point Subtraction with Parallel Move

Operands
    RdH     floating-point destination register (R0H to R7H) for the SUBF32 operation
    ReH     floating-point source register (R0H to R7H) for the SUBF32 operation
    RfH     floating-point source register (R0H to R7H) for the SUBF32 operation
    mem32   pointer to 32-bit destination memory location for the MOV32 operation
    RaH     floating-point source register (R0H to R7H) for the MOV32 operation

Opcode
    LSW: 1110 0000 0010 fffe
    MSW: eedd daaa mem32

Description
Subtract the contents of two floating-point registers and move from a floating-point register to memory.

    RdH = ReH - RfH, [mem32] = RaH

Flags
This instruction modifies the following flags in the STF register:

    Flag      TF  ZI  NI  ZF  NF  LUF  LVF
    Modified  No  No  No  No  No  Yes  Yes

The STF register flags are modified as follows:
• LUF = 1 if SUBF32 generates an underflow condition.
• LVF = 1 if SUBF32 generates an overflow condition.
Pipeline
SUBF32 is a 2 pipeline-cycle instruction (2p) and MOV32 takes a single cycle. That is:

    SUBF32 RdH, ReH, RfH   ; 2 pipeline cycles (2p)
    || MOV32 mem32, RaH    ; 1 cycle
                           ; <-- MOV32 completes, mem32 updated
    NOP                    ; 1 cycle delay or non-conflicting instruction
                           ; <-- SUBF32 completes, RdH updated
    NOP

Any instruction in the delay slot must not use RdH as a destination register or as a source operand.

Example

    ADDF32 R3H, R6H, R4H       ; (A) R3H = R6H + R4H and R7H = I3
    || MOV32 R7H, *-SP[2]      ;
                               ; <-- R7H valid
    SUBF32 R6H, R6H, R4H       ; (B) R6H = R6H - R4H
                               ; <-- ADDF32 (A) completes, R3H valid
    SUBF32 R3H, R1H, R7H       ; (C) R3H = R1H - R7H and store R3H (A)
    || MOV32 *+XAR5[2], R3H    ;
                               ; <-- SUBF32 (B) completes, R6H valid
                               ; <-- MOV32 completes, (A) stored
    ADDF32 R4H, R7H, R1H       ; (D) R4H = R7H + R1H and store R6H (B)
    || MOV32 *+XAR5[6], R6H    ;
                               ; <-- SUBF32 (C) completes, R3H valid
                               ; <-- MOV32 completes, (B) stored
    MOV32 *+XAR5[0], R3H       ; store R3H (C)
                               ; <-- MOV32 completes, (C) stored
                               ; <-- ADDF32 (D) completes, R4H valid
    MOV32 *+XAR5[4], R4H       ; store R4H (D)
                               ; <-- MOV32 completes, (D) stored

See also
    SUBF32 RaH, RbH, RcH
    SUBF32 RaH, #16FHi, RbH
    SUBF32 RdH, ReH, RfH || MOV32 RaH, mem32
    MPYF32 RaH, RbH, RcH || SUBF32 RdH, ReH, RfH

SWAPF RaH, RbH{, CNDF}   Conditional Swap

Operands
    RaH    floating-point register (R0H to R7H)
    RbH    floating-point register (R0H to R7H)
    CNDF   condition tested

Opcode
    LSW: 1110 0110 1110 CNDF
    MSW: 0000 0000 00bb baaa

Description
Conditional swap of RaH and RbH.
    if (CNDF == true) swap RaH and RbH

CNDF is one of the following conditions:

    Encode (1)  CNDF      Description                           STF Flags Tested
    0000        NEQ       Not equal to zero                     ZF == 0
    0001        EQ        Equal to zero                         ZF == 1
    0010        GT        Greater than zero                     ZF == 0 AND NF == 0
    0011        GEQ       Greater than or equal to zero         NF == 0
    0100        LT        Less than zero                        NF == 1
    0101        LEQ       Less than or equal to zero            ZF == 1 OR NF == 1
    1010        TF        Test flag set                         TF == 1
    1011        NTF       Test flag not set                     TF == 0
    1100        LU        Latched underflow                     LUF == 1
    1101        LV        Latched overflow                      LVF == 1
    1110        UNC       Unconditional                         None
    1111        UNCF (2)  Unconditional with flag modification  None

    (1) Values not shown are reserved.
    (2) This is the default operation if no CNDF field is specified. This condition will allow the ZF, NF, ZI, and NI flags to be modified when a conditional operation is executed. All other conditions will not modify these flags.

Flags
This instruction modifies the following flags in the STF register:

    Flag      TF  ZI  NI  ZF  NF  LUF  LVF
    Modified  No  No  No  No  No  No   No

No flags affected.

Pipeline
This is a single-cycle instruction.

Example

    ; find the largest element and put it in R1H
        MOVL XAR1, #0xB000
        MOV32 R1H, *XAR1          ; Initialize R1H
        .align 2
        NOP
        RPTB LOOP_END, #(10-1)    ; Execute the block 10 times
        MOV32 R2H, *XAR1++        ; Update R2H with next element
        CMPF32 R2H, R1H           ; Compare R2H with R1H
        SWAPF R1H, R2H, GT        ; Swap R1H and R2H if R2H > R1H
        NOP                       ; For minimum repeat block size
        NOP                       ; For minimum repeat block size
    LOOP_END:

TESTTF CNDF   Test STF Register Flag Condition

Operands
    CNDF   condition to test

Opcode
    LSW: 1110 0101 1000 CNDF

Description
Test the floating-point condition and, if true, set the TF flag. If the condition is false, clear the TF flag. This is useful for temporarily storing a condition for later use.
    if (CNDF == true) TF = 1; else TF = 0;

CNDF is one of the following conditions:

    Encode (1)  CNDF      Description                           STF Flags Tested
    0000        NEQ       Not equal to zero                     ZF == 0
    0001        EQ        Equal to zero                         ZF == 1
    0010        GT        Greater than zero                     ZF == 0 AND NF == 0
    0011        GEQ       Greater than or equal to zero         NF == 0
    0100        LT        Less than zero                        NF == 1
    0101        LEQ       Less than or equal to zero            ZF == 1 OR NF == 1
    1010        TF        Test flag set                         TF == 1
    1011        NTF       Test flag not set                     TF == 0
    1100        LU        Latched underflow                     LUF == 1
    1101        LV        Latched overflow                      LVF == 1
    1110        UNC       Unconditional                         None
    1111        UNCF (2)  Unconditional with flag modification  None

    (1) Values not shown are reserved.
    (2) This is the default operation if no CNDF field is specified. This condition will allow the ZF, NF, ZI, and NI flags to be modified when a conditional operation is executed. All other conditions will not modify these flags.

Flags
This instruction modifies the following flags in the STF register:

    Flag      TF   ZI  NI  ZF  NF  LUF  LVF
    Modified  Yes  No  No  No  No  No   No

    TF = 0; if (CNDF == true) TF = 1;

Note: If (CNDF == UNC or UNCF), the TF flag will be set to 1.

Pipeline
This is a single-cycle instruction.
Example

    CMPF32 R0H, #0.0     ; Compare R0H against 0
    TESTTF LT            ; Set TF if R0H less than 0 (NF == 1)
    ABSF32 R0H, R0H      ; Get the absolute value of R0H
                         ; Perform calculations based on ABS R0H
    MOVST0 TF            ; Copy TF to TC in ST0
    SBF End, NTC         ; Branch to end if TF was not set
    NEGF32 R0H, R0H
    End:

UI16TOF32 RaH, mem16   Convert unsigned 16-bit integer to 32-bit floating-point value

Operands
    RaH     floating-point destination register (R0H to R7H)
    mem16   pointer to 16-bit source memory location

Opcode
    LSW: 1110 0010 1100 0100
    MSW: 0000 0aaa mem16

Description
    RaH = UI16ToF32[mem16]

Flags
This instruction does not affect any flags:

    Flag      TF  ZI  NI  ZF  NF  LUF  LVF
    Modified  No  No  No  No  No  No   No

Pipeline
This is a 2 pipeline cycle (2p) instruction. That is:

    UI16TOF32 RaH, mem16   ; 2 pipeline cycles (2p)
    NOP                    ; 1 cycle delay or non-conflicting instruction
                           ; <-- UI16TOF32 completes, RaH updated
    NOP

Any instruction in the delay slot must not use RaH as a destination register or as a source operand.
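The delay-slot rule for 2p instructions lends itself to a simple static check. The Python sketch below is a toy model only: the tuple layout and the function name `delay_slot_conflicts` are assumptions made for this example, and it covers just the rule stated above (the destination of a pipelined instruction must not be read or written in its delay slots).

```python
def delay_slot_conflicts(program):
    """Find delay-slot violations in a list of instructions.

    Each entry is (mnemonic, dest, sources, delay_slots), where dest is a
    register name or None, sources is a tuple of register names, and
    delay_slots is 1 for a 2p instruction and 0 for a single-cycle one.
    Returns (producer_index, offender_index) pairs.
    """
    conflicts = []
    for i, (_, dest, _, slots) in enumerate(program):
        if dest is None:
            continue
        for j in range(i + 1, min(i + 1 + slots, len(program))):
            _, other_dest, other_sources, _ = program[j]
            if dest == other_dest or dest in other_sources:
                conflicts.append((i, j))
    return conflicts


bad = [
    ("UI16TOF32", "R0H", (), 1),
    ("MPYF32", "R0H", ("R1H", "R0H"), 1),   # reads R0H in the delay slot
]
good = [
    ("UI16TOF32", "R0H", (), 1),
    ("MOV32", "R1H", (), 0),                # non-conflicting instruction
    ("MPYF32", "R0H", ("R1H", "R0H"), 1),
]
print(delay_slot_conflicts(bad))
print(delay_slot_conflicts(good))
```

The `good` sequence mirrors the pattern used in the examples in this chapter, where a MOV32 to an unrelated register fills the delay slot instead of a NOP.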
Example

    ; float32 y,m,b;
    ; AdcRegs.RESULT0 is an unsigned int
    ; Calculate: y = (float)AdcRegs.ADCRESULT0 * m + b;
    ;
    MOVW DP, #0x01C4
    UI16TOF32 R0H, @8        ; R0H = (float)AdcRegs.RESULT0
    MOV32 R1H, *-SP[6]       ; R1H = M
                             ; <-- Conversion complete, R0H valid
    MPYF32 R0H, R1H, R0H     ; R0H = (float)X * M
    MOV32 R1H, *-SP[8]       ; R1H = B
                             ; <-- MPYF32 complete, R0H valid
    ADDF32 R0H, R0H, R1H     ; R0H = Y = (float)X * M + B
    NOP                      ; <-- ADDF32 complete, R0H valid
    MOV32 *-[SP], R0H        ; Store Y

See also
    F32TOI16 RaH, RbH
    F32TOI16R RaH, RbH
    F32TOUI16 RaH, RbH
    F32TOUI16R RaH, RbH
    I16TOF32 RaH, RbH
    I16TOF32 RaH, mem16
    UI16TOF32 RaH, RbH

UI16TOF32 RaH, RbH   Convert unsigned 16-bit integer to 32-bit floating-point value

Operands
    RaH   floating-point destination register (R0H to R7H)
    RbH   floating-point source register (R0H to R7H)

Opcode
    LSW: 1110 0110 1000 1111
    MSW: 0000 0000 00bb baaa

Description
    RaH = UI16ToF32[RbH]

Flags
This instruction does not affect any flags:

    Flag      TF  ZI  NI  ZF  NF  LUF  LVF
    Modified  No  No  No  No  No  No   No

Pipeline
This is a 2 pipeline cycle (2p) instruction. That is:

    UI16TOF32 RaH, RbH   ; 2 pipeline cycles (2p)
    NOP                  ; 1 cycle delay or non-conflicting instruction
                         ; <-- UI16TOF32 completes, RaH updated
    NOP

Any instruction in the delay slot must not use RaH as a destination register or as a source operand.
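The result of the conversion can be cross-checked on a host machine. The Python sketch below is a minimal model, assuming only that the result is the IEEE 754 single-precision encoding of the integer (every 16-bit value fits exactly in a 24-bit mantissa, so no rounding is involved). The function name `ui16_to_f32_bits` is hypothetical.

```python
import struct


def ui16_to_f32_bits(value: int) -> int:
    """Return the IEEE 754 single-precision bit pattern of an unsigned 16-bit integer."""
    value &= 0xFFFF  # only the low 16 bits take part in the conversion
    packed = struct.pack(">f", float(value))   # big-endian single-precision float
    return struct.unpack(">I", packed)[0]      # reinterpret the 4 bytes as an integer


# 0x800F (32783) is the operand used in the example for this instruction.
print(hex(ui16_to_f32_bits(0x800F)))  # 0x47000f00
```

The printed pattern matches the value the example for this entry lists for R6H.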
Example

    MOVXI R5H, #0x800F     ; R5H[15:0] = 32783 (0x800F)
    UI16TOF32 R6H, R5H     ; R6H = UI16TOF32 (R5H[15:0])
    NOP                    ; 1 cycle delay for UI16TOF32 to complete
                           ; R6H = 32783.0 (0x47000F00)

See also
    F32TOI16 RaH, RbH
    F32TOI16R RaH, RbH
    F32TOUI16 RaH, RbH
    F32TOUI16R RaH, RbH
    I16TOF32 RaH, RbH
    I16TOF32 RaH, mem16
    UI16TOF32 RaH, mem16

UI32TOF32 RaH, mem32   Convert Unsigned 32-bit Integer to 32-bit Floating-Point Value

Operands
    RaH     floating-point destination register (R0H to R7H)
    mem32   pointer to 32-bit source memory location

Opcode
    LSW: 1110 0010 1000 0100
    MSW: 0000 0aaa mem32

Description
    RaH = UI32ToF32[mem32]

Flags
This instruction does not affect any flags:

    Flag      TF  ZI  NI  ZF  NF  LUF  LVF
    Modified  No  No  No  No  No  No   No

Pipeline
This is a 2 pipeline cycle (2p) instruction. That is:

    UI32TOF32 RaH, mem32   ; 2 pipeline cycles (2p)
    NOP                    ; 1 cycle delay or non-conflicting instruction
                           ; <-- UI32TOF32 completes, RaH updated
    NOP

Any instruction in the delay slot must not use RaH as a destination register or as a source operand.

Example

    ; unsigned long X
    ; float Y, M, B
    ; ...
    ; Calculate Y = (float)X * M + B
    UI32TOF32 R0H, *-SP[2]   ; R0H = (float)X
    MOV32 R1H, *-SP[6]       ; R1H = M
                             ; <-- Conversion complete, R0H valid
    MPYF32 R0H, R1H, R0H     ; R0H = (float)X * M
    MOV32 R1H, *-SP[8]       ; R1H = B
                             ; <-- MPYF32 complete, R0H valid
    ADDF32 R0H, R0H, R1H     ; R0H = Y = (float)X * M + B
    NOP                      ; <-- ADDF32 complete, R0H valid
    MOV32 *-[SP], R0H        ; Store Y

See also
    F32TOI32 RaH, RbH
    F32TOUI32 RaH, RbH
    I32TOF32 RaH, mem32
    I32TOF32 RaH, RbH
    UI32TOF32 RaH, RbH

UI32TOF32 RaH, RbH   Convert Unsigned 32-bit Integer to 32-bit Floating-Point Value

Operands
    RaH   floating-point destination register (R0H to R7H)
    RbH   floating-point source register (R0H to R7H)

Opcode
    LSW: 1110 0110 1000 1011
    MSW: 0000 0000 00bb baaa

Description
    RaH = UI32ToF32[RbH]

Flags
This instruction does not affect any flags:

    Flag      TF  ZI  NI  ZF  NF  LUF  LVF
    Modified  No  No  No  No  No  No   No

Pipeline
This is a 2 pipeline cycle (2p) instruction. That is:

    UI32TOF32 RaH, RbH   ; 2 pipeline cycles (2p)
    NOP                  ; 1 cycle delay or non-conflicting instruction
                         ; <-- UI32TOF32 completes, RaH updated
    NOP

Any instruction in the delay slot must not use RaH as a destination register or as a source operand.
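Unlike the 16-bit case, a 32-bit integer can carry more significant bits than the 24-bit single-precision mantissa holds, so low-order bits may be dropped. The Python sketch below builds the bit pattern by hand to make that visible. It is an illustration only: the function name `ui32_to_f32_bits` is hypothetical, and exposing truncation versus round-to-nearest-even as a parameter is an assumption made for this example; which behavior the hardware applies should be checked against the rounding-mode description elsewhere in this manual.

```python
def ui32_to_f32_bits(value: int, round_nearest: bool = True) -> int:
    """Build the IEEE 754 single-precision bit pattern of an unsigned 32-bit integer."""
    value &= 0xFFFFFFFF
    if value == 0:
        return 0
    msb = value.bit_length() - 1      # position of the leading 1
    exponent = msb + 127              # biased exponent
    if msb <= 23:
        # The value fits in the mantissa: shift it up and drop the hidden bit.
        return (exponent << 23) | ((value << (23 - msb)) & 0x7FFFFF)
    shift = msb - 23
    kept = value >> shift                     # top 24 bits, hidden bit included
    remainder = value & ((1 << shift) - 1)    # bits that do not fit
    half = 1 << (shift - 1)
    if round_nearest and (remainder > half or (remainder == half and kept & 1)):
        kept += 1                             # may carry into the exponent
    return (exponent << 23) + (kept - (1 << 23))


# 0x80001111 is the operand used in the example for this instruction.
print(hex(ui32_to_f32_bits(0x80001111)))  # 0x4f000011
```

For this operand the discarded bits (0x11) are below the halfway point, so both settings of `round_nearest` give the same pattern; the two modes differ only for inputs such as 0xFFFFFFFF.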
Example

    MOVIZ R3H, #0x8000     ; R3H[31:16] = 0x8000
    MOVXI R3H, #0x1111     ; R3H[15:0] = 0x1111
                           ; R3H = 2147488017
    UI32TOF32 R4H, R3H     ; R4H = UI32TOF32 (R3H)
    NOP                    ; 1 cycle delay for UI32TOF32 to complete
                           ; R4H = 2147488017.0 (0x4F000011)

See also
    F32TOI32 RaH, RbH
    F32TOUI32 RaH, RbH
    I32TOF32 RaH, mem32
    I32TOF32 RaH, RbH
    UI32TOF32 RaH, mem32

ZERO RaH   Zero the Floating-Point Register RaH

Operands
    RaH   floating-point register (R0H to R7H)

Opcode
    LSW: 1110 0101 1001 0aaa

Description
Zero the indicated floating-point register:

    RaH = 0

Flags
This instruction does not modify any flags in the STF register:

    Flag      TF  ZI  NI  ZF  NF  LUF  LVF
    Modified  No  No  No  No  No  No   No

No flags affected.

Pipeline
This is a single-cycle instruction.

Example

    ;for(i = 0; i < n; i++)
    ;{
    ;   real += (x[2*i] * y[2*i]) - (x[2*i+1] * y[2*i+1]);
    ;   imag += (x[2*i] * y[2*i+1]) + (x[2*i+1] * y[2*i]);
    ;}
    ;Assume AR7 = n-1
        ZERO R4H                     ; R4H = real = 0
        ZERO R5H                     ; R5H = imag = 0
    LOOP
        MOV AL, AR7
        MOV ACC, AL << 2
        MOV AR0, ACC
        MOV32 R0H, *+XAR4[AR0]       ; R0H = x[2*i]
        MOV32 R1H, *+XAR5[AR0]       ; R1H = y[2*i]
        ADD AR0, #2
        MPYF32 R6H, R0H, R1H         ; R6H = x[2*i] * y[2*i]
     || MOV32 R2H, *+XAR4[AR0]       ; R2H = x[2*i+1]
        MPYF32 R1H, R1H, R2H         ; R1H = y[2*i] * x[2*i+1]
     || MOV32 R3H, *+XAR5[AR0]       ; R3H = y[2*i+1]
        MPYF32 R2H, R2H, R3H         ; R2H = x[2*i+1] * y[2*i+1]
     || ADDF32 R4H, R4H, R6H         ; R4H += x[2*i] * y[2*i]
        MPYF32 R0H, R0H, R3H         ; R0H = x[2*i] * y[2*i+1]
     || ADDF32 R5H, R5H, R1H         ; R5H += y[2*i] * x[2*i+1]
        SUBF32 R4H, R4H, R2H         ; R4H -= x[2*i+1] * y[2*i+1]
        ADDF32 R5H, R5H, R0H         ; R5H += x[2*i] * y[2*i+1]
        BANZ LOOP, AR7--

See also
    ZEROA
ZEROA   Zero All Floating-Point Registers

Operands
    none

Opcode
    LSW: 1110 0101 0110 0011

Description
Zero all floating-point registers:

    R0H = 0
    R1H = 0
    R2H = 0
    R3H = 0
    R4H = 0
    R5H = 0
    R6H = 0
    R7H = 0

Flags
This instruction does not modify any flags in the STF register:

    Flag      TF  ZI  NI  ZF  NF  LUF  LVF
    Modified  No  No  No  No  No  No   No

No flags affected.

Pipeline
This is a single-cycle instruction.

Example

    ;for(i = 0; i < n; i++)
    ;{
    ;   real += (x[2*i] * y[2*i]) - (x[2*i+1] * y[2*i+1]);
    ;   imag += (x[2*i] * y[2*i+1]) + (x[2*i+1] * y[2*i]);
    ;}
    ;Assume AR7 = n-1
        ZEROA                        ; Clear all RaH registers
    LOOP
        MOV AL, AR7
        MOV ACC, AL << 2
        MOV AR0, ACC
        MOV32 R0H, *+XAR4[AR0]       ; R0H = x[2*i]
        MOV32 R1H, *+XAR5[AR0]       ; R1H = y[2*i]
        ADD AR0, #2
        MPYF32 R6H, R0H, R1H         ; R6H = x[2*i] * y[2*i]
     || MOV32 R2H, *+XAR4[AR0]       ; R2H = x[2*i+1]
        MPYF32 R1H, R1H, R2H         ; R1H = y[2*i] * x[2*i+1]
     || MOV32 R3H, *+XAR5[AR0]       ; R3H = y[2*i+1]
        MPYF32 R2H, R2H, R3H         ; R2H = x[2*i+1] * y[2*i+1]
     || ADDF32 R4H, R4H, R6H         ; R4H += x[2*i] * y[2*i]
        MPYF32 R0H, R0H, R3H         ; R0H = x[2*i] * y[2*i+1]
     || ADDF32 R5H, R5H, R1H         ; R5H += y[2*i] * x[2*i+1]
        SUBF32 R4H, R4H, R2H         ; R4H -= x[2*i+1] * y[2*i+1]
        ADDF32 R5H, R5H, R0H         ; R5H += x[2*i] * y[2*i+1]
        BANZ LOOP, AR7--

See also
    ZERO RaH

Chapter 2
C28 Viterbi, Complex Math and CRC Unit-II (VCU-II)

This chapter provides an overview of the Viterbi, Complex Math and CRC Unit (VCU-II) and describes its architecture, pipeline, instruction set, and interrupts. The VCU is a fully-programmable block which accelerates the performance of communications-based algorithms.
In addition to eliminating the need for a second processor to manage the communications link, the performance gains of the VCU provide headroom for future system growth and higher bit rates or, conversely, enable devices to operate at a lower clock frequency to reduce system cost and power consumption.

Any references to VCU or VCU-II in this chapter relate to Type 2 specifically. Information pertaining to an older VCU will have the module type listed explicitly. See the TMS320x28xx, 28xxx DSP Peripheral Reference Guide (SPRU566) for a list of all devices with a VCU module of the same type, to determine the differences between the types, and for a list of device-specific differences within a type.

Topic
    2.1 Overview
    2.2 Components of the C28x Plus VCU
    2.3 Register Set
    2.4 Pipeline
    2.5 Instruction Set
    2.6 Rounding Mode
C28 Viterbi, Complex Math and CRC Unit-II (VCU-II) Page 141 142 146 154 159 379 SPRUHS1A – March 2014 – Revised December 2015 Submit Documentation Feedback Copyright © 2014–2015, Texas Instruments Incorporated Overview www.ti.com 2.1 Overview The C28x with VCU (C28x+VCU) processor extends the capabilities of the C28x fixed-point or floatingpoint CPU by adding registers and instructions to support the following algorithm types: • Viterbi decoding Viterbi decoding is commonly used in baseband communications applications. The viterbi decode algorithm consists of three main parts: branch metric calculations, compare-select (viterbi butterfly) and a traceback operation. Table 2-1 shows a summary of the VCU performance for each of these operations. Table 2-1. Viterbi Decode Performance Viterbi Operation (1) (2) • • VCU Cycles Branch Metric Calculation (code rate = 1/2) 1 Branch Metric Calculation (code rate = 1/3) 2p Viterbi Butterfly (add-compare-select) 2 (1) Traceback per Stage 3 (2) C28x CPU takes 15 cycles per butterfly. C28x CPU takes 22 cycles per stage. Cyclic redundancy check (CRC) CRC algorithms provide a straightforward method for verifying data integrity over large data blocks, communication packets, or code sections. The C28x+VCU can perform 8-, 16-, 24-, and 32-bit CRCs. For example, the VCU can compute the CRC for a block length of 10 bytes in 10 cycles. A CRC result register contains the current CRC which is updated whenever a CRC instruction is executed. Complex math Complex math is used in many applications. The VCU A few of which are: – Fast fourier transform (FFT) The complex FFT is used in spread spectrum communications, as well in many signal processing algorithms. – Complex filters Complex filters improve data reliability, transmission distance, and power efficiency. The C28x+VCU can perform a complex I and Q multiply with coefficients (four multiplies) in a single cycle. 
    In addition, the C28x+VCU can read or write the real and imaginary parts of 16-bit complex data from or to memory in a single cycle.

Table 2-2 summarizes the complex math operations enabled by the VCU:

Table 2-2. Complex Math Performance

Complex Math Operation        VCU Cycles  Notes
Add or Subtract               1           32 +/- 32 = 32-bit (Useful for filters)
Add or Subtract               1           16 +/- 32 = 15-bit (Useful for FFT)
Multiply                      2p          16 x 16 = 32-bit
Multiply & Accumulate (MAC)   2p          32 + 32 = 32-bit, 16 x 16 = 32-bit
RPT MAC                       2p+N        Repeat MAC. Single cycle after the first operation.

The C28x+VCU draws from the best features of digital signal processing; reduced instruction set computing (RISC); and microcontroller architectures, firmware, and tool sets. The C2000 features include a modified Harvard architecture and circular addressing. The RISC features are single-cycle instruction execution, register-to-register operations, and modified Harvard architecture (usable in Von Neumann mode). The microcontroller features include ease of use through an intuitive instruction set, byte packing and unpacking, and bit manipulation.

The modified Harvard architecture of the CPU enables instruction and data fetches to be performed in parallel. The CPU can read instructions and data while it writes data simultaneously to maintain the single-cycle instruction operation across the pipeline. The CPU does this over six separate address/data buses.

Throughout this document the following notations are used:
• C28x refers to the C28x fixed-point CPU.
• C28x plus Floating-Point and C28x+FPU both refer to the C28x CPU with enhancements to support IEEE single-precision floating-point operations.
• C28x plus VCU and C28x+VCU both refer to the C28x CPU with enhancements to support Viterbi decode, complex math, forward error correction algorithms, and CRC.
• Some devices have both the FPU and the VCU. These are referred to as C28x+FPU+VCU.

2.2 Components of the C28x Plus VCU

The VCU extends the capabilities of the C28x CPU and C28x+FPU processors by adding instructions. No changes have been made to existing instructions, pipeline, or memory bus architecture. Therefore, programs written for the C28x are completely compatible with the C28x+VCU. All of the features of the C28x documented in TMS320C28x DSP CPU and Instruction Set Reference Guide (literature number SPRU430) apply to the C28x+VCU. All features documented in the TMS320C28x Floating Point Unit and Instruction Set Reference Guide (SPRUE02) apply to the C28x+FPU+VCU. Figure 2-1 shows the block diagram of the VCU.

Figure 2-1. C28x + VCU Block Diagram
[Block diagram: the C28x + FPU + VCU connects to existing memory, peripherals, and interfaces over the memory bus: program address bus (22), program data bus (32), read address bus (32), read data bus (32), write address bus (32), and write data bus (32). The LVF and LUF signals and the PIE also appear in the diagram.]

The C28x+VCU contains the same features as the C28x fixed-point CPU:
• A central processing unit for generating data and program-memory addresses; decoding and executing instructions; performing arithmetic, logical, and shift operations; and controlling data transfers among CPU registers, data memory, and program memory.
• Emulation logic for monitoring and controlling various parts and functions of the device and for testing device operation. This logic is identical to that on the C28x fixed-point CPU.
• Signals for interfacing with memory and peripherals, clocking and controlling the CPU and the emulation logic, showing the status of the CPU and the emulation logic, and using interrupts. This logic is identical to the C28x fixed-point CPU.
• Arithmetic logic unit (ALU). The 32-bit ALU performs 2s-complement arithmetic and Boolean logic operations.
• Address register arithmetic unit (ARAU). The ARAU generates data memory addresses and increments or decrements pointers in parallel with ALU operations.
• Fixed-Point instructions are pipeline protected. This pipeline for fixed-point instructions is identical to that on the C28x fixed-point CPU. The CPU implements an 8-phase pipeline that prevents a write to and a read from the same location from occurring out of order.
• Barrel shifter. This shifter performs all left and right shifts of fixed-point data. It can shift data to the left by up to 16 bits and to the right by up to 16 bits.
• Fixed-Point Multiplier. The multiplier performs 32-bit × 32-bit 2s-complement multiplication with a 64-bit result. The multiplication can be performed with two signed numbers, two unsigned numbers, or one signed number and one unsigned number.

The VCU adds the following features:
• Instructions to support Cyclic Redundancy Check (CRC) or a polynomial code checksum
  – CRC8
  – CRC16
  – CRC32
  – CRC24
• Clocked at the same rate as the main CPU (SYSCLKOUT).
• Instructions to support a software implementation of a Viterbi Decoder of constraint length 4 - 7 and code rates of 1/2 and 1/3
  – Branch metrics calculations
  – Add-Compare Select or Viterbi Butterfly
  – Traceback
• Complex Math Arithmetic Unit
  – Add or Subtract
  – Multiply
  – Multiply and Accumulate (MAC)
  – Repeat MAC (RPT || MAC).
• Independent register space. These registers function as source and destination registers for VCU instructions.
• Some VCU instructions require pipeline alignment. This alignment is done through software to allow the user to improve performance by taking advantage of required delay slots. See Section 2.4 for more information.

Devices with the floating-point unit also include:
• Floating point unit (FPU). The 32-bit FPU performs IEEE single-precision floating-point operations.
• Dedicated floating-point registers.

2.2.1 Emulation Logic

The emulation logic is identical to that on the C28x fixed-point CPU. This logic includes the following features. For more details about these features, refer to the TMS320C28x DSP CPU and Instruction Set Reference Guide (literature number SPRU430):
• Debug-and-test direct memory access (DT-DMA). A debug host can gain direct access to the content of registers and memory by taking control of the memory interface during unused cycles of the instruction pipeline.
• A counter for performance benchmarking.
• Multiple debug events. Any of the following debug events can cause a break in program execution:
  – A breakpoint initiated by the ESTOP0 or ESTOP1 instruction.
  – An access to a specified program-space or data-space location.
  When a debug event causes the C28x to enter the debug-halt state, the event is called a break event.
• Real-time mode of operation.

2.2.2 Memory Map

Like the C28x, the C28x+VCU uses 32-bit data addresses and 22-bit program addresses. This allows for a total address reach of 4G words (1 word = 16 bits) in data space and 4M words in program space. Memory blocks on all C28x+VCU designs are uniformly mapped to both program and data space. For specific details about each of the map segments, see the device-specific data manual.

2.2.3 CPU Interrupt Vectors

The C28x+VCU interrupt vectors are identical to those on the C28x CPU.
Sixty-four addresses in program space are set aside for a table of 32 CPU interrupt vectors. For more information about the CPU vectors, see TMS320C28x CPU and Instruction Set Reference Guide (literature number SPRU430). Typically the CPU interrupt vectors are only used during the boot up of the device by the boot ROM. Once an application has taken control it should initialize and enable the peripheral interrupt expansion block (PIE).

2.2.4 Memory Interface

The C28x+VCU memory interface is identical to that on the C28x. The C28x+VCU memory map is accessible outside the CPU by the memory interface, which connects the CPU logic to memories, peripherals, or other interfaces. The memory interface includes separate buses for program space and data space. This means an instruction can be fetched from program memory while data memory is being accessed.

The interface also includes signals that indicate the type of read or write being requested by the CPU. These signals can select a specified memory block or peripheral for a given bus transaction. In addition to 16-bit and 32-bit accesses, the CPU supports special byte-access instructions that can access the least significant byte (LSByte) or most significant byte (MSByte) of an addressed word. Strobe signals indicate when such an access is occurring on a data bus.

2.2.5 Address and Data Buses

Like the C28x, the memory interface has three address buses:
• PAB: Program address bus: The 22-bit PAB carries addresses for reads and writes from program space.
• DRAB: Data-read address bus: The 32-bit DRAB carries addresses for reads from data space.
• DWAB: Data-write address bus: The 32-bit DWAB carries addresses for writes to data space.

The memory interface also has three data buses:
• PRDB: Program-read data bus: The 32-bit PRDB carries instructions during reads from program space.
• DRDB: Data-read data bus: The 32-bit DRDB carries data during reads from data space.
• DWDB: Data-/Program-write data bus: The 32-bit DWDB carries data during writes to data space or program space.

A program-space read and a program-space write cannot happen simultaneously because both use the PAB. Similarly, a program-space write and a data-space write cannot happen simultaneously because both use the DWDB. Transactions that use different buses can happen simultaneously. For example, the CPU can read from program space (using PAB and PRDB), read from data space (using DRAB and DRDB), and write to data space (using DWAB and DWDB) at the same time. This behavior is identical to the C28x CPU.

2.2.6 Alignment of 32-Bit Accesses to Even Addresses

The C28x+VCU expects memory wrappers or peripheral-interface logic to align any 32-bit read or write to an even address. If the address-generation logic generates an odd address, the CPU will begin reading or writing at the previous even address. This alignment does not affect the address values generated by the address-generation logic.

Most instruction fetches from program space are performed as 32-bit read operations and are aligned accordingly. However, alignment of instruction fetches is effectively invisible to a programmer. When instructions are stored to program space, they do not have to be aligned to even addresses. Instruction boundaries are decoded within the CPU.

You need to be concerned with alignment when using instructions that perform 32-bit reads from or writes to data space.
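The even-address rule described above can be illustrated with a small behavioral model. This is only a sketch in Python, not device code: the helper names `aligned_access_address` and `read_word_pair`, the dictionary used as word-addressed memory, and the sample addresses and values are all hypothetical and chosen for illustration.

```python
def aligned_access_address(address: int) -> int:
    """Word address at which a 32-bit access starts (hypothetical helper).

    Clearing bit 0 models the rule that an odd address falls back to the
    previous even address; an even address is left unchanged.
    """
    return address & ~1


def read_word_pair(memory: dict, address: int) -> tuple:
    """Return the two 16-bit words touched by a 32-bit read at `address`."""
    base = aligned_access_address(address)
    return memory[base], memory[base + 1]


# Hypothetical word-addressed memory contents (1 word = 16 bits).
memory = {0x8000: 0x1111, 0x8001: 0x2222, 0x8002: 0x3333, 0x8003: 0x4444}

# A 32-bit read at the odd address 0x8001 touches the same words as one at 0x8000.
same_pair = read_word_pair(memory, 0x8001) == read_word_pair(memory, 0x8000)
```

In this model a pointer that is advanced by one word between 32-bit reads returns the same word pair twice, which is the kind of data-space mistake the section warns about; advancing by two words avoids it.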
2.3 Register Set

Devices with the C28x+VCU include the standard C28x register set plus an additional set of VCU-specific registers. The additional VCU registers are the following:
• Result registers: VR0, VR1... VR8
• Traceback registers: VT0, VT1
• Configuration and status register: VSTATUS
• CRC result register: VCRC
• Repeat block register: RB

Figure 2-2 shows the register sets for the 28x CPU, the FPU and the VCU. The following section discusses the VCU register set in detail.

Figure 2-2. C28x + FPU + VCU Registers
[Register diagram with three groups.
Standard C28x Register Set: ACC (32-bit), P (32-bit), XT (32-bit), XAR0 - XAR7 (32-bit), PC (22-bit), RPC (22-bit), DP (16-bit), SP (16-bit), ST0 (16-bit), ST1 (16-bit), IER (16-bit), IFR (16-bit), DBGIER (16-bit).
Additional 32-bit FPU Registers: R0H - R7H (32-bit), FPU Status Register (STF), Repeat Block Register (RB). FPU registers R0H - R7H and STF are shadowed for fast context save and restore.
Standard VCU Register Set: VR0 - VR8, VT0, VT1, VSTATUS, VCRC, VSM0 - VSM63.]

2.3.1 VCU Register Set

Table 2-3 describes the VCU module register set. The last three columns indicate whether the particular module within the VCU can make use of the register.

Table 2-3. VCU Register Set

Register Name         Size     Description                                    Viterbi  Complex Math  CRC
VR0                   32 bits  General purpose register 0                     Yes      Yes           No
VR1                   32 bits  General purpose register 1                     Yes      Yes           No
VR2                   32 bits  General purpose register 2                     Yes      Yes           No
VR3                   32 bits  General purpose register 3                     Yes      Yes           No
VR4                   32 bits  General purpose register 4                     Yes      Yes           No
VR5                   32 bits  General purpose register 5                     Yes      Yes           No
VR6                   32 bits  General purpose register 6                     Yes      Yes           No
VR7                   32 bits  General purpose register 7                     Yes      Yes           No
VR8                   32 bits  General purpose register 8                     Yes      No            No
VT0                   32 bits  32-bit transition bit register 0               Yes      No            No
VT1                   32 bits  32-bit transition bit register 1               Yes      No            No
VSTATUS               32 bits  VCU status and configuration register          Yes      Yes           No
VCRC                  32 bits  Cyclic redundancy check (CRC) result register  No       No            Yes
VSM0-VSM63            32 bits  Viterbi Decoding State Metric registers        Yes      No            No
VRx.By x=0–7 y=0-3    32 bits  Aliased address space for each byte of the     No       No            No
                               VRx registers, left-shifted by one

(1) Debugger writes are not allowed to the VSTATUS register.

Table 2-4 lists the CPU registers available on devices with the C28x, the C28x+FPU, the C28x+VCU and the C28x+FPU+VCU.

Table 2-4.
28x CPU Register Summary

Register              C28x CPU  C28x+FPU   C28x+VCU   C28x+FPU+VCU  Description
ACC                   Yes       Yes        Yes        Yes           Fixed-point accumulator
AH                    Yes       Yes        Yes        Yes           High half of ACC
AL                    Yes       Yes        Yes        Yes           Low half of ACC
XAR0 - XAR7           Yes       Yes        Yes        Yes           Auxiliary register 0 - 7
AR0 - AR7             Yes       Yes        Yes        Yes           Low half of XAR0 - XAR7
DP                    Yes       Yes        Yes        Yes           Data-page pointer
IFR                   Yes       Yes        Yes        Yes           Interrupt flag register
IER                   Yes       Yes        Yes        Yes           Interrupt enable register
DBGIER                Yes       Yes        Yes        Yes           Debug interrupt enable register
P                     Yes       Yes        Yes        Yes           Fixed-point product register
PH                    Yes       Yes        Yes        Yes           High half of P
PL                    Yes       Yes        Yes        Yes           Low half of P
PC                    Yes       Yes        Yes        Yes           Program counter
RPC                   Yes       Yes        Yes        Yes           Return program counter
SP                    Yes       Yes        Yes        Yes           Stack pointer
ST0                   Yes       Yes        Yes        Yes           Status register 0
ST1                   Yes       Yes        Yes        Yes           Status register 1
XT                    Yes       Yes        Yes        Yes           Fixed-point multiplicand register
T                     Yes       Yes        Yes        Yes           High half of XT
TL                    Yes       Yes        Yes        Yes           Low half of XT
R0H - R7H             No        Yes        No         Yes           Floating-point Unit result registers
STF                   No        Yes        No         Yes           Floating-point Unit status register
RB                    No        Yes        Yes        Yes           Repeat block register
VR0 - VR8             No        No         Yes        Yes           VCU general purpose registers
VT0, VT1              No        No         Yes        Yes           VCU transition bit register 0 and 1
VSTATUS               No        No         Yes        Yes           VCU status and configuration
VCRC                  No        No         Yes        Yes           CRC result register
VSM0-VSM63            No        No         Yes (1)    Yes (1)       Viterbi State Metric Registers
VRx.By x=0–7 y=0–3    No        No         Yes (1)    Yes (1)       Aliased address space for each byte of the VRx registers, left-shifted by one

(1) Present on Type-2 VCU only

2.3.2 VCU Status Register (VSTATUS)

The VCU status register (VSTATUS) is described in Figure 2-3. There is no single instruction to directly transfer the VSTATUS register to a C28x register. To transfer the contents:
1. Store VSTATUS into memory using the VMOV32 mem32, VSTATUS instruction.
2. Load the value from memory into a main C28x CPU register.

Configuration bits within the VSTATUS register are set or cleared using VCU instructions.

Figure 2-3. VCU Status Register (VSTATUS)

Bits    Field        Access-Reset
31      CRCMSGFLIP   R/W-0
30      DIVE         R/W-0
29-27   K            R/W-7
26-24   GFORDER      R/W-7
23-16   GFPOLY
15      OPACK        R/W-0
14      CPACK        R/W-0
13      OVFI         R/W-0
12      OVFR         R/W-0
11      RND          R/W-0
10      SAT          R/W-0
9-5     SHIFTL       R/W-0
4-0     SHIFTR       R/W-0

LEGEND: R/W = Read/Write; R = Read only; -n = value after reset

Table 2-5. VCU Status (VSTATUS) Register Field Descriptions

Bit 31: CRCMSGFLIP (1), CRC Message Flip
    This bit affects the order in which the bits in the message are taken for CRC calculation by all the CRC instructions.
    0: Message bits are taken starting from most-significant to least-significant for CRC computation. In this case, bytes loaded from memory are fed directly for CRC computation.
    1: Message bits are taken starting from least-significant to most-significant for CRC computation. In this case, bytes loaded from memory are "flipped" and then fed for CRC computation.

Bit 30: DIVE (1), Divide-by-zero Error
    Values 0, 1: Indicates whether a "divide by zero" occurred during a VMOD32 computation. This bit is cleared by executing the VCLRDIVE instruction.

Bits 29-27: K (1), Constraint Length for Viterbi Decoding
    Value 0x7: This field sets the constraint length for the Viterbi decoding algorithm. It accepts values of 4 to 7. Values outside this range will be treated as 7 by the hardware.

Bits 26-24: GFORDER (1), Galois Field Polynomial Order
    Value 0x7: This field holds the order of the polynomial for all the Galois Field instructions. This field is initialized by the VGFINIT mem16 instruction. The actual order of the polynomial is GFORDER+1.

Bits 23-16: GFPOLY (1), Galois Field Polynomial
    Value 0: This field holds the polynomial for all the Galois Field instructions. This field is initialized by the VGFINIT mem16 instruction.

Bit 15: OPACK (1), Viterbi Traceback Packing Order
    This bit affects the packing order of the traceback output bits (using the VTRACE instructions).
    0: Big-endian (compatible with VCU Type-0 output packing order)
    1: Little-endian (VCU Type-2 mode)

Bit 14: CPACK (1), Complex Packing Order
    This bit affects the packing order of the 16-bit real and 16-bit imaginary parts of a complex number inside the 32-bit general purpose VRx register.
    0: VRx[31:16] holds Real part, VRx[15:0] holds Imaginary part (VCU-I compatible mode)
    1: VRx[31:16] holds Imaginary part; VRx[15:0] holds Real part

Bit 13: OVFI, Overflow or Underflow Flag: Imaginary Part
    0: No overflow or underflow has been detected.
    1: Indicates an overflow or underflow has occurred during the computation of the imaginary part of operations shown in Table 2-6. This bit will be set regardless of the value of the VSTATUS[SAT] bit. The OVFI bit will remain set until it is cleared by executing the VCLROVFI instruction.

Bit 12: OVFR, Overflow or Underflow Flag: Real Part
    0: No overflow or underflow has been detected.
    1: Indicates overflow or underflow has occurred during a real number calculation for operations shown in Table 2-6. This bit will be set regardless of the value of the VSTATUS[SAT] bit. This bit will remain set until it is cleared by executing the VCLROVFR instruction.

Bit 11: RND, Rounding
    When a right-shift operation is performed the lower bits of the value will be lost. The RND bit determines if the shifted value is rounded or if the shifted-out bits are simply truncated. This is described in Section 2.6. Operations which use right-shift and rounding are shown in Table 2-6. The RND bit is set by the VRNDON instruction and cleared by the VRNDOFF instruction.
    0: Rounding is not performed. Bits shifted out right are truncated.
    1: Rounding is performed. Refer to the instruction descriptions for information on how the operation is affected by the RND bit.

Bit 10: SAT, Saturation
    This bit determines whether saturation will be performed for operations shown in Table 2-6. The SAT bit is set by the VSATON instruction and is cleared by the VSATOFF instruction.
    0: No saturation is performed.
    1: Saturation is performed.

Bits 9-5: SHIFTL, Left Shift
    Operations which use left-shift are shown in Table 2-6. The SHIFTL field can be set or cleared by the VSETSHL instruction.
    0: No left shift.
    0x01 - 0x1F: Refer to the instruction description for information on how the operation is affected by the shift value. During the left-shift, the lower bits are filled with 0's.

Bits 4-0: SHIFTR, Right Shift
    Operations which use right-shift and rounding are shown in Table 2-6. The SHIFTR field can be set or cleared by the VSETSHR instruction.
    0: No right shift.
    0x01 - 0x1F: Refer to the instruction descriptions for information on how the operation is affected by the shift value. During the right-shift, the lower bits are lost, and the shifted value is sign extended. If rounding is enabled (VSTATUS[RND] == 1), then the value will be rounded instead of truncated.

(1) Present on Type-2 VCU only.

Table 2-6 shows a summary of the operations that are affected by or modify bits in the VSTATUS register.

Table 2-6.
Operation Interaction With VSTATUS Bits

Operation (1)    Description                                           OVFI  OVFR  RND  SAT  SHIFTL  SHIFTR  CPACK  OPACK  DIVE
VITDLADDSUB      Viterbi Add and Subtract Low                          -     Y     -    Y    -       -       -      -      -
VITDHADDSUB      Viterbi Add and Subtract High                         -     Y     -    Y    -       -       -      -      -
VITDLSUBADD      Viterbi Subtract and Add Low                          -     Y     -    Y    -       -       -      -      -
VITDHSUBADD      Viterbi Subtract and Add High                         -     Y     -    Y    -       -       -      -      -
VITBM2           Viterbi Branch Metric CR 1/2                          -     Y     -    Y    -       -       -      -      -
VITBM3           Viterbi Branch Metric CR 1/3                          -     Y     -    Y    -       -       -      -      -
VTRACE (2)       Viterbi Trace-back                                    -     -     -    -    -       -       -      Y      -
VITSTAGE (2)     Viterbi Compute 32 Butterfly                          -     Y     -    Y    -       -       -      -      -
VCADD            Complex 32 + 32 = 32                                  Y     Y     Y    Y    -       Y       -      -      -
VCDADD16         Complex 16 + 32 = 32                                  Y     Y     Y    Y    Y       Y       -      -      -
VCDSUB16         Complex 16 - 32 = 32                                  Y     Y     Y    Y    Y       Y       -      -      -
VCMAC            Complex 32 + 32 = 32, 16 x 16 = 32                    Y     Y     Y    Y    -       Y       -      -      -
VCCMAC (2)       Complex Conjugate 32 + 32 = 32, 16 x 16 = 32          Y     Y     Y    Y    -       Y       Y      -      -
VCMPY            Complex 16 x 16 = 32                                  Y     Y     -    Y    -       -       Y      -      -
VCCMPY (2)       Complex Conjugate 16 x 16 = 32                        Y     Y     -    Y    -       -       Y      -      -
VCSUB            Complex 32 - 32 = 32                                  Y     Y     Y    Y    -       Y       -      -      -
VCCON (2)        Complex Conjugate                                     Y     -     -    Y    -       -       Y      -      -
VCSHL16 (2)      Complex Shift Left                                    Y     Y     -    Y    -       -       Y      -      -
VCSHR16 (2)      Complex Shift Right                                   -     -     Y    -    -       -       -      -      -
VCMAG (2)        Complex Number Magnitude                              -     Y     Y    Y    -       -       -      -      -
VNEG             Two’s Complement Negation                             -     Y     -    Y    -       -       -      -      -
VASHR32 (2)      Arithmetic Shift Right                                -     -     Y    -    -       -       -      -      -
VASHL32 (2)      Arithmetic Shift Left                                 -     Y     -    Y    -       -       -      -      -
VMPYADD (2)      Arithmetic Multiply Add 16 + ((16 x 16) >> SHR) = 16  -     Y     Y    Y    -       Y       -      -      -
VCFFTx (2)       Complex FFT calculation step (x = 1 - 10)             Y     Y     Y    Y    -       Y       -      -      -
VMOD32           Modulo 32 % 16 = 16                                   -     -     -    -    -       -       -      -      Y

(1) Some parallel instructions also include these operations. In this case, the operation will also modify, or be affected by, VSTATUS bits as when used as part of a parallel instruction.
(2) Present on Type-2 VCU only.

2.3.3 Repeat Block Register (RB)

The repeat block instruction (RPTB) applies to devices with the C28x+FPU and the C28x+VCU. This instruction allows you to repeat a block of code as shown in Example 2-1.

Example 2-1. The Repeat Block (RPTB) Instruction uses the RB Register

    ; find the largest element and put its address in XAR6
    ;
    ; This example makes use of floating-point (C28x + FPU) instructions
    ;
        MOV32   R0H, *XAR0++
        .align  2                      ; Aligns the next instruction to an even address
        NOP                            ; Makes RPTB odd aligned - required for a block size of 8
        RPTB    VECTOR_MAX_END, AR7    ; RA is set to 1
        MOVL    ACC,XAR0
        MOV32   R1H,*XAR0++            ; RSIZE reflects the size of the RPTB block
        MAXF32  R0H,R1H                ; in this case the block size is 8
        MOVST0  NF,ZF
        MOVL    XAR6,ACC,LT
    VECTOR_MAX_END:                    ; RE indicates the end address. RA is cleared

The C28x FPU or VCU automatically populates the RB register based on the execution of a RPTB instruction. This register is not normally read by the application and does not accept debugger writes.

Figure 2-4. Repeat Block Register (RB)

Bits    Field   Access-Reset
31      RAS     R-0
30      RA      R-0
29-23   RSIZE   R-0
22-16   RE      R-0
15-0    RC      R-0

LEGEND: R = Read only; -n = value after reset

Table 2-7. Repeat Block (RB) Register Field Descriptions

Bit 31: RAS, Repeat Block Active Shadow Bit
    When an interrupt occurs the repeat active, RA, bit is copied to the RAS bit and the RA bit is cleared. When an interrupt return instruction occurs, the RAS bit is copied to the RA bit and RAS is cleared.
    0: A repeat block was not active when the interrupt was taken.
    1: A repeat block was active when the interrupt was taken.

Bit 30: RA, Repeat Block Active Bit
    0: This bit is cleared when the repeat counter, RC, reaches zero. When an interrupt occurs the RA bit is copied to the repeat active shadow, RAS, bit and RA is cleared. When an interrupt return, IRET, instruction is executed, the RAS bit is copied to the RA bit and RAS is cleared.
    1: This bit is set when the RPTB instruction is executed to indicate that a RPTB is currently active.

Bits 29-23: RSIZE, Repeat Block Size
    This 7-bit value specifies the number of 16-bit words within the repeat block. This field is initialized when the RPTB instruction is executed. The value is calculated by the assembler and inserted into the RPTB instruction's RSIZE opcode field.
    0-7: Illegal block size.
    8/9-0x7F: A RPTB block that starts at an even address must include at least 9 16-bit words and a block that starts at an odd address must include at least 8 16-bit words. The maximum block size is 127 16-bit words. The codegen assembler will check for proper block size and alignment.

Bits 22-16: RE, Repeat Block End Address
    This 7-bit value specifies the end address location of the repeat block. The RE value is calculated by hardware based on the RSIZE field and the PC value when the RPTB instruction is executed.
    RE = lower 7 bits of (PC + 1 + RSIZE)

Bits 15-0: RC, Repeat Count
    0: The block will not be repeated; it will be executed only once. In this case the repeat active, RA, bit will not be set.
    1-0xFFFF: This 16-bit value determines how many times the block will repeat. The counter is initialized when the RPTB instruction is executed and is decremented when the PC reaches the end of the block. When the counter reaches zero, the repeat active bit is cleared and the block will be executed one more time. Therefore the total number of times the block is executed is RC+1.

2.4 Pipeline

This section describes the VCU pipeline stages and presents cases where pipeline alignment must be considered.

2.4.1 Pipeline Overview

The C28x VCU pipeline is identical to the C28x pipeline for all standard C28x instructions. In the decode2 stage (D2), it is determined if an instruction is a C28x instruction, a FPU instruction, or a VCU instruction. The pipeline flow is shown in Figure 2-5. Notice that stalls due to normal C28x pipeline stalls (D2) and memory waitstates (R2 and W) will also stall any C28x VCU instruction. Most C28x VCU instructions are single cycle and will complete in the VCU E1 or W stage which aligns to the C28x pipeline. Some instructions will take an additional execute cycle (E2). For these instructions you must wait a cycle for the result from the instruction to be available. The rest of this section will describe when delay cycles are required. Keep in mind that the assembly tools for the C28x+VCU will issue an error if a delay slot has not been handled correctly.

Figure 2-5. C28x + FPU + VCU Pipeline
[Pipeline diagram. C28x pipeline phases: Fetch (F1, F2), Decode (D1, D2), Read (R1, R2), Exe (E), Write (W). FPU instruction phases: D, R, E1, E2, W. VCU instruction phases: D, R, E1, E2, W. Operations marked in the diagram: Load; Store; Complex ADD/SUB; Viterbi ADDSUB/SUBADD; FPU ADD/SUB/MPY; Complex MPY.]

2.4.2 General Guidelines for VCU Pipeline Alignment

The majority of the VCU instructions do not require any special pipeline considerations. This section lists the few operations that do require special consideration.
While the C28x+VCU assembler will issue errors for pipeline conflicts, you may still find it useful to understand when software delays are required. This section describes three guidelines you can follow when writing C28x+VCU assembly code. VCU instructions that require delay slots have a 'p' after their cycle count. For example '2p' stands for 2 pipelined cycles. This means that an instruction can be started every cycle, but the result of the instruction will only be valid one instruction later. Table 2-8 outlines the instructions that need delay slots. 154 C28 Viterbi, Complex Math and CRC Unit-II (VCU-II) SPRUHS1A – March 2014 – Revised December 2015 Submit Documentation Feedback Copyright © 2014–2015, Texas Instruments Incorporated Pipeline www.ti.com Table 2-8. Operations Requiring a Delay Slot(s) Operation (1) Description VCMAC Complex 32 + 32 = 32, 16 x 16 = 32 2p Complex Conjugate 32 + 32 = 32, 16 x 16 = 32 2p Complex 16 x 16 = 32 2p Complex Conjugate 16 x 16 = 32 2p VCMPY VCCMPY (3) VCMAG (3) Complex Number Magnitude VCFFTx (3) Complex FFT calculation step (x = 1 – 10) VMOD32 Modulo 32 % 16 = 16 9p Arithmetic Multiply Add 16 + ((16 x 16) >> SHR) = 16 2p VMPYADD (3) (2) (3) 2p/2 (2) Viterbi Branch Metric CR 1/3 VCCMAC (3) (1) Cycles VITBM3 2 2p/2 (2) Some parallel instructions also include these operations. In this case, the operation will also modify, or be affected by, VSTATUS bits as when used as part of a parallel instruction. Variations of the instruction execute differently. In these cases, the user is referred to the description Example 2-2 of the instruction(s) in Section 2.5. Present on Type-2 VCU only. An example of the complex multiply instruction is shown in Example 2-2. VCMPY is a 2p instruction and therefore requires one delay slot. The destination registers for the operation, VR2 and VR3, will be updated one cycle after the instruction for a total of two cycles. 
Therefore, a NOP or an instruction that does not use VR2 or VR3 must follow this instruction. Any memory stall or pipeline stall will also stall the VCU. This keeps the VCU aligned with the C28x pipeline, so there is no need to change the code based on the waitstates of a memory block.

Example 2-2. 2p Instruction Pipeline Alignment

    VCMPY VR3, VR2, VR1, VR0    ; 2 pipeline cycles (2p)
    NOP                         ; 1 cycle delay or non-conflicting instruction
                                ; <-- VCMPY completes, VR2 and VR3 updated
    NOP                         ; Any instruction

2.4.3 Parallel Instructions

Parallel instructions are single opcodes that perform two operations in parallel. The guidelines provided in Section 2.4.2 apply to parallel instructions as well. In this case the cycle count will be given for both operations. For example, a branch metric calculation for code rate of 1/3 with a parallel load takes 2p/1 cycles. This means the branch metric portion of the operation takes two pipelined cycles while the move portion of the operation is single cycle. NOPs or other non-conflicting instructions must be inserted to align the branch metric calculation portion of the operation as shown in Example 2-4.

Example 2-3. Branch Metric CR 1/2 Calculation with Parallel Load

    ; VITBM2 || VMOV32 instruction: branch metrics calculation with parallel load
    ; VITBM2 is a 1 cycle operation (code rate = 1/2)
    ; VMOV32 is a 1 cycle operation
    ;
       VITBM2 VR0               ; Load VR0 with the 2 branch metrics
    || VMOV32 VR2, @Val         ; VR2 gets the contents of Val
                                ; <-- VMOV32 completes here (VR2 is valid)
                                ; <-- VITBM2 completes here (VR0 is valid)
                                ; Any instruction, can use VR2 and/or VR0

Example 2-4. Branch Metric CR 1/3 Calculation with Parallel Load

    ; VITBM3 || VMOV32 instruction: branch metrics calculation with parallel load
    ; VITBM3 is a 2p cycle operation (code rate = 1/3)
    ; VMOV32 is a 1 cycle operation
    ;
       VITBM3 VR0, VR1, VR2     ; Load VR0 and VR1 with the 4 branch metrics
    || VMOV32 VR2, @Val         ; VR2 gets the contents of Val
                                ; <-- VMOV32 completes here (VR2 is valid)
                                ; Must not use VR0 or VR1. Can use VR2.
                                ; <-- VITBM3 completes here (VR0, VR1 are valid)
                                ; Any instruction, can use VR2 and/or VR0

2.4.4 Invalid Delay Instructions

All VCU, FPU and fixed-point instructions can be used in VCU instruction delay slots as long as source and destination register conflicts are avoided. The C28x+VCU assembler will issue an error anytime you use a conflicting instruction within a delay slot. The following guidelines can be used to avoid these conflicts.

NOTE: Destination register conflicts in delay slots: Any operation used for pipeline alignment delay must not use the same destination register as the instruction requiring the delay. See Example 2-5.

In Example 2-5 the VCMPY instruction uses VR2 and VR3 as its destination registers. The next instruction should not use VR2 or VR3 as a destination. Since the VMOV32 instruction uses the VR3 register, a pipeline conflict will be issued by the assembler. This conflict can be resolved by using a register other than VR2 or VR3 for the VMOV32 instruction as shown in Example 2-6.

Example 2-5. Destination Register Conflict

    ; Invalid delay instruction.
    ; Both instructions use the same destination register (VR3)
    ;
    VCMPY VR3, VR2, VR1, VR0    ; 2p instruction
    VMOV32 VR3, mem32           ; Invalid delay instruction
                                ; <-- VCMPY completes, VR3, VR2 are valid

Example 2-6. Destination Register Conflict Resolved

    ; Valid delay instruction
    ;
    VCMPY VR3, VR2, VR1, VR0    ; 2p instruction
    VMOV32 VR7, mem32           ; Valid delay instruction

NOTE: Instructions in delay slots cannot use the instruction's destination register as a source register. Any operation used for pipeline alignment delay must not use the destination register of the instruction requiring the delay as a source register as shown in Example 2-7. For parallel instructions, the current value of a register can be used in the parallel operation before it is overwritten as shown in Example 2-9.

In Example 2-7 the VCMPY instruction again uses VR3 and VR2 as its destination registers. The next instruction should not use VR3 or VR2 as its source since the VCMPY will take an additional cycle to complete. Since the VCADD instruction uses VR2 as a source register, a pipeline conflict will be issued by the assembler. The use of VR3 will also cause a pipeline conflict. This conflict can be resolved by using a register other than VR2 or VR3 or by inserting a non-conflicting instruction between the VCMPY and VCADD instructions. Since the VNEG does not use VR2 or VR3, this instruction can be moved before the VCADD as shown in Example 2-8.

Example 2-7. Destination/Source Register Conflict

    ; Invalid delay instruction.
    ; VCADD should not use VR2 or VR3 as a source operand
    ;
    VCMPY VR3, VR2, VR1, VR0    ; 2p instruction
    VCADD VR5, VR4, VR3, VR2    ; Invalid delay instruction
    VNEG VR0                    ; <- VCMPY completes, VR3, VR2 valid

Example 2-8. Destination/Source Register Conflict Resolved

    ; Valid delay instruction.
    ;
    VCMPY VR3, VR2, VR1, VR0    ; 2p instruction
    VNEG VR0                    ; Non-conflicting instruction or NOP
    VCADD VR5, VR4, VR3, VR2    ; <- VCMPY completes, VR3, VR2 valid

It should be noted that a source register for the second operation within a parallel instruction can be the same as the destination register of the first operation. This is because the two operations are started at the same time.
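The two delay-slot rules illustrated by Examples 2-5 through 2-8 can be sketched as a small host-side checker. This is only an illustrative model in Python, not the assembler's actual algorithm: the `Instr` class, the `delay_slots` field, and the function name are hypothetical, parallel (`||`) instructions and pipeline stalls are not modeled, and the source/destination sets are limited to the operands this section discusses.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    """One non-parallel instruction (hypothetical model, not a TI API)."""
    name: str
    dests: tuple = ()
    srcs: tuple = ()
    delay_slots: int = 0  # assumption: 1 for a 2p instruction, 0 for single cycle


def find_delay_slot_conflicts(program):
    """Return (slot_index, kind, registers) for each delay-slot violation."""
    conflicts = []
    for i, ins in enumerate(program):
        pending = set(ins.dests)  # registers not yet updated by `ins`
        last = min(i + 1 + ins.delay_slots, len(program))
        for slot_index in range(i + 1, last):
            slot = program[slot_index]
            clash = pending & set(slot.dests)
            if clash:  # same destination as the instruction needing the delay
                conflicts.append((slot_index, "destination", sorted(clash)))
            clash = pending & set(slot.srcs)
            if clash:  # reads a result that is not valid yet
                conflicts.append((slot_index, "source", sorted(clash)))
    return conflicts


# Operand roles follow the text: VCMPY writes VR3/VR2, VCADD reads VR3/VR2.
VCMPY = Instr("VCMPY", dests=("VR3", "VR2"), srcs=("VR1", "VR0"), delay_slots=1)
VCADD = Instr("VCADD", dests=("VR5", "VR4"), srcs=("VR3", "VR2"))
VNEG = Instr("VNEG", dests=("VR0",), srcs=("VR0",))

example_2_5 = [VCMPY, Instr("VMOV32", dests=("VR3",))]
example_2_6 = [VCMPY, Instr("VMOV32", dests=("VR7",))]
example_2_7 = [VCMPY, VCADD, VNEG]
example_2_8 = [VCMPY, VNEG, VCADD]

for label, prog in [("2-5", example_2_5), ("2-6", example_2_6),
                    ("2-7", example_2_7), ("2-8", example_2_8)]:
    print("Example", label, find_delay_slot_conflicts(prog))
```

In this model Examples 2-5 and 2-7 are flagged and Examples 2-6 and 2-8 are clean, which matches the verdicts given in the text. The assembler remains the authority on real code.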
The second operation is not in the delay slot of the first operation. Consider Example 2-9 where the VCMPY uses VR3 and VR2 as its destination registers. The VMOV32 is the second operation in the instruction and can freely use VR3 or VR2 as a source register. In the example, the contents of VR3 before the multiply will be used by VMOV32.

Example 2-9. Parallel Instruction Destination/Source Exception

    ; Valid parallel operation.
    ;
       VCMPY VR3, VR2, VR1, VR0 ; 2p/1 instruction
    || VMOV32 mem32, VR3        ; <-- Uses VR3 before the VCMPY update
                                ; <-- mem32 updated
    NOP                         ; <-- Delay for VCMPY
                                ; <-- VR2, VR3 updated

Likewise, the destination register for the second operation within a parallel instruction can be the same as one of the source registers of the first operation. The VCMPY operation in Example 2-10 uses the VR0 register as one of its sources. This register is also updated by the VMOV32 instruction. The multiplication operation will use the value in VR0 before the VMOV32 updates it.

Example 2-10. Parallel Instruction Destination/Source Exception

    ; Valid parallel operation.
       VCMPY VR3, VR2, VR1, VR0 ; 2p/1 instruction
    || VMOV32 VR0, mem32        ; <-- VCMPY uses VR0 before the VMOV32 update
                                ; <-- VR0 updated
    NOP                         ; <-- Delay for VCMPY
                                ; <-- VR2, VR3 updated

NOTE: Operations within parallel instructions cannot use the same destination register. When two parallel operations have the same destination register, the result is invalid. For example, see Example 2-11.

If both operations within a parallel instruction try to update the same destination register as shown in Example 2-11, the assembler will issue an error.

Example 2-11. Invalid Destination Within a Parallel Instruction

    ; Invalid parallel instruction.
    ; Both operations use VR3 as a destination register
    ;
       VCMPY VR3, VR2, VR1, VR0 ; 2p/1 instruction
    || VMOV32 VR3, mem32        ; <-- Invalid

2.5 Instruction Set

This section describes the assembly language instructions of the VCU. Also described are parallel operations, conditional operations, resource constraints, and addressing modes. The instructions listed here are independent from C28x and C28x+FPU instruction sets.

2.5.1 Instruction Descriptions

This section gives detailed information on the instruction set. Each instruction may present the following information:
• Operands
• Opcode
• Description
• Exceptions
• Pipeline
• Examples
• See also

The example INSTRUCTION is shown to familiarize you with the way each instruction is described. The example describes the kind of information you will find in each part of the individual instruction description and where to obtain more information. VCU instructions follow the same format as the C28x; the source operand(s) are always on the right and the destination operand(s) are on the left.

The explanations for the syntax of the operands used in the instruction descriptions for the C28x VCU are given in Table 2-9.

Table 2-9. Operand Nomenclature

Symbol               Description
#16FHi               16-bit immediate (hex or float) value that represents the upper 16-bits of an IEEE 32-bit floating-point value. Lower 16-bits of the mantissa are assumed to be zero.
#16FHiHex            16-bit immediate hex value that represents the upper 16-bits of an IEEE 32-bit floating-point value. Lower 16-bits of the mantissa are assumed to be zero.
#16FLoHex            A 16-bit immediate hex value that represents the lower 16-bits of an IEEE 32-bit floating-point value
#32Fhex              32-bit immediate value that represents an IEEE 32-bit floating-point value
#32F                 Immediate float value represented in floating-point representation
#0.0                 Immediate zero
#5-bit               5-bit immediate unsigned value
addr                 Opcode field indicating the addressing mode
Im(X), Im(Y)         Imaginary part of the input X or input Y
Im(Z)                Imaginary part of the output Z
Re(X), Re(Y)         Real part of the input X or input Y
Re(Z)                Real part of the output Z
mem16                Pointer (using any of the direct or indirect addressing modes) to a 16-bit memory location
mem32                Pointer (using any of the direct or indirect addressing modes) to a 32-bit memory location
VRa                  VR0 - VR8 registers. Some instructions exclude VR8. Refer to the instruction description for details.
VR0H, VR1H...VR7H    VR0 - VR7 registers, high half.
VR0L, VR1L...VR7L    VR0 - VR7 registers, low half.
VT0, VT1             Transition bit register VT0 or VT1.
VSMn+1: VSMn         Pair of State Metric Registers (n = 0 : 62, n is even)
VRx.By               32-bit aliased address space for each byte of the VRx registers (x = 0:7, y = 0:3)

Each instruction has a table that gives a list of the operands and a short description. Instructions always have their destination operand(s) first followed by the source operand(s).

Table 2-10. INSTRUCTION dest, source1, source2 Short Description

dest1                Description for the 1st operand for the instruction
source1              Description for the 2nd operand for the instruction
source2              Description for the 3rd operand for the instruction
Opcode               This section shows the opcode for the instruction
Description          Detailed description of the instruction execution is described.
Any constraints on the operands imposed by the processor or the assembler are discussed.
Restrictions         Any constraints on the operands or use of the instruction imposed by the processor are discussed.
Pipeline             This section describes the instruction in terms of pipeline cycles as described in Section 2.4.
Example              Examples of instruction execution. If applicable, register and memory values are given before and after instruction execution. Some examples are code fragments while other examples are full tasks that assume the VCU is correctly configured and the main CPU has passed it data.
Operands             Each instruction has a table that gives a list of the operands and a short description. Instructions always have their destination operand(s) first followed by the source operand(s).

2.5.2 General Instructions

The instructions are listed alphabetically, preceded by a summary.

Table 2-11. General Instructions

Title
POP RB — Pop the RB Register from the Stack
PUSH RB — Push the RB Register onto the Stack
RPTB label, loc16 — Repeat A Block of Code
RPTB label, #RC — Repeat a Block of Code
VCLEAR VRa — Clear General Purpose Register
VCLEARALL — Clear All General Purpose and Transition Bit Registers
VCLRCPACK — Clears CPACK bit in the VSTATUS Register
VCLRCRCMSGFLIP — Clears CRCMSGFLIP bit in the VSTATUS Register
VCLROPACK — Clears OPACK bit in the VSTATUS Register
VCLROVFI — Clear Imaginary Overflow Flag
VCLROVFR — Clear Real Overflow Flag
VMOV16 mem16, VRaH — Store General Purpose Register, High Half
VMOV16 mem16, VRaL — Store General Purpose Register, Low Half
VMOV16 VRaH, mem16 — Load General Purpose Register, High Half
VMOV16 VRaL, mem16 — Load General Purpose Register, Low Half
VMOV32 *(0:16bitAddr), loc32 — Move the contents of loc32 to Memory
VMOV32 loc32, *(0:16bitAddr) — Move 32-bit Value from Memory to loc32
VMOV32 mem32, VRa — Store General Purpose Register
VMOV32 mem32, VSTATUS — Store VCU Status Register
VMOV32 mem32, VTa — Store Transition Bit Register
VMOV32 VRa, mem32 — Load 32-bit General Purpose Register
VMOV32 VRb, VRa — Move 32-bit Register to Register
VMOV32 VSTATUS, mem32 — Load VCU Status Register
VMOV32 VTa, mem32 — Load 32-bit Transition Bit Register
VMOVD32 VRa, mem32 — Load Register with Data Move
VMOVIX VRa, #16I — Load Upper Half of a General Purpose Register with 16-bit Immediate
VMOVZI VRa, #16I — Load General Purpose Register with Immediate
VMOVXI VRa, #16I — Load Low Half of a General Purpose Register with Immediate
VRNDOFF — Disable Rounding
VRNDON — Enable Rounding
VSATOFF — Disable Saturation
VSATON — Enable Saturation
VSETCPACK — Set CPACK bit in the VSTATUS Register
VSETCRCMSGFLIP — Set CRCMSGFLIP bit in the VSTATUS Register
VSETOPACK — Set OPACK bit in the VSTATUS Register
VSETSHL #5-bit — Initialize the Left Shift Value
VSETSHR #5-bit — Initialize the Right Shift Value
VSWAP32 VRb, VRa — 32-bit Register Swap
VXORMOV32 VRa, mem32 — 32-bit Load and XOR From Memory

POP RB    Pop the RB Register from the Stack

Operands
    RB    repeat block register

Opcode
    LSW: 1111 1111 1111 0001

Description
    Restore the RB register from the stack. If a high-priority interrupt contains a RPTB instruction, then the RB register must be stored on the stack before the RPTB block and restored after the RPTB block. In a low-priority interrupt RB must always be saved and restored. This save and restore must occur when interrupts are disabled.

Flags
    This instruction does not affect any flags in the VSTATUS register.

Pipeline
    This is a single-cycle instruction.

Example
    A high-priority interrupt is defined as an interrupt that cannot itself be interrupted. In a high-priority interrupt, the RB register must be saved if a RPTB block is used within the interrupt. If the interrupt service routine does not include a RPTB block, then you do not have to save the RB register.

    ; Repeat Block within a High-Priority Interrupt (Non-Interruptible)
    ;
    ; Interrupt:                ; RAS = RA, RA = 0
        ...
        PUSH RB                 ; Save RB register only if a RPTB block is used in the ISR
        ...
        ...
        RPTB _BlockEnd, AL      ; Execute the block AL+1 times
        ...
        ...
        ...
    _BlockEnd                   ; End of block to be repeated
        ...
        ...
        POP RB                  ; Restore RB register
        ...
        IRET                    ; RA = RAS, RAS = 0

    A low-priority interrupt is defined as an interrupt that allows itself to be interrupted.
    The RB register must always be saved and restored in a low-priority interrupt. The RB register must be stored before interrupts are enabled. Likewise, before restoring the RB register, interrupts must first be disabled.

    ; Repeat Block within a Low-Priority Interrupt (Interruptible)
    ;
    ; Interrupt:                ; RAS = RA, RA = 0
        ...
        PUSH RB                 ; Always save RB register
        ...
        CLRC INTM               ; Enable interrupts only after saving RB
        ...
        ...
        ...                     ; ISR may or may not include a RPTB block
        ...
        ...
        SETC INTM               ; Disable interrupts before restoring RB
        ...
        POP RB                  ; Always restore RB register
        ...
        IRET                    ; RA = RAS, RAS = 0

See also
    PUSH RB
    RPTB label, loc16
    RPTB label, #RC

PUSH RB    Push the RB Register onto the Stack

Operands
    RB    repeat block register

Opcode
    LSW: 1111 1111 1111 0000

Description
    Save the RB register on the stack. If a high-priority interrupt contains a RPTB instruction, then the RB register must be stored on the stack before the RPTB block and restored after the RPTB block. In a low-priority interrupt RB must always be saved and restored. This save and restore must occur when interrupts are disabled.

Flags
    This instruction does not affect any flags in the VSTATUS register.

Pipeline
    This is a single-cycle instruction.

Example
    A high-priority interrupt is defined as an interrupt that cannot itself be interrupted. In a high-priority interrupt, the RB register must be saved if a RPTB block is used within the interrupt.
    If the interrupt service routine does not include a RPTB block, then you do not have to save the RB register.

    ; Repeat Block within a High-Priority Interrupt (Non-Interruptible)
    ;
    ; Interrupt:                ; RAS = RA, RA = 0
        ...
        PUSH RB                 ; Save RB register only if a RPTB block is used in the ISR
        ...
        ...
        RPTB _BlockEnd, AL      ; Execute the block AL+1 times
        ...
        ...
        ...
    _BlockEnd                   ; End of block to be repeated
        ...
        ...
        POP RB                  ; Restore RB register
        ...
        IRET                    ; RA = RAS, RAS = 0

    A low-priority interrupt is defined as an interrupt that allows itself to be interrupted. The RB register must always be saved and restored in a low-priority interrupt. The RB register must be stored before interrupts are enabled. Likewise, before restoring the RB register, interrupts must first be disabled.

    ; Repeat Block within a Low-Priority Interrupt (Interruptible)
    ;
    ; Interrupt:                ; RAS = RA, RA = 0
        ...
        PUSH RB                 ; Always save RB register
        ...
        CLRC INTM               ; Enable interrupts only after saving RB
        ...
        ...
        ...                     ; ISR may or may not include a RPTB block
        ...
        ...
        SETC INTM               ; Disable interrupts before restoring RB
        ...
        POP RB                  ; Always restore RB register
        ...
        IRET                    ; RA = RAS, RAS = 0

See also
    POP RB
    RPTB label, loc16
    RPTB label, #RC

RPTB label, loc16    Repeat A Block of Code

Operands
    label    This label is used by the assembler to determine the end of the repeat block and to calculate RSIZE. This label should be placed immediately after the last instruction included in the repeat block.
    loc16    16-bit location for the repeat count value.
Opcode
    LSW: 1011 0101 0bbb bbbb
    MSW: 0000 0000 loc16

Description
    Initialize repeat block loop, repeat count from [loc16]

Restrictions
    • The maximum block size is ≤ 127 16-bit words.
    • An even aligned block must be ≥ 9 16-bit words.
    • An odd aligned block must be ≥ 8 16-bit words.
    • Interrupts must be disabled when saving or restoring the RB register.
    • Repeat blocks cannot be nested.
    • Any discontinuity type operation is not allowed inside a repeat block. This includes all call, branch or TRAP instructions. Interrupts are allowed.
    • Conditional execution operations are allowed.

Flags
    This instruction does not affect any flags in the VSTATUS register.

Pipeline
    This instruction takes four cycles on the first iteration and zero cycles thereafter. No special pipeline alignment is required.

Example
    The minimum size for the repeat block is 9 words if the block is even aligned and 8 words if the block is odd aligned. If you have a block of 8 words, as in the following example, you can make sure the block is odd aligned by preceding it with a .align 2 directive and a NOP instruction. The .align 2 directive will make sure the NOP is even aligned. Since a NOP is a 16-bit instruction, the RPTB will be odd aligned. For blocks of 9 or more words, this is not required.

    ; Repeat Block of 8 Words (Interruptible)
    ;
    ; Note: This example makes use of floating-point (C28x+FPU) instructions
    ;
    ;
    ; find the largest element and put its address in XAR6
        .align 2
        NOP
        RPTB _VECTOR_MAX_END, AR7   ; Execute the block AR7+1 times
        MOVL ACC,XAR0
        MOV32 R1H,*XAR0++           ; min size = 8, 9 words
        MAXF32 R0H,R1H              ; max size = 127 words
        MOVST0 NF,ZF
        MOVL XAR6,ACC,LT
    _VECTOR_MAX_END:                ; label indicates the end
                                    ; RA is cleared

    When an interrupt is taken, the repeat active (RA) bit in the RB register is automatically copied to the repeat active shadow (RAS) bit. When the interrupt exits, the RAS bit is automatically copied back to the RA bit.
    This allows the hardware to keep track of whether a repeat loop was active whenever an interrupt is taken and to restore that state automatically.

    A high-priority interrupt is defined as an interrupt that cannot itself be interrupted. In a high-priority interrupt, the RB register must be saved if a RPTB block is used within the interrupt. If the interrupt service routine does not include a RPTB block, then you do not have to save the RB register.

    ; Repeat Block within a High-Priority Interrupt (Non-Interruptible)
    ;
    ; Interrupt:                ; RAS = RA, RA = 0
        ...
        PUSH RB                 ; Save RB register only if a RPTB block is used in the ISR
        ...
        ...
        RPTB _BlockEnd, AL      ; Execute the block AL+1 times
        ...
        ...
        ...
    _BlockEnd                   ; End of block to be repeated
        ...
        ...
        POP RB                  ; Restore RB register
        ...
        IRET                    ; RA = RAS, RAS = 0

    A low-priority interrupt is defined as an interrupt that allows itself to be interrupted. The RB register must always be saved and restored in a low-priority interrupt. The RB register must be stored before interrupts are enabled. Likewise, before restoring the RB register, interrupts must first be disabled.

    ; Repeat Block within a Low-Priority Interrupt (Interruptible)
    ;
    ; Interrupt:                ; RAS = RA, RA = 0
        ...
        PUSH RB                 ; Always save RB register
        ...
        CLRC INTM               ; Enable interrupts only after saving RB
        ...
        ...
        ...                     ; ISR may or may not include a RPTB block
        ...
        ...
        SETC INTM               ; Disable interrupts before restoring RB
        ...
        POP RB                  ; Always restore RB register
        ...
        IRET                    ; RA = RAS, RAS = 0

See also
    POP RB
    PUSH RB
    RPTB label, #RC

RPTB label, #RC    Repeat a Block of Code

Operands
    label    This label is used by the assembler to determine the end of the repeat block and to calculate RSIZE. This label should be placed immediately after the last instruction included in the repeat block.
    #RC      16-bit immediate value for the repeat count.

Opcode
    LSW: 1011 0101 1bbb bbbb
    MSW: cccc cccc cccc cccc

Description
    Repeat a block of code. The repeat count is specified as an immediate value.

Restrictions
    • The maximum block size is ≤ 127 16-bit words.
    • An even aligned block must be ≥ 9 16-bit words.
    • An odd aligned block must be ≥ 8 16-bit words.
    • Interrupts must be disabled when saving or restoring the RB register.
    • Repeat blocks cannot be nested.
    • Any discontinuity type operation is not allowed inside a repeat block. This includes all call, branch or TRAP instructions. Interrupts are allowed.
    • Conditional execution operations are allowed.

Flags
    This instruction does not affect any flags in the VSTATUS register.

Pipeline
    This instruction takes one cycle on the first iteration and zero cycles thereafter. No special pipeline alignment is required.

Example
    The minimum size for the repeat block is 9 words if the block is even aligned and 8 words if the block is odd aligned. If you have a block of 8 words, as in the following example, you can make sure the block is odd aligned by preceding it with a .align 2 directive and a NOP instruction. The .align 2 directive will make sure the NOP is even aligned. Since a NOP is a 16-bit instruction, the RPTB will be odd aligned. For blocks of 9 or more words, this is not required.
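The size and alignment limits from the Restrictions list can be expressed as a small host-side check. This is a minimal sketch in Python with hypothetical function names. It assumes that "aligned" refers to the word address of the first instruction inside the block, and that the RPTB instruction itself occupies two 16-bit words (its opcode is shown as an LSW plus an MSW).

```python
RPTB_WORDS = 2          # assumption: RPTB opcode is LSW + MSW, two 16-bit words
NOP_WORDS = 1           # the text states that NOP is a 16-bit instruction
MAX_BLOCK_WORDS = 127   # maximum block size from the Restrictions list


def min_block_words(block_start_addr):
    """Minimum block size: 9 words if even aligned, 8 words if odd aligned."""
    return 9 if block_start_addr % 2 == 0 else 8


def block_size_ok(block_start_addr, size_words):
    """True when the block size satisfies the limits for its alignment."""
    return min_block_words(block_start_addr) <= size_words <= MAX_BLOCK_WORDS


def block_start_after_align_nop(aligned_addr):
    """Word address of the block when '.align 2' + NOP precede the RPTB."""
    if aligned_addr % 2 != 0:
        raise ValueError(".align 2 is expected to yield an even word address")
    return aligned_addr + NOP_WORDS + RPTB_WORDS


start = block_start_after_align_nop(0x8000)   # 0x8000 is an arbitrary example
print(hex(start), block_size_ok(start, 8))    # odd start, 8-word block fits
print(block_size_ok(0x8000, 8))               # even start needs 9 words
```

Under these assumptions the .align 2 plus NOP sequence always leaves the block at an odd word address, which is why an 8-word block becomes legal.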
    ; Repeat Block of 8 Words (Interruptible)
    ;
    ; Note: This example makes use of floating-point (C28x+FPU) instructions
    ;
    ; find the largest element and put its address in XAR6
    ;
        .align 2
        NOP
        RPTB _VECTOR_MAX_END, AR7   ; Execute the block AR7+1 times
        MOVL ACC,XAR0
        MOV32 R1H,*XAR0++           ; min size = 8, 9 words
        MAXF32 R0H,R1H              ; max size = 127 words
        MOVST0 NF,ZF
        MOVL XAR6,ACC,LT
    _VECTOR_MAX_END:                ; label indicates the end
                                    ; RA is cleared

    When an interrupt is taken, the repeat active (RA) bit in the RB register is automatically copied to the repeat active shadow (RAS) bit. When the interrupt exits, the RAS bit is automatically copied back to the RA bit. This allows the hardware to keep track of whether a repeat loop was active whenever an interrupt is taken and to restore that state automatically.

    A high-priority interrupt is defined as an interrupt that cannot itself be interrupted. In a high-priority interrupt, the RB register must be saved if a RPTB block is used within the interrupt. If the interrupt service routine does not include a RPTB block, then you do not have to save the RB register.

    ; Repeat Block within a High-Priority Interrupt (Non-Interruptible)
    ;
    ; Interrupt:                ; RAS = RA, RA = 0
        ...
        PUSH RB                 ; Save RB register only if a RPTB block is used in the ISR
        ...
        ...
        RPTB #_BlockEnd, #5     ; Execute the block 5+1 times
        ...
        ...
        ...
    _BlockEnd                   ; End of block to be repeated
        ...
        ...
        POP RB                  ; Restore RB register
        ...
        IRET                    ; RA = RAS, RAS = 0

    A low-priority interrupt is defined as an interrupt that allows itself to be interrupted. The RB register must always be saved and restored in a low-priority interrupt. The RB register must be stored before interrupts are enabled. Likewise, before restoring the RB register, interrupts must first be disabled.
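The reason RB must always be saved in an interruptible ISR can be seen by modeling the RA/RAS hand-off described above. The Python sketch below is a simplified illustration, not a description of the hardware: the class and function names are hypothetical, only the RA and RAS bits are tracked (the repeat count and block size are ignored), and it assumes that PUSH RB / POP RB save and restore both bits.

```python
class RepeatState:
    """Toy model of the RA and RAS bits (hypothetical, for illustration)."""

    def __init__(self, ra=0):
        self.ra = ra      # repeat active
        self.ras = 0      # repeat active shadow
        self.stack = []

    def interrupt_entry(self):
        # Interrupt taken: RAS = RA, RA = 0
        self.ras, self.ra = self.ra, 0

    def iret(self):
        # Interrupt exit: RA = RAS, RAS = 0
        self.ra, self.ras = self.ras, 0

    def push_rb(self):
        # Assumption: the saved RB image contains both RA and RAS
        self.stack.append((self.ra, self.ras))

    def pop_rb(self):
        self.ra, self.ras = self.stack.pop()


def run_low_priority_isr(state, save_rb):
    """Run an interruptible ISR that is itself interrupted once."""
    state.interrupt_entry()
    if save_rb:
        state.push_rb()       # PUSH RB before CLRC INTM
    state.interrupt_entry()   # nested interrupt overwrites RAS
    state.iret()
    if save_rb:
        state.pop_rb()        # POP RB after SETC INTM
    state.iret()
    return state.ra           # RA seen by the interrupted repeat block


print(run_low_priority_isr(RepeatState(ra=1), save_rb=True))
print(run_low_priority_isr(RepeatState(ra=1), save_rb=False))
```

In this model the interrupted repeat block resumes with RA still set only when RB was saved and restored around the window in which interrupts were enabled.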
; Repeat Block within a Low-Priority Interrupt (Interruptible)
;
; Interrupt:                        ; RAS = RA, RA = 0
    ...
    PUSH    RB                      ; Always save RB register
    ...
    CLRC    INTM                    ; Enable interrupts only after saving RB
    ...
    ...
    ...                             ; ISR may or may not include a RPTB block
    ...
    ...
    SETC    INTM                    ; Disable interrupts before restoring RB
    ...
    POP     RB                      ; Always restore RB register
    ...
    IRET                            ; RA = RAS, RAS = 0

See also
    POP RB
    PUSH RB
    RPTB label, loc16

VCLEAR VRa    Clear General Purpose Register

Operands
    VRa    General purpose register: VR0, VR1... VR8

Opcode
    LSW: 1110 0110 1111 1000
    MSW: 0000 0000 0000 aaaa

Description
    Clear the specified general purpose register.
        VRa = 0x00000000;

Flags
    This instruction does not modify any flags in the VSTATUS register.

Pipeline
    This is a single-cycle instruction.

Example
    ;
    ; Code fragment from a viterbi traceback
    ; For the first iteration the previous state metric must be
    ; initialized to zero (VR0).
    ;
        VCLEAR  VR0                     ; Clear the VR0 register
        MOVL    XAR5,*+XAR4[0]          ; Point XAR5 to an array
    ;
    ; For first stage
    ;
        VMOV32  VT0, *--XAR3
        VMOV32  VT1, *--XAR3
        VTRACE  *XAR5++,VR0,VT0,VT1     ; Uses VR0 (which is zero)
    ;
    ; etc...
    ;

See also
    VCLEARALL
    VTCLEAR

VCLEARALL    Clear All General Purpose and Transition Bit Registers

Operands
    none

Opcode
    LSW: 1110 0110 1111 1001
    MSW: 0000 0000 0000 0000

Description
    Clear all of the general purpose registers (VR0, VR1... VR8) and the transition bit registers (VT0 and VT1). As the listing below shows, the VSM0... VSM63 registers are cleared as well.
        VR0   = 0x00000000;
        VR1   = 0x00000000;
        VR2   = 0x00000000;
        VR3   = 0x00000000;
        VR4   = 0x00000000;
        VR5   = 0x00000000;
        VR6   = 0x00000000;
        VR7   = 0x00000000;
        VR8   = 0x00000000;
        VT0   = 0x00000000;
        VT1   = 0x00000000;
        VSM0  = 0x00000000
        VSM1  = 0x00000000
        ...
        VSM63 = 0x00000000

Flags
    This instruction does not modify any flags in the VSTATUS register.

Pipeline
    This is a single-cycle instruction.

Example
    ;
    ; Context save all VCU VRa and VTa registers
    ;
        VMOV32  *SP++, VR0
        VMOV32  *SP++, VR1
        VMOV32  *SP++, VR2
        VMOV32  *SP++, VR3
        VMOV32  *SP++, VR4
        VMOV32  *SP++, VR5
        VMOV32  *SP++, VR6
        VMOV32  *SP++, VR7
        VMOV32  *SP++, VR8
        VMOV32  *SP++, VT0
        VMOV32  *SP++, VT1
    ;
    ; Clear VR0 - VR8, VT0 and VT1, VSM0 - VSM63
    ;
        VCLEARALL
    ;
    ; etc...

See also
    VCLEAR VRa
    VTCLEAR

VCLRCPACK    Clears CPACK bit in the VSTATUS Register

Operands
    none

Opcode
    LSW: 1110 0101 0010 0010
    MSW: 0000 0000 0000 0000

Description
    Clears the CPACK bit in the VSTATUS register. This causes the VCU to process complex data, in complex math operations, in the VRx registers as follows: VRx[31:16] holds the Real part, VRx[15:0] holds the Imaginary part.

Flags
    This instruction clears the CPACK bit in the VSTATUS register.

Pipeline
    This is a single-cycle instruction.
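The packing convention selected by the CPACK bit can be pictured with a small C model. This is an illustrative sketch rather than TI code: the type complex16_t and the function unpack_complex are hypothetical names, and treating each half as a signed 16-bit value is an assumption. The CPACK = 1 layout (imaginary part in the upper half) is taken from the VSETCPACK description.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper type: one complex sample taken from a VRx register. */
typedef struct {
    int16_t re;
    int16_t im;
} complex16_t;

/* Split a 32-bit register image into real and imaginary halves.
 * cpack == 0: bits 31:16 hold the real part, bits 15:0 the imaginary part.
 * cpack == 1: bits 31:16 hold the imaginary part, bits 15:0 the real part.
 * Assumption: both halves are signed 16-bit values. */
static complex16_t unpack_complex(uint32_t vrx, int cpack)
{
    assert(cpack == 0 || cpack == 1);

    int16_t hi = (int16_t)(uint16_t)(vrx >> 16);
    int16_t lo = (int16_t)(uint16_t)(vrx & 0xFFFFu);

    complex16_t c;
    if (cpack == 0) {
        c.re = hi;
        c.im = lo;
    } else {
        c.re = lo;
        c.im = hi;
    }
    return c;
}
```

With the register image 0x0003FFFE, the model yields 3 + j(-2) when CPACK is 0 and -2 + j3 when CPACK is 1, which is why a data buffer has to be packed to match the mode that is active.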
Example
    ; complex conjugate multiply | (jb + a)*conj(jd + c) = (ac+bd)+j(bc-ad)
        VCLRCPACK                   ; cpack = 0 real part in high word
        VMOV32  VR0, *XAR4++        ; load 1st complex input | jb + a
        VMOV32  VR1, *XAR4++        ; load second complex input | jd + c
        VCCMPY  VR3, VR2, VR1, VR0

See also
    VSETCPACK

VCLRCRCMSGFLIP    Clears CRCMSGFLIP bit in the VSTATUS Register

Operands
    none

Opcode
    LSW: 1110 0101 0010 1101
    MSW: 0000 0000 0000 0000

Description
    Clear the CRCMSGFLIP bit in the VSTATUS register. This causes the VCU to process message bits starting from most-significant to least-significant for CRC computation. In this case, bytes loaded from memory are fed directly for CRC computation.

Flags
    This instruction clears the CRCMSGFLIP bit in the VSTATUS register.

Pipeline
    This is a single-cycle instruction.

Example
    ; Clear the CRCMSGFLIP bit to have the CRC routine process the
    ; input message in big-endian format. The CRCMSGFLIP bit is
    ; cleared on reset
    ;
        VCLRCRCMSGFLIP
        LCR     _CRC_run8Bit

See also
    VSETCRCMSGFLIP

VCLROPACK    Clears OPACK bit in the VSTATUS Register

Operands
    none

Opcode
    LSW: 1110 0101 0010 0101
    MSW: 0000 0000 0000 0000

Description
    Clear the OPACK bit in the VSTATUS register. This bit affects the packing order of the traceback output bits (using the VTRACE instructions). When the bit is set to 0, it forces the bits generated from the traceback operation to be loaded through the LSb of the destination register (or memory location) with the older bits being left-shifted.
Flags
    This instruction clears the OPACK bit in the VSTATUS register.

Pipeline
    This is a single-cycle instruction.

Example

See also
    VSETOPACK

VCLROVFI    Clear Imaginary Overflow Flag

Operands
    none

Opcode
    LSW: 1110 0101 0000 1011

Description
    Clear the imaginary overflow flag in the VSTATUS register. To clear the real flag, use the VCLROVFR instruction. The imaginary flag bit can be set by instructions shown in Table 2-6. Refer to individual instruction descriptions for details.
        VSTATUS[OVFI] = 0;

Flags
    This instruction clears the OVFI flag.

Pipeline
    This is a single-cycle instruction.

Example

See also
    VCLROVFR
    VRNDON
    VSATOFF
    VSATON

VCLROVFR    Clear Real Overflow Flag

Operands
    none

Opcode
    LSW: 1110 0101 0000 1010

Description
    Clear the real overflow flag in the VSTATUS register. To clear the imaginary flag, use the VCLROVFI instruction. The real flag bit can be set by instructions shown in Table 2-6. Refer to individual instruction descriptions for details.
        VSTATUS[OVFR] = 0;

Flags
    This instruction clears the OVFR flag.

Pipeline
    This is a single-cycle instruction.

Example

See also
    VCLROVFI
    VRNDON
    VSATOFF
    VSATON

VMOV16 mem16, VRaH    Store General Purpose Register, High Half

Operands
    mem16    Pointer to a 16-bit memory location.
             This will be the destination of the VMOV16.
    VRaH     High word of a general purpose register: VR0H, VR1H... VR8H.

Opcode
    LSW: 1110 0010 0001 1000
    MSW: 0001 aaaa mem16

Description
    Store the upper 16 bits of the specified general purpose register into the 16-bit memory location.
        [mem16] = VRa[31:16];

Flags
    This instruction does not affect any flags in the VSTATUS register.

Pipeline
    This is a single-cycle instruction.

Example

See also
    VMOV16 VRaH, mem16

VMOV16 mem16, VRaL    Store General Purpose Register, Low Half

Operands
    mem16    Pointer to a 16-bit memory location. This will be the destination of the VMOV16.
    VRaL     Low word of a general purpose register: VR0L, VR1L... VR8L.

Opcode
    LSW: 1110 0010 0001 1000
    MSW: 0000 aaaa mem16

Description
    Store the low 16 bits of the specified general purpose register into the 16-bit memory location.
        [mem16] = VRa[15:0];

Flags
    This instruction does not modify any flags in the VSTATUS register.

Pipeline
    This is a single-cycle instruction.

Example

See also
    VMOV16 VRaL, mem16

VMOV16 VRaH, mem16    Load General Purpose Register, High Half

Operands
    VRaH     High word of a general purpose register: VR0H, VR1H... VR8H
    mem16    Pointer to a 16-bit memory location. This will be the source for the VMOV16.

Opcode
    LSW: 1110 0010 1100 1001
    MSW: 0001 aaaa mem16

Description
    Load the upper 16 bits of the specified general purpose register with the contents of memory pointed to by mem16.
        VRa[31:16] = [mem16];

Flags
    This instruction does not affect any flags in the VSTATUS register.

Pipeline
    This is a single-cycle instruction.

Example
    ;1st Iteration
        VMOV32  VR4, *+XAR3[0]      ; VR4H = m, VR4L = n    Load m, n
        VMOV16  VR0H, *+XAR5[0]     ; VR0H = J, VR0L = I    Init I, J
        VMOV32  VR1, *+XAR3[4]      ; VR1H = u, VR1L = a    Load u, a
        VMOV32  VR6, VR0            ; Save current {J,I} in VR6
    ; etc.

See also
    VMOV16 mem16, VRaH

VMOV16 VRaL, mem16    Load General Purpose Register, Low Half

Operands
    VRaL     Low word of a general purpose register: VR0L, VR1L... VR8L
    mem16    Pointer to a 16-bit memory location. This will be the source for the VMOV16.

Opcode
    LSW: 1110 0010 1100 1001
    MSW: 0000 aaaa mem16

Description
    Load the lower 16 bits of the specified general purpose register with the contents of memory pointed to by mem16.
        VRa[15:0] = [mem16];

Flags
    This instruction does not modify any flags in the VSTATUS register.

Pipeline
    This is a single-cycle instruction.

Example
    ;
    ; Loop will run 106 times for 212 inputs to decoder
    ;
    ; Code fragment from viterbi decoder
    ;
    _LOOP:
    ;
    ; Calculate the branch metrics for code rate = 1/3
    ; Load VR0L, VR1L and VR2L with inputs
    ; to the decoder from the array pointed to by XAR5
    ;
        VMOV16  VR0L, *XAR5++
        VMOV16  VR1L, *XAR5++
        VMOV16  VR2L, *XAR5++
    ;
    ; VR0L = BM0
    ; VR0H = BM1
    ; VR1L = BM2
    ; VR1H = BM3
    ; VR2L = pt_old[0]
    ; VR2H = pt_old[1]
    ;
        VITBM3  VR0, VR1, VR2
        VMOV32  VR2, *XAR1++
    ; etc...
See also
    VMOV16 mem16, VRaL

VMOV32 *(0:16bitAddr), loc32    Move the contents of loc32 to Memory

Operands
    *(0:16bitAddr)    Address of 32-bit Destination Location (VCU register)
    loc32             Source Location (CPU register)

Opcode
    LSW: 1011 1101 loc32
    MSW: IIII IIII IIII IIII

Description
    Move the 32-bit value in loc32 to the memory location addressed by 0:16bitAddr. The EALLOW bit in the ST1 register is ignored by this operation.
        [0:16bitAddr] = [loc32]

Flags
    This instruction does not affect any flags in the VSTATUS register.

Pipeline
    This is a two-cycle instruction.

Example
    ; EALLOW ignored on write
    ; Four NOPs are needed after the operation so that the write to
    ; the VCU register takes effect before it is used in subsequent
    ; operations, for example
        VMOV32  VRa,@ACC            ; VRa = ACC
        NOP                         ; Pipeline alignment
        NOP                         ; Pipeline alignment
        NOP                         ; Pipeline alignment
        NOP                         ; Pipeline alignment
        VMOV32  *XAR7++, VRa        ; [*XAR7] = VRa

See also
    VMOV32 VRa, mem32
    VMOV32 VRb, VRa
    VMOV32 loc32, *(0:16bitAddr)

VMOV32 loc32, *(0:16bitAddr)    Move 32-bit Value from Memory to loc32

Operands
    loc32             Destination Location (CPU register)
    *(0:16bitAddr)    Address of 32-bit Source Value (VCU register)

Opcode
    LSW: 1011 1111 loc32
    MSW: IIII IIII IIII IIII

Description
    Copy the 32-bit value referenced by 0:16bitAddr to the location indicated by loc32.
        [loc32] = [0:16bitAddr]

Flags
    This instruction does not affect any flags in the VSTATUS register.

Pipeline
    This is a two-cycle instruction.
Example
    ; A single NOP is needed before the operation so as to read the
    ; correct VCU's VRx register value
        VMOV32  VRa,*XAR7++         ; VRa = [*XAR7]
        NOP                         ; Pipeline alignment
        VMOV32  @ACC, VRa           ; ACC = VRa
    ; Two NOPs are needed before the operation so as to read the
    ; correct VCU's VSMx or VRx.By register value.
        VMOV32  VSM1:VSM0, *XAR7    ; VSM1:VSM0 = [*XAR7]
        NOP                         ; Pipeline alignment
        NOP                         ; Pipeline alignment
        VMOV32  @ACC, VSM0          ; AH:AL = VSM1:VSM0

See also
    VMOV32 VRa, mem32
    VMOV32 VRb, VRa
    VMOV32 *(0:16bitAddr), loc32

VMOV32 mem32, VRa    Store General Purpose Register

Operands
    mem32    Pointer to a 32-bit memory location. This will be the destination of the VMOV32.
    VRa      General purpose register VR0, VR1... VR8

Opcode
    LSW: 1110 0010 0000 0100
    MSW: 0000 aaaa mem32

Description
    Store the 32-bit contents of the specified general purpose register into the memory location pointed to by mem32.
        [mem32] = VRa;

Flags
    This instruction does not modify any flags in the VSTATUS register.

Pipeline
    This is a single-cycle instruction.

Example

See also
    VMOV32 mem32, VSTATUS
    VMOV32 mem32, VTa
    VMOV32 VRa, mem32
    VMOV32 VTa, mem32

VMOV32 mem32, VSTATUS    Store VCU Status Register

Operands
    mem32      Pointer to a 32-bit memory location. This will be the destination of the VMOV32.
    VSTATUS    VCU status register.

Opcode
    LSW: 1110 0010 0000 1101
    MSW: 0000 0000 mem32

Description
    Store the VSTATUS register into the memory location pointed to by mem32.
        [mem32] = VSTATUS;

Flags
    This instruction does not modify any flags in the VSTATUS register.

Pipeline
    This is a single-cycle instruction.

Example

See also
    VMOV32 mem32, VRa
    VMOV32 mem32, VTa
    VMOV32 VRa, mem32
    VMOV32 VSTATUS, mem32
    VMOV32 VTa, mem32

VMOV32 mem32, VTa    Store Transition Bit Register

Operands
    mem32    Pointer to a 32-bit memory location. This will be the destination of the VMOV32.
    VTa      Transition bits register VT0 or VT1

Opcode
    LSW: 1110 0010 0000 0101
    MSW: 0000 00tt mem32

Description
    Store the 32 bits of the specified transition bits register into the memory location pointed to by mem32.
        [mem32] = VTa;

Flags
    This instruction does not modify any flags in the VSTATUS register.

Pipeline
    This is a single-cycle instruction.

Example

See also
    VMOV32 mem32, VRa
    VMOV32 mem32, VSTATUS
    VMOV32 VRa, mem32
    VMOV32 VSTATUS, mem32
    VMOV32 VTa, mem32

VMOV32 VRa, mem32    Load 32-bit General Purpose Register

Operands
    VRa      General purpose register VR0, VR1... VR8
    mem32    Pointer to a 32-bit memory location. This will be the source of the VMOV32.

Opcode
    LSW: 1110 0011 1111 0000
    MSW: 0000 aaaa mem32

Description
    Load the specified general purpose register with the 32-bit value in memory pointed to by mem32.
        VRa = [mem32];

Flags
    This instruction does not modify any flags in the VSTATUS register.

Pipeline
    This is a single-cycle instruction.
Example

See also
    VMOV32 mem32, VRa
    VMOV32 mem32, VSTATUS
    VMOV32 mem32, VTa
    VMOV32 VSTATUS, mem32
    VMOV32 VTa, mem32

VMOV32 VRb, VRa    Move 32-bit Register to Register

Operands
    VRb    General purpose destination register VR0... VR8
    VRa    General purpose source register VR0... VR8

Opcode
    LSW: 1110 0110 1111 0010
    MSW: 0000 0010 bbbb aaaa

Description
    Move a 32-bit value from one general purpose VCU register to another.
        VRb = VRa;

Flags
    This instruction does not affect any flags in the VSTATUS register.

Pipeline
    This is a single-cycle instruction.

Example
    ; Swap VR0 and VR1 using VR2 as temporary storage
    ;
        VMOV32  VR2, VR1
        VMOV32  VR1, VR0
        VMOV32  VR0, VR2

See also
    VMOV32 mem32, VRa
    VMOV32 mem32, VSTATUS
    VMOV32 mem32, VTa
    VMOV32 VTa, mem32

VMOV32 VSTATUS, mem32    Load VCU Status Register

Operands
    VSTATUS    VCU status register
    mem32      Pointer to a 32-bit memory location. This will be the source of the VMOV32.

Opcode
    LSW: 1110 0010 1011 0000
    MSW: 0000 0000 mem32

Description
    Load the VSTATUS register with the 32-bit value in memory pointed to by mem32.
        VSTATUS = [mem32];

Flags
    This instruction modifies all bits within the VSTATUS register.

Pipeline
    This is a single-cycle instruction.
Example

See also
    VMOV32 mem32, VSTATUS
    VMOV32 mem32, VTa
    VMOV32 VRa, mem32
    VMOV32 VTa, mem32

VMOV32 VTa, mem32    Load 32-bit Transition Bit Register

Operands
    VTa      Transition bit register: VT0, VT1
    mem32    Pointer to a 32-bit memory location. This will be the source of the VMOV32.

Opcode
    LSW: 1110 0011 1111 0001
    MSW: 0000 00tt mem32

Description
    Load the specified transition bit register with the 32-bit value in memory pointed to by mem32.
        VTa = [mem32];

Flags
    This instruction does not modify any flags in the VSTATUS register.

Pipeline
    This is a single-cycle instruction.

Example

See also
    VMOV32 mem32, VSTATUS
    VMOV32 mem32, VTa
    VMOV32 VRa, mem32
    VMOV32 VSTATUS, mem32

VMOVD32 VRa, mem32    Load Register with Data Move

Operands
    VRa      General purpose register, VR0, VR1... VR8
    mem32    Pointer to a 32-bit memory location. This will be the source of the VMOVD32.

Opcode
    LSW: 1110 0010 0010 0100
    MSW: 0000 aaaa mem32

Description
    Load the specified general purpose register with the 32-bit value in memory pointed to by mem32. In addition, copy that 32-bit value to the next 32-bit location in memory, mem32 + 2.
        VRa = [mem32];
        [mem32 + 2] = [mem32];

Flags
    This instruction does not modify any flags in the VSTATUS register.

Pipeline
    This is a single-cycle instruction.
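The load-with-data-move operation can be pictured with a short C model. This is a sketch under stated assumptions, not TI code: memory is modeled as an array of 32-bit values, so index i + 1 stands for the address mem32 + 2, and the names vmovd32_model and shift_delay_line are hypothetical. Using the operation to age a delay line is one plausible use, shown here only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Model of:  VRa = [mem32];  [mem32 + 2] = [mem32];
 * mem is an array of 32-bit values, so mem[i + 1] plays the role of the
 * next 32-bit location (mem32 + 2 in 16-bit word addressing).
 * Returns the value that would be loaded into VRa. */
static uint32_t vmovd32_model(uint32_t *mem, size_t len, size_t i)
{
    assert(i + 1 < len);
    uint32_t value = mem[i];
    mem[i + 1] = value;
    return value;
}

/* Hypothetical use: age a delay line by one sample. The copies run from
 * the oldest slot toward the newest so nothing is overwritten before it
 * has been moved; the newest sample is then written to slot 0. */
static void shift_delay_line(uint32_t *line, size_t len, uint32_t newest)
{
    assert(len >= 1);
    for (size_t i = len - 1; i-- > 0; ) {
        (void)vmovd32_model(line, len, i);
    }
    line[0] = newest;
}
```

Starting from the line {10, 20, 30, 40}, one call with a new sample of 5 leaves {5, 10, 20, 30}: each load also pushed the loaded value one slot further along.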
Example

See also

VMOVIX VRa, #16I    Load Upper Half of a General Purpose Register with 16-bit Immediate

Operands
    VRa     General purpose register, VR0, VR1... VR8
    #16I    16-bit immediate value

Opcode
    LSW: 1110 0111 1110 IIII
    MSW: IIII IIII IIII aaaa

Description
    Load the upper 16 bits of the specified general purpose register with an immediate value. Leave the lower 16 bits of the register unchanged.
        VRa[15:0] = unchanged;
        VRa[31:16] = #16I;

Flags
    This instruction does not modify any flags in the VSTATUS register.

Pipeline
    This is a single-cycle instruction.

Example

See also
    VMOVZI VRa, #16I
    VMOVXI VRa, #16I

VMOVZI VRa, #16I    Load General Purpose Register with Immediate

Operands
    VRa     General purpose register, VR0, VR1... VR8
    #16I    16-bit immediate value

Opcode
    LSW: 1110 0111 1111 IIII
    MSW: IIII IIII IIII aaaa

Description
    Load the lower 16 bits of the specified general purpose register with an immediate value. Clear the upper 16 bits of the register.
        VRa[15:0] = #16I;
        VRa[31:16] = 0x0000;

Flags
    This instruction does not modify any flags in the VSTATUS register.

Pipeline
    This is a single-cycle instruction.
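The three immediate loads differ only in which half of the register they write and what happens to the other half. The C model below is a sketch of the operations as described in these entries (VMOVXI is the instruction listed under See also); the function names, including load_const32, are hypothetical and chosen for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* VMOVZI: VRa[15:0] = imm, VRa[31:16] = 0x0000 */
static uint32_t vmovzi_model(uint16_t imm)
{
    return (uint32_t)imm;
}

/* VMOVIX: VRa[31:16] = imm, VRa[15:0] unchanged */
static uint32_t vmovix_model(uint32_t vra, uint16_t imm)
{
    return (vra & 0x0000FFFFu) | ((uint32_t)imm << 16);
}

/* VMOVXI: VRa[15:0] = imm, VRa[31:16] unchanged */
static uint32_t vmovxi_model(uint32_t vra, uint16_t imm)
{
    return (vra & 0xFFFF0000u) | (uint32_t)imm;
}

/* Hypothetical helper: build a full 32-bit constant from two 16-bit
 * immediates, low half first (VMOVZI), then high half (VMOVIX). */
static uint32_t load_const32(uint32_t value)
{
    uint32_t vra = vmovzi_model((uint16_t)(value & 0xFFFFu));
    vra = vmovix_model(vra, (uint16_t)(value >> 16));
    assert(vra == value);
    return vra;
}
```

Because VMOVZI clears the upper half, it is the natural first step when a whole register has to be initialized; VMOVIX or VMOVXI can then patch one half without disturbing the other.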
Example

See also
    VMOVIX VRa, #16I
    VMOVXI VRa, #16I

VMOVXI VRa, #16I    Load Low Half of a General Purpose Register with Immediate

Operands
    VRa     General purpose register, VR0 - VR8
    #16I    16-bit immediate value

Opcode
    LSW: 1110 0111 0111 IIII
    MSW: IIII IIII IIII aaaa

Description
    Load the lower 16 bits of the specified general purpose register with an immediate value. Leave the upper 16 bits unchanged.
        VRa[15:0] = #16I;
        VRa[31:16] = unchanged;

Flags
    This instruction does not modify any flags in the VSTATUS register.

Pipeline
    This is a single-cycle instruction.

Example

See also
    VMOVIX VRa, #16I
    VMOVZI VRa, #16I

VRNDOFF    Disable Rounding

Operands
    none

Opcode
    LSW: 1110 0101 0000 1001

Description
    This instruction disables the rounding mode by clearing the RND bit in the VSTATUS register. When rounding is disabled, the result of the shift right operation for addition and subtraction operations will be truncated instead of rounded. The operations affected by rounding are shown in Table 2-6. Refer to the individual instruction descriptions for information on how rounding affects the operation. To enable rounding, use the VRNDON instruction. For more information on rounding, refer to Section 2.3.2.
        VSTATUS[RND] = 0;

Flags
    This instruction clears the RND bit in the VSTATUS register. It does not change any flags.

Pipeline
    This is a single-cycle instruction.
Example

See also
    VCLROVFI
    VCLROVFR
    VRNDON
    VSATOFF
    VSATON

VRNDON    Enable Rounding

Operands
    none

Opcode
    LSW: 1110 0101 0000 1000

Description
    This instruction enables the rounding mode by setting the RND bit in the VSTATUS register. When rounding is enabled, the result of the shift right operation for addition and subtraction operations will be rounded instead of being truncated. The operations affected by rounding are shown in Table 2-6. Refer to the individual instruction descriptions for information on how rounding affects the operation. To disable rounding, use the VRNDOFF instruction. For more information on rounding, refer to Section 2.3.2.
        VSTATUS[RND] = 1;

Flags
    This instruction sets the RND bit in the VSTATUS register. It does not change any flags.

Pipeline
    This is a single-cycle instruction.

Example

See also
    VCLROVFI
    VCLROVFR
    VRNDOFF
    VSATOFF
    VSATON

VSATOFF    Disable Saturation

Operands
    none

Opcode
    LSW: 1110 0101 0000 0111

Description
    This instruction disables the saturation mode by clearing the SAT bit in the VSTATUS register. When saturation is disabled, results of addition and subtraction are allowed to overflow or underflow. When saturation is enabled, results are set to a maximum or minimum value instead of being allowed to overflow or underflow. To enable saturation, use the VSATON instruction.
        VSTATUS[SAT] = 0

Flags
    This instruction clears the SAT bit in the VSTATUS register. It does not change any flags.

Pipeline
    This is a single-cycle instruction.
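The difference between the two saturation modes can be sketched in C for a single addition. This is an illustrative model only: the 32-bit operand width and the name add_with_sat are assumptions, since the affected VCU instructions define their own operand widths in their individual descriptions.

```c
#include <assert.h>
#include <stdint.h>

/* Add two signed values the way the description of the SAT bit suggests.
 * sat == 1: the result is clamped to the maximum or minimum value.
 * sat == 0: the result is allowed to overflow or underflow (wrap).
 * Assumption: 32-bit signed operands; real instructions may use other widths. */
static int32_t add_with_sat(int32_t a, int32_t b, int sat)
{
    assert(sat == 0 || sat == 1);

    int64_t wide = (int64_t)a + (int64_t)b;   /* exact sum, cannot overflow */

    if (sat) {
        if (wide > INT32_MAX) return INT32_MAX;
        if (wide < INT32_MIN) return INT32_MIN;
        return (int32_t)wide;
    }

    /* Wrap modulo 2^32; the final conversion relies on the usual
     * two's complement behavior of common compilers. */
    return (int32_t)(uint32_t)wide;
}
```

Adding 1 to 0x7FFFFFFF gives 0x7FFFFFFF with saturation enabled and 0x80000000 with it disabled, which is the sign flip that saturation is meant to prevent.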
Example

See also
    VCLROVFI
    VCLROVFR
    VRNDOFF
    VRNDON
    VSATON

VSATON    Enable Saturation

Operands
    none

Opcode
    LSW: 1110 0101 0000 0110

Description
    This instruction enables the saturation mode by setting the SAT bit in the VSTATUS register. When saturation is enabled, results of addition and subtraction are not allowed to overflow or underflow. Results will, instead, be set to a maximum or minimum value. To disable saturation, use the VSATOFF instruction.
        VSTATUS[SAT] = 1

Flags
    This instruction sets the SAT bit in the VSTATUS register. It does not change any flags.

Pipeline
    This is a single-cycle instruction.

Example

See also
    VCLROVFI
    VCLROVFR
    VRNDOFF
    VRNDON
    VSATOFF

VSETCPACK    Set CPACK bit in the VSTATUS Register

Operands
    none

Opcode
    LSW: 1110 0101 0010 0001

Description
    Set the CPACK bit in the VSTATUS register. This causes the VCU to process complex data, in complex math operations, in the VRx registers as follows: VRx[31:16] holds the Imaginary part, VRx[15:0] holds the Real part.

Flags
    This instruction sets the CPACK bit in the VSTATUS register.

Pipeline
    This is a single-cycle instruction.
Example
    ; complex conjugate multiply | (a + jb)*conj(c + jd) = (ac+bd)+j(bc-ad)
        VSETCPACK                   ; cpack = 1 imag part in high word
        VMOV32  VR0, *XAR4++        ; load 1st complex input | a + jb
        VMOV32  VR1, *XAR4++        ; load second complex input | c + jd
        VCCMPY  VR3, VR2, VR1, VR0

See also
    VCLRCPACK

VSETCRCMSGFLIP    Set CRCMSGFLIP bit in the VSTATUS Register

Operands
    none

Opcode
    LSW: 1110 0101 0010 1100

Description
    Set the CRCMSGFLIP bit in the VSTATUS register. This causes the VCU to process message bits starting from least-significant to most-significant for CRC computation. In this case, bytes loaded from memory are "flipped" and then fed for CRC computation.

Flags
    This instruction sets the CRCMSGFLIP bit in the VSTATUS register.

Pipeline
    This is a single-cycle instruction.

Example
    ; Set the CRCMSGFLIP bit, each word has all its bits reversed
    ; prior to the CRC being calculated
    ;
        VSETCRCMSGFLIP
        LCR     _CRC_run8Bit
        VCLRCRCMSGFLIP

See also
    VCLRCRCMSGFLIP

VSETOPACK    Set OPACK bit in the VSTATUS Register

Operands
    none

Opcode
    LSW: 1110 0101 0010 0011

Description
    Set the OPACK bit in the VSTATUS register. This bit affects the packing order of the traceback output bits (using the VTRACE instructions). When the bit is set to 1, it forces the bits generated from the traceback operation to be loaded through the MSb of the destination register (or memory location) with the older bits being right-shifted.
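The two packing orders controlled by OPACK can be sketched in C. This is an illustrative model, not a description of the VTRACE hardware: it assumes a 32-bit destination and one decoded bit per call, and the name pack_trace_bit is hypothetical. The OPACK = 0 behavior (new bits enter at the LSb, older bits shift left) is taken from the VCLROPACK description.

```c
#include <assert.h>
#include <stdint.h>

/* Insert one decoded traceback bit into a 32-bit destination.
 * opack == 0: the new bit enters at the LSb, older bits are left-shifted.
 * opack == 1: the new bit enters at the MSb, older bits are right-shifted.
 * Assumption: one bit is packed per call. */
static uint32_t pack_trace_bit(uint32_t dest, unsigned bit, int opack)
{
    assert(bit <= 1u);
    assert(opack == 0 || opack == 1);

    if (opack == 0) {
        return (dest << 1) | (uint32_t)bit;
    }
    return (dest >> 1) | ((uint32_t)bit << 31);
}
```

Packing the bit sequence 1, 0, 1 into an empty destination gives 0x00000005 with OPACK = 0 and 0xA0000000 with OPACK = 1. The second result is the bit-reversed image of the first, which is why setting OPACK can remove the need for a separate flip of the decoded output.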
Flags
    This instruction sets the OPACK bit in the VSTATUS register.

Pipeline
    This is a single-cycle instruction.

Example
        VSETOPACK                   ; VSTATUS.OPACK = 1, start packing the decoded
                                    ; bits from trace back into VT1 starting from
                                    ; the MSb, this obviates the need to manually
                                    ; flip the result each time
    ; etc…

See also
    VCLROPACK

VSETSHL #5-bit    Initialize the Left Shift Value

Operands
    #5-bit    5-bit, unsigned, immediate value

Opcode
    LSW: 1110 0101 110s ssss

Description
    Load VSTATUS[SHIFTL] with an unsigned, 5-bit, immediate value. The left shift value specifies the number of bits an operand is shifted by. A value of zero indicates no shift will be performed. The left shift is used by the VCDSUB16 and VCDADD16 operations. Refer to the description of these instructions for more information. To load the right shift value, use the VSETSHR #5-bit instruction.
        VSTATUS[VSHIFTL] = #5-bit

Flags
    This instruction changes the VSHIFTL value in the VSTATUS register. It does not change any flags.

Pipeline
    This is a single-cycle instruction.

Example

See also
    VSETSHR #5-bit

VSETSHR #5-bit    Initialize the Right Shift Value

Operands
    #5-bit    5-bit, unsigned, immediate value

Opcode
    LSW: 1110 0101 010s ssss

Description
    Load VSTATUS[SHIFTR] with an unsigned, 5-bit, immediate value. The right shift value specifies the number of bits an operand is shifted by. A value of zero indicates no shift will be performed. The right shift is used by the VCADD, VCSUB, VCDADD16 and VCDSUB16 operations.
    It is also used by the addition portion of the VCMAC. Refer to the description of these instructions for more information.
        VSTATUS[VSHIFTR] = #5-bit

Flags
    This instruction changes the VSHIFTR value in the VSTATUS register. It does not change any flags.

Pipeline
    This is a single-cycle instruction.

Example

See also
    VSETSHL #5-bit

VSWAP32 VRb, VRa    32-bit Register Swap

Operands
    VRb    General purpose register VR0…VR8
    VRa    General purpose register VR0…VR8

Opcode
    LSW: 1110 0110 1111 0010
    MSW: 0000 0011 bbbb aaaa

Description
    Swap the contents of the 32-bit general purpose VCU registers VRa and VRb.

Flags
    This instruction does not affect any flags in the VSTATUS register.

Pipeline
    This is a single-cycle instruction.

Example
    ; Swap VR0 and VR1 using VSWAP32 instruction
    ;

See also
    VMOV32 mem32, VSTATUS
    VMOV32 mem32, VTa
    VMOV32 VRa, mem32
    VMOV32 VRb, VRa
    VMOV32 VTa, mem32

VXORMOV32 VRa, mem32    32-bit Load and XOR From Memory

Operands
    Input Register    Value
    VRa               General purpose register VR0... VR8
    mem32             Pointer to 32-bit memory location

Opcode
    LSW: 1110 0011 1111 0000
    MSW: 0000 aaaa MMMM MMMM

Description
    XOR the contents of the VRa register with a long word from memory and store the result back into VRa.
        VRa = VRa ^ [mem32]

Flags
    This instruction does not affect any flags in the VSTATUS register.

Pipeline
    This is a single-cycle instruction.
Example
        VXORMOV32 VR0, *+XAR4[0]    ; VR0 = VR0 ^ *XAR4[0]

See also

2.5.3 Arithmetic Math Instructions

The instructions are listed alphabetically, preceded by a summary.

Table 2-12. Arithmetic Math Instructions

    Instruction               Description
    VASHL32 VRa << #5-bit     Arithmetic Shift Left
    VASHR32 VRa >> #5-bit     Arithmetic Shift Right
    VBITFLIP VRa              Bit Flip
    VLSHL32 VRa << #5-bit     Logical Shift Left
    VLSHR32 VRa >> #5-bit     Logical Shift Right
    VNEG VRa                  Two's Complement Negate

VASHL32 VRa << #5-bit    Arithmetic Shift Left

Operands
    VRa       VRa can be VR0 - VR7. VRa can not be VR8.
#5-bit    5-bit unsigned immediate value

Opcode
LSW: 1110 0110 1111 0010
MSW: 0000 0111 IIII Iaaa

Description
Arithmetic left shift of VRa.

    if (VSTATUS[SAT] == 1) {
        VRa = sat(VRa << #5-bit Immediate)
    } else {
        VRa = VRa << #5-bit Immediate
    }

Flags
This instruction modifies the following bits in the VSTATUS register:
• OVFR is set if the 32-bit signed result after the shift left operation overflows.

Pipeline
This is a single-cycle instruction.

Example
    VASHL32 VR4 << #16    ; VR4 := VR4 << 16

See also
VASHR32 VRa >> #5-bit

VASHR32 VRa >> #5-bit    Arithmetic Shift Right

Operands
VRa       VRa can be VR0 - VR7. VRa cannot be VR8.
#5-bit    5-bit unsigned immediate value

Opcode
LSW: 1110 0110 1111 0010
MSW: 0000 1000 IIII Iaaa

Description
Arithmetic right shift of VRa.

    if (VSTATUS[RND] == 1) {
        VRa = rnd(VRa >> #5-bit Immediate)
    } else {
        VRa = VRa >> #5-bit Immediate
    }

Flags
This instruction does not affect any flags in the VSTATUS register.

Pipeline
This is a single-cycle instruction.

Example
    VASHR32 VR1 >> #16    ; VR1 := VR1 >> 16 (sign extended)

See also
VASHL32 VRa << #5-bit

VBITFLIP VRa    Bit Flip

Operands
VRa    General purpose register VR0...VR8

Opcode
LSW: 1010 0001 0010 aaaa

Description
Reverse the bit order of the VRa register.

    VRa[31:0] = VRa[0:31]

Flags
This instruction does not affect any flags in the VSTATUS register.

Pipeline
This is a single-cycle instruction.

Example
    VBITFLIP VR1    ; VR1(31:0) := VR1(0:31)
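As a cross-check of the VBITFLIP description above, the following is a minimal host-side sketch in Python, not device code. The function name vbitflip is hypothetical; it only models the 32-bit bit-order reversal VRa[31:0] = VRa[0:31].

```python
def vbitflip(vra: int) -> int:
    """Model of VBITFLIP: reverse the bit order of a 32-bit register value."""
    vra &= 0xFFFFFFFF                          # treat the input as a 32-bit register
    flipped = 0
    for _ in range(32):
        flipped = (flipped << 1) | (vra & 1)   # append the current LSB of the input
        vra >>= 1                              # expose the next input bit
    return flipped
```

Applying the model twice returns the original 32-bit value, which is a convenient sanity check when comparing against values read back from the device.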
VLSHL32 VRa << #5-bit    Logical Shift Left

Operands
VRa       VRa can be VR0 - VR7. VRa cannot be VR8.
#5-bit    5-bit unsigned immediate value

Opcode
LSW: 1110 0110 1111 0010
MSW: 0000 0101 IIII Iaaa

Description
Logical left shift of VRa.

    VRa = VRa << #5-bit Immediate

Flags
This instruction does not affect any flags in the VSTATUS register.

Pipeline
This is a single-cycle instruction.

Example
    VLSHL32 VR0 << #16    ; VR0 := VR0 << 16

See also
VLSHR32 VRa >> #5-bit

VLSHR32 VRa >> #5-bit    Logical Shift Right

Operands
VRa       VRa can be VR0 - VR7. VRa cannot be VR8.
#5-bit    5-bit unsigned immediate value

Opcode
LSW: 1110 0110 1111 0010
MSW: 0000 0110 IIII Iaaa

Description
Logical right shift of VRa.

    VRa = VRa >> #5-bit Immediate

Flags
This instruction does not affect any flags in the VSTATUS register.

Pipeline
This is a single-cycle instruction.

Example
    VLSHR32 VR0 >> #16    ; VR0 := VR0 >> 16 (no sign extension)

See also
VLSHL32 VRa << #5-bit

VNEG VRa    Two's Complement Negate

Operands
VRa    VRa can be VR0 - VR7. VRa cannot be VR8.

Opcode
LSW: 1110 0101 0001 aaaa

Description
Two's complement negation of VRa.

    // SAT is VSTATUS[SAT]
    //
    if (VRa == 0x80000000) {
        if (SAT == 1) {
            VRa = 0x7FFFFFFF;
        } else {
            VRa = 0x80000000;
        }
    } else {
        VRa = - VRa
    }

Flags
This instruction modifies the following bits in the VSTATUS register:
• OVFR is set if the input to the operation is 0x80000000.

Pipeline
This is a single-cycle instruction.
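The VNEG corner case can be illustrated with a small host-side Python sketch. The function name vneg and its return convention (result plus a Boolean saying whether OVFR would be set) are hypothetical and chosen for illustration only. The sketch does not model how long the flag stays set afterwards.

```python
INT32_MIN = -0x80000000
INT32_MAX = 0x7FFFFFFF

def vneg(vra, sat):
    """Model of VNEG on a signed 32-bit value; returns (new_vra, ovfr_set)."""
    if vra == INT32_MIN:
        # -INT32_MIN is not representable in 32 bits, so OVFR is set and the
        # result depends on VSTATUS[SAT].
        return (INT32_MAX if sat else INT32_MIN), True
    return -vra, False
```

For example, vneg(5, False) gives (-5, False), while the most negative input either saturates to 0x7FFFFFFF or stays at 0x80000000 depending on the sat argument.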
See also
VCLROVFR
VSATON
VSATOFF

2.5.4 Complex Math Instructions
The instructions are listed alphabetically, preceded by a summary.

Table 2-13. Complex Math Instructions
Title
VCADD VR5, VR4, VR3, VR2    Complex 32 + 32 = 32 Addition
VCADD VR5, VR4, VR3, VR2 || VMOV32 VRa, mem32    Complex 32+32 = 32 Add with Parallel Load
VCADD VR7, VR6, VR5, VR4    Complex 32 + 32 = 32 Addition
VCCMAC VR5, VR4, VR3, VR2, VR1, VR0    Complex Conjugate Multiply and Accumulate
VCCMAC VR5, VR4, VR3, VR2, VR1, VR0 || VMOV32 VRa, mem32    Complex Conjugate Multiply and Accumulate with Parallel Load
VCCMAC VR7, VR6, VR5, VR4, mem32, *XAR7++    Complex Conjugate Multiply and Accumulate
VCCMPY VR3, VR2, VR1, VR0    Complex Conjugate Multiply
VCCMPY VR3, VR2, VR1, VR0 || VMOV32 mem32, VRa    Complex Conjugate Multiply with Parallel Store
VCCMPY VR3, VR2, VR1, VR0 || VMOV32 VRa, mem32    Complex Conjugate Multiply with Parallel Load
VCCMAC VR5, VR4, VR3, VR2, VR1, VR0    Complex Conjugate Multiply with Parallel Load
VCCON VRa    Complex Conjugate
VCDADD16 VR5, VR4, VR3, VR2    Complex 16 + 32 = 16 Addition
VCDADD16 VR5, VR4, VR3, VR2 || VMOV32 VRa, mem32    Complex Double Add with Parallel Load
VCDSUB16 VR6, VR4, VR3, VR2    Complex 16-32 = 16 Subtract
VCDSUB16 VR6, VR4, VR3, VR2 || VMOV32 VRa, mem32    Complex 16-32 = 16 Subtract with Parallel Load
VCFLIP VRa    Swap Upper and Lower Half of VCU Register
VCMAC VR5, VR4, VR3, VR2, VR1, VR0    Complex Multiply and Accumulate
VCMAC VR7, VR6, VR5, VR4, mem32, *XAR7++    Complex Multiply and Accumulate
VCMAC VR5, VR4, VR3, VR2, VR1, VR0 || VMOV32 VRa, mem32    Complex Multiply and Accumulate with Parallel Load
VCMAG VRb, VRa    Magnitude of a Complex Number
VCMPY VR3, VR2, VR1, VR0    Complex Multiply
VCMPY VR3, VR2, VR1, VR0 || VMOV32 mem32, VRa    Complex Multiply with Parallel Store
VCMPY VR3, VR2, VR1, VR0 || VMOV32 VRa, mem32    Complex Multiply with Parallel Load
VCSHL16 VRa << #4-bit    Complex Shift Left
VCSHR16 VRa >> #4-bit    Complex Shift Right
VCSUB VR5, VR4, VR3, VR2    Complex 32 - 32 = 32 Subtraction
VCSUB VR5, VR4, VR3, VR2 || VMOV32 VRa, mem32    Complex Subtraction
VCADD VR5, VR4, VR3, VR2    Complex 32 + 32 = 32 Addition

Operands
Before the operation, the inputs should be loaded into registers as shown below. Each operand for this instruction includes a 32-bit real and a 32-bit imaginary part.

Input Register    Value
VR5               32-bit integer representing the real part of the first input: Re(X)
VR4               32-bit integer representing the imaginary part of the first input: Im(X)
VR3               32-bit integer representing the real part of the 2nd input: Re(Y)
VR2               32-bit integer representing the imaginary part of the 2nd input: Im(Y)

The result is also a complex number with a 32-bit real and a 32-bit imaginary part. The result is stored in VR5 and VR4 as shown below:

Output Register   Value
VR5               32-bit integer representing the real part of the result: Re(Z) = Re(X) + (Re(Y) >> SHIFTR)
VR4               32-bit integer representing the imaginary part of the result: Im(Z) = Im(X) + (Im(Y) >> SHIFTR)

Opcode
LSW: 1110 0101 0000 0010

Description
Complex 32 + 32 = 32-bit addition operation. The second input operand (stored in VR3 and VR2) is shifted right by VSTATUS[SHIFTR] bits before the addition. If VSTATUS[RND] is set, then bits shifted out to the right are rounded; otherwise these bits are truncated. The rounding operation is described in Section 2.3.2. If the VSTATUS[SAT] bit is set, then the result will be saturated in the event of an overflow or underflow.
    // RND    is VSTATUS[RND]
    // SAT    is VSTATUS[SAT]
    // SHIFTR is VSTATUS[SHIFTR]
    //
    // X: VR5 = Re(X)   VR4 = Im(X)
    // Y: VR3 = Re(Y)   VR2 = Im(Y)
    //
    // Calculate Z = X + Y
    //
    if (RND == 1) {
        VR5 = VR5 + round(VR3 >> SHIFTR);   // Re(Z)
        VR4 = VR4 + round(VR2 >> SHIFTR);   // Im(Z)
    } else {
        VR5 = VR5 + (VR3 >> SHIFTR);        // Re(Z)
        VR4 = VR4 + (VR2 >> SHIFTR);        // Im(Z)
    }
    if (SAT == 1) {
        sat32(VR5);
        sat32(VR4);
    }

Flags
This instruction modifies the following bits in the VSTATUS register:
• OVFR is set if the VR5 computation (real part) overflows or underflows.
• OVFI is set if the VR4 computation (imaginary part) overflows or underflows.

Pipeline
This is a single-cycle instruction.

See also
VCADD VR5, VR4, VR3, VR2 || VMOV32 VRa, mem32
VCADD VR7, VR6, VR5, VR4
VCLROVFI
VCLROVFR
VRNDOFF
VRNDON
VSATON
VSATOFF
VSETSHR #5-bit

VCADD VR5, VR4, VR3, VR2 || VMOV32 VRa, mem32    Complex 32+32 = 32 Add with Parallel Load

Operands
Before the operation, the inputs should be loaded into registers as shown below. Each complex number includes a 32-bit real and a 32-bit imaginary part.
Input Register    Value
VR5               32-bit integer representing the real part of the first input: Re(X)
VR4               32-bit integer representing the imaginary part of the first input: Im(X)
VR3               32-bit integer representing the real part of the 2nd input: Re(Y)
VR2               32-bit integer representing the imaginary part of the 2nd input: Im(Y)
mem32             Pointer to a 32-bit memory location

The result is also a complex number with a 32-bit real and a 32-bit imaginary part. The result is stored in VR5 and VR4 as shown below:

Output Register   Value
VR5               32-bit integer representing the real part of the result: Re(Z) = Re(X) + (Re(Y) >> SHIFTR)
VR4               32-bit integer representing the imaginary part of the result: Im(Z) = Im(X) + (Im(Y) >> SHIFTR)
VRa               Contents of the memory pointed to by [mem32]. VRa cannot be VR5, VR4 or VR8.

Opcode
LSW: 1110 0011 1111 1000
MSW: 0000 aaaa mem32

Description
Complex 32 + 32 = 32-bit addition operation with parallel register load. The second input operand (stored in VR3 and VR2) is shifted right by VSTATUS[SHIFTR] bits before the addition. If VSTATUS[RND] is set, then bits shifted out to the right are rounded; otherwise these bits are truncated. The rounding operation is described in Section 2.3.2. If the VSTATUS[SAT] bit is set, then the result will be saturated in the event of an overflow or underflow. In parallel with the addition, VRa is loaded with the contents of memory pointed to by mem32.
    // RND    is VSTATUS[RND]
    // SAT    is VSTATUS[SAT]
    // SHIFTR is VSTATUS[SHIFTR]
    //
    // VR5 = Re(X)   VR4 = Im(X)
    // VR3 = Re(Y)   VR2 = Im(Y)
    //
    // Z = X + Y
    //
    if (RND == 1) {
        VR5 = VR5 + round(VR3 >> SHIFTR);   // Re(Z)
        VR4 = VR4 + round(VR2 >> SHIFTR);   // Im(Z)
    } else {
        VR5 = VR5 + (VR3 >> SHIFTR);        // Re(Z)
        VR4 = VR4 + (VR2 >> SHIFTR);        // Im(Z)
    }
    if (SAT == 1) {
        sat32(VR5);
        sat32(VR4);
    }
    VRa = [mem32];

Flags
This instruction modifies the following bits in the VSTATUS register:
• OVFR is set if the VR5 computation (real part) overflows.
• OVFI is set if the VR4 computation (imaginary part) overflows.

Pipeline
Both operations complete in a single cycle (1/1 cycles).

See also
VCADD VR7, VR6, VR5, VR4
VCLROVFI
VCLROVFR
VRNDOFF
VRNDON
VSATON
VSATOFF
VSETSHR #5-bit

VCADD VR7, VR6, VR5, VR4    Complex 32 + 32 = 32 Addition

Operands
Before the operation, the inputs should be loaded into registers as shown below. Each complex number includes a 32-bit real and a 32-bit imaginary part.

Input Register    Value
VR7               32-bit integer representing the real part of the first input: Re(X)
VR6               32-bit integer representing the imaginary part of the first input: Im(X)
VR5               32-bit integer representing the real part of the 2nd input: Re(Y)
VR4               32-bit integer representing the imaginary part of the 2nd input: Im(Y)

The result is also a complex number with a 32-bit real and a 32-bit imaginary part.
The result is stored in VR7 and VR6 as shown below:

Output Register   Value
VR7               32-bit integer representing the real part of the result: Re(Z) = Re(X) + (Re(Y) >> SHIFTR)
VR6               32-bit integer representing the imaginary part of the result: Im(Z) = Im(X) + (Im(Y) >> SHIFTR)

Opcode
LSW: 1110 0101 0010 1010

Description
Complex 32 + 32 = 32-bit addition operation. The second input operand (stored in VR5 and VR4) is shifted right by VSTATUS[SHIFTR] bits before the addition. If VSTATUS[RND] is set, then bits shifted out to the right are rounded; otherwise these bits are truncated. The rounding operation is described in Section 2.3.2. If the VSTATUS[SAT] bit is set, then the result will be saturated in the event of an overflow or underflow.

    // RND    is VSTATUS[RND]
    // SAT    is VSTATUS[SAT]
    // SHIFTR is VSTATUS[SHIFTR]
    //
    // VR7 = Re(X)   VR6 = Im(X)
    // VR5 = Re(Y)   VR4 = Im(Y)
    //
    // Z = X + Y
    //
    if (RND == 1) {
        VR7 = VR7 + round(VR5 >> SHIFTR);   // Re(Z)
        VR6 = VR6 + round(VR4 >> SHIFTR);   // Im(Z)
    } else {
        VR7 = VR7 + (VR5 >> SHIFTR);        // Re(Z)
        VR6 = VR6 + (VR4 >> SHIFTR);        // Im(Z)
    }
    if (SAT == 1) {
        sat32(VR7);
        sat32(VR6);
    }

Flags
This instruction modifies the following bits in the VSTATUS register:
• OVFR is set if the VR7 computation (real part) overflows.
• OVFI is set if the VR6 computation (imaginary part) overflows.

Pipeline
This is a single-cycle instruction.
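The shift, round, add and saturate sequence used by the VCADD forms can be sketched on a host in Python. This is only an illustrative model under stated assumptions: all function names are hypothetical, rounding is assumed to add half an LSB before the shift (the actual definition is in Section 2.3.2), and the non-saturated result is assumed to wrap to 32 bits.

```python
INT32_MIN = -0x80000000
INT32_MAX = 0x7FFFFFFF

def shift_right(value, shiftr, rnd):
    """Arithmetic right shift; the rounding rule here is an assumption."""
    if rnd and shiftr > 0:
        value += 1 << (shiftr - 1)      # add half an LSB before shifting
    return value >> shiftr              # Python's >> keeps the sign

def sat32(value):
    """Clamp to the signed 32-bit range."""
    return max(INT32_MIN, min(INT32_MAX, value))

def wrap32(value):
    """Wrap to signed 32 bits (assumed behavior when SAT = 0)."""
    value &= 0xFFFFFFFF
    return value - 0x100000000 if value & 0x80000000 else value

def vcadd(x, y, shiftr=0, rnd=False, sat=False):
    """x and y are (re, im) pairs; returns ((re, im), ovfr, ovfi)."""
    re = x[0] + shift_right(y[0], shiftr, rnd)
    im = x[1] + shift_right(y[1], shiftr, rnd)
    ovfr = not (INT32_MIN <= re <= INT32_MAX)
    ovfi = not (INT32_MIN <= im <= INT32_MAX)
    fix = sat32 if sat else wrap32
    return (fix(re), fix(im)), ovfr, ovfi
```

With X = 10 - j4, Y = 8 + j8 and SHIFTR = 2, the model returns 12 - j2 with neither overflow flag raised. The same function covers all three VCADD register assignments, since only the register names differ.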
See also
VCADD VR5, VR4, VR3, VR2
VCADD VR5, VR4, VR3, VR2 || VMOV32 VRa, mem32
VCLROVFI
VCLROVFR
VRNDOFF
VRNDON
VSATON
VSATOFF
VSETSHR #5-bit

VCCMAC VR5, VR4, VR3, VR2, VR1, VR0    Complex Conjugate Multiply and Accumulate

Operands
Input Register (1)   Value
VR0                  First Complex Operand
VR1                  Second Complex Operand
VR2                  Imaginary part of the Result
VR3                  Real part of the Result
VR4                  Imaginary part of the accumulation
VR5                  Real part of the accumulation

(1) The user will need to do one final addition to accumulate the final multiplications (Real-VR3 and Imaginary-VR2) into the result registers.
Opcode
LSW: 1110 0101 0000 1111

Description
Complex Conjugate Multiply Operation

    // VR5 = Accumulation of the real part
    // VR4 = Accumulation of the imaginary part
    //
    // VR0 = X + jX: VR0[31:16] = X, VR0[15:0] = jX
    // VR1 = Y + jY: VR1[31:16] = Y, VR1[15:0] = jY
    //
    // Perform add
    //
    if (RND == 1) {
        VR5 = VR5 + round(VR3 >> SHIFTR);
        VR4 = VR4 + round(VR2 >> SHIFTR);
    } else {
        VR5 = VR5 + (VR3 >> SHIFTR);
        VR4 = VR4 + (VR2 >> SHIFTR);
    }
    //
    // Perform multiply (X + jX) * (Y - jY)
    //
    if (VSTATUS[CPACK] == 0) {
        VR3 = VR0H * VR1H + VR0L * VR1L;   // Real result
        VR2 = VR0H * VR1L - VR0L * VR1H;   // Imaginary result
    } else {
        VR3 = VR0L * VR1L + VR0H * VR1H;   // Real result
        VR2 = VR0L * VR1H - VR0H * VR1L;   // Imaginary result
    }
    if (SAT == 1) {
        sat32(VR3);
        sat32(VR2);
    }

Flags
This instruction modifies the following bits in the VSTATUS register:
• OVFR is set if the VR3 computation (real part) overflows or underflows.
• OVFI is set if the VR2 computation (imaginary part) overflows or underflows.

Pipeline
This is a 2p-cycle instruction.
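Because each VCCMAC first accumulates the product left by the previous VCCMAC and only then forms a new product, the last product is still pending in VR3/VR2 when the sequence ends. The following host-side Python sketch illustrates that ordering. It is a simplified model, not device code: the names s16, vccmac and final_add are hypothetical, RND = 0 and SAT = 0 are assumed, overflow is not modeled, and the multiply follows the operation listing above term by term.

```python
def s16(value):
    """Interpret the low 16 bits of value as a signed 16-bit integer."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value

def vccmac(regs, cpack=False, shiftr=0):
    """One VCCMAC step on a dict holding VR0..VR5 (RND = 0, SAT = 0 assumed)."""
    # 1) Accumulate the product left in VR3/VR2 by the previous step.
    regs["VR5"] += regs["VR3"] >> shiftr
    regs["VR4"] += regs["VR2"] >> shiftr
    # 2) Form the new product from the packed 16-bit halves of VR0 and VR1.
    h0, l0 = s16(regs["VR0"] >> 16), s16(regs["VR0"])
    h1, l1 = s16(regs["VR1"] >> 16), s16(regs["VR1"])
    if not cpack:
        regs["VR3"] = h0 * h1 + l0 * l1
        regs["VR2"] = h0 * l1 - l0 * h1
    else:
        regs["VR3"] = l0 * l1 + h0 * h1
        regs["VR2"] = l0 * h1 - h0 * l1

def final_add(regs, shiftr=0):
    """The extra addition needed after the last VCCMAC; returns (re, im)."""
    return (regs["VR5"] + (regs["VR3"] >> shiftr),
            regs["VR4"] + (regs["VR2"] >> shiftr))
```

In this model, two calls to vccmac leave only the first product in VR5/VR4. The second product reaches the total only through final_add, which corresponds to the final addition mentioned in the operand footnote.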
See also
VCLROVFI
VCLROVFR
VCCMAC VR5, VR4, VR3, VR2, VR1, VR0
VSATON
VSATOFF

VCCMAC VR5, VR4, VR3, VR2, VR1, VR0 || VMOV32 VRa, mem32    Complex Conjugate Multiply and Accumulate with Parallel Load

Operands
Input Register    Value
VR0               First Complex Operand
VR1               Second Complex Operand
VR2               Imaginary part of the Result
VR3               Real part of the Result
VR4               Imaginary part of the accumulation
VR5               Real part of the accumulation
VRa               Contents of the memory pointed to by mem32. VRa cannot be VR5, VR4 or VR8.
mem32             Pointer to 32-bit memory location

Note: The user will need to do one final addition to accumulate the final multiplications (Real-VR3 and Imaginary-VR2) into the result registers.

Opcode
LSW: 1110 0011 1111 0111
MSW: 0001 aaaa mem32

Description
Complex Conjugate Multiply Operation with parallel load.
    // VR5 = Accumulation of the real part
    // VR4 = Accumulation of the imaginary part
    //
    // VR0 = X + jX: VR0[31:16] = X, VR0[15:0] = jX
    // VR1 = Y + jY: VR1[31:16] = Y, VR1[15:0] = jY
    //
    // Perform add
    //
    if (RND == 1) {
        VR5 = VR5 + round(VR3 >> SHIFTR);
        VR4 = VR4 + round(VR2 >> SHIFTR);
    } else {
        VR5 = VR5 + (VR3 >> SHIFTR);
        VR4 = VR4 + (VR2 >> SHIFTR);
    }
    //
    // Perform multiply (X + jX) * (Y - jY)
    //
    if (VSTATUS[CPACK] == 0) {
        VR3 = VR0H * VR1H + VR0L * VR1L;   // Real result
        VR2 = VR0H * VR1L - VR0L * VR1H;   // Imaginary result
    } else {
        VR3 = VR0L * VR1L + VR0H * VR1H;   // Real result
        VR2 = VR0L * VR1H - VR0H * VR1L;   // Imaginary result
    }
    if (SAT == 1) {
        sat32(VR3);
        sat32(VR2);
    }
    VRa = [mem32];

Flags
This instruction modifies the following bits in the VSTATUS register:
• OVFR is set if the VR3 computation (real part) overflows or underflows.
• OVFI is set if the VR2 computation (imaginary part) overflows or underflows.

Pipeline
This is a 2p-cycle instruction.

See also
VCLROVFI
VCLROVFR
VCCMAC VR5, VR4, VR3, VR2, VR1, VR0
VSATON
VSATOFF

VCCMAC VR7, VR6, VR5, VR4, mem32, *XAR7++    Complex Conjugate Multiply and Accumulate

Operands
The VCCMAC alternates which registers are used between each cycle.
For odd cycles (1, 3, 5, and so on) the following registers are used:

Odd Cycle Input    Value
VR5                Previous real-part total accumulation: Re(odd_sum)
VR4                Previous imaginary-part total accumulation: Im(odd_sum)
VR1                Previous real result from the multiply: Re(odd_mpy)
VR0                Previous imaginary result from the multiply: Im(odd_mpy)
[mem32]            Pointer to a 32-bit memory location representing the first input to the multiply
                   If (VSTATUS[CPACK] == 0)
                       [mem32][31:16] = Re(X)
                       [mem32][15:0] = Im(X)
                   If (VSTATUS[CPACK] == 1)
                       [mem32][31:16] = Im(X)
                       [mem32][15:0] = Re(X)
XAR7               Pointer to a 32-bit memory location representing the second input to the multiply
                   If (VSTATUS[CPACK] == 0)
                       *XAR7[31:16] = Re(Y)
                       *XAR7[15:0] = Im(Y)
                   If (VSTATUS[CPACK] == 1)
                       *XAR7[31:16] = Im(Y)
                       *XAR7[15:0] = Re(Y)

The result from the odd cycle is stored as shown below:

Odd Cycle Output   Value
VR5                32-bit real part of the total accumulation: Re(odd_sum) = Re(odd_sum) + Re(odd_mpy)
VR4                32-bit imaginary part of the total accumulation: Im(odd_sum) = Im(odd_sum) + Im(odd_mpy)
VR1                32-bit real result from the multiplication: Re(Z) = Re(X)*Re(Y) + Im(X)*Im(Y)
VR0                32-bit imaginary result from the multiplication: Im(Z) = Re(X)*Im(Y) - Re(Y)*Im(X)

For even cycles (2, 4, 6, and so on) the following registers are used:

Even Cycle Input   Value
VR7                Previous real-part total accumulation: Re(even_sum)
VR6                Previous imaginary-part total accumulation: Im(even_sum)
VR3                Previous real result from the multiply: Re(even_mpy)
VR2                Previous imaginary result from the multiply: Im(even_mpy)
[mem32]            Pointer to a 32-bit memory location representing the first input to the multiply
                   If (VSTATUS[CPACK] == 0)
                       [mem32][31:16] = Re(X)
                       [mem32][15:0] = Im(X)
                   If (VSTATUS[CPACK] == 1)
                       [mem32][31:16] = Im(X)
                       [mem32][15:0] = Re(X)
XAR7               Pointer to a 32-bit memory location representing the second input to the multiply
                   If (VSTATUS[CPACK] == 0)
                       *XAR7[31:16] = Re(Y)
                       *XAR7[15:0] = Im(Y)
                   If (VSTATUS[CPACK] == 1)
                       *XAR7[31:16] = Im(Y)
                       *XAR7[15:0] = Re(Y)

The result from even cycles is stored as shown below:

Even Cycle Output  Value
VR7                32-bit real part of the total accumulation: Re(even_sum) = Re(even_sum) + Re(even_mpy)
VR6                32-bit imaginary part of the total accumulation: Im(even_sum) = Im(even_sum) + Im(even_mpy)
VR3                32-bit real result from the multiplication: Re(Z) = Re(X)*Re(Y) + Im(X)*Im(Y)
VR2                32-bit imaginary result from the multiplication: Im(Z) = Re(X)*Im(Y) - Re(Y)*Im(X)

Opcode
LSW: 1110 0010 0101 0001
MSW: 0010 1111 mem32

Description
Perform a repeated complex conjugate multiply and accumulate operation. This instruction must be used with the single repeat instruction (RPT ||). The destination of the accumulate will alternate between VR7/VR6 and VR5/VR4 on each cycle.
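The alternation between the two accumulator pairs described above can be traced with a short host-side Python sketch. It is illustrative only: the names conj_product and rpt_vccmac are hypothetical, the inputs are given as already unpacked (re, im) pairs, all registers are assumed to be cleared beforehand, and RND = 0 and SAT = 0 are assumed.

```python
def conj_product(x, y):
    """Multiply step on (re, im) pairs, written as in the output tables."""
    re = x[0] * y[0] + x[1] * y[1]      # Re(X)*Re(Y) + Im(X)*Im(Y)
    im = x[0] * y[1] - y[0] * x[1]      # Re(X)*Im(Y) - Re(Y)*Im(X)
    return re, im

def rpt_vccmac(xs, ys, shiftr=0):
    """Model N repeats; returns a list where element n stands for VRn."""
    vr = [0] * 8                        # VR0..VR7, assumed cleared beforehand
    for n, (x, y) in enumerate(zip(xs, ys)):
        if n % 2 == 0:                  # cycles 1, 3, 5, ...: VR5/VR4 with VR1/VR0
            acc_re, acc_im, mpy_re, mpy_im = 5, 4, 1, 0
        else:                           # cycles 2, 4, 6, ...: VR7/VR6 with VR3/VR2
            acc_re, acc_im, mpy_re, mpy_im = 7, 6, 3, 2
        # Accumulate the product computed two cycles earlier.
        vr[acc_re] += vr[mpy_re] >> shiftr
        vr[acc_im] += vr[mpy_im] >> shiftr
        # Then compute the product for the current array element.
        vr[mpy_re], vr[mpy_im] = conj_product(x, y)
    return vr
```

In this model, after four repeats the first two products sit in VR5/VR4 and VR7/VR6, while the last two products are still held in VR1/VR0 and VR3/VR2.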
    // Cycle 1:
    //
    // Perform accumulate
    //
    if (RND == 1) {
        VR5 = VR5 + round(VR1 >> SHIFTR)
        VR4 = VR4 + round(VR0 >> SHIFTR)
    } else {
        VR5 = VR5 + (VR1 >> SHIFTR)
        VR4 = VR4 + (VR0 >> SHIFTR)
    }
    //
    // X and Y array element 0
    //
    VR1 = Re(X)*Re(Y) + Im(X)*Im(Y)
    VR0 = Re(X)*Im(Y) - Re(Y)*Im(X)
    //
    // Cycle 2:
    //
    // Perform accumulate
    //
    if (RND == 1) {
        VR7 = VR7 + round(VR3 >> SHIFTR)
        VR6 = VR6 + round(VR2 >> SHIFTR)
    } else {
        VR7 = VR7 + (VR3 >> SHIFTR)
        VR6 = VR6 + (VR2 >> SHIFTR)
    }
    //
    // X and Y array element 1
    //
    VR3 = Re(X)*Re(Y) + Im(X)*Im(Y)
    VR2 = Re(X)*Im(Y) - Re(Y)*Im(X)
    //
    // Cycle 3:
    //
    // Perform accumulate
    //
    if (RND == 1) {
        VR5 = VR5 + round(VR1 >> SHIFTR)
        VR4 = VR4 + round(VR0 >> SHIFTR)
    } else {
        VR5 = VR5 + (VR1 >> SHIFTR)
        VR4 = VR4 + (VR0 >> SHIFTR)
    }
    //
    // X and Y array element 2
    //
    VR1 = Re(X)*Re(Y) + Im(X)*Im(Y)
    VR0 = Re(X)*Im(Y) - Re(Y)*Im(X)
    etc...

Restrictions
VR0, VR1, VR2, and VR3 will be used as temporary storage by this instruction.

Flags
The VSTATUS register flags are modified as follows:
• OVFR is set in the case of an overflow or underflow of the real part of the addition or subtraction operations.
• OVFI is set in the case of an overflow or underflow of the imaginary part of the addition or subtraction operations.

Pipeline
The VCCMAC takes 2p + N cycles where N is the number of times the instruction is repeated. This instruction has the following pipeline restrictions:

    ; No restriction
    ; Cannot be a 2p instruction that writes to VR0, VR1...VR7 registers
    ; Execute N times, where N is even
    *XAR6++, *XAR7++
    ; No restrictions. Can read VR0, VR1...VR8

See also
VCMAC VR7, VR6, VR5, VR4, mem32, *XAR7++

VCCMPY VR3, VR2, VR1, VR0    Complex Conjugate Multiply

Operands
Both inputs are complex numbers with a 16-bit real and 16-bit imaginary part. The result is a complex number with a 32-bit real and a 32-bit imaginary part. The result is stored in VR2 and VR3 as shown below:

Register    Value
VR0         First Complex Operand
VR1         Second Complex Operand
VR2         Imaginary part of the Result
VR3         Real part of the Result

Opcode
LSW: 1110 0101 0000 1110

Description
Complex Conjugate 16 x 16 = 32-bit multiply operation. If the VSTATUS[CPACK] bit is set, the low word of the input is treated as the real part while the upper word is treated as imaginary. If the VSTATUS[SAT] bit is set, then the result will be saturated in the event of a 32-bit overflow or underflow. The following operation is carried out:

    if (VSTATUS[CPACK] == 0) {
        VR3 = VR0H * VR1H + VR0L * VR1L;   // Re(Z) = Re(X)*Re(Y) + Im(X)*Im(Y)
        VR2 = VR0H * VR1L - VR0L * VR1H;   // Im(Z) = Re(X)*Im(Y) - Im(X)*Re(Y)
    } else {
        VR3 = VR0L * VR1L + VR0H * VR1H;   // Re(Z) = Re(X)*Re(Y) + Im(X)*Im(Y)
        VR2 = VR0L * VR1H - VR0H * VR1L;   // Im(Z) = Re(X)*Im(Y) - Im(X)*Re(Y)
    }

Flags
This instruction modifies the following bits in the VSTATUS register:
• OVFR is set if the VR3 computation (real part) overflows or underflows.
• OVFI is set if the VR2 computation (imaginary part) overflows or underflows.

Pipeline
This is a 2p-cycle instruction. The instruction following this one should not use VR3 or VR2.
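The effect of VSTATUS[CPACK] on how the packed 16-bit halves are interpreted can be checked with a small host-side Python model. This is a sketch under stated assumptions, not device code: the names s16, wrap32 and vccmpy are hypothetical, the arithmetic follows the operation listing above term by term, and the non-saturated result is assumed to wrap to 32 bits.

```python
INT32_MIN = -0x80000000
INT32_MAX = 0x7FFFFFFF

def s16(value):
    """Interpret the low 16 bits of value as a signed 16-bit integer."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value

def wrap32(value):
    """Wrap to signed 32 bits (assumed behavior when SAT = 0)."""
    value &= 0xFFFFFFFF
    return value - 0x100000000 if value & 0x80000000 else value

def vccmpy(vr0, vr1, cpack=False, sat=False):
    """Return (vr3, vr2, ovfr, ovfi) for two packed 16-bit complex inputs."""
    h0, l0 = s16(vr0 >> 16), s16(vr0)
    h1, l1 = s16(vr1 >> 16), s16(vr1)
    if not cpack:                       # real part in the high word
        vr3 = h0 * h1 + l0 * l1
        vr2 = h0 * l1 - l0 * h1
    else:                               # real part in the low word
        vr3 = l0 * l1 + h0 * h1
        vr2 = l0 * h1 - h0 * l1
    ovfr = not (INT32_MIN <= vr3 <= INT32_MAX)
    ovfi = not (INT32_MIN <= vr2 <= INT32_MAX)
    if sat:
        vr3 = max(INT32_MIN, min(INT32_MAX, vr3))
        vr2 = max(INT32_MIN, min(INT32_MAX, vr2))
    else:
        vr3, vr2 = wrap32(vr3), wrap32(vr2)
    return vr3, vr2, ovfr, ovfi
```

The same complex pair gives the same VR3/VR2 values in either packing, provided the halves of both inputs are swapped consistently with CPACK. The only real-part overflow case in this model is both inputs equal to 0x80008000.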
Example
    VCLRCPACK                        ; cpack = 0, real part in high word
    VMOV32    VR0, *XAR4++           ; load 1st complex input | jb + a
    VMOV32    VR1, *XAR4++           ; load second complex input | jd + c
    VCCMPY    VR3, VR2, VR1, VR0     ; complex conjugate multiply | (jb + a)*(jd + c)=(ac+bd)+j(bc-ad)
    NOP                              ;
    VMOV32    *XAR5++, VR3           ; store real part first
    VMOV32    *XAR5++, VR2           ; store imag part next
    VSETCPACK                        ; cpack = 1, real part in low word
    VMOV32    VR0, *XAR4++           ; load 1st complex input | a + jb
    VMOV32    VR1, *XAR4++           ; load second complex input | c + jd
    VCCMPY    VR3, VR2, VR1, VR0     ; complex conjugate multiply | (a + jb)*(c + jd)=(ac+bd)+j(bc-ad)
    NOP                              ;
    VMOV32    *XAR5++, VR3           ; store real part first
    VMOV32    *XAR5++, VR2           ; store imag part next

See also
VCLROVFI
VCLROVFR
VCCMAC VR5, VR4, VR3, VR2, VR1, VR0
VCCMAC VR5, VR4, VR3, VR2, VR1, VR0 || VMOV32 VRa, mem32
VSETCPACK
VCLRCPACK
VSATON
VSATOFF

VCCMPY VR3, VR2, VR1, VR0 || VMOV32 mem32, VRa    Complex Conjugate Multiply with Parallel Store

Operands
Both inputs are complex numbers with a 16-bit real and 16-bit imaginary part. The result is a complex number with a 32-bit real and a 32-bit imaginary part. The result is stored in VR2 and VR3 as shown below:

Register    Value
VR0         First Complex Operand
VR1         Second Complex Operand
VRa         Value to be stored
VR2         Imaginary part of the Result
VR3         Real part of the Result
mem32       Pointer to 32-bit memory location

Opcode
LSW: 1110 0011 0000 0111
MSW: 0001 aaaa mem32

Description
Complex Conjugate 16 x 16 = 32-bit multiply operation. If the VSTATUS[CPACK] bit is set, the low word of the input is treated as the real part while the upper word is treated as imaginary. If the VSTATUS[SAT] bit is set, then the result will be saturated in the event of a 32-bit overflow or underflow. The following operation is carried out:

    if (VSTATUS[CPACK] == 0) {
        VR3 = VR0H * VR1H + VR0L * VR1L;   // Re(Z) = Re(X)*Re(Y) + Im(X)*Im(Y)
        VR2 = VR0H * VR1L - VR0L * VR1H;   // Im(Z) = Re(X)*Im(Y) - Im(X)*Re(Y)
    } else {
        VR3 = VR0L * VR1L + VR0H * VR1H;   // Re(Z) = Re(X)*Re(Y) + Im(X)*Im(Y)
        VR2 = VR0L * VR1H - VR0H * VR1L;   // Im(Z) = Re(X)*Im(Y) - Im(X)*Re(Y)
    }
    [mem32] = VRa;

Flags
This instruction modifies the following bits in the VSTATUS register:
• OVFR is set if the VR3 computation (real part) overflows or underflows.
• OVFI is set if the VR2 computation (imaginary part) overflows or underflows.

Pipeline
This is a 2p/1-cycle instruction. The multiply operation takes 2p cycles and the VMOV operation completes in a single cycle. The instruction following this one should not use VR3 or VR2.
Example
    VCLRCPACK                        ; cpack = 0 real part in high word
    VMOV32    VR0, *XAR4++           ; load 1st complex input | jb + a
    VMOV32    VR1, *XAR4++           ; load second complex input | jd + c
    VCCMPY    VR3, VR2, VR1, VR0     ; complex conjugate multiply | (jb + a)*(jd + c)=(ac+bd)+j(bc-ad)
    ||VMOV32  VR0, *XAR4++           ; load 1st complex input | a + jb
                                     ; for next VCCMPY instr |
    NOP                              ;
    VMOV32    *XAR5++, VR3           ; store real part first
    VSETCPACK                        ; cpack = 1 imag part in low word
    VMOV32    VR1, *XAR4++           ; load second complex input | c + jd
    VCCMPY    VR3, VR2, VR1, VR0     ; complex conjugate multiply | (a + jb)*(c + jd)=(ac+bd)+j(bc-ad)
    ||VMOV32  *XAR5++, VR2           ; store imag part of first |
                                     ; VCCMPY instruction |
    NOP                              ;
    VMOV32    *XAR5++, VR3           ; store real part first
    VMOV32    *XAR5++, VR2           ; store imag part next
    VCLRCPACK

See also
VCLROVFI
VCLROVFR
VCCMAC VR5, VR4, VR3, VR2, VR1, VR0
VCCMAC VR3, VR2, VR1, VR0 || VMOV32 VRa, mem32
VSETCPACK
VCLRCPACK
VSATON
VSATOFF

VCCMPY VR3, VR2, VR1, VR0 || VMOV32 VRa, mem32    Complex Conjugate Multiply with Parallel Load

Operands
Both inputs are complex numbers with a 16-bit real and 16-bit imaginary part. The result is a complex number with a 32-bit real and a 32-bit imaginary part. The result is stored in VR2 and VR3 as shown below:

Register    Value
VR0         First Complex Operand
VR1         Second Complex Operand
VRa         32-bit value pointed to by mem32. VRa cannot be VR2, VR3 or VR8.
    VR2              Imaginary part of the Result
    VR3              Real part of the Result
    mem32            Pointer to 32-bit memory location

Opcode
    LSW: 1110 0011 1111 0110
    MSW: 0001 aaaa mem32

Description
    Complex Conjugate 16 x 16 = 32-bit multiply operation. If the VSTATUS[CPACK] bit is set, the low word of the input is treated as the real part while the upper word is treated as imaginary. If the VSTATUS[SAT] bit is set, then the result will be saturated in the event of a 32-bit overflow or underflow. The following operation is carried out:

    if(VSTATUS[CPACK] == 0){
        VR3 = VR0H * VR1H + VR0L * VR1L;  // Re(Z) = Re(X)*Re(Y) + Im(X)*Im(Y)
        VR2 = VR0H * VR1L - VR0L * VR1H;  // Im(Z) = Re(X)*Im(Y) - Im(X)*Re(Y)
    }else{
        VR3 = VR0L * VR1L + VR0H * VR1H;  // Re(Z) = Re(X)*Re(Y) + Im(X)*Im(Y)
        VR2 = VR0L * VR1H - VR0H * VR1L;  // Im(Z) = Re(X)*Im(Y) - Im(X)*Re(Y)
    }
    VRa = [mem32];

Flags
    This instruction modifies the following bits in the VSTATUS register:
    • OVFR is set if the VR3 computation (real part) overflows or underflows.
    • OVFI is set if the VR2 computation (imaginary part) overflows or underflows.

Pipeline
    This is a 2p/1-cycle instruction. The multiply operation takes 2p cycles and the VMOV operation completes in a single cycle. The instruction following this one should not use VR3 or VR2.
Example
    VCLRCPACK                        ; cpack = 0 real part in high word
    VMOV32    VR0, *XAR4++           ; load 1st complex input | jb + a
    VMOV32    VR1, *XAR4++           ; load second complex input | jd + c
    VCCMPY    VR3, VR2, VR1, VR0     ; complex conjugate multiply | (jb + a)*(jd + c)=(ac+bd)+j(bc-ad)
 || VMOV32    VR0, *XAR4++           ; load 1st complex input | a + jb for next VCCMPY instr
    NOP                              ;
    VMOV32    *XAR5++, VR3           ; store real part first
    VSETCPACK                        ; cpack = 1 imag part in low word
    VMOV32    VR1, *XAR4++           ; load second complex input | c + jd
    VCCMPY    VR3, VR2, VR1, VR0     ; complex conjugate multiply | (a + jb)*(c + jd)=(ac+bd)+j(bc-ad)
 || VMOV32    *XAR5++, VR2           ; store imag part of first VCCMPY instruction
    NOP                              ;
    VMOV32    *XAR5++, VR3           ; store real part first
    VMOV32    *XAR5++, VR2           ; store imag part next
    VCLRCPACK

See also
    VCLROVFI
    VCLROVFR
    VCCMAC VR5, VR4, VR3, VR2, VR1, VR0 || VMOV32 VRa, mem32
    VCCMAC VR5, VR4, VR3, VR2, VR1, VR0
    VSETCPACK
    VCLRCPACK
    VSATON
    VSATOFF

VCCMAC VR5, VR4, VR3, VR2, VR1, VR0
Complex Conjugate Multiply with Parallel Load

Operands
    Both inputs are complex numbers with a 16-bit real and 16-bit imaginary part. The result is a complex number with a 32-bit real and a 32-bit imaginary part. The result is stored in VR2 and VR3 as shown below:

    Input Register   Value
    VR0              First Complex Operand
    VR1              Second Complex Operand
    VRa              32-bit value pointed to by mem32. VRa cannot be VR2, VR3 or VR8.
    VR2              Imaginary part of the Result
    VR3              Real part of the Result
    mem32            Pointer to 32-bit memory location

Opcode
    LSW: 1110 0101 0000 1111

Description
    Complex Conjugate 16 x 16 = 32-bit multiply operation. If the VSTATUS[CPACK] bit is set, the low word of the input is treated as the real part while the upper word is treated as imaginary. If the VSTATUS[SAT] bit is set, then the result will be saturated in the event of a 32-bit overflow or underflow. The following operation is carried out:

    if(VSTATUS[CPACK] == 0){
        VR3 = VR0H * VR1H + VR0L * VR1L;  // Re(Z) = Re(X)*Re(Y) + Im(X)*Im(Y)
        VR2 = VR0H * VR1L - VR0L * VR1H;  // Im(Z) = Re(X)*Im(Y) - Im(X)*Re(Y)
    }else{
        VR3 = VR0L * VR1L + VR0H * VR1H;  // Re(Z) = Re(X)*Re(Y) + Im(X)*Im(Y)
        VR2 = VR0L * VR1H - VR0H * VR1L;  // Im(Z) = Re(X)*Im(Y) - Im(X)*Re(Y)
    }
    VRa = [mem32];

Flags
    This instruction modifies the following bits in the VSTATUS register:
    • OVFR is set if the VR3 computation (real part) overflows or underflows.
    • OVFI is set if the VR2 computation (imaginary part) overflows or underflows.

Pipeline
    This is a 2p/1-cycle instruction. The multiply operation takes 2p cycles and the VMOV operation completes in a single cycle. The instruction following this one should not use VR3 or VR2.
Example
    VCLRCPACK                        ; cpack = 0 real part in high word
    VMOV32    VR0, *XAR4++           ; load 1st complex input | jb + a
    VMOV32    VR1, *XAR4++           ; load second complex input | jd + c
    VCCMPY    VR3, VR2, VR1, VR0     ; complex conjugate multiply | (jb + a)*(jd + c)=(ac+bd)+j(bc-ad)
 || VMOV32    VR0, *XAR4++           ; load 1st complex input | a + jb for next VCCMPY instr
    NOP                              ;
    VMOV32    *XAR5++, VR3           ; store real part first
    VSETCPACK                        ; cpack = 1 imag part in low word
    VMOV32    VR1, *XAR4++           ; load second complex input | c + jd
    VCCMPY    VR3, VR2, VR1, VR0     ; complex conjugate multiply | (a + jb)*(c + jd)=(ac+bd)+j(bc-ad)
 || VMOV32    *XAR5++, VR2           ; store imag part of first VCCMPY instruction
    NOP                              ;
    VMOV32    *XAR5++, VR3           ; store real part first
    VMOV32    *XAR5++, VR2           ; store imag part next
    VCLRCPACK

See also
    VCLROVFI
    VCLROVFR
    VCCMAC VR5, VR4, VR3, VR2, VR1, VR0
    VCCMAC VR5, VR4, VR3, VR2, VR1, VR0 || VMOV32 VRa, mem32
    VSETCPACK
    VCLRCPACK
    VSATON
    VSATOFF

VCCON VRa
Complex Conjugate

Operands
    VRa              General purpose register: VR0, VR1....VR7. Cannot be VR8.

Opcode
    LSW: 1110 0001 0001 aaaa

Description
    if(VSTATUS[CPACK] == 0){
        if(VSTATUS[SAT] == 1){
            VRaL = sat(- VRaL)
        }else {
            VRaL = - VRaL
        }
    }else {
        if(VSTATUS[SAT] == 1){
            VRaH = sat(- VRaH)
        }else {
            VRaH = - VRaH
        }
    }

Flags
    This instruction modifies the following bits in the VSTATUS register:
    • OVFI is set if negating the imaginary part in the conjugate operation overflows or underflows.

Pipeline
    This is a single-cycle instruction.
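The conjugate operation negates only the 16-bit half that holds the imaginary part, and the only input that can overflow is -32768. The following C sketch illustrates that behaviour on a host machine. `vccon_model` and its parameters are hypothetical names, and the unsaturated overflow case is assumed to wrap back to 0x8000.

```c
#include <stdint.h>

/*
 * Hypothetical host-side model of VCCON (not a TI API).
 * vra   : packed complex register value
 * cpack : models VSTATUS[CPACK]; 0 = imaginary part in the low word,
 *         1 = imaginary part in the high word
 * sat   : models VSTATUS[SAT]
 * ovfi  : set to 1 when the negation overflows; otherwise left unchanged
 */
uint32_t vccon_model(uint32_t vra, int cpack, int sat, int *ovfi) {
    unsigned shift = cpack ? 16u : 0u;            /* bit position of the imaginary half */
    int32_t im  = (int16_t)(uint16_t)(vra >> shift);
    int32_t neg = -im;

    if (neg > INT16_MAX) {                        /* only happens for im == -32768 */
        *ovfi = 1;
        if (sat) {
            neg = INT16_MAX;                      /* saturate to 0x7FFF */
        }
        /* Assumption: without saturation the value wraps to 0x8000. */
    }

    uint32_t half = (uint32_t)neg & UINT32_C(0xFFFF);
    uint32_t mask = UINT32_C(0xFFFF) << shift;
    return (vra & ~mask) | (half << shift);
}
```

With CPACK = 0, the value 0x00040003 (4 + 3j) becomes 0x0004FFFD (4 - 3j). With CPACK = 1 the same bit pattern is read as 3 + 4j and becomes 0xFFFC0003.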
Example
    VCCON VR1                        ; VR1 := VR1^*

VCDADD16 VR5, VR4, VR3, VR2
Complex 16 + 32 = 16 Addition

Operands
    Before the operation, the inputs should be loaded into registers as shown below. The first operand is a complex number with a 16-bit real and 16-bit imaginary part. The second operand has a 32-bit real and a 32-bit imaginary part.

    Input Register   Value
    VR4H             16-bit integer: if (VSTATUS[CPACK]==0) Re(X) else Im(X)
    VR4L             16-bit integer: if (VSTATUS[CPACK]==0) Im(X) else Re(X)
    VR3              32-bit integer representing the real part of the 2nd input: Re(Y)
    VR2              32-bit integer representing the imaginary part of the 2nd input: Im(Y)

    The result is a complex number with a 16-bit real and a 16-bit imaginary part. The result is stored in VR5 as shown below:

    Output Register  Value
    VR5H             16-bit integer:
                     if (VSTATUS[CPACK]==0) {
                         Re(Z) = ((Re(X) << SHIFTL) + Re(Y)) >> SHIFTR
                     } else {
                         Im(Z) = ((Im(X) << SHIFTL) + Im(Y)) >> SHIFTR
                     }
    VR5L             16-bit integer:
                     if (VSTATUS[CPACK]==0) {
                         Im(Z) = ((Im(X) << SHIFTL) + Im(Y)) >> SHIFTR
                     } else {
                         Re(Z) = ((Re(X) << SHIFTL) + Re(Y)) >> SHIFTR
                     }

Opcode
    LSW: 1110 0101 0000 0100

Description
    Complex 16 + 32 = 16-bit operation. This operation is useful for algorithms similar to a complex FFT. The first operand is a complex number with a 16-bit real and 16-bit imaginary part. The second operand has a 32-bit real and a 32-bit imaginary part. Before the addition, the first input is sign extended to 32 bits and shifted left by VSTATUS[SHIFTL] bits. The result of the addition is right shifted by VSTATUS[SHIFTR] bits before it is stored in VR5H and VR5L. If VSTATUS[RND] is set, then bits shifted out to the right are rounded, otherwise these bits are truncated. The rounding operation is described in Section 2.3.2.
    If the VSTATUS[SAT] bit is set, then the result will be saturated in the event of a 16-bit overflow or underflow.

    // RND    is VSTATUS[RND]
    // SAT    is VSTATUS[SAT]
    // SHIFTR is VSTATUS[SHIFTR]
    // SHIFTL is VSTATUS[SHIFTL]
    //
    // VSTATUS[CPACK] = 0
    // VR4H = Re(X) 16-bit
    // VR4L = Im(X) 16-bit
    // VR3  = Re(Y) 32-bit
    // VR2  = Im(Y) 32-bit
    //
    // Calculate Z = X + Y
    //
    temp1 = sign_extend(VR4H);            // 32-bit extended Re(X)
    temp2 = sign_extend(VR4L);            // 32-bit extended Im(X)
    temp1 = (temp1 << SHIFTL) + VR3;      // Re(Z) intermediate
    temp2 = (temp2 << SHIFTL) + VR2;      // Im(Z) intermediate
    if (RND == 1) {
        temp1 = round(temp1 >> SHIFTR);
        temp2 = round(temp2 >> SHIFTR);
    } else {
        temp1 = truncate(temp1 >> SHIFTR);
        temp2 = truncate(temp2 >> SHIFTR);
    }
    if (SAT == 1) {
        VR5H = sat16(temp1);
        VR5L = sat16(temp2);
    } else {
        VR5H = temp1[15:0];
        VR5L = temp2[15:0];
    }

Flags
    This instruction modifies the following bits in the VSTATUS register:
    • OVFR is set if the real-part computation (VR5H) overflows or underflows.
    • OVFI is set if the imaginary-part computation (VR5L) overflows or underflows.

Pipeline
    This is a single-cycle instruction.
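The shift, round and saturate sequence above can be prototyped in C to check expected register values before running on hardware. The sketch below is a host-side model under stated assumptions, not the documented hardware behaviour: `vstatus_t`, `asr64`, `narrow16` and `vcdadd16_model` are hypothetical names, rounding is assumed to add half an LSB before the right shift (this matches the worked examples for this instruction, but Section 2.3.2 is the authority), and intermediates are kept in 64 bits so any 32-bit wraparound in the hardware is not reproduced.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical subset of VSTATUS used by this model. */
typedef struct {
    unsigned shiftl, shiftr;   /* VSTATUS[SHIFTL], VSTATUS[SHIFTR] */
    int rnd, sat, cpack;       /* VSTATUS[RND], [SAT], [CPACK]     */
    int ovfr, ovfi;            /* overflow flags, set but never cleared here */
} vstatus_t;

/* Arithmetic (floor) right shift that is portable for negative values. */
static int64_t asr64(int64_t v, unsigned n) {
    return v >= 0 ? (v >> n) : ~((~v) >> n);
}

/* Reduce to 16 bits; flag overflow and clamp when saturation is enabled. */
static uint16_t narrow16(int64_t v, int sat, int *ovf) {
    if (v > INT16_MAX || v < INT16_MIN) {
        *ovf = 1;
        if (sat) v = (v > 0) ? INT16_MAX : INT16_MIN;
    }
    return (uint16_t)((uint64_t)v & 0xFFFFu);
}

/* Returns the modeled VR5 value. vr4 is packed X; vr3/vr2 are Re(Y)/Im(Y). */
uint32_t vcdadd16_model(uint32_t vr4, int32_t vr3, int32_t vr2, vstatus_t *st) {
    assert(st->shiftl < 32 && st->shiftr < 32);   /* 5-bit shift fields */

    int64_t hi = (int16_t)(uint16_t)(vr4 >> 16);
    int64_t lo = (int16_t)(uint16_t)(vr4 & 0xFFFFu);
    int64_t xr = st->cpack ? lo : hi;             /* Re(X) */
    int64_t xi = st->cpack ? hi : lo;             /* Im(X) */

    /* Multiply instead of << to avoid shifting a negative value in C. */
    int64_t t1 = xr * ((int64_t)1 << st->shiftl) + vr3;   /* Re(Z) intermediate */
    int64_t t2 = xi * ((int64_t)1 << st->shiftl) + vr2;   /* Im(Z) intermediate */

    if (st->rnd && st->shiftr > 0) {              /* assumed rounding: add half an LSB */
        int64_t half = (int64_t)1 << (st->shiftr - 1);
        t1 += half;
        t2 += half;
    }
    t1 = asr64(t1, st->shiftr);
    t2 = asr64(t2, st->shiftr);

    uint16_t zr = narrow16(t1, st->sat, &st->ovfr);
    uint16_t zi = narrow16(t2, st->sat, &st->ovfi);
    return st->cpack ? (((uint32_t)zi << 16) | zr)
                     : (((uint32_t)zr << 16) | zi);
}
```

Under these assumptions the model reproduces the worked results for this instruction, for example X = -4 + 3j, Y = 13 - 9j with SHIFTL = 2, SHIFTR = 1 and rounding enabled gives 0xFFFF0002 (-1 + 2j).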
Example
    ;
    ; Example: Z = X + Y
    ;
    ; X = 4 + 3j    (16-bit real + 16-bit imaginary)
    ; Y = 13 + 12j  (32-bit real + 32-bit imaginary)
    ;
    ; Real:
    ;   temp1 = 0x00000004 + 0x0000000D = 0x00000011
    ;   VR5H  = temp1[15:0] = 0x0011 = 17
    ; Imaginary:
    ;   temp2 = 0x00000003 + 0x0000000C = 0x0000000F
    ;   VR5L  = temp2[15:0] = 0x000F = 15
    ;
    VSATOFF                          ; VSTATUS[SAT] = 0
    VRNDOFF                          ; VSTATUS[RND] = 0
    VSETSHR   #0                     ; VSTATUS[SHIFTR] = 0
    VSETSHL   #0                     ; VSTATUS[SHIFTL] = 0
    VCLEARALL                        ; VR0, VR1...VR8 == 0
    VMOVXI    VR3, #13               ; VR3 = Re(Y) = 13
    VMOVXI    VR2, #12               ; VR2 = Im(Y) = 12
    VMOVXI    VR4, #3
    VMOVIX    VR4, #4                ; VR4 = X = 0x00040003 = 4 + 3j
    VCDADD16  VR5, VR4, VR3, VR2     ; VR5 = Z = 0x0011000F = 17 + 15j

    The next example illustrates the operation with a right shift value defined.

    ;
    ; Example: Z = X + Y with Right Shift
    ;
    ; X = 4 + 3j    (16-bit real + 16-bit imaginary)
    ; Y = 13 + 12j  (32-bit real + 32-bit imaginary)
    ;
    ; Real:
    ;   temp1 = (0x00000004 + 0x0000000D) >> 1
    ;   temp1 = (0x00000011) >> 1 = 0x00000008.8
    ;   VR5H  = temp1[15:0] = 0x0008 = 8
    ; Imaginary:
    ;   temp2 = (0x00000003 + 0x0000000C) >> 1
    ;   temp2 = (0x0000000F) >> 1 = 0x00000007.8
    ;   VR5L  = temp2[15:0] = 0x0007 = 7
    ;
    VSATOFF                          ; VSTATUS[SAT] = 0
    VRNDOFF                          ; VSTATUS[RND] = 0
    VSETSHR   #1                     ; VSTATUS[SHIFTR] = 1
    VSETSHL   #0                     ; VSTATUS[SHIFTL] = 0
    VCLEARALL                        ; VR0, VR1...VR8 == 0
    VMOVXI    VR3, #13               ; VR3 = Re(Y) = 13
    VMOVXI    VR2, #12               ; VR2 = Im(Y) = 12
    VMOVXI    VR4, #3
    VMOVIX    VR4, #4                ; VR4 = X = 0x00040003 = 4 + 3j
    VCDADD16  VR5, VR4, VR3, VR2     ; VR5 = Z = 0x00080007 = 8 + 7j

    The next example illustrates the operation with a right shift value defined as well as rounding.
    ;
    ; Example: Z = X + Y with Right Shift and Rounding
    ;
    ; X = 4 + 3j    (16-bit real + 16-bit imaginary)
    ; Y = 13 + 12j  (32-bit real + 32-bit imaginary)
    ;
    ; Real:
    ;   temp1 = round((0x00000004 + 0x0000000D) >> 1)
    ;   temp1 = round(0x00000011 >> 1)
    ;   temp1 = round(0x00000008.8) = 0x00000009
    ;   VR5H  = temp1[15:0] = 0x0009 = 9
    ; Imaginary:
    ;   temp2 = round((0x00000003 + 0x0000000C) >> 1)
    ;   temp2 = round(0x0000000F >> 1)
    ;   temp2 = round(0x00000007.8) = 0x00000008
    ;   VR5L  = temp2[15:0] = 0x0008 = 8
    ;
    VSATOFF                          ; VSTATUS[SAT] = 0
    VRNDON                           ; VSTATUS[RND] = 1
    VSETSHR   #1                     ; VSTATUS[SHIFTR] = 1
    VSETSHL   #0                     ; VSTATUS[SHIFTL] = 0
    VCLEARALL                        ; VR0, VR1...VR8 == 0
    VMOVXI    VR3, #13               ; VR3 = Re(Y) = 13
    VMOVXI    VR2, #12               ; VR2 = Im(Y) = 12
    VMOVXI    VR4, #3
    VMOVIX    VR4, #4                ; VR4 = X = 0x00040003 = 4 + 3j
    VCDADD16  VR5, VR4, VR3, VR2     ; VR5 = Z = 0x00090008 = 9 + 8j

    The next example illustrates the operation with both a right and left shift value defined along with rounding.

    ;
    ; Example: Z = X + Y with Right Shift, Left Shift and Rounding
    ;
    ; X = -4 + 3j   (16-bit real + 16-bit imaginary)
    ; Y = 13 - 9j   (32-bit real + 32-bit imaginary)
    ;
    ; Real:
    ;   temp1 = (0xFFFFFFFC << 2) + 0x0000000D
    ;   temp1 = 0xFFFFFFF0 + 0x0000000D = 0xFFFFFFFD
    ;   temp1 = 0xFFFFFFFD >> 1 = 0xFFFFFFFE.8
    ;   temp1 = round(0xFFFFFFFE.8) = 0xFFFFFFFF
    ;   VR5H  = temp1[15:0] = 0xFFFF = -1
    ; Imaginary:
    ;   temp2 = (0x00000003 << 2) + 0xFFFFFFF7
    ;   temp2 = 0x0000000C + 0xFFFFFFF7 = 0x00000003
    ;   temp2 = 0x00000003 >> 1 = 0x00000001.8
    ;   temp2 = round(0x00000001.8) = 0x00000002
    ;   VR5L  = temp2[15:0] = 0x0002 = 2
    ;
    VSATOFF                          ; VSTATUS[SAT] = 0
    VRNDON                           ; VSTATUS[RND] = 1
    VSETSHR   #1                     ; VSTATUS[SHIFTR] = 1
    VSETSHL   #2                     ; VSTATUS[SHIFTL] = 2
    VCLEARALL                        ; VR0, VR1...VR8 == 0
    VMOVXI    VR3, #13               ; VR3 = Re(Y) = 13 = 0x0000000D
    VMOVXI    VR2, #-9               ; VR2 = Im(Y) = -9
    VMOVIX    VR2, #0xFFFF           ; sign extend VR2 = 0xFFFFFFF7
    VMOVXI    VR4, #3
    VMOVIX    VR4, #-4               ; VR4 = X = 0xFFFC0003 = -4 + 3j
    VCDADD16  VR5, VR4, VR3, VR2     ; VR5 = Z = 0xFFFF0002 = -1 + 2j

See also
    VCADD VR5, VR4, VR3, VR2 || VMOV32 VRa, mem32
    VCADD VR7, VR6, VR5, VR4
    VCDADD16 VR5, VR4, VR3, VR2 || VMOV32 VRa, mem32
    VRNDOFF
    VRNDON
    VSATON
    VSATOFF
    VSETSHL #5-bit
    VSETSHR #5-bit

VCDADD16 VR5, VR4, VR3, VR2 || VMOV32 VRa, mem32
Complex Double Add with Parallel Load

Operands
    Before the operation, the inputs should be loaded into registers as shown below. The first operand is a complex number with a 16-bit real and 16-bit imaginary part. The second operand has a 32-bit real and a 32-bit imaginary part.

    Input Register   Value
    VR4H             16-bit integer: if (VSTATUS[CPACK]==0) Re(X) else Im(X)
    VR4L             16-bit integer: if (VSTATUS[CPACK]==0) Im(X) else Re(X)
    VR3              32-bit integer representing the real part of the 2nd input: Re(Y)
    VR2              32-bit integer representing the imaginary part of the 2nd input: Im(Y)
    mem32            Pointer to a 32-bit memory location.

    The result is a complex number with a 16-bit real and a 16-bit imaginary part. The result is stored in VR5 as shown below:

    Output Register  Value
    VR5H             16-bit integer:
                     if (VSTATUS[CPACK]==0) {
                         Re(Z) = ((Re(X) << SHIFTL) + Re(Y)) >> SHIFTR
                     } else {
                         Im(Z) = ((Im(X) << SHIFTL) + Im(Y)) >> SHIFTR
                     }
    VR5L             16-bit integer:
                     if (VSTATUS[CPACK]==0) {
                         Im(Z) = ((Im(X) << SHIFTL) + Im(Y)) >> SHIFTR
                     } else {
                         Re(Z) = ((Re(X) << SHIFTL) + Re(Y)) >> SHIFTR
                     }
    VRa              Contents of the memory pointed to by [mem32]. VRa cannot be VR5 or VR8.

Opcode
    LSW: 1110 0011 1111 1010
    MSW: 0000 aaaa mem32

Description
    Complex 16 + 32 = 16-bit operation with parallel register load.
    This operation is useful for algorithms similar to a complex FFT. The first operand is a complex number with a 16-bit real and 16-bit imaginary part. The second operand has a 32-bit real and a 32-bit imaginary part. Before the addition, the first input is sign extended to 32 bits and shifted left by VSTATUS[SHIFTL] bits. The result of the addition is right shifted by VSTATUS[SHIFTR] bits before it is stored in VR5H and VR5L. If VSTATUS[RND] is set, then bits shifted out to the right are rounded, otherwise these bits are truncated. The rounding operation is described in Section 2.3.2. If the VSTATUS[SAT] bit is set, then the result will be saturated in the event of a 16-bit overflow or underflow.

    // RND    is VSTATUS[RND]
    // SAT    is VSTATUS[SAT]
    // SHIFTR is VSTATUS[SHIFTR]
    // SHIFTL is VSTATUS[SHIFTL]
    //
    // VSTATUS[CPACK] = 0
    // VR4H = Re(X) 16-bit
    // VR4L = Im(X) 16-bit
    // VR3  = Re(Y) 32-bit
    // VR2  = Im(Y) 32-bit
    temp1 = sign_extend(VR4H);            // 32-bit extended Re(X)
    temp2 = sign_extend(VR4L);            // 32-bit extended Im(X)
    temp1 = (temp1 << SHIFTL) + VR3;      // Re(Z) intermediate
    temp2 = (temp2 << SHIFTL) + VR2;      // Im(Z) intermediate
    if (RND == 1) {
        temp1 = round(temp1 >> SHIFTR);
        temp2 = round(temp2 >> SHIFTR);
    } else {
        temp1 = truncate(temp1 >> SHIFTR);
        temp2 = truncate(temp2 >> SHIFTR);
    }
    if (SAT == 1) {
        VR5H = sat16(temp1);
        VR5L = sat16(temp2);
    } else {
        VR5H = temp1[15:0];
        VR5L = temp2[15:0];
    }
    VRa = [mem32];

Flags
    This instruction modifies the following bits in the VSTATUS register:
    • OVFR is set if the real-part (VR5H) computation overflows or underflows.
    • OVFI is set if the imaginary-part (VR5L) computation overflows or underflows.

Pipeline
    Both operations complete in a single cycle.
Example
    For more information regarding the addition operation, see the examples for the VCDADD16 VR5, VR4, VR3, VR2 instruction.

    ;
    ; Example: Right Shift, Left Shift and Rounding
    ;
    ; X = -4 + 3j   (16-bit real + 16-bit imaginary)
    ; Y = 13 - 9j   (32-bit real + 32-bit imaginary)
    ;
    ; Real:
    ;   temp1 = (0xFFFFFFFC << 2) + 0x0000000D
    ;   temp1 = 0xFFFFFFF0 + 0x0000000D = 0xFFFFFFFD
    ;   temp1 = 0xFFFFFFFD >> 1 = 0xFFFFFFFE.8
    ;   temp1 = round(0xFFFFFFFE.8) = 0xFFFFFFFF
    ;   VR5H  = temp1[15:0] = 0xFFFF = -1
    ; Imaginary:
    ;   temp2 = (0x00000003 << 2) + 0xFFFFFFF7
    ;   temp2 = 0x0000000C + 0xFFFFFFF7 = 0x00000003
    ;   temp2 = 0x00000003 >> 1 = 0x00000001.8
    ;   temp2 = round(0x00000001.8) = 0x00000002
    ;   VR5L  = temp2[15:0] = 0x0002 = 2
    ;
    VSATOFF                          ; VSTATUS[SAT] = 0
    VRNDON                           ; VSTATUS[RND] = 1
    VSETSHR   #1                     ; VSTATUS[SHIFTR] = 1
    VSETSHL   #2                     ; VSTATUS[SHIFTL] = 2
    VCLEARALL                        ; VR0, VR1...VR8 == 0
    VMOVXI    VR3, #13               ; VR3 = Re(Y) = 13 = 0x0000000D
    VMOVXI    VR2, #-9               ; VR2 = Im(Y) = -9
    VMOVIX    VR2, #0xFFFF           ; sign extend VR2 = 0xFFFFFFF7
    VMOVXI    VR4, #3
    VMOVIX    VR4, #-4               ; VR4 = X = 0xFFFC0003 = -4 + 3j
    VCDADD16  VR5, VR4, VR3, VR2     ; VR5 = Z = 0xFFFF0002 = -1 + 2j
 || VMOV32    VR2, *XAR7             ; VR2 = value pointed to by XAR7

See also
    VCADD VR5, VR4, VR3, VR2 || VMOV32 VRa, mem32
    VCADD VR7, VR6, VR5, VR4
    VRNDOFF
    VRNDON
    VSATON
    VSATOFF
    VSETSHL #5-bit
    VSETSHR #5-bit

VCDSUB16 VR6, VR4, VR3, VR2
Complex 16-32 = 16 Subtract

Operands
    Before the operation, the inputs should be loaded into registers as shown below.
    The first operand is a complex number with a 16-bit real and 16-bit imaginary part. The second operand has a 32-bit real and a 32-bit imaginary part.

    Input Register   Value
    VR4H             16-bit integer: if (VSTATUS[CPACK]==0) Re(X) else Im(X)
    VR4L             16-bit integer: if (VSTATUS[CPACK]==0) Im(X) else Re(X)
    VR3              32-bit integer representing the real part of the 2nd input: Re(Y)
    VR2              32-bit integer representing the imaginary part of the 2nd input: Im(Y)

    The result is a complex number with a 16-bit real and a 16-bit imaginary part. The result is stored in VR6 as shown below:

    Output Register  Value
    VR6H             16-bit integer:
                     if (VSTATUS[CPACK]==0) {
                         Re(Z) = ((Re(X) << SHIFTL) - Re(Y)) >> SHIFTR
                     } else {
                         Im(Z) = ((Im(X) << SHIFTL) - Im(Y)) >> SHIFTR
                     }
    VR6L             16-bit integer:
                     if (VSTATUS[CPACK]==0) {
                         Im(Z) = ((Im(X) << SHIFTL) - Im(Y)) >> SHIFTR
                     } else {
                         Re(Z) = ((Re(X) << SHIFTL) - Re(Y)) >> SHIFTR
                     }

Opcode
    LSW: 1110 0101 0000 0101

Description
    Complex 16 - 32 = 16-bit operation. This operation is useful for algorithms similar to a complex FFT. The first operand is a complex number with a 16-bit real and 16-bit imaginary part. The second operand has a 32-bit real and a 32-bit imaginary part. Before the subtraction, the first input is sign extended to 32 bits and shifted left by VSTATUS[SHIFTL] bits. The result of the subtraction is right shifted by VSTATUS[SHIFTR] bits before it is stored in VR6H and VR6L. If VSTATUS[RND] is set, then bits shifted out to the right are rounded, otherwise these bits are truncated. The rounding operation is described in Section 2.3.2. If the VSTATUS[SAT] bit is set, then the result will be saturated in the event of a 16-bit overflow or underflow.
    // RND    is VSTATUS[RND]
    // SAT    is VSTATUS[SAT]
    // SHIFTR is VSTATUS[SHIFTR]
    // SHIFTL is VSTATUS[SHIFTL]
    //
    // VSTATUS[CPACK] = 0
    // VR4H = Re(X) 16-bit
    // VR4L = Im(X) 16-bit
    // VR3  = Re(Y) 32-bit
    // VR2  = Im(Y) 32-bit
    temp1 = sign_extend(VR4H);            // 32-bit extended Re(X)
    temp2 = sign_extend(VR4L);            // 32-bit extended Im(X)
    temp1 = (temp1 << SHIFTL) - VR3;      // Re(Z) intermediate
    temp2 = (temp2 << SHIFTL) - VR2;      // Im(Z) intermediate
    if (RND == 1) {
        temp1 = round(temp1 >> SHIFTR);
        temp2 = round(temp2 >> SHIFTR);
    } else {
        temp1 = truncate(temp1 >> SHIFTR);
        temp2 = truncate(temp2 >> SHIFTR);
    }
    if (SAT == 1) {
        VR6H = sat16(temp1);
        VR6L = sat16(temp2);
    } else {
        VR6H = temp1[15:0];
        VR6L = temp2[15:0];
    }

Flags
    This instruction modifies the following bits in the VSTATUS register:
    • OVFR is set if the real-part (VR6H) computation overflows or underflows.
    • OVFI is set if the imaginary-part (VR6L) computation overflows or underflows.

Pipeline
    This is a single-cycle instruction.

Example
    ;
    ; Example: Z = X - Y
    ;
    ; X = 4 + 6j    (16-bit real + 16-bit imaginary)
    ; Y = 13 + 22j  (32-bit real + 32-bit imaginary)
    ;
    ; Z = (4 - 13) + (6 - 22)j = -9 - 16j
    ;
    VSATOFF                          ; VSTATUS[SAT] = 0
    VRNDOFF                          ; VSTATUS[RND] = 0
    VSETSHR   #0                     ; VSTATUS[SHIFTR] = 0
    VSETSHL   #0                     ; VSTATUS[SHIFTL] = 0
    VCLEARALL                        ; VR0, VR1...VR8 = 0
    VMOVXI    VR3, #13               ; VR3 = Re(Y) = 13 = 0x0000000D
    VMOVXI    VR2, #22               ; VR2 = Im(Y) = 22 = 0x00000016
    VMOVXI    VR4, #6
    VMOVIX    VR4, #4                ; VR4 = X = 0x00040006 = 4 + 6j
    VCDSUB16  VR6, VR4, VR3, VR2     ; VR6 = Z = 0xFFF7FFF0 = -9 + -16j

    The next example illustrates the operation with a right shift value defined.
    ;
    ; Example: Z = X - Y with Right Shift
    ;
    ; X = 4 + 6j    (16-bit real + 16-bit imaginary)
    ; Y = 13 + 22j  (32-bit real + 32-bit imaginary)
    ;
    ; Real:
    ;   temp1 = (0x00000004 - 0x0000000D) >> 1
    ;   temp1 = (0xFFFFFFF7) >> 1
    ;   temp1 = 0xFFFFFFFB
    ;   VR6H  = temp1[15:0] = 0xFFFB = -5
    ; Imaginary:
    ;   temp2 = (0x00000006 - 0x00000016) >> 1
    ;   temp2 = (0xFFFFFFF0) >> 1
    ;   temp2 = 0xFFFFFFF8
    ;   VR6L  = temp2[15:0] = 0xFFF8 = -8
    ;
    VSATOFF                          ; VSTATUS[SAT] = 0
    VRNDOFF                          ; VSTATUS[RND] = 0
    VSETSHR   #1                     ; VSTATUS[SHIFTR] = 1
    VSETSHL   #0                     ; VSTATUS[SHIFTL] = 0
    VCLEARALL                        ; VR0, VR1...VR8 == 0
    VMOVXI    VR3, #13               ; VR3 = Re(Y) = 13 = 0x0000000D
    VMOVXI    VR2, #22               ; VR2 = Im(Y) = 22 = 0x00000016
    VMOVXI    VR4, #6
    VMOVIX    VR4, #4                ; VR4 = X = 0x00040006 = 4 + 6j
    VCDSUB16  VR6, VR4, VR3, VR2     ; VR6 = Z = 0xFFFBFFF8 = -5 + -8j

    The next example illustrates rounding with a right shift value defined.
    ;
    ; Example: Z = X-Y with Rounding and Right Shift
    ;
    ; X = 4 + 6j     (16-bit real + 16-bit imaginary)
    ; Y = -13 + 22j  (32-bit real + 32-bit imaginary)
    ;
    ; Real:
    ;   temp1 = round((0x00000004 - 0xFFFFFFF3) >> 1)
    ;   temp1 = round(0x00000011 >> 1)
    ;   temp1 = round(0x00000008.8) = 0x00000009
    ;   VR6H  = temp1[15:0] = 0x0009 = 9
    ; Imaginary:
    ;   temp2 = round((0x00000006 - 0x00000016) >> 1)
    ;   temp2 = round(0xFFFFFFF0 >> 1)
    ;   temp2 = round(0xFFFFFFF8.0) = 0xFFFFFFF8
    ;   VR6L  = temp2[15:0] = 0xFFF8 = -8
    ;
    VSATOFF                          ; VSTATUS[SAT] = 0
    VRNDON                           ; VSTATUS[RND] = 1
    VSETSHR   #1                     ; VSTATUS[SHIFTR] = 1
    VSETSHL   #0                     ; VSTATUS[SHIFTL] = 0
    VCLEARALL                        ; VR0, VR1...VR8 == 0
    VMOVXI    VR3, #-13              ; VR3 = Re(Y)
    VMOVIX    VR3, #0xFFFF           ; sign extend VR3 = -13 = 0xFFFFFFF3
    VMOVXI    VR2, #22               ; VR2 = Im(Y) = 22 = 0x00000016
    VMOVXI    VR4, #6
    VMOVIX    VR4, #4                ; VR4 = X = 0x00040006 = 4 + 6j
    VCDSUB16  VR6, VR4, VR3, VR2     ; VR6 = Z = 0x0009FFF8 = 9 + -8j

    The next example illustrates rounding with both a left and a right shift value defined.

    ;
    ; Example: Z = X-Y with Rounding and both Left and Right Shift
    ;
    ; X = 4 + 6j     (16-bit real + 16-bit imaginary)
    ; Y = -13 + 22j  (32-bit real + 32-bit imaginary)
    ;
    ; Real:
    ;   temp1 = round(((0x00000004 << 2) - 0xFFFFFFF3) >> 1)
    ;   temp1 = round((0x00000010 - 0xFFFFFFF3) >> 1)
    ;   temp1 = round(0x0000001D >> 1)
    ;   temp1 = round(0x0000000E.8) = 0x0000000F
    ;   VR6H  = temp1[15:0] = 0x000F = 15
    ; Imaginary:
    ;   temp2 = round(((0x00000006 << 2) - 0x00000016) >> 1)
    ;   temp2 = round((0x00000018 - 0x00000016) >> 1)
    ;   temp2 = round(0x00000002 >> 1)
    ;   temp2 = round(0x00000001.0) = 0x00000001
    ;   VR6L  = temp2[15:0] = 0x0001 = 1
    ;
    VSATOFF                          ; VSTATUS[SAT] = 0
    VRNDON                           ; VSTATUS[RND] = 1
    VSETSHR   #1                     ; VSTATUS[SHIFTR] = 1
    VSETSHL   #2                     ; VSTATUS[SHIFTL] = 2
    VCLEARALL                        ; VR0, VR1...VR8 == 0
    VMOVXI    VR3, #-13              ; VR3 = Re(Y)
    VMOVIX    VR3, #0xFFFF           ; sign extend VR3 = -13 = 0xFFFFFFF3
    VMOVXI    VR2, #22               ; VR2 = Im(Y) = 22 = 0x00000016
    VMOVXI    VR4, #6
    VMOVIX    VR4, #4                ; VR4 = X = 0x00040006 = 4 + 6j
    VCDSUB16  VR6, VR4, VR3, VR2     ; VR6 = Z = 0x000F0001 = 15 + 1j

See also
    VCADD VR5, VR4, VR3, VR2 || VMOV32 VRa, mem32
    VCADD VR7, VR6, VR5, VR4
    VRNDOFF
    VRNDON
    VSATON
    VSATOFF
    VSETSHL #5-bit
    VSETSHR #5-bit

VCDSUB16 VR6, VR4, VR3, VR2 || VMOV32 VRa, mem32
Complex 16-32 = 16 Subtract with Parallel Load

Operands
    Before the operation, the inputs should be loaded into registers as shown below. The first operand is a complex number with a 16-bit real and 16-bit imaginary part. The second operand has a 32-bit real and a 32-bit imaginary part.

    Input Register   Value
    VR4H             16-bit integer: if (VSTATUS[CPACK]==0) Re(X) else Im(X)
    VR4L             16-bit integer: if (VSTATUS[CPACK]==0) Im(X) else Re(X)
    VR3              32-bit integer representing the real part of the 2nd input: Re(Y)
    VR2              32-bit integer representing the imaginary part of the 2nd input: Im(Y)
    mem32            Pointer to a 32-bit memory location.

    The result is a complex number with a 16-bit real and a 16-bit imaginary part. The result is stored in VR6 as shown below:

    Output Register  Value
    VR6H             16-bit integer:
                     if (VSTATUS[CPACK]==0) {
                         Re(Z) = ((Re(X) << SHIFTL) - Re(Y)) >> SHIFTR
                     } else {
                         Im(Z) = ((Im(X) << SHIFTL) - Im(Y)) >> SHIFTR
                     }
    VR6L             16-bit integer:
                     if (VSTATUS[CPACK]==0) {
                         Im(Z) = ((Im(X) << SHIFTL) - Im(Y)) >> SHIFTR
                     } else {
                         Re(Z) = ((Re(X) << SHIFTL) - Re(Y)) >> SHIFTR
                     }
    VRa              Contents of the memory pointed to by [mem32]. VRa cannot be VR6 or VR8.
Opcode
    LSW: 1110 0011 1111 1011
    MSW: 0000 aaaa mem32

Description
    Complex 16 - 32 = 16-bit operation with parallel load. This operation is useful for algorithms similar to a complex FFT. The first operand is a complex number with a 16-bit real and 16-bit imaginary part. The second operand has a 32-bit real and a 32-bit imaginary part. Before the subtraction, the first input is sign extended to 32 bits and shifted left by VSTATUS[SHIFTL] bits. The result of the subtraction is right shifted by VSTATUS[SHIFTR] bits before it is stored in VR6H and VR6L. If VSTATUS[RND] is set, then bits shifted out to the right are rounded, otherwise these bits are truncated. The rounding operation is described in Section 2.3.2. If the VSTATUS[SAT] bit is set, then the result will be saturated in the event of a 16-bit overflow or underflow.

    // RND    is VSTATUS[RND]
    // SAT    is VSTATUS[SAT]
    // SHIFTR is VSTATUS[SHIFTR]
    // SHIFTL is VSTATUS[SHIFTL]
    //
    // VSTATUS[CPACK] = 0
    // VR4H = Re(X) 16-bit
    // VR4L = Im(X) 16-bit
    // VR3  = Re(Y) 32-bit
    // VR2  = Im(Y) 32-bit
    temp1 = sign_extend(VR4H);            // 32-bit extended Re(X)
    temp2 = sign_extend(VR4L);            // 32-bit extended Im(X)
    temp1 = (temp1 << SHIFTL) - VR3;      // Re(Z) intermediate
    temp2 = (temp2 << SHIFTL) - VR2;      // Im(Z) intermediate
    if (RND == 1) {
        temp1 = round(temp1 >> SHIFTR);
        temp2 = round(temp2 >> SHIFTR);
    } else {
        temp1 = truncate(temp1 >> SHIFTR);
        temp2 = truncate(temp2 >> SHIFTR);
    }
    if (SAT == 1) {
        VR6H = sat16(temp1);
        VR6L = sat16(temp2);
    } else {
        VR6H = temp1[15:0];
        VR6L = temp2[15:0];
    }
    VRa = [mem32];

Flags
    This instruction modifies the following bits in the VSTATUS register:
    • OVFR is set if the real-part (VR6H) computation overflows or underflows.
    • OVFI is set if the imaginary-part (VR6L) computation overflows or underflows.
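A short C sketch can make the ordering of the subtract and the parallel load explicit: the subtraction reads its inputs first, and the loaded value only becomes visible afterwards, so VRa may name an input register such as VR2. The code is a host-side illustration under assumptions, not the hardware definition: `vcdsub16_load_model`, `asr64` and `to16` are hypothetical names, only CPACK = 0 is modeled, the overflow flags are omitted, and rounding is assumed to add half an LSB before the right shift (consistent with the worked VCDSUB16 examples; Section 2.3.2 is the authority).

```c
#include <assert.h>
#include <stdint.h>

/* Arithmetic (floor) right shift that is portable for negative values. */
static int64_t asr64(int64_t v, unsigned n) {
    return v >= 0 ? (v >> n) : ~((~v) >> n);
}

/* Keep the low 16 bits, or clamp to the int16_t range when sat is nonzero. */
static uint16_t to16(int64_t v, int sat) {
    if (sat && v > INT16_MAX) v = INT16_MAX;
    if (sat && v < INT16_MIN) v = INT16_MIN;
    return (uint16_t)((uint64_t)v & 0xFFFFu);
}

/*
 * Hypothetical model of VCDSUB16 VR6, VR4, VR3, VR2 || VMOV32 VRa, mem32
 * for CPACK = 0. Returns the modeled VR6 and writes *mem32 to *vra.
 */
uint32_t vcdsub16_load_model(uint32_t vr4, int32_t vr3, int32_t vr2,
                             unsigned shiftl, unsigned shiftr,
                             int rnd, int sat,
                             uint32_t *vra, const uint32_t *mem32) {
    assert(shiftl < 32 && shiftr < 32);           /* 5-bit shift fields */

    int64_t t1 = (int16_t)(uint16_t)(vr4 >> 16);        /* Re(X), sign extended */
    int64_t t2 = (int16_t)(uint16_t)(vr4 & 0xFFFFu);    /* Im(X), sign extended */

    /* Multiply instead of << to avoid shifting a negative value in C. */
    t1 = t1 * ((int64_t)1 << shiftl) - vr3;       /* Re(Z) intermediate */
    t2 = t2 * ((int64_t)1 << shiftl) - vr2;       /* Im(Z) intermediate */

    if (rnd && shiftr > 0) {                      /* assumed rounding: add half an LSB */
        int64_t half = (int64_t)1 << (shiftr - 1);
        t1 += half;
        t2 += half;
    }
    t1 = asr64(t1, shiftr);
    t2 = asr64(t2, shiftr);

    uint32_t vr6 = ((uint32_t)to16(t1, sat) << 16) | to16(t2, sat);

    /* Parallel load: inputs were passed by value, so they were read before this write. */
    *vra = *mem32;
    return vr6;
}
```

Because `vr2` is passed by value, calling the model with `vra` pointing at the caller's copy of VR2 mirrors a parallel load into VR2: the subtraction sees the old value and the register holds the loaded value afterwards.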
Pipeline
    Both operations complete in a single cycle.

Example
    For more information regarding the subtraction operation, please refer to VCDSUB16 VR6, VR4, VR3, VR2.

    ;
    ; Example: Z = X-Y with Rounding and both Left and Right Shift
    ;
    ; X = 4 + 6j     (16-bit real + 16-bit imaginary)
    ; Y = -13 + 22j  (32-bit real + 32-bit imaginary)
    ;
    ; Real:
    ;   temp1 = round(((0x00000004 << 2) - 0xFFFFFFF3) >> 1)
    ;   temp1 = round((0x00000010 - 0xFFFFFFF3) >> 1)
    ;   temp1 = round(0x0000001D >> 1)
    ;   temp1 = round(0x0000000E.8) = 0x0000000F
    ;   VR6H  = temp1[15:0] = 0x000F = 15
    ; Imaginary:
    ;   temp2 = round(((0x00000006 << 2) - 0x00000016) >> 1)
    ;   temp2 = round((0x00000018 - 0x00000016) >> 1)
    ;   temp2 = round(0x00000002 >> 1)
    ;   temp2 = round(0x00000001.0) = 0x00000001
    ;   VR6L  = temp2[15:0] = 0x0001 = 1
    ;
    VSATOFF                          ; VSTATUS[SAT] = 0
    VRNDON                           ; VSTATUS[RND] = 1
    VSETSHR   #1                     ; VSTATUS[SHIFTR] = 1
    VSETSHL   #2                     ; VSTATUS[SHIFTL] = 2
    VCLEARALL                        ; VR0, VR1...VR8 == 0
    VMOVXI    VR3, #-13              ; VR3 = Re(Y)
    VMOVIX    VR3, #0xFFFF           ; sign extend VR3 = -13 = 0xFFFFFFF3
    VMOVXI    VR2, #22               ; VR2 = Im(Y) = 22 = 0x00000016
    VMOVXI    VR4, #6
    VMOVIX    VR4, #4                ; VR4 = X = 0x00040006 = 4 + 6j
    VCDSUB16  VR6, VR4, VR3, VR2     ; VR6 = Z = 0x000F0001 = 15 + 1j
 || VMOV32    VR2, *XAR7             ; VR2 = contents pointed to by XAR7

See also
    VCADD VR5, VR4, VR3, VR2 || VMOV32 VRa, mem32
    VCADD VR7, VR6, VR5, VR4
    VRNDOFF
    VRNDON
    VSATON
    VSATOFF
    VSETSHL #5-bit
    VSETSHR #5-bit

VCFLIP VRa
Swap Upper and Lower Half of VCU Register

Operands
    VRa
        General purpose register: VR0, VR1, ..., VR7. Cannot be VR8.

Opcode
    LSW: 1010 0001 0000 aaaa

Description
    Swap VRaL and VRaH.

Flags
    This instruction does not affect any flags in the VSTATUS register.

Pipeline
    This is a single-cycle instruction.

Example
        VCFLIP VR7        ; VR7H := VR7L | VR7L := VR7H

VCMAC VR5, VR4, VR3, VR2, VR1, VR0
Complex Multiply and Accumulate

Operands
    Input Register   Value
    VR5              Real part of the accumulation
    VR4              Imaginary part of the accumulation
    VR3              Real part of the product
    VR2              Imaginary part of the product
    VR1              Second Complex Operand
    VR0              First Complex Operand

    NOTE: The user will need to do one final addition to accumulate the final multiplications (Real-VR3 and Imaginary-VR2) into the result registers.

Opcode
    LSW: 1110 0101 0000 0001

Description
    Complex multiply operation.

    // VR5 = Accumulation of the real part
    // VR4 = Accumulation of the imaginary part
    //
    // VR0 = X: VR0[31:16] = Re(X), VR0[15:0] = Im(X)
    // VR1 = Y: VR1[31:16] = Re(Y), VR1[15:0] = Im(Y)
    //
    // Perform add
    //
    if (RND == 1) {
        VR5 = VR5 + round(VR3 >> SHIFTR);
        VR4 = VR4 + round(VR2 >> SHIFTR);
    } else {
        VR5 = VR5 + (VR3 >> SHIFTR);
        VR4 = VR4 + (VR2 >> SHIFTR);
    }
    //
    // Perform multiply Z = X * Y
    //
    if (VSTATUS[CPACK] == 0) {
        VR3 = VR0H * VR1H - VR0L * VR1L;   // Re(Z) = Re(X)*Re(Y) - Im(X)*Im(Y)
        VR2 = VR0H * VR1L + VR0L * VR1H;   // Im(Z) = Re(X)*Im(Y) + Im(X)*Re(Y)
    } else {
        VR3 = VR0L * VR1L - VR0H * VR1H;   // Re(Z) = Re(X)*Re(Y) - Im(X)*Im(Y)
        VR2 = VR0L * VR1H + VR0H * VR1L;   // Im(Z) = Re(X)*Im(Y) + Im(X)*Re(Y)
    }
    if (SAT == 1) {
        sat32(VR3);
        sat32(VR2);
    }

Flags
    This instruction modifies the following bits in the VSTATUS register:
    • OVFR is set if the VR3 computation (real part) overflows or underflows.
    • OVFI is set if the VR2 computation (imaginary part) overflows or underflows.

Pipeline
    This is a 2p-cycle instruction.

See also
    VCLROVFI
    VCLROVFR
    VCMAC VR5, VR4, VR3, VR2, VR1, VR0 || VMOV32 VRa, mem32
    VSATON
    VSATOFF

VCMAC VR7, VR6, VR5, VR4, mem32, *XAR7++
Complex Multiply and Accumulate

Operands
    The VCMAC alternates which registers are used between each cycle. For odd cycles (1, 3, 5, and so on) the following registers are used:

    Odd Cycle Input   Value
    VR5               Previous real-part total accumulation: Re(odd_sum)
    VR4               Previous imaginary-part total accumulation: Im(odd_sum)
    VR1               Previous real result from the multiply: Re(odd_mpy)
    VR0               Previous imaginary result from the multiply: Im(odd_mpy)
    [mem32]           Pointer to a 32-bit memory location representing the first input to the multiply
                      If (VSTATUS[CPACK] == 0)
                          [mem32][31:16] = Re(X)
                          [mem32][15:0]  = Im(X)
                      If (VSTATUS[CPACK] == 1)
                          [mem32][31:16] = Im(X)
                          [mem32][15:0]  = Re(X)
    XAR7              Pointer to a 32-bit memory location representing the second input to the multiply
                      If (VSTATUS[CPACK] == 0)
                          *XAR7[31:16] = Re(Y)
                          *XAR7[15:0]  = Im(Y)
                      If (VSTATUS[CPACK] == 1)
                          *XAR7[31:16] = Im(Y)
                          *XAR7[15:0]  = Re(Y)

    The result from odd cycles is stored as shown below:

    Odd Cycle Output  Value
    VR5               32-bit real part of the total accumulation
                      Re(odd_sum) = Re(odd_sum) + Re(odd_mpy)
    VR4               32-bit imaginary part of the total accumulation
                      Im(odd_sum) = Im(odd_sum) + Im(odd_mpy)
    VR1               32-bit real result from the multiplication:
                      Re(Z) = Re(X)*Re(Y) - Im(X)*Im(Y)
    VR0               32-bit imaginary result from the multiplication:
                      Im(Z) = Re(X)*Im(Y) + Re(Y)*Im(X)

    For even cycles (2, 4, 6, and so on) the following registers are used:

    Even Cycle Input  Value
    VR7               Previous real-part total accumulation: Re(even_sum)
    VR6               Previous imaginary-part total accumulation: Im(even_sum)
    VR3               Previous real result from the multiply: Re(even_mpy)
    VR2               Previous imaginary result from the multiply: Im(even_mpy)
    [mem32]           Pointer to a 32-bit memory location representing the first input to the multiply
                      If (VSTATUS[CPACK] == 0)
                          [mem32][31:16] = Re(X)
                          [mem32][15:0]  = Im(X)
                      If (VSTATUS[CPACK] == 1)
                          [mem32][31:16] = Im(X)
                          [mem32][15:0]  = Re(X)
    XAR7              Pointer to a 32-bit memory location representing the second input to the multiply
                      If (VSTATUS[CPACK] == 0)
                          *XAR7[31:16] = Re(Y)
                          *XAR7[15:0]  = Im(Y)
                      If (VSTATUS[CPACK] == 1)
                          *XAR7[31:16] = Im(Y)
                          *XAR7[15:0]  = Re(Y)

    The result from even cycles is stored as shown below:

    Even Cycle Output Value
    VR7               32-bit real part of the total accumulation
                      Re(even_sum) = Re(even_sum) + Re(even_mpy)
    VR6               32-bit imaginary part of the total accumulation
                      Im(even_sum) = Im(even_sum) + Im(even_mpy)
    VR3               32-bit real result from the multiplication:
                      Re(Z) = Re(X)*Re(Y) - Im(X)*Im(Y)
    VR2               32-bit imaginary result from the multiplication:
                      Im(Z) = Re(X)*Im(Y) + Re(Y)*Im(X)

Opcode
    LSW: 1110 0010 0101 0001
    MSW: 0000 0000 mem32

Description
    Perform a repeated multiply and accumulate operation. This instruction must be used with the repeat instruction (RPT ||). The destination of the accumulate alternates between VR7/VR6 and VR5/VR4 on each cycle.
    // Cycle 1:
    //
    // Perform accumulate
    //
    if (RND == 1) {
        VR5 = VR5 + round(VR1 >> SHIFTR)
        VR4 = VR4 + round(VR0 >> SHIFTR)
    } else {
        VR5 = VR5 + (VR1 >> SHIFTR)
        VR4 = VR4 + (VR0 >> SHIFTR)
    }
    //
    // X and Y array element 0
    //
    VR1 = Re(X)*Re(Y) - Im(X)*Im(Y)
    VR0 = Re(X)*Im(Y) + Re(Y)*Im(X)
    //
    // Cycle 2:
    //
    // Perform accumulate
    //
    if (RND == 1) {
        VR7 = VR7 + round(VR3 >> SHIFTR)
        VR6 = VR6 + round(VR2 >> SHIFTR)
    } else {
        VR7 = VR7 + (VR3 >> SHIFTR)
        VR6 = VR6 + (VR2 >> SHIFTR)
    }
    //
    // X and Y array element 1
    //
    VR3 = Re(X)*Re(Y) - Im(X)*Im(Y)
    VR2 = Re(X)*Im(Y) + Re(Y)*Im(X)
    //
    // Cycle 3:
    //
    // Perform accumulate
    //
    if (RND == 1) {
        VR5 = VR5 + round(VR1 >> SHIFTR)
        VR4 = VR4 + round(VR0 >> SHIFTR)
    } else {
        VR5 = VR5 + (VR1 >> SHIFTR)
        VR4 = VR4 + (VR0 >> SHIFTR)
    }
    //
    // X and Y array element 2
    //
    VR1 = Re(X)*Re(Y) - Im(X)*Im(Y)
    VR0 = Re(X)*Im(Y) + Re(Y)*Im(X)
    etc...

Restrictions
    VR0, VR1, VR2, and VR3 are used as temporary storage by this instruction.

Flags
    The VSTATUS register flags are modified as follows:
    • OVFR is set in the case of an overflow or underflow of the real part of the addition or subtraction operations.
    • OVFI is set in the case of an overflow or underflow of the imaginary part of the addition or subtraction operations.

Pipeline
    The VCMAC takes 2p + N cycles, where N is the number of times the instruction is repeated. This instruction has the following pipeline restrictions:

        < >                  ; No restrictions
        < >                  ; Cannot be a 2p instruction that writes
                             ; to VR0, VR1...VR7 registers
        RPT #(N-1)           ; Execute N times, where N is even
     || VCMAC VR7, VR6, VR5, VR4, *XAR6++, *XAR7++
        < >                  ; No restrictions
                             ; Can read VR0, VR1...VR8

Example
    Cascading of RPT || VCMAC is allowed as long as the first and subsequent counts are even.
    Cascading is useful for creating interruptible windows so that interrupts are not delayed too long by the RPT instruction. For example:

    ;
    ; Example of cascaded VCMAC instructions
    ;
        VCLEARALL            ; Zero the accumulation registers
    ;
    ; Execute VCMAC N+1 (4) times
    ;
        RPT #3
     || VCMAC VR7, VR6, VR5, VR4, *XAR6++, *XAR7++
    ;
    ; Execute VCMAC N+1 (6) times
    ;
        RPT #5
     || VCMAC VR7, VR6, VR5, VR4, *XAR6++, *XAR7++
    ;
    ; Repeat VCMAC N+1 times where N+1 is even
    ;
        RPT #N
     || VCMAC VR7, VR6, VR5, VR4, *XAR6++, *XAR7++
        VCADD VR7, VR6, VR5, VR4

See also
    VCCMAC VR7, VR6, VR5, VR4, mem32, *XAR7++

VCMAC VR5, VR4, VR3, VR2, VR1, VR0 || VMOV32 VRa, mem32
Complex Multiply and Accumulate with Parallel Load

Operands
    Input Register   Value
    VR0              First Complex Operand
    VR1              Second Complex Operand
    VR2              Imaginary part of the product
    VR3              Real part of the product
    VR4              Imaginary part of the accumulation
    VR5              Real part of the accumulation
    VRa              Contents of the memory pointed to by mem32. VRa cannot be VR5, VR4, or VR8.
    mem32            Pointer to 32-bit memory location

    NOTE: The user will need to do one final addition to accumulate the final multiplications (Real-VR3 and Imaginary-VR2) into the result registers.

Opcode
    LSW: 1110 0011 1111 0111
    MSW: 0000 aaaa mem32

Description
    Complex multiply operation.
    // VR5 = Accumulation of the real part
    // VR4 = Accumulation of the imaginary part
    //
    // VR0 = X: VR0[31:16] = Re(X), VR0[15:0] = Im(X)
    // VR1 = Y: VR1[31:16] = Re(Y), VR1[15:0] = Im(Y)
    //
    // Perform add
    //
    if (RND == 1) {
        VR5 = VR5 + round(VR3 >> SHIFTR);
        VR4 = VR4 + round(VR2 >> SHIFTR);
    } else {
        VR5 = VR5 + (VR3 >> SHIFTR);
        VR4 = VR4 + (VR2 >> SHIFTR);
    }
    //
    // Perform multiply Z = X * Y
    //
    if (VSTATUS[CPACK] == 0) {
        VR3 = VR0H * VR1H - VR0L * VR1L;   // Re(Z) = Re(X)*Re(Y) - Im(X)*Im(Y)
        VR2 = VR0H * VR1L + VR0L * VR1H;   // Im(Z) = Re(X)*Im(Y) + Im(X)*Re(Y)
    } else {
        VR3 = VR0L * VR1L - VR0H * VR1H;   // Re(Z) = Re(X)*Re(Y) - Im(X)*Im(Y)
        VR2 = VR0L * VR1H + VR0H * VR1L;   // Im(Z) = Re(X)*Im(Y) + Im(X)*Re(Y)
    }
    if (SAT == 1) {
        sat32(VR3);
        sat32(VR2);
    }
    VRa = [mem32];

Flags
    This instruction modifies the following bits in the VSTATUS register:
    • OVFR is set if the VR3 computation (real part) overflows or underflows.
    • OVFI is set if the VR2 computation (imaginary part) overflows or underflows.

Pipeline
    This is a 2p/1-cycle instruction. The multiply and accumulate is a 2p-cycle operation and the VMOV32 is a single-cycle operation.
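The NOTE about needing one final addition follows from the order of operations in the pseudocode: each VCMAC first accumulates the product left in VR3/VR2 by the previous instruction and only then computes a new product. The C sketch below illustrates this. It is a simplified model, not TI's implementation: the type and function names are hypothetical, CPACK = 0, SHIFTR = 0, RND = 0 and SAT = 0 are assumed, operand loading is folded into the step for brevity, and the inputs are assumed small enough that the 32-bit products do not overflow.

```c
#include <stdint.h>

/* Hypothetical register model; field names mirror the operand table. */
typedef struct {
    uint32_t vr0, vr1;   /* packed inputs: [31:16] = real, [15:0] = imaginary */
    int32_t  vr2, vr3;   /* product: VR3 = real, VR2 = imaginary */
    int32_t  vr4, vr5;   /* accumulation: VR5 = real, VR4 = imaginary */
} vcu_regs_t;

/* Interpret the low 16 bits of u as a signed 16-bit value. */
static int32_t s16(uint32_t u)
{
    u &= 0xFFFFu;
    return (u >= 0x8000u) ? (int32_t)u - 0x10000 : (int32_t)u;
}

/* One VCMAC step: accumulate the previous product, then multiply x by y. */
vcu_regs_t vcmac_step(vcu_regs_t r, uint32_t x, uint32_t y)
{
    r.vr5 += r.vr3;                 /* add the product of the previous step */
    r.vr4 += r.vr2;

    r.vr0 = x;                      /* on the device these are separate loads */
    r.vr1 = y;

    int32_t xr = s16(r.vr0 >> 16), xi = s16(r.vr0);
    int32_t yr = s16(r.vr1 >> 16), yi = s16(r.vr1);
    r.vr3 = xr * yr - xi * yi;      /* Re(Z) */
    r.vr2 = xr * yi + xi * yr;      /* Im(Z) */
    return r;
}

/* The final addition from the NOTE: fold the last product into VR5/VR4. */
vcu_regs_t vcmac_finish(vcu_regs_t r)
{
    r.vr5 += r.vr3;
    r.vr4 += r.vr2;
    return r;
}
```

For the inputs (4 + 6j)(12 + 9j) followed by (1 + 2j)(3 + 4j), two steps leave VR5/VR4 at -6 + 108j (only the first product) with -5 + 10j still pending in VR3/VR2; vcmac_finish then yields the full sum -11 + 118j.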
See also
    VCLROVFI
    VCLROVFR
    VCMAC VR5, VR4, VR3, VR2, VR1, VR0
    VSATON
    VSATOFF

VCMAG VRb, VRa
Magnitude of a Complex Number

Operands
    VRb   General purpose register VR0…VR8
    VRa   General purpose register VR0…VR8

Opcode
    LSW: 1110 0110 1111 0010
    MSW: 0000 0100 bbbb aaaa

Description
    Compute the magnitude of the complex value in VRa as the sum of the squares of its real and imaginary parts, as shown below. If the VSTATUS[SAT] bit is set, the result is saturated in the event of a 32-bit overflow or underflow.

    if (VSTATUS[SAT] == 1) {
        if (VSTATUS[RND] == 1) {
            VRb = rnd(sat(VRaH*VRaH + VRaL*VRaL) >> VSTATUS[SHIFTR])
        } else {
            VRb = sat(VRaH*VRaH + VRaL*VRaL) >> VSTATUS[SHIFTR]
        }
    } else {   // VSTATUS[SAT] = 0
        if (VSTATUS[RND] == 1) {
            VRb = rnd((VRaH*VRaH + VRaL*VRaL) >> VSTATUS[SHIFTR])
        } else {
            VRb = (VRaH*VRaH + VRaL*VRaL) >> VSTATUS[SHIFTR]
        }
    }

    Sign extension is performed automatically for the shift-right operations.

Flags
    This instruction modifies the following bits in the VSTATUS register:
    • OVFR is set if overflow is detected in the complex magnitude operation of the real 32-bit result.

Pipeline
    This is a 2-cycle instruction.

Example
        VMOV32 VR1, VR0      ; VR1 := VR0
        VCCON  VR1           ; VR1 := VR1^*
        VCMAG  VR2, VR0      ; VR2 := magnitude(VR0)
                             ; and so forth

VCMPY VR3, VR2, VR1, VR0
Complex Multiply

Operands
    Both inputs are complex numbers with a 16-bit real and 16-bit imaginary part. The result is a complex number with a 32-bit real and a 32-bit imaginary part.
    The result is stored in VR2 and VR3 as shown below:

    Input Register   Value
    VR3              Real part of the Result
    VR2              Imaginary part of the Result
    VR1              Second Complex Operand
    VR0              First Complex Operand

Opcode
    LSW: 1110 0101 0000 0000

Description
    Complex 16 x 16 = 32-bit multiply operation. If the VSTATUS[CPACK] bit is set, the low word of the input is treated as the real part while the upper word is treated as imaginary. If the VSTATUS[SAT] bit is set, the result is saturated in the event of a 32-bit overflow or underflow.

    // Calculate: Z = X * Y
    //
    if (VSTATUS[CPACK] == 0) {
        VR3 = VR0H * VR1H - VR0L * VR1L;   // Re(Z) = Re(X)*Re(Y) - Im(X)*Im(Y)
        VR2 = VR0H * VR1L + VR0L * VR1H;   // Im(Z) = Re(X)*Im(Y) + Im(X)*Re(Y)
    } else {
        VR3 = VR0L * VR1L - VR0H * VR1H;   // Re(Z) = Re(X)*Re(Y) - Im(X)*Im(Y)
        VR2 = VR0L * VR1H + VR0H * VR1L;   // Im(Z) = Re(X)*Im(Y) + Im(X)*Re(Y)
    }
    if (SAT == 1) {
        sat32(VR3);
        sat32(VR2);
    }

Flags
    This instruction modifies the following bits in the VSTATUS register:
    • OVFR is set if the VR3 computation (real part) overflows or underflows.
    • OVFI is set if the VR2 computation (imaginary part) overflows or underflows.

Pipeline
    This is a 2p-cycle instruction. The instruction following this one must not use VR3 or VR2.
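The effect of CPACK and SAT on the multiply can be checked against a small C model. This is a sketch only: the type and function names are hypothetical, 64-bit intermediates are used so that overflow can be detected, and the overflow flags are assumed to be raised whether or not saturation is enabled (the Flags text above does not make them conditional on SAT).

```c
#include <stdint.h>

/* Hypothetical result record: VR3/VR2 plus the flags this operation raises. */
typedef struct {
    int32_t re, im;     /* VR3, VR2 */
    int ovfr, ovfi;     /* VSTATUS[OVFR], VSTATUS[OVFI] */
} vcmpy_result_t;

/* Interpret the low 16 bits of u as a signed 16-bit value. */
static int64_t s16(uint32_t u)
{
    u &= 0xFFFFu;
    return (u >= 0x8000u) ? (int64_t)u - 0x10000 : (int64_t)u;
}

/* Narrow to 32 bits, saturating or wrapping, and record overflow. */
static int32_t narrow32(int64_t v, int sat, int *ovf)
{
    if (v > INT32_MAX || v < INT32_MIN) {
        *ovf = 1;
        if (sat)
            return (v > 0) ? INT32_MAX : INT32_MIN;
        uint32_t u = (uint32_t)(uint64_t)v;          /* keep bits [31:0] */
        return (u >= 0x80000000u) ? (int32_t)((int64_t)u - 4294967296LL)
                                  : (int32_t)u;
    }
    return (int32_t)v;
}

/* Model of VCMPY VR3, VR2, VR1, VR0 for packed 16-bit complex inputs. */
vcmpy_result_t vcmpy(uint32_t vr0, uint32_t vr1, int cpack, int sat)
{
    /* CPACK = 0: high word is real; CPACK = 1: low word is real. */
    int64_t xr = cpack ? s16(vr0) : s16(vr0 >> 16);
    int64_t xi = cpack ? s16(vr0 >> 16) : s16(vr0);
    int64_t yr = cpack ? s16(vr1) : s16(vr1 >> 16);
    int64_t yi = cpack ? s16(vr1 >> 16) : s16(vr1);

    vcmpy_result_t z = {0, 0, 0, 0};
    z.re = narrow32(xr * yr - xi * yi, sat, &z.ovfr);
    z.im = narrow32(xr * yi + xi * yr, sat, &z.ovfi);
    return z;
}
```

With VR0 = 0x00040006 (4 + 6j) and VR1 = 0x000C0009 (12 + 9j) and CPACK = 0, the model returns -6 + 108j. In this model the only input that overflows is X = Y = -32768 - 32768j, where the imaginary part reaches 2^31 and saturates to 0x7FFFFFFF when SAT = 1.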
Example
    ; Example 1
    ; X = 4 + 6j
    ; Y = 12 + 9j
    ;
    ; Z = X * Y
    ; Re(Z) = 4*12 - 6*9 = -6
    ; Im(Z) = 4*9 + 6*12 = 108
    ;
        VSATOFF                       ; VSTATUS[SAT] = 0
        VCLEARALL                     ; VR0, VR1...VR8 == 0
        VMOVXI VR0, #6
        VMOVIX VR0, #4                ; VR0 = X = 0x00040006 = 4 + 6j
        VMOVXI VR1, #9
        VMOVIX VR1, #12               ; VR1 = Y = 0x000C0009 = 12 + 9j
        VCMPY  VR3, VR2, VR1, VR0     ; VR3 = Re(Z) = 0xFFFFFFFA = -6
                                      ; VR2 = Im(Z) = 0x0000006C = 108
                                      ; <- Must not use VR2, VR3
                                      ; <- VCMPY completes, VR2, VR3 valid
                                      ; Can use VR2, VR3

See also
    VCLROVFI
    VCLROVFR
    VCMAC VR5, VR4, VR3, VR2, VR1, VR0
    VCMAC VR5, VR4, VR3, VR2, VR1, VR0 || VMOV32 VRa, mem32
    VSATON
    VSATOFF

VCMPY VR3, VR2, VR1, VR0 || VMOV32 mem32, VRa
Complex Multiply with Parallel Store

Operands
    Both inputs are complex numbers with a 16-bit real and 16-bit imaginary part. The result is a complex number with a 32-bit real and a 32-bit imaginary part. The result is stored in VR2 and VR3 as shown below:

    Input Register   Value
    VR3              Real part of the Result
    VR2              Imaginary part of the Result
    VR1              Second Complex Operand
    VR0              First Complex Operand
    VRa              Value to be stored
    mem32            Pointer to 32-bit memory location

Opcode
    LSW: 1110 0010 1100 1010
    MSW: 0000 aaaa mem16

Description
    Complex 16 x 16 = 32-bit multiply operation with parallel store. If the VSTATUS[CPACK] bit is set, the low word of the input is treated as the real part while the upper word is treated as imaginary.
    If the VSTATUS[SAT] bit is set, the result is saturated in the event of a 32-bit overflow or underflow.

    // Calculate: Z = X * Y
    //
    if (VSTATUS[CPACK] == 0) {
        VR3 = VR0H * VR1H - VR0L * VR1L;   // Re(Z) = Re(X)*Re(Y) - Im(X)*Im(Y)
        VR2 = VR0H * VR1L + VR0L * VR1H;   // Im(Z) = Re(X)*Im(Y) + Im(X)*Re(Y)
    } else {
        VR3 = VR0L * VR1L - VR0H * VR1H;   // Re(Z) = Re(X)*Re(Y) - Im(X)*Im(Y)
        VR2 = VR0L * VR1H + VR0H * VR1L;   // Im(Z) = Re(X)*Im(Y) + Im(X)*Re(Y)
    }
    if (SAT == 1) {
        sat32(VR3);
        sat32(VR2);
    }
    [mem32] = VRa;

Flags
    This instruction modifies the following bits in the VSTATUS register:
    • OVFR is set if the VR3 computation (real part) overflows or underflows.
    • OVFI is set if the VR2 computation (imaginary part) overflows or underflows.

Pipeline
    This is a 2p/1-cycle instruction. The multiply operation takes 2p cycles and the VMOV32 operation completes in a single cycle. The instruction following this one must not use VR2 or VR3.

Example
    ; Example 1
    ; X = 4 + 6j
    ; Y = 12 + 9j
    ;
    ; Z = X * Y
    ; Re(Z) = 4*12 - 6*9 = -6
    ; Im(Z) = 4*9 + 6*12 = 108
    ;
        VSATOFF                       ; VSTATUS[SAT] = 0
        VCLEARALL                     ; VR0, VR1...VR8 == 0
        VMOVXI VR0, #6
        VMOVIX VR0, #4                ; VR0 = X = 0x00040006 = 4 + 6j
        VMOVXI VR1, #9
        VMOVIX VR1, #12               ; VR1 = Y = 0x000C0009 = 12 + 9j
        VCMPY  VR3, VR2, VR1, VR0     ; VR3 = Re(Z) = 0xFFFFFFFA = -6
                                      ; VR2 = Im(Z) = 0x0000006C = 108
     || VMOV32 *XAR7, VR3             ; Location XAR7 points to = VR3 (before multiply)
                                      ; <- Must not use VR2, VR3
                                      ; <- VCMPY completes, VR2, VR3 valid
                                      ; Can use VR2, VR3

See also
    VCLROVFI
    VCLROVFR
    VCMAC VR5, VR4, VR3, VR2, VR1, VR0
    VCMAC VR5, VR4, VR3, VR2, VR1, VR0 || VMOV32 VRa, mem32
    VSATON
    VSATOFF

VCMPY VR3, VR2, VR1, VR0 || VMOV32 VRa, mem32
Complex Multiply with Parallel Load

Operands
    Both inputs are complex numbers with a 16-bit real and 16-bit imaginary part. The result is a complex number with a 32-bit real and a 32-bit imaginary part. The result is stored in VR2 and VR3 as shown below:

    Input Register   Value
    VR3              Real part of the Result
    VR2              Imaginary part of the Result
    VR1              Second Complex Operand
    VR0              First Complex Operand
    VRa              32-bit value pointed to by mem32. VRa cannot be VR2, VR3 or VR8.
    mem32            Pointer to 32-bit memory location

Opcode
    LSW: 1110 0011 1111 0110
    MSW: 0000 aaaa mem32

Description
    Complex 16 x 16 = 32-bit multiply operation with parallel register load. If the VSTATUS[CPACK] bit is set, the low word of the input is treated as the real part while the upper word is treated as imaginary. If the VSTATUS[SAT] bit is set, the result is saturated in the event of a 32-bit overflow or underflow.

    // Calculate: Z = X * Y
    //
    if (VSTATUS[CPACK] == 0) {
        VR3 = VR0H * VR1H - VR0L * VR1L;   // Re(Z) = Re(X)*Re(Y) - Im(X)*Im(Y)
        VR2 = VR0H * VR1L + VR0L * VR1H;   // Im(Z) = Re(X)*Im(Y) + Im(X)*Re(Y)
    } else {
        VR3 = VR0L * VR1L - VR0H * VR1H;   // Re(Z) = Re(X)*Re(Y) - Im(X)*Im(Y)
        VR2 = VR0L * VR1H + VR0H * VR1L;   // Im(Z) = Re(X)*Im(Y) + Im(X)*Re(Y)
    }
    if (SAT == 1) {
        sat32(VR3);
        sat32(VR2);
    }
    VRa = [mem32];

Flags
    This instruction modifies the following bits in the VSTATUS register:
    • OVFR is set if the VR3 computation (real part) overflows or underflows.
    • OVFI is set if the VR2 computation (imaginary part) overflows or underflows.

Pipeline
    This is a 2p/1-cycle instruction. The multiply operation takes 2p cycles and the VMOV32 operation completes in a single cycle. The instruction following this one must not use VR2 or VR3.
Example
    ; Example 1
    ; X = 4 + 6j
    ; Y = 12 + 9j
    ;
    ; Z = X * Y
    ; Re(Z) = 4*12 - 6*9 = -6
    ; Im(Z) = 4*9 + 6*12 = 108
    ;
        VSATOFF                       ; VSTATUS[SAT] = 0
        VCLEARALL                     ; VR0, VR1...VR8 == 0
        VMOVXI VR0, #6
        VMOVIX VR0, #4                ; VR0 = X = 0x00040006 = 4 + 6j
        VMOVXI VR1, #9
        VMOVIX VR1, #12               ; VR1 = Y = 0x000C0009 = 12 + 9j
        VCMPY  VR3, VR2, VR1, VR0     ; VR3 = Re(Z) = 0xFFFFFFFA = -6
                                      ; VR2 = Im(Z) = 0x0000006C = 108
     || VMOV32 VR0, *XAR7             ; VR0 = contents of location XAR7 points to
                                      ; <- Must not use VR2, VR3
                                      ; <- VCMPY completes, VR2, VR3 valid
                                      ; Can use VR2, VR3

See also
    VCLROVFI
    VCLROVFR
    VCMAC VR5, VR4, VR3, VR2, VR1, VR0
    VCMAC VR5, VR4, VR3, VR2, VR1, VR0 || VMOV32 VRa, mem32
    VSATON
    VSATOFF

VCSHL16 VRa << #4-bit
Complex Shift Left

Operands
    VRa      General purpose register VR0…VR8
    #4-bit   4-bit unsigned immediate value

Opcode
    LSW: 1110 0110 1111 0010
    MSW: 0000 0000 IIII aaaa

Description
    Left shift the real and imaginary parts of the complex value in VRa.
    if (VSTATUS[CPACK] == 0) {
        if (VSTATUS[SAT] == 1) {
            VRaL = sat(VRaL << #4-bit Immediate)   (imaginary result)
            VRaH = sat(VRaH << #4-bit Immediate)   (real result)
        } else {
            VRaL = VRaL << #4-bit Immediate        (imaginary result)
            VRaH = VRaH << #4-bit Immediate        (real result)
        }
    } else {
        if (VSTATUS[SAT] == 1) {
            VRaL = sat(VRaL << #4-bit Immediate)   (real result)
            VRaH = sat(VRaH << #4-bit Immediate)   (imaginary result)
        } else {
            VRaL = VRaL << #4-bit Immediate        (real result)
            VRaH = VRaH << #4-bit Immediate        (imaginary result)
        }
    }

Flags
    This instruction modifies the following bits in the VSTATUS register:
    • OVFR is set if overflow is detected in the shift-left operation of the real signed 16-bit result.
    • OVFI is set if overflow is detected in the shift-left operation of the imaginary signed 16-bit result.

Pipeline
    This is a single-cycle instruction.

Example
        VSATOFF                ; turn off saturation
        VCSHL16 VR5 << #8      ; VR5L := VR5L << 8 | VR5H := VR5H << 8

VCSHR16 VRa >> #4-bit
Complex Shift Right

Operands
    VRa      General purpose register VR0…VR8
    #4-bit   4-bit unsigned immediate value

Opcode
    LSW: 1110 0110 1111 0010
    MSW: 0000 0001 IIII aaaa

Description
    Right shift the real and imaginary parts of the complex value in VRa.
    if (VSTATUS[CPACK] == 0) {
        if (VSTATUS[RND] == 1) {
            VRaL = rnd(VRaL >> #4-bit Immediate)   (imaginary result)
            VRaH = rnd(VRaH >> #4-bit Immediate)   (real result)
        } else {
            VRaL = VRaL >> #4-bit Immediate        (imaginary result)
            VRaH = VRaH >> #4-bit Immediate        (real result)
        }
    } else {
        if (VSTATUS[RND] == 1) {
            VRaL = rnd(VRaL >> #4-bit Immediate)   (real result)
            VRaH = rnd(VRaH >> #4-bit Immediate)   (imaginary result)
        } else {
            VRaL = VRaL >> #4-bit Immediate        (real result)
            VRaH = VRaH >> #4-bit Immediate        (imaginary result)
        }
    }

    Sign extension is performed automatically for the shift-right operations.

Flags
    This instruction does not affect any flags in the VSTATUS register.

Pipeline
    This is a single-cycle instruction.

Example
        VSATOFF                ; turn off saturation
        VCSHR16 VR6 >> #8      ; VR6L := VR6L >> 8 | VR6H := VR6H >> 8

VCSUB VR5, VR4, VR3, VR2
Complex 32 - 32 = 32 Subtraction

Operands
    Before the operation, the inputs should be loaded into registers as shown below. Each complex number includes a 32-bit real and a 32-bit imaginary part.

    Input Register   Value
    VR5              32-bit integer representing the real part of the first input: Re(X)
    VR4              32-bit integer representing the imaginary part of the first input: Im(X)
    VR3              32-bit integer representing the real part of the 2nd input: Re(Y)
    VR2              32-bit integer representing the imaginary part of the 2nd input: Im(Y)

    The result is also a complex number with a 32-bit real and a 32-bit imaginary part.
    The result is stored in VR5 and VR4 as shown below:

    Output Register  Value
    VR5              32-bit integer representing the real part of the result:
                     Re(Z) = Re(X) - (Re(Y) >> SHIFTR)
    VR4              32-bit integer representing the imaginary part of the result:
                     Im(Z) = Im(X) - (Im(Y) >> SHIFTR)

Opcode
    LSW: 1110 0101 0000 0011

Description
    Complex 32 - 32 = 32-bit subtraction operation.

    The second input operand (stored in VR3 and VR2) is shifted right by VSTATUS[SHIFTR] bits before the subtraction. If VSTATUS[RND] is set, the bits shifted out to the right are rounded; otherwise they are truncated. The rounding operation is described in Section 2.3.2. If the VSTATUS[SAT] bit is set, the result is saturated in the event of an overflow or underflow.

    // RND    is VSTATUS[RND]
    // SAT    is VSTATUS[SAT]
    // SHIFTR is VSTATUS[SHIFTR]
    //
    if (RND == 1) {
        VR5 = VR5 - round(VR3 >> SHIFTR);
        VR4 = VR4 - round(VR2 >> SHIFTR);
    } else {
        VR5 = VR5 - (VR3 >> SHIFTR);
        VR4 = VR4 - (VR2 >> SHIFTR);
    }
    if (SAT == 1) {
        sat32(VR5);
        sat32(VR4);
    }

Flags
    This instruction modifies the following bits in the VSTATUS register:
    • OVFR is set if the VR5 computation (real part) overflows or underflows.
    • OVFI is set if the VR4 computation (imaginary part) overflows or underflows.

Pipeline
    This is a single-cycle instruction.
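Unlike VCDSUB16, where the right shift is applied to the difference, VCSUB shifts only the second operand before subtracting. A short C sketch of that ordering is given below. It is illustrative rather than definitive: the type and function names are hypothetical, round-half-up is assumed for the rounding step (Section 2.3.2 is authoritative), and the unsaturated case simply wraps to 32 bits.

```c
#include <stdint.h>

typedef struct { int32_t re, im; } cplx32_t;   /* hypothetical helper type */

/* (v >> n) with optional round-half-up, portable for negative values. */
static int64_t shr_round(int32_t v, unsigned n, int rnd)
{
    int64_t t = v;
    if (n == 0) return t;
    int64_t d = (int64_t)1 << n;
    if (rnd) t += d / 2;                       /* assumed rounding rule */
    return (t >= 0) ? t / d : -((-t + d - 1) / d);
}

/* Narrow a 64-bit difference to 32 bits, saturating or wrapping. */
static int32_t narrow32(int64_t v, int sat)
{
    if (v >= INT32_MIN && v <= INT32_MAX) return (int32_t)v;
    if (sat) return (v > 0) ? INT32_MAX : INT32_MIN;
    uint32_t u = (uint32_t)(uint64_t)v;        /* keep bits [31:0] */
    return (u >= 0x80000000u) ? (int32_t)((int64_t)u - 4294967296LL)
                              : (int32_t)u;
}

/* Model of VCSUB VR5, VR4, VR3, VR2: Z = X - (Y >> SHIFTR). */
cplx32_t vcsub(cplx32_t x, cplx32_t y, unsigned shiftr, int rnd, int sat)
{
    cplx32_t z;
    z.re = narrow32((int64_t)x.re - shr_round(y.re, shiftr, rnd), sat);
    z.im = narrow32((int64_t)x.im - shr_round(y.im, shiftr, rnd), sat);
    return z;
}
```

For X = 100 + 50j, Y = 7 + 8j and SHIFTR = 1, the model returns 96 + 46j with rounding on (7 >> 1 rounds up to 4) and 97 + 46j with rounding off (7 >> 1 truncates to 3).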
See also
    VCADD VR5, VR4, VR3, VR2 || VMOV32 VRa, mem32
    VCADD VR7, VR6, VR5, VR4
    VCSUB VR5, VR4, VR3, VR2 || VMOV32 VRa, mem32
    VCLROVFI
    VCLROVFR
    VRNDOFF
    VRNDON
    VSATON
    VSATOFF
    VSETSHR #5-bit

VCSUB VR5, VR4, VR3, VR2 || VMOV32 VRa, mem32
Complex Subtraction

Operands
    Before the operation, the inputs should be loaded into registers as shown below. Each complex number includes a 32-bit real and a 32-bit imaginary part.

    Input Register   Value
    VR5              32-bit integer representing the real part of the first input: Re(X)
    VR4              32-bit integer representing the imaginary part of the first input: Im(X)
    VR3              32-bit integer representing the real part of the 2nd input: Re(Y)
    VR2              32-bit integer representing the imaginary part of the 2nd input: Im(Y)
    mem32            Pointer to a 32-bit memory location

    The result is also a complex number with a 32-bit real and a 32-bit imaginary part. The result is stored in VR5 and VR4 as shown below:

    Output Register  Value
    VR5              32-bit integer representing the real part of the result:
                     Re(Z) = Re(X) - (Re(Y) >> SHIFTR)
    VR4              32-bit integer representing the imaginary part of the result:
                     Im(Z) = Im(X) - (Im(Y) >> SHIFTR)
    VRa              Contents of the memory pointed to by [mem32]. VRa cannot be VR5, VR4 or VR8.

Opcode
    LSW: 1110 0011 1111 1001
    MSW: 0000 aaaa mem32

Description
    Complex 32 - 32 = 32-bit subtraction operation with parallel load.

    The second input operand (stored in VR3 and VR2) is shifted right by VSTATUS[SHIFTR] bits before the subtraction.
    If VSTATUS[RND] is set, the bits shifted out to the right are rounded; otherwise they are truncated. The rounding operation is described in Section 2.3.2. If the VSTATUS[SAT] bit is set, the result is saturated in the event of an overflow or underflow.

    // RND    is VSTATUS[RND]
    // SAT    is VSTATUS[SAT]
    // SHIFTR is VSTATUS[SHIFTR]
    //
    if (RND == 1) {
        VR5 = VR5 - round(VR3 >> SHIFTR);
        VR4 = VR4 - round(VR2 >> SHIFTR);
    } else {
        VR5 = VR5 - (VR3 >> SHIFTR);
        VR4 = VR4 - (VR2 >> SHIFTR);
    }
    if (SAT == 1) {
        sat32(VR5);
        sat32(VR4);
    }
    VRa = [mem32];

Flags
    This instruction modifies the following bits in the VSTATUS register:
    • OVFR is set if the VR5 computation (real part) overflows or underflows.
    • OVFI is set if the VR4 computation (imaginary part) overflows or underflows.

Pipeline
    This is a single-cycle instruction.

See also
    VCADD VR5, VR4, VR3, VR2 || VMOV32 VRa, mem32
    VCADD VR7, VR6, VR5, VR4
    VCSUB VR5, VR4, VR3, VR2
    VCLROVFI
    VCLROVFR
    VRNDOFF
    VRNDON
    VSATON
    VSATOFF
    VSETSHR #5-bit

2.5.5 Cyclic Redundancy Check (CRC) Instructions

The instructions are listed alphabetically, preceded by a summary.

Table 2-14. CRC Instructions

    Instruction              Title
    VCRC8H_1 mem16           CRC8, High Byte
    VCRC8L_1 mem16           CRC8, Low Byte
    VCRC16P1H_1 mem16        CRC16, Polynomial 1, High Byte
    VCRC16P1L_1 mem16        CRC16, Polynomial 1, Low Byte
    VCRC16P2H_1 mem16        CRC16, Polynomial 2, High Byte
    VCRC16P2L_1 mem16        CRC16, Polynomial 2, Low Byte
    VCRC24H_1 mem16          CRC24, High Byte
    VCRC24L_1 mem16          CRC24, Low Byte
    VCRC32H_1 mem16          CRC32, High Byte
    VCRC32L_1 mem16          CRC32, Low Byte
    VCRC32P2H_1 mem16        CRC32, Polynomial 2, High Byte
    VCRC32P2L_1 mem16        CRC32, Polynomial 2, Low Byte
    VCRCCLR                  Clear CRC Result Register
    VMOV32 mem32, VCRC       Store the CRC Result Register
    VMOV32 VCRC, mem32       Load the CRC Result Register
VCRC8H_1 mem16
CRC8, High Byte

Operands
    mem16   16-bit memory location

Opcode
    LSW: 1110 0010 1100 1100
    MSW: 0000 0000 mem16

Description
    This instruction uses CRC8 polynomial == 0x07. Calculate the CRC8 of the most significant byte pointed to by mem16 and accumulate it with the value in the VCRC register. Store the result in VCRC.

    if (VSTATUS[CRCMSGFLIP] == 0) {
        temp[7:0] = [mem16][15:8];
    } else {
        temp[7:0] = [mem16][8:15];
    }
    VCRC[7:0] = CRC8(VCRC[7:0], temp[7:0])

Flags
    This instruction does not modify any flags in the VSTATUS register.

Pipeline
    This is a single-cycle instruction.

Example
    Refer to the example for VCRC8L_1 mem16.

See also
    VCRC8L_1 mem16

VCRC8L_1 mem16
CRC8, Low Byte

Operands
    mem16   16-bit memory location

Opcode
    LSW: 1110 0010 1100 1011
    MSW: 0000 0000 mem16

Description
    This instruction uses CRC8 polynomial == 0x07. Calculate the CRC8 of the least significant byte pointed to by mem16 and accumulate it with the value in the VCRC register. Store the result in VCRC.

    if (VSTATUS[CRCMSGFLIP] == 0) {
        temp[7:0] = [mem16][7:0];
    } else {
        temp[7:0] = [mem16][0:7];
    }
    VCRC[7:0] = CRC8(VCRC[7:0], temp[7:0])

Flags
    This instruction does not modify any flags in the VSTATUS register.

Pipeline
    This is a single-cycle instruction.

Example
    typedef struct {
        uint32_t *CRCResult;   // Address where result should be stored
        uint16_t *CRCData;     // Start of data
        uint16_t  CRCLen;      // Length of data in bytes
    } CRC_CALC;

    CRC_CALC mycrc;
    ...
    CRC8(&mycrc);
    ...
              ; ------------------
              ; Calculate the CRC of a block of data
              ; This function assumes the block is a multiple of two 16-bit words (4 bytes)
              ;
                      .global _CRC8
              _CRC8
                      VCRCCLR                     ; Clear the result register
                      MOV      AL, *+XAR4[4]      ; AL = CRCLen
                      ASR      AL, 2              ; AL = CRCLen/4
                      SUBB     AL, #1             ; AL = CRCLen/4 - 1
                      MOVL     XAR7, *+XAR4[2]    ; XAR7 = &CRCData
                      .align   2
                      NOP                         ; Align RPTB to an odd address
                      RPTB     _CRC8_done, AL     ; Execute block of code AL + 1 times
                      VCRC8L_1 *XAR7              ; Calculate CRC for 4 bytes
                      VCRC8H_1 *XAR7++            ; ...
                      VCRC8L_1 *XAR7              ; ...
                      VCRC8H_1 *XAR7++            ; ...
              _CRC8_done
                      MOVL     XAR7, *+XAR4[0]    ; XAR7 = &CRCResult
                      VMOV32   *+XAR7[0], VCRC    ; Store the result
                      LRETR                       ; return to caller
See also      VCRC8H_1 mem16

VCRC16P1H_1 mem16 — CRC16, Polynomial 1, High Byte

Operands      mem16    16-bit memory location
Opcode        LSW: 1110 0010 1100 1111
              MSW: 0000 0000 mem16
Description   This instruction uses CRC16 polynomial 1 == 0x8005.
              Calculate the CRC16 of the most significant byte pointed to by mem16 and
              accumulate it with the value in the VCRC register. Store the result in VCRC.

              if (VSTATUS[CRCMSGFLIP] == 0){
                  temp[7:0] = [mem16][15:8];
              }else {
                  temp[7:0] = [mem16][8:15];
              }
              VCRC[15:0] = CRC16 (VCRC[15:0], temp[7:0])
Flags         This instruction does not modify any flags in the VSTATUS register.
Pipeline      This is a single-cycle instruction.
Example       Refer to the example for VCRC16P1L_1 mem16.
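The CRC16 (VCRC, temp) step in the description is not spelled out in this section. As a minimal host-side sketch, assuming a conventional MSB-first, non-reflected shift-and-XOR update (an assumption for illustration, not a statement about the hardware), one message byte could be folded into the running value as follows. The names crc_update_byte and crc_bytes are hypothetical helpers and are not part of any TI library.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model: fold one byte into a running CRC of `width` bits
 * (8..32), most significant bit first, using generator `poly`. */
static uint32_t crc_update_byte(uint32_t crc, uint8_t byte,
                                unsigned width, uint32_t poly)
{
    const uint32_t top  = (uint32_t)1 << (width - 1);
    const uint32_t mask = (width == 32) ? 0xFFFFFFFFu
                                        : (((uint32_t)1 << width) - 1u);

    crc ^= (uint32_t)byte << (width - 8);   /* align byte with the CRC top */
    for (int bit = 0; bit < 8; bit++) {
        crc = (crc & top) ? ((crc << 1) ^ poly) : (crc << 1);
        crc &= mask;                        /* keep only `width` bits */
    }
    return crc;
}

/* Model of "VCRCCLR, then one CRC instruction per byte". */
static uint32_t crc_bytes(const uint8_t *data, size_t len,
                          unsigned width, uint32_t poly)
{
    uint32_t crc = 0;                       /* VCRCCLR: start from zero */
    for (size_t i = 0; i < len; i++) {
        crc = crc_update_byte(crc, data[i], width, poly);
    }
    return crc;
}
```

Under this model, with a zero starting value, polynomial 0x8005 and polynomial 0x1021 give the widely published check values 0xFEE8 and 0x31C3 for the ASCII string "123456789". Comparing such values with the VCRC contents on a target is one way to confirm or reject the assumed bit ordering.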
See also      VCRC16P1L_1 mem16
              VCRC16P2H_1 mem16
              VCRC16P2L_1 mem16

VCRC16P1L_1 mem16 — CRC16, Polynomial 1, Low Byte

Operands      mem16    16-bit memory location
Opcode        LSW: 1110 0010 1100 1110
              MSW: 0000 0000 mem16
Description   This instruction uses CRC16 polynomial 1 == 0x8005.
              Calculate the CRC16 of the least significant byte pointed to by mem16 and
              accumulate it with the value in the VCRC register. Store the result in VCRC.

              if (VSTATUS[CRCMSGFLIP] == 0){
                  temp[7:0] = [mem16][7:0];
              }else {
                  temp[7:0] = [mem16][0:7];
              }
              VCRC[15:0] = CRC16 (VCRC[15:0], temp[7:0])
Flags         This instruction does not modify any flags in the VSTATUS register.
Pipeline      This is a single-cycle instruction.
Example
              typedef struct {
                  uint32_t *CRCResult;   // Address where result should be stored
                  uint16_t *CRCData;     // Start of data
                  uint16_t CRCLen;       // Length of data in bytes
              }CRC_CALC;

              CRC_CALC mycrc;
              ...
              CRC16P1(&mycrc);
              ...

              ; ------------------
              ; Calculate the CRC of a block of data
              ; This function assumes the block is a multiple of two 16-bit words (4 bytes)
              ;
                      .global _CRC16P1
              _CRC16P1
                      VCRCCLR                       ; Clear the result register
                      MOV      AL, *+XAR4[4]        ; AL = CRCLen
                      ASR      AL, 2                ; AL = CRCLen/4
                      SUBB     AL, #1               ; AL = CRCLen/4 - 1
                      MOVL     XAR7, *+XAR4[2]      ; XAR7 = &CRCData
                      .align   2
                      NOP                           ; Align RPTB to an odd address
                      RPTB     _CRC16P1_done, AL    ; Execute block of code AL + 1 times
                      VCRC16P1L_1 *XAR7             ; Calculate CRC for 4 bytes
                      VCRC16P1H_1 *XAR7++           ; ...
                      VCRC16P1L_1 *XAR7             ; ...
                      VCRC16P1H_1 *XAR7++           ; ...
              _CRC16P1_done
                      MOVL     XAR7, *+XAR4[0]      ; XAR7 = &CRCResult
                      VMOV32   *+XAR7[0], VCRC      ; Store the result
                      LRETR                         ; return to caller
See also      VCRC16P1H_1 mem16
              VCRC16P2H_1 mem16
              VCRC16P2L_1 mem16

VCRC16P2H_1 mem16 — CRC16, Polynomial 2, High Byte

Operands      mem16    16-bit memory location
Opcode        LSW: 1110 0010 1100 1111
              MSW: 0001 0000 mem16
Description   This instruction uses CRC16 polynomial 2 == 0x1021.
              Calculate the CRC16 of the most significant byte pointed to by mem16 and
              accumulate it with the value in the VCRC register. Store the result in VCRC.

              if (VSTATUS[CRCMSGFLIP] == 0){
                  temp[7:0] = [mem16][15:8];
              }else {
                  temp[7:0] = [mem16][8:15];
              }
              VCRC[15:0] = CRC16 (VCRC[15:0], temp[7:0])
Flags         This instruction does not modify any flags in the VSTATUS register.
Pipeline      This is a single-cycle instruction.
Example       Refer to the example for VCRC16P2L_1 mem16.
See also      VCRC16P2L_1 mem16
              VCRC16P1H_1 mem16
              VCRC16P1L_1 mem16

VCRC16P2L_1 mem16 — CRC16, Polynomial 2, Low Byte

Operands      mem16    16-bit memory location
Opcode        LSW: 1110 0010 1100 1110
              MSW: 0001 0000 mem16
Description   This instruction uses CRC16 polynomial 2 == 0x1021.
              Calculate the CRC16 of the least significant byte pointed to by mem16 and
              accumulate it with the value in the VCRC register. Store the result in VCRC.

              if (VSTATUS[CRCMSGFLIP] == 0){
                  temp[7:0] = [mem16][7:0];
              }else {
                  temp[7:0] = [mem16][0:7];
              }
              VCRC[15:0] = CRC16 (VCRC[15:0], temp[7:0])
Flags         This instruction does not modify any flags in the VSTATUS register.
Pipeline      This is a single-cycle instruction.
Example
              typedef struct {
                  uint32_t *CRCResult;   // Address where result should be stored
                  uint16_t *CRCData;     // Start of data
                  uint16_t CRCLen;       // Length of data in bytes
              }CRC_CALC;

              CRC_CALC mycrc;
              ...
              CRC16P2(&mycrc);
              ...

              ; ------------------
              ; Calculate the CRC of a block of data
              ; This function assumes the block is a multiple of two 16-bit words (4 bytes)
              ;
                      .global _CRC16P2
              _CRC16P2
                      VCRCCLR                       ; Clear the result register
                      MOV      AL, *+XAR4[4]        ; AL = CRCLen
                      ASR      AL, 2                ; AL = CRCLen/4
                      SUBB     AL, #1               ; AL = CRCLen/4 - 1
                      MOVL     XAR7, *+XAR4[2]      ; XAR7 = &CRCData
                      .align   2
                      NOP                           ; Align RPTB to an odd address
                      RPTB     _CRC16P2_done, AL    ; Execute block of code AL + 1 times
                      VCRC16P2L_1 *XAR7             ; Calculate CRC for 4 bytes
                      VCRC16P2H_1 *XAR7++           ; ...
                      VCRC16P2L_1 *XAR7             ; ...
                      VCRC16P2H_1 *XAR7++           ; ...
              _CRC16P2_done
                      MOVL     XAR7, *+XAR4[0]      ; XAR7 = &CRCResult
                      VMOV32   *+XAR7[0], VCRC      ; Store the result
                      LRETR                         ; return to caller
See also      VCRC16P2H_1 mem16
              VCRC16P1H_1 mem16
              VCRC16P1L_1 mem16

VCRC24H_1 mem16 — CRC24, High Byte

Operands      mem16    16-bit memory location
Opcode        LSW: 1110 0010 1100 1011
              MSW: 0000 0010 mem16
Description   This instruction uses CRC24 polynomial == 0x5D6DCB.
              Calculate the CRC24 of the most significant byte pointed to by mem16 and
              accumulate it with the value in the VCRC register. Store the result in VCRC.

              if (VSTATUS[CRCMSGFLIP] == 0){
                  temp[7:0] = [mem16][15:8];
              }else {
                  temp[7:0] = [mem16][8:15];
              }
              VCRC[23:0] = CRC24 (VCRC[23:0], temp[7:0])
Flags         This instruction does not modify any flags in the VSTATUS register.
Pipeline      This is a single-cycle instruction.
Example       Refer to the example for VCRC24L_1 mem16.
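The example routines in this section walk the data as 16-bit words, issuing the low-byte instruction and then the high-byte instruction for each word, four bytes per RPTB iteration. The C sketch below mirrors that byte ordering on a host for CRC24. It assumes an MSB-first, non-reflected update and a length that is a nonzero multiple of 4; crc24_byte and crc24_block are hypothetical names used only for illustration.

```c
#include <stdint.h>

/* Assumed model of one VCRC24x_1 step: MSB-first update, polynomial 0x5D6DCB. */
static uint32_t crc24_byte(uint32_t crc, uint8_t byte)
{
    crc ^= (uint32_t)byte << 16;            /* align byte with bits 23:16 */
    for (int bit = 0; bit < 8; bit++) {
        crc = (crc & 0x800000u) ? ((crc << 1) ^ 0x5D6DCBu) : (crc << 1);
        crc &= 0xFFFFFFu;                   /* VCRC[23:0] only */
    }
    return crc;
}

/* Mirrors the loop structure of the assembly example:
 * len_bytes / 4 iterations, two words per iteration, low byte first. */
static uint32_t crc24_block(const uint16_t *data, uint16_t len_bytes)
{
    uint32_t vcrc = 0;                      /* VCRCCLR */
    uint16_t iterations = len_bytes / 4;    /* ASR AL, 2 */

    while (iterations--) {
        for (int w = 0; w < 2; w++) {
            uint16_t word = *data++;                         /* *XAR7++ */
            vcrc = crc24_byte(vcrc, (uint8_t)(word & 0xFFu)); /* low byte  */
            vcrc = crc24_byte(vcrc, (uint8_t)(word >> 8));    /* high byte */
        }
    }
    return vcrc;
}
```

A byte stream packed low byte first into each word therefore produces the same value as feeding the bytes one at a time in stream order.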
See also      VCRC24L_1 mem16

VCRC24L_1 mem16 — CRC24, Low Byte

Operands      mem16    16-bit memory location
Opcode        LSW: 1110 0010 1100 1011
              MSW: 0000 0001 mem16
Description   This instruction uses CRC24 polynomial == 0x5D6DCB.
              Calculate the CRC24 of the least significant byte pointed to by mem16 and
              accumulate it with the value in the VCRC register. Store the result in VCRC.

              if (VSTATUS[CRCMSGFLIP] == 0){
                  temp[7:0] = [mem16][7:0];
              }else {
                  temp[7:0] = [mem16][0:7];
              }
              VCRC[23:0] = CRC24 (VCRC[23:0], temp[7:0])
Flags         This instruction does not modify any flags in the VSTATUS register.
Pipeline      This is a single-cycle instruction.
Example
              typedef struct {
                  uint32_t *CRCResult;   // Address where result should be stored
                  uint16_t *CRCData;     // Start of data
                  uint16_t CRCLen;       // Length of data in bytes
              }CRC_CALC;

              CRC_CALC mycrc;
              ...
              CRC24(&mycrc);
              ...

              ; ------------------
              ; Calculate the CRC of a block of data
              ; This function assumes the block is a multiple of two 16-bit words (4 bytes)
              ;
                      .global _CRC24
              _CRC24
                      VCRCCLR                     ; Clear the result register
                      MOV      AL, *+XAR4[4]      ; AL = CRCLen
                      ASR      AL, 2              ; AL = CRCLen/4
                      SUBB     AL, #1             ; AL = CRCLen/4 - 1
                      MOVL     XAR7, *+XAR4[2]    ; XAR7 = &CRCData
                      .align   2
                      NOP                         ; Align RPTB to an odd address
                      RPTB     _CRC24_done, AL    ; Execute block of code AL + 1 times
                      VCRC24L_1 *XAR7             ; Calculate CRC for 4 bytes
                      VCRC24H_1 *XAR7++           ; ...
                      VCRC24L_1 *XAR7             ; ...
                      VCRC24H_1 *XAR7++           ; ...
              _CRC24_done
                      MOVL     XAR7, *+XAR4[0]    ; XAR7 = &CRCResult
                      VMOV32   *+XAR7[0], VCRC    ; Store the result
                      LRETR                       ; return to caller
See also      VCRC24H_1 mem16

VCRC32H_1 mem16 — CRC32, High Byte

Operands      mem16    16-bit memory location
Opcode        LSW: 1110 0010 1100 0010
              MSW: 0000 0000 mem16
Description   This instruction uses CRC32 polynomial 1 == 0x04C11DB7.
              Calculate the CRC32 of the most significant byte pointed to by mem16 and
              accumulate it with the value in the VCRC register. Store the result in VCRC.

              if (VSTATUS[CRCMSGFLIP] == 0){
                  temp[7:0] = [mem16][15:8];
              }else {
                  temp[7:0] = [mem16][8:15];
              }
              VCRC[31:0] = CRC32 (VCRC[31:0], temp[7:0])
Flags         This instruction does not modify any flags in the VSTATUS register.
Pipeline      This is a single-cycle instruction.
Example       Refer to the example for VCRC32L_1 mem16.
See also      VCRC32L_1 mem16

VCRC32L_1 mem16 — CRC32, Low Byte

Operands      mem16    16-bit memory location
Opcode        LSW: 1110 0010 1100 0001
              MSW: 0000 0000 mem16
Description   This instruction uses CRC32 polynomial 1 == 0x04C11DB7.
              Calculate the CRC32 of the least significant byte pointed to by mem16 and
              accumulate it with the value in the VCRC register. Store the result in VCRC.

              if (VSTATUS[CRCMSGFLIP] == 0){
                  temp[7:0] = [mem16][7:0];
              }else {
                  temp[7:0] = [mem16][0:7];
              }
              VCRC[31:0] = CRC32 (VCRC[31:0], temp[7:0])
Flags         This instruction does not modify any flags in the VSTATUS register.
Pipeline      This is a single-cycle instruction.
Example
              typedef struct {
                  uint32_t *CRCResult;   // Address where result should be stored
                  uint16_t *CRCData;     // Start of data
                  uint16_t CRCLen;       // Length of data in bytes
              }CRC_CALC;

              CRC_CALC mycrc;
              ...
              CRC32(&mycrc);
              ...

              ; ------------------
              ; Calculate the CRC of a block of data
              ; This function assumes the block is a multiple of two 16-bit words (4 bytes)
              ;
                      .global _CRC32
              _CRC32
                      VCRCCLR                     ; Clear the result register
                      MOV      AL, *+XAR4[4]      ; AL = CRCLen
                      ASR      AL, 2              ; AL = CRCLen/4
                      SUBB     AL, #1             ; AL = CRCLen/4 - 1
                      MOVL     XAR7, *+XAR4[2]    ; XAR7 = &CRCData
                      .align   2
                      NOP                         ; Align RPTB to an odd address
                      RPTB     _CRC32_done, AL    ; Execute block of code AL + 1 times
                      VCRC32L_1 *XAR7             ; Calculate CRC for 4 bytes
                      VCRC32H_1 *XAR7++           ; ...
                      VCRC32L_1 *XAR7             ; ...
                      VCRC32H_1 *XAR7++           ; ...
              _CRC32_done
                      MOVL     XAR7, *+XAR4[0]    ; XAR7 = &CRCResult
                      VMOV32   *+XAR7[0], VCRC    ; Store the result
                      LRETR                       ; return to caller
See also      VCRC32H_1 mem16

VCRC32P2H_1 mem16 — CRC32, Polynomial 2, High Byte

Operands      mem16    16-bit memory location
Opcode        LSW: 1110 0010 1100 1011
              MSW: 0000 0100 mem16
Description   This instruction uses CRC32 polynomial 2 == 0x1EDC6F41.
              Calculate the CRC32 of the most significant byte pointed to by mem16 and
              accumulate it with the value in the VCRC register. Store the result in VCRC.

              if (VSTATUS[CRCMSGFLIP] == 0){
                  temp[7:0] = [mem16][15:8];
              }else {
                  temp[7:0] = [mem16][8:15];
              }
              VCRC[31:0] = CRC32 (VCRC[31:0], temp[7:0])
Flags         This instruction does not modify any flags in the VSTATUS register.
Pipeline      This is a single-cycle instruction.
Example       Refer to the example for VCRC32P2L_1 mem16.
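The bit-range notation in the descriptions, [mem16][15:8] versus [mem16][8:15], indicates that setting VSTATUS[CRCMSGFLIP] feeds the selected byte to the CRC in reversed bit order. A small C sketch of that byte selection is given below; reverse8 and select_byte are hypothetical helper names, and the code models only the operand selection, not the CRC update itself.

```c
#include <stdint.h>

/* Reverse the bit order of one byte: bit 0 <-> bit 7, bit 1 <-> bit 6, ... */
static uint8_t reverse8(uint8_t b)
{
    b = (uint8_t)(((b & 0xF0u) >> 4) | ((b & 0x0Fu) << 4)); /* swap nibbles */
    b = (uint8_t)(((b & 0xCCu) >> 2) | ((b & 0x33u) << 2)); /* swap pairs   */
    b = (uint8_t)(((b & 0xAAu) >> 1) | ((b & 0x55u) << 1)); /* swap bits    */
    return b;
}

/* Model of the temp[7:0] selection:
 *   high != 0      -> high-byte instruction (xxxH_1), else low-byte (xxxL_1)
 *   crcmsgflip != 0 -> byte is taken in reversed bit order */
static uint8_t select_byte(uint16_t word, int high, int crcmsgflip)
{
    uint8_t b = high ? (uint8_t)(word >> 8) : (uint8_t)(word & 0xFFu);
    return crcmsgflip ? reverse8(b) : b;
}
```

For the word 0x1234, the high-byte selection gives 0x12 with the flag clear and 0x48 with it set.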
See also      VCRC32P2L_1 mem16

VCRC32P2L_1 mem16 — CRC32, Polynomial 2, Low Byte

Operands      mem16    16-bit memory location
Opcode        LSW: 1110 0010 1100 1011
              MSW: 0000 0011 mem16
Description   This instruction uses CRC32 polynomial 2 == 0x1EDC6F41.
              Calculate the CRC32 of the least significant byte pointed to by mem16 and
              accumulate it with the value in the VCRC register. Store the result in VCRC.

              if (VSTATUS[CRCMSGFLIP] == 0){
                  temp[7:0] = [mem16][7:0];
              }else {
                  temp[7:0] = [mem16][0:7];
              }
              VCRC[31:0] = CRC32 (VCRC[31:0], temp[7:0])
Flags         This instruction does not modify any flags in the VSTATUS register.
Pipeline      This is a single-cycle instruction.
Example
              typedef struct {
                  uint32_t *CRCResult;   // Address where result should be stored
                  uint16_t *CRCData;     // Start of data
                  uint16_t CRCLen;       // Length of data in bytes
              }CRC_CALC;

              CRC_CALC mycrc;
              ...
              CRC32P2(&mycrc);
              ...

              ; ------------------
              ; Calculate the CRC of a block of data
              ; This function assumes the block is a multiple of two 16-bit words (4 bytes)
              ;
                      .global _CRC32P2
              _CRC32P2
                      VCRCCLR                       ; Clear the result register
                      MOV      AL, *+XAR4[4]        ; AL = CRCLen
                      ASR      AL, 2                ; AL = CRCLen/4
                      SUBB     AL, #1               ; AL = CRCLen/4 - 1
                      MOVL     XAR7, *+XAR4[2]      ; XAR7 = &CRCData
                      .align   2
                      NOP                           ; Align RPTB to an odd address
                      RPTB     _CRC32P2_done, AL    ; Execute block of code AL + 1 times
                      VCRC32P2L_1 *XAR7             ; Calculate CRC for 4 bytes
                      VCRC32P2H_1 *XAR7++           ; ...
                      VCRC32P2L_1 *XAR7             ; ...
                      VCRC32P2H_1 *XAR7++           ; ...
              _CRC32P2_done
                      MOVL     XAR7, *+XAR4[0]      ; XAR7 = &CRCResult
                      VMOV32   *+XAR7[0], VCRC      ; Store the result
                      LRETR                         ; return to caller
See also      VCRC32P2H_1 mem16

VCRCCLR — Clear CRC Result Register

Operands      none
Opcode        LSW: 1110 0101 0010 0100
Description   Clear the VCRC register.

              VCRC = 0x0000
Flags         This instruction does not modify any flags in the VSTATUS register.
Pipeline      This is a single-cycle instruction.
Example       Refer to the example for VCRC32L_1 mem16.
See also      VMOV32 mem32, VCRC
              VMOV32 VCRC, mem32

VMOV32 mem32, VCRC — Store the CRC Result Register

Operands      mem32    32-bit memory destination
              VCRC     CRC result register
Opcode        LSW: 1110 0010 0000 0110
              MSW: 0000 0000 mem32
Description   Store the VCRC register.

              [mem32] = VCRC
Flags         This instruction does not modify any flags in the VSTATUS register.
Pipeline      This is a single-cycle instruction.
See also      VCRCCLR
              VMOV32 VCRC, mem32

VMOV32 VCRC, mem32 — Load the CRC Result Register

Operands      mem32    32-bit memory source
              VCRC     CRC result register
Opcode        LSW: 1110 0011 1111 0110
              MSW: 0000 0000 mem32
Description   Load the VCRC register.

              VCRC = [mem32]
Flags         This instruction does not modify any flags in the VSTATUS register.
Pipeline      This is a single-cycle instruction.
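Because VMOV32 can both store and reload VCRC, a CRC over a long message can in principle be computed in pieces: store the intermediate value after one block, then load it back before the next block instead of issuing VCRCCLR. The host-side C sketch below illustrates the idea with CRC8 (polynomial 0x07), assuming an MSB-first, non-reflected update. The helper name crc8_run is hypothetical, and its start argument stands in for the value loaded into VCRC.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed model: run CRC8 (polynomial 0x07, MSB first) over `len` bytes,
 * starting from `start` (0 after VCRCCLR, or a previously stored VCRC). */
static uint32_t crc8_run(uint32_t start, const uint8_t *data, size_t len)
{
    uint32_t crc = start & 0xFFu;
    for (size_t i = 0; i < len; i++) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; bit++) {
            crc = (crc & 0x80u) ? ((crc << 1) ^ 0x07u) : (crc << 1);
            crc &= 0xFFu;
        }
    }
    return crc;
}
```

Running the first four bytes, saving the result, and resuming over the remaining bytes from the saved value gives the same CRC as a single pass over the whole message.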
See also      VCRCCLR
              VMOV32 mem32, VCRC

2.5.6 Deinterleaver Instructions

The instructions are listed alphabetically, preceded by a summary.

Table 2-15. Deinterleaver Instructions

VCLRDIVE — Clear DIVE bit in the VSTATUS Register
VDEC VRaL — 16-bit Decrement
VDEC VRaL || VMOV32 VRb, mem32 — 16-bit Decrement with Parallel Load
VINC VRaL — 16-bit Increment
VINC VRaL || VMOV32 VRb, mem32 — 16-bit Increment with Parallel Load
VMOD32 VRaH, VRb, VRcH — Modulo Operation
VMOD32 VRaH, VRb, VRcH || VMOV32 VRd, VRe — Modulo Operation with Parallel Move
VMOD32 VRaH, VRb, VRcL — Modulo Operation
VMOD32 VRaH, VRb, VRcL || VMOV32 VRd, VRe — Modulo Operation with Parallel Move
VMOV16 VRaL, VRbH — 16-bit Register Move
VMOV16 VRaH, VRbL — 16-bit Register Move
VMOV16 VRaH, VRbH — 16-bit Register Move
VMOV16 VRaL, VRbL — 16-bit Register Move
VMPYADD VRa, VRaL, VRaH, VRbH — Multiply Add 16-bit
VMPYADD VRa, VRaL, VRaH, VRbL — Multiply Add 16-bit

VCLRDIVE — Clear DIVE bit in the VSTATUS Register

Operands      none
Opcode        LSW: 1110 0101 0010 0000
Description   Clear the DIVE (Divide by zero error) bit in the VSTATUS register.
Flags         This instruction clears the DIVE bit in the VSTATUS register.
Pipeline      This is a single-cycle operation.

VDEC VRaL — 16-bit Decrement

Operands      VRaL     Low word of a general purpose register: VR0L, VR1L....VR7L.
                       Cannot be VR8L
Opcode        LSW: 1110 0110 1111 0010
              MSW: 0000 1011 0000 1aaa
Description   16-bit Decrement

              VRaL = VRaL - 1
Flags         This instruction does not affect any flags in the VSTATUS register.
Pipeline      This is a single-cycle instruction.
Example       VDEC VR0L          ; VR0L = VR0L - 1
See also      VINC VRaL || VMOV32 VRb, mem32
              VINC VRaL
              VDEC VRaL || VMOV32 VRb, mem32

VDEC VRaL || VMOV32 VRb, mem32 — 16-bit Decrement with Parallel Load

Operands      VRaL     Low word of a general purpose register: VR0L, VR1L....VR7L.
                       Cannot be VR8L
              VRb      General purpose register: VR0, VR1....VR7. Cannot be VR8
              mem32    Pointer to 32-bit memory location
Opcode        LSW: 1110 0010 1000 0001
              MSW: 01bb baaa mem32
Description   16-bit Decrement with Parallel Load

              VRaL = VRaL - 1
              VRb = [mem32]
Flags         This instruction does not affect any flags in the VSTATUS register.
Pipeline      This is a single-cycle instruction.
Example       VDEC VR0L
              || VMOV32 VR1, *+XAR3[4]
See also      VINC VRaL
              VDEC VRaL
              VINC VRaL || VMOV32 VRb, mem32

VINC VRaL — 16-bit Increment

Operands      VRaL     Low word of a general purpose register: VR0L, VR1L....VR7L.
                       Cannot be VR8L
Opcode        LSW: 1110 0110 1111 0010
              MSW: 0000 1011 0000 0aaa
Description   16-bit Increment

              VRaL = VRaL + 1
Flags         This instruction does not affect any flags in the VSTATUS register.
Pipeline      This is a single-cycle instruction.
Example       VINC VR0L          ; VR0L = VR0L + 1
See also      VINC VRaL || VMOV32 VRb, mem32
              VDEC VRaL
              VDEC VRaL || VMOV32 VRb, mem32

VINC VRaL || VMOV32 VRb, mem32 — 16-bit Increment with Parallel Load

Operands      VRaL     Low word of a general purpose register: VR0L, VR1L....VR7L.
                       Cannot be VR8L
              VRb      General purpose register: VR0, VR1....VR7. Cannot be VR8
              mem32    Pointer to 32-bit memory location
Opcode        LSW: 1110 0010 1000 0001
              MSW: 00bb baaa mem32
Description   16-bit Increment with Parallel Load

              VRaL = VRaL + 1
              VRb = [mem32]
Flags         This instruction does not affect any flags in the VSTATUS register.
Pipeline      This is a single-cycle instruction.
Example       VINC VR0L
              || VMOV32 VR1, *+XAR3[4]
See also      VINC VRaL
              VDEC VRaL
              VDEC VRaL || VMOV32 VRb, mem32

VMOD32 VRaH, VRb, VRcH — Modulo Operation

Operands      VRaH     High word of a general purpose register: VR0H, VR1H....VR7H.
                       Cannot be VR8H
              VRb      General purpose register: VR0, VR1....VR7. Cannot be VR8
              VRcH     High word of a general purpose register: VR0H, VR1H....VR7H.
                       Cannot be VR8H
Opcode        LSW: 1110 0110 1000 0000
              MSW: 0010 100a aabb bccc
Description   Modulo operation: 32-bit signed % 16-bit unsigned

              if(VRcH == 0x0){
                  VSTATUS[DIVE] = 1
              }else{
                  VRaH = VRb % VRcH
              }
Flags         This instruction modifies the following bits in the VSTATUS register:
              • DIVE is set if VRcH is 0, that is, a divide by zero error.
Pipeline      This is a 9p cycle instruction. No VMOD32 related instruction can be present
              in the delay slot of this instruction.
Example
              VMOD32  VR5H, VR3, VR4H          ; VR5H = VR3%VR4H = j
                                               ; compute j = (b * J - v * i) % n;
              NOP                              ; D1
              MOV     *+XAR1[AR0], AL          ; D2 Save previous Y(i+j*m)
              NOP                              ; D3
              NOP                              ; D4
              MOV     AL, *XAR4++              ; D5 AL = X(I) load X(I)
              NOP                              ; D6
              NOP                              ; D7
              NOP                              ; D8
              VMPYADD VR5, VR5L, VR5H, VR4H    ; VR5 = VR5L + VR5H*VR4H
                                               ;     = i + j*m compute i + j*m
See also      VMOD32 VRaH, VRb, VRcL
              VMOD32 VRaH, VRb, VRcL || VMOV32 VRd, VRe
              VMOD32 VRaH, VRb, VRcH || VMOV32 VRd, VRe
              VCLRDIVE

VMOD32 VRaH, VRb, VRcH || VMOV32 VRd, VRe — Modulo Operation with Parallel Move

Operands      VRaH     High word of a general purpose register: VR0H, VR1H....VR7H.
                       Cannot be VR8H
              VRb      General purpose register: VR0, VR1....VR7. Cannot be VR8
              VRcH     High word of a general purpose register: VR0H, VR1H....VR7H.
                       Cannot be VR8H
              VRd      General purpose register: VR0, VR1....VR7. Cannot be VR8
              VRe      General purpose register: VR0, VR1....VR7. Cannot be VR8
Opcode        LSW: 1110 0110 1111 0011
              MSW: 1eee dddc ccbb baaa
Description   Modulo operation: 32-bit signed % 16-bit unsigned

              if(VRcH == 0x0){
                  VSTATUS[DIVE] = 1
              }else{
                  VRaH = VRb % VRcH
              }
              VRd = VRe
Flags         This instruction modifies the following bits in the VSTATUS register:
              • DIVE is set if VRcH is 0, that is, a divide by zero error.
Pipeline      This is a 9p/1 cycle instruction.
              The VMOD32 instruction takes 9p cycles while the VMOV32 operation completes
              in a single cycle. No VMOD32 related instruction can be present in the delay
              slot of this instruction.
Example
              VMOD32  VR5H, VR3, VR4H          ; VR5H = VR3%VR4H = j; VR0 = {J,I}
              || VMOV32 VR0, VR6               ; compute j = (b * J - v * i) % n;
                                               ; load back saved J,I
              VINC    VR0L                     ; D1 VR1H = u, VR1L = a
              || VMOV32 VR1, *+XAR3[4]         ; increment I; load u, a
              MOV     *+XAR1[AR0], AL          ; D2 Save previous Y(i+j*m)
              VCMPY   VR3, VR2, VR1, VR0       ; D3 VR3 = a*I - u*J
                                               ; compute a * I - u * J
              VMOV32  VR1, *+XAR3[2]           ; D4/D1 VR1H = v, VR1L = b load v,b
              MOV     AL, *XAR4++              ; D5 AL = X(I) load X(I)
              NOP                              ; D6
              VMOV32  VR6, VR0                 ; D7 VR6 = {J,I} save current {J,I}
              VMOV16  VR0L, *+XAR5[0]          ; D8 VR0L = J load J
              VMOD32  VR0H, VR3, VR4H          ; VR0H = (VR3 % VR4H) = i
                                               ; compute i = (a * I - u * J) % m;
See also      VMOD32 VRaH, VRb, VRcH
              VMOD32 VRaH, VRb, VRcL
              VMOD32 VRaH, VRb, VRcL || VMOV32 VRd, VRe
              VCLRDIVE

VMOD32 VRaH, VRb, VRcL — Modulo Operation

Operands      VRaH     High word of a general purpose register: VR0H, VR1H....VR7H.
                       Cannot be VR8H
              VRb      General purpose register: VR0, VR1....VR7. Cannot be VR8
              VRcL     Low word of a general purpose register: VR0L, VR1L....VR7L.
                       Cannot be VR8L
Opcode        LSW: 1110 0110 1000 0000
              MSW: 0010 011a aabb bccc
Description   Modulo operation: 32-bit signed % 16-bit unsigned

              if(VRcL == 0x0){
                  VSTATUS[DIVE] = 1
              }else{
                  VRaH = VRb % VRcL
              }
Flags         This instruction modifies the following bits in the VSTATUS register:
              • DIVE is set if VRcL is 0, that is, a divide by zero error.
Pipeline      This is a 9p cycle instruction. No VMOD32 related instruction can be present
              in the delay slot of this instruction.
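The VMOD32 pseudocode can be mirrored on a host for checking deinterleaver index arithmetic. The C sketch below is a minimal model under stated assumptions: the dividend is a signed 32-bit value, the divisor an unsigned 16-bit value, and the remainder follows C's truncating % operator. How the hardware treats negative dividends is not specified in this section, so that part is an assumption. The type vstatus_model_t and the function vmod32_model are hypothetical names.

```c
#include <stdint.h>

/* Hypothetical stand-in for the one VSTATUS bit this model needs. */
typedef struct {
    int dive;   /* divide-by-zero error flag */
} vstatus_model_t;

/* Model of "VRaH = VRb % VRcX":
 * on a zero divisor, set DIVE and leave the destination untouched,
 * as in the pseudocode; otherwise write the 16-bit remainder. */
static void vmod32_model(uint16_t *vra_h, int32_t vrb, uint16_t vrc,
                         vstatus_model_t *vstatus)
{
    if (vrc == 0u) {
        vstatus->dive = 1;
        return;
    }
    /* Assumption: C truncating remainder, result kept in 16 bits. */
    *vra_h = (uint16_t)(vrb % (int32_t)vrc);
}
```

In this model the flag is sticky: nothing clears it except an explicit reset, which corresponds to issuing VCLRDIVE before the next sequence that is to be checked.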
Example
              VMOD32  VR5H, VR3, VR4L          ; VR5H = VR3%VR4L = j
                                               ; compute j = (b * J - v * i) % n;
              NOP                              ; D1
              MOV     *+XAR1[AR0], AL          ; D2 Save previous Y(i+j*m)
              NOP                              ; D3
              NOP                              ; D4
              MOV     AL, *XAR4++              ; D5 AL = X(I) load X(I)
              NOP                              ; D6
              NOP                              ; D7
              NOP                              ; D8
              VMPYADD VR5, VR5L, VR5H, VR4H    ; VR5 = VR5L + VR5H*VR4H
                                               ;     = i + j*m compute i + j*m
See also      VMOD32 VRaH, VRb, VRcH
              VMOD32 VRaH, VRb, VRcL || VMOV32 VRd, VRe
              VMOD32 VRaH, VRb, VRcH || VMOV32 VRd, VRe
              VCLRDIVE

VMOD32 VRaH, VRb, VRcL || VMOV32 VRd, VRe — Modulo Operation with Parallel Move

Operands      VRaH     High word of a general purpose register: VR0H, VR1H....VR7H.
                       Cannot be VR8H
              VRb      General purpose register: VR0, VR1....VR7. Cannot be VR8
              VRcL     Low word of a general purpose register: VR0L, VR1L....VR7L.
                       Cannot be VR8L
              VRd      General purpose register: VR0, VR1....VR7. Cannot be VR8
              VRe      General purpose register: VR0, VR1....VR7. Cannot be VR8
Opcode        LSW: 1110 0110 1111 0011
              MSW: 0eee dddc ccbb baaa
Description   Modulo operation: 32-bit signed % 16-bit unsigned

              if(VRcL == 0x0){
                  VSTATUS[DIVE] = 1
              }else{
                  VRaH = VRb % VRcL
              }
              VRd = VRe
Flags         This instruction modifies the following bits in the VSTATUS register:
              • DIVE is set if VRcL is 0, that is, a divide by zero error.
Pipeline      This is a 9p/1 cycle instruction. The VMOD32 instruction takes 9p cycles
              while the VMOV32 operation completes in a single cycle. No VMOD32 related
              instruction can be present in the delay slot of this instruction.
Example
              VMOD32  VR5H, VR3, VR4L          ; VR5H = VR3%VR4L = j; VR0 = {J,I}
              || VMOV32 VR0, VR6               ; compute j = (b * J - v * i) % n;
                                               ; load back saved J,I
              VINC    VR0L                     ; D1 VR1H = u, VR1L = a
              || VMOV32 VR1, *+XAR3[4]         ; increment I; load u, a
              MOV     *+XAR1[AR0], AL          ; D2 Save previous Y(i+j*m)
              VCMPY   VR3, VR2, VR1, VR0       ; D3 VR3 = a*I - u*J
                                               ; compute a * I - u * J
              VMOV32  VR1, *+XAR3[2]           ; D4/D1 VR1H = v, VR1L = b load v,b
              MOV     AL, *XAR4++              ; D5 AL = X(I) load X(I)
              NOP                              ; D6
              VMOV32  VR6, VR0                 ; D7 VR6 = {J,I} save current {J,I}
              VMOV16  VR0L, *+XAR5[0]          ; D8 VR0L = J load J
              VMOD32  VR0H, VR3, VR4H          ; VR0H = (VR3 % VR4H) = i
                                               ; compute i = (a * I - u * J) % m;
See also      VMOD32 VRaH, VRb, VRcH
              VMOD32 VRaH, VRb, VRcL
              VMOD32 VRaH, VRb, VRcH || VMOV32 VRd, VRe
              VCLRDIVE

VMOV16 VRaL, VRbH — 16-bit Register Move

Operands      VRbH     High word of a general purpose register: VR0H, VR1H....VR7H.
                       Cannot be VR8H
              VRaL     Low word of a general purpose register: VR0L, VR1L....VR7L.
                       Cannot be VR8L
Opcode        LSW: 1110 0110 1111 0010
              MSW: 0000 1010 00bb baaa
Description   16-bit Register Move

              VRaL = VRbH
Flags         This instruction does not affect any flags in the VSTATUS register.
Pipeline      This is a single-cycle instruction.
Example       VMOV16 VR5L, VR0H          ; VR5L = VR0H
See also      VMOV16 VRaH, VRbL
              VMOV16 VRaH, VRbH
              VMOV16 VRaL, VRbL

VMOV16 VRaH, VRbL — 16-bit Register Move

Operands      VRbL     Low word of a general purpose register: VR0L, VR1L....VR7L.
                       Cannot be VR8L
              VRaH     High word of a general purpose register: VR0H, VR1H....VR7H.
                       Cannot be VR8H
Opcode        LSW: 1110 0110 1111 0010
              MSW: 0000 1010 01bb baaa
Description   16-bit Register Move

              VRaH = VRbL
Flags         This instruction does not affect any flags in the VSTATUS register.
Pipeline      This is a single-cycle instruction.
Example       VMOV16 VR5H, VR0L          ; VR5H = VR0L
See also      VMOV16 VRaL, VRbH
              VMOV16 VRaH, VRbH
              VMOV16 VRaL, VRbL

VMOV16 VRaH, VRbH — 16-bit Register Move

Operands      VRbH     High word of a general purpose register: VR0H, VR1H....VR7H.
                       Cannot be VR8H
              VRaH     High word of a general purpose register: VR0H, VR1H....VR7H.
                       Cannot be VR8H
Opcode        LSW: 1110 0110 1111 0010
              MSW: 0000 1010 10bb baaa
Description   16-bit Register Move

              VRaH = VRbH
Flags         This instruction does not affect any flags in the VSTATUS register.
Pipeline      This is a single-cycle instruction.
Example       VMOV16 VR5H, VR0H          ; VR5H = VR0H
See also      VMOV16 VRaL, VRbH
              VMOV16 VRaH, VRbL
              VMOV16 VRaL, VRbL

VMOV16 VRaL, VRbL — 16-bit Register Move

Operands      VRbL     Low word of a general purpose register: VR0L, VR1L....VR7L.
                       Cannot be VR8L
              VRaL     Low word of a general purpose register: VR0L, VR1L....VR7L.
                       Cannot be VR8L
Opcode        LSW: 1110 0110 1111 0010
              MSW: 0000 1010 11bb baaa
Description   16-bit Register Move

              VRaL = VRbL
Flags         This instruction does not affect any flags in the VSTATUS register.
Pipeline      This is a single-cycle instruction.
Example       VMOV16 VR5L, VR0L          ; VR5L = VR0L
See also      VMOV16 VRaL, VRbH
              VMOV16 VRaH, VRbL
              VMOV16 VRaH, VRbH

VMPYADD VRa, VRaL, VRaH, VRbH — Multiply Add 16-bit

Operands      VRbH     High word of a general purpose register: VR0H, VR1H....VR7H.
                       Cannot be VR8H
              VRaH     High word of a general purpose register: VR0H, VR1H....VR7H.
                       Cannot be VR8H
              VRaL     Low word of a general purpose register: VR0L, VR1L....VR7L.
                       Cannot be VR8L
              VRa      General purpose register: VR0, VR1....VR7. Cannot be VR8
Opcode        LSW: 1110 0110 1111 0010
              MSW: 0000 1100 00bb baaa
Description   Performs p + q*r, where p, q, and r are 16-bit values.

              If(VSTATUS[SAT] == 1){
                  If(VSTATUS[RND] == 1){
                      VRa = rnd(sat(VRaL + VRaH * VRbH)>>VSTATUS[SHIFTR]);
                  }else {
                      VRa = sat(VRaL + VRaH * VRbH)>>VSTATUS[SHIFTR];
                  }
              }else { //VSTATUS[SAT] = 0
                  If(VSTATUS[RND] == 1){
                      VRa = rnd((VRaL + VRaH * VRbH)>>VSTATUS[SHIFTR]);
                  }else {
                      VRa = (VRaL + VRaH * VRbH)>>VSTATUS[SHIFTR];
                  }
              }

              It should be noted that:
              • VRaH*VRbH is represented as a 32-bit temp value
              • VRaL should be sign extended to 32 bits before performing the add
              • The add operation is a 32-bit operation
Flags         This instruction modifies the following bits in the VSTATUS register:
              • OVFR is set if a 32-bit signed overflow is detected in the add operation.
Pipeline
    This is a 2p cycle operation
Example
    VMPYADD VR5, VR5L, VR5H, VR4H    ; VR5 = VR5L + VR5H*VR4H
                                     ;     = i + j*m (compute i + j*m)
    NOP                              ; D1
See also
    VMPYADD VRa, VRaL, VRaH, VRbL

VMPYADD VRa, VRaL, VRaH, VRbL    Multiply Add 16-bit
Operands
    VRbL    Low word of a general purpose register: VR0L, VR1L....VR7L. Cannot be VR8L
    VRaH    High word of a general purpose register: VR0H, VR1H....VR7H. Cannot be VR8H
    VRaL    Low word of a general purpose register: VR0L, VR1L....VR7L. Cannot be VR8L
    VRa     General purpose register: VR0, VR1....VR7. Cannot be VR8
Opcode
    LSW: 1110 0110 1111 0010
    MSW: 0000 1100 01bb baaa
Description
    Performs p + q*r, where p, q, and r are 16-bit values

    If(VSTATUS[SAT] == 1){
        If(VSTATUS[RND] == 1){
            VRa = rnd(sat(VRaL + VRaH * VRbL)>>VSTATUS[SHIFTR]);
        }else {
            VRa = sat(VRaL + VRaH * VRbL)>>VSTATUS[SHIFTR];
        }
    }else { //VSTATUS[SAT] = 0
        If(VSTATUS[RND] == 1){
            VRa = rnd((VRaL + VRaH * VRbL)>>VSTATUS[SHIFTR]);
        }else {
            VRa = (VRaL + VRaH * VRbL)>>VSTATUS[SHIFTR];
        }
    }

    It should be noted that:
    • VRaH*VRbL is represented as a 32-bit temporary value
    • VRaL should be sign extended to 32 bits before performing the add
    • The add operation is a 32-bit operation
Flags
    This instruction modifies the following bits in the VSTATUS register:
    • OVFR is set if 32-bit signed overflow is detected in the add operation.
Pipeline
    This is a 2p cycle operation
Example
    VMPYADD VR5, VR5L, VR5H, VR4L    ; VR5 = VR5L + VR5H*VR4L
                                     ;     = i + j*m (compute i + j*m)
    NOP                              ; D1
See also
    VMPYADD VRa, VRaL, VRaH, VRbH

2.5.7 FFT Instructions
The instructions are listed alphabetically, preceded by a summary.

Table 2-16. FFT Instructions
Title
VCFFT1 VR2, VR5, VR4 - Complex FFT calculation instruction
VCFFT2 VR7, VR6, VR4, VR2, VR1, VR0, #1-bit - Complex FFT calculation instruction
VCFFT2 VR7, VR6, VR4, VR2, VR1, VR0, #1-bit || VMOV32 mem32, VR1 - Complex FFT calculation instruction with Parallel Store
VCFFT3 VR5, VR4, VR3, VR2, VR0, #1-bit - Complex FFT calculation instruction
VCFFT3 VR5, VR4, VR3, VR2, VR0, #1-bit || VMOV32 VR5, mem32 - Complex FFT calculation instruction with Parallel Load
VCFFT4 VR4, VR2, VR1, VR0, #1-bit - Complex FFT calculation instruction
VCFFT4 VR4, VR2, VR1, VR0, #1-bit || VMOV32 VR7, mem32 - Complex FFT calculation instruction with Parallel Load
VCFFT5 VR5, VR4, VR3, VR2, VR1, VR0, #1-bit || VMOV32 mem32, VR1 - Complex FFT calculation instruction with Parallel Store
VCFFT6 VR3, VR2, VR1, VR0, #1-bit || VMOV32 mem32, VR1 - Complex FFT calculation instruction with Parallel Store
VCFFT7 VR1, VR0, #1-bit || VMOV32 VR2, mem32 - Complex FFT calculation instruction with Parallel Load
VCFFT8 VR3, VR2, #1-bit - Complex FFT calculation instruction
VCFFT8 VR3, VR2, #1-bit || VMOV32 mem32, VR4 - Complex FFT calculation instruction with Parallel Store
VCFFT9 VR5, VR4, VR3, VR2, VR1, VR0, #1-bit - Complex FFT calculation instruction
VCFFT9 VR5, VR4, VR3, VR2, VR1, VR0, #1-bit || VMOV32 mem32, VR5 - Complex FFT calculation instruction with Parallel Store
VCFFT10 VR7, VR6, VR3, VR2, VR1, VR0, #1-bit - Complex FFT calculation instruction
VCFFT10 VR7, VR6, VR3, VR2, VR1, VR0, #1-bit || VMOV32 VR0, mem32 - Complex FFT calculation instruction with Parallel Load

VCFFT1 VR2, VR5, VR4    Complex FFT calculation instruction
Operands
    This operation assumes the following complex packing order for complex operands:
        VRa[31:16] = Imaginary Part
        VRa[15:0] = Real Part
    It ignores the VSTATUS[CPACK] bit.
    VR4    First Complex Input
    VR5    Second Complex Input
    VR2    Complex Output
Opcode
    LSW: 1110 0101 0010 1011
Description
    This operation is used in the butterfly operation of the FFT:

    If(VSTATUS[SAT] == 1){
        If(VSTATUS[RND] == 1){
            VR2H = rnd(sat(VR5H*VR4L - VR5L*VR4H)>>VSTATUS[SHIFTR])
            VR2L = rnd(sat(VR5L*VR4L + VR5H*VR4H)>>VSTATUS[SHIFTR])
        }else {
            VR2H = sat(VR5H*VR4L - VR5L*VR4H)>>VSTATUS[SHIFTR]
            VR2L = sat(VR5L*VR4L + VR5H*VR4H)>>VSTATUS[SHIFTR]
        }
    }else { //VSTATUS[SAT] = 0
        If(VSTATUS[RND] == 1){
            VR2H = rnd((VR5H*VR4L - VR5L*VR4H)>>VSTATUS[SHIFTR])
            VR2L = rnd((VR5L*VR4L + VR5H*VR4H)>>VSTATUS[SHIFTR])
        }else {
            VR2H = (VR5H*VR4L - VR5L*VR4H)>>VSTATUS[SHIFTR]
            VR2L = (VR5L*VR4L + VR5H*VR4H)>>VSTATUS[SHIFTR]
        }
    }

    Sign-Extension is automatically done for the shift right operations
Flags
    This instruction modifies the following bits in the VSTATUS register:
    • OVFR is set if signed overflow is detected for add/sub calculation in which destination is VRxL
    • OVFI is set if signed overflow is detected for add/sub calculation in which destination is VRxH
    • The OVFR and OVFI flags are also set if, after shift right operation, the 32-bit temporary result cannot fit in the 16-bit destination
Pipeline
    This is a two-cycle instruction
Example
    See the example for VCFFT10 VR7, VR6, VR3, VR2, VR1, VR0, #1-bit

VCFFT2 VR7, VR6, VR4, VR2, VR1, VR0, #1-bit    Complex FFT calculation instruction
Operands
    This operation assumes the following complex packing order for complex operands:
        VRa[31:16] = Imaginary Part
        VRa[15:0] = Real Part
    It ignores the VSTATUS[CPACK] bit.
    VR7       Complex Input
    VR6       Complex Input
    VR4       Complex Input
    VR2       Complex Output
    VR1       Complex Output
    VR0       Complex Output
    #1-bit    1-bit immediate value
Opcode
    LSW: 1010 0001 0011 000I
Description
    This operation is used in the butterfly operation of the FFT:

    If(VSTATUS[SAT] == 1){
        If(VSTATUS[RND] == 1){
            VR0H = rnd(sat(VR7H + VR2H)>>#1-bit);
            VR0L = rnd(sat(VR7L + VR2L)>>#1-bit);
            VR1L = rnd(sat(VR7L - VR2L)>>#1-bit);
            VR1H = rnd(sat(VR7H - VR2H)>>#1-bit);
            VR2H = rnd(sat(VR6H * VR4L - VR6L * VR4H)>>VSTATUS[SHIFTR]);
            VR2L = rnd(sat(VR6L * VR4L + VR6H * VR4H)>>VSTATUS[SHIFTR]);
        }else {
            VR0H = sat(VR7H + VR2H)>>#1-bit;
            VR0L = sat(VR7L + VR2L)>>#1-bit;
            VR1L = sat(VR7L - VR2L)>>#1-bit;
            VR1H = sat(VR7H - VR2H)>>#1-bit;
            VR2H = sat(VR6H * VR4L - VR6L * VR4H)>>VSTATUS[SHIFTR];
            VR2L = sat(VR6L * VR4L + VR6H * VR4H)>>VSTATUS[SHIFTR];
        }
    }else { //VSTATUS[SAT] = 0
        If(VSTATUS[RND] == 1){
            VR0H = rnd((VR7H + VR2H)>>#1-bit);
            VR0L = rnd((VR7L + VR2L)>>#1-bit);
            VR1L = rnd((VR7L - VR2L)>>#1-bit);
            VR1H = rnd((VR7H - VR2H)>>#1-bit);
            VR2H = rnd((VR6H * VR4L - VR6L * VR4H)>>VSTATUS[SHIFTR]);
            VR2L = rnd((VR6L * VR4L + VR6H * VR4H)>>VSTATUS[SHIFTR]);
        }else {
            VR0H = (VR7H + VR2H)>>#1-bit;
            VR0L = (VR7L + VR2L)>>#1-bit;
            VR1L = (VR7L - VR2L)>>#1-bit;
            VR1H = (VR7H - VR2H)>>#1-bit;
            VR2H = (VR6H * VR4L - VR6L * VR4H)>>VSTATUS[SHIFTR];
            VR2L = (VR6L * VR4L + VR6H * VR4H)>>VSTATUS[SHIFTR];
        }
    }

    Sign-Extension is automatically done for the shift right operations
Flags
    This instruction modifies the following bits in the VSTATUS register:
    • OVFR is set if signed overflow is detected for add/sub calculation in which destination is VRxL
    • OVFI is set if signed overflow is detected for add/sub calculation in which destination is VRxH
    • The OVFR and OVFI flags are also set if, after shift right operation, the 32-bit temporary result cannot fit in the 16-bit destination
Pipeline
    This is a two-cycle instruction
Example
    See the example for VCFFT10 VR7, VR6, VR3, VR2, VR1, VR0, #1-bit

VCFFT2 VR7, VR6, VR4, VR2, VR1, VR0, #1-bit || VMOV32 mem32, VR1    Complex FFT calculation instruction with Parallel Store
Operands
    This operation assumes the following complex packing order for complex operands:
        VRa[31:16] = Imaginary Part
        VRa[15:0] = Real Part
    It ignores the VSTATUS[CPACK] bit.
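The packing order stated above, together with the multiply expressions used for VR2H and VR2L in VCFFT1 and VCFFT2, can be sketched in Python. This is an illustrative model under stated assumptions: the helper names `s16`, `pack`, `unpack`, and `twiddle_product` are hypothetical, only the SAT = 0, RND = 0 path is modelled, and results are simply truncated to 16 bits on packing (the overflow flags are not modelled). Algebraically, the two expressions are the real and imaginary parts of x multiplied by the complex conjugate of w.

```python
def s16(x):
    """Interpret the low 16 bits of x as a signed two's-complement value."""
    x &= 0xFFFF
    return x - 0x10000 if x & 0x8000 else x


def pack(re, im):
    """Build a 32-bit register image: imaginary in [31:16], real in [15:0]."""
    return ((im & 0xFFFF) << 16) | (re & 0xFFFF)


def unpack(vr):
    """Return (real, imaginary) as signed 16-bit integers."""
    return s16(vr), s16(vr >> 16)


def twiddle_product(vr_x, vr_w, shiftr):
    """Model VR2H = (XH*WL - XL*WH) >> SHIFTR and VR2L = (XL*WL + XH*WH) >> SHIFTR."""
    xr, xi = unpack(vr_x)
    wr, wi = unpack(vr_w)
    im = (xi * wr - xr * wi) >> shiftr   # Python >> is arithmetic (sign extending)
    re = (xr * wr + xi * wi) >> shiftr
    return pack(re, im)


# (3 + 4j) * conj(1 + 2j) = 11 - 2j
print(unpack(twiddle_product(pack(3, 4), pack(1, 2), 0)))
```

With Q15 operands, a SHIFTR value of 15 brings the 32-bit product back to Q15 in this model, which is one common way such a shift field is used.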
    VR7       Complex Input
    VR6       Complex Input
    VR4       Complex Input
    VR2       Complex Output
    VR1       Complex Output
    VR0       Complex Output
    #1-bit    1-bit immediate value
    mem32     Pointer to 32-bit memory location
Opcode
    LSW: 1110 0010 0000 0111
    MSW: 0010 000I mem32
Description
    This operation is used in the butterfly operation of the FFT:

    If(VSTATUS[SAT] == 1){
        If(VSTATUS[RND] == 1){
            VR0H = rnd(sat(VR7H + VR2H)>>#1-bit);
            VR0L = rnd(sat(VR7L + VR2L)>>#1-bit);
            VR1L = rnd(sat(VR7L - VR2L)>>#1-bit);
            VR1H = rnd(sat(VR7H - VR2H)>>#1-bit);
            VR2H = rnd(sat(VR6H * VR4L - VR6L * VR4H)>>VSTATUS[SHIFTR]);
            VR2L = rnd(sat(VR6L * VR4L + VR6H * VR4H)>>VSTATUS[SHIFTR]);
        }else {
            VR0H = sat(VR7H + VR2H)>>#1-bit;
            VR0L = sat(VR7L + VR2L)>>#1-bit;
            VR1L = sat(VR7L - VR2L)>>#1-bit;
            VR1H = sat(VR7H - VR2H)>>#1-bit;
            VR2H = sat(VR6H * VR4L - VR6L * VR4H)>>VSTATUS[SHIFTR];
            VR2L = sat(VR6L * VR4L + VR6H * VR4H)>>VSTATUS[SHIFTR];
        }
    }else { //VSTATUS[SAT] = 0
        If(VSTATUS[RND] == 1){
            VR0H = rnd((VR7H + VR2H)>>#1-bit);
            VR0L = rnd((VR7L + VR2L)>>#1-bit);
            VR1L = rnd((VR7L - VR2L)>>#1-bit);
            VR1H = rnd((VR7H - VR2H)>>#1-bit);
            VR2H = rnd((VR6H * VR4L - VR6L * VR4H)>>VSTATUS[SHIFTR]);
            VR2L = rnd((VR6L * VR4L + VR6H * VR4H)>>VSTATUS[SHIFTR]);
        }else {
            VR0H = (VR7H + VR2H)>>#1-bit;
            VR0L = (VR7L + VR2L)>>#1-bit;
            VR1L = (VR7L - VR2L)>>#1-bit;
            VR1H = (VR7H - VR2H)>>#1-bit;
            VR2H = (VR6H * VR4L - VR6L * VR4H)>>VSTATUS[SHIFTR];
            VR2L = (VR6L * VR4L + VR6H * VR4H)>>VSTATUS[SHIFTR];
        }
    }
    [mem32] = VR1;

    Sign-Extension is automatically done for the shift right operations
Flags
    This instruction modifies the following bits in the VSTATUS register:
    • OVFR is set if signed overflow is detected for add/sub calculation in which destination is VRxL
    • OVFI is set if signed overflow is detected for add/sub calculation in which destination is VRxH
    • The OVFR and OVFI flags are also set if, after shift right operation, the 32-bit temporary result cannot fit in the 16-bit destination
Pipeline
    This is a 2p/1-cycle instruction. The VCFFT operation takes 2p cycles and the VMOV operation completes in a single cycle.
Example
    See the example for VCFFT10 VR7, VR6, VR3, VR2, VR1, VR0, #1-bit

VCFFT3 VR5, VR4, VR3, VR2, VR0, #1-bit    Complex FFT calculation instruction
Operands
    This operation assumes the following complex packing order for complex operands:
        VRa[31:16] = Imaginary Part
        VRa[15:0] = Real Part
    It ignores the VSTATUS[CPACK] bit.
    VR5       Complex Input
    VR4       Complex Input
    VR3       Complex Output
    VR2       Complex Output/Complex Input from previous operation
    VR0       Complex Output/Complex Input from previous operation
    #1-bit    1-bit immediate value
Opcode
    LSW: 1010 0001 0011 001I
Description
    This operation is used in the butterfly operation of the FFT:

    If(VSTATUS[SAT] == 1){
        If(VSTATUS[RND] == 1){
            VR0H = rnd(sat(VR5H + VR2H)>>#1-bit);
            VR0L = rnd(sat(VR5L + VR2L)>>#1-bit);
            VR3H = rnd(sat(VR5H - VR2H)>>#1-bit);
            VR3L = rnd(sat(VR5L - VR2L)>>#1-bit);
            VR2H = rnd(sat(VR0H * VR4L - VR0L * VR4H)>>VSTATUS[SHIFTR]);
            VR2L = rnd(sat(VR0L * VR4L + VR0H * VR4H)>>VSTATUS[SHIFTR]);
        }else {
            VR0H = sat(VR5H + VR2H)>>#1-bit;
            VR0L = sat(VR5L + VR2L)>>#1-bit;
            VR3H = sat(VR5H - VR2H)>>#1-bit;
            VR3L = sat(VR5L - VR2L)>>#1-bit;
            VR2H = sat(VR0H * VR4L - VR0L * VR4H)>>VSTATUS[SHIFTR];
            VR2L = sat(VR0L * VR4L + VR0H * VR4H)>>VSTATUS[SHIFTR];
        }
    }else { //VSTATUS[SAT] = 0
        If(VSTATUS[RND] == 1){
            VR0H = rnd((VR5H + VR2H)>>#1-bit);
            VR0L = rnd((VR5L + VR2L)>>#1-bit);
            VR3H = rnd((VR5H - VR2H)>>#1-bit);
            VR3L = rnd((VR5L - VR2L)>>#1-bit);
            VR2H = rnd((VR0H * VR4L - VR0L * VR4H)>>VSTATUS[SHIFTR]);
            VR2L = rnd((VR0L * VR4L + VR0H * VR4H)>>VSTATUS[SHIFTR]);
        }else {
            VR0H = (VR5H + VR2H)>>#1-bit;
            VR0L = (VR5L + VR2L)>>#1-bit;
            VR3H = (VR5H - VR2H)>>#1-bit;
            VR3L = (VR5L - VR2L)>>#1-bit;
            VR2H = (VR0H * VR4L - VR0L * VR4H)>>VSTATUS[SHIFTR];
            VR2L = (VR0L * VR4L + VR0H * VR4H)>>VSTATUS[SHIFTR];
        }
    }

    Sign-Extension is automatically done for the shift right operations
Flags
    This instruction modifies the following bits in the VSTATUS register:
    • OVFR is set if signed overflow is detected for add/sub calculation in which destination is VRxL
    • OVFI is set if signed overflow is detected for add/sub calculation in which destination is VRxH
    • The OVFR and OVFI flags are also set if, after shift right operation, the 32-bit temporary result cannot fit in the 16-bit destination
Pipeline
    This is a 2p cycle instruction.
Example
    See the example for VCFFT10 VR7, VR6, VR3, VR2, VR1, VR0, #1-bit

VCFFT3 VR5, VR4, VR3, VR2, VR0, #1-bit || VMOV32 VR5, mem32    Complex FFT calculation instruction with Parallel Load
Operands
    This operation assumes the following complex packing order for complex operands:
        VRa[31:16] = Imaginary Part
        VRa[15:0] = Real Part
    It ignores the VSTATUS[CPACK] bit.
    VR5       Complex Input
    VR4       Complex Input
    VR3       Complex Output
    VR2       Complex Output/Complex Input from previous operation
    VR0       Complex Output/Complex Input from previous operation
    #1-bit    1-bit immediate value
    mem32     Pointer to 32-bit memory location
Opcode
    LSW: 1110 0010 1011 0000
    MSW: 0000 001I mem32
Description
    This operation is used in the butterfly operation of the FFT:

    If(VSTATUS[SAT] == 1){
        If(VSTATUS[RND] == 1){
            VR0H = rnd(sat(VR5H + VR2H)>>#1-bit);
            VR0L = rnd(sat(VR5L + VR2L)>>#1-bit);
            VR3H = rnd(sat(VR5H - VR2H)>>#1-bit);
            VR3L = rnd(sat(VR5L - VR2L)>>#1-bit);
            VR2H = rnd(sat(VR0H * VR4L - VR0L * VR4H)>>VSTATUS[SHIFTR]);
            VR2L = rnd(sat(VR0L * VR4L + VR0H * VR4H)>>VSTATUS[SHIFTR]);
        }else {
            VR0H = sat(VR5H + VR2H)>>#1-bit;
            VR0L = sat(VR5L + VR2L)>>#1-bit;
            VR3H = sat(VR5H - VR2H)>>#1-bit;
            VR3L = sat(VR5L - VR2L)>>#1-bit;
            VR2H = sat(VR0H * VR4L - VR0L * VR4H)>>VSTATUS[SHIFTR];
            VR2L = sat(VR0L * VR4L + VR0H * VR4H)>>VSTATUS[SHIFTR];
        }
    }else { //VSTATUS[SAT] = 0
        If(VSTATUS[RND] == 1){
            VR0H = rnd((VR5H + VR2H)>>#1-bit);
            VR0L = rnd((VR5L + VR2L)>>#1-bit);
            VR3H = rnd((VR5H - VR2H)>>#1-bit);
            VR3L = rnd((VR5L - VR2L)>>#1-bit);
            VR2H = rnd((VR0H * VR4L - VR0L * VR4H)>>VSTATUS[SHIFTR]);
            VR2L = rnd((VR0L * VR4L + VR0H * VR4H)>>VSTATUS[SHIFTR]);
        }else {
            VR0H = (VR5H + VR2H)>>#1-bit;
            VR0L = (VR5L + VR2L)>>#1-bit;
            VR3H = (VR5H - VR2H)>>#1-bit;
            VR3L = (VR5L - VR2L)>>#1-bit;
            VR2H = (VR0H * VR4L - VR0L * VR4H)>>VSTATUS[SHIFTR];
            VR2L = (VR0L * VR4L + VR0H * VR4H)>>VSTATUS[SHIFTR];
        }
    }
    VR5 = [mem32];

    Sign-Extension is automatically done for the shift right operations
Flags
    This instruction modifies the following bits in the VSTATUS register:
    • OVFR is set if signed overflow is detected for add/sub calculation in which destination is VRxL
    • OVFI is set if signed overflow is detected for add/sub calculation in which destination is VRxH
    • The OVFR and OVFI flags are also set if, after shift right operation, the 32-bit temporary result cannot fit in the 16-bit destination
Pipeline
    This is a 2p cycle instruction.
Example
    See the example for VCFFT10 VR7, VR6, VR3, VR2, VR1, VR0, #1-bit

VCFFT4 VR4, VR2, VR1, VR0, #1-bit    Complex FFT calculation instruction
Operands
    This operation assumes the following complex packing order for complex operands:
        VRa[31:16] = Imaginary Part
        VRa[15:0] = Real Part
    It ignores the VSTATUS[CPACK] bit.
    VR4       Complex Input
    VR2       Complex Output/Complex Input from previous operation
    VR1       Complex Output/Complex Input from previous operation
    VR0       Complex Output/Complex Input from previous operation
    #1-bit    1-bit immediate value
Opcode
    LSW: 1010 0001 0011 010I
Description
    This operation is used in the butterfly operation of the FFT:

    If(VSTATUS[SAT] == 1){
        If(VSTATUS[RND] == 1){
            VR0H = rnd(sat(VR0H + VR2H)>>#1-bit);
            VR0L = rnd(sat(VR0L + VR2L)>>#1-bit);
            VR1H = rnd(sat(VR0H - VR2H)>>#1-bit);
            VR1L = rnd(sat(VR0L - VR2L)>>#1-bit);
            VR2H = rnd(sat(VR1L * VR4L + VR1H * VR4H)>>VSTATUS[SHIFTR]);
            VR2L = rnd(sat(VR1H * VR4L - VR1L * VR4H)>>VSTATUS[SHIFTR]);
        }else {
            VR0H = sat(VR0H + VR2H)>>#1-bit;
            VR0L = sat(VR0L + VR2L)>>#1-bit;
            VR1H = sat(VR0H - VR2H)>>#1-bit;
            VR1L = sat(VR0L - VR2L)>>#1-bit;
            VR2H = sat(VR1L * VR4L + VR1H * VR4H)>>VSTATUS[SHIFTR];
            VR2L = sat(VR1H * VR4L - VR1L * VR4H)>>VSTATUS[SHIFTR];
        }
    }else { //VSTATUS[SAT] = 0
        If(VSTATUS[RND] == 1){
            VR0H = rnd((VR0H + VR2H)>>#1-bit);
            VR0L = rnd((VR0L + VR2L)>>#1-bit);
            VR1H = rnd((VR0H - VR2H)>>#1-bit);
            VR1L = rnd((VR0L - VR2L)>>#1-bit);
            VR2H = rnd((VR1L * VR4L + VR1H * VR4H)>>VSTATUS[SHIFTR]);
            VR2L = rnd((VR1H * VR4L - VR1L * VR4H)>>VSTATUS[SHIFTR]);
        }else {
            VR0H = (VR0H + VR2H)>>#1-bit;
            VR0L = (VR0L + VR2L)>>#1-bit;
            VR1H = (VR0H - VR2H)>>#1-bit;
            VR1L = (VR0L - VR2L)>>#1-bit;
            VR2H = (VR1L * VR4L + VR1H * VR4H)>>VSTATUS[SHIFTR];
            VR2L = (VR1H * VR4L - VR1L * VR4H)>>VSTATUS[SHIFTR];
        }
    }

    Sign-Extension is automatically done for the shift right operations
Flags
    This instruction modifies the following bits in the VSTATUS register:
    • OVFR is set if signed overflow is detected for add/sub calculation in which destination is VRxL
    • OVFI is set if signed overflow is detected for add/sub calculation in which destination is VRxH
    • The OVFR and OVFI flags are also set if, after shift right operation, the 32-bit temporary result cannot fit in the 16-bit destination
Pipeline
    This is a 2p cycle instruction.
Example
    See the example for VCFFT10 VR7, VR6, VR3, VR2, VR1, VR0, #1-bit

VCFFT4 VR4, VR2, VR1, VR0, #1-bit || VMOV32 VR7, mem32    Complex FFT calculation instruction with Parallel Load
Operands
    This operation assumes the following complex packing order for complex operands:
        VRa[31:16] = Imaginary Part
        VRa[15:0] = Real Part
    It ignores the VSTATUS[CPACK] bit.
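In the VCFFT4 pseudocode, VR0, VR1, and VR2 appear on both sides of the assignments. The sketch below models one plausible reading: every right-hand side uses the register values from before the instruction. That reading is an assumption; the operand table ("Complex Input from previous operation") suggests it, but the text does not state it outright. The function name `vcfft4` and the dictionary representation are hypothetical, only the SAT = 0, RND = 0 path is shown, and values are kept as plain signed integers without 16-bit truncation.

```python
def vcfft4(regs, one_bit, shiftr):
    """Model the VCFFT4 pseudocode for SAT = 0 and RND = 0.

    regs maps names such as "VR0H" to signed integers. Every right-hand
    side is evaluated from `old`, the state before the instruction
    (assumed read-before-write behaviour).
    """
    old = dict(regs)   # snapshot of the inputs
    new = dict(regs)   # registers not written keep their values
    new["VR0H"] = (old["VR0H"] + old["VR2H"]) >> one_bit
    new["VR0L"] = (old["VR0L"] + old["VR2L"]) >> one_bit
    new["VR1H"] = (old["VR0H"] - old["VR2H"]) >> one_bit
    new["VR1L"] = (old["VR0L"] - old["VR2L"]) >> one_bit
    new["VR2H"] = (old["VR1L"] * old["VR4L"] + old["VR1H"] * old["VR4H"]) >> shiftr
    new["VR2L"] = (old["VR1H"] * old["VR4L"] - old["VR1L"] * old["VR4H"]) >> shiftr
    return new


state = {"VR0L": 10, "VR0H": 20, "VR1L": 1, "VR1H": 2,
         "VR2L": 4, "VR2H": 6, "VR4L": 3, "VR4H": 5}
result = vcfft4(state, one_bit=1, shiftr=0)
print(result["VR1H"])   # uses the old VR0H (20), not the freshly written one
```

Reading the pseudocode as sequential C statements instead would make VR1 depend on the already-updated VR0, which would not produce the sum and difference pair of a butterfly.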
    VR4       Complex Input
    VR2       Complex Output/Complex Input from previous operation
    VR1       Complex Output/Complex Input from previous operation
    VR0       Complex Output/Complex Input from previous operation
    #1-bit    1-bit immediate value
    mem32     Pointer to 32-bit memory location
Opcode
    LSW: 1110 0010 1011 0000
    MSW: 0000 010I mem32
Description
    This operation is used in the butterfly operation of the FFT:

    If(VSTATUS[SAT] == 1){
        If(VSTATUS[RND] == 1){
            VR0H = rnd(sat(VR0H + VR2H)>>#1-bit);
            VR0L = rnd(sat(VR0L + VR2L)>>#1-bit);
            VR1H = rnd(sat(VR0H - VR2H)>>#1-bit);
            VR1L = rnd(sat(VR0L - VR2L)>>#1-bit);
            VR2H = rnd(sat(VR1L * VR4L + VR1H * VR4H)>>VSTATUS[SHIFTR]);
            VR2L = rnd(sat(VR1H * VR4L - VR1L * VR4H)>>VSTATUS[SHIFTR]);
        }else {
            VR0H = sat(VR0H + VR2H)>>#1-bit;
            VR0L = sat(VR0L + VR2L)>>#1-bit;
            VR1H = sat(VR0H - VR2H)>>#1-bit;
            VR1L = sat(VR0L - VR2L)>>#1-bit;
            VR2H = sat(VR1L * VR4L + VR1H * VR4H)>>VSTATUS[SHIFTR];
            VR2L = sat(VR1H * VR4L - VR1L * VR4H)>>VSTATUS[SHIFTR];
        }
    }else { //VSTATUS[SAT] = 0
        If(VSTATUS[RND] == 1){
            VR0H = rnd((VR0H + VR2H)>>#1-bit);
            VR0L = rnd((VR0L + VR2L)>>#1-bit);
            VR1H = rnd((VR0H - VR2H)>>#1-bit);
            VR1L = rnd((VR0L - VR2L)>>#1-bit);
            VR2H = rnd((VR1L * VR4L + VR1H * VR4H)>>VSTATUS[SHIFTR]);
            VR2L = rnd((VR1H * VR4L - VR1L * VR4H)>>VSTATUS[SHIFTR]);
        }else {
            VR0H = (VR0H + VR2H)>>#1-bit;
            VR0L = (VR0L + VR2L)>>#1-bit;
            VR1H = (VR0H - VR2H)>>#1-bit;
            VR1L = (VR0L - VR2L)>>#1-bit;
            VR2H = (VR1L * VR4L + VR1H * VR4H)>>VSTATUS[SHIFTR];
            VR2L = (VR1H * VR4L - VR1L * VR4H)>>VSTATUS[SHIFTR];
        }
    }
    VR7 = [mem32];

    Sign-Extension is automatically done for the shift right operations
Flags
    This instruction modifies the following bits in the VSTATUS register:
    • OVFR is set if signed overflow is detected for add/sub calculation in which destination is VRxL
    • OVFI is set if signed overflow is detected for add/sub calculation in which destination is VRxH
    • The OVFR and OVFI flags are also set if, after shift right operation, the 32-bit temporary result cannot fit in the 16-bit destination
Pipeline
    This is a 2p cycle instruction.
Example
    See the example for VCFFT10 VR7, VR6, VR3, VR2, VR1, VR0, #1-bit

VCFFT5 VR5, VR4, VR3, VR2, VR1, VR0, #1-bit || VMOV32 mem32, VR1    Complex FFT calculation instruction with Parallel Store
Operands
    This operation assumes the following complex packing order for complex operands:
        VRa[31:16] = Imaginary Part
        VRa[15:0] = Real Part
    It ignores the VSTATUS[CPACK] bit.
    VR5       Complex Input
    VR4       Complex Input
    VR3       Complex Input
    VR2       Complex Output/Complex Input from previous operation
    VR1       Complex Output/Complex Input from previous operation
    VR0       Complex Output/Complex Input from previous operation
    #1-bit    1-bit immediate value
    mem32     Pointer to 32-bit memory location
Opcode
    LSW: 1110 0010 0000 0111
    MSW: 0010 001I mem32
Description
    This operation is used in the butterfly operation of the FFT:

    If(VSTATUS[SAT] == 1){
        If(VSTATUS[RND] == 1){
            VR0H = rnd(sat(VR3H - VR2H)>>#1-bit);
            VR0L = rnd(sat(VR3L + VR2L)>>#1-bit);
            VR1H = rnd(sat(VR3H + VR2H)>>#1-bit);
            VR1L = rnd(sat(VR3L - VR2L)>>#1-bit);
            VR2H = rnd(sat(VR5H * VR4L - VR5L * VR4H)>>VSTATUS[SHIFTR]);
            VR2L = rnd(sat(VR5L * VR4L + VR5H * VR4H)>>VSTATUS[SHIFTR]);
        }else {
            VR0H = sat(VR3H - VR2H)>>#1-bit;
            VR0L = sat(VR3L + VR2L)>>#1-bit;
            VR1H = sat(VR3H + VR2H)>>#1-bit;
            VR1L = sat(VR3L - VR2L)>>#1-bit;
            VR2H = sat(VR5H * VR4L - VR5L * VR4H)>>VSTATUS[SHIFTR];
            VR2L = sat(VR5L * VR4L + VR5H * VR4H)>>VSTATUS[SHIFTR];
        }
    }else { //VSTATUS[SAT] = 0
        If(VSTATUS[RND] == 1){
            VR0H = rnd((VR3H - VR2H)>>#1-bit);
            VR0L = rnd((VR3L + VR2L)>>#1-bit);
            VR1H = rnd((VR3H + VR2H)>>#1-bit);
            VR1L = rnd((VR3L - VR2L)>>#1-bit);
            VR2H = rnd((VR5H * VR4L - VR5L * VR4H)>>VSTATUS[SHIFTR]);
            VR2L = rnd((VR5L * VR4L + VR5H * VR4H)>>VSTATUS[SHIFTR]);
        }else {
            VR0H = (VR3H - VR2H)>>#1-bit;
            VR0L = (VR3L + VR2L)>>#1-bit;
            VR1H = (VR3H + VR2H)>>#1-bit;
            VR1L = (VR3L - VR2L)>>#1-bit;
            VR2H = (VR5H * VR4L - VR5L * VR4H)>>VSTATUS[SHIFTR];
            VR2L = (VR5L * VR4L + VR5H * VR4H)>>VSTATUS[SHIFTR];
        }
    }
    [mem32] = VR1;

    Sign-Extension is automatically done for the shift right operations
Flags
    This instruction modifies the following bits in the VSTATUS register:
    • OVFR is set if signed overflow is detected for add/sub calculation in which destination is VRxL
    • OVFI is set if signed overflow is detected for add/sub calculation in which destination is VRxH
    • The OVFR and OVFI flags are also set if, after shift right operation, the 32-bit temporary result cannot fit in the 16-bit destination
Pipeline
    This is a 2p cycle instruction.
Example
    See the example for VCFFT10 VR7, VR6, VR3, VR2, VR1, VR0, #1-bit

VCFFT6 VR3, VR2, VR1, VR0, #1-bit || VMOV32 mem32, VR1    Complex FFT calculation instruction with Parallel Store
Operands
    This operation assumes the following complex packing order for complex operands:
        VRa[31:16] = Imaginary Part
        VRa[15:0] = Real Part
    It ignores the VSTATUS[CPACK] bit.
    VR3       Complex Input
    VR2       Complex Output/Complex Input from previous operation
    VR1       Complex Output/Complex Input from previous operation
    VR0       Complex Output/Complex Input from previous operation
    #1-bit    1-bit immediate value
    mem32     Pointer to 32-bit memory location
Opcode
    LSW: 1110 0010 0000 0111
    MSW: 0010 010I mem32
Description
    This operation is used in the butterfly operation of the FFT:

    If(VSTATUS[SAT] == 1){
        If(VSTATUS[RND] == 1){
            VR0H = rnd(sat(VR3H - VR2H)>>#1-bit);
            VR0L = rnd(sat(VR3L + VR2L)>>#1-bit);
            VR1H = rnd(sat(VR3H + VR2H)>>#1-bit);
            VR1L = rnd(sat(VR3L - VR2L)>>#1-bit);
        }else {
            VR0H = sat(VR3H - VR2H)>>#1-bit;
            VR0L = sat(VR3L + VR2L)>>#1-bit;
            VR1H = sat(VR3H + VR2H)>>#1-bit;
            VR1L = sat(VR3L - VR2L)>>#1-bit;
        }
    }else { //VSTATUS[SAT] = 0
        If(VSTATUS[RND] == 1){
            VR0H = rnd((VR3H - VR2H)>>#1-bit);
            VR0L = rnd((VR3L + VR2L)>>#1-bit);
            VR1H = rnd((VR3H + VR2H)>>#1-bit);
            VR1L = rnd((VR3L - VR2L)>>#1-bit);
        }else {
            VR0H = (VR3H - VR2H)>>#1-bit;
            VR0L = (VR3L + VR2L)>>#1-bit;
            VR1H = (VR3H + VR2H)>>#1-bit;
            VR1L = (VR3L - VR2L)>>#1-bit;
        }
    }
    [mem32] = VR1;

    Sign-Extension is automatically done for the shift right operations
Flags
    This instruction modifies the following bits in the VSTATUS register:
    • OVFR is set if signed overflow is detected for add/sub calculation in which destination is VRxL
    • OVFI is set if signed overflow is detected for add/sub calculation in which destination is VRxH
Pipeline
    This is a 1/1-cycle instruction. The VCFFT and VMOV operations are completed in one cycle.
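The add/subtract stage of VCFFT6 can be sketched on packed 32-bit register images as follows. This is a minimal model under stated assumptions: the names `s16`, `pack`, and `vcfft6` are hypothetical, only the SAT = 0, RND = 0 path is shown, sums are formed at full precision before the shift, and the parallel store and the overflow flags are not modelled. In terms of complex values, the pseudocode computes VR0 = VR3 + conj(VR2) and VR1 = VR3 - conj(VR2), each scaled by the 1-bit shift.

```python
def s16(x):
    """Interpret the low 16 bits of x as a signed two's-complement value."""
    x &= 0xFFFF
    return x - 0x10000 if x & 0x8000 else x


def pack(re, im):
    """Imaginary part in bits 31:16, real part in bits 15:0."""
    return ((im & 0xFFFF) << 16) | (re & 0xFFFF)


def vcfft6(vr3, vr2, one_bit):
    """Return (VR0, VR1) register images for the VCFFT6 add/sub stage."""
    r3, i3 = s16(vr3), s16(vr3 >> 16)
    r2, i2 = s16(vr2), s16(vr2 >> 16)
    vr0 = pack((r3 + r2) >> one_bit, (i3 - i2) >> one_bit)   # VR0L, VR0H
    vr1 = pack((r3 - r2) >> one_bit, (i3 + i2) >> one_bit)   # VR1L, VR1H
    return vr0, vr1


vr0, vr1 = vcfft6(pack(10, 20), pack(4, 6), one_bit=1)
print(hex(vr0), hex(vr1))
```

With `one_bit=1`, the sum or difference of two signed 16-bit values always fits in 16 bits after the shift in this model; with `one_bit=0`, `pack` silently wraps out-of-range results, which is where the hardware flags would come into play.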
Example See the example for VCFFT10 VR7, VR6, VR3, VR2, VR1, VR0, #1-bit See also SPRUHS1A – March 2014 – Revised December 2015 Submit Documentation Feedback C28 Viterbi, Complex Math and CRC Unit-II (VCU-II) Copyright © 2014–2015, Texas Instruments Incorporated 319 VCFFT7 VR1, VR0, #1-bit || VMOV32 VR2, mem32 — Complex FFT calculation instruction with Parallel Load www.ti.com VCFFT7 VR1, VR0, #1-bit || VMOV32 VR2, mem32 Complex FFT calculation instruction with Parallel Load Operands This operation assumes the following complex packing order for complex operands: VRa[31:16] = Imaginary Part VRa[15:0] = Real Part It ignores the VSTATUS[CPACK] bit. VR3 Complex Input VR2 Complex Output/Complex Input from previous operation VR1 Complex Output/Complex Input from previous operation VR0 Complex Output/Complex Input from previous operation #1-bit 1-bit immediate value mem32 Pointer to 32-bit memory location Opcode LSW: 1110 0010 1011 0000 MSW: 0000 011I mem32 Description This operation is used in the butterfly operation of the FFT: If(VSTATUS[SAT] == 1){ If(VSTATUS[RND] == 1){ VR0L = rnd(sat(VR0L + VR1L)>>#1-bit); VR0H = rnd(sat(VR0L - VR1L)>>#1-bit); VR1L = rnd(sat(VR0H + VR1H)>>#1-bit); VR1H = rnd(sat(VR0H - VR1H)>>#1-bit); }else { VR0L = sat(VR0L + VR1L)>>#1-bit; VR0H = sat(VR0L - VR1L)>>#1-bit; VR1L = sat(VR0H + VR1H)>>#1-bit; VR1H = sat(VR0H - VR1H)>>#1-bit; } }else { //VSTATUS[SAT] = 0 If(VSTATUS[RND] == 1){ VR0L = rnd((VR0L + VR1L)>>#1-bit); VR0H = rnd((VR0L - VR1L)>>#1-bit); VR1L = rnd((VR0H + VR1H)>>#1-bit); VR1H = rnd((VR0H - VR1H)>>#1-bit); }else { VR0L = (VR0L + VR1L)>>#1-bit; VR0H = (VR0L - VR1L)>>#1-bit; VR1L = (VR0H + VR1H)>>#1-bit; VR1H = (VR0H - VR1H)>>#1-bit; } } VR2 = [mem32]; Sign-Extension is automatically done for the shift right operations Flags This instruction modifies the following bits in the VSTATUS register: • OVFR is set if signed overflow is detected for add/sub calculation in which destination is VRxL • OVFI is set if signed overflow is 
detected for add/sub calculation in which destination is VRxH Pipeline This is a 1/1-cycle instruction. The VCFFT and VMOV operations are completed in one cycle. Example See the example for VCFFT10 VR7, VR6, VR3, VR2, VR1, VR0, #1-bit See also 320 C28 Viterbi, Complex Math and CRC Unit-II (VCU-II) SPRUHS1A – March 2014 – Revised December 2015 Submit Documentation Feedback Copyright © 2014–2015, Texas Instruments Incorporated VCFFT8 VR3, VR2, #1-bit — Complex FFT calculation instruction www.ti.com VCFFT8 VR3, VR2, #1-bit Complex FFT calculation instruction Operands This operation assumes the following complex packing order for complex operands: VRa[31:16] = Imaginary Part VRa[15:0] = Real Part It ignores the VSTATUS[CPACK] bit. VR2 Complex Output/Complex Input from previous operation VR3 Complex Output/Complex Input from previous operation #1-bit 1-bit immediate value Opcode LSW: 1010 0001 0011 011I Description This operation is used in the butterfly operation of the FFT: If(VSTATUS[SAT] == 1){ If(VSTATUS[RND] == 1){ VR2L = rnd(sat(VR2L + VR3L)>>#1-bit); VR2H = rnd(sat(VR2L - VR3L)>>#1-bit); VR3L = rnd(sat(VR2H + VR3H)>>#1-bit); VR3H = rnd(sat(VR2H - VR3H)>>#1-bit); }else { VR2L = sat(VR2L + VR3L)>>#1-bit; VR2H = sat(VR2L - VR3L)>>#1-bit; VR3L = sat(VR2H + VR3H)>>#1-bit; VR3H = sat(VR2H - VR3H)>>#1-bit; } }else { //VSTATUS[SAT] = 0 If(VSTATUS[RND] == 1){ VR2L = rnd((VR2L + VR3L)>>#1-bit); VR2H = rnd((VR2L - VR3L)>>#1-bit); VR3L = rnd((VR2H + VR3H)>>#1-bit); VR3H = rnd((VR2H - VR3H)>>#1-bit); }else { VR2L = (VR2L + VR3L)>>#1-bit; VR2H = (VR2L - VR3L)>>#1-bit; VR3L = (VR2H + VR3H)>>#1-bit; VR3H = (VR2H - VR3H)>>#1-bit; } } Sign-Extension is automatically done for the shift right operations Flags This instruction modifies the following bits in the VSTATUS register: • OVFR is set if signed overflow is detected for add/sub calculation in which destination is VRxL • OVFI is set if signed overflow is detected for add/sub calculation in which destination is VRxH Pipeline 
This is a single-cycle instruction.

Example
See the example for VCFFT10 VR7, VR6, VR3, VR2, VR1, VR0, #1-bit

See also

VCFFT8 VR3, VR2, #1-bit || VMOV32 mem32, VR4
Complex FFT calculation instruction with Parallel Store

Operands
This operation assumes the following complex packing order for complex operands:
    VRa[31:16] = Imaginary Part
    VRa[15:0]  = Real Part
It ignores the VSTATUS[CPACK] bit.

    VR4      Complex Input from previous operation
    VR2      Complex Output/Complex Input from previous operation
    VR3      Complex Output/Complex Input from previous operation
    #1-bit   1-bit immediate value
    mem32    Pointer to 32-bit memory location

Opcode
    LSW: 1110 0010 0000 0111
    MSW: 0010 011I mem32

Description
This operation is used in the butterfly operation of the FFT:

    If(VSTATUS[SAT] == 1){
        If(VSTATUS[RND] == 1){
            VR2L = rnd(sat(VR2L + VR3L)>>#1-bit);
            VR2H = rnd(sat(VR2L - VR3L)>>#1-bit);
            VR3L = rnd(sat(VR2H + VR3H)>>#1-bit);
            VR3H = rnd(sat(VR2H - VR3H)>>#1-bit);
        }else {
            VR2L = sat(VR2L + VR3L)>>#1-bit;
            VR2H = sat(VR2L - VR3L)>>#1-bit;
            VR3L = sat(VR2H + VR3H)>>#1-bit;
            VR3H = sat(VR2H - VR3H)>>#1-bit;
        }
    }else { //VSTATUS[SAT] = 0
        If(VSTATUS[RND] == 1){
            VR2L = rnd((VR2L + VR3L)>>#1-bit);
            VR2H = rnd((VR2L - VR3L)>>#1-bit);
            VR3L = rnd((VR2H + VR3H)>>#1-bit);
            VR3H = rnd((VR2H - VR3H)>>#1-bit);
        }else {
            VR2L = (VR2L + VR3L)>>#1-bit;
            VR2H = (VR2L - VR3L)>>#1-bit;
            VR3L = (VR2H + VR3H)>>#1-bit;
            VR3H = (VR2H - VR3H)>>#1-bit;
        }
    }
    [mem32] = VR4;

Sign extension is automatically performed for the shift-right operations.

Flags
This instruction modifies the following bits in the VSTATUS register:
• OVFR is set if signed overflow is detected for add/sub calculation in which destination is VRxL
• OVFI is set if signed overflow is
detected for add/sub calculation in which destination is VRxH

Pipeline
This is a single-cycle instruction.

Example
See the example for VCFFT10 VR7, VR6, VR3, VR2, VR1, VR0, #1-bit

See also

VCFFT9 VR5, VR4, VR3, VR2, VR1, VR0, #1-bit
Complex FFT calculation instruction

Operands
This operation assumes the following complex packing order for complex operands:
    VRa[31:16] = Imaginary Part
    VRa[15:0]  = Real Part
It ignores the VSTATUS[CPACK] bit.

    VR0      Complex Input
    VR1      Complex Input
    VR2      Complex Input
    VR3      Complex Input
    VR4      Complex Output
    VR5      Complex Output
    #1-bit   1-bit immediate value

Opcode
    LSW: 1010 0001 0011 100I

Description
This operation is used in the butterfly operation of the FFT:

    If(VSTATUS[SAT] == 1){
        If(VSTATUS[RND] == 1){
            VR4L = rnd(sat(VR0L + VR2L)>>#1-bit);
            VR4H = rnd(sat(VR1L + VR3L)>>#1-bit);
            VR5L = rnd(sat(VR0L - VR2L)>>#1-bit);
            VR5H = rnd(sat(VR1L - VR3L)>>#1-bit);
        }else {
            VR4L = sat(VR0L + VR2L)>>#1-bit;
            VR4H = sat(VR1L + VR3L)>>#1-bit;
            VR5L = sat(VR0L - VR2L)>>#1-bit;
            VR5H = sat(VR1L - VR3L)>>#1-bit;
        }
    }else { //VSTATUS[SAT] = 0
        If(VSTATUS[RND] == 1){
            VR4L = rnd((VR0L + VR2L)>>#1-bit);
            VR4H = rnd((VR1L + VR3L)>>#1-bit);
            VR5L = rnd((VR0L - VR2L)>>#1-bit);
            VR5H = rnd((VR1L - VR3L)>>#1-bit);
        }else {
            VR4L = (VR0L + VR2L)>>#1-bit;
            VR4H = (VR1L + VR3L)>>#1-bit;
            VR5L = (VR0L - VR2L)>>#1-bit;
            VR5H = (VR1L - VR3L)>>#1-bit;
        }
    }

Sign extension is automatically performed for the shift-right operations.

Flags
This instruction modifies the following bits in the VSTATUS register:
• OVFR is set if signed overflow is detected for add/sub calculation in which destination is VRxL
• OVFI is set if signed overflow is detected for add/sub calculation in which destination is VRxH

Pipeline
This is a single-cycle
instruction.

Example
See the example for VCFFT10 VR7, VR6, VR3, VR2, VR1, VR0, #1-bit

VCFFT9 VR5, VR4, VR3, VR2, VR1, VR0, #1-bit || VMOV32 mem32, VR5
Complex FFT calculation instruction with Parallel Store

Operands
This operation assumes the following complex packing order for complex operands:
    VRa[31:16] = Imaginary Part
    VRa[15:0]  = Real Part
It ignores the VSTATUS[CPACK] bit.

    VR0      Complex Input
    VR1      Complex Input
    VR2      Complex Input
    VR3      Complex Input
    VR4      Complex Output
    VR5      Complex Output
    #1-bit   1-bit immediate value
    mem32    Pointer to 32-bit memory location

Opcode
    LSW: 1110 0010 0000 0111
    MSW: 0010 100I mem32

Description
This operation is used in the butterfly operation of the FFT:

    If(VSTATUS[SAT] == 1){
        If(VSTATUS[RND] == 1){
            VR4L = rnd(sat(VR0L + VR2L)>>#1-bit);
            VR4H = rnd(sat(VR1L + VR3L)>>#1-bit);
            VR5L = rnd(sat(VR0L - VR2L)>>#1-bit);
            VR5H = rnd(sat(VR1L - VR3L)>>#1-bit);
        }else {
            VR4L = sat(VR0L + VR2L)>>#1-bit;
            VR4H = sat(VR1L + VR3L)>>#1-bit;
            VR5L = sat(VR0L - VR2L)>>#1-bit;
            VR5H = sat(VR1L - VR3L)>>#1-bit;
        }
    }else { //VSTATUS[SAT] = 0
        If(VSTATUS[RND] == 1){
            VR4L = rnd((VR0L + VR2L)>>#1-bit);
            VR4H = rnd((VR1L + VR3L)>>#1-bit);
            VR5L = rnd((VR0L - VR2L)>>#1-bit);
            VR5H = rnd((VR1L - VR3L)>>#1-bit);
        }else {
            VR4L = (VR0L + VR2L)>>#1-bit;
            VR4H = (VR1L + VR3L)>>#1-bit;
            VR5L = (VR0L - VR2L)>>#1-bit;
            VR5H = (VR1L - VR3L)>>#1-bit;
        }
    }
    [mem32] = VR5;

Sign extension is automatically performed for the shift-right operations.

Flags
This instruction modifies the following bits in the VSTATUS register:
• OVFR is set if signed overflow is detected for add/sub calculation in which destination is VRxL
• OVFI is set if signed overflow is detected for add/sub calculation in which destination is VRxH

Pipeline
This is a 1/1-cycle instruction. The VCFFT and VMOV operations are completed in one cycle.

Example
See the example for VCFFT10 VR7, VR6, VR3, VR2, VR1, VR0, #1-bit

See also

VCFFT10 VR7, VR6, VR3, VR2, VR1, VR0, #1-bit
Complex FFT calculation instruction

Operands
This operation assumes the following complex packing order for complex operands:
    VRa[31:16] = Imaginary Part
    VRa[15:0]  = Real Part
It ignores the VSTATUS[CPACK] bit.
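The packing order stated above (imaginary part in bits 31:16, real part in bits 15:0) can be mirrored in host-side test code when preparing or checking FFT buffers. The following is a minimal C sketch of that layout only; the type `cplx16_t` and the functions `cplx_pack` and `cplx_unpack` are hypothetical helpers introduced for illustration and are not part of any TI header or library.

```c
#include <stdint.h>

/* Hypothetical helper type: one complex sample with 16-bit signed parts. */
typedef struct {
    int16_t re;   /* maps to VRa[15:0]  (VRaL) */
    int16_t im;   /* maps to VRa[31:16] (VRaH) */
} cplx16_t;

/* Build the 32-bit image of a VRa register from a complex sample. */
static uint32_t cplx_pack(cplx16_t c)
{
    return ((uint32_t)(uint16_t)c.im << 16) | (uint16_t)c.re;
}

/* Split a 32-bit register image into its signed real and imaginary halves.
 * The uint16_t to int16_t conversion relies on two's-complement behavior. */
static cplx16_t cplx_unpack(uint32_t vr)
{
    cplx16_t c;
    c.re = (int16_t)(uint16_t)(vr & 0xFFFFu);
    c.im = (int16_t)(uint16_t)(vr >> 16);
    return c;
}
```

Because the VCFFT instructions ignore VSTATUS[CPACK], a helper like this would always use the same layout regardless of how that bit is configured for other complex-math instructions.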
    VR0      Complex Input
    VR1      Complex Input
    VR2      Complex Input
    VR3      Complex Input
    VR6      Complex Output
    VR7      Complex Output
    #1-bit   1-bit immediate value

Opcode
    LSW: 1010 0001 0011 101I

Description
This operation is used in the butterfly operation of the FFT:

    If(VSTATUS[SAT] == 1){
        If(VSTATUS[RND] == 1){
            VR6L = rnd(sat(VR0H + VR3H)>>#1-bit);
            VR6H = rnd(sat(VR1H - VR2H)>>#1-bit);
            VR7L = rnd(sat(VR0H - VR3H)>>#1-bit);
            VR7H = rnd(sat(VR1H + VR2H)>>#1-bit);
        }else {
            VR6L = sat(VR0H + VR3H)>>#1-bit;
            VR6H = sat(VR1H - VR2H)>>#1-bit;
            VR7L = sat(VR0H - VR3H)>>#1-bit;
            VR7H = sat(VR1H + VR2H)>>#1-bit;
        }
    }else { //VSTATUS[SAT] = 0
        If(VSTATUS[RND] == 1){
            VR6L = rnd((VR0H + VR3H)>>#1-bit);
            VR6H = rnd((VR1H - VR2H)>>#1-bit);
            VR7L = rnd((VR0H - VR3H)>>#1-bit);
            VR7H = rnd((VR1H + VR2H)>>#1-bit);
        }else {
            VR6L = (VR0H + VR3H)>>#1-bit;
            VR6H = (VR1H - VR2H)>>#1-bit;
            VR7L = (VR0H - VR3H)>>#1-bit;
            VR7H = (VR1H + VR2H)>>#1-bit;
        }
    }

Sign extension is automatically performed for the shift-right operations.

Flags
This instruction modifies the following bits in the VSTATUS register:
• OVFR is set if signed overflow is detected for add/sub calculation in which destination is VRxL
• OVFI is set if signed overflow is detected for add/sub calculation in which destination is VRxH

Pipeline
This is a single-cycle instruction.

Example
_CFFT_run1024Pt:
...
etc ...
...
        MOVL      *-SP[ARG_OFFSET], XAR4
        VSATON

_CFFT_run1024Pt_stages1and2Combined:
        MOVZ      AR0, *+XAR4[NSAMPLES_OFFSET]
        MOVL      XAR2, *+XAR4[INBUFFER_OFFSET]
        MOVL      XAR1, *+XAR4[OUTBUFFER_OFFSET]
        .lp_amode
        SETC      AMODE
        NOP       *,ARP2
        VMOV32    VR0, *BR0++
        VMOV32    VR1, *BR0++
        VCFFT7    VR1, VR0, #1
     || VMOV32    VR2, *BR0++
        VMOV32    VR3, *BR0++
        VCFFT8    VR3, VR2, #1
        VCFFT9    VR5, VR4, VR3, VR2, VR1, VR0, #1

        .align    2
        RPTB      _CFFT_run1024Pt_stages1and2CombinedLoop, #S12_LOOP_COUNT
        VCFFT10   VR7, VR6, VR3, VR2, VR1, VR0, #1
     || VMOV32    VR0, *BR0++
        VMOV32    VR1, *BR0++
        VCFFT7    VR1, VR0, #1
     || VMOV32    VR2, *BR0++
        VMOV32    VR3, *BR0++
        VCFFT8    VR3, VR2, #1
     || VMOV32    *XAR1++, VR4
        VMOV32    *XAR1++, VR6
        VCFFT9    VR5, VR4, VR3, VR2, VR1, VR0, #1
     || VMOV32    *XAR1++, VR5
        VMOV32    *++, VR7, ARP2
_CFFT_run1024Pt_stages1and2CombinedLoop:
        VCFFT10   VR7, VR6, VR3, VR2, VR1, VR0, #1
        VMOV32    *XAR1++, VR4
        VMOV32    *XAR1++, VR6
        VMOV32    *XAR1++, VR5
        VMOV32    *XAR1++, VR7
_CFFT_run1024Pt_stages1and2CombinedEnd:
        .c28_amode
        CLRC      AMODE

_CFFT_run1024Pt_stages3and4Combined:
...
etc ...
...
        VSETSHR   #15
        VRNDON
        MOVL      XAR2, *+XAR4[S34_INPUT_OFFSET]
        MOVL      XAR1, #S34_INSEP
        MOVL      XAR0, #S34_OUTSEP
        MOVL      XAR6, *+XAR4[S34_OUTPUT_OFFSET]
        MOVL      XAR7, XAR6
        ADDB      XAR7, #S34_GROUPSEP
        MOVL      XAR3, #_vcu2_twiddleFactors
        MOVL      *-SP[TFPTR_OFFSET], XAR3
        MOVL      XAR4, XAR2
        ADDB      XAR4, #S34_GROUPSEP
        MOVL      XAR5, #S34_OUTER_LOOP_COUNT

_CFFT_run1024Pt_stages3and4OuterLoop:
        MOVL      XAR3, *-SP[TFPTR_OFFSET]

        ; Inner Butterfly Loop
        VMOV32    VR5, *+XAR4[AR1]
        VMOV32    VR6, *+XAR2[AR1]
        VMOV32    VR7, *XAR4++
        VMOV32    VR4, *XAR3++
        VCFFT1    VR2, VR5, VR4
        VMOV32    VR5, *XAR2++
        VCFFT2    VR7, VR6, VR4, VR2, VR1, VR0, #1

        .align    2
        RPTB      _CFFT_run1024Pt_stages3and4InnerLoop, #S34_INNER_LOOP_COUNT
        VMOV32    VR4, *XAR3++
        VCFFT3    VR5, VR4, VR3, VR2, VR0, #1
     || VMOV32    VR5, *+XAR4[AR1]
        VMOV32    VR6, *+XAR2[AR1]
        VCFFT4    VR4, VR2, VR1, VR0, #1
     || VMOV32    VR7, *XAR4++
        VMOV32    VR4, *XAR3++
        VMOV32    *XAR6++, VR0
        VCFFT5    VR5, VR4, VR3, VR2, VR1, VR0, #1
     || VMOV32    *XAR7++, VR1
        VMOV32    VR5, *XAR2++
        VMOV32    *+XAR6[AR0], VR0
        VCFFT2    VR7, VR6, VR4, VR2, VR1, VR0, #1
     || VMOV32    *+XAR7[AR0], VR1
_CFFT_run1024Pt_stages3and4InnerLoop:
        VMOV32    VR4, *XAR3++
        VCFFT3    VR5, VR4, VR3, VR2, VR0, #1
        NOP
        VCFFT4    VR4, VR2, VR1, VR0, #1
        NOP
        VMOV32    *XAR6++, VR0
        VCFFT6    VR3, VR2, VR1, VR0, #1
     || VMOV32    *XAR7++, VR1
        NOP
        VMOV32    *+XAR6[AR0], VR0
        VMOV32    *+XAR7[AR0], VR1
        ADDB      XAR2, #S34_POST_INCREMENT
        ADDB      XAR4, #S34_POST_INCREMENT
        ADDB      XAR6, #S34_POST_INCREMENT
        ADDB      XAR7, #S34_POST_INCREMENT
        BANZ      _CFFT_run1024Pt_stages3and4OuterLoop, AR5--
_CFFT_run1024Pt_stages3and4CombinedEnd:

See also
The entire FFT implementation, with accompanying code comments, can be found in the VCU Library in controlSUITE.

VCFFT10 VR7, VR6, VR3, VR2, VR1, VR0, #1-bit || VMOV32 VR0, mem32
Complex FFT calculation instruction with Parallel Load

Operands
This operation assumes the following complex packing order for complex operands:
    VRa[31:16] = Imaginary Part
    VRa[15:0]  = Real Part
It ignores the VSTATUS[CPACK] bit.

    VR0      Complex Input
    VR1      Complex Input
    VR2      Complex Input
    VR3      Complex Input
    VR6      Complex Output
    VR7      Complex Output
    #1-bit   1-bit immediate value
    mem32    Pointer to 32-bit memory location

Opcode
    LSW: 1110 0010 1011 0000
    MSW: 0000 100I mem32

Description
This operation is used in the butterfly operation of the FFT:

    If(VSTATUS[SAT] == 1){
        If(VSTATUS[RND] == 1){
            VR6L = rnd(sat(VR0H + VR3H)>>#1-bit);
            VR6H = rnd(sat(VR1H - VR2H)>>#1-bit);
            VR7L = rnd(sat(VR0H - VR3H)>>#1-bit);
            VR7H = rnd(sat(VR1H + VR2H)>>#1-bit);
        }else {
            VR6L = sat(VR0H + VR3H)>>#1-bit;
            VR6H = sat(VR1H - VR2H)>>#1-bit;
            VR7L = sat(VR0H - VR3H)>>#1-bit;
            VR7H = sat(VR1H + VR2H)>>#1-bit;
        }
    }else { //VSTATUS[SAT] = 0
        If(VSTATUS[RND] == 1){
            VR6L = rnd((VR0H + VR3H)>>#1-bit);
            VR6H = rnd((VR1H - VR2H)>>#1-bit);
            VR7L = rnd((VR0H - VR3H)>>#1-bit);
            VR7H = rnd((VR1H + VR2H)>>#1-bit);
        }else {
            VR6L = (VR0H + VR3H)>>#1-bit;
            VR6H = (VR1H - VR2H)>>#1-bit;
            VR7L = (VR0H - VR3H)>>#1-bit;
            VR7H = (VR1H + VR2H)>>#1-bit;
        }
    }
    VR0 = [mem32];

Sign extension is automatically performed for the shift-right operations.

Flags
This instruction modifies the following bits in the VSTATUS register:
• OVFR is set if signed overflow is detected for add/sub calculation in which
destination is VRxL
• OVFI is set if signed overflow is detected for add/sub calculation in which destination is VRxH

Pipeline
This is a 1/1-cycle instruction. The VCFFT and VMOV operations are completed in one cycle.

Example
See the example for VCFFT10 VR7, VR6, VR3, VR2, VR1, VR0, #1-bit

2.5.8 Galois Instructions
The instructions are listed alphabetically, preceded by a summary.

Table 2-17. Galois Field Instructions
    VGFACC VRa, VRb, #4-bit: Galois Field Instruction
    VGFACC VRa, VRb, VR7: Galois Field Instruction
    VGFACC VRa, VRb, VR7 || VMOV32 VRc, mem32: Galois Field Instruction with Parallel Load
    VGFADD4 VRa, VRb, VRc, #4-bit: Galois Field Four Parallel Byte X Byte Add
    VGFINIT mem16: Initialize Galois Field Polynomial and Order
    VGFMAC4 VRa, VRb, VRc: Galois Field Four Parallel Byte X Byte Multiply and Accumulate
    VGFMPY4 VRa, VRb, VRc: Galois Field Four Parallel Byte X Byte Multiply
    VGFMPY4 VRa, VRb, VRc || VMOV32 VR0, mem32: Galois Field Four Parallel Byte X Byte Multiply with Parallel Load
    VGFMAC4 VRa, VRb, VRc || PACK4 VR0, mem32, #2-bit: Galois Field Four Parallel Byte X Byte Multiply and Accumulate with Parallel Byte Packing
    VPACK4 VRa, mem32, #2-bit: Byte Packing
    VREVB VRa: Byte Reversal
    VSHLMB VRa, VRb: Shift Left and Merge Right Bytes

VGFACC VRa, VRb, #4-bit
Galois Field Instruction

Operands
    VRb      General purpose register: VR0, VR1, ..., VR7. Cannot be VR8
    VRa      General purpose register: VR0, VR1, ..., VR7.
             Cannot be VR8
    #4-bit   4-bit Immediate Value

Opcode
    LSW: 1110 0110 1000 0001
    MSW: 0000 00aa abbb IIII

Description
Performs the following sequence of operations:

    If (I[0:0] == 1 ) VRa[7:0] = VRa[7:0] ^ VRb[7:0]
    If (I[1:1] == 1 ) VRa[7:0] = VRa[7:0] ^ VRb[15:8]
    If (I[2:2] == 1 ) VRa[7:0] = VRa[7:0] ^ VRb[23:16]
    If (I[3:3] == 1 ) VRa[7:0] = VRa[7:0] ^ VRb[31:24]

Flags
This instruction does not affect any flags in the VSTATUS register.

Pipeline
This is a single-cycle instruction.

Example
See the Reed-Solomon algorithm implementation in the VCU library in controlSUITE

See also
VGFACC VRa, VRb, VR7
VGFACC VRa, VRb, VR7 || VMOV32 VRc, mem32

VGFACC VRa, VRb, VR7
Galois Field Instruction

Operands
    VRb      General purpose register: VR0, VR1, ..., VR7. Cannot be VR8
    VRa      General purpose register: VR0, VR1, ..., VR7.
             Cannot be VR8
    VR7      General purpose register: VR7

Opcode
    LSW: 1110 0110 1000 0001
    MSW: 0000 0100 00aa abbb

Description
Performs the following sequence of operations:

    If (VR7[0:0] == 1 ) VRa[7:0] = VRa[7:0] ^ VRb[7:0]
    If (VR7[1:1] == 1 ) VRa[7:0] = VRa[7:0] ^ VRb[15:8]
    If (VR7[2:2] == 1 ) VRa[7:0] = VRa[7:0] ^ VRb[23:16]
    If (VR7[3:3] == 1 ) VRa[7:0] = VRa[7:0] ^ VRb[31:24]

Flags
This instruction does not affect any flags in the VSTATUS register.

Pipeline
This is a single-cycle instruction.

Example
See the Reed-Solomon algorithm implementation in the VCU library in controlSUITE

See also
VGFACC VRa, VRb, #4-bit
VGFACC VRa, VRb, VR7 || VMOV32 VRc, mem32

VGFACC VRa, VRb, VR7 || VMOV32 VRc, mem32
Galois Field Instruction with Parallel Load

Operands
    VRb      General purpose register: VR0, VR1, ..., VR7. Cannot be VR8
    VRa      General purpose register: VR0, VR1, ..., VR7. Cannot be VR8
    VRc      General purpose register: VR0, VR1, ..., VR7. Cannot be VR8
    VR7      General purpose register: VR7
    mem32    Pointer to a 32-bit memory location

Opcode
    LSW: 1110 0010 1011 011a
    MSW: aabb bccc mem32

Description
Performs the following sequence of operations:

    If (VR7[0:0] == 1 ) VRa[7:0] = VRa[7:0] ^ VRb[7:0]
    If (VR7[1:1] == 1 ) VRa[7:0] = VRa[7:0] ^ VRb[15:8]
    If (VR7[2:2] == 1 ) VRa[7:0] = VRa[7:0] ^ VRb[23:16]
    If (VR7[3:3] == 1 ) VRa[7:0] = VRa[7:0] ^ VRb[31:24]
    VRc = [mem32]

Flags
This instruction does not affect any flags in the VSTATUS register.

Pipeline
This is a 1/1-cycle instruction. Both the VGFACC and VMOV32 operations complete in a single cycle.
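The conditional XOR sequence in the VGFACC descriptions above can be checked on a host with a small C model. This is a minimal sketch, not TI code: the function name `vgfacc_model` is hypothetical, and `sel4` stands for either the 4-bit immediate or VR7[3:0], depending on the instruction form.

```c
#include <stdint.h>

/* Model of the VGFACC accumulate step: for each set bit k of the 4-bit
 * selector, byte k of vrb is XORed into the low byte of vra.
 * Bits 31:8 of vra are returned unchanged, since the Description only
 * assigns VRa[7:0]. */
static uint32_t vgfacc_model(uint32_t vra, uint32_t vrb, unsigned sel4)
{
    uint8_t acc = (uint8_t)(vra & 0xFFu);

    for (unsigned k = 0; k < 4; ++k) {
        if ((sel4 >> k) & 1u) {
            acc ^= (uint8_t)(vrb >> (8u * k));
        }
    }
    return (vra & 0xFFFFFF00u) | acc;
}
```

For instance, with vra = 0, vrb = 0x04030201 and all four selector bits set, the low byte becomes 0x01 ^ 0x02 ^ 0x03 ^ 0x04 = 0x04. Since addition in a binary Galois field is XOR, the step amounts to summing the selected bytes of VRb into one accumulator byte.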
Example
See the Reed-Solomon algorithm implementation in the VCU library in controlSUITE

See also
VGFACC VRa, VRb, #4-bit
VGFACC VRa, VRb, VR7

VGFADD4 VRa, VRb, VRc, #4-bit
Galois Field Four Parallel Byte X Byte Add

Operands
    VRb      General purpose register: VR0, VR1, ..., VR7. Cannot be VR8
    VRa      General purpose register: VR0, VR1, ..., VR7. Cannot be VR8
    VRc      General purpose register: VR0, VR1, ..., VR7. Cannot be VR8
    #4-bit   4-bit Immediate Value

Opcode
    LSW: 1110 0110 1000 0000
    MSW: 000a aabb bccc IIII

Description
Performs the following sequence of operations:

    If (I[0:0] == 1 ) VRa[7:0]   = VRb[7:0]   ^ VRc[7:0]
    else              VRa[7:0]   = VRb[7:0]
    If (I[1:1] == 1 ) VRa[15:8]  = VRb[15:8]  ^ VRc[15:8]
    else              VRa[15:8]  = VRb[15:8]
    If (I[2:2] == 1 ) VRa[23:16] = VRb[23:16] ^ VRc[23:16]
    else              VRa[23:16] = VRb[23:16]
    If (I[3:3] == 1 ) VRa[31:24] = VRb[31:24] ^ VRc[31:24]
    else              VRa[31:24] = VRb[31:24]

Flags
This instruction does not affect any flags in the VSTATUS register.

Pipeline
This is a single-cycle instruction.

Example
See the Reed-Solomon algorithm implementation in the VCU library in controlSUITE

See also

VGFINIT mem16
Initialize Galois Field Polynomial and Order

Operands
    mem16    Pointer to 16-bit memory location

Opcode
    LSW: 1110 0010 1100 0101
    MSW: 0000 0000 mem16

Description
Initialize GF Polynomial and Order

    VSTATUS[GFPOLY]  = [mem16][7:0]
    VSTATUS[GFORDER] = [mem16][10:8]

Flags
This instruction does not affect any flags in the VSTATUS register.

Pipeline
This is a single-cycle
instruction.

Example
See the Reed-Solomon algorithm implementation in the VCU library in controlSUITE

See also

VGFMAC4 VRa, VRb, VRc
Galois Field Four Parallel Byte X Byte Multiply and Accumulate

Operands
    VRb      General purpose register: VR0, VR1, ..., VR7. Cannot be VR8
    VRa      General purpose register: VR0, VR1, ..., VR7. Cannot be VR8
    VRc      General purpose register: VR0, VR1, ..., VR7. Cannot be VR8

Opcode
    LSW: 1110 0110 1000 0000
    MSW: 0010 001a aabb bccc

Description
Performs the following sequence of operations:

    VRa[7:0]   = (VRa[7:0]   * VRb[7:0])   ^ VRc[7:0]
    VRa[15:8]  = (VRa[15:8]  * VRb[15:8])  ^ VRc[15:8]
    VRa[23:16] = (VRa[23:16] * VRb[23:16]) ^ VRc[23:16]
    VRa[31:24] = (VRa[31:24] * VRb[31:24]) ^ VRc[31:24]

The GF multiply operation is defined by the VSTATUS[GFPOLY] and VSTATUS[GFORDER] bits.

Flags
This instruction does not affect any flags in the VSTATUS register.

Pipeline
This is a single-cycle instruction.

Example
See the Reed-Solomon algorithm implementation in the VCU library in controlSUITE

See also
VGFMPY4 VRa, VRb, VRc || VMOV32 VR0, mem32

VGFMPY4 VRa, VRb, VRc
Galois Field Four Parallel Byte X Byte Multiply

Operands
    VRb      General purpose register: VR0, VR1, ..., VR7. Cannot be VR8
    VRa      General purpose register: VR0, VR1, ..., VR7. Cannot be VR8
    VRc      General purpose register: VR0, VR1, ..., VR7.
             Cannot be VR8

Opcode
    LSW: 1110 0110 1000 0000
    MSW: 0010 000a aabb bccc

Description
Performs the following sequence of operations:

    VRa[7:0]   = VRb[7:0]   * VRc[7:0]
    VRa[15:8]  = VRb[15:8]  * VRc[15:8]
    VRa[23:16] = VRb[23:16] * VRc[23:16]
    VRa[31:24] = VRb[31:24] * VRc[31:24]

The GF multiply operation is defined by the VSTATUS[GFPOLY] and VSTATUS[GFORDER] bits.

Flags
This instruction does not affect any flags in the VSTATUS register.

Pipeline
This is a single-cycle instruction.

Example
See the Reed-Solomon algorithm implementation in the VCU library in controlSUITE

See also
VGFMPY4 VRa, VRb, VRc || VMOV32 VR0, mem32

VGFMPY4 VRa, VRb, VRc || VMOV32 VR0, mem32
Galois Field Four Parallel Byte X Byte Multiply with Parallel Load

Operands
    VRb      General purpose register: VR0, VR1, ..., VR7. Cannot be VR8
    VRa      General purpose register: VR0, VR1, ..., VR7. Cannot be VR8
    VRc      General purpose register: VR0, VR1, ..., VR7. Cannot be VR8
    VR0      General purpose register: VR0
    mem32    Pointer to a 32-bit memory location

Opcode
    LSW: 1110 0010 1011 010a
    MSW: aabb bccc mem32

Description
Performs the following sequence of operations:

    VRa[7:0]   = VRb[7:0]   * VRc[7:0]
    VRa[15:8]  = VRb[15:8]  * VRc[15:8]
    VRa[23:16] = VRb[23:16] * VRc[23:16]
    VRa[31:24] = VRb[31:24] * VRc[31:24]
    VR0 = [mem32]

The GF multiply operation is defined by the VSTATUS[GFPOLY] and VSTATUS[GFORDER] bits.

Flags
This instruction does not affect any flags in the VSTATUS register.

Pipeline
This is a 1/1-cycle instruction. Both the VGFMPY4 and VMOV32 operations complete in a single cycle.
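To make the four-lane byte multiply above concrete, the following C sketch models it for one specific field. It rests on assumptions not taken from this manual: the field is fixed at GF(2^8), the reduction polynomial is passed as its low 8 bits with the x^8 term implied, and the value 0x1D (x^8 + x^4 + x^3 + x^2 + 1) used in the usage note is only a common Reed-Solomon choice. How VSTATUS[GFPOLY] and VSTATUS[GFORDER] encode the polynomial and order on the device is not modeled here. The names `gf256_mul` and `vgfmpy4_model` are hypothetical.

```c
#include <stdint.h>

/* Multiply two bytes in GF(2^8) by shift-and-XOR.
 * poly_lo: low 8 bits of the reduction polynomial (x^8 term implied).
 * Assumption: fixed order 8; other GFORDER settings are not modeled. */
static uint8_t gf256_mul(uint8_t a, uint8_t b, uint8_t poly_lo)
{
    uint8_t p = 0;

    for (int i = 0; i < 8; ++i) {
        if (b & 1u) {
            p ^= a;                    /* add (XOR) the current multiple of a */
        }
        uint8_t carry = a & 0x80u;     /* would the shift overflow past x^7?  */
        a = (uint8_t)(a << 1);
        if (carry) {
            a ^= poly_lo;              /* reduce modulo the field polynomial  */
        }
        b >>= 1;
    }
    return p;
}

/* Four independent byte lanes, lane k = bits [8k+7 : 8k], as in the
 * VGFMPY4 Description: VRa[lane] = VRb[lane] * VRc[lane]. */
static uint32_t vgfmpy4_model(uint32_t vrb, uint32_t vrc, uint8_t poly_lo)
{
    uint32_t vra = 0;

    for (unsigned k = 0; k < 4; ++k) {
        uint8_t x = (uint8_t)(vrb >> (8u * k));
        uint8_t y = (uint8_t)(vrc >> (8u * k));
        vra |= (uint32_t)gf256_mul(x, y, poly_lo) << (8u * k);
    }
    return vra;
}
```

Under these assumptions, `gf256_mul(0x80, 0x02, 0x1D)` yields 0x1D, because x^7 times x overflows to x^8 and is reduced by the polynomial. A multiply-and-accumulate lane in the style of VGFMAC4 would XOR a third operand byte into each product.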
Example
See the Reed-Solomon algorithm implementation in the VCU library in controlSUITE

See also
VGFMPY4 VRa, VRb, VRc

VGFMAC4 VRa, VRb, VRc || PACK4 VR0, mem32, #2-bit
Galois Field Four Parallel Byte X Byte Multiply and Accumulate with Parallel Byte Packing

Operands
    VRb      General purpose register: VR0, VR1, ..., VR7. Cannot be VR8
    VRa      General purpose register: VR0, VR1, ..., VR7. Cannot be VR8
    VRc      General purpose register: VR0, VR1, ..., VR7. Cannot be VR8
    VR0      General purpose register: VR0
    mem32    Pointer to 32-bit memory location
    #2-bit   2-bit Immediate Value

Opcode
    LSW: 1110 0010 1011 1IIa
    MSW: aabb bccc mem32

Description
Performs the following sequence of operations:

    VRa[7:0]   = (VRa[7:0]   * VRb[7:0])   ^ VRc[7:0]
    VRa[15:8]  = (VRa[15:8]  * VRb[15:8])  ^ VRc[15:8]
    VRa[23:16] = (VRa[23:16] * VRb[23:16]) ^ VRc[23:16]
    VRa[31:24] = (VRa[31:24] * VRb[31:24]) ^ VRc[31:24]

    If (I == 0)
        VR0[7:0]   = [mem32][7:0]
        VR0[15:8]  = [mem32][7:0]
        VR0[23:16] = [mem32][7:0]
        VR0[31:24] = [mem32][7:0]
    Else If (I == 1)
        VR0[7:0]   = [mem32][15:8]
        VR0[15:8]  = [mem32][15:8]
        VR0[23:16] = [mem32][15:8]
        VR0[31:24] = [mem32][15:8]
    Else If (I == 2)
        VR0[7:0]   = [mem32][23:16]
        VR0[15:8]  = [mem32][23:16]
        VR0[23:16] = [mem32][23:16]
        VR0[31:24] = [mem32][23:16]
    Else If (I == 3)
        VR0[7:0]   = [mem32][31:24]
        VR0[15:8]  = [mem32][31:24]
        VR0[23:16] = [mem32][31:24]
        VR0[31:24] = [mem32][31:24]

The GF multiply operation is defined by the VSTATUS[GFPOLY] and VSTATUS[GFORDER] bits.

Flags
This instruction does not affect any flags in the VSTATUS register.

Pipeline
This is a 1/1-cycle instruction. Both the VGFMAC4 and PACK4 operations complete in a single cycle.
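The byte-packing half of the operation above replicates one selected byte of the memory word into all four byte lanes of VR0. A minimal C sketch of just that step follows; the function name `pack4_model` is hypothetical, and masking the selector to two bits is an assumption made so that the model accepts any unsigned argument.

```c
#include <stdint.h>

/* Model of the PACK4 step: select byte i2 (0..3) of mem32 and copy it
 * into every byte lane of the result. */
static uint32_t pack4_model(uint32_t mem32, unsigned i2)
{
    uint32_t byte = (mem32 >> (8u * (i2 & 3u))) & 0xFFu;

    /* Multiplying by 0x01010101 places the same byte at bits 7:0, 15:8,
     * 23:16 and 31:24; the product always fits in 32 bits. */
    return byte * 0x01010101u;
}
```

With mem32 = 0xAABBCCDD, selector 0 gives 0xDDDDDDDD and selector 3 gives 0xAAAAAAAA. A broadcast like this puts one data byte in all four lanes, so a following four-lane multiply can combine it with four different coefficient bytes.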
Example
See the Reed-Solomon algorithm implementation in the VCU library in controlSUITE

See also

VPACK4 VRa, mem32, #2-bit
Byte Packing

Operands
    VRa      General purpose register: VR0, VR1, ..., VR7. Cannot be VR8
    mem32    Pointer to a 32-bit memory location
    #2-bit   2-bit Immediate Value

Opcode
    LSW: 1110 0010 1011 0001
    MSW: 000a aaII mem32

Description
Pack Ith byte from a memory location 4 times in VRa

    If (I == 0)
        VRa[7:0]   = [mem32][7:0]
        VRa[15:8]  = [mem32][7:0]
        VRa[23:16] = [mem32][7:0]
        VRa[31:24] = [mem32][7:0]
    Else If (I == 1)
        VRa[7:0]   = [mem32][15:8]
        VRa[15:8]  = [mem32][15:8]
        VRa[23:16] = [mem32][15:8]
        VRa[31:24] = [mem32][15:8]
    Else If (I == 2)
        VRa[7:0]   = [mem32][23:16]
        VRa[15:8]  = [mem32][23:16]
        VRa[23:16] = [mem32][23:16]
        VRa[31:24] = [mem32][23:16]
    Else If (I == 3)
        VRa[7:0]   = [mem32][31:24]
        VRa[15:8]  = [mem32][31:24]
        VRa[23:16] = [mem32][31:24]
        VRa[31:24] = [mem32][31:24]

The GF multiply operation is defined by the VSTATUS[GFPOLY] and VSTATUS[GFORDER] bits.

Flags
This instruction does not affect any flags in the VSTATUS register.

Pipeline
This is a single-cycle instruction.

Example
See the Reed-Solomon algorithm implementation in the VCU library in controlSUITE

See also

VREVB VRa
Byte Reversal

Operands
    VRa      General purpose register: VR0, VR1, ..., VR7.
             Cannot be VR8

Opcode
    LSW: 1110 0110 1000 0000
    MSW: 0010 0100 0000 0aaa

Description
Reverse Bytes

    Input:  VRa = {B3,B2,B1,B0}
    Output: VRa = {B0,B1,B2,B3}

Flags
This instruction does not affect any flags in the VSTATUS register.

Pipeline
This is a single-cycle instruction.

Example
See the Reed-Solomon algorithm implementation in the VCU library in controlSUITE

See also

VSHLMB VRa, VRb
Shift Left and Merge Right Bytes

Operands
    VRa      General purpose register: VR0, VR1, ..., VR7. Cannot be VR8
    VRb      General purpose register: VR0, VR1, ..., VR7. Cannot be VR8

Opcode
    LSW: 1110 0110 1000 0000
    MSW: 0010 0100 01aa abbb

Description
Shift Left and Merge Bytes

    Input:  VRa = {B7,B6,B5,B4}
    Input:  VRb = {B3,B2,B1,B0}
    Output: VRa = {B6,B5,B4,B3}
    Output: VRb = {B2,B1,B0,8'b0}

Restrictions
VRa != VRb. The source and destination registers must be different.

Flags
This instruction does not affect any flags in the VSTATUS register.

Pipeline
This is a single-cycle instruction.

Example
See the Reed-Solomon algorithm implementation in the VCU library in controlSUITE

See also

2.5.9 Viterbi Instructions
The instructions are listed alphabetically, preceded by a summary.

Table 2-18. Viterbi Instructions
    VITBM2 VR0: Code Rate 1:2 Branch Metric Calculation
    VITBM2 VR0, mem32: Branch Metric Calculation CR=1/2
    VITBM2 VR0 || VMOV32 VR2, mem32: Code Rate 1:2 Branch Metric Calculation with Parallel Load
    VITBM3 VR0, VR1, VR2: Code Rate 1:3 Branch Metric Calculation
    VITBM3 VR0, VR1, VR2 || VMOV32 VR2, mem32: Code Rate 1:3 Branch Metric Calculation with Parallel Load
    VITBM3 VR0L, VR1L, mem16: Branch Metric Calculation CR=1/3
    VITDHADDSUB VR4, VR3, VR2, VRa: Viterbi Double Add and Subtract, High
    VITDHADDSUB VR4, VR3, VR2, VRa || VMOV32 mem32, VRb: Viterbi Add and Subtract High with Parallel Store
    VITDHSUBADD VR4, VR3, VR2, VRa: Viterbi Add and Subtract Low
    VITDHSUBADD VR4, VR3, VR2, VRa || VMOV32 mem32, VRb: Viterbi Subtract and Add, High with Parallel Store
    VITDLADDSUB VR4, VR3, VR2, VRa: Viterbi Add and Subtract Low
    VITDLADDSUB VR4, VR3, VR2, VRa || VMOV32 mem32, VRb: Viterbi Add and Subtract Low with Parallel Load
    VITDLSUBADD VR4, VR3, VR2, VRa: Viterbi Subtract and Add Low
    VITDLSUBADD VR4, VR3, VR2, VRa || VMOV32 mem32, VRb: Viterbi Subtract and Add, Low with Parallel Store
    VITHSEL VRa, VRb, VR4, VR3: Viterbi Select High
    VITHSEL VRa, VRb, VR4, VR3 || VMOV32 VR2, mem32: Viterbi Select High with Parallel Load
    VITLSEL VRa, VRb, VR4, VR3: Viterbi Select, Low Word
    VITLSEL VRa, VRb, VR4, VR3 || VMOV32 VR2, mem32: Viterbi Select Low with Parallel Load
VITSTAGE - Parallel Butterfly Computation
VITSTAGE || VITBM2 VR0, mem32 - Parallel Butterfly Computation with Parallel Branch Metric Calculation CR=1/2
VITSTAGE || VMOV16 VR0L, mem16 - Parallel Butterfly Computation with Parallel Load
VMOV32 VSM(k+1):VSM(k), mem32 - Load Consecutive State Metrics
VMOV32 mem32, VSM(k+1):VSM(k) - Store Consecutive State Metrics
VSETK #3-bit - Set Constraint Length for Viterbi Operation
VSMINIT mem16 - State Metrics Register Initialization
VTCLEAR - Clear Transition Bit Registers
VTRACE mem32, VR0, VT0, VT1 - Viterbi Traceback, Store to Memory
VTRACE VR1, VR0, VT0, VT1 - Viterbi Traceback, Store to Register
VTRACE VR1, VR0, VT0, VT1 || VMOV32 VT0, mem32 - Trace-back with Parallel Load

VITBM2 VR0    Code Rate 1:2 Branch Metric Calculation

Operands
  Before the operation, the inputs are loaded into the registers as shown below. Each operand for the branch metric calculation is 16 bits.
  Input Register   Value
  VR0L             16-bit decoder input 0
  VR0H             16-bit decoder input 1

  The result of the operation is also stored in VR0 as shown below:

  Output Register  Value
  VR0L             16-bit branch metric 0 = VR0L + VR0H
  VR0H             16-bit branch metric 1 = VR0L - VR0H

Opcode
  LSW: 1110 0101 0000 1100

Description
  Branch metric calculation for code rate = 1/2.

    //
    // SAT is VSTATUS[SAT]
    // VR0L is decoder input 0
    // VR0H is decoder input 1
    //
    // Calculate the branch metrics by performing 16-bit signed
    // addition and subtraction. The right-hand sides refer to the
    // input values of VR0L and VR0H.
    //
    VR0L = VR0L + VR0H;      // VR0L = branch metric 0
    VR0H = VR0L - VR0H;      // VR0H = branch metric 1
    if (SAT == 1)
    {
        sat16(VR0L);
        sat16(VR0H);
    }

Flags
  This instruction sets the real overflow flag, VSTATUS[OVFR], in the event of an overflow or underflow.

Pipeline
  This is a single-cycle instruction.

See also
  VITBM2 VR0 || VMOV32 VR2, mem32
  VITBM3 VR0, VR1, VR2

VITBM2 VR0, mem32    Branch Metric Calculation CR=1/2

Operands
  VR0      Destination register for the two 16-bit branch metrics
  mem32    Pointer to a 32-bit memory location holding the two 16-bit decoder inputs

Opcode
  LSW: 1110 0010 1000 0000
  MSW: 0000 0001 mem16

Description
  Calculates two Branch-Metrics (BMs) for CR = 1/2

    If(VSTATUS[SAT] == 1){
        VR0L = sat([mem32][15:0] + [mem32][31:16]);
        VR0H = sat([mem32][15:0] - [mem32][31:16]);
    }else {
        VR0L = [mem32][15:0] + [mem32][31:16];
        VR0H = [mem32][15:0] - [mem32][31:16];
    }

Flags
  This instruction modifies the following bits in the VSTATUS register:
  • OVFR is set if overflow is detected in the computation of a 16-bit signed result

Pipeline
  This is a single-cycle instruction.

Example
    ;
    ; Viterbi K=4 CR = 1/2
    ;
    ;etc ...
    ;
    VSETK   #CONSTRAINT_LENGTH              ; Set constraint length
    MOV     AR1, #SMETRICINIT_OFFSET
    VSMINIT *+XAR4[AR1]                     ; Initialize the state metrics
    MOV     AR1, #NBITS_OFFSET
    MOV     AL, *+XAR4[AR1]
    LSR     AL, 2
    SUBB    AL, #2
    MOV     AR3, AL                         ; Initialize the BMSEL register
                                            ; for butterfly 0 to K-1
    MOVL    XAR6, *+XAR4[BMSELINIT_OFFSET]
    VMOV32  VR2, *XAR6                      ; Initialize BMSEL for
                                            ; butterfly 0 to 7
    VITBM2  VR0, *XAR0++                    ; Calculate and store BMs in
                                            ; VR0L and VR0H
    ;
    ;etc ...

See also
  VITBM2 VR0
  VITBM2 VR0 || VMOV32 VR2, mem32
  VITSTAGE || VITBM2 VR0, mem32

VITBM2 VR0 || VMOV32 VR2, mem32    Code Rate 1:2 Branch Metric Calculation with Parallel Load

Operands
  Before the operation, the inputs are loaded into the registers as shown below. Each operand for the branch metric calculation is 16 bits.

  Input Register   Value
  VR0L             16-bit decoder input 0
  VR0H             16-bit decoder input 1
  [mem32]          Pointer to a 32-bit memory location

  The result of the operation is stored in VR0 as shown below:

  Output Register  Value
  VR0L             16-bit branch metric 0 = VR0L + VR0H
  VR0H             16-bit branch metric 1 = VR0L - VR0H
  VR2              Contents of the memory pointed to by [mem32]

Opcode
  LSW: 1110 0011 1111 1100
  MSW: 0000 0000 mem32

Description
  Branch metric calculation for a code rate of 1/2 with parallel register load.

    //
    // SAT is VSTATUS[SAT]
    // VR0L is decoder input 0
    // VR0H is decoder input 1
    //
    // Calculate the branch metrics by performing 16-bit signed
    // addition and subtraction
    //
    VR0L = VR0L + VR0H;      // VR0L = branch metric 0
    VR0H = VR0L - VR0H;      // VR0H = branch metric 1
    if (SAT == 1)
    {
        sat16(VR0L);
        sat16(VR0H);
    }
    VR2 = [mem32]            // Load VR2L and VR2H with the next state metrics

Flags
  This instruction sets the real overflow flag, VSTATUS[OVFR], in the event of an overflow or underflow.
Pipeline
  Both operations complete in a single cycle.

See also
  VITBM2 VR0
  VITBM3 VR0, VR1, VR2
  VITBM3 VR0, VR1, VR2 || VMOV32 VR2, mem32

VITBM3 VR0, VR1, VR2    Code Rate 1:3 Branch Metric Calculation

Operands
  Before the operation, the inputs are loaded into the registers as shown below. Each operand for the branch metric calculation is 16 bits.

  Input Register   Value
  VR0L             16-bit decoder input 0
  VR1L             16-bit decoder input 1
  VR2L             16-bit decoder input 2

  The result of the operation is stored in VR0 and VR1 as shown below:

  Output Register  Value
  VR0L             16-bit branch metric 0 = VR0L + VR1L + VR2L
  VR0H             16-bit branch metric 1 = VR0L + VR1L - VR2L
  VR1L             16-bit branch metric 2 = VR0L - VR1L + VR2L
  VR1H             16-bit branch metric 3 = VR0L - VR1L - VR2L

Opcode
  LSW: 1110 0101 0000 1101

Description
  Calculate the four branch metrics for a code rate of 1/3.

    //
    // SAT is VSTATUS[SAT]
    // VR0L is decoder input 0
    // VR1L is decoder input 1
    // VR2L is decoder input 2
    //
    // Calculate the branch metrics by performing 16-bit signed
    // addition and subtraction
    //
    VR0L = VR0L + VR1L + VR2L;    // VR0L = branch metric 0
    VR0H = VR0L + VR1L - VR2L;    // VR0H = branch metric 1
    VR1L = VR0L - VR1L + VR2L;    // VR1L = branch metric 2
    VR1H = VR0L - VR1L - VR2L;    // VR1H = branch metric 3
    if(SAT == 1)
    {
        sat16(VR0L);
        sat16(VR0H);
        sat16(VR1L);
        sat16(VR1H);
    }

Flags
  This instruction sets the real overflow flag, VSTATUS[OVFR], in the event of an overflow or underflow.

Pipeline
  This is a 2p-cycle instruction. The instruction following VITBM3 must not use VR0 or VR1.

Example
  Refer to the example for VITDHADDSUB VR4, VR3, VR2, VRa.
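The code rate 1/3 branch metric arithmetic can also be sketched as a small C model. This is an illustrative sketch only, not TI code: the type bm3_regs_t and the functions narrow16 and vitbm3_model are hypothetical names. The model assumes that all four metrics are computed from the register values held before the instruction executes, that results wrap to 16 bits when SAT is 0, and that OVFR stays set once it has been raised.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical software view of the register halves written by VITBM3. */
typedef struct {
    int16_t vr0l, vr0h;   /* VR0 low and high words             */
    int16_t vr1l, vr1h;   /* VR1 low and high words             */
    bool    ovfr;         /* stands in for VSTATUS[OVFR]        */
} bm3_regs_t;

/* Reduce a 32-bit intermediate to 16 bits. Out-of-range values raise *ovfr.
 * With sat set the value is clamped; otherwise it wraps (an assumption). */
static int16_t narrow16(int32_t v, bool sat, bool *ovfr)
{
    if (v > INT16_MAX || v < INT16_MIN) {
        *ovfr = true;
        if (sat) {
            return (v > INT16_MAX) ? INT16_MAX : INT16_MIN;
        }
    }
    uint16_t u = (uint16_t)v;   /* keep the low 16 bits */
    return (u & 0x8000u) ? (int16_t)((int32_t)u - 65536) : (int16_t)u;
}

/* Model of the four branch metrics; vr2l is decoder input 2. */
void vitbm3_model(bm3_regs_t *r, int16_t vr2l, bool sat)
{
    assert(r != NULL);
    /* Latch the inputs first so every metric uses the original values. */
    int32_t in0 = r->vr0l, in1 = r->vr1l, in2 = vr2l;

    r->vr0l = narrow16(in0 + in1 + in2, sat, &r->ovfr);  /* branch metric 0 */
    r->vr0h = narrow16(in0 + in1 - in2, sat, &r->ovfr);  /* branch metric 1 */
    r->vr1l = narrow16(in0 - in1 + in2, sat, &r->ovfr);  /* branch metric 2 */
    r->vr1h = narrow16(in0 - in1 - in2, sat, &r->ovfr);  /* branch metric 3 */
}
```

Latching the three inputs before any result is written avoids the ordering problem that a literal, line-by-line reading of the pseudocode would introduce, where VR0L is overwritten before the remaining metrics are formed.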
See also
  VITBM2 VR0
  VITBM3 VR0, VR1, VR2 || VMOV32 VR2, mem32
  VITBM2 VR0 || VMOV32 VR2, mem32

VITBM3 VR0, VR1, VR2 || VMOV32 VR2, mem32    Code Rate 1:3 Branch Metric Calculation with Parallel Load

Operands
  Before the operation, the inputs are loaded into the registers as shown below. Each operand for the branch metric calculation is 16 bits.

  Input Register   Value
  VR0L             16-bit decoder input 0
  VR1L             16-bit decoder input 1
  VR2L             16-bit decoder input 2
  [mem32]          Pointer to a 32-bit memory location

  The result of the operation is stored in VR0, VR1 and VR2 as shown below:

  Output Register  Value
  VR0L             16-bit branch metric 0 = VR0L + VR1L + VR2L
  VR0H             16-bit branch metric 1 = VR0L + VR1L - VR2L
  VR1L             16-bit branch metric 2 = VR0L - VR1L + VR2L
  VR1H             16-bit branch metric 3 = VR0L - VR1L - VR2L
  VR2              Contents of the memory pointed to by [mem32]

Opcode
  LSW: 1110 0011 1111 1101
  MSW: 0000 0000 mem32

Description
  Calculate the four branch metrics for a code rate of 1/3 with parallel register load.

    //
    // SAT is VSTATUS[SAT]
    // VR0L is decoder input 0
    // VR1L is decoder input 1
    // VR2L is decoder input 2
    //
    // Calculate the branch metrics by performing 16-bit signed
    // addition and subtraction
    //
    VR0L = VR0L + VR1L + VR2L;    // VR0L = branch metric 0
    VR0H = VR0L + VR1L - VR2L;    // VR0H = branch metric 1
    VR1L = VR0L - VR1L + VR2L;    // VR1L = branch metric 2
    VR1H = VR0L - VR1L - VR2L;    // VR1H = branch metric 3
    if(SAT == 1)
    {
        sat16(VR0L);
        sat16(VR0H);
        sat16(VR1L);
        sat16(VR1H);
    }
    VR2 = [mem32];

Flags
  This instruction sets the real overflow flag, VSTATUS[OVFR], in the event of an overflow or underflow.

Pipeline
  This is a 2p/1-cycle instruction. The VITBM3 operation takes 2p cycles and the VMOV32 completes in a single cycle.
  The next instruction must not use VR0 or VR1.

Example
  Refer to the example for VITDHADDSUB VR4, VR3, VR2, VRa.

See also
  VITBM2 VR0
  VITBM2 VR0 || VMOV32 VR2, mem32

VITBM3 VR0L, VR1L, mem16    Branch Metric Calculation CR=1/3

Operands
  Input/Output
  VR0L     Low word of the general purpose register VR0
  VR1L     Low word of the general purpose register VR1
  mem16    Pointer to 16-bit memory location

Opcode
  LSW: 1110 0010 1100 0101
  MSW: 0000 0010 mem16

Description
  Calculates four Branch-Metrics (BMs) for CR = 1/3

    If(VSTATUS[SAT] == 1){
        VR0L = sat(VR0L + VR1L + [mem16]);
        VR0H = sat(VR0L + VR1L - [mem16]);
        VR1L = sat(VR0L - VR1L + [mem16]);
        VR1H = sat(VR0L - VR1L - [mem16]);
    }else {
        VR0L = VR0L + VR1L + [mem16];
        VR0H = VR0L + VR1L - [mem16];
        VR1L = VR0L - VR1L + [mem16];
        VR1H = VR0L - VR1L - [mem16];
    }

Flags
  This instruction modifies the following bits in the VSTATUS register.
  • OVFR is set if overflow is detected in the computation of a 16-bit signed result

Pipeline
  This is a single-cycle instruction.

Example
  See the example for VITSTAGE || VMOV16 VR0L, mem16

See also
  VITBM3 VR0, VR1, VR2
  VITBM3 VR0, VR1, VR2 || VMOV32 VR2, mem32

VITDHADDSUB VR4, VR3, VR2, VRa    Viterbi Double Add and Subtract, High

Operands
  Before the operation, the inputs are loaded into the registers as shown below. This operation uses the branch metric stored in VRaH.

  Input Register   Value
  VR2L             16-bit state metric 0
  VR2H             16-bit state metric 1
  VRaH             Branch metric 1. VRa must be VR0 or VR1.
  The result of the operation is stored in VR3 and VR4 as shown below:

  Output Register  Value
  VR3L             16-bit path metric 0 = VR2L + VRaH
  VR3H             16-bit path metric 1 = VR2H - VRaH
  VR4L             16-bit path metric 2 = VR2L - VRaH
  VR4H             16-bit path metric 3 = VR2H + VRaH

Opcode
  LSW: 1110 0101 0111 aaaa

Description
  Viterbi high add and subtract. This instruction is used to calculate four path metrics.

    //
    // Calculate the four path metrics by performing 16-bit signed
    // addition and subtraction
    //
    // Before this operation VR2L and VR2H are loaded with the state
    // metrics and VRaH with the branch metric.
    //
    VR3L = VR2L + VRaH    // Path metric 0
    VR3H = VR2H - VRaH    // Path metric 1
    VR4L = VR2L - VRaH    // Path metric 2
    VR4H = VR2H + VRaH    // Path metric 3

Flags
  This instruction does not modify any flags in the VSTATUS register.

Pipeline
  This is a single-cycle instruction.

Example
    ;
    ; Example Viterbi decoder code fragment
    ; Viterbi butterfly calculations
    ; Loop once for each decoder input pair
    ; Branch metrics = BM0 and BM1
    ; XAR5 points to the input stream to the decoder
    ;
        ...
        ...
    _loop:
        VMOV32 VR0, *XAR5++              ; Load two inputs into VR0L, VR0H
        VITBM2 VR0                       ; VR0L = BM0 VR0H = BM1
     || VMOV32 VR2, *XAR1++              ; Load previous state metrics
    ;
    ; 2 cycle Viterbi butterfly
    ;
        VITDLADDSUB VR4,VR3,VR2,VR0      ; Perform add/sub
        VITLSEL VR6,VR5,VR4,VR3          ; Perform compare/select
     || VMOV32 VR2, *XAR1++              ; Load previous state metrics
    ;
    ; 2 cycle Viterbi butterfly, next stage
    ;
        VITDHADDSUB VR4,VR3,VR2,VR0
        VITHSEL VR6,VR5,VR4,VR3
     || VMOV32 VR2, *XAR1++
    ;
    ; 2 cycle Viterbi butterfly, next stage
    ;
        VITDLADDSUB VR4,VR3,VR2,VR0
     || VMOV32 *XAR2++, VR5
        ...
        ...
See also
  VITDHSUBADD VR4, VR3, VR2, VRa
  VITDLADDSUB VR4, VR3, VR2, VRa
  VITDLSUBADD VR4, VR3, VR2, VRa

VITDHADDSUB VR4, VR3, VR2, VRa || VMOV32 mem32, VRb    Viterbi Add and Subtract High with Parallel Store

Operands
  Before the operation, the inputs are loaded into the registers as shown below. This operation uses the branch metric stored in VRaH.

  Input Register   Value
  VR2L             16-bit state metric 0
  VR2H             16-bit state metric 1
  VRaH             Branch metric 1. VRa must be VR0 or VR1.
  VRb              Value to be stored. VRb can be VR5, VR6, VR7 or VR8.

  The result of the operation is stored in VR3 and VR4 as shown below:

  Output Register  Value
  VR3L             16-bit path metric 0 = VR2L + VRaH
  VR3H             16-bit path metric 1 = VR2H - VRaH
  VR4L             16-bit path metric 2 = VR2L - VRaH
  VR4H             16-bit path metric 3 = VR2H + VRaH
  [mem32]          Contents of VRb. VRb can be VR5, VR6, VR7 or VR8.

Opcode
  LSW: 1110 0010 0000 1001
  MSW: bbbb aaaa mem32

Description
  Viterbi high add and subtract. This instruction is used to calculate four path metrics. In parallel, the contents of VRb are stored to the memory location pointed to by mem32.

    //
    // Calculate the four path metrics by performing 16-bit signed
    // addition and subtraction
    //
    // Before this operation VR2L and VR2H are loaded with the state
    // metrics and VRaH with the branch metric.
    //
    [mem32] = VRb         // Store VRb to memory
    VR3L = VR2L + VRaH    // Path metric 0
    VR3H = VR2H - VRaH    // Path metric 1
    VR4L = VR2L - VRaH    // Path metric 2
    VR4H = VR2H + VRaH    // Path metric 3

Flags
  This instruction does not modify any flags in the VSTATUS register.

Pipeline
  This is a single-cycle instruction.
See also
  VITDHSUBADD VR4, VR3, VR2, VRa
  VITDLADDSUB VR4, VR3, VR2, VRa
  VITDLSUBADD VR4, VR3, VR2, VRa

VITDHSUBADD VR4, VR3, VR2, VRa    Viterbi Subtract and Add, High

Operands
  Before the operation, the inputs are loaded into the registers as shown below. This operation uses the branch metric stored in VRaH.

  Input Register   Value
  VR2L             16-bit state metric 0
  VR2H             16-bit state metric 1
  VRaH             Branch metric 1. VRa must be VR0 or VR1.

  The result of the operation is four path metrics stored in VR3 and VR4 as shown below:

  Output Register  Value
  VR3L             16-bit path metric 0 = VR2L - VRaH
  VR3H             16-bit path metric 1 = VR2H + VRaH
  VR4L             16-bit path metric 2 = VR2L + VRaH
  VR4H             16-bit path metric 3 = VR2H - VRaH

Opcode
  LSW: 1110 0101 1111 aaaa

Description
  This instruction is used to calculate four path metrics in the Viterbi butterfly. This operation uses the branch metric stored in VRaH.

    //
    // Calculate the four path metrics by performing 16-bit signed
    // addition and subtraction
    //
    // Before this operation VR2L and VR2H are loaded with the state
    // metrics and VRaH with the branch metric.
    //
    VR3L = VR2L - VRaH    // Path metric 0
    VR3H = VR2H + VRaH    // Path metric 1
    VR4L = VR2L + VRaH    // Path metric 2
    VR4H = VR2H - VRaH    // Path metric 3

Flags
  This instruction does not modify any flags in the VSTATUS register.

Pipeline
  This is a single-cycle instruction.

Example
  Refer to the example for VITDHADDSUB VR4, VR3, VR2, VRa.
See also
  VITDHADDSUB VR4, VR3, VR2, VRa
  VITDLADDSUB VR4, VR3, VR2, VRa
  VITDLSUBADD VR4, VR3, VR2, VRa

VITDHSUBADD VR4, VR3, VR2, VRa || VMOV32 mem32, VRb    Viterbi Subtract and Add, High with Parallel Store

Operands
  Before the operation, the inputs are loaded into the registers as shown below. This operation uses the branch metric stored in VRaH.

  Input Register   Value
  VR2L             16-bit state metric 0
  VR2H             16-bit state metric 1
  VRaH             Branch metric 1. VRa must be VR0 or VR1.
  VRb              Contents to be stored. VRb can be VR5, VR6, VR7 or VR8.

  The result of the operation is stored in VR3 and VR4 as shown below:

  Output Register  Value
  VR3L             16-bit path metric 0 = VR2L - VRaH
  VR3H             16-bit path metric 1 = VR2H + VRaH
  VR4L             16-bit path metric 2 = VR2L + VRaH
  VR4H             16-bit path metric 3 = VR2H - VRaH
  [mem32]          Contents of VRb. VRb can be VR5, VR6, VR7 or VR8.

Opcode
  LSW: 1110 0010 0000 1011
  MSW: bbbb aaaa mem32

Description
  Viterbi high subtract and add. This instruction is used to calculate four path metrics.

    //
    // Calculate the four path metrics by performing 16-bit signed
    // addition and subtraction
    //
    // Before this operation VR2L and VR2H are loaded with the state
    // metrics and VRaH with the branch metric.
    //
    [mem32] = VRb         // Store VRb to memory
    VR3L = VR2L - VRaH    // Path metric 0
    VR3H = VR2H + VRaH    // Path metric 1
    VR4L = VR2L + VRaH    // Path metric 2
    VR4H = VR2H - VRaH    // Path metric 3

Flags
  This instruction does not modify any flags in the VSTATUS register.

Pipeline
  This is a single-cycle instruction.
See also
  VITDHADDSUB VR4, VR3, VR2, VRa
  VITDLADDSUB VR4, VR3, VR2, VRa
  VITDLSUBADD VR4, VR3, VR2, VRa

VITDLADDSUB VR4, VR3, VR2, VRa    Viterbi Add and Subtract Low

Operands
  Before the operation, the inputs are loaded into the registers as shown below. This operation uses the branch metric stored in VRaL.

  Input Register   Value
  VR2L             16-bit state metric 0
  VR2H             16-bit state metric 1
  VRaL             Branch metric 0. VRa must be VR0 or VR1.

  The result of the operation is four path metrics stored in VR3 and VR4 as shown below:

  Output Register  Value
  VR3L             16-bit path metric 0 = VR2L + VRaL
  VR3H             16-bit path metric 1 = VR2H - VRaL
  VR4L             16-bit path metric 2 = VR2L - VRaL
  VR4H             16-bit path metric 3 = VR2H + VRaL

Opcode
  LSW: 1110 0101 0011 aaaa

Description
  This instruction is used to calculate four path metrics in the Viterbi butterfly. This operation uses the branch metric stored in VRaL.

    //
    // Calculate the four path metrics by performing 16-bit signed
    // addition and subtraction
    //
    // Before this operation VR2L and VR2H are loaded with the state
    // metrics and VRaL with the branch metric.
    //
    VR3L = VR2L + VRaL    // Path metric 0
    VR3H = VR2H - VRaL    // Path metric 1
    VR4L = VR2L - VRaL    // Path metric 2
    VR4H = VR2H + VRaL    // Path metric 3

Flags
  This instruction does not modify any flags in the VSTATUS register.

Pipeline
  This is a single-cycle instruction.

Example
  Refer to the example for VITDHADDSUB VR4, VR3, VR2, VRa.
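The add/subtract pattern shared by the Viterbi ADDSUB and SUBADD instructions can be sketched in C as follows. This is a minimal model under stated assumptions, not TI code: path_metrics_t, wrap16, vit_addsub_model and vit_subadd_model are hypothetical names, the caller passes whichever branch metric half (VRaL or VRaH) the instruction form uses, and results are assumed to wrap to 16 bits because these instructions are documented as leaving VSTATUS untouched.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical container for the four path metrics held in VR3 and VR4. */
typedef struct {
    int16_t vr3l, vr3h;   /* path metrics 0 and 1 */
    int16_t vr4l, vr4h;   /* path metrics 2 and 3 */
} path_metrics_t;

/* Two's-complement wrap to 16 bits (assumed behavior, see lead-in). */
static int16_t wrap16(int32_t v)
{
    uint16_t u = (uint16_t)v;   /* keep the low 16 bits */
    return (u & 0x8000u) ? (int16_t)((int32_t)u - 65536) : (int16_t)u;
}

/* ADDSUB form: sm0 and sm1 model VR2L and VR2H, bm the branch metric. */
void vit_addsub_model(int16_t sm0, int16_t sm1, int16_t bm, path_metrics_t *pm)
{
    assert(pm != NULL);
    pm->vr3l = wrap16((int32_t)sm0 + bm);   /* path metric 0 */
    pm->vr3h = wrap16((int32_t)sm1 - bm);   /* path metric 1 */
    pm->vr4l = wrap16((int32_t)sm0 - bm);   /* path metric 2 */
    pm->vr4h = wrap16((int32_t)sm1 + bm);   /* path metric 3 */
}

/* SUBADD form: same operands with every sign inverted. */
void vit_subadd_model(int16_t sm0, int16_t sm1, int16_t bm, path_metrics_t *pm)
{
    assert(pm != NULL);
    pm->vr3l = wrap16((int32_t)sm0 - bm);   /* path metric 0 */
    pm->vr3h = wrap16((int32_t)sm1 + bm);   /* path metric 1 */
    pm->vr4l = wrap16((int32_t)sm0 + bm);   /* path metric 2 */
    pm->vr4h = wrap16((int32_t)sm1 - bm);   /* path metric 3 */
}
```

For state metrics 10 and 20 with a branch metric of 3, the ADDSUB model yields path metrics 13, 17, 7 and 23, and the SUBADD model yields 7, 23, 13 and 17.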
See also
  VITDHADDSUB VR4, VR3, VR2, VRa
  VITDHSUBADD VR4, VR3, VR2, VRa
  VITDLSUBADD VR4, VR3, VR2, VRa

VITDLADDSUB VR4, VR3, VR2, VRa || VMOV32 mem32, VRb    Viterbi Add and Subtract Low with Parallel Store

Operands
  Before the operation, the inputs are loaded into the registers as shown below. This operation uses the branch metric stored in VRaL.

  Input Register   Value
  VR2L             16-bit state metric 0
  VR2H             16-bit state metric 1
  VRaL             Branch metric 0. VRa can be VR0 or VR1.
  VRb              Contents to be stored to memory

  The result of the operation is four path metrics stored in VR3 and VR4 as shown below:

  Output Register  Value
  VR3L             16-bit path metric 0 = VR2L + VRaL
  VR3H             16-bit path metric 1 = VR2H - VRaL
  VR4L             16-bit path metric 2 = VR2L - VRaL
  VR4H             16-bit path metric 3 = VR2H + VRaL
  [mem32]          Contents of VRb. VRb can be VR5, VR6, VR7 or VR8.

Opcode
  LSW: 1110 0010 0000 1000
  MSW: bbbb aaaa mem32

Description
  This instruction is used to calculate four path metrics in the Viterbi butterfly. This operation uses the branch metric stored in VRaL.

    //
    // Calculate the four path metrics by performing 16-bit signed
    // addition and subtraction
    //
    // Before this operation VR2L and VR2H are loaded with the state
    // metrics and VRaL with the branch metric.
    //
    [mem32] = VRb         // Store VRb
    VR3L = VR2L + VRaL    // Path metric 0
    VR3H = VR2H - VRaL    // Path metric 1
    VR4L = VR2L - VRaL    // Path metric 2
    VR4H = VR2H + VRaL    // Path metric 3

Flags
  This instruction does not modify any flags in the VSTATUS register.

Pipeline
  This is a single-cycle instruction.

Example
  Refer to the example for VITDHADDSUB VR4, VR3, VR2, VRa.
See also
  VITDHADDSUB VR4, VR3, VR2, VRa
  VITDHSUBADD VR4, VR3, VR2, VRa
  VITDLSUBADD VR4, VR3, VR2, VRa

VITDLSUBADD VR4, VR3, VR2, VRa    Viterbi Subtract and Add Low

Operands
  Before the operation, the inputs are loaded into the registers as shown below. This operation uses the branch metric stored in VRaL.

  Input Register   Value
  VR2L             16-bit state metric 0
  VR2H             16-bit state metric 1
  VRaL             Branch metric 0. VRa must be VR0 or VR1.

  The result of the operation is four path metrics stored in VR3 and VR4 as shown below:

  Output Register  Value
  VR3L             16-bit path metric 0 = VR2L - VRaL
  VR3H             16-bit path metric 1 = VR2H + VRaL
  VR4L             16-bit path metric 2 = VR2L + VRaL
  VR4H             16-bit path metric 3 = VR2H - VRaL

Opcode
  LSW: 1110 0101 1110 aaaa

Description
  This instruction is used to calculate four path metrics in the Viterbi butterfly. This operation uses the branch metric stored in VRaL.

    //
    // Calculate the four path metrics by performing 16-bit signed
    // addition and subtraction
    //
    // Before this operation VR2L and VR2H are loaded with the state
    // metrics and VRaL with the branch metric.
    //
    VR3L = VR2L - VRaL    // Path metric 0
    VR3H = VR2H + VRaL    // Path metric 1
    VR4L = VR2L + VRaL    // Path metric 2
    VR4H = VR2H - VRaL    // Path metric 3

Flags
  This instruction does not modify any flags in the VSTATUS register.

Pipeline
  This is a single-cycle instruction.

Example
  Refer to the example for VITDHADDSUB VR4, VR3, VR2, VRa.
See also
  VITDHADDSUB VR4, VR3, VR2, VRa
  VITDHSUBADD VR4, VR3, VR2, VRa
  VITDLADDSUB VR4, VR3, VR2, VRa

VITDLSUBADD VR4, VR3, VR2, VRa || VMOV32 mem32, VRb    Viterbi Subtract and Add, Low with Parallel Store

Operands
  Before the operation, the inputs are loaded into the registers as shown below. This operation uses the branch metric stored in VRaL.

  Input Register   Value
  VR2L             16-bit state metric 0
  VR2H             16-bit state metric 1
  VRaL             Branch metric 0. VRa must be VR0 or VR1.
  VRb              Value to be stored. VRb can be VR5, VR6, VR7 or VR8.

  The result of the operation is four path metrics stored in VR3 and VR4 as shown below:

  Output Register  Value
  VR3L             16-bit path metric 0 = VR2L - VRaL
  VR3H             16-bit path metric 1 = VR2H + VRaL
  VR4L             16-bit path metric 2 = VR2L + VRaL
  VR4H             16-bit path metric 3 = VR2H - VRaL
  [mem32]          Contents of VRb. VRb can be VR5, VR6, VR7 or VR8.

Opcode
  LSW: 1110 0010 0000 1010
  MSW: bbbb aaaa mem32

Description
  This instruction is used to calculate four path metrics in the Viterbi butterfly. This operation uses the branch metric stored in VRaL.

    //
    // Calculate the four path metrics by performing 16-bit signed
    // addition and subtraction
    //
    // Before this operation VR2L and VR2H are loaded with the state
    // metrics and VRaL with the branch metric.
    //
    [mem32] = VRb         // Store VRb into mem32
    VR3L = VR2L - VRaL    // Path metric 0
    VR3H = VR2H + VRaL    // Path metric 1
    VR4L = VR2L + VRaL    // Path metric 2
    VR4H = VR2H - VRaL    // Path metric 3

Flags
  This instruction does not modify any flags in the VSTATUS register.

Pipeline
  This is a single-cycle instruction.

Example
  Refer to the example for VITDHADDSUB VR4, VR3, VR2, VRa.
See also
  VITDHADDSUB VR4, VR3, VR2, VRa
  VITDHSUBADD VR4, VR3, VR2, VRa
  VITDLADDSUB VR4, VR3, VR2, VRa

VITHSEL VRa, VRb, VR4, VR3    Viterbi Select High

Operands
  Before the operation, the path metrics are loaded into the registers as shown below. Typically this will have been done using a Viterbi AddSub or SubAdd instruction.

  Input Register   Value
  VR3L             16-bit path metric 0
  VR3H             16-bit path metric 1
  VR4L             16-bit path metric 2
  VR4H             16-bit path metric 3

  The result of the operation is the new state metrics stored in VRa and VRb as shown below:

  Output Register  Value
  VRaH             16-bit state metric 0. VRa can be VR6 or VR8.
  VRbH             16-bit state metric 1. VRb can be VR5 or VR7.
  VT0              The transition bit is appended to the end of the register.
  VT1              The transition bit is appended to the end of the register.

Opcode
  LSW: 1110 0110 1111 0111
  MSW: 0000 0000 bbbb aaaa

Description
  This instruction computes the new state metrics of a Viterbi butterfly operation and stores them in the higher 16 bits of the VRa and VRb registers. To instead load the state metrics into the low 16 bits use the VITLSEL instruction.

    VT0 = VT0 << 1          // Shift previous transition bits left
    if (VR3L > VR3H)
    {
        VRbH = VR3L;        // New state metric 0
        VT0[0:0] = 0;       // Store the transition bit
    }
    else
    {
        VRbH = VR3H;        // New state metric 0
        VT0[0:0] = 1;       // Store the transition bit
    }
    VT1 = VT1 << 1          // Shift previous transition bits left
    if (VR4L > VR4H)
    {
        VRaH = VR4L;        // New state metric 1
        VT1[0:0] = 0;       // Store the transition bit
    }
    else
    {
        VRaH = VR4H;        // New state metric 1
        VT1[0:0] = 1;       // Store the transition bit
    }

Flags
  This instruction does not modify any flags in the VSTATUS register.

Pipeline
  This is a single-cycle instruction.

Example
  Refer to the example for VITDHADDSUB VR4, VR3, VR2, VRa.
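The compare/select step and its transition-bit bookkeeping can be sketched in C as follows. This is an illustrative model, not TI code: vit_select_t and vit_select_model are hypothetical names, the transition registers are modeled as 32-bit words (an assumption), and sm_a and sm_b stand for the 16-bit halves of VRa and VRb that the High or Low form of the select instruction writes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical state touched by one compare/select operation. */
typedef struct {
    int16_t  sm_a;   /* metric written to the selected half of VRa */
    int16_t  sm_b;   /* metric written to the selected half of VRb */
    uint32_t vt0;    /* transition history for the VR3 comparison  */
    uint32_t vt1;    /* transition history for the VR4 comparison  */
} vit_select_t;

/* Pick the larger metric of each pair and record which one won. */
void vit_select_model(int16_t vr3l, int16_t vr3h,
                      int16_t vr4l, int16_t vr4h, vit_select_t *s)
{
    assert(s != NULL);

    s->vt0 <<= 1;               /* shift previous bits; bit 0 is now 0 */
    if (vr3l > vr3h) {
        s->sm_b = vr3l;         /* transition bit stays 0 */
    } else {
        s->sm_b = vr3h;
        s->vt0 |= 1u;           /* transition bit = 1 */
    }

    s->vt1 <<= 1;
    if (vr4l > vr4h) {
        s->sm_a = vr4l;
    } else {
        s->sm_a = vr4h;
        s->vt1 |= 1u;
    }
}
```

Because the comparison is a strict greater-than, a tie selects the high-word metric and records a transition bit of 1, which matches the else branch of the pseudocode.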
See also
  VITLSEL VRa, VRb, VR4, VR3

VITHSEL VRa, VRb, VR4, VR3 || VMOV32 VR2, mem32    Viterbi Select High with Parallel Load

Operands
  Before the operation, the path metrics are loaded into the registers as shown below. Typically this will have been done using a Viterbi AddSub or SubAdd instruction.

  Input Register   Value
  VR3L             16-bit path metric 0
  VR3H             16-bit path metric 1
  VR4L             16-bit path metric 2
  VR4H             16-bit path metric 3
  [mem32]          Pointer to a 32-bit memory location

  The result of the operation is the new state metrics stored in VRa and VRb as shown below:

  Output Register  Value
  VRaH             16-bit state metric 0. VRa can be VR6 or VR8.
  VRbH             16-bit state metric 1. VRb can be VR5 or VR7.
  VT0              The transition bit is appended to the end of the register.
  VT1              The transition bit is appended to the end of the register.
  VR2              Contents of the memory pointed to by [mem32].

Opcode
  LSW: 1110 0011 1111 1111
  MSW: bbbb aaaa mem32

Description
  This instruction computes the new state metrics of a Viterbi butterfly operation and stores them in the higher 16 bits of the VRa and VRb registers. To instead load the state metrics into the low 16 bits use the VITLSEL instruction.
    VT0 = VT0 << 1          // Shift previous transition bits left
    if (VR3L > VR3H)
    {
        VRbH = VR3L;        // New state metric 0
        VT0[0:0] = 0;       // Store the transition bit
    }
    else
    {
        VRbH = VR3H;        // New state metric 0
        VT0[0:0] = 1;       // Store the transition bit
    }
    VT1 = VT1 << 1          // Shift previous transition bits left
    if (VR4L > VR4H)
    {
        VRaH = VR4L;        // New state metric 1
        VT1[0:0] = 0;       // Store the transition bit
    }
    else
    {
        VRaH = VR4H;        // New state metric 1
        VT1[0:0] = 1;       // Store the transition bit
    }
    VR2 = [mem32];          // Load VR2

Flags
  This instruction does not modify any flags in the VSTATUS register.

Pipeline
  This is a single-cycle instruction.

Example
  Refer to the example for VITDHADDSUB VR4, VR3, VR2, VRa.

See also
  VITLSEL VRa, VRb, VR4, VR3

VITLSEL VRa, VRb, VR4, VR3    Viterbi Select, Low Word

Operands
  Before the operation, the path metrics are loaded into the registers as shown below. Typically this will have been done using a Viterbi AddSub or SubAdd instruction.

  Input Register   Value
  VR3L             16-bit path metric 0
  VR3H             16-bit path metric 1
  VR4L             16-bit path metric 2
  VR4H             16-bit path metric 3

  The result of the operation is the new state metrics stored in VRa and VRb as shown below:

  Output Register  Value
  VRaL             16-bit state metric 0. VRa can be VR6 or VR8.
  VRbL             16-bit state metric 1. VRb can be VR5 or VR7.
  VT0              The transition bit is appended to the end of the register.
  VT1              The transition bit is appended to the end of the register.

Opcode
  LSW: 1110 0110 1111 0110
  MSW: 0000 0000 bbbb aaaa

Description
  This instruction computes the new state metrics of a Viterbi butterfly operation and stores them in the lower 16 bits of the VRa and VRb registers. To instead load the state metrics into the high 16 bits use the VITHSEL instruction.
    VT0 = VT0 << 1          // Shift previous transition bits left
    if (VR3L > VR3H)
    {
        VRbL = VR3L;        // New state metric 0
        VT0[0:0] = 0;       // Store the transition bit
    }
    else
    {
        VRbL = VR3H;        // New state metric 0
        VT0[0:0] = 1;       // Store the transition bit
    }
    VT1 = VT1 << 1          // Shift previous transition bits left
    if (VR4L > VR4H)
    {
        VRaL = VR4L;        // New state metric 1
        VT1[0:0] = 0;       // Store the transition bit
    }
    else
    {
        VRaL = VR4H;        // New state metric 1
        VT1[0:0] = 1;       // Store the transition bit
    }

Flags
  This instruction does not modify any flags in the VSTATUS register.

Pipeline
  This is a single-cycle instruction.

Example
  Refer to the example for VITDHADDSUB VR4, VR3, VR2, VRa.

See also
  VITHSEL VRa, VRb, VR4, VR3

VITLSEL VRa, VRb, VR4, VR3 || VMOV32 VR2, mem32    Viterbi Select Low with Parallel Load

Operands
  Before the operation, the path metrics are loaded into the registers as shown below. Typically this will have been done using a Viterbi AddSub or SubAdd instruction.

  Input Register   Value
  VR3L             16-bit path metric 0
  VR3H             16-bit path metric 1
  VR4L             16-bit path metric 2
  VR4H             16-bit path metric 3
  mem32            Pointer to 32-bit memory location.

  The result of the operation is the new state metrics stored in VRa and VRb as shown below:

  Output Register  Value
  VRaL             16-bit state metric 0. VRa can be VR6 or VR8.
  VRbL             16-bit state metric 1. VRb can be VR5 or VR7.
  VT0              The transition bit is appended to the end of the register.
  VT1              The transition bit is appended to the end of the register.
  VR2              Contents of 32-bit memory pointed to by mem32.
Opcode
    LSW: 1110 0011 1111 1110
    MSW: bbbb aaaa mem32

Description
    This instruction computes the new state metrics of a Viterbi butterfly operation and stores them in the lower 16 bits of the VRa and VRb registers. To instead load the state metrics into the upper 16 bits, use the VITHSEL instruction. In parallel, the VR2 register is loaded with the contents of memory pointed to by [mem32].

    T0 = T0 << 1                 // Shift previous transition bits left
    if (VR3L > VR3H) {
        VRbL = VR3L;             // New state metric 0
        T0[0:0] = 0;             // Store the transition bit
    } else {
        VRbL = VR3H;             // New state metric 0
        T0[0:0] = 1;             // Store the transition bit
    }
    T1 = T1 << 1                 // Shift previous transition bits left
    if (VR4L > VR4H) {
        VRaL = VR4L;             // New state metric 1
        T1[0:0] = 0;             // Store the transition bit
    } else {
        VRaL = VR4H;             // New state metric 1
        T1[0:0] = 1;             // Store the transition bit
    }
    VR2 = [mem32]                // Load VR2

Flags
    This instruction does not modify any flags in the VSTATUS register.

Pipeline
    This is a single-cycle instruction.

Example
    Refer to the example for VITDHADDSUB VR4, VR3, VR2, VRa.

See also
    VITHSEL VRa, VRb, VR4, VR3


VITSTAGE    Parallel Butterfly Computation

Operands
    None

Opcode
    LSW: 1110 0101 0010 0110

Description
    The VITSTAGE instruction performs 32 Viterbi butterflies in a single cycle.
    This instruction does the following:
    • Depends on the initial 64 state metrics of the current stage stored in registers VSM0 to VSM63
    • Depends on the branch metrics select configuration stored in registers VR2 to VR5
    • Depends on the computed branch metrics of the current stage stored in registers VR0 and VR1
    • Computes the state metrics for the next stage and updates registers VSM0 to VSM63. The 16-bit signed result of the computation is saturated if VSTATUS[SAT] == 1
    • Computes transition bits for all 64 states and updates registers VT0 and VT1

Flags
    This instruction modifies the following bits in the VSTATUS register.
    • OVFR is set if overflow is detected in the computation of a 16-bit signed result

Pipeline
    This is a single-cycle instruction.

Example
    ;
    ; Viterbi K=4 CR = 1/2
    ;
    ;etc ...
    ;
        VSETK    #CONSTRAINT_LENGTH             ; Set constraint length
        MOV      AR1, #SMETRICINIT_OFFSET
        VSMINIT  *+XAR4[AR1]                    ; Initialize the state metrics
        MOV      AR1, #NBITS_OFFSET
        MOV      AL, *+XAR4[AR1]
        LSR      AL, 2
        SUBB     AL, #2
        MOV      AR3, AL                        ; Initialize the BMSEL register
                                                ; for butterfly 0 to K-1
        MOVL     XAR6, *+XAR4[BMSELINIT_OFFSET]
        VMOV32   VR2, *XAR6                     ; Initialize BMSEL for
                                                ; butterfly 0 to 7
        VITBM2   VR0, *XAR0++                   ; Calculate and store BMs in
                                                ; VR0L and VR0H
        .align 2
        RPTB     _VITERBI_runK4CR12_stageAandB, AR3
    _VITERBI_runK4CR12_stageA:
        VITSTAGE                                ; Compute NSTATES/2 butterflies
                                                ; in parallel,
        VITBM2   VR0, *XAR0++                   ; compute branch metrics for
                                                ; next butterfly
        VMOV32   *XAR2++, VT1                   ; Store VT1
        VMOV32   *XAR2++, VT0                   ; Store VT0
    ;
    ;etc ...
    ;

See also
    VITSTAGE || VITBM2 VR0, mem32
    VITSTAGE || VMOV16 VR0L, mem16


VITSTAGE || VITBM2 VR0, mem32    Parallel Butterfly Computation with Parallel Branch Metric Calculation CR=1/2

Operands
    Input/Output
    VR0       Destination register
    mem32     Pointer to 32-bit memory location

Opcode
    LSW: 1110 0010 1000 0000
    MSW: 0000 0010 mem32

Description
    The VITSTAGE instruction performs 32 Viterbi butterflies in a single cycle. This instruction does the following:
    • Depends on the initial 64 state metrics of the current stage stored in registers VSM0 to VSM63
    • Depends on the branch metrics select configuration stored in registers VR2 to VR5
    • Depends on the computed branch metrics of the current stage stored in registers VR0 and VR1
    • Computes the state metrics for the next stage and updates registers VSM0 to VSM63. The 16-bit signed result of the computation is saturated if VSTATUS[SAT] == 1
    • Computes transition bits for all 64 states and updates registers VT0 and VT1

    In parallel, the branch metrics for the next stage are computed:

    VR0L = [mem32][15:0] + [mem32][31:16]
    VR0H = [mem32][15:0] - [mem32][31:16]

Flags
    This instruction modifies the following bits in the VSTATUS register.
    • OVFR is set if overflow is detected in the computation of a 16-bit signed result

Pipeline
    This is a single-cycle instruction.

Example
    ;
    ; Viterbi K=4 CR = 1/2
    ;
    ;etc ...
    ;
        VSETK    #CONSTRAINT_LENGTH             ; Set constraint length
        MOV      AR1, #SMETRICINIT_OFFSET
        VSMINIT  *+XAR4[AR1]                    ; Initialize the state metrics
        MOV      AR1, #NBITS_OFFSET
        MOV      AL, *+XAR4[AR1]
        LSR      AL, 2
        SUBB     AL, #2
        MOV      AR3, AL                        ; Initialize the BMSEL register
                                                ; for butterfly 0 to K-1
        MOVL     XAR6, *+XAR4[BMSELINIT_OFFSET]
        VMOV32   VR2, *XAR6                     ; Initialize BMSEL for
                                                ; butterfly 0 to 7
        VITBM2   VR0, *XAR0++                   ; Calculate and store BMs in
                                                ; VR0L and VR0H
        .align 2
        RPTB     _VITERBI_runK4CR12_stageAandB, AR3
    _VITERBI_runK4CR12_stageA:
        VITSTAGE                                ; Compute NSTATES/2 butterflies
                                                ; in parallel,
      ||VITBM2   VR0, *XAR0++                   ; compute branch metrics for
                                                ; next butterfly
        VMOV32   *XAR2++, VT1                   ; Store VT1
        VMOV32   *XAR2++, VT0                   ; Store VT0
    ;
    ;etc ...
    ;

See also
    VITSTAGE
    VITSTAGE || VMOV16 VR0L, mem16


VITSTAGE || VMOV16 VR0L, mem16    Parallel Butterfly Computation with Parallel Load

Operands
    Input/Output
    VR0L      Low word of the destination register
    mem16     Pointer to 16-bit memory location

Opcode
    LSW: 1110 0010 1100 0101
    MSW: 0000 0011 mem16

Description
    The VITSTAGE instruction performs 32 Viterbi butterflies in a single cycle.
    This instruction does the following:
    • Depends on the initial 64 state metrics of the current stage stored in registers VSM0 to VSM63
    • Depends on the branch metrics select configuration stored in registers VR2 to VR5
    • Depends on the computed branch metrics of the current stage stored in registers VR0 and VR1
    • Computes the state metrics for the next stage and updates registers VSM0 to VSM63. The 16-bit signed result of the computation is saturated if VSTATUS[SAT] == 1
    • Computes transition bits for all 64 states and updates registers VT0 and VT1

    In parallel, VR0L is loaded from memory:

    VR0L = [mem16]

Flags
    This instruction modifies the following bits in the VSTATUS register.
    • OVFR is set if overflow is detected in the computation of a 16-bit signed result

Pipeline
    This is a single-cycle instruction.

Example
    ;
    ; Viterbi K=7 CR = 1/3
    ;
    ;etc ...
    ;
    _VITERBI_runK7CR13_stageA:
        VITSTAGE                            ; Compute NSTATES/2 butterflies
                                            ; in parallel,
      ||VMOV16   VR0L, *XAR0++              ; Load LLR(A) for next butterfly
        VMOV16   VR1L, *XAR0++              ; Load LLR(B) for next butterfly
        VITBM3   VR0L, VR1L, *XAR0++        ; Load LLR(C) and compute branch
                                            ; metric for next butterfly
        VMOV32   *XAR2++, VT1               ; Store VT1
        VMOV32   *XAR2++, VT0               ; Store VT0
    ;
    ;etc ...
    ;

See also
    VITSTAGE
    VITSTAGE || VITBM2 VR0, mem32


VMOV32 VSM(k+1):VSM(k), mem32    Load Consecutive State Metrics

Operands
    Input/Output
    VSM(k+1):VSM(k)    Consecutive State Metric Registers (VSM1:VSM0 ... VSM63:VSM62)
    mem32              Pointer to 32-bit memory location

Opcode
    LSW: 1110 0010 1000 0000
    MSW: 001n nnnn mem32

Description
    Load a pair of consecutive state metrics from memory:

    VSM(k+1) = [mem32][31:16];
    VSM(k)   = [mem32][15:0];

    Note:
    • n = k/2, used in opcode assignment
    • k is always even

Flags
    This instruction does not affect any flags in the VSTATUS register.
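The load above amounts to splitting one 32-bit word into two 16-bit state metrics. The Python sketch below illustrates the unpacking and the opcode field, reading the note as n = k/2. The function name and the list that stands in for VSM0 to VSM63 are hypothetical.

```python
def load_vsm_pair(vsm, k, word32):
    """Model VMOV32 VSM(k+1):VSM(k), mem32 on a 64-entry list of 16-bit values.

    Returns the value assumed for the 5-bit 'n' opcode field (n = k/2).
    """
    if k % 2 != 0 or not 0 <= k <= 62:
        raise ValueError("k must be even and in the range 0..62")
    vsm[k] = word32 & 0xFFFF               # VSM(k)   = [mem32][15:0]
    vsm[k + 1] = (word32 >> 16) & 0xFFFF   # VSM(k+1) = [mem32][31:16]
    return k // 2


vsm = [0] * 64
n_field = load_vsm_pair(vsm, 62, 0x12345678)   # models VSM63:VSM62
```

Because k is always even, k/2 ranges from 0 to 31 and fits in the five n bits of the opcode.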
Pipeline
    This is a single-cycle instruction.

Example
    VMOV32 VSM63:VSM62, *XAR7++

See also
    VMOV32 mem32, VSM(k+1):VSM(k)


VMOV32 mem32, VSM(k+1):VSM(k)    Store Consecutive State Metrics

Operands
    Input/Output
    VSM(k+1):VSM(k)    Consecutive State Metric Registers (VSM1:VSM0 ... VSM63:VSM62)
    mem32              Pointer to 32-bit memory location

Opcode
    LSW: 1110 0010 0000 1110
    MSW: 000n nnnn mem32

Description
    Store a pair of consecutive state metrics to memory:

    [mem32][31:16] = VSM(k+1);
    [mem32][15:0]  = VSM(k);

    Note:
    • n = k/2, used in opcode assignment
    • k is always even

Flags
    This instruction does not affect any flags in the VSTATUS register.

Pipeline
    This is a single-cycle instruction.

Example
    VMOV32 *XAR7++, VSM63:VSM62

See also
    VMOV32 VSM(k+1):VSM(k), mem32


VSETK #3-bit    Set Constraint Length for Viterbi Operation

Operands
    Input/Output
    #3-bit    3-bit immediate value

Opcode
    LSW: 1110 0110 1111 0010
    MSW: 0000 1001 0000 0III

Description
    VSTATUS[K] = #3-bit Immediate

Flags
    This instruction does not affect any flags in the VSTATUS register.

Pipeline
    This is a single-cycle instruction.
VSMINIT mem16    State Metrics Register Initialization

Operands
    Input/Output
    mem16     Pointer to 16-bit memory location

Opcode
    LSW: 1111 0010 1100 0101
    MSW: 0000 0001 mem16

Description
    Initializes the state metric registers.

    VSM0 = 0
    VSM1 to VSM63 = [mem16]

Flags
    This instruction does not affect any flags in the VSTATUS register.

Pipeline
    This is a single-cycle instruction.

Example
    VSMINIT *+XAR4[AR1]    ; Initialize the state metrics


VTCLEAR    Clear Transition Bit Registers

Operands
    None

Opcode
    LSW: 1110 0101 0010 1001

Description
    Clear the VT0 and VT1 registers.

    VT0 = 0;
    VT1 = 0;

Flags
    This instruction does not modify any flags in the VSTATUS register.

Pipeline
    This is a single-cycle instruction.

See also
    VCLEARALL
    VCLEAR VRa


VTRACE mem32, VR0, VT0, VT1    Viterbi Traceback, Store to Memory

Operands
    Before the operation, the inputs are held in the registers as shown below.

    Input Register    Value
    VT0               Transition bit register 0
    VT1               Transition bit register 1
    VR0               Initial value is zero. After the first VTRACE, this contains information from the previous trace-back.
    The result of the operation is the output of the decoder stored in memory:

    Output            Value
    [mem32]           Traceback result from the transition bits.

Opcode
    LSW: 1110 0010 0000 1100
    MSW: 0000 0000 mem32

Description
    Trace-back from the transition bits stored in the VT0 and VT1 registers. Write the result to memory. The transition bits in the VT0 and VT1 registers are stored in the following format by the VITLSEL and VITHSEL instructions:

    VT0[31]    Transition bit [State 0]
    VT0[30]    Transition bit [State 1]
    VT0[29]    Transition bit [State 2]
    ...        ...
    VT0[0]     Transition bit [State 31]
    VT1[31]    Transition bit [State 32]
    VT1[30]    Transition bit [State 33]
    VT1[29]    Transition bit [State 34]
    ...        ...
    VT1[0]     Transition bit [State 63]

    //
    // Calculate the decoder output bit by performing a
    // traceback from the transition bits stored in the VT0 and VT1 registers
    //
    K = VSTATUS[K];
    S = VR0[K-2:0];
    VR0[31:K-1] = 0;
    if (S < (1 << (K-2))) {
        temp[0] = VT0[(1 << (K-2)) - 1 - S];
    } else {
        temp[0] = VT1[(1 << (K-1)) - 1 - S];
    }
    [mem32][0] = temp[0];
    [mem32][31:1] = 0;
    VR0[K-2:0] = 2*VR0[K-2:0] + temp[0];

Flags
    This instruction does not modify any flags in the VSTATUS register.

Pipeline
    This is a single-cycle instruction.

Example
    //
    // Example traceback code fragment
    //
    // XAR5 points to the beginning of Decoder Output array
    //
        VCLEAR   VR0
        MOVL     XAR5,*+XAR4[0]
    //
    // To retrieve each original message:
    // Load VT0/VT1 with the stored transition values
    // and use VTRACE instruction
    //
        VMOV32   VT0, *--XAR3
        VMOV32   VT1, *--XAR3
        VTRACE   *XAR5++, VR0, VT0, VT1
        VMOV32   VT0, *--XAR3
        VMOV32   VT1, *--XAR3
        VTRACE   *XAR5++, VR0, VT0, VT1
        ...
        ...etc for each VT0/VT1 pair

See also
    VTRACE VR1, VR0, VT0, VT1


VTRACE VR1, VR0, VT0, VT1    Viterbi Traceback, Store to Register

Operands
    Before the operation, the inputs are held in the registers as shown below.

    Input Register    Value
    VT0               Transition bit register 0
    VT1               Transition bit register 1
    VR0               Initial value is zero. After the first VTRACE, this contains information from the previous trace-back.

    The result of the operation is the output of the decoder stored in VR1:

    Output Register   Value
    VR1               Traceback result from the transition bits.

Opcode
    LSW: 1110 0101 0010 1000

Description
    Trace-back from the transition bits stored in the VT0 and VT1 registers. Write the result to VR1. The transition bits in the VT0 and VT1 registers are stored in the following format by the VITLSEL and VITHSEL instructions:

    VT0[31]    Transition bit [State 0]
    VT0[30]    Transition bit [State 1]
    VT0[29]    Transition bit [State 2]
    ...        ...
    VT0[0]     Transition bit [State 31]
    VT1[31]    Transition bit [State 32]
    VT1[30]    Transition bit [State 33]
    VT1[29]    Transition bit [State 34]
    ...        ...
    VT1[0]     Transition bit [State 63]

    //
    // Calculate the decoder output bit by performing a
    // traceback from the transition bits stored in the VT0 and VT1 registers
    //
    K = VSTATUS[K];
    S = VR0[K-2:0];
    VR0[31:K-1] = 0;
    if (S < (1 << (K-2))) {
        temp[0] = VT0[(1 << (K-2)) - 1 - S];
    } else {
        temp[0] = VT1[(1 << (K-1)) - 1 - S];
    }
    if (VSTATUS[OPACK] == 0) {
        VR1 = VR1 << 1;
        VR1[0:0] = temp[0];
        VR0[K-2:0] = 2*VR0[K-2:0] + temp[0];
    } else {
        VR1 = VR1 >> 1;
        VR1[31:31] = temp[0];
        VR0[K-2:0] = 2*VR0[K-2:0] + temp[0];
    }

Flags
    This instruction does not modify any flags in the VSTATUS register.
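The trace-back pseudocode above translates almost line for line into Python, which can be handy for checking decoder output offline. The sketch below is illustrative only: the function name is hypothetical, registers are modeled as unsigned 32-bit integers, and K and OPACK are passed as arguments instead of being read from VSTATUS.

```python
MASK32 = 0xFFFFFFFF


def vtrace(vr0, vr1, vt0, vt1, k, opack=0):
    """Model one VTRACE VR1, VR0, VT0, VT1 step; returns the new (VR0, VR1)."""
    state_mask = (1 << (k - 1)) - 1        # VR0[K-2:0] holds the current state
    state = vr0 & state_mask               # upper bits VR0[31:K-1] are cleared
    half = 1 << (k - 2)
    if state < half:
        bit = (vt0 >> (half - 1 - state)) & 1
    else:
        bit = (vt1 >> ((1 << (k - 1)) - 1 - state)) & 1
    if opack == 0:
        vr1 = ((vr1 << 1) | bit) & MASK32  # newest bit enters at bit 0
    else:
        vr1 = (vr1 >> 1) | (bit << 31)     # newest bit enters at bit 31
    vr0 = (2 * state + bit) & state_mask   # next state for the following step
    return vr0, vr1
```

With K = 7 and VR0 = 0, the step reads VT0[31] (state 0), in line with the bit layout listed in the description.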
Pipeline
    This is a single-cycle instruction.

See also
    VTRACE mem32, VR0, VT0, VT1


VTRACE VR1, VR0, VT0, VT1 || VMOV32 VT0, mem32    Trace-back with Parallel Load

Operands
    Input Register    Value
    VT0               Traceback register
    VT1               Traceback register
    VR0               Decoded output bits register
    VR1               Decoded output bits register
    mem32             Pointer to 32-bit memory location

Opcode
    LSW: 1110 0010 1011 0000
    MSW: 0000 0001 mem32

Description
    Trace-back with parallel load.

    K = VSTATUS[K];
    S = VR0[K-2:0];
    VR0[31:K-1] = 0;
    if (S < (1 << (K-2)))
        temp[0] = VT0[(1 << (K-2)) - 1 - S];
    else
        temp[0] = VT1[(1 << (K-1)) - 1 - S];
    if (VSTATUS[OPACK] == 0) {
        VR1 = VR1 << 1;
        VR1[0:0] = temp[0];
        VR0[K-2:0] = 2*VR0[K-2:0] + temp[0];
    } else {
        VR1 = VR1 >> 1;
        VR1[31:31] = temp[0];
        VR0[K-2:0] = 2*VR0[K-2:0] + temp[0];
    }
    VT0 = [mem32]

Flags
    This instruction does not affect any flags in the VSTATUS register.

Pipeline
    This is a 1/1 cycle instruction. The VTRACE and VMOV32 instructions complete in a single cycle.

Example
    ;
    ; etc ...
    ;
        .align   2
        RPTB     _tb_loop_ovlp2, #12
        VMOV32   VT0, *--XAR3
        VMOV32   VT1, *--XAR3
        VTRACE   VR1,VR0,VT0,VT1
      ||VMOV32   VT0, *--XAR3
        VMOV32   VT1, *--XAR3
        VTRACE   VR1,VR0,VT0,VT1
    _tb_loop_ovlp2
    ;
    ; etc ...
    ;

See also
    VTRACE mem32, VR0, VT0, VT1


2.6 Rounding Mode

This section details the rounding operation as applied to a right shift. When the rounding mode is enabled in the VSTATUS register, 0.5 is added to the right-shifted intermediate value before truncation. If rounding is disabled, the right-shifted value is only truncated.

Table 2-19 shows the bit representation of two values, 11.0 and 13.0. The columns marked Bit-1, Bit-2 and Bit-3 hold temporary bits resulting from the right shift operation.

Table 2-19. Example: Values Before Shift Right

               Bit5  Bit4  Bit3  Bit2  Bit1  Bit0  Bit-1  Bit-2  Bit-3  Value
    Val A      0     0     1     0     1     1     0      0      0      11.000
    Val B      0     0     1     1     0     1     0      0      0      13.000

Table 2-20 shows the intermediate values after the right shift has been applied to Val B. The columns marked Bit-1, Bit-2 and Bit-3 hold temporary bits resulting from the right shift operation.

Table 2-20. Example: Values after Shift Right

               Bit5  Bit4  Bit3  Bit2  Bit1  Bit0  Bit-1  Bit-2  Bit-3  Value
    Val A      0     0     1     0     1     1     0      0      0      11.000
    Val B >> 3 0     0     0     0     0     1     1      0      1       1.625

When the rounding mode is enabled, 0.5 is added to the intermediate result before truncation. Table 2-21 shows the bit representation of the Val A + (Val B >> 3) operation with rounding. Notice that 0.5 is added to the intermediate shifted-right value. After the addition, the bits in Bit-1, Bit-2 and Bit-3 are removed. In this case the result of the operation is 13, which is the truncated value after rounding.

Table 2-21. Example: Addition with Right Shift and Rounding

                               Bit5  Bit4  Bit3  Bit2  Bit1  Bit0  Bit-1  Bit-2  Bit-3  Value
    Val A                      0     0     1     0     1     1     0      0      0      11.000
    Val B >> 3                 0     0     0     0     0     1     1      0      1       1.625
    .5                         0     0     0     0     0     0     1      0      0       0.500
    Val A + Val B >> 3 + .5    0     0     1     1     0     1     0      0      1      13.125

When the rounding mode is disabled, the value is simply truncated. Table 2-22 shows the bit representation of the operation Val A + (Val B >> 3) without rounding. After the addition, the bits in Bit-1, Bit-2 and Bit-3 are removed. In this case the result of the operation is 12, which is the truncated value without rounding.

Table 2-22. Example: Addition with Right Shift, Without Rounding

                               Bit5  Bit4  Bit3  Bit2  Bit1  Bit0  Bit-1  Bit-2  Bit-3  Value
    Val A                      0     0     1     0     1     1     0      0      0      11.000
    Val B >> 3                 0     0     0     0     0     1     1      0      1       1.625
    Val A + Val B >> 3         0     0     1     1     0     0     1      0      1      12.625

Table 2-23 shows more examples of the intermediate shifted value along with the result if rounding is enabled or disabled. In each case, the truncated value is without 0.5 added and the rounded value is with 0.5 added.
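The worked example above (11 + (13 >> 3), giving 13 with rounding and 12 without) can be reproduced with integer arithmetic. The following Python sketch is an illustration under stated assumptions, not a description of the hardware datapath: the function names are hypothetical, the guard bits are modeled by keeping the shifted-out fraction in a wider integer, and negative inputs with rounding disabled are left out of scope because Python's arithmetic shift rounds toward negative infinity.

```python
def shift_right(value, shift, rnd):
    """Return value / 2**shift, truncated, optionally adding 0.5 first.

    'value' is an integer whose low 'shift' bits act as the fraction.
    """
    if rnd and shift > 0:
        value += 1 << (shift - 1)      # 0.5 expressed in units of 2**-shift
    return value >> shift              # drop the fractional bits


def add_with_shifted(a, b, shift, rnd):
    """Compute a + (b >> shift), keeping the fraction until the final truncation."""
    total = (a << shift) + b           # a aligned to the same fractional scale
    return shift_right(total, shift, rnd)


rounded = add_with_shifted(11, 13, 3, True)      # 11 + 1.625 + 0.5, truncated
truncated = add_with_shifted(11, 13, 3, False)   # 11 + 1.625, truncated
```

Adding the constant 1 << (shift - 1) before the shift is the usual integer form of "add 0.5, then truncate".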
Table 2-23. Shift Right Operation With and Without Rounding

    Bit2  Bit1  Bit0  Bit-1  Bit-2  Value   Result with RND = 0   Result with RND = 1
    0     1     0     0      0       2.00    2                     2
    0     0     1     1      1       1.75    1                     2
    0     0     1     1      0       1.50    1                     2
    0     0     1     0      1       1.25    1                     1
    0     0     0     1      1       0.75    0                     1
    0     0     0     1      0       0.50    0                     1
    0     0     0     0      1       0.25    0                     0
    0     0     0     0      0       0.00    0                     0
    1     1     1     1      1      -0.25    0                     0
    1     1     1     1      0      -0.50    0                     0
    1     1     1     0      1      -0.75    0                    -1
    1     1     1     0      0      -1.00   -1                    -1
    1     1     0     1      1      -1.25   -1                    -1
    1     1     0     1      0      -1.50   -1                    -1
    1     1     0     0      1      -1.75   -1                    -2
    1     1     0     0      0      -2.00   -2                    -2


Chapter 3
Trigonometric Math Unit (TMU)

The Trigonometric Math Unit (TMU) is a fully programmable block that enhances the instruction set of the C28-FPU to more efficiently execute common trigonometric and arithmetic operations. The TMU module described in this reference guide is a Type 0 TMU. For a list of all devices with a TMU module of the same type and to determine differences between the types, see the TMS320x28xx, 28xxx DSP Peripheral Reference Guide (SPRU566). This document describes the architecture and instruction set of the C28x+FPU+TMU.

Topic
    3.1 Overview
    3.2 Components of the C28x+FPU Plus TMU
    3.3 Data Format
    3.4 Pipeline
    3.5 TMU Instruction Set
3.1 Overview

The TMU extends the capabilities of a C28x+FPU enabled processor by adding instructions to speed up the execution of common trigonometric and arithmetic operations listed in Table 3-1.

Table 3-1. TMU Supported Instructions

    Instructions                 C Equivalent Operation
    MPY2PIF32 RaH,RbH            a = b * 2pi
    DIV2PIF32 RaH,RbH            a = b / 2pi
    DIVF32 RaH,RbH,RcH           a = b/c
    SQRTF32 RaH,RbH              a = sqrt(b)
    SINPUF32 RaH,RbH             a = sin(b*2pi)
    COSPUF32 RaH,RbH             a = cos(b*2pi)
    ATANPUF32 RaH,RbH            a = atan(b)/2pi
    QUADF32 RaH,RbH,RcH,RdH      Operation to assist in calculating ATANPU2

3.2 Components of the C28x+FPU Plus TMU

The TMU extends the capabilities of the C28x+FPU processors by adding new instructions and, in some cases, leveraging existing FPU instructions to carry out common arithmetic operations used in control applications. No changes have been made to existing instructions, pipeline or memory bus architecture. All TMU instructions use the existing FPU register set (R0H to R7H) to carry out their operations. A detailed explanation of the workings of the FPU can be found in the TMS320C28x Floating Point Unit and Instruction Set Reference Guide (SPRUEO2).

3.2.1 Interrupt Context Save and Restore

Since the TMU uses the same register set and flags as the FPU, there are no special considerations with regards to interrupt context save and restore. If a TMU operation is executing when an interrupt occurs, the C28 can initiate an interrupt context switch without affecting the TMU operation. The TMU will continue to process the operation to completion. Even though most TMU operations are multi-cycle, the TMU operation will have completed by the time register context save operations for the FPU are commenced.
When restoring FPU registers, you must make sure that all TMU operations are completed before restoring any register used by another TMU operation.

3.3 Data Format

The encoding of the floating-point formats is given in Table 3-2.

Table 3-2. IEEE 32-Bit Single Precision Floating-Point Format

    S32             E32 (7:0)    M32 (22:0)       Value (V)
    0               0            0                Zero (V = 0)
    1               0            0                Negative Zero (V = -0)
    0 +ve, 1 -ve    0            non zero         De-normalized (V = (-1)^S * 2^(-126) * (0.M))
    0 +ve, 1 -ve    1 to 254     0 to 0x7FFFFF    Normal Range (V = (-1)^S * 2^(E-127) * (1.M))
    0               254          0x7FFFFF         Positive Max (V = +Max)
    1               254          0x7FFFFF         Negative Max (V = -Max)
    0               max=255      0                Positive Infinity (V = +Infinity)
    1               max=255      0                Negative Infinity (V = -Infinity)
    x               max=255      non zero         Not A Number (V = NaN)

The treatment of the various IEEE floating-point numerical formats for this TMU is the same as the FPU implementation given below:

Negative Zero: All TMU operations generate a positive (S==0, E==0, M==0) zero, never a negative zero, if the result of the operation is zero. All TMU operations treat negative zero operands as zero.

De-Normalized Numbers: A de-normalized operand (E==0, M!=0) input is treated as zero (E==0, M==0) by all TMU operations. TMU operations never generate a de-normalized value.

Underflow: Underflow occurs when an operation generates a value that is too small to represent in the given floating-point format. In such cases, a zero value is returned. If a TMU operation generates an underflow condition, then the latched underflow flag (LUF) is set to 1. The LUF flag remains latched until cleared by the user executing an instruction that clears the flag.

Overflow: Overflow occurs when an operation generates a value that is too large to represent in the given floating-point format. In such cases, a ± Infinity value is returned.
If a TMU operation generates an overflow condition, then the latched overflow flag (LVF) is set to 1. The LVF flag remains latched until cleared by the user executing an instruction that clears the flag.

Rounding: There are various rounding formats supported by the IEEE standard. Rounding has no meaning for TMU operations (rounding is inherent in the implementation). Hence the rounding mode is ignored by TMU operations.

Infinity and Not a Number (NaN): A NaN operand (E==max, M!=0) input is treated as Infinity (E==max, M==0) for all operations. TMU operations never generate a NaN value; they generate Infinity instead.

3.4 Pipeline

The TMU enhances the instruction set of the C28-FPU and, therefore, operates the C28x pipeline in the same fashion as the FPU. For a detailed explanation of the working of the pipeline, see the TMS320C28x Floating Point Unit and Instruction Set Reference Guide (SPRUEO2).

3.4.1 Pipeline and Register Conflicts

In addition to the restrictions mentioned in the TMS320C28x Floating Point Unit and Instruction Set Reference Guide (SPRUEO2), the TMU places the following restrictions on its instructions:

Example 3-1. SINPUF32 Operation (4p cycles)

    SINPUF32 RaH,RbH     ; Value in register RbH read in this cycle.
    Instruction1         ; Instructions 1-3 cannot operate on register RaH.
    Instruction2         ; Instructions 1-3 can operate on register RbH.
    Instruction3         ; Instructions 1-3 can be any TMU/FPU/VCU/CPU operation.
    Instruction4         ; Result in RaH usable by Instruction4.

Example 3-2. COSPUF32 Operation (4p cycles)

    COSPUF32 RaH,RbH     ; Value in register RbH read in this cycle.
    Instruction1         ; Instructions 1-3 cannot operate on register RaH.
    Instruction2         ; Instructions 1-3 can operate on register RbH.
    Instruction3         ; Instructions 1-3 can be any TMU/FPU/VCU/CPU operation.
    Instruction4         ; Result in RaH usable by Instruction4.

Example 3-3. ATANPUF32 Operation (4p cycles)

    ATANPUF32 RaH,RbH    ; Value in register RbH read in this cycle.
    Instruction1         ; Instructions 1-3 cannot operate on register RaH.
    Instruction2         ; Instructions 1-3 can operate on register RbH.
                         ; Instructions 1-3 can be any TMU/FPU/VCU/CPU operation.
    Instruction3         ; Result, LVF flag updated on Instruction3 (4th cycle).
    Instruction4         ; Result in RaH usable by Instruction4.

Example 3-4. DIVF32 Operation (5p cycles)

    DIVF32 RaH,RbH,RcH   ; Value in registers RbH & RcH read in this cycle.
    Instruction1         ; Instructions 1-4 cannot operate on register RaH.
    Instruction2         ; Instructions 1-4 can operate on registers RbH & RcH.
    Instruction3         ; Instructions 1-4 can be any TMU/FPU/VCU/CPU operation.
    Instruction4         ; Result, LVF and LUF flags updated on Instruction4 (5th cycle).
    Instruction5         ; Result in RaH usable by Instruction5.

Example 3-5. SQRTF32 Operation (5p cycles)

    SQRTF32 RaH,RbH      ; Value in register RbH read in this cycle.
    Instruction1         ; Instructions 1-4 cannot operate on register RaH.
    Instruction2         ; Instructions 1-4 can operate on register RbH.
    Instruction3         ; Instructions 1-4 can be any TMU/FPU/VCU/CPU operation.
    Instruction4         ; Result, LVF flag updated on Instruction4 (5th cycle).
    Instruction5         ; Result in register RaH usable by Instruction5.

Example 3-6. QUADF32 Operations (5p cycles)

    QUADF32 RaH,RbH,RcH,RdH    ; Value in registers RcH & RdH read in this cycle.
    Instruction1               ; Instructions 1-4 cannot operate on registers RaH & RbH.
    Instruction2               ; Instructions 1-4 can operate on registers RcH & RdH.
    Instruction3               ; Instructions 1-4 can be any TMU/FPU/VCU/CPU operation.
    Instruction4               ; Result, LVF and LUF flags updated on Instruction4 (5th cycle).
    Instruction5               ; Result in registers RaH & RbH usable by Instruction5.
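The register restrictions in Examples 3-1 through 3-6 can be checked mechanically. The Python sketch below is a hypothetical helper, not a TI tool: it takes the 4p and 5p cycle counts from the examples above, assumes every listed instruction occupies exactly one cycle, and reports any instruction in a delay slot that names a destination register of a still-active TMU operation.

```python
# Pipeline cycles taken from Examples 3-1 to 3-6; a 4p operation has 3 delay slots.
LATENCY = {"SINPUF32": 4, "COSPUF32": 4, "ATANPUF32": 4,
           "DIVF32": 5, "SQRTF32": 5, "QUADF32": 5}
DEST_COUNT = {"QUADF32": 2}   # QUADF32 writes both RaH and RbH


def find_hazards(program):
    """program is a list of (mnemonic, [operands]) tuples, one per cycle.

    Returns (producer_index, consumer_index, register) for each conflict.
    """
    hazards = []
    for i, (mnemonic, operands) in enumerate(program):
        latency = LATENCY.get(mnemonic)
        if latency is None:
            continue
        dests = operands[:DEST_COUNT.get(mnemonic, 1)]
        # Delay slots are the next (latency - 1) instructions.
        for j in range(i + 1, min(i + latency, len(program))):
            for reg in dests:
                if reg in program[j][1]:
                    hazards.append((i, j, reg))
    return hazards


bad = [("DIVF32", ["R0H", "R2H", "R1H"]),
       ("NOP", []),
       ("ADDF32", ["R3H", "R0H", "R4H"])]   # reads R0H two cycles too early
conflicts = find_hazards(bad)
```

Here conflicts is [(0, 2, "R0H")] because the ADDF32 sits in the second of the four DIVF32 delay slots. A real checker would also have to account for multi-cycle CPU instructions.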
3.4.2 Delay Slot Requirements

The delay slot requirements for the TMU instructions are presented in Table 3-3.

Table 3-3. Delay Slot Requirements for TMU Instructions

    Case  Description
    1     Any single-cycle FPU operation (including any memory load/store operation)
          SINPUF32/COSPUF32/ATANPUF32/QUADF32/MPY2PIF32/DIV2PIF32/DIVF32/SQRTF32
    2     All FPU 2p-cycle operations MPY/ADD/SUB/...
          NOP
          NOP
          SINPUF32/COSPUF32/ATANPUF32/QUADF32/MPY2PIF32/DIV2PIF32/DIVF32/SQRTF32
    3     SINPUF32/COSPUF32/ATANPUF32
          NOP
          NOP
          NOP
          All TMU or FPU operations
    4     QUADF32/DIVF32/SQRTF32
          NOP
          NOP
          NOP
          NOP
          All TMU or FPU operations

    Special Cases Involving MPY2PIF32/DIV2PIF32
    5     MPY2PIF32/DIV2PIF32
          NOP
          SINPUF32/COSPUF32
    6     MPY2PIF32/DIV2PIF32
          NOP
          NOP
          ATANPUF32/QUADF32/DIVF32/SQRTF32
    7     MPY2PIF32/DIV2PIF32
          NOP
          NOP
          All FPU operations
    8     MPY2PIF32/DIV2PIF32
          NOP
          MOV32 mem,RxH ; Special case: Store result of MPY2PIF32/DIV2PIF32 to memory
                        ; (but does not include MOV32 operation between CPU and FPU registers).

Notes: The "NOPs" can be any other FPU, TMU, VCU or CPU operation that does not conflict with the currently active TMU operation (that is, does not use the same destination register). For example,

Example 3-7. Use of Non-Conflicting Instructions in Delay Slots

    SINPUF32 R0H,R1H
    COSPUF32 R2H,R1H
    MOV32    R4H,@VarA
    MOV32    R5H,@VarB
    ADDF32   R3H,R4H,R0H     ; SINPUF32 value (R0H) used here
    ADDF32   R7H,R5H,R2H     ; COSPUF32 value (R2H) used here

The FPU delay slot requirements apply to the operation whose destination register value is subsequently used by the TMU operation.
For example, in the following case a parallel MPY and MOV operation precedes the TMU operation. If the result of the MPY operation is used, two delay slots are required (Case 2 of Table 3-3):

Example 3-8. Delay Slot Requirement for TMU Instructions That Use Results of Prior FPU Instructions

    MPYF32   R3H,R2H,R1H
  ||MOV32    R4H,@varA
    NOP
    NOP
    SINPUF32 R6H,R3H

If, however, the result of the parallel MOV operation is used, then no delay slots are required, since the MOV completes in a single cycle (Case 1 of Table 3-3):

Example 3-9. FPU Instruction Followed by a Non-Dependent TMU Instruction

    MPYF32   R3H,R2H,R1H
  ||MOV32    R4H,@varA
    SINPUF32 R6H,R4H

3.4.3 Effect of Delay Slot Operations on the Flags

The LVF and LUF flags can only be set. If multiple operations (from the FPU or TMU) try to set the flags, the operations on the flags are ORed together. Operations that set the LVF or LUF flags (either FPU or TMU) are allowed in delay slots. For example, the following sequence of operations is valid:

Example 3-10. Valid Back-to-Back Instructions That May Set the LVF, LUF Flag

    MPY2PIF32 R0H,R0H
    MPY2PIF32 R1H,R1H

If a SETFLG, SAVE, RESTORE, or MOVST0 operation, or a load or store of the STF register, tries to modify the state of the LVF or LUF flags while a TMU or any other FPU operation is trying to set them, the LUF and LVF flags are undefined. This can only occur if these operations are placed in the delay slots of the pipeline operations; this should be avoided. This also applies to the ZF and NF flags, which are not affected by TMU operations.

3.4.4 Multi-Cycle Operations in Delay Slots

A multi-cycle operation like RET, BRANCH, or CALL is equivalent to a minimum of four NOPs.
For example, the code shown in Example 3-11 returns the correct value because LRETR takes a minimum of four cycles to execute (equivalent to four NOPs):

Example 3-11. Multi-Cycle Operation in the Delay Slot of a TMU Instruction

    DIVF32    R0H,R2H,R1H
    LRETR

3.4.5 Moves From FPU Registers to C28x Registers

When transferring from floating-point unit registers (the result of an FPU or TMU operation) to a C28x CPU register, additional pipeline alignment is required, as shown in Example 3-12.

Example 3-12. Floating-Point to C28x Register Software Pipeline Alignment

; SINPUF32: Per unit sine: 4 pipeline cycle operation
; An alignment cycle is required before copying R0H to ACC
    SINPUF32  R0H,R1H
    NOP                   ; Delay Slot 1
    NOP                   ; Delay Slot 2
    NOP                   ; Delay Slot 3
    NOP                   ; Alignment cycle
    MOV32     @ACC,R0H

3.5 TMU Instruction Set

This section describes the assembly language instructions of the TMU.

3.5.1 Instruction Descriptions

This section provides detailed information on the instruction set. Each instruction may present the following information:
• Operands
• Opcode
• Description
• Exceptions
• Pipeline
• Examples
• See also

The example INSTRUCTION is shown to familiarize you with the way each instruction is described. The example describes the kind of information you will find in each part of the individual instruction description and where to obtain more information. TMU instructions follow the same format as the C28x and the C28x+FPU; the source operand(s) are always on the right and the destination operand(s) are on the left.
The explanations for the syntax of the operands used in the instruction descriptions for the C28x TMU are given in Table 3-4. For information on the operands of standard C28x instructions, see the TMS320C28x CPU and Instruction Set Reference Guide (SPRU430).

Table 3-4. Operand Nomenclature

Symbol           Description
#16FHi           16-bit immediate (hex or float) value that represents the upper 16 bits of an IEEE 32-bit floating-point value. The lower 16 bits of the mantissa are assumed to be zero.
#16FHiHex        16-bit immediate hex value that represents the upper 16 bits of an IEEE 32-bit floating-point value. The lower 16 bits of the mantissa are assumed to be zero.
#16FLoHex        16-bit immediate hex value that represents the lower 16 bits of an IEEE 32-bit floating-point value
#32Fhex          32-bit immediate value that represents an IEEE 32-bit floating-point value
#32F             Immediate float value represented in floating-point representation
#0.0             Immediate zero
#RC              16-bit immediate value for the repeat count
*(0:16bitAddr)   16-bit immediate address, zero extended
CNDF             Condition to test the flags in the STF register
FLAG             Selected flags from the STF register, or an 11-bit mask indicating which floating-point status flags to change
label            Label representing the end of the repeat block
mem16            Pointer (using any of the direct or indirect addressing modes) to a 16-bit memory location
mem32            Pointer (using any of the direct or indirect addressing modes) to a 32-bit memory location
RaH              R0H to R7H registers
RbH              R0H to R7H registers
RcH              R0H to R7H registers
RdH              R0H to R7H registers
ReH              R0H to R7H registers
RfH              R0H to R7H registers
RB               Repeat Block Register
STF              FPU Status Register
VALUE            Flag value of 0 or 1 for the selected flag, or an 11-bit mask indicating the flag value (0 or 1)

INSTRUCTION dest1, source1, source2    Short Description

Operands
    dest1      Description for the 1st operand for the instruction
    source1    Description for the 2nd operand for the instruction
    source2    Description for the 3rd operand for the instruction

Each instruction has a table that gives a list of the operands and a short description. Instructions always have their destination operand(s) first, followed by the source operand(s).

Opcode
This section shows the opcode for the instruction.

Description
A detailed description of the instruction execution is given. Any constraints on the operands imposed by the processor or the assembler are discussed.

Restrictions
Any constraints on the operands or use of the instruction imposed by the processor are discussed.

Pipeline
This section describes the instruction in terms of pipeline cycles.

Example
Examples of instruction execution. If applicable, register and memory values are given before and after instruction execution. All examples assume the device is running with OBJMODE set to 1. Normally the boot ROM or the C-code initialization will set this bit.

See Also
Lists related instructions.

3.5.2 Common Restrictions

For all the TMU instructions, the inputs are conditioned as follows (LVF and LUF are not affected):
• Negative zero is treated as positive zero
• Positive or negative denormalized numbers are treated as positive zero
• Positive and negative NaN are treated as positive and negative infinity, respectively

3.5.3 Instructions

The instructions are described in the order given in Table 3-5.

Table 3-5. Summary of Instructions

Instruction                   Title
MPY2PIF32 RaH, RbH            32-Bit Floating-Point Multiply by Two Pi
DIV2PIF32 RaH, RbH            32-Bit Floating-Point Divide by Two Pi
DIVF32 RaH, RbH, RcH          32-Bit Floating-Point Division
SQRTF32 RaH, RbH              32-Bit Floating-Point Square Root
SINPUF32 RaH, RbH             32-Bit Floating-Point Sine (per unit)
COSPUF32 RaH, RbH             32-Bit Floating-Point Cosine (per unit)
ATANPUF32 RaH, RbH            32-Bit Floating-Point ArcTangent (per unit)
QUADF32 RaH, RbH, RcH, RdH    Quadrant Determination Used in Conjunction With ATANPUF32()

MPY2PIF32 RaH, RbH    32-Bit Floating-Point Multiply by Two Pi

Operands
    RaH    Floating-point destination register (R0H to R7H)
    RbH    Floating-point source register (R0H to R7H)

Opcode
    LSW: 1110 0010 0111 0000
    MSW: 0000 0000 00bb baaa

Description
This operation is similar to the MPYF32 operation except that the second operand is the constant value 2pi:

    RaH = RbH * 2pi

This operation is used in converting Per Unit values to Radians.
Per Unit values are used in control applications to represent normalized radians:

    Per Unit    Radians
     1.0         2pi
     0.0         0
    -1.0        -2pi

In IEEE 32-bit floating-point format:

    2pi = 6.28318530718 = 1.570796326795 * 2^2
    S = 0 << 31 = 0x00000000
    E = (2 + 127) << 23 = 129 << 23 = 0x40800000
    M = (1.570796326795 * 2^23) & 0x007FFFFF = 0x00490FDB
    2pi = S+E+M = 0x40C90FDB

Flags
    Flag        TF   ZI   NI   ZF   NF   LUF   LVF
    Modified    No   No   No   No   No   No    Yes

Restrictions
    If( RaH result is too big for floating-point number, Ea > 255 ) {
        RaH = ±Infinity;
        LVF = 1;
    }

Pipeline
The instruction takes 2 pipeline cycles to execute if followed by a SINPUF32, COSPUF32 or MOV32 mem, Rx operation, and 3 pipeline cycles for all other operations (FPU or TMU).

Example
;; Convert Per Unit value to Radians:
    MOV32      R0H,@PerUnit    ; R0H = Per Unit value
    MPY2PIF32  R0H,R0H         ; R0H = R0H * 2pi
    NOP                        ; pipeline delay
    MOV32      @Radians,R0H    ; store Radian result
                               ; 4 cycles

DIV2PIF32 RaH, RbH    32-Bit Floating-Point Divide by Two Pi

Operands
    RaH    Floating-point destination register (R0H to R7H)
    RbH    Floating-point source register (R0H to R7H)

Opcode
    LSW: 1110 0010 0111 0001
    MSW: 0000 0000 00bb baaa

Description
This operation is similar to the MPYF32 operation except that the second operand is the constant value 1/2pi:

    RaH = RbH * 1/2pi

This operation is used in converting Radians to Per Unit values.
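The hexadecimal constants quoted for 2pi and 1/2pi can be cross-checked on a host machine. The sketch below uses Python's standard struct module to pack each constant as an IEEE single-precision value (rounded to nearest) and split it into sign, exponent and mantissa fields; the helper names f32_bits() and fields() are hypothetical and chosen only for this illustration.

```python
import math
import struct


def f32_bits(x):
    """Return the IEEE-754 single-precision bit pattern of x as an integer."""
    return struct.unpack(">I", struct.pack(">f", x))[0]


def fields(bits):
    """Split a 32-bit pattern into (sign, biased exponent, mantissa field)."""
    sign = bits >> 31
    exponent = (bits >> 23) & 0xFF
    mantissa = bits & 0x007FFFFF
    return sign, exponent, mantissa


TWO_PI_BITS = f32_bits(2.0 * math.pi)
INV_TWO_PI_BITS = f32_bits(1.0 / (2.0 * math.pi))

print(hex(TWO_PI_BITS), fields(TWO_PI_BITS))
print(hex(INV_TWO_PI_BITS), fields(INV_TWO_PI_BITS))
```

The biased exponents come out as 129 and 124, matching the (2 + 127) and (-3 + 127) terms used in the S+E+M breakdowns.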
Per Unit values are used in control applications to represent normalized radians:

    Per Unit    Radians
     1.0         2pi
     0.0         0
    -1.0        -2pi

In IEEE 32-bit floating-point format:

    1/2pi = 0.1591549430919 = 1.273239544735 * 2^-3
    S = 0 << 31 = 0x00000000
    E = (-3 + 127) << 23 = 124 << 23 = 0x3E000000
    M = (1.273239544735 * 2^23) & 0x007FFFFF = 0x0022F983
    1/2pi = S+E+M = 0x3E22F983

Flags
    Flag        TF   ZI   NI   ZF   NF   LUF   LVF
    Modified    No   No   No   No   No   Yes   No

Restrictions
    If( RaH result is too small for floating-point number, Ea < 0 ) {
        RaH = 0.0;
        LUF = 1;
    }

Pipeline
The instruction takes 2 pipeline cycles to execute if followed by a SINPUF32, COSPUF32 or MOV32 mem, Rx operation, and 3 pipeline cycles for all other operations (FPU or TMU).

Example
;; Convert Radian value to Per Unit value:
    MOV32      R0H,@Radians    ; R0H = Radian value
    DIV2PIF32  R0H,R0H         ; R0H = R0H * 1/2pi
    NOP                        ; pipeline delay
    MOV32      @PerUnit,R0H    ; store Per Unit result
                               ; 4 cycles

DIVF32 RaH, RbH, RcH    32-Bit Floating-Point Division

Operands
    RaH    Floating-point destination register (R0H to R7H)
    RbH    Floating-point source register (R0H to R7H)
    RcH    Floating-point source register (R0H to R7H)

Opcode
    LSW: 1110 0010 0111 0100
    MSW: 0000 000c ccbb baaa

Description
    RaH = RbH/RcH

The sequence of operations is as follows:

    Sa = Sb ^ Sc;                   // Set sign of result
    Ea = (Eb - Ec) + 127;           // Calculate exponent
    Ma = Mb / Mc;                   // 0.5 < Ma < 2.0
    if(Ma < 1.0){                   // Re-normalize mantissa range
        Ea = Ea - 1;
        Ma = Ma * 2.0;
    }
    if(Ea >= 255){                  // Check if result too big:
        Ea = 255;                   // Return Inf
        Ma = 0;
        LVF = 1;                    // Set overflow flag
    }
    if((Ea == 0) & (Ma != 0)){      // Check if result is a denorm value:
        Sa = 0;                     // Return zero
        Ea = 0;
        Ma = 0;
        LUF = 1;                    // Set underflow flag
    }
    if(Ea < 0){                     // Check if result too small:
        Sa = 0;                     // Return zero
        Ea = 0;
        Ma = 0;
        LUF = 1;                    // Set underflow flag
    }
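The sign, exponent and mantissa steps of the division can be followed in a host-side model. The Python sketch below is a minimal illustration under stated assumptions: it handles only normalized, finite, non-zero inputs within the single-precision range, it works in double precision (so it is not bit-accurate), and the names split_f32() and divf32_model() are hypothetical.

```python
import math


def split_f32(x):
    """Split a normalized, finite, non-zero value into IEEE-style fields:
    sign bit, biased exponent, and mantissa in the range [1.0, 2.0)."""
    if x == 0.0 or math.isinf(x) or math.isnan(x):
        raise ValueError("sketch handles normalized finite inputs only")
    sign = 1 if x < 0.0 else 0
    frac, exp = math.frexp(abs(x))        # abs(x) = frac * 2**exp, 0.5 <= frac < 1
    return sign, exp + 126, frac * 2.0    # biased exponent, mantissa 1.f


def divf32_model(b, c):
    """Return (result, LVF, LUF) for b / c following the listed steps."""
    sb, eb, mb = split_f32(b)
    sc, ec, mc = split_f32(c)
    sa = sb ^ sc                          # sign of result
    ea = (eb - ec) + 127                  # exponent
    ma = mb / mc                          # 0.5 < ma < 2.0
    if ma < 1.0:                          # re-normalize mantissa range
        ea -= 1
        ma *= 2.0
    if ea >= 255:                         # too big: return Inf, set LVF
        return (-math.inf if sa else math.inf), True, False
    if ea <= 0:                           # denormal or too small: return zero, set LUF
        return 0.0, False, True
    value = math.ldexp(ma, ea - 127)      # rebuild ma * 2**(ea - 127)
    return (-value if sa else value), False, False


print(divf32_model(6.0, 3.0))
print(divf32_model(3.0e38, 1.0e-10))
```

The two underflow checks of the listed sequence (Ea == 0 with a non-zero mantissa, and Ea < 0) are merged into a single ea <= 0 test here, because the mantissa of a normalized quotient is never zero.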
Flags
    Flag        TF   ZI   NI   ZF   NF   LUF   LVF
    Modified    No   No   No   No   No   Yes   Yes

Restrictions
The following boundary conditions apply:

    Division      Result    LVF    LUF
    0/0           0         1      -
    0/Inf         0         -      1
    Inf/Normal    Inf       1      -
    Inf/0         Inf       1      -
    Inf/Inf       Inf       -      1
    Normal/0      Inf       1      -
    Normal/Inf    0         -      1

Pipeline
The instruction takes 5 pipeline cycles to execute.

Example
;; Calculate Z = Y/X
    MOV32   R0H,@X          ; R0H = X
    MOV32   R1H,@Y          ; R1H = Y
    DIVF32  R2H,R1H,R0H     ; R2H = R1H/R0H = Y/X = Z
    NOP                     ; pipeline delay
    NOP                     ; pipeline delay
    NOP                     ; pipeline delay
    NOP                     ; pipeline delay
    MOV32   @Z,R2H          ; Z = Y/X
                            ; 8 cycles

SQRTF32 RaH, RbH    32-Bit Floating-Point Square Root

Operands
    RaH    Floating-point destination register (R0H to R7H)
    RbH    Floating-point source register (R0H to R7H)

Opcode
    LSW: 1110 0010 0111 0111
    MSW: 0000 0000 00bb baaa

Description
    RaH = sqrt(RbH)

Flags
    Flag        TF   ZI   NI   ZF   NF   LUF   LVF
    Modified    No   No   No   No   No   No    Yes

Restrictions
    If( RbH < 0.0 or -Inf ) {    // Check if input is negative:
        Sa = 0;                  // Return zero
        Ea = 0;
        Ma = 0;
        LVF = 1;                 // Set overflow flag
    }
    If( RbH == +Inf ) {
        Sa = 0;                  // Return Inf
        Ea = 255;
        Ma = 0;
        LVF = 1;                 // Set overflow flag
    }

Pipeline
The instruction takes 5 pipeline cycles to execute.
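The special-case handling of the square root can be sketched on a host as follows. This is a hedged Python model, not the hardware algorithm: sqrtf32_model() is a hypothetical name, the arithmetic is double precision, and NaN and denormalized inputs are left out of scope.

```python
import math


def sqrtf32_model(x):
    """Return (result, LVF) following the SQRTF32 restrictions."""
    if x < 0.0:                 # negative input (including -Inf): return zero, set LVF
        return 0.0, True
    if x == math.inf:           # +Inf input: return Inf, set LVF
        return math.inf, True
    if x == 0.0:                # covers -0.0, which is treated as positive zero
        return 0.0, False
    return math.sqrt(x), False


print(sqrtf32_model(9.0))
print(sqrtf32_model(-4.0))
```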
Example
;; Calculate Y = sqrt(X)
    MOV32    R0H,@X         ; R0H = X
    SQRTF32  R1H,R0H        ; R1H = sqrt(X)
    NOP                     ; pipeline delay
    NOP                     ; pipeline delay
    NOP                     ; pipeline delay
    NOP                     ; pipeline delay
    MOV32    @Y,R1H         ; Y = sqrt(X)
                            ; 7 cycles

SINPUF32 RaH, RbH    32-Bit Floating-Point Sine (per unit)

Operands
    RaH    Floating-point destination register (R0H to R7H)
    RbH    Floating-point source register (R0H to R7H)

Opcode
    LSW: 1110 0010 0111 1000
    MSW: 0000 0000 00bb baaa

Description
This instruction performs the following equivalent operation:

    PerUnit = fraction(RbH)
    RaH = sin(PerUnit*2pi)

In control applications radians are usually normalized to the range of -1.0 to 1.0:

    Per Unit    Radians
     1.0         2pi
     0.0         0
    -1.0        -2pi

The operation takes the fraction of the input value RbH. This equates to the sine waveform repeating itself every 2pi radians:

    RbH      Per Unit    Radians    Sine Value
     2.0      0.0         0          0.0
     1.75     0.75        3pi/2     -1.0
     1.5      0.5         pi         0.0
     1.25     0.25        pi/2       1.0
     1.0      0.0         0          0.0
     0.75     0.75        3pi/2     -1.0
     0.5      0.5         pi         0.0
     0.25     0.25        pi/2       1.0
     0.0      0.0         0          0.0
    -0.25    -0.25       -pi/2      -1.0
    -0.5     -0.5        -pi         0.0
    -0.75    -0.75       -3pi/2      1.0
    -1.0      0.0         0          0.0
    -1.25    -0.25       -pi/2      -1.0
    -1.5     -0.5        -pi         0.0
    -1.75    -0.75       -3pi/2      1.0
    -2.0      0.0         0          0.0

Flags
    Flag        TF   ZI   NI   ZF   NF   LUF   LVF
    Modified    No   No   No   No   No   No    No

Restrictions
If the input value is too small (<= 2^-33) or too big (>= 2^22), the output is returned as 0.0 (no flags affected).

Pipeline
The instruction takes 4 pipeline cycles to execute.
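The fraction-then-sine behaviour can be illustrated with a short host-side model. In this Python sketch, sinpuf32_model() is a hypothetical name, the range limits are assumed to apply to the magnitude of the input, and the result is computed in double precision, so it only approximates what the hardware returns.

```python
import math


def sinpuf32_model(x):
    """Per-unit sine: take the fraction of x, then sin(fraction * 2pi)."""
    magnitude = abs(x)
    if magnitude <= 2.0 ** -33 or magnitude >= 2.0 ** 22:
        return 0.0                          # out-of-range inputs return 0.0
    per_unit = math.fmod(x, 1.0)            # fraction keeps the sign of x
    return math.sin(per_unit * 2.0 * math.pi)


for value in (0.25, 1.75, -0.75):
    print(value, sinpuf32_model(value))
```

math.fmod() is used for the fraction because its result keeps the sign of the input, which matches the Per Unit column of the table (for example, -1.25 maps to -0.25).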
Example
;; Convert Radian value to PerUnit value and
;; calculate Sin value:
    MOV32      R0H,@RadianValue    ; R0H = Radian value
    DIV2PIF32  R1H,R0H             ; R1H = R0H/2pi = Per Unit Value
    NOP                            ; pipeline delay
    SINPUF32   R2H,R1H             ; R2H = SINPU(fraction(R1H))
    NOP                            ; pipeline delay
    NOP                            ; pipeline delay
    NOP                            ; pipeline delay
    MOV32      @SinValue,R2H       ; Sin Value = sin(Radian Value)
                                   ; 8 cycles

COSPUF32 RaH, RbH    32-Bit Floating-Point Cosine (per unit)

Operands
    RaH    Floating-point destination register (R0H to R7H)
    RbH    Floating-point source register (R0H to R7H)

Opcode
    LSW: 1110 0010 0111 1001
    MSW: 0000 0000 00bb baaa

Description
This instruction performs the following equivalent operation:

    PerUnit = fraction(RbH)
    RaH = cos(PerUnit*2pi)

In control applications radians are usually normalized to the range of -1.0 to 1.0:

    Per Unit    Radians
     1.0         2pi
     0.0         0
    -1.0        -2pi

The operation takes the fraction of the input value RbH. This equates to the cosine waveform repeating itself every 2pi radians:

    RbH      Per Unit    Radians    Cosine Value
     2.0      0.0         0          1.0
     1.75     0.75        3pi/2      0.0
     1.5      0.5         pi        -1.0
     1.25     0.25        pi/2       0.0
     1.0      0.0         0          1.0
     0.75     0.75        3pi/2      0.0
     0.5      0.5         pi        -1.0
     0.25     0.25        pi/2       0.0
     0.0      0.0         0          1.0
    -0.25    -0.25       -pi/2       0.0
    -0.5     -0.5        -pi        -1.0
    -0.75    -0.75       -3pi/2      0.0
    -1.0      0.0         0          1.0
    -1.25    -0.25       -pi/2       0.0
    -1.5     -0.5        -pi        -1.0
    -1.75    -0.75       -3pi/2      0.0
    -2.0      0.0         0          1.0

Flags
    Flag        TF   ZI   NI   ZF   NF   LUF   LVF
    Modified    No   No   No   No   No   No    No

Restrictions
If the input value is too small (<= 2^-33) or too big (>= 2^22), the output is returned as 1.0 (no flags affected).
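A radian angle is typically scaled to per unit before the per-unit cosine is taken. The Python sketch below models that two-step flow on a host. The function names div2pif32_model(), cospuf32_model() and cos_from_radians() are hypothetical, the range limits are assumed to apply to the magnitude of the input, and double-precision arithmetic is used, so the values are approximations of the hardware results.

```python
import math

INV_TWO_PI = 1.0 / (2.0 * math.pi)


def div2pif32_model(radians):
    """Scale radians to per unit (multiply by 1/2pi)."""
    return radians * INV_TWO_PI


def cospuf32_model(x):
    """Per-unit cosine: take the fraction of x, then cos(fraction * 2pi)."""
    magnitude = abs(x)
    if magnitude <= 2.0 ** -33 or magnitude >= 2.0 ** 22:
        return 1.0                          # out-of-range inputs return 1.0
    per_unit = math.fmod(x, 1.0)            # fraction keeps the sign of x
    return math.cos(per_unit * 2.0 * math.pi)


def cos_from_radians(radians):
    """Chain the two steps: radians -> per unit -> cosine."""
    return cospuf32_model(div2pif32_model(radians))


print(cos_from_radians(math.pi))
print(cospuf32_model(-1.5))
```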
Pipeline
The instruction takes 4 pipeline cycles to execute.

Example
;; Convert Radian value to PerUnit value and
;; calculate Cos value:
    MOV32      R0H,@RadianValue    ; R0H = Radian value
    DIV2PIF32  R1H,R0H             ; R1H = R0H/2pi = Per Unit Value
    NOP                            ; pipeline delay
    COSPUF32   R2H,R1H             ; R2H = COSPU(fraction(R1H))
    NOP                            ; pipeline delay
    NOP                            ; pipeline delay
    NOP                            ; pipeline delay
    MOV32      @CosValue,R2H       ; Cos Value = cos(Radian Value)
                                   ; 8 cycles

ATANPUF32 RaH, RbH    32-Bit Floating-Point ArcTangent (per unit)

Operands
    RaH    Floating-point destination register (R0H to R7H)
    RbH    Floating-point source register (R0H to R7H)

Opcode
    LSW: 1110 0010 0111 1010
    MSW: 0000 0000 00bb baaa

Description
This instruction computes the arc tangent of a given value and returns the result as a per-unit value:

    PerUnit = atan(RbH)/2pi

The operation limits the range of the input value RbH to:

    -1.0 <= RbH <= 1.0

Values outside this range return ±0.125 and set the LVF flag, as follows:

    RbH       Per Unit    Radians    ATANPU Value    LVF Flag
    > 1.0      0.125       pi/4       0.125          1
      1.0      0.125       pi/4       0.125
      0.0      0.0         0          0.0
     -1.0     -0.125      -pi/4      -0.125
    < -1.0    -0.125      -pi/4      -0.125          1

Flags
    Flag        TF   ZI   NI   ZF   NF   LUF   LVF
    Modified    No   No   No   No   No   No    Yes

Pipeline
The instruction takes 4 pipeline cycles to execute.
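The input clamping of the per-unit arc tangent can be modelled in a few lines. This Python sketch is an approximation in double precision, and atanpuf32_model() is a hypothetical name used only for illustration.

```python
import math


def atanpuf32_model(x):
    """Return (per-unit arc tangent, LVF) with the input limited to [-1.0, 1.0]."""
    if x > 1.0:
        return 0.125, True                  # clamp to +pi/4 (per unit) and set LVF
    if x < -1.0:
        return -0.125, True                 # clamp to -pi/4 (per unit) and set LVF
    return math.atan(x) / (2.0 * math.pi), False


print(atanpuf32_model(1.0))
print(atanpuf32_model(5.0))
```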
Example
;; Calculate ATAN and generate Per Unit value and
;; convert to Radians:
    MOV32      R0H,@AtanValue      ; R0H = Atan Value
    ATANPUF32  R1H,R0H             ; R1H = ATANPU(R0H)
    NOP                            ; pipeline delay
    NOP                            ; pipeline delay
    NOP                            ; pipeline delay
    MPY2PIF32  R2H,R1H             ; R2H = R1H * 2pi = Radian value
    NOP                            ; pipeline delay
    MOV32      @RadianValue,R2H    ; Store result
                                   ; 8 cycles

QUADF32 RaH, RbH, RcH, RdH    Quadrant Determination Used in Conjunction With ATANPUF32()

Operands
    RaH    Floating-point destination register (R0H to R7H)
    RbH    Floating-point destination register (R0H to R7H)
    RcH    Floating-point source register (R0H to R7H)
    RdH    Floating-point source register (R0H to R7H)

Opcode
    LSW: 1110 0010 0111 1100
    MSW: 0000 dddc ccbb baaa

Description
This operation, in conjunction with atanpu(), is used in calculating atanpu2() for a full circle:

    RdH = X value
    RcH = Y value
    RbH = Ratio of X & Y
    RaH = Quadrant value (0.0, ±0.25, ±0.5)

The figure "Calculation of RaH (Quadrant) and RbH (Ratio) Based on RcH (Y) and RdH (X) Values" shows how the values RaH and RbH are generated based on the contents of RcH and RdH.
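The way the quadrant and ratio outputs combine with the per-unit arc tangent into a full-circle result can be checked against a host math library. The Python sketch below is a double-precision model for finite inputs only; quadf32_model() and atan2_model() are hypothetical names, and the boundary cases involving infinities are not modelled.

```python
import math

TWO_PI = 2.0 * math.pi


def quadf32_model(y, x):
    """Return (quadrant, ratio) in per unit for the point (x, y)."""
    if y == 0.0 and x == 0.0:
        return 0.0, 0.0
    if abs(y) <= abs(x):
        ratio = y / x
        if x >= 0.0:
            quadrant = 0.0
        else:
            quadrant = 0.5 if y >= 0.0 else -0.5
    else:
        quadrant = 0.25 if y >= 0.0 else -0.25
        ratio = -x / y
    return quadrant, ratio


def atan2_model(y, x):
    """atanpu2 = quadrant + atanpu(ratio), then scale per unit to radians."""
    quadrant, ratio = quadf32_model(y, x)
    per_unit = quadrant + math.atan(ratio) / TWO_PI
    return per_unit * TWO_PI


for y, x in ((1.0, 1.0), (1.0, -1.0), (-2.0, 1.0)):
    print(y, x, atan2_model(y, x), math.atan2(y, x))
```

Because the ratio always has a magnitude of at most 1.0, the arc tangent input stays inside the -1.0 to 1.0 range accepted by ATANPUF32.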
The algorithm for this instruction is as follows:

    if( (fabs(RcH(Y)) == 0.0) & (fabs(RdH(X)) == 0.0) ) {
        RaH(Quadrant) = 0.0;
        RbH(Ratio) = 0.0;
    } else if( fabs(RcH(Y)) <= fabs(RdH(X)) ) {
        RbH(Ratio) = RcH(Y) / RdH(X);
        if( RdH(X) >= 0.0 )
            RaH(Quadrant) = 0.0;
        else {
            if( RcH(Y) >= 0.0 )
                RaH(Quadrant) = 0.5;
            else
                RaH(Quadrant) = -0.5;
        }
    } else {
        if( RcH(Y) >= 0.0 )
            RaH(Quadrant) = 0.25;
        else
            RaH(Quadrant) = -0.25;
        RbH(Ratio) = - RdH(X) / RcH(Y);
    }

Flags
    Flag        TF   ZI   NI   ZF   NF   LUF   LVF
    Modified    No   No   No   No   No   Yes   Yes

Restrictions
    Division      Result    LVF    LUF
    0/0           0         1      -
    0/Inf         0         -      1
    Inf/Normal    Inf       1      -
    Inf/0         Inf       1      -
    Inf/Inf       Inf       -      1
    Normal/0      Inf       1      -
    Normal/Inf    0         -      1

Pipeline
The instruction takes 5 pipeline cycles to execute.

Example
;; Calculate Z = atan2(Y,X), where Z is in
;; radians:
    MOV32      R0H,@X              ; R0H = X
    MOV32      R1H,@Y              ; R1H = Y
;; if(|Y| <= |X|) R2H = R1H/R0H
;; else           R2H = -R0H/R1H
;; R3H = 0.0, +/-0.25, +/-0.5
    QUADF32    R3H,R2H,R1H,R0H
    NOP                            ; pipeline delay
    NOP                            ; pipeline delay
    NOP                            ; pipeline delay
    NOP                            ; pipeline delay
;; R4H = ATANPU(R2H) (Per Unit result)
    ATANPUF32  R4H,R2H
    NOP                            ; pipeline delay
    NOP                            ; pipeline delay
    NOP                            ; pipeline delay
;; R5H = R3H + R4H = Quadrant + ATANPU(Ratio) = ATANPU2 value
    ADDF32     R5H,R3H,R4H
    NOP                            ; pipeline delay
;; R6H = ATANPU2 * 2pi = atan2 value (radians)
    MPY2PIF32  R6H,R5H
    NOP                            ; pipeline delay
    MOV32      @Z,R6H              ; store result
                                   ; 16 cycles

Figure: Calculation of RaH (Quadrant) and RbH (Ratio) Based on RcH (Y) and RdH (X) Values

    ATANPU2(Y,X) = Quadrant + ATANPU(Ratio)

    Region of the X-Y plane                      Quadrant    Ratio
    (|Y| <= |X|) & (X >= 0)                       0.0         Y/X
    (|Y| >  |X|) & (Y >= 0)                       0.25        -X/Y
    (|Y| <= |X|) & (X < 0) & (Y >= 0)             0.5         Y/X
    (|Y| <= |X|) & (X < 0) & (Y < 0)             -0.5         Y/X
    (|Y| >  |X|) & (Y < 0)                       -0.25        -X/Y

    Note: If( (Y == 0) & (X == 0) ) { Ratio = 0.0; Quadrant = 0.0; }

    Angle markers on the circle: 0.125 (PU) = pi/4, 0.25 (PU) = pi/2, 0.375 (PU) = 3*pi/4, 0.5 (PU) = pi, ~-0.5 (PU) = ~-pi, -0.375 (PU) = -3*pi/4, -0.25 (PU) = -pi/2, -0.125 (PU) = -pi/4

Revision History

Changes from March 15, 2014 to November 7, 2015
• Section 2.4.2: Added the last paragraph to this section
• Section 1.4.4: Revised this paragraph.
• Chapter 2: C28 Viterbi, Complex Math and CRC Unit-II (VCU-II): Made changes to the majority of the LSW and MSW opcodes.

IMPORTANT NOTICE

Texas Instruments Incorporated and its subsidiaries (TI) reserve the right to make corrections, enhancements, improvements and other changes to its semiconductor products and services per JESD46, latest issue, and to discontinue any product or service per JESD48, latest issue. Buyers should obtain the latest relevant information before placing orders and should verify that such information is current and complete. All semiconductor products (also referred to herein as “components”) are sold subject to TI’s terms and conditions of sale supplied at the time of order acknowledgment.
TI warrants performance of its components to the specifications applicable at the time of sale, in accordance with the warranty in TI’s terms and conditions of sale of semiconductor products. Testing and other quality control techniques are used to the extent TI deems necessary to support this warranty. Except where mandated by applicable law, testing of all parameters of each component is not necessarily performed. TI assumes no liability for applications assistance or the design of Buyers’ products. Buyers are responsible for their products and applications using TI components. To minimize the risks associated with Buyers’ products and applications, Buyers should provide adequate design and operating safeguards. TI does not warrant or represent that any license, either express or implied, is granted under any patent right, copyright, mask work right, or other intellectual property right relating to any combination, machine, or process in which TI components or services are used. Information published by TI regarding third-party products or services does not constitute a license to use such products or services or a warranty or endorsement thereof. Use of such information may require a license from a third party under the patents or other intellectual property of the third party, or a license from TI under the patents or other intellectual property of TI. Reproduction of significant portions of TI information in TI data books or data sheets is permissible only if reproduction is without alteration and is accompanied by all associated warranties, conditions, limitations, and notices. TI is not responsible or liable for such altered documentation. Information of third parties may be subject to additional restrictions. 
Resale of TI components or services with statements different from or beyond the parameters stated by TI for that component or service voids all express and any implied warranties for the associated TI component or service and is an unfair and deceptive business practice. TI is not responsible or liable for any such statements. Buyer acknowledges and agrees that it is solely responsible for compliance with all legal, regulatory and safety-related requirements concerning its products, and any use of TI components in its applications, notwithstanding any applications-related information or support that may be provided by TI. Buyer represents and agrees that it has all the necessary expertise to create and implement safeguards which anticipate dangerous consequences of failures, monitor failures and their consequences, lessen the likelihood of failures that might cause harm and take appropriate remedial actions. Buyer will fully indemnify TI and its representatives against any damages arising out of the use of any TI components in safety-critical applications. In some cases, TI components may be promoted specifically to facilitate safety-related applications. With such components, TI’s goal is to help enable customers to design and create their own end-product solutions that meet applicable functional safety standards and requirements. Nonetheless, such components are subject to these terms. No TI components are authorized for use in FDA Class III (or similar life-critical medical equipment) unless authorized officers of the parties have executed a special agreement specifically governing such use. Only those TI components which TI has specifically designated as military grade or “enhanced plastic” are designed and intended for use in military/aerospace applications or environments. 
Buyer acknowledges and agrees that any military or aerospace use of TI components which have not been so designated is solely at the Buyer's risk, and that Buyer is solely responsible for compliance with all legal and regulatory requirements in connection with such use. TI has specifically designated certain components as meeting ISO/TS16949 requirements, mainly for automotive use. In any case of use of non-designated products, TI will not be responsible for any failure to meet ISO/TS16949.

Mailing Address: Texas Instruments, Post Office Box 655303, Dallas, Texas 75265
Copyright © 2015, Texas Instruments Incorporated