VU User's Manual

SCE Confidential

SCE CONFIDENTIAL VU User's Manual Version 6.0

Publication date: April 2002

Sony Computer Entertainment Inc.

1-1, Akasaka 7-chome, Minato-ku

Tokyo 107-0052 Japan

Sony Computer Entertainment America

919 East Hillsdale Blvd.

Foster City, CA 94404, U.S.A.

Sony Computer Entertainment Europe

30 Golden Square

London W1F 9LD, U.K.

The VU User’s Manual is supplied pursuant to and subject to the terms of the Sony Computer Entertainment PlayStation® license agreements.

The VU User’s Manual is intended for distribution to and use by only Sony Computer Entertainment licensed Developers and Publishers in accordance with the PlayStation® license agreements.

Unauthorized reproduction, distribution, lending, rental or disclosure to any third party, in whole or in part, of this book is expressly prohibited by law and by the terms of the Sony Computer Entertainment PlayStation® license agreements.

Ownership of the physical property of the book is retained by and reserved by Sony Computer Entertainment.

Alteration to or deletion, in whole or in part, of the book, its presentation, or its contents is prohibited.

The information in the VU User’s Manual is subject to change without notice. The content of this book is Confidential Information of Sony Computer Entertainment.

® and PlayStation® are registered trademarks, and GRAPHICS SYNTHESIZER™ and EMOTION ENGINE™ are trademarks of Sony Computer Entertainment Inc. All other trademarks are property of their respective owners and/or their licensors.

About This Manual

The "VU User's Manual" describes the operational functions of the VPU (Vector Operation Unit) built into the Emotion Engine. For information on data and microprogram transfers to the VPU, refer to the "EE User's Manual".

- Chapter 1 "VU Overview" describes the configurations of the VPU and the VU (the core of the VPU), the differences between the two embedded VPUs (VPU0/VPU1), and the operation modes.

- Chapter 2 "Data/Calculation Basic Specifications" describes the numerical data formats used by the VU, rounding-off operations in calculations, and exception specifications. Note that these specifications do not fully conform to the requirements of the IEEE 754 standard.

- Chapter 3 "Micro Mode" describes the architecture and operation of micro mode, in which the VU operates as a stand-alone processor.

- Chapter 4 "Micro Mode Instruction Reference" describes the individual micro mode instructions.

- Chapter 5 "Macro Mode" describes the architecture and operation of macro mode, in which VPU0 operates as a coprocessor of the EE Core. This chapter also explains how to control VPU1 from the EE Core.

- Chapter 6 "Macro Mode Instruction Reference" describes the individual macro mode instructions.

- Chapter 7 "Appendix" gives other information, including sample microprograms.

Changes Since Release of 5th Edition

Since the release of the 5th Edition of the VU User’s Manual, the following changes have been made. Note that each of these changes is indicated by a revision bar in the margin of the affected page.

Ch. 1: VU Overview

• Information about the derivation of the M series polynomial was added to “RANDU” in Section 1.1.2. Lower Execution Unit, on page 17.

Ch. 3: Micro Mode

• A correction was made to the description following the figure in Section 3.4.9. XGKICK Pipeline, on page 52.

Ch. 4: Micro Mode Instruction Reference

• A correction was made to the “Operation Code” figure in the EATANxz reference on page 134.

• A correction was made to “Example” in the IBGTZ reference on page 164.

• A correction was made to “Example” in the IBLEZ reference on page 165.

• A correction was made to “Example” in the IBLTZ reference on page 166.

• A correction was made to “Mnemonic” in the ISW reference on page 173.

• Information about the XGKICK pipeline and XGKICK synchronization was added to “Remarks” in the XGKICK reference on page 196.

• Information was added to “Operation” in the XITOP reference on page 197.

Ch. 6: Macro Mode Instruction Reference

• A correction was made to “Operation” in the QMTC2 reference on page 238.

Glossary

Term              Definition
EE                Emotion Engine. CPU of the PlayStation 2.
EE Core           Generalized computation and control unit of EE. Core of the CPU.
COP0              EE Core system control coprocessor.
COP1              EE Core floating-point operation coprocessor. Also referred to as FPU.
COP2              Vector operation unit coupled as a coprocessor of EE Core. VPU0.
GS                Graphics Synthesizer. Graphics processor connected to EE.
GIF               EE Interface unit to GS.
IOP               Processor connected to EE for controlling input/output devices.
SBUS              Bus connecting EE to IOP.
VPU (VPU0/VPU1)   Vector operation unit. EE contains 2 VPUs: VPU0 and VPU1.
VU (VU0/VU1)      VPU core operation unit.
VIF (VIF0/VIF1)   VPU data decompression unit.
VIFcode           Instruction code for VIF.
SPR               Quick-access data memory built into EE Core (Scratchpad memory).
IPU               EE Image processor unit.
word              Unit of data length: 32 bits
qword             Unit of data length: 128 bits
Slice             Physical unit of DMA transfer: 8 qwords or less
Packet            Data to be handled as a logical unit for transfer processing.
Transfer list     A group of packets transferred in serial DMA transfer processing.
Tag               Additional data indicating data size and other attributes of packets.
DMAtag            Tag positioned first in DMA packet to indicate address/size of data and address of the following packet.
GS primitive      Data to indicate image elements such as point and triangle.
Context           A set of drawing information (e.g. texture, distant fog color, and dither matrix) applied to two or more primitives uniformly. Also referred to as the drawing environment.
GIFtag            Additional data to indicate attributes of GS primitives.
Display list      A group of GS primitives to indicate batches of images.
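
The data-length units in the glossary (word, qword, Slice) can be related to byte counts with a short sketch. This is only an illustration of the arithmetic: the constant and function names below are hypothetical and do not come from any SCE header or library. Only the sizes (32 bits, 128 bits, 8 qwords or less per slice) are taken from the glossary.

```python
# Hypothetical names for illustration; only the sizes come from the glossary.
WORD_BITS = 32          # word: 32 bits
QWORD_BITS = 128        # qword: 128 bits
SLICE_MAX_QWORDS = 8    # Slice: 8 qwords or less

WORD_BYTES = WORD_BITS // 8
QWORD_BYTES = QWORD_BITS // 8
WORDS_PER_QWORD = QWORD_BITS // WORD_BITS
SLICE_MAX_BYTES = SLICE_MAX_QWORDS * QWORD_BYTES


def min_slices(qwords: int) -> int:
    """Lower bound on the number of slices needed to carry `qwords` qwords.

    Because a slice holds at most 8 qwords, no transfer can use fewer
    slices than this. How the hardware actually splits a transfer is
    not modeled here.
    """
    if qwords < 0:
        raise ValueError("qword count must be non-negative")
    # Ceiling division without floating point.
    return -(-qwords // SLICE_MAX_QWORDS)


if __name__ == "__main__":
    print("bytes per word:", WORD_BYTES)
    print("bytes per qword:", QWORD_BYTES)
    print("words per qword:", WORDS_PER_QWORD)
    print("max bytes per slice:", SLICE_MAX_BYTES)
    print("min slices for 20 qwords:", min_slices(20))
```

The function gives a lower bound only, since the glossary says a slice may carry fewer than 8 qwords.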

Contents

1. VU Overview .........................................................................................................................................................................15

1.1. VPU Structure ................................................................................................................................................................16

1.1.1. Upper Execution Unit ...........................................................................................................................................16

1.1.2. Lower Execution Unit ...........................................................................................................................................17

1.1.3. Floating-Point Registers ........................................................................................................................................18

1.1.4. Integer Register.......................................................................................................................................................18

1.1.5. VU Mem..................................................................................................................................................................18

1.1.6. Micro Mem..............................................................................................................................................................19

1.2. VU Execution Mode .....................................................................................................................................................20

1.3. VU Operation Status.....................................................................................................................................................21

1.3.1. Ready State ..............................................................................................................................................................21

1.3.2. Run State .................................................................................................................................................................21

1.3.3. Stop State.................................................................................................................................................................21

1.4. VU Usage........................................................................................................................................................................23

1.4.1. VU1 Usage Outline................................................................................................................................................23

1.4.2. VU0 Usage Outline................................................................................................................................................23

2. Data/Calculation Basic Specifications................................................................................................................................25

2.1. Data Format ................................................................................................................................................................... 26

2.1.1. Floating-Point Values ............................................................................................................................................26

2.1.2. Fixed-Point Values.................................................................................................................................................26

2.2. Rounding Off Floating-Point Values..........................................................................................................................27

2.3. Exception Processing ....................................................................................................................................................28

2.4. Differences from IEEE 754......................................................................................................................................... 29

3. Micro Mode............................................................................................................................................................................31

3.1. Micro Mode Register Set...............................................................................................................................................32

3.1.1. Floating-Point Registers ........................................................................................................................................32

3.1.2. Integer Registers.....................................................................................................................................................32

3.1.3. ACC Registers......................................................................................................................................................... 33

3.1.4. I Register..................................................................................................................................................................33

3.1.5. Q Register................................................................................................................................................................34

3.1.6. R Register ................................................................................................................................................................ 34

3.1.7. P Register................................................................................................................................................................. 34

3.2. Micro Instruction Set Overview .................................................................................................................................. 35

3.2.1. Upper Instructions ................................................................................................................................................. 35

3.2.2. Lower Instructions.................................................................................................................................................37

3.3. Flags................................................................................................................................................................................. 39

3.3.1. MAC Flags............................................................................................................................................................... 39

3.3.2. Status Flags (SF) .....................................................................................................................................................39

3.3.3. Clipping Flags (CF) ................................................................................................................................................ 40

3.3.4. Flag Set Instructions ..............................................................................................................................................40

3.3.5. Flag Changes for Each Instruction.......................................................................................................................41

3.3.6. Flag Changes for Exception Occurrences...........................................................................................................41

3.4. Pipeline Operation .........................................................................................................................................................44

3.4.1. Hazards.....................................................................................................................................................................44

3.4.2. Upper Instruction and Lower Instruction...........................................................................................................44

3.4.3. Priority for Writing to a Register ..........................................................................................................................44

3.4.4. FMAC Pipeline........................................................................................................................................................45

3.4.5. FDIV Pipeline .........................................................................................................................................................46

3.4.6. EFU Pipeline ...........................................................................................................................................................48

3.4.7. IALU Pipeline .........................................................................................................................................................49

3.4.8. Conditional Branching and Pipeline.....................................................................................................................50

3.4.9. XGKICK Pipeline ..................................................................................................................................................52

3.5. Micro Subroutine Execution.........................................................................................................................................53

3.5.1. How to Execute a Micro Subroutine ...................................................................................................................53

3.5.2. How to Terminate a Micro Subroutine ...............................................................................................................53

3.5.3. Operation of Execution and Termination...........................................................................................................53

3.6. Other Functions .............................................................................................................................................................55

3.6.1. Data Transfer with VU Mem/Micro Mem .........................................................................................................55

3.6.2. Debug Support Function.......................................................................................................................................55

4. Micro Mode Instruction Reference .....................................................................................................................................57

4.1. Micro Mode Instruction Set..........................................................................................................................................58

4.1.1. Types of Upper Instructions..................................................................................................................................58

4.1.2. Types of Lower Instructions.................................................................................................................................59

4.1.3. Operation Fields for Micro Instructions .............................................................................................................61

4.2. Upper Instruction Reference ........................................................................................................................................65

ABS : Absolute Value...................................................................................................................................................66

ADD : Add....................................................................................................................................................................67

ADDi : Add to I Register ............................................................................................................................................68

ADDq : Add to Q Register .........................................................................................................................................69

ADDbc : Broadcast Add .............................................................................................................................................70

ADDA : Add; to Accumulator ...................................................................................................................................71

ADDAi : Add I Register; to Accumulator.................................................................................................................72

ADDAq : Add Q Register; to Accumulator .............................................................................................................73

ADDAbc : Broadcast Add; to Accumulator.............................................................................................................74

CLIP : Clipping Judgment ...........................................................................................................................................75

FTOI0 : Convert to Fixed Point.................................................................................................................................77

FTOI4 : Convert to Fixed Point.................................................................................................................................78

FTOI12 : Convert to Fixed Point ..............................................................................................................................79

FTOI15 : Convert to Fixed Point ..............................................................................................................................80

ITOF0 : Convert to Floating-Point Number............................................................................................................81

ITOF4 : Convert to Floating-Point Number............................................................................................................82

ITOF12 : Convert to Floating-Point Number..........................................................................................................83

ITOF15 : Convert to Floating-Point Number..........................................................................................................84

MADD : Product Sum.................................................................................................................................................85

MADDi : Product Sum; with I Register ....................................................................................................................86

MADDq : Product Sum; by Q Register.....................................................................................................................87

MADDbc : Broadcast Product Sum..........................................................................................................................88

MADDA : Product Sum; to Accumulator................................................................................................................ 89

MADDAi : Product Sum; by I Register, to Accumulator ........................................................................................90

MADDAq : Product Sum; by Q Register, to Accumulator.................................................................................... 91

MADDAbc : Broadcast Product Sum; to Accumulator ......................................................................................... 92

MAX : Maximum Value ..............................................................................................................................................93

MAXi : Maximum Value.............................................................................................................................................94

MAXbc : Maximum Value .......................................................................................................................................... 95

MINI : Minimum Value .............................................................................................................................................. 96

MINIi : Minimum Value .............................................................................................................................................97

MINIbc : Minimum Value .......................................................................................................................................... 98

MSUB : Multiply and Subtract.................................................................................................................................... 99

MSUBi : Multiply and Subtract; with I Register.....................................................................................................100

MSUBq : Multiply and Subtract; by Q Register .....................................................................................................101

MSUBbc : Broadcast Multiply and Subtract........................................................................................................... 102

MSUBA : Multiply and Subtract; to Accumulator.................................................................................................103

MSUBAi : Multiply and Subtract; with I Register, to Accumulator ....................................................................104

MSUBAq : Multiply and Subtract; by Q Register, to Accumulator.....................................................................105

MSUBAbc : Broadcast Multiply and Subtract; to Accumulator ..........................................................................106

MUL : Multiply...........................................................................................................................................................107

MULi : Multiply by I Register...................................................................................................................................108

MULq : Multiply by Q Register................................................................................................................................109

MULbc : Multiply by Broadcast ...............................................................................................................................110

MULA : Multiply; to Accumulator ..........................................................................................................................111

MULAi : Multiply by I Register, to Accumulator ..................................................................................................112

MULAq : Multiply by Q Register, to Accumulator ...............................................................................................113

MULAbc : Multiply by Broadcast, to Accumulator.............................................................................114

NOP : No Operation.................................................................................................................................................115

OPMULA : Vector Outer Product..........................................................................................................................116

OPMSUB : Vector Outer Product...........................................................................................................................117

SUB : Subtract ............................................................................................................................................................118

SUBi : Subtract I Register..........................................................................................................................................119

SUBq : Subtract Q Register ......................................................................................................................................120

SUBbc : Broadcast Subtract......................................................................................................................................121

SUBA : Subtract; to Accumulator ..........................................................................................................................122

SUBAi : Subtract I Register; to Accumulator.........................................................................................................123

SUBAq : Subtract Q Register; to Accumulator......................................................................................................124

SUBAbc : Broadcast Subtract; to Accumulator .....................................................................................................125

4.3. Lower Instruction Reference......................................................................................................................................126

B : Unconditional Branch..........................................................................................................................................127

BAL : Unconditional Branch with Saving Address ...............................................................................................128

DIV : Divide ............................................................................................................................................................... 129

EATAN : Arctangent ................................................................................................................................................130

EATANxy : Arctangent ............................................................................................................................................132

EATANxz : Arctangent ............................................................................................................................................134

EEXP : Exponent......................................................................................................................................................136

ELENG : Length....................................................................................................................................................... 137

ERCPR : Reciprocal Number .................................................................................................................................. 138

ERLENG : Reciprocal Number of Length............................................................................................................ 139

ERSADD : Reciprocal Number .............................................................................................................................. 140

ERSQRT : Reciprocal Number of Square Root.................................................................................................... 141

ESADD : Sum of Square Numbers ........................................................................................................................ 142

ESIN : Sine................................................................................................................................................................. 143

ESQRT : Square Root............................................................................................................................................... 144

ESUM : Sum of Each Field...................................................................................................................................... 145

FCAND : Test Clipping Flag................................................................................................................................... 146

FCEQ : Test Clipping Flag....................................................................................................................................... 147

FCGET : Get Clipping Flag..................................................................................................................................... 148

FCOR : Test Clipping Flag....................................................................................................................................... 149

FCSET : Setting Clipping Flag................................................................................................................................. 150

FMAND : Test MAC Flag Check ........................................................................................................................... 151

FMEQ : Test MAC Flag Check............................................................................................................................... 152

FMOR : Test MAC Flag Check............................................................................................................................... 153

FSAND : Test Status Flag Check............................................................................................................................ 154

FSEQ : Test Status Flag Check................................................................................................................................ 155

FSOR : Test Status Flag............................................................................................................................................ 156

FSSET : Set Sticky Flags........................................................................................................................................... 157

IADD : Add Integer ............................................................................................................................................... 158

IADDI : Add Immediate Value Integer ................................................................................................................. 159

IADDIU : Add Immediate Integer ......................................................................................................................... 160

IAND : Logical Product ........................................................................................................................................... 161

IBEQ : Conditional Branch...................................................................................................................................... 162

IBGEZ : Conditional Branch................................................................................................................................... 163

IBGTZ : Conditional Branch................................................................................................................................... 164

IBLEZ : Conditional Branch ................................................................................................................................... 165

IBLTZ : Conditional Branch.................................................................................................................................... 166

IBNE : Conditional Branch...................................................................................................................................... 167

ILW : Integer Load with Offset Specification ....................................................................................................... 168

ILWR : Integer Load................................................................................................................................................. 169

IOR : Logical Sum ..................................................................................................................................................... 170

ISUB : Integer Subtract............................................................................................................................................. 171

ISUBIU : Immediate Value Integer Subtract......................................................................................................... 172

ISW : Integer Store with Offset............................................................................................................................... 173

ISWR : Integer Store ................................................................................................................................................. 174

JALR : Unconditional Jump with Address Saving ................................................................................................ 175

JR : Unconditional Jump........................................................................................................................................... 176

LQ : Load Qword...................................................................................................................................................... 177

LQD : Load Qword with Pre-Decrement.............................................................................................................. 178

LQI : Load with Post-Increment............................................................................................................................. 179

MFIR : Move from Integer Register to Floating-Point Register......................................................................... 180

MFP : Move from P Register to Floating-Point Register..................................................................................... 181

MOVE : Transfer between Floating-Point Registers............................................................................................ 182


MR32 : Move with Rotate.........................................................................................................................................183

MTIR : Move from Floating-Point Register to Integer Register .........................................................................184

RGET : Get Random Number ................................................................................................................................185

RINIT : Random Number Initialize........................................................................................................................186

RNEXT : Next Random Number ...........................................................................................................................187

RSQRT : Square Root Division ...............................................................................................................................188

RXOR : Random Number Set .................................................................................................................................189

SQ : Store Qword with Offset .................................................................................................................................190

SQD : Store Qword with Pre-Decrement ..............................................................................................................191

SQI : Store with Post-Increment..............................................................................................................................192

SQRT : Square Root ..................................................................................................................................................193

WAITP : P Register Synchronize ..............................................................................................................................194

WAITQ : Q Register Synchronize ............................................................................................................................195

XGKICK : GIF Control...........................................................................................................................................196

XITOP : VIF Control................................................................................................................................................197

XTOP : VIF Control .................................................................................................................................................198

5. Macro Mode.........................................................................................................................................................................199

5.1. Macro Mode Register Set............................................................................................................................................200

5.1.1. Floating-Point Registers ......................................................................................................................................200

5.1.2. Integer Registers...................................................................................................................................................200

5.1.3. Control Registers..................................................................................................................................................200

5.1.4. Special Registers ...................................................................................................................................................205

5.2. Macro Instruction Set Overview................................................................................................................................206

5.2.1. MIPS COP2 Instructions ....................................................................................................................................206

5.2.2. Coprocessor Transfer Instructions ....................................................................................................................206

5.2.3. Coprocessor Branch Instructions ......................................................................................................................206

5.2.4. Coprocessor Calculation Instructions ...............................................................................................................206

5.2.5. Micro Subroutine Execution Instructions ........................................................................................................208

5.3. Flags............................................................................................................................................................................... 209

5.4. Macro Mode Pipeline ..................................................................................................................................................210

5.4.1. Pipeline Structure of Macroinstructions............................................................................................................210

5.4.2. Hazards in Macro Mode......................................................................................................................................210

5.4.3. Macroinstruction Operation ...............................................................................................................................211

5.4.4. Operation when Transferring Data with EE Core..........................................................................................211

5.4.5. Operation when Executing a Micro Subroutine ..............................................................................................215

5.4.6. Micro Subroutine and Data Transfer Operations............................................................................................216

5.4.7. Q Register Synchronization ................................................................................................................................218

5.4.8. Notes on Other Pipeline Operations.................................................................................................................218

5.5. VU1 Control.................................................................................................................................................................220

5.5.1. MIPS COP2 Condition Signal............................................................................................................................220

5.5.2. MIPS COP2 Control Register ............................................................................................................................220

5.5.3. Floating-Point Registers ......................................................................................................................................221

5.5.4. Integer Registers...................................................................................................................................................222

5.5.5. Control Registers..................................................................................................................................................222

6. Macro Mode Instruction Reference..................................................................................................................................225

6.1. Macro Instruction Operation Code...........................................................................................................................226


6.1.1. Macro Instruction Operation Type ................................................................................................................... 226

6.1.2. Macro Instruction Operation Field ................................................................................................................... 227

6.2. Macro Instruction Set ................................................................................................................................................. 229

BC2F : Branch on COP2 Conditional Signal......................................................................................................... 230

BC2FL : Branch on COP2 Conditional Signal...................................................................................................... 231

BC2T : Branch on COP2 Conditional Signal ........................................................................................................ 232

BC2TL : Branch on COP2 Conditional Signal ...................................................................................................... 233

CFC2 : Transfer Integer Data from VU to EE Core ........................................................................................... 234

CTC2 : Transfer Integer Data from EE Core to VU ........................................................................................... 235

LQC2 : Floating-Point Data Transfer from EE Core to VU.............................................................................. 236

QMFC2 : Floating-Point Data Transfer from VU to EE Core........................................................................... 237

QMTC2 : Floating-Point Data Transfer from EE Core to VU .......................................................................... 238

SQC2 : Floating-Point Data Transfer from VU to EE Core............................................................................... 239

VABS : Absolute Value............................................................................................................................................. 240

VADD : Add.............................................................................................................................................................. 241

VADDi : Add to I Register ...................................................................................................................................... 242

VADDq : Add to Q Register ................................................................................................................................... 243

VADDbc : Broadcast Add ....................................................................................................................................... 244

VADDA : Add to Accumulator .............................................................................................................................. 245

VADDAi : Add I Register to Accumulator ........................................................................................................... 246

VADDAq : Add Q Register to Accumulator ........................................................................................................ 247

VADDAbc : Broadcast Add to Accumulator........................................................................................................ 248

VCALLMS : Start Micro Sub-Routine.................................................................................................................... 249

VCALLMSR : Start Micro Sub-Routine by Register............................................................................................. 250

VCLIP : Clipping Judgment..................................................................................................................................... 251

VDIV : Divide............................................................................................................................................................ 252

VFTOI0 : Conversion to Fixed Point..................................................................................................................... 253

VFTOI4 : Conversion to Fixed Point..................................................................................................................... 254

VFTOI12 : Conversion to Fixed Point .................................................................................................................. 255

VFTOI15 : Conversion to Fixed Point .................................................................................................................. 256

VIADD : Add Integer............................................................................................................................................... 257

VIADDI : Add Immediate Value Integer .............................................................................................................. 258

VIAND : Logical Product ........................................................................................................................................ 259

VILWR : Integer Load.............................................................................................................................................. 260

VIOR : Logical Sum.................................................................................................................................................. 261

VISUB : Integer Subtract.......................................................................................................................................... 262

VISWR : Integer Store .............................................................................................................................................. 263

VITOF0 : Conversion to Floating-Point Number................................................................................................ 264

VITOF4 : Conversion to Floating-Point Number................................................................................................ 265

VITOF12 : Conversion to Floating-Point Number.............................................................................................. 266

VITOF15 : Conversion to Floating-Point Number.............................................................................................. 267

VLQD : Load with Pre-Decrement ........................................................................................................................ 268

VLQI : Load with Post-Increment.......................................................................................................................... 269

VMADD : Product Sum........................................................................................................................................... 270

VMADDi : Product Sum; with I Register .............................................................................................................. 271

VMADDq : Product Sum; with Q Register........................................................................................................... 272


VMADDbc : Broadcast Product Sum.....................................................................................................................273

VMADDA : Product Sum; to Accumulator...........................................................................................................274

VMADDAi : Product Sum; with I Register, to Accumulator..............................................................................275

VMADDAq : Product Sum; with Q Register, to Accumulator ...........................................................................276

VMADDAbc : Broadcast Product Sum; to Accumulator ....................................................................................277

VMAX : Maximum Value .........................................................................................................................................278

VMAXi : Maximum Value........................................................................................................................................279

VMAXbc : Maximum Value.....................................................................................................................................280

VMFIR : Transfer from Integer Register to Floating-Point Register..................................................................281

VMINI : Minimum Value .........................................................................................................................................282

VMINIi : Minimum Value ........................................................................................................................................283

VMINIbc : Minimum Value.....................................................................................................................................284

VMOVE : Transfer between Floating-Point Registers.........................................................................................285

VMR32 : Vector Rotate.............................................................................................................................................286

VMSUB : Multiply and Subtract ..............................................................................................................................287

VMSUBi : Multiply and Subtract with I Register...................................................................................................288

VMSUBq : Multiply and Subtract; Q Register........................................................................................................289

VMSUBbc : Broadcast Multiply and Subtract........................................................................................................290

VMSUBA : Multiply and Subtract; to Accumulator..............................................................................................291

VMSUBAi : Multiply and Subtract; with I Register, to Accumulator .................................................................292

VMSUBAq : Multiply and Subtract; with Q Register, to Accumulator ..............................................................293

VMSUBAbc : Broadcast Multiply and Subtract; to Accumulator .......................................................................294

VMTIR : Transfer from Floating-Point Register to Integer Register .................................................................295

VMUL : Multiply........................................................................................................................................................296

VMULi : Multiply; by I Register...............................................................................................................................297

VMULq : Multiply; by Q Register............................................................................................................................298

VMULbc : Broadcast Multiply .................................................................................................................................299

VMULA : Multiply; to Accumulator .......................................................................................................................300

VMULAi : Multiply by I Register; to Accumulator ...............................................................................................301

VMULAq : Multiply by Q Register; to Accumulator ............................................................................................302

VMULAbc : Broadcast Multiply; to Accumulator.................................................................................................303

VNOP : No Operation..............................................................................................................................................304

VOPMULA : Vector Outer Product.......................................................................................................................305

VOPMSUB : Vector Outer Product........................................................................................................................306

VRGET : Get Random Numbers............................................................................................................................307

VRINIT : Random Number Initial Set...................................................................................................................308

VRNEXT : New Random Numbers.......................................................................................................................309

VRSQRT : Square Root Division ............................................................................................................................310

VRXOR : Random Number Set ..............................................................................................................................311

VSQD : Store with Pre-Decrement.........................................................................................................................312

VSQI : Store with Post-Increment ..........................................................................................................................313

VSQRT : Square Root ...............................................................................................................................................314

VSUB : Subtract .........................................................................................................................................................315

VSUBi : Subtract I Register ......................................................................................................................................316

VSUBq : Subtract Q Register ...................................................................................................................................317

VSUBbc : Broadcast Subtract...................................................................................................................................318


VSUBA : Subtract; to Accumulator ........................................................................................................................ 319

VSUBAi : Subtract I Register; to Accumulator ..................................................................................................... 320

VSUBAq : Subtract Q Register; to Accumulator .................................................................................................. 321

VSUBAbc : Broadcast Subtract; to Accumulator.................................................................................................. 322

VWAITQ : Q Register Synchronize ....................................................................................................................... 323

7. Appendix.............................................................................................................................................................................. 325

7.1. Sample Micro Programs ............................................................................................................................................. 326

7.2. EFU Processing........................................................................................................................................................... 353

7.3. Micro Subroutine Debugging .................................................................................................................................... 362

7.3.1. Debug Flow.......................................................................................................................................................... 362

7.3.2. Notes on Re-execution........................................................................................................................................ 363

7.4. Throughput / Latency List ........................................................................................................................................ 365


1. VU Overview

The VU is a vector ALU that efficiently performs four-element floating-point vector calculations. It is part of

the VPU, along with the VU Mem (VU Data Memory) and the VIF (compressed data expansion engine).

Two VPUs are mounted on the EE as shown in Figure 1-1.

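[Figure: block diagram. Labeled blocks: Main Memory, GIF, EE core; VPU0, containing VIF0 and VU0 (VU core, VU registers, Micro Mem0 4 Kbytes, VU Mem0 4 Kbytes); VPU1, containing VIF1 and VU1 (VU core, VU registers, EFU, Micro Mem1 16 Kbytes, VU Mem1 16 Kbytes).]

Figure 1-1 VPU System Outline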

VU0 is joined to the EE Core as COP2 via a coprocessor connection. It assists the EE Core in non-stationary

geometry processing. It has a 4-Kbyte instruction memory (MicroMem0) and a 4-Kbyte data memory (VU

Mem0).

VU1 operates independently, and is chiefly in charge of background stationary geometry processing. VU1 has
an Elementary Function Unit (EFU), as well as a 16-Kbyte instruction memory (MicroMem1) and a 16-Kbyte
data memory (VU Mem1). VU1 is also connected to the GIF (the interface unit to the Graphics Synthesizer)
and implements the GIF control instruction (XGKICK). VU1's floating-point and integer registers
are mapped to VPU0's VU Mem0.


1.1. VPU Structure

Figure 1-2 is a block diagram of the VPU including the VU.

[Figure: block diagram. Labeled blocks: Micro Mem (4 Kbytes or 16 Kbytes) holding 64-bit instruction words (bits 63 to 0) made up of a 32-bit Upper Instruction and a 32-bit Lower Instruction; micro instruction fetch unit; Upper Execution Unit (FMACx, FMACy, FMACz, FMACw); Lower Execution Unit (FDIV, LSU, IALU, BRU, RANDU/etc., EFU); floating registers VF00-VF31 (COP2 data registers, bits 127 to 0); integer registers VI00-VI15 (16 bits); special registers VI16-VI31 (COP2 control registers); VU Mem (4 Kbytes or 16 Kbytes); VIF; External Units. Transfer paths are labeled QMTC2/LQC2, QMFC2/SQC2, CTC2 (32 bits), and CFC2 (32 bits). Bold lines denote 128-bit paths. The Vector Unit (VU) is drawn inside the Vector Processing Unit (VPU).]

Figure 1-2 VU Outline Block Diagram

The VPU consists of the VU, VU Mem (VU Data Memory), and the VIF (compressed data expansion engine).

The VU loads data in 128-bit units (single-precision floating-point number x 4) from the VU Mem, performs

calculations according to micro programs in the VU's internal MicroMem, and stores the results in the VU Mem.

The VU Mem may be used as a temporary area depending on the micro program.

Micro programs use a 64-bit LIW (Long Instruction Word) instruction format. Each instruction word can
concurrently execute a floating-point product-sum calculation in the upper 32 bits (the Upper instruction
field) and a floating-point division or integer calculation in the lower 32 bits (the Lower instruction field).
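
The split of an instruction word into its two fields can be sketched as follows. This is only an illustration of the bit layout described above; the function name `split_liw` is hypothetical and is not part of any VU tool.

```python
def split_liw(word):
    """Split a 64-bit LIW into its (upper, lower) 32-bit instruction fields."""
    if not 0 <= word < (1 << 64):
        raise ValueError("instruction word must fit in 64 bits")
    upper = (word >> 32) & 0xFFFFFFFF  # bits 63..32: Upper instruction field
    lower = word & 0xFFFFFFFF          # bits 31..0: Lower instruction field
    return upper, lower


upper, lower = split_liw(0x1122334455667788)
print(hex(upper), hex(lower))
```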

There are thirty-two 128-bit floating-point registers (four single-precision floating-point values each) and
sixteen 16-bit integer registers.

1.1.1. Upper Execution Unit

FMAC

This unit performs addition, subtraction, multiplication, and product-sum operations on floating-point
numbers. Four units are mounted in order to execute four-element vector calculations efficiently: FMACx,
FMACy, FMACz, and FMACw. To increase the efficiency of pipeline processing, the latency of all instructions
that use the FMAC has been unified at four cycles.
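
As a rough model of what the four FMAC units compute together, the sketch below applies a product-sum to each of the x, y, z, and w fields independently. The function name `product_sum` is hypothetical, and Python floats are used for simplicity, so the rounding and exceptional-value behavior of the real hardware is not modeled.

```python
def product_sum(acc, fs, ft):
    """Return acc + fs * ft, computed separately for the x, y, z, w fields.

    Each argument is a 4-element sequence standing in for one 128-bit
    register. One list element corresponds to one FMAC unit.
    """
    if not (len(acc) == len(fs) == len(ft) == 4):
        raise ValueError("each operand must have four fields")
    return tuple(a + s * t for a, s, t in zip(acc, fs, ft))


print(product_sum((1.0, 1.0, 1.0, 1.0), (1.0, 2.0, 3.0, 4.0), (2.0, 2.0, 2.0, 2.0)))
```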


1.1.2. Lower Execution Unit

FDIV

This unit performs self-synchronous high-speed floating-point division and square root calculations. It takes
single-precision floating-point values as input and stores the calculation result in the dedicated Q register.
A subsequent FDIV instruction cannot be executed while the FDIV is busy. If this is attempted, execution
stalls until the FDIV becomes available.

LSU

This unit controls Load/Store to and from VU Mem.

Load/Store must be performed in units of 128 bits, but the x, y, z, and w fields can be masked individually.
There are two ways of specifying addresses. The first specifies a base (integer) register and an offset in the
operation code field. The second specifies only the base register, with no offset, and performs
post-incrementing or pre-decrementing. The increment or decrement is +1 or -1 in units of 128-bit word
addresses.
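
The addressing modes and field masking described above can be sketched as follows. This is a simplified model under stated assumptions: VU Mem is represented as a Python list of 4-element lists indexed by 128-bit word address, the integer registers as a list `vi`, and all function names are hypothetical rather than actual instruction semantics.

```python
FIELDS = "xyzw"


def load_with_offset(mem, vi, base, offset):
    """Read the qword at address vi[base] + offset; the base is unchanged."""
    return list(mem[vi[base] + offset])


def load_post_increment(mem, vi, base):
    """Read the qword at address vi[base], then add 1 to the base register."""
    qword = list(mem[vi[base]])
    vi[base] = (vi[base] + 1) & 0xFFFF  # integer registers are 16 bits wide
    return qword


def load_pre_decrement(mem, vi, base):
    """Subtract 1 from the base register, then read the qword at vi[base]."""
    vi[base] = (vi[base] - 1) & 0xFFFF
    return list(mem[vi[base]])


def store_masked(mem, addr, value, mask):
    """Write only the fields named in mask (for example "xz") to mem[addr]."""
    for index, name in enumerate(FIELDS):
        if name in mask:
            mem[addr][index] = value[index]


# Eight qwords whose fields hold 0.0, 1.0, 2.0, ... in address order.
mem = [[float(4 * a + i) for i in range(4)] for a in range(8)]
vi = [0] * 16
vi[1] = 2
print(load_post_increment(mem, vi, 1), vi[1])
```

Addresses advance by one per 128-bit word, so a post-increment moves the base register to the next qword rather than to the next byte.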

IALU

This unit performs 16-bit integer calculations.

Loop counter calculation and Load/Store address calculation are performed using the Integer registers.

BRU

This unit controls jumping and conditional branching.

Most instructions specify PC-relative addresses for the jump target address. The offset is specified by an 11-

bit immediate value, so it is possible to jump within a range of 8 Kbytes before and after the PC. The JR and

JALR instructions are register indirect jump instructions, which use the data in a register as an absolute

address.

Conditional branching is performed by a comparison with one or two Integer registers. When doing

conditional branching based on the results of floating-point calculations, the results of the AND operation

with the MAC flag, status flag, or clipping flag and the appropriate mask value are temporarily stored in an

Integer register. Branching is performed by comparing with this register.

RANDU

This unit generates floating-point random numbers in the range +1.0 < r < +2.0. The mantissa is generated by an M-sequence (maximum-length sequence) from a seed that the user specifies. Because of the nature of the M-sequence, +1.0 does not appear among the random numbers. When 0 is specified as the seed, only +1.0 is produced, not a random number.

The M-sequence is represented by the following polynomial:

p(x) = x^{23} + x^{5} + 1
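The RANDU behavior described above can be illustrated with a small linear feedback shift register. This is only a sketch: the manual gives the polynomial but not the shift direction or tap wiring, so the Fibonacci form, the left shift, and the helper names `randu_next` and `randu_to_float` are all assumptions made for illustration.

```python
import struct

MANTISSA_MASK = 0x7FFFFF  # 23-bit state, used directly as the mantissa F


def randu_next(state: int) -> int:
    """Advance a 23-bit LFSR for x^23 + x^5 + 1 by one step.

    Assumption: Fibonacci form, shifting left, feedback = bit 22 XOR bit 4.
    """
    feedback = ((state >> 22) ^ (state >> 4)) & 1
    return ((state << 1) | feedback) & MANTISSA_MASK


def randu_to_float(state: int) -> float:
    """Use the state as the mantissa of a float with exponent 127 (1.F x 2^0)."""
    bits = 0x3F800000 | (state & MANTISSA_MASK)
    return struct.unpack("<f", struct.pack("<I", bits))[0]
```

An all-zero state never changes, so it always maps to +1.0. Any non-zero state stays non-zero under this update, which is why +1.0 itself cannot appear.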

EFU

This is the Elementary Function Unit, which performs calculations such as exponential, logarithmic, and

trigonometric functions. This unit is mounted only on VU1.

The EFU uses a scalar value (a single floating-point value) or vector value (four floating-point values) as

input, then stores the scalar value from the calculation result in the dedicated P register. Calculation latency

varies for each function. The next EFU instruction cannot be executed while the EFU is executing. The

EFU stalls if this is attempted.

1.1.3. Floating-Point Registers

The VU has 32 128-bit floating-point registers (VF00 – VF31), which are equivalent to four single-precision

floating-point values each. For a product-sum calculation, two 128-bit registers can be specified as source

registers and one 128-bit register can be specified as the destination register.

1.1.4. Integer Register

The VU has sixteen 16-bit Integer registers (VI00 – VI15). These registers are used for loop counters and

load/store address calculations.

1.1.5. VU Mem

The VU data memory capacity is 4 Kbytes for VU0, and 16 Kbytes for VU1. This memory is connected to the

LSU (Load/Store Unit) at a width of 128 bits, and the address is qword (16 bytes) aligned. The effective data

address must be divisible by 16: the address divided by 16 is specified in some instructions.

Address   Fields
0x0000    w  z  y  x
0x0010    w  z  y  x
   :
0x0ff0    w  z  y  x
   :      (addresses above 0x0ff0 are mounted on VU1 only)
0x3ff0    w  z  y  x

Furthermore, VU1 registers are mapped to addresses 0x4000 to 0x43ff in VU0.

1.1.6. Micro Mem

This on-chip memory stores 64-bit length LIW (Long Instruction Word) microinstructions. The capacity is 4

Kbytes for VU0, and 16 Kbytes for VU1.

Since the instruction length is 64 bits, instruction addresses must be divisible by 8. The address divided by 8 is

specified in branches and other instructions.

Address   Fields
0x0000    Upper  Lower
0x0008    Upper  Lower
   :
0x0ff8    Upper  Lower
   :      (addresses above 0x0ff8 are mounted on VU1 only)
0x3ff8    Upper  Lower
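Both memories are addressed in units of their access width: VU Mem in 16-byte qwords and Micro Mem in 8-byte instructions. The sketch below converts byte addresses to those unit indices and derives the byte range reachable with the 11-bit branch offset mentioned for the BRU. The function names are hypothetical, the offset is assumed to be a signed two's complement instruction count, and the VU1 register window at 0x4000 to 0x43ff is not modelled.

```python
# Capacities taken from the text: 4 Kbytes for VU0, 16 Kbytes for VU1.
MEM_BYTES = {"VU0": 4 * 1024, "VU1": 16 * 1024}
QWORD_BYTES = 16        # VU Mem access unit
INSTRUCTION_BYTES = 8   # one 64-bit LIW microinstruction


def _unit_index(byte_address: int, unit_bytes: int, vu: str) -> int:
    if byte_address % unit_bytes != 0:
        raise ValueError("address is not aligned to the access unit")
    if not 0 <= byte_address < MEM_BYTES[vu]:
        raise ValueError("address is outside the memory of " + vu)
    return byte_address // unit_bytes


def data_qword_index(byte_address: int, vu: str = "VU1") -> int:
    """VU Mem byte address -> qword address (address divided by 16)."""
    return _unit_index(byte_address, QWORD_BYTES, vu)


def instruction_index(byte_address: int, vu: str = "VU1") -> int:
    """Micro Mem byte address -> instruction address (address divided by 8)."""
    return _unit_index(byte_address, INSTRUCTION_BYTES, vu)


def branch_byte_range(offset_bits: int = 11):
    """Byte span covered by a signed offset counted in instructions (assumed)."""
    lowest = -(1 << (offset_bits - 1))
    highest = (1 << (offset_bits - 1)) - 1
    return lowest * INSTRUCTION_BYTES, highest * INSTRUCTION_BYTES
```

Under these assumptions the branch span is 1024 instructions backward and 1023 forward, which matches the "8 Kbytes before and after the PC" figure.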

1.2. VU Execution Mode

There are two VU execution modes: micro mode and macro mode.

In micro mode, the VU functions as a stand-alone processor. It executes microinstruction programs stored in

Micro Mem. VU1 operates in this mode.

In macro mode, the VU functions as COP2 (Coprocessor 2) of the EE Core. VU0 chiefly operates in this

mode.

Macroinstructions lack some of the functions available to microinstructions. In particular, an Upper instruction and a Lower instruction cannot be executed simultaneously as macroinstructions. However, it is possible to execute the VCALLMS/VCALLMSR instructions, which execute microinstruction programs in Micro Mem as subroutines, and the COP2 data transfer instructions, which transfer data to and from a VU register.

                    Micro mode (VU1/VU0)                      Macro mode (VU0)
Operation           Operates as a stand-alone processor       Operates as a coprocessor of EE Core
Operation code      64-bit long LIW instructions              32-bit MIPS COP2 instructions
Instruction set     Upper instruction + Lower instruction     Upper instruction
                    (can be specified simultaneously)         Lower instruction (partial)
                    EFU instructions (VU1 only)               VCALLMS, VCALLMSR
                    External unit control instructions        COP2 transfer instructions
                    (VU1 only)
Total instruction   127 instructions                          90 instructions
count
EFU                 Is usable as an option (VU1 only)         Is not supported
Registers           Floating-point registers: 32 x 128 bits   Floating-point registers: 32 x 128 bits
                    Integer registers: 16                     Integer registers: 16
                    Special registers: ACC, I, Q, R (,P)      Special registers: ACC, I, Q, R
                                                              Control registers: 16

1.3. VU Operation Status

The VU has three operation states: Ready, Run, and Stop. The following sections explain these states and their

transitions to other states. See also "5.1.3. Control Registers" for information regarding state transitions.

1.3.1. Ready State

The Ready state is a stand-by state. When power is turned on, the VU goes into the Ready state, and receives

micro subroutine start-up, macroinstructions, and coprocessor transfer instructions from the EE Core. The VU

can also receive micro program start-up from the VIF.

The VU shifts from the Ready state to the Run state when a micro subroutine is started or a macroinstruction is

executed. On reset, the control register is initialized and the VU goes into the Ready state again.

The VU enters the Stop state if a ForceBreak occurs.

1.3.2. Run State

The Run state is an execution state. In this state, the VU cannot accept micro subroutine start-up or macroinstructions from the EE Core. If the EE Core issues either of them, the EE Core stalls. A coprocessor transfer instruction may or may not stall, according to the user specification.

The VU shifts from the Run state to the Ready state at micro subroutine E bit termination or macroinstruction

execution termination. The VU shifts to the Stop state when a D bit halt, T bit halt, or ForceBreak occurs

during execution of a micro subroutine.

1.3.3. Stop State

The Stop state is used for debugging. In this state, the VU can receive micro subroutine start-up and

coprocessor transfer instructions from the EE Core. Operation is indeterminate if a macroinstruction is

executed. The VU cannot receive micro program start-up from the VIF.

The VU shifts from the Stop state to the Run state when the VCALLMS or VCALLMSR instruction is

executed, or the CMSAR1 register is written to. The control register is initialized by resetting, then the VU

enters the Ready state. There is no status shift with a ForceBreak.

From the point of view of the VIF, there is no difference between the Run state and the Stop state.

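[State transition diagram. The transitions it shows are listed below.]

From       To      Trigger
(any)      Ready   Power-on, Reset
Ready      Run     VCALLMS, VCALLMSR, micro subroutine start from VIF,
                   writing to CMSAR1(VI31), macroinstruction execution
Ready      Stop    ForceBreak
Run        Ready   E bit, macro calculation instruction end, Reset
Run        Stop    D bit, T bit, ForceBreak
Stop       Run     VCALLMS, VCALLMSR, writing to CMSAR1(VI31)
Stop       Ready   Reset
Stop       Stop    ForceBreak (no status shift)

Figure 1-3 VU Status Shift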
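The state shifts in sections 1.3.1 to 1.3.3 can be written as a lookup table, which may help when reasoning about debugger behavior. This is a minimal sketch. The event names are hypothetical labels chosen here, with `ee_micro_start` standing for VCALLMS, VCALLMSR, or a CMSAR1 write, and `vif_micro_start` for a start request from the VIF.

```python
# (state, event) -> next state, following the transitions described in the text.
TRANSITIONS = {
    ("Ready", "ee_micro_start"): "Run",
    ("Ready", "vif_micro_start"): "Run",
    ("Ready", "macro_exec"): "Run",
    ("Ready", "reset"): "Ready",
    ("Ready", "force_break"): "Stop",
    ("Run", "e_bit"): "Ready",
    ("Run", "macro_end"): "Ready",
    ("Run", "reset"): "Ready",
    ("Run", "d_bit"): "Stop",
    ("Run", "t_bit"): "Stop",
    ("Run", "force_break"): "Stop",
    ("Stop", "ee_micro_start"): "Run",
    ("Stop", "reset"): "Ready",
    ("Stop", "force_break"): "Stop",  # no status shift
}


def next_state(state: str, event: str) -> str:
    """Return the next VU state, or raise if the text defines no transition."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError("no transition for %s in state %s" % (event, state))
```

Pairs that are missing from the table, such as a VIF start request in the Stop state, are cases the text describes as not accepted or indeterminate, so the sketch rejects them rather than guessing.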

1.4. VU Usage

This section briefly explains VU usage from the point of view of the EE Core. For details, see the document

"EE User's Manual".

1.4.1. VU1 Usage Outline

VU1 operates independently of the EE Core, as a preprocessor of the GS. For this reason, it has its own on-chip data memory (VU Mem1) and instruction memory (MicroMem1), and is directly connected to the GS via the GIF.

Microinstruction programs executed in VU1 are DMA-transferred from main memory to MicroMem1 via VIF1.

Data required by VU1 is transferred to VU Mem1 via VIF1.

Two methods are used to start the microinstruction programs transferred to VU1:

1) Write the execution address to the control register (CMSAR1).

2) Specify the execution address by the VIFcode (MSCAL/MSCALF).

1) is used when returning from the Stop state and 2) is used in ordinary cases.

1.4.2. VU0 Usage Outline

Since VU0 is joined to the EE Core via a coprocessor connection, the VU0 resources can be controlled directly

by EE Core instructions (Macro mode).

By transferring a microinstruction program to VU0 on-chip instruction memory (MicroMem0) in the same way

as VU1, VU0 can be activated as a micro subroutine (Micro Mode).

MicroMem0 and VUMem0 are accessible by EE Core instructions and via VIF0 in the same way as VU1.

Two methods are used to start microinstruction programs in VU0:

1) Execute VCALLMS/VCALLMSR instruction.

2) Specify the execution address by the VIFcode (MSCAL/MSCALF).

2. Data/Calculation Basic Specifications

2.1. Data Format

Data used by the VU consists of single-precision floating-point values based on IEEE 754 and four types of 32-

bit fixed-point values. The formats are explained below.

2.1.1. Floating-Point Values

Only single-precision (32-bit) floating-point values are supported. The bit fields comply with IEEE 754 as

follows:

Bit      31      30 ... 23   22 ... 00
Field    S       E           F
Width    1 bit   8 bits      23 bits

Sign S: 1 bit
Exponent E: 8 bits    Biased exponent with a bias value of 127
Mantissa F: 23 bits   Excluding the hidden bit

The actual mantissa has one hidden bit prepended, becoming 1.F. A normalized value is (-1)^S x 1.F x 2^(E-127), and zero is represented by E = 0 (and F = 0). The VU does not support the NaN (not-a-number), infinity, and denormalized values of IEEE 754. See the table below.

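Exponent E   Mantissa F     Value with IEEE 754                          Value with VU
255          Other than 0   Not a number (NaN)                           (-1)^S x 1.F x 2^(+128)
255          0              +/- infinity                                 (-1)^S x 1.F x 2^(+128)
254          (any)          Normalized number (-1)^S x 1.F x 2^(+127)    Normalized number (-1)^S x 1.F x 2^(+127)
:::          :::            :::                                          :::
128          (any)          Normalized number (-1)^S x 1.F x 2^(+1)      Normalized number (-1)^S x 1.F x 2^(+1)
127          (any)          Normalized number (-1)^S x 1.F x 2^(+0)      Normalized number (-1)^S x 1.F x 2^(+0)
:::          :::            :::                                          :::
1            (any)          Normalized number (-1)^S x 1.F x 2^(-126)    Normalized number (-1)^S x 1.F x 2^(-126)
0            Other than 0   Denormalized value                           0
0            0              0                                            0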
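The "Value with VU" column can be expressed as a short decoding routine. The following is a sketch for illustration only. The function name `vu_float_value` is hypothetical, and the result is returned as a Python double so that values with exponent +128 remain representable.

```python
def vu_float_value(bits: int) -> float:
    """Interpret a 32-bit pattern the way the 'Value with VU' column describes."""
    sign = -1.0 if (bits >> 31) & 1 else 1.0
    exponent = (bits >> 23) & 0xFF
    fraction = bits & 0x7FFFFF
    if exponent == 0:
        # Both true zero and IEEE 754 denormalized patterns are treated as 0.
        return sign * 0.0
    mantissa = 1.0 + fraction / float(1 << 23)  # hidden bit prepended: 1.F
    # E = 255 is not special-cased: it is an ordinary exponent of +128.
    return sign * mantissa * 2.0 ** (exponent - 127)
```

For example, the pattern 0x7F800000, which is +infinity in IEEE 754, decodes here to 1.0 x 2^(+128).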

2.1.2. Fixed-Point Values

The VU supports four formats of 32-bit fixed-point values. The formats specify the number of bits to the right

of the decimal point: 0 bit, 4 bits, 12 bits, and 15 bits. 2's complement expressions are used for negative

numbers.

Format               Range of values                        Expression precision
0-bit fixed point    +2147483647 to -2147483648             1
4-bit fixed point    +134217720 to -134217720               0.0625
12-bit fixed point   +524287.96875 to -524287.96875         0.000244140625
15-bit fixed point   +65535.99609375 to -65535.99609375     0.000030517578125
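The precision and range of each format follow directly from the number of fraction bits in a 32-bit two's complement word. The sketch below computes them exactly with rational arithmetic. The function name is hypothetical, and the limits it derives from the bit layout may differ in their trailing digits from the ranges tabulated above.

```python
from fractions import Fraction


def fixed_point_limits(frac_bits: int, total_bits: int = 32):
    """Return (minimum, maximum, step) of a two's complement fixed-point format."""
    step = Fraction(1, 1 << frac_bits)               # weight of the lowest bit
    maximum = ((1 << (total_bits - 1)) - 1) * step   # 0x7FFFFFFF scaled
    minimum = -(1 << (total_bits - 1)) * step        # 0x80000000 scaled
    return minimum, maximum, step


if __name__ == "__main__":
    for bits in (0, 4, 12, 15):
        low, high, step = fixed_point_limits(bits)
        print(bits, float(low), float(high), float(step))
```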

2.2. Rounding Off Floating-Point Values

When calculating floating-point values and converting them to and from fixed-point values, rounding off is

performed as follows:

• Calculation

A 24-bit calculation including the hidden bit is performed, and the result is truncated. This corresponds to the round-toward-zero mode of IEEE 754, but the value of the least significant bit may differ from the IEEE 754 result.

• Conversion to fixed point

When converting from floating point to fixed point, truncation is made in the 0 direction.

• Conversion from fixed point

When converting from fixed point to floating point, truncation is made in the 0 direction. If the valid bit

count of the fixed-point value exceeds 24 bits, the upper 24 valid bits of the absolute value become the

mantissa, which includes the hidden bits. The remaining bits are truncated.
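The two conversion rules can be sketched as follows. The helper names `ftoi` and `itof` are hypothetical and only loosely modelled on the FTOI*/ITOF* instruction families. Clamping to the 32-bit two's complement limits on overflow is an assumption based on the "+MAX/-MAX" conversion overflow result described in the exception table.

```python
INT32_MAX = 2 ** 31 - 1
INT32_MIN = -(2 ** 31)


def ftoi(value: float, frac_bits: int) -> int:
    """Float -> fixed point with frac_bits fraction bits, truncating toward 0."""
    scaled = int(value * (1 << frac_bits))  # int() truncates toward zero
    return max(INT32_MIN, min(INT32_MAX, scaled))  # assumed clamp on overflow


def itof(fixed: int, frac_bits: int) -> float:
    """Fixed point -> float, keeping only the upper 24 valid bits."""
    magnitude = abs(fixed)
    excess = magnitude.bit_length() - 24
    if excess > 0:
        # Drop the bits below the top 24 (truncation toward zero).
        magnitude = (magnitude >> excess) << excess
    result = magnitude / (1 << frac_bits)
    return -result if fixed < 0 else result
```

Because truncation is applied to the absolute value, positive and negative inputs of the same magnitude convert symmetrically.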

2.3. Exception Processing

An exception in the VU means that calculation results differ from normal results (such as division by 0 and

overflow). The VU does not pause when an exception is generated. A flag is set, clamping of the calculation

results is performed, then processing continues.

Exceptions in floating-point calculations are shown in the table below.

Exception                         Calculation result   MAC / status flag         Sticky flag
0/0 (sign of 0 is valid)          +MAX/-MAX            I flag = 1                IS flag = 1
sqrt(x) (x < 0)                   sqrt(|x|)            I flag = 1                IS flag = 1
0 division                        +MAX/-MAX            D flag = 1                DS flag = 1
Exponent overflow                 +MAX/-MAX            Ox/Oy/Oz/Ow/O flag = 1    OS flag = 1
Exponent underflow                +0/-0                Ux/Uy/Uz/Uw/U flag = 1    US flag = 1
                                                       Zx/Zy/Zz/Zw/Z flag = 1    ZS flag = 1
Conversion overflow               +MAX/-MAX            None                      None
(floating point -> fixed point)

Note: D flag is not set for 0/0.
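As an illustration of "set a flag, clamp the result, continue", the sketch below models only the two division rows of the table. The function name `vu_div` is hypothetical. Two assumptions are made: MAX is taken as the largest value of the VU format (E = 255 with all mantissa bits set), and the sign of the clamped result is taken as the exclusive OR of the operand signs. Exponent overflow of an ordinary quotient is not modelled.

```python
import math

# Largest magnitude in the VU format: 1.F with F all ones, times 2^(+128).
VU_MAX = (2.0 - 2.0 ** -23) * 2.0 ** 128


def vu_div(a: float, b: float):
    """Return (result, I flag, D flag) for a / b without raising an exception."""
    negative = math.copysign(1.0, a) * math.copysign(1.0, b) < 0
    if b == 0.0:
        invalid = a == 0.0                      # 0/0 sets I, not D
        result = -VU_MAX if negative else VU_MAX
        return result, int(invalid), int(not invalid)
    return a / b, 0, 0
```

A caller that needs trap-like behavior would test the returned flags after the operation, in the same way a micro program uses a flag check instruction after the calculation instruction.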

2.4. Differences from IEEE 754

The following are differences between VU calculations and the IEEE 754 standard:

• Precision

The VU supports only single-precision floating-point values.

• Rounding off

Of the four IEEE 754 rounding modes, the VU performs rounding similar to round toward zero (truncation). Since the least significant bit may differ, this method is not exactly the same as IEEE 754.

• NaN/infinity/denormalized number

The VU does not support NaN (not-a-number) values, infinity, or denormalized numbers. Therefore, exceptions related to these items are not generated.

• Overflow/underflow

Overflow and underflow are detected only by the overflow and underflow of the exponent. In particular, underflow detection does not take into account the loss of precision that IEEE 754 prescribes.

• Trap for exception

IEEE 754 recommends that a trap be settable for a calculation result exception. In the VU, processing

continues by just setting a flag. If a trap is necessary, use a flag check instruction after the calculation

instruction.

3. Micro Mode

In Micro mode, the VU executes microinstruction programs (micro subroutines) in MicroMem as a stand-alone

processor. Using the VU in micro mode maximizes the degree of parallelism, and the highest performance can

be achieved.

Microinstructions are 64-bit long LIW instructions. It is possible to specify both an Upper instruction and a

Lower instruction at the same time. The Upper instruction controls four floating-point product-sum ALUs

(FMAC), and the Lower instruction controls one floating-point division/square root ALU (FDIV), one load-

store unit (LSU), and one integer operation unit (IALU), etc. A maximum of six units can execute concurrently.

3.1. Micro Mode Register Set

In micro mode, 32 floating-point registers, 16 integer registers, and other special registers can be used. These

registers are explained below.

3.1.1. Floating-Point Registers

There are 32 floating-point registers, VF00 - VF31. These are 128-bit long vector registers, which consist of

four single-precision floating-point fields.

The four 32-bit fields have a little-endian arrangement. From the least to the most significant bit, they are

referred to as field x, field y, field z, and field w.

Bits    127-96    95-64     63-32     31-0
Width   32 bits   32 bits   32 bits   32 bits
VF00    VF00w     VF00z     VF00y     VF00x
VF01    VF01w     VF01z     VF01y     VF01x
VF02    VF02w     VF02z     VF02y     VF02x
VF03    VF03w     VF03z     VF03y     VF03x
 :
VF31    VF31w     VF31z     VF31y     VF31x

VF00 is the constant register. Its fields are set to the following values:

• VF00x: 0.0 (single-precision floating-point number)

• VF00y: 0.0 (single-precision floating-point number)

• VF00z: 0.0 (single-precision floating-point number)

• VF00w: 1.0 (single-precision floating-point number)
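The field arrangement can be checked by packing four single-precision values into one 128-bit integer, with field x in bits 0 to 31 and field w in bits 96 to 127. The helper names below are hypothetical. The sketch models only the bit layout, not the register file itself.

```python
import struct

FIELD_ORDER = "xyzw"  # least significant field first


def pack_vf(x: float, y: float, z: float, w: float) -> int:
    """Pack four floats into a 128-bit value: x in bits 0-31, w in bits 96-127."""
    return int.from_bytes(struct.pack("<4f", x, y, z, w), "little")


def vf_field(register: int, name: str) -> float:
    """Extract one 32-bit field of a packed register as a float."""
    shift = 32 * FIELD_ORDER.index(name)
    raw = (register >> shift) & 0xFFFFFFFF
    return struct.unpack("<f", raw.to_bytes(4, "little"))[0]


# The constant register: (x, y, z, w) = (0.0, 0.0, 0.0, 1.0).
VF00 = pack_vf(0.0, 0.0, 0.0, 1.0)
```

With this layout the only non-zero bits of VF00 are those of 1.0 (0x3F800000) in the top 32-bit field.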

3.1.2. Integer Registers

There are sixteen 16-bit integer registers, VI00 - VI15. VI00 is the constant register and is set to 0.

Register   Contents (bits 15-0)
VI00       0 Register
VI01
VI02
VI03
VI04
VI05
VI06
VI07
VI08
VI09
VI10
VI11
VI12
VI13
VI14       Stack pointer (recommended)
VI15       Link register (recommended)

3.1.3. ACC Registers

The ACC registers are accumulators for floating-point product-sum calculations. Four registers exist for the four product-sum ALUs: ACCx, ACCy, ACCz, and ACCw. The ACC registers not only work as destinations of instructions such as ADDA and MULA but also store the intermediate results of the vector outer product calculated by OPMULA and OPMSUB.

Stalls due to data dependency, described later in this document, do not occur for the ACC registers.

3.1.4. I Register

The I register is a 32-bit single-precision floating-point register in which immediate values are stored. When the

I bit (bit 63) of the Upper OP field is set, the content of the Lower OP field is written to the I register at the T

stage of the instruction as a single-precision floating-point number. It is used by the next instruction to be

executed, such as ADDi/MULi. This operation can be described with the LOI pseudo instruction.

No stalls due to data dependency are generated for the I register. Figure 3-1 illustrates a pipeline operation

example in which the I register is used.

Upper instruction            Lower instruction                              clock cycle
                                                                            1 2 3 4 5 6 7 8
MUL.xyzw  VF05, VF10, VF20   LOI 100.0 (floating-point immediate value)     M T X Y Z S
MULi.xyzw VF10, VF11, I      NOP (MOVE VF00, VF00)                            M T X Y Z S

Figure 3-1 Usage Example of I Register

For more information on pipeline operation, see "3.4. Pipeline Operation".
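The way LOI shares the instruction word can be sketched as follows: bit 63 is the I bit, and the lower 32 bits carry the immediate as a single-precision bit pattern. The function name and the `upper_word` argument are hypothetical. The Upper opcode is treated as an opaque 31-bit value, since its encoding is not covered here.

```python
import struct

I_BIT = 1 << 63  # bit 63 of the 64-bit microinstruction


def encode_loi(upper_word: int, immediate: float) -> int:
    """Build a 64-bit word whose Lower field holds a float immediate."""
    lower = struct.unpack("<I", struct.pack("<f", immediate))[0]
    upper = upper_word & 0x7FFFFFFF  # leave bit 31 of the upper half for the I bit
    return I_BIT | (upper << 32) | lower
```

For the LOI 100.0 of Figure 3-1, the Lower field would hold 0x42C80000, the single-precision pattern of 100.0.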

3.1.5. Q Register

The Q register is a floating-point register in which results of division, square root operations, and square root

division are stored. Because these calculations differ in latency from other floating-point calculations, a special

register (the Q register) is needed.

No stalls due to data dependency are generated for the Q register, so the WAITQ instruction must be used for synchronization.

3.1.6. R Register

The R register is a 23-bit register in which random number values are stored.

3.1.7. P Register

The P register is a 32-bit register in which the EFU instruction results are stored. Because elementary function

calculations differ in latency from other floating-point calculations, a special register (the P register) is needed.

No stalls due to data dependency are generated for the P register, so the WAITP instruction must be used for synchronization.

3.2. Micro Instruction Set Overview

A microinstruction is a 64-bit LIW (Long Instruction Word). It can specify instructions in the Upper instruction

field (the upper 32 bits) and the Lower instruction field (the lower 32 bits) independently.

Instructions that use floating-point product-sum ALUs (FMAC) are usually specified in the Upper instruction

field, and other instructions are specified in the Lower instruction field.

Bits 63-32   Upper Instruction Field   Upper Instruction   32 bits
Bits 31-00   Lower Instruction Field   Lower Instruction   32 bits

3.2.1. Upper Instructions

The Upper instructions are mainly related to floating-point calculations. There are 59 instructions.

Category                  Instruction   Function
Floating-point            ABS           absolute
calculation               ADD           addition
                          ADDi          ADD broadcast I register
                          ADDq          ADD broadcast Q register
                          ADDbc         ADD broadcast bc field
                          ADDA          ADD output to ACC
                          ADDAi         ADD output to ACC broadcast I register
                          ADDAq         ADD output to ACC broadcast Q register
                          ADDAbc        ADD output to ACC broadcast bc field
                          SUB           subtraction
                          SUBi          SUB broadcast I register
                          SUBq          SUB broadcast Q register
                          SUBbc         SUB broadcast bc field
                          SUBA          SUB output to ACC
                          SUBAi         SUB output to ACC broadcast I register
                          SUBAq         SUB output to ACC broadcast Q register
                          SUBAbc        SUB output to ACC broadcast bc field
                          MUL           multiply
                          MULi          MUL broadcast I register
                          MULq          MUL broadcast Q register
                          MULbc         MUL broadcast bc field
                          MULA          MUL output to ACC
                          MULAi         MUL output to ACC broadcast I register
                          MULAq         MUL output to ACC broadcast Q register
                          MULAbc        MUL output to ACC broadcast bc field
                          MADD          MUL and ADD
                          MADDi         MUL and ADD broadcast I register
                          MADDq         MUL and ADD broadcast Q register
                          MADDbc        MUL and ADD broadcast bc field
                          MADDA         MUL and ADD output to ACC
                          MADDAi        MUL and ADD output to ACC broadcast I register
                          MADDAq        MUL and ADD output to ACC broadcast Q register
                          MADDAbc       MUL and ADD output to ACC broadcast bc field
                          MSUB          MUL and SUB
                          MSUBi         MUL and SUB broadcast I register
                          MSUBq         MUL and SUB broadcast Q register
                          MSUBbc        MUL and SUB broadcast bc field
                          MSUBA         MUL and SUB output to ACC
                          MSUBAi        MUL and SUB output to ACC broadcast I register
                          MSUBAq        MUL and SUB output to ACC broadcast Q register
                          MSUBAbc       MUL and SUB output to ACC broadcast bc field
                          MAX           maximum
                          MAXi          MAX broadcast I register
                          MAXbc         MAX broadcast bc field
                          MINI          minimum
                          MINIi         MINI broadcast I register
                          MINIbc        MINI broadcast bc field
                          OPMULA        outer product MULA
                          OPMSUB        outer product MSUB
                          NOP           no operation
Floating-point/           FTOI0         float to integer, fixed point 0 bit
fixed-point               FTOI4         float to integer, fixed point 4 bits
conversion                FTOI12        float to integer, fixed point 12 bits
                          FTOI15        float to integer, fixed point 15 bits
                          ITOF0         integer to float, fixed point 0 bit
                          ITOF4         integer to float, fixed point 4 bits
                          ITOF12        integer to float, fixed point 12 bits
                          ITOF15        integer to float, fixed point 15 bits
Clipping judgment         CLIP          clipping

3.2.2. Lower Instructions

The Lower instructions cover floating-point division, integer calculation, transfer between registers, flag operation, branching, elementary function calculation, and other control operations. There are 69 instructions, listed below.

The NOP instruction is not included in the Lower instructions. If necessary, use an instruction that has no effect, such as MOVE VF00, VF00, in place of NOP.

Category                  Instruction   Function
Floating-point division   DIV           floating divide
                          SQRT          floating square-root
                          RSQRT         floating reciprocal square-root
Integer calculation       IADD          integer ADD
                          IADDI         integer ADD immediate
                          IADDIU        integer ADD immediate unsigned
                          IAND          integer AND
                          IOR           integer OR
                          ISUB          integer SUB
                          ISUBIU        integer SUB immediate unsigned
Register-register         MOVE          move floating register
transfer                  MFIR          move from integer register
                          MTIR          move to integer register
                          MR32          move rotate 32 bits
Load/Store                LQ            Load Quadword
                          LQD           Load Quadword with pre-decrement
                          LQI           Load Quadword with post-increment
                          SQ            Store Quadword
                          SQD           Store Quadword with pre-decrement
                          SQI           Store Quadword with post-increment
                          ILW           integer load word
                          ISW           integer store word
                          ILWR          integer load word register
                          ISWR          integer store word register
                          LOI           Load immediate value to I register (pseudo instruction)
Random numbers            RINIT         random-unit init R register
                          RGET          random-unit get R register
                          RNEXT         random-unit next M sequence
                          RXOR          random-unit XOR R register
Synchronization           WAITQ         wait Q register
Flag operation            FSAND         flag-operation status flag AND
                          FSEQ          flag-operation status flag EQ
                          FSOR          flag-operation status flag OR
                          FSSET         flag-operation set status flag
                          FMAND         flag-operation MAC flag AND
                          FMEQ          flag-operation MAC flag EQ
                          FMOR          flag-operation MAC flag OR
                          FCAND         flag-operation clipping flag AND
                          FCEQ          flag-operation clipping flag EQ
                          FCOR          flag-operation clipping flag OR
                          FCSET         flag-operation clipping flag set
                          FCGET         flag-operation clipping flag get
Branching                 IBEQ          integer branch on equal
                          IBGEZ         integer branch on greater than or equal to zero
                          IBGTZ         integer branch on greater than zero
                          IBLEZ         integer branch on less than or equal to zero
                          IBLTZ         integer branch on less than zero
                          IBNE          integer branch on not equal
                          B             branch (PC relative address)
                          BAL           branch and link (PC relative address)
                          JR            jump register (absolute address)
                          JALR          jump and link register (absolute address)
EFU transfer              MFP           move from P register
EFU synchronization       WAITP         wait P register
Vector elementary         ESADD         Elementary-function Square and ADD
function                  ERSADD        Elementary-function Reciprocal Square and ADD
                          ELENG         Elementary-function Length
                          ERLENG        Elementary-function Reciprocal Length
                          EATANxy       Elementary-function ArcTAN y/x
                          EATANxz       Elementary-function ArcTAN z/x
                          ESUM          Elementary-function Sum
Scalar elementary         ERCPR         Elementary-function Reciprocal
function                  ESQRT         Elementary-function Square-root
                          ERSQRT        Elementary-function Reciprocal Square-root
                          ESIN          Elementary-function SIN
                          EATAN         Elementary-function ArcTAN
                          EEXP          Elementary-function Exponential
External unit control     XGKICK        Kick external unit (GIF)
                          XTOP          Read VIF data (TOP register)
                          XITOP         Read VIF data (ITOP register)

3.3. Flags

VU flags are roughly divided into 3 types: MAC flags which show floating-point calculation results, status flags

which show total calculation results, and clipping flags which show clipping judgment results.

3.3.1. MAC Flags

The MAC flags show the FMAC calculation results of the Upper instruction (or the macroinstruction

corresponding to it). There are four flag types, z, s, u and o, and each of them has 1 bit corresponding to the

four floating-point calculation units, FMACx, FMACy, FMACz, and FMACw (16 bits in total).

Bit     15 14 13 12   11 10 09 08   07 06 05 04   03 02 01 00
Field   O field       U field       S field       Z field
Flag    Ox Oy Oz Ow   Ux Uy Uz Uw   Sx Sy Sz Sw   Zx Zy Zz Zw
Width   4 bits        4 bits        4 bits        4 bits

Z field: Zero flag

This flag is set to 1 when the calculation results are 0, and set to 0 when the calculation results are non-zero.

S field: Sign flag

This flag is set to 1 when the calculation results are negative, and set to 0 when the calculation results are

positive or 0.

U field: Underflow flag

This flag is set to 1 when the calculation results cause an underflow.

O field: Overflow flag

This flag is set to 1 when the calculation results cause an overflow.

The flags of ALUs that do not operate are cleared to 0. In the ADD.xyz instruction, for example, the Zw, Sw,

Uw, and Ow flags become 0 since FMACw does not operate.

Also, the MAC flags do not change when NOPs are executed.

When an overflow or underflow occurs, the Z and S flags are set according to the clamped results (+MAX/-

MAX/+0/-0).
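The bit assignment can be illustrated by assembling the 16-bit MAC flag word from per-field results. This is a sketch under assumptions: the function name and its arguments are hypothetical, overflow and underflow are passed in by the caller rather than detected, and the sign test follows the text literally (negative sets S), so the treatment of -0 is not modelled.

```python
FIELDS = "xyzw"


def mac_flags(results: dict, overflow=(), underflow=()) -> int:
    """Build the MAC flag word from {'x': value, ...} for the FMACs that ran."""
    flags = 0
    for index, name in enumerate(FIELDS):
        if name not in results:
            continue                  # flags of FMACs that do not operate stay 0
        bit = 3 - index               # x is the highest bit inside each 4-bit field
        value = results[name]
        if value == 0.0:
            flags |= 1 << bit         # Z field, bits 3-0
        if value < 0.0:
            flags |= 1 << (4 + bit)   # S field, bits 7-4
        if name in underflow:
            flags |= 1 << (8 + bit)   # U field, bits 11-8
        if name in overflow:
            flags |= 1 << (12 + bit)  # O field, bits 15-12
    return flags
```

For an ADD.xyz whose results are (0.0, -1.0, 2.0), only Zx and Sy are set, and all four w bits remain 0 because FMACw does not operate.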

3.3.2. Status Flags (SF)

Status flags consist of the following 12 bits. There are flags to show MAC flag status, flags to show invalid

calculation, and flags to show their accumulation (sticky flag).

Bit    11 10 09 08 07 06 05 04 03 02 01 00
Flag   DS IS OS US SS ZS D  I  O  U  S  Z

Z: Zero flag

This flag is set to 1 when any of the Zx, Zy, Zz, and Zw bits of the MAC flags is set to 1.

S: Sign flag

This flag is set to 1 when any of the Sx, Sy, Sz, and Sw bits of the MAC flags is set to 1.

U: Underflow flag

This flag is set to 1 when any of the Ux, Uy, Uz, and Uw bits of the MAC flags is set to 1.

O: Overflow flag

This flag is set to 1 when any of the Ox, Oy, Oz, and Ow bits of the MAC flags is set to 1.

I: Invalid flag

This flag is set to 1 when a 0/0 calculation is executed by the DIV instruction or a root calculation of

negative number is executed by the SQRT/RSQRT instruction.

D: Zero division flag

This flag is set to 1 when 0 division (except 0/0) is performed by the DIV/RSQRT instruction, and set to 0

when the SQRT instruction is executed regardless of the results.

ZS/SS/US/OS/IS/DS: Sticky flag

Six flags, ZS, SS, US, OS, IS, and DS, are called Sticky Flags, and indicate the accumulation values of the Z,

S, U, O, I, and D flags, respectively. For example, the ZS flag value becomes the logical OR of the Z flag

value and the ZS flag value from the results of the most-recently performed instruction.

In product-sum calculations, a single instruction, such as MADD/MSUB, performs addition and subtraction

following multiplication. Addition and subtraction results are reflected in the Z/ZS/S/SS/U/US/O/OS

flags, and multiplication results are reflected only in the ZS/SS/US/OS flags.
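The relationship between the MAC flags, the Z/S/U/O status bits, and their sticky counterparts can be sketched as a single update step. The function name is hypothetical. The sketch covers only an FMAC result, so it leaves D, I, DS, and IS untouched, and it does not model the separate handling of the multiplication stage in MADD/MSUB.

```python
KEEP_MASK = 0b110000_110000  # DS, IS (bits 11-10) and D, I (bits 5-4)


def update_status_from_mac(mac: int, status: int) -> int:
    """Fold a 16-bit MAC flag word into the 12-bit status flag word."""
    current = 0
    for bit in range(4):                 # bit 0 = Z, 1 = S, 2 = U, 3 = O
        if (mac >> (4 * bit)) & 0xF:     # any of the x/y/z/w bits is set
            current |= 1 << bit
    sticky = ((status >> 6) & 0xF) | current   # ZS/SS/US/OS accumulate by OR
    return (status & KEEP_MASK) | (sticky << 6) | current
```

Calling the function repeatedly shows the accumulation: once a sticky bit has been set it stays set, while the non-sticky bits reflect only the latest MAC flag word.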

3.3.3. Clipping Flags (CF)

The clipping flags are set according to the results of the clipping judgment (CLIP) instruction.

-x flag Set to 1 when x < -|w|.

+x flag Set to 1 when x > +|w|.

-y flag Set to 1 when y < -|w|.

+y flag Set to 1 when y > +|w|.

-z flag Set to 1 when z < -|w|.

+z flag Set to 1 when z > +|w|.

There are four sets of clipping flags, as shown below. The clipping flag is shifted 6 bits to the left each time the

CLIP instruction is executed, so clipping information for the four most recent vertices is always maintained.

Bit        23 22 21 20 19 18   17 16 15 14 13 12   11 10 09 08 07 06   05 04 03 02 01 00
Judgment   3rd previous        2nd previous        Previous            Current
Flag       -z +z -y +y -x +x   -z +z -y +y -x +x   -z +z -y +y -x +x   -z +z -y +y -x +x
Width      6 bits              6 bits              6 bits              6 bits
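The judgment and the 6-bit shift can be sketched as below. The function name `clip` is a hypothetical stand-in for the CLIP instruction, operating on plain Python values. The bit positions follow the layout above, with +x in bit 0 and -z in bit 5 of the current judgment.

```python
def clip(flags: int, x: float, y: float, z: float, w: float) -> int:
    """Shift the 24-bit clipping flags left by 6 and insert a new judgment."""
    limit = abs(w)
    judgment = 0
    for index, value in enumerate((x, y, z)):
        if value > limit:
            judgment |= 1 << (2 * index)       # +x bit 0, +y bit 2, +z bit 4
        if value < -limit:
            judgment |= 1 << (2 * index + 1)   # -x bit 1, -y bit 3, -z bit 5
    # Keep only 24 bits so the oldest judgment falls off the top.
    return ((flags << 6) | judgment) & 0xFFFFFF
```

After four calls, the word holds the judgments of the four most recent vertices, and a fifth call discards the oldest one.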

3.3.4. Flag Set Instructions

The following instructions (Lower instructions) are used to set the flags to a specific value.

FSSET instruction: Sets the status flag.

FCSET instruction: Sets the clipping flag.

The effect of the simultaneously executed Upper instruction on the flag is ignored; the flag reflects only the result of the flag set instruction.

3.3.5. Flag Changes for Each Instruction

The flag changes for each microinstruction are shown in the table below.

"-" means "no flag change" and "X" means "flag change according to calculation results".

For the MAC flags, x, y, z, and w are grouped in the same column. Instructions of the same kind are grouped

with "*", for example ADD*.

Upper Instruction

Instruction   MAC Flag   Status Flag (sticky)   Status Flag   Clipping Flag
              O U S Z    DS IS OS US SS ZS      D I O U S Z
ABS           ----       ------                 ------        -
ADD*          XXXX       --XXXX                 --XXXX        -
ADDA*         XXXX       --XXXX                 --XXXX        -
CLIP          ----       ------                 ------        X
FTOI*         ----       ------                 ------        -
ITOF*         ----       ------                 ------        -
MADD*         XXXX       --XXXX                 --XXXX        -
MADDA*        XXXX       --XXXX                 --XXXX        -
MAX*          ----       ------                 ------        -
MINI*         ----       ------                 ------        -
MSUB*         XXXX       --XXXX                 --XXXX        -
MSUBA*        XXXX       --XXXX                 --XXXX        -
MUL*          XXXX       --XXXX                 --XXXX        -
MULA*         XXXX       --XXXX                 --XXXX        -
NOP           ----       ------                 ------        -
OPMULA        XXXX       --XXXX                 --XXXX        -
OPMSUB        XXXX       --XXXX                 --XXXX        -
SUB*          XXXX       --XXXX                 --XXXX        -
SUBA*         XXXX       --XXXX                 --XXXX        -

Lower Instruction

Instruction   MAC Flag   Status Flag (sticky)   Status Flag   Clipping Flag
              O U S Z    DS IS OS US SS ZS      D I O U S Z
DIV           ----       XX----                 XX----        -
RSQRT         ----       XX----                 XX----        -
SQRT          ----       -X----                 0X----        -
FCSET         ----       ------                 ------        X
FSSET         ----       XXXXXX                 ------        -
Others        ----       ------                 ------        -

3.3.6. Flag Changes for Exception Occurrences

Flag changes in the basic instructions (except MADD/MSUB)

The following table shows the results and flag changes for instructions other than MADD or MSUB in

which calculation exceptions are generated.

Exception      Calculation   MAC Flag      Status Flag   Sticky Flag
               Results       Z* S* U* O*   Z  S  U  O    ZS SS US OS
No exception   X             X  X  0  0    X  X  0  0    X  X  -  -
Overflow       MAX           0  X  0  1    0  X  0  1    -  X  -  1
Underflow      0             1  X  1  0    1  X  1  0    1  X  1  -

X: 1 or 0 according to the calculation results    -: Same as past values

MAC Flag is set to 0 when the corresponding FMAC unit is not used.

Flag change in MADD instruction

In the MADD instruction, addition/subtraction with the accumulator is performed following the

multiplication. There is a double exception generation factor in one instruction, and flags are set in

compliance with both the multiplication and addition/subtraction. Therefore, flag settings in this instruction

are complicated.

Exception Factor                      Calculation  MAC Flag      Status Flag   Sticky Flag
ACC                  Multiplication   Results      Z* S* U* O*   Z  S  U  O    ZS SS US OS
0/Normalized value   No exception     X            X  X  0  0    X  X  0  0    X  X  -  -
0/Normalized value   OVF              +/-MAX       0  X  0  1    0  X  0  1    -  X  -  1
0/Normalized value   UDF              -            X  X  0  0    X  X  0  0    X  X  1  -
+/-MAX               No exception     +/-MAX       0  X  0  1    0  X  0  1    b  c  -  1
+MAX                 OVF(+MAX)        +MAX         0  0  0  1    0  0  0  1    -  b  -  1
+MAX                 OVF(-MAX)        -MAX         0  1  0  1    0  1  0  1    -  1  -  1
-MAX                 OVF(+MAX)        +MAX         0  0  0  1    0  0  0  1    -  b  -  1
-MAX                 OVF(-MAX)        -MAX         0  1  0  1    0  1  0  1    -  1  -  1
+/-MAX               UDF              +/-MAX       0  X  0  1    0  X  0  1    1  c  1  1
Addition/Subtraction OVF/UDF          +/-MAX/0     X  X  X  X    X  X  X  X    c  c  a  a

X: 1 or 0 according to the calculation results of addition -: Same as past values

a: Logical OR of the flag value that shows calculation results of addition and the past flag values

b: Logical OR of the flag value that shows calculation results of multiplication and the past flag values

c: Logical OR of the flag values that show calculation results of addition and multiplication and the past flag values

MAC Flag is set to 0 when the corresponding FMAC unit is not used.
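
The legend entries a, b, and c all rely on the same mechanism: a sticky flag holds the logical OR of its past value and the newly generated flag value. The following Python fragment is a minimal sketch of that accumulation only. The dictionary layout and the name `accumulate_sticky` are assumptions made for illustration and are not part of the VU architecture.

```python
def accumulate_sticky(sticky, generated):
    """Return the new sticky flags: past value OR newly generated value.

    Both arguments map the flag names "Z", "S", "U", "O" to 0 or 1.
    This layout is a hypothetical one chosen for the sketch.
    """
    return {name: sticky[name] | generated[name] for name in sticky}


sticky = {"Z": 0, "S": 0, "U": 0, "O": 0}
# An overflow sets O ...
sticky = accumulate_sticky(sticky, {"Z": 0, "S": 0, "U": 0, "O": 1})
# ... and a later zero result sets Z without clearing the sticky O.
sticky = accumulate_sticky(sticky, {"Z": 1, "S": 0, "U": 0, "O": 0})
```

Once set, a sticky flag in this model stays at 1 until it is rewritten explicitly, which is why the table can list "-" (same as past values) for flags that the operation does not raise.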


Flag changes in MSUB instruction

The flag changes in the MSUB instruction are almost the same as those in the MADD instruction, except that the sign of the result is inverted when the multiplication overflows.

Exception Factor                      Calculation  MAC Flag      Status Flag   Sticky Flag
ACC                  Multiplication   Results      Z* S* U* O*   Z  S  U  O    ZS SS US OS
0/Normalized value   No exception     X            X  X  0  0    X  X  0  0    X  X  -  -
0/Normalized value   OVF              +/-MAX       0  X  0  1    0  X  0  1    -  X  -  1
0/Normalized value   UDF              -            X  X  0  0    X  X  0  0    X  X  1  -
+/-MAX               No exception     +/-MAX       0  X  0  1    0  X  0  1    b  c  -  1
+MAX                 OVF(+MAX)        -MAX         0  1  0  1    0  1  0  1    -  1  -  1
+MAX                 OVF(-MAX)        +MAX         0  0  0  1    0  0  0  1    -  b  -  1
-MAX                 OVF(+MAX)        -MAX         0  1  0  1    0  1  0  1    -  1  -  1
-MAX                 OVF(-MAX)        +MAX         0  0  0  1    0  0  0  1    -  b  -  1
+/-MAX               UDF              +/-MAX       0  X  0  1    0  X  0  1    1  c  1  1
Addition/Subtraction OVF/UDF          +/-MAX/0     X  X  X  X    X  X  X  X    c  c  a  a

X: 1 or 0 according to the calculation results of addition -: Same as past values

a: Logical OR of the flag value that shows calculation results of addition and the past flag values

b: Logical OR of the flag value that shows calculation results of multiplication and the past flag values

c: Logical OR of the flag values that show calculation results of addition and multiplication and the past flag values

MAC Flag is set to 0 when the corresponding FMAC unit is not used.


3.4. Pipeline Operation

3.4.1. Hazards

VU calculations are pipelined and execute concurrently, but the pipeline stalls under the conditions described below. For concrete examples, see "3.4.4. FMAC Pipeline" and the sections that follow it.

DIV Resource Hazards

An instruction (DIV/SQRT/RSQRT) that uses the floating-point divider unit when another instruction of

this type is being executed.

EFU Resource Hazards

An instruction (such as ESIN) that uses the EFU when another instruction of this type is being executed.

Floating-Point Register Data Hazards

An instruction that uses a floating-point register when another instruction that uses the register as the

destination is being executed (until the value is fixed).

Data hazard checks are performed independently in each field of x/y/z/w. VF00 is a constant register, and

is not subject to hazard checks.

Integer Register Data Hazards

An instruction that uses the values of an integer register when a load/store instruction to the integer register

is being executed.

Since integer calculation latency is 1 clock, data hazards due to calculations are not generated. VI00 is not

subject to hazard checks.

Data hazards are not generated for special registers such as ACC, I, Q, P, and R. To synchronize with the point at which a result is written to the Q or P register, use the WAITQ or WAITP instruction, respectively.

When transferring data to an external unit (GS) by the XGKICK instruction, stalls continue until the data can be

transferred.

3.4.2. Upper Instruction and Lower Instruction

In micro mode, there is an Upper instruction pipeline and a Lower instruction pipeline. The Upper and Lower instructions are issued concurrently, so both pipelines stall if a hazard occurs in either of them.

3.4.3. Priority for Writing to a Register

When the Upper and Lower instructions write data to the same register at the same time, priority is given to the

Upper instruction and the result of the Lower instruction is discarded. Moreover, when a coprocessor transfer

instruction (COP2) writes data at the same time, it is given priority.

COP2 Transfer Instruction > Upper Instruction > Lower Instruction

The above is performed in register units, so note that the result of the Lower instruction is discarded even when

the Upper and Lower instructions write data to different fields as shown in the following example.


Upper: ADD.xy VF01, VF01, VF23
Lower: MOVE.w VF01, VF09
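
The priority rule can be modeled in a few lines. The Python sketch below is an illustration only: the tuple layout, the `PRIORITY` table, and the function name `resolve_writes` are assumptions introduced here. It resolves simultaneous writes per register, not per field, as described above.

```python
# Smaller number = higher priority (COP2 > Upper > Lower).
PRIORITY = {"COP2": 0, "Upper": 1, "Lower": 2}


def resolve_writes(writes):
    """Pick one writer per register for writes that occur at the same time.

    writes: list of (source, register, fields, value) tuples.
    Returns {register: (source, fields, value)}. The whole register is
    decided at once, so a losing write is dropped even if it targets
    different fields than the winning write.
    """
    winners = {}
    for source, register, fields, value in writes:
        current = winners.get(register)
        if current is None or PRIORITY[source] < PRIORITY[current[0]]:
            winners[register] = (source, fields, value)
    return winners


# The example above: the Lower MOVE.w result is discarded although it
# writes only the w field.
result = resolve_writes([
    ("Upper", "VF01", "xy", "ADD result"),
    ("Lower", "VF01", "w", "MOVE result"),
])
```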

3.4.4. FMAC Pipeline

Figure 3-2 illustrates the FMAC pipeline. All instructions except some of the Lower instructions (DIV, SQRT,

RSQRT, integer calculation, and conditional branching) are executed according to this pipeline. Load/Store of

integer registers also follows this pipeline.

The MAC flag, status flag and clipping flag, which show the results of the calculation, are set at the S stage.

Stage  Operation
M      access MicroMem
T      read VPU register
X      execute1
Y      execute2
Z      execute3
S      write back

Figure 3-2 FMAC Pipeline

An example of the FMAC pipeline is illustrated below.

Instr.  Clock cycle                        Upper instruction / Lower instruction
        1  2  3  4  5  6  7  8  9  10 11
1       M  T  X  Y  Z  S  .  .  .  .  .    MUL.xyzw VF05, VF10, VF20 / NOP (MOVE VF00, VF00)
2       .  M  T  X  Y  Z  S  .  .  .  .    MULA.xyzw ACC, VF11, VF21 / NOP (MOVE VF00, VF00)
3       .  .  M  T  X  Y  Z  S  .  .  .    MADDA.xyzw ACC, VF12, VF22 / NOP (MOVE VF00, VF00)
4       .  .  .  M  T  X  Y  Z  S  .  .    MADD.xyzw VF06, VF13, VF23 / NOP (MOVE VF00, VF00)
5       .  .  .  .  M  T  X  Y  Z  S  .    MUL.xyzw VF07, VF05, VF24 / NOP (MOVE VF00, VF00)
6       .  .  .  .  .  M  T  X  Y  Z  S    MUL.xyzw VF08, VF15, VF25 / NOP (MOVE VF00, VF00)

.: no stage in that clock cycle

Figure 3-3 FMAC Pipeline Operation Example

Figure 3-3 illustrates a normal operation example of the FMAC pipeline. Calculation results of Instruction 1 are

used for Instruction 5, but stalls do not occur because other instructions are being executed between them.


Instr.  Clock cycle                           Upper instruction / Lower instruction
        1  2  3  4  5  6  7  8  9  10 11 12
1       M  T  X  Y  Z  S  .  .  .  .  .  .    MUL.xyzw VF10, VF10, VF10 / NOP (MOVE VF00, VF00)
2       .  M  Ts Ts Ts T  X  Y  Z  S  .  .    ADDy.x VF10x, VF10x, VF10y / NOP (MOVE VF00, VF00)
3       .  .  Ms Ms Ms M  T  X  Y  Z  S  .    ADDw.z VF10z, VF10z, VF10w / NOP (MOVE VF00, VF00)
4       .  .  .  .  .  .  M  Ts Ts Ts T  X    NOP / MOVE.z VF20z, VF10z
5       .  .  .  .  .  .  .  Ms Ms Ms M  T    NOP / NOP (MOVE VF00, VF00)
6       .  .  .  .  .  .  .  .  .  .  .  M    NOP / NOP (MOVE VF00, VF00)

Ts: T stage stall   Ms: M stage stall   .: no stage in that clock cycle

Figure 3-4 Stalls due to FMAC Pipeline Data Hazards

Figure 3-4 shows an example of FMAC pipeline stalls. The VF10x output of Instruction 1 is used as an input of Instruction 2, so execution of Instruction 2 is delayed until Instruction 1 reaches the S stage.

A data hazard on VF10z occurs in the same way between Instruction 3 and Instruction 4.

No stall occurs between Instruction 2 and Instruction 3. The output VF10x of Instruction 2 and the inputs VF10z and VF10w of Instruction 3 belong to the same register, but no data hazard is generated because they are different fields. (With macroinstructions, a data hazard is generated in this case.)
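
The stall behaviour in Figures 3-3 and 3-4 can be approximated with a small scheduling model. The Python sketch below rests on assumptions that are inferred from the figures and are not an official timing specification: a result becomes readable in the clock cycle in which its producer reaches the S stage (four cycles after the T stage), hazards are checked per field, VF00 is never checked, and only reads are checked. The names `v` and `t_stage_cycles` are hypothetical.

```python
FMAC_LATENCY = 4  # stages after T: X, Y, Z, S


def v(register, names="xyzw"):
    """Expand e.g. v(10, "xz") into {(10, "x"), (10, "z")}."""
    return {(register, name) for name in names}


def t_stage_cycles(program):
    """Return the clock cycle in which each instruction is in the T stage.

    program: list of (written, read) pairs, each a set of
    (VF register number, field) tuples. The first instruction is assumed
    to be in the M stage in clock cycle 1.
    """
    ready = {}   # (register, field) -> cycle in which the producer is in S
    cycles = []
    t = 1
    for written, read in program:
        t += 1                              # at best one cycle after the previous T
        for key in read:
            if key[0] != 0:                 # VF00 is a constant register
                t = max(t, ready.get(key, 0))
        for key in written:
            ready[key] = t + FMAC_LATENCY
        cycles.append(t)
    return cycles


# Figure 3-3: Instruction 5 reads VF05 written by Instruction 1.
figure_3_3 = [
    (v(5), v(10) | v(20)),
    (set(), v(11) | v(21)),    # MULA writes ACC, which is not hazard-checked
    (set(), v(12) | v(22)),
    (v(6), v(13) | v(23)),
    (v(7), v(5) | v(24)),
    (v(8), v(15) | v(25)),
]

# Figure 3-4: per-field hazards on VF10.
figure_3_4 = [
    (v(10), v(10)),            # MUL.xyzw VF10, VF10, VF10
    (v(10, "x"), v(10, "xy")), # ADDy.x VF10x, VF10x, VF10y
    (v(10, "z"), v(10, "zw")), # ADDw.z VF10z, VF10z, VF10w
    (v(20, "z"), v(10, "z")),  # MOVE.z VF20z, VF10z
    (set(), set()),
    (set(), set()),
]
```

Under these assumptions, the first listing issues one instruction per cycle, while the second delays Instructions 2 and 4 by three cycles each, matching the Ts boxes in Figure 3-4.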

Stalls due to data dependency do not occur between the Upper instruction and the Lower instruction in the

same instruction, though this is not shown in this example. For the relationship with COP2 instructions, see

"5.4.4. Operation when Transferring Data with EE Core".

3.4.5. FDIV Pipeline

The following figures illustrate the pipelines related to the floating-point division unit.

Stage     Operation
M         access MicroMem
T         read VU register
D1 - D6   DIV/SQRT execute (self-timed)
F         write back

Figure 3-5 FDIV Pipeline (DIV / SQRT)

Stage     Operation
M         access MicroMem
T         read VPU register
D1 - DC   RSQRT execute (self-timed)
F         write back

Figure 3-6 FDIV Pipeline (RSQRT)

The next DIV/SQRT/RSQRT instruction stalls with the generation of resource hazards during execution of

D1- D6 stage of DIV/SQRT instruction and D1- DC stage of RSQRT instruction.


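Instr.  Clock cycle                           Upper instruction / Lower instruction
        1  2  3  4  5  6  7  8  9  10 11 12
1       M  T  X  Y  Z  S  .  .  .  .  .  .    NOP / DIV Q, VF10, VF20
  FDIV  M  T  D1 D2 D3 D4 D5 D6 F  .  .  .
2       .  M  T  X  Y  Z  S  .  .  .  .  .    NOP / NOP (MOVE VF00, VF00)
3       .  .  M  T  X  Y  Z  S  .  .  .  .    NOP / NOP (MOVE VF00, VF00)
4       .  .  .  M  T  X  Y  Z  S  .  .  .    NOP / NOP (MOVE VF00, VF00)
5       .  .  .  .  M  T  X  Y  Z  S  .  .    NOP / NOP (MOVE VF00, VF00)
6       .  .  .  .  .  M  T  X  Y  Z  S  .    NOP / NOP (MOVE VF00, VF00)
7       .  .  .  .  .  .  M  T  X  Y  Z  S    NOP / NOP (MOVE VF00, VF00)
8       .  .  .  .  .  .  .  M  T  X  Y  Z    MULq.xyzw VF13, VF23, Q / NOP (MOVE VF00, VF00)

FDIV: stages of the Lower instruction in the FDIV pipeline   .: no stage in that clock cycle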

Figure 3-7 FDIV Pipeline Operation Example

Figure 3-7 illustrates a normal operation example of the FDIV pipeline. When six or more other instructions are inserted between the DIV instruction (Instruction 1) and the MULq instruction (Instruction 8), the result of the DIV instruction has already been written to the Q register by the time the MULq instruction reads it, so the MULq instruction can use that result in its calculation.

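Instr.  Clock cycle                           Upper instruction / Lower instruction
        1  2  3  4  5  6  7  8  9  10 11 12
1       M  T  X  Y  Z  S  .  .  .  .  .  .    NOP / DIV Q, VF10, VF20
  FDIV  M  T  D1 D2 D3 D4 D5 D6 F  .  .  .
2       .  M  T  X  Y  Z  S  .  .  .  .  .    MULq.xyzw VF12, VF22, Q / NOP (MOVE VF00, VF00)
3       .  .  M  Ts Ts Ts Ts Ts T  X  Y  Z    MULq.xyzw VF13, VF23, Q / DIV Q, VF11, VF21
  FDIV  .  .  M  Ts Ts Ts Ts Ts T  D1 D2 D3
4       .  .  .  Ms Ms Ms Ms Ms M  T  X  Y    NOP / NOP (MOVE VF00, VF00)
5       .  .  .  .  .  .  .  .  .  M  T  X    NOP / NOP (MOVE VF00, VF00)
6       .  .  .  .  .  .  .  .  .  .  M  T    NOP / NOP (MOVE VF00, VF00)

Ts: T stage stall   Ms: M stage stall   FDIV: stages of the Lower instruction in the FDIV pipeline   .: no stage in that clock cycle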

Figure 3-8 FDIV Pipeline Continuous Execution Example

Figure 3-8 shows an example of FDIV pipeline continuous execution. The next DIV instruction (Instruction 3)

is started during the DIV instruction (Instruction 1) execution, and Instruction 3 stalls until the D6 stage of

Instruction 1 ends.

Although Instruction 2 is a MULq instruction, data dependency is not checked regarding the Q register, so the

Q register value previously obtained is used here, not the calculation results of Instruction 1.

The MULq of Instruction 3 uses the calculation results of Instruction 1, which was written to the Q register at

the F stage, due to stalls.


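Instr.  Clock cycle                           Upper instruction / Lower instruction
        1  2  3  4  5  6  7  8  9  10 11 12
1       M  T  X  Y  Z  S  .  .  .  .  .  .    NOP / DIV Q, VF10, VF20
  FDIV  M  T  D1 D2 D3 D4 D5 D6 F  .  .  .
2       .  M  Ts Ts Ts Ts Ts Ts T  X  Y  Z    MULq.xyzw VF13, VF23, Q / WAITQ
3       .  .  Ms Ms Ms Ms Ms Ms M  T  X  Y    NOP / NOP (MOVE VF00, VF00)
4       .  .  .  .  .  .  .  .  .  M  T  X    NOP / NOP (MOVE VF00, VF00)
5       .  .  .  .  .  .  .  .  .  .  M  T    NOP / NOP (MOVE VF00, VF00)
6       .  .  .  .  .  .  .  .  .  .  .  M    NOP / NOP (MOVE VF00, VF00)

Ts: T stage stall   Ms: M stage stall   FDIV: stages of the Lower instruction in the FDIV pipeline   .: no stage in that clock cycle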

Figure 3-9 WAITQ Instruction Operation Example

Figure 3-9 illustrates an example of synchronization using the WAITQ instruction. Due to the WAITQ

instruction of Instruction 2, subsequent instructions stall until the output of the DIV instruction (Instruction 1)

is fixed. The results of Instruction 1 can be used from the Upper instruction of Instruction 2.
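
Because the Q register is not hazard-checked, which value a MULq instruction sees depends only on timing. The Python sketch below models the three cases in Figures 3-7 to 3-9 under stated assumptions: a DIV/SQRT result is written 7 cycles after the T stage (13 for RSQRT, counting the D stages and F in Figures 3-5 and 3-6), a value written in a given cycle is visible to an instruction whose T stage falls in that cycle, and only a further FDIV instruction or WAITQ waits for a pending result. The dictionary keys and the name `simulate_q` are hypothetical.

```python
DIV_LATENCY = 7     # DIV/SQRT: T stage to F stage (D1..D6, then F)
RSQRT_LATENCY = 13  # RSQRT: T stage to F stage (D1..DC, then F)


def simulate_q(program):
    """Trace which result the Q register holds when each instruction reads it.

    program: list of dicts with optional keys
      "fdiv":   latency of a DIV/SQRT/RSQRT issued by the Lower instruction
      "waitq":  True if the Lower instruction is WAITQ
      "uses_q": True if the Upper instruction reads Q (for example MULq)
    Returns a list of (t_cycle, q_source). q_source is the 1-based number of
    the instruction whose result is in Q, 0 for a value produced before the
    listing, or None if the instruction does not read Q.
    """
    t = 1
    pending = None          # (instruction number, cycle of its F stage)
    q_source = 0
    trace = []
    for number, ins in enumerate(program, start=1):
        t += 1
        # Only another FDIV instruction or WAITQ waits for the divider.
        if pending and (ins.get("fdiv") or ins.get("waitq")):
            t = max(t, pending[1])
        if pending and pending[1] <= t:
            q_source, pending = pending[0], None
        trace.append((t, q_source if ins.get("uses_q") else None))
        if ins.get("fdiv"):
            pending = (number, t + ins["fdiv"])
    return trace


nop = {}
figure_3_7 = [{"fdiv": DIV_LATENCY}] + [nop] * 6 + [{"uses_q": True}]
figure_3_8 = [{"fdiv": DIV_LATENCY}, {"uses_q": True},
              {"uses_q": True, "fdiv": DIV_LATENCY}]
figure_3_9 = [{"fdiv": DIV_LATENCY}, {"uses_q": True, "waitq": True}]
```

In this model the MULq of Instruction 2 in Figure 3-8 reads the older Q value without stalling, whereas pairing the MULq with WAITQ, as in Figure 3-9, delays it until the DIV result is available.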

3.4.6. EFU Pipeline

Figure 3-10 illustrates the EFU pipeline. During execution of N1 - Nn-1 stages, the next elementary function

calculation instruction stalls with the generation of resource hazards. Unlike the FDIV pipeline, the resource

hazards are not generated during execution of the Nn stage, the final stage to be executed.

Stage     Operation
M         access MicroMem
T         read VPU register
N1 - Nn   EFU execute
P         write back

Figure 3-10 EFU Pipeline

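[Figure: pipeline timing diagram for six instructions; the middle clock cycles are abbreviated. Instruction 1 (Upper NOP, Lower ESADD P, VF10) passes through the M and T stages, the EFU stages N1 to Nn, and the P stage. Instruction 2 (Upper NOP, Lower ESADD P, VF10) stalls at the T stage (Ts) while Instruction 1 occupies the EFU, and Instruction 3 stalls at the M stage (Ms) during that time. The remaining instructions consist of NOP (MOVE VF00, VF00) pairs and one Lower MFP instruction.]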

Figure 3-11 EFU Pipeline Continuous Execution Example


Figure 3-11 illustrates a pipeline operation example in which the EFU instruction is continuously executed.

Since the ESADD instruction (Latency 11, Throughput 10) is executed by Instruction 1 and a new ESADD

instruction (Instruction 2) is executed before the end of the current ESADD instruction, stalls are generated at

the T stage of Instruction 2. Unlike the FDIV pipeline, the stall is cleared at the Nn stage (one stage before the

P stage).

Instr.  Upper instruction / Lower instruction   Stages shown (middle clock cycles abbreviated)
1       NOP / ESADD P, VF10                     M T X Y Z S; EFU: N ... N P
2       NOP / MFP                               M T X Y Z
3       NOP / WAITP                             M Ts T X Y
4       NOP / MFP                               Ms M T X
5       NOP / NOP (MOVE VF00, VF00)             M T
6       NOP / NOP (MOVE VF00, VF00)             M

Ts: T stage stall   Ms: M stage stall

Figure 3-12 WAITP Instruction Operation Example

Figure 3-12 illustrates a synchronization example using the WAITP instruction. Due to the WAITP instruction

of Instruction 3, a stall occurs and continues until the end of the ESADD instruction of Instruction 1 and is

cleared at the Nn stage, then the succeeding instruction is executed.

Similar to the Q register, the data dependency is not checked in the P register. In the MFP instruction of

Instruction 2, the P register values gained prior to Instruction 1 are used. In the MFP instruction of Instruction

4, the P register values gained from Instruction 1 are used as a result of the stall.

3.4.7. IALU Pipeline

Figure 3-13 illustrates the IALU pipeline, which performs Integer calculation.

Stage  Operation
M      access MicroMem
T      read VPU register
X      execute
y      dummy1
z      dummy2
S      write back

Figure 3-13 Actual IALU Pipeline

IALU execution ends in one cycle, but the y stage and z stage exist as dummy stages in order to adjust the timing

with the FMAC pipeline. Although the results are actually stored in the integer register at the S stage, there is no

latency in effect for the dummy stages since the results are bypassed from the X/y/z/S stage to the T stage.

The latency of the dummy stages becomes visible only when the result of a microinstruction is transferred by the CFC2 instruction (a coprocessor transfer macroinstruction), because the transfer takes place after the integer register has been written at the S stage.


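Instr.  Clock cycle                        Upper instruction / Lower instruction
        1  2  3  4  5  6  7  8  9  10 11
1       M  T  X  Y  Z  S  .  .  .  .  .    LOOP: MULA.xyzw ACC, VF20, VF10 / IADD VI04, VI03, VI02
  IALU  M  T  X  y  z  S  .  .  .  .  .
2       .  M  T  X  Y  Z  S  .  .  .  .    MADDA.xyzw ACC, VF21, VF11 / IADDI VI05, VI05, -1
  IALU  .  M  T  X  y  z  S  .  .  .  .
3       .  .  M  T  X  Y  Z  S  .  .  .    MADDA.xyzw ACC, VF22, VF12 / IADD VI06, VI05, VI04
  IALU  .  .  M  T  X  y  z  S  .  .  .
4       .  .  .  M  T  X  Y  Z  S  .  .    MADD.xyzw VF14, VF23, VF13 / IADD VI07, VI06, VI04
  IALU  .  .  .  M  T  X  y  z  S  .  .
5       .  .  .  .  M  T  X  Y  Z  S  .    MULA.xyzw ACC, VF20, VF10 / LQI.xyzw VF10, (VI04++)

IALU: stages of the Lower instruction in the IALU pipeline   .: no stage in that clock cycle
Calculation results bypass: into the T stage of Instruction 3 and into the T stage of Instruction 4.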

Figure 3-14 IALU Pipeline Bypass Example

Figure 3-14 illustrates an operation example of the IALU pipeline. The IADD instruction of Instruction 3 uses

the results of Instruction 1 and Instruction 2, so bypasses are generated between the y stage of Instruction 1 and

the T stage of Instruction 3, and between the X stage of Instruction 2 and the T stage of Instruction 3.

Similarly, bypasses are generated between the z stage of Instruction 1 and the T stage of Instruction 4, and

between the X stage of Instruction 3 and the T stage of Instruction 4.

3.4.8. Conditional Branching and Pipeline

Figure 3-15 illustrates an example of pipeline operation, which accompanies conditional branching.

Instr.  Clock cycle                           Upper instruction / Lower instruction
        1  2  3  4  5  6  7  8  9  10 11 12
1       M  T  X  Y  Z  S  .  .  .  .  .  .    Loop: MULA.xyzw ACC, VF23, VF13 / IADDI VI03, VI03, -1
  IALU  M  T  X  y  z  S  .  .  .  .  .  .
2       .  M  T  X  Y  Z  S  .  .  .  .  .    MADDA.xyzw ACC, VF20, VF10 / LQI.xyzw VF10, (VI04++)
3       .  .  M  T  X  Y  Z  S  .  .  .  .    MADDA.xyzw ACC, VF21, VF11 / IBNE VI03, VI00, Loop:
4       .  .  .  M  T  X  Y  Z  S  .  .  .    MADD.xyzw VF14, VF22, VF12 / SQI.xyzw VF14, (VI05++)
5       .  .  .  .  M  T  X  Y  Z  S  .  .    Loop: MULA.xyzw ACC, VF23, VF13 / IADDI VI03, VI03, -1
  IALU  .  .  .  .  M  T  X  y  z  S  .  .
6       .  .  .  .  .  M  T  X  Y  Z  S  .    MADDA.xyzw ACC, VF20, VF10 / LQI.xyzw VF10, (VI04++)
7       .  .  .  .  .  .  M  T  X  Y  Z  S    MADDA.xyzw ACC, VF21, VF11 / IBNE VI03, VI00, Loop:
8       .  .  .  .  .  .  .  M  T  X  Y  Z    MADD.xyzw VF14, VF22, VF12 / SQI.xyzw VF14, (VI05++)

IALU: stages of the Lower instruction in the IALU pipeline   .: no stage in that clock cycle
Instructions 4 and 8 are in the branch delay slot.

Figure 3-15 Operation Example of Integer Calculation Branching Instruction

In Figure 3-15, a loop has been created between Instruction 1 and Instruction 4. A conditional branch is

performed in Instruction 3 according to the results of the Lower instruction of Instruction 1, and it branches to

the beginning of the loop (Instruction 5) after the one-instruction branch delay slot (Instruction 4).

As shown above, one instruction must be placed between an instruction that sets a branch condition and the conditional branch instruction that uses it. Flag check instructions (FCAND, FCEQ, FCGET, FCOR, FMAND, FMEQ, FMOR, FSAND, FSEQ, and FSOR) are an exception: a conditional branch instruction can be placed immediately after them.


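Instr.  Clock cycle                           Upper instruction / Lower instruction
        1  2  3  4  5  6  7  8  9  10 11 12
1       M  T  X  Y  Z  S  .  .  .  .  .  .    MUL.xyzw VF05, VF10, VF20 / NOP (MOVE VF00, VF00)
2       .  M  T  X  Y  Z  S  .  .  .  .  .    ADD.xyzw VF06, VF11, VF21 / NOP (MOVE VF00, VF00)
3       .  .  M  T  X  Y  Z  S  .  .  .  .    ADD.xyzw VF07, VF12, VF22 / NOP (MOVE VF00, VF00)
4       .  .  .  M  T  X  Y  Z  S  .  .  .    ADD.xyzw VF08, VF13, VF23 / NOP (MOVE VF00, VF00)
5       .  .  .  .  M  T  X  Y  Z  S  .  .    ADD.xyzw VF09, VF14, VF24 / FSAND VI01, SF, 0xfff
  IALU  .  .  .  .  M  T  X  y  z  S  .  .
6       .  .  .  .  .  M  T  X  Y  Z  S  .    ADD.xyzw VF10, VF15, VF25 / IBNE VI01, VI00, Label:
  IALU  .  .  .  .  .  M  T  X  y  z  S  .
7       .  .  .  .  .  .  M  T  X  Y  Z  S    NOP / NOP (MOVE VF00, VF00)
8       .  .  .  .  .  .  .  M  T  X  Y  Z    Label: MUL.xyzw VF05, VF20, VF10 / LQI.xyzw VF10, (VI04++)

IALU: stages of the Lower instruction in the IALU pipeline   .: no stage in that clock cycle
Instruction 7 is in the branch delay slot.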

Figure 3-16 Example of Floating-Point Calculation Branch Instruction (1)

Figure 3-16 shows an example of branching according to floating-point calculation results. A conditional branch

is performed according to the results of Instruction 1, and the calculation results are written to the status flag at

the S stage. The logical AND of the status flag and the immediate value is written to VI01 in Instruction 5, after

inserting Instructions 2 to 4. Then, the conditional branch is performed according to the value of VI01 in

Instruction 6.

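Instr.  Clock cycle                        Upper instruction / Lower instruction
        4  5  6  7  8  9  10 11 12 13 14
1       Y  Z  S  .  .  .  .  .  .  .  .    MUL.xyzw VF05, VF10, VF20 / NOP (MOVE VF00, VF00)
2       Ts Ts T  X  Y  Z  S  .  .  .  .    ADD.xyzw VF06, VF11, VF05 / NOP (MOVE VF00, VF00)
3       Ms Ms M  Ts Ts Ts T  X  Y  Z  S    ADD.xyzw VF07, VF12, VF06 / NOP (MOVE VF00, VF00)
4       .  .  .  Ms Ms Ms M  T  X  Y  Z    ADD.xyzw VF08, VF13, VF23 / NOP (MOVE VF00, VF00)
5       .  .  .  .  .  .  .  M  T  X  Y    ADD.xyzw VF09, VF14, VF24 / FSAND VI01, SF, 0xfff
  IALU  .  .  .  .  .  .  .  M  T  X  y
6       .  .  .  .  .  .  .  .  M  T  X    ADD.xyzw VF10, VF15, VF25 / IBNE VI01, VI00, Label:
7       .  .  .  .  .  .  .  .  .  M  T    NOP / NOP (MOVE VF00, VF00)
8       .  .  .  .  .  .  .  .  .  .  M    Label: MUL.xyzw VF05, VF20, VF10 / LQI.xyzw VF10, (VI04++)

Ts: T stage stall   Ms: M stage stall   IALU: stages of the Lower instruction in the IALU pipeline   .: no stage in that clock cycle
Instruction 7 is in the branch delay slot.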

Figure 3-17 Example of Floating-Point Calculation Branch Instruction (2)

Figure 3-17 shows an example in which stalls are generated due to data hazards. As a result, the status flag

referred to by Instruction 5 shows the calculation results of Instruction 2.

As mentioned above, when reading the status flag, it is necessary to pay attention to the timing.


3.4.9. XGKICK Pipeline

The stage structure of the XGKICK pipeline that activates GIF transfer via PATH1 is basically the same as that

of the FMAC pipeline, but the XGKICK instruction performs a special pipeline operation that makes the

subsequent instructions stall on the T stage when continuously executed.

Instr.  Upper instruction / Lower instruction               Stages shown
1       MUL.xyzw VF03, VF10, VF20 / XGKICK                  M T X Y Z S
2       MUL.xyzw VF04, VF11, VF21 / XGKICK                  M T X Y Z S
3       MUL.xyzw VF05, VF12, VF22 / NOP (MOVE VF00, VF00)   M Ts T X Y Z
4       MUL.xyzw VF06, VF13, VF23 / NOP (MOVE VF00, VF00)   Ms M T X Y
5       MUL.xyzw VF07, VF14, VF24 / NOP (MOVE VF00, VF00)
6       MUL.xyzw VF08, VF15, VF25 / NOP (MOVE VF00, VF00)

Ts: T stage stall   Ms: M stage stall
The Ts and Ms stalls last until the preceding PATH1 transfer ends (clock cycles m and n in the original figure).

Figure 3-18 XGKICK Pipeline Operation

Figure 3-18 shows an XGKICK pipeline operation example. The XGKICK in Instruction 1 executes without

stalling. However, at the XGKICK in Instruction 2, since the transfer via PATH1 activated in Instruction 1 is

in process, the pipeline stalls until the PATH1 transfer caused by the preceding XGKICK instruction ends. At

this time, not Instruction 2, but the following Instruction 3 is delayed. Note that the Upper instruction of

Instruction 2 is executed without stalling.


3.5. Micro Subroutine Execution

3.5.1. How to Execute a Micro Subroutine

There are three ways to execute a micro subroutine:

Method                                                 Executable in
Macroinstruction VCALLMS / VCALLMSR instruction        VU0
Write the execution address to the control register   VU1
MSCAL/MSCALF of VIFcode                                VU0/VU1

Operation is indeterminate if VCALLMS/VCALLMSR instruction from the EE Core and start-up from the VIF

are specified concurrently.

3.5.2. How to Terminate a Micro Subroutine

There are three ways to terminate a micro subroutine:

By a microinstruction that sets the E bit to 1.

By a microinstruction that sets the T bit or D bit to 1.

By a Force Break from an external source.

A micro subroutine is normally terminated by a microinstruction that sets the E bit to 1. Other termination

methods are only for debugging purposes. See "7.3. Micro Subroutine Debugging".

3.5.3. Operation of Execution and Termination

Figure 3-19 shows an example of executing a micro subroutine.

Instr.  Address  Upper  Lower  Remarks
1       0x0100   MUL    NOP    Address specified by callms
2       0x0108   MULA   NOP
3       0x0110   MADDA  NOP
4       0x0118   MADD   NOP
5       0x0120   MUL    NOP    E bit
6       0x0128   MUL    NOP    E bit delay slot (1 instruction); stop position according to E bit
7       0x0130   MUL    NOP    Address stored in TPC

[Instruction address unit: bytes]
Instructions 1 to 6 each pass through the M, T, X, Y, Z, and S stages in successive clock cycles.

Figure 3-19 Execution by callms and Termination by the E bit


In Figure 3-19, micro subroutine execution starts from Instruction 1. The E bit, which indicates the end of the

micro subroutine, is set in Instruction 5. There is a one-instruction E-bit delay slot, in which Instruction 6 is

executed; then the micro subroutine stops execution and returns to macro mode. The address of Instruction 7

is stored in the termination position program counter (TPC).

The following kinds of instructions cannot be placed in the E bit delay slot:

• Branch instructions

• Instructions that synchronize to external units, such as XTOP and XITOP

• XGKICK

• VU Mem load/store instruction

• Microinstructions that set the E bit to 1
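
The restrictions listed above lend themselves to a simple static check. The Python sketch below is one possible form of such a check, not a tool described in this manual. The program representation, the function name `check_e_bit_delay_slots`, and the exact membership of the mnemonic sets are assumptions for illustration; in particular, the set of instructions that synchronize with external units contains only the two examples named above.

```python
BRANCHES = {"B", "BAL", "JR", "JALR",
            "IBEQ", "IBNE", "IBGEZ", "IBGTZ", "IBLEZ", "IBLTZ"}
EXTERNAL_SYNC = {"XTOP", "XITOP"}           # examples named in the list above
LOAD_STORE = {"LQ", "LQD", "LQI", "SQ", "SQD", "SQI",
              "ILW", "ISW", "ILWR", "ISWR"}
FORBIDDEN = BRANCHES | EXTERNAL_SYNC | LOAD_STORE | {"XGKICK"}


def check_e_bit_delay_slots(program):
    """Return the indices of E bit delay slot instructions that break a rule.

    program: list of (lower_mnemonic, e_bit) tuples, one per 64-bit
    microinstruction, in address order.
    """
    violations = []
    for index in range(1, len(program)):
        previous_e_bit = program[index - 1][1]
        if not previous_e_bit:
            continue                        # not a delay slot
        mnemonic, e_bit = program[index]
        if e_bit or mnemonic in FORBIDDEN:
            violations.append(index)
    return violations


good = [("NOP", False), ("NOP", True), ("NOP", False)]
bad = [("NOP", False), ("NOP", True), ("IBNE", False), ("NOP", False)]
```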

If a micro subroutine is terminated while an FDIV or EFU instruction is executing, the FDIV or EFU operation continues, and the result is stored in the Q or P register, respectively, after the usual latency.


3.6. Other Functions

3.6.1. Data Transfer with VU Mem/Micro Mem

VU Mem and Micro Mem are I/O-mapped to the main memory of the EE Core. When the VPU is not

operating, these memory locations are accessible directly from the EE Core.

The address map is shown in the following table.

Memory      Address
MicroMem0   0x1100_0000 - 0x1100_0ff0
VUMem0      0x1100_4000 - 0x1100_4ff0
MicroMem1   0x1100_8000 - 0x1100_bff0
VUMem1      0x1100_c000 - 0x1100_fff0
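
As an illustration of how the address map might be used, the following Python sketch classifies an EE Core address. It assumes that the end address in each row is the address of the last 16-byte location of the region; that reading is an inference from the table, not something stated in the text. The names `REGIONS` and `locate` are hypothetical.

```python
QWORD = 16  # assumed size in bytes of the last addressed location in each row

REGIONS = [
    ("MicroMem0", 0x11000000, 0x11000FF0),
    ("VUMem0",    0x11004000, 0x11004FF0),
    ("MicroMem1", 0x11008000, 0x1100BFF0),
    ("VUMem1",    0x1100C000, 0x1100FFF0),
]


def locate(address):
    """Return (memory name, byte offset) for an EE Core address, or None."""
    for name, first, last in REGIONS:
        if first <= address < last + QWORD:
            return name, address - first
    return None
```

For example, `locate(0x1100C010)` reports an offset of 0x10 into VUMem1, and an address between two regions, such as 0x11001000, yields `None`.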

3.6.2. Debug Support Function

Execution of a micro subroutine can be suspended for debugging by setting the D bit in the operation code field

to 1. For further information, see "7.3. Micro Subroutine Debugging".


4. Micro Mode Instruction Reference


4.1. Micro Mode Instruction Set

4.1.1. Types of Upper Instruction

There are four types of Upper instructions that primarily execute floating-point calculations:

UpperOP field type 0

Specifies three registers (VF[fs],VF[ft],VF[fd]) and a broadcast field (e.g. ADDbc instruction), and performs scalar

calculations as follows:

Example: ADDx.xyzw VF10xyzw, VF20xyzw, VF30x

Operation: VF10x = VF20x + VF30x

VF10y = VF20y + VF30x

VF10z = VF20z + VF30x

VF10w = VF20w + VF30x

Upper 32-bit word: UpperOP field type 0

63 62 61 60 59 58 57 56 55 54 53 52 51 50 49 48 47 46 45 44 43 42 41 40 39 38 37 36 35 34 33 32

I E M D T - - dest ft reg fs reg fd reg OPCODE bc

- - - - - 0 0 ---- ----- ----- ----- ---- --

1 1 1 1 1 1 1 4 bits 5 bits 5 bits 5 bits 4 bits 2 bits
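
The field layout above maps directly onto shift-and-mask code. The Python sketch below packs and unpacks the Upper 32-bit word of field type 0 from the widths in the diagram. The function names are hypothetical, and the opcode value used in the example is an arbitrary placeholder, not the encoding of any real instruction.

```python
# (name, width) from bit 63 down to bit 32, as in the diagram above.
UPPER_TYPE0_FIELDS = [
    ("I", 1), ("E", 1), ("M", 1), ("D", 1), ("T", 1), ("reserved", 2),
    ("dest", 4), ("ft", 5), ("fs", 5), ("fd", 5), ("opcode", 4), ("bc", 2),
]


def pack_upper_type0(**values):
    """Build the Upper 32-bit word; fields that are not given default to 0."""
    word = 0
    for name, width in UPPER_TYPE0_FIELDS:
        value = values.get(name, 0)
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name} does not fit in {width} bits")
        word = (word << width) | value
    return word


def unpack_upper_type0(word):
    """Split an Upper 32-bit word back into its fields."""
    fields = {}
    shift = 32
    for name, width in UPPER_TYPE0_FIELDS:
        shift -= width
        fields[name] = (word >> shift) & ((1 << width) - 1)
    return fields


# Arbitrary example values; opcode 0b0101 is a placeholder only.
word = pack_upper_type0(E=1, dest=0b1111, ft=30, fs=20, fd=10,
                        opcode=0b0101, bc=0b10)
fields = unpack_upper_type0(word)
```

Bit n of the 64-bit instruction corresponds to bit n - 32 of this word, so the E bit (bit 62) is bit 30 of the packed value.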

UpperOP field type 1

Specifies three registers (VF[fs],VF[ft],VF[fd] ), e.g. ADD instruction, and performs vector calculations as

follows:

Example: ADD.xyzw VF10xyzw, VF20xyzw, VF30xyzw

Operation: VF10x = VF20x + VF30x

VF10y = VF20y + VF30y

VF10z = VF20z + VF30z

VF10w = VF20w + VF30w

Upper 32-bit word: UpperOP field type 1

63 62 61 60 59 58 57 56 55 54 53 52 51 50 49 48 47 46 45 44 43 42 41 40 39 38 37 36 35 34 33 32

I E M D T - - dest ft reg fs reg fd reg OPCODE

- - - - - 0 0 ---- ----- ----- ----- ------

1 1 1 1 1 1 1 4 bits 5 bits 5 bits 5 bits 6 bits

UpperOP field type 2

Specifies two registers (VF[fs],VF[ft]) and a broadcast field, e.g. ADDAbc instruction, and performs scalar

calculations as follows:

Example: ADDAx.xyzw ACCxyzw, VF20xyzw, VF30x

Operation: ACCx = VF20x + VF30x

ACCy = VF20y + VF30x

ACCz = VF20z + VF30x

ACCw = VF20w + VF30x


Upper 32-bit word: UpperOP field type 2

63 62 61 60 59 58 57 56 55 54 53 52 51 50 49 48 47 46 45 44 43 42 41 40 39 38 37 36 35 34 33 32

I E M D T - - dest ft reg fs reg OPCODE bc

- - - - - 0 0 ---- ----- ----- ----- 1111 --

1 1 1 1 1 1 1 4 bits 5 bits 5 bits 9 bits 2 bits

UpperOP field type 3

Specifies two registers (VF[fs],VF[ft]), e.g. ADDA instruction. Performs vector calculations as follows:

Example: ADDA.xyzw ACCxyzw, VF20xyzw, VF30xyzw

Operation: ACCx = VF20x + VF30x

ACCy = VF20y + VF30y

ACCz = VF20z + VF30z

ACCw = VF20w + VF30w

Upper 32-bit word: UpperOP field type 3

63 62 61 60 59 58 57 56 55 54 53 52 51 50 49 48 47 46 45 44 43 42 41 40 39 38 37 36 35 34 33 32

I E M D T - - dest ft reg fs reg OPCODE

- - - - - 0 0 ---- ----- ----- ----- 1111 --

1 1 1 1 1 1 1 4 bits 5 bits 5 bits 11 bits
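
The four Upper instruction types differ only in whether the second operand is broadcast and in whether the result goes to VF[fd] or to ACC. The Python sketch below models the per-field arithmetic of the ADD examples above. The function name `fmac_add` and the dictionary representation are assumptions for illustration, and Python floats are double precision, so the sketch does not reproduce VU single-precision rounding.

```python
FIELD_NAMES = "xyzw"


def fmac_add(fs, ft, dest="xyzw", bc=None, old=None):
    """Model ADD/ADDbc (result to VF[fd]) and ADDA/ADDAbc (result to ACC).

    fs, ft: dicts mapping "x", "y", "z", "w" to numbers.
    dest:   fields to be written; all others keep their value from old.
    bc:     broadcast field of ft, or None for a vector calculation.
    old:    previous contents of the destination (defaults to all 0.0).
    """
    result = dict(old) if old else {name: 0.0 for name in FIELD_NAMES}
    for name in dest:
        result[name] = fs[name] + (ft[bc] if bc else ft[name])
    return result


vf20 = {"x": 1.0, "y": 2.0, "z": 3.0, "w": 4.0}
vf30 = {"x": 10.0, "y": 20.0, "z": 30.0, "w": 40.0}

scalar = fmac_add(vf20, vf30, bc="x")   # ADDx.xyzw VF10, VF20, VF30x
vector = fmac_add(vf20, vf30)           # ADD.xyzw  VF10, VF20, VF30
```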

4.1.2. Types of Lower Instructions

There are 7 types of Lower instructions:

LowerOP field type 1

Specifies 3 registers, e.g. IADD instruction.

Lower 32-bit word: LowerOP field type 1

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 09 08 07 06 05 04 03 02 01 00

Lower OP. dest ft reg fs reg fd reg OPCODE

1000000 ---- ----- ----- ----- ------

7 bits 4 bits 5 bits 5 bits 5 bits 6 bits

LowerOP field type 3

Specifies up to two registers and a dest field, e.g. MOVE instruction.

Example: MOVE.xyzw VF10xyzw, VF20xyzw

Operation: VF10x = VF20x

VF10y = VF20y

VF10z = VF20z

VF10w = VF20w

Lower 32-bit word: LowerOP field type 3

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 09 08 07 06 05 04 03 02 01 00

Lower OP. dest ft reg fs reg OPCODE

1000000 ---- ----- ----- ----- 1111 --

7 bits 4 bits 5 bits 5 bits 11 bits


LowerOP field type 4

Specifies 2 floating-point registers with a specific field for each, e.g. DIV instruction.

Example: DIV Q, VF10x, VF20y

Operation: Q = VF10x ÷ VF20y

Lower 32-bit word: LowerOP field type 4

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 09 08 07 06 05 04 03 02 01 00

Lower OP. ftf fsf ft reg fs reg OPCODE

1000000 -- -- ----- ----- ----- 1111 --

7 bits 2 bits 2 bits 5 bits 5 bits 11 bits

LowerOP field type 5

Specifies 2 registers and a 5-bit immediate value, e.g. IADDI instruction.

Lower 32-bit word: LowerOP field type 5

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 09 08 07 06 05 04 03 02 01 00

Lower OP. dest it reg is reg Imm5 OPCODE

1000000 0000 ----- ----- ----- ------

7 bits 4 bits 5 bits 5 bits 5 bits 6 bits

LowerOP field type 7

Specifies 2 registers and an 11-bit immediate value, e.g. ILW instruction.

Lower 32-bit word: LowerOP field type 7

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 09 08 07 06 05 04 03 02 01 00

Lower OP. dest it reg fs reg Imm11

0------ ---- ----- ----- -----------

7 bits 4 bits 5 bits 5 bits 11 bits

LowerOP field type 8

Specifies 2 registers and a 15-bit immediate value.

Lower 32-bit word: LowerOP field type 8

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 09 08 07 06 05 04 03 02 01 00

Lower OP. Imm15 it reg fs reg Imm15

0------ ---- ----- ----- -----------

7 bits 4 bits 5 bits 5 bits 11 bits

LowerOP field type 9

Specifies a 24-bit immediate value, e.g. FCAND instruction.

Lower 32-bit word: LowerOP field type 9

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 09 08 07 06 05 04 03 02 01 00

Lower OP. - Imm24

0------ 0 ------------------------

7 bits 1 24 bits


4.1.3. Operation Fields for Micro Instructions

Various operation fields in operation codes are explained in this section.

dest field (Upper/Lower)

Upper 32-bit word: UpperOP field type 0

63 62 61 60 59 58 57 56 55 54 53 52 51 50 49 48 47 46 45 44 43 42 41 40 39 38 37 36 35 34 33 32
I E M D T - - dest ft reg fs reg fd reg OPCODE bc
- - - - - 0 0 ---- ----- ----- ----- ---- --
1 1 1 1 1 1 1 4 bits 5 bits 5 bits 5 bits 4 bits 2 bits

Lower 32-bit word: LowerOP field type 1

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 09 08 07 06 05 04 03 02 01 00
Lower OP. dest ft reg fs reg fd reg OPCODE
1000000 ---- ----- ----- ----- ------
7 bits 4 bits 5 bits 5 bits 5 bits 6 bits

The dest field specifies the FMAC units to be operated in parallel; that is, which of the x, y, z, and w fields of the 128-bit data are operated on.

The dest field is 4 bits: bits 56 through 53 for Upper instructions and bits 24 through 21 for Lower instructions. Each of the 4 bits can be specified independently; when a bit is set to 1, the corresponding FMAC unit/field becomes effective.

Bit (Upper)  Bit (Lower)  Corresponding FMAC/Field
56           24           x
55           23           y
54           22           z
53           21           w
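
Since x occupies the most significant bit of the dest field and w the least significant, a field suffix such as ".xz" translates into a 4-bit mask as sketched below in Python. The helper names `dest_mask` and `dest_fields` are hypothetical.

```python
# Position within the 4-bit dest field: x is the highest bit (bit 56 or 24).
DEST_BITS = {"x": 0b1000, "y": 0b0100, "z": 0b0010, "w": 0b0001}


def dest_mask(fields):
    """Convert a suffix such as "xz" into the 4-bit dest field value."""
    mask = 0
    for name in fields:
        mask |= DEST_BITS[name]
    return mask


def dest_fields(mask):
    """Convert a 4-bit dest field value back into a suffix such as "xz"."""
    return "".join(name for name in "xyzw" if mask & DEST_BITS[name])
```

Shifting the mask left by 21 places it at bits 24 through 21 of a Lower word or, equivalently, bits 56 through 53 of the 64-bit instruction when applied to the Upper word.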

bc field (Upper)

Upper 32-bit word: UpperOP field type 0

63 62 61 60 59 58 57 56 55 54 53 52 51 50 49 48 47 46 45 44 43 42 41 40 39 38 37 36 35 34 33 32
I E M D T - - dest ft reg fs reg fd reg OPCODE bc
- - - - - 0 0 ---- ----- ----- ----- ---- --
1 1 1 1 1 1 1 4 bits 5 bits 5 bits 5 bits 4 bits 2 bits

Lower 32-bit word: LowerOP field type 1

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 09 08 07 06 05 04 03 02 01 00
Lower OP. dest ft reg fs reg fd reg OPCODE
1000000 ---- ----- ----- ----- ------
7 bits 4 bits 5 bits 5 bits 5 bits 6 bits

The bc field is bits 33 and 32, and specifies the broadcast field as below.

Specified value of bc field Broadcast field

00 x

01 y

10 z

11 w


fsf/ftf field

Upper 32-bit word: UpperOP field type 0

63 62 61 60 59 58 57 56 55 54 53 52 51 50 49 48 47 46 45 44 43 42 41 40 39 38 37 36 35 34 33 32
I E M D T - - dest ft reg fs reg fd reg OPCODE bc
- - - - - 0 0 ---- ----- ----- ----- ---- --
1 1 1 1 1 1 1 4 bits 5 bits 5 bits 5 bits 4 bits 2 bits

Lower 32-bit word: LowerOP field type 4

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 09 08 07 06 05 04 03 02 01 00
Lower OP. ftf fsf ft reg fs reg OPCODE
1000000 -- -- ----- ----- ----- 1111 --
7 bits 2 bits 2 bits 5 bits 5 bits 11 bits

The combinations of the fsf field with the fs reg field and the ftf field with the ft reg field specify the field to be

calculated by the instruction. Bits 22 and 21 of the Lower instruction are used for the fsf field, and bits 24 and

23 are used for the ftf field.

Specified value for fsf/ftf field Field to be operated

00 x

01 y

10 z

11 w

I bit (Upper)

Upper 32-bit word: UpperOP field type 0

63 62 61 60 59 58 57 56 55 54 53 52 51 50 49 48 47 46 45 44 43 42 41 40 39 38 37 36 35 34 33 32
I E M D T - - dest ft reg fs reg fd reg OPCODE bc
- - - - - 0 0 ---- ----- ----- ----- ---- --
1 1 1 1 1 1 1 4 bits 5 bits 5 bits 5 bits 4 bits 2 bits

Lower 32-bit word: LowerOP field type 1

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 09 08 07 06 05 04 03 02 01 00
Lower OP. dest ft reg fs reg fd reg OPCODE
1000000 ---- ----- ----- ----- ------
7 bits 4 bits 5 bits 5 bits 5 bits 6 bits

The I bit is specified when loading an immediate value into the I register. When bit 63 of the Upper instruction

field is set to 1, the contents of the Lower instruction field are loaded into the I register as a single-precision

floating-point immediate value.
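
The following Python sketch illustrates the rule above: when bit 63 is set, the Lower 32-bit word is read as a floating-point bit pattern rather than as a Lower instruction. The function name is hypothetical, and the conversion treats the pattern as IEEE 754 single precision, which is an assumption that holds only for ordinary normalized values; any special-value handling specific to the VU is not modelled.

```python
import struct

I_BIT = 1 << 63  # bit 63 of the 64-bit instruction


def immediate_from_instruction(instruction):
    """Return the value loaded into the I register, or None if the I bit is clear.

    instruction is the full 64-bit word (Upper word in bits 63..32,
    Lower word in bits 31..0).
    """
    if not instruction & I_BIT:
        return None
    lower_word = instruction & 0xFFFFFFFF
    # Reinterpret the 32-bit pattern as a single-precision float.
    return struct.unpack("<f", struct.pack("<I", lower_word))[0]
```

For instance, a Lower word of 0x3F800000 with the I bit set corresponds to the immediate value 1.0.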

E bit (Upper)

Upper 32-bit word: UpperOP field type 0

63 62 61 60 59 58 57 56 55 54 53 52 51 50 49 48 47 46 45 44 43 42 41 40 39 38 37 36 35 34 33 32

I E M D T - - dest ft reg fs reg fd reg OPCODE bc

- - - - - 0 0 ---- ----- ----- ----- ---- --

1 1 1 1 1 1 1 4 bits 5 bits 5 bits 5 bits 4 bits 2 bits

Lower 32-bit word: LowerOP field type 1

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 09 08 07 06 05 04 03 02 01 00

Lower OP. dest ft reg fs reg fd reg OPCODE

1000000 ---- ----- ----- ----- ------

7 bits 4 bits 5 bits 5 bits 5 bits 6 bits

The E bit is bit 62 of the Upper instruction field; it is used when designating termination of a micro subroutine.

When the E bit is set to 1, the VU terminates execution of the micro subroutine after the next instruction and

returns to macro mode.

M bit (Upper)

Upper 32-bit word: UpperOP field type 0

63 62 61 60 59 58 57 56 55 54 53 52 51 50 49 48 47 46 45 44 43 42 41 40 39 38 37 36 35 34 33 32

I E M D T - - dest ft reg fs reg fd reg OPCODE bc

- - - - - 0 0 ---- ----- ----- ----- ---- --

1 1 1 1 1 1 1 4 bits 5 bits 5 bits 5 bits 4 bits 2 bits

Lower 32-bit word: LowerOP field type 1

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 09 08 07 06 05 04 03 02 01 00

Lower OP. dest ft reg fs reg fd reg OPCODE

1000000 ---- ----- ----- ----- ------

7 bits 4 bits 5 bits 5 bits 5 bits 6 bits

The M bit is bit 61 of the Upper instruction field; it specifies QMTC2 / CTC2 instruction interlock. The

QMTC2 / CTC2 instruction is executed without interlocking when the M bit is set to 1. Refer to "5.4. Macro

Mode Pipeline".

D bit (Upper)

Upper 32-bit word: UpperOP field type 0

63 62 61 60 59 58 57 56 55 54 53 52 51 50 49 48 47 46 45 44 43 42 41 40 39 38 37 36 35 34 33 32

I E M D T - - dest ft reg fs reg fd reg OPCODE bc

- - - - - 0 0 ---- ----- ----- ----- ---- --

1 1 1 1 1 1 1 4 bits 5 bits 5 bits 5 bits 4 bits 2 bits

Lower 32-bit word: LowerOP field type 1

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 09 08 07 06 05 04 03 02 01 00

Lower OP. dest ft reg fs reg fd reg OPCODE

1000000 ---- ----- ----- ----- ------

7 bits 4 bits 5 bits 5 bits 5 bits 6 bits

The D bit is bit 60 of the Upper instruction; it specifies a debug break instruction. When the D bit is set to 1

and the instruction is executed, the VU is halted and an interrupt signal is sent to the host processor. The

interrupt can be enabled/disabled by the DE bit (D bit Enable) of the control register FBRST.

T bit (Upper)

Upper 32-bit word: UpperOP field type 0

63 62 61 60 59 58 57 56 55 54 53 52 51 50 49 48 47 46 45 44 43 42 41 40 39 38 37 36 35 34 33 32

I E M D T - - dest ft reg fs reg fd reg OPCODE bc

- - - - - 0 0 ---- ----- ----- ----- ---- --

1 1 1 1 1 1 1 4 bits 5 bits 5 bits 5 bits 4 bits 2 bits

Lower 32-bit word: LowerOP field type 1

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 09 08 07 06 05 04 03 02 01 00

Lower OP. dest ft reg fs reg fd reg OPCODE

1000000 ---- ----- ----- ----- ------

7 bits 4 bits 5 bits 5 bits 5 bits 6 bits

The T bit is bit 59 of the Upper instruction; it specifies a debug halt instruction. In the same manner as the D bit,

when the T bit is set to 1 and the instruction is executed, the VU is halted, and an interrupt signal is sent to the

host processor. The interrupt can be enabled/disabled by the TE bit (T bit Enable) of the control register

FBRST.
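
The five control bits described in this section sit at fixed positions at the top of the Upper instruction. As a minimal sketch, the following Python function reports which of them are set in a 64-bit instruction; the function and table names are illustrative only, while the bit numbers are the ones stated above.

```python
# Bit positions as stated in the text; the names are hypothetical.
FLAG_BITS = {"I": 63, "E": 62, "M": 61, "D": 60, "T": 59}


def decode_flags(instruction):
    """Return a dict mapping each control bit name to True or False."""
    return {
        name: bool((instruction >> bit) & 1)
        for name, bit in FLAG_BITS.items()
    }
```

An instruction with only bits 62 and 59 set would report E and T as True and the other three as False.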

4.2. Upper Instruction Reference

This section describes the function, operation code, mnemonic, operation, flag changes, and throughput/latency

of Upper instructions. They are listed in alphabetical order in mnemonic form. The descriptions also include

examples, programming notes, and reference information.

ABS : Absolute Value

Calculates the absolute value of VF[fs] and stores the result in VF[ft].

Operation Code

Upper 32-bit word: UpperOP field type 3

63 62 61 60 59 58 57 56 55 54 53 52 51 50 49 48 47 46 45 44 43 42 41 40 39 38 37 36 35 34 33 32

I E M D T - - dest ft reg fs reg ABS

- - - - - 0 0 ---- ----- ----- 00111 1111 01

1 1 1 1 1 1 1 4 bits 5 bits 5 bits 11 bits

Mnemonic

ABS.dest VF[ft]dest, VF[fs]dest

Operation

if (x ⊆ dest) then VF[ft]x = |VF[fs]x|

if (y ⊆ dest) then VF[ft]y = |VF[fs]y|

if (z ⊆ dest) then VF[ft]z = |VF[fs]z|

if (w ⊆ dest) then VF[ft]w = |VF[fs]w|

Flag Changes

MAC flag (Oxyzw Uxyzw Sxyzw Zxyzw): ---- ---- ---- ----

status flag (DS IS OS US SS ZS D I O U S Z): - - - - - - - - - - - -

clipping flag: -

Throughput/Latency

1 / 4

Example

ABS.xyzw VF10xyzw, VF20xyzw

VF10x = |VF20x|

VF10y = |VF20y|

VF10z = |VF20z|

VF10w = |VF20w|

ADD : Add

Calculates the sum of VF[fs] and VF[ft], and stores the result in VF[fd].

Operation Code

Upper 32-bit word: UpperOP field type 1

63 62 61 60 59 58 57 56 55 54 53 52 51 50 49 48 47 46 45 44 43 42 41 40 39 38 37 36 35 34 33 32

I E M D T - - dest ft reg fs reg fd reg ADD

- - - - - 0 0 ---- ----- ----- ----- 101000

1 1 1 1 1 1 1 4 bits 5 bits 5 bits 5 bits 6 bits

Mnemonic

ADD.dest VF[fd]dest, VF[fs]dest, VF[ft]dest

Operation

if (x ⊆ dest) then VF[fd]x = VF[fs]x + VF[ft]x

if (y ⊆ dest) then VF[fd]y = VF[fs]y + VF[ft]y

if (z ⊆ dest) then VF[fd]z = VF[fs]z + VF[ft]z

if (w ⊆ dest) then VF[fd]w = VF[fs]w + VF[ft]w

Flag Changes

MAC flag (Oxyzw Uxyzw Sxyzw Zxyzw): XXXX XXXX XXXX XXXX

status flag (DS IS OS US SS ZS D I O U S Z): - - X X X X - - X X X X

clipping flag: -

Throughput/Latency

1 / 4

Example

ADD.xyzw VF10xyzw, VF20xyzw, VF30xyzw

VF10x = VF20x + VF30x

VF10y = VF20y + VF30y

VF10z = VF20z + VF30z

VF10w = VF20w + VF30w
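
To show how the operation code table maps onto an actual word, the following Python sketch assembles the Upper 32-bit word for ADD with the I, E, M, D and T bits all clear. The function names are hypothetical. The field positions are read off the table above (dest in Upper-word bits 24 to 21, ft reg in 20 to 16, fs reg in 15 to 11, fd reg in 10 to 6, opcode in 5 to 0). The ordering of x, y, z, w inside the dest field, with x as the most significant bit, is an assumption; it is consistent with the 1110 pattern shown for .xyz in the CLIP entry.

```python
ADD_OPCODE = 0b101000  # 6-bit opcode from the ADD operation code table


def encode_dest(dest):
    """Map a string such as "xyz" to the 4-bit dest field (x assumed MSB)."""
    return sum(1 << (3 - "xyzw".index(c)) for c in set(dest))


def encode_add(dest, fd, fs, ft):
    """Build the Upper 32-bit word for ADD.dest VF[fd], VF[fs], VF[ft]."""
    for reg in (fd, fs, ft):
        if not 0 <= reg < 32:
            raise ValueError("register numbers are 5-bit values")
    return (
        (encode_dest(dest) << 21)
        | (ft << 16)
        | (fs << 11)
        | (fd << 6)
        | ADD_OPCODE
    )
```

Under these assumptions, the example instruction ADD.xyzw VF10xyzw, VF20xyzw, VF30xyzw corresponds to encode_add("xyzw", fd=10, fs=20, ft=30).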

ADDi : Add I Register

Adds each field of VF[fs] and the I register, and stores the sum in the corresponding field of VF[fd].

Operation Code

Upper 32-bit word: UpperOP field type 1

63 62 61 60 59 58 57 56 55 54 53 52 51 50 49 48 47 46 45 44 43 42 41 40 39 38 37 36 35 34 33 32

I E M D T - - dest ft reg fs reg fd reg ADDi

- - - - - 0 0 ---- 00000 ----- ----- 100010

1 1 1 1 1 1 1 4 bits 5 bits 5 bits 5 bits 6 bits

Mnemonic

ADDi.dest VF[fd]dest, VF[fs]dest, I

Operation

if (x ⊆ dest) then VF[fd]x = VF[fs]x + I

if (y ⊆ dest) then VF[fd]y = VF[fs]y + I

if (z ⊆ dest) then VF[fd]z = VF[fs]z + I

if (w ⊆ dest) then VF[fd]w = VF[fs]w + I

Flag Changes

MAC flag (Oxyzw Uxyzw Sxyzw Zxyzw): XXXX XXXX XXXX XXXX

status flag (DS IS OS US SS ZS D I O U S Z): - - X X X X - - X X X X

clipping flag: -

Throughput/Latency

1 / 4

Example

ADDi.xyzw VF10xyzw, VF20xyzw, I

VF10x = VF20x + I

VF10y = VF20y + I

VF10z = VF20z + I

VF10w = VF20w + I

ADDq : Add Q Register

Adds each field of VF[fs] and the Q register, and stores the sum in the corresponding field of VF[fd].

Operation Code

Upper 32-bit word: UpperOP field type 1

63 62 61 60 59 58 57 56 55 54 53 52 51 50 49 48 47 46 45 44 43 42 41 40 39 38 37 36 35 34 33 32

I E M D T - - dest ft reg fs reg fd reg ADDq

- - - - - 0 0 ---- 00000 ----- ----- 100000

1 1 1 1 1 1 1 4 bits 5 bits 5 bits 5 bits 6 bits

Mnemonic

ADDq.dest VF[fd]dest, VF[fs]dest, Q

Operation

if (x ⊆ dest) then VF[fd]x = VF[fs]x + Q

if (y ⊆ dest) then VF[fd]y = VF[fs]y + Q

if (z ⊆ dest) then VF[fd]z = VF[fs]z + Q

if (w ⊆ dest) then VF[fd]w = VF[fs]w + Q

Flag Changes

MAC flag (Oxyzw Uxyzw Sxyzw Zxyzw): XXXX XXXX XXXX XXXX

status flag (DS IS OS US SS ZS D I O U S Z): - - X X X X - - X X X X

clipping flag: -

Throughput/Latency

1 / 4

Example

ADDq.xyzw VF10xyzw, VF20xyzw, Q

VF10x = VF20x + Q

VF10y = VF20y + Q

VF10z = VF20z + Q

VF10w = VF20w + Q

ADDbc : Broadcast Add

Calculates the sum of each field of VF[fs] and the specified field of VF[ft], and stores the sum in the

corresponding field of VF[fd].

Operation Code

Upper 32-bit word: UpperOP field type 0

63 62 61 60 59 58 57 56 55 54 53 52 51 50 49 48 47 46 45 44 43 42 41 40 39 38 37 36 35 34 33 32

I E M D T - - dest ft reg fs reg fd reg ADD? bc

- - - - - 0 0 ---- ----- ----- ----- 0000 --

1 1 1 1 1 1 1 4 bits 5 bits 5 bits 5 bits 4 bits 2 bits

Mnemonic

ADDbc.dest VF[fd]dest, VF[fs]dest, VF[ft]bc

Operation

if (x ⊆ dest) then VF[fd]x = VF[fs]x + VF[ft]bc

if (y ⊆ dest) then VF[fd]y = VF[fs]y + VF[ft]bc

if (z ⊆ dest) then VF[fd]z = VF[fs]z + VF[ft]bc

if (w ⊆ dest) then VF[fd]w = VF[fs]w + VF[ft]bc

Flag Changes

MAC flag (Oxyzw Uxyzw Sxyzw Zxyzw): XXXX XXXX XXXX XXXX

status flag (DS IS OS US SS ZS D I O U S Z): - - X X X X - - X X X X

clipping flag: -

Throughput/Latency

1 / 4

Example

ADDx.xyzw VF10xyzw, VF20xyzw, VF30x

VF10x = VF20x + VF30x

VF10y = VF20y + VF30x

VF10z = VF20z + VF30x

VF10w = VF20w + VF30x
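
The broadcast behaviour can be sketched as a small Python model. This is only an illustration of the Operation listing above: registers are represented as dictionaries keyed by field name, the function name is hypothetical, and Python floating-point arithmetic stands in for the VU's own arithmetic and flag updates, which are not modelled.

```python
FIELDS = "xyzw"


def addbc(vf_fs, vf_ft, bc, dest="xyzw", vf_fd=None):
    """Model ADDbc.dest: add field bc of VF[ft] to each selected field of VF[fs].

    Fields not named in dest keep their previous value from vf_fd
    (or 0.0 when no previous VF[fd] contents are supplied).
    """
    result = dict(vf_fd) if vf_fd is not None else dict.fromkeys(FIELDS, 0.0)
    for field in FIELDS:
        if field in dest:
            result[field] = vf_fs[field] + vf_ft[bc]
    return result
```

Calling addbc(vf20, vf30, "x") reproduces the ADDx.xyzw example: every field of the result is the matching field of VF20 plus VF30x.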

ADDA : Add; to Accumulator

Adds VF[fs] and VF[ft], and stores the sum in ACC.

Operation Code

Upper 32-bit word: UpperOP field type 3

63 62 61 60 59 58 57 56 55 54 53 52 51 50 49 48 47 46 45 44 43 42 41 40 39 38 37 36 35 34 33 32

I E M D T - - dest ft reg fs reg ADDA

- - - - - 0 0 ---- ----- ----- 01010 1111 00

1 1 1 1 1 1 1 4 bits 5 bits 5 bits 11 bits

Mnemonic

ADDA.dest ACCdest, VF[fs]dest, VF[ft]dest

Operation

if (x ⊆ dest) then ACCx = VF[fs]x + VF[ft]x

if (y ⊆ dest) then ACCy = VF[fs]y + VF[ft]y

if (z ⊆ dest) then ACCz = VF[fs]z + VF[ft]z

if (w ⊆ dest) then ACCw = VF[fs]w + VF[ft]w

Flag Changes

MAC flag (Oxyzw Uxyzw Sxyzw Zxyzw): XXXX XXXX XXXX XXXX

status flag (DS IS OS US SS ZS D I O U S Z): - - X X X X - - X X X X

clipping flag: -

Throughput/Latency

1 / 4

Example

ADDA.xyzw ACCxyzw, VF20xyzw, VF30xyzw

ACCx = VF20x + VF30x

ACCy = VF20y + VF30y

ACCz = VF20z + VF30z

ACCw = VF20w + VF30w

ADDAi : Add I Register; to Accumulator

Adds each field of VF[fs] and the I register, and stores the sum in the corresponding field of ACC.

Operation Code

Upper 32-bit word: UpperOP field type 3

63 62 61 60 59 58 57 56 55 54 53 52 51 50 49 48 47 46 45 44 43 42 41 40 39 38 37 36 35 34 33 32

I E M D T - - dest ft reg fs reg ADDAi

- - - - - 0 0 ---- 00000 ----- 01000 1111 10

1 1 1 1 1 1 1 4 bits 5 bits 5 bits 11 bits

Mnemonic

ADDAi.dest ACCdest, VF[fs]dest, I

Operation

if (x ⊆ dest) then ACCx = VF[fs]x + I

if (y ⊆ dest) then ACCy = VF[fs]y + I

if (z ⊆ dest) then ACCz = VF[fs]z + I

if (w ⊆ dest) then ACCw = VF[fs]w + I

Flag Changes

MAC flag (Oxyzw Uxyzw Sxyzw Zxyzw): XXXX XXXX XXXX XXXX

status flag (DS IS OS US SS ZS D I O U S Z): - - X X X X - - X X X X

clipping flag: -

Throughput/Latency

1 / 4

Example

ADDAi.xyzw ACCxyzw, VF20xyzw, I

ACCx = VF20x + I

ACCy = VF20y + I

ACCz = VF20z + I

ACCw = VF20w + I

ADDAq : Add Q Register; to Accumulator

Adds each field of VF[fs] and the Q register, and stores the sum in the corresponding field of ACC.

Operation Code

Upper 32-bit word: UpperOP field type 3

63 62 61 60 59 58 57 56 55 54 53 52 51 50 49 48 47 46 45 44 43 42 41 40 39 38 37 36 35 34 33 32

I E M D T - - dest ft reg fs reg ADDAq

- - - - - 0 0 ---- 00000 ----- 01000 1111 00

1 1 1 1 1 1 1 4 bits 5 bits 5 bits 11 bits

Mnemonic

ADDAq.dest ACCdest, VF[fs]dest, Q

Operation

if (x ⊆ dest) then ACCx = VF[fs]x + Q

if (y ⊆ dest) then ACCy = VF[fs]y + Q

if (z ⊆ dest) then ACCz = VF[fs]z + Q

if (w ⊆ dest) then ACCw = VF[fs]w + Q

Flag Changes

MAC flag (Oxyzw Uxyzw Sxyzw Zxyzw): XXXX XXXX XXXX XXXX

status flag (DS IS OS US SS ZS D I O U S Z): - - X X X X - - X X X X

clipping flag: -

Throughput/Latency

1 / 4

Example

ADDAq.xyzw ACCxyzw, VF20xyzw, Q

ACCx = VF20x + Q

ACCy = VF20y + Q

ACCz = VF20z + Q

ACCw = VF20w + Q

ADDAbc : Broadcast Add; to Accumulator

Adds each field of VF[fs] and the specified field of VF[ft], and stores the sum in ACC.

Operation Code

Upper 32-bit word: UpperOP field type 2

63 62 61 60 59 58 57 56 55 54 53 52 51 50 49 48 47 46 45 44 43 42 41 40 39 38 37 36 35 34 33 32

I E M D T - - dest ft reg fs reg ADDA? bc

- - - - - 0 0 ---- ----- ----- 00000 1111 --

1 1 1 1 1 1 1 4 bits 5 bits 5 bits 9 bits 2 bits

Mnemonic

ADDAbc.dest ACCdest, VF[fs]dest, VF[ft]bc

Operation

if (x ⊆ dest) then ACCx = VF[fs]x + VF[ft]bc

if (y ⊆ dest) then ACCy = VF[fs]y + VF[ft]bc

if (z ⊆ dest) then ACCz = VF[fs]z + VF[ft]bc

if (w ⊆ dest) then ACCw = VF[fs]w + VF[ft]bc

Flag Changes

MAC flag (Oxyzw Uxyzw Sxyzw Zxyzw): XXXX XXXX XXXX XXXX

status flag (DS IS OS US SS ZS D I O U S Z): - - X X X X - - X X X X

clipping flag: -

Throughput/Latency

1 / 4

Example

ADDAx.xyzw ACCxyzw, VF20xyzw, VF30x

ACCx = VF20x + VF30x

ACCy = VF20y + VF30x

ACCz = VF20z + VF30x

ACCw = VF20w + VF30x

CLIP : Clipping Judgment

Performs clipping judgment on the x, y, and z fields of VF[fs] against the w field of VF[ft], and sets the result in the clipping flag (CF).

Operation Code

Upper 32-bit word: UpperOP field type 3

63 62 61 60 59 58 57 56 55 54 53 52 51 50 49 48 47 46 45 44 43 42 41 40 39 38 37 36 35 34 33 32

I E M D T - - dest ft reg fs reg CLIPw

- - - - - 0 0 1110 ----- ----- 00111 1111 11

1 1 1 1 1 1 1 4 bits 5 bits 5 bits 11 bits

Mnemonic

CLIPw.xyz VF[fs]xyz, VF[ft]w

Operation

CF = CF << 6

if (VF[fs]x > +|VF[ft]w|) then {set +x flag}

if (VF[fs]x < -|VF[ft]w|) then {set -x flag}

if (VF[fs]y > +|VF[ft]w|) then {set +y flag}

if (VF[fs]y < -|VF[ft]w|) then {set -y flag}

if (VF[fs]z > +|VF[ft]w|) then {set +z flag}

if (VF[fs]z < -|VF[ft]w|) then {set -z flag}

Flag Changes

MAC flag (Oxyzw Uxyzw Sxyzw Zxyzw): ---- ---- ---- ----

status flag (DS IS OS US SS ZS D I O U S Z): - - - - - - - - - - - -

clipping flag: X

Throughput/Latency

1 / 4

Example

CLIPw.xyz VF10xyz, VF10w

Under the following condition:

VF10x > +|VF10w|,

-|VF10w| < VF10y < +|VF10w|,

VF10z < -|VF10w|

Before Execution

23 22 21 20 19 18 17 16 15 14 13 12 11 10 09 08 07 06 05 04 03 02 01 00

3rd previous judgment | 2nd previous judgment | Previous judgment | Current judgment

-z +z -y +y -x +x | -z +z -y +y -x +x | -z +z -y +y -x +x | -z +z -y +y -x +x

0 1 0 1 0 0 | 0 0 0 0 0 0 | 0 0 0 0 1 0 | 1 0 1 0 1 0

6 bits | 6 bits | 6 bits | 6 bits

↓

After Execution

23 22 21 20 19 18 17 16 15 14 13 12 11 10 09 08 07 06 05 04 03 02 01 00

3rd previous judgment | 2nd previous judgment | Previous judgment | Current judgment

-z +z -y +y -x +x | -z +z -y +y -x +x | -z +z -y +y -x +x | -z +z -y +y -x +x

0 0 0 0 0 0 | 0 0 0 0 1 0 | 1 0 1 0 1 0 | 1 0 0 0 0 1

6 bits | 6 bits | 6 bits | 6 bits

Remarks

In order to branch according to the results of clipping judgment, the Lower instructions

FCAND/FCEQ/FCGET/FCOR are available.
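
The Operation listing and the Before/After example can be checked with a short Python model. The function name is hypothetical, and CF is treated as a 24-bit value, as the bit numbering 23 to 00 in the example suggests. The bit assignment within each 6-bit judgment (+x in bit 0 up to -z in bit 5) is taken from the column labels of the example.

```python
CF_MASK = (1 << 24) - 1  # the clipping flag holds four 6-bit judgments


def clip(cf, x, y, z, w):
    """Return the new CF after CLIPw.xyz on (x, y, z) against w."""
    limit = abs(w)
    cf = (cf << 6) & CF_MASK  # age the earlier judgments; the oldest drops out
    for index, value in enumerate((x, y, z)):
        if value > limit:
            cf |= 1 << (2 * index)      # +x, +y, +z are bits 0, 2, 4
        if value < -limit:
            cf |= 1 << (2 * index + 1)  # -x, -y, -z are bits 1, 3, 5
    return cf
```

With x above +|w|, y inside the range and z below -|w|, the model sets the +x and -z bits of the current judgment, matching the After Execution row of the example.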

FTOI0 : Convert to Fixed Point

Converts the value of VF[fs] into a fixed-point number whose fractional part is 0 bits, and stores the result in

VF[ft].

Operation Code

Upper 32-bit word: UpperOP field type 3

63 62 61 60 59 58 57 56 55 54 53 52 51 50 49 48 47 46 45 44 43 42 41 40 39 38 37 36 35 34 33 32

I E M D T - - dest ft reg fs reg FTOI0

- - - - - 0 0 ---- ----- ----- 00101 1111 00

1 1 1 1 1 1 1 4 bits 5 bits 5 bits 11 bits

Mnemonic

FTOI0.dest VF[ft]dest, VF[fs]dest

Operation

if (x ⊆ dest) then

VF[ft]x = float_to_integer0(VF[fs]x)

if (y ⊆ dest) then

VF[ft]y = float_to_integer0(VF[fs]y)

if (z ⊆ dest) then

VF[ft]z = float_to_integer0(VF[fs]z)

if (w ⊆ dest) then

VF[ft]w = float_to_integer0(VF[fs]w)

Flag Changes

MAC flag (Oxyzw Uxyzw Sxyzw Zxyzw): ---- ---- ---- ----

status flag (DS IS OS US SS ZS D I O U S Z): - - - - - - - - - - - -

clipping flag: -

Throughput/Latency

1 / 4

Example

FTOI0.xyzw VF10xyzw, VF20xyzw

VF10x = float_to_integer0(VF20x)

VF10y = float_to_integer0(VF20y)

VF10z = float_to_integer0(VF20z)

VF10w = float_to_integer0(VF20w)

Remarks

A few examples are shown in the following table:

x float_to_integer0(x)

-0.45 0

0.45 0

0.55 0

123.45 123

FTOI4 : Convert to Fixed Point

Converts the value of VF[fs] into a fixed-point number whose fractional part is 4 bits, and stores the result in

VF[ft].

Operation Code

Upper 32-bit word: UpperOP field type 3

63 62 61 60 59 58 57 56 55 54 53 52 51 50 49 48 47 46 45 44 43 42 41 40 39 38 37 36 35 34 33 32

I E M D T - - dest ft reg fs reg FTOI4

- - - - - 0 0 ---- ----- ----- 00101 1111 01

1 1 1 1 1 1 1 4 bits 5 bits 5 bits 11 bits

Mnemonic

FTOI4.dest VF[ft]dest, VF[fs]dest

Operation

if (x ⊆ dest) then

VF[ft]x = float_to_integer4(VF[fs]x)

if (y ⊆ dest) then

VF[ft]y = float_to_integer4(VF[fs]y)

if (z ⊆ dest) then

VF[ft]z = float_to_integer4(VF[fs]z)

if (w ⊆ dest) then

VF[ft]w = float_to_integer4(VF[fs]w)

Flag Changes

MAC flag (Oxyzw Uxyzw Sxyzw Zxyzw): ---- ---- ---- ----

status flag (DS IS OS US SS ZS D I O U S Z): - - - - - - - - - - - -

clipping flag: -

Throughput/Latency

1 / 4

Example

FTOI4.xyzw VF10xyzw, VF20xyzw

VF10x = float_to_integer4(VF20x)

VF10y = float_to_integer4(VF20y)

VF10z = float_to_integer4(VF20z)

VF10w = float_to_integer4(VF20w)

Remarks

A few examples are shown in the following table:

x float_to_integer4(x)

-0.45 -7

0.45 7

0.55 8

123.45 1975

FTOI12 : Convert to Fixed Point

Converts the value of VF[fs] into a fixed-point number whose fractional part is 12 bits, and stores the result in

VF[ft].

Operation Code

Upper 32-bit word: UpperOP field type 3

63 62 61 60 59 58 57 56 55 54 53 52 51 50 49 48 47 46 45 44 43 42 41 40 39 38 37 36 35 34 33 32

I E M D T - - dest ft reg fs reg FTOI12

- - - - - 0 0 ---- ----- ----- 00101 1111 10

1 1 1 1 1 1 1 4 bits 5 bits 5 bits 11 bits

Mnemonic

FTOI12.dest VF[ft]dest, VF[fs]dest

Operation

if (x ⊆ dest) then

VF[ft]x = float_to_integer12(VF[fs]x)

if (y ⊆ dest) then

VF[ft]y = float_to_integer12(VF[fs]y)

if (z ⊆ dest) then

VF[ft]z = float_to_integer12(VF[fs]z)

if (w ⊆ dest) then

VF[ft]w = float_to_integer12(VF[fs]w)

Flag Changes

MAC flag (Oxyzw Uxyzw Sxyzw Zxyzw): ---- ---- ---- ----

status flag (DS IS OS US SS ZS D I O U S Z): - - - - - - - - - - - -

clipping flag: -

Throughput/Latency

1 / 4

Example

FTOI12.xyzw VF10xyzw, VF20xyzw

VF10x = float_to_integer12(VF20x)

VF10y = float_to_integer12(VF20y)

VF10z = float_to_integer12(VF20z)

VF10w = float_to_integer12(VF20w)

Remarks

A few examples are shown in the following table:

x float_to_integer12(x)

-0.45 -1843

0.45 1843

0.55 2252

123.45 505651

FTOI15 : Convert to Fixed Point

Converts the value of VF[fs] into a fixed-point number whose fractional part is 15 bits, and stores the result in

VF[ft].

Operation Code

Upper 32-bit word: UpperOP field type 3

63 62 61 60 59 58 57 56 55 54 53 52 51 50 49 48 47 46 45 44 43 42 41 40 39 38 37 36 35 34 33 32

I E M D T - - dest ft reg fs reg FTOI15

- - - - - 0 0 ---- ----- ----- 00101 1111 11

1 1 1 1 1 1 1 4 bits 5 bits 5 bits 11 bits

Mnemonic

FTOI15.dest VF[ft]dest, VF[fs]dest

Operation

if (x ⊆ dest) then

VF[ft]x = float_to_integer15(VF[fs]x)

if (y ⊆ dest) then

VF[ft]y = float_to_integer15(VF[fs]y)

if (z ⊆ dest) then

VF[ft]z = float_to_integer15(VF[fs]z)

if (w ⊆ dest) then

VF[ft]w = float_to_integer15(VF[fs]w)

Flag Changes

MAC flag (Oxyzw Uxyzw Sxyzw Zxyzw): ---- ---- ---- ----

status flag (DS IS OS US SS ZS D I O U S Z): - - - - - - - - - - - -

clipping flag: -

Throughput/Latency

1 / 4

Example

FTOI15.xyzw VF10xyzw, VF20xyzw

VF10x = float_to_integer15(VF20x)

VF10y = float_to_integer15(VF20y)

VF10z = float_to_integer15(VF20z)

VF10w = float_to_integer15(VF20w)

Remarks

A few examples are shown in the following table:

x float_to_integer15(x)

-0.45 -14745

0.45 14745

0.55 18022

123.45 4045209
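
The example tables for FTOI0, FTOI4, FTOI12 and FTOI15 are all consistent with scaling the input by 2 to the power of the fractional bit count and discarding the fraction toward zero. The Python sketch below models the conversion under that assumption. The function name is illustrative, Python floats stand in for VU single-precision values, and overflow or out-of-range inputs are not modelled.

```python
import math


def float_to_integer(value, fraction_bits):
    """Scale by 2**fraction_bits and drop the fractional part (toward zero)."""
    return math.trunc(value * (1 << fraction_bits))
```

With fraction_bits set to 0, 4, 12 or 15, this reproduces the entries of the four tables for the inputs -0.45, 0.45, 0.55 and 123.45.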

ITOF0 : Convert to Floating-Point Number

Interprets the value of VF[fs] as a fixed-point number whose fractional part is 0 bits, converts it to floating point, and stores the result in VF[ft].

Operation Code

Upper 32-bit word: UpperOP field type 3

63 62 61 60 59 58 57 56 55 54 53 52 51 50 49 48 47 46 45 44 43 42 41 40 39 38 37 36 35 34 33 32

I E M D T - - dest ft reg fs reg ITOF0

- - - - - 0 0 ---- ----- ----- 00100 1111 00

1 1 1 1 1 1 1 4 bits 5 bits 5 bits 11 bits

Mnemonic

ITOF0.dest VF[ft]dest, VF[fs]dest

Operation

if (x ⊆ dest) then

VF[ft]x = integer_to_float0(VF[fs]x)

if (y ⊆ dest) then

VF[ft]y = integer_to_float0(VF[fs]y)

if (z ⊆ dest) then

VF[ft]z = integer_to_float0(VF[fs]z)

if (w ⊆ dest) then

VF[ft]w = integer_to_float0(VF[fs]w)

Flag Changes

MAC flag (Oxyzw Uxyzw Sxyzw Zxyzw): ---- ---- ---- ----

status flag (DS IS OS US SS ZS D I O U S Z): - - - - - - - - - - - -

clipping flag: -

Throughput/Latency

1 / 4

Example

ITOF0.xyzw VF10xyzw, VF20xyzw

VF10x = integer_to_float0(VF20x)

VF10y = integer_to_float0(VF20y)

VF10z = integer_to_float0(VF20z)

VF10w = integer_to_float0(VF20w)

Remarks

A few examples are shown in the following table:

x integer_to_float0 (x)

-12 -12.0

1 1.0

123 123.0

1843 1843.0

ITOF4 : Convert to Floating-Point Number

Interprets the value of VF[fs] as a fixed-point number whose fractional part is 4 bits, converts it to floating point, and stores the result in VF[ft].

Operation Code

Upper 32-bit word: UpperOP field type 3

63 62 61 60 59 58 57 56 55 54 53 52 51 50 49 48 47 46 45 44 43 42 41 40 39 38 37 36 35 34 33 32

I E M D T - - dest ft reg fs reg ITOF4

- - - - - 0 0 ---- ----- ----- 00100 1111 01

1 1 1 1 1 1 1 4 bits 5 bits 5 bits 11 bits

Mnemonic

ITOF4.dest VF[ft]dest, VF[fs]dest

Operation

if (x ⊆ dest) then

VF[ft]x = integer_to_float4(VF[fs]x)

if (y ⊆ dest) then

VF[ft]y = integer_to_float4(VF[fs]y)

if (z ⊆ dest) then

VF[ft]z = integer_to_float4(VF[fs]z)

if (w ⊆ dest) then

VF[ft]w = integer_to_float4(VF[fs]w)

Flag Changes

MAC flag (Oxyzw Uxyzw Sxyzw Zxyzw): ---- ---- ---- ----

status flag (DS IS OS US SS ZS D I O U S Z): - - - - - - - - - - - -

clipping flag: -

Throughput/Latency

1 / 4

Example

ITOF4.xyzw VF10xyzw, VF20xyzw

VF10x = integer_to_float4(VF20x)

VF10y = integer_to_float4(VF20y)

VF10z = integer_to_float4(VF20z)

VF10w = integer_to_float4(VF20w)

Remarks

A few examples are shown in the following table:

x integer_to_float4(x)

-12 -0.750000

1 0.062500

123 7.687500

1843 115.187500

ITOF12 : Convert to Floating-Point Number

Interprets the value of VF[fs] as a fixed-point number whose fractional part is 12 bits, converts it to floating point, and stores the result in VF[ft].

Operation Code

Upper 32-bit word: UpperOP field type 3

63 62 61 60 59 58 57 56 55 54 53 52 51 50 49 48 47 46 45 44 43 42 41 40 39 38 37 36 35 34 33 32

I E M D T - - dest ft reg fs reg ITOF12

- - - - - 0 0 ---- ----- ----- 00100 1111 10

1 1 1 1 1 1 1 4 bits 5 bits 5 bits 11 bits

Mnemonic

ITOF12.dest VF[ft]dest, VF[fs]dest

Operation

if (x ⊆ dest) then

VF[ft]x = integer_to_float12(VF[fs]x)

if (y ⊆ dest) then

VF[ft]y = integer_to_float12(VF[fs]y)

if (z ⊆ dest) then

VF[ft]z = integer_to_float12(VF[fs]z)

if (w ⊆ dest) then

VF[ft]w = integer_to_float12(VF[fs]w)

Flag Changes

MAC flag (Oxyzw Uxyzw Sxyzw Zxyzw): ---- ---- ---- ----

status flag (DS IS OS US SS ZS D I O U S Z): - - - - - - - - - - - -

clipping flag: -

Throughput/Latency

1 / 4

Example

ITOF12.xyzw VF10xyzw, VF20xyzw

VF10x = integer_to_float12(VF20x)

VF10y = integer_to_float12(VF20y)

VF10z = integer_to_float12(VF20z)

VF10w = integer_to_float12(VF20w)

Remarks

A few examples are shown in the following table:

x integer_to_float12(x)

-12 -0.002930

1 0.000244

123 0.030029

1843 0.449951

ITOF15 : Convert to Floating-Point Number

Interprets the value of VF[fs] as a fixed-point number whose fractional part is 15 bits, converts it to floating point, and stores the result in VF[ft].

Operation Code

Upper 32-bit word: UpperOP field type 3

63 62 61 60 59 58 57 56 55 54 53 52 51 50 49 48 47 46 45 44 43 42 41 40 39 38 37 36 35 34 33 32

I E M D T - - dest ft reg fs reg ITOF15

- - - - - 0 0 ---- ----- ----- 00100 1111 11

1 1 1 1 1 1 1 4 bits 5 bits 5 bits 11 bits

Mnemonic

ITOF15.dest VF[ft]dest, VF[fs]dest

Operation

if (x ⊆ dest) then

VF[ft]x = integer_to_float15(VF[fs]x)

if (y ⊆ dest) then

VF[ft]y = integer_to_float15(VF[fs]y)

if (z ⊆ dest) then

VF[ft]z = integer_to_float15(VF[fs]z)

if (w ⊆ dest) then

VF[ft]w = integer_to_float15(VF[fs]w)

Flag Changes

MAC flag (Oxyzw Uxyzw Sxyzw Zxyzw): ---- ---- ---- ----

status flag (DS IS OS US SS ZS D I O U S Z): - - - - - - - - - - - -

clipping flag: -

Throughput/Latency

1 / 4

Example

ITOF15.xyzw VF10xyzw, VF20xyzw

VF10x = integer_to_float15(VF20x)

VF10y = integer_to_float15(VF20y)

VF10z = integer_to_float15(VF20z)

VF10w = integer_to_float15(VF20w)

Remarks

A few examples are shown in the following table:

x integer_to_float15(x)

-12 -0.000366

1 0.000031

123 0.003754

1843 0.056244
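
The ITOF example tables correspond to dividing the integer input by 2 to the power of the fractional bit count. The following Python sketch models that relationship. The function name is illustrative, and Python floats are used in place of VU single-precision values, so rounding of very large inputs is not modelled.

```python
def integer_to_float(value, fraction_bits):
    """Interpret value as fixed point with fraction_bits fractional bits."""
    return value / (1 << fraction_bits)
```

Formatting the results to six decimal places gives the figures listed in the ITOF4, ITOF12 and ITOF15 tables for the inputs -12, 1, 123 and 1843.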

MADD : Product Sum

Adds the value of ACC to the product of VF[fs] and VF[ft], and stores the result in VF[fd].

Operation Code

Upper 32-bit word: UpperOP field type 1

63 62 61 60 59 58 57 56 55 54 53 52 51 50 49 48 47 46 45 44 43 42 41 40 39 38 37 36 35 34 33 32

I E M D T - - dest ft reg fs reg fd reg MADD

- - - - - 0 0 ---- ----- ----- ----- 101001

1 1 1 1 1 1 1 4 bits 5 bits 5 bits 5 bits 6 bits

Mnemonic

MADD.dest VF[fd]dest, VF[fs]dest, VF[ft]dest

Operation

if (x ⊆ dest) then VF[fd]x = ACCx + VF[fs]x × VF[ft]x

if (y ⊆ dest) then VF[fd]y = ACCy + VF[fs]y × VF[ft]y

if (z ⊆ dest) then VF[fd]z = ACCz + VF[fs]z × VF[ft]z

if (w ⊆ dest) then VF[fd]w = ACCw + VF[fs]w × VF[ft]w

Flag Changes

MAC flag (Oxyzw Uxyzw Sxyzw Zxyzw): XXXX XXXX XXXX XXXX

status flag (DS IS OS US SS ZS D I O U S Z): - - X X X X - - X X X X

clipping flag: -

Throughput/Latency

1 / 4

Example

MADD.xyzw VF10xyzw, VF20xyzw, VF30xyzw

VF10x = ACCx + VF20x × VF30x

VF10y = ACCy + VF20y × VF30y

VF10z = ACCz + VF20z × VF30z

VF10w = ACCw + VF20w × VF30w

Remarks

There is an operation error of 1 bit in multiplication, so the value multiplied by 1 may not be the same as the

original value. By using VF[fs] as a multiplicand, the results of multiplication with 1 are guaranteed to be

accurate.
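
The Operation listing for MADD can be sketched as a field-wise multiply-accumulate in Python. The function name is hypothetical and registers are represented as 4-tuples in (x, y, z, w) order. Python arithmetic is used throughout, so neither the 1-bit multiplication error described in the Remarks nor the flag updates are modelled.

```python
def madd(acc, vf_fs, vf_ft, vf_fd, dest="xyzw"):
    """Model MADD.dest: VF[fd] = ACC + VF[fs] * VF[ft] for each field in dest.

    Fields not named in dest keep their previous value from vf_fd.
    """
    return tuple(
        a + s * t if name in dest else old
        for name, a, s, t, old in zip("xyzw", acc, vf_fs, vf_ft, vf_fd)
    )
```

For example, with ACC = (1, 1, 1, 1), VF20 = (1, 2, 3, 4) and VF30 = (10, 20, 30, 40), the model yields (11, 41, 91, 161) for MADD.xyzw.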

MADDi : Product Sum; by I Register

Multiplies each field of VF[fs] by the I register, then adds the product to the corresponding field of ACC, and

stores the result in VF[fd].

Operation Code

Upper 32-bit word: UpperOP field type 1

63 62 61 60 59 58 57 56 55 54 53 52 51 50 49 48 47 46 45 44 43 42 41 40 39 38 37 36 35 34 33 32

I E M D T - - dest ft reg fs reg fd reg MADDi

- - - - - 0 0 ---- 00000 ----- ----- 100011

1 1 1 1 1 1 1 4 bits 5 bits 5 bits 5 bits 6 bits

Mnemonic

MADDi.dest VF[fd]dest, VF[fs]dest, I

Operation

if (x ⊆ dest) then VF[fd]x = ACCx + VF[fs]x × I

if (y ⊆ dest) then VF[fd]y = ACCy + VF[fs]y × I

if (z ⊆ dest) then VF[fd]z = ACCz + VF[fs]z × I

if (w ⊆ dest) then VF[fd]w = ACCw + VF[fs]w × I

Flag Changes

MAC flag (Oxyzw Uxyzw Sxyzw Zxyzw): XXXX XXXX XXXX XXXX

status flag (DS IS OS US SS ZS D I O U S Z): - - X X X X - - X X X X

clipping flag: -

Throughput/Latency

1 / 4

Example

MADDi.xyzw VF10xyzw, VF20xyzw, I

VF10x = ACCx + VF20x × I

VF10y = ACCy + VF20y × I

VF10z = ACCz + VF20z × I

VF10w = ACCw + VF20w × I

Remarks

There is an operation error of 1 bit in multiplication, so the value multiplied by 1 may not be the same as the

original value. By using VF[fs] as a multiplicand, the results of multiplication with 1 are guaranteed to be

accurate.

MADDq : Product Sum; by Q Register

Multiplies each field of VF[fs] by the Q register, then adds the product to the corresponding field of ACC, and

stores the result in VF[fd].

Operation Code

Upper 32-bit word: UpperOP field type 1

63 62 61 60 59 58 57 56 55 54 53 52 51 50 49 48 47 46 45 44 43 42 41 40 39 38 37 36 35 34 33 32

I E M D T - - dest ft reg fs reg fd reg MADDq

- - - - - 0 0 ---- 00000 ----- ----- 100001

1 1 1 1 1 1 1 4 bits 5 bits 5 bits 5 bits 6 bits

Mnemonic

MADDq.dest VF[fd]dest, VF[fs]dest, Q

Operation

if (x ⊆ dest) then VF[fd]x = ACCx + VF[fs]x × Q

if (y ⊆ dest) then VF[fd]y = ACCy + VF[fs]y × Q

if (z ⊆ dest) then VF[fd]z = ACCz + VF[fs]z × Q

if (w ⊆ dest) then VF[fd]w = ACCw + VF[fs]w × Q

Flag Changes

MAC flag (Oxyzw Uxyzw Sxyzw Zxyzw): XXXX XXXX XXXX XXXX

status flag (DS IS OS US SS ZS D I O U S Z): - - X X X X - - X X X X

clipping flag: -

Throughput/Latency

1 / 4

Example

MADDq.xyzw VF10xyzw, VF20xyzw, Q

VF10x = ACCx + VF20x × Q

VF10y = ACCy + VF20y × Q

VF10z = ACCz + VF20z × Q

VF10w = ACCw + VF20w × Q

Remarks

There is an operation error of 1 bit in multiplication, so the value multiplied by 1 may not be the same as the

original value. By using VF[fs] as a multiplicand, the results of multiplication with 1 are guaranteed to be

accurate.

MADDbc : Broadcast Product Sum

Multiplies each field of VF[fs] by the specified field of VF[ft], then adds the product to the corresponding field

of ACC and stores the result in VF[fd].

Operation Code

Upper 32-bit word: UpperOP field type 0

63 62 61 60 59 58 57 56 55 54 53 52 51 50 49 48 47 46 45 44 43 42 41 40 39 38 37 36 35 34 33 32

I E M D T - - dest ft reg fs reg fd reg MADD? bc

- - - - - 0 0 ---- ----- ----- ----- 0010 --

1 1 1 1 1 1 1 4 bits 5 bits 5 bits 5 bits 4 bits 2 bits

Mnemonic

MADDbc.dest VF[fd]dest, VF[fs]dest, VF[ft]bc

Operation

if (x ⊆ dest) then VF[fd]x = ACCx + VF[fs]x × VF[ft]bc

if (y ⊆ dest) then VF[fd]y = ACCy + VF[fs]y × VF[ft]bc

if (z ⊆ dest) then VF[fd]z = ACCz + VF[fs]z × VF[ft]bc

if (w ⊆ dest) then VF[fd]w = ACCw + VF[fs]w × VF[ft]bc

Flag Changes

MAC flag (Oxyzw Uxyzw Sxyzw Zxyzw): XXXX XXXX XXXX XXXX

status flag (DS IS OS US SS ZS D I O U S Z): - - X X X X - - X X X X

clipping flag: -

Throughput/Latency

1 / 4

Example

MADDx.xyzw VF10xyzw, VF20xyzw, VF30x

VF10x = ACCx + VF20x × VF30x

VF10y = ACCy + VF20y × VF30x

VF10z = ACCz + VF20z × VF30x

VF10w = ACCw + VF20w × VF30x

Remarks

There is an operation error of 1 bit in multiplication, so the value multiplied by 1 may not be the same as the

original value. By using VF[fs] as a multiplicand, the results of multiplication with 1 are guaranteed to be

accurate.

MADDA : Product Sum; to Accumulator

Multiplies VF[fs] by VF[ft], then adds the product to ACC and stores the result in ACC.

Operation Code

Upper 32-bit word: UpperOP field type 3

63 62 61 60 59 58 57 56 55 54 53 52 51 50 49 48 47 46 45 44 43 42 41 40 39 38 37 36 35 34 33 32

I E M D T - - dest ft reg fs reg MADDA

- - - - - 0 0 ---- ----- ----- 01010 1111 01

1 1 1 1 1 1 1 4 bits 5 bits 5 bits 11 bits

Mnemonic

MADDA.dest ACCdest, VF[fs]dest, VF[ft]dest

Operation

if (x ⊆ dest) then ACCx = ACCx + VF[fs]x × VF[ft]x

if (y ⊆ dest) then ACCy = ACCy + VF[fs]y × VF[ft]y

if (z ⊆ dest) then ACCz = ACCz + VF[fs]z × VF[ft]z

if (w ⊆ dest) then ACCw = ACCw + VF[fs]w × VF[ft]w

Flag Changes

MAC flag                 status flag                                     clipping flag
Oxyzw Uxyzw Sxyzw Zxyzw  DS  IS  OS  US  SS  ZS  D   I   O   U   S   Z
XXXX  XXXX  XXXX  XXXX   -   -   X   X   X   X   -   -   X   X   X   X   -

Throughput/Latency

1 / 4

Example

MADDA.xyzw ACCxyzw, VF20xyzw, VF30xyzw

ACCx = ACCx + VF20x × VF30x

ACCy = ACCy + VF20y × VF30y

ACCz = ACCz + VF20z × VF30z

ACCw = ACCw + VF20w × VF30w

Remarks

There is an operation error of 1 bit in multiplication, so the value multiplied by 1 may not be the same as the

original value. By using VF[fs] as a multiplicand, the results of multiplication with 1 are guaranteed to be

accurate.
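
The role of the dest specifier can be illustrated with a short Python sketch. The function name `madda` and the list representation of ACC and the VF registers are assumptions made for this example, and Python floats stand in for the VU's single-precision values. Only the fields named in dest are updated; the remaining fields of ACC keep their previous contents.

```python
def madda(acc, fs, ft, dest="xyzw"):
    """Model of MADDA.dest: acc = acc + fs * ft for the fields in dest."""
    out = list(acc)                     # unselected fields are left unchanged
    for i, name in enumerate("xyzw"):
        if name in dest:
            out[i] = acc[i] + fs[i] * ft[i]
    return out

acc = [0.0, 0.0, 0.0, 0.0]
pairs = [
    ([1.0, 2.0, 3.0, 4.0], [1.0, 1.0, 1.0, 1.0]),
    ([2.0, 2.0, 2.0, 2.0], [3.0, 3.0, 3.0, 3.0]),
]
for fs, ft in pairs:
    acc = madda(acc, fs, ft, dest="xz")   # like MADDA.xz ACCxz, VF[fs]xz, VF[ft]xz
print(acc)
```

Repeating the instruction accumulates a running sum of products in the selected fields, while y and w stay at their initial values.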

MADDAi : Product Sum; by I Register, to Accumulator

Multiplies each field of VF[fs] by the I register, then adds the product to the corresponding field of ACC and

stores the result in ACC.

Operation Code

Upper 32-bit word: UpperOP field type 3

Bit     63  62  61  60  59  58  57  56-53   52-48   47-43   42-32
Field   I   E   M   D   T   -   -   dest    ft reg  fs reg  MADDAi
Value   -   -   -   -   -   0   0   ----    00000   -----   01000 1111 11
Width   1   1   1   1   1   1   1   4 bits  5 bits  5 bits  11 bits

Mnemonic

MADDAi.dest ACCdest, VF[fs]dest, I

Operation

if (x ⊆ dest) then ACCx = ACCx + VF[fs]x × I

if (y ⊆ dest) then ACCy = ACCy + VF[fs]y × I

if (z ⊆ dest) then ACCz = ACCz + VF[fs]z × I

if (w ⊆ dest) then ACCw = ACCw + VF[fs]w × I

Flag Changes

MAC flag                 status flag                                     clipping flag
Oxyzw Uxyzw Sxyzw Zxyzw  DS  IS  OS  US  SS  ZS  D   I   O   U   S   Z
XXXX  XXXX  XXXX  XXXX   -   -   X   X   X   X   -   -   X   X   X   X   -

Throughput/Latency

1 / 4

Example

MADDAi.xyzw ACCxyzw, VF20xyzw, I

ACCx = ACCx + VF20x × I

ACCy = ACCy + VF20y × I

ACCz = ACCz + VF20z × I

ACCw = ACCw + VF20w × I

Remarks

There is an operation error of 1 bit in multiplication, so the value multiplied by 1 may not be the same as the

original value. By using VF[fs] as a multiplicand, the results of multiplication with 1 are guaranteed to be

accurate.

MADDAq : Product Sum; by Q Register, to Accumulator

Multiplies each field of VF[fs] by the Q register, then adds the product to the corresponding field of ACC and

stores the result in ACC.

Operation Code

Upper 32-bit word: UpperOP field type 3

Bit     63  62  61  60  59  58  57  56-53   52-48   47-43   42-32
Field   I   E   M   D   T   -   -   dest    ft reg  fs reg  MADDAq
Value   -   -   -   -   -   0   0   ----    00000   -----   01000 1111 01
Width   1   1   1   1   1   1   1   4 bits  5 bits  5 bits  11 bits

Mnemonic

MADDAq.dest ACCdest, VF[fs]dest, Q

Operation

if (x ⊆ dest) then ACCx = ACCx + VF[fs]x × Q

if (y ⊆ dest) then ACCy = ACCy + VF[fs]y × Q

if (z ⊆ dest) then ACCz = ACCz + VF[fs]z × Q

if (w ⊆ dest) then ACCw = ACCw + VF[fs]w × Q

Flag Changes

MAC flag                 status flag                                     clipping flag
Oxyzw Uxyzw Sxyzw Zxyzw  DS  IS  OS  US  SS  ZS  D   I   O   U   S   Z
XXXX  XXXX  XXXX  XXXX   -   -   X   X   X   X   -   -   X   X   X   X   -

Throughput/Latency

1 / 4

Example

MADDAq.xyzw ACCxyzw, VF20xyzw, Q

ACCx = ACCx + VF20x × Q

ACCy = ACCy + VF20y × Q

ACCz = ACCz + VF20z × Q

ACCw = ACCw + VF20w × Q

Remarks

There is an operation error of 1 bit in multiplication, so the value multiplied by 1 may not be the same as the

original value. By using VF[fs] as a multiplicand, the results of multiplication with 1 are guaranteed to be

accurate.

MADDAbc : Broadcast Product Sum; to Accumulator

Multiplies each field of VF[fs] by the specified field of VF[ft], then adds the product to the corresponding field

of ACC and stores the result in ACC.

Operation Code

Upper 32-bit word: UpperOP field type 2

Bit     63  62  61  60  59  58  57  56-53   52-48   47-43   42-34       33-32
Field   I   E   M   D   T   -   -   dest    ft reg  fs reg  MADDA?      bc
Value   -   -   -   -   -   0   0   ----    -----   -----   00010 1111  --
Width   1   1   1   1   1   1   1   4 bits  5 bits  5 bits  9 bits      2 bits

Mnemonic

MADDAbc.dest ACCdest, VF[fs]dest, VF[ft]bc

Operation

if (x ⊆ dest) then ACCx = ACCx + VF[fs]x × VF[ft]bc

if (y ⊆ dest) then ACCy = ACCy + VF[fs]y × VF[ft]bc

if (z ⊆ dest) then ACCz = ACCz + VF[fs]z × VF[ft]bc

if (w ⊆ dest) then ACCw = ACCw + VF[fs]w × VF[ft]bc

Flag Changes

MAC flag                 status flag                                     clipping flag
Oxyzw Uxyzw Sxyzw Zxyzw  DS  IS  OS  US  SS  ZS  D   I   O   U   S   Z
XXXX  XXXX  XXXX  XXXX   -   -   X   X   X   X   -   -   X   X   X   X   -

Throughput/Latency

1 / 4

Example

MADDAx.xyzw ACCxyzw, VF20xyzw, VF30x

ACCx = ACCx + VF20x × VF30x

ACCy = ACCy + VF20y × VF30x

ACCz = ACCz + VF20z × VF30x

ACCw = ACCw + VF20w × VF30x

Remarks

There is an operation error of 1 bit in multiplication, so the value multiplied by 1 may not be the same as the

original value. By using VF[fs] as a multiplicand, the results of multiplication with 1 are guaranteed to be

accurate.

MAX : Maximum Value

Compares VF[fs] with VF[ft] and stores the greater value in VF[fd].

Operation Code

Upper 32-bit word: UpperOP field type 1

Bit     63  62  61  60  59  58  57  56-53   52-48   47-43   42-38   37-32
Field   I   E   M   D   T   -   -   dest    ft reg  fs reg  fd reg  MAX
Value   -   -   -   -   -   0   0   ----    -----   -----   -----   101011
Width   1   1   1   1   1   1   1   4 bits  5 bits  5 bits  5 bits  6 bits
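
The field layout above can be turned into a small packing routine. The following Python sketch assembles the upper 32-bit word for a field type 1 instruction, with bit 63 of the instruction mapped to bit 31 of the word. The function name `encode_upper_type1` is hypothetical, and the ordering of the dest bits (x in the most significant position, then y, z, w) is an assumption that is not stated on this page.

```python
def encode_upper_type1(opcode6, fd, fs, ft, dest="xyzw", flags=0):
    """Pack an UpperOP field type 1 word (bits 63..32 of the instruction)."""
    dest_bits = 0
    for name, bit in zip("xyzw", (8, 4, 2, 1)):   # assumed: x is the high bit
        if name in dest:
            dest_bits |= bit
    for reg in (fd, fs, ft):
        if not 0 <= reg <= 31:
            raise ValueError("register number out of range")
    word = (flags & 0x1F) << 27     # I E M D T, bits 63..59
    # bits 58..57 are fixed at 0
    word |= dest_bits << 21         # dest,   bits 56..53
    word |= ft << 16                # ft reg, bits 52..48
    word |= fs << 11                # fs reg, bits 47..43
    word |= fd << 6                 # fd reg, bits 42..38
    word |= opcode6 & 0x3F          # opcode, bits 37..32
    return word

MAX_OPCODE = 0b101011
# MAX.xyzw VF10xyzw, VF20xyzw, VF30xyzw
word = encode_upper_type1(MAX_OPCODE, fd=10, fs=20, ft=30)
print(f"{word:08x}")
```

The same routine covers the other field type 1 instructions in this chapter by changing the 6-bit opcode; the I and Q variants leave the ft field at 00000.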

Mnemonic

MAX.dest VF[fd]dest, VF[fs]dest, VF[ft]dest

Operation

if (x ⊆ dest) then
    if (VF[fs]x > VF[ft]x)
        {VF[fd]x = VF[fs]x}
    else
        {VF[fd]x = VF[ft]x}

(The same operation is performed for the y and z fields.)

if (w ⊆ dest) then
    if (VF[fs]w > VF[ft]w)
        {VF[fd]w = VF[fs]w}
    else
        {VF[fd]w = VF[ft]w}

Flag Changes

MAC flag                 status flag                                     clipping flag
Oxyzw Uxyzw Sxyzw Zxyzw  DS  IS  OS  US  SS  ZS  D   I   O   U   S   Z
----  ----  ----  ----   -   -   -   -   -   -   -   -   -   -   -   -   -

Throughput/Latency

1 / 4

Example

MAX.xyzw VF10xyzw, VF20xyzw, VF30xyzw

if (VF20x > VF30x) then {VF10x = VF20x} else {VF10x = VF30x}

if (VF20y > VF30y) then {VF10y = VF20y} else {VF10y = VF30y}

if (VF20z > VF30z) then {VF10z = VF20z} else {VF10z = VF30z}

if (VF20w > VF30w) then {VF10w = VF20w} else {VF10w = VF30w}

MAXi : Maximum Value

Compares each field of VF[fs] with the I register and stores the greater value in the corresponding field of

VF[fd].

Operation Code

Upper 32-bit word: UpperOP field type 1

Bit     63  62  61  60  59  58  57  56-53   52-48   47-43   42-38   37-32
Field   I   E   M   D   T   -   -   dest    ft reg  fs reg  fd reg  MAXi
Value   -   -   -   -   -   0   0   ----    00000   -----   -----   011101
Width   1   1   1   1   1   1   1   4 bits  5 bits  5 bits  5 bits  6 bits

Mnemonic

MAXi.dest VF[fd]dest, VF[fs]dest, I

Operation

if (x ⊆ dest) then
    if (VF[fs]x > I) then
        VF[fd]x = VF[fs]x
    else
        VF[fd]x = I

(The same operation is performed for the y and z fields.)

if (w ⊆ dest) then
    if (VF[fs]w > I) then
        VF[fd]w = VF[fs]w
    else
        VF[fd]w = I

Flag Changes

MAC flag                 status flag                                     clipping flag
Oxyzw Uxyzw Sxyzw Zxyzw  DS  IS  OS  US  SS  ZS  D   I   O   U   S   Z
----  ----  ----  ----   -   -   -   -   -   -   -   -   -   -   -   -   -

Throughput/Latency

1 / 4

Example

MAXi.xyzw VF10xyzw, VF20xyzw, I

if (VF20x > I) then {VF10x = VF20x} else {VF10x = I}

if (VF20y > I) then {VF10y = VF20y} else {VF10y = I}

if (VF20z > I) then {VF10z = VF20z} else {VF10z = I}

if (VF20w > I) then {VF10w = VF20w} else {VF10w = I}

MAXbc : Maximum Value

Compares each field of VF[fs] with the specified field of VF[ft] and stores the greater value in the

corresponding field of VF[fd].

Operation Code

Upper 32-bit word: UpperOP field type 0

Bit     63  62  61  60  59  58  57  56-53   52-48   47-43   42-38   37-34   33-32
Field   I   E   M   D   T   -   -   dest    ft reg  fs reg  fd reg  MAX?    bc
Value   -   -   -   -   -   0   0   ----    -----   -----   -----   0100    --
Width   1   1   1   1   1   1   1   4 bits  5 bits  5 bits  5 bits  4 bits  2 bits

Mnemonic

MAXbc.dest VF[fd]dest, VF[fs]dest, VF[ft]bc

Operation

if (x ⊆ dest) then
    if (VF[fs]x > VF[ft]bc)
        {VF[fd]x = VF[fs]x}
    else
        {VF[fd]x = VF[ft]bc}

(The same operation is performed for the y and z fields.)

if (w ⊆ dest) then
    if (VF[fs]w > VF[ft]bc)
        {VF[fd]w = VF[fs]w}
    else
        {VF[fd]w = VF[ft]bc}

Flag Changes

MAC flag                 status flag                                     clipping flag
Oxyzw Uxyzw Sxyzw Zxyzw  DS  IS  OS  US  SS  ZS  D   I   O   U   S   Z
----  ----  ----  ----   -   -   -   -   -   -   -   -   -   -   -   -   -

Throughput/Latency

1 / 4

Example

MAXw.xyzw VF01xyzw, VF01xyzw, VF00w

Because VF00w holds the constant 1.0, any field of VF01 whose value is less than 1.0 is replaced with 1.0.
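
The effect of this example can be checked with a short Python model. The function name `maxbc` and the list representation of registers are assumptions made for illustration, and the sketch covers only the case where fd and fs are the same register, as in the example above.

```python
def maxbc(fs, ft, bc, dest="xyzw"):
    """Model of MAXbc.dest with fd == fs: returns the new register contents."""
    limit = ft["xyzw".index(bc)]            # the broadcast field of VF[ft]
    out = []
    for name, value in zip("xyzw", fs):
        if name in dest:
            out.append(value if value > limit else limit)
        else:
            out.append(value)               # field not selected: unchanged
    return out

VF00 = [0.0, 0.0, 0.0, 1.0]                 # constant register; w is 1.0
vf01 = [0.25, 1.5, -3.0, 1.0]
vf01 = maxbc(vf01, VF00, "w")               # MAXw.xyzw VF01xyzw, VF01xyzw, VF00w
print(vf01)
```

Fields that were already 1.0 or greater pass through unchanged, so the instruction acts as a lower bound on every selected field.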

MINI : Minimum Value

Compares VF[fs] with VF[ft] and stores the smaller value in VF[fd].

Operation Code

Upper 32-bit word: UpperOP field type 1

Bit     63  62  61  60  59  58  57  56-53   52-48   47-43   42-38   37-32
Field   I   E   M   D   T   -   -   dest    ft reg  fs reg  fd reg  MINI
Value   -   -   -   -   -   0   0   ----    -----   -----   -----   101111
Width   1   1   1   1   1   1   1   4 bits  5 bits  5 bits  5 bits  6 bits

Mnemonic

MINI.dest VF[fd]dest, VF[fs]dest, VF[ft]dest

Operation

if (x ⊆ dest) then
    if (VF[fs]x < VF[ft]x)
        {VF[fd]x = VF[fs]x}
    else
        {VF[fd]x = VF[ft]x}

(The same operation is performed for the y and z fields.)

if (w ⊆ dest) then
    if (VF[fs]w < VF[ft]w)
        {VF[fd]w = VF[fs]w}
    else
        {VF[fd]w = VF[ft]w}

Flag Changes

MAC flag                 status flag                                     clipping flag
Oxyzw Uxyzw Sxyzw Zxyzw  DS  IS  OS  US  SS  ZS  D   I   O   U   S   Z
----  ----  ----  ----   -   -   -   -   -   -   -   -   -   -   -   -   -

Throughput/Latency

1 / 4

Example

MINI.xyzw VF10xyzw, VF20xyzw, VF30xyzw

if (VF20x < VF30x) then {VF10x = VF20x} else {VF10x = VF30x}

if (VF20y < VF30y) then {VF10y = VF20y} else {VF10y = VF30y}

if (VF20z < VF30z) then {VF10z = VF20z} else {VF10z = VF30z}

if (VF20w < VF30w) then {VF10w = VF20w} else {VF10w = VF30w}

MINIi : Minimum Value

Compares each field of VF[fs] with the I register and stores the smaller value in the corresponding field of

VF[fd].

Operation Code

Upper 32-bit word: UpperOP field type 1

Bit     63  62  61  60  59  58  57  56-53   52-48   47-43   42-38   37-32
Field   I   E   M   D   T   -   -   dest    ft reg  fs reg  fd reg  MINIi
Value   -   -   -   -   -   0   0   ----    00000   -----   -----   011111
Width   1   1   1   1   1   1   1   4 bits  5 bits  5 bits  5 bits  6 bits

Mnemonic

MINIi.dest VF[fd]dest, VF[fs]dest, I

Operation

if (x ⊆ dest) then
    if (VF[fs]x < I) then
        VF[fd]x = VF[fs]x
    else
        VF[fd]x = I

(The same operation is performed for the y and z fields.)

if (w ⊆ dest) then
    if (VF[fs]w < I) then
        VF[fd]w = VF[fs]w
    else
        VF[fd]w = I

Flag Changes

MAC flag                 status flag                                     clipping flag
Oxyzw Uxyzw Sxyzw Zxyzw  DS  IS  OS  US  SS  ZS  D   I   O   U   S   Z
----  ----  ----  ----   -   -   -   -   -   -   -   -   -   -   -   -   -

Throughput/Latency

1 / 4

Example

MINIi.xyzw VF10xyzw, VF20xyzw, I

if (VF20x < I) then {VF10x = VF20x} else {VF10x = I}

if (VF20y < I) then {VF10y = VF20y} else {VF10y = I}

if (VF20z < I) then {VF10z = VF20z} else {VF10z = I}

if (VF20w < I) then {VF10w = VF20w} else {VF10w = I}

MINIbc : Minimum Value

Compares each field of VF[fs] with the specified field of VF[ft] and stores the smaller value in the corresponding

field of VF[fd].

Operation Code

Upper 32-bit word: UpperOP field type 0

Bit     63  62  61  60  59  58  57  56-53   52-48   47-43   42-38   37-34   33-32
Field   I   E   M   D   T   -   -   dest    ft reg  fs reg  fd reg  MINI?   bc
Value   -   -   -   -   -   0   0   ----    -----   -----   -----   0101    --
Width   1   1   1   1   1   1   1   4 bits  5 bits  5 bits  5 bits  4 bits  2 bits

Mnemonic

MINIbc.dest VF[fd]dest, VF[fs]dest, VF[ft]bc

Operation

if (x ⊆ dest) then
    if (VF[fs]x < VF[ft]bc)
        {VF[fd]x = VF[fs]x}
    else
        {VF[fd]x = VF[ft]bc}

(The same operation is performed for the y and z fields.)

if (w ⊆ dest) then
    if (VF[fs]w < VF[ft]bc)
        {VF[fd]w = VF[fs]w}
    else
        {VF[fd]w = VF[ft]bc}

Flag Changes

MAC flag                 status flag                                     clipping flag
Oxyzw Uxyzw Sxyzw Zxyzw  DS  IS  OS  US  SS  ZS  D   I   O   U   S   Z
----  ----  ----  ----   -   -   -   -   -   -   -   -   -   -   -   -   -

Throughput/Latency

1 / 4

Example

MINIx.xyzw VF10xyzw, VF20xyzw, VF30x

if (VF20x < VF30x) then {VF10x = VF20x} else {VF10x = VF30x}

if (VF20y < VF30x) then {VF10y = VF20y} else {VF10y = VF30x}

if (VF20z < VF30x) then {VF10z = VF20z} else {VF10z = VF30x}

if (VF20w < VF30x) then {VF10w = VF20w} else {VF10w = VF30x}
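
One plausible use of the broadcast minimum and maximum is clamping every field of a vector to a range held in a single register. The Python sketch below assumes a hypothetical register whose x field holds the lower bound and whose y field holds the upper bound; the function names and the register layout are illustrative choices, not something prescribed by this manual.

```python
def maxbc(fs, ft, bc):
    """MAXbc.xyzw: each field becomes the greater of fs and ft[bc]."""
    limit = ft["xyzw".index(bc)]
    return [v if v > limit else limit for v in fs]

def minibc(fs, ft, bc):
    """MINIbc.xyzw: each field becomes the smaller of fs and ft[bc]."""
    limit = ft["xyzw".index(bc)]
    return [v if v < limit else limit for v in fs]

def clamp(vec, bounds):
    """Clamp all four fields of vec to [bounds.x, bounds.y]."""
    tmp = maxbc(vec, bounds, "x")       # MAXx.xyzw  tmp, vec, bounds
    return minibc(tmp, bounds, "y")     # MINIy.xyzw out, tmp, bounds

bounds = [0.0, 1.0, 0.0, 0.0]           # x = lower bound, y = upper bound
clamped = clamp([-2.0, 0.5, 3.0, 1.0], bounds)
print(clamped)
```

Because neither instruction changes any flags, a clamp written this way does not disturb flag state set up by earlier arithmetic instructions.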

MSUB : Multiply and Subtract

Multiplies VF[fs] by VF[ft], then subtracts the product from ACC and stores the result in VF[fd].

Operation Code

Upper 32-bit word: UpperOP field type 1

Bit     63  62  61  60  59  58  57  56-53   52-48   47-43   42-38   37-32
Field   I   E   M   D   T   -   -   dest    ft reg  fs reg  fd reg  MSUB
Value   -   -   -   -   -   0   0   ----    -----   -----   -----   101101
Width   1   1   1   1   1   1   1   4 bits  5 bits  5 bits  5 bits  6 bits

Mnemonic

MSUB.dest VF[fd]dest, VF[fs]dest, VF[ft]dest

Operation

if (x ⊆ dest) then VF[fd]x = ACCx - VF[fs]x × VF[ft]x

if (y ⊆ dest) then VF[fd]y = ACCy - VF[fs]y × VF[ft]y

if (z ⊆ dest) then VF[fd]z = ACCz - VF[fs]z × VF[ft]z

if (w ⊆ dest) then VF[fd]w = ACCw - VF[fs]w × VF[ft]w

Flag Changes

MAC flag                 status flag                                     clipping flag
Oxyzw Uxyzw Sxyzw Zxyzw  DS  IS  OS  US  SS  ZS  D   I   O   U   S   Z
XXXX  XXXX  XXXX  XXXX   -   -   X   X   X   X   -   -   X   X   X   X   -

Throughput/Latency

1 / 4

Example

MSUB.xyzw VF10xyzw, VF20xyzw, VF30xyzw

VF10x = ACCx - VF20x × VF30x

VF10y = ACCy - VF20y × VF30y

VF10z = ACCz - VF20z × VF30z

VF10w = ACCw - VF20w × VF30w

Remarks

There is an operation error of 1 bit in multiplication, so the value multiplied by 1 may not be the same as the

original value. By using VF[fs] as a multiplicand, the results of multiplication with 1 are guaranteed to be

accurate.
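
As an illustration of how a multiply to the accumulator and MSUB combine, the following Python sketch computes the real parts of four complex products at once, one per field. The data layout (real and imaginary parts held in separate registers) and the function names are assumptions made for this example, and Python floats are used in place of the VU's single-precision arithmetic.

```python
def mula(fs, ft):
    """MULA.xyzw: ACC = fs * ft, field by field."""
    return [a * b for a, b in zip(fs, ft)]

def msub(acc, fs, ft):
    """MSUB.xyzw: fd = ACC - fs * ft, field by field."""
    return [c - a * b for c, a, b in zip(acc, fs, ft)]

# Four complex numbers a and b, stored as separate real/imaginary vectors.
a_re = [1.0, 2.0, 0.0, -1.0]
a_im = [1.0, 0.0, 3.0, 2.0]
b_re = [2.0, 4.0, 5.0, 1.0]
b_im = [3.0, 1.0, 2.0, -2.0]

acc = mula(a_re, b_re)              # ACC = a_re * b_re
prod_re = msub(acc, a_im, b_im)     # result = ACC - a_im * b_im
print(prod_re)
```

The first instruction seeds ACC with one product and the second subtracts the other, so the difference of two products takes two instructions with no intermediate VF register.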

MSUBi : Multiply and Subtract; with I Register

Multiplies each field of VF[fs] by the I register, then subtracts the product from the corresponding field of ACC

and stores the result in the corresponding field of VF[fd].

Operation Code

Upper 32-bit word: UpperOP field type 1

Bit     63  62  61  60  59  58  57  56-53   52-48   47-43   42-38   37-32
Field   I   E   M   D   T   -   -   dest    ft reg  fs reg  fd reg  MSUBi
Value   -   -   -   -   -   0   0   ----    00000   -----   -----   100111
Width   1   1   1   1   1   1   1   4 bits  5 bits  5 bits  5 bits  6 bits

Mnemonic

MSUBi.dest VF[fd]dest, VF[fs]dest, I

Operation

if (x ⊆ dest) then VF[fd]x = ACCx - VF[fs]x × I

if (y ⊆ dest) then VF[fd]y = ACCy - VF[fs]y × I

if (z ⊆ dest) then VF[fd]z = ACCz - VF[fs]z × I

if (w ⊆ dest) then VF[fd]w = ACCw - VF[fs]w × I

Flag Changes

MAC flag                 status flag                                     clipping flag
Oxyzw Uxyzw Sxyzw Zxyzw  DS  IS  OS  US  SS  ZS  D   I   O   U   S   Z
XXXX  XXXX  XXXX  XXXX   -   -   X   X   X   X   -   -   X   X   X   X   -

Throughput/Latency

1 / 4

Example

MSUBi.xyzw VF10xyzw, VF20xyzw, I

VF10x = ACCx - VF20x × I

VF10y = ACCy - VF20y × I

VF10z = ACCz - VF20z × I

VF10w = ACCw - VF20w × I

Remarks

There is an operation error of 1 bit in multiplication, so the value multiplied by 1 may not be the same as the

original value. By using VF[fs] as a multiplicand, the results of multiplication with 1 are guaranteed to be

accurate.

MSUBq : Multiply and Subtract; by Q Register

Multiplies each field of VF[fs] by the Q register, then subtracts the product from the corresponding field of

ACC and stores the result in the corresponding field of VF[fd].

Operation Code

Upper 32-bit word: UpperOP field type 1

Bit     63  62  61  60  59  58  57  56-53   52-48   47-43   42-38   37-32
Field   I   E   M   D   T   -   -   dest    ft reg  fs reg  fd reg  MSUBq
Value   -   -   -   -   -   0   0   ----    00000   -----   -----   100101
Width   1   1   1   1   1   1   1   4 bits  5 bits  5 bits  5 bits  6 bits

Mnemonic

MSUBq.dest VF[fd]dest, VF[fs]dest, Q

Operation

if (x ⊆ dest) then VF[fd]x = ACCx - VF[fs]x × Q

if (y ⊆ dest) then VF[fd]y = ACCy - VF[fs]y × Q

if (z ⊆ dest) then VF[fd]z = ACCz - VF[fs]z × Q

if (w ⊆ dest) then VF[fd]w = ACCw - VF[fs]w × Q

Flag Changes

MAC flag                 status flag                                     clipping flag
Oxyzw Uxyzw Sxyzw Zxyzw  DS  IS  OS  US  SS  ZS  D   I   O   U   S   Z
XXXX  XXXX  XXXX  XXXX   -   -   X   X   X   X   -   -   X   X   X   X   -

Throughput/Latency

1 / 4

Example

MSUBq.xyzw VF10xyzw, VF20xyzw, Q

VF10x = ACCx - VF20x × Q

VF10y = ACCy - VF20y × Q

VF10z = ACCz - VF20z × Q

VF10w = ACCw - VF20w × Q

Remarks

There is an operation error of 1 bit in multiplication, so the value multiplied by 1 may not be the same as the

original value. By using VF[fs] as a multiplicand, the results of multiplication with 1 are guaranteed to be

accurate.

MSUBbc : Broadcast Multiply and Subtract

Multiplies each field of VF[fs] by the specified field of VF[ft], then subtracts the product from the

corresponding field of ACC and stores the result in the corresponding field of VF[fd].

Operation Code

Upper 32-bit word: UpperOP field type 0

Bit     63  62  61  60  59  58  57  56-53   52-48   47-43   42-38   37-34   33-32
Field   I   E   M   D   T   -   -   dest    ft reg  fs reg  fd reg  MSUB?   bc
Value   -   -   -   -   -   0   0   ----    -----   -----   -----   0011    --
Width   1   1   1   1   1   1   1   4 bits  5 bits  5 bits  5 bits  4 bits  2 bits

Mnemonic

MSUBbc.dest VF[fd]dest, VF[fs]dest, VF[ft]bc

Operation

if (x ⊆ dest) then VF[fd]x = ACCx - VF[fs]x × VF[ft]bc

if (y ⊆ dest) then VF[fd]y = ACCy - VF[fs]y × VF[ft]bc

if (z ⊆ dest) then VF[fd]z = ACCz - VF[fs]z × VF[ft]bc

if (w ⊆ dest) then VF[fd]w = ACCw - VF[fs]w × VF[ft]bc

Flag Changes

MAC flag                 status flag                                     clipping flag
Oxyzw Uxyzw Sxyzw Zxyzw  DS  IS  OS  US  SS  ZS  D   I   O   U   S   Z
XXXX  XXXX  XXXX  XXXX   -   -   X   X   X   X   -   -   X   X   X   X   -

Throughput/Latency

1 / 4

Example

MSUBx.xyzw VF10xyzw, VF20xyzw, VF30x

VF10x = ACCx - VF20x × VF30x

VF10y = ACCy - VF20y × VF30x

VF10z = ACCz - VF20z × VF30x

VF10w = ACCw - VF20w × VF30x

Remarks

There is an operation error of 1 bit in multiplication, so the value multiplied by 1 may not be the same as the

original value. By using VF[fs] as a multiplicand, the results of multiplication with 1 are guaranteed to be

accurate.

MSUBA : Multiply and Subtract; to Accumulator

Multiplies VF[fs] by VF[ft], then subtracts the product from ACC and stores the result in ACC.

Operation Code

Upper 32-bit word: UpperOP field type 3

Bit     63  62  61  60  59  58  57  56-53   52-48   47-43   42-32
Field   I   E   M   D   T   -   -   dest    ft reg  fs reg  MSUBA
Value   -   -   -   -   -   0   0   ----    -----   -----   01011 1111 01
Width   1   1   1   1   1   1   1   4 bits  5 bits  5 bits  11 bits

Mnemonic

MSUBA.dest ACCdest, VF[fs]dest, VF[ft]dest

Operation

if (x ⊆ dest) then ACCx = ACCx - VF[fs]x × VF[ft]x

if (y ⊆ dest) then ACCy = ACCy - VF[fs]y × VF[ft]y

if (z ⊆ dest) then ACCz = ACCz - VF[fs]z × VF[ft]z

if (w ⊆ dest) then ACCw = ACCw - VF[fs]w × VF[ft]w

Flag Changes

MAC flag                 status flag                                     clipping flag
Oxyzw Uxyzw Sxyzw Zxyzw  DS  IS  OS  US  SS  ZS  D   I   O   U   S   Z
XXXX  XXXX  XXXX  XXXX   -   -   X   X   X   X   -   -   X   X   X   X   -

Throughput/Latency

1 / 4

Example

MSUBA.xyzw ACCxyzw, VF20xyzw, VF30xyzw

ACCx = ACCx - VF20x × VF30x

ACCy = ACCy - VF20y × VF30y

ACCz = ACCz - VF20z × VF30z

ACCw = ACCw - VF20w × VF30w

Remarks

There is an operation error of 1 bit in multiplication, so the value multiplied by 1 may not be the same as the

original value. By using VF[fs] as a multiplicand, the results of multiplication with 1 are guaranteed to be

accurate.

MSUBAi : Multiply and Subtract; with I Register, to Accumulator

Multiplies each field of VF[fs] by the I register, then subtracts the product from the corresponding field of

ACC and stores the result in ACC.

Operation Code

Upper 32-bit word: UpperOP field type 3

Bit     63  62  61  60  59  58  57  56-53   52-48   47-43   42-32
Field   I   E   M   D   T   -   -   dest    ft reg  fs reg  MSUBAi
Value   -   -   -   -   -   0   0   ----    00000   -----   01001 1111 11
Width   1   1   1   1   1   1   1   4 bits  5 bits  5 bits  11 bits

Mnemonic

MSUBAi.dest ACCdest, VF[fs]dest, I

Operation

if (x ⊆ dest) then ACCx = ACCx - VF[fs]x × I

if (y ⊆ dest) then ACCy = ACCy - VF[fs]y × I

if (z ⊆ dest) then ACCz = ACCz - VF[fs]z × I

if (w ⊆ dest) then ACCw = ACCw - VF[fs]w × I

Flag Changes

MAC flag                 status flag                                     clipping flag
Oxyzw Uxyzw Sxyzw Zxyzw  DS  IS  OS  US  SS  ZS  D   I   O   U   S   Z
XXXX  XXXX  XXXX  XXXX   -   -   X   X   X   X   -   -   X   X   X   X   -

Throughput/Latency

1 / 4

Example

MSUBAi.xyzw ACCxyzw, VF20xyzw, I

ACCx = ACCx - VF20x × I

ACCy = ACCy - VF20y × I

ACCz = ACCz - VF20z × I

ACCw = ACCw - VF20w × I

Remarks

There is an operation error of 1 bit in multiplication, so the value multiplied by 1 may not be the same as the

original value. By using VF[fs] as a multiplicand, the results of multiplication with 1 are guaranteed to be

accurate.

MSUBAq : Multiply and Subtract; by Q Register, to Accumulator

Multiplies each field of VF[fs] by the Q register, then subtracts the product from the corresponding field of ACC and stores

the result in the corresponding field of ACC.

Operation Code

Upper 32-bit word: UpperOP field type 3

Bit     63  62  61  60  59  58  57  56-53   52-48   47-43   42-32
Field   I   E   M   D   T   -   -   dest    ft reg  fs reg  MSUBAq
Value   -   -   -   -   -   0   0   ----    00000   -----   01001 1111 01
Width   1   1   1   1   1   1   1   4 bits  5 bits  5 bits  11 bits

Mnemonic

MSUBAq.dest ACCdest, VF[fs]dest, Q

Operation

if (x ⊆ dest) then ACCx = ACCx - VF[fs]x × Q

if (y ⊆ dest) then ACCy = ACCy - VF[fs]y × Q

if (z ⊆ dest) then ACCz = ACCz - VF[fs]z × Q

if (w ⊆ dest) then ACCw = ACCw - VF[fs]w × Q

Flag Changes

MAC flag                 status flag                                     clipping flag
Oxyzw Uxyzw Sxyzw Zxyzw  DS  IS  OS  US  SS  ZS  D   I   O   U   S   Z
XXXX  XXXX  XXXX  XXXX   -   -   X   X   X   X   -   -   X   X   X   X   -

Throughput/Latency

1 / 4

Example

MSUBAq.xyzw ACCxyzw, VF20xyzw, Q

ACCx = ACCx - VF20x × Q

ACCy = ACCy - VF20y × Q

ACCz = ACCz - VF20z × Q

ACCw = ACCw - VF20w × Q

Remarks

There is an operation error of 1 bit in multiplication, so the value multiplied by 1 may not be the same as the

original value. By using VF[fs] as a multiplicand, the results of multiplication with 1 are guaranteed to be

accurate.

MSUBAbc : Broadcast Multiply and Subtract; to Accumulator

Multiplies each field of VF[fs] by the specified field of VF[ft], then subtracts the product from the

corresponding field of ACC and stores the result in the corresponding field of ACC.

Operation Code

Upper 32-bit word: UpperOP field type 2

Bit     63  62  61  60  59  58  57  56-53   52-48   47-43   42-34       33-32
Field   I   E   M   D   T   -   -   dest    ft reg  fs reg  MSUBA?      bc
Value   -   -   -   -   -   0   0   ----    -----   -----   00011 1111  --
Width   1   1   1   1   1   1   1   4 bits  5 bits  5 bits  9 bits      2 bits

Mnemonic

MSUBAbc.dest ACCdest, VF[fs]dest, VF[ft]bc

Operation

if (x ⊆ dest) then ACCx = ACCx - VF[fs]x × VF[ft]bc

if (y ⊆ dest) then ACCy = ACCy - VF[fs]y × VF[ft]bc

if (z ⊆ dest) then ACCz = ACCz - VF[fs]z × VF[ft]bc

if (w ⊆ dest) then ACCw = ACCw - VF[fs]w × VF[ft]bc

Flag Changes

MAC flag                 status flag                                     clipping flag
Oxyzw Uxyzw Sxyzw Zxyzw  DS  IS  OS  US  SS  ZS  D   I   O   U   S   Z
XXXX  XXXX  XXXX  XXXX   -   -   X   X   X   X   -   -   X   X   X   X   -

Throughput/Latency

1 / 4

Example

MSUBAx.xyzw ACCxyzw, VF20xyzw, VF30x

ACCx = ACCx - VF20x × VF30x

ACCy = ACCy - VF20y × VF30x

ACCz = ACCz - VF20z × VF30x

ACCw = ACCw - VF20w × VF30x

Remarks

There is an operation error of 1 bit in multiplication, so the value multiplied by 1 may not be the same as the

original value. By using VF[fs] as a multiplicand, the results of multiplication with 1 are guaranteed to be

accurate.

MUL : Multiply

Multiplies VF[fs] by VF[ft] and stores the result in VF[fd].

Operation Code

Upper 32-bit word: UpperOP field type 1

Bit     63  62  61  60  59  58  57  56-53   52-48   47-43   42-38   37-32
Field   I   E   M   D   T   -   -   dest    ft reg  fs reg  fd reg  MUL
Value   -   -   -   -   -   0   0   ----    -----   -----   -----   101010
Width   1   1   1   1   1   1   1   4 bits  5 bits  5 bits  5 bits  6 bits

Mnemonic

MUL.dest VF[fd]dest, VF[fs]dest, VF[ft]dest

Operation

if (x ⊆ dest) then VF[fd]x = VF[fs]x × VF[ft]x

if (y ⊆ dest) then VF[fd]y = VF[fs]y × VF[ft]y

if (z ⊆ dest) then VF[fd]z = VF[fs]z × VF[ft]z

if (w ⊆ dest) then VF[fd]w = VF[fs]w × VF[ft]w

Flag Changes

MAC flag                 status flag                                     clipping flag
Oxyzw Uxyzw Sxyzw Zxyzw  DS  IS  OS  US  SS  ZS  D   I   O   U   S   Z
XXXX  XXXX  XXXX  XXXX   -   -   X   X   X   X   -   -   X   X   X   X   -

Throughput/Latency

1 / 4
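
A throughput of 1 and a latency of 4 suggest that one instruction can be issued per cycle, but that a result is not available to a following instruction until four cycles after issue. The Python sketch below estimates issue cycles under a deliberately simplified model: in-order issue, stalls only on VF register dependencies, and no modelling of ACC, I, Q, or flag hazards. The model and the name `issue_cycles` are assumptions for illustration and may not match the hardware's actual interlock rules.

```python
LATENCY = 4   # cycles from issue until the result can be read (assumed meaning)

def issue_cycles(program):
    """program: list of (dest_reg, [source_regs]); returns each issue cycle."""
    ready = {}        # register name -> first cycle its new value is readable
    cycle = 0         # earliest cycle the next instruction could issue
    issued = []
    for dest, sources in program:
        start = max([cycle] + [ready.get(reg, 0) for reg in sources])
        issued.append(start)
        ready[dest] = start + LATENCY
        cycle = start + 1             # throughput of 1: one issue per cycle
    return issued

program = [
    ("VF10", ["VF20", "VF30"]),   # MUL VF10, VF20, VF30
    ("VF11", ["VF21", "VF31"]),   # MUL VF11, VF21, VF31 (independent)
    ("VF12", ["VF10", "VF11"]),   # MUL VF12, VF10, VF11 (needs both results)
]
cycles = issue_cycles(program)
print(cycles)
```

Under this model, placing independent instructions between a producer and its consumer fills cycles that would otherwise be spent waiting.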

Example

MUL.xyzw VF10xyzw, VF20xyzw, VF30xyzw

VF10x = VF20x × VF30x

VF10y = VF20y × VF30y

VF10z = VF20z × VF30z

VF10w = VF20w × VF30w

Remarks

There is an operation error of 1 bit in multiplication, so the value multiplied by 1 may not be the same as the

original value. By using VF[fs] as a multiplicand, the results of multiplication with 1 are guaranteed to be

accurate.

MULi : Multiply by I Register

Multiplies each field of VF[fs] by the I register and stores the result in the corresponding field of VF[fd].

Operation Code

Upper 32-bit word: UpperOP field type 1

Bit     63  62  61  60  59  58  57  56-53   52-48   47-43   42-38   37-32
Field   I   E   M   D   T   -   -   dest    ft reg  fs reg  fd reg  MULi
Value   -   -   -   -   -   0   0   ----    00000   -----   -----   011110
Width   1   1   1   1   1   1   1   4 bits  5 bits  5 bits  5 bits  6 bits

Mnemonic

MULi.dest VF[fd]dest, VF[fs]dest, I

Operation

if (x ⊆ dest) then VF[fd]x = VF[fs]x × I

if (y ⊆ dest) then VF[fd]y = VF[fs]y × I

if (z ⊆ dest) then VF[fd]z = VF[fs]z × I

if (w ⊆ dest) then VF[fd]w = VF[fs]w × I

Flag Changes

MAC flag                 status flag                                     clipping flag
Oxyzw Uxyzw Sxyzw Zxyzw  DS  IS  OS  US  SS  ZS  D   I   O   U   S   Z
XXXX  XXXX  XXXX  XXXX   -   -   X   X   X   X   -   -   X   X   X   X   -

Throughput/Latency

1 / 4

Example

MULi.xyzw VF10xyzw, VF20xyzw, I

VF10x = VF20x × I

VF10y = VF20y × I

VF10z = VF20z × I

VF10w = VF20w × I

Remarks

There is an operation error of 1 bit in multiplication, so the value multiplied by 1 may not be the same as the

original value. By using VF[fs] as a multiplicand, the results of multiplication with 1 are guaranteed to be

accurate.

MULq : Multiply by Q Register

Multiplies each field of VF[fs] by the Q register and stores the result in the corresponding field of VF[fd].

Operation Code

Upper 32-bit word: UpperOP field type 1

Bit     63  62  61  60  59  58  57  56-53   52-48   47-43   42-38   37-32
Field   I   E   M   D   T   -   -   dest    ft reg  fs reg  fd reg  MULq
Value   -   -   -   -   -   0   0   ----    00000   -----   -----   011100
Width   1   1   1   1   1   1   1   4 bits  5 bits  5 bits  5 bits  6 bits

Mnemonic

MULq.dest VF[fd]dest, VF[fs]dest, Q

Operation

if (x ⊆ dest) then VF[fd]x = VF[fs]x × Q

if (y ⊆ dest) then VF[fd]y = VF[fs]y × Q

if (z ⊆ dest) then VF[fd]z = VF[fs]z × Q

if (w ⊆ dest) then VF[fd]w = VF[fs]w × Q

Flag Changes

MAC flag                 status flag                                     clipping flag
Oxyzw Uxyzw Sxyzw Zxyzw  DS  IS  OS  US  SS  ZS  D   I   O   U   S   Z
XXXX  XXXX  XXXX  XXXX   -   -   X   X   X   X   -   -   X   X   X   X   -

Throughput/Latency

1 / 4

Example

MULq.xyzw VF10xyzw, VF20xyzw, Q

VF10x = VF20x × Q

VF10y = VF20y × Q

VF10z = VF20z × Q

VF10w = VF20w × Q

Remarks

There is an operation error of 1 bit in multiplication, so the value multiplied by 1 may not be the same as the

original value. By using VF[fs] as a multiplicand, the results of multiplication with 1 are guaranteed to be

accurate.
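
A typical pattern for a scalar multiply of this kind is scaling a vector by a reciprocal. The Python sketch below assumes that Q has already been loaded with 1/w by an earlier divide instruction (not described on this page) and then applies MULq to the x, y, and z fields only. The function name `mulq`, the sample values, and the usage scenario are illustrative assumptions.

```python
def mulq(fs, q, dest="xyzw", fd=None):
    """Model of MULq.dest: fd = fs * Q for each field named in dest."""
    out = list(fd) if fd is not None else [0.0] * 4
    for i, name in enumerate("xyzw"):
        if name in dest:
            out[i] = fs[i] * q
    return out

vf20 = [4.0, -2.0, 8.0, 2.0]
q = 1.0 / vf20[3]                       # assumed to be computed beforehand
vf10 = mulq(vf20, q, dest="xyz", fd=vf20)   # MULq.xyz VF10xyz, VF20xyz, Q
print(vf10)
```

Because w is excluded from dest, the destination's w field keeps its previous contents, here the original w value.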

MULbc : Multiply by Broadcast

Multiplies each field of VF[fs] by the specified field of VF[ft] and stores the result in the corresponding field of

VF[fd].

Operation Code

Upper 32-bit word: UpperOP field type 0

Bit     63  62  61  60  59  58  57  56-53   52-48   47-43   42-38   37-34   33-32
Field   I   E   M   D   T   -   -   dest    ft reg  fs reg  fd reg  MUL?    bc
Value   -   -   -   -   -   0   0   ----    -----   -----   -----   0110    --
Width   1   1   1   1   1   1   1   4 bits  5 bits  5 bits  5 bits  4 bits  2 bits

Mnemonic

MULbc.dest VF[fd]dest, VF[fs]dest, VF[ft]bc

Operation

if (x ⊆ dest) then VF[fd]x = VF[fs]x × VF[ft]bc

if (y ⊆ dest) then VF[fd]y = VF[fs]y × VF[ft]bc

if (z ⊆ dest) then VF[fd]z = VF[fs]z × VF[ft]bc

if (w ⊆ dest) then VF[fd]w = VF[fs]w × VF[ft]bc

Flag Changes

MAC flag                 status flag                                     clipping flag
Oxyzw Uxyzw Sxyzw Zxyzw  DS  IS  OS  US  SS  ZS  D   I   O   U   S   Z
XXXX  XXXX  XXXX  XXXX   -   -   X   X   X   X   -   -   X   X   X   X   -

Throughput/Latency

1 / 4

Example

MULx.xyzw VF10xyzw, VF20xyzw, VF30x

VF10x = VF20x × VF30x

VF10y = VF20y × VF30x

VF10z = VF20z × VF30x

VF10w = VF20w × VF30x

Remarks

There is an operation error of 1 bit in multiplication, so the value multiplied by 1 may not be the same as the

original value. By using VF[fs] as a multiplicand, the results of multiplication with 1 are guaranteed to be

accurate.

MULA : Multiply; to Accumulator

Multiplies VF[fs] by VF[ft] and stores the result in ACC.

Operation Code

Upper 32-bit word: UpperOP field type 3

Bit     63  62  61  60  59  58  57  56-53   52-48   47-43   42-32
Field   I   E   M   D   T   -   -   dest    ft reg  fs reg  MULA
Value   -   -   -   -   -   0   0   ----    -----   -----   01010 1111 10
Width   1   1   1   1   1   1   1   4 bits  5 bits  5 bits  11 bits

Mnemonic

MULA.dest ACCdest, VF[fs]dest, VF[ft]dest

Operation

if (x ⊆ dest) then ACCx = VF[fs]x × VF[ft]x

if (y ⊆ dest) then ACCy = VF[fs]y × VF[ft]y

if (z ⊆ dest) then ACCz = VF[fs]z × VF[ft]z

if (w ⊆ dest) then ACCw = VF[fs]w × VF[ft]w

Flag Changes

MAC flag                 status flag                                     clipping flag
Oxyzw Uxyzw Sxyzw Zxyzw  DS  IS  OS  US  SS  ZS  D   I   O   U   S   Z
XXXX  XXXX  XXXX  XXXX   -   -   X   X   X   X   -   -   X   X   X   X   -

Throughput/Latency

1 / 4

Example

MULA.xyzw ACCxyzw, VF20xyzw, VF30xyzw

ACCx = VF20x × VF30x

ACCy = VF20y × VF30y

ACCz = VF20z × VF30z

ACCw = VF20w × VF30w

Remarks

There is an operation error of 1 bit in multiplication, so the value multiplied by 1 may not be the same as the

original value. By using VF[fs] as a multiplicand, the results of multiplication with 1 are guaranteed to be

accurate.

MULAi : Multiply by I Register, to Accumulator

Multiplies each field of VF[fs] by the value of the I register and stores the result in ACC.

Operation Code

Upper 32-bit word: UpperOP field type 3

Bit     63  62  61  60  59  58  57  56-53   52-48   47-43   42-32
Field   I   E   M   D   T   -   -   dest    ft reg  fs reg  MULAi
Value   -   -   -   -   -   0   0   ----    00000   -----   00111 1111 10
Width   1   1   1   1   1   1   1   4 bits  5 bits  5 bits  11 bits

Mnemonic

MULAi.dest ACCdest, VF[fs]dest, I

Operation

if (x ⊆ dest) then ACCx = VF[fs]x × I

if (y ⊆ dest) then ACCy = VF[fs]y × I

if (z ⊆ dest) then ACCz = VF[fs]z × I

if (w ⊆ dest) then ACCw = VF[fs]w × I

Flag Changes

MAC flag                 status flag                                     clipping flag
Oxyzw Uxyzw Sxyzw Zxyzw  DS  IS  OS  US  SS  ZS  D   I   O   U   S   Z
XXXX  XXXX  XXXX  XXXX   -   -   X   X   X   X   -   -   X   X   X   X   -

Throughput/Latency

1 / 4

Example

MULAi.xyzw ACCxyzw, VF20xyzw, I

ACCx = VF20x × I

ACCy = VF20y × I

ACCz = VF20z × I

ACCw = VF20w × I

Remarks

There is an operation error of 1 bit in multiplication, so the value multiplied by 1 may not be the same as the

original value. By using VF[fs] as a multiplicand, the results of multiplication with 1 are guaranteed to be

accurate.

MULAq : Multiply by Q Register, to Accumulator

Multiplies each field of VF[fs] by the Q register, and stores the result in ACC.

Operation Code

Upper 32-bit word: UpperOP field type 3

Bit     63  62  61  60  59  58  57  56-53   52-48   47-43   42-32
Field   I   E   M   D   T   -   -   dest    ft reg  fs reg  MULAq
Value   -   -   -   -   -   0   0   ----    00000   -----   00111 1111 00
Width   1   1   1   1   1   1   1   4 bits  5 bits  5 bits  11 bits

Mnemonic

MULAq.dest ACCdest, VF[fs]dest, Q

Operation

if (x ⊆ dest) then ACCx = VF[fs]x × Q

if (y ⊆ dest) then ACCy = VF[fs]y × Q

if (z ⊆ dest) then ACCz = VF[fs]z × Q

if (w ⊆ dest) then ACCw = VF[fs]w × Q

Flag Changes

MAC flag                 status flag                                     clipping flag
Oxyzw Uxyzw Sxyzw Zxyzw  DS  IS  OS  US  SS  ZS  D   I   O   U   S   Z
XXXX  XXXX  XXXX  XXXX   -   -   X   X   X   X   -   -   X   X   X   X   -

Throughput/Latency

1 / 4

Example

MULAq.xyzw ACCxyzw, VF20xyzw, Q

ACCx = VF20x × Q

ACCy = VF20y × Q

ACCz = VF20z × Q

ACCw = VF20w × Q

Remarks

Multiplication has an operation error of 1 bit, so a value multiplied by 1 may not equal the original value. When VF[fs] is used as the multiplicand, the result of multiplication by 1 is guaranteed to be accurate.


MULAbc : Broadcast Multiply, to Accumulator

Multiplies each field of VF[fs] by the specified field of VF[ft] and stores the result in the corresponding field of

ACC.

Operation Code

Upper 32-bit word: UpperOP field type 2

63 62 61 60 59 58 57 56 55 54 53 52 51 50 49 48 47 46 45 44 43 42 41 40 39 38 37 36 35 34 33 32

I E M D T - - dest ft reg fs reg MULA? bc

- - - - - 0 0 ---- ----- ----- 00110 1111 --

1 1 1 1 1 1 1 4 bits 5 bits 5 bits 9 bits 2 bits

Mnemonic

MULAbc.dest ACCdest, VF[fs]dest, VF[ft]bc

Operation

if (x ⊆ dest) then ACCx = VF[fs]x × VF[ft]bc

if (y ⊆ dest) then ACCy = VF[fs]y × VF[ft]bc

if (z ⊆ dest) then ACCz = VF[fs]z × VF[ft]bc

if (w ⊆ dest) then ACCw = VF[fs]w × VF[ft]bc

Flag Changes

MAC flag                  status flag                           clipping flag
Oxyzw Uxyzw Sxyzw Zxyzw   DS IS OS US SS ZS  D  I  O  U  S  Z
XXXX  XXXX  XXXX  XXXX    -  -  X  X  X  X   -  -  X  X  X  X   -

Throughput/Latency

1 / 4

Example

MULAx.xyzw ACCxyzw, VF20xyzw, VF30x

ACCx = VF20x × VF30x

ACCy = VF20y × VF30x

ACCz = VF20z × VF30x

ACCw = VF20w × VF30x

Remarks

Multiplication has an operation error of 1 bit, so a value multiplied by 1 may not equal the original value. When VF[fs] is used as the multiplicand, the result of multiplication by 1 is guaranteed to be accurate.
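The per-field behavior listed under Operation can be sketched in Python. This is only an illustrative model: the function name `mula_bc` and the dictionary representation of registers are hypothetical, and ordinary Python floats are used, so the 1-bit multiplication error mentioned in the remarks is not reproduced.

```python
FIELDS = "xyzw"

def mula_bc(acc, vf_fs, vf_ft, dest, bc):
    """Model MULAbc.dest ACCdest, VF[fs]dest, VF[ft]bc (hypothetical helper)."""
    result = dict(acc)        # fields not named in dest keep their old value
    scalar = vf_ft[bc]        # the single broadcast field of VF[ft]
    for field in FIELDS:
        if field in dest:
            result[field] = vf_fs[field] * scalar
    return result

acc = {"x": 0.0, "y": 0.0, "z": 0.0, "w": 9.0}
vf20 = {"x": 1.0, "y": 2.0, "z": 3.0, "w": 4.0}
vf30 = {"x": 0.5, "y": 7.0, "z": 7.0, "w": 7.0}

# Corresponds to: MULAx.xyz ACCxyz, VF20xyz, VF30x
new_acc = mula_bc(acc, vf20, vf30, dest="xyz", bc="x")
print(new_acc)
```

Because w is not part of dest in this call, ACCw is left unchanged while x, y, and z are each multiplied by VF30x.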


NOP : No Operation

No operation is performed.

Operation Code

Upper 32-bit word: UpperOP field type 3

63 62 61 60 59 58 57 56 55 54 53 52 51 50 49 48 47 46 45 44 43 42 41 40 39 38 37 36 35 34 33 32

I E M D T - - dest ft reg fs reg NOP

- - - - - 0 0 0000 00000 00000 01011 1111 11

1 1 1 1 1 1 1 4 bits 5 bits 5 bits 11 bits

Mnemonic

NOP

Operation

None

Flag Changes

MAC flag                  status flag                           clipping flag
Oxyzw Uxyzw Sxyzw Zxyzw   DS IS OS US SS ZS  D  I  O  U  S  Z
----  ----  ----  ----    -  -  -  -  -  -   -  -  -  -  -  -   -

Throughput/Latency

1 / 4


OPMULA : Vector Outer Product

Calculates the first part of the vector outer product of VF[fs] and VF[ft] and stores the result in ACC.

Operation Code

Upper 32-bit word: UpperOP field type 3

63 62 61 60 59 58 57 56 55 54 53 52 51 50 49 48 47 46 45 44 43 42 41 40 39 38 37 36 35 34 33 32

I E M D T - - dest ft reg fs reg OPMULA

- - - - - 0 0 1110 ----- ----- 01011 1111 10

1 1 1 1 1 1 1 4 bits 5 bits 5 bits 11 bits

Mnemonic

OPMULA.xyz ACCxyz, VF[fs]xyz, VF[ft]xyz

Operation

ACCx = VF[fs]y × VF[ft]z

ACCy = VF[fs]z × VF[ft]x

ACCz = VF[fs]x × VF[ft]y

Flag Changes

MAC flag                  status flag                           clipping flag
Oxyzw Uxyzw Sxyzw Zxyzw   DS IS OS US SS ZS  D  I  O  U  S  Z
XXXX  XXXX  XXXX  XXXX    -  -  X  X  X  X   -  -  X  X  X  X   -

Throughput/Latency

1 / 4

Example

Vector Outer Product (cross product): VF20 × VF30 (Note the order in which VF20 and VF30 are specified in each instruction.)

OPMULA.xyz ACCxyz, VF20xyz, VF30xyz

OPMSUB.xyz VF10xyz, VF30xyz, VF20xyz

VF10x = VF20y × VF30z - VF30y × VF20z

VF10y = VF20z × VF30x - VF30z × VF20x

VF10z = VF20x × VF30y - VF30x × VF20y

Remarks

The fields subject to the operation are fixed to x,y,z.

Multiplication has an operation error of 1 bit, so a value multiplied by 1 may not equal the original value.


OPMSUB : Vector Outer Product

Calculates the last part of the vector outer product of VF[fs], VF[ft] and ACC and stores the result in VF[fd].

Operation Code

Upper 32-bit word: UpperOP field type 1

63 62 61 60 59 58 57 56 55 54 53 52 51 50 49 48 47 46 45 44 43 42 41 40 39 38 37 36 35 34 33 32

I E M D T - - dest ft reg fs reg fd reg OPMSUB

- - - - - 0 0 1110 ----- ----- ----- 101110

1 1 1 1 1 1 1 4 bits 5 bits 5 bits 5 bits 6 bits

Mnemonic

OPMSUB.xyz VF[fd]xyz, VF[fs]xyz, VF[ft]xyz

Operation

VF[fd]x = ACCx - VF[fs]y × VF[ft]z

VF[fd]y = ACCy - VF[fs]z × VF[ft]x

VF[fd]z = ACCz - VF[fs]x × VF[ft]y

Flag Changes

MAC flag                  status flag                           clipping flag
Oxyzw Uxyzw Sxyzw Zxyzw   DS IS OS US SS ZS  D  I  O  U  S  Z
XXXX  XXXX  XXXX  XXXX    -  -  X  X  X  X   -  -  X  X  X  X   -

Throughput/Latency

1 / 4

Example

Vector Outer Product (cross product): VF20 × VF30 (Note the order in which VF20 and VF30 are specified in each instruction.)

OPMULA.xyz ACCxyz, VF20xyz, VF30xyz

OPMSUB.xyz VF10xyz, VF30xyz, VF20xyz

VF10x = VF20y × VF30z - VF30y × VF20z

VF10y = VF20z × VF30x - VF30z × VF20x

VF10z = VF20x × VF30y - VF30x × VF20y

Remarks

The fields subject to the operation are fixed to x,y,z.

Multiplication has an operation error of 1 bit, so a value multiplied by 1 may not equal the original value.
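The two-instruction sequence in the example can be checked with a short Python sketch. The helper names `opmula`, `opmsub`, and `cross` are hypothetical and registers are modeled as plain dictionaries of Python floats; the point is only to show why the operands are swapped in the OPMSUB step.

```python
def opmula(fs, ft):
    """Model OPMULA.xyz ACCxyz, VF[fs]xyz, VF[ft]xyz."""
    return {
        "x": fs["y"] * ft["z"],
        "y": fs["z"] * ft["x"],
        "z": fs["x"] * ft["y"],
    }

def opmsub(acc, fs, ft):
    """Model OPMSUB.xyz VF[fd]xyz, VF[fs]xyz, VF[ft]xyz."""
    return {
        "x": acc["x"] - fs["y"] * ft["z"],
        "y": acc["y"] - fs["z"] * ft["x"],
        "z": acc["z"] - fs["x"] * ft["y"],
    }

def cross(a, b):
    """a x b, following the example: OPMULA with (a, b), OPMSUB with (b, a)."""
    acc = opmula(a, b)
    return opmsub(acc, b, a)   # operands swapped relative to OPMULA

vf20 = {"x": 1.0, "y": 2.0, "z": 3.0}
vf30 = {"x": 4.0, "y": 5.0, "z": 6.0}
vf10 = cross(vf20, vf30)
print(vf10)
```

If the OPMSUB operands were given in the same order as in OPMULA, every field would compute a product minus itself, so the swap is what yields the second term of each cross product component.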


SUB : Subtract

Subtracts VF[ft] from VF[fs] and stores the result in VF[fd].

Operation Code

Upper 32-bit word: UpperOP field type 1

63 62 61 60 59 58 57 56 55 54 53 52 51 50 49 48 47 46 45 44 43 42 41 40 39 38 37 36 35 34 33 32

I E M D T - - dest ft reg fs reg fd reg SUB

- - - - - 0 0 ---- ----- ----- ----- 101100

1 1 1 1 1 1 1 4 bits 5 bits 5 bits 5 bits 6 bits

Mnemonic

SUB.dest VF[fd]dest, VF[fs]dest, VF[ft]dest

Operation

if (x ⊆ dest) then VF[fd]x = VF[fs]x - VF[ft]x

if (y ⊆ dest) then VF[fd]y = VF[fs]y - VF[ft]y

if (z ⊆ dest) then VF[fd]z = VF[fs]z - VF[ft]z

if (w ⊆ dest) then VF[fd]w = VF[fs]w - VF[ft]w

Flag Changes

MAC flag                  status flag                           clipping flag
Oxyzw Uxyzw Sxyzw Zxyzw   DS IS OS US SS ZS  D  I  O  U  S  Z
XXXX  XXXX  XXXX  XXXX    -  -  X  X  X  X   -  -  X  X  X  X   -

Throughput/Latency

1 / 4

Example

SUB.xyzw VF01xyzw, VF00xyzw, VF00xyzw

All fields of VF01 become 0.0.

Remarks

When VF00 is specified as the destination, the instruction is used to compare VF[fs] with VF[ft].
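The comparison use mentioned above can be illustrated with a small Python sketch. It rests on two assumptions that are not stated in this section: that a write to VF00 is discarded, and that the per-field Z and S MAC flags indicate a zero and a negative result respectively. The function name `sub_compare` is hypothetical.

```python
def sub_compare(vf_fs, vf_ft, dest="xyzw"):
    """Model SUB.dest VF00dest, VF[fs]dest, VF[ft]dest as a compare.

    The difference itself is thrown away; only the assumed per-field
    zero (Z) and sign (S) flag values are returned.
    """
    zero, sign = {}, {}
    for field in dest:
        diff = vf_fs[field] - vf_ft[field]
        zero[field] = diff == 0.0   # assumed Z flag: fields are equal
        sign[field] = diff < 0.0    # assumed S flag: VF[fs] < VF[ft]
    return zero, sign

vf_a = {"x": 1.0, "y": 2.0, "z": 3.0, "w": 4.0}
vf_b = {"x": 1.0, "y": 5.0, "z": 0.0, "w": 4.0}
zero, sign = sub_compare(vf_a, vf_b)
print(zero, sign)
```

Under these assumptions, a set Z flag for a field means the two fields are equal, and a set S flag means the VF[fs] field is smaller than the VF[ft] field.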


SUBi : Subtract I Register

Subtracts the I register from each field of VF[fs] and stores the result in the corresponding fields of VF[fd].

Operation Code

Upper 32-bit word: UpperOP field type 1

63 62 61 60 59 58 57 56 55 54 53 52 51 50 49 48 47 46 45 44 43 42 41 40 39 38 37 36 35 34 33 32

I E M D T - - dest ft reg fs reg fd reg SUBi

- - - - - 0 0 ---- 00000 ----- ----- 100110

1 1 1 1 1 1 1 4 bits 5 bits 5 bits 5 bits 6 bits

Mnemonic

SUBi.dest VF[fd]dest, VF[fs]dest, I

Operation

if (x ⊆ dest) then VF[fd]x = VF[fs]x - I

if (y ⊆ dest) then VF[fd]y = VF[fs]y - I

if (z ⊆ dest) then VF[fd]z = VF[fs]z - I

if (w ⊆ dest) then VF[fd]w = VF[fs]w - I

Flag Changes

MAC flag                  status flag                           clipping flag
Oxyzw Uxyzw Sxyzw Zxyzw   DS IS OS US SS ZS  D  I  O  U  S  Z
XXXX  XXXX  XXXX  XXXX    -  -  X  X  X  X   -  -  X  X  X  X   -

Throughput/Latency

1 / 4

Example

SUBi.xyzw VF10xyzw, VF20xyzw, I

VF10x = VF20x - I

VF10y = VF20y - I

VF10z = VF20z - I

VF10w = VF20w - I

Remarks

When VF00 is specified as the destination, the instruction is used to compare each field of VF[fs] with the I

register.


SUBq : Subtract Q Register

Subtracts the Q register from each field of VF[fs] and stores the result in the corresponding field of VF[fd].

Operation Code

Upper 32-bit word: UpperOP field type 1

63 62 61 60 59 58 57 56 55 54 53 52 51 50 49 48 47 46 45 44 43 42 41 40 39 38 37 36 35 34 33 32

I E M D T - - dest ft reg fs reg fd reg SUBq

- - - - - 0 0 ---- 00000 ----- ----- 100100

1 1 1 1 1 1 1 4 bits 5 bits 5 bits 5 bits 6 bits

Mnemonic

SUBq.dest VF[fd]dest, VF[fs]dest, Q

Operation

if (x ⊆ dest) then VF[fd]x = VF[fs]x - Q

if (y ⊆ dest) then VF[fd]y = VF[fs]y - Q

if (z ⊆ dest) then VF[fd]z = VF[fs]z - Q

if (w ⊆ dest) then VF[fd]w = VF[fs]w - Q

Flag Changes

MAC flag                  status flag                           clipping flag
Oxyzw Uxyzw Sxyzw Zxyzw   DS IS OS US SS ZS  D  I  O  U  S  Z
XXXX  XXXX  XXXX  XXXX    -  -  X  X  X  X   -  -  X  X  X  X   -

Throughput/Latency

1 / 4

Example

SUBq.xyzw VF10xyzw, VF20xyzw, Q

VF10x = VF20x - Q

VF10y = VF20y - Q

VF10z = VF20z - Q

VF10w = VF20w - Q

Remarks

When VF00 is specified as the destination, the instruction is used to compare each field of VF[fs] with the Q register.


SUBbc : Broadcast Subtract

Subtracts the specified field of VF[ft] from each field of VF[fs] and stores the result in the corresponding field of

VF[fd].

Operation Code

Upper 32-bit word: UpperOP field type 0

63 62 61 60 59 58 57 56 55 54 53 52 51 50 49 48 47 46 45 44 43 42 41 40 39 38 37 36 35 34 33 32

I E M D T - - dest ft reg fs reg fd reg SUB? bc

- - - - - 0 0 ---- ----- ----- ----- 0001 --

1 1 1 1 1 1 1 4 bits 5 bits 5 bits 5 bits 4 bits 2 bits

Mnemonic

SUBbc.dest VF[fd]dest, VF[fs]dest, VF[ft]bc

Operation

if (x ⊆ dest) then VF[fd]x = VF[fs]x - VF[ft]bc

if (y ⊆ dest) then VF[fd]y = VF[fs]y - VF[ft]bc

if (z ⊆ dest) then VF[fd]z = VF[fs]z - VF[ft]bc

if (w ⊆ dest) then VF[fd]w = VF[fs]w - VF[ft]bc

Flag Changes

MAC flag                  status flag                           clipping flag
Oxyzw Uxyzw Sxyzw Zxyzw   DS IS OS US SS ZS  D  I  O  U  S  Z
XXXX  XXXX  XXXX  XXXX    -  -  X  X  X  X   -  -  X  X  X  X   -

Throughput/Latency

1 / 4

Example

SUBx.xyzw VF10xyzw, VF20xyzw, VF30x

VF10x = VF20x - VF30x

VF10y = VF20y - VF30x

VF10z = VF20z - VF30x

VF10w = VF20w - VF30x

Remarks

When VF00 is specified as the destination, the instruction is used to compare each field of VF[fs] with the

VF[ft]bc field.


SUBA : Subtract; to Accumulator

Subtracts VF[ft] from VF[fs] and stores the result in ACC.

Operation Code

Upper 32-bit word: UpperOP field type 3

63 62 61 60 59 58 57 56 55 54 53 52 51 50 49 48 47 46 45 44 43 42 41 40 39 38 37 36 35 34 33 32

I E M D T - - dest ft reg fs reg SUBA

- - - - - 0 0 ---- ----- ----- 01011 1111 00

1 1 1 1 1 1 1 4 bits 5 bits 5 bits 11 bits

Mnemonic

SUBA.dest ACCdest, VF[fs]dest, VF[ft]dest

Operation

if (x ⊆ dest) then ACCx = VF[fs]x - VF[ft]x

if (y ⊆ dest) then ACCy = VF[fs]y - VF[ft]y

if (z ⊆ dest) then ACCz = VF[fs]z - VF[ft]z

if (w ⊆ dest) then ACCw = VF[fs]w - VF[ft]w

Flag Changes

MAC flag                  status flag                           clipping flag
Oxyzw Uxyzw Sxyzw Zxyzw   DS IS OS US SS ZS  D  I  O  U  S  Z
XXXX  XXXX  XXXX  XXXX    -  -  X  X  X  X   -  -  X  X  X  X   -

Throughput/Latency

1 / 4

Example

SUBA.xyzw ACCxyzw, VF20xyzw, VF30xyzw

ACCx = VF20x - VF30x

ACCy = VF20y - VF30y

ACCz = VF20z - VF30z

ACCw = VF20w - VF30w


SUBAi : Subtract I Register; to Accumulator

Subtracts the I register from each field of VF[fs] and stores the result in the corresponding field of ACC.

Operation Code

Upper 32-bit word: UpperOP field type 3

63 62 61 60 59 58 57 56 55 54 53 52 51 50 49 48 47 46 45 44 43 42 41 40 39 38 37 36 35 34 33 32

I E M D T - - dest ft reg fs reg SUBAi

- - - - - 0 0 ---- 00000 ----- 01001 1111 10

1 1 1 1 1 1 1 4 bits 5 bits 5 bits 11 bits

Mnemonic

SUBAi.dest ACCdest, VF[fs]dest, I

Operation

if (x ⊆ dest) then ACCx = VF[fs]x - I

if (y ⊆ dest) then ACCy = VF[fs]y - I

if (z ⊆ dest) then ACCz = VF[fs]z - I

if (w ⊆ dest) then ACCw = VF[fs]w - I

Flag Changes

MAC flag                  status flag                           clipping flag
Oxyzw Uxyzw Sxyzw Zxyzw   DS IS OS US SS ZS  D  I  O  U  S  Z
XXXX  XXXX  XXXX  XXXX    -  -  X  X  X  X   -  -  X  X  X  X   -

Throughput/Latency

1 / 4

Example

SUBAi.xyzw ACCxyzw, VF20xyzw, I

ACCx = VF20x - I

ACCy = VF20y - I

ACCz = VF20z - I

ACCw = VF20w - I


SUBAq : Subtract Q Register; to Accumulator

Subtracts the Q register from each field of VF[fs] and stores the result in the corresponding field of ACC.

Operation Code

Upper 32-bit word: UpperOP field type 3

63 62 61 60 59 58 57 56 55 54 53 52 51 50 49 48 47 46 45 44 43 42 41 40 39 38 37 36 35 34 33 32

I E M D T - - dest ft reg fs reg SUBAq

- - - - - 0 0 ---- 00000 ----- 01001 1111 00

1 1 1 1 1 1 1 4 bits 5 bits 5 bits 11 bits

Mnemonic

SUBAq.dest ACCdest, VF[fs]dest, Q

Operation

if (x ⊆ dest) then ACCx = VF[fs]x - Q

if (y ⊆ dest) then ACCy = VF[fs]y - Q

if (z ⊆ dest) then ACCz = VF[fs]z - Q

if (w ⊆ dest) then ACCw = VF[fs]w - Q

Flag Changes

MAC flag                  status flag                           clipping flag
Oxyzw Uxyzw Sxyzw Zxyzw   DS IS OS US SS ZS  D  I  O  U  S  Z
XXXX  XXXX  XXXX  XXXX    -  -  X  X  X  X   -  -  X  X  X  X   -

Throughput/Latency

1 / 4

Example

SUBAq.xyzw ACCxyzw, VF20xyzw, Q

ACCx = VF20x - Q

ACCy = VF20y - Q

ACCz = VF20z - Q

ACCw = VF20w - Q


SUBAbc : Broadcast Subtract; to Accumulator

Subtracts the specified field of VF[ft] from each field of VF[fs] and stores the result in the corresponding field of

ACC.

Operation Code

Upper 32-bit word: UpperOP field type 2

63 62 61 60 59 58 57 56 55 54 53 52 51 50 49 48 47 46 45 44 43 42 41 40 39 38 37 36 35 34 33 32

I E M D T - - dest ft reg fs reg SUBA? bc

- - - - - 0 0 ---- ----- ----- 00001 1111 --

1 1 1 1 1 1 1 4 bits 5 bits 5 bits 9 bits 2 bits

Mnemonic

SUBAbc.dest ACCdest, VF[fs]dest, VF[ft]bc

Operation

if (x ⊆ dest) then ACCx = VF[fs]x - VF[ft]bc

if (y ⊆ dest) then ACCy = VF[fs]y - VF[ft]bc

if (z ⊆ dest) then ACCz = VF[fs]z - VF[ft]bc

if (w ⊆ dest) then ACCw = VF[fs]w - VF[ft]bc

Flag Changes

MAC flag                  status flag                           clipping flag
Oxyzw Uxyzw Sxyzw Zxyzw   DS IS OS US SS ZS  D  I  O  U  S  Z
XXXX  XXXX  XXXX  XXXX    -  -  X  X  X  X   -  -  X  X  X  X   -

Throughput/Latency

1 / 4

Example

SUBAx.xyzw ACCxyzw, VF20xyzw, VF30x

ACCx = VF20x - VF30x

ACCy = VF20y - VF30x

ACCz = VF20z - VF30x

ACCw = VF20w - VF30x


4.3. Lower Instruction Reference

This section describes the function, operation code, mnemonic, operation, flag changes, and throughput/latency of each Lower instruction. The instructions are listed in alphabetical order by mnemonic. The descriptions also include examples, programming notes, and reference information.


B : Unconditional Branch

Branches to the PC relative address specified with the immediate value.

Operation Code

Lower 32-bit word: LowerOP field type 7

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 09 08 07 06 05 04 03 02 01 00

B dest it reg is reg Imm11

0100000 0000 00000 00000 -----------

7 bits 4 bits 5 bits 5 bits 11 bits

Mnemonic

B Imm11

Imm11 is an 11-bit signed integer; specify the value obtained by dividing the offset to the branch destination by 8.

Operation

PC = PC + Imm11 × 8

The branch destination is determined by adding the offset given by Imm11 (Imm11 × 8) to the address of the instruction in the branch delay slot (one instruction).
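The offset arithmetic can be sketched in Python as follows. The sketch reads the description above as meaning that the base address is that of the delay-slot instruction, which is assumed to sit 8 bytes after the branch (one 64-bit Upper/Lower instruction pair). The function names `encode_imm11` and `branch_target` are hypothetical and do not correspond to any assembler API.

```python
INSTR_BYTES = 8  # assumed size of one Upper/Lower instruction pair

def encode_imm11(branch_addr, target_addr):
    """Return the 11-bit Imm11 field for a branch at branch_addr."""
    delay_slot_addr = branch_addr + INSTR_BYTES
    offset = target_addr - delay_slot_addr
    if offset % INSTR_BYTES != 0:
        raise ValueError("offset must be a multiple of 8 bytes")
    imm = offset // INSTR_BYTES
    if not -1024 <= imm <= 1023:
        raise ValueError("offset does not fit in a signed 11-bit field")
    return imm & 0x7FF            # two's complement in 11 bits

def branch_target(branch_addr, imm11_field):
    """Decode an Imm11 field back into the branch destination address."""
    imm = imm11_field - 0x800 if imm11_field & 0x400 else imm11_field
    return branch_addr + INSTR_BYTES + imm * INSTR_BYTES

forward = encode_imm11(0x100, 0x120)    # 3 instructions past the delay slot
backward = encode_imm11(0x100, 0x0F0)   # 3 instructions before the delay slot
print(forward, hex(backward))
```

With an 11-bit signed field, the reachable range under these assumptions is 1024 instructions backward to 1023 instructions forward of the delay slot.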

Flag Changes

MAC flag                  status flag                           clipping flag
Oxyzw Uxyzw Sxyzw Zxyzw   DS IS OS US SS ZS  D  I  O  U  S  Z
----  ----  ----  ----    -  -  -  -  -  -   -  -  -  -  -  -   -

Throughput/Latency

2 / 2

Example

In the example below, the branch destination depends on whether VI10 matches VI01, VI02, or VI03; if none of them matches, execution branches to DEFAULT.

NOP BEQ VI10, VI01, PROG1

NOP NOP

NOP BEQ VI10, VI02, PROG2

NOP NOP

NOP BEQ VI10, VI03, PROG3

NOP NOP

NOP B DEFAULT

Remarks

This instruction cannot be placed in the E bit delay slot.


BAL : Unconditional Branch with Saving Address

Stores the address before branching in VI[it] and branches to the PC relative address specified with the

immediate value.

Operation Code

Lower 32-bit word: LowerOP field type 7

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 09 08 07 06 05 04 03 02 01 00

BAL dest it reg is reg Imm11

0100001 0000 ----- 00000 -----------

7 bits 4 bits 5 bits 5 bits 11 bits

Mnemonic

BAL VI[it], Imm11

Imm11 is an 11-bit signed integer; specify the value obtained by dividing the offset to branch destination by

8.

Operation

VI[it] = PC + (2 × 8)

PC = PC + Imm11 × 8

The address of the instruction next to the branch delay slot (one instruction) is stored in VI[it].

The branch destination is determined by adding the offset given by Imm11 (Imm11 × 8) to the address of the instruction in the branch delay slot.

Flag Changes

MAC flag                  status flag                           clipping flag
Oxyzw Uxyzw Sxyzw Zxyzw   DS IS OS US SS ZS  D  I  O  U  S  Z
----  ----  ----  ----    -  -  -  -  -  -   -  -  -  -  -  -   -

Throughput/Latency

2 / 2

Example

BAL VI15, LABEL

VI15 = PC+(2 × 8)

Branches to LABEL (PC relative address)

Remarks

This instruction cannot be placed in the E bit delay slot.


DIV : Divide

Divides the fsf field of VF[fs] by the ftf field of VF[ft] and stores the result in the Q register.

Operation Code

Lower 32-bit word: LowerOP field type 4

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 09 08 07 06 05 04 03 02 01 00

Lower OP. ftf fsf ft reg fs reg DIV

1000000 -- -- ----- ----- 01110 1111 00

7 bits 2 bits 2 bits 5 bits 5 bits 11 bits

Mnemonic

DIV Q, VF[fs]fsf, VF[ft]ftf

Operation

Q = VF[fs]fsf ÷ VF[ft]ftf

Flag Changes

MAC flag                  status flag                           clipping flag
Oxyzw Uxyzw Sxyzw Zxyzw   DS IS OS US SS ZS  D  I  O  U  S  Z
----  ----  ----  ----    X  X  -  -  -  -   X  X  -  -  -  -   -

Throughput/Latency

7 / 7

Example

DIV Q, VF10x, VF20y

Q = VF10x ÷ VF20y

Remarks

A data dependency check is not performed with the Q register. To execute subsequent instructions after the

results of the DIV instruction are written to the Q register, use the WAITQ instruction for synchronization.
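The hazard described in the remarks can be illustrated with a toy Python model. This is not a cycle-accurate description of the hardware: the class `QRegisterModel` and its methods are hypothetical, the latency of 7 cycles is taken from the Throughput/Latency entry above, and the exact cycle in which Q becomes visible is an assumption. Division by zero and the D and I flags are not modeled.

```python
DIV_LATENCY = 7  # cycles, from the Throughput/Latency entry

class QRegisterModel:
    """Toy model: Q keeps its old value until a pending DIV completes."""

    def __init__(self, initial_q=0.0):
        self.q = initial_q
        self.pending = None           # (ready_cycle, value) of a DIV in flight

    def div(self, cycle, numerator, denominator):
        self.pending = (cycle + DIV_LATENCY, numerator / denominator)

    def _retire(self, cycle):
        if self.pending is not None and cycle >= self.pending[0]:
            self.q = self.pending[1]
            self.pending = None

    def read_q(self, cycle):
        """Read Q with no dependency check, as described in the remarks."""
        self._retire(cycle)
        return self.q

    def waitq(self, cycle):
        """Return the cycle at which execution may continue after WAITQ."""
        if self.pending is None:
            return cycle
        ready = max(cycle, self.pending[0])
        self._retire(ready)
        return ready

model = QRegisterModel(initial_q=1.0)
model.div(cycle=0, numerator=6.0, denominator=3.0)
stale = model.read_q(cycle=3)         # too early: still the old Q value
resume = model.waitq(cycle=3)         # stalls until the DIV result is written
fresh = model.read_q(cycle=resume)
print(stale, resume, fresh)
```

In this model an instruction that reads Q three cycles after the DIV sees the previous contents, whereas the same read placed after the WAITQ step sees the quotient.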


EATAN : Arctangent

Calculates the arctangent of the fsf field of VF[fs] and stores the result in the P register.

Operation Code

Lower 32-bit word: LowerOP field type 4

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 09 08 07 06 05 04 03 02 01 00

Lower OP. ftf fsf ft reg fs reg EATAN

1000000 00 -- 00000 ----- 11111 1111 01

7 bits 2 bits 2 bits 5 bits 5 bits 11 bits

Mnemonic

EATAN P, VF[fs]fsf

Operation

P = arctan( VF[fs]fsf )

The following approximation formula is used for calculating arctan.

Calculation is valid for: 0 <= x <= 1

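\[
\arctan(x) = T_1 \times t + T_2 \times t^{3} + T_3 \times t^{5} + T_4 \times t^{7} + T_5 \times t^{9} + T_6 \times t^{11} + T_7 \times t^{13} + T_8 \times t^{15} + \frac{\pi}{4}
\]

Provided

\[
t = \frac{x - 1}{x + 1}
\]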

Constant  Decimal expression   Single precision floating-point expression  Hex. expression
                               S  E         F
T1         0.999999344348907   0  01111110  11111111111111111110101        3f7ffff5
T2        -0.333298563957214   1  01111101  01010101010011000011100        beaaa61c
T3         0.199465364217758   0  01111100  10011000100000010100110        3e4c40a6
T4        -0.139085337519646   1  01111100  00011100110110001100011        be0e6c63
T5         0.096420042216778   0  01111011  10001010111011111011111        3dc577df
T6        -0.055909886956215   1  01111010  11001010000000111000100        bd6501c4
T7         0.021861229091883   0  01111001  01100110001011001010010        3cb31652
T8        -0.004054057877511   1  01110111  00001001101011111100111        bb84d7e7
π/4        0.785398185253143   0  01111110  10010010000111111011011        3f490fdb
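The approximation can be evaluated directly from the constants in the table. The following Python sketch is only an illustration: the function name `eatan_approx` is hypothetical, and the arithmetic is done in Python's double precision rather than in the single precision used by the VU, so results may differ from the hardware in the last bits.

```python
import math

# T1..T8 and pi/4, decimal values copied from the table above
T = [0.999999344348907, -0.333298563957214, 0.199465364217758,
     -0.139085337519646, 0.096420042216778, -0.055909886956215,
     0.021861229091883, -0.004054057877511]
PI_OVER_4 = 0.785398185253143

def eatan_approx(x):
    """Evaluate the EATAN polynomial for 0 <= x <= 1."""
    if not 0.0 <= x <= 1.0:
        raise ValueError("the approximation is valid only for 0 <= x <= 1")
    t = (x - 1.0) / (x + 1.0)
    t2 = t * t
    poly = 0.0
    for coeff in reversed(T):         # Horner's rule in powers of t^2
        poly = poly * t2 + coeff
    return t * poly + PI_OVER_4       # T1*t + T2*t^3 + ... + T8*t^15 + pi/4

for x in (0.0, 0.25, 0.5, 1.0):
    print(x, eatan_approx(x), math.atan(x))
```

The substitution maps the valid input range 0 <= x <= 1 onto -1 <= t <= 0, and the identity arctan(x) = arctan(t) + π/4 for t = (x - 1)/(x + 1) explains the constant term.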

Flag Changes

MAC flag                  status flag                           clipping flag
Oxyzw Uxyzw Sxyzw Zxyzw   DS IS OS US SS ZS  D  I  O  U  S  Z
----  ----  ----  ----    -  -  -  -  -  -   -  -  -  -  -  -   -

Throughput/Latency

53 / 54

Example

EATAN P, VF10x

P = arctan( VF10x )


Remarks

A data dependency check is not performed with the P register. To execute subsequent instructions after the

results of the EATAN instruction are written to the P register, use the WAITP instruction for

synchronization.


EATANxy : Arctangent

Calculates the arctangent based on the x and y fields of VF[fs] and stores the result in the P register.

Operation Code

Lower 32-bit word: LowerOP field type 3

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 09 08 07 06 05 04 03 02 01 00

Lower OP. dest ft reg fs reg EATANxy

1000000 1100 00000 ----- 11101 1111 00

7 bits 4 bits 5 bits 5 bits 11 bits

Mnemonic

EATANxy P, VF[fs]

Operation

P = arctan( VF[fs]y / VF[fs]x )

The following approximation formula is used for calculating arctan.

Calculation is valid for: 0 <= y <= x (excluding x = y = 0)

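\[
\arctan\left(\frac{y}{x}\right) = T_1 \times t + T_2 \times t^{3} + T_3 \times t^{5} + T_4 \times t^{7} + T_5 \times t^{9} + T_6 \times t^{11} + T_7 \times t^{13} + T_8 \times t^{15} + \frac{\pi}{4}
\]

Provided

\[
t = \frac{y - x}{y + x}
\]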

Single precision floating-point expressions Constants Decimal Expressions

S E F

Hex.

expressions