EE Overview

Copyright © 2002 Sony Computer Entertainment Inc.
All Rights Reserved.
SCE Confidential

SCE CONFIDENTIAL

EE Overview Version 6.0

© 2002 Sony Computer Entertainment Inc.
Publication date: April 2002
Sony Computer Entertainment Inc.
1-1, Akasaka 7-chome, Minato-ku
Tokyo 107-0052 Japan
Sony Computer Entertainment America
919 East Hillsdale Blvd.
Foster City, CA 94404, U.S.A.
Sony Computer Entertainment Europe
30 Golden Square
London W1F 9LD, U.K.
The EE Overview is supplied pursuant to and subject to the terms of the Sony Computer Entertainment
PlayStation® license agreements.
The EE Overview is intended for distribution to and use by only Sony Computer Entertainment licensed
Developers and Publishers in accordance with the PlayStation® license agreements.
Unauthorized reproduction, distribution, lending, rental or disclosure to any third party, in whole or in part, of
this book is expressly prohibited by law and by the terms of the Sony Computer Entertainment PlayStation®
license agreements.
Ownership of the physical property of the book is retained by and reserved by Sony Computer Entertainment.
Alteration to or deletion, in whole or in part, of the book, its presentation, or its contents is prohibited.
The information in the EE Overview is subject to change without notice. The content of this book is
Confidential Information of Sony Computer Entertainment.
PlayStation® is a registered trademark, and GRAPHICS SYNTHESIZER™ and
EMOTION ENGINE™ are trademarks of Sony Computer Entertainment Inc. All other trademarks are property
of their respective owners and/or their licensors.

© SCEI
-2-


About This Manual
The "EE Overview" introduces the development concept and main points of the functions and operation of the
Emotion Engine, the CPU of the PlayStation 2.
- Chapter 1 "Architecture Policy" describes the processing and features of the Emotion Engine and Graphics
Synthesizer, which allow the PlayStation 2 to implement high-speed real-time three-dimensional graphics, an
important characteristic of home entertainment software.
- Chapter 2 "Architecture Overview" introduces the functions and operations of the blocks which make up
the Emotion Engine.
- Chapter 3 "Functional Overview" describes the data flow between the blocks of the Emotion Engine and
from the Emotion Engine to the Graphics Synthesizer.
Changes Since Release of 5th Edition
Since release of the 5th Edition of the EE Overview Manual, the following changes have been made.
Note that each of these changes is indicated by a revision bar in the margin of the affected page.
Ch. 2: Architecture Overview
• A correction has been made to the description for Figure 2-11 in section 2.4 "IPU: Image Data
Processor", on page 45.
Ch. 3: Functional Overview
• A correction has been made to section 3.3.1. Data Transfer Route, on page 60.



Glossary

Term               Definition
EE                 Emotion Engine. CPU of the PlayStation 2.
EE Core            Generalized computation and control unit of the EE. Core of the CPU.
COP0               EE Core system control coprocessor.
COP1               EE Core floating-point operation coprocessor. Also referred to as FPU.
COP2               Vector operation unit coupled as a coprocessor of the EE Core. VPU0.
GS                 Graphics Synthesizer. Graphics processor connected to the EE.
GIF                EE interface unit to the GS.
IOP                Processor connected to the EE for controlling input/output devices.
SBUS               Bus connecting the EE to the IOP.
VPU (VPU0/VPU1)    Vector operation unit. The EE contains 2 VPUs: VPU0 and VPU1.
VU (VU0/VU1)       VPU core operation unit.
VIF (VIF0/VIF1)    VPU data decompression unit.
VIFcode            Instruction code for the VIF.
SPR                Quick-access data memory built into the EE Core (Scratchpad memory).
IPU                EE image processor unit.
word               Unit of data length: 32 bits.
qword              Unit of data length: 128 bits.
Slice              Physical unit of DMA transfer: 8 qwords or less.
Packet             Data to be handled as a logical unit for transfer processing.
Transfer list      A group of packets transferred in serial DMA transfer processing.
Tag                Additional data indicating data size and other attributes of packets.
DMAtag             Tag positioned first in a DMA packet to indicate the address/size of data and the
                   address of the following packet.
GS primitive       Data to indicate image elements such as points and triangles.
Context            A set of drawing information (e.g. texture, distant fog color, and dither matrix)
                   applied to two or more primitives uniformly. Also referred to as the drawing
                   environment.
GIFtag             Additional data to indicate attributes of GS primitives.
Display list       A group of GS primitives to indicate batches of images.


Contents
1. Architecture Policy .................................................................................................................................................................. 9
1.1. Main Points of Architecture Policy ............................................................................................................................. 10
1.2. Expansion of Bandwidth .............................................................................................................................................. 12
1.3. Geometry Engines in Parallel....................................................................................................................................... 14
1.4. Data Decompression/Unpack..................................................................................................................................... 16
1.5. Memory Architecture .................................................................................................................................................... 17
2. Architecture Overview.......................................................................................................................................................... 21
2.1. EE Block Configuration ............................................................................................................................................... 22
2.2. EE Core: CPU................................................................................................................................................................ 24
2.2.1. EE Core Features................................................................................................................................................... 24
2.2.2. Memory Map........................................................................................................................................................... 25
2.2.3. Instruction Set Overview ...................................................................................................................................... 26
2.3. VPU: Vector Operation Processor.............................................................................................................................. 35
2.3.1. VPU Architecture................................................................................................................................................... 35
2.3.2. VPU0........................................................................................................................................................................ 38
2.3.3. VPU1........................................................................................................................................................................ 38
2.3.4. VIF: VPU Interface................................................................................................................................................ 39
2.3.5. Operation Mode and Programming Model ........................................................................................................ 39
2.3.6. VPU Instruction Set Overview............................................................................................................................. 40
2.4. IPU: Image Data Processor.......................................................................................................................................... 45
2.5. GIF: GS Interface.......................................................................................................................................................... 46
2.6. SIF: Sub-CPU Interface ................................................................................................................................................ 47
3. Functional Overview............................................................................................................................................................. 49
3.1. Data Transfer via DMA ................................................................................................................................................ 50
3.1.1. Sliced Transfer ........................................................................................................................................................ 50
3.1.2. Chain Mode Transfer............................................................................................................................................. 50
3.1.3. Interleave Transfer ................................................................................................................................................. 54
3.1.4. Stall Control ............................................................................................................................................................ 54
3.1.5. MFIFO .................................................................................................................................................................... 55
3.2. Data Transfer to VPU................................................................................................................................................... 56
3.2.1. VIF Overview ......................................................................................................................................................... 56
3.2.2. VIF Packet............................................................................................................................................................... 56
3.2.3. VIFcode Structure.................................................................................................................................................. 57
3.2.4. Data Transfer by UNPACK.................................................................................................................................. 58
3.2.5. Double Buffering ................................................................................................................................................... 59
3.3. Data Transfer to GS ...................................................................................................................................................... 60
3.3.1. Data Transfer Route .............................................................................................................................................. 60
3.3.2. Data Format............................................................................................................................................................ 60
3.3.3. PACKED Mode..................................................................................................................................................... 62
3.3.4. REGLIST Mode..................................................................................................................................................... 62

3.3.5. IMAGE Mode.........................................................................................................................................................62
3.4. Image Decompression by IPU .....................................................................................................................................63


1. Architecture Policy


1.1. Main Points of Architecture Policy
Cutting-edge Process for Consumers
A characteristic of a home entertainment computer (a consumer video game console) is that its functions and
performance cannot be changed during its life. Changing functions and performance brings profit to neither
the developer nor the user. With this in mind, the PlayStation 2 is designed to have the highest performance
by adopting the latest technology and the most advanced manufacturing technology from the early stages, to
secure a long product life with performance at the point of sale kept unchanged.
Silicon for Emotion
High-quality computer graphics require a huge amount of calculation. In addition, high-quality
entertainment software requires a large amount of calculation, not only for beautiful graphics but also for
logical inference and simulation of physical phenomena. The PlayStation 2 has sufficient resources to
produce this level of computer graphics, along with these additional elements.
Fast Rendering
One of the most advanced manufacturing technologies for improving computer graphics performance is
embedded DRAM, which combines an operation circuit and memory on a single chip. By using embedded
DRAM for the rendering engine, the bandwidth between memory and processor expands dramatically. This
eliminates the pixel fill rate bottleneck that has been a problem with rendering engines up to now, and
markedly improves drawing performance.
Multi Path Geometry
Relative to the improved drawing performance, geometry performance becomes the limiting factor. To
increase performance and distribute the load, the architecture allows geometry engines to run in parallel
and allows two or more processors to share the same rendering engine by timesharing. This is unlike the
previous architecture, in which the rendering engines were placed in parallel.
On-demand Data Decompression
Memory performance lags behind the improved processor performance. To make effective use of
low-capacity, low-speed memory, data is placed in memory in a compressed state and decompressed as
needed. High-resolution textures and modeling data, which consume a large amount of memory, are
normally kept compressed in main memory and decompressed by a dedicated circuit when required.
Stall Control and Memory FIFO
A huge amount of intermediate data (display lists) is continually transferred from the geometry engine to the
rendering engine. To control this data flow without imposing a load on the processor, an MFIFO (Memory
FIFO) mechanism is provided. This allows synchronized data transfers from the geometry engine to
memory and from memory to the rendering engine by using memory as a buffer.
Application-Specific Processors
Video game applications inevitably involve regular processes such as coordinate conversion and image
processing. Besides the processing load itself, context-switching overhead places a heavy load on the CPU.
For these reasons, a number of small-scale sub-processors are dedicated to these regular processes to
offload the CPU.


Intelligent Data Transport
Distributing processing across more sub-processors requires synchronization and arbitration controls. To
keep these controls from loading the CPU, all the instructions (programs) for the sub-processors are sent
along with data by DMA transfer through main memory.
Data Path Buffering
In a UMA (Unified Memory Architecture) system with many sub-processors, competition for bus access
creates a bottleneck. Therefore, a small-capacity buffer memory is embedded in each sub-processor. The
results of processing are temporarily collected there and then collectively DMA-transferred to main memory.
As a result, burst transfer becomes the dominant form of bus access, which improves transfer efficiency.


1.2. Expansion of Bandwidth
Embedded DRAM
Since performance of the rendering engine is determined by access to the frame buffer (pixel fill rate),
performance is maximized by using embedded DRAM in the GS (the frame buffer is embedded in the same
chip as the rendering circuit) and by providing multiple pixel engines to draw several pixels in parallel.

Figure 1-1 Speedup in Rendering Engine by Embedded DRAM
(Diagram: multiple pixel engines (PXE) access the embedded frame buffer over a bus with a total width of
2048 bits; a host interface feeds the pixel engines, and the CRTC reads the frame buffer out as video.)
Complete 128-bit Data Bus
The processor has a 128-bit-wide data bus and 128-bit registers. The CPU's general-purpose registers (GPR)
and floating-point coprocessor registers are 128 bits wide. All the processors are connected via a 128-bit bus.

CPU

Co-Processor
128 bit

128 bit

GPR0
GPR1

VF00
VF01

128 bit

….

….

GPR31

VF31

128 bit

128 bit

Main Bus

128 bit
Memory System

Figure 1-2 128-bit Bus


Parallel 128-bit Integer Operation
A multimedia instruction set is implemented. It uses the 128-bit wide GPRs (integer registers) in parallel by
dividing them into fields of 8 bits x 16, 16 bits x 8, 32 bits x 4, and 64 bits x 2. The following example shows
execution of 16-parallel 8-bit addition.

Figure 1-3 128-bit Parallel Processing by Multimedia Instruction
(Diagram: PADDB (Parallel Add Byte) adds the sixteen 8-bit fields a0-a15 of register rs to the fields
b0-b15 of register rt, producing c0-c15 in rd; SQ (Store Quad Word) then stores the 128-bit result to
main memory.)
Parallel 128-bit Floating Operation
The 128-bit floating-point registers are divided into four 32-bit floating-point fields. Four FMACs (floating-point
multiply-add ALUs) are provided, one per field, to perform operations in parallel. The following
example shows the execution of four parallel 32-bit multiplications.

Figure 1-4 4-Parallel Floating-Point Operation
(Diagram: the x, y, z, and w fields of registers VFa and VFb are multiplied pairwise by four FMACs with a
throughput of 1 cycle, producing the four fields of VFc.)


1.3. Geometry Engines in Parallel
Principle
To improve geometry performance relative to drawing performance, an architecture is implemented with
two geometry engines connected in parallel to one rendering engine. One of the geometry engines consists
of the CPU, with a high degree of flexibility, and a vector operation unit (VPU0) as a coprocessor to perform
complex irregular geometry processing, including physical simulation. The other engine is structured with a
programmable vector operation unit (VPU1) to perform simple, repetitive geometry processing such as
background and distant views.
Bus access for the display lists generated by each geometry engine is arbitrated, and the display lists are
supplied to the rendering engine asynchronously.

Figure 1-5 Parallel Geometry Engines
(Diagram: two geometry engines, each pairing a vector engine, one of them with the CPU, supply the GIF
arbiter, which passes display lists to the GS rendering logic and frame buffer.)
Dual Context
The display lists supplied from the geometry engines have a context that includes status data such as texture
page and drawing mode. To eliminate the need for setting context information again, two contexts are
maintained in the GS, corresponding to the two geometry engines, VPU0 and VPU1. This is the dual
context mechanism.

Figure 1-6 Rendering Engine with Dual Context
(Diagram: display lists arriving through the GIF arbiter carry context information; a selector (SEL) switches
between two graphic contexts, GC0 and GC1, feeding the rendering logic and frame buffer.
GC: Graphic Context)


Data Path
Of the two geometry engines, the higher-priority VPU1 is directly connected to the GS, and the lowerpriority CPU+VPU0 is connected to the GS through the main bus. Because data transfer from the lowerpriority geometry engine might be suspended, generated display lists are buffered temporarily in main
memory. The corresponding DMA channels can monitor each other's transfer address so that the buffer
does not overflow.

Figure 1-7 Typical Data Paths
(Diagram: VPU1 has the 1st-priority path directly to the GS arbiter; the CPU with VPU0 has the
2nd-priority path over the 128-bit main bus, buffering display lists in an MFIFO in main memory.)
Application-Specific Path
The two geometry paths appear to the programmer as two independent paths. That is, it is possible to
divide the graphic processing of the application into two and allocate a portion to each geometry engine.
In general, the high-speed geometry engine (VPU1) takes charge of regular processing such as background and
distant views, and the geometry engine with a high degree of flexibility (CPU+VPU0) takes charge of complex
irregular processing including physical simulation. Simple lighting calculations and perspective
conversions can be executed in VPU1 without the CPU having to participate in them directly.

Figure 1-8 Processing Allocation of Geometry Engines
(Diagram: the EE Core with VPU0 handles non-fixed, emotional, and creative operation; VPU1 handles
fixed, routine operation; both supply the rendering engine.)


1.4. Data Decompression/Unpack
Image Decompression
High-resolution texture data requiring a large amount of memory is stored in main memory in a compressed
state, and is decompressed with a special decompression processor (IPU) when used. The decompressed
texture data is returned to main memory temporarily and transferred to the GS.

Figure 1-9 Image Data Decompression
(Diagram: a compressed image in main memory is decompressed by the IPU and returned to main memory
as texture data; texture and display lists are then transferred through the arbiter to the GS.)
Geometry Data Unpack
Modeling data is packed into an optimal bit width in data units, maintained in main memory, and
automatically unpacked by the VIF when sent to the geometry engine (VPU). As a result, the data size in
main memory is reduced, and the load on the VPU can be reduced.

Figure 1-10 Geometry Data Unpack
(Diagram: X/Y/Z vertex fields packed to 8/16 bits in main memory are expanded by the VIF into 32-bit
fields in VU memory (VUMEM) for the VU.)


1.5. Memory Architecture
Hybrid UMA
To correct the problems with UMA (Unified Memory Architecture), each processor has a high-speed,
small-capacity cache or working memory for its exclusive use, and is connected to the large-capacity shared
memory through this high-speed memory.
By storing data read from or written to memory in 4-qword units, the cache speeds up the second and
subsequent accesses to nearby addresses and decreases the frequency of accesses to main memory.
Main memory is accessed only when:
- the data to be read is not in the cache (cache miss), or
- data written to the cache has not been reflected in memory (dirty) and the cache line must be freed to
access other addresses (cache out).
Data is transferred between the cache and main memory as a burst access of one 4-qword block (cache line)
at a time to improve bus efficiency.

Figure 1-11 Shared Main Memory and Local Cache
(Diagram: the CPU, geometry engine, and rendering engine each have a local cache and reach the shared
main memory over the main bus using burst access.)
CPU Cache
The CPU has an instruction cache (I-Cache) and a data cache (D-Cache). The data cache has the ability to
load a necessary word from a cache line first (sub-block ordering) and to permit a hazard-free cache-line hit
while a previous load is still in process (hit-under-miss). Since this hit-under-miss effect is similar to the
prefetch (PREF) instruction, it is effective when the address to be accessed is known in advance.
Cache         Size    Way     Line Size   Sub-block Ordering   Hit-under-miss
Instruction   16 KB   2-way   4 qwords    No                   No
Data          8 KB    2-way   4 qwords    Yes                  Yes

The output from the cache is also buffered in the Write Back Buffer (WBB). The WBB is a FIFO of 8
qwords. Write requests are stored here, and then written to memory according to the state of the main bus.


Uncached Access
In applications primarily designed for computer graphics, writing display lists to memory is the major
process. The display lists are calculated from the three-dimensional data just read from memory. When
processing a one-way data flow like this, the use of cache may be a disadvantage. Furthermore, in some
cases (e.g. when writing hardware registers and writing data which should be DMA-transferred), it is
preferable that written data be reflected in the main memory immediately.
Therefore, a mode that does not use the cache (uncached mode) is provided. To speed up reads while writing
synchronously, an uncached accelerated mode that uses a special-purpose buffer (UCAB: uncached
accelerated buffer) is also available. The UCAB (8 qwords in size) speeds up continuous data reads from
adjoining addresses.
Figure 1-12 Three Memory Access Modes
(Diagram: cached access passes from the GPRs through the D-Cache and WBB to memory; uncached access
passes through the WBB only; uncached accelerated access additionally reads through the UCAB.)


Scratchpad RAM
A general-purpose high-speed internal memory (Scratchpad RAM: SPR), usable as a working memory for
the CPU, is embedded in addition to the data cache. DMA transfer between main memory and the SPR can
be performed in parallel with SPR access from the CPU. Main memory access overhead can be hidden from
the program by using the SPR as a double buffer.
Figure 1-13 Double Buffering with SPR
(Diagram a, architecture: the pseudo-dual-port SPR is split into buffers #1 and #2, serving CPU calculation
and DMA transfer to external memory simultaneously. Diagram b, scheduling: calculation on one buffer
overlaps the reads and writes of the other, alternating each cycle.)
List Processor DMA
Display lists are not always located in consecutive areas of memory. In most cases they are arranged
discontinuously, using a linked-list structure. To eliminate the need to gather non-contiguous data before
transferring it between processors, the DMAC can trace data lists according to the tag information
(DMAtag) embedded in the data. This frees the CPU from simple memory copying and increases cache
efficiency.
Figure 1-14 List Processing with DMAC
(Diagram: the DMA start address points to a tag chain, CONT with matrix data, REF tags referencing
texture and vertex data, and NEXT pointing to the next object, which the DMAC traces through scattered
areas of memory.)



2. Architecture Overview


2.1. EE Block Configuration
The block diagram and main specifications of the EE are shown below.
Figure 2-1 EE Block Diagram
(Diagram: the EE Core, a CPU with two 64-bit integer units (IU0/IU1), the FPU as COP1, a 16 KB
I-cache, an 8 KB D-cache, 16 KB of scratchpad RAM, and VPU0 (VU0, micro MEM, VIF) as COP2,
connects over 128-bit buses to VPU1 (VU1, VU MEM, micro MEM, VIF), the IPU, the 10-channel DMA
controller, the GIF leading to the GS, the memory interface leading to external memory, and the I/O
interface leading to peripherals; an interrupt controller, SIO, and timer are also on chip.)

Main Specifications

Block               Contents
CPU Core            2-way superscalar
                    Data bus: 128 bits (64 bits x 2)
                    Internal bus: 128 bits
                    Internal registers: 128 bits x 32
  CACHE             I-Cache: 16 KB, 2-way set associative
                    D-Cache: 8 KB, 2-way set associative with line lock
                    Scratchpad RAM (SPR): 16 KB
  MMU               48-double-entry TLB
                    32-bit physical/logical address space conversion
  Instruction set   64 bits, conforms to MIPS III (partly to MIPS IV)
                    128-bit parallel multimedia instruction set
                    3-operand multiply/multiply-add calculation instruction
                    Interrupt enable/disable instruction
Coprocessors
  FPU               32-bit single-precision floating-point multiply-add arithmetic logical unit
                    32-bit single-precision floating-point divide calculator
  VPU0              32-bit single-precision floating-point multiply-add arithmetic logical unit x 4
                    32-bit single-precision floating-point divide calculator x 1
                    Data unpacking function (VIF)
                    Programmable LIW DSP
                    Internal bus (data): 128 bits
Coordinate engine
  VPU1              32-bit single-precision floating-point multiply-add arithmetic logical unit x 5
                    32-bit single-precision floating-point divide calculator x 2
                    Data unpacking function (VIF)
                    Programmable LIW DSP
                    Internal bus (data): 128 bits
Image engine
  IPU               MPEG2 video layer decoding/bit stream decoding/IDCT/CSC
                    (Color Space Conversion)/Dither/VQ (Vector Quantization)
Built-in devices
  DMAC              10ch (transfer between memory and I/O, memory and SPR)
  DRAMC             RDRAM controller
  INTC              2 types: INT0 (for interrupt from each device)/INT1 (for interrupt from DMAC)
  TIMER             16 bits x 4
  GIF               256-byte FIFO embedded
                    Data formatting function
                    Arbitration (PATH1, 2 and 3)
  SIF               32-bit (address/data multiplex), 128-byte FIFO embedded
Main bus            128 bits


2.2. EE Core: CPU
2.2.1. EE Core Features
The EE Core is a processor that implements the superscalar 64-bit MIPS IV instruction set architecture. In
particular, 128-bit parallel processing for multimedia applications has been greatly expanded.
The EE Core is composed of the CPU, a floating-point execution unit (Coprocessor 1), an instruction cache, a
data cache, scratchpad RAM, and a tightly coupled vector operation unit (Coprocessor 2).
The CPU has two pipelines and can decode two instructions in each cycle. Instructions are executed and
completed in order. However, since data cache misses are non-blocking and a single cache miss does not stall
the pipelines, a load miss or non-cached load completion may occur out of order. Completion of Multiply,
Multiply-Add, Divide, Prefetch, and Coprocessor instructions may also occur out of order. These features are
summarized as follows:
• 2-way superscalar pipelines
• 128-bit (64 bits x 2) data path and 128-bit system bus
• Instruction set
- 64-bit instruction set conforming to MIPS III and partly conforming to MIPS IV (Prefetch instruction
and conditional move instructions)
- Non-blocking load instructions
- Three-operand Multiply and Multiply-Add instructions
- 128-bit multimedia instructions (Parallel processing of 64 bits x 2, 32 bits x 4, 16 bits x 8, or 8 bits x 16)
• On-chip caches and scratchpad RAM
- Instruction cache: 16 KB, 2-way set associative
- Data cache: 8 KB, 2-way set associative (with a write back protocol)
- Data scratchpad RAM: 16 KB
- Data cache line lock function
- Prefetch function
• MMU
- 48-double-entry full-set-associative address translation look-aside buffer (TLB)


2.2.2. Memory Map
Figure 2-2 EE Core Memory Map
(Diagram: the kernel-mode virtual address space maps KSEG0 (from 8000_0000) and KSEG1 (from
a000_0000) onto physical memory, with the region from c000_0000 not mounted. The physical memory map
contains Main Memory (Max. 256MB) from 0000_0000, EE Registers from 1000_0000, GS Registers from
1200_0000, a reserved area from 1400_0000, Boot ROM (Max. 4MB) from 1fc0_0000, an unmounted region
from 2000_0000, and Extended Main Memory (Max. 1GB) from 8000_0000.)


2.2.3. Instruction Set Overview
The EE Core has an instruction set consisting of the MIPS III instruction set, part of the MIPS IV instruction
set, 128-bit multimedia instructions, three-operand multiply instructions, I1 pipe operation instructions, and
others. The EE Core instructions are listed below.
Integer Add/Subtract
Instruction   Function                                              Level
ADD           Add Word                                              MIPS I
ADDI          Add Immediate Word                                    MIPS I
ADDIU         Add Immediate Unsigned Word                           MIPS I
ADDU          Add Unsigned Word                                     MIPS I
DADD          Doubleword Add                                        MIPS III
DADDI         Doubleword Add Immediate                              MIPS III
DADDIU        Doubleword Add Immediate Unsigned                     MIPS III
DADDU         Doubleword Add Unsigned                               MIPS III
DSUB          Doubleword Subtract                                   MIPS III
DSUBU         Doubleword Subtract Unsigned                          MIPS III
SUB           Subtract Word                                         MIPS I
SUBU          Subtract Unsigned Word                                MIPS I
PADDB         Parallel Add Byte                                     128-bit MMI
PADDH         Parallel Add Halfword                                 128-bit MMI
PADDSB        Parallel Add with Signed Saturation Byte              128-bit MMI
PADDSH        Parallel Add with Signed Saturation Halfword          128-bit MMI
PADDSW        Parallel Add with Signed Saturation Word              128-bit MMI
PADDUB        Parallel Add with Unsigned Saturation Byte            128-bit MMI
PADDUH        Parallel Add with Unsigned Saturation Halfword        128-bit MMI
PADDUW        Parallel Add with Unsigned Saturation Word            128-bit MMI
PADDW         Parallel Add Word                                     128-bit MMI
PADSBH        Parallel Add/Subtract Halfword                        128-bit MMI
PSUBB         Parallel Subtract Byte                                128-bit MMI
PSUBH         Parallel Subtract Halfword                            128-bit MMI
PSUBSB        Parallel Subtract with Signed Saturation Byte         128-bit MMI
PSUBSH        Parallel Subtract with Signed Saturation Halfword     128-bit MMI
PSUBSW        Parallel Subtract with Signed Saturation Word         128-bit MMI
PSUBUB        Parallel Subtract with Unsigned Saturation Byte       128-bit MMI
PSUBUH        Parallel Subtract with Unsigned Saturation Halfword   128-bit MMI
PSUBUW        Parallel Subtract with Unsigned Saturation Word       128-bit MMI
PSUBW         Parallel Subtract Word                                128-bit MMI

Integer Multiply/Divide
Instruction  Function                            Level
DIV          Divide Word                         MIPS I
DIV1         Divide Word Pipeline 1              EE Core
DIVU         Divide Unsigned Word                MIPS I
DIVU1        Divide Unsigned Word Pipeline 1     EE Core
MULT         Multiply Word                       MIPS I
MULTU        Multiply Unsigned Word              MIPS I
MULT1        Multiply Word Pipeline 1            EE Core
MULTU1       Multiply Unsigned Word Pipeline 1   EE Core
PDIVBW       Parallel Divide Broadcast Word      128-bit MMI
PDIVUW       Parallel Divide Unsigned Word       128-bit MMI
PDIVW        Parallel Divide Word                128-bit MMI
PMULTH       Parallel Multiply Halfword          128-bit MMI
PMULTUW      Parallel Multiply Unsigned Word     128-bit MMI
PMULTW       Parallel Multiply Word              128-bit MMI

Integer Multiply-Add
Instruction  Function                                        Level
MADD         Multiply-Add Word                               EE Core
MADD1        Multiply-Add Word Pipeline 1                    EE Core
MADDU        Multiply-Add Unsigned Word                      EE Core
MADDU1       Multiply-Add Unsigned Word Pipeline 1           EE Core
PHMADH       Parallel Horizontal Multiply-Add Halfword       128-bit MMI
PHMSBH       Parallel Horizontal Multiply-Subtract Halfword  128-bit MMI
PMADDH       Parallel Multiply-Add Halfword                  128-bit MMI
PMADDUW      Parallel Multiply-Add Unsigned Word             128-bit MMI
PMADDW       Parallel Multiply-Add Word                      128-bit MMI
PMSUBH       Parallel Multiply-Subtract Halfword             128-bit MMI
PMSUBW       Parallel Multiply-Subtract Word                 128-bit MMI

Floating-Point
Instruction  Function                                               Level
ADD.S        Floating Point Add                                     MIPS I
ADDA.S       Floating Point Add to Accumulator                      EE Core
MADD.S       Floating Point Multiply-Add                            MIPS I
MADDA.S      Floating Point Multiply and Add to Accumulator         EE Core
MUL.S        Floating Point Multiply                                MIPS I
MULA.S       Floating Point Multiply to Accumulator                 EE Core
MSUB.S       Floating Point Multiply and Subtract                   MIPS I
MSUBA.S      Floating Point Multiply and Subtract from Accumulator  EE Core
SUB.S        Floating Point Subtract                                MIPS I
SUBA.S       Floating Point Subtract to Accumulator                 EE Core


Shift
Instruction  Function                                       Level
DSRA         Doubleword Shift Right Arithmetic              MIPS III
DSLL         Doubleword Shift Left Logical                  MIPS III
DSLL32       Doubleword Shift Left Logical Plus 32          MIPS III
DSLLV        Doubleword Shift Left Logical Variable         MIPS III
DSRA32       Doubleword Shift Right Arithmetic Plus 32      MIPS III
DSRAV        Doubleword Shift Right Arithmetic Variable     MIPS III
DSRL         Doubleword Shift Right Logical                 MIPS III
DSRL32       Doubleword Shift Right Logical Plus 32         MIPS III
DSRLV        Doubleword Shift Right Logical Variable        MIPS III
SLL          Shift Word Left Logical                        MIPS I
SLLV         Shift Word Left Logical Variable               MIPS I
SRA          Shift Word Right Arithmetic                    MIPS I
SRAV         Shift Word Right Arithmetic Variable           MIPS I
SRL          Shift Word Right Logical                       MIPS I
SRLV         Shift Word Right Logical Variable              MIPS I
PSLLH        Parallel Shift Left Logical Halfword           128-bit MMI
PSLLVW       Parallel Shift Left Logical Variable Word      128-bit MMI
PSLLW        Parallel Shift Left Logical Word               128-bit MMI
PSRAH        Parallel Shift Right Arithmetic Halfword       128-bit MMI
PSRAVW       Parallel Shift Right Arithmetic Variable Word  128-bit MMI
PSRAW        Parallel Shift Right Arithmetic Word           128-bit MMI
PSRLH        Parallel Shift Right Logical Halfword          128-bit MMI
PSRLVW       Parallel Shift Right Logical Variable Word     128-bit MMI
PSRLW        Parallel Shift Right Logical Word              128-bit MMI
QFSRV        Quadword Funnel Shift Right Variable           128-bit MMI

Logical
Instruction  Function                Level
AND          And                     MIPS I
ANDI         And Immediate           MIPS I
NOR          Not Or                  MIPS I
OR           Or                      MIPS I
ORI          Or Immediate            MIPS I
XOR          Exclusive OR            MIPS I
XORI         Exclusive OR Immediate  MIPS I
PAND         Parallel And            128-bit MMI
PNOR         Parallel Not Or         128-bit MMI
POR          Parallel Or             128-bit MMI
PXOR         Parallel Exclusive OR   128-bit MMI


Compare
Instruction  Function                                    Level
SLTI         Set on Less Than Immediate                  MIPS I
SLTIU        Set on Less Than Immediate Unsigned         MIPS I
SLTU         Set on Less Than Unsigned                   MIPS I
PCEQB        Parallel Compare for Equal Byte             128-bit MMI
PCEQH        Parallel Compare for Equal Halfword         128-bit MMI
PCEQW        Parallel Compare for Equal Word             128-bit MMI
PCGTB        Parallel Compare for Greater Than Byte      128-bit MMI
PCGTH        Parallel Compare for Greater Than Halfword  128-bit MMI
PCGTW        Parallel Compare for Greater Than Word      128-bit MMI
C.EQ.S       Floating Point Compare (Equal)              MIPS I
C.F.S        Floating Point Compare (False)              MIPS I
C.LE.S       Floating Point Compare (Less than or Equal) MIPS I
C.LT.S       Floating Point Compare (Less than)          MIPS I

Min/Max
Instruction  Function                    Level
PMAXH        Parallel Maximize Halfword  128-bit MMI
PMAXW        Parallel Maximize Word      128-bit MMI
PMINH        Parallel Minimize Halfword  128-bit MMI
PMINW        Parallel Minimize Word      128-bit MMI
MAX.S        Floating Point Maximum      EE Core
MIN.S        Floating Point Minimum      EE Core

Data Format Conversion
Instruction  Function                                      Level
PEXT5        Parallel Extend Upper from 5 bits             128-bit MMI
PPAC5        Parallel Pack to 5 bits                       128-bit MMI
CVT.S.W      Fixed-point Convert to Single Floating Point  MIPS I
CVT.W.S      Floating-point Convert to Word Fixed-point    MIPS I


Reordering
Instruction  Function                             Level
PCPYH        Parallel Copy Halfword               128-bit MMI
PCPYLD       Parallel Copy Lower Doubleword       128-bit MMI
PCPYUD       Parallel Copy Upper Doubleword       128-bit MMI
PEXCH        Parallel Exchange Center Halfword    128-bit MMI
PEXCW        Parallel Exchange Center Word        128-bit MMI
PEXEH        Parallel Exchange Even Halfword      128-bit MMI
PEXEW        Parallel Exchange Even Word          128-bit MMI
PEXTLB       Parallel Extend Lower from Byte      128-bit MMI
PEXTLH       Parallel Extend Lower from Halfword  128-bit MMI
PEXTLW       Parallel Extend Lower from Word      128-bit MMI
PEXTUB       Parallel Extend Upper from Byte      128-bit MMI
PEXTUH       Parallel Extend Upper from Halfword  128-bit MMI
PEXTUW       Parallel Extend Upper from Word      128-bit MMI
PINTEH       Parallel Interleave Even Halfword    128-bit MMI
PINTH        Parallel Interleave Halfword         128-bit MMI
PPACB        Parallel Pack to Byte                128-bit MMI
PPACH        Parallel Pack to Halfword            128-bit MMI
PPACW        Parallel Pack to Word                128-bit MMI
PREVH        Parallel Reverse Halfword            128-bit MMI
PROT3W       Parallel Rotate 3 Words              128-bit MMI

Others
Instruction  Function                                 Level
PABSH        Parallel Absolute Halfword               128-bit MMI
PABSW        Parallel Absolute Word                   128-bit MMI
PLZCW        Parallel Leading Zero or One Count Word  128-bit MMI
ABS.S        Floating Point Absolute Value            MIPS I
NEG.S        Floating Point Negate                    MIPS I
RSQRT.S      Floating Point Reciprocal Square Root    MIPS IV
SQRT.S       Floating Point Square Root               MIPS II
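As a rough C model of what the PLZCW entry above computes per 32-bit word (the exact result encoding should be checked against the instruction set manual; this sketch counts the bits below the sign bit that match it):

```c
#include <stdint.h>

/* Illustrative model of a "leading zero or one count" per 32-bit word:
 * count how many bits, starting from bit 30 downward, equal the sign
 * bit (bit 31). Positive values count leading zeros, negative values
 * count leading ones. */
static int lzc_word(uint32_t w) {
    uint32_t sign = (w >> 31) & 1u;
    int n = 0;
    for (int i = 30; i >= 0; i--) {
        if (((w >> i) & 1u) != sign)
            break;
        n++;
    }
    return n;
}
```

The hardware instruction performs this count on two words of a register at once; the loop is only a scalar reference.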


Register-Register Transfer
Instruction  Function                           Level
MFHI         Move from HI Register              MIPS I
MFLO         Move from LO Register              MIPS I
MOVN         Move Conditional on Not Zero       MIPS IV
MOVZ         Move Conditional on Zero           MIPS IV
MTHI         Move to HI Register                MIPS I
MTLO         Move to LO Register                MIPS I
MFHI1        Move from HI1 Register             EE Core
MFLO1        Move from LO1 Register             EE Core
MTHI1        Move to HI1 Register               EE Core
MTLO1        Move to LO1 Register               EE Core
PMFHI        Parallel Move from HI Register     128-bit MMI
PMFHL        Parallel Move from HI/LO Register  128-bit MMI
PMFLO        Parallel Move from LO Register     128-bit MMI
PMTHI        Parallel Move to HI Register       128-bit MMI
PMTHL        Parallel Move to HI/LO Register    128-bit MMI
PMTLO        Parallel Move to LO Register       128-bit MMI
MFC1         Move Word from Floating Point      MIPS I
MOV.S        Floating Point Move                MIPS I
MTC1         Move Word to Floating Point        MIPS I

Load from Memory
Instruction  Function                      Level
LB           Load Byte                     MIPS I
LBU          Load Byte Unsigned            MIPS I
LD           Load Doubleword               MIPS III
LDL          Load Doubleword Left          MIPS III
LDR          Load Doubleword Right         MIPS III
LH           Load Halfword                 MIPS I
LHU          Load Halfword Unsigned        MIPS I
LUI          Load Upper Immediate          MIPS I
LW           Load Word                     MIPS I
LWL          Load Word Left                MIPS I
LWR          Load Word Right               MIPS I
LWU          Load Word Unsigned            MIPS I
LQ           Load Quadword                 128-bit MMI
LWC1         Load Word to Floating Point   MIPS I


Store in Memory
Instruction  Function                        Level
SB           Store Byte                      MIPS I
SD           Store Doubleword                MIPS III
SDL          Store Doubleword Left           MIPS III
SDR          Store Doubleword Right          MIPS III
SH           Store Halfword                  MIPS I
SW           Store Word                      MIPS I
SWL          Store Word Left                 MIPS I
SWR          Store Word Right                MIPS I
SQ           Store Quadword                  128-bit MMI
SWC1         Store Word from Floating Point  MIPS I

Special Data Transfer
Instruction  Function                                                Level
MFSA         Move from Shift Amount Register                         EE Core
MTSA         Move to Shift Amount Register                           EE Core
MTSAB        Move Byte Count to Shift Amount Register                EE Core
MTSAH        Move Halfword Count to Shift Amount Register            EE Core
MFBPC        Move from Breakpoint Control Register                   MIPS I
MFC0         Move from System Control Coprocessor                    MIPS I
MFDAB        Move from Data Address Breakpoint Register              MIPS I
MFDABM       Move from Data Address Breakpoint Mask Register         MIPS I
MFDVB        Move from Data Value Breakpoint Register                MIPS I
MFDVBM       Move from Data Value Breakpoint Mask Register           MIPS I
MFIAB        Move from Instruction Address Breakpoint Register       MIPS I
MFIABM       Move from Instruction Address Breakpoint Mask Register  MIPS I
MFPC         Move from Performance Counter                           MIPS I
MFPS         Move from Performance Event Specifier                   MIPS I
MTBPC        Move to Breakpoint Control Register                     MIPS I
MTC0         Move to System Control Coprocessor                      MIPS I
MTDAB        Move to Data Address Breakpoint Register                MIPS I
MTDABM       Move to Data Address Breakpoint Mask Register           MIPS I
MTDVB        Move to Data Value Breakpoint Register                  MIPS I
MTDVBM       Move to Data Value Breakpoint Mask Register             MIPS I
MTIAB        Move to Instruction Address Breakpoint Register         MIPS I
MTIABM       Move to Instruction Address Breakpoint Mask Register    MIPS I
MTPC         Move to Performance Counter                             MIPS I
MTPS         Move to Performance Event Specifier                     MIPS I
CFC1         Move Control Word from Floating Point                   MIPS I
CTC1         Move Control Word to Floating Point                     MIPS I


Conditional Branch and Jump
Instruction  Function                                        Level
BEQ          Branch on Equal                                 MIPS I
BEQL         Branch on Equal Likely                          MIPS II
BGEZ         Branch on Greater Than or Equal to Zero         MIPS I
BGEZL        Branch on Greater Than or Equal to Zero Likely  MIPS II
BGTZ         Branch on Greater Than Zero                     MIPS I
BGTZL        Branch on Greater Than Zero Likely              MIPS II
BLEZ         Branch on Less Than or Equal to Zero            MIPS I
BLEZL        Branch on Less Than or Equal to Zero Likely     MIPS II
BLTZ         Branch on Less Than Zero                        MIPS I
BLTZL        Branch on Less Than Zero Likely                 MIPS II
BNE          Branch on Not Equal                             MIPS I
BNEL         Branch on Not Equal Likely                      MIPS II
BC0F         Branch on Coprocessor 0 False                   MIPS I
BC0FL        Branch on Coprocessor 0 False Likely            MIPS I
BC0T         Branch on Coprocessor 0 True                    MIPS I
BC0TL        Branch on Coprocessor 0 True Likely             MIPS I
BC1F         Branch on FP False                              MIPS I
BC1FL        Branch on FP False Likely                       MIPS II
BC1T         Branch on FP True                               MIPS I
BC1TL        Branch on FP True Likely                        MIPS II
BC2F         Branch on Coprocessor 2 False                   MIPS I
BC2FL        Branch on Coprocessor 2 False Likely            MIPS I
BC2T         Branch on Coprocessor 2 True                    MIPS I
BC2TL        Branch on Coprocessor 2 True Likely             MIPS I
J            Jump                                            MIPS I
JR           Jump Register                                   MIPS I

Subroutine Call
Instruction  Function                                                  Level
BGEZAL       Branch on Greater Than or Equal to Zero and Link          MIPS I
BGEZALL      Branch on Greater Than or Equal to Zero and Link Likely   MIPS II
BLTZAL       Branch on Less Than Zero and Link                         MIPS I
BLTZALL      Branch on Less Than Zero and Link Likely                  MIPS II
JAL          Jump and Link                                             MIPS I
JALR         Jump and Link Register                                    MIPS I


Break and Trap
Instruction  Function                                     Level
BREAK        Breakpoint                                   MIPS I
SYSCALL      System Call                                  MIPS I
TEQ          Trap if Equal                                MIPS II
TEQI         Trap if Equal Immediate                      MIPS II
TGE          Trap if Greater or Equal                     MIPS II
TGEI         Trap if Greater or Equal Immediate           MIPS II
TGEIU        Trap if Greater or Equal Immediate Unsigned  MIPS II
TGEU         Trap if Greater or Equal Unsigned            MIPS II
TLT          Trap if Less Than                            MIPS II
TLTI         Trap if Less Than Immediate                  MIPS II
TLTIU        Trap if Less Than Immediate Unsigned         MIPS II
TLTU         Trap if Less Than Unsigned                   MIPS II
TNE          Trap if Not Equal                            MIPS II
TNEI         Trap if Not Equal Immediate                  MIPS II
ERET         Exception Return                             MIPS III

Others
Instruction  Function                   Level
SYNC.stype   Synchronize Shared Memory  MIPS II
PREF         Prefetch                   MIPS IV
DI           Disable Interrupt          MIPS I
EI           Enable Interrupt           MIPS I


2.3. VPU: Vector Operation Processor
The EE has two on-chip vector operation processors with the same architecture, VPU0 and VPU1, for the floating-point vector operations indispensable to geometry processing.
VPU0 is connected to the EE Core via a 128-bit coprocessor bus. The operation resources and registers of VPU0 can be used directly from the EE Core with coprocessor instructions, without going through the main bus.
VPU1 is directly connected to the rendering engine, the GS, via the GIF (Graphics Synthesizer Interface Unit), so display lists generated in VPU1 are transferred to the GS without passing through the main bus.
VPU0 and VPU1 each have a packet expansion engine called the VIF (VPU Interface Unit) at the front end, named VIF0 and VIF1 respectively.

[Figure: VPU-related block diagram. VPU0 and VPU1 each contain FMAC and FDIV ALUs, a Micro Mem (micro Mem0/micro Mem1), a VU Mem (VU Mem0/VU Mem1), and a VIF front end (VIF0/VIF1). VPU0 connects to the EE Core; VPU1 connects to the GIF over a 128-bit path; both connect to main memory.]
Figure 2-3 VPU-Related Block Diagram

2.3.1. VPU Architecture
The two VPUs basically have the same architecture, consisting of the VU, the VU Mem (data memory for the VU), and the VIF (compressed-data decompression engine). The VU is a processor unit consisting of several FMACs (Floating-point Multiply-Add ALUs), an FDIV (Floating-point Divide Calculator), 32 four-parallel floating-point registers, 16 integer registers, and a Micro Mem (program memory). It loads data from the VU Mem in 128-bit units (single-precision floating-point x 4), performs operations according to microprograms placed in the Micro Mem, and stores the results in the VU Mem.
Microprograms use a 64-bit-long LIW (Long Instruction Word) instruction set, and can concurrently execute floating-point multiply-add operations in the upper 32-bit field (Upper instruction field) and floating-point divide or integer operations in the lower 32-bit field (Lower instruction field).


[Figure: VU (Vector Unit) block diagram. The Micro Mem (4 KB or 16 KB) feeds the micro instruction fetch unit; bits 63-32 of each 64-bit word form the Upper instruction, bits 31-0 the Lower instruction. The Upper execution unit contains FMACx, FMACy, FMACz, and FMACw; the Lower execution unit contains the FDIV, LSU, IALU, BRU, RANDU, and EFU. Registers comprise 32 floating-point registers VF00-VF31 (the COP2 data registers), integer registers VI00-VI15, special registers VI16-VI31, and COP2 control registers, accessed from the EE Core via QMTC2/LQC2, QMFC2/SQC2, CTC2, and CFC2. The VU Mem (4 KB or 16 KB) and the VIF connect the VU to external units; bold lines in the original figure are 128-bit paths.]
Figure 2-4 VU Block Diagram
Following are brief descriptions of the VPU units.
FMAC
This unit handles add/subtract, multiply, and multiply-add of the floating-point numbers. FMACx, FMACy,
FMACz, and FMACw are mounted to execute four-element vector operations efficiently. The latency of
instructions which use the FMAC has been unified at four cycles to increase the efficiency of pipeline
processing.
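The work done across the four FMACs by a single Upper-pipe multiply-add can be sketched in C as follows (the type and function names are illustrative only):

```c
/* Illustrative model of a 4-element multiply-add: per field,
 * acc + a * b, which is what the four FMACs compute in parallel
 * for one Upper instruction. */
typedef struct { float x, y, z, w; } vec4;

static vec4 vmadd(vec4 acc, vec4 a, vec4 b) {
    vec4 r = { acc.x + a.x * b.x, acc.y + a.y * b.y,
               acc.z + a.z * b.z, acc.w + a.w * b.w };
    return r;
}
```

On the VU, all four field computations issue together and complete with the unified four-cycle latency described above.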
FDIV
This unit performs self-synchronous floating-point divide/square-root operations. FDIV operations differ from other operations in latency, so their results are stored in the dedicated Q register.
LSU
This unit controls loading and storing to and from VU Mem.
Load/Store must be performed in units of 128 bits, but can be masked in units of x, y, z and w fields.
IALU
This unit performs 16-bit integer operations.
Loop counter operations and load/store address calculations are performed in conjunction with the integer
register.
BRU
This unit controls jump and conditional branch.
RANDU
This unit generates random numbers. Random numbers are generated by the M sequence and stored in the
R register.

EFU
This is an elementary function unit, which executes operations such as exponential and trigonometric
functions. This unit is mounted only on VU1. Operation results are stored in the P register.
Floating-Point Registers
32 128-bit floating-point registers (VF00 - VF31) are mounted. Each register can be divided into 4 fields of
x, y, z, and w, and is equivalent to a vector of four single-precision floating-point numbers. VF00 is a
constant register.
Integer Registers
Sixteen 16-bit integer registers (VI00 - VI15) are mounted. These registers are used as loop counters, and
used for load/store address calculations. VI00 is a constant register.
VU Mem
This is data memory for the VU's exclusive use. Memory capacity is 4 Kbytes for VU0 and 16 Kbytes for
VU1. This memory is connected to the LSU at a width of 128 bits, and addresses are aligned on qword
boundaries.
[Figure: VU Mem memory map. Each 128-bit entry holds w, z, y, and x fields; addresses run from 0x0000 in steps of 0x10 up to 0x0ff0, and on VU1 only continue up to 0x3ff0.]
Figure 2-5 VU Mem Memory Map
Furthermore, VU1 registers are mapped to addresses 0x4000 to 0x43ff in VU0.
Micro Mem
This is on-chip memory, which stores microinstruction programs. Memory capacity is 4 Kbytes in VU0 and
16 Kbytes in VU1.
[Figure: Micro Mem memory map. Each 64-bit entry holds an Upper and a Lower instruction; addresses run from 0x0000 in steps of 0x8 up to 0x0ff8, and on VU1 only continue up to 0x3ff8.]
Figure 2-6 Micro Mem Memory Map

2.3.2. VPU0
VPU0 has a macro mode, in which it operates according to coprocessor instructions from the EE Core, and a micro mode, in which it operates independently according to microprograms stored in the Micro Mem. Almost all the instructions used in micro mode are also defined as coprocessor instructions, and are executable directly from the EE Core. Similarly, VPU0 registers can be referenced directly from the EE Core with coprocessor transfer instructions.
Being tightly coupled with the EE Core as mentioned above, VPU0 takes charge of relatively small-scale processing.

[Figure: VPU0 block diagram. The EE Core accesses VPU0 through a 128-bit COP2 interface. VPU0 contains the Upper execution unit (FMAC), the Lower execution unit (FDIV), 128-bit floating-point registers, integer registers, a 4 KB Micro Mem, a 4 KB VU Mem, and the VIF, which connects to the main bus.]
Figure 2-7 VPU0 Block Diagram

2.3.3. VPU1
VPU1 operates only in micro mode. VPU1 has a larger Micro Mem and VU Mem than VPU0, and is equipped with an EFU. It is also directly connected to the GIF, and has additional synchronization control instructions, such as transfer to the GIF. Furthermore, it can structure double buffers in VU Mem and has additional functions to perform data transfer and operations in parallel.
As mentioned above, VPU1 operates autonomously as a geometry engine, independently of the EE Core. High-speed processing is possible with VPU1, but because of limits on the complexity of what it can process, it takes charge of the standard portion of three-dimensional graphics processing.
VPU1 operation results are transferred from VU Mem1 to the GS via the GIF, with the highest priority.


[Figure: VPU1 block diagram. VPU1 contains the Upper execution unit (FMAC), the Lower execution unit (FDIV), 128-bit floating-point registers, integer registers, a 16 KB Micro Mem, and a 16 KB VU Mem. VU Mem connects to the GIF via PATH1; the VIF connects to the GIF via PATH2 and to the main bus, from which PATH3 also reaches the GIF. The GIF feeds the rendering engine over a 64-bit connection.]
Figure 2-8 VPU1 Block Diagram

2.3.4. VIF: VPU Interface
The VIF functions as a preprocessor for the VPU. The VIF unpacks packed vertex data based on the specification in the tag (VIFtag) at the start of the data, and transfers it to the data memory (VU Mem) of the VPU. In addition to reducing the data size in main memory, this removes the load of data formatting from the VPU, which has a limited degree of programming freedom.
The VIF also stores microprograms in Micro Mem and transfers DIRECT data to the GIF according to the VIFtag specification.
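A simplified C analogue of this kind of unpacking is shown below: packed 16-bit signed components are expanded to 32 bits before reaching VU Mem. The actual unpack formats and their VIFtag encodings are defined elsewhere in the hardware manuals; this sketch only illustrates the idea.

```c
#include <stdint.h>

/* Illustrative VIF-style unpack: sign-extend packed 16-bit vertex
 * components into 32-bit values, halving the footprint of the packed
 * data in main memory relative to the expanded form. */
static void unpack_s16(const int16_t *src, int32_t *dst, int n) {
    for (int i = 0; i < n; i++)
        dst[i] = (int32_t)src[i];   /* sign-extend 16 -> 32 bits */
}
```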

2.3.5. Operation Mode and Programming Model
The VU has two execution modes, micro mode and macro mode. In micro mode, the VU functions as a stand-alone processor and executes microprograms stored in Micro Mem. VU1 operates in this mode. In macro mode, the VU executes macroinstructions as COP2 (Coprocessor 2) of the EE Core. VU0 operates primarily in this mode.
Microinstructions are LIW (Long Instruction Word) instructions of 32 bits x 2, and can concurrently execute an
Upper instruction, which uses the upper 32 bits of the instruction word, and a Lower instruction, which uses the
lower 32 bits of the instruction word. The Upper instruction controls the FMAC, and the Lower instruction
controls operations which use the FDIV/EFU/LSU/BRU and integer registers. In the Upper instruction, 4
FMACs are operable concurrently with 1 instruction, and a four-dimensional vector calculation can be made in 1
cycle (throughput).
[Figure: example of concurrent execution. The Upper instruction MUL VF01,VF02,VF03 drives the four FMACs of the Upper execution unit, while the Lower instruction SQ VF04, VI01 simultaneously drives the LSU of the Lower execution unit to store to VU Mem.]
Figure 2-9 Upper Instruction and Lower Instruction


Operation                                                           Latency  Throughput
4-parallel floating-point multiply + 4-parallel floating-point add  4        1
Floating-point divide                                               7        7
4 x 4 matrix * 4-row vector                                         8        4
4 x 4 matrix * 4 x 4 matrix                                         20       16
1 vertex processing (matrix * vector + divide)                      19       8
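For reference, the "4 x 4 matrix * 4-row vector" operation timed above corresponds to the following scalar C code (type names are illustrative); the VU achieves a throughput of 4 cycles for the same work by issuing four-wide multiply-adds:

```c
/* Scalar reference for a 4x4 matrix times 4-row vector product. */
typedef struct { float m[4][4]; } mat4;
typedef struct { float v[4]; } vec4f;

static vec4f mat_vec(const mat4 *m, vec4f x) {
    vec4f r = {{0, 0, 0, 0}};
    for (int i = 0; i < 4; i++)
        for (int j = 0; j < 4; j++)
            r.v[i] += m->m[i][j] * x.v[j];   /* row-by-vector dot product */
    return r;
}
```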

Some microinstructions do not have macroinstruction equivalents, and macro mode cannot execute an Upper instruction and a Lower instruction at the same time. However, macro mode provides the VCALLMS instruction, which executes a microinstruction program in Micro Mem like a subroutine, and the COP2 data transfer instructions, which transfer data to and from the VU registers.

Item                          Micro Mode (VU1)                        Macro Mode (VU0)
Operation                     Operates as a stand-alone processor     Operates as a coprocessor of the EE Core
Operation code                64-bit-long LIW instruction             32-bit MIPS COP2 instruction
Instruction set               Upper instruction + Lower instruction   Upper instruction
                              (can be specified concurrently)         Lower instruction (partial)
                              EFU instruction                         VCALLMS, VCALLMSR instruction
                              External unit control instruction       COP2 transfer instruction
Total number of instructions  127 instructions                        90 instructions
EFU                           Usable as an option                     Not supported
Register                      Floating-point register: 32 x 128 bits  Floating-point register: 32 x 128 bits
                              Integer register: 16                    Integer register: 16
                              Special register: ACC, I, Q, R (, P)    Special register: ACC, I, Q, R
                                                                      Control register: 16

2.3.6. VPU Instruction Set Overview
VPU microinstructions/macroinstructions are listed below.
Floating-Point Operation
(The Pipe column shows whether the microinstruction belongs to the Upper or the Lower instruction field.)
Microinstruction  Pipe   Macro-instruction  Function
ABS               Upper  VABS               absolute
ADD               Upper  VADD               addition
ADDA              Upper  VADDA              ADD output to ACC
ADDAbc            Upper  VADDAbc            ADD output to ACC broadcast bc field
ADDAi             Upper  VADDAi             ADD output to ACC broadcast I register
ADDAq             Upper  VADDAq             ADD output to ACC broadcast Q register
ADDbc             Upper  VADDbc             ADD broadcast bc field
ADDi              Upper  VADDi              ADD broadcast I register
ADDq              Upper  VADDq              ADD broadcast Q register
DIV               Lower  VDIV               floating divide
MADD              Upper  VMADD              MUL and ADD
MADDA             Upper  VMADDA             MUL and ADD output to ACC
MADDAbc           Upper  VMADDAbc           MUL and ADD output to ACC broadcast bc field
MADDAi            Upper  VMADDAi            MUL and ADD output to ACC broadcast I register
MADDAq            Upper  VMADDAq            MUL and ADD output to ACC broadcast Q register
MADDbc            Upper  VMADDbc            MUL and ADD broadcast bc field
MADDi             Upper  VMADDi             MUL and ADD broadcast I register
MADDq             Upper  VMADDq             MUL and ADD broadcast Q register
MAX               Upper  VMAX               maximum
MAXbc             Upper  VMAXbc             MAX broadcast bc field
MAXi              Upper  VMAXi              MAX broadcast I register
MINI              Upper  VMINI              minimum
MINIbc            Upper  VMINIbc            MINI broadcast bc field
MINIi             Upper  VMINIi             MINI broadcast I register
MSUB              Upper  VMSUB              MUL and SUB
MSUBA             Upper  VMSUBA             MUL and SUB output to ACC
MSUBAbc           Upper  VMSUBAbc           MUL and SUB output to ACC broadcast bc field
MSUBAi            Upper  VMSUBAi            MUL and SUB output to ACC broadcast I register
MSUBAq            Upper  VMSUBAq            MUL and SUB output to ACC broadcast Q register
MSUBbc            Upper  VMSUBbc            MUL and SUB broadcast bc field
MSUBi             Upper  VMSUBi             MUL and SUB broadcast I register
MSUBq             Upper  VMSUBq             MUL and SUB broadcast Q register
MUL               Upper  VMUL               multiply
MULA              Upper  VMULA              MUL output to ACC
MULAbc            Upper  VMULAbc            MUL output to ACC broadcast bc field
MULAi             Upper  VMULAi             MUL output to ACC broadcast I register
MULAq             Upper  VMULAq             MUL output to ACC broadcast Q register
MULbc             Upper  VMULbc             MUL broadcast bc field
MULi              Upper  VMULi              MUL broadcast I register
MULq              Upper  VMULq              MUL broadcast Q register
OPMSUB            Upper  VOPMSUB            outer product MSUB
OPMULA            Upper  VOPMULA            outer product MULA
RSQRT             Lower  VRSQRT             floating reciprocal square-root
SQRT              Lower  VSQRT              floating square-root
SUB               Upper  VSUB               subtraction
SUBA              Upper  VSUBA              SUB output to ACC
SUBAbc            Upper  VSUBAbc            SUB output to ACC broadcast bc field
SUBAi             Upper  VSUBAi             SUB output to ACC broadcast I register
SUBAq             Upper  VSUBAq             SUB output to ACC broadcast Q register
SUBbc             Upper  VSUBbc             SUB broadcast bc field
SUBi              Upper  VSUBi              SUB broadcast I register
SUBq              Upper  VSUBq              SUB broadcast Q register


Format Conversion
Microinstruction  Pipe   Macro-instruction  Function
FTOI0             Upper  VFTOI0             float to integer, fixed point 0 bit
FTOI12            Upper  VFTOI12            float to integer, fixed point 12 bits
FTOI15            Upper  VFTOI15            float to integer, fixed point 15 bits
FTOI4             Upper  VFTOI4             float to integer, fixed point 4 bits
ITOF0             Upper  VITOF0             integer to float, fixed point 0 bit
ITOF12            Upper  VITOF12            integer to float, fixed point 12 bits
ITOF15            Upper  VITOF15            integer to float, fixed point 15 bits
ITOF4             Upper  VITOF4             integer to float, fixed point 4 bits

Integer Operation
Microinstruction  Pipe   Macro-instruction  Function
IADD              Lower  VIADD              integer ADD
IADDI             Lower  VIADDI             integer ADD immediate
IADDIU            Lower  -                  integer ADD immediate unsigned
IAND              Lower  VIAND              integer AND
IOR               Lower  VIOR               integer OR
ISUB              Lower  VISUB              integer SUB
ISUBIU            Lower  -                  integer SUB immediate unsigned

Elementary Function Operation
Microinstruction  Pipe   Macro-instruction  Function
EATAN             Lower  -                  Elementary-function ArcTAN
EATANxy           Lower  -                  Elementary-function ArcTAN y/x
EATANxz           Lower  -                  Elementary-function ArcTAN z/x
EEXP              Lower  -                  Elementary-function Exponential
ELENG             Lower  -                  Elementary-function Length
ERCPR             Lower  -                  Elementary-function Reciprocal
ERLENG            Lower  -                  Elementary-function Reciprocal Length
ERSADD            Lower  -                  Elementary-function Reciprocal Square and ADD
ERSQRT            Lower  -                  Elementary-function Reciprocal Square-root
ESADD             Lower  -                  Elementary-function Square and ADD
ESIN              Lower  -                  Elementary-function SIN
ESQRT             Lower  -                  Elementary-function Square-root
ESUM              Lower  -                  Elementary-function Sum

Register-Register Transfer
Microinstruction  Pipe   Macro-instruction  Function
MFIR              Lower  VMFIR              move from integer register
MFP               Lower  -                  move from P register
MOVE              Lower  VMOVE              move floating register
MR32              Lower  VMR32              move rotate 32 bits
MTIR              Lower  VMTIR              move to integer register
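The "fixed point n bits" wording in the Format Conversion table means a scale factor of 2^n per element; for example, FTOI4/ITOF4 scale by 16. A scalar C model (function names are illustrative; C's float-to-int conversion truncates toward zero, matching the FTOI behavior described here):

```c
#include <stdint.h>

/* Illustrative models of 4-bit fixed-point conversion: 4 fraction
 * bits means a scale factor of 2^4 = 16 per element. */
static int32_t ftoi4(float f)   { return (int32_t)(f * 16.0f); } /* truncates */
static float   itof4(int32_t i) { return (float)i / 16.0f; }
```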


Load/Store
Microinstruction  Pipe   Macro-instruction  Function
ILW               Lower  -                  integer load word
ILWR              Lower  VILWR              integer load word register
ISW               Lower  -                  integer store word
ISWR              Lower  VISWR              integer store word register
LQ                Lower  -                  Load Quadword
LQD               Lower  VLQD               Load Quadword with pre-decrement
LQI               Lower  VLQI               Load Quadword with post-increment
SQ                Lower  -                  Store Quadword
SQD               Lower  VSQD               Store Quadword with pre-decrement
SQI               Lower  VSQI               Store Quadword with post-increment

Flag Operation
Microinstruction  Pipe   Macro-instruction  Function
FCAND             Lower  -                  flag-operation clipping flag AND
FCEQ              Lower  -                  flag-operation clipping flag EQ
FCGET             Lower  -                  flag-operation clipping flag get
FCOR              Lower  -                  flag-operation clipping flag OR
FCSET             Lower  -                  flag-operation clipping flag set
FMAND             Lower  -                  flag-operation MAC flag AND
FMEQ              Lower  -                  flag-operation MAC flag EQ
FMOR              Lower  -                  flag-operation MAC flag OR
FSAND             Lower  -                  flag-operation status flag AND
FSEQ              Lower  -                  flag-operation status flag EQ
FSOR              Lower  -                  flag-operation status flag OR
FSSET             Lower  -                  flag-operation set status flag

Branching
Microinstruction  Pipe   Macro-instruction  Function
B                 Lower  -                  branch (PC relative address)
BAL               Lower  -                  branch and link (PC relative address)
IBEQ              Lower  -                  integer branch on equal
IBGEZ             Lower  -                  integer branch on greater than or equal to zero
IBGTZ             Lower  -                  integer branch on greater than zero
IBLEZ             Lower  -                  integer branch on less than or equal to zero
IBLTZ             Lower  -                  integer branch on less than zero
IBNE              Lower  -                  integer branch on not equal
JALR              Lower  -                  jump and link register (absolute address)
JR                Lower  -                  jump register (absolute address)


Random Numbers
Microinstruction  Pipe   Macro-instruction  Function
RGET              Lower  VRGET              random-unit get R register
RINIT             Lower  VRINIT             random-unit init R register
RNEXT             Lower  VRNEXT             random-unit next M sequence
RXOR              Lower  VRXOR              random-unit XOR register

Others
Microinstruction  Pipe         Macro-instruction  Function
CLIP              Upper        VCLIP              clipping
NOP               Upper/Lower  VNOP               no operation
WAITP             Lower        -                  wait P register
WAITQ             Lower        VWAITQ             wait Q register
XGKICK            Lower        -                  eXternal-unit GPU2 Interface Kick
XITOP             Lower        -                  eXternal-unit read ITOP register
XTOP              Lower        -                  eXternal-unit read TOP register


2.4. IPU: Image Data Processor
The IPU implements decompression of two-dimensional images, such as texture data and video data. The IPU decompresses data using MPEG2 or a subset of MPEG2, or converts data using VQ (Vector Quantization). Which scheme to use depends on the purpose and the properties of the image.

[Figure: IPU block diagram. The macro-block decoder comprises VLD, zig-zag scan, IQ, and IDCT stages, followed by CSC and VQ units, a local buffer memory, and a 128-bit FIFO.]
Figure 2-10 IPU Block Diagram
In decoding MPEG2 bit streams, the IPU decodes macro blocks, and the EE Core performs motion compensation in software by using the multimedia instructions. The IPU is also in charge of CSC (Color Space Conversion).

[Figure: decoding process flow for motion compensation. A bit stream in main memory is decoded by the IPU (IDCT) into SPR buffers #0 and #1; the EE Core's 128-bit ALU combines the decoded data with a reference image in main memory to produce the decoded image; the IPU (CSC) then converts it to an RGB decoded image for the rendering engine.]
Figure 2-11 Decoding Process Flow for Motion Compensation


2.5. GIF: GS Interface
As a front end to the GS, the GIF formats data based on the specifications of a tag (GIFtag) at the start of the
display list packet, and then transfers the formatted data to the GS as a drawing command. Data is input to the
GIF from VU Mem1 via PATH1, from VIF1 via PATH2, and from main memory via PATH3. The GIF also
plays a role in data path arbitration.
PATH1 is assigned to the transfer of display lists processed in VPU1. PATH2 is assigned to the data directly
transferable to the rendering engine, e.g. online textures. PATH3 is assigned to the transfer of display lists
which have been generated by the EE Core and VPU0 and stored temporarily in main memory. The order of
priority is PATH1, PATH2, and PATH3.
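The fixed priority among the three input paths can be expressed as a toy arbitration model (names and the function itself are illustrative, not hardware registers):

```c
/* Toy model of GIF input-path arbitration: when several paths request
 * transfer at once, PATH1 wins over PATH2, which wins over PATH3. */
enum { PATH_NONE = 0, PATH1 = 1, PATH2 = 2, PATH3 = 3 };

static int gif_select(int p1_req, int p2_req, int p3_req) {
    if (p1_req) return PATH1;   /* display lists from VPU1 */
    if (p2_req) return PATH2;   /* direct data via VIF1 */
    if (p3_req) return PATH3;   /* display lists from main memory */
    return PATH_NONE;
}
```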

[Figure: data paths to the GS. PATH1 runs from VU Mem1 via VU1 into the GIF packing logic; PATH2 runs from VIF1 and PATH3 from main memory, both through the GIF FIFO. The GIF buffer and GS control registers connect to the GS over a 64-bit interface; internal paths are 128 bits.]
Figure 2-12 Data Paths to GS


2.6. SIF: Sub-CPU Interface
The Sub-CPU (IOP) controls sound output and I/O to and from storage devices. It adopts an LMA configuration with memory independent of the EE. The SIF is the interface for exchanging data between these processors. The DMA controllers (DMACs) of the IOP and the EE operate in cooperation through the bidirectional FIFO (SFIFO) in the SIF.

[Figure: EE-IOP interface. The IOP Core and IOP-DMAC access IOP memory over a 32-bit bus; the SIF contains control registers, pack/unpack logic, and the SFIFO; the EE-DMAC and EE Core access main memory (MEM) over a 128-bit bus.]
Figure 2-13 EE-IOP Interface
Data is transmitted in units called packets. A tag (DMAtag) is attached to each packet, containing a memory address in the IOP memory space, a memory address in the EE memory space, and the data size. The IOP-DMAC reads the IOP memory address and data size from the tag, and transmits the packet, with its tag, to the SIF. The EE-DMAC reads the packet from the SIF, interprets the first word as a tag, reads the EE memory address and data size from the tag, and expands the data to the specified memory address. These transfer operations are performed by the DMACs to avoid generating unnecessary interrupts to the CPU.
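Conceptually, such a tag carries two addresses and a size. The struct below is purely illustrative; the field names and the actual bit-level layout of the hardware tag are assumptions here and are defined in the hardware manuals, not by this sketch.

```c
#include <stdint.h>

/* Hypothetical, illustrative layout of a SIF transfer tag: one address
 * per memory space plus the packet size, packed into one qword.
 * Field names are invented for this sketch. */
typedef struct {
    uint32_t iop_addr;  /* address in IOP memory space */
    uint32_t size;      /* data size of the packet */
    uint32_t ee_addr;   /* address in EE memory space */
    uint32_t unused;    /* padding to a full 128-bit qword */
} sif_dma_tag;
```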

[Figure: tagged packets A, B, and C are read from IOP memory by Source Chain DMA, passed through the SIF as 128-bit data, and scattered to separate addresses in EE memory by Destination Chain DMA, each packet led by its tag.]
Figure 2-14 SIF Data Flow


3. Functional Overview


3.1. Data Transfer via DMA
Data is transferred between main memory, peripheral processors, and scratchpad memory (SPR) via DMA. The unit of data transfer is the quadword (128 bits = 1 qword). In transfers to and from peripheral processors, data is divided into blocks (slices) of 8 qwords.
Chain mode is available on some of the channels. This mode performs processing such as switching transfer addresses according to tags (DMAtags) embedded in the transfer data. This not only reduces preprocessing such as data sorting before transfer, but also enables peripheral processors to exchange data through main memory without intervention by the EE Core. At such times, the stall control function, which synchronizes two transfers with each other, is available. For the GIF channel, a memory FIFO function using a ring buffer in main memory is also provided.

3.1.1. Sliced Transfer
Except for transfers between the SPR and main memory, DMA transfer is performed by slicing the data into 8-qword units and arbitrating the transfer requests from each channel. A channel releases the bus temporarily whenever transfer of one slice is completed, and continues transferring if there are no requests from other channels. This sliced-transfer mechanism not only enables two or more transfer processes to be executed in parallel, but also allows the EE Core to access main memory during a transfer. The following figure illustrates DMA transfers performed concurrently on channels A and B.
[Figure: timeline of bus ownership. DREQs from channels A and B interleave: each channel transfers one 8-qword slice and then releases the bus, stalling if another channel's request is pending, and the CPU gains access to memory between slices.]
Figure 3-1 Example of Sliced Transfer

3.1.2. Chain Mode Transfer
Source Chain Mode
Source Chain Mode is used for DMA transfer from memory to peripherals. In this mode, the transfer address and transfer data size are specified by tag data (DMAtags) in the packet. The DMAC repeats transfer processing while tracing the tags in memory, and ends the series of transfers once the tag carrying the end instruction has been processed.
The DMAtag is 128-bit data with the following structure. ID is the field in which the details of the transfer operation are specified. The eight types in the table below can be specified.


  127:64  (Arbitrary)
  63:32   ADDR    Address specification
  31:24   ID/FLG
  15:0    QWC     Data size

ID    Transfer Data Position  Next Tag Position      Operation
cnt   Next to tag             Next to transfer data  Transfers the data following the tag and
                                                     proceeds to the succeeding data.
next  Next to tag             Specified in tag       Transfers the data following the tag and
                                                     jumps to the specified position.
ref   Specified in tag        Next to tag            Transfers the data at the specified
                                                     position.
refs  Specified in tag        Next to tag            Transfers the data at the specified
                                                     position while applying stall control.
refe  Specified in tag        (None)                 Transfers the data at the specified
                                                     position and ends transfer.
call  Next to tag             Specified in tag       Transfers the data following the tag,
                                                     stores the next address, and jumps to
                                                     the specified position.
ret   Next to tag             Position stored when   Transfers the data following the tag and
                              call was specified     jumps to the position stored when call
                                                     was specified.
end   Next to tag             (None)                 Transfers the data following the tag and
                                                     ends transfer.
Data transfers can be performed most efficiently by using these IDs appropriately according to the data
structures in memory. The following is an example.


[Figure: three tag-tracing examples starting from TADR. Left: "next" tags, where each tag's data immediately follows it and ADDR points to the next tag. Center: "ref" tags, where a compact tag list references data blocks elsewhere in memory, terminated by an "end" tag. Right: "call"/"ret" tags, where "call" saves the position after its data before jumping, and "ret" resumes from the saved position.]
Figure 3-2 Source Chain DMA Tags Showing Data Structures
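In C, tags following the bit layout above can be packed as single 64-bit values (the upper, arbitrary 64 bits of the 128-bit tag are omitted). The numeric ID encodings below are placeholders for this sketch, not the hardware opcodes:

```c
#include <stdint.h>

/* Field positions follow the DMAtag layout above: QWC in bits 0-15,
 * ID/FLG in bits 24-31, ADDR in bits 32-63.  The ID values here are
 * illustrative for this sketch, not the hardware encodings. */
enum { ID_CNT, ID_NEXT, ID_REF, ID_REFS, ID_REFE, ID_CALL, ID_RET, ID_END };

static uint64_t make_dmatag(unsigned id, uint32_t addr, uint16_t qwc)
{
    return ((uint64_t)addr << 32) | ((uint64_t)(id & 0xff) << 24) | qwc;
}

/* A two-entry "ref" chain as in the center example of Figure 3-2:
 * each tag points at an 8-qword data block elsewhere in memory, and
 * the final "refe" tag ends the transfer. */
static void build_ref_chain(uint64_t tags[2], uint32_t data0, uint32_t data1)
{
    tags[0] = make_dmatag(ID_REF,  data0, 8);  /* 8 qwords at data0 */
    tags[1] = make_dmatag(ID_REFE, data1, 8);  /* 8 qwords, then end */
}
```

The DMAC would walk such a tag list from TADR, fetching each referenced block without any copying or reordering of the data itself.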


Destination Chain Mode
Destination Chain Mode is used to transfer data from peripherals to memory. A tag (DMAtag) bearing the destination address and packet length is placed at the start of the transfer packet. This enables the peripheral side to control the address where the data is stored.
The Destination Chain tag is 128-bit data with the following structure, and is classified into the three types shown in the table below.
  127:64  (Arbitrary)
  63:32   ADDR    Address specification
  31:24   ID/FLG
  15:0    QWC     Data size

ID    Destination Address  Operation
cnt   Specified in tag     Stores the data following the tag at the specified address.
cnts  Specified in tag     Stores the data following the tag at the specified address
                           while applying stall control.
end   Specified in tag     Stores the data following the tag at the specified address
                           and ends transfer.

The following is an example.

[Figure: two Destination Chain examples. Top: data transferred from the SPR (from the start address register SADDR), where each packet's CNT tag (e.g. ADDR=ADDR1, QWC=8) directs the data block that follows it to the specified address in main memory. Bottom: the same flow from the SIF, where packets pass through the SFIFO by SDMA before being scattered to ADDR0 and ADDR1 in main memory.]
Figure 3-3 Destination Chain DMA to Transfer Data to Specified Address


3.1.3. Interleave Transfer
Interleave mode is available for DMA transfers between main memory and the SPR. This mode processes data in such a way that a small rectangular area is cut out from, or fitted into, two-dimensional data (image data) laid out in memory.
Figure 3-4 illustrates an example of cutting a small rectangular area (Tw, Th) out of a rectangular area (Fw, Fh).
[Figure: a source image of width Fw and height Fh in main memory. For each tile, Tw pixels are copied per row and the remaining Fw - Tw pixels are skipped, assembling tiles T0, T1, T2, ... of size Tw x Th contiguously in the SPR.]
Figure 3-4 Cutting Out a Small Rectangular Area in Interleave Mode
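The effect of this mode can be expressed in C as a strided row copy. The real DMAC moves qword units between the SPR and main memory; this sketch uses 32-bit pixels for clarity:

```c
#include <stdint.h>
#include <string.h>

/* Cut a tw x th rectangle starting at (x, y) out of an fw x fh image of
 * 32-bit pixels, as interleave mode does: copy tw pixels per row, then
 * skip fw - tw pixels to reach the next row. */
static void cut_rect(const uint32_t *src, int fw,
                     uint32_t *dst, int tw, int th, int x, int y)
{
    for (int row = 0; row < th; row++) {
        memcpy(dst + row * tw,                 /* tiles pack tightly    */
               src + (y + row) * fw + x,       /* strided source rows   */
               (size_t)tw * sizeof(uint32_t));
    }
}
```

Fitting a tile back into the image is the mirror operation: the source is contiguous and the destination is strided by fw.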

3.1.4. Stall Control
When a transfer from a peripheral to memory and a transfer from memory to another peripheral are performed concurrently, they can be synchronized through the stall address register (D_STADR). The channel handling the DMA transfer to memory is called the source channel, and the channel handling the DMA transfer from memory is called the drain channel. The value of D_STADR is updated as transfer processing on the source channel advances, while transfer processing on the drain channel stalls at the address immediately preceding the D_STADR address. This mechanism is called stall control.


[Figure: a source DMA channel moves 8-qword slices from Peripheral0 into main memory/SPR, advancing D_STADR as it goes, while a drain DMA channel feeds Peripheral1 from the same memory, never reading past the address held in D_STADR.]
Figure 3-5 Synchronization between DMA Transfers by Stall Control
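The drain-side rule can be sketched as a small C helper that computes how many qwords the drain channel may still transfer before stalling. This is a simplified model of the mechanism, not the hardware's exact comparison logic:

```c
#include <stdint.h>

/* Minimal model of stall control: the source channel advances D_STADR as
 * it writes to memory, and the drain channel may only consume data up to
 * the address immediately preceding D_STADR.  Returns how many qwords
 * the drain channel can transfer right now. */
static uint32_t drain_transferable(uint32_t drain_madr, uint32_t d_stadr)
{
    if (d_stadr <= drain_madr)
        return 0;                       /* stalled: source not far enough */
    return (d_stadr - drain_madr) / 16; /* 16 bytes per qword */
}
```

When the source channel writes another slice, D_STADR moves forward and the drain channel automatically resumes.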

3.1.5. MFIFO
A FIFO can be implemented with a ring buffer and DMAtags set up in main memory when transferring data from the scratchpad memory to VIF1 or the GIF. This is called the MFIFO (Memory FIFO).
[Figure: the EE Core or SPR writes over the main bus into a ring buffer (MFIFO) in main memory, located and sized by the D_RBOR and D_RBSR registers. The drain channel to VIF1/GIF follows the buffer through its tag address register (Dn_TADR) while the write side advances D8_MADR.]
Figure 3-6 Memory FIFO (MFIFO)
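The wrap-around arithmetic for such a ring buffer can be sketched in C. The assumption that D_RBSR holds a power-of-two buffer size minus one, usable as a byte mask, belongs to this sketch; the hardware's exact register encoding is not detailed in this overview:

```c
#include <stdint.h>

/* Advance a ring-buffer pointer by one qword (16 bytes), wrapping within
 * the buffer.  rbor is the buffer base (as in D_RBOR); rbsr_mask models
 * D_RBSR as a size-minus-one byte mask -- an assumption of this sketch. */
static uint32_t mfifo_advance(uint32_t addr, uint32_t rbor, uint32_t rbsr_mask)
{
    return rbor + ((addr + 16 - rbor) & rbsr_mask);
}
```

With this, the writer and the drain channel can chase each other through the same region of main memory indefinitely, which is exactly what makes the buffer behave as a FIFO.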


3.2. Data Transfer to VPU
The EE has two built-in VPUs. These floating-point vector processors execute matrix operations, coordinate transformations, perspective transformations, and so forth, at high speed. Data is DMA-transferred to a VPU through its VIF. Header information (VIFcodes) embedded in the transfer data specifies how the VPU is to process the data. This is the mechanism of DMA transfer to the VPU.

3.2.1. VIF Overview
The VIF is an interface unit that decompresses DMA-transferred packet data and transfers it to VPU memory. The decompression method and destination memory address are set according to the VIFcodes included in the VIF packet. By transferring VIF packets of vector data, VIF packets containing a microinstruction program, and VIF packets instructing the VPU to activate a microinstruction program, the VPU can operate independently of the EE Core.
The data types the VIF can decompress and transfer to VU Mem are one- to four-dimensional vectors of 8-bit/16-bit/32-bit elements, and a four-dimensional vector of the 16-bit color type RGBA 5:5:5:1. In addition, the VIF can transfer microinstruction code to Micro Mem. VIF1 can also transfer data to the GS via the GIF.

3.2.2. VIF Packet
According to the 32-bit VIFcodes in the transferred data, the VIF decompresses the data that follows each code and writes it to memory and registers in the VU. A VIFcode and its following data string are called a VIF packet. Several VIF packets can exist in one DMA packet, as shown in the figure below.
[Figure: two 128-bit-wide DMA packet examples, one beginning with a DMAtag and one without, each containing VIF packets VIFcode0 through VIFcode5 interleaved with their data words; below them, the same six VIF packets are shown extracted individually, each consisting of a VIFcode followed by zero or more data words.]

3.2.3. VIFcode Structure
The VIFcode is 32 bits in length, consisting of the CMD field (8 bits), the NUM field (8 bits), and the
IMMEDIATE field (16 bits) as shown in the figure below.
  31:24   CMD
  23:16   NUM
  15:0    IMMEDIATE

The CMD field gives the VIF instructions on the operation and the decompression method of the following
data. The meanings of the NUM and IMMEDIATE fields change according to the value of the CMD field.
Category          CMD Name   Function                                    Following data
Data transfer     UNPACK     Decompresses data and writes to VU Mem.     Packed vector data
                  STCYCL     Sets CYCLE register value.                  None
                  OFFSET     Sets OFFSET register value. (VIF1 only)     None
                  STMOD      Sets MODE register value.                   None
                  STMASK     Sets MASK register value.                   Mask pattern
                  STROW      Sets Row register value.                    Row-completion data
                  STCOL      Sets Col register value.                    Column-completion data
Micro-program     MPG        Loads a microprogram.                       Microinstruction program
execution         FLUSHE     Waits for end of a microprogram.            None
                  FLUSH      Waits for end of a microprogram and end     None
                             of GIF (PATH1/PATH2) transfer. (VIF1 only)
                  FLUSHA     Waits for end of a microprogram and end     None
                             of GIF transfer. (VIF1 only)
                  MSCAL      Activates a microprogram.                   None
                  MSCNT      Executes a microprogram continuously.       None
                  MSCALF     FLUSH and activates a microprogram.         None
Double buffering  BASE       Sets BASE register value. (VIF1 only)       None
                  ITOP       Sets ITOPS register value.                  None
GS data transfer  DIRECT     Transfers data to GIF (via PATH2).          GS data
(VIF1 only)       DIRECTHL   Transfers data to GIF (via PATH2).          GS data
                  MSKPATH3   Masks transfer via PATH3 to GIF.            None
Others            NOP        No operation                                None
                  MARK       Sets MARK register value.                   None
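A VIFcode with the layout given above can be assembled in C as follows. The CMD, NUM, and IMMEDIATE values in the usage below are arbitrary illustrations, not real opcodes:

```c
#include <stdint.h>

/* Assemble a 32-bit VIFcode from its CMD (bits 24-31), NUM (bits 16-23)
 * and IMMEDIATE (bits 0-15) fields, following the layout above. */
static uint32_t make_vifcode(uint8_t cmd, uint8_t num, uint16_t imm)
{
    return ((uint32_t)cmd << 24) | ((uint32_t)num << 16) | imm;
}
```

A program building a VIF packet would emit one such word, then append however many data words that CMD requires per the "Following data" column.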


3.2.4. Data Transfer by UNPACK
The most common data transfer via the VIF is transfer to VU Mem using the VIFcode UNPACK. The transfer data following the VIFcode is packed data, e.g. 8 bits x 4 elements or 32 bits x 3 elements. The VIF decompresses the packed data to vector data of 32 bits x 4 elements and writes it to VU Mem. At this time, elements left blank in VU Mem can be filled with a VPU register value (supplementation), and a constant offset value can be added to the transfer data (addition).
The packing formats are listed below.
Format  Data length     No. of elements (dimensions)
S-32    32 bits         1
S-16    16 bits         1
S-8     8 bits          1
V2-32   32 bits         2
V2-16   16 bits         2
V2-8    8 bits          2
V3-32   32 bits         3
V3-16   16 bits         3
V3-8    8 bits          3
V4-32   32 bits         4
V4-16   16 bits         4
V4-8    8 bits          4
V4-5    5+5+5+1 bits    4
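As an example of the decompression step, the following C sketch expands one V4-5 value into the four 32-bit elements the VIF writes to VU Mem. The assignment of R to the low bits is an assumption of this sketch, and no supplementation or addition is applied:

```c
#include <stdint.h>

/* Expand one V4-5 packed value (RGBA 5:5:5:1) into four 32-bit elements.
 * The bit order (R in the low bits, A in the top bit) is an assumption
 * of this sketch, not taken from the manual. */
static void unpack_v4_5(uint16_t packed, uint32_t out[4])
{
    out[0] = packed & 0x1f;          /* R: bits 0-4   */
    out[1] = (packed >> 5) & 0x1f;   /* G: bits 5-9   */
    out[2] = (packed >> 10) & 0x1f;  /* B: bits 10-14 */
    out[3] = (packed >> 15) & 0x1;   /* A: bit 15     */
}
```

The other formats work the same way: each packed element is widened to a 32-bit slot, so every written vector occupies exactly one qword of VU Mem.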


3.2.5. Double Buffering
VPU1 supports double buffering, which sets up two buffer areas in VU Mem and enhances throughput by transferring data to one buffer while a microprogram operates on the other.

[Figure: two phases of double buffering. First, the VIF transfers incoming data to Buffer A (addressed by TOPS) while the VU processes previously transferred data in Buffer B (addressed by TOP); then the roles swap, with the VIF filling Buffer B and the VU reading Buffer A.]
Figure 3-7 Double Buffering in VU Mem
Double buffer addresses are set with the VIF1_BASE and VIF1_OFST registers, and are reflected in the VIF1_TOPS register and the TOP register of VU1 at the appropriate times.
By setting the FLG bit in the VIFcode UNPACK, data can be transferred to the double buffers at addresses relative to the address held in the TOPS register. When a microprogram reads data from the double buffers, it reads the TOP register value with the XTOP instruction and accesses the data in the buffer accordingly.
The values of TOPS and TOP are exchanged whenever a microprogram is activated. By repeating data transfer and microprogram activation, it is therefore possible to process transferred data with a microprogram while transferring data to the two buffers alternately.
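The alternation can be modelled with a toy TOPS/TOP exchange. This is a simplification: on the real hardware both registers are derived from VIF1_BASE and VIF1_OFST, and TOP is latched from TOPS on activation, but with only two buffers the net effect is a swap:

```c
/* Toy model of double buffering: tops is the buffer the VIF is filling,
 * top the buffer the running microprogram reads (via XTOP).  Activating
 * a microprogram exchanges the two roles. */
typedef struct { unsigned tops, top; } vu_bufs;

static void activate_microprogram(vu_bufs *b)
{
    unsigned t = b->tops;
    b->tops = b->top;   /* VIF now fills the buffer just processed */
    b->top  = t;        /* VU now reads the buffer just filled     */
}
```

Repeating "transfer, activate, transfer, activate" therefore keeps the VIF and VU1 busy on opposite buffers at all times.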


3.3. Data Transfer to GS
Regular display lists generated by VU1 and exceptional display lists generated by the EE Core and VU0 are transferred concurrently, with the transfer right arbitrated by the GIF. This is the typical data flow from the EE to the GS.
The following sections briefly describe this data flow.

3.3.1. Data Transfer Route
The GIF has three general data transfer paths called PATH1, PATH2, and PATH3. They work as follows.
• PATH1

PATH1 is a data transfer path from VPU1 data memory (VU Mem1) to the GS. When VU1
executes the XGKICK instruction, transfer processing via this path is performed.

• PATH2

PATH2 is a data transfer path between the FIFO inside the VPU1 VIF and the GIF. This path
is used when the VIF executes the DIRECT/DIRECTHL instruction and when transferring
data from the GS to main memory by using the image data transfer function of the GS.

• PATH3

PATH3 is a direct data transfer path from the EE main bus to the GIF. This path is used when
transferring data from main memory or the SPR to the GS.

Priority and Timing
The three general data transfer paths are prioritized as PATH1 > PATH2 > PATH3. Whenever transfer of a GS packet (described later in this document) ends on one path, transfer requests from the other paths are checked. If there is a request, transfer processing is performed according to priority.
Access to GS Privileged Registers
The privileged registers of the GS are directly mapped into the I/O space of the EE Core, and are accessible without using the GIF, regardless of the state of the general data transfer paths. The GIF monitors access to the privileged registers; when the transfer direction switching register (BUSDIR) is accessed, the GIF switches the data transfer direction accordingly.

3.3.2. Data Format
GS Packet
The basic unit of data transferred by the GIF is the GS primitive, consisting of header information (a GIFtag) and following data. Transfer processing, however, is performed in units of GS packets, in which several GS primitives are gathered. The last GS primitive in a GS packet is indicated by the termination information (EOP=1) in its GIFtag.


[Figure: a GS packet composed of three GS primitives, each a GIFtag followed by DATA; the first two GIFtags have EOP=0 and the last has EOP=1.]
Figure 3-8 GS Packet Structure
The above data structure is common to all data transfer paths. For PATH2 and PATH3, however, a VIFcode or DMAtag is placed in front of the GS packet.
The GIFtag and data must be aligned on 128-bit boundaries in memory.
GIFtag
The GIFtag has a fixed length of 128 bits, and specifies the size and structure of the following data as well as the data format (mode). The structure of the GIFtag is as follows:
  127:64  REGS   Register descriptors (4 bits x 16 max.)
  63:60   NREG   Number of register descriptors in the REGS field
  59:58   FLG    Data format
  57:47   PRIM   Data to be set to the PRIM register of the GS
  46      PRE    PRIM field enabled
  15      EOP    Termination information (End of Packet)
  14:0    NLOOP  Repeat count (GS primitive data size)

FLG values:
  00  PACKED mode
  01  REGLIST mode
  10  IMAGE mode
  11  Disabled (same operation as IMAGE mode)
The value of the NLOOP field gives the data size of the GS primitive, but the unit varies depending on the data format.
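As a sketch of the layout above, the low 64 bits of a GIFtag can be assembled in C as follows; the field positions are taken from the table, while the particular values used below are purely illustrative:

```c
#include <stdint.h>

/* Assemble a 128-bit GIFtag as two 64-bit words.  Field positions follow
 * the table above: NLOOP 14:0, EOP 15, PRE 46, PRIM 57:47, FLG 59:58,
 * NREG 63:60.  The upper word carries the REGS descriptors unchanged. */
static void make_giftag(uint64_t tag[2], uint32_t nloop, int eop, int pre,
                        uint32_t prim, unsigned flg, unsigned nreg,
                        uint64_t regs)
{
    tag[0] = (uint64_t)(nloop & 0x7fff)
           | ((uint64_t)(eop  & 1)     << 15)
           | ((uint64_t)(pre  & 1)     << 46)
           | ((uint64_t)(prim & 0x7ff) << 47)
           | ((uint64_t)(flg  & 3)     << 58)
           | ((uint64_t)(nreg & 0xf)   << 60);
    tag[1] = regs;  /* up to 16 4-bit register descriptors */
}
```

A display list generator would emit one such qword, then NLOOP groups of data in the format selected by FLG, setting EOP=1 in the final GIFtag of the packet.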


3.3.3. PACKED Mode
PACKED mode formats (packs) vertex coordinate values, texture coordinate values, and color values generated as vector data of 32 bits x 4 elements to match the corresponding bit fields of the GS registers, and writes them to the GS registers. The register descriptors in the REGS field of the GIFtag correspond one-to-one to the qwords in the following data, and indicate the data format and the register where each qword is written. The following 9 types of register descriptors are available:
Name   Input Data                                        Destination Register
PRIM   Type and attribute of primitive                   PRIM
RGBAQ  Vertex color                                      RGBAQ
ST     Vertex texture coordinates                        ST
UV     Vertex texture coordinates (texel coordinates)    UV
XYZF2  Vertex coordinate values + fog coefficient        XYZF2/XYZF3
XYZ2   Vertex coordinate values                          XYZ2/XYZ3
FOG    Fog coefficient                                   FOG
A+D    Arbitrary register set value                      Specified arbitrarily
NOP    Arbitrary                                         None (not output)

3.3.4. REGLIST Mode
REGLIST mode transfers data strings already formatted so that they can be written to the GS registers as they are. The data following the GIFtag is treated as 64 bits x 2 data strings, and the register descriptors in the REGS field of the GIFtag indicate the register to which each is written.

3.3.5. IMAGE Mode
IMAGE mode transfers image data by means of the host-local transfer function of the GS. The data following the GIFtag is treated as 64 bits x 2 data strings and is written consecutively to the HWREG register of the GS.


3.4. Image Decompression by IPU
The IPU (Image Processing Unit) is an image data processor whose main functions are bit stream decompression and macroblock decoding of MPEG2 data. Compressed data in main memory is decoded, decompressed, and written back to main memory. The decoded images are transferred to the GS and used as moving picture data and texture data.
Figure 3-9 illustrates the basic processing flow of the IPU.
[Figure: processing pipeline. An input BS128 bit stream is VLC-decoded to CODE16, then IDCT-processed to RAW16 output; RAW8 data passes through CSC to produce RGB32, optionally through Dither to produce RGB16, and through VQ to produce INDX4 output.]
Figure 3-9 IPU Processing Flow
The IPU has the following basic functions:
• MPEG2 macro block layer decoding
• MPEG2 bit stream decoding
• Bit stream decompression
The IPU has the following additional post-processing functions.
• YCbCr → RGB color conversion (CSC)
• 4 x 4 ordered dither
• Vector quantization (VQ)
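As an illustration of the CSC step, the following C sketch converts one YCbCr pixel to RGB using the widely used ITU-R BT.601 integer approximation. The IPU's exact fixed-point coefficients and rounding are not specified in this overview, so treat the numbers as an assumption:

```c
#include <stdint.h>

static uint8_t clamp8(int v) { return v < 0 ? 0 : v > 255 ? 255 : (uint8_t)v; }

/* One-pixel YCbCr -> RGB conversion in the spirit of the IPU's CSC step,
 * using standard BT.601 integer coefficients (an assumption here; the
 * IPU's own fixed-point arithmetic may differ). */
static void csc_pixel(uint8_t y, uint8_t cb, uint8_t cr, uint8_t rgb[3])
{
    int c = y - 16, d = cb - 128, e = cr - 128;
    rgb[0] = clamp8((298 * c + 409 * e + 128) >> 8);            /* R */
    rgb[1] = clamp8((298 * c - 100 * d - 208 * e + 128) >> 8);  /* G */
    rgb[2] = clamp8((298 * c + 516 * d + 128) >> 8);            /* B */
}
```

In the real pipeline this operates on whole macroblocks of RAW8 data, producing RGB32 output that can then be dithered and vector-quantized.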
The IPU handles the following data formats:
Name   Contents                                                       Width
BS128  MPEG2 bit stream subset                                        128 bits
RGB32  RGBA pixels (A8+R8+G8+B8)                                      32 bits
RGB16  RGBA pixels (A1+R5+G5+B5)                                      16 bits
RAW8   Unsigned 8-bit YCbCr pixels                                    8 bits
RAW16  Signed 16-bit YCbCr pixels (only lower 9 bits are effective)   16 bits
INDX4  Unsigned 4-bit index pixels                                    4 bits


The following commands are available:
Name   Contents                               Input   Output
BCLR   Input FIFO initialization command      -       -
IDEC   Intra decoding command                 BS128   RGB32/RGB16
BDEC   Block decoding command                 BS128   RAW16
VDEC   Variable-length data decoding command  BS128   Variable-length code + decoding code
FDEC   Fixed-length data decoding command     BS128   Fixed-length data
SETIQ  IQ table setting command               RAW8    -
SETVQ  VQ table setting command               RGB16   -
CSC    Color space conversion command         RAW8    RGB32/RGB16
PACK   Format conversion command              RGB32   RGB16/INDX4
SETTH  Threshold setting command              -       -

Other functional features are as follows.
• Motion Compensation (MC)

In decoding an MPEG2 bit stream, motion compensation (MC) is performed not in the IPU but in the EE Core, using multimedia instructions.

• Automatic Generation of Alpha

The alpha plane (transparency plane) is generated from the decoded luminance values according to a fixed rule. This is useful for cutting out a texture pattern effectively when decoding a bit stream that has no stencil pattern (transparent pixel mask pattern).
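The flavor of such a rule can be sketched in C. This is a hypothetical simplification: the actual IPU rule, including the thresholds configured with SETTH and any intermediate alpha values, is more detailed than a single cutoff:

```c
#include <stdint.h>

/* Hypothetical sketch of alpha generation from luminance: pixels whose
 * decoded luminance falls below a threshold become fully transparent,
 * all others fully opaque (0x80 is full opacity on the GS).  The real
 * IPU rule, driven by the SETTH thresholds, is more detailed. */
static uint8_t alpha_from_luma(uint8_t y, uint8_t threshold)
{
    return y < threshold ? 0x00 : 0x80;
}
```

Encoding the transparent region as near-black luminance in the source movie then yields a usable cut-out mask for free during decoding.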



