EE Overview Manual
EE Overview
Version 6.0
© 2002 Sony Computer Entertainment Inc. All Rights Reserved.
SCE Confidential
Publication date: April 2002

Sony Computer Entertainment Inc.
1-1, Akasaka 7-chome, Minato-ku, Tokyo 107-0052, Japan

Sony Computer Entertainment America
919 East Hillsdale Blvd., Foster City, CA 94404, U.S.A.

Sony Computer Entertainment Europe
30 Golden Square, London W1F 9LD, U.K.

The EE Overview is supplied pursuant to and subject to the terms of the Sony Computer Entertainment PlayStation® license agreements. The EE Overview is intended for distribution to and use by only Sony Computer Entertainment licensed Developers and Publishers in accordance with the PlayStation® license agreements. Unauthorized reproduction, distribution, lending, rental or disclosure to any third party, in whole or in part, of this book is expressly prohibited by law and by the terms of the Sony Computer Entertainment PlayStation® license agreements.

Ownership of the physical property of the book is retained by and reserved by Sony Computer Entertainment. Alteration to or deletion, in whole or in part, of the book, its presentation, or its contents is prohibited. The information in the EE Overview is subject to change without notice. The content of this book is Confidential Information of Sony Computer Entertainment.

PlayStation® is a registered trademark, and GRAPHICS SYNTHESIZER™ and EMOTION ENGINE™ are trademarks of Sony Computer Entertainment Inc. All other trademarks are property of their respective owners and/or their licensors.

About This Manual

The "EE Overview" introduces the development concept and the main points of the functions and operation of the Emotion Engine, the CPU of the PlayStation 2.
- Chapter 1 "Architecture Policy" describes the processing and features of the Emotion Engine and Graphics Synthesizer, which allow the PlayStation 2 to implement high-speed real-time three-dimensional graphics, an important characteristic of home entertainment software. - Chapter 2 "Architecture Overview" introduces the functions and operations of the blocks which make up the Emotion Engine. - Chapter 3 "Functional Overview" describes the data flow between the blocks of the Emotion Engine and from the Emotion Engine to the Graphics Synthesizer. Changes Since Release of 5th Edition Since release of the 5th Edition of the EE Overview Manual, the following changes have been made. Note that each of these changes is indicated by a revision bar in the margin of the affected page. Ch. 2: Architecture Overview • A correction has been made to the description for Figure 2-11, in section 2.4. IPU Image Data Processor, on page 45. Ch. 3: Functional Overview • A correction has been made to section 3.3.1. Data Transfer Route, on page 60. © SCEI -3- SCE CONFIDENTIAL EE Overview Version 6.0 (This page is left blank intentionally) © SCEI -4- SCE CONFIDENTIAL EE Overview Version 6.0 Glossary Term EE EE Core COP0 COP1 COP2 GS GIF IOP SBUS VPU (VPU0/VPU1) VU (VU0/VU1) VIF (VIF0/VIF1) VIFcode SPR IPU word qword Slice Packet Transfer list Tag DMAtag GS primitive Context GIFtag Display list Definition Emotion Engine. CPU of the PlayStation 2. Generalized computation and control unit of EE. Core of the CPU. EE Core system control coprocessor. EE Core floating-point operation coprocessor. Also referred to as FPU. Vector operation unit coupled as a coprocessor of EE Core. VPU0. Graphics Synthesizer. Graphics processor connected to EE. EE Interface unit to GS. Processor connected to EE for controlling input/output devices. Bus connecting EE to IOP. Vector operation unit. EE contains 2 VPUs: VPU0 and VPU1. VPU core operation unit. VPU data decompression unit. Instruction code for VIF. 
Quick-access data memory built into EE Core (Scratchpad memory). EE Image processor unit. Unit of data length: 32 bits Unit of data length: 128 bits Physical unit of DMA transfer: 8 qwords or less Data to be handled as a logical unit for transfer processing. A group of packets transferred in serial DMA transfer processing. Additional data indicating data size and other attributes of packets. Tag positioned first in DMA packet to indicate address/size of data and address of the following packet. Data to indicate image elements such as point and triangle. A set of drawing information (e.g. texture, distant fog color, and dither matrix) applied to two or more primitives uniformly. Also referred to as the drawing environment. Additional data to indicate attributes of GS primitives. A group of GS primitives to indicate batches of images. © SCEI -5- SCE CONFIDENTIAL EE Overview Version 6.0 (This page is left blank intentionally) © SCEI -6- SCE CONFIDENTIAL EE Overview Version 6.0 Contents 1. Architecture Policy .................................................................................................................................................................. 9 1.1. Main Points of Architecture Policy ............................................................................................................................. 10 1.2. Expansion of Bandwidth .............................................................................................................................................. 12 1.3. Geometry Engines in Parallel....................................................................................................................................... 14 1.4. Data Decompression/Unpack..................................................................................................................................... 16 1.5. 
Memory Architecture .................................................................................................................................................... 17 2. Architecture Overview.......................................................................................................................................................... 21 2.1. EE Block Configuration ............................................................................................................................................... 22 2.2. EE Core: CPU................................................................................................................................................................ 24 2.2.1. EE Core Features................................................................................................................................................... 24 2.2.2. Memory Map........................................................................................................................................................... 25 2.2.3. Instruction Set Overview ...................................................................................................................................... 26 2.3. VPU: Vector Operation Processor.............................................................................................................................. 35 2.3.1. VPU Architecture................................................................................................................................................... 35 2.3.2. VPU0........................................................................................................................................................................ 38 2.3.3. VPU1........................................................................................................................................................................ 38 2.3.4. 
VIF: VPU Interface................................................................................................................................................ 39 2.3.5. Operation Mode and Programming Model ........................................................................................................ 39 2.3.6. VPU Instruction Set Overview............................................................................................................................. 40 2.4. IPU: Image Data Processor.......................................................................................................................................... 45 2.5. GIF: GS Interface.......................................................................................................................................................... 46 2.6. SIF: Sub-CPU Interface ................................................................................................................................................ 47 3. Functional Overview............................................................................................................................................................. 49 3.1. Data Transfer via DMA ................................................................................................................................................ 50 3.1.1. Sliced Transfer ........................................................................................................................................................ 50 3.1.2. Chain Mode Transfer............................................................................................................................................. 50 3.1.3. Interleave Transfer ................................................................................................................................................. 54 3.1.4. 
Stall Control ............................................................................................................................................................ 54 3.1.5. MFIFO .................................................................................................................................................................... 55 3.2. Data Transfer to VPU................................................................................................................................................... 56 3.2.1. VIF Overview ......................................................................................................................................................... 56 3.2.2. VIF Packet............................................................................................................................................................... 56 3.2.3. VIFcode Structure.................................................................................................................................................. 57 3.2.4. Data Transfer by UNPACK.................................................................................................................................. 58 3.2.5. Double Buffering ................................................................................................................................................... 59 3.3. Data Transfer to GS ...................................................................................................................................................... 60 3.3.1. Data Transfer Route .............................................................................................................................................. 60 3.3.2. Data Format............................................................................................................................................................ 60 3.3.3. 
PACKED Mode..................................................................................................................................................... 62 3.3.4. REGLIST Mode..................................................................................................................................................... 62 © SCEI -7- SCE CONFIDENTIAL EE Overview Version 6.0 3.3.5. IMAGE Mode.........................................................................................................................................................62 3.4. Image Decompression by IPU .....................................................................................................................................63 © SCEI -8- SCE CONFIDENTIAL EE Overview Version 6.0 1. Architecture Policy © SCEI -9- SCE CONFIDENTIAL EE Overview Version 6.0 1.1. Main Points of Architecture Policy Cutting-edge Process for Consumers A characteristic of a home entertainment computer (a consumer video game console) is that its functions and performance cannot be changed during its life. Changing functions and performance brings profit to neither the developer nor the user. With this in mind, the PlayStation 2 is designed to have the highest performance by adopting the latest technology and the most advanced manufacturing technology from the early stages, to secure a long product life with performance at the point of sale kept unchanged. Silicon for Emotion High-quality computer graphics require a huge amount of calculation. In addition, high-quality entertainment software requires a large amount of calculation, not only for beautiful graphics but also for logical inference and simulation of physical phenomena. The PlayStation 2 has sufficient resources to produce this level of computer graphics, along with these additional elements. 
Fast Rendering
One of the most advanced manufacturing technologies for improving computer graphics performance is embedded DRAM, which places an operation circuit and memory on the same chip. By using embedded DRAM for the rendering engine, the bandwidth between memory and processor expands dramatically. This eliminates the pixel fill rate bottleneck that has limited rendering engines up to now, and greatly improves drawing performance.

Multi Path Geometry
Relative to the improved drawing performance, geometry performance becomes the bottleneck. To increase performance and distribute the load, the architecture places geometry engines in parallel and allows two or more processors to share the same rendering engine by time-sharing. This is unlike the previous architecture, in which the rendering engines were placed in parallel.

On-demand Data Decompression
The performance of memory lags behind the improved processor performance. To make effective use of low-capacity, low-speed memory, data is placed in memory in a compressed state and is decompressed and expanded as necessary. High-resolution textures and modeling data, which consume a lot of memory, are normally kept in main memory in a compressed state and decompressed by a dedicated circuit when needed.

Stall Control and Memory FIFO
A huge amount of intermediate data (display lists) is continually transferred from the geometry engine to the rendering engine. To control this data flow without imposing a load on the processor, an MFIFO (Memory FIFO) mechanism is provided. Using main memory as a buffer, it synchronizes the data transfer from the geometry engine to memory with the transfer from memory to the rendering engine.

Application-Specific Processors
Video game applications inevitably include regular processes such as coordinate conversion and image processing. Besides the processing load itself, context-switching overhead places a heavy load on the CPU.
For these reasons, a number of small-scale sub-processors are assigned to these regular processes to offload the CPU.

Intelligent Data Transport
Distributing processing across more sub-processors requires synchronization and arbitration controls. To ensure that these controls do not load the CPU, all instructions (programs) for the sub-processors are sent along with their data by DMA transfer through main memory.

Data Path Buffering
In a UMA (Unified Memory Architecture) system with many sub-processors, competition for bus access creates a bottleneck. Therefore, a small-capacity buffer memory is embedded in each sub-processor. Processing results are collected there temporarily and then DMA-transferred to main memory in bulk. As a result, burst transfer becomes the dominant form of bus access, and transmission efficiency improves as well.

1.2. Expansion of Bandwidth

Embedded DRAM
Since the performance of the rendering engine is determined by access to the frame buffer (pixel fill rate), performance is maximized by using embedded DRAM in the GS (the frame buffer is embedded in the same chip as the rendering circuit, with a total bus width of 2048 bits) and by providing multiple pixel engines that draw several pixels in parallel.

Figure 1-1 Speedup in Rendering Engine by Embedded DRAM

Complete 128-bit Data Bus
The processor has a 128-bit-wide data bus and registers. The CPU's general-purpose registers (GPRs) and the floating-point coprocessor registers are 128 bits wide. All the processors are connected via a 128-bit main bus.

Figure 1-2 128-bit Bus

Parallel 128-bit Integer Operation
A multimedia instruction set is implemented.
It uses the 128-bit-wide GPRs (integer registers) in parallel by dividing them into fields of 8 bits x 16, 16 bits x 8, 32 bits x 4, or 64 bits x 2. For example, PADDB (Parallel Add Byte) adds the sixteen 8-bit fields of two registers in a single instruction, and SQ (Store Quad Word) stores the 128-bit result to main memory at once.

Figure 1-3 128-bit Parallel Processing by Multimedia Instruction

Parallel 128-bit Floating-Point Operation
The 128-bit floating-point registers are divided into four 32-bit floating-point fields. Four FMACs (floating-point multiply-add ALUs) are provided, one per field, to perform operations in parallel with a throughput of one cycle. For example, the four products VFc.x = VFa.x * VFb.x through VFc.w = VFa.w * VFb.w are computed in parallel.

Figure 1-4 4-Parallel Floating-Point Operation

1.3. Geometry Engines in Parallel

Principle
To improve geometry performance relative to drawing performance, an architecture is implemented with two geometry engines connected in parallel to one rendering engine. One geometry engine consists of the CPU, with a high degree of flexibility, and a vector operation unit (VPU0) acting as its coprocessor; it performs complex, irregular geometry processing, including physical simulation. The other engine is structured around a programmable vector operation unit (VPU1) and performs simple, repetitive geometry processing such as backgrounds and distant views. The transfer right between the display lists from the two geometry engines is arbitrated, and the display lists are supplied to the rendering engine asynchronously.
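Returning to the multimedia instructions of Section 1.2: the 16-way parallel byte addition performed by PADDB (Figure 1-3) can be emulated in portable, host-side C. This is a behavioral sketch, not PS2 code; a 128-bit register is modeled as an array of 16 bytes, and the function names are illustrative.

```c
#include <stdint.h>

/* PADDB (Parallel Add Byte): adds the sixteen 8-bit fields of rs and rt
 * independently; each field wraps around modulo 256. */
static void paddb(uint8_t rd[16], const uint8_t rs[16], const uint8_t rt[16])
{
    for (int i = 0; i < 16; i++)
        rd[i] = (uint8_t)(rs[i] + rt[i]);      /* per-field wraparound add */
}

/* PADDSB variant: the same sixteen lanes, but each signed 8-bit sum
 * saturates to [-128, 127] instead of wrapping. */
static void paddsb(int8_t rd[16], const int8_t rs[16], const int8_t rt[16])
{
    for (int i = 0; i < 16; i++) {
        int sum = rs[i] + rt[i];
        if (sum >  127) sum =  127;
        if (sum < -128) sum = -128;
        rd[i] = (int8_t)sum;
    }
}
```

On the EE this whole loop collapses into a single one-cycle instruction operating on one 128-bit GPR, which is the point of the multimedia extensions.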
Figure 1-5 Parallel Geometry Engines

Dual Context
The display lists supplied from the geometry engines carry a context that includes status data such as texture page and drawing mode. To eliminate the need to set context information again, two contexts are maintained in the GS, corresponding to the two geometry engines, VPU0 and VPU1. This is the dual context mechanism.

Figure 1-6 Rendering Engine with Dual Context (GC: Graphic Context)

Data Path
Of the two geometry engines, the higher-priority VPU1 is directly connected to the GS, and the lower-priority CPU+VPU0 is connected to the GS through the main bus. Because data transfer from the lower-priority geometry engine may be suspended, its generated display lists are buffered temporarily in main memory. The corresponding DMA channels can monitor each other's transfer address so that the buffer does not overflow.

Figure 1-7 Typical Data Paths

Application-Specific Path
To the programmer, the two geometry paths appear as two independent paths. That is, the graphics processing of an application can be divided in two, with a portion allocated to each geometry engine. In general, the high-speed geometry engine (VPU1) takes charge of regular processing such as backgrounds and distant views, while the geometry engine with a high degree of flexibility (CPU+VPU0) takes charge of complex, irregular processing, including physical simulation. Simple lighting calculations and perspective transformations can be executed in VPU1 without direct CPU participation.
Figure 1-8 Processing Allocation of Geometry Engines (EE Core + VPU0: non-fixed, emotional, and creative operation; VPU1: fixed, routine operation)

1.4. Data Decompression/Unpack

Image Decompression
High-resolution texture data, which requires a large amount of memory, is stored in main memory in a compressed state and decompressed with a dedicated decompression processor (IPU) when used. The decompressed texture data is returned to main memory temporarily and then transferred to the GS.

Figure 1-9 Image Data Decompression

Geometry Data Unpack
Modeling data is packed into an optimal bit width per data unit (e.g. 8/16-bit fields) and maintained in main memory; it is automatically unpacked to 32-bit fields by the VIF when sent to the geometry engine (VPU). As a result, the data size in main memory is reduced, and the load on the VPU can be reduced.

Figure 1-10 Geometry Data Unpack

1.5. Memory Architecture

Hybrid UMA
To correct the problems with UMA (Unified Memory Architecture), each processor has a high-speed, small-capacity cache or working memory for its exclusive use, and is connected to the large-capacity shared memory through this high-speed memory. By storing the data read from or written to memory in 4-qword units, the cache speeds up the second and subsequent accesses to nearby addresses and decreases the frequency of accesses to main memory. Main memory is accessed only when:
- the data to be read is not in the cache (cache miss), or
- data written to the cache has not been reflected in memory (dirty) and the cache space must be freed to access other addresses (cache out).
Data is transferred between the cache and main memory as a burst access of one 4-qword block (cache line) at a time to improve bus efficiency.
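The cache-line arithmetic above can be made concrete. The following C sketch decomposes an address for the EE's 8 KB, 2-way data cache with 64-byte (4-qword) lines; the set count is derived from those sizes (8192 / (2 x 64) = 64 sets), and the helper names are illustrative rather than taken from any EE document.

```c
#include <stdint.h>

/* Geometry of an 8 KB, 2-way cache with 64-byte (4-qword) lines. */
#define LINE_BYTES  64u                                 /* 4 qwords x 16 B */
#define WAYS        2u
#define CACHE_BYTES (8u * 1024u)
#define NUM_SETS    (CACHE_BYTES / (WAYS * LINE_BYTES)) /* = 64 sets */

/* Byte position within the cache line. */
static uint32_t line_offset(uint32_t addr) { return addr % LINE_BYTES; }

/* Which set the line maps to. */
static uint32_t set_index(uint32_t addr)   { return (addr / LINE_BYTES) % NUM_SETS; }

/* Tag that distinguishes lines sharing a set. */
static uint32_t line_tag(uint32_t addr)    { return addr / (LINE_BYTES * NUM_SETS); }
```

Two addresses that differ by exactly LINE_BYTES * NUM_SETS bytes (4 KB here) map to the same set, so with two ways a third such address evicts one of the earlier lines. This is why streaming data layouts matter on a small shared cache.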
Figure 1-11 Shared Main Memory and Local Cache

CPU Cache
The CPU has an instruction cache (I-Cache) and a data cache (D-Cache). The data cache can deliver the needed word of a cache line first (sub-block ordering) and can service a cache hit without a hazard while a previous load miss is still being processed (hit-under-miss). Since this hit-under-miss effect is similar to that of the prefetch (PREF) instruction, it is effective when the address to be accessed is known in advance.

                      Instruction   Data
  Cache Size          16 KB         8 KB
  Way                 2-way         2-way
  Line Size           4 qwords      4 qwords
  Sub-block Ordering  No            Yes
  Hit-under-miss      No            Yes

The output from the cache is also buffered in the Write Back Buffer (WBB), an 8-qword FIFO. Write requests are stored there and then written to memory according to the state of the main bus.

Uncached Access
In applications primarily designed for computer graphics, writing display lists to memory is the major process. The display lists are calculated from three-dimensional data that has just been read from memory. When processing a one-way data flow like this, using the cache can be a disadvantage. Furthermore, in some cases (e.g. when writing hardware registers or writing data that is about to be DMA-transferred), it is preferable for written data to be reflected in main memory immediately. Therefore, a mode that does not use the cache (uncached mode) is provided. To speed up reads while writing synchronously, an uncached accelerated mode that uses a special-purpose buffer (UCAB: uncached accelerated buffer) is also available. The UCAB (8 qwords in size) speeds up continuous data reads from adjoining addresses.
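On MIPS-family CPUs such as the EE Core, cached versus uncached access is selected by the address segment: KSEG0 (from 0x80000000) is a cached window onto low physical memory, while KSEG1 (from 0xA0000000) is an uncached window onto the same memory. Converting a cached pointer to its uncached alias is therefore pure address arithmetic, as the following host-side sketch shows (illustrative helper names; EE-specific details such as the uncached accelerated mapping are not modeled).

```c
#include <stdint.h>

/* Standard MIPS kernel segment bases: both map the low 512 MB of
 * physical memory, KSEG0 cached and KSEG1 uncached. */
#define KSEG0_BASE 0x80000000u
#define KSEG1_BASE 0xA0000000u

/* Strip the KSEG0 base to recover the physical address. */
static uint32_t kseg0_to_phys(uint32_t va) { return va - KSEG0_BASE; }

/* Re-base a physical address into the uncached KSEG1 window. */
static uint32_t phys_to_kseg1(uint32_t pa) { return pa + KSEG1_BASE; }

/* Cached alias -> uncached alias of the same physical memory. */
static uint32_t kseg0_to_kseg1(uint32_t va)
{
    return phys_to_kseg1(kseg0_to_phys(va));
}
```

A write through the KSEG1 alias bypasses the D-Cache and reaches memory directly, which matches the uncached-mode use cases described above (hardware registers, data about to be DMA-transferred).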
Figure 1-12 Three Memory Access Modes (Cached / Uncached / Uncached Accelerated)

Scratchpad RAM
In addition to the data cache, a general-purpose high-speed internal memory (Scratchpad RAM: SPR) usable as working memory for the CPU is embedded. DMA transfer between main memory and the SPR can be performed in parallel with SPR access from the CPU. Main memory access overhead can therefore be hidden from the program by using the SPR as a double buffer: while the CPU calculates on one half, the DMAC reads and writes the other half, and the two halves are swapped each pass.

Figure 1-13 Double Buffering with SPR

List Processor DMA
Display lists are not always located in consecutive areas of memory. In most cases they can be arranged discontinuously by adopting a linked-list structure. To remove the need to gather non-continuous data before transferring it between processors, the DMAC can trace data lists according to the tag information (DMAtag) embedded in the data. For example, a chain can use a CONT tag to transfer a matrix, REF tags to reference texture and vertex data in place, and a NEXT tag to continue to the next object. This releases the CPU from simple memory copying and increases efficiency in using the cache.

Figure 1-14 List Processing with DMAC

2. Architecture Overview

2.1. EE Block Configuration
The block diagram and main specifications of the EE are shown below.
Figure 2-1 EE Block Diagram (EE Core with FPU (COP1), two 64-bit integer units, 16 KB I$, 8 KB D$, and 16 KB SPRAM; VPU0 (COP2) and VPU1, each with micro MEM, VU MEM, and a VIF; IPU; 10-channel DMA controller; GIF to the GS; interrupt controller, timer, SIO, memory interface to external memory, and I/O interface to peripherals, connected by 128-bit buses)

Main Specifications

CPU Core:
  2-way superscalar
  Data bus: 128 bits (64 bits x 2)
  Internal bus: 128 bits
  Internal registers: 128 bits x 32
CACHE:
  I-Cache: 16 KB, 2-way set associative
  D-Cache: 8 KB, 2-way set associative with line lock
  Scratchpad RAM (SPR): 16 KB
MMU:
  48-double-entry TLB
  32-bit physical/logical address space conversion
Instruction set:
  64 bits, conforms to MIPS III (partly to MIPS IV)
  128-bit parallel multimedia instruction set
  3-operand multiply/multiply-add calculation instructions
  Interrupt enable/disable instructions
Coprocessors
  FPU:
    32-bit single-precision floating-point multiply-add arithmetic logical unit
    32-bit single-precision floating-point divide calculator
  VPU0 (coordinate engine):
    32-bit single-precision floating-point multiply-add arithmetic logical unit x 4
    32-bit single-precision floating-point divide calculator x 1
    Data unpacking function (VIF)
    Programmable LIW DSP
    Internal bus (data): 128 bits
  VPU1 (image engine):
    32-bit single-precision floating-point multiply-add arithmetic logical unit x 5
    32-bit single-precision floating-point divide calculator x 2
    Data unpacking function (VIF)
    Programmable LIW DSP
    Internal bus (data): 128 bits
  IPU:
    MPEG2 video layer decoding / bit stream decoding / IDCT / CSC (Color Space Conversion) / Dither / VQ (Vector Quantization)
Built-in devices
  DMAC: 10 channels (transfer between memory and I/O, memory and SPR)
  DRAMC: RDRAM controller
  INTC: 2 types: INT0 (for interrupts from each device) / INT1 (for interrupts from the DMAC)
  TIMER: 16 bits x 4
  GIF: 256-byte FIFO embedded; data formatting function; arbitration (PATH1, 2 and 3)
  SIF: 32-bit (address/data multiplexed); 128-byte FIFO embedded
Main bus: 128 bits

2.2. EE Core: CPU

2.2.1. EE Core Features
The EE Core is a processor that implements the superscalar 64-bit MIPS IV instruction set architecture. In particular, 128-bit parallel processing for multimedia applications has been greatly expanded. The EE Core is composed of the CPU, a floating-point execution unit (Coprocessor 1), an instruction cache, a data cache, scratchpad RAM, and a tightly coupled vector operation unit (Coprocessor 2). The CPU has two pipelines and can decode two instructions in each cycle. Instructions are issued and completed in order. However, since data cache misses are non-blocking and a single cache miss does not stall the pipelines, a load miss or uncached load may complete out of order. Multiply, Multiply-Add, Divide, Prefetch, and Coprocessor instructions may also complete out of order.

The above features are summarized as follows:
• 2-way superscalar pipelines
• 128-bit (64 bits x 2) data path and 128-bit system bus
• Instruction set
  - 64-bit instruction set conforming to MIPS III and partly conforming to MIPS IV (Prefetch instruction and conditional move instructions)
  - Non-blocking load instructions
  - Three-operand Multiply and Multiply-Add instructions
  - 128-bit multimedia instructions (parallel processing of 64 bits x 2, 32 bits x 4, 16 bits x 8, or 8 bits x 16)
• On-chip caches and scratchpad RAM
  - Instruction cache: 16 KB, 2-way set associative
  - Data cache: 8 KB, 2-way set associative (with a write-back protocol)
  - Data scratchpad RAM: 16 KB
  - Data cache line lock function
  - Prefetch function
• MMU
  - 48-double-entry fully set-associative translation look-aside buffer (TLB)

2.2.2. Memory Map

Figure 2-2 EE Core Memory Map (physical memory: Main Memory (max. 256 MB) from 0000_0000, EE Registers at 1000_0000, GS Registers at 1200_0000, System Boot ROM (max. 4 MB) at 1fc0_0000, Extend Main Memory (max. 1 GB) from 2000_0000, with unmounted regions above; kernel-mode virtual space: KSEG0 from 8000_0000, KSEG1 from a000_0000, system-reserved space from c000_0000)

2.2.3. Instruction Set Overview
The EE Core has an instruction set consisting of the MIPS III instruction set, part of the MIPS IV instruction set, 128-bit multimedia instructions, three-operand multiply instructions, I1-pipe operation instructions, and others. The EE Core instructions are listed below.

Integer Add/Subtract
  Instruction  Function                                           Level
  ADD          Add Word                                           MIPS I
  ADDI         Add Immediate Word                                 MIPS I
  ADDIU        Add Immediate Unsigned Word                        MIPS I
  ADDU         Add Unsigned Word                                  MIPS I
  DADD         Doubleword Add                                     MIPS III
  DADDI        Doubleword Add Immediate                           MIPS III
  DADDIU       Doubleword Add Immediate Unsigned                  MIPS III
  DADDU        Doubleword Add Unsigned                            MIPS III
  DSUB         Doubleword Subtract                                MIPS III
  DSUBU        Doubleword Subtract Unsigned                       MIPS III
  SUB          Subtract Word                                      MIPS I
  SUBU         Subtract Unsigned Word                             MIPS I
  PADDB        Parallel Add Byte                                  128-bit MMI
  PADDH        Parallel Add Halfword                              128-bit MMI
  PADDSB       Parallel Add with Signed Saturation Byte           128-bit MMI
  PADDSH       Parallel Add with Signed Saturation Halfword       128-bit MMI
  PADDSW       Parallel Add with Signed Saturation Word           128-bit MMI
  PADDUB       Parallel Add with Unsigned Saturation Byte         128-bit MMI
  PADDUH       Parallel Add with Unsigned Saturation Halfword     128-bit MMI
  PADDUW       Parallel Add with Unsigned Saturation Word         128-bit MMI
  PADDW        Parallel Add Word                                  128-bit MMI
  PADSBH       Parallel Add/Subtract Halfword                     128-bit MMI
  PSUBB        Parallel Subtract Byte                             128-bit MMI
  PSUBH        Parallel Subtract Halfword                         128-bit MMI
  PSUBSB       Parallel Subtract with Signed Saturation Byte      128-bit MMI
  PSUBSH       Parallel Subtract with Signed Saturation Halfword  128-bit MMI
  PSUBSW       Parallel Subtract with Signed Saturation Word      128-bit MMI
  PSUBUB       Parallel Subtract with Unsigned Saturation Byte    128-bit MMI
  PSUBUH       Parallel Subtract with Unsigned Saturation Halfword 128-bit MMI
  PSUBUW       Parallel Subtract with Unsigned Saturation Word    128-bit MMI
  PSUBW        Parallel Subtract Word                             128-bit MMI

Integer Multiply/Divide
  Instruction  Function                                           Level
  DIV          Divide Word                                        MIPS I
  DIV1         Divide Word Pipeline 1                             EE Core
  DIVU         Divide Unsigned Word                               MIPS I
  DIVU1        Divide Unsigned Word Pipeline 1                    EE Core
  MULT         Multiply Word                                      MIPS I
  MULTU        Multiply Unsigned Word                             MIPS I
  MULT1        Multiply Word Pipeline 1                           EE Core
  MULTU1       Multiply Unsigned Word Pipeline 1                  EE Core
  PDIVBW       Parallel Divide Broadcast Word                     128-bit MMI
  PDIVUW       Parallel Divide Unsigned Word                      128-bit MMI
  PDIVW        Parallel Divide Word                               128-bit MMI
  PMULTH       Parallel Multiply Halfword                         128-bit MMI
  PMULTUW      Parallel Multiply Unsigned Word                    128-bit MMI
  PMULTW       Parallel Multiply Word                             128-bit MMI

Integer Multiply-Add
  Instruction  Function                                           Level
  MADD         Multiply-Add Word                                  EE Core
  MADD1        Multiply-Add Word Pipeline 1                       EE Core
  MADDU        Multiply-Add Unsigned Word                         EE Core
  MADDU1       Multiply-Add Unsigned Word Pipeline 1              EE Core
  PHMADH       Parallel Horizontal Multiply-Add Halfword          128-bit MMI
  PHMSBH       Parallel Horizontal Multiply-Subtract Halfword     128-bit MMI
  PMADDH       Parallel Multiply-Add Halfword                     128-bit MMI
  PMADDUW      Parallel Multiply-Add Unsigned Word                128-bit MMI
  PMADDW       Parallel Multiply-Add Word                         128-bit MMI
  PMSUBH       Parallel Multiply-Subtract Halfword                128-bit MMI
  PMSUBW       Parallel Multiply-Subtract Word                    128-bit MMI

Floating-Point
  Instruction  Function                                           Level
  ADD.S        Floating Point Add                                 MIPS I
  ADDA.S       Floating Point Add to Accumulator                  EE Core
  MADD.S       Floating Point Multiply-Add                        MIPS I
  MADDA.S      Floating Point Multiply and Add to Accumulator     EE Core
  MUL.S        Floating Point Multiply                            MIPS I
  MULA.S       Floating Point Multiply to Accumulator             EE Core
  MSUB.S       Floating Point Multiply and Subtract               MIPS I
  MSUBA.S      Floating Point Multiply and Subtract from Accumulator  EE Core
  SUB.S        Floating Point Subtract                            MIPS I
  SUBA.S       Floating Point Subtract to Accumulator             EE Core

Shift
  Instruction: DSRA DSLL DSLL32 DSLLV DSRA32 DSRAV DSRL DSRL32 DSRLV SLL SLLV SRA
SRAV SRL SRLV PSLLH PSLLVW PSLLW PSRAH PSRAVW PSRAW PSRLH PSRLVW PSRLW QFSRV Function Doubleword Shift Right Arithmetic Doubleword Shift Left Logical Doubleword Shift Left Logical Plus 32 Doubleword Shift Left Logical Variable Doubleword Shift Right Arithmetic Plus 32 Doubleword Shift Right Arithmetic Variable Doubleword Shift Right Logical Doubleword Shift Right Logical Plus 32 Doubleword Shift Right Logical Variable Shift Word Left Logical Shift Word Left Logical Variable Shift Word Right Arithmetic Shift Word Right Arithmetic Variable Shift Word Right Logical Shift Word Right Logical Variable Parallel Shift Left Logical Halfword Parallel Shift Left Logical Variable Word Parallel Shift Left Logical Word Parallel Shift Right Arithmetic Halfword Parallel Shift Right Arithmetic Variable Word Parallel Shift Right Arithmetic Word Parallel Shift Right Logical Halfword Parallel Shift Right Logical Variable Word Parallel Shift Right Logical Word Quadword Funnel Shift Right Variable Level MIPS III MIPS III MIPS III MIPS III MIPS III MIPS III MIPS III MIPS III MIPS III MIPS I MIPS I MIPS I MIPS I MIPS I MIPS I 128-bit MMI 128-bit MMI 128-bit MMI 128-bit MMI 128-bit MMI 128-bit MMI 128-bit MMI 128-bit MMI 128-bit MMI 128-bit MMI Logical Instruction AND ANDI NOR OR ORI XOR XORI PAND PNOR POR PXOR Function And And Immediate Not Or Or Or Immediate Exclusive OR Exclusive OR Immediate Parallel And Parallel Not Or Parallel Or Parallel Exclusive OR Level MIPS I MIPS I MIPS I MIPS I MIPS I MIPS I MIPS I 128-bit MMI 128-bit MMI 128-bit MMI 128-bit MMI © SCEI -28- SCE CONFIDENTIAL EE Overview Version 6.0 Compare Instruction SLTI SLTIU SLTU PCEQB PCEQH PCEQW PCGTB PCGTH PCGTW C.EQ.S C.F.S C.LE.S C.LT.S Function Set on Less Than Immediate Set on Less Than Immediate Unsigned Set on Less Than Unsigned Parallel Compare for Equal Byte Parallel Compare for Equal Halfword Parallel Compare for Equal Word Parallel Compare for Greater Than Byte Parallel Compare for Greater Than Halfword 
Parallel Compare for Greater Than Word Floating Point Compare (Equal) Floating Point Compare (False) Floating Point Compare (Less than or Equal) Floating Point Compare (Less than) Level MIPS I MIPS I MIPS I 128-bit MMI 128-bit MMI 128-bit MMI 128-bit MMI 128-bit MMI 128-bit MMI MIPS I MIPS I MIPS I MIPS I Min/Max Instruction PMAXH PMAXW PMINH PMINW MAX.S MIN.S Function Parallel Maximize Halfword Parallel Maximize Word Parallel Minimize Halfword Parallel Minimize Word Floating Point Maximum Floating Point Minimum Level 128-bit MMI 128-bit MMI 128-bit MMI 128-bit MMI EE Core EE Core Data Format Conversion Instruction Function PEXT5 Parallel Extend Upper from 5 bits PPAC5 Parallel Pack to 5 bits CVT.S.W Fixed point Convert to Single Floating Point CVT.W.S Floating point Convert to Word Fixed-Point Level 128-bit MMI 128-bit MMI MIPS I MIPS I © SCEI -29- SCE CONFIDENTIAL EE Overview Version 6.0 Reordering Instruction PCPYH PCPYLD PCPYUD PEXCH PEXCW PEXEH PEXEW PEXTLB PEXTLH PEXTLW PEXTUB PEXTUH PEXTUW PINTEH PINTH PPACB PPACH PPACW PREVH PROT3W Function Parallel Copy Halfword Parallel Copy Lower Doubleword Parallel Copy Upper Doubleword Parallel Exchange Center Halfword Parallel Exchange Center Word Parallel Exchange Even Halfword Parallel Exchange Even Word Parallel Extend Lower from Byte Parallel Extend Lower from Halfword Parallel Extend Lower form Word Parallel Extend Upper from Byte Parallel Extend Upper from Halfword Parallel Extend Upper from Word Parallel Interleave Even Halfword Parallel Interleave Halfword Parallel Pack to Byte Parallel Pack to Halfword Parallel Pack to Word Parallel Reverse Halfword Parallel Rotate 3 Words Level 128-bit MMI 128-bit MMI 128-bit MMI 128-bit MMI 128-bit MMI 128-bit MMI 128-bit MMI 128-bit MMI 128-bit MMI 128-bit MMI 128-bit MMI 128-bit MMI 128-bit MMI 128-bit MMI 128-bit MMI 128-bit MMI 128-bit MMI 128-bit MMI 128-bit MMI 128-bit MMI Others Instruction PABSH PABSW PLZCW ABS.S NEG.S RSQRT.S SQRT.S Function Parallel Absolute 
Halfword Parallel Absolute Word Parallel Leading Zero or One Count Word Floating Point Absolute Value Floating Point Negate Floating Point Reciprocal Root Floating Point Square Root Level 128-bit MMI 128-bit MMI 128-bit MMI MIPS I MIPS I MIPS IV MIPS II © SCEI -30- SCE CONFIDENTIAL EE Overview Version 6.0 Register-Register Transfer Instruction Function MFHI Move from HI Register MFLO Move from LO Register MOVN Move Conditional on Not Zero MOVZ Move Conditional on Zero MTHI Move to HI Register MTLO Move to LO Register MFHI1 Move From HI1 Register MFLO1 Move From LO1 Register MTHI1 Move To HI1 Register MTLO1 Move to LO1 Register PMFHI Parallel Move From HI Register PMFHL Parallel Move from HI/LO Register PMFLO Parallel Move from LO Register PMTHI Parallel Move To HI Register PMTHL Parallel Move To HI/LO Register PMTLO Parallel Move To LO Register MFC1 Move Word from Floating Point MOV.S Floating Point Move MTC1 Move Word to Floating Point Level MIPS I MIPS I MIPS IV MIPS IV MIPS I MIPS I EE Core EE Core EE Core EE Core 128-bit MMI 128-bit MMI 128-bit MMI 128-bit MMI 128-bit MMI 128-bit MMI MIPS I MIPS I MIPS I Load from Memory Instruction Function LB Load Byte LBU Load Byte Unsigned LD Load Doubleword LDL Load Doubleword Left LDR Load Doubleword Right LH Load Halfword LHU Load Halfword Unsigned LUI Load Upper Immediate LW Load Word LWL Load Word Left LWR Load Word Right LWU Load Word Unsigned LQ Load Quadword LWC1 Load Word to Floating Point Level MIPS I MIPS I MIPS III MIPS III MIPS III MIPS I MIPS I MIPS I MIPS I MIPS I MIPS I MIPS I 128-bit MMI MIPS I © SCEI -31- SCE CONFIDENTIAL EE Overview Version 6.0 Store in Memory Instruction Function SB Store Byte SD Store Doubleword SDL Store Doubleword Left SDR Store Doubleword Right SH Store Halfword SW Store Word SWL Store Word Left SWR Store Word Right SQ Store Quadword SWC1 Store Word from Floating Point Level MIPS I MIPS III MIPS III MIPS III MIPS I MIPS I MIPS I MIPS I 128-bit MMI MIPS I Special Data Transfer 
Instruction Function MFSA Move from Shift Amount Register MTSA Move to Shift Amount Register MTSAB Move Byte Count to Shift Amount Register MTSAH Move Halfword Count to Shift Amount Register MFBPC Move from Breakpoint Control Register MFCO Move from System Control Coprocessor MFDAB Move from Data Address Breakpoint register MFDABM Move from Data Address Breakpoint Mask Register MFDVB Move from Data value Breakpoint Register MFDVBM Move from Data Value Breakpoint Mask Register MFIAB Move from Instruction Address Breakpoint Register MFIABM Move from Instruction Address Breakpoint Mask Register MFPC Move from Performance Counter MFPS Move from Performance Event Specifier MTBPC Move to Breakpoint Control Register MTCO Move to System Control Coprocessor MTDAB Move to Data Address Breakpoint Register MTDABM Move to Data Address Breakpoint Mask Register MTDVB Move to Data Value Breakpoint Register MTDVBM Move to Data Value Breakpoint Mask Register MTIAB Move to Instruction Address Breakpoint Register MTIABM Move to Instruction Address Mask Breakpoint Register MTPC Move to Performance Counter MTPS Move to Performance Event Specifier CFC1 Move Control Word from Floating Point CTC1 Move Control Word to Floating Point © SCEI -32- Level EE Core EE Core EE Core EE Core MIPS I MIPS I MIPS I MIPS I MIPS I MIPS I MIPS I MIPS I MIPS I MIPS I MIPS I MIPS I MIPS I MIPS I MIPS I MIPS I MIPS I MIPS I MIPS I MIPS I MIPS I MIPS I SCE CONFIDENTIAL EE Overview Version 6.0 Conditional Branch and Jump Instruction Function BEQ Branch on Equal BEQL Branch on Equal Likely BGEZ Branch on Greater Than or Equal to Zero BGEZL Branch on Greater Than or Equal to Zero Likely BGTZ Branch on Greater Than Zero BGTZL Branch on Greater Than Zero Likely BLEZ Branch on Less Than or Equal to Zero BLEZL Branch on Less Than or Equal to Zero Likely BLTZ Branch on Less Than Zero BLTZL Branch on Less Than Zero Likely BNE Branch on Not Equal BNEL Branch on Not Equal Likely BC0F Branch on Coprocessor 0 False BC0FL 
Branch on Coprocessor 0 False Likely BC0T Branch on Coprocessor 0 True BC0TL Branch on Coprocessor 0 True Likely BC1F Branch on FP False BC1FL Branch on FP False Likely BC1T Branch on FP True BC1TL Branch on FP True Likely BC2F Branch on Coprocessor 2 False BC2FL Branch on Coprocessor 2 False Likely BC2T Branch on Coprocessor 2 True BC2TL Branch on Coprocessor 2 True Likely J Jump JR Jump Register Level MIPS I MIPS II MIPS I MIPS II MIPS I MIPS II MIPS I MIPS II MIPS I MIPS II MIPS I MIPS II MIPS I MIPS I MIPS I MIPS I MIPS I MIPS II MIPS I MIPS II MIPS I MIPS I MIPS I MIPS I MIPS I MIPS I Subroutine Call Instruction BGEZAL BGEZALL BLTZAL BLTZALL JAL JALR Level MIPS I MIPS II MIPS I MIPS II MIPS I MIPS I Function Branch on Greater Than or Equal to Zero and Link Branch on Greater Than or Equal to Zero and Link Likely Branch on Less Than Zero and Link Branch on Less Than Zero and Link Likely Jump and Link Jump and Link Register © SCEI -33- SCE CONFIDENTIAL EE Overview Version 6.0 Break and Trap Instruction BREAK SYSCALL TEQ TEQI TGE TGEI TGEIU TGEU TLT TLTI TLTIU TLTU TNE TNEI ERET Function Breakpoint System Call Trap if Equal Trap if Equal Immediate Trap if Greater or Equal Trap if Greater or Equal Immediate Trap if Greater or Equal Immediate Unsigned Trap if Greater or Equal Unsigned Trap if Less Than Trap if Less Than Immediate Trap if Less Than Immediate Unsigned Trap if Less Than Unsigned Trap if Not Equal Trap if Not Equal Immediate Exception Return Level MIPS I MIPS I MIPS II MIPS II MIPS II MIPS II MIPS II MIPS II MIPS II MIPS II MIPS II MIPS II MIPS II MIPS II MIPS III Others Instruction SYNC.stype PREF DI EI Function Synchronize Shared Memory Prefetch Disabled Interrupt Enabled Interrupt Level MIPS II MIPS IV MIPS I MIPS I © SCEI -34- SCE CONFIDENTIAL EE Overview Version 6.0 2.3. 
VPU: Vector Operation Processor

The EE has two on-chip vector operation processors with the same architecture, VPU0 and VPU1, for the floating-point vector operations indispensable to geometry processing. VPU0 is connected to the EE Core via a 128-bit coprocessor bus. The operation resources and registers of VPU0 can be used directly from the EE Core with coprocessor instructions, without going through the main bus. VPU1 is directly connected to the rendering engine, the GS, via the GIF (Graphics Synthesizer Interface Unit). Display lists generated in VPU1 are therefore not transferred to the GS via the main bus. VPU0 and VPU1 each have a packet expansion engine called the VIF (VPU Interface Unit) at the front end; these are named VIF0 and VIF1 respectively.

Figure 2-3 VPU-Related Block Diagram

2.3.1. VPU Architecture

The two VPUs basically have the same architecture, consisting of the VU, VU Mem (data memory for the VU), and VIF (compressed-data decompression engine). The VU is a processor unit consisting of several FMACs (Floating-point Multiply-Add ALUs), an FDIV (Floating-point Divide Calculator), 32 four-parallel floating-point registers, 16 integer registers, and a Micro Mem (program memory). It loads data from the VU Mem in 128-bit units (single-precision floating-point x 4), performs operations according to microprograms placed in the Micro Mem, and stores the results in the VU Mem. Microprograms use a 64-bit-long LIW (Long Instruction Word) instruction set, and can concurrently execute floating-point multiply-add operations in the upper 32-bit field (Upper instruction field) and floating-point divide or integer operations in the lower 32-bit field (Lower instruction field).
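As a point of reference, the four-field parallelism of an Upper-field multiply-add can be sketched in plain C. This is a scalar emulation of the behavior for illustration only, not EE intrinsics or actual VU code:

```c
#include <assert.h>

/* One 128-bit VU floating-point register holds four single-precision
 * fields (x, y, z, w). */
typedef struct { float x, y, z, w; } vec4;

/* Effect of one Upper-field multiply-add: all four FMACs run in
 * parallel, so the whole vector is processed by a single instruction. */
static vec4 vu_madd(vec4 acc, vec4 a, vec4 b) {
    acc.x += a.x * b.x;
    acc.y += a.y * b.y;
    acc.z += a.z * b.z;
    acc.w += a.w * b.w;
    return acc;
}
```

On the real hardware the Lower field could issue a load, store, or integer operation in the same cycle as this multiply-add.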
Figure 2-4 VU Block Diagram

Following are brief descriptions of the VPU units.

FMAC
This unit handles add/subtract, multiply, and multiply-add operations on floating-point numbers. FMACx, FMACy, FMACz, and FMACw are mounted so that four-element vector operations can be executed efficiently. The latency of instructions which use the FMAC is unified at four cycles to increase the efficiency of pipeline processing.

FDIV
This unit performs self-synchronizing floating-point divide/square-root operations. FDIV operations differ from others in latency, so the results are stored in the Q register.

LSU
This unit controls loading and storing to and from VU Mem. Loads and stores must be performed in units of 128 bits, but can be masked in units of the x, y, z, and w fields.

IALU
This unit performs 16-bit integer operations. Loop counter operations and load/store address calculations are performed in conjunction with the integer registers.

BRU
This unit controls jumps and conditional branches.

RANDU
This unit generates random numbers. Random numbers are generated by an M-sequence and stored in the R register.

EFU
This is an elementary function unit, which executes operations such as exponential and trigonometric functions. This unit is mounted only on VU1. Operation results are stored in the P register.

Floating-Point Registers
32 128-bit floating-point registers (VF00 - VF31) are mounted.
Each register can be divided into the 4 fields x, y, z, and w, and is equivalent to a vector of four single-precision floating-point numbers. VF00 is a constant register.

Integer Registers
Sixteen 16-bit integer registers (VI00 - VI15) are mounted. These registers are used as loop counters and for load/store address calculations. VI00 is a constant register.

VU Mem
This is data memory for the VU's exclusive use. Memory capacity is 4 Kbytes for VU0 and 16 Kbytes for VU1. This memory is connected to the LSU at a width of 128 bits, and addresses are aligned on qword boundaries.

Figure 2-5 VU Mem Memory Map (each qword address 0x0000, 0x0010, ... holds the w/z/y/x fields; addresses above 0x0ff0, up to 0x3ff0, are implemented on VU1 only)

Furthermore, VU1 registers are mapped to addresses 0x4000 to 0x43ff in VU0.

Micro Mem
This is on-chip memory which stores microinstruction programs. Memory capacity is 4 Kbytes in VU0 and 16 Kbytes in VU1.

Figure 2-6 Micro Mem Memory Map (each doubleword address 0x0000, 0x0008, ... holds an Upper/Lower instruction pair; addresses above 0x0ff8, up to 0x3ff8, are implemented on VU1 only)

2.3.2. VPU0

VPU0 has a macro mode, which operates according to coprocessor instructions from the EE Core, and a micro mode, which operates independently according to microprograms stored in the Micro Mem. Almost all the instructions used in micro mode are also defined as coprocessor instructions, and are executable directly from the EE Core. Similarly, VPU0 registers can be referred to directly from the EE Core with coprocessor transfer instructions. VPU0 is tightly coupled with the EE Core as mentioned above, and takes charge of relatively small-sized processing.

Figure 2-7 VPU0 Block Diagram

2.3.3. VPU1

VPU1 operates only in micro mode.
VPU1 has a larger Micro Mem and VU Mem than VPU0, and is equipped with an EFU. It is also directly connected to the GIF, and has additional synchronization control instructions, such as transfer to the GIF. Furthermore, it can structure double buffers in VU Mem and has additional functions to perform data transfer and operations in parallel. As mentioned above, VPU1 operates autonomously as a geometry engine, independently of the EE Core. High-speed processing is possible with VPU1, but there are limits to the complexity of the processing it can perform, so it takes charge of the standard portion of three-dimensional graphics processing. VPU1 operation results are transferred from VU Mem1 to the GS via the GIF with the highest priority.

Figure 2-8 VPU1 Block Diagram

2.3.4. VIF: VPU Interface

The VIF functions as a preprocessor for the VPU. The VIF unpacks packed vertex data based on the specification of the tag (VIFtag) at the start of the data, and transfers it to the data memory (VU Mem) of the VPU. As a result, in addition to reducing the data size in main memory, the VIF removes the load of data formatting from the VPU, which has a low degree of programming freedom. The VIF also stores microprograms in Micro Mem and transfers DIRECT data to the GIF according to the VIFtag specification.

2.3.5. Operation Mode and Programming Model

The VU has two execution modes, micro mode and macro mode. In micro mode, the VU functions as a stand-alone processor and executes microprograms stored in Micro Mem. VU1 operates in this mode. In macro mode, the VU executes macroinstructions as COP2 (Coprocessor 2) of the EE Core. VU0 operates primarily in this mode.
Microinstructions are LIW (Long Instruction Word) instructions of 32 bits x 2, and can concurrently execute an Upper instruction, which uses the upper 32 bits of the instruction word, and a Lower instruction, which uses the lower 32 bits. The Upper instruction controls the FMACs, and the Lower instruction controls operations which use the FDIV/EFU/LSU/BRU and the integer registers. With an Upper instruction, all 4 FMACs operate concurrently on 1 instruction, so a four-dimensional vector calculation can be made with a throughput of 1 cycle.

Figure 2-9 Upper Instruction and Lower Instruction (e.g. Upper: MUL VF01,VF02,VF03 issued together with Lower: SQ VF04,VI01)

Operation                                                            Latency  Throughput
4-parallel floating-point multiply + 4-parallel floating-point add      4        1
Floating-point divide                                                   7        7
4 x 4 matrix * 4-row vector                                             8        4
4 x 4 matrix * 4 x 4 matrix                                            20       16
1 vertex processing (matrix * vector + divide)                         19        8

Some microinstructions do not have macroinstruction equivalents, and macro mode cannot execute an Upper instruction and a Lower instruction at the same time. However, macroinstructions include the VCALLMS instruction, which executes a microinstruction program in Micro Mem like a subroutine, and the COP2 data transfer instructions, which transfer data to the VU registers.
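The "4 x 4 matrix * 4-row vector" figures above correspond to four broadcast multiply-accumulate steps, one per matrix column, which is why the throughput is 4 cycles. A plain C emulation of that accumulation pattern (in the style of the VMULAbc/VMADDAbc instructions listed in 2.3.6, not actual VU code):

```c
#include <assert.h>

typedef struct { float x, y, z, w; } vec4;

/* acc += col * s: one broadcast multiply-accumulate, i.e. one field of
 * the vector scales an entire matrix column across the four FMACs. */
static vec4 madd_bc(vec4 acc, vec4 col, float s) {
    acc.x += col.x * s;
    acc.y += col.y * s;
    acc.z += col.z * s;
    acc.w += col.w * s;
    return acc;
}

/* 4x4 matrix * vector in four broadcast steps; m[0..3] are columns. */
static vec4 mat_vec(const vec4 m[4], vec4 v) {
    vec4 acc = {0.0f, 0.0f, 0.0f, 0.0f};
    acc = madd_bc(acc, m[0], v.x);   /* cf. VMULAx  */
    acc = madd_bc(acc, m[1], v.y);   /* cf. VMADDAy */
    acc = madd_bc(acc, m[2], v.z);   /* cf. VMADDAz */
    acc = madd_bc(acc, m[3], v.w);   /* cf. VMADDw  */
    return acc;
}
```

Because each step has a latency of 4 cycles but a throughput of 1, a microprogram that software-pipelines several vertices can keep all four FMACs busy.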
Operation Operation code Instruction set Total number of instructions EFU Register Micro Mode (VU1) Operates as a stand-alone processor 64-bit-long LIW instruction Macro Mode (VU0) Operates as a coprocessor of the EE Core 32-bit MIPS COP2 instruction Upper instruction + Lower instruction (Can be specified concurrently) EFU instruction External unit control instruction 127 instructions Upper instruction Lower instruction (partial) VCALLMS, VCALLMSR instruction COP2 transfer instruction 90 instructions Usable as an option Floating-point register: 32 x 128 bits Integer register: 16 Special register: ACC, I, Q, R (, P) Not supported Floating-point register: 32 x 128 bits Integer register: 16 Special register: ACC, I, Q, R Control register: 16 2.3.6. VPU Instruction Set Overview VPU microinstructions/macroinstructions are listed below. Floating-Point Operation Microinstruction Upper Lower ABS ADD ADDA ADDAbc ADDAi ADDAq ADDbc ADDi ADDq DIV MADD MADDA MADDAbc MADDAi MADDAq MADDbc - Macro-instruction Function VABS VADD VADDA VADDAbc VADDAi VADDAq VADDbc VADDi VADDq VDIV VMADD VMADDA VMADDAbc VMADDAi VMADDAq VMADDbc absolute addition ADD output to ACC ADD output to ACC broadcast bc field ADD output to ACC broadcast I register ADD output to ACC broadcast Q register ADD broadcast bc field ADD broadcast I register ADD broadcast Q register floating divide MUL and ADD MUL and ADD output to ACC MUL and ADD output to ACC broadcast bc field MUL and ADD output to ACC broadcast I register MUL and ADD output to ACC broadcast Q register MUL and ADD broadcast bc field © SCEI -40- SCE CONFIDENTIAL Microinstruction Upper Lower MADDi MADDq MAX MAXbc MAXi MINI MINIbc MINIi MSUB MSUBA MSUBAbc MSUBAi MSUBAq MSUBbc MSUBi MSUBq MUL MULA MULAbc MULAi MULAq MULbc MULi MULq OPMSUB OPMULA RSQRT SQRT SUB SUBA SUBAbc SUBAi SUBAq SUBbc SUBi SUBq - EE Overview Version 6.0 Macro-instruction Function VMADDi VMADDq VMAX VMAXbc VMAXi VMINI VMINIbc VMINIi VMSUB VMSUBA VMSUBAbc VMSUBAi VMSUBAq VMSUBbc 
VMSUBi VMSUBq VMUL VMULA VMULAbc VMULAi VMULAq VMULbc VMULi VMULq VOPMSUB VOPMULA VRSQRT VSQRT VSUB VSUBA VSUBAbc VSUBAi VSUBAq VSUBbc VSUBi VSUBq MUL and ADD broadcast I register MUL and ADD broadcast Q register maximum MAX broadcast bc field MAX broadcast I register minimum MINI broadcast bc field MINI broadcast I register MUL and SUB MUL and SUB output to ACC MUL and SUB output to ACC broadcast bc field MUL and SUB output to ACC broadcast I register MUL and SUB output to ACC broadcast Q register MUL and SUB broadcast bc field MUL and SUB broadcast I register MUL and SUB broadcast Q register multiply MUL output to ACC MUL output to ACC broadcast bc field MUL output to ACC broadcast I register MUL output to ACC broadcast Q register MUL broadcast bc field MUL broadcast I register MUL broadcast Q register outer product MSUB outer product MULA floating reciprocal square-root floating square-root subtraction SUB output to ACC SUB output to ACC broadcast bc field SUB output to ACC broadcast I register SUB output to ACC broadcast Q register SUB broadcast bc field SUB broadcast I register SUB broadcast Q register © SCEI -41- SCE CONFIDENTIAL Format Conversion Microinstruction Upper Lower FTOI0 FTOI12 FTOI15 FTOI4 ITOF0 ITOF12 ITOF15 ITOF4 Integer Operation Microinstruction Upper Lower IADD IADDI IADDIU IAND IOR ISUB ISUBIU EE Overview Version 6.0 Macro-instruction Function VFTOI0 VFTOI12 VFTOI15 VFTOI4 VITOF0 VITOF12 VITOF15 VITOF4 float to integer, fixed point 0 bit float to integer, fixed point 12 bits float to integer, fixed point 15 bits float to integer, fixed point 4 bits integer to float, fixed point 0 bit integer to float, fixed point 12 bits integer to float, fixed point 15 bits integer to float, fixed point 4 bits Macro-instruction Function VIADD VIADDI VIAND VIOR VISUB - Elementary Function Operation Microinstruction Macro-instruction Upper Lower EATAN EATANxy EATANxz EEXP ELENG ERCPR ERLENG ERSADD ERSQRT ESADD ESIN ESQRT ESUM Register-Register Transfer 
Microinstruction Macro-instruction Upper Lower MFIR VMFIR MFP MOVE VMOVE MR32 VMR32 MTIR VMTIR integer ADD integer ADD immediate integer ADD immediate unsigned integer AND integer OR integer SUB integer SUB immediate unsigned Function Elementary-function ArcTAN Elementary-function ArcTAN y/x Elementary-function ArcTAN z/x Elementary-function Exponential Elementary-function Length Elementary-function Reciprocal Elementary-function Reciprocal Length Elementary-function Reciprocal Square and ADD Elementary-function Reciprocal Square-root Elementary-function Square and ADD Elementary-function SIN Elementary-function Square-root Elementary-function Sum Function move from integer register move from P register move floating register move rotate 32 bits move to integer register © SCEI -42- SCE CONFIDENTIAL Load/Store Microinstruction Upper Lower ILW ILWR ISW ISWR LQ LQD LQI SQ SQD SQI Flag Operation Microinstruction Upper Lower FCAND FCEQ FCGET FCOR FCSET FMAND FMEQ FMOR FSAND FSEQ FSOR FSSET Branching Microinstruction Upper Lower B BAL IBEQ IBGEZ IBGTZ IBLEZ IBLTZ IBNE JALR JR EE Overview Version 6.0 Macro-instruction Function VILWR VISWR VLQD VLQI VSQD VSQI integer load word integer load word register integer store word integer store word register Load Quadword Load Quadword with pre-decrement Load Quadword with post-increment Store Quadword Store Quadword with pre-decrement Store Quadword with post-increment Macro-instruction Function - flag-operation clipping flag AND flag-operation clipping flag EQ flag-operation clipping flag get flag-operation clipping flag OR flag-operation clipping flag set flag-operation MAC flag AND flag-operation MAC flag EQ flag-operation MAC flag OR flag-operation status flag AND flag-operation status flag EQ flag-operation status flag OR flag-operation set status flag Macro-instruction Function - branch (PC relative address) branch and link (PC relative address) integer branch on equal integer branch on greater than or equal to zero integer 
branch on greater than zero integer branch on less than or equal to zero integer branch on less than zero integer branch on not equal jump and link register (absolute address) jump register (absolute address)

Random Numbers
Microinstruction (Lower)  Macro-instruction  Function
RGET      VRGET    random-unit get R register
RINIT     VRINIT   random-unit init R register
RNEXT     VRNEXT   random-unit next M sequence
RXOR      VRXOR    random-unit XOR register

Others
Microinstruction (Lower)  Macro-instruction  Function
CLIP      VCLIP    clipping
NOP       VNOP     no operation
WAITP     -        wait P register
WAITQ     VWAITQ   wait Q register
XGKICK    -        eXternal-unit GPU2 Interface Kick
XITOP     -        eXternal-unit read ITOP register
XTOP      -        eXternal-unit read TOP register

2.4. IPU: Image Data Processor

The IPU implements decompression of two-dimensional images, such as texture data and video data. The IPU decompresses data using MPEG2 or a subset of MPEG2, or converts data using VQ (Vector Quantization). Which layer to use depends on the purpose and the properties of the image.

Figure 2-10 IPU Block Diagram (macro-block decoder with VLD, zig-zag scan, IQ, and IDCT stages, plus CSC and VQ units, local buffer memory, and a 128-bit FIFO)

In decoding MPEG2 bit streams, the IPU decodes macroblocks, and the EE Core performs motion compensation in software by using the multimedia instructions. The IPU is also in charge of CSC (Color Space Conversion).

Figure 2-11 Decoding Process Flow for Motion Compensation

2.5. GIF: GS Interface

As a front end to the GS, the GIF formats data based on the specifications of a tag (GIFtag) at the start of the display list packet, and then transfers the formatted data to the GS as a drawing command.
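To make the tag concrete, here is a hedged C sketch of packing the low 64 bits of a GIFtag. The field layout used (NLOOP in bits 0-14, EOP in bit 15, PRE in bit 46, PRIM in bits 47-57, FLG in bits 58-59, NREG in bits 60-63) is taken from the GS documentation rather than from this overview, so treat it as an assumption:

```c
#include <stdint.h>
#include <assert.h>

/* Pack the lower 64 bits of a GIFtag.  Bit positions are assumed from
 * the GS manual: NLOOP 0-14, EOP 15, PRE 46, PRIM 47-57, FLG 58-59,
 * NREG 60-63.  The upper 64 bits (REGS descriptors) are not built here. */
static uint64_t giftag_lo(uint64_t nloop, int eop, int pre,
                          uint64_t prim, uint64_t flg, uint64_t nreg) {
    return  (nloop & 0x7fffULL)
          | ((uint64_t)(eop != 0) << 15)
          | ((uint64_t)(pre != 0) << 46)
          | ((prim & 0x7ffULL) << 47)
          | ((flg  & 0x3ULL)   << 58)
          | ((nreg & 0xfULL)   << 60);
}
```

A display list builder would write this qword, then the REGS qword, then the NLOOP iterations of register data that the tag announces.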
Data is input to the GIF from VU Mem1 via PATH1, from VIF1 via PATH2, and from main memory via PATH3. The GIF also plays a role in data path arbitration. PATH1 is assigned to the transfer of display lists processed in VPU1. PATH2 is assigned to data directly transferable to the rendering engine, e.g. online textures. PATH3 is assigned to the transfer of display lists which have been generated by the EE Core and VPU0 and stored temporarily in main memory. The order of priority is PATH1, PATH2, then PATH3.

Figure 2-12 Data Paths to GS

2.6. SIF: Sub-CPU Interface

The Sub-CPU (IOP) controls sound output and I/O to and from storage devices. It adopts an LMA configuration with memory independent of the EE. The SIF is the interface for exchanging data between these processors. The DMA controllers (DMACs) of the IOP and the EE operate in cooperation through the bidirectional FIFO (SFIFO) in the SIF.

Figure 2-13 EE-IOP Interface

Data is transmitted in units called packets. A tag (DMAtag) is attached to each packet, containing a memory address in the IOP memory space, a memory address in the EE memory space, and the data size. The IOP-DMAC reads the IOP memory address and data size from the tag, and transmits the packet with its tag to the SIF. The EE-DMAC reads the packet from the SIF, interprets the first word as a tag, reads the EE memory address and data size from the tag, and stores the data at the specified memory address. These transfer operations are performed by the DMACs without generating unnecessary interrupts of the CPU.
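The tag-then-payload framing can be modeled in a few lines of C. The struct below is a deliberately simplified, hypothetical stand-in for the real SIF DMAtag, whose exact bit layout this overview does not give; only the handshake idea is illustrated — the receiving DMAC reads the leading tag and places the payload at the address the tag names:

```c
#include <stdint.h>
#include <string.h>
#include <assert.h>

/* Hypothetical simplified tag: the real SIF DMAtag packs its fields
 * into a 128-bit word whose exact layout is not shown here. */
typedef struct {
    uint32_t dest_addr;   /* destination address in receiver memory   */
    uint32_t qwc;         /* payload length in qwords (16 bytes each) */
} sif_tag;

/* Receiver-side DMAC model: interpret the first 16 bytes of the packet
 * as the tag, then copy the payload to the tagged address. */
static void sif_receive(const uint8_t *packet, uint8_t *mem) {
    sif_tag tag;
    memcpy(&tag, packet, sizeof tag);
    memcpy(mem + tag.dest_addr, packet + 16, (size_t)tag.qwc * 16);
}
```

The point of the scheme is that neither CPU is interrupted per packet: the sender's DMAC frames the data, and the receiver's DMAC unpacks it autonomously.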
Figure 2-14 SIF Data Flow

3. Functional Overview

3.1. Data Transfer via DMA

Data is transferred between main memory, peripheral processors, and scratchpad memory (SPR) via DMA. The unit of data transfer is a quadword (128 bits = 1 qword). In data transfer to and from peripheral processors, data is divided into blocks (slices) of 8 qwords.

On some of the channels, Chain mode is available. This mode performs processing such as switching transfer addresses according to the tag (DMAtag) in the transfer data. This not only reduces processing such as data sorting before transfer, but also enables data exchange between peripheral processors through the mediation of main memory, without involving the EE Core. For such transfers, a stall control function, which mutually synchronizes them, is available. For the GIF channel, a memory FIFO function that uses a ring buffer in main memory is also provided.

3.1.1. Sliced Transfer

Except for data transfer between the SPR and main memory, DMA transfer is performed by slicing the data every 8 qwords and arbitrating the transfer requests from each channel. A channel releases the bus right temporarily whenever transfer of one slice is completed, and continues transferring if there are no requests from other channels. This sliced-transfer mechanism not only enables two or more transfer processes to be executed in parallel, but also allows the EE Core to access main memory during the transfer process. The following figure illustrates DMA transfers performed concurrently on channels A and B.
Figure 3-1 Example of Sliced Transfer (channels A and B raise DREQs, transfer 8-qword slices in turn, and stall while the other channel or the CPU holds the bus)

3.1.2. Chain Mode Transfer

Source Chain Mode

Source Chain Mode is used for DMA transfer from memory to peripherals. In this mode, the transfer address and transfer data size are specified by tag data (DMAtags) in the packet. The DMAC repeats transfer processing while tracing the tags in memory, and ends the series of transfers once the tag carrying the end instruction has been processed.

The DMAtag is 128-bit data with the following structure:

  127:64  (Arbitrary)
  63:32   ADDR    Address specification
  31:24   ID/FLG  Transfer operation (ID) and flags
  15:0    QWC     Data size

ID is the field in which the details of the transfer operation are specified. The eight types in the table below can be specified.

  ID    Transfer data position  Next tag position      Operation
  cnt   Next to tag             Next to transfer data  Transfers the data following the tag and proceeds to the succeeding data.
  next  Next to tag             Specified in tag       Transfers the data following the tag and jumps to the specified position.
  ref   Specified in tag        Next to tag            Transfers the data at the specified position.
  refs  Specified in tag        Next to tag            Transfers the data at the specified position while applying stall control.
  refe  Specified in tag        (None)                 Transfers the data at the specified position and ends transfer.
  call  Next to tag             Specified in tag       Transfers the data following the tag, stores the next address, and jumps to the specified position.
  ret   Next to tag             Position stored when   Transfers the data following the tag and jumps to the position stored when call was specified.
                                call was specified
  end   Next to tag             (None)                 Transfers the data following the tag and ends transfer.

Data transfers can be performed most efficiently by using these IDs appropriately according to the data structures in memory. The following is an example.
Figure 3-2 Source Chain DMA Tags Showing Data Structures (three example chains traced from TADR: a next-tag chain that transfers data and jumps among ADDR0, ADDR1, and ADDR2; a ref-tag chain that transfers DATA 0 through DATA 2 in place; and a call/ret-tag chain that nests jumps and returns before an end tag finishes the transfer)

Destination Chain Mode

Destination Chain Mode is used to transfer data from peripherals to memory. A tag (DMAtag) bearing the destination address and packet length is placed at the start of each transfer packet. This enables the peripheral side to control the address at which the data is stored.

The Destination Chain tag is 128-bit data with the same basic structure (ADDR in bits 63:32, ID/FLG in bits 31:24, QWC in bits 15:0), and is classified into the three types shown in the table below.

  ID    Destination address  Operation
  cnt   Specified in tag     Stores the data following the tag at the specified address.
  cnts  Specified in tag     Stores the data following the tag at the specified address while applying stall control.
  end   Specified in tag     Stores the data following the tag at the specified address and ends transfer.

The following is an example.

Figure 3-3 Destination Chain DMA to Transfer Data to Specified Address (in the upper example, packets from the SPR carry cnt tags with ADDR=ADDR1 and ADDR=ADDR0, so DATA 1 and DATA 0 are stored at those main memory addresses; in the lower example, packets arriving through the SIF FIFO are scattered into main memory the same way)

3.1.3. Interleave Transfer
Interleave mode is available for DMA transfer between main memory and the SPR. This mode processes data in such a way that a small rectangular area is cut out from, or fitted into, the two-dimensional data (image data) allocated in memory. Figure 3-4 illustrates cutting a small rectangular area (TW, TH) out of a larger rectangular area (FW, FH).

Figure 3-4 Cutting Out a Small Rectangular Area in Interleave Mode (rows T0, T1, T2, ... of width TW are gathered from the FW-wide source image in main memory, skipping the remainder of each FW-wide row, and packed contiguously into the SPR)

3.1.4. Stall Control

When a transfer from a peripheral to memory and a transfer from that memory to another peripheral are performed concurrently, they can be synchronized through the stall address register (D_STADR). The channel that handles the DMA transfer to memory is called the source channel; the channel that handles the DMA transfer from memory is called the drain channel. The value of D_STADR is updated as transfer processing on the source channel advances, and transfer processing on the drain channel stalls at the address immediately preceding the D_STADR address. This mechanism is called stall control.

Figure 3-5 Synchronization between DMA Transfers by Stall Control (Peripheral0 writes 8-qword slices into memory, advancing D_STADR; the drain DMA to Peripheral1 follows behind, never passing the D_STADR address)

3.1.5. MFIFO

A FIFO function can be implemented by using a ring buffer and DMAtags set up in main memory when transferring data from the scratchpad memory to VIF1 or the GIF. This is called the MFIFO (Memory FIFO).

Figure 3-6 Memory FIFO (MFIFO) (the EE Core or SPR writes into a ring buffer in main memory whose position and size are given by the D_RBOR and D_RBSR registers; the VIF1/GIF channel reads from the buffer, tracing its tag and memory address registers)

3.2. Data Transfer to VPU

The EE has two built-in VPUs.
These floating-point vector processors execute matrix operations, coordinate transformations, perspective transformations, and so on, at high speed. Data is DMA-transferred to a VPU through its VIF. Header information (VIFcodes) embedded in the transfer data specifies how the data is to be processed in the VPU. This is the mechanism of DMA transfer to the VPU.

3.2.1. VIF Overview

The VIF is an interface unit which decompresses DMA-transferred packet data and transfers it to VPU memory. The decompression method and the destination memory address are set according to the VIFcodes included in the VIF packet. By transferring VIF packets of vector data, VIF packets containing a microinstruction program, and VIF packets that instruct the VPU to activate a microinstruction program, the VPU can perform operations independently of the EE Core.

The data types the VIF can decompress and transfer to VU Mem are one- to four-dimensional vectors of 8-bit, 16-bit, or 32-bit elements, and a four-dimensional vector of the 16-bit color type RGBA 5:5:5:1. In addition, the VIF can transfer microinstruction code to Micro Mem. VIF1 can also transfer data to the GS via the GIF.

3.2.2. VIF Packet

According to the 32-bit VIFcodes in the transferred data, the VIF decompresses the data that follows each code and writes it to memory and registers in the VU. A VIFcode and its following data string are together called a VIF packet. Several VIF packets can exist in one DMA packet, as shown in the figure below.
The figure below shows two example DMA packets (128 bits wide, MSB on the left). In the first, a DMAtag occupies the head of the packet and is followed by VIFcode0 through VIFcode5, each with its own data; in the second, the DMAtag is not transferred to the VIF, and the packet begins directly with VIFcode0.

Figure: DMA packet examples (with and without the DMAtag transferred)

The VIF packets contained in the above DMA packets are:

  VIFcode0  data
  VIFcode1  data x 12
  VIFcode2  data x 2
  VIFcode3  data x 3
  VIFcode4
  VIFcode5

3.2.3. VIFcode Structure

The VIFcode is 32 bits in length, consisting of the CMD field (8 bits), the NUM field (8 bits), and the IMMEDIATE field (16 bits):

  31:24  CMD
  23:16  NUM
  15:0   IMMEDIATE

The CMD field tells the VIF which operation to perform and how to decompress the following data. The meanings of the NUM and IMMEDIATE fields change according to the value of the CMD field.

  Category          CMD       Function                                               Following data
  Data transfer     UNPACK    Decompresses data and writes it to VU Mem.             Packed vector data
                    STCYCL    Sets the CYCLE register value.                         None
                    OFFSET    Sets the OFFSET register value. (VIF1 only)            None
                    STMOD     Sets the MODE register value.                          None
                    STMASK    Sets the MASK register value.                          Mask pattern
                    STROW     Sets the Row register value.                           Row-completion data
                    STCOL     Sets the Col register value.                           Column-completion data
                    MPG       Loads a microprogram.                                  Microinstruction program
  Microprogram      FLUSHE    Waits for end of a microprogram.                       None
  execution         FLUSH     Waits for end of a microprogram and end of GIF         None
                              (PATH1/PATH2) transfer. (VIF1 only)
                    FLUSHA    Waits for end of a microprogram and end of GIF         None
                              transfer. (VIF1 only)
                    MSCAL     Activates a microprogram.                              None
                    MSCNT     Executes a microprogram continuously.                  None
                    MSCALF    FLUSHes and activates a microprogram. (VIF1 only)      None
  Double buffering  BASE      Sets the BASE register value. (VIF1 only)              None
                    ITOP      Sets the ITOPS register value.                         None
  GS data transfer  DIRECT    Transfers data to the GIF (via PATH2).                 GS data
  (VIF1 only)       DIRECTHL  Transfers data to the GIF (via PATH2).                 GS data
  Others            MSKPATH3  Masks transfer via PATH3 to the GIF.                   None
                    NOP       No operation.                                          None
                    MARK      Sets the MARK register value.                          None
3.2.4. Data Transfer by UNPACK

The most common data transfer via the VIF is transfer to VU Mem using the VIFcode UNPACK. The data following the VIFcode is packed data, e.g. 8 bits x 4 elements or 32 bits x 3 elements. The VIF decompresses the packed data into vector data of 32 bits x 4 elements and writes it to VU Mem. At this time, VU Mem fields left blank can be filled with a VPU register value (supplementation), and a constant offset value can be added to the transfer data (addition). The packing formats are listed below.

  Format  Data length   No. of elements (dimensions)
  S-32    32 bits       1
  S-16    16 bits       1
  S-8     8 bits        1
  V2-32   32 bits       2
  V2-16   16 bits       2
  V2-8    8 bits        2
  V3-32   32 bits       3
  V3-16   16 bits       3
  V3-8    8 bits        3
  V4-32   32 bits       4
  V4-16   16 bits       4
  V4-8    8 bits        4
  V4-5    5+5+5+1 bits  4

3.2.5. Double Buffering

VPU1 supports double buffering, which sets two buffer areas in VU Mem and enhances throughput by transferring data to one buffer while a microprogram operates on the other.

Figure 3-7 Double Buffering in VU Mem (while the VIF transfers incoming data to the buffer indicated by TOPS, the VU processes previously transferred data in the buffer indicated by TOP; the roles of buffers A and B then swap)

Double buffer addresses can be set with the VIF1_BASE and VIF1_OFST registers.
The values of VIF1_BASE and VIF1_OFST are reflected in the VIF1_TOPS register, and from there in the TOP register of VU1, as microprograms are activated. By setting the FLG bit in the VIFcode UNPACK, data can be transferred to the double buffers at addresses relative to the address held in the TOPS register. When a microprogram reads data from the double buffers, it reads the TOP register value with the XTOP instruction and accesses the data in the corresponding buffer. The values of TOPS and TOP are exchanged whenever a microprogram is activated, so transferred data can be processed by a microprogram while data is transferred to the two buffers alternately, simply by repeating data transfer and microprogram activation.

3.3. Data Transfer to GS

Regular display lists generated by VU1 and exceptional display lists generated by the EE Core and VU0 are transferred concurrently, with the transfer right arbitrated by the GIF. This is the typical data flow from the EE to the GS. The following are brief descriptions of this data flow.

3.3.1. Data Transfer Route

The GIF has three general data transfer paths, called PATH1, PATH2, and PATH3. They work as follows.

• PATH1
PATH1 is a data transfer path from VPU1 data memory (VU Mem1) to the GS. Transfer via this path is performed when VU1 executes the XGKICK instruction.

• PATH2
PATH2 is a data transfer path between the FIFO inside the VPU1 VIF and the GIF. This path is used when the VIF executes the DIRECT/DIRECTHL instructions, and when transferring data from the GS to main memory by using the image data transfer function of the GS.

• PATH3
PATH3 is a direct data transfer path from the EE main bus to the GIF. This path is used when transferring data from main memory or the SPR to the GS.

Priority and Timing

The three general data transfer paths are prioritized as PATH1 > PATH2 > PATH3.
Whenever transfer of a GS packet (described later in this chapter) ends on one path, transfer requests from the other paths are checked. If there is a request, transfer processing is performed according to priority.

Access to GS Privileged Registers

The privileged registers of the GS are directly mapped into the I/O space of the EE Core, and are accessible without using the GIF, regardless of the state of the general data transfer paths. The GIF monitors access to the privileged registers. When the transfer direction switching register (BUSDIR) is accessed, the GIF switches the data transfer direction accordingly.

3.3.2. Data Format

GS Packet

The basic unit of data transferred by the GIF is the GS primitive, consisting of header information (the GIFtag) and following data. Transfer processing, however, is performed in units of GS packets, in which several GS primitives are gathered. The last GS primitive in a GS packet is marked by the termination information (EOP=1) in its GIFtag.

Figure 3-8 GS Packet Structure (a GS packet is a sequence of GS primitives, each a GIFtag with EOP=0 followed by data, terminated by a final primitive whose GIFtag has EOP=1)

The above data structure is common to all data transfer paths. For PATH2 and PATH3, however, a VIFcode or DMAtag precedes the GS packet. The GIFtag and data must be aligned on a 128-bit boundary in memory.

GIFtag

The GIFtag has a fixed length of 128 bits, and specifies the size and structure of the following data and the data format (mode). Its fields are laid out as follows (bit positions):

  127:64  REGS (max. 16 descriptors)
  63:60   NREG
  59:58   FLG
  57:47   PRIM
  46      PRE
  15      EOP
  14:0    NLOOP
  Name   Pos.     Contents
  NLOOP  14:0     Repeat count (GS primitive data size)
  EOP    15       Termination information (End of Packet)
  PRE    46       PRIM field enabled
  PRIM   57:47    Data to be set in the PRIM register of the GS
  FLG    59:58    Data format
                    00 PACKED mode
                    01 REGLIST mode
                    10 IMAGE mode
                    11 Disabled (same operation as IMAGE mode)
  NREG   63:60    Number of register descriptors in the REGS field
  REGS   127:64   Register descriptors (4 bits x 16 max.)

The value of the NLOOP field gives the data size of the GS primitive, but the unit varies depending on the data format.

3.3.3. PACKED Mode

PACKED mode formats (packs) vertex coordinate values, texture coordinate values, and color values generated as vector data of 32 bits x 4 elements, adjusting them to the corresponding bit fields of the GS registers, and writes them to the GS registers. The register descriptors in the REGS field of the GIFtag correspond one-to-one to the qwords of the following data, and indicate the data format and the register to which each qword is written. The following 9 types of register descriptors are available:

  Name   Input data                                      Destination register
  PRIM   Type and attributes of primitive                PRIM
  RGBAQ  Vertex color                                    RGBAQ
  ST     Vertex texture coordinates                      ST
  UV     Vertex texture coordinates (texel coordinates)  UV
  XYZF2  Vertex coordinates + fog coefficient            XYZF2/XYZF3
  XYZ2   Vertex coordinates                              XYZ2/XYZ3
  FOG    Fog coefficient                                 FOG
  A+D    Arbitrary register set value                    Specified arbitrarily
  NOP    Arbitrary                                       None (not output)

3.3.4. REGLIST Mode

REGLIST mode transfers data strings already formatted so that they can be written to the GS registers as they are. The data following the GIFtag is treated as data strings of 64 bits x 2, and the register descriptors in the REGS field of the GIFtag indicate the register to which each is written.

3.3.5. IMAGE Mode

IMAGE mode transfers image data by means of the host-local transfer function of the GS.
The data following the GIFtag is treated as data strings of 64 bits x 2 and is written consecutively to the HWREG register of the GS.

3.4. Image Decompression by IPU

The IPU (Image Processing Unit) is an image data processor whose main functions are bit stream decompression and macroblock decoding of MPEG2. Compressed data in main memory is decoded, decompressed, and written back to main memory. The decoded images are transferred to the GS and used as moving-picture image data and texture data. Figure 3-9 illustrates the basic processing flow of the IPU.

Figure 3-9 IPU Processing Flow (an input BS128 bit stream passes through VLC decoding to CODE16, then through IDCT to RAW16; RAW8 data passes through CSC to RGB32, then through dithering to RGB16, and finally through VQ to INDX4)

The IPU has the following basic functions:
• MPEG2 macroblock layer decoding
• MPEG2 bit stream decoding
• Bit stream decompression

The IPU has the following additional post-processing functions:
• YCbCr → RGB color space conversion (CSC)
• 4 x 4 ordered dither
• Vector quantization (VQ)

The IPU handles the following data formats:

  Name   Contents                                 Width
  BS128  MPEG2 bit stream subset                  128 bits
  RGB32  RGBA pixels (A8+R8+G8+B8)                32 bits
  RGB16  RGBA pixels (A1+R5+G5+B5)                16 bits
  RAW8   Unsigned 8-bit YCbCr pixels              8 bits
  RAW16  Signed 16-bit YCbCr pixels               16 bits
         (only lower 9 bits are effective)
  INDX4  Unsigned 4-bit index pixels              4 bits

The following commands are available:

  Name   Contents                               Input  Output
  BCLR   Input FIFO initialization command      -      -
  IDEC   Intra decoding command                 BS128  RGB32/RGB16
  BDEC   Block decoding command                 BS128  RAW16
  VDEC   Variable-length data decoding command  BS128  Decoded variable-length code
  FDEC   Fixed-length data decoding command     BS128  Fixed-length data
  SETIQ  IQ table setting command               RAW8   -
  SETVQ  VQ table setting command               RGB16  -
  CSC    Color space conversion command         RAW8   RGB32/RGB16
  PACK   Format conversion command              RGB32  RGB16/INDX4
  SETTH  Threshold setting command              -      -

Other functional features are as follows.
• Motion Compensation (MC)
In decoding an MPEG2 bit stream, motion compensation (MC) is performed not in the IPU but in the EE Core, using the multimedia instructions.

• Automatic Generation of Alpha
The alpha plane (transparency plane) is generated from the decoded luminance values according to a fixed rule. This is useful for cutting out a texture pattern effectively when decoding a bit stream that has no stencil pattern (transparent-pixel mask pattern).