1991_IDT_RISC_R300A_R3001_R3051_Product_Information 1991 IDT RISC R300A R3001 R3051 Product Information
Page Count: 164
1991 IDT RISC R3000A, R3001, R3051™ Family Product Information

Integrated Device Technology, Inc.

Integrated Device Technology, Inc. reserves the right to make changes to its products or specifications at any time, without notice, in order to improve design or performance and to supply the best possible product. IDT does not assume any responsibility for use of any circuitry described other than the circuitry embodied in an IDT product. The Company makes no representations that circuitry described herein is free from patent infringement or other rights of third parties which may result from its use. No license is granted by implication or otherwise under any patent, patent rights or other rights, of Integrated Device Technology, Inc.

LIFE SUPPORT POLICY

Integrated Device Technology's products are not authorized for use as critical components in life support devices or systems unless a specific written agreement pertaining to such intended use is executed between the manufacturer and an officer of IDT.

1. Life support devices or systems are devices or systems which (a) are intended for surgical implant into the body or (b) support or sustain life and whose failure to perform, when properly used in accordance with instructions for use provided in the labeling, can be reasonably expected to result in a significant injury to the user.
2. A critical component is any component of a life support device or system whose failure to perform can be reasonably expected to cause the failure of the life support device or system, or to affect its safety or effectiveness.

The IDT logo is a registered trademark of Integrated Device Technology, Inc. CEMOS, IDT79R3051, IDT79R3052, and RISController are trademarks of Integrated Device Technology, Inc. UNIX is a registered trademark of AT&T.

TABLE OF CONTENTS

IDT79R3000                     RISC CPU Processor
IDT79R3000A, IDT79R3000AE      RISC CPU Processor
IDT79R3001                     RISController™
IDT79R3010                     RISC Floating-Point Accelerator (FPA)
IDT79R3010A, IDT79R3010AE      RISC Floating-Point Accelerator (FPA)
IDT79R3020                     RISC CPU Write Buffer
IDT79R3051™/E, IDT79R3052™/E   IDT79R3051 Family of Integrated RISControllers™
Package Information

IDT79R3000 RISC CPU PROCESSOR

FEATURES:
• Enhanced instruction set compatible version of the IDT79R2000 RISC CPU.
• Full 32-bit Operation: Thirty-two 32-bit registers, and all instructions and addresses are 32-bit.
• Efficient Pipelining: The CPU's 5-stage pipeline design assists in obtaining an execution rate approaching one instruction per cycle. Pipeline stalls and exceptions are handled precisely and efficiently.
• On-Chip Cache Control: The IDT79R3000 provides a high bandwidth memory interface that handles separate external Instruction and Data Caches ranging in size from 4 to 256 Kbytes each. Both caches are accessed during a single CPU cycle. All cache control is on-chip.
• On-Chip Memory Management Unit: A fully-associative, 64 entry Translation Lookaside Buffer (TLB) provides fast address translation for virtual-to-physical memory mapping of the 4 Gigabyte virtual address space.
• Coprocessor Interface: The IDT79R3000 generates all addresses and handles memory interface control for up to three additional tightly coupled external processors.
• Optimizing Compilers are available for C, Fortran, Pascal, COBOL, Ada, and PL/1.
• UNIX™ System V.3 and BSD 4.3 operating systems supported.
• High-speed CEMOS™ technology.
• Instruction set compatible with the IDT79R2000 RISC CPU.
• 16.7MHz, 20MHz, 25MHz and 33MHz clock rates yield up to 28 MIPS sustained throughput.
• Supports independent multiword block refill of both the instruction and data caches with variable block sizes.
• Supports concurrent refill and execution of instructions.
• Partial word stores executed as read-modify-write operations.
• 6 external interrupt inputs (up to 64 different sources), 2 software interrupts, with single cycle latency to exception handler routine.
• Flexible multiprocessing support on chip with no impact on uniprocessor designs.
• Military product compliant to MIL-STD-883, Class B.

DESCRIPTION:
The IDT79R3000 RISC Microprocessor consists of two tightly-coupled processors integrated on a single chip. The first processor is a full 32-bit CPU based on RISC (Reduced Instruction Set Computer) principles to achieve a new standard of microprocessor performance. The second processor is a system control coprocessor, called CP0, containing a fully-associative 64 entry TLB (Translation Lookaside Buffer), MMU (Memory Management Unit) and control registers, supporting a 4 Gigabyte virtual memory subsystem, and a Harvard Architecture Cache Controller achieving a bandwidth of over 260 Mbytes/second using industry standard static RAMs.

This data sheet provides an overview of the features and architecture of the 79R3000 CPU, Revision 2.0. A more detailed description of the operation of the device is incorporated in the "R3000 Family Hardware User Manual", and a more detailed architectural overview is provided in the "MIPS RISC Architecture" book, both available from IDT. Documentation providing details of the software and development environments supporting this processor is also available from IDT.
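As a rough cross-check, the cache bandwidth figure quoted in the description can be reproduced with simple arithmetic. The sketch below rests on an assumption that is not stated in the data sheet: that the figure counts one 32-bit instruction fetch plus one 32-bit data access per CPU cycle and ignores the parity bits. The function name peak_cache_bandwidth_mbytes is hypothetical.

```python
# Back-of-envelope check of the "over 260 Mbytes/second" cache bandwidth figure.
# Assumption (not from the data sheet): one 32-bit instruction fetch and one
# 32-bit data access complete in every CPU cycle; parity bits are not counted.

BYTES_PER_WORD = 4        # 32-bit data bus
ACCESSES_PER_CYCLE = 2    # one I-cache access plus one D-cache access


def peak_cache_bandwidth_mbytes(clock_mhz: float) -> float:
    """Peak cache bandwidth in Mbytes/second (1 Mbyte taken as 10**6 bytes)."""
    return clock_mhz * BYTES_PER_WORD * ACCESSES_PER_CYCLE


if __name__ == "__main__":
    # Clock rates listed under FEATURES.
    for clock_mhz in (16.7, 20.0, 25.0, 33.0):
        bandwidth = peak_cache_bandwidth_mbytes(clock_mhz)
        print(f"{clock_mhz:5.1f} MHz -> {bandwidth:6.1f} Mbytes/s")
```

Under these assumptions the 33MHz part works out to 264 Mbytes/second, which is consistent with the "over 260 Mbytes/second" quoted in the description.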
[Block diagram of the IDT79R3000 processor. Labels: CP0 (System Control Coprocessor), CPU, Exception/Control Registers, Memory Management Unit Registers, Translation Lookaside Buffer (64 entries), Local Control Logic, Multiplier/Divider, Address Adder, PC Increment/Mux, Tag (20+4), Address (18), Data (32+4).]

MILITARY AND COMMERCIAL TEMPERATURE RANGES
© 1990 Integrated Device Technology, Inc.
MARCH 1990
DSC-9021 12

IDT79R3000 CPU Registers
The IDT79R3000 CPU provides 32 general purpose 32-bit registers, a 32-bit Program Counter, and two 32-bit registers that hold the results of integer multiply and divide operations. Only two of the 32 general registers have a special purpose: register r0 is hardwired to the value "0", which is a useful constant, and register r31 is used as the link register in jump-and-link instructions (return address for subroutine calls).

The CPU registers are shown in Figure 2. Note that there is no Program Status Word (PSW) register shown in this figure: the functions traditionally provided by a PSW register are instead provided in the Status and Cause registers incorporated within the System Control Coprocessor (CP0).

[Figure 2 shows the general purpose registers r0 through r31, the multiply/divide registers HI and LO, and the program counter PC, each 32 bits wide (bits 31 to 0).]
Figure 2. IDT79R3000 CPU Registers

Instruction Set Overview
All IDT79R3000 instructions are 32 bits long, and there are only three instruction formats. This approach simplifies instruction decoding, thus minimizing instruction execution time. The 79R3000 processor initiates a new instruction on every run cycle, and is able to complete an instruction on almost every clock cycle. The only exceptions are the Load instructions and Branch instructions, which each have a single cycle of latency associated with their execution. Note, however, that in the majority of cases the compilers are able to fill these latency cycles with useful instructions which do not require the result of the previous instruction. This effectively eliminates these latency effects.

The actual instruction set of the CPU was determined after extensive simulations to determine which instructions should be implemented in hardware, and which operations are best synthesized in software from other basic instructions. This methodology resulted in the R3000 having the highest performance of any available microprocessor. Table 1 lists the instruction set of the IDT79R3000 processor.

FORMAT              FIELDS (bit positions)
I-Type (Immediate)  op [31:26]  rs [25:21]  rt [20:16]  immediate [15:0]
J-Type (Jump)       op [31:26]  target [25:0]
R-Type (Register)   op [31:26]  rs [25:21]  rt [20:16]  rd [15:11]  re [10:6]  funct [5:0]
Figure 3. IDT79R3000 Instruction Formats

The IDT79R3000 instruction set can be divided into the following groups:
• Load/Store instructions move data between memory and general registers. They are all I-type instructions, since the only addressing mode supported is base register plus 16-bit, signed immediate offset. The Load instruction has a single cycle of latency, which means that the data being loaded is not available to the instruction immediately after the load instruction. The compiler will fill this delay slot with either an instruction which is not dependent on the loaded data, or with a NOP instruction. There is no latency associated with the store instruction. Loads and Stores can be performed on byte, half-word, word, or unaligned word data (32-bit data not aligned on a modulo-4 address). The CPU cache is constructed as a write-through cache.
• Computational instructions perform arithmetic, logical and shift operations on values in registers. They occur in both R-type (both operands and the result are registers) and I-type (one operand is a 16-bit immediate) formats. Note that computational instructions are three operand instructions; that is, the result of the operation can be stored into a different register than either of the two operands. This means that operands need not be overwritten by arithmetic operations, which results in a more efficient use of the large register set.
• Jump and Branch instructions change the control flow of a program. Jumps are always to a paged absolute address formed by combining a 26-bit target with four bits of the Program Counter (J-type format, for subroutine calls), or 32-bit register byte addresses (R-type, for returns and dispatches). Branches have 16-bit offsets relative to the program counter (I-type). Jump and Link instructions save a return address in Register 31. The 79R3000 instruction set features a number of branch conditions. Included is the ability to compare a register to zero and branch, and also the ability to branch based on a comparison between two registers. Thus, net performance is increased since software does not have to perform arithmetic instructions prior to the branch to set up the branch conditions.
• Coprocessor instructions perform operations in the coprocessors. Coprocessor Loads and Stores are I-type. Coprocessor computational instructions have coprocessor-dependent formats (see coprocessor manuals).
• Coprocessor 0 instructions perform operations on the System Control Coprocessor (CP0) registers to manipulate the memory management and exception handling facilities of the processor.
• Special instructions perform a variety of tasks, including movement of data between special and general registers, system calls, and breakpoint. They are always R-type.

OP        DESCRIPTION

Load/Store Instructions
LB        Load Byte
LBU       Load Byte Unsigned
LH        Load Halfword
LHU       Load Halfword Unsigned
LW        Load Word
LWL       Load Word Left
LWR       Load Word Right
SB        Store Byte
SH        Store Halfword
SW        Store Word
SWL       Store Word Left
SWR       Store Word Right

Arithmetic Instructions (ALU Immediate)
ADDI      Add Immediate
ADDIU     Add Immediate Unsigned
SLTI      Set on Less Than Immediate
SLTIU     Set on Less Than Immediate Unsigned
ANDI      AND Immediate
ORI       OR Immediate
XORI      Exclusive OR Immediate
LUI       Load Upper Immediate

Arithmetic Instructions (3-operand, register-type)
ADD       Add
ADDU      Add Unsigned
SUB       Subtract
SUBU      Subtract Unsigned
SLT       Set on Less Than
SLTU      Set on Less Than Unsigned
AND       AND
OR        OR
XOR       Exclusive OR
NOR       NOR

Shift Instructions
SLL       Shift Left Logical
SRL       Shift Right Logical
SRA       Shift Right Arithmetic
SLLV      Shift Left Logical Variable
SRLV      Shift Right Logical Variable
SRAV      Shift Right Arithmetic Variable

Multiply/Divide Instructions
MULT      Multiply
MULTU     Multiply Unsigned
DIV       Divide
DIVU      Divide Unsigned
MFHI      Move From HI
MTHI      Move To HI
MFLO      Move From LO
MTLO      Move To LO

Jump and Branch Instructions
J         Jump
JAL       Jump and Link
JR        Jump to Register
JALR      Jump and Link Register
BEQ       Branch on Equal
BNE       Branch on Not Equal
BLEZ      Branch on Less than or Equal to Zero
BGTZ      Branch on Greater Than Zero
BLTZ      Branch on Less Than Zero
BGEZ      Branch on Greater than or Equal to Zero
BLTZAL    Branch on Less Than Zero and Link
BGEZAL    Branch on Greater than or Equal to Zero and Link

Special Instructions
SYSCALL   System Call
BREAK     Break

Coprocessor Instructions
LWCz      Load Word from Coprocessor
SWCz      Store Word to Coprocessor
MTCz      Move To Coprocessor
MFCz      Move From Coprocessor
CTCz      Move Control to Coprocessor
CFCz      Move Control From Coprocessor
COPz      Coprocessor Operation
BCzT      Branch on Coprocessor z True
BCzF      Branch on Coprocessor z False

System Control Coprocessor (CP0) Instructions
MTC0      Move To CP0
MFC0      Move From CP0
TLBR      Read indexed TLB entry
TLBWI     Write Indexed TLB entry
TLBWR     Write Random TLB entry
TLBP      Probe TLB for matching entry
RFE       Restore From Exception

Table 1. IDT79R3000 Instruction Summary

IDT79R3000 System Control Coprocessor (CP0)
The IDT79R3000 can operate with up to four tightly-coupled coprocessors (designated CP0 through CP3). The System Control Coprocessor (or CP0) is incorporated on the IDT79R3000 chip and supports the virtual memory system and exception handling functions of the IDT79R3000. The virtual memory system is implemented using a Translation Lookaside Buffer and a group of programmable registers as shown in Figure 4.

[Figure 4 shows the CP0 registers: EntryHi, EntryLo, Index and Random alongside the TLB (entries 0 to 63; entries 0 to 7 are not accessed by Random). Shading distinguishes registers used with the virtual memory system from those used with exception processing.]
Figure 4. The System Coprocessor Registers

System Control Coprocessor (CP0) Registers
The CP0 registers shown in Figure 4 are used to control the memory management and exception handling capabilities of the IDT79R3000. Table 2 provides a brief description of each register.

REGISTER  DESCRIPTION
EntryHi   High half of a TLB entry
EntryLo   Low half of a TLB entry
Index     Programmable pointer into TLB array
Random    Pseudo-random pointer into TLB array
Status    Mode, interrupt enables, and diagnostic status info
Cause     Indicates nature of last exception
EPC       Exception Program Counter
Context   Pointer into kernel's virtual Page Table Entry array
BadVA     Most recent bad virtual address
PRId      Processor revision identification (Read only)

Table 2. System Control Coprocessor (CP0) Registers

Memory Management System
The IDT79R3000 has an addressing range of 4 Gbytes. However, since most IDT79R3000 systems implement a physical memory smaller than 4 Gbytes, the IDT79R3000 provides for the logical expansion of memory space by translating addresses composed in a large virtual address space into available physical memory addresses. The 4 Gbyte address space is divided into 2 Gbytes which can be accessed by both the users and the kernel, and 2 Gbytes for the kernel only.

The TLB (Translation Lookaside Buffer)
Virtual memory mapping is assisted by the Translation Lookaside Buffer (TLB). The on-chip TLB provides very fast virtual memory access and is well-matched to the requirements of multitasking operating systems. The fully-associative TLB contains 64 entries, each of which maps a 4-Kbyte page, with controls for read/write access, cacheability, and process identification. The TLB allows each user to access up to 2 Gbytes of virtual address space.

Figure 5 illustrates the format of each TLB entry. The translation operation involves matching the current Process ID (PID) and upper 20 bits of the address against the PID and VPN (Virtual Page Number) fields in the TLB. When both match (or the TLB entry is Global), the VPN is replaced with the PFN (Physical Frame Number) to form the physical address.

TLB misses are handled in software, with the entry to be replaced determined by a simple RANDOM function. The routine to process a TLB miss in the UNIX environment requires only 10-12 cycles, which compares favorably with many CPUs which perform the operation in hardware.

TLB ENTRY FORMAT
EntryHi (bits 63-32):  VPN [63:44]  TLBPID [43:38]  0 [37:32]
EntryLo (bits 31-0):   PFN [31:12]  N [11]  D [10]  V [9]  G [8]  0 [7:0]

VPN     Virtual page number
TLBPID  Process ID
PFN     Physical frame number
N       Non-cacheable flag
D       Dirty flag (Write protect)
V       Valid entry flag
G       Global flag (ignore PID)
0       Reserved
Figure 5. TLB Entry Format

IDT79R3000 Operating Modes
The IDT79R3000 has two operating modes: User mode and Kernel mode. The IDT79R3000 normally operates in the User mode until an exception is detected, forcing it into the Kernel mode.
It remains in the Kernel mode until a Restore From Exception (RFE) instruction is executed. The manner in which memory addresses are translated or mapped depends on the operating mode of the IDT79R3000. Figure 6 shows the MMU translation performed for each of the operating modes.

MMU ADDRESS TRANSLATION (VIRTUAL -> PHYSICAL)
VIRTUAL ADDRESS RANGE      SEGMENT  ATTRIBUTES                       PHYSICAL ADDRESS
0xC0000000 - 0xFFFFFFFF    kseg2    Kernel, mapped, cacheable        any
0xA0000000 - 0xBFFFFFFF    kseg1    Kernel, unmapped, uncached       0x00000000 - 0x1FFFFFFF (512 MB)
0x80000000 - 0x9FFFFFFF    kseg0    Kernel, unmapped, cached         0x00000000 - 0x1FFFFFFF (512 MB)
0x00000000 - 0x7FFFFFFF    kuseg    Kernel/User, mapped, cacheable   any

Physical address space: 0x00000000 - 0x1FFFFFFF is the 512 MB reached by kseg0 and kseg1; 0x20000000 - 0xFFFFFFFF is the remaining 3584 MB.
Figure 6. IDT79R3000 Virtual Address Mapping

User Mode: in this mode, a single, uniform virtual address space (kuseg) of 2 Gbytes is available. Each virtual address is extended with a 6-bit process identifier field to form unique virtual addresses. All references to this segment are mapped through the TLB. Use of the cache for up to 64 processes is determined by bit settings for each page within the TLB entries.

Kernel Mode: four separate segments are defined in this mode:
• kuseg: when in the kernel mode, references to this segment are treated just like user mode references, thus streamlining kernel access to user data.
• kseg0: references to this 512 Mbyte segment use cache memory but are not mapped through the TLB. Instead, they always map to the first 0.5 Gbytes of physical address space.
• kseg1: references to this 512 Mbyte segment are not mapped through the TLB and do not use the cache. Instead, they are hard-mapped into the same 0.5 Gbyte segment of physical address space as kseg0.
• kseg2: references to this 1 Gbyte segment are always mapped through the TLB, and use of the cache is determined by bit settings within the TLB entries.

IDT79R3000 Pipeline Architecture
The execution of a single IDT79R3000 instruction consists of five primary steps:
1) IF - Fetch the instruction (I-Cache).
2) RD - Read any required operands from CPU registers while decoding the instruction.
3) ALU - Perform the required operation on instruction operands.
4) MEM - Access memory (D-Cache).
5) WB - Write back results to register file.

Each of these steps requires approximately one CPU cycle as shown in Figure 7 (parts of some operations overlap into another cycle while other operations require only 1/2 cycle).

[Figure 7 shows one instruction passing through the IF (I-Cache), RD (RF), ALU (OP), MEM (D-Cache) and WB stages, each about one cycle long.]
Figure 7. IDT79R3000 Instruction Pipeline

The IDT79R3000 uses a 5-stage pipeline to achieve an instruction execution rate approaching one instruction per CPU cycle. Thus, the execution of five instructions at a time is overlapped as shown in Figure 8.

[Figure 8 shows the 5-deep instruction pipeline: the instruction flow with five instructions in progress during the current CPU cycle.]
Figure 8. IDT79R3000 Execution Sequence

This pipeline operates efficiently because different CPU resources (address and data bus accesses, ALU operations, register accesses, and so on) are utilized on a non-interfering basis.

Memory System Hierarchy
The high performance capabilities of the IDT79R3000 processor demand system configurations incorporating techniques frequently employed in large, mainframe computers but seldom encountered in systems based on more traditional microprocessors.

A primary goal of systems employing RISC techniques is to minimize the average number of cycles each instruction requires for execution. In order to achieve this goal, RISC processors incorporate a number of techniques including a compact and uniform instruction set, a deep instruction pipeline (as described above), and utilization of optimizing compilers. Many of the advantages obtained from these techniques can, however, be negated by an inefficient memory system.

Figure 9 illustrates memory in a simple microprocessor system. In this system, the CPU outputs addresses to memory and reads instructions and data from memory or writes data to memory. The address space is completely undifferentiated: instructions, data, and I/O devices are all treated the same. In such a system, a primary limiting performance factor is memory bandwidth.

[Figure 9 shows a microprocessor connected by address and data lines to a single main memory (and I/O).]
Figure 9. A Simple Microprocessor Memory System

Figure 10 illustrates a memory system that supports the significantly greater memory bandwidth required to take full advantage of the IDT79R3000's performance capabilities. The key features of this system are:
• External Cache Memory: Local, high-speed memory (called cache memory) is used to hold instructions and data that are repeatedly accessed by the CPU (for example, within a program loop) and thus reduces the number of references that must be made to the slower-speed main memory. Some microprocessors provide a limited amount of cache memory on the CPU chip itself. The external caches supported by the IDT79R3000 can be much larger; while a small cache can improve performance of some programs, significant improvements for a wide range of programs require large caches.
• Separate Caches for Data and Instructions: Even with high-speed caches, memory speed can still be a limiting factor because of the fast cycle time of a high-performance microprocessor. The IDT79R3000 supports separate caches for instructions and data and alternates accesses of the two caches during each CPU cycle. Thus, the processor can obtain data and instructions at the cycle rate of the CPU using caches constructed with commercially available IDT static RAM devices. In order to maximize bandwidth in the cache while minimizing the requirement for SRAM access speed, the R3000 divides a single processor clock cycle into two phases. During one phase, the address for the data cache access is presented while data previously addressed in the instruction cache is read; during the next phase, the data operation is completed while the instruction cache is being addressed. Thus, both caches are read in a single processor cycle using only one set of address and data pins.
• Write Buffer: In order to ensure data consistency, all data that is written to the data cache must also be written out to main memory. The cache write model used by the IDT79R3000 is that of a write-through cache; that is, all data written by the CPU is immediately written into the main memory. To relieve the CPU of this responsibility (and the inherent performance burden) the IDT79R3000 supports an interface to a write buffer. The IDT79R3020 Write Buffer captures data (and associated addresses) output by the CPU and ensures that the data is passed on to main memory.

Figure 10. An IDT79R3000 System with a High-Performance Memory System

IDT79R3000 Processor Subsystem Interfaces
Figure 11 illustrates the three subsystem interfaces provided by the IDT79R3000 processor:
• Cache control interface (on-chip) for separate data and instruction caches permits implementation of off-chip caches using standard IDT SRAM devices. The 79R3000 directly controls the cache memory with a minimum of external components. Both the instruction and data cache can vary from 0 to 256K Bytes (64K entries). The 79R3000 also includes the TAG control logic which determines whether or not the entry read from the cache is the desired data. The 79R3000 cache controller implements a direct mapped cache for high net performance (bandwidth).
It has the ability to refill multiple words when a cache miss occurs, thus reducing the effective miss rate to less than 2% for large caches. When a cache miss occurs, the 79R3000 can support refilling the cache in 1, 4, 8, 16, or 32 word blocks to minimize the effective penalty of having to access main memory. The 79R3000 also incorporates the ability to perform instruction streaming; while the cache is refilling, the processor can resume execution once the missed word is obtained from main memory. In this way, the processor can continue to execute concurrently with the cache block refill.
• Memory controller interface for system (main) memory. This interface also includes the logic and signals to allow operation with a write buffer to further improve memory bandwidth. In addition to the standard full word access, the memory controller supports the ability to write bytes and half-words by using partial word operations. The memory controller also supports the ability to retry memory accesses if, for example, the data returned from memory is invalid and a bus error needs to be signalled.
• Coprocessor Interface: The IDT79R3000 features a tightly coupled coprocessor interface in which all coprocessors maintain synchronization with the main processor; reside on the same data bus as the main processor; and participate in bus transactions in an identical manner to the main processor. The IDT79R3000 generates all required cache and memory control signals, including cache and memory addresses for attached coprocessors. As a result, only the data bus and a few control signals need to be connected to a coprocessor. The interface supports three types of coprocessor instructions: loads/stores, coprocessor operations, and processor-coprocessor transfers. Note that coprocessor loads and stores occur directly between the coprocessor and memory, without requiring the data to go through the CPU. Synchronization between the CPU and external coprocessors is achieved using a Phase-Locked Loop interface to the coprocessor. The coprocessor physical interface also includes coprocessor condition signals (CpCond(n)), which are used in coprocessor branch instructions, and a coprocessor busy signal (CpBusy) which is used to stall the CPU if the coprocessor needs to hold off subsequent operations. Finally, a precise exception interface is defined between the CPU and coprocessors using the external interrupt inputs of the CPU. This allows a coprocessor exception, even if it was the result of a multi-cycle operation, to be traced to the precise coprocessor operation which caused it. This is an important feature for languages which can define specific error handlers for each task.

MULTIPROCESSING SUPPORT
The IDT79R3000 supports multiprocessing applications in a simple but effective way. Multiprocessing applications require cache coherency across the multiple processors. The IDT79R3000 offers two signals to support cache coherency. The first, MPStall, stalls the processor within two cycles of being received and keeps it from accessing the cache. This allows an external agent to snoop into the processor data cache. The second signal, MPInvalidate, causes the processor to write data on the data cache bus which indicates that the externally addressed cache entry is invalid. Thus, a subsequent access to that location would result in a cache miss, and the data would be obtained from main memory.

The two MP signals would be generated by external logic which utilizes a secondary cache to perform bus snooping functions. The 79R3000 does not impose an architecture for this secondary cache, but rather is flexible enough to support a variety of application specific architectures and still maintain cache coherency. Further, there is no impact on designs which do not require this feature.
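The effect of MPInvalidate described above can be sketched with a toy model of a direct-mapped cache. This is an illustration under stated assumptions, not the device's actual logic: the names DirectMappedCache, CacheEntry, read and invalidate are hypothetical, each entry holds a single 32-bit word, main memory is modelled as a Python dict, and the tag is taken to be address bits 31 to 12 to mirror the 20-bit Tag bus.

```python
from dataclasses import dataclass


@dataclass
class CacheEntry:
    valid: bool = False
    tag: int = 0
    data: int = 0


class DirectMappedCache:
    """Toy direct-mapped cache, one 32-bit word per entry (hypothetical model)."""

    def __init__(self, size_bytes: int = 64 * 1024):
        self.num_entries = size_bytes // 4
        self.entries = [CacheEntry() for _ in range(self.num_entries)]

    def _index_and_tag(self, paddr: int):
        index = (paddr >> 2) % self.num_entries  # word address selects the entry
        tag = paddr >> 12                        # bits 31..12 (assumed tag field)
        return index, tag

    def read(self, paddr: int, memory: dict):
        """Return (data, hit). On a miss the word is refilled from memory."""
        index, tag = self._index_and_tag(paddr)
        entry = self.entries[index]
        if entry.valid and entry.tag == tag:
            return entry.data, True
        entry.valid, entry.tag, entry.data = True, tag, memory[paddr]
        return entry.data, False

    def invalidate(self, paddr: int) -> None:
        """Model of the MPInvalidate effect: mark the addressed entry invalid."""
        index, _ = self._index_and_tag(paddr)
        self.entries[index].valid = False


if __name__ == "__main__":
    memory = {0x1000: 111}
    cache = DirectMappedCache()
    print(cache.read(0x1000, memory))  # first access misses and refills
    print(cache.read(0x1000, memory))  # second access hits
    memory[0x1000] = 222               # another processor updates main memory
    cache.invalidate(0x1000)           # external agent invalidates the entry
    print(cache.read(0x1000, memory))  # misses again and fetches the new value
```

Without the invalidate call, the third read would still hit and return the stale value, which is the coherency problem the two MP signals are meant to let external logic avoid.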
ADVANCED FEATURES
The IDT79R3000 offers a number of additional features such as the ability to swap the instruction and data caches, facilitating diagnostics and cache flushing. Another feature isolates the caches, which forces cache hits to occur regardless of the contents of the tag fields. Further features of the IDT79R3000 are configured during the last four cycles prior to the negation of the RESET input. These functions include the ability to select cache sizes and cache refill block sizes; the ability to utilize the multiprocessor interface; whether or not instruction streaming is enabled; whether byte ordering follows "Big-Endian" or "Little-Endian" protocols, etc. Table 3 shows the configuration options selected at Reset. These are further discussed in the "Hardware User's Manual".

BACKWARD COMPATIBILITY WITH 79R2000
The IDT79R3000 can be used in sockets designed for the 79R2000A. The pin-out of the 79R3000 has been selected to ensure this compatibility, with new functions mapped onto previously unused pins. The instruction set is compatible with that of the 79R2000 at the binary level. As a result, code written for the older processor can be executed. New features, such as block refill, instruction streaming, etc. can be selectively disabled.

In most 79R2000A applications, the 79R3000 can be placed in the socket with no modification to initialization settings. The initialization of the 79R3000 includes whether or not the device should operate as a 79R2000A. Systems using the 79R2000A would normally have this input configured so that the device would default to this mode. Further application assistance on this topic is available from IDT.

A SPECIAL NOTE ON PACKAGING
Both the flat pack and the PGA packages for the 79R3000 incorporate separate power and ground planes to eliminate noise associated with high frequency operation. This, coupled with the numerous power and ground pins provided on the device, helps to ensure very reliable operation.
The interface supports up to four separate coprocessors. Coprocessor 0 is defined to be the system control coprocessor, and resides on the same chip as the CPU unit. Coprocessor 1 is the Floating Point Accelerator, lOT 79R301 o. Coprocessors 2 and 3 are available to support an interface to application specific functions. 8 MIUTARY AND COMMERCIAL TEMPERATURE RANGES 1DT79R3000 RISC CPU PROCESSOR WCYCLE X CYCLE YCYCLE ZCYCLE DBlkSizeO IBlkSizeO Reserved(l ) Reserved(l ) PhaseDelayOn(2) R3000 Mod e(2) DBIkSize1 IBIkSize1 IStream Store Partial Extend Cache Reserved(l) BigEndian TriState NoCache BusDriveOn INPUT IntO ill Int2 Int3 Int4 Int5 Reserved(l) MultiProcessor PhaseDelayOn(2) R3000 Mod e(2) PhaseDelayOn(2) R3000 Mod e(2) PhaseDelayOn(2) R3000 Mod e(2) NOTES: 1. Reserved entries must be driven high. 2. These values must be driven stable throughout the entire RESET period. Table 3: 1DT79R3000 Mode Selectable Featuros Data Bus Data Bus Data Bus Tag Bus r---- r-- - - Tag Bus Tag Bus _AdrLo Bus Jl '-)' ITrans-rp'arent Latch ",/7 Tag AdrLo TagV TagP Data DataP DClk IClk AdrLo Bus "',/ HTrans-1 p'arent Latch '\. '-/""J "',/~"',>' Data Tag Instruction Cache IAdr [15:2] IRd DRd f--oo OE WE ~ IWr DWr r----. WE Clk2xSys Clk2xSmp XEn Clk2xRd SysOut CIk2xPhi AccTy[2:0] Reset MemRd CpSync MemWr Run RdBusy Exc WrBusy CpBusy CpCond[O] f4--r--f4--- Data Data Cache OE ~ "'J'-/""J Memory Interface DAdr Tag [15:2] IDT79R3000 Processor with System Control Coprocessor "',/~ 1_ 2- Clocks 3 14-14--"f4- Coprocessors CpCond[3:1] BusError Intr(5:0) Hardware nterrupts Figure 11. 
IDT79R3000 Subsystem Interfaces Example; 64 KB Caches

PIN CONFIGURATION
[Pin diagram: 172-Pin Ceramic Flatpack (Cavity Side View)]
NOTES:
1. Reserved pins must not be connected.
2. AdrLo 16 & 17 are multi-function pins which are controlled by mode select programming on interrupt pins at reset time. AdrLo 16: MP Invalidate, CpCond (2). AdrLo 17: MP Stall, CpCond (3).

PIN CONFIGURATION
[Pin diagram: 144-Pin PGA (Top View)]
NOTE:
1. AdrLo 16 & 17 are multi-function pins which are controlled by mode select programming on interrupt pins at reset time. AdrLo 16: MP Invalidate, CpCond (2). AdrLo 17: MP Stall, CpCond (3).

PIN DESCRIPTIONS
PIN NAME        I/O   DESCRIPTION
Data (0-31)     I/O   A 32-bit bus used for all instruction and data transmission among the processor, caches, memory interface, and coprocessors.
DataP (0-3)     I/O   A 4-bit bus containing even parity over the data bus.
Tag (12-31)     I/O   A 20-bit bus used for transferring cache tags and high addresses between the processor, caches, and memory interface.
TagV            I/O   The tag validity indicator.
TagP (0-2)      I/O   A 3-bit bus containing even parity over the concatenation of TagV and Tag.
AdrLo (0-17)    O     An 18-bit bus containing byte addresses used for transferring low addresses from the processor to the caches and memory interface. (AdrLo 16: CpCond (2), AdrLo 17: CpCond (3) set by reset initialization).
IRd1            O     Read enable for the instruction cache.
IWr1            O     Write enable for the instruction cache.
IRd2            O     An identical copy of IRd1 used to split the load.
IWr2            O     An identical copy of IWr1 used to split the load.
IClk            O     The instruction cache address latch clock. This clock runs continuously.
DRd1            O     The read enable for the data cache.
DWr1            O     The write enable for the data cache.
DRd2            O     An identical copy of DRd1 used to split the load.
DWr2            O     An identical copy of DWr1 used to split the load.
DClk            O     The data cache address latch clock. This clock runs continuously.
XEn             O     The read enable for the Read Buffer.
AccTyp (0-2)    O     A 3-bit bus used to indicate the size of data being transferred on the data bus, whether or not a data transfer is occurring, and the purpose of the transfer.
MemWr           O     Signals the occurrence of a main memory write.
MemRd           O     Signals the occurrence of a main memory read.
BusError        I     Signals the occurrence of a bus error during a main memory read or write.
Run             O     Indicates whether the processor is in the run or stall state.
Exception       O     Indicates that the instruction about to commit state should be aborted and other exception related information.
SysOut          O     A reflection of the internal processor clock used to generate the system clock.
CpSync          O     A clock which is identical to SysOut and used by coprocessors for timing synchronization with the CPU.
RdBusy          I     The main memory read stall termination signal. In most system designs RdBusy is normally asserted and is deasserted only to indicate the successful completion of a memory read. RdBusy is sampled by the processor only during memory read stalls.
WrBusy          I     The main memory write stall initiation/termination signal.
CpBusy          I     The coprocessor busy stall initiation/termination signal.
CpCond (0-1)    I     A 2-bit bus used to transfer conditional branch status from the coprocessors to the main processor.
CpCond (2-3)    I     Conditional branch status from coprocessors to the processor. Function is provided on AdrLo 16/17 pins and is selected at reset time.
MPStall         I     Multiprocessing Stall. Signals to the processor that it should stall accesses to the caches in a multiprocessing environment. This is physically the same pin as CpCond3; its use is determined at RESET initialization.
MPInvalidate    I     Multiprocessing Invalidate. Signals to the processor that it should issue invalidate data on the cache data bus. The address to be invalidated is externally provided. This is the same pin as CpCond2; its use is determined at RESET initialization.
Int (0-5)       I     A 6-bit bus used by the memory interface and coprocessors to signal maskable interrupts to the processor. At reset time, mode select values are read in.
Clk2xSys        I     The master double frequency input clock used for generating SysOut.
Clk2xSmp        I     A double frequency clock input used to determine the sample point for data coming into the processor and coprocessors.
Clk2xRd         I     A double frequency clock input used to determine the enable time of the cache RAMs.
Clk2xPhi        I     A double frequency clock input used to determine the position of the internal phases, phase1 and phase2.
Reset           I     Synchronous initialization input used to force execution starting from the reset memory address. Reset must be deasserted synchronously but asserted asynchronously. The deassertion of Reset must be synchronized by the leading edge of SysOut.

ABSOLUTE MAXIMUM RATINGS(1,3)
SYMBOL   RATING                                  COMMERCIAL     MILITARY       UNIT
VTERM    Terminal Voltage with Respect to GND    -0.5 to +7.0   -0.5 to +7.0   V
TA       Operating Temperature                   0 to +70       -55 to +125    °C
TBIAS    Temperature Under Bias                  -55 to +125    -65 to +135    °C
TSTG     Storage Temperature                     -55 to +125    -65 to +150    °C
VIN      Input Voltage(2)                        -0.5 to +7.0   -0.5 to +7.0   V
NOTES:
1. Stresses greater than those listed under ABSOLUTE MAXIMUM RATINGS may cause permanent damage to the device. This is a stress rating only and functional operation of the device at these or any other conditions above those indicated in the operational sections of this specification is not implied. Exposure to absolute maximum rating conditions for extended periods may affect reliability.
2. VIN minimum = -3.0V for pulse width less than 15ns. VIN should not exceed Vcc + 0.5 Volts.
3. Not more than one output should be shorted at a time. Duration of the short should not exceed 30 seconds.

RECOMMENDED OPERATING TEMPERATURE AND SUPPLY VOLTAGE
GRADE        AMBIENT TEMPERATURE   GND   Vcc
Military     -55°C to +125°C       0V    5.0 ± 10%
Commercial   0°C to +70°C          0V    5.0 ± 5%

OUTPUT LOADING FOR AC TESTING
[Figure: output load circuit, "To Device Under Test"]
DC ELECTRICAL CHARACTERISTICSCOMMERCIAL TEMPERATURE RANGE TA =0°Cto+70°C Vee=+50V±5% SYMBOL PARAMETER 16.67MHz MIN. MAX. TEST CON DITIONS VOH Output HIGH Voltage Vee = Min., IOH = --4mA VOL Output LOW Voltage Vee = Min., IOl = 4mA 3.5 - 20.0MHz 25.0MHz 33.33MHz MIN. MAX. MIN. MAX. MIN. MAX. 3.5 - VOHe Output HIGH Voltage(7) Vee = Min., IOH = --4mA 4.0 4.0 VOHT Output HIGH Voltage (4,6) Vee = Min., IOH = -SmA 2.4 2.4 VOLT Output LOW Voltage (4,6) Vee = Min., IOl = SmA Input LOW Voltage (1) 0.8 3.0 3.0 V o.S V 0.8 V V pF V 2.0 0.8 Input HIGH Voltage (2,5) V V V 2.4 O.S 2.0 2.0 - 0.4 4.0 0.8 Input HIGH Voltage (5) 3.5 0.4 0.4 UNIT V 3.0 Input LOW Voltage (1,2) 0.4 0.4 0.4 Input Capacitance (6) 10 10 10 COUT Output Capacitance (6) 10 10 10 pF Icc Operating Current ! 575 650 750 rnA IIH III Input HIGH Leakage (3) VIH - Vee Input LOW Leakage (3) Vil = GND -10 loz Output Tri-state Leakage VOH = 2.4V, VOL = 0.5V --40 Vee = Max. - 10 10 -10 40 --40 10 {#:::" 10 40 --40 40 -10 40 --40 NOTES: 1. VIL Min. = -3.0V for pulse width less than 15ns. VIL should not fall below -0.5 Volts for larger periods. 2. VIHS and VILS apply to Clk2xSys, Clk2xSmp, CIk2xRd, Clk2xPhi, CpBusy, and Reset. 3. These parameters do not apply to the clock inputs. 4. VOHT and VOLT apply to the bidirectional data and tag busses only. Note that VIH and VIL also apply to these signals. VOHT and VOLT are provided to give the designer further information about these specific signals. 5. VIH should not be held above Vce + 0.5 volts. 6. Guaranteed by design. 7. VOHC applies to mID" and Exception. 13 IDTI9R3000 RISC CPU PROCESSOR MIUTARY AND COMMERCIAL TEMPERATURE RANGES DC ELECTRICAL CHARACTERISTICSMILITARY TEMPERATURE RANGE (TA =-55°Cto+125°C, Vee =+5.0V± 10%) SYMBOL PARAMETER 16.67MHz TEST CONDITIONS MAX. MIN. 
:t VOH Output HIGH Voltage Vee = Min., 10H = -4mA 3.5 VOL Output LOW Voltage Vee = Min., IOL = 4mA -.,::::::\:::¢ VOHe Output HIGH Voltage(7) Vee = Min., 10H = -4mA 4.0 VOHT Output HIGH Voltage (4,6) Vee = Min., 10H = -8mA 2,.4. VOLT ,Output LOW Voltage (4,6) Vee = Min., 10L = 8 m A , , : · : { : ..::::::::;:::::::<::n:.::0.8 VIH Input HIGH Voltage J:::... (6) = 2.4V, VOL = 0.5V 0.8 V - V - 0.4 V - 10 pF NOTES: 1. VIL Min. =-3.0V for pulse width less than 15ns. VIL should not fall below -0.5 Volts for larger periods. 2. VIHS and VILS apply to Clk2xSys, Clk2xSmp, CIk2xRd, Clk2xPhi, Cp8usy, and neset. 3. These parameters do not apply to the clock inputs. 4. VOHT and VOLT apply to the bidirectional data and tag busses only. Note that VIH and VIL also apply to these signals. VOHT and VOLT are provided to give the designer further information about these specific signals. 5. VIH should not be held above Vec + 0.5 volts. 6. Guaranteed by design. 7. VOHC applies to 11uii and EXception. 14 1DT79R3000 RISC CPU PROCESSOR MIUTARY AND COMMERCIAL TEMPERATURE RANGES AC ELECTRICAL CHARACTERISTICS(1, 2, 3)_ COMMERCIAL TEMPERATURE RANGE (TA =O°Cto+70°C, Vcc=+5.0V±5%) PARAMETER UNIT TEST CONDITION All timings are referenced to 1.5V. The clock parameters apply to all four 2xClocks: Clk2xSys, Clk2xSmp, Clk2xRd, and Clk2xPhi. This parameter is guaranteed by design. These parameters apply when the 79R3010 Floating Point Coprocessor is connected to the CPU. With phase lock on, Reset must be asserted for the longer of 3000 clock cycles or 200 microseconds. _. 5. Tcyc is one CPU clock cycle (two cycles of a 2x clock). 6. With the exception of the RUil signal, no two signals on a given device will derate for a given load by a difference greater than 15%. 1. 2. 3. 4. 15 MIUTARV AND COMMERCIAL TEMPERATURE RANGES IDT79R3000 RISC CPU PROCESSOR AC ELECTRICAL CHARACTERISTICS(1, 2, 3)_ MILITARY TEMPERATURE RANGE (TA =-55°Cto+125°C, Vee =+5.0V± 10%) PARAMETER TEST CONDITIONS 1. 2. 3. 4. 
16.67MHz MIN. UNIT
All timings are referenced to 1.5V.
The clock parameters apply to all four 2xClocks: Clk2xSys, Clk2xSmp, Clk2xRd, and Clk2xPhi.
This parameter is guaranteed by design.
These parameters apply when the 79R3010 Floating Point Coprocessor is connected to the CPU.
With phase lock on, Reset must be asserted for the longer of 3000 clock cycles or 200 microseconds.
5. Tcyc is one CPU clock cycle (two cycles of a 2x clock).
6. With the exception of the Run signal, no two signals on a given device will derate for a given load by a difference greater than 15%.

[Timing diagram: Clk2xSys, Clk2xSmp, Clk2xRd, Clk2xPhi; parameters Tckhigh, Tckp, Tcklow]
Figure 12. Input Clock Timing

[Timing diagram: SmpOut*, RdOut*, PhiOut*; parameters Tcyc, Tsmp, Tsys]
* These signals are not actually output from the processor. They are drawn to provide a reference for other timing diagrams.
Figure 13. Processor Reference Clock Timing

[Timing diagram: SysOut, Phase, AddrLo, AccTyp 0:1 (Size of Loaded Data), AccTyp 2, Data and Tag Busses (D Bus Input), IClk, DClk; parameters Tdh, Twrdly]
Figure 14. Synchronous Memory (Cache) Timing

[Timing diagram: Run, Phase, AddrLo, Tag (Address High), Data (Output)]
Figure 15. Memory Write Timing

[Timing diagram: RUN, STALL, and FIXUP cycles; Phase, AddrLo, Tag (Address High), AccTyp 0:1 (Data Size), AccTyp 2, Data (Input), RdBusy, CpCond0, Run; parameters Tsys, Tsaval, Tacty, Tsacty, Tdval, Tsmp, Tds, Tdh, Twrdly, Tmrdi, Tden, Tddis, Tstl, Trun]
Figure 16. Memory Read Timing

[Timing diagram: Co-Processor Load and Co-Processor Store; Phase, SysOut, Data Bus, Run, CpBusy, Exception, CpCond(n) (Condition Valid)]
Figure 17. Co-Processor Load/Store Timing

[Timing diagram: Phase and interrupt inputs]
Figure 18. Interrupt Timing

[Timing diagram: mode vector initialization at Reset]
NOTES:
1. Reset must be negated synchronously; however, it can be asserted asynchronously. Designs should not rely on the proper functioning of SysOut prior to the assertion of Reset.
2. If PhaseLock On or R3000 Mode are asserted as mode select options, they should be asserted throughout the Reset period, to insure that the slowest co-processor in the system has sufficient time to lock the CPU clocks.
3. Reset is actually sampled in both Phase 1 and Phase 2. To insure proper initialization, it is recommended that Reset be negated relative to the end of Phase 1.
Figure 19. Mode Vector Initialization

ORDERING INFORMATION
IDT XXXXX (Device Type) - XX (Speed) - X (Package) - X (Process/Temp. Range)
Process/Temp. Range:
  Blank   Commercial (0°C to +70°C)
          Military (-55°C to +125°C), Compliant to MIL-STD-883, Class B
          Military Temperature Range Only
Package:
  G   144-Pin PGA
  F   172-Pin Flat Pack
Speed:
  16   16.67 MHz
  20   20.0 MHz
  25   25.0 MHz
  33   33.33 MHz
Device Type:
  79R3000   RISC CPU Processor

PRELIMINARY
IDT79R3000A
IDT79R3000AE
RISC CPU PROCESSOR
Integrated Device Technology, Inc.
FEATURES:
• Enhanced, instruction set compatible version of the IDT79R2000 and IDT79R3000 RISC CPUs.
• Upwardly pin-compatible with IDT79R3000 RISC CPU.
• IDT79R3000A "E" version relaxes system memory timing requirements.
• Full 32-bit Operation - Thirty-two 32-bit registers and all instructions and addresses are 32-bit.
• Efficient Pipelining - The CPU's 5-stage pipeline design assists in obtaining an execution rate approaching one instruction per cycle. Pipeline stalls and exceptions are handled precisely and efficiently.
• On-Chip Cache Control - The IDT79R3000 provides a high bandwidth memory interface that handles separate external Instruction and Data Caches ranging in size from 4 to 256 Kbytes each. Both the caches are accessed during a single CPU cycle. All cache control is on-chip.
• On-Chip Memory Management Unit - A fully-associative, 64 entry Translation Lookaside Buffer (TLB) provides fast address translation for virtual-to-physical memory mapping of the 4 Gigabyte virtual address space.
• Coprocessor Interface - The IDT79R3000 generates all addresses and handles memory interface control for up to three additional tightly coupled external processors.
• Optimizing Compilers are available for C, Fortran, Pascal, COBOL, Ada, and PL/1.
• UNIX™ System V.3 and BSD 4.3 operating systems supported.
• High-speed CEMOS™ technology.
• Instruction set compatible with the IDT79R2000 RISC CPU.
• Supports independent multiword block refill of both the instruction and data caches with variable block sizes.
• Supports concurrent refill and execution of instructions.
• Partial word stores executed as read-modify-write operations.
• 6 external interrupt inputs (up to 64 different sources), 2 software interrupts, with single cycle latency to exception handler routine.
• Flexible multiprocessing support on chip with no impact on uniprocessor designs.
• Military product compliant to MIL-STD-883, Class B.
• 16.7MHz, 20MHz, 25MHz and 33MHz clock rates yield up to 28 MIPS sustained throughput.

DESCRIPTION:
The IDT79R3000A RISC Microprocessor consists of two tightly-coupled processors integrated on a single chip. The first processor is a full 32-bit CPU based on RISC (Reduced Instruction Set Computer) principles to achieve a new standard of microprocessor performance. The second processor is a system control coprocessor, called CP0, containing a fully-associative 64 entry TLB (Translation Lookaside Buffer), MMU (Memory Management Unit) and control registers, supporting a 4 Gigabyte virtual memory subsystem, and a Harvard Architecture Cache Controller achieving a bandwidth of over 260 Mbytes/second using industry standard static RAMs.
This data sheet provides an overview of the features and architecture of the 79R3000A CPU, Revision 3.0. A more detailed description of the operation of the device is incorporated in the "R3000A Family Hardware User Manual", and a more detailed architectural overview is provided in the "mips RISC Architecture" book, both available from IDT. Documentation providing details of the software and development environments supporting this processor is also available from IDT.

[Block diagram: IDT79R3000A processor. CP0 (System Control Coprocessor): Exception/Control Registers, Memory Management Unit Registers, Translation Lookaside Buffer (64 entries), Local Control Logic. CPU: Multiplier/Divider, Address Adder, PC Increment/Mux. External buses: Tag (20+4), Data (32+4), Address (18).]

CEMOS is a trademark of Integrated Device Technology, Inc. UNIX is a registered trademark of AT&T.
© 1990 Integrated Device Technology, Inc. JULY 1990 DSC-90381-

The IDT79R3000A instruction set can be divided into the following groups:
• Load/Store instructions move data between memory and general registers.
They are all I-type instructions, since the only addressing mode supported is base register plus 16-bit, signed immediate offset. The Load instruction has a single cycle of latency, which means that the data being loaded is not available to the instruction immediately after the load instruction. The compiler will fill this delay slot with either an instruction which is not dependent on the loaded data, or with a NOP instruction. There is no latency associated with the store instruction. Loads and Stores can be performed on byte, half-word, word, or unaligned word data (32-bit data not aligned on a modulo-4 address). The CPU cache is constructed as a write-through cache.
• Computational instructions perform arithmetic, logical and shift operations on values in registers. They occur in both R-type (both operands and the result are registers) and I-type (one operand is a 16-bit immediate) formats. Note that computational instructions are three operand instructions; that is, the result of the operation can be stored into a different register than either of the two operands. This means that operands need not be overwritten by arithmetic operations. This results in a more efficient use of the large register set.
• Jump and Branch instructions change the control flow of a program. Jumps are always to a paged absolute address formed by combining a 26-bit target with four bits of the Program Counter (J-type format, for subroutine calls), or 32-bit register byte addresses (R-type, for returns and dispatches). Branches have 16-bit offsets relative to the program counter (I-type). Jump and Link instructions save a return address in Register 31. The 79R3000A instruction set features a number of branch conditions. Included is the ability to compare a register to zero and branch, and also the ability to branch based on a comparison between two registers.
Thus, net performance is increased since software does not have to perform arithmetic instructions prior to the branch to set up the branch conditions.
• Coprocessor instructions perform operations in the coprocessors. Coprocessor Loads and Stores are I-type. Coprocessor computational instructions have coprocessor-dependent formats (see coprocessor manuals).
• Coprocessor 0 instructions perform operations on the System Control Coprocessor (CP0) registers to manipulate the memory management and exception handling facilities of the processor.
• Special instructions perform a variety of tasks, including movement of data between special and general registers, system calls, and breakpoint. They are always R-type.

IDT79R3000A CPU Registers
The IDT79R3000A CPU provides 32 general purpose 32-bit registers, a 32-bit Program Counter, and two 32-bit registers that hold the results of integer multiply and divide operations. Only two of the 32 general registers have a special purpose: register r0 is hardwired to the value "0", which is a useful constant, and register r31 is used as the link register in jump-and-link instructions (return address for subroutine calls).
The CPU registers are shown in Figure 2. Note that there is no Program Status Word (PSW) register shown in this figure: the functions traditionally provided by a PSW register are instead provided in the Status and Cause registers incorporated within the System Control Coprocessor (CP0).

[Figure: General Purpose Registers r0 (hardwired to 0), r1, r2, ... r29, r30, r31, bits 31..0; Multiply/Divide Registers HI and LO, bits 31..0; Program Counter PC, bits 31..0]
Figure 2. IDT79R3000A CPU Registers

Instruction Set Overview
All IDT79R3000A instructions are 32 bits long, and there are only three instruction formats. This approach simplifies instruction decoding, thus minimizing instruction execution time. The 79R3000A processor initiates a new instruction on every run cycle, and is able to complete an instruction on almost every clock cycle.
The only exceptions are the Load instructions and Branch instructions, which each have a single cycle of latency associated with their execution. Note, however, that in the majority of cases the compilers are able to fill these latency cycles with useful instructions which do not require the result of the previous instruction. This effectively eliminates these latency effects.
The actual instruction set of the CPU was determined after extensive simulations to determine which instructions should be implemented in hardware, and which operations are best synthesized in software from other basic instructions. This methodology resulted in the R3000A having the highest performance of any available microprocessor. Table 1 lists the instruction set of the IDT79R3000A processor.

I-Type (Immediate):  op (31-26) | rs (25-21) | rt (20-16) | immediate (15-0)
J-Type (Jump):       op (31-26) | target (25-0)
R-Type (Register):   op (31-26) | rs (25-21) | rt (20-16) | rd (15-11) | re (10-6) | funct (5-0)
Figure 3.
IDT79R3000A Instruction Formats

OP        DESCRIPTION

Load/Store Instructions
LB        Load Byte
LBU       Load Byte Unsigned
LH        Load Halfword
LHU       Load Halfword Unsigned
LW        Load Word
LWL       Load Word Left
LWR       Load Word Right
SB        Store Byte
SH        Store Halfword
SW        Store Word
SWL       Store Word Left
SWR       Store Word Right

Arithmetic Instructions (ALU Immediate)
ADDI      Add Immediate
ADDIU     Add Immediate Unsigned
SLTI      Set on Less Than Immediate
SLTIU     Set on Less Than Immediate Unsigned
ANDI      AND Immediate
ORI       OR Immediate
XORI      Exclusive OR Immediate
LUI       Load Upper Immediate

Arithmetic Instructions (3-operand, register-type)
ADD       Add
ADDU      Add Unsigned
SUB       Subtract
SUBU      Subtract Unsigned
SLT       Set on Less Than
SLTU      Set on Less Than Unsigned
AND       AND
OR        OR
XOR       Exclusive OR
NOR       NOR

Shift Instructions
SLL       Shift Left Logical
SRL       Shift Right Logical
SRA       Shift Right Arithmetic
SLLV      Shift Left Logical Variable
SRLV      Shift Right Logical Variable
SRAV      Shift Right Arithmetic Variable

Multiply/Divide Instructions
MULT      Multiply
MULTU     Multiply Unsigned
DIV       Divide
DIVU      Divide Unsigned
MFHI      Move From HI
MTHI      Move To HI
MFLO      Move From LO
MTLO      Move To LO

Jump and Branch Instructions
J         Jump
JAL       Jump and Link
JR        Jump to Register
JALR      Jump and Link Register
BEQ       Branch on Equal
BNE       Branch on Not Equal
BLEZ      Branch on Less than or Equal to Zero
BGTZ      Branch on Greater Than Zero
BLTZ      Branch on Less Than Zero
BGEZ      Branch on Greater than or Equal to Zero
BLTZAL    Branch on Less Than Zero and Link
BGEZAL    Branch on Greater than or Equal to Zero and Link

Special Instructions
SYSCALL   System Call
BREAK     Break

Coprocessor Instructions
LWCz      Load Word from Coprocessor
SWCz      Store Word to Coprocessor
MTCz      Move To Coprocessor
MFCz      Move From Coprocessor
CTCz      Move Control to Coprocessor
CFCz      Move Control From Coprocessor
COPz      Coprocessor Operation
BCzT      Branch on Coprocessor z True
BCzF      Branch on Coprocessor z False

System Control Coprocessor (CP0) Instructions
MTC0      Move To CP0
MFC0      Move From CP0
TLBR      Read Indexed TLB entry
TLBWI     Write Indexed TLB entry
TLBWR     Write Random TLB entry
TLBP      Probe TLB for matching entry
RFE       Restore From Exception

Table 1. IDT79R3000A Instruction Summary

IDT79R3000A System Control Coprocessor (CP0)
The IDT79R3000A can operate with up to four tightly-coupled coprocessors (designated CP0 through CP3). The System Control Coprocessor (or CP0) is incorporated on the IDT79R3000A chip and supports the virtual memory system and exception handling functions of the IDT79R3000A. The virtual memory system is implemented using a Translation Lookaside Buffer and a group of programmable registers as shown in Figure 4.

[Figure: the CP0 registers, including ENTRYHI, ENTRYLO, INDEX, and RANDOM, alongside the TLB (entries 63 to 0; entries 7 to 0 are not accessed by RANDOM). Legend: registers used with the virtual memory system; registers used with exception processing.]
Figure 4. The System Coprocessor Registers

System Control Coprocessor (CP0) Registers
The CP0 registers shown in Figure 4 are used to control the memory management and exception handling capabilities of the IDT79R3000A. Table 2 provides a brief description of each register.

REGISTER   DESCRIPTION
EntryHi    High half of a TLB entry
EntryLo    Low half of a TLB entry
Index      Programmable pointer into TLB array
Random     Pseudo-random pointer into TLB array
Status     Mode, interrupt enables, and diagnostic status info
Cause      Indicates nature of last exception
EPC        Exception Program Counter
Context    Pointer into kernel's virtual Page Table Entry array
BadVA      Most recent bad virtual address
PRId       Processor revision identification (Read only)
Table 2. System Control Coprocessor (CP0) Registers

Memory Management System
The IDT79R3000A has an addressing range of 4 Gbytes.
However, since most IDT79R3000A systems implement a physical memory smaller than 4 Gbytes, the IDT79R3000A provides for the logical expansion of memory space by translating addresses composed in a large virtual address space into available physical memory addresses. The 4 Gbyte address space is divided into 2 Gbytes which can be accessed by both the users and the kernel, and 2 Gbytes for the kernel only.

The TLB (Translation Lookaside Buffer)
Virtual memory mapping is assisted by the Translation Lookaside Buffer (TLB). The on-chip TLB provides very fast virtual memory access and is well-matched to the requirements of multitasking operating systems. The fully-associative TLB contains 64 entries, each of which maps a 4-Kbyte page, with controls for read/write access, cacheability, and process identification. The TLB allows each user to access up to 2 Gbytes of virtual address space.
Figure 5 illustrates the format of each TLB entry. The translation operation involves matching the current Process ID (PID) and upper 20 bits of the address against PID and VPN (Virtual Page Number) fields in the TLB. When both match (or the TLB entry is Global), the VPN is replaced with the PFN (Physical Frame Number) to form the physical address.
TLB misses are handled in software, with the entry to be replaced determined by a simple RANDOM function. The routine to process a TLB miss in the UNIX environment requires only 10-12 cycles, which compares favorably with many CPUs which perform the operation in hardware.

TLB ENTRY FORMAT
ENTRYHI (bits 63-32):  VPN (63-44) | TLBPID (43-38) | 0 (37-32)
ENTRYLO (bits 31-0):   PFN (31-12) | N (11) | D (10) | V (9) | G (8) | 0 (7-0)
VPN - Virtual Page Number
TLBPID - Process ID
PFN - Physical Frame Number
N - Non-cacheable flag
D - Dirty flag (Write protect)
V - Valid entry flag
G - Global flag (ignore PID)
0 - Reserved
Figure 5. TLB Entry Format

IDT79R3000 Operating Modes
The IDT79R3000A has two operating modes: User mode and Kernel mode.
The IDT79R3000A normally operates in the User mode until an exception is detected forcing it into the Kernel mode. It remains in the Kernel mode until a Restore From Exception (RFE) instruction is executed. The manner in which memory addresses are translated or mapped depends on the operating mode of the IDT79R3000A. Figure 6 shows the MMU translation performed for each of the operating modes.

MMU ADDRESS TRANSLATION (VIRTUAL -> PHYSICAL)
VIRTUAL ADDRESS           SEGMENT                                  PHYSICAL MEMORY
0xC0000000 - 0xFFFFFFFF   Kernel mapped cacheable (kseg2)          Any
0xA0000000 - 0xBFFFFFFF   Kernel unmapped uncached (kseg1)         0x00000000 - 0x1FFFFFFF (512 MB)
0x80000000 - 0x9FFFFFFF   Kernel unmapped cached (kseg0)           0x00000000 - 0x1FFFFFFF (512 MB)
0x00000000 - 0x7FFFFFFF   Kernel/user mapped cacheable (kuseg)     Any
Physical memory: 0x00000000 - 0x1FFFFFFF (512 MB) and 0x20000000 - 0xFFFFFFFF (3584 MB).
Figure 6. IDT79R3000A Virtual Address Mapping

User Mode - in this mode, a single, uniform virtual address space (kuseg) of 2 Gbytes is available. Each virtual address is extended with a 6-bit process identifier field to form unique virtual addresses. All references to this segment are mapped through the TLB. Use of the cache for up to 64 processes is determined by bit settings for each page within the TLB entries.
Kernel Mode - four separate segments are defined in this mode:
• kuseg - when in the kernel mode, references to this segment are treated just like user mode references, thus streamlining kernel access to user data.
• kseg0 - references to this 512 Mbyte segment use cache memory but are not mapped through the TLB. Instead, they always map to the first 0.5 Gbytes of physical address space.
• kseg1 - references to this 512 Mbyte segment are not mapped through the TLB and do not use the cache. Instead, they are hard-mapped into the same 0.5 Gbyte segment of physical address space as kseg0.
• kseg2 - references to this 1 Gbyte segment are always mapped through the TLB and use of the cache is determined by bit settings within the TLB entries.
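
The segment rules above, together with the TLB match on PID and VPN described earlier, can be illustrated with a small software model. This is only a sketch under stated assumptions: the names `TlbEntry`, `segment_of`, `tlb_lookup`, and `translate` are hypothetical and come from no IDT software; the V, D, and N flags are left out; a user-mode reference to the kernel-only 2 Gbytes is modelled as a Python `PermissionError`; and a TLB miss simply returns `None`, standing in for the software miss handler.

```python
from dataclasses import dataclass
from typing import Optional, Sequence

# Segment boundaries taken from the virtual address map (Figure 6).
KSEG0_BASE = 0x80000000
KSEG1_BASE = 0xA0000000
KSEG2_BASE = 0xC0000000
UNMAPPED_MASK = 0x1FFFFFFF       # kseg0/kseg1 map onto the first 512 Mbytes
PAGE_SHIFT = 12                  # 4-Kbyte pages
PAGE_MASK = (1 << PAGE_SHIFT) - 1


@dataclass(frozen=True)
class TlbEntry:
    """Hypothetical, simplified TLB entry (V, D and N flags omitted)."""
    vpn: int                     # virtual page number: upper 20 address bits
    pid: int                     # 6-bit process identifier
    pfn: int                     # physical frame number
    is_global: bool = False      # G flag: ignore the PID when matching


def segment_of(vaddr: int) -> str:
    """Name the segment that a 32-bit virtual address falls into."""
    if not 0 <= vaddr <= 0xFFFFFFFF:
        raise ValueError("virtual address must fit in 32 bits")
    if vaddr < KSEG0_BASE:
        return "kuseg"
    if vaddr < KSEG1_BASE:
        return "kseg0"
    if vaddr < KSEG2_BASE:
        return "kseg1"
    return "kseg2"


def tlb_lookup(tlb: Sequence[TlbEntry], vaddr: int, pid: int) -> Optional[int]:
    """Replace the VPN with the PFN of a matching entry; None models a miss."""
    vpn = vaddr >> PAGE_SHIFT
    for entry in tlb:
        if entry.vpn == vpn and (entry.is_global or entry.pid == pid):
            return (entry.pfn << PAGE_SHIFT) | (vaddr & PAGE_MASK)
    return None


def translate(vaddr: int, kernel_mode: bool,
              tlb: Sequence[TlbEntry] = (), pid: int = 0) -> Optional[int]:
    """Return the physical address, or None on a TLB miss."""
    segment = segment_of(vaddr)
    if segment != "kuseg" and not kernel_mode:
        # Assumption: the kernel-only region is modelled as a Python error.
        raise PermissionError("kernel-only segment referenced in user mode")
    if segment in ("kseg0", "kseg1"):
        return vaddr & UNMAPPED_MASK      # hard-mapped, TLB not consulted
    return tlb_lookup(tlb, vaddr, pid)    # kuseg and kseg2 go through the TLB
```

In this model, `translate(0x80001234, True)` and `translate(0xA0001234, True)` both yield `0x1234`, reflecting that kseg0 and kseg1 share the same 0.5 Gbyte physical window and differ only in cache use.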
IDT79R3000 Pipeline Architecture

The execution of a single IDT79R3000A instruction consists of five primary steps:

1) IF - Fetch the instruction (I-Cache).
2) RD - Read any required operands from CPU registers while decoding the instruction.
3) ALU - Perform the required operation on instruction operands.
4) MEM - Access memory (D-Cache).
5) WB - Write back results to register file.

Each of these steps requires approximately one CPU cycle, as shown in Figure 7 (parts of some operations overlap into another cycle, while other operations require only 1/2 cycle).

Instruction Execution

IF         RD    ALU   MEM        WB
I-Cache    RF    OP    D-Cache    WB
(each stage takes approximately one cycle)

Figure 7. IDT79R3000A Instruction Pipeline

The IDT79R3000A uses a 5-stage pipeline to achieve an instruction execution rate approaching one instruction per CPU cycle. Thus, the execution of five instructions at a time is overlapped, as shown in Figure 8.

IDT79R3000A Instruction Pipeline (5-deep)

Instruction 1:   IF   RD   ALU  MEM  WB
Instruction 2:        IF   RD   ALU  MEM  WB
Instruction 3:             IF   RD   ALU  MEM  WB
Instruction 4:                  IF   RD   ALU  MEM  WB
Instruction 5:                       IF   RD   ALU  MEM  WB
                                     ^
                                     Current CPU Cycle

Figure 8. IDT79R3000A Execution Sequence

This pipeline operates efficiently because different CPU resources (address and data bus accesses, ALU operations, register accesses, and so on) are utilized on a non-interfering basis.

Memory System Hierarchy

The high performance capabilities of the IDT79R3000A processor demand system configurations incorporating techniques frequently employed in large, mainframe computers but seldom encountered in systems based on more traditional microprocessors.

A primary goal of systems employing RISC techniques is to minimize the average number of cycles each instruction requires for execution. In order to achieve this goal, RISC processors incorporate a number of techniques, including a compact and uniform instruction set, a deep instruction pipeline (as described above), and utilization of optimizing compilers. Many of the advantages obtained from these techniques can, however, be negated by an inefficient memory system.

Figure 9 illustrates memory in a simple microprocessor system. In this system, the CPU outputs addresses to memory and reads instructions and data from memory or writes data to memory. The address space is completely undifferentiated: instructions, data, and I/O devices are all treated the same. In such a system, a primary limiting performance factor is memory bandwidth.

[Figure 9: block diagram of a microprocessor (CPU) connected by address and data paths to main memory (and I/O)]

Figure 9. A Simple Microprocessor Memory System

Figure 10 illustrates a memory system that supports the significantly greater memory bandwidth required to take full advantage of the IDT79R3000A's performance capabilities. The key features of this system are:

• External Cache Memory: local, high-speed memory (called cache memory) is used to hold instructions and data that are repetitively accessed by the CPU (for example, within a program loop) and thus reduces the number of references that must be made to the slower-speed main memory. Some microprocessors provide a limited amount of cache memory on the CPU chip itself. The external caches supported by the IDT79R3000A can be much larger; while a small cache can improve performance of some programs, significant improvements for a wide range of programs require large caches.
• Separate Caches for Data and Instructions: even with high-speed caches, memory speed can still be a limiting factor because of the fast cycle time of a high-performance microprocessor. The IDT79R3000A supports separate caches for instructions and data and alternates accesses of the two caches during each CPU cycle. Thus, the processor can obtain data and instructions at the cycle rate of the CPU using caches constructed with commercially available IDT static RAM devices. In order to maximize bandwidth in the cache while minimizing the requirement for SRAM access speed, the R3000A divides a single processor clock cycle into two phases. During one phase, the address for the data cache access is presented while data previously addressed in the instruction cache is read; during the next phase, the data operation is completed while the instruction cache is being addressed. Thus, both caches are read in a single processor cycle using only one set of address and data pins.
• Write Buffer: in order to ensure data consistency, all data that is written to the data cache must also be written out to main memory. The cache write model used by the IDT79R3000A is that of a write-through cache; that is, all data written by the CPU is immediately written into the main memory. To relieve the CPU of this responsibility (and the inherent performance burden), the IDT79R3000A supports an interface to a write buffer. The IDT79R3020 Write Buffer captures data (and associated addresses) output by the CPU and ensures that the data is passed on to main memory.

Figure 10. An IDT79R3000A System with a High-Performance Memory System

IDT79R3000A Processor Subsystem Interfaces

Figure 11 illustrates the three subsystem interfaces provided by the IDT79R3000A processor:

• Cache control interface (on-chip) for separate data and instruction caches permits implementation of off-chip caches using standard IDT SRAM devices. The 79R3000A directly controls the cache memory with a minimum of external components. Both the instruction and data cache can vary from 4K to 256K Bytes (64K entries). The 79R3000A also includes the TAG control logic which determines whether or not the entry read from the cache is the desired data.
The 79R3000A cache controller implements a direct mapped cache for high net performance (bandwidth). It has the ability to refill multiple words when a cache miss occurs, thus reducing the effective miss rate to less than 2% for large caches. When a cache miss occurs, the 79R3000A can support refilling the cache in 1, 4, 8, 16, or 32 word blocks to minimize the effective penalty of having to access main memory. The 79R3000A also incorporates the ability to perform instruction streaming; while the cache is refilling, the processor can resume execution once the missed word is obtained from main memory. In this way, the processor can continue to execute concurrently with the cache block refill.
• Memory controller interface for system (main) memory. This interface also includes the logic and signals to allow operation with a write buffer to further improve memory bandwidth. In addition to the standard full word access, the memory controller supports the ability to write bytes and half-words by using partial word operations. The memory controller also supports the ability to retry memory accesses if, for example, the data returned from memory is invalid and a bus error needs to be signalled.
• Coprocessor Interface: the IDT79R3000A features a tightly coupled coprocessor interface in which all coprocessors maintain synchronization with the main processor, reside on the same data bus as the main processor, and participate in bus transactions in an identical manner to the main processor. The IDT79R3000A generates all required cache and memory control signals, including cache and memory addresses for attached coprocessors. As a result, only the data bus and a few control signals need to be connected to a coprocessor. The interface supports three types of coprocessor instructions: loads/stores, coprocessor operations, and processor-coprocessor transfers.
Note that coprocessor loads and stores occur directly between the coprocessor and memory, without requiring the data to go through the CPU. Synchronization between the CPU and external coprocessors is achieved using a Phase-Locked Loop interface to the coprocessor. The coprocessor physical interface also includes coprocessor condition signals (CpCond(n)), which are used in coprocessor branch instructions, and a coprocessor busy signal (CpBusy), which is used to stall the CPU if the coprocessor needs to hold off subsequent operations. Finally, a precise exception interface is defined between the CPU and coprocessors using the external interrupt inputs of the CPU. This allows a coprocessor exception, even if it was the result of a multi-cycle operation, to be traced to the precise coprocessor operation which caused it. This is an important feature for languages which can define specific error handlers for each task.

The interface supports up to four separate coprocessors. Coprocessor 0 is defined to be the system control coprocessor, and resides on the same chip as the CPU unit. Coprocessor 1 is the Floating Point Accelerator, IDT79R3010A. Coprocessors 2 and 3 are available to support an interface to application specific functions.

MULTIPROCESSING SUPPORT

The IDT79R3000A supports multiprocessing applications in a simple but effective way. Multiprocessing applications require cache coherency across the multiple processors. The IDT79R3000A offers two signals to support cache coherency. The first, MPStall, stalls the processor within two cycles of being received and keeps it from accessing the cache. This allows an external agent to snoop into the processor data cache. The second signal, MPInvalidate, causes the processor to write data on the data cache bus which indicates that the externally addressed cache entry is invalid. Thus, a subsequent access to that location would result in a cache miss, and the data would be obtained from main memory.
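The effect of an invalidate on a direct mapped cache can be sketched with a small software model. This is a simplified illustration, not the device's logic: the class `DirectMappedCache` and its methods are hypothetical, the refill is a single word, and keeping address bits 31..12 as the stored tag is an assumption based on the width of the Tag (12-31) bus.

```python
class DirectMappedCache:
    """Toy model: one 32-bit word per entry, a tag and a valid bit per entry."""

    def __init__(self, size_bytes: int = 64 * 1024):
        self.entries = size_bytes // 4          # 64 KB -> 16K word entries
        self.tags = [0] * self.entries
        self.valid = [False] * self.entries

    def _split(self, addr: int):
        index = (addr >> 2) % self.entries      # word index within the cache
        tag = addr >> 12                        # assumed: bits 31..12 are stored
        return index, tag

    def access(self, addr: int) -> bool:
        """Return True on a hit; on a miss, refill the entry and return False."""
        index, tag = self._split(addr)
        if self.valid[index] and self.tags[index] == tag:
            return True
        self.tags[index] = tag                  # single-word refill from memory
        self.valid[index] = True
        return False

    def invalidate(self, addr: int) -> None:
        """Mark the addressed entry invalid, as an external snooper would request."""
        index, _ = self._split(addr)
        self.valid[index] = False


cache = DirectMappedCache()
first = cache.access(0x1000)      # miss: entry is filled
second = cache.access(0x1000)     # hit
cache.invalidate(0x1000)          # externally addressed entry marked invalid
third = cache.access(0x1000)      # miss again: data comes from main memory
```

In this model the third access misses only because of the invalidate, which mirrors the behaviour the text attributes to MPInvalidate.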
The two MP signals would be generated by external logic which utilizes a secondary cache to perform bus snooping functions. The 79R3000A does not impose an architecture for this secondary cache, but rather is flexible enough to support a variety of application specific architectures and still maintain cache coherency. There is no impact on designs which do not require this feature.

Further, the 79R3000A has improved on the multiprocessor support found in the 79R3000 by allowing the use of cache RAMs with internal address latches in multiprocessor systems.

ADVANCED FEATURES

The IDT79R3000A offers a number of additional features, such as the ability to swap the instruction and data caches, facilitating diagnostics and cache flushing. Another feature isolates the caches, which forces cache hits to occur regardless of the contents of the tag fields. The IDT79R3000A allows the processor to execute user tasks of the opposite byte ordering (endianness) of the operating system, and further allows parity checking to be disabled. More details on these features can be found in the IDT79R3000A Family Hardware User's Manual.

Further features of the IDT79R3000A are configured during the last four cycles prior to the negation of the RESET input. These functions include the ability to select cache sizes and cache refill block sizes; the ability to utilize the multiprocessor interface; whether or not instruction streaming is enabled; whether byte ordering follows "Big-Endian" or "Little-Endian" protocols, etc. Table 3 shows the configuration options selected at Reset. These are further discussed in the "Hardware User's Manual".

BACKWARD COMPATIBILITY WITH 79R2000

The IDT79R3000A can be used in sockets designed for the 79R3000A. The pin-out of the 79R3000A has been selected to ensure this compatibility, with new functions mapped onto previously unused pins. The instruction set is compatible with that of the 79R2000 at the binary level.
As a result, code written for the older processor can be executed. New features can be selectively disabled. In most 79R3000A applications, the 79R3000A can be placed in the socket with no modification to initialization settings. Further application assistance on this topic is available from IDT.

PACKAGE THERMAL SPECIFICATIONS

The IDT79R3000 utilizes special packaging techniques to improve both the thermal and electrical characteristics of the microprocessor.

In order to improve the electrical characteristics of the device, the package is constructed using multiple signal planes, including individual power planes and ground planes to reduce noise associated with high-frequency TTL parts. In addition, the 175-pin PGA package utilizes extra power and ground pins to reduce the inductance from the internal power planes to the power planes of the PC board.

In order to improve the thermal characteristics of the microprocessor, the device is housed using cavity down packaging. In addition, these packages incorporate a copper-tungsten thermal slug designed to efficiently transfer heat from the die to the case of the package, and thus effectively lower the thermal resistance of the package. The use of an additional external heat sink affixed to the package thermal slug further decreases the effective thermal resistance of the package.

The case temperature may be measured in any environment to determine whether the device is within the specified operating range. The case temperature should be measured at the center of the top surface opposite the package cavity (the package cavity is the side where the package lid is mounted).

The equivalent allowable ambient temperature, TA, can be calculated using the thermal resistance from case to ambient (θca) for the given package.
The following equation relates ambient and case temperature:

TA = TC - (P × θca)

where P is the maximum power consumption, calculated by using the maximum ICC from the DC Electrical Characteristics section. Typical values for θca at various airflows are shown in Table 3 for the various CPU packages.

Airflow (ft/min)            0     200   400   600   800   1000
θca (175-PGA, 144-PGA)      21    7     3     2     1     0.5
θca (172 Quad Flatpack)     23    9     4     3     2.5   1.5

Table 3. Thermal Resistance (θca) at Various Airflows

INPUT   W CYCLE            X CYCLE            Y CYCLE            Z CYCLE
Int0    DBlkSize0          DBlkSize1          Extend Cache       BigEndian
Int1    IBlkSize0          IBlkSize1          MPAdrDisable       TriState
Int2    DispPar/RevEnd     IStream            IgnoreParity       NoCache
Int3    Reserved(1)        Store Partial      MultiProcessor     BusDriveOn
Int4    PhaseDelayOn(2)    PhaseDelayOn(2)    PhaseDelayOn(2)    PhaseDelayOn(2)
Int5    R3000 Mode(2)      R3000 Mode(2)      R3000 Mode(2)      R3000 Mode(2)

NOTES:
1. Reserved entries must be driven high.
2. These values must be driven stable throughout the entire RESET period.

Table 3: IDT79R3000A Mode Selectable Features

[Figure 11: block diagram of the IDT79R3000A processor with system control coprocessor, connected through the Data, Tag, and AdrLo buses (with transparent latches on AdrLo) to the instruction cache and the data cache, and through its control signals to the clocks, the memory interface, the coprocessors, and the hardware interrupts]

Figure 11.
IDT79R3000A Subsystem Interfaces Example; 64 KB Caches

PIN CONFIGURATION

[Pin diagram: 172-Pin Ceramic Flatpack (Top View)]

NOTES:
1. Reserved pins must not be connected.
2. AdrLo 16 & 17 are multi-function pins which are controlled by mode select programming on interrupt pins at reset time. AdrLo 16: MP Invalidate, CpCond (2). AdrLo 17: MP Stall, CpCond (3).

[Pin diagram: 144-Pin PGA (Top View)]

NOTE:
1. AdrLo 16 & 17 are multi-function pins which are controlled by mode select programming on interrupt pins at reset time. AdrLo 16: MP Invalidate, CpCond (2). AdrLo 17: MP Stall, CpCond (3).

[Pin diagram: 175-Pin PGA (Top View)]

NOTE:
1. AdrLo 16 & 17 are multi-function pins which are controlled by mode select programming on interrupt pins at reset time. AdrLo 16: MP Invalidate, CpCond (2). AdrLo 17: MP Stall, CpCond (3).

PIN DESCRIPTIONS

PIN NAME        I/O   DESCRIPTION
Data (0-31)     I/O   A 32-bit bus used for all instruction and data transmission among the processor, caches, memory interface, and coprocessors.
DataP (0-3)     I/O   A 4-bit bus containing even parity over the data bus.
Tag (12-31)     I/O   A 20-bit bus used for transferring cache tags and high addresses between the processor, caches, and memory interface.
TagV            I/O   The tag validity indicator.
TagP (0-2)      I/O   A 3-bit bus containing even parity over the concatenation of TagV and Tag.
AdrLo (0-17)    O     An 18-bit bus containing byte addresses used for transferring low addresses from the processor to the caches and memory interface. (AdrLo 16: CpCond (2), AdrLo 17: CpCond (3) set by reset initialization).
IRd1            O     Read enable for the instruction cache.
IWr1            O     Write enable for the instruction cache.
IRd2            O     An identical copy of IRd1 used to split the load.
IWr2            O     An identical copy of IWr1 used to split the load.
IClk            O     The instruction cache address latch clock. This clock runs continuously.
DRd1            O     The read enable for the data cache.
DWr1            O     The write enable for the data cache.
DRd2            O     An identical copy of DRd1 used to split the load.
DWr2            O     An identical copy of DWr1 used to split the load.
DClk            O     The data cache address latch clock. This clock runs continuously.
XEn             O     The read enable for the Read Buffer.
AccTyp (0-2)    O     A 3-bit bus used to indicate the size of data being transferred on the data bus, whether or not a data transfer is occurring, and the purpose of the transfer.
MemWr           O     Signals the occurrence of a main memory write.
MemRd           O     Signals the occurrence of a main memory read.
BusError        I     Signals the occurrence of a bus error during a main memory read or write.
Run             O     Indicates whether the processor is in the run or stall state.
Exception       O     Indicates that the instruction about to commit state should be aborted and other exception related information.
SysOut          O     A reflection of the internal processor clock used to generate the system clock.
CpSync          O     A clock which is identical to SysOut and used by coprocessors for timing synchronization with the CPU.
RdBusy          I     The main memory read stall termination signal. In most system designs RdBusy is normally asserted and is deasserted only to indicate the successful completion of a memory read. RdBusy is sampled by the processor only during memory read stalls.
WrBusy          I     The main memory write stall initiation/termination signal.
CpBusy          I     The coprocessor busy stall initiation/termination signal.
CpCond (0-1)    I     A 2-bit bus used to transfer conditional branch status from the coprocessors to the main processor.
CpCond (2-3)    I     Conditional branch status from coprocessors to the processor. Function is provided on AdrLo 16/17 pins and is selected at reset time.
MPStall         I     Multiprocessing Stall. Signals to the processor that it should stall accesses to the caches in a multiprocessing environment. This is physically the same pin as CpCond3; its use is determined at RESET initialization.
MPInvalidate    I     Multiprocessing Invalidate. Signals to the processor that it should issue invalidate data on the cache data bus. The address to be invalidated is externally provided. This is the same pin as CpCond2; its use is determined at RESET initialization.
Int (0-5)       I     A 6-bit bus used by the memory interface and coprocessors to signal maskable interrupts to the processor. At reset time, mode select values are read in.
Clk2xSys        I     The master double frequency input clock used for generating SysOut.
Clk2xSmp        I     A double frequency clock input used to determine the sample point for data coming into the processor and coprocessors.
Clk2xRd         I     A double frequency clock input used to determine the enable time of the cache RAMs.
Clk2xPhi        I     A double frequency clock input used to determine the position of the internal phases, phase1 and phase2.
Reset           I     Synchronous initialization input used to force execution starting from the reset memory address. Reset must be deasserted synchronously but asserted asynchronously. The deassertion of Reset must be synchronized by the leading edge of SysOut.
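The even parity carried on DataP can be illustrated with a short calculation. This is a sketch under a stated assumption: the table above only says that DataP is a 4-bit bus holding even parity over the data bus, so the per-byte arrangement used here (DataP bit i covering data bits 8i to 8i+7) is assumed, and the function names are hypothetical.

```python
def even_parity_bit(value: int) -> int:
    """Parity bit that makes the total number of 1s (value plus parity bit) even."""
    return bin(value).count("1") & 1


def data_parity(word: int) -> int:
    """Return a 4-bit value; bit i is the even parity of byte i of a 32-bit word.

    Assumption: one parity bit per byte, lowest byte on bit 0.
    """
    if not 0 <= word <= 0xFFFFFFFF:
        raise ValueError("expected a 32-bit data word")
    parity = 0
    for byte_index in range(4):
        byte = (word >> (8 * byte_index)) & 0xFF
        parity |= even_parity_bit(byte) << byte_index
    return parity


def parity_ok(word: int, parity: int) -> bool:
    """Check a received word against its received parity bits."""
    return data_parity(word) == parity
```

With this arrangement a single flipped data bit changes exactly one parity bit, so `parity_ok` fails for the corrupted word, which is the kind of error that parity checking is meant to expose.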
IDT79R3000AlAE RISC CPU PROCESSOR MILITARY AND COMMERCIAL TEMPERATURE RANGES ABSOLUTE MAXIMUM RATINGS(1,3) SYMBOL RATING COMMERCIAL MILITARY UNIT VTERM Terminal Voltage with Respect to GND -0.5 to +7.0 -0.5 to +7.0 V TA.Te Operating Temperature Oto +70 (Ambient) -55 to +125 (Case) °C -55 to +125 -65 to +135 °C -55 to +125 -65 to +150 °C -0.5 to +7.0 -0.5 to +7.0 V TSIAS TSTG VIN Temperature Under Bias Storage Temperature(2) Input Voltage AC TEST CONDITIONS PARAMETER MIN. MAX. UNIT Input HIGH Voltage 3.0 - V VIL Input lOW Voltage - 0.4 V VIHS Input HIGH Voltage 3.5 - V VILS Input lOW Voltage - 0.4 V TEMPERATURE GND Vee Military -55°C to + 125°C (Case) OV 5.0 ± 10% O°C to +70°C (Ambient) OV 5.0 ±5% OUTPUT LOADING FOR AC TESTING 2. VIN minimum = -3.0V for pulse width less than 15ns. VIN should not exceed Vee +0.5 Volts. 3. Not more than one output should be shorted at a time. Duration of the short should not exceed 30 seconds. VIH GRADE Commercial NOTES: 1. Stresses greater than those listed under ABSOLUTE MAXIMUM RATINGS may cause permanent damage to the device. This is a stress rating only and functional operation of the device at these or any other conditions above those indicated in the operational sections of this specification is not implied. Exposure to absolute maximum rating conditions for extended periods may affect reliability. SYMBOL RECOMMENDED OPERATING TEMPERATURE AND SUPPLY VOLTAGE 39 r---r---O To Device Under Test 1DT79R3000AlAE RISC CPU PROCESSOR MIUTARY AND COMMERCIAL TEMPERATURE RANGES DC ELECTRICAL CHARACTERISTICSCOMMERCIAL TEMPERATURE RANGE TA =O°Cto +70°C, Vee a SYMBOL PARAMETER TEST CONDITIONS +5.0V±5% 16.67MHz 20.0MHz 25.0MHz 33.33MHz UNIT MIN. MAX. MIN. MAX. MIN. MAX. MIN. MAX. VOH Output HIGH Voltage Vee - Min., IOH 3.5 - 3.5 - 3.5 - 3.5 - V VOL Output LOW Voltage Vee = Min., IOL" 4mA - 0.4 0.4 - 0.4 - 0.4 V VOHe Output HIGH Voltage(7) Vee .. Min., IOH .. -4mA 4.0 - 4.0 - 4.0 V (4,6) Vee .. 
Min., IOH = -SmA 2.4 - - Output HIGH Voltage 2.4 - 4.0 VOHT 2.4 - 2.4 - V VOLT Output LOW Voltage (4,6) Vee = Min., IOL = SmA - O.S - O.S - O.S - O.S V VIH Input HIGH Voltage (5) 2.0 - 2.0 - 2.0 - 2.0 - V VIL Input LOW Voltage (1) - O.S - O.S - O.S - O.S V VIHS Input HIGH Voltage (2,5) 3.0 - 3.0 - 3.0 - 3.0 - V VILS Input LOW Voltage (1,2) - 0.4 0.4 0.4 - 0.4 V 10 - 10 pF 10 pF 450 - 10 - - 750 mA 10 ~ CIN Input Capacitance COUT Output Capacitance lee Operating Current hH Input HIGH Leakage (3) VIH = Vee hL Input LOW Leakage VIL loz Output Tri-state Leakage =-4mA (6) (6) Vee = 5V, TA = 70°C (3) =GND -10 -40 - VOH = 2.4V, VOL = 0.5V 10 - - -10 - -10 - -10 - ~ 40 -40 40 -40 40 -40 40 ~ 10 10 10 10 550 10 650 10 NOTES: 1. VIL Min. = -3.0V for pulse width less than 15ns. VIL should not fall below -0.5 Volts for larger periods. 2. VIHS and VILS apply to Clk2xSys, Clk2xSmp, CIk2xRd, Clk2xPhi, Cp8usy, and R9s9t. 3. These parameters do not apply to the clock inputs. 4. VOHT and VOLT apply to the bidirectional data and tag busses only. Note that VIH and VIL also apply to these signals. VOHT and VOLT are provided to give the deSigner further information about these specific signals. 5. VIH should not be held above Vcc + 0.5 volts. 6. Guaranteed by design. 7. VOHC applies to 1mr::r and Exception. 40 1DT79R3000AlAE RISC CPU PROCESSOR MIUTARV AND COMMERCIAL TEMPERATURE RANGES DC ELECTRICAL CHARACTERISTICSMILITARY TEMPERATURE RANGE (Te =-55°C to +125°C, Vee =+5.0V± 10%) SYMBOL PARAMETER 16.67MHz MAX. MIN. TEST CONDITIONS 20.0MHz MAX. MIN. 25.0MHz MIN. MAX. 
UNIT VOH Output HIGH Voltage Vee = Min., IOH = -4mA 3.5 - 3.5 - 3.5 - V VOL Output LOW Voltage Vee = Min., IOL = 4mA - 0.4 - 0.4 - 0.4 V VOHe Output HIGH Voltage(7) Vee = Min., IOH = -4mA 4.0 - 4.0 - V VOHT (4,6) Vee = Min., IOH = -SmA 2.4 - 2.4 - 4.0 Output HIGH Voltage 2.4 - V VOLT Output LOW Voltage (4,6) Vee = Min., IOL = SmA - O.S - O.S - O.S V VIH Input HIGH Voltage (5) 2.0 - 2.0 - 2.0 - V VIL Input LOW Voltage (1) - O.S - 0.8 - O.S V VIHS Input HIGH Voltage (2,5) 3.0 - 3.0 - 3.0 - V - 0.4 - 0.4 - 0.4 V 10 - 10 pF VILS Input LOW Voltage (1,2) CIN Input Capacitance COUT Output Capacitance lee Operating Current hH Input HIGH Leakage (3) VIH = Vee IlL Input LOW Leakage (3) VIL = GND -10 - -10 - -10 VOH = 2.4V, VOL = 0.5V -40 40 -40 40 -40 40 J.1.A loz NOTES: (6) (6) Vee = 5V, TA = 70°C Output Tri-state Leakage 10 10 550 10 - 10 675 10 10 pF 775 rnA 10 J.1.A - J.1.A 1. VIL Min. = -3.0V for pulse width less than 15ns. VIL should not fall below -0.5 Volts for larger periods. 2. VIHS and VILS apply to Clk2xSys, Clk2xSmp, Clk2xRd, Clk2xPhi, Cp8usy, and 3. These parameters do not apply to the clock inputs. J5.eSei. 4. VOHTand VOLT apply to the bidirectional data and tag busses only. Note thatVIH and VIL also apply to these signals. VOHTand VOLT are provided to give the designer further information about these specific signals. 5. VIH should not be held above Vee + 0.5 volts. 6. Guaranteed by design. 7. VOHC applies to 1m'FJ and Exception. 41 IDT79R3000AlAE RISC CPU PROCESSOR MIUTARY AND COMMERCIAL TEMPERATURE RANGES AC ELECTRICAL CHARACTERISTICS FOR IDT79R3000A(1, 2, 3)_ COMMERCIAL TEMPERATURE RANGE (TA =0°Cto+70°C. Vcc=+5.0V±5%) SYMBOL TEST CONDITIONS PARAMETER 25.0MHz 33.33MHz 20.0MHz 16.67MHz MIN. MAX. MIN. MAX. MIN. MAX. MIN. MAX. 
UNIT Clock TCkHigh Input Clock High(2) Note 7 12.5 - 10 - 8 TCklow Input Clock Low(2) Note 7 12.5 - 10 - 8 TCkP Input Clock Period(2) Clk2xSys to CIk2xSmp(6) Clk2xSmp to Clk2xRd(6) Clk2xSmp_to CIk2xPhi(6) - - ns ns 500 tcyc/4 tcyc/4 tcyc/4 15 0 0 4.5 500 tcyc/4 tcyc/4 tcyc/4 ns ns ns ns -2 - -1.5 ns - -0.5 - -1.5 -1 -0.5 ns 3 - 3 - 2.5 ns 3 - 3 ns 7 - 25 0 0 7 500 tcyc/4 tcyc/4 tcyc/4 - -2 - -1 3 - 5 - 4 - 8 -2.5 - 11 - 9 -2.5 - -2.5 9 6 6 20 0 0 5 500 tcyc/4 tcyc/4 tcyc/4 30 0 0 - Run Operation TOEn Data Enable(3) TOOls Data Disable(3) TOVal Data Valid Load = 25pF TWrOly Write Delay Load = 25pF Tos Data Set-up 9 TOH Data Hold(3) -2.5 TCBS CpBusy Set-up TCBH CpBusy Hold TAcTy Access Type (1 :0) Load = 25pF TAT2 Access Type (2) TMWr Memory Write TExc Exception TAval Address Valid = 25pF Load = 25pF Load = 25pF Load = 25pF TintS Int(n) Set-up TlntH Int(n) Hold 13 -2.5 - Load 9 -2.5 7 - 6 17 - 14 27 - 23 8 - 7 2 - -2.5 7 2 -2.5 7 -2.5 - -2.5 - 5 - 3.5 ns 12 - 8.5 ns - 18 5 2 - 5 -2.5 7 5 -2.5 ns ns ns ns 13.5 ns 3.5 ns 1 ns - ns 15 ns 13.5 ns ns Stall Operation TSAVal Address Valid TSAcTy Address Type TMRdi Memory Read Initiate TMRdt Memory Read Terminate TStl Run Terminate TRun Run Initiate TSMWr Memory Write TSExc Exception Valid Load - 30 Load - 27 = 25pF = 25pF Load = 25pF Load = 25pF Load = 25pF Load = 25pF Load = 25pF Load = 25pF 1 27 - 27 1 23 23 23 - 23 1 20 18 18 - 18 1 13.5 ns - 13.5 ns 7.5 ns 2 ns 3 17 3 15 3 10 2 - 7 - 6 - 4 - 3 27 3 23 3 18 2 13.5 ns - 15 - 13 - 10 - 7.5 ns 6 6 - 6 128 - 128 - 0.5 2 0.5 1 Reset Initialization TRST Reset Pulse Width TrstPll Reset timing. Phase-lock on(4, 5) Trstcp Reset timing. Phase-lock 3000 Off(4. 5) 3000 128 3000 6 3000 128 - Tcyc Tcyc Tcyc Capacitive Load Deration CLD Load Derate(6) 0.5 1 0.5 1 ns125pF NOTES: 1. All timings are referenced to 1.5V. 2. The clock parameters apply to all four 2xClocks: Clk2xSys, Clk2xSmp, Clk2xRd, and CIk2xPhi. 3. This parameter is guaranteed by design. 4. 
These parameters apply when the 79R301 0 Floating Point Coprocessor is connected to the CPU. With phase lock on, Reset must be asserted for the longer of 3000 clock cycles or 200 microseconds. 5. Tcyc is one CPU clock cycle (two cycles of a 2x clock). 6. With the exception of the 'RUri signal, no two signals on a given device will derate for a given load by a difference greater than 15%. 7. Clock transition time < 2.5ns for 33.33 MHz; clock transition time < 5ns for other speeds. 42 1DT79R3000AlAE RISC CPU PROCESSOR MIUTARY AND COMMERCIAL TEMPERATURE RANGES AC ELECTRICAL CHARACTERISTICS FOR IDT79R3000AE(1, 2, 3)_ COMMERCIAL TEMPERATURE RANGE (TA =0°Cto+70°C, Vcc=+5.0V±5%) SYMBOL PARAMETER TEST CONDITIONS 25.0MHz 33.33MHz 20.0MHz 16.67MHz MIN. MAX. MIN. MAX. MIN. MAX. MIN. MAX. UNIT Clock TCkHigh Input Clock High(2) Note 7 12.5 TCklow Input Clock Low(2) Note 7 12.5 TCkP Input Clock Period(2) Clk2xSys to CIk2xSmp(6) Clk2xSmp to Clk2xRd(6) Clk2xSmp to CIk2xPhi(6) 30 0 0 9 500 tcyc/4 tcyc/4 tcyc/4 10 10 25 0 0 7 500 tcyc/4 tcyc/4 tcyc/4 8 8 20 0 0 5 - 6 6 - ns ns 500 tcyc/4 tcyc/4 tcyc/4 15 0 0 3.5 500 tcyc/4 tcyc/4 tcyc/4 -1.5 - -1.5 ns -0.5 ns 2 ns 2 ns ns ns ns ns Run Operation - TOEn Data Enable(3) - -2 TOOls - -1 - -1 TOVal Data Disable(3) Data Valid Load = 25pF - 3 Write Delay Load = 25pF - 3 TWrOly 5 - 4 Tos Data Set-up 9 8 TOH Data Hold(3) -2.5 - -2.5 TCBS Cp8usy Set-up 13 - 11 - TCBH Cp8usy Hold -2.5 - -2.5 - -2.5 TAcTy Access Type (1 :0) TAT2 Access Type (2) Load = 25pF TMWr Memory Write TExc Load = 2SpF - 27 Exception Load = 25pF - 7 Load = 25pF - 1.5 9 - TAval Address Valid TintS Int(n) Set-up TlntH Int(n) Hold Load = 25pF -2.5 7 17 -2.5 8 -2.5 -2 6 14 23 7 1.5 - 6 9 6 -2.5 -0.5 3 3 5 12 18 5 1.5 - 4.5 - ns -2.5 - ns - 3.5 ns 8.5 ns 4.5 - 7 -2.5 -2.5 ns ns 9.5 ns 3.5 ns 1 ns ns ns Stall Operation 27 Load = 25pF - Load = 25pF - 27 TSAVal Address Valid Load = 25pF TSAcTy Address Type Load = 25pF TMRdi Memory Read Initiate TMRdt Memory Read Terminate 30 27 - 
23 23 23 23 - 15 ns 18 13.5 ns 18 - 13.5 ns 18 - 10 ns 7.5 ns 3 ns - 20 TStl Run Terminate Load = 25pF 3 17 3 15 3 10 2 TRun Run Initiate Load = 25pF - 7 - 6 - 4 - TSMWr Memory Write Load = 25pF 3 27 3 23 3 18 2 9.5 ns TSExc Exception Valid Load = 25pF - 15 - 13 - 10 - 7.5 ns 6 6 3000 128 - 0.5 1 Reset Initialization TRST Reset Pulse Width TrstPll Reset timing, Phase-lock on(4, 5) 3000 Trstcp Reset timing. Phase-lock off(4, 5) 128 - 0.5 2 6 - 6 3000 - 3000 128 - 128 - 0.5 1 Tcyc Tcyc Tcyc Capacitive Load Deration CLD Load Derate(6) 0.5 1 ns125pF NOTES: 1. All timings are referenced to 1.SV. 2. The clock parameters apply to all four 2xClocks: Clk2xSys, Clk2xSmp, Clk2xRd, and Clk2xPhi. 3. This parameter is guaranteed by design. 4. These parameters apply when the 79R3010 Floating Point Coprocessor is connected to the CPU. With phase lock on, ~ must be asserted for the longer of 3000 clock cycles or 200 microseconds. 5. Tcyc is one CPU clock cycle (two cycles of a 2x clock). 6. With the exception of the 11Url signal, no two signals on a given device will derate for a given load by a difference greater than 1S%. 7. Clock transition time < 2.Sns for 33.33 MHz; clock transition time < Sns for other speeds. 43 IDT79R3000AlAE RISC CPU PROCESSOR MIUTARY AND COMMERCIAL TEMPERATURE RANGES AC ELECTRICAL CHARACTERISTICS FOR IDT79R3000A(1,2,3)MILITARY TEMPERATURE RANGE (Te .. -55°Cto+125°C. Vee .. +5.0V± 10%) SYMBOL PARAMETER TEST CONDITIONS 16.67MHz MAX. MIN. 20.0MHz MAX. MIN. 25.0MHz MIN. MAX. 
UNIT Clock TekHigh Input Clock High(2) Note 7 12.5 Teklow Input Clock Low(2) Note 7 12.5 TekP Input Clock Period(2) Clk2xSys to CIk2xSmp(6) Clk2xSmp to Clk2xRd(6) Clk2xSmp to CIk2xPhi(6) 30 0 0 9 500 tcyc/4 tcyc/4 tcyc/4 10 10 25 0 0 7 - - 8 - ns 8 - ns 500 tcyc/4 tcyc/4 tcyc/4 20 0 0 5 500 tcyc/4 tcyc/4 tcyc/4 ns ns ns ns ns Run Operation -2 - -1.5 - -1 - -0.5 ns 3 ns 8 - - TOEn Data Enable(3) - -2 TOOls Data Disable(3) - -1 TOVal Data Valid Load =25pF - 3 TWrOly Write Delay Load = 25pF - 5 Tos Data Set-up 9 TOH Data Hold(3) -2.5 -2.5 TeBs Cp8usy Set-up 13 - TeBH Cp8usy Hold -2.5 - -2.5 TAeTy Access Type (1 :0) Load = 25pF - TAT2 Access Type (2) Load = 25pF TMWr Memory Write Load = 25pF TExe Exception Load = 25pF TAval Address Valid Load = 25pF - TintS Int(n) Set-up TlntH Int(n) Hold 11 - 3 4 7 -2.5 9 -2.5 3 ns - ns - ns ns ns 27 - 23 - 7 - 7 - 2 - 2 ns 7 - ns 7 17 6 14 5 ns 12 ns 18 ns 5 ns 2 - 9 - 8 -2.5 - -2.5 - -2.5 - 30 - 23 - 20 ns 27 - 23 - 18 ns 18 ns - ns Stall Operation TSAVal Address Valid Load = 25pF TSAeTy Address Type Load = 25pF TMRdi Memory Read Initiate Load = 25pF TMRdt Memory Read Terminate Load = 25pF - 27 - 23 - 18 ns TStl Run Terminate Load = 25pF 3 17 3 15 3 10 ns TRun Run Initiate Load = 25pF - 7 - 6 - 4 ns TSMWr Memory Write Load = 25pF 3 27 3 23 3 18 ns TSExe Exception Valid Load = 25pF - 15 - 13 - 7.5 ns 1 27 1 23 1 Reset Initialization TRST Reset Pulse Width 6 - 6 - 6 Reset timing. Phase-lock on(4, 5) 3000 - 3000 Trstep Reset timing. Phase-lock Off(4, 5) 128 - 128 - 3000 - Tcyc TrstPll 128 - Tcyc 0.5 2 0.5 Tcyc Capacitive Load Deration CLD Load Derate(6) 1 0.5 1 ns125pF NOTES: 1. All timings are referenced to 1.5V. 2. The clock parameters apply to all four 2xClocks: Clk2xSys, Clk2xSmp, Clk2xRd, and Clk2xPhi. 3. This parameter is guaranteed by design. 4. These parameters apply when the 79R301 0 Floating Point Coprocessor is connected to the CPU. 
With phase lock on, Reset must be asserted for the longer of 3000 clock cycles or 200 microseconds. 5. Tcyc is one CPU clock cycle (two cycles of a 2x clock). 6. With the exception of the ~ signal, no two signals on a given device will derate for a given load by a difference greater than 15%. 7. Clock transition time < 2.5ns for 33.33 MHz; clock transition time < 5ns for other speeds. 44 IDT79R3000AlAE RISC CPU PROCESSOR MIUTARY AND COMMERCIAL TEMPERATURE RANGES AC ELECTRICAL CHARACTERISTICS FOR IDT79R3000AE(1, 2, 3)_ MILITARY TEMPERATURE RANGE (Te = -55°C to +125°C, Vee = +5.0V ± 10%) SYMBOL PARAMETER TEST CONDITIONS 16.67MHz MIN. MAX. 20.0MHz MIN. MAX. 25.0MHz MIN. MAX. UNIT Clock TekHigh Input Clock High(2) Note 7 12.5 - 10 - 8 - TCklow Input Clock Low(2) Note 7 12.5 - 10 - 8 - TCkP Input Clock Period(2) Clk2xSys to CIk2xSmp(6) Clk2xSmp to Clk2xRd(6) Clk2xSmp to CIk2xPhi(6) ns ns 500 tcyc/4 tcyc/4 tcyc/4 25 0 0 7 500 tcyc/4 tcyc/4 tcyc/4 20 0 0 5 500 tcyc/4 tcyc/4 tcyc/4 ns ns ns ns - -2 -2 - -1.5 ns 5 - - 8 30 0 0 9 Run Operation TDEn Data Enable(3) TDDls Data Disable(3) TDVal Data Valid TWrDly Write Delay TDS Data Set-up 9 TDH Data Hold(3) -2.5 TCBS CpBusy Set-up TCBH CpBusy Hold TAcTy Access Type (1 :0) TAT2 Access Type (2) TMWr Memory Write = 25pF Load = 25pF Load 13 -2.5 TExc Exception = 25pF Load = 25pF Load = 25pF Load = 25pF TAval Address Valid Load TlnlS Int(n) Set-up TlnlH Int(n) Hold - Load =25pF 9 -2.5 -1 3 -1 3 4 - -0.5 ns 3 ns 3 ns ns - 11 - 9 - -2.5 - -2.5 - ns 7 - 5 ns 17 27 7 2 - -2.5 - - 6 7 -2.5 - ns ns 12 ns 18 ns 7 - 5 ns 2 - 2 ns 14 23 8 - 7 - ns -2.5 - -2.5 - ns - 23 - Stall Operation TSAVal Address Valid TSAcTy Address Type TMRdi Memory Read Initiate TMRdl Memory Read Terminate Tstl Run Terminate TRun Run Initiate TSMWr Memory Write TSExc Exception Valid = 25pF = 25pF Load = 25pF Load = 25pF Load = 25pF - 30 =25pF Load = 25pF Load = 25pF Load Load Load 20 ns 18 ns 18 ns 18 ns 27 - 27 - 23 - 3 17 3 15 3 10 ns - 7 - 6 - 4 ns 3 27 3 23 3 18 
ns - 15 - 13 - 10 ns 27 23 23 Reset Initialization - TRST Reset Pulse Width TrstPll Reset timing, Phase-lock on(4, 5) 3000 Trstcp Reset timing, Phase-lock off(4, 5) 128 - 0.5 2 6 - 6 - Tcyc 3000 3000 - Tcyc 128 - 128 - Tcyc 0.5 1 0.5 1 ns/25pF 6 Capacitive Load Deration CLD Load Derate(6)

NOTES:
1. All timings are referenced to 1.5V.
2. The clock parameters apply to all four 2xClocks: Clk2xSys, Clk2xSmp, Clk2xRd, and Clk2xPhi.
3. This parameter is guaranteed by design.
4. These parameters apply when the 79R3010 Floating Point Coprocessor is connected to the CPU. With phase lock on, Reset must be asserted for the longer of 3000 clock cycles or 200 microseconds.
5. Tcyc is one CPU clock cycle (two cycles of a 2x clock).
6. With the exception of the Run signal, no two signals on a given device will derate for a given load by a difference greater than 15%.
7. Clock transition time < 2.5ns for 33.33 MHz; clock transition time < 5ns for other speeds.

Figure 12. Input Clock Timing (Tckp, Tcklow, and Tckhigh for Clk2xSys, Clk2xSmp, Clk2xRd, and Clk2xPhi)

Figure 13. Processor Reference Clock Timing (Tcyc, Tsmp, Tsys). These signals are not actually output from the processor. They are drawn to provide a reference for other timing diagrams.

Figure 14. Synchronous Memory (Cache) Timing

Figure 15. Memory Write Timing

Figure 16. Memory Read Timing

Figure 17. Co-Processor Load/Store Timing

Figure 18. Interrupt Timing

Figure 19. Mode Vector Initialization

NOTES:
1. Reset must be negated synchronously; however, it can be asserted asynchronously. Designs should not rely on the proper functioning of SysOut prior to the assertion of Reset.
2. If Phase-lock On or R3000 Mode are asserted as mode select options, they should be asserted throughout the Reset period, to insure that the slowest co-processor in the system has sufficient time to lock the CPU clocks.
3. Reset is actually sampled in both Phase 1 and Phase 2. To insure proper initialization, it is recommended that Reset be negated relative to the end of Phase 1.

ORDERING INFORMATION
IDT XXXXX (Device Type), followed by Speed, Package, and Process/Temp. Range
Process/Temp. Range:
Blank   Commercial (0°C to +70°C)
        Military (-55°C to +125°C), Compliant to MIL-STD-883, Class B
Package:
G175    175-Pin PGA (Cavity Down)
G144    144-Pin PGA (Cavity Down)
F       172-Pin Flat Pack
Speed:
16      16.67 MHz
20      20.0 MHz
25      25.0 MHz
33      33.33 MHz
Device Type:
79R3000A    RISC CPU Processor
79R3000AE   Enhanced Timing Version
Military Temperature Range Only

PRELIMINARY
IDT79R3001 RISController™
Integrated Device Technology, Inc.

FEATURES:
• Enhanced Instruction Set compatible version of IDT79R3000 RISC CPU
• Achieves high performance with reduced parts count and lower overall system cost
• Flexible on-chip cache controller supports various cache, main memory sizes
• Supports optional data parity with parity error output signal
• Works with IDT79R3010 RISC Floating-Point Coprocessor
• DMA interface support
• Large synchronous memory space for real-time systems
• Full 32-bit operations - 32-bit registers, 32-bit address and data interface
• On-chip memory management unit with 64 fully associative TLB entries maps 4 Gbyte virtual address space
• High-speed interrupt response (6 interrupt input pins) with precise exception capability
• High-speed CEMOS™ technology results in speeds from 12.5 to 25MHz
• Supports caches from 8 Kbytes to 16 Mbytes
• Independent block refill sizes for the instruction and data caches
• Concurrent cache refill and execution
• Works on 8-, 16- and 32-bit data
• Supports unaligned 32-bit data
• Optimizing compilers for C, Ada, Pascal, Fortran
• RTOS support for C or Ada environments

DESCRIPTION:
The IDT79R3001 brings the high performance inherent in the IDT79R3000 RISC Microprocessor to lower cost systems. It does this while maintaining full (both User and Kernel) software compatibility with both the IDT79R2000A and IDT79R3000 RISC Microprocessors. The IDT79R3001 achieves lower system cost by reducing the number of components required to construct a synchronous memory (or cache) external to the processor and by simplifying the asynchronous memory interface.
By removing the requirement for parity and allowing the system designer to select the cache organization which best suits the system, overall parts count is dramatically reduced while maintaining high performance.

Figure 1. IDT79R3001 Block Diagram (blocks shown: CP0 (System Control Co-Processor) with Exception/Control Registers, Local Control Logic, MMU Registers, and 64-Entry TLB; CPU with Master Pipeline/Bus Control, 32 General Registers, ALU, Shifter, Mult./Div., Address Adder, and PC Increment/Mux; Virtual Page Number/Virtual Address path; Tag (19), Address (24), and Data (32 + 4) buses)

CEMOS and RISController are trademarks of Integrated Device Technology, Inc.
APRIL 1990. COMMERCIAL TEMPERATURE RANGE. © 1990 Integrated Device Technology, Inc. DSC-90351-

The IDT79R3001 RISC Microprocessor extends the ability of the IDT79R3000 family to support embedded and cost sensitive applications. Its level of integration and flexibility allows high-performance systems to be constructed at reasonable cost in a straightforward manner, without forcing the system designer to support features not required in his application.

The IDT79R3001 consists of two tightly coupled processors integrated on a single chip. The first processor is a full 32-bit CPU based on RISC principles to achieve a new standard of performance in microprocessor based systems. The second processor is a system control co-processor, called CP0, containing a fully associative 64-entry TLB (Translation Lookaside Buffer), MMU (Memory Management Unit), and control registers, supporting a 4 Gigabyte virtual memory subsystem and a Harvard Architecture Synchronous Memory/Cache controller which achieves ultra-high bandwidth using industry standard SRAM devices.

This data sheet provides an overview of the features and architecture of the IDT79R3001 CPU.
A more detailed description of the operation and timing of this device is incorporated in the "IDT79R3001 Hardware User's Guide", and a detailed architectural overview is provided in the "mips RISC Architecture" book, both available from IDT. Further literature describing the hardware, software, and development tools for the IDT79R3001 is also available from IDT.

HARDWARE OVERVIEW
The IDT79R3001 is a high-performance RISC microprocessor incorporating a fast execution engine and a sophisticated yet flexible memory interface designed to support the processor bandwidth requirements at minimal system cost.

Execution Engine
The IDT79R3001 contains the same basic execution engine as the ultra-high performance IDT79R3000 and thus achieves over 20 MIPS performance at 25 MHz. The key to the performance of the processor is the instruction pipeline, illustrated in Figure 2. The execution of a single IDT79R3001 instruction consists of five primary steps, some of which may be broken down further into smaller subsets.

Figure 2. IDT79R3001 Five-Stage Pipeline (IF: I-Memory; RD: Read Reg File; ALU: Operation; MEM: D-Memory; WB: Write Back; each stage spans about one cycle)

The five primary stages of the pipeline, each of which requires approximately one CPU cycle, are:
IF    Instruction Fetch, when the processor fetches the instruction from the Instruction Synchronous Memory.
RD    Read required operands from the on-chip register file while decoding the instruction.
ALU   Perform the required operation on instruction operands.
MEM   Access data memory (load or store).
WB    Write results back to the register file.

Thus, the CPU achieves an average execution rate approaching one instruction per CPU cycle, since the execution of five instructions at a time is overlapped within the processor (Figure 3). Optimizing compiler technology fully comprehends the interaction of software with the various pipeline resources, and serves both to eliminate any potential pipeline conflicts which might arise and to maximize instruction throughput.
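The overlap of the five stages can be sketched with a small model. This is only an idealized illustration, assuming no stalls, no cache misses, and no pipeline conflicts; the function names are hypothetical and are not part of any IDT tool.

```python
# Idealized model of the five-stage pipeline described in the text.
# Assumption: every instruction advances one stage per CPU cycle (no stalls).
STAGES = ["IF", "RD", "ALU", "MEM", "WB"]


def pipeline_schedule(n_instructions):
    """Return, for each cycle, the list of (instruction index, stage) pairs in flight."""
    total_cycles = len(STAGES) + n_instructions - 1
    schedule = []
    for cycle in range(total_cycles):
        active = []
        for instr in range(n_instructions):
            stage_index = cycle - instr  # instruction i enters IF at cycle i
            if 0 <= stage_index < len(STAGES):
                active.append((instr, STAGES[stage_index]))
        schedule.append(active)
    return schedule


def cycles_per_instruction(n_instructions):
    """Average cycles per instruction for a stall-free run of n instructions."""
    return (len(STAGES) + n_instructions - 1) / n_instructions


if __name__ == "__main__":
    for cycle, active in enumerate(pipeline_schedule(5)):
        print(cycle, active)
    print(cycles_per_instruction(1000))
```

In this model the fifth cycle is the first one with five instructions in flight, and the average cost per instruction falls toward one cycle as the run gets longer, which is the behavior the text attributes to the overlapped pipeline.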
Figure 3. Instruction Execution in IDT79R3001 Pipeline (successive instructions overlap in the IF, RD, ALU, MEM, and WB stages; instruction flow is plotted against the current CPU cycle)

IDT79R3001 Memory Interfaces
The key to achieving the inherent performance of the IDT79R3001 is to design a memory subsystem capable of providing a new instruction to the processor on almost every clock cycle. Like the IDT79R3000, the IDT79R3001 supports a hierarchical view of the memory subsystem. However, the IDT79R3001 allows the system designer to make more trade-offs in the partitioning and architecture of the various levels in order to more completely meet the needs of certain types of applications.

The IDT79R3001 supports two classifications of external memory: synchronous and asynchronous. The Harvard-Architecture (separate instruction and data memories) synchronous memory allows the processor to achieve the highest levels of performance. The processor is able to obtain both an instruction and a data word from the synchronous memory on every clock cycle, resulting in high instruction and data throughput. The asynchronous memory space contains larger, slower memory devices such as EPROM, main memory DRAMs, and peripheral devices. Multiple clock cycles are required for data movement in the asynchronous memory.

Many systems implement a memory hierarchy between these two memory spaces, whereby the synchronous memory space is used as processor caches and the asynchronous memory space is used for main memory. The IDT79R3001 integrates a flexible direct-mapped cache controller on-chip, eliminating external cache control logic and minimizing cache management overhead. If the synchronous memory space is used for processor caches, then cache "misses" will cause the processor to automatically process an asynchronous memory transfer to refill the cache. The key to achieving the system cost and performance goals of an IDT79R3001-based system is to partition the memory system to the needs of the application.

Synchronous Memory System
As with any high-performance processor, the IDT79R3001 requires high bandwidth to achieve high performance. Thus, it is important that the majority of its execution occur in the synchronous memory space. In applications which require substantial amounts of main memory, this memory space will be implemented as instruction and data caches. The synchronous memory is designed to be able to supply both an instruction and a data word to the processor on each clock cycle. When the synchronous memory spaces are used as caches, they hold instructions and data that are repetitively accessed by the CPU (for example, within a program loop). This reduces the number of slower asynchronous memory cycles and thus achieves higher performance.

Figure 4. Synchronous Memory Control Timing (AddrLo, DClk, IClk, IRd, the Data and TAG buses, and the CPU data pins across alternating instruction read, data read, and data store phases)

Some microprocessors incorporate small amounts of cache on-chip, which has a very small and unpredictable effect on the execution of large programs. The IDT79R3001 supports caches from 8KB up through 16MB in size, thus bringing substantial performance improvements to very large programs and also allowing real-time system designers to design cache-based systems to support deterministic requirements.

The IDT79R3001 directly controls the synchronous memory interface (whether it is being used as caches or not) with a minimum of external components. The IDT79R3001 includes all control signals and cache TAG control logic (for a direct-mapped cache) for the synchronous memory interfaces. Parity over the data portion of each synchronous memory can be optionally selected at RESET time for applications which desire to make this cost trade-off.

The synchronous interface works by dividing the basic CPU cycle into two phases. During one phase, a cache address is presented by the processor and captured by external latches (the latch control signals are directly generated by the CPU). During the next phase, the address for the other memory space is generated and captured while the data movement operation for the first cache is completed. The processor directly generates the SRAM Output Enable and Write Enable signals and the address latch enable signals, requiring no external decoding. This is illustrated in Figure 4.

Thus, the synchronous memory interface of the IDT79R3001 allows for high-bandwidth memory systems to be implemented with a minimum of control logic. This is desirable, since RISC performance tends to be a function of memory bandwidth. By simplifying the design of the synchronous memory system (illustrated in Figure 5), it is easier for the system designer to achieve high performance with minimum chip count and without requiring ultra-fast or specialty components.

Further, the IDT79R3001 supports the ability to refill multiple words into the cache from main memory when a cache-miss occurs, further reducing system cost and increasing performance in cache-based systems. The IDT79R3001 can obtain 1, 4, 8, 16, or 32 words from main memory when processing a cache-miss, thus amortizing the cache-miss penalty over a large amount of data. The IDT79R3001 also performs instruction streaming, which is the simultaneous execution of incoming instructions while the cache is being refilled.

The actual width of the TAG bus, and whether or not parity is checked over the data part of each synchronous memory, are determined by how the device is initialized. The IDT79R3001 can accommodate a TAG bus width of 0-19 bits, compatible with a variety of cache sizes and cacheable main memory choices. The IDT79R3001 allows the system designer to scale the synchronous memory system exactly according to the system needs, thus eliminating extra memory and logic devices and achieving substantial cost savings with no loss of performance.

Figure 5. IDT79R3001 Synchronous Interface (the RISController drives TAG, Valid, Data with optional Data Parity, AddrLo, and the IRd, IWr, IClk, DRd, DWr, DClk controls; FCT373A latches hold the addresses for the Data Cache Tag and Data SRAMs and the Instruction Cache Tag and Data SRAMs)

The TAG Bus
The TAG bus of the IDT79R3001 has been designed to allow the system designer to implement the exact cache configuration that is right for the system. For larger caches, low-order TAG bits do not need to be supplied for the TAG comparison. Additionally, the number of high-order TAG bits supplied is determined by the system designer, according to the amount of cacheable main memory the system supports. Since most embedded systems would tend to implement caches of 16KB and greater, and cacheable memory spaces of 32MB or smaller, significant cost and area reductions are achieved by configuring a smaller TAG bus.

The system configures the on-chip TAG comparator at RESET initialization time. If a TAG bit is not to be included in the synchronous memory TAG bit compare, a pull-down resistor of 4kΩ is connected to the appropriate IDT79R3001 TAG pin. If a TAG bit is to be included, no resistor is required (the IDT79R3001 pulls floating inputs to Vcc during RESET by a small pull-up, which is disabled when RESET is negated). If a TAG bit is excluded from the cycle-by-cycle comparison, it is still driven out with the appropriate address value during write cycles or asynchronous memory reads. Thus, the system designer still has the full 4 Gbyte of address space available for address decoding, without requiring the synchronous memory to be able to cache all such addresses.

Figure 6 illustrates a reduced system, which implements 16KB of instruction cache and 16KB of data cache, and 512MB of cacheable address space, using just 6 IDT71586 4Kx16 Latched CacheRAM™ components and 4 pull-down resistors. Note that in systems which do not implement the synchronous memory space as cache, pull-down resistors would be added to all TAG pins. The Valid pin still needs to be supplied on each cycle, thus allowing various memory schemes to be implemented (such as static column DRAM). However, the IDT79R3001 can be initialized to not assert the Valid pin as an output during Write cycles, simplifying the design of logic to drive the signal.

Figure 6. Small Footprint Cache for IDT79R3001 (TAG 13 and 29:31 are pulled down through 4kΩ resistors; TAG 14:28 and Valid connect to the tag RAMs; one IDT71586 holds the tags and two IDT71586 hold the data for each of the data cache and the instruction cache)

Cache Update
When the on-chip TAG comparator indicates that the item read from the cache was not the desired item, a cache-miss is processed. A main memory (asynchronous) transfer is automatically processed. The IDT79R3001 updates the cache using a burst refill of multiple adjacent words from main memory. The processor is "stalled" until the first word of the block is available. The processor is then released, and the block of words is brought into the cache at the rate of one word per CPU clock cycle.
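The TAG bus sizing rule described above (drop low-order bits for larger caches, drop high-order bits beyond the cacheable memory) can be worked through for the Figure 6 configuration. The sketch below assumes that the 19 TAG pins correspond to address bits 13 through 31, which is inferred from the 8KB minimum cache size and the Figure 6 labels rather than stated outright; the function names are hypothetical.

```python
# Worked example of TAG bit selection for a direct-mapped cache.
# Assumption: TAG pins map to address bits 13..31 (19 bits in total).
TAG_LOW_BIT = 13
TAG_HIGH_BIT = 31


def log2_exact(value):
    """Return log2(value) for a positive power of two, else raise ValueError."""
    if value <= 0 or value & (value - 1):
        raise ValueError("size must be a power of two")
    return value.bit_length() - 1


def tag_configuration(cache_bytes, cacheable_bytes):
    """Return (compared TAG bits, excluded TAG bits needing a pull-down resistor)."""
    # Address bits below log2(cache size) index the cache, so they need no compare.
    first_compared = max(log2_exact(cache_bytes), TAG_LOW_BIT)
    # Address bits at or above log2(cacheable size) never vary for cacheable memory.
    last_compared = min(log2_exact(cacheable_bytes) - 1, TAG_HIGH_BIT)
    compared = list(range(first_compared, last_compared + 1))
    excluded = [bit for bit in range(TAG_LOW_BIT, TAG_HIGH_BIT + 1)
                if bit not in compared]
    return compared, excluded


if __name__ == "__main__":
    compared, excluded = tag_configuration(16 * 1024, 512 * 1024 * 1024)
    print("compare TAG bits:", compared)
    print("pull down TAG bits:", excluded)
```

For 16KB caches and 512MB of cacheable memory this yields compared bits 14 through 28 and four excluded bits (13, 29, 30, 31), which agrees with the four pull-down resistors and the TAG 14:28 connection shown for Figure 6.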
Note that if the cache-miss was in the instruction cache, the processor is capable of simultaneously executing the incoming instruction stream as the cache is updated, thus effectively making the cache update transparent to the system and increasing performance.

Write Cycles
The IDT79R3001 utilizes a write-through cache. That is, data written by the processor is written to both the cache and main memory simultaneously. Thus, main memory always has a current copy of all data. Typically, latching devices are used between the cache subsystem and the slower main memory. These Write Buffers capture the data simultaneously with the cache update, allowing the processor to continue to the next cycle without actually waiting for the main memory transfer to complete. The IDT79R3001 generates parity over the data field on write cycles, which can be propagated into both the synchronous and asynchronous memory spaces.

When the processor writes less than a 32-bit quantity (a "partial" word), the processor can perform a "read-modify-write" of the cache. That is, the processor will read the 32-bit word containing the partial address(es) to be updated from the cache. If a "hit" occurs, then the new data will be merged with the old and the new 32-bit value will be written both to the cache and to main memory. If a cache "miss" occurs, then only the partial data is written to main memory and the cache is unchanged. Partial word capability is selected as a RESET option.

THE ASYNCHRONOUS MEMORY INTERFACE
The IDT79R3001 also supports an asynchronous memory interface, which supports the use of slower memory devices such as slow DRAM and EPROM, and also supports the use of peripherals and other "non-cacheable" devices. In general, if a cache-miss (or parity error, if enabled) occurs, the processor will automatically use the asynchronous memory interface to retrieve the desired data, and will update the cache accordingly.

Additionally, software can force the use of the asynchronous memory space through the use of the on-chip MMU. When the processor seeks either instructions or data within a certain address range (kseg1), the processor knows that this data is uncacheable and will perform an asynchronous memory transfer. Additionally, within cacheable memory, TLB entries can be used to mark certain pages as "uncacheable". When an address of an "uncacheable" page is used, the processor will automatically use the asynchronous memory space.

The asynchronous memory space uses the same data bus as the synchronous memory space. This facilitates the automatic updating of cache memory when the asynchronous memory is accessed due to cache-miss activity or memory writes. The asynchronous address bus is composed of the synchronous memory AddrLo bus and the TAG bus. External logic devices (such as IDT74FCT374A registers) are used to capture AddrLo and TAG values for the asynchronous transfer address. Note that systems which exclude individual TAG bits from comparison (to reduce cache width) still have all TAGs available as outputs.

The data path between the processor and the asynchronous memory space is managed according to the needs of the application. Write Buffer FIFO devices, such as the IDT79R3020, are used to capture address and data during store cycles. These devices capture the data in one cycle, and allow the processor to continue to execute from the synchronous memory while the slower asynchronous memory actually retires the write.

The read path is also constructed according to the needs of the system. If block refill is used, then the read path is highly dependent on the design of the main memory system. Pipeline devices such as the IDT74FCT520A, or simple latches such as the IDT74FCT374, may be used.

A simple asynchronous memory interface is shown in Figure 7. In this system, main memory is assumed to be fast enough to support the block refill requirements of the system, thus simplifying the read path. In fact, both the read and write data paths are actually managed through a single set of IDT29FCT52A bidirectional latching transceivers.

During write cycles (which are typically captured by Write Buffers), the processor asserts MemWr to indicate that a write cycle is in progress. The memory system negates WrBusy to indicate that the processor is done with the write cycle.

During read cycles, the processor will assert MemRd to indicate that a main memory read is in progress. The memory system will hold RdBusy active until the desired data is available. The processor will activate the XEn signal to allow data to be passed from the main memory to the processor data bus. If the cache is to be updated with the new data, then the processor will assert the appropriate cache write signal to allow the cache RAMs to capture the incoming data bus.

Figure 7. IDT79R3001 Asynchronous Interface (the RISController with its I-Cache and D-Cache; FCT373A (2) and FCT823A (4) address registers driving the 32-bit main memory address; FCT52A (4) data transceivers on the 32-bit main memory data path; control logic using XEn, WrBusy, SysClk, MemRd, MemWr, RdBusy, BusErr, and AccTyp to produce Rd/Wr, Ready, and main memory control)

The AccTyp bus is used to indicate the size of the data transfer (8, 16, 24, or 32 bits) and, for main memory reads, whether or not the data is "cacheable". This simplifies the main memory address decoding, since AccTyp indicates whether the main memory needs to perform a burst read of multiple words.

Co-Processor Interface
The IDT79R3001 implements a co-processor interface, which allows the use of the IDT79R3010 high-performance RISC Floating Point Accelerator without requiring the use of external interface components. The co-processor interface has been designed to make system co-processors appear to the programmer as if they were on-chip extensions of the core execution engine. Thus, the IDT79R3010 FPA works as a true co-processor, rather than as a peripheral which must be programmed.

In the IDT79R3001 co-processor model, the CPU is responsible for controlling all data cycles. The co-processor keeps in synchronization with the CPU (including the pipeline stages), and uses a Phase-Locked Loop to keep synchronized with the processor bus traffic. The co-processor then "snoops" the data bus, watching for co-processor instructions. It also knows when data cycles on the bus are intended for it (either as a target in co-processor load operations, or as a source for co-processor store operations), and performs the data portion of the operation when appropriate. Thus, co-processors effectively load and store directly with memory, without requiring operands to go through the CPU first. This achieves the highest levels of performance (note that the co-processor interface also supports move, whereby data can be moved directly between the CPU and any co-processor). Figure 8 illustrates the use of the IDT79R3010 in an IDT79R3001 system.

The co-processor interface manages synchronization between the parts, and is used to communicate status from the co-processor to the CPU. CpBusy, or co-processor busy, stalls the CPU until the busy co-processor resource (requested by a co-processor instruction) is free, and CpCond, or co-processor condition, is used to report status on co-processor test instructions. CpSync is used to help the co-processor stay "locked" to the CPU, so that the co-processor knows when data is on the bus to be sampled on load operations or when to place data on the bus for store operations.

Note that the co-processor sits on the same data bus as the CPU, but has no connection to the address bus. The CPU is responsible for performing all memory addressing, including the determination of "cache hit", write-buffer full cycles, and any processing that might be required for cache misses.

Figure 8. IDT79R3001 Interface to IDT79R3010 Floating Point Co-Processor (a common clock generator feeds both devices; CPU signals CpBusy, Run, Exception, Int(n), CpCond1, and CpSync connect to FPA signals FpBusy, Run, Exception, FpInt, FpCond, and FpSync; both devices share the Tag and Data buses of the I-Cache and D-Cache, while AddrLo is driven by the CPU only)

Interrupts
The IDT79R3001 features 6 separate interrupt input pins. Interrupts are not vectored, but rather cause the general exception vector address to be the next execution address. These pins are not encoded internally; external logic can choose to implement these interrupt lines as either 6 or 64 interrupt sources; software would then perform the appropriate decoding to get to the specific interrupt handler.

Interrupts are recognized in the ALU stage of the on-chip pipeline. Instructions less advanced in the pipeline are "flushed" and will be restarted when the return from exception occurs (an on-chip register contains the address of the instruction which was excepted). Instructions further advanced in the pipeline are allowed to continue. Unlike other RISC processors, the IDT79R3001 does not require the programmer to save and restore pipeline status to allow normal execution to be resumed. Depending on the application and exception, at most software would need to save/restore the on-chip data registers, status register, Exception PC and exception "cause" register.

Note that the co-processor model includes "precise exceptions". That is, an exception is signaled to the exact instruction which generated the exceptional condition. No further state commitments are made by the IDT79R3001 and, thus, the exact context at the time of the exception is known to the programmer. This is true even for multi-cycle operations, such as those of the FPA.
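The choice between 6 and 64 interrupt sources mentioned in the Interrupts section comes down to how software interprets the six pin states once it reaches the general exception handler. The sketch below contrasts the two interpretations. It is only an illustration: the `pins` value (bit n standing for Int(n), with 1 meaning asserted), the polarity, and the function names are all assumptions, and how the pin states are read back from the processor is not modeled.

```python
# Two hypothetical ways for software to interpret the six interrupt inputs.
# Assumption: `pins` is a 6-bit integer, bit n = state of Int(n), 1 = asserted.
NUM_INT_PINS = 6
PIN_MASK = (1 << NUM_INT_PINS) - 1


def sources_direct(pins):
    """One source per pin: return the list of asserted pin numbers (0..5)."""
    return [n for n in range(NUM_INT_PINS) if (pins >> n) & 1]


def source_encoded(pins):
    """External logic encodes the source: treat the six pins as one binary code."""
    return pins & PIN_MASK


def dispatch(pins, handlers, encoded=False):
    """Call the handler(s) registered for the pending source(s); return their results."""
    keys = [source_encoded(pins)] if encoded else sources_direct(pins)
    return [handlers[key]() for key in keys if key in handlers]


if __name__ == "__main__":
    handlers = {0: lambda: "timer", 2: lambda: "uart", 42: lambda: "device 42"}
    print(dispatch(0b000101, handlers))                # pins 0 and 2 as separate sources
    print(dispatch(0b101010, handlers, encoded=True))  # code 42 as a single source
```

With direct wiring several sources can be pending at once and software chooses the service order; with encoded wiring the external logic has already picked one source, and under the polarity assumed here an all-zero code means that no pin is asserted.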
Depending on the application and exception, at most software would need to save/restore DMA Interface The IDT79R3001 features a simple DMA interface which allows an external master to gain control of the synchronous memory space. Note that it is not necessary to include logic on the CPU.to arbitrate for the asynchronous memory space; the read/wnte buffer interface is where such arbitration logic belongs and it is left to the system designer to implement the type of asynchronous memory structure that best fits the application. 60 COMMERCIAL TEMPERATURE RANGE 1DT79R3001 RISControlier .. ~ A 4 r- '1 , Tag Data AddrLo Cache Ctrl ~ '373" Synchronous 4-C> Data Memory f-<> ... ~ '--- • .. .-- ~ 1DT79R3001 RISController r-- '--- ~ '373" DMAStall t Req. AddrLo Cache Ctrl DMA Controller f--t> Synchronous Instruction Memory ~ ..... .... .... '--- I Tag Async IfF Ctrl .. , .., .... .... I Main Mem Ctrl Async.I/F Memory orUO J I Figure 9. IDT79R3001 DMA Interface When an external master "owns" the synchronous bus, the CPU will tri-state the following pins and buses: Advanced Features The IDT79R3001 contains special features which provide added flexibility across a number of applications, as well as allow for system diagnostic support. In support of diagnostics, the IDT79R3001 allows for cache "swapping" (interchange of which memory bank is for instruction and which is for data), which is useful in system initialization, cache flushing, and diagnostics. Additionally, the caches can be "isolated" from main memory, which forces cache "hits" to occur regardless of the tag comparison, and which is useful in determining that the synchronous memory space RAMs are functional. An additional feature is the ability to enable parity checking over the data field of each synchronous memory. 
If parity is enabled, the processor will check the parity when a synchronous access occurs; if a parity error is detected, it is signaled to the external world on the Parity Error signal and a cache-miss cycle is processed. The Parity Error signal will remain low until the parity error flag in the CP0 status register is cleared by software.

A number of other system features are selected at reset time. The input reset "vectors" are sampled on the interrupt input lines during the last four cycles of the reset period. The input vectors are listed in Table 1. These selections include the ability to select the block refill sizes for each of the instruction and data memories, whether Big Endian or Little Endian order is to be used, whether to use data parity, and whether or not to accommodate a Phase-Locked Loop for a co-processor. The initialization of the CPU and meaning of each input vector is more fully explained in the "IDT79R3001 Hardware User's Guide".

AddrLo: The synchronous memory direct address bus.
Data & Tag: The synchronous memory RAM data lines.
Cache Control: IRd, IWr, IClk, DRd, DWr and DClk. This allows the external master to use the existing control lines to control the synchronous memory.
XEn: The read buffer transceiver enable, which will allow the external master to use the read/write buffer path for DMA.
Valid: This enables the DMA interface to be used for multiprocessing applications.

The DMA interface consists of a single input signal, DMAStall, which causes the processor to stall and to tri-state the above named lines. The external master is guaranteed mastership of the bus within a small number of cycles, depending on the exact external bus activity of the CPU when the DMA was requested. The DMA master negates the DMAStall signal when the DMA operation is completed to allow the CPU to resume processing. Consult the "IDT79R3001 Hardware User's Guide" for more details.
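The parity check described in this section can be illustrated with a short Python sketch. The pin descriptions state that DataP carries even parity over the data bus; the sketch additionally assumes one parity bit per byte of the 32-bit word, with parity bit i covering byte i, which is not stated in this document. The function names are hypothetical.

```python
def even_parity_bit(byte):
    """Parity bit that makes the count of 1s in (byte + parity bit) even."""
    return bin(byte & 0xFF).count("1") & 1


def data_parity(word):
    """Four parity bits for a 32-bit word, one per byte (assumed mapping)."""
    bits = 0
    for i in range(4):
        byte = (word >> (8 * i)) & 0xFF
        bits |= even_parity_bit(byte) << i
    return bits


def parity_error(word, parity_bits):
    """True when the stored parity bits do not match the data word."""
    return data_parity(word) != (parity_bits & 0xF)
```

On a mismatch the device, as described above, asserts its Parity Error signal and processes a cache-miss cycle; the sketch only models the comparison itself.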
Figure 9 illustrates the system connection of an external DMA master to an IDT79R3001 system.

INPUT   W CYCLE     X CYCLE     Y CYCLE         Z CYCLE
Int0    Reserved    Reserved    Reserved        Reserved
Int1    Reserved    Reserved    Reserved        Reserved
Int2    DBlkSize0   DBlkSize1   Parity On       Valid Output
Int3    IBlkSize0   IBlkSize1   Store Partial   ControlLow
Int4    PllOn       PllOn       PllOn           PllOn
Int5    Reserved    BigEndian   TriState        Reserved

Reserved signals must be "high" during these cycles.

Table 1. IDT79R3001 Mode Selectable Features

value "0", a useful constant; and register r31 is used as the link register in jump-and-link instructions (the return address for subroutine calls). Otherwise, there is no requirement that a particular register be used as a stack or frame pointer, etc., although there is a register convention as part of the "mips ABI" (Applications Binary Interface standard) which the compiler suite uses.

The CPU registers are illustrated in Figure 10. Note that there is no Program Status Word register shown in this figure. The functions traditionally provided by a PSW register are instead provided in the Status and Cause Registers incorporated within the on-chip System Control Co-Processor (CP0). The instruction set does not use condition codes.

PROCESSOR ARCHITECTURE

The IDT79R3001 is a full implementation of the IDT79R2000A/IDT79R3000 Instruction Set Architecture (the MIPS-I ISA). This architecture is discussed in great detail in "mips RISC Architecture", available from IDT.

IDT79R3001 CPU Registers

The IDT79R3001 CPU provides 32 general purpose (orthogonal) 32-bit registers, a 32-bit Program Counter and two 32-bit registers used to hold the results of the CPU integer multiply and divide operations.
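The role of the two multiply/divide result registers can be illustrated with a small Python model. It assumes the usual MIPS-I behavior, which this section only implies: a signed multiply leaves the upper 32 bits of the 64-bit product in HI and the lower 32 bits in LO, and a signed divide leaves the quotient (truncated toward zero) in LO and the remainder in HI. The function names are illustrative only.

```python
MASK32 = 0xFFFFFFFF


def to_signed(x):
    """Interpret a 32-bit register value as a signed integer."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x


def mult(rs, rt):
    """Signed 32x32 multiply; returns (HI, LO) as unsigned 32-bit values."""
    product = (to_signed(rs) * to_signed(rt)) & 0xFFFFFFFFFFFFFFFF
    return (product >> 32) & MASK32, product & MASK32


def div(rs, rt):
    """Signed divide; returns (HI, LO) = (remainder, quotient).

    Division by zero raises ZeroDivisionError in this model.
    """
    a, b = to_signed(rs), to_signed(rt)
    quotient = abs(a) // abs(b)
    if (a < 0) != (b < 0):
        quotient = -quotient          # truncate toward zero
    remainder = a - quotient * b
    return remainder & MASK32, quotient & MASK32
```

Software would read these values back with the MFHI and MFLO instructions listed in the instruction summary.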
Two of the 32 general registers have special purposes designed to increase processor performance: register r0 is hardwired to the

Figure 10. IDT79R3001 Registers
(Diagram: General Purpose Registers r0, r1, r2 through r29, r30, r31; Multiply/Divide Registers HI and LO; Program Counter PC. Each register is 32 bits wide, bits 31 to 0.)

Instruction Set Overview

All IDT79R3001 instructions are 32 bits long and there are only three instruction formats (see Figure 11). This approach simplifies decoding, thus minimizing instruction execution time. The IDT79R3001 processor initiates a new instruction on every RUN cycle, and is able to complete an instruction on almost every clock cycle. The only exceptions are the LOAD instructions and BRANCH instructions, which each have a single cycle of latency associated with their execution (that is, the instruction immediately after the branch is always executed regardless of the branch condition; similarly, the data loaded by a LOAD instruction is not available to the subsequent instruction). However, in the majority of cases the compilers (and even the MIPS assembler) are able to reorder instructions to fill these latency cycles with useful instructions which do not require the results of the previous instruction (in the worst case, a NOP instruction is inserted). This effectively eliminates these latency effects and does not require the applications programmer to be aware of the pipeline structure.

The actual instruction set of the CPU was determined after extensive simulations to determine which instructions should be implemented in hardware and which operations are best synthesized in software from other basic operations. This methodology has resulted in the highest performance processor available.

I-Type (Immediate):  op [31:26]  rs [25:21]  rt [20:16]  immediate [15:0]
J-Type (Jump):       op [31:26]  target [25:0]
R-Type (Register):   op [31:26]  rs [25:21]  rt [20:16]  rd [15:11]  re [10:6]  funct [5:0]

Figure 11.
IDT79R3001 Instruction Formats

The IDT79R3001 instruction set can be divided into the following groups:

• Load/Store Instructions move data between memory and the general registers. These are all "I-Type" instructions. The only addressing mode supported is base register plus signed, immediate 16-bit offset. This effectively allows three addressing modes: register plus offset, register (using zero offset), and immediate (using r0, the zero register). The Load instruction has a single cycle of latency, as described above. That is, the instruction immediately after the load instruction cannot rely on the new data; however, the assembler and compilers automatically handle this, reordering code to ensure that no conflicts occur. Note that the store operation has no latency in its effect. Loads and stores can be performed on byte, half-word, word, or unaligned word data (32-bit data not aligned on a modulo-4 address).

• Computational instructions perform arithmetic, logical, and shift operations on values in registers. They occur in both "R-Type" (both operands and the result are general registers), and "I-Type" (one operand is a 16-bit immediate value) formats. Note that computational instructions are three operand instructions: that is, the result register can be different from both source registers. This means that operands need not be overwritten by arithmetic operations. This results in a more efficient use of the register set, and further increases performance.

• Jump and Branch instructions change the flow of control of a program. Jumps are either to a paged absolute address formed by combining a 26-bit target with four bits of the Program Counter ("J-Type" format for subroutine calls), or to 32-bit register byte addresses ("R-Type", for Returns and dispatches). Branches have 16-bit offsets relative to the program counter ("I-Type"). Jump and Link instructions save a return address in Register 31. The IDT79R3001 instruction set features numerous branch conditions.
Included is the ability to branch based on a comparison of two registers, or on the comparison of a register to zero. Thus, net performance is increased since the processor does not have to precede the branch instruction with arithmetic operations.

• Co-processor instructions perform operations in the co-processors (such as the IDT79R3010 FPA). Co-processor Loads and Stores are "I-Type"; computational instructions have co-processor dependent formats.

• Co-processor 0 instructions perform operations on the System Control Co-Processor (CP0) registers to manipulate the memory management and exception handling facilities of the on-chip co-processor.

• Special instructions perform a variety of tasks, including movement of data between general and special registers, system calls, and breakpoint operations. These are always "R-Type".

IDT79R3001 System Control Co-processor (CP0)

The IDT79R3001 can operate with up to four tightly coupled co-processors, designated CP0-CP3. CP0 is included on-chip as co-processor 0, the System Control Co-processor. CP0 is responsible for supporting both the virtual memory system and the exception handling functions of the IDT79R3001.
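The three instruction formats and the address calculations described in the instruction set overview can be sketched in Python as follows. The field positions follow Figure 11. Two details are assumptions drawn from general MIPS-I practice rather than from this text: the 26-bit jump target is treated as a word address (shifted left by two bits), and the `pc` argument stands for whichever Program Counter value supplies the upper four bits. All function names are illustrative.

```python
def sign_extend16(value):
    """Sign-extend a 16-bit immediate to a Python integer."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def decode(word):
    """Split a 32-bit instruction word into the fields of all three formats."""
    return {
        "op":     (word >> 26) & 0x3F,   # bits 31:26, common to all formats
        "rs":     (word >> 21) & 0x1F,   # bits 25:21 (I-Type, R-Type)
        "rt":     (word >> 16) & 0x1F,   # bits 20:16 (I-Type, R-Type)
        "rd":     (word >> 11) & 0x1F,   # bits 15:11 (R-Type)
        "re":     (word >> 6) & 0x1F,    # bits 10:6  (R-Type)
        "funct":  word & 0x3F,           # bits 5:0   (R-Type)
        "imm":    word & 0xFFFF,         # bits 15:0  (I-Type)
        "target": word & 0x03FFFFFF,     # bits 25:0  (J-Type)
    }


def load_store_address(base_value, imm):
    """Base register plus signed 16-bit offset, wrapped to 32 bits."""
    return (base_value + sign_extend16(imm)) & 0xFFFFFFFF


def jump_address(pc, target):
    """Combine the upper four PC bits with the 26-bit target (assumed << 2)."""
    return (pc & 0xF0000000) | ((target & 0x03FFFFFF) << 2)
```

Using r0 as the base register in `load_store_address` gives the "immediate" addressing mode mentioned above, since r0 always reads as zero.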
OP        DESCRIPTION

Load/Store Instructions
LB        Load Byte
LBU       Load Byte Unsigned
LH        Load Halfword
LHU       Load Halfword Unsigned
LW        Load Word
LWL       Load Word Left
LWR       Load Word Right
SB        Store Byte
SH        Store Halfword
SW        Store Word
SWL       Store Word Left
SWR       Store Word Right

Arithmetic Instructions (ALU Immediate)
ADDI      Add Immediate
ADDIU     Add Immediate Unsigned
SLTI      Set on Less Than Immediate
SLTIU     Set on Less Than Immediate Unsigned
ANDI      AND Immediate
ORI       OR Immediate
XORI      Exclusive OR Immediate
LUI       Load Upper Immediate

Arithmetic Instructions (3-operand, register-type)
ADD       Add
ADDU      Add Unsigned
SUB       Subtract
SUBU      Subtract Unsigned
SLT       Set on Less Than
SLTU      Set on Less Than Unsigned
AND       AND
OR        OR
XOR       Exclusive OR
NOR       NOR

Shift Instructions
SLL       Shift Left Logical
SRL       Shift Right Logical
SRA       Shift Right Arithmetic
SLLV      Shift Left Logical Variable
SRLV      Shift Right Logical Variable
SRAV      Shift Right Arithmetic Variable

Multiply/Divide Instructions
MULT      Multiply
MULTU     Multiply Unsigned
DIV       Divide
DIVU      Divide Unsigned
MFHI      Move From HI
MTHI      Move To HI
MFLO      Move From LO
MTLO      Move To LO

Jump and Branch Instructions
J         Jump
JAL       Jump and Link
JR        Jump to Register
JALR      Jump and Link Register
BEQ       Branch on Equal
BNE       Branch on Not Equal
BLEZ      Branch on Less than or Equal to Zero
BGTZ      Branch on Greater Than Zero
BLTZ      Branch on Less Than Zero
BGEZ      Branch on Greater than or Equal to Zero
BLTZAL    Branch on Less Than Zero and Link
BGEZAL    Branch on Greater than or Equal to Zero and Link

Special Instructions
SYSCALL   System Call
BREAK     Break

Co-processor Instructions
LWCz      Load Word from Co-processor
SWCz      Store Word to Co-processor
MTCz      Move To Co-processor
MFCz      Move From Co-processor
CTCz      Move Control to Co-processor
CFCz      Move Control From Co-processor
COPz      Co-processor Operation
BCzT      Branch on Co-processor z True
BCzF      Branch on Co-processor z False

System Control Co-processor (CP0) Instructions
MTC0      Move To CP0
MFC0      Move From CP0
TLBR      Read indexed TLB entry
TLBWI     Write Indexed TLB entry
TLBWR     Write Random TLB entry
TLBP      Probe TLB for matching entry
RFE       Restore From Exception

Table 2. IDT79R3001 Instruction Summary

Figure 12. System Control Co-processor (CP0) Registers
(Diagram: the CP0 registers, including EntryHi and EntryLo, alongside the TLB with entries 63 down to 8 and entries 7 down to 0 marked "Not Accessed by Random". Legend: registers used with exception processing; registers used with the virtual memory system.)

CP0 Registers

As a co-processor, CP0 has a number of registers which it uses to perform its control functions. These include a 64-entry fully associative Translation Lookaside Buffer (TLB), used to manage the virtual memory space; registers to manage the TLB set; and the exception handling registers. Figure 12 illustrates the register set of the System Control Co-processor. Table 3 provides a brief explanation of the function of each of these registers. A more detailed explanation of the use of each of these registers is included in the "mips RISC Architecture" manual.

REGISTER  DESCRIPTION
EntryHi   High half of a TLB entry
EntryLo   Lower half of a TLB entry
Index     Programmable pointer into TLB array
Random    Pseudo-random pointer into TLB array
Status    Mode, interrupt enables and diagnostic status information
Cause     Indicates nature of last exception
EPC       Exception Program Counter; contains address of instruction which detected the exception
Context   Pointer into the kernel's virtual Page Table Entry array
BadVA     Most recent bad virtual address
PrID      Processor revision identification

Table 3. CP0 Registers

to Co-processor 0 and the Kernel has not enabled the User to access the co-processor, an exception will occur. Similarly, if a User task attempts to use a Kernel virtual address, an exception will occur. Thus, system resources are protected from User tasks.

The manner in which memory addresses are translated (mapped) depends on the operating mode of the IDT79R3001 and on the virtual address desired. Figure 13 illustrates the virtual address mapping performed by the IDT79R3001:

User Mode - in this mode, a single, uniform virtual address space (kuseg) of 2 Gbyte is available to each user task (tasks are further identified by a 6-bit process identifier field in order to form unique virtual addresses). All references to this segment are mapped using the TLB, which utilizes both the virtual address and the Process ID field to perform the virtual-to-physical mapping (note that this allows the cache to be shared by up to 64 User processes at a time without requiring time consuming Cache or TLB flushing).

Memory Management System

The IDT79R3001 supports a virtual memory system, so that each task in a given application can be unaware of the addressing needs of other tasks.
This is also useful in systems with limited physical memory; the IDT79R3001 provides for the logical expansion of memory by translating addresses composed in a large virtual space into available physical memory addresses.

IDT79R3001 Operating Modes

The IDT79R3001 has two operating modes: User Mode and Kernel Mode. The IDT79R3001 normally operates in the User Mode until an exception is detected, forcing it into the Kernel Mode. The processor remains in Kernel Mode until the exceptions are handled and the processor executes an RFE (Restore From Exception) instruction, which will restore it to User Mode. Kernel Mode allows software to alter machine state information such as that contained in the CP0 registers; that is, if in User Mode an access is attempted

Figure 13 (MMU ADDRESS TRANSLATION, virtual to physical):
  0xc0000000 to 0xffffffff   Kernel Mapped (kseg2)     any physical address
  0xa0000000                 Kernel Uncached (kseg1)   first 512 MB of physical memory
  0x80000000                 Kernel Cached (kseg0)     first 512 MB of physical memory
  0x00000000                 User Mapped (kuseg)       any physical address

the reset vector is contained in this segment, so that the processor does not require either the cache or the TLB to be valid at RESET time.

• kseg2 - References to this 1 Gbyte segment are always mapped through the TLB. As with kuseg, the ability of memory pages to be cached is determined by a bit setting in the TLB entry for that page.

Kernel Mode - Four separate segments are accessible through this mode:

• kuseg - When in the Kernel Mode, references to this segment are treated just like User Mode references, thus streamlining Kernel accesses to User memory.

• kseg0 - References to this 512 Mbyte segment may use the cache memory, but are not translated by the TLB. Instead, these addresses map directly to the first 512 Mbytes of the physical address space. Note that many dedicated embedded applications will utilize this address space and kseg1 only, rather than any of the TLB mapped segments.
• kseg1 - References to this 512 Mbyte segment are not mapped through the TLB. Additionally, this memory is viewed as uncacheable, which means that references through this segment will always use the asynchronous memory interface. As with kseg0, references through this segment are hard-mapped to the first 512 Mbytes of physical memory. When the processor boots,

The Translation Lookaside Buffer (TLB)

The translation of virtual addresses in either kuseg or kseg2 (mapped segments) is performed by the on-chip Translation Lookaside Buffer array. This array consists of 64 fully-associative (content addressable) memory elements. Each entry maps a 4Kbyte virtual page to a 4Kbyte physical page. Each TLB entry contains other information about the virtual address it maps (such as which User process it maps) and also about the physical address (such as whether it is cacheable or writeable).

EntryHi (bits 63 to 32):  VPN [63:44]  TLBPID [43:38]  0 [37:32]
EntryLo (bits 31 to 0):   PFN [31:12]  N [11]  D [10]  V [9]  G [8]  0 [7:0]

VPN - Virtual Page Number
TLBPID - Process Identifier
PFN - Physical Frame Number
N - Non-cacheable Physical Page
D - Dirty Page / Write Protect
V - Valid TLB Entry
G - Global translation (ignore PID)
0 - Reserved

Figure 14. TLB Entry Format

ware enough information to obtain the appropriate TLB entry at speeds which exceed those achieved by many CPUs which use hardware TLB replacement (10-12 cycles under UNIX). When a TLB miss occurs, the address of the instruction which was executing is stored in the EPC register, and the BadVA register contains the address which was being translated. The Context register uses the BadVA value to generate a direct pointer to the kernel Page Table Entry for the desired virtual address. The Random register suggests the TLB entry to be replaced by the new entry.
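The segment rules and the TLB matching described in this section can be combined into one Python sketch of virtual-to-physical translation. It is a simplified model under stated assumptions: the class and function names are hypothetical, a matching but invalid entry is treated the same as a miss, the D (write protect) bit is not modeled, and the `kernel_mode` flag stands in for the processor's operating mode.

```python
from dataclasses import dataclass

PAGE_SHIFT = 12            # 4 Kbyte pages
PAGE_MASK = (1 << PAGE_SHIFT) - 1


@dataclass
class TlbEntry:
    vpn: int                     # Virtual Page Number
    pid: int                     # 6-bit process identifier
    pfn: int                     # Physical Frame Number
    valid: bool = True           # V bit
    global_: bool = False        # G bit: ignore the PID comparison
    noncacheable: bool = False   # N bit


class TlbMiss(Exception):
    """No usable TLB entry; real hardware hands this to software."""


def segment(vaddr):
    """Name the address segment that contains a 32-bit virtual address."""
    if vaddr < 0x80000000:
        return "kuseg"
    if vaddr < 0xA0000000:
        return "kseg0"
    if vaddr < 0xC0000000:
        return "kseg1"
    return "kseg2"


def translate(vaddr, pid, tlb, kernel_mode=False):
    """Return (physical address, cacheable) for a virtual address."""
    seg = segment(vaddr)
    if seg != "kuseg" and not kernel_mode:
        raise PermissionError("kernel address referenced in User Mode")
    if seg == "kseg0":                      # cached, not translated
        return vaddr - 0x80000000, True
    if seg == "kseg1":                      # uncached, not translated
        return vaddr - 0xA0000000, False
    vpn = vaddr >> PAGE_SHIFT               # kuseg and kseg2 use the TLB
    for entry in tlb:
        if entry.vpn == vpn and (entry.global_ or entry.pid == pid):
            if entry.valid:
                paddr = (entry.pfn << PAGE_SHIFT) | (vaddr & PAGE_MASK)
                return paddr, not entry.noncacheable
            break
    raise TlbMiss(hex(vaddr))
```

A software refill handler would catch the miss, fetch the page table entry and write it into the TLB slot suggested by the Random register; that handler is not shown here.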
Note that the lower eight TLB entries are not pointed to by Random; the kernel software can thus ensure that it is constantly mapped, and deterministic response is guaranteed.

Figure 14 illustrates the format of each TLB entry. The translation operation is illustrated in Figure 15. The upper portion of the desired virtual address is compared against the VPN field of each TLB entry. Additionally, the current process ID (contained in the TLBHI register) is matched against the PID field of the TLB entry (if the TLB entry is marked as Global, the PID comparison is ignored). If a match occurs, and the TLB entry is marked as Valid, then the translation is completed by replacing the VPN of the virtual address with the corresponding PFN (Physical Frame Number). Note that the use of the TLB does not incur an execution penalty, since the execution engine pipeline includes stages to cover for the time required to make the TLB search and translation.

TLB misses occur when no successful match occurs. These events are handled in software. The CP0 registers give the soft-

Figure 15. Virtual to Physical TLB Translation
(Diagram: the Program Counter or virtual address and the current process ID are compared against the PID and VPN fields held in a content addressable memory (CAM) covering TLB entries 63 down to 0; the matching entry selects the Flags and PFN from RAM, and the PFN forms bits 31 to 12 of the physical address while bits 11 to 0 pass through unchanged.)

BACKWARD COMPATIBILITY WITH IDT79R2000A AND 79R3000 PROCESSORS

The IDT79R3001 can execute the same binary software (either kernel or user) that is executed by either the IDT79R2000A or IDT79R3000. At the system level, some hardware re-design is necessary to achieve the cost savings inherent in the IDT79R3001 hardware interface.

PIN DESCRIPTIONS

PIN NAME       I/O   DESCRIPTION

Memory Interface
Data (0:31)    I/O   A 32-bit bus used for all instruction and data transmission among the processor, synchronous memory space, asynchronous memory space and co-processors.
DataP (0:3)    I/O   A 4-bit bus containing even parity over the data bus.
                     If parity checking is enabled, a parity error will cause the PErr signal to be asserted and a cache-miss to occur. Regardless of whether parity checking is enabled, the processor will always generate parity on writes.
Tag (13:31)    I/O   A 19-bit bus used for transferring cache tags and high-order address bits between the processor, caches and asynchronous memory spaces.
AddrLo (0:23)  O     A 24-bit bus containing low-order byte addresses for both the synchronous (cache) and asynchronous memory spaces.

Synchronous Memory Control
IRd            O     The output enable for the instruction cache. The polarity of this signal is selectable.
IWr            O     The write enable for the instruction cache. The polarity of this signal is selectable.
IClk           O     The instruction cache address latch clock. The clock runs continuously.
DRd            O     The output enable for the data cache. The polarity of this signal is selectable.
DWr            O     The write enable for the data cache. The polarity of this signal is selectable.
DClk           O     The data cache address latch clock. The clock runs continuously.
Valid          I/O   A high on this signal indicates that the Tags just read from the cache are valid. When a cache update occurs, the processor will generate the appropriate Valid bit.
PErr           O     If parity checking is enabled, this signal is an active low output of the internal CP0 parity error status bit. It is driven low when a parity error is detected and remains low until software clears the parity error flag in the status register. This pin is physically the same pin as AccTyp2. Its function is selected during device reset.

Asynchronous Memory Interface
XEn            O     The transceiver enable for the read buffer.
AccTyp (0:2)   O     A 3-bit bus used to indicate the size of data being transferred on the asynchronous memory bus, whether or not a data transfer is occurring and the purpose of the transfer. If parity checking is enabled, AccTyp2 becomes the PErr signal.
MemWr          O     Signals the occurrence of an asynchronous memory write cycle.
MemRd          O     Signals the occurrence of an asynchronous memory read cycle.
BusError       I     Signals the occurrence of a bus error during an asynchronous memory transfer cycle.
Run            O     Indicates whether the processor is in a RUN or STALL state.
Exception      O     Indicates the instruction about to commit processor state should be aborted and other exception related information.
SysOut         O     A clock derived from the internal processor clock used to generate the system clock.
RdBusy         I     The asynchronous memory read stall termination signal. In most system designs, RdBusy is normally asserted and is deasserted only to indicate the successful completion of the memory read. RdBusy is sampled by the processor only during memory read stalls.
WrBusy         I     The asynchronous memory write stall initiation/termination signal. WrBusy is only sampled during write operation.

Co-Processor Interface
CpSync         O     A clock which is identical to SysOut and used by co-processors for timing synchronization with the CPU.
CpBusy         I     The co-processor busy stall initiation/termination signal.
CpCond (0:3)   I     A 4-bit bus used to transfer conditional branch status from the co-processors to the CPU. CpCond(0) is used to control whether or not a cache burst refill occurs; the other signals are used as input port pins for co-processor branch instructions.

Processor Control Signals
DMAStall       I     DMA Stall. Signals to the processor that it should stall accesses to the synchronous memories and tri-state the synchronous memory interface.
Int (0:5)      I     A 6-bit bus used to signal maskable interrupts to the CPU. At reset time, mode values are sampled from this bus to initialize the processor. During normal operation, these signals are not latched by the processor and must remain asserted until the processor acknowledges the interrupt (through software) to the interrupt source.
Clk2xSys       I     The master double frequency input clock, used to generate SysOut.
Clk2xSmp/Rd    I     A double frequency clock input used to determine the sample point for data coming into the CPU and co-processors and used to determine the enable time of the synchronous memory RAMs.
Clk2xPhi       I     A double frequency clock input used to determine the position of the two internal phases.
Reset          I     Initialization input used to force execution starting from the reset memory address. Reset should be asserted asynchronously but must be negated synchronously with the leading edge of SysOut.

ABSOLUTE MAXIMUM RATINGS(1,3)

SYMBOL  RATING                                 COMMERCIAL     MILITARY       UNIT
VTERM   Terminal Voltage with Respect to GND   -0.5 to +7.0   -0.5 to +7.0   V
TA      Operating Temperature                  0 to +70       -55 to +125    °C
TBIAS   Temperature Under Bias                 -55 to +125    -65 to +135    °C
TSTG    Storage Temperature(2)                 -55 to +125    -65 to +150    °C
VIN     Input Voltage                          -0.5 to +7.0   -0.5 to +7.0   V

NOTES:
1. Stresses greater than those listed under ABSOLUTE MAXIMUM RATINGS may cause permanent damage to the device. This is a stress rating only and functional operation of the device at these or any other conditions above those indicated in the operational sections of this specification is not implied. Exposure to absolute maximum rating conditions for extended periods may affect reliability.
2. VIN minimum = -3.0V for pulse width less than 15ns. VIN should not exceed Vcc +0.5 Volts.
3. Not more than one output should be shorted at a time. Duration of the short should not exceed 30 seconds.

OUTPUT LOADING FOR AC TESTING
(Diagram: load capacitance CL connected to the device under test.)

SIGNAL               CL
IRd, DRd, IWr, DWr   50pF
All others           25pF

DC ELECTRICAL CHARACTERISTICS - COMMERCIAL TEMPERATURE RANGE (TA = 0°C to +70°C, Vcc = +5.0V ±5%)

                                                                     16.67MHz      20.0MHz       25.0MHz
SYMBOL  PARAMETER                    TEST CONDITIONS                 MIN.   MAX.   MIN.   MAX.   MIN.   MAX.   UNIT
VOH     Output HIGH Voltage          Vcc = Min., IOH = -4mA          3.5    -      3.5    -      3.5    -      V
VOL     Output LOW Voltage           Vcc = Min., IOL = 4mA           -      0.4    -      0.4    -      0.4    V
VOHT    Output HIGH Voltage(4,7)     Vcc = Min., IOH = -8mA          2.4    -      2.4    -      2.4    -      V
VOHC    Output HIGH Voltage(8)       Vcc = Min., IOH = -4mA          4.0    -      4.0    -      4.0    -      V
VOLT    Output LOW Voltage(4,7)      Vcc = Min., IOL = 8mA           -      0.8    -      0.8    -      0.8    V
VIH     Input HIGH Voltage(5)                                        2.0    -      2.0    -      2.0    -      V
VIL     Input LOW Voltage(1)                                         -      0.8    -      0.8    -      0.8    V
VIHS    Input HIGH Voltage(2,5)                                      3.0    -      3.0    -      3.0    -      V
VILS    Input LOW Voltage(1,2)                                       -      0.4    -      0.4    -      0.4    V
IRESET  Input HIGH Current(6)                                        10     100    10     100    10     100    µA
CIN     Input Capacitance(7)                                         -      10     -      10     -      10     pF
COUT    Output Capacitance(7)                                        -      10     -      10     -      10     pF
ICC     Operating Current            Vcc = Max.                      -      575    -      650    -      750    mA
IIH     Input HIGH Leakage(3)        VIH = Vcc                       -      10     -      10     -      10     µA
IIL     Input LOW Leakage(3)         VIL = GND                       -10    -      -10    -      -10    -      µA
IOZ     Output Tri-state Leakage     VOH = 2.4V, VOL = 0.5V          -40    40     -40    40     -40    40     µA

NOTES:
1. VIL Min. = -3.0V for pulse width less than 15ns. VIL should not fall below -0.5 Volts for longer periods.
2. VIHS and VILS apply to Clk2xSys, Clk2xSmp/Rd, Clk2xPhi, CpBusy, and Reset.
3. These parameters do not apply to the clock inputs.
4. VOHT and VOLT apply to the bidirectional data and tag buses only. Note that VIH and VIL also apply to these signals. VOHT and VOLT are supplied as additional information to help the system designer understand the relationship between current drive and output voltage on these pins.
5. VIH should not be held above Vcc + 0.5 volts.
6. The IDT79R3001 contains an internal pull-up/current source on the TAG pins to facilitate initialization. This current source is disconnected when Reset is inactive.
7. Guaranteed by design.
8. VOHC applies to Run and Exception.

AC ELECTRICAL CHARACTERISTICS(1,4) - COMMERCIAL TEMPERATURE RANGE (TA = 0°C to +70°C, Vcc = +5.0V ±5%)

SYMBOL PARAMETER TEST CONDITIONS 16.67MHz MIN. MAX. 20.0MHz MIN. MAX. 25.0MHz MIN. MAX.
UNIT Clock - TCkHigh Input Clock High(2) Transition < 5ns 12.5 - 10 - 8 TCklow Input Clock Low(2) Transition < 5ns 12.5 - 10 - 8 30 500 25 500 20 500 ns TCkP Clk2xSys to CIk2xSmp/Rd(S) 0 Tcyc/4 0 Tcyc/4 0 Tcyc/4 ns Clk2xSmp/Rd to Clk2xPhi(S) 9 Tcyc/4 7 Tcyc/4 5 Tcyc/4 ns - -2 - -1.5 ns -1 - -2 - -1 -0.5 ns 3 - 3 2 ns 5 - 4 - 3 ns - 8 -2.5 - - ns 11 - 9 - ns -2.5 - -2.5 - ns - 5 ns Input Clock Period(S) ns ns Run Operation TOEn Data Enable(3) TOOls Data Disable(3) TOVal Data Valid Load = 25pF TWrDly Write Delay Load = 25pF Tos Data Set-up TOH Data Hold Tcss CpBusy Set-up TCSH CpBusy Hold TAcTy Access Type (1 :0) Load = 25pF - TAT2 Access Type2 Load = 25pF 17 TMWr Memory Write Load = 25pF 1 TExc Exception 9 -2.5 13 -2.5 7 - 6 6 -2.5 ns - 14 - 12 - ns 27 1 23 1 18 ns - 7 - 5 ns - 23 - 20 ns 23 - 18 ns Load = 25pF - 7 Load = 25pF - 30 Stall Operation TSAVal Address Valid TSAcTy Address Type Load = 25pF TMRdi Memory Read Initiate Load = 25pF 1 1 18 ns TMRd Read Terminate Load = 25pF - 7 - 7 - 5 ns TStl Run Terminate Load 2 17 2 15 2 11 ns TRun Run Initiate = 25pF Load = 25pF - 7 - 6 - 4 ns 27 27 23 18 ns 15 ns Memory Write Load = 25pF Exception Valid Load - TDMAOis DMA Drive On = 25pF Load = 25pF 3 15 3 15 3 15 ns TOMAEn DMA Drive Off Load = 25pF - 10 - 10 - 10 ns 6 - 6 6 - Tcyc 140 - - 140 - J.l.S 20 1 - 23 1 TSMWr TSEc 1 27 1 18 - Reset Initialization TRST Reset Pulse Width TRSTTAG Reset Pulse Width, Pull-downs on Tag 140 Capacitive Load Deration CLD 0.5 Load Derate(6) 1 0.5 NOTES: 1. All timings are referenced to 1.5V. 2. The clock parameters apply to all three 2x Clocks: Clk2xSys, Clk2xSmp/Rd and Clk2xPhi. 3. This parameter is guaranteed by design. 4. These parameters are illustrated in detail in the "IDT79R3001 Hardware Interface Guide". 5. Tcyc is one CPU clock cycle (2 cycles of a 2x clock). 6. With the exception of RUrl, no two signals of a given device will derate by a difference greater than 15%. 
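Note 5 defines Tcyc as one CPU clock cycle, equal to two cycles of a 2x input clock. The short Python sketch below works through that arithmetic for a given CPU frequency; the helper name and dictionary keys are illustrative only. The 2x clock periods it yields for the three speed grades (about 30, 25 and 20 ns) appear to correspond to the minimum input clock period entries in the table.

```python
def clock_periods_ns(cpu_mhz):
    """Derive the basic clock periods, in nanoseconds, from the CPU frequency."""
    tcyc = 1000.0 / cpu_mhz          # one CPU clock cycle
    return {
        "Tcyc": tcyc,
        "T2x": tcyc / 2,             # period of each 2x input clock
        "Tcyc/4": tcyc / 4,          # upper bound used for the clock skew entries
    }
```

For example, `clock_periods_ns(25.0)` gives a 40 ns CPU cycle, a 20 ns 2x clock period and a 10 ns quarter cycle.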
1 0.5 1 ns/25pF

PIN CONFIGURATIONS

172-Pin Ceramic Flatpack (Cavity Side View)
(Package pinout drawing; pin positions are not recoverable from this copy.)

Note:
1. AccTyp2 is redefined to be Parity Error if the parity enable option is selected at device initialization.

PIN CONFIGURATIONS (continued)

144-Pin PGA (Top View)
(Package pinout drawing, rows A to Q and columns 1 to 15; pin positions are not recoverable from this copy.)

Note:
1. AccTyp2 is redefined to be Parity Error if the parity enable option is selected at device initialization.

Figure 16. Input Clock Timing
(Timing diagram: Clk2xSys, Clk2xSmp/Rd and Clk2xPhi, showing Tckp, Tcklow and Tckhigh.)

Figure 17. Processor Reference Clock Timing
(Timing diagram: Rd/SmpOut* and PhiOut*, showing Tcyc, Trd, Tsmp and Tsys.)
* These signals are not actually output from the processor. They are drawn to provide a reference for other timing diagrams.

Figure 18. Synchronous Memory (Cache) Timing

Figure 19. Memory Write Timing

Figure 20. Memory Read Timing
Co-Processor Load/Store Timing 77 ~~-L+- ____ IDTI9R3001 RISController COMMERCIAL TEMPERATURE RANGE 2 2 Phase Figure 22. Interrupt Timing 78 COMMERCIAL TEMPERATURE RANGE IOTI9R3001 RISControlier ~ tos ---+ tOH tSMP .- """.""".""".. "'" """ "',. \., .."" .. ,'" .... ""."",.,'",.. ,'" .."" ..\., .. "" ..."", ...",..."'...... ,,,',."'" ",..", ....."".'\.,.",. "",~:";'" ·"".··""~··\•. '·,•.""".·'·,,,.··',l __ -+I ~,,':'".....~.,.,.,.,.~."'.~"'.~ ..."..~",:-,.>,~ ..,,~.,,~.:-.~",'~ .... ,~",.:-",',-.r.' .. ".,~ ..~.".. ~ ",'.~ .. ",~ . ,...:-,,'.~.,,' •. ".-., .. ':-""~ ".,.•..,.~, •.:-""•.",...., .."' ..",..."..".".,.-.r.' ..,,....,-.".. "....,. ..".,.."~ ......-.,,.~ .. ,,'...\""" .. ''--'".'"'~''~', .. ''''''''''' IRd " . ."" ....",....", ", "" . "'".,,"", ", ."" .."".""'" .,....'"" "."""" "."",.",\.."".."".""",,,.,,,.""',."'" ", DClk " ' ..,'",...",....""...,'" ...,','.. ""'" """ "'.""",. "",,">,."",,."""" "" tOMAOis "',.""'" .,. ".".,....'. .,....'. .,..'.,. . . .,. . ' ', "'. ,.1 "'."'..,.',.,. "",.";',.""" "',.""" '''.,.'''''' ",. "',. ",.""',. "',.""" ",."",."'.,.""',.""" "',."'.,. "',. "".""" ",."'" '''''' "',.""" "',."" "',.""',.""" "".1 IClk ~ ""."\.~ """ """ .,'" ....", ......"."". """ "" """ "',. "",.""" """ """ """ "",. ""',. t------------- ""'.1 I"."""."",. '.,." .. "..".. "".""" "',.""" ",."',. ", "" .. "",."",."",.".;,... ""."'"."" "',,"",."'" """ "' .. "",."",."",."'" ">',."'" "'.,.. "'"."'".""" AdrLo '" Instruction Figure 24. Entering DMA Stall 80 Data COMMERCIAL TEMPERATURE RANGE IOTI9R3001 RISControlier Run phase DMA Stall DMA Stall 2 2 SysOut tSYS 1+-. tSYS !.--. /~ PhiOut -'r- // DMAStall / v i- tOH I ~ tSMP tOS-' .- Run DRd DWr IRd IWr / / "'" X,,·.·"···",··"",''"·,,.·,,·'''·",.·"".·"".··" / --. ~.' / .'.'" ..'" .."" ..""." .."",':"".':.""".:"'."'.j . - tOMAOis ,. .."",""".'" "'..."'.."" ..."",," . .:..'" ',..",..."" ..",.. '"" .."....",..",.'" .." ,,'>",' "" _tions in the CPU. NON FPU Figure 3. 
Examples of Overlapping Floating Point Operation

Exceptions

The IDT79R3010 FPA supports all five IEEE standard exceptions:
• Invalid Operation
• Inexact Operation
• Division by Zero
• Overflow
• Underflow

The FPA also supports the optional Unimplemented Operation exception, which allows unimplemented instructions to trap to software emulation routines.

The FPA provides precise exception capability to the CPU; that is, the execution of a floating point operation which generates an exception causes that exception to occur at the CPU instruction which caused the operation. This precise exception capability is a requirement in applications and languages which provide a mechanism for local software exception handlers within software modules.

INSTRUCTION SET OVERVIEW

All IDT79R3010 instructions are 32 bits long and can be divided into the following groups:
• Load/Store and Move instructions move data between memory, the main processor and the FPA general registers.
• Computational instructions perform arithmetic operations on floating point values in the FPA registers.
• Conversion instructions perform conversion operations between the various data formats.
• Compare instructions perform comparisons of the contents of registers and set a condition bit based on the results. The result of the compare operation is output on the FpCond output of the FPA, which is typically used as CpCond1 on the CPU for use in coprocessor branch operations.

Table 1 lists the instruction set of the IDT79R3010 FPA.
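The five IEEE exception classes listed under Exceptions can be observed in software. The following is a minimal sketch that uses Python's standard `decimal` module, which signals conditions with the same names. It is an illustration only: `decimal` is a decimal arithmetic package, not a model of the IDT79R3010, and the helper `raised`, the table `IEEE_SIGNALS` and the small context limits (`prec=9`, `Emin=-99`, `Emax=99`) are assumptions chosen to make overflow and underflow easy to trigger.

```python
import decimal
from decimal import Decimal

# The five IEEE exception classes, mapped to the matching decimal signals.
IEEE_SIGNALS = {
    "Invalid Operation": decimal.InvalidOperation,
    "Inexact Operation": decimal.Inexact,
    "Division by Zero": decimal.DivisionByZero,
    "Overflow": decimal.Overflow,
    "Underflow": decimal.Underflow,
}


def raised(operation):
    """Run operation(ctx) with traps disabled; return the names of the flags it set."""
    # Hypothetical small context so that overflow and underflow are easy to reach.
    ctx = decimal.Context(prec=9, Emin=-99, Emax=99, traps=[])
    operation(ctx)
    return {name for name, sig in IEEE_SIGNALS.items() if ctx.flags[sig]}


if __name__ == "__main__":
    cases = {
        "inf - inf": lambda c: c.subtract(Decimal("Infinity"), Decimal("Infinity")),
        "1 / 0": lambda c: c.divide(Decimal(1), Decimal(0)),
        "1 / 3": lambda c: c.divide(Decimal(1), Decimal(3)),
        "1e99 * 10": lambda c: c.multiply(Decimal("1e99"), Decimal("10")),
        "1e-99 * 1e-99": lambda c: c.multiply(Decimal("1e-99"), Decimal("1e-99")),
    }
    for label, op in cases.items():
        print(label, "->", sorted(raised(op)))
```

With traps disabled, each operation returns a default result and records a flag instead of raising. On the FPA the corresponding mechanism is an exception reported to the CPU, as described above.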
Load/Store/Move Instructions
OP          Description
LWC1        Load Word to FPA
SWC1        Store Word from FPA
MTC1        Move Word to FPA
MFC1        Move Word from FPA
CTC1        Move Control word to FPA
CFC1        Move Control word from FPA

Conversion Instructions
OP          Description
CVT.S.fmt   Floating-point Convert to Single FP
CVT.D.fmt   Floating-point Convert to Double FP
CVT.W.fmt   Floating-point Convert to fixed-point

Computational Instructions
OP          Description
ADD.fmt     Floating-point Add
SUB.fmt     Floating-point Subtract
MUL.fmt     Floating-point Multiply
DIV.fmt     Floating-point Divide
ABS.fmt     Floating-point Absolute value
MOV.fmt     Floating-point Move
NEG.fmt     Floating-point Negate

Compare Instructions
OP          Description
C.cond.fmt  Floating-point Compare

Table 1. IDT79R3010 Instruction Summary

IDT79R3010 RISC FLOATING POINT ACCELERATOR (FPA), MILITARY AND COMMERCIAL TEMPERATURE RANGES

IDT79R3010 PIPELINE ARCHITECTURE

The IDT79R3010 FPA provides an instruction pipeline that parallels that of the IDT79R3000 processor. The FPA, however, has a 6-stage pipeline instead of the 5-stage pipeline of the IDT79R3000: the additional FPA pipe stage is used to provide efficient coordination of exception responses between the FPA and main processor.

The execution of a single IDT79R3010 instruction consists of six primary steps:
1) IF-Instruction Fetch. The main processor calculates the instruction address required to read an instruction from the I-Cache. No action is required of the FPA during this pipe stage since the main processor is responsible for address generation.
2) RD-The instruction is present on the data bus during phase 1 of this pipe stage and the FPA decodes the data on the bus to determine if it is an instruction for the FPA.
3) ALU-If the instruction is an FPA instruction, instruction execution commences during this pipe stage.
4) MEM-If this is a coprocessor load or store instruction, the FPA presents or captures the data during phase 2 of this pipe stage.
5) WB-The FPA uses this pipe stage solely to deal with exceptions.
6) FWB-The FPA uses this stage to write back ALU results to its register file. This stage is the equivalent of the WB stage in the IDT79R3000 main processor.

Each of these steps requires approximately one FPA cycle as shown in Figure 3 (parts of some operations spill over into another cycle while other operations require only 1/2 cycle).

[Diagram: Instruction Execution; stages IF (I-Cache), RD (RF), ALU (OP), MEM (D-Cache), WB (exceptions), FWB (FpWB)]
Figure 4. IDT79R3010 Instruction Summary

The IDT79R3010 uses a 6-stage pipeline to achieve an instruction execution rate approaching one instruction per FPA cycle. Thus, the execution of six instructions at a time is overlapped, as shown in Figure 5.

[Diagram: Instruction Flow; six overlapped instructions occupying the IF, RD, ALU, MEM, WB and FWB stages, with the Current Cycle marked]
Figure 5. IDT79R3010 Instruction Pipeline

This pipeline operates efficiently because different FPA resources (address and data bus accesses, ALU operations, register accesses, and so on) are utilized on a non-interfering basis.

PIN CONFIGURATION (Top View)

84-Pin J-Bend CERQUAD

[Pin diagram, pins 1 through 84. The signal-to-pin assignment is not recoverable from this copy. Signal names on the first legible side appear in the following order.]

Clk2xRd FpSysIn Data (31) VCC1 GND1 DataP (3) FpSysOut Clk2xSys Clk2xSmp Clk2xPhi Reset FpSync VCC2 GND2 VCC3 GND3 PLLOn VCC4 GND4 VCC5 GND5

Note: Reserved pins must not be connected.
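The overlap of six instructions described under PIPELINE ARCHITECTURE can be sketched as a small table generator. This is an idealized illustration, not a model of the device: it assumes one instruction enters the pipeline per cycle, every stage takes exactly one cycle, and there are no stalls. The names `STAGES`, `pipeline_chart` and `stages_in_cycle` are hypothetical.

```python
# Stage names taken from the six primary steps listed in the text.
STAGES = ["IF", "RD", "ALU", "MEM", "WB", "FWB"]


def pipeline_chart(num_instructions):
    """Return one row per instruction; column c holds the stage occupied in cycle c."""
    total_cycles = num_instructions + len(STAGES) - 1
    chart = []
    for i in range(num_instructions):
        row = []
        for cycle in range(total_cycles):
            stage = cycle - i  # instruction i enters IF in cycle i
            row.append(STAGES[stage] if 0 <= stage < len(STAGES) else "")
        chart.append(row)
    return chart


def stages_in_cycle(chart, cycle):
    """List the stages that are busy in the given cycle, oldest instruction first."""
    return [row[cycle] for row in chart if row[cycle]]


if __name__ == "__main__":
    for i, row in enumerate(pipeline_chart(6)):
        print(f"I{i}: " + " ".join(f"{s:>3}" for s in row))
```

In this idealized chart, cycle 5 is the first cycle in which all six stages are occupied at once, which corresponds to the "Current Cycle" column of Figure 5.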
[84-Pin J-Bend CERQUAD pin diagram, continued. Signal names on the second legible side appear in the following order.]

GND13 DataP (1) VCC12 GND12 FpCond FpBusy FpInt Exception Run Resvd2 Resvd1 VCC11 GND11 VCC10 GND10 FpPresent Resvd0 VCC9 GND9 VCC8 GND8

PIN CONFIGURATION (Ceramic, Cavity Down) - BOTTOM VIEW

84-Pin Ceramic Pin Grid Array

[Pin grid diagram, rows A through M by columns 1 through 12. Grid positions are not recoverable from this copy. Signals shown: Data 0-Data 31, DataP 0-DataP 3, FpCond, FpBusy, FpInt, FpPresent, Exception, Run, FpSysIn, FpSysOut, FpSync, Reset, PLLOn, Clk2xRd, Clk2xSys, Clk2xSmp, Clk2xPhi, Rsrvd 0-Rsrvd 2, Vcc, Vss.]

NOTE:
1. Reserved pins must not be connected.

PIN CONFIGURATION

84-L QUAD FLATPACK (CAVITY DOWN) TOP VIEW

[Pin diagram. The signal-to-pin assignment is not recoverable from this copy; the pin numbers still visible are 21, 22, 42, 43, 63, 64 and 84. Signal names on the two legible sides appear in the following order.]

Data (30) Data (29) Data (28) Data (27) VCC0 GND0 Data (26) Data (25) Data (24) DataP (2) Data (23) Data (22) Data (21) Data (20) VCC14 GND14 Data (19) Data (18) Data (17) Data (16) VCC13

Data (0) Data (1) Data (2) Data (3) GND6 VCC6 Data (4) Data (5) Data (6) Data (7) DataP (0) Data (8) Data (9) Data (10) Data (11) GND7 VCC7 Data (12) Data (13) Data (14) Data (15)

NOTE:
1. Reserved pins must not be connected.

PIN DESCRIPTIONS

PIN NAME      I/O   DESCRIPTION
Data (0-31)   I/O   A multiplexed 32-bit bus used for instruction and data transfers on phase 1 and phase 2, respectively.
DataP (0-3)   O     A 4-bit bus containing even parity over the data bus. Parity is generated by the FPA on stores.
Run           I     Input to the FPA which indicates whether the processor-coprocessor system is in the run or stall state.
Exception     I     Input to the FPA which indicates exception related status information.
FpBusy        O     Signal to the CPU indicating a request for a coprocessor busy stall.
FpCond        O     Signal to the CPU indicating the result of the last comparison operation.
FpInt         O     Signal to the CPU indicating that a floating-point exception has occurred for the current FPA instruction.
Reset         I     Synchronous initialization input used to distinguish the processor-FPA synchronization period from the execution period. Reset must be synchronized by the leading edge of SysOut from the CPU.
PLLOn         I     Input which during the reset period determines whether the phase lock mechanism is enabled and during the execution period determines the output timing model.
FpPresent     O     Output which is pulled to ground through an impedance of approximately 0.5k ohms. By providing an external pullup on this line, an indication of the presence or absence of the FPA can be obtained.
Clk2xSys      I     A double frequency clock input used for generating FpSysOut.
Clk2xSmp      I     A double frequency clock input used to determine the sample point for data coming into the FPA.
Clk2xRd       I     A double frequency clock input used to determine the disable point for the data drivers.
Clk2xPhi      I     A double frequency clock input used to determine the position of the internal phases, phase 1 and phase 2.
FpSysOut      O     Synchronization clock from the FPA.
FpSysIn       I     Input used to receive the synchronization clock from the FPA.
FpSync        I     Input used to receive the synchronization clock from the CPU.
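The DataP description says only that the 4-bit bus carries even parity over the 32-bit data bus. A minimal sketch of how such parity bits could be computed follows. It assumes one parity bit per byte, with DataP(n) covering Data(8n) through Data(8n+7); that byte-lane mapping is an assumption and is not stated in the pin description. The function name `even_parity_bits` is hypothetical.

```python
def even_parity_bits(word):
    """Return [DataP(0), DataP(1), DataP(2), DataP(3)] for a 32-bit word.

    Even parity: each parity bit is chosen so that the byte plus its parity
    bit contains an even number of ones. The byte-to-bit mapping used here
    (DataP(n) covers Data(8n)..Data(8n+7)) is an assumption.
    """
    if not 0 <= word <= 0xFFFFFFFF:
        raise ValueError("word must fit in 32 bits")
    bits = []
    for lane in range(4):
        byte = (word >> (8 * lane)) & 0xFF
        ones = bin(byte).count("1")
        bits.append(ones % 2)  # 1 only when the byte has an odd number of ones
    return bits


if __name__ == "__main__":
    for value in (0x00000000, 0x00000001, 0x80000007, 0xFF000003):
        print(f"{value:#010x} -> {even_parity_bits(value)}")
```

A receiver that samples the same data and parity lines can recompute the bits in the same way and compare them to detect a single-bit error within a byte.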
ABSOLUTE MAXIMUM RATINGS(1,3)

SYMBOL   RATING                                  COMMERCIAL      MILITARY        UNIT
VTERM    Terminal Voltage with Respect to GND    -0.5 to +7.0    -0.5 to +7.0    V
TA       Operating Temperature                   0 to +70        -55 to +125     °C
TBIAS    Temperature Under Bias                  -55 to +125     -65 to +135     °C
TSTG     Storage Temperature(2)                  -55 to +125     -65 to +150     °C
VIN      Input Voltage                           -0.5 to +7.0    -0.5 to +7.0    V

NOTES:
1. Stresses greater than those listed under ABSOLUTE MAXIMUM RATINGS may cause permanent damage to the device. This is a stress rating only and functional operation of the device at these or any other conditions above those indicated in the operational sections of this specification is not implied. Exposure to absolute maximum rating conditions for extended periods may affect reliability.
2. VIN minimum = -3.0V for pulse width less than 15ns. VIN should not exceed Vcc +0.5 Volts.
3. Not more than one output should be shorted at a time. Duration of the short should not exceed 30 seconds.

RECOMMENDED OPERATING TEMPERATURE AND SUPPLY VOLTAGE

GRADE        AMBIENT TEMPERATURE    GND    Vcc
Military     -55°C to +125°C        0V     5.0 ±10%
Commercial   0°C to +70°C           0V     5.0 ±5%

OUTPUT LOADING FOR AC TESTING

[Diagram: test load connected "To Device Under Test"; component values not recoverable from this copy]

DC ELECTRICAL CHARACTERISTICS
COMMERCIAL TEMPERATURE RANGE (TA = 0°C to +70°C, Vcc = +5.0V ±5%)

[Values are listed as MIN. / MAX. for each speed grade; "-" means no value is specified. Most entries in the 33.33 MHz column are illegible in this copy.]

SYMBOL   PARAMETER                    TEST CONDITIONS             16.67 MHz    20.0 MHz     25.0 MHz     33.33 MHz              UNIT
VOH      Output HIGH Voltage          Vcc = Min., IOH = -4mA      3.5 / -      3.5 / -      3.5 / -      3.5 / -                V
VOL      Output LOW Voltage           Vcc = Min., IOL = 4mA       - / 0.4      - / 0.4      - / 0.4      [illegible]            V
VOLFP    Output LOW Voltage(5)        Vcc = Min., IOL = 1.5mA     - / 0.5      - / 0.5      - / 0.5      [illegible]            V
VIH      Input HIGH Voltage(6)                                    2.0 / -      2.0 / -      2.0 / -      [illegible]            V
VIL      Input LOW Voltage(1)                                     - / 0.8      - / 0.8      - / 0.8      [illegible]            V
VIHS     Input HIGH Voltage(2,6)                                  3.0 / -      3.0 / -      3.0 / -      [illegible]            V
VILS     Input LOW Voltage(1,2)                                   - / 0.4      - / 0.4      - / 0.4      [illegible]            V
VIHC     Input HIGH Voltage(4,6)                                  4.0 / -      4.0 / -      4.0 / -      [illegible]            V
VILC     Input LOW Voltage(1,4)                                   - / 0.4      - / 0.4      - / 0.4      [illegible]            V
CIN      Input Capacitance(7)                                     - / 10       - / 10       - / 10       [illegible]            pF
COUT     Output Capacitance(7)                                    - / 10       - / 10       - / 10       [illegible]            pF
ICC      Operating Current            Vcc = Max                   - / 625      - / 675      - / 750      - / 900                mA
IIH      Input HIGH Leakage(3)        VIH = Vcc                   -10 / 10     -10 / 10     -10 / 10     [illegible]            µA
IIL      Input LOW Leakage(3)         VIL = GND                   -10 / 10     -10 / 10     -10 / 10     [illegible]            µA
IOZ      Output Tri-state Leakage     VOH = 2.4V, VOL = 0.5V      -40 / 40     -40 / 40     -40 / 40     -40 / [illegible]      µA

NOTES:
1. VIL Min. = -3.0V for pulse width less than 15ns. VIL should not fall below -0.5V for longer periods.
2. VIHS and VILS apply to Clk2xSys, Clk2xSmp, Clk2xRd, Clk2xPhi, FpSysIn, FpSync and Reset.
3. These parameters do not apply to the clock inputs.
4. VIHC and VILC apply to Run, PLLOn and Exception.
5. VOLFP applies to the FpPresent pin only.
6. VIH and VIHS should not be held above Vcc + 0.5 Volts.
7. Guaranteed by design.

DC ELECTRICAL CHARACTERISTICS
MILITARY TEMPERATURE RANGE (TA = -55°C to +125°C, Vcc = +5.0V ±10%)

[This table is truncated in this copy. Only the 16.67 MHz column heading and the entries below are legible.]

SYMBOL   PARAMETER                    TEST CONDITIONS             16.67 MHz
VOH      Output HIGH Voltage          Vcc = Min., IOH = -4mA      3.5 / -
VOL      Output LOW Voltage           Vcc = Min., IOL = 4mA       [illegible]
VOLFP    Output LOW Voltage(5)        Vcc = Min., IOL = 1.5mA     [illegible]
VIH      Input HIGH Voltage(6)                                    [illegible]
VIL      Input LOW Voltage(1)                                     [illegible]
VIHS     Input HIGH Voltage(2,6)                                  [illegible]
VILS     Input LOW Voltage(1,2)                                   [illegible]
VIHC     Input HIGH Voltage(4,6)                                  [illegible]
VILC     Input LOW Voltage(1,4)                                   [illegible]
CIN      Input Capacitance(7)                                     [illegible]
COUT     Output Capacitance(7)                                    [illegible]
ICC      Operating Current                                        [illegible]
IIH                                                               [illegible]
IIL                                                               [illegible]
IOZ                                                               [illegible]
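The supply tolerances in the RECOMMENDED OPERATING TEMPERATURE AND SUPPLY VOLTAGE table translate into absolute voltage limits by simple arithmetic: 5.0V ±5% is 4.75V to 5.25V, and 5.0V ±10% is 4.5V to 5.5V. The sketch below works this out and checks an operating point against a grade. The names `GRADES`, `vcc_limits` and `within_recommended` are hypothetical helpers, not part of any IDT tool.

```python
# Recommended operating conditions as given in the table.
GRADES = {
    "Commercial": {"temp_c": (0, 70), "vcc_nominal": 5.0, "vcc_tol_pct": 5},
    "Military": {"temp_c": (-55, 125), "vcc_nominal": 5.0, "vcc_tol_pct": 10},
}


def vcc_limits(grade):
    """Return (min, max) supply voltage in volts for the given grade."""
    g = GRADES[grade]
    delta = g["vcc_nominal"] * g["vcc_tol_pct"] / 100
    return (round(g["vcc_nominal"] - delta, 3), round(g["vcc_nominal"] + delta, 3))


def within_recommended(grade, temp_c, vcc):
    """True when both ambient temperature and Vcc fall inside the recommended range."""
    lo_t, hi_t = GRADES[grade]["temp_c"]
    lo_v, hi_v = vcc_limits(grade)
    return lo_t <= temp_c <= hi_t and lo_v <= vcc <= hi_v


if __name__ == "__main__":
    for grade in GRADES:
        print(grade, vcc_limits(grade))
    print(within_recommended("Commercial", 25, 5.4))
    print(within_recommended("Military", 25, 5.4))
```

These limits are the "Vcc = Min." and "Vcc = Max" points referred to in the TEST CONDITIONS column of the DC tables.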