1991 IDT RISC Databook

Page Count: 578

Integrated Device Technology, Inc.

DATA BOOK

3236 Scott Boulevard, Santa Clara, California 95054
Telephone: (408) 727-6116 • TWX: 910-338-2070 • FAX: (408) 492-8674
Printed in U.S.A.
©1991 Integrated Device Technology, Inc.

GENERAL INFORMATION

CONTENTS OVERVIEW
Historically, Integrated Device Technology has presented our product offerings entirely under
one cover. For ease of use for our customers, we have divided the products into four separate data
books - Logic, Specialized Memory, RISC and Static RAM.
IDT's 1991 RISC Data Book is comprised of new and revised data sheets and application notes
for the RISC and RISC Subsystem product lines. Also included is a current, complete packaging
section for all IDT product groups. This section will be updated in each subsequent data book with
the latest available packages.
The RISC Data Book's Table of Contents contains a listing of the products contained in the 1991
RISC Data Book, as well as those products which are contained in the other three data books. The
numbering scheme is slightly different from the past. The number in the bottom center of the page
denotes the section number and the sequence of the data sheet within that section (i.e., 5.5 would
be the fifth data sheet in the fifth section). The number in the lower right hand corner is the page
number of that particular data sheet.
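The numbering scheme described above can be illustrated with a short sketch. The function name `parse_page_reference` is hypothetical and not part of any IDT tool; it only assumes that a reference is written as "section.sequence", as in the 5.5 example. The reference is handled as text rather than as a decimal number, so that 9.10 (tenth data sheet in section 9) is not confused with 9.1.

```python
def parse_page_reference(ref: str) -> tuple[int, int]:
    """Split a reference such as "5.5" into (section, data sheet sequence)."""
    # Treat the reference as text: "9.10" means section 9, data sheet 10,
    # which a float conversion would collapse into 9.1.
    section, sequence = ref.strip().split(".")
    return int(section), int(sequence)


if __name__ == "__main__":
    section, sequence = parse_page_reference("5.5")
    print(f"section {section}, data sheet {sequence}")
```

Keeping the two parts as separate integers also lets a list of references be sorted in the order the data sheets appear in the book.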
Integrated Device Technology, a recognized leader in high-speed CMOS technology, produces
a broad line of products, enabling us to provide a complete CMOS solution to designers of high-performance digital systems. Our products include industry standard devices, as well as products
with speed, lower power, package and/or architectural benefits that allow the designer to achieve
significantly improved system performance.
Use this book to find ordering information: Start with the Ordering Information chart at the
back of each data sheet or the Cross Reference Guides (in Section 1), along with the Package
Outline Index (page 4.2), to compose the complete IDT part number. Reference data on our
Technology Capabilities and Quality Commitments are included in separate sections (2 and 3,
respectively).
Use this book to find product data: Start with the Table of Contents, organized by product
line (page 1.3), or with the Numeric Table of Contents across all product lines (page 1.4). These
indexes will direct you to the page on which the complete technical data sheet can be found. Data
sheets may be of the following types:
ADVANCE INFORMATION - contain initial descriptions, subject to change, for products that
are in development, including features and block diagrams.
PRELIMINARY - contain descriptions for products soon to be, or recently, released to
production, including features, pinouts and block diagrams. Timing data are based on simulation
or initial characterization and are subject to change upon full characterization.
FINAL - contain minimum and maximum limits specified over the complete supply and
temperature range for full production devices.
New products, product performance enhancements, additional package types and new product
families are being introduced frequently. Please contact your local IDT sales representative to
determine the latest device specifications, package types and product availability.

1.1


LIFE SUPPORT POLICY
Integrated Device Technology's products are not authorized for use as critical components in life support devices
or systems unless a specific written agreement pertaining to such intended use is executed between the manufacturer and an officer of IDT.
1. Life support devices or systems are devices or systems which (a) are intended for surgical implant into the body
or (b) support or sustain life and whose failure to perform, when properly used in accordance with instructions for
use provided in the labeling, can be reasonably expected to result in a significant injury to the user.
2. A critical component is any component of a life support device or system whose failure to perform can be reasonably expected to cause the failure of the life support device or system, or to affect its safety or effectiveness.

Note: Integrated Device Technology, Inc. reserves the right to make changes to its products or specifications at any time, without notice,
in order to improve design or performance and to supply the best possible product. IDT does not assume any responsibility for use of any
circuitry described other than the circuitry embodied in an IDT product. The Company makes no representations that circuitry described
herein is free from patent infringement or other rights of third parties which may result from its use. No license is granted by implication or
otherwise under any patent, patent rights or other rights, of Integrated Device Technology, Inc.


1991 RISC DATA BOOK
SUMMARY TABLE OF CONTENTS
PAGE

GENERAL INFORMATION
Contents Overview .......... 1.1
Summary Table of Contents .......... 1.2
Table of Contents .......... 1.3
Numeric Table of Contents .......... 1.4
IDT Package Marking Description .......... 1.5

TECHNOLOGY AND CAPABILITIES
IDT... Leading the CMOS Future .......... 2.1
IDT Military and DESC-SMD Program .......... 2.2
Radiation Hardened Technology .......... 2.3
IDT Leading Edge CEMOS Technology .......... 2.4
Surface Mount Technology .......... 2.5
State-of-the-Art Facilities and Capabilities .......... 2.6
Superior Quality and Reliability .......... 2.7

QUALITY AND RELIABILITY
Quality, Service and Performance .......... 3.1
IDT Quality Conformance Program .......... 3.2
Radiation Tolerant/Enhanced/Hardened Products for Radiation Environments .......... 3.3

PACKAGE DIAGRAM OUTLINES
Thermal Performance Calculations for IDT's Packages .......... 4.1
Package Diagram Outline Index .......... 4.2
Monolithic Package Diagram Outlines .......... 4.3

RISC PROCESSING COMPONENTS
RISC Processing Components Products .......... 5.1

RISC SUPPORT COMPONENTS
RISC Support Components Products .......... 6.1

RISC MODULE PRODUCTS
RISC Module Products .......... 7.1

RISC DEVELOPMENT SUPPORT
RISC Development Support Products .......... 8.1

APPLICATION NOTES
RISC Microprocessor Products Application Notes .......... 9.1
RISC Microprocessor Conference Papers .......... 9.16

IDT SALES OFFICE, REPRESENTATIVE AND DISTRIBUTOR LOCATIONS

SUMMARY TABLE OF CONTENTS (CONTINUED)
BOOK

LOGIC DATA BOOK
Complex Logic Products .......... LOGIC
Standard Logic Products .......... LOGIC

SPECIALIZED MEMORIES DATA BOOK
ECL Products .......... SMP
FIFO Products .......... SMP
Specialty Memory Products .......... SMP
Subsystems Products .......... SMP

SRAM DATA BOOK
CEMOS Static RAMs with Power Down Products .......... SRAM
High-Speed BiCEMOS Static RAM Products .......... SRAM

1991 RISC DATA BOOK
TABLE OF CONTENTS
PAGE

GENERAL INFORMATION
Contents Overview .......... 1.1
Summary Table of Contents .......... 1.2
Table of Contents .......... 1.3
Numeric Table of Contents .......... 1.4
IDT Package Marking Description .......... 1.5

TECHNOLOGY AND CAPABILITIES
IDT... Leading the CMOS Future .......... 2.1
IDT Military and DESC-SMD Program .......... 2.2
Radiation Hardened Technology .......... 2.3
IDT Leading Edge CEMOS Technology .......... 2.4
Surface Mount Technology .......... 2.5
State-of-the-Art Facilities and Capabilities .......... 2.6
Superior Quality and Reliability .......... 2.7

QUALITY AND RELIABILITY
Quality, Service and Performance .......... 3.1
IDT Quality Conformance Program .......... 3.2
Radiation Tolerant/Enhanced/Hardened Products for Radiation Environments .......... 3.3

PACKAGE DIAGRAM OUTLINES
Thermal Performance Calculations for IDT's Packages .......... 4.1
Package Diagram Outline Index .......... 4.2
Monolithic Package Diagram Outlines .......... 4.3

RISC PROCESSING COMPONENTS
IDT79R3000A  RISC CPU Processor .......... 5.1
IDT79R3001  RISController™ .......... 5.2
IDT79R3010A  RISC Floating Point Accelerator (FPA) .......... 5.3
IDT79R3500  RISC CPU Processor RISCore™ .......... 5.4
IDT79R3051  IDT79R3051 Family of Integrated RISControllers™ .......... 5.5
IDT79R4000  Third Generation RISC Microprocessor .......... 5.6

RISC SUPPORT COMPONENTS
IDT79R3720  Bus Exchanger for R3051 Family .......... 6.1
IDT79R3721  DRAM Controller for R3051 Family .......... 6.2
IDT79R3722  I/O Interface Controller for R3051 Family .......... 6.3
IDT79R3020  RISC CPU Write Buffer .......... 6.4
IDT73200L  16-Bit CMOS Multilevel Pipeline Registers .......... 6.5
IDT73201L  16-Bit CMOS Multilevel Pipeline Registers .......... 6.5
IDT73210  Fast CMOS Octal Register Transceiver with Parity .......... 6.6
IDT73211  Fast CMOS Octal Register Transceiver with Parity .......... 6.6

RISC MODULE PRODUCTS
IDT7RS101  R3000 CPU Modules for General Applications .......... 7.1
IDT7RS102  R3000 CPU Modules for Compact Systems .......... 7.2
IDT7RS103  R3000 CPU Modules for Compact Systems .......... 7.3
IDT7RS104  R3001 RISC Engine for Embedded Controllers .......... 7.4

1991 RISC DATA BOOK (CONTINUED)

PAGE

RISC MODULE PRODUCTS (CONTINUED)
IDT7RS107  R3000 CPU Modules for High Performance and MultiProcessor Systems .......... 7.5
IDT7RS108  R3000 CPU Modules with 256K Caches .......... 7.6
IDT7RS109  R3000 CPU Modules with 256K Caches .......... 7.7
IDT7RS110  Plug Compatible Family of R3000 CPU Modules .......... 7.8

RISC DEVELOPMENT SUPPORT
RC32xx  IDT RISC Development Host Systems .......... 8.1
IDT7RS300 Series  Prototyping Platform for Any IDT CPU Module .......... 8.2
IDT7RS363  R3000 PGA Adaptor .......... 8.3
IDT7RS364  R3000 Disassembler for Use with the HP 16500 Logic Analyzer .......... 8.4
IDT7RS382  R3000 and R3001 Evaluation Boards .......... 8.5
IDT7RS383  R3000 and R3001 Evaluation Boards .......... 8.5
IDT7RS388  REAL8™ R3000 Laser Printer Controller Evaluation System .......... 8.6
IDT7RS502  MacStation 2 R3000 Development System .......... 8.7
IDT7RS503  MacStation 3 R3000 Development System .......... 8.8
IDT7RS901  IDT/sim System Integration Manager ROMable Debugging Kernel .......... 8.9
IDT7RS903  IDT/c Multi-Host C-Compiler System .......... 8.10
IDT7RS904  Cross Assembler for IBM PCs and Clones .......... 8.11
IDT7RS905  IDT/fp Floating Point Library for Use with R3000 Compilers .......... 8.12
Third Party Development Support .......... 8.13

APPLICATION NOTES
RISC Microprocessor Products Application Notes
AN-19  RISC and the Memory Hierarchy .......... 9.1
AN-26  Interrupt Latency and Handling in the IDT79R3000 .......... 9.2
AN-27  Cache Design Considerations Using the IDT79R3000 .......... 9.3
AN-28  Using the IDT79R3000 in a Multiprocessor Organization .......... 9.4
AN-33  The Effect of Branch, Load and Store Latencies on RISC Processor Performance .......... 9.5
AN-55  Design Techniques for Lowering Power Consumption in RISC Designs .......... 9.6
AN-58  Parity in the IDT RISC Family .......... 9.7
AN-61  R3000 33MHz Specification and Cache Timing .......... 9.8
AN-62  R3000 Family Software Tools for Performance Analysis .......... 9.9
AN-65  Using the IDT73210 as a One-Deep Read and Write Buffer with the IDT79R3000/3001 .......... 9.10
AN-66  Designing Embedded Control Applications with the R3001 RISController .......... 9.11
AN-67  Using IDT71502 RAMs in a Real-Time Debugging Tool for an R3000 Microprocessor-Based System .......... 9.12
AN-76  Using the IDT7MB6049 Cache Module for Single and Multiprocessor Systems .......... 9.13
AN-77  R3001 Specification and Cache Timing .......... 9.14
RISC Microprocessor Conference Papers
CP-01  A Powerful Development Tool for the IDT79R3000 RISC Family .......... 9.15
CP-02  Developing Applications for the IDT79R3000 RISC Microprocessor .......... 9.16
CP-03  IDT's R3001 Simplifies Design of High-Performance Control Systems .......... 9.17

IDT SALES OFFICE, REPRESENTATIVE AND DISTRIBUTOR LOCATIONS

1990-91 LOGIC DATA BOOK
The following is a listing of the data sheets located in the 1990-91 Logic Data Book available under separate cover:

COMPLEX LOGIC PRODUCTS

PAGE

DSP AND MICROSLICE™ PRODUCTS
IDT39C01  4-Bit Microprocessor Slice .......... 5.1
IDT39C10  12-Bit Sequencer .......... 5.2
IDT49C402  16-Bit Microprocessor Slice .......... 5.3
IDT49C410  16-Bit Sequencer .......... 5.4
IDT7210L  16 x 16 Parallel Multiplier-Accumulator .......... 5.5
IDT7216L  16 x 16 Parallel Multiplier .......... 5.6
IDT7217L  16 x 16 Parallel Multiplier (32 Bit Output) .......... 5.6
IDT7381L  16-Bit CMOS Cascadable ALU .......... 5.7
IDT7383L  16-Bit CMOS Cascadable ALU .......... 5.7

READ/WRITE BUFFER PRODUCTS
IDT73200L  16-Bit CMOS Multilevel Pipeline Register .......... 5.8
IDT73201L  16-Bit CMOS Multilevel Pipeline Register .......... 5.8
IDT73210  Fast Octal Register Transceiver w/Parity .......... 5.9
IDT73211  Fast Octal Register Transceiver w/Parity .......... 5.9

ERROR DETECTION AND CORRECTION PRODUCTS
IDT39C60  16-Bit Cascadable EDC .......... 5.10
IDT49C460  32-Bit Cascadable EDC .......... 5.11
IDT49C465  32-Bit CMOS Flow-ThruEDC Unit .......... 5.12
IDT49C466  64-Bit CMOS Flow-ThruEDC Unit .......... 5.13

GRAPHICS PRODUCTS
IDT75C457  CMOS Single 8-Bit PaletteDAC™ for True Color Applications .......... 5.14
IDT75C458  Triple 8-Bit PaletteDAC™ .......... 5.15
IDT75C48  8-Bit Flash ADC .......... 5.16
IDT75C58  8-Bit Flash ADC with Overflow Output .......... 5.17

STANDARD LOGIC PRODUCTS
IDT29FCT52T  Non-inverting Octal Registered Transceiver .......... 6.1
IDT29FCT53T  Inverting Octal Registered Transceiver .......... 6.1
IDT29FCT520T  Multi-level Pipeline Register .......... 6.2
IDT29FCT521T  Multi-level Pipeline Register .......... 6.2
IDT54/74FCT138T  1-of-8 Decoder .......... 6.3
IDT54/74FCT139T  Dual 1-of-4 Decoder .......... 6.4
IDT54/74FCT151T  8-Input Multiplexer .......... 6.5
IDT54/74FCT251T  8-Input Multiplexer w/3-State .......... 6.5
IDT54/74FCT157T  Quad 2-Input Multiplexer .......... 6.6
IDT54/74FCT257T  Quad 2-Input Multiplexer w/3-State .......... 6.6
IDT54/74FCT161T  Synchronous Binary Counter w/Asynchronous Master Reset .......... 6.7
IDT54/74FCT163T  Synchronous Binary Counter w/Synchronous Reset .......... 6.7
IDT54/74FCT191T  Up/Down Binary Counter w/Preset and Ripple Clock .......... 6.8
IDT54/74FCT193T  Up/Down Binary Counter w/Separate Up/Down Clocks .......... 6.9
IDT54/74FCT240T  Inverting Octal Buffer/Line Driver .......... 6.10
IDT54/74FCT241T  Non-inverting Octal Buffer/Line Driver .......... 6.10
IDT54/74FCT244T  Non-inverting Octal Buffer/Line Driver .......... 6.10
IDT54/74FCT540T  Inverting Octal Buffer/Line Driver .......... 6.10
IDT54/74FCT541T  Non-inverting Octal Buffer/Line Driver .......... 6.10
IDT54/74FCT245T  Non-inverting Octal Transceiver .......... 6.11

1990-91 LOGIC DATA BOOK (CONTINUED)

PAGE

STANDARD LOGIC PRODUCTS (CONTINUED)
I DT54/74FCT640T
Inverting Octal Transceiver ........................................................................................
IDT54174FCT645T
Non-inverting Octal Transceiver .......................................................................;....... .
I DT54174FCT273T
Octal D Flip-Flop w/Comrnon Master Reset ............................................................ ..
I DT54/74FCT299T
8 Input Universal Shift Register w/Common Parallel 110 Pins .................................. .
IDT54/74FCT373T
Non-inverting Octal Transparent Latch w/3-State .................................................... .
IDT54174FCT533T
Inverting Octal Transparent Latch w/3-State ............................................................ .
IDT54/74FCT573T
Non-inverting Octal Transparent Latch w/3-State .....................................................
IDT54174FCT374T
Non-inverting Octal D Register ................................................................................ ..
IDT54/74FCT534T
Inverting Octal D Register .........................................................................................
IDT54/74FCT574T
Non-inverting,Octal D Register ............ ,' .....................................................................
I DT54/74 FCT377T
Octal D Flip-Flop w/Clock Enable ..............................................................................
IDT54/74FCT399T
Quad Dual-Port Register ...........................................................................................
IDT54/74FCT521T
8-Bit Identity Comparator ......................................................................................... ..
I DT54/74FCT543T
Non-inverting Octal Latched Transceiver ..................................................................
IDT54/74FCT646T
Non-inverting Octal Registered Transceiver ............................................................. .
I DT54/74FCT648T
Inverting Octal Registered Transceiver .................................................................... .
IDT54/74FCT651T
Inverting Octal Registered Transceiver .................................................................... .
I DT54/74FCT652T
Non-inverting Octal Registered Transceiver ............................................................ ..
I DT54/74FCT620T
Inverting Octal Bus Transceiver w/3-State .............................................................. ..
IDT54/74FCT623T
Non-inverting Octal Bus Transceiver w/3-State ........................................................ .
IDT54/74FCT621T
Non-inverting Octal Bus Transceiver (Open Drain) ................................. ;..... ~ ......... ..
IDT54/74FCT622T
Inverting Octal Bus Transceiver (Open Drain) ........................................................ ..
IDT54/74FCT821T '
10-Bit Non-inverting Register w/3-State ................................................................... .
I DT54/74 FCT823T
9-Bit Non-inverting Register w/Clear & 3-State ........................................................ .
IDT54/74FCT825T
8-Bit Non-inverting Register w/Clear & 3-State ........................................................ .
IDT54/74FCT827T
10-Bit Non-inverting Buffer ........................................................................................
IDT54/74FCT828T
10-Bit Inverting Buffer ................................................................................................
IDT54/74FCT841T
10-Bit Non-inverting Latch; ........................................................................................
IDT54/74FCT843T
9-Bit Non-inverting Latch ...........................................................................................
IDT54/74FCT845T
8-Bit Non-inverting Latch ......................................................................................... ..
IDT29FCT52
IDT29FCT53
I DT29FCT520
IDT49FCT661
IDT49FCT804
IDT49FCT805
IDT49FCT806
IDT49FCT818
IDT49C25
IDT39C8XX
IDT54/74FCT138 '
IDT54/74FCT139
IDT54/74FCT161
IDT54/74FCT163
IDT54/74FCT182
IDT54/74FCT191
IDT54/74FCT193
I DT54/74FCT240
IDT54/74FCT241
IDT54/74FCT244
I DT54/74 FCT540
IDT54/74FCT541
IDT54/74FCT245
IDT54/74FCT640

Non-inverting Octal Registered Transceiver ............................................................. .
Inverting Octal Registered Transceiver ..........
Multi-level Pipeline Register ..........
16-Bit Synchronous Binary Counter ..........
High-Speed Tri-Port Bus Multiplexer ..........
Buffer/Clock Driver w/Guaranteed Skew ..........
Buffer/Clock Driver w/Guaranteed Skew ..........
Octal Register with SPC™ ..........
Microcycle Length Controller ..........
IDT39C8XXX Family ..........
1-of-8 Decoder ..........
Dual 1-of-4 Decoder ..........
Synchronous Binary Counter w/Asynchronous Master Reset ..........
Synchronous Binary Counter w/Synchronous Reset ..........
Carry Lookahead Generator ..........
Up/Down Binary Counter w/Preset and Ripple Clocks ..........
Up/Down Binary Counter w/Separate Up/Down Clocks ..........
Inverting Octal Buffer/Line Driver ..........
Non-inverting Octal Buffer/Line Driver ..........
Non-inverting Octal Buffer/Line Driver ..........
Inverting Octal Buffer/Line Driver ..........
Non-inverting Octal Buffer/Line Driver ..........
Non-inverting Octal Transceiver ..........
Inverting Octal Transceiver ..........


6.11
6.11
6.12
6.13
6.14
6.14
6.14
6.15
6.15
6.15
6.16
6.17
6.18
6.19
6.20
6.20
6.20
6.20
6.21
6.21
6.22
6.22
6.23
6.23
6.23
6.24
6.24
6.25
6.25
6.25
6.26
6.26
6.27
6.28
6.29
6.30
6.30
6.31
6.32
6.33
6.34
6.35
6.36
6.36
6.37
6.38
6.39
6.40
6.40
6.40
6.40
6.40
6.41
6.41


1990-91 LOGIC DATA BOOK (CONTINUED)

PAGE

STANDARD LOGIC PRODUCTS (CONTINUED)
IDT54/74FCT645 Non-inverting Octal Transceiver .......... 6.41
IDT54/74FCT273 Octal D Flip-Flop w/Common Master Reset .......... 6.42
IDT54/74FCT299 8-Input Universal Shift Register w/Common Parallel I/O Pins .......... 6.43
IDT54/74FCT373 Non-inverting Octal Transparent Latch .......... 6.44
IDT54/74FCT533 Inverting Octal Transparent Latch .......... 6.44
IDT54/74FCT573 Non-inverting Octal Transparent Latch .......... 6.44
IDT54/74FCT374 Non-inverting Octal D Flip-Flop .......... 6.45
IDT54/74FCT534 Inverting Octal D Flip-Flop w/3-State .......... 6.45
IDT54/74FCT574 Non-inverting Octal D Register w/3-State .......... 6.45
IDT54/74FCT377 Octal D Flip-Flop w/Clock Enable .......... 6.46
IDT54/74FCT399 Quad Dual-Port Register .......... 6.47
IDT54/74FCT521 8-Bit Identity Comparator .......... 6.48
IDT54/74FCT543 Non-inverting Octal Latched Transceiver .......... 6.49
IDT54/74FCT646 Non-inverting Octal Registered Transceiver .......... 6.50
IDT54/74FCT821 10-Bit Non-inverting Register w/3-State .......... 6.51
IDT54/74FCT823 9-Bit Non-inverting Register w/Clear & 3-State .......... 6.51
IDT54/74FCT824 9-Bit Inverting Register w/Clear & 3-State .......... 6.51
IDT54/74FCT825 8-Bit Non-inverting Register .......... 6.51
IDT54/74FCT827 10-Bit Non-inverting Buffer .......... 6.52
IDT54/74FCT833 8-Bit Transceiver w/Parity .......... 6.53
IDT54/74FCT841 10-Bit Non-inverting Latch .......... 6.54
IDT54/74FCT843 9-Bit Non-inverting Latch .......... 6.54
IDT54/74FCT844 9-Bit Inverting Latch .......... 6.54
IDT54/74FCT845 8-Bit Non-inverting Latch .......... 6.54
IDT54/74FCT861 10-Bit Non-inverting Transceiver .......... 6.55
IDT54/74FCT863 9-Bit Non-inverting Transceiver .......... 6.55
IDT54/74FCT864 9-Bit Inverting Transceiver .......... 6.55

IDT54/74FBT240 Inverting Octal Buffer/Line Driver .......... 6.56
IDT54/74FBT241 Non-inverting Octal Buffer/Line Driver .......... 6.57
IDT54/74FBT244 Non-inverting Octal Buffer/Line Driver .......... 6.58
IDT54/74FBT245 Non-inverting Octal Transceiver .......... 6.59
IDT54/74FBT373 Octal Transparent Latch w/3-State .......... 6.60
IDT54/74FBT374 Non-inverting Octal D Register .......... 6.61
IDT54/74FBT540 Inverting Octal Buffer .......... 6.62
IDT54/74FBT541 Non-inverting Octal Buffer .......... 6.62
IDT54/74FBT821 10-Bit Non-inverting Register .......... 6.63
IDT54/74FBT823 9-Bit Inverting Register .......... 6.64
IDT54/74FBT827 Non-inverting 10-Bit Buffers/Driver .......... 6.65
IDT54/74FBT828 Inverting 10-Bit Buffers/Driver .......... 6.65
IDT54/74FBT841 10-Bit Non-inverting Latch .......... 6.66
IDT54/74FBT2240 Inverting Octal Buffer/Line Driver w/25Ω Series Resistor .......... 6.67
IDT54/74FBT2244 Inverting Octal Buffer/Line Driver w/25Ω Series Resistor .......... 6.68
IDT54/74FBT2373 Octal Transparent Latch w/3-State & 25Ω Series Resistor .......... 6.69
IDT54/74FBT2827 Non-inverting 10-Bit Buffers/Driver w/25Ω Series Resistor .......... 6.70
IDT54/74FBT2828 Inverting 10-Bit Buffers/Driver w/25Ω Series Resistor .......... 6.70
IDT54/74FBT2841 10-Bit Memory Latch w/25Ω Series Resistor .......... 6.71

APPLICATION AND TECHNICAL NOTES
Complex Logic Products Technical Notes
TN-02 Build a 20MIP Data Processing Unit .......... 7.1
TN-03 Using the IDT49C402A ALU .......... 7.2


Complex Logic Products Application Notes
AN-03 Trust Your Data with a High-Speed CMOS 16-, 32- or 64-Bit EDC .......... 7.3
AN-06 16-Bit CMOS Slices - New Building Blocks Maintain Microcode Compatibility Yet Increase Performance .......... 7.4
AN-17 FIR Filter Implementation Using FIFOs and MACs .......... 7.5
AN-24 Designing with the IDT49C460 and IDT39C60 Error Detection and Correction Units .......... 7.6
AN-32 Implementation of Digital Filters Using IDT7320, IDT7210, IDT7216 .......... 7.7
AN-35 Address Generator in Matrix Unit Operation Engine .......... 7.8
AN-37 Designing High-Performance Systems Using the IDT PaletteDAC™ .......... 7.9
AN-63 Using the IDT75C457's PaletteDAC™ in True Color and Monochrome Graphics Applications .......... 7.10
AN-64 Protecting Your Data with IDT's 49C465 32-Bit Flow-thruEDC™ Unit .......... 7.11
AN-65 Using IDT73200 or IDT73210 as Read and Write Buffers with R3000 .......... 7.12
Standard Logic Application Notes .......... 7.13
AN-47 Simultaneous Switching Noise .......... 7.14
AN-48 Using High-Speed Logic .......... 7.15
AN-49 Characteristics of PCB Traces .......... 7.16
AN-50 Series Termination .......... 7.17
AN-51 Power Dissipation in Clock Drivers .......... 7.18
AN-52 FCT Output Structures and Characteristics .......... 7.19
AN-53 Power-Down Operation .......... 7.20
AN-54 FCT-T Logic Family .......... 7.21
Standard Logic Technical Bulletins .......... 7.22


SPECIALIZED MEMORIES DATA BOOK
The following is a listing of the data sheets located in the 1990-91 Specialized Memories Data Book, available
under separate cover:

PAGE

ECL PRODUCTS
IDT10484 4K x 4 ECL 10K SRAM (Corner Power) .......... 5.1
IDT100484 4K x 4 ECL 100K SRAM (Corner Power) .......... 5.1
IDT101484 4K x 4 ECL 101K SRAM (Corner Power) .......... 5.1
IDT10A484 4K x 4 ECL 10K SRAM (Center Power) .......... 5.2
IDT100A484 4K x 4 ECL 100K SRAM (Center Power) .......... 5.2
IDT101A484 4K x 4 ECL 101K SRAM (Center Power) .......... 5.2
IDT10490 64K x 1 ECL 10K SRAM .......... 5.3
IDT100490 64K x 1 ECL 100K SRAM .......... 5.3
IDT101490 64K x 1 ECL 101K SRAM .......... 5.3
IDT10494 16K x 4 ECL 10K SRAM .......... 5.4
IDT100494 16K x 4 ECL 100K SRAM .......... 5.4
IDT101494 16K x 4 ECL 101K SRAM .......... 5.4
IDT10496LL 16K x 4 Self-Timed Latch Input, Latch Output .......... 5.5
IDT100496LL 16K x 4 Self-Timed Latch Input, Latch Output .......... 5.5
IDT101496LL 16K x 4 Self-Timed Latch Input, Latch Output .......... 5.5
IDT10496RL 16K x 4 Self-Timed Reg Input, Latch Output .......... 5.6
IDT100496RL 16K x 4 Self-Timed Reg Input, Latch Output .......... 5.6
IDT101496RL 16K x 4 Self-Timed Reg Input, Latch Output .......... 5.6
IDT10497 16K x 4 Synchronous Write, Latch Output .......... 5.7
IDT100497 16K x 4 Synchronous Write, Latch Output .......... 5.7
IDT101497 16K x 4 Synchronous Write, Latch Output .......... 5.7
IDT10498 16K x 4 Conditional Write, Latch Output .......... 5.8
IDT100498 16K x 4 Conditional Write, Latch Output .......... 5.8
IDT101498 16K x 4 Conditional Write, Latch Output .......... 5.8
IDT10504 64K x 4 ECL 10K SRAM .......... 5.9
IDT100504 64K x 4 ECL 100K SRAM .......... 5.9
IDT101504 64K x 4 ECL 100K SRAM .......... 5.9
IDT10506LL 64K x 4 Self-Timed Latch Input, Latch Output .......... 5.10
IDT100506LL 64K x 4 Self-Timed Latch Input, Latch Output .......... 5.10
IDT101506LL 64K x 4 Self-Timed Latch Input, Latch Output .......... 5.10
IDT10506RL 64K x 4 Self-Timed Reg Input, Latch Output .......... 5.11
IDT100506RL 64K x 4 Self-Timed Reg Input, Latch Output .......... 5.11
IDT101506RL 64K x 4 Self-Timed Reg Input, Latch Output .......... 5.11
IDT10507 64K x 4 Synchronous Write, Latch Output .......... 5.12
IDT100507 16K x 4 Synchronous Write, Latch Output .......... 5.12
IDT101507 16K x 4 Synchronous Write, Latch Output .......... 5.12
IDT10508 64K x 4 Conditional Write, Latch Output .......... 5.13
IDT100508 64K x 4 Conditional Write, Latch Output .......... 5.13
IDT101508 64K x 4 Conditional Write, Latch Output .......... 5.13
IDT10509 32K x 9 ECL 10K SRAM .......... 5.14
IDT100509 32K x 9 ECL 100K SRAM .......... 5.14
IDT101509 32K x 9 ECL 101K SRAM .......... 5.14

FIFO PRODUCTS
IDT7200 256 x 9-Bit Parallel FIFO .......... 6.1
IDT7201 512 x 9-Bit Parallel FIFO .......... 6.1
IDT7202 1024 x 9-Bit Parallel FIFO .......... 6.2
IDT7203 2K x 9-Bit Parallel FIFO .......... 6.3
IDT7204 4K x 9-Bit Parallel FIFO .......... 6.3
IDT7205 8K x 9-Bit Parallel FIFO .......... 6.4
IDT7206 16K x 9-Bit Parallel FIFO .......... 6.5


1990-91 SPECIALIZED MEMORIES DATA BOOK (CONTINUED)

PAGE

FIFO PRODUCTS (CONTINUED)
IDT72021 1K x 9-Bit Parallel FIFO w/Flags and Output Enable .......... 6.6
IDT72031 2K x 9-Bit Parallel FIFO w/Flags and Output Enable .......... 6.6
IDT72041 4K x 9-Bit Parallel FIFO w/Flags and Output Enable .......... 6.6
IDT72103 2K x 9-Bit Configurable Parallel-Serial FIFO .......... 6.7
IDT72104 4K x 9-Bit Configurable Parallel-Serial FIFO .......... 6.7
IDT72105 256 x 16-Bit Parallel-to-Serial FIFO .......... 6.8
IDT72115 512 x 16-Bit Parallel-to-Serial FIFO .......... 6.8
IDT72125 1024 x 16-Bit Parallel-to-Serial FIFO .......... 6.8
IDT72131 2048 x 9-Bit Parallel-to-Serial FIFO .......... 6.9
IDT72141 4096 x 9-Bit Parallel-to-Serial FIFO .......... 6.9
IDT72132 2048 x 9-Bit Serial-to-Parallel FIFO .......... 6.10
IDT72142 4096 x 9-Bit Serial-to-Parallel FIFO .......... 6.10
IDT72200 256 x 8-Bit Parallel SyncFIFO™ (Clocked FIFO) .......... 6.11
IDT72210 512 x 8-Bit Parallel SyncFIFO™ (Clocked FIFO) .......... 6.11
IDT72420 64 x 8-Bit Parallel SyncFIFO™ (Clocked FIFO) .......... 6.11
IDT72201 256 x 9-Bit Parallel SyncFIFO™ (Clocked FIFO) .......... 6.12
IDT72211 512 x 9-Bit Parallel SyncFIFO™ (Clocked FIFO) .......... 6.12
IDT72421 64 x 9-Bit Parallel SyncFIFO™ (Clocked FIFO) .......... 6.12
IDT72215A 512 x 18-Bit Parallel SyncFIFO™ (Clocked FIFO) .......... 6.13
IDT72225A 1024 x 18-Bit Parallel SyncFIFO™ (Clocked FIFO) .......... 6.13
IDT72220 1K x 8-Bit Parallel SyncFIFO™ (Clocked FIFO) .......... 6.14
IDT72230 2K x 8-Bit Parallel SyncFIFO™ (Clocked FIFO) .......... 6.14
IDT72240 4K x 8-Bit Parallel SyncFIFO™ (Clocked FIFO) .......... 6.14
IDT72221 1K x 9-Bit Parallel SyncFIFO™ (Clocked FIFO) .......... 6.15
IDT72231 2K x 9-Bit Parallel SyncFIFO™ (Clocked FIFO) .......... 6.15
IDT72241 4K x 9-Bit Parallel SyncFIFO™ (Clocked FIFO) .......... 6.15
IDT72235 2K x 18-Bit Parallel SyncFIFO™ (Clocked FIFO) .......... 6.16
IDT72245 4K x 18-Bit Parallel SyncFIFO™ (Clocked FIFO) .......... 6.16
IDT72401 64 x 4 FIFO .......... 6.17
IDT72402 64 x 5 FIFO .......... 6.17
IDT72403 64 x 4 FIFO (w/Output Enable) .......... 6.17
IDT72404 64 x 5 FIFO (w/Output Enable) .......... 6.17
IDT72413 64 x 5 FIFO (w/Flags) .......... 6.18
IDT7251 512 x 18-Bit - 1K x 9-Bit BiFIFO .......... 6.19
IDT7252 1K x 18-Bit - 2K x 9-Bit BiFIFO .......... 6.19
IDT72510 512 x 18-Bit - 1K x 9-Bit BiFIFO .......... 6.19
IDT72520 1K x 18-Bit - 2K x 9-Bit BiFIFO .......... 6.19
IDT72511 512 x 18-Bit BiFIFO .......... 6.20
IDT72521 1K x 18-Bit BiFIFO .......... 6.20
IDT72605 256 x 18-Bit Synchronous BiFIFO (SyncBiFIFO™) .......... 6.21
IDT72615 512 x 18-Bit Synchronous BiFIFO (SyncBiFIFO™) .......... 6.21

SPECIALTY MEMORY PRODUCTS
IDT7130 8K (1K x 8) Dual-Port RAM (MASTER) .......... 7.1
IDT7140 8K (1K x 8) Dual-Port RAM (SLAVE) .......... 7.1
IDT7030 8K (1K x 8) Dual-Port RAM (MASTER) .......... 7.2
IDT7040 8K (1K x 8) Dual-Port RAM (SLAVE) .......... 7.2
IDT7010 9K (1K x 9) Dual-Port RAM (MASTER) .......... 7.3
IDT70104 9K (1K x 9) Dual-Port RAM (SLAVE) .......... 7.3
IDT70101 9K (1K x 9) Dual-Port RAM (MASTER w/Interrupts) .......... 7.4
IDT70105 9K (1K x 9) Dual-Port RAM (SLAVE w/Interrupts) .......... 7.4
IDT7132 16K (2K x 8) Dual-Port RAM (MASTER) .......... 7.5
IDT7142 16K (2K x 8) Dual-Port RAM (SLAVE) .......... 7.5
IDT7032 16K (2K x 8) Dual-Port RAM (MASTER) .......... 7.6


SPECIALTY MEMORY PRODUCTS (CONTINUED)
IDT7042 16K (2K x 8) Dual-Port RAM (SLAVE) .......... 7.6
IDT71321 16K (2K x 8) Dual-Port RAM (MASTER w/Interrupts) .......... 7.7
IDT71421 16K (2K x 8) Dual-Port RAM (SLAVE w/Interrupts) .......... 7.7
IDT7012 18K (2K x 9) Dual-Port RAM .......... 7.8
IDT70121 18K (2K x 9) Dual-Port RAM (MASTER w/Interrupts) .......... 7.9
IDT70125 18K (2K x 9) Dual-Port RAM (SLAVE w/Interrupts) .......... 7.9
IDT7133 32K (2K x 16) Dual-Port RAM (MASTER) .......... 7.10
IDT7143 32K (2K x 16) Dual-Port RAM (SLAVE) .......... 7.10
IDT7133SA/LA 32K (2K x 16) Dual-Port RAM (MASTER) .......... 7.11
IDT7143SA/LA 32K (2K x 16) Dual-Port RAM (SLAVE) .......... 7.11
IDT7134 32K (4K x 8) Dual-Port RAM .......... 7.12
IDT7134SA/LA 32K (4K x 8) Dual-Port RAM .......... 7.13
IDT71342 32K (4K x 8) Dual-Port RAM (w/Semaphores) .......... 7.14
IDT71342SA/LA 32K (4K x 8) Dual-Port RAM (w/Semaphores) .......... 7.15
IDT7014 32K (4K x 9) Dual-Port RAM .......... 7.16
IDT7024 64K (4K x 16) Dual-Port RAM .......... 7.17
IDT7005 64K (8K x 8) Dual-Port RAM .......... 7.18
IDT7025 128K (8K x 16) Dual-Port RAM .......... 7.19
IDT7006 128K (16K x 8) Dual-Port RAM .......... 7.20
IDT7050 8K (1K x 8) FourPort™ RAM .......... 7.21
IDT7052 16K (2K x 8) FourPort™ RAM .......... 7.22
IDT71502 64K (4K x 16) Registered RAM (w/SPC™) .......... 7.23

SUBSYSTEMS PRODUCTS
MULTI-PORT MODULES
IDT7M134 8K x 8 Master Dual-Port SRAM Module .......... 8.1
IDT7M144 8K x 8 Slave Dual-Port SRAM Module .......... 8.2
IDT7M135 16K x 8 Master Dual-Port SRAM Module .......... 8.1
IDT7M145 16K x 8 Slave Dual-Port SRAM Module .......... 8.2
IDT7M137 32K x 8 Master Dual-Port SRAM Module .......... 8.3
IDT7M1003 64K x 8 Dual-Port SRAM Module .......... 8.4
IDT7M1001 128K x 8 Dual-Port SRAM Module .......... 8.4
IDT7M1004 8K x 9 Dual-Port SRAM Module .......... 8.5
IDT7M1005 16K x 9 Dual-Port SRAM Module .......... 8.5
IDT7MB6056 32K x 16 Dual-Port (Shared Memory) SRAM Module .......... 8.6
IDT7MB1008 32K x 16 Dual-Port SRAM Module .......... 8.7
IDT7MB1006 64K x 16 Dual-Port SRAM Module .......... 8.7
IDT7MB6046 64K x 16 Dual-Port (Shared Memory) SRAM Module .......... 8.6
IDT7MB6036 128K x 16 Dual-Port (Shared Memory) SRAM Module .......... 8.6
IDT7MB6156 32K x 18 Dual-Port (Shared Memory) SRAM Module .......... 8.8
IDT7MB6146 64K x 18 Dual-Port (Shared Memory) SRAM Module .......... 8.8
IDT7MB6136 128K x 18 Dual-Port (Shared Memory) SRAM Module .......... 8.8
IDT7M1002 16K x 32 Dual-Port SRAM Module .......... 8.9
IDT7MB1041 8K x 8 FourPort™ SRAM Module .......... 8.10
IDT7MB1042 4K x 8 FourPort™ SRAM Module .......... 8.10
IDT7MB1043 4K x 16 FourPort™ SRAM Module .......... 8.11
IDT7MB1044 2K x 16 FourPort™ SRAM Module .......... 8.11
FIFO MODULES
IDT7M205 8K x 9-Bit CMOS FIFO Module .......... 8.12
IDT7MP2005 8K x 9-Bit FIFO Module .......... 8.13
IDT7M206 16K x 9-Bit CMOS FIFO Module .......... 8.12
IDT7MP2011 16K x 9-Bit FIFO Module .......... 8.13
IDT7M207 32K x 9-Bit CMOS FIFO Module .......... 8.14


FIFO MODULES (CONTINUED)

IDT7MP2010
IDT7MP2009

16K x 18-Bit FIFO Module .........................................................................................
32K x 18-Bit FIFO Module .........................................................................................

8.15
8.15

1M x 1 CMOS Static RAM Module. ...... ........ ................ ..... .......... ..... ............ ..... .... ....
256K x 4 CMOS Static RAM Module .........................................................................
64K x 8 CMOS Static RAM Module ............... ;...........................................................
128K x 8 CMOS Static RAM Module. ........... ............... ..... .... ...................... ... ....... .....
128K x 8 CMOS Static RAM Module ..... ....... ... ..... ....... ..... .... ........................... ..... .....
128K x 8 CMOS Static RAM Module ............ ............... ..... .... ........................... ..... .....
256K x 8 CMOS Static RAM Module ....... ............... ....... ..... ............ ... ........... ....... ......
512K x 8 CMOS Static RAM Module .........................................................................
512K x 8 CMOS Static RAM Module ..................................................... ....................
512K x 8 CMOS Static RAM Module ....................................................... ..................
512K x 8 CMOS Static RAM Module .........................................................................
64K x 9 CMOS Static RAM Module ...........................................................................
256K x 9 CMOS Static RAM Module ........ .... ........ .... ... ..... .... ............................. ... .....
16K x 16 CMOS Static RAM Module .........................................................................
2(16K x 16) CMOS Static RAM Module ....................................................................
32K x 16 CMOS Static RAM Module .........................................................................
32K x 16 CMOS Static RAM Module ............................................ .............................
32K x 16 CMOS Static RAM Module .........................................................................
64K x 16 CMOS Static RAM Module ............ ........ .... ..... ... .... .......... .............. ........ .....
64K x 16 CMOS Static RAM Module ............ ............ ..... ....... .......... .............. ....... ......
64K x 16 CMOS Static RAM Module ............ ... .............. ....... ......................... ......... ...
64K x 16 CMOS Static RAM Module .........................................................................
256K x 16 CMOS Static RAM Module ............................................ ......... ..................
512K x 16 CMOS Static RAM Module .......................................................................
16K x 32 CMOS Static RAM Module wlSeparate Data 1/0 .......................................
16K x 32 CMOS Static RAM Module .........................................................................
32K x 32 CMOS Static RAM Module ............ ............ ..... ....... ......................... .... ... .....
64K x 32 CMOS Static RAM Module ..... ....... ............ ..... ..... .............. ............ .............
64K x 32 CMOS Static RAM Module ....... ..... ....... ..... ..... ....... ............ ............. ...... ... ...
128K x 32 CMOS Static RAM Module .......................................................................
256K x 32 CMOS Static RAM Module .......................................................................

8.16
8.17
8.18
8.19
8.20
8.21
8.22
8.23
8.23
8.24
8.25
8.18
8.26
8.27
8.28
8.29
8.30
8.31
8.32
8.29
8.30
8.31
8.33
8.34
8.35
8.36
8.37
8.38
8.39
8.37
8.40

IDT7MB6064
IDT7MB6044
IDT7MB6043
IDT7MB6051

(2
(2
(2
(2

8.41
8.42
8.43

IDT7MB6039
IDT7MB6049

(2
(2

IDT7MB6040
IDT7MB6061

(2
(2

SRAM MODULES

IDT7MC4001
IDT7M4042
IDT7M812
IDT8M824S
IDT8MP824S
IDT8MP824L
IDT7MP4034
IDT7M4048
IDT7MB4048
IDT7MP4008S
IDT7MP4058L
IDT7M912
IDT7MB4040
IDT7MC4005
IDT7MB4009
IDT8M612
IDT8MP612S
IDT8MP612L
IDT7M624
IDT8M624
IDT8MP624S
IDT8MP624L
IDT7M4016
IDT7MP4047
IDT7MC4032
IDT7MP4031
IDT7M4003
IDT7M4017
IDT7MP4036
IDT7M4013
IDT7MP4045
CACHE MODULES

x 4K x 64) Data/Instruction Cache Module for IDT79R3000 CPU .........................
x 4K x 64) Data/Instruction Cache Module for IDT79R3000 CPU .........................
x 8K x 64) Data/Instruction Cache Module for IDT79R3000 CPU .........................
x 8K x 64) Data/Instruction Cache Module for IDT79R3000 CPU
(Multiprocessor) ........................................................................ ........................
x 16K x 60) Data/Instruction Cache Module for IDT79R3000 CPU .......................
x 16K x 60) Data/Instruction Cache Module for IDT79R3000 CPU
(Multiprocessor) ........ ....... ..... ....................... ............ ........ ....... ......... ..... ....... .....
x 16K x 64) Data/Instruction Cache Module for General Purpose CPUs ..............
x 16K x 60) Data/Instruction w/Resettable Instruction Tag ........ ..... ...... ... ......... ....

8.44
8.45
8.46
8.47
8.48

WRITABLE CONTROL STORE MODULES

IDT7M6032
IDT7MB6042

16K x 32 Writable Control Store Static RAM Module ................................................
8K x 112 Writable Control Store Static RAM Module ................................................

8.49
8.50

Modules with Various Combinations of SRAMs, EPROMs and EEPROMs ..............

8.51

OTHER MODULES

Flexi-Pak Family


1990-91 SPECIALIZED MEMORIES DATA BOOK (CONTINUED)

PAGE

CUSTOM MODULES
Subsystem Custom Module Capabilities ..............................................................................................................

8.52


APPLICATION AND TECHNICAL NOTES
FIFO Products Application Notes
AN-01
Understanding the IDT7201/7202 FIFO ....................................................................
AN-15
Using the IDT72103/104 Serial-Parallel FIFO ...........................................................
AN-22
Performance Advantages with IDT's Flagged FIFOs ................................................
AN-34
General Purpose (16-Bit to 8-Bit) BiFIFO Interface ...................................................
AN-36
The BiFIFO Parity Generation and Checking ............................................................
AN-39
The Programmable Flags of BiFIFOs ........................................................................
AN-56
The BiFIFO Expansion Configuration ...................................................................... ..
AN-57
The BiFIFO Bypass ...................................................................................................
AN-60
Designing with the IDT SyncFIFO™ - The Architecture of the Future ....................
AN-69
Depth Expansion of IDT's Synchronous FIFOs Using the Ring Counter
Approach ...........................................................................................................
AN-71
Simplify SCSI Host Adapter Design with Bidirectional FIFO Memories .................. ..
AN-73
Understanding the Output Control OE of the Flagged FIFOs:
IDT72021/31/41 ................................................................................................
FIFO Products Technical Notes
TN-06
Designing with FIFOs ................................................................................................
TN-08
Operating FIFOs on Full and Empty Boundary Conditions ...................................... .
TN-09
Cascading FIFOs or FIFO Modules ...........................................................................
Specialty Memory Products Application Notes
AN-02
Dual-Port RAMs Simplify Communication in Computer Systems ............................. .
AN-09
Dual-Port RAMs Yield Bit-Slice Designs without Microcode .....................................
Dual-Port RAMs with Semaphore Arbitration .............................................................
AN-14
AN-42
Using the IDT7052 FourPort™ SRAM ...................................................................... .
IDT FourPort™ RAM Facilitates Multiprocessor Designs ..........................................
AN-43
AN-45
Introduction to IDT's FourPort™ RAM .......................................................................
AN-59
Using IDT7024 and IDT7025 Dual-Port Static RAMs to Match System
Bus Widths .........................................................................................................
AN-67
Using IDT71502 RAMs in a Real-Time Debugging Tool for an R3000
Microprocessor-based System ........................................................................
AN-68
Dual-Port RAM Simplifies PC to TMS320 Interface ................................................ ..
AN-70
Dual-Port Interrupt Expansion ...................................................................................

9.1
9.2
9.3
9.4
9.5
9.6
9.7
9.8
9.9
9.10
9.11
9.12
9.13
9.14
9.15

9.16
9.17
9.18
9.19
9.20
9.21
9.22
9.23
9.24
9.25

Subsystems Products Application Notes
AN-44
Design Guidelines for Custom Module Packages .................................................... . 9.26
AN-74
Understanding Dual-Port Shared Memory Modules .................................................. 9.27
AN-75
Using the IDT7M4017 in an 8-Bit and 16-Bit Wide Organization ............................. . 9.28
AN-76
Using the IDT7MB6049 Cache Module with the IDT79R3000 RISC Processor
in Single or Multiprocessor Systems ............................................................... . 9.29


STATIC RAM DATA BOOK
The following is a listing of the data sheets located in the 1991 Static RAM Data Book, available under separate
cover:

STATIC RAM PRODUCTS

PAGE

CEMOS STATIC RAMS WITH POWER DOWN
IDT6116
2K x 8 with Power-Down ...........................................................................................
IDT61298
64K x 4 with Output Enable and Power-Down ..........................................................
IDT6167
16K x 1 with Power-Down .........................................................................................
IDT6168
4K x 4 with Power-Down ...........................................................................................
IDT6178
4K x 4 Cache-Tag with Power-Down .........................................................................
IDT61970
4K x 4 with Output Enable and Power-Down............................................................
IDT6198
16K x 4 with Output Enable and Power-Down ..........................................................
IDT71024
128K x 8 with Power-Down ..... ....... ... ..... ....... ....... .......... ..... ...... ........................ ........
IDT71028
256K x 4 with Power-Down .......................................................................................
IDT71256
32K x 8 with Power-Down .........................................................................................
64K x 4 with Power-Down .........................................................................................
IDT71258
IDT71259
32K x 9 with Power-Down ............. ..................... ........................ ..... ..... ......... ........ ....
IDT71281
64K x 4 with Separate I/O and Power-Down .............................................................
IDT71282
64K x 4 with Separate I/O and Power-Down .............................................................
IDT71569
8K x 9 with Address Latch and Power-Down ............................................................
IDT71586
4K x 16 with Address Latch and Power-Down ..........................................................
IDT71589
32K x 9 Burst Mode with Power-Down ......................................................................
IDT7164
8K x 8 with Power-Down ...........................................................................................
IDT7165
8K x 8 Resettable Power-Down.................................................................................
IDT71681
4K x 4 Separate I/O and Power-Down ......................................................................
IDT71682
4K x 4 Separate I/O and Power-Down ......................................................................
IDT7169
8K x 9 with Power-Down ...........................................................................................
IDT7174
8K x 8 Cache-Tag with Power-Down.........................................................................
IDT7187
64K x 1 with Power-Down .........................................................................................
IDT7188
16K x 4 with Power-Down .........................................................................................
IDT7198
16K x 4 with Output Enable, 2 CS and Power-Down ................................................
IDT71981
16K x 4 with Separate I/O and Power-Down .............................................................
IDT71982
16K x 4 with Separate I/O and Power-Down .............................................................

5.1
5.2
5.3
5.4
5.5
5.6
5.7
5.8
5.9
5.10
5.11
5.12
5.13
5.13
5.14
5.15
5.16
5.17
5.18
5.19
5.19
5.20
5.21
5.22
5.23
5.24
5.25
5.25

HIGH-SPEED BiCEMOS™ STATIC RAMS
IDT61B298
64K x 4 BiCEMOS with Output Enable ......................................................................
IDT61B98
16K x 4 BiCEMOS with Output Enable ......................................................................
IDT71B024
128K x 8 BiCEMOS ...................................................................................................
IDT71B028
256K x 4 BiCEMOS ................................................................... ............... .................
IDT71B221
4K x 18 x 2 BiCEMOS with Self-Timed Latch............................................................
IDT71B222
4K x 18 x 2 BiCEMOS with Dual Address Latches....................................................
IDT71B229
16K x 9 x 2 BiCEMOS Cache RAM ...........................................................................
IDT71B256
32K x 8 BiCEMOS .....................................................................................................
IDT71B258
64K x 4 BiCEMOS ..... ....... ..... ... ..................... ............... .............. ..... ..... ........... ..........
IDT71B556
32K x 8 BiCEMOS with Address Latch ... ..... ............ .... .................... ..... .... ......... ..... ...
IDT71B569
8K x 9 BiCEMOS with Address Latch... .......... ...... ...... ........... ....... ... ..... .......... ...... .....
IDT71B64
8K x 8 BiCEMOS .......................................................................................................
IDT71B65
8K x 8 BiCEMOS Resettable .....................................................................................
IDT71B69
8K x 9 BiCEMOS .......................................................................................................
IDT71B74
8K x 8 BiCEMOS Cache-Tag ....................................................................................
IDT71B79
8K x 9 BiCEMOS Cache-Tag ....................................................................................
IDT71B88
16K x 4 BiCEMOS .....................................................................................................
IDT71B98
16K x 4 BiCEMOS with Output Enable, 2 CS............................................................

6.1
6.2
6.3
6.4
6.5
6.6
6.7
6.8
6.9
6.10
6.11
6.12
6.13
6.14
6.15
6.16
6.17
6.18


1991 STATIC RAM DATA BOOK (CONTINUED)

PAGE


APPLICATION AND TECHNICAL NOTES
Static RAM Products Application Notes
AN-05
Separate I/O RAMs Increase Speed and Reduce Part Count...................................
AN-07
Cache Tag RAM Chips Simplify Cache Memory Design...........................................
AN-10
Low-Power and Battery Back-up Operation of CMOS Static RAMs..........................
AN-20
Static RAM Timing ........................................................................ ........ .....................
AN-27
Cache Design Considerations Using the IDT79R3000 .............................................
AN-30
Complete Cache Controller and Cache Memory Design Using
IDT Standard Parts for the 80386 Processor ....................................................
AN-38
IDT Static RAMs Simplify Cache Design with the 80386 and 82385 ........................
AN-46
A 33MHz MC68030 Zero-Wait Cache Memory .................................... .....................
AN-79
25MHz 68020 Cache in Only 19 Chips .....................................................................
AN-80
Intel 386/82385 Cached Microprocessor System ......................................................
AN-81
The IDT71586 Burst Cache RAM for the i486...........................................................

7.6
7.7
7.8
7.9
7.10
7.11

Static RAM Products Technical Notes
TN-04
Using High-Speed 8K x 8 RAMs ...............................................................................
TN-07
Fast RAMs Give Lowest Power.................................................................................
TN-11
Cache Timing for the 68020 ......................................................................................
TN-13
Cache Timing for the 80386 ......................................................................................
TN-16
Programmable Length Shift Registers Using RAMs and Counters ...........................

7.12
7.13
7.14
7.15
7.16


7.1
7.2
7.3
7.4
7.5


NUMERICAL TABLE OF CONTENTS
PART NO.

100484
100490
100494
100496LL
100496RL
100497
100498
100504
100506LL
100506RL
100507
100508
100509
100A484
101484
101490
101494
101496LL
101496RL
101497
101498
101504
101506LL
101506RL
101507
101508
101509
101A484
10484
10490
10494
10496LL
10496RL
10497
10498
10504
10506LL
10506RL
10507
10508
10509
10A484
29FCT52
29FCT520
29FCT520T
29FCT521T
29FCT52T
29FCT53
29FCT53T
39C01
39C10
39C60
39C8XX
49C25

PAGE
4K x 4 ECL 100K SRAM (Corner Power) ......................................................
64K x 1 ECL 100K SRAM ..............................................................................
16K x 4 ECL 100K SRAM ..............................................................................
16K x 4 Self-Timed Latch Input, Latch Output ...............................................
16K x 4 Self-Timed Reg Input, Latch Output .................................................
16K x 4 Synchronous Write, Latch Output ....................................................
16K x 4 Conditional Write, Latch Output .......................................................
64K x 4 ECL 100K SRAM ..............................................................................
64K x 4 Self-Timed Latch Input, Latch Output ...............................................
64K x 4 Self-Timed Reg Input, Latch Output .................................................
16K x 4 Synchronous Write, Latch Output ....................................................
64K x 4 Conditional Write, Latch Output .......................................................
32K x 9 ECL 100K SRAM ..............................................................................
4K x 4 ECL 100K SRAM (Center Power) ......................................................
4K x 4 ECL 101K SRAM (Corner Power) ......................................................
64K x 1 ECL 101K SRAM ..............................................................................
16K x 4 ECL 101K SRAM ..............................................................................
16K x 4 Self-Timed Latch Input, Latch Output ...............................................
16K x 4 Self-Timed Reg Input, Latch Output .................................................
16K x 4 Synchronous Write, Latch Output ....................................................
16K x 4 Conditional Write, Latch Output .......................................................
64K x 4 ECL 100K SRAM ..............................................................................
64K x 4 Self-Timed Latch Input, Latch Output ...............................................
64K x 4 Self-Timed Reg Input, Latch Output .................................................
16K x 4 Synchronous Write, Latch Output ....................................................
64K x 4 Conditional Write, Latch Output .......................................................
32K x 9 ECL 101K SRAM ..............................................................................
4K x 4 ECL 101K SRAM (Center Power) ......................................................
4K x 4 ECL 10K SRAM (Corner Power) ........................................................
64K x 1 ECL 10K SRAM ................................................................................
16K x 4 ECL 10K SRAM ................................................................................
16K x 4 Self-Timed Latch Input, Latch Output ...............................................
16K x 4 Self-Timed Reg Input, Latch Output .................................................
16K x 4 Synchronous Write, Latch Output ....................................................
16K x 4 Conditional Write, Latch Output .......................................................
64K x 4 ECL 10K SRAM ................................................................................
64K x 4 Self-Timed Latch Input, Latch Output ...............................................
64K x 4 Self-Timed Reg Input, Latch Output .................................................
64K x 4 Synchronous Write, Latch Output ....................................................
64K x 4 Conditional Write, Latch Output .......................................................
32K x 9 ECL 10K SRAM ................................................................................
4K x 4 ECL 10K SRAM (Center Power) ........................................................
Non-inverting Octal Registered Transceiver ..................................................
Multi-level Pipeline Register ..........................................................................
Multi-level Pipeline Register ..........................................................................
Multi-level Pipeline Register ..........................................................................
Non-inverting Octal Registered Transceiver ..................................................
Inverting Octal Registered Transceiver .........................................................
Inverting Octal Registered Transceiver .........................................................
4-Bit Microprocessor Slice .............................................................................
12-Bit Sequencer ...........................................................................................
16-Bit Cascadable EDC .................................................................................
IDT39C8XXX Family .....................................................................................
Microcycle Length Controller .........................................................................

1.4

SMP
SMP
SMP
SMP
SMP
SMP
SMP
SMP
SMP
SMP
SMP
SMP
SMP
SMP
SMP
SMP
SMP
SMP
SMP
SMP
SMP
SMP
SMP
SMP
SMP
SMP
SMP
SMP
SMP
SMP
SMP
SMP
SMP
SMP
SMP
SMP
SMP
SMP
SMP
SMP
SMP
SMP
LOGIC
LOGIC
LOGIC
LOGIC
LOGIC
LOGIC
LOGIC
LOGIC
LOGIC
LOGIC
LOGIC
LOGIC

NUMERICAL TABLE OF CONTENTS (CONTINUED)
PAGE

PART NO.

49C402
49C410
49C460
49C465
49C466
49FCT661

49FCT804
49FCT805
49FCT806
49FCT818
54/74FBT2240
54/74FBT2244
54/74FBT2373
54/74FBT240
54/74FBT241
54/74FBT244
54/74FBT245
54/74FBT2827
54/74FBT2828
54/74FBT2841
54/74FBT373
54/74FBT374
54/74FBT540
54/74FBT541
54/74FBT821
54/74FBT823
54/74FBT827
54/74FBT828
54/74FBT841
54/74FCT138

54/74FCT138T
54/74FCT139
54/74FCT139T

54/74FCT151T
54/74FCT157T

54/74FCT161
54/74FCT161T
54/74FCT163

54/74FCT163T
54/74FCT182
54/74FCT191
54/74FCT191T
54/74FCT193
54/74FCT193T
54/74FCT240

54/74FCT240T
54/74FCT241

54/74FCT241T
54/74FCT244

54/74FCT244T
54/74FCT245
54/74FCT245T
54/74FCT251T
54/74FCT257T

16-Bit Microprocessor Slice ...........................................................................
16-Bit Sequencer ...........................................................................................
32-Bit Cascadable EDC .................................................................................
32-Bit CMOS Flow-ThruEDC Unit .................................................................
64-Bit CMOS Flow-ThruEDC Unit .................................................................
16-Bit Synchronous Binary Counter ..............................................................
High-Speed Tri-Port Bus Multiplexer .............................................................
Buffer/Clock Driver w/Guaranteed Skew ..................................................... ..
Buffer/Clock Driver w/Guaranteed Skew .......................................................
Octal Register with SPC™ .............................................................................
Inverting Octal Buffer/Line Driver w/25Ω Series Resistor ..............................
Inverting Octal Buffer/Line Driver w/25Ω Series Resistor ..............................
Octal Transparent Latch w/3-State & 25Ω Series Resistor ............................
Inverting Octal Buffer/Line Driver ..................................................................
Non-inverting Octal Buffer/Line Driver ...........................................................
Non-inverting Octal Buffer/Line Driver ...........................................................
Non-inverting Octal Transceiver ....................................................................
Non-inverting 10-Bit Buffers/Driver w/25Ω Series Resistor ............................
Inverting 10-Bit Buffers/Driver w/25Ω Series Resistor ....................................
10-Bit Memory Latch w/25Ω Series Resistor .................................................
Octal Transparent Latch w/3-State ................................................................
Non-inverting Octal D Register ......................................................................
Inverting Octal Buffer .....................................................................................
Non-inverting Octal Buffer ....... ......................................................................
10-Bit Non-inverting Register ........................................................................
9-Bit Inverting Register ..................................................................................
Non-inverting 10-Bit Buffers/Driver ................................................................
Inverting 10-Bit Buffers/Driver .......................................................................
10-Bit Non-inverting Latch .............................................................................
1-of-8 Decoder ...............................................................................................
1-of-8 Decoder ...............................................................................................
Dual 1-of-4 Decoder ......................................................................................
Dual 1-of-4 Decoder ......................................................................................
8-Input Multiplexer .........................................................................................
Quad 2-Input Multiplexer ...............................................................................
Synchronous Binary Counter w/Asynchronous Master Reset ...................... .
Synchronous Binary Counter w/Asynchronous Master Reset ...................... .
Synchronous Binary Counter w/Synchronous Reset .................................... .
Synchronous Binary Counter w/Synchronous Reset .................................... .
Carry Lookahead Generator ..........................................................................
Up/Down Binary Counter w/Preset and Ripple Clocks ................................ ..
Up/Down Binary Counter w/Preset and Ripple Clock ....................................
Up/Down Binary Counter w/Separate Up/Down Clocks ............................... .
Up/Down Binary Counter w/Separate Up/Down Clocks ................................
Inverting Octal Buffer/Line Driver ..................................................................
Inverting Octal Buffer/Line Driver ..................................................................
Non-inverting Octal Buffer/Line Driver ...........................................................
Non-inverting Octal Buffer/Line Driver ...........................................................
Non-inverting Octal Buffer/Line Driver ...........................................................
Non-inverting Octal Buffer/Line Driver ...........................................................
Non-inverting Octal Transceiver ....................................................................
Non-inverting Octal Transceiver ....................................................................
8-Input Multiplexer w/3-State .........................................................................
Quad 2-Input Multiplexer w/3-State ...............................................................


LOGIC
LOGIC
LOGIC
LOGIC
LOGIC
LOGIC
LOGIC
LOGIC
LOGIC
LOGIC
LOGIC
LOGIC
LOGIC
LOGIC
LOGIC
LOGIC
LOGIC
LOGIC
LOGIC
LOGIC
LOGIC
LOGIC
LOGIC
LOGIC
LOGIC
LOGIC
LOGIC
LOGIC
LOGIC
LOGIC
LOGIC
LOGIC
LOGIC
LOGIC
LOGIC
LOGIC
LOGIC
LOGIC
LOGIC
LOGIC
LOGIC
LOGIC
LOGIC
LOGIC
LOGIC
LOGIC
LOGIC
LOGIC
LOGIC
LOGIC
LOGIC
LOGIC
LOGIC
LOGIC


NUMERICAL TABLE OF CONTENTS (CONTINUED)
PAGE

PART NO.

54/74FCT273
54/74FCT273T
54/74FCT299
54/74FCT299T
54/74FCT373
54/74FCT373T
54/74FCT374
54/74FCT374T
54/74FCT377
54/74FCT377T
54/74FCT399
54/74FCT399T
54/74FCT521
54/74FCT521T
54/74FCT533
54/74FCT533T
54/74FCT534
54/74FCT534T
54/74FCT540
54/74FCT540T
54/74FCT541
54/74FCT541T
54/74FCT543
54/74FCT543T
54/74FCT573
54/74FCT573T
54/74FCT574
54/74FCT574T
54/74FCT620T
54/74FCT621T
54/74FCT622T
54/74FCT623T
54/74FCT640
54/74FCT640T
54/74FCT645
54/74FCT645T
54/74FCT646
54/74FCT646T
54/74FCT648T
54/74FCT651T
54/74FCT652T
54/74FCT821
54/74FCT821T
54/74FCT823
54/74FCT823T
54/74FCT824
54/74FCT825
54/74FCT825T
54/74FCT827
54/74FCT827T
54/74FCT828T
54/74FCT833
54/74FCT841
54/74FCT841T

Octal D Flip-Flop w/Common Master Reset ..................................................
Octal D Flip-Flop w/Common Master Reset ..................................................
8-Input Universal Shift Register w/Common Parallel I/O Pins .......................
8-Input Universal Shift Register w/Common Parallel I/O Pins .......................
Non-inverting Octal Transparent Latch ..........................................................
Non-inverting Octal Transparent Latch w/3-State .........................................
Non-inverting Octal D Flip-Flop .................................................................... .
Non-inverting Octal D Register ....... ...............................................................
Octal D Flip-Flop w/Clock Enable ..................................................................
Octal D Flip-Flop w/Clock Enable ..... ............................................................ .
Quad Dual-Port Register ............................................................................. ..
Quad Dual-Port Register ...............................................................................
8-Bit Identity Comparator ............................................................................. ..
8-Bit Identity Comparator ....... ........................................................................
Inverting Octal Transparent Latch .................................................................
Inverting Octal Transparent Latch w/3-State ................................................ .
Inverting Octal D Flip-Flop w/3-State ............................................................
Inverting Octal D Register ..............................................................................
Inverting Octal Buffer/Line Driver ..................................................................
Inverting Octal Buffer/Line Driver ..................................................................
Non-inverting Octal Buffer/Line Driver ...........................................................
Non-inverting Octal Buffer/Line Driver ........................................................... .
Non-inverting Octal Latched Transceiver ......................................................
Non-inverting Octal Latched Transceiver ......................................................
Non-inverting Octal Transparent Latch ..........................................................
Non-inverting Octal Transparent Latch w/3-State ....................................... ..
Non-inverting Octal D Register w/3-State ..................................................... .
Non-inverting Octal D Register ......................................................................
Inverting Octal Bus Transceiver w/3-State ................................................... .
Non-inverting Octal Bus Transceiver (Open Drain) ..................................... ..
Inverting Octal Bus Transceiver (Open Drain) ............................................ ..
Non-inverting Octal Bus Transceiver w/3-State ............................................ .
Inverting Octal Transceiver ............................................................................
Inverting Octal Transceiver ............................................................................
Non-inverting Octal Transceiver ....................................................................
Non-inverting Octal Transceiver ....................................................................
Non-inverting Octal Registered Transceiver ..................................................
Non-inverting Octal Registered Transceiver ................................................ ..
Inverting Octal Registered Transceiver .........................................................
Inverting Octal Registered Transceiver .........................................................
Non-inverting Octal Registered Transceiver ................................................. .
10-Bit Non-inverting Register w/3-State ........................................................
10-Bit Non-inverting Register w/3-State ........................................................
9-Bit Non-inverting Register w/Clear & 3-State ............................................ .
9-Bit Non-inverting Register w/Clear & 3-State ............................................ .
9-Bit Inverting Register w/Clear & 3-State .....................................................
8-Bit Non-inverting Register ......................................................................... .
8-Bit Non-inverting Register w/Clear & 3-State ............................................ .
10-Bit Non-inverting Buffer ............................................................................
10-Bit Non-inverting Buffer ............................................................................
10-Bit Inverting Buffer .....................................................................................
8-Bit Transceiver w/Parity ...............................................................................
10-Bit Non-inverting Latch .............................................................................
10-Bit Non-inverting Latch .............................................................................


LOGIC
LOGIC
LOGIC
LOGIC
LOGIC
LOGIC
LOGIC
LOGIC
LOGIC
LOGIC
LOGIC
LOGIC
LOGIC
LOGIC
LOGIC
LOGIC
LOGIC
LOGIC
LOGIC
LOGIC
LOGIC
LOGIC
LOGIC
LOGIC
LOGIC
LOGIC
LOGIC
LOGIC
LOGIC
LOGIC
LOGIC
LOGIC
LOGIC
LOGIC
LOGIC
LOGIC
LOGIC
LOGIC
LOGIC
LOGIC
LOGIC
LOGIC
LOGIC
LOGIC
LOGIC
LOGIC
LOGIC
LOGIC
LOGIC
LOGIC
LOGIC
LOGIC
LOGIC
LOGIC


NUMERICAL TABLE OF CONTENTS (CONTINUED)
PART NO.      DESCRIPTION ..... PAGE
54/74FCT843   9-Bit Non-inverting Latch ..... LOGIC
54/74FCT843T  9-Bit Non-inverting Latch ..... LOGIC
54/74FCT844   9-Bit Inverting Latch ..... LOGIC
54/74FCT845   8-Bit Non-inverting Latch ..... LOGIC
54/74FCT845T  8-Bit Non-inverting Latch ..... LOGIC
54/74FCT861   10-Bit Non-inverting Transceiver ..... LOGIC
54/74FCT863   9-Bit Non-inverting Transceiver ..... LOGIC
54/74FCT864   9-Bit Inverting Transceiver ..... LOGIC
6116          2K x 8 with Power-Down ..... SRAM
61298         64K x 4 with Output Enable and Power-Down ..... SRAM
6167          16K x 1 with Power-Down ..... SRAM
6168          4K x 4 with Power-Down ..... SRAM
6178          4K x 4 Cache-Tag with Power-Down ..... SRAM
61970         4K x 4 with Output Enable and Power-Down ..... SRAM
6198          16K x 4 with Output Enable and Power-Down ..... SRAM
61B298        64K x 4 BiCEMOS with Output Enable ..... SRAM
61B98         16K x 4 BiCEMOS with Output Enable ..... SRAM
7005          64K (8K x 8) Dual-Port RAM ..... SMP
7006          128K (16K x 8) Dual-Port RAM ..... SMP
7010          9K (1K x 9) Dual-Port RAM (MASTER) ..... SMP
70101         9K (1K x 9) Dual-Port RAM (MASTER w/Interrupts) ..... SMP
70104         9K (1K x 9) Dual-Port RAM (SLAVE) ..... SMP
70105         9K (1K x 9) Dual-Port RAM (SLAVE w/Interrupts) ..... SMP
7012          18K (2K x 9) Dual-Port RAM ..... SMP
70121         18K (2K x 9) Dual-Port RAM (MASTER w/Interrupts) ..... SMP
70125         18K (2K x 9) Dual-Port RAM (SLAVE w/Interrupts) ..... SMP
7014          32K (4K x 9) Dual-Port RAM ..... SMP
7024          64K (4K x 16) Dual-Port RAM ..... SMP
7025          128K (8K x 16) Dual-Port RAM ..... SMP
7030          8K (1K x 8) Dual-Port RAM (MASTER) ..... SMP
7032          16K (2K x 8) Dual-Port RAM (MASTER) ..... SMP
7040          8K (1K x 8) Dual-Port RAM (SLAVE) ..... SMP
7042          16K (2K x 8) Dual-Port RAM (SLAVE) ..... SMP
7050          8K (1K x 8) FourPort™ RAM ..... SMP
7052          16K (2K x 8) FourPort™ RAM ..... SMP
71024         128K x 8 with Power-Down ..... SRAM
71028         256K x 4 with Power-Down ..... SRAM
71256         32K x 8 with Power-Down ..... SRAM
71258         64K x 4 with Power-Down ..... SRAM
71259         32K x 9 with Power-Down ..... SRAM
71281         64K x 4 with Separate I/O and Power-Down ..... SRAM
71282         64K x 4 with Separate I/O and Power-Down ..... SRAM
7130          8K (1K x 8) Dual-Port RAM (MASTER) ..... SMP
7132          16K (2K x 8) Dual-Port RAM (MASTER) ..... SMP
71321         16K (2K x 8) Dual-Port RAM (MASTER w/Interrupts) ..... SMP
7133          32K (2K x 16) Dual-Port RAM (MASTER) ..... SMP
7133SNLA      32K (2K x 16) Dual-Port RAM (MASTER) ..... SMP
7134          32K (4K x 8) Dual-Port RAM ..... SMP
71342         32K (4K x 8) Dual-Port RAM (w/Semaphores) ..... SMP
71342SNLA     32K (4K x 8) Dual-Port RAM (w/Semaphores) ..... SMP
7134SNLA      32K (4K x 8) Dual-Port RAM ..... SMP
7140          8K (1K x 8) Dual-Port RAM (SLAVE) ..... SMP
7142          16K (2K x 8) Dual-Port RAM (SLAVE) ..... SMP
71421         16K (2K x 8) Dual-Port RAM (SLAVE w/Interrupts) ..... SMP


NUMERICAL TABLE OF CONTENTS (CONTINUED)
PART NO.      DESCRIPTION ..... PAGE
7143          32K (2K x 16) Dual-Port RAM (SLAVE) ..... SMP
7143SNLA      32K (2K x 16) Dual-Port RAM (SLAVE) ..... SMP
71502         64K (4K x 16) Registered RAM (w/SPC™) ..... SMP
71569         8K x 9 with Address Latch and Power-Down ..... SRAM
71586         4K x 16 with Address Latch and Power-Down ..... SRAM
71589         32K x 9 Burst Mode with Power-Down ..... SRAM
7164          8K x 8 with Power-Down ..... SRAM
7165          8K x 8 Resettable Power-Down ..... SRAM
71681         4K x 4 Separate I/O and Power-Down ..... SRAM
71682         4K x 4 Separate I/O and Power-Down ..... SRAM
7169          8K x 9 with Power-Down ..... SRAM
7174          8K x 8 Cache-Tag with Power-Down ..... SRAM
7187          64K x 1 with Power-Down ..... SRAM
7188          16K x 4 with Power-Down ..... SRAM
7198          16K x 4 with Output Enable, 2 CS and Power-Down ..... SRAM
71981         16K x 4 with Separate I/O and Power-Down ..... SRAM
71982         16K x 4 with Separate I/O and Power-Down ..... SRAM
71B024        128K x 8 BiCEMOS ..... SRAM
71B028        256K x 4 BiCEMOS ..... SRAM
71B221        4K x 18 x 2 BiCEMOS with Self-Timed Latch and Power-Down ..... SRAM
71B222        4K x 18 x 2 BiCEMOS with Dual Address Latches and Power-Down ..... SRAM
71B229        16K x 9 x 2 BiCEMOS Cache RAM ..... SRAM
71B256        32K x 8 BiCEMOS ..... SRAM
71B258        64K x 4 BiCEMOS ..... SRAM
71B556        32K x 8 BiCEMOS with Address Latch ..... SRAM
71B569        8K x 9 BiCEMOS with Address Latch ..... SRAM
71B64         8K x 8 BiCEMOS ..... SRAM
71B65         8K x 8 BiCEMOS Resettable ..... SRAM
71B69         8K x 9 BiCEMOS ..... SRAM
71B74         8K x 8 BiCEMOS Cache-Tag ..... SRAM
71B79         8K x 9 BiCEMOS Cache-Tag ..... SRAM
71B88         16K x 4 BiCEMOS ..... SRAM
71B98         16K x 4 BiCEMOS with Output Enable, 2 CS ..... SRAM
7200          256 x 9-Bit Parallel FIFO ..... SMP
7201          512 x 9-Bit Parallel FIFO ..... SMP
7202          1024 x 9-Bit Parallel FIFO ..... SMP
72021         1K x 9-Bit Parallel FIFO w/Flags and Output Enable ..... SMP
7203          2K x 9-Bit Parallel FIFO ..... SMP
72031         2K x 9-Bit Parallel FIFO w/Flags and Output Enable ..... SMP
7204          4K x 9-Bit Parallel FIFO ..... SMP
72041         4K x 9-Bit Parallel FIFO w/Flags and Output Enable ..... SMP
7205          8K x 9-Bit Parallel FIFO ..... SMP
7206          16K x 9-Bit Parallel FIFO ..... SMP
72103         2K x 9-Bit Configurable Parallel-Serial FIFO ..... SMP
72104         4K x 9-Bit Configurable Parallel-Serial FIFO ..... SMP
72105         256 x 16-Bit Parallel-to-Serial FIFO ..... SMP
7210L         16 x 16 Parallel Multiplier-Accumulator ..... LOGIC
72115         512 x 16-Bit Parallel-to-Serial FIFO ..... SMP
72125         1024 x 16-Bit Parallel-to-Serial FIFO ..... SMP
72131         2048 x 9-Bit Parallel-to-Serial FIFO ..... SMP
72132         2048 x 9-Bit Serial-to-Parallel FIFO ..... SMP
72141         4096 x 9-Bit Parallel-to-Serial FIFO ..... SMP
72142         4096 x 9-Bit Serial-to-Parallel FIFO ..... SMP
7216L         16 x 16 Parallel Multiplier ..... LOGIC


NUMERICAL TABLE OF CONTENTS (CONTINUED)
PART NO.      DESCRIPTION ..... PAGE
7217L         16 x 16 Parallel Multiplier (32 Bit Output) ..... LOGIC
72200         256 x 8-Bit Parallel SyncFIFO™ (Clocked FIFO) ..... SMP
72201         256 x 9-Bit Parallel SyncFIFO™ (Clocked FIFO) ..... SMP
72210         512 x 8-Bit Parallel SyncFIFO™ (Clocked FIFO) ..... SMP
72211         512 x 9-Bit Parallel SyncFIFO™ (Clocked FIFO) ..... SMP
72215A        512 x 18-Bit Parallel SyncFIFO™ (Clocked FIFO) ..... SMP
72220         1K x 8-Bit Parallel SyncFIFO™ (Clocked FIFO) ..... SMP
72221         1K x 9-Bit Parallel SyncFIFO™ (Clocked FIFO) ..... SMP
72225A        1024 x 18-Bit Parallel SyncFIFO™ (Clocked FIFO) ..... SMP
72230         2K x 8-Bit Parallel SyncFIFO™ (Clocked FIFO) ..... SMP
72231         2K x 9-Bit Parallel SyncFIFO™ (Clocked FIFO) ..... SMP
72235         2K x 18-Bit Parallel SyncFIFO™ (Clocked FIFO) ..... SMP
72240         4K x 8-Bit Parallel SyncFIFO™ (Clocked FIFO) ..... SMP
72241         4K x 9-Bit Parallel SyncFIFO™ (Clocked FIFO) ..... SMP
72245         4K x 18-Bit Parallel SyncFIFO™ (Clocked FIFO) ..... SMP
72401         64 x 4 FIFO ..... SMP
72402         64 x 5 FIFO ..... SMP
72403         64 x 4 FIFO (w/Output Enable) ..... SMP
72404         64 x 5 FIFO (w/Output Enable) ..... SMP
72413         64 x 5 FIFO (w/Flags) ..... SMP
72420         64 x 8-Bit Parallel SyncFIFO™ (Clocked FIFO) ..... SMP
72421         64 x 9-Bit Parallel SyncFIFO™ (Clocked FIFO) ..... SMP
7251          512 x 18-Bit - 1K x 9-Bit BiFIFO ..... SMP
72510         512 x 18-Bit - 1K x 9-Bit BiFIFO ..... SMP
72511         512 x 18-Bit BiFIFO ..... SMP
7252          1K x 18-Bit - 2K x 9-Bit BiFIFO ..... SMP
72520         1K x 18-Bit - 2K x 9-Bit BiFIFO ..... SMP
72521         1K x 18-Bit BiFIFO ..... SMP
72605         256 x 18-Bit Synchronous BiFIFO (SyncBiFIFO™) ..... SMP
72615         512 x 18-Bit Synchronous BiFIFO (SyncBiFIFO™) ..... SMP
73200L        16-Bit CMOS Multilevel Pipeline Register ..... LOGIC
73200L        16-Bit CMOS Multilevel Pipeline Registers ..... 6.50
73201L        16-Bit CMOS Multilevel Pipeline Register ..... LOGIC
73201L        16-Bit CMOS Multilevel Pipeline Registers ..... 6.50
73210         Fast Octal Register Transceiver w/Parity ..... LOGIC
73210         Fast CMOS Octal Register Transceiver with Parity ..... 6.60
73211         Fast Octal Register Transceiver w/Parity ..... LOGIC
73211         Fast CMOS Octal Register Transceiver with Parity ..... 6.60
7381L         16-Bit CMOS Cascadable ALU ..... LOGIC
7383L         16-Bit CMOS Cascadable ALU ..... LOGIC
75C457        CMOS Single 8-Bit PaletteDAC™ for True Color Applications ..... LOGIC
75C458        Triple 8-Bit PaletteDAC™ ..... LOGIC
75C48         8-Bit Flash ADC ..... LOGIC
75C58         8-Bit Flash ADC with Overflow Output ..... LOGIC
79R3000A      RISC CPU Processor ..... 5.10
79R3001       RISController™ ..... 5.20
79R3010A      RISC Floating Point Accelerator (FPA) ..... 5.30
79R3020       RISC CPU Write Buffer ..... 6.40
79R3051       IDT79R3051 Family of Integrated RISControllers™ ..... 5.50
79R3500       RISC CPU Processor RISCore™ ..... 5.40
79R3720       Bus Exchanger for R3051 Family ..... 6.10
79R3721       DRAM Controller for R3051 Family ..... 6.20
79R3722       I/O Interface Controller for R3051 Family ..... 6.30
79R4000       Third Generation MIPS RISC Processor ..... 5.60


NUMERICAL TABLE OF CONTENTS (CONTINUED)
PART NO.      DESCRIPTION ..... PAGE
7M1001        128K x 8 Dual-Port SRAM Module ..... SMP
7M1002        16K x 32 Dual-Port SRAM Module ..... SMP
7M1003        64K x 8 Dual-Port SRAM Module ..... SMP
7M1004        8K x 9 Dual-Port SRAM Module ..... SMP
7M1005        16K x 9 Dual-Port SRAM Module ..... SMP
7M134         8K x 8 Master Dual-Port SRAM Module ..... SMP
7M135         16K x 8 Master Dual-Port SRAM Module ..... SMP
7M137         32K x 8 Master Dual-Port SRAM Module ..... SMP
7M144         8K x 8 Slave Dual-Port SRAM Module ..... SMP
7M145         16K x 8 Slave Dual-Port SRAM Module ..... SMP
7M205         8K x 9-Bit CMOS FIFO Module ..... SMP
7M206         16K x 9-Bit CMOS FIFO Module ..... SMP
7M207         32K x 9-Bit CMOS FIFO Module ..... SMP
7M4003        32K x 32 CMOS Static RAM Module ..... SMP
7M4013        128K x 32 CMOS Static RAM Module ..... SMP
7M4016        256K x 16 CMOS Static RAM Module ..... SMP
7M4017        64K x 32 CMOS Static RAM Module ..... SMP
7M4042        256K x 4 CMOS Static RAM Module ..... SMP
7M4048        512K x 8 CMOS Static RAM Module ..... SMP
7M6032        16K x 32 Writable Control Store Static RAM Module ..... SMP
7M624         64K x 16 CMOS Static RAM Module ..... SMP
7M812         64K x 8 CMOS Static RAM Module ..... SMP
7M912         64K x 9 CMOS Static RAM Module ..... SMP
7MB1006       64K x 16 Dual-Port SRAM Module ..... SMP
7MB1008       32K x 16 Dual-Port SRAM Module ..... SMP
7MB1041       8K x 8 FourPort™ SRAM Module ..... SMP
7MB1042       4K x 8 FourPort™ SRAM Module ..... SMP
7MB1043       4K x 16 FourPort™ SRAM Module ..... SMP
7MB1044       2K x 16 FourPort™ SRAM Module ..... SMP
7MB4009       2(16K x 16) CMOS Static RAM Module ..... SMP
7MB4040       256K x 9 CMOS Static RAM Module ..... SMP
7MB4048       512K x 8 CMOS Static RAM Module ..... SMP
7MB6036       128K x 16 Dual-Port (Shared Memory) SRAM Module ..... SMP
7MB6039       (2 x 16K x 60) Data/Instruction Cache Module for IDT79R3000 CPU ..... SMP
7MB6040       (2 x 16K x 64) Data/Instruction Cache Module for General Purpose CPUs ..... SMP
7MB6042       8K x 112 Writable Control Store Static RAM Module ..... SMP
7MB6043       (2 x 8K x 64) Data/Instruction Cache Module for IDT79R3000 CPU ..... SMP
7MB6044       (2 x 4K x 64) Data/Instruction Cache Module for IDT79R3000 CPU ..... SMP
7MB6046       64K x 16 Dual-Port (Shared Memory) SRAM Module ..... SMP
7MB6049       (2 x 16K x 60) Data/Instruction Cache Module for IDT79R3000 CPU (Multiprocessor) ..... SMP
7MB6051       (2 x 8K x 64) Data/Instruction Cache Module for IDT79R3000 CPU (Multiprocessor) ..... SMP
7MB6056       32K x 16 Dual-Port (Shared Memory) SRAM Module ..... SMP
7MB6061       (2 x 16K x 60) Data/Instruction w/Resettable Instruction Tag ..... SMP
7MB6064       (2 x 4K x 64) Data/Instruction Cache Module for IDT79R3000 CPU ..... SMP
7MB6136       128K x 18 Dual-Port (Shared Memory) SRAM Module ..... SMP
7MB6146       64K x 18 Dual-Port (Shared Memory) SRAM Module ..... SMP
7MB6156       32K x 18 Dual-Port (Shared Memory) SRAM Module ..... SMP
7MC4001       1M x 1 CMOS Static RAM Module ..... SMP
7MC4005       16K x 16 CMOS Static RAM Module ..... SMP
7MC4032       16K x 32 CMOS Static RAM Module w/Separate Data I/O ..... SMP
7MP2005       8K x 9-Bit FIFO Module ..... SMP
7MP2009       32K x 18-Bit FIFO Module ..... SMP


NUMERICAL TABLE OF CONTENTS (CONTINUED)
PART NO.          DESCRIPTION ..... PAGE
7MP2010           16K x 18-Bit FIFO Module ..... SMP
7MP2011           16K x 9-Bit FIFO Module ..... SMP
7MP4008S          512K x 8 CMOS Static RAM Module ..... SMP
7MP4031           16K x 32 CMOS Static RAM Module ..... SMP
7MP4034           256K x 8 CMOS Static RAM Module ..... SMP
7MP4036           64K x 32 CMOS Static RAM Module ..... SMP
7MP4045           256K x 32 CMOS Static RAM Module ..... SMP
7MP4047           512K x 16 CMOS Static RAM Module ..... SMP
7MP4058L          512K x 8 CMOS Static RAM Module ..... SMP
7RS101            R3000 CPU Modules for General Applications ..... 7.10
7RS102            R3000 CPU Modules for Compact Systems ..... 7.20
7RS103            R3000 CPU Modules for Compact Systems ..... 7.30
7RS104            R3001 RISC Engine for Embedded Controllers ..... 7.40
7RS107            R3000 CPU Modules for High Performance and MultiProcessor Systems ..... 7.50
7RS108            R3000 CPU Modules with 256K Caches ..... 7.60
7RS109            R3000 CPU Modules with 256K Caches ..... 7.70
7RS110            Plug Compatible Family of R3000 CPU Modules ..... 7.80
7RS300 Series     Prototyping Platform for Any IDT RISC CPU Module ..... 8.20
7RS363            R3000 PGA Adaptor ..... 8.30
7RS364            R3000 Disassembler for Use with the HP 16500 Logic Analyzer ..... 8.40
7RS382            R3000 and R3001 Evaluation Boards ..... 8.50
7RS383            R3000 and R3001 Evaluation Boards ..... 8.50
7RS388            REAL8™ R3000 Laser Printer Controller Evaluation System ..... 8.60
7RS502            MacStation 2 R3000 Development System ..... 8.70
7RS503            MacStation 3 R3000 Development System ..... 8.80
7RS901            IDT/sim System Integration Manager ROMable Debugging Kernel ..... 8.90
7RS903            IDT/c Multi-Host C-Compiler System ..... 8.10
7RS904            Cross Assembler for IBM PCs and Clones ..... 8.11
7RS905            IDT/fp Floating Point Library for Use with R3000 Compilers ..... 8.12
8M612             32K x 16 CMOS Static RAM Module ..... SMP
8M624             64K x 16 CMOS Static RAM Module ..... SMP
8M824S            128K x 8 CMOS Static RAM Module ..... SMP
8MP612L           32K x 16 CMOS Static RAM Module ..... SMP
8MP612S           32K x 16 CMOS Static RAM Module ..... SMP
8MP624L           64K x 16 CMOS Static RAM Module ..... SMP
8MP624S           64K x 16 CMOS Static RAM Module ..... SMP
8MP824L           128K x 8 CMOS Static RAM Module ..... SMP
8MP824S           128K x 8 CMOS Static RAM Module ..... SMP
Flexi-Pak Family  Modules with Various Combinations of SRAMs, EPROMs and EEPROMs ..... SMP
RC32xx            IDT RISC Development Host Systems ..... 8.10
                  Subsystem Custom Module Capabilities ..... SMP
                  Third Party Development Support ..... 8.13


IDT PACKAGE MARKING DESCRIPTION
PART NUMBER DESCRIPTION

IDT's part number identifies the basic product, speed,
power, package(s) available, operating temperature and
processing grade. Each data sheet has a detailed description,
using the part number, for ordering the proper product for the
user's application. The part number is comprised of a series
of alpha-numeric characters:

1. An "IDT" corporate identifier for Integrated Device
   Technology, Inc.
2. A basic device part number composed of alpha-numeric
   characters.
3. A device power identifier, composed of one or two alpha
   characters, is used to identify the power options. In most
   cases, the following alpha characters are used:
   "S" or "SA" is used for the standard product's power.
   "L" or "LA" is used for lower power than the standard
   product.
4. A device speed identifier, when applicable, is either alpha
   characters, such as "A" or "B", or numbers, such as 20 or
   45. The speed units, depending on the product, are in
   nanoseconds or megahertz.
5. A package identifier, composed of one or two characters.
   The data sheet should be consulted to determine the
   packages available and the package identifiers for that
   particular product.
6. A temperature/process identifier. The product is available
   in either the commercial or military temperature range,
   processed to a commercial specification, or the product is
   available in the military temperature range with full
   compliance to MIL-STD-883. Many of IDT's products
   have burn-in included as part of the standard commercial
   process flow.
7. A special process identifier, composed of alpha characters,
   is used for products which require radiation enhancement
   (RE) or radiation tolerance (RT).

Example for Monolithic Devices:

IDT  xxx...xxx  xx  x...x  x...x  x  xx

  xxx...xxx   Device Type*
  xx          Power
  x...x       Speed
  x...x       Package*
  x           Process/Temperature*
  xx          Special Process

* Field Identifier Applicable To All Products
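As a minimal sketch of the field order described above, the Python below concatenates the identifiers into a marking string. The class name, field names and the sample values ("7164", "S", "35", "D", "B") are illustrative assumptions, not taken from any data sheet; the individual data sheet remains the authority on which identifiers are valid for a given product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PartNumber:
    """Hypothetical container for the part number fields listed above."""
    device_type: str              # basic device part number
    package: str                  # one or two characters
    process_temperature: str      # temperature/process identifier
    power: str = ""               # e.g. "S", "SA", "L", "LA" (when used)
    speed: str = ""               # alpha grade or number (ns or MHz)
    special_process: str = ""     # e.g. "RE" or "RT" (when used)

    def marking(self) -> str:
        # Field order follows the example diagram: corporate identifier,
        # device type, power, speed, package, process/temperature,
        # special process. Unused optional fields are simply omitted.
        fields = [
            self.device_type,
            self.power,
            self.speed,
            self.package,
            self.process_temperature,
            self.special_process,
        ]
        return "IDT" + "".join(fields)


# Illustrative values only; not a statement that this part is orderable.
example = PartNumber(device_type="7164", power="S", speed="35",
                     package="D", process_temperature="B")
print(example.marking())
```

Going the other way, splitting a marking string back into fields, generally needs the per-product tables in the data sheet, because identifier lengths vary from product to product.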

ASSEMBLY LOCATION DESIGNATOR

IDT uses various locations for assembly. These are
identified by an alpha character in the last letter of the date
code marked on the package. Presently, the assembly
location alpha character is as follows:
A = Anam, Korea
I = USA
P = Penang, Malaysia

MIL-STD-883C COMPLIANT DESIGNATOR

IDT ships military products which are compliant to the latest
revision of MIL-STD-883C. Such products are identified by a
"C" designation on the package. The location of this designator
is specified by internal documentation at IDT.
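The assembly location designator described above is a lookup on the last letter of the date code. The Python sketch below shows one way to express it. The function name and the sample date code "9112P" are hypothetical, since the full date code format is not given here; only the three letter assignments come from the text.

```python
# Letter assignments as listed under ASSEMBLY LOCATION DESIGNATOR.
ASSEMBLY_LOCATIONS = {
    "A": "Anam, Korea",
    "I": "USA",
    "P": "Penang, Malaysia",
}


def assembly_location(date_code: str) -> str:
    """Return the assembly location for the last letter of a date code."""
    code = date_code.strip()
    if not code:
        raise ValueError("empty date code")
    # Letters not in the list above are reported as "unknown".
    return ASSEMBLY_LOCATIONS.get(code[-1].upper(), "unknown")


# "9112P" is a made-up date code used only to exercise the lookup.
print(assembly_location("9112P"))
```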


TECHNOLOGY AND CAPABILITIES

IDT... LEADING THE CMOS FUTURE
A major revolution is taking place in the semiconductor
industry today. A new technology is rapidly displacing older
NMOS and bipolar technologies as the workhorse of the 80's
and beyond. That technology is high-speed CMOS. Integrated
Device Technology, a company totally predicated on and
dedicated to implementing high-performance CMOS products,
is on the leading edge of this dramatic change.
Beginning with the introduction of the industry's fastest
CMOS 2K x 8 static RAM, IDT has grown into a company with
multiple divisions producing a wide range of high-speed
CMOS circuits that are, in almost every case, the fastest
available. These advanced products are produced with IDT's
proprietary CEMOS™ technology, a twin-well, dry-etched,
stepper-aligned process utilizing progressively smaller
dimensions.
From inception, IDT's product strategy has been to apply
the advantages of its extremely fast CEMOS technology to
produce the integrated circuit elements required to implement
high-performance digital systems. IDT's goal is to provide the
circuits necessary to create systems which are far superior to
previous generations in performance, reliability, cost, weight
and size. Many of the company's innovative product designs
offer higher levels of integration, advanced architectures,
higher density packaging and system enhancement features
that are establishing tomorrow's industry standards. The
company is committed to providing its customers with an
ever-expanding series of these high-speed, lower-power IC
solutions to system design needs.
IDT's commitment, however, extends beyond state-of-the-art
technology and advanced products to providing the highest
level of customer service and satisfaction in the industry.
Producing products to exacting quality standards that provide
excellent, long-term reliability is given the same level of
importance and priority as device performance. IDT is also
dedicated to delivering these high-quality advanced products
on time. The company would like to be known not only for its
technological capabilities, but also for providing its customers
with quick, responsive and courteous service.
IDT's product families are available in both commercial and
military grades. As a bonus, commercial customers obtain the
benefits of military processing disciplines, established to meet
or exceed the stringent criteria of the applicable military
specifications.
IDT is the leading U.S. supplier of high-speed CMOS
circuits. The company's high-performance fast SRAM, FCT
logic family, high-density modules, FIFOs, complex logic
products, specialty memories, ECL I/O BiCEMOS™ memories,
RISC subsystems, and the 32-bit RISC microprocessor family
complement each other to provide high-speed CMOS solutions
to a wide range of applications and systems.
Dedicated to maintaining its leadership position as a
state-of-the-art IC manufacturer, IDT will continue to focus on
maintaining its technology edge as well as developing a
broader range of innovative products. New products and
speed enhancements are continuously being added to each
of the existing product families and additional product lines will
be introduced. Contact your IDT field representative or factory
marketing engineer to determine the latest product offerings.
If you're building state-of-the-art equipment, IDT wants to help
you solve some of your design problems.

IDT MILITARY AND DESC-SMD PROGRAM
IDT is a leading supplier of military, high-speed CMOS
circuits. The company's high-performance Static RAMs, FCT
Logic Family, Complex Logic (CLP), FIFOs, Specialty
Memories (SMP), ECL I/O BiCMOS Memories, 32-bit RISC
Microprocessor, RISC Subsystems and high-density
Subsystems Modules product lines complement each other to
provide high-speed CMOS solutions to a wide range of
military applications and systems. Most of these product lines
offer Class B products which are fully compliant to the latest
revision of MIL-STD-883, Paragraph 1.2.1. In addition, IDT
offers Radiation Tolerant (RT), as well as Radiation Enhanced
(RE), products.
IDT has an active program with the Defense Electronics
Supply Center (DESC) to list all of IDT's military compliant
devices on Standard Military Drawings (SMD). The SMD
program allows standardization of militarized products and
reduction of the proliferation of non-standard source control
drawings. This program will go far toward reducing the need
for each defense contractor to make separate specification
control drawings for purchased parts. IDT plans to have
SMDs for many of its product offerings. Presently, IDT has 88
devices which are listed or pending listing. The devices are
from IDT's SRAM, FCT Logic Family, Complex Logic (CLP),
FIFOs and Specialty Memories (SMP) product families. IDT
expects to add another 20 devices to the SMD program in the
near future. Users should contact either IDT or DESC for
current status of products in the SMD program.

SRAM
SMD               IDT
84036/D           6116
5962-88740        6116LA
84132/B           6167
5962-86015/A      7187
5962-86859        6198/7198/7188
5962-86705/A      6168
5962-85525/A      7164
5962-88552        71256L
5962-88662        71256S
5962-88611        71682L
5962-88681/A      71258S
5962-88545        71258L
5962-88544        71257L
5962-88725/A      71257S
5962-89690        6116
5962-89691        7198
5962-89692        7188
5962-89712        71982
5962-89892        6198
5962-38294        8Kx8
5962-89598        32Kx8

LOGIC
SMD               IDT
5962-87630/B      54FCT244/A
5962-87629/C      54FCT245/A
5962-86862/A      54FCT299/A
5962-87644/A      54FCT373/A
5962-87628/C      54FCT374/A
5962-87627        54FCT377/A
5962-87654/A      54FCT138/A
5962-87655        54FCT240/A
5962-87656/A      54FCT273/A
5962-89533        54FCT861A/B
5962-89506        54FCT827A/B
5962-88575        54FCT841A/B
5962-88608        54FCT821A/B
5962-88543/A      54FCT521/A
5962-88640        54FCT161/A
5962-88639        54FCT573/A
5962-88656        54FCT823A/B
5962-88657        54FCT163/A
5962-88674        54FCT825A/B
5962-88661        54FCT863A/B
5962-88736        54FCT520A/B
5962-88775        54FCT646A/B
5962-89508        54FCT139/A
5962-89665        54FCT824A/B
5962-88651        54FCT533/A
5962-88652        54FCT182/A
5962-88653        54FCT645A/B
5962-88654        54FCT640A/B
5962-88655        54FCT534/A
5962-89767        54FCT540/A
5962-89766        54FCT541/A
5962-89733        54FCT191/A
5962-89732        54FCT241/A
5962-89652        54FCT399/A
5962-89513        54FCT574/A
5962-89731        54FCT833A/B
5962-88675        54FCT845A/B
5962-89730        54FCT543/A

FIFO
SMD               IDT
5962-87531        7201LA
5962-86846/A      72404
5962-88669        7203S
5962-89568        7204L
5962-89536        7202L
5962-89863        7201S
5962-89523        72403L
5962-89666        7200L
5962-89942        72103L
5962-89943        72104L
5962-89567        7203L

SMP
SMD               IDT
5962-86875/A      7130/7140
5962-87002/A      7132/7142
5962-88610/A      7133S/7143S
5962-88665/A      7133L/7143L

CLP
SMD               IDT
5962-87708/A      39C10B & C
5962-88535        39C01
5962-88533/A      49C460A
5962-88613        39C60A
5962-88643        49C410
5962-88743        75C48S
5962-86273        7216L
5962-87686        7217L
5962-88733        7210
5962-90579        7381L

RADIATION HARDENED TECHNOLOGY
IDT manufactures and supplies radiation hardened products
for military/aerospace applications. Utilizing special processing
and starting materials, IDT's radiation hardened devices are
able to survive in hostile radiation environments. In total dose,
dose rate and environments where single event upset is of
concern, IDT products are designed to continue functioning
without loss of performance. IDT can supply all its products on
these processes. Total Dose radiation testing is performed
in-house on an ARACOR X-Ray system. External facilities are
utilized for device research on gamma cell, LINAC and other
radiation equipment. IDT has an on-going research and
development program for improving radiation handling
capabilities of IDT products/processes (see "IDT Radiation
Tolerant/Enhanced Products for Radiation Environments" in
Section 3).

IDT LEADING EDGE CEMOS TECHNOLOGY

HIGH-PERFORMANCE CEMOS
From IDT's beginnings in 1980, it has had a belief in and a
commitment to CMOS. The company developed a
high-performance version of CMOS, called enhanced CMOS
(CEMOS), that allows the design and manufacture of
leading-edge components. It incorporates the best characteristics of
traditional CMOS, including low power, high noise immunity
and wide operating temperature range; it also achieves speed
and output drive equal or superior to bipolar Schottky TTL.
The last decade has seen development and production of four
"generations" of IDT's CEMOS technology with process
improvements which have reduced IDT's electrical effective
(Leff) gate lengths by more than 50 percent from 1.3 microns
(millionths of a meter) in 1981 to 0.6 microns in 1989.
                 CEMOS I      CEMOS II A   CEMOS II C   CEMOS III    CEMOS V      CEMOS VI
Calendar Year    1981         1983         1985         1987         1989         1990
Drawn            2.5µ         1.7µ         1.3µ         1.2µ         1.0µ         0.8µ
Feature Size
Leff             1.3µ         1.1µ         0.9µ         0.8µ         0.6µ         0.45µ
Basic            Dual-well,   Dry Etch,    Shrink,      Silicide,    BiCEMOS II   BiCEMOS III
Process          Wet Etch,    Stepper      Spacer       BPSG,
Enhancements     Projection                             BiCEMOS I
                 Aligned

CEMOS IV = CEMOS III - scaled process optimized for high-speed logic.

Figure 1.
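
The "more than 50 percent" reduction in effective gate length quoted in the text can be checked with a short calculation. This is an illustrative sketch only; the helper name is hypothetical, and the two Leff values are the 1981 and 1989 figures stated in the text.

```python
def percent_reduction(initial, final):
    """Return the percentage by which `final` is smaller than `initial`."""
    if initial <= 0:
        raise ValueError("initial must be positive")
    return (initial - final) / initial * 100.0


# Effective gate lengths in microns, as quoted in the text.
LEFF_1981_UM = 1.3
LEFF_1989_UM = 0.6

reduction = percent_reduction(LEFF_1981_UM, LEFF_1989_UM)

if __name__ == "__main__":
    print(f"Leff reduction, 1981 to 1989: {reduction:.1f}%")
```

The result is a little under 54 percent, consistent with the claim of a reduction of more than 50 percent.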

Continual advancement of CEMOS technology allows IDT
to implement progressively higher levels of integration and
achieve increasingly faster speeds, maintaining the company's
established position as the leader in high-speed CMOS
integrated circuits. In addition, the fundamental process
technology has been extended to add bipolar elements to the
CEMOS platform. IDT's BiCEMOS process combines the
ultra-high speeds of bipolar devices with the lower power and
cost of CMOS, allowing us to build even faster components
than straight CMOS at a slightly higher cost.

[SEM photos (miniaturization): CEMOS I, 1981; CEMOS II, 1983; CEMOS III, 1987; CEMOS V, 1989]

Figure 2. Fifteen-Hundred-Power Magnification Scanning Electron
Microscope (SEM) Photos of the Four Generations of IDT's CEMOS
Technology

[Figure labels: NMOS, CEMOS™; Potential, -3V, +5V]

Figure 3. IDT CEMOS Device Cross Section

Figure 4. IDT CEMOS Built-In High Alpha Particle Immunity
ALPHA PARTICLES
Random alpha particles can cause memory cells to
temporarily lose their contents or suffer a "soft error." Traveling
with high energy levels, alpha particles penetrate deep into an
integrated chip. As they burrow into the silicon, they leave a
trail of free electron-hole pairs in their wake.
The cause of alpha particles is well documented and
understood in the industry. IDT has considered various
techniques to protect the cells from this hazardous occurrence.
These techniques include dual-well structures (Figures 3 and
4) and a polymeric compound for die coating. Presently, a
polymeric compound is used in many of IDT's SRAMs; however,
the specific techniques used may vary and change from one
device generation to the next as the industry and IDT improve
the alpha particle protection technology.

[Figure panels: (a) Section A-A, with labels Input/Output Pad and n-Substrate; (b) plot against Collector Supply Voltage VCC (V), 0 to 7, Typical]

Figure 5. IDT CEMOS Latchup Suppression

LATCHUP IMMUNITY
A combination of careful design layout, selective use of
guard rings and proprietary techniques has resulted in virtual
elimination of latchup problems often associated with older
CMOS processes (Figure 5). The use of NPN and N-channel
I/O devices eliminates hole injection latchup. Double guard
ring structures are utilized on all input and output circuits to
absorb injected electrons. These effectively cut off the current
paths into the internal circuits to essentially isolate I/O circuits.
Compared to older CMOS processes which exhibit latchup
characteristics with trigger currents from 10-20mA, IDT
products inhibit latchup at trigger currents substantially greater
than this.

SURFACE MOUNT TECHNOLOGY
AND
IDT'S MODULE PRODUCTS
Requirements for circuit area reduction, utilizing the most
efficient and compact component placement possible, and the
needs of production manufacturing for electronics assemblies
are the driving forces behind the advancement of circuit-board
assembly technologies. These needs are closely associated
with the advances being made in surface mount devices
(SMD) and surface mount technology (SMT) itself. Yet, there
are two major issues with SMT in production manufacturing of
electronic assemblies: high capital expenditures and complexity of testing.
The capital expenditure required to convert to efficient
production using SMT is still too high for the majority of
electronics companies, regardless of the 20-60% increase in
the board densities which SMT can bring. Because of this high
barrier to entry, we will continue to see a large market segment
[large even compared to the exploding SMT market] using
traditional through-hole packages (i.e., DIPs, PGAs, etc.) and
assembly techniques. How can these types of companies
take advantage of SMD and SMT? Let someone else, such
as IDT, do it for them by investing time and money in SMT and
then in return offer through-hole products utilizing SMT
processes. Products which fit this description are multi-chip
modules, consisting of SMT assembled SMDs on a
through-hole type substrate. Modules enable companies to enjoy SMT
density advantages and traditional package options without
the expensive startup costs required to do SMT in-house.
Although subcontracting this type of work to an assembly
house is an alternative, there still is the other issue of testing,
an area where many contract assembly operations fall short
of IDT's capability and experience. Prerequisites for adequate
module testing are sophisticated high-performance parametric
testers, customized test fixtures, and, most importantly, the
experience to test today's complex electronic
devices. Companies can therefore take advantage of IDT's
experience in testing and manufacturing high-performance
CMOS multi-chip modules.
At IDT, SMD components are electrically tested,
environmentally screened, and performance selected for each IDT
module. All modules are 100% tested as if they are a separate
functional component and are guaranteed to meet all specified
parameters at the module output without the customer
having to understand the modules' internal workings.

Other added benefits companies get by using IDT's CMOS
module products are:
1) a wide variety of high-performance, through-hole products utilizing SMD packaged components,
2) fast speeds compared with NMOS based products,
3) low power consumption compared with bipolar technologies, and
4) low cost manufacturability compared with GaAs based
products.
IDT recognized the problems of SMT and began
offering CMOS modules as part of its standard product portfolio. IDT modules combine the advantages of:
1) the low power characteristics of IDT's CEMOS™ and
BiCEMOS™ products,
2) the density advantages of first class SMD components
including those from IDT's components divisions, and
3) experience in system level design, manufacturing, and
testing with its own in-house SMT operation.
IDT currently has two divisions (Subsystems and RISC
Subsystems) dedicated to the development of module products
ranging from simple memory modules to complex VME
sized application specific modules to full system level CPU
boards. These modules have surface mount devices assembled
on both sides of either a multi-layer glass filled epoxy
(FR-4) or a multi-layer co-fired ceramic substrate. Assembled
modules are available in industry standard through-hole
packages and other space-saving module packages. Industry
proven vapor-phase or IR reflow techniques are used to
solder the SMDs to the substrate during the assembly process.
Because of our affiliation with IDT's experienced semiconductor
manufacturing divisions, we thoroughly understand and therefore
test all modules to the applicable datasheet specifications and
customer requirements.
Thus, IDT is able to offer today's electronic design engineers
a unique solution for their "need-more-for-less"
problem: modules. These high-speed, high-performance
products offer the density advantages of SMD and SMT, the
added benefit of low power CMOS technology, and
through-hole packaged electronics without the high cost of doing it
in-house.

STATE-OF-THE-ART FACILITIES AND CAPABILITIES
Integrated Device Technology is headquartered in Santa
Clara, California - the heart of the "Silicon Valley." The
company's operations are housed in seven facilities totaling
over 500,000 square feet. These facilities house all aspects
of business from research and development to design, wafer
fabrication, assembly, environmental screening, test and
administration. In-house capabilities include scanning electron
microscope (SEM) evaluation, particle impact noise detection
(PIND), plastic and hermetic packaging, military and
commercial testing, burn-in, life test and a full complement of
environmental screening equipment.
The over-200,000-square-foot corporate headquarters
campus is composed of four buildings. The largest facility on
this site is a 100,000 square foot, two-building complex. The
first building, a 60,000 square foot facility, is dedicated to the
Complex Logic, Standard Logic and RISC Microprocessor
product lines, as well as hermetic and plastic package
assembly, logic products' test, burn-in, mark and QA, and a
reliability/failure analysis lab.
IDT's Packaging and Assembly Process Development
teams are located here. To keep pace with the development
of new products and to enhance the IDT philosophy of
"Innovation," these teams have ultra modern, integrated and
correspondingly sophisticated equipment and environments
at their disposal. All manufacturing is completed in dedicated
clean room areas (Class 10K minimum), with all preseal
operations accomplished under Class 100 laminar flow hoods.
Development of assembly materials, processes and
equipment is accomplished under a fully operational production
environment to ensure reliability and repeatable product. The
Hermetic Manufacturing and Process Development team is
currently producing custom products to the strict requirements
of MIL-STD-883. The fully automated plastic facility is currently
producing high volumes of USA-manufactured product, while
developing state-of-the-art surface mount technology patterned
after MIL-STD-883.
The second building of the complex houses sales, marketing,
finance and MIS.
The RISC Subsystems and Subsystems Modules Divisions
are located behind the two-building complex in a 54,000
square foot facility. Also located at this facility are Quality
Assurance and wafer fabrication services.
Directly across the street from the two-building complex is
a newly acquired 50,000 square foot facility that houses
administrative services, Northwest Area Sales, Human
Resources, International Planning and Shipping and Receiving
functions.
IDT's largest and newest facility, opened in 1990 in San
Jose, California, is a multi-purpose 150,000 square foot, ultra
modern technology development center. This facility houses
a 25,000 square foot, combined Class 1 (a maximum of one
particle per cubic foot of 0.2 micron or larger), sub-half-micron
R&D fabrication facility and a wafer fabrication area. This fab
supports both production volumes of IDT products, including
some next generation SRAMs, and the R&D efforts of the
technology development staff. Technology development efforts
targeted for the center include advanced silicon processing
and wafer fabrication techniques. A test area to support both
production and research is located on-site. The building is
also the new home of the FIFO and ECL product lines.
IDT's second largest facility is located in Salinas, California,
about an hour away from Santa Clara. This 95,000 square
foot facility, located on 14 acres, is home to the Static RAM Division
and Specialty Memory product line. Constructed in 1985, this
facility houses an ultra-modern 25,000 square foot
high-volume wafer fabrication area measured at Class 2-to-3 (a
maximum of 2 to 3 particles per cubic foot of 0.2 micron or
larger) clean room conditions. Careful design and construction
of this fabrication area created a clean room environment far
beyond the 1985 average for U.S. fab areas. This made
possible the production of large volumes of high-density
submicron geometry, fast static RAMs. This facility also
houses shipping areas for IDT's leadership family of CMOS
static RAMs. This site will expand to accommodate a 250,000
square foot complex.
To extend these philosophies while maintaining strict control
of our processes, IDT has an operational Assembly and Test
facility located in Penang, Malaysia. This facility assembles
product to USA standards, with all assemblies done under
laminar flow conditions (Class 100) until the silicon is encased
in its final packaging. All products in this facility are
manufactured to the quality control requirements of MIL-STD-883.
All of IDT's facilities are aimed at increasing our
manufacturing productivity to supply ever larger volumes of
high-performance, cost-effective leadership CMOS products.

SUPERIOR QUALITY AND RELIABILITY
Maintaining the highest standards of quality in the industry
on all products is the basis of Integrated Device Technology's
manufacturing systems and procedures. From inception,
quality and reliability are built into all of IDT's products. Quality
is "designed in" at every stage of manufacturing - as opposed
to being "tested-in" later - in order to ensure impeccable
performance.
Dedicated commitment to fine workmanship, along with
development of rigid controls throughout wafer fab, device
assembly and electrical test, creates inherently reliable products.
Incoming materials are subjected to careful inspections. Quality
monitors, or inspections, are performed throughout the
manufacturing flow.
IDT military grade monolithic hermetic products are designed
to meet or exceed the demanding Class B reliability levels of
MIL-STD-883 and MIL-M-38510, as defined by Paragraph
1.2.1 of MIL-STD-883.
Product flow and test procedures for all monolithic hermetic
military grade products are in accordance with the latest
revision and notice of MIL-STD-883. State-of-the-art production
techniques and computer-based test procedures are coupled
with tight controls and inspections to ensure that products
meet the requirements for 100% screening. Routine quality
conformance lot testing is performed as defined in
MIL-STD-883, Methods 5004 and 5005.
For IDT module products, screening of the fully assembled
substrates is performed, in addition to the monolithic level
screening, to assure package integrity and mechanical
reliability. All modules receive 100% electrical tests (DC,
functional and dynamic switching) to ensure compliance with
the "subsystem" specifications.
By maintaining these high standards and rigid controls
throughout every step of the manufacturing process, IDT
ensures that commercial, industrial and military grade products
consistently meet customer requirements for quality, reliability
and performance.

SPECIAL PROGRAMS
Class S. IDT also has all manufacturing, screening and
test capabilities in-house (except X-ray and some Group D
tests) to perform complete Class S processing per
MIL-STD-883 on all IDT products and has supplied Class S products on
several programs.
Radiation Hardened. IDT has developed and supplied
several levels of radiation hardened products for military/
aerospace applications to perform at various levels of dose
rate, total dose, single event upset (SEU), upset and latchup.
IDT products built to these special process requirements
maintain nearly the same high-performance levels. The
company has in-house radiation testing capability used both
in process development and testing of deliverable product.
IDT also has a separate group within the company dedicated
to supplying products for radiation hardened applications and
to continuing research and development of process and products
to further improve radiation hardening capabilities.

QUALITY AND RELIABILITY

QSP-QUALITY, SERVICE AND PERFORMANCE
Quality, from the beginning, is the foundation for IDT's
commitment to supply consistently high-quality products to
our customers. IDT's quality commitment is embodied in its
all-pervasive Constant Quality Improvement (CQI) program.
Everyone who influences the quality of the product, from the
designer to the shipping clerk, is committed to constantly
improving the product quality.

IDT'S FOCUS
"To make quantitative constant improvement in the quality
of our actions that result in the supply of leadership products
in conformance to the requirements of our customers."
IDT has dedicated its efforts to constant quantitative
improvements in quality. The result is a supply of leadership
products that conform to the requirements of our customers.

IDT'S PRODUCT ASSURANCE STRATEGY
FOR CQI
Measurable standards are essential to the success of CQI.
All the processes contributing to the final quality of the product
need to be monitored, measured and improved upon through
the use of statistical tools.

PRODUCT FLOW: DEVELOPMENT -> FAB -> ASSEMBLY -> TEST -> SHIP
SERVICE FLOW: ORDER ENTRY -> PRODUCTION CONTROL -> SHIPPING -> CUSTOMER SUPPORT

Our customers receive the benefit of our optimized systems.
Installed to enhance quality and reliability, these systems
provide accurate and timely reporting on the effectiveness of
manufacturing controls and the reliability and quality
performance of IDT products and services.

These systems and controls concentrate on CQI by focusing
on the following key elements:

Statistical Techniques
Using statistical techniques, including Statistical Process
Control (SPC), to determine whether the product/
processes are under control.
Standardization
Implementing policies, procedures and measurement
techniques that are common across different operational
areas.
Documentation
Documenting and training in policies, procedures,
measurement techniques and updating through
characterization/capability studies.
Productivity Improvement
Using constant improvement teams made up from
employees at all levels of the organization.
Leadership
Focusing on quality as a key business parameter and
strategic strength.
Total Employee Participation
Incorporating the CQI program into the IDT Corporate
Culture.
Customer Service
Supporting the customer, as a partner, through
performance review and pro-active problem solving.
People Excellence
Committing to growing, motivating and retaining people
through training, goal setting, performance measurement
and review.

PRODUCT FLOW
Product quality starts here. IDT has mechanisms and
procedures in place that monitor and control the quality of our
development activities. These extend from the calibration of
design capture libraries through process technology and product
characterization, which establish whether the performance,
ratings and reliability criteria have been met. This includes
failure analysis of parts that will improve the prototype product.
At the pre-production stage, once again, in-house qualification
tests assure the quality and reliability of the product. All
specifications and manufacturing flows are established and
personnel trained before the product is placed into production.

Manufacturing
To make CQI during the manufacturing stage, control items
are determined for major manufacturing conditions. Data is
gathered and statistical techniques are used to control specific
manufacturing processes that affect the quality of the product.
In-process and final inspections are fed back to earlier
processes to improve product quality. All product is burned-in
(where applicable) before 100% inspection of electrical
characteristics takes place.
Products which pass final inspection are then subject to
Quality Assurance and Reliability Tests. This data is used to
improve manufacturing processes and provide reliability
predictions of field applications.

Inventory and Shipping
Controls in shipping focus on ensuring parts are identified
and packaged correctly. Care is also taken to see that the
correct paperwork is present and the product being shipped
was processed correctly.

SERVICE FLOW
Quality applies not only to the product but also to the quality
of service we give our customers. Service is also constantly
improved.

Order Procedures
Checks are made at the order entry stage to ensure the
correct processing of the Customer's product. After verification
and data entry, the Acknowledgements (sent to Customers)
are again checked to ensure details are correct. As part of the
CQI program, the results of these verifications are analyzed
using statistical techniques and corrective actions are taken.

Production Control
Production Control (P.C.) is responsible for the flow and
logistics of material as it moves through the manufacturing
processes. The quality of the actions taken by P.C. greatly
impinges on the quality of service the customer receives.
Because many of our customers have implemented
Just-in-Time (JIT) manufacturing practices, IDT as a supplier also has
to adopt these same disciplines. As a result, employees
receive extensive training and the performance level of key
actions is kept under constant review. These key actions
include:
Quotation response and accuracy.
Scheduling response and accuracy.
Response and accuracy of Expedites.
Inventory management and effectiveness.
On-time delivery.

Customer Support
IDT has a worldwide network of sales offices and Technical
Development Centers. These provide local customer support
on business transactions, and in addition, support customers
on applications information, technical services, benchmarking
of hardware solutions, and demonstration of various
Development Workstations.
The key to CQI is the timely resolution of defects and
implementation of the corrective actions. This is never more
important than when product failures are found by a
customer. When failures are found at the customer's incoming
inspection, in the production line, or the field application, the
Quality Assurance group is the focal point for the investigation
of the cause of failure and implementation of the corrective
action. IDT constantly improves the level of support we give
our customers by monitoring the response time to customers
that have detected a product failure. Providing the customer
with an analysis of the failure, including corrective actions and
the statistical analysis of defects, brings CQI full circle: full
support of our customers and their designs with high-quality
products.

SUMMARY
In 1990, IDT made the commitment to "Leadership through
Quality, Service, and Performance Products".
We believe by following that credo IDT and our customers
will be successful in the coming decade. With the
implementation of the CQI strategy, we will satisfy our goal...

"Leadership through Quality, Service and Performance
Products".

IDT QUALITY CONFORMANCE PROGRAM
A COMMITMENT TO QUALITY
Integrated Device Technology's monolithic and modular
assembly products are designed, manufactured and tested in
accordance with the strict controls and procedures required
by Military Standards. The documentation, design and
manufacturing criteria of the Quality and Reliability Assurance
Program were developed and are being maintained to the
most current revisions of MIL-M-38510 as defined by paragraph
1.2.1 of MIL-STD-883 and MIL-STD-883 requirements.
Product flow and test procedures for all Class B monolithic
hermetic Military Grade microcircuits are in full compliance
with paragraph 1.2.1 of MIL-STD-883. State-of-the-art
production techniques and computer-based test procedures
are coupled with stringent controls and inspections to ensure
that products meet the requirements for 100% screening and
quality conformance tests as defined in MIL-STD-883, Methods
5004 and 5005.
Product flow and test procedures for all plastic and
commercial hermetic products are in accordance with industry
practices for producing highly reliable microcircuits to ensure
that products meet the IDT requirements for 100% screening
and quality conformance tests.
By maintaining these high standards and rigid controls
throughout every step of the manufacturing process, IDT
ensures that our products consistently meet customer
requirements for quality, reliability and performance.

4.  Wire Bond Monitor: Product samples are routinely
    subjected to a strength test per Method 2011, Condition
    D, to ensure the integrity of the lead bond process.

5.  Pre-cap Visual: Before the completed package is
    sealed, 100% of the product is visually inspected to
    Method 2010, Condition B criteria.

6.  Environmental Conditioning: 100% of the sealed
    product is subjected to environmental stress tests.
    These thermal and mechanical tests are designed to
    eliminate units with marginal seal, die attach or lead
    bond integrity.

7.  Hermetic Testing: 100% of the hermetic packages
    are subjected to fine and gross leak seal tests to
    eliminate marginally sealed units or units whose
    seals may have become defective as a result of
    environmental conditioning tests.

8.  Pre-Burn-In Electrical Test: Each product is 100%
    electrically tested at an ambient temperature of +25°C
    to IDT data sheet or the customer specification.

9.  Burn-In: 100% of the Military Grade product is
    burned-in under dynamic electrical conditions to the
    time and temperature requirements of Method 1015,
    Condition D. Except for the time, Commercial Grade
    product is burned-in as applicable to the same
    conditions as Military Grade devices.

10. Post-Burn-In Electrical: After burn-in, 100% of the
    Class B Military Grade product is electrically tested to
    IDT data sheet or customer specifications over the
    -55°C to +125°C temperature range. Commercial
    Grade products are sample tested to the applicable
    temperature extremes.

11. Mark: All product is marked with product type and lot
    code identifiers. MIL-STD-883 compliant Military
    Grade products are identified with the required
    compliant code letter.

12. Quality Conformance Tests: Samples of the Military
    Grade product which have been processed to the
    100% screening tests of Method 5004 are routinely
    subjected to the quality conformance requirements of
    Method 5005.

SUMMARY
Monolithic Hermetic Package Processing Flow

[Flow diagram. Recoverable entries: wire bond strength per Method 2011 (>3.0 grams); pre-cap visual (100% and sample) per Method 2010, Cond. B; IDT spec provides lot traceability; Method 1010, Cond. C, 10 cycles, -65°C to +150°C; Method 2001, Cond. E, Y1 direction, >30kg (PKG < 5g), >20kg (package condition truncated).]
[Figure 6 diagram, 2860 drw 06. Recoverable contents:]

Virtual address space:
  0xC0000000 - 0xFFFFFFFF   kseg2   Kernel, mapped, cacheable
  0xA0000000 - 0xBFFFFFFF   kseg1   Kernel, unmapped, uncached
  0x80000000 - 0x9FFFFFFF   kseg0   Kernel, unmapped, cached
  0x00000000 - 0x7FFFFFFF   kuseg   Kernel/user, mapped, cacheable

Physical address space:
  0x20000000 - 0xFFFFFFFF   3584 MB   (labeled ANY)
  0x00000000 - 0x1FFFFFFF   512 MB    (labeled MEMORY)

Figure 6. IDT79R3000A Virtual Address Mapping

User Mode-in this mode, a single, uniform virtual address space (kuseg) of 2 Gbyte is available. Each virtual
address is extended with a 6-bit process identifier field to form
unique virtual addresses. All references to this segment are
mapped through the TLB. Use of the cache for up to 64
processes is determined by bit settings for each page within
the TLB entries.
Kernel Mode-four separate segments are defined in
this mode:
• kuseg-when in the kernel mode, references to this segment are treated just like user mode references, thus
streamlining kernel access to user data.
• kseg0-references to this 512 Mbyte segment use cache
memory but are not mapped through the TLB. Instead, they
always map to the first 0.5 GBytes of physical address
space.
• kseg1-references to this 512 Mbyte segment are not
mapped through the TLB and do not use the cache.
Instead, they are hard-mapped into the same 0.5 GByte
segment of physical address space as kseg0.
• kseg2-references to this 1 Gbyte segment are always
mapped through the TLB and use of the cache is determined by bit settings within the TLB entries.
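
The segment rules above can be sketched in a few lines of Python. This is only an illustrative model, not IDT software: the function names `classify` and `unmapped_physical` are invented for this example, and TLB translation for kuseg and kseg2 is deliberately left out because it depends on TLB contents.

```python
# Segment base addresses as given in Figure 6.
KSEG0_BASE = 0x80000000
KSEG1_BASE = 0xA0000000
KSEG2_BASE = 0xC0000000
UNMAPPED_MASK = 0x1FFFFFFF  # keeps the low 512 MB of the address


def classify(vaddr):
    """Return (segment, mapped_through_tlb, cache_policy) for a 32-bit virtual address."""
    vaddr &= 0xFFFFFFFF
    if vaddr < KSEG0_BASE:
        return ("kuseg", True, "per-page")   # cache use set by TLB entry bits
    if vaddr < KSEG1_BASE:
        return ("kseg0", False, "cached")
    if vaddr < KSEG2_BASE:
        return ("kseg1", False, "uncached")
    return ("kseg2", True, "per-page")       # cache use set by TLB entry bits


def unmapped_physical(vaddr):
    """Physical address for kseg0/kseg1; None for TLB-mapped segments."""
    _segment, mapped, _policy = classify(vaddr)
    if mapped:
        return None  # translation depends on the TLB, which is not modeled here
    return vaddr & UNMAPPED_MASK


if __name__ == "__main__":
    for address in (0x00400000, 0x80001000, 0xA0001000, 0xC0000000):
        print(hex(address), classify(address), unmapped_physical(address))
```

Under this model, the kseg0 address 0x80001000 and the kseg1 address 0xA0001000 both resolve to physical address 0x00001000, which matches the statement that the two segments share the same 0.5 GByte physical region and differ only in cache use.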

IDT79R3000 Pipeline Architecture
The execution of a single IDT79R3000A instruction consists of five primary steps:
1) IF   Fetch the instruction (I-Cache).
2) RD   Read any required operands from CPU
        registers while decoding the instruction.
3) ALU  Perform the required operation on
        instruction operands.
4) MEM  Access memory (D-Cache).
5) WB   Write back results to register file.
Each of these steps requires approximately one CPU
cycle as shown in Figure 7 (parts of some operations overlap
into another cycle while other operations require only 1/2
cycle).

[Figure 7 diagram, 2860 drw 07: one instruction passing through the IF (I-Cache), RD (RF), ALU (OP), MEM (D-Cache) and WB stages, with the width of one cycle marked.]

Figure 7. IDT79R3000A Instruction Pipeline


IDT79R3000A/AE RISC CPU PROCESSOR

MILITARY AND COMMERCIAL TEMPERATURE RANGES

INSTRUCTION EXECUTION


The IDT79R3000A uses a 5-stage pipeline to achieve an
instruction execution rate approaching one instruction per
CPU cycle. Thus, the execution of up to five instructions is
overlapped at any one time, as shown in Figure 8.
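
The overlap can be made concrete with a small Python sketch. It assumes an idealized pipeline with no stalls in which every stage takes exactly one cycle; the data sheet notes that real stages only approximately fit one cycle. The names `pipeline_chart` and `in_flight` are illustrative and do not come from IDT.

```python
STAGES = ["IF", "RD", "ALU", "MEM", "WB"]


def pipeline_chart(n_instructions):
    """Return one row per instruction; column c holds the stage occupied in CPU cycle c."""
    total_cycles = n_instructions + len(STAGES) - 1
    chart = []
    for i in range(n_instructions):
        row = ["--"] * total_cycles
        for s, stage in enumerate(STAGES):
            row[i + s] = stage  # instruction i enters IF in cycle i
        chart.append(row)
    return chart


def in_flight(chart, cycle):
    """Count the instructions that occupy some pipeline stage in the given cycle."""
    return sum(1 for row in chart if row[cycle] != "--")


if __name__ == "__main__":
    chart = pipeline_chart(8)
    for number, row in enumerate(chart):
        print(f"I{number}: " + " ".join(f"{stage:>3}" for stage in row))
    print("instructions in flight in cycle 4:", in_flight(chart, 4))
```

Once the pipeline has filled, five instructions are in flight in each cycle and one instruction completes per cycle, which is the "approaching one instruction per CPU cycle" rate described above.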

[Figure 9 diagram, 2860 drw 09: a microprocessor (CPU) connected to memory (and I/O) by address and data buses.]

Figure 9. A Simple Microprocessor Memory System

Figure 10 illustrates a memory system that supports the
significantly greater memory bandwidth required to take full
advantage of the IDT79R3000A's performance capabilities.
The key features of this system are:

[Figure 8 diagram, 2860 drw 08: instruction flow through the 5-deep pipeline, with the current CPU cycle marked.]

Figure 8. IDT79R3000A Execution Sequence

IDT79R3000A INSTRUCTION PIPELINE
This pipeline operates efficiently because different CPU
resources (address and data bus accesses, ALU operations,
register accesses, and so on) are utilized on a non-interfering
basis.
Memory System Hierarchy
The high performance capabilities of the IDT79R3000A
processor demand system configurations incorporating techniques frequently employed in large, mainframe computers
but seldom encountered in systems based on more traditional
microprocessors.
A primary goal of systems employing RISC techniques is
to minimize the average number of cycles each instruction
requires for execution. In order to achieve this goal, RISC
processors incorporate a number of RISC techniques, including a compact and uniform instruction set, a deep instruction
pipeline (as described above), and utilization of optimizing
compilers. Many of the advantages obtained from these
techniques can, however, be negated by an inefficient memory
system.
Figure 9 illustrates memory in a simple microprocessor
system. In this system, the CPU outputs addresses to memory
and reads instructions and data from memory or writes data to
memory. The address space is completely undifferentiated:
instructions, data, and I/O devices are all treated the same. In
such a system, a primary limiting performance factor is memory
bandwidth.

[Figure 10 diagram, 2860 drw 10: the IDT79R3000A microprocessor connected to main memory by address and data paths.]

Figure 10. An IDT79R3000A System with a
High-Performance Memory System


• External Cache Memory-Local, high-speed memory
(called cache memory) is used to hold instructions and data
that are repetitively accessed by the CPU (for example,
within a program loop) and thus reduces the number of
references that must be made to the slower-speed main
memory. Some microprocessors provide a limited amount
of cache memory on the CPU chip itself. The external
caches supported by the IDT79R3000A can be much
larger; while a small cache can improve performance of
some programs, significant improvements for a wide range
of programs require large caches.
• Separate Caches for Data and Instructions-Even with
high-speed caches, memory speed can still be a limiting
factor because of the fast cycle time of a high-performance
microprocessor. The IDT79R3000A supports separate
caches for instructions and data and alternates accesses
of the two caches during each CPU cycle. Thus, the
processor can obtain data and instructions at the cycle rate
of the CPU using caches constructed with commercially
available IDT static RAM devices.
In order to maximize bandwidth in the cache while minimizing the requirement for SRAM access speed, the R3000A
divides a single processor clock cycle into two phases.
During one phase, the address for the data cache access
is presented while data previously addressed in the instruction cache is read; during the next phase, the data
operation is completed while the instruction cache is being
addressed. Thus, both caches are read in a single processor cycle using only one set of address and data pins.
• Write Buffer-In order to ensure data consistency, all data
that is written to the data cache must also be written out to
main memory. The cache write model used by the
IDT79R3000A is that of a write-through cache; that is, all
data written by the CPU is immediately written into the main
memory. To relieve the CPU of this responsibility (and the
inherent performance burden) the IDT79R3000A supports
an interface to a write buffer. The IDT79R3020 Write Buffer
captures data (and associated addresses) output by the
CPU and ensures that the data is passed on to main
memory.
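
The write-through model with a write buffer can be illustrated with a minimal Python sketch. It is a behavioral toy rather than a description of the IDT79R3020: the class name, the `buffer_depth` parameter and its default of 4, and the rule that a full buffer stalls the store are all assumptions made for this example.

```python
from collections import deque


class WriteThroughSystem:
    """Toy model: CPU stores update the cache and queue a copy for main memory."""

    def __init__(self, buffer_depth=4):  # depth is a hypothetical value
        self.cache = {}
        self.main_memory = {}
        self.write_buffer = deque()
        self.buffer_depth = buffer_depth

    def cpu_store(self, address, value):
        """Perform a store; return False if the CPU would have to stall (buffer full)."""
        if len(self.write_buffer) >= self.buffer_depth:
            return False
        self.cache[address] = value
        self.write_buffer.append((address, value))  # captured for main memory
        return True

    def memory_cycle(self):
        """Retire the oldest buffered write into main memory, if there is one."""
        if self.write_buffer:
            address, value = self.write_buffer.popleft()
            self.main_memory[address] = value


if __name__ == "__main__":
    system = WriteThroughSystem(buffer_depth=2)
    print(system.cpu_store(0x100, 1), system.cpu_store(0x104, 2))  # both accepted
    print(system.cpu_store(0x108, 3))                              # buffer full
    system.memory_cycle()
    print(system.cpu_store(0x108, 3), system.main_memory)
```

The point of the sketch is that the CPU continues after handing a store to the buffer, and main memory catches up at its own rate; every value in the cache eventually appears in main memory, which is the consistency property a write-through cache provides.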
IDT79R3000A Processor Subsystem Interfaces
Figure 11 illustrates the three subsystem interfaces provided by the IDT79R3000A processor:
• Cache control interface (on-chip) for separate data and
instruction caches permits implementation of off-chip caches
using standard IDT SRAM devices. The 79R3000A directly
controls the cache memory with a minimum of external
components. Both the instruction and data cache can vary
from 0 to 256K Bytes (64K entries). The 79R3000A also
includes the TAG control logic which determines whether
or not the entry read from the cache is the desired data. The
79R3000A cache controller implements a direct-mapped
cache for high net performance (bandwidth). It has the
ability to refill multiple words when a cache miss occurs,
thus reducing the effective miss rate to less than 2% for
large caches. When a cache miss occurs, the 79R3000A
can support refilling the cache in 1, 4, 8, 16, or 32 word
blocks to minimize the effective penalty of having to access
main memory. The 79R3000A also incorporates the ability
to perform instruction streaming; while the cache is refilling, the processor can resume execution once the missed
word is obtained from main memory. In this way, the
processor can continue to execute concurrently with the
cache block refill.
• Memory controller interface for system (main) memory.
This interface also includes the logic and signals to allow
operation with a write buffer to further improve memory
bandwidth. In addition to the standard full word access, the
memory controller supports the ability to write bytes and
half-words by using partial word operations. The memory
controller also supports the ability to retry memory accesses if, for example, the data returned from memory is
invalid and a bus error needs to be signalled.
• Coprocessor Interface-The IDT79R3000A features a
tightly coupled coprocessor interface in which all coprocessors maintain synchronization with the main processor; reside on the same data bus as the main processor; and participate in bus transactions in an identical
manner to the main processor. The IDT79R3000A generates all required cache and memory control signals, including cache and memory addresses for attached
coprocessors. As a result, only the data bus and a few
control signals need to be connected to a coprocessor.
The interface supports three types of coprocessor instructions: loads/stores, coprocessor operations, and processor-coprocessor transfers. Note that coprocessor loads
and stores occur directly between the coprocessor and
memory, without requiring the data to go through the CPU.
Synchronization between the CPU and external
coprocessors is achieved using a Phase-Locked Loop interface to the coprocessor. The coprocessor physical interface also includes coprocessor condition signals
(CpCond(n)), which are used in coprocessor branch instructions, and a coprocessor busy signal (CpBusy) which
is used to stall the CPU if the coprocessor needs to hold off
subsequent operations.
Finally, a precise exception interface is defined between
the CPU and coprocessors using the external interrupt
inputs of the CPU. This allows a coprocessor exception,
even if it was the result of a multi-cycle operation, to be
traced to the precise coprocessor operation which caused
it. This is an important feature for languages which can
define specific error handlers for each task.
The interface supports up to four separate coprocessors.
Coprocessor 0 is defined to be the system control
coprocessor, and resides on the same chip as the CPU
unit. Coprocessor 1 is the Floating Point Accelerator,
IDT79R3010A. Coprocessors 2 and 3 are available to support
an interface to application specific functions.
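
The direct-mapped lookup and multi-word refill described under the cache control interface can be sketched as follows. This is a simplified Python model under stated assumptions: the class and method names are invented, the 16384-entry size is chosen to match the 64 KB example of Figure 11, only tags are stored (no data, parity, or instruction streaming), and `None` stands in for an invalid entry.

```python
WORD_BYTES = 4


class DirectMappedCache:
    """Tag-only model of a direct-mapped cache with block refill on a miss."""

    def __init__(self, entries=16384, block_words=4):
        self.entries = entries          # one-word entries; 16384 words = 64 KB
        self.block_words = block_words  # words refilled per miss (1, 4, 8, 16 or 32)
        self.tags = [None] * entries    # None models an entry whose valid bit is clear

    def _split(self, address):
        """Split a byte address into (cache index, tag)."""
        word = address // WORD_BYTES
        return word % self.entries, word // self.entries

    def access(self, address):
        """Return True on a hit; on a miss, refill the aligned block and return False."""
        index, tag = self._split(address)
        if self.tags[index] == tag:
            return True
        first_word = (address // WORD_BYTES) // self.block_words * self.block_words
        for word in range(first_word, first_word + self.block_words):
            i, t = self._split(word * WORD_BYTES)
            self.tags[i] = t
        return False


if __name__ == "__main__":
    cache = DirectMappedCache()
    hits = [cache.access(0x1000 + 4 * n) for n in range(8)]
    print(hits)  # one miss per 4-word block, then hits within the block
```

With a 4-word refill, a sequential walk misses once per block and hits on the remaining three words, which illustrates how multi-word refill lowers the effective miss rate. Two addresses that are exactly one cache size apart share an index and evict each other, the usual cost of a direct-mapped organization.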


MULTIPROCESSING SUPPORT

The IDT79R3000A supports multiprocessing applications
in a simple but effective way. Multiprocessing applications
require cache coherency across the multiple processors. The
IDT79R3000A offers two signals to support cache coherency:
the first, MPStall, stalls the processor within two cycles of
being received and keeps it from accessing the cache. This
allows an external agent to snoop into the processor data
cache. The second signal, MPInvalidate, causes the processor to write data on the data cache bus which indicates the
externally addressed cache entry is invalid. Thus, a subsequent access to that location would result in a cache miss, and
the data would be obtained from main memory.
The two MP signals would be generated by external
logic which utilizes a secondary cache to perform bus snooping functions. The 79R3000A does not impose an architecture
for this secondary cache, but rather is flexible enough to
support a variety of application-specific architectures and still
maintain cache coherency. Further, there is no impact on
designs which do not require this feature. The 79R3000A has
further improved on the multiprocessor support found in the
79R3000, by allowing the use of cache RAMs with internal
address latches in multiprocessor systems.

PACKAGE THERMAL SPECIFICATIONS

The IDT79R3000A utilizes special packaging techniques
to improve both the thermal and electrical characteristics of
the microprocessor.
In order to improve the electrical characteristics of the
device, the package is constructed using multiple signal
planes, including individual power planes and ground planes
to reduce noise associated with high-frequency TTL parts. In
addition, the 175-pin PGA package utilizes extra power and
ground pins to reduce the inductance from the internal power
planes to the power planes of the PC Board.
In order to improve the thermal characteristics of the
microprocessor, the device is housed using cavity down
packaging. In addition, these packages incorporate a copper-tungsten thermal slug designed to efficiently transfer heat
from the die to the case of the package, and thus effectively
lower the thermal resistance of the package. The use of an
additional external heat sink affixed to the package thermal
slug further decreases the effective thermal resistance of the
package.
The case temperature may be measured in any environment to determine whether the device is within the specified
operating range. The case temperature should be measured
at the center of the top surface opposite the package cavity
(the package cavity is the side where the package lid is
mounted).
The equivalent allowable ambient temperature, TA, can
be calculated using the thermal resistance from case to
ambient (θca) for the given package. The following equation
relates ambient and case temperature:
TA = Tc - P × θca
where P is the maximum power consumption, calculated by
using the maximum Icc from the DC Electrical Characteristics
section.
Typical values for θca at various airflows are shown in
Table 4 for the various CPU packages.
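
As a worked illustration of the ambient temperature equation, the short Python sketch below evaluates it for one set of inputs. The function names are invented for this example. The inputs are illustrative: 750 mA is the 33.33 MHz commercial operating current limit from the DC table, 7 is the PGA case-to-ambient thermal resistance at 200 ft/min from Table 4 (taken here to be in °C/W), and both the 5.0 V supply used to turn current into power and the 90°C case temperature are assumptions chosen only to show the arithmetic.

```python
def max_power_w(icc_max_a, vcc_v):
    """Maximum power in watts from the maximum supply current and the supply voltage."""
    return icc_max_a * vcc_v


def allowable_ambient_c(t_case_c, power_w, theta_ca_c_per_w):
    """TA = Tc - P * theta_ca, with temperatures in degrees C."""
    return t_case_c - power_w * theta_ca_c_per_w


if __name__ == "__main__":
    power = max_power_w(0.750, 5.0)                # 3.75 W
    ambient = allowable_ambient_c(90.0, power, 7)  # 90 - 3.75 * 7
    print(f"P = {power:.2f} W, TA = {ambient:.2f} C")
```

For these inputs the power is 3.75 W and the equivalent ambient temperature is 63.75°C. Rerunning the calculation with the other airflow columns of Table 4 shows how strongly forced air raises the allowable ambient temperature for a fixed case temperature.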

ADVANCED FEATURES
The IDT79R3000A offers a number of additional features
such as the ability to swap the instruction and data caches,
facilitating diagnostics and cache flushing. Another feature
isolates the caches, which forces cache hits to occur regardless of the contents of the tag fields. The IDT79R3000A allows
the processor to execute user tasks of the opposite byte
ordering (endianness) of the operating system, and further
allows parity checking to be disabled. More details on these
features can be found in the IDT 79R3000A Family Hardware
User's Manual.
Further features of the IDT79R3000A are configured
during the last four cycles prior to the negation of the RESET
input. These functions include the ability to select cache sizes
and cache refill block sizes; the ability to utilize the multiprocessor interface; whether or not instruction streaming is
enabled; whether byte ordering follows "Big-Endian" or "Little-Endian" protocols, etc. Table 3 shows the configuration options selected at Reset. These are further discussed in the
"Hardware User's Manual".

Airflow (ft/min)                  0     200   400   600   800   1000
θca (175-PGA, 144-PGA) (°C/W)     21    7     3     2     1     0.5
θca (172 Quad Flatpack) (°C/W)    23    9     4     3     2.5   1.5
2860 tbl 03

Table 4. R3000A Package Characteristics

BACKWARD COMPATIBILITY WITH 79R2000

The IDT79R3000A can be used in sockets designed for
the 79R3000. The pin-out of the 79R3000A has been selected
to ensure this compatibility, with new functions mapped onto
previously unused pins. The instruction set is compatible with
that of the 79R2000 at the binary level. As a result, code
written for the older processor can be executed. New features
can be selectively disabled.
In most 79R3000 applications, the 79R3000A can be
placed in the socket with no modification to initialization
settings. Further application assistance on this topic is available from IDT.


Input   W Cycle           X Cycle           Y Cycle           Z Cycle
Int0    DBlkSize0         DBlkSize1         Extend Cache      Big Endian
Int1    IBlkSize0         IBlkSize1         MPAdrDisable      TriState
Int2    DispPar/RevEnd    IStream           IgnoreParity      NoCache
Int3    Reserved(1)       Store Partial     MultiProcessor    BusDriveOn
Int4    PhaseDelayOn(2)   PhaseDelayOn(2)   PhaseDelayOn(2)   PhaseDelayOn(2)
Int5    R3000 Mode(2)     R3000 Mode(2)     R3000 Mode(2)     R3000 Mode(2)
2860 tbl 04

NOTES:
1. Reserved entries must be driven high.
2. These values must be driven stable throughout the entire RESET period.
Table 3. R3000A Mode Selectable Features
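
A reset-time decode of Table 3 might be modeled as below. This Python sketch rests on several assumptions that should be checked against the Hardware User's Manual: that the four sampling cycles occur in the order W, X, Y, Z with the column contents shown in the lookup table, that bit n of each sample is the logic value seen on Int(n), and that signal polarity can be ignored. The function name and data layout are invented for the example.

```python
INT_PINS = 6

# Mode bit sampled on Int(0)..Int(5) in each cycle (assumed column assignment).
MODE_TABLE = {
    "W": ["DBlkSize0", "IBlkSize0", "DispPar/RevEnd", "Reserved",
          "PhaseDelayOn", "R3000 Mode"],
    "X": ["DBlkSize1", "IBlkSize1", "IStream", "Store Partial",
          "PhaseDelayOn", "R3000 Mode"],
    "Y": ["Extend Cache", "MPAdrDisable", "IgnoreParity", "MultiProcessor",
          "PhaseDelayOn", "R3000 Mode"],
    "Z": ["Big Endian", "TriState", "NoCache", "BusDriveOn",
          "PhaseDelayOn", "R3000 Mode"],
}

# Note 2 of Table 3: these inputs must hold one value for the whole reset period.
STABLE = {"PhaseDelayOn", "R3000 Mode"}


def decode_reset_modes(samples):
    """samples maps 'W'/'X'/'Y'/'Z' to a 6-bit integer; returns {mode name: bit}."""
    modes = {}
    for cycle in "WXYZ":
        word = samples[cycle]
        for pin in range(INT_PINS):
            name = MODE_TABLE[cycle][pin]
            bit = (word >> pin) & 1
            if name in STABLE and modes.get(name, bit) != bit:
                raise ValueError(f"{name} changed during reset")
            modes[name] = bit
    return modes


if __name__ == "__main__":
    sampled = {"W": 0b010001, "X": 0b010100, "Y": 0b010000, "Z": 0b010001}
    for name, bit in decode_reset_modes(sampled).items():
        print(f"{name:15s} {bit}")
```

The consistency check mirrors note 2 of the table: a sample set in which Int4 or Int5 changes between cycles is rejected rather than silently decoded.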

[Figure 11 block diagram, 2860 drw 11: the IDT79R3000A processor with system control coprocessor, connected to the instruction cache and data cache (each addressed through a transparent latch on the AdrLo bus and sharing the Tag and Data buses), the memory interface, the clock inputs, the coprocessors, and the hardware interrupts Int(5:0).]

Figure 11. IDT79R3000A Subsystem Interfaces Example; 64 KB Caches


PIN CONFIGURATION

[Pinout diagram, 2860 drw 12: pin positions are not reproduced here.]

172-Pin Flatpack (Top View)

NOTES:
1. Reserved pins must be connected.
2. AdrLo 16 and 17 are multifunction pins which are controlled by mode select programming on interrupt pins at reset time.
AdrLo 16: MP Invalidate, CpCond (2).
AdrLo 17: MP Stall, CpCond (3).


PIN CONFIGURATION
[Pinout diagram, 2860 drw 13: pin grid with rows A through Q and columns 1 through 15; pin positions are not reproduced here.]

175-Pin PGA (Top View)
NOTE:
1. AdrLo 16 and 17 are multifunction pins which are controlled by mode select programming on interrupt pins at reset time
AdrLo 16: MP Invalidate, CpCond (2).
AdrLo 17: MP Stall, CpCond (3).


PIN CONFIGURATION
[Pinout diagram, 2860 drw 14: pin grid with rows A through Q and columns 1 through 15; pin positions are not reproduced here.]

144-Pin PGA (Top View)
NOTE:
1. AdrLo 16 and 17 are multifunction pins which are controlled by mode select programming on interrupt pins at reset time.
AdrLo 16: MP Invalidate, CpCond (2).
AdrLo 17: MP Stall, CpCond (3).


PIN CONFIGURATION

[Pinout diagram, 2860 drw 15: pin positions are not reproduced here.]

160 Pin EIAJ Plastic Quad Flat Pack (Top Side View)

NOTE:
1. AdrLo 16 and 17 are multifunction pins which are controlled by mode select programming on interrupt pins at reset time
AdrLo 16: MP Invalidate, CpCond (2).
AdrLo 17: MP Stall, CpCond (3).


PIN DESCRIPTIONS
Pin Name        I/O   Description
Data (0-31)     I/O   A 32-bit bus used for all instruction and data transmission among the processor, caches, memory interface, and coprocessors.
DataP (0-3)     I/O   A 4-bit bus containing even parity over the data bus.
Tag (12-31)     I/O   A 20-bit bus used for transferring cache tags and high addresses between the processor, caches, and memory interface.
TagV            I/O   The tag validity indicator.
TagP (0-2)      I/O   A 3-bit bus containing even parity over the concatenation of TagV and Tag.
AdrLo (0-17)    O     An 18-bit bus containing byte addresses used for transferring low addresses from the processor to the caches and memory interface. (AdrLo 16: CpCond (2), AdrLo 17: CpCond (3) set by reset initialization).
IRd1            O     Read enable for the instruction cache.
IWr1            O     Write enable for the instruction cache.
IRd2            O     An identical copy of IRd1 used to split the load.
IWr2            O     An identical copy of IWr1 used to split the load.
IClk            O     The instruction cache address latch clock. This clock runs continuously.
DRd1            O     The read enable for the data cache.
DWr1            O     The write enable for the data cache.
DRd2            O     An identical copy of DRd1 used to split the load.
DWr2            O     An identical copy of DWr1 used to split the load.
DClk            O     The data cache address latch clock. This clock runs continuously.
XEn             O     The read enable for the Read Buffer.
AccTyp (0-2)    O     A 3-bit bus used to indicate the size of data being transferred on the data bus, whether or not a data transfer is occurring, and the purpose of the transfer.
MemWr           O     Signals the occurrence of a main memory write.
MemRd           O     Signals the occurrence of a main memory read.
BusError        I     Signals the occurrence of a bus error during a main memory read or write.
Run             O     Indicates whether the processor is in the run or stall state.
Exception       O     Indicates that the instruction about to commit state should be aborted and other exception related information.
SysOut          O     A reflection of the internal processor clock used to generate the system clock.
CpSync          O     A clock which is identical to SysOut and used by coprocessors for timing synchronization with the CPU.
RdBusy          I     The main memory read stall termination signal. In most system designs RdBusy is normally asserted and is deasserted only to indicate the successful completion of a memory read. RdBusy is sampled by the processor only during memory read stalls.
WrBusy          I     The main memory write stall initiation/termination signal.
CpBusy          I     The coprocessor busy stall initiation/termination signal.
CpCond (0-1)    I     A 2-bit bus used to transfer conditional branch status from the coprocessors to the main processor.
CpCond (2-3)    I     Conditional branch status from coprocessors to the processor. Function is provided on AdrLo 16/17 pins and is selected at reset time.
MPStall         I     Multiprocessing Stall. Signals to the processor that it should stall accesses to the caches in a multiprocessing environment. This is physically the same pin as CpCond3; its use is determined at RESET initialization.
MPInvalidate    I     Multiprocessing Invalidate. Signals to the processor that it should issue invalidate data on the cache data bus. The address to be invalidated is externally provided. This is the same pin as CpCond2; its use is determined at RESET initialization.
Int (0-5)       I     A 6-bit bus used by the memory interface and coprocessors to signal maskable interrupts to the processor. At reset time, mode select values are read in.
2860 tbl 05
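
The even parity carried on DataP can be illustrated with a short Python sketch. It assumes that each of the four DataP bits covers one byte of the 32-bit data bus, with DataP(n) covering byte n; the table above states only that DataP is a 4-bit even-parity bus over the data bus, so the byte assignment and the function names are assumptions.

```python
def even_parity_bit(value):
    """Parity bit that makes the total count of 1s (value plus parity bit) even."""
    return bin(value).count("1") & 1


def data_parity(word):
    """Four even-parity bits for a 32-bit word; element n covers byte n (assumed mapping)."""
    word &= 0xFFFFFFFF
    return [even_parity_bit((word >> (8 * n)) & 0xFF) for n in range(4)]


def parity_ok(word, parity_bits):
    """True if the supplied parity bits match the word."""
    return data_parity(word) == list(parity_bits)


if __name__ == "__main__":
    word = 0xFF0701FE
    bits = data_parity(word)
    print(bits, parity_ok(word, bits), parity_ok(word ^ 0x1, bits))
```

A single flipped data bit changes the parity of exactly one byte, so the final check in the example fails; this is the class of error that per-byte parity is able to detect.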


PIN DESCRIPTIONS (Continued)
Pin Name        I/O   Description
Clk2xSys        I     The master double frequency input clock used for generating SysOut.
Clk2xSmp        I     A double frequency clock input used to determine the sample point for data coming into the processor and coprocessors.
Clk2xRd         I     A double frequency clock input used to determine the enable time of the cache RAMs.
Clk2xPhi        I     A double frequency clock input used to determine the position of the internal phases, phase1 and phase2.
Reset           I     Synchronous initialization input used to force execution starting from the reset memory address. Reset must be deasserted synchronously but asserted asynchronously. The deassertion of Reset must be synchronized by the leading edge of SysOut.
2860 tbl 06

ABSOLUTE MAXIMUM RATINGS(1, 3)
Symbol   Rating                                 Commercial                                 Military              Unit
VTERM    Terminal Voltage with Respect to GND   -0.5 to +7.0                               -0.5 to +7.0          V
TA, TC   Operating Temperature                  0 to +70(4) (Ambient), 0 to +90(5) (Case)  -55 to +125 (Case)    °C
TBIAS    Case Temperature Under Bias            -55 to +125(4), 0 to +90(5)                -65 to +135           °C
TSTG     Storage Temperature                    -55 to +125                                -65 to +155           °C
VIN      Input Voltage                          -0.5 to +7.0                               -0.5 to +7.0          V
2860 tbl 07

NOTE:
1. Stresses greater than those listed under ABSOLUTE MAXIMUM RATINGS
may cause permanent damage to the device. This is a stress rating only
and functional operation of the device at these or any other conditions
above those indicated in the operational sections of this specification is not
implied. Exposure to absolute maximum rating conditions for extended
periods may affect reliability.
2. VIN minimum = -3.0V for pulse width less than 15ns.
VIN should not exceed Vcc +0.5 Volts.
3. Not more than one output should be shorted at a time. Duration of the short
should not exceed 30 seconds.
4. 16-33 MHz only.
5. 37-40 MHz only.

RECOMMENDED OPERATING TEMPERATURE AND SUPPLY VOLTAGE
Grade                   Temperature                GND   Vcc
Military 16-33 MHz      -55°C to +125°C (Case)     0V    5.0 ±10%
Commercial 16-33 MHz    0°C to +70°C (Ambient)     0V    5.0 ±5%
Commercial 37-40 MHz    0°C to +90°C (Case)        0V    5.0 ±5%

OUTPUT LOADING FOR AC TESTING
[Load circuit diagram, 2860 drw 16: test load connected to the device under test.]

AC TEST CONDITIONS
Symbol   Parameter            Min.   Max.   Unit
VIH      Input HIGH Voltage   3.0    -      V
VIL      Input LOW Voltage    -      0.4    V
VIHS     Input HIGH Voltage   3.5    -      V
VILS     Input LOW Voltage    -      0.4    V
2860 tbl 08


DC ELECTRICAL CHARACTERISTICS
COMMERCIAL TEMPERATURE RANGE (TA = 0°C to +70°C, VCC = +5.0V ±5%)
79R3000A, 79R3000AE

                                                              16.67MHz      20.0MHz       25.0MHz       33.33MHz
Symbol   Parameter                   Test Conditions          Min.   Max.   Min.   Max.   Min.   Max.   Min.   Max.   Unit
VOH      Output HIGH Voltage         VCC = Min., IOH = -4mA   3.5    -      3.5    -      3.5    -      3.5    -      V
VOL      Output LOW Voltage          VCC = Min., IOL = 4mA    -      0.4    -      0.4    -      0.4    -      0.4    V
VOHC     Output HIGH Voltage(7)      VCC = Min., IOH = -4mA   4.0    -      4.0    -      4.0    -      4.0    -      V
VOHT     Output HIGH Voltage(4,6)    VCC = Min., IOH = -8mA   2.4    -      2.4    -      2.4    -      2.4    -      V
VOLT     Output LOW Voltage(4,6)     VCC = Min., IOL = 8mA    -      0.8    -      0.8    -      0.8    -      0.8    V
VIH      Input HIGH Voltage(5)                                2.0    -      2.0    -      2.0    -      2.0    -      V
VIL      Input LOW Voltage(1)                                 -      0.8    -      0.8    -      0.8    -      0.8    V
VIHS     Input HIGH Voltage(2,5)                              3.0    -      3.0    -      3.0    -      3.0    -      V
VILS     Input LOW Voltage(1,2)                               -      0.4    -      0.4    -      0.4    -      0.4    V
CIN      Input Capacitance(6)                                 -      10     -      10     -      10     -      10     pF
COUT     Output Capacitance(6)                                -      10     -      10     -      10     -      10     pF
ICC      Operating Current           VCC = 5V, TA = 70°C      -      450    -      550    -      650    -      750    mA
IIH      Input HIGH Leakage(3)       VIH = VCC                -      100    -      100    -      100    -      100    µA
IIL      Input LOW Leakage(3)        VIL = GND                -100   -      -100   -      -100   -      -100   -      µA
IOZ      Output Tri-state Leakage    VOH = VCC, VOL = GND     -100   100    -100   100    -100   100    -100   100    µA
2860 tbl 10

NOTES:
1. VIL Min. = -3.0V for pulse width less than 15ns. VIL should not fall below -0.5 Volts for larger periods.
2. VIHS and VILS apply to Clk2xSys, Clk2xSmp, Clk2xRd, Clk2xPhi, CpBusy, and Reset.
3. These parameters do not apply to the clock inputs.
4. VOHT and VOLT apply to the bidirectional data and tag busses only. Note that VIH and VIL also apply to these signals. VOHT and VOLT are provided
to give the designer further information about these specific signals.
5. VIH should not be held above VCC + 0.5 volts.
6. Guaranteed by design.
7. VOHC applies to RUN and Exception.


DC ELECTRICAL CHARACTERISTICS'MILITARY TEMPERATURE RANGE (TC = -55°C to +125°C, VCC = +5.0V ±10%)
79R3000AE

79R3000A
16.67MHz
Symbol

Test Conditions

Parameter

Min.

Vee .. Min., IOH = -4mA

Max.

20.0MHz
Max.

Min.
3.5

3.5

Min.

Output HIGH Voltage

VOL

Output LOW Voltage

Vee .. Min., IOL = 4mA

VOHC

Output HIGH Voltage(7)

Vee .. Min., IOH = -4mA

4.0

4.0

4.0

VOHT

Output HIGH Voltage(4,6)

Vec = Min., IOH = -BmA

2.4

2.4

2.4

VOLT

Output LOW Voltage(4.6)

Vee = Min., IOL = BmA

VIH

Input HIGH Voltage(5)

VIL

Input LOW Voltage(1)

VIHS

Input HIGH Voltage(2.5)

VILS

Input LOW Voltage(1,2)

0.4

CIN

Input Capacitance(6)

10

0.4

O.B
2.0

Output Capacitance(6)

Icc

Operating Current

Vee = SV, TA = 70°C

IIH

Input HIGH Leakage(3)

VIH= VCC

ilL

Input LOW Leakage(3)

VIL

loz

Output Tri-state Leakage

VOH = VCC, VOL = GND

.':'.

7~P'

3~9.

V

..:......,.....-

-';:::::::::::;.:..-

I::t~:h
::::4:::::::' t·::::::,;",'
&'~l.:,,:.:: :::::'-

10 . :::::::4&::::" .,,( 10
·::t~~::.-

-100

"::4::..

-100

-100

100

-100

600
100

V

:'::\:,:~.4

V
O.B

V

O.B

V

0.4

0.4

V

10

10

pF

V

v

3.0

10

10

pF

6S0

750

mA

100

IlA

100

IlA _ _

100

-100

V

2.0

-100
100

0.4

;::::'4~R

.<: I>:.:O;;~··

''':t::::1p/'

S()(g :}\4.,:::."::'

= GND

-:j~~~::"

I·::::':::"';':::..

3.0

:::::::10.9-

Max. Unit

O.B ::: ::::::::24:::::::· I:':' O.B

O.B

COUT

Min.

0.4 ,.1:::::::::#1::::::::

0.4

O.B

3.0

Max.

3.5

VOH

2.0

33.33MHz

25.0MHz

-100
100

-100

2860tblll~

NOTES:
1. VIL Min. = -3.0V for pulse width less than 15ns. VIL should not fall below -0.5 volts for longer periods.
2. VIHS and VILS apply to Clk2xSys, Clk2xSmp, Clk2xRd, Clk2xPhi, CpBusy, and Reset.
3. These parameters do not apply to the clock inputs.
4. VOHT and VOLT apply to the bidirectional data and tag busses only. Note that VIH and VIL also apply to these signals. VOHT and VOLT are provided
to give the designer further information about these specific signals.
5. VIH should not be held above Vcc + 0.5 volts.
6. Guaranteed by design.
7. VOHC applies to RUN and Exception.


DC ELECTRICAL CHARACTERISTICS - COMMERCIAL TEMPERATURE RANGE (TC = 0°C to +90°C, Vcc = +5.0V ±5%)
79R3000AE
40.0MHz

37.0MHz
Symbol

Parameter

Test Conditions

Min.

Max.

Min.

VOH

Output HIGH Voltage

Vee .. Min., IOH .. -4mA

3.5

-

3.5

VOL

Output LOW Voltage

Vee .. Min., IOL .. 4mA

-

0.4

-

VOHe

Output HIGH Voltage(7)

Vee - Min., IOH .. -4mA

4.0

VOHT

Output HIGH Voltage(4.6)

Vee .. Min., IOH = -SmA

2.4

-

VOLT

Output LOW Voltage(4.6)

Vee .. Min., IOL = SmA

-

O.S

VIH

Input HIGH Voltage(5)

2.0

-

VIL

Input LOW Voltage(1)

-

VIHS

Input HIGH Voltage(2,5)

3.0

O.S ..::t::::~:::.. :::~::,t::;;,::':<";:'::}'
.:·:::;t~~::::::;:'::;:~ :,::.
3.0

VILS

Input LOW Voltage(1 ,2)

-

. ::/::Q+:i..

CIN

Input Capacitance(6)

-

·t::::::::::,:{l9,;;»
.::({ :':':::~::\::,::::::::':Jb

-

COUT

Output Capacitance(6)

lee

Operating Current

Vee .. 5V, TA _ 70°C

:::/::':::\':;;< ith .
100
~~~~~::::::::::.

IIH

Input HIGH Leakage(3)

VIH .. VCC

IlL

Input LOW Leakage(3)

VIL .. GND

-100

loz

Output Tri-state Leakage

VOH = VCC, VOL = GND

-100

'.'

Max.
...

Unit

\t.:;:::. -

V

.,::::~~:::: ·~:;::·:·:·:·::::::'O.4

V

"'::",,::'.;:::g3:::::::~);;:;:::';:::
4.9. ..:\},:(~

.;:::::::;:;::.:.::§;:;:::.\}

.{

4:;;:::::·

·::::t:::;;;::'::'4,P

V
V

O.S

V

-

V

O.S

V

-

V

-

0.4

V

10

pF

10

pF

-

850

mA

-

100

IlA
IlA

-

-100

-

100

-100

100

IlA
2B60 tbl 12

NOTES:
1. VIL Min. = -3.0V for pulse width less than 15ns. VIL should not fall below -0.5 volts for longer periods.
2. VIHS and VILS apply to Clk2xSys, Clk2xSmp, Clk2xRd, Clk2xPhi, CpBusy, and Reset.
3. These parameters do not apply to the clock inputs.
4. VOHT and VOLT apply to the bidirectional data and tag busses only. Note that VIH and VIL also apply to these signals. VOHT and VOLT are provided
to give the designer further information about these specific signals.
5. VIH should not be held above Vcc + 0.5 volts.
6. Guaranteed by design.
7. VOHC applies to RUN and Exception.


AC ELECTRICAL CHARACTERISTICS(1,2,3) - COMMERCIAL TEMPERATURE RANGE (TA = 0°C to +70°C, Vcc = +5.0V ±5%)

Symbol

Parameter

Test Conditions

79R3000A
16.67MHz
20.0MHz
Min. Max. Min. Max.

79R3000AE
33.33MHz

25.0MHz
Min. Max.

Min.

Max.

Unit

Clock
TCkHigh Input Clock High(2)

Note 7

12.5

-

10

-

8

TCkLow Input Clock Low(2)
TCkP Input Clock Period(2)
Clk2xSys to Clk2xSmp(6)
Clk2xSmp to Clk2xRd(6)
Clk2xSmp to Clk2xPhi(6)

Note 7

12.5
30
0
0
9

-

-

500
tcyc/4
tcyc/4
tcyc/4

10
25
0
0
7

500
tcyc/4
tcyc/4
tcyc/4

8
20
0
0
5

-

6
6
15
0
0
3.5

-

ns

500
tcyc/4
tcyc/4
tcyc/4

ns
ns
ns
ns
ns

-1.5

ns

-0.5

ns

2

-

2

ns

3

-

2

ns

4.5

ns

-2.5

-

ns

7

-

ns

-2.5

-

-2.5

-

ns

6

-

5

-

3.5

ns

-

8.5

ns

500
tcyc/4
tcyc/4
tcyc/4

Run Operation
TDEn

Data Enable(3)

-

-2

-

-2

TDDis

Data Disable(3)

-

-1

-1

TDVal

Data Valid

Load= 25pF

3

TWrDly

Write

Load= 25pF

-

5

-

TDS

Data Set-up

9

-

8

TDH

Data Hold(3)

-2.5

-

-2.5

TCBS

CpBusy Set-up

13

11

TCBH

CpBusy Hold

-

-

TAcTy

Access Type (1 :0)

Load= 25pF

TAT2

Access Type (2)

Load= 25pF

TMWr

Memory Write

TExc

Exception

Delay

TAval

Address Valid

TintS

Int(n) Set-up

TlntH

Int(n) Hold

-2.5

-

17

Load= 25pF

-

27

Load= 25pF

-

7

Load= 25pF

-

2

9

-

-2.5

7

-2.5

3
4

6
-2.5
9

-1.5
-0.5

-

14

-

12

23

18

-

9.5

ns

5

-

3.5

ns

2

-

1

ns

8

-

6

4.5

-2.5

-

-2.5

-

7

1.5

-

-2.5

ns
ns

Stall Operation
TSAVal Address Valid

Load= 25pF

-

30

-

23

-

20

-

15

ns

TSAcTy Access Type

Load= 25pF

-

27

-

23

-

18

-

13.5

ns

TMRdi

Memory Read Initiate

Load= 25pF

TMRdt

Memory Read Terminate

Load= 25pF

1

27

-

27

1

23

-

23

1

18

-

18

1

13.5

ns

-

10

ns

7.5

ns

3

ns

Tstl

Run Terminate

Load= 25pF

3

17

3

15

3

10

2

TRun

Run Initiate

Load= 25pF

-

7

-

6

-

4

-

TSMWr

Memory Write

Load= 25pF

3

27

3

23

3

18

TSExc

Exception Valid

Load= 25pF

-

15

-

13

-

10

6

6
3000

-

3000

128

-

2

9.5

ns

7.5

ns

Tcyc

128

-

0

1

ns/25pF

-

Reset Initialization
TrstPll Reset timing, Phase-lock on(4,5)

3000

-

3000

Trstcp Reset timing, Phase·lock off(4,5)

128

-

128

-

0.5

2

0.5

1

TRST

Reset Pulse Width

6

6

Tcyc
Tcyc

Capacitive Load Deration
CLD

Load Derate(6)

0.5

1

2860 tbl 13

NOTES:
1. All timings are referenced to 1.5V.
2. The clock parameters apply to all four 2xClocks: Clk2xSys, Clk2xSmp, Clk2xRd, and Clk2xPhi.
3. This parameter is guaranteed by design.
4. These parameters apply when the 79R3010 Floating Point Coprocessor is connected to the CPU. With phase lock on, Reset must be asserted
for the longer of 3000 clock cycles or 200 microseconds.
5. Tcyc is one CPU clock cycle (two cycles of a 2x clock).
6. With the exception of the Run signal, no two signals on a given device will derate for a given load by a difference greater than 15%.
7. Clock transition time < 2.5ns for 33.33MHz; clock transition time < 5ns for other speeds.
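
As a rough illustration of note 4, the minimum Reset assertion time with phase lock on can be worked out from the CPU clock frequency. The sketch below is not part of the data sheet: the function name is hypothetical, and it assumes that the "3000 clock cycles" in the note are CPU cycles (Tcyc, note 5). Only the 3000-cycle and 200-microsecond figures come from the note itself.

```python
def min_reset_time_us(cpu_clock_mhz, cycles=3000, floor_us=200.0):
    """Return the minimum Reset assertion time in microseconds.

    Note 4: with phase lock on, Reset must be asserted for the longer
    of 3000 clock cycles or 200 microseconds. At f MHz one CPU cycle
    lasts 1/f microseconds (assumed to be the Tcyc of note 5).
    """
    cycle_time_us = 1.0 / cpu_clock_mhz
    return max(cycles * cycle_time_us, floor_us)


if __name__ == "__main__":
    # Speed grades listed in the table header above.
    for mhz in (16.67, 20.0, 25.0, 33.33):
        print(f"{mhz:6.2f} MHz -> {min_reset_time_us(mhz):.1f} us")
```

Under this assumption, 3000 cycles take less than 200 microseconds at every listed speed grade, so the 200-microsecond floor is the governing limit.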


AC ELECTRICAL CHARACTERISTICS(1,2,3) - MILITARY TEMPERATURE RANGE (TC = -55°C to +125°C, Vcc = +5.0V ±10%)
79R3000A
16.67MHz
20.0MHz

Symbol
Clock

Parameter

'Test Conditions

Min.

Max.

Min.

Max.

79R3000AE
25.0MHz
33.33MHz

Min.

Max.

Min.

Max.

TCkHigh Input Clock High(2)

Note 7

12.5

10

TCkLow Input Clock Low(2)
TCkP
Input Clock Periodlil:)
Clk2xSys to Clk2XSmp(6)
CIk2xSmp to CIk2xRd(6)
Clk2xSmp to Clk2xPhi(6)

Note 7

12.5
30

10
25

TOVal

Load .. 2SpF

3

3

3

Load= 25pF

5

4

3 ::::::[T[::::;;l$;:[:[::: :t::::::@t '

Data Valid

TWrDly Write Delay

o

o
9

SOO
tcyc/4
tcyc/4
tcyc/4

o

o
7

9

6
6

8
8
500
tcyc/4
tcyc/4
tcvc/4

20

o

o

5

6

8

SOO
tcyc/4
tcyc/4
tcyc/4

15
0
0
3.5

,:,:,l.,.,

,':',

Unit
ns

ns
500::::: \:,' ns
tcyc/4{f\:ns
.t~~ itt ll.§::,:.,
tcyc/4
ns ()

ns

':"::;:;;2::':::::::

-

ns

::;::::1~§[J:

Tos

Data Set-up

ns

TAT2

Access Type (2)

Load .. 2SpF

17

14::\:::::='::::::::[:[[:m:t:,:.:

8.5

ns

TMWr

Memory Write

Load .. 25pF

27

23 "I:::r:i5:::r:::;::::,1 'Ef:::;:

9.S

ns

3.5

ns
ns

TlntH

Int(n) Hold

-2.5

-2.5·"::sr::i>1

3:,:;:::

4.5

ns

-2.5

ns

Stall O p e r a t i o n " : : : : : : : : : : : : : : : : "

. ,:,

r-~------------~------~~--~~~'
~~~"~--~~~--~

t-=T:-sA_v_a_1t-A:-d_d_re_s~s=V_a_li_d_ _ _ _ _ _-+-:-L_o_ad-:-._2s...:p_F-:--_
.
_+-_-+-_3__
0_,±,~",.,.,:,.,'
.....
":,,, 23/::/;',
20
1S
ns
TSAcTy Access Type
Load= 2SpF
27.::f;=:· _4.iif7::::;:;tZ:::-.l3(*r-_-+-1:-:8:-ir--+-:-13=-.-=s+-n-s-i
TMRdi

Memory Read Initiate

TSMWr Memory Write

2,t1/' ,,t:rr zf"

Load= 25pF

Load= 25pF

3:ff:::

2Z:?':'f:

23

18

3

18

2

13.S

ns

9.5

ns

r-T_SE_X_c~E_x_c_e~pt_io_n_V_a_li_d_ _ _ _ _ _~L_o_ad_=_2S~P_F_ _ _~,,:~~:~~:~~
~~~~~1_3-L--~-10-L--~_7_._5~-n_s~
Reset Initialization:::::::::::::::::':':':::::::::::::'::::::::::::?

TRST
Reset Pulse Width
~@
6
6
6
Tcyc
t--+---~:-+---~
H---~+---+-~~~~
TrstPLL Reset timing, Phase-lock on(4.5)
\:;';" 3000
3000
3000
Tcyc
Trstcp Reset timing, Phase-lock off(4.5)

128::::{i~

128

128

128

Tcyc

2

O.S

0.5

o

ns/25pF

Capacitive Load Deration
CLD

Load Derate(6)

0.5

2860 Ibl 14

NOTES:
1. All timings are referenced to 1.5V.
2. The clock parameters apply to all four 2xClocks: Clk2xSys, Clk2xSmp, Clk2xRd, and Clk2xPhi.
3. This parameter is guaranteed by design.
4. These parameters apply when the 79R3010 Floating Point Coprocessor is connected to the CPU. With phase lock on, Reset must be asserted
for the longer of 3000 clock cycles or 200 microseconds.
5. Tcyc is one CPU clock cycle (two cycles of a 2x clock).
6. With the exception of the Run signal, no two signals on a given device will derate for a given load by a difference greater than 15%.
7. Clock transition time < 2.5ns for 33.33MHz; clock transition time < 5ns for other speeds.


AC ELECTRICAL CHARACTERISTICS(1,2,3) - COMMERCIAL TEMPERATURE RANGE (TC = 0°C to +90°C, Vcc = +5.0V ±5%)

Parameter

Test Conditions

Unit

2860 tbl 15

NOTES:
1. All timings are referenced to 1.5V.
2. The clock parameters apply to all four 2xClocks: Clk2xSys, Clk2xSmp, Clk2xRd, and Clk2xPhi.
3. This parameter is guaranteed by design.
4. These parameters apply when the 79R3010 Floating Point Coprocessor is connected to the CPU. With phase lock on, Reset must be asserted
for the longer of 3000 clock cycles or 200 microseconds.
5. Tcyc is one CPU clock cycle (two cycles of a 2x clock).
6. With the exception of the Run signal, no two signals on a given device will derate for a given load by a difference greater than 15%.
7. Clock transition time < 2.5ns.


[Figure 12 timing diagram: input clocks Clk2xSys, Clk2xSmp, Clk2xRd, and Clk2xPhi]
2860 drw 17

Figure 12. Input Clock Timing

[Figure 13 timing diagram: reference clocks SmpOut*, RdOut*, and PhiOut*]
2860 drw 18

Figure 13. Processor Reference Clock Timing

* These signals are not actually output from the processor.
They are drawn to provide a reference for other timing diagrams.


[Figure 14 timing diagram: Phase, SysOut, PhiOut, AddrLo, AccTyp 0:1 (Size of Load Data), AccTyp 2, Data and Tag Buses (D Bus Input), IClk, and DClk]
2860 drw 19

Figure 14. Synchronous Memory (Cache) Timing


[Figure 15 timing diagram: cycle sequence RUN, STALL, STALL, FIXUP, RUN; signals Phase, AddrLo, Tag (Address High), AccTyp 0:1, AccTyp 2, and Data (Output)]
2860 drw 20

Figure 15. Memory Write Timing


[Figure 16 timing diagram: cycle sequence RUN, STALL, STALL, FIXUP, RUN; signals Phase, AddrLo, Tag (Address High), AccTyp 0:1, AccTyp 2, Data (Input), RdBusy, and CpCond0]
2860 drw 21

Figure 16. Memory Read Timing


[Figure 17 timing diagram: Coprocessor Load and Coprocessor Store cycles; signals Phase, CpBusy, and CpCond(n)]
2860 drw 22

Figure 17. Coprocessor Load/Store Timing


[Figure 18 timing diagram]
2860 drw 23

Figure 18. Interrupt Timing

[Figure 19 timing diagram: Phase, Mode, and the TdH interval]
2860 drw 24

Figure 19. Mode Vector Initialization
NOTES:
1. Reset must be negated synchronously; however, it should be asserted asynchronously. Designs must not rely on the proper functioning of SysOut prior
to the assertion of Reset.
2. If Phase-Lock On or R3000 Mode are asserted as mode select options, they should be asserted throughout the Reset period, to ensure that the slowest
coprocessor in the system has sufficient time to lock to the CPU clocks.
3. Reset is actually sampled in both Phase 1 and Phase 2. To ensure proper initialization, it must be negated relative to the end of Phase 1.


ORDERING INFORMATION
IDT   XXXXX         XX      X         X
      Device Type   Speed   Package   Process/Temperature Range

Process/Temperature Range
  Blank   Commercial (0°C to +70°C)
  B       Military (-55°C to +125°C), Compliant to MIL-STD-883, Class B
  M       Military Temperature Range Only

Package
  GD175   175-Pin PGA (Cavity Down)
  GD144   144-Pin PGA (Cavity Down)
  F       172-Pin Flat Pack (Cavity Down)
          160-Pin Plastic Quad Flat

Speed
  16      16.67 MHz
  20      20.0 MHz
  25      25.0 MHz
  33      33.33 MHz
  37      37.0 MHz
  40      40.0 MHz

Device Type
  79R3000A    RISC CPU Processor
  79R3000AE   Enhanced Timing Version
2860 drw 25

VALID COMBINATIONS
IDT   79R3000A  - 16, 20           All packages
      79R3000A  - 16, 20 B, M      GD144, GD175, F
      79R3000AE - 25, 33 B, M      GD144, GD175, F
      79R3000AE - 25, 33, 37, 40   GD144, GD175, F

AE2860-0


IDT79R3001

RISController™
CPU FOR HIGH-PERFORMANCE
EMBEDDED SYSTEMS

Integrated Device Technology, Inc.

FEATURES:
• Enhanced Instruction Set compatible version of
  IDT79R3000 RISC CPU
• Achieves high performance with reduced parts count
  and lower overall system cost
• Flexible on-chip cache controller supports various
  cache and main memory sizes
• Supports optional data parity with parity error output
  signal
• Works with IDT79R3010 RISC Floating-Point
  Coprocessor
• DMA interface support
• Large synchronous memory space for real-time systems
• Full 32-bit operations - 32-bit registers, 32-bit address
  and data interface
• On-chip memory management unit with 64 fully associative
  TLB entries maps 4 Gbyte virtual address space
• High-speed interrupt response (6 interrupt input pins)
  with precise exception capability
• High-speed CEMOS™ technology results in speeds
  from 12.5 to 40MHz
• Supports caches from 8 Kbytes to 16 Mbytes
• Independent block refill sizes for the instruction and data
  caches
• Concurrent cache refill and execution
• Works on 8-, 16- and 32-bit data
• Supports unaligned 32-bit data
• Optimizing compilers for C, Ada, Pascal, Fortran
• RTOS support for C or Ada environments

DESCRIPTION:
The IDT79R3001 brings the high performance inherent in
the IDT79R3000 RISC Microprocessor to lower cost systems.
It does this while maintaining full (both User and Kernel)
software compatibility with both the IDT79R2000A and
IDT79R3000 RISC Microprocessors.
The IDT79R3001 achieves lower system cost by reducing
the number of components required to construct a synchronous
memory (or cache) external to the processor and by
simplifying the asynchronous memory interface. By removing
the requirement for parity and allowing the system designer to
select the cache organization which best suits the system,
overall parts count is dramatically reduced while maintaining
high performance.

FUNCTIONAL BLOCK DIAGRAM
[Block diagram: Master Pipeline/Bus Control; CP0 (System Control Coprocessor) with Exception/Control Registers, Memory Management Unit Registers, Local Control Logic, and Translation Lookaside Buffer (64 entries); CPU with General Registers (32x32), ALU, Shifter, Multiplier/Divider, Address Adder, and PC Increment/Mux; Virtual Page Number/Virtual Address path; external buses TAG (19), Data (32 + 4), and ADDRESS (24)]
2873 drw 01

CEMOS and RISController are trademarks of Integrated Device Technology, Inc.

MILITARY AND COMMERCIAL TEMPERATURE RANGES
©1990 Integrated Device Technology, Inc.

5.2

DECEMBER 1990
DSC-90621-
The IDT79R3001 RISC Microprocessor extends the ability
of the IDT79R3000 family to support embedded and cost
sensitive applications. Its level of integration and flexibility
allows high-performance systems to be constructed at reasonable
cost in a straightforward manner, without forcing the
system designer to support features not required in his application.
The IDT79R3001 consists of two tightly coupled processors
integrated on a single chip. The first processor is a full 32-bit
CPU based on RISC principles to achieve a new standard of
performance in microprocessor based systems. The second
processor is a system control coprocessor, called CP0,
containing a fully associative 64-entry TLB (Translation
Lookaside Buffer), MMU (Memory Management Unit), and
control registers, supporting a 4 Gigabyte virtual memory
subsystem and a Harvard Architecture Synchronous Memory/
Cache controller which achieves ultra-high bandwidth using
industry standard SRAM devices.
This data sheet provides an overview of the features and
architecture of the IDT79R3001 CPU. A more detailed
description of the operation and timing of this device is
incorporated in the "IDT79R3001 Hardware User's Guide,"
and a detailed architectural overview is provided in the "mips
RISC Architecture" book, both available from IDT. Further
literature describing the hardware, software, and development
tools for the IDT79R3001 is also available from IDT.

HARDWARE OVERVIEW
The IDT79R3001 is a high-performance RISC microprocessor
incorporating a fast execution engine and a sophisticated
yet flexible memory interface designed to support the
processor bandwidth requirements at minimal system cost.
Execution Engine
The IDT79R3001 contains the same basic execution engine
as the ultra-high performance IDT79R3000 and thus achieves
over 28 MIPS performance at 33 MHz.
The key to the performance of the processor is the instruction
pipeline, illustrated in Figure 2. The execution of a single
IDT79R3001 instruction consists of five primary steps, some
of which may be broken down further into smaller subsets.
The five primary stages of the pipeline, each of which
requires approximately one CPU cycle, are:

IF    Instruction Fetch, when the processor fetches the
      instruction from the Instruction Synchronous
      Memory.

RD    Read required operands from on-chip register file
      while decoding the instruction.

ALU   Perform the required operation on instruction
      operands.

MEM   Access data memory (load or store).

WB    Write results back to register file.

[Figure 2 diagram: the five stages IF (I-Memory), RD (Reg. File), ALU (Operation), MEM (D-Memory), and WB (Write Back), each occupying one cycle]
2873 drw 02

Figure 2. IDT79R3001 Five-Stage Pipeline

Thus, the CPU achieves an average execution rate
approaching one instruction per CPU cycle, since the execution
of five instructions at a time is overlapped within the
processor (Figure 3). Optimizing compiler technology fully
comprehends the interaction of software with the various
pipeline resources, and serves both to eliminate any potential
pipeline conflicts which might arise and to maximize instruction
throughput.
The IDT79R3001 Memory Interfaces
The key to achieving the inherent performance of the
IDT79R3001 is to design a memory subsystem capable of
providing a new instruction to the processor on almost every
clock cycle.
Like the IDT79R3000, the IDT79R3001 supports a hierarchical
view of the memory subsystem. However, the
IDT79R3001 allows the system designer to make more trade-offs
in the partitioning and architecture of the various levels in
order to more completely meet the needs of certain types of
applications.
The IDT79R3001 supports two classifications of external
memory: synchronous and asynchronous. The Harvard Architecture
(separate instruction and data memories) synchronous
memory allows the processor to achieve the highest
levels of performance. The processor is able to obtain both an
instruction and data word from the synchronous memory on
every clock cycle, resulting in high instruction and data
throughput.

[Figure 3 diagram: five successive instructions (Instruction Flow), each passing through IF, RD, ALU, MEM, and WB one cycle behind the previous one, so that five instructions are in progress during the Current CPU Cycle]
2873 drw 03

Figure 3. Instruction Execution in IDT79R3001 Pipeline
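
The overlap shown in Figure 3 can be sketched in a few lines of Python. This is an idealized model, not taken from the data sheet: it assumes every stage takes exactly one cycle and that no stalls or pipeline conflicts occur, and the function names are hypothetical.

```python
STAGES = ["IF", "RD", "ALU", "MEM", "WB"]


def pipeline_chart(num_instructions):
    """Return one row per instruction; row[c] is the stage occupied in cycle c."""
    total_cycles = num_instructions + len(STAGES) - 1
    chart = []
    for i in range(num_instructions):
        row = [""] * total_cycles
        for s, stage in enumerate(STAGES):
            row[i + s] = stage  # instruction i enters IF in cycle i
        chart.append(row)
    return chart


def cycles_per_instruction(num_instructions):
    """Average cycles per instruction for an ideal, stall-free run."""
    return (num_instructions + len(STAGES) - 1) / num_instructions


if __name__ == "__main__":
    for row in pipeline_chart(5):
        print(" ".join(f"{stage:>3}" for stage in row))
    print(cycles_per_instruction(1000))
```

In this model the fifth cycle has all five stages busy with different instructions, and the average cost per instruction approaches one cycle as the instruction count grows, which is the behavior the text describes.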


The asynchronous memory space contains larger, slower
memory devices such as EPROM, main memory DRAMs,
and peripheral devices. Multiple clock cycles are required for
data movement in the asynchronous memory.
Many systems implement a memory hierarchy between
these two memory spaces, whereby the synchronous memory
space is used as processor caches and the asynchronous
memory space is used for main memory. The IDT79R3001
integrates a flexible direct-mapped cache controller on-chip,
eliminating external cache control logic and minimizing
cache management overhead. If the synchronous memory
space is used for processor caches, then cache "misses" will
cause the processor to automatically process an asynchronous
memory transfer to refill the cache.
The key to achieving the system cost and performance
goals of an IDT79R3001-based system is to partition the
memory system to the needs of the application.

Synchronous Memory System
As with any high-performance processor, the IDT79R3001
requires high bandwidth to achieve high performance. Thus,
it is important that the majority of its execution occur in the
synchronous memory space. In applications which require
substantial amounts of main memory, this memory space will
be implemented as instruction and data caches.
The synchronous memory is designed to be able to supply
both an instruction and data word to the processor on each
clock cycle. When the synchronous memory spaces are used
as caches, they hold instructions and data that
are repeatedly accessed by the CPU (for example, within a
program loop). This reduces the number of slower asynchronous
memory cycles and thus achieves higher performance.
Some microprocessors incorporate small amounts of cache
on-chip, which has a very small and unpredictable effect on
the execution of large programs. The IDT79R3001 supports
caches from 8KB up through 16MB, thus bringing
substantial performance improvements to very large programs
and also allowing real-time system designers to design
cache-based systems to support deterministic requirements.
The IDT79R3001 directly controls the synchronous memory
interface (whether it is being used as caches or not) with a
minimum of external components. The IDT79R3001 includes
all control signals and cache TAG control logic (for a direct
mapped cache) for the synchronous memory interfaces. Parity
over the data portion of each synchronous memory can be
optionally selected at RESET time for applications which
desire to make this cost trade-off.

[Figure 4 timing diagram: AddrLo, DClk, IClk, IRd, DRd, DWr, and the Data and TAG Buses (Instr. RAM, Data RAM, CPU Data Pins) across alternating phases 2 and 1 for instruction reads, a data read, and a data store]
2873 drw 04

Figure 4. Synchronous Memory Control Timing


The synchronous interface works by dividing the basic
CPU cycle into two phases. During one phase, a cache
address is presented by the processor and captured by
external latches (the latch control signals are directly generated
by the CPU). During the next phase, the address for the
other memory space is generated and captured while the data
movement operation for the first cache is completed. The
processor directly generates the SRAM Output Enable and
Write Enable signals and the address latch enable signals,
requiring no external decoding. This is illustrated in Figure 4.
Further, the IDT79R3001 supports the ability to refill multiple
words into the cache from main memory when a cache-miss
occurs, further reducing system cost and increasing performance
in cache-based systems. The IDT79R3001 can obtain
1, 4, 8, 16, or 32 words from main memory when processing
a cache-miss, thus amortizing the cache-miss penalty over a
large amount of data.
The IDT79R3001 also performs instruction streaming, which
is the simultaneous execution of incoming instructions while
the cache is being refilled.
The actual width of the TAG bus, and whether or not parity
over the data parts of each synchronous memory is included,
is determined according to how the device is initialized. The
IDT79R3001 can accommodate a TAG bus width of 0-19 bits,
compatible with a variety of cache sizes and cacheable main
memory choices. The IDT79R3001 allows the system designer
to scale the synchronous memory system exactly
according to the system needs, thus eliminating extra memory
and logic devices and achieving substantial cost savings with
no loss of performance.
Thus, the synchronous memory interface of the IDT79R3001
allows for high-bandwidth memory systems to be implemented
with a minimum of control logic. This is desirable, since RISC
performance tends to be a function of memory bandwidth. By
simplifying the design of the synchronous memory system
(illustrated in Figure 5), it is easier for the system designer to
achieve high performance with minimum chip count and
without requiring ultra-fast or specialty components.

[Figure 5 block diagram: IDT79R3001 RISController with Data (Data Parity), TAG, Valid, AddrLo, IClk, DClk, IRd, IWr, DRd, and DWr; two FCT373A latches (LE); SRAM blocks for Data Cache Tags, Data Cache Data, Instruction Cache Data, and Instruction Cache Tags (WE, OE)]
2873 drw 05

Figure 5. IDT79R3001 Synchronous Interface

The TAG Bus
The TAG bus of the IDT79R3001 has been designed to
allow the system designer to implement the exact cache
configuration that is right for the system. For larger caches,
low-order TAG bits do not need to be supplied for the TAG
comparison. Additionally, the number of high-order TAG bits
supplied is determined by the system designer, according to
the amount of cacheable main memory the system supports.
Since most embedded systems would tend to implement
caches of 16KB and greater, and cacheable memory spaces
of 32MB or smaller, significant cost and area reductions are
achieved by configuring a smaller TAG bus.
The system configures the on-chip TAG comparator at
RESET initialization time. If a TAG bit is not to be included in
the synchronous memory TAG bit compare, a pull-down
resistor of 4kΩ is connected to the appropriate IDT79R3001
TAG pin. If a TAG bit is to be included, no resistor is required
(the IDT79R3001 pulls floating inputs to Vcc during RESET
by a small pull-up, which is disabled when RESET is negated).
If a TAG bit is excluded from the cycle-by-cycle comparison,
it is still driven out with the appropriate address value
during write cycles or asynchronous memory reads. Thus, the
system designer still has the full 4 Gbyte of address space
available for address decoding, without requiring the synchronous
memory to be able to cache all such addresses.

Figure 6 illustrates a reduced system, which implements
16KB of instruction cache and 16KB of data cache, and 512MB of
cacheable address space, using just 6 IDT71586 4Kx16
Latched CacheRAM™ components and 4 pull-down resistors.
Note that in systems which do not implement the synchronous
memory space as cache, pull-down resistors
would be added to all TAG pins. The Valid pin still needs to
be supplied on each cycle, thus allowing various memory
schemes to be implemented (such as static column DRAM).
However, the IDT79R3001 can be initialized to not assert the
Valid pin as an output during write cycles, simplifying the
design of logic to drive the signal.

[Figure 6 block diagram: IDT79R3001 RISController with Data (Data Parity), AddrLo, TAG 14:28, Valid, IClk, DClk, IRd, IWr, DRd, and DWr connected to six IDT71586 devices (Instruction Cache Data 2 x IDT71586, Data Cache Data 2 x IDT71586, Instruction Cache Tags, Data Cache Tags; LE, WE, OE); TAG 13 and TAG 29:31 shown with 4kΩ pull-down resistors]
2873 drw 06

Figure 6. Small Footprint Cache for IDT79R3001
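
The choice of which TAG pins to compare and which to pull down can be sketched as a small calculation. The example below is an illustration only: it assumes that TAG pin n carries address bit n and that the TAG bus spans bits 13 through 31 (19 bits), as the pin labels in Figure 6 suggest, and the function name is hypothetical. Both sizes are taken to be powers of two.

```python
TAG_LOW_BIT = 13   # lowest TAG pin label seen in Figure 6 (assumed = address bit 13)
TAG_HIGH_BIT = 31  # highest TAG pin label (assumed = address bit 31)


def tag_bus_plan(cache_bytes, cacheable_bytes):
    """Split the TAG pins into compared bits and pulled-down (excluded) bits.

    Address bits below log2(cache size) index the cache, and bits at or
    above log2(cacheable space) never vary, so neither group needs to
    take part in the TAG comparison.
    """
    index_bits = cache_bytes.bit_length() - 1       # log2 of the cache size
    space_bits = cacheable_bytes.bit_length() - 1   # log2 of the cacheable space
    compared, excluded = [], []
    for bit in range(TAG_LOW_BIT, TAG_HIGH_BIT + 1):
        if index_bits <= bit < space_bits:
            compared.append(bit)
        else:
            excluded.append(bit)
    return compared, excluded


if __name__ == "__main__":
    # The Figure 6 system: 16KB caches, 512MB of cacheable address space.
    compared, excluded = tag_bus_plan(16 * 1024, 512 * 1024 * 1024)
    print("compare TAG bits:", compared)
    print("pull down TAG bits:", excluded)
```

For the Figure 6 configuration this model selects TAG 14:28 for comparison and leaves four pins (TAG 13 and TAG 29:31) to be pulled down, which agrees with the four resistors mentioned in the text.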

Cache Update
When the on-chip TAG comparator indicates that the item
read from the cache was not the desired item, a cache-miss
is processed. A main memory (asynchronous) transfer is
automatically processed.
The IDT79R3001 is designed to update the cache using a burst
refill of multiple adjacent words from main memory. The
processor is "stalled" until the first word of the block is
available. The processor is then released, and the block of
words is brought into the cache at the rate of one word per
CPU clock cycle.
Note that if the cache-miss was in the instruction cache, the
processor is capable of simultaneously executing the incoming
instruction stream as the cache is updated, thus effectively
making the cache update transparent to the system and
increasing performance.
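
The effect of block size on the refill cost can be estimated with a simple model. The sketch below is not from the data sheet: it assumes a fixed stall until the first word is available (the `first_word_latency` parameter is a hypothetical value chosen by the reader) followed by one word per CPU cycle, and it uses the refill sizes of 1, 4, 8, 16, or 32 words that the data sheet lists.

```python
BLOCK_SIZES = (1, 4, 8, 16, 32)  # refill sizes listed in the data sheet


def refill_cycles(block_words, first_word_latency):
    """Cycles to refill one block: stall for the first word, then one word per cycle."""
    if block_words not in BLOCK_SIZES:
        raise ValueError("unsupported block size")
    return first_word_latency + block_words


def cycles_per_word(block_words, first_word_latency):
    """Average refill cost per word; the stall is amortized over the block."""
    return refill_cycles(block_words, first_word_latency) / block_words


if __name__ == "__main__":
    latency = 6  # hypothetical main memory latency in CPU cycles
    for words in BLOCK_SIZES:
        print(words, refill_cycles(words, latency), cycles_per_word(words, latency))
```

Under these assumptions the per-word cost falls toward one cycle as the block grows. The model ignores instruction streaming, which would hide part of the refill time for instruction cache misses.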
Write Cycles
The IDT79R3001 utilizes a write-through cache. That is,
data written by the processor is written to both the cache and
main memory simultaneously. Thus, main memory always
has a current copy of all data.
Typically, latching devices are used between the cache
subsystem and the slower main memory. These Write Buffers
capture the data simultaneously with the cache update, allowing
the processor to continue to the next cycle without actually
waiting for the main memory transfer to complete. The
IDT79R3001 generates parity over the data field on write
cycles, which can be propagated into both the synchronous
and asynchronous memory spaces.
When the processor writes less than a 32-bit quantity (a
"partial" word), the processor can perform a "read-modify-write"
of the cache. That is, the processor will read the 32-bit
word containing the partial address(es) to be updated from the
cache. If a "hit" occurs, then the new data will be merged with
the old and the new 32-bit value will be written both to the
cache and to main memory. If a cache "miss" occurs, then only
the partial data is written to main memory and the cache is
unchanged. Partial word capability is selected as a RESET
option.
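
The partial-word behavior described above can be modeled in software as follows. This is a simplified sketch under stated assumptions, not a description of the device's internal logic: the cache and main memory are plain dictionaries keyed by word-aligned address, a "hit" is modeled as the word being present in the cache dictionary, byte 0 is assumed to be the least significant byte, and the write is assumed not to cross a word boundary. The function name is hypothetical.

```python
WORD_MASK = 0xFFFFFFFF


def write_partial(cache, memory, addr, data, nbytes):
    """Model a partial-word store (1 to 3 bytes) on a write-through cache."""
    word_addr = addr & ~0x3
    shift = 8 * (addr & 0x3)
    mask = (((1 << (8 * nbytes)) - 1) << shift) & WORD_MASK
    new_bits = (data << shift) & mask
    if word_addr in cache:
        # Hit: read-modify-write the cached word, then write it through.
        merged = (cache[word_addr] & ~mask & WORD_MASK) | new_bits
        cache[word_addr] = merged
        memory[word_addr] = merged
    else:
        # Miss: only the partial data reaches main memory; cache unchanged.
        old = memory.get(word_addr, 0)
        memory[word_addr] = (old & ~mask & WORD_MASK) | new_bits


if __name__ == "__main__":
    cache = {0x100: 0x11223344}
    memory = {0x100: 0x11223344, 0x200: 0xAABBCCDD}
    write_partial(cache, memory, 0x101, 0xEE, 1)    # hit
    write_partial(cache, memory, 0x202, 0x1234, 2)  # miss
    print(hex(cache[0x100]), hex(memory[0x100]), hex(memory[0x200]))
```

On the hit, the cache and main memory end up holding the same merged word. On the miss, main memory is updated in the selected byte lanes and the cache gains no new entry.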


THE ASYNCHRONOUS MEMORY INTERFACE
The IDT79R3001 also supports an asynchronous memory
interface, which supports the use of slower memory devices
such as slow DRAM or EPROM and also supports the use of
peripherals and other "non-cacheable" devices.
In general, if a cache-miss (or parity error, ii enabled)
occurs, the processor will automatically use the asynchronous memory interface to retrieve the desired data, and will
update the cache accordingly.
Additionally, software can force the use of the asynchronous
memory space through the use of the on-chip MM U. When the
processor seeks either instructions or data within a certain
address range (kseg1), the processor knows that this data is
uncacheable and will perform an asynchronous memory
transfer. Additionally, within cacheable memory, TLB entries
can be used to make certain pages as "uncacheable". When
an address of an "uncacheable" page is used, the processor
will automatically use the asynchronous memory space.
The asynchronous memory space uses the same data bus
as the synchronous memory space. This facilitates the
automatic updating of cache memory when the asynchronous
memory is accessed due to cache-miss activity or memory
writes. The asynchronous address bus is composed from the
synchronous memory AddrLo bus, and the TAG bus. External
logic devices (such as IDT74FCT374A registers) are used to
capture AddrLo and TAG values for the asynchronous transfer
address. Note that systems which exclude individual TAG bits
from comparison (to reduce cache width) still have all TAGs
available as outputs.
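
The composition of the asynchronous address can be sketched as below. The bit ranges are assumptions drawn from the pin descriptions (Tag carrying address bits 31..13, AddrLo carrying bits 23..0); where the two overlap, this sketch arbitrarily takes the Tag value. The function and constant names are hypothetical.

```python
TAG_LOW_BIT = 13    # assumption: Tag(13:31) holds address bits 31..13
TAG_WIDTH = 19
ADDRLO_WIDTH = 24   # assumption: AddrLo(0:23) holds address bits 23..0


def async_address(tag: int, addr_lo: int) -> int:
    """Combine latched Tag and AddrLo values into one 32-bit address."""
    tag &= (1 << TAG_WIDTH) - 1
    addr_lo &= (1 << ADDRLO_WIDTH) - 1
    # Bits 12..0 are only available on AddrLo; bits 31..13 are taken from Tag.
    low = addr_lo & ((1 << TAG_LOW_BIT) - 1)
    return (tag << TAG_LOW_BIT) | low
```

In hardware this is simply a matter of which latched outputs are wired to which address lines; the function only makes the bit positions explicit.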
The data path between the processor and the asynchronous memory space is managed according to the needs of the
application. Write Buffer FIFO devices, such as the
IDT79R3020, are used to capture address and data during
store cycles. These devices capture the data in
one cycle, and allow the processor to continue to execute
from the synchronous memory while the slower asynchronous memory actually retires the write.
The read path is also constructed according to the needs of
the system. If block refill is used, then the read path is highly
dependent on the design of the main memory system. Pipeline
devices such as IDT74FCT540A, or simple latches such as
IDT74FCT374, may be used.
A simple asynchronous memory interface is shown in
Figure 7. In this system, main memory is assumed to be fast
enough to support the block refill requirements of the system,
thus simplifying the read path. In fact, both the read and write
data paths are actually managed through a single set of
IDT29FCT52A bidirectional latching transceivers.
During write cycles (which are typically captured by Write
Buffers), the processor asserts MemWr to indicate that a write
cycle is in progress. The memory system negates WrBusy to
indicate that the processor is done with the write cycle.

[Block diagram. Labels: IDT79R3001 RISController (Tag, Data, XEn, WrBusy, RdBusy, SysClk, BusErr, MemRd, MemWr, AccTyp); Main Memory Address (32); Main Memory Data (32); Main Memory Control (Buffered SysClk, Control, Rd/Wr, Ready); Main Memory.]

Figure 7. IDT79R3001 Asynchronous Interface

[Block diagram. Labels: Clock Generator; IDT79R3001 RISController (Clocks, CpBusy, Run, Exception, Int(n), CpCond1, CpSync, Tag, Data, AddrLo); IDT79R3010 FPA (Clocks, FpBusy, Run, Exception, FpInt, FpCond, FpSync, FpSysOut, FpSysIn, Data); I-Cache and D-Cache (Addr, Tag, Data).]

Figure 8. IDT79R3001 Interface to IDT79R3010 Floating Point Co-Processor

During read cycles, the processor will assert MemRd to
indicate that a main memory read is in progress. The memory
system will hold RdBusy active until the desired data is
available. The processor will activate the XEn signal to allow
data to be passed from the main memory to the processor
databus. If the cache is to be updated with the new data, then
the processor will assert the appropriate cache write signal to
allow the cache RAMs to capture the incoming databus.
The AccTyp bus is used to indicate the size of the data
transfer (8, 16, 24, or 32 bits), and for main memory reads,
whether or not the data is "cacheable". This simplifies the
main memory address decoding, since the AccTyp indicates
whether the main memory needs to perform a burst read of
multiple words.
Co-Processor Interface
The IDT79R3001 implements a co-processor interface,
which allows the use of the IDT79R3010 high-performance
RISC Floating Point Accelerator without requiring the use of
external interface components.
The co-processor interface has been designed to make
system co-processors appear to the programmer as if they
were on-chip extensions of the core execution engine. Thus,
the IDT79R3010 FPA works as a true co-processor, rather
than as a peripheral which must be programmed.
In the IDT79R3001 co-processor model, the CPU is
responsible for controlling all data cycles. The co-processor
keeps in synchronization with the CPU (including the pipeline
stages), and uses a Phase-Locked Loop to keep synchronized
with the processor bus traffic. The co-processor then "snoops"
the data bus, watching for co-processor instructions. It also knows when data cycles on the bus are intended for it (either
as a target in co-processor load operations, or as a source for
co-processor restore operations), and performs the data
portion of the operation when appropriate. Thus, co-processors
effectively load and store directly with memory, without requiring
operands to go through the CPU first. This achieves the
highest levels of performance (note that the co-processor
interface also supports move, whereby data can be moved
directly between the CPU and any co-processor).
Figure 8 illustrates the use of the IDT79R3010 in an
IDT79R3001 system. The co-processor interface manages
synchronization between the parts, and is used to communicate status from the co-processor to the CPU. CpBusy, or co-processor busy, stalls the CPU until the busy co-processor
resource (requested by a co-processor instruction) is free,
and CpCond, or co-processor condition, is used to report
status on co-processor test instructions. CpSync is used to
help the co-processor stay "locked" to the CPU, so that the co-processor knows when data is on the bus to be sampled on
load operations or when to place data on the bus for store
operations.
Note that the co-processor sits on the same data bus as the
CPU, but has no connection to the address bus. The CPU is
responsible for performing all memory addressing, including
the determination of "cache hit", write-buffer full cycles, and
any processing that might be required for cache misses.


INTERRUPTS
The IDT79R3001 features 6 separate interrupt input pins.
Interrupts are not vectored, but rather cause the general
exception vector address to be the next execution address.
These pins are not encoded internally; external logic can
choose to implement these interrupt lines as either 6 or 64
interrupt sources; software would then perform the appropriate decoding to get to the specific interrupt handler.
Interrupts are recognized in the ALU stage of the on-chip
pipeline. Instructions less advanced in the pipeline are
"flushed" and will be restarted when the return from exception
occurs (an on-chip register contains the address of the
instruction which was excepted). Instructions further advanced
in the pipeline are allowed to continue. Unlike other RISC
processors, the IDT79R3001 does not require the programmer to save and restore pipeline status to allow normal
execution to be resumed. Depending on the application and
exception, at most software would need to save/restore the
on-chip data registers, status register, Exception PC and
exception "cause" register.
Note that the co-processor model includes "precise exceptions." That is, an exception is signaled to the exact
instruction which generated the exceptional condition. No
further state commitments are made by the IDT79R3001 and,
thus, the exact context at the time of the exception is known
to the programmer. This is true even for multi-cycle operations, such as those of the FPA.

DMA INTERFACE
The IDT79R3001 features a simple DMA interface which
allows an external master to gain control of the synchronous
memory space. Note that it is not necessary to include logic
on the CPU to arbitrate for the asynchronous memory space;
the read/write buffer interface is where such arbitration logic
belongs and it is left to the system designer to implement the
type of asynchronous memory structure that best fits the
application.
When an external master "owns" the synchronous bus, the
CPU will tri-state the following pins and buses:

AddrLo: The synchronous memory direct address bus.
Data & Tag: The synchronous memory RAM data lines.
Cache Control: IRd, IWr, IClk, DRd, DWr and DClk. This
allows the external master to use the existing control
lines to control the synchronous memory.
XEn: The read buffer transceiver enable, which will
allow the external master to use the read/write
buffer path for DMA.
Valid: This enables the DMA interface to be used for
multi-processing applications.

[Block diagram. Labels: IDT79R3001 RISController (Tag, Data, AddrLo, Cache Ctrl, DMAStall); DMA Controller (Req., AddrLo, Cache Ctrl, Tag, Async I/F Ctrl, Main Mem Ctrl).]

Figure 9. IDT79R3001 DMA Interface


Input   W Cycle      X Cycle      Y Cycle         Z Cycle
Int0    Reserved     Reserved     Reserved        Reserved
Int1    Reserved     Reserved     Reserved        Reserved
Int2    DBlkSize0    DBlkSize1    Parity On       Valid Output
Int3    IBlkSize0    IBlkSize1    Store Partial   ControlLow
Int4    PllOn        PllOn        PllOn           PllOn
Int5    Reserved     BigEndian    TriState        Reserved

NOTE:
1. Reserved signals must be "high" during these cycles.

Table 1. IDT79R3001 Mode Selectable Features
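
As an illustration of how external logic might drive the interrupt pins for the four sampling cycles of Table 1, the sketch below builds one 6-bit pattern per cycle. Several things here are assumptions rather than documented facts: that a logic 1 selects the named feature, that each block size is a 2-bit code with bit 0 sampled in the W cycle and bit 1 in the X cycle, and every parameter name. Only the row and column positions and the rule that reserved inputs are held high come from the table.

```python
def reset_vectors(dblk_size=0, iblk_size=0, parity_on=False, valid_output=False,
                  store_partial=False, control_low=False, pll_on=False,
                  big_endian=False, tri_state=False):
    """Return {'W': v, 'X': v, 'Y': v, 'Z': v}; bit n of each value drives Int(n)."""
    reserved = 1  # reserved inputs must be high during these cycles (Table 1 note)
    rows = {
        0: (reserved, reserved, reserved, reserved),
        1: (reserved, reserved, reserved, reserved),
        2: (dblk_size & 1, (dblk_size >> 1) & 1, int(parity_on), int(valid_output)),
        3: (iblk_size & 1, (iblk_size >> 1) & 1, int(store_partial), int(control_low)),
        4: (int(pll_on),) * 4,          # the Int4 row carries the same option in all cycles
        5: (reserved, int(big_endian), int(tri_state), reserved),
    }
    vectors = {}
    for column, cycle in enumerate("WXYZ"):
        value = 0
        for pin, bits in rows.items():
            value |= bits[column] << pin
        vectors[cycle] = value
    return vectors
```

The actual polarities and block size encodings should be taken from the "IDT79R3001 Hardware User's Guide" before using a pattern like this in a design.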

The DMA interface consists of a single input signal,
DMAStall, which causes the processor to stall and to tri-state
the above named lines. The external master is guaranteed
mastership of the bus within a very short number of cycles,
depending on the exact external bus activity of the CPU when
the DMA was requested. The DMA master negates the
DMAStall signal when the DMA operation is completed to
allow the CPU to resume processing. Consult the "IDT79R3001
Hardware User's Guide" for more details.
Figure 9 illustrates the system connection of an external
DMA master to an IDT79R3001 system.

ADVANCED FEATURES
The IDT79R3001 contains special features which provide
added flexibility across a number of applications, as well as
allow for system diagnostic support.
In support of diagnostics, the IDT79R3001 allows for cache
"swapping" (interchange of which memory bank is for instruction and which is for data), which is useful in system initialization,
cache flushing, and diagnostics. Additionally, the caches can
be "isolated" from main memory, which forces cache "hits" to
occur regardless of the tag comparison, and which is useful in
determining that the synchronous memory space RAMs are
functional.
An additional feature is the ability to enable parity checking
over the data field of each synchronous memory. If parity is
enabled, the processor will check the parity when a synchronous access occurs; if a parity error is detected, it is signaled
to the external world on the Parity Error signal and a cache-miss cycle is processed. The Parity Error signal will remain low
until the parity error flag in the CP0 status register is cleared
by software.
A number of other system selectable features are selected
at reset time. The input reset "vectors" are sampled on the
interrupt input lines during the last four cycles of the reset
period. The input vectors are listed in Table 1. These
selections include the ability to select the block refill sizes for
each of the instruction and data memories, whether Big
Endian or Little Endian order is to be used, whether to use data
parity, and whether or not to accommodate a Phase-Locked
Loop for a co-processor. The initialization of the CPU and
meaning of each input vector is more fully explained in the
"IDT79R3001 Hardware User's Guide".

PROCESSOR ARCHITECTURE
The IDT79R3001 is a full implementation of the
IDT79R2000A/IDT79R3000 Instruction Set Architecture (the
MIPS-I ISA). This architecture is discussed in great detail in
"mips RISC Architecture," available from IDT.
IDT79R3001 CPU Registers
The IDT79R3001 CPU provides 32 general purpose
(orthogonal) 32-bit registers, a 32-bit Program Counter and
two 32-bit registers used to hold the results of the CPU integer
multiply and divide operations.
Two of the 32 general registers have special purposes
designed to increase processor performance: register r0 is
hardwired to the value "0", a useful constant; and register r31
is used as the link register in jump-and-link instructions (the
return address for subroutine calls). Otherwise, there is no
requirement that a particular register be used as a stack or
frame pointer, etc., although there is a register convention as
part of the "mips ABI" (Applications Binary Interface standard)
which the compiler suite uses.
The CPU registers are illustrated in Figure 10. Note that
there is no Program Status Word register shown in this figure.
The functions traditionally provided by a PSW register are
instead provided in the Status and Cause Registers incorporated within the on-chip System Control Co-Processor (CP0).
The instruction set does not use condition codes.
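
The register conventions above can be captured in a small software model. This is only a sketch: the class and method names are hypothetical, and the choice of "address of the jump plus 8" as the saved return address is an assumption based on the one-instruction branch latency of the MIPS architecture, not a statement from this data sheet.

```python
class RegisterFile:
    """Toy model of the CPU-visible registers: r0..r31, HI, LO and PC."""

    def __init__(self):
        self._regs = [0] * 32
        self.hi = 0   # multiply/divide result register HI
        self.lo = 0   # multiply/divide result register LO
        self.pc = 0

    def read(self, n: int) -> int:
        return self._regs[n]

    def write(self, n: int, value: int) -> None:
        # r0 is hardwired to zero, so writes to it are discarded.
        if n != 0:
            self._regs[n] = value & 0xFFFFFFFF

    def link(self) -> None:
        # Assumption: the return address skips the jump and the instruction
        # that follows it, hence pc + 8.
        self.write(31, self.pc + 8)
```

Because r0 always reads as zero, operations such as a register move or loading a small constant need no dedicated instruction in such a model.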

[Register diagram. General Purpose Registers r0 through r31 (bits 31..0, with r0 shown as 0); Multiply/Divide Registers HI and LO (bits 31..0); Program Counter PC (bits 31..0).]

Figure 10. IDT79R3001 Registers


Instruction Set Overview
All IDT79R3001 instructions are 32 bits long and there are
only three instruction formats (see Figure 11). This approach
simplifies decoding, thus minimizing instruction execution
time. The IDT79R3001 processor initiates a new instruction
on every RUN cycle, and is able to complete an instruction on
almost every clock cycle. The only exceptions are the LOAD
instructions and BRANCH instructions, which each have a
single cycle of latency associated with their execution (that is,
the instruction immediately after the branch is always executed
regardless of the branch condition; similarly, the data loaded
by a LOAD instruction is not available to the subsequent
instruction). However, in the majority of cases the compilers
(and even the MIPS assembler) are able to reorder instructions to fill these latency cycles with useful instructions which
do not require the results of the previous instruction (in the
worst case, a NOP instruction is inserted). This effectively
eliminates these latency effects and does not require the
applications programmer to be aware of the pipeline structure.
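
The worst-case handling of the load latency can be sketched as a simple pass over an instruction list. This is not how the MIPS assembler is implemented; it only shows the fallback of inserting a NOP when the instruction after a load reads the loaded register. The tuple layout `(op, dest, sources)` and the function name are hypothetical.

```python
LOADS = {"LB", "LBU", "LH", "LHU", "LW"}
NOP = ("NOP", None, ())


def fill_load_delay(program):
    """Insert a NOP after any load whose result is read by the next instruction."""
    out = []
    for i, (op, dest, sources) in enumerate(program):
        out.append((op, dest, sources))
        if op in LOADS and i + 1 < len(program):
            next_sources = program[i + 1][2]
            if dest in next_sources:
                # The loaded data is not yet available to the next instruction.
                out.append(NOP)
    return out
```

A real tool would first try to move an independent instruction into the slot and fall back to a NOP only when none is available.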
The actual instruction set of the CPU was determined after
extensive simulations to determine which instructions should
be implemented in hardware and which operations are best
synthesized in software from other basic operations. This
methodology has resulted in the highest performance
processor available.
The IDT79R3001 instruction set can be divided into the
following groups:
• Load/Store Instructions move data between memory
and the general registers. These are all "I-Type" instructions. The only addressing mode supported is base register plus signed, immediate 16-bit offset. This effectively
allows three addressing modes: register plus offset, register (using zero offset), and immediate (using r0, the zero
register).
The Load instruction has a single cycle of latency, as
described above. That is, the instruction immediately after
the load instruction cannot rely on the new data; however,
the assembler and compilers automatically handle this,
reordering code to ensure that no conflicts occur. Note that
the store operation has no latency in its effect.
Loads and stores can be performed on byte, half-word,
word, or unaligned word data (32-bit data not aligned on a
modulo-4 address).
• Computational instructions perform arithmetic, logical, and
shift operations on values in registers. They occur in both
"R-Type" (both operands and the result are general registers), and "I-Type" (one operand is a 16-bit immediate
value) formats.
Note that computational instructions are three operand
instructions: that is, the result register can be different from
both source registers. This means that operands need not
be overwritten by arithmetic operations. This results in a
more efficient use of the register set, and further increases
performance.

I-Type (Immediate)
Bits    31-26   25-21   20-16   15-0
Field   op      rs      rt      immediate

J-Type (Jump)
Bits    31-26   25-0
Field   op      target

R-Type (Register)
Bits    31-26   25-21   20-16   15-11   10-6   5-0
Field   op      rs      rt      rd      re     funct

Figure 11. IDT79R3001 Instruction Formats
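
The three formats of Figure 11 and the addressing rules described in this section can be exercised with a short decoder. It is a sketch with hypothetical function names. Two details are assumptions taken from general MIPS practice rather than from this data sheet: jump targets and branch offsets count words (so they are shifted left by two), and both are applied relative to the instruction following the jump or branch.

```python
def sign_extend16(value: int) -> int:
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def decode(word: int) -> dict:
    """Extract every field of Figure 11; the caller picks those its format uses."""
    return {
        "op": (word >> 26) & 0x3F,
        "rs": (word >> 21) & 0x1F,
        "rt": (word >> 16) & 0x1F,
        "rd": (word >> 11) & 0x1F,
        "re": (word >> 6) & 0x1F,
        "funct": word & 0x3F,
        "immediate": word & 0xFFFF,
        "target": word & 0x3FFFFFF,
    }


def effective_address(base_value: int, immediate: int) -> int:
    # Load/store addressing: base register plus signed 16-bit offset.
    return (base_value + sign_extend16(immediate)) & 0xFFFFFFFF


def jump_target(pc: int, target: int) -> int:
    # Upper four bits come from the program counter, the rest from the target field.
    return ((pc + 4) & 0xF0000000) | ((target & 0x3FFFFFF) << 2)


def branch_target(pc: int, immediate: int) -> int:
    # 16-bit signed offset relative to the program counter.
    return (pc + 4 + (sign_extend16(immediate) << 2)) & 0xFFFFFFFF
```

For example, the word 0x8C820004 decodes with op 0x23, rs 4, rt 2 and immediate 4, which in the MIPS-I encoding is a load word from offset 4 of register 4 into register 2.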

• Jump and Branch instructions change the flow of control of
a program. Jumps are always to a paged absolute address
formed by combining a 26-bit target with four bits of the
Program Counter ("J-Type" format for subroutine calls), or
32-bit register byte addresses ("R-Type," for Returns and
dispatches). Branches have 16-bit offsets relative to the
program counter ("I-Type").
Jump and Link instructions save a return address in
Register 31. The IDT79R3001 instruction set features
numerous branch conditions. Included is the ability to
branch based on a comparison of two registers, or on the
comparison of a register to zero. Thus, net performance is
increased since the processor does not have to precede
the branch instruction with arithmetic operations.
• Co-processor instructions perform operations in the co-processors (such as the IDT79R3010 FPA). Co-processor
Loads and Stores are "I-Type;" computational instructions
have co-processor dependent formats.
• Co-processor 0 instructions perform operations on the
System Control Co-processor (CP0) registers to manipulate the memory management and exception handling
facilities of the on-chip co-processor.
• Special instructions perform a variety of tasks, including
movement of data between general and special registers,
system calls, and breakpoint operations. These are always
"R-Type."
IDT79R3001 System Control Co-processor (CP0)
The IDT79R3001 can operate with up to four tightly coupled
co-processors, designated CP0-CP3. CP0 is included on-chip as co-processor 0, the System Control Co-processor.
CP0 is responsible for supporting both the virtual memory
system and the exception handling functions of the
IDT79R3001.


OP        Description

Load/Store Instructions
LB        Load Byte
LBU       Load Byte Unsigned
LH        Load Halfword
LHU       Load Halfword Unsigned
LW        Load Word
LWL       Load Word Left
LWR       Load Word Right
SB        Store Byte
SH        Store Halfword
SW        Store Word
SWL       Store Word Left
SWR       Store Word Right

Arithmetic Instructions (ALU Immediate)
ADDI      Add Immediate
ADDIU     Add Immediate Unsigned
SLTI      Set on Less Than Immediate
SLTIU     Set on Less Than Immediate Unsigned
ANDI      AND Immediate
ORI       OR Immediate
XORI      Exclusive OR Immediate
LUI       Load Upper Immediate

Arithmetic Instructions (3-operand, register-type)
ADD       Add
ADDU      Add Unsigned
SUB       Subtract
SUBU      Subtract Unsigned
SLT       Set on Less Than
SLTU      Set on Less Than Unsigned
AND       AND
OR        OR
XOR       Exclusive OR
NOR       NOR

Shift Instructions
SLL       Shift Left Logical
SRL       Shift Right Logical
SRA       Shift Right Arithmetic
SLLV      Shift Left Logical Variable
SRLV      Shift Right Logical Variable
SRAV      Shift Right Arithmetic Variable

Multiply/Divide Instructions
MULT      Multiply
MULTU     Multiply Unsigned
DIV       Divide
DIVU      Divide Unsigned
MFHI      Move From HI
MTHI      Move To HI
MFLO      Move From LO
MTLO      Move To LO

Jump and Branch Instructions
J         Jump
JAL       Jump and Link
JR        Jump to Register
JALR      Jump and Link Register
BEQ       Branch on Equal
BNE       Branch on Not Equal
BLEZ      Branch on Less than or Equal to Zero
BGTZ      Branch on Greater Than Zero
BLTZ      Branch on Less Than Zero
BGEZ      Branch on Greater than or Equal to Zero
BLTZAL    Branch on Less Than Zero and Link
BGEZAL    Branch on Greater than or Equal to Zero and Link

Special Instructions
SYSCALL   System Call
BREAK     Break

Coprocessor Instructions
LWCz      Load Word from Coprocessor
SWCz      Store Word to Coprocessor
MTCz      Move To Coprocessor
MFCz      Move From Coprocessor
CTCz      Move Control to Coprocessor
CFCz      Move Control From Coprocessor
COPz      Coprocessor Operation
BCzT      Branch on Coprocessor z True
BCzF      Branch on Coprocessor z False

System Control Coprocessor (CP0) Instructions
MTC0      Move To CP0
MFC0      Move From CP0
TLBR      Read indexed TLB entry
TLBWI     Write Indexed TLB entry
TLBWR     Write Random TLB entry
TLBP      Probe TLB for matching entry
RFE       Restore From Exception

Table 2. IDT79R3001 Instruction Summary


CP0 has a number of registers which it
uses to perform its control functions. These include 64 fully
associative Translation Lookaside Buffers (TLBs), used to
manage the virtual memory space; registers to manage the
TLB set; and the exception handling registers. Figure 12
illustrates the register set of the System Control Co-processor.
Table 3 provides a brief explanation of the function of each of
these registers. A more detailed explanation of the use of
each of these registers is included in the "mips RISC
Architecture" manual.

Memory Management System
The IDT79R3001 supports a virtual memory system, so
that each task in a given application can be unaware of the
addressing needs of other tasks. This is also useful in
systems with limited physical memory; the IDT79R3001 provides for the logical expansion of memory by translating
addresses composed in a large virtual space into available
physical memory addresses.

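[Register diagram. The TLB is shown as entries 63 down to 0, each with an EntryHi and an EntryLo half; entries 7 through 0 are marked "not accessed by Random". A legend distinguishes registers used with the virtual memory system from registers used with exception processing.]

Figure 12. The System Control Co-processor (CP0) Registers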

Register   Description
EntryHi    High half of a TLB entry
EntryLo    Low half of a TLB entry
Index      Programmable pointer into TLB array
Random     Pseudo-random pointer into TLB array
Status     Mode, interrupt enables and diagnostic status information
Cause      Indicates nature of last exception
EPC        Exception Program Counter; contains address of
           instruction which detected the exception
Context    Pointer into kernel's virtual Page Table Entry array
BadVA      Most recent bad virtual address
PRId       Processor revision identification (Read only)

Table 3. CP0 Registers


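MMU ADDRESS TRANSLATION (Figure 13)

Virtual address range      Segment                            Physical mapping
0xc0000000 - 0xffffffff    Kernel Mapped Cacheable (kseg2)    Any (TLB mapped)
0xa0000000 - 0xbfffffff    Kernel Uncached (kseg1)            First 512 MB of physical memory
0x80000000 - 0x9fffffff    Kernel Cached (kseg0)              First 512 MB of physical memory
0x00000000 - 0x7fffffff    User Mapped Cacheable (kuseg)      Any (TLB mapped)

The physical side of the diagram shows the 512 MB region reached by kseg0 and kseg1, with the remaining 3584 MB of physical memory above it.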

IDT79R3001 Operating Modes
The IDT79R3001 has two operating modes: User Mode
and Kernel Mode. The IDT79R3001 normally operates in the
User Mode until an exception is detected, forcing it into the
Kernel Mode. The processor remains in Kernel Mode until the
exceptions are handled and the processor executes an RFE
(Return from Exception) instruction, which will restore it to
User Mode. Kernel Mode allows software to alter machine
state information such as that contained in the CP0 registers;
that is, if in User Mode an access is attempted to Co-processor
0 and the Kernel has not enabled the User to access the co-processor, an exception will occur. Similarly, if a User task
attempts to use a Kernel virtual address, an exception will
occur. Thus, system resources are protected from User tasks.
The manner in which memory addresses are translated
(mapped) depends on the operating mode of the IDT79R3001
and on the virtual address desired. Figure 13 illustrates the
virtual address mapping performed by the IDT79R3001:
User Mode - in this mode, a single, uniform virtual address
space (kuseg) of 2 Gbyte is available to each user task (tasks
are further identified by a 6-bit process identifier field in order
to form unique virtual addresses). All references to this
segment are mapped using the TLB, which utilizes both the
virtual address and the Process ID field to perform the virtual-to-physical mapping (note that this allows the cache to be
shared by up to 64 User processes at a time without requiring
time consuming Cache or TLB flushing).
Kernel Mode - Four separate segments are accessible
through this mode:
• kuseg - When in the Kernel Mode, references to this
segment are treated just like User Mode references, thus
streamlining Kernel accesses to User memory.
• kseg0 - References to this 512 Mbyte segment may use
the cache memory, but are not translated by the TLB.
Instead, these addresses map directly to the first 512
Mbytes of the physical address space. Note that many
dedicated embedded applications will utilize this address
space and kseg1 only, rather than any of the TLB mapped
segments.
• kseg1 - References to this 512 Mbyte segment are not
mapped through the TLB. Additionally, this memory is
viewed as uncacheable, which means that references
through this segment will always use the asynchronous
memory interface. As with kseg0, references through this
segment are hard-mapped to the first 512 Mbytes of
physical memory. When the processor boots, the reset
vector is contained in this segment, so that the processor
does not require either the cache or the TLB to be valid at
RESET time.
• kseg2 - References to this 1 Gbyte segment are always
mapped through the TLB. As with kuseg, the ability of
memory pages to be cached is determined by a bit setting
in the TLB entry for that page.

[TLB entry diagram. EntryHi: VPN (bits 63-44), TLBPID (bits 43-38), 0 (bits 37-32). EntryLo (bits 31-0): PFN, N, D, V, G, 0.]

VPN - Virtual Page Number
TLBPID - Process ID
PFN - Physical Frame Number
N - Non-cacheable Physical Page
D - Dirty Page/Write Protect
V - Valid entry flag
G - Global flag (ignore PID)
0 - Reserved

Figure 14. TLB Entry Format
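
The segment rules described in this section reduce to a range check on the virtual address. The sketch below is a software model with hypothetical names; the segment boundaries are the ones shown in the virtual address map (0x80000000, 0xa0000000 and 0xc0000000), and a `cacheable` value of `None` means the decision is left to the TLB entry for the page.

```python
class AddressError(Exception):
    """Raised when a User Mode reference targets a Kernel segment."""


def classify(vaddr: int, kernel_mode: bool) -> dict:
    """Return the segment and, for unmapped segments, the physical address."""
    vaddr &= 0xFFFFFFFF
    if vaddr < 0x80000000:
        # kuseg: 2 Gbyte, always translated by the TLB.
        return {"segment": "kuseg", "tlb_mapped": True,
                "physical": None, "cacheable": None}
    if not kernel_mode:
        raise AddressError(f"user access to kernel address {vaddr:#010x}")
    if vaddr < 0xA0000000:
        # kseg0: cached, hard-mapped to the first 512 Mbytes.
        return {"segment": "kseg0", "tlb_mapped": False,
                "physical": vaddr - 0x80000000, "cacheable": True}
    if vaddr < 0xC0000000:
        # kseg1: uncached, hard-mapped to the same first 512 Mbytes.
        return {"segment": "kseg1", "tlb_mapped": False,
                "physical": vaddr - 0xA0000000, "cacheable": False}
    # kseg2: 1 Gbyte, always translated by the TLB.
    return {"segment": "kseg2", "tlb_mapped": True,
            "physical": None, "cacheable": None}
```

Note how the same physical location is reachable through both kseg0 and kseg1, which lets software choose between cached and uncached access to it by address alone.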

The Translation Lookaside Buffer (TLB)
The translation of virtual addresses in either kuseg or kseg2
(mapped segments) is performed by the on-chip Translation
Lookaside Buffer array. This array consists of 64 fully-associative (content addressable) memory elements. Each
entry maps a 4Kbyte virtual page to a 4Kbyte physical page.
Each TLB entry contains other information about the virtual
address it maps (such as which User process it maps) and
also about the physical address (such as whether it is cacheable
or writeable).
Figure 14 illustrates the format of each TLB entry. The
translation operation is illustrated in Figure 15. The upper
portion of the desired virtual address is compared against the
VPN field of each TLB entry. Additionally, the current process
ID (contained in the TLBHI register) is matched against the
PID field of the TLB entry (if the TLB entry is marked as Global,
the PID comparison is ignored). If a match occurs, and the
TLB entry is marked as Valid, then the translation is completed
by replacing the VPN of the virtual address with the corresponding PFN (Physical Frame Number).
Note that the use of the TLB does not incur an execution
penalty, since the execution engine pipeline includes stages
to cover for the time required to make the TLB search and
translation.
TLB misses occur when no successful match occurs.
These events are handled in software. The CP0 registers give
the software enough information to obtain the appropriate TLB
entry at speeds which exceed those achieved by many CPUs
which use hardware TLB replacement (10-12 cycles under
UNIX).
When a TLB miss occurs, the address of the instruction
which was executing is stored in the EPC register, and the
BadVA register contains the address which was being translated. The Context register uses the BadVA value to generate
a direct pointer to the kernel Page Table Entry for the desired
virtual address. The Random register suggests the TLB entry
to be replaced by the new entry. Note that the lower eight TLB
entries are not pointed to by Random; the kernel software can
thus ensure that it is constantly mapped, and deterministic
response is guaranteed.
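
The matching rules of the TLB can be modeled in software as follows. This is a sketch only: the hardware compares all entries in parallel, whereas the model loops over a list, and all class and function names are hypothetical. Treating a matching but invalid entry as a miss is also a simplification, since this section does not describe that case. The replacement helper reflects the statement that the Random register never points at the lower eight entries.

```python
import random
from dataclasses import dataclass

PAGE_SHIFT = 12  # 4Kbyte pages


@dataclass
class TlbEntry:
    vpn: int
    pid: int
    pfn: int
    valid: bool = True
    is_global: bool = False


class TlbMiss(Exception):
    """No usable entry matched; software must supply one."""


def translate(tlb, vaddr: int, current_pid: int) -> int:
    vpn = (vaddr >> PAGE_SHIFT) & 0xFFFFF
    offset = vaddr & 0xFFF
    for entry in tlb:
        if entry.vpn != vpn:
            continue
        # The PID comparison is ignored for entries marked Global.
        if not entry.is_global and entry.pid != current_pid:
            continue
        if not entry.valid:
            break
        # Replace the VPN with the PFN; the page offset passes through.
        return (entry.pfn << PAGE_SHIFT) | offset
    raise TlbMiss(f"no valid mapping for {vaddr:#010x}")


def random_replacement_index() -> int:
    # Entries 0..7 are never selected, so the kernel can keep them fixed.
    return random.randrange(8, 64)
```

A miss handler in this model would catch `TlbMiss`, look up the page table entry, and store it at `random_replacement_index()` before retrying the access.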

BACKWARD COMPATIBILITY WITH
IDT79R2000A AND 79R3000 PROCESSORS
The IDT79R3001 can execute the same binary software
(either kernel or user) that is executed by either the
IDT79R2000A or IDT79R3000. At the system level, some
hardware re-design is necessary to achieve the cost savings
inherent in the IDT79R3001 hardware interface.

[Block diagram. The current process ID (PID) and the VPN (bits 31-12) of the virtual address are presented to a CAM (Content Addressable Memory) of entries 63 down to 0; the matching entry selects a RAM word holding the PFN and flags. The PFN forms bits 31-12 of the physical address and bits 11-0 are carried over from the virtual address.]

Figure 15. Virtual to Physical TLB Translation


PIN DESCRIPTIONS
Pin Name

I/O

Memory Interface
Data (0:31)
1/0

Description
A 32-bit bus used for all instruction and data transmission among the processor, synchronous memory space,
asynchronous memory space and co-processors.

DataP (0:3)

1/0

A 4-bit bus containing even parity over the data bus. If parity checking is enabled, a parity error will cause the PErr
signal to be asserted and a cache-miss to occur. Regardless of whether parity checking is enabled, the processor
will always generate parity on writes.

Tag (13:31)

1/0

A 19-bit bus used for transferring cache tags and high-order address bits between the processor, caches and
asynchronous memory spaces.

AddrLo (0:23)

0

A 24-bit bus containing low-order byte addresses for both the synchronous (cache) and asynchronous memory
spaces.

Synchronous Memory Control

DClk

0
0
0
0
0
0

Valid

1/0

A high on this signal indicates that the Tags just read from the cache are valid. When a cache update occurs, the
processor will generate the appropriate Valid bit.

PErr

0

If parity checking is enabled, this signal is an active low output of the internal CPO parity error status bit. It is driven
low when a parity error is detected and remains low until software clears the parity error flag in the status register. This
pin is physically the same pin as AccTyp2. Its function is selected during device reset.

IRd
IWr
IClk
DRd
DWr

The output enable for the instruction cache. The polarity of this signal is selectable.
The write enable for the instruction cache. The polarity of this signal is selectable.
The instruction cache address latch clock. The clock runs continuously.
The output enable for the data cache. The polarity of this signal is selectable.
The write enable for the data cache. The polarity of this signal is selectable.
The data cache address latch clock. The clock runs continuously.

Asynchronous Memory Interface
XEn
AccTyp (0:2)
MemWr
MemRd
BusError

0
0
0
0
I

The transceiver enable for the read buffer.
A 3-bit bus used to indicate the size of data being transferred on the asynchronous memory bus, whether or not a data
transfer is occurring and the purpose of the transfer. If parity checking is enabled, AccTyp2 becomes the PErr signal.
Signals the occurrence of an asynchronous memory write cycle.
Signals the occurrence of an asynchronous memory read cycle.
Signals the occurrence of a bus error during an asynchronous memory transfer cycle.

SysOut
O
A clock derived from the internal processor clock used to generate the system clock.

RdBusy
I
The asynchronous memory read stall termination signal. In most system designs, RdBusy is normally asserted and
is deasserted only to indicate the successful completion of the memory read. RdBusy is sampled by the processor
only during memory read stalls.

WrBusy
I
The asynchronous memory write stall initiation/termination signal. WrBusy is only sampled during write operations.

Run
O
Indicates whether the processor is in a RUN or STALL state.

Exception
O
Indicates the instruction about to commit processor state should be aborted and other exception related information.

Co-Processor Interface
CpSync
O
A clock which is identical to SysOut and used by co-processors for timing synchronization with the CPU.

CpBusy
I
The co-processor busy stall initiation/termination signal.

CpCond (0:3)
I
A 4-bit bus used to transfer conditional branch status from the co-processors to the CPU. CpCond(0) is used to control
whether or not a cache burst refill occurs; the other signals are used as input port pins for co-processor branch
instructions.

Processor Control Signals
DMAStall
I
DMA Stall. Signals to the processor that it should stall accesses to the synchronous memories and tri-state the
synchronous memory interface.

Int (0:5)
I
A 6-bit bus used to signal maskable interrupts to the CPU. At reset time, mode values are sampled from this bus to
initialize the processor. During normal operation, these signals are not latched by the processor and must remain
asserted until the processor acknowledges the interrupt (through software) to the interrupt source.

Clk2xSys
I
The master double frequency input clock, used to generate SysOut.

Clk2xSmp/Rd
I
A double frequency clock input used to determine the sample point for data coming into the CPU and co-processors
and used to determine the enable time of the synchronous memory RAMs.

Clk2xPhi
I
A double frequency clock input used to determine the position of the two internal phases.

Reset
I
Initialization input used to force execution starting from the reset memory address. Reset should be asserted
asynchronously but must be negated synchronously with the leading edge of SysOut.


ABSOLUTE MAXIMUM RATINGS(1, 3)

Symbol   Rating                                 Commercial               Military             Unit
VTERM    Terminal Voltage with Respect to GND   -0.5 to +7.0             -0.5 to +7.0         V
TA, TC   Operating Temperature                  0 to +70(4) (Ambient)    -55 to +125 (Case)   °C
                                                0 to +90(5) (Case)
TBIAS    Case Temperature Under Bias            -55 to +125(4)           -65 to +135          °C
                                                0 to +90(5)
TSTG     Storage Temperature                    -55 to +125              -65 to +155          °C
VIN      Input Voltage                          -0.5 to +7.0             -0.5 to +7.0         V
2873 tbl 05

NOTES:
1. Stresses greater than those listed under ABSOLUTE MAXIMUM RATINGS
may cause permanent damage to the device. This is a stress rating only
and functional operation of the device at these or any other conditions
above those indicated in the operational sections of this specification is not
implied. Exposure to absolute maximum rating conditions for extended
periods may affect reliability.
2. VIN minimum = -3.0V for pulse width less than 15ns.
VIN should not exceed VCC +0.5 Volts.
3. Not more than one output should be shorted at a time. Duration of the short
should not exceed 30 seconds.
4. 16-33 MHz only.
5. 37-40 MHz only.

RECOMMENDED OPERATING
TEMPERATURE AND SUPPLY VOLTAGE

Grade                   Temperature                 GND   VCC
Military 16-33 MHz      -55°C to +125°C (Case)      0V    5.0 ±10%
Commercial 16-33 MHz    0°C to +70°C (Ambient)      0V    5.0 ±5%
Commercial 37-40 MHz    0°C to +90°C (Case)         0V    5.0 ±5%
2873 tbl 07

OUTPUT LOADING FOR AC TESTING

[Figure: test load connected to the device under test. 2873 drw 16]

Signal                CL
IRd, IWr, DRd, DWr    50pF
All Others            25pF

AC TEST CONDITIONS

Symbol   Parameter            Min.   Max.   Unit
VIH      Input HIGH Voltage   3.0    -      V
VIL      Input LOW Voltage    -      0.4    V
VIHS     Input HIGH Voltage   3.5    -      V
VILS     Input LOW Voltage    -      0.4    V
2873 tbl 06


DC ELECTRICAL CHARACTERISTICS
COMMERCIAL TEMPERATURE RANGE (TA = 0°C to +70°C, VCC = +5.0V ±5%)

                                                                16.67MHz      20.0MHz       25.0MHz       33.33MHz
Symbol   Parameter                  Test Conditions             Min.  Max.    Min.  Max.    Min.  Max.    Min.  Max.    Unit
VOH      Output HIGH Voltage        VCC = Min., IOH = -4mA      3.5   -       3.5   -       3.5   -       3.5   -       V
VOL      Output LOW Voltage         VCC = Min., IOL = 4mA       -     0.4     -     0.4     -     0.4     -     0.4     V
VOHT     Output HIGH Voltage(4,7)   VCC = Min., IOH = -8mA      2.4   -       2.4   -       2.4   -       2.4   -       V
VOHC     Output HIGH Voltage(8)     VCC = Min., IOH = -4mA      4.0   -       4.0   -       4.0   -       4.0   -       V
VOLT     Output LOW Voltage(4,7)    VCC = Min., IOL = 8mA       -     0.8     -     0.8     -     0.8     -     0.8     V
VIH      Input HIGH Voltage(5)                                  2.0   -       2.0   -       2.0   -       2.0   -       V
VIL      Input LOW Voltage                                      -     0.8     -     0.8     -     0.8     -     0.8     V
VIHS     Input HIGH Voltage(2,5)                                3.0   -       3.0   -       3.0   -       3.0   -       V
VILS     Input LOW Voltage(1,2)                                 -     0.4     -     0.4     -     0.4     -     0.4     V
IRESET   Input HIGH Current(6)                                  10    100     10    100     10    100     10    100     µA
CIN      Input Capacitance(7)                                   -     10      -     10      -     10      -     10      pF
COUT     Output Capacitance(7)                                  -     10      -     10      -     10      -     10      pF
ICC      Operating Current          VCC = Max.                  -     575     -     650     -     750     -     800     mA
IIH      Input HIGH Leakage(3)      VIH = VCC                   -     100     -     100     -     100     -     100     µA
IIL      Input LOW Leakage(3)       VIL = GND                   -100  -       -100  -       -100  -       -100  -       µA
IOZ      Output Tri-state Leakage   VOH = 2.4V, VOL = 0.5V      -100  100     -100  100     -100  100     -100  100     µA

NOTES:
1. VIL Min. = -3.0V for pulse width less than 15ns. VIL should not fall below -0.5 Volts for larger periods.
2. VIHS and VILS apply to Clk2xSys, Clk2xSmp/Rd, Clk2xPhi, CpBusy, and Reset.
3. These parameters do not apply to the clock inputs.
4. VOHT and VOLT apply to the bidirectional data and tag buses only. Note that VIH and VIL also apply to these signals. VOHT and VOLT are supplied as additional
information to help the system designer understand the relationship between current drive and output voltage on these pins.
5. VIH should not be held above VCC + 0.5 volts.
6. The IDT79R3001 contains an internal pull-up/current source on the TAG pins to facilitate initialization. This current source is disconnected when Reset
is inactive.
7. Guaranteed by design.
8. VOHC applies to RUN and Exception.


DC ELECTRICAL CHARACTERISTICS
COMMERCIAL TEMPERATURE RANGE (TC = 0°C to +90°C, VCC = +5.0V ±5%)

                                                                37.0MHz       40.0MHz
Symbol   Parameter                  Test Conditions             Min.  Max.    Min.  Max.    Unit
VOH      Output HIGH Voltage        VCC = Min., IOH = -4mA      3.5   -       3.5   -       V
VOL      Output LOW Voltage         VCC = Min., IOL = 4mA       -     0.4     -     0.4     V
VOHC     Output HIGH Voltage(7)     VCC = Min., IOH = -4mA      4.0   -       4.0   -       V
VOHT     Output HIGH Voltage(4,6)   VCC = Min., IOH = -8mA      2.4   -       2.4   -       V
VOLT     Output LOW Voltage(4,6)    VCC = Min., IOL = 8mA       -     0.8     -     0.8     V
VIH      Input HIGH Voltage(5)                                  2.0   -       2.0   -       V
VIL      Input LOW Voltage(1)                                   -     0.8     -     0.8     V
VIHS     Input HIGH Voltage(2,5)                                3.0   -       3.0   -       V
VILS     Input LOW Voltage(1,2)                                 -     0.4     -     0.4     V
IRESET   Input HIGH Current(6)                                  10    100     10    100     µA
CIN      Input Capacitance(6)                                   -     10      -     10      pF
COUT     Output Capacitance(6)                                  -     10      -     10      pF
ICC      Operating Current          VCC = 5V, TA = 70°C         -     825     -     850     mA
IIH      Input HIGH Leakage(3)      VIH = VCC                   -     100     -     100     µA
IIL      Input LOW Leakage(3)       VIL = GND                   -100  -       -100  -       µA
IOZ      Output Tri-state Leakage   VOH = VCC, VOL = GND        -100  100     -100  100     µA
2873 tbl 09

NOTES:
1. VIL Min. = -3.0V for pulse width less than 15ns. VIL should not fall below -0.5 Volts for larger periods.
2. VIHS and VILS apply to Clk2xSys, Clk2xSmp, Clk2xRd, Clk2xPhi, CpBusy, and Reset.
3. These parameters do not apply to the clock inputs.
4. VOHT and VOLT apply to the bidirectional data and tag buses only. Note that VIH and VIL also apply to these signals. VOHT and VOLT are provided
to give the designer further information about these specific signals.
5. VIH should not be held above VCC + 0.5 volts.
6. Guaranteed by design.
7. VOHC applies to RUN and Exception.


DC ELECTRICAL CHARACTERISTICS
MILITARY TEMPERATURE RANGE (Tc = -55°C to +125°C, VCC = +5.0V ±10%)
16.67MHz
Symbol

Parameter

VOH

Output HIGH Voltage

VOL

Output LOW Voltage

VOHT

Output HIGH Voltage(4,7)

VOHe

Output HIGH Voltage(8)

VOLT

Output LOW Voltage(4,7)

VIH

Input HIGH Voltage(5)

VIL

Input LOW Voltage

VIHS

Input HIGH Voltage(2,5)

VILS

Input LOW Voltage{1,2)

Test Conditions

Min.

= Min.,

3.5

= -4mA
Vee = Min., IOL = 4mA
Vee = Min., IOH = -SmA
Vee = Min., IOH = -4mA
Vee = Min., IOL = SmA
Vee

IOH

7.

Max.

25.0MHz
Min.

Max.

0.4

Max.

0.4

Unit

V

3.5

3.5

3.5

33.33MHz
Min.

.iI:p.4

V

2.4

2.4

V

4.0

4.0

V

O.S
2.0

V

O.S

V

2.0
O.S ..

O.S
3.0

3Jb :)::o.S

O.S

V

0.4

V

V

3.0

3.0

1Jt..jOO·

IRESET

Input HIGH Current(6)

100

~A

CIN

Input Capacitanee(7)

10

pF

GoUT

Output Capaeitanee(7)

10

pF

lee

Operating Current

Vee = Max.

750

mA

VIH = Vee

1.
2.
3.
4.

8.

20.0MHz
Min.

0.4

10

IIH

Input HIGH Leakage(3)

IlL

Input LOW Leakage(3)

VIL = GND

loz

Output Tri-state Leakage

VOH = 2.4V, VOL

= 0.5V

-100

NOTES:

5.
6.

Max.

100

100

-100

100

10

-100

100

~A

2873 tbl 08

1. VIL Min. = -3.0V for pulse width less than 15ns. VIL should not fall below -0.5 Volts for larger periods.
2. VIHS and VILS apply to Clk2xSys, Clk2xSmp/Rd, Clk2xPhi, CpBusy, and Reset.
3. These parameters do not apply to the clock inputs.
4. VOHT and VOLT apply to the bidirectional data and tag buses only. Note that VIH and VIL also apply to these signals. VOHT and VOLT are supplied as additional
information to help the system designer understand the relationship between current drive and output voltage on these pins.
5. VIH should not be held above VCC + 0.5 volts.
6. The IDT79R3001 contains an internal pull-up/current source on the TAG pins to facilitate initialization. This current source is disconnected when Reset
is inactive.
7. Guaranteed by design.
8. VOHC applies to RUN and Exception.


AC ELECTRICAL CHARACTERISTICS(1,4)
COMMERCIAL TEMPERATURE RANGE (TA = 0°C to +70°C, VCC = +5.0V ±5%)
16.67MHz
Symbol

Parameter

Test Conditions

Min.

Max.

20.0MHz
Min. Max.

25.0MHz
Min.

Max.

33.33MHz
Min.

Max.

Unit

6

ns
ns

Clock

8

10

-

8

-

6

-

25

500

20

500

15

500

ns

Tcyc/4

0

Tcyc/4

0

Tcyc/4

0

Tcyc/4

ns

9

Tcyc/4

7

Tcyc/4

5

Tcyc/4

3.5

Tcyc/4

ns

.-

-2

-

-1.5

ns

-0.5

ns

9

-

10

30

500

Clk2xSys to Clk2xSmp/Rd(5)

0

Cik2xSmp/Rd to Clk2xPhi(5)

TCkHigh

Input Clock High(2)

Note 7

12.5

TCkLow

Input Clock Low(2)

Note 7

12.5

TCkP

Input Clock Period(2)

Run Operation
TOEn

Data Enable(3)

TOOls

Data Disable(3)

TOVal

Data Valid

Load= 25pF

TWrDly

Write Delay

Load= 25pF

Tos

Data Set·up

TOH

Data Hold

TCBS

Cp8usy Set-up

-2

-

-1.5

-1

5

-

4

-

-0.5

3

-

-

8

-2.5

-2.5

-2.5

13

-

11

-

-2.5

-

-2.5

-1

3

-2.5

-

Load= 25pF

-

7

-

Access Type2

Load= 25pF

17

-

Memory Write

Load= 25pF

1

27

Exception

Load= 25pF

-

7

-

7

-

30

-

TCBH

Cp8usy Hold

TAcTy

Access Type (1 :0)

TAT2
TMwr
TExc

6
9

2
3

-

6

-

5

14

-

12

-

1

23

1

18

-

5

23

-

20

23

-

18

2

ns

2

ns
ns

4.5

-

-2.5

-

ns

7

-

ns

3.5

ns

8.5

ns

9.5

ns

3.5

ns

15

ns

-2.5

-

ns

Stall Operation
TSAVal

Address Valid

Load= 25pF

TSAcTy

Access Type

Load= 25pF

TMRdi

Memory Read Initiate

Load= 25pF

TMRdT

Memory Read Terminate

Load= 25pF

1

Tstl

Run Terminate

Load= 25pF

3

TRun

Run Initiate

Load= 25pF

7

1

-

27
27

13.5

ns

1

13.5

ns
ns

1

23

2

1

23

1

5

1

13.5

17

3

15

3

10

2

7.5

ns

-

6

-

4

3

ns
ns

1

18

-

-

TSMWr

Memory Write

Load= 25pF

3

27

3

23

3

18

2

9.5

TSEc

Exception Valid

Load= 25pF

-

15

-

13

-

10

-

7.5

ns

TOMAOis

DMA Drive On

Load= 25pF

3

15

3

15

3

15

3

15

ns

TOMAEn

DMA Drive Off

Load= 25pF

-

10

-

10

-

10

-

10

ns

-

6

-

6

140

140

-

Reset Initialization
TRST

Reset Pulse Width

TRSTTAG Reset Pulse Width, Pull-downs
on Tag

6

-

6

140

-

140

0.5

1

0.5

Tcyc

IlS

Capacitive Load Deration
CLD

Load Derate(6)

1

0.5

1

NOTES:
1. All timings are referenced to 1.5V.
2. The clock parameters apply to all three 2xClocks: Clk2xSys, Clk2xSmp/Rd, and Clk2xPhi.
3. This parameter is guaranteed by design.
4. These parameters are illustrated in detail in the "IDT79R3001 Hardware Interface Guide".
5. Tcyc is one CPU clock cycle (2 cycles of a 2x clock).
6. With the exception of Run, no two signals on a given device will derate for a given load by a difference greater than 15%.
7. Transition time <2.5ns for 33MHz; <5ns for lower speeds.
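
As a worked illustration of note 5, the following sketch derives Tcyc and the Tcyc/4 limit used for the clock-to-clock delays from the period of the 2x input clock. The function names are hypothetical and chosen only for this example; the arithmetic follows directly from "2 cycles of a 2x clock".

```python
def clk2x_period_ns(cpu_mhz):
    """Period of the double-frequency input clock for a given CPU clock rate."""
    return 1000.0 / (2.0 * cpu_mhz)


def tcyc_ns(clk2x_ns):
    """One CPU clock cycle is 2 cycles of a 2x clock (note 5)."""
    return 2.0 * clk2x_ns


def quarter_cycle_ns(clk2x_ns):
    """Tcyc/4, the upper limit quoted for the 2x clock-to-clock delays."""
    return tcyc_ns(clk2x_ns) / 4.0


if __name__ == "__main__":
    period = clk2x_period_ns(25.0)      # 25.0MHz CPU clock
    print(period, tcyc_ns(period), quarter_cycle_ns(period))
```

For a 25.0MHz part this gives a 20 ns input clock period, a 40 ns Tcyc, and a 10 ns Tcyc/4 limit.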

5.2

0.5

1

ns/25pF
2873 tbl 11


AC ELECTRICAL CHARACTERISTICS(1,4)
COMMERCIAL TEMPERATURE RANGE (Tc = O°C to +90°C, Vcc = +5.0V ±5%)
Parameter

Test Conditions

Unit

NOTES:
1. All timings are referenced to 1.5V.
2. The clock parameters apply to all three 2xClocks: Clk2xSys, Clk2xSmp/Rd, and Clk2xPhi.
3. This parameter is guaranteed by design.
4. These parameters are illustrated in detail in the "IDT79R3001 Hardware Interface Guide".
5. Tcyc is one CPU clock cycle (2 cycles of a 2x clock).
6. With the exception of Run, no two signals on a given device will derate for a given load by a difference greater than 15%.

5.2

2873 tbl 12


AC ELECTRICAL CHARACTERISTICS(1,4)
MILITARY TEMPERATURE RANGE (Tc = -55°C to +125°C, Vcc = +5.0V ±1 0%)
Parameter

Test Conditions

Unit

NOTES:
1. All timings are referenced to 1.5V.
2. The clock parameters apply to all three 2xClocks: Clk2xSys, Clk2xSmp/Rd, and Clk2xPhi.
3. This parameter is guaranteed by design.
4. These parameters are illustrated in detail in the "IDT79R3001 Hardware Interface Guide".
5. Tcyc is one CPU clock cycle (2 cycles of a 2x clock).
6. With the exception of Run, no two signals on a given device will derate for a given load by a difference greater than 15%.
7. Transition time <2.5ns for 33MHz; <5ns for lower speeds.


PIN CONFIGURATIONS
172-Pin Ceramic Flatpack (Cavity Side View)

Data21
Data22
Data24
Data25
Data26
Data31
DataP3
Data27
Data28
XEn
Data29
Data30
Exc
Clk2xPhi
GND7
GND6
CpCond2
VCC7
VCC6
GND5
GND4
GND3
VCC5
VCC4
VCC3
GND2
GND1
Clk2xSys
CpSync
MemWr
AccTy1
Run
VCC2
VCC1
Clk2xSmp/Rd
SysOut

43

87

IDT79R3001 RISController

AdrLo2
AdrLo3
AdrLo4
AdrLo5
AdrLo6
AdrLo7
AdrLo8
AdrLo9
AdrLo10
AdrLo11
AdrLo12
AdrLo13
AdrLo14
VCC15
VCC16
VCC17
GND16
GND17
VCC18
VCC19
GND18
VCC20
VCC21
VCC22
AdrLo15
CpCond0
CpCond1
Resvd1
GND19
GND20
AdrLo16
AdrLo17

Int0
Int1

Int2
Int3

DClk
IClk

Int4
Int5
CpBusy
WrBusy
RdBusy
BusError
Reset

CpCond3
MemRd
AccTy0
AccTy2
DMAStall

2873 drw 17

NOTE:
1. AccTyp2 is redefined to be Parity Error if the parity enable option is selected at device initialization.


PIN CONFIGURATIONS (Continued)
144-Pin PGA (Top View)
[Figure: 144-Pin PGA pin assignment grid, IDT79R3001 RISController. 2873 drw 18]

NOTE:
1. AccTyp2 is redefined to be Parity Error if the parity enable option is selected at device initialization.


Clk2xSys

Clk2xSmpRd

Clk2xPhi
2873drw 19

Figure 16. Input Clock Timing

Rd/SmpOut*

2873 drw 20

* These signals are not actually output from the processor. They are drawn to provide

a reference for other timing diagrams.

Figure 17. Processor Reference Clock Timing

[Figure: timing diagram by phase showing SysOut, PhiOut, AddrLo, AccTyp 0:1 (Size of Load Data), AccTyp 2, D Bus Input, Data and Tag Buses, IClk, and DClk. 2873 drw 21]

Figure 18. Synchronous Memory (Cache) Timing

[Figure: timing diagram across RUN, STALL, STALL, FIXUP, and RUN cycles showing AddrLo, Tag (Address High), AccTyp 0:1, AccTyp 2, and Data (Output). 2873 drw 22]

Figure 19. Memory Write Timing

[Figure: timing diagram across RUN, STALL, STALL, FIXUP, and RUN cycles showing AddrLo, Tag (Address High), AccTyp 0:1, AccTyp 2, Data (Input), RdBusy, and CpCond0. 2873 drw 23]

Figure 20. Memory Read Timing

[Figure: timing diagram by phase for a Coprocessor Store and a Coprocessor Load showing CpBusy, Exception, and CpCond(n). 2873 drw 24]

Figure 21. Co-Processor Load/Store Timing

[Figure: interrupt timing diagram by phase. 2873 drw 25]

Figure 22. Interrupt Timing

[Figure: timing diagram by phase showing SysOut, PhiOut, Mode, Int(n), and Reset. 2873 drw 26]

Figure 23. Mode Vector Initialization
NOTES:
1. Reset must be negated synchronously; however, it can be asserted asynchronously. Designs must not rely on the proper functioning of SysOut prior to
the assertion of Reset.
2. If Phase-Lock On or is asserted as mode select options, they should be asserted throughout the Reset period, to insure that the slowest coprocessor in
the system has sufficient time to lock to the CPU clocks.
3. Reset is actually sampled in both Phase 1 and Phase 2. To insure proper initialization, it must be negated relative to the end of Phase 1.

[Figure: timing diagram showing DMAStall across Stall and Fixup cycles. 2873 drw 27]

Figure 24. Entering DMA Stall

[Figure: timing diagram by phase across DMAStall, Run, and DMAStall cycles showing SysOut, PhiOut, DMAStall, Run, DRd, DWr, IRd, IWr, XEn, IClk, DClk, and AdrLo. 2873 drw 28]

Figure 25. Completing DMA Stall

ORDERING INFORMATION

IDT   XXXXX         XX      X         X
      Device Type   Speed   Package   Process/Temperature Range

Device Type:
79R3001   RISController

Speed:
16   16.67 MHz
20   20.0 MHz
25   25.0 MHz
33   33.0 MHz
37   37.0 MHz
40   40.0 MHz

Package:
G   144-Pin PGA
F   172-Pin Flat Pack

Process/Temperature Range:
Blank   Commercial (0°C to +70°C)
B       Military (-55°C to +125°C)
        Compliant to MIL-STD-883, Class B

2873 drw 29


RISC FLOATING POINT
ACCELERATOR (FPA)

IDT79R3010A
IDT79R3010AE

Integrated Device Technology, Inc.

FEATURES:

• Hardware Support of Single- and Double-Precision
Operations:
- Floating-Point Add
- Floating-Point Subtract
- Floating-Point Multiply
- Floating-Point Divide
- Floating-Point Comparisons
- Floating-Point Conversions
• Sustained performance:
- 11 MFLOPS single precision LINPACK
- 7.3 MFLOPS double precision LINPACK
• 16.7MHz through 40 MHz operation
• Direct, high-speed interface with IDT79R3000A and
IDT79R3001 Processor
• Supports Full Conformance With IEEE 754-1985 Floating-Point Specification
• Full 64-bit operation using sixteen 64-bit data registers
• High-speed CEMOS™ technology
• Military product compliant to MIL-STD-883, Class B
• 32-bit status/control register providing access to all IEEE-Standard exception handling
• Load/store architecture allows data movement directly
between FPA and memory or between CPU and FPA
• Overlapped operation of independent floating point ALUs
• Fully pin-compatible with IDT79R3010/IDT79R3010L

DESCRIPTION:
The IDT79R3010A Floating-Point Accelerator (FPA) operates in conjunction with the IDT79R3000A Processor and
extends the IDT79R3000A's instruction set to perform arithmetic operations on values in floating-point representations.
The IDT79R3010A FPA, with associated system software,
fully conforms to the requirements of ANSI/IEEE Standard
754-1985, "IEEE Standard for Binary Floating-Point Arithmetic." In addition, the architecture fully supports the standard's
recommendations.
This data sheet provides an overview of the features and
architecture of the 79R3010A FPA. A more detailed description of the operation of the device is incorporated in the
"R3000A Family Hardware User's Manual," and a more detailed architectural overview is provided in the "mips RISC
Architecture" book, both available from IDT.

CACHE
DATA

CONTROL
UNIT
&
CLOCKS
DIVIDE UNIT

CLOCKS-"
PLLOn -..

MULTIPLY UNIT

Figure 1. 1DT79R3010A Functional Block Diagram

CEMOS is a trademark of Integrated Device Technology, Inc.

MILITARY AND COMMERCIAL TEMPERATURE RANGES
©1990 Integrated Device Technology, Inc.

5.3

DECEMBER 1990
DSC-903911

1


IDT79R3010A FPA REGISTERS
The IDT79R3010A FPA provides 32 general purpose 32-bit
registers, a Control/Status register, and a Revision Identification
register. The tightly-coupled coprocessor interface
causes the register resources of the FPA to appear to the
systems programmers as an extension of the CPU internal
registers. The FPA registers are shown in Figure 2.

General Purpose Registers
(FGR/FPR)

63          32 31           0
    FGR1           FGR0
    FGR3           FGR2
    FGR5           FGR4
     •              •
     •              •
     •              •
    FGR27          FGR26
    FGR29          FGR28
    FGR31          FGR30

Control/Status Register
31                          0
  Exceptions/Enables/Modes

Implementation/Revision Register
31                          0

2873 drw 02

Figure 2. IDT79R3010A FPA Registers

Floating-point coprocessor operations reference three types of
registers:
• Floating-Point Control Registers (FCR)
• Floating-Point General Registers (FGR)
• Floating-Point Registers (FPR)

The FPA performs three types of operations:
• Loads and Stores;
• Moves;
• Two- and three-register floating-point operations.

Floating-Point General Registers (FGR)
There are 32 Floating-Point General Registers (FGR) on
the FPA. They represent directly-addressable 32-bit registers,
and can be accessed by Load, Store, or Move Operations.
Floating-Point Registers (FPR)
The 32 FGRs described in the preceding paragraph are
also used to form sixteen 64-bit Floating-Point Registers
(FPR). Pairs of general registers (FGRs), for example FGR0
and FGR1 (refer to Figure 2), are physically combined to form
a single 64-bit FPR. The FPRs hold a value in either single- or
double-precision floating-point format. Double-precision
format FPRs are formed from two adjacent FGRs.
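
The pairing can be sketched in a few lines. This is only an illustration, assuming the layout drawn in Figure 2 (the odd-numbered FGR holds bits 63..32 and the even-numbered FGR holds bits 31..0) and the IEEE 754 double-precision format; the function names are hypothetical.

```python
import struct


def fpr_to_double(fgr_even, fgr_odd):
    """Combine two 32-bit FGR words into the double held by one 64-bit FPR.

    Assumes the odd FGR supplies bits 63..32 and the even FGR bits 31..0.
    """
    bits = ((fgr_odd & 0xFFFFFFFF) << 32) | (fgr_even & 0xFFFFFFFF)
    return struct.unpack(">d", struct.pack(">Q", bits))[0]


def double_to_fgr_pair(value):
    """Split a double into (even FGR word, odd FGR word)."""
    bits = struct.unpack(">Q", struct.pack(">d", value))[0]
    return bits & 0xFFFFFFFF, bits >> 32


if __name__ == "__main__":
    even, odd = double_to_fgr_pair(1.0)
    print(hex(even), hex(odd), fpr_to_double(even, odd))
```

Because Load, Store, and Move operations transfer one 32-bit word at a time, a double-precision value would be moved as two such words under this model.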
Floating-Point Control Registers (FCR)
There are 2 Floating-Point Control Registers (FCR) on the
FPA. They can be accessed only by Move operations and
include the following:
• Control/Status register, used to control and monitor
exceptions, operating modes, and rounding modes;
• Revision register, containing revision information about
the FPA.

COPROCESSOR OPERATION
The FPA continually monitors the IDT79R3000A processor
instruction stream. If an instruction does not apply to the
coprocessor, it is ignored; if an instruction does apply to the
coprocessor, the FPA executes that instruction and transfers
necessary result and exception data synchronously to the
IDT79R3000A main processor.

Load, Store, and Move Operations
Load, Store, and Move operations move data between
memory or the IDT79R3000A Processor registers and the
IDT79R3010A FPA registers. These operations perform no
format conversions and cause no floating-point exceptions.
Load, Store, and Move operations reference a single 32-bit
word of either the Floating-Point General Registers (FGR) or
the Floating-Point Control Registers (FCR).
Floating-Point Operations
The FPA supports the following single- and double-precision format floating-point operations:
• Add
• Subtract
• Multiply
• Divide
• Absolute Value
• Move
• Negate
• Compare
In addition, the FPA supports conversions between single- and
double-precision floating-point formats and fixed-point
formats.
The FPA incorporates separate Add/Subtract, Multiply,
and Divide units, each capable of independent and concurrent
operation. Thus, to achieve very high performance, floating
point divides can be overlapped with floating point multiplies
and floating point additions. These floating point operations
occur independently of the actions of the CPU, allowing
further overlap of integer and floating point operations. Figure
3 illustrates an example of the types of overlap permissible.


[Figure: timeline of FPA cycles 0 through 12 for a DIV.S operation, with three kinds of shaded cycles:]
- Only Load, Store, and Move operations are permitted in FPA during these cycles.
- Other FPA instructions can proceed during these cycles. However, two multiply or two divide operations cannot be overlapped.
- These cycles are free for integer operations in the CPU.

2873 drw 03

Figure 3. Examples of Overlapping Floating Point Operation
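
The overlap restriction stated with Figure 3 can be expressed as a small check. This is a rough sketch under stated assumptions: it models only the rule that two multiplies or two divides cannot be overlapped, it ignores the per-cycle windows shown in the figure, and it reports back-to-back Add/Subtract operations as unspecified because the text does not describe that case. The names `UNIT_OF` and `overlap_allowed` are hypothetical.

```python
# Execution unit used by each operation, following the three units named in the text.
UNIT_OF = {
    "ADD": "add/subtract",
    "SUB": "add/subtract",
    "MUL": "multiply",
    "DIV": "divide",
}


def overlap_allowed(in_flight, candidate):
    """Return True, False, or None (not specified) for overlapping two operations."""
    unit_a = UNIT_OF[in_flight]
    unit_b = UNIT_OF[candidate]
    if unit_a != unit_b:
        return True            # separate units operate concurrently
    if unit_a in ("multiply", "divide"):
        return False           # two multiplies or two divides cannot be overlapped
    return None                # same-unit add/subtract: not covered by the text


if __name__ == "__main__":
    for pair in (("DIV", "MUL"), ("DIV", "ADD"), ("MUL", "MUL"), ("ADD", "SUB")):
        print(pair, overlap_allowed(*pair))
```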

Exceptions
The IDT79R3010A FPA supports all five IEEE standard
exceptions:
• Invalid Operation
• Inexact Operation
• Division by Zero
• Overflow
• Underflow
The FPA also supports the optional, Unimplemented Operation exception that allows unimplemented instructions to
trap to software emulation routines.
The FPA provides precise exception capability to the CPU;
that is, the execution of a floating point operation which
generates an exception causes that exception to occur at the
CPU instruction which caused the operation. This precise
exception capability is a requirement in applications and
languages which provide a mechanism for local software
exception handlers within software modules.
INSTRUCTION SET OVERVIEW
All IDT79R3010A instructions are 32 bits long and they can
be divided into the following groups:
• Load/Store and Move instructions move data between
memory, the main processor and the FPA general
registers.
• Computational instructions perform arithmetic operations
on floating point values in the FPA registers.
• Conversion instructions perform conversion operations
between the various data formats.
• Compare instructions perform comparisons of the contents of registers and set a condition bit based on the
results. The result of the compare operation is output on the
FpCond output of the FPA, which is typically used as
CpCond1 on the CPU for use in coprocessor branch
operations.
Table 1 lists the instruction set of the IDT79R3010A FPA.

OP           Description
             Load/Store/Move Instructions
LWC1         Load Word to FPA
SWC1         Store Word from FPA
MTC1         Move Word to FPA
MFC1         Move Word from FPA
CTC1         Move Control word to FPA
CFC1         Move Control word from FPA
             Conversion Instructions
CVT.S.fmt    Floating-point Convert to Single FP
CVT.D.fmt    Floating-point Convert to Double FP
CVT.W.fmt    Floating-point Convert to fixed-point
             Computational Instructions
ADD.fmt      Floating-point Add
SUB.fmt      Floating-point Subtract
MUL.fmt      Floating-point Multiply
DIV.fmt      Floating-point Divide
ABS.fmt      Floating-point Absolute value
MOV.fmt      Floating-point Move
NEG.fmt      Floating-point Negate
             Compare Instructions
C.cond.fmt   Floating-point Compare
2873 tbl 01

Table 1. IDT79R3010A Instruction Summary


IDT79R3010A PIPELINE ARCHITECTURE

The IDT79R3010A FPA provides an instruction pipeline
that parallels that of the IDT79R3000A processor. The FPA,
however, has a 6-stage pipeline instead of the 5-stage pipeline
of the IDT79R3000: the additional FPA pipe stage is used to
provide efficient coordination of exception responses between
the FPA and main processor.
The execution of a single IDT79R3010A instruction consists of six primary steps:
1) IF-Instruction Fetch. The main processor calculates the
instruction address required to read an instruction from the
I-Cache. No action is required of the FPA during this pipe
stage since the main processor is responsible for address
generation.
2) RD-The instruction is present on the data bus during
phase 1 of this pipe stage and the FPA decodes the
instruction on the bus to determine if it is an instruction for
the FPA.
3) ALU-If the instruction is an FPA instruction, instruction
execution commences during this pipe stage.
4) MEM-If this is a coprocessor load or store instruction, the
FPA presents or captures the data during phase 2 of this
pipe stage.
5) WB-The FPA uses this pipe stage solely to deal with
exceptions.
6) FWB-The FPA uses this stage to write back ALU results
to its register file. This stage is the equivalent of the WB
stage in the IDT79R3000A main processor.
Each of these steps requires approximately one FPA cycle
as shown in Figure 3 (parts of some operations spill over into
another cycle while other operations require only 1/2 cycle).
INSTRUCTION EXECUTION
F

I

I
I-Cache

RD

I RF

MEM

ALU

D-Cache

OP

T

WB
Register file
write back or
FP exceptions

FWB
FpWB

'-----y--J
One Cycle
2873 drw 04

Figure 4. IDT79R3010A Instruction Summary

Instruction
Flow

IF

WB

FWB

MEM

WB

ALU

MEM

RD

ALU

IF

RD

Current
Cycle

2873 drw 05

Figure 5. IDT79R3010A Instruction Pipeline

The IDT79R3010A uses a 6-stage pipeline to achieve an
instruction execution rate approaching one instruction per
FPA cycle. Thus, the execution of six instructions at a time is
overlapped as shown in Figure 5.

This pipeline operates efficiently because different FPA
resources (address and data bus accesses, ALU operations,
register accesses, and so on) are utilized on a non-interfering
basis.
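
The overlap can be illustrated with a short sketch that lists which stage each instruction occupies in a given cycle. It is an idealized model, assuming one instruction enters IF per cycle and no stalls occur; the stage names come from the six steps listed earlier, and the function names are hypothetical.

```python
STAGES = ["IF", "RD", "ALU", "MEM", "WB", "FWB"]


def stages_in_cycle(cycle, n_instructions):
    """Return (instruction index, stage) pairs active in the given cycle.

    Instruction i is assumed to enter IF in cycle i and advance one stage per cycle.
    """
    active = []
    for i in range(n_instructions):
        stage_index = cycle - i
        if 0 <= stage_index < len(STAGES):
            active.append((i, STAGES[stage_index]))
    return active


def chart(n_instructions):
    """Render a text chart with one row per instruction and one column per cycle."""
    total_cycles = n_instructions + len(STAGES) - 1
    rows = []
    for i in range(n_instructions):
        cells = []
        for cycle in range(total_cycles):
            k = cycle - i
            cells.append(STAGES[k].ljust(4) if 0 <= k < len(STAGES) else "    ")
        rows.append("I%d: %s" % (i, "".join(cells).rstrip()))
    return "\n".join(rows)


if __name__ == "__main__":
    print(chart(6))
    print(stages_in_cycle(5, 6))
```

In this model the sixth cycle is the first one in which all six stages are occupied at once, which corresponds to the "Current Cycle" column of Figure 5.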


PACKAGE THERMAL SPECIFICATIONS
The IDT79R3010A utilizes special packaging techniques
to improve both the thermal and electrical characteristics of
the floating point accelerator.
In order to improve the electrical characteristics of the
device, the package is constructed using multiple signal
planes, including individual power planes and ground planes
to reduce noise associated with high-frequency TTL parts.
In order to improve the thermal characteristics of the
floating point accelerator, the device is housed using cavity
down packaging for the flatpack and the PGA (the J-bend
CerQuad is cavity up). In addition, these packages incorporate a copper-tungsten thermal slug designed to efficiently
transfer heat from the die to the case of the package, and thus
effectively lower the thermal resistance of the package. The
use of an additional external heat sink affixed to the package
thermal slug further decreases the effective thermal resistance of the package.
The case temperature may be measured in any environment to determine whether the device is within the specified
operating range. The case temperature should be measured
at the center of the top surface opposite the package cavity
(the package cavity is the side where the package lid is
mounted).
The equivalent allowable ambient temperature, TA, can be
calculated using the thermal resistance from case to ambient
(θca) for the given package. The following equation relates
ambient and case temperature:
TA = TC - P × θca
where P is the maximum power consumption, calculated by
using the maximum ICC from the DC Electrical Characteristics
section.
Typical values for θca at various airflows are shown in
Table 2 for the various CPU packages.
Airflow (ft/min)      0     200    400    600    800    1000
θCA (84-PGA)          22    8      3      2      1.5    1.0
θCA (84-Flatpack)     22    9      4      3      2      1.5
θCA (84-CerQuad)      25    17     12     8      7      6

Table 2. Thermal Resistance (θCA) at Various Airflows
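
As a worked illustration of the relation TA = TC - P × θCA, the following Python sketch looks up θCA from Table 2 and computes the allowable ambient temperature. Several things here are assumptions rather than data-sheet statements: θCA is taken to be in °C/W, P is estimated as maximum ICC times the upper VCC limit, and the case temperature and function names in the example are hypothetical.

```python
# θCA values transcribed from Table 2, keyed by package, then airflow in ft/min.
# Units are assumed to be °C/W (the table itself does not state them).
THETA_CA = {
    "84-PGA":      {0: 22, 200: 8,  400: 3,  600: 2, 800: 1.5, 1000: 1.0},
    "84-Flatpack": {0: 22, 200: 9,  400: 4,  600: 3, 800: 2,   1000: 1.5},
    "84-CerQuad":  {0: 25, 200: 17, 400: 12, 600: 8, 800: 7,   1000: 6},
}


def max_power_watts(icc_max_ma, vcc_v):
    """Estimate P from the maximum ICC (mA) and a supply voltage (V).

    Using the upper VCC limit is an assumption made for this sketch.
    """
    return (icc_max_ma / 1000.0) * vcc_v


def allowable_ambient(tc_c, power_w, theta_ca):
    """TA = TC - P * θCA."""
    return tc_c - power_w * theta_ca


if __name__ == "__main__":
    # Hypothetical operating point: 600 mA maximum ICC, 5.25 V supply,
    # 85 °C case temperature, PGA package at 200 ft/min.
    p = max_power_watts(600, 5.25)
    ta = allowable_ambient(85.0, p, THETA_CA["84-PGA"][200])
    print(f"P = {p:.2f} W, allowable TA = {ta:.1f} °C")
```

The same calculation with a lower airflow entry shows how quickly the allowable ambient temperature drops as θCA rises.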


PIN CONFIGURATION(1)
(Top View)

84-Pin J Bend CERQUAD

[Figure: pin diagram with index mark. Signals listed along one side: Clk2xRd, FpSysIn, Data (31), VCC1, GND1, DataP (3), FpSysOut, Clk2xSys, Clk2xSmp, Clk2xPhi, Reset, FpSync, VCC2, GND2, VCC3, GND3, PLLOn, VCC4, GND4, VCC5, GND5. Signals listed along the opposite side: GND13, DataP (1), VCC12, GND12, FpCond, FpBusy, FpInt, Exception, Run, Resvd2, Resvd1, VCC11, GND11, VCC10, GND10, FpPresent, Resvd0, VCC9, GND9, VCC8, GND8.]

NOTE:
1. Reserved pins must not be connected.


PIN CONFIGURATION(1)
(Ceramic, Cavity Down) - BOTTOM VIEW

84-Pin Ceramic Pin Grid Array

[Figure: pin grid diagram showing the positions of the Data, DataP, clock, control, reserved, VCC, and VSS pins]

NOTE:
1. Reserved pins must not be connected.


PIN CONFIGURATION(1)
84-L QUAD FLATPACK (CAVITY DOWN)
TOP VIEW

[Figure: flatpack pin diagram showing the positions of the Data (0-31), DataP, VCC, and GND pins]

NOTE:
1. Reserved pins must not be connected.


PIN DESCRIPTIONS

Pin Name      I/O   Description
Data (0-31)   I/O   A multiplexed 32-bit bus used for instruction and data transfers on phase 1 and phase 2, respectively.
DataP (0-3)   O     A 4-bit bus containing even parity over the data bus. Parity is generated by the FPA on stores.
Run           I     Input to the FPA which indicates whether the processor-coprocessor system is in the run or stall state.
Exception     I     Input to the FPA which indicates exception related status information.
FpBusy        O     Signal to the CPU indicating a request for a coprocessor busy stall.
FpCond        O     Signal to the CPU indicating the result of the last comparison operation.
FpInt         O     Signal to the CPU indicating that a floating-point exception has occurred for the current FPA instruction.
Reset         I     Synchronous initialization input used to distinguish the processor-FPA synchronization period from the execution period. Reset must be synchronized by the leading edge of SysOut from the CPU.
PLLOn         I     Input which during the reset period determines whether the phase lock mechanism is enabled and during the execution period determines the output timing model.
FpPresent     O     Output which is pulled to ground through an impedance of approximately 0.5k ohms. By providing an external pullup on this line, an indication of the presence or absence of the FPA can be obtained.
Clk2xSys      I     A double frequency clock input used for generating FpSysOut.
Clk2xSmp      I     A double frequency clock input used to determine the sample point for data coming in to the FPA.
Clk2xRd       I     A double frequency clock input used to determine the disable point for the data drivers.
Clk2xPhi      I     A double frequency clock input used to determine the position of the internal phases, phase 1 and phase 2.
FpSysOut      O     Synchronization clock from the FPA.
FpSysIn       I     Input used to receive the synchronization clock from the FPA.
FpSync        I     Input used to receive the synchronization clock from the CPU.
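
The DataP description above says only that the four parity bits carry even parity over the data bus. A minimal Python sketch of even-parity generation follows; it assumes, without confirmation from this data sheet, that each DataP bit covers one byte of the 32-bit word (DataP(0) for Data(0-7), and so on). The function names are hypothetical.

```python
def even_parity_bit(byte):
    """Return the bit that makes the total number of 1s (data + parity) even."""
    return bin(byte & 0xFF).count("1") & 1


def data_parity(word):
    """Return [DataP(0), DataP(1), DataP(2), DataP(3)] for a 32-bit word.

    Assumes one parity bit per byte lane, lane 0 being Data(0-7).
    """
    return [even_parity_bit((word >> (8 * lane)) & 0xFF) for lane in range(4)]


if __name__ == "__main__":
    word = 0xFF030700
    print(f"word = {word:#010x}, parity bits = {data_parity(word)}")
```

With even parity, a byte that already holds an even number of 1s gets a parity bit of 0, so a receiver can detect any single-bit error within a byte lane.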


ABSOLUTE MAXIMUM RATINGS(1, 3)

Symbol   Rating                                 Commercial              Military             Unit
VTERM    Terminal Voltage with Respect to GND   -0.5 to +7.0            -0.5 to +7.0         V
TA, TC   Operating Temperature                  0 to +70(4) (Ambient)   -55 to +125 (Case)   °C
                                                0 to +90(5) (Case)
TBIAS    Case Temperature Under Bias            -55 to +125(4)          -65 to +135          °C
                                                0 to +90(5)
TSTG     Storage Temperature                    -55 to +125             -65 to +155          °C
VIN      Input Voltage                          -0.5 to +7.0            -0.5 to +7.0         V

NOTES:
1. Stresses greater than those listed under ABSOLUTE MAXIMUM RATINGS
may cause permanent damage to the device. This is a stress rating only
and functional operation of the device at these or any other conditions
above those indicated in the operational sections of this specification is not
implied. Exposure to absolute maximum rating conditions for extended
periods may affect reliability.
2. VIN minimum = -3.0V for pulse width less than 15ns.
VIN should not exceed VCC +0.5 Volts.
3. Not more than one output should be shorted at a time. Duration of the short
should not exceed 30 seconds.
4. 16-33 MHz only.
5. 37-40 MHz only.

RECOMMENDED OPERATING
TEMPERATURE AND SUPPLY VOLTAGE

Grade                   Temperature                GND   VCC
Military                -55°C to +125°C (Case)     0V    5.0 ±10%
Commercial 16-33 MHz    0°C to +70°C (Ambient)     0V    5.0 ±5%
Commercial 37-40 MHz    0°C to +90°C (Case)        0V    5.0 ±5%

OUTPUT LOADING FOR AC TESTING

[Figure: AC test load, labeled "To Device Under Test"]

AC TEST CONDITIONS

Symbol   Parameter            Min.   Max.   Unit
VIH      Input HIGH Voltage   3.0    -      V
VIL      Input LOW Voltage    -      0.4    V
VIHS     Input HIGH Voltage   3.5    -      V
VILS     Input LOW Voltage    -      0.4    V
VIHC     Input HIGH Voltage   4.0    -      V
VILC     Input LOW Voltage    -      0.4    V


DC ELECTRICAL CHARACTERISTICS FOR IDT79R3010A
COMMERCIAL TEMPERATURE RANGE (TA = 0°C to +70°C, VCC = +5.0V ± 5%)

                                                               16.67 MHz      20.0 MHz
Symbol   Parameter                  Test Conditions            Min.   Max.    Min.   Max.   Unit
VOH      Output HIGH Voltage        VCC = Min, IOH = -4mA      3.5    -       3.5    -      V
VOL      Output LOW Voltage         VCC = Min, IOL = 4mA       -      0.4     -      0.4    V
VOLFP    Output LOW Voltage(5)      VCC = Min, IOL = 1.5mA     -      0.5     -      0.5    V
VIH      Input HIGH Voltage(6)                                 2.0    -       2.0    -      V
VIL      Input LOW Voltage(1)                                  -      0.8     -      0.8    V
VIHS     Input HIGH Voltage(2,6)                               3.0    -       3.0    -      V
VILS     Input LOW Voltage(1,2)                                -      0.4     -      0.4    V
VIHC     Input HIGH Voltage(4,6)                               4.0    -       4.0    -      V
VILC     Input LOW Voltage(1,4)                                -      0.4     -      0.4    V
CIN      Input Capacitance(7)                                  -      10      -      10     pF
COUT     Output Capacitance(7)                                 -      10      -      10     pF
ICC      Operating Current          VCC = 5.0V, TA = 70°C      -      525     -      600    mA
IIH      Input HIGH Leakage(3)      VIH = VCC                  -      100     -      100    µA
IIL      Input LOW Leakage(3)       VIL = GND                  -100   -       -100   -      µA
IOZ      Output Tri-state Leakage   VOH = 2.4V, VOL = 0.5V     -100   100     -100   100    µA

DC ELECTRICAL CHARACTERISTICS FOR IDT79R3010AE
COMMERCIAL TEMPERATURE RANGE (TA = 0°C to +70°C, VCC = +5.0V ± 5%)

                                                               25.0 MHz       33.33 MHz
Symbol   Parameter                  Test Conditions            Min.   Max.    Min.   Max.   Unit
VOH      Output HIGH Voltage        VCC = Min, IOH = -4mA      3.5    -       3.5    -      V
VOL      Output LOW Voltage         VCC = Min, IOL = 4mA       -      0.4     -      0.4    V
VOLFP    Output LOW Voltage(5)      VCC = Min, IOL = 1.5mA     -      0.5     -      0.5    V
VIH      Input HIGH Voltage(6)                                 2.0    -       2.0    -      V
VIL      Input LOW Voltage(1)                                  -      0.8     -      0.8    V
VIHS     Input HIGH Voltage(2,6)                               3.0    -       3.0    -      V
VILS     Input LOW Voltage(1,2)                                -      0.4     -      0.4    V
VIHC     Input HIGH Voltage(4,6)                               4.0    -       4.0    -      V
VILC     Input LOW Voltage(1,4)                                -      0.4     -      0.4    V
CIN      Input Capacitance(7)                                  -      10      -      10     pF
COUT     Output Capacitance(7)                                 -      10      -      10     pF
ICC      Operating Current          VCC = 5.0V, TA = 70°C      -      650     -      700    mA
IIH      Input HIGH Leakage(3)      VIH = VCC                  -      100     -      100    µA
IIL      Input LOW Leakage(3)       VIL = GND                  -100   -       -100   -      µA
IOZ      Output Tri-state Leakage   VOH = 2.4V, VOL = 0.5V     -100   100     -100   100    µA

NOTES:
1. VIL Min. = -3.0V for pulse width less than 15ns. VIL should not fall below -0.5V for larger periods.
2. VIHS and VILS apply to Clk2xSys, Clk2xSmp, Clk2xRd, Clk2xPhi, FpSysIn, FpSync and Reset.
3. These parameters do not apply to the clock inputs.
4. VIHC and VILC apply to Run, PLLOn and Exception.
5. VOLFP applies to the FpPresent pin only.
6. VIH and VIHS should not be held above VCC + 0.5 Volts.
7. Guaranteed by design.


DC ELECTRICAL CHARACTERISTICS FOR IDT79R3010AE
COMMERCIAL TEMPERATURE RANGE (TC = 0°C to +90°C, VCC = +5.0V ± 5%)

                                                               37 MHz                40 MHz
Symbol   Parameter                  Test Conditions            Min.   Max.           Min.   Max.   Unit
VOH      Output HIGH Voltage        VCC = Min, IOH = -4mA      3.5    -              3.5    -      V
VOL      Output LOW Voltage         VCC = Min, IOL = 4mA       -      0.4            -      0.4    V
VOLFP    Output LOW Voltage(5)      VCC = Min, IOL = 1.5mA     -      0.5            -      0.5    V
VIH      Input HIGH Voltage(6)                                 2.0    -              2.0    -      V
VIL      Input LOW Voltage(1)                                  -      0.8            -      0.8    V
VIHS     Input HIGH Voltage(2,6)                               3.0    -              3.0    -      V
VILS     Input LOW Voltage(1,2)                                -      0.4            -      0.4    V
VIHC     Input HIGH Voltage(4,6)                               4.0    -              4.0    -      V
VILC     Input LOW Voltage(1,4)                                -      0.4            -      0.4    V
CIN      Input Capacitance(7)                                  -      10             -      10     pF
COUT     Output Capacitance(7)                                 -      10             -      10     pF
ICC      Operating Current          VCC = 5.0V, TC = 90°C      -      [illegible]    -      750    mA
IIH      Input HIGH Leakage(3)      VIH = VCC                  -      100            -      100    µA
IIL      Input LOW Leakage(3)       VIL = GND                  -100   -              -100   -      µA
IOZ      Output Tri-state Leakage   VOH = 2.4V, VOL = 0.5V     -100   100            -100   100    µA

NOTES:
1. VIL Min. = -3.0V for pulse width less than 15ns. VIL should not fall below -0.5V for larger periods.
2. VIHS and VILS apply to Clk2xSys, Clk2xSmp, Clk2xRd, Clk2xPhi, FpSysIn, FpSync and Reset.
3. These parameters do not apply to the clock inputs.
4. VIHC and VILC apply to Run, PLLOn and Exception.
5. VOLFP applies to the FpPresent pin only.
6. VIH and VIHS should not be held above VCC + 0.5 Volts.
7. Guaranteed by design.


DC ELECTRICAL CHARACTERISTICS FOR IDT79R3010A
MILITARY TEMPERATURE RANGE (TC = -55°C to +125°C, VCC = +5.0V ± 10%)

                                                               16.67 MHz      20.0 MHz
Symbol   Parameter                  Test Conditions            Min.   Max.    Min.   Max.   Unit
VOH      Output HIGH Voltage        VCC = Min, IOH = -4mA      3.5    -       3.5    -      V
VOL      Output LOW Voltage         VCC = Min, IOL = 4mA       -      0.4     -      0.4    V
VOLFP    Output LOW Voltage(5)      VCC = Min, IOL = 1.5mA     -      0.5     -      0.5    V
VIH      Input HIGH Voltage(6)                                 2.0    -       2.0    -      V
VIL      Input LOW Voltage(1)                                  -      0.8     -      0.8    V
VIHS     Input HIGH Voltage(2,6)                               3.0    -       3.0    -      V
VILS     Input LOW Voltage(1,2)                                -      0.4     -      0.4    V
VIHC     Input HIGH Voltage(4,6)                               4.0    -       4.0    -      V
VILC     Input LOW Voltage(1,4)                                -      0.4     -      0.4    V
CIN      Input Capacitance(7)                                  -      10      -      10     pF
COUT     Output Capacitance(7)                                 -      10      -      10     pF
ICC      Operating Current          VCC = 5.0V, TA = 70°C      -      575     -      650    mA
IIH      Input HIGH Leakage(3)      VIH = VCC                  -      100     -      100    µA
IIL      Input LOW Leakage(3)       VIL = GND                  -100   -       -100   -      µA
IOZ      Output Tri-state Leakage   VOH = 2.4V, VOL = 0.5V     -100   100     -100   100    µA

DC ELECTRICAL CHARACTERISTICS FOR IDT79R3010AE
MILITARY TEMPERATURE RANGE (TC = -55°C to +125°C, VCC = +5.0V ± 10%)

                                                               25.0 MHz       33.33 MHz
Symbol   Parameter                  Test Conditions            Min.   Max.    Min.   Max.   Unit
VOH      Output HIGH Voltage        VCC = Min, IOH = -4mA      3.5    -       3.5    -      V
VOL      Output LOW Voltage         VCC = Min, IOL = 4mA       -      0.4     -      0.4    V
VOLFP    Output LOW Voltage(5)      VCC = Min, IOL = 1.5mA     -      0.5     -      0.5    V
VIH      Input HIGH Voltage(6)                                 2.0    -       2.0    -      V
VIL      Input LOW Voltage(1)                                  -      0.8     -      0.8    V
VIHS     Input HIGH Voltage(2,6)                               3.0    -       3.0    -      V
VILS     Input LOW Voltage(1,2)                                -      0.4     -      0.4    V
VIHC     Input HIGH Voltage(4,6)                               4.0    -       4.0    -      V
VILC     Input LOW Voltage(1,4)                                -      0.4     -      0.4    V
CIN      Input Capacitance(7)                                  -      10      -      10     pF
COUT     Output Capacitance(7)                                 -      10      -      10     pF
ICC      Operating Current          VCC = 5.0V, TA = 70°C      -      700     -      750    mA
IIH      Input HIGH Leakage(3)      VIH = VCC                  -      100     -      100    µA
IIL      Input LOW Leakage(3)       VIL = GND                  -100   -       -100   -      µA
IOZ      Output Tri-state Leakage   VOH = 2.4V, VOL = 0.5V     -100   100     -100   100    µA

NOTES:
1. VIL Min. = -3.0V for pulse width less than 15ns. VIL should not fall below -0.5V for larger periods.
2. VIHS and VILS apply to Clk2xSys, Clk2xSmp, Clk2xRd, Clk2xPhi, FpSysIn, FpSync and Reset.
3. These parameters do not apply to the clock inputs.
4. VIHC and VILC apply to Run, PLLOn and Exception.
5. VOLFP applies to the FpPresent pin only.
6. VIH and VIHS should not be held above VCC + 0.5 Volts.
7. Guaranteed by design.


AC ELECTRICAL CHARACTERISTICS FOR IDT79R3010A(1, 3)
COMMERCIAL TEMPERATURE RANGE (TA = 0°C to +70°C, VCC = +5.0V ± 5%)

                                                                 16.67 MHz        20.0 MHz
Symbol    Parameter                          Test Conditions     Min.    Max.     Min.    Max.     Unit
Clock
TCkHigh   Input Clock High(2)                Note 7              12      -        10      -        ns
TCkLow    Input Clock Low(2)                 Note 7              12      -        10      -        ns
TCkP      Input Clock Period                                     30      1000     25      1000     ns
          Clk2xSys to Clk2xSmp(5)                                0       tcyc/4   0       tcyc/4   ns
          Clk2xSmp to Clk2xRd(5)                                 0       tcyc/4   0       tcyc/4   ns
          Clk2xSmp to Clk2xPhi(5)                                9       tcyc/4   7       tcyc/4   ns
Timing Parameters
TDEn      Data Enable(3)                                         -       -2       -       -2       ns
TDDis     Data Disable(3)                                        -       -1       -       -1       ns
TDVal     Data Valid                         Load = 25pF         -       3        -       3        ns
TRSOS     Reset Set-up                                           15      -        15      -        ns
TDS       Data Set-up                                            9       -        8       -        ns
TDH       Data Hold(3)                                           -2.5    -        -2.5    -        ns
TFpCond   Fp Condition                                           -       35       -       30       ns
TFpBusy   Fp Busy                                                -       15       -       13       ns
TFpInt    Fp Interrupt                                           -       40       -       35       ns
TFpMov    Fp Move To                                             -       35       -       30       ns
TRExS     Exception Set-up (Run Cycle)                           14      -        12      -        ns
TSExS     Exception Set-up (Stall Cycle)                         12      -        10      -        ns
TExH      Exception Hold                                         0       -        0       -        ns
TRunS     Run Set-up                                             17      -        15      -        ns
TRunH     Run Hold                                               -2      -        -2      -        ns
TStallS   Stall Set-up                                           10      -        10      -        ns
TStallH   Stall Hold                                             -2      -        -2      -        ns
Reset Initialization
TrstPLL   Reset Timing, Phase-lock on(4, 5)                      3000    -        3000    -        Tcyc
Trst      Reset Timing, Phase-lock off(5)                        128     -        128     -        Tcyc
Capacitive Load Deration
CLD       Load Derate(6)                                         0.5     2        0.5     1        ns/25pF

NOTES:
1. All timings are referenced to 1.5V.
2. The clock parameters apply to all four 2xClocks: Clk2xSys, Clk2xSmp, Clk2xRd, and Clk2xPhi.
3. This parameter is guaranteed by design.
4. With PLLOn asserted, Reset must be asserted for the longer of 3000 clock cycles or 200 microseconds.
5. Tcyc is one CPU clock cycle (two cycles of a 2x clock).
6. No two signals on a given device will derate for a given load by a difference greater than 15%.
7. Clock transition time < 5ns.
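
Note 4 combines a cycle count with an absolute time, so the required Reset duration depends on the clock rate. The Python sketch below works that out; it assumes the "3000 clock cycles" are CPU cycles (Tcyc, as the TrstPLL row suggests), and the function name and the 10 MHz example are hypothetical.

```python
def reset_hold_time_us(cpu_clock_mhz, min_cycles=3000, min_time_us=200.0):
    """Minimum Reset assertion time with PLLOn asserted, in microseconds.

    Takes the longer of `min_cycles` CPU cycles or `min_time_us`, per Note 4.
    Interpreting the cycle count as CPU cycles (Tcyc) is an assumption.
    """
    tcyc_us = 1.0 / cpu_clock_mhz          # one CPU cycle in microseconds
    return max(min_cycles * tcyc_us, min_time_us)


if __name__ == "__main__":
    for mhz in (16.67, 20.0, 25.0, 33.33):
        print(f"{mhz:6.2f} MHz: hold Reset for at least "
              f"{reset_hold_time_us(mhz):.1f} us")
```

At 16.67 MHz, 3000 cycles take just under 180 microseconds, so under this reading the 200 microsecond floor is the binding limit at every speed grade listed here.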


AC ELECTRICAL CHARACTERISTICS FOR IDT79R3010AE(1, 3)
COMMERCIAL TEMPERATURE RANGE (TA = 0°C to +70°C, VCC = +5.0V ± 5%)

                                                                 25.0 MHz         33.33 MHz
Symbol    Parameter                          Test Conditions     Min.    Max.     Min.    Max.     Unit
Clock
TCkHigh   Input Clock High(2)                Note 7              8       -        6       -        ns
TCkLow    Input Clock Low(2)                 Note 7              8       -        6       -        ns
TCkP      Input Clock Period                                     20      1000     15      1000     ns
          Clk2xSys to Clk2xSmp(5)                                0       tcyc/4   0       tcyc/4   ns
          Clk2xSmp to Clk2xRd(5)                                 0       tcyc/4   0       tcyc/4   ns
          Clk2xSmp to Clk2xPhi(5)                                5       tcyc/4   3.5     tcyc/4   ns
Timing Parameters
TDEn      Data Enable(3)                                         -       -1.5     -       -1       ns
TDDis     Data Disable(3)                                        -       -0.5     -       -0.5     ns
TDVal     Data Valid                         Load = 25pF         -       2        -       2        ns
TRSOS     Reset Set-up                                           10      -        10      -        ns
TDS       Data Set-up                                            6       -        4.5     -        ns
TDH       Data Hold(3)                                           -2.5    -        -2.5    -        ns
TFpCond   Fp Condition                                           -       25       -       17       ns
TFpBusy   Fp Busy                                                -       10       -       7        ns
TFpInt    Fp Interrupt                                           -       25       -       18       ns
TFpMov    Fp Move To                                             -       25       -       16       ns
TRExS     Exception Set-up (Run Cycle)                           11      -        9       -        ns
TSExS     Exception Set-up (Stall Cycle)                         8       -        6.5     -        ns
TExH      Exception Hold                                         0       -        0       -        ns
TRunS     Run Set-up                                             15      -        12.5    -        ns
TRunH     Run Hold                                               -2      -        -1.5    -        ns
TStallS   Stall Set-up                                           9       -        7       -        ns
TStallH   Stall Hold                                             -2      -        -2      -        ns
Reset Initialization
TrstPLL   Reset Timing, Phase-lock on(4, 5)                      3000    -        3000    -        Tcyc
Trst      Reset Timing, Phase-lock off(5)                        128     -        128     -        Tcyc
Capacitive Load Deration
CLD       Load Derate(6)                                         0.5     1        0.5     1        ns/25pF

NOTES:
1. All timings are referenced to 1.5V.
2. The clock parameters apply to all four 2xClocks: Clk2xSys, Clk2xSmp, Clk2xRd, and Clk2xPhi.
3. This parameter is guaranteed by design.
4. With PLLOn asserted, Reset must be asserted for the longer of 3000 clock cycles or 200 microseconds.
5. Tcyc is one CPU clock cycle (two cycles of a 2x clock).
6. No two signals on a given device will derate for a given load by a difference greater than 15%.
7. Clock transition time < 2.5ns for 33MHz; clock transition time < 5ns for all other speeds.
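
The Load Derate row is given in ns per 25pF. One plausible way to apply it, sketched below in Python, is to add a linear correction to an output timing for each 25pF of load beyond the 25pF test load. The linear model, the choice of reference load, and the function name are assumptions made for illustration; the data sheet does not spell out the procedure.

```python
def derated_delay_range_ns(base_ns, load_pf, derate_min=0.5, derate_max=1.0,
                           ref_load_pf=25.0):
    """Return (low, high) estimates of an output delay at a heavier load.

    base_ns     - timing specified at the reference load
    load_pf     - actual load capacitance, expected to be >= ref_load_pf
    derate_*    - Load Derate limits in ns per 25 pF
    Assumes derating is linear in the load above the reference load.
    """
    extra_units = (load_pf - ref_load_pf) / 25.0   # number of extra 25 pF steps
    return (base_ns + derate_min * extra_units,
            base_ns + derate_max * extra_units)


if __name__ == "__main__":
    # Hypothetical case: a 2 ns Data Valid figure driving 50 pF instead of 25 pF.
    low, high = derated_delay_range_ns(2.0, 50.0)
    print(f"Estimated Data Valid at 50 pF: {low} to {high} ns")
```

Note 6 bounds how far two signals on the same device can differ in their derating, which matters when relative skew between outputs is the quantity of interest.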


AC ELECTRICAL CHARACTERISTICS FOR IDT79R3010AE(1, 3)
COMMERCIAL TEMPERATURE RANGE (TC = 0°C to +90°C, VCC = +5.0V ± 5%)

NOTES:
1. All timings are referenced to 1.5V.
2. The clock parameters apply to all four 2xClocks: Clk2xSys, Clk2xSmp, Clk2xRd, and Clk2xPhi.
3. This parameter is guaranteed by design.
4. With PLLOn asserted, Reset must be asserted for the longer of 3000 clock cycles or 200 microseconds.
5. Tcyc is one CPU clock cycle (two cycles of a 2x clock).
6. No two signals on a given device will derate for a given load by a difference greater than 15%.
7. Clock transition time < 2.5ns.

AC ELECTRICAL CHARACTERISTICS FOR IDT79R3010A(1,3)
MILITARY TEMPERATURE RANGE (TC = -55°C to +125°C, VCC = +5.0V ± 10%)
20.0 MHz

16.67 MHz
Symbol

Parameter

Test Conditions

Max.

Min.

Max.

Min.

Unit

Clock
TCkHigh

Input Clock High(2)

Note 7

12

TCkLow
TCkP

Input Clock Low(2)

Note 7

12
30
0
0

Input Clock Period
Clk2xSys to Clk2XSmp(5)
Clk2xSmp to CIk2xRd(5)
Clk2xSmp to Clk2xPhi(5)

-

10
10
25
0
0
7

1000
tcyc/4
tcyc/4
tcyc/4

9

'::::"

-

··\}t~

/\\\4000/:

·········tcycU·

::{":!I

ns
ns
ns
ns
ns
ns

Timing Paramters
TOEn

Data Enable(3)

TOOls

Data Disable(3)

TOVal

Data Valid

TRSOS

Reset Set-up

Tos
TOH

-1

-

3

15

-

Data Set-up

9

Data Hold(3)

-2.5

-

Load= 25pF

TFplnt

Fp Interrupt

TFpMov

Fp Move To

-

TRExS

Exception Set-up (Run Cycle)

14

TSExS

Exception Set-up (Stall Cycle)

12

TExH

Exception Hold

a

TRunS

Run Set-up

17

TRunH

Run Hold

TStaliS

Stall Set-up

TStaliH

Stall Hold

TFpCond Fp Condition
TFpBusy Fp Busy

Reset Timing, Phase-lock on(4, 5)

Trst

Reset Timing, Phase-lock off(5)

0.5

.":'::;:'/

ns

~:::::::.:::~::=

ns

13

ns

40 ~::::::::@;;:::.<-

35

ns

35,:.):::;;:;:;::'" -

30

ns

12

-

ns

10

-

ns

ns

Tcyc

,-:!ill:a

2::,)\:::!/

15
-2
10
-2

-

3000

-

128

-

2

0.5

2

NOTES:
1. All timings are referenced to 1.5V.
2. The clock parameters apply to all four 2xClocks: Clk2xSys, Clk2xSmp, Clk2xRd, and Clk2xPhi.
3. This parameter is guaranteed by design.
4. With PilOn asserted, Reset must be asserted for the longer of 3000 clock cycles or 200 microseconds.
5. Tcyc is one CPU clock cycle (two cycles of a 2x clock).
6. No two signals on a given device will derate for a given load by a difference greater than 15%.
7. Clock transition time < 5ns

5.3

ns

3

30

~

~

ns

-1

ns

:(}e:::::.:...}}::
/M:::-

300Q,;:;:},:::;::..

Load Derate(6)

:ttr::::"\\}

-2

-

-2~
""";'

Capacitive Load Deration
CLD

2::;\I}:::·I:/·'·"'·

~::::--"

-2
10

Reset Initialization
TrstPLL

.1

-2

-

ns
ns

ns
ns
ns
ns

Tcyc

ns125pF
2873 tbl 15


AC ELECTRICAL CHARACTERISTICS FOR IDT79R3010AE(1,3)
MILITARY TEMPERATURE RANGE (TC =-55°Cto +125°C, Vce = +5.0V± 10%)
33.33 MHz

25.0 MHz
Symbol

Parameter

Test Conditions

Max.

Min.

Max.

Min.

Unit

Clock
TCkHigh

Input Clock High(2)

Note 7

8

TCklaw
TCkP

Input Clock Low(2)

Note 7

8
20
0
0
5

Input Clock Period
Clk2xSys to Clk2XSmp(5)
Clk2xSmp to CIk2xRd(5)
Clk2xSmp to Clk2xPhi(5)

-

6
6

1000
tcyc/4
tcyc/4
tcyc/4

15
0
0
3.5

":",-

;:::}}~::......'.:::.:.:.'

/iI:Ji:::::19llit::
tcyc/4

:/ii i!

ns
ns
ns
ns
ns
ns

Timing Paramters

-

TOEn

Data Enable(3)

TOOls

Data Disable(3)

TOVal

Data Valid

TRSOS

Reset Set-up

10

Tos

Data Set-up

6

TOH

Data Hold(3)

-2.5

Load= 25pF

.,::::::::::;:::::. -1

-1.5

:?::{

-0.5

2

-

IiI

TFpCond Fp Condition

-

25

.::.

TFpBusy Fp Busy

-

10

·/::i:

TFplnt

Fp Interrupt

-

TFpMov

Fp Move To

-

~~.4

TRExS

Exception Set-up (Run Cycle)

11

TSExS

Exception Set-up (Stall Cycle)

8

TExH

Exception Hold

0

TRunS

Run Set-up

15

TRunH

Run Hold

-2

TStaliS

Stall Set-up

9

TStaliH

Stall Hold

-2~2

Reset Initialization
TrstPll

Reset Timing, Phase-lock on(4, 5)

Trst

ResetTiming, Phase-lock off(5)

.:"}:. ·21:::::::::::-:::::::

{I? .{H+-:/\::·
I:::::

3000..:::::::::::.:..

~

Capacitive Load Deration
CLD

...::::::p::"
::i\±t::::.:}@

Load Derate(6)

0.5

::::'::

'::::::::::::.

:·:::::::::::::r:7:}"

-

17

ns

7

ns

18

ns

-

16

ns

9

-

ns

6.5

-

ns

0

ns

-2

-

3000

-

Tcyc

12.5
-1.5
7

1

0.5

ns
ns

ns
ns
ns
ns

Tcyc

128

NOTES:
1. All timings are referenced to 1.5V.
2. The clock parameters apply to all four 2xClocks: Clk2xSys, Clk2xSmp, Clk2xRd, and Clk2xPhi.
3. This parameter is guaranteed by design.
4. With PilOn asserted, Reset must be asserted for the longer of 3000 clock cycles or 200 microseconds.
5. Tcyc is one CPU clock cycle (two cycles of a 2x clock).
6. No two signals on a given device will derate for a given load by a difference greater than 15%.
7. Clock transition time < 2.5ns for 33MHz; clock transition time < 5ns for all other speeds.

5.3

ns
ns

~

-4:::::\i\::: 1::.-:::::::

ns

2

-

...:.:::::::::::::;;g/

I\}::::

:::

ns

-0.5

1

ns/25pF
2873 tbl 15


[Timing diagram: Clk2xSys, Clk2xSmp, Clk2xRd, Clk2xPhi]

Figure 6. Input "2x" Clock Timing

[Timing diagram: FpSysOut, FpSmpOut*]

Figure 7. Processor Reference Clock
* These signals are not actually output from the floating point processor.
They are drawn to provide a reference for other timing diagrams.


[Timing diagram: FPA Store and FPA Load cycles, Phase, FpSysOut, Data Bus]

Figure 8. Floating Point Load/Store Timing

[Timing diagram: MoveTo MEM Access and MoveTo Writeback cycles, Phase]

Figure 9. Move to FPC Status Timing


[Timing diagram: FP ALU and FP MEM cycles, Phase, FpSysOut, FpPhiOut, FpInt]

Figure 10. Floating Point Interrupt Timing

[Timing diagram: FP Compare ALU and FP Compare MEM cycles, Phase]

Figure 11. Floating Point Condition Timing


[Timing diagram: Phase, FpSysOut, FpPhiOut, FpBusy, Exception, Run]

Figure 12. Floating Point Busy, Exception Timing

[Timing diagram: Phase, Tds, Tsmp, VCC, Trst]

Figure 13. Power-On Reset Timing


ORDERING INFORMATION

IDT  XXXXX        XX      X        X
     Device Type  Speed   Package  Process/Temperature Range

Process/Temperature Range:
  Blank      Commercial (0°C to +70°C)
  B          Military (-55°C to +125°C), Compliant to MIL-STD-883, Class B
Package:
  F          84-Pin Quad Flatpack (Cavity Down), Military Temperature Range Only
  G          84-Pin PGA (Cavity Down)
  QJ         84-Pin J-Bend CerPack (Cavity Up)
Speed:
  16         16.67 MHz
  20         20.0 MHz
  25         25.0 MHz
  33         33.33 MHz
  37         37 MHz
  40         40 MHz
Device Type:
  79R3010A   Floating Point Accelerator
  79R3010AE  Enhanced Timing Floating Point Accelerator


PRELIMINARY
IDT79R3500A

RISCore™
RISC CPU PROCESSOR

Integrated Device Technology, Inc.

FEATURES:

• A single chip integrating the R3000 CPU and R3010 FPA
execution units, using the R3000A pinout.
• Efficient Pipelining-The CPU's 5-stage pipeline design
assists in obtaining an execution rate approaching one
instruction per cycle. Pipeline stalls and exceptions are
handled precisely and efficiently.
• On-Chip Cache Control-The IDT79R3500A provides a
high bandwidth memory interface that handles separate
external Instruction and Data Caches ranging in size from
4 to 256 Kbytes each. Both caches are accessed during a
single CPU cycle. All cache control is on-chip.
• On-Chip Memory Management Unit-A fully-associative,
64 entry Translation Lookaside Buffer (TLB) provides fast
address translation for virtual-to-physical memory mapping
of the 4 Gigabyte virtual address space.
• Dynamically able to switch between Big- and Little-Endian
byte ordering conventions.
• Optimizing Compilers are available for C, FORTRAN,
Pascal, COBOL, Ada, and PL/1.
• 16.7 through 40MHz clock rates yield up to 32 VUPS
sustained throughput.
• Supports independent multiword block refill of both the
instruction and data caches with variable block sizes.
• Supports concurrent refill and execution of instructions.
• Partial word stores executed as read-modify-write.
• 6 external interrupt inputs, 2 software interrupts, with single
cycle latency to exception handler routine.
• Flexible multiprocessing support on chip with no impact on
uniprocessor designs.
• Software compatible with R3000, R2000 CPUs and R3010,
2010 FPAs.
• Faster integer multiply and divide than standard R3000.
• TLB disable feature allowing a simple memory model for
Embedded Applications.
• Programmable Tag bus width allowing reduced cost cache.
• Hardware Support of Single- and Double-Precision Floating
Point Operations that include Add, Subtract, Multiply,
Divide, Comparisons, and Conversions.
• Sustained Floating Point Performance of 11 MFLOPS single
precision LINPACK and 7.3 MFLOPS double precision.
• Supports Full Conformance With IEEE 754-1985 Floating
Point Specification.
• 64-bit FP operation using sixteen 64-bit data registers.

[Block diagram: IDT79R3500A processor. CP0 (System Control Coprocessor): Exception/Control Registers, Memory Management Unit Registers, Local Control Logic, Translation Lookaside Buffer (64 entries). CPU: General Registers (32x32), ALU, Shifter, Integer Multiplier/Divider, Address Adder, PC Increment/Mux. FPA: FPA Registers, Exponent Add Unit, FPA Divide Unit, FPA Multiply Unit. Master Pipeline/Bus Control. Internal path: Virtual Page Number/Virtual Address. External buses: TAG (20+4), Data (32+4), Address (18).]

RISCore and CEMOS are trademarks of Integrated Device Technology. Inc.

MILITARY AND COMMERCIAL TEMPERATURE RANGES
©1990 Integrated Device Technology, Inc.
DECEMBER 1990
DSC-9054/·

DESCRIPTION:

The IDT79R3500A RISC Microprocessor consists of three
tightly-coupled processors integrated on a single chip. The
first processor is a full 32-bit CPU based on RISC (Reduced
Instruction Set Computer) principles to achieve a new standard
of microprocessor performance. The second processor
is a system control coprocessor, called CP0, containing a
fully-associative 64 entry TLB (Translation Lookaside Buffer),
MMU (Memory Management Unit) and control registers, supporting
a 4 Gigabyte virtual memory subsystem, and a Harvard
Architecture Cache Controller achieving a bandwidth of 320
Mbytes/second using industry standard static RAMs. The
third processor is the Floating Point Accelerator which performs
arithmetic operations on values in floating-point representations.
This processor fully conforms to the requirements
of ANSI/IEEE Standard 754-1985, "IEEE Standard for Binary
Floating-Point Arithmetic." In addition, the architecture fully
supports the standard's recommendations.
The programmer model of this device will be the same as
the programmer model of a system which uses a discrete
79R3000 with the 79R3010: 32 integer registers; 16 floating
point registers; co-processor 0 registers; floating point status
and control register; RISC integer ALU; Integer Multiply and
Divide ALU; Floating Point Add/Subtract, Multiply, and Divide
ALUs. The device pipeline will be the same as for the 79R3000,
as will the co-processor 0 functionality. No new instructions
have been introduced. Pin compatibility extends to AC and DC
characteristics, software execution and initialization mode
vector selection.
This data sheet provides an overview of the features and
architecture of the 79R3500A CPU, Revision 3.0. A more
detailed description of the operation of the device is incorporated
in the "R3500A Family Hardware User Manual", and a
more detailed architectural overview is provided in the "mips
RISC Architecture" book, both available from IDT. Documentation
providing details of the software and development
environments supporting this processor is also available
from IDT.

IDT79R3500A CPU Registers
The IDT79R3500A CPU provides 32 general purpose 32-bit
registers, a 32-bit Program Counter, and two 32-bit registers
that hold the results of integer multiply and divide operations.
Only two of the 32 general registers have a special
purpose: register r0 is hardwired to the value "0", which is a
useful constant, and register r31 is used as the link register in
jump-and-link instructions (return address for subroutine calls).
The CPU registers are shown in Figure 2. Note that there
is no Program Status Word (PSW) register shown in this
figure: the functions traditionally provided by a PSW register
are instead provided in the Status and Cause registers incorporated
within the System Control Coprocessor (CP0).

[Figure: General Purpose Registers r0 through r31 (bits 31-0), Multiply/Divide Registers HI and LO (bits 31-0), and Program Counter PC (bits 31-0)]

Figure 2. IDT79R3500A CPU Registers

FPA REGISTERS

The IDT79R3010A FPA provides 32 general purpose 32-bit
registers, a Control/Status register, and a Revision Identification
register.
Floating-point coprocessor operations reference three types
of registers:
• Floating-Point Control Registers (FCR)
• Floating-Point General Registers (FGR)
• Floating-Point Registers (FPR)

Floating-Point General Registers (FGR)
There are 32 Floating-Point General Registers (FGR) on
the FPA. They represent directly-addressable 32-bit registers,
and can be accessed by Load, Store, or Move Operations.
Floating-Point Registers (FPR)
The 32 FGRs described in the preceding paragraph are
also used to form sixteen 64-bit Floating-Point Registers
(FPR). Pairs of general registers (FGRs), for example FGR0
and FGR1 (refer to Figure 3), are physically combined to form
a single 64-bit FPR. The FPRs hold a value in either single- or
double-precision floating-point format. Double-precision format FPRs are formed from two adjacent FGRs.
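The pairing of two 32-bit FGRs into one 64-bit FPR can be sketched in Python as follows. This is only an illustration, not a model of the device: the helper names are hypothetical, and the assumption that the even-numbered FGR holds bits 31..0 while the odd-numbered FGR holds bits 63..32 is read from the layout suggested by Figure 3.

```python
import struct

def write_fpr_double(fgr, n, value):
    """Store a double in FPR n, split across FGR n (low word) and FGR n+1 (high word).

    Assumption: even FGR = bits 31..0, odd FGR = bits 63..32 (per Figure 3).
    """
    if n % 2 != 0 or not 0 <= n < 32:
        raise ValueError("double-precision FPRs use even register numbers 0..30")
    bits = struct.unpack(">Q", struct.pack(">d", value))[0]
    fgr[n] = bits & 0xFFFFFFFF
    fgr[n + 1] = bits >> 32

def read_fpr_double(fgr, n):
    """Rebuild the 64-bit value from the FGR pair and interpret it as a double."""
    if n % 2 != 0 or not 0 <= n < 32:
        raise ValueError("double-precision FPRs use even register numbers 0..30")
    bits = ((fgr[n + 1] & 0xFFFFFFFF) << 32) | (fgr[n] & 0xFFFFFFFF)
    return struct.unpack(">d", struct.pack(">Q", bits))[0]

# A list of 32 integers stands in for the 32 FGRs (hypothetical model).
fgr = [0] * 32
write_fpr_double(fgr, 0, 1.0)
```

After the call above, the two words of the double 1.0 sit in `fgr[0]` and `fgr[1]`, and `read_fpr_double(fgr, 0)` recovers the original value.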
Floating-Point Control Registers (FCR)
There are 2 Floating-Point Control Registers (FCR) on the
FPA. They can be accessed only by Move operations and
include the following:
• Control/Status register, used to control and monitor exceptions, operating modes, and rounding modes;
• Revision register, containing revision information about
the FPA.

MILITARY AND COMMERCIAL TEMPERATURE RANGES

IDT79R3500A RISCore™ RISC CPU PROCESSOR

[Figure 3: FPA register diagram. Floating-point general purpose registers (FGR/FPR): each 64-bit FPR pairs an odd-numbered FGR in bits 63..32 with the preceding even-numbered FGR in bits 31..0 (FGR1/FGR0, FGR3/FGR2, ... FGR31/FGR30). Control/Status Register (Exceptions/Enables/Modes), bits 31..0; Implementation/Revision Register, bits 31..0.]

Figure 3. FPA Registers

Instruction Set Overview
All IDT79R3500A instructions are 32 bits long, and there
are only three instruction formats. This approach simplifies
instruction decoding, thus minimizing instruction execution
time. The 79R3500A processor initiates a new instruction on
every run cycle, and is able to complete an instruction on
almost every clock cycle. The only exceptions are the Load
instructions and Branch instructions, which each have a single
cycle of latency associated with their execution. Note, however, that in the majority of cases the compilers are able to fill
these latency cycles with useful instructions which do not
require the result of the previous instruction. This effectively
eliminates these latency effects.
The actual instruction set of the CPU was determined after
extensive simulations to determine which instructions should
be implemented in hardware, and which operations are best
synthesized in software from other basic instructions. This
methodology resulted in the R3500A having the highest
performance of any available microprocessor.

I-Type (Immediate)
  Bits:   31-26   25-21   20-16   15-0
  Field:  op      rs      rt      immediate

J-Type (Jump)
  Bits:   31-26   25-0
  Field:  op      target

R-Type (Register)
  Bits:   31-26   25-21   20-16   15-11   10-6   5-0
  Field:  op      rs      rt      rd      re     funct

Figure 4. IDT79R3500A Instruction Formats
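Because every instruction is 32 bits wide and the fields sit at fixed bit positions, field extraction reduces to shifts and masks. The following Python sketch illustrates this for the three formats of Figure 4. The function name and the dictionary layout are hypothetical, and the example word is built from arbitrary field values rather than taken from the data sheet.

```python
def decode_fields(word):
    """Extract every Figure 4 field from a 32-bit instruction word.

    All fields are returned; which ones are meaningful depends on whether
    the instruction is I-type, J-type, or R-type.
    """
    word &= 0xFFFFFFFF
    return {
        "op": (word >> 26) & 0x3F,        # bits 31-26, all formats
        "rs": (word >> 21) & 0x1F,        # bits 25-21, I-type and R-type
        "rt": (word >> 16) & 0x1F,        # bits 20-16, I-type and R-type
        "rd": (word >> 11) & 0x1F,        # bits 15-11, R-type
        "re": (word >> 6) & 0x1F,         # bits 10-6, R-type
        "funct": word & 0x3F,             # bits 5-0, R-type
        "immediate": word & 0xFFFF,       # bits 15-0, I-type
        "target": word & 0x03FFFFFF,      # bits 25-0, J-type
    }

# Example I-type word assembled from arbitrary field values.
example = (0x23 << 26) | (4 << 21) | (5 << 16) | 0xFFFC
fields = decode_fields(example)
```

Decoding all fields unconditionally mirrors the point made in the text: with only three fixed formats, no field position depends on another field.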

The IDT79R3500A instruction set can be divided into the
following groups:
• Load/Store instructions move data between memory and
general registers. They are all I-type instructions, since the
only addressing mode supported is base register plus 16-bit, signed immediate offset.
The Load instruction has a single cycle of latency, which
means that the data being loaded is not available to the
instruction immediately after the load instruction. The compiler will fill this delay slot with either an instruction which is
not dependent on the loaded data, or with a NOP instruction. There is no latency associated with the store instruction.
Loads and Stores can be performed on byte, half-word,
word, or unaligned word data (32-bit data not aligned on a
modulo-4 address). The CPU cache is constructed as a
write-through cache.
• Computational instructions perform arithmetic, logical
and shift operations on values in registers. They occur in
both R-type (both operands and the result are registers)
and I-type (one operand is a 16-bit immediate) formats. FP
computational instructions perform arithmetic operations
on floating point values in the FPA registers. Note that
computational instructions are three operand instructions;
that is, the result of the operation can be stored into a
different register than either of the two operands. This
means that operands need not be overwritten by arithmetic
operations. This results in a more efficient use of the large
register set.
• Conversion instructions perform conversion operations
on the floating point values in the FPA registers.
• Compare instructions perform comparisons of the contents
of FPA registers and set a condition bit based on the
results. The result of the compare operations is tied directly
to CpCond(1) for software testing.
• Jump and Branch instructions change the control flow of
a program. Jumps are always to a paged absolute address
formed by combining a 26-bit target with four bits of the
Program Counter (J-type format, for subroutine calls), or


OP            Description

              Load/Store Instructions
LB            Load Byte
LBU           Load Byte Unsigned
LH            Load Halfword
LHU           Load Halfword Unsigned
LW            Load Word
LWL           Load Word Left
LWR           Load Word Right
SB            Store Byte
SH            Store Halfword
SW            Store Word
SWL           Store Word Left
SWR           Store Word Right

              FPA Load/Store/Move Instructions
LWC1          Load Word to FPA
SWC1          Store Word from FPA
MTC1          Move Word to FPA
MFC1          Move Word from FPA
CTC1          Move Control word to FPA
CFC1          Move Control word from FPA

              Arithmetic Instructions (ALU Immediate)
ADDI          Add Immediate
ADDIU         Add Immediate Unsigned
SLTI          Set on Less Than Immediate
SLTIU         Set on Less Than Immediate Unsigned
ANDI          AND Immediate
ORI           OR Immediate
XORI          Exclusive OR Immediate
LUI           Load Upper Immediate

              Arithmetic Instructions (3-operand, register-type)
ADD           Add
ADDU          Add Unsigned
SUB           Subtract
SUBU          Subtract Unsigned
SLT           Set on Less Than
SLTU          Set on Less Than Unsigned
AND           AND
OR            OR
XOR           Exclusive OR
NOR           NOR

              FPA Computational Instructions
ADD.fmt       Floating point Add
SUB.fmt       Floating point Subtract
MUL.fmt       Floating point Multiply
DIV.fmt       Floating point Divide
ABS.fmt       Floating point Absolute value
MOV.fmt       Floating point Move
NEG.fmt       Floating point Negate

              FPA Compare Instructions
C.cond.fmt    Floating point Compare

              Shift Instructions
SLL           Shift Left Logical
SRL           Shift Right Logical
SRA           Shift Right Arithmetic
SLLV          Shift Left Logical Variable
SRLV          Shift Right Logical Variable
SRAV          Shift Right Arithmetic Variable

              FPA Conversion Instructions
CVT.S.fmt     Floating point Convert to Single FP
CVT.D.fmt     Floating point Convert to Double FP
CVT.W.fmt     Floating point Convert to fixed point

              Multiply/Divide Instructions
MULT          Multiply
MULTU         Multiply Unsigned
DIV           Divide
DIVU          Divide Unsigned
MFHI          Move From HI
MTHI          Move To HI
MFLO          Move From LO
MTLO          Move To LO

              Jump and Branch Instructions
J             Jump
JAL           Jump and Link
JR            Jump to Register
JALR          Jump and Link Register
BEQ           Branch on Equal
BNE           Branch on Not Equal
BLEZ          Branch on Less than or Equal to Zero
BGTZ          Branch on Greater Than Zero
BLTZ          Branch on Less Than Zero
BGEZ          Branch on Greater than or Equal to Zero
BLTZAL        Branch on Less Than Zero and Link
BGEZAL        Branch on Greater than or Equal to Zero and Link

              Special Instructions
SYSCALL       System Call
BREAK         Break

              Coprocessor Instructions
LWCz          Load Word from Coprocessor
SWCz          Store Word to Coprocessor
MTCz          Move To Coprocessor
MFCz          Move From Coprocessor
CTCz          Move Control to Coprocessor
CFCz          Move Control From Coprocessor
COPz          Coprocessor Operation
BCzT          Branch on Coprocessor z True
BCzF          Branch on Coprocessor z False

              System Control Coprocessor (CP0) Instructions
MTC0          Move To CP0
MFC0          Move From CP0
TLBR          Read indexed TLB entry
TLBWI         Write Indexed TLB entry
TLBWR         Write Random TLB entry
TLBP          Probe TLB for matching entry
RFE           Restore From Exception

Table 1. IDT79R3500A Instruction Summary


32-bit register byte addresses (R-type, for returns and
dispatches). Branches have 16-bit offsets relative to the
program counter (I-type). Jump and Link instructions save
a return address in Register 31. The 79R3500A instruction
set features a number of branch conditions. Included is the
ability to compare a register to zero and branch, and also
the ability to branch based on a comparison between two
registers. Thus, net performance is increased since software does not have to perform arithmetic instructions prior
to the branch to set up the branch conditions.
• Coprocessor instructions perform operations in the
coprocessors. Coprocessor Loads and Stores are I-type.
• Coprocessor 0 instructions perform operations on the
System Control Coprocessor (CP0) registers to manipulate the memory management and exception handling
facilities of the processor.
• Special instructions perform a variety of tasks, including
movement of data between special and general registers,
system calls, and breakpoint. They are always R-type.
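The address arithmetic described for the Load/Store and Jump and Branch groups can be sketched in Python as follows. This is a hedged illustration: the function names are hypothetical, and two details are assumptions taken from the general MIPS architecture rather than from this data sheet, namely that jump targets and branch offsets are word counts (shifted left by two bits) and that `pc` is the address of the instruction following the jump or branch.

```python
def sign_extend16(value):
    """Interpret the low 16 bits of value as a signed two's-complement number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value

def load_store_address(base, offset16):
    """Base register plus 16-bit signed immediate offset (the only addressing mode)."""
    return (base + sign_extend16(offset16)) & 0xFFFFFFFF

def jump_target(pc, target26):
    """Paged absolute address: four upper PC bits combined with the 26-bit target.

    Assumption: the target is a word address, so it is shifted left by 2.
    """
    return (pc & 0xF0000000) | ((target26 & 0x03FFFFFF) << 2)

def branch_target(pc, offset16):
    """PC-relative address from a 16-bit signed offset.

    Assumption: the offset counts words and pc is the address of the
    instruction after the branch.
    """
    return (pc + (sign_extend16(offset16) << 2)) & 0xFFFFFFFF
```

For example, `load_store_address(0x1000, 0xFFFC)` applies an offset of -4 to the base register value, and `jump_target` keeps the jump within the 256 Mbyte page selected by the upper four PC bits.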

Register    Description

EntryHi     High half of a TLB entry
EntryLo     Low half of a TLB entry
Index       Programmable pointer into TLB array
Random      Pseudo-random pointer into TLB array
Status      Mode, interrupt enables, and diagnostic status info
Cause       Indicates nature of last exception
EPC         Exception Program Counter
Context     Pointer into kernel's virtual Page Table Entry array
BadVA       Most recent bad virtual address
PRId        Processor revision identification (Read only)

Table 2. System Control Coprocessor (CP0) Registers

SYSTEM COPROCESSOR

Table 1 lists the instruction set of the IDT79R3500A
processor.
IDT79R3500A System Control Coprocessor (CP0)
The IDT79R3500A can operate with up to four tightly-coupled coprocessors (designated CP0 through CP3). The
System Control Coprocessor (or CP0), is incorporated on the
IDT79R3500A chip and supports the virtual memory system
and exception handling functions of the IDT79R3500A. The
virtual memory system is implemented using a Translation
Lookaside Buffer and a group of programmable registers as
shown in Figure 5.
System Control Coprocessor (CP0) Registers
The CP0 registers shown in Figure 5 are used to control
the memory management and exception handling capabilities
of the IDT79R3500A. Table 2 provides a brief description of
each register.

[Figure 5: block diagram of the System Coprocessor registers. The EntryHi and EntryLo registers access the 64-entry TLB (entries 0 through 63); entries 0 through 7 are not accessed by the Random register. A legend distinguishes registers used with the virtual memory system from those used with exception processing.]

Figure 5. The System Coprocessor Registers


Memory Management System
The IDT79R3500A has an addressing range of 4 Gbytes.
However, since most IDT79R3500A systems implement a
physical memory smaller than 4 Gbytes, the IDT79R3500A
provides for the logical expansion of memory space by translating addresses composed in a large virtual address space
into available physical memory addresses. Two TLB modes are
supported. When the TLB is used, the 4 GByte address space
is divided into 2 GBytes which can be accessed by both the
users and the kernel, and 2 GBytes for the kernel only. Virtual
addresses within the kernel/user segment are translated to
physical addresses on a 4kB page basis. This mode is typical
of UNIX and other sophisticated operating systems. When the
TLB is disabled, mapping is locked as 2 GBytes as kernel/
user, and 1.5 GBytes as kernel only. This mode requires no
TLB manipulation, provides a large linear address space, and is
typical for embedded applications.
TLB (Translation Lookaside Buffer)
Virtual memory mapping is assisted by the Translation
Lookaside Buffer (TLB). The on-chip TLB provides very fast
virtual memory access and is well-matched to the requirements of multi-tasking operating systems. The fully-associative TLB contains 64 entries, each of which maps a 4-Kbyte
page, with controls for read/write access, cacheability, and
process identification. The TLB allows each user to access up
to 2 Gbytes of virtual address space.
Figure 6 illustrates the format of each TLB entry. The
translation operation involves matching the current Process
ID (PID) and upper 20 bits of the address against PID and VPN
(Virtual Page Number) fields in the TLB. When both match (or
the TLB entry is Global), the VPN is replaced with the PFN
(Physical Frame Number) to form the physical address.
TLB misses are handled in software, with the entry to be replaced determined by a simple RANDOM function. The routine to process a TLB miss in the UNIX environment requires
only 10-12 cycles, which compares favorably with many CPUs
which perform the operation in hardware.
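The matching rule described above can be sketched in Python as follows. This is a simplified model under stated assumptions, not the device's implementation: the TLB is represented as a list of dictionaries with hypothetical keys, the search is sequential (the real TLB is fully associative), and the Valid, Dirty, and Non-cacheable flags are ignored.

```python
PAGE_SHIFT = 12          # 4-Kbyte pages: the low 12 address bits are the page offset
PAGE_OFFSET_MASK = 0xFFF

def tlb_translate(tlb, vaddr, pid):
    """Return the physical address for vaddr, or None on a TLB miss.

    An entry matches when its VPN equals the upper 20 address bits and
    either its PID equals the current PID or the entry is Global.
    """
    vpn = (vaddr >> PAGE_SHIFT) & 0xFFFFF
    for entry in tlb:
        if entry["vpn"] == vpn and (entry["global"] or entry["pid"] == pid):
            # Replace the VPN with the PFN; the page offset passes through.
            return (entry["pfn"] << PAGE_SHIFT) | (vaddr & PAGE_OFFSET_MASK)
    return None  # miss: on the real device, software refills the TLB

# Hypothetical TLB contents, for illustration only.
tlb = [
    {"vpn": 0x00400, "pid": 3, "pfn": 0x12345, "global": False},
    {"vpn": 0x00800, "pid": 0, "pfn": 0x00042, "global": True},
]
```

With these entries, process 3 can translate addresses in page 0x00400, any process can translate addresses in the Global page 0x00800, and every other lookup returns `None` to stand in for the software miss handler.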
TLB Disabled Operation
Many embedded systems do not want the complexity or
uncertainty associated with the on-chip TLB. However, many
systems still desire the ability to implement a kernel/user
mode. Ordinarily, to implement a hierarchical task model, the
TLB must be used. The IDT79R3500A gives the system designer one more option, allowing the TLB to be disabled and
performing a fixed mapping of virtual to physical addresses,
while maintaining separation of kernel and user resources.
The user may elect to disable the TLB through the reset
vectors. In this case, the mapping shown in Figure 8 is used,
and device power consumption is reduced. Note that
"cached" segments means that there is no mechanism to
exclude addresses in these regions from the cache.
This mapping means that applications designed to run in
kseg0 and kseg1 (to avoid the TLB) can use the IDT79R3500A,
disable the TLB to reduce power, and not have to change
software to take advantage of this new feature.

TLB ENTRY FORMAT

EntryHi (bits 63-32)
  Bits:   63-44   43-38    37-32
  Field:  VPN     TLBPID   0

EntryLo (bits 31-0)
  Bits:   31-12   11   10   9   8   7-0
  Field:  PFN     N    D    V   G   0

VPN - Virtual Page Number
TLBPID - Process ID
PFN - Physical Frame Number
N - Non-cacheable flag
D - Dirty flag (Write protect)
V - Valid entry flag
G - Global flag (ignore PID)
0 - Reserved

Figure 6. TLB Entry Format
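The 64-bit entry format of Figure 6 can be packed and unpacked with shifts and masks, as in the Python sketch below. The function names are hypothetical. The bit positions follow the figure as far as it is legible: VPN in bits 63-44, TLBPID in bits 43-38, and PFN in bits 31-12. Placing the N, D, V, and G flags at bits 11, 10, 9, and 8 respectively is an assumption based on the order of the figure legend.

```python
def pack_tlb_entry(vpn, pid, pfn, n=0, d=0, v=0, g=0):
    """Build a 64-bit TLB entry (EntryHi in bits 63-32, EntryLo in bits 31-0)."""
    entry_hi = ((vpn & 0xFFFFF) << 12) | ((pid & 0x3F) << 6)
    entry_lo = (((pfn & 0xFFFFF) << 12) | ((n & 1) << 11) | ((d & 1) << 10)
                | ((v & 1) << 9) | ((g & 1) << 8))
    return (entry_hi << 32) | entry_lo

def unpack_tlb_entry(entry64):
    """Split a 64-bit TLB entry into its named fields; reserved bits are dropped."""
    return {
        "vpn": (entry64 >> 44) & 0xFFFFF,   # bits 63-44
        "pid": (entry64 >> 38) & 0x3F,      # bits 43-38
        "pfn": (entry64 >> 12) & 0xFFFFF,   # bits 31-12
        "n": (entry64 >> 11) & 1,           # Non-cacheable
        "d": (entry64 >> 10) & 1,           # Dirty (write protect)
        "v": (entry64 >> 9) & 1,            # Valid
        "g": (entry64 >> 8) & 1,            # Global (ignore PID)
    }

# Arbitrary example values, not taken from the data sheet.
entry = pack_tlb_entry(vpn=0x00400, pid=3, pfn=0x12345, v=1)
```

The 6-bit TLBPID field is what limits the number of distinct process identifiers to 64.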


MMU ADDRESS TRANSLATION
VIRTUAL -> PHYSICAL

Virtual address range     Segment   Attributes                        Physical memory
0xC0000000-0xFFFFFFFF     kseg2     Kernel, mapped, cacheable         Any
0xA0000000-0xBFFFFFFF     kseg1     Kernel, unmapped, uncached        0x00000000-0x1FFFFFFF (512 MB)
0x80000000-0x9FFFFFFF     kseg0     Kernel, unmapped, cached          0x00000000-0x1FFFFFFF (512 MB)
0x00000000-0x7FFFFFFF     kuseg     Kernel/User, mapped, cacheable    Any

Physical address space: 0x00000000-0x1FFFFFFF (512 MB) and 0x20000000-0xFFFFFFFF (3584 MB).

Figure 7. IDT79R3500A Virtual Address Mapping

MMU Address Translation
Virtual -> Physical
(TLB Disabled)

Virtual address range     Segment                   Physical region
0xC0000000-0xFFFFFFFF     Kernel Cached (kseg2)     Kernel Cacheable Tasks, 1024 MB
0xA0000000-0xBFFFFFFF     Kernel Uncached (kseg1)   Kernel Boot and I/O, 512 MB
0x80000000-0x9FFFFFFF     Kernel Cached (kseg0)     Kernel Boot and I/O, 512 MB
0x00000000-0x7FFFFFFF     User Cached (kuseg)       Kernel/User Cacheable Tasks, 2048 MB

The remaining 512 MB of physical address space is inaccessible.

Figure 8. TLB Disabled Mapping

NOTE: This model is consistent with the mapping available in the IDT79R3051 family. The identical mapping provides software compatibility to the
lower cost CPUs.


Operating Modes
The IDT79R3500A has two operating modes: User mode
and Kernel mode. The IDT79R3500A normally operates in the
User mode until an exception is detected forcing it into the
Kernel mode. It remains in the Kernel mode until a Restore
From Exception (RFE) instruction is executed. The manner in
which memory addresses are translated or mapped depends
on the operating mode of the IDT79R3500A. Figure 7 shows
the MMU translation performed for each of the operating
modes.
User Mode-in this mode, a single, uniform virtual address
space (kuseg) of 2 Gbyte is available. When the TLB is used,
each virtual address is extended with a 6-bit process identifier
field to form unique virtual addresses. All references to this
segment are mapped through the TLB. Use of the cache for
up to 64 processes is determined by bit settings for each page
within the TLB entries. If the TLB is not used, these addresses
are translated to begin at 1Gbyte of the physical address
space.
Kernel Mode-four separate segments are defined in this
mode:
• kuseg-when in the kernel mode, references to this segment are treated just like user mode references, thus
streamlining kernel access to user data.
• kseg0-references to this 512 Mbyte segment use cache
memory but are not mapped through the TLB. Instead, they
always map to the first 0.5 GBytes of physical address
space.
• kseg1-references to this 512 Mbyte segment are not
mapped through the TLB and do not use the cache.
Instead, they are hard-mapped into the same 0.5 GByte
segment of physical address space as kseg0.
• kseg2-when the TLB is not used, references to this
1 Gbyte segment directly address the upper 1 Gbyte of
physical address space. These addresses are defined to
be kernel-mode addresses which are cacheable. When the TLB is
used, references to this 1 Gbyte segment are always mapped
through the TLB and use of the cache is determined by bit
settings within the TLB entry.
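The segment boundaries and the fixed mapping used when the TLB is disabled can be sketched in Python as follows. The function names are hypothetical, and the sketch omits the user/kernel privilege check. The kuseg offset of 1 Gbyte and the identity mapping of kseg2 follow the descriptions above; the kseg0 and kseg1 mappings to the first 0.5 GBytes apply whether or not the TLB is used.

```python
def segment(vaddr):
    """Name the virtual segment that contains a 32-bit virtual address."""
    vaddr &= 0xFFFFFFFF
    if vaddr < 0x80000000:
        return "kuseg"   # 2 Gbytes, kernel/user
    if vaddr < 0xA0000000:
        return "kseg0"   # 512 Mbytes, unmapped, cached
    if vaddr < 0xC0000000:
        return "kseg1"   # 512 Mbytes, unmapped, uncached
    return "kseg2"       # 1 Gbyte, kernel only

def translate_tlb_disabled(vaddr):
    """Fixed virtual-to-physical mapping applied when the TLB is disabled."""
    vaddr &= 0xFFFFFFFF
    seg = segment(vaddr)
    if seg == "kuseg":
        return vaddr + 0x40000000   # kuseg begins at 1 Gbyte of physical space
    if seg == "kseg0":
        return vaddr - 0x80000000   # first 0.5 GBytes of physical space
    if seg == "kseg1":
        return vaddr - 0xA0000000   # same 0.5 GBytes as kseg0
    return vaddr                    # kseg2 addresses the upper 1 Gbyte directly
```

Under this mapping, the 512 Mbytes of physical space between the kseg0/kseg1 region and the start of the kuseg region are never produced by any translation, which matches the inaccessible region shown in Figure 8.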

FPA COPROCESSOR OPERATION (CP1)
The FPA continually monitors the processor instruction
stream. If an instruction does not apply to the coprocessor, it
is ignored; if an instruction does apply to the coprocessor, the
FPA executes that instruction and transfers necessary result
and exception data synchronously to the main processor.
The FPA performs three types of operations:
• Loads and Stores;
• Moves;
• Two- and three-register floating-point operations.

Load, Store, and Move Operation
Load, Store, and Move operations move data between memory
or the integer registers and the FPA registers. These operations perform no format conversions and cause no floating-point exceptions. Load, Store, and Move operations reference
a single 32-bit word of either the Floating-Point General
Registers (FGR) or the Floating-Point Control Registers (FCR).
Floating-Point Operations
The FPA supports the following single- and double-precision format floating-point operations:
• Add
• Subtract
• Multiply
• Divide
• Absolute Value
• Move
• Negate
• Compare
In addition, the FPA supports conversions between single- and double-precision floating-point formats and fixed-point
formats.
The FPA incorporates separate Add/Subtract, Multiply,
and Divide units, each capable of independent and concurrent
operation. Thus, to achieve very high performance, floating
point divides can be overlapped with floating point multiplies
and floating point additions. These floating point operations
occur independently of the actions of the CPU, allowing
further overlap of integer and floating point operations. Figure
9 illustrates an example of the types of overlap permissible.
Exceptions
The FPA supports all five IEEE standard exceptions:
• Invalid Operation
• Inexact Operation
• Division by Zero
• Overflow
• Underflow
The FPA also supports the optional Unimplemented
Operation exception that allows unimplemented instructions
to trap to software emulation routines.
The FPA provides precise exception capability to the CPU;
that is, the execution of a floating point operation which
generates an exception causes that exception to occur at the
CPU instruction which caused the operation. This precise
exception capability is a requirement in applications and
languages which provide a mechanism for local software
exception handlers within software modules.


[Figure 9: timing diagram over cycles 0 through 12 for a DIV.S operation. Legend: cycles during which only Load, Store, and Move operations are permitted in the FPA; cycles during which the divide operation cannot be overlapped (the rest of this legend entry is illegible in the source); cycles that are free for integer operations in the CPU.]

Figure 9. Examples of Overlapping Floating Point Operation

[Figure 10: instruction execution stages, each approximately one cycle: IF (I-Cache), RD (RF), ALU (OP), MEM (D-Cache), WB (register file write back or FP exceptions), FWB (FP write back; FP ops only).]

Figure 10. Instruction Execution

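[Figure 11: overlapped execution of successive instructions; at the current CPU cycle, the instructions in flight occupy the IF, RD, ALU, MEM, WB, and FWB (FP ops only) stages.]

Figure 11. IDT79R3500A Execution Sequence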


IDT79R3500 PIPELINE ARCHITECTURE
The execution of a single IDT79R3500A integer instruction
consists of five pipe stages, while a floating point instruction
takes six pipe stages. They are:
1) IF-Instruction fetch. The processor calculates the instruction address required to read from the I cache.
2) RD-The instruction is present on the data bus during
phase one of this pipe stage. Instruction decode occurs
during phase two. Operands are read from the registers if
required.
3) ALU-Perform the required operation on instruction operands. If this is an FPA instruction, instruction execution
commences.
4) MEM-Access memory. If the instruction is a load or store,
the data is presented or captured during phase 2 of this
pipe stage.
5) WB-Write integer results back into register file. In FPA
cycles this pipe stage is used for exceptions.
6) FWB-The FPA uses this stage to write back ALU results
to its register file.
Each of these steps requires approximately one FPA cycle
as shown in Figure 10 (parts of some operations spill over into
another cycle while other operations require only 1/2 cycle).
The CPU uses a five-stage pipeline while the FPA
uses a six-stage pipeline to achieve an instruction execution rate
approaching one instruction per cycle. Thus, execution of six
instructions at a time is overlapped as shown in Figure 11.
This pipeline operates efficiently because different CPU
resources (address and data bus accesses, ALU operations,
register accesses, and so on) are utilized on a non-interfering
basis.
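The overlap described above can be illustrated with a small Python sketch that reports which stage each instruction occupies in a given cycle. It is an idealized model with hypothetical names: one instruction is assumed to issue per cycle, every stage is assumed to take exactly one cycle, and stalls, load delays, and multi-cycle floating point operations are ignored.

```python
CPU_STAGES = ["IF", "RD", "ALU", "MEM", "WB"]
FPA_STAGES = CPU_STAGES + ["FWB"]   # floating point instructions add a sixth stage

def pipeline_snapshot(cycle, n_instructions, stages=CPU_STAGES):
    """Map instruction number -> stage occupied during the given cycle.

    Assumes instruction i enters IF in cycle i and advances one stage
    per cycle. Instructions not yet issued or already retired are omitted.
    """
    snapshot = {}
    for i in range(n_instructions):
        index = cycle - i
        if 0 <= index < len(stages):
            snapshot[i] = stages[index]
    return snapshot
```

For instance, `pipeline_snapshot(4, 10)` shows five integer instructions in flight, one per stage, while `pipeline_snapshot(5, 10, FPA_STAGES)` shows six floating point instructions overlapped.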

[Figure 12: block diagram of a microprocessor (CPU) connected to memory by address and data buses.]

Figure 12. A Simple Microprocessor Memory System

Figure 13 illustrates a memory system that supports the
significantly greater memory bandwidth required to take full
advantage of the IDT79R3500A's performance capabilities.
The key features of this system are:


MEMORY SYSTEM HIERARCHY
The high performance capabilities of the IDT79R3500A
processor demand system configurations incorporating techniques frequently employed in large, mainframe computers
but seldom encountered in systems based on more traditional
microprocessors.
A primary goal of systems employing RISC techniques is to
minimize the average number of cycles each instruction
requires for execution. Techniques to reduce cycles-per-instruction include a compact and uniform instruction set, a
deep instruction pipeline (as described above), and utilization
of optimizing compilers. Many of the advantages obtained
from these techniques can, however, be negated by an
inefficient memory system.
Figure 12 illustrates memory in a simple microprocessor
system. In this system, the CPU outputs addresses to memory
and reads instructions and data from memory or writes data to
memory. The address space is completely undifferentiated:
instructions, data, and I/O devices are all treated the same. In
such a system, a primary limiting performance factor is
memory bandwidth.

[Figure 13: block diagram of the IDT79R3500A microprocessor with its address and data buses and main memory.]

Figure 13. An IDT79R3500A System with a
High-Performance Memory System

• External Cache Memory-Local, high-speed memory
(called cache memory) is used to hold instructions and data
that is repetitively accessed by the CPU (for example,
within a program loop) and thus reduces the number of
references that must be made to the slower-speed main
memory. Some microprocessors provide a limited amount
of cache memory on the CPU chip itself. The external
caches supported by the IDT79R3500A can be much
larger; while a small cache can improve performance of
some programs, significant improvements for a wide range
of programs require large caches.
• Separate Caches for Data and Instructions-Even with
high-speed caches, memory speed can still be a limiting
factor because of the fast cycle time of a high-performance
microprocessor. The IDT79R3500A supports separate
caches for instructions and data and alternates accesses
of the two caches during each CPU cycle. Thus, the
processor can obtain data and instructions at the cycle rate
of the CPU using caches constructed with commercially
available IDT static RAM devices.
In order to maximize bandwidth in the cache while minimizing the requirement for SRAM access speed, the R3500A
divides a single processor clock cycle into two phases.
During one phase, the address for the data cache access
is presented while data previously addressed in the instruction cache is read; during the next phase, the data
operation is completed while the instruction cache is being
addressed. Thus, both caches are read in a single processor cycle using only one set of address and data pins.
• Write Buffer-in order to ensure data consistency, all data
that is written to the data cache must also be written out to
main memory. The cache write model used by the
IDT79R3500A is that of a write-through cache; that is, all
data written by the CPU is immediately written into the main
memory. To relieve the CPU of this responsibility (and the
inherent performance burden) the IDT79R3500A supports
an interface to a write buffer. The IDT79R3020 Write Buffer
captures data (and associated addresses) output by the
CPU and ensures that the data is passed on to main
memory.
IDT79R3500A Processor Subsystem Interfaces
Figure 14 illustrates the three subsystem interfaces provided by the IDT79R3500A processor:
• Cache control interface (on-chip) for separate data and
instruction caches permits implementation of off-chip caches
using standard IDT SRAM devices. The 79R3500A directly
controls the cache memory with a minimum of external
components. Both the instruction and data cache can vary
from 0 to 256K Bytes (64K entries). The 79R3500A also
includes the TAG control logic which determines whether
or not the entry read from the cache is the desired data. The
79R3500A cache controller implements a direct mapped
cache for high net performance (bandwidth). It has the
ability to refill multiple words when a cache miss occurs,
thus reducing the effective miss rate to less than 2% for
large caches. When a cache miss occurs, the 79R3500A
can support refilling the cache in 1, 4, 8, 16, or 32 word
blocks to minimize the effective penalty of having to access
main memory. The 79R3500A also incorporates the ability
to perform instruction streaming; while the cache is refilling, the processor can resume execution once the missed
word is obtained from main memory. In this way, the
processor can continue to execute concurrently with the
cache block refill.
• Memory controller interface for system (main) memory.
This interface also includes the logic and signals to allow
operation with a write buffer to further improve memory
bandwidth. In addition to the standard full word access, the
memory controller supports the ability to write bytes and
half-words by using partial word operations. The memory
controller also supports the ability to retry memory accesses if, for example, the data returned from memory is
invalid and a bus error needs to be signalled.
• Coprocessor Interface-The IDT79R3500 features a set
of on board tightly coupled coprocessors. Coprocessor 0 is
defined to be the system control coprocessor and
Coprocessor 1 is the Floating Point Accelerator. They have
direct access to the internal data bus which allows them
direct load and store of data in the same fashion as
accessing the CPU registers. This relieves the typical
bottleneck of having to load data into the CPU register set
and then passing that data off to the co-processors.
In applications where the FPA was off chip, as in using the
IDT79R3010A, several control pins were used for communications with the CPU and a Phase Lock Loop was located
on the IDT79R3010A to synchronize the two together. As
they are now integrated into a single chip, these are no
longer needed. The FpCond output, which is used in
coprocessor branch instructions, is now internally tied to
the CpCond(1) input of the CPU, leaving the external
CpCond(1) pin available for another function. This signal is
selectable to either output the FpBusy or the FPInt. For
applications where FPInt was connected to any one of the
six CPU HW interrupt inputs, that can also be internally
routed-the default being Int(3), as recommended by the
MIPS architecture. If FPInt is internally routed, the external
interrupt input corresponding to the FP interrupt is ignored.
Internal routing of these selections is made via the reset
vector.
The internal CPBusy input, which is used to stall the CPU
if the coprocessor needs to hold off subsequent operations, has two sources-FPBusy and the external CpBusy
pin which are logically ORed together. Further, Run and
Exception of both the FPA and CPU are internally tied and
brought out with the external CPBusy input to accommodate off chip coprocessor 2 and 3. This external interface
is available to support application specific functions.
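The direct-mapped indexing and block refill described for the cache control interface can be sketched in Python as follows. This is an illustration under stated assumptions, not the controller's actual logic: the function names are hypothetical, each cache entry is assumed to hold one 32-bit word (consistent with 256K bytes corresponding to 64K entries), and a refill is assumed to fetch the naturally aligned block that contains the missed word.

```python
WORD_BYTES = 4
REFILL_BLOCK_WORDS = (1, 4, 8, 16, 32)   # block sizes listed in the text

def cache_index(paddr, cache_bytes):
    """Entry selected by a physical address in a direct-mapped, word-wide cache."""
    entries = cache_bytes // WORD_BYTES
    return (paddr // WORD_BYTES) % entries

def refill_block(miss_addr, block_words):
    """Word addresses fetched on a miss, assuming a naturally aligned block."""
    if block_words not in REFILL_BLOCK_WORDS:
        raise ValueError("block size must be 1, 4, 8, 16, or 32 words")
    block_bytes = block_words * WORD_BYTES
    start = miss_addr - (miss_addr % block_bytes)
    return [start + i * WORD_BYTES for i in range(block_words)]
```

In a direct-mapped cache each address selects exactly one entry, so two addresses that differ by a multiple of the cache size compete for the same entry; the tag comparison mentioned in the text is what tells them apart.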


MULTIPROCESSING SUPPORT
The IDT79R3500A supports multiprocessing applications
in a simple but effective way. Multiprocessing applications
require cache coherency across the multiple processors. The
IDT79R3500A offers two signals to support cache coherency:
the first, MPStall, stalls the processor within two cycles of
being received and keeps it from accessing the cache. This
allows an external agent to snoop into the processor data
cache. The second signal, MPInvalidate, causes the processor to write data on the data cache bus which indicates the
externally addressed cache entry is invalid. Thus, a subsequent access to that location would result in a cache miss, and
the data would be obtained from main memory.
The two MP signals would be generated by external logic
which utilizes a secondary cache to perform bus snooping
functions. The 79R3500A does not impose an architecture for
this secondary cache, but rather is flexible enough to support
a variety of application specific architectures and still maintain
cache coherency. Further, there is no impact on designs
which do not require this feature. The 79R3500A further
allows the use of cache RAMs with internal address latches
in multiprocessor systems.

ADVANCED FEATURES
The IDT79R3500A offers a number of additional features
such as the ability to swap the instruction and data caches,
facilitating diagnostics and cache flushing. Another feature
isolates the caches, which forces cache hits to occur regardless of the contents of the tag fields. The IDT79R3500A allows
the processor to execute user tasks of the opposite byte
ordering (endianness) of the operating system, has double the
integer multiply/divide performance of R3000 and R2000, has
a programmable Tag width bus, and further allows parity
checking to be disabled. More details on these features can be
found in the IDT79R3500 Family Hardware User's Manual.
Further features of the IDT79R3500A are configured during the last four cycles prior to the negation of the RESET
input. These functions include the ability to select cache sizes
and cache refill block sizes; the ability to utilize the multiprocessor interface; whether or not instruction streaming is
enabled; whether byte ordering follows "Big-Endian" or "Little-Endian" protocols, etc. Table 3 shows the configuration options
selected at Reset. These are further discussed in the "Hardware User's Manual".

In most R3000A applications, the IDT79R3500 can be
placed in the socket with no modification to initialization
settings. Additionally, the IDT79R3500 can be used in systems that did not include the R3010 in the original design.
Further application assistance on these topics is available
from IDT.

PACKAGE THERMAL SPECIFICATIONS
The IDT79R3500A utilizes special packaging techniques
to improve both the thermal and electrical characteristics of
the microprocessor.
In order to improve the electrical characteristics of the
device, the package is constructed using multiple signal
planes, including individual power planes and ground planes
to reduce noise associated with high-frequency TTL parts. In
addition, the 175-pin PGA package utilizes extra power and
ground pins to reduce the inductance from the internal power
planes to the power planes of the PC Board.
In order to improve the thermal characteristics of the
microprocessor, the device is housed using cavity down
packaging. In addition, these packages incorporate a copper-tungsten thermal slug designed to efficiently transfer heat
from the die to the case of the package, and thus effectively
lower the thermal resistance of the package. The use of an
additional external heat sink affixed to the package thermal
slug further decreases the effective thermal resistance of the
package.
The case temperature may be measured in any environment to determine whether the device is within the specified
operating range. The case temperature should be measured
at the center of the top surface opposite the package cavity
(the package cavity is the side where the package lid is
mounted).
The equivalent allowable ambient temperature, TA, can be
calculated using the thermal resistance from case to ambient
(θca) for the given package. The following equation relates
ambient and case temperature:
TA = Tc - P × θca
where P is the maximum power consumption, calculated by
using the maximum Icc from the DC Electrical Characteristics
section.
Typical values for θca at various airflows are shown in
Table 2 for the various CPU packages.
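The relation between case and ambient temperature can be sketched numerically. The following is a minimal illustration, not a design procedure: the supply voltage, Icc, and case temperature limit used below are assumed values chosen for the example, and θca is assumed to be expressed in °C per watt.

```python
def allowable_ambient_temp(case_temp_c, power_w, theta_ca):
    """Return TA = Tc - P * theta_ca, in degrees C."""
    return case_temp_c - power_w * theta_ca


# Hypothetical operating point, for illustration only.
icc_max_a = 0.75       # assumed maximum Icc, in amperes
vcc_max_v = 5.25       # assumed worst-case supply (5.0 V + 5%)
power_w = icc_max_a * vcc_max_v

theta_ca = 7.0         # PGA package at 200 ft/min (assumed to be in degrees C per watt)
case_limit_c = 90.0    # assumed case temperature limit

ta = allowable_ambient_temp(case_limit_c, power_w, theta_ca)
print(f"P = {power_w:.2f} W, allowable TA = {ta:.1f} C")
```

Raising the airflow lowers θca, which raises the ambient temperature that can be tolerated for the same case temperature limit.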

Airflow (ft/min)            0      200    400    600    800    1000
θca (175-PGA, 144-PGA)      21     7      3      2      1      0.5
θca (172 Quad Flatpack)     23     9      4      3      2.5    1.5

Table 2. R3500A Package Characteristics

BACKWARD COMPATIBILITY
The primary goal of the 79R3500A is the ability to replace
the R3000 and R3010 with a single chip solution. This can be
done with either the R3000/R3010 or the R3000A/R3010A.
The pinout of the IDT79R3500A has been selected to ensure
this compatibility, with new functions mapped onto previously
used pins. The instruction set is compatible with that of the
R2000 at the binary level. As a result, code written for the older
processor can be executed.

Input   W Cycle          X Cycle        Y Cycle          Z Cycle
Int0    DBlkSize0        DBlkSize1      Extend Cache     Big Endian
Int1    IBlkSize0        IBlkSize1      MPAdrDisable     TriState
Int2    DispPar/RevEnd   IStream        Ignore Parity    NoCache
Int3    Reserved(1)      StorePartial   MultiProcessor   BusDriveOn
Int4    FPINT decode     FPINT decode   FPINT decode     FPINT onto CpCond
Int5    79R3500 mode     TLB disable    Tag Mode 1       Tag Mode 0

NOTES:
1. Reserved entries must be driven high.
2. These values must be driven stable throughout the entire RESET period.
Table 3. R3500A Mode Selectable Features
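The mode vector can be thought of as a small lookup table indexed by interrupt input and reset cycle. The sketch below is illustrative only: the names `MODE_TABLE`, `mode_feature`, and `features_for_cycle` are hypothetical, and it assumes that the four groups of entries in Table 3 correspond, in order, to the W, X, Y, and Z cycles.

```python
# Features sampled on each interrupt input during the four reset cycles.
# Assumption: the four value groups of Table 3 map, in order, to W, X, Y, Z.
CYCLES = ("W", "X", "Y", "Z")

MODE_TABLE = {
    "Int0": ("DBlkSize0", "DBlkSize1", "Extend Cache", "Big Endian"),
    "Int1": ("IBlkSize0", "IBlkSize1", "MPAdrDisable", "TriState"),
    "Int2": ("DispPar/RevEnd", "IStream", "Ignore Parity", "NoCache"),
    "Int3": ("Reserved", "StorePartial", "MultiProcessor", "BusDriveOn"),
    "Int4": ("FPINT decode", "FPINT decode", "FPINT decode", "FPINT onto CpCond"),
    "Int5": ("79R3500 mode", "TLB disable", "Tag Mode 1", "Tag Mode 0"),
}


def mode_feature(pin, cycle):
    """Return the feature selected by one interrupt pin in one reset cycle."""
    return MODE_TABLE[pin][CYCLES.index(cycle)]


def features_for_cycle(cycle):
    """Return a pin -> feature mapping for a single reset cycle."""
    idx = CYCLES.index(cycle)
    return {pin: row[idx] for pin, row in MODE_TABLE.items()}


print(mode_feature("Int0", "Z"))
print(features_for_cycle("W"))
```

A table-driven form like this keeps the reset logic description in one place; external hardware would drive each Int pin to the desired level in each of the four cycles.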

[Block diagram: the IDT79R3500A processor with system control coprocessor, connected to an instruction cache and a data cache over the Data bus, the Tag bus (Tag, TagV, TagP), and the AdrLo bus through transparent latches. Cache control signals: IRd, IWr, IClk, DRd, DWr, DClk. Read buffer enable: XEn. Clocks: Clk2xSys, Clk2xSmp, Clk2xRd, Clk2xPhi, SysOut, Reset. Memory interface: AccTyp(2:0), MemRd, MemWr, RdBusy, WrBusy, BusError. Coprocessors: CpSync, Run, Exc, CpBusy, CpCond(0), CpCond[2:3]. Hardware interrupts: Int[5:0].]

Figure 14. IDT79R3500A Subsystem Interfaces Example; 64 KB Caches


PIN CONFIGURATION

[Figure: pin assignment diagram for the 172-pin flatpack]

172-Pin Flatpack (Top View)

NOTES:
1. Reserved pins must be connected.
2. AdrLo 16 and 17 are multifunction pins which are controlled by mode select programming on interrupt pins at reset time
AdrLo 16: MP Invalidate, CpCond (2).
AdrLo 17: MP Stall, CpCond (3).

PIN CONFIGURATION

[Figure: pin-grid assignment diagram for the 175-pin PGA]

175-Pin PGA (Top View)
NOTE:
1. AdrLo 16 and 17 are multifunction pins which are controlled by mode select programming on interrupt pins at reset time
AdrLo 16: MP Invalidate, CpCond (2).
AdrLo 17: MP Stall, CpCond (3).

PIN CONFIGURATION

[Figure: pin-grid assignment diagram for the 144-pin PGA]

144-Pin PGA (Top View)
NOTE:
1. AdrLo 16 and 17 are multifunction pins which are controlled by mode select programming on interrupt pins at reset time
AdrLo 16: MP Invalidate. CpCond (2).
AdrLo 17: MP Stall. CpCond (3).


PIN DESCRIPTIONS
Pin Name          I/O   Description
Data (0-31)       I/O   A 32-bit bus used for all instruction and data transmission among the processor, caches, memory interface, and coprocessors.
DataP (0-3)       I/O   A 4-bit bus containing even parity over the data bus.
Tag (12-31)       I/O   A 20-bit bus used for transferring cache tags and high addresses between the processor, caches, and memory interface.
TagV              I/O   The tag validity indicator.
TagP (0-2)        I/O   A 3-bit bus containing even parity over the concatenation of TagV and Tag.
AdrLo (0-17)      O     An 18-bit bus containing byte addresses used for transferring low addresses from the processor to the caches and memory interface. (AdrLo 16: CpCond (2), AdrLo 17: CpCond (3) set by reset initialization).
IRd1              O     Read enable for the instruction cache.
IWr1              O     Write enable for the instruction cache.
IRd2              O     An identical copy of IRd1 used to split the load.
IWr2              O     An identical copy of IWr1 used to split the load.
IClk              O     The instruction cache address latch clock. This clock runs continuously.
DRd1              O     The read enable for the data cache.
DWr1              O     The write enable for the data cache.
DRd2              O     An identical copy of DRd1 used to split the load.
DWr2              O     An identical copy of DWr1 used to split the load.
DClk              O     The data cache address latch clock. This clock runs continuously.
XEn               O     The read enable for the Read Buffer.
AccTyp (0-2)      O     A 3-bit bus used to indicate the size of data being transferred on the data bus, whether or not a data transfer is occurring, and the purpose of the transfer.
MemWr             O     Signals the occurrence of a main memory write.
MemRd             O     Signals the occurrence of a main memory read.
BusError          I     Signals the occurrence of a bus error during a main memory read or write.
Run               O     Indicates whether the processor is in the RUN or STALL state. In the discrete design, the R3000 Run output is tied directly to the R3010 Run input. In the 79R3500, this is done internally, but the Run signal is also brought out for application specific coprocessors.
Exception         O     Indicates that the instruction that is about to commit to a state change should be aborted; also indicates other exception related information. In the discrete design, the R3000 Exception output is tied to the R3010 Exception input. In the 79R3500, this is done internally, but the Exception signal is also brought out for application specific coprocessors.
CpSync            O     A clock which is identical to SysOut and used by external coprocessors for timing synchronization with the 79R3500. In the discrete design, the CpSync output from the R3000 is tied to the R3010 FPSync input. In the 79R3500, this is done internally, but the CpSync signal is also brought out for application specific coprocessors.
RdBusy            I     The main memory read stall termination signal. In most system designs RdBusy is normally asserted and is deasserted only to indicate the successful completion of a memory read. RdBusy is sampled by the processor only during memory read stalls.
WrBusy            I     The main memory write stall initiation/termination signal.
CpBusy            I     Input used to indicate that the requested coprocessor resource is unavailable, or used to preserve the precise exception model. In the discrete design, CpBusy is driven directly by the R3010 FpBusy output. In the 79R3500, the CpBusy input of the CPU is the logical OR of both the internal FPA FpBusy and the external CpBusy pin. This input is provided for external application specific coprocessors. An internal pull down resistor is provided if this input is left open.
CpCond (1)        I     Signal used by the branch on Coprocessor 1 true/false instruction. In discrete systems using an R3010 FPA, this is normally tied to the FpCond output. In the 79R3500, the internal FpCond is directly tied to the internal CpCond(1) input, leaving this pin available for other functions. This pin defaults to output the FpBusy internal signal or (via the Reset vectors) to output FPInt; in the latter case, external hardware must route this signal to the appropriate Int pin.
CpCond (0,2-3)    I     Conditional branch status from coprocessors to the processor. Function is provided on AdrLo 16/17 pins and is selected at reset time.
MPStall           I     Multiprocessing Stall. Signals to the processor that it should stall accesses to the caches in a multiprocessing environment. This is physically the same pin as CpCond3; its use is determined at RESET initialization.
MPInvalidate      I     Multiprocessing Invalidate. Signals to the processor that it should issue invalidate data on the cache data bus. The address to be invalidated is externally provided. This is the same pin as CpCond2; its use is determined at RESET initialization.
Int (0-5)         I     A 6-bit bus used by the memory interface and coprocessors to signal maskable interrupts to the 79R3500. This bus is also used at reset time to select among the mode-selectable features of the 79R3500. The FPA FPInt output signal is typically connected to one of these interrupt lines; the choice is programmable through the reset vectors with the default being Int(3).

PIN DESCRIPTIONS (Continued)
Pin Name          I/O   Description
Clk2xSys          I     The master double frequency input clock used for generating SysOut.
Clk2xSmp          I     A double frequency clock input used to determine the sample point for data coming into the processor and coprocessors.
Clk2xRd           I     A double frequency clock input used to determine the enable time of the cache RAMs.
Clk2xPhi          I     A double frequency clock input used to determine the position of the internal phases, phase1 and phase2.
Reset             I     Synchronous initialization input used to force execution starting from the reset memory address. Reset must be deasserted synchronously but asserted asynchronously. The deassertion of Reset must be synchronized by the leading edge of SysOut.
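
The pin descriptions state that DataP carries even parity over the data bus. The sketch below illustrates one way such parity could be computed. It assumes that each of the four DataP bits covers one byte of the 32-bit Data bus and that parity bit i corresponds to byte i; neither the grouping nor the bit ordering is stated in the table, so both are assumptions made for illustration.

```python
def byte_even_parity(byte):
    """Return the bit that makes the count of 1s in (byte + parity bit) even."""
    return bin(byte & 0xFF).count("1") & 1


def data_parity(word):
    """Return a 4-bit parity value for a 32-bit word.

    Assumed mapping: parity bit i covers bits 8*i .. 8*i+7 of the word.
    """
    parity = 0
    for i in range(4):
        byte = (word >> (8 * i)) & 0xFF
        parity |= byte_even_parity(byte) << i
    return parity


def parity_ok(word, parity):
    """Check a received word against its accompanying parity bits."""
    return data_parity(word) == (parity & 0xF)


print(hex(data_parity(0x01010101)))
```

A receiver would recompute the parity over the data it sampled and compare it with the parity bits that accompanied the data, as `parity_ok` does.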

ABSOLUTE MAXIMUM RATINGS(1, 3)
Symbol    Rating                                 Commercial               Military       Unit
VTERM     Terminal Voltage with Respect to GND   -0.5 to +7.0             -0.5 to +7.0   V
TA, TC    Operating Temperature                  0 to +70(4) (Ambient)    -55 to +125    °C
                                                 0 to +90(5) (Case)       (Case)
TBIAS     Case Temperature Under Bias            -55 to +125(4)           -65 to +135    °C
                                                 0 to +90(5)
TSTG      Storage Temperature                    -55 to +125              -65 to +155    °C
VIN       Input Voltage                          -0.5 to +7.0             -0.5 to +7.0   V

NOTE:
1. Stresses greater than those listed under ABSOLUTE MAXIMUM RATINGS
may cause permanent damage to the device. This is a stress rating only
and functional operation of the device at these or any other conditions
above those indicated in the operational sections of this specification is not
implied. Exposure to absolute maximum rating conditions for extended
periods may affect reliability.
2. VIN minimum = -3.0V for pulse width less than 15ns.
VIN should not exceed Vcc +0.5 Volts.
3. Not more than one output should be shorted at a time. Duration of the short
should not exceed 30 seconds.
4. 16-33 MHz only.
5. 37-40 MHz only.

RECOMMENDED OPERATING TEMPERATURE AND SUPPLY VOLTAGE
Grade                    Temperature                GND   Vcc
Military, 16-33 MHz      -55°C to +125°C (Case)     0V    5.0 ±10%
Commercial, 16-33 MHz    0°C to +70°C (Ambient)     0V    5.0 ±5%
Commercial, 37-40 MHz    0°C to +90°C (Case)        0V    5.0 ±5%

OUTPUT LOADING FOR AC TESTING
[Figure: output test load connected to the device under test, with +4mA and -4mA current sources]

AC TEST CONDITIONS
Symbol    Parameter             Min.   Max.   Unit
VIH       Input HIGH Voltage    3.0    -      V
VIL       Input LOW Voltage     -      0.4    V
VIHS      Input HIGH Voltage    3.5    -      V
VILS      Input LOW Voltage     -      0.4    V

DC ELECTRICAL CHARACTERISTICS
COMMERCIAL TEMPERATURE RANGE (TA = 0°C to +70°C, VCC = +5.0V ±5%)
79R3500A, 79R3500AE: 16.67MHz, 20.0MHz, 25.0MHz, 33.33MHz

Symbol   Parameter                   Test Conditions           Min.    Max.    Unit
VOH      Output HIGH Voltage         VCC = Min., IOH = -4mA    3.5     -       V
VOL      Output LOW Voltage          VCC = Min., IOL = 4mA     -       0.4     V
VOHC     Output HIGH Voltage(7)      VCC = Min., IOH = -4mA    4.0     -       V
VOHT     Output HIGH Voltage(4,6)    VCC = Min., IOH = -8mA    2.4     -       V
VOLT     Output LOW Voltage(4,6)     VCC = Min., IOL = 8mA     -       0.8     V
VIH      Input HIGH Voltage(5)                                 2.0     -       V
VIL      Input LOW Voltage(1)                                  -       0.8     V
VIHS     Input HIGH Voltage(2,5)                               3.0     -       V
VILS     Input LOW Voltage(1,2)                                -       0.4     V
CIN      Input Capacitance(6)                                  -       10      pF
COUT     Output Capacitance(6)                                 -       10      pF
ICC      Operating Current           VCC = 5V, TA = 70°C       -       600 to 750, by speed grade   mA
IIH      Input HIGH Leakage(3)       VIH = VCC                 -       100     µA
IIL      Input LOW Leakage(3)        VIL = GND                 -100    -       µA
IOZ      Output Tri-state Leakage    VOH = VCC, VOL = GND      -100    100     µA

NOTES:
1. VIL Min. = -3.0V for pulse width less than 15ns. VIL should not fall below -0.5 Volts for larger periods.
2. VIHS and VILS apply to Clk2xSys, Clk2xSmp, Clk2xRd, Clk2xPhi, CpBusy, and Reset.
3. These parameters do not apply to the clock inputs.
4. VOHT and VOLT apply to the bidirectional data and tag busses only. Note that VIH and VIL also apply to these signals. VOHT and VOLT are provided
to give the designer further information about these specific signals.
5. VIH should not be held above Vcc + 0.5 volts.
6. Guaranteed by design.
7. VOHC applies to RUN and Exception.

DC ELECTRICAL CHARACTERISTICS
COMMERCIAL TEMPERATURE RANGE (Tc = 0°C to +90°C, Vcc = +5.0V ±5%)
79R3500AE: 37.0MHz, 40.0MHz

Symbol   Parameter                   Test Conditions           Min.    Max.    Unit
VOH      Output HIGH Voltage         VCC = Min., IOH = -4mA    3.5             V
VOL      Output LOW Voltage          VCC = Min., IOL = 4mA
VOHC     Output HIGH Voltage(7)      VCC = Min., IOH = -4mA
VOHT     Output HIGH Voltage(4,6)    VCC = Min., IOH = -8mA
VOLT     Output LOW Voltage(4,6)     VCC = Min., IOL = 8mA
VIH      Input HIGH Voltage(5)
VIL      Input LOW Voltage(1)
VIHS     Input HIGH Voltage(2,5)
VILS     Input LOW Voltage(1,2)
CIN      Input Capacitance(6)
COUT     Output Capacitance(6)
ICC      Operating Current           VCC = 5V, TA = 70°C
IIH      Input HIGH Leakage(3)       VIH = VCC
IIL      Input LOW Leakage(3)        VIL = GND
IOZ      Output Tri-state Leakage    VOH = VCC, VOL = GND      -100    100     µA

NOTES:
1. VIL Min. = -3.0V for pulse width less than 15ns. VIL should not fall below -0.5 Volts for larger periods.
2. VIHS and VILS apply to Clk2xSys, Clk2xSmp, Clk2xRd, Clk2xPhi, CpBusy, and Reset.
3. These parameters do not apply to the clock inputs.
4. VOHT and VOLT apply to the bidirectional data and tag busses only. Note that VIH and VIL also apply to these signals. VOHT and VOLT are provided
to give the designer further information about these specific signals.
5. VIH should not be held above Vcc + 0.5 volts.
6. Guaranteed by design.
7. VOHC applies to RUN and Exception.

AC ELECTRICAL CHARACTERISTICS(1,2,3)
COMMERCIAL TEMPERATURE RANGE (TA = 0°C to +70°C, VCC = +5.0V ±5%)

Parameter

Test Conditions

NOTES:
1. All timings are referenced to 1.5V.
2. The clock parameters apply to all four 2xClocks: Clk2xSys, Clk2xSmp, Clk2xRd, and Clk2xPhi.
3. This parameter is guaranteed by design.
4. These parameters apply when the 79R3010 Floating Point Coprocessor is connected to the CPU. With phase lock on, Reset must be asserted
for the longer of 3000 clock cycles or 200 microseconds.
5. Tcyc is one CPU clock cycle (two cycles of a 2x clock).
6. With the exception of the Run signal, no two signals on a given device will derate for a given load by a difference greater than 15%.
7. Clock transition time < 2.5ns for 33.33MHz; clock transition time < 5ns for other speeds.
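
Note 4 gives the minimum Reset assertion time as the longer of 3000 clock cycles or 200 microseconds. The sketch below works through that comparison. It assumes that "clock cycles" means CPU clock cycles (Tcyc, as defined in note 5); the function name and the 10 MHz test frequency are hypothetical and used only for illustration.

```python
def min_reset_assertion_us(cpu_clock_mhz, min_cycles=3000, min_time_us=200.0):
    """Return the minimum Reset assertion time in microseconds.

    Assumes the cycle count refers to CPU clock cycles (Tcyc), so that
    one microsecond contains cpu_clock_mhz cycles.
    """
    cycles_time_us = min_cycles / cpu_clock_mhz
    return max(cycles_time_us, min_time_us)


for mhz in (10.0, 16.67, 25.0, 33.33):
    print(f"{mhz} MHz: {min_reset_assertion_us(mhz):.1f} us")
```

Under this assumption, 3000 cycles take less than 200 microseconds at every listed speed grade, so the 200 microsecond floor governs; the cycle count only dominates at CPU clock rates below 15 MHz.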

AC ELECTRICAL CHARACTERISTICS(1,2,3)
MILITARY TEMPERATURE RANGE (Tc = -55°C to +125°C, VCC = +5.0V ±10%)

Parameter

Test Conditions

NOTES:
1. All timings are referenced to 1.5V.
2. The clock parameters apply to all four 2xClocks: Clk2xSys, Clk2xSmp, Clk2xRd, and Clk2xPhi.
3. This parameter is guaranteed by design.
4. These parameters apply when the 79R3010 Floating Point Coprocessor is connected to the CPU. With phase lock on, Reset must be asserted
for the longer of 3000 clock cycles or 200 microseconds.
5. Tcyc is one CPU clock cycle (two cycles of a 2x clock).
6. With the exception of the Run signal, no two signals on a given device will derate for a given load by a difference greater than 15%.
7. Clock transition time < 2.5ns for 33.33MHz; clock transition time < 5ns for other speeds.

AC ELECTRICAL CHARACTERISTICS(1,2,3)
COMMERCIAL TEMPERATURE RANGE (Tc = 0°C to +90°C, VCC = +5.0V ±5%)

Parameter

Test Conditions

Unit

NOTES:
1. All timings are referenced to 1.5V.
2. The clock parameters apply to all four 2xClocks: Clk2xSys, Clk2xSmp, Clk2xRd, and Clk2xPhi.
3. This parameter is guaranteed by design.
4. These parameters apply when the 79R3010 Floating Point Coprocessor is connected to the CPU. With phase lock on, Reset must be asserted
for the longer of 3000 clock cycles or 200 microseconds.
5. Tcyc is one CPU clock cycle (two cycles of a 2x clock).
6. With the exception of the Run signal, no two signals on a given device will derate for a given load by a difference greater than 15%.
7. Clock transition time < 2.5ns.

[Timing diagram: Clk2xSys, Clk2xSmp, Clk2xRd, Clk2xPhi]
Figure 15. Input Clock Timing

[Timing diagram: SmpOut*, RdOut*, PhiOut*]
Figure 16. Processor Reference Clock Timing
* These signals are not actually output from the processor.
They are drawn to provide a reference for other timing diagrams.

[Timing diagram: Phase, SysOut, PhiOut, AddrLo, AccTyp 0:1 (size of stored data, size of load data), AccTyp 2, Data and Tag Buses (D bus input, I bus input), IClk, DClk, IRd, DRd, DWr; timing parameters Tds, Tsmp, Trd]
Figure 17. Synchronous Memory (Cache) Timing

[Timing diagram across RUN, STALL, STALL, FIXUP, and RUN cycles: Phase, AddrLo, Tag (Address High), AccTyp 0:1, AccTyp 2, Data (Output)]
Figure 18. Memory Write Timing

[Timing diagram across RUN, STALL, STALL, FIXUP, and RUN cycles: Phase, AddrLo, Tag (Address High), AccTyp 0:1, AccTyp 2, Data (Input), RdBusy, CpCond0]
Figure 19. Memory Read Timing

[Timing diagram for a coprocessor load and a coprocessor store: Phase, CpBusy, CpCond(n)]
Figure 20. Coprocessor Load/Store Timing

[Timing diagram: Phase]
Figure 21. Interrupt Timing

[Timing diagram: Phase, Mode; timing parameter TdH]
Figure 22. Mode Vector Initialization
NOTES:
1. Reset must be negated synchronously; however, it should be asserted asynchronously. Designs must not rely on the proper functioning of SysOut prior
to the assertion of Reset.
2. If Phase-Lock On or R3000 Mode are asserted as mode select options, they should be asserted throughout the Reset period, to ensure that the slowest
coprocessor in the system has sufficient time to lock to the CPU clocks.
3. Reset is actually sampled in both Phase 1 and Phase 2. To ensure proper initialization, it must be negated relative to the end of Phase 1.

ORDERING INFORMATION
IDT XXXXX (Device Type) XX (Speed) X (Package) X (Process/Temperature Range)

Process/Temperature Range:
  Blank     Commercial (0°C to +70°C)
  B         Military (-55°C to +125°C), Compliant to MIL-STD-883, Class B
  M         Military Temperature Range Only
Package:
  GD175     175-Pin PGA (Cavity Down)
  GD144     144-Pin PGA (Cavity Down)
  F         172-Pin Flat Pack (Cavity Down)
Speed:
  16        16.67 MHz
  20        20.0 MHz
  25        25.0 MHz
  33        33.33 MHz
  37        37.0 MHz
  40        40.0 MHz
Device Type:
  79R3500A    RISC CPU Processor
  79R3500AE   Enhanced Timing Version


Integrated Device Technology, Inc.

IDT79R3051 FAMILY OF INTEGRATED RISControllers™
ADVANCE INFORMATION
IDT 79R3051™, 79R3051E
IDT 79R3052™, 79R3052E

FEATURES:
• Instruction set compatible with IDT79R3000A and
  IDT79R3001 MIPS RISC CPUs
• High level of integration minimizes system cost, power
  consumption
  - 79R3000A/79R3001 Execution Engine
  - R3051 features 4kB of Instruction Cache
  - R3052 features 8kB of Instruction Cache
  - All devices feature 2kB of Data Cache
  - "E" Versions (Extended Architecture) feature full
    function Memory Management Unit, including 64-entry Translation Lookaside Buffer (TLB)
  - 4-deep write buffer eliminates memory write stalls
  - 4-deep read buffer supports burst refill
  - On-chip DMA arbiter
  - Bus Interface Minimizes Processor Stalls
  - Single clock input
• Direct interface to R3720/21/22 RISChipset
• 35 MIPS, over 64,000 Dhrystones at 40 MHz
• Low cost 84-pin PLCC packaging
• Flexible bus interface allows simple, low cost designs
• 20, 25, 33, and 40 MHz operation
• Complete software support
  - Optimizing compilers
  - Real-time operating systems
  - Monitors/debuggers
  - Floating Point Software
  - Page Description Languages

R3051 FAMILY BLOCK DIAGRAM
[Block diagram: Clock Generator Unit (Clk2xIn); Master Pipeline Control; System Control Coprocessor (Exception/Control Registers, Memory Management Registers); Integer CPU Core (General Registers (32 x 32), ALU, Shifter, Mult/Div Unit, Address Adder, PC Control); Virtual Address; Physical Address Bus; Instruction Cache (8kB/4kB); Data Cache (2kB); Data Bus; Bus Interface Unit (4-deep Write Buffer, 4-deep Read Buffer, DMA Arbiter, BIU Control); external signals BrCond(3:0), Int(5:0), Address/Data, DMA Ctrl, Rd/Wr Ctrl, SysClk]

RISController, R305x, R3051, R3052 are trademarks of Integrated Device Technology, Inc.

NOVEMBER 1990

ySI.lII\

Unit

ns

2

t -_ _-+~trurSt~u~rs~WJ~rN-e-a;.;.;r,;..;...;.;.:..,~D.;.;.ata=En~-+_______
?::~
_ _ _ _ _-+__+-_-+_ _-+-__+-_~~.

t5"B'U'S'Grii
t6"B'U'S'Grii

Max.

rising

8

7

8
5

7
5

-

6··:\t

10

ns

10

ns

5

ns

6
~t:::::;:;4:;:·

5
3

ns
ns

~:,:"t :t}~"'3::
S'",~t
1.5

2

ns

2

ns
ns

12

t7

Wr, ~, t3urSVWrNea:r, AID

Negated from ~ falling
Valid from ~ rising

t8

ALE

Asserted from ~ rising

4

4

t9
110

ALE
AID

Negated from ~ falling
Hold from ALE negated(4)

4

4

t11
t12

tJaiaEii
DataEn

Asserted from ~ falling
Asserted from AID tri-state(4)

0

0

0':'::"';; :::=

0

ns
ns

t14

AID

Driven from ~ rising(4)

0

0

0

0

ns

~::~~

__

2

2
15

~~~~~~~3~;2~~~a_ta_E_n_'B_u_rs_t_AN_r_N_ea_r~~~:~:g:~;~:~::~~~,",~~m=:=~C=I;~_f_al_lin_lg_~

__

AID

Tri-state from ~ falling

10

t19
t20
t21

AID
Clk2xln
Clk2xln

~fallingtodataout

10

Pulse Width High
Pulse Width Low

10
10

t22
t23
t24

Clk2xln
~

Clock Period
Pulse Width from Vcc valid
Minimum Pulse Width

25
200
32

t25
t26

~

Set-up to ~ falling
Mode set-up to li9SEii rising

6
6

Mode hold from li9SEii rising
Set-up to WsCl'k falling

2
6

t27
t28

rrrt
rrrt

:t::::rS

tt~:::;:.>..

1.5

13

+-~~~4-_~~~~:~jttt~~~
. . *.;~_,+-_:~+-_-+_;.;.;:~~;;.;.;:~

t18

~

15

-S:;,',;

10
8
8

1';';'::;=:1..

9

8

ns

....10
::':;:':4rtrf6.5
.. 6.5

9

8

ns
ns
ns

20
200;:;:::
-it:·
32 ';';';:';'=::' ;~;",

5
5

15
200
32

12.5
200
32

tsys

:$::,: ;:::, ::::::;-

4
4

3
3

ns
ns

2

4
1

3
1

5

ns

ns
ns

t29

STiii, SBrCond
STiii, SBrCond

Hold from ~ falling

2

t30

rrrt, BrCond

Set-up to ~ falling

6

t31
tsys

rrrt, BrCond
~yscn<

Hold from ~ falling
Pulse Width

2
2*t22

t32
t33

~YsCIk
~Y5CIk

Clock High Time
Clock Low Time

t22 - 2 t22+2 t22-2 t22+2 t22-1 t22+1 t22-1 t22+1
t22 - 2 t2atf::z .;t22 - 2 t22 + 2 t22 - 1 t22 + 1 t22 -1 t22 + 1

tderate

All outputs

Timing deration for loading
over 25pf(4, 5)

ns
ns
ns

2*t22';:' ;:;;~N::h

2*t22

2*t22

2*t22

2*t22

2*t22
ns
ns

nsl
25pF

NOTES:
1. All timings referenced to 1.5 Volts.
2. All outputs tested with 25 pF loading.
3. The AC values listed here reference timing diagrams contained in the R3051 Family Hardware User's Manual.
4. Guaranteed by design.
5. This parameter is used to derate the AC timings according to the loading of the system. This parameter provides a deration for loads over the specified
test condition; that is, the deration factor is applied for each 25 pF over the specified test load condition.
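
Note 5 can be illustrated with a short calculation. The sketch below is one plausible reading, not a statement of how the parameter must be applied: it assumes the deration scales linearly with the capacitance in excess of the 25 pF test load, and the function name and the numeric values in the example are hypothetical.

```python
def derated_timing_ns(t_ns, load_pf, tderate_ns_per_25pf, test_load_pf=25.0):
    """Return an AC timing adjusted for capacitive loading.

    Assumption: the deration is applied proportionally, adding
    tderate_ns_per_25pf for every 25 pF above the test load.
    Loads at or below the test load are left unchanged.
    """
    extra_pf = max(0.0, load_pf - test_load_pf)
    return t_ns + tderate_ns_per_25pf * (extra_pf / 25.0)


# Hypothetical example: a 10 ns output delay, 0.5 ns per 25 pF, 75 pF load.
print(derated_timing_ns(10.0, 75.0, 0.5))
```

If the deration is instead meant to apply per complete 25 pF increment, the division would be replaced by a ceiling operation; the note does not say which reading is intended.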

ORDERING INFORMATION
IDT XXXXX (Device Type) XX (Speed) X (Package) X (Process/Temp. Range)

Process/Temp. Range:
  Blank      Commercial Temperature Range
  'B'        Compliant to MIL-STD-883, Class B
  'M'        Military Temperature Range Only
Package:
  'J'        84-Pin PLCC
  'OJ'       84-Pin J-Bend Cerpack
Speed:
  '20'       20.0 MHz
  '25'       25.0 MHz
  '33'       33.33 MHz
  '40'       40.0 MHz
Device Type:
  79R3051    4kB Instruction Cache, No TLB
  79R3051E   4kB Instruction Cache, With TLB
  79R3052    8kB Instruction Cache, No TLB
  79R3052E   8kB Instruction Cache, With TLB


Integrated Device Technology, Inc.

THIRD GENERATION MIPS RISC PROCESSOR
ADVANCE INFORMATION
IDT79R4000

FEATURES:

• High-Level of Integration
  - RISC Integer Unit
  - IEEE Compatible Floating Point Units
  - Memory Management Unit
  - 8kB Instruction Cache
  - 8kB Data Cache
  - Direct control of optional secondary cache
  - Extensive multi-processing support

• High-Performance, Highly Integrated CPU
• Fully Binary Compatible with R2000, R3000 CPUs
• Capable of over 50 VAX MIPS sustained system
  performance
• High-level of performance
  - Utilizes super-pipelining to exploit 2-level instruction level parallelism with no issue restrictions
  - Balanced integer and floating point performance
  - 64-bit floating point extensions
  - Multi-processing support

BLOCK DIAGRAM
[Block diagram: Memory and Secondary Cache Interface; 8k Bytes Instruction Cache; 8k Bytes Data Cache; Super-pipelined Control Unit; Integer Unit (Register File 32 x 32-bits, Integer Multiply/Divide, ALU); Floating Point Units (CP1) (Register File 32 x 32-bits, Floating Point Execution Units); Memory Management and Exception Handling (CP0) (Exception Handling Registers, Memory Management Registers, Translation Lookaside Buffer)]

CEMOS is a trademark of Integrated Device Technology, Inc.

NOVEMBER 1990
©1990 Integrated Device Technology, Inc.

DESCRIPTION:

The R4000 is the third generation of MIPS RISC technology, continuing MIPS' track record as the performance leader
and establishing a new performance standard for the 1990's.
The R4000 maintains full binary compatibility with applications executing on the R2000 and R3000 MIPS RISC CPUs
(also available from IDT) and IDT's RISController™ family,
while achieving substantially higher performance. The key to
this performance standard is both the architecture/implementation of the processor, and the level of integration achieved
in a single chip. The R4000 contains both a high-performance
execution core (integer and floating point) as well as sufficient
memory bandwidth (large on-chip caches) to keep the execution engine running. The on-chip resources are complemented by a direct interface to an optional secondary cache,
and also by multi-processing support, allowing the system
designer to further increase memory bandwidth to the processor, and to operate numerous processors together to increase
overall computational power. This balanced architectural
approach allows R4000 based systems to achieve a wide
range of price performance goals.

Exploiting Instruction Level Parallelism

There are a number of techniques available to achieve
multiple instructions per clock cycle. The two discussed most
frequently are super-scalar and super-pipelined architectures. These machines attempt to initiate multiple instructions
per clock cycle, as long as there are no data dependencies
between the instructions. Thus, it is up to the CPU (rather than
the compilers or programmer) to detect and exploit this
"instruction level parallelism" to increase performance.
Superscalar machines attempt to run multiple instructions
in distinct pipelines. In order to accomplish this, execution
resources must be replicated in each pipeline. Further,
significant logic must exist between the pipelines to ensure that
data dependencies amongst multiple instructions are resolved properly, and to ensure that exceptions are detected
and handled precisely. Figure 2 illustrates a theoretical superscalar machine of degree 2.
Superpipelined machines attempt to initiate multiple instructions per clock cycle, sequentially. In order to do this,
execution units in the basic machine pipeline which have long
latencies (require a long time to complete their operation)
must be pipe lined, so that multiple instructions can be executing
simultaneously (although sequentially) in those units. In order
to achieve high performance, the speed of the individual
pipestages is higher than in equivalent superscalar implementation. However, each pipestage is significantly less
complex than the super-scalar equivalent, allowing these
higher speeds to be achieved. Figure 3 shows the equivalent
super-pipe lined machine of degree 2 (not intended to represent R4000 pipeline).
Studies have shown that architecturally, superscalar and
superpipelined are duals of each other: that is, each approach
is roughly equivalent in its ability to exploit instruction level
parallelism. Differences in performance will then be related to
the actual implementation of those techniques in a given
processor, constrained by the current semiconductor technology. Compromises and trade-offs which may reduce
effectiveness from the theoretical machine include:

Keys to Performance
All microprocessors are governed by the same basic performance equation:
the time required to perform a given task is the product of the number
of instructions required to execute the task and the average time
required to complete an instruction. MIPS optimizing compilers, and the
MIPS RISC architecture, serve to minimize the first term in the product.
The R4000 maintains the same focus on compiler technology as an
extension of the CPU architecture as did the earlier generations of MIPS
processors.
The remaining term in the performance equation is the average amount of
time required to execute instructions. The R4000 is designed to exploit
2-level instruction parallelism, thus being able to retire 2
instructions per clock cycle (sustained). Further, the architecture and
the level of integration allow substantially faster clock rates to be
used. The combination of fast clock rates and multiple instructions per
clock cycle minimizes the average time per instruction.
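
The performance equation above can be sketched numerically. The following is a minimal illustration only: the function name, the instruction count and both clock rates are hypothetical values chosen for the arithmetic and are not R4000 specifications.

```python
def execution_time_s(instruction_count, instructions_per_cycle, clock_hz):
    """Task time = instruction count x average time per instruction.

    The average time per instruction is 1 / (instructions per cycle x clock rate),
    so the product reduces to count / (IPC x clock).
    """
    return instruction_count / (instructions_per_cycle * clock_hz)


# Hypothetical workload and clock rates (assumptions, not data sheet figures).
n = 100_000_000
baseline = execution_time_s(n, instructions_per_cycle=1.0, clock_hz=25e6)
improved = execution_time_s(n, instructions_per_cycle=2.0, clock_hz=50e6)
speedup = baseline / improved
print(baseline, improved, speedup)
```

In this sketch, doubling both the instructions retired per clock and the clock rate reduces the average time per instruction by a factor of four; the instruction count is the term addressed by the compiler and the architecture.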
[Figure 2 diagram: four instructions, each passing through the
Instruction Fetch, Decode/Register Fetch, ALU, Memory Access and
Register Write Back stages, with time marked in clock cycles]

Figure 2. Superscalar RISC Pipeline

[Figure 3 diagram: four instructions, each passing through the
I-Fetch 1, I-Fetch 2, Decode 1, Decode 2, ALU 1, ALU 2, Mem 1, Mem 2,
WB 1 and WB 2 half-stages, with time marked in clock cycles]

Figure 3. Superpipelined RISC Pipeline

• Issue Restrictions: Due to the complexity and implementation cost of
replicating execution units, and checking for data dependencies between
parallel pipelines, many super-scalar machines institute restrictions
on the types of operations that may be initiated in parallel. This
degrades the chip from the theoretical performance of a true
superscalar machine.
• Clock Frequency: the complexity of an implementation affects the
clock frequencies achievable. A machine designed for instruction level
parallelism may become so complex that the clock frequency is adversely
affected. This can be a greater factor in multi-chip implementations,
where significant speed is lost in bringing signals from one packaged
part to another across a PC board.
• Memory Bandwidth: a high-performance microprocessor needs substantial
memory bandwidth to achieve its performance potential. If the
implementation is too complex, there may not be enough room for
adequate caches to keep the execution engine fed. A good implementation
is able to integrate sufficient cache memory to allow the execution
engine to frequently operate at its peak performance rating.
• Compiler technology: the compiler technology must be capable of
generating efficient code for the execution engine. In the case of a
machine with issue restrictions, complex peephole optimizations may be
required to maximize parallel operation. These optimizations are
generally outside the realm of other traditional optimizations required
for more general (less restrictive) machines.
• Level of parallelism exploited. A tradeoff can be made between
machines with significant issue restrictions but high peak parallelism
and machines with few or no issue restrictions but less peak
parallelism. Numerous studies have shown that a machine capable of
exploiting two-level instruction parallelism, with no issue
restrictions, will outperform a machine capable of exploiting 4-level
parallelism but which implements significant issue restrictions.
Further, the amount of parallelism exploited by the machine may cause
tradeoffs in other areas, such as amount of cache or clock frequency.

Based on these constraints, the architects of the R4000 have
implemented a super-pipelined execution engine to exploit 2-level
instruction parallelism, with no issue restrictions, and with
substantial primary instruction and data caches.
The R4000 is thus capable of the raw execution speed required to
achieve high performance, and supplies sufficient bandwidth from its
primary caches to minimize main memory cycles. Finally, the R4000 is
able to benefit from the strength of the MIPS optimizing compiler
technology, without requiring an arbitrarily complex peephole
scheduler.

Level of Integration:
The R4000 brings all of the execution resources necessary for a
high-performance computing system into a single chip. These resources
include:

• A High Performance Execution Engine. The R4000 utilizes a
super-pipelined execution engine, while maintaining full binary
compatibility with the R2000/R3000.
• Full featured MMU. The R4000 integrates memory management and
exception handling facilities on-chip as the system control
co-processor (CP0), thus not requiring an external MMU device.
• High Performance Floating Point Accelerator. The R4000 integrates
single and double precision floating point on-chip, as co-processor 1.
• Large primary caches: the R4000 integrates large (8k bytes each)
instruction and data caches on-chip. These large caches allow the
execution resources to operate at peak rates through substantial
amounts of the application, resulting in high actual system
performance, not just peak native MIPS.
• Direct support for optional secondary cache. The R4000 incorporates
the ability to implement an external secondary cache, to further
increase processor bandwidth. This is especially important in
multi-processing systems.
• Multi-processing support. The R4000 provides the support necessary to
implement high-performance, multi-processing systems.

APPLICATIONS
The R4000 extends the performance range served by the
MIPS architecture into higher levels of performance. The
R4000 provides a high-performance migration path to those
applications currently served by devices such as the R3000,
R3001, and R3051.
The MIPS RISC architecture has found widespread acceptance in a number of applications. These include:

• High-performance multi-processing systems. Further
computational throughput can be achieved by implementing multiple
R4000s in a single system, as illustrated in
figure 4. MIPS RISC is already well represented in
multi-processing applications, including systems from Silicon
Graphics, Stardent Computer, and Digital Equipment
Corporation. The R4000 allows these systems to implement
even higher performance in each CPU, increasing
overall system capability.

Figure 4. R4000-Based Multi-Processing System

• Real-time systems. The Joint Integrated Avionics
Working Group (JIAWG) committee has selected the
MIPS RISC architecture as a standard for military avionics.
The R4000 allows these real-time applications to
benefit from high integration and CPU performance.
• Embedded computing systems. The MIPS RISC
architecture has won designs in a number of embedded
systems applications, including laser printers, graphics
systems, and data communications. A typical
high-performance embedded system built around the R4000 is
illustrated in figure 5.

Figure 5. R4000-Based Embedded System

• Desktop workstations. MIPS RISC is a leading architecture
in UNIX™ based workstations, from vendors such
as MIPS, Digital Equipment Corporation, and Silicon
Graphics. The R4000 extends the performance range
achievable in a desktop environment, while minimizing
chip count (and thus real estate, cost, and power consumption)
as illustrated in figure 6.

• Deskside server systems. The R4000 is also capable
of supporting high performance server systems, such as
systems built by MIPS and Digital Equipment (around the
R3000) today. Implementing a secondary cache, and a
larger I/O and main memory system, extends the basic
UNIX system to a high-performance server system, as
shown in figure 7.

Figure 6. R4000-Based Desktop Workstation

Figure 7. R4000-Based Deskside Server System

ADDITIONAL INFORMATION
Additional information on the R4000 is available from IDT.
Please contact your local sales representative for additional
information on this product.

RISC SUPPORT COMPONENTS
A RISC microprocessor is an important, but not self-sufficient,
element of a high-performance general or embedded computing system.
Equally important are the memory system (both cache and main memory)
and the I/O interface to the execution core.
To simplify the task of building these high-performance subsystems,
IDT produces a wide variety of support chips and building block
devices. These chips range from general purpose devices such as fast
static RAM and high-performance logic (used with many processor
families), to specialized devices used in only certain types of
applications (such as the IDT LaserFIFO, used in laser printer
systems) and devices designed to work with only a specific processor
family.
Generic building block devices include SRAMs, with densities from 16KB
to 1MB and access times as low as 7ns, as well as high-speed logic
devices such as the FCT-T family.
Devices specifically developed for RISC systems include the
RISChipset™ - the 3720 Bus Exchanger, the 3721 DRAM Controller and the
3722 I/O Controller. These components facilitate design of systems
based upon the R3051/52 controller family. The DRAM and I/O
controllers have a direct bus interface to the 3051/52.
The R3020 Write Buffer enhances the performance of R3000 systems by
allowing the processor to perform write operations at full clock
speeds instead of resorting to time-consuming CPU stall cycles. The
memory can then retire the data at a slower rate. The R32xx family of
read/write buffers includes the memory read capability, enabling the
use of slower main memory without impacting system performance.
By providing these system solutions as building blocks, IDT allows its
customers the maximum flexibility in achieving their price/performance
goals while minimizing time-to-market, real estate and complexity of
the end system.
This section of the data book contains some selected devices which have
either been specifically designed for particular RISC processors or
found to be exceptionally useful in these high-performance systems.

TABLE OF CONTENTS

RISC SUPPORT COMPONENTS                                            PAGE
IDT79R3720   Bus Exchanger for R3051 Family .....................  6.1
IDT79R3721   DRAM Controller for R3051 Family ...................  6.2
IDT79R3722   I/O Interface Controller for R3051 Family ..........  6.3
IDT79R3020   RISC CPU Write Buffer ..............................  6.4
IDT73200L    16-Bit CMOS Multilevel Pipeline Registers ..........  6.5
IDT73201L    16-Bit CMOS Multilevel Pipeline Registers ..........  6.5
IDT73210     Fast CMOS Octal Register Transceiver with Parity ...  6.6
IDT73211     Fast CMOS Octal Register Transceiver with Parity ...  6.6
MacStation, RISC CPU SubSystem, RISController, RealS and TargetSystem are trademarks of Integrated Device Technology, Inc.
Apple, Macintosh, AppleTalk, LaserWriter, A/UX, MultiFinder are registered trademarks of Apple Computer, Inc.
UNIX is a registered trademark of Unix System Laboratories.
MIPS, RISC/os, and RISCompiler are trademarks of MIPS Computer Systems, Inc. NuBus is a trademark of Texas Instruments, Inc.
MS-DOS is a registered trademark of Microsoft Corporation. TrueImage and Windows are trademarks of Microsoft Corporation.
TrueType is a trademark of Apple Computer.
PostScript is a trademark of Adobe Systems.
PeerlessPage is a trademark of The Peerless Group.

ADVANCE
INFORMATION
IDT79R3720

BUS EXCHANGER
FOR R3051 FAMILY

Integrated Device Technology, Inc.

FEATURES:

• Direct Interface to R3051 Family RISChipSet™
- R3051™ Family of Integrated RISController™
CPUs
- R3721 DRAM Controller
- R3722 I/O Interface Controller
• Interfaces a single CPU bus to interleaved or banked
memory systems
• Data path for read and write operations
• Low noise outputs
• Supports R3051 family systems from 20 to 33MHz
• Simplifies data path design in high-performance memory
systems
• 3-Bus Architecture
- One CPU Bus
- Two (interleaved or banked) memory busses
- Each bus independently latched to support
asynchronous operation
• 68-Pin PLCC Package
• High-performance CEMOS™ technology

DESCRIPTION:

The R3720 Bus Exchanger is designed to provide data path
support in an R3051 family system utilizing interleaved or
banked memory techniques. The Bus Exchanger is responsible
for interfacing between the CPU A/D bus (CPU address/
data bus) and multiple memory data busses.
Thus, the Bus Exchanger uses a three bus architecture,
with control signals suitable for simple transfer between the
CPU bus and either memory bus. The Bus Exchanger
features independent read and write latches for each memory
bus, thus supporting a variety of memory strategies.
The Bus Exchanger can be used as a simple transceiver,
passing data between the single CPU bus and the pair of
memory busses. Alternatively, data from any of the three ports
can be latched by the Bus Exchanger, to free the sending port
(memory, during reads, and the CPU, during writes) while the
receiving port processes the transfer at its own rate.
The difference in operation is accomplished through the
use of a simple set of control signals. These signals include
independent latch enables for all three ports, signals to
indicate the direction of transfer (read or write) and which
memory port is involved, and other signals which can be used
to force the ports to operate as either a transceiver or a true
latch.
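
The difference between transceiver operation and true latch operation can be sketched with a simple behavioral model. This is an illustration under stated assumptions, not the R3720 logic: the class name is hypothetical, the enable is assumed to be active high, and the 16-bit width follows the (15:0) port names in the block diagram.

```python
class TransparentLatch:
    """Behavioral model of one 16-bit latch (hypothetical, for illustration).

    Assumption: while `enable` is true the latch is transparent (output
    follows input); when `enable` is false the last value is held.
    """

    def __init__(self):
        self.q = 0

    def update(self, d, enable):
        if enable:
            self.q = d & 0xFFFF  # transparent: pass the input through
        return self.q            # otherwise hold the stored value


# Transceiver use: the enable is held asserted, so data simply passes through.
through = TransparentLatch()
passed = [through.update(value, enable=True) for value in (0x1234, 0xBEEF)]

# True latch use: capture once, then hold while the sender moves on.
held = TransparentLatch()
held.update(0x00AB, enable=True)
still_held = held.update(0x00CD, enable=False)
print(passed, hex(still_held))
```

Holding the enable asserted corresponds to the "latches held open" style of use; deasserting it after a capture frees the sending port while the receiver completes the transfer.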

BLOCK DIAGRAM

[Block diagram. The CPU(15:0) port connects through an Even Write
Latch and Even Read Latch to the Even(15:0) port, and through an Odd
Write Latch and Odd Read Latch to the Odd(15:0) port. Control signals
shown: CPUEnEven, CPUEnOdd, CPUOEn, EvenEn, EvenOEn, OddEn, OddOEn,
T/R, Path, SelEven, SelOdd]

CEMOS, RISController, R305x, R3051, R3052 are trademarks of Integrated Device Technology, Inc.

NOVEMBER 1990
©1990 Integrated Device Technology, Inc.

6.1

DSC-80SO/-


USE AS PART OF THE R3051 FAMILY CHIPSET
When used with an R3051 family CPU, a pair of bus
exchangers are typically used as illustrated in figure 2.
The bus exchanger is typically used as an integrated
transceiver in R3051 family applications; that is, the latches
are held "open" through the transfer. In such an application,
the single bus exchanger replaces 8 basic transceivers plus
logic to coordinate data flow between the paths.
The system memory model determines the mapping between
processor addresses and memory ports. In a system
with two non-interleaved banks of memory, a high-order
address bit from the processor determines which memory port
is being accessed. In an interleaved memory system, Addr(2)
is used to alternate between two banks of memory. Both these
cases are handled by the R3721 DRAM controller for the
R3051 family.
The R3721 DRAM controller for the R3051 family uses the
bus exchanger as a simple set of transceivers. The R3721
directly controls the inputs of the bus exchanger, during both
reads and writes.

[Figure 2 diagram: an IDT R3051 Family RISController (R305x, clocked
by Clk2xIn) drives Address/Data and Control lines onto a local bus
shared by the IDT79R3721 DRAM Controller, the IDT79R3722 Integrated
I/O Controller, PROM and I/O devices; two IDT79R3720 Bus Exchangers
sit between the local bus and two banks of DRAM]

Figure 2. Bus Exchanger Used in R3051 Family System
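
The mapping between processor addresses and memory ports described in this section can be sketched as follows. This is a minimal illustration: the function name, the choice of bit 22 as the high-order bank-select bit, and the convention that a 0 bit selects the "even" port are assumptions, not values taken from this data sheet.

```python
def select_memory_port(addr, interleaved, bank_bit=22):
    """Return which memory port ("even" or "odd") an address maps to.

    Interleaved systems alternate banks on Addr(2), i.e. on successive
    32-bit words. Non-interleaved systems use a high-order address bit;
    bank_bit=22 is a hypothetical choice for illustration.
    """
    bit = 2 if interleaved else bank_bit
    return "odd" if (addr >> bit) & 1 else "even"


# Successive words alternate between ports when interleaved.
interleaved_ports = [select_memory_port(a, interleaved=True) for a in (0x0, 0x4, 0x8, 0xC)]

# Without interleaving, only the high-order bit matters.
banked_ports = [select_memory_port(a, interleaved=False) for a in (0x4, 1 << 22)]
print(interleaved_ports, banked_ports)
```

Interleaving on Addr(2) means that sequential word accesses, such as a cache burst refill, alternate between the two memory busses.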


ADVANCE
INFORMATION
IDT79R3721

DRAM CONTROLLER
FOR R3051 FAMILY

Integrated Device Technology, Inc.

FEATURES:

• High performance from low cost DRAMs
- Programmable timing model for 60 - 120 ns
DRAMs
- Supports Page Mode read and writes using on-chip
Page Detection
- Supports R3051 Family cache burst refill
- DMA interface for burst DMA read or write
accesses
• Supports Multiple Common DRAM Configurations
- 1MB to 16 MB
- 256kx4 through 4Mx1 DRAMs
- 1, 2, or 4 Banks of DRAMs
- Non-interleaved or Dual Interleaving Configurations
- Page Mode, or Static Column Mode Accesses
- CAS-before-RAS or RAS-only refresh timing
control on-chip
• Cascadable to allow multiple DRAM controllers per
system
• Low noise TTL level outputs for CPU interface
• High-performance CEMOS™ technology

• Direct Interface to R3051™ Family RISController™ Chip
Set
- R3051 Family Integrated RISController CPU
- R3720 Bus Exchanger
- R3722 I/O Interface Controller
• Directly drives DRAM address and control signals
- Directly drives 36 DRAM devices or multiple SIMM
modules
- High output capability requires no external drivers
for DRAM address or control signals
- Replicated control signals allow control of multiple
memory banks
- Directly drives Bus Exchanger or 74FCT245 data
path buffers
• Low noise outputs with built-in series resistance for
direct drive of DRAMS (large capacitive load)
• 20, 25, and 33 MHz Operation
• Simplifies system design by eliminating glue logic/PALs

BLOCK DIAGRAM

[Block diagram. Functional blocks shown: Row and Column Latch, Burst
Counter, Byte Write Control, Data Path Control, Bank Select, Page Mode
Address Comparator, Mode Register, R3051 Family Interface Controller,
Timing Generator, RAS/CAS Generator, DRAM Access Control, Refresh
Controller (Refresh Address).
Signals legible in the diagram: ALE, Wr, Ack, RdCEn, DataEn,
Burst/WrNear, SysClk, MSel, CS, EOM, Adr(10:0), CPUEn(1:0), EvenEn,
OddEn, Path, T/R, ByteWE(3:0), OE(3:0), RAS(3:0), CAS(3:0)]

CEMOS, RISController, R305x, R305t, R3052 are trademar1
[Figure 2 diagram: the processor drives Clock, AddrIn7:0, Address1:0,
DataIn8:0, AccTyp0, AccTyp1 and WtMem into the Write Buffer (x 4); the
Write Buffer returns WbFull, and its Request output connects to
CpCond0]

Figure 2. Write Buffer - IDT79R3000 Processor Interface

Write Buffer-Processor Interface Signals

Clock
An inverted version of the IDT79R3000 processor's SysOut signal that
synchronizes data transfers. The Write Buffer uses the trailing edge
of Clock to latch the contents of the AdrLo bus and uses the leading
Clock edge to latch the contents of the Data and Tag buses.

DataIn8:0
Nine input data lines from the IDT79R3000 processor's Data bus
(eight bits of data and one bit of parity).

AddrIn7:0
Eight input address lines from the IDT79R3000 processor. The
address lines are taken from the AdrLo and Tag buses.

Address1:0
The two least significant address bits from the IDT79R3000 processor.
These two address bits must be connected to all four Write Buffers
and are used in conjunction with the access type (AccTyp1:0) signals,
the Position1:0 signals, and the BigEndian signal to determine which
byte(s) in a word are being written into a particular Write Buffer.

AccTypIn1:0
The access type signals from the IDT79R3000 processor specifying the
size of a data access: word, tri-byte, half-word, or byte.

WtMem
This input is connected to the MemWr signal from the IDT79R3000
processor that is asserted whenever the processor is performing a
store (write) operation.

Request
The primary purpose of this signal is to request access to memory and
is described later when the Write Buffer-Main Memory Interface is
discussed. The Request signal can also be connected to the CpCond0
input of the IDT79R3000 and can then be tested by software to
determine if there is any data in the Write Buffer. Since Request is
deasserted if there is no data in the Write Buffer, software can
determine if a previous write operation (for example, to an I/O
device) has been completed before initiating a read or read status
operation from that device.

WbFull
The Write Buffer asserts this signal to the IDT79R3000's WrBusy input
whenever it cannot accept any more data; that is, when the current
write will fill the buffer or the buffer has all address-data pairs
occupied. The IDT79R3000 processor performs a write-busy stall if it
needs to store data while the WbFull/WrBusy signal is asserted.

Data & Address Connections
Figure 3 illustrates how four Write Buffers are connected to the
address and data outputs of the IDT79R3000 processor.

Address Inputs
Each Write Buffer device has eight address inputs (AdrIn7:0). The
four low-order bits (AdrIn3:0) are clocked into the device on the
trailing edge of the Clock signal and are taken from the IDT79R3000's
AdrLo bus. The four high-order bits (AdrIn7:4) are clocked into the
device on the rising edge of the Clock signal and are taken from the
IDT79R3000's Tag bus.
Each device also has separate inputs (Address1, Address0) for the two
low-order bits from the AdrLo bus. These bits must be input to each
device since they comprise the byte pointer. Note in Figure 3 that
the two low-order AdrIn inputs (AdrIn1:0) to Write Buffer device 0
are connected to ground since the Address1, Address0 inputs already
supply these bits to the device.

Data Inputs
Each Write Buffer device has nine data inputs that are clocked into
the device on the leading edge of the Clock signal and are taken from
the IDT79R3000's Data bus. In Figure 3, each device captures eight
bits of data and one bit of parity. Also note that the data bits
assigned to each device correspond to the address bits connected to
the device. This arrangement is required since data selection is
dependent on a combination of the AccType signals and the two low
order address bits. The arrangement also simplifies system
utilization of the "Read Error Address" feature described later.

IDT79R3020 RISC CPU WRITE BUFFER
MILITARY AND COMMERCIAL TEMPERATURE RANGES

[Figure 3 diagram: the IDT79R3000 processor's Data Bus [35:00], Tag
Bus [31:16] and AdrLo Bus [15:00] connect to four Write Buffer devices
and to a Read Buffer; the Write Buffer Address Out and DataOut lines
drive the System Address Bus and System Data Bus. Address1:0 is
connected to every device. Connections shown per device:]

Device          Position1,0  AdrIn7:4  AdrIn3:0                 DataIn8:4          DataIn3:0
Write Buffer 3  "1","1"      Tag31:28  AdrLo15:12               Data35, Data31:28  Data15:12
Write Buffer 2  "1","0"      Tag27:24  AdrLo11:08               Data33, Data27:24  Data11:08
Write Buffer 1  "0","1"      Tag23:20  AdrLo07:04               Data34, Data23:20  Data07:04
Write Buffer 0  "0","0"      Tag19:16  AdrLo03:02 on AdrIn3:2,  Data32, Data19:16  Data03:00
                                       AdrIn1:0 grounded

Figure 3. Write Buffer Data and Address Line Connections
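
The data-line assignment of Figure 3, in which each device captures one nibble from each halfword of the 32-bit word, can be sketched as follows. This is an illustration only: the function names are hypothetical, and the parity bit on DataIn8 is omitted for brevity.

```python
def device_data_bits(word, position):
    """Return the eight data bits captured by the device at `position` (0-3).

    DataIn3:0 takes the nibble at Data[4*position+3 : 4*position] and
    DataIn7:4 takes the nibble at Data[16+4*position+3 : 16+4*position].
    """
    low = (word >> (4 * position)) & 0xF
    high = (word >> (16 + 4 * position)) & 0xF
    return (high << 4) | low


def reassemble(slices):
    """Rebuild the 32-bit word from the four per-device slices (index = position)."""
    word = 0
    for position, bits in enumerate(slices):
        word |= (bits & 0xF) << (4 * position)
        word |= ((bits >> 4) & 0xF) << (16 + 4 * position)
    return word


word = 0x12345678
slices = [device_data_bits(word, p) for p in range(4)]
print([hex(s) for s in slices], hex(reassemble(slices)))
```

Because every device holds part of every byte pair, all four devices must see the same Address1:0 and access type inputs in order to agree on which bytes of the word are valid.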

The Position1 and Position0 signals shown in Figure 3 specify
the nibble position within a halfword that each write buffer device
comprises.

Write Buffer - Processor Timing
Transfers between the processor and the Write Buffers occur
synchronously: the Clock signal from the processor is input to the
Write Buffers and used to clock the address and data information
into the Write Buffers' latches. Figure 4 illustrates the timing for the
processor-Write Buffer interface.
When the WtMem signal is asserted, the low-order address bits
and the Address1:0 inputs are latched on the trailing edge of the
Clock signal (1). The rising edge of Clock (2) is used to latch the
high-order address bits, the access type inputs and the contents of
the data bus.

[Figure 4 timing diagram: Clock, AdrIn3:0 (AdrLo), Address1:0,
AccType1:0, AdrIn7:4 (Tag) and DataIn waveforms]

Figure 4. Processor - Write Buffer Interface Timing

WRITE BUFFER - MAIN MEMORY INTERFACE

Figure 5 shows the signals comprising the Write Buffer interface
to main memory. This interface is essentially decoupled from the
Write Buffer-processor interface: although some synchronization
of the memory interface signals and the Clock signal is required,
the handshaking signals in this interface have no direct connection
to the operation of the Write Buffer-processor interface.
[Figure 5 diagram: the Write Buffer (x 4) drives AddrOut (32 lines),
DataOut and Parity (36 lines), AccTypOut1, AccTypOut0 and Request to
the Main Memory Controller, which returns OutEn and Acknowledge; Clock
is also shown]

Figure 5. Write Buffer - Main Memory Interface

Write Buffer - Main Memory Interface Signals

Each Write Buffer provides the following signals that comprise
the interface to a main memory controller:

AddrOut7:0
Eight address line outputs from each Write Buffer.

DataOut8:0
Nine data lines from each Write Buffer (eight bits of data and one
bit of parity).

AccTypOut1:0
The access type signals from the Write Buffer specifying the size
of a data access: word, tri-byte, half-word, or byte.

OutEn
The memory controller asserts this input to enable the tri-state
outputs of the IDT79R3020 address and data signals.

Request
The Write Buffer asserts this signal to inform the main memory
system that it has data to be written to memory.

Acknowledge
The main memory system asserts this signal when it has captured
the data presented by the Write Buffer on the DataOut lines.

Write Buffer - Main Memory Interface Timing

Figure 6 illustrates the timing for the transfer of data from the
Write Buffer to the main memory system. The sequence illustrated
in this figure is as follows:

1) When the Write Buffer has a data-address pair for transfer to
   the memory system, it asserts the Request signal.
2) When the memory system is ready to handle the Write Buffer
   data, it asserts the OutEn signal to enable the Write Buffers'
   address and data outputs onto the system buses.
3) When the memory system no longer requires the Write Buffer
   address and data outputs, it asserts the Acknowledge signal.
   The Write Buffer responds to this signal by discarding the
   address-data pair that was just output.
4) The memory system can deassert the OutEn signal to return
   the Write Buffers' address and data outputs to their tri-state
   condition.
5) Since the Request signal remains asserted, the memory system
   asserts the OutEn signal again to enable the next
   address-data pair onto the system buses.
6) When the memory system has accepted the second address-data
   pair, it again asserts the Acknowledge signal. If the Write
   Buffer is now empty, it responds to this signal by deasserting
   the Request signal.

[Figure 6 timing diagram: Clock, Acknowledge, AddrOut and DataOut
waveforms for the sequence described in the text]

Figure 6. Write Buffer - Main Memory Interface Timing

Note that the buffer's interface to main memory is not completely
asynchronous: assertion of the Request signal by the Write Buffer
is synchronized with the rising edge of Clock, and the Acknowledge signal input by main memory has a minimum set up and hold
time in relation to the Clock signal.
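
The Request / OutEn / Acknowledge handshake can be sketched with a small behavioral model. This is an illustration under stated assumptions, not the device logic: the class and function names are hypothetical, the depth of four entries is an assumed parameter, WbFull is simplified to "all entries occupied", and clock-edge timing and byte gathering are ignored.

```python
from collections import deque


class WriteBufferModel:
    """Behavioral sketch of the handshake (hypothetical names and depth)."""

    def __init__(self, depth=4):  # depth is an assumption for illustration
        self.depth = depth
        self.entries = deque()

    @property
    def request(self):
        # Request is asserted whenever the buffer holds data.
        return len(self.entries) > 0

    @property
    def wb_full(self):
        # Simplification: asserted only when every entry is occupied.
        return len(self.entries) >= self.depth

    def cpu_write(self, addr, data):
        if self.wb_full:
            return False  # the processor would take a write-busy stall
        self.entries.append((addr, data))
        return True

    def drive_outputs(self, out_en):
        # Outputs are tri-stated (None) unless OutEn is asserted.
        return self.entries[0] if out_en and self.entries else None

    def acknowledge(self):
        # Acknowledge discards the address-data pair that was just output.
        if self.entries:
            self.entries.popleft()


def drain(buf):
    """Memory-controller side: service Request until the buffer is empty."""
    memory = {}
    while buf.request:
        addr, data = buf.drive_outputs(out_en=True)
        memory[addr] = data
        buf.acknowledge()
    return memory


buf = WriteBufferModel()
buf.cpu_write(0x100, 0xAAAA5555)
buf.cpu_write(0x104, 0x12345678)
tri_stated = buf.drive_outputs(out_en=False)
written = drain(buf)
print(tri_stated, written, buf.request)
```

The model keeps the two interfaces decoupled in the way the text describes: the processor side only observes WbFull, while the memory side only observes Request and supplies OutEn and Acknowledge.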

MISCELLANEOUS WRITE BUFFER - BOARD
LOGIC INTERFACE
The Write Buffers support several functions that utilize signals
that do not fit neatly into the descriptions of either the processor or
main memory interfaces. These functions and signals typically
involve miscellaneous logic on a CPU board and include the following:
• byte gathering
• configuration connections (BigEndian, Position1:0)
• address matching logic
• error address latch logic
The sections that follow describe each of these categories.

Byte Gathering
The Write Buffers perform byte (half-word, tri-byte and word)
gathering to decrease the number of write transfers to the same
location; that is, sequential writes to the same WORD address have
their data combined into the same address-data pair buffer.
Byte gathering is prohibited in the address-data pair that is
currently available to the memory controller. Thus, the first write into
an empty Write Buffer will not have subsequent writes gathered
into it because it is currently available for output to memory. Writes
to the same location (byte) may be overwritten in the Write Buffer if
the gathering is not prohibited by the preceding rule.
The Write Buffers present address-data pairs to the main memory
controller in the sequence in which they were received from the
processor except in the case of gathered data, where bytes or half
words can be collected and written to main memory in a single
write operation. If the address-data pair buffer is scheduled to be
output, then gathering is inhibited and the buffer contents are
presented to the main memory controller. Subsequent writes are then
placed in another buffer. No reliance should be placed on any
aspect of gathering (except that it only involves sequential writes to
the same word address) as it is not readily deterministic.
Non-sequential writes to the same word address are not gathered.
Note that gathering can require that two main memory controller
references be used to empty a single Write Buffer entry. For example,
this can occur if Bytes 0 and 3 of a word are sequentially written.
Where order in writing is important, such as in I/O controllers,
software should avoid sequential accesses to the same word. In
cases where write-read access ordering is important but reading of
the write location is not desired, such as during I/O, then a write
followed by a write to a dummy location followed by a read of the
dummy location will ensure the first write has occurred before
continuing. Alternatively, the Request signal can be tested to
determine that the Write Buffer is empty.

Configuration Logic Connections
Because of their byte gathering capability, each buffer device
internally maintains a record of each valid byte in an address/data
pair. To do this, each device must have a way of determining which
data bits within a word it is handling. The following signals
determine how the write buffers handle data that is written to the
devices:
• Position1, Position0 - these signals (in conjunction with
BigEndian) determine how each Write Buffer decodes the
Address1:0 and AccType1:0 inputs to determine if it should store
the data inputs. Refer to Figure 3 for an illustration of how
data bits are assigned to Write Buffer devices based on their
position.

• BigEndian - When asserted, byte 0 is the leftmost, most
significant byte (big-endian); when deasserted, byte 0 is the
rightmost, least-significant byte (little-endian).
• Address 1, Address 0 - these signals (taken from the AdrLo
bus) must be connected to all buffer devices since they
determine which byte within a word is being accessed.


IDT79R3020 RISC CPU WRITE BUFFER

MILITARY AND COMMERCIAL TEMPERATURE RANGES

• AccType 1, AccType 0 - these input signals specify the data
size of a write operation as shown in Table 1.

Table 1 shows how these signals operate to specify how bytes
are saved within the Write Buffers.

[Table graphic: Bytes Accessed within the 32-bit word (bits 31 to 0), shown for Big-Endian and Little-Endian byte ordering]

Table 1. Byte Specifications for Write Operations
The lower two address bits of the device in position zero (as
determined by the two POSITION inputs) are inhibited; that is, they
are not stored directly as they are output on the AdrLo bus. Instead,
on output, the lower two address bits are generated from the
indication of the positions of the valid data bytes as determined by
the above table.
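
The gathering rules can be illustrated with a small behavioral model. This is only a sketch under stated assumptions, not a description of the device's internal logic: the class and method names are hypothetical, the buffer is taken to be four entries deep (ranks A to D), bytes are identified by an index 0 to 3 within the word regardless of endianness, and gathering is assumed to apply only to the most recently written entry when that entry is not the one currently available to the memory controller.

```python
from dataclasses import dataclass, field


@dataclass
class Entry:
    """One address-data pair (hypothetical model, not the device's structure)."""
    word_addr: int
    data: dict = field(default_factory=dict)  # byte index (0-3) -> byte value

    @property
    def valid_mask(self) -> int:
        # One bit per valid byte, bit n set when byte n has been written.
        return sum(1 << b for b in self.data)


class WriteBufferModel:
    DEPTH = 4  # assumed: one entry per rank A, B, C, D

    def __init__(self):
        self.entries = []  # entries[0] is the pair available to memory

    def write(self, byte_addr: int, byte_values: dict) -> str:
        word = byte_addr >> 2
        # The head entry is available for output, so it never gathers.
        tail_is_head = len(self.entries) <= 1
        if not tail_is_head and self.entries[-1].word_addr == word:
            # Sequential write to the same word: merge (and overwrite) bytes.
            self.entries[-1].data.update(byte_values)
            return "gathered"
        if len(self.entries) == self.DEPTH:
            return "full"
        self.entries.append(Entry(word, dict(byte_values)))
        return "new"

    def retire(self) -> Entry:
        # Memory controller has acknowledged the head entry.
        return self.entries.pop(0)
```

In this model the first write into an empty buffer always occupies its own entry, which mirrors the rule that the pair available to the memory controller does not accept gathered data.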

MatchOut/MatchIn Logic and Read Conflicts
Whenever the processor references main memory (either a write
or a read reference), the Write Buffers compare the word address
from the CPU with the word addresses stored in the buffers. If any
word address matches, the buffers assert signals that can be used
by the main memory controller to ensure that the Write Buffer is
emptied before the read access with the conflicting address is
performed.
Figure 7 illustrates the Write Buffer signals involved in address
comparison logic. Each write buffer provides four output signals
(MatchOut A, B, C, and D) which correspond to the four buffer
ranks (A, B, C, D) in each device as shown in Figure 1. These
MatchOut signals can be externally NANDed as shown in Figure 7
to determine if the address being input matches those in any rank
of the Write Buffer.

[Figure 7 diagram: Write Buffers 3, 2, 1 and 0, each with MatchInA-D inputs and MatchOutA-D outputs; the gated MatchOut signals drive MatchInA, MatchInB, MatchInC and MatchInD, and a CONFLICT signal goes to the main memory controller]

Figure 7. Write Buffer MatchOut/MatchIn Logic
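
The gating in Figure 7 can be sketched in a few lines. The function names are hypothetical, and two assumptions are made that the text does not state outright: MatchOut is treated as active high, and a rank is taken to match only when all four devices report a match for it (each device sees only its slice of the address).

```python
RANKS = "ABCD"


def nand(*inputs: bool) -> bool:
    """Logical NAND of any number of inputs."""
    return not all(inputs)


def match_in(match_out: list) -> dict:
    """NAND the same-rank MatchOut signal from every device.

    match_out is a list with one dict per Write Buffer device, mapping
    rank letter -> MatchOut level. The result is low (False) for a rank
    only when every device matched that rank.
    """
    return {r: nand(*(dev[r] for dev in match_out)) for r in RANKS}


def conflict(match_in_signals: dict) -> bool:
    """NAND the four per-rank signals: high when any rank matched."""
    return nand(*(match_in_signals[r] for r in RANKS))
```

With these polarities, CONFLICT goes high whenever at least one rank holds the incoming word address in all four devices.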


The outputs of the NAND gates are fed into Write Buffers via the
MatchIn A, B, C, and D signals and are used within each device as
part of the byte gathering logic. The NAND gate outputs can be
NANDed together as shown in Figure 7 with the resultant signal
used (in conjunction with the processor's MEMRD signal) to alert
the main memory controller logic that there is a pending buffered
write that conflicts with a just-issued read. The main memory
controller can then delay the read access until the Request signal is
deasserted, indicating that the Write Buffer has been emptied.

Error Address Latch

The write buffer incorporates an internal latch that can be loaded
with one of the buffered addresses and subsequently enabled out
onto the data lines. This feature can be used by error handling routines to read an address back from the Write Buffer and analyze or
recover from certain bus errors. Figure 8 shows the signals involved in operation of this latch.

[Figure 8 diagram: Error Address Latch]

Figure 8. The Write Buffer Error Address Latch

When the LatchErrAdr signal is asserted, the address currently
available to the address outputs of the Write Buffer is latched into
the internal latch. This address can then be output on the DataOut
lines by asserting the EnErrAdr signal so that the processor can
read the address in as data. Refer to the AC specifications for
timing parameters of the signals associated with the error address
latch.


ABSOLUTE MAXIMUM RATINGS(1,3)

SYMBOL  RATING                                COMMERCIAL     MILITARY       UNIT
VTERM   Terminal Voltage with Respect to GND  -0.5 to +7.0   -0.5 to +7.0   V
TA      Operating Temperature                 0 to +70       -55 to +125    °C
TBIAS   Temperature Under Bias                -55 to +125    -65 to +135    °C
TSTG    Storage Temperature(2)                -55 to +125    -65 to +150    °C
VIN     Input Voltage                         -0.5 to +7.0   -0.5 to +7.0   V

RECOMMENDED OPERATING TEMPERATURE AND SUPPLY VOLTAGE

GRADE       AMBIENT TEMPERATURE   GND   Vcc
Military    -55°C to +125°C       0V    5.0 ± 10%
Commercial  0°C to +70°C          0V    5.0 ± 5%

OUTPUT LOADING FOR AC TESTING

[Figure: output load circuit, connected to the device under test]

NOTES:
1. Stresses greater than those listed under ABSOLUTE MAXIMUM RATINGS may cause permanent damage to the device. This is a stress rating
only and functional operation of the device at these or any other conditions
above those indicated in the operational sections of this specification is not
implied. Exposure to absolute maximum rating conditions for extended periods may affect reliability.
2. VIN minimum = -3.0V for pulse width less than 15ns. VIN should not exceed
Vcc + 0.5 Volts.
3. Not more than one output should be shorted at a time. Duration of the short
should not exceed 30 seconds.
DC ELECTRICAL CHARACTERISTICS COMMERCIAL TEMPERATURE RANGE (TA = 0°C to +70°C, Vcc = +5.0V ±5%)
Limits are the same for the 16.67 MHz, 20.0 MHz, 25.0 MHz, 33.33 MHz and 40 MHz grades except where a speed grade is listed.

SYMBOL  PARAMETER                 TEST CONDITIONS          MIN.   MAX.              UNIT
VOH     Output HIGH Voltage       Vcc = Min, IOH = -4mA    3.5    -                 V
VOL     Output LOW Voltage        Vcc = Min, IOL = 4mA     -      0.4               V
VIH     Input HIGH Voltage(1)                              2.4    -                 V
VIL     Input LOW Voltage(2)                               -      0.8               V
CIN     Input Capacitance                                  -      10                pF
COUT    Output Capacitance                                 -      10                pF
ICC     Operating Current         Vcc = Max                -      50 (16.67 MHz)    mA
                                                                  60 (20.0 MHz)
                                                                  70 (25.0 MHz)
                                                                  80 (33.33 MHz)
                                                                  90 (40 MHz)
IIH     Input HIGH Leakage        VIH = Vcc                -      10                µA
IIL     Input LOW Leakage         VIL = GND                -10    -                 µA
IOZ     Output Tri-state Leakage  VOH = 2.4V, VOL = 0.5V   -40    40                µA

DC ELECTRICAL CHARACTERISTICS MILITARY TEMPERATURE RANGE
SYMBOL

PARAMETER

(TA=-55°Cto+125°C, Vee=+5.0V± 10%)

TEST CONDITIONS

VOH

Vee = Min IOH = -4mA

VOL

Vee = Min, IOL = 4mA

16.67 MHz
MAX.
MIN.
3.5

20.0 MHz
25.0 MHz
MIN.
MAX. MIN.
MAX.
3.5

UNIT

3.5

V

0.4

V

2.4

VIH

V

VIL

V

CIN

Input Capacitance

COUT

Output Capacitance

lee

Operating Current

IIH

Input HIGH Leakage

IlL

Input LOW Leakage

loz

Output Tri-state Leakage

pF
10
70
10
-10
-40

40

-40

10
-10

40

-40

pF
80

mA

10
-10

40

-40

40

NOTES:
1. VIH should not be held above Vcc + 0.5 Volts.
2. VIL Min. = -3.0V for pulse width less than 15ns. VIL should not fall below -0.5 Volts for longer periods.

6.4 .

8

MILITARY AND COMMERCIAL TEMPERATURE RANGES

IDT79R3020 Rise cPU WRITE BUFFER

AC ELECTRICAL CHARACTERISTICS COMMERCIAL TEMPERATURE RANGE
SYMBOL

(TA +O°Cto70°C, Vcc=+5.0V±5%)

PARAMETER

16.67 MHz
MIN. MAX

33.33 MHz
20.0 MHz
25.0 MHz
MIN. MAX. MIN. MAX. MIN. MAX.

40.0 MHz
MIN. MAX.

7

-

6

-

3

-

3

4

-

3

6

3

4

-

4

t5

Access Type 1:0 to Clock rising setup

7

6

t6

Access Type 1:0 from Clock rising hold

3

3

t7

Addrin (7:4) to Clock rising setup

7

5

-

4

4

t8

Addrin (7:4) from Clock rising hold

3

-

3

-

2

1

-

1

5

-

4

4

-

4

3

1
6

-

6

5

-

2

4

-

-

3

Address 1 :0 from Clock falling hold

-

4

8

-

3

-

3

-

22

-

'16

t1

Addrin (3:0) to Clock falling setup

8

t2

Addrin (3:0) from Clock falling hold

4

t3

Address 1 :0 to Clock falling setup

t4

7

4
5
2

t9

Dataln (8:0) to Clock rising setup

7

t10

Dataln (8:0) from Clock rising hold

3

t11

WrtMem to Clock rising setup

10

t12

WrtMem from Clock rising hold

6

t13

Request from Clock rising

-

32

-

30

t14

Acknowledge to Clock rising setup

12

-

11

-

6

t15

Acknowledge from Clock rising hold

7

-

6

-

5

t16

LatchErrAdr to Acknowledge rising

5

-

5

-

t17

WbFull active from Clock rising

-

WbFull inactive from Clock rising

-

21

t18

21

-

t19

OutEn to AddrOut (7:0), DataOut (8:0) valid

2

15

t20

OutEn to AddrOut (7:0), DataOut (8:0) tri-state

2

8

7

3
4
2

5

-

19

-

17

-

9

19

-

11

-

2

15

2

15

15

2

15

2

-

22

-

3
3
4
2
4

1

-

-

UNIT

ns
ns
ns
ns
ns
ns
ns
ns
ns
ns
ns
ns

16

ns
ns

4

-

4

-

3

-

3

-

ns

3

-

3

-

ns

-

9

ns

9

9

ns

2

12

2

12

ns

15

2

12

2

12

ns

t21

MatchOut (ABCD) from Clock rising

-

24

20

-

15

-

15

ns

t22

Matchln (ABCD) to Clock rising setup

10

9

-

8

-

5

3

3

-

3

-

3

3

-

ns

Matchln (ABCD) from Clock rising hold

-

5

t23

-

t24

EnErrAdr to Data (error latch) valid

2

15

2

15

2

15

2

15

2

15

ns

ns

t25

EnErrAdr to Data (error latch) tri-state

2

15

2

15

2

15

2

15

2

15

ns

t26

AddresslData out from Clock rising

-

30

-

27

-

24

-

16

-

16

ns

t27

Reset to Clock rising. set-up

10

-

10

-

10

-

8

t28

Reset from Clock rising. hold

3

-

2

-

1

-

1

ns

t29

Reset low pulse width

8

-

8

-

8

-

cycles

t30

WbFull High from Clock rising (after Reset)

11

ns
ns

t31

Request High from Reset low

t32

Access TypOut 1 :0 low from Reset low

t33

Match Out (ABC D) Low from Reset low

t34

AddresslData out tri-state from Reset low
(OutEn negated)

-

t35

Access TypeOut from Clock rising

-

tcyc

Clock Pulse Width

60

tckhigh

Clock High Pulse Width

24

tcklow

Clock Low Pulse Width

24

-

20

-

21

28

-

26

21

-

20

32

-

30

22

19

32

-

30

2000

50

2000

-

6.4

20
20

-

8

-

40
16
16

-

8
1
8

ns

-

18

-

11
16

-

16

25

-

23

23

ns

20

-

15

-

15

ns

23

-

23

ns

23

ns

20

27
27
2000

-

30
12
12

23
2000

-

25
10
10

2000

-

ns
ns
ns


AC ELECTRICAL CHARACTERISTICS MILITARY TEMPERATURE RANGE (TA = -55°C to +125°C, Vcc = +5.0V ± 10%)

SYMBOL  PARAMETER  16.67 MHz MIN. MAX.  20.0 MHz MIN. MAX.  UNIT

[Figure 9 diagram: Clock, AdrIn3:0 (AdrLo), Address1:0, AccType1:0, AdrIn7:4 (Tag), DataIn, REQUEST, Acknowledge, LatchErrAdr]

Figure 9. Write Buffer Timing Specifications

[Figure 10 diagram: Clock, REQUEST, Acknowledge, with t17 and t18 marked]

Figure 10. WBFULL Signal Timing Specifications

[Figure 11 diagram: DataOut, AddrOut]

Figure 11. OUTEN Timing Specifications

[Figure 12 diagram: Clock, MatchOut (ABCD), MatchIn (ABCD), DataOut, Error Latch Data Out]

Figure 12. Match and Error Latch Timing Specifications

[Figure 13 diagram: Clock, Address/Data, REQUEST, Acknowledge, AccTypOut, with t26 and t35 marked]

Figure 13. Address/Data Out, Access Type Out

[Figure 14 diagram: Clock (tCYC), Reset, OutEn, Addr/Data Out, AccessTypOut (0:1), Request, WBFull]

Figure 14. Reset Timing


68-PIN CPGA FOR R3020
PIN GRID ARRAY (CERAMIC) - BOTTOM VIEW

CLOCK
ACCACADTYPEO KNOWL DRESS1
EDGE

DATAIN2

DATAIN4

DATAIN6

VCC2

ACCAOBIGENDATAIN1
TYPE1 DRESSO ENDIAN E8.BQRADR

DATAIN3

DATAINS

GND2

DATAIN?

ADDR- ADDROUTS OUT4

DATAINa

ADDRINO

H

ADDR- ADDROUT3 OUT2

ADDRIN1

ADDRIN2

G

ADDR- ADDRoun OUTO

ADDRIN3

ADDRIN4

F

DATAOUTa

DATAOUTO

ADDRINS

ADDRIN6

E

DATAOUT1

DATAOUT2

ADDRIN?

LATCHERRADR

o

DATAOUT3

ADDROUT6

MATCH- MATCHINA
INB

C

ADDROUT7

ACCTYPE

MATCH- MATCH
INC
IND

L

K

B

A

GND1

ACCTYPE
OUTO

VCC1

DATAINO

-

oun

GNDO

DATAOUT?

DATAOUT4

VCCO

DATAOUTS

DATAOUT6

2

3

4

-RE- MATCH- MATCH- RESET POSITIONO
QUEST OUTC OUTA

WsFDTI MATCH- MATCH- WTMEM
OUTO

OUTB

6

?

S

6.4

a

VCC3

GND3

POOUTEN
SITION1
9

10

11

14

MILITARY AND COMMERCIAL TEMPERATURE RANGES

1DT79R3020 RISC CPU WRITE BUFFER

PIN CONFIGURATION
PLASTIC LEADED CHIP CARRIER
(TOP VIEW)
0
zz
~

~

o=>

o

9
VSSO
ACCTYPEOUTO
ACCTYPEOUT1
ADDROUT7
ADDROUT6
DATAOUT3
DATAOUT2
DATAOUT1
DATAOUTO
DATAOUT8
ADDROUTO
ADDROUT1
ADDROUT2
ADDROUT3
ADDROUT4
ADDROUT5
VSS1

0IZ

~~~8
OO=>U

8~
U«

>0

a.. a..

-,

0>

8 7 6 5 4 3 2 1 6867 6665 646362.61

10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26

60
59
58
57
56
55
54
53
52
51
50
49
48
47
46
45
44

Index

VSS3
MATCHIND
MATCHINC
MATCHINB
MATCH INA
LATCHERRADR
ADDRIN7
ADDRIN6
ADDRIN5
ADDRIN4
ADDRIN3
ADDRIN2
ADDRIN1
ADDRINO
DATAIN8
DATAIN7
VSS2

2728 293031 32333435363738394041 4243

0
U

0

~wo

en Z«

W a..C)C/)
a.. >-0C/) C/)

~

U

a:
0

0
...,J
> ~ u...,Ja: a: is
Z U ~
0
I-WW W

U

OW

a:
a:

U.3: 0
0 oQ
«U «0
z« «en

0

Z

N

C')

zz

'

«0 «0 0
««
« « «
0 0 0 0

W
Z
W

~

«U

ORDERING INFORMATION

IDT   XXXXX         XX      X         X
      Device Type   Speed   Package   Process/Temp. Range

Process/Temp. Range:   Blank     Commercial
                       'B'       Military

Package:               'G'       68-Pin PGA
                       'J'       68-Pin PLCC

Speed:                 '16'      16.67 MHz
                       '20'      20.0 MHz
                       '25'      25.0 MHz
                       '33'      33.33 MHz
                       '40'      40.0 MHz

Device Type:           79R3020   RISC CPU Write Buffer


IDT73200
IDT73201

16-BIT CMOS
MULTILEVEL
PIPELINE REGISTERS

Integrated Device Technology, Inc.

FEATURES:
• IDT73200: Eight 16-bit high-speed pipeline registers
• IDT73201: Seven 16-bit high-speed pipeline registers
plus a direct feed-through path
• 12ns to 20ns access time
• Programmable multilevel register configurations
• Powerful instruction set: transfer, hold, load directly
• Functionally replaces four Am29520s
• Read/Write buffer for 32-bit RISC/CISC microprocessors
• Applications as temporary address storage or
programmable pipeline registers for DSP products
• Coefficient storage for FIR filters
• Three-state outputs
• TTL-compatible
• Produced with advanced submicron CEMOS™
high-performance technology
• Available in 48-pin plastic and ceramic DIP and 52-pin
surface mount PLCC
• Military product compliant to MIL-STD-883, Class B

DESCRIPTION:
The IDT73200 and IDT73201 are multilevel pipeline
registers. With IDT's high-performance CEMOS™
technology, the IDT73200 and IDT73201 have access times
of 12ns.
The IDT73200 contains eight 16-bit registers which can be
configured as one 8-level, two 4-level, four 2-level or eight
1-level pipeline registers.
The IDT73201 contains seven 16-bit registers and a direct
feed-through path. The seven registers can be configured as
one 7-level, a 4-level plus a 3-level, three 2-level or seven
1-level pipeline registers.
An eight-to-one output multiplexer allows data to be read
from any one of the registers or from the feed-through path on
the IDT73201. Three input control pins (SEL0-SEL2) select
which of the multiplexer inputs are directed to the output
(Y0-Y15).
These pipeline registers are ideal for high throughput,
vector-oriented operations such as those in digital signal
processing (DSP). The IDT73200 and IDT73201 can also be
used as quick access scratch pad registers for general
purpose computing.
The two pipeline registers are packaged in 48-pin plastic
and ceramic DIPs for through-hole designs as well as 52-pin
PLCC and LCC for surface mount designs. Military grade
product is manufactured in compliance with the latest revision
of MIL-STD-883, Class B.

FUNCTIONAL BLOCK DIAGRAMS

[Block diagrams 2562 drw 01 (IDT73200) and 2562 drw 02 (IDT73201): data inputs D0-D15, CLK, OE, I0-I3, CEN, outputs Y0-Y15, power supply Vcc and GND]

CEMOS is a trademark of Integrated Device Technology, Inc.

MILITARY AND COMMERCIAL TEMPERATURE RANGES
©1990 Integrated Device Technology, Inc.

DECEMBER 1990
DSC-4Kl36/1

PIN CONFIGURATIONS

DIP TOP VIEW (pin names as listed, top to bottom)
Left side:  I1, I0, CEN, D0, D1, D2, D3, D4, D5, D6, D7, GND, Vcc,
            D8, D9, D10, D11, D12, D13, D14, D15, SEL2, SEL1, SEL0
Right side: I2, I3, Y0, Y1, Y2, Y3, GND, Y4, Y5, Y6, Y7, Vcc, GND,
            Y8, Y9, Y10, Y11, GND, Y12, Y13, Y14, Y15, OE, CLK

PLCC TOP VIEW
[52-pin PLCC pin diagram, 2562 drw 03]

PIN DESCRIPTIONS

Pin Name    I/O  Description
D0-D15      I    Sixteen-bit data input port.
Y0-Y15      O    Sixteen-bit data output port.
I0-I3       I    Four control pins to select the register operation performed.
SEL0-SEL2   I    Three control pins to select the register appearing at the output.
CLK         I    Clock input.
CEN         I    Clock enable control pin. When this pin is low, the instruction I0-I3 is
                 performed on the registers. When high, no register operation occurs.
OE          I    Output enable control pin. When this pin is high, the output port Y is in a
                 high impedance state. When low, the output port Y is active.
Vcc              Power supply pin, 5V.
GND              Ground pins, 0V.
2562 tbl 01

IDT73200 OUTPUT SELECTION

SEL2  SEL1  SEL0  Y Output
0     0     0     A → Y0-Y15
0     0     1     B → Y0-Y15
0     1     0     C → Y0-Y15
0     1     1     D → Y0-Y15
1     0     0     E → Y0-Y15
1     0     1     F → Y0-Y15
1     1     0     G → Y0-Y15
1     1     1     H → Y0-Y15

IDT73201 OUTPUT SELECTION

SEL2  SEL1  SEL0  Y Output
0     0     0     A → Y0-Y15
0     0     1     B → Y0-Y15
0     1     0     C → Y0-Y15
0     1     1     D → Y0-Y15
1     0     0     E → Y0-Y15
1     0     1     F → Y0-Y15
1     1     0     G → Y0-Y15
1     1     1     D0-D15 → Y0-Y15


IDT73200 INSTRUCTION TABLE

I3  I2  I1  I0  Mnemonic  Function                                  Pipeline Levels
0   0   0   0   LDA       D0-D15 → A                                1
0   0   0   1   LDB       D0-D15 → B                                1
0   0   1   0   LDC       D0-D15 → C                                1
0   0   1   1   LDD       D0-D15 → D                                1
0   1   0   0   LDE       D0-D15 → E                                1
0   1   0   1   LDF       D0-D15 → F                                1
0   1   1   0   LDG       D0-D15 → G                                1
0   1   1   1   LDH       D0-D15 → H                                1
1   0   0   0   LSHAH     D0-D15 → A → B → C → D → E → F → G → H    8
1   0   0   1   LSHAD     D0-D15 → A → B → C → D                    4
1   0   1   0   LSHEH     D0-D15 → E → F → G → H                    4
1   0   1   1   LSHAB     D0-D15 → A → B                            2
1   1   0   0   LSHCD     D0-D15 → C → D                            2
1   1   0   1   LSHEF     D0-D15 → E → F                            2
1   1   1   0   LSHGH     D0-D15 → G → H                            2
1   1   1   1   HOLD      Hold All Registers                        -
2562 tbl 04

IDT73201 INSTRUCTION TABLE

I3  I2  I1  I0  Mnemonic  Function                              Pipeline Levels
0   0   0   0   LDA       D0-D15 → A                            1
0   0   0   1   LDB       D0-D15 → B                            1
0   0   1   0   LDC       D0-D15 → C                            1
0   0   1   1   LDD       D0-D15 → D                            1
0   1   0   0   LDE       D0-D15 → E                            1
0   1   0   1   LDF       D0-D15 → F                            1
0   1   1   0   LDG       D0-D15 → G                            1
0   1   1   1   HOLD      Hold All Registers                    -
1   0   0   0   LSHAG     D0-D15 → A → B → C → D → E → F → G    7
1   0   0   1   LSHAD     D0-D15 → A → B → C → D                4
1   0   1   0   LSHEG     D0-D15 → E → F → G                    3
1   0   1   1   LSHAB     D0-D15 → A → B                        2
1   1   0   0   LSHCD     D0-D15 → C → D                        2
1   1   0   1   LSHEF     D0-D15 → E → F                        2
1   1   1   0   LDG       D0-D15 → G                            1
1   1   1   1   HOLD      Hold All Registers                    -
2562 tbl 05
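
As a reading aid, the load, shift and hold instructions of the IDT73200 can be modeled in software. This is a behavioral sketch only: the class name and method names are hypothetical, the opcode ordering is assumed to run from 0000 (LDA) through 1111 (HOLD) as in the IDT73200 instruction table, and timing is ignored entirely (each call to `clock` stands for one rising clock edge).

```python
# Register chain affected by each 4-bit instruction I3..I0 (assumed ordering).
# A one-letter chain is a direct load; an empty chain is HOLD.
CHAINS_73200 = [
    "A", "B", "C", "D", "E", "F", "G", "H",   # LDA .. LDH
    "ABCDEFGH", "ABCD", "EFGH",               # LSHAH, LSHAD, LSHEH
    "AB", "CD", "EF", "GH",                   # LSHAB, LSHCD, LSHEF, LSHGH
    "",                                       # HOLD
]


class Pipeline73200:
    def __init__(self):
        self.regs = {name: 0 for name in "ABCDEFGH"}

    def clock(self, instr: int, data: int, cen: int = 0) -> None:
        """Apply one clock edge. CEN high means no register operation."""
        if cen:
            return
        chain = CHAINS_73200[instr & 0xF]
        # Move data from the far end first so every register receives
        # the previous value of its predecessor in the chain.
        for i in range(len(chain) - 1, 0, -1):
            self.regs[chain[i]] = self.regs[chain[i - 1]]
        if chain:
            self.regs[chain[0]] = data & 0xFFFF

    def output(self, sel: int, oe: int = 0):
        """Return the register chosen by SEL2..SEL0, or None when OE is high."""
        if oe:
            return None  # output port Y in high impedance
        return self.regs["ABCDEFGH"[sel & 0x7]]
```

Registers outside the selected chain keep their contents, which is what allows the device to be treated as several independent shorter pipelines.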


IDT73200 PIPELINE CONFIGURATIONS
[Diagrams: Eight 1-Level, Four 2-Level, Two 4-Level, One 8-Level]

IDT73201 PIPELINE CONFIGURATIONS
[Diagrams: Seven 1-Level, Three 2-Level, One 4-Level and One 3-Level, One 7-Level]

ABSOLUTE MAXIMUM RATINGS(1)

Symbol  Rating                                Commercial        Military          Unit
Vcc     Power Supply Voltage                  -0.5 to +7.0      -0.5 to +7.0      V
VTERM   Terminal Voltage with Respect to GND  -0.5 to Vcc+0.5   -0.5 to Vcc+0.5   V
TA      Operating Temperature                 0 to +70          -55 to +125       °C
TBIAS   Temperature Under Bias                -55 to +125       -65 to +135       °C
TSTG    Storage Temperature                   -55 to +125       -65 to +155       °C
IOUT    DC Output Current                     50                50                mA

NOTE:
1. Stresses greater than those listed under ABSOLUTE MAXIMUM
RATINGS may cause permanent damage to the device. This is a stress
rating only and functional operation of the device at these or any other
conditions above those indicated in the operational sections of this
specification is not implied. Exposure to absolute maximum rating
conditions for extended periods of time may affect reliability.

CAPACITANCE (TA = +25°C, F = 1.0MHz)

Symbol  Parameter(1)        Conditions  Typ.  Unit
CIN     Input Capacitance   VIN = 0V    10    pF
COUT    Output Capacitance  VOUT = 0V   12    pF

NOTE:
1. This parameter is sampled at initial characterization and is not 100%
tested.

TEST CIRCUIT

Test             Switch
tPLZ             Closed
tPZL             Closed
Open Drain       Closed
All Other Tests  Open

DEFINITIONS:
CL = Load capacitance includes jig and probe capacitance.
RT = Termination should be equal to ZOUT of the pulse generator
(typically 50Ω).
VIN = 0V to 3.0V
INPUT: tr = tf = 2.5ns (10% to 90%) unless otherwise specified


DC ELECTRICAL CHARACTERISTICS
Commercial: 0°C to +70°C, 5V ± 5%; Military: -55°C to +125°C, 5V ± 10%

Symbol  Parameter                      Test Condition(1)                                  Min.  Max.  Unit
VIH     High-Level Input Voltage                                                          2.0   -     V
VIL     Low-Level Input Voltage                                                           -     0.8   V
IIH     High-Level Input Current       Vcc = Max., VI = Vcc                               -     10    µA
IIL     Low-Level Input Current        Vcc = Max., VI = GND                               -     -10   µA
VOH     High-Level Output Voltage      Vcc = Min., IOH = -8mA (COM'L.), -6mA (MIL.)       2.4   -     V
VOL     Low-Level Output Voltage       Vcc = Min., IOL = 16mA (COM'L.), 12mA (MIL.)       -     0.4   V
VIK     Input Clamp Voltage            II = -18mA                                         -     -1.2  V
IOS     Short Circuit Output           Vcc = Max., VO = GND, VI = Vcc or GND              -20   -     mA
        Current(2)
IOZH    High Impedance Output Current  Vcc = Max., VI = Vcc                               -     20    µA
IOZL    Low Impedance Output Current   Vcc = Max., VI = GND                               -     -20   µA

NOTES:
1. For conditions shown as Min. or Max., use appropriate value based on temperature range.
2. Not more than one output should be shorted at one time. Duration of the short circuit test should not exceed 100 milliseconds.
2562 tbl 08

POWER SUPPLY CHARACTERISTICS

Symbol    Parameter                       Test Conditions(1)                    Min.  Typ.(2)  Max.  Unit
ICCQC     Quiescent Power Supply Current  Vcc = Max.                            -     2        10    mA
                                          VI = VLC or VHC
ICCQT(3)  Quiescent Power Supply Current  Vcc = Max.                            -     15       45    mA
          Inputs HIGH                     VI = 3.4V
ICCD1(4)  Dynamic Power Supply Current    Vcc = Max.                  COM'L.    -     10       30    mA
                                          Outputs Disabled, OE = HIGH MIL.      -     10       40
                                          fCP = 10MHz, 50% Duty Cycle
                                          VI ≥ VHC, VI ≤ VLC
ICCD2(4)  Dynamic Power Supply Current    Vcc = Max.                  COM'L.    -     10       60    mA
                                          Outputs Disabled, OE = HIGH MIL.      -     10       80
                                          fCP = 40MHz, 50% Duty Cycle
                                          VI ≥ VHC, VI ≤ VLC

NOTES:
1. For conditions shown as Min. or Max., use appropriate value specified under Electrical Characteristics for the applicable device type.
2. Typical values are at Vcc = 5.0V, +25°C ambient and maximum loading, not production tested.
3. This parameter is not directly testable but is derived for use in the total power supply calculation.
4. IC = IQUIESCENT + IINPUTS + IDYNAMIC
IC = ICCQC + (ICCQT x DH x NT) + ICCD
ICCQC = Quiescent Current
ICCQT = Power Supply Current for a TTL High Input (VIN = 3.4V)
DH = Duty Cycle for each TTL Input High
NT = Number of TTL Inputs at DH
ICCD = Dynamic Charge moved by an input transition pair (HLH or LHL)
All currents are in milliamps and all frequencies are in megahertz.
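
The total supply current formula in Note 4 is simple arithmetic and can be checked with a short helper. The function name and the sample figures below are illustrative assumptions, not characterized values for any particular operating point.

```python
def total_supply_current_ma(iccqc_ma: float, iccqt_ma: float,
                            duty_high: float, n_ttl_inputs: int,
                            iccd_ma: float) -> float:
    """IC = ICCQC + (ICCQT x DH x NT) + ICCD, all currents in milliamps."""
    if not 0.0 <= duty_high <= 1.0:
        raise ValueError("duty_high is a fraction between 0 and 1")
    return iccqc_ma + iccqt_ma * duty_high * n_ttl_inputs + iccd_ma


# Hypothetical operating point: 2 mA quiescent, 15 mA per TTL-high input,
# two such inputs high half of the time, 10 mA dynamic current.
example_ma = total_supply_current_ma(2, 15, 0.5, 2, 10)
```

The three terms map directly onto the quiescent, input and dynamic contributions named in the note, so each can be varied independently when estimating a power budget.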

2562 tbl 09


AC ELECTRICAL CHARACTERISTICS
Commercial: TA = 0°C to +70°C, Vcc = 5V ± 5%; Military: TA = -55°C to +125°C, Vcc = 5V ± 10%
Military

Commercial

73200L15
73201L15

73200L12
73201L12
Parameter
ClK to

Y~Y15

Propagation Delay

SEL~SEL2

~15

to Y~Y15 Propagation Delay
to ClK Set-up Time

D~D15

to ClK Hold Time

73200L15
73201L15

73200L20
73201L20

Max.

Max.

Min.

Max.

-

12

-

15

-

15

12

-

15

-

15

3

-

4

-

4
2

-

5

1

3

-

5

-

6

-

ns

2

3

-

ns

6

2

-

3

-

ns
ns

Min.

2

1~13

to ClK Set-up Time

4

-

5

1~13

to ClK Hold Time

2

-

2

CEN to ClK Set-up Time

4

CEN to ClK Hold Time

2

OE Enable Time(1)
OE Disable Time(1)

-

ClK Pulse Width HIGH

5

ClK Pulse Width lOW

5

ClK Period
Data In to Data Out Flowthrough(2)

5
2

5

Min.

Max.

Unit

Min.

-

20

ns

20

ns
ns.
ns

ns

-

10

-

10

-

13

9

-

9

-

13

ns

-

5

-

5

-

6

ns

5

-

5

-

6

-

-

12

15

-

20

ns

12

-

15

-

-

9
8

15

-15

ns

20

ns

NOTES:
1. Output Enable and Disable times measured to 500mV change of output voltage level.
2. 73201 only.

2562tblll

AC TEST CONDITIONS

L

Input Pulse levels
Input RiselFall Times

4ns

Input Timing Reference levels

1.SV

Output Reference levels
Output load

7.0V

GNDto 4.0V

1.SV

RT

500n
RL

•

See Figure 1
2562 drw05

2562 tbl12

•
I

Figure 1. AC Output Test Circuit

CMOS TESTING CONSIDERATIONS
There are certain testing considerations which must be
taken into account when testing high-speed CMOS devices in
an automatic environment. These are:
1) Proper decoupling at the test head is necessary. Placement
of the capacitor set and the value of capacitors used
is critical in reducing the potential erroneous failures
resulting from large Vcc current changes. Capacitor lead
length must be short and as close to the DUT power pins
as possible.
2) All input pins should be connected to a voltage potential
during testing. If left floating, the device may begin to
oscillate, causing improper device operation and possible
latchup.
3) Definition of input levels is very important. Since many
inputs may change coincidentally, significant noise at
the device pins may cause the VIL and VIH levels not to be
met until the noise has settled. To allow for this testing/
board induced noise, IDT recommends using VIL ≤ 0V and
VIH ≥ 3V for AC tests.
4) Device grounding is extremely important for proper device
testing. The use of multi-layer performance boards
with radial decoupling between power and ground planes
is required. The ground plane must be sustained from the
performance board to the DUT interface board. All
unused interconnect pins must be properly connected to
the ground pin. Heavy gauge stranded wire should be
used for power wiring and twisted pairs are recommended
to minimize inductance.

ORDERING INFORMATION

IDT   XXXXX         XX      XX      X         X
      Device Type   Power   Speed   Package   Process/Temperature Range

Process/Temperature Range:  Blank   Commercial (0°C to +70°C)
                            B       Military (-55°C to +125°C)
                                    Compliant to MIL-STD-883, Class B

Package:                    P       Plastic DIP
                            C       Sidebraze DIP
                            J       Plastic Leaded Chip Carrier

Speed:                              Speed in Nanoseconds

Power:                      L       Low Power

Device Type:                73200   16-Bit 8-Level Pipeline Register
                            73201   16-Bit 7-Level Pipeline Register
2562 drw 06


PRELIMINARY
IDT73210
IDT73211

FAST CMOS OCTAL
REGISTER TRANSCEIVER
WITH PARITY

Integrated Device Technology, Inc.

FEATURES:
• Two bidirectional interfacing ports
• Single-level pipeline register for one port and one-level
(73211) or two-level (73210) pipeline register for the
other port
• 8-bit wide interface ports plus parity bit
• Even parity checking in both directions
• Even/odd parity generation from Port A to Port B
• Even parity generation from Port B to Port A
• Parity polarity control
• High output drive capability: 64/48mA (commercial/
military)
• Available in 32-pin, 300 mil plastic DIP and sidebraze
DIP, surface mount 32-pin SOJ and LCC packages
• High-speed, low-power, CEMOS™ process technology
• Military product compliant to MIL-STD-883, Class B

FUNCTIONAL BLOCK DIAGRAM
Ao--s

PERRB

Vee GND2-

 '.,

8o--a to LE:.:::
ts

Set-up Timei':::·
8o--a to CP to
".. "~'" LE

= High

tH

Hold Time
8o--a to CP to Low-to-High; LE = High

1.5

-

-

ns

tPZH
tPZL

Output Enable Time
AOEtoAo--a,80Et08o--a

-

-

7.0

ns

tPHZ
tPLZ

Output Disable Time
AOEtoAo--a,80Et08o--a

-

-

6.5

ns

tPWH

Clock Pulse Width High

7.0

5.0

-

ns

tPWL

Clock Pulse Width Low

7.0

5.0

-

NOTE:
1. Typical values are at Vee = 5.0V and +25°e ambient, not production tested.

6.6

ns
2594 tbll0

10

IDT73210,IDT73211
FAST CMOS OCTAL REGISTER TRANSCEIVER WITH PARITY

MIUTARY AND COMMERCIAL TEMPERATURE RANGES

SWITCHING CHARACTERISTICS OVER MILITARY OPERATING RANGE
TA = -55°C to +125°C; VCC = 5V ± 10%
CL = 50pF; RL = 500Ω

Parameter    Description                                      Min.         Typ.(1)  Max.  Unit
tPLH, tPHL   Propagation Delay, Clock to A0-8 (AOE = Low),    -            -        12.2  ns
             Clock to B0-8 (BOE = Low)
tPHL         Propagation Delay, CP to PERRA, PERRB            -            -        10.6  ns
tPHL         Propagation Delay, POLARITY to B0-8              -            -        7.0   ns
tPHL         Propagation Delay, B0-8 to PERRB; LE = High      -            -        10.6  ns
ts           Set-up Time, A0-8, B0-8, POLARITY, SEL to CP     (illegible)  -        -     ns
tH           Hold Time, A0-8, B0-8, POLARITY, SEL to CP       (illegible)  -        -     ns
ts           Set-up Time, AEN, BEN to CP Low-to-High          (illegible)  -        -     ns
tH           Hold Time, AEN, BEN to CP Low-to-High            1.5          -        -     ns
ts           Set-up Time, B0-8 to LE                          2.0          -        -     ns
tH           Hold Time, B0-8 to LE                            1.5          -        -     ns
ts           Set-up Time, B0-8 to CP Low-to-High; LE = High   3.0          -        -     ns
tH           Hold Time, B0-8 to CP Low-to-High; LE = High     1.5          -        -     ns
tPZH, tPZL   Output Enable Time, AOE to A0-8, BOE to B0-8     -            -        7.0   ns
tPHZ, tPLZ   Output Disable Time, AOE to A0-8, BOE to B0-8    -            -        6.5   ns
tPWH         Clock Pulse Width High                           8            6        -     ns
tPWL         Clock Pulse Width Low                            8            6        -     ns

NOTE:
1. Typical values are at VCC = 5.0V and +25°C ambient.


[Figure 4. Input Interface Circuit]
[Figure 5. Output Interface Circuit]
[Figure 6. AC Test Load Circuit]

DEFINITIONS:
CL = Load capacitance: includes jig and probe capacitance
RL = Termination resistance: should be equal to ZOUT of the Pulse Generator

AC TEST CONDITIONS
Input Pulse Levels               GND to 3.0V
Input Rise/Fall Times            1V/ns
Input Timing Reference Levels    1.5V
Output Reference Levels          1.5V
Output Load                      See Figure 6

Test               Switch
Open Drain         Closed
Disable Low        Closed
Enable Low         Closed
All other Tests    Open


ORDERING INFORMATION
IDT  XXXX         X         X
     Device Type  Package   Process/Temperature Range

Process/Temperature Range:
  BLANK   Commercial (0°C to +70°C)
  B       Military (-55°C to +125°C), Compliant to MIL-STD-883, Class B

Package:
  Y       32-pin Small Outline IC (J-Bend)
  TP      32-pin Thin Plastic DIP (300 mil wide)

Device Type:
  73210   8-bit One Single, One Double Pipeline Registers
  73211   8-bit Two Single Pipeline Registers

RISC MODULE PRODUCTS

Maximizing the performance of R3000 systems means
designing with very high-speed components and finely tuning
PC board layouts to work at very high clock rates. IDT offers
a variety of pre-built, pre-tested RISC subsystems that can be
used to eliminate this part of the design task.
Roughly three by six inches in size, the modules are built on
8-10 layer PC boards with components surface mounted on
both sides. Most modules include at least the CPU, optional
FPA, cache RAMs and Read/Write Buffers. The high-speed
clock is also on the module, along with some reset, interrupt
and control logic. The net effect is to put all of the very high-speed
logic onto a tightly integrated, independent subsystem
that can be purchased like a component. The modules are
100% burned-in and tested at the rated speed.
The modules are designed around several different architectures. Within a given architecture, there is a range of
speeds and cache sizes to choose from. All the modules with
the same architecture are plug compatible, so price/performance options are easy to offer in the end system by simply
selecting the appropriate RISC SubSystem module.
Prototyping development systems are available for each
module architecture to serve as a starting point for additional
hardware and software development.

TABLE OF CONTENTS
                                                                                  PAGE
RISC MODULE PRODUCTS
IDT7RS101   R3000 CPU Modules for General Applications ........................... 7.1
IDT7RS102   R3000 CPU Modules for Compact Systems ................................ 7.2
IDT7RS103   R3000 CPU Modules for Compact Systems ................................ 7.3
IDT7RS104   R3001 RISC Engine for Embedded Controllers ........................... 7.4
IDT7RS107   R3000 CPU Modules for High Performance and MultiProcessor Systems .... 7.5
IDT7RS108   R3000 CPU Modules with 256K Caches ................................... 7.6
IDT7RS109   R3000 CPU Modules .................................................... 7.7
IDT7RS110   Plug Compatible Family of R3000 CPU Modules .......................... 7.8

MacStation, RISC CPU SubSystem, RISController, Real8 and TargetSystem are trademarks of Integrated Device Technology, Inc.
Apple, Macintosh, AppleTalk, LaserWriter, A/UX, MultiFinder are registered trademarks of Apple Computer, Inc.
UNIX is a registered trademark of Unix System Laboratories.
MIPS, RISC/os, and RISCompiler are trademarks of MIPS Computer Systems, Inc. NuBus is a trademark of Texas Instruments, Inc.
MS-DOS is a registered trademark of Microsoft Corporation. TrueImage and Windows are trademarks of Microsoft Corporation.
TrueType is a trademark of Apple Computer.
PostScript is a trademark of Adobe Systems.
PeerlessPage is a trademark of The Peerless Group.


R3000 CPU MODULES
FOR GENERAL APPLICATIONS

IDT7RS101

Integrated Device Technology, Inc.

FEATURES:
• R3000 CPU on 3.7" x 6.5" plug-in module
• 64K each Instruction and Data Caches
• On-board clock generation
• Four-word read buffer for block refill. Single
word write buffer
• On-board parity generation
• Five user interrupts into on-board register
• Available with or without Floating Point
Accelerator
• Cache supports full 32-bit address space
• 100% burn-in and functional test at rated speed

POWERFUL GENERAL PURPOSE R3000
MODULE:
The IDT7RS101 is a complete reduced instruction set
computer (RISC) CPU, based on the MIPS R3000 RISC
processor, and supplied on a small fully-tested high-density
plug-in module. The module includes the R3000 CPU, 64
Kbytes each of data and instruction cache memory, a single
word write buffer and a four-word read buffer to support
block refill. Clock generation, reset, control and interrupt
functions are included on the module to simplify the
remainder of the system design. Parity bits on incoming data words
may be generated automatically on the module, transparent
to the rest of the system. Alternatively, parity may be
handled in the user system, with on-board circuits only
performing optional parity check functions. Five user
interrupts are provided with an on-board clocked register to
ensure synchronized activity with the R3000 timings.
The module is constructed using surface mount devices
on a 3.72" by 6.5" epoxy laminate board, and is connected to
the user's system via four 50-pin Insulation Displacement
Connectors.

7RS101 Module. Actual Size 3.72" x 6.50"

OCTOBER 1990

©1990 Integrated Device Technology, Inc.

DSC-8030/1

IDT7RS101
R3000 RISC CPU MODULE

PRODUCT BRIEF

ARCHITECTURAL HIGHLIGHTS
Four-word Read Buffer
The 7RS101 includes a single write buffer and a four-word
deep read buffer. Cache read miss operations (memory
data not currently stored in the cache) may be handled either
with a four-word block refill or with a single-word fetch and
cache update. All control signals are available to implement
either option. Address mapping can be used to force block
refills on some addresses (for example, instructions) and
single-word updates on other addresses (for example, data).

Clock Generation
The clocks for both the R3000 and the R3010 are
automatically generated on the module using a very accurate
and stable delay line driven by a single user-supplied input
clock signal. There are three buffered clock output signals
for use with external control logic and system timing, each
of which is an identical inverted version of the R3000 output,
SYSOUT#.

Parity Generation
The R3000 Processor requires incoming data words to
consist of 32 bits of data and 4 bits of parity. The 7RS101
module can be set to either of two modes for parity handling.
It can check parity and report errors on incoming data that
consists of 32 data bits and 4 parity bits. In the other mode,
it can generate parity on 32 bits of incoming data and supply
the full 36 bits to the CPU.

Initialization Options
The R3000 requires mode selection to be made during
the RESET initialization sequence. The 7RS101 module
provides three pins that can simply be tied High or Low by
the user to select some of the R3000's options: Instruction
Streaming on or off, Partial Word Store on or off, and
BigEndian or LittleEndian byte order.
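
As an illustration of the four-word block refill described under Four-word Read Buffer, the sketch below lists the word addresses a memory system would see for one refill. It is a minimal model, not the module's documented behavior: the function name is hypothetical, and it assumes the refill counter that drives CA2-CA3 starts at zero and counts upward in place of address bits 2 and 3.

```python
WORDS_PER_BLOCK = 4  # the read buffer is four words deep


def block_refill_addresses(miss_address):
    """Return the four byte addresses fetched during one block refill.

    Assumption (not stated in the product brief): the refill counter
    starts at 0 and counts up, supplying address bits 2 and 3 instead
    of the registered address outputs.
    """
    # Clear bits 0-3 to find the start of the 16-byte (four-word) block.
    base = (miss_address & 0xFFFFFFFF) & ~0xF
    # The counter value is placed on address bits 2 and 3.
    return [base | (count << 2) for count in range(WORDS_PER_BLOCK)]


if __name__ == "__main__":
    for address in block_refill_addresses(0x12345678):
        print(f"0x{address:08X}")
```

A single-word fetch, by contrast, would use the registered address unchanged.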

User Interrupts
Six user interrupt inputs are provided. Each of these is a
negative-true signal, terminated with a 4.7K ohm pull-up
load resistor on the module, so pins may be left unconnected
if they are not used. The interrupt signals are clocked into
an Interrupt Input Register on the module by the CPU clock
SYSOUT. This ensures that the interrupt inputs to the
R3000 are synchronized to its clock. One of the interrupts
is reserved for use by the R3010 FPA; the other 5 are
available for the user.
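
The behavior described above (negative-true inputs, pull-ups on unused pins, sampling on the CPU clock) can be modeled in a few lines. This is only a behavioral sketch: the class and method names are hypothetical, `None` stands for an unconnected pin, and which edge of SYSOUT loads the register is not modeled.

```python
NUM_INTERRUPTS = 6


class InterruptInputRegister:
    """Behavioral model of a clocked register for negative-true interrupts."""

    def __init__(self):
        # Pull-up resistors hold every input High (inactive) by default.
        self.latched = [1] * NUM_INTERRUPTS

    def clock(self, pins):
        """Sample the pins on a clock edge.

        pins: list of 0, 1, or None (None = pin left unconnected).
        An unconnected pin reads High because of its pull-up.
        """
        if len(pins) != NUM_INTERRUPTS:
            raise ValueError("expected one level per interrupt pin")
        self.latched = [1 if level is None else level for level in pins]

    def pending(self):
        """Return the indices of interrupts currently asserted (Low)."""
        return [i for i, level in enumerate(self.latched) if level == 0]


if __name__ == "__main__":
    register = InterruptInputRegister()
    register.clock([1, None, 0, 1, None, 1])
    print(register.pending())
```

The processor only ever sees the latched values, so a pin that changes between clock edges has no effect until the next edge.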

Buffered Outputs
The address and data lines coming out of the module are
buffered, and can support substantial bus drive. All control
signals except those coming directly from the R3000 or
R3010 are also buffered.

FUNCTIONAL BLOCK DIAGRAM
[Block diagram not reproduced. Recoverable labels: CPU Controls, OscIn, Reset, Int's, Read Busy, SysOut, R3000 and R3010, ADRLO, TAG, Latch, Address, Data Bus, Address Bus.]

SIGNALS PROVIDED ON MODULE PINS
Signal        Type  Functional Description
MD00-MD31     I/O   Memory data lines to/from main memory system.
PAR0-PAR3     I/O   Parity bits for data lines. Unconnected if on-board parity generation selected.
MA00-MA31     OUT   Memory address lines to main memory system. These are registered outputs.
CA2-CA3       OUT   Block refill counter outputs. These lines are normally used instead of MA02-MA03, since they are the
                    outputs of the counter used to implement the 4-word block refill function.
RACT(0:1)     OUT   Positive-true outputs indicating the states of the R3000 outputs, ACCTYP(0:1), which identify the
                    size of data transactions for read/write cycles. These are registered outputs, like MA00-MA31.
BACT2         OUT   Buffered ACCTYP(2) output from R3000, used to distinguish between cache and non-cache memory
                    operations.
AOE#          IN    Negative-true input to enable the 3-state register output pins, MA00-MA31 and RACT(0:1).
UINT0-UINT5   IN    User interrupt inputs. Each has a 4.7K ohm pullup load resistor. UINT1 is reserved for R3010 FPA
                    usage.
OSCIN         IN    Oscillator input clock signal (2x clock rate).
MRES#         IN    Negative-true master reset input.
RESSW1-3      IN    Mode selection inputs used to determine R3000 setup options during reset initialization sequence.
                    Each has a 4.7K ohm pullup load resistor. Jumpers or switches to ground select the desired options.
SYSOUT1-3     OUT   Buffered clock outputs for synchronizing external events. Each of these is an identical clock signal,
                    representing the inverted form of the R3000 output, SYSOUT#.
MEMRD#        OUT   Direct negative-true output from R3000, used to indicate that a memory read cycle is in progress.
MEMWR#        OUT   Direct negative-true output from R3000, used to indicate that a memory write cycle is in progress.
RBSY          IN    Positive-true input used to request a memory read stall initiation and termination. This signal is
                    normally held in its asserted state and de-asserted at the completion of CPU stalls.
WBSY#         IN    Negative-true input used to request a busy indication for subsequent memory write operations.
BLKR#         IN    Negative-true input used to request a block read sequence for read operations from main memory.
AEN#          IN    Negative-true input used to enable the clock for loading the address register.
CEN           IN    Positive-true input used to enable the increment of the block refill address counter for pins CA2-CA3.
PHOLD         IN    Positive-true input used to inhibit clocking of the read buffer. This signal is normally the complement
                    of the CEN input.
WOE#          IN    Negative-true input used to enable the data output drivers for main memory write cycles.
WCTL#         IN    Negative-true input used to enable the clock to load data into the write data register for main memory
                    write cycles.
CPC0          IN    Direct input to R3000 Processor, used to indicate the size of the data (block, word, byte, or other) for
                    memory read cycles.
CPC1          OUT   Connection between the R3000 Processor and the R3010 FPA. Indicates the status of the conditional
                    branch. This pin is provided for diagnostic purposes only.
EXC#          OUT   Direct output from R3000, indicating the EXC# signal between the R3000 and the R3010.
RUN#          OUT   Negative-true output from the R3000, indicating that the R3000 is in its RUN state (not stalled).
BERR#         IN    Negative-true input to R3000 (with 4.7K ohm pullup), indicating a bus error in main memory.
FPA#          OUT   Negative-true output from R3010, indicating the presence of R3010 FPA on the module.
POE#          IN    Negative-true input with 4.7K ohm pullup resistor, used to enable the on-board parity generation logic.
                    It is left unconnected if parity is to be handled by user system.
PERR#         OUT   Negative-true output which indicates a parity error on incoming data when on-board parity generation
                    is not selected.
TAGV          OUT   Tag validity indicator, connected between R3000 and cache memory. Provided for diagnostic
                    purposes only.

RELATED PRODUCTS
Prototyping System

The 7RS101 module can be placed into immediate service
using our flexible 7RS301 Prototyping Platform. The
system includes two boards: a general purpose CPU board,
and a personality card that interfaces the module to the CPU
board.
The CPU board contains 1Mb of main memory, 256K of
EPROM, two RS232 serial ports, an 8254 counter/timer,
and an 8-bit parallel port accessible through a dual port
RAM. Four 50-pin connectors provide access to all the
address, data, and control signals for external connection to
additional hardware on, for example, a wire-wrap board.
The system includes IDT's Software Integration Manager,
which provides facilities for downloading code,
examining memory, and stepping through programs.
The personality card is on a separate board, and provides
a bed for the module, necessary control signals, and
connectors for an HP 16500 Logic Analyzer.
Code for the R3000 can be created on a MIPS development
system, on IDT's MacStation™ system, or using IDT's
PC-based cross assembler and compiler products.
Assembled code can be downloaded into the Prototyping
System for execution and debug.

ORDERING INFORMATION
Ordering Part Number   CPU      FPA      I-cache   D-cache   Speed    Other
7RS101F66A16A          R3000A   R3010A   64K       64K       16 MHz
7RS101F66A20A          R3000A   R3010A   64K       64K       20 MHz
7RS101F66A25A          R3000A   R3010A   64K       64K       25 MHz
7RS101F66A30A          R3000A   R3010A   64K       64K       30 MHz
7RS101N66A16A          R3000A   None     64K       64K       16 MHz
7RS101N66A20A          R3000A   None     64K       64K       20 MHz
7RS101N66A25A          R3000A   None     64K       64K       25 MHz
7RS101N66A30A          R3000A   None     64K       64K       30 MHz
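
The part numbers in the ordering table follow a visible pattern, which the sketch below decodes. It is a reading of the table rows only, not an official part-numbering specification: the function name and the returned field names are hypothetical, the meaning of `F`/`N` and of the two speed digits is taken from the FPA and Speed columns, and the two-digit cache code is returned undecoded because the table lists only one value for it.

```python
import re

# Pattern inferred from the eight table rows, e.g. "7RS101F66A25A".
_PART_PATTERN = re.compile(r"^7RS101([FN])(\d{2})A(\d{2})A$")


def decode_7rs101(part_number):
    """Split a 7RS101 ordering part number into the fields shown in the table."""
    # OCR copies of the table sometimes contain stray spaces; ignore them.
    match = _PART_PATTERN.match(part_number.replace(" ", "").upper())
    if match is None:
        raise ValueError(f"not a recognized 7RS101 part number: {part_number!r}")
    fpa_code, cache_code, speed = match.groups()
    return {
        "fpa": "R3010A" if fpa_code == "F" else None,  # F = with FPA, N = none
        "cache_code": cache_code,                      # left undecoded
        "speed_mhz": int(speed),
    }


if __name__ == "__main__":
    print(decode_7rs101("7RS101N66A30A"))
```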

ADDITIONAL INFORMATION
For detailed technical specifications on this module refer to the 7RS101 Product Specification and User's Manual.

CUSTOM OPTIONS
Some features of the 7RS101 can be modified on special order. Contact your IDT sales office for information.

R3000 CPU MODULES
FOR COMPACT SYSTEMS

IDT7RS102

Integrated Device Technology, Inc.

FEATURES:
• Cache Size: 16K Instruction, 16K Data
• Extremely small size: 3.2" x 3.9"
• Processor Speeds up to 25 MHz
• Includes R3010 Floating Point Accelerator
• Single word Read and Write Buffers
• 100% burn-in and functional test at rated speed

R3000 MODULE FOR GENERAL USE IN
SMALL SYSTEMS:
The IDT7RS102 is a complete reduced instruction set
computer (RISC) CPU, based on the MIPS R3000 RISC
processor, and supplied on a small fully-tested high-density
plug-in module. The module includes the R3000 CPU, the
R3010 Floating Point Accelerator, 16 Kbytes each of data
and instruction cache memory, a single word read buffer and
a single word write buffer.
Cache misses are handled with single word read
requests to memory, providing a simple interface to any type
of main memory system.
The module is constructed using surface mount devices
on a 3.2" by 3.9" epoxy laminate board, and is connected to
the user's system via 144 pins located in two pin row regions
on the board.

7RS102 Module. Actual Size 3.2" x 3.9"

DECEMBER 1990

©1990 Integrated Device Technology, Inc.

DSC-9031/1

IDT7RS102
R3000 RISC CPU MODULE

PRODUCT BRIEF

ARCHITECTURAL HIGHLIGHTS
Small and Simple
The 7RS102 module is designed to be as small as
possible and to provide a simple interface to the user's
system. The 16K caches are the smallest useful for most
systems.

R3010 Floating Point Accelerator
The R3010 Floating Point Accelerator (FPA) is included
as an integral part of the module. It operates in conjunction
with the R3000 RISC Processor and greatly improves the
system performance by expanding the instruction set to
include very fast floating point capabilities. All timing and
control connections are on the module and are completely
transparent to the user.

Clock Generation
The clock inputs to the 7RS102 are the direct connections
to the clocks for both the R3000 and the R3010. These
clocks must be generated in the user system and applied to
the module.

Cache Memory
Cache memory is provided on the module for a capacity
of 16K bytes for each of the two R3000 cache memory
systems (Instruction Cache and Data Cache). Memory
operations which require main memory data transfers are
conveniently handled by means of a variety of on-board
control signals.
Cache read miss operations are handled as single-word
fetch and cache update. Non-cache read operations (such
as I/O reads) are indicated by means of control signals and
are easily accommodated by the user.

Parity Generation
The R3000 Processor requires incoming data words to
consist of 32 bits of data and a 4-bit parity code. Each of the
4 parity bits applies to a particular byte in the word. The
required parity is even. The user system is required to
generate parity for incoming data to the module and may
optionally check parity for data being passed to main memory.
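
Since the user system must generate the parity itself, a short sketch of per-byte even parity may help. It is illustrative only: the function names are hypothetical, and it assumes parity bit i covers data bits 8i through 8i+7, which the text above does not specify. Even parity here means that each byte together with its parity bit contains an even number of ones.

```python
def byte_even_parity(byte):
    """Return the parity bit that makes (byte + parity bit) hold an even number of ones."""
    return bin(byte & 0xFF).count("1") & 1


def word_parity_bits(word):
    """Return the four parity bits for a 32-bit word.

    Assumption: entry i covers data bits 8*i .. 8*i+7.
    """
    return [byte_even_parity((word >> (8 * i)) & 0xFF) for i in range(4)]


def bytes_with_parity_errors(word, parity_bits):
    """Return the indices of bytes whose supplied parity bit is wrong."""
    expected = word_parity_bits(word)
    return [i for i in range(4) if expected[i] != parity_bits[i]]


if __name__ == "__main__":
    data = 0xFF030100
    parity = word_parity_bits(data)
    print(parity, bytes_with_parity_errors(data, parity))
```

The same check can be run on outgoing data if the user system chooses to verify parity on writes to main memory.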

Address and Data Buffers
The address and data lines coming out of the module are
buffered and can support substantial drive requirements.
The address pins are direct outputs from registers and
include the signals MACT0-MACT2. The three-state output
drivers may be disabled by de-asserting the output enable
control line, AOE. This is not normally done, but is provided
as a feature for systems which may require it.
The data pins are driven by Bi-directional Registers.
Enable/disable control of the three-state output drivers is
accomplished with the signal, DOE. Memory write cycles
utilize a single-word write buffer on the module which
permits the R3000 Processor to continue running while data
is being written into main memory.

USER INTERRUPT INPUTS
R3000 User Interrupts
Six user interrupt inputs are provided, INT0-INT5. Each
of these is a negative-true signal, terminated with a 10K ohm
pullup load resistor on the module. In this way, the pin may
be left unconnected if it is not to be used. The interrupt
signals are connected directly to the interrupt pins of the
R3000 Processor. INT1 is a reserved pin on this version of
the module and is required for use by the R3010 FPA. As a
result, it may not be used and must be left unconnected.

FUNCTIONAL BLOCK DIAGRAM
[Block diagram not reproduced. Recoverable labels: CPU Controls, 2x clocks, Clock, Reset, Int's, Interrupts, Read Busy, R3000 and R3010, SysOut, '240, SYSOUT, ADRLO, TAG, DATA, Latch, D-Cache (Address, Data), I-Cache (Address, Data), Read Addr Buffer, Write Addr Buffer, Write Data Buffer, Read Data Buffer, Address Bus, Data Bus.]

SIGNALS PROVIDED ON MODULE PINS
Signal Name    Type  Description
MD00-MD31      I/O   Memory data lines to/from main memory system.
MPAR0-MPAR3    I/O   Parity bits for data lines. Parity must be supplied to the 7RS102 module, in accordance with R3000
                     requirements.
MA00-MA31      OUT   Memory address lines to the main memory system. These are registered outputs.
MACT0-MACT2    OUT   Positive-true outputs indicating the states of the R3000 outputs, ACCTYP(0:2), which identify the
                     nature of the data transactions for read/write cycles. These are registered outputs, like MA00-MA31.
MTAGV          OUT   Registered TAGV output from R3000.
ACCTYP2        OUT   Unbuffered ACCTYP(2) output from R3000, used to distinguish between cache and non-cache
                     memory operations.
AOE#           IN    Negative-true input to enable the three-state register output pins, MA00-MA31, MACT0-MACT2, and
                     MTAGV.
INT0-INT5      IN    Interrupt inputs. Each has a 10K ohm pullup load resistor. INT1 is reserved for R3010 FPA usage.
CLK2XPHI       IN    Clock input for R3000 and R3010. Timings must conform to R3000 specifications.
CLK2XRD        IN    Clock input for R3000 and R3010. Timings must conform to R3000 specifications.
CLK2XSYS       IN    Clock input for R3000 and R3010. Timings must conform to R3000 specifications.
CLK2XSMP       IN    Clock input for R3000 and R3010. Timings must conform to R3000 specifications.
MRES#          IN    Negative-true master reset input. Connects directly to R3000 RES# input pin.
SYSOUT1-3      OUT   Buffered clock outputs for synchronizing external events. Each of these is an identical clock signal,
                     representing the inverted form of the R3000 output, SYSOUT#.
MEMRD#         OUT   Direct negative-true output from R3000, used to indicate that a memory read cycle is in progress.
MEMWR#         OUT   Direct negative-true output from R3000, used to indicate that a memory write cycle is in progress.
RBSY           IN    Positive-true input used to request a memory read stall initiation and termination. This signal is
                     normally held in its asserted state and de-asserted at the completion of stalls.
WBSY#          IN    Negative-true input used to request a busy indication for subsequent memory write operations.
ACE#           IN    Negative-true input used to enable the clock for loading the address register.
RDEN#          IN    Negative-true input used to enable the clock for loading the data register for memory read cycles.
DOE#           IN    Negative-true input used to enable the three-state data outputs, MD00-MD31 and MPAR0-MPAR3.
WCTL#          IN    Negative-true input used to enable the clock to load data into the write data register for main memory
                     write cycles.
CPC0           IN    Direct input to the R3000, used to indicate the size of the data (block, word, byte, or other) for memory
                     read cycles.
CPC1           OUT   Connection between the R3000 Processor and the R3010 FPA, indicating the status of the conditional
                     branch. This pin is provided for diagnostic purposes only.
CPC2, CPC3     I/O   Direct connections to R3000 pins.
EXC#           OUT   Direct output from R3000, indicating the EXC# signal between the R3000 and the R3010.
RUN#           OUT   Negative-true output from the R3000, indicating that the R3000 is in its RUN state (not stalled).
BERR#          IN    Negative-true input to R3000, used to indicate a bus error in main memory.
FPA#           OUT   Negative-true output from R3010, indicating the presence of R3010 FPA on the module.

RELATED PRODUCTS
Prototyping System
The 7RS102 module can be placed into immediate service
using our flexible 7RS302 Prototyping Platform. The
system includes two boards: a general purpose CPU board,
and a personality card that interfaces the module to the CPU
board.
The CPU board contains 1 Mb of main memory, 256K of
EPROM, two RS232 serial ports, an 8254 counter/timer,
and an 8-bit parallel port accessible through a dual port
RAM. Four 50-pin connectors provide access to all the
address, data, and control signals for external connection to
additional hardware on, for example, a wire-wrap board.
The system includes IDT's IDT/sim System Integration
Manager, which provides facilities for downloading code,
examining memory, and stepping through programs.
The personality card is on a separate board, and provides
a bed for the module, necessary control signals, and
connectors for an HP 16500 Logic Analyzer.
Code for the R3000 can be created on a MIPS development
system, on IDT's MacStation™ system, or using IDT's
PC-based cross assembler and compiler products.
Assembled code can be downloaded into the Prototyping
System for execution and debug.

Module Prototyping Platform.
The card on the left is the personality card with a module; the card on the right is the general purpose CPU.

ORDERING INFORMATION
Ordering Part Number   CPU     FPA     I-cache   D-cache   Speed    Other
7RS102-16A             R3000   R3010   16K       16K       16 MHz
7RS102-20A             R3000   R3010   16K       16K       20 MHz
7RS102-25A             R3000   R3010   16K       16K       25 MHz
7RS102F16A             R3000   R3010   16K       16K       16 MHz
7RS102F20A             R3000   R3010   16K       16K       20 MHz
7RS102F25A             R3000   R3010   16K       16K       25 MHz

NOTE:
1. 7RS110 module recommended for new designs.

ADDITIONAL INFORMATION
For more details on the 7RS102 module, refer to the 7RS102 Technical Specification and User's Manual.

R3000 CPU MODULES
FOR COMPACT SYSTEMS

IDT7RS103

Integrated Device Technology, Inc.

DISTINCTIVE FEATURES:
• Three Cache Size Versions:
16K Instruction, 16K Data (7RS103-44)
32K Instruction, 32K Data (7RS103-55)
64K Instruction, 64K Data (7RS103-66)
• Extremely small size: 2.9" x 3.7"
• Processor speeds up to 25 MHz
• Optional R3010 Floating Point Accelerator
• On-board delay line to create R3000 clocks
• 100% burn-in and functional test at rated speed

R3000 MODULE FOR COMPACT HIGH
PERFORMANCE SYSTEMS:
The IDT7RS103 is a family of interchangeable RISC CPU
SubSystem modules, based on the MIPS R3000 RISC
processor, and supplied on small fully-tested high-density plug-in
modules. The module includes the R3000 CPU, optionally the
R3010 Floating Point Accelerator, and data and instruction
cache memory. Versions are available with 16K each I and D
cache (7RS103-44), 32K each I and D cache (7RS103-55)
and 64K each I and D cache (7RS103-66). The three versions
differ only in the length of the board. All plug into the same
socket. The delay line to generate the three R3000 2x clock
signals is included on the module, so the module can be driven
from a single 2x clock.
Externally, the user system supplies the R3000 control
signals and the read and write buffers.
The module is constructed using surface mount devices on
both sides of a 2.9" epoxy laminate board, and is connected
to the user's system via 192 pins located in two pin row regions
on the board.

7RS103-44 Module. Actual Size 2.9" x 3.7"

DECEMBER 1990

©1990 Integrated Device Technology, Inc.

IDT7RS103
R3000 RISC CPU MODULE

PRODUCT BRIEF

ARCHITECTURAL HIGHLIGHTS:
The Minimal Module
The 7RS103 is designed to provide an R3000 RISC
SubSystem in as small a space as possible. It includes only
the CPU (and FPA), cache memories, and a delay line to
generate the 2x clocks to the R3000. The read and write
buffers and control logic are handled off the module by the
user's system. This makes the module ideal for use with
ASICs or other unique implementations of main memory
interface.

The R3000 timing and control signals are brought directly
off the module. The R3000 data sheet should be consulted for
all the timing specifications. One of the interrupt inputs is
required by the R3010 on versions that include the FPA
device.
The three versions of the 7RS1 03 differ only in the cache
memory sizes. They are completely interchangeable.

FUNCTIONAL BLOCK DIAGRAM
[Block diagram not reproduced. Recoverable labels: R3000 RISC CPU, R3010 FLOATING POINT ACCELERATOR, I-CACHE, D-CACHE, ACCTYP + ADDRLO, CTRL, TAG, DATA, FCT257 CLOCKS SELECTOR, FCT240 CLOCK BUFFER, DELAY LINE, ADDRESS (0:15), TAG (16:31), ACCTYP(0:2), TAGV, DATA AND PARITY, CONTROL SIGNALS, EXTERNAL CLOCKS, EXTERNAL OSCILLATOR.]
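
The block diagram labels the module's address outputs as ADDRESS (0:15) and TAG (16:31). The sketch below shows that split for a 32-bit address. It is a plain bit-field illustration with hypothetical function and key names; how each cache size version uses the low address bits to index its RAMs is not described in this brief and is not modeled here.

```python
def split_address(address):
    """Split a 32-bit address into the two groups named on the diagram.

    Returns the low 16 bits (ADDRESS 0:15) and the high 16 bits (TAG 16:31).
    """
    address &= 0xFFFFFFFF          # keep only 32 bits
    return {
        "address_lo": address & 0xFFFF,
        "tag": address >> 16,
    }


def join_address(address_lo, tag):
    """Rebuild the 32-bit address from the two groups."""
    return ((tag & 0xFFFF) << 16) | (address_lo & 0xFFFF)


if __name__ == "__main__":
    fields = split_address(0x8001ABCD)
    print(f"ADDRESS(0:15) = 0x{fields['address_lo']:04X}, TAG(16:31) = 0x{fields['tag']:04X}")
```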

R3001 RISC ENGINE
FOR EMBEDDED CONTROLLERS

ADVANCE
INFORMATION
IDT7RS104

Integrated Device Technology, Inc.

FEATURES:
• 128K each of Instruction and Data RAM in
Synchronous Memory
• Simple interface to peripherals
• On-Board Dual UART
• On-Board DMA Control
• Includes IDT/sim Monitor in EPROM
• On-Board 8254 Counter/Timer
• On-board oscillator, delay line, and reset
circuitry
• 100% burn-in and functional test at rated speed

R3001 MODULE WITH SYNCHRONOUS
MEMORY
The IDT7RS104 is a complete reduced instruction set
computer (RISC) CPU, based on the IDT 79R3001, an IDT
derivative of the MIPS R3000 RISC processor, and supplied
on a small fully-tested high-density plug-in module. The
module includes the CPU, optionally the R3010 Floating
Point Accelerator, and 256K of synchronous memory
divided into 128K each for instruction and data space. Clock
generation, reset, control and interrupt functions are
included on the module to simplify the remainder of the system
design.
The 104 module takes advantage of the R3001's ability to
address large amounts of synchronous memory, permitting
the entire application program to reside in high-speed memory
space.
The module is constructed using surface mount devices on
a 3.7" by 6.1" epoxy laminate board, and is connected to the
user's system via pins located in two pin row regions on the
board.

7RS104 Module. Actual Size: 3.7" x 6.1"

DECEMBER 1990

©1990 Integrated Device Technology, Inc.

IDT7RS104
R3001 RISControlier™ ENGINE

PRODUCT BRIEF

ARCHITECTURAL HIGHLIGHTS
Complete RISC Computer
The IDT7RS1 04 is a complete reduced instruction set computer (RISC) module optimized for embedded control applications. It is based on the R3001 RISController and includes the R3010 Floating Point Accelerator (FPA) to enhance performance. On-board EPROM and serial 110 allow
the module to perform as a stand-alone computer. All logic
and control functions have been incorporated in the module
in order to simplify system design for the end user. An
82C54 Programmable Interval Timer is included for periodic
interrupt functions.

Two Synchronized User Interrupts
There are two pins for user interrupts provided' on the
module. They feed an on-board locked register to ensure
synchronized activity with the R3000. The remaining four
R3000 interrupts are used on the board: two interrupts are
dedicated to the 82C54 Timer, one interrupt is dedicated to
the 2681 DUART, and another interrupt is connected to the
R3010 Floating Point Accelerator.

R3001 CPU
The module uses IDT's R3001 CPU device. The R3001 is
architecturally the same as the R3000, but permits disabling
of parity checking, thereby eliminating several bits of width
in the synchronous memories. It is also capable of supporting synchronous memory spaces larger than the 256 KB
maximum supported by the R3000. Like the R3000, the
R3001 uses the R3010 Floating Point Accelerator for high
speed arithmetic.

The 7RS104 Module is designed to be a complete R3000
based controller, with a simple asynchronous handshaking
to external 1/0 devices or possibly addit!onal memory. The
two 128 KB synchronous memory blocks on the module are
intended to be large enough to store all the instructions and
data used by the machine, so there are no cache miss stalls.
This design not only allows the system to run at maximum
performance, but also eliminates the inconsistencies in
execution speed that result from cache misses and refills in
conventional R3000 designs.

TYPICAL APPLICATIONS

Simple Interface
For ease of use, all complex clock timing signals and
handshaking logic are generated on the module and are
derived from either an on-board oscillator, or an externally-supplied clock input signal. This frees the system designer
from the task of having to design a state machine which
implements the system's handshake and arbitration functions.
Large Synchronous Memory Space
A large "cache" memory is included on the module with 128K
byte capacity for the Instruction Cache and 128K byte
capacity for the Data Cache. This large cache allows most
embedded control software to run completely within the
cache, making external asynchronous memory unnecessary.
In fact, the 7RS104 "cache" memories are intended to be the
machine's main memory. They are initially filled with instructions by using code in the EPROMs to move data from
the module's bus into memory using Load and Store instructions.
Reset and Initialization Logic
A master reset input triggers special initialization logic which
is used to set the modes of operation of the R3001 with no
user intervention required. The module is reset to Big-endian.


IDT7RS103
R3000 RISC CPU MODULE

PRODUCT BRIEF

SIGNALS PROVIDED ON MODULE PINS
Pin Name             Type   Functional Description
D00-D31              I/O    Memory data lines to/from main memory system.
DP0-DP3              I/O    Parity bits for data lines. Parity must be supplied to the 7RS103 module, in accordance with R3000 requirements.
A00-A15              OUT    Address lines from the R3000 (lower 16 bits).
T16-T31              OUT    Tag lines from the R3000 (higher 16 bits).
ACCTYP0-ACCTYP2      OUT    Positive-true outputs indicating the states of the R3000 outputs, ACCTYP(0:2), which identify the nature of the data transactions for read/write cycles.
TAGV                 OUT    Connection between cache and R3000.
INT0-INT5            IN     Interrupt inputs. Each has a 10K ohm pullup load resistor.
EXTOSC               IN     External oscillator input. (Needed when using on-board delay line.)
ECLKSYS, ECLKPHI,    IN     The 2x Clock inputs for R3000 and R3010. Timings must conform to R3000 specifications. (Needed when not using on-board delay line.)
ECLKRD, ECLKSMP
MRES#                IN     Negative-true master reset input. Connects directly to R3000 RES# input pin.
BSYSOUTA-BSYSOUTD    OUT    Buffered clock outputs for synchronizing external events. Each of these is an identical clock signal, representing the inverted form of the R3000 output, SYSOUT#.
BSYSOUT#             OUT    Buffered R3000 clock output, SYSOUT#, for synchronizing external events. Non-inverted form of SYSOUT#.
MEMRD#               OUT    Direct negative-true output from R3000, used to indicate that a memory read cycle is in progress.
MEMWR#               OUT    Direct negative-true output from R3000, used to indicate that a memory write cycle is in progress.
RBSY                 IN     Positive-true input used to request a memory read stall initiation and termination. This signal is normally held in its asserted state and deasserted at the completion of stalls.
WBSY#                IN     Negative-true input used to request a busy indication for subsequent memory write operations.
CPC0                 IN     Direct input to the R3000, used to indicate the size of the data (block, word, byte, or other) for memory read cycles.
CPC1                 OUT    Connection between the R3000 Processor and the R3010 FPA, indicating the status of the conditional branch. This pin is provided for diagnostic purposes only.
CPC2, CPC3           I/O    Direct connections to R3000 pins.
EXC#                 OUT    Direct output from R3000, indicating the EXC# signal between the R3000 and the R3010.
RUN#                 OUT    Negative-true output from the R3000, indicating that the R3000 is in its RUN state (not stalled).
BERR#                IN     Negative-true input to R3000, used to indicate a bus error in main memory.
FPA#                 OUT    Negative-true output indicating the presence of R3010 FPA on the module.
XEN#                 OUT    Direct negative-true output from the R3000. Used for read buffers output enable.
FPINT#               OUT    Negative-true R3010 interrupt request.


ORDERING INFORMATION
Ordering Part Number   CPU      FPA      I-cache   D-cache   Speed
7RS103N44A16A          R3000A   NONE     16K       16K       16 MHz
7RS103N44A20A          R3000A   NONE     16K       16K       20 MHz
7RS103N44A25A          R3000A   NONE     16K       16K       25 MHz
7RS103F44A16A          R3000A   R3010A   16K       16K       16 MHz
7RS103F44A20A          R3000A   R3010A   16K       16K       20 MHz
7RS103F44A25A          R3000A   R3010A   16K       16K       25 MHz
7RS103N55A16A          R3000A   NONE     32K       32K       16 MHz
7RS103N55A20A          R3000A   NONE     32K       32K       20 MHz
7RS103N55A25A          R3000A   NONE     32K       32K       25 MHz
7RS103F55A16A          R3000A   R3010A   32K       32K       16 MHz
7RS103F55A20A          R3000A   R3010A   32K       32K       20 MHz
7RS103F55A25A          R3000A   R3010A   32K       32K       25 MHz
7RS103N66A16A          R3000A   NONE     64K       64K       16 MHz
7RS103N66A20A          R3000A   NONE     64K       64K       20 MHz
7RS103N66A25A          R3000A   NONE     64K       64K       25 MHz
7RS103F66A16A          R3000A   R3010A   64K       64K       16 MHz
7RS103F66A20A          R3000A   R3010A   64K       64K       20 MHz
7RS103F66A25A          R3000A   R3010A   64K       64K       25 MHz
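
The ordering part numbers above appear to follow a regular pattern: module number, a letter for the FPA option, a two-digit cache code, and the speed in MHz. The following Python sketch splits a part number into those fields. The field meanings are inferred from the ordering tables in these product briefs rather than taken from an official IDT key, and the names `CACHE_CODES` and `parse_part_number` are hypothetical.

```python
import re

# Cache-size codes as they appear in the ordering tables
# (inferred from the tables, not an official decoding key).
CACHE_CODES = {"44": "16K", "55": "32K", "66": "64K", "77": "128KB"}


def parse_part_number(part):
    """Split an ordering part number such as '7RS103F44A25A' into fields."""
    compact = part.replace(" ", "")  # tolerate stray spaces from scanned text
    match = re.fullmatch(r"7RS(\d{3})([NF])(\d{2})A(\d{2})A", compact)
    if match is None:
        raise ValueError(f"unrecognized part number: {part!r}")
    module, fpa, cache, speed = match.groups()
    return {
        "module": "7RS" + module,
        "fpa": fpa == "F",                    # assumed: F = with FPA, N = none
        "cache_each": CACHE_CODES.get(cache),  # None if the code is unknown
        "speed_mhz": int(speed),
    }
```

For example, `parse_part_number("7RS103F44A25A")` reports module 7RS103 with an FPA, 16K caches, and a 25 MHz speed grade.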

MORE INFORMATION
For more information on this module, ask your IDT sales office for the Technical Specification and User's Manual.


IDT7RS104
R3001 RISController™ ENGINE

PRODUCT BRIEF

BLOCK DIAGRAM
[Block diagram not reproduced. Blocks shown: R3001 and R3010 on the R3K data and address buses; 128KB Instruction Cache; 128KB Data Cache; LocalBus Control PALs; Bi-Directional Address Buffers; 1-deep Read/Write Buffers; DMA Control Buffer; 256K x 8 EPROM (with IDT/sim); 2681 DUART; 8254 Counter/Timer. External connections: LocalBus CONTROL, LocalBus ADDRESS, LocalBus DATA, DMA CONTROL.]


SIGNALS PROVIDED ON MODULE PINS
Signal Name          Type   Description

STANDARD LocalBus SIGNALS
MD00-MD31            I/O    Memory data lines to/from main memory system.
MA00-MA31            I/O    Memory address lines to main memory system. These are registered inputs/outputs.
LRD#                 OUT    Negative-true output which indicates that a memory read cycle is in progress.
LWR#                 OUT    Negative-true output which indicates that a memory write cycle is in progress.
RACK#                IN     Negative-true input which is used to indicate that the main memory read cycle initiated by the IDT7RS105 has been completed.
WACK#                IN     Negative-true input which is used to indicate that the main memory write cycle initiated by the IDT7RS105 has been completed.
BE0#-BE3#            OUT    Negative-true outputs which indicate which byte is being accessed during main memory read or write operations. These four signals are valid only when LRD# or LWR# is true.
UINT0                IN     User interrupt input. Has a 10K ohm pullup load resistor.
UINT1                IN     Interrupt input which is driven by the R3010 Floating Point Accelerator on the standard module. Has a 10K ohm pullup load resistor.
UINT2                IN     User interrupt input. Has a 10K ohm pullup load resistor.
UINT3                IN     Interrupt input which is driven by the 8254 Counter/Timer on the standard module. Has a 10K ohm pullup load resistor.
UINT4                IN     Interrupt input which is driven by the 8254 Counter/Timer on the standard module. Has a 10K ohm pullup load resistor.
UINT5                IN     Interrupt input which is driven by the SN2681 DUART on the standard module. Has a 10K ohm pullup load resistor.
R3KOSCIN             IN     Oscillator input clock signal for the R3000/R3010 (2x clock rate).
BSYSOUT#             OUT    Buffered clock output for synchronizing external events. This signal represents the non-inverted form of the R3000 output, SYSOUT#.
BSYSCLKA-D           OUT    Buffered clock outputs for synchronizing external events. Each of these four outputs is an identical signal, representing the inverted form of the R3000 output SYSOUT#.
MREQ#                OUT    Negative-true, one-clock cycle long output which indicates the start of a main memory read or write cycle.
RESET#               IN     Negative-true master reset input.
CPC0                 IN     Indirect input to R3000 Processor. Transfers conditional branch status from an external coprocessor to the R3000.
CPC1                 OUT    Connection between the R3000 Processor and the R3010 FPA. Transfers conditional branch status from the R3010 to the R3000. This pin is provided for diagnostic purposes only.
CPC2-CPC3            IN     Direct inputs to the R3000 Processor. Transfer conditional branch status from external coprocessors to the R3000.

NON-STANDARD LocalBus SIGNALS
UARTCLKOUT           OUT    Clock output from a 3.6864 MHz oscillator on board the module.
CLK8254              IN     Clock input to the 82C54 Interval Timer. May be shorted to UARTCLKOUT by the user.
EXTCSUART#           IN     Negative-true input which is the chip select for the SN2681 DUART.


SIGNALS PROVIDED ON MODULE PINS (CONTINUED)
Signal Name          Type   Description
EXTCSTIM#            IN     Negative-true input which is the chip select for the 8254 Counter/Timer.
INTCSUART#           OUT    Negative-true output of an on-board decoder. Decodes the UART address space as 0x1FE00000-0x1FE0FFFF. May be shorted to EXTCSUART# by the user.
INTCSTIM#            OUT    Negative-true output of an on-board decoder. Decodes the Interval Timer address space as 0x1FE40000-0x1FE4FFFF. May be shorted to EXTCSTIM# by the user.

DIRECT MEMORY ACCESS (DMA) CONTROL SIGNALS
DICLK                IN     Latch enable to the instruction cache's address latch which is used during DMAs.
DDCLK                IN     Latch enable to the data cache's address latch which is used during DMAs.
DIWR#                IN     Negative-true input which is the write enable to the instruction cache RAMs. DIWR# is used during DMAs.
DDWR#                IN     Negative-true input which is the write enable to the data cache RAMs. DDWR# is used during DMAs.
DIRD#                IN     Negative-true input which is the output enable to the instruction cache RAMs. DIRD# is used during DMAs.
DDRD#                IN     Negative-true input which is the output enable to the data cache RAMs. DDRD# is used during DMAs.
TAGV                 IN     Tag Valid input to the R3001. Has a 10K ohm pullup load resistor.
UDMA                 IN     Positive-true, buffered input to the R3001's DMA pin. Has a 10K ohm pulldown load resistor.
DCTL#                IN     Negative-true, buffered input which is the output enable for all of the DMA signals. Can be driven true only after UDMA becomes true. Also acts as the output enable for the R3001 side of the address buffers. Has a 10K ohm pullup load resistor.
DAOE#                IN     Negative-true, buffered input which is the output enable to the main memory side of the address buffers. Can be used only when DCTL# is true. Has a 10K ohm pullup load resistor.
DMAEN#               IN     Negative-true, buffered input which is the clock enable to the main memory side of the address buffers. Can be used only when DCTL# is true.
DAEN#                IN     Negative-true, buffered input which is the clock enable to the R3001 side of the address buffers. Also acts as the read data clock enable to the data buffers. Can only be used when DCTL# is true. Has a 10K ohm pullup load resistor.
DWOE#                IN     Negative-true, buffered input which is the write data output enable to the data buffers. Can only be used when DCTL# is true. Has a 10K ohm pulldown load resistor.
DXEN#                IN     Negative-true, buffered input which is the read data output enable to the data buffers. Can only be used when DCTL# is true. Has a 10K ohm pullup load resistor.

DUART CONTROL SIGNALS
RxDA                 IN     Direct SCN2681 input.
RxDB                 IN     Direct SCN2681 input.
TxDA                 OUT    Direct SCN2681 output.
TxDB                 OUT    Direct SCN2681 output.
IP0-IP1, IP2-IP6     IN     Direct SCN2681 inputs.
OP0-OP7              OUT    Direct SCN2681 outputs.
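
The INTCSUART# and INTCSTIM# entries in the signal list describe an on-board decoder that asserts a negative-true chip select over a 64 KB address window. The Python sketch below models that decode using only the two address ranges given in the list. The function name and the returned dictionary are illustrative assumptions, not part of the module's documentation, and real decoder timing is not modeled.

```python
# Address windows as listed for INTCSUART# and INTCSTIM#.
UART_BASE, UART_LAST = 0x1FE00000, 0x1FE0FFFF
TIMER_BASE, TIMER_LAST = 0x1FE40000, 0x1FE4FFFF


def decode_chip_selects(addr):
    """Return the levels of the two negative-true chip selects (0 = asserted)."""
    return {
        "INTCSUART#": 0 if UART_BASE <= addr <= UART_LAST else 1,
        "INTCSTIM#": 0 if TIMER_BASE <= addr <= TIMER_LAST else 1,
    }
```

Because each window spans exactly 64 KB, the same result could be obtained by comparing only the upper 16 address bits against 0x1FE0 and 0x1FE4.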


RELATED PRODUCTS
IDT/sim
The 7RS104 module includes IDT's monitor in EPROM on
board. The IDT7RS901 System Integration Manager (IDT/sim)
is a ROMable software product that permits convenient
control and debug of RISC systems built around the MIPS
R3000 architecture. It permits users to quickly develop and
debug stand-alone systems. Facilities are included to
operate the CPU under controlled conditions, examining
and altering the contents of memory, manipulating and
controlling R3000 resources (such as cache, TLB and
coprocessors), loading programs from host machines, and
controlling the path of execution of loaded programs. Remote (source/symbolic) debugging is also supported.
IDT/sim requires 82Kb of EPROM space for code and data
and 16Kb of RAM space for uninitialized variable data and
stack. The minimal I/O system supported uses UARTs. The
default drivers support the 2681 or 68681 devices. Other
devices can be added easily.

Prototyping Platform
A Prototyping Platform is in development for this product.
Please contact your IDT sales office for latest status and
technical information.

ORDERING INFORMATION
Ordering Part Number   CPU     FPA     I-memory   D-memory   Speed    Other
7RS104F77A16A          R3001   R3010   128KB      128KB      16 MHz   EPROM socketed
7RS104F77A20A          R3001   R3010   128KB      128KB      20 MHz   EPROM socketed
7RS104F77A25A          R3001   R3010   128KB      128KB      25 MHz   EPROM socketed

CUSTOM OPTIONS
Most software features of the 7RS104 can be modified by special order. Contact sales office for details.

MORE INFORMATION
For more information on this module, ask your IDT sales office for the Technical Specification and User's Manual.

Integrated Device Technology, Inc.

R3000 CPU MODULES
FOR HIGH PERFORMANCE AND
MULTIPROCESSOR SYSTEMS

FEATURES:
• Cache Size: 64K Instruction, 64K Data

IDT7RS107

R3000 MODULE FOR HIGH PERFORMANCE
CPUs AND MULTIPROCESSOR SYSTEMS:
The IDT7RS107 is a complete reduced instruction set
computer (RISC) CPU, based on the MIPS R3000 RISC
processor, and supplied on a small fully-tested high-density
plug-in module. The module includes the R3000 CPU, the
R3010 Floating Point Accelerator, 64 Kbytes each of data
and instruction cache memory, a single word read buffer and
a four-word write buffer. Clock generation, reset, control
and interrupt functions are included on the module to simplify the remainder of the system design.
The 107 module is designed to support the R3000's
multiprocessor features. Data in the D-cache can be invalidated by the R3000 CPU. It is also possible to invalidate the
entire contents of the I-cache in a single cycle by using an
external cache reset signal.
The module is constructed using surface mount devices
on a 5.2" by 5.2" epoxy laminate board, and is connected to
the user's system via 195 pins located in two pin row regions
on the board.

• Processor Speeds up to 33 MHz
• Includes R3010 Floating Point Accelerator
• 1-word Read Buffer; 4-word Write Buffer
• Supports R3000 Multiprocessor Features
• Entire I-Cache can be Invalidated with external
cache reset signal
• Eight-word block refills
• On-board oscillator, delay line, and reset
circuitry
• 100% burn-in and functional test at rated speed


7RS107 Module. Actual Size 5.2" x 5.2"

DECEMBER 1990

RSD PB107/AIR

©1990 Integrated Device Technology, Inc.

7.5

DSC-9042/1

1

IDT7RS107
R3000 RISC CPU MODULE

PRODUCT BRIEF

ARCHITECTURAL HIGHLIGHTS
Uses R3020 Write Buffers
R3020 chips are used on the module to provide a "smart"
four-deep write buffer between the CPU and external
memory. These devices store data and addresses for up to
four write requests to main memory, and handle the handshaking with the memory controller. The R3020s support
features such as byte gathering (combining multiple byte
writes to the same address in the buffer into a single write)
and address matching (a read or write to an address already
in the write buffer will be detected so the user software can
take appropriate action). The R3020's Match signals are
OR'ed on the module to produce a single output, labeled
CONFLICT.
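
To make byte gathering and address matching concrete, here is a small behavioral model in Python. It is only a sketch of the ideas described above and does not represent the R3020's actual logic or timing. The class name, the method names, and the choice to gather bytes at 32-bit word granularity are assumptions made for illustration.

```python
class WriteBuffer:
    """Behavioral model of a four-deep write buffer (not the R3020's real logic)."""

    DEPTH = 4

    def __init__(self):
        # Each entry is [word_address, {byte_lane: value}], oldest first.
        self.entries = []

    def full(self):
        return len(self.entries) == self.DEPTH

    def conflict(self, addr):
        """Address matching: True if the word containing addr is buffered."""
        word = addr & ~0x3
        return any(entry[0] == word for entry in self.entries)

    def write_byte(self, addr, value):
        """Queue a byte write; returns False if the caller would have to stall."""
        word, lane = addr & ~0x3, addr & 0x3
        for entry in self.entries:
            if entry[0] == word:
                entry[1][lane] = value & 0xFF  # byte gathering into one entry
                return True
        if self.full():
            return False
        self.entries.append([word, {lane: value & 0xFF}])
        return True

    def retire(self):
        """Remove the oldest entry, as if memory had acknowledged the write."""
        return self.entries.pop(0) if self.entries else None
```

In this model, two byte writes to 0x1000 and 0x1001 occupy a single entry, and `conflict(0x1002)` returns True because that address falls in the same buffered word, which is the condition the CONFLICT output reports.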
Resettable Instruction Cache
The 7RS107 module permits invalidation of the entire
instruction cache via a "cache reset" pin on the module. This
feature is used to wipe the cache clean when a block of
instructions in main memory has been changed by a DMA
operation. It is usually much faster than invalidating each
affected tag individually.
Multiprocessor Invalidate In Data Cache
The module supports the R3000's multiprocessor cache
invalidate feature, so that data cache coherency can be
maintained when data held in the cache is altered externally.
The R3000's MP Stall and MP Invalidate signals are available as pins on the module. The user's system stalls the
processor and then provides an address to the module while
signaling MP Invalidate. The module stores the address in
a latch and applies it to the cache at the right time for the
R3000 to invalidate the referenced tag.

On-board Oscillator and Delay Line
All the clock generation circuitry required by the R3000
system is on the module. A jumper can be used to select
between the on-board crystal oscillator or an external oscillator input. A delay line on the module is used to set the
timing for register strobes and other critical signals relative
to the R3000 clock. The R3000 clock output "SYSOUT" is
made available to the user system through eight pins on the
module, each independently buffered.
R3000 Reset and Initialization Logic
The initialization logic for the R3000 CPU is contained on
the module. A "Cold Reset" pin on the module starts the
required 15 ms reset signal to the CPU, and then provides
the initialization vectors during the last few cycles. A second
reset pin is provided to reinitialize the CPU without repeating
the 15 ms delay. The R3000 is initialized to "Big-Endian"
operation.
Five User Interrupt Lines
Five pins on the module are used for user interrupt inputs.
The user interrupts are synchronized in registers on the
module before being sent to the R3000. Interrupt 2 is used
for the Floating Point Accelerator, if present.
External R3000 Condition Code Pin
The R3000 input CPCO is available as a pin on the
module. During initialization, this pin is programmed as a
Condition Code test pin, so the R3000 can do a Test and
Branch in a single cycle based on its state. During read
stalls, the pin determines whether a single word or 8 words
will be read. Reads into the instruction cache must always
be block refills.

TYPICAL APPLICATIONS
Eight-Word Block Refill
The module refills both the instruction and data caches
from memory in eight-word blocks. Following a cache miss,
the processor will request a memory read at the missed
address and wait for a data ready acknowledgement. When
an acknowledge is received, the processor will load eight
words into cache on eight successive clock cycles. The
memory interface must supply the correct eight words (address A4A3A2 = 0 to 7) at the processor's speed, 40 ns
intervals for a 25 MHz system. Interleaved memory is
usually the best way to support this requirement. The
processor's CPCO pin, available as a pin on the module, can
be used to override the block refill on data, but instruction
refills must always be in 8-word blocks. The processor
performs instruction streaming during the refill.
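
The timing requirement of the eight-word block refill can be worked through numerically. The Python sketch below computes the clock period for a given processor speed and lists the eight word addresses of the block containing a missed address. It assumes the words are supplied in ascending order starting from A4A3A2 = 0 and measures time from the first word delivered; the function name and return format are hypothetical.

```python
WORDS_PER_BLOCK = 8
BYTES_PER_WORD = 4


def refill_schedule(miss_addr, clock_mhz):
    """Return (cycle_ns, [(time_ns, word_address), ...]) for one block refill."""
    cycle_ns = 1000.0 / clock_mhz  # one word per processor clock cycle
    # Align down to the 32-byte block that contains the missed address.
    block_base = miss_addr & ~(WORDS_PER_BLOCK * BYTES_PER_WORD - 1)
    schedule = [
        (i * cycle_ns, block_base + i * BYTES_PER_WORD)
        for i in range(WORDS_PER_BLOCK)
    ]
    return cycle_ns, schedule
```

At 25 MHz the period is 40 ns, so the memory interface has to sustain one 32-bit word every 40 ns for eight consecutive cycles, which is why interleaved memory is suggested.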


The 7RS107 module is designed for applications that run
complex operating systems, such as UNIX™, or that need
outside control of cache memory contents, such as multiprocessor systems.
The module supports the R3000's ability to invalidate
entries in the data cache, allowing multiple processor systems to maintain cache coherency.
The module is offered with the maximum possible cache
sizes (64K each) that can be supported by the R3000 in a
multiprocessor configuration. These sizes are well suited to
running UNIX at very high instruction rates as well.
The R3020 Write Buffer is used to provide a four-word
deep write buffer, which is ideal for most UNIX systems.


FUNCTIONAL BLOCK DIAGRAM
[Block diagram not reproduced. Labels shown: CPU Controls; OscIn; R3000 and R3010; Reset; Int's; Read; Read Busy; SysOut; I-Cache; Address; Data; Address Bus; Data Bus; Write Request; Write Ack.]


SIGNALS PROVIDED ON MODULE PINS
Signal Name          Type   Description
MA0...MA31           I/O    32-bit address from the module to external memory. This is an output from the 3020 Write Buffer except during the MP Invalidate function, when it is the input to the MP cache address latch.
MD0...MD31           I/O    32-bit data bus between the module and external memory. Driven from the 3020 Write Buffer during writes; input to the Read Data Buffer during reads.
BACT0,1,2            O      The three R3000 AccType status signals, driven from the 3020 Write Buffer during writes and from a latch during reads.
MDP0...MDP3          I/O    The four parity bits for the MD data. Output during writes and input during reads.
CP_CpCond0, 2, 3     I      The three flag inputs to the R3000 CPU. CPC0 is used during read stalls to control block refill of the data cache. (The instruction cache must always be block refilled.) CPC2 and CPC3 are the MP stall and invalidate controls.
ALOE                 I      Data Cache Address Latch Output Enable. When LOW, enables the output of the latch holding the data cache address supplied by the R3000. It should be LOW at all times except when the MP Latch is being used to invalidate a cache address.
MPALOE               I      Data Cache MP Address Latch Output Enable. This input is used to enable the output of the latch holding the address supplied by the user system during an MP stall cycle. It should be enabled (LOW) only during the MP invalidate operation.
BSYSOUT2...9         O      Eight buffered inverted copies of the R3000 signal "SYSOUT" for use in the user's system.
UINT0,1,3,4,5        I      Interrupt inputs to the R3000. These signals are synchronized to SYSOUT on the module. R3000 interrupt 2 is used for the Floating Point Accelerator.
BRESET               O      Buffered copy of the reset signal created on the module to reset the CPU. LOW during Reset.
WB_WbFull            O      Write Busy. Status signal created by the R3020 write buffer. Goes LOW to indicate the buffer is full.
CPU_BusError         I      Input to the R3000 indicating a bus error has occurred.
RESETC               I      Cold Reset to the module. The module creates a 15 ms long reset to the R3000 and executes the R3000 initialization sequence when this pin goes LOW.
FP_FpPresent         O      This signal can be used to detect the presence of an FPA on the module. To be used, it must be connected to a 4.7K pullup resistor. The pin will be LOW if the FPA is present.
RESETI               I      Active LOW asynchronous clear to the I-Cache Tag RAMs. Sets the entire I-Cache invalid.
WB_OutEn             I      Write Buffer Output Enable. When LOW, turns on the outputs of the R3020 write buffers.
WB_Request           O      Output from the R3020 to indicate that there is data in the buffer to be written to memory. Active LOW.
WB_Acknowledge       I      Input to the R3020 to indicate data has been written into memory.
CONFLICT             O      The OR of all the R3020 Match signals; indicates the address on the R3020 inputs matches one of the addresses currently in the write buffer.
RABOE                I      Read Address Buffer Output Enable. When LOW, turns on outputs of the buffers containing the read address.
RDBCE                I      Read Data Buffer Clock Enable. When LOW, enables the clock (SYSOUT) to the Read Data Buffers.
READ                 O      Status signal output. LOW during reads.
RABLE                I      Read Address Buffer Latch Enable. When HIGH, enables the Read Address Buffer latches.
WB_LatchErrAddr      I      Latch Error Address input to the R3020.
WB_EnErrAddr         I      Enable Error Address input to R3020.
CP_MemRd             O      R3000 output signal. When LOW, there is a request for a read from external memory.
CP_RdBusy            I      Read Busy. Input to the R3000 to indicate acknowledgment of the MEMRD request.
RESETX               I      Additional Reset command. Same as RESETC, but does not go through the 15 ms delay. Can be used to re-initialize the R3000 when power is on.


RELATED PRODUCTS
Prototyping System

The 7RS107 module can be placed into immediate service using our flexible 7RS307 Prototyping Platform. The
system includes two boards: a general purpose CPU board,
and a personality card that interfaces the module to the CPU
board.
The CPU board contains 1Mb of main memory, 256K of
EPROM, two RS232 serial ports, an 8254 counter/timer,
and an 8-bit parallel port accessible through a dual port
RAM. Four 50-pin connectors provide access to all the
address, data, and control signals for external connection to
additional hardware on, for example, a wire-wrap board.
The system includes IDT's Software Integration Manager, which provides facilities for downloading code, examining memory, and stepping through programs.
The personality card is on a separate board, and provides
a bed for the module, necessary control signals, and connectors for an HP 16500 Logic Analyzer.
Code for the R3000 can be created on a MIPS development system, on IDT's MacStation™ system, or using IDT's
PC-based cross assembler and compiler products. Assembled code can be downloaded into the Prototyping
System for execution and debug.

A Module Prototyping Platform.
The card on the left is the personality card with a module; the card on the right is the general purpose CPU.


ORDERING INFORMATION
Ordering Part Number   CPU      FPA      I-cache   D-cache   Speed    Other
7RS107N66A16A          R3000A   NONE     64K       64K       16 MHz
7RS107N66A20A          R3000A   NONE     64K       64K       20 MHz
7RS107N66A25A          R3000A   NONE     64K       64K       25 MHz
7RS107N66A33A          R3000A   NONE     64K       64K       33 MHz
7RS107F66A16A          R3000A   R3010A   64K       64K       16 MHz
7RS107F66A20A          R3000A   R3010A   64K       64K       20 MHz
7RS107F66A25A          R3000A   R3010A   64K       64K       25 MHz
7RS107F66A33A          R3000A   R3010A   64K       64K       33 MHz

CUSTOM OPTIONS
Some features of the 7RS107 can be modified by special
order. Contact your IDT sales office for details.
Software modifications include: initialization mode for the
R3000, endian option, size of block refill, instruction streaming
option.

Manufacturing options include pin length, style, and plating; special marking; additional burn-in, and socketing of the
CPU and/or FPA.


R3000 CPU MODULES
WITH 256K CACHES

IDT7RS108

Integrated Device Technology, Inc.

FEATURES:
• Cache Size: 256KB Instruction, 256KB Data

R3000 MODULE FOR HIGH PERFORMANCE
CPUs:
The IDT7RS108 is a complete reduced instruction set
computer (RISC) CPU, based on the MIPS R3000 RISC
processor, and supplied on a small fully-tested high-density
plug-in module. The module includes the R3000 CPU, the
R3010 Floating Point Accelerator, 256 Kbytes each of data
and instruction cache memory, a single word read buffer and
a four-word write buffer.
Clock generation, reset, parity, control and interrupt functions are included on the module to simplify the remainder
of the system design.
The 7RS108 module is pin compatible with the 7RS107
and 7RS109 modules (with 64K caches), but does not
support the multiprocessing features offered by those
modules.
The module is constructed using surface mount devices
on a 5.2" by 5.2" epoxy laminate board, and is connected to
the user's system via 195 pins located in two pin row regions
on the board.

• Processor Speeds up to 25 MHz
• Includes R3010 Floating Point Accelerator
• 1-word Read Buffer; 4-word Write Buffer
• Eight-word block refills
• On-board Parity Generation and Check
• On-board oscillator, delay line, and reset
circuitry
• 100% burn-in and functional test at rated speed


7RS108 Module. Actual Size 5.2" x 5.2"

DECEMBER 1990

RSD PB108/M
©1990 Integrated Device Technology, Inc.

8.2

DSC-9043/1

1

IDT7RS300 SERIES
PROTOTYPING PLATFORMS

PRODUCT BRIEF

FLEXIBLE PROTOTYPING PLATFORMS
System Description
The 7RS300 series RISC Prototyping Platform is designed to simplify the initial prototyping of both hardware
and software for systems using one of the IDT RISC
SubSystem Modules. The System Board is very general,
and is the same in all of the 300 series Platforms. It contains
basic control logic, mostly in PALs, 1 megabyte of static
main memory, 256K of EPROM, a counter/timer, and I/O
ports. Static RAM is used for main memory to provide the
simplest interface to the module. The EPROM contains
IDT's System Integration Manager in about 80K; the rest is
available for user software.
The System Board connects to a personality board for the
module through a pair of ribbon connectors. Each module
architecture uses a different personality board. The personality board provides such features as clock generation,
R3000 reset and initialization, read and write buffers, etc., to
the extent that they are not already on the module. The
personality board also contains five 20-pin plugs that can be
directly connected to an HP 16500 series Logic Analyzer,
and provides a uniform interface to the System Board.

System Board Hardware
The System Board is powered by a single 5 volt supply
connected to a plug on the board. The plug conforms to the
standard used for PCs, so an ordinary inexpensive PC
power supply works easily with the board. A terminal can be
connected to one of the RS-232 ports to act as the terminal
for the Software Integration Manager. The other serial port
is generally used to download software from some host
system. Alternatively, there is an 8-bit wide parallel port built
using dual port RAM that can be used for higher speed
download.

Four 50-pin IDC (3M) connectors are configured for
connecting additional hardware to the System Board. They
contain the following signals:
• 32 bits of address
• 32 bits of data, and 4 parity bits
• SYSOUT (buffered clock from the R3000)
• RESET# (copy of the R3000's Reset signal)
• Parity and Address output enables from the address and data registers (to permit tri-stating other data onto these lines)
• Six interrupt lines to the R3000. These are registered or not, depending on the module.
• The four byte Write Enable signals
• Five decoded chip select outputs from the upper 16 bits of address (1FE6 through 1FEE)
• MEMRD#, used to enable output devices in the expansion system during data read cycles
• Auxiliary input and output signals from the 68681 dual UART
• MREQ# and XACK# handshaking signals for controlling the timing of data transfers

Personality Board Hardware
The personality board connects to the system board
through two ribbon connectors. It contains a cut out area
and plugs which accept the appropriate module. There are
two five-volt power connectors, again using standard PC
plugs. One power supply is for the personality card, the
other for the module.
Five connectors are pre-wired to connect the module's
signals to an HP logic analyzer. Because of the speed of the
signals in the R3000 system, the connectors are placed on
the slow side of the read/write buffers, so for disassembly
and trace purposes, the R3000 must be run uncached.

Software Included
The System Board contains IDT's System Integration
Manager (IDT/sim) in EPROM.


FUNCTIONAL BLOCK DIAGRAM
[Block diagram not reproduced. The personality board and module connect to the System Board through a 50-pin connector carrying address, data, and control. System Board blocks: control logic; SRAM (256K x 32); EPROM (64K x 32); dual-port RAM (4K x 8); serial I/O (68681/MAX235) with RS232C connectors #1 and #2; 8254 timer; four 50-pin expansion/proto connectors.]

Block Diagram of the 7RS300 Series Prototyping Platforms
[Board layout figure not reproduced. Labeled items: serial I/O ports (DB25 connectors); power connectors, including one marked for logic; IDT7RS103 RISC module; main memory (SRAM); user expansion connectors (4 x 50-pin); parallel port (dual-port RAM); reset button; 7RS343 board (personality board for 7RS103); 7RS340 board (RISC system board).]

Layout of the series 300 Prototyping Platform.
The configuration shown is for the 7RS103 module.


ORDERING INFORMATION

INCLUDED WITH SYSTEMS
Each Prototyping Platform includes the System Board,
completely populated with 1 Mb of RAM and 256K of
EPROM, with the System Integration Manager (IDT/sim) in the
EPROM. Each system also includes the appropriate personality card for the module architecture indicated,
configured for the speed indicated. Documentation includes
complete schematics for both the system board and the
personality board, including all the PAL equations for the
control circuitry.
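
As a quick check, the memory sizes quoted above appear to follow from the organizations given in the functional block diagram (SRAM 256K x 32, EPROM 64K x 32). The sketch below assumes that "K" means 1024 words and that "1 Mb" and "256K" are byte counts; the helper name capacity_bytes is illustrative only and is not part of any IDT software.

```python
def capacity_bytes(words: int, bits_per_word: int) -> int:
    """Return the capacity in bytes of a memory organized as words x bits."""
    return words * bits_per_word // 8


K = 1024  # assumption: "K" in the block diagram means 1024

sram_bytes = capacity_bytes(256 * K, 32)    # SRAM (256K x 32)
eprom_bytes = capacity_bytes(64 * K, 32)    # EPROM (64K x 32)

print(sram_bytes // (K * K), "Mbyte of SRAM")
print(eprom_bytes // K, "Kbytes of EPROM")
```

Under these assumptions the SRAM works out to 1 Mbyte and the EPROM to 256 Kbytes, which matches the figures in the paragraph above.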

Ordering Part Number    Speed

FOR USE WITH THE 7RS101 ARCHITECTURE
7RS301-16               16 MHz
7RS301-20               20 MHz
7RS301-25               25 MHz
7RS301-30               30 MHz

FOR USE WITH THE 7RS102 ARCHITECTURE
7RS302-16               16 MHz
7RS302-20               20 MHz
7RS302-25               25 MHz

Auxiliary Download Program
For downloading code from a MIPS machine into an
evaluation board. This software includes programs
to convert MIPS object code into S-records and to
download either ASCII or binary S-records to a
remote target. This software is only needed with
MIPS computers; all other machines (including the
MacStation) have standard utilities available to
perform this function.
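
To make the S-record step concrete, here is a minimal sketch of how one data record is built. It is not the IDT utility: the brief does not say which record types that program emits, so the S3 type (32-bit address) is an assumption, and the function name srecord_line and the sample address are hypothetical. The layout used is the standard Motorola one: record type, byte count, address, data, then a checksum that is the one's complement of the low byte of the sum of the count, address, and data bytes.

```python
def srecord_line(address: int, data: bytes) -> str:
    """Encode one S3 (32-bit address) data record as an ASCII line."""
    addr_bytes = address.to_bytes(4, "big")
    count = len(addr_bytes) + len(data) + 1      # +1 for the checksum byte
    if count > 0xFF:
        raise ValueError("too much data for a single record")
    body = bytes([count]) + addr_bytes + data
    checksum = (~sum(body)) & 0xFF               # one's complement of the low byte of the sum
    return "S3" + (body + bytes([checksum])).hex().upper()


# Hypothetical load address, chosen only for illustration.
print(srecord_line(0x80020000, bytes([0x01, 0x02])))   # S30780020000010273
```

A binary S-record download would carry the same fields without the ASCII hex expansion; that variant is not shown here.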

FOR USE WITH THE 7RS103 ARCHITECTURE
7RS303-16               16 MHz
7RS303-20               20 MHz
7RS303-25               25 MHz

FOR USE WITH THE 7RS104
Contact Factory

FOR USE WITH THE 7RS107 ARCHITECTURE
7RS307-16               16 MHz
7RS307-20               20 MHz
7RS307-25               25 MHz
7RS307-33               33 MHz

FOR USE WITH THE 7RS108 ARCHITECTURE
7RS308-25               25 MHz

FOR USE WITH THE 7RS109 ARCHITECTURE
7RS309-25               25 MHz
7RS309-33               33 MHz

FOR USE WITH THE 7RS110 ARCHITECTURE
7RS310-20               20 MHz
7RS310-25               25 MHz
7RS310-33               33 MHz

MIPS download utility .................... 7RS950BUU
Supplied on QIC-24 TAR tape.


R3000 PGA ADAPTOR

IDT7RS363

Integrated Device Technology, Inc.

FEATURES:
• Simple and direct connection to HP 16500A
Logic Analyzer System
• Probe points to 32 address, 32 data and 16
control signals of R3000
• Several clocks available for signal strobes
• Compact physical size permits its use in target
system with minimal impact on spacing
requirements
• Setup files for 16500A Logic Analyzer assure
speedy startup
• No active components
• Compact design assures minimal added lead
capacitances (approx 5 pF)

DESCRIPTION:
The IDT7RS363 is an adapter card intended for use in
performing diagnostics on the operation of the IDT79R3000
RISC Processor on a Hewlett-Packard model 16500A Logic
Analysis system. It contains no active components. Instead, it is used as a socket adapter for the R3000, and all
address, all data, and many control lines are made accessible for capture by the logic analyzer. It may only be used
with the pin grid array (PGA) package of the R3000 and
requires the logic analyzer system to be equipped with 5 HP
Termination Adapters, P/N 01650-63201, to provide for the
direct connection to the analyzer input pods.
For ease of setup, a diskette is provided as a part of the
7RS363, which contains files loadable directly into the HP
logic analyzer. These files automatically set up the logic
analyzer by assigning the pods and the individual input
channels directly to the signals captured from the R3000. In
this way, the logic analyzer display will immediately represent the captured signals with all the proper signal names
displayed.
The 7RS363 may also be used as a simple diagnostic
tool, separate from its use as a logic analyzer adapter card.
This is possible because all necessary
signals of the R3000 are immediately accessible as test
points on the card.

OCTOBER 1990

RSD PB363/A/R


[Figure: IDT7RS388 controller board. Callouts: 8 sockets for 256K x 8 or 512K x 8 ROMs or EPROMs (1MB, 2MB or 4MB of ROM); Canon video interface; 32 sockets for 256K x 4 non-interleaved DRAMs provide space for 1MB, 2MB or 4MB of RAM.]

IDT7RS388 Controller Fits Canon LBP-SX Print Engine

IDT7RS388 REAL8™
LASER PRINTER CONTROLLER DEVELOPER'S KIT

PRODUCT BRIEF

PEERLESSPAGE™ Printer OS
PeerlessPage provides a real-time multitasking executive/kernel surrounded by graphics and I/O services such as
font and emulation soft-switching and active port sensing.
PeerlessPage is a portable, easily extensible foundation
that supports multiple industry standards and accelerates
time-to-market by enabling OEMs to quickly and cost-effectively build differentiated product lines.
The PeerlessPage OS itself has been ported to the
REAL8 board and is included in the board's EPROMs. The
TrueImage PDL interpreter runs on top of PeerlessPage.
Other PDLs can easily be ported to the REAL8 by taking
advantage of the graphics and font handling features built
into PeerlessPage.
The PeerlessPage Interface manuals are available directly from the Peerless Group, and describe the calls
available to the user. Users can write C programs with the
appropriate calls, download the compiled code to the REAL8
board, and execute it in a software development environment.

DEMO SOFTWARE INCLUDED
The REAL8 EPROMs contain a demo copy of Microsoft
TrueImage PDL, TrueType fonts, and the PeerlessPage
Printer Operating System from The Peerless Group. As
shipped, the board can be plugged directly into a Canon SX
engine and can print downloaded PostScript files. This
provides an easy way to compare the performance of this
R3000 based controller with controllers based on conventional technology or other RISC machines.

Microsoft TRUEIMAGE PDL
TrueImage is an open technology Page Description Language for high performance full-function printers. TrueImage
provides extensive features such as TrueType™ scalable
fonts, enhanced communication with the Microsoft Windows™ operating environment and improved printer
performance, as well as complete Adobe® PostScript compatibility. TrueImage contains a TrueType rasterizer, as well
as an Adobe Type 1 rasterizer, and is able to execute Type
1 fonts.

[Figure: PeerlessPage Printer Operating System. Blocks shown: PostScript, Epson, XL24, Diablo, font scaling, and front panel I/F; Application Developer's Toolkit; uniform high-level application interface; multi-tasking OS kernel (memory management, task management, interrupt services, real time clock, I/O support, scheduler); font cache management; Hardware Developer's Toolkit; hardware.]

PeerlessPage Printer Operating System

ARCHITECTURAL HIGHLIGHTS
The IDT7RS388 uses a 25 MHz R3001 RISController™
with optional (socketed) R3010 Floating Point Accelerator.
The FPA typically provides 15-20% performance improvement in PostScript applications and is not required at all for
PCL applications.

The IDT7RS388 assumes that Instruction Cache is always used, but provides a jumper for disabling the Data
Cache. Similarly, jumpers allow the ROM/EPROM area to
accommodate different memory configurations (see Hardware Reference Manual included with Kit).

[Figure: IDT7RS388 block diagram. Blocks shown: R3001 CPU; optional R3010 Floating Point Accelerator; I cache and D cache; Centronics interface; RS232C interface; SX video output to print engine (300 dpi); 1MB to 4MB ROM/EPROM; 1MB to 4MB DRAM.]

ORDERING INFORMATION
Each REAL8 Laser Printer Evaluation System includes a controller board for a Canon LBP-SX engine with 4 MB of DRAM
and with PeerlessPage and TrueImage in EPROM. Also included is a hardware reference manual describing the design in detail,
including schematics and PAL equations.
REAL8 Printer Evaluation Board ......................................................................................... 7RS388

Integrated Device Technology, Inc. reserves the right to make changes to the specification in this data sheet in order to improve design or performance and
to supply the best possible product.
MacStation, RISC CPU SubSystem, RISController, REAL8 and TargetSystem are trademarks of Integrated Device Technology, Inc.
Apple, Macintosh, AppleTalk, LaserWriter, A/UX, MultiFinder are registered trademarks of Apple Computer, Inc. UNIX is a registered trademark of AT&T.
MIPS, RISC/os, and RISCompiler are trademarks of MIPS Computer Systems, Inc. NuBus is a trademark of Texas Instruments, Inc. MS-DOS is a registered
trademark of Microsoft Corporation. TrueImage and Windows are trademarks of Microsoft Corporation. TrueType is a trademark of Apple Computer.
PostScript is a trademark of Adobe Systems.


IDT7RS502

MacStation™ 2
R3000 DEVELOPMENT SYSTEM

Integrated Device Technology, Inc.

FEATURES:
• 10 VAX MIPS RISC Computer inside any Macintosh II
• Includes UNIX V 3 Operating System
• Supplied with MIPS C-Compiler, Assembler, and
Symbolic Debugger
• Will support any MIPS software packages,
including SPP, SPP/e, FORTRAN
• Uses all Macintosh peripherals for I/O
• MultiFinder and System 7 compatible

R3000 COMPUTER PLUGS INTO
A MACINTOSH II:
The MacStation 2 is an R3000 based workstation consisting of a Macintosh II computer and a high-performance
R3000 CPU that plugs into the NuBus inside the Mac. The
R3000 CPU runs IDT/ux, IDT's port of MIPS RISC/os UNIX
V 3, in a window under MultiFinder.
The UNIX window is opened by double-clicking an icon
on the Mac screen. The window is essentially a terminal
emulator. When the window is opened, IDT/ux starts up on
the R3000 CPU. Users can switch back and forth between
the IDT/ux window and other Macintosh windows freely.
The system includes the powerful MIPS C Compiler. Source
code is written using any Macintosh text editing program,
and a single command in the IDT/ux window will copy the file
into the UNIX file system and compile it. Any other software
written for MIPS RISC/os can be ported to the MacStation 2.
The MacStation hardware uses two slots in the
Macintosh. The MacStation software is supplied on tapes
and requires approximately 160 Mb of hard disc space when
loaded.

MacStation 2

DECEMBER 1990

RSD PBx502/A/R

ORDERING INFORMATION
The MacStation 2 may be ordered in a variety of forms, ranging from the essential boards and software
to complete systems with software pre-installed. The IDT/ux operating system requires a signed single-system license agreement which must be executed before the system can be shipped. Contact your
IDT sales office for a sample of the license.
MacStation Boards ................................................................................. 7RS502B8-L
Includes R3000 NuBus CPU card, 8 Mbyte NuBus Memory card, IDT/ux.
Requires a Macintosh II computer, system 6.01 or later, Apple tape drive, at
least 160 Mbytes of free disc space.
MacStation Conversion Kit ..................................................................7RS502TD8-L
Everything to convert a Macintosh II to a MacStation. Includes the
MacStation boards, IDT/ux, a tape drive, a 160 Mbyte Hard Disc and
cables.
Memory Expansion Card ...........................................................................7RS502X8
Adds 8 Mbytes to the Memory Card, raising total available to IDT/ux to 16
Mbytes.
Standard MonoChrome System ........................................................ 7RS502MXM-L
Includes the MacStation Boards, IDT/ux, tape drive, 160 Mb external hard
disc, cables. Also includes Macintosh IIx computer with 4 Mbytes of internal
memory and 80 Mb internal hard disc, monochrome 13" monitor and video
card, extended keyboard.
High Performance MonoChrome System ....................................... 7RS502MFXM-L
Includes the MacStation Boards, IDT/ux, tape drive, 160 Mb external hard
disc, cables. Also includes Macintosh IIfx computer with 4 Mbytes of
internal memory and 80 Mb internal hard disc, monochrome 13" monitor
and video card, extended keyboard.
Other Macintosh system configurations.
The MacStation can be supplied with any desired Macintosh configuration.
Contact factory for a quotation.

IDT/ux Documentation
This is an eight-manual set of documentation for MIPS RISC/os (UNIX V 3 with BSD extensions).
Included are: RISC/os Programmer's Reference Manual, User's Reference Manual, System Administration Reference Manual, System Administrator's Guide, Programmer's Guide, User's Guide, Streams
Primer and Programmer's Guide, Guide to BSD on RISC/os.
IDT/ux Documentation Package ............................................................ 7RS551BDU

Other MIPS Software for MacStation
The following MIPS products are available for the MacStation. All require a signed license agreement
prior to shipment. Contact your IDT sales office for a sample of the agreement.
SPP for MacStation .............................................................................7RS992SMT-L
Source Code, site license, includes documentation
SPP/e for MacStation ..........................................................................7RS993SMT-L
Source Code, site license, includes documentation


MacStation 3
R3000 DEVELOPMENT SYSTEM

ADVANCE
INFORMATION
IDT7RS503

Integrated Device Technology, Inc.

FEATURES:
• 20 VAX MIPS RISC Computer inside any
Macintosh II
• Includes UNIX V 3 Operating System
• Supplied with MIPS C-Compiler, Assembler, and
Symbolic Debugger
• Will support most MIPS software packages,
including SPP
• Uses all Macintosh peripherals for I/O
• Includes Macintosh-independent SCSI and
serial I/O ports
• MultiFinder and System 7 compatible

R3000 COMPUTER PLUGS INTO
A MACINTOSH II
The MacStation 3 is a high performance R3000 based
workstation consisting of a Macintosh II computer and an
R3000 CPU board that plugs into the NuBus inside the Mac.
The R3000 CPU runs IDT/ux, IDT's port of MIPS RISC/os
UNIX V 3, in a window under MultiFinder.
The UNIX window is opened by double-clicking an icon
on the Mac screen. The window is essentially a terminal
emulator. When the window is opened, IDT/ux starts up on
the R3000 CPU. Users can switch back and forth between
the IDT/ux window and other Macintosh windows freely.
The system includes the powerful MIPS C Compiler. Source
code is written using any Macintosh text editing program,
and a single command in the IDT/ux window will copy the file
into the UNIX file system and compile it. Any other software
written for MIPS RISC/os can be ported to the MacStation.
The MacStation hardware uses one NuBus slot in the
Macintosh. The MacStation software is supplied on tapes
and requires approximately 160 Mb of hard disc space when
loaded.

MacStation 3

DECEMBER 1990

RSD PBx503/A11
©1990 Integrated Device Technology, Inc.

DSC-90591-

IDT7RS503
MacStation 3 RISC Workstation

PRODUCT BRIEF

MacStation 3 HARDWARE

RISC CPU Card
The CPU card is a 25 MHz R3000 system, using the
Floating Point Accelerator and 64 KB each of instruction and
data caches. The necessary hardware to run UNIX is also
on the board. EPROMs on the card contain IDT's System
Integration Manager (IDT/sim) which provides many debug
and control features at the monitor level. The board can
communicate with all Macintosh I/O and with Ethernet via
the Macintosh NuBus. Additionally, the board has its own
SCSI and serial I/O ports. A separate terminal can be used
on the serial I/O port to run command-line UNIX alone, and an
independent Macintosh compatible hard disc can be used
on the SCSI port to provide faster disc access to the RISC
machine by avoiding delays inherent in the NuBus and Mac
O/S.

Main Memory
The MacStation 3 is available with either 8 or 16 MB of
DRAM. This permits reasonably large programs to execute
without excessive disc swapping.

MacStation 3 SOFTWARE
The MacStation is shipped with all the software on
tapes. It may be ordered with a 160 Mb hard disc with the
software pre-installed, but tapes are still included for backup.

[Figure: MacStation 3 block diagram. RISC CPU board: R3000 CPU, I-cache, D-cache, 8 - 16 MByte memory, SCSI, serial I/O, monitor and NuBus interface, with external hard disc and CRT. Macintosh II: NuBus, SCSI controller, serial I/O, 68020 CPU, video memory, DRAM memory, CRT, and hard disc.]

[Figure: Board may be inserted into any NuBus slot.]

[Figure: Screen shot of Macintosh with IDT/ux running.]

ORDERING INFORMATION
The MacStation 3 may be ordered in a variety of forms, ranging from the essential boards and software to complete systems with software pre-installed. The IDT/ux operating system requires a signed
single-system license agreement which must be executed before the system can be shipped. Contact
your IDT sales office for a sample of the license.
8 MB MacStation System ......................................................................7RS50388-L
Includes R3000 NuBus CPU card with 8 Mbyte RAM, IDT/ux. Requires a
Macintosh II computer, system 6.01 or later, Apple tape drive, at least 160
Mbytes of free disc space.
16 MB MacStation System .................................................................. 7RS503816-L
Includes R3000 NuBus CPU card with 16 Mbyte RAM, IDT/ux. Requires a
Macintosh II computer, system 6.01 or later, Apple tape drive, at least 160
Mbytes of free disc space.
MacStation Conversion Kit ................................................................. 7RS503T08-L
Everything to convert a Macintosh II to a MacStation. Includes the 8 MB
MacStation board, IDT/ux, a tape drive, a 160 Mbyte Hard Disc and
cables.
Other Macintosh system configurations.
The MacStation can be supplied with any desired Macintosh configuration.
Contact factory for a quotation.

IDT/ux Documentation
This is an eight-manual set of documentation for MIPS RISC/os (UNIX V 3 with BSD extensions).
Included are: RISC/os Programmer's Reference Manual, User's Reference Manual, System Administration Reference Manual, System Administrator's Guide, Programmer's Guide, User's Guide,
Streams Primer and Programmer's Guide, Guide to BSD on RISC/os.
IDT/ux Documentation Package ........................................................... 7RS551BDU

Other MIPS Software for MacStation
The following MIPS products are available for the MacStation. All require a signed license agreement
prior to shipment. Contact your IDT sales office for a sample of the agreement.
SPP for MacStation ............................................................................ 7RS992SMT-L
Source Code, site license, includes documentation
SPP/e for MacStation ......................................................................... 7RS993SMT-L
Source Code, site license, includes documentation

Integrated Device Technology, Inc. reserves the right to make changes to the specification in this data sheet in order to improve design or performance and
to supply the best possible product.
MacStation, RISC CPU SubSystem, RISController, and TargetSystem are trademarks of Integrated Device Technology, Inc. Apple, Macintosh, AppleTalk,
LaserWriter, A/UX, MultiFinder are registered trademarks of Apple Computer, Inc. UNIX is a registered trademark of AT&T. MIPS, RISC/os, and RISCompiler
are trademarks of MIPS Computer Systems, Inc. NuBus is a trademark of Texas Instruments, Inc.
Ethernet is a trademark of Xerox. VAX is a trademark of Digital Equipment Corporation. OS/2 is a trademark of IBM Corporation. MS-DOS is a registered
trademark of Microsoft Corporation.

IDT/sim
SYSTEM INTEGRATION MANAGER
ROMable DEBUGGING KERNEL

IDT7RS901

Integrated Device Technology, Inc.

FEATURES:
• Provides complete control over hardware and
software for system integration
• Fits in 82Kb of EPROM space, plus 16Kb of RAM
• Provides CPU control for register and memory
manipulation, cache access, and TLB
management
• Includes standard I/O support
• Easy to add new commands and new I/O drivers
• Complete support for MIPS symbolic debuggers.
No additional code required
• Supports downloading code in either ASCII or
binary formats

POWERFUL TOOL FOR R3000 SOFTWARE/
HARDWARE INTEGRATION:
The IDT7RS901 System Integration Manager (IDT/sim)
is a ROMable software product that permits convenient
control and debug of RISC systems built around the MIPS
R3000 architecture. It permits users to quickly develop and
debug stand-alone systems. Facilities are included to
operate the CPU under controlled conditions, examining
and altering the contents of memory, manipulating and
controlling R3000 resources (such as cache, TLB and
coprocessors), loading programs from host machines, and
controlling the path of execution of loaded programs. Remote (source/symbolic) debugging is also supported.
IDT/sim requires 82Kb of EPROM space for code and
data and 16Kb of RAM space for uninitialized variable data
and stack. The minimal I/O system supported uses UARTs.
The default drivers support the 2681 or 68681 devices.
Other devices can be added easily.

[Figure: System Integration Manager. A development host system (MacStation™, MIPS or PC) and a terminal connect by RS232C cable (download and remote debug) to the target with IDT/sim.]

System Integration Manager

DECEMBER 1990

RSD PB901/A/R
 
Specify virtual to physical mapping in the translation
buffer.

help/? [commandlist]
Prints a list of the commands available in the monitor.

tlbpid/ti [pid]
Displays the current process identifier (pid).

regsel/rs [-c/-h]
Selects display format for register names.

tlbptov/tp
Search the translation buffer for translations which
map to .

checksum/cs
Display the checksums for each of the EPROMs.

load/l <-device> [format]
Download from an I/O device.

init/i
Initialize prom monitor (warm reset).

debug/db [DEV]
Enter remote symbolic debug mode with host.

dbgint/di [-e/-d]
Debug interrupt enable/disable; allows the 'break key' to
generate an external interrupt.

go/g [-n]
Begin execution at address

fill/f [-w/-h/-b/-l/-r] [value_list]
Fills memory specified by range with value_list.

gotill/gt
Continue execution from the current PC till break address encountered.

sub [-w/-h/-b/-l/-r]
This command allows the user to examine and change memory interactively.

call/ca [arg1 arg2 ... arg8]
Invoke a 'C' language subroutine.

dump/d [-w/-h/
Display contents of memory.

step/s []
Execute a specified number of instructions.

move/m [-w/-b/-h]
Move the block of memory

cont/c
Continues execution of the client process from where it last halted execution.

compare/cp [-w/-b/-h]
Compare the block of memory

brk/b [addresslist]
Display currently set breakpoints or set a breakpoint at each of the addresses.

search/sr [-w/-b/-h] [mask]
Search the area of memory for a value.

wc [-i] [-w/-b/-h] [value_list]
Fill cache memory with a pattern.

unbrk/ub
Unset breakpoints listed.

cacheflush/cf [-i/-d]
Flush both the i-cache and the d-cache.

rc [-i] [-w/-b/-h]
Display cache memory.

fr
Put into the register

dr [reg#/name]
Print out the current contents of registers.

dis
Disassemble the contents of memory.

LIST OF RUN TIME SUPPORT ENTRY POINTS

_exit()
Returns control to the monitor.

_reset()
Resets the monitor.

_atob(str,intptr,base,seg)
Converts an ASCII string to an integer.

_setjmp(cur_cntx)
Save the current context so that non-local goto's may be implemented.

_clear_cache(begin_addr, num_bytes)
Clears a selected area in I and D cache.

_cli(cmd_table,prompt)
General purpose command line interpreter.

_longjmp(cur_cntx)
Restores the saved context so that non-local goto's may be implemented.

_flush_cache()
Flushes both the I and D cache.

_showchar(c)
Prints the character passed to it in a visible manner.

_strcat(s,t)
Concatenate two strings (39)

_get_range(str,start,end)
Parses the range specification.

_strcmp(s,t)
Compare two strings (36)

_getchar()
Get a character from the standard input device.

_strcpy(s,t)
Copy one string to another (38)

_gets(str)
Get a string from the standard input device.

_strlen(s)
Determine the number of characters in a string. (37)

_help(argc,argv,cmd_table)
Print the usage line for all specified commands.

_tokenize(cmdline,argv)
Parse command line and build argc/argv structure.

_install_commands(cmd_table)
Allows the user to extend the command set of the standard monitor.

_write(fd,buf,cnt)
Write data to an external device.

_install_immediate_int(ptr_user_int_rt)
Installs a pointer to a user interrupt function that will be called by the monitor when an exception/interrupt occurs.

_install_new_dev(dtptr,diptr)
Installs a new device driver that will be recognized by IDT/sim.

_install_normal_int(ptr_user_int_rt)
Installs a pointer to a user interrupt function that will be called by the monitor when an exception/interrupt occurs.

_ioctl(fd,cmd,arg)
Sets flags for I/O characteristics and/or calls driver ioctl routines.

_open(device,flags)
Opens a device for reading and/or writing.

_printf(format,[args])
Formatted print routine.

_putchar(c)
Output a character to the standard output device.

_puts(str)
Output a string to the standard output device.

_read(fd,buf,cnt)
Read data from an external device.

ORDERING INFORMATION
To order the IDT System Integration Manager, order the Developmental Use License AND order the software on the appropriate media. The license will be shipped to you for signature; on return the software will be shipped. You may also order binary distribution rights for the run-time version of the monitor. Ask your IDT sales office for information.

Licenses
Developmental Use License .................... 7RS901SLY
Permits purchase of up to six copies of source code (any media combination) and use of source code to develop run-time binaries on up to six machines at a time, but does not permit inclusion of the run-time code in an end product.
Binary Distribution Rights .................... 7RS901BLP-L
Extension to Developmental Use License to permit inclusion of binary code into end product. Development Use License must be referenced on order or ordered simultaneously. This license permits up to 100 copies to be distributed royalty-free. Additional copies are subject to the royalty below, or a one-time buyout.
Binary Distribution Sublicense .................... 7RS901BLC-L
Per Copy Royalty for distribution of runtimes developed using the System Integration Manager beyond the first 100.
Maintenance Agreement .................... 7RS901SSY
One year free minor updates, and discounted upgrade to major update versions. We supply a direct telephone contact for support.

Source Media
Source for 286/386, MS-DOS .................... 7RS901SAF-L
Use with IDT/c C-Compiler (7RS903). Shipped with both 1.2 MB 5.25" and 1.44 MB 3.5" diskettes.
Source for 286/386 PC, SCO Xenix .................... 7RS901SXX-L
Use with IDT/c C-Compiler (7RS903). Developmental Use License number must be referenced on order, or must be ordered simultaneously.
Source for IDT MacStation, on Mac Disc .................... 7RS901SMD-L
Use with MIPS C Compiler supplied with MacStation or with IDT/c. Developmental Use License number must be referenced on order, or must be ordered simultaneously.
Source for MIPS or SUN Machine, QIC-24 TAR Tape .................... 7RS901SUU-L
Use with MIPS C Compiler or with IDT/c. Developmental Use License number must be referenced on order, or must be ordered simultaneously.
EPROM Versions
The following versions of IDT/sim are supplied in EPROMs for the indicated hardware. These versions are for updating the hardware to the latest version of the monitor.
For Evaluation Boards and Prototyping Systems .................... 7RS901BAP
Use with 7RS382, 383, or any 7RS300 series module Prototyping Platform
For the MacStation 1 .................... 7RS901BBP
Use with 7RS501 Original MacStation CPU board
For the MacStation 2 .................... 7RS901BCP
Use with the 7RS502 MacStation CPU board

Auxiliary Download Programs
For downloading code from a MIPS machine into IDT/sim. This software includes programs to convert MIPS object code into S-records and to download either ASCII or binary S-records to a remote target. This software is only needed with MIPS computers; all other machines (including the MacStation) have standard utilities available to perform this function.

IDT/c
MULTI-HOST C-COMPILER SYSTEM

IDT7RS903

Integrated Device Technology, Inc.

FEATURES:
• Includes C-compiler, Optimizing Scheduler, Assembler, and Linker
• Optional Floating Point Software
• Meets Plum Hall 2.00 ANSI C validation suite
• Runs on 80286 and 80386 machines under MS-DOS™ or XENIX™, on MIPS machines under RISC/os, and on MacStation™ under IDT/ux
• Supports entire IDT family of MIPS ISA Processors: R3000, R3001, R3051, and R3052
• Loader communicates with IDT's System Integration Manager (IDT/sim)
• Provides control over multiple memory segments

OPTIMIZING C-COMPILER SYSTEM:
IDT/c consists of a set of software products that run on a variety of platforms, and which together produce highly efficient code for R3000 CPUs. The code can be downloaded in several formats to a target machine for execution. On the target machine, the code can be controlled with IDT's System Integration Manager (IDT/sim).
The compiler is based on the popular GNU C compiler, and is fully compliant with ANSI C. The entire package is available for execution on 286 or 386 machines under MS-DOS or XENIX, as well as the MIPS and SUN workstations, and IDT's MacStation single user workstation.
For any platform, IDT/c can be ordered with or without a software floating point library. A switch in the compiler determines whether floating point instructions will result in R3010 instructions in the object code or whether calls to the floating point library will be made instead.

[Figure: IDT/c System Flow]

DECEMBER 1990

RSD PB903/C/R
©1990 Integrated Device Technology, Inc.

DSC-8061/

IDT7RS903
IDT/c R3000 C-COMPILER

PRODUCT BRIEF

DESCRIPTION:
The IDT/c C-Compiler System is a complete development package for CPUs based on the R3000 architecture. It contains an optimizing cross compiler, scheduler optimizer, cross assembler, linker, and a downloader. The 'C' compiler is compliant with the ANSI 'C' standard and performs the optimizations available in state of the art 'C' compilers. The assembler supports the R3000 machine instructions and architecture described in the book by Gerry Kane, "MIPS RISC Architecture", including both native and synthetic instructions. The complete IDT/c package runs on a variety of host machines and operating systems and is part of IDT's cross development system tools which include other packages such as debug monitors and libraries.

Compiler
The C pre-processor is GNU cpp and the compiler itself is based on GNU C. All C-preprocessing features are supported. The combination of the compiler and assembler included in IDT/c has been tested for compliance to the ANSI 'C' standard using the Plum Hall test suite and is compliant.
C programs written for the MIPS C compiler may also be compiled without modification. The C compiler performs extensive optimization in multiple passes through the code. Each of the many optimization techniques can be individually switched on or off with compiler directives.

Optimizing Scheduler and Assembler
The IDT cross assembler input is compatible with source code written for the MIPS assembler. It implements the R3000 native instruction set as well as the augmented synthetic instructions defined in the "MIPS RISC ARCHITECTURE" book by Gerry Kane. There are some extensions in the IDT cross assembler that provide the programmer with more control over code generation, such as 'laiu' (load address upper) and 'oria' (load address lower), enabling direct programming in pure assembly language. The assembler produces .o files which are later linked together with other files to produce an executable file.
The scheduler first expands the synthetic instructions into the native instruction set. It then rearranges code to allow for and take advantage of the R3000 pipeline architecture. At the same time the scheduler analyzes loads of static constants and makes use of previously loaded constants that are close in value.

Memory description file
The memory description file is used to instruct the linker where to place object modules in the R3000 memory map. It tells the linker what address classes are legal, what addresses exist within those classes, and what addresses should be written to output files. The file consists of a sequence of class specifications (CODE, DATA, etc.) and associated address ranges.

Linker
The linker combines separately assembled program files into one object module. Command line switches may be used to override the memory description file.
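
The role of the memory description file can be pictured with a small model. The sketch below is not the IDT/c file format, which this brief does not show; it only illustrates the three things the text says the file conveys: which address classes are legal, which addresses exist in each class, and which ranges are written to output files. The names AddressRange, MEMORY_MAP, find_range and fits, and all addresses, are hypothetical.

```python
from typing import NamedTuple, Optional


class AddressRange(NamedTuple):
    cls: str      # address class, e.g. "CODE" or "DATA"
    start: int    # first legal address (inclusive)
    end: int      # last legal address (inclusive)
    emit: bool    # True if this range is written to the output file


# Hypothetical map: the class names come from the text, the addresses do not.
MEMORY_MAP = [
    AddressRange("CODE", 0xBFC00000, 0xBFC3FFFF, True),
    AddressRange("DATA", 0xA0000000, 0xA00FFFFF, False),
]


def find_range(cls: str, address: int) -> Optional[AddressRange]:
    """Return the range of class `cls` containing `address`, or None."""
    for r in MEMORY_MAP:
        if r.cls == cls and r.start <= address <= r.end:
            return r
    return None


def fits(cls: str, start: int, size: int) -> bool:
    """True if `size` bytes starting at `start` lie inside one range of `cls`."""
    r = find_range(cls, start)
    return r is not None and size > 0 and start + size - 1 <= r.end
```

In this model a linker would call something like fits("CODE", address, module_size) before placing a module, and would consult the emit flag when deciding what goes into the output file.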
The format of object code produced by the assembler in IDT/c is not compatible with the format produced by the MIPS assembler, so modules compiled by the MIPS software cannot be linked directly with modules compiled by IDT/c. Recompilation under IDT/c is required. Three output file formats are supported: S-Records, INTEL hex, and binary image. The S-Record files are useful for downloading to target boards. The INTEL hex format files are useful for EPROM programming because the linker allows the code to be divided into multiple files under this format.

Endianness
IDT/c is Big-Endian.

Floating Point Library
IDT/c may be ordered with a floating point library. A switch in the compiler is set at compile time to determine how the compiler should handle floating point instructions. In the normal mode, it will produce R3010 Floating Point Accelerator instructions in the object code. If the switch is set the other way, the compiler will insert calls to the floating point library instead, and the floating point library must be available at link time. Because the compiler knows about the library at compile time, it can perform optimizations not otherwise possible and keep the execution penalty for using software instead of hardware to about a factor of 4 in very floating-point-intensive code.

[Figure: IDT/c Flow]

OPTIMIZATION PASSES
Multiple optimization passes are performed by the GCC compiler. Below is a brief description of what takes place on each pass. Note that switches can be used in the compiler to turn individual optimization choices off or on, giving the programmer a great deal of control over how the compiler modifies the code.
Jump optimization
Simplifies jumps to the following instruction, jumps across jumps, and jumps to jumps, while deleting unreferenced labels and unreachable code.

Register scan and common subexpression elimination
Finds the first and last use of each register for purposes of subexpression elimination while performing constant propagation.

Loop optimization and strength reduction
Moves constant-expression code outside of dynamic loops.

Data flow analysis
The program is divided into basic blocks and the lifetime of values in registers is identified. Once this is done, code producing unused results and unreachable loops can be eliminated.

Local register allocation
Allocates registers to be used inside each basic block.

Global register allocation
Assigns registers for values which live across basic block boundaries.

Final pass
The final pass generates assembler code. At this point peephole optimizations are performed, and the function entry and exit code sequences are generated and optimized.

PERFORMANCE COMPARISONS

Execution
To obtain a measure of the efficiency of the IDT/c compiler, a set of benchmark programs was compiled under both IDT/c and the MIPS compiler, and the size and execution time of the resulting binaries were compared.

Execution Time Comparison
                        Code Size   Exec. Time
Compiled with MIPS C    1.0         1.0
Compiled with IDT/c     1.20        1.19

Compile Time
The time required to compile a program under IDT/c depends on the machine speed, type, and configuration. For comparative purposes, the Stanford benchmark was compiled under a variety of hosts and the results are shown below. For reference, the same program was also compiled using the MIPS compiler.

Compile Time Comparisons
Compile Time   Host
24 sec.        MIPS C on MIPS Machine
25 sec.        IDT/c on MIPS Machine
695 sec.       IDT/c on 10 MHz 286, MS-DOS
70 sec.        IDT/c on 25 MHz 386, Xenix

COMMAND LINE SWITCHES
-E: Pre-process only. A .S file is expected. The pre-processed file is written to the standard output.
-O: Optimize (GNU cc -O option).
-O1: Optimize even more (GNU cc options: -fstrength-reduce -fforce-addr -fforce-mem -fcombine-regs -finline-functions).
-c: Assemble only, do not link. Filenames with .s or .S suffixes are expected. Output files (in absence of -o) will have a .o suffix.
-ZA: Produce assembly listing.
-o xxx: Name output file. The default output name is 'out.sre'.
-ZL: Produce link map.
-Fxxx.xxx: Use xxx.xxx as memory layout description file. In absence of the -F option the default is to use file idt.mem in the default library directory.
-ZThhhhhhhh: Specify text loading address; hhhhhhhh is the address in hex, up to 8 hex digits. This will override .mem file definitions.
-ZDhhhhhhhh: Specify data loading address; hhhhhhhh is the address in hex, up to 8 hex digits. This will override .mem file definitions.
-e name: Use global 'name' as program start address.
-noenv: Do not include default library modules which define the order of program sections and global symbols that point to the beginning and end of text, data and bss.
-nostdlib: Do not include library for linking with IDT PROM monitor.

ASSEMBLER DIRECTIVES
.align n, n=1-3: align so that the n least significant bits of the address are 0.
.ascii "string": assemble string.
.bss: store following into bss section.
.byte arg,arg,...,arg: assemble arguments into consecutive bytes.
.data: store following into data section.
.end name: included for MIPS asm compatibility but ignored.
.ent name: included for MIPS asm compatibility but ignored.
.extern name,n: import symbol 'name' that refers to n bytes of storage (included for MIPS asm compatibility and ignored).
.globl name: export defined symbol.
.half arg,arg,...,arg: assemble arguments into consecutive halfwords.
.set argument: argument can be:
    at - error flag every use of $1.
    noat - disable errors due to user's usage of $1 (at).
    reorder - enable scheduling to resolve pipeline conflicts.
    noreorder - disable scheduling.
.space n: skip next n bytes, advancing location counter by n.
.text: store following into text section.
.word arg,arg,...,arg: assemble arguments into consecutive words.

SEGMENT
The SEGMENT directive selects the address segment where the following code or data will be stored. It is used to implement '.text', '.data' and '.bss', which are MIPS compatible segments. Using this directive the user can create other custom segments.

ORDERING INFORMATION
The IDT/c C-Compiler is an efficient R3000 C-compiler system based on the popular GNU C and hosted on a variety of computers. The IDT/c system includes the compiler, assembler, scheduler and linker. All PC versions of the software are shipped with both 1.2 MB floppy discs and 1.44 MB 3.5" diskettes. A "boxtop" single user license is included with the product.

Media, without Floating Point
The software listed below does not include the floating point library.

For 286 machine, MS-DOS: 7RS903BAF-N
    Not recommended for large, complex programs. At least 2MB RAM recommended. Requires DOS version 3.3 or greater.
For 386 machine, MS-DOS: 7RS903BBF-N
    Note: as of 10/1/90, this product is the same as the 286 MS-DOS version, but a performance enhancement is planned for early 1991. Registered users of this version of the software will automatically receive the enhanced software as soon as available.
For 286 machine, SCO Xenix: 7RS903BYX-N
For 386 machine, SCO Xenix: 7RS903BXX-N
For MIPS machine RISC/os, on QIC-24 TAR tape: 7RS903BUU-N
For MacStation, on Macintosh tape: 7RS903BMD-N
    Runs on MacStation R3000 board under IDT/ux.
For SUN SPARCstation, on QIC-24 TAR tape: 7RS903BWU-N

Media, with Floating Point Library
The software listed below includes the floating point library.

For 286 machine, MS-DOS: 7RS903FBAF-N
    Not recommended for large, complex programs. At least 2MB RAM recommended. Requires DOS version 3.3 or greater.
For 386 machine, MS-DOS: 7RS903FBBF-N
    Note: as of 10/1/90, this product is the same as the 286 MS-DOS version, but a performance enhancement is planned for early 1991. Registered users of this version of the software will automatically receive the enhanced software as soon as available.
For 286 machine, SCO Xenix: 7RS903FBYX-N
For 386 machine, SCO Xenix: 7RS903FBXX-N
For MIPS machine RISC/os, on QIC-24 TAR tape: 7RS903FBUU-N
For MacStation, on Macintosh tape: 7RS903FBMD-N
    Runs on MacStation R3000 board under IDT/ux.
For SUN SPARCstation, on QIC-24 TAR tape: 7RS903FBWU-N

Floating Point Upgrade
The version of the compiler without floating point may be upgraded to add the floating point library. To upgrade, contact your IDT sales office.
Indicate the order code and serial number of your original software on your order, so that we ship the correct format.

Floating Point Upgrade: 7RS905BZU

Maintenance
Maintenance: 7RS903BSV
    Includes free upgrades for one year and direct telephone contact for support.

CROSS ASSEMBLER FOR IBM PCs AND CLONES
IDT7RS904
Integrated Device Technology, Inc.

FEATURES:
• C-like pre-processor allows programmer to extend instruction set
• Provides for separate code segments loaded at different memory locations
• Outputs S-records and Intel HEX
• Provides symbolic and hex instruction listing
• Lists absolute addresses of code segments and symbols
• Interfaces to IDT System Integration Manager

DESCRIPTION:
IDT7RS904 is a cross assembler package consisting of a C-like pre-processor, an assembler, a linker that produces Motorola S-record or INTEL HEX downloadable files, and a downloader. It is intended for cross-development with the R3000 as target architecture. The assembler is compatible with files written for the MIPS assembler. The assembler supports the R3000 machine instructions and architecture described in the book by Gerry Kane, "MIPS RISC Architecture". The cross assembler package runs on a variety of host machines and operating systems. The pre-processor, assembler and linker are invoked by a single driver named "asr3k".

[Figure: Assembler Flow]

OCTOBER 1990

[Figure 1. Block Diagram of a Shared Memory Multiprocessor System: processors 1 to N, each with a local cache, connected through an interconnection network to memory modules 1 to M and to the I/O devices]

SHARED MEMORY MULTIPROCESSOR SYSTEMS
A simplified block diagram of a shared memory multiprocessor with local caches is shown in Figure 1. This model of a multiprocessor system is defined to be tightly coupled: the N processors are connected to M memory modules and P I/O devices via an interconnection network. All the processors have a local cache memory, share the same global address space and communicate via shared memory. The interconnection network ensures complete connectivity between the processors and memory modules and can be implemented as a simple shared bus, a multi-stage delta network or a more complex cross-bar switch. The global shared address space is assumed to be interleaved amongst the memory modules in order to minimize memory access conflicts. Note that the need for an interconnection network can be obviated by using a multi-port memory [1].

Examples of commercial machines employing a shared memory multiprocessor configuration using the R2000/3000 RISC processor include the Titan Graphics Supercomputer from Ardent Computers [2] and the 4D-MP Graphics Superworkstation from Silicon Graphics [3].

© 1988 Integrated Device Technology, Inc.

CACHE COHERENCY
The presence of local caches in a shared memory multiprocessor system introduces the issue of cache coherency, which, if not addressed, may result in data inconsistencies. The problem arises because several copies of the same data may exist in the local caches of different processors at the same time.
If one of the processors modifies (writes) the value of its copy of the data, then the other processors will have a stale or incorrect copy of the modified data in their local caches. This is a potential problem created by asynchronous parallel algorithms that do not have explicit synchronization. Data inconsistencies may also arise in multiprogrammed multiprocessor systems, whereby a suspended process may migrate to another processor and the most recently updated data of the process might still be in the original processor's local cache. When the process is run on the new processor, there is a possibility that stale data is used if the local cache was not previously flushed. This assumes that the process did run previously on this processor. It is clear that if data consistency is to be ensured in a multiprocessor system, cache coherency must be maintained.

Printed in the U.S.A. 01/89

USING THE IDT79R3000 IN A MULTIPROCESSOR ORGANIZATION
APPLICATION NOTE AN-28

[Figure 2. Block Diagram of a Dual IDT79R3000 Shared Memory Multiprocessor]

A static approach to maintaining cache coherency is to make all writeable data that is shared non-cacheable. This method ensures data consistency, but at the price of decreased performance and increased main memory conflicts. A dynamic approach to maintaining cache coherency is to allow multiple copies of shared writeable data to exist and rely on a cache coherence protocol between the processors to ensure cache consistency. Several cache coherence protocols have been proposed and implemented using both hardware [4] and software support [5]. The type of protocol used depends primarily on the interconnection network and the number of processors in the system.

A DUAL IDT79R3000 SHARED MEMORY MULTIPROCESSOR
A simplified block diagram of a dual IDT79R3000 shared memory multiprocessor is shown in Figure 2. A simple shared bus configuration was chosen for clarity.
The two processors are connected to the main memory and an I/O device via a common bus. Access to the shared bus is arbitrated by the bus control logic. Each processor has an instruction and data cache, and a write-through cache update policy is assumed, i.e. all writes to the cache are also immediately transmitted directly to main memory. Note that a write-back cache update policy (writes are done only to the cache, and main memory is updated when the cache line is replaced) would generate less memory traffic [10]. This is usually implemented when there are more than two processors in the system. Read and write buffers are included to provide a convenient asynchronous interface to the main memory. The snoop cache and control logic is used to implement a dynamic cache coherency check mechanism. For clarity, a very simple cache coherence protocol is chosen for the dual IDT79R3000 multiprocessor system and is described in detail below (more sophisticated and efficient schemes are described in [4], [5], [6], [7] & [8]).

[Figure 3. Processor Interface to the Snoop Control Logic]

CACHE COHERENCE PROTOCOL
Each snoop cache maintains a directory of the current entries in the local data cache (i.e. it contains the tags of all the current entries in the local data cache). Its primary function is to monitor the external memory bus for an address match. In addition, the snoop cache maintains state information for each data cache line. A cache (tag) line can be in one of three states: private, shared or invalid. Data that is exclusive to the processor is marked private, data that is common to the processors is marked shared and data that is inconsistent is marked invalid. The snoop cache is updated concurrently with the data cache.
Whenever processor 1 modifies or writes a line that is marked shared in its local cache, its snoop control logic signals processor 2 that a write to a shared line has occurred. The snoop control logic of processor 2 then interrogates its snoop cache to determine whether a copy of the modified data is present in the local data cache. If a copy is present, it is invalidated using the MP request and MP invalidate signals as shown in Figure 2, and the tag line in the snoop cache of processor 2 is marked invalid. The snoop control logic of processor 2 sends an acknowledge signal to processor 1, which then proceeds to complete its write operation to the shared location, i.e. it writes into the data cache as well as into the write buffer. It must be noted that the data value in the write buffer must be retired to the main memory before the write operation can be completed. This prevents possible data inconsistencies that may arise from processor 2 trying to read that particular main memory location before it is updated. This cache coherence protocol is also known as cross-interrogation. Note that this protocol is applicable only to cache lines that are marked shared, while writes to cache lines marked private are performed at the processor speed.

In the event of simultaneous writes to the same shared cache line by both processors, only one of the processors will successfully acquire the external bus (determined by the bus arbitration logic) to issue a cross-interrogation signal to the other processor. The write operation of the processor that did not acquire the external bus will result in a write miss. Figure 3 shows a typical processor interface to the snoop cache and control logic in more detail, and is also described below.

[Figure 4. Cache Invalidation Timing Diagram]

DYNAMIC CACHE COHERENCY CHECK MECHANISM
The signals at the snoop logic to processor interface include the MP request, MP invalidate, processor latch enable, invalidate latch enable and the invalidate address (address of the cache location to be invalidated). The snoop logic receives a cross-interrogation signal from the other processor when a write is performed to a shared cache line. It then searches its tags for an address match. If a match occurs, the address is captured in the invalidate address register, which is clocked by SysOut*, as shown in Figure 3. The CpCond(3) input (MP request signal) of the IDT79R3000 is then asserted, causing the 79R3000 to enter an MP stall. As there is no cache activity on the first cycle of an MP stall, the processor latch enable signal is deasserted and the invalidate latch enable is asserted in order to present the invalidate address to the data cache. After the first stall cycle, the CPU will issue DRd* pulses during every phase 2 and DClk (connected to the transparent latches) during every phase 1; this lasts until the end of the stall or until one cycle after the assertion of CpCond(2). This permits the snoop logic to read the data cache (Data and Tag values can be sampled on the falling edge of SysOut*) in order to determine whether an invalidation is to be performed. If the cache location is to be invalidated, the MP invalidate signal (connected to the CpCond(2) input of the 79R3000) is asserted. Invalidation occurs by the assertion of DWr* during phase 2 of the stall cycle, with an arbitrary invalid tag and arbitrary data value driven onto the Tag and Data buses. If CpCond(2) is deasserted while CpCond(3) is still asserted, the processor will return to issuing DRd* pulses to enable data cache reads. The cycle after CpCond(3) is deasserted contains no cache activity. This cycle is used to re-enable the processor's transparent latch and disable the invalidate transparent latch.
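The cross-interrogation protocol described above can be illustrated with a small behavioural model. The following Python sketch is not taken from the application note: the class and method names are hypothetical, the write buffer is modelled as retiring immediately, and the rules for how a line first becomes private or shared (and for marking the writer's line private after a cross-interrogated write) are assumptions added only so that the toy model stays consistent.

```python
from enum import Enum


class LineState(Enum):
    PRIVATE = "private"   # data exclusive to this processor
    SHARED = "shared"     # data common to both processors
    INVALID = "invalid"   # inconsistent data, must not be used


class Processor:
    """Hypothetical model of one CPU with a write-through data cache
    and a snoop directory holding one state per cached address."""

    def __init__(self, memory):
        self.memory = memory   # shared main memory: address -> value
        self.cache = {}        # local data cache: address -> value
        self.snoop = {}        # snoop directory: address -> LineState
        self.peer = None       # the other processor on the shared bus

    def state(self, addr):
        return self.snoop.get(addr, LineState.INVALID)

    def read(self, addr):
        if self.state(addr) is LineState.INVALID:
            # Read miss: refill from main memory.
            self.cache[addr] = self.memory[addr]
            # Assumption: the line is shared if the peer also holds it.
            if self.peer.state(addr) is not LineState.INVALID:
                self.peer.snoop[addr] = LineState.SHARED
                self.snoop[addr] = LineState.SHARED
            else:
                self.snoop[addr] = LineState.PRIVATE
        return self.cache[addr]

    def cross_interrogate(self, addr):
        """Called by the peer before it writes a line it does not own privately."""
        if self.state(addr) is not LineState.INVALID:
            self.cache.pop(addr, None)            # invalidate the data cache entry
            self.snoop[addr] = LineState.INVALID  # mark the snoop tag invalid
        return True                               # acknowledge to the writer

    def write(self, addr, value):
        if self.state(addr) is not LineState.PRIVATE:
            # Writes to private lines skip this step and run at processor speed.
            acknowledged = self.peer.cross_interrogate(addr)
            assert acknowledged
        self.cache[addr] = value
        self.memory[addr] = value                 # write-through, buffer retired
        # Assumption: the writer now holds the only valid copy.
        self.snoop[addr] = LineState.PRIVATE


memory = {0x100: 1}
p1, p2 = Processor(memory), Processor(memory)
p1.peer, p2.peer = p2, p1

p1.read(0x100)          # P1 holds the line privately
p2.read(0x100)          # both processors now share the line
p1.write(0x100, 7)      # P1 cross-interrogates, P2 invalidates its copy
print(p2.state(0x100))  # LineState.INVALID
print(p2.read(0x100))   # 7, refilled from main memory
```

The model deliberately ignores bus arbitration and the MP stall timing; it only shows the ordering constraint that the peer's copy is invalidated and acknowledged before the write completes and main memory is updated.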
A detailed timing diagram of a snapshot of the cache invalidation process is shown in Figure 4. This is a modified version of the timing diagram shown in [11]. Note that Figure 4 shows the minimal timing required. CpCond(2) is asserted two cycles after CpCond(3) is asserted and before the first DRd*. This implies that the data location is invalidated irrespective of the value being read. The symbol "cD" denotes that the cache drives the data and tag buses when CpCond(3) is asserted. The symbol "pD" denotes that the processor drives the data and tag buses when CpCond(2) is asserted. The snoop control logic, at this stage, must mark the tag line in its snoop cache as invalid and send an acknowledge signal to the other processor. This indicates that the cache invalidation is complete. If desired, more sophisticated and efficient invalidation schemes, such as techniques for block invalidation, could be implemented.

SECONDARY CACHE SCHEME
The cache to main memory interface described above could be made more efficient by using a system of multi-level caches [9], [12] to provide additional memory bandwidth. For instance, a secondary cache that is four times the size of the first level or primary cache could be implemented. The secondary cache is a superset of the primary cache and also includes state information to maintain cache coherency. The cache update policy is typically write-through from the primary to the secondary cache and write-back from the secondary cache to main memory. Since the primary cache is always a subset of the secondary cache, consistent data is guaranteed. This type of multi-level cache organization is implemented in the 4D-MP Graphics Superworkstation [3] made by Silicon Graphics.

CONCLUSION
Maintaining cache coherency is vital in shared memory multiprocessors. The implementation of the cache to main memory interface and the cache coherency protocol are critical issues.
The IDT79R3000 RISC processor provides features that facilitate the implementation of cache coherence check mechanisms with minimum hardware and is well suited for use in a shared memory multiprocessor environment.

REFERENCES
[1] K. Hwang & F. A. Briggs, Computer Architecture and Parallel Processing, McGraw-Hill, 1984, pp 459 - 525.
[2] T. Diede et al., "The Titan Graphics Supercomputer Architecture", IEEE Computer, Sept. 1988, pp 13 - 30.
[3] F. Baskett, T. Jermoluk & D. Solomon, "The 4D-MP Graphics Superworkstation: Computing + Graphics = 40 MIPS + 40 MFLOPS and 100,000 Lighted Polygons per second", Digest of Papers, COMPCON, Spring 1988, 33rd IEEE Computer Society Int. Conference, pp 468 - 471.
[4] J. Archibald & J. Baer, "Cache Coherence Protocols: Evaluation Using a Multiprocessor Simulation Model", ACM Transactions on Computer Systems, Vol. 4, No. 4, Nov. 1986, pp 273 - 298.
[5] A. J. Smith, "CPU Cache Consistency with Software Support and Using 'One Time Identifiers'", Technical Report 86/290, EECS Department, University of California, Berkeley, California 94720.
[6] M. Papamarcos & J. Patel, "A Low-Overhead Coherence Solution for Multiprocessors with Private Cache Memories", Proc. 11th Annual Int. Symp. on Comp. Arch., June 1984, pp 348 - 354.
[7] L. Rudolph & Z. Segall, "Dynamic Decentralized Cache Schemes for MIMD Parallel Architectures", Proc. 11th Annual Int. Symp. on Comp. Arch., June 1984, pp 340 - 347.
[8] P. Sweazey & A. J. Smith, "A Class of Compatible Cache Consistency Protocols and their Support by the IEEE Futurebus", Proc. 13th Annual Int. Symp. on Comp. Arch., June 1986, pp 414 - 423.
[9] A. J. Smith, "Design of CPU Cache Memories", Proceedings, IEEE TENCON, Korea, Aug. 1987, pp 1 - 10.
[10] A. J. Smith, "Cache Memories", ACM Computing Surveys, Vol. 14, No. 3, Sept. 1982, pp 474 - 530.
[11] Multi-processor Interface, MIPS R3000 Processor Interface, MIPS Computer Systems, May 23, 1988, pp 59 - 61.
[12] S. Przybylski, M. Horowitz & J. Hennessy, "Performance Tradeoffs in Cache Design", Proc. 15th Annual Int. Symp. on Comp. Arch., June 1988, pp 1 - 18.

APPLICATION NOTE AN-33
THE EFFECT OF BRANCH, LOAD AND STORE LATENCY ON RISC PROCESSOR PERFORMANCE
Integrated Device Technology, Inc.
By Roy M. Johnson

INTRODUCTION
One of the ways RISC processors attempt to achieve single-cycle execution is by pipelining instruction execution. Instruction flow through the pipeline can be impeded, however, by the latency effects of instructions such as loads, stores and branches. As a consequence of these latency effects, single-cycle instruction execution cannot be maintained without a loss of pipeline efficiency. This article uses a simple probabilistic model to measure the effect of load, store and branch instructions on pipeline performance in the Am29000, MC88100, SPARC and the R3000 RISC processors. The analysis shows that the pipeline architecture of the R3000 processor is able to minimize the latency effects of loads and branches to a greater extent than the other RISC processors and thereby maintain higher pipeline efficiency.

INSTRUCTION PIPELINE
An instruction pipeline can potentially increase the throughput of a processor by overlapping the execution of several different instructions. In other words, at any given time in an instruction pipeline, several instructions can be in different stages of execution. Figure 1 shows the stages of a typical instruction pipeline of a RISC processor with a load/store architecture (only load and store instructions access memory and all other operations are performed on registers). After the initial start up latency to fill the pipeline, an instruction theoretically can complete execution on every clock cycle, as shown in Figure 2.

[Figure 1. Stages of a Typical R3000 Instruction Pipeline]

However, because of data dependencies between instructions in the pipeline and latency effects of instructions such as loads and branches, single-cycle instruction execution cannot be maintained. For instance, in the case of conditional branch instructions, if the branch condition is evaluated in the ALU stage of the pipeline (as shown in Figure 1) at a time t, and if the branch condition is true, the target instruction can only be fetched at time t + 1. The instructions fetched at times t - 3, t - 2 and t - 1 (assuming sequential instruction fetch continues after the branch instruction is fetched) will have to be discarded or flushed from the pipeline. Thus, there are 3 cycles in which no useful work is done.

Similarly, in the case of a load instruction, because of finite memory access time, the operand of the load instruction will not be available for the next instruction in the pipeline (typically the instruction after the load uses the data of the load instruction). Therefore, the pipeline will have to be stalled while the memory request is satisfied. Store instructions can similarly stall the pipeline if both instructions and data use the same single port memory. To prevent memory access conflicts, the instruction after a store cannot be fetched until the store is completed.

The examples described above illustrate the loss in pipeline efficiency due to the latencies of branch and memory reference instructions. Techniques to improve pipeline efficiency by reducing the branch penalty include the use of branch target buffers [1], dynamic branch prediction strategies and delayed branching schemes [2]. Latency effects of memory reference instructions can be minimized by the use of a Harvard architecture with a split cache memory for instructions and data [3] and by instruction prefetching schemes. The effect of load, store and branch instructions on pipeline performance can be approximated using a simple probabilistic model.
This mathematical model is an extension of the one proposed by D. J. Lilja [4] for measuring the effect of branch penalty in pipelined processors. The model is described below and is used to compare the pipeline performance of the Am29000, MC88100, SPARC and R3000 RISC processors.

                      PIPELINE STAGE
CYCLE    IF     ID     OF     ALU    WB
  1      i                                   (start-up latency)
  2      i+1    i                                    "
  3      i+2    i+1    i                             "
  4      i+3    i+2    i+1    i                      "
  5      i+4    i+3    i+2    i+1    i       (instruction i completes)
  6      i+5    i+4    i+3    i+2    i+1     (instruction i+1 completes)
  7      ...    ...    ...    ...    ...     etc.

Figure 2. Instruction Flow Through the Pipeline

© 1989 Integrated Device Technology, Inc. Printed in the U.S.A. 7/89

PIPELINE PERFORMANCE MODEL

Consider a pipelined RISC processor in which all instructions except memory reference instructions and branches can be executed on every clock cycle, i.e. assume single cycle execution for all instructions except loads, stores and branches. Then the average number of cycles per instruction (CPIave) is given by:

CPIave = Pb(1 + b) + Pm(1 + m) + (1 - Pb - Pm)(1)    (1)

where
CPIave = average number of cycles per instruction
Pb = probability that an instruction is a branch
b = branch penalty (number of cycles wasted in the pipeline if an instruction is a branch)
Pm = probability that an instruction is a memory reference (load or store)
m = memory reference penalty (number of cycles wasted in the pipeline if an instruction is a memory reference, i.e. the memory reference latency)

Pipeline efficiency can then be computed as the ratio of the minimum number of cycles per instruction (the maximum performance case, which is 1 cycle for this model) to the average number of cycles per instruction:

Pipelineeff = 1 / CPIave    (2)

where Pipelineeff = the pipeline efficiency.

Note that if b = 0 and m = 0, i.e. there are no branch or memory reference penalties, then CPIave = 1 from Equation (1) and Pipelineeff = 1 from Equation (2). In other words, when the average number of cycles per instruction is 1, the pipeline operates at maximum efficiency.

The key assumptions of the pipeline model represented by Equation 1 are:

1) The execution of all branch and memory reference instructions will result in a pipeline penalty. This may not be true in all RISC architectures, especially for conditional branch instructions, where the pipeline penalty depends on whether or not the branch is taken. Similarly, the memory reference penalty depends on whether the instruction is a load or a store.

2) The model also assumes that a new instruction can be fetched on every clock cycle, i.e. a 100% instruction cache hit rate, which is the ideal case.

Equation 1 should be modified to accurately model a target architecture. The RISC architectures in this comparison (the Am29000, SPARC, MC88100 and R3000) use a delayed branching scheme. A processor with a branch delay of b cycles will always execute b instructions after a branch instruction, whether the branch is taken or not. The compiler (for a processor using delayed branches) determines the instruction dependencies and reorganizes the instruction stream to fill the branch delay slots with useful instructions. In many cases some or all of the b delay slots after the branch must be filled with no-ops. Gross and Hennessy [5] developed an algorithm for optimizing delayed branches in the MIPS project and have shown that the first branch delay slot can be filled with a useful instruction more than half the time, while subsequent delay slots are increasingly harder to fill.

Equation 1 can be modified for a RISC architecture with a delayed branching scheme with b branch delay slots as follows:

CPIave = Pb(1 + bPnop) + Pm(1 + m) + (1 - Pb - Pm)(1)    (3)

where
b = number of delay slots after a branch instruction
the effective number of cycles/branch = 1 + bPnop
the effective number of cycles/memory reference = 1 + m

If fi is the probability that delay slot i is filled with a useful instruction, then

Pnop = 1 - (f1 + f2 + ... + fb) / b    (4)

Thus, Pnop is the fraction of delay slots that do no useful work and thereby add to the branch penalty. Equation 3 can be further reduced to:

CPIave = 1 + bPbPnop + mPm    (5)

In other words, the average number of cycles per instruction is 1 plus a fraction due to the branch penalty plus a fraction due to the memory reference penalty. The pipeline model represented by Equation 3 will be used in the architectural comparisons made in the subsequent section.

INSTRUCTION MIX

The format of the instruction mix used in the performance comparisons is shown in Table 1. This instruction mix is representative of non-numeric code (no floating point operations included) and is obtained from a suite of integer benchmarks performed by MIPS Computer Systems. It is important to note that even though an instruction mix is program dependent, a fairly significant portion of the total number of instructions usually consists of load, store and branch instructions. In the analysis it is assumed that conditional branch instructions are taken 65% of the time [4] and that the instruction immediately following a load always uses the data of the load instruction.

INSTRUCTION TYPE        RELATIVE FREQUENCY
ALU operations          .55
Branch instructions     .15
Load instructions       .20
Store instructions      .10
TOTAL                   1.0

Table 1. Instruction Mix Format

PERFORMANCE COMPARISONS

This section uses the pipeline model represented by Equation 3 and the instruction mix described in Table 1 to compare the pipeline efficiencies of the Am29000, SPARC, MC88100 and R3000 RISC architectures.
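As a rough illustration of how the model is evaluated, the sketch below codes Equations 2 through 5 and applies them to the Table 1 mix. The function names are hypothetical, and the single delay slot, the 50% fill rate and the memory reference penalty m = 1 are example inputs chosen for illustration, not parameters of any particular processor.

```python
def p_nop(fill_probs):
    """Equation 4: fraction of branch delay slots that hold no-ops.

    fill_probs[i] is the probability that delay slot i+1 is filled
    with a useful instruction.
    """
    b = len(fill_probs)
    return 1.0 - sum(fill_probs) / b if b else 0.0


def cpi_eq3(p_b, b, pnop, p_m, m):
    """Equation 3: weighted sum over branch, memory and other instructions."""
    return p_b * (1 + b * pnop) + p_m * (1 + m) + (1 - p_b - p_m) * 1


def cpi_eq5(p_b, b, pnop, p_m, m):
    """Equation 5: the reduced form, 1 + b*Pb*Pnop + m*Pm."""
    return 1 + b * p_b * pnop + m * p_m


def pipeline_eff(cpi):
    """Equation 2: efficiency is the 1-cycle minimum over the average CPI."""
    return 1.0 / cpi


# Table 1 instruction mix.
P_BRANCH = 0.15
P_MEMORY = 0.20 + 0.10  # loads plus stores

# Illustrative inputs (assumptions): one delay slot filled 50% of the
# time, and a memory reference penalty of one cycle.
example_pnop = p_nop([0.5])
example_cpi = cpi_eq5(P_BRANCH, 1, example_pnop, P_MEMORY, m=1)

if __name__ == "__main__":
    print(f"Pnop = {example_pnop:.2f}")
    print(f"CPIave = {example_cpi:.3f}")
    print(f"Pipelineeff = {pipeline_eff(example_cpi):.3f}")
```

Equations 3 and 5 are kept as separate functions so that the algebraic reduction can be checked numerically for any set of inputs.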
It is assumed that the branch delay slot can be filled with a useful instruction 50% of the time in all cases [5].

Am29000

The Am29000 RISC processor has a 4 stage instruction pipeline, provides an interface to separate instruction and data caches and also includes an on-chip branch target cache [7]. The processor also includes a 4 deep instruction pre-fetch buffer. All branch instructions execute with a single branch-delay slot. If the branch is taken and the target address is found in the branch target cache, there is no extra branch penalty (except for the single delay slot). However, in case of a branch target cache miss, there is a penalty of at least 2 cycles, and the probability of a branch target buffer miss will be assumed to be .25 [4]. Load instructions execute with a latency of at least 1 cycle, and a store instruction will cause a pipeline hold of at least one cycle if overlapped with a load. Equation 3 can be modified to model the Am29000 and reduces to:

CPIave = 1 + bPbPnop + PbPtPmc + lPl + sPs    (6)

where
CPIave = average number of cycles per instruction
Pb = probability that an instruction is a branch (.15)
b = number of branch delay slots (1)
Pnop = probability that an instruction in a branch delay slot is a no-op (.5)
Pt = probability that branch is taken (.65)
Pm = probability of a miss in the branch target buffer (.25)
c = number of cycles wasted due to a branch target buffer miss (2)
Pl = probability that an instruction is a load (.2)
Ps = probability that an instruction is a store (.1)
l = load latency (1)
s = store latency (1)

effective cycles/load = 1 + l = 2
effective cycles/store = 1 + s = 2
effective cycles/branch = 1 + bPnop + PtPmc = 1.825

The numbers in parentheses are the actual values assigned to each of the factors.
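The arithmetic of Equation 6 can be checked with a short sketch using the values given in parentheses above. The function and argument names are hypothetical; only the numeric defaults come from the text.

```python
def am29000_model(p_b=0.15, b=1, p_nop=0.5, p_taken=0.65,
                  p_miss=0.25, miss_cycles=2,
                  p_load=0.2, load_latency=1,
                  p_store=0.1, store_latency=1):
    """Evaluate Equation 6 and return the derived figures as a dict."""
    # Extra cycles per branch: unfilled delay slots plus the penalty of a
    # branch target buffer miss on a taken branch.
    branch_extra = b * p_nop + p_taken * p_miss * miss_cycles
    cpi = 1 + p_b * branch_extra + load_latency * p_load + store_latency * p_store
    return {
        "cycles_per_load": 1 + load_latency,
        "cycles_per_store": 1 + store_latency,
        "cycles_per_branch": 1 + branch_extra,
        "cpi_ave": cpi,
        "pipeline_eff": 1.0 / cpi,  # Equation 2
    }


if __name__ == "__main__":
    for name, value in am29000_model().items():
        print(f"{name}: {value:.3f}")
```

Changing a single argument, for example p_miss, shows how sensitive the result is to the assumed branch target buffer miss rate.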
Using these values in Equation 6, the average number of cycles per instruction is computed to be:

CPIave = 1.424 cycles

and the pipeline efficiency is computed to be

Pipelineeff = 1 / 1.424 = .702 from Equation 2

Therefore, for the instruction mix described in Table 1, the pipeline operates at an efficiency of 70.2%.

SPARC

The SPARC processor has a 4 stage instruction pipeline, includes a large register file (136 general purpose registers) [8] and uses a register windowing scheme for parameter passing. It can be argued that in the case of a RISC architecture with a register windowing scheme, there will be relatively fewer loads and stores for a given instruction mix. The code generated, however, depends on the quality of the compiler. Morrison and Walker [13] have shown that for a given instruction mix the compiler for the SPARC architecture (the only architecture in this analysis that actually uses a register windowing scheme) reduced the percentage of loads and stores by 3-9% compared to the MIPS compiler. The MIPS optimizing compiler, however, generated fewer total instructions, sometimes up to 1/3 fewer than the SPARC compiler. The MIPS compiler therefore generated a smaller total number of loads and stores. The instruction mix in Table 1 is used for comparing the performance of the SPARC with the R3000.

The SPARC processor has a single address and data bus and uses a single cache for both instructions and data. The architecture includes a 2 deep instruction pre-fetch buffer and uses a value in the condition code register for conditional branching. A delayed branching scheme is implemented with a branch delay slot of 1 cycle. The target instruction of a branch is always fetched regardless of whether the branch is taken or not. In the case of a conditional branch that fails, there is an additional penalty to flush the pipeline and fetch the correct instruction sequence [6]. Loads take 2 cycles; the address of the operand is computed in the ALU stage (stage 3) of the pipeline and the cache access is made in stage 4. A new instruction cannot be fetched during stage 4 of a load instruction, as both instructions and data share the same memory, and therefore a load effectively takes 2 cycles. For similar reasons, store instructions take 3 cycles. If it is assumed that there is a branch penalty of at least 2 cycles for a conditional branch that fails (i.e. to flush the first two stages of the pipeline), then Equation 3 reduces to:

CPIave = 1 + Pb(bPnop + (1 - Pt)2) + lPl + sPs    (7)

where
Pb = probability that an instruction is a branch (.15)
b = number of branch delay slots (1)
Pnop = probability that an instruction in a branch delay slot is a no-op (.5)
Pt = probability that branch is taken (.65)
Pl = probability that an instruction is a load (.2)
Ps = probability that an instruction is a store (.1)
l = load latency (1)
s = store latency (2)

effective cycles/load = 1 + l = 2
effective cycles/store = 1 + s = 3
effective cycles/branch = 1 + bPnop + (1 - Pt)2 = 2.2

Using these values in Equation 7, the average number of cycles per instruction is computed to be:

CPIave = 1.58 cycles

and the pipeline efficiency is computed to be

Pipelineeff = 1 / 1.58 = .633 from Equation 2

Therefore, for the instruction mix described in Table 1, the pipeline operates at an efficiency of 63.3%.

MC88100

The Motorola MC88100 RISC processor includes 32 general purpose registers and uses a two-port, non-multiplexed memory access scheme (Harvard architecture). The processor contains a 2 stage instruction unit pipeline that fetches and supplies instructions to the integer or floating-point unit [9]. Data memory accesses are pipelined and controlled by the 3 stage data unit. Because the integer unit consists of a single stage, there are effectively 6 pipeline stages for an integer ALU operation performed on operands in memory. A delayed branching scheme is employed with a delay slot of 1 cycle. A separate adder for address calculations is also included so that branch target addresses are computed in parallel with instruction decoding. The branch penalty therefore is only 1 cycle (the single delay slot). The load latency is 2 cycles because the memory management unit is off-chip and additional time is needed for address translation; stores, on the other hand, have no latency and execute in a single cycle. Because loads and stores are pipelined, there are possible memory conflicts that will add to the memory reference penalty. It is assumed that the compiler can reorganize the instruction stream so that the first load delay slot (first load latency cycle) can be filled with a useful instruction 70% of the time and the second delay slot can be successfully filled 30% of the time. Therefore, the probability that the instructions in the load delay slots are no-ops can be computed using Equation 4. Equation 3 then reduces to:

CPIave = 1 + bPbPnopb + lPlPnopl    (8)

where
Pb = probability that an instruction is a branch (.15)
b = number of branch delay slots (1)
Pnopb = probability that an instruction in a branch delay slot is a no-op (.5)
Pl = probability that an instruction is a load (.2)
l = number of load delay slots (2)
Pnopl = probability that the instructions in the load delay slots are no-ops (.5)

effective cycles/load = 1 + lPnopl = 2
effective cycles/store = 1
effective cycles/branch = 1 + bPnopb = 1.5

Using these values in Equation 8, the average number of cycles per instruction is computed to be:

CPIave = 1.275 cycles

and the pipeline efficiency is computed to be

Pipelineeff = 1 / 1.275 = .784 from Equation 2

Therefore, for the instruction mix described in Table 1, the pipeline operates at an efficiency of 78.4%.

R3000

The R3000 RISC processor [10] has a 5-stage instruction pipeline and includes 32 general purpose registers. The processor has both an address and a data bus, which are cycled at twice the processor's clock frequency. This enables the processor to have separate instruction and data caches. The instruction cache is accessed during one phase of the processor cycle, while the data cache is accessed on the other phase [11]. All branch instructions execute with a delay slot of 1 cycle. The processor also includes a separate adder for address computations, which enables the target address of a branch instruction to be computed in parallel with the instruction decode phase of the pipeline [12]. The processor employs a fast compare scheme for conditional branches (it includes a separate comparator for equality, inequality and any relation-with-zero tests). Branch conditions, therefore, can be evaluated early in the pipeline and the ALU stage is not required for most conditional branches [2]. Thus, the branch penalty is 1 cycle (the branch delay slot), which the compiler successfully fills 50% of the time. Load instructions have a latency of 1 cycle (the address of the load is computed in stage 3, and the memory access is completed by the end of stage 4) and it is assumed that the compiler fills the load delay slot with a useful instruction 70% of the time. Store instructions have no latency and execute in a single cycle. Using these parameters, Equation 3 reduces to:

CPIave = 1 + bPbPnopb + lPlPnopl    (9)

where
Pb = probability that an instruction is a branch (.15)
b = number of branch delay slots (1)
Pnopb = probability that an instruction in a branch delay slot is a no-op (.5)
Pl = probability that an instruction is a load (.2)
l = number of load delay slots (1)
Pnopl = probability that the instruction in the load delay slot is a no-op (.3)

effective cycles/load = 1 + lPnopl = 1.3
effective cycles/store = 1
effective cycles/branch = 1 + bPnopb = 1.5

Using these values in Equation 9, the average number of cycles per instruction is computed to be:

CPIave = 1.135 cycles

and the pipeline efficiency is computed to be

Pipelineeff = 1 / 1.135 = .881 from Equation 2

Therefore, for the instruction mix described in Table 1, the pipeline operates at an efficiency of 88.1%.

CONCLUSION

The performance comparisons made above are summarized in Table 2 shown below. The effective number of cycles for branch and memory operations are also included. It is important to note that the CPIave and the Pipelineeff results shown in Table 2 pertain to the instruction mix described in Table 1. Any instruction mix that has a greater frequency of loads, stores or branches will increase the average number of cycles per instruction and thereby decrease the pipeline efficiency. The results show the detrimental effect of load, store and branch latencies on pipeline efficiency. The analysis shows that the R3000 maintains the highest pipeline efficiency for the given instruction mix. This is because the pipeline architecture of the R3000 is designed to minimize the latency effects of loads and branches. The branch latency in the R3000 is minimized by using a fast single cycle compare and branch and having the branch target address computed by the second stage of the pipeline. The load latency is minimized by having the cache control and tag compare logic on chip and accessing the instruction and data caches on alternate phases of the clock.

PROCESSOR                  Am29000   SPARC    MC88100   R3000
Effective cycles/load      2         2        2         1.3
Effective cycles/store     2         3        1         1
Effective cycles/branch    1.825     2.2      1.5       1.5
CPIave                     1.424     1.58     1.275     1.135
Pipelineeff                70.2%     63.3%    78.4%     88.1%

Table 2. Performance Comparison Summary

THE EFFECT OF BRANCH, LOAD AND STORE LATENCY ON RISC PROCESSOR PERFORMANCE
APPLICATION NOTE AN-33

[7] Am29000 Streamlined Instruction Processor User Manual, Advanced Micro Devices Inc., 1988, pp 4-1 to 4-16.
[8] SPARC MB86901 High Performance 32-bit RISC Processor, Product Description, Fujitsu Microelectronics Inc., Sept. 1988, pp 2-30.
[9] MC88100 RISC Microprocessor User's Manual, Motorola Inc., 1988.
[10] G. Kane, MIPS RISC Architecture, Prentice-Hall Inc., 1987, pp 1-7 to 2-15.
[11] T. Riordan and P. Ries, MIPS R3000 Processor Interface, MIPS Computer Systems Inc., Oct 27 1988, pp 4-7.
[12] T. Riordan et al, "The MIPS M2000 System", Proc.

The effect of instruction cache misses has not been taken into

Table 2. Capacitances of the Various Devices in a Typical IDT79R3000 System

Assumptions for Surface Mount Layout Design with x4 SRAMs

In the following sections, certain assumptions have been made while calculating the derating factors. These are as follows:

1) The trace has a capacitance of 2pF/inch.
2) Signals propagate at 2ns/foot (the speed of light in epoxy).
3) The IDT79R3000 speeds are specified with a loading of 25pF. For every additional 25pF, there is a delay of 1ns.
4) The distances between the IDT79R3000 and the latches are approximately 1 inch each.
5) The distances between the IDT79R3000 and the RAMs are approximately 4 inches each.
6) All of these assumptions presume that a surface mount package is used.

Figure 3 shows a brief mechanical layout of an IDT79R3000 board.
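Assumptions 1 to 3 above amount to a simple two-term derating rule: a capacitive term for load beyond the driver's rated load, and a propagation term for trace length. The following is a minimal sketch of that rule; the function name, its arguments and the example net are illustrative assumptions, not part of the application note.

```python
TRACE_PF_PER_INCH = 2.0   # assumption 1: trace capacitance
PROP_NS_PER_FOOT = 2.0    # assumption 2: propagation delay in epoxy
NS_PER_25PF_EXTRA = 1.0   # assumption 3: delay per extra 25pF of load


def derating_ns(trace_inches, device_loads_pf, rated_load_pf, path_inches=None):
    """Return (capacitive_delay, propagation_delay, total) in ns.

    trace_inches    : trace length that contributes capacitance
    device_loads_pf : input capacitance of each device on the net
    rated_load_pf   : load at which the driver's timing is specified
    path_inches     : longest signal path; defaults to trace_inches
    """
    total_pf = trace_inches * TRACE_PF_PER_INCH + sum(device_loads_pf)
    extra_pf = max(0.0, total_pf - rated_load_pf)  # no credit for light loads
    cap_delay = extra_pf / 25.0 * NS_PER_25PF_EXTRA
    if path_inches is None:
        path_inches = trace_inches
    prop_delay = path_inches / 12.0 * PROP_NS_PER_FOOT
    return cap_delay, prop_delay, cap_delay + prop_delay


if __name__ == "__main__":
    # Illustrative net: 4-inch trace, five 10pF inputs, driver rated at
    # 25pF, longest path 5 inches.
    cap, prop, total = derating_ns(4, [10] * 5, 25, path_inches=5)
    print(f"capacitive {cap:.2f}ns + propagation {prop:.2f}ns = {total:.2f}ns")
```

A net whose total load is at or below the rated load contributes only the propagation term, which is why the capacitive term is clamped at zero.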
Figure 3. Surface Mount Board of an IDT79R3000 System With Cache and Main Memory Interface and Approximate Distances Between the Various Devices

IDT79R3000 33MHz SPECIFICATION AND CACHE TIMING
APPLICATION NOTE AN-61

Address Bus Derating Calculations

For the system shown in Figure 4, each address bit is connected to five latches: one going to the main memory interface buffer, two to the instruction cache and tag memory, and two to the data cache and tag memory, respectively. The latches in turn are connected to the address pins on the static RAM. Figure 4 shows all the devices to which each address bit is connected.

Figure 4. Block Diagram Showing Various Devices Connected to One Address Bit

Trace length from the CPU to the latch = 4 inches    (1)
Capacitance of the trace, Ctrace = 4 x 2pF/inch = 8pF    (2)
Input capacitance of the 373 latch = 10pF    (3)
As each address bit is connected to five latches (IDT54/74FCT373),
Total input capacitance due to 5 devices, C373in = 5 x 10 = 50pF    (4)
Total capacitive load = Ctrace + C373in = 8 + 50 = 58pF    (5)
The rated IDT79R3000 load, CL(R3000) = 25pF    (6)
From Eq. (5) and Eq. (6),
Extra capacitive loading for the IDT79R3000 = 58 - 25 = 33pF    (7)

Let us now examine the capacitive loading between the latches and the RAM.

Path length from latches (IDT54/74FCT373s) to RAM (IDT7198s) = 3"    (8)
Trace capacitance from latch to RAM = 3 x 2pF/in = 6pF    (9)
Input capacitance of the RAM = 5pF    (10)
Each output from the latch is connected to eight RAM devices.
Load due to 8 devices = 8 x 5 = 40pF    (11)
Total capacitance = 40 + 6 = 46pF    (12)
The rated 373 load = 50pF    (13)

From Eq. (12) and Eq. (13) it can be seen that there is no delay due to the capacitive load between the latch and the RAM. However, there is a delay due to the capacitive load between the IDT79R3000 and the latch. This delay can be calculated as follows:

For every extra 25pF of load, there is a delay of 1ns.    (14)
From Eq. (7) and Eq. (14), delay due to the capacitive load = 33/25 = 1.32ns    (15)
The speed of light = 2ns/foot    (16)
For a maximum path length of 5", delay = 5"/12" x 2 ≈ 0.8ns    (17)
From Eq. (15) and Eq. (17),
Total propagation delay for the address bus, tAdrLo^d = 1.32 + 0.8 ≈ 2ns    (18)

Data Bus Derating Calculations

The derating calculations for the data path are similar to those done for the address path. The data bus is connected to the floating point unit (IDT79R3010), the instruction cache (IDT7198), the data cache (IDT7198), a read register (IDT54/74FCT374A) and a write register (IDT54/74FCT823). This is shown in Figure 5. Two cases must be considered: a data store and a data fetch. Both are discussed.

Figure 5. Block Diagram Showing Various Devices Connected to One Data Bit

Data Store (IDT79R3000 CPU Outputs Data):
Each data bit is connected to two RAMs (IDT7198s), one for instruction and one for data.

The path length for the data bus = 5"    (18)
Trace capacitance for the data bus = 5 x 2pF/in = 10pF    (19)
Capacitive loading due to devices, Cdevices = 2 x CRAMin + C374in + C823 + CR3010    (20)
Cdevices = 2 x 7 + 12 + 10 + 10 = 46pF    (21)
Total capacitive load = Ctrace + Cdevices = 10 + 46 = 56pF    (22)
Propagation delay due to speed of light = 5"/12" x 2 ≈ 0.8ns    (23)
Delay due to capacitive load = (56 - 25) / 25 = 1.24ns    (24)
From Eq. (23) and Eq. (24),
Total propagation delay on a store = 1.24 + 0.8 ≈ 2ns    (25)

Load (RAM Provides Data):
Since the trace length is the same, Ctrace = 10pF    (26)
Capacitive load due to devices, Cdevices = CR3000 + CR3010 + CRAMin + C374in + C823
Cdevices = 10 + 10 + 7 + 12 + 10 = 49pF    (27)
Total capacitance = Ctrace + Cdevices = 10 + 49 = 59pF    (28)
The RAM rated drive is 30pF.
Extra load = Total capacitance - RAM rated drive = 59 - 30 = 29pF    (29)
Propagation delay due to capacitive load = 29/25 = 1.16ns    (30)
Propagation delay due to the path length = 0.8ns
Total propagation delay = 1.16 + 0.8 ≈ 2ns    (31)

Read and Write Control Derating Calculations

The effect of the capacitance on the control signals from the IDT79R3000 processor to the caches and the memory interface is considered here. The control signals on the IDT79R3000 are IRd, DRd, IWr and DWr, which control the instruction cache read, data cache read, instruction cache write and data cache write, respectively. The read and write control signals are connected to the output enable (OE) and write enable (WE) of the instruction and data cache, respectively. Two control signals each are provided for the read and write operations of each of the caches. Assuming the use of a 16K x 4 IDT7198 static RAM, each control signal is connected to eight such static RAMs.
Number of devices (SRAM) connected to each control line = 8    (32)
Input capacitance of each device (SRAM) = 5pF    (33)
Total load capacitance = 5 x 8 = 40pF    (34)
Path length = 5"    (35)
Trace capacitance = 5 x 2pF/in = 10pF    (36)
Total capacitance = 40 + 10 = 50pF    (37)
Extra capacitive load = 50 - 25 = 25pF    (38)
Propagation delay due to capacitive load = 1ns    (39)
Propagation delay due to the trace length = 0.8ns    (40)
Total propagation delay = 1 + 0.8 ≈ 2ns    (41)

Assumptions for Through-hole Layout Design Using x4 SRAMs

In this section, the deratings are calculated for a through-hole design. Figure 6 shows an example of the layout of a through-hole design. This layout corresponds to an IDT79R3000 demonstration board used extensively at IDT. The data trace lengths are 10 inches and the address trace lengths are 9 inches.

Figure 6. Board Layout for a Through Hole Design of an IDT79R3000 Cache Subsystem

Address Derating Calculations

For the system shown in Figure 4, the number of devices connected to the IDT79R3000 is the same.

Trace length from the CPU to the latch = 9"    (42)
Trace capacitance, Ctrace = 9 x 2 = 18pF    (43)
Input capacitance of the latches = 10pF    (44)
Total capacitance = 5 x C373 + Ctrace = 5 x 10 + 18 = 68pF    (45)
Extra load on the R3000, CL = 68 - 25 = 43pF    (46)

Since the rated IDT54/74FCT373 load is 50pF, there is no derating factor between the IDT54/74FCT373 and the RAMs. Therefore the derating is between the IDT79R3000 and the latches.
Delay due to capacitance = 43/25 = 1.72ns    (47)
Propagation delay due to the trace length = 9/12 x 2 = 1.5ns    (48)
Total derating on the address bus = 1.72 + 1.5 ≈ 3ns    (49)

Derating on the Data Bus

As in the section entitled Data Bus Derating Calculations, the derating for the data bus is calculated for two cases: i) an instruction fetch and ii) a data store.

Data Store (IDT79R3000 CPU Outputs Data):
Each data bit is connected to two RAMs (IDT7198s), one for instruction and one for data.

The path length for the data bus = 10"    (50)
Trace capacitance for the data bus = 10 x 2pF/in = 20pF    (51)
Capacitive loading due to devices, Cdevices = 2 x CRAMin + C374in + C823 + CR3010    (52)
Cdevices = 2 x 7 + 12 + 10 + 10 = 46pF    (53)
Total capacitive load = Ctrace + Cdevices = 20 + 46 = 66pF    (54)
Propagation delay due to speed of light = 10"/12" x 2 = 1.67ns    (55)
Delay due to capacitive load = (66 - 25) / 25 = 1.64ns    (56)
From Eq. (55) and Eq. (56),
Total propagation delay on a store = 1.64 + 1.67 ≈ 3ns    (57)

Load (RAM Provides Data):
Since the trace length is the same, Ctrace = 20pF    (58)
Capacitive load due to devices, Cdevices = CR3000 + CR3010 + CRAMin + C374in + C823
Cdevices = 10 + 10 + 7 + 12 + 10 = 49pF    (59)
Total capacitance = Ctrace + Cdevices = 20 + 49 = 69pF    (60)
The RAM rated drive is 30pF.
Extra load = Total capacitance - RAM rated drive = 69 - 30 = 39pF    (61)
Propagation delay due to capacitive load = 39/25 = 1.56ns    (62)
Propagation delay due to the path length = 1.67ns
Total propagation delay = 1.56 + 1.67 ≈ 3ns    (63)

Read and Write Control Deratings

For a through-hole design, the effect of derating on the control signals will be greater. This section calculates that effect. The trace length from the CPU to the RAMs is 9 inches for the layout shown in Figure 6. Each control signal is connected to 8 devices.
Number of RAM devices connected to each control signal = 8    (64)
Input capacitance of each RAM = 5pF    (65)
Total load capacitance = 8 x 5 = 40pF    (66)
The trace length = 9"    (67)
Trace capacitance = 9 x 2pF/inch = 18pF    (68)
Total load capacitance = 40 + 18 = 58pF    (69)
Extra load = 58 - 25 = 33pF    (70)
Derating due to capacitive load = 33/25 = 1.32ns    (71)
Propagation delay due to trace length = 9/12 x 2ns/foot = 1.5ns    (72)
Total derating = 1.32 + 1.5 ≈ 3ns    (73)

Timing Equations for Cache Design

This section deals with the timing equations that enable us to determine the critical timing requirements of the static RAM that will be used as cache. These equations are based on the use of static RAMs (without built-in latches) as cache RAMs. The superscript 'd' in the following equations denotes the deratings to be taken into account. The static RAM chosen for illustration here is a 16K x 4 IDT7198. The board is assumed to be surface mount for all speeds of the IDT79R3000 except for the 16MHz speed grade. The deratings are 2ns for the surface mount board and 3ns for a through-hole board (which is used for the 16MHz IDT79R3000). The deratings were derived from certain assumptions. The explanation and the methodology used are set forth in the previous sections. Following is a generalized equation and the timing requirements for different frequencies of the IDT79R3000. All calculations are based on the IDT79R3000 specifications for the four speed versions (16, 20, 25 and 33MHz), which are found in the IDT data sheets. Figures 7, 8, 9, and 10 show the timing diagrams of the IDT79R3000 when it is doing a data store followed by an instruction fetch.
This is the worst case example and is chosen to determine the SRAM parameter requirements. Figure 7 shows the timing diagrams for an IDT79R3000 operating at 16MHz. Figures 8, 9 and 10 show the timing diagrams for an IDT79R3000 operating at 20MHz, 25MHz and 33MHz, respectively. The encircled numbers represent the equations presented in the section entitled Timing Equations for Cache Design. The timing diagrams, in conjunction with the equations, are used to determine the timing requirements.

[Timing diagram: 60ns Cycle Timing, STORE (Phase 2) and FETCH (Phase 1)]
Figure 7. Cache Timing Diagram for a 16MHz IDT79R3000

[Timing diagram: 50ns Cycle Timing]

[Timing diagram]
Figure 9. Cache Timing Diagram for a 25MHz IDT79R3000

[Timing diagram: 30ns Cycle Timing, STORE (Phase 2) and FETCH (Phase 1)]
Figure 10. Cache Timing Diagram for a 33MHz IDT79R3000

The following equations are used to determine the timing parameters for the static RAMs so that they can function as cache for different operating frequencies of the IDT79R3000. The numbers at the left correspond to the encircled numbers in the timing diagrams. Equations 9 and 10 are not shown in the timing diagram but are included for completeness. The equations also use some IDT79R3000 parameters. These are listed in Table 3.

(1) Internal Sample to Phase Delay
This is the time that the processor needs to sample the incoming data. Typically, for the IDT79R3000, tsmp ~ 5

(2) RAM Address Access Time
This equation is used to determine the Address Access time parameter requirements of the static RAM.
From the timing diagram of Figure 10, it is easily calculated. As an example, let us calculate the address access time for a 33MHz IDT79R3000. The total cycle time for a 33MHz IDT79R3000 is 30ns. If the processor's sample time requirement is met, the time remaining in the cycle is 24ns. In this time the data has to be presented to the processor. The processor requires a data set-up time of 4ns. There is also a propagation delay through the latch for the address bus. For the 33MHz part, a fast IDT54/74FCT373C is used, which has a maximum propagation delay of 4.2ns (see Table 4). The derating factors due to the capacitance and the trace length also have to be taken into account. Using all these factors, the equation is:

tRAMAA ≤ tcyc - tsmp - tDS - t373PD - tAdrLo^d - tRAMAA^d

16MHz IDT79R3000: tRAMAA ≤ 60 - 10 - 9 - 5.2 - 3 - 3;  tRAMAA ≤ 29.8
20MHz IDT79R3000: tRAMAA ≤ 50 - 8 - 8 - 5.2 - 2 - 2;  tRAMAA ≤ 24.8
25MHz IDT79R3000: tRAMAA ≤ 40 - 6 - 6 - 5.2 - 2 - 2;  tRAMAA ≤ 18.8
33MHz IDT79R3000: tRAMAA ≤ 30 - 4.5 - 4.5 - 4.7 - 2 - 2;  tRAMAA ≤ 12.3

(3) Cache Enable to Sample
This equation is used to determine the system output enable (tOES) requirements of the cache RAM and should meet the processor's set-up specification. The output enable time (tOE) specifications for the RAM are tested for a voltage change of 200mV (a fall from 1.732V to 1.532V for IDT RAMs). For a system, however, the voltage falls from approximately 3.3V to 1.5V. This fall time is usually a nanosecond. Therefore, the RAM specifications should take this system factor into consideration and specify the output enable time at least one nanosecond lower than the calculated timings.
tOES ≤ tcyc/2 - tRd^d - tDS - tsys-smp + tsys-rd - tOES^d

16MHz IDT79R3000: tOES ≤ 30 - 3 - 9 - 10 + 10 - 3;  tOES ≤ 15
20MHz IDT79R3000: tOES ≤ 25 - 2 - 8 - 8 + 8 - 2;  tOES ≤ 13
25MHz IDT79R3000: tOES ≤ 20 - 2 - 6 - 6 + 6 - 2;  tOES ≤ 10
33MHz IDT79R3000: tOES ≤ 15 - 2 - 4 - 4.5 + 4.5 - 2;  tOES ≤ 7

(4) Minimum Read Pulse Width
This timing requirement guarantees that the read pulse width generated by the processor is at least as long as the cache RAM output-enable time.

tOES ≤ tcyc/2 - tsys-rd - tOES^d

16MHz IDT79R3000: tOES ≤ 30 - 10 - 3;  tOES ≤ 17
20MHz IDT79R3000: tOES ≤ 25 - 8 - 2;  tOES ≤ 15
25MHz IDT79R3000: tOES ≤ 20 - 6 - 2;  tOES ≤ 12
33MHz IDT79R3000: tOES ≤ 15 - 4.5 - 2;  tOES ≤ 8.5

(5) Read-Write I-Cache Data Bus Contention
This timing requirement ensures that the RAM output is tristated soon enough after the instruction read signal goes high. In the worst case, when the processor performs a store operation, no data contention occurs.

tRAMHZ ≤ tsys - tRd^d + tDEn

16MHz IDT79R3000: tRAMHZ ≤ 16 - 3 + (-2.5);  tRAMHZ ≤ 10.5
20MHz IDT79R3000: tRAMHZ ≤ 14 - 2 + (-2);  tRAMHZ ≤ 10
25MHz IDT79R3000: tRAMHZ ≤ 12 - 2 + (-1.5);  tRAMHZ ≤ 8.5
33MHz IDT79R3000: tRAMHZ ≤ 9 - 2 + (-1);  tRAMHZ ≤ 6

(6) Processor Data Set-up to End of Write
This enables the designer to determine whether the cache RAMs have adequate data set-up time when the processor does a store operation. In the equation, the minimum derating is used on the write line, i.e. tWr^d, because that is the worst case assumption.
tRAMDS ≤ tcyc/2 - tsys-smp - tDVal - tDVal^d - tWr^d

16MHz IDT79R3000:
tRAMDS ≤ 30 - 10 - 3 - 3 - (-2)
tRAMDS ≤ 16

20MHz IDT79R3000:
tRAMDS ≤ 25 - 8 - 3 - 2 - (-1)
tRAMDS ≤ 13

25MHz IDT79R3000:
tRAMDS ≤ 20 - 6 - 2 - 2 - (-1)
tRAMDS ≤ 11

33MHz IDT79R3000:
tRAMDS ≤ 15 - 4.5 - 2 - 2 - (-1)
tRAMDS ≤ 7.5

(7) Data Hold from End of Write

This parameter requirement guarantees that the data hold from end of write of the cache RAM is met when the processor or the read buffer is writing to the RAMs.

tRAMHD ≤ tsmp-rd + tRAMLZ

16MHz IDT79R3000:
tRAMHD ≤ 0 + 2
tRAMHD ≤ 2

20MHz IDT79R3000:
tRAMHD ≤ 0 + 2
tRAMHD ≤ 2

25MHz IDT79R3000:
tRAMHD ≤ 0 + 2
tRAMHD ≤ 2

33MHz IDT79R3000:
tRAMHD ≤ 0 + 2
tRAMHD ≤ 2

(8) Data Set-Up to SysClk

This timing parameter ensures that the set-up time into an external register (for the main memory interface) is sufficient when the processor is doing a store. The data is clocked into the register on the rising edge of the buffered SysOut (through an inverting IDT54/74FCT240A). In this equation, tsys(min)^d is used to ensure worst case calculations.

tSetUpSys ≤ tcyc/2 - tsys - tDVal - tDVal^d + tsys^d + t240PDmin

16MHz IDT79R3000:
tSetUpSys ≤ 30 - 16 - 3 - 3 + 2 + 1.5
tSetUpSys ≤ 11.5

20MHz IDT79R3000:
tSetUpSys ≤ 25 - 14 - 3 - 2 + 1 + 1.5
tSetUpSys ≤ 8.5

25MHz IDT79R3000:
tSetUpSys ≤ 20 - 12 - 2 - 2 + 1 + 1.5
tSetUpSys ≤ 6.5

33MHz IDT79R3000:
tSetUpSys ≤ 15 - 9 - 2 - 2 + 1 + 1.5
tSetUpSys ≤ 4.5

(9) Data Hold from SysClk

This timing parameter is to guarantee that the hold time specification for an external register is met on a processor store. In this equation the minimum value of tRd^d is taken to ensure worst case numbers.
tHoldSys ≤ tsys-rd - tsys^d - t240PDmax + tRAMLZ + tRd^d

16MHz IDT79R3000:
tHoldSys ≤ 6 - 2 - 4.8 + 2 + 1
tHoldSys ≤ 2.2

20MHz IDT79R3000:
tHoldSys ≤ 6 - 1 - 4.8 + 2 + 1
tHoldSys ≤ 3.2

25MHz IDT79R3000:
tHoldSys ≤ 6 - 1 - 4.8 + 2 + 1
tHoldSys ≤ 3.2

33MHz IDT79R3000:
tHoldSys ≤ 4.5 - 1 - 4.8 + 2 + 1
tHoldSys ≤ 1.9

(10) Address Set-up to End of Write

This equation enables us to determine the timing requirement for the RAM so that the address set-up time is sufficient before the trailing edge of the write pulse.

tRAMAW ≤ tcyc - tsmp-sys - tAdrLo^d - t373PD + tWr^d

16MHz IDT79R3000:
tRAMAW ≤ 60 - 10 - 3 - 5.2 + 3
tRAMAW ≤ 44.8

20MHz IDT79R3000:
tRAMAW ≤ 50 - 8 - 2 - 5.2 + 2
tRAMAW ≤ 36.8

25MHz IDT79R3000:
tRAMAW ≤ 40 - 6 - 2 - 5.2 + 2
tRAMAW ≤ 28.8

33MHz IDT79R3000:
tRAMAW ≤ 30 - 4.5 - 2 - 4.7 + 2
tRAMAW ≤ 20.8

(11) Write Hold Pulse-Width

This requirement guarantees that the cache RAM's minimum write pulse width specification is met.

tRAMPW ≤ tcyc/2 - tWrDly

16MHz IDT79R3000:
tRAMPW ≤ 30 - 5
tRAMPW ≤ 25

20MHz IDT79R3000:
tRAMPW ≤ 25 - 4
tRAMPW ≤ 21

25MHz IDT79R3000:
tRAMPW ≤ 20 - 3
tRAMPW ≤ 17

33MHz IDT79R3000:
tRAMPW ≤ 15 - 2
tRAMPW ≤ 13

(12) Write Recovery Time

The write recovery time is the time between the write pulse going inactive and the change in address. This characteristic is usually specified by the SRAM manufacturer and is typically zero. This parameter is important in the IDT79R3000 cache interface and care must be taken to choose the proper part to prevent race conditions. In the IDT79R3000 cache design using the IDT7198 16K x 4 RAM, the latch enable is controlled by IClk/DClk and the write enable on the RAM is controlled by IWr/DWr. The timing diagram shows the relationship between the two clocks and the parameter tWR. Timing calculations below show that the write recovery specifications are not violated.
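As a cross-check of equations (10) and (11) above, the per-speed arithmetic can be tabulated with a short script. This is only a sketch: the dictionary keys and function names are illustrative and not part of the application note, and the input values are simply the numbers used in the worked examples for the x4 SRAM design with an external IDT54/74FCT373 latch.

```python
# Inputs in ns, copied from the worked examples of equations (10) and (11).
# The key names are illustrative labels, not official parameter symbols.
GRADES = {
    16: dict(tcyc=60, tsmp_sys=10, adrlo_d=3, t373pd=5.2, wr_d=3, wr_dly=5),
    20: dict(tcyc=50, tsmp_sys=8, adrlo_d=2, t373pd=5.2, wr_d=2, wr_dly=4),
    25: dict(tcyc=40, tsmp_sys=6, adrlo_d=2, t373pd=5.2, wr_d=2, wr_dly=3),
    33: dict(tcyc=30, tsmp_sys=4.5, adrlo_d=2, t373pd=4.7, wr_d=2, wr_dly=2),
}


def address_setup_to_end_of_write(g):
    """Equation (10): upper bound on the RAM tAW requirement, in ns."""
    value = g["tcyc"] - g["tsmp_sys"] - g["adrlo_d"] - g["t373pd"] + g["wr_d"]
    return round(value, 1)  # rounding hides binary floating-point noise


def write_pulse_width(g):
    """Equation (11): upper bound on the RAM tWP requirement, in ns."""
    return round(g["tcyc"] / 2 - g["wr_dly"], 1)


if __name__ == "__main__":
    for mhz, g in GRADES.items():
        print(f"{mhz}MHz: tRAMAW <= {address_setup_to_end_of_write(g)} ns, "
              f"tRAMPW <= {write_pulse_width(g)} ns")
```

Keeping the inputs in one table per speed grade makes it easy to rerun the budget when a different latch or board derating is assumed.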
Derating Calculations for DClk and DWr

To calculate the effect of derating on the control signals DClk and DWr, the following assumptions have been made:

1) The pin to pin variation on an IDT79R3000 device is 15% for a 50pF load. Under the maximum case, the deratings will vary from 1.7 to 2ns for DClk and DWr. Under the minimum case the deratings will vary from 0.58 to 0.625ns.
2) The trace length for the DWr signal is 6 inches.
3) The trace length for the DClk signal is 2 inches.
4) The trace length of the address bus to the RAM is 4 inches.
5) Each IClk control signal is connected to four IDT54/74FCT373 devices.

Figure 11. Circuit Showing IWr and IClk Signals to Latch and SRAM
(Diagram: a 2x MHz oscillator and delay line supply Clk2xSys, Clk2xSmp, Clk2xRd and Clk2xPhi to the IDT79R3000; IClk drives LE of the IDT54/74FCT373, IWr drives WE of the IDT7198 (16K x 4), and AdrLo passes through the latch to the RAM address inputs.)

The input and output capacitances for the IDT79R3000, IDT7198 and IDT54/74FCT373 can be obtained from Table 2. Figure 11 is a simple circuit showing the connections of IClk and IWr from the IDT79R3000 to the latch enable (LE) on the IDT54/74FCT373 device and write enable (WE) on the static RAM, respectively. Figure 12 shows the tWR timings with respect to the data cache in an IDT79R3000-based system.

Figure 12. Write Recovery Timing
(Timing diagram: DClk and DWr with their deratings tClk^d and tWr^d, AdrLo carrying the instruction and data addresses, and the latched AdrLo as seen by the RAM.)

To prove that the tWR parameter is not violated, the calculations are done as shown below. The derating effects on the DClk and AdrLo signal should exceed that of the DWr signal. These calculations are similar to the derating calculations in previous sections. The minimum propagation delay through the latch is considered. The derating on the DClk signal coming out of the IDT79R3000 is less than that of the DWr signal. The reverse case is superfluous and, in fact, improves the situation.
The minimum and worst case derating effects are shown below. The write recovery time parameter must not be violated over the entire operating range.

Capacitive Deratings (IDT79R3000 variation of 15%, 1.7ns - 2ns):

((Cdriver + Cload + Ctrace) - Crated)/25 * (MinOrMax) = CLD

Iclk: ((10 + 4 * 10 + 2 * 2) - 25)/25 * 1.7 = 1.97ns
Iwr: ((10 + 8 * 7 + 6 * 2) - 25)/25 * 2 = 4.24ns
RamAddr: ((12 + 8 * 7 + 4 * 2) - 50)/25 * 0.5 = 0.52ns

Calculations:
Race path 1: Iclkmin + tPD(le) + RamAdrmin = (1.97 + 2 + 0.52) = 4.49
Race path 2: Iwrmax = 4.24
Path1 - Path2 > tWR
4.49 - 4.24 > 0

Capacitive Deratings (IDT79R3000 variation of 15%, 0.58ns - 0.625ns):

((Cdriver + Cload + Ctrace) - Crated)/25 * (MinOrMax) = CLD

Iclk: ((10 + 4 * 10 + 2 * 2) - 25)/25 * 0.58 = 0.58ns
Iwr: ((10 + 8 * 7 + 6 * 2) - 25)/25 * 0.625 = 1.325ns
RamAddr: ((12 + 8 * 7 + 4 * 2) - 50)/25 * 0.5 = 0.52ns

Calculations:
Race path 1: Iclkmin + tPD(le) + RamAdrmin = (0.58 + 2 + 0.52) = 3.1
Race path 2: Iwr = 1.325
Path1 - Path2 > tWR
3.1 - 1.325 = 1.775 > 0

From the above calculations and the RAM timing Tables 4 and 5, it can be seen that the data set-up to the processor is met. The output enable of the RAM, which is controlled by IRd, goes high and the RAM output starts to go tri-state. From the figure, the reader may correctly question whether the hold time requirements of the IDT79R3000 are met. They are indeed met by the capacitance on the bus and also because CMOS devices are being used. The technical note entitled Meeting Bus Hold for the IDT79R3000 gives a more detailed explanation.

Table 4 gives the timing data sheet for a typical SRAM device. The timing parameters correspond to a particular RAM configuration. Other RAM devices may have different timings for some of the parameters; however, there are certain timings that must be met. These critical parameters are listed in Table 5; the unlisted parameters may vary slightly from device to device.
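The capacitive derating formula and the race-path comparison used in the write recovery check earlier in this section can be sketched in a few lines. The function and variable names here are illustrative assumptions; the capacitances, trace lengths and the 2ns minimum latch enable delay are the values used in the maximum-case calculation in the text.

```python
def capacitive_derating(c_driver, c_load, c_trace, c_rated, factor_ns):
    """((Cdriver + Cload + Ctrace) - Crated) / 25 * factor, result in ns.

    Capacitances are in pF; factor_ns is the "MinOrMax" derating per
    extra 25pF chosen for the case being analyzed.
    """
    return ((c_driver + c_load + c_trace) - c_rated) / 25 * factor_ns


# Maximum-case inputs taken from the worked example in the text.
iclk = capacitive_derating(10, 4 * 10, 2 * 2, 25, 1.7)     # four latch LE inputs, 2" trace
iwr = capacitive_derating(10, 8 * 7, 6 * 2, 25, 2.0)       # eight RAM WE inputs, 6" trace
ram_addr = capacitive_derating(12, 8 * 7, 4 * 2, 50, 0.5)  # latch output to eight RAMs, 4" trace

T_PD_LE_MIN = 2.0  # minimum latch enable delay of the 373 latch, ns
T_WR = 0.0         # write recovery time required by the RAM, ns (typically zero)

race_path_1 = iclk + T_PD_LE_MIN + ram_addr  # address change as seen at the RAM
race_path_2 = iwr                            # end of write pulse as seen at the RAM
margin = race_path_1 - race_path_2

if __name__ == "__main__":
    print(f"Iclk = {iclk:.2f} ns, Iwr = {iwr:.2f} ns, RamAddr = {ram_addr:.2f} ns")
    print(f"Path1 - Path2 = {margin:.2f} ns; tWR met: {margin > T_WR}")
```

The same function can be reused for the minimum case by substituting the minimum derating factors.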
II I 9.8 19 IDn9R3000 33MHz SPECIFICATION AND CACHE TIMING APPLICATION NOTE AN-61 AC ELECTRICAL CHARACTERISTICS - COMMERCIAL TEMPERATURE RANGE 16MHz Symbol Parameter Min. 20MHz Max. 25MHz Min. Max. Min. 33MHz Max. Min. Max. Unit Clock TCkHigh Input Clock High 12.5 - 10 - 6 - ns Input Clock Low 12.5 - 10 - 8 TCkLow 8 - 6 - ns TCkP Input Clock Period 30 500 25 500 20 500 15 200 ns Clk2xSys to Clk2xSmp 0 tcyc/4 0 tcyc/4 0 tcyc/4 0 tcyc/4 ns Clk2xSmp to Clk2xRd 0 tcyc/4 0 tcyc/4 0 tcyc/4 0 tcyc/4 ns Clk2xSmp to Clk2xPhi 9 tcyc/4 7 tcyc/4 5 tcyc/4 3.5 tcyc/4 ns - -1.5 -1 ns - 2 3 - 6 - 4.5 -2.5 -1.5 Run operation TDen Data Enable - -2 TDDis Data Disable -1 TDVal Data Valid - - -2 3 - 5 - 9 - 8 -2.5 -2.5 13 - CpBusy Hold -2.5 - -2.5 TAcTy Access Type [1 :01 7 TAT2 Access Type [21 - - TMWr Memory Write TWrDly Write Delay TDS Data Set-Up TDH Data Hold TCBS CpBusy Set-Up TCBH TExe 1 Exception 3 17 27 11 1 -1 4 -0.5 -0.5 ns 2 ns 2 ns ns -1.5 - ns - -2.5 - 6 - 5 - 4 14 - 12 - 8.5 ns 9.5 ns 3.5 ns 23 9 1 18 7 1 ns ns ns - 7 - 7 - 5 - 23 - 20 - 15 ns 23 18 - 10 ns - Stall Operation TSAVal Address Valid - 30 TSAcTy Access Type Valid - 27 TMRdl Memory Read Initiate TMRdT Memory Read Terminate 1 27 - 7 1 23 - 7 TSd Run Terminate 2 17 2 15 TRun Run Initiate - 7 - 6 TSMWr Memory Write TSEx Exception Valid TRST Reset Pulse Width TrstPLL Reset Timing. PLU 1) On Trstcp Reset Timing. PLL Off 1 27 - 20 1 - 23 18 1 2 1 18 5 11 4 18 - 15 6 - 1 2 1 - 10 ns 3.5 ns 8 ns 3 ns 9.5 ns 10 ns Reset Initialization 6 - 6 3000 - 3000 128 - 128 0.5 - 0.5 - 3000 128 6 - TckP 3000 - TckP 128 TckP Capacitive Load Derating Factor CLD Load Derate 1 0.5 1 0.5 1 ns/pF 28531b103 NOTE: 1. PLL: Phase Locked Loops. Table 3. IDT79R300 AC Specifications 9.8 20 1DT79R3000 33MHz SPECIFICATION AND CACHE TIMING APPLICATION NOTE AN~1· READ CYCLE TIMING SPECIFICATIONS 16.7MHZ Parameter Min. Max. 25.0MHz 20.0MHz Min. Max. Min. Max. 33.0MHz Min. 
tRC 30 - 25 - 20 - 12 tAA 30 - 19 30 - 25 tACS1 - 25 - 19 - tCLZ1 5 - 5 - 5 - 2 tOE - 15 - 13 - 10 - tOll 5 - 5 - 5 - 3 - 10 - 8 10 - 8 - tCHZ1 - 12 tOHZ - 10 tOH 5 - 5 30 - tPU tPO 0 - 0 25 5 0 - 20 Max. 12(1) 15 7 8 6 0 - 0 - - 15 28531b104 WRITE CYCLE TIMING SPECIFICATIONS 16.7MHZ Parameter Min. Max. 25 25 - 0 - 0 tWP 25 - 21 tWR1 0 0 - 0 tWR2 tWC 30 tCW1 25 tAW tAS 25.0MHz 20.0MHz Min. 21 21 0 Max. Min. Max. - 15 0 - 0 0 - - 20 17 17 0 17 tWHZ - 18 - 16 - tOW 16 - 13 - 11 tOH 0 - 0 tOW 5 - 5 33.0MHz Min. - 0 5 Max. 8 - 15 15 0 13 0 7 0 5 6 28531b105 NOTE: 1. This assumes that an IDT54174FCT373C with a tPD = 4.2ns is used. Table 4. Static RAM Read and Write Timings to Work as Cache With the IDT79R300 II 9.8 21 1DT79R3000 33MHz SPECIFICATION AND CACHE TIMING APPLICATION NOTE AN-61 READ CYCLE TIMING SPECIFICATIONS(1) 20.0MHz 16.7MHZ Parameter tRC tAA tACS1 Min. · - tCLZ1 · tOE - tOll · tCHZ1 - tOHZ - tOH tPU tPO · · - Max. 29.8 · - 15 - · 10.5 - · Min. · - · · - · ·- Max. 25.0MHz Min. · - - 24.8 · - · - - 13 · - · - 10 · · - · - Max. 18.8 · - 10 - · 8.5 - · 33.0MHz Min. · - · · - Max. 12.8 · - 7 - · - · · 6 - - * 28531b106 WRITE CYCLE TIMING SPECIFICATIONS(1) 16.7MHZ Parameter Min. · tWC tCW1 tAW · 44.8 tAS · tWP 25 tWR1· tWR2 · tWHZ - tOW 16 tOH tOW · · Max. - - · - - 25.0MHz 20.0MHz Min. · · · · 36.8 21 · - 13 · · Max. Min. - · · · 28.8 - · - 17 · · Max. - - - 11 · · 33.0MHz Min. · · · 19.8 13 * · Max. - · - 7 · · - 28531b107 NOTE: 1. Shown are the minimum or maximum parameters. Numbers not shown are not critical for the IDT79R3000 application. Table 5. Static RAM Parameters to Work as Cache with the IDT79R3000 9.8 22 IDTI9R3000 33M Hz SPECIFICATION AND CACHE TIMING APPLICATION NOTE AN-61. Parameter Load Symbol IDT54174FCT373A Propagation Delay 50 T373 PO - 5.2 IDT54174FCT373A Latch Enable Delay 50 T373LE 2 8.5 IDT54174FCT373A Latch Enable Hold 50 T373Hld 1.8 - IDT54174FCT240A Propagation Delay 50 T240PO 1.5 4.8 Min. Max. 
IDT54/74FCT373C Propagation Delay 50 T373PD 1.5 4.7
IDT54/74FCT240C Propagation Delay 50 T240PD 1.5 3.7

Table 6. Timing Parameters of IDT54/74FCT Logic Devices

Legend:
tRAMAA - RAM Access Time
tRAMOE - RAM Output Enable Time
tRAMHZ - RAM Output Low Impedance to Output in High Impedance
tRAMLZ - RAM Output in High Impedance to Output in Low Impedance
tRAMHD - RAM Data Hold Time
tDS - IDT79R3000 Data Set-up Time
tsys - Phase Difference between Clk2xSys and Clk2xPhi
trd - Phase Difference between Clk2xPhi and Clk2xRd
tsmp - Phase Difference between Clk2xPhi and Clk2xSmp
tcyc - Cycle time of the IDT79R3000
tsmp-rd = tsmp - trd
t240PD - Propagation delay from Clk to Output of IDT54/74FCT240A

USING X16 LATCHED RAMS AS CACHE FOR THE IDT79R3000 ON A SURFACE MOUNT DESIGN

In this section the RAM timings are calculated for a 4K x 16 IDT71586 which has the latches built in. For the static RAMs with latches built in, the address access times, tRAMAA, and the address set-up to end of write, tRAMAW, will change from those of a regular static RAM. The propagation delay due to the latches is eliminated, increasing the access time and the address set-up to end of write by about 5ns. In addition, the board layout is different because the distances from the CPU to the RAM are reduced. This decreases the derating factors by a finite amount.

Assumptions for Surface Mount Design Layout Using x16 Latched RAMs as Cache for the IDT79R3000

This section calculates the derating factors for an IDT79R3000 cache design using the IDT71586 as cache. These are the following assumptions:

1) The trace has a capacitance of 2pF/inch.
2) The speed of light is 2ns/foot in epoxy.
3) The rated load of the IDT79R3000 at which the timings are specified is 25pF. There is a 1ns derating for every additional 25pF.
4) The distances between the IDT79R3000 and the latches are approximately 5 inches.
5) The distances between the IDT79R3000 and the RAMs are approximately 2 inches each.
6) It is assumed that surface mount packages are used. The input capacitance of the RAMs is a typical value (7pF) for a PLCC package.

Derating Calculations Using IDT71586 as Cache RAMs

The derating factors for the IDT71586 cache RAMs follow the same methodology as explained in the section entitled Derating Calculations and Cache Timing Considerations Using x4 SRAMs. The cache size is 4K words for instruction and 4K words for data. The latches are eliminated. The derating factors for the address and data bus are calculated. Figure 13 shows a cache system for the IDT79R3000 with the latched RAMs (i.e., IDT71586 as the cache). There are a total of 8 such devices required for a 4K word size of instruction and data cache.

Figure 13. IDT71586 Used as Cache RAMs for the IDT79R3000, Cache Size = 4K Words
(Diagram: AdrLo (13:0) from the IDT79R3000 drives the Addr inputs of the IDT71586 4K x 16 devices; IRd drives OE, IWr drives W, and the Data and Tag buses connect to the RAM data pins. Figure notes: for each cache, i.e., Instruction or Data Cache, number of IDT71586s for Data and Tag = 8; size is 4KB.)

Figure 14. Surface Mount Board Layout of an IDT79R3000 System Using IDT71586 as Cache and Approximate Distances Between Devices
(Diagram: IDT79R3000 and IDT79R3010 flanked by the D-Cache and I-Cache data and tag RAMs and the registers, with spacings of approximately 1 to 2 inches.)

Address Bus Derating Calculations

Figure 14 shows an example layout of an IDT79R3000 surface mount design board using latched RAMs (IDT71586) as cache for the IDT79R3000 system. The distance between the IDT79R3000 data pins and the caches is about 2 inches. The total trace length for the address bus and the data bus is about 4 inches each.
Each AdrLo bus is connected to eight latched RAMs (i.e., the IDT71586) and the address latch for main memory writes and reads (Figure 15).

Figure 15. Number of Devices Connected to Address Bus
(Diagram: AdrLo from the IDT79R3000 fans out to the IDT71586 devices and to an IDT54/74FCT373A.)

Trace length from the CPU to the address latch (for main memory) = 4 inches - (74)
Capacitance of the trace = Ctrace = 4 x 2pF/inch = 8pF - (75)
Total input capacitance due to 8 devices = 8 x 7 = 56pF - (76)
Input capacitance of the 373 latch = 10pF - (77)
Total capacitance due to the load = 8 + 56 + 10 = 74pF - (78)
The rated IDT79R3000 load = 25pF - (79)
Extra loading on the IDT79R3000 = 74 - 25 = 49pF - (80)

The delay can be calculated as follows:

For every extra 25pF of load, there is a delay of 1ns - (81)
From Eq. 80 and Eq. 81, delay due to the capacitive load = 49/25 ≈ 2ns - (82)
The speed of light ≈ 2ns/foot - (83)
For a maximum path length of 3", delay = 3"/12" x 2 = 0.5ns - (84)
From Eq. 82 and Eq. 84, total propagation delay for the address bus = 2 + 0.5 ≈ 2.5ns - (85)

From the above calculations, it is seen that the derating on the address bus is 2.5ns.

Data Bus Derating Calculations

From Figure 16, it is seen that the data bus is connected to the floating point unit (IDT79R3010), two IDT71586 devices, one read register (IDT54/74FCT374A) and one write register (IDT54/74FCT823B). As in the previous section where we considered a 16K x 4 static RAM, we have to calculate the deratings for two cases: i) for an instruction fetch, and ii) for a data store.

Figure 16. Devices Connected to the IDT79R3000 Data Bus
(Diagram: data bit D0 of the IDT79R3000 connects to two IDT71586 devices (I and D caches), the IDT54/74FCT374A, the IDT54/74FCT823B and the IDT79R3010.)

Data Store (IDT79R3000 Outputs Data)

Each data bit is connected to two RAM devices, one for instruction and one for data.

The path length of the data bus = 4 inches - (86)
Trace capacitance of the data bus = 4 x 2pF/inch = 8pF - (87)
Capacitive loading on the data bus due to the different devices = 2 x CRAMin + CR3010in + C374out + C823in = 2 x 7 + 10 + 12 + 10 = 46pF - (88)
Total capacitive load = Cdevices + Ctrace = 46 + 8 = 54pF - (89)
Propagation delay due to speed of light = 4"/12" x 2 ≈ 0.7ns - (90)
Delay due to capacitive load = (54 - 25)/25 = 1.16ns - (91)
Total delay = 1.16 + 0.7 = 1.8ns ≈ 2ns - (92)

Load Data Into IDT79R3000 (RAM Outputs Data)

Since the trace length is the same, the trace capacitance Ctrace = 8pF - (93)
Capacitive load = CR3000in + CR3010in + C71586in + C374in + C823out - (94)
Cdevices = 10 + 10 + 7 + 12 + 10 = 49pF - (95)
Ctotal = 49 + 8 = 57pF - (96)
The RAM rated drive = 30pF - (97)
Propagation delay due to extra capacitive loading = (57 - 30)/25 = 1.08ns - (98)
Propagation delay due to path length = 0.8ns - (99)
Total propagation delay = 1.08 + 0.8 ≈ 2ns - (100)

Read and Write Control Derating Calculations

The effect of the capacitance on the control signals from the IDT79R3000 processor to the caches and the memory interface is considered here. The control signals on the IDT79R3000 are IRd, DRd, IWr and DWr, which control the instruction cache read, data cache read, instruction cache write and data cache write, respectively. The read and write control signals are connected to the output enable (OE) and write enable (WE) of the instruction and data cache, respectively. Two control signals each are provided for the read and write operations of each of the caches. Assuming the use of a 4K x 16 IDT71586 static RAM, each control signal is connected to 2 such static RAMs.

Number of devices (SRAM) connected to each control line = 2 - (101)
Input capacitance of each device (SRAM) = 5pF - (102)
Total load capacitance = 2 x 5 = 10pF - (103)
Path length = 4" - (104)
Trace capacitance = 4 x 2pF/in = 8pF - (105)
Total capacitance = 10 + 8 = 18pF - (106)

There is no extra capacitive loading here as the rated IDT79R3000 load is 25pF.

Propagation delay due to the trace length = 0.8ns - (107)
Total propagation delay = 0.8 ≈ 1ns - (108)

Timing Equations for Cache Design

This section contains the timing equations that enable us to determine the critical timing requirements of the static RAM that will be used as cache. These equations are based on the use of static RAMs without built-in latches as cache RAMs. The superscript 'd' in the following equations denotes the deratings to be taken into account. The static RAM chosen for illustration is a 4K x 16 IDT71586. The board is assumed to be surface mount for all speeds of the IDT79R3000. The deratings for the surface mount board are 2ns. The deratings were derived from certain assumptions. The explanation and the methodology used are given in the previous sections. Following is a generalized equation given by the timing requirements for different frequencies of the IDT79R3000. All calculations are based on the IDT79R3000 specifications for the four speed versions (16, 20, 25 and 33MHz), which are found in the IDT data sheets.

(1) Internal Sample to Phase Delay

This is the time that the processor needs to sample the incoming data. Typically, for the IDT79R3000, tsmp ~ 5.

(2) RAM Address Access Time

This equation is used to determine the address access time parameter requirements of the static RAM. From the timing diagram of Figure 16, it is easily calculated. The total cycle time for a 33MHz IDT79R3000 is 30ns. If the processor's sample time requirement is met, the time remaining in the cycle is 24ns, in which the data has to be presented to the processor. The processor requires a data set-up time of 4ns. The derating factors due to the capacitance and the trace length also have to be taken into account.
Using all these factors, the equation is:

tRAMAA ≤ tcyc - tsmp - tDS - tAdrLo^d - tRAMAA^d

16MHz IDT79R3000:
tRAMAA ≤ 60 - 10 - 9 - 3.5 - 3
tRAMAA ≤ 34.5

20MHz IDT79R3000:
tRAMAA ≤ 50 - 8 - 8 - 2.5 - 2
tRAMAA ≤ 29.5

25MHz IDT79R3000:
tRAMAA ≤ 40 - 6 - 6 - 2.5 - 2
tRAMAA ≤ 23.5

33MHz IDT79R3000:
tRAMAA ≤ 30 - 4.5 - 4.5 - 2.5 - 2
tRAMAA ≤ 16.5

(3) Cache Enable to Sample

This equation is used to determine the output enable requirements of the cache RAM and should meet the processor's set-up specification. The output enable time for the latched RAM is specified by the manufacturer and tested for a voltage change of 200mV (1.732V to 1.532V for IDT RAMs). For a system, the voltage falls from a level of 3.3V to 1.732V and the added fall time must be considered when specifying the RAM tOE parameter. This fall time is approximately one additional nanosecond. Therefore, the RAM tOE parameter should be one nanosecond lower than the calculated numbers below.

tOES ≤ tcyc/2 - tRd^d - tDS - tsys-smp + tsys-rd - tOES^d

16MHz IDT79R3000:
tOES ≤ 30 - 2 - 9 - 10 + 10 - 3
tOES ≤ 16

20MHz IDT79R3000:
tOES ≤ 25 - 1 - 8 - 8 + 8 - 2
tOES ≤ 14

25MHz IDT79R3000:
tOES ≤ 20 - 1 - 6 - 6 + 6 - 2
tOES ≤ 11

33MHz IDT79R3000:
tOES ≤ 15 - 1 - 4 - 4.5 + 4.5 - 2
tOES ≤ 8

(4) Minimum Read Pulse Width

This timing requirement guarantees that the read pulse width generated by the processor is at least as long as the cache RAM output enable time.

tOES ≤ tcyc/2 - tsys-rd - tOES^d

16MHz IDT79R3000:
tOES ≤ 30 - 10 - 3
tOES ≤ 17

20MHz IDT79R3000:
tOES ≤ 25 - 8 - 2
tOES ≤ 15

25MHz IDT79R3000:
tOES ≤ 20 - 6 - 2
tOES ≤ 12

33MHz IDT79R3000:
tOES ≤ 15 - 4.5 - 2
tOES ≤ 8.5

(5) Read-Write I-Cache Data Bus Contention

This timing requirement ensures that the RAM output is tri-stated soon enough after the instruction read signal goes high. In the worst case, when the processor performs a store operation, no data contention occurs.
tRAMHZ ≤ tsys - tRd^d + tDEn

16MHz IDT79R3000:
tRAMHZ ≤ 16 - 2 + (-2.5)
tRAMHZ ≤ 11.5

20MHz IDT79R3000:
tRAMHZ ≤ 14 - 1 + (-2)
tRAMHZ ≤ 11

25MHz IDT79R3000:
tRAMHZ ≤ 12 - 1 + (-1.5)
tRAMHZ ≤ 9.5

33MHz IDT79R3000:
tRAMHZ ≤ 9 - 1 + (-1)
tRAMHZ ≤ 7

(6) Processor Data Set-Up to End of Write

This enables the designer to determine whether the cache RAMs have adequate data set-up time when the processor does a store operation. In the equation, the minimum derating is used on the write line (i.e., tWr^d), as that is the worst case assumption.

tRAMDS ≤ tcyc/2 - tsys-smp - tDVal - tDVal^d - tWr^d

16MHz IDT79R3000:
tRAMDS ≤ 30 - 10 - 3 - 3 - (-2)
tRAMDS ≤ 16

20MHz IDT79R3000:
tRAMDS ≤ 25 - 8 - 3 - 2 - (-1)
tRAMDS ≤ 13

25MHz IDT79R3000:
tRAMDS ≤ 20 - 6 - 2 - 2 - (-1)
tRAMDS ≤ 11

33MHz IDT79R3000:
tRAMDS ≤ 15 - 4.5 - 2 - 2 - (-1)
tRAMDS ≤ 7.5

(7) Data Hold from End of Write

This parameter requirement guarantees that the data hold from end of write of the cache RAM is met when the processor or the read buffer is writing to the RAMs.

tRAMHD ≤ tsmp-rd + tRAMLZ

16MHz IDT79R3000:
tRAMHD ≤ 0 + 2
tRAMHD ≤ 2

20MHz IDT79R3000:
tRAMHD ≤ 0 + 2
tRAMHD ≤ 2

25MHz IDT79R3000:
tRAMHD ≤ 0 + 2
tRAMHD ≤ 2

33MHz IDT79R3000:
tRAMHD ≤ 0 + 2
tRAMHD ≤ 2

(8) Data Set-Up to SysClk

This timing parameter ensures that the set-up time into an external register (for the main memory interface) is sufficient for the case when the processor is doing a store. The data is clocked into the register on the rising edge of the buffered SysOut (through an inverting IDT54/74FCT240A). In this equation, tsys(min)^d is used to ensure worst case calculations.
tSetUpSys ≤ tcyc/2 - tsys - tDVal - tDVal^d + tsys^d + t240PDmin

16MHz IDT79R3000:
tSetUpSys ≤ 30 - 16 - 3 - 3 + 2 + 1.5
tSetUpSys ≤ 11.5

20MHz IDT79R3000:
tSetUpSys ≤ 25 - 12 - 3 - 2 + 1 + 1.5
tSetUpSys ≤ 10.5

25MHz IDT79R3000:
tSetUpSys ≤ 20 - 12 - 2 - 2 + 1 + 1.5
tSetUpSys ≤ 6.5

33MHz IDT79R3000:
tSetUpSys ≤ 15 - 9 - 2 - 2 + 1 + 1.5
tSetUpSys ≤ 4.5

(9) Data Hold from SysClk

This timing parameter is to guarantee that the hold time specification for an external register is met on a processor store. In this equation the minimum value of tRd^d is taken to ensure worst case numbers.

tHoldSys ≤ tsys-rd - tsys^d - t240PDmax + tRAMLZ + tRd^d

16MHz IDT79R3000:
tHoldSys ≤ 6 - 2 - 4.8 + 2 + 1
tHoldSys ≤ 2.2

20MHz IDT79R3000:
tHoldSys ≤ 6 - 1 - 4.8 + 2 + 1
tHoldSys ≤ 3.2

25MHz IDT79R3000:
tHoldSys ≤ 6 - 1 - 4.8 + 2 + 1
tHoldSys ≤ 3.2

33MHz IDT79R3000:
tHoldSys ≤ 4.5 - 1 - 4.8 + 2 + 1
tHoldSys ≤ 1.9

(10) Address Set-Up to End of Write

This equation enables us to determine the timing requirement for the RAM so that the address set-up time is sufficient before the trailing edge of the write pulse.

tRAMAW ≤ tcyc - tsmp-sys - tAdrLo^d + tWr^d

16MHz IDT79R3000:
tRAMAW ≤ 60 - 10 - 3.5 + 2
tRAMAW ≤ 48.5

20MHz IDT79R3000:
tRAMAW ≤ 50 - 8 - 2.5 + 1
tRAMAW ≤ 40.5

25MHz IDT79R3000:
tRAMAW ≤ 40 - 6 - 2.5 + 1
tRAMAW ≤ 32.5

33MHz IDT79R3000:
tRAMAW ≤ 30 - 4.5 - 2.5 + 1
tRAMAW ≤ 24

(11) Write Hold Pulse Width

This requirement guarantees that the cache RAM's minimum write pulse width specification is met.

tRAMPW ≤ tcyc/2 - tWrDly

16MHz IDT79R3000:
tRAMPW ≤ 30 - 5
tRAMPW ≤ 25

20MHz IDT79R3000:
tRAMPW ≤ 25 - 4
tRAMPW ≤ 21

25MHz IDT79R3000:
tRAMPW ≤ 20 - 3
tRAMPW ≤ 17

33MHz IDT79R3000:
tRAMPW ≤ 15 - 2
tRAMPW ≤ 13

From the above calculations and Figure 15, it can be seen that the data set-up to the processor is met. The output enable of the RAM, which is controlled by IRd, goes high and the RAM output starts to go tri-state.
From the figure, the reader may correctly question whether the hold time requirements of the IDT79R3000 are met. They are indeed met by the capacitance on the bus and also because CMOS devices are being used. The technical note entitled Meeting Bus Hold for the IDT79R3000 gives a more detailed explanation.

Figure 17. Address Setup and Hold Timing for a Latched RAM (25MHz IDT79R3000)
(Timing diagram: AdrLo carrying the instruction and data addresses across Phase 2 and Phase 1, Data, IRd and IClk, with tAS = 5 and tAH = 5 marked against the sys (0), smp/rd (6) and phi (12) clock edges.)

From Figure 17, it is clearly seen that the address setup and hold times for the latched RAMs are met by using IClk to capture the instruction address. Figure 17 illustrates the timings for a 25MHz IDT71586 latched RAM. Similar timing diagrams can be drawn to verify the setup and hold times for the IDT79R3000 operating at different frequencies.

Legend:
tRAMAA - RAM Access Time
tRAMOE - RAM Output Enable Time
tRAMHZ - RAM Output Low Impedance to Output in High Impedance
tRAMLZ - RAM Output in High Impedance to Output in Low Impedance
tRAMHD - RAM Data Hold Time
tDS - IDT79R3000 Data Set-up Time
tsys - Phase Difference between Clk2xSys and Clk2xPhi
trd - Phase Difference between Clk2xPhi and Clk2xRd
tsmp - Phase Difference between Clk2xPhi and Clk2xSmp
tcyc - Cycle time of the IDT79R3000
tsmp-rd = tsmp - trd
t240PD - Propagation delay from Clk to Output of IDT54/74FCT240A

READ CYCLE TIMING SPECIFICATIONS Description 25.0MHz 20.0MHz 16.7MHZ Parameter 33.0MHz Min. Max. 15 10 - - 5 -.:.. 4 - 5 - 4 - - 24 - 15 24 15 Min. Max. Min. Max. Min. - 30 - 25 - 30 35 Max. tRC Read Cycle 35 tCH ALEN High 10 tCl ALEN Low 10 tAS Add. Latch Set-Up 5 tAH Add.
Latch Hold 5 tM Address Access 35 tACE Chip Enable Access 16 - 14 11 - - 3 - 3 - 3 2 - 2 - 20 - 15 9 - - 10 10 5 5 10 tOE Output Enable - tCll CE to Out in LZ 3 tOll CE to Out in LZ 2 tCHZ CE to Out in HZ CE to Out in HZ 11 - 22 tOHZ - 25 11 - tOH Output Hold from Address Change 3 - 3 - 3 2 30 8 8 3 8 7 2853tbl09 Table 7. Read Cycle Timings for an lOT Static RAM with latches 9.8 32 IDT79R3000 33MHz SPECIFICATION AND CACHE TIMING APPLICATION NOTE AN-61 WRITE CYCLE TIMING SPECIFICATIONS 16.7MHZ Parameter Description Min. Max. - 20.0MHz Min. Max. 25.0MHz Min. - 25 10 10 5 5 - 25 0 17 20 tCW CE to End-of-Write tWR Write Recovery 35 10 10 5 5 35 0 25 25 0 tWHZ Write to Out in HZ - 15 - 15 - tOW Data Set-Up 16 tDH Data Hold tOW Out Active from End-of-Write 0 5 - 13 0 5 - 11 0 5 tWC Write Cycle tCH AlEN High tCl AlEN low tAS Add. latch Set-Up tAH Add. latch Hold tAW Add. to End-of-Write tASW Add. Set-Up tWP Write Pulse Width - 30 10 10 5 5 30 0 20 20 - - 0 - - - - - - 0 Max. 13 - - 33.0MHz Min. Max. 15 0 11 11 - 0 - 15 8 8 4 4 - 8 7 0 5 - 2853 !bllO Table 8. Write Cycle Timings for an IDT Static RAM with Latches 9.8 33 I DT79R3000 33M Hz SPECIFICATION AND CACHE TIMING APPLICATION NOTE AN-61 REFERENCES 1) Kane, Gerry, "mips RISC ARCHITECTURE," Prentice-Hall Inc., NJ, 1988 2) IDT RISC R3000 Family Hardware User Manual, October 1988 3) IDT RISC R3000 Family Data Sheets, 1988 . 4) IDT Data Book, 1988 5) IDT Data Book Supplement, 1989 6) Simha, Satyanarayana , "Meeting Bus Hold for the IDT79R3000 - Technical Note", Yet to be published 9.8 34 f~5 APPLICATION NOTE AN-62 R3000 FAMILY SOFTWARE TOOLS FOR PERFORMANCE ANALYSIS Integrated Device Technology, Inc. by V.S. Ramaprasad and Roy M. Johnson INTRODUCTION System design involves trade-offs between several components to achieve the desired price/performance ratio. Existing development tools of the IDT79R3000 enable the designer to measure and compare the price/performance of various configurations. 
cache2000, Pixie and Pixstats are software tools that allow the designer to analyze the performance of a simulated IDT79R3000 system by executing the designer's application program on the proposed system. Pixie and Pixstats are part of RISC/os, while cache2000 is distributed with the System Programmer's Package (SPP). SPP is a set of development tools that includes monitors, a standalone C compiler, a standalone I/O library, local and remote debugging tools, downloading software via RS232 and Ethernet, cache2000 (which simulates the memory subsystem), and complete system simulation software called SABLE. SABLE simulates IDT79R3000/3001 instructions, caches, TLB behavior, a disk drive and a DUART for a simple console interface. The simulation results of cache2000 and Pixstats provide the designer with all the information needed to determine the best possible price/performance solution.

MEMORY SUBSYSTEM
To meet the high processing speed of the MIPS RISC processors, the memory subsystem (Figure 1) is usually structured into a hierarchy of small high-speed cache memory, read/write buffers and large, slow main memory. The cache memory is typically made up of an instruction cache and a data cache. These cache memories allow the CPU to fetch one instruction and one word of data in every clock cycle. To retire the writes to the main memory in one clock cycle, it is common to use a write buffer as an interface between the high-speed CPU and the slower main memory. The write buffer functions as a FIFO of multiple levels and has the ability to perform special functions to minimize the main memory bus traffic. cache2000 allows the user to choose various parameters for these three basic blocks of the memory hierarchy and then analyze the performance of the proposed memory system.

[Figure 1. Typical Memory Subsystem. Blocks: IDT79R3000, Instruction Cache, Data Cache, Read/Write Buffers, Main Memory System. Signals: Tag, AddrLo, Data, Addr.]

CACHE2000
cache2000 is a software tool that simulates a proposed IDT79R3000/3001 memory subsystem. It analyzes the memory references made by an application program during its execution and generates various statistics about its dynamic behavior. cache2000 determines the execution time taken by the user's application program by simulating the penalties involved in accessing the memory. It does not, however, take into account any interlock cycles of the CPU or the FPA. These interlock cycles can be determined from the output of Pixstats. The main memory model simulated is Page mode, and the associated latencies are changeable. By simulating different memory subsystems with cache2000, the user can determine the performance of the application program on those systems and can arrive at an optimal solution.

©1990 Integrated Device Technology, Inc. 7/90

PIXIE
Pixie is a Unix program that generates a new executable of the user's code by adding instructions that trace pieces of code known as basic blocks. This new executable is referred to as the pixified executable. A basic block is a sequence of instructions with a single entry point and a single exit point. Once processing starts at the entry point, it will proceed until the exit point without branching. Along with the modified executable, Pixie will also generate a file (with suffix Addrs) that contains the addresses of these basic blocks. When the pixified executable is run (by typing its name), a new file with the execution count of each basic block (with suffix Counts) is generated. Programs like cache2000 and Pixstats use the Addrs and Counts files, along with the pixified executable, to generate formatted profile information. When using cache2000, the user's program must be pixified with the -idtrace option to generate memory referencing information of instructions and data. When using Pixstats, the user's program need not be pixified with any option.

PIXSTATS
Pixstats is also a Unix program that analyzes the program's execution characteristics. It generates statistics regarding the opcode frequencies, interlocks and a mini-profile of the program execution. To run Pixstats, the user's program needs to be pixified (by running Pixie) without any option. To generate the Counts file, the pixified program must be executed. Pixstats assumes that the user's program fits into the caches and, therefore, does not add any memory penalties. The integer multiply/divide interlock cycles and the floating point interlocks listed in the output of Pixstats need to be added to the number of cycles given by cache2000 to accurately determine the total number of cycles the user's program takes for execution. This execution time is with an FPA in the system, as the IDT79R3000/3001 compilers assume the presence of an FPA in the system at the time of code generation. If an FPA does not exist, the FPA instructions are emulated in software. Every floating point instruction will cause an exception and the exception handler will invoke the emulating software. A listing of the emulation overhead is provided in the following pages. The overhead and the number of floating point accelerator instructions contained in the Pixstats output will help the designer in estimating the overhead of a system without an FPA.

MODELING MEMORY SUBSYSTEMS WITH CACHE2000
To simulate different memory subsystems, cache2000 is modified to the desired parameters of the proposed system. Creating a cache2000 model to simulate the desired memory subsystem can be done either by editing the source file "cache2000.c", or by defining the parameters at Unix command level. All the possible changeable parameters can be assigned new values when the source file is edited. Once the source file is edited, it should be compiled to create the executable of cache2000. To compile cache2000.c, an include file called "trace.h" must be in the current directory. Command level modifications can be done at compilation time or at execution time of cache2000. The parameters of the caches can be changed by redefining them when invoking the compiler. The main memory latencies and the write buffer depth can be altered at the time of execution. (A complete list of all the options of cache2000, Pixie and Pixstats is enclosed at the end of this application note.) The options include:

(a) Editing the source file cache2000.c - Invoke an editor (vi or emacs) on cache2000.c and modify any changeable parameter. The list of changeable parameters is given at the end. Save the changes and compile the source file with the C compiler. To compile cache2000.c, the file trace.h should also be present in the current directory. The highest level of optimization (O4) is used to produce the most optimal executable. The math library is linked by using -lm. Once cache2000.c is modified with all the required parameter values, there is no need to use any options either at compile time or at run time.

    cc -O4 -o cache2000 cache2000.c -lm

(b) Making changes to cache2000 at compilation time by redefining the parameters

    cc -O4 -o cache2000 -Di_size_log=12 -Dd_size_log=10 cache2000.c -lm

This compiles a cache2000 that models an instruction cache of 4K words and a data cache of 1K words. The compile time changeable parameters of cache2000 are:

    Parameter Name         Possible Values
    i_refill_log           0, 2, 4, 5
    i_size_log             10, 11, 12, 13, 14, 15, 16
    d_refill_log           0, 2, 4, 5
    d_size_log             10, 11, 12, 13, 14, 15, 16
    byte_gathering         0, 1
    read_conflict_check    0, 1
    istreaming             0, 1

(c) Setting the parameters at the execution time of cache2000 - The memory latencies can be set at the time of simulation. cache2000 is executed in parallel with the pixified executable of the user's application program. The Unix command "makepipe" is used to run these two processes (cache2000 and the user's program) in parallel. The user's program (pixified with -idtrace), when executed, will generate memory referencing information to a Unix file with file descriptor 19. This information is piped (by the makepipe command) to the standard input file (file descriptor 0). cache2000 reads the standard input and proceeds with the memory subsystem simulation. An example of setting up the read latency to 6 cycles is:

    makepipe 19 my_prog.pixie '/' 0 cache2000 -read_latency 6 > my_prog.cache2000

Now cache2000 models a main memory with a read memory latency of 6 cycles.

STEPS IN PERFORMANCE ANALYSIS
The steps involved in measuring the performance of IDT79R3000/3001-based systems are:

(1) Compile the source of the application program with the desired level of optimization to create the executable module.

    cc -O4 -o my_prog my_prog.c

(2) Create a cache2000 executable to model the proposed memory subsystem by compiling cache2000.c.

    cc -O4 -o cache2000 -Di_size_log=12 -Dd_size_log=10 -Di_refill_log=5 -Distreaming=1 cache2000.c -lm

This compiles a cache2000 with an instruction cache of 4K words, a data cache of 1K words and 32 words of block refill for the instruction cache, with instruction streaming enabled.

(3) Pixify the application program with -idtrace.

    pixie -idtrace -o my_prog.pixie my_prog

The above command pixifies the application program my_prog and places the output in my_prog.pixie.

(4) Use makepipe to run the pixified application program and cache2000 in parallel.

    makepipe 19 my_prog.pixie '/' 0 cache2000 > my_prog.cache2000

This command pipes the memory references generated by my_prog.pixie to cache2000. The statistical information that cache2000 writes to the standard output is redirected to my_prog.cache2000. Arguments, an input file and an output file for the user's application program can be specified by:

    makepipe 19 my_prog.pixie args '<' input_file '>' output_file '/' 0 cache2000 > my_prog.cache2000

By running makepipe in the background, the designer can fire up more than one simulation. The following command sets up the read latency from the main memory to 6 cycles and runs the simulation in the background.

    makepipe 19 my_prog.pixie '/' 0 cache2000 -read_latency 6 > my_prog.cache2000 &

(5) Run Pixstats to find out the interlock cycles of the IDT79R3000/3001 integer multiply/divide unit and the interlock cycles of the FPA IDT79R3010. These interlock cycles should be added to the cycles from the cache2000 output. The output of Pixstats also contains the IDT79R3000/3001 opcode distribution. This provides the designer with the percentages of the FPA instructions in the application program. Running Pixstats is a three step process. The steps are:
(a) Repixify the application program without any option.
(b) Run the pixified program by typing in the name of the pixified output to generate the basic block counts file.
(c) Run Pixstats.

The commands to accomplish these tasks are:

(d) Pixify the application program.

    pixie -o my_prog.pixie my_prog

This generates the my_prog.Addrs file containing the basic block addresses.

(e) Run the pixified program.

    my_prog.pixie

This generates the my_prog.Counts file containing the execution count of the basic blocks.

(f) Run Pixstats.

    pixstats my_prog > my_prog.pixstats

Pixstats uses the Addrs and Counts files and redirects the output to my_prog.pixstats.

cache2000 OUTPUT
The cache2000 program prints out the statistical information to the standard output file. It can be redirected to any desired file. The statistical information is generated every time after the execution of a user-specified number of cycles.
By setting the print variable to an extremely large number (20000000000), the output generation can be restricted to the final results. The following is one such output file for a typical image processing application program written in C language. The executable size is 500Kbytes, the text segment is 300Kbytes and the data segment is 200Kbytes, consisting mainly of uninitialized data items of size greater than 512 bytes. The cache2000 models a memory subsystem of 8Kbytes of instruction cache, 8Kbytes of data cache, 4 words of block refill size for the I-cache and 1 word refill for the D-cache. Instruction streaming is enabled. The cache flushes are turned off to model an embedded application. The clock speed is 16.67MHz. The Page mode main memory read latency is 5 cycles, giving an access time of 300ns. Lines in the file are numbered in the first column for referencing in the explanation.

     1  cache2000 -flush 20000 -cycle 60 -clock 16.67 -wbsize 1 -read_latency 5:
     2  Fri Oct 13 21:39:05 1989
     3  73398701 cycles (1.569), 4.4s @ 16.7MHz
     4  46783524 instructions (46.8M)
     5  0 cache flushes
     6  word read                9008972    19.26%
     7  double read                    0     0.00%
     8  word write               4161532     8.90%
     9  double write                   0     0.00%
    10  byte write                445222     0.95%
    11  half write                438867     0.94%
    12  swr                            0     0.00%
    13  swl                           73     0.00%
    14  lwc1                      255500     0.55%
    15  ldc1                           0     0.00%
    16  swc1                       65635     0.14%
    17  sdc1                           0     0.00%
    18  basic block              8615556    18.42%
    19  (null)                         0     0.00%
    20  I-cache size = 2048 words, direct-mapped, 4 word refill
    21  D-cache size = 2048 words, direct-mapped, 1 word refill, write-through
    22  Write buffer = 1 deep, conflict checking on d-miss, byte gathering
    23  TLB size = 56 entries, associative, random replacement, page size = 1024 words
                                            (per instr/ per cycle/ per other)
    24  uTLB misses:              318004    (0.68%/ 0.43%)
    25  I-TLB misses:               9394    (0.02%/ 0.01%)
    26  D-TLB misses:               7927    (0.02%/ 0.01%/ 0.06%)
    27  I-cache misses:          2007056    (4.29%/ 2.73%)
    28  D-cache misses:           234278    (0.50%/ 0.32%/ 2.53%)
    29  Idle writes:             2171982    (4.64%/ 2.96%/ 42.5%) (5 memory cycles)
    30  Page mode writes:        2561693    (5.48%/ 3.49%/ 50.1%) (2 memory cycles)
    31  Non-page writes:          377581    (0.81%/ 0.51%/ 7.4%) (6 memory cycles)
    32  Total writes:            5111256    (10.93%/ 6.96%)
    33  Write merges:                  0    (0.00%/ 0.00%/ 0.00%)
    34  I-stream branch:          351599    (0.75%/ 0.48%/ 17.5%)
    35  I-stream d-miss:           31036    (0.07%/ 0.04%/ 1.5%)
    36  I-stream write:           252829    (0.54%/ 0.34%/ 12.6%)
    37  I-stream block:          1371592    (2.93%/ 1.87%/ 68.3%)
    38  I-stream words:      0.4/2.1/1.6
    39  uTLB miss cycles:         318004    (0.68%/ 0.43%) (penalty 1)
    40  I-TLB miss cycles:        122122    (0.26%/ 0.17%) (penalty 13)
    41  D-TLB miss cycles:        103051    (0.22%/ 0.14%) (penalty 13)
    42  I-cache miss cycles:    18063504    (38.61%/ 24.61%) (penalty 9)
    43  I-cache streaming:      -4155427    (-8.88%/ -5.66%)
    44  D-cache miss cycles:     1405668    (3.00%/ 1.92%) (penalty 6)
    45  2-cycle SB/SH/SWL/SWR:    884162    (1.89%/ 1.20%) (penalty 1)
    46  Write buffer full cycles: 6780214   (14.49%/ 9.24%) (average 1.3 per write)
    47  Write wait cycles:       3093879    (6.61%/ 4.22%) (average 1.4 per miss)
    48  Subtotal:               26615177    (56.89%/ 36.26%)
    49  Instructions:           46783524    (100.00%/ 63.74%)
    50  Total:                  73398701    (156.89%/ 100.00%)
    51  TLB memory cycles:        111030    (0.24%/ 0.15%)
    52  Memory bus cycles:      33976968    (72.63%/ 46.29%)
    53  simulation runtime: 248.4u 23.4s, 5473.6w/5%, 172 Kinst/s, 225 Krefs/s
    54  DONE!

EXPLANATION OF FIELDS

cache2000 Runtime Parameters
     1  cache2000 -flush 20000 -cycle 60 -clock 16.67 -wbsize 1 -read_latency 5:
     2  Fri Oct 13 21:39:05 1989

All the cache2000 runtime parameters that are set by the user are listed in this line. The cache flush interval is set to 20,000 million cycles. This implies that the caches are flushed for every 20 billion cycles of the user's program execution. In multiprogramming/multitasking systems, on context-switching, implicit cache flushing occurs due to a series of cache misses.
These cache misses are equivalent to caches being flushed. To simulate that environment, cache2000 allows the user to set the cache flush interval. By choosing a large number (like 20 billion) for this variable, this flushing can be turned off for embedded applications that do not run multiprogramming/multitasking software. The clock speed selected by the user is 16.67MHz and, therefore, the cycle time is 60 nanoseconds. The write buffer simulated is 1 word deep. The Page mode main memory read latency on load misses is 5 cycles or 300 nanoseconds.

Cycles and Instructions
     3  73398701 cycles (1.569), 4.4s @ 16.7MHz
     4  46783524 instructions (46.8M)

The user's application program takes 73398701 cycles for complete execution. To execute these cycles, the processor takes 4.4 seconds when running at 16.67MHz. It takes an average of 1.569 cycles per instruction for the simulation of the current memory subsystem and of the current user's application program. As the parameters for the memory subsystem change, this average number of cycles per instruction will also change. For a fine tuned memory subsystem this number approaches one. The clock speed can be set to the desired frequency with the -clock option at execution time of cache2000. Line 4 gives the number of instructions executed. The last field in line 4 is the number of instructions in millions. The number of instructions is independent of the memory system being simulated. It only depends on the user's program, the data files the program is using and the optimization level used in compiling the program.

Cache Flushes
     5  0 cache flushes

In the current simulation, the caches are flushed for every 20,000 million instructions to simulate an embedded application environment without any context switching. The cache flushes are zero for this example, as the program does not execute 20 billion instructions.

Reads
     6  word read                9008972    19.26%
     7  double read                    0     0.00%

These two lines give the number of read instructions and their percentage of the total number of instructions. In the current simulation, 19.26% of the total instructions are word (32 bits) load instructions. For cached data/instructions, the IDT79R3000 treats all load instructions as load word instructions since a word is read from the caches on cache hits. On a cache miss, a word or multiple words are read from the memory. The double reads on line 7 are not applicable to the IDT79R3000.

Writes
     8  word write               4161532     8.90%
     9  double write                   0     0.00%
    10  byte write                445222     0.95%
    11  half write                438867     0.94%
    12  swr                            0     0.00%
    13  swl                           73     0.00%

These lines list all the store instructions executed by the processor. All the load/store instructions work only with the data cache if the load/store address is cached. In this example, almost 9% of the instructions are store word instructions. This is almost half of the word read percentage, which is typical of many applications. Double writes are not applicable to the IDT79R3000. For partial word writes (sb, sh, swr, swl), the IDT79R3000 does a read-modify-write operation. The processor does a load from the cache (on cache hit) at the store address, merges the data to be stored with the data loaded and writes the result back to the cache. On a cache miss for the load, the partial word is written to the main memory (through a write buffer), leaving the cache untouched.

FPA (cp1) Reads and Writes
    14  lwc1                      255500     0.55%
    15  ldc1                           0     0.00%
    16  swc1                       65635     0.14%
    17  sdc1                           0     0.00%

The number of reads and writes to/from the FPA (coprocessor 1) are given by the above four lines. The double word load and store (ldc1, sdc1) are not applicable to the IDT79R3000. The coprocessor load/store instructions are word operations. The CPU and the coprocessors are tightly coupled as they share the same data bus. The lwc1 and swc1 instructions are executed by the IDT79R3000. For the current user's program, there are 255500 FPA load instructions. The percentage of these loads to the total number of instructions is 0.55. The number of store instructions is 65635 and 0.14 is the percentage to the total number of instructions.

Basic Blocks
    18  basic block              8615556    18.42%

A basic block is a sequence of instructions with a single entry point and a single exit point. Once execution starts at the entry point instruction, it proceeds sequentially until the exit point instruction. Entry points are the target addresses of a jump/branch instruction and exit points are addresses holding a jump/branch instruction. The basic block count on line 18 indicates the number of basic blocks of the user's application program executed.

Annulled Instructions
    19  (null)                         0     0.00%

Not applicable for the IDT79R3000.

Instruction and Data Caches
    20  I-cache size = 2048 words, direct-mapped, 4 word refill
    21  D-cache size = 2048 words, direct-mapped, 1 word refill, write-through

The instruction cache and the data cache for the IDT79R3000 are direct-mapped. For stores, the data is written to the main memory through the data cache, implementing the write-through policy. For partial word stores, the data cache is looked up for a cache hit. If it misses, the partial word is written to the main memory leaving the cache unchanged. The I-cache size, D-cache size, I-cache refill size and D-cache refill size can be chosen by the user. In the current simulation, the caches are 2048 words deep (8Kbytes) and the block refill sizes for I-cache and D-cache are 4 words and 1 word, respectively. These parameters can be set at compilation time of cache2000. The possible cache sizes are 4, 8, 16, 32, 64, 128 and 256 Kbytes. The block refill sizes can be selected as 1, 4, 8, 16 or 32 words.

Micro TLB Misses
    24  uTLB misses:              318004    (0.68%/ 0.43%)

In translating the virtual address of an instruction to its physical address, a two-entry table called uTLB (micro TLB) is employed. This micro TLB is not used for data references. If a uTLB miss is encountered, the TLB (56 entries) is looked up. Whenever there is a uTLB miss, its least recently used entry is updated with the appropriate entry from the main TLB. The percentage of uTLB misses to the total number of instructions and the percentage of the misses to the total number of cycles are given in parentheses.

TLB Misses
    25  I-TLB misses:               9394    (0.02%/ 0.01%)
    26  D-TLB misses:               7927    (0.02%/ 0.01%/ 0.06%)

The above two lines account for the number of times the mapping information needed (a TLB entry) to translate the virtual address to a physical address is missing from the TLB. The first line gives the number of misses for instructions while the second one accounts for the data references. The main memory holds the entire page table and, on a TLB miss, this page table is looked up in order to update the TLB. The TLB is a fully associative memory and a random replacement algorithm is employed for TLB updates from the page table. The last number in parentheses for D-TLB misses is the percentage of the D-TLB misses to the total number of load and store instructions.

Cache Misses
    27  I-cache misses:          2007056    (4.29%/ 2.73%)
    28  D-cache misses:           234278    (0.50%/ 0.32%/ 2.53%)

The I-cache misses are the number of times the processor referred to the instruction cache and could not find the instruction. The number of misses on line 27 indicates that the 2K words of I-cache are not quite sufficient for a 100% hit rate. Increasing the cache size might enhance the performance. The first percentage, 4.29%, is the I-cache miss rate, making the I-cache hit rate 95.71%.
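The "per instr" and "per cycle" percentages quoted so far can be cross-checked directly from the counts in the example output. The following is a minimal Python sketch of that arithmetic, not part of cache2000; the function and variable names are illustrative assumptions, and the counts are copied from lines 3, 4, 24 and 27 of the listing.

```python
# Counts copied from the example cache2000 output in this note.
INSTRUCTIONS = 46_783_524   # line 4
CYCLES = 73_398_701         # line 3


def per_instr_and_per_cycle(count):
    """Return (percent of instructions, percent of cycles), rounded to 2 places."""
    return (round(100.0 * count / INSTRUCTIONS, 2),
            round(100.0 * count / CYCLES, 2))


# Line 24: uTLB misses; line 27: I-cache misses.
utlb_pct = per_instr_and_per_cycle(318_004)
icache_pct = per_instr_and_per_cycle(2_007_056)

# Average cycles per instruction, as printed in parentheses on line 3.
cycles_per_instruction = round(CYCLES / INSTRUCTIONS, 3)

# The I-cache hit rate is the complement of the per-instruction miss rate.
icache_hit_rate = round(100.0 - icache_pct[0], 2)

print(utlb_pct, icache_pct, cycles_per_instruction, icache_hit_rate)
```

The same helper applies to any count line whose parentheses hold a per-instruction and a per-cycle percentage.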
The D-cache misses are the number of attempts the processor made to read from the data cache but could not locate the word. The data cache miss rate is given by the last percentage in parentheses (2.53%), which is the percentage of D-cache misses to the sum of the number of load word instructions (lw) and the number of load word instructions to coprocessor 1 (lwc1). The data cache misses, however, do not include the load misses occurring in read-modify-write operations of partial word stores.

Main Memory Writes
    29  Idle writes:             2171982    (4.64%/ 2.96%/ 42.5%) (5 memory cycles)
    30  Page mode writes:        2561693    (5.48%/ 3.49%/ 50.1%) (2 memory cycles)
    31  Non-page writes:          377581    (0.81%/ 0.51%/ 7.4%) (6 memory cycles)
    32  Total writes:            5111256    (10.93%/ 6.96%)
    33  Write merges:                  0    (0.00%/ 0.00%/ 0.00%)

cache2000 simulates Page mode main memory. All the different types of writes to the main memory are listed here. Idle writes are writes to the main memory that do not follow or precede another write. Page mode writes are successive writes to the same page. Non-page writes are successive writes to different pages. Total writes are the summation of these three types of writes. The third percentage in the first set of parentheses on lines 29 through 31 is the percentage of the corresponding write to the total writes. The number of memory cycles assumed in the current simulation to perform these writes is given in the second set of parentheses. These cycles can be specified by the user at run time of cache2000. Write merges involve merging the words in the write buffer if the words have the same address. These write merges are simulated by cache2000 if byte gathering is turned on. Byte gathering is one of the features of the write buffer, IDT79R3020. Write merges reduce the number of references made to the main memory for data updates. In this simulation, with a 1 word deep write buffer and byte gathering turned on, the number of write merges is zero.

Streaming
    34  I-stream branch:          351599    (0.75%/ 0.48%/ 17.5%)
    35  I-stream d-miss:           31036    (0.07%/ 0.04%/ 1.5%)
    36  I-stream write:           252829    (0.54%/ 0.34%/ 12.6%)
    37  I-stream block:          1371592    (2.93%/ 1.87%/ 68.3%)
    38  I-stream words:      0.4/2.1/1.6

On every I-cache miss, a block refill number of instructions is brought into the instruction cache. Enabling streaming allows the CPU to execute these instructions as they get written into the instruction cache. In other words, the CPU does not have to wait for the completion of the writing of a block of instructions into the instruction cache before it starts the execution. Line 37 gives the number of times the CPU streamed through a full block. Lines 34 to 36 list the number of times the processor dropped out of streaming due to branches, data cache misses, write buffer flushes and partial word stores. Line 34 gives the number of occasions the processor dropped out of streaming because of executing a branch/jump instruction. If the CPU is executing a load and faces a data cache miss, streaming is aborted; Line 35 accounts for such cases. The processor also drops out of streaming whenever the write buffer is full and the current instruction needs to write to the write buffer, or whenever it has to perform a read-modify-write operation for partial word store instructions. I-stream write on Line 36 gives the number of such drop outs. The last line, 38, gives the instruction number in the block that caused the I-cache miss (0.4 or the second instruction in the block; numbering of the instructions starts from 0 in a block), the number of instructions the processor streamed through (2.1) and the number of instructions of the block the processor did not execute (1.6). The numbers on Line 38 are average numbers. The last percentage in lines 34 to 37, under the other column, is the percentage of the corresponding streaming number to the number of I-cache misses on Line 27. Dropping out of streaming does not stop the refilling of the instruction cache with the block refill number of instructions. When streaming is aborted, the processor waits until the refilling is finished and then starts executing the instruction that caused the drop out.

Micro TLB Miss Cycles
    24  uTLB misses:              318004    (0.68%/ 0.43%)
    39  uTLB miss cycles:         318004    (0.68%/ 0.43%) (penalty 1)

When the uTLB does not contain the entry to translate the virtual address of an instruction to its physical address, there is a penalty (1 cycle) in referring to the main TLB and updating one of the uTLB entries with the appropriate entry for future use. The miss cycles in Line 39 are the product of the number of uTLB misses (318004) and the penalty (1) on Line 39. This penalty of 1 cycle cannot be changed by the user. The percentages in parentheses on Line 39 are the penalty cycles per instruction and per cycle, respectively.

TLB Miss Cycles
    25  I-TLB misses:               9394    (0.02%/ 0.01%)
    26  D-TLB misses:               7927    (0.02%/ 0.01%/ 0.06%)
    40  I-TLB miss cycles:        122122    (0.26%/ 0.17%) (penalty 13)
    41  D-TLB miss cycles:        103051    (0.22%/ 0.14%) (penalty 13)

The penalty cycles for loading the main TLB and/or the uTLB with a single page table entry from the main memory are given by lines 40 and 41. The penalty for every miss (13 cycles) is fixed and cannot be set by the user. These miss cycles do not account for any I-cache or D-cache miss penalty cycles that might arise while executing the TLB refill algorithm. The percentages in parentheses on lines 40 and 41 are penalty cycles per instruction and per cycle, respectively.

Cache Miss Cycles
    27  I-cache misses:          2007056    (4.29%/ 2.73%)
    28  D-cache misses:           234278    (0.50%/ 0.32%/ 2.53%)
    42  I-cache miss cycles:    18063504    (38.61%/ 24.61%) (penalty 9)
    43  I-cache streaming:      -4155427    (-8.88%/ -5.66%)
    44  D-cache miss cycles:     1405668    (3.00%/ 1.92%) (penalty 6)

The penalty cycles associated with the I-cache misses and the D-cache misses are given by lines 42 and 44. The penalty for a miss is the sum of the read latency of the main memory (user selectable) and the block refill size. In the current simulation the read latency is set to 5 cycles and the block refill size is chosen to be 4 words for the instruction cache and 1 word for the data cache, giving penalties of 9 and 6 cycles. If instruction streaming is enabled, the penalty for I-cache misses is reduced because of concurrent execution of the instructions with the refill of the instruction cache. This gain in cycles is given in Line 43. The percentages in parentheses on lines 42 to 44 are penalty cycles per instruction and per cycle, respectively.

Partial Word Stores
    10  byte write                445222     0.95%
    11  half write                438867     0.94%
    12  swr                            0     0.00%
    13  swl                           73     0.00%
    45  2-cycle SB/SH/SWL/SWR:    884162    (1.89%/ 1.20%) (penalty 1)

Partial word store instructions take two cycles as the processor performs a read-modify-write operation, so the penalty for each store is one cycle. The total penalty cycles due to these instructions is given in Line 45 (884162). The number of 2-cycle stores is the sum of all the partial word stores listed from Line 10 to 13. The percentages in parentheses on Line 45 are the penalty cycles per instruction and per cycle, respectively.

Write Buffer Penalty Cycles
    46  Write buffer full cycles: 6780214   (14.49%/ 9.24%) (average 1.3 per write)
    47  Write wait cycles:       3093879    (6.61%/ 4.22%) (average 1.4 per miss)

Whenever the write buffer is full, the next write causes the buffer to be flushed. The processor waits while this flush is carried out.
For the current user program, the processor waited for 6780214 cycles (Line 46) while the main memory was being updated with the contents of the write buffer. The average wait penalty per write due to a full write buffer is given in the second set of parentheses on Line 46 (1.3). The write buffer continually updates the main memory whenever it has some data and the memory bus is free. When there is an I-cache miss or a D-cache miss, the memory bus is busy supplying instructions and data, holding off the write buffer writes. The number of cycles the writes are kept waiting due to reads from the main memory on cache misses is given on Line 47. The average number in parentheses is the write wait cycles per I- and D-cache miss.

Subtotal
    48  Subtotal:               26615177    (56.89%/ 36.26%)

The subtotal is the total number of penalty cycles minus the cycles gained due to streaming. The percentage of the total penalty cycles to the total instructions and the percentage of the total penalty cycles to the total cycles are given in parentheses.

Instructions
     4  46783524 instructions (46.8M)
    49  Instructions:           46783524    (100.00%/ 63.74%)

Line 49 gives the total number of instructions of the current application program executed. The number of instructions might change if the application program is data dependent or if a different optimization level is used in compiling the program.

Total Cycles
     3  73398701 cycles (1.569), 4.4s @ 16.7MHz
    50  Total:                  73398701    (156.89%/ 100.00%)

The total number of cycles the processor takes to execute the current user program is the sum of the number of instructions (1 cycle/instr) and the total number of penalty cycles due to uTLB/TLB misses, cache misses, 2-cycle instructions, write buffer flushing and write buffer waits, minus the cycles gained due to streaming.

TLB Refill Cycles
    51  TLB memory cycles:        111030    (0.24%/ 0.15%)

For every TLB miss, a TLB refill algorithm gets executed. This kernel code is cached and resides in kseg0. The cache misses occurring while this algorithm gets executed are taken into account in I-cache misses and D-cache misses. Line 51 gives the number of cycles spent for such cache misses.

Bus Usage
    52  Memory bus cycles:      33976968    (72.63%/ 46.29%)

The caches are loaded with instructions and data from the main memory and updates are made to the main memory by the write buffer, keeping the memory bus busy with these transfers. The number of cycles the bus is in use is given on line 52.

cache2000 Runtime
    53  simulation runtime: 248.4u 23.4s, 5473.6w/5%, 172 Kinst/s, 225 Krefs/s
    54  DONE!

The amount of user time in seconds (248.4), the amount of system time in seconds (23.4), the number of instructions executed per second and the number of memory references made per second by the cache2000 program in simulating the memory model are listed on Line 53. These numbers have no bearing on the performance of the memory model itself.

COMPARISON OF MEMORY SUBSYSTEMS
The application program is run through different memory subsystems and the execution times are listed in the following tables. In these simulations, it is assumed that the write buffer is 1 word deep and the main memory read latency is 5 cycles.

    Case    I-Cache     D-Cache     I- & D-Cache    Instruction    Run Time
            Size        Size        Refill          Streaming      16.67MHz
            (Kbytes)    (Kbytes)    (words)                        (seconds)
    (1)     8           0           1               OFF            9.6
    (2)     8           0           4               ON             8.3
    (3)     8           0           8               ON             8.1
    (4)     8           8           1               OFF            5.9
    (5)     8           8           4               ON             4.4
    (6)     8           8           8               ON             4.2
    (7)     16          8           1               OFF            5.1
    (8)     16          8           4               ON             4.1
    (9)     16          8           8               ON             4.0
    (10)    16          16          1               OFF            5.1
    (11)    16          16          4               ON             4.1
    (12)    16          16          8               ON             4.0
    (13)    32          16          1               OFF            4.3
    (14)    32          16          4               ON             3.7
    (15)    32          16          8               ON             3.7
    (16)    32          32          1               OFF            4.2
    (17)    32          32          4               ON             3.7
    (18)    32          32          8               ON             3.6
    (19)    64          32          1               OFF            3.7
    (20)    64          32          4               ON             3.5
    (21)    64          32          8               ON             3.5
    (22)    64          64          1               OFF            3.7
    (23)    64          64          4               ON             3.5
    (24)    64          64          8               ON             3.5

Table 1.
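Rows of Table 1 can be compared as relative gains in run time. The Python sketch below is illustrative only: the gain formula (old time minus new time, divided by the new time) is an assumption inferred from the percentages this note quotes for its tables, not something the note states, and the function name is hypothetical. The run times are taken from the streaming-on rows of Table 1.

```python
def gain_percent(old_seconds, new_seconds):
    """Relative gain in percent; assumed formula: (old - new) / new."""
    return 100.0 * (old_seconds - new_seconds) / new_seconds


# Streaming on, D-cache fixed at 8 Kbytes, I-cache doubled from 8 to 16 Kbytes:
# case (5) -> case (8) with 4 word refill, case (6) -> case (9) with 8 word refill.
icache_8k_to_16k = [gain_percent(4.4, 4.1), gain_percent(4.2, 4.0)]
average_icache_gain = sum(icache_8k_to_16k) / len(icache_8k_to_16k)

# Streaming on, I-cache fixed at 8 Kbytes, D-cache changed from 0 to 8 Kbytes:
# case (2) -> case (5) and case (3) -> case (6).
dcache_0_to_8k = [gain_percent(8.3, 4.4), gain_percent(8.1, 4.2)]
average_dcache_gain = sum(dcache_0_to_8k) / len(dcache_0_to_8k)

print(round(average_icache_gain, 1), round(average_dcache_gain, 1))
```

Under this assumed formula the two averages come out near 6% and 90%, in line with the gains the note reports for these steps.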
Notice that when streaming is on, doubling the instruction cache from 8Kbytes to 16Kbytes gives a 6% gain in performance, 16Kbytes to 32Kbytes gives a 9.5% gain and 32Kbytes to 64Kbytes gives a 4.2% gain. When streaming is off, doubling the instruction cache gives an average 16% gain in performance. Also notice that changing the data cache size from 0 to 8Kbytes gives an average gain of 90%.

In the following table, the simulations assume a read latency of 3 cycles and a 1 word deep write buffer. Instruction streaming is turned on.

        I-Cache Size   D-Cache Size   I- & D-Cache      Run Time 20.00MHz
        (Kbytes)       (Kbytes)       Refill (words)    (seconds)
(1)     32             8              1                 3.4
(2)     32             8              4                 3.0
(3)     32             32             1                 3.3
(4)     32             32             4                 3.0

Table 2.

Comparing cases 1 and 2, observe that a block refill of 4 words enhanced the performance by 13.3%. Between 3 and 4, the gain is 10%. With 1 word refill, increasing the data cache to 32Kbytes gives a 3% gain. Between 2 and 4, where the refill size is 4 words in both cases, the larger data cache contributes little to the overall performance.

In the following simulations, the refill size of both I- and D-caches is 1 word. The read latency is 5 cycles.

        I-Cache Size   D-Cache Size   Write Buffer      Run Time 20.00MHz
        (Kbytes)       (Kbytes)       Depth (words)     (seconds)
(1)     4              0              1                 9.1
(2)     4              0              4                 8.9

Table 3.

For this application program, deepening the write buffer from 1 word to 4 words gives a boost of 2.25% to the performance.

PIXSTATS OUTPUT

The following listing is the output of the Pixstats program. The information that is not relevant to performance analysis is omitted from the output file. The interlock cycles of the integer mult/div unit of the IDT79R3000 and the interlock cycles of the IDT79R3010 add/mult/div units are listed in the Pixstats output. These interlock cycles should be added to the number of cycles given in the cache2000 output to obtain the exact number of cycles for program execution, since cache2000 only evaluates the memory penalties and does not take any interlock penalties into account. On the other hand, Pixstats assumes that the application program fits completely in the caches. Pixstats is run only once because it is independent of the memory subsystem. The number of interlock cycles that should be added to the cache2000 cycles is the difference between the number of cycles given on the first line and the number of instructions on the second line of the Pixstats output.

The Pixstats output file is interspersed with comments for explanation. The output file follows.

pixstats embedded_application:

49268275 (1.053) cycles (2.95s @ 16.67MHz) - This line gives the number of cycles the IDT79R3000 and IDT79R3010 take to execute the application program. The number of cycles calculated by Pixstats can differ from the cycles calculated by cache2000. In calculating the cycles, Pixstats assumes that the application program instructions and data completely reside in the caches. In other words, Pixstats assumes a 100% cache hit rate for both I- and D-caches, while cache2000 accounts for the memory overhead. On the other hand, Pixstats estimates the interlock cycles of the IDT79R3000 and IDT79R3010. The interlock cycles are not taken into account by cache2000. The number in the first set of ( )s is the average number of cycles per instruction. The execution time is 2.95s at 16.67MHz.

46783524 (1.000) instructions - This line gives the number of instructions executed. Both Pixstats and cache2000 execute the same number of instructions.

9264472 (0.198) loads - This line gives the total number of loads to the IDT79R3000 and IDT79R3010. This is the sum of loads on lines 6 and 14 of the cache2000 output.

5111329 (0.109) stores - This line gives the total number of stores from the IDT79R3000 and IDT79R3010. This is the sum of stores on lines 8, 10, 11, 12, 13 and 16 of the cache2000 output.

14375801 (0.307) loads+stores - These are the memory referencing instructions executed by the IDT79R3000.

14558230 (0.311) data bus use - The number of cycles the data bus is busy is given here.

6406647 (0.137) branches.

7024133 (0.150) nops - The total number of nops executed is given. These nops are due to load delay slots and branch delay slots.

1974000 (0.042) multiply/divide interlock cycles (12/35 cycles) - The IDT79R3000 has a separate integer multiply/divide unit that takes 12 and 35 cycles for integer multiplication and division, respectively. Any attempt to read the result of a multiply/divide before the operation is complete will cause the CPU to interlock until the operation is finished. The number of cycles the units were interlocked is given above.

385600 (0.008) floating point data interlock cycles.
9562 (0.000) floating point add unit interlock cycles.
16676 (0.000) floating point multiply unit interlock cycles.
372 (0.000) floating point divide unit interlock cycles.
98541 (0.002) other floating point interlock cycles - The above lines give the interlock cycles of the IDT79R3010 FPA. Data interlocks occur because of data dependencies between various floating point operations. This usually happens because the source operand (register) of some fp operation is the destination register of some previous fp operation that has not completed. The interlock cycles due to data dependencies between two successive floating point operations are 385600. Cycles in which the add/mult/div units are interlocked occur because consecutive fp operations could not be issued. The add, multiply and divide units themselves are not pipelined and, therefore, a new operation cannot be started before the previous one completes.

0.337 load nops per load - This number indicates that 67% of the load delay slots are filled with useful instructions.
0.356 stores per memory reference - This number implies that 35% of the memory references are store operations and the remaining 65% are load operations. Many application programs typically have a 2:1 ratio for reads to writes.

0.474 branch nops per branch - 53% of the branch delay slots have been successfully filled with useful instructions by the assembler.

Opcode Distribution:

spec           14871837    31.79%
lw              6261479    13.38%
addiu           4970770    10.63%
sw              4161532     8.90%
lnop            3119580     6.67%
bnop            3039486     6.50%
addu            2227361     4.76%
andi            1938082     4.14%
beqz            1881668     4.02%
bnez            1648279     3.52%
sll             1524644     3.26%
lbu             1315591     2.81%
[illegible]     1255554     2.68%
lui             1244693     2.66%
b               1030526     2.20%
sra              866987     1.85%
lhu              822784     1.76%
cop1             759725     1.62%
jr               728865     1.56%
sltu             724597     1.55%
lh               607812     1.30%
jnop             603324     1.29%
bcond            536865     1.15%
bne              519056     1.11%
bgez             492881     1.05%
sb               445222     0.95%
sh               438867     0.94%
sltiu            409184     0.87%
beq              405527     0.87%
srl              389980     0.83%
subu             333942     0.71%
jal              332866     0.71%
slt              298293     0.64%
nop              261743     0.56%
lwc1             255500     0.55%
and              225631     0.48%
blez             202636     0.43%
bgtz             182090     0.39%
fcvtd            175925     0.38%
mflo             129374     0.28%
slti             127245     0.27%
fcvts            118090     0.25%
multu            105875     0.23%
jalr             103051     0.22%
mtf               65771     0.14%
swc1              65635     0.14%
or                60680     0.13%
ori               52753     0.11%
bft               50075     0.11%
fmul              48172     0.10%
bltz              43984     0.09%
cff               40491     0.09%
ctf               40046     0.09%
fadd              38012     0.08%
mff               36121     0.08%
c.lt              35951     0.08%
xor               29882     0.06%
srlv              23727     0.05%
xori              23585     0.05%
div               22679     0.05%
sllv              22030     0.05%
fcvtw             19601     0.04%
c.le              19410     0.04%
fsub              17003     0.04%
addi              14746     0.03%
bft               14744     0.03%
fdiv              13146     0.03%
mfhi              12038     0.03%
fneg               9690     0.02%
c.eq               8581     0.02%
divu               7802     0.02%
fmov               7283     0.02%
sub                4142     0.01%
srav               2869     0.01%
nor                2389     0.01%
lb                  964     0.00%
c.olt               877     0.00%
syscall             793     0.00%
fabs                736     0.00%
lwl                 178     0.00%
lwr                 164     0.00%
swl                  73     0.00%
add                  73     0.00%
j                    36     0.00%

The lines above give the opcode distribution of the instructions that are executed.
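As a worked check of the procedure described for combining the two tools, the sketch below takes the cycle and instruction counts quoted from the cache2000 and Pixstats outputs, derives the interlock cycles, and adds them to the cache2000 total. It also cross-checks one summary ratio against the per-opcode load counts as read from the opcode distribution. All variable names are illustrative; the only inputs are numbers quoted in this note.

```python
CLOCK_HZ = 16.67e6                  # clock rate used throughout the note

CACHE2000_TOTAL_CYCLES = 73398701   # cache2000 output, Line 50
PIXSTATS_CYCLES = 49268275          # first line of the Pixstats output
INSTRUCTIONS = 46783524             # second line of the Pixstats output

# Interlock breakdown as listed by Pixstats.
INTERLOCKS = {
    "integer multiply/divide": 1974000,
    "fp data": 385600,
    "fp add unit": 9562,
    "fp multiply unit": 16676,
    "fp divide unit": 372,
    "other fp": 98541,
}

# Interlock cycles = Pixstats cycles minus instructions (1 cycle each).
interlock_cycles = PIXSTATS_CYCLES - INSTRUCTIONS
breakdown_total = sum(INTERLOCKS.values())

# Memory penalties (cache2000) plus interlock penalties (Pixstats).
combined_cycles = CACHE2000_TOTAL_CYCLES + interlock_cycles
combined_seconds = combined_cycles / CLOCK_HZ

# Cross-check: per-opcode load counts should add up to the "loads" line,
# and lnop / loads should reproduce "load nops per load".
LOAD_COUNTS = {"lw": 6261479, "lbu": 1315591, "lhu": 822784, "lh": 607812,
               "lwc1": 255500, "lb": 964, "lwl": 178, "lwr": 164}
loads = sum(LOAD_COUNTS.values())
load_nops_per_load = 3119580 / loads

print(f"interlock cycles: {interlock_cycles} (breakdown sums to {breakdown_total})")
print(f"combined estimate: {combined_cycles} cycles, {combined_seconds:.2f} s")
print(f"loads: {loads}, load nops per load: {load_nops_per_load:.3f}")
```

The interlock breakdown sums to exactly the difference between the Pixstats cycle and instruction counts, which is a useful sanity check before adding that difference to a cache2000 total.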
Notice that 759725 COP1 instructions have been executed. This number and the percentage are cumulative for all the COP1 instructions. Less than 2% of the instructions executed are, therefore, IDT79R3010 instructions.

24424 static instructions
4414 static basic blocks
5.5 static instructions per basic block

Static opcode frequency

spec        6646    27.21%
lw          3408    13.95%
sw          2438     9.98%
addiu       2157     8.83%
addu        1627     6.66%
cop1        1160     4.75%
lnop        1094     4.48%
li          1031     4.22%
jal          917     3.75%
bnop         889     3.64%
lui          810     3.32%
beqz         616     2.52%
lwc1         526     2.15%
bnez         518     2.12%
lbu          514     2.10%
lh           500     2.05%
sll          499     2.04%
jr           474     1.94%
sh           464     1.90%
b            455     1.86%
andi         407     1.67%
jnop         406     1.66%
sb           378     1.55%
sra          324     1.33%
subu         316     1.29%
fcvtd        274     1.12%
swc1         259     1.06%
bne          256     1.05%
sltu         225     0.92%
nop          202     0.83%
fcvts        193     0.79%
beq          189     0.77%
lhu          187     0.77%
slt          174     0.71%
fmul         117     0.48%
mtf          117     0.48%
bcond        105     0.43%
ori          105     0.43%
blez         104     0.43%
fadd         103     0.42%
slti         100     0.41%
srl           88     0.36%
mflo          80     0.33%
bgez          78     0.32%
and           75     0.31%
sltiu         61     0.25%
multu         57     0.23%
bgtz          50     0.20%
mff           49     0.20%
bff           44     0.18%
fdiv          42     0.17%
cff           37     0.15%
ctf           34     0.14%
c.lt          32     0.13%
fsub          29     0.12%
bltz          27     0.11%
jalr          22     0.09%
xori          20     0.08%
div           20     0.08%
xor           19     0.08%
lb            17     0.07%
fcvtw         16     0.07%
c.le          15     0.06%
bft           15     0.06%
addi          14     0.06%
fneg          13     0.05%
fmov          12     0.05%
sllv           9     0.04%
syscall        9     0.04%
or             9     0.04%
c.eq           9     0.04%
divu           7     0.03%
mfhi           6     0.02%
fabs           6     0.02%
srlv           5     0.02%
lwl            4     0.02%
lwr            4     0.02%
srav           4     0.02%
sub            3     0.01%
c.olt          3     0.01%
j              2     0.01%
swl            2     0.01%
add            2     0.01%
nor            1     0.00%

The list above gives the number of different kinds of instructions the compiler has generated. The total number of instructions the compiler generated is 24424 and, in the executable, there are 1160 COP1 instructions. The numbers given in the list are cumulative of their kind.

FPA EMULATION OVERHEAD

The MIPS compilers generate executable code assuming the presence of an FPA.
When the FPA is not present, or if it is not functional, the exception handler invokes the floating point instruction emulation software for every FPA instruction. The current execution state is passed on to the emulation software through the Process Control Block (pcb). Following the IEEE standard, the emulation software decodes the FPA instruction, fetches the operands, checks for NaNs, infinities, zeros and denormalized numbers, carries out the FPA operation, normalizes the result and, finally, returns the result. The amount of time it takes to execute the emulation software depends on the instruction being emulated, the value of the operands and the amount of the emulation code in the cache.

The following table lists the emulation times of the emulation software of RISC/os 4.0 on an M/120 with an IDT79R3000 @ 16.67MHz for the floating point arithmetic instructions. The times are in microseconds. The numbers in ( )s are the number of times the emulation is slower than actual execution on the IDT79R3010.

Instruction   Best Case     Avg Case       Worst Case
              (µsecs)       (µsecs)        (µsecs)
ADD.S         8.35 (70)     18.31 (153)    25.5 (213)
ADD.D         8.35 (70)     18.31 (153)    25.5 (213)
SUB.S         8.35 (70)     18.31 (153)    25.5 (213)
SUB.D         8.35 (70)     18.31 (153)    25.5 (213)
MUL.S         9.48 (40)     23.08 (96)     42.91 (179)
MUL.D         9.48 (32)     23.08 (77)     42.91 (143)
DIV.S         9.48 (13)     23.08 (32)     42.91 (60)
DIV.D         9.48 (9)      23.08 (20)     42.91 (38)

Table 4.

The best case occurs when most of the emulation software is in the cache and the operands are either zero or infinity. On the other hand, the worst case arises when very little of the emulation code is in the cache and the operands are denormalized. The average times are good indicators of the overhead for regular data.

To determine the emulation overhead for an application program, the output of Pixstats must be used.
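As a rough illustration of such an estimate, the sketch below multiplies the dynamic counts of the four floating point arithmetic opcodes (as read from the Pixstats opcode distribution) by the average-case times in Table 4. It is an assumption-laden lower bound rather than a full answer: Table 4 only covers add, subtract, multiply and divide, so the other COP1 instructions that would also trap to the emulator are not costed here, and the function name `emulation_overhead_seconds` is hypothetical.

```python
# Average-case emulation times from Table 4, in microseconds.
# The .S and .D variants have the same average time in Table 4.
AVG_EMULATION_US = {"fadd": 18.31, "fsub": 18.31, "fmul": 23.08, "fdiv": 23.08}

# Dynamic execution counts as read from the Pixstats opcode distribution.
DYNAMIC_COUNTS = {"fadd": 38012, "fsub": 17003, "fmul": 48172, "fdiv": 13146}


def emulation_overhead_seconds(counts, avg_us):
    """Sum count * average emulation time over the listed opcodes."""
    total_us = sum(counts[op] * avg_us[op] for op in counts)
    return total_us / 1e6


overhead = emulation_overhead_seconds(DYNAMIC_COUNTS, AVG_EMULATION_US)
print(f"arithmetic emulation overhead: about {overhead:.2f} s")
```

Using the best-case or worst-case columns of Table 4 in place of the averages gives a bracket around this figure.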
The static and the dynamic opcode distribution for coprocessor 1 listed in the output and the times given above help the designer estimate the total emulation overhead.

CONCLUSION

The software tools - cache2000, Pixie and Pixstats - allow the user to accurately project the performance of different IDT79R3000-based systems. The application program needs to be compiled only once. A cache2000 executable must be created for every proposed memory model. All these models can be run in parallel as background jobs. To find the interlock cycles of the IDT79R3000 and IDT79R3010, Pixstats should be run only once. The cache2000 output clearly points out the tunable parameters of the memory subsystem. The Pixstats output provides information on integer and floating point interlocks and static/dynamic opcode distribution.

APPENDIX A

cache2000(1spp) SYSTEM PROGRAMMER'S MANUAL

Name
cache2000 - analyze cache misses for an M2000 system (IDT79R2900/R2950)

Synopsis
cache2000 [options]

Description
cache2000 analyzes a memory reference stream produced by a program pixified with -idtrace. To use cache2000, first use pixie(1) to translate and instrument the executable object module for the program. Use pixie's -idtrace option. Next, execute the translation on an appropriate input using makepipe(1). The pixified program will output the addresses and types of memory references to Unix file descriptor 19. With makepipe this trace can be fed into the standard input of cache2000 for simulation.

Example:

    makepipe 19 foo.pixie fooarg "<" fooinput '>' foooutput '|' \
        /usr/local/bin/cache2000 --comment "foo fooarg" '>' foo.cache2000

cache2000 differs from cache26(1) in that it models the IDT79R2950 memory board, which supports page mode writes and 16-word cache refill. The IDT79R2900 also uses the IDT79R3000 2-cycle partial word store option to avoid cache invalidation. The cache sizes are hardwired in the source to 64K bytes. The data cache is write-thru, with IDT79R2020 write buffering.

cache2000 simulates the memory subsystem. Accurate performance predictions must add the stall cycles predicted by pixstats(1).

--comment string
Include string in the output. This is useful for associating the output with the program and input used to generate it.

-print N
Set the interval for periodic statistics printout. N is the interval in millions of instructions. Default is 20 million instructions.

-flush N
Set the cache flush interval. N is the interval in millions of instructions. Default is 1 million instructions.

-[no]random_flush
Flush cache at random intervals (Poisson distribution). Default: -norandom_flush.

--cycle ns
Use a ns cycle time when converting cycle counts to seconds. Default is 40ns.

--clock MHz
Use a 1000/MHz cycle time when converting cycle counts to seconds. Default is 25.0MHz.

-wbsize N
Simulate an N-deep IDT79R2020 write buffer. Default is 4 deep. Maximum is 8 deep.

-read_latency N
Instruction and data cache misses take N+16 extra cycles. Default is 12 cycles.

-idle_word N
Set the IDT79R2950 memory board idle word write time to N cycles. Default is 4 cycles.

-page_write N
Set the IDT79R2950 non-idle, page mode write time to N cycles. Default is 2 cycles.

-nonpage_write N
Set the IDT79R2950 non-idle, non-page mode write time to N cycles. Default is 6 cycles.

See Also
pixie(1), pixstats(1), makepipe(1), cache23(1), cache26(1), The MIPS System Programmer's Reference.

These are the cache2000 parameters that can be modified to model different memory systems. The values assigned to these parameters are just examples.
/* cache parameters */

/* instruction refill size, in words */
#ifndef i_refill_log
# define i_refill_log 4
#endif
#define i_refill_size (1<<i_refill_log)

/* instruction cache size, in words */
#ifndef i_size_log
# define i_size_log 14
#endif
#define i_size (1<<i_size_log)

/* instruction streaming */
#ifndef istreaming
# define istreaming 1
#endif

/* data refill size, in words */
#ifndef d_refill_log
# define d_refill_log 4
#endif
#define d_refill_size (1<<d_refill_log)

/* data cache size, in words */
#ifndef d_size_log
# define d_size_log 14
#endif
#define d_size (1<<d_size_log)

/* memory parameters */

/* byte gathering */
#ifndef byte_gathering
# define byte_gathering 0
#endif

/* read conflict checking in write buffer */
#ifndef read_conflict_check
# define read_conflict_check 0
#endif

private unsigned wbsize = 4;
private unsigned read_latency = 13;
private unsigned idle_write_time = 4;
private unsigned page_write_time = 2;
private unsigned nonpage_write_time = 6;
private unsigned byte_extra_write_time = 4;

private char *comment = NULL;
private boolean random_flush = false;
private unsigned print_interval = 20000000;
private unsigned flush_interval = 1000000;
private double random_flush_parameter;
private double cycletime = 40e-9;

PIXIE(1-SysV) RISC/os REFERENCE MANUAL

Name
pixie - add profiling code to a program

Synopsis
pixie in-prog_name [options]

Description
Pixie reads an executable program, partitions it into basic blocks, and writes an equivalent program containing additional code that counts the execution of each basic block. (A basic block is a region of the program that can be entered only at the beginning and exited only at the end.) Pixie also generates a file containing the address of each of the basic blocks.

When you run the pixie-generated program, it will (provided it terminates normally or via a call to exit(2)) generate a file containing the basic block counts. The name of the file is that of the original program with any leading directory names removed and ".Counts" appended. prof(1) and pixstats(1) can analyze these files and produce a listing of profiling data.

-bbaddrs name
Specify a name for the file of basic block addresses. Default is to remove any leading directory names from the in-prog_name and append ".Addrs".

-bbcounts name
Specify the full filename of the basic block counts file. Default: objfile.Counts.

-mips1
Use the MIPS1 instruction set (IDT79R2000, IDT79R3000) for the output executable. This is the default.

-mips2
Use the MIPS2 instruction set (a superset of MIPS1) for the output executable.

-[no]quiet
[Permits] or suppresses messages summarizing the binary-to-binary translation process. Default: -noquiet.

-[no]branchcounts
-branchcounts inserts extra counters to track whether each branch instruction is taken or not taken. When this option is used, pixstats will automatically print more statistics. Default: -nobranchcounts.

-[no]idtrace
[Disable] or enable tracing of instruction and data memory references. -idtrace is equivalent to using both -itrace and -dtrace together. Default: -noidtrace.

-[no]itrace
[Disable] or enable tracing of instruction memory references. Default: -noitrace.

-[no]dtrace
[Disable] or enable tracing of data memory references. For the moment, -dtrace requires -itrace. Default: -nodtrace.

-idtrace_file number
Specify a UNIX file descriptor number for the trace output file. Default: 19.

See Also
prof(1), pixstats(1). The MIPS Languages Programmer's Guide.

Bugs
The handler function address passed to the signal system calls is not translated, and so programs that receive signals will not work pixified.

Programs that call vfork() will not work pixified because the child process will modify the parent state required for pixie operation. Use fork() instead.

Pixified code is substantially larger than the original code. Conditional branches that used to fit in the 16-bit branch displacement field may no longer fit, generating a pixie error.

PIXSTATS(1-SysV) RISC/os REFERENCE MANUAL

Name
pixstats - analyze program execution

Synopsis
pixstats program [options]

Description
Pixstats analyzes a program's execution characteristics. To use pixstats, first use pixie(1) to translate and instrument the executable object module for the program. Next, execute the translation on an appropriate input. This produces a .Counts file. Finally, use pixstats to generate a detailed report on opcode frequencies, interlocks, a mini-profile, and more.

-cycle ns
Assume a ns cycle time when converting cycle counts to seconds.

-r2010
Use r2010 floating point chip operation times and overlap rules. This is the default.

-r2360
Use r2360 floating point board operation times and overlap rules.

-disassemble
Disassemble and show the analyzed object code.

See Also
pixie(1), prof(1), The MIPS Languages Programmer's Guide.

Bugs
Pixstats models execution assuming a perfect memory system. Cache misses, etc. will increase the execution time above the pixstats predictions.

Integrated Device Technology, Inc.

USING IDT73200 OR IDT73210 AS READ AND WRITE BUFFERS WITH R3000
APPLICATION NOTE AN-65

CONTENTS

AN-65A USING THE IDT73200 MULTILEVEL PIPELINE REGISTER AS READ AND WRITE BUFFERS WITH R3000/1
by Danh Le Ngoc, Ignacio Osorio, Avigdor Willenz

AN-65B USING IDT73210 AS READ AND WRITE BUFFERS WITH R3000
by V.S. Ramaprasad

6/90 ©1990 Integrated Device Technology, Inc.
USING THE IDT73200 MULTILEVEL PIPELINE REGISTERS AS READ AND WRITE BUFFERS WITH R3000/1
APPLICATION NOTE AN-65A

By Danh Le Ngoc, Ignacio Osorio and Avigdor Willenz

INTRODUCTION

The objective of this application note is to describe the use of the IDT 73200 multilevel pipeline register as the write buffer and read buffer for the R3000/1 RISC processor. The following topics are discussed:

• The IDT73200 Multilevel Pipeline Register, presents a brief description of general characteristics and configurations of the multilevel pipeline register.
• Read-Write buffers, explains what read and write buffers are, and how they function in an R3000/1 system.
• Implementing R-W Buffers, describes how to implement the IDT73200 as read and write buffers. Buffer depths are also discussed in this section.
• A Typical System, provides an example of read-write buffers using the IDT73200, within a RISC system. It also presents the control logic and PAL equations to operate the IDT73200 as read and write buffers.

[Figure 1. Block Diagram of the IDT73200]

I3   I2   I1   I0   MNEMONIC   PIPELINE LEVEL   FUNCTION
0    0    0    0    LDA        1                D0-15->A
0    0    0    1    LDB        1                D0-15->B
0    0    1    0    LDC        1                D0-15->C
0    0    1    1    LDD        1                D0-15->D
0    1    0    0    LDE        1                D0-15->E
0    1    0    1    LDF        1                D0-15->F
0    1    1    0    LDG        1                D0-15->G
0    1    1    1    LDH        1                D0-15->H
1    0    0    0    LSHAH      8                D0-15->A->B->C->D->E->F->G->H
1    0    0    1    LSHAD      4                D0-15->A->B->C->D
1    0    1    0    LSHEH      4                D0-15->E->F->G->H
1    0    1    1    LSHAB      2                D0-15->A->B
1    1    0    0    LSHCD      2                D0-15->C->D
1    1    0    1    LSHEF      2                D0-15->E->F
1    1    1    0    LSHGH      2                D0-15->G->H
1    1    1    1    HOLD                        HOLD ALL REGISTERS

Figure 2. Load Control

THE IDT 73200 MULTILEVEL PIPELINE REGISTER

The IDT 73200 is a high-speed, low-power Programmable Multilevel Pipeline Register.
It has a dedicated 16-bit input port and a dedicated 16-bit output port. As shown in Figure 1, the IDT73200 contains eight 16-bit registers which can be configured as one 8-level, two 4-level, four 2-level, or eight 1-level pipeline registers. Data at the input port D0-15 can be written into any of the eight registers under control of the load control: I0-3. Figure 2 illustrates the load control for the input port. An eight-to-one output multiplexer allows data to be read on the Y-bus from any of the eight registers using the output selection control: SEL0-2. Figure 3 illustrates the output control.

READ-WRITE BUFFERS

As shown in Figure 4, a high-speed computer system consists of an R3000/1 chip set, high-speed cache, write buffer, read buffer, I/O devices, and main memory. Since the main processor supports a write-through cache policy, all data written into the data cache must also be written into the main memory to maintain cache coherency. Due to the data-rate mismatch between the high-speed processor bus (33MHz -> 240 Mbytes/sec) and the slow main memory (10-15MHz -> 10-40 Mbytes/sec), a write buffer and a read buffer are required.

The write buffer is an elastic buffer which is used to capture addresses and data at the cache speed. At the other side of the write buffer, data is transferred into the main memory at the system memory speed.

When a load operation causes a cache miss, a main memory read is initiated. Two types of main memory read are supported on the R3000/1: single word transfer and multiple word transfer. In either case, a read buffer is used to capture data from the system memory at memory speed. Then data is written into the cache at the cache speed.

The depths of the write buffer and the read buffer depend on factors such as processor speed, system memory speed, bus protocol and the application.
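The elastic behavior of the write buffer described above can be illustrated with a small cycle-by-cycle model. This is a simplified sketch under stated assumptions, not a model of any particular IDT part: the CPU issues at most one store per cycle, the memory retires one buffered write every `drain_cycles` cycles, and a store that finds the buffer full stalls the CPU until an entry frees up. The function name and parameters are hypothetical.

```python
from collections import deque


def simulate_write_buffer(write_pattern, depth, drain_cycles):
    """Return the number of CPU stall cycles for a stream of instruction slots.

    write_pattern: iterable of bools, True where the instruction is a store.
    depth:         number of entries in the write buffer (assumed >= 1).
    drain_cycles:  cycles the memory needs to retire one buffered write.
    """
    if depth < 1 or drain_cycles < 1:
        raise ValueError("depth and drain_cycles must be at least 1")
    buffer = deque()
    stalls = 0
    drain_timer = 0
    for is_store in write_pattern:
        while True:
            # Memory side: make one cycle of progress on the oldest entry.
            if buffer:
                drain_timer += 1
                if drain_timer == drain_cycles:
                    buffer.popleft()
                    drain_timer = 0
            # CPU side: a store needs a free entry, otherwise it stalls.
            if is_store and len(buffer) >= depth:
                stalls += 1
                continue
            if is_store:
                buffer.append(True)
            break
    return stalls


burst = [True] * 4  # four back-to-back stores
print("depth 1:", simulate_write_buffer(burst, depth=1, drain_cycles=4))
print("depth 4:", simulate_write_buffer(burst, depth=4, drain_cycles=4))
```

In this model a deeper buffer absorbs a short burst of stores completely, while a sustained store rate above the memory drain rate still stalls the CPU once the buffer fills.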
SEL2   SEL1   SEL0   Y OUTPUT
0      0      0      A REG
0      0      1      B REG
0      1      0      C REG
0      1      1      D REG
1      0      0      E REG
1      0      1      F REG
1      1      0      G REG
1      1      1      H REG

Figure 3. Output Selection

[Figure 4. Simplified Block Diagram of a High-Speed RISC System. The diagram runs from the highest bandwidth (native rate of the CPU: 25MHz ~200 Mbytes/sec, 33MHz ~240 Mbytes/sec; instruction and data caches at ~100 and ~120 Mbytes/sec each) through the write buffer and read buffer to the lowest bandwidth (system memory and I/O devices: 10-15MHz, 10-40 Mbytes/sec).]

IMPLEMENTING R-W BUFFERS

As previously described in section 1, the IDT 73200 is like a high-speed synchronous memory with a depth that is programmable from 1 to 8 deep. Therefore, a write buffer and read buffer for the high performance R3000/1 system can be easily implemented with the help of the IDT 73200.

Figure 5 illustrates a detailed R3000/1 system which consists of the R3000/1 chip set, write buffer, read buffer, IDT49C465 Flow-thruEDC™, system main memory, and several state machines to control the main memory interface. In this scheme, the data bits together with the parity bits flow from the main memory through the EDC device for error detection and correction. When an error is detected, the EDC informs the read buffer control through an error feedback path.

In Figure 5 the write buffer consists of two paths: address and data. The address path (34-bit) is created with three IDT 73200s to capture the address, tag, and the access type bits. The data path of the write buffer (32-bit) is formed by two IDT73200s. Data coming from the CPU is buffered into the "data path" write buffer prior to being written into the main memory.
The read buffer in Figure 5 consists only of a "data path" (36 bits), which includes the required data parity on the R3000/1 system. The IDT 49C465 high speed Flow-thruEDC can be used to maintain the data integrity of the system main memory. Also, parity bits are generated with the help of the IDT49C465 Flow-thruEDC.

[Figure 5. Detailed R3000 System with Read and Write Buffer]

BUFFER DEPTH

Typical write buffer depths are two and four levels. However, high end applications with intensive memory access may require deeper write buffers, i.e., eight or sixteen levels as shown in Figure 6. Read buffer depth design issues are somewhat different from those of the write buffers.
For example, when an I-cache miss occurs we are faced with the question of "how many blocks to refill in the I-cache?" To answer this question, we should recall that the miss rate is determined by the cache size and the block size. Therefore, the block size determines the size of the read buffer. Thus, bringing 16 words from memory into the I-cache would require a 16-level deep read buffer.

Fetching small blocks of instructions using a shallow read buffer implies constantly fetching instructions and therefore stalling the CPU for several cycles. Depending on the application, this could impose a significant penalty on system performance. Due to program locality (sequentiality of instructions), we would benefit most by fetching a large block. Deep read buffers for the I-cache are therefore an appealing solution. Typical read buffer depths are 4 levels; high end applications could consider read buffers from 8 to 16 levels deep.

D-cache fetching, on the other hand, is random in nature and typical schemes prefer a 1-level deep read buffer. The flexibility of the 73200 allows instantaneous re-configuration when fetching for the I-cache and then for the D-cache. For example, we could have an R3000/1 initialized for 16-word I-cache fetching and 1-word D-cache fetching and still use the same read buffer. This can be accomplished through a read buffer controller capable of configuring the 73200 to different depths.

Let's now discuss two popular read buffer configurations.

As discussed earlier, the IDT 73200 can be configured for different depths: eight 1-level, four 2-level, two 4-level or one 8-level deep registers. This feature makes the 73200 particularly flexible in read/write buffer applications. The depth of the write buffer is programmable using the load control and the output selection. A single IDT 73200 can be programmed to a buffer depth of 1 to 8. A deeper write buffer can be implemented by cascading several devices in depth as shown in Figure 6.
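To make the load control and output selection concrete, here is a behavioral sketch of one 73200 in software. It assumes the I3-0 codes run in plain binary order down the rows of Figure 2 (LDA = 0 through HOLD = 15) and that SEL2-0 = 0 to 7 selects registers A to H as in Figure 3; the class and method names are hypothetical and timing is ignored.

```python
REGS = "ABCDEFGH"

# Codes 0-7: load a single register (LDA..LDH), assumed binary order.
LOAD_SINGLE = {code: REGS[code] for code in range(8)}

# Codes 8-14: shift data in through a chain of registers.
SHIFT_CHAINS = {
    8: "ABCDEFGH",          # LSHAH, one 8-level pipeline
    9: "ABCD", 10: "EFGH",  # LSHAD, LSHEH, two 4-level pipelines
    11: "AB", 12: "CD", 13: "EF", 14: "GH",  # four 2-level pipelines
}
HOLD = 15


class PipelineRegister73200:
    """Behavioral model: eight 16-bit registers, one load per clock."""

    def __init__(self):
        self.regs = {name: 0 for name in REGS}

    def clock(self, instruction, data):
        """Apply one load-control instruction (0-15) with 16-bit input data."""
        if instruction == HOLD:
            return
        if instruction in LOAD_SINGLE:
            self.regs[LOAD_SINGLE[instruction]] = data & 0xFFFF
            return
        chain = SHIFT_CHAINS[instruction]
        # Move from the far end first so no value is overwritten early.
        for dst, src in zip(reversed(chain), reversed(chain[:-1])):
            self.regs[dst] = self.regs[src]
        self.regs[chain[0]] = data & 0xFFFF

    def output(self, select):
        """Return the register chosen by SEL2-0 (0 = A ... 7 = H)."""
        return self.regs[REGS[select & 7]]


reg = PipelineRegister73200()
for word in (0x1111, 0x2222, 0x3333):
    reg.clock(9, word)  # LSHAD: shift through A->B->C->D
print([hex(reg.output(sel)) for sel in range(4)])
```

After three LSHAD loads the oldest word sits in register C and the newest in register A, which is why a buffer controller has to track where the oldest unread entry currently is.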
The right depth depends on the application program and/or hardware requirements.

[Figure 6. 16-Deep and 32-Wide Write Buffer Using IDT73200s]

With the help of the write buffer, the CPU can write to the memory without regard to the memory speed. However, if consecutive (or back to back) write operations take place, the write buffer would eventually become full and cause the CPU to stall. Naturally, this could present a problem in high performance systems. A logical solution to CPU stalls is to increase the depth of the write buffer.

a) 1-level deep Read Buffer
b) 8-level deep Read Buffer

1-LEVEL DEEP READ BUFFER

One-level deep read buffers can be used in high performance systems where the data transfer between the main memory and the CPU is efficiently handled. This can be accomplished through a sophisticated memory scheme, like interleaving, supported by a fast DRAM architecture. Such a scheme minimizes the transfer rate mismatch between the CPU and the main memory. One-level deep read buffers can also be applied in low performance systems where the penalty in fetching one word at a time is not significant.

8-LEVEL DEEP READ BUFFER

This configuration can be used in a general purpose system. An 8-level deep write buffer offers the benefit of effective data rate capture from the R3000/1 to the main memory. The 8-level read buffer is convenient for slow main memory systems.
[Figure 7. R3000/1 System with the Read-Write Buffers]

A TYPICAL SYSTEM
Figure 7 shows the interconnections among the R3000/1, the instruction and data caches, the read/write buffers, the control state machine, and the system memory. The read and write buffers are built from multilevel pipeline registers, denoted by the IDT 73200. The control state machine represents the logic needed to drive the read and write buffers.

WRITE BUFFER INTERFACE
A write buffer, as discussed earlier, transfers data from cache to main memory and provides address bits to select memory locations. This is illustrated in Figure 8: one write buffer is dedicated to passing address bits and the other transfers data to the main memory. Therefore, the write buffer labeled "address" is activated in both memory reading and memory writing operations.

As seen in Figure 8, the address path carries address, tag and AccTyp bits. Notice that the write buffer labeled "Address" is formed by two IDT 73200s. The first 73200 captures Address Low 0-13 and AccTyp 0,1. The second 73200 captures Adr High 14-29 and Tag 0,1.

The data path, as shown in Figure 8, carries data and parity bits. The data write buffer uses two IDT 73200s. They latch 32 data bits from the cache and transfer them to a memory location selected by a memory controller. Notice that parity bits can be generated using the IDT49C465 when data is flowing from the write buffer to the system memory.

A situation of interest in deep write buffers is the following: the CPU requests a read from a memory location that is about to be updated by the write buffer. The potential problem is clear: the read returns data that has not been updated yet. To avoid this problem, write buffer systems use conflict checking schemes. A common conflict checking scheme compares the addresses of the memory locations to be read and written by the read/write buffers. When an address match is found, a match signal is sent to the CPU.
This solution may require additional hardware. Another approach is "flushing". To simplify the design, the write buffer is "flushed", i.e., all pending writes are placed in the main memory before any read buffer operation takes place. Such is the case in Figure 8, where no additional hardware was needed.

Figure 8 shows the associated control circuitry to drive the write buffer. Notice that write buffer "data" and AdrHi (14-31) are clocked at the SYSOUT signal, whereas AccTyp (0:1) and AdrLo (0:13) are clocked at SYSOUT.

WRITE BUFFER CONTROLLER
The write buffer controller is internally driven by two counters. The I-counter selects the load operation for the input to the 73200. The SEL-counter selects the register to be read at the 73200 output. The write buffer controller also takes care of the "flushing" scheme.

[Figure 8. Write Buffer Interface]

The PAL equations for the write buffer controller are:

MODULE WB_CONT;
TITLE WB_CONT;
TYPE MMI 16R4;

Inputs;
  IC3       Node[pin2];
  SELC3     Node[pin3];
  WBEMPTY   Node[pin4];
  WRACQ     Node[pin4];
  MEMRD     Node[pin5];
  WBFULL    Node[pin6];
  MEMWR     Node[pin7];
  RESET     Node[pin9];
  LRD       Node[pin15];
  LWR       Node[pin14];

Outputs;
  LRD       Node[pin15];
  LWR       Node[pin14];
  X         Node[pin19];
  WBCEN     Node[pin12];
  ICE       Node[pin13];
  SELCE     Node[pin18];

Table;
  X NOT      = IC3 XOR SELC3;
  LWR NOT   := LRD AND WBEMPTY AND [illegible] AND RESET OR
               LRD AND !MEMWR AND WRACQ AND !WBEMPTY AND RESET;
  LRD NOT   := (!WBEMPTY AND !MEMRD AND LWR AND RESET) OR
               (!LRD AND !MEMRD AND RESET);
  WBCEN NOT  = WBFULL;
  ICE NOT    = (!MEMWR AND !WBFULL) OR (!MEMRD AND !WBEMPTY);
  SELCE NOT  = (!LWR AND !WRACQ) OR (!LRD AND MEMRD);
END;
END WB_CONT.
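The two-counter arrangement and the flushing scheme described under WRITE BUFFER CONTROLLER can be pictured with a small behavioral model. This is only a sketch under stated assumptions: the class `WriteBufferModel`, its method names, and the use of a fourth counter bit to tell "full" from "empty" are illustrative choices (the controller listing does combine IC3 and SELC3 with an XOR, which suggests this kind of wrap-bit comparison, but that reading is an assumption). It is not a description of the PAL logic itself.

```python
class WriteBufferModel:
    """Hypothetical behavioral model of an 8-deep write buffer."""

    DEPTH = 8

    def __init__(self):
        self.regs = [None] * self.DEPTH
        self.load_count = 0  # plays the role of the I-counter (4 bits)
        self.sel_count = 0   # plays the role of the SEL-counter (4 bits)

    @staticmethod
    def _index(count):
        return count & 0b0111        # low 3 bits pick one of 8 registers

    @staticmethod
    def _wrap(count):
        return (count >> 3) & 1      # 4th bit records wrap-around

    @property
    def empty(self):
        return self.load_count == self.sel_count

    @property
    def full(self):
        same_index = self._index(self.load_count) == self._index(self.sel_count)
        return same_index and self._wrap(self.load_count) != self._wrap(self.sel_count)

    def write(self, address, data):
        """Load one entry; returns False when the CPU would have to stall."""
        if self.full:
            return False
        self.regs[self._index(self.load_count)] = (address, data)
        self.load_count = (self.load_count + 1) & 0b1111
        return True

    def retire(self, memory):
        """Move the oldest entry to memory; returns False if nothing is pending."""
        if self.empty:
            return False
        address, data = self.regs[self._index(self.sel_count)]
        memory[address] = data
        self.sel_count = (self.sel_count + 1) & 0b1111
        return True

    def read(self, memory, address):
        """Flush all pending writes first, then read from memory."""
        while self.retire(memory):
            pass
        return memory.get(address)
```

In this model a read never sees stale data because every pending write is retired before the read is served, which is the trade made by flushing: no address comparators, at the cost of a longer read when writes are pending.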
The PAL equations for the I-counter are:

MODULE I_COUNTER;
TITLE I_COUNTER;
TYPE MMI 16R4;

Inputs;
  X         Node[pin2];
  IC        Node[pin3];
  ICE       Node[pin4];
  R3KRST    Node[pin5];
  IC3       Node[pin17];
  IC2       Node[pin16];
  IC1       Node[pin15];
  IC0       Node[pin14];

Outputs;
  IC3       Node[pin17];
  IC2       Node[pin16];
  IC1       Node[pin15];
  IC0       Node[pin14];
  WBFULL WBEMPTY S0 S1 S2   Node[pin18]; Node[pin13]; Node[pin19];

Table;
  IC0 NOT := (IC0 AND !ICE) OR
             (!IC0 AND ICE) OR
             !R3KRST;
  IC1 NOT := (!IC0 AND !IC1 AND !IC1 AND !ICE) OR
             (IC0 AND IC1 AND !ICE) OR
             (!IC1 AND ICE) OR
             !R3KRST;
  IC2 NOT := (!IC0 AND !IC1 AND !IC2 AND !ICE) OR
             (IC0 AND IC1 AND IC2 AND !ICE) OR
             (!IC0 AND IC1 AND !IC2 AND !ICE) OR
             (IC0 AND !IC1 AND !IC2 AND !ICE) OR
             (!IC2 AND ICE) OR
             !R3KRST;
  IC3 NOT := (!IC2 AND !IC3 AND !ICE) OR
             (!IC1 AND IC2 AND !IC3 AND !ICE) OR
             (!IC0 AND IC1 AND IC2 AND !IC3 AND !ICE) OR
             (IC0 AND IC1 AND IC2 AND IC3 AND !ICE) OR
             [illegible]
End;
End I_Counter;

The PAL equations for the SEL-counter are:

MODULE SEL_COUNTER;
TITLE SEL_COUNTER;
TYPE MMI 16R4;

Inputs;
  SELCE     Node[pin2];
  IC0       Node[pin3];
  IC1       Node[pin4];
  IC2       Node[pin5];
  R3KRST    Node[pin5];

Outputs;
  S0        Node[pin18];
  S1        Node[pin13];
  SELC3     Node[pin17];
  SELC2     Node[pin16];
  SELC1     Node[pin15];
  SELC0     Node[pin14];
  S2        Node[pin19];
  C         Node[pin12];

Table;
  C NOT = S0 AND S1 AND S2;
End;
End SEL_Counter;

Figure 9.
Read Buffer Interface

READ BUFFER INTERFACE
When reading from the main memory to the cache, the R3000/1 sends a memory read signal to the control state machine, represented in Figure 9 as the Read Buffer Control Logic. Once the signal has been acknowledged, the R3000/1 places the address, tag, and data size in the write buffers. Internally, the 73200 registers capture this information at the R3000/1 clock rate, with load and output configurations determined by the read buffer controller. Once the address is available on the address bus, the controller drives the memory signals to initiate the memory transfer, at the memory clock rate, into the read buffer.

READ BUFFER CONTROLLER
The read buffer controller monitors the flow of data within the read buffer by programming the 73200 internal registers to the appropriate load mode and memory clock frequency. Finally, the controller selects the output registers at a speed that matches the R3000/1 frequency. The PAL equations for the read buffer controller are:

MODULE W/RB_CONT;
TITLE W/RB_CONT;
TYPE MMI 16R8;

Inputs;
  WRACQ        Node[pin5];
  MEMRD        Node[pin6];
  MEMWR        Node[pin8];
  RESET        Node[pin9];
  LRD          Node[pin15];
  LWR          Node[pin14];
  WB_CLK_DIS   Node[pin19];
  CMEMRD       Node[pin13];
  CCMEMRD      Node[pin18];
  WRBUSY       Node[pin16];

Outputs;
  LRD          Node[pin15];
  LWR          Node[pin14];
  WB_CLK_DIS   Node[pin19];
  WB_DATA_OE   Node[pin12];
  CMEMRD       Node[pin13];
  CCMEMRD      Node[pin18];
  WRBUSY       Node[pin16];

Table;
  LWR NOT     := LRD AND !MEMWR AND RESET OR
                 !LWR AND WRACQ;
  LRD NOT     := LWR AND !MEMRD;
  WRBUSY NOT  := !MEMWR OR
                 !WRBUSY AND WRACQ OR
                 !MEMRD;
  WBCEN NOT   := !MEMWR OR
                 !WB_CLK_DIS AND WRACQ OR
                 !MEMRD AND CCMEMRD;
  [label illegible] LRD AND !MEMWR OR {need to be inverted}
                 !LWR AND WRACQ;
  CMEMRD NOT  := !MEMRD;
  CCMEMRD NOT := CMEMRD;
End;
End W/RB_CONT;

CONCLUSION
As the speed of the processor increases, write and read buffers must also become faster and deeper.
The high-speed multi-level pipeline register IDT 73200 meets that challenge by providing a fast and flexible data path to suit various high-speed RISC and CISC processors.

USING IDT73210 AS READ AND WRITE BUFFERS WITH R3000
APPLICATION NOTE AN-6S8
By V. S. Ramaprasad

INTRODUCTION
In this application note, the design of a one-deep read and one-deep write buffer to be used in an R3000 system is described with boolean equations and timing diagrams. The boolean equations are for the control signals of the read and write buffers and the main memory interface. This control logic can be implemented with any PLD. The syntax chosen to describe these equations is simple and is not associated with any PLD programming software. The timing diagrams explain the various states during the operation of the one-deep read and write buffers.

Also described in this application note are the other possible configurations for implementing read and write buffers with IDT73210s. These components can be used as two-deep read and one-deep write buffers, or as one-deep read and two-deep write buffers. Before the application is presented, the features of the 73210 are described and a summary of the memory interface signals of the R3000 is given.

R3000-based systems require read/write buffers between the CPU and the main memory because of the memory bandwidth mismatch. The main memory system supplies the instructions and data through a read buffer. The CPU makes the data updates to the main memory through a write buffer. The speed differences between the CPU, the caches and the main memory that typically exist in many systems demand the use of at least one-level deep read and write buffers. The use of these buffers isolates the caches from the rest of the memory system. They also limit the physical length of the address and data lines and serve as drivers to the rest of the system.
The gain in performance from increasing the depth of the read and the write buffers depends entirely on the application program being executed. By modeling memory subsystems with different depths of read/write buffers (using the System Programmer's Package tools for the R3000) and running the application program on the model, the designer can make the trade-off between the cost and the depth of the buffers.

For high-performance systems with sophisticated main memory schemes like interleaving, and for systems with fast DRAM architectures like Page Mode or Static Column Mode, a one-deep read buffer might satisfy the transfer rate of the processor. For low-performance systems, where the penalty of fetching one word at a time is not significant, and for applications with infrequent successive writes, a one-deep write buffer might also deliver optimal performance.

In systems where one-level deep read and write buffers prove to be sufficient, a bidirectional register can serve as both read and write buffer. The 8-bit bidirectional register IDT73210, with parity checking and parity generation, is an ideal candidate for this purpose. This bidirectional register also allows the designer to build a two-level deep read buffer with a one-level deep write buffer, or a one-level deep read buffer with a two-level deep write buffer. Using the IDT73210 reduces the parts needed for parity generation. Also, by clocking in the lower address bits and the higher address bits with separate clocks, the designer can eliminate latching the low address bits.

©1990 Integrated Device Technology, Inc.

IDT73210 FEATURES
Figure 1 shows the features of the IDT73210 with all the control signals and data paths. It is a bidirectional buffer with separate output enables and clock enables. Data is registered with the same clock in both directions. There is a single data path from port A to port B. The 8-bit data and the parity bit are clocked through register X.
The POLARITY signal is used to select even or odd parity generation. Even parity checking is done on the data, and a parity error is indicated by PERRA. The 8-bit data and the parity bit are enabled through a tri-stateable buffer to port B.

There are two data paths from port B to port A. A multiplexer controlled by SEL selects a path. Even parity checking is done in both paths, and a parity error is indicated by PERRB. The first path is through latch W and register Z. In this path bit W8 is complemented by POLARITY to yield either even or odd parity. The second path is through registers Y and Z, and even parity is generated on the data. The two registers in the second data path provide the user with two-level deep buffering. The 9-bit output is enabled through a tri-stateable buffer to port A.

6/90

[Figure 1. IDT73210]

[Figure 2. Memory Interface Signals]

IDT79R3000 MEMORY INTERFACE
The R3000 interfaces to the main memory through the asynchronous memory bus. The output signals indicate the nature of the operation that the R3000 is performing. The input signals are used to indicate the termination of a stall, to select block refills, and to cause exception processing.

Figure 2 shows the signals used to interface to main memory. The address bus is split into AdrHi and AdrLo.
The AdrHi bus is also used as the Tag bus for cache reads and therefore is shown as bidirectional.

MemRd: This active-low output signal indicates entry into the stall on a read operation. It is used by the state machines to enter a read state and to signal the memory system that the R3000 accepts data from the supplied 32-bit address. For a one-word refill, MemRd is deasserted by the R3000 one cycle after the RdBusy signal is deasserted to indicate that the required data is ready. The deassertion of MemRd signals the end of a read stall. MemRd stays asserted during all the stall cycles.

RdBusy: This input signal to the R3000 is used to enter and terminate read stall cycles. The deassertion of RdBusy terminates the stall cycles, and one cycle later the R3000 enters a fixup cycle for single-word loads or refill cycles for multiple-word loads. RdBusy assertion and deassertion are sampled by the R3000 in phase 1 of the clock cycle.

XEn: This active-low output signal is used to enable the output of the read buffer in refill and fixup cycles.

MemWr: This output signal is asserted low for store operations. Unlike MemRd, this signal is active for only one cycle, as are the associated data and addresses. MemWr is used to enter a write state.

WrBusy: In order to create a write stall, this input signal to the R3000 has to be asserted low during the cycle in which MemWr is asserted. The deassertion of WrBusy terminates a write stall, and the R3000 enters the fixup cycle. In the fixup cycle, the last write operation during which WrBusy was asserted is repeated. WrBusy is usually tied to the signal that indicates the write buffer is full. WrBusy assertion is sampled by the processor in phase 2 and the deassertion is sampled in phase 1 of the clock cycle.

SysOut: This is the clock output of the 79R3000; it runs at the clock frequency at which the R3000 is rated.
CpCond0: The condition of this input signal to the R3000 in stall cycles determines whether the processor will do a single-word read or a multiple-word read.

BusErr: This input signal is provided as a mechanism to create an exception in the R3000 and as an aid to escape from interminable stall cycles.

AccTyp0: This output signal has three functions. During cached reads it indicates whether there was a data cache miss or an instruction cache miss. This information is useful if the block refill size is different for data and instructions. During uncached reads it is used with AccTyp1 to indicate the size of the data being read. During writes it is used along with AccTyp1 to indicate the size of the data being written.

AccTyp1: This output signal is undefined for cached reads. For uncached read operations and for store operations, AccTyp1, along with AccTyp0, indicates the size of the data transfer.

AccTyp2: AccTyp2 is undefined for store operations with stall cycles. For load operations, it is high for cached operations and low for uncached operations. During run cycles, this line indicates whether there is any data transfer during the second phase.

USING IDT73210s
Figure 3 shows the application of the 73210s as a one-deep read and one-deep write buffer. Four 73210s are used to transfer the 32-bit data and the associated four parity bits. On the address bus, four 73210s are used to pass the 32-bit physical address and the access type bits (0:1) to the main memory. Port B of the 73210s is connected to the processor side, and Port A is connected to the memory side. The read and the write data paths are explained in Figures 4 and 5. In this design, a single set of four IDT73210s serves as both the read and the write buffer. A further set of four IDT73210s is used to capture the addresses during read and write operations. The timing diagrams point out the control signals that resolve any conflicts in the use of these buffers.

The control logic, described in the following sections, can be implemented with any PLD that matches the processor speed. To interface with the main memory, signals are defined to make a request to the main memory (MREQ), to specify a read or a write operation to the main memory (MRD, MWR), and to indicate, from the main memory, the completion of a read or write operation (CYCEND). The memory interface signals from the R3000 are used by the PAL state machine in order to generate controls to the buffers and the main memory. The RdBusy, WrBusy, MREQ, MRD and MWR signals, and the clock and data output enables for the 73210s, are generated by the state machine.

[Figure 3. Using IDT73210 as Read and Write Buffer]

Read Operations
The data path for the read operations is through register X. The address and the access type bits go through the latch W and the register Z. The lower address bits are clocked in with SysOut. The tag bits, along with the access type bits, are registered with inverted SysOut (SysClk). The latch W is always transparent, to bypass the address bits and access type bits. The POLARITY signal is held low to pass the access type bits as parity bits through the two 73210s on the tag bus.
The low POLARITY signal to the four 73210s on the data bus generates even parity on the data passing through register X.

[Figure 4. Read Data Path. One Deep Read, One Deep Write Buffers Using IDT73210]

Write Operations
The data path for write operations is through latch W and register Z. Data is clocked into the 73210s on the data bus with SysClk, along with the tag bits. The lower address bits are clocked in with SysOut. Once the address and data are available in the Z registers, the PAL state machine generates the output enable signals and presents the address and the data to the memory. The even parity that is generated by the CPU passes through the parity unit without being modified.

[Figure 5. Write Data Path. One Deep Read, One Deep Write Buffers Using IDT73210]

CONTROL LOGIC
The control logic for the signals that control the address 73210s, the data 73210s, and the handshake signals to the main memory system is described with simple boolean equations. The 73210s are used to capture the data and addresses during read and write operations and to provide the system with one-level deep read and write data paths. Byte enable signals for partial word writes can be generated by extending these equations. In this design, block refills are supported and instruction streaming is assumed to be enabled.
The control signals that are utilized by the PAL state machine to control the read/write buffers and to communicate with the main memory controller are WIP, WrBusy, RdBusy, MRD, MWR, MREQ and CYCEND. The clock input to the PAL is inverted SysOut. In the following boolean equations the following notation is adopted:

  !     Logical NOT operation. Indicates the corresponding signal is active low.
  OR    Logical OR operation.
  AND   Logical AND operation.
  ;     To end a boolean equation.
  :=    Registered output.
  =     Combinatorial output.

Main Memory Controls

{ RdBusy usually stays asserted even when there is no read
{ operation going on. It is deasserted when the memory system
{ acknowledges (asserting CYCEND) that the read is finished. It
{ remains deasserted till MemRd gets deasserted.
{ It is assumed here that the memory system asserts CYCEND in
{ phase 1. RdBusy gets deasserted in phase 2, and the CPU puts
{ out XEn(s) from the next clock cycle.
{ RdBusy is not deasserted with CYCEND associated with a prior
{ write operation.
{ RdBusy is a registered output.

RdBusy := !( !WIP AND MemRd AND CYCEND) OR
          !( !RdBusy AND MemRd);

{ A single pulse request is sent out to the main memory system to
{ indicate that a read or write operation is coming along. It is
{ asserted only when a read or write operation is feasible through
{ the one deep read and write buffers.
{ This is a registered output.

MREQ := ( !WIP AND !MemRd AND MemWr AND !MREQ) OR
        ( !WIP AND MemRd AND !MREQ);

{ A read strobe is given out to the main memory system to
{ indicate a read operation. This signal is asserted while there is
{ no write in progress and MemRd is asserted.
{ To support block refills MRD stays asserted with MemRd.
{ This is a registered output.
MRD := ( !MRD AND !WIP AND MemRd) OR
       ( MRD AND MemRd);

The control signal CYCEND is asserted by the main memory controller to indicate the finish of a write operation or the availability of the first word of the block refill in the read buffer. It is assumed that CYCEND is asserted in phase 1, so that RdBusy can also be deasserted in phase 1.

{ WIP signal is used to indicate whether the write buffer is in
{ the process of retiring its contents to the main memory.
{ WIP is asserted when MemWr is asserted and deasserted when
{ an acknowledge (CYCEND) from the main memory system comes
{ back indicating that the write is carried out.
{ WIP is a registered output.

WIP := ( !WIP AND !MemRd AND MemWr) OR
       ( WIP AND !CYCEND);

{ Write busy (WrBusy) is asserted when there is a write in
{ progress or when read operation is going on. During read
{ operations with streaming enabled, WrBusy should be
{ asserted to stop any writes because there is a common data
{ buffer for both read and writes.

WrBusy = WIP OR MemRd;

{ A write strobe is given out to the main memory system to
{ indicate a write operation is in progress.

MWR = WIP;

Higher Address Buffer Controls

{ Controls for 73210s that pass higher address bits (Tag 16:31)
{ and access type bits (AccTyp 0,1).
{ The path through latch W & register Z is selected by the internal
{ Mux.

SEL = 1;

{ Register Z is enabled for read and write operations when there
{ is no contention between read and writes. Latch W is transparent.
{ A read operation in progress is indicated by MemRd signal.
{ For write operations ABEN is enabled for one clock cycle.

DMemWr := MemWr;
ABEN = ( MemRd AND !WIP AND !MemWr) OR
       ( MemWr AND !WIP AND !MemRd) OR
       ( ABEN AND MemWr AND !DMemWr);

{ Allows the Access Type bits to pass through "Complement
{ Even/Odd Parity" unit as parity bits without getting modified.
POLARITY = 0;

{ The higher address bits along with the access type bits are
{ clocked into the register Z with inverted SysOut.

CP = !SysOut;

{ The higher address bits along with the access type bits are put
{ out to port A when there is a write in progress or while MemRd
{ is asserted.

AAOE = WIP OR MemRd;

{ AEN always disabled for the address 73210.
{ BOE always disabled for the address 73210.

Lower Address Buffer Controls

{ Controls for 73210s that pass lower address bits (AddrLo 0:15)
{ The path through latch W & register Z is selected by the internal
{ Mux.

SEL = 1;

{ Register Z is enabled for read and write operations when there
{ is no contention between read and writes. Latch W is transparent.
{ A read operation in progress is indicated by MemRd signal.
{ For write operations ABEN is enabled for one clock cycle.

DMemWr := MemWr;
ABEN = ( MemRd AND !WIP AND !MemWr) OR
       ( MemWr AND !WIP AND !MemRd) OR
       ( ABEN AND MemWr AND !DMemWr);

{ POLARITY is Don't Care.

POLARITY = 0;

{ The lower address bits are clocked into the register Z with
{ SysOut signal, because they are available in the first phase.

CP = SysOut;

{ The lower address bits are put out to port A when there is a
{ write in progress or while MemRd is asserted.

AAOE = WIP OR MemRd;

{ AEN always disabled for the address 73210.

AEN = 1;

{ BOE always disabled for the address 73210.

BOE = 1;

Data Buffer Controls

{ Controls for 73210s that transfer data bits for reads & writes.
{ The path through latch W & register Z is selected by the internal
{ Mux to provide one-deep write buffer.

SEL = 1;

{ Register Z is enabled for write operations when there is no
{ read operation in progress. A read operation in progress is
{ indicated by MemRd signal. Latch W is transparent.
DMemWr := MemWr;
DBEN = ( MemWr AND !WIP AND !MemRd) OR
       ( DBEN AND MemWr AND !DMemWr);

{ Even parity generated by the CPU is passed through by setting
{ POLARITY to ZERO.

POLARITY = 0;

{ The data bits along with the parity bits are clocked into the
{ register Z with inverted SysOut.

CP = !SysOut;

{ The data bits are put out to port A when there is a write in
{ progress.

DAOE = WIP;

{ DAEN is enabled during read operations.

DAEN = MemRd AND !WIP;

{ DBOE is enabled by XEn to read the data from the read buffer.

TIMING DIAGRAMS
Figures 6 through 11 give the timing waveforms for the one-deep read and one-deep write buffer described in Figure 3. The signals shown in these figures are described by the boolean equations presented earlier. In these timing diagrams, the signals that are generated by the PAL state machine are shown with a displacement in relation to their input signals. Also, some of the signals generated by the PAL state machine are registered with SysClk.

The main memory interface signals generated by the PAL are MREQ, MRD, CYCEND, and MWR. The enable signals to the address 73210s are ABEN and AAOE. The enable signals to the data 73210s are DBEN, DAOE, DAEN, and DBOE. The memory acknowledge signal, CYCEND, is asserted two cycles after MREQ is asserted, for both read and write operations.

Figure 6 shows read and write operations. The memory read operation starts with the MemRd signal being asserted. An MREQ pulse is sent out to the memory, and the MRD signal is asserted for as long as the MemRd signal stays asserted. The memory system responds to the request by placing the data in the read buffer and asserting CYCEND. This deasserts the RdBusy signal. RdBusy is sampled in phase 1 by the processor, and it generates XEn in the next clock cycle.
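The write-side equations given earlier (WIP, MREQ, WrBusy, MWR), together with the two-cycle CYCEND acknowledge just described, can be stepped clock by clock to see how a single write proceeds. The sketch below is a simplification under stated assumptions: signals are modeled as True when asserted, clock phases are ignored, no read is in progress, and the memory is a hypothetical model that asserts CYCEND exactly `ack_delay` clocks after it first sees MREQ. The function name and the trace format are illustrative.

```python
def simulate_single_write(cycles=8, ack_delay=2):
    """Step WIP and MREQ once per clock for one isolated write."""
    wip = False          # registered output, starts deasserted
    mreq = False         # registered output, starts deasserted
    mem_rd = False       # assumption: no read during this scenario
    mreq_cycle = None    # clock in which the memory model first saw MREQ
    trace = []
    for cycle in range(cycles):
        mem_wr = (cycle == 0)  # MemWr is active for one cycle only
        if mreq and mreq_cycle is None:
            mreq_cycle = cycle
        # Hypothetical memory model: acknowledge ack_delay clocks after MREQ.
        cycend = mreq_cycle is not None and cycle == mreq_cycle + ack_delay
        wr_busy = wip or mem_rd          # WrBusy = WIP OR MemRd
        mwr = wip                        # MWR = WIP
        trace.append({"cycle": cycle, "MemWr": mem_wr, "WIP": wip,
                      "MREQ": mreq, "CYCEND": cycend,
                      "WrBusy": wr_busy, "MWR": mwr})
        # Registered outputs take their new values at the next clock edge.
        next_wip = (not wip and not mem_rd and mem_wr) or (wip and not cycend)
        next_mreq = ((not wip and not mem_rd and mem_wr and not mreq) or
                     (not wip and mem_rd and not mreq))
        wip, mreq = next_wip, next_mreq
    return trace


if __name__ == "__main__":
    for row in simulate_single_write():
        print(row)
```

In this simplified model MREQ appears as a single pulse in the clock after MemWr, and WIP (and with it WrBusy and MWR) stays asserted for three clocks before the acknowledge clears it.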
Since the one-deep read and write buffers are implemented using the same buffers, the WrBusy signal is asserted during a memory read operation to halt any write operations. The address enables are asserted throughout the read operation to capture the addresses. For read operations, DAEN is asserted with the MemRd signal to capture the data coming from the memory. The port B output enable for the data buffers is controlled by XEn for reading in the data. The read latency is five clock cycles, including the fixup cycle.

For write operations in Figure 6, the WrBusy signal is asserted as long as the write operation is in progress. This is indicated by WIP. It should be noticed that the RdBusy signal is asserted during write operations to block any read operations. The address enables are asserted during the write run cycle, and the address output enable is asserted throughout the write operation. The data buffer enables are asserted in the same way. It should be noted that WIP is a clocked output. The write operation takes three cycles.

Figure 7 shows a four-word data block refill. The RdBusy signal, once deasserted, remains deasserted until MemRd is deasserted.

Figure 8 shows a four-word instruction block refill with streaming enabled. The instruction cache miss occurred on instruction I1. The refill starts with the basic block boundary instruction I0. The processor enters fixup as the missed instruction is fetched. The processor streams through the rest of the block.

Figure 9 shows a memory read requested by the processor before a previous write is retired to the main memory. The state machine puts out a request for the read operation only after the completion of the write operation, indicated by the first assertion of CYCEND. The enable ABEN is not asserted for the read until the previous write is completed.

Figure 10 shows two write operations in two consecutive clock cycles.
Since the write buffer is one word deep, the second write is not absorbed by the write buffer, and the processor stalls until the first write is retired to the main memory. In the following fixup cycle, the second write is completed to the write buffer. The memory request MREQ for the second write is only generated in the fixup cycle. The data and the address of the second write are not captured by the buffers while the first write is in progress. It should be noted that deassertion of WrBusy is sampled by the processor in phase 1.

Figure 11 shows a write operation occurring in the middle of streaming. Streaming starts with instruction I1. The next instruction, I2, issues a write. Since the write busy signal is already asserted, instruction streaming is aborted. Instruction I2 is executed in the following fixup cycle. WIP is asserted only in the fixup cycle. The data and the address of the write instruction I2 are not captured during streaming.

Figure 6. Read, Write Operations

Figure 7. Data Block Refill

Figure 8. Instruction Streaming

Figure 9. Read During Write In Progress

Figure 10. Write During Write in Progress

Figure 11. Write in Streaming

TWO DEEP READ AND ONE DEEP WRITE

IDT73210s can also be used in a two-deep read and one-deep write configuration. For capturing the addresses and the access type bits, four IDT73210s are used with B ports connected to the processor. For transferring data, four IDT73210s are used with A ports connected to the processor.

This configuration of the data path uses the registers Y and Z for read operations as two-level deep buffers. For write operations, the data is written to the register X, thus providing a one-deep write buffer. The read and write data paths are shown in Figures 12 and 13. It should be noted that even parity is generated on the data in both directions.
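As an illustration of the even parity mentioned above, the following Python sketch computes one even-parity bit per byte of a 32-bit data word and checks it on the receiving side. The per-byte granularity, the byte ordering and the function names are assumptions made for illustration only; this note does not specify how the IDT73210 groups its parity bits.

```python
def even_parity_bit(byte):
    """Return the bit that makes the number of 1s in (byte + parity) even."""
    return bin(byte & 0xFF).count("1") % 2


def word_parity(word):
    """Parity bits for bytes 0 (least significant) through 3 of a 32-bit word.

    One parity bit per byte is an assumption, not a statement about the part.
    """
    return [even_parity_bit((word >> (8 * i)) & 0xFF) for i in range(4)]


def parity_ok(word, parity):
    """Check step: recompute the parity bits and compare with those received."""
    return word_parity(word) == list(parity)
```

A single flipped data bit changes exactly one recomputed parity bit, so `parity_ok` reports the mismatch; two flipped bits in the same byte go undetected, which is the usual limitation of simple parity.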
Figure 12. Read Data Path - Two Deep Read, One Deep Write Buffers Using IDT73210

Figure 13. Write Data Path - Two Deep Read, One Deep Write Buffers Using IDT73210

Figure 14. Read Data Path - One Deep Read, Two Deep Write Buffers Using IDT73210

ONE DEEP READ AND TWO DEEP WRITE

To use IDT73210s in a one deep read and two deep write configuration, four IDT73210s are connected to the address bus with B ports on the processor side. Four IDT73210s are connected to the data bus with B ports on the processor side to transmit data. The data path for the read operations is shown in Figure 14. The address and the access type bits can be passed through the latch W and the register Z. Data is read back from the memory through the register X after even parity is generated. Figure 15 shows the write data path. To utilize the IDT73210s as a two deep write buffer, the addresses and the data are passed through registers Y and Z. These two registers provide the two-level deep buffering for the addresses and the data. If any write operations, such as writing to the registers of I/O devices, require only a one-deep write buffer, then the path through the latch W and the register Z is useful for both the data and the addresses. It should be noted that to transfer access type bits in the two-deep write configuration, separate two-level deep buffering is required.
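The effect of write buffer depth on processor stalls can be illustrated with a toy model. The Python sketch below is not derived from the state machine in this note; it assumes that main memory retires one write at a time, that every write takes the same number of cycles (three, the figure quoted earlier for the one-deep configuration), and that the processor stalls only while no buffer entry is free. Fixup cycles are ignored, and the function name and parameters are hypothetical.

```python
from collections import deque


def store_stall_cycles(issue_cycles, depth, retire_cycles=3):
    """Total cycles the processor waits for a free write-buffer entry.

    issue_cycles: cycles at which stores would issue if never stalled.
    depth: number of write-buffer entries (1 or 2 in this note).
    retire_cycles: cycles memory needs per write (assumed constant).
    """
    pending = deque()    # completion times of writes still held in the buffer
    memory_free_at = 0   # memory retires one write at a time
    stalled = 0
    for t in issue_cycles:
        now = t + stalled
        while pending and pending[0] <= now:    # drop writes already retired
            pending.popleft()
        if len(pending) >= depth:               # buffer full: wait for oldest
            stalled += pending[0] - now
            now = pending.popleft()
        start = max(now, memory_free_at)
        memory_free_at = start + retire_cycles
        pending.append(memory_free_at)
    return stalled
```

With stores in two consecutive cycles, `store_stall_cycles([0, 1], depth=1)` returns 2 while `store_stall_cycles([0, 1], depth=2)` returns 0; a third back-to-back store stalls even the two-deep buffer, which is why the gain depends on the store pattern of the application.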
Increasing the depth of the write buffer to two may improve the performance significantly if the application executes the second store before the first store is absorbed by the main memory.

Figure 15. Write Data Path - One Deep Read, Two Deep Write Buffers Using IDT73210

CONCLUSIONS

The IDT73210 is an ideal part for one/two-deep read/write buffers for R3000 applications. It is bidirectional and speed compatible with the existing RISC processors. It generates and checks even parity and hence reduces the parts count in the memory interface for R3000-based systems. Using IDT73210s on the address bus, separate latches for capturing the address low bits can be eliminated. The IDT73210 also provides the designer two different data paths from port B to port A, to be selected dynamically depending on the operation.

Integrated Device Technology, Inc.

DESIGNING EMBEDDED CONTROL APPLICATIONS WITH THE IDT79R3001 RISController™
APPLICATION NOTE AN-66

By Michael J. Miller

INTRODUCTION

The IDT79R3000 RISC Microprocessor is increasingly selected for demanding embedded tasks in applications ranging from avionics controllers (CAP-32) to laser printer engines and PBX systems where performance is critical. This wide range of use is possible because of the inherently flexible yet powerful nature of the R3000, which allows the system designer maximum flexibility to achieve both cost and performance goals. The IDT79R3001 RISController is the first derivative of the R3000 family and builds upon this flexibility by allowing the designer to create systems with fewer cache parts or even systems without cache at all! While some processors incorporate an on-chip cache, it is typically smaller than desired for large applications.
The R3001 allows the designer to decide whether to have a cache and also the cache size. Since the R3001 includes the cache control, there is no overhead beyond the actual memory components.

Basic system designs vary according to specific needs of the application. To be an ideal embedded control CPU, a processor must provide solutions that meet different sets of criteria. These criteria, however, vary in importance depending upon the particular application. In general, the criteria will fall into one of two broad categories: system performance or system cost. Performance can be measured in terms of raw throughput, context switch time and minimum and maximum latency to interrupts. System cost consists of component cost and parts count, which are not always the same. For avionics, system board space is paramount, not individual component cost.

The IDT79R3001 RISController was defined to meet these requirements for embedded systems, as well as maintain software and architectural compatibility with the R3000. Compatibility allows all of the R3000 development tools to be used with the R3001. While the changes to the R3000 may seem deceptively simple, they result in significant benefits to the R3000 family. The 79R3001 dramatically reduces cache memory costs and lowers the interface parts counts, as well as provides support for single-hierarchy memory systems to increase performance.

TWO SYSTEM APPROACHES

The RISController offers the designer two basic system philosophies. The first is to treat a relatively small amount of local synchronous memory built with SRAM as a cache to allow the R3001 to manage the movement of data between the cache and main memory. In this configuration, the local memory provides each word of instruction or data with a valid bit and an address tag of its page in main memory. If the fetched data is not valid or is from the wrong place, the processor refetches the data from main memory in an asynchronous handshake.
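The valid-bit and tag check described above can be sketched in a few lines of Python. The sizes used here (a 16 Kbyte cache in front of 8 Mbytes of cacheable memory) are the ones in this note's own example; the direct-mapped word organization, the helper names and the tuple layout of a cache entry are assumptions made for illustration.

```python
CACHE_BYTES = 16 * 1024          # local SRAM used as cache
MAIN_BYTES = 8 * 1024 * 1024     # cacheable main memory
WORD_BYTES = 4

# Number of address bits needed for the index and for the tag.
INDEX_BITS = (CACHE_BYTES // WORD_BYTES).bit_length() - 1   # selects one cache word
TAG_BITS = (MAIN_BYTES // CACHE_BYTES).bit_length() - 1     # identifies the "page"
ENTRY_WIDTH = 32 + TAG_BITS + 1                             # data + tag + valid bit


def split(addr):
    """Split a byte address into (cache index, tag)."""
    index = (addr // WORD_BYTES) % (1 << INDEX_BITS)
    tag = (addr // CACHE_BYTES) % (1 << TAG_BITS)
    return index, tag


def lookup(cache, addr):
    """cache: list of (valid, tag, data) tuples. Returns data on a hit, else None.

    A None result stands for the case in which the processor refetches the
    word from main memory.
    """
    index, tag = split(addr)
    valid, stored_tag, data = cache[index]
    return data if valid and stored_tag == tag else None
```

For these sizes the sketch gives a 9-bit tag and a 42-bit entry (32 data bits, 9 tag bits and 1 valid bit).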
The designer programs the TAG address width to exactly match the system requirements. For example, a cacheable main memory of 8Mbytes and a cache of 16Kbytes requires a 9-bit tag and, thus, the total required width is 42 bits for data, tag and valid bit. With the large index address in the local cache, the designer can choose to lock portions of the program into the cache.

The other approach is to incorporate a large amount of local memory and manage the loading of data and instructions through program control. With this approach, the width of the local memory is 32 or 36 bits, depending on whether or not parity is selected for protection.

The R3001 accomplishes both approaches with direct control of the local memory through dedicated pins and, thus, avoids extra components and maximizes performance of the local memory.

RISController is a trademark of Integrated Device Technology, Inc.
C EXECUTIVE is a registered trademark of JMI Software Consultants, Inc.
©1990 Integrated Device Technology, Inc. 5/90

Figure 1. Block Diagram of Two Approaches

MAKING A SELECTION

To select the optimal configuration, the designer must determine the computational requirements of the application. For example, many applications have a "drop dead" performance requirement; a processor unable to achieve this minimum performance standard is not a viable candidate for the application. Performance can be measured as the time to execute a task, respond to an interrupt or perform a complete context switch. The performance of the R3000 is demonstrated to be 21 VUPs in a 25MHz M/2000 system or 13 VUPs in a 16MHz M/120 system.
Using the JMI C EXECUTIVE® as the kernel on an M/120 provides context switches in 10µs and system calls in only 8µs.

Predictable interrupt response times can be achieved by locking the kernel into cache. With the R3001, this is accomplished by using a high address line (such as AddrLo23) as one of the address lines for the instruction cache. In this way the instruction cache is divided into two halves, each half containing words from a specific region in memory. In this example, if all of the kernel code is placed in memory above 8M and the application code is below 8M, the kernel will not be pushed out of the cache by the application code. Furthermore, if a kernel such as the "C EXECUTIVE" is used, the kernel will never leave the cache because it consists of only 8Kbytes, which fit entirely into one half of the 16Kbyte cache that is divided into two parts.

Another key aspect of an embedded control application is the parts count and the board area of the design. With this in mind, IDT and other SRAM suppliers currently provide wider memories as well as integrated logic functions on-chip. For example, the IDT71586 is a 4K x 16 SRAM with an integrated address latch that can be used to build a processor cache (for either the R3000 or 80386) with a low parts count. Another new standard architecture is emerging with IDT's new 71222, which includes two interleaved banks of 4K x 18 SRAM in one device. With this approach, a designer can build a system that includes both instruction and data caches with minimal parts count.

Like most other processors, the R3001 needs address and data buffers. If operated without cache it will provide performance similar to a 68020. By adding a 16Kbyte cache using only three parts (IDT71586), however, the performance more than doubles, making a 12.5MHz system achieve 4 VUPs running large embedded applications such as Page Description Interpreters.
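The cache-locking scheme described above, in which a high address line takes the place of one cache address line, can be sketched as an index function. The sketch below assumes a 16 Kbyte instruction cache of 32-bit words and assumes that address bit 23 replaces the most significant index bit; the constant and function names are hypothetical.

```python
INDEX_BITS = 12    # 16 Kbyte cache of 32-bit words = 4096 entries (assumed)
REGION_BIT = 23    # address line substituted for the top index bit (8M boundary)


def locked_index(addr):
    """Cache index with address bit 23 replacing the highest index bit."""
    low = (addr >> 2) & ((1 << (INDEX_BITS - 1)) - 1)   # address bits 2..12
    region = (addr >> REGION_BIT) & 1                    # 0: below 8M, 1: above 8M
    return (region << (INDEX_BITS - 1)) | low
```

Under these assumptions every address below 8M maps to entries 0 through 2047 and every address above 8M maps to entries 2048 through 4095, so an 8 Kbyte kernel placed at the 8M boundary occupies the upper half exactly once and cannot be displaced by application code.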
Figure 2. Locking Code in the Instruction Cache

NO CACHES

A "cacheless" scheme can be used in various embedded applications where deterministic behavior must be guaranteed, where the application has stringent board space constraints, and where the amount of code to be executed is limited although high performance is demanded. Examples of such applications include communication controllers, avionics, robotics and switching controllers. These systems typically feature compact instruction and data spaces because the nature of their task is highly repetitive. This allows the system designer to compact the memory hierarchy and eliminate the need to "cache" large main memory in the system. Instead, the R3001 is used in a non-hierarchical (without cache) system, where all of the needed code and data fit in a tightly coupled synchronous memory. Such a system achieves the highest performance possible while minimizing the parts count. Since all memory accesses are synchronous (with no cache or TLB misses), the system achieves the true (theoretical) performance of the RISC engine. At 25MHz operation, the system will achieve 23 VUPs with complete predictability for interrupts.

The example shown in Figure 3 is a system which incorporates 128Kb each of instruction and data synchronous memory, a one-deep read/write buffer interface to slower peripheral devices, 64Kb of EPROM for instructions and initialization data (configured in the asynchronous space) and a DUART. The instruction and data memory can also be expanded by using 64K x 4 or 128K x 8 memories to make systems with 256K and 512K bytes, respectively.
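The expansion figures quoted above follow from simple arithmetic on a 32-bit wide memory. The sketch below assumes a single rank of devices, a 32-bit data path without parity and a hypothetical helper name; it is only meant to show where the 256K and 512K byte totals come from.

```python
def bank_size(depth_words, chip_width_bits, bus_width_bits=32):
    """Return (devices needed, capacity in bytes) for one rank of memory."""
    chips = bus_width_bits // chip_width_bits
    return chips, depth_words * bus_width_bits // 8


# 64K x 4 devices: eight devices side by side, 64K words of 4 bytes each.
print(bank_size(64 * 1024, 4))
# 128K x 8 devices: four devices side by side, 128K words of 4 bytes each.
print(bank_size(128 * 1024, 8))
```

Selecting the 36-bit option with parity would increase the device count for a given depth but not the usable byte capacity.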
Figure 3. Low Parts Count System Using the IDT79R3001

Slower devices, such as the boot EPROM and the I/O devices, reside in the "asynchronous" address space of the system. The R3001 divides its 4GB virtual address space into 4 sections:

1. The kernel/user segment, which is both cached and mapped by the TLB.
2. Kernel segment 0, which is not mapped but is cacheable. This is where the system's synchronous memory exists.
3. Kernel segment 1, which is neither mapped nor cacheable. The processor reset vector as well as the peripheral devices are located in this space.
4. Kernel segment 2, which, like the kernel/user segment, is both cacheable and mapped by the TLB.

The asynchronous bus is used when accesses to kseg1 are issued by software, as well as when the system is initializing itself. The asynchronous bus supports I/O and memory devices with varying access speeds. In this example, the asynchronous address space is 256Kb and the asynchronous data bus is 8 bits wide. The design includes 128Kb of EPROM (two 64K x 8 devices) and one DUART. The EPROM holds the boot code, initialization data and application code. It would be reasonably simple to scale this design to include other I/O devices as well.

CONCLUSION

One of the strengths of the R3000 is its ability to support different memory hierarchies and cache configurations. IDT made modifications to the R3000 to create the IDT79R3001 RISController.
It retains the flexibility of the original R3000 while meeting controller design requirements such as reduced parts count, small board real estate and predictable interrupt response. While future versions of the family will provide higher levels of integration such as on-board cache, there will always be a version which allows the designer to tailor the memory hierarchy.

Figure 4. Using the IDT79R3001 in a Non-cache Approach

Integrated Device Technology, Inc.

USING IDT71502 RAMs IN A REAL-TIME DEBUGGING TOOL FOR A R3000 MICROPROCESSOR BASED SYSTEM
APPLICATION NOTE AN-67

by Bhanu V. R. Nanduri

INTRODUCTION

The proliferation of high-speed RISC and CISC microprocessors has created a demand for real-time debugging tools. This application note shows how a real-time logic tracing tool can be created using IDT71502 multifunction RAMs. The IDT71502 can be used as a stand-alone logic analyzer or as part of an embedded fault monitor and analysis system. Details of how to apply this system to an R3000 RISC microprocessor-based system are given. The discussion in this paper is equally valid for use in high-speed CISC processor-based designs.

The IDT71502 can function either as a logic tracing device or as a test pattern generator. As a logic tracing device, the IDT71502 can record bus activity continuously and then be stopped on a predetermined event such as a bus error. This allows the activity leading up to the "event" to be recorded for analysis. Since the trace function is accommodated in a single device,
embedded tracing is more likely to be practical.

DESCRIPTION OF IDT71502 MULTIFUNCTION RAM

The IDT71502 is a 4K x 16 multifunction RAM with an address set-up time of 25ns. It has a breakpoint comparator, a 16-bit pipeline register and an address counter. In addition, there is a 16-bit set-up register used to set the chip operating mode and to read back chip operating status conditions. It includes a serial control interface called the serial protocol channel (SPC™), which is available in a variety of other products from IDT as well. The SPC logic, as implemented in the IDT71502, has one 8-bit command shift register, a command decode register and a 16-bit data shift register. The serial data shift register can be configured to operate in a diagnostic mode. In the diagnostic mode of operation, the shift register can read all status conditions on the chip such as the RAM output, pipeline register output, data output pin state and RAM load/read counter value.

The serial protocol channel consists of a four-pin interface bus through which the user can access the internal registers of the IDT71502. The four pins are:
(a) Serial data input pin (SDI) for sending data and commands to the device.
(b) Serial data output pin (SDO) for extracting data from the device.
(c) Serial clock pin (SCLK) for clocking data and commands.
(d) Command/Data mode pin (C/D) to provide command or data identification to the device.

Figure 1. (Block diagram: RS-232 interface, Maxim RS-232 device, Western Digital TR1863 UART and timing chain driving the SCLK, C/D, SDI and SDO pins of the IDT71502 4K x 16 RAM)

SPC is a trademark of Integrated Device Technology, Inc.
©1990 Integrated Device Technology, Inc. 8/90

This four-bit bus can be very conveniently connected to an RS-232C line for direct serial communication with a computer, and Figure 1 illustrates one scheme to achieve this.
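A host that drives the four SPC pins needs little more than a bit-serial shifter for the 8-bit command register. The Python sketch below packs the 4-bit command field and the 4-bit register field of this note into one command byte and lists the (C/D, SDI) values presented on successive SCLK pulses. Several details are assumptions rather than facts from this note: the command field is placed in the upper nibble, bits are shifted most significant bit first, and the function names are hypothetical. The numeric codes in the usage lines are taken from Table 1 of this note (command 0 = read register, command 1 = write register, register 5 = set-up and status register). The ordering of data transfers relative to the command is not modeled.

```python
def spc_command(command, register):
    """Pack the 4-bit command and register fields into one 8-bit SPC command.

    Placing the command field in the upper nibble is an assumption.
    """
    if not (0 <= command <= 0xF and 0 <= register <= 0xF):
        raise ValueError("command and register fields are 4 bits each")
    return (command << 4) | register


def shift_bits(value, width):
    """Bits of value in the order they would be clocked (MSB first, assumed)."""
    return [(value >> i) & 1 for i in reversed(range(width))]


def command_frames(command, register):
    """(C/D, SDI) pairs, one per SCLK pulse, with C/D held high for a command."""
    return [(1, bit) for bit in shift_bits(spc_command(command, register), 8)]


read_setup = spc_command(0x0, 0x5)    # read the set-up and status register
write_setup = spc_command(0x1, 0x5)   # write the set-up and status register
```

After the eighth pulse the host would return C/D to the data level, which, according to this note, is the transition on which the command is executed.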
The user is urged to refer to the "IDT71502 FUNCTIONALITY DEMONSTRATION BOARD USER MANUAL" for more information on interfacing the IDT71502 to the RS-232C serial communication line.

The SPC's eight-bit command is divided into a four-bit command field and a four-bit register field. The four-bit command field is used to determine whether a read or a write operation will be executed. The four-bit register field of the command register is used to select the various internal registers and the external pins on which the read or write will take place. Thus, the four-bit command field and the four-bit register field can effectively access any internal register for a read or a write operation and monitor the state of the external pins. Table 1 summarizes the SPC commands, the register codes and the set-up register format.

When the command/data line is high, commands are serially clocked by SCLK into the internal command register via the serial data input pin (SDI). When the command/data line is low, data is serially clocked by SCLK into the internal data register via the serial data input pin (SDI). The SPC commands are executed whenever the C/D line transitions from command mode (logic 1) to data mode (logic 0).

This device, when configured to operate in the trace mode, serves as a real-time debugging tool analogous to a logic analyzer.

SET-UP/STATUS REGISTER CODE
Bit   Name            Read/Write Function
15    CE              Read Only
14    SOEFF           Read Only
13    SOE Pin         Read Only
12    OE Pin          Read Only
11    WE Pin          Read Only
10    INIT Pin        Read Only
9     BP Compare      Read Only
8     BP Pin          Read Only
7     CS1             Read/Write
6     CS0             Read/Write
5     Non-Reg High    Read/Write
4     Non-Reg Low     -
3     -
2     BC-ADDRS        Read/Write
1     BC Pipelined    Read/Write
0     Trace Mode      Read/Write

SPC COMMAND CODES

Command Code (Hexadecimal)   Operation    Operation Performed
0                            Read         Read Register
1                            Write        Write Register
2                            Read RAM     Read RAM and Increment Counter
3                            Write RAM    Write RAM and Increment Counter
4-C                          Reserved     (Reserved NO-OP)
D                            Write        Stub Diagnostic
E                            Write        Serial Diagnostic
F                            Reserved     (Reserved NO-OP)

SPC REGISTER CODES

Register Code (Hexadecimal)  Read/Write Function   Register
0                            Read/Write            RAM Counter
1                            Read/Write            RAM Output/Input
2                            Read/Write            Pipeline Register
3                            Read/Write            Break Mask Register
4                            Read/Write            Break Data Register
5                            Read/Write            Setup and Status Register
6                            Read Only             I/O15 - I/O0 (Data Pins)
7                            Read Only             RAM Address Pins
8-F                          Reserved              Reserved (Unused)

Table 1.

AN R3000-BASED SYSTEM

A block diagram of an R3000-based system's CPU and its memory interface is shown in Figure 2. It consists of the CPU and FPU, data and instruction caches and the read and write buffers connected to the CPU and the system bus. This is a typical configuration found in embedded or general purpose type systems which use the R3000. To reduce the burden on the system designer who is interested in using the IDT79R3000, the CPU and FPU, as well as the instruction and data caches with the read and write buffers, are now available in a compact module (IDT7RS101) which can be connected to the user's system bus. This approach to system design vastly reduces the design cycle time by shifting the design emphasis to main memory and I/O interfaces.

Figure 2. A Generic R3000 Microprocessor Based System

When debugging a system board based on the IDT7RS101 or its equivalent, the majority of debugging is done by monitoring the cache to main memory interface on the main memory side of this interface. An embedded trace function may operate in the same way. This keeps the capacitance of the trace RAM pins out of the speed-critical cache buses.

If desired, the R3000 can be operated in the uncached mode. This forces all accesses to main memory and allows every memory access of the processor to be monitored from the cache to main memory interface. The user wishing to operate in the uncached mode can do so by setting bit 11 of the TLB entry register to 1, indicating uncached mode, or by operating the software in virtual address space kseg1. Kseg1 is a kernel-mode virtual addressing space which is uncached and is 512 Mbytes long, starting at virtual address 0xa000_0000. With this approach, the user must define instruction space and data space in the main memory and must provide an address decoded input to the IDT71502 tracing the control bus. This input will be used to determine whether an instruction or data related transaction occurred during that clock period.

Another approach is to tie the address valid bit on the TAG bus to ground via a 300 Ohm resistor. The resistor is necessary to prevent a direct short from occurring when the CPU is driving the TAG bus. Tying the address valid bit on the TAG bus to ground will result in invalidating the cache TAGs, and cache misses will occur, resulting in the processor accessing main memory to get that information. Whenever the main memory is accessed to get information after a cache miss, the processor puts out information on the Access Type pins, indicating the size of the word to be transferred and that it was a cached reference.
The AccTyp(2) pin output indicates a cached reference when 1 and an uncached reference when 0. AccTyp(0) indicates a data reference when 0 and an instruction reference when 1. The AccTyp signals are latched using our control trace RAM and will determine whether an instruction or data transaction occurred during that clock period.

A user wishing to implement a custom cache can use the IDT71502 in the trace mode to monitor the cache. However, it should be pointed out that the timing for this part of the system is more stringent. The user may have to register trace data before clocking it into the IDT71502s to meet the IDT71502's set-up and hold time restrictions.

Figure 3. Block Diagram to Trace Instructions, Data, Instruction Addresses and Data Addresses on the System Bus of an R3000-based System

DESCRIPTION OF THE MONITOR CIRCUIT

Figure 3 shows the block diagram of an implementation of the monitor circuit. It is placed on the system bus between main memory and the write buffer of the R3000. The R3000 uses the write-through cache update policy to ensure data coherency. The function of the write buffer is to capture data and addresses output by the CPU and ensure that data is passed on to main memory. The read buffer is used for temporary storage of data during data transfers between main memory and the CPU.
Depending on the block refill size, the read buffer can be 1, 4, 8, 16 or 32 words deep. The block refill size of the system is fixed during the system reset operation. The R3000's CpCond0 input can be set to a 0 to indicate a single word transfer or can be set to a 1 to indicate a block transfer by the external memory controller.

The PAL state machine is used to generate the appropriate IDT71502 strobes to capture instructions, data, instruction addresses and data addresses. The IDT71502s labeled "1" in Figure 3 are used for capturing data and instruction addresses; the IDT71502s labeled "2" are used for capturing data and instructions. The IDT71502 labeled "3" is used to trace the control bus signals. In this application note, we assume a single word deep read buffer. If a system is designed for all possible types of data transfers (i.e., bytes, half words, tribytes, words and block refills), our PAL equations will also have to change to generate the strobes necessary to trace these data transfers.

Figure 4. Main Memory Read Cycle (Single Word Read)

Figure 5. Main Memory Write Cycle (Single Word Write)

TIMING ANALYSIS

Figure 4 is the timing waveform for a single word read and Figure 5 is the timing waveform for a single word write. Since the system bus timing parameters are dependent on an external memory controller, Table 2 summarizes the important handshaking signals needed to satisfy the protocol necessary to trace system bus signals.
AddrOutEn is an input to the read buffer from the main memory controller. When asserted, this input will enable the address that is registered in the read buffer onto the system bus. McRd is a read strobe that is generated by the main memory controller in response to a MemRd pulse from the R3000. RBDEn is an input to the read buffer from the main memory controller that registers the data available on the system data bus into the read buffer. WrAck is an input to the write buffer from the main memory controller. It indicates that the controller has written the word presented to it to main memory. RdAck is also a main memory controller output that is used to generate the RdBusy signal to the R3000. TWE is an input to the IDT71502s that latches data addresses, instruction addresses, data, and instructions; it is also an output from our PAL state machine. TCLK is the clock input to the IDT71502s tracing data addresses, instruction addresses, data, and instructions. The signals TWE and TCLK are also used as inputs to the IDT71502s in order to trace the control bus signals.

Signal       Function
BMemRd       The buffered memory read signal from the R3000
BMemWr       The buffered memory write signal from the R3000
AddrOutEn    Read buffer address output enable signal from the memory controller
McRd         Main memory read strobe from the memory controller
RBDEn        Read buffer data enable strobe from memory controller
WrAck        Write acknowledge to write buffer from memory controller
RdAck        Read acknowledge to read buffer from memory controller
TWE          Write enable input to IDT71502 from PAL state machine
TCLK         Clock input to IDT71502 from PAL state machine

Table 2.

TIMING SPECIFICATIONS FOR THE IDT71502s

tTWDS is the IDT71502 specification defined as "Trace Write Data Set-up Time". The user must satisfy the following condition:
tTWDS ≥ 8ns

tTWDH is the IDT71502 specification defined as "Trace Write Data Hold Time". The user must satisfy the following condition:
tTWDH ≥ 2ns

tTWS is the IDT71502 specification defined as "Trace Write Enable Set-up Time". The user must satisfy the following condition:
tTWS ≥ 8ns

tTWH is the IDT71502 specification defined as "Trace Write Enable Hold Time". The user must satisfy the following condition:
tTWH ≥ 2ns

CONCLUSION

The IDT71502 is a multifunction RAM that is fast enough to be used to trace the operation of most high-speed microprocessors, including the IDT79R3000 RISC microprocessor. The 25ns speed grade can be used to trace the operation of this processor at full speed up to 25MHz.

The discussion in this paper focused on providing the pertinent information needed to construct a monitor circuit based on IDT71502 multifunction RAMs to trace the system bus of an R3000-based system. This discussion is also valid for users interested in using the IDT71502 RAMs in a trace mode to monitor system buses based on other high-speed processors.

Microprocessor based systems are usually provided with software routines that are used as diagnostic tools to test system primary and secondary memory for failures. These programs also test I/O devices before the user receives a prompt indicating that the system as a whole is ready for service. This procedure is usually carried out after system reset, but occasionally during normal operation the system "crashes" in the middle of some critical task and the user has no clue as to what happened prior to the "crash". The IDT71502 multifunction RAMs, when operated in trace mode and mounted permanently on critical system paths, can serve as "black boxes" to give the user this very important information. This information can then be very conveniently retrieved via the four-bit serial protocol channel connected to the RS-232 connector, and the reason for the crash can be determined.
The IDT71502 is a multifunction RAM that has the capability to serve as a valuable logic monitoring tool. It contains the Serial Protocol Channel and a breakpoint comparator, has a 4K x 16 memory space and is available with an access speed of 25 ns. Thus, it is well-suited for use as a single chip logic analyzer in high-speed, high-density environs.

Integrated Device Technology, Inc.

USING THE IDT7MB6049 CACHE MODULE WITH THE IDT79R3000 RISC PROCESSOR IN SINGLE OR MULTIPROCESSOR SYSTEMS
APPLICATION NOTE AN-76
by Kelly Mass

The IDT7MB6049 is a complete cache module for the IDT79R3000 RISC processor and is designed for both single- and multi-processor systems. It has two banks of SRAMs, each configured as 16K x 60, and each with address latches. One bank is used to cache instructions, the other to cache data. They share a data bus, allowing one bank to be accessed at a time.

Use in multi-processor systems is facilitated by a second address bus and an additional set of latches for that bus. This bus is used in multi-processor applications to latch an address from a source other than the R3000. This allows the system to invalidate entries in the data cache in conjunction with the R3000. This is done in order to maintain cache coherency. The set of address latches for the instruction cache is included in the module for symmetry, although normally no invalidations are done to the instruction cache. Instruction cache invalidation would require cache swapping, but only data cache invalidation is described below.

When the system wants to invalidate an entry in the data cache, it forces the R3000 into an MP Stall by asserting CpCond(3). During the one clock cycle that it takes for the processor to enter the MP Stall, it is the responsibility of the system to disable the output of the latch which supplies the processor's address to the data cache, and enable the output of the latch which supplies the invalidate address.
The module pins P1OE(1) and P2OE(1) are used for this purpose. It is important that they never be activated simultaneously, since the outputs of the latches are tied together. The same applies to P1OE(2) and P2OE(2) for the instruction cache. Both address latches for the data cache are normally clocked by the same DClk signal from the R3000 through the P1LE(1) and P2LE(1) pins of the 7MB6049.

Once the processor is in MP Stall, it strobes DRd while CpCond(2) is unasserted, allowing the system to read the contents of the cache. The actual invalidation of the data cache entries begins when the system asserts CpCond(2) and provides the appropriate invalidate address. CpCond(2) causes the R3000 to output an invalid bit and strobe DWr. Multiple invalidations are performed by keeping CpCond(2) and CpCond(3) asserted and changing the invalidate address. Note that the invalidate address timing must be consistent with the processor timing. One suggestion is that the invalidate address input of the module be driven by a register that is clocked by SysOut.

The IDT7MB6049 has two chip select (CS) signals. Both of these should be grounded if the cache is not depth expanded. The four output enable (OE) and four write enable (WE) signals are split evenly between the data and instruction cache: (1-2) control the data cache, and (3-4) control the instruction cache. OE(1-2) of the 7MB6049 connect to DRd1 and DRd2 on the R3000. DRd1 and DRd2 are identical, and the load should be distributed evenly between them. Likewise, OE(3-4) connect to IRd1 and IRd2, WE(1-2) connect to DWr1 and DWr2, and WE(3-4) connect to IWr1 and IWr2.

The convention of the pin naming of the 7MB6049 is that P1 refers to the address from the R3000, and that P2 refers to the (invalidate) address from the system. Likewise, (1) refers to the data cache and (2) refers to the instruction cache.
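The naming convention just described can be captured in a few lines, which may help when cross-checking a netlist against the module pinout. This is a minimal sketch: the function `describe_pin` and its wording are illustrative only, and it covers just the latch enable (LE) and output enable (OE) pins, where the number in parentheses selects the cache. On the address pins the parenthesised number is a bit index, so those are rejected.

```python
import re

# Meaning of the two fields in a 7MB6049 latch-control pin name such as P2OE(1).
SOURCES = {"1": "R3000 address", "2": "system (invalidate) address"}
CACHES = {"1": "data cache", "2": "instruction cache"}
FUNCTIONS = {"LE": "latch enable", "OE": "output enable"}

PIN_PATTERN = re.compile(r"P([12])(LE|OE)\(([12])\)")


def describe_pin(name):
    """Translate a pin name like 'P1LE(2)' into a plain-language description."""
    match = PIN_PATTERN.fullmatch(name)
    if match is None:
        raise ValueError(f"not a latch-control pin name: {name!r}")
    source, function, cache = match.groups()
    return f"{FUNCTIONS[function]} of the {SOURCES[source]} latch for the {CACHES[cache]}"


def outputs_conflict(p1oe_active, p2oe_active):
    """True if both latches feeding one cache would drive the shared address lines."""
    return p1oe_active and p2oe_active


if __name__ == "__main__":
    for pin in ("P1OE(1)", "P2OE(1)", "P1LE(2)"):
        print(pin, "->", describe_pin(pin))
```

The `outputs_conflict` helper restates the rule that P1OE and P2OE for the same cache must never be active together.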
As shown in Figure 2, P1LE(1) and P2LE(1) are typically connected together to DClk, since they latch addresses into the two data cache latches. P1LE(2) and P2LE(2) are likewise connected together to IClk. P2LE(2) is not used if instruction cache invalidation is not performed. Similarly, P1OE(1) and P1OE(2) are typically connected together so that the outputs of the two R3000 address latches are enabled and disabled together, while P2OE(1) and P2OE(2) can together control the output of the invalidate address latches. P2OE(2) may be pulled continuously high if the instruction invalidate address latch is unused.

The 60 data I/O pins of the module are labeled D(0) to D(59). Although the ordering of the data and address pins of a RAM is normally arbitrary and can be ignored, that is not the case with the 7MB6049. Because of steps taken to reduce the chip count and power consumption of the module, Tag(12)-Tag(15) of the R3000 must connect to D(36)-D(39) on the 7MB6049, and AdrLo(12)-AdrLo(15) of the R3000 must connect to P1A(10)-P1A(13) on the 7MB6049. The order in which the other I/O pins are connected is not critical. Table 1 shows recommended I/O pin connections between the R3000 and 7MB6049.

              R3000 Signals          IDT7MB6049 Signals
Data          Data(0) - Data(31)     D(0) - D(31)
Data Parity   DataP(0) - DataP(3)    D(32) - D(35)
Tag           Tag(12) - Tag(31)      D(36) - D(55)
Tag Parity    TagP(0) - TagP(2)      D(56) - D(58)
Tag Valid     TagV                   D(59)

Table 1. Connection of Data and Tag Buses

©1990 Integrated Device Technology, Inc. 8/90

[Figure 1: block diagram. The R3000 supplies DClk, IClk, AdrLo(2-15), DWr1/DWr2, DRd1/DRd2, IRd1/IRd2 and IWr1/IWr2 to the module; the system supplies the invalidate address, the invalid address enable and the processor address enable; the data and tag buses are 60 bits wide.]
Figure 1. Block Diagram of the IDT7MB6049

[Figure 2: pin connection diagram. The R3000 signals DClk, IClk, AdrLo(2-15), DWr1/DWr2, DRd1/DRd2, IRd1/IRd2 and IWr1/IWr2 connect to the module pins P1LE/P2LE, P1A(0)-P1A(13), WE(1)-WE(4) and OE(1)-OE(4); the system supplies the invalidate address (2-15) on P2A(0)-P2A(13), the invalid address enable on P2OE(1)/P2OE(2) and the processor address enable on P1OE(1)/P1OE(2); D(0)-D(59) carry the data and tag buses; CS(1) and CS(2) are the chip selects.]
Figure 2. Pin Connections of the IDT7MB6049

APPLICATION NOTE AN-77
IDT79R3001 SPECIFICATIONS & CACHE RAM TIMINGS
Integrated Device Technology, Inc.

1.0 GENERAL DESCRIPTION

The IDT79R3001 is a RISC microprocessor which is used in a variety of applications ranging from low-end embedded controllers to high-end workstations. Currently, the R3001 operates at a frequency of up to 33 MHz. This specification does not explain the functionality of the R3001 nor its architecture but is limited to describing the key timing parameters. For a more detailed description of the functionality and architecture of the R3001, please refer to the references listed at the end of this document.

This document covers a brief description of the R3001, the three-phase clock inputs, cache timings, required timings for SRAMs to function as cache for the R3001, two technical notes explaining the factors used in the timing calculations, and the conclusion reached.

2.0 FUNCTIONAL DESCRIPTION

The IDT79R3001 is a 32-bit RISC microprocessor that is currently available from Integrated Device Technology, Inc. in two packages: the 144-pin PGA and the 172-pin ceramic flatpack. It has a 32-bit data bus, a 32-bit address bus that is divided into the low-order bits (AdrLo) and the high-order bits (TAG), control signals for the cache, control signals for main memory, and power and ground pins. The R3001 also has three double frequency clock inputs and one clock output used for interfacing the R3001 to the external world.

3.0 THREE CLOCKS AND DELAY-LINE SETTINGS

Figure 3.1 shows a block level diagram of the R3001 with its three clock inputs coming from a delay line. Table 3.1 shows a summary of the delay line settings to be used for different operating frequencies of the R3001. Please note carefully that Clk2xSys is taken as the zero time reference and comes from the first tap of the delay line. The other 2x clocks lag Clk2xSys in time and follow it with respect to delay line taps.

[Figure 3.1: a delay line driving the Clk2xSys, Clk2xSmp/Rd and Clk2xPhi inputs of the IDT79R3001.]
Figure 3.1. Three-Phase Clock Input to the R3001

Parameter   Clk2xSys   Clk2xSmp   Clk2xPhi
16 MHz      0          6          16
20 MHz      0          6          14
25 MHz      0          6          12
33 MHz      0          4.5        9

Table 3.1. Delay Line Settings for R3001 Operating at Different Frequencies

4.0 DERATING CALCULATIONS AND CACHE TIMING CONSIDERATIONS USING x4 SRAMS

The design of the cache subsystem for the R3001 is straightforward. Industry standard static RAMs function as cache. This chapter discusses the methodology used to calculate the critical timing parameters for a static RAM so that it can function as cache for the R3001. This chapter examines the timings for a 16 MHz, 20 MHz, 25 MHz, and 33 MHz R3001. The timing equations derived take into account the effect of capacitive loading on the bus. The derating factors are calculated based on certain assumptions. These assumptions are detailed in this chapter and the derating factors calculated. The timing equations are then discussed. At the end of this chapter, a table containing the SRAM timings (for different operating frequencies of the R3001) is included.

©1990 Integrated Device Technology, Inc. 8/90
[Figure 4.1: block-level diagram showing the IDT79R3001, two FCT373 address latches (latch enables driven by IClk and DClk), two banks of 12 IDT7198 (16K x 4) SRAMs, an FCT823 address register and an FCT374 data register, together with the AdrLo, Data, SysOut, read and write signals.]
Figure 4.1. Block Level Diagram of a Cache Subsystem with the R3001 Using IDT7198 16K x 4 to Function as Cache (the R3010 is not shown in this figure)

4.1 Device Capacitance

Figure 4.1 shows a typical R3001 based system. The cache consists of fast 16K x 4 static RAMs, e.g., the IDT7198. The AdrLo bus of the R3001 goes through a high-speed transparent latch: the FCT373. It also goes through a latch which is used to address the main memory. All the devices have an input and an output capacitance. In addition, each device is capable of driving a certain load. These parameters (the input capacitance, the output capacitance, and the load capacitance) are given in Table 4.1.

The cache format of the R3001 consists of 48 bits: 32 bits of data, 15 bits of tag, and a valid bit. With this requirement, it is clear that for the instruction cache, 12 IDT7198s (16K x 4 SRAMs) are needed. The data cache has the same format. This means that there are a total of 24 SRAM devices for the cache.

Device     # of Devices   Capacitance     Total Capacitance
R3000      1              Cin = 10 pF     10 pF
IDT7198    2              Cout = 7 pF     14 pF
IDT374A    1              Cout = 12 pF    12 pF
IDT823B    1              Cin = 10 pF     10 pF

Table 4.1. Capacitances of the Various Devices in a Typical R3001 System

4.2 Assumptions for Surface Mount Layout Design With 'x4' SRAMs

In the following sections, certain assumptions have been made while calculating the derating factors. These are as follows:
1) The trace has a capacitance of 2 pF/inch.
2) The speed of light is 2 ns/foot in epoxy.
3) The R3001 speeds are specified with a loading of 25 pF. For every additional 25 pF, there is a delay of 1 ns.
Note that the cache control signals are specified with a 50 pF load and derate 1 ns/25 pF after that.
4) The distances between the R3001 and the latches are approximately 1 inch each.
5) The distances between the R3001 and the RAMs are approximately 4 inches each.
6) In all of the assumptions, it is assumed that a surface mount package is used.

Figure 4.2 shows a brief mechanical layout of an R3001 board.

[Figure 4.2: board layout showing the R3001 and the cache tag and data RAMs with approximate spacings of 1" and 3"; the read and write buffers are assumed to be underneath the board.]
Figure 4.2. Surface Mount Board of an R3001 System with Cache and Main Memory Interface and Approximate Distances Between the Various Devices

4.2.1 Address Bus Derating Calculations

For the system shown in Figure 4.1, each address bit is connected to five latches: one going to the main memory interface buffer, two to the instruction cache and tag memory, and two to the data cache and tag memory respectively. The latches in turn are connected to the address pins on the static RAM. Figure 4.3 shows all the devices that each address bit is connected to.

[Figure 4.3: address bit A2 and the devices connected to it.]
Figure 4.3. Block Diagram Showing Various Devices Connected to One Address Bit

Trace length from the CPU to the latch = 4 inches   (4.1)
Capacitance of the trace, Ctrace = 4 x 2 pF/inch = 8 pF   (4.2)
Input capacitance of the 373 latch = 10 pF   (4.3)
As each address bit is connected to five latches (FCT373),
Total input capacitance due to 5 devices, C373in = 5 x 10 = 50 pF   (4.4)
Total capacitive load = Ctrace + C373in = 8 + 50 = 58 pF   (4.5)
The rated R3001 load, CL(R3001) = 25 pF   (4.6)
From Eq. (4.5) and Eq. (4.6),
Extra capacitive loading for the R3001 = 58 - 25 = 33 pF   (4.7)

Let us now examine the capacitive loading between the latches and the RAM.

Path length from latches (373s) to RAM (7198s) = 3"   (4.8)
Trace capacitance from latch to RAM = 3 x 2 pF/in = 6 pF   (4.9)
Input capacitance of the RAM = 5 pF   (4.10)
Each output from the latch is connected to eight RAM devices.
Load due to 8 devices = 6 x 5 = 30 pF   (4.11)
Total capacitance = 30 + 6 = 36 pF   (4.12)
The rated '373 load = 50 pF   (4.13)

From Eq. (4.12) and Eq. (4.13) it can be seen that there is no delay due to the capacitive load between the latch and the RAM. However, there is a delay due to the capacitive load between the R3001 and the latch. This delay can be calculated as follows:

For every extra 25 pF of load, there is a delay of 1 ns   (4.14)
From Eq. (4.7) and Eq. (4.14), delay due to the capacitive load = 33/25 = 1.32 ns   (4.15)
The speed of light = 2 ns/foot   (4.16)
For a maximum path length of 5", delay = 5"/12" x 2 = 0.8 ns   (4.17)
From Eq. (4.15) and Eq. (4.17),
Total propagation delay for the address bus, AdrLo^d = 1.32 + 0.8 ≈ 2 ns   (4.18)

4.2.2 Data Bus Derating Calculations

The derating calculations for the data path are similar to those done for the address path. The data bus is connected to the floating point unit (R3010), the instruction cache (IDT7198), the data cache (IDT7198), a read register (FCT374A), and a write register (FCT823). This is shown in Figure 4.4. Two cases must be considered: a data store and a data fetch. Both are discussed.

[Figure 4.4: data bit D0 connected to the R3000, the IDT7198 caches (I and D, 2 devices), the FCT374A, the FCT823B and the R3010.]
Figure 4.4. Block Diagram Showing Various Devices Connected to One Data Bit

4.2.2.1 Data Store (R3001 CPU Outputs Data)

Each data bit is connected to two RAMs (7198s), one for instruction and one for data.
The path length for the data bus = 5"   (4.18)
Trace capacitance for the data bus = 5 x 2 pF/in = 10 pF   (4.19)
Capacitive loading due to devices, Cdevices = 2 x CRAMin + C374in + C823 + CR3010   (4.20)
Cdevices = 2 x 7 + 12 + 10 + 10 = 46 pF   (4.21)
Total capacitive load = Ctrace + Cdevices = 46 + 10 = 56 pF   (4.22)
Propagation delay due to speed of light = 5"/12" x 2 = 0.8 ns   (4.23)
Delay due to capacitive load = (56 - 25)/25 = 1.24 ns   (4.24)
From Eq. (4.23) and Eq. (4.24),
Total propagation delay on a store = 1.24 + 0.8 ≈ 2 ns   (4.25)

4.2.2.2 Load (RAM Provides Data)

Since the trace length is the same, Ctrace = 10 pF   (4.26)
Capacitive load due to devices, Cdevices = CR3001 + CR3010 + CRAMin + C374in + C823
Cdevices = 10 + 10 + 12 + 10 + 7 = 49 pF   (4.27)
Total capacitance = Ctrace + Cdevices = 10 + 49 = 59 pF
The RAM rated drive is 30 pF.
Extra load = Total capacitance - RAM rated drive = 59 - 30 = 29 pF   (4.28)
Propagation delay due to capacitive load = 29/25 = 1.16 ns   (4.29)
Propagation delay due to the path length = 0.8 ns   (4.30)
Total propagation delay = 1.16 + 0.8 ≈ 2 ns   (4.31)

4.2.3 Read and Write Control Derating Calculations

The effect of the capacitance on the control signals from the R3001 processor to the caches and the memory interface is considered here. The control signals on the R3001 are IRd, DRd, IWr, and DWr, which control the instruction cache read, data cache read, instruction cache write, and data cache write respectively. The read and write control signals are connected to the output enable (OE) and write enable (WE) of the instruction and data cache, respectively. Assuming the use of a 16K x 4 IDT7198 static RAM, each control signal is connected to 8 such static RAMs.
Number of devices (SRAM) connected to each control line = 12   (4.32)
Input capacitance of each device (SRAM) = 5 pF   (4.33)
Total load capacitance = 5 x 12 = 60 pF   (4.34)
Path length = 5"   (4.35)
Trace capacitance = 5 x 2 pF/in = 10 pF   (4.36)
Total capacitance = 60 + 10 = 70 pF   (4.37)
Extra capacitive load = 70 - 50 = 20 pF   (4.38)
Propagation delay due to capacitive load ≈ 1 ns   (4.39)
Propagation delay due to the trace length = 0.8 ns   (4.40)
Total propagation delay = 1 + 0.8 ≈ 2 ns   (4.41)

[Figure 4.5: through-hole board layout showing the R3000, the D-cache RAMs and the address register, with approximate spacings of 2" and 5".]
Figure 4.5. Board Layout for a Through-Hole Design of an R3001 Cache Subsystem

4.3 Assumptions for Through-Hole Layout Design Using x4 SRAMs

In this section, the deratings are calculated for a through-hole design. Figure 4.5 shows an example of the layout of a through-hole design. This layout corresponds to a demonstration board used extensively at IDT. The data trace lengths are 10 inches and the address trace lengths are 9 inches.

4.3.1 Address Derating Calculations

For the system shown in Figure 4.3, the number of devices connected to the R3001 is the same.

Trace length from the CPU to the latch = 9"   (4.42)
Trace capacitance = Ctrace = 9 x 2 = 18 pF   (4.43)
Input capacitance of the latches = 10 pF   (4.44)
Total capacitance = 5 x C373 + Ctrace = 5 x 10 + 18 = 68 pF   (4.45)
Extra load on the R3001 = CL = 68 - 25 = 43 pF   (4.46)
Since the rated 373 load is 50 pF, there is no derating factor between the FCT373 and the RAMs. Therefore the derating is between the R3001 and the latches.
Delay due to capacitance = 43/25 = 1.75 ns   (4.47)
Propagation delay due to the trace length = 9/12 x 2 = 1.5 ns   (4.48)
Total derating on the address bus = 1.75 + 1.5 ≈ 3 ns   (4.49)

4.3.2 Derating on the Data Bus

As in section 4.2.2, the derating for the data bus is calculated for two cases: i) an instruction fetch, and ii) a data store.

4.3.2.1 Data Store (R3001 CPU Outputs Data)

Each data bit is connected to two RAMs (7198s), one for instruction and one for data.

The path length for the data bus = 10"   (4.50)
Trace capacitance for the data bus = 10 x 2 pF/in = 20 pF   (4.51)
Capacitive loading due to devices, Cdevices = 2 x CRAMin + C374in + C823 + CR3010   (4.52)
Cdevices = 2 x 7 + 12 + 10 + 10 = 46 pF   (4.53)
Total capacitive load = Ctrace + Cdevices = 20 + 46 = 66 pF   (4.54)
Propagation delay due to speed of light = 10"/12" x 2 = 1.6 ns   (4.55)
Delay due to capacitive load = (66 - 25)/25 = 1.55 ns   (4.56)
From Eq. (4.55) and Eq. (4.56),
Total propagation delay on a store = 1.54 + 1.67 ≈ 3 ns   (4.57)

4.3.2.2 Load (RAM Provides Data)

Since the trace length is the same, Ctrace = 20 pF   (4.58)
Capacitive load due to devices, Cdevices = CR3001 + CR3010 + CRAMin + C374in + C823
Cdevices = 10 + 10 + 12 + 10 + 7 = 49 pF   (4.59)
Total capacitance = Ctrace + Cdevices = 20 + 49 = 69 pF
The RAM rated drive is 30 pF.
Extra load = Total capacitance - RAM rated drive = 69 - 30 = 39 pF   (4.60)
Propagation delay due to capacitive load = 39/30 = 1.3 ns   (4.61)
Propagation delay due to the path length = 1.6 ns   (4.62)
Total propagation delay = 1.3 + 1.6 ≈ 3 ns   (4.63)

4.3.3 Read and Write Control Deratings

For a through-hole design, the effect of derating on the control signals will be greater. This section calculates that effect. The trace length from the CPU to the RAMs is 9 inches for the layout shown in Figure 4.5. Each control signal is connected to 8 devices.
Number of RAM devices connected to each control signal = 12   (4.64)
Input capacitance of each RAM = 5 pF   (4.65)
Total load capacitance = 12 x 5 = 60 pF   (4.66)
The trace length = 9"   (4.67)
Trace capacitance = 9" x 2 pF/inch = 18 pF   (4.68)
Total load capacitance = 60 + 18 = 78 pF   (4.69)
Extra load = 78 - 25 = 53 pF   (4.70)
Derating due to capacitive load = 53/50 = 1.6 ns   (4.71)
Propagation delay due to trace length = 9/12 x 2 ns/foot = 1.6 ns   (4.72)
Total derating = 1.6 + 1.6 ≈ 3 ns   (4.73)

4.4 Timing Equations for Cache Design

This section deals with the timing equations that enable us to determine the critical timing requirements of the static RAM that will be used as cache. These equations are based on the use of static RAMs (without built-in latches) as cache RAMs. The superscript 'd' in the following equations denotes the deratings to be taken into account. The static RAM chosen for illustration here is a 16K x 4 IDT7198. The board is assumed to be surface mount for all speeds of the R3001 except for the 16 MHz speed grade. The derating for the surface mount board is 2 ns and that for a through-hole board (which is used for the 16 MHz R3001) is 3 ns. The deratings were derived from certain assumptions. The assumptions and the methodology used are explained in the previous sections.

In the following, a generalized equation is given, followed by the timing requirements for different frequencies of the R3001. All calculations are based on the R3001 specifications for the four speed versions (16, 20, 25, and 33 MHz), which are found in the IDT data sheets.

Figures 4.6, 4.7, 4.8, and 4.9 show the timing diagrams of the R3001 when it is doing a data store followed by an instruction fetch. This is the worst case example and is chosen to determine the SRAM parameter requirements. Figure 4.6 shows the timing diagrams for an R3001 operating at 16 MHz. Figures 4.7, 4.8, and 4.9 show the timing diagrams for an R3001 operating at 20 MHz, 25 MHz and 33 MHz respectively.
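As a cross-check on the 2 ns and 3 ns deratings quoted above, the arithmetic of sections 4.2 and 4.3 can be sketched in a few lines. This is an illustrative sketch only: the function name and argument layout are assumptions, the three constants are the stated layout assumptions (2 pF/inch, 2 ns/foot, 1 ns per extra 25 pF), and the unrounded intermediate values it produces may differ slightly from the intermediate figures printed in the note.

```python
# Layout assumptions taken from section 4.2 of this note.
PF_PER_INCH = 2.0   # trace capacitance, pF per inch
NS_PER_FOOT = 2.0   # signal propagation in epoxy, ns per foot
PF_PER_NS = 25.0    # every extra 25 pF of load adds 1 ns


def derating_ns(trace_in, device_pf, rated_pf, path_in):
    """Return (capacitive delay, flight delay, total) in ns for one net.

    trace_in  : trace length used for the capacitance estimate, inches
    device_pf : summed input capacitance of the attached devices, pF
    rated_pf  : load at which the driver is specified, pF
    path_in   : longest path used for the flight-time estimate, inches
    """
    total_pf = trace_in * PF_PER_INCH + device_pf
    extra_pf = max(0.0, total_pf - rated_pf)  # no credit for a light load
    cap_ns = extra_pf / PF_PER_NS
    flight_ns = path_in / 12.0 * NS_PER_FOOT
    return cap_ns, flight_ns, cap_ns + flight_ns


# Surface-mount address bus: 4" trace, five 10 pF latch inputs,
# 25 pF rated load, 5" maximum path.
surface = derating_ns(4, 5 * 10, 25, 5)
# Through-hole address bus: 9" trace and path, same devices.
through_hole = derating_ns(9, 5 * 10, 25, 9)

if __name__ == "__main__":
    print("surface mount:", surface)
    print("through hole: ", through_hole)
```

Rounding the totals to the nearest nanosecond gives the 2 ns and 3 ns deratings used in the timing equations.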
The encircled numbers represent the equations presented in section 4.4. The timing diagrams, in conjunction with the equations, are used to determine the timing requirements.

[Figure 4.6: 60 ns cycle timing (16 MHz R3001) for a data store (phase 2) followed by an instruction fetch (phase 1), showing AdrLo, the RAM output, the read and write strobes, the CPU data output and SysClk, with encircled equation numbers.]

[Figure 4.7: 50 ns cycle timing (20 MHz R3001), same signals.]

[Figure 4.8: 40 ns cycle timing (25 MHz R3001), same signals.]

[Figure 4.9: 30 ns cycle timing (33 MHz R3001), same signals.]

The following equations are used to determine the timing parameters for the static RAM so that they can function as cache for different operating frequencies of the R3001.
The numbers at the left correspond to the encircled numbers in the timing diagrams. Equations 9 and 10 are not shown in the timing diagram but are included for completeness. The equations also use some R3001 parameters. These are listed in Table 4.2.

(1) Internal Sample to Phase Delay

This is the time that the processor needs to sample the incoming data. Typically, for the R3001, $t_{smp} \ge 5$.

(2) RAM Address Access Time

This equation is used to determine the Address Access time parameter requirements of the static RAM. From the timing diagram of Figure 4.9, it is easily calculated. As an example, let us calculate the address access time for a 33 MHz R3001. The total cycle time for a 33 MHz R3001 is 30 ns. If the processor's sample time requirement is met, the time remaining in the cycle is 24 ns. Within this time the data has to be presented to the processor. The processor requires a data setup time of 4 ns. There is also a propagation delay through the latch for the address bus. For the 33 MHz part, a fast FCT373C is used, which has a maximum propagation delay of 4.7 ns (see Table 4.3). The derating factors due to the capacitance and the trace length also have to be taken into account. Using all these factors, the equation is:

$t_{RAMAA} \le t_{cyc} - t_{smp} - t_{DS} - t_{373PD} - t_{AdrLo}^{d} - t_{RAMAA}^{d}$

16 MHz R3001: $t_{RAMAA} \le 60 - 10 - 9 - 5.2 - 3 - 3$, so $t_{RAMAA} \le 29.8$
20 MHz R3001: $t_{RAMAA} \le 50 - 8 - 8 - 5.2 - 2 - 2$, so $t_{RAMAA} \le 24.8$
25 MHz R3001: $t_{RAMAA} \le 40 - 6 - 6 - 5.2 - 2 - 2$, so $t_{RAMAA} \le 18.8$
33 MHz R3001: $t_{RAMAA} \le 30 - 4.5 - 4.5 - 4.7 - 2 - 2$, so $t_{RAMAA} \le 12.3$

(3) Cache Enable to Sample

This equation is used to determine the system output enable ($t_{OES}$) requirements of the cache RAM. This should meet the processor's setup specification. The output enable time ($t_{OE}$) specification for the RAM is tested for a voltage change of 200 mV (a fall from 1.732 V to 1.532 V for IDT RAMs). For a system, however, the voltage falls from approximately 3.3 V to 1.5 V. This fall time is usually a nanosecond. Therefore the RAM specifications should take this system factor into consideration and specify the output enable time at least one nanosecond lower than the calculated timings.

$t_{OES} \le t_{cyc}/2 - t_{Rd}^{d} - t_{DS} - t_{sys-smp} + t_{sys-rd} - t_{OES}^{d}$

16 MHz R3001: $t_{OES} \le 30 - 3 - 9 - 10 + 10 - 3$, so $t_{OES} \le 15$
20 MHz R3001: $t_{OES} \le 25 - 2 - 8 - 8 + 8 - 2$, so $t_{OES} \le 13$
25 MHz R3001: $t_{OES} \le 20 - 2 - 6 - 6 + 6 - 2$, so $t_{OES} \le 10$
33 MHz R3001: $t_{OES} \le 15 - 2 - 4 - 4.5 + 4.5 - 2$, so $t_{OES} \le 7$

(4) Minimum Read Pulse Width

This timing requirement guarantees that the read pulse width generated by the processor is at least as long as the cache RAM output-enable time.

$t_{OES} \le t_{cyc}/2 - t_{sys-rd} - t_{OES}^{d}$

16 MHz R3001: $t_{OES} \le 30 - 10 - 3$, so $t_{OES} \le 17$
20 MHz R3001: $t_{OES} \le 25 - 8 - 2$, so $t_{OES} \le 15$
25 MHz R3001: $t_{OES} \le 20 - 6 - 2$, so $t_{OES} \le 12$
33 MHz R3001: $t_{OES} \le 15 - 4.5 - 2$, so $t_{OES} \le 8.5$

(5) Read-Write I-Cache Data Bus Contention

This timing requirement ensures that the RAM output is tristated soon enough after the instruction read signal goes high, so that no data contention occurs even in the worst case, when the processor performs a store operation.

$t_{RAMHZ} \le t_{sys} - t_{Rd}^{d} + \mathrm{DEn}$

16 MHz R3001: $t_{RAMHZ} \le 16 - 3 + (-2.5)$, so $t_{RAMHZ} \le 10.5$
20 MHz R3001: $t_{RAMHZ} \le 14 - 2 + (-2)$, so $t_{RAMHZ} \le 10$
25 MHz R3001: $t_{RAMHZ} \le 12 - 2 + (-1.5)$, so $t_{RAMHZ} \le 8.5$
33 MHz R3001: $t_{RAMHZ} \le 9 - 2 + (-1)$, so $t_{RAMHZ} \le 6$

(6) Processor Data-Setup to End of Write

This enables the designer to determine whether the cache RAMs have adequate data setup time when the processor does a store operation. In the equation, the minimum derating is used on the write line, i.e., $t_{Wr}^{d}$, because that is the worst case assumption.
tRAMOS :::; tcyC/2 - tsys-smp - tOVal - tOVal d - tWr d 16 MHz R3001: tRAMOS:::; 30 - 10 - 3 - 3 - (-2) tRAMOS:::; 16 20 MHz R3001 : tSetupSys :::; 25 - 14 - 3 - 2 + 1 + 1.5 tSetupSys :::; 8.5 25 MHz R3001 : tSetupSys :::; 20 - 12 - 2 - 2 + 1 + 1.5 tSetupSys :::; 6.5 33 MHz R3001: tSetupSys:::; 15 - 9 - 2 - 2 + 1 + 1.5 tSetupSys :::; 4.5 (9) Data Hold from SysClk 20 MHz R3001: tRAMOS:::; 25 - 8 - 3 - 2 - (-1) tRAMOS:::; 13 This timing parameter is to guarantee that the hold time specification for an external register is met on a processor store. In this equation the minimum value of tRod is taken to insure worst case numbers. 25 MHz R3001: tRAMOS:::; 20 - 6 - 2 - 2 - (-1) tRAMOS:::; 11 tHoldSys :::; tsys-rd d- tsysd - t240PDmax + tRAMLZ + tRd d 33 MHz R3001: tRAMOS:::; 15 - 4.5 - 2 - 2 - (-1) tRAMOS:::; 7.5 16 MHz R3001: tHoldSys:::; 6 - 2 - 4.8 + 2 + 1 tHoldSys:::; 2.2 20 MHz R3001 : tHoldSys :::; 6 - 1 - 4.8 + 2 + 1 tHoldSys :::; 3.2 (7) Data Hold from End of Write This parameter requirement guarantees that the data holdfrom end of write ofthe cache RAM ismetwhenthe processor or the read buffer is writing to the RAMs. 25 MHz R3001: tHoldSys :::; 6 - 1 - 4.8 + 2 + 1 tHoldSys :::; 3.2 33 MHz R3001 : tHoldSys :::; 4.5 - 1 - 4.8 + 2 + 1 tHoldSys :::; 1.9 tRAMHD :::; tRAMLZ 16 MHz R3001: tRAMHD:::; 2 (10) Address Setup to End of Write 20 MHz R3001: tRAMHD:::; 2 This equation enables us to determine the timing requirement for the RAM so that the address set up time is sufficient before the trailing edge of the write pulse. 25 MHz R3001: tRAMHO:::; 2 33 MHz R3001: tRAMHO:::; 2 tRAMAW:::; tcyc - tsmp-sys - tAdrLod - t373PO + twrd (8) Data Setup to SysClk This timing parameter ensures that the setup time into an external register (for the main memory interface) is sufficient enough for when the processor is doing a store. The data is clocked in the register on the rising edge of the buffered SysOut (through an inverting FCT240A). 
In this equation, tsys(min)d is used to insure worst case calculations. 16 MHz R3001: tRAMAW:::; 60 - 10 - 3 - 5.2 + 3 tRAMAW:::; 44.8 20 MHz R3001: tRAMAW:::; 50 - 8 - 2 - 5.2 + 2 tRAMAW:::; 36.8 25 MHz R3001 : tRAMAW :::; 40 - 6 - 2 - 5.2 + 2 tRAMAW:::; 28.8 33 MHz R3001: tRAMAW:::; 30 - 4.5 - 2 - 4.7 + 2 tRAMAW:::; 20.8 tsetupSys :::; tcyC/2 - tsys - tOVal - tOVal d + tsysd+t240POmin 16 MHz R3001 : tSetupSys:::; 30 - 16 - 3 - 3 + 2 + 1.5 tSetupSys:::; 11.5 9.14 12 R3001 SPECIFICATIONS & CACHE RAM TIMING MICROPROCESSOR INTERFACE GUIDE APPLICATION NOTE-77 (11) Write Hold Pulse-Width 10T79R3001 This requirement guarantees that the cache RAMs minimum write pulse width specification is met. AdrLo CIk2xSys tRAMPW :::; tcyC/2 - tWrDly CIk2xSmpIRd IClk 1Wr ~ Addr Path2 10T7198 CIk2xPhi 16 MHz R3001 : tRAMPW :::; 30 - 5 tRAMPW:::; 25 (16KX4) 20 MHz R3001: tRAMPW:::; 25 - 4 tRAMPW:::; 21 Figure 4.10a. Circuit Showing rwr and IClk Signals to Latch and SRAM 25 MHz R3001: tRAMPW:::; 20 - 3 tRAMPW:::; 17 33 MHz R3001: tRAMPW:::; 15 - 2 tRAMPW:::; 13 (12) Write Recovery Time The write recovery time is the time between the write pulse going inactive and the change in address. This characteristic is usually specified by the SRAM manufacturer and is typically zero. This parameter is important in the R3001 cache interface and care must be taken to choose the proper part to prevent race conditions. In the R3001 cache design using the IDT7198 16 Kx4RAM, the latch enable is controlled by IClklDClk and the write enable on the RAM is controlled by IWrl DWr. The timing diagram shows the relationship between the two clocks and the parameter TwR. Timing calculations below show that the write recovery specifications are not violated. The input and ouput capacitances for the R3001, IDT7198, and FCT373 can be obtained from Table 4.1. 
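As a cross-check of items (10) and (11), the two write-side budgets can be recomputed with a few lines of Python. This is only a sketch: the dictionary keys and function names are illustrative and do not come from the IDT documentation, and the operand values are simply those printed in the worked examples for each speed grade.

```python
# Operand values (ns) copied from the worked examples for items (10) and (11).
# Key names are illustrative placeholders, not IDT symbols.
BUDGET_INPUTS = {
    16: {"tcyc": 60.0, "tsmp_sys": 10.0, "tadrlo_d": 3.0, "t373pd": 5.2, "twr_d": 3.0, "twrdly": 5.0},
    20: {"tcyc": 50.0, "tsmp_sys": 8.0, "tadrlo_d": 2.0, "t373pd": 5.2, "twr_d": 2.0, "twrdly": 4.0},
    25: {"tcyc": 40.0, "tsmp_sys": 6.0, "tadrlo_d": 2.0, "t373pd": 5.2, "twr_d": 2.0, "twrdly": 3.0},
    33: {"tcyc": 30.0, "tsmp_sys": 4.5, "tadrlo_d": 2.0, "t373pd": 4.7, "twr_d": 2.0, "twrdly": 2.0},
}


def address_setup_to_end_of_write(p):
    """tRAMAW <= tcyc - tsmp-sys - tAdrLo^d - t373PD + tWr^d"""
    return p["tcyc"] - p["tsmp_sys"] - p["tadrlo_d"] - p["t373pd"] + p["twr_d"]


def write_pulse_width(p):
    """tRAMPW <= tcyc/2 - tWrDly"""
    return p["tcyc"] / 2 - p["twrdly"]


# Map each speed grade to (tRAMAW budget, tRAMPW budget), rounded to 0.1 ns.
budgets = {
    mhz: (round(address_setup_to_end_of_write(p), 1), round(write_pulse_width(p), 1))
    for mhz, p in BUDGET_INPUTS.items()
}

for mhz, (t_aw, t_pw) in budgets.items():
    print(f"{mhz} MHz R3001: tRAMAW <= {t_aw} ns, tRAMPW <= {t_pw} ns")
```

A candidate RAM would then be compared against these budgets using its own data sheet figures for address setup to end of write and write pulse width.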
Figure 4.10a is a simple circuit showing the connections of IClk and IWr from the R3001 to the latch enable (LE) on the FCT373 device and the write enable (WE) on the static RAM, respectively. Figure 4.10b shows the tWR timings with respect to the data cache in an R3001 based system.

Derating Calculations for DClk and DWr Output

To calculate the effect of derating on the control signals DClk and DWr, the following assumptions have been made.

1) The pin to pin variation on an R3001 device is 15% for a 50 pF load. Under the maximum case, the deratings will vary from 1.7 to 2 ns for DClk and DWr. Under the minimum case the deratings will vary from 0.58 to 0.625 ns.
2) The trace length for the DWr signal is 6 inches.
3) The trace length for the DClk signal is 2 inches.
4) The trace length of the address bus to the RAM is 4 inches.
5) Each IClk control signal is connected to four FCT373 devices.

Figure 4.10b. Write Recovery Timing

To prove that the tWR parameter is not violated, the calculations are done as follows. The derating effects on the DClk and AdrLo signals should exceed that of the DWr signal. The calculations are similar to the derating calculations in previous sections. The minimum propagation delay through the latch is considered. The derating on the DClk signal coming out of the 79R3001 is less than that of the DWr signal. The reverse case is superfluous and in fact makes the situation better. Both the minimum and the worst case derating effects on the same 79R3001 are shown, because the write recovery time parameter must not be violated over the entire operating range.

Capacitive Deratings (maximum case; R3001 variation 15%, 1.7 ns - 2 ns):

((Cdriver + Cload + Ctrace) - Crated)/25 * (Min or Max) = CLD

IClk:       ((10 + 4 * 10 + 2 * 2) - 25)/25 * 1.7 = 1.97 ns
IWr:        ((10 + 8 * 7 + 6 * 2) - 25)/25 * 2 = 4.24 ns
RamAddr:    ((12 + 8 * 7 + 4 * 2) - 50)/25 * 0.5 = 0.52 ns

Path1 - Path2 > tWR
4.49 - 4.24 > 0

Capacitive Deratings (minimum case; R3001 variation 15%, 0.58 ns - 0.625 ns):

((Cdriver + Cload + Ctrace) - Crated)/25 * (Min or Max) = CLD

IClk:       ((10 + 4 * 10 + 2 * 2) - 25)/25 * 0.58 = 0.58 ns
IWr:        ((10 + 8 * 7 + 6 * 2) - 25)/25 * 0.625 = 1.325 ns

Calculations:

Race path 1: IClk(min) + Tpd(LE) + RamAdr(min) = 0.58 + 2 + 0.52 = 3.1 ns
Race path 2: IWr(max) = 1.325 ns

Path1 - Path2 > tWR
3.1 - 1.325 = 1.775 > 0

From the above calculations and the RAM timing Tables 4.3 and 4.4, it can be seen that the data setup to the processor is met. The output enable of the RAM, which is controlled by IRd, goes high and the RAM output starts to go tri-state. From the figure, the reader may correctly question whether the hold time requirements of the R3001 are met. They are indeed met, by the capacitance on the bus and also because CMOS devices are being used. The technical note entitled "Meeting Bus Hold for the R3001" gives a more detailed explanation.

Table 4.3 gives the timing data sheet for a typical SRAM device. The timing parameters correspond to a particular RAM configuration. Other RAM devices may have different timings for some of the parameters. However, there are certain timings that must be met. These critical parameters are listed in Table 4.4; the unlisted parameters may vary a bit from device to device.
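The maximum-case write recovery check above can be restated as a short Python sketch. The function and variable names are hypothetical, and the comments on what each capacitance term represents (four FCT373 inputs on IClk, eight RAM inputs on IWr and on the latched address, 2 pF per inch of trace) are a reading of the stated assumptions rather than something the application note spells out term by term.

```python
def capacitive_derating(c_driver, c_load, c_trace, c_rated, ns_per_25pf):
    """Extra delay in ns for load capacitance (pF) beyond the rated load."""
    return ((c_driver + c_load + c_trace) - c_rated) / 25.0 * ns_per_25pf


# Maximum-case figures from the calculation above.
iclk = capacitive_derating(10, 4 * 10, 2 * 2, 25, 1.7)     # IClk: four latch inputs, 2 inch trace
iwr = capacitive_derating(10, 8 * 7, 6 * 2, 25, 2.0)       # IWr: eight RAM inputs, 6 inch trace
ram_addr = capacitive_derating(12, 8 * 7, 4 * 2, 50, 0.5)  # latch output to RAM address inputs

T_PD_LE_MIN = 2.0  # minimum FCT373 latch enable delay in ns, as used in the text

path1 = iclk + T_PD_LE_MIN + ram_addr  # time for the new address to reach the RAM
path2 = iwr                            # time for the write enable to go inactive
margin = path1 - path2                 # must stay positive for tWR = 0

print(f"path 1 = {path1:.2f} ns, path 2 = {path2:.2f} ns, margin = {margin:.3f} ns")
```

The margin comes out slightly above the text's 4.49 - 4.24 because the sketch does not round IClk to 1.97 ns before summing. The minimum case follows the same pattern with the minimum derating multipliers.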
Calculations : Race path 1 : Iclkmin + Tpd(le) + RamAdrmin ( 1.97 + 2 + 0.52 ) =4.49 Race path 2 : Iwrmax = 4.24 = RamAddr ((12 + 8 * 7 + 4 * 2) - 50)/25 * 0.5 = 0.52ns 9.14 14 R3001 SPECIFICATIONS & CACHE RAM TIMING MICROPROCESSOR INTERFACE GUIDE APPUCATION NOTE-77 AC ELECTRICAL CHARACTERISTICSCOMMERCIAL TEMPERATURE RANGE SYMBOL PARAMETER TEST CONDITIONS 16MHz MAX. MIN. 20 MHz MIN. MAX. 25 MHz MIN. MAX. UNIT - ns Clock TCkHigh Input Clock High(2) Transition < 5ns 12.5 TCklaw Input Clock Low (2) Transition < 5ns 12.5 TCkP Input Clock Period(5) - 10 - 8 10 - 8 ns 30 500 25 500 20 500 ns Clk2xSys to Clk2xSmp/Rd(5) 0 Tcycl4 0 Tcycl4 0 Tcycl4 ns Clk2xSmp/Rd to Clk2xPhi(5) 9 Tcycl4 7 Tcycl4 5 Tcycl4 ns - -1.5 ns -0.5 ns Run Operation - TOen Data Enable (3) TOOis Data Disable (3) Toval Data Vaild Load = 25pF TWrOly Write Delay Load = 25pF Tos Data Set-Up TOH Data Hold Tcss CpBusy Set-Up TCSH CpBusy Hold TAcTy Access Type [1 :0] Load = 2SpF - TAT2 Access Type [2] Load = 2SpF 17 TMWr Memory Write Load = 2SpF TExc Exception 9 -2.5 13 -2.5 -2 - -2 -1 - -1 3 - 3 5 - 4 8 - 6 -2.5 - -2.5 11 - 9 -2.5 - -2.5 2 ns 3 ns - ns ns ns ns - 6 - 5 ns - 14 - 12 - ns 1 27 1 23 1 18 ns Load = 2SpF - 7 - 7 - 5 ns - 30 - 23 - 20 ns 18 ns 7 Stall Operation TSAVal Address Valid Load = 2SpF TSAcTy Address Type Load = 25pF TMRdl Memory Read Initiate Load = 2SpF 1 27 1 23 1 18 ns TMRd Read Terminate Load = 2SpF - 7 - 7 - 5 ns TStl Run Terminate Load = 25pF 2 17 2 15 2 11 ns TRun Run Initiate Load = 2SpF - 7 - 6 - 4 ns TSMWr Memory Write Load = 25pF 1 27 1 23 18 ns TSEc Exception Valid Load = 2SpF - 20 - 18 - 15 ns TSEc DMA Drive On Load = 25pF 3 15 3 15 3 15 ns TSEc DMA Drive Off Load = 2SpF - 10 - 10 - 10 ns 6 27 23 1 Reset Initialization TRST Reset Pulse Width 6 Tcyc 140 140 - - Reset Pulse Width, Pull-downs on Tag - 6 TRSTTAG 140 - Jls 0.5 1 0.5 1 0.5 Capacitive Load Deration Q.o Load Derate 1 ns/25pF NOTES: 1. All timings are referenced to 1.5V 2. 
The clock parameters apply to all three 2x clocks: Clk2xSys, Clk2xSmp/Rd and Clk2xPhi. 3. This parameteris guaranteed by design. 4. These parameters are illustrated in detail in the "I DT79R300 1 Hardware Interface Guide". 5. Tcyc is one CPU clock cycle (2 cycles of a 2x clock). 6. With the exception of Run, no two signals of a given device will derate by a difference greater than 15%. Table 4.2. R3001 AC Specificalions.*PLL: Phase Locked Loops 9.14 15 R3001 SPECIFICATION & CACHE RAM TIMINGS MICROPROCESSOR INTERFACE GUIDE APPUCATION NOTE-n READ CYCLE TIMING SPECIFICATIONS 16.7 MHz Parameter 20.0 MHz 25.0 MHz 33.0 MHz Min. Max. Min. Max. Min. Max. Min. tRC 30 - 25 - 20 - 12 tAA 30 20 - 12(1) 25 - 19 30 - 25 tACS1 - tCLZ1 5 - 5 - 5 - 2 - tOES - 15 - 13 - 10 - 7 tOLZ 5 - 5 - 5 - 3 - tCHZ1 12 - 10 - 8 10 - 8 - 8 tOHZ - tOH 5 5 - 5 0 0 - 0 - 0 tpu - 0 - tPD - 30 - 25 - 20 - 15 10 Max. 15 6 WRITE CYCLE TIMING SPECIFICATIONS 16.7 MHz Parameter 20.0 MHz 25.0 MHz 33.0 MHz Min. Max. Min. Max. Min. Max. Min. Max. twc 30 25 - 15 - 25 21 - 20 tCW1 17 25 21 - 17 tAS 0 0 - 0 twp 25 21 - 17 - 15 tAW - 0 - 0 - 15 0 13 tWR1 0 - 0 - 0 tWR2 0 - 0 - 0 tWHZ - 18 - 16 - 8 - 6 tDW 16 - 13 - 11 7 tDH 0 - 0 5 - 5 - 0 tow - - 5 0 5 Table 4.3. Static RAM Read and Write Timings to Work as Cache with the R3001 NOTE: 1. This assumes that an FCT373C with a TPD =4.7 ns Is used. 9.14 16 R3001 SPECIFICATIONS & CACHE RAM TIMING MICROPROCESSOR INTERFACE GUIDE APPUCATION NOTE-77 READ CYCLE TIMING SPECIFICATIONS 16.7 MHz Parameter tRC tM tACSl tCLZl tOES tOLZ tCHZ1.2 tOHZ tOH tpu tPD Min. · - · · - - · · - Max. 20.0 MHz Min. Max. - · 29.8 - 24.8 · - · 15 - · 10.5 - - · - · · 13 - - · - 10 · - · - · - 25.0 MHz Min. · - · · - · · - Max. 18.8 · 10 - · 8.5 - · 33.0 MHz Min. Max. - 12.3 · · · - · · - - · 7 - · 6 - - · WRITE CYCLE TIMING SPECIFICATIONS 16.7 MHz Parameter twc tCWl tAW tAS Min. · · · 44.8 Max. - - Min. · · · 36.8 Max. - - 25.0 MHz Min. · · · 28.8 Max. - - 33.0 MHz Min. · · · 19.8 Max. 
Parameter   16.7 MHz Min.   20.0 MHz Min.   25.0 MHz Min.   33.0 MHz Min.
tWP         25              21              17              13
tWR1        0               0               0               0
tWR2        0               0               0               0
tWHZ        --              --              --              --
tDW         16              13              11              7
tDH         -               -               -               -
tOW         -               -               -               -

Table 4.4. Static RAM Parameters to Work as Cache with the R3001

NOTE: The parameters shown are the largest allowable maximum values and the smallest allowable minimum values, respectively. Numbers not shown are not critical for the R3001 application.

Parameter                       Load (pF)   Symbol    Min. (ns)   Max. (ns)
FCT373A Propagation Delay       50          t373PD    -           5.2
FCT373A Latch Enable Delay      50          t373LE    2           8.5
FCT373A Latch Enable Hold       50          t373Hld   1.8         -
FCT240A Propagation Delay       50          t240PD    1.5         4.8
FCT373C Propagation Delay       50          t373PD    1.5         4.7
FCT240C Propagation Delay       50          t240PD    1.5         3.7

Table 4.5. Timing Parameters of FCT Logic Devices

4.4.1 Legend

tRAMAA - RAM Access Time
tRAMOE - RAM Output Enable Time
tRAMHZ - RAM Output in Low Impedance to Output in High Impedance
tRAMLZ - RAM Output in High Impedance to Output in Low Impedance
tRAMHD - RAM Data Hold Time
tDS - R3001 Data Setup Time
tsys - Phase Difference between Clk2xSys and Clk2xPhi
trd - Phase Difference between Clk2xPhi and Clk2xSmp/Rd
tsmp - Phase Difference between Clk2xPhi and Clk2xSmp/Rd
tcyc - Cycle Time of the R3001
t240PD - Propagation Delay from Clk to Output of FCT240A

5.0 USING x16 LATCHED RAMS AS CACHE FOR THE R3001 ON A SURFACE MOUNT DESIGN

5.1 Assumptions for Surface Mount Design Layout Using x16 Latched RAMs as Cache for the R3001

In this chapter, the RAM timings are calculated for a 4K x 16 IDT71586, which has the latches built in. For static RAMs with built-in latches, the address access time tRAMAA and the address setup to end of write tRAMAW change from those of a regular static RAM. The propagation delay due to the external latches is eliminated, which increases the allowable access time and the allowable address setup to end of write by about 5 ns. In addition, the board layout is different because the distance from the CPU to the RAM is reduced.
This decreases the derating factors by a finite amount. This chapter calculates the derating factors for an R3001 cache design using the IDT71586 as cache. The assumptions are the following:

1) The trace has a capacitance of 2 pF/inch.
2) The speed of light is 2 ns/foot in epoxy.
3) The R3001 speeds are specified with a loading of 25 pF. For every additional 25 pF, there is a delay of 1 ns. Note that the cache control signals are specified with a 50 pF load and derate 1 ns/25 pF after that.
4) The distances between the R3001 and the latches are approximately 5 inches.
5) The distances between the R3001 and the RAMs are approximately 2 inches each.
6) In all of the assumptions, it is assumed that a surface mount package is used. The input capacitance of the RAMs is a typical value (7 pF) for a PLCC package.

The instruction and data caches are designed using the IDT71586 latched SRAMs. The system memory is assumed to be 256 MB. Data parity on the R3001 is disabled. The upper four tag bits are also not compared, and their comparison is turned off at reset time using pull-down resistors on the Tag(31:28) bits. Therefore the cache data format is 48 bits wide. The instruction and data caches are built using six IDT71586 devices: three for the instruction cache and three for the data cache.

Figure 5.1 shows a cache system for the R3001 with the latched RAMs, i.e., the IDT71586, as the cache. A total of 6 such devices are required for 16 KB each of instruction and data cache.

Figure 5.1. 71586 Used as Cache RAMs for the R3001 (Cache Size = 16 KB)

Figure 5.2. Surface Mount Board Layout of an R3001 System Using IDT71586 as Cache and Approximate Distances Between Devices

5.2 Derating Calculations Using IDT71586 as Cache RAMs

The derating factors for the IDT71586 cache RAMs follow the same methodology as explained in Chapter 4. The cache size is 4K words for instruction and 4K words for data. The latches are eliminated. The derating factors for the address and data bus are calculated.

Figure 5.2 shows an example layout of an R3001 surface mount design board using latched RAMs (IDT71586) as cache for the R3001 system. The distance between the R3001 data pins and the caches is about 2 inches. The total trace length for the address bus and the data bus is about 4 inches each.

5.2.1 Address Bus Derating Calculations

Each AdrLo bus is connected to six latched RAMs, i.e., the IDT71586, and the address latch for main memory writes and reads (Figure 5.3).

Figure 5.3. Number of Devices Address Bus is Connected To

Trace length from the CPU to the address latch (for main memory) = 4 inches   (5.1)
Capacitance of the trace = Ctrace = 4 x 2 pF/inch = 8 pF   (5.2)
Input capacitance of the 373 latch = 10 pF   (5.3)
Total input capacitance due to 6 RAM devices = 6 x 7 = 42 pF   (5.4)
Total capacitance due to the load = 8 + 42 + 10 = 60 pF   (5.5)
The rated R3001 load = 25 pF   (5.6)
Extra loading on the R3001 = 60 - 25 = 35 pF   (5.7)

The delay can be calculated as follows.

For every extra 25 pF of load, there is a delay of 1 ns   (5.8)
From Eq. 5.7 and Eq. 5.8, delay due to the capacitive load = 35/25 = 1.4 ns   (5.9)
The speed of light ≈ 2 ns/foot   (5.10)
For a maximum path length of 3", delay = 3"/12" x 2 ns = 0.5 ns   (5.11)
From Eq. 5.9 and Eq. 5.11, total propagation delay for the address bus = 1.4 + 0.5 ≈ 2 ns   (5.12)

From the above calculations, it is seen that the derating on the address bus is 2 ns.

5.2.2 Data Bus Derating Calculations

From Figure 5.4, it is seen that the data bus is connected to the floating point unit (R3010), two 71586 devices, one read register (FCT374A), and one write register (FCT823B). As in the previous chapter, where we considered a 16K x 4 static RAM, we have to calculate the deratings for two cases: i) for an instruction fetch, and ii) for a data store.

Figure 5.4. Devices Data Bus of the R3001 Is Connected To

5.2.2.1 Data Store (R3001 outputs data)

Each data bit is connected to two RAM devices, one for instruction and one for data.

The path length of the data bus = 4 inches   (5.13)
Trace capacitance of the data bus = 4 x 2 pF/inch = 8 pF   (5.14)
Capacitive loading on the data bus due to the different devices = 2 x CRAMin + CR3010in + C374out + C823in = 2 x 7 + 10 + 12 + 10 = 46 pF   (5.15)
Total capacitive load = Cdevices + Ctrace = 46 + 8 = 54 pF   (5.16)
Propagation delay due to speed of light = 4"/12" x 2 = 0.67 ns   (5.17)
Delay due to capacitive load = (54 - 25)/25 = 1.16 ns   (5.18)
Total delay = 1.16 + 0.67 = 1.8 ns ≈ 2 ns   (5.19)

5.2.2.2 Load Data Into R3001 (RAM outputs data)

Since the trace length is the same, the trace capacitance Ctrace = 8 pF   (5.20)
Capacitive load = CR3001in + CR3010in + C71586in + C374in + C823out   (5.21)
Cdevices = 10 + 10 + 7 + 12 + 10 = 49 pF   (5.22)
Ctotal = 49 + 8 = 57 pF   (5.23)
The RAM rated drive = 30 pF   (5.24)
Propagation delay due to extra capacitive loading = (57 - 30)/25 = 1.08 ns   (5.25)
Propagation delay due to path length = 0.8 ns   (5.26)
Total propagation delay = 1.08 + 0.8 ≈ 2 ns   (5.27)

5.2.3 Read and Write Control Derating Calculations

The effect of the capacitance on the control signals from the R3001 processor to the caches and the memory interface is considered here. The control signals on the R3001 are IRd, DRd, IWr, and DWr, which control the instruction cache read, data cache read, instruction cache write, and data cache write respectively. The read and write control signals are connected to the output enable (OE) and write enable (WE) of the instruction and data cache respectively. Two control signals each are provided for the read and write operations of each of the caches. Assuming the use of a 4K x 16 IDT71586 static RAM, each control signal is connected to 3 such static RAMs.

Number of devices (SRAM) connected to each control line = 3   (5.28)
Input capacitance of each device (SRAM) = 5 pF   (5.29)
Total load capacitance = 3 x 5 = 15 pF   (5.30)
Path length = 4"   (5.31)
Trace capacitance = 4 x 2 pF/in = 8 pF   (5.32)
Total capacitance = 15 + 8 = 23 pF   (5.33)

There is no extra capacitive loading here as the rated R3001 load is 50 pF.

Propagation delay due to the trace length = 0.8 ns   (5.34)
Total propagation delay = 0.8 ≈ 1 ns   (5.35)

5.3 Timing Equations for Cache Design

This section deals with the timing equations that enable us to determine the critical timing requirements of the static RAM that will be used as cache. These equations are based on the use of static RAMs without built-in latches as cache RAMs. The superscript 'd' in the following equations denotes the deratings to be taken into account. The static RAM chosen for illustration here is a 4K x 16 IDT71586. The board is assumed to be surface mount for all speeds of the R3001. The derating for the surface mount board is 2 ns. The deratings were derived from certain assumptions; the assumptions and the methodology are explained in the previous sections. In the following, a generalized equation is given, followed by the timing requirements for different frequencies of the R3001. All calculations are based on the R3001 specifications for the four speed versions (16, 20, 25, and 33 MHz), which are found in the IDT data sheets.

(1) Internal Sample to Phase Delay

This is the time that the processor needs to sample the incoming data. Typically, for the R3001, tsmp ~ 5

(2) RAM Address Access Time

This equation is used to determine the address access time requirement of the static RAM. From the timing diagram of Figure 5.4, it is easily calculated. The total cycle time for a 33 MHz R3001 is 30 ns. If the processor's sample time requirement is met, the time remaining in the cycle is 24 ns, in which the data has to be presented to the processor. The processor requires a data setup time of 4 ns. The derating factors due to the capacitance and the trace length also have to be taken into account. Using all these factors, the equation is

tRAMAA ≤ tcyc - tsmp - tDS - tAdrLo^d - tRAMAA^d

16 MHz R3001: tRAMAA ≤ 60 - 10 - 9 - 3 - 3          tRAMAA ≤ 35
20 MHz R3001: tRAMAA ≤ 50 - 8 - 8 - 2 - 2           tRAMAA ≤ 30
25 MHz R3001: tRAMAA ≤ 40 - 6 - 6 - 2 - 2           tRAMAA ≤ 24
33 MHz R3001: tRAMAA ≤ 30 - 4.5 - 4.5 - 2 - 2       tRAMAA ≤ 17

(3) Cache Enable to Sample

This equation is used to determine the output enable requirements of the cache RAM and should meet the processor's setup specification. The output enable time for the latched RAM is specified by the manufacturer and tested for a voltage change of 200 mV (1.732 V to 1.532 V for IDT RAMs). For a system, the voltage falls from a level of 3.3 V to 1.5 V, and the added fall time must be considered when specifying the RAM tOE parameter. This fall time is approximately an additional nanosecond. Therefore the RAM tOE parameter should be one nanosecond lower than the calculated numbers below.

tOES ≤ tcyc/2 - tRd^d - tDS - tsys-smp + tsys-rd - tOES^d

16 MHz R3001: tOES ≤ 30 - 2 - 9 - 10 + 10 - 3       tOES ≤ 16
20 MHz R3001: tOES ≤ 25 - 1 - 8 - 8 + 8 - 2         tOES ≤ 14
25 MHz R3001: tOES ≤ 20 - 1 - 6 - 6 + 6 - 2         tOES ≤ 11
33 MHz R3001: tOES ≤ 15 - 1 - 4 - 4.5 + 4.5 - 2     tOES ≤ 8

(4) Minimum Read Pulse Width

This timing requirement guarantees that the read pulse width generated by the processor is at least as long as the cache RAM output enable time.

tOES ≤ tcyc/2 - tsys-rd - tOES^d

16 MHz R3001: tOES ≤ 30 - 10 - 3        tOES ≤ 17
20 MHz R3001: tOES ≤ 25 - 8 - 2         tOES ≤ 15
25 MHz R3001: tOES ≤ 20 - 6 - 2         tOES ≤ 12
33 MHz R3001: tOES ≤ 15 - 4.5 - 2       tOES ≤ 8.5

(5) Read-Write I-Cache Data Bus Contention

This timing requirement ensures that the RAM output is tri-stated soon enough after the instruction read signal goes high. In the worst case, when the processor performs a store operation, no data contention occurs.

tRAMHZ ≤ tsys - tRd^d + tDEn

16 MHz R3001: tRAMHZ ≤ 16 - 2 + (-2.5)      tRAMHZ ≤ 11.5
20 MHz R3001: tRAMHZ ≤ 14 - 1 + (-2)        tRAMHZ ≤ 11
25 MHz R3001: tRAMHZ ≤ 12 - 1 + (-1.5)      tRAMHZ ≤ 9.5
33 MHz R3001: tRAMHZ ≤ 9 - 1 + (-1)         tRAMHZ ≤ 7

(6) Processor Data-Setup to End of Write

This enables the designer to determine whether the cache RAMs have adequate data setup time when the processor does a store operation.
In the equation, the minimum derating is used on the write line, i.e., tWr^d, because that is the worst case assumption.

tRAMDS ≤ tcyc/2 - tsys-smp - tDVal - tDVal^d - tWr^d

16 MHz R3001: tRAMDS ≤ 30 - 10 - 3 - 3 - (-2)       tRAMDS ≤ 16
20 MHz R3001: tRAMDS ≤ 25 - 8 - 3 - 2 - (-1)        tRAMDS ≤ 13
25 MHz R3001: tRAMDS ≤ 20 - 6 - 2 - 2 - (-1)        tRAMDS ≤ 11
33 MHz R3001: tRAMDS ≤ 15 - 4.5 - 2 - 2 - (-1)      tRAMDS ≤ 7.5

(7) Data Hold from End of Write

This requirement guarantees that the data hold from end of write of the cache RAM is met when the processor or the read buffer is writing to the RAMs.

tRAMHD ≤ tRAMLZ

16 MHz R3001: tRAMHD ≤ 2
20 MHz R3001: tRAMHD ≤ 2
25 MHz R3001: tRAMHD ≤ 2
33 MHz R3001: tRAMHD ≤ 2

(8) Data Setup to SysClk

This timing parameter ensures that the setup time into an external register (for the main memory interface) is sufficient for the case when the processor is doing a store. The data is clocked into the register on the rising edge of the buffered SysOut (through an inverting FCT240A). In this equation, tsys(min)^d is used to ensure worst case calculations.

tSetupSys ≤ tcyc/2 - tsys - tDVal - tDVal^d + tsys^d + t240PDmin

16 MHz R3001: tSetupSys ≤ 30 - 16 - 3 - 3 + 2 + 1.5     tSetupSys ≤ 11.5
20 MHz R3001: tSetupSys ≤ 25 - 12 - 3 - 2 + 1 + 1.5     tSetupSys ≤ 10.5
25 MHz R3001: tSetupSys ≤ 20 - 12 - 2 - 2 + 1 + 1.5     tSetupSys ≤ 6.5
33 MHz R3001: tSetupSys ≤ 15 - 9 - 2 - 2 + 1 + 1.5      tSetupSys ≤ 4.5

(9) Data Hold from SysClk

This timing parameter guarantees that the hold time specification for an external register is met on a processor store. In this equation the minimum value of tRd^d is taken to ensure worst case numbers.

tHoldSys ≤ tsys-rd - tsys^d - t240PDmax + tRAMLZ + tRd^d

16 MHz R3001: tHoldSys ≤ 6 - 2 - 4.8 + 2 + 1        tHoldSys ≤ 2.2
20 MHz R3001: tHoldSys ≤ 6 - 1 - 4.8 + 2 + 1        tHoldSys ≤ 3.2
25 MHz R3001: tHoldSys ≤ 6 - 1 - 4.8 + 2 + 1        tHoldSys ≤ 3.2
33 MHz R3001: tHoldSys ≤ 4.5 - 1 - 4.8 + 2 + 1      tHoldSys ≤ 1.9

(10) Address Setup to End of Write

This equation enables us to determine the timing requirement for the RAM so that the address setup time is sufficient before the trailing edge of the write pulse.

tRAMAW ≤ tcyc - tsmp-sys - tAdrLo^d + tWr^d

16 MHz R3001: tRAMAW ≤ 60 - 10 - 3 + 2          tRAMAW ≤ 49
20 MHz R3001: tRAMAW ≤ 50 - 8 - 2 + 1           tRAMAW ≤ 41
25 MHz R3001: tRAMAW ≤ 40 - 6 - 2 + 1           tRAMAW ≤ 33
33 MHz R3001: tRAMAW ≤ 30 - 4.5 - 2 + 1         tRAMAW ≤ 24.5

From Figure 5.5, it is clearly seen that the address setup and hold times for the latched RAMs are met by using IClk to capture the instruction address. Figure 5.5 illustrates the timings for a 25 MHz IDT71586 latched RAM. Similar timing diagrams can be drawn to verify the setup and hold times for an R3001 operating at different frequencies.

(11) Write Hold Pulse Width

This requirement guarantees that the cache RAM's minimum write pulse width specification is met.

tRAMPW ≤ tcyc/2 - tWrDly

16 MHz R3001: tRAMPW ≤ 30 - 5       tRAMPW ≤ 25
20 MHz R3001: tRAMPW ≤ 25 - 4       tRAMPW ≤ 21
25 MHz R3001: tRAMPW ≤ 20 - 3       tRAMPW ≤ 17
33 MHz R3001: tRAMPW ≤ 15 - 2       tRAMPW ≤ 13

From the above calculations and Figure 5.3, it can be seen that the data setup to the processor is met. The output enable of the RAM, which is controlled by IRd, goes high and the RAM output starts to go tri-state. From the figure, the reader may correctly question whether the hold time requirements of the R3001 are met. They are indeed met, by the capacitance on the bus and also because CMOS devices are being used. The technical note entitled "Meeting Bus Hold for the R3001" gives a more detailed explanation.

5.3.1 Legend

tRAMAA - RAM Access Time
tRAMOE - RAM Output Enable Time
tRAMHZ - RAM Output in Low Impedance to Output in High Impedance
tRAMLZ - RAM Output in High Impedance to Output in Low Impedance
tRAMHD - RAM Data Hold Time
tDS - R3001 Data Setup Time
tsys - Phase Difference between Clk2xSys and Clk2xPhi
trd - Phase Difference between Clk2xPhi and Clk2xRd
tsmp - Phase Difference between Clk2xPhi and Clk2xSmp
tcyc - Cycle Time of the R3001
tsmp-rd = tsmp - trd
t240PD - Propagation Delay from Clk to Output of FCT240A

Figure 5.5. Address Setup and Hold Timing for a Latched RAM (25 MHz R3001)

Parameter   Description                        Limit   16 MHz   20 MHz   25 MHz   33 MHz
tRC         Read cycle                         Min.    35       30       25       15
tCH         ALEN high                          Min.    10       10       10       8
tCL         ALEN low                           Min.    10       10       10       8
tAS         Adr latch set-up                   Min.    5        5        5        4
tAH         Adr latch hold                     Min.    5        5        5        4
tAA         Address access                     Max.    35       30       24       17
tACE        Chip enable access                 Max.    35       30       25       17
tOES        Output enable                      Max.    16       14       11       8
tCLZ        CE to out in LZ                    Min.    3        3        3        3
tOLZ        OE to out in LZ                    Min.    2        2        2        2
tCHZ        CE to out in HZ                    Max.    25       22       20       15
tOHZ        OE to out in HZ                    Max.    11       11       9        7
tOH         Output hold from address change    Min.    3        3        3        3

Table 5.1. Read Cycle Timings for an IDT Static RAM with Latches

Parameter   Description                        Limit   16 MHz   20 MHz   25 MHz   33 MHz
tWC         Write cycle                        Min.    35       30       25       15
tCH         ALEN high                          Min.    10       10       10       8
tCL         ALEN low                           Min.    10       10       10       8
tAS         Adr latch set-up                   Min.    5        5        5        4
tAH         Adr latch hold                     Min.    5        5        5        4
tAW         Address to end of write            Min.    35       30       25       15
tASW        Address set-up                     Min.    0        0        0        0
tWP         Write pulse width                  Min.    25       20       17       11
tCW         CE to end of write                 Min.    25       20       20       11
tWR         Write recovery                     Min.    0        0        0        0
tWHZ        Write to out in HZ                 Max.    15       15       13       8
tDW         Data setup                         Min.    16       13       11       7
tDH         Data hold                          Min.    0        0        0        0
tOW         Out active from end of write       Min.    5        5        5        5

Table 5.2. Write Cycle Timings for an IDT Static RAM with Latches

REFERENCES

1) Kane, Gerry, "MIPS RISC Architecture," Prentice Hall Inc., N.J., 1988
2) IDT RISC R3001 Microprocessor Interface Guide, April 1990
3) IDT RISC R3001 Family Data Sheets, 1990
4) IDT Data Book, 1989
5) IDT Data Book Supplement, 1989

A POWERFUL DEVELOPMENT TOOL FOR THE IDT 79R3000 RISC FAMILY

Integrated Device Technology, Inc.

CONFERENCE PAPER CP-01
AS PRESENTED AT SOUTHCON '89

By Philip Bourekas

INTRODUCTION

The inherent flexibility of the IDT 79R3000 family architecture allows system designers to make a wide variety of system tradeoffs. Therefore, it is extremely important that effective development tools be available for each step of the design. This need starts when the initial architectural decisions and performance simulations are made and carries through to the final stages of hardware-software integration and system diagnostics development.

This paper describes the System Programmer's Package (SPP) for the IDT 79R3000, an effective set of tools available for each part of the design process. These tools allow designers to make tradeoffs in both hardware and software which result in systems with very cost-effective performance, and they also speed the overall development of the target application.

PROCESSOR OVERVIEW

The architecture of the IDT 79R3000 family is partitioned into a processor and a separate high-performance floating point unit. The processor, which can sustain 20 VAX MIPS performance, consists of two tightly-coupled processors implemented on a single chip. The first processor is a full 32-bit CPU utilizing RISC techniques to achieve a new standard of microprocessor performance; the second processor is a system control coprocessor (CP0), containing a Translation Lookaside Buffer (TLB) and control registers to support a 4 GByte virtual memory subsystem.
Also integrated onto the processor chip is a dual-cache controller which controls separate direct-mapped instruction and data caches. Each cache can independently vary in size from 4K bytes through 256K bytes, with independently selectable block refill sizes of 1 to 32 words. The processor achieves 200 MBytes/second cache bandwidth at 25 MHz. Figure 1 illustrates the interaction between the CPU, its coprocessors, and the memory subsystems.

The full performance of the 79R3000 is achieved by the proper integration of software and hardware. The 79R3000 contains a highly efficient five-stage pipeline, with each stage of the pipeline controlling a different CPU resource. Optimizing compiler technology serves both to reduce the number of instructions required to perform a given task and to schedule instructions efficiently. This eliminates processor stalls which might arise if a hardware interlock is activated, and eliminates NOP instructions during the latency cycles of LOAD or BRANCH operations.

Figure 1. 79R3000 Based System

© 1989 Integrated Device Technology, Inc. Printed in the U.S.A. 5/89

ARCHITECTURAL DECISION PHASE

In order to determine the system configuration which yields the best price/performance tradeoff for a given application, the system designer has to make a number of decisions. For example, the designer needs to select the cache configuration which best supports the application requirements. The 79R3000 allows the designer to choose various cache depths from 4K bytes to 256K bytes, and to select an appropriate block refill size, for both the instruction and data caches.

The optimal cache size and block size are a function of the locality of the typical programs running on the system, as well as the CPU sub-system's latency to main memory when a cache miss occurs. In general, the longer the latency to main memory, the longer the CPU will be stalled for a cache miss and thus the more severe the penalty. This means that in order to overcome the effects of a long memory latency, a large cache and a fairly large block size are needed to support high performance. Systems with a relatively low latency to main memory may be able to tolerate a smaller, less expensive cache subsystem and a smaller block size.

Figure 2. Using Cache-2000 to Determine Optimum System Configuration

Before going to the expense of drawing schematics or building boards, the system designer can use a set of programs contained in the SPP which simulates various cache sizes and refill sizes. Since the typical bus latency of the target system is included during the simulations, a relatively accurate model of anticipated performance is available. This capability is contained in the Cache-2000 module as part of the SPP; provided in source form, this package allows the system designer to describe the overall system and then benchmark typical target software (as shown in Figure 2). In this manner, a decision about the allowable tradeoffs in cost and performance can be made before the detailed design is begun.

SOFTWARE DEVELOPMENT PHASE

Software development is increasingly a bottleneck in system development. The SPP contains a number of utilities which facilitate the parallel development of software with hardware, allowing much of the software to be developed long before the target hardware is available. Key to this is the stand-alone software simulator package, Sable, which models the CPU, FPA, and the entire memory hierarchy of TLB, cache, main memory, disk and system console. Sable simulates the execution of 79R3000 instructions in an environment which mimics the target system. Further customization is possible, as the simulation package is provided in source form.

While Sable models things such as disks simply as a source or sink for data files without modeling the physical performance of a disk (latency, etc.), Sable does provide sophisticated models of the TLB and cache. Sable keeps track of the contents of the TLB and of each cache entry after every cycle. This allows debugging of kernels, diagnostics and other programs which might require knowledge of the privileged, supervisor state, and it also facilitates debugging of user programs which may execute on top of an operating system. Thus, by using the Sable architecture simulator, the software for the target system can be developed concurrently with the hardware itself. It is worth noting that MIPS Computer Systems uses Sable to debug its operating systems for new computers, and typically has the software working months in advance of the target hardware.

Other utilities are provided to minimize the software development effort. The Stand-Alone I/O library, SAIO, is designed to perform the functions of the stdio standard "C" library. The SAIO library contains Unix system-like routines that access disks, Ethernet, tapes, and UART devices, and also includes routines to perform fault handling and cache flushing. The stand-alone compiler system uses the SAIO library to produce code that will execute in the target stand-alone environment. The entire stand-alone applications program is then constructed "on top" of the modules provided by the SPP, as shown in Figure 3.

Figure 3. Using SPP to Develop Stand-Alone Applications on the Host System
Code can be developed in assembly language, "C", Pascal, Fortran or even PU1 or Cobol; the software development tools work equ~lIy well with any of MIPS standard languages. MIPS simulation debugger, sdbx, provides source level debugging across the entire language suite, tracing every machine action back to the original source statement. Sdbx provides the ability to examine and modify the state of the machine as well as standard debug facilities such as single-step or breakpoint. Finally, another utility that comes standard with the MIPS RISe/ os UNIX operating system, called PIXIE, provides extensive profiling of the target code. PIXIE dynamically profiles the execution of 9.15 2 A POWERFUL DEVELOPMENT TOOL FOR THE lOT 79R3000 RISCFAMILY the target software, so that the performance of the application can be tuned and so that all of the modules can be fully tested. PIXIE works on the binary image of the program, and thus does not require modification to the source code in order to be used. HARDWARE/SOFTWARE INTEGRATION The SPP toolkit also benefits the process of integrating software onto the target hardware. The SPP provide utilities which connect the target to the host development machine for final debugging, and also provides a number of software modules designed to be incorporated into the target to facilitate integration. The SPP includes a PROM monitor which provides system diagnostics and in~ialization of the machine. Provided in source form, the monitor is designed to initialize the cache, reset the machine state, and transfer control either to a shell program or the applications program. The monitor is responsible for storing communications parameters for the console port and maintaining other system information. Utilities are provided to facilitate the compilation of the mon~orprogram and to help burn it into PROMs. 
The PROM monitor allows basic commands to examine and modify system memory, and provides primitive debug support which is typically retained in the customer end product. In addition to monitor functions, the PROM monitor also contains a set of power-on diagnostics which exercises the basic functionality of the CPU, FPA and memory. These diagnostics are also often incorporated into the customer end product. Once the target system can correctly execute the diagnostic suite, the application is known to communicate effectively with the cache and the basic operation of the CPU subsystem is confirmed.

The Monitor and Stand-Alone Shell, SASH, extends the functionality of the simple PROM monitor. The monitor functions are partitioned between the PROM Monitor and SASH so that basic functions provided by the PROM monitor can be kept in a small PROM, while the extended SASH functions can then be executed out of RAM.

The 'dbgmon' debug monitor is designed to be incorporated into the target machine. It provides hooks which allow the host to remotely debug the target machine, and works with the MIPS host system debugger, pdbx, which is a source level symbolic debugger configured to operate remotely. This duo performs remote debugging, including the ability to breakpoint, single step, and examine the execution state of the target. Beyond disassembly, the debugger maps machine instructions to the source statements which generated them, and prints variables and addresses symbolically.

Finally, in order to debug the target through the host development vehicle, communications software is also provided. Two RS232 ports in the target system aid development: one for the target system's console monitor and one for the remote debug access to the target from the host. In addition to RS232 capability, drivers are also provided for Ethernet.
Once the initial debug of the program is completed, Ethernet can be used to greatly speed the transfer of programs and data from the host. A Bootfile Server Driver is provided on the host to download bootable images to the target across the Ethernet. The SPP provides driver routines in source form, so that if necessary the drivers can be modified. Figure 4 illustrates the use of the SPP in debugging the stand-alone application on the target system.

[Figure 4. Using the SPP During Hardware/Software Integration]

HOST DEVELOPMENT SYSTEMS

The SPP has been designed to run on a variety of development hosts, provided that the host utilizes the R3000 processor. By running on a native mode platform, using a toolkit that was developed by the same people that developed the processor itself, there is no danger of errors in the interpretation of the workings of the processor.

Platforms for the SPP include MIPS' M Series of RISC computers, which are powerful (10 to 20 VAX MIPS) multi-user UNIX environments, and also include the 7RS201 Macintosh-II development card available from IDT. The choice of host development machine is primarily determined by the number of developers of the application. Figure 5 illustrates the range of SPP hosts available.

[Figure 5. Range of SPP Host Development Systems for the IDT 79R3000 (Mac-II board, 1 user; RC2030, 2-4 users, 12 mips; M/120, 6-12 users, 14 mips; M/2000, many users, 20 mips)]

CONCLUSIONS

The IDT 79R3000 is a very powerful microprocessor based upon RISC principles.
The processor architecture is supported by a very powerful toolset, the SPP, which supports product development during all phases of the design process.

During the initial architectural phase, the SPP provides tools which enable the system designer to make appropriate tradeoffs in cache design to achieve very cost effective performance for the end application.

Software development can then proceed concurrently with hardware development. The SPP includes language and library tools to facilitate development, as well as a powerful symbolic debugger for the software debug effort. An entire simulation environment simulates the processor, memory, and I/O subsystems. It is used to fully simulate the target system before it is even fully designed, and allows the target operating system and software to be debugged on an existing, functional platform.

Finally, the process of integrating software onto the target hardware is facilitated by modules included in the SPP. Source code for communications drivers, diagnostics, and a debugger and system monitor allow very powerful debug support as the software is brought up on the target hardware.

Altogether, the SPP enables applications to be developed for the 79R3000 in a very timely fashion, and allows the system designers to concentrate on developing the target system rather than on developing tools. The SPP provides the user with total control over the development process. Its power has been demonstrated at MIPS Computer Systems, where it is used to develop operating systems for new computers.

[Figure 6. The Systems Programmer's Package for the IDT 79R3000]

UNIX is a trademark of AT&T. VAX is a trademark of Digital Equipment Corporation. Macintosh is a trademark of Apple Computer. MIPS and RISC/os are trademarks of MIPS Computer Systems.

DEVELOPING APPLICATIONS FOR THE IDT 79R3000 RISC MICROPROCESSOR

Integrated Device Technology, Inc.
CONFERENCE PAPER CP-02
AS PRESENTED AT NORTHCON '89
By Philip Bourekas

INTRODUCTION

Today's high-speed RISC microprocessors bring new levels of performance to today's applications. This performance is a blend of the raw system speed and of the architecture of the processor. However, these processors require a different design methodology than the familiar techniques applicable to lower speed CISC-type microprocessors. Today's processors feature tools rich in software content, appropriate to the software-intensive nature of RISC.

The IDT 79R3000 RISC Microprocessor family is richly supported by a wide variety of development tools, benefiting all stages of the development process. These tools allow designers to maximize performance, minimize cost, and significantly reduce time to market. Additionally, the tools are flexible enough to be well suited to the requirements of either reprogrammable or embedded type applications.

ARCHITECTURE OVERVIEW

The IDT 79R3000 is a high-performance microprocessor using RISC techniques to achieve 27 MIPS performance in a 33MHz system. The 79R3000 architecture was developed by MIPS Computer Systems as an evolution of work originally begun at Stanford University, and is a second generation implementation of the MIPS instruction set architecture. The 79R3000 achieves this high performance through the combination of low average cycles per instruction (typical of RISC processors) and high instruction and data bandwidth.

The 79R3000 consists of two tightly-coupled processors on a single chip. The 32-bit integer RISC CPU is complemented by CP0, the on-chip system control coprocessor containing an MMU with 64-entry, fully associative TLB, processor control and status registers, and an on-chip cache controller featuring 267 MBytes/second of processor bandwidth using industry standard static RAMs.
The architecture features a separate floating point accelerator, the 79R3010, which executes at 9.3 single precision Linpack MFLOPS (4.3 MFLOPS double precision). Also frequently used is the 79R3020 write buffer, which allows the processor to execute stores to main memory at the processor cycle rate, rather than the main memory rate. Figure 1 illustrates a typical 79R3000 based system.

[Figure 1. Typical 79R3000 Based System]

© 1989 Integrated Device Technology, Inc. Printed in the U.S.A. 6/89

The full performance of the 79R3000 family is a result of balanced integration between software and hardware. The integer CPU contains a five-stage pipeline, therefore executing five instructions concurrently and reducing the average time per instruction. Optimizing compiler technology serves both to reduce the number of dynamic instructions required to execute a given task, and to ensure that the various resources of the CPU are fully utilized. Examples of compiler optimization include scheduling instructions to eliminate latency effects in branch or load instructions, and scheduling resources to minimize the occurrence of hardware interlocks which might arise if an access to a busy resource is attempted.

ON-CHIP CACHE CONTROL

The 79R3000 contains an on-chip cache controller which controls separate Instruction and Data Caches. Each cache can vary in depth from 4k Bytes through 256k Bytes. Additionally, each cache can have independent cache refill block sizes of 1 through 32 words (block refill refers to how much data is retrieved from main memory when processing a cache miss; block refill relies on the principle of locality of reference to improve net processor performance by amortizing the expense of going to slower main memory over a number of instruction or data elements likely to be needed).

The on-chip cache controller implements separate, direct-mapped Instruction and Data caches using industry standard static RAM devices such as the IDT 7198 (16k x 4) or IDT 71586 (4k x 16 with integrated address latch), using a single address and single data bus. External latches capture the address for the cache access, while the 79R3000 provides signals to directly control the address latching, RAM output enable, and RAM write control for each cache. The tag comparison occurs on-chip, simultaneous with the data access, minimizing the amount of time necessary for the cache control function. Thus, it is very simple to implement a highly-efficient cache for the 79R3000 while minimizing the amount of logic needed to implement the cache control. Figure 2 illustrates the design of 64kBytes each of Instruction and Data Cache, a configuration common in workstation applications.

[Figure 2. 64kBytes of Instruction and Data Cache]

ASYNCHRONOUS MEMORY INTERFACE

The 79R3000 also incorporates a simple, flexible interface to main memory resources, including RAM, PROM, and I/O. This interface is supported through the use of the Asynchronous Memory Interface, which is used whenever uncached data is required (e.g. cache miss processing, uncacheable memory, I/O devices, etc.).

The asynchronous interface consists of control signals for reading and writing main memory. The processor asserts its request signal (e.g. MemRd, which indicates a main memory read), and main memory throttles the processor with the appropriate control signal (e.g. RdBusy, which says that the required data is not yet available). Further information about the type of transaction is available from the Access Type bus, which indicates the size and cause of the data access. The processor automatically updates the contents of its caches when main memory traffic was initiated by a cache miss.

The 79R3000 can be used with a wide variety of memory interfaces. In compute server type systems, the asynchronous interface is typically used as part of a backplane bus arbiter circuit. The memory required by the processor then typically takes many cycles before the first transaction can be completed, while immediately subsequent transactions take considerably fewer cycles (the 79R3000 block refill mechanism takes advantage of this fact). In embedded systems, the processor typically has much more direct control of its main memory resources, and these signals are used with a simple state machine to facilitate transfer activity between the main memory bus and the CPU-cache bus. Thus, the 79R3000 asynchronous memory interface is flexible enough to satisfy a wide variety of applications.

With all of this flexibility, it is important that the system designer have a rich toolset available to assess the implications of certain design decisions. This toolset must account for the application constraints in terms of cost, size and performance. The IDT 79R3000 is supported by tools which allow system designers to evaluate their proposed system in advance of committing to PC board, and to verify that the system design goals are achieved.

PERFORMANCE CONSIDERATIONS

This architecture results in the highest standard of microprocessor performance. In developing an application for the 79R3000, it is important to understand the fundamental reasons for the processor's high standard of performance, and to understand the implication of design decisions in terms of performance.
In this way, the cost of the system can be optimized for the performance level required by the application.

The performance of the 79R3000 is a result of its low average clocks per instruction, and its high memory bandwidth. The pipeline stages allow multiple instructions to be in various stages of processing simultaneously, as shown in Figure 3. (Compiler technology serves to ensure full use of this parallelism.) The instruction cache allows a new instruction to be initiated on every processor clock cycle, and the low average instruction latency means that instructions are completed at close to one cycle per instruction. The purpose of the instruction cache is to provide the majority of the instructions at the processor clock rate.

Instruction bandwidth is complemented by data bandwidth. The purpose of the system write buffers (such as the IDT 79R3020) is to allow the processor to finish with a data write in one clock cycle, even if the system takes multiple clock cycles to retire the data. The data cache provides for very fast data load operations, and is capable of supplying a data operand in every clock cycle (the 79R3000 can get both an instruction and data item in each clock cycle, resulting in 267 MBytes/sec bandwidth at 33MHz).

[Figure 3. Pipeline Stages of the 79R3000 (Instruction Fetch, Instruction Decode, ALU, Memory, Write Back; successive instructions overlap)]

Performance in a given application is thus to a first approximation a function of system speed and cache efficiency. That is, more cache hits imply that more information is given to the processor at the single cycle rate. Coupled with high frequency (short cycle time) operation, the 79R3000 achieves ultra-high performance.
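The bandwidth figures quoted above follow from simple arithmetic, and the "first approximation" of performance can be sketched in the same way. The Python sketch below assumes 32-bit (4-byte) words, one instruction word plus one data word per clock, and a nominal 33.33 MHz clock for the "33 MHz" part. The function names and the base CPI, miss rate, and miss penalty values are illustrative assumptions, not IDT figures.

```python
def cache_bandwidth_mbytes(clock_mhz, words_per_clock=2, bytes_per_word=4):
    """Peak cache bandwidth in MBytes/second.

    Assumes one instruction word and one data word are delivered per
    clock cycle (words_per_clock=2), each 4 bytes wide.
    """
    return clock_mhz * words_per_clock * bytes_per_word


def effective_mips(clock_mhz, base_cpi, miss_rate, miss_penalty_cycles):
    """First-order performance estimate (hypothetical model).

    Every cache miss is assumed to add a fixed stall, so the effective
    cycles per instruction is base_cpi + miss_rate * miss_penalty_cycles.
    """
    cpi = base_cpi + miss_rate * miss_penalty_cycles
    return clock_mhz / cpi


if __name__ == "__main__":
    # Bandwidth at the two clock rates mentioned in the text.
    print(round(cache_bandwidth_mbytes(25.0)))         # 25 MHz part
    print(round(cache_bandwidth_mbytes(100.0 / 3.0)))  # nominal "33 MHz" part

    # Illustrative only: how the miss rate erodes performance at a fixed clock.
    for miss_rate in (0.0, 0.02, 0.05):
        print(miss_rate, round(effective_mips(100.0 / 3.0, 1.25, miss_rate, 10), 1))
```

The second function only illustrates the trend the text describes: at a fixed clock rate, a higher hit rate keeps the effective cycles per instruction closer to the pipeline's base rate.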
The 79R3000 includes features which further help to minimize the impact of cache miss processing. These features include the block refill capability, which allows multiple main memory words to be retrieved when performing cache miss processing. Block refill takes advantage of the difference between memory latency (the time to the first operand), and memory bandwidth (the time between successive operands). The 79R3000 allows the designer to select the block refill appropriate to the application. Further, the 79R3000 performs instruction streaming, which is the simultaneous fetch and execution of cached instructions, to result in substantial performance improvement when filling the cache with a new process or task.

The amount of cache "sufficient" for a given application is a function of a number of factors. These factors include the latency to main memory (how long on average it takes to receive an operand not in the cache), as well as the instruction mix (the reference and time locality of the code, the amount of data versus ALU operations, etc.). The system designer is free to implement more or less cache, as appropriate for the system, depending on his cost and performance constraints.

One tool available to help the system designer determine the appropriate amount of cache for a given application is the Cache-2000 cache simulator. With Cache-2000, the designer can "describe" a proposed system and evaluate the performance of representative software on that system. Cache-2000 does not require the software engineer to modify source code, but rather works from the executable image of the code using UNIX profiling tools to determine the number of cycles required to execute the program on the described system. Inputs to Cache-2000 include the cache sizes, processor frequency, main memory latency, write buffer depth, and block refill sizes selected.
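The latency-versus-bandwidth argument for block refill made above can be illustrated with a few lines of arithmetic. The Python sketch below is not Cache-2000 and does not model the 79R3000 bus. It simply assumes that the first word of a block costs the full memory latency and each later word costs a shorter, fixed interval. The function names and the cycle counts are hypothetical.

```python
def refill_cycles(block_words, latency_cycles, cycles_per_extra_word=1):
    """Cycles to fetch one block from main memory (hypothetical model).

    The first word pays the full latency; each subsequent word arrives
    after a shorter interval determined by the memory bandwidth.
    """
    return latency_cycles + (block_words - 1) * cycles_per_extra_word


def cycles_per_word(block_words, latency_cycles, cycles_per_extra_word=1):
    """Average cost per fetched word, amortizing the latency over the block."""
    total = refill_cycles(block_words, latency_cycles, cycles_per_extra_word)
    return total / block_words


if __name__ == "__main__":
    latency = 10  # assumed cycles to the first operand
    for words in (1, 4, 8, 16, 32):  # refill sizes within the 1-to-32-word range
        print(words, refill_cycles(words, latency), cycles_per_word(words, latency))
```

Under these assumptions the average cost per word falls as the block grows, but the stall for each individual miss lengthens, so a larger block pays off only when locality makes the extra words likely to be used.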
Cache-2000 provides a highly accurate simulation of the software performance on the system described, and gives the system designer a firm basis for making design trade-offs.

A final consideration exists for real time systems, where a general purpose cache may not satisfy the predictability requirements of the application. A number of techniques for the 79R3000 exist which guarantee predictable, high performance in real time applications. These techniques allow the system designer to use either hardware or software to lock time critical portions of the operating system into cache at all times, guaranteeing single cycle access to kernel instructions. Alternately, the system designer could take advantage of the on-chip cache controller to construct a memory system entirely with synchronous memory devices (e.g. SRAM), and achieve the highest levels of system performance and true predictability. Figure 4 illustrates alternative ways in which the 79R3000 flexibility and performance can be brought to embedded applications. The performance evaluation tools allow the system designer to conclude in advance whether a given system architecture is appropriate to the task.

[Figure 4. Typical 79R3000-based Systems (panels: Few Instructions, I/O Intensive, e.g. DataComm; Tight Loops, Data Intensive, e.g. Laser Printer; General Control, Limited Memory Needed, e.g. Robotics; High Performance Real Time System, e.g. Flight Simulator)]

COST TRADEOFFS

Once the appropriate system configuration is determined, the system designer can go about designing the lowest cost system achieving the desired level of performance. IDT has written applications notes describing ways to reduce system cost while not affecting application performance. These techniques take advantage of the nature of various types of applications, and use the integration of the 79R3000 to full advantage. By taking advantage of these features, the system designer can build "just as much" cache as is needed for the application.

Specifically, the 79R3000 cache controller was designed to implement a line size of one word (that is, one tag, or main memory origin descriptor, per 32-bit word in the cache). Further, the cache controller was designed to support a full 4 Gigabytes of cacheable main memory, and to implement caches between 4k Bytes and 256k Bytes in depth. In doing so, the on-chip cache controller will compare 20 bits of tag on each cache cycle. However, it is not required that all tag values come from RAM elements. In systems which implement caches larger than 4k Bytes, it is possible to provide a feedback path between the processor output cache address and the input low-order tags. This can be done with a single 74FCT244A buffer/driver, rather than RAM.

Similarly, many embedded applications require less than 4 Gigabytes of cacheable memory (a number more typically associated with reprogrammable applications). In these systems, the designer can "hard-wire" high-order tags, using resistors or buffers, and reduce the cost of the cache by reducing the cacheable memory space. This still allows the system designer to access the entire 4 Gigabyte address space for memory decoding, etc.

Additionally, there is no absolute requirement that the cache line size be one word. The system designer could take advantage of the processor's built-in block refill capability to implement a larger line size. If the line size were increased to only four words, then one-fourth as many tags would be required as data elements, and the cost of the cache is further reduced.

THE ROLE OF DEVELOPMENT TOOLS

For the system designer to properly design the application, it is important that sufficient tools be in place. Many of these decisions will be determined by the software to be run on the application. The 79R3000 features a wide variety of software development tools, which allow the concurrent development of both hardware and software. These tools allow the software to be developed before the target hardware is designed. Thus, the system designer works from accurate information regarding the nature of the software, its dynamic execution profile, and the amount of memory required to support it. The software developer develops his application on the development host, and uses Sable, the architecture and instruction set simulator, to develop all of the application code prior to target system availability.

The hardware designer can also draw on a variety of development and debug tools. In addition to the Cache-2000 program, there are a number of other tools used by the hardware designer. These include hardware models of the component family, such as those available from Mentor or Valid Logic. These models use actual components to simulate the interaction of the CPU in the target system, while the system exists only in schematic form on the CAD platform. This reduces the risk that the first PC board has significant errors.

When the system is being initially debugged, logic analyzer tools are used to trace the system activity. The 79R3000 is supported on a number of logic analyzers, including Gould, Tektronix, and HP family offerings. These analyzers disassemble the bus traffic, and are invaluable, non-intrusive aids during the debug process.

Software tools are also used during system debug. The PROM monitor, available from IDT, includes useful debug aids such as breakpoint, memory examine and modify, diagnostics, etc. The PROM monitor is available in source form, allowing full customization to the requirements of the target system. Thus, the interaction of the target software and hardware can be examined and confirmed during the debug stages.
PRE-PACKAGED SOLUTIONS

There are a number of other products available to minimize the amount of time required for RISC development. These include hardware and software products.

For the hardware designer, IDT offers a family of RISC subsystems, integrating the CPU and Caches onto a single, small footprint module such as the subsystem shown in Figure 5. These modules are ideal for initial development and even for full production. The risk associated with high-speed system design is eliminated, and the customer can focus on his value added software and peripherals.

[Figure 5. IDT RISC Subsystem]

For the software designer, a rich toolset is also available. The Stand-alone I/O library interfaces between C language I/O calls and direct device drivers. The PROM monitor eliminates the need for the software developer to develop tools, and thus allows him to focus on the software for the application itself. Finally, a wide variety of operating system support is available, including real time operating systems for either C or Ada applications.
CONCLUSIONS

The IDT 79R3000 architecture is ideal for a wide variety of high-performance applications, ranging from embedded applications through real-time systems and including high-performance reprogrammable applications such as workstations and file servers. In addition to its inherent high performance, the 79R3000 architecture is flexible enough to allow the system designer to configure the system according to the application requirements, thus achieving the desired performance at minimum cost.

In order to bring the performance of the processor to the application system, it is important that a rich and robust toolset be available at all stages of the system development cycle. The 79R3000 provides tools for both the hardware and software teams to use during all phases of the development project, as shown in Figure 6. The end result is the highest-performance processor available today, and a toolset which enables application developers to bring that performance to their systems at minimum cost and with minimal development effort.

[Figure 6. System Development Phases and Support (phases: System Architecture Evaluation; System Development Phase; System Integration and Verification, supported by disassembly, diagnostics, and the PROM monitor)]

UNIX is a trademark of AT&T. VAX is a trademark of Digital Equipment Corporation. Macintosh is a trademark of Apple Computer. MIPS and RISC/os are trademarks of MIPS Computer Systems.

IDT's R3001 SIMPLIFIES DESIGN OF HIGH-PERFORMANCE CONTROL SYSTEMS

Integrated Device Technology, Inc.

CONFERENCE PAPER CP-03
by Philip Bourekas

ABSTRACT

This paper discusses the architecture of the R3001 RISController™, a derivative of MIPS Computer Systems' R3000 microprocessor which IDT has developed to address the particular needs of embedded systems designers. This paper discusses the architectural features of the MIPS RISC architecture which make it well suited to embedded applications, and discusses the changes to the R3000 implementation embodied in the R3001.
The paper also gives examples of how these changes help embedded system designers achieve the goals of their applications. INTRODUCTION The lOT R3000 microprocessor (the MIPS RISC Processor) has found widespread acceptance among re-programmable applications such as UNIXTM workstations and server systems. Less widely known, but of considerable significance, is the growing acceptance in the embedded marketplace in applications including real-time control systems, data communications, laser printer controllers, graphics terminals, and avionics controllers. This acceptance is testimony to the elegance of the instruction set, software, and the basic device implementation. In working with such a diversity of customers, however, lOT identified a set of significant changes to the R3000 device which make it even better suited to solving these types of control problems. While the R3000 is obviously a good device for many embedded applications, it became clear from our customers that it would be possible to implement incremental changes to the device to simplify and broaden its application in embedded systems. lOT has implemented a derivative of the R3000 which addresses many of the issues the R3000 brought to embedded system designers. The R3001 is the solution. The design goals of the R3001 were to: Maintain FULL software compatibility with the R3000, at both the kernel and user levels, to maximize the wealth of software support (both development and applications) available for the MIPS RISC architecture. • Allow embedded system designers to realize the performance of the R3000 at lower total system cost (fewer devices, less power, less board area, etc.). • Allow the system designer more options in the design and partitioning of the high-performance memory system. 
• Give the system designer full control of all aspects of system design; don't make each system pay for the full set of worst-case assumptions about systems dramatically different from his or her target application.
• Recognize the needs of real-time deterministic systems, and provide solutions to the problems of general purpose caches in these applications.
• Support systems which do not wish to implement a "cache", such as real-time systems.
• Support the use of complementary products designed for the R3000, such as the high-performance R3010 Floating Point Accelerator, cache RAMs, interface chips, etc.
• Support systems which distribute the processing task among tightly-coupled heterogeneous processing devices, such as systems with I/O processors working directly with the R3001 local memory.
• Maintain the same pin count as the R3000 (which is pad limited; a higher pin count would thus result in higher end-user device cost). Pin compatibility was not a constraint.
• Achieve the same range of speeds as the standard R3000.
• Maintain the full performance of the R3000.

This paper discusses the architecture of the R3001, and how it achieves the above goals. A few design examples are included to help clarify how these changes achieve the goals established for the project.

MIPS is a trademark of MIPS Computer Systems, Inc.
VAX is a trademark of Digital Equipment Corp.
UNIX is a registered trademark of AT&T.
RISController is a trademark of Integrated Device Technology, Inc.
©1990 Integrated Device Technology, Inc. 8/90

THE R3000 IN EMBEDDED SYSTEMS

There are a number of reasons why the R3000 has been used in embedded systems. The primary reasons are:
• The R3000 attains the highest levels of performance, based on its efficient architecture: 28 VAX Units of Performance at 33 MHz, sustained, in real-world applications. Other processor architectures require much higher frequencies to approximate the performance achieved by the R3000.
• The R3010 Floating Point Accelerator co-processor brings excellent floating point performance to those systems that need it, such as military avionics control systems.
• The compilers for the R3000 achieve most of the efficiency of programming in assembly language, while offering the ease of development of high-level languages.
• It is possible to use a single basic CPU subsystem design, and achieve various levels of performance by varying cache depth, system speed, or memory interface (write buffer depth and block refill size), and by using hardware or software floating point.
• The R3000 development environment provides powerful software development tools, including debugging support, system profiling, system performance projection, and target system software simulation. This allows system software to be developed quickly and in parallel with the target hardware, minimizing time-to-market.

THE R3000 ARCHITECTURE

The R3000 implements a strictly hierarchical view of memory, appropriate for minicomputer systems. In a typical R3000 system, a given level of memory is a high-speed cache of the larger, slower memory below it in the hierarchy, as shown in Figure 1. For example, the on-chip register file contains the most frequently used data items; the next level is implemented as high-speed instruction and data caches for the main memory, which is itself a "cache" of the mass storage system. The R3000 uses various techniques to manage the interaction of the levels, from register allocation algorithms implemented in the compilers, to TAG comparison in the caches, to an MMU supporting a demand paged virtual memory system.

Figure 1. Hierarchies of Memory in an R3000-Based System
(from highest bus bandwidth, the native rate of the CPU, to lowest bandwidth, where the program resides)
  On-Chip Register File: managed by compilers and programmers; 3 words/clock cycle
  Instruction and Data Caches: managed by on-chip controller; 2 words/clock cycle
  Main Memory: capabilities are system dependent; long latency, but burst bandwidth of 1 word/clock cycle
  Mass Storage: managed by MMU, page tables

The R3000 integrates a direct-mapped cache controller on chip. The R3000 requires that a full 60 bits of data be provided by each cache in each cycle; these 60 bits include 32 bits of data, 4 bits of data parity, 20 bits of TAG, a valid bit, and 3 bits of TAG parity. This amount of overhead is required because the R3000 allows (and assumes) cacheable main memory as large as 4 GB, and a cache as small as 4 KB.

All of these techniques combine to allow the R3000 to bring the highest levels of performance to microprocessor-based systems. However, these techniques are not necessarily appropriate to embedded applications, where they add complexity and cost to the system design.

For example, an embedded system typically contains all instructions in the system PROMs; there is no need to perform demand paging from disks. Similarly, embedded systems may not operate in a multitasking model, but rather run a single executive process. This simplifies (or eliminates) the requirement for managing virtual memory references.

In fact, many embedded systems do not wish to deal with caches at all. This may be motivated by cost, area, or system architectural requirements. For example, a real-time system may be vastly complicated by a general purpose cache; the system designer cannot be guaranteed what is in the cache when a given system event occurs, and thus the general purpose cache brings no performance gain to the real-time system. In fact, in these real-time systems, general purpose caches are viewed as "non-deterministic", and thus contradict a basic requirement of real-time applications.
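The 60-bit figure quoted above follows from simple address arithmetic. The sketch below is illustrative only: the function names are hypothetical, and it assumes power-of-two sizes and a direct-mapped cache whose TAG holds exactly the address bits that the cache index does not cover.

```python
def tag_bits(cacheable_bytes: int, cache_bytes: int) -> int:
    """Address bits a direct-mapped cache must store as TAG.

    Assumes both sizes are powers of two, so log2 can be taken
    with bit_length() - 1.
    """
    address_bits = cacheable_bytes.bit_length() - 1  # log2 of cacheable space
    index_bits = cache_bytes.bit_length() - 1        # log2 of the cache size
    return address_bits - index_bits


def r3000_cache_word_bits() -> int:
    """Bits each R3000 cache supplies per cycle, using the paper's figures."""
    data = 32
    data_parity = 4
    tag = tag_bits(4 * 2**30, 4 * 2**10)  # 4 GB cacheable, 4 KB smallest cache
    valid = 1
    tag_parity = 3
    return data + data_parity + tag + valid + tag_parity


print(tag_bits(4 * 2**30, 4 * 2**10))  # 20
print(r3000_cache_word_bits())         # 60
```

The worst-case pairing of the largest cacheable memory with the smallest cache is what fixes the TAG at 20 bits for every R3000 system, regardless of how much memory a given design actually uses.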
THE IDT 79R3001 ARCHITECTURE

From the perspective of a programmer, there is no difference between the standard R3000 and the IDT 79R3001. Both devices contain the same basic execution core and memory management unit, thus eliminating any risk of software incompatibility. The R3001 is, therefore, fully software compatible with the R3000, at both the kernel and user levels of software. Neither user programs nor kernels need to be modified to use the R3001.

However, the R3001 can present a totally different look to the system designer compared to the model assumed by the R3000. The R3001 was implemented as a set of design changes to the bus interface of the R3000. These changes were selected to maximize the amount of flexibility the system designer has when implementing the appropriate memory subsystem for his embedded application. Whereas typical uni-processor computer systems tend to have relatively similar memory requirements, embedded applications tend to have dramatically different requirements, depending on the particular problem to be solved.

THE R3001 MEMORY INTERFACE

While the R3001 does support the hierarchical memory structure of the R3000 (at lower system cost), the RISController goes beyond the capabilities of the R3000 by allowing a different view of memory. Rather than assuming that the highest speed memory is always used as a general purpose cache, the R3001 allows the system designer to use this memory space in a wide variety of ways. This flexibility is the key to the R3001 in embedded systems.

The R3001 allows the system designer to view the fast memory as a synchronous memory space, distinct from (rather than a cache of) slower memory. This partitioning is at the discretion of the system designer; in some systems a general purpose cache model might still be used, while in other systems a "cacheless" memory system would be implemented to assure real-time performance.
The R3001 directly controls two areas of memory: the synchronous memory space, and the asynchronous memory space. Synchronous memory is further subdivided into separate instruction and data portions, to supply the highest possible bandwidth to the processor. Unlike the R3000, it is simple for the system designer to implement these memory spaces as two different regions of memory, rather than implement a hierarchical relationship between them. The memory interfaces supported by the R3001 are illustrated in Figure 2.

The R3001 directly controls the synchronous memory spaces using a single data bus and a single address bus, but separate sets of control pins (one set for instruction and one for data). The control pins control the external de-multiplexing of address (using external transparent latches) and the multiplexing of data (using the output enable of the memory devices). The synchronous interface performs both an instruction and a data access per clock cycle, on alternate phases of the clock. It presents an address for the instruction in one phase and completes the data transaction in the same phase. In the next phase it completes the instruction transaction and initiates a new data transaction. The operation of the synchronous interface is illustrated in Figure 3.

In systems which wish to implement a hierarchical relationship between the high-speed synchronous memory and slower memory, the R3001 simplifies the design of the high-speed caches by making the on-chip direct-mapped cache controller much more flexible. All the system designer needs to do is "widen" the synchronous memory space by connecting standard memory devices to the appropriate TAG lines of the processor. The CPU will automatically manage the cache, detect hits and process misses by accessing the asynchronous memory space.
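The hit and miss handling described above is the standard direct-mapped technique, which can be sketched as a toy software model. This is not a description of the R3001's internal logic: the class and its names are hypothetical, lines are assumed to be one word wide, and writes, block refill, and the write buffer are left out.

```python
class DirectMappedCache:
    """Toy model of a direct-mapped cache in front of a slower memory."""

    def __init__(self, lines: int, backing: dict):
        self.lines = lines              # number of one-word cache lines
        self.backing = backing          # stands in for asynchronous memory
        self.valid = [False] * lines
        self.tags = [0] * lines
        self.data = [0] * lines

    def read(self, word_addr: int):
        """Return (value, hit) for a word address."""
        index = word_addr % self.lines  # low address bits select the line
        tag = word_addr // self.lines   # remaining high bits form the TAG
        if self.valid[index] and self.tags[index] == tag:
            return self.data[index], True   # hit: served from fast memory
        # Miss: fetch from the slower memory and refill the line.
        value = self.backing.get(word_addr, 0)
        self.valid[index] = True
        self.tags[index] = tag
        self.data[index] = value
        return value, False


memory = {0: 11, 4: 44}
cache = DirectMappedCache(lines=4, backing=memory)
print(cache.read(0))  # (11, False)  first access misses and refills
print(cache.read(0))  # (11, True)   second access hits
print(cache.read(4))  # (44, False)  same index, different TAG: miss
```

Because addresses 0 and 4 map to the same line in this four-line model, the third read evicts the first word, which is the kind of unpredictability the paper later addresses for real-time systems.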
Figure 2. Memory Interfaces of IDT's R3001
(block diagram: R3001 processor with system control coprocessor; synchronous instruction and data memories with transparent address latches; asynchronous memory interface with read/write buffers; clocks, coprocessor signals, and hardware interrupts)

Figure 3. R3001 Synchronous Memory Interface Operation
(timing diagram: AddrLo, DClk, IClk, IRd, DRd, DWr, and the Data and TAG busses alternating between instruction RAM and data RAM)

In these hierarchical systems, the R3001 implements caches at substantially lower cost than the R3000 does. This is because of the flexibility of the synchronous memory interface. The R3001 eliminates TAG parity, and makes data parity optional. Additionally, rather than requiring TAGs for a model of 4 GB memory and a 4 KB cache, the R3001 requires only the TAG bits needed for that particular system. Most embedded systems implement main memory of 8 MB or less, and build caches of 8 KB or larger.
Thus, rather than requiring 60 bits per cache to support the R3000 cache controller (or 15 high-speed 16Kx4 memories), R3001-based systems can be implemented with cache widths of 44 bits or less (only 11 RAMs per cache). This saves a minimum of 8 high-speed RAMs per system (4 in each of the instruction and data caches), as shown in Figure 4.

Note, however, that disabling the checking of TAG bits does not reduce the amount of memory supported by the processor; it reduces the amount of memory which may be cached. The full 4 GB address space is still available in the system (although not all of it is cacheable), simplifying address decoding for various types of memory such as memory-mapped I/O devices.

This same mechanism extends to allow cacheless systems; the system designer merely disables all TAG checking. The architecture of the processor allows software to separate synchronous and asynchronous memory references by using virtual addresses that are either "cacheable" (synchronous) or "uncacheable" (asynchronous).

This approach yields direct benefits to real-time systems. By eliminating the hierarchical model of memory, the high-performance real-time system designer can accurately measure and predict critical real-time metrics such as context switch time and interrupt latency without worrying about the uncertainty associated with general purpose caches.

Figure 4. Cost-Reduced R3001 Cached System
(each cache is built from 8 x IDT7198 data RAMs and 3 x IDT7198 tag RAMs; TAG 16:26 and Valid connect to the tag RAMs, while TAG 13:15 and 27:31 go to the address decoder)

DMA SUPPORT

Since the R3000 always assumes a cache-based system, the only external accesses to the fast memory that it supports are those required for multi-processor cache coherency.
The R3000 thus incorporates a simple but flexible interface which allows an external agent to specify a given cache line, and to request that the processor invalidate that line by writing into it. System DMA events, such as disk transfers, are assumed to occur only in the asynchronous memory space. In this memory space, the processor is not a master, but merely another requester.

While this model is appropriate for minicomputer type systems, it severely limits the embedded system designer. Many embedded systems use different types of specialty processors under the control of a single, general purpose processor. For example, a threat recognition system might contain a DSP subsystem to perform image recognition, and use a general purpose processor for prioritization of or response to threats. Such a system needs a high-bandwidth method of communicating between these processors, a natural application for DMA transfers.

The R3001 allows an external master to request mastership of the synchronous memory bus and control signals, as shown in Figure 5. The DMA controller can then quickly transfer data from one processor's memory into the high-speed R3001 synchronous memory, allowing the RISController to process the data (or special instruction sequence) rapidly.

Figure 5. R3001 DMA Arbiter Interface
(block diagram: IDT79R3001 RISController, synchronous instruction and data memories, DMA controller with DMAStall and request signals, asynchronous interface control, and main memory control)

CACHELESS SYSTEM

A high-performance processor requires a great deal of memory bandwidth to keep the execution engine running at full speed. Typically, caches are used to supply this bandwidth. In many applications, however, performance can be traded for cost, and the use of caches can be avoided. Alternatively, in many other applications, it is feasible to implement a fast memory large enough that traditional caching strategies can be avoided.

The system design shown in Figure 6 illustrates a cacheless system offering 7 MIPS of performance (based on real-world embedded applications such as page-description language interpreters). Rather than using caches, a set of ROMs contains the processor instructions, and the processor clock rate is slowed to 8 MHz to allow a ROM access every clock cycle. In addition to the instruction ROMs in the synchronous space, there is a bank of DRAM for main storage (in a laser printer or graphics application, this would contain the page or screen image and the display lists). Peripheral devices, such as communications devices, are also resident in the asynchronous memory space.

All instruction accesses are satisfied by the ROMs, providing zero-wait-state access to instructions. Since the processor clock speed is reduced to 8 MHz, DRAM accesses can be satisfied in 3 clock cycles (even if using 180 ns DRAMs, such as those available in the military marketplace). Note that a small data cache, implemented using just five 2Kx8 SRAMs at 90 ns (very inexpensive devices), could further increase performance by reducing the number of DRAM references necessary. Implementing the data cache would merely require the inclusion of the address latch and the memory devices.

A similar system could be implemented using dense SRAMs rather than ROMs, allowing higher speed operation. Such a system would probably require the boot ROMs to be resident in the asynchronous memory space.

This type of system is typical of many of the real-time applications of the R3001. The R3001 makes this possible because of its ability to support cacheless systems, and because of its support of large (16 MB) synchronous memory.
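The timing trade-off in this example can be checked with a few lines of arithmetic. The sketch below is only a back-of-the-envelope aid: the function names are hypothetical, and it treats the 180 ns figure as a raw access time. How the remaining time in the three cycles is spent (address multiplexing, precharge, controller overhead) is not specified in the paper.

```python
def clock_period_ns(freq_mhz: float) -> float:
    """Length of one clock cycle in nanoseconds."""
    return 1000.0 / freq_mhz


def access_budget_ns(freq_mhz: float, cycles: int) -> float:
    """Total time available when an access is allowed a number of cycles."""
    return cycles * clock_period_ns(freq_mhz)


period = clock_period_ns(8)          # 8 MHz processor clock from the example
budget = access_budget_ns(8, 3)      # DRAM access allowed 3 clock cycles
margin = budget - 180.0              # left over after a 180 ns DRAM access

print(period)  # 125.0
print(budget)  # 375.0
print(margin)  # 195.0
```

Slowing the clock lengthens every cycle, which is why both the ROMs and the slow DRAMs can be accommodated without a cache in this design.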
REAL-TIME CACHE BASED SYSTEM

It is possible to implement a system with the benefits of caching which also satisfies the deterministic requirements of the real-time system designer. With the R3001, the real-time system designer can easily implement a specialized caching structure which guarantees that critical kernel routines will be executed with constant execution time (rather than the variable execution time which results from the uncertainty of whether the code is cache resident).

Figure 6. R3001 ROM Based, Cacheless System
(block diagram: R3001 clocked at 8 MHz, 2 x FCT373A address latches, instruction ROM, PAL state machine, address mux, and DRAM)

The basic technique involves segmenting the cache into two portions: a dedicated kernel section, where the time-critical code is resident and which tasks cannot overwrite, and a general purpose section, where the tasks benefit from caching but which is non-deterministic. Software separates the memory areas for the time-critical code from those of the general tasks, according to the addresses used. External address decoding is then used to separate kernel from general accesses, and to select the right cache area depending on the decoded address.

Figure 7 shows the cache subsystem which might be implemented for such a design. A high-order address line indicates whether the access is for the time-critical kernel, or is a general access. If a kernel access is indicated, then the chip-enable for the kernel cache is activated and the enable for the general cache is negated, thus ensuring a kernel reference. A simple routine at system startup pre-loads the critical code into the kernel cache, so that all invocations of those routines result in cache "hits". This approach achieves the highest levels of real-time performance and throughput, while using the caches to minimize overall system cost.
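The address decoding described above can be illustrated with a small software model. This is a sketch under stated assumptions: the paper does not say which address line is decoded or what its polarity is, so KERNEL_SELECT_BIT and the "1 means kernel" convention are hypothetical, as are the function and key names.

```python
KERNEL_SELECT_BIT = 23  # hypothetical choice of high-order address line


def chip_enables(address: int, select_bit: int = KERNEL_SELECT_BIT) -> dict:
    """Model the external decoder that steers an access to one cache bank.

    Exactly one bank is enabled per access: the kernel cache when the
    selected address line is 1 (assumed polarity), otherwise the
    general purpose task cache.
    """
    kernel = bool((address >> select_bit) & 1)
    return {"kernel_cache": kernel, "task_cache": not kernel}


print(chip_enables(0x00800000))  # {'kernel_cache': True, 'task_cache': False}
print(chip_enables(0x00001000))  # {'kernel_cache': False, 'task_cache': True}
```

Because task addresses can never enable the kernel bank in this scheme, code pre-loaded there at startup stays resident, which is what makes the kernel routines' execution time constant.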
Figure 7. R3001 Real-Time, Cache Based System
(block diagram: R3001 with Data and AddrLo busses, data cache, kernel instruction cache, and task instruction cache)

SUMMARY

The introduction of the R3001 enables the designers of embedded systems to bring the performance inherent in the MIPS RISC architecture to their embedded applications by satisfying the cost, area, and performance constraints of various applications. This allows the designer to use a single base architecture throughout their system, for example, by using the R3001 to manage I/O processing in an R3000-based minicomputer. This allows both IDT and our customers to leverage their experience with the architecture, and has proven to be attractive to a large number of our customers. IDT will continue to innovate with the MIPS RISC architecture, producing other derivative versions of the architecture which are well suited to specific embedded applications.
