Intel_Multimedia_and_Supercomputing_Processors_Jan92 Intel Multimedia And Supercomputing Processors Jan92
Page Count: 1216
Intel Corporation is a leading supplier of microcomputer components,
modules and systems. When Intel first introduced the microprocessor in 1971,
it created the era of the microcomputer. Today, Intel architectures are considered
world standards. Intel products are used in a wide variety of applications, including
embedded systems such as automobiles, avionics systems and telecommunications
equipment, and as the CPU in personal computers, network servers and
supercomputers. Others bring enhanced capabilities to systems and networks.
Intel's mission is to deliver quality products through leading-edge technology.
MULTIMEDIA AND
SUPERCOMPUTING
PROCESSORS
1992
Intel Corporation makes no warranty for the use of its products and assumes no responsibility for any errors
which may appear in this document nor does it make a commitment to update the information contained
herein.
Intel retains the right to make changes to these specifications at any time, without notice.
Contact your local sales office to obtain the latest specifications before placing your order.
The following are trademarks of Intel Corporation and may only be used to identify Intel products:
376, Above, ActionMedia, BITBUS, Code Builder, DeskWare, Digital
Studio, DVI, EtherExpress, ETOX, FaxBACK, Grand Challenge, i, i287,
i386, i387, i486, i487, i750, i860, i960, ICE, iLBX, Inboard, Intel, Intel287,
Intel386, Intel387, Intel486, Intel487, intel inside., Intellec, iPSC, iRMX,
iSBC, iSBX, iWARP, LAN Print, LANSelect, LAN Shell, LANSight,
LANSpace, LANSpool, MAPNET, Matched, MCS, Media Mail, NetPort,
NetSentry, OpenNET, PRO750, ProSolver, READY-LAN, Reference Point,
RMX/80, SatisFAXtion, SnapIn 386, Storage Broker, SugarCube, The
Computer Inside., TokenExpress, Visual Edge, and WYPIWYF.
MDS is an ordering code only and is not used as a product name or trademark. MDS is a registered trademark
of Mohawk Data Sciences Corporation.
CHMOS and HMOS are patented processes of Intel Corp.
Intel Corporation and Intel's FASTPATH are not affiliated with Kinetics, a division of Excelan, Inc. or its
FASTPATH trademark or products.
Additional copies of this manual or other Intel literature may be obtained from:
Intel Corporation
Literature Sales
P.O. Box 7641
Mt. Prospect, IL 60056-7641
© INTEL CORPORATION 1991
INTEL SERVICE
INTEL'S COMPLETE SUPPORT SOLUTION WORLDWIDE
Intel Service is a complete support program that provides Intel customers with hardware support, software
support, customer training, and consulting services. For detailed information contact your local sales offices.
Service and support are major factors in determining the success of a product or program. For Intel this
support includes an international service organization and a breadth of service programs to meet a variety of
customer needs. As you might expect, Intel service is extensive. It can start with On-Site Installation and
Maintenance for Intel and non-Intel systems and peripherals, Repair Services for Intel OEM Modules and
Platforms, Network Operating System support for Novell NetWare and Banyan VINES software, Custom
Integration Services for Intel Platforms, Customer Training, and System Engineering Consulting Services. Intel
maintains service locations worldwide. So wherever you're using Intel technology, our professional staff is
within close reach.
ON-SITE INSTALLATION AND MAINTENANCE
Intel's installation and maintenance services are designed to get Intel and Intel-based systems, and the networks they use, up and running fast. Intel's service centers are staffed by trained and certified Customer
Engineers throughout the world. Once systems are installed, Intel is dedicated to keeping them running at maximum
efficiency, while controlling costs.
REPAIR SERVICES FOR INTEL OEM MODULES AND PLATFORMS
Intel offers customers of its OEM Modules and Platforms a comprehensive set of repair services that reduce
the costs of system warranty, maintenance, and ownership. Repair services include module or system testing
and repair, module exchange, and spare part sales.
NETWORK OPERATING SYSTEM SUPPORT
An Intel software support contract for Novell NetWare or Banyan VINES software means unlimited access to
troubleshooting expertise any time during contract hours - up to seven days per week, twenty-four hours per
day. To keep networks current and compatible with the latest software versions, support services include access
to minor releases and "patches" as made available by Novell and Banyan.
CUSTOM SYSTEM INTEGRATION SERVICES
Intel Custom System Integration Services enable resellers to order completely integrated systems assembled
from a list of Intel386™ and Intel486™ microcomputers and validated hardware and software options. These
services are designed to complement the reseller's own integration capabilities. Resellers can increase business
opportunities, while controlling overhead and support costs.
CUSTOMER TRAINING
Intel offers a wide range of instructional programs covering various aspects of system design and implementation. In just three to five days a limited number of individuals learn more in a single workshop than in weeks of
self-study. Covering a wide variety of topics, Intel's major course categories include: architecture and assembly
language, programming and operating systems, BITBUS™, and LAN applications.
SYSTEM ENGINEERING CONSULTING
Intel provides field system engineering consulting services for any phase of your development or application
effort. You can use our system engineers in a variety of ways ranging from assistance in using a new product,
developing an application, personalizing training and customizing an Intel product to providing technical and
management consulting. Working together, we can help you get a successful product to market in the least
possible time.
DATA SHEET DESIGNATIONS
Intel uses various data sheet markings to designate each phase of the document as it
relates to the product. The marking appears in the upper, right-hand corner of the data
sheet. The following is the definition of these markings:

Data Sheet Marking      Description
Product Preview         Contains information on products in the design phase of
                        development. Do not finalize a design with this
                        information. Revised information will be published when
                        the product becomes available.
Advanced Information    Contains information on products being sampled or in
                        the initial production phase of development.*
Preliminary             Contains preliminary information on new products in
                        production.*
No Marking              Contains information on products in full production.*

*Specifications within these data sheets are subject to change without notice. Verify with your local Intel sales
office that you have the latest data sheet before finalizing a design.
i750™ Microprocessor Family
i860™ Microprocessor Family
i960™ Microprocessor Family
Memories and Peripherals
Development Support Tools
Table of Contents
Alphanumeric Index ..... x

i750™ MICROPROCESSOR FAMILY
Chapter 1
i750™ PROCESSOR DATA SHEETS
82750DB Display Processor ..... 1-1
82750PB Pixel Processor ..... 1-57

i860™ MICROPROCESSOR FAMILY
Chapter 2
i860™ PROCESSOR DATA SHEETS AND APPLICATION NOTES
i860 XP Microprocessor ..... 2-1
i860 XR 64-Bit Microprocessor ..... 2-164
82495XP Cache Controller/82490XP Cache RAM ..... 2-243
AP-434 Using i860 Microprocessor Graphics Instructions for 3-D Rendering ..... 2-378
AP-435 Fast Fourier Transforms on the i860 Microprocessor ..... 2-393
AP-452 Designing a Memory Bus Controller for the 82495/82490 Cache ..... 2-447

i960™ MICROPROCESSOR FAMILY
Chapter 3
i960™ PROCESSOR PRODUCT OVERVIEWS AND DATA SHEETS
80960SA/80960SB Embedded 32-Bit Processors with 16-Bit Burst Data Bus ..... 3-1
i960 KA/KB Processor Product Overview ..... 3-29
80960KA Embedded 32-Bit Processor ..... 3-34
80960KB Embedded 32-Bit Processor with Integrated Floating-Point Unit ..... 3-81
80960CA Product Overview ..... 3-128
80960CA-33, -25, -16, 32-Bit High Performance Embedded Processor ..... 3-166
i960™ MC Processor Product Overview ..... 3-233
80960MC Embedded 32-Bit Microprocessor with Integrated Floating-Point Unit and
Memory Management Unit ..... 3-238
M82965 Fault Tolerant Bus Extension Unit ..... 3-276

MEMORIES AND PERIPHERALS
Chapter 4
DATA SHEETS
85C960 1-Micron CHMOS 80960 K-Series Bus Control microPLD ..... 4-1
27960CX Pipelined Burst Access 1M (128K x 8) CHMOS EPROM ..... 4-19
27960KX Burst Access 1M (128K x 8) CHMOS EPROM ..... 4-40
82596CA High-Performance 32-Bit Local Area Network Coprocessor ..... 4-59

DEVELOPMENT SUPPORT TOOLS
Chapter 5
i960 Family of Software Debuggers ..... 5-1
EXV960MC Execution Vehicle ..... 5-6
80960SA/SB Development Support ..... 5-8
ICE-960SB and ICE-960KB In-Circuit Emulators ..... 5-15
ICE-960MC In-Circuit Emulator ..... 5-25
QT960 Evaluation and Prototyping Board ..... 5-33
DB960CADIC In-Circuit Debugger ..... 5-36
Intel Development Tools Software Services ..... 5-41
iRMK 960 Real-Time Kernel ..... 5-43
EV80960CA Evaluation Board ..... 5-49
i960 SA/SB Evaluation Board ..... 5-52
Alphanumeric Index
27960CX Pipelined Burst Access 1M (128K x 8) CHMOS EPROM ..... 4-19
27960KX Burst Access 1M (128K x 8) CHMOS EPROM ..... 4-40
80960CA Product Overview ..... 3-128
80960CA-33, -25, -16, 32-Bit High Performance Embedded Processor ..... 3-166
80960KA Embedded 32-Bit Processor ..... 3-34
80960KB Embedded 32-Bit Processor with Integrated Floating-Point Unit ..... 3-81
80960MC Embedded 32-Bit Microprocessor with Integrated Floating-Point Unit and
Memory Management Unit ..... 3-238
80960SA/80960SB Embedded 32-Bit Processors with 16-Bit Burst Data Bus ..... 3-1
80960SA/SB Development Support ..... 5-8
82495XP Cache Controller/82490XP Cache RAM ..... 2-243
82596CA High-Performance 32-Bit Local Area Network Coprocessor ..... 4-59
82750DB Display Processor ..... 1-1
82750PB Pixel Processor ..... 1-57
85C960 1-Micron CHMOS 80960 K-Series Bus Control microPLD ..... 4-1
AP-434 Using i860 Microprocessor Graphics Instructions for 3-D Rendering ..... 2-378
AP-435 Fast Fourier Transforms on the i860 Microprocessor ..... 2-393
AP-452 Designing a Memory Bus Controller for the 82495/82490 Cache ..... 2-447
DB960CADIC In-Circuit Debugger ..... 5-36
EV80960CA Evaluation Board ..... 5-49
EXV960MC Execution Vehicle ..... 5-6
i860 XP Microprocessor ..... 2-1
i860 XR 64-Bit Microprocessor ..... 2-164
i960 Family of Software Debuggers ..... 5-1
i960 KA/KB Processor Product Overview ..... 3-29
i960™ MC Processor Product Overview ..... 3-233
i960 SA/SB Evaluation Board ..... 5-52
ICE-960MC In-Circuit Emulator ..... 5-25
ICE-960SB and ICE-960KB In-Circuit Emulators ..... 5-15
Intel Development Tools Software Services ..... 5-41
iRMK 960 Real-Time Kernel ..... 5-43
M82965 Fault Tolerant Bus Extension Unit ..... 3-276
QT960 Evaluation and Prototyping Board ..... 5-33
i750™ Microprocessor Family
1
82750DB
DISPLAY PROCESSOR
• Programmable Video Timing
- 28 MHz and 45 MHz Operating Frequency
- Pixel/Line Address Range to 4096
- Fully Programmable Sync, Equalization, and Serration Components
- Fully Programmable Blanking and Active Display Start and Stop Times
- Genlocking Capability
• Flexible Display Characteristics
- 8-, Pseudo 16-, 16-, and 32-Bit/Pixel Modes
- Selectable Pixel Widths of 1.0, 1.5, 2.0, 2.5, through 14 Periods of the Input Frequency
- Support Popular Display Resolutions: VGA, XGA, NTSC, PAL, and SECAM
- On-Chip Triple DAC for Analog RGB/YUV Output
- Mix Graphics and Video Images on a Pixel by Pixel Basis
- Real Time Expansion of the Reduced Sample Density Video Color Components (U, V) to Full Resolution
- Three Independently Addressable Color Palettes
- Programmable 2X Horizontal Interpolation of Y Channel
- 16 x 16 x 2-Bit Cursor Map with Independently Programmable 2X Expansion Factors in X and Y Dimensions
- YUV to RGB Color Space Conversion
- 2X Vertical Replication of Y, U, and V Data for Displaying Full Motion Video on VGA Monitor
- Register and Function Compatible with the 82750DA
Intel's 82750DB is a custom designed VLSI chip used for processing and displaying video graphic information.
It is register and function compatible with the 82750DA.
Reset inputs allow the 82750DB to be genlocked to an external sync source. By programming internal control
registers, this sync can be modified to accommodate a wide variety of scanning frequencies. A large selection
of bits/pixel, pixels/line, and pixel widths are programmable, allowing a wide latitude in trading off image
quality vs update rate and VRAM requirements.
The 82750DB can operate in a digitizing mode, wherein it generates timing and control signals to the 82750PB
and VRAM, but does not output display information. Besides digitizer support signals and video synchronization,
the 82750DB outputs digital and analog RGB or YUV information and an 8-bit digital word of alpha data.
This alpha channel data may be used to obtain a fractional mix of 82750DB outputs with another video source.
[Figure: 82750DB Subsystem Diagram, showing the VRAM, serial shift register, and video input (240855-1)]
Intel Corporation assumes no responsibility for the use of any circuitry other than circuitry embodied in an Intel product. No other circuit patent
licenses are implied. Information contained herein supersedes previously published specifications on these devices from Intel.
February 1991
© INTEL CORPORATION, 1991
Order Number: 240855-003
82750DB Display Processor

CONTENTS                                                        PAGE

1.0 82750DB PIN DESCRIPTION
Pinout ..... 1-4
Quick Pin Reference ..... 1-8

2.0 ARCHITECTURE
Overview ..... 1-11
Sync Generation and Timing ..... 1-11
VBUS Control ..... 1-14
  VBUS Code Description ..... 1-16
Pixel Processing Path ..... 1-19
VU Interpolation ..... 1-19
Colormap Lookup Table (CLUT) Operation ..... 1-20
  8-Bit/Pixel Graphics Mode ..... 1-21
  8-Bit/Pixel Video Mode ..... 1-21
  8-Bit/Pixel Mixed Mode ..... 1-21
  Pseudo 16-Bit/Pixel Graphics Mode ..... 1-21
  Pseudo 16-Bit/Pixel Video Mode ..... 1-21
  Pseudo 16-Bit/Pixel Mixed Mode ..... 1-22
  16-Bit/Pixel Graphics Mode ..... 1-22
  16-Bit/Pixel Video Mode ..... 1-22
  16-Bit/Pixel Mixed Mode ..... 1-22
  32-Bit/Pixel Graphics Mode ..... 1-22
  32-Bit/Pixel Video Mode ..... 1-22
  32-Bit/Pixel Mixed Mode ..... 1-22
Y Interpolator ..... 1-23
Cursor ..... 1-23
YUV to RGB Converter ..... 1-25
Output Equalization ..... 1-26
Digital to Analog Converters ..... 1-27

3.0 HARDWARE INTERFACE
82750DB Reset Operations ..... 1-28
Input/Output Transformation ..... 1-28
Genlocking on the 82750DB ..... 1-29
Digitizing Images with the 82750DB ..... 1-30

4.0 PROGRAMMING THE 82750DB
Overview ..... 1-33
Pipeline Delay through the 82750DB ..... 1-33
Programming Considerations ..... 1-34
  Cursor Registers ..... 1-34
  Display Timing Registers ..... 1-35
  VBUS Code Registers ..... 1-37
  Color Registers ..... 1-38
  Control Registers ..... 1-38
  Color Map Registers ..... 1-42
82750DB Register Summary ..... 1-43

5.0 ELECTRICAL DATA
D.C. Characteristics ..... 1-44
A.C. Characteristics ..... 1-45
Digital to Analog Converter Electrical Characteristics ..... 1-50
Output Delay and Rise Time versus Load Capacitance ..... 1-52

6.0 MECHANICAL DATA
Packaging Outlines and Dimensions ..... 1-53
Package Thermal Specifications ..... 1-56

FIGURES
Figure 1-1 82750DB Pinout ..... 1-4
Figure 1-2 82750DB Functional Signal Groupings ..... 1-7
Figure 2-1 82750DB Unit Level Diagram ..... 1-12
Figure 2-2 Horizontal Programming Parameters ..... 1-13
Figure 2-3 Vertical Programming Parameters ..... 1-13
Figure 2-4 82750PB/82750DB Communication ..... 1-14
Figure 2-5 82750DB 1X Shift Clock Operation ..... 1-15
Figure 2-6 82750DB 1/2X Shift Clock Operation ..... 1-15
Figure 2-7 82750DB 1/3X Shift Clock Operation ..... 1-15
Figure 2-8 Mask Operation on CLUT Address ..... 1-20
Figure 2-9 Divide by 2.5 Pixel Clock ..... 1-27
Figure 3-1 Horizontal and Vertical Reset Timing ..... 1-30
Figure 3-2 Digitizing Example ..... 1-31
Figure 3-3 Digitizing Example with Line Replicate ..... 1-32
Figure 4-1 Programming the Video Sync Outputs ..... 1-36
Figure 5-1 Clock Waveforms ..... 1-47
Figure 5-2 Output Waveforms ..... 1-47
Figure 5-3 Input Waveforms ..... 1-47
Figure 5-4 1X SCLK Mode ..... 1-48
Figure 5-5 1/2X SCLK Mode ..... 1-48
Figure 5-6 1/3X SCLK Mode ..... 1-48
Figure 5-7 PIXCLK Waveforms ..... 1-49
Figure 5-8 Output Setup and Hold ..... 1-49
Figure 5-9 TESTACT# Float Delay ..... 1-49
Figure 5-10 DISDIG to Digital Output Delay ..... 1-50
Figure 5-11 DISDAC to Analog Output Delay ..... 1-50
Figure 5-12 Typical Output Configuration ..... 1-51
Figure 5-13 Typical Output Valid Delay Versus Load Capacitance under Worst Case Conditions ..... 1-52
Figure 5-14 Typical Output Rise Time Versus Load Capacitance under Worst Case Conditions ..... 1-52
Figure 6-1 Principal Dimensions of the 82750DB in the 132-Lead PQFP Package ..... 1-53
Figure 6-2 132-Lead PQFP Mechanical Package Detail-Typical Lead ..... 1-54
Figure 6-3 132-Lead PQFP Mechanical Package Detail-Protective Bumper ..... 1-54
Figure 6-4 Detailed Dimensions of the 82750DB in the 132-Lead PQFP Package-Molded Details ..... 1-54
Figure 6-5 Detailed Dimensions of the 82750DB in the 132-Lead PQFP Package-Terminal Details ..... 1-55

TABLES
Table 1-1 Pin Cross Reference by Pin Name ..... 1-5
Table 1-2 Pin Cross Reference by Location ..... 1-6
Table 1-3 Pin Descriptions ..... 1-8
Table 1-4 Input Pins ..... 1-11
Table 2-1 VU Transfer Request Patterns ..... 1-17
Table 2-2 VU Transfer Request Patterns with Line Replicate ..... 1-17
Table 2-3 CLUT Modes ..... 1-20
Table 2-4 Control Bit Settings and Resulting Interpolator Output ..... 1-23
Table 2-5 Cursor Color Registers ..... 1-24
Table 2-6 Cursor Sizes ..... 1-24
Table 2-7 82750DB Active T-Cycle Patterns ..... 1-26
Table 2-8 Digital to Analog Converter Pins ..... 1-27
Table 3-1 Selecting Alpha Outputs ..... 1-29
Table 4-1 VU Sampling ..... 1-39
Table 4-2 Pixel Times ..... 1-39
Table 4-3 Number of Bits/Pixel ..... 1-40
Table 4-4 Test Mode Select Coding ..... 1-40
Table 4-5 Coding of Transfer Timing Select Bits ..... 1-42
Table 4-6 82750DB Register Space ..... 1-43
Table 5-1 Absolute Maximum Requirements ..... 1-44
Table 5-2 D.C. Characteristics ..... 1-44
Table 5-3 A.C. Characteristics at 28 MHz ..... 1-45
Table 5-4 A.C. Characteristics at 45 MHz ..... 1-46
Table 5-5 DAC D.C. Characteristics ..... 1-50
Table 5-6 DAC A.C. Characteristics ..... 1-51
Table 6-1 PQFP Symbol List ..... 1-53
Table 6-2 Intel Case Outline Drawings for PQFP at 0.025 Inch Pitch ..... 1-53
Table 6-3 Thermal Resistances (°C/W) ..... 1-56
Table 6-4 Maximum TA at Various Airflows ..... 1-56
82750DB
1.0 82750DB PIN DESCRIPTION
Pinout
[Figure 1-1. 82750DB Pinout, top view (240855-2)]
82750DB
Table 1-1. Pin Cross Reference by Pin Name
Location
Pin Name
Location
ACTDIS
Pin Name
87
DATAIN(15)
37
DRV(7)
114
Vee
Pin Name
Location
Pin Name
Location
82
ALPHA[7]
88
DATAIN(14)
36
DRV(6)
118
Vee
91
ALPHA[6]
90
DATAIN(13)
31
DRV[5)
119
Vee
98
ALPHA[5]
92
DATAIN(12)
30
DRV(4)
3
Vee
100
ALPHA[4]
93
DATAIN(11)
29
DRV[3]
4
Vee
104
5
Vee
109
6
Vee
116
ALPHA[3]
95
DATAIN(10)
28
DRV[2]
ALPHA[2]
96
DATAIN(9)
27
DRV(1)
ALPHA[ll
97
DATAIN(8)
26
DRV[O) .
7
Vee
123
ALPHA[O]
102
DATAIN(7)
25
FCO
61
Vee
127
AVCC
128
DATAIN(6)
24
FREQIN
64
Vee
132
AVSS
125
DATAIN[5]
23
GY
VGCS
121
69
DATAIN(4)
22
HRESET#
60
VRESET#
BPP[1]
85
DATAIN(3)
21
HYSNC
71
Vss
1
BPP[O]
86
DATAIN(2)
20
IREFIN
130
Vss
16
BU
122
DATAIN[1)
19
PIXCLK
120
CB
83
DATAIN[O)
18
RESETB#
CSYNC
72
DBU(7)
103
RV
DATAIN[31]
58
DBU[6)
105
SCLK[1]
77
DATAIN[30]
56
DBU(5)
106
SCLK[O)
74
DATAIN[29]
55
DBU(4)
107
TEST#
63
BG
)
129
59
Vss
17
73
Vss
32
126
Vss
34
Vss
39
Vss
48
Vss
57
DATAIN[28]
54
DBU[3)
110
TESTACT#
62
Vss
66
DATAIN[27]
53
DBU[2)
111
VBUS[3]
81
Vss
68
DATAIN[26]
52
DBU(1)
112
VBUS[2)
80
Vss
76
DATAIN[25]
50
DBU[O)
113
VBUS[1]
79
Vss
89
DATAIN[24]
49
DGY(7)
8
VBUS[O]
78
Vss
94
DATAIN[23]
47
DGY(6)
9
Vee
2
Vss
99
DATAIN[22]
46
DGY(5)
10
Vee
33
Vss
101
DATAIN[21]
44
DGY(4)·
11
Vee
35
Vss
108
DATAIN[20]
43
DGY[3]
12
Vee
45
Vss
115
DATAIN(19)
42
DGY[2]
13
Vee
51
Vss
117
DATAIN(18)
41
DGY(1)
14
Vee
65
Vss
124
DATAIN(17)
40
DGY[O)
15
Vee
67
Vss
131
DATAIN[16]
38
DISDAC
66
Vee
75
VSYNC
DISDIG
84
1-5
70
82750DB
Table 1-2. Pin Cross Reference by Location

Location  Pin Name     Location  Pin Name     Location  Pin Name   Location  Pin Name
1         VSS          34        VSS          67        VCC        100       VCC
2         VCC          35        VCC          68        VSS        101       VSS
3         DRV[4]       36        DATAIN[14]   69        BG         102       ALPHA[0]
4         DRV[3]       37        DATAIN[15]   70        VSYNC      103       DBU[7]
5         DRV[2]       38        DATAIN[16]   71        HSYNC      104       VCC
6         DRV[1]       39        VSS          72        CSYNC      105       DBU[6]
7         DRV[0]       40        DATAIN[17]   73        RESETB#    106       DBU[5]
8         DGY[7]       41        DATAIN[18]   74        SCLK[0]    107       DBU[4]
9         DGY[6]       42        DATAIN[19]   75        VCC        108       VSS
10        DGY[5]       43        DATAIN[20]   76        VSS        109       VCC
11        DGY[4]       44        DATAIN[21]   77        SCLK[1]    110       DBU[3]
12        DGY[3]       45        VCC          78        VBUS[0]    111       DBU[2]
13        DGY[2]       46        DATAIN[22]   79        VBUS[1]    112       DBU[1]
14        DGY[1]       47        DATAIN[23]   80        VBUS[2]    113       DBU[0]
15        DGY[0]       48        VSS          81        VBUS[3]    114       DRV[7]
16        VSS          49        DATAIN[24]   82        VCC        115       VSS
17        VSS          50        DATAIN[25]   83        CB         116       VCC
18        DATAIN[0]    51        VCC          84        DISDIG     117       VSS
19        DATAIN[1]    52        DATAIN[26]   85        BPP[1]     118       DRV[6]
20        DATAIN[2]    53        DATAIN[27]   86        BPP[0]     119       DRV[5]
21        DATAIN[3]    54        DATAIN[28]   87        ACTDIS     120       PIXCLK
22        DATAIN[4]    55        DATAIN[29]   88        ALPHA[7]   121       VGCS
23        DATAIN[5]    56        DATAIN[30]   89        VSS        122       BU
24        DATAIN[6]    57        VSS          90        ALPHA[6]   123       VCC
25        DATAIN[7]    58        DATAIN[31]   91        VCC        124       VSS
26        DATAIN[8]    59        VRESET#      92        ALPHA[5]   125       AVSS
27        DATAIN[9]    60        HRESET#      93        ALPHA[4]   126       RV
28        DATAIN[10]   61        FCO          94        VSS        127       VCC
29        DATAIN[11]   62        TESTACT#     95        ALPHA[3]   128       AVCC
30        DATAIN[12]   63        TEST#        96        ALPHA[2]   129       GY
31        DATAIN[13]   64        FREQIN       97        ALPHA[1]   130       IREFIN
32        VSS          65        VCC          98        VCC        131       VSS
33        VCC          66        DISDAC       99        VSS        132       VCC
[Figure: 82750DB functional signal groupings (VDP interface, VDP communication bus, VRAM interface and shift control, analog output, data bus, display interface, bits/pixel select)]
[Figure 2-7. 82750DB 1/3X Shift Clock Operation: FREQIN, SCLK[1:0], and VRAM data waveforms with TDSCLK, TACCESS, and TSETUP timing (240855-10)]
When reading data from memory during active display, the SCLK[1:0] outputs operate at the rate required to support the programmed display rate. This
rate is determined from the following equation:

$$\mathrm{RATE} = \frac{\#\ \text{bits/pixel}}{(32\ \text{bits/word}) \times (\#\ \text{words/fetch}) \times (\#\ \text{T-cycles/pixel})}$$

where: # bits/pixel and # T-cycles/pixel are user-programmed
# words/fetch is: 1
The SCLK[1:0] outputs will be the same frequency
as the input clock in the 1X shift clock mode, and
one half the input clock frequency when using the
1/2X mode. The frequency will be one third of the
input clock frequency when using the 1/3X mode. In the 1/3X
mode the SCLK[1:0] outputs will be high for one
T-cycle, and low for 2 T-cycles.
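As a rough illustration of the rate equation above, the following Python sketch evaluates it for a few settings. The function name `sclk_rate` and the reading of the result as a fraction of the input clock are assumptions made for this example; only the formula and the fixed value of one word per fetch come from the text.

```python
from fractions import Fraction

BITS_PER_WORD = 32  # from the equation: 32 bits/word


def sclk_rate(bits_per_pixel, t_cycles_per_pixel, words_per_fetch=1):
    """Evaluate RATE = bits/pixel / (32 * words/fetch * T-cycles/pixel).

    Hypothetical helper: the result is returned as an exact fraction.
    Fraction() accepts half-cycle pixel widths such as 1.5 or 2.5 exactly.
    """
    denominator = BITS_PER_WORD * words_per_fetch * Fraction(t_cycles_per_pixel)
    return Fraction(bits_per_pixel) / denominator


if __name__ == "__main__":
    for bpp, width in [(32, 1), (16, 2), (8, 2.5)]:
        print(f"{bpp} bits/pixel, {width} T-cycles/pixel -> RATE = {sclk_rate(bpp, width)}")
```

Using exact fractions rather than floats keeps results such as 1/3 recognizable when comparing against the 1X, 1/2X, and 1/3X shift clock modes.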
VBUS CODE DESCRIPTION
When the 82750DB is actively fetching and displaying pixels, VUXFER, BMXIYBMNPX, and REGX are
typically sent over the VBUS. Of the three codes,
REGX has top priority, followed by VUXFER, and
last by BMXIYBMNPX. These commands may be
programmed to occur each active line during the
blanking interval for the line just completed. If a register transfer has been programmed for an active
line, it takes priority and is executed first; any scheduled
VUXFER and BMXIYBMNPX commands are
executed immediately after the register transfer. The programmer has the responsibility for
verifying that the sum of times required by these
commands does not exceed horizontal non-active
display time. The 82750DB will commence fetching
pixels at the subsequent start of active display. A
detailed explanation of the different types of VBUS
commands and their corresponding codes follows.
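The priority rule and the blanking-time check described above can be sketched in a few lines of Python. This is only a planning aid under stated assumptions: `schedule_line` and its arguments are hypothetical names, and the per-command durations are values the programmer would have to supply, since the text does not give them.

```python
# Priority order taken from the text: REGX first, then VUXFER, then BMXIYBMNPX.
PRIORITY = ["REGX", "VUXFER", "BMXIYBMNPX"]


def schedule_line(requests, blanking_t_cycles):
    """Return the commands for one line in the order they would be sent.

    requests: dict mapping a command name to the T-cycles it needs
              (hypothetical, programmer-supplied figures).
    blanking_t_cycles: horizontal non-active display time in T-cycles.
    Raises ValueError if the commands do not fit in the blanking interval.
    """
    unknown = set(requests) - set(PRIORITY)
    if unknown:
        raise ValueError(f"unknown VBUS command(s): {sorted(unknown)}")
    order = [name for name in PRIORITY if name in requests]
    total = sum(requests[name] for name in order)
    if total > blanking_t_cycles:
        raise ValueError(
            f"commands need {total} T-cycles but only "
            f"{blanking_t_cycles} are available during blanking"
        )
    return order


if __name__ == "__main__":
    print(schedule_line({"BMXIYBMNPX": 40, "REGX": 30}, blanking_t_cycles=100))
```

The check mirrors the programmer's responsibility stated in the text: the sum of the command times must not exceed the horizontal non-active display time.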
Besides VBLEN, the programmer must also set the SCLK delay, which is found in the Pixel Control Register. It is the number of 82750DB clock cycles that the 82750DB will wait, after the initiation of a transfer request
on the VBUS outputs, before clocking in data out of the VRAM.
REGX (0010) This command requests that the
82750PB transfer 82750DB register information into
the VRAM shift registers. Besides the automatic
82750DB register transfer that occurs on the second
line (line 2) of each field, the programmer can specify the next horizontal line on which another register
transfer is to take place. The transfers may be
scheduled many times during the field. On the first
transfer, the 82750PB uses the contents of its
82750DBc register as the starting address of the
82750DB register data. On each subsequent access, the programmed pitch value in the 82750PB's
82750DBc-PITCH register is added to the accumulated start address. The programmer must ensure
that the data is stored in VRAM at the correct address. Since the pitch remains constant, the longest
register load will determine the pitch value.
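To make the address accumulation concrete, here is a small Python sketch of where successive register loads would have to be placed in VRAM. The function names, the example addresses, and the assumption that load lengths and the pitch are expressed in the same address units are all illustrative; the text only states that a constant pitch is added on each subsequent access and that the longest load determines it.

```python
def regx_source_addresses(start_address, pitch, transfers):
    """Addresses used for successive REGX transfers within one field.

    The first transfer uses the start address; each later transfer adds
    the constant pitch to the accumulated address.
    """
    return [start_address + k * pitch for k in range(transfers)]


def minimum_pitch(load_lengths):
    """Smallest pitch that keeps consecutive register loads from overlapping.

    Assumes (for illustration) that lengths use the same units as the pitch.
    """
    return max(load_lengths)


if __name__ == "__main__":
    pitch = minimum_pitch([0x20, 0x40, 0x10])  # hypothetical load sizes
    for addr in regx_source_addresses(0x1000, pitch, 3):
        print(hex(addr))
```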
The VBUS unit performs a vertical checksum on all
the register information. Each bit in the register word
undergoes an exclusive-OR with the corresponding
bit in the previous data word. The 82750DB compares this information with the user generated
checksum, which is the last 32-bit data word read
into the 82750DB during a register transfer. If the
values do not match, the 82750DB will disable all of
its digital sync and data outputs, enter the reset
state, and send a SHUTDOWN code (82750DBSO)
to the 82750PB over the VBUS[3:0] outputs. If the
new checksum is correct, the new register values
will take effect immediately.
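The vertical checksum described above amounts to a bitwise XOR accumulated over the 32-bit register words, compared against a final user-supplied word. The Python sketch below shows one way host software might generate and check it. The function names are hypothetical, and starting the accumulation from zero is an assumption, since the text does not state the initial value.

```python
from functools import reduce
from operator import xor

MASK32 = 0xFFFFFFFF  # register words are 32 bits wide


def vertical_checksum(words):
    """XOR every 32-bit word with the running result (assumed to start at 0)."""
    return reduce(xor, (w & MASK32 for w in words), 0)


def append_checksum(register_words):
    """Build a register load whose last word is the user-generated checksum."""
    return list(register_words) + [vertical_checksum(register_words)]


def transfer_is_valid(transfer):
    """Check a load the way the text describes: the last word is the checksum."""
    *register_words, user_checksum = transfer
    return vertical_checksum(register_words) == (user_checksum & MASK32)


if __name__ == "__main__":
    load = append_checksum([0x12345678, 0x0F0F0F0F, 0xFFFF0000])
    print(hex(load[-1]), transfer_is_valid(load))
```

Because XOR is its own inverse, a single flipped bit anywhere in the load changes the accumulated value and makes the comparison fail.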
VUXFER (0001) This code is used to request VU
data, providing new VU data is required by the
82750DB. This command is issued only on vertically
active lines (as programmed in the register, not as
seen on the screen) and possibly the four lines after.
On each line, a row of V and/or U samples is loaded
into the VU interpolator line stores. The pattern of
requests depends upon the mode in which the VU
interpolator is operating. In the interlaced VU mode,
one line of samples for both the V and U components is fetched during each transfer; in the non-interlaced VU mode, only one line of samples for either the V or U components is fetched. Table 2-1
illustrates the pattern of requests. M is the programmed first vertical active line, and N the last active line. The modes listed have VU transfer requests following the end of horizontal active of the
lines specified, stopping with the last line, N + 4.
Transfer Requests
The following commands request the 82750PB to
transfer information from the VRAM array into the
VRAM shift register. When multiple requests are programmed for a given line, they are listed in the order of priority in which they are sent. When asserting a transfer request,
the programmer must be aware of two other programmed parameters, VBLEN and SCLK delay.
The VBLEN parameter is a user-programmed value
whose bits lie in the General Control register. It is
the length of time, in 82750DB T-cycles, that a particular VBUS code will be held at the outputs. It is
used to ensure that the asynchronously operating
82750PB chip will have enough time to recognize
and begin operating on an 82750DB transfer request.
Table 2-1. VU Transfer Request Patterns

Mode                  Active Line   Request VU Data
2x Non-Interlaced     M             Fetch 1st Line of V
                      M + 1         Fetch 1st Line of U
                      M + 4         Fetch 2nd Line of V
                      M + 5         Fetch 2nd Line of U
                      N + 4         Fetch Last Line of V
2x Interlaced         M             Fetch 1st Line of V and U
(Odd and Even         M + 4         Fetch 2nd Line of V and U
Fields)               M + 5         Fetch 3rd Line of V and U
                      N + 4         Fetch Last Line of V and U
4x Non-Interlaced     M             Fetch 1st Line of V
                      M + 1         Fetch 1st Line of U
                      M + 4         Fetch 2nd Line of V
                      M + 5         Fetch 2nd Line of U
                      M + 8         Fetch 3rd Line of V
                      N + 4         Fetch Last Line of V
4x Interlaced         M             Fetch 1st Line of V and U
(Odd and Even         M + 4         Fetch 2nd Line of V and U
Fields)               M + 6         Fetch 3rd Line of V and U
                      N + 4         Fetch Last Line of V and U
The 82750PB uses another internal pointer to cause
the VRAM to load the desired VU data into its shift
registers (incrementing the pointer by a pitch value).
This command is asserted for a programmable number of T-cycles (m), as specified in the Miscellaneous Control register. Then, the 82750DB fetches
them, tying up the 82750DB/VRAM interface for
(n + 2) cycles, where n is 1/4 the programmable total
number of 8-bit samples of V and U fetched. Note
that one extra word, which may overlap the next
VBUS command, is fetched.
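As a rough worked example of the interface occupancy, the sketch below assumes that n is one quarter of the total sample count (four 8-bit samples per 32-bit word) and that the cycle count is n + 2 as stated. The function name is hypothetical.

```python
def vu_fetch_interface_cycles(total_samples: int) -> int:
    """Cycles the 82750DB/VRAM interface is busy for one VU fetch.

    total_samples is the programmed number of 8-bit V and U samples
    fetched in the transfer; it is assumed to be a multiple of four.
    """
    if total_samples <= 0 or total_samples % 4:
        raise ValueError("sample count must be a positive multiple of 4")
    n = total_samples // 4  # assumption: one 32-bit word per four samples
    return n + 2


# 64 samples of V plus 64 samples of U fetched in one transfer
print(vu_fetch_interface_cycles(128))
```

With 128 samples the sketch gives 32 + 2 = 34 cycles.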
By setting a bit in the Miscellaneous Control register,
it is possible to replicate lines of V and U generated
by the interpolator for the entire field. Since each
line of VU data is displayed twice, the rate at which the
VU sample map has to be fetched from VRAM is
reduced by 1/2. Table 2-2 lists the sequence of VU
loads.
In some cases, the VU interpolator may cover only a
portion of the display. In those instances, M in the
above examples would be the first line on which VU interpolation is enabled, and N would be the last line on which VU
interpolation is enabled. Regardless of the state of
the Line Replicate bit, there would be no vertical
pipeline delay between the loading of the first line of
samples and the second line of samples. The first
line of samples would be loaded at M-1, and the
second line at M. This reduces the delay between
switching interpolation modes during a single display.
Table 2-2. VU Transfer Request Patterns with Line Replicate

Mode                  Active Line   Request
2x Non-Interlaced     M             Fetch 1st Line of V
                      M + 1         Fetch 1st Line of U
                      M + 4         Fetch 2nd Line of V
                      M + 5         Fetch 2nd Line of U
                      M + 8         Fetch 3rd Line of V
                      M + 9         Fetch 3rd Line of U
                      N + 4         Fetch Last Line of V
2x Interlaced         M             Fetch 1st Line of V and U
(Odd and Even         M + 4         Fetch 2nd Line of V and U
Fields)               M + 6         Fetch 3rd Line of V and U
                      N + 4         Fetch Last Line of V and U
4x Non-Interlaced     M             Fetch 1st Line of V
                      M + 1         Fetch 1st Line of U
                      M + 4         Fetch 2nd Line of V
                      M + 5         Fetch 2nd Line of U
                      M + 12        Fetch 3rd Line of V
                      M + 13        Fetch 3rd Line of U
                      N + 4         Fetch Last Line of V
4x Interlaced         M             Fetch 1st Line of V and U
(Odd and Even         M + 4         Fetch 2nd Line of V and U
Fields)               M + 8         Fetch 3rd Line of V and U
                      N + 4         Fetch Last Line of V and U
BMX (0000) This command requests a bitmap.
BMX (0000) is sent after horizontal active stops, beginning on the fifth line after vertical active starts,
and continuing until the fifth line after vertical active
stops. (There is a vertical pipeline delay of five lines
through the 82750DB, due to internal timing requirements.) A line programmed to start at line M will
have its first active line displayed at line M + 5. The
82750PB uses an internal pointer to cause the
VRAM shift registers to be loaded with pixel values.
The 82750DB subsequently fetches them as required for display. This command is asserted on the
VBUS for the user-programmed number of T-cycles
and must be completed before active display begins.
YBMNPX (0100) This command performs a Y bitmap transfer without performing a pitch calculation.
When the line replicate mode is selected by Bit 22 in
the Miscellaneous Control register, this code is asserted every other display line so that the same line
of information can be used twice.
Digitizer Commands
When in the line replicate mode, and digitizing an
NTSC source (for example, when genlocking an
NTSC source to a system that uses only a VGA
monitor), each line of captured data is effectively
output at twice the rate. Since each line need only
be stored once in memory (it is duplicated automatically in the display mode), only one WRDIGI code,
followed by a WRDIGINP, is sent every other line.
On alternate lines, two WRDIGINP codes are sent and will
select the last address that was written, without incrementing the 82750PB bitmap address pointer.
This is described in detail in Chapter 3.
WRDIGI (0011) This command requests a write of
digitized data. The operation of this command is dependent upon the external hardware and is discussed in the section on genlocking (page 29). If
digitizing is enabled, this command is asserted on
the VBUS for a programmable number of T-cycles.
The pointer is then incremented by a pitch value.
Since each horizontal line is stored in a single row of
memory, this pitch value is equal to the horizontal
resolution, in bytes, for non-interlaced bitmaps. For
interlaced bitmaps, the pitch value is equal to twice
the horizontal resolution, in bytes. This allows alternate lines of data to be skipped over in successive
fields.
WRDIGINP (0111) This command allows access
to digitized data without performing a pitch calculation. WRDIGINP (0111) requests that the 82750PB
perform a transfer request at the last calculated address. Note that only a memory transfer cycle is performed; the pitch value is not added to this address. This will always ensure that the digitized data
is written into the last selected memory address, in
case a physical memory boundary has been
crossed. This command is asserted after the WRDIGI transfer has completed.
Refresh and Control Commands
The following signals are used to pass refresh requests and control information to the 82750PB.
DFL (1000) The Display Format Load command is
a maskable host processor interrupt that can be programmed to occur at any time during the display.
This is used by the 82750PB to transfer the shadow
register contents into the working register set in the
VRAM interface. This is useful in supporting split-screen-type applications, where it is desirable to
change the bitmap pointers at some point before the
end of the display.
82750DBSD (1001) This command is the
82750DB Shut Down code. During every register
transfer, the 82750DB keeps an internal vertical exclusive-OR checksum of the register data as it is read
onto the chip. The last word of data that is read
during the register transfer is the user-generated
checksum. If the two checksums match, operation
proceeds as normal. If they do not match, the
82750DB enters the reset state and sends this code
to the 82750PB. The 82750DB will remain reset until
the reset pin is asserted and negated by the host
processor.
REFRESH (1010) This command asks the
82750PB to generate up to 15 refresh cycles every
horizontal line. The 82750DB transfer cycles have a
higher priority than refresh requests in the 82750PB.
REFRESH will not be asserted if programmed to occur at the same time as a transfer request code.
Video Synchronization Information
The following codes are used to pass the video line
and field information from the 82750DB to the pixel
processor.
VEVEN (1101) This code indicates the start of an
even (i.e., second) field of a frame. This command is
sent coincident with line one of each even field.
When genlocking to an external source (see pg. 29),
the occurrence of a vreset signal during programmed
horizontal active time will cause the 82750DB to output a VEVEN code on the VBUS.
VODD (1100) This code indicates the start of an
odd (i.e., first or only) field of a frame. This command
is always sent immediately after RESETB# is negated, and coincident with line one of the odd field.
Similarly, when genlocking, the occurrence of a
vreset signal during any time other than horizontal
active time will cause the 82750DB to output a
VODD code on the VBUS.
HLIN (1110) This code marks every horizontal line
at a programmable point in the line. HLIN is used by
the 82750PB to increment its horizontal line counter.
When interpolation is turned on by the programmer
(by specifying a non-zero number of samples to be
fetched), VU interpolation may nevertheless be disabled for each pixel if the following conditions are
met:
1. Conditional interpolation has been selected by
the programmer,
AND
Either of the two user-programmed conditions:
a. Switching on the LSB of U has been
selected, and the lowest-order bit of the U value fetched for the upper left pixel in the block
has the value zero. This allows switching to occur
on a 2 x 2-pixel or 4 x 4-pixel grid, depending
on the expansion mode the user has selected.
The full 8 bits of Y and V are used, but the
usable space of U has been decreased to 7
bits.
b. Switching on the LSB of Y has been
selected, and the low-order bit of the Y value
for the current pixel has a value of zero.
2. Display of fetched and interpolated VU values
may also be suppressed by setting the Interpolation Output Enable bit (in the Miscellaneous Control register) to zero. This will allow VU data to be
loaded into the VU line stores without displaying
VU data. This is useful when a mid-screen transition is made between two interpolation modes,
to compensate for the vertical latency of the interpolation process.
VU Interpolation
When VU interpolation is enabled by the programmer, and when the display is in the active region,
"VU data" will be fetched, as required by the interpolator (by the mechanisms discussed previously in
the section titled "VBUS Code Description"). This
data has the format V, V, ... , V, U, U, ... , U where
each V or U is 8 bits, and the bytes are grouped into
32-bit double-words with the earliest in the lowest order.
The number, "N", of V bytes and U bytes is the
same; N is programmed to be either 256 samples, or
one of 32 to 192 samples in 32-byte increments.
The first V data and the first U data fetched on the
first line of VU interpolation supply the VU value for
the first active pixel on that line. All the other VU
pairs that are fetched define values for the grid of
pixels defined below and to the right of this one by
the VU expansion factor (every other or every fourth
pixel, horizontally and vertically). Most other VU values are
filled in recursively by interpolation. Wherever there
is a pixel which lies between two pixels with known
values, it is given the value of the weighted average
of the known values. Values are understood to be
non-negative integers. When the final value is output, any fractions are truncated or rounded to the
closest odd integer according to the programmed
value of the interpolation round flag. This process is
iterated until all pixels have assigned color values. If
the number of VU data samples loaded into the
82750DB is not enough to cover the active display
area, then the last data sample will be replicated
horizontally across the active display window.
Pixel Processing Path
This logic accepts the 32-bit word from the input
latch and divides the word into the programmed pixel format. This will result in either four 8-bit pixels,
two 16-bit pixels, one 32-bit pixel, or an 8-bit pixel
with an 8-bit alpha value (pseudo 16-bit mode). The
pixels act as addresses to the color table, or may
bypass the table completely as described below.
Pixel information may be mixed with the output of
the VU interpolator, which outputs interpolated samples derived from a reduced sample bitmap. The
least significant bit of Y or LSB of U can be programmed to act as a switch between using the explicit pixel value of YUV or using the luminance portion of the pixel with the VU portion obtained from
the interpolator. If the value of the LSB of Y (or U,
whichever is selected) is zero, the pixel data is used.
If the LSB of Y (or U) is one, the output of the VU
interpolator is used. Note that if the LSB of Y is used
as the switch flag, the luminance portion of the word
will be only 7 bits wide.
As mentioned previously in the VBUS Control discussion, each line of VU data can be used twice by
setting the Line Replicate bit in the Miscellaneous
Control register. Also, each horizontal VU sample
can be replicated by setting the VU Replicate bit in
the Pixel Control register. This will cause the V and
U pixels generated by the VU interpolator every pixel
time to be used twice. This can result in an effective
8X horizontal expansion, which is useful when horizontal blanking time is at a premium. This bit affects
the horizontal interpolation algorithm only, and will
not affect the line loading sequence for VU during
the active display.
The alpha information is also processed in this
block. The alpha data may come from one of two
sources: it may be explicitly coded in the pixel word,
as is the case in the 32-bit/pixel and pseudo 16-bit/pixel
modes, or it may be obtained by comparing the
Y portion of the pixel with a preprogrammed value
and outputting one preprogrammed value if they
match and a different value if they do not match.
This latter capability is known as Alpha Trap.
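The pixel formats, the LSB switch, and the Alpha Trap described under Pixel Processing Path can be illustrated with a minimal Python sketch. Several details here are assumptions rather than documented behavior: the earliest pixel is taken to occupy the lowest-order bits of the 32-bit word (by analogy with the VU data format), and all function and parameter names are hypothetical.

```python
from typing import List, Tuple, Union

Pixel = Union[int, Tuple[int, int]]


def split_pixels(word: int, mode: str) -> List[Pixel]:
    """Divide a 32-bit word into pixels for the programmed format."""
    word &= 0xFFFFFFFF
    if mode == "8":          # four 8-bit pixels
        return [(word >> s) & 0xFF for s in (0, 8, 16, 24)]
    if mode == "16":         # two 16-bit pixels
        return [(word >> s) & 0xFFFF for s in (0, 16)]
    if mode == "32":         # one 32-bit pixel
        return [word]
    if mode == "pseudo16":   # two (8-bit pixel, 8-bit alpha) pairs
        halves = [(word >> s) & 0xFFFF for s in (0, 16)]
        return [(h & 0xFF, h >> 8) for h in halves]
    raise ValueError(f"unknown pixel mode: {mode}")


def uses_interpolated_vu(switch_component: int) -> bool:
    """LSB of Y (or U) = 1 selects the VU interpolator output,
    LSB = 0 selects the explicit pixel data."""
    return bool(switch_component & 1)


def alpha_trap(y: int, trap_value: int, alpha_match: int, alpha_other: int) -> int:
    """Alpha Trap: one preprogrammed alpha when Y matches, another otherwise."""
    return alpha_match if y == trap_value else alpha_other
```

For example, `split_pixels(0x44332211, "8")` yields the bytes 0x11, 0x22, 0x33, 0x44 in that order under the ordering assumption above.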
Colormap Lookup Table (CLUT) Operation
The 82750DB contains three 256 x 8-bit color lookup tables. The color maps can be accessed separately, or may act as one large 256 x 24-bit table.
The manner in which the tables are addressed is
determined by the programmed bits/pixel and depends on whether the pixel is a graphics or video
pixel. Also, each Y, U, and V color table address can
be masked. The masks can be used in all the bit/pixel
modes, but are most useful with the 16-bit/pixel mode. In this mode, the mask allows the YUV
values to be mapped to 8-bit values instead of 6-5-5.
Each channel (Y, U, V) has a MASK SET register
and a MASK DATA register that select the color
lookup address bit to be changed and the new value
of the bit, respectively. A simple mask operation on
one channel is illustrated in Figure 2-8.
The color table can be bypassed completely when
displaying either graphics or video, independent of
the programmed bits/pixel. This is programmed by
the user via the VIDEO PASS and GRAPHICS PASS
bits in the Miscellaneous Control register. Table 2-3
summarizes the various modes when using the
CLUT.
For modes that require both video and graphics to
pass through the color table, the table can be split
into two halves: one half for graphics and the other
for video pixels. By using the SPLITCLUT bit in the
Miscellaneous Control register in conjunction with
the LSB of Y or U, the color table address is forced
to either the video table or graphics table automatically. In this case, the masking operation is still used,
but the address is forced to either an even or odd
entry, regardless of the results of the masking operation. The flag bit that decides between the two
types of pixels automatically selects the correct portion of the CLUT for a single channel. Note that the
LSB of Y or U selects the proper half of the CLUT for
that single component. The SPLITCLUT mode assures that the proper half of the CLUT is used for all
three components.
The CLUT address mask operation is determined by
a logical equation given by:
Result = (MASK SET AND MASK DATA)
OR (NOT MASK SET AND Data Byte)
Each bit of the Result byte is determined individually
by this equation. The Result byte is then further processed in order to produce the CLUT RAM address.
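The mask equation and the SPLITCLUT forcing can be sketched as follows. This is a hedged illustration, not the device's documented logic: the complement on MASK SET in the second term is inferred from the statement that MASK SET selects the address bits to be changed and MASK DATA supplies their new values, and the mapping of graphics to even entries and video to odd entries follows the CLUT mode table. The function name and parameters are hypothetical.

```python
def clut_address(data_byte: int, mask_set: int, mask_data: int,
                 split_clut: bool = False, is_video: bool = False) -> int:
    """Apply the per-channel address mask, then the optional split-CLUT force."""
    # Bits selected by MASK SET take their value from MASK DATA;
    # all other bits pass through from the data byte (assumed reading).
    result = ((mask_set & mask_data) | (~mask_set & data_byte)) & 0xFF
    if split_clut:
        # Video pixels use odd entries, graphics pixels use even entries.
        result = (result | 0x01) if is_video else (result & 0xFE)
    return result


# Fill the three padded low bits of a 5-bit chroma value with 0b101.
print(hex(clut_address(0xA8, mask_set=0x07, mask_data=0x05)))
```

In the 16-bit/pixel mode this is how the padded low-order bits of a 6-5-5 component could be steered to a chosen part of the table.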
Figure 2-8. Mask Operation on CLUT Address (bit-by-bit rows: MASK SET Register (0x41), MASK DATA Register (0x42), Data Byte, Result)
Table 2-3. CLUT Modes

Graphics Pass   Video Pass   LSB Y or U   SPLITCLUT   Colormap Address
0               X            0            0           Masked Graphics Data
1               X            0            X           Graphics Pixels Bypass CLUT
X               0            1            0           Masked Video Data
X               1            1            X           Video Pixels Bypass CLUT
0               X            0            1           Even Address Only (Graphics)
X               0            1            1           Odd Address Only (Video)
1               1            X            X           CLUT Not Used at All
When writing to the CLUT, the most significant byte
of the data word corresponds to the address, and
the least significant 24 bits are the YUV data (least
significant to most significant, respectively). An index register is used to allow the 6-bit address to be
mapped to an 8-bit number. (Refer to Chapter 4 for
more information.) By resetting the 82750DA Disable bit, it is possible to make the CLUT look like the
reduced entry color lookup table on the 82750DA.
The following paragraphs summarize the possible
bit/pixel modes, using the LSB of Y or U switching
ability and the various graphics and video bypass
modes. Note that there are modes where the LSB of
Y or U is not used to switch between graphics and
video.
As illustrated above, the CLUT can be bypassed by
asserting either or both of the bypass controls.
8-BIT/PIXEL GRAPHICS MODE
This is the graphics-only mode, in which the 8 bits
are used as inputs to all three color tables. This
makes the color maps look like a single, 256 x 24-bit
CLUT and allows 256 unique colors from a palette of
16 million to be available at any given time. If the
Graphics Pass bit is asserted, the CLUT will be bypassed and the 8-bit values of the Y, U, and V channels will be input to each channel of the converter
matrix.
8-BIT/PIXEL VIDEO MODE
When used with subsampled VU information from
the interpolator, the 8 bits are actually a luminance
value. The Y portion addresses the Y color table, V
the V color table, and U the U color table. By using
the color table, a one-to-one mapping exists, allowing non-linear transformations to be applied to the
pixel data to enhance the quality of the reconstructed image. By asserting the VIDEO PASS bit in the
Miscellaneous Control register, the color table can
be bypassed.
8-BIT/PIXEL MIXED MODE
In the 8-bit/pixel mixed mode the LSB of Y or U is
used as a switch flag to change the index to the
color tables. When the switch flag is set to a one,
the Y value corresponds to a luminance value, and
the VU values are the chrominance information obtained from the VU interpolator. In this case each
video component is used as an address to its corresponding CLUT as described above. When the
switch flag is set to a zero, the VU values are not
used and the Y value is used as the address to all
color tables. These pixels are treated the same as in
the 8-bit/pixel graphics mode.
In this mode the applications programmer must ensure
that the proper information has been loaded
into specific areas of the color maps. For example,
all the video pixels will use the odd address values.
By restricting the address used in the graphics and
video mode, two unique maps may coexist in the
tables. One map is used for non-linear transformations on video data, and the other for graphics color
lookup table applications.
PSEUDO 16-BIT/PIXEL GRAPHICS MODE
In the pseudo 16-bit/pixel graphics mode each
32-bit data word is made up of two 16-bit pixel
words. The 82750DB processes each 16-bit pixel
word, so that the least significant 8 bits correspond
to pixel information, and the most significant 8 bits
are used as alpha information. The 82750DB uses
the lower 8 bits as inputs to all three color tables.
This makes the color maps look like a single, 256 x
24-bit color table. If the Graphics Pass bit is asserted, the CLUT will be bypassed and the 8-bit values
of the Y, U, and V channels will be input to each
channel of the converter matrix.
PSEUDO 16-BIT/PIXEL VIDEO MODE
When used with subsampled VU information, the
least significant 8 bits of the pixel word are actually a
luminance value. The most significant 8 bits are
used as alpha information. The VU information is
generated by the 82750DB interpolator. Each of the
color maps uses the corresponding 8-bit video component as an address. By asserting the Video Pass
bit in the Miscellaneous Control register, the color
table can be bypassed.
PSEUDO 16-BIT/PIXEL MIXED MODE
In this mode the LSB of Y or U is used as a switch flag
to change the index to the color tables. When the
LSB of Y or U is set to a one, the lower 8-bit value
corresponds to a luminance value, and the V and U
values are the chrominance information. In this
case, each video component of the 82750DB is
used as a colormap address as described above.
When the LSB of Y or U is set to zero, the V and U
values from the interpolator are not used, and the Y
value is used as the address to all color tables.
16-BIT/PIXEL GRAPHICS MODE
The 16-bit pixel word is broken up on the 82750DB
to yield 6 bits of Y, and 5 bits each of V and U. The Y
bits are the least significant, and the U bits are the
most significant. These values are then padded with
zeros in the lower order bits, to obtain an 8-bit word
for each pixel component. Each component addresses its respective CLUT. However, the Y channel may access only 64 unique locations, and 5-bit
resolution for VU restricts them to 32 unique locations each. The address range may be extended by
using the colormap mask registers to add 2 bits of
precision in the least significant bits for Y and 3 least
significant bits each for the VU channels. This allows the
programmer to access all the entries in the color
table by reprogramming the MASK DATA and MASK
SET registers during the blanking interval.
16-BIT/PIXEL VIDEO MODE
This mode works like the 8-bit/pixel video mode described above, except that the 82750DB has processed the information so that the Y channel contains the least significant 8 bits of the 16-bit data
word. The V and U information is generated by the
VU interpolator. If the SPLITCLUT mode is selected,
the LSB of the address is forced to an odd entry in
the three color tables.
16-BIT/PIXEL MIXED MODE
When the switch flag is zero, the graphics mode is
selected and the inputs to the CLUT are the respective YUV data in the 6-5-5 format. These pixel values
are extended by using the colormap masking registers. When the switch flag indicates the video mode,
the lower 8 bits of the 16-bit pixel word and the VU
values obtained from the interpolator are input to their
respective CLUTs. If the SPLITCLUT mode is selected, the LSB of the address is forced to either an odd
or even entry in the three color tables, depending on
whether the data is video or graphics information.
32-BIT/PIXEL GRAPHICS MODE
Eight bits each of Y, U, and V are used as addresses
to each segment of the color table. Since the size of
the addressable color space is not increased, the
advantage of using the color map is for special effects or gamma correction. The most significant 8
bits of the 32-bit data word are used for the alpha
channel data. If the Graphics Pass bit is asserted,
the CLUT will be bypassed and the 8-bit values of
the Y, V, and U will be input to each channel of the
converter matrix.
32-BIT/PIXEL VIDEO MODE
The Y channel contains the least significant 8 bits of
the 32-bit data word. The U and V information is
generated by the VU interpolator. The YUV channels
are input to their respective color tables. The size of
the addressable color space is not increased, but
this can be used to take advantage of a non-linear
transformation, which may aid in the decompression
process. The most significant 8 bits of the data word
are used for the alpha channel data.
32-BIT/PIXEL MIXED MODE
When the switch flag is zero, the graphics mode is
selected, and the inputs to the CLUT are the respective 8 bits each of YUV data. These pixel values may
be masked by using the colormap mask data and
mask set registers. When the switch flag indicates
the video mode, the lower 8 bits of the pixel word
and the VU values obtained from the interpolator are
input to their respective CLUTs. If the SPLITCLUT
mode is selected, the LSB of the address is set to
either an odd or even entry in the three color tables,
depending on whether the data is video or graphics
information. The most significant 8 bits of the data
word are used for the alpha channel data.
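The 6-5-5 split used by the 16-bit/pixel graphics mode can be illustrated with a short Python sketch. It follows the stated layout (Y in the least significant 6 bits, U in the most significant 5 bits, zero padding in the low-order bits); placing V in the middle 5 bits is the natural reading but is an assumption, and the function name is hypothetical.

```python
from typing import Tuple


def unpack_655(pixel: int) -> Tuple[int, int, int]:
    """Return the 8-bit (Y, U, V) CLUT addresses for a 16-bit 6-5-5 pixel."""
    pixel &= 0xFFFF
    y6 = pixel & 0x3F           # bits 5..0: luminance
    v5 = (pixel >> 6) & 0x1F    # bits 10..6: V (assumed position)
    u5 = (pixel >> 11) & 0x1F   # bits 15..11: U
    # Pad with zeros in the low-order bits to form 8-bit values.
    return y6 << 2, u5 << 3, v5 << 3


y, u, v = unpack_655((0b10101 << 11) | (0b00011 << 6) | 0b111111)
print(hex(y), hex(u), hex(v))
```

Because the padding bits are always zero, Y can reach only 64 distinct table entries and U and V only 32 each, which is why the mask registers are needed to reach the remaining entries.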
Y Interpolator
The Y Interpolator performs a 2X horizontal linear
interpolation on each line of Y values. When Y interpolation is enabled, the internal pixel clock is twice
the frequency of the PIXCLK output.
NOTE:
If Y interpolation is enabled, then only the integer
values of pixel times greater than 1X may be
used.
The interpolation may be separately controlled for
both video and graphics pixels, via the Viden and
Gren bits (bits 12 and 11) of the General Control
register. A video pixel is defined as one generated
using VU interpolated values. A graphics pixel does
not use the VU interpolator. The effects of setting
the control bits, the 82750DB enable flag, and the video/graphics pixel switch (V/G Switch) on the output
of the Interpolator are summarized in Table 2-4.

Table 2-4. Control Bit Settings and Resulting Interpolator Output

82750DB Enable   Viden   Gren   V/G Switch   Result
0                X       X      X            Interpolator Bypassed
1                0       0      X            Interpolator Bypassed
1                0       1      0            Interpolate Graphics Pixel
1                0       1      1            Do Not Interpolate Video Pixel
1                1       0      1            Interpolate Video Pixel
1                1       0      0            Do Not Interpolate Graphics Pixel
1                1       1      X            Interpolate Both Video and Graphics Pixels

Because of the asymmetric nature of the internal
pixel clock used on the 82750DB, the number of T-cycles between successive Y pixels varies depending
on the programmed pixel width. When enabled,
there is a pipeline delay through the Y Interpolator
equal to the number of T-cycles between each internal pixel clock.
When the interpolator is bypassed as described
above, there is a fixed delay through this block. The
V and U data are delayed by one pixel clock to allow
the chroma data to line up with the luminance data.
Other control signals, such as the register address
byte (most significant byte of the 32-bit data word
read from VRAM), the pixel clock, horizontal and
vertical active displays, composite blanking, and register load enable signals are also delayed by one
pixel clock in order to line up with the YUV data. The
programmer must ensure that the active display timing is programmed to take the appropriate delay
through the Y Interpolator into account.
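The Y Interpolator behavior can be sketched in Python as a rough model. The 2X expansion below inserts the average of each pair of neighboring samples; truncating the average and repeating the last sample at the end of the line are assumptions, since this section does not specify them. The enable decision follows the control-bit table for the Y Interpolator (V/G Switch = 1 taken to mean a video pixel). All names are hypothetical.

```python
from typing import List


def y_interpolate_2x(line: List[int]) -> List[int]:
    """2X horizontal linear interpolation of one line of Y values."""
    out: List[int] = []
    for i, y in enumerate(line):
        nxt = line[i + 1] if i + 1 < len(line) else y  # assumed edge rule
        out.append(y)
        out.append((y + nxt) // 2)  # assumed: fractions truncated
    return out


def y_pixel_is_interpolated(db_enable: bool, viden: bool, gren: bool,
                            video_pixel: bool) -> bool:
    """Decide whether a pixel is interpolated, per the control-bit table."""
    if not db_enable:
        return False  # interpolator bypassed
    return viden if video_pixel else gren


print(y_interpolate_2x([10, 20, 30]))
```

With the assumptions above, the sample line [10, 20, 30] expands to [10, 15, 20, 25, 30, 30].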
Cursor
Hardware support for a 16 x 16-pixel cursor has
been included on the 82750DB. The cursor is capable of providing sharp color transitions when using
subsampled VU bitmaps. Software intervention is
minimized, leaving the host with more processing cycles to perform other operations.
Under normal operation, the XY starting display position of the cursor is loaded into the Cursor Control
register during an 82750DB register load. On the display line corresponding to the Y start position, the
cursor is displayed when the X starting position
(specified in T-cycles) is reached. On the following
15 lines, the cursor will be displayed at this X position every line, for both interlaced and non-interlaced displays.
Each 2-bit cursor pixel will select one of the three
Cursor Color registers or transparency. The 24-bit
output of one of the three color registers (or the actual display pixel data if transparency is used) is input to the YUV converter.
A normal 82750DB register transfer is used to load
the entire 16 x 16 x 2 bits (16 words of 32 bits each)
of cursor data. During this register transfer, the cursor
data is distinguished from normal register data
by placing the Cursor Control register immediately
before the 16 words of cursor data. When the
82750DB loads the Cursor Control register, it will interpret the next sixteen 32-bit words of register data
as the cursor bitmap, and will disable the other registers on the 82750DB from decoding the address
field of the 32-bit data word. (The checksum of the
82750DB register data is not performed during the
loading of the cursor bitmap data.) The cursor bitmap will be loaded a line at a time, starting at line
zero and continuing in sequential order to line 15.
Each line in the cursor map actually contains sixteen
2-bit cursor pixels, with the two least significant bits
corresponding to the first cursor pixel in that line,
and the two most significant bits corresponding to
the 16th cursor pixel on that line. Each 2-bit pixel
may select one of the three Cursor Color registers or
transparency, according to the format indicated in
Table 2-5.
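The cursor bitmap layout can be made concrete with a small Python sketch. It decodes one 32-bit cursor line into sixteen 2-bit pixels, first pixel in the two least significant bits, and maps each code to the output listed for the cursor color registers. The function and table names are hypothetical.

```python
from typing import List

# Output selected by each 2-bit cursor pixel code.
CURSOR_OUTPUT = {
    0b00: "transparent",
    0b01: "Cursor Color Register 1",
    0b10: "Cursor Color Register 2",
    0b11: "Cursor Color Register 3",
}


def decode_cursor_line(word: int) -> List[int]:
    """Split one 32-bit cursor line into sixteen 2-bit pixel codes."""
    word &= 0xFFFFFFFF
    return [(word >> (2 * i)) & 0b11 for i in range(16)]


def decode_cursor_bitmap(words: List[int]) -> List[List[int]]:
    """Decode the 16 words that follow the Cursor Control register."""
    if len(words) != 16:
        raise ValueError("cursor bitmap is exactly 16 words")
    return [decode_cursor_line(w) for w in words]


line = decode_cursor_line(0b11100100)
print([CURSOR_OUTPUT[p] for p in line[:4]])
```

A host program building a cursor would do the inverse: pack sixteen 2-bit codes per line, lowest pixel first, and emit the 16 words immediately after the Cursor Control register word.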
The cursor bitmap length is 16 lines, and the width is
16 pixels. Although the length of the cursor may be
changed dynamically by chaining register loads to
update the cursor map, the size of the cursor is dependent on the type of display. For interlaced displays, each line of cursor data will appear on the
same line of each field. This results in a cursor of
16 x 32 pixels. For non-interlaced displays, the same
line of cursor information will appear on the same
line every field. The cursor in this case will be 16 x
16 pixels. The size of the cursor may be doubled
independently in the horizontal and/or vertical direction by setting the 2X Horizontal Cursor or 2X Vertical Cursor bit in the General Control register. In this
case, no new data is loaded into the cursor map; the
data is just replicated in the corresponding dimension. Table 2-6 summarizes some of the possible
cursor sizes. Note that by loading the cursor bitmap
with different data at the start of every field, cursor
sizes not listed below may be achieved.
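The listed cursor sizes follow a simple rule, sketched here in Python with a hypothetical function name: the width doubles with the 2X Horizontal Cursor bit, and the height doubles once for an interlaced display and again with the 2X Vertical Cursor bit.

```python
from typing import Tuple


def cursor_size(h2x: bool, v2x: bool, interlaced: bool) -> Tuple[int, int]:
    """Return (width, height) of the displayed cursor in pixels."""
    width = 16 * (2 if h2x else 1)
    height = 16 * (2 if interlaced else 1) * (2 if v2x else 1)
    return width, height


print(cursor_size(h2x=True, v2x=False, interlaced=True))
```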
Table 2-5. Cursor Color Registers

Cursor Pixel   Output
00             Transparency (Cursor Pixel Not Displayed)
01             Cursor Color Register 1
10             Cursor Color Register 2
11             Cursor Color Register 3

Table 2-6. Cursor Sizes

2X Horz. Cursor   2X Vert. Cursor   Display          Cursor Size (in Pixels)
Off               Off               Interlaced       16 x 32
On                Off               Interlaced       32 x 32
Off               On                Interlaced       16 x 64
On                On                Interlaced       32 x 64
Off               Off               Non-Interlaced   16 x 16
On                Off               Non-Interlaced   32 x 16
Off               On                Non-Interlaced   16 x 32
On                On                Non-Interlaced   32 x 32

Three 24-bit color registers that hold the color information for the cursor may be written to at any time
during the register load. The cursor may be loaded
any time during the blanking intervals of the display.
For displays that do not program the cursor during
the display, the cursor bitmap may be loaded during
the vertical blanking interval.
There is a complex relationship between the cursor
and the pixel data, especially when using non-integral divisors of the pixel clocks. Since the pixel data
output from the 82750DB pixel path always changes
coincident with the rising edge of the clock, the cursor start position must be positioned on the rising
edge of any period of the pixel clock. The programmer must enforce the corresponding restrictions on
the start and stop position of the cursor.
When the T-cycle count equals the value programmed into the X start position of the Cursor Control register, the first cursor pixel can be displayed.
YUV to RGB Converter
The following equations give the theoretical relationship between analog RGB components, R, G, B, and
analog YUV components, Y, U, V.
\[ Y = 0.298822\,R + 0.586816\,G + 0.114363\,B \tag{1a} \]
\[ V = R - Y = 0.701178\,R - 0.586816\,G - 0.114363\,B \tag{1b} \]
\[ U = B - Y = -0.298822\,R - 0.586816\,G + 0.885637\,B \tag{1c} \]
where: 0.0 < G, R, B < 1.0
0.0 < Y < 1.0
-0.701 < V < +0.701
-0.886 < U < +0.886
Solving for G, R, B, we can obtain the inverse relationship:
\[ G = Y - 0.509228\,V - 0.194888\,U \tag{2a} \]
\[ R = Y + V \tag{2b} \]
\[ B = Y + U \tag{2c} \]
where: 0.0 < G, R, B < 1.0
0.0 < Y < 1.0
-0.701 < V < +0.701
-0.886 < U < +0.886
The luminance channel for the YUV inputs is presumed to swing between 0.0V and 1.0V. However,
the chroma components do not and need to be normalized to a 0V to 1V range. The offset binary encoding used to obtain unsigned numbers must also
be accounted for. This encoding should center the V
and U inputs at the midpoint of the voltage range.
The equations for the normalized version of Y, V,
and U (Y', V', and U' respectively) are:
\[ Y' = Y \tag{3a} \]
\[ V' = \frac{0.5\,V}{0.701} + 0.5 \tag{3b} \]
\[ U' = \frac{0.5\,U}{0.886} + 0.5 \tag{3c} \]
where: 0.0 < Y < 1.0
-0.886 < U < 0.886
-0.701 < V < 0.701
When converting the normalized analog values Y',
V', U' to digital y, v, u values, the D.C. offset and
conversion ranges are compatible with the CCIR
601 standard for digital video. The ranges for the
components and the corresponding Digital to Analog equivalent equations are given below:
\[ y = (235 - 16)\,Y' + 16 \tag{4a} \]
where: 16 < y < 235
\[ v = (240 - 16)\,V' + 16 \tag{4b} \]
where: 16 < v < 240
\[ u = (240 - 16)\,U' + 16 \tag{4c} \]
where: 16 < u < 240
where: 0.0 < Y', V', U' < 1.0
Substituting the normalized analog voltages of
Equation 3 into Equation 4, we obtain the digital version of the input data, used in the DVI(TM) Technology
system:
\[ y = 219\,Y + 16 \tag{5a} \]
\[ v = \frac{112\,V}{0.701} + 128 \tag{5b} \]
\[ u = \frac{112\,U}{0.886} + 128 \tag{5c} \]
16 < y < 235
16 < v, u < 240
By solving Equations 5 for Y, U, V, and substituting
into Equation 2, we get the relationship between analog R, G, B and the digital DVI y, u, v data:
\[ G = 0.004566\,y - 0.003187\,v - 0.001541\,u + 0.532242 \tag{6a} \]
\[ R = 0.004566\,y + 0.006259\,v - 0.874202 \tag{6b} \]
\[ B = 0.004566\,y + 0.007911\,u - 1.085631 \tag{6c} \]
where: 0.0 < R, G, B < 1.0
16 < y < 235
16 < v, u < 240
0.0 . 255 or
< 0 due to excursions in the inputs) are clipped to
255 or O.
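The chain from analog RGB through Equation 1, Equation 5, and back through Equation 6 can be checked numerically. The following Python sketch is illustrative only: the function names are hypothetical, and rounding the digital values to integers and clamping them to the stated ranges are assumptions of this sketch, not behavior documented for the 82750DB.

```python
def clamp(value, low, high):
    """Limit value to the closed range [low, high]."""
    return max(low, min(high, value))


def rgb_to_yuv_analog(r, g, b):
    """Equation 1: analog R, G, B in [0, 1] to analog (Y, V, U)."""
    y = 0.298822 * r + 0.586816 * g + 0.114363 * b
    return y, r - y, b - y


def yuv_analog_to_digital(y, v, u):
    """Equation 5: analog Y, V, U to digital (y, v, u).

    Rounding and range clamping are assumptions made for this sketch.
    """
    y_d = 219.0 * y + 16.0
    v_d = 112.0 * v / 0.701 + 128.0
    u_d = 112.0 * u / 0.886 + 128.0
    return (clamp(round(y_d), 16, 235),
            clamp(round(v_d), 16, 240),
            clamp(round(u_d), 16, 240))


def digital_yuv_to_rgb(y_d, v_d, u_d):
    """Equation 6: digital y, v, u to analog (R, G, B), clipped to [0, 1]."""
    g = 0.004566 * y_d - 0.003187 * v_d - 0.001541 * u_d + 0.532242
    r = 0.004566 * y_d + 0.006259 * v_d - 0.874202
    b = 0.004566 * y_d + 0.007911 * u_d - 1.085631
    return tuple(clamp(c, 0.0, 1.0) for c in (r, g, b))


if __name__ == "__main__":
    for rgb in [(1.0, 1.0, 1.0), (0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]:
        digital = yuv_analog_to_digital(*rgb_to_yuv_analog(*rgb))
        print(rgb, "->", digital, "->", digital_yuv_to_rgb(*digital))
```

Because the digital values are quantized, the round trip recovers the original RGB components only approximately.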
Scaling Equation 6 to digital g, r, b output values gives:

g = y - 0.698001 v - 0.337633 u + 116.56116    (7a)
r = y + 1.370705 v - 191.45029                 (7b)
b = y + 1.732446 u - 237.75314                 (7c)

where: 16 < y < 235
16 < v, u < 240
0 < g, r, b < 255

By substitution of Equation 5 into Equation 1, and by converting G, R, and B to digital values, we can obtain the inverse relationship of Equation 7:

y = +0.298822 r + 0.586816 g + 0.114363 b + 16     (8a)
u = -0.172486 r - 0.338721 g + 0.511206 b + 128    (8b)
v = +0.511545 r - 0.428112 g - 0.083434 b + 128    (8c)

where: 16 < y < 235
16 < v, u < 240
0 < g, r, b < 255

Output Equalization
The units on the 82750DB process the pixel information at the operating frequency of the chip. If the output pixel rate is not equal to the maximum frequency, the units have null states during which processing is suspended. This type of operation is necessary on the 82750DB because of the large amount of pipelining. Table 2-7 gives the pattern of T-cycles on the 82750DB during which processing is active, according to the programming shown in Table 4-2.

The pixel information must be output at a rate that is some sub-multiple of the operating frequency. The divisor is programmed by the user, and the pixel period may be from 1 to 12 times the period of FREQIN, in increments of 1/2. Divisors of 13 and 14 are also programmable. Because non-integral divisors are used, it is necessary for the 82750DB to output different information on both phases of FREQIN. This is illustrated in Figure 2-9, which uses a 2.5 divisor for the clock. Notice that the pixel clock output (PIXCLK) transitions fall alternately on the active and inactive phase of the input frequency, while the internal pixel clock transitions always occur on the active phase. Also note that PIXCLK does not have a 50% duty cycle.

The equalizing logic derives a clock that has a period equal to the programmed pixel rate, providing an edge to sample the output information. This allows the Digital to Analog Converter to directly sample the output of the pixel data path before performing the analog conversion.

Table 2-7. 82750DB Active T-Cycle Patterns

Pixel Time      Pattern Of Internal
(T-Cycles)      Pixel Clock
1               Always On
1.5             1 On/1 On/1 Off
2               1 On/1 Off
2.5             1 On/1 Off/1 On/2 Off
3               1 On/2 Off
3.5             1 On/2 Off/1 On/3 Off
4               1 On/3 Off
4.5             1 On/3 Off/1 On/4 Off
5               1 On/4 Off
5.5             1 On/4 Off/1 On/5 Off
6               1 On/5 Off
6.5             1 On/5 Off/1 On/6 Off
7               1 On/6 Off
7.5             1 On/6 Off/1 On/7 Off
8               1 On/7 Off
8.5             1 On/7 Off/1 On/8 Off
9               1 On/8 Off
9.5             1 On/8 Off/1 On/9 Off
10              1 On/9 Off
10.5            1 On/9 Off/1 On/10 Off
11              1 On/10 Off
11.5            1 On/10 Off/1 On/11 Off
12              1 On/11 Off
13              1 On/12 Off
14              1 On/13 Off
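The rows of Table 2-7 follow a regular rule: an integral pixel time n gives one active T-cycle followed by n - 1 idle ones, and a pixel time of n + 0.5 alternates an n-cycle pixel with an (n + 1)-cycle pixel. The Python sketch below reproduces the table from that rule; the function name and the list-of-booleans representation are illustrative choices, not part of the 82750DB programming model.

```python
# Pixel times accepted by the divisor: 1 to 12 in steps of 1/2, plus 13 and 14.
ALLOWED_PIXEL_TIMES = {k / 2 for k in range(2, 25)} | {13, 14}


def active_pattern(pixel_time):
    """Return one repeat period of the internal pixel clock.

    Each list element is one T-cycle; True means the cycle is "On".
    """
    if pixel_time not in ALLOWED_PIXEL_TIMES:
        raise ValueError(f"unsupported pixel time: {pixel_time}")
    whole, half = divmod(int(pixel_time * 2), 2)
    if not half:
        # Integral divisor n: 1 On/(n - 1) Off.
        return [True] + [False] * (whole - 1)
    # Divisor n + 0.5: 1 On/(n - 1) Off/1 On/n Off, two pixels per 2n + 1 cycles.
    return [True] + [False] * (whole - 1) + [True] + [False] * whole


if __name__ == "__main__":
    for t in (1, 1.5, 2, 2.5, 3):
        print(t, "".join("1" if on else "0" for on in active_pattern(t)))
```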
Figure 2-9. Divide by 2.5 Pixel Clock
(Timing diagram showing FREQIN, the internal pixel clock, and PIXCLK; the PIXCLK edge is annotated with a delay of 1/2 T-cycle + td.)
Digital to Analog Converters
The Digital to Analog Converters (DACs) take three channels of video information output from the pixel data path, converting it from 8-bit digital values to analog voltage levels typically between 0V and 1V. The conversion is monotonic, and a pixel clock is used to derive a two-phase clock internal to the DAC. The data is sampled from the output of either the pixel path or the YUV to RGB matrix on the rising edge of the internal active phase of this clock. The DISDAC input pin can be asserted to disable the analog outputs and place them into a high-impedance state.

The analog outputs of the triple DAC are referenced to an external current source, which must be connected to the IREFIN pin. All the analog outputs are scaled by this current reference. The value of the analog output full scale is as follows:

Ifs = Iref • 255 / 18.5

where: Iref is the magnitude of the reference current.

The output voltage generated at full scale is:

Vfs = Ifs • Rext

where: Rext is the load resistance value.

A typical output load for the analog outputs (RV, BU, GY) is 75Ω. The speed of the DAC analog output rise and fall times is determined by the time constant:

Rext • (Cext + Cout)

where: Cext is the external capacitance applied and Cout is the intrinsic capacitance of an analog output.

For high performance the objective would be to minimize Rext and Cext. The voltage Voutfs can be determined by any combination of Ifs and Rext, but must not exceed 1.5V. In addition, Ifs must not exceed 22 mA. The analog outputs must go through an external buffer to drive a doubly-terminated 75Ω coax line.

Table 2-8 lists pins which are used to configure the triple DAC.

Table 2-8. Digital To Analog Converter Pins

Signal        Description
IREFIN        Analog Current Reference. Must Be Decoupled to AVCC.
VGCS          Internal Voltage Reference. Must Be Decoupled to AVCC.
AVCC          Analog Power
AVSS          Analog Ground
GY, RV, BU    Analog Pixel Outputs
DISDIG        Disable Digital Outputs
DISDAC        Disable Analog Outputs

NOTE:
The digital video outputs must be disabled by setting DISDIG high whenever the analog outputs are used. Otherwise the A.C. and D.C. characteristics of the DAC are not guaranteed.
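As a worked example of the DAC scaling relationships, the Python sketch below computes the full-scale current, full-scale voltage, and output time constant, and checks them against the 22 mA and 1.5V limits. It assumes the full-scale current equation reads Ifs = Iref × 255 / 18.5; the function names and the component values in the example are hypothetical.

```python
I_FS_MAX = 22e-3   # amperes, maximum full-scale current
V_FS_MAX = 1.5     # volts, maximum full-scale output voltage


def full_scale_current(i_ref):
    """Ifs from the reference current (assumed form: Iref * 255 / 18.5)."""
    return i_ref * 255 / 18.5


def full_scale_voltage(i_ref, r_ext):
    """Vfs = Ifs * Rext, with Rext the load resistance in ohms."""
    return full_scale_current(i_ref) * r_ext


def time_constant(r_ext, c_ext, c_out):
    """Rise/fall time constant Rext * (Cext + Cout), in seconds."""
    return r_ext * (c_ext + c_out)


def within_limits(i_ref, r_ext):
    """True if both the current and voltage limits are respected."""
    return (full_scale_current(i_ref) <= I_FS_MAX
            and full_scale_voltage(i_ref, r_ext) <= V_FS_MAX)


if __name__ == "__main__":
    i_ref, r_ext = 1.0e-3, 75.0          # example values only
    print("Ifs =", full_scale_current(i_ref), "A")
    print("Vfs =", full_scale_voltage(i_ref, r_ext), "V")
    print("tau =", time_constant(r_ext, 10e-12, 5e-12), "s")
    print("within limits:", within_limits(i_ref, r_ext))
```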
3.0 HARDWARE INTERFACE
82750DB Reset Operation
Upon power-up, the 82750DB is in an indeterminate state and must be reset. The RESETB# signal asserted by the host processor is sampled on the rising edge of FREQIN. The 82750DB will enter the reset state a maximum of four cycles after RESETB# is sampled. The 82750DB will request the 82750PB to generate VRAM refresh cycles by asserting a REFRESH code on the VBUS for 16 T-cycles. This code is repeated every 256 T-cycles, until RESETB# is negated.

NOTE:
The RESETB# input is an edge-triggered input. After power-up, the host processor must set the RESETB# input low for a minimum of ten T-cycles in order to reset the 82750DB. The host must then set the RESETB# input high to start normal operation.

When the RESETB# input is released, a Start of Vertical Field command (VODD) is sent for 16 T-cycles to the 82750PB via the VBUS. This code is immediately followed by a Register Transfer Request command (REGX) that is held for 256 T-cycles. This 256 T-cycle wait assures that the 82750PB has ample time to honor the 82750DB register transfer request. The register data is then read into the 82750DB from the serial port of the VRAMs at a rate that is equal to 1/3 of the operating frequency. If the register transfer does not terminate after 256 T-cycles, the 82750DB will automatically stop the transfer, send an 82750DB SHUTDOWN code to the 82750PB, and re-enter the reset state.

During this register transfer, and on all subsequent register transfers (programmed or automatic), the 82750DB performs a vertical checksum on the register data. The last 32-bit word read in during a register transfer is the user-generated checksum of that register data. If the 82750DB-generated checksum does not match the user-generated checksum, the 82750DB sends a SHUTDOWN code to the 82750PB via the VBUS, and will automatically re-enter the reset state. The 82750DB will remain in the reset state until the RESETB# input is toggled by the host processor. Any VRAM requests or control signals programmed to occur during this time will be ignored.

Normal programmed operations start after the first successful register load. Frame timing will start at the beginning of a horizontal line and at the beginning of the first field, sometimes referred to as line 1 of field 1. There will not be a horizontal sync pulse on the first line after reset, but HSYNC will be generated on every line thereafter. All horizontal and vertical programming parameters, as well as scheduling of any transfer requests and control information to be sent on the VBUS, must be set up by the user during the first register load. Included in the control information are parameters for the 82750PB to refresh the VRAM. Refresh must occur on every line. This requires that the line rate of the 82750DB be at least 4 kHz to guarantee that enough refresh cycles are generated. Additional register transfers (up to one per line) may be programmed to occur on any line during the field. As a result of such a transfer, display characteristics and programming parameters may be changed.

After the first field, automatic register transfers will occur on the second line of each subsequent field. Note that all register transfers will occur at 1/3 of the operating frequency of the 82750DB, unless the 1X or 1/2X SCLK mode has been programmed by the user.

Throughout the reset process, the states of all outputs become valid at various times. Specifically, after being held low for at least 10 T-cycles, RESETB# must transition to a high state in order to initiate normal operation. By the time RESETB# reaches this low to high transition, the states of SCLK[1:0], VBUS[3:0], HSYNC, VSYNC, CSYNC, and FCD are valid. 10 T-cycles following RESETB#'s transition from low to high, the states of BG, CB, ACTDIS, PIXCLK, DGY[7:0], DRV[7:0], and DBU[7:0] become valid. The ALPHA[7:0] and BPP[1:0] signals reach a valid state 10 T-cycles following the completion of the first register load following reset.

Input/Output Transformation
In general, the control outputs, including the sync signals, are delayed by pipelining effects from their corresponding inputs. If the output sync signals are taken as the time base, the first pixel in a line is actually fetched by an SCLK that is up to 19 T-cycles before its corresponding PIXCLK. Some later pixels may be delayed by an additional number of T-cycles, depending upon bits/pixel, pixel timing, and whether Y interpolation is enabled.

Outside of the active display region and before the blanking output is asserted, border pixels are output. Where the blanking region has been entered and the display is not active, the output is the value contained in the Blanking Color register.
Pixel handling in the active region is defined by three parameters:
1. The bits/pixel parameter.
2. Whether VU interpolation is in effect or not.
3. If the 82750DB Enable bit has been selected.

VU interpolation is in effect for a given pixel if:
1. The VU interpolator is turned on (VU sample load set to non-zero load value),
AND
2. VU interpolation display is permitted (VU interpolation display operations bit equals 1),
AND
3. One of the two following conditions is met:
a. Either the interpolation is unconditional,
OR
b. The controlling Y or the controlling U sample for this pixel has a least significant bit of 1.

The value of the alpha output may come from one of the following three sources:
1. It may be explicitly coded into the pixel data (32-bit/pixel and pseudo 16-bit/pixel with Alpha modes only).
2. It may be output from one of two programmable registers, Alpha0 and Alpha1.
3. During the portion of the display when the border is active, the 8 most significant bits of the Border Alpha register may be output.

Table 3-1 illustrates how the Alpha outputs are selected.

Table 3-1. Selecting Alpha Outputs

Alpha     Alpha
Enable    Trap Select    Alpha Output
0         X              Alpha0 Register
1         0              Alpha0 Register (8, 16 bpp)
1         0              MS Byte of Pixel (32, Pseudo 16 bpp)
1         1              Trap Match = 0, Alpha0 Register
1         1              Trap Match = 1, Alpha1 Register

Genlocking on the 82750DB
The genlocking algorithm on the 82750DB uses horizontal and vertical resets, HRESET# and VRESET#, obtained from an external device. When the Genlock bit in the Miscellaneous Control register is off, the 82750DB will ignore all signals present on its HRESET# and VRESET# inputs. The 82750DB will resync itself when the programmed end of line count is received. This allows the user to turn off genlock without having to worry about the state of the input video.
When the Genlock bit is set to one, the 82750DB will use the external resets to reset its internal horizontal and vertical sync counters. In this case, the width of the active line is determined by the HRESET# signal, and the length of the field is governed by VRESET#. The programmed values for these registers will be ignored. As shown in Figure 3-1, when asserted, VRESET# and HRESET# take effect just after the third falling edge of FREQIN. VRESET# has no effect on the 82750DB if the first half of the first line of an odd field or the second (and only) half of the first line of an even field is already in progress. HRESET# has no effect on the 82750DB if it occurs during the programmed first half of the line. The user may decrease the effect of jitter by reducing the "window" during which the vertical reset signal is supposed to occur. This can be done by scheduling a register load to occur after the vertical active display time has ended, thereby decreasing the programmable horizontal active window to a size acceptable for the video source. When VRESET# is received during this reduced, programmed horizontal active window, the 82750DB is reset to an even vertical field. When VRESET# occurs at any other time in the horizontal scan line, the 82750DB is set to an odd field.
Figure 3-1. Horizontal and Vertical Reset Timing
(Timing diagram showing FREQIN, HRESET#, HSYNC, VRESET#, and VSYNC.)
Digitizing Images with the 82750DB
Digitizing is enabled by setting the Digitize Enable bit in the Miscellaneous Control register. Note that enabling the digitize mode does not automatically enable genlocking. The Genlock bit must be set separately, if it is required. When digitizing, the 82750DB is used to shift digitized data into the VRAM shift registers, and then transfer this data into the VRAM array.

The 82750DB also provides an external "digitizer window" signal, FCD. This signal defines the vertical active region during which the digitizer is enabled. Typically, the user sets up the display parameters to reflect the "window" of the display to be digitized. The horizontal and vertical active window size can be selected by programming the Active Start and Stop registers. FCD is derived from the Vertical Start and Stop registers, and is used to enable the digitizer to drive the VRAM bus. During the programmed vertical blanking interval the FCD signal will be negated, and therefore the digitizer is prohibited from driving the VRAM bus. This will allow data to be read from the VRAM serial data bus during the automatic register transfer that is performed at the start of the field. Note that it will still be possible to program the 82750DB to digitize during the vertical blanking interval, in order, for example, to capture time codes from a VCR.

When capturing and displaying NTSC data, during the horizontal blanking interval of the first display line, a WRDIGINP command is sent on the VBUS to the 82750PB. (Refer to Figure 3-2.) Recall that there is a 5-line vertical pipeline delay through the 82750DB. If the first display line is programmed to be n, the first display line will occur at n + 5. Similarly, if the last line is programmed to be m, then the last display line will be line m + 5. The WRDIGINP VBUS code causes a dummy write transfer cycle that places the VRAMs in the write mode. The 82750PB then sets the bitmap pointers to the first line's address (L0). This code is immediately followed by another WRDIGINP command that causes the 82750PB to perform a write transfer cycle at the L0 address. Since no digitized data has been read in, invalid data is loaded into row L0 of the VRAM array.

During the active display of the first display line, the 82750DB provides shift clocks at the programmed pixel rate. The digitized data is shifted into the VRAMs while the user-programmed horizontal active window is active. During the horizontal blanking interval of the next line, the 82750DB sends a WRDIGI code to the 82750PB, thereby transferring the L0 data from the shift register to the VRAM array at the L0 address. The 82750PB performs a pitch calculation, pointing it to the L1 row. After the WRDIGI
transfer has finished, the 82750DB issues a WRDIGINP command to the 82750PB that performs a write transfer cycle at the L1 address. This will write the L0 data into the L1 address. On the next line, the L1 row will be written over with L1 data. This same procedure continues for the entire active display, until the last active line is reached (m + 5). A final pair of WRDIGI and WRDIGINP codes is sent to the 82750PB to load in the last line of data. At the start of horizontal sync of the next line, the FCD signal will be negated.

Figure 3-2. Digitizing Example
(Timing diagram showing the WRDIGINP and WRDIGI VBUS codes, the digitized data for lines L0 through Lf, and FCD being asserted at line n + 4 and negated at line m + 6.)

The purpose of the WRDIGINP may not be apparent at first glance. This signal ensures that the correct data is written into the last selected VRAM address. This is necessary when crossing the physical boundaries of VRAM memory.

When the 82750DB is genlocked, the digitizing device must also provide the HRESET# and VRESET# signals. The device must ensure that VRESET# is never asserted during the start of the line. This allows a register transfer (which shortens the active display and is required for digitizing) to complete before the start of a field register transfer. The vertical sync pulses are buffered, so the start of the field transfer request can be honored immediately after the previous transfer request is finished.

Also, captured NTSC data may be displayed on a VGA-type monitor. This requires the 82750DB to operate at a VGA frequency (approximately 31.5 kHz), which is twice that of NTSC. Each line of captured NTSC data is read into the 82750DB twice. Setting the line replicate bit makes doubling of memory unnecessary. Figure 3-3 illustrates how the 82750DB operates in such a mode. The Line Replicate, Digitizer, and Genlock bits in the Miscellaneous Control register are assumed to be set to one. During the HBI of the first display line, a dummy write transfer cycle (WRDIGINP) places the VRAMs in the write mode. The 82750PB then sets the bitmap pointers to the first line's address (L0). This code is immediately followed by a WRDIGINP command, causing the 82750PB to perform a write transfer cycle at the L0 address. Since no digitized data has been read in, unknown values are loaded into row L0 of the VRAM array.
At the end of the first line the 82750DB sends two WRDIGINP codes to the 82750PB, thereby transferring the L0 data from the shift register to the VRAM array at the L0 address. The 82750PB does not perform a pitch calculation, so the pointer remains at the address for L0. After the second display line (which has the same data as the first line), a WRDIGI code is sent to the 82750PB that writes the L0 data to the L0 address and updates the bitmap pointer to L1. The WRDIGINP signal immediately following this selects the L1 address. After the third line of data, two WRDIGINP codes that select the L1 address are sent. After the fourth line (which has the same data as the third line), a write operation is performed to load L1 data into the L1 address, and the 82750PB pointer is updated to address L2. A WRDIGINP code is sent to select the L2 address. This same procedure continues for the entire active display, until the last active line is reached (m + 5). A final pair of WRDIGI and WRDIGINP codes, or two WRDIGINP codes, is sent to the 82750PB to load in the last line of data. At the start of horizontal sync of the next line, the FCD signal will be negated.

Figure 3-3. Digitizing Example with Line Replicate
(Timing diagram showing the WRDIGINP and WRDIGI VBUS codes when each digitized line L0 through Lf is displayed twice, with FCD asserted at line n + 4 and negated at line m + 6.)
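The digitizing sequences described in this section can be followed with a small model. The Python sketch below is a simplified illustration, not a description of the hardware: it treats WRDIGINP as a write transfer at the current row with no pointer change, and WRDIGI as a write transfer followed by the pitch calculation that advances the pointer by one row. The function name, the dictionary used for the VRAM array, and the "garbage" placeholder are all hypothetical.

```python
def simulate_digitize(lines, line_replicate=False):
    """Model the VBUS code sequence while digitizing `lines`.

    Returns (vram, codes): the row contents after capture and the ordered
    list of VBUS codes that were issued.
    """
    vram = {}
    pointer = 0                    # row selected by the 82750PB
    shift_register = "garbage"     # nothing has been digitized yet
    codes = []

    # Before the first display line: dummy write transfer, then a write
    # transfer at L0 that stores invalid data.
    codes += ["WRDIGINP", "WRDIGINP"]
    vram[pointer] = shift_register

    passes = 2 if line_replicate else 1
    for data in lines:
        for current_pass in range(passes):
            shift_register = data              # line shifted in during active display
            if current_pass < passes - 1:
                # Replicated line: write to the same row, pointer unchanged.
                codes += ["WRDIGINP", "WRDIGINP"]
                vram[pointer] = shift_register
            else:
                codes.append("WRDIGI")         # write, then pitch calculation
                vram[pointer] = shift_register
                pointer += 1
                codes.append("WRDIGINP")       # select and write the next row
                vram[pointer] = shift_register
    return vram, codes


if __name__ == "__main__":
    vram, codes = simulate_digitize(["L0", "L1", "L2"])
    print(codes)
    print(vram)
```

In this model each row ends up holding its own line of data, and the row after the last line holds a copy of the final line, which matches the "Transfer Lf to Lf+1 address" step in the figures.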
4.0 PROGRAMMING THE 82750DB
Overview
All registers are loaded by the issuance of a REGX command from the 82750DB to the 82750PB over the VBUS. This causes the 82750PB to load a sequence of register values into the VRAM serial output registers from an address designated by an 82750DB register pointer. After the request is granted, a new 82750DB register word is read in with each SCLK. Each 32-bit word consists of a register address in the high byte and register values in the rest of the word. The sequence is terminated by a stop code that corresponds to the address byte being equal to 0xff. A variable number of 32-bit words can be loaded. During reset, if a stop code is not found within 256 T-cycles, the register transfer is terminated, a SHUTDOWN code is asserted on the VBUS, and the 82750DB returns to the reset state. All transfer requests are terminated at the start of a new field. This ensures that non-terminating register transfers caused by bad register data will be halted.

During this register transfer, and on all subsequent register transfers (programmed or automatic), the 82750DB performs a vertical checksum on the register data. The last 32-bit word read in during a register transfer is the user-generated checksum of that register data. If the 82750DB-generated checksum does not match the user-generated checksum, the 82750DB sends out a SHUTDOWN code to the 82750PB via the VBUS, and will automatically re-enter the reset state.

Pipeline Delay through the 82750DB
The actual horizontal pipeline delay through the 82750DB is dependent on the processing elements used to generate the output. If Y interpolation is not used, the pipeline delay is:

Horiz. Active Pipeline Delay = 16 cycles + SCLK Transfer Timing Delay

Here the SCLK Transfer Timing Delay is 1 for 1X, 2 for 1/2X, and 3 for 1/3X.

If Y interpolation is used, the pipeline delay is:

Horiz. Pipeline Delay = 16 cycles + SCLK Transfer Timing Delay + Integer (Pixel Time)

The Integer (Pixel Time) is simply the integer value of the programmed pixel time. The horizontal pipeline delay for blanking differs from that of active. Whether Y interpolation is on or off, the pipeline delay for horizontal blanking is:

Horiz. Blanking Pipeline Delay = 10 cycles + SCLK Transfer Timing Delay

The horizontal sync pipeline delay is always equal to 0 cycles.
Thus all horizontal parameters, (e.g. horizontal
blanking start, active stop) must be programmed to
account for the total horizontal pipeline delay. The
vertical pipeline delay. The vertical blanking and
vertical sync pipeline delay are always equal to 0
lines. All vertical parameters must be programmed
so that this delay is taken into account.
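The horizontal pipeline delay formulas above can be collected into a few helper functions. This Python sketch is illustrative; the function names and the string keys used for the SCLK modes are hypothetical.

```python
# SCLK Transfer Timing Delay in T-cycles for each transfer mode.
SCLK_TRANSFER_DELAY = {"1X": 1, "1/2X": 2, "1/3X": 3}

HORIZ_SYNC_DELAY = 0  # horizontal sync pipeline delay is always 0 cycles


def horiz_active_delay(sclk_mode, pixel_time=1, y_interpolation=False):
    """Horizontal active pipeline delay in T-cycles."""
    delay = 16 + SCLK_TRANSFER_DELAY[sclk_mode]
    if y_interpolation:
        delay += int(pixel_time)   # Integer (Pixel Time)
    return delay


def horiz_blanking_delay(sclk_mode):
    """Horizontal blanking pipeline delay in T-cycles."""
    return 10 + SCLK_TRANSFER_DELAY[sclk_mode]


if __name__ == "__main__":
    print(horiz_active_delay("1/3X", pixel_time=2.5, y_interpolation=True))
    print(horiz_blanking_delay("1/3X"))
```

A programmed horizontal parameter would then be offset by the matching delay, for example by subtracting the blanking delay from the desired blanking start position.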
PROGRAMMING CONSIDERATIONS
The user must ensure that the 82750DB is programmed correctly. Illegal or illogical combinations of display parameters are not corrected in hardware, and may cause the 82750DB to output erroneous display or timing information. The following list highlights some basic guidelines to follow when programming the 82750DB.
1. The maximum rate at which data may be read into the 82750DB is determined by the type of memory used. This in turn affects the maximum rate and depth of data that can be displayed. If 32 bits of data can only be read into the 82750DB every two clock cycles, only 16 bits of data may be displayed every clock cycle. The programmer should match the transfer rate (1X, 1/2X, or 1/3X) with the memory speed, and the display pixel rate with the pixel depth and memory bandwidth.
2. Blanking intervals of the display are defined by the non-active programmed time. During this portion of the display, programmed transfers take place. If a transfer does not complete before the start of the active display, it is terminated, and active display data is shifted into the 82750DB at the programmed rate. During horizontal blanking intervals, the user should allow enough time for all programmed register, colormap, and VU data transfers to complete.
3. When digitizing (capturing) images, no other bitmap transfers (e.g., REGX, VU) should be scheduled to occur during the active portion of the field.
4. Active start and stop times should not be programmed to overlap the blanking stop and start times, taking the pipeline delay through the 82750DB into account.
5. Programming the Y interpolation to occur in a non-integral pixel width will cause the Y channel to output incorrect data.
CURSOR REGISTERS
The following registers are used to program the characteristics of the on-chip cursor.

Cursor Position Update Register    0x5b
Bits 31-24: 01011011
Bits 23-12: Vertical Position
Bits 11-0:  Horizontal Position
Horizontal Position in units of T-cycles
Vertical Position in units of full lines
This register gives the horizontal and vertical position of the cursor. The cursor will extend for 16 pixel periods, starting at the prescribed horizontal position, for the next 16 lines (or 32 pixel periods for 32 lines if the 2X Cursor Mode bits in the General Control register are set to one).

Cursor Control Register    0x5a
Bits 31-24: 01011010
Bits 23-12: Vertical Position
Bits 11-0:  Horizontal Position
Horizontal Position in units of T-cycles
Vertical Position in units of full lines
This register also gives the horizontal and vertical position of the cursor. The cursor will extend for 16 pixel periods, starting at the prescribed horizontal position, for the next 16 lines (or 32 pixel periods for 32 lines if the 2X Cursor Mode bits in the General Control register are set to one). Receipt of this address also causes the 82750DB to interpret the next sixteen 32-bit words of register data as the 16 x 16 x 2-bit cursor map. This will cause the register address decoding logic internal to the 82750DB to be disabled, and the next 16 words of information will be loaded into the Cursor table. Each 32-bit word will be interpreted as a line (16 pixels) of cursor data, with the two least significant bits corresponding to the first cursor pixel to be displayed.

Cursor Color 3    0x59
If the cursor is enabled and the 24 bits of data in this register are selected, the data will be sent directly to the YUV conversion matrix during active display. The bits should be programmed as RGB values when the YUV to RGB matrix is not being used.

Cursor Color 2    0x58
If the cursor is enabled and the 24 bits of data in this register are selected, the data will be sent directly to the YUV conversion matrix during active display. The bits should be programmed as RGB values when the YUV to RGB matrix is not being used.

Cursor Color 1    0x57
If the cursor is enabled and the 24 bits of data in this register are selected, the data will be sent directly to the YUV conversion matrix during active display. The bits should be programmed as RGB values when the YUV to RGB matrix is not being used.
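The Cursor Control word and the cursor map words that follow it can be assembled as shown in the Python sketch below. It is a minimal illustration of the bit layout described for the cursor registers; the function names are hypothetical, and the 2-bit pixel codes (0 for transparent, 1 to 3 for Cursor Color Registers 1 to 3) follow the cursor pixel table.

```python
CURSOR_CONTROL_ADDR = 0x5A


def cursor_control_word(vertical, horizontal):
    """Address in bits 31-24, vertical in bits 23-12, horizontal in bits 11-0."""
    if not (0 <= vertical < 4096 and 0 <= horizontal < 4096):
        raise ValueError("positions must fit in 12 bits")
    return (CURSOR_CONTROL_ADDR << 24) | (vertical << 12) | horizontal


def pack_cursor_line(pixels):
    """Pack 16 two-bit cursor pixels into one 32-bit word.

    The first pixel to be displayed occupies the two least significant bits.
    """
    if len(pixels) != 16:
        raise ValueError("a cursor line has exactly 16 pixels")
    word = 0
    for index, pixel in enumerate(pixels):
        if pixel not in (0, 1, 2, 3):
            raise ValueError("cursor pixels are 2-bit values")
        word |= pixel << (2 * index)
    return word


def cursor_load(vertical, horizontal, bitmap):
    """Return the 17 words of a cursor load: control word plus 16 map lines."""
    if len(bitmap) != 16:
        raise ValueError("the cursor map has exactly 16 lines")
    return [cursor_control_word(vertical, horizontal)] + [
        pack_cursor_line(line) for line in bitmap
    ]


if __name__ == "__main__":
    # Example only: a solid 16 x 16 block drawn with Cursor Color Register 1.
    words = cursor_load(100, 200, [[1] * 16 for _ in range(16)])
    print([hex(w) for w in words[:2]])
```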
DISPLAY TIMING REGISTERS
Each register has two 12-bit components, listed with the 12 least significant bits first, followed by the 12 most significant bits. Horizontal timing is measured in units of T-cycles (periods of the master clock) from the start of horizontal sync. The register content defines the number of T-cycles that elapse before the event controlled by this register takes place. The exception to this rule is the base counter, which specifies the number of T-cycles/half line. Zero is not an allowable value; use the total number of T-cycles per half line or full line instead. Unused bits should be zero. Sync signals are reset to initial values as specified for each; "start" means to set to 1, and "stop" means to reset to zero.

As defined by NTSC standards, vertical timing can be measured from the start of a field in one of two ways: either in units of half lines, or in units of full lines. When programmed for an interlaced display (i.e., an odd number of half lines per field), the start of a field coincides with the start of a line on odd fields and with the midpoint of a line on even fields. In the latter case, for an event that is programmed in full lines, the first half line is ignored, and counting begins with the first full line. With this interpretation, the register content defines the number of half or full lines that elapse before the event controlled by this register takes place. The same may be said for the horizontal component, which is defined by the number of T-cycles/half line. The hardware does not look for nor correct illogical combinations of register settings. The monitor should be protected from damage with external circuitry when debugging is in progress.

Base Counter    0x56
Bits 31-24: 01010110
Bits 23-12: # of Lines/Field
Bits 11-0:  # of T-Cycles/Half Line
T-cycles/Half Line in units of T-cycles (periods of the master clock)
Half Lines/Field in units of half lines
All of the internal timing is derived from comparing the programmed values with the values of this register. The horizontal base counter is programmed using the least significant 12 bits. In this case the values loaded into this register should be one less than the desired value. Bits 23 through 12 are used to specify the number of half lines per field.

Sync Stops    0x55
Bits 31-24: 01010101
Bits 23-12: VSYNC Stop
Bits 11-0:  HSYNC Stop
HSYNC Stop in units of T-cycles
VSYNC Stop in units of half lines

Sync Starts    0x54
Bits 31-24: 01010100
Bits 23-12: VSYNC Start
Bits 11-0:  HSYNC Start
HSYNC Start in units of T-cycles
VSYNC Start in units of half lines
The Sync Stops and Sync Starts registers are used in conjunction with one another to specify the start and stop locations of the horizontal sync, HSYNC, and vertical sync, VSYNC, output signals. VSYNC may be programmed to start and stop at any time during a given field as defined on a half-line interval. Bits 23 through 12 in the Sync Starts and Sync Stops registers are used to define the start and stop times for VSYNC, respectively. Similarly, HSYNC may be programmed to start and stop at any line position as defined in units of T-cycles. Bits 11 through 0 in the Sync Starts and Sync Stops registers are used to define the start and stop positions for HSYNC, respectively.

The horizontal component of the Sync Stops register also affects the composite sync, or CSYNC, output. In this case, the CSYNC output will be the same as the HSYNC output, except during the vertical sync and equalization interval. In the latter case, the CSYNC output is determined by the Serration and Equalization registers.

Blanking Stops    0x53
Bits 31-24: 01010011
Bits 23-12: Vertical Blank Stop
Bits 11-0:  Horizontal Blank Stop
HB Stop in units of T-cycles
VB Stop in units of half lines
The Blanking Start and Stop registers control the composite blanking output (CB). The horizontal blanking start and stop position, in units of T-cycles, can be specified to occur at any time during the line. By the same token, the vertical blanking start and stop positions can be programmed to occur at any half-line interval.
The CB output combines both the horizontal and
vertical· blanking pulses programmed using these
two registers. This information is independent from
the HSYNC, VSYNC, arid CSYNC outputs, so the
user must specify the proper blanking intervals for
the monitor that is being used. If .the programmer
specifies the blanking period to end before the active line starts, or start after the active line has ended, the border color is output. Due to internal pipeline delays on the 827500B, the values should be
one less than desired for VB Start and Stop. For HB
Start and Stop subtract the total horizontal pipeline
delay.
Ox52
Blanking Starts
2~
31
23
01010010
- HB' Start in units of T-cycles
- VB Start in units of half lines
Resets to 1
Resets to 1
Pr~gram values one less .than desired for VB Start
and Stop. FClr horizontal blanking start, load numbers less than the total horizontal pipeline delay.
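The Base Counter, Sync, and Blanking registers described here share one layout: a 12-bit vertical value in bits 23 through 12 and a 12-bit horizontal value in bits 11 through 0. The Python sketch below only models that bit packing and the one-less rule stated for the horizontal base counter. The function names and the sample numbers are hypothetical, and how the resulting word reaches the 82750DB is outside its scope.

```python
FIELD_MASK = 0xFFF  # each field is 12 bits wide


def pack_timing(vertical: int, horizontal: int) -> int:
    """Place a vertical value in bits 23-12 and a horizontal value in bits 11-0."""
    for name, value in (("vertical", vertical), ("horizontal", horizontal)):
        if not 0 <= value <= FIELD_MASK:
            raise ValueError(f"{name} value {value} does not fit in 12 bits")
    return (vertical << 12) | horizontal


def unpack_timing(word: int) -> tuple:
    """Return (vertical, horizontal) from the low 24 bits of a register word."""
    return (word >> 12) & FIELD_MASK, word & FIELD_MASK


def base_counter(t_cycles_per_half_line: int, half_lines_per_field: int) -> int:
    """Base Counter word: the horizontal count is loaded as one less than desired."""
    return pack_timing(half_lines_per_field, t_cycles_per_half_line - 1)


# Hypothetical numbers, chosen only to exercise the packing.
sync_starts = pack_timing(vertical=6, horizontal=0)
sync_stops = pack_timing(vertical=12, horizontal=64)
base = base_counter(t_cycles_per_half_line=400, half_lines_per_field=525)
interlaced = unpack_timing(base)[0] % 2 == 1  # odd half-line count per field
```

Rejecting out-of-range values up front is a deliberate choice here, since the text notes that the hardware does not check for illogical register settings.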
Serration Start (0x51)
Bits 31-24: 01010001
Bits 23-12: Not Used
Bits 11-0: Serration Start (SER Start), in units of T-cycles. Resets to 0
The vertical component of the CSYNC (composite sync) signal is made up of two types of pulses: equalization and serration pulses. The window during which the serration pulses are active is determined by the VSYNC start and stop positions, as shown in Figure 4-1. When vertical sync (VSYNC) is active, in this case on line 3, the first serration pulse is output on the CSYNC signal. This pulse will start at the T-cycle count specified in bits 11 to 0 of the Serration Start register. The pulse will end when the half-line count specified in the Base Counter register has been reached. This pulse will be repeated for every half line that the VSYNC output is programmed to be active, regardless of the position in the field. In Figure 4-1, this continues until half line 12, or line 6.
[Figure 4-1 drawing: CSYNC, VSYNC, and HSYNC waveforms against line count from the start of an odd field, marking the pre-equalization pulses, the serration pulses, Vertical Serration Start, Horizontal Equalization Stop, and Vertical Equalization Stop. Drawing 240855-16.]
Figure 4-1. Programming the Video Sync Outputs
Equalization Parameters (0x50)
Bits 31-24: 01010000
Bits 23-12: Vertical Equalization Stop (EQV Stop), in units of half lines. Resets to 1
Bits 11-0: Horizontal Equalization Stop (EQH Stop), in units of T-cycles. Resets to 1
During the vertical equalizing period, which starts at field-beginning, an equalization pulse is output on the CSYNC signal at the beginning of each half line, as shown in Figure 4-1. The width of this equalization pulse is determined by the value in bits 11 to 0 of this register. The half line on which these pulses are to stop is programmed in bits 23 through 12 of this register. If VSYNC is programmed to occur during the equalization interval (as it is for NTSC type displays), the serration pulses are output on the CSYNC signal.
Active Region Stops (0x4f)
Bits 31-24: 01001111
Bits 23-12: Vertical Active Stop, in units of full lines
Bits 11-0: Horizontal Active Stop (Actdis Stop), in units of T-cycles
Active Region Starts (0x4e)
Bits 31-24: 01001110
Bits 23-12: Vertical Active Start, in units of full lines
Bits 11-0: Horizontal Active Start (Actdis Start), in units of T-cycles
The active region window, during which pixels to be displayed are fetched from VRAM, is defined by the Active Region Start and Stop registers. The first display line is actually five lines after the line indicated in the vertical region of the Active Region Start register. The position of the active region on a horizontal line is determined by the horizontal component of the Active Region Start register. Pixels will be fetched from VRAM at a rate determined by the number of bits/pixel and pixel widths. In order for the 82750DB to operate properly, the horizontal width of the active region window must be an integral number of display pixel widths, taking into account the horizontal pipeline delay. Also, the Active Region Start and Stop must fall within a single line boundary, as dictated by the Base Counter register. When the first pixel actually appears at the output of the 82750DB, the output is a function of the processing elements used, as discussed above.
Burst Gate Stop (0x4d)
Bits 31-24: 01001101
Bits 23-12: Vertical BG Stop, vertical stop position in units of full lines
Bits 11-0: Horizontal BG Stop, horizontal stop position in units of T-cycles
Burst Gate Start (0x4c)
Bits 31-24: 01001100
Bits 23-12: Vertical BG Start, vertical start position in units of full lines
Bits 11-0: Horizontal BG Start, horizontal start position in units of T-cycles
The Burst Gate Horizontal and Vertical Start and Stop registers allow the user to program a window into which burst can be added. This is useful when modulating the outputs of the 82750DB.
VBUS CODE REGISTERS
The following group of registers is used by the programmer to schedule when VBUS transfer or control codes are to be sent to the 82750PB by the 82750DB.
Display Format Load Interrupt (0x4b)
Bits 31-24: 01001011
Bits 23-12: Vertical DFL Position, in units of full lines
Bits 11-0: Horizontal DFL Position, in units of T-cycles
This is the programmable XY interrupt, used by the 82750PB to perform a load of the Shadow Copy registers. This interrupt is sent on the VBUS when bits 23 to 12 match the current display line position and bits 11 to 0 match the T-cycle count.
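The match condition for the Display Format Load interrupt can be written as a small predicate. This is a sketch in Python under the assumption that the line and T-cycle counters are available as plain integers; the function name and the sample position are hypothetical and do not come from the data sheet.

```python
def dfl_interrupt_due(dfl_register: int, display_line: int, t_cycle: int) -> bool:
    """True when both programmed positions equal the current counters."""
    vertical = (dfl_register >> 12) & 0xFFF  # bits 23-12, full lines
    horizontal = dfl_register & 0xFFF        # bits 11-0, T-cycles
    return vertical == display_line and horizontal == t_cycle


# Hypothetical position: line 20, T-cycle 100.
dfl = (20 << 12) | 100
on_target = dfl_interrupt_due(dfl, display_line=20, t_cycle=100)
one_cycle_early = dfl_interrupt_due(dfl, display_line=20, t_cycle=99)
```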
When the active region is over, the border color is
output until the programmed blanl<42
| Address | Register | Address | Register |
|---|---|---|---|
| 0x42 | CLUT Mask Data Register | 0x5e | Not Used |
| 0x43 | Miscellaneous Control | 0x5f | Not Used |
| 0x44 | General Control | 0x60 | Not Used |
| 0x45 | Pixel Control | 0x61 | Not Used |
| 0x46 | Blanking Color | 0x62 | Not Used |
| 0x47 | Alpha Register | 0x63 | Not Used |
| 0x48 | Border Color | 0x64 | Not Used |
| 0x49 | Register Transfer | 0x65 | Not Used |
| 0x4a | Line Notification and Timing | 0x66 | Not Used |
| 0x4b | DFL Load | 0x67 | Not Used |
| 0x4c | Burst Gate Start | 0x68 | Not Used |
| 0x4d | Burst Gate Stop | 0x69-0x6e | Not Used |
| 0x4e | Active Region Start | 0x6f | Not Used |
| 0x4f | Active Region Stop | 0x70 | Not Used |
| 0x50 | Equalization Parameters | 0x71-0x7f | Not Used |
| 0x51 | Serration Start | 0x80-0xfe | Not Used |
| 0x52 | Blanking Start | 0xff | Stop Code |
5.0 ELECTRICAL DATA
Maximum Ratings
Table 5-1 is a stress rating only, and functional operation at the maximums is not guaranteed. Functional operating conditions are given in the DC and AC Characteristics (Tables 5-2, 5-3, 5-4, and 5-5).
Exposure to the Maximum Ratings may affect device reliability. Furthermore, although the 82750DB contains protective circuitry to resist damage from static electrical discharge, always take precautions to avoid high static voltages or electric fields.
Table 5-1. Absolute Maximum Requirements
| Condition | Maximum Requirement |
|---|---|
| Case Temperature under Bias | -65°C to 110°C |
| Storage Temperature | -65°C to 110°C |
| Voltage on Any Pin with Respect to Ground | -0.5V to VCC + 0.5V |
| Supply Voltage with Respect to VSS | -0.5V to +6.5V |
DC Characteristics
Table 5-2. DC Characteristics (VCC = 5V ±10%, TCASE = 0°C to 95°C)
Symbol
Parameter
Min
Typ
Unit
-0.3
Notes
VIL
Input LOW Voltage
VIH
Input HIGH Voltage
VOL
Output LOW Voltage
"V
IOL
VOH
Output HIGH Voltage
V
IOH
~
VSS<'-"NtA.
Table 5-6. DAC A.C. Characteristics
Symbol
Parameter
Min
ft4~.)!;
Typ
Unit
Notes
tr, tf
Rise/Fall Time
ns
(Note 1)
ClkF
Clock Feedthrough
dB
(Note 2)
GlEn
Glitch Energy
pV-sec
(Notes 2, 3)
Skew
Output Skew
ns
Xtlk
Crosstalk
pV-sec
(Note 2)
NOTES:
1. Maximum value is for RL = 750 and CL =
2. Assumes an 80 MHz filter on output.
3. Glitch energy generated from the il)!l.
4. DISDIG must be tied high.<;;t
5. Assumes the use of 0.1 IlF capacitor be
% to 90% fuliscale transmission.
n VGCS and AVcc and 0.1 IlF and 10 IlF capacitors between IREFIN and AVcc .
IRE FIN
Ground
VGCS
8275008
Avss
Avec
Ground
f-~~---+---+--- +5V (AveC>
R
To
Monitor
G
8
R, = 750
RL = Load Resistance
C, =0.1
CL
::::
~F
Load Capacitance
tis = 255 * Iref
18.5
Vb = IfII * RL
where:
Figure 6-2. 132-Lead PQFP Mechanical Package Detail-Typical Lead (drawing 240855-36; dimensions in mm (inch))
Figure 6-3. 132-Lead PQFP Mechanical Package Detail-Protective Bumper (drawing 240855-34; dimensions in mm (inch))
Figure 6-4. Detailed Dimensions of the 82750DB in the 132-Lead PQFP Package-Molded Details (drawing 240855-33; dimensions in mm (inch))
Figure 6-5. Detailed Dimensions of the 82750DB in the 132-Lead PQFP Package-Terminal Details (drawing 240855-35; dimensions in mm (inch))
NOTES:
ALL DIMENSIONS AND TOLERANCES CONFORM TO ANSI Y14.5M-1982.
DATUM PLANE LOCATED AT THE MOLD PARTING LINE AND COINCIDENT WITH THE BOTTOM OF THE LEAD WHERE LEAD EXITS PLASTIC BODY.
DATUMS TO BE DETERMINED WHERE CENTER LEADS EXIT PLASTIC BODY AT DATUM PLANE.
CONTROLLING DIMENSION, INCH.
DIMENSIONS D1, D2, E1 AND E2 ARE MEASURED AT THE MOLD PARTING LINE. D1 AND E1 DO NOT INCLUDE AN ALLOWABLE MOLD PROTRUSION OF 0.18 MM (.007 IN) PER SIDE. D2 AND E2 DO NOT INCLUDE A TOTAL ALLOWABLE MOLD PROTRUSION OF 0.18 MM (.007 IN) AT MAXIMUM PACKAGE SIZE.
PIN 1 IDENTIFIER IS LOCATED WITHIN ONE OF THE TWO ZONES INDICATED.
(Drawing 240855-37)
Package Thermal Specifications
The 82750DB is specified for operation when TC (the case temperature) is within the range of 0°C to 95°C. TC may be measured in any environment to determine whether the 82750DB is within the specified operating range. The case temperature should be measured at the center of the top surface.
TA (the ambient temperature) can be calculated from θCA (thermal resistance from case to ambient) with the following equation:
TA = TC - P * θCA
Typical values for θCA at various airflows are given in Table 6-3 for the 132-lead PQFP package. When using the digital outputs, Table 6-4 shows the maximum TA allowable (without exceeding TC) at various airflows. The power dissipation (P) is calculated by using the typical supply currents at 5V as shown in Table 5-2.
Similarly, when using the analog outputs, the maximum TA allowed is a function of Ifs. The power is given by the following equation, which can then be used in calculating the maximum TA:
P = 5V * (ICCNT + (3 * Ifs + 6))
Table 6-3. Thermal Resistances (°C/W)
θCA Versus Airflow-ft/min (m/sec)
| Package | 0 (0) | 200 (1.01) | 400 (2.03) | 600 (3.04) | 800 (4.06) | 1000 (5.07) |
|---|---|---|---|---|---|---|
| 132-Lead PQFP | 26.0 | 17.5 | 14.0 | 11.5 | 9.5 | 8.5 |
Table 6-4. Maximum TA at Various Airflows (°C)
TA Versus Airflow-ft/min (m/sec)
| Package | Frequency (MHz) | 0 (0) | 200 (1.01) | 400 (2.03) | 600 (3.04) | 800 (4.06) | 1000 (5.07) |
|---|---|---|---|---|---|---|---|
| 132-Lead PQFP | 28 | 71 | 79 | 82 | 84 | 86 | 87 |
| 132-Lead PQFP | 45 | 59 | 71 | 75 | 79 | 82 | 83 |
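The relation TA = TC - P * θCA can be evaluated directly from the Table 6-3 figures. The following Python sketch does that for the upper case-temperature limit of 95°C. The 1.0 W dissipation is a hypothetical placeholder, not a data sheet value, and the function name is an assumption made for illustration.

```python
# Typical case-to-ambient thermal resistance in degrees C per watt (Table 6-3),
# keyed by airflow in ft/min.
THETA_CA = {0: 26.0, 200: 17.5, 400: 14.0, 600: 11.5, 800: 9.5, 1000: 8.5}
TC_MAX = 95.0  # upper limit of the specified case temperature, degrees C


def max_ambient(power_watts: float, airflow_ft_min: int) -> float:
    """Highest ambient temperature that keeps TC at or below TC_MAX."""
    return TC_MAX - power_watts * THETA_CA[airflow_ft_min]


# Hypothetical 1.0 W dissipation; substitute the figure derived from Table 5-2.
still_air = max_ambient(1.0, 0)
forced_air = max_ambient(1.0, 1000)
```

With the placeholder power, the allowable ambient rises as airflow increases, which is the same trend Table 6-4 shows.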
82750PB
PIXEL PROCESSOR
• 25 MHz Clock with Single Cycle Execution
• Zero Branch Delay
• Wide Instruction Word Processor
• 512 x 48-Bit Instruction RAM
• 512 x 16-Bit Data RAM
• 16 General-Purpose Registers
• Two Internal 16-Bit Buses
• ALU with Dual-Add-With-Saturation Mode
• Variable Length Sequence Decoder
• Pixel Interpolator
• High Performance Memory Interface
  - 32-Bit Memory Data Bus
  - 50 MBytes per Second Maximum
  - 25 MBytes per Second with Standard VRAMs or DRAMs
• 4 Gbyte Linear Address Space
• 132-Pin PQFP
• Compatible with the 82750PA
Intel's 82750PB is a 25 MHz wide instruction processor that generates and manipulates pixels. When paired
with its companion chip, the 82750DB, and used to implement a DVI Technology video subsystem, the
82750PB provides real time (30 images/sec) pixel processing, real time video compression, interactive motion
video playback and real time video effects.
Real time pixel manipulations, including 30 images/sec video compression, are supported by the 25 MHz
instruction rate. On-chip instruction RAM provides programmability for execution of a wide range of algorithms
that support motion video decompression, text, and 2D and 3D graphics. Inner loops are optimized with the
integration of sixteen 16-bit quad ported registers, on-chip DRAM, and two loop counters that provide zero
delay two-way branching "free" in any instruction. Two 16-bit internal buses enable two parallel register
transfers on each 82750PB instruction, contributing to the real time performance of the video processing.
Another feature that adds to the processing power of the 82750PB is the 16-bit ALU, which includes an 8-bit
dual-add-with-saturate operation critical for pixel arithmetic. Other specialized features for pixel processing
include a 2D pixel interpolator for image processing functions and a variable length sequence decoder for
decoding compressed data.
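To illustrate why a dual-add-with-saturate operation suits pixel arithmetic, the Python sketch below treats a 16-bit word as two independent 8-bit lanes and clamps each lane's sum. It rests on an assumption the overview does not state: that the lanes are unsigned and saturate at 255. The function name is hypothetical, and the actual 82750PB ALU behavior is defined in the architecture chapter, not here.

```python
def dual_add_saturate(a: int, b: int) -> int:
    """Add two 16-bit words lane by lane, clamping each 8-bit lane at 0xFF."""
    low = min((a & 0xFF) + (b & 0xFF), 0xFF)
    high = min(((a >> 8) & 0xFF) + ((b >> 8) & 0xFF), 0xFF)
    return (high << 8) | low


no_overflow = dual_add_saturate(0x1020, 0x0304)  # lanes stay below the clamp
both_clamped = dual_add_saturate(0x80F0, 0x9020)  # both lanes exceed 0xFF
```

Clamping keeps a bright pixel from wrapping around to a dark value, and no carry passes from the low lane into the high lane.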
The 82750PB is implemented using Intel's low-power CHMOS IV Technology and is packaged in a 132-lead
space-saving, plastic quad flat pack (PQFP) package.
[Subsystem diagram: labels in the drawing include Video Input, VRAM, Serial Shift Register, ALPHA[7:0], VRESET#, HRESET#, R/V, G/Y, B/U, CSYNC, Video Output, and Video Mixer/Display Device. Drawing 240854-1.]
82750PB Subsystem Diagram
Intel Corporation assumes no responsibility for the use of any circuitry other than circuitry embodied in an Intel product. No other circuit patent
licenses are implied. Information contained herein supersedes previously published specifications on these devices from Intel.
February 1991
© INTEL CORPORATION, 1991
Order Number: 240854-003
82750PB Pixel Processor
CONTENTS
1.0 82750PB PIN DESCRIPTION
  Pinout
  Quick Pin Reference
2.0 ARCHITECTURE
  Overview
  Registers
  ALU
  Barrel Shifter
  Data RAM
  Loop Counters
  Microcode RAM
  Horizontal Line Counter
  Field Counter
  Input FIFOs
  Output FIFOs
  Statistical Decoder
  Pixel Interpolator
    Mode Select
    Reset
    Pairing
    Phase
    Pipelining
    Reserved
  Signature Register
  Display Format Registers
3.0 HARDWARE INTERFACE
  VRAM Interface
    VRAM Accesses
    Fast VRAM Cycles
    VBUS Codes
    Priority
    VRAM Pointers
    Shadow Copy
  Host Interface
    Host Register Access
    Host VRAM Access
    Host External Access
  Host Register Address Mapping
  Initializing the 82750PB
  Performance Monitoring
  Host/VRAM Timing Diagrams
4.0 MICROCODE INSTRUCTION FORMAT
  Overview
  Instruction Sequencing
  Instruction Word Field Descriptions
    NADDR-Next Instruction Address Field
    CFSEL-Condition Flag Select Field
    ASRC-A Bus Source Select Field
    ADST-A Bus Destination Select Field
    BSRC-B Bus Source Select Field
    BDST-B Bus Destination Select Field
    CNT-Decrement Loop Counter Bit
    LIT-Literal Select Bit
    SHFT-Shift Control Field
    ALUSS-ALU Source Select Bits
    ALUOP-ALU Operation Code Field
    LC-Loop Counter Select Bit
5.0 ELECTRICAL DATA
  D.C. Characteristics
  A.C. Characteristics
  Output Delay and Rise Time Versus Load Capacitance
6.0 MECHANICAL DATA
  Packaging Outlines and Dimensions
  Package Thermal Specifications
FIGURES
Figure 1-1. 82750PB Pinout
Figure 1-2. 82750PB Functional Signal Groupings
Figure 2-1. 82750PB Block Diagram
Figure 2-2. Input FIFO Control Register
Figure 2-3. Output FIFO Control Register
Figure 2-4. Statistical Decode CONTROL Register
Figure 2-5. VRAM Bitstream Decoding Addresses
Figure 2-6. Pixel Interpolation
Figure 2-7. Sequential-2D Pixel Interpolation
Figure 2-8. Pixel Interpolator Control Register
Figure 2-9. Pixel Pair Phases
Figure 3-1. Access State Diagram
Figure 3-2. Cyclic Ordering of FIFOs
Figure 3-3. VRAM Addressing
Figure 3-4. VRAM Read and Write Cycles
Figure 3-5. VRAM Transfer and Refresh Cycles
Figure 3-6. Host Register Read and Write Cycles
Figure 3-7. Host External Cycles
Figure 3-8. Host VRAM Read and Write Cycles
Figure 4-1. Literal Field Mapping onto a Bus
Figure 4-2. 82750PB Instruction Word Format
Figure 5-1. Clock Waveforms
Figure 5-2. Output Waveforms
Figure 5-3. Input Waveforms
Figure 5-4. CLKOUT Waveforms
Figure 5-5. Typical Output Valid Delay Versus Load Capacitance under Worst Case Conditions
Figure 5-6. Typical Output Rise Time Versus Load Capacitance under Worst Case Conditions
Figure 6-1. Principal Dimensions of the 82750PB in the 132-Lead PQFP Package
Figure 6-2. Detailed Dimensions of the 82750PB in the 132-Lead PQFP-Molding Details
Figure 6-3. Detailed Dimensions of the 82750PB in the 132-Lead PQFP-Terminal Details
Figure 6-4. 132-Lead PQFP Mechanical Package Detail-Protective Bumper
Figure 6-5. 132-Lead PQFP Mechanical Package Detail-Typical Lead
TABLES
Table 1-1. Pin Cross Reference by Pin Name
Table 1-2. Pin Cross Reference by Location
Table 1-3. Pin Descriptions
Table 1-4. Output Pins
Table 1-5. Input Pins
Table 1-6. Input/Output Pins
Table 2-1. Bit Assignment for cc Register
Table 2-2. ALU Opcodes
Table 2-3. Circular Buffer Register
Table 2-4. Sample Code Description Table
Table 2-5. Decoded Values
Table 2-6. END Mode Decoded Values
Table 2-7. END Flag Decoded Values
Table 2-8. Packed 3-Bit Field Decoded Values
Table 2-9. VRAM Bitstream Decoded Values
Table 2-10. Decoding Symbols
Table 2-11. Mode Select Operating Modes
Table 2-12. Pipelining Delay for Sequential-2D NON-PAIR Mode
Table 2-13. Signature Values
Table 2-14. Display Registers
Table 3-1. VRAM Interface Signals
Table 3-2. 82750PB VRAM Access States
Table 3-3. VBUS Codes
Table 3-4. Priority of VRAM Operations
Table 3-5. Host Interface Signals
Table 3-6. Host, VRAM and External Device Signals
Table 3-7. 82750PB Host Transaction States
Table 3-8. Host Cycle Types
Table 3-9. Host Address Mapping
Table 3-10. Bit Assignments for Microcode Processor CONTROL Register
Table 3-11. Bit Assignments for INTERRUPT FLAG Register
Table 3-12. Bit Assignments for PROCESSOR STATUS Register
Table 3-13. 82750PB A Bus Source/Destination Register Mapping
Table 3-14. 82750PB B Bus Source/Destination Register Mapping
Table 3-15. VRAM Pointer RAM Mapping
Table 4-1. Microcode Next Instruction Selection
Table 4-2. PC Load Example
Table 4-3. Condition Flag Select Field Assignments
Table 4-4. SHIFT Control Field Coding
Table 4-5. 82750PB Source/Destination Coding
Table 5-1. Absolute Maximum Requirements
Table 5-2. D.C. Characteristics
Table 5-3. A.C. Characteristics at 25 MHz
Table 6-1. PQFP Symbol List
Table 6-2. Intel Case Outline Drawings for PQFP at 0.025-Inch Pitch
Table 6-3. Thermal Resistances (°C/W)
Table 6-4. Maximum TA at Various Airflows
1.0 82750PB PIN DESCRIPTION
Pinout
[Figure 1-1 drawing: top view of the 132-lead package with pins numbered 1 through 132. The pin assignments are listed in Tables 1-1 and 1-2. Drawing 240854-2.]
Figure 1-1. 82750PB Pinout
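Table 1-1. Pin Cross Reference by Pin Name
| Pin Name | Location | Pin Name | Location | Pin Name | Location | Pin Name | Location |
|---|---|---|---|---|---|---|---|
| A2 | 71 | BE3# | 41 | D30 | 119 | VCC | 100 |
| A3 | 72 | CLKIN | 47 | D31 | 118 | VCC | 104 |
| A4 | 73 | CLKOUT | 114 | HALEN# | 55 | VSS | 94 |
| A5 | 74 | D0 | 28 | HALT# | 31 | VCC | 109 |
| A6 | 77 | D1 | 27 | HBUSEN# | 36 | VCC | 116 |
| A7 | 78 | D2 | 26 | HINT# | 30 | VCC | 123 |
| A8 | 79 | D3 | 24 | HRAM# | 58 | VCC | 127 |
| A9 | 80 | D4 | 23 | HRDY# | 38 | VCC | 132 |
| A10 | 81 | D5 | 22 | HREG# | 40 | VSS | 1 |
| A11 | 83 | D6 | 20 | HREQ# | 56 | VSS | 32 |
| A12 | 84 | D7 | 19 | MRDY# | 60 | VSS | 34 |
| A13 | 85 | D8 | 18 | MREQ# | 59 | VSS | 39 |
| A14 | 86 | D9 | 16 | NXTFST# | 61 | VSS | 48 |
| A15 | 87 | D10 | 15 | PMFRZ# | 70 | VSS | 57 |
| A16 | 88 | D11 | 14 | RESET# | 63 | VSS | 66 |
| A17 | 90 | D12 | 13 | RFSH# | 62 | VSS | 68 |
| A18 | 92 | D13 | 12 | TEST# | 69 | VSS | 76 |
| A19 | 93 | D14 | 11 | TRNFR# | 37 | VSS | 89 |
| A20 | 95 | D15 | 9 | VBUS[0] | 54 | VSS | 99 |
| A21 | 96 | D16 | 8 | VBUS[1] | 53 | VSS | 101 |
| A22 | 97 | D17 | 7 | VBUS[2] | 52 | VSS | 108 |
| A23 | 102 | D18 | 6 | VBUS[3] | 50 | VSS | 115 |
| A24 | 103 | D19 | 5 | VCC | 2 | VSS | 117 |
| A25 | 105 | D20 | 4 | VCC | 33 | VSS | 124 |
| A26 | 106 | D21 | 3 | VCC | 35 | VSS | 131 |
| A27 | 107 | D22 | 130 | VCC | 45 | VSS | 10 |
| A28 | 110 | D23 | 129 | VCC | 51 | VSS | 17 |
| A29 | 111 | D24 | 128 | VCC | 65 | VSS | 21 |
| A30 | 112 | D25 | 126 | VCC | 67 | VSS | 25 |
| A31 | 113 | D26 | 125 | VCC | 75 | VSS | 29 |
| BE0# | 44 | D27 | 122 | VCC | 82 | VSS | 46 |
| BE1# | 43 | D28 | 121 | VCC | 91 | VSS | 64 |
| BE2# | 42 | D29 | 120 | VCC | 98 | WE# | 49 |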
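Table 1-2. Pin Cross Reference by Location
| Location | Pin Name | Location | Pin Name | Location | Pin Name | Location | Pin Name |
|---|---|---|---|---|---|---|---|
| 1 | VSS | 34 | VSS | 67 | VCC | 100 | VCC |
| 2 | VCC | 35 | VCC | 68 | VSS | 101 | VSS |
| 3 | D21 | 36 | HBUSEN# | 69 | TEST# | 102 | A23 |
| 4 | D20 | 37 | TRNFR# | 70 | PMFRZ# | 103 | A24 |
| 5 | D19 | 38 | HRDY# | 71 | A2 | 104 | VCC |
| 6 | D18 | 39 | VSS | 72 | A3 | 105 | A25 |
| 7 | D17 | 40 | HREG# | 73 | A4 | 106 | A26 |
| 8 | D16 | 41 | BE3# | 74 | A5 | 107 | A27 |
| 9 | D15 | 42 | BE2# | 75 | VCC | 108 | VSS |
| 10 | VSS | 43 | BE1# | 76 | VSS | 109 | VCC |
| 11 | D14 | 44 | BE0# | 77 | A6 | 110 | A28 |
| 12 | D13 | 45 | VCC | 78 | A7 | 111 | A29 |
| 13 | D12 | 46 | VSS | 79 | A8 | 112 | A30 |
| 14 | D11 | 47 | CLKIN | 80 | A9 | 113 | A31 |
| 15 | D10 | 48 | VSS | 81 | A10 | 114 | CLKOUT |
| 16 | D9 | 49 | WE# | 82 | VCC | 115 | VSS |
| 17 | VSS | 50 | VBUS[3] | 83 | A11 | 116 | VCC |
| 18 | D8 | 51 | VCC | 84 | A12 | 117 | VSS |
| 19 | D7 | 52 | VBUS[2] | 85 | A13 | 118 | D31 |
| 20 | D6 | 53 | VBUS[1] | 86 | A14 | 119 | D30 |
| 21 | VSS | 54 | VBUS[0] | 87 | A15 | 120 | D29 |
| 22 | D5 | 55 | HALEN# | 88 | A16 | 121 | D28 |
| 23 | D4 | 56 | HREQ# | 89 | VSS | 122 | D27 |
| 24 | D3 | 57 | VSS | 90 | A17 | 123 | VCC |
| 25 | VSS | 58 | HRAM# | 91 | VCC | 124 | VSS |
| 26 | D2 | 59 | MREQ# | 92 | A18 | 125 | D26 |
| 27 | D1 | 60 | MRDY# | 93 | A19 | 126 | D25 |
| 28 | D0 | 61 | NXTFST# | 94 | VSS | 127 | VCC |
| 29 | VSS | 62 | RFSH# | 95 | A20 | 128 | D24 |
| 30 | HINT# | 63 | RESET# | 96 | A21 | 129 | D23 |
| 31 | HALT# | 64 | VSS | 97 | A22 | 130 | D22 |
| 32 | VSS | 65 | VCC | 98 | VCC | 131 | VSS |
| 33 | VCC | 66 | VSS | 99 | VSS | 132 | VCC |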
[Figure 1-2 drawing: the 82750PB with its signals arranged by function. Group labels in the drawing: address bus A[31:9]; shared address bus A[8:2]; byte enable bus BE#[3:0]; data bus D[31:0]; VRAM interface; host interface; interface between host and VDP; VDP communication bus VBUS[3:0]; microcode signals; power connections VCC and VSS. Signals shown: CLKIN, RESET#, CLKOUT, MREQ#, TRNFR#, RFSH#, NXTFST#, MRDY#, WE#, HREQ#, HREG#, HRAM#, HALEN#, HRDY#, HBUSEN#, HINT#, HALT#, PMFRZ#. Drawing 240854-3.]
Figure 1-2. 82750PB Functional Signal Groupings
Quick Pin Reference
Table 1-3 provides descriptions of 82750PB pins.
Table 1-3. Pin Descriptions
| Symbol | Type | Name and Function |
|---|---|---|
| CLKIN | I | CLKIN is a 1X CLOCK INPUT that provides the fundamental timing for the 82750PB. One cycle of CLKIN is denoted as one T-cycle. |
| RESET# | I | The 82750PB is reset and initialized by holding this signal active for at least ten T-cycles. Refer to the Initializing the 82750PB section in Chapter 3. |
| HREQ# | I | The HOST REQUEST signal is a request from the host CPU to perform a read or write access to either registers on the 82750PB, an external device, or to VRAM shared by the 82750PB and the host. The type of access that is requested is determined by the host access definition signals: HREG#, HRAM#, and WE#. |
| HREG#, HRAM# | I | The HOST REGISTER and HOST RAM signals, when validated by HREQ#, are used to define three host access cycles. HRAM# active indicates the host is requesting a VRAM read or write cycle. HREG# active indicates that the host is requesting an 82750PB register read or register write cycle. When both signals are inactive, a host external cycle is requested. |
| HBUSEN# | O | HOST BUS ENABLE is asserted by the 82750PB at the start of a host access to indicate that the 82750PB Address and Data buses (A[31:2], BE#[3:0], and D[31:0]) have been tri-stated. This allows the host to drive the same buses either for accessing shared VRAM or the 82750PB internal registers. |
| HALEN# | I | The HOST ADDRESS LATCH ENABLE signal is used to indicate to the 82750PB that the host has asserted a valid address (A[31:2], BE#[3:0]) and write enable (WE#). |
| HRDY# | O | HOST READY is asserted by the 82750PB at the end of a host access to indicate that the access cycle is ready for data transfer. For a host write cycle, HRDY# indicates that the 82750PB is ready to accept data from the host. For a host VRAM write cycle, HRDY# indicates that the VRAM has latched the data from the host. For a host read cycle, HRDY# indicates that output data from the 82750PB or VRAM is ready to be latched by the host. |
| HINT# | O | HOST INTERRUPT: This output is asserted when an interrupt condition is detected by the 82750PB, and the enable bit in the PROCESSOR CONTROL register corresponding to that interrupt condition is set to a ONE. HINT# stays active until the host CPU reads the INTERRUPT STATUS register. If an interrupt condition that is enabled occurs during the same cycle that the INTERRUPT STATUS register is being read, HINT# remains active. |
| D[31:0] | I/O | The DATA BUS is used to transfer data between: 1. The 82750PB and VRAM, and 2. The Host CPU and internal 82750PB registers. During host VRAM accesses, this bus is tri-stated to allow the host to share the same VRAM data bus. During host accesses to internal 82750PB registers all 32 bits are used for data transfer. |
| A[31:9], A[8:2] | O, I/O | The ADDRESS BUS is shared between the 82750PB and the host for addressing VRAM. This 30-pin bus addresses 32-bit double words in VRAM. Byte Enable signals are used to address individual bytes or words within a double word in VRAM. In addition, the address for host accesses to internal 82750PB registers is communicated to the 82750PB using the lower seven pins, A[8:2], and the BE# pins. During host access cycles to either VRAM or 82750PB internal registers, A[31:2] are tri-stated. For internal register accesses, as indicated by HREG# being low, the lower seven bits, A[8:2], are used as the host address input. |
| CLKOUT | O | The CLOCK OUTPUT signal is one of the two internal clocks and is synchronized with CLKIN. It is always driven and will have a 50% duty cycle. |
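The host access types defined by HREQ#, HREG#, HRAM#, and WE# can be summarized as a small decoder. The Python sketch below assumes that every pin with a # suffix is active low, so a level of 0 means asserted; that matches the table's remark that HREG# is low for register accesses, but it is still an assumption for the other pins. The function name and return strings are hypothetical, and the case where HREG# and HRAM# are both active is treated as an error because the table does not describe it.

```python
from typing import Optional


def decode_host_access(hreq_n: int, hreg_n: int, hram_n: int, we_n: int) -> Optional[str]:
    """Classify a host request from pin levels (0 = asserted, assuming active-low pins)."""
    if hreq_n != 0:
        return None  # HREQ# not asserted, so HREG#/HRAM# are not validated
    reg, ram = hreg_n == 0, hram_n == 0
    if reg and ram:
        raise ValueError("HREG# and HRAM# both active is not described in the table")
    target = "vram" if ram else "register" if reg else "external"
    direction = "write" if we_n == 0 else "read"
    return f"{target} {direction}"


vram_read = decode_host_access(hreq_n=0, hreg_n=1, hram_n=0, we_n=1)
register_write = decode_host_access(hreq_n=0, hreg_n=0, hram_n=1, we_n=0)
external_read = decode_host_access(hreq_n=0, hreg_n=1, hram_n=1, we_n=1)
```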
Table 1-3. Pin Descriptions (Continued)
| Symbol | Type | Name and Function |
|---|---|---|
| BE#[3:0] | I/O | The BYTE ENABLE BUS is shared by the 82750PB and the host for addressing VRAM down to the byte level. The correspondence between the four Byte Enable pins and the D[31:0] pins is: BE#[3]-D[31:24], BE#[2]-D[23:16], BE#[1]-D[15:8], and BE#[0]-D[7:0]. During VRAM read cycles, the 82750PB enables all four bytes. During write cycles the 82750PB only enables those bytes that are to be written. Bytes that are not enabled are not to be altered in VRAM. During host accesses to 82750PB on-chip registers, the BE#[0] pin is used as an input to select whether the even or odd word is being accessed; the double word address is provided by the host on the A[8:2] pins. BE#[0] = 0 indicates that data is transferred on D[15:0]. BE#[0] = 1 indicates that data is transferred on D[31:16]. |
| MREQ# | O | MEMORY REQUEST is asserted for the first cycle, T1, of each VRAM cycle. |
| TRNFR#, RFSH# | O | The MEMORY CYCLE DEFINITION SIGNALS: Transfer, Refresh and Write Enable are asserted at the same time as MREQ#, but stay active for the entire VRAM cycle. TRNFR# active indicates a VRAM transfer cycle. RFSH# active indicates a VRAM refresh cycle. If neither TRNFR# nor RFSH# is active, a VRAM data read or write cycle is requested. |
| WE# | I/O | The WRITE ENABLE pin is used as an output during an 82750PB/VRAM cycle to drive the WE# signal, which defines the access as a VRAM read cycle (when inactive) or write cycle (when active). During Host/VRAM and Host External cycles, the 82750PB tri-states this pin to allow the host to drive the VRAM write enable signals directly. During Host/register cycles, this pin is used as an input for the Host Write Enable signal to determine whether the host is reading or writing the 82750PB register. |
| NXTFST# | O | The NEXT FAST signal indicates that the following VRAM cycle can be performed with a page-mode or bank-interleaved access. This signal is asserted during the first of a pair of VRAM cycles that is guaranteed to be within the same VRAM page and in opposite banks: a pair of accesses to two sequential double words in VRAM at addresses Even Address and Even Address + 1. In other words, A[2] is a zero for the first cycle and a one for the second cycle. |
| MRDY# | I | The MEMORY READY input indicates that the VRAM cycle has progressed to the point where it is ready to perform the data transfer. For a VRAM read cycle, the VRAM data can be latched by the transition of MRDY# to an active state. For a VRAM write cycle, MRDY# indicates that the data has been latched into the VRAMs. |
| VBUS[3:0] | I | The VDP COMMUNICATION BUS is used to communicate from the 82750DB to the 82750PB. Codes sent over this bus indicate interrupt requests, transfer requests, and status information. Since the 82750DB and 82750PB run asynchronously, the VBUS signals are sampled on the falling edge of CLKIN and compared with the previous sample. For a VBUS code to be detected by the 82750PB, it must be valid for two successive samples. |
| HALT# | I | The HALT signal causes the microcode processor on the 82750PB to halt prior to executing the next instruction. This signal does not halt the VRAM interface. The Halt signal will allow the design of a hardware emulator for the 82750PB based on an 82750PB chip. |
| TEST# | I | The TEST signal is used for test purposes only and must remain high for normal operation. |
Table 1-3. Pin Descriptions (Continued)

Symbol          Type   Name and Function
PMFRZ#          O      The PERFORMANCE MONITORING AND FREEZE signal is toggled by specific microcode instructions and can be used to determine the time required to execute certain sections of microcode.
VCC             I      POWER pins provide the +5V D.C. supply input.
VSS             I      GROUND pins provide the 0V connection to which all inputs and outputs are referenced.
Table 1-4. Output Pins

Name            Active Level   When Floated
CLKOUT          High           Always Driven
HBUSEN#         Low            Reset*
HRDY#           Low            Reset*
HINT#           Low            Reset*
A[31:9]         High           Reset*, Host Cycle
MREQ#           Low            Reset*
TRNFR#, RFSH#   Low            Reset*
NXTFST#         Low            Reset*
PMFRZ#          Low            Reset*

*The reset state is caused by RESET# being active low.

Table 1-5. Input Pins

Name            Active Level   Synchronous/Asynchronous
CLKIN           High           Synchronous
RESET#          Low            Asynchronous
HREQ#           Low            Asynchronous*
HREG#           Low            Synchronous
HRAM#           Low            Synchronous
MRDY#           Low            Synchronous
VBUS[3:0]       High           Asynchronous
HALT#           Low            Synchronous
HALEN#          Low            Asynchronous*

*Can be programmed to accept synchronous inputs.

Table 1-6. Input/Output Pins

Name            Active Level   When Floated          Synch/Async
D[31:0]         High           Reset*, Host Cycle    Synchronous
A[8:2]          High           Reset*, Host Cycle    Synchronous
BE#[3:0]        Low            Reset*, Host Cycle    Synchronous
WE#             Low            Reset*, Host Cycle    Synchronous

*The reset state is caused by RESET# being active low.

All output pins are floated when RESET# is active low.
2.0 ARCHITECTURE
Overview
The 82750PB includes a wide instruction word processor that comprises a number of processing, storage, and input/output elements. The wide instruction word architecture allows a number of these elements to operate in parallel. The 82750PB executes one instruction every internal clock cycle or T-cycle. The various elements are connected via two 16-bit buses, the A bus and B bus, as shown in Figure 2-1. During each instruction execution cycle, data can be transferred from a bus source to a bus destination element on both buses.
Registers
(rN; N = 0-15)
There are 16 general-purpose data registers, each 16 bits wide, that are connected to both the A bus and B bus as both sources and destinations. These registers are designated r0-r15. All the registers are functionally identical except r0, which also includes logic for bit shifting and byte swapping. A register can source both the A bus and the B bus in the same cycle. A register cannot be the destination of both the A bus and the B bus in a single instruction.
Because the registers are doubly latched, the same register may be both a source and destination in the same cycle. The result is that the data in the register prior to the current cycle will be driven on the source bus, and the data on the destination bus will be latched into the register at the end of the cycle.
Register r0 has additional logic to allow bit shifting and byte swapping. The value in r0 can be shifted left or right one bit position per instruction cycle. For a right shift, the new MSB is equal to the old MSB; in other words, the value is sign-extended. For left shifting, the new LSB is equal to zero. r0 cannot be shifted and loaded in the same instruction. Byte swapping, on the other hand, only occurs when r0 is being loaded with a value from the A bus or B bus. Byte swapping causes the most significant byte and the least significant byte of the 16-bit value being loaded into r0 to be interchanged. Refer to Chapter 4 for a description of the SHFT microcode field that controls the shifting and swapping operations in r0.
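As an illustration of the r0 behavior described above, the following Python sketch models the one-bit shifts and the byte swap on a 16-bit value. The function names are hypothetical and exist only for this example; on the chip these operations are selected by the SHFT microcode field.

```python
MASK16 = 0xFFFF  # r0 is 16 bits wide


def r0_shift_right(value):
    """One-bit right shift; the new MSB equals the old MSB (sign extension)."""
    value &= MASK16
    return (value >> 1) | (value & 0x8000)


def r0_shift_left(value):
    """One-bit left shift; the new LSB is zero."""
    return (value << 1) & MASK16


def r0_byte_swap(value):
    """Interchange the most and least significant bytes of a 16-bit value."""
    value &= MASK16
    return ((value << 8) & 0xFF00) | (value >> 8)
```

For instance, shifting 8001 hex right keeps the sign bit set, while shifting it left drops the MSB and clears the LSB.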
[Block diagram. Labels: SEQUENCER, DATA RAM, MICROCODE RAM, MICROCODE INSTRUCTION, ALU, REGISTER FILE, BARREL SHIFTER, COUNTERS, VRAM, PIXEL INTERPOLATOR, POINTERS, HOST/VRAM INTERFACE; external signals D[31:0], A[31:2], BE#[3:0].]
Figure 2-1. 82750PB Block Diagram
ALU
(alu, cc)
The ALU performs 16-bit arithmetic and logic operations, and can also be operated as two independent 8-bit ALUs for the Dual-Add-with-Saturate operation. There are two fields in the microcode instruction that affect the operation of the ALU: the ALUOP field specifies the operation to be performed, and the ALUSS field specifies the source of the two ALU inputs. Refer to Chapter 4 for further information on these fields.

The two ALU operands either come from values held in the ALU input latches or from "eavesdropping" on the A or B buses. The result of any ALU operation is latched in the ALU output register, alu. In a subsequent instruction this result can be transferred to any A or B destination.

The ALU has four condition flag outputs: CarryOut, Sign, Overflow, and Zero. CarryOut is the carry out of the most significant bit position. Sign is equal to the value of the most significant bit of the result. Overflow is the exclusive-OR of CarryOut and the CarryIn to the most significant bit position of the result. Zero is true (a value of 1) if all 16 bits of the ALU result are equal to zero. CarryOut and Overflow are defined as equal to zero for all logical operations. For most ALU operations, the state of these four condition flags is latched when the operation is complete. There are eight operations (nop, a*, b*, +], -], 0*, prof and int) that are exceptions. These operations are performed without disturbing the condition state of the previous ALU operation.

Microcode routines can read and write the ALU condition flag register, cc. This can be used to save and restore the state of these flags. The bit ordering of the ALU condition flags within cc is given in Table 2-1. A complete list of ALU opcodes is given in Table 2-2.

Table 2-1. Bit Assignments for cc Register

Bit         Condition
Bit 0       False (This bit of the cc is always read as a zero.)*
Bit 1       ALU Carry Out
Bit 2       ALU Overflow
Bit 3       ALU Sign
Bit 4       ALU Zero
Bit 5       Loop Counter Zero*
Bit 6       R0 LSB*
Bit 7       R0 MSB*
Bit 15:8    RESERVED. The state of these bits is undefined when read; write as zeros.

*These are read-only values and are not affected by writes to the cc register.

Table 2-2. ALU Opcodes

Operation                     Mnemonic
No Operation                  nop
Pass a                        a
Pass b                        b
1's complement of a           ~a
1's complement of b           ~b
2's complement of a           -a
2's complement of b           -b
a AND b                       &
(NOT a) AND b                 ~&
a AND (NOT b)                 &~
a OR b                        |
(NOT a) OR b                  ~|
a OR (NOT b)                  |~
a XOR b                       ^
Increment a                   a++
Increment b                   b++
Decrement a                   a--
Decrement b                   b--
a + b                         +
a + b + 1                     ++
a - b                         -
-a + b                        -+
a + b + (Prev. Carry)         +<
a - b - (Prev. Borrow)        -<
-a + b - (Prev. Borrow)       -+<
Dual Add with Sat.            +]
Dual Sub. with Sat.           -]
Zero                          0*
Pass a, Don't Latch Flags     a*
Pass b, Don't Latch Flags     b*
Interrupt Host                int
Perform. Monitor/Profile      prof

The Dual-Add-with-Saturate operation performs independent 8-bit ADDs on the upper and lower bytes of the two ALU operands. The two bytes of the A operand are treated as unsigned binary numbers (00:FF hexadecimal corresponds to 0:255 decimal). The two bytes of the B operand are treated as offset binary numbers with an offset of +128 (00:FF hexadecimal corresponds to -128:127 decimal). The upper and lower byte results are treated as 9-bit offset binary, including the carry output of each byte, with a +128 offset (000:1FF hexadecimal corresponds to -128:383 decimal) and are saturated to a range of 0-255 decimal. A result that is less than zero is set equal to zero (00 hexadecimal) and a result that is greater than +255 is set equal to +255 (FF hexadecimal).

In fact, this operation is symmetric. Either the A operand or the B operand can be defined as the unsigned binary value, and the other operand will be treated as the offset signed binary value.

Dual-subtract-with-saturate is similar to dual-add-with-saturate. It calculates A - B + 128 on each 8-bit half of the two 16-bit inputs, and clamps the results to 0 and 255. This can be viewed as subtracting an offset-binary signed byte (-128 to 127) from an unsigned byte (0 to 255).
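The saturating byte arithmetic described above can be checked with a small Python model. This is only a sketch: the function names are invented for the example, and it assumes each byte lane computes A + B - 128 for the add (the unsigned byte plus the offset-binary byte) and A - B + 128 for the subtract, before clamping to the range 0 to 255.

```python
def _clamp_byte(value):
    """Saturate an intermediate result to the unsigned byte range 0..255."""
    return max(0, min(255, value))


def dual_add_sat(a, b):
    """Model of Dual-Add-with-Saturate on two 16-bit operands.

    Each byte of a is unsigned; each byte of b is offset binary (offset +128).
    """
    hi = _clamp_byte(((a >> 8) & 0xFF) + ((b >> 8) & 0xFF) - 128)
    lo = _clamp_byte((a & 0xFF) + (b & 0xFF) - 128)
    return (hi << 8) | lo


def dual_sub_sat(a, b):
    """Model of Dual-Subtract-with-Saturate: A - B + 128 on each byte."""
    hi = _clamp_byte(((a >> 8) & 0xFF) - ((b >> 8) & 0xFF) + 128)
    lo = _clamp_byte((a & 0xFF) - (b & 0xFF) + 128)
    return (hi << 8) | lo
```

With a = 10F0 hex and b = 8090 hex, the upper byte adds an offset-binary zero and is unchanged, while the lower byte overflows and saturates at FF hex.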
The ALU opcode 'int' generates the MCINT (microcode interrupt) condition. When this condition is detected by interrupt logic in the host CPU interface, and if the Enable MCINT bit in the PROCESSOR CONTROL register is set to a ONE, the host interrupt output, HINT#, will be asserted. Refer to Chapter 3 for further information on the host interface.

The 'prof' opcode activates the PMFRZ# pin, and is primarily used for performance monitoring and/or debugging.

Data RAM
(dramN, *dramN, ++, --; N = 1-4)
The Data RAM holds 512 16-bit words that are accessed using four pointers. To access a value in a particular location, the microcode routine must first load a pointer with the address to be accessed, and then perform a read or write using the same pointer. In parallel with the data RAM access, the pointer can optionally be post-incremented or post-decremented. The four pointers, referred to as dram1-dram4, can be written and read via the A bus. When a dram pointer, which is only 9 bits wide, is read onto the A bus, its upper seven bits are set to zeros.

NOTE:
The width of the dram pointers may change in later versions of the 82750PB. Software should not rely on the width of a pointer to, for example, mask the upper seven bits of a value to zero.

All four pointers can be used to read or write the Data RAM from either the A or B bus. Only one Data RAM access can be performed in any cycle. A Data RAM access is referred to, using C language syntax, as *dram1. The * means "the value pointed to by". As another example, *dram3++ means access the Data RAM using the pointer dram3 and increment dram3. The symbol -- in place of the ++ would indicate auto decrement.
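The pointer-based Data RAM access described in this section can be pictured with the following Python sketch. The class and method names are hypothetical, and the wrap-around of a pointer at 512 entries is an assumption based on the stated 9-bit pointer width; the source does not describe the overflow behavior.

```python
class DataRam:
    """Toy model of the 512-word Data RAM and its four pointers."""

    SIZE = 512  # 512 16-bit words, addressed by 9-bit pointers

    def __init__(self):
        self.words = [0] * self.SIZE
        self.pointers = {1: 0, 2: 0, 3: 0, 4: 0}  # dram1..dram4

    def load_pointer(self, n, address):
        # Assumption: only the low 9 bits of the address are kept.
        self.pointers[n] = address % self.SIZE

    def read(self, n, step=0):
        """Model *dramN (step=0), *dramN++ (step=+1) or *dramN-- (step=-1)."""
        value = self.words[self.pointers[n]]
        self.pointers[n] = (self.pointers[n] + step) % self.SIZE
        return value

    def write(self, n, value, step=0):
        """Write through dramN, then apply the optional post-increment/decrement."""
        self.words[self.pointers[n]] = value & 0xFFFF
        self.pointers[n] = (self.pointers[n] + step) % self.SIZE
```

A typical use is to load a pointer once and then stream values through it with post-increment, as microcode would with *dram3++.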
Barrel Shifter
(shift, shift-r, shift-rl, shift-l)
The barrel shifter performs a single cycle, n-bit left or right shift. The barrel shifter operates independently of the ALU. The three barrel shifter operations are: Shift-r for a right shift with sign extend; Shift-rl for a right shift with zero fill; and Shift-l for a left shift with zero fill. The shift operation is invoked by writing a 4-bit value (the shift amount) to one of three A bus registers, depending on which of the three operations is to be performed. The operand is taken from the B bus, and the result is stored in the barrel shifter output register, Shift. Like the ALU result register, the value in Shift can be read onto the A bus or B bus in the following instruction cycle.

A barrel shifter operation does not affect any of the condition flags.

Loop Counters
(cnt, cnt2)
Two 16-bit loop counters are available to microcode programs for automatically counting iterations of a microcode loop. In parallel with other operations performed in an instruction, either loop counter can be decremented, and a conditional branch can be made based on the loop counter value being equal or not equal to zero. Since the two loop counters can be written and read on the A bus, as cnt and cnt2 respectively, they can also be used for variable storage when not being used as loop counters. The loop counters can be written to and decremented during the same instruction cycle. The value in the counter at the start of the next cycle will be the value written to the counter minus one.

The LC microcode bit determines the loop counter that is selected for decrementing and/or branching in an instruction. The LC microcode bit does not affect the loop counter that is written or read over the A bus, since each loop counter is separately addressable as an A bus source or destination. Refer to Chapter 4 for a description of the CNT-- microcode bit that causes the selected loop counter to be decremented, and for a description of the CFSEL microcode field that is used to perform a conditional branch based on the selected loop counter's value.
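As a sketch of the three barrel shifter operations described in this section, the Python functions below model a 16-bit operand shifted by a 4-bit amount. The function names mirror the register names but are otherwise hypothetical, and masking the shift amount to four bits is an assumption drawn from the stated width of the shift-amount value.

```python
MASK16 = 0xFFFF


def shift_r(operand, amount):
    """Right shift with sign extend."""
    operand &= MASK16
    amount &= 0xF  # the shift amount is a 4-bit value
    signed = operand - 0x10000 if operand & 0x8000 else operand
    return (signed >> amount) & MASK16


def shift_rl(operand, amount):
    """Right shift with zero fill."""
    return (operand & MASK16) >> (amount & 0xF)


def shift_l(operand, amount):
    """Left shift with zero fill."""
    return ((operand & MASK16) << (amount & 0xF)) & MASK16
```

The difference between the two right shifts only shows when the operand's MSB is set: 8000 hex shifted right by four gives F800 hex with sign extension and 0800 hex with zero fill.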
Microcode RAM
(mcode1-3, maddr, pc)
The 82750PB executes instructions stored in an on-chip microcode RAM. This RAM holds 512 instructions and each instruction is 48 bits wide. Normally, to start the microcode processor, the host CPU will load a microcode program into the microcode RAM, point the program counter, pc, to the start of the program, and then release the HALT bit to start executing the microcode program. The microcode processor can also load its own microcode RAM to overlay new routines and, therefore, does not require constant intervention by the host to perform multiple operations.

Writing an instruction into Microcode RAM is done by first loading the three registers mcode3, mcode2, and mcode1 with the three 16-bit words of the instruction (the most significant word goes into mcode1), and then loading the address where the instruction should be written into maddr.

The host CPU can also read the Microcode RAM by first loading the pc with the address of the instruction to be read and then reading the three 16-bit words of the instruction from the mcode1-mcode3 registers. Normally, this would be done by the Host CPU while the 82750PB is halted. Since mcode1-mcode3 hold the instruction pointed to by the pc (i.e., the instruction that is about to be executed), normally reading these three registers from a microcode routine is not useful.

The read registers named mcode1-mcode3 and the write registers also named mcode1-mcode3 are in fact different registers. Writing values into mcode1-mcode3 and then reading the values of mcode1-mcode3 will not read back the same values just written. The read registers hold the instruction stored in the instruction latch (the instruction to be executed). The write registers hold an instruction that is about to be written into microcode RAM.

After writing to maddr to load an instruction into microcode RAM, a one cycle freeze occurs and during the freeze a write to the microcode RAM takes place. The instruction following the write to maddr can either jump to the address just loaded or start loading the mcode1-mcode3 registers with the next instruction to be written.

Here are two examples that illustrate the fact that the 82750PB requires at least one instruction between the write to maddr and the execution of the instruction that is loaded by the write to maddr.
Example 1:
        maddr = ADDR1       /* load instruction */
        jmp ADDR1           /* jump to it, this is the extra inst. required between */
                            /* writing to maddr and executing the loaded inst. */
ADDR1:  ??????????          /* here's where new instruction gets loaded */

Example 2:
        maddr = INST
        nop                 /* extra instruction */
INST:   ??????????          /* instruction gets loaded here */
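For host-side tooling, the split of a 48-bit instruction into the three 16-bit words written to mcode1-mcode3 might look like the Python sketch below. The function name is hypothetical. The source states only that the most significant word goes into mcode1; placing the middle word in mcode2 and the least significant word in mcode3 is an assumption.

```python
def split_instruction(word48):
    """Split a 48-bit microcode instruction into (mcode1, mcode2, mcode3)."""
    if not 0 <= word48 < (1 << 48):
        raise ValueError("instruction must fit in 48 bits")
    mcode1 = (word48 >> 32) & 0xFFFF  # most significant word (per the text)
    mcode2 = (word48 >> 16) & 0xFFFF  # assumed: middle word
    mcode3 = word48 & 0xFFFF          # assumed: least significant word
    return mcode1, mcode2, mcode3
```

The three words would then be written to mcode3, mcode2, and mcode1, followed by the target address in maddr, as described earlier in this section.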
When a microcode routine writes to pc, one more instruction is executed before the jump to the new address takes effect. For example:
        pc = ADDR1
        r0 = r1   jmp ADDR2   /* this instruction gets executed but */
                              /* its jump to ADDR2 is ignored. */
ADDR1:  r3 = r0               /* after this instruction executes r3 = r0 = r1 */
When the host CPU writes to the pc, the instruction at the address that was written is loaded into the mcode1-mcode3 registers and, when the microcode processor is released from its Halt condition, this is the first instruction that will be executed. When the host CPU reads the pc, the result returned is the address of the instruction that will be executed when Halt is released, that is, the address of the instruction held in the mcode1-mcode3 registers.

Horizontal Line Counter
(lcnt)
The 12-bit Horizontal Line Counter is updated by VBUS codes from the 82750DB to track the horizontal display line that is currently being scanned by the 82750DB. The counter is reset by a VODD code and incremented each time an HLINE code is received. A value can also be written into the Horizontal Line Counter, but this is used primarily for testing the 82750PB. The upper four bits will always read zeros.

Field Counter
(fcnt)
The 4-bit field counter is updated by VBUS codes from the display processor to keep track of the field count being displayed by the 82750DB. The counter is incremented each time a VODD code or VEVEN code is received. When reading the field counter, the upper 12 bits will read zeros. This counter will not be initialized upon reset.
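The update rules for the two counters can be summarized in a short Python sketch. Everything about the interface here is hypothetical: VBUS codes are represented as strings because the source does not give their 4-bit encodings in this section, and wrapping at 12 and 4 bits is assumed from the stated counter widths.

```python
class DisplayCounters:
    """Toy model of the Horizontal Line Counter and the Field Counter."""

    def __init__(self):
        self.lcnt = 0  # 12-bit horizontal line counter
        # The hardware does not initialize the field counter on reset;
        # it starts at zero here only to make the sketch deterministic.
        self.fcnt = 0  # 4-bit field counter

    def on_vbus_code(self, code):
        if code == "VODD":
            self.lcnt = 0                          # VODD resets the line counter
            self.fcnt = (self.fcnt + 1) & 0xF      # and counts a field
        elif code == "VEVEN":
            self.fcnt = (self.fcnt + 1) & 0xF      # VEVEN counts a field
        elif code == "HLINE":
            self.lcnt = (self.lcnt + 1) & 0xFFF    # HLINE counts a line
```

Feeding the sequence HLINE, HLINE, VODD, HLINE, VEVEN leaves the line counter at 1 and advances the field counter by 2.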
Input FIFOs
(inN-lo, inN-hi, inN-c, *inN; N = 1, 2)
There are two input channels, referred to as input FIFOs, through which the processor can read pixels or data from VRAM. Each channel automatically fetches 64-bit quad words from VRAM and breaks them into 8-bit bytes or 16-bit words that are read by microcode. Each input FIFO operates independently and can be programmed to automatically increment or decrement through bytes or words in VRAM. The FIFOs are double buffered so that while values are being extracted from one quad word (64 bits), the next quad word is being prefetched from VRAM.

The mode control register for each input FIFO, designated in1-c or in2-c, contains six mode bits as seen in Figure 2-2. The WORD/BYTE bit (bit 0) determines whether the input FIFO is in word mode (WORD/BYTE = 0) or byte mode (WORD/BYTE = 1). In byte mode, the FIFO can start reading on any byte boundary and in word mode on any word boundary.

Bits    15-6           5            4     3       2       1         0
Field   Set to Zeros   BY-32 MODE   CB    PFOFF   AHOLD   INC/DEC   WORD/BYTE
Figure 2-2. Input FIFO Control Register

The INC/DEC bit (bit 1) determines the order that bytes or words are read from VRAM. In INCREMENT mode, with INC/DEC = 0, the FIFO reads from the least significant byte or word to the most significant byte or word of each double word and increments through double words in VRAM. In DECREMENT mode, with INC/DEC = 1, the FIFO reads from most significant byte or word to least significant byte or word within a double word and decrements through double words in VRAM.

The AHOLD bit (bit 2) is used by the address hold mode. When asserted (bit 2 = 1), the automatic address increment/decrement function will be disabled and input FIFOs will not double buffer VRAM data. In other words, at the end of a VRAM cycle, when the FIFO has been updated with 64 bits of VRAM data, the input FIFO will not issue another MREQ# until there is a write to the address-lo registers OR a roll-over/roll-under read access of the input FIFO. If a roll-over/roll-under occurs, then a memory request will be issued to fetch data from the same VRAM location. If there is a write to the address-lo register, the FIFO will then fetch data from the new location.

The PREFETCH OFF bit (bit 3) specifies whether the FIFO will automatically prefetch successive quad words from VRAM or will only fetch a new quad word when a value from that quad word is requested. In PREFETCH-ON mode, bit 3 = 0, the input FIFO prefetches successive quad words from VRAM as necessary to keep its buffer full (either from ascending or descending addresses, depending on the state of the INC/DEC bit). In PREFETCH-OFF mode, the FIFO will still prefetch the first two quad words to fill its buffer (when started at a new address location), but will only fetch a new quad word when a read request is made to the FIFO for a value in the next unfetched quad word.

The CB bit (bit 4) allows circular buffers of sizes 64 Kbytes, 128 Kbytes, or 256 Kbytes to be created in VRAM memory. The buffer size is determined by programming the least significant 3 bits of the circular buffer register (circbuf). To enable this feature, the CB bit has to be set to a 1; then, depending on the buffer size selected, the appropriate address pin that goes off chip will be forced to a 0 (register pointers remain unchanged). Table 2-3 shows the programming combinations of the circular buffer register.

Table 2-3. Circular Buffer Register (circbuf)

Bits [2:0]   Buffer Size   Effect on PB Address Bus (If Function Enabled)
000          Disabled      None
100          256 Kbytes    Address Pin 18 Forced to 0
010          128 Kbytes    Address Pin 17 Forced to 0
001          64 Kbytes     Address Pin 16 Forced to 0

It is important to note that the internal address counters themselves are not affected by the circbuf function. Only the selected external address pin is forced to '0'.

In "BY-32" MODE (bit 5), the pointer increments or decrements by 32 bits, independent of whether the FIFO is in 8-bit pixel mode or 16-bit pixel mode. This mode was added to facilitate microcode that operates on one component of a 32-bit per pixel image.

The standard sequence for initializing an input FIFO is to write to the control register (in-c), the high address (in-hi), and then the low address (in-lo) of the appropriate FIFO. Refer to the access state diagram in Chapter 3. The write to in-lo causes the FIFO to start reading from VRAM. A byte or word is then read from *in. Successive reads from *in will read sequential bytes or words from VRAM. Writing to the control register each time the FIFO is started at a new address is not necessary, except to change the FIFO's mode. Also, if the new address is within the same 64 Kbyte page of VRAM, only the lo-address needs to be written in order to start the FIFO reading from the new address.

If microcode attempts to read a value from an empty input FIFO, the processor is frozen prior to the execution of the instruction, until the FIFO's control logic has fetched another double word from VRAM and extracted the next value. At this point, the processor is released from the frozen state, and the instruction that reads the value is executed. When the processor is frozen waiting for a particular FIFO that isn't yet ready, that FIFO's VRAM access priority is raised above all other FIFOs.

Output FIFOs
(outN-lo, outN-hi, outN-c, *outN, outN++; N = 1, 2)
There are two output channels, referred to as output FIFOs, through which the graphics processor writes pixels or data to VRAM. Each channel automatically collects bytes or words into 64-bit quad words and writes the quad words to VRAM. Each output FIFO operates independently and can be programmed to write bytes or words into sequential addresses in VRAM (either incrementing or decrementing). The FIFOs are double buffered so that while one quad word is waiting to be written to VRAM, the next quad word can be assembled from individual bytes or words.

The mode control register for each output FIFO, designated out1-c or out2-c, contains six mode bits as shown in Figure 2-3. The WORD/BYTE bit (bit 0) determines whether the output FIFO is in word mode (WORD/BYTE = 0) or byte mode (WORD/BYTE = 1). In byte mode the FIFO can start writing on any byte boundary in VRAM and in word mode on any word boundary.
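The six mode bits of an output FIFO control word can be unpacked as sketched below in Python. The bit positions follow Figure 2-3 (Output FIFO Control Register); the dictionary keys and function names are hypothetical and chosen only for this example.

```python
# Bit positions taken from Figure 2-3; the field names are illustrative.
OUTPUT_FIFO_FIELDS = {
    "word_byte": 0,         # 0 = word mode, 1 = byte mode
    "inc_dec": 1,           # 0 = increment, 1 = decrement
    "ahold": 2,             # 1 = hold the quad word address
    "force_lsb_value": 3,   # value forced into the LSB of each byte
    "force_lsb_enable": 4,  # 1 = enable LSB forcing
    "by_32": 5,             # 1 = step the pointer by 32 bits
}


def decode_out_c(control):
    """Unpack an output FIFO control word into its six one-bit fields."""
    if control >> 6:
        raise ValueError("bits 15-6 must be written as zeros")
    return {name: (control >> bit) & 1 for name, bit in OUTPUT_FIFO_FIELDS.items()}


def force_lsb(byte, fields):
    """Apply the FORCE-LSB setting to one byte about to be written to VRAM."""
    if fields["force_lsb_enable"]:
        return (byte & 0xFE) | fields["force_lsb_value"]
    return byte
```

For example, a control word of 011000 binary enables LSB forcing with a forced value of one, so a byte of 40 hex would be written as 41 hex.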
The INC/DEC bit (bit 1) determines the order that bytes or words are written to VRAM. In INCREMENT mode, with INC/DEC = 0, the FIFO writes from the least significant byte or word to the most significant byte or word in a double word and increments through double words in VRAM. In DECREMENT mode, with INC/DEC = 1, the FIFO writes from most significant byte or word to least significant byte or word within a double word and decrements through double words in VRAM.

When the AHOLD bit (bit 2) is set, the output FIFO quad word address is not incremented or decremented. In this mode, the FIFO continues to output to a single quad word in VRAM.

The FORCE-LSB bits (bits 3 and 4) are used to force the least significant bit of each byte written to VRAM to either a zero or a one. This can be used, for example, to force the LSB to the correct polarity when writing to the U bitmap during motion video decompression. In certain display modes for the 82750DB, the LSB of the 8-bit samples in the U or Y bitmap are used to select VIDEO or GRAPHICS display mode for the n x n group of display pixels corresponding to the particular U or Y sample. A one in the FORCE-LSB ENABLE bit (bit 4) enables the forcing; a zero results in normal operation. The FORCE-LSB VALUE bit (bit 3) is used as the value to which the LSB is forced. Whether in byte mode or word mode, the LSB of each byte is forced to the FORCE-LSB value.

In "BY-32" MODE (bit 5), the pointer increments or decrements by 32 bits, independent of whether the FIFO is in 8-bit pixel mode or 16-bit pixel mode. This mode is used to facilitate microcode that operates on one component of a 32-bit per pixel image. The bytes or words that are skipped over will be unchanged in VRAM.

Bits    15-6           5            4                  3                 2       1         0
Field   Set to Zeros   BY-32 MODE   FORCE-LSB ENABLE   FORCE-LSB VALUE   AHOLD   INC/DEC   WORD/BYTE
Figure 2-3. Output FIFO Control Register

The standard sequence for initializing an output FIFO is to write to the control register (out-c), the low address (out-lo), and then the high address (out-hi) of the appropriate FIFO. A series of bytes or words is then written to *out. Refer to the access state diagram in Chapter 3 (Figure 3-1).

In order to flush any remaining data in an output FIFO before changing its VRAM pointer, it is necessary to write to the control register. When pointing to a new location in VRAM, if the new address is within the same 64 Kbyte page of VRAM, only the lo-address needs to be written.

There must be one instruction between the write to the output FIFO's low address and the first write to *outN. Therefore, it is recommended that outN-lo be written before outN-hi. The write to outN-hi ensures that this requirement is met. If only the outN-lo value is being changed, it is still necessary to have one additional instruction before the first write to *outN.

When writing bytes or words to VRAM through an output FIFO, a byte or word can be skipped over by writing to outN++ instead of *outN. When the values are written to VRAM, any byte or word that was skipped will retain its original value in VRAM, and its value is not altered by the VRAM write. This can be used when writing a series of pixels, some of which are "transparent", allowing whatever was behind them to show through.

If the microcode routine attempts to write a value to a full output FIFO, the processor is frozen prior to the execution of the instruction. The processor remains frozen until the FIFO has a chance to write one of the buffered quad words to VRAM. At that point, the processor is released from the frozen state, and the instruction that writes the value is executed. When the processor is frozen, waiting for a particular FIFO that isn't yet ready, that FIFO's VRAM access priority is raised above all other FIFOs.

Statistical Decoder
(stat-lo, stat-hi, stat-c, stat-ram, *stat, *stat#)
The Statistical Decoder (also referred to as the Huffman Decoder) is a specialized input channel that can read a variable-length bit sequence from VRAM and convert it into a fixed-length bit sequence that is read by the microcode processor. In image compression, as well as in other applications such as text compression, certain values occur more frequently than others. A means of compressing this data is to use fewer bits to encode more frequently occurring values and more bits to encode less frequently occurring values. This type of encoding results in a variable-length sequence in which the length of a symbol (the group of bits used to encode a single value) can range, for example, from one bit to sixteen bits.

The statistical code that the statistical decoder can decode is of either of the two forms:

0x                         1x
10x                        01x
110xxx                     001xxx
1110xxxxx           or     0001xxxxx
...                        ...
11111110xxxxxx             00000001xxxxxx
111111110xxxxxx            000000001xxxxxx

Each symbol of a given length (one per line as shown here) consists of a run-in sequence followed by some number of x-bits. The run-in sequence is defined as a series of zero or more ONEs followed by a ZERO or, as in the code on the right above, zero or more ZEROs followed by a ONE. The remainder of this description will use examples of the code on the left. A bit in the decoder's control register determines the polarity of the run-in sequence bits.

In the example on the left, there would be two symbols of length two: 00 and 01. Each x-bit can take on a ZERO or ONE value. The number of x-bits following a run-in sequence can range from zero to six. Since the goal, in general, is to have a few short codes and a larger number of long codes, typically, codes with fewer run-in bits will have fewer x's following. However, this is not a hardware constraint. A code of this form is completely described by a code description table indicating, for each length of run-in sequence (R = the number of ONEs in the run-in), how many x-bits follow the ZERO. The value of R is used as an index into the code description table. Due to the hardware implementation, the number actually stored in the table is 2^X, where X is the number of x-bits.

For the example above, the corresponding code description values are given in Table 2-4.
Table 2-4. Sample Code Description Table

    R      X      2^X (dec.)    2^X (bin.)
    0      1      2             0000010
    1      1      2             0000010
    2      3      8             0001000
    3      5      32            0100000
    ...
    7      6      64            1000000

For each different symbol, including all symbols of the same run-in
length with different x-bit values, the decoder generates a unique
fixed-length, 16-bit value. Some of the decoded values for the sample
code given above are provided in Table 2-5.

Table 2-5. Decoded Values

    Symbol          Decoded Value
    0 0             0
    0 1             1
    10 0            2
    10 1            3
    110 000         4
    110 001         5
    110 010         6
    ...             ...
    110 111         11
    1110 00000      12
    ...             ...
    1110 11111      43
    ...             ...

The x-bits of each symbol are set apart by a space for clarity.

The algorithm for generating a decoded value from a symbol is as
follows: all symbols of a given run-in length are assigned a base
value, B; the value corresponding to a particular symbol is equal to B
plus the binary value of the x-bits in the symbol. The base value B for
a symbol with a run-in length of R is calculated by:

    B(R) = SUM[2^X(r)] with r = 0 to R - 1,

where X(r) corresponds to the X value in the table entry corresponding
to R = r.

For example, in the above code:

    B(0) = 0                            (B(0) is always zero)
    B(1) = 0 + 2 = 2
    B(2) = 0 + 2 + 2 = 4
    B(3) = 0 + 2 + 2 + 8 = 12
    B(4) = 0 + 2 + 2 + 8 + 32 = 44

This is one of the reasons that the table holds 2^X instead of X: the
calculation of B(R) is easier to implement in logic.

Note that the table only goes up to symbols with seven ONEs in the
run-in. For symbols with more than seven ONEs, the value of X and 2^X
for seven ONEs is used; that is, the entry for seven ONEs applies to
all symbols having seven or more ONEs in the run-in sequence. For
example, in the code above a symbol with eight or more ONEs in the
run-in sequence has six x-bits following the ZERO, which is the same as
symbols having seven ONEs.

There are two enhancements that are made to this coding scheme in the
implementation on the 82750PB. These two modes are referred to as END
mode and SHORT mode. If neither END nor SHORT mode is enabled, the
decoding is performed as described above. SHORT mode allows the decoder
to be switched easily to a simpler code format without having to reload
the code description table. In the SHORT form, all symbols have the
same number of x-bits, as though all entries in the table had been
filled with the same value of 2^X. When SHORT mode is invoked, this
value of 2^X is obtained from a field in the statistical decoder's
CONTROL word, instead of from the individual table entries.

END mode is added in recognition of the fact that, for codes with few
symbols, some increase in efficiency is possible by not having to place
a zero at the end of the longest run-in sequence. For example, consider
the code:

    0
    10x
    110x

The END mode allows us to shorten the last symbol to 11x instead of
110x. The trailing ZERO is not required because the decoder has been
told that the maximum length of a run-in is two ONEs. The resulting
symbol set and corresponding decoded values are given in Table 2-6.

Table 2-6. END Mode Decoded Values

    Symbol    Decoded Value
    0         0
    100       1
    101       2
    110       3
    111       4
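The base-value rule and the END shortcut can be checked with a small software model. The following Python sketch is illustrative only and is not the 82750PB logic: the function names, the list-of-2^X representation of the code description table, and the `end_index` parameter are assumptions made for the example. The sample table holds only the four entries printed for R = 0 to 3.

```python
def base_value(run, table):
    """B(R): sum of the 2^X entries for run-in lengths 0 .. R-1.

    Run-in lengths beyond the end of the table reuse the last entry.
    """
    last = len(table) - 1
    return sum(table[min(r, last)] for r in range(run))


def decode_symbol(bits, table, end_index=None):
    """Decode one symbol from a sequence of bits, first bit read first.

    table     -- list of 2^X values, one per run-in length (hypothetical model)
    end_index -- table index carrying the END flag, or None if END mode is off
    """
    stream = iter(bits)
    run = 0
    # Count ONEs until a ZERO, or until the END limit makes the ZERO unnecessary.
    while end_index is None or run <= end_index:
        if next(stream) == 0:
            break
        run += 1
    limit = len(table) - 1 if end_index is None else end_index
    entry = table[min(run, limit)]
    x = 0
    for _ in range(entry.bit_length() - 1):   # X = log2(2^X) x-bits, MSB first
        x = (x << 1) | next(stream)
    return base_value(run, table) + x


def bits(text):
    """Turn a string such as '110010' into a list of integer bits."""
    return [int(ch) for ch in text]


SAMPLE = [2, 2, 8, 32]      # 2^X for R = 0..3 of the sample code
END_CODE = [1, 2]           # code 0, 10x, 11x with the END flag in index 1

sample_values = [decode_symbol(bits(s), SAMPLE)
                 for s in ("00", "101", "110010", "111011111")]
end_values = [decode_symbol(bits(s), END_CODE, end_index=1)
              for s in ("0", "100", "101", "110", "111")]
```

With these inputs the model reproduces the entries of Table 2-5 and Table 2-6 for the symbols listed.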
The number of x-bits must be constant for all symbols of the same
run-in length. Therefore, a code such as:

    0
    10xx
    11xxx    <- NOT CORRECT! Must be 11xx.

is not allowed. The last symbol (11xxx, in this case) uses the same
table entry for 2^X as the next to last symbol (10xx) and, therefore,
the last symbol will be 11xx.

The maximum length of the run-in sequence in END mode is specified by
placing an END flag in the code description table. For example, a code
and the corresponding table are shown in Table 2-7.

Table 2-7. END Flag Decoded Values

                  Table Entries
    Code       Index    2^X    END Bit
    0          0        1      0
    10xx       1        4      0
    110xxx     2        8      1
    111xxx     3        -      -
               4        -      -
               5        -      -
               6        -      -
               7        -      -

The hyphens indicate that those table entries aren't used to decode
this code. Note that the symbol 111xxx has three x-bits because of the
value of 2^X in Index 2; it is not based on the 2^X value in Index 3.

The SHORT and END modes can be invoked simultaneously, resulting in a
code such as:

    0x
    10x
    110x
    111x

with a SHORT-2^X value = 2 (for one x-bit in each symbol) and the END
bit set in Index 2.

Packed binary fields with one to seven bits per field can be read using
the statistical decoder by setting the END bit in Index 0 and by
programming the X value to be N - 1, where N is the number of bits per
field. For example, packed three-bit fields could be decoded as shown
in Table 2-8.

Table 2-8. Packed 3-Bit Field Decoded Values

                  Table Entries
    Code       Index    2^X                       END Bit
    0xx        0        4 (N = 3, so X = 2)       1
    1xx        1        -                         -
               2        -                         -
               3        -                         -
               4        -                         -
               5        -                         -
               6        -                         -
               7        -                         -

The unpacked bits are in reverse order relative to how they are stored
in VRAM. For example, if three-bit values are packed in VRAM, the
pattern 110 in VRAM is read from right to left and gives an unpacked or
decoded value of 3.

The CONTROL register for the statistical decoder (stat-c) is used to
specify the mode to use for decoding, as well as to invoke certain
modes for writing and reading the code description table. Refer to the
bit assignments for this register in Figure 2-4. To write to the code
description table, the WRITE bit (bit 4) is set to a ONE; the starting
table index is reset to zero. Each write to the table causes the index
to increment by one. This index will wrap around from seven back to
zero. For example, to write all eight table entries the user would
write a value of 0x10 to the stat-c register and then write eight 8-bit
values to the register stat-ram. The most significant bit of each 8-bit
value is the END bit, and the lower seven bits are the value of 2^X. To
read the code description table, the TEST bit (bit 5) of the CONTROL
register is set to a ONE. The table entries are then read from the
decoder's data register (*stat). Reads and writes always start at table
entry zero.

NOTE:
When reading the code description table, it is necessary to wait one
instruction time between the write to stat-c and the first read from
*stat. An access diagram showing all legal sequences for read and write
FIFO registers is shown in Chapter 3 (Figure 3-1).
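The table-entry format (END bit in the most significant bit, 2^X in the lower seven bits) and the reversed bit order of packed fields can be illustrated with a short Python sketch. It is a model for checking values by hand, not driver code: the helper names are hypothetical, and writing zero to unused table entries is an assumption, since the text only marks them as unused.

```python
def table_entry(two_pow_x, end=False):
    """Pack one code description table entry: END in bit 7, 2^X in bits 6:0."""
    if two_pow_x not in (1, 2, 4, 8, 16, 32, 64):
        raise ValueError("2^X must be a power of two that fits in seven bits")
    return (0x80 if end else 0x00) | two_pow_x


def write_sequence(entries):
    """Register writes that load the table: set WRITE (0x10), then 8 entries.

    Assumption: unused entries are padded with 0.
    """
    padded = list(entries) + [0] * (8 - len(entries))
    return [("stat-c", 0x10)] + [("stat-ram", e) for e in padded]


def unpack_field(vram_pattern, n_bits):
    """Decoded value of an N-bit packed field: the VRAM bit order reversed."""
    value = 0
    for i in range(n_bits):                    # bit 0 of VRAM is read first
        value = (value << 1) | ((vram_pattern >> i) & 1)
    return value


# Entries for the code 0, 10xx, 110xxx, 111xxx (END flag in index 2).
table_2_7 = [table_entry(1), table_entry(4), table_entry(8, end=True)]

# Entry for packed 3-bit fields: X = N - 1 = 2, END flag in index 0.
packed_3bit = [table_entry(1 << (3 - 1), end=True)]
```

For the packed three-bit case the single meaningful entry is 0x84, and the VRAM pattern 110 unpacks to 3, matching the example in the text.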
The code for reading the eight table entries into the first eight
locations of data RAM would be:

    dram3 = 0
    cnt = 8
    LOOP:
    stat-c = 0x20             /test mode to read the stat-ram (the table)
                              /wait one inst. before first read
    *dram3++ = *stat  cnt--  jcp loop
                              /two inst. loop necessary to wait one inst.
                              /between each read from *stat.

    Bits     Field
    15       POL
    14       RSVD*
    13       CB
    12:8     SVAL
    7        SHORT
    6        END
    5        TEST
    4        WRITE
    3        RSVD*
    2:0      Starting Stat-ram ADDRESS

    * Reserved: write zeros to these bits.

Figure 2-4. Statistical Decode CONTROL Register

END mode is enabled by setting the END bit (bit 6) in the CONTROL
register to a ONE. The SHORT mode is enabled by setting the SHORT bit
(bit 7) in the CONTROL register to a ONE. When in SHORT mode, the five
SVAL bits (bits 12:8) in the CONTROL register are used as the
SHORT-2^X value.
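As a cross-check of the bit assignments in Figure 2-4, the following Python sketch assembles a CONTROL word from its fields. The function and parameter names are hypothetical; only the bit positions come from the figure, and the reserved bits (14 and 3) are always left at zero.

```python
def stat_control(pol=0, cb=0, sval=0, short=0, end=0, test=0, write=0, start=0):
    """Assemble a 16-bit statistical decoder CONTROL word (Figure 2-4 layout)."""
    if not 0 <= sval < 32:
        raise ValueError("SVAL is a five-bit field (bits 12:8)")
    if not 0 <= start < 8:
        raise ValueError("starting stat-ram address is a three-bit field")
    return ((int(bool(pol)) << 15) | (int(bool(cb)) << 13) | (sval << 8)
            | (int(bool(short)) << 7) | (int(bool(end)) << 6)
            | (int(bool(test)) << 5) | (int(bool(write)) << 4) | start)


write_table = stat_control(write=1)               # 0x0010, as used in the text
read_table = stat_control(test=1)                 # 0x0020, as used in the text
short_and_end = stat_control(short=1, end=1, sval=2)
```

The first two values match the 0x10 and 0x20 constants quoted in the text for writing and reading the code description table.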
The POL bit (bit 15) determines the polarity of the run-in sequence
bits. If bit 15 = 0, the ONEs ending in ZERO (e.g., 1110xxx) sequence
is selected. If bit 15 = 1, the ZEROs ending in ONE (e.g., 0001xxx)
sequence is selected.

The CB bit (bit 13) allows circular buffers of sizes 64 Kbytes,
128 Kbytes, or 256 Kbytes to be created in memory, as in the case of
the input FIFO. The choice of buffer size is determined by programming
the least significant 3 bits of the circular buffer register (circbuf).
To enable this feature, the CB bit has to be set to a 1; then,
depending on the buffer size selected, the appropriate address pin that
goes off chip will be forced to a 0 (register pointers remain
unchanged). Table 2-3 shows the programming combinations of the
circular buffer register.

The statistical decoder buffers one quad word read from VRAM so that
the decoding of bits in one 32-bit word and the fetch of the next
32-bit word may overlap. As with the input and output FIFOs, the
decoder has a VRAM pointer associated with it that points to the
location in VRAM from which it is reading data. This pointer increments
twice each time a new quad word is read; there is no decrement mode.
When the least significant word of the decoder's pointer (stat-lo) is
written, any data that had previously been prefetched from VRAM is
ignored, and the decoder fetches one quad word starting from this new
location.

The 82750PB assumes that the statistically encoded bitstream in VRAM
starts with the least significant bit of a double word. That is, the
two LSBs of the address written to stat-lo are ignored.

The statistical decoder decodes data at a rate of one bit per T-cycle.
To a first approximation, the decode time for an N-bit symbol is:

    decode time (in T-cycles) = N + 1

Since it takes at least 64 T-cycles to decode data from one quad word,
which is the time required for eight quad word reads from VRAM, the
decoder should rarely run out of data. Therefore, the above estimate
should very accurately model the actual decoding rate of the
statistical decoder.

The decoding parameters may be changed between symbols by writing to
the CONTROL register and, if necessary, writing new values into the
code description table. The correct procedure for changing the code
type or decode mode is to read the last value from the decoder prior to
the change, using *stat# instead of *stat. This keeps the decoder from
automatically starting to decode the next symbol. At this point, the
code description table and the SHORT and END mode bits can be changed
as desired. The next time the CONTROL register is written with both
TEST = 0 and WRITE = 0, the decoder will begin to decode the next
symbol using the new parameters.

The statistical decoder always begins to read the bitstream from the
least significant bit of the double word found at the starting location
in VRAM. That is, the decoder does not start on a byte or word boundary
as an input FIFO or output FIFO does, but only on double word
boundaries. The bitstream moves from the least significant bit to the
most significant bit of a double word and then to the least significant
bit of the next double word (at the next higher address location). For
the x-bits, the first x-bit read from the bitstream becomes the most
significant bit of the x-bit field when it is interpreted as a binary
number. The example below shows a code definition, a bitstream stored
in VRAM, and the resulting decoded values.
The code definition and range of values for each symbol length are
indicated in Table 2-9.

Table 2-9. VRAM Bitstream Decode Values

    Symbol      Values    Comments
    0           0
    10x         1, 2      100 = 1, 101 = 2
    110xx       3-6       11000 = 3, ..., 11011 = 6
    1110xxx     7-14      1110000 = 7, ..., 1110111 = 14

Decoding starts at address 0 in this example. The two double words at
addresses 0 and 1 are:

    0: 0xAC98E14D
    1: 0x372E74CB

The bitstream in VRAM, with colons dividing the symbols (read from
right to left starting at LSB of address 0) is shown in Figure 2-5.

Table 2-10 lists the symbols, in the order they are encountered in the
bitstream, and the corresponding decoded values.

Table 2-10. Decoding Symbols

    Symbol      Value    Comments
    101         2        Starts at LSB, Address 0, Scanning Left
    100         1
    101         2
    0           0
    0           0
    0           0
    0           0
    1110001     8
    100         1
    100         1
    11010       5
    1110100     11       Spans First and Second Double Word
    11001       4
    0           0
    1110011     10
    101         2
    0           0
    0           0
    1110110     13
    ...         ...

    Address  MSB                                                      LSB
    0        1 : 01011 : 001 : 001 : 1000111 : 0 : 0 : 0 : 0 : 101 : 001 : 101  <- Start Here
             ^ First bit of a symbol, continued at LSB of next double word
    1        0 : 0110111 : 0 : 0 : 101 : 1100111 : 0 : 10011 : 001011

             (Read bitstream from LSB to MSB, right to left.)

Figure 2-5. VRAM Bitstream Decoding Addresses
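The worked example can be replayed in software. The Python sketch below is a behavioral model written for this purpose, with hypothetical names; it assumes the plain code of Table 2-9 (no END or SHORT mode, run-in of ONEs ending in ZERO) and reads each double word from its least significant bit upward.

```python
def bits_lsb_first(words):
    """Yield the bits of each 32-bit double word, least significant bit first."""
    for word in words:
        for i in range(32):
            yield (word >> i) & 1


def decode(words, x_counts, n_symbols):
    """Decode n_symbols values; x_counts[r] is the x-bit count for run-in r."""
    stream = bits_lsb_first(words)
    # Base value B(R) = sum of 2^X(r) for r = 0 .. R-1.
    bases = [sum(1 << x for x in x_counts[:r]) for r in range(len(x_counts))]
    values = []
    for _symbol in range(n_symbols):
        run = 0
        while next(stream) == 1:          # count ONEs up to the terminating ZERO
            run += 1
        x = 0
        for _bit in range(x_counts[run]):
            x = (x << 1) | next(stream)   # first x-bit read is the MSB
        values.append(bases[run] + x)
    return values


X_COUNTS = [0, 1, 2, 3]                   # code: 0, 10x, 110xx, 1110xxx
values = decode([0xAC98E14D, 0x372E74CB], X_COUNTS, 19)
```

The 19 values produced are those listed in Table 2-10, including the symbol 1110100 that spans the two double words.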
Pixel Interpolator
(Pixint-c, Pixint)

The pixel interpolator performs bilinear interpolation on four 8-bit
pixels to generate, in effect, a pixel shifted by a fraction of a pixel
position. See Figure 2-6. If the four pixels have values of A, B, C,
and D, and the horizontal weight and vertical weight are h and v,
respectively, the interpolated value W, ignoring any quantization
effects, is given by:

    W = A*(1-h)(1-v) + B*h(1-v) + C*(1-h)v + D*hv

The values of h and v are even multiples of 1/16. Figure 2-6
illustrates pixel interpolation with an h weight of 6/16 or 3/8 and a
v weight of 10/16 or 5/8.

[Figure 2-6. Pixel Interpolation: source pixels A, B, C, and D at the
corners of a grid with axis marks 0, 4, 8, 12, 15; interpolated pixel W
at h = 6/16, v = 10/16.]

The pixel interpolator can operate in two modes: Sequential-2D and
Random-2D. Sequential-2D mode is used for motion video decoding and
when an array of pixels is interpolated with a common weighting.
Random-2D mode is used either when the pixel arrays to be interpolated
are not adjacent pixels in two rows or when the weight is changed for
each interpolation. (The word random is used here to mean
non-sequential.)

The example in Figure 2-7 shows a single row of pixels being
interpolated in Sequential-2D mode using two rows from the original
(source) bitmap. The h and v weightings are constant for all the
interpolated pixels. In this case, the weights appear to be
approximately h = 10/16 and v = 6/16.

[Figure 2-7. Sequential-2D Pixel Interpolation: first input row
A, B, E, F; interpolated row W, X, Y, Z; second input row C, D, G, H, K.]

The pixel interpolator is pipelined and requires some startup sequence
to fill the pipeline. Once filled, the pixel interpolator generates a
new interpolated pixel every two T-cycles when in Sequential-2D mode.
Source pixels are written into the interpolator as pixel pairs. In the
case above, the pixel pair BA would be written first, followed by the
pixel pair DC. It would seem more natural to refer to the pixel pair as
AB, but because of the way 8-bit pixels are arranged in 16-bit words in
VRAM, the left-most pixel on the screen is the least significant byte
position. For example, if pixel A had a hex value of 0xAA and B had a
value of 0xBB, the 16-bit word containing pixels A and B would have a
value of 0xBBAA.

After the two pixel pairs are written, two pixels are read from the
interpolator. Because the pipeline isn't full yet, these pixels are
read and discarded. This loop of writing two pixel pairs and reading
two output pixels continues four times. The two pixels that are read
this fourth time are the first two valid output pixels: W and X. The
interpolator may also collect output (interpolated) pixels into pixel
pairs. For example, pixels W and X, instead of being output separately,
would be combined into a 16-bit pixel pair XW. Since there are two
possible phase relationships between the input pixel pairs and output
pixel pairs, the desired phasing (either X and W paired or Y and X
paired) can be specified.
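The bilinear formula given for W, and the byte order of a pixel pair, can be sketched in Python as follows. This is an arithmetic illustration only: it ignores quantization, as the formula in the text does, so it returns an exact fractional result rather than the 8-bit value the hardware would produce. The function names are hypothetical, and h and v are passed as numerators over an implied denominator of 16.

```python
def interpolate(a, b, c, d, h, v):
    """W = A(1-h)(1-v) + B*h(1-v) + C(1-h)*v + D*h*v with h, v in sixteenths."""
    if not (0 <= h <= 15 and 0 <= v <= 15):
        raise ValueError("h and v are 4-bit numerators over 16")
    total = (a * (16 - h) * (16 - v) + b * h * (16 - v)
             + c * (16 - h) * v + d * h * v)
    return total / 256            # 16 * 16 = 256 is the combined denominator


def pack_pair(left, right):
    """16-bit pixel pair: the left-most pixel goes in the least significant byte."""
    return (right << 8) | left


w = interpolate(0, 16, 0, 16, h=6, v=10)   # only the B and D columns contribute
pair = pack_pair(0xAA, 0xBB)               # pixels A and B from the text
```

Because the four weights always sum to one, a flat region (A = B = C = D) interpolates to the same value for any h and v.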
Random-2D interpolation is used either when the pixels to be
interpolated are not in horizontal rows or when the weight is changed
for each interpolated pixel. Examples of this are smooth warping or
smooth scaling operations. In the case of Random-2D, the processing for
successive interpolated pixels cannot take advantage of pipelining;
each pixel is considered to be the first pixel of a Sequential mode
interpolation. The weight and the two input pixel-pairs are written
into the interpolator. After waiting at least 10 T-cycles, the one
interpolated pixel can be read. (The delay is 10 T-cycles when in the
standard mode (bit 14 = 0) and 6 T-cycles when in the fast mode
(bit 14 = 1).) Then, the next two input pixel-pairs and, if necessary,
the new weight value are written, and 10 cycles later the next
interpolated pixel can be read.

The h and v weight values, the mode selection, and other control bits
are written to the pixel interpolator control register (avg-c). The bit
assignment for this register is in Figure 2-8. The least significant
byte holds the 4-bit v value (bits 7:4) and the 4-bit h value
(bits 3:0).

NOTE:
The values used for h and v here are numerators of the fraction where
the implied denominator is 16.

    Bits     Field
    15       RESERVED - Write as ZERO
    14       Pipelining Select (1 = Fast, 0 = Standard)
    13       Phase (0 = In Phase, 1 = Opposite Phase)
    12       RESERVED - Write as ZERO
    11       Pairing (1 = Output Pixel Pairs, 0 = Single Pixels)
    10       Reset Bit (1 = Reset, 0 = Normal)
    9:8      Mode Select Bits
    7:4      Vertical Weight
    3:0      Horizontal Weight

Figure 2-8. Pixel Interpolator Control Register

MODE SELECT
Bits 8 and 9 are used to select one of four operating modes, of which
only two are presently defined. These modes are given in Table 2-11.

Table 2-11. Mode Select Operating Modes

    Bits 9:8    Mode
    00          Random-2D
    01          Sequential-2D
    10          RESERVED
    11          RESERVED

RESET
Writing a ONE to bit 10 resets the pixel interpolator. The pixel
interpolator must be reset prior to changing modes.

PAIRING
A ZERO in bit 11 causes the pixel interpolator to output individual
pixels. A ONE causes the interpolator to collect adjacent pixels (in
Sequential-2D mode) into 16-bit pixel pairs. This feature assists in
motion video decoding, when combined with the ALU's
dual-add-with-saturate operation, by allowing two pixels to be
processed each cycle. The phasing used in collecting the pixel pairs is
determined by the Phase bit described below.

PHASE
When output pixels are collected into pixel pairs, there are two
possible alignments of the input pixel pairs to the output pixel pairs.
The Phase bit (bit 13) selects the alignment to be used, based on the
relative word alignment of the source and destination bitmaps in VRAM.
When the Phase bit is set to a ZERO, this indicates that the bitmaps
are in-phase. In this case, the first two output pixels are grouped
into one 16-bit pixel pair (with the first pixel in the least
significant byte). When the Phase bit is set to a ONE, the bitmaps are
out-of-phase. In this case, the first pixel is placed in the most
significant byte of the first pixel pair, with invalid data in the
least significant byte, and the second and third output pixels are
collected into the second pixel pair. This is illustrated in
Figure 2-9.

PIPELINING
A ZERO in bit 14 causes the pixel interpolator to use the standard
amount of pipeline delay. A ONE in this field will select the fast mode
that has less pipeline delay. Table 2-12 shows the pipelining delay for
both modes. Note that the effect of the phase bit is to add an extra
pixel delay.
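The field positions described for the pixel interpolator control register can be summarized by a small packing helper. The Python sketch below is illustrative; the function and constant names are hypothetical, and only the bit positions and mode encodings are taken from Figure 2-8 and Table 2-11. Reserved bits 15 and 12 are left at zero.

```python
RANDOM_2D = 0b00
SEQUENTIAL_2D = 0b01


def avg_control(h, v, mode, pair=False, phase=False, fast=False, reset=False):
    """Assemble the 16-bit pixel interpolator control word (Figure 2-8 layout)."""
    if not (0 <= h <= 15 and 0 <= v <= 15):
        raise ValueError("h and v are 4-bit numerators over 16")
    if mode not in (RANDOM_2D, SEQUENTIAL_2D):
        raise ValueError("only Random-2D and Sequential-2D are defined")
    return ((int(fast) << 14) | (int(phase) << 13) | (int(pair) << 11)
            | (int(reset) << 10) | (mode << 8) | (v << 4) | h)


# Sequential-2D with h = 6/16 and v = 10/16, single pixels, standard pipeline.
word = avg_control(h=6, v=10, mode=SEQUENTIAL_2D)

# Same weights, output collected into pixel pairs, fast pipeline.
paired_fast = avg_control(h=6, v=10, mode=SEQUENTIAL_2D, pair=True, fast=True)
```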
In-Phase:
    A__B     E__F     I__J      1st Row of Input Pixel Pairs
    W__X     Y__Z               Output Pixel Pairs
    C__D     G__H     K__L      2nd Row of Input Pixel Pairs

Out-of-Phase:
    A__B     E__F     I__J      1st Row of Input Pixel Pairs
    ??__W    X__Y     Z__??     Output Pixel Pairs
    C__D     G__H     K__L      2nd Row of Input Pixel Pairs

Figure 2-9. Pixel Pair Phases

Table 2-12. Pipelining Delay for Sequential-2D NON-PAIR Mode

    Pipelining Bit    Phase Bit    Pipeline Delay
    (Bit 14)          (Bit 13)     in Output Pixels
    0                 0            6
    0                 1            7
    1                 0            2
    1                 1            3

When in PAIR mode (with bit 11 = 1), the amount of pixel delay does not
change, but half as many reads and writes are required to fill the
pipeline because each read or write of the averager transfers two
pixels. For example, when in the standard mode (bit 14 = 0), with zero
phase (bit 13 = 0) and pair mode (bit 11 = 1), three indeterminate
pixel pairs must be read before the first good pixel pair is read. In
the same case but with the phase bit = 1, the fourth pixel pair read
contains one good pixel and one indeterminate pixel, and the fifth
pixel pair read contains two good pixels.

RESERVED
Bits 15 and 12 are reserved for future use. Write ZEROs into these bit
positions.

Signature Register
(hwid)

The signature register can be read either by the host CPU or by
microcode to determine the version of the 82750PB. The value of the
signature register can be used to distinguish between the 82750PB in
the 82750PA emulation mode, and the 82750PB in native mode. The
currently defined signature values are given in Table 2-13.

Table 2-13. Signature Values

    Value     Definition
    0xFFFE    The 82750PB Emulating the 82750PA
    0xFFFC    The 82750PB in Native Mode

All other signature values are presently undefined but may be used in
the future to denote other versions of the 82750 architecture.

Display Format Registers
(yeven, yodd, vu, vptr)

The 82750PB's processor can write to the display registers in the VRAM
interface. These registers are pointers and pitch values that address
display bitmaps and 82750DB register loads in VRAM. Pointers are 32-bit
values that specify the starting byte address of a bitmap or register
load within a 4 GByte address space. The bottom two address bits are
ignored since display bitmaps and register loads must start on a double
word boundary. Therefore, the internal representation of a pointer is a
30-bit value. The pitch value associated with each pointer indicates
the number of bytes between the start of two lines of a display bitmap
or between the start of two register loads. The pitch is a single
16-bit value with its two least significant bits ignored, since the
pitch must be an integer number of double words. Currently, there is
also a restriction in the 82750DB limiting all display bitmap pitches
to powers of two; so, the maximum display bitmap pitch is
±2^14 bytes = ±16 KBytes. The display registers are described in
Table 2-14.
Table 2-14. Display Registers

    Register        Description
    yeven-lo, hi    This register pair points to the start of the Y bitmap
                    or main bitmap that is to be displayed during an even
                    field scan.
    yodd-lo, hi     This register pair points to the start of the Y bitmap
                    or main bitmap that is to be displayed during the odd
                    field scan.
    ypitch          The value in this register is added to the current Y
                    bitmap pointer value each time a Y transfer is
                    performed.
    vu-lo, hi       This register pair points to the start of the VU
                    bitmap. This bitmap is read to generate the VU values
                    for both odd and even field scans.
    vupitch         This value is added to the current VU bitmap pointer
                    value each time a VU transfer is performed.
    vptr-lo, hi     This register pair points to the start of a series of
                    82750DB register loads stored in VRAM.
    vpitch          This value is added to the current 82750DB register
                    load pointer each time a 82750DB register load is
                    performed. The pitch is equal to the number of bytes
                    from the start of one register load to the start of
                    the next register load.
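The relationship between a 32-bit byte pointer, its -lo and -hi register halves, and the 30-bit internal double-word address can be illustrated as follows. This Python sketch is a model under stated assumptions: the helper names are hypothetical, the two ignored low bits are simply masked off, and wrap-around within the 32-bit address space when a pitch is added is assumed rather than taken from the text.

```python
def split_pointer(byte_address):
    """Return (lo word, hi word, internal 30-bit double-word address)."""
    if not 0 <= byte_address < 1 << 32:
        raise ValueError("pointers are 32-bit byte addresses")
    aligned = byte_address & ~0x3          # bottom two address bits are ignored
    return aligned & 0xFFFF, aligned >> 16, aligned >> 2


def add_pitch(pointer, pitch):
    """Advance a display pointer by a signed pitch in bytes.

    The two least significant bits of the pitch are ignored.
    Assumption: the result wraps within the 32-bit address space.
    """
    return (pointer + (pitch & ~0x3)) & 0xFFFFFFFC


lo, hi, internal = split_pointer(0x00012347)
next_line = add_pitch(0x1000, 1024)        # bitmap stored top to bottom
prev_line = add_pitch(0x1000, -1024)       # negative pitch steps backwards
```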
3.0 HARDWARE INTERFACE

VRAM Interface

The VRAM interface performs the following operations:

• Maintains VRAM pointers for the two input FIFOs, the two output
  FIFOs, the statistical decoder, the Y (main) bitmap, the VU bitmap,
  and the 82750DB register load.

• Decodes VBUS codes and takes appropriate actions such as generating
  a transfer cycle, scheduling refresh cycles, or generating interrupt
  conditions.

• Arbitrates VRAM accesses between the two input FIFOs, the two output
  FIFOs, the statistical decoder, the transfer request logic, the VRAM
  refresh logic, and the external VRAM access logic.

• During a memory cycle, performs appropriate address arithmetic on
  the VRAM pointer used for that memory cycle.

• As a result of certain VBUS codes, performs a shadow copy that
  consists of copying display-related VRAM pointer values from shadow
  registers (that are loaded by the host CPU or the microcode
  processor) to working registers where the various pointers are used
  for transfer cycles when the 82750DB is refreshing the display
  screen.

Table 3-1. VRAM Interface Signals

    Signal     Description
    MREQ#      MEMORY REQUEST is asserted during the first cycle of a
               VRAM memory access.
    TRNFR#     The TRANSFER output indicates the current memory cycle is
               a result of a 82750DB transfer request.
    RFSH#      The REFRESH output indicates the current memory cycle is
               a result of a 82750DB refresh request.
    NXTFST#    The NEXT FAST output indicates the next memory access
               will use the same row address as the current memory
               access. This facilitates the use of page mode memory
               accesses.
    MRDY#      The MEMORY READY input indicates the availability of
               valid data on the D[31:0] pins.
VRAM ACCESSES

The 82750PB can initiate five different types of memory accesses: FIFO
read, FIFO write, transfer read, transfer write, and refresh. In
addition, the 82750PB supports VRAM accesses by external logic. During
an external access VRAM cycle, the 82750PB tri-states its VRAM address
and data buses and performs a host VRAM read or host VRAM write cycle.
There is another operation performed by the 82750PB, a shadow copy,
that is not a VRAM cycle but is arbitrated as though it were, since no
VRAM cycles can take place during a shadow copy.

The seven types of VRAM cycles initiated by the 82750PB, including host
VRAM read and host VRAM write, begin with the 82750PB asserting a
combination of its three VRAM cycle definition outputs: TRNFR#, RFSH#,
and WE#. External logic detects the state of these signals, validated
by MREQ#, and produces the appropriate sequence of VRAM control signals
(RAS, CAS, etc.) to perform the type of memory cycle the 82750PB has
requested. The 82750PB requires that each of these VRAM cycles take a
minimum of two T-cycles, or T-states, denoted T1 and T2. External logic
can insert additional T2 states in order to stretch the VRAM cycle to
more than two T-cycles. The start of a new VRAM access cycle is
signaled by the assertion of MREQ# for the first T-cycle, T1. The VRAM
access cycle definition signals, TRNFR#, RFSH#, and WE#, are asserted
at the start of T1 and remain asserted until the end of the last T2.
Other VRAM operations can be described similarly by sequences of
T-states. Refer to Figures 3-4 and 3-5 on page 42 for timing diagrams.

Table 3-2 defines the states used for all VRAM access operations. A
state diagram for the VRAM/Host Interface is provided in Figure 3-1.
This diagram includes the FIFO access states.

Table 3-2. 82750PB VRAM Access States

    State       Description
    Ti          Idle State, No VRAM Activity
    T1, TF1     First State of a VRAM FIFO Cycle
    T2, TF2     Last State of a VRAM FIFO Cycle
    TSC         The T-State required to perform a shadow copy
    TTX1        First State of a VRAM Transfer Cycle
    TTX2        Last State of a VRAM Transfer Cycle
    TRF1        First State of a VRAM Refresh Cycle
    TRF2        Last State of a VRAM Refresh Cycle

[Figure 3-1. Access State Diagram. Labels recovered from the drawing:
FIFO ACCESS, HOST ACCESS, MEMORY NOT READY, REFRESH CYCLE.]
Note that during successive VRAM cycles it is not necessary to go back
to the idle state, Ti, between each cycle; the TF2 state can be
followed directly by a T1 state, starting the next VRAM cycle. This
results in efficient utilization of the 82750PB/VRAM bandwidth by
allowing a VRAM cycle time of 2 T-states.

FAST VRAM CYCLES

When the 82750PB performs Data Read or Data Write VRAM cycles for the
input or output FIFOs, it performs two 32-bit accesses to read or write
one 64-bit value. These accesses are always performed in a sequence of
EvenAddress followed by EvenAddress + 1, which guarantees both that the
two sequential accesses will be in opposite banks and that the two
accesses will be within the same VRAM page. This allows external logic
to use either bank-interleaving or a page-mode access to complete the
second access of the sequence and improve the VRAM bandwidth. However,
the second access does not need to be handled differently from the
first. Except for the assertion of the NXTFST# signal, both accesses
are treated as standard VRAM accesses. External logic can ignore the
NXTFST# signal and treat the two accesses as two normal data read or
data write cycles. Note that NXTFST# is not asserted for transfer,
refresh, or host memory accesses.

The NXTFST# output signal is provided for cases when external logic can
generate a faster access for the second access of the two sequential
accesses. During such a pair of accesses, NXTFST# is asserted during
the first of the two accesses in order to provide sufficient time for
the external logic to generate the appropriate fast memory cycle for
the second access. Refer to the timing diagrams in Figures 3-4 and 3-5
(page 42) for examples illustrating the use of the NXTFST# signal.

VBUS CODES

Transfer request, interrupt, and synchronization codes are sent over
the VBUS from the 82750DB to the 82750PB. The codes recognized by the
82750PB are listed in Table 3-3, along with the actions taken by the
82750PB as a result of receiving each code. Codes that cause TRANSFER
cycles must be asserted for at least two clock cycles of the 82750PB to
ensure that, in the worst case, the 82750PB completes the transfer
cycle before the code is released and the 82750DB starts shifting data
from the VRAM shift registers. Other codes must also be asserted for a
minimum of two 82750PB clock cycles. Only the codes given in Table 3-3
are valid codes for the VBUS. Other codes are reserved for future use
and should not be used. Once a transfer cycle code is sent to the
82750PB, any non-transfer code may be sent immediately. A subsequent
transfer cycle code should be sent only after the current transfer
cycle is completed.
Table 3-3. VBUS Codes

    Binary    Name         Action
    0000      YBMX         TXRD Cycle Using Yc; Yc = Yc + Yp*
    0001      VUBMX        TXRD Cycle Using VUc; VUc = VUc + VUp
    0010      REGX         TXRD Cycle Using Vc; Vc = Vc + Vp
    0011      WRDIGX       TXWR Cycle Using Yc; Yc = Yc + Yp
    0100      YNPBMX       TXRD Cycle Using Yc; Yc = Yc
    0101      Reserved     Reserved
    0110      Reserved     Reserved
    0111      WRDIGNPX     TXWR Cycle Using Yc; Yc = Yc
    1000      DFL          DFL Int; Shadow Copy**
    1001      82750DBSD    82750DB Shutdown Interrupt
    1010      REFRESH      Schedule N Refresh Cycles
    1011      Reserved     Reserved
    1100      VODD         VBI Int; OF Int; Shadow Copy Odd; Hline = 0***
    1101      VEVEN        VBI Int; EF Int; Shadow Copy Even
    1110      HLINE        lcnt++ (Increment Line Counter)
    1111      NULL         No Action

NOTES:
*Yc - Y bitmap pointer, current; Yp - Y bitmap pitch; VU - VU bitmap;
V - 82750DB register load.
**Shadow Copy with Yc = Y-start-odd in odd field; Yc = Y-start-even in
even field.
***Hline - Horizontal Line Counter.

PRIORITY

Each time the VRAM state machine completes a VRAM operation and returns
to the Ti state, it examines all pending VRAM access requests and
selects the highest priority request for the next VRAM operation. The
priority ordering of these requests is listed in Table 3-4.

Table 3-4. Priority of VRAM Operations

    Request Type       Priority
    Transfer Cycle     Highest
    Shadow Copy           •
    Host Access           •
    VRAM Refresh          •
    FIFO Read/Write    Lowest

NOTE:
The shadow copy is treated as a VRAM operation even though it does not
result in an access to VRAM.

The VRAM refresh operation is placed low on the priority list to reduce
the latency in servicing transfer requests and external VRAM requests.
Since a single REFRESH code from the 82750DB schedules a number of
refresh cycles, a higher priority for refresh would cause all the
refresh cycles to occur in a burst that would lock out all lower
priority requests until all refresh cycles completed. Instead, the
following restriction applies to all request types with higher priority
than refresh: high priority requests, such as transfer cycles, shadow
copies, and external VRAM access, must occur infrequently enough to
allow proper refresh of the VRAM chips. Transfer cycles and shadow
copies, by their nature, occur infrequently so they are not generally a
problem.

There is a separate priority scheme for the five FIFO channels. The
scheme used is rotating priority with automatic override and single
cycle arbitration. Rotating priority means that the priority is
assigned in a fixed cyclic order with the lowest priority given to the
FIFO channel that "won" the last FIFO access. There is only one level
of memory, so the order that requests arrive is not a factor in the
arbitration. The cyclic order is given in Figure 3-2.

As an example, if input FIFO 0 (abbreviated if0) was the last channel
to perform a cycle, the priority order for the next FIFO access (from
highest to lowest) would be: if1, sd, of0, of1, and if0.
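The rotating priority rule can be modeled in a few lines of Python. This is a sketch for illustration: the function name is hypothetical, and the cyclic order in `CYCLE` is inferred from the single example in the text (if0 last gives if1, sd, of0, of1, if0); the order for other starting channels is therefore an assumption.

```python
# Cyclic order inferred from the worked example; an assumption for other cases.
CYCLE = ["if0", "if1", "sd", "of0", "of1"]


def priority_order(last_winner):
    """FIFO channels from highest to lowest priority after last_winner ran."""
    i = CYCLE.index(last_winner)
    # The channel that won the last access drops to the lowest priority.
    return CYCLE[i + 1:] + CYCLE[:i + 1]


after_if0 = priority_order("if0")
```

Under this model the channel that just completed an access always appears last in the returned list.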
II
82750PB
Automatic override that the rotating cyclic priority
can be bypassed if there is an URGENT condition
for one of the channels. A channel is urgent if the
microcode processor is frozen because the processor is waiting for that channel to be ready. The channel can be either an input channel that is empty or
an output channel that is full. In this case, the urgent
channel gets the next available cycle. However, the
priority will still be lower than non-FIFO requests,
such as refresh cycles.
Single clock cycle arbitration means that the selection of the next channel that will get an access occurs in a single T-cycle or T-state, either in a Ti state or during the last T2 state of the previous VRAM cycle.
VRAM POINTERS
The VRAM interface maintains VRAM pointers for the FIFOs, as well as display-related pointers for the 82750DB. Internally each pointer or address is stored as a 30-bit value addressing a double word in VRAM. The pointer values are read and written as two 16-bit words representing a 32-bit byte address (refer to Figure 3-3). With a 30-bit double word address, the 82750PB can decode a VRAM address space of 1G double words or 4 GBytes.
Input and output FIFOs can address down to a single word or byte in VRAM. A FIFO's pointer is post-incremented or post-decremented in parallel with its VRAM read or write cycle.
The statistical decoder can only start decoding bitstreams on double word boundaries in VRAM and can only increment through VRAM. The decoder's pointer is post-incremented in parallel with each of its VRAM read cycles.
Display-related pointers are updated by adding a pitch value to the current value during the corresponding transfer cycle.
If a VRAM pointer appears on the B-Bus as a source or as a destination, then the following rules apply:
Rule 1
If a B-Bus destination refers to an address that is both even and > 0x1f, then the source is restricted to "-lo" pointers if the source refers to a pointer.
Rule 2
If a B-Bus destination refers to an address that is both odd and > 0x1f, then the source is restricted to "-hi" pointers if the source refers to a pointer.
SHADOW COPY
When a VODD, VEVEN, or DFL code is received from the 82750DB over the VBUS, a shadow copy is scheduled. The actual shadow copy will occur as soon as the priority logic allows. Any VRAM access in progress must complete and a pending transfer cycle, if any, must be performed before the shadow copy can start. During the operation, shadow registers for the Y-START, Y-PITCH, VU-START, VU-PITCH, 82750DB-START, and 82750DB-PITCH are copied into the corresponding working registers.
During display refresh, the address arithmetic is performed on the working registers. The shadow registers can be loaded by the host CPU or by a microcode routine with less critical timing constraints, and then copied instantly by a shadow copy when it is time to update the registers, either prior to the next field or during the active display for split screen effects.
Figure 3-2. Cyclic Ordering of FIFOs
(Diagram. Order shown: in FIFO 1 -> in FIFO 0 -> out FIFO 1 -> out FIFO 0 -> Statistical Decoder.)
Figure 3-3. VRAM Addressing
Bits 31:2    VRAM Address (30 bits, double-word address)
Bits 1:0     Byte Address within Double-Word
Bits 31:16   Most Sig. Word of VRAM Address
Bits 15:0    Least Sig. Word of VRAM Address
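The pointer layout can be illustrated with a short sketch. It assumes only what the text states: a 32-bit byte address is handled as two 16-bit words, and bits 31:2 form the 30-bit double-word address. The function names are hypothetical and are not part of any Intel tool.

```python
def split_pointer(byte_address):
    """Split a 32-bit VRAM byte address into (hi, lo) 16-bit words."""
    if not 0 <= byte_address <= 0xFFFFFFFF:
        raise ValueError("byte address must fit in 32 bits")
    return (byte_address >> 16) & 0xFFFF, byte_address & 0xFFFF


def join_pointer(hi, lo):
    """Rebuild the 32-bit byte address from its two 16-bit words."""
    return ((hi & 0xFFFF) << 16) | (lo & 0xFFFF)


def dword_address(byte_address):
    """Bits 31:2: the 30-bit double-word address."""
    return byte_address >> 2


def byte_within_dword(byte_address):
    """Bits 1:0: the byte position inside the double word."""
    return byte_address & 0x3


# 2**30 double words of 4 bytes each give the 4 GByte address space.
ADDRESS_SPACE_BYTES = (1 << 30) * 4
```

A pointer such as 0x00012346 would be written as the word pair (0x0001, 0x2346) and addresses byte 2 of double word 0x48D1.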
There are actually two shadow registers for Y-START: one for the start of odd fields and one for the start of even fields. A VODD code causes Y-START-ODD to be copied into the working register Y-CURRENT. Similarly, a VEVEN code causes Y-START-EVEN to be copied into Y-CURRENT. A DFL code causes the Y-START-ODD value to be copied if the most recent start of field code received was a VODD, or the Y-START-EVEN value if the most recent start of field code was a VEVEN. This allows a simple interlaced or non-interlaced display to be refreshed with no host CPU intervention. For more complex displays, such as split screens, the host CPU must update the shadow registers prior to each shadow copy. A shadow copy operation requires 2 T-cycles.
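The Y-START selection rule can be sketched as a small function. This is only an illustration of the rule as described; the string codes and the function name are hypothetical, and the behavior of a DFL code before any field code has been received is not specified in the text, so the sketch raises an error in that case.

```python
def select_y_start(code, last_field_code, y_start_odd, y_start_even):
    """Return the shadow value copied into Y-CURRENT for a VBUS code.

    code            -- "VODD", "VEVEN", or "DFL"
    last_field_code -- most recent start-of-field code ("VODD" or "VEVEN")
    """
    if code == "VODD":
        return y_start_odd
    if code == "VEVEN":
        return y_start_even
    if code == "DFL":
        # DFL reuses the shadow register of the most recent field code.
        if last_field_code == "VODD":
            return y_start_odd
        if last_field_code == "VEVEN":
            return y_start_even
        raise ValueError("no start-of-field code received yet (unspecified)")
    raise ValueError("code does not trigger a shadow copy: %r" % (code,))
```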
Host Interface
The Host Interface provides the following functions:
• Arbitrates host CPU and 82750PB access to VRAM.
• Provides the host access to external devices.
• Provides the host access to 82750PB internal registers and memories.
Signals specific to the Host Interface are listed in
Table 3-5.
Table 3-5. Host Interface Signals
Signal     Description
HREQ#      HOST REQUEST: Asynchronous request from the host for all types of host access. Used both to request and release system buses.
HREG#      HOST REGISTER: Single-ranked control to request host access to 82750PB internal registers in concert with HRAM#.
HRAM#      HOST VRAM: Single-ranked control to request host access to VRAM in concert with HREG#.
HALEN#     HOST ADDRESS LATCH ENABLE: Asynchronous status from the host indicating the presence of valid address, write enable (transaction direction control), and the byte enables at the interface of the 82750PB.
HBUSEN#    HOST BUS ENABLE: 82750PB synchronous status granting the host access to the address, write enable, data bus, and byte enables at the interface of the 82750PB.
HRDY#      HOST READY: 82750PB synchronous status to the host indicating the presence of valid data appearing at the 82750PB's data bus for VRAM and register accesses and optionally for external accesses.
HINT#      HOST INTERRUPT: 82750PB synchronous interrupt to the host, set under direct or indirect microprogram control.
Signals common to the host, VRAM, and external device interfaces are listed in Table 3-6.
Table 3-6. Host, VRAM, and External Device Interfaces
Signal      Description
A[31:2]     ADDRESS BUS: System address bus used to select unique VRAM, 82750PB register, and external device locations that will be accessed under host control. The lower seven bits A[8:2] are bidirectional and are used during register accesses.
D[31:0]     DATA BUS: Bidirectional system data bus used to transfer data to and from all sources and destinations. When transferring 16-bit host register values, the data bus MSH and LSH will both carry identical values.
WE#         WRITE ENABLE: Bidirectional, single-ranked signal used to determine the data transfer direction. When active during host register cycles, data flows from the host to an 82750PB destination. During host VRAM cycles, WE# active will define the data direction to be from the host to VRAM.
BE[3:0]#    BYTE ENABLE: Bidirectional signals used to select the bytes that will be modified during data transactions. All host register transactions are performed 16 bits at a time, while VRAM may be modified 8 bits at a time.
As with VRAM operations, host operations are described through a sequence of T-states. Table 3-7 defines
the T-states used to implement all host transactions with VRAM, external devices, and the 82750PB.
The master execution state diagram that defines the VRAM/Host transactions is provided in Figure 3-1.
Table 3-7. 82750PB Host Transaction States
State   Description
TA      First state of any host transaction. Entry into TA will be granted after HREQ# has been asserted. During this state, the 82750PB will tri-state its address, data bus, write enable, and byte enable signals to provide a full cycle of "dead-band" before the assertion of HBUSEN#. In the state immediately following TA, HBUSEN# will assert, allowing the host to drive the host buses.
TB      First cycle in which the host is granted bus access for register or VRAM transactions. The sequencer will remain in TB until HALEN# is received, indicating that the address, write enable, and byte enable signals are stable at the 82750PB pins.
TC1     First cycle that output data is valid.
TCn     This state is entered to wait for the completion of the current host cycle. The cycle is defined as complete when HREQ# deasserts. HRDY# is asserted along with valid data until the transition to state TD occurs.
TD      The last cycle of a host transaction. HBUSEN# is deasserted, allowing one dead-band cycle for control of the address, data, write enable, and byte enable signals to be returned to the 82750PB.
TV1     First cycle of a Host VRAM transaction. Memory is requested and is followed by a transition to TV2.
TV2     Last cycle of a Host VRAM transaction. The sequencer will remain in TV2 until MRDY# is received.
A single stage of input synchronization is employed for HREG#, HRAM#, WE#, and BE[0]#, while HREQ# and HALEN# are programmable to have one or two stages by bit 12 of the Microcode Processor Control Register. See Table 3-10. T-state transitions are caused by the synchronized versions of these signals.
HOST REGISTER ACCESS
The host has access to the 82750PB's internal registers and memories to monitor and control the operation of the microcode processor, provide a means of debugging microprogram routines, and to function as the primary test port for production testing.
The synchronized versions of HREG# and HRAM# must be stable before entry into T-state TA. The synchronized versions of WE#, BE[0]#, and HALEN# should be stable before exiting T-state TB. Once asserted, all of the above signals should remain stable until the deassertion of HBUSEN#.
Register access is initiated by the host asserting HREQ#, HREG#, and HRAM# as shown in Table 3-8 and in the timing diagrams on pages 42 through 45. After the host has been granted bus access by an active HBUSEN# in state TB, the address, write enable, and byte enables may be driven. After these signals have stabilized, HALEN# is asserted, enabling a read or a write operation to occur.
The type of host cycle to perform is determined by the states of HREG# and HRAM# as indicated in Table 3-8.
Table 3-8. Host Cycle Types
HREG#   HRAM#   Host Cycle Type
1       1       External
0       1       Register
1       0       VRAM
0       0       Reserved
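As an illustration, the cycle-type decode can be written as a lookup on the pin levels of HREG# and HRAM#. The function name is hypothetical; the mapping itself is taken directly from the table.

```python
# Keys are the (HREG#, HRAM#) pin levels; both signals are active low.
HOST_CYCLE_TYPES = {
    (1, 1): "External",
    (0, 1): "Register",
    (1, 0): "VRAM",
    (0, 0): "Reserved",
}


def host_cycle_type(hreg_n, hram_n):
    """Return the host cycle type for the given HREG#/HRAM# pin levels."""
    return HOST_CYCLE_TYPES[(hreg_n, hram_n)]
```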
In the case of a register read, state TC1 is entered and the data bus is driven with the internal value. One cycle later, a transition to state TC occurs, and HRDY# activates, signaling the presence of stabilized data at the 82750PB data pins. This state (TC) will be maintained until the host deasserts HREQ#, signaling the completion of the cycle and causing a transition to state TD.
In the case of a register write, TC1 is again entered (from TB), but the data bus may now be driven by the host. (During host cycles, data bus drive activity is indirectly controlled by WE#, and an additional dead-band is provided by entry into state TC1 to allow for internal WE# stabilization.) Stable data at the 82750PB interface, as well as the completion of the write cycle, is signaled by the deassertion of HREQ#. As with reads, the deactivation of HRDY# signals the transition to state TD.
As state TD is entered, HRDY# and HBUSEN# deassert, the address, data, write enable, and byte enables tri-state, and bus control is returned to the 82750PB in the following cycle.
HOST VRAM ACCESS
Because the 82750PB is so closely coupled with VRAM, host accesses to VRAM are arbitrated and controlled by the 82750PB. VRAM access is initiated by the host asserting HREQ#, HREG#, and HRAM# as shown in Table 3-8 above and in the timing diagrams on pages 42 through 45. After the host has been granted bus access by an active HBUSEN#, the address, write enable, and byte enables may then be driven. After these signals have stabilized at the memory devices (or longest relevant propagation path), HALEN# is asserted, enabling a read or a write operation to occur.
Because VRAM will not drive the data bus until after a memory request, a transition into state TC1 to allow for data bus direction stabilization is not required. Instead, a transition to state TV1 occurs, which asserts MREQ# for a single cycle and is followed by a transition to TV2. TV2 will remain the current state until the reception of an active MRDY#.
In the case of a VRAM read, the memory data bus will be driven during TV1, and valid data will appear in state TV2. Data will be guaranteed valid coincident with the deassertion of MRDY# from memory.
In the case of a VRAM write, the memory data bus is driven with valid data during TV1. Again the reception of MRDY# will serve to indicate the completion of the memory operation.
NOTE:
The host device must be able to transmit or receive memory data so that it is valid at the trailing edge of MRDY# at the data's destination (memory or host).
After MRDY# becomes active, a transition from TV2 into TC1 is accomplished to allow time to propagate data to the host. TC is then entered to await the deassertion of HREQ# (if it has not already occurred). TD is then entered, duplicating the dead-banding previously described.
HOST EXTERNAL ACCESS
In addition to VRAM and register host access, an external device access mechanism is provided. During this access, upon the receipt of HREQ# with HREG# and HRAM# inactive, the 82750PB releases the address, data, write enable, and byte enables in state TA.
The difference here is that state TC1 is directly entered from TA, thereby ignoring any transitions of HALEN#. Since the 82750PB also ignores the data bus direction control (write enable), the host and an external device may communicate unencumbered by the 82750PB.
Entry into state TC directly follows TC1 in the expected sequence and remains there until HREQ# is released. This is followed by entry into TD. HBUSEN# is asserted during the time that TC1 and TCn are active.
During an external access, HRDY# is not asserted unless the external logic asserts MRDY# as shown in Figure 3-7.
HOST REGISTER ADDRESS MAPPING
Table 3-9 shows the host address mapping of the on-chip registers and memories, in terms of the offset in bytes from the base address for 82750PB accesses. Note that the 82750PB only supports word accesses to these registers. Therefore, the least significant bit of the byte offset should be set to zero. The 82750PB forms the register address from inputs on the A[31:2] pins and BE#[3:0] pins. The A[31:2] pins specify the double word address of the register, and combinations of the BE# pins determine which of the two words within the double word is being addressed. BE#[3:0] = 1100 (binary) selects the least significant word within a double word, and BE#[3:0] = 0011 (binary) selects the most significant word within a double word. These are the only two valid patterns for BE# inputs during a host register access cycle.
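The way a host register offset is formed from the address and byte-enable pins can be sketched as follows. The sketch assumes that the least significant word sits at the lower byte offset of its double word (the usual byte-enable convention, not stated explicitly in the text); the names are hypothetical.

```python
# BE#[3:0] is active low: 0b1100 enables bytes 0-1, 0b0011 enables bytes 2-3.
WORD_OFFSET_FOR_BE = {
    0b1100: 0,  # least significant word of the double word
    0b0011: 2,  # most significant word of the double word
}


def register_byte_offset(dword_index, be_n):
    """Byte offset from the 82750PB base for a host register access.

    dword_index -- double-word address relative to the base (from A[31:2])
    be_n        -- BE#[3:0] pattern; only 0b1100 and 0b0011 are valid
    """
    if be_n not in WORD_OFFSET_FOR_BE:
        raise ValueError("invalid BE# pattern for a register access")
    return dword_index * 4 + WORD_OFFSET_FOR_BE[be_n]
```

Under this assumption, double word 0x40 with BE# = 1100 addresses byte offset 0x100, and the same double word with BE# = 0011 addresses byte offset 0x102.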
During an access to areas (a) or (b), bits 6:1 of the byte offset should be set to the source or destination code for the register that will be read or written. The coding is the same as used in the microcode instruction word. Bit 0 is always set to a zero. Refer to the 82750PB Source and Destination Coding Table found in Chapter 4.
Table 3-9. Host Address Mapping
Byte Address    Description
0x000-0x07E     (a) A source and destination registers
0x080-0x0FE     (b) B source and destination registers
0x100-0x17E     (c) Microcode processor control and status registers
0x180-0x1FE     (d) VRAM pointer RAM
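A short sketch of how a host byte offset maps onto the four areas, and how the source or destination code is recovered from bits 6:1 for areas (a) and (b). The function names are hypothetical; the ranges come from Table 3-9.

```python
AREAS = [
    (0x000, 0x07E, "a"),  # A source and destination registers
    (0x080, 0x0FE, "b"),  # B source and destination registers
    (0x100, 0x17E, "c"),  # control and status registers
    (0x180, 0x1FE, "d"),  # VRAM pointer RAM
]


def classify_offset(byte_offset):
    """Return the Table 3-9 area letter for a word-aligned byte offset."""
    if byte_offset & 1:
        raise ValueError("only 16-bit word accesses are supported")
    for low, high, name in AREAS:
        if low <= byte_offset <= high:
            return name
    raise ValueError("offset outside the 82750PB register map")


def register_code(byte_offset):
    """Bits 6:1 of the offset: the source/destination code (areas a and b)."""
    if classify_offset(byte_offset) not in ("a", "b"):
        raise ValueError("codes apply only to areas (a) and (b)")
    return (byte_offset >> 1) & 0x3F
```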
Area (c) contains one write-only register, the CONTROL register, and two read-only registers, the INTERRUPT FLAG register and the microcode PROCESSOR STATUS register. The CONTROL register is used to halt or single-step the microcode processor, enable or mask interrupts to the host CPU, select the signal that is output via the PMON/FRZ pin, and enable or disable the 82750PA emulation mode. The bit assignments for the CONTROL register are given in Table 3-10.
NOTE:
The host should only perform 16-bit word reads or writes to 82750PB registers. The 82750PB does not support byte reads or writes or double word reads or writes to on-chip registers.
During reset of the 82750PB, the HALT bit is set to a
one, the six Interrupt Enable bits are reset to zero,
the Disable SYNC bit is set to zero, the PMON/FRZ
bit is set to zero (so that the FRZ signal is output),
and the Enable 82750PB bit is reset to zero (so that
on reset, the 82750PB starts in a 82750PA emulation mode).
When the host CPU reads or writes to areas (a), (b), or (d) and the 82750PB is not already in a HALT state, the microcode processor is automatically HALTED for the one T-cycle actually required to complete the data transfer, and then the processor is restarted after the transfer is complete. If the 82750PB is in a HALT state when the host access is initiated, it will remain in the HALT state following the completion of the access. This is transparent to both the host CPU and the microcode processor.
Table 3-10. Bit Assignments for Microcode Processor CONTROL Register (Write-Only, Byte Offset = 0x100)
Bit          Name             Description
Bit 0        HALT             1 = Microcode Processor Halt; 0 = Microcode Processor Run
Bit 1        SINGLE-STEP      1 = Execute One Instruction and then Halt (Only when Already Halted, Bit 0 = 1); 0 = No Action
Bit 2        Enable MCINT     1 = Enable Microcode Interrupts to Host CPU; 0 = Mask Microcode Interrupts
Bit 3        Enable VBI       1 = Enable Vertical Blanking Interrupt to Host CPU; 0 = Mask Vertical Blanking Interrupt
Bit 4        Enable DFL       1 = Enable DFL Interrupt to Host CPU; 0 = Mask DFL Interrupt
Bit 5        Enable SD        1 = Enable 82750DB Shutdown Interrupt to Host; 0 = Mask SD Interrupt
Bit 6        Enable OFI       1 = Enable Odd Field Interrupt; 0 = Mask OF Interrupt
Bit 7        Enable EFI       1 = Enable Even Field Interrupt; 0 = Mask EF Interrupt
Bits 8-11*                    RESERVED; Write as Zeros
Bit 12       Disable SYNC     1 = Disable Synchronizers for HREQ#/HALEN#; 0 = Enable Synchronizers for HREQ#/HALEN#
Bit 13       PMON/FRZ         1 = Output FRZ# Signal on PMFRZ# Pin; 0 = Output PMON# Signal on PMFRZ# Pin
Bit 14                        RESERVED; Write as Zero
Bit 15       Enable 82750PB   1 = Enable 82750PB Mode; 0 = Enable 82750PA Emulation Mode
*All other bits are reserved for future use, and should be written as zeros.
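As a minimal sketch, a CONTROL register value could be composed from named bit masks that follow Table 3-10. The constant and function names are hypothetical; the only rules encoded are the bit positions from the table and the requirement that reserved bits be written as zeros.

```python
HALT = 1 << 0
SINGLE_STEP = 1 << 1
ENABLE_MCINT = 1 << 2
ENABLE_VBI = 1 << 3
ENABLE_DFL = 1 << 4
ENABLE_SD = 1 << 5
ENABLE_OFI = 1 << 6
ENABLE_EFI = 1 << 7
DISABLE_SYNC = 1 << 12
PMON_FRZ = 1 << 13
ENABLE_82750PB = 1 << 15

# Bits 8-11 and bit 14 are reserved and must be written as zeros.
RESERVED_MASK = (0xF << 8) | (1 << 14)


def control_word(*flags):
    """OR the given flags into a 16-bit CONTROL register value."""
    value = 0
    for flag in flags:
        value |= flag
    if value & RESERVED_MASK or value > 0xFFFF:
        raise ValueError("reserved CONTROL bits must be written as zeros")
    return value
```

For example, `control_word(HALT, SINGLE_STEP)` gives 0x0003, the value for single-stepping an already halted processor, and `control_word(ENABLE_82750PB, ENABLE_VBI)` gives 0x8008.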
The INTERRUPT FLAG register holds a flag for
each of the six interrupt sources. A flag bit is set to a
one when the interrupt condition is detected (independent of the state of the corresponding Interrupt
Enable/Mask bit in the CONTROL register), and all
flags are cleared to zero each time the INTERRUPT
FLAG register is read. If this register is read during
the same cycle that an interrupt condition is detected, the flag bit corresponding to that interrupt condition will remain at a one. This new interrupt condition
will then be seen by the host processor when it next
reads the INTERRUPT FLAG register. The flag insures that an interrupt is not lost if it occurs at the
same cycle that the INTERRUPT FLAG register is
read (and reset). In addition, the Microcode Interrupt
source has an overflow flag that indicates if more
than one Microcode Interrupt has occurred since the
Interrupt Flag register was last read. The bit assignments for the INTERRUPT FLAG register are listed
in Table 3-11.
The PROCESSOR STATUS register holds four status bits: HALT, FREEZE, PMON, and SYNC status. HALT indicates that the processor is HALTED due to the HALT bit in the CONTROL register being set to ONE or due to the HALT# pin being asserted. FREEZE indicates that the processor is waiting for one of the VRAM channels to become ready or is waiting for an access to the VRAM pointer RAM. PMON is a signal that can be toggled by a special ALU opcode or a special B source code. This signal can be used for performance monitoring of microcode. The SYNC status bit indicates the presence or absence of the internal synchronizers for the HREQ# and HALEN# inputs. In addition, the Interrupt Mask bits that are written into the PROCESSOR CONTROL register can be read from this register. These mask bits are read in the same polarity that they are written, but note that the bit positions and bit ordering are not consistent with the PROCESSOR CONTROL register. The bit assignments for this register are given in Table 3-12.
Address mappings for areas (a), (b), and (d) are given in Tables 3-13 to 3-15.
Table 3-11. Bit Assignments for INTERRUPT FLAG Register (Read-Only, Byte Offset = 0x100)
Bit        Description
Bit 8:0    Not Used, the State of These Bits Is Not Specified
Bit 9      EF Interrupt Flag
Bit 10     OF Interrupt Flag
Bit 11     MCINT Overflow Flag
Bit 12     82750DB Shutdown Interrupt
Bit 13     MCINT Microcode Interrupt
Bit 14     VBI Vertical Blanking Interrupt
Bit 15     DFL Display Format Load Interrupt
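A host-side decode of a value read from the INTERRUPT FLAG register might look like the sketch below. The flag names and the function are hypothetical; bits 8:0 are ignored because their state is not specified. Since reading the register clears the flags, a host routine would normally decode a single read and act on every flag in the result.

```python
INTERRUPT_FLAG_BITS = {
    "EF": 9,
    "OF": 10,
    "MCINT_OVERFLOW": 11,
    "SD": 12,
    "MCINT": 13,
    "VBI": 14,
    "DFL": 15,
}


def decode_interrupt_flags(value):
    """Return the set of flag names that are set in a register read."""
    return {name for name, bit in INTERRUPT_FLAG_BITS.items()
            if value & (1 << bit)}
```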
Table 3-12. Bit Assignments for PROCESSOR STATUS Register (Read-Only, Byte Offset = 0x102)
Bit        Description
Bit 0      HALT (1 = Halted, 0 = Running)
Bit 1      FREEZE (1 = Frozen, 0 = Running)
Bit 2      PMON (1 = Active, 0 = Inactive)
Bit 3      Synchronizers on HREQ#/HALEN# (0 = Enabled, 1 = Disabled)
Bit 9:4    Not Used, the State of These Bits Is Not Specified
Bit 10     MCINT Microcode Interrupt Mask
Bit 11     VBI Vertical Blanking Interrupt Mask
Bit 12     DFL Display Format Load Interrupt Mask
Bit 13     82750DB Shutdown Interrupt Mask
Bit 14     OF Interrupt Mask
Bit 15     EF Interrupt Mask
Table 3-13. 82750PB A Bus Source/Destination Address Mapping
Address (Hex)
ADST
OxOOO
Null
OxOO2
OxOO4
OxOO6
ASRC
Address (Hex)
ADST
Null
Ox042
out1 + +
'in2
hwid·
Ox044
shift-hi
'stat
ee
Ox046
out1-hi
*stat#
Ox048
'out2
maddr
OxOO8
alu
Ox04A
out2+ +
OxOOA
ent
ent
Ox04C
shift-r
OxOOC
ent2
ent2
Ox04E
out2-hi
OxOOE
lent
lent
Ox050
out1-e
rO
rO
Ox052
in1-e
Ox010
ASRC
Ox012
r1
r1
Ox054
shift-I
Ox014
r2
r2
Ox056
in1-hi
Ox016
r3
r3
Ox058
out2-e
Ox018
r4
r4
Ox05A
in2-e
Ox01A
r5
r5
Ox05C
Ox01C
r6
r6
Ox05E
in2-hi
Ox01E
r7
r7
Ox060
r8
r8
Ox020
meode3
meode3
Ox062
r9
r9
Ox022
meode2
meode2
Ox064
r10
r10
Ox024
meode1
meode1
Ox066
r11
r11
Ox026
pc
pc
Ox068
r12
r12
Ox028
pixint-e
Ox06A
r13
r13
Ox02A
pixint
pixint
Ox06C
r14
r14
Ox02C
*dram1
*dram1
Ox06E
r15
r15
Ox02E
*dram2
*dram2
Ox070
ee
shift
Ox030
*dram1 + +
*dram1 + +
Ox072
fent
fent
Ox032
*dram2+ +
*dram2+ +
Ox074
*dram3
*dram3
Ox034
*dram1- -
*dram1- -
Ox076
*dram4
*dram4
Ox036
*dram2- -
*dram2- -
Ox078
*dram3+ +
*dram3+ +
Ox038
dram1
dram1
Ox07A
*dram4+ +
*dram4+ +
Ox03A
dram2
dram2
Ox07C
*dram3- -
*dram3--
Ox03C
dram3
dram3
Ox07E
*dram4- -
*dram4- -
Ox03E
dram4
dram4
Ox040
*out1
*in1
Table 3-14. 82750PB B Bus Source/Destination Address Mapping
Address (Hex)
BDST
BSRC
Address (Hex)
BDST
OxOSO
Null
Null
OxOC2
out1 + +
alu
OxOC4
out1-lo
out1-lo
OxOS2
BSRC
OxOS4
*dram3
*dram3
OxOC6
out1-hi
out1-hi
OxOS6
*dram4
*dram4
OxOCB
*out2
stat-Io
OxOSS
*dram3+ +
*dram3+ +
OxOCA
out2+ +
stat-hi
OxOSA
*dram4+ +
*dram4+ +
OxOCC
out2-lo
out2-lo
OxOSC
*dram3- -
*dram3- -
OxOCE
out2-hi
out2-hi
OxOSE
*dram4- -
*dram4- -
OxODO
out1-c
out1-c
Ox090
rO
rO
OxOD2
in1-c
in1-c
Ox092
r1
r1
OxOD4
in1-lo
in1-lo
Ox094
r2
r2
OxOD6
in1-hi
in1-hi
Ox096
r3
r3
OxODB
out2-c
out2-c
Ox09S
r4
r4
OxODA
in2-c
in2-c
Ox09A
r5
r5
OxODC
in2-lo
in2-lo
Ox09C
r6
r6
OxODE
in2-hi
in2-hi
Ox09E
r7
r7
OxOEO
stat-ram
rB
OxOAO
rS
*in1
OxOE2
stat-c
r9
OxOA2
r9
*in2
OxOE4
stat-Io
r10
OxOA4
r10
* stat
OxOE6
stat-hi
r11
OxOA6
r11
'stat#
OxOEB
yeven-Io
r12
OxOAS
r12
circbuf
OxOEA
yeven-hi
r13
OxOAA
r13
OxOEC
yodd-Io
r14
OxOAC
r14
OxOEE
yodd-hi
r15
OxOFO
ypitch
OxOAE
r15
OxOBO
circbuf
literal 0
OxOF2
shift
stat-c
literal 1
OxOF4
vu-Io
*dram1
*dram1
literal 2
OxOF6
vu-hi
*dram2
OxOB6
*dram2
literal 3
OxOFB
vupitch
*dram1 + +
OxOBB
*dram1 + +
literal 4
OxOFA
vpitch
*dram2+ +
OxOBA
*dram2+ +
literal 5
OxOFC
vptr-Io
*dram1- -
OxOBC
*dram1- -
literal 6
OxOFE
vptr-hi
*dram2- -
OxOBE
*dram2- -
literal 7
OxOCO
'out1
prof
OxOB2
OxOB4
Table 3-15. VRAM Pointer RAM Mapping
Byte Address    Name                   Description
0x180, 0x182    Yw-lo, Yw-hi           Working Copy of Y Pointer
0x184, 0x186    out1-lo, out1-hi       Output FIFO 1 Pointer
0x188           Yw-pitch               Working Copy of Y Pitch
0x18A                                  RESERVED
0x18C, 0x18E    out2-lo, out2-hi       Output FIFO 2 Pointer
0x190, 0x192    VUw-lo, VUw-hi         Working Copy of VU Pointer
0x194, 0x196    in1-lo, in1-hi         Input FIFO 1 Pointer
0x198           VUpitchw               Working Copy of VU Pitch
0x19A           vpitchw                Working Copy of 82750DB Pitch
0x19C, 0x19E    in2-lo, in2-hi         Input FIFO 2 Pointer
0x1A0, 0x1A2    vptrw-lo, vptrw-hi     Working Copy of 82750DB Pointer
0x1A4, 0x1A6    stat-lo, stat-hi       Working Copy of Statistical Decoder Pointer
0x1A8, 0x1AA    Yeven-lo, Yeven-hi     Shadow Copy of Y Start Even Pointer
0x1AC, 0x1AE    Yodd-lo, Yodd-hi       Shadow Copy of Y Start Odd Pointer
0x1B0           Ypitch                 Shadow Copy of Y Pitch
0x1B2           rfcnt                  RFSH Cycles per RFSH Code from 82750DB
0x1B4, 0x1B6    VU-lo, VU-hi           Shadow Copy of VU Start Pointer
0x1B8           VUpitch                Shadow Copy of VU Pitch
0x1BA           vpitch                 Shadow Copy of 82750DB Pitch
0x1BC, 0x1BE    vptr-lo, vptr-hi       Shadow Copy of 82750DB Pointer
NOTE: Register rfcnt is a write-only register and should never be read.
Initializing the 82750PB
The 82750PB is placed in a RESET state by asserting RESET# for at least ten T-cycles. In the RESET state, which continues until RESET# is released, all of the 82750PB's outputs are tri-stated for compatibility with board test requirements.
Proper initialization of the 82750PB requires that the 82750PB is held in a RESET state by keeping RESET# active for at least 10 T-cycles, and then releasing RESET#. This is referred to as the INITIAL state. In the INITIAL state:
• The microcode processor is halted.
• The VRAM interface is ready to service VRAM requests; however, none of the VRAM pointers are valid.
• All six interrupts are masked, and the interrupt latches are cleared.
• The 82750PA/82750PB instruction format select bit is set to the 82750PA.
• The number of refresh cycles that will be generated each time a RFSH code is received from the 82750DB is set to 14 cycles.
• All bidirectional I/O pins are tri-stated.
After the 82750PB has been initialized, i.e., placed in the INITIAL state, but prior to releasing the 82750DB's reset signal, the following operations must be performed:
• Load the REFRESH-CYCLES-PER-LINE register with the appropriate value. The equation for the value is VALUE = 2^N - 1, where N is the number of cycles; for example, 5 refresh cycles would result in VALUE = 2^5 - 1 = 31 (decimal) = 001F (hex). The refresh register is 14 bits wide, and it works by generating one refresh every time a right shift results in a '1' bit. It continues the right shifting until it finds a '0' bit and halts. Hence, from a programming point of view: 001F (hex) = FF0F (hex) = 5 refresh cycles per line.
• Load the shadow copies of the Y, VU, and 82750DB pointers and pitches.
• Load the appropriate 82750DB Register Load list into VRAM starting at the address pointed to by the 82750DB pointer.
Prior to releasing the microcode processor from its HALTed state to run a microcode program, the following operations must be performed:
• If 82750PB code is to be executed, bit 15 of the 82750PB CONTROL register must be set to a one.
• Load a microcode program into microcode RAM on the 82750PB by writing to the three instruction word registers (mcode1, the most significant word of the instruction; mcode2; and mcode3, the least significant word of the instruction, the one containing the next address field) and then writing to maddr, the address in microcode RAM where the instruction will be loaded.
• Load the PC with the address in microcode RAM of the first instruction to be executed.
• Write to the 82750PB CONTROL register with the HALT bit (bit 0) set to zero, causing the processor to start executing an instruction sequence, or with the SINGLE-STEP bit (bit 1) set to a one (keeping HALT also set to one), causing the processor to execute a single instruction.
An external HALT pin is provided on the 82750PB to allow external debugging hardware to immediately halt the microcode processor. Activating this input causes the microcode processor to halt prior to executing the next instruction. When the processor is halted, the VRAM interface portion of the 82750PB continues to operate normally, performing transfer cycles, refresh cycles, and shadow copies as requested by the 82750DB.
Performance Monitoring
Two signals, FRZ# and PMON#, which are useful for microcode performance monitoring, are available both as external signals, multiplexed on a single output pin, and as bits in the Processor Status register. FRZ# is active for each T-cycle when the microcode processor is frozen, waiting for access to VRAM or to the VRAM Pointer RAM. PMON# can be toggled by a special ALU opcode or a special B bus source code. This allows PMON# to be used to indicate what particular segment of microcode is being executed. The PMON/FRZ bit in the Processor Control register selects the signal that is being output.
Freezes may indicate that the microcode routine is not making the most efficient use of the input and output FIFO buffering. This is particularly important for the inner loops of graphics and video routines that are memory-bandwidth limited. Ideally, inner loops should be balanced so that the rate at which pixels are processed is equal to the rate at which they can be read from and written to VRAM with no freezes. The buffering in the input and output FIFOs serves to make sequential reads and writes to VRAM more efficient by performing full 64-bit reads and writes, instead of individual 8-bit or 16-bit accesses. This has the effect of averaging the VRAM read/write rate over a number of instruction times. For example, if the 82750PB is performing a 64-bit read or write every 8 T-cycles, for an average of 8 bits per T-cycle, a two-instruction inner loop could read one 8-bit pixel and write one 8-bit pixel without any freezes occurring (assuming the source pixels and the destination pixels are each sequential).
The PMON# signal provides a more standard performance monitoring capability by indicating when a particular segment of microcode, bracketed by special instructions that toggle the PMON# signal, is being executed. This allows either absolute execution-time measurement or measurement of the fraction of the total execution time that is required by the segment. Either the ALU opcode 'prof' or the B bus source code 'prof' will toggle the PMON signal.
Host/VRAM Timing Diagrams
Figures 3-4 through 3-8 are Host/VRAM Timing Diagrams.
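The refresh-count encoding used during initialization (VALUE = 2^N - 1, with one refresh generated per '1' bit shifted out until a '0' bit is found) can be sketched as follows. The function names are hypothetical, and the upper limit of 14 cycles is an assumption based on the stated 14-bit register width and the reset default of 14 cycles.

```python
def refresh_register_value(n_cycles):
    """Encode N refresh cycles as VALUE = 2**N - 1 (N low-order one bits)."""
    if not 0 <= n_cycles <= 14:  # assumed limit for a 14-bit register
        raise ValueError("refresh cycle count must be between 0 and 14")
    return (1 << n_cycles) - 1


def refresh_cycles(value):
    """Model the right-shift scheme: count '1' bits until the first '0'."""
    count = 0
    while value & 1:
        count += 1
        value >>= 1
    return count
```

With these definitions, `refresh_register_value(5)` gives 0x001F, and `refresh_cycles(0x001F)` gives back 5.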
Figure 3-4. VRAM Read and Write Cycles
(Timing diagram. Signals shown: CLK, MREQ#, A[31:3], A[2], BE#[3:0], WE#, NXTFST#, TRNFR#, RFSH#, D[31:0], MRDY#. Panels: 82750PB VRAM Write Cycle Pair; 82750PB VRAM Read Cycle Pair. Annotations: first cycle has one wait state; zero wait states for both cycles.)
NOTES:
1. Address pin A[2] is always ZERO for the first cycle of a cycle pair and ONE for the second cycle.
2. The two cycles of a cycle pair are both writes or both reads.
Figure 3-5. VRAM Transfer and Refresh Cycles
(Timing diagram. Signals shown: CLK, MREQ#, A[31:2], BE#[3:0], WE#, NXTFST#, TRNFR#, RFSH#, D[31:0], MRDY#. Panels: 82750PB VRAM Refresh Cycle; 82750PB VRAM Transfer Cycle (Transfer Read or Write depending on state of WE# signal).)
NOTE (refresh cycle): it is assumed that a CAS-before-RAS refresh cycle is generated to the DRAM/VRAM chips.
Figure 3-6. Host Register Read and Write Cycles
(Timing diagram. Signals shown: CLK, HREQ#, HREG#, HRAM#, HBUSEN#, HALEN#, A[31:2], BE#[3:0], WE#, D[31:0], HRDY#. Shaded areas indicate bidirectional signals driven by the host.)
NOTES:
1. MREQ#, RFSH#, TRNFR#, and NXTFST# remain inactive during Host Register Read and Write cycles.
2. If the HALEN#/HREQ# synchronizers are disabled, then the second Ti and Tb states will be missing.
Figure 3-7. Host External Read and Write Cycles
(Timing diagram. Signals shown: CLK, HREQ#, HREG#, HRAM#, HBUSEN#, A[31:2], BE#[3:0], WE#, D[31:0], MRDY#, HRDY#. Note: HRDY# is only asserted by the 82750PB if external logic asserts MRDY#. If MRDY# is not asserted, HRDY# stays inactive during an External cycle.)
NOTES:
1. MREQ#, RFSH#, TRNFR#, and NXTFST# remain inactive during Host External Read and Write cycles.
2. If the synchronizer on HREQ# is disabled, then the second Ti state will be missing.
Figure 3-8. Host VRAM Read and Write Cycles
(Timing diagram. Signals shown: CLK, HREQ#, HREG#, HRAM#, HBUSEN#, HALEN#, A[31:2], BE#[3:0], WE#, D[31:0], MREQ#, MRDY#, HRDY#. Note: the 82750PB will stay in Tb for the maximum of: 1) one T-state, or 2) two T-states after HALEN# goes low.)
NOTES:
1. RFSH#, TRNFR#, and NXTFST# remain inactive during Host VRAM Read and Write cycles.
2. If the synchronizers on HREQ#/HALEN# are disabled, then the second Ti state will be missing.
4.0 MICROCODE INSTRUCTION FORMAT
Overview
The 82750PB executes two slightly different instruction formats: one that is backward compatible with the 82750PA and another that allows full access to the microcode resources of the 82750PB. The 82750PA/82750PB bit in the 82750PB processor control register determines which instruction format is in effect (see Chapter 3). On reset, the 82750PB is placed in 82750PA instruction format mode. In this mode the 82750PB will execute binary microcode originally assembled for the 82750PA in a manner that is functionally equivalent to the 82750PA.
The following description applies to the 82750PB instruction format. Exact definitions of 82750PB instruction formats and field codings are shown in Figure 4-2 and Table 4-5.
Instruction Sequencing
The instruction word for 82750PB's microcode processor is 48 bits wide. The Microcode RAM holds 512
instructions. Nine bits of each instruction specify the
address of the next instruction to be executed. Each
instruction fetch reads two instructions (of odd address and even address pair) using the upper eight
bits of the 9-bit instruction address. Both the LSB of
the instruction address and a Condition Flag bit, selected from eight possible branching conditions, are
used to determine whether the next instruction to be
executed is the even address instruction or odd address instruction, according to the logic table shown
as Table 4-1.
Table 4-1. Microcode Next Instruction Selection
| LSB of Address | Condition Flag State | Next Instruction |
|---|---|---|
| 0 | 0 (FALSE) | EVEN |
| 0 | 1 (TRUE) | EVEN |
| 1 | 0 (FALSE) | ODD |
| 1 | 1 (TRUE) | EVEN |
For an unconditional branch, the condition flag
FALSE (which is always zero) is selected; this causes the LSB of the address to be passed through to
select the next instruction: LSB = 0 selects EVEN
and LSB = 1 selects ODD. This allows unconditional branching to any of the 512 instructions in the
RAM. For a conditional branch, the LSB of the address is set to a one; this causes the state of the
condition flag to select the next instruction: FALSE
selects the ODD instruction and TRUE selects the
EVEN instruction. Therefore, a conditional branch
jumps to either the odd or even instruction of an
odd/even pair depending on the state of the condition.
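The selection rule of Table 4-1 can be sketched in a few lines of Python. This is only an illustrative model, not Intel code: the function name `next_instruction_address` and the representation of the condition flag as a Python `bool` are assumptions made for the example.

```python
def next_instruction_address(naddr: int, cond_flag: bool) -> int:
    """Model of Table 4-1: pick the even or odd instruction of a fetched pair.

    naddr     -- the 9-bit next-address field (0..511)
    cond_flag -- state of the condition flag selected by CFSEL
                 (always False when the FALSE flag is selected)
    """
    if not 0 <= naddr < 512:
        raise ValueError("NADDR is a 9-bit field (0..511)")
    pair_base = naddr & ~1   # upper eight bits address the even/odd pair
    lsb = naddr & 1
    # The ODD instruction is chosen only when LSB = 1 and the flag is FALSE.
    take_odd = lsb == 1 and not cond_flag
    return pair_base | int(take_odd)


# Unconditional branch: FALSE flag selected, LSB passes through.
print(hex(next_instruction_address(0x54, False)))  # even instruction of the pair
print(hex(next_instruction_address(0x55, False)))  # odd instruction of the pair
# Conditional branch: LSB = 1, TRUE condition falls to the even instruction.
print(hex(next_instruction_address(0x55, True)))
```

Because both instructions of the pair are already fetched, the choice between them costs no extra cycle, which is the zero-delay two-way branch the text describes.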
Instruction Word Field Descriptions
Each field of the microcode instruction format is described in the following sections.
NADDR-NEXT INSTRUCTION ADDRESS FIELD
This field holds the address of the next instruction to
be executed. Taking advantage of the fact that the
microcode RAM is physically organized as 256 deep
by 96 wide (two instructions are fetched per read
cycle), a zero delay two-way branch can be
achieved. The only case in which this field is not
used to determine the address of the next instruction to be executed is when an instruction writes to
the PC. (The term PC refers to the register that holds
the address of the next instruction to be executed.)
When an instruction loads the PC a one instruction
delay occurs before the load takes effect. Therefore,
the instruction pointed to by the next instruction field
of the instruction that loads the PC is executed before the jump to the new address occurs. This is
shown in Table 4-2.
There are no restrictions on the instruction following
a PC load; it will always be executed, even while
single stepping the processor or if the processor is
frozen on that instruction.
CFSEL-CONDITION FLAG SELECT FIELD
This field selects which condition flag will be used
with the LSB of NADDR to select the next instruction
from the odd/even pair. The condition flag assignment is given in Table 4-3.
Table 4-2. PC Load Example
| Addr | Instruction | NADDR | Comments |
|---|---|---|---|
| 10 | pc = 0 | 55 | Load PC with zero. |
| 55 | r0 = 1 | X | This instruction is executed but its next address field is ignored. |
| 0 | r1 = r0 | 25 | PC load takes effect after a one instruction delay; the result is that r1 = r0 = 1. |
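The one-instruction delay of a PC load can be illustrated with a small simulation of the sequence in Table 4-2. This is a sketch under assumptions: the `run` function, the dictionary-based program layout, and the convention that an operation returns a new PC value when it loads the PC are all hypothetical and exist only for this example.

```python
def run(program, start, max_steps):
    """Step through `program`, modelling the delayed effect of a PC load.

    program maps an address to (operation, naddr). An operation takes the
    register dict and returns a new PC value if it loads the PC, else None.
    """
    regs = {"r0": 0, "r1": 0}
    trace = []
    addr = start
    pending_pc = None  # PC value written by the previous instruction, if any
    for _ in range(max_steps):
        operation, naddr = program[addr]
        trace.append(addr)
        new_pc = operation(regs)
        if pending_pc is not None:
            # The previous instruction loaded the PC, so this instruction's
            # NADDR field is ignored and the jump happens now.
            addr = pending_pc
        else:
            addr = naddr
        pending_pc = new_pc
    return trace, regs


def load_pc_zero(regs):
    return 0                      # pc = 0


def set_r0(regs):
    regs["r0"] = 1                # r0 = 1


def copy_r0_to_r1(regs):
    regs["r1"] = regs["r0"]       # r1 = r0


# Addresses and NADDR values are taken from Table 4-2; X is modelled as None.
program = {
    10: (load_pc_zero, 55),
    55: (set_r0, None),
    0: (copy_r0_to_r1, 25),
}
trace, regs = run(program, start=10, max_steps=3)
print(trace, regs)
```

The trace visits addresses 10, 55, and 0 in that order, and both r0 and r1 end up holding 1, matching the comments in the table.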
Table 4-3. Condition Flag Select Field Assignments
| Value | Flag | Description |
|---|---|---|
| 000 | FALSE | Select for Unconditional Branch |
| 001 | CARRY | Carry Out from ALU Condition Flag Latch |
| 010 | OVF | Overflow from ALU Condition Flag Latch |
| 011 | SIGN | Sign from ALU Condition Flag Latch |
| 100 | ZERO | Zero from ALU Condition Flag Latch |
| 101 | LCNTZ | TRUE if Selected Loop Counter = 0 |
| 110 | LSB | LSB of Data Register r0 |
| 111 | MSB | MSB of Data Register r0 |
NOTE:
The ALU condition flags (CARRY, OVF, SIGN, and ZERO) are latched in the ALU Condition Flag register. This register is updated for most, but not all, ALU operations. The remaining flags (LCNTZ, LSB, and MSB) are updated and latched each cycle.
ASRC-A BUS SOURCE SELECT FIELD
This field selects the element that should drive its data onto the A bus during the execution of this instruction. The mapping for this and the following three fields is provided in Chapter 6.
ADST-A BUS DESTINATION SELECT FIELD
This field selects which element should latch data from the A bus during the execution of this instruction. See ASRC above.
BSRC-B BUS SOURCE SELECT FIELD
Same as ASRC, but for B bus. See ASRC above.
BDST-B BUS DESTINATION SELECT FIELD
Same as ADST, but for B bus. See ADST above.
CNT-DECREMENT LOOP COUNTER BIT
A one in this bit position causes the selected Loop Counter (selected by LC, the loop counter select bit) to be decremented. The new value of the loop counter and the updated LCNTZ condition flag are not ready until the next instruction cycle. Therefore, in a loop where the loop counter is decremented and tested for zero in the same instruction (typically in a one instruction loop), the start value for the loop counter should be one less than the number of times the loop should be executed.
LIT-LITERAL SELECT BIT
When this bit is a one, the ASRC and CFSEL fields are replaced with a 9-bit literal value that is driven as a source in the least significant 9 bits of the A bus. In this case, the upper 7 bits of the A bus are forced to zeros. The mapping of bits from the literal field to the A bus is shown in Figure 4-1.
NOTE
A conditional branch and a literal on the A bus are not allowed in the same instruction. A 3-bit literal can be placed on the B bus in any instruction.
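The start-value rule given for the CNT bit (load one less than the desired iteration count) follows from the one-cycle latency of the LCNTZ flag. The sketch below models a one-instruction loop that decrements the counter and tests LCNTZ in the same instruction. It is an illustrative model only: the function name is hypothetical, the loop is assumed to exit when the tested flag is TRUE, and counter width and wrap-around are not modelled.

```python
def run_single_instruction_loop(start_value: int) -> int:
    """Return how many times a decrement-and-test loop instruction executes."""
    counter = start_value
    lcntz = counter == 0   # flag as latched before the loop instruction runs
    executions = 0
    while True:
        executions += 1        # the loop instruction executes once
        exit_loop = lcntz      # the branch sees the flag from the previous cycle
        counter -= 1           # CNT = 1: the new value is visible next cycle
        lcntz = counter == 0
        if exit_loop:
            return executions


# To execute the loop body 4 times, load the counter with 3.
print(run_single_instruction_loop(3))
```

In this model a start value of N - 1 yields N executions, because the instruction that decrements the counter to zero still sees the old, non-zero flag.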
| A bus bits | 15-9 | 8-3 | 2-0 |
|---|---|---|---|
| Instruction word bits | (forced to zero) | 17-12 | 11-9 |
| Field | | ASRC | CFSEL |
Figure 4-1. Literal Field Mapping onto the A Bus
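The literal mapping can be expressed as simple bit arithmetic. The sketch below assumes the 48-bit instruction word is held in a Python integer with bit 0 as the least significant bit; the function names are hypothetical and chosen only for this illustration.

```python
def literal_from_fields(asrc: int, cfsel: int) -> int:
    """Combine the 6-bit ASRC and 3-bit CFSEL field values into the 9-bit literal."""
    if not 0 <= asrc < 64 or not 0 <= cfsel < 8:
        raise ValueError("ASRC is 6 bits wide and CFSEL is 3 bits wide")
    return (asrc << 3) | cfsel


def literal_on_a_bus(instruction_word: int) -> int:
    """Return the 16-bit A bus value driven when LIT = 1.

    Instruction word bits 17..9 appear on A bus bits 8..0;
    A bus bits 15..9 are forced to zero.
    """
    return (instruction_word >> 9) & 0x1FF


word = literal_from_fields(0b101010, 0b011) << 9   # place the literal at bits 17..9
print(hex(literal_on_a_bus(word)))
```

Since the literal occupies the CFSEL bits, an instruction carrying an A bus literal has no free condition flag select, which is consistent with the restriction that a conditional branch and an A bus literal cannot share an instruction.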
ALUSS-ALU SOURCE SELECT BITS
These two bits are used as enables for the two ALU input latches. Bit 39 enables the latch that connects to the A bus; bit 38 enables the latch connected to the B bus. A one in either bit position causes the corresponding input latch to latch the value on the bus to which it is connected (the A or B bus). A zero on either bit causes the corresponding latch to hold its current content. This allows the ALU operands either to come from "eavesdropping" on the A or B bus transfers occurring in the current instruction cycle or to be held for multiple instruction cycles in either the A or B input latch.
ALUOP-ALU OPERATION CODE FIELD
This field specifies the ALU instruction to be performed during the current instruction cycle. The encoding of this field is given in Figure 4-2. Normally, at the end of the instruction execution, the result of the ALU operation is latched in the ALU output latch that can be a source on either the A or B buses. However, if a NOP is selected for the ALU operation, the ALU output latch is not latched. The data is held from the previous instruction. In addition to NOP, certain other ALU opcodes do not actually perform ALU operations and, therefore, do not latch the ALU results. They are INT (microcode interrupt) and the PROF instruction.
SHFT-SHIFT CONTROL FIELD
This field controls the bit shifting and byte swapping logic associated with register r0. The encoding of this field is given in Table 4-4.
Table 4-4. SHIFT Control Field Coding
| SHFT | Operation |
|---|---|
| 00 | No Shift or Swap Operation |
| 01 | Shift r0 Right One Bit Position, Sign Extend |
| 10 | Shift r0 Left One Bit Position, Zero Fill |
| 11 | Byte Swap the Value Being Loaded into r0* |
*Byte swapping only works when r0 is the destination on the A bus or the B bus. It does not swap data held in r0, only data being loaded. In order to byte swap data in register r0, r0 must be both a source and destination for either the A or B bus.
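The SHFT operations of Table 4-4 can be sketched as plain integer operations. The example assumes r0 is 16 bits wide, matching the 16-bit A bus shown in Figure 4-1; the function names and the `apply_shft` dispatcher are hypothetical. For code 11 the argument stands for the value being loaded into r0, since the swap acts on incoming data rather than on the held contents.

```python
MASK16 = 0xFFFF


def shift_right_sign_extend(value: int) -> int:
    """SHFT = 01: shift right one bit position, replicating the sign bit."""
    value &= MASK16
    return (value >> 1) | (value & 0x8000)


def shift_left_zero_fill(value: int) -> int:
    """SHFT = 10: shift left one bit position, filling bit 0 with zero."""
    return (value << 1) & MASK16


def byte_swap(value: int) -> int:
    """SHFT = 11: exchange the high and low bytes of the value being loaded."""
    value &= MASK16
    return ((value << 8) | (value >> 8)) & MASK16


def apply_shft(code: int, value: int) -> int:
    """Dispatch on the 2-bit SHFT field value."""
    operations = {
        0b00: lambda v: v & MASK16,      # no shift or swap
        0b01: shift_right_sign_extend,
        0b10: shift_left_zero_fill,
        0b11: byte_swap,
    }
    return operations[code](value)


print(hex(apply_shft(0b01, 0x8002)))
print(hex(apply_shft(0b10, 0x8001)))
print(hex(apply_shft(0b11, 0x12AB)))
```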
LC-LOOP COUNTER SELECT BIT
This bit selects which of the two loop counters is to
be used for decrementing or Loop-Counter-Zero
conditional branching in the current instruction. A
zero selects loop counter zero and a one selects
loop counter one.
Refer to the Intel 82750PB Microcode Programming
Guide for more information on microcode programming.
Table 4-5. 82750PB Source/Destination Coding
Address (Hex)
BDST
BSRC
OxO
Null
Null
alu
hwid
"dram3
"dram3
cc
Ox3
"dram4
*dram4
Ox4
*dram3+ +
*dram3+ +
Ox5
*dram4+ +
"dram4+ +
cnt
cnt
Ox6
*dram3- -
*dram3- -
cnt2
cnt2
Ox7
*dram4- -
*dram4- -
.Icnt
Icnt
OxB
rO
rO
rO
rO
Ox9
-r1
r1
r1
r1
OxA
r2
r2
r2
r2
Ox1
Ox2
\
ADST
ASRC
Null
Null
maddr
alu
OxB
r3
r3
r3
r3
OxC
r4·
r4
r4
r4
OxD
r5
r5
r5
r5
OxE
r6
r6
r6
r6
OxF
r7
r7
r7
r7
Ox10
rB
*in1
mcode3
mcode3
Ox11
r9
*in2
mcode2
mcode2
Ox12
. r10
"stat
mcode1
mcode1
pc
Ox13
r11
'stat#
pc
Ox14
r12
circbuf
pixint-c
Ox15
r13
pixint
pixint
Ox16
r14
"dram1
'dram1
Ox17
r15
Ox1B
circbuf
Ox19
'dram2
"dram2
literal 0
'dram1 + +
'dram1 + +
literal 1
'dram2+ +
'dram2+ +
Ox1A
*dram1
literal 2
'dram1- -
'dram1- -
Ox1B
*dram2
literal 3
'dram2- -
'dram2- -
Ox1C
*dram1 + +
literal 4
dram1
dram1
Ox1D
'dram2+
+
literal 5
dram2
dram2
Ox1E
*dram1- -
literal 6
dram3
dram3
Ox1F
*dram2- -
literal 7
dram4
dram4
Ox20
'out1
prof
'out1
. 'in1
Table 4-5. 82750PB Source/Destination Coding (Continued)
| Address (Hex) | BDST | BSRC | ADST | ASRC |
|---|---|---|---|---|
| 0x21 | out1++ | | out1++ | *in2 |
| 0x22 | out1-lo | out1-lo | shift-rl | *stat |
| 0x23 | out1-hi | out1-hi | out1-hi | *stat# |
| 0x24 | *out2 | stat-lo | *out2 | |
| 0x25 | out2++ | stat-hi | out2++ | |
| 0x26 | out2-lo | out2-lo | shift-r | |
| 0x27 | out2-hi | out2-hi | out2-hi | |
| 0x28 | out1-c | out1-c | out1-c | |
| 0x29 | in1-c | in1-c | in1-c | |
| 0x2A | in1-lo | in1-lo | shift-l | |
| 0x2B | in1-hi | in1-hi | in1-hi | |
| 0x2C | out2-c | out2-c | out2-c | |
| 0x2D | in2-c | in2-c | in2-c | |
| 0x2E | in2-lo | in2-lo | | |
| 0x2F | in2-hi | in2-hi | in2-hi | |
| 0x30 | stat-ram | r8 | r8 | r8 |
| 0x31 | stat-c | r9 | r9 | r9 |
| 0x32 | stat-lo | r10 | r10 | r10 |
| 0x33 | stat-hi | r11 | r11 | r11 |
| 0x34 | yeven-lo | r12 | r12 | r12 |
| 0x35 | yeven-hi | r13 | r13 | r13 |
| 0x36 | yodd-lo | r14 | r14 | r14 |
| 0x37 | yodd-hi | r15 | r15 | r15 |
| 0x38 | ypitch | shift | cc | shift |
| 0x39 | | stat-c | fcnt | fcnt |
| 0x3A | vu-lo | *dram1 | *dram3 | *dram3 |
| 0x3B | vu-hi | *dram2 | *dram4 | *dram4 |
| 0x3C | vupitch | *dram1++ | *dram3++ | *dram3++ |
| 0x3D | vpitch | *dram2++ | *dram4++ | *dram4++ |
| 0x3E | vptr-lo | *dram1-- | *dram3-- | *dram3-- |
| 0x3F | vptr-hi | *dram2-- | *dram4-- | *dram4-- |
[Figure 4-2, first part: instruction word bits 47-24. Fields, from the most significant bit: LC SEL (bit 47; cnt, cnt2), SHFT CNTL (bits 46-45; nop, shftr, shftl, swap), ALU OPCODE (bits 44-40; mnemonics include NOP, ZERO, a, b, -a, -b, a++, b++, int, prof), ALU SS (bits 39-38; hold, latb, lata, both), LIT (bit 37; nop, lit), CNT (bit 36; nop, dec), B Bus Source (bits 35-30), B Bus Destination (bits 29-24). The B bus source and destination codings are those of Table 4-5.]
Figure 4-2. 82750PB Instruction Word Format
[Figure 4-2, continued: instruction word bits 23-0. Fields: A Bus Destination (bits 23-18), A Bus Source (bits 17-12), Cond Flag Select (bits 11-9; codings as in Table 4-3), Next Address (bits 8-0). The A bus source and destination codings are those of Table 4-5.]
Figure 4-2. 82750PB Instruction Word Format (Continued)
5.0 ELECTRICAL DATA
Maximum Ratings
Table 5-1 is a stress rating only, and functional operation at the maximums is not guaranteed. Functional operating conditions are given in the DC and AC Characteristics (Tables 5-2, 5-3, 5-4, and 5-5).
Exposure to Maximum Ratings may affect device reliability. Furthermore, although the 82750PB contains protective circuitry to resist damage from static electrical discharge, always take precautions to avoid high static voltages or electric fields.
Table 5-1. Absolute Maximum Requirements
| Condition | Maximum Requirement |
|---|---|
| Case Temperature under Bias | to 110°C |
| Storage Temperature | to 150°C |
| Voltage on Any Pin with Respect to Ground | to VCC + 0.5V |
| Supply Voltage with Respect to VSS | V to +6.5V |
DC Characteristics
Table 5-2. DC Characteristics (0°C to 90°C)
| Parameter | Unit | Notes |
|---|---|---|
| Input LOW Voltage | V | (Note 1) |
| Input HIGH Voltage | V | (Note 1) |
| Output LOW Voltage | V | IOL = 4.0 mA |
| Output HIGH Voltage | V | IOH = -1.0 mA |
8.HP
150
+10
IlA
VSs<"iN
240854-28
mm (inch)
Figure 6-5. 132-Lead PQFP Mechanical Package Detail - Typical Lead
NOTES:
ALL DIMENSIONS AND TOLERANCES CONFORM TO ANSI Y14.5M-1982.
DATUM PLANE LOCATED AT THE MOLD PARTING LINE AND COINCIDENT WITH THE BOTTOM OF THE LEAD WHERE LEAD EXITS PLASTIC BODY.
DATUMS TO BE DETERMINED WHERE CENTER LEADS EXIT PLASTIC BODY AT DATUM PLANE.
CONTROLLING DIMENSION: INCH.
DIMENSIONS D1, D2, E1 AND E2 ARE MEASURED AT THE MOLD PARTING LINE. D1 AND E1 DO NOT INCLUDE AN ALLOWABLE MOLD PROTRUSION OF 0.18 mm (.007 IN) PER SIDE. D2 AND E2 DO NOT INCLUDE A TOTAL ALLOWABLE MOLD PROTRUSION OF 0.18 mm (.007 IN) AT MAXIMUM PACKAGE SIZE.
PIN 1 IDENTIFIER IS LOCATED WITHIN ONE OF THE TWO ZONES INDICATED.
240854-29
Package Thermal Specifications
The 82750PB is specified for operation when TC (the case temperature) is within the range of 0°C to 90°C. TC may be measured in any environment to determine whether the 82750PB is within specified operation range. The case temperature should be measured at the center of the top surface.
TA (the ambient temperature) can be calculated from θCA (thermal resistance from case to ambient) with the following equation:
$$T_A = T_C - P \cdot \theta_{CA}$$
Typical values for θCA at various airflows are given in Table 6-3 for the 132-lead PQFP package. Table 6-4 shows the maximum TA allowable (without exceeding TC) at various airflows. The power dissipation (P) is calculated by using the typical supply current at 5V as shown in Table 5-2.
Table 6-3. Thermal Resistance (°C/W)
θCA Versus Airflow, ft/min (m/sec)
| Package | 0 (0) | 200 (1.01) | 400 (2.03) | 600 (3.04) | 800 (4.06) | 1000 (5.07) |
|---|---|---|---|---|---|---|
| 132-Lead PQFP | 26.0 | 17.5 | 14.0 | 11.5 | 9.5 | 8.5 |
Table 6-4. Maximum TA at Various Airflows (°C)
TA Versus Airflow, ft/min (m/sec)
| Package | Frequency (MHz) | 0 (0) | 200 (1.01) | 400 (2.03) | 600 (3.04) | 800 (4.06) | 1000 (5.07) |
|---|---|---|---|---|---|---|---|
| 132-Lead PQFP | 25 | 70 | 76 | 80 | 81 | 83 | 84 |
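As a worked illustration of the relation between ambient temperature, case temperature, power, and θCA, the sketch below evaluates the maximum ambient temperature for a given power dissipation using the θCA values of Table 6-3. The power figures passed in are placeholders chosen for the example, not datasheet values, and the function and dictionary names are hypothetical.

```python
# Thermal resistance from Table 6-3, keyed by airflow in ft/min (values in deg C/W).
THETA_CA = {0: 26.0, 200: 17.5, 400: 14.0, 600: 11.5, 800: 9.5, 1000: 8.5}


def max_ambient_temp(power_w: float, airflow_ft_min: int,
                     t_case_max: float = 90.0) -> float:
    """Return TA = TC - P * thetaCA, with TC at the 90 deg C upper limit by default."""
    return t_case_max - power_w * THETA_CA[airflow_ft_min]


# Illustrative power values only; the actual P comes from the supply current in Table 5-2.
print(max_ambient_temp(0.75, 0))
print(max_ambient_temp(1.0, 1000))
```

More airflow lowers θCA, so the same power dissipation tolerates a higher ambient temperature, which is the trend visible across Table 6-4.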
i860™ Microprocessor Family
i860™ XP MICROPROCESSOR
• Parallel Architecture that Supports Up to Three Operations per Clock
- One Integer or Control Instruction
- Up to Two Floating-Point Results
• High Performance Design
- 40/50 MHz Clock Rate
- 100 Peak Single Precision MFLOPS
- 75 Peak Double Precision MFLOPS
- 64-Bit External Data Bus
- 64-Bit Internal Code Bus
- 128-Bit Internal Data Bus
• High Integration on One Chip
- 32-Bit Integer and Control Unit
- 32/64-Bit Pipelined Floating-Point
- 64-Bit 3-D Graphics Unit
- Paging Unit with 64 Four-Kbyte and 16 Four-Mbyte Pages
- 16 Kbyte Code Cache
- 16 Kbyte Data Cache
• Compatible with Industry Standards
- ANSI/IEEE Standard 754-1985 for Binary Floating-Point Arithmetic
- Intel386™/Intel486™/i860™ Data Formats and Page Table Entries
- Binary Compatible with i860™ XR Applications Instruction Set
- Detached Concurrency Control Unit (CCU) Supports Parallel Architecture Extensions (PAX)
- JEDEC 262-pin Ceramic Pin Grid Array Package
- IEEE Standard 1149.1/D6 Boundary-Scan Architecture
• Easy to Use
- On-Chip Debug Register
- UNIX*/860
- APX Attached Processor Executive
- Assembler, Linker, Simulator, Debugger, C and FORTRAN Compilers, FORTRAN Vectorizer, Scalar and Vector Math Libraries
- Graphics Libraries
• Fast, Multiprocessor-Oriented Bus
- Burst Cycles Move 400 Mbyte/Sec
- Hardware Cache Snooping
- MESI Cache Consistency Protocol
- Supports Second-Level Cache
- Supports DRAM
The Intel i860 XP Microprocessor (order code A80860XP) delivers supercomputing performance in a single VLSI component. The 32/64-bit architecture of the i860 XP microprocessor balances integer, floating point, and graphics performance for applications such as engineering workstations, scientific computing, 3-D graphics workstations, and multiuser systems. Its parallel architecture achieves high throughput with RISC design techniques, multiprocessor support, pipelined processing units, wide data paths, large on-chip caches, 2.5 million transistor design, and fast 0.8-micron silicon technology.
[Block diagram. External connections shown: A31-A3, D63-D0, CONTROL.]
240874-1
Figure 0.1. Block Diagram
*UNIX is a registered trademark of UNIX System Laboratories, Inc.
Intel, i860, Intel386 and Intel486 are trademarks of Intel Corporation.
November 1991
Order Number: 240874-002
CONTENTS
1.0 FUNCTIONAL DESCRIPTION
2.0 PROGRAMMING INTERFACE
2.1 Data Types
2.1.1 Integer
2.1.2 Ordinal
2.1.3 Single- and Double-Precision Real
2.1.4 Pixel
2.2 Register Set
2.2.1 Integer Register File
2.2.2 Floating-Point Register File
2.2.3 Processor Status Register
2.2.4 Extended Processor Status Register
2.2.5 Data Breakpoint Register
2.2.6 Directory Base Register
2.2.7 Fault Instruction Register
2.2.8 Floating-Point Status Register
2.2.9 KR, KI, T, and MERGE Registers
2.2.10 Bus Error Address Register
2.2.11 Privileged Registers
2.2.12 Concurrency Control Register
2.2.13 NEWCURR Register
2.2.14 STAT Register
2.3 Addressing
2.4 Virtual Addressing
2.4.1 Page Frame
2.4.2 Virtual Address
2.4.3 Page Tables
2.4.4 Page-Table Entries
2.4.4.1 Page Frame Address
2.4.4.2 Present Bit
2.4.4.3 Writable and User Bits
2.4.4.4 Write-Through Bit
2.4.4.5 Cache Disable Bit
2.4.4.6 Accessed and Dirty Bits
2.4.4.7 Page Tables for Trap Handlers
2.4.4.8 Combining Protection of Both Levels of Page Tables
2.4.5 Address Translation Algorithm
2.4.6 Address Translation Faults
2.5 Detached CCU
2.5.1 DCCU Initialization
2.5.2 DCCU Addressing
2.5.3 DCCU Internals
2.6 Instruction Set
2.6.1 Pipelined and Scalar Operations
2.6.1.1 Scalar Mode
2.6.1.2 Pipelining Status Information
2.6.1.3 Precision in the Pipelines
2.6.1.4 Transition between Scalar and Pipelined Operations
2.6.1.5 Pipelined Loads
2.6.2 Dual-Instruction Mode
2.6.3 Dual-Operation Instructions
2.7 Addressing Modes
2.8 Traps and Interrupts
2.8.1 Trap Handler Invocation
2.8.2 Instruction Fault
2.8.2.1 Lock Protocol
2.8.2.2 Using PT and PI Bits
2.8.3 Floating-Point Fault
2.8.3.1 Source Exception Faults
2.8.3.2 Result Exception Faults
2.8.4 Instruction Access Fault
2.8.5 Data Access Fault
2.8.6 Parity Error Trap
2.8.7 Bus Error Trap
2.8.8 Interrupt Trap
2.8.9 Reset Trap
2.9 Debugging
3.0 ON-CHIP CACHES
3.1 Address Translation Caches
3.2 Internal Instruction and Data Caches
3.2.1 Data Cache
3.2.1.1 Data Cache Update Policies
3.2.2 Instruction Cache
3.2.3 Cache Replacement Algorithm
3.2.4 Cache Consistency Protocol
3.2.4.1 Data Cache States
3.2.4.2 Write-Once Policy
3.2.4.3 Locked Access
3.3 Internal Cache Consistency
3.3.1 Address Space Consistency
3.3.2 Instruction Cache Consistency
3.3.3 Page Table Consistency
3.3.4 Consistency of Cacheability
3.3.5 Load Pipe Consistency
3.3.6 Summary
4.0 HARDWARE INTERFACE
4.1 Pins Overview
4.2 Signal Description
4.2.1 A31-A3 (Address Pins)
4.2.2 ADS# (Address Status)
4.2.3 AHOLD (Address Hold)
4.2.4 BE7#-BE0# (Byte Enables)
4.2.5 BERR (Bus Error)
4.2.6 BOFF# (Back-Off)
4.2.7 BRDY# (Burst Ready)
4.2.8 BREQ (Bus Request)
4.2.9 BYPASS# (Bypass)
4.2.10 CACHE# (Cacheability)
4.2.11 CLK (Clock)
4.2.12 CTYP (Cycle Type)
4.2.13 D/C# (Data/Code)
4.2.14 D63-D0 (Data Pins)
4.2.15 DP7-DP0 (Data Parity)
4.2.16 EADS# (External Address Status)
4.2.17 EWBE# (External Write Buffer Empty)
4.2.18 FLINE# (Flush Line)
4.2.19 HIT# (Cache Inquiry Hit)
4.2.20 HITM# (Hit Modified Line)
4.2.21 HLDA (Bus Hold Acknowledge)
4.2.22 HOLD (Bus Hold)
4.2.23 INV (Invalidate)
4.2.24 INT/CS8 (Interrupt/Code-Size Eight Bits)
4.2.25 KB0, KB1 (Cache Block)
4.2.26 KEN# (Cache Enable)
4.2.27 LEN (Data Length)
4.2.28 LOCK# (Address Lock)
4.2.29 M/IO# (Memory-I/O)
4.2.30 NA# (Next Address Request)
4.2.31 NENE# (Next Near)
4.2.32 PCD (Page Cache Disable)
4.2.33 PCHK# (Parity Check)
4.2.34 PCYC (Page Cycle)
4.2.35 PEN# (Parity Enable)
4.2.36 PWT (Page Write-Through)
4.2.37 RESET (System Reset)
4.2.38 RSRVD, SPARE
4.2.39 TCK (Test Clock)
4.2.40 TDI (Test Data Input)
4.2.41 TDO (Test Data Output)
4.2.42 TMS (Test Mode Select)
4.2.43 TRST# (Test Reset)
4.2.44 VCC (System Power) and VSS (Ground)
4.2.45 VCCCLK (Clock Power)
4.2.46 WB/WT# (Write-Back/Write-Through)
4.2.47 W/R# (Write/Read)
5.0 BUS OPERATION
5.1 Bus Cycles
5.1.1 Single-Transfer Cycle
5.1.2 Burst Cycles
5.1.3 Pipelined Cycles
5.1.4 Interrupt Acknowledge Cycles
5.1.5 Special Bus Cycles
5.2 Bus Arbitration
5.2.1 HOLD and HLDA Arbitration
5.2.2 Bus Cycle Back-Off and Restart
5.2.2.1 Cycle Back-Off
5.2.2.2 Cycle Restart
5.2.2.3 Late Back-Off Modes
5.2.2.4 One-Clock Late Back-Off Mode
5.2.2.5 Two-Clock Late Back-Off Mode
5.3 Cache Inquiry Cycles (Snooping)
5.3.1 Inquiry Write-Back Cycles
5.3.2 Snooping Responsibility Limits
5.3.2.1 Inquiry for a Line Being Cached
5.3.2.2 Inquiry for a Line Being Replaced
5.3.3 Write Cycle Reordering Due to Buffering
5.3.4 Strong Ordering Mode
5.3.5 Scheduling Inquiry Write-Back Cycles
5.3.5.1 Choosing between FLINE# and BOFF#
5.3.5.2 Reordering Write-Backs with FLINE#
5.3.5.3 Reordering Write-Backs with BOFF#
5.4 The LOCK# Cycle Attribute
5.5 RESET Initialization
6.0 TESTABILITY
6.1 Test Architecture
6.2 Test Data Registers
6.3 Instruction Register
6.4 TAP Controller
6.4.1 Test-Logic-Reset State
6.4.2 Run-Test/Idle State
6.4.3 Select-DR-Scan State
6.4.4 Select-IR-Scan State
6.4.5 Capture-DR State
6.4.6 Shift-DR State
6.4.7 Exit1-DR State
6.4.8 Pause-DR State
6.4.9 Exit2-DR State
6.4.10 Update-DR State
6.4.11 Capture-IR State
6.4.12 Shift-IR State
6.4.13 Exit1-IR State
6.4.14 Pause-IR State
6.4.15 Exit2-IR State
6.4.16 Update-IR State
6.5 Boundary Scan Register Cell Ordering
6.6 TAP Controller Initialization
7.0 MECHANICAL DATA
8.0 PACKAGE THERMAL SPECIFICATIONS
9.0 ELECTRICAL DATA
9.1 Absolute Maximum Ratings
9.2 D.C. Characteristics
9.3 A.C. Characteristics
9.4 Component Buffer Model
9.4.1 First Order Electrical Buffer Model
9.4.2 First Order Electrical Model Parameter Values
9.4.3 Package Parameters
9.4.4 Board Interconnects
10.0 INSTRUCTION SET
10.1 Instruction Definitions in Alphabetical Order
10.2 Instruction Format and Encoding
10.2.1 REG-Format Instructions
10.2.2 CTRL-Format Instructions
10.2.3 Floating-Point Instruction Encoding
10.3 Instruction Timings
10.4 Instruction Characteristics
10.5 Software Compatibility
10.5.1 Required Changes
10.5.2 Performance Optimizations
10.5.3 New Features
10.5.4 Notes
11.0 REVISION HISTORY
INDEX
CONTENTS
CONTENTS
PAGE
Figure 5.6
FIGURES
Figure 0.1'
Figure 2.1
Figure 2.2
Figure 2.3
Figure 2.4
Figure 2.5
Block Diagram .............. 2-1
Real Number Formats ....... 2-11
Pixel Format Example .... "... 2-12
Registers and Data Paths ... 2-13
Processor Status Register ... 2-14
Extended Processor Status
Register ; ................... 2-15
Figure 2.6 Directory Base Register ..... 2-16
Figure 2.7 Floating-Point Status
Register .................... 2-18
Figure 2.8 Concurrency Control
Register .................... 2-20
Figure 2.9 . Concurrency Status
Register .................... 2-21
Figure 2.10 Little and Big Endian Memory
Transfers ................... 2-22
Figure 2.11 Formats of Virtual
Addresses .................. 2-23
Figure 2.12 Address Translation ......... 2-23
Figure 2.13 Formats of Page Table
Entries ...................... 2-24
Figure 2.14 Pipelined Instruction
Execution ................... 2-30
Figure 2.15 Dual-Instruction Mode
Transitions (1 of 2) .......... 2-32
Figure 2.15 Dual-Instruction Mode
Transitions (2 of 2) .......... 2-32
Figure 2.16 Dual-Operation Data
Paths ....................... 2-33
Figure 3.1 4K TLB Organization ........ 2-40
Figure 3.2 4M TLB Organization ....... 2-40
Figure 3.3 Cache Address Usage ...... 2-41
Figure 3.4 Data Cache Organization .... 2-42
Figure 3.5 Instruction Cache
Organization ................ 2-43
Figure 4.1 Signal Grouping ............. 2-49
Figure 5.1 Timing Diagram
Conventions ................ 2-57
Figure 5.2 Fastest Single-Transfer
Cycles ...................... 2-58
Figure 5.3 Single-Transfer Cycles with
Wait States ................. 2-59
Figure 5.4 Basic Burst Cycle ........... 2-60
Figure 5.5 Slow Burst Cycle ............ 2-60
Figure 5.7
Figure 5.8
Figure 5.9
Figure 5.10
Figure 5.11
Figure 5.12
Figure 5.13
Figure 5.14
Figure 5.15
Figure 5.16
Figure 5.17
Figure 5.18
Figure 5.19
Figu're 5.20
Figure 5.21
Figure 5.22
Figure 5.23
Figure 5.24
Figure 5.25
Figure 5.26
Figure 5.27
Figure 5.28
Figure 5.29
2-6
PAGE
Different Lengths of Burst
Cycles ...................... 2-61
Pipelined Cache Line Fills ... 2-63
Pipelined Back-to-Back Read
and Write Cycles ............ 2-64
Example Interrupt
Acknowledge Sequence .... 2-65
HOLD/HLDA Handshake ... 2-66
Normal Back-Off ............ 2-68
One-Clock Normal
Back-Off .................... 2-68
Fastest Nonpipelined Cycles
in One-Clock Late Back-Off
Mode ....................... 2-69
One-Clock Late Back-Off
Mode (Case 1) .............. 2-70
One-Clock Late Back-Off
Mode (Case 2) .............. 2-70
One-Clock Late Back-Off
Mode (Case 3) .............. 2-71
Two-Clock Late Back-Off
Mode ....................... 2-71
Inquiry Miss Cycle ........... 2-72
Fastest Inquiry Cycles
(Miss and Hit) ............... 2-73
Inquiry Hit Cycle with
Write-Back .................. 2-74
Snoop Responsibility Pickup
(Nonpipelined Cycle) ........ 2-76
Snoop Responsibility Pickup
(Pipelined Cycle) ............ 2-77
Latest Snooping of
Write-Back (Not Late
Back-Off Mode) ............. 2-78
Latest Snooping of WriteBack (One-Clock
Late Back-Off Mode) ........ 2-78
Latest Snooping of WriteBack (Two-Clock Late BackOff Mode) ................... 2-79
Write Reordering due to
Buffering .................... 2-80
Timing of EWBE# .......... 2-81
Cycle Reordering via FLlNE#
(No Ongoing Burst) ......... 2-82
Cycle Reordering via FLlNE#
(Ongoing Burst) ............. 2-83
CONTENTS
PAGE
Figure 5.30 Cycle Reordering via BOFF# (Ongoing Burst) ............. 2-84
Figure 5.31 LOCK# Timing ............. 2-85
Figure 5.32 Reset Activities ............. 2-86
Figure 6.1 Format of DID Register ...... 2-88
Figure 6.2 Logical Structure of BSR Register .................... 2-88
Figure 6.3 TAP Controller State Diagram .................... 2-90
Figure 6.4 Boundary Scan Register Ordering .................... 2-93
Figure 7.1 i860™ XP Microprocessor Pin Configuration, View from Pin Side .................... 2-95
Figure 7.2 i860™ XP Microprocessor Pin Configuration, View from Top Side .................... 2-96
Figure 7.3 262-lead Ceramic PGA Package Dimensions ...... 2-101
Figure 8.1 Icc Derating with Case Temperature ................ 2-103
Figure 9.1 CLK, Input, and Output Timings .................... 2-107
Figure 9.2 TAP Signal Timings ........ 2-107
Figure 9.3 Typical Output Delay vs. Load Capacitance ............... 2-108
CONTENTS
PAGE
Figure 9.4 Typical Output Delay vs. Load Capacitance under Worst-Case Conditions .... 2-108
Figure 9.5a Typical Slew Time vs. Load Capacitance under Worst-Case Conditions (Rising Voltage) ................... 2-109
Figure 9.5b Typical Slew Time vs. Load Capacitance under Worst-Case Conditions (Falling Voltage) ................... 2-109
Figure 9.6 Typical Icc vs. Frequency .. 2-110
Figure 9.7a Output Model .............. 2-111
Figure 9.7b Input Model ................ 2-111
Figure 9.8 Package Model ............ 2-112
Figure 9.9a Output Buffer and Package Model ..................... 2-112
Figure 9.9b Input Buffer and Package Model ..................... 2-112
Figure 9.10 Transmission Line Model ... 2-112
Figure 10.1 REG-Format Variations .... 2-131
Figure 10.2 Core Escape Instructions .. 2-132
Figure 10.3 CTRL-Format Instructions .. 2-133
Figure 10.4 Floating-Point Instruction Encoding .................. 2-134
TABLES
PAGE
Table 2.1 Pixel Formats ................ 2-11
Table 2.2 Values of PS ................ 2-14
Table 2.3 Values of RB ................ 2-17
Table 2.4 Values of RC ................ 2-17
Table 2.5 Values of RM ................ 2-18
Table 2.6 Values of LRP1 and LRP0 ... 2-19
Table 2.7 Values of CO and DO ........ 2-20
Table 2.8 CCU Addresses ............. 2-28
Table 2.9 Instruction Set (1 of 2) ....... 2-29
Table 2.9 Instruction Set (2 of 2) ....... 2-30
Table 2.10 Types of Traps .............. 2-35
Table 2.11 Register and Cache Values after Reset .................. 2-39
Table 3.1 MESI Cache Line States ..... 2-43
Table 3.2 Internally Initiated Cache State Transitions ............ 2-44
Table 3.3 Inquiry-Initiated Cache State Transitions .................. 2-44
Table 3.4 Summary of Cache Flushing and Invalidation ............. 2-47
Table 4.1 Pin Summary ................ 2-48
Table 4.2 ADS# Initiated Bus Cycle Definitions ................... 2-49
Table 4.3 Memory Data Transfer Cycle Types ....................... 2-49
Table 4.4 Cycle Length Definition ...... 2-49
Table 4.5 EADS# Sample Time ........ 2-52
Table 5.1 Burst Order for Cache Line Transfers .................... 2-61
Table 5.2 Pipeline Cycle Compatibility ................ 2-62
Table 5.3 Encoding of Special Bus Cycles ....................... 2-65
Table 5.4 Inquiry for a Line being Cached ..................... 2-75
Table 5.5 Output Pin Status during Reset ....................... 2-86
Table 6.1 TAP Instruction Encoding .... 2-88
Table 6.2 Registers Active by Instruction ................... 2-89
Table 6.3 Instruction Functions ........ 2-94
Table 7.1 Pin Cross Reference by Location ..................... 2-97
Table 7.2 Pin Cross Reference by Pin Name ....................... 2-98
Table 7.3 Ceramic PGA Package Dimension Symbols ........ 2-100
Table 8.1 Thermal Resistance ........ 2-102
Table 8.2 Maximum TA at Various Airflows .................... 2-102
Table 9.1 D.C. Characteristics ........ 2-104
Table 9.2 50 MHz A.C. Characteristics ............. 2-105
Table 9.3 Small Output Buffer First Order Electrical Model Parameter Values .......... 2-113
Table 9.4 Large Output Buffer First Order Electrical Model Parameter Values .......... 2-114
Table 9.5 Buffer Models .............. 2-115
Table 10.1 Precision Specification ..... 2-120
Table 10.2 FADDP MERGE Update .... 2-129
Table 10.3 Register Encoding .......... 2-130
Table 10.4 REG-Format Opcodes ...... 2-132
Table 10.5 Core Escape Opcodes ...... 2-133
Table 10.6 CTRL-Format Opcodes ..... 2-133
Table 10.7 Floating-Point Opcodes ..... 2-134
Table 10.8 DPC Encoding .............. 2-135
intel®
i860™ XP MICROPROCESSOR
1.0 FUNCTIONAL DESCRIPTION
As shown by the block diagram on the front page,
the i860 XP microprocessor consists of the following
units:
1. Integer Registers and Core Execution Unit
2. Floating-Point Registers and Control Unit
3. Floating-Point Adder Unit
4. Floating-Point Multiplier Unit
5. Graphics Unit
6. Paging Unit
7. Instruction Cache
8. Data Cache
9. Bus and Cache Control Unit
10. Detached Concurrency Control Unit
The floating-point multiplier performs floating-point
and integer multiply as well as floating-point reciprocal
operations on 64- and 32-bit floating-point values.
A multiplier instruction executes in three to four
clocks; however, in pipelined mode, a new result can
be generated every clock for single precision and
every other clock for double precision.
The graphics unit supports three-dimensional drawing
in a graphics frame buffer, with color intensity
shading and hidden surface elimination via the
Z-buffer algorithm. The graphics unit recognizes the
pixel as an 8-, 16-, or 32-bit integer data type. It can
compute individual red, blue, and green color intensity
values within a pixel; but it does so with parallel
operations that take advantage of the 64-bit internal
word size and 64-bit external bus. The graphics features
of the i860 XP microprocessor assume that the
surface of a solid object is drawn with polygon
patches which, like the pieces of a puzzle, collectively
approximate the shape of the original object.
The color intensities of the vertices of the polygon
and their distances from the viewer are known, but
the distances and intensities of the other points
must be calculated by interpolation. The graphics
instructions of the i860 XP microprocessor directly aid
such interpolation.
The core execution unit controls overall operation of
the i860 XP microprocessor. It executes load, store,
integer, bit, I/O, and control-transfer operations, and
fetches instructions for the floating-point unit as well.
A set of 32 x 32-bit general-purpose registers is
provided for the manipulation of integer data. Load
and store instructions move 8-, 16-, and 32-bit data
to and from these registers. Its full set of integer,
logical, and control-transfer instructions gives the
core unit the ability to execute complete systems
software and applications programs. A trap mechanism
provides rapid response to exceptions and external
interrupts. Debugging is supported by the ability
to trap on data or instruction reference.
The paging unit implements protected, paged, virtual
memory. The paging unit uses two four-way set-associative cache memories called TLBs (Translation
Lookaside Buffers) to perform the translation of logical address to physical address, and to check for
access violations. The access protection scheme
employs two levels of privilege: user and supervisor.
One TLB supports 4 Kbyte pages, and has 64 entries; the other supports 4 Mbyte pages, and has 16
entries.
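The TLB figures above imply how much memory each TLB can map without a miss. The following is a minimal sketch of that arithmetic; the function names `tlb_coverage` and `tlb_sets` are illustrative and do not come from the data sheet.

```python
KBYTE = 1024
MBYTE = 1024 * KBYTE

def tlb_coverage(entries: int, page_bytes: int) -> int:
    """Bytes of address space mapped when every TLB entry is valid."""
    return entries * page_bytes

def tlb_sets(entries: int, ways: int = 4) -> int:
    """Number of sets in a set-associative TLB (both TLBs are four-way)."""
    return entries // ways

# 4 Kbyte-page TLB: 64 entries; 4 Mbyte-page TLB: 16 entries.
small_page_coverage = tlb_coverage(64, 4 * KBYTE)
large_page_coverage = tlb_coverage(16, 4 * MBYTE)
```

Under these figures the 4 Kbyte-page TLB covers 256 Kbytes and the 4 Mbyte-page TLB covers 64 Mbytes.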
The floating-point hardware is connected to a separate
set of floating-point registers, which can be accessed
as 16 x 64-bit registers or as 32 x 32-bit
registers. Load and store instructions can also access
these same registers as 8 x 128-bit registers.
All floating-point and graphics instructions use these
registers as their source and destination operands.
The floating-point control unit controls both the
floating-point adder and the floating-point multiplier,
issuing instructions, handling all source and result
exceptions, and updating status bits in the floating-point
status register. The adder and multiplier can
operate in parallel, producing up to two results per
clock. The floating-point data types, floating-point
instructions, and exception handling all support the
IEEE Standard for Binary Floating-Point Arithmetic
(ANSI/IEEE Std 754-1985).
The floating-point adder performs addition, subtraction,
comparison, and conversions on 64- and 32-bit
floating-point values. An adder instruction executes
in three clocks; however, in pipelined mode, a new
result is generated every clock.
The instruction cache is a four-way set-associative
memory of 16 Kbytes, with 32-byte lines. It transfers
up to 64 bits per clock (400 Mbyte/sec at 50 MHz).
The data cache is a four-way set-associative memory
of 16 Kbytes, with 32-byte lines. It transfers up to
128 bits per clock (800 Mbyte/sec at 50 MHz). The
i860 XP microprocessor normally uses write-back
caching, i.e. memory writes update the cache (if
applicable) without necessarily updating memory
immediately; however, under both software and hardware
control, write-through and write-once policies
can be implemented, or caching can be inhibited.
The caches are transparent to applications software.
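The cache figures quoted above follow from simple arithmetic. The sketch below reproduces them; it assumes "Mbyte" means 10^6 bytes, which is the only reading that matches the stated 400 and 800 Mbyte/sec, and the helper names are illustrative.

```python
CLOCK_HZ = 50_000_000  # 50 MHz part, as quoted in the text

def cache_sets(size_bytes: int, ways: int, line_bytes: int) -> int:
    """Number of sets in a set-associative cache."""
    return size_bytes // (ways * line_bytes)

def peak_mbyte_per_sec(bits_per_clock: int, clock_hz: int = CLOCK_HZ) -> int:
    """Peak transfer rate in units of 10**6 bytes per second."""
    return (bits_per_clock // 8) * clock_hz // 1_000_000

# Both internal caches: 16 Kbytes, four-way, 32-byte lines.
sets_per_cache = cache_sets(16 * 1024, ways=4, line_bytes=32)
icache_rate = peak_mbyte_per_sec(64)    # instruction cache: 64 bits per clock
dcache_rate = peak_mbyte_per_sec(128)   # data cache: 128 bits per clock
```

Each cache therefore has 128 sets of four 32-byte lines.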
The bus and cache control unit performs data and
instruction accesses for the core unit. It receives
cycle requests and specifications from the core unit,
performs the data-cache or instruction-cache miss
processing, controls TLB translation, and provides
the interface to the external bus. Its pipelined
structure supports up to three outstanding bus cycles. Its
burst mode transfers data at up to 400 Mbyte/sec at
50 MHz. In multiprocessor systems, it maintains
cache consistency by monitoring bus activity in
parallel with other CPU functions.
The DCCU (detached concurrency control unit) is a
compatible subset of the external CCU that expedites
loop-level parallelism and synchronization in
multiprocessor systems. The DCCU consists of registers
and a counter that allow a single i860 XP microprocessor
to run binary code compiled for a multiprocessor
system adhering to the PAX parallel applications
binary interface (ABI).
2.1.1 INTEGER
An integer is a 32-bit signed value in standard two's
complement form. A 32-bit integer can represent a
value in the range -2,147,483,648 ($-2^{31}$) to
2,147,483,647 ($+2^{31} - 1$). Arithmetic operations on
8- and 16-bit integers can be performed by sign-extending
the 8- or 16-bit values to 32 bits, then using
the 32-bit operations.
There are also add and subtract instructions that operate on 64-bit long integers.
Load and store instructions may also reference (in
addition to the 32- and 64-bit formats previously
mentioned) 8- and 16-bit items in memory. When an
8- or 16-bit item is loaded into a register, it is
converted to an integer by sign-extending the value to
32 bits. When an 8- or 16-bit item is stored from a
register, the corresponding number of low-order bits
of the register are used.
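The load and store behavior for 8- and 16-bit items can be modeled in a few lines. This is a sketch of the rule as stated above, not of any particular instruction encoding; `load_sign_extended` and `store_low_bits` are illustrative names.

```python
REG_MASK = 0xFFFFFFFF  # integer registers are 32 bits wide

def load_sign_extended(raw: int, bits: int) -> int:
    """Return the 32-bit register image after loading a `bits`-wide item."""
    raw &= (1 << bits) - 1
    sign = 1 << (bits - 1)
    signed = (raw ^ sign) - sign      # reinterpret as two's complement
    return signed & REG_MASK          # sign bit replicated up to bit 31

def store_low_bits(register: int, bits: int) -> int:
    """Return the item written to memory: the low-order `bits` of the register."""
    return register & ((1 << bits) - 1)

reg = load_sign_extended(0x80, 8)     # byte 0x80 is -128
```

Storing the low 8 bits of `reg` gives back the original byte, so a load followed by a store of the same width round-trips.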
The i860 XP microprocessor may be used with or
without an external, secondary cache built from
82495XP and 82490XP cache components. An
82495XP and 82490XP cache provides up to 512
Kbytes of high-speed storage for data and instructions
combined. In most cases, an 82495XP and
82490XP cache can provide data to the CPU with
zero wait states. The larger size of an external cache
can provide an increased hit rate when the size or
number of data structures and programs exceeds
the size of the internal caches. In multiprocessor
systems, the external cache serves as local memory,
and can reduce bus traffic. An external cache
also hides the processor from the rest of the system,
which is a double advantage:
1. The processor can be upgraded without affecting
the design of the memory and other subsystems.
2. Slower and less expensive memory and I/O
subsystem designs can be employed without unduly
lowering overall system performance.
2.1.2 ORDINAL
Arithmetic operations are available for 32-bit ordinals.
An ordinal is an unsigned integer. An ordinal
can represent values in the range 0 to
4,294,967,295 ($+2^{32} - 1$).
Also, there are add and subtract instructions that operate on 64-bit ordinals.
2.1.3 SINGLE- AND DOUBLE-PRECISION REAL
Figure 2.1 shows the real number formats. A single-precision
real (also called "single real") data type is
a 32-bit binary floating-point number. Bit 31 is the
sign bit; bits 30..23 are the exponent; and bits 22..0
are the fraction. In accordance with ANSI/IEEE
standard 754, the value of a single-precision real is
defined as follows:
1. If e = 0 and f ≠ 0, or e = 255, then generate a
floating-point source-exception trap when encountered
in a floating-point operation.
2. If 0 < e < 255, then the value is
$(-1)^s \times 1.f \times 2^{e-127}$.
Refer to the 82495XP Cache Controller/82490XP
Cache RAM Data Sheet (Intel Order #240956) for
more information.
2.0
PROGRAMMING INTERFACE
The programmer-visible aspects of the architecture
of the i860 XP microprocessor include data types,
registers, instructions, and traps.
3. If e = 0 and f= 0, then the value is signed zero.
2.1
Data Types
A double-precision real (also called "double real")
data type is a 64-bit binary floating-point number. Bit
63 is the sign bit; bits 62..52 are the exponent; and
bits 51..0 are the fraction. In accordance with ANSI/
IEEE standard 754, the value of a double-precision
real is defined as follows:
1. If e = 0 and f ≠ 0, or e = 2047, then generate a
floating-point source-exception trap when encountered
in a floating-point operation.
The i860 XP microprocessor provides operations for
integer and floating-point data. Integer operations
are performed on 32-bit operands with some support
also for 64-bit operands. Load and store instructions
can reference 8-bit, 16-bit, 32-bit, 64-bit, and 128-bit
operands. Floating-point operations are performed
on IEEE-standard 32- and 64-bit formats. Graphics
instructions operate on arrays of 8-, 16-, or 32-bit
pixels.
2. If 0 < e < 2047, then the value is
$(-1)^s \times 1.f \times 2^{e-1023}$.
3. If e = 0 and f = 0, then the value is signed zero.
The special values infinity, NaN ("Not a Number"),
indefinite, and denormal generate a trap when
encountered. The trap handler implements IEEE-standard
results.
A double real value occupies an even/odd pair of
floating-point registers. Bits 31..0 are stored in the
even-numbered floating-point register; bits 63..32
are stored in the next higher odd-numbered
floating-point register.
2.1.4 PIXEL
A pixel may be 8, 16, or 32 bits long, depending on
color and intensity resolution requirements. Regardless
of the pixel size, the i860 XP microprocessor
always operates on 64 bits of pixel data at a time.
The pixel data type is used by two kinds of instructions:
• The selective pixel-store instruction that helps
implement hidden surface elimination.
• The pixel add instruction that helps implement
3-D color intensity shading.
To perform color intensity shading efficiently in a
variety of applications, the i860 XP microprocessor
defines three pixel formats according to Table 2.1.
Figure 2.2 illustrates one way of assigning meaning
to the fields of pixels. These assignments are for
illustration purposes only. The i860 XP microprocessor
defines only the field sizes, not the specific use
of each field. Other ways of using the fields of pixels
are possible.
[Diagram: single-precision real (s = bit 31, e = bits 30..23, f = bits 22..0) and double-precision real (s = bit 63, e = bits 62..52, f = bits 51..0), with fields labeled SIGN, EXPONENT, FRACTION]
Figure 2.1. Real Number Formats
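The value rules for single and double reals in section 2.1.3 can be sketched as one function parameterized by field width. This is an illustration of the three cases only; the name `classify_real` and the "trap" return value are assumptions, and the trap handler's IEEE-standard results are not modeled.

```python
def classify_real(bits: int, exp_bits: int, frac_bits: int):
    """Apply the three value rules to a raw single or double real.

    Returns ("trap", None) for encodings that raise a source-exception
    trap, ("zero", value) for signed zero, or ("normal", value).
    """
    s = (bits >> (exp_bits + frac_bits)) & 1
    e = (bits >> frac_bits) & ((1 << exp_bits) - 1)
    f = bits & ((1 << frac_bits) - 1)
    e_max = (1 << exp_bits) - 1            # 255 or 2047
    bias = (1 << (exp_bits - 1)) - 1       # 127 or 1023
    if e == 0 and f == 0:
        return ("zero", -0.0 if s else 0.0)
    if e == 0 or e == e_max:
        return ("trap", None)              # denormal, infinity, or NaN
    value = (-1.0) ** s * (1 + f / 2 ** frac_bits) * 2.0 ** (e - bias)
    return ("normal", value)

def classify_single(bits: int):
    return classify_real(bits, exp_bits=8, frac_bits=23)

def classify_double(bits: int):
    return classify_real(bits, exp_bits=11, frac_bits=52)
```

For example, `classify_single(0xC0000000)` has s = 1, e = 128, f = 0 and evaluates to -2.0, while `classify_single(0x7F800000)` (infinity) falls into the trap case.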
Table 2.1. Pixel Formats
Pixel Size   Bits of Color 1   Bits of Color 2   Bits of Color 3   Bits of Other Attribute
(in bits)    Intensity(1)      Intensity(1)      Intensity(1)      (Texture, Color)
8            N (≤ 8) bits of intensity(2)                          8 - N
16           6                 6                 4                 0
32           8                 8                 8                 8
NOTES:
1. The intensity attribute fields may be assigned to colors in any order convenient to the application.
2. With 8-bit pixels, up to 8 bits can be used for intensity; the remaining bits can be used for any other attribute, such as
color or texture. Bits that require interpolation (shading), such as those for intensity, must be the low-order bits of the pixel.
[Diagram: 8-bit pixel (COLOR and intensity fields); 16-bit pixel (RED, GREEN, BLUE fields); 32-bit pixel (RED, GREEN, BLUE, TEXTURE fields)]
NOTE:
These assignments of specific meanings to the fields of pixels are for illustration only. Only the field sizes are defined,
not the specific use of each field.
Figure 2.2. Pixel Format Example
When accessing 64-bit floating-point or integer values,
the i860 XP microprocessor uses an even/odd
pair of registers. When accessing 128-bit values, it
uses an aligned set of four registers (f0, f4, f8, f12,
f16, f20, f24, or f28). The instruction must designate
the lowest register number of the set of registers
containing 64- or 128-bit values. Misaligned register
numbers produce undefined results. The register
with the lowest number contains the least significant
part of the value. For 128-bit values, the register pair
with the lower number contains the value from the
lower memory address; the register pair with the
higher number contains the value from the higher
address.
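The register-alignment rule above lends itself to a simple check, for instance in an assembler or a code generator. The sketch below is illustrative only; `is_aligned_fp_operand` and `registers_used` are hypothetical helper names, not part of any Intel tool.

```python
def is_aligned_fp_operand(reg: int, size_bits: int) -> bool:
    """True if floating-point register number `reg` may name a value of
    `size_bits` (32, 64, or 128) without being misaligned."""
    count = size_bits // 32               # 1, 2, or 4 registers
    return 0 <= reg <= 32 - count and reg % count == 0

def registers_used(reg: int, size_bits: int) -> list:
    """Register numbers holding the value, least significant part first."""
    return [reg + i for i in range(size_bits // 32)]

# The valid base registers for 128-bit values.
quad_bases = [r for r in range(32) if is_aligned_fp_operand(r, 128)]
```

`quad_bases` comes out as f0, f4, f8, f12, f16, f20, f24, and f28, matching the list in the text.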
2.2 Register Set
As Figure 2.3 shows, the i860 XP microprocessor
has the following registers:
• An integer register file
• A floating-point register file
• Control registers psr, epsr, db, dirbase, fir, fsr,
bear, ccr, p3, p2, p1, p0
• Special-purpose registers KR, KI, T, MERGE,
STAT, and NEWCURR
The control registers are accessible only by load
and store control-register instructions; the integer
and floating-point registers are accessed by arithmetic
operations and load and store instructions. The
special-purpose registers KR, KI, and T are used by
floating-point instructions; MERGE is used by graphics
instructions. NEWCURR and STAT are used for
concurrency control; they are accessed by memory
load and store instructions.
The 128-bit load and store instructions, along with
the 128-bit data path between the floating-point registers and the data cache, help to sustain an extraordinarily high rate of computation.
2.2.1 INTEGER REGISTER FILE
There are 32 integer registers, each 32 bits wide,
referred to as r0 through r31, which are used for
address computation and scalar integer computations.
Register r0 always returns zero when read.
2.2.2 FLOATING-POINT REGISTER FILE
There are 32 floating-point registers, each 32 bits
wide, referred to as f0 through f31, which are used
for floating-point computations. Registers f0 and f1
always return zero when read. The floating-point
registers are also used by a set of integer operations,
primarily for graphics computations.
2.2.3 PROCESSOR STATUS REGISTER
The processor status register (psr) contains miscellaneous
state information for the current process.
Figure 2.4 shows the format of the psr.
• BR (Break Read) and BW (Break Write) enable a
data access trap when the operand address
matches the address in the db register and a
read or write (respectively) occurs.
• Various instructions set CC (Condition Code)
according to tests they perform. The branch-on-condition-code
instructions test its value. The bla
instruction sets and tests LCC (Loop Condition
Code).
• IM (Interrupt Mode), if set, enables external interrupts
on the INT pin; if clear, it disables interrupts
on INT. IM does not affect parity error interrupts or
interrupts on the BERR pin.
[Block diagram: control registers BEAR, FIR, CCR, PSR, DIRBASE, DB, FSR, EPSR, P0, P1, P2, P3; instruction decode and fetch; 16 Kbyte instruction cache; 16 Kbyte data cache; address and data paths]
Figure 2.3. Registers and Data Paths
• FT (Floating-Point Trap), DAT (Data Access
Trap), IAT (Instruction Access Trap), IN (Interrupt),
and IT (Instruction Trap) are trap flags.
They are set when the corresponding trap condition
occurs. IN is set on INT, bus error, and parity
error. The trap handler examines these bits (and
other trap bits in the epsr) to determine which
condition or conditions have caused the trap.
• U (User Mode) is set when the i860 XP microprocessor
is executing in user mode; it is clear
when the i860 XP microprocessor is executing in
supervisor mode. In user mode, writes to some
control registers are inhibited. This bit also controls
the memory protection mechanism.
• PIM (Previous Interrupt Mode) and PU (Previous
User Mode) save the corresponding status bits
(IM and U) on a trap, because those status bits
are changed when a trap occurs. They are restored
into their corresponding status bits when
returning from a trap handler with a branch indirect
instruction when a trap flag is set in the psr.
• DS (Delayed Switch) is set if a trap occurs during
the instruction before dual-instruction mode is
entered or exited. If DS is set and DIM (Dual-Instruction
Mode) is clear, the i860 XP microprocessor
switches to dual-instruction mode one instruction
after returning from the trap handler. If DS and DIM
are both set, the i860 XP microprocessor switches
to single-instruction mode one instruction after
returning from the trap handler.
[Register diagram, bits 31..0; fields shown: PM (Pixel Mask), PS (Pixel Size), SC (Shift Count), reserved, KNF (Kill Next FP Instruction), DIM (Dual Instruction Mode), DS (Delayed Switch), FT (Floating-Point Trap), DAT (Data Access Trap), IAT (Instruction Access Trap), IN (Interrupt), IT (Instruction Trap), PU (Previous User Mode), U (User Mode), PIM (Previous Interrupt Mode), IM (Interrupt Mode), LCC (Loop Condition Code), CC (Condition Code), BW (Break Write), BR (Break Read). Shading in the original marks fields reserved by Intel Corporation and fields that can be changed only from supervisor level.]
Figure 2.4. Processor Status Register
• When a trap occurs, the i860 XP microprocessor
sets DIM if it is executing in dual-instruction
mode; it clears DIM if it is executing in single-instruction
mode. If DIM is set after returning from a
trap handler, the i860 XP microprocessor resumes
execution in dual-instruction mode.
• When KNF (Kill Next Floating-Point Instruction) is
set, the next floating-point instruction is suppressed
(except that its dual-instruction mode bit
is interpreted). A trap handler sets KNF if the
trapped floating-point instruction should not be
reexecuted.
• SC (Shift Count) stores the shift count used by
the last right-shift instruction. It controls the number
of shifts executed by the double-shift instruction.
• PS (Pixel Size) and PM (Pixel Mask) are used by
the pixel-store and other graphics instructions.
The values of PS control pixel size as defined by
Table 2.2. The bits in PM correspond to pixels to
be updated by the pixel-store instruction pst.d.
The low-order bit of PM corresponds to the low-order
pixel of the 64-bit source operand of pst.d.
The number of low-order bits of PM that are actually
used is the number of pixels that fit into
64 bits, which depends upon PS. If a bit of PM is
set, then pst.d stores the corresponding pixel.
Refer also to the pst.d instruction in section 10.
Table 2.2. Values of PS
Value   Pixel Size in Bits   Pixel Size in Bytes
00      8                    1
01      16                   2
10      32                   4
11      (undefined)          (undefined)
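The relationship between PS, PM, and the pixels written by pst.d can be sketched as follows. This models only what the text above states (which pixels of the 64-bit operand are stored); it treats memory as a 64-bit integer and ignores any other effect pst.d may have on the psr. The names `pixels_per_operand` and `selective_pixel_store` are illustrative.

```python
PS_TO_BITS = {0b00: 8, 0b01: 16, 0b10: 32}   # Table 2.2; 0b11 is undefined

def pixels_per_operand(ps: int) -> int:
    """Number of pixels in a 64-bit operand, i.e. low-order PM bits used."""
    return 64 // PS_TO_BITS[ps]

def selective_pixel_store(memory: int, source: int, ps: int, pm: int) -> int:
    """Return the 64-bit memory word after storing the pixels selected by PM.

    Bit i of `pm` selects pixel i of `source`, counting from the
    low-order pixel; unselected pixels keep their old memory value.
    """
    size = PS_TO_BITS[ps]
    result = memory
    for i in range(pixels_per_operand(ps)):
        if (pm >> i) & 1:
            field = ((1 << size) - 1) << (i * size)
            result = (result & ~field) | (source & field)
    return result & 0xFFFFFFFFFFFFFFFF
```

With 16-bit pixels (PS = 01) only the four low-order bits of PM matter; a mask of 0101 replaces pixels 0 and 2 and leaves pixels 1 and 3 untouched.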
2.2.4 EXTENDED PROCESSOR STATUS
REGISTER
The extended processor status register (epsr) contains additional state information for the current process beyond that stored in the psr. Figure 2.5 shows
the format of the epsr.
• The processor type is 2 for the i860 XP
microprocessor.
• The stepping number has a unique value that
distinguishes among different revisions of the
processor.
• IL (Interlock) is set if a trap occurs after a lock
instruction but before the last BRDY# of the load
or store following the subsequent unlock
instruction. IL indicates to the trap handler that a
locked sequence has been interrupted. When the
trap handler finds IL set, it should scan backwards
for the lock instruction and restart at that
point. The absence of a lock instruction within
30-33 instructions of the trap indicates a programming
error.
[Register diagram residue for the epsr format; surviving field labels: INTERLOCK, WRITE-PROTECT MODE, PARITY ERROR FLAG]
[Timing diagram: data transfers TO CPU and FROM CPU]
Figure 5.3. Single-Transfer Cycles with Wait States
The fastest possible burst cycle requires two clock
periods for the first data item: one clock for ADS#
and one clock for BRDY#; subsequent data items
are transferred every clock period. One such bus
cycle is shown in Figure 5.4. Note that, in this case,
the initial cycle generated by the i860 XP microprocessor
could be satisfied by a single data transfer, but
the system transforms it into a multiple-transfer
cache line fill by activating KEN# in the clock period
of the first BRDY#. KEN# has this effect only if the
CACHE# pin is active, which means the cycle is
internally cacheable in the i860 XP microprocessor.
Read data is sampled only in the clock period in
which BRDY# is returned, which means that data
need not be sent to the i860 XP microprocessor every
clock period in the burst cycle. Figure 5.5 shows
an example of a burst cycle in which two clock periods
are required for every burst item.
The burst length attributes LEN and CACHE# are
driven with the address. Figure 5.6 illustrates two
consecutive burst cycles with differing length attributes:
the first one is a noncacheable 128-bit read,
and the second one is a cache line fill initiated by a
cacheable 64-bit read.
[Timing diagram: CLK, ADS#, ADDRESS, LEN, CACHE#, W/R#, BRDY#, KEN#, DATA (four transfers TO CPU)]
NOTE:
1. KEN# driven with first assertion of BRDY#
Figure 5.4. Basic Burst Cycle
[Timing diagram: CLK, ADS#, ADDRESS, LEN, CACHE#, W/R#, KEN#, DATA]
NOTE:
1. Wait states added by delaying assertion of BRDY#
Figure 5.5. Slow Burst Cycle
[Timing diagram: CLK, ADS#, LEN, CACHE#, W/R#, BRDY#, KEN#, DATA]
NOTE:
1. Length attributes driven with ADS#
Figure 5.6. Different Lengths of Burst Cycles
The timing of write bursts is similar to that of read
bursts. The i860 XP microprocessor does not put
data on D63-D0 for writes until the clock period after ADS#.
When initiating any read, the i860 XP microprocessor
presents the address for the data item requested.
When the cycle is converted into a cache fill, the
first data item returned corresponds to the address
sent out by the i860 XP microprocessor. The remaining
items must be returned in the order shown in
Table 5.1. This ordering is optimized for two-bank
memories, but works equally well with noninterleaved
memories.
Table 5.1. Burst Order for Cache Line Transfers
1st Address   2nd Address   3rd Address   4th Address
0             8             0x10          0x18
8             0             0x18          0x10
0x10          0x18          0             8
0x18          0x10          8             0
5.1.3 PIPELINED CYCLES
A pipelined cycle is one that starts while one or two
other bus cycles are outstanding. A cycle is considered
outstanding until the last BRDY# is asserted to
terminate that cycle. A nonpipelined cycle is one
that starts when no other bus cycles are outstanding.
Both types of cycle can be either read or write
cycles. To allow high transfer rates in large memory
systems, the i860 XP microprocessor supports two-level
pipelining. New cycles can start as often as
every other clock until three cycles are outstanding.
In i860 XP microprocessor systems, memory must
support the burst order as defined in Table 5.1 for
reads. For writes, the burst addresses are always
increasing, so writes with four transfers match the
first line of the table. In CS8 (code-size 8 bits) mode,
instructions are not fetched in bursts.
Note that the i860 XP microprocessor drives only
the first address of a burst cycle; the memory system
is responsible for calculating subsequent addresses
as shown in the table. The addresses can
be derived by complementing A3 after every transfer,
and complementing A4 after two transfers.
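The address derivation described above (complement A3 after every transfer, complement A4 after two transfers) can be sketched in a few lines, as a memory controller model might do it. The function name `burst_addresses` is illustrative; the rule and the expected sequences come from Table 5.1.

```python
A3 = 0x08   # address bit 3
A4 = 0x10   # address bit 4

def burst_addresses(first_address: int) -> list:
    """Addresses of the four 64-bit transfers of a cache line read,
    starting from the address the processor drives with ADS#."""
    address = first_address
    sequence = []
    for transfer in range(4):
        sequence.append(address)
        address ^= A3                 # complement A3 after every transfer
        if transfer % 2 == 1:
            address ^= A4             # complement A4 after two transfers
    return sequence
```

Starting at offset 8, for example, the sequence is 8, 0, 0x18, 0x10, which is the second row of Table 5.1. Bits above A4 are unchanged, so the same function works for any line in memory.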
The system asserts NA# to indicate that the
i860 XP microprocessor can start another cycle before
the current one is completed. (NA# can even
be asserted while BRDY# is active.) The i860 XP
microprocessor begins sampling NA# in the next
clock after ADS# is asserted. If the following conditions
are met, a new (pipelined) cycle begins:
1. NA# having been active
2. An internal request pending
3. Compatibility between the pending request and
the outstanding requests (refer to Table 5.2)
4. HOLD, BOFF#, and AHOLD not active
5. Fewer than three cycles outstanding
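The five conditions above combine as a simple conjunction. The sketch below expresses them as a predicate such as a bus simulator might use; the `BusState` class and its field names are hypothetical, and the Table 5.2 compatibility check is represented only by a precomputed flag.

```python
from dataclasses import dataclass

@dataclass
class BusState:
    na_seen_active: bool      # NA# has been sampled active
    request_pending: bool     # an internal request is waiting
    compatible: bool          # pending request passes the Table 5.2 check
    hold: bool                # HOLD asserted
    boff: bool                # BOFF# asserted
    ahold: bool               # AHOLD asserted
    outstanding_cycles: int   # bus cycles not yet terminated by BRDY#

def can_start_pipelined_cycle(state: BusState) -> bool:
    """True when all five conditions for a new pipelined cycle hold."""
    return (
        state.na_seen_active
        and state.request_pending
        and state.compatible
        and not (state.hold or state.boff or state.ahold)
        and state.outstanding_cycles < 3
    )

ready = BusState(True, True, True, False, False, False, outstanding_cycles=2)
```

Changing any single field of `ready` to its opposite value, or raising `outstanding_cycles` to 3, makes the predicate false.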
The following "compatibility" rules determine when
the processor does not issue a pipelined ADS#
(they are the source of Table 5.2):
• Data cache line fills are pipelined into each other
only in the case of an aliasing virtual tag miss with
a physical tag hit.
• Reads can be pipelined into TLB miss writes. TLB
misses for instructions can be pipelined into data
accesses, and vice versa.
• No data cycle is ever pipelined while LOCK# is
active.
• I/O cycles, special cycles, and ldint cycles never
begin when any cycle is outstanding.
NA# may be asserted before, simultaneously with,
or after the first BRDY# of the current cycle. If NA#
is asserted before the first BRDY#, the cacheability
(KEN#) and cache policy (WB/WT#) indicators for
the current cycle are sampled during the same clock
period as NA# is sampled active; otherwise, they
are sampled with the first BRDY#. Figure 5.7 shows
an example of four-transfer, pipelined, back-to-back
reads. Note the timing of KEN#. Because NA# is
asserted before the first BRDY# of cycle A,
KEN# is sampled with the NA# for cycle B.
Table 5.2. Pipeline Cycle Compatibility
If A is Outstanding, can B be Pipelined into It?
                                                        B
A (previous cycle)          Data Cache  Data Cache   Data Cache  Write-   Instruction  pfld   TLB    ldio, stio,  LOCK#
                            Line Fill   Store Miss,  Read Miss   Back**   Fetch               Miss   ldint, scyc  Active
                                        Write-Thru   KEN# = 1
Data Cache Line Fill        YES*        YES*         YES*        YES      YES          YES*   YES    NO           YES
Data Cache Store Miss,      YES         YES          YES         YES      YES          YES    YES    NO           YES
Write-Thru
Data Cache Read Miss        YES*        YES*         YES*        YES*     YES          YES*   YES    NO           YES*
KEN# = 1
Write-Back                  YES         YES          YES         NO       YES          YES    YES    NO           YES
Instruction Fetch           YES         YES          YES         YES      YES          YES    YES    NO           YES
pfld                        YES         YES          YES         YES      YES          YES    YES    NO           YES
TLB Miss                    YES         YES          YES         YES      YES          YES    YES    NO           YES
stio, scyc                  YES         YES          YES         YES      YES          YES    YES    NO           YES
ldio, ldint                 NO          NO           NO          NO       YES          NO     YES    NO           NO
LOCK# Active                NO          NO           NO          NO       YES          NO     YES    NO           NO
NOTE:
* Pipelining can occur if the first ADS# is for an aliasing virtual tag miss with a physical tag hit.
** Inquiry write-backs are not pipelined into prior cycle unless FLINE# is asserted.
[Timing diagram over clocks 1-10: CLK, CACHE#, and related bus signals for cycles A and B]
NOTES:
A Four-transfer cache line fill cycle
B Four-transfer cache line fill cycle
1. KEN# for A simultaneous with NA#
Figure 5.7. Pipelined Cache Line Fills
Write cycles can be pipelined into read cycles and
vice versa, but, in both cases, the processor leaves
one clock between bursts to allow bus turnover, and ignores any BRDY# given to it at that
time. Pipelined back-to-back read and write cycles
are shown in Figure 5.8. On writes, assertion of NA#
does not cause the values on the data bus to
change; it just enables new address and cycle specification outputs.
5.1.4 INTERRUPT ACKNOWLEDGE CYCLES
In response to a trap caused by assertion of the INT
pin, trap-handling software can generate interrupt
acknowledge cycles by executing a procedure similar to the following.
// The following lock instruction must be on a 32-byte boundary:
    lock                        // Lock the bus
    ldint.b  src2, rdest        // First INTA cycle. Src2 contains 8.
    or       rdest, r0, rdest   // Won't proceed until rdest loaded.
    unlock                      // Unlock the bus after the next ldint
    //nop                       // Insert 4 + idle
    //nop                       // clocks for 8259A recovery.
    ldint.b  r0, rdest          // Second INTA cycle
[Timing diagram: CLK, W/R#, LEN#, CACHE#, ADS#, BRDY#, DATA, LOCK#]
Figure 5.9. Example Interrupt Acknowledge Sequence
Table 5.3. Encoding of Special Bus Cycles
| BE7#-BE0# | Special Bus Cycle |
|---|---|
| 11110111 | Write Back External Cache and Invalidate |
| 11111011 | Halt |
| 11111101 | Invalidate External Cache |
| 11111110 | Shut Down |
All other encodings are reserved.
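External logic that watches for special bus cycles would decode the byte enables roughly as follows. This Python sketch is illustrative: it assumes that the encodings and cycle names of Table 5.3 pair up in the order listed, and that the leftmost digit of each pattern is BE7#; the function and dictionary names are invented for the example.

```python
# Byte-enable patterns from Table 5.3 (BE7# is assumed to be the leftmost bit).
SPECIAL_CYCLES = {
    0b11110111: "Write Back External Cache and Invalidate",
    0b11111011: "Halt",
    0b11111101: "Invalidate External Cache",
    0b11111110: "Shut Down",
}

def decode_special_cycle(byte_enables):
    """Map the 8-bit BE7#-BE0# value of a special cycle to its name."""
    return SPECIAL_CYCLES.get(byte_enables & 0xFF, "Reserved")
```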
5.2 Bus Arbitration
The i860 XP microprocessor responds to three different signals that tell it to stop driving the bus:
HOLD    Finishes outstanding cycles before giving up the bus.
BOFF#   Aborts outstanding cycles and gives up bus immediately.
AHOLD   Stops driving address bus and permits a cache inquiry.
AHOLD results in a partial hold state, which is covered in Section 5.3. The present section concentrates on HOLD and BOFF#.
When in a hold state (due either to HOLD or BOFF#), the i860 XP microprocessor uses BREQ to request control of the bus. If holding due to HOLD, AHOLD, or BOFF#, the processor activates BREQ in the clock after an internal bus request is generated. (In the case of HOLD, BREQ is asserted even though HLDA is asserted.) If holding due to BOFF# and cycles need to be restarted or there is a new internal request, it asserts the BREQ signal within four clock periods after the assertion of BOFF#. In all cases, BREQ remains active at least until the clock after ADS# is activated for the requested cycle.
5.2.1 HOLD AND HLDA ARBITRATION
HOLD indicates to the i860 XP microprocessor that another bus master needs control of the bus. When HOLD is asserted, the i860 XP microprocessor keeps control of the bus until all outstanding cycles are completed. Then it floats the output signals (except BREQ, HLDA, LOCK#, PCHK#, HIT#, and HITM#) and asserts HLDA. These outputs remain at the high-impedance state until HOLD is deasserted.
HLDA may be asserted as soon as the clock period after the one in which HOLD is asserted. HLDA may be deasserted as soon as the clock after the one in which HOLD is deasserted.
An example HOLD/HLDA transaction is shown in Figure 5.10. The i860 XP microprocessor recognizes HOLD even while RESET is asserted, and it drives HLDA in this case as well.
HOLD is recognized even when BOFF# is active, and the i860 XP microprocessor responds with HLDA the same as when the bus is idle.
[Timing diagram: CLK, ADS#, LEN, CACHE#, BRDY#, HOLD, HLDA]
Figure 5.10. HOLD/HLDA Handshake
5.2.2 BUS CYCLE BACK-OFF AND RESTART
The i860 XP microprocessor provides the ability to abort bus cycles and restart them again. It is necessary to abort cycles for reasons such as the following:
1. Retry after an error is detected by ECC or parity logic.
2. Escape from a deadlock; for example, when the i860 XP microprocessor is using A31-A3 to load a new cache line, but the 82495XP cache controller needs A31-A5 to invalidate a line in the CPU cache which the 82495XP cache controller is replacing in its cache in order to satisfy the CPU's line-fill request.
3. Maintain cache consistency; for example, the i860 XP microprocessor is attempting to read or write to a line that has been modified in the cache of another CPU.
4. Prevent illegal access to an address already locked by another CPU in a multiprocessor system.
5.2.2.1 Cycle Back-Off
Bus cycles are aborted when the system asserts BOFF#. The i860 XP microprocessor samples this pin in every clock period that it is driving the bus. When BOFF# is asserted, the i860 XP microprocessor immediately (in the next clock period) floats the bus. It floats the ADS# pin one clock period later, thereby giving time for ADS# to be deasserted so that it is not left floating active. The i860 XP microprocessor floats the same pins as for HOLD, but HLDA is not asserted. If a bus cycle is in progress at the time BOFF# is asserted, the cycle is aborted, and, in a read cycle, any data returned to the processor while BOFF# is active is ignored. BOFF# overrides BRDY#; so, if both are sampled active in the same clock, BRDY# is ignored. BOFF# aborts a burst cycle even if it arrives with the last BRDY# of the cycle. However, for read bursts, data transfers completed before assertion of BOFF# are used by the processor if they satisfy an internal request. Cacheable data is cached in spite of BOFF#; however, the cached data is overwritten when the cycle is restarted.
The bus remains in the high-impedance state until BOFF# is deasserted. If cycles need to be restarted or if a new internal request has been generated, the BREQ signal is asserted within four clock periods after the assertion of BOFF#.
5.2.2.2 Cycle Restart
When the system deasserts BOFF#, the i860 XP microprocessor restarts aborted bus cycles from the beginning by driving the address and status (A31-A3, W/R#, D/C#, etc.) and asserting ADS#. If more than one cycle was outstanding when BOFF# was asserted, the i860 XP microprocessor restarts all outstanding cycles in the same order. If HITM# is active due to an inquiry, the write-back for it will be the first cycle after deassertion of BOFF#. BOFF# restarts all aborted cycles except:
• The stale cycles mentioned in section 5.3.5.
• The read that may have been generated by an alias hit (virtual tag miss, but physical tag hit).
• The read that may have been generated by a pfld that hit the data cache.
If the processor's KEN# pin was active (with NA# or first BRDY#) before the cycle was aborted, external hardware must activate it again after the cycle is restarted. In other words, the system cannot use BOFF# to change the cacheability of a cycle via KEN#.
The LOCK# signal is not affected by restarted cycles; it retains its state in spite of BOFF# assertion.
5.2.2.3 Late Back-Off Modes
In some cases the logic that needs to assert BOFF# cannot make the necessary decision in time to cancel the relevant cycle or data transfer. For example:
1. The result of checking ECC or parity may not be available until one or two cycles after the BRDY# to which it corresponds.
2. When the i860 XP microprocessor is attempting to read or write to a line that might be modified in the cache of another processor on the same bus, it may be advantageous to let part of a burst run in parallel with inquiries to the other processors, rather than delay the entire burst until the inquiries are finished.
For such situations, the i860 XP microprocessor provides late back-off mode. For a read cycle in this mode, the processor employs a buffer to internally delay data and BRDY#, which allows BOFF# assertion to be delayed relative to the external BRDY#. Likewise, for a write cycle in this mode, BOFF# assertion can be delayed relative to BRDY#. However, data for a write cycle is not delayed.
Two flavors of late back-off mode are provided:
1. One allows BOFF# to be delayed by one clock period relative to the data transfer. The processor enters one-clock late back-off mode when the FLINE# pin has been sampled active for at least three clock periods when RESET deactivates.
2. The other allows BOFF# to be delayed by up to two clock periods relative to the data transfer. The i860 XP microprocessor enters this mode when software sets the LB bit of the dirbase register.
If the processor enters one-clock late back-off mode during RESET, it is impossible to enter two-clock late back-off mode. The LB bit has no effect. Furthermore, software cannot exit two-clock late back-off mode once it is activated, and the LB bit cannot be cleared except by resetting the processor.
Figures 5.12-5.17 illustrate variations on late back-off mode cycles. BOFF# can be (and usually is) asserted longer than one clock period, as Figure 5.11 shows; the remaining figures show an active time of only one clock.
5.2.2.4 One-Clock Late Back-Off Mode
In one-clock late back-off mode the data is delayed internally by one clock before it is used.
In this mode, data and BRDY# are seen by internal logic one clock period later than they appear on the bus, which is equivalent to adding an extra wait state to reads on the external bus (Figure 5.13). All responses to BRDY# (assertion of the ADS# for the next cycle, assertion of HLDA in response to a HOLD request, and deassertion of HITM#) are delayed by one clock period compared to the normal mode of operation. Not delayed, however, are write data on D63-D0 and sampling of KEN# and WB/WT#. KEN# and WB/WT# must be valid with the first BRDY# assertion. Also, the response to NA# (assertion of ADS#) is not delayed if fewer than three pipelined cycles are outstanding.
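The rules of section 5.2.2.3 for selecting a late back-off mode can be summarized in a short sketch. This Python helper is illustrative only; its name, its arguments, and the convention of returning the number of clocks by which BOFF# may lag the data transfer are assumptions made for the example.

```python
def late_backoff_clocks(fline_active_at_reset, lb_bit_set):
    """Return how many clocks BOFF# may be delayed relative to the data transfer.

    fline_active_at_reset: FLINE# was sampled active for at least three
        clock periods when RESET deactivated.
    lb_bit_set: software has set the LB bit of the dirbase register.
    """
    if fline_active_at_reset:
        # One-clock mode was entered during RESET; the LB bit has no effect.
        return 1
    if lb_bit_set:
        # Two-clock mode; it cannot be left again without a reset.
        return 2
    return 0  # normal mode: BOFF# must arrive no later than the BRDY#
```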
[Timing diagram: CLK, LEN, CACHE#, BOFF#, A31:A3, ADS#, BRDY#]
NOTES:
A Noncacheable, 64-bit cycle (one transfer)
B Next cycle (any type)
1. BOFF# cancels cycle and data transfer
2. Cycle A restarts one clock after BOFF# is deasserted
3. Earliest ADS# assertion for next cycle
Figure 5.12. One-Clock Normal Back-Off
[Timing diagram: CLK, ADS#, ADDRESS, LEN, CACHE#, W/R#, BRDY#, DATA]
NOTE:
1. Idle clock due to internal delay of BRDY#
Figure 5.13. Fastest Nonpipelined Cycles in One-Clock Late Back-Off Mode
If BOFF# is asserted as late as the second BRDY# (Figure 5.14), it cancels the entire cycle, ignores data latched with the first BRDY#, and ignores the data being driven with the second BRDY#. This is true of a two-transfer burst (shown) as well as a four-transfer burst (not shown).
In a two-transfer burst, if BOFF# is asserted in the clock after the second BRDY# (Figure 5.15), it still cancels the cycle.
In a four-transfer burst, if BOFF# is asserted within one clock after the last BRDY# (Figure 5.16), it still forces a retry of the cycle, but previously transferred read data is used by the processor if it satisfies the read request.
5.2.2.5 Two-Clock Late Back-Off Mode
Two-clock late back-off mode gives external logic even more time to decide to use BOFF#. In this mode, data delivery is delayed by either one or two clock periods, depending on external activity. For any BRDY#, the data is delayed by one clock period. If in the next clock period BRDY# is again asserted, the previous data is used. However, if in that next clock period BRDY# remains inactive, the data is delayed for one extra clock period before it is used. The responses to BRDY# (assertion of the ADS# for the next cycle, assertion of HLDA, and deassertion of HITM#) are delayed by one or two clock periods, depending on the value of BRDY# in the next clock. The response to NA# (assertion of ADS#) is not delayed if fewer than three pipelined cycles are outstanding.
The st.c dirbase instruction that sets the LB bit must be aligned on a 32-byte boundary and must be followed by seven nop instructions. Software must not enable late back-off mode when the processor is used with the 82495XP external cache controller.
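The data-delay rule of two-clock late back-off mode can be modeled in a few lines. The following Python sketch is an interpretation of the prose, not a specification: it assumes integer clock numbers, and the function name is invented for the example.

```python
def internal_use_clocks(brdy_clocks):
    """Map each external BRDY# clock to the clock in which its data is used.

    Data is always held for one clock. If BRDY# is asserted again in the
    next clock, the held data is used then; otherwise it is held for one
    extra clock.
    """
    asserted = set(brdy_clocks)
    return {c: (c + 1 if (c + 1) in asserted else c + 2)
            for c in sorted(asserted)}
```

For a burst with BRDY# in clocks 3, 4 and 5, the first two transfers are used one clock late and the last one two clocks late, because no further BRDY# follows it.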
[Timing diagram: CLK, LEN, CACHE#, BOFF#, ADS#, BRDY#]
NOTES:
A Noncacheable, 128-bit cycle (two transfers)
B Next cycle (any type)
1. BOFF# cancels both transfers (A1 in buffer, A2 on D63-D0)
2. Cycle A restarts one clock after BOFF# is deasserted
3. Earliest ADS# assertion for next cycle
Figure 5.14. One-Clock Late Back-Off Mode (Case 1)
[Timing diagram: CLK, LEN, CACHE#, BOFF#, ADS#, BRDY#]
NOTES:
A Noncacheable, 128-bit cycle (two transfers)
B Next cycle (noncacheable)
1. BOFF# cancels both transfers (A2 in buffer is needed to satisfy request)
2. Cycle A restarts one clock after BOFF# is deasserted
3. Earliest ADS# assertion for next cycle
Figure 5.15. One-Clock Late Back-Off Mode (Case 2)
[Timing diagram: CLK, LEN, CACHE#, KEN#, BOFF#, ADS#, NA#, BRDY#]
NOTES:
A Cacheable 64-bit (or less) cycle (four transfers)
B Next cycle (any type)
1. BOFF# cancels A2 and A3 transfers, but A1 transfer has already satisfied request
2. Cycle A restarts one clock after BOFF# is deasserted
3. Earliest ADS# assertion for next cycle
Figure 5.16. One-Clock Late Back-Off Mode (Case 3)
[Timing diagram: CLK, LEN, CACHE#, BOFF#, ADS#]
Figure 5.17. Two-Clock Late Back-Off Mode
5.3 Cache Inquiry Cycles (Snooping)
Another processor initiates an inquiry cycle to check whether an address is cached in the internal data or instruction cache of the i860 XP microprocessor. An inquiry cycle differs from any other cycle in that it is initiated externally to the i860 XP microprocessor, and the signal for beginning the cycle is EADS# (External Address Status) instead of ADS#. The address bus of the i860 XP microprocessor is bidirectional in order to allow the address of inquiry to be driven by the system. An inquiry cycle can begin during any hold state:
1. While HOLD and HLDA are asserted.
2. While BOFF# is asserted.
3. While AHOLD (address hold) is asserted.
If neither a HOLD nor a BOFF# is in effect, the system can assert AHOLD to interrupt the current bus activity.
EADS# is first sampled two clocks after BOFF# or AHOLD assertion, or one clock after HLDA. This allows time for the processor to float A31-A5 and for the system to stabilize the inquiry address there.
In the clock in which EADS# is asserted, the i860 XP microprocessor samples these inputs, which qualify the type of inquiry:
INV      Specifies whether the line (if found) must be invalidated (that is, changed to I-state).
FLINE#   Specifies whether the line (if found in M-state) must be written back immediately or after outstanding bus cycles are completed.
The i860 XP microprocessor compares the address of the inquiry request with addresses of lines in cache and of any line in the write-back buffer waiting to be transferred on the bus. It does not, however, compare with the address of write-miss data in the write buffers. Two clock periods after sampling EADS#, the i860 XP drives the results of the inquiry look-up on these output pins:
HIT#     Specifies whether the address was found (active) or not found (inactive).
HITM#    If active, the line found was in the M-state; if inactive, the line was in E- or S-state, or was not found.
Figure 5.18 shows an inquiry with AHOLD that misses the cache. When the system asserts AHOLD, the i860 XP microprocessor floats A31-A3 in the next clock period. It does not, however, assert HLDA; no acknowledge is required. Once the address pins are floating, external logic drives the address for the inquiry on A31-A5 and starts the inquiry cycle by activating EADS#. The i860 XP microprocessor does not begin sampling EADS# until the second clock after AHOLD is activated. EADS# activation may be delayed any number of clocks.
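The inquiry timing constraints (when EADS# is first sampled, and when HIT#/HITM# become valid) can be checked with a small helper. This Python sketch is illustrative; the function name, the dictionary, and the integer clock numbering are assumptions made for the example.

```python
# Clocks from assertion of the hold-type signal to the first EADS# sample.
EADS_DELAY = {"BOFF#": 2, "AHOLD": 2, "HLDA": 1}

def inquiry_timing(signal, assert_clock, eads_clock=None):
    """Return (earliest EADS# clock, clock in which HIT#/HITM# are valid).

    signal: "BOFF#", "AHOLD" or "HLDA" (the condition that opened the hold state).
    assert_clock: clock in which that signal was asserted.
    eads_clock: clock in which the system asserts EADS#; defaults to the earliest.
    """
    earliest = assert_clock + EADS_DELAY[signal]
    if eads_clock is None:
        eads_clock = earliest
    if eads_clock < earliest:
        raise ValueError("EADS# is not sampled this early")
    return earliest, eads_clock + 2  # results follow EADS# by two clocks
```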
[Timing diagram: CLK, AHOLD, ADDRESS, EADS#, HITM#, HIT#, BRDY#, ADS#, DATA]
NOTES:
A Outstanding cycle (for example, a single-transfer read) finishes during the inquiry
1. Earliest assertion of EADS# is two clocks after assertion of AHOLD
2. Earliest deassertion of AHOLD is one clock after assertion of EADS#
3. HIT# is valid two clocks after assertion of EADS#
4. Earliest assertion of ADS# for next cycle is one clock after deassertion of AHOLD
Figure 5.18. Inquiry Miss Cycle
The earliest that AHOLD can be deasserted is the clock after EADS# assertion. However, by maintaining AHOLD active, multiple inquiry cycles can be executed in one AHOLD session (Figure 5.19). The i860 XP microprocessor can accept inquiry cycles at a rate of one every other clock period, unless a write-back is required. The earliest that ADS# can be asserted for the next cycle is the clock after AHOLD deassertion.
The second inquiry in Figure 5.19 hits an unmodified line in the cache. When a cache line with matching address is found and the INV input signal is asserted (as in this case), that line is invalidated (changed to I-state). If the INV signal is inactive, the line enters S-state.
[Timing diagram: CLK, AHOLD, ADDRESS, EADS#, INV, HITM#, HIT#, ADS#, BRDY#, DATA]
NOTES:
A Outstanding cycle (for example, a single-transfer read) finishes during the inquiry
B Earliest inquiry, no invalidation
C Earliest successive inquiry, with invalidation
1. EADS# is not sampled in the clock after its assertion
2. Inquiry B misses cache
3. Earliest deassertion of AHOLD is one clock after last assertion of EADS#
4. Inquiry C hits cache, invalidates line
5. Earliest assertion of ADS# for next cycle is one clock after deassertion of AHOLD
Figure 5.19. Fastest Inquiry Cycles (Miss and Hit)
5.3.1 INQUIRY WRITE-BACK CYCLES
If an inquiry cycle hits a dirty (M-state) line in the i860 XP microprocessor cache, the i860 XP microprocessor asserts the HITM# signal to indicate that the line will be written on the bus. The HITM# output becomes valid in the same clock period as HIT#. In this case the modified line is written out, and the cache entry is changed to either I or S state according to INV. The HITM# signal stays active through the last BRDY# for the corresponding write-back cycle.
An inquiry write-back cycle is similar to ordinary write-back cycles. It is initiated by assertion of ADS#. ADS# is asserted even when the AHOLD signal is active. The cycle definition signals are driven properly by the processor; however, the address pins are not driven, because activation of AHOLD forces the i860 XP microprocessor off the address bus. If, however, AHOLD is deasserted before or during the write-back cycle, the i860 XP microprocessor drives the correct address for the write-back.
For all types of inquiry, the write-backs are not pipelined into an outstanding cycle, except when the FLINE# pin is used (refer to section 5.3.5). ADS# for the inquiry write-back is asserted from one to four clock periods after the HITM# pin is driven active or after the last BRDY# is returned for any outstanding cycle, whichever occurs later.
Bursts for a HITM# write-back, as for any write-back, are in the order 0, 8, 0x10, 0x18, because the i860 XP microprocessor ignores A4-A3 of the inquiry address.
Figure 5.20 shows an inquiry cycle that hits an M-state line.
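The outcome of an inquiry and the address sequence of the resulting write-back can be modeled as follows. This Python sketch is illustrative only: the function names are invented, cache states are represented by the letters M, E, S and I, and a 32-byte line (four 8-byte transfers) is inferred from the inquiry address A31-A5 and the burst order given above.

```python
def snoop(state, inv):
    """Return (hit, hitm, new_state) for an inquiry to a line in the given state.

    state: "M", "E", "S" or "I" (I means the line is not in the cache).
    inv:   True if the INV input is asserted with EADS#.
    """
    if state == "I":
        return (False, False, "I")       # miss: HIT# and HITM# stay inactive
    new_state = "I" if inv else "S"      # INV invalidates, otherwise share
    return (True, state == "M", new_state)

def writeback_burst_addresses(inquiry_address):
    """Addresses of the four transfers of a write-back, in bus order.

    A4-A3 of the inquiry address are ignored, so the burst always starts
    at the beginning of the 32-byte line.
    """
    base = inquiry_address & ~0x1F
    return [base + offset for offset in (0x00, 0x08, 0x10, 0x18)]
```

A hit on an M-state line asserts HITM# and is followed by the write-back; hits on E- or S-state lines only change the line state.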
[Timing diagram: CLK, AHOLD, ADDRESS, HITM#, HIT#, LEN#, CACHE#, ADS#, DATA]
Figure 5.28. Cycle Reordering via FLINE# (No Ongoing Burst)
[Timing diagram: CLK, AHOLD, HIT#, HITM#, W/R#, CACHE#, FLINE#, ADS#, NA#, BRDY#, DATA]
NOTES:
1. BRDY# is ignored by CPU from end of ongoing burst through ADS# of write-back, even if other cycles remain outstanding
2. NA# required only if another cycle is outstanding
3. If the first BRDY# is asserted here or sooner (relative to HITM#), the outstanding cycle completes before the FLINE# write-back.
Figure 5.29. Cycle Reordering via FLINE# (Ongoing Burst)
5.3.5.3 Reordering Write-Backs with BOFF#
Back-off cycles are discussed in general in Section 5.2.2. Figure 5.30 shows how BOFF# can be used to cancel outstanding cycles so that an inquiry write-back can take place immediately.
[Timing diagram: CLK, AHOLD, BOFF#, HITM#, HIT#, LEN#, CACHE#, ADS#]
NOTES:
A Outstanding cycle (for example, noncacheable 128-bit read)
W Write-back cycle
1. AHOLD begins an inquiry while one cycle is outstanding.
2. Earliest assertion of EADS# is two clocks after assertion of AHOLD
3. Inquiry hits modified line.
4. Assertion of BOFF# aborts the outstanding cycle.
5. BRDY# asserted during BOFF# is ignored by CPU.
6. Write-back begins after deassertion of BOFF#.
7. Earliest assertion of ADS# for restart of cycle A (assuming no pipelining).
Figure 5.30. Cycle Reordering via BOFF# (Ongoing Burst)
5.4 The LOCK# Cycle Attribute
The processor asserts the LOCK# signal when several accesses to a single memory location must be effectively uninterruptible. By causing LOCK# to be asserted, a programmer can, for example, increment the contents of a memory variable and be assured that the variable will not be accessed between the read and the update of that variable.
The memory location to be locked is the one whose address is driven during the cycle in which LOCK# is first activated. In multiprocessor systems, external hardware should guarantee that no other processor is granted a locked read, locked write, or unlocked write to the same location until LOCK# is deasserted. The i860 XP microprocessor has no hardware provision to prevent another master from also locking the variable; this responsibility falls on the bus arbiter. In the simplest implementation, the arbiter can globally prevent other masters from accessing the bus.
Not all cycles affect the value of LOCK#. Code fetches, write-backs due to replacement or inquiry, and cycles restarted due to BOFF# do not affect LOCK#. Any other type of cycle can be used to initiate or terminate LOCK#, including cache line fills, interrupt acknowledge, I/O, and special cycles.
Data accesses with LOCK# asserted are not pipelined, and other data cycles are not pipelined while a LOCK# cycle remains outstanding. Instruction fetches, however, may be pipelined during lock.
The i860 XP microprocessor can run very long lock sequences; therefore, to guarantee reasonable bus turnover latency in multimaster systems, the i860 XP microprocessor recognizes bus hold (HOLD), address hold (AHOLD), and back-off (BOFF#) while the LOCK# signal is active. In spite of such intervening conditions, the arbiter should prevent any other bus master from also locking or updating the variable the i860 XP microprocessor locked. In simple systems the HOLD input can be masked by the LOCK# output (that is, the external logic that generates HOLD can AND the LOCK# signal with other hold conditions). More sophisticated systems, however, may allow the bus to be turned over while LOCK# is asserted.
Whatever the lock implementation, arbiter design must, in one case, allow another processor to write the locked variable. That case is when another i860 XP microprocessor or master asserts HITM# in response to the inquiry generated by the locking processor's initial read. That other master must write back the locked variable before the i860 XP microprocessor can read it. This HITM# write-back must always be allowed.
The timing of LOCK# is shown in Figure 5.31. Note that LOCK# is asserted in the same clock period as ADS# for the locked address, but is deasserted in the clock period after ADS# for the unlocking load or store.
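The simple HOLD-masking scheme and the list of cycles that leave LOCK# unchanged can be sketched as follows. This Python fragment is a model of external arbiter logic under stated assumptions, not part of the processor: the function names and cycle-type strings are invented, and pin levels are given as 0 or 1 with LOCK# active low.

```python
# Cycle types that neither initiate nor terminate LOCK# (section 5.4).
LOCK_NEUTRAL = {"code_fetch", "write_back", "restarted_cycle"}

def cycle_can_change_lock(cycle_type):
    """True if this type of cycle can initiate or terminate LOCK#."""
    return cycle_type not in LOCK_NEUTRAL

def hold_to_cpu(hold_request, lock_n):
    """HOLD as presented to the CPU in the simple masking scheme.

    hold_request: 1 if some other master wants the bus.
    lock_n: level of the LOCK# pin (0 = lock active, 1 = inactive).
    ANDing the request with LOCK# suppresses HOLD while a lock is in progress.
    """
    return hold_request & lock_n
```

Note that a real arbiter using this scheme must still let a HITM# write-back from another master proceed, as required above.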
[Timing diagram: CLK, LEN, CACHE#, W/R#, ADS#, ADDRESS, BRDY#, DATA, LOCK#]
NOTES:
L Locking access
U Unlocking access
1. This address is to be locked
2. LOCK# is asserted with ADS#
3. LOCK# is deasserted one clock after ADS#
Figure 5.31. LOCK# Timing
5.5 RESET Initialization
The i860 XP microprocessor is initialized when the system asserts the RESET signal for at least ten clocks. Table 5.5 shows the status of output pins during the time that RESET is asserted. Note that the bidirectional data pins (D63-D0 and DP7-DP0) are floated during RESET, though the bidirectional A31-A3 pins are not. If the i860 XP microprocessor is used with 82495XP and 82496XP cache, however, the latter do float the bidirectional pins they share with the i860 XP microprocessor during RESET. Note that HOLD requests are honored during RESET and that the HLDA output signal may also become active. The status of output pins depends on whether a HOLD request is being acknowledged. Note also that the test logic may be active during RESET and that the EXTEST instruction may drive other values on the output pins.
After the RESET signal goes inactive the processor remains in the RESET state for three more clocks. Applications that use the HOLD signal to float the bus during RESET should keep HOLD active for three more clocks after the RESET signal is deactivated.
Some aspects of processor configuration are determined by asserting input signals during RESET. To select a given option, the corresponding input must be asserted for at least the last three clocks before the falling edge of RESET; to deselect, the corresponding input must be deasserted for at least the last three clocks before the falling edge of RESET:
EWBE#     Enter strong ordering mode.
FLINE#    Enter one-clock late back-off mode.
INT/CS8   Enter eight-bit code-size mode.
PEN#      Enter normal (small output buffers) current mode.
Figure 5.32 shows how configuration pins are sampled during the three clock periods just before the falling edge of RESET. No inputs besides EWBE#, HOLD, FLINE#, INT/CS8, and PEN# are sampled during RESET.
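A board-level simulation or test bench might check the three-clock rule for the configuration inputs as sketched below. This Python helper is illustrative; its name and the representation of pin activity as a list of booleans (True meaning asserted, one entry per clock, ending with the clock before RESET falls) are assumptions made for the example.

```python
def option_selected(samples):
    """Classify a configuration input from its samples before RESET falls.

    Returns True if the option is selected, False if it is deselected, and
    None if the input changed within the last three clocks, in which case
    the text above guarantees neither outcome.
    """
    last_three = samples[-3:]
    if len(last_three) < 3:
        raise ValueError("need at least three clocks of samples")
    if all(last_three):
        return True
    if not any(last_three):
        return False
    return None
```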
Table 5.5. Output Pin Status during Reset
| Pin Name | Pin Value, HOLD Not Acknowledged | Pin Value, HOLD Acknowledged |
|---|---|---|
| BREQ | LOW | LOW |
| HLDA | LOW | HIGH |
| W/R#, PWT, PCD | LOW | Tristate OFF |
| ADS# | HIGH | Tristate OFF |
| D63-D0, DP7-DP0 | Tristate OFF | Tristate OFF |
| A31-A3, BE7#-BE0#, NENE#, CACHE#, CTYP, D/C#, KB0, KB1, LEN, M/IO#, PCYC | Undefined | Tristate OFF |
| PCHK#, HIT# | Undefined | Undefined |
| HITM#, LOCK# | HIGH | HIGH |
NOTE:
This table does not apply if the test logic is running the EXTEST instruction.
[Timing diagram: CLK, RESET, EWBE#, FLINE#, INT/CS8, PEN#, OTHER INPUTS, HOLD, HLDA]
NOTE:
1. The CPU samples these inputs in the clocks preceding the falling edge of reset.
Figure 5.32. Reset Activities
While in eight-bit code-size mode, instruction cache misses are one-byte reads (transferred on D7-D0 of the data bus) instead of eight-byte reads. This allows the i860 XP microprocessor to be bootstrapped from an eight-bit ROM. For these code reads, byte enables BE2#-BE0# are redefined to be the low order three bits of the address, so that a complete byte address is available. The entire eight-byte data bus continues to be parity-checked by the i860 XP microprocessor during CS8-mode instruction fetches; therefore, external hardware must either generate good parity on all eight bytes or disable parity traps by deasserting PEN# during CS8 mode.
While in this mode, instructions must reside in an eight-bit wide memory, while data must reside in a separate 64-bit wide memory. After the code has been loaded into 64-bit memory, initialization code can initiate 64-bit code fetches by clearing the CS8 bit of the dirbase register (refer to section 2). Once eight-bit code-size mode is disabled by software, it cannot be reenabled except by resetting the i860 XP microprocessor.
Instruction fetches in CS8 mode update the instruction cache if KEN# is asserted during NA# or all of the first eight BRDY#s (refer to section 4.2.26). They are pipelined if NA# is asserted. When used with the 82495XP and 82496XP cache, CS8 mode works only if the ROM locations are made noncacheable.
6.0 TESTABILITY
The i860 XP microprocessor provides testability features compatible with the proposed Standard Test Access Port and Boundary-Scan Architecture (IEEE Std. P1149.1/D6). The subset of the standard test logic implemented in the i860 XP microprocessor provides for testing the interconnections between the i860 XP microprocessor and other integrated circuits once they have been assembled onto a printed circuit board.
The test logic consists of a boundary-scan register and other building blocks that are accessed through a test access port (TAP). The TAP provides a simple serial interface that makes it possible to test all signal traces with only a few probes.
The TAP can be controlled by a bus master. The bus master can be either automatic test equipment or a component that interfaces to a four-pin test bus.
6.1 Test Architecture
The test logic contains the following elements:
• Test access port (TAP), which consists of input pins TMS, TCK, TDI, and TRST#; and output pin TDO.
• TAP controller, which receives the dedicated test clock (TCK) and interprets the signals on the test mode select (TMS) line. The TAP controller generates clock and control signals for the instruction and test data registers and for other parts of the test logic.
• Instruction register (IR), which allows instruction codes to be shifted into the test logic. The instruction codes are used to select the test to be performed or the test data register to be accessed.
• Test data registers: Bypass Register (BPR), Device Identification Register (DID), and Boundary-Scan Register (BSR).
The instruction and test data registers are separate shift-register paths connected in parallel and having a common serial data input and a common serial data output connected to the TAP TDI and TDO signals respectively.
6.2 Test Data Registers
The test logic contains the following data registers:
o
Bypass Register (BPR): BPR is a one-bit shift
register that provides a minimum-length path between TOI and TOO when no test operation of
the component is required. This allows more rapid movement of test data to and from other board
components that are required to perform test operations. While running through BPR, the data is
transferred without inversion from TOI to TOO.
o
Device Identification Register (DID): This register contains the manufacturer's identification
code, part number code, and version code in the
format shown by Figure 6.1. The values are: manufacturer's identification code (9), part number
code (61AO), version code (8), entire 32-bit value
(Ox861 A0013).
o
Boundary Scan Register (BSR): The BSR is a
single shift-register path containing 150 cells that
are connected to all input and output pins of the
i860 XP microprocessor. Figure 6;2 shows the
logical structure of the BSR. Input cells only capture data; they do not affect operation of the
i860 XP microprocessor. Data is transferred without inversion from TOI to TOO through the BSR
during scanning. The BSR can be operated by
the EXTEST and SAMPLE instructions.
[Figure: 32-bit DID register with the fields VERSION (8), PART NUMBER (61A0), and MANUFACTURER IDENTITY (9).]
240874-60
Figure 6.1. Format of DID Register
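The DID field values quoted in section 6.2 can be checked against the full 32-bit value with a short sketch. The bit positions used here (version in bits 31-28, part number in bits 27-12, manufacturer identity in bits 11-1, and a fixed 1 in bit 0) are an assumption taken from the IEEE 1149.1 device identification layout, not read from the figure; the function names are illustrative only.

```python
# Assumed field layout (IEEE 1149.1 style): version [31:28],
# part number [27:12], manufacturer identity [11:1], bit 0 fixed at 1.

def pack_did(version: int, part_number: int, manufacturer: int) -> int:
    """Assemble a 32-bit device identification value from its fields."""
    return (version << 28) | (part_number << 12) | (manufacturer << 1) | 1


def unpack_did(did: int) -> dict:
    """Split a 32-bit device identification value into its fields."""
    return {
        "version": (did >> 28) & 0xF,
        "part_number": (did >> 12) & 0xFFFF,
        "manufacturer": (did >> 1) & 0x7FF,
        "lsb": did & 1,
    }


# Field values as listed for the DID register in section 6.2.
I860_XP_DID = pack_did(version=0x8, part_number=0x61A0, manufacturer=0x009)
print(hex(I860_XP_DID))  # 0x861a0013
```

Under that assumed layout the three fields reproduce the documented value 0x861A0013, which suggests the layout is the one the figure shows.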
[Figure: boundary scan register cells for a system input, a system 3-state output, and a system bidirectional pin, placed between the pins and the system logic, clocked by TCK and chained from TDI to TDO.]
240874-61
Figure 6.2. Logical Structure of BSR Register
6.3 Instruction Register

The Instruction Register (IR) selects the test to be performed and the test data register to be accessed. It is four bits wide, with no parity bit. Table 6.1 shows the encoding of the instructions supported by the TAP controller of the i860 XP microprocessor. The rightmost bit is the least significant and is the first shifted out on TDO.

Table 6.1. TAP Instruction Encoding

Instruction Code    Instruction
0000                EXTEST Boundary Scan
0001                SAMPLE Boundary Scan
0010                IDCODE
0011 ... 1110       Intel reserved CAUTION*
1111                BYPASS

* CAUTION: Operation of these private instructions may cause damage to the component.

EXTEST  The BSR cells associated with output pins drive the output pins of the i860 XP microprocessor. Values scanned into the BSR cells become the output values. The BSR cells associated with input pins sample the inputs of the i860 XP microprocessor. Note that I/O pins can be input or output for this test, depending on their control setting. The values shifted to the input latches are not used by the internal logic of the i860 XP microprocessor. After use of the EXTEST command, the i860 XP microprocessor must be reset (with the RESET signal) before normal use.

SAMPLE  The BSR cells associated with output pins sample the value driven by the i860 XP microprocessor. BSR cells associated with input pins sample on the rising edge of TCK the values driven to the i860 XP microprocessor. BSR cells associated with I/O pins sample the value on the respective pin. The I/O pin can be driven by the i860 XP microprocessor or by external hardware. The values shifted to the input latches are not used by the internal logic of the i860 XP microprocessor.

IDCODE  The identification code of the i860 XP microprocessor from the DID register is passed to TDO. The DID register is not altered by data shifted in on TDI.

BYPASS  Test data is passed from TDI to TDO via the single-bit BPR, effectively bypassing the test logic of the i860 XP microprocessor. Because of its special encoding, this instruction can be entered by holding TDI HIGH while completing an instruction-scan cycle. This reduces the demands on the host test system in cases where access is required, for example, only to chip 57 on a 100-chip board.

Note that an open circuit fault in the board-level test data path causes the BPR register to be selected following an instruction-scan cycle, because the TDI input has a pull-up resistor. Therefore, no unwanted interference with the operation of the on-chip system logic can occur.

Table 6.2 defines which registers are active during execution of each instruction.
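The instruction encoding and the benefit of BYPASS can be illustrated with a minimal sketch. It decodes a four-bit instruction code and adds up the data-register bits between TDI and TDO for a chain of devices. The register lengths come from section 6.2 (BSR 150 cells, DID 32 bits, BPR 1 bit); the function names and the 100-device board, in which every device is assumed to behave like an i860 XP microprocessor, are hypothetical.

```python
IR_WIDTH = 4

# Public instruction codes; every other four-bit code is Intel reserved.
INSTRUCTIONS = {0b0000: "EXTEST", 0b0001: "SAMPLE", 0b0010: "IDCODE", 0b1111: "BYPASS"}

# Length in bits of the data register each instruction places between TDI and TDO.
DR_LENGTH = {"EXTEST": 150, "SAMPLE": 150, "IDCODE": 32, "BYPASS": 1}


def decode_instruction(code: int) -> str:
    """Map a four-bit instruction code to its name, or 'RESERVED'."""
    if not 0 <= code < (1 << IR_WIDTH):
        raise ValueError("instruction code must fit in four bits")
    return INSTRUCTIONS.get(code, "RESERVED")


def chain_dr_length(codes) -> int:
    """Total data-register bits for a chain of devices, one code per device."""
    total = 0
    for code in codes:
        name = decode_instruction(code)
        if name == "RESERVED":
            raise ValueError("reserved instruction codes must not be used")
        total += DR_LENGTH[name]
    return total


# Hypothetical 100-device chain: device 57 runs EXTEST, the rest are in BYPASS.
board = [0b1111] * 100
board[56] = 0b0000
print(chain_dr_length(board))  # 249
```

With all 100 devices in EXTEST the same chain would be 15,000 bits long, which is the saving the BYPASS description refers to.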
6.4 TAP Controller

The TAP Controller is a synchronous, finite state machine. It controls the sequence of operations of the test logic. The TAP Controller changes state only in response to the following events:

1. A rising edge of TCK.
2. A transition to logic zero at the TRST# input.
3. Power-up.

The value of the TMS input signal at a rising edge of TCK controls the sequence of state changes. The state diagram for the TAP controller is shown in Figure 6.3. Test designers must consider the operation of the state machine in order to design the correct sequence of values to drive on TMS.

6.4.1 TEST-LOGIC-RESET STATE

In this state, the test logic is disabled so that normal operation of the i860 XP microprocessor can continue unhindered. This is achieved by initializing the instruction register such that the IDCODE instruction is loaded. No matter what the original state of the controller, the controller enters Test-Logic-Reset when the TMS input is held HIGH for at least five rising edges of TCK. The controller remains in this state while TMS is HIGH.

If the controller leaves the Test-Logic-Reset state as a result of an erroneous LOW signal on the TMS line at the time of a rising edge of TCK (for example, a glitch due to external interference), it returns to the Test-Logic-Reset state following three rising edges of TCK while the TMS signal is at the intended HIGH logic level. The operation of the test logic is such that no disturbance is caused to on-chip system logic operation as the result of such an error. On leaving the Test-Logic-Reset state, the controller moves into the Run-Test/Idle state, where no action occurs because the current instruction has been set to select operation of the DID register. The test logic is also inactive in the Select-DR-Scan and Select-IR-Scan states.

The TAP controller is also forced to the Test-Logic-Reset state by applying a LOW logic level to the TRST# input and at power-up.
Table 6.2. Registers Active by Instruction

            Mode
Register    EXTEST             SAMPLE             IDCODE       BYPASS
BSR         TDI → BSR → TDO    TDI → BSR → TDO    Inactive     Inactive
DID         Inactive           Inactive           DID → TDO    Inactive
BPR         Inactive           Inactive           Inactive     TDI → BPR → TDO
240874-62
NOTE:
0,1 The values present on TMS at the time of a rising edge on TCK.
Figure 6.3. TAP Controller State Diagram
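The state diagram can be modelled as a small next-state table, which makes the reset properties of section 6.4.1 easy to check. The sketch below is an illustration, not Intel's implementation: the transitions follow the state descriptions in section 6.4, except those out of Update-IR, which are assumed to mirror Update-DR as in the IEEE 1149.1 state diagram. The names `TAP_NEXT`, `step`, and `run` are hypothetical.

```python
# state -> (next state when TMS = 0, next state when TMS = 1), sampled on rising TCK
TAP_NEXT = {
    "Test-Logic-Reset": ("Run-Test/Idle", "Test-Logic-Reset"),
    "Run-Test/Idle":    ("Run-Test/Idle", "Select-DR-Scan"),
    "Select-DR-Scan":   ("Capture-DR", "Select-IR-Scan"),
    "Capture-DR":       ("Shift-DR", "Exit1-DR"),
    "Shift-DR":         ("Shift-DR", "Exit1-DR"),
    "Exit1-DR":         ("Pause-DR", "Update-DR"),
    "Pause-DR":         ("Pause-DR", "Exit2-DR"),
    "Exit2-DR":         ("Shift-DR", "Update-DR"),
    "Update-DR":        ("Run-Test/Idle", "Select-DR-Scan"),
    "Select-IR-Scan":   ("Capture-IR", "Test-Logic-Reset"),
    "Capture-IR":       ("Shift-IR", "Exit1-IR"),
    "Shift-IR":         ("Shift-IR", "Exit1-IR"),
    "Exit1-IR":         ("Pause-IR", "Update-IR"),
    "Pause-IR":         ("Pause-IR", "Exit2-IR"),
    "Exit2-IR":         ("Shift-IR", "Update-IR"),
    # Assumed to mirror Update-DR (IEEE 1149.1); not spelled out in section 6.4.16.
    "Update-IR":        ("Run-Test/Idle", "Select-DR-Scan"),
}


def step(state: str, tms: int) -> str:
    """Advance the controller by one rising edge of TCK."""
    return TAP_NEXT[state][tms]


def run(state: str, tms_bits) -> str:
    """Apply a sequence of TMS values, one per rising edge of TCK."""
    for tms in tms_bits:
        state = step(state, tms)
    return state


# Five edges with TMS HIGH reach Test-Logic-Reset from every state.
assert all(run(s, [1] * 5) == "Test-Logic-Reset" for s in TAP_NEXT)

# A single LOW glitch out of reset is undone by three edges with TMS HIGH.
assert run("Test-Logic-Reset", [0, 1, 1, 1]) == "Test-Logic-Reset"
```

From Run-Test/Idle, the TMS sequence 1, 0, 0 reaches Shift-DR and 1, 1, 0, 0 reaches Shift-IR, which is how a test program would begin a data or instruction scan under this model.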
6.4.2 RUN-TEST/IDLE STATE

The controller enters this state between scan operations. Once in this state, the controller remains in this state as long as TMS is held LOW. No activity occurs in the test logic. The instruction register and all test data registers retain their previous state. When TMS is HIGH and a rising edge is applied to TCK, the controller moves to the Select-DR-Scan state.

6.4.3 SELECT-DR-SCAN STATE

This is a temporary controller state. The test data register selected by the current instruction retains its previous state. If TMS is held LOW and a rising edge is applied to TCK when in this state, the controller moves into the Capture-DR state, and a scan sequence for the selected test data register is initiated. If TMS is held HIGH and a rising edge is applied to TCK, the controller moves to the Select-IR-Scan state.

The instruction does not change in this state.

6.4.4 SELECT-IR-SCAN STATE

This is a temporary controller state. The test data register selected by the current instruction retains its previous state. If TMS is held LOW and a rising edge is applied to TCK when in this state, the controller moves into the Capture-IR state, and a scan sequence for the instruction register is initiated. If TMS is held HIGH and a rising edge is applied to TCK, the controller moves to the Test-Logic-Reset state.

The instruction does not change in this state.

6.4.5 CAPTURE-DR STATE

In this state, the BSR captures input pin data if the current instruction is EXTEST or SAMPLE. The other test data registers, which do not have parallel input, are not changed.

The instruction does not change in this state.

When the TAP controller is in this state and a rising edge is applied to TCK, the controller enters the Exit1-DR state if TMS is HIGH or the Shift-DR state if TMS is LOW.

6.4.6 SHIFT-DR STATE

In this controller state, the test data register connected between TDI and TDO as a result of the current instruction shifts data one stage toward its serial output on each rising edge of TCK.

The instruction does not change in this state.

When the TAP controller is in this state and a rising edge is applied to TCK, the controller enters the Exit1-DR state if TMS is HIGH or remains in the Shift-DR state if TMS is LOW.

6.4.7 EXIT1-DR STATE

This is a temporary state. If TMS is held HIGH, a rising edge applied to TCK while in this state causes the controller to enter the Update-DR state, which terminates the scanning process. If TMS is held LOW and a rising edge is applied to TCK, the controller enters the Pause-DR state.

The test data register selected by the current instruction retains its previous state unchanged. The instruction does not change in this state.

6.4.8 PAUSE-DR STATE

The pause state allows the test controller to temporarily halt the shifting of data through the test data register in the serial path between TDI and TDO. This might be necessary, for example, to allow the tester to reload its pin memory from disk during application of a long test sequence.

The test data register selected by the current instruction retains its previous state. The instruction does not change in this state.

The controller remains in this state as long as TMS is LOW. When TMS goes HIGH and a rising edge is applied to TCK, the controller moves to the Exit2-DR state.

6.4.9 EXIT2-DR STATE

This is a temporary state. If TMS is held HIGH and a rising edge is applied to TCK, the scanning process terminates, and the TAP controller enters the Update-DR state. If TMS is held LOW and a rising edge is applied to TCK, the controller enters the Shift-DR state.

The test data register selected by the current instruction retains its previous state unchanged. The instruction does not change in this state.

6.4.10 UPDATE-DR STATE

The BSR register is provided with a latched parallel output to prevent changes at the parallel output while data is shifted in response to the EXTEST and SAMPLE instructions. When the TAP controller is in this state and the BSR register is selected, data is latched onto the parallel output of this register from the shift-register path on the falling edge of TCK. The data held at the latched parallel output does not change other than in this state.

All shift-register stages in test data registers selected by the current instruction retain their previous state unchanged. The instruction does not change in this state.

When the TAP controller is in this state and a rising edge is applied to TCK, the controller enters the Select-DR-Scan state if TMS is held HIGH or the Run-Test/Idle state if TMS is held LOW.

6.4.11 CAPTURE-IR STATE

In this controller state the shift register contained in the instruction register loads the fixed value 0001 on the rising edge of TCK.

The test data register selected by the current instruction retains its previous state. The instruction does not change in this state.

When the controller is in this state and a rising edge is applied to TCK, the controller enters the Exit1-IR state if TMS is held HIGH or the Shift-IR state if TMS is held LOW.

6.4.12 SHIFT-IR STATE

In this state, the shift register contained in the instruction register is connected between TDI and TDO and shifts data one stage towards its serial output on each rising edge of TCK.

The test data register selected by the current instruction retains its previous state. The instruction does not change in this state.

When the controller is in this state and a rising edge is applied to TCK, the controller enters the Exit1-IR state if TMS is held HIGH or remains in the Shift-IR state if TMS is held LOW.

6.4.13 EXIT1-IR STATE

This is a temporary state. If TMS is held HIGH, a rising edge applied to TCK while in this state causes the controller to enter the Update-IR state, which terminates the scanning process. If TMS is held LOW and a rising edge is applied to TCK, the controller enters the Pause-IR state.

The test data register selected by the current instruction retains its previous state unchanged. The instruction does not change in this state, and the instruction register retains its state.

6.4.14 PAUSE-IR STATE

This state allows the shifting of the instruction register to be temporarily halted.

The test data register selected by the current instruction retains its previous state. The instruction does not change in this state, and the instruction register retains its state.

The controller remains in this state as long as TMS is LOW. When TMS goes HIGH and a rising edge is applied to TCK, the controller moves to the Exit2-IR state.

6.4.15 EXIT2-IR STATE

This is a temporary state. If TMS is held HIGH and a rising edge is applied to TCK, the scanning process terminates, and the TAP controller enters the Update-IR state. If TMS is held LOW and a rising edge is applied to TCK, the controller enters the Shift-IR state.

The test data register selected by the current instruction retains its previous state unchanged. The instruction does not change in this state, and the instruction register retains its state.

6.4.16 UPDATE-IR STATE

The instruction shifted into the instruction register is latched onto the parallel output from the shift-register path on the falling edge of TCK. Once the new instruction has been latched, it becomes the current instruction.

Test data registers selected by the current instruction retain the previous state.

6.5 Boundary Scan Register Cell Ordering

Figure 6.4 shows the order of cells in the BSR. There are 150 cells including TDO. TDI is not a BSR cell.

The DCTL, ACTL, TCTL, and OCTL cells do not correspond to pins of the i860 XP microprocessor; rather, they control the bidirectional and tristate pins:

DCTL  D63-D0, DP7-DP0
ACTL  A31-A3
TCTL  Tristate outputs: ADS#, BE7#-BE0#, CACHE#, CTYP, D/C#, KB0, KB1, LEN, M/IO#, NENE#, PCD, PCYC, PWT, W/R#
OCTL  Outputs not floated in normal operation: BREQ, HIT#, HITM#, HLDA, LOCK#, PCHK#

If a value of one is loaded into any of these control latches, the associated pins will not drive the external bus while running EXTEST.

The values of DCTL, ACTL, TCTL, and OCTL are undefined during the SAMPLE instruction.

The values and direction of I/O and outputs do not change during the scanning process (that is, during Shift-DR states). They only change after scanning is completed (in the Update-DR state).

The decision table, Table 6.3, defines how the boundary scan instructions EXTEST and SAMPLE/PRELOAD utilize BSR.
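The effect of the four control cells during EXTEST can be sketched as a lookup from control cell to pin group. The pin groups are those listed in section 6.5; the function name and the choice to treat a missing control value as 1 (not driving) are assumptions made for this illustration.

```python
# Pin groups governed by each BSR control cell (from section 6.5).
CONTROL_GROUPS = {
    "DCTL": ["D63-D0", "DP7-DP0"],
    "ACTL": ["A31-A3"],
    "TCTL": ["ADS#", "BE7#-BE0#", "CACHE#", "CTYP", "D/C#", "KB0", "KB1",
             "LEN", "M/IO#", "NENE#", "PCD", "PCYC", "PWT", "W/R#"],
    "OCTL": ["BREQ", "HIT#", "HITM#", "HLDA", "LOCK#", "PCHK#"],
}


def driven_during_extest(control_cells: dict) -> list:
    """Return the pin groups that drive the bus while EXTEST is running.

    A control value of 1 keeps the associated pins from driving; a value
    of 0 lets them drive. Cells absent from the argument are assumed to
    be 1 (an assumption of this sketch, chosen as the non-driving case).
    """
    driven = []
    for cell, pins in CONTROL_GROUPS.items():
        if control_cells.get(cell, 1) == 0:
            driven.extend(pins)
    return driven


# Example: drive only the address bus while testing board interconnect.
print(driven_during_extest({"DCTL": 1, "ACTL": 0, "TCTL": 1, "OCTL": 1}))
# ['A31-A3']
```

Because pin values and directions change only in the Update-DR state, a test program would scan the control values in together with the data values and let both take effect at the same update.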
240874-63
Figure 6.4. Boundary Scan Register Ordering
6.6 TAP Controller Initialization

TAP can be initialized by applying a high signal level on the TMS input for five periods of TCK or by activating the TRST# input pin. TCK does not have to be running in order to initialize TAP with the TRST# pin. TRST# is provided with an internal pull-up resistor; so, even if an open circuit fault occurs, the TAP logic can still be used.

Table 6.3. Instruction Functions

Instruction:                                  EXTEST                          SAMPLE/PRELOAD
Input BSR cells ...                           ... sample values driven to     ... sample values driven to
                                              processor by system             processor by system
Values of input cells used by processor?      NO                              NO
Output BSR cells ...                          ... drive output pins with      ... sample values driven
                                              cell values                     by processor
Input/output BSR cells, control cell LOW:     Treat as output                 Treat as output
Input/output BSR cells, control cell HIGH:    Treat as input                  Treat as input

7.0 MECHANICAL DATA

Figures 7.1 and 7.2 show the locations of pins; Tables 7.1 and 7.2 help to locate pin identifiers.
[Figure 7.1: i860 XP microprocessor pinout, pin side view. Grid rows A through U, columns 1 through 19; pin assignments are listed in Tables 7.1 and 7.2.]
240874-64
[Figure 7.2: i860 XP microprocessor pinout, top side view. Grid rows A through U, columns 1 through 19; pin assignments are listed in Tables 7.1 and 7.2.]
240874-65
Table 7.1. Pin Cross Reference by Location

Location        Signal
A01 ............ A17
A02 ............ A23
A03 ............ A25
A04 ............ BE5#
A05 ............ BE7#
A06 ............ BYPASS#
A07 ............ D0
A08 ............ Vcc
A09 ............ Vcc
A10 ............ Vcc
A11 ............ Vcc
A12 ............ Vcc
A13 ............ D2
A14 ............ Vcc
A15 ............ DP0
A16 ............ D3
A17 ............ D4
A18 ............ D6
A19 ............ DP1
B01 ............ A15
B02 ............ A21
B03 ............ A24
B04 ............ BE3#
B05 ............ Vss
B06 ............ Vcc
B07 ............ Vcc
B08 ............ Vss
B09 ............ Vcc
B10 ............ Vss
B11 ............ Vcc
B12 ............ Vss
B13 ............ Vcc
B14 ............ Vcc
B15 ............ Vss
B16 ............ D9
B17 ............ D11
B18 ............ D13
B19 ............ D15
C01 ............ A13
C02 ............ A19
C03 ............ A18
C04 ............ A31
C05 ............ BE4#
C06 ............ Vss
C07 ............ Vss
C08 ............ Vss
C09 ............ Vss
C10 ............ Vss
C11 ............ Vss
C12 ............ Vss
C13 ............ Vss
C14 ............ Vss
C15 ............ D12
C16 ............ D8
C17 ............ D7
C18 ............ D16
C19 ............ D18
D01 ............ A11
D02 ............ Vss
D03 ............ Vcc
D04 ............ A29
D05 ............ BE1#
D06 ............ BE2#
D07 ............ BE6#
D08 ............ EWBE#
D09 ............ D1
D10 ............ D5
D11 ............ D10
D12 ............ D14
D13 ............ DP2
D14 ............ D17
D15 ............ D19
D16 ............ D20
D17 ............ Vcc
D18 ............ DP3
D19 ............ D22
E01 ............ A9
E02 ............ Vss
E03 ............ Vcc
E04 ............ A27
E05 ............ BE0#
E15 ............ D21
E16 ............ D23
E17 ............ D25
E18 ............ Vss
E19 ............ Vcc
F01 ............ A7
F02 ............ Vcc
F03 ............ Vss
F04 ............ A28
F05 ............ A30
F15 ............ D24
F16 ............ D26
F17 ............ Vss
F18 ............ Vcc
F19 ............ D27
G01 ............ Vcc
G02 ............ Vcc
G03 ............ Vss
G04 ............ A22
G05 ............ A26
G15 ............ D28
G16 ............ D30
G17 ............ Vss
G18 ............ Vcc
G19 ............ D29
H01 ............ Vcc
H02 ............ Vss
H03 ............ Vss
H04 ............ A16
H05 ............ A20
H15 ............ D32
H16 ............ PCHK#
H17 ............ Vss
H18 ............ Vss
H19 ............ Vcc
J01 ............ Vcc
J02 ............ Vcc
J03 ............ Vss
J04 ............ A12
J05 ............ A14
J15 ............ DP5
J16 ............ D38
J17 ............ Vss
J18 ............ Vcc
J19 ............ Vcc
K01 ............ Vcc
K02 ............ Vss
K03 ............ Vss
K04 ............ A10
K05 ............ A8
K15 ............ D45
K16 ............ D43
K17 ............ Vss
K18 ............ Vss
K19 ............ Vcc
L01 ............ Vcc
L02 ............ Vcc
L03 ............ Vss
L04 ............ SPARE
L05 ............ A6
L15 ............ D48
L16 ............ D41
L17 ............ Vss
L18 ............ Vcc
L19 ............ Vcc
M01 ............ Vcc
M02 ............ Vss
M03 ............ Vss
M04 ............ CLK
M05 ............ A5
M15 ............ D53
M16 ............ D47
M17 ............ Vss
M18 ............ Vss
M19 ............ D31
N01 ............ Vcc
N02 ............ Vcc
N03 ............ Vss
N04 ............ ADS#
N05 ............ HITM#
N15 ............ DP7
N16 ............ D50
N17 ............ Vss
N18 ............ Vcc
N19 ............ D36
P01 ............ VccCLK
P02 ............ Vcc
P03 ............ Vss
P04 ............ RSRVD
P05 ............ CTYP
P15 ............ D59
P16 ............ DP6
P17 ............ Vss
P18 ............ Vcc
P19 ............ D34
Q01 ............ TCK
Q02 ............ Vss
Q03 ............ Vcc
Q04 ............ CACHE#
Q05 ............ AHOLD
Q15 ............ D61
Q16 ............ D54
Q17 ............ Vcc
Q18 ............ Vss
Q19 ............ DP4
R01 ............ A4
R02 ............ Vss
R03 ............ Vcc
R04 ............ BOFF#
R05 ............ D/C#
R06 ............ PCD
R07 ............ INV
R08 ............ PEN#
R09 ............ BREQ
R10 ............ TDO
R11 ............ KB0
R12 ............ HOLD
R13 ............ TMS
R14 ............ D63
R15 ............ D60
R16 ............ D57
R17 ............ Vcc
R18 ............ D33
R19 ............ D35
S01 ............ A3
S02 ............ RESET
S03 ............ LOCK#
S04 ............ M/IO#
S05 ............ EADS#
S06 ............ INT/CS8
S07 ............ BERR
S08 ............ FLINE#
S09 ............ HLDA
S10 ............ KB1
S11 ............ NENE#
S12 ............ HIT#
S13 ............ TRST#
S14 ............ TDI
S15 ............ D62
S16 ............ D58
S17 ............ D46
S18 ............ D52
S19 ............ D37
T01 ............ W/R#
T02 ............ LEN
T03 ............ PWT
T04 ............ PCYC
T05 ............ Vss
T06 ............ Vss
T07 ............ Vss
T08 ............ Vss
T09 ............ Vss
T10 ............ Vss
T11 ............ Vss
T12 ............ Vss
T13 ............ Vss
T14 ............ Vss
T15 ............ Vss
T16 ............ D56
T17 ............ D49
T18 ............ D42
T19 ............ D39
U01 ............ BRDY#
U02 ............ KEN#
U03 ............ NA#
U04 ............ WB/WT#
U05 ............ Vcc
U06 ............ Vcc
U07 ............ Vcc
U08 ............ Vcc
U09 ............ Vcc
U10 ............ Vcc
U11 ............ Vcc
U12 ............ Vcc
U13 ............ Vcc
U14 ............ Vcc
U15 ............ Vcc
U16 ............ D55
U17 ............ D51
U18 ............ D44
U19 ............ D40
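The location codes in Table 7.1 follow a regular grid, which a short sketch can make explicit. It assumes, from the entries in the table, that the row letters run A through U with I and O skipped, that rows A-D and R-U use all 19 columns, and that rows E-Q use only columns 1-5 and 15-19. The helper names are hypothetical.

```python
ROWS = "ABCDEFGHJKLMNPQRSTU"  # 19 row letters; I and O do not appear in the table
COLS = range(1, 20)           # columns 1 through 19


def is_populated(row: str, col: int) -> bool:
    """True if the grid position carries a pin (outer five rings only)."""
    r = ROWS.index(row)
    outer_row = r < 4 or r >= 15       # rows A-D and R-U are fully populated
    outer_col = col <= 5 or col >= 15  # rows E-Q use columns 1-5 and 15-19
    return outer_row or outer_col


def parse_location(loc: str) -> tuple:
    """Convert a location such as 'Q01' to zero-based (row, column) indices."""
    row, col = loc[0], int(loc[1:])
    if row not in ROWS or col not in COLS or not is_populated(row, col):
        raise ValueError(f"no pin at location {loc!r}")
    return ROWS.index(row), col - 1


pin_count = sum(is_populated(r, c) for r in ROWS for c in COLS)
print(pin_count)               # 262
print(parse_location("Q01"))   # (14, 0)
```

The count of populated positions matches the 262 entries of Table 7.1, which is a useful sanity check when the table is transcribed into a netlist or test fixture description.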
Table 7.2. Pin Cross Reference by Pin Name

Signal      Location
A3          S01
A4          R01
A5          M05
A6          L05
A7          F01
A8          K05
A9          E01
A10         K04
A11         D01
A12         J04
A13         C01
A14         J05
A15         B01
A16         H04
A17         A01
A18         C03
A19         C02
A20         H05
A21         B02
A22         G04
A23         A02
A24         B03
A25         A03
A26         G05
A27         E04
A28         F04
A29         D04
A30         F05
A31         C04
ADS#        N04
AHOLD       Q05
BE0#        E05
BE1#        D05
BE2#        D06
BE3#        B04
BE4#        C05
BE5#        A04
BE6#        D07
BE7#        A05
BERR        S07
BOFF#       R04
RSRVD       P04
BRDY#       U01
BREQ        R09
CACHE#      Q04
CLK         M04
CTYP        P05
D0          A07
D1          D09
D2          A13
D3          A16
D4          A17
D5          D10
D6          A18
D7          C17
D8          C16
D9          B16
D10         D11
D11         B17
D12         C15
D13         B18
D14         D12
D15         B19
D16         C18
D17         D14
D18         C19
D19         D15
D20         D16
D21         E15
D22         D19
D23         E16
D24         F15
D25         E17
D26         F16
D27         F19
D28         G15
D29         G19
D30         G16
D31         M19
D32         H15
D33         R18
D34         P19
D35         R19
D36         N19
D37         S19
D38         J16
D39         T19
D40         U19
D41         L16
D42         T18
D43         K16
D44         U18
D45         K15
D46         S17
D47         M16
D48         L15
D49         T17
D50         N16
D51         U17
D52         S18
D53         M15
D54         Q16
D55         U16
D56         T16
D57         R16
D58         S16
D59         P15
D60         R15
D61         Q15
D62         S15
D63         R14
D/C#        R05
DP0         A15
DP1         A19
DP2         D13
DP3         D18
DP4         Q19
DP5         J15
DP6         P16
DP7         N15
EADS#       S05
FLINE#      S08
HIT#        S12
HITM#       N05
HLDA        S09
HOLD        R12
INT/CS8     S06
INV         R07
KB0         R11
KB1         S10
KEN#        U02
LEN         T02
LOCK#       S03
M/IO#       S04
NA#         U03
NENE#       S11
PCD         R06
PCHK#       H16
PCYC        T04
PEN#        R08
PWT         T03
RESET       S02
SPARE       L04
EWBE#       D08
BYPASS#     A06
TCK         Q01
TDI         S14
TDO         R10
TMS         R13
TRST#       S13
Vcc         A08, A09, A10, A11, A12, A14, B06, B07, B09, B11, B13, B14,
            D03, D17, E03, E19, F02, F18, G01, G02, G18, H01, H19, J01,
            J02, J18, J19, K01, K19, L01, L02, L18, L19, M01, N01, N02,
            N18, P02, P18, Q03, Q17, R03, R17, U05, U06, U07, U08, U09,
            U10, U11, U12, U13, U14, U15
VccCLK      P01
Vss         B05, B08, B10, B12, B15, C06, C07, C08, C09, C10, C11, C12,
            C13, C14, D02, E02, E18, F03, F17, G03, G17, H02, H03, H17,
            H18, J03, J17, K02, K03, K17, K18, L03, L17, M02, M03, M17,
            M18, N03, N17, P03, P17, Q02, Q18, R02, T05, T06, T07, T08,
            T09, T10, T11, T12, T13, T14, T15
W/R#        T01
WB/WT#      U04
Table 7.3. Ceramic PGA Package Dimension Symbols

Letter or
Symbol      Description of Dimensions
A           Distance from seating plane to highest point of body
A1          Distance between seating plane and base plane (lid)
A2          Distance from base plane to highest point of body
A3          Distance from seating plane to bottom of body
B           Diameter of terminal lead pin
D           Largest overall package dimension of length
D1          A body length dimension, outer lead center to outer lead center
e1          Linear spacing between true lead position centerlines
L           Distance from seating plane to end of lead
S1          Other body dimension, outer lead center to edge of body

NOTES:
1. Controlling dimension: millimeter.
2. Dimension "e1" ("e") is noncumulative.
3. Seating plane (standoff) is defined by P.C. board hole size: 0.0415-0.0430 inch.
4. Dimensions "B", "B1", and "C" are nominal.
5. Details of Pin 1 identifier are optional.
[Figure: package outline drawings showing the seating plane, base plane, pin detail, and swaged pins (4 places), annotated with the dimension symbols of Table 7.3.]
240874-66

Family: Ceramic Pin Grid Array Package

            Millimeters                     Inches
Symbol      Min      Max      Notes         Min      Max      Notes
A           3.56     4.57                   .140     .180
A1          0.64     1.14     Solid Lid     .025     .045     Solid Lid
A2          2.79     3.56     Solid Lid     .110     .140     Solid Lid
A3          1.14     1.40                   .045     .055
B           0.43     0.51                   .017     .020
D           49.28    49.96                  1.940    1.967
D1          45.59    45.85                  1.795    1.805
e1          2.29     2.79                   .090     .110
L           2.54     3.30                   .100     .130
N           240      280                    240      280
S1          1.52     2.54                   .060     .100
ISSUE 9/90
Figure 7.3. 262-Lead Ceramic PGA Package Dimensions
8.0 PACKAGE THERMAL SPECIFICATIONS

The i860 XP microprocessor is specified for operation when TC is within the range of 0°C-85°C. TC may be measured in any environment to determine whether the i860 XP microprocessor is within specified operating range. The case temperature should be measured at the center of the top surface opposite the pins.

For this section, let:

P   = maximum power consumption
TC  = case temperature
TA  = ambient air temperature
θCA = thermal resistance from case to ambient air
θJC = thermal resistance from junction to case
θJA = thermal resistance from junction to ambient air

TA can be calculated from θCA with the following equation:

TA = TC - P × θCA

Typical values for θCA at various airflows and for θJC are given in Table 8.1 for the 1.95 sq. in., 262 pin, ceramic PGA. θJC is shown so that θJA can be calculated by:

θJA = θJC + θCA

Note that θJC with a heatsink differs from θJC without a heatsink because case temperature is measured differently. Case temperature for θJC with heatsink is measured at the center of the heat fin base. Case temperature for θJC without heatsink is measured at the center of the package top surface.

Table 8.2 shows the maximum TA allowable (without exceeding TC) at various airflows.

Note that TA is greatly improved by attaching "fins" or a "heat sink" to the package. P (the maximum power consumption) is calculated by using the maximum Icc at 5V as tabulated in the D.C. Characteristics of section 9.

Figure 8.1 gives typical Icc derating with case temperature. For more information on heat sinks, measurement techniques, or package characteristics, refer to Intel Packaging Handbook, order number 240800.
Table 8.1. Thermal Resistance - In °C/Watt
θCA is given as a function of airflow in ft/min (m/sec).
| | θJC | θCA at 0 (0) | θCA at 200 (1.01) | θCA at 400 (2.03) | θCA at 600 (3.04) | θCA at 800 (4.06) | θCA at 1000 (5.07) |
|---|---|---|---|---|---|---|---|
| With Heat Sink* | 1.6 | 10.1 | 6.3 | 4.3 | 3.2 | 2.5 | 2.2 |
| Without Heat Sink | 1.0 | 13.5 | 11.0 | 8.0 | 6.5 | 5.5 | 5.0 |
NOTE:
* Nine-fin, unidirectional heat sink (fin dimensions: 0.250" height, 0.040" fin width, 0.100" center-to-center spacing, 1.730" length)
Table 8.2. Maximum TA at Various Airflows - In °C
Airflow is given in ft/min (m/sec).
| | fCLK (MHz) | 0 (0) | 200 (1.01) | 400 (2.03) | 600 (3.04) | 800 (4.06) | 1000 (5.07) |
|---|---|---|---|---|---|---|---|
| TA with Heat Sink* | 50 | 24 | 47 | 59 | 66 | 70 | 72 |
| TA without Heat Sink | 50 | 4 | 19 | 37 | 46 | 52 | 55 |
| TA with Heat Sink* | 40 | 34.5 | 53.5 | 63.5 | 69 | 72.5 | 74 |
| TA without Heat Sink | 40 | 17.5 | 30 | 45 | 52.5 | 57.5 | 60 |
NOTE:
* Nine-fin, unidirectional heat sink (fin dimensions: 0.250" height, 0.040" fin width, 0.100" center-to-center spacing, 1.730" length)
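The maximum ambient temperatures in Table 8.2 can be reproduced from Table 8.1 and the maximum Icc figures in the D.C. Characteristics. The following is a minimal sketch, assuming the usual relations TA = TC - P × θCA and θJA = θJC + θCA, with P taken as maximum Icc times 5V. The function and variable names are illustrative only and are not part of the data sheet.

```python
# Values transcribed from Table 8.1 (thermal resistance in degC/W).
AIRFLOW_FT_MIN = [0, 200, 400, 600, 800, 1000]
THETA_CA = {
    "with_heat_sink": [10.1, 6.3, 4.3, 3.2, 2.5, 2.2],
    "without_heat_sink": [13.5, 11.0, 8.0, 6.5, 5.5, 5.0],
}
THETA_JC = {"with_heat_sink": 1.6, "without_heat_sink": 1.0}

# Maximum Icc (A) from the D.C. Characteristics, keyed by clock in MHz.
ICC_MAX_A = {50: 1.2, 40: 1.0}
VCC_V = 5.0      # P is computed from maximum Icc at 5V
TC_MAX_C = 85.0  # upper limit of the specified case temperature range


def max_power_w(f_mhz):
    """Maximum power consumption P = Icc(max) * 5V."""
    return ICC_MAX_A[f_mhz] * VCC_V


def max_ambient_c(f_mhz, config, airflow_ft_min):
    """Highest ambient temperature that keeps TC at or below 85 degC."""
    theta_ca = THETA_CA[config][AIRFLOW_FT_MIN.index(airflow_ft_min)]
    return TC_MAX_C - max_power_w(f_mhz) * theta_ca


def theta_ja(config, airflow_ft_min):
    """Junction-to-ambient resistance as the sum of theta_JC and theta_CA."""
    theta_ca = THETA_CA[config][AIRFLOW_FT_MIN.index(airflow_ft_min)]
    return THETA_JC[config] + theta_ca
```

For example, `max_ambient_c(40, "without_heat_sink", 0)` evaluates 85 - 5 × 13.5, which matches the 17.5 entry in Table 8.2.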
[Plot: Icc versus case temperature, 0°C to 85°C, with one curve for 50 MHz and one for 40 MHz. Drawing 240874-67.]
Figure 8.1. Icc Derating with Case Temperature
9.0 ELECTRICAL DATA
All input and output timings are specified relative to the 1.5V level of the rising edge of CLK and refer to the point at which the signal reaches 1.5V.
9.1 Absolute Maximum Ratings
Case Temperature TC under Bias ...... 0°C to 85°C
Storage Temperature .......... -65°C to +150°C
Voltage on Any Pin with Respect to Ground .......... -0.5 to VCC + 0.5V
NOTICE: This data sheet contains preliminary information on new products in production. The specifications are subject to change without notice. Verify with your local Intel Sales office that you have the latest data sheet before finalizing a design.
WARNING: Stressing the device beyond the "Absolute Maximum Ratings" may cause permanent damage. These are stress ratings only. Operation beyond the "Operating Conditions" is not recommended and extended exposure beyond the "Operating Conditions" may affect device reliability.
9.2 D.C. Characteristics
Table 9.1. D.C. Characteristics
Operating Conditions: VCC = 5V ±5%; TC = 0°C to 85°C
| Symbol | Parameter | Min | Max | Units | Notes |
|---|---|---|---|---|---|
| VIL | Input LOW voltage (TTL) | -0.3 | +0.8 | V | |
| VIH | Input HIGH voltage (TTL) | 2.0 | VCC + 0.3 | V | |
| VIHC | CLK Input HIGH (TTL) | 2.5 | VCC + 0.3 | V | |
| VOL | Output LOW voltage (TTL) | | 0.45 | V | 1 |
| VOH | Output HIGH voltage (TTL) | 2.4 | | V | 2 |
| Icc | Power supply current (@ 50 MHz) | | 1.2 | Amp | 3 |
| Icc | Power supply current (@ 40 MHz) | | 1.0 | Amp | 3 |
| ILI | Input leakage current | | ±15 | µA | 4 |
| ILIP | Input leakage current (pull-up) | | -400 | µA | 5 |
| ILO | Output leakage current | | ±15 | µA | 6 |
| CIN | Input capacitance | | 11.5 | pF | 7 |
| CO | I/O or output capacitance | | 14 | pF | 7 |
NOTES:
1. This parameter is measured with current load of 5 mA.
2. This parameter is measured with current load of 1 mA. Typical value is VCC - 0.45V.
3. Measured at 50 MHz and VCC = 5V.
4. This parameter is for inputs without pullups. VCC is on, and 0V ≤ VIN ≤ VCC.
5. This parameter is for inputs with pullups and VIL = 0.45V. Note that if the pull-ups are put in high-impedance state via the DCTL boundary scan cell that also tri-states the data outputs, then the leakage is ±15 µA.
6. 0.45V ≤ VIN ≤ VCC - 0.45V.
7. These parameters are not tested; they are guaranteed by design characterization.
9.3 A.C. Characteristics
Table 9.2. A.C. Characteristics
CL = 0 pF Unless Otherwise Specified; VCC = 5V ±5%; TC = 0°C to 85°C
| Symbol | Parameter | Fig | 40 MHz Min (ns) | 40 MHz Max (ns) | 50 MHz Min (ns) | 50 MHz Max (ns) | Notes |
|---|---|---|---|---|---|---|---|
| tc | CLK Period | 9.1 | 25 | 40 | 20 | 40 | |
| ttc | TCK Period | 9.2 | 40 | 1000 | 40 | 1000 | |
| | CLK Stability | 9.1 | | 0.1% | | 0.1% | |
| tch | CLK High Time | 9.1 | 7 | | 7 | | |
| tcl | CLK Low Time | 9.1 | 7 | | 7 | | |
| tr | CLK Rise Time | 9.1 | | 3 | | 3 | h |
| tf | CLK Fall Time | 9.1 | | 3 | | 3 | h |
| ts | TCK to CLK Skew | 9.3 | | ±1 | | ±1 | i |
| ttch | TCK High Time | 9.2 | 10 | | 10 | | |
| ttcl | TCK Low Time | 9.2 | 10 | | 10 | | |
| ttcr | TCK Rise Time | 9.2 | | 4 | | 4 | |
| ttcf | TCK Fall Time | 9.2 | | 4 | | 4 | |
| tsu.1 | RESET, HOLD, BERR, FLINE#, PEN#, INT/CS8 Setup Time | 9.1 | 8 | | 7 | | |
| tsu.2 | BOFF#, AHOLD, KEN#, NA#, INV, WB/WT# Setup Time | 9.1 | 8 | | 7 | | |
| tsu.3 | EADS# Setup Time | 9.1 | 9 | | 8 | | |
| tsu.4 | EWBE# Setup Time | 9.1 | 8.5 | | 7.5 | | |
| tsu.5 | BRDY# Setup Time | 9.1 | 8.5 | | 7.5 | | |
| tsu.6 | D63-D0, DP7-DP0 Setup Time | 9.1 | 8.5 | | 7.5 | | |
| tsu.7 | D63-D0, DP7-DP0 Setup Time (Late Backoff Mode) | 9.1 | 5.5 | | 4.5 | | |
| tsu.8 | A31-A5 Setup Time | 9.1 | 11 | | 10 | | |
| ttsu | TDI, TMS, TRST# Setup Time | 9.2 | 8 | | 8 | | |
| tth | TDI, TMS, TRST# Hold Time | 9.2 | 2 | | 1 | | b |
| th.1 | Hold Time, All Inputs except D63-D0, DP7-DP0 | 9.1 | 2 | | 1 | | c |
| th.2 | D63-D0, DP7-DP0 Hold Time (Normal and Late Back-Off Mode) | 9.1 | 3 | | 2 | | c |
| ttco | TDO Valid Delay and All Outputs Valid Delay in EXTEST Mode | 9.2 | 1.5 | 17.5 | 1.5 | 16.5 | a, f |
| tco.1 | A31-A22 Valid Delay | 9.1 | 1.5 | 12 | 1.5 | 11 | a |
| tco.2a | A21-A3 Valid Delay (High Current Mode) | 9.1 | 1.5 | 11.5 | 1.5 | 10.5 | a, g |
| tco.2b | A21-A3 Valid Delay (Normal Current Mode) | 9.1 | 1.5 | 12 | 1.5 | 11 | a |
Table 9.2. A.C. Characteristics (Continued)
CL = 0 pF Unless Otherwise Specified; VCC = 5V ±5%; TC = 0°C to 85°C
| Symbol | Parameter | Fig | 40 MHz Min (ns) | 40 MHz Max (ns) | 50 MHz Min (ns) | 50 MHz Max (ns) | Notes |
|---|---|---|---|---|---|---|---|
| tco.3 | D63-D0, DP7-DP0 Valid Delay | 9.1 | 2.5 | 14 | 2.5 | 13 | a, d |
| tco.4 | BREQ, HLDA, PCHK#, NENE#, KB0, KB1 Valid Delay | 9.1 | 1.5 | 13 | 1.5 | 12 | a |
| tco.5a | ADS# Valid Delay (High Current Mode) | 9.1 | 1.5 | 10 | 1.5 | 9 | a, g |
| tco.5b | ADS# Valid Delay (Normal Current Mode) | 9.1 | 1.5 | 11 | 1.5 | 10 | a |
| tco.6a | W/R# Valid Delay (High Current Mode) | 9.1 | 1.5 | 11 | 1.5 | 10 | a, g |
| tco.6b | W/R# Valid Delay (Normal Current Mode) | 9.1 | 1.5 | 12 | 1.5 | 11 | a |
| tco.7a | HITM# Valid Delay (High Current Mode) | 9.1 | 1.5 | 12 | 1.5 | 11 | a, g |
| tco.7b | HITM# Valid Delay (Normal Current Mode) | 9.1 | 1.5 | 13 | 1.5 | 12 | a |
| tco.8 | PWT, PCD, HIT#, CTYP, D/C#, M/IO#, PCYC, LOCK#, CACHE#, LEN Valid Delay | 9.1 | 1.5 | 12 | 1.5 | 11 | a |
| tco.9a | BE0#-BE7# Valid Delay (High Current Mode) | 9.1 | 1.5 | 12 | 1.5 | 11 | a, g |
| tco.9b | BE0#-BE7# Valid Delay (Normal Current Mode) | 9.1 | 1.5 | 13 | 1.5 | 12 | a |
| tz.1 | Float Time All Outputs except D63-D0, DP7-DP0 | 9.1 | 2 | 19 | 2 | 18 | e |
| tz.2 | Float Time D63-D0, DP7-DP0 | 9.1 | 3 | 19 | 3 | 18 | e |
| tzt | Float Time during Boundary Scan EXTEST | | | 20 | | 20 | f |
NOTES:
a. Minimum and maximum delays are for 0 pF load.
b. These hold times are referenced to the falling edge of TCK.
c. These hold times are referenced to the rising edge of CLK.
d. Output delay for D63-D0, DP7-DP0 is from the CLK after ADS# activation.
e. Float time = delay until maximum output current is less than ±ILO. Float time is not tested.
f. Delay from falling edge of TCK.
g. These pins can be configured as normal or high-current buffers. When they are configured as high-current buffers for interface with cache memory or other large loads, use the derating curves in Figure 9.3. Otherwise, all normal buffers use the derating curves in Figure 9.4.
h. tr and tf should be measured between 0.8V and 2.5V.
i. Assumes TCK and CLK both at 25 MHz.
[Timing diagram: CLK, input, and output waveforms referenced to the 1.5V level of CLK; output levels marked at 2.4V, 1.5V, and 0.45V. Drawing 240874-68.]
Figure 9.1. CLK, Input, and Output Timings
[Timing diagram: TCK, TDI, TMS, TRST#, and TDO waveforms referenced to the 1.5V level. Drawing 240874-69.]
Figure 9.2. TAP Signal Timings
ADS#, A21-A3, BE7#-BE0#, W/R#, HITM# (In High-Current Mode)
[Plot: maximum valid delay and minimum valid delay versus load (pF). Drawing 240874-70.]
NOTES:
Graphs are not linear outside the CL range shown.
NOMINAL = 0 pF value given in the A.C. Timings table.
*Typical part under worst-case conditions.
Figure 9.3. Typical Output Delay vs Load Capacitance
All Outputs (In Normal Mode)
[Plot: maximum valid delay and minimum valid delay versus load (pF), 0 to 150 pF.]
NOTES:
Graphs are not linear outside the CL range shown.
NOMINAL = 0 pF value given in the A.C. Timings table.
*Typical part under worst-case conditions.
Figure 9.4. Typical Output Delay vs Load Capacitance
240874-71
[Plot: slew time versus load capacitance, CL (pF), 0 to 250 pF. Drawing 240874-81.]
NOTES:
Graphs are not linear outside the CL range shown.
*Typical part under worst-case conditions.
Figure 9.5a. Typical Slew Time vs Load Capacitance under Worst-Case Conditions (Rising Voltage)
[Plot: slew time versus load capacitance, CL (pF), 0 to 250 pF, with a curve labelled ADS#, A21-A3, BE7#-BE0#, W/R#, HITM# (In High Current Mode). Drawing 240874-82.]
NOTES:
Graphs are not linear outside the CL range shown.
*Typical part under worst-case conditions.
Figure 9.5b. Typical Slew Time vs Load Capacitance under Worst-Case Conditions (Falling Voltage)
[Plot: Icc versus frequency (MHz), 25 to 50 MHz.]
NOTES:
Graph is not linear outside the frequency range shown.
*Worst-case supply current at 5V.
Figure 9.6. Typical Icc vs. Frequency
240874-73
9.4 Component Buffer Model
9.4.1 FIRST ORDER ELECTRICAL BUFFER MODEL
The first order electrical buffer model provides an accurate and simple representation of the buffers used in the inputs and outputs of the CHMOS i860 XP CPU. The model output consists of four components:
1. Linear voltage waveform (dV/dt)
2. Intrinsic buffer delay due to CL (to)
3. Buffer output impedance (Ro)
4. Buffer output capacitance (Co)
as shown in Figure 9.7a.
A fitting algorithm has been used to arrive at values for dV/dt, to, Co, and Ro such that Ro matches the actual buffer impedance and Co, the intrinsic buffer output capacitance whether the output is on or off, remains constant across the operating range while minimizing the difference between the full buffer circuit and its simplified electrical model for a set of different loads (lumped capacitance, and short and long transmission lines). dV/dt is the slope of the voltage ramp, while to is the intrinsic buffer delay associated with a given CL. to accounts for the intrinsic delay by offsetting the excitation of the model by the amount of the delay.
NOTE:
to is zero for CL = 0 and when the load is represented by a transmission line.
240874-83
Figure 9.7a. Output Model
The input model consists of one component, buffer capacitance (CIN), as shown in Figure 9.7b.
240874-84
Figure 9.7b. Input Model
9.4.2 FIRST ORDER ELECTRICAL MODEL PARAMETER VALUES
The parameters that make up the first order electrical model vary with the buffer design. In addition, these parameters also vary with the operating condition (i.e., temperature and VCC) of the buffer process. The typical process corner is being modeled.
Two sizes of buffer are used on these components, labelled here as small and large. The parameter values found in Tables 9.3 and 9.4 list dV/dt, to, Ro, and Co. These parameters are provided for both low-to-high and high-to-low transitions at the typical process corner for three operating conditions (VCC = 5.5V and TJ = -10°C, VCC = 5.0V and TJ = 80°C, and VCC = 4.5V and TJ = 125°C).
9.4.3 PACKAGE PARAMETERS
In addition to the buffer characteristics, package characteristics are also included to complete the model. Package inductance, capacitance and resistance values vary with design geometry and material properties of the package. Figure 9.8 shows a model of the package including these parameters. The package model should be placed between the first order electrical buffer model and the board interconnects, as shown in Figure 9.9. Notice the package model only includes the package inductance (Lp) and capacitance (Cp). This is sufficient since the package resistance is so small it is negligible.
Table 9.5 lists the buffer model parameters for each pin of the i860 XP microprocessor. The table gives the package model parameters for each pin, followed by the input capacitance (input and I/O pins) and/or output buffer size (outputs and I/O). In those cases where the buffer used by a pin is an option selected at reset by the PEN# input, the output buffer column lists the sizes available. Large buffers correspond to high-current mode, while small buffers correspond to normal current mode.
240874-85
Figure 9.8. Package Model
9.4.4 BOARD INTERCONNECTS
The board interconnect can be considered as a lumped parameter (capacitive load) or as a transmission line. As a rule of thumb, an unterminated board interconnect may be considered as a capacitive load if the round trip time (time for signal to travel from one end of the interconnect to the other and back) is short compared to the transition time of the signal. At frequencies of 50 MHz and above most interconnects behave as transmission lines (Figure 9.10). For accurate results at high frequencies, these transmission line effects must be taken into account and modeled.
240874-86
Figure 9.9a. Output Buffer and Package Model
240874-87
Figure 9.9b. Input Buffer and Package Model
240874-88
Figure 9.10. Transmission Line Model
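The rule of thumb in section 9.4.4 can be expressed as a simple comparison between round trip time and signal transition time. The sketch below is illustrative only. The propagation delay of 0.18 ns per inch and the threshold ratio of 0.25 (used as a stand-in for "short compared to") are assumptions chosen for the example and do not come from this data sheet; substitute values appropriate to the actual board.

```python
def round_trip_ns(length_in, delay_ns_per_in=0.18):
    """Time for a signal to travel down the trace and back.

    delay_ns_per_in is an assumed board propagation delay (hypothetical value).
    """
    return 2.0 * length_in * delay_ns_per_in


def is_lumped_load(length_in, transition_ns, ratio=0.25, delay_ns_per_in=0.18):
    """True if an unterminated trace may be treated as a capacitive load.

    The trace counts as lumped when its round trip time is below an assumed
    fraction (ratio) of the signal transition time; otherwise it should be
    modeled as a transmission line.
    """
    return round_trip_ns(length_in, delay_ns_per_in) < ratio * transition_ns
```

With these assumed numbers, a 1 inch trace driven by a 2 ns edge is classified as a lumped load, while a 6 inch trace with the same edge is classified as a transmission line.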
Table 9.3. Small Output Buffer First Order Electrical Model Parameter Values
The last six columns give to (ns) at various CL.
| Transition | VCC | TJ (C) | Ro (ohms) | Co (pF) | dV/dT | to at 0 pF | to at 5 pF | to at 25 pF | to at 50 pF | to at 100 pF | to at 150 pF |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Low-to-High | 5.5 | -10 | 28.0 | 4.3 | 5.5/1.2 | 0 | 0.0 | 0.1 | 0.3 | 0.7 | 1.1 |
| Low-to-High | 5.5 | 80 | 36.4 | 4.3 | 5.5/1.4 | 0 | 0.0 | 0.1 | 0.8 | 0.8 | 1.2 |
| Low-to-High | 5.5 | 125 | 40.4 | 4.3 | 5.5/1.5 | 0 | 0.0 | 0.1 | 0.4 | 0.8 | 1.2 |
| Low-to-High | 5.0 | -10 | 30.2 | 4.3 | 5.0/1.2 | 0 | 0.0 | 0.1 | 0.4 | 0.8 | 1.2 |
| Low-to-High | 5.0 | 80 | 39.2 | 4.3 | 5.0/1.4 | 0 | 0.0 | 0.2 | 0.4 | 0.9 | 1.3 |
| Low-to-High | 5.0 | 125 | 43.5 | 4.3 | 5.0/1.6 | 0 | 0.0 | 0.2 | 0.4 | 0.9 | 1.3 |
| Low-to-High | 4.5 | -10 | 33.0 | 4.3 | 4.5/1.2 | 0 | 0.0 | 0.2 | 0.5 | 1.0 | 1.4 |
| Low-to-High | 4.5 | 80 | 42.8 | 4.3 | 4.5/1.6 | 0 | 0.0 | 0.2 | 0.5 | 1.0 | 1.5 |
| Low-to-High | 4.5 | 125 | 47.4 | 4.3 | 4.5/1.6 | 0 | 0.0 | 0.3 | 0.6 | 1.1 | 1.6 |
| High-to-Low | 5.5 | -10 | 23.2 | 4.3 | 5.5/1.0 | 0 | 0.0 | 0.4 | 0.7 | 1.2 | 1.6 |
| High-to-Low | 5.5 | 80 | 31.4 | 4.3 | 5.5/1.4 | 0 | 0.0 | 0.4 | 0.9 | 1.3 | 1.8 |
| High-to-Low | 5.5 | 125 | 36.1 | 4.3 | 5.5/1.6 | 0 | 0.0 | 0.5 | 0.8 | 1.3 | 1.8 |
| High-to-Low | 5.0 | -10 | 24.0 | 4.3 | 5.0/1.1 | 0 | 0.0 | 0.5 | 0.9 | 1.2 | 1.7 |
| High-to-Low | 5.0 | 80 | 32.8 | 4.3 | 5.0/1.4 | 0 | 0.0 | 0.5 | 0.9 | 1.5 | 1.9 |
| High-to-Low | 5.0 | 125 | 37.8 | 4.3 | 5.0/1.7 | 0 | 0.0 | 0.5 | 0.9 | 1.4 | 1.8 |
| High-to-Low | 4.5 | -10 | 25.1 | 4.3 | 4.5/1.2 | 0 | 0.0 | 0.4 | 0.7 | 1.2 | 1.7 |
| High-to-Low | 4.5 | 80 | 34.5 | 4.3 | 4.5/1.6 | 0 | 0.0 | 0.4 | 0.8 | 1.3 | 1.8 |
| High-to-Low | 4.5 | 125 | 39.9 | 4.3 | 4.5/1.8 | 0 | 0.0 | 0.5 | 0.9 | 1.4 | 1.9 |
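The excitation of the first order output model is a linear voltage ramp whose start is offset by the intrinsic delay to. The sketch below shows one way to generate that ramp. It assumes that a dV/dT entry such as 5.0/1.4 means a 5.0V swing completed in 1.4 ns; that reading is an interpretation, not something the tables state. The function name and arguments are hypothetical, and the sketch covers only the source waveform, not the Ro and Co elements or the package model.

```python
def buffer_source_voltage(t_ns, vcc, ramp_ns, t0_ns, rising=True):
    """Open-circuit excitation of the first order buffer model at time t_ns.

    vcc     -- voltage swing (numerator of the dV/dT table entry)
    ramp_ns -- assumed ramp duration (denominator of the dV/dT table entry)
    t0_ns   -- intrinsic buffer delay for the chosen load capacitance
    """
    if ramp_ns <= 0:
        raise ValueError("ramp_ns must be positive")
    # Fraction of the ramp completed, clamped to the range 0..1.
    fraction = (t_ns - t0_ns) / ramp_ns
    fraction = min(1.0, max(0.0, fraction))
    return vcc * fraction if rising else vcc * (1.0 - fraction)
```

Using the small-buffer Low-to-High row for VCC = 5.0 and TJ = 80 (dV/dT of 5.0/1.4, to of 0.4 ns at 50 pF), the ramp stays at 0V until 0.4 ns, passes about 2.5V at 1.1 ns, and reaches 5.0V at 1.8 ns.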
Table 9.4. Large Output Buffer First Order Electrical Model Parameter Values
The last nine columns give to (ns) at various CL.
| Transition | VCC | TJ (C) | Ro (ohms) | Co (pF) | dV/dT | to at 0 pF | to at 5 pF | to at 25 pF | to at 50 pF | to at 100 pF | to at 150 pF | to at 200 pF | to at 250 pF | to at 300 pF |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Low-to-High | 5.5 | -10 | 12.1 | 4.3 | 5.5/0.7 | 0 | 0.0 | 0.1 | 0.3 | 0.6 | 0.8 | 1.0 | 1.3 | 1.5 |
| Low-to-High | 5.5 | 80 | 15.5 | 4.3 | 5.5/0.9 | 0 | 0.0 | 0.2 | 0.3 | 0.6 | 0.9 | 1.1 | 1.4 | 1.7 |
| Low-to-High | 5.5 | 125 | 17.2 | 4.3 | 5.5/1.1 | 0 | 0.0 | 0.2 | 0.4 | 0.7 | 1.0 | 1.2 | 1.4 | 1.7 |
| Low-to-High | 5.0 | -10 | 13.0 | 4.3 | 5.0/0.9 | 0 | 0.0 | 0.1 | 0.3 | 0.6 | 0.9 | 1.1 | 1.4 | 1.7 |
| Low-to-High | 5.0 | 80 | 16.7 | 4.3 | 5.0/1.0 | 0 | 0.0 | 0.2 | 0.4 | 0.8 | 1.1 | 1.4 | 1.7 | 2.0 |
| Low-to-High | 5.0 | 125 | 18.5 | 4.3 | 5.0/1.2 | 0 | 0.0 | 0.2 | 0.4 | 0.8 | 1.1 | 1.4 | 1.7 | 2.0 |
| Low-to-High | 4.5 | -10 | 14.1 | 4.3 | 4.5/0.9 | 0 | 0.0 | 0.2 | 0.4 | 0.7 | 1.1 | 1.4 | 1.7 | 2.0 |
| Low-to-High | 4.5 | 80 | 18.0 | 4.3 | 4.5/1.2 | 0 | 0.0 | 0.2 | 0.4 | 0.9 | 1.2 | 1.5 | 1.9 | 2.2 |
| Low-to-High | 4.5 | 125 | 19.9 | 4.3 | 4.5/1.3 | 0 | 0.0 | 0.2 | 0.5 | 0.8 | 1.2 | 1.5 | 1.9 | 2.2 |
| High-to-Low | 5.5 | -10 | 10.6 | 4.3 | 5.5/0.7 | 0 | 0.0 | 0.3 | 0.6 | 0.9 | 1.2 | 1.5 | 1.8 | 2.0 |
| High-to-Low | 5.5 | 80 | 13.9 | 4.3 | 5.5/1.0 | 0 | 0.0 | 0.4 | 0.7 | 1.2 | 1.5 | 1.9 | 2.2 | 2.5 |
| High-to-Low | 5.5 | 125 | 15.8 | 4.3 | 5.5/1.1 | 0 | 0.0 | 0.4 | 0.8 | 1.3 | 1.7 | 2.0 | 2.4 | 2.8 |
| High-to-Low | 5.0 | -10 | 11.0 | 4.3 | 5.0/0.8 | 0 | 0.0 | 0.4 | 0.7 | 1.0 | 1.3 | 1.6 | 1.9 | 2.1 |
| High-to-Low | 5.0 | 80 | 14.5 | 4.3 | 5.0/1.0 | 0 | 0.0 | 0.4 | 0.8 | 1.2 | 1.6 | 2.0 | 2.3 | 2.6 |
| High-to-Low | 5.0 | 125 | 16.5 | 4.3 | 5.0/1.2 | 0 | 0.0 | 0.4 | 0.8 | 1.3 | 1.7 | 2.1 | 2.5 | 2.8 |
| High-to-Low | 4.5 | -10 | 11.3 | 4.3 | 4.5/0.9 | 0 | 0.0 | 0.4 | 0.7 | 1.1 | 1.4 | 1.7 | 2.0 | 2.4 |
| High-to-Low | 4.5 | 80 | 15.2 | 4.3 | 4.5/1.2 | 0 | 0.0 | 0.4 | 0.8 | 1.3 | 1.6 | 2.0 | 2.3 | 2.7 |
| High-to-Low | 4.5 | 125 | 17.4 | 4.3 | 4.5/1.3 | 0 | 0.0 | 0.4 | 0.8 | 1.3 | 1.7 | 2.1 | 2.5 | 2.8 |
Table 9.5 Buffer Models
Location
Cp (pF)
Typical
Lp (nH)
Typical
Input
Buffer
CIN (pF)
Typical
Output
Buffer
Size
(Large or Small)
S01
7.6
13.8
6.7
LIS
A4
R01
6.2
14.5
6.7
LIS
As
M05
6.5
7.8
6.7
LIS
A6
L05
5.3
8.0
6.7
LIS
A7
F01
7.7
16.2
6.7
LIS
Pin Name
A3
As
K05
5.1
7.7
6.7
A9
E01
8.0
16.4
6.7
A10
K04
5.1
8.8
6.7
LIS
LIS
LIS
A11
001
8.3
16.8
6.7
LIS
IA12
J04
5.2
9.0
6.7
LIS
A13
C01
8.7
17.2
6.7
LIS
A14
J05
5.2
7.8
6.7
LIS
A1S
801
9.0
17.8
6.7
LIS
A16
H04
5.2
9.0
6.7
LIS
A17
A01
9.4
18.2
6.7
A1S
C03
7.8
14.5
6.7
LIS
LIS
A19
CO2
9.0
15.3
6.7
LIS
A20
H05
7.5
7.7
6.7
LIS
A21
802
8.5
15.7
6.7
LIS
A22
G04
7.5
9.1
4.4
S
A23
A02
8.1
15.7
4.4
S
A24
803
7.0
14.5
4.4
S
A2S
A03
7.7
14.6
4.4
S
A26
G05
6.7
7.9
4.4
S
A27
E04
7.6
9.6
4.4
S
A2S
F04
6.5
9.2
4.4
S
A29
004
7.4
10.0
4.4
S
A30
F05
5.9
8.2
4.4
S
A31
C04
6.6
10.4
4,4
S
AOS#
N04
6.2
9.1
AHOLO
005
6.0
8.8
BEO#
E05
5.7
8.8
BE1#
005
6.7
8.8
LIS
LIS
8E2#
006
5.7
9.0
LIS
LIS
2.0
BE3#
B04
6.5
11.2
BE4#
C05
5.9
10.6
BE5#
A04
6.5
12.0
liS
liS
liS
BE6#
007
4.9
8.6
LIS
BE7#
A05
6.1
11.5
BERR
S07
5.8
8.7
2.0
BOFF#
R04
6.3
10.4
2.0
RSRVO
P04
6.4
9.4
2.0
BROY#
U01
8.0
14.7
2.0
BREO
R09
4.4
7.5
BYPASS #
A06
liS
S
Strapping Option
CACHE#
004
6.6
9.8
ClK
M04
6.2
8.9
CTYP
P05
6.5
8.6
Do
A07
5.5
10.6
01
009
7.6
02
A13
7.4
03
A16
7.7
04
A17
9.2
05
010
06
A18
S
2.0
S
4.4
S
7.6
4.4
S
15.0
4.4
S
17.7
4.4
S
17.9
4.4
S
7.5
7.6
4.4
S
9.4
18.3
4.4
S
07
C17
8.6
15.9
4.4
S
08
C16
8.6
14.5
4.4
S
09
B16
9.3
14.7
4.4
S
010
011
8.3
7.5
4.4
S
011
B17
8.9
14.7
4.4
S
012
C15
8.1
7.8
4.4
S
013
B18
8.6
15.4
4.4
S
014
012
7.2
7.8
4.4
S
015
B19
8.2
15.6
4.4
S
016
C18
7.9
10.7
4.4
S
017
014
6;7
9.2
4.4
S
018
C19
7.6
14.2
4.4
S
019
015
6.4
10.0
4.4
S
020
016
7.4
10.7
4.4
8
021
E15
5.6
8.8
4.4
8
0 22
019
6.7
12.7
4.4
8
023
E16
5.5
9.7
4.4
8
024
F15
5.3
8.3
4.4
8
025
E17
6.6
9.9
4.4
8
0 26
F16
5.3
9.7
4.4
8
027
F19
6.2
11.7
4.4
8
028
G15
5.1
7.9
4.4
8
029
G19
6.2
11.8
4.4
8
030
G16
5.1
8.9
4.4
8
031
M19
8.6
16.2
4.4
8
032
H15
5.2
7.7
4.4
8
033
R18
11.0
19.6
4.4
8
034
P19
8.0
18.4
4.4
8
035
R19
9.1
18.8
4.4
8
036
N19
8.1
16.9
4.4
8
037
819
9.2
20.7
4.4
8
038
J16
8.4
8.9
4.4
8
039
T19
10.5
19.6
4.4
8
040
U19
10.8
19.1
4.4
8
041
L16
8.3
10.9
4.4
8
042
T18
10.5
17.8
4.4
8
043
K16
8.4
8.8
4.4
8
044
U18
10.1
17.7
4.4
8
045
K15
9.3
7.5
4.4
8
046
817
9.5
14.5
4.4
8
047
M16
8.0
9.8
4.4
8
048
L15
8.0
7.7
4.4
8
049
T17
8.7
14.6
4.4
8
050
N16
7.8
9.9
4.4
8
051
U17
8.6
15.2
4.4
8
052
818
7.6
14.3
4.4
8
053
M15
7.7
7.1
4.4
5
054
016
7.0
11.1
4.4
5
055
U16
8.0
14.3
4.4
5
056
T16
7.S
12.8
4.4
5
D57
R16
6.5
11.8
4.4
5
058
516
7.5
11.3
4.4
5
059
P15
6.2
8.7
4.4
5
060
R15
7.1
9.6
4.4
5
061
015
5.9
9.3
4.4
5
062
515
6.9
10.7
4.4
5
063
R14
5.6
9.7
4.4
5
O/C#
R05
5.8
9.7
4.4
5
5
OPO
A15
7.7
18.3
OP1
A19
9.7
18.9
4.4
5
OP2
013
7.1
8.5
4.4
5
OP3
018
6.7
11.3
4.4
5
DP4
019
10.4
19.0
4.4
5
OP5
J15
9.9
7.7
4.4
5
OP6
P16
9.3
10.7
4.4
5
OP7
N15
6.8
8.9
4.4
5
EA05#
505
5.5
10.5
EWBE#
008
7.5
7.6
2.0
FLlNE#
508
5.4
8.1
.2.0 .
HIT#
512
5.9
11.1
HITM#
N05
6.2
8.2
·2.0
5
L
.
HLOA
509
5.3
7.9
HOLD
R12
6.1
11.1
2.0
INT/C58
506
5.2
10.0
2.0
INV
R07
5.3
8.2
2.0
KBO
R11
6.1
9.2
KB1
510
6.4
KEN#
U02
. 7.4
13.4
LEN
T02
7.9
12.8
5
7.9
5
5
2.0
5
LOCK#
S03
7.7
11.2
S
M/IO#
S04
7.3
10.3
S
NA#
U03
7.1
13.0
NENE#
S11
6.3
9.6
S
PCD
R06
5.6
8.9
S
PCHK#
H16
5.1
8.8
S
PCYC
T04
7.2
11.4
PEN#
R08
4.8
7.8
PWT
T03
7.4
12.1
7.9
12.5
2.0
S
2.0
S
RESET
S02
SPARE
L04
2.0
TCK
001
5.8
14.1
2.0
TOI
814
6.5
9.8
2.0
NC
TOO
R10
6.3
7.6
TMS
R13
5.6
9.6
2.0
2.0
TRST#
S13
6.3
9.6
W/R#
T01
7.8
14.3
WB/WT#
U04
6.7
12.3
S
LIS
2.0
10.0 INSTRUCTION SET
Key to abbreviations:
For register operands, the abbreviations that describe the operands are composed of two parts. The first part describes the type of register:
c       One of the control registers fir, psr, epsr, dirbase, db, fsr, bear, ccr, p0, p1, p2, or p3
f       One of the floating-point registers: f0 through f31
i       One of the integer registers: r0 through r31
The second part identifies the field of the machine instruction into which the operand is to be placed:
src1    The first of the two source-register designators, which may be either a register or a 16-bit immediate constant or address offset. The immediate value is zero-extended for logical operations and is sign-extended for add and subtract operations (including addu and subu) and for all addressing calculations.
src1ni  Same as src1 except that no immediate constant or address offset value is permitted.
src1s   Same as src1 except that the immediate constant is a 5-bit value that is zero-extended to 32 bits.
src2    The second of the two source-register designators.
dest    The destination register designator.
Thus, the operand specifier isrc2, for example, means that an integer register is used and that the encoding of that register must be placed in the src2 field of the machine instruction.
Other (nonregister) operands are specified by a one-part abbreviation that represents both the type of operand required and the instruction field into which the value of the operand is placed:
#const  A 16-bit immediate constant or address offset that the i860 XP microprocessor sign-extends to 32 bits when computing the effective address.
lbroff  A signed, 26-bit, immediate, relative branch offset.
sbroff  A signed, 16-bit, immediate, relative branch offset.
brx     A function that computes the target address by shifting the offset (either lbroff or sbroff) left by two bits, sign-extending it to 32 bits, and adding the result to the current instruction pointer plus four. The resulting target address may lie anywhere within the address space.
Table 10.1. Precision Specification
| Suffix | Source Precision | Result Precision |
|---|---|---|
| .ss | single | single |
| .sd | single | double |
| .dd | double | double |
| .ds | double | single |
Unless otherwise specified, floating-point operations accept single- or double-precision source operands and produce a result of equal or greater precision. Both input operands must have the same precision. The source and result precision are specified by a two-letter suffix to the mnemonic of the operation.
Other abbreviations include:
.p      Precision specification .ss, .sd, or .dd (.ds not permitted). Refer to Table 10.1.
.r      Precision specification .ss, .sd, .ds, or .dd. Refer to Table 10.1.
.v      .sd or .dd. Refer to Table 10.1.
.w      .ss or .dd. Refer to Table 10.1.
.x      .b (8 bits), .s (16 bits), or .l (32 bits)
.y      .l (32 bits), .d (64 bits), or .q (128 bits)
mem.x(address)         The memory location indicated by address with a size of x.
port.x(address)        The I/O port indicated by address with a size of x.
int_vector.x(address)  The interrupt vector with a size of x returned from I/O port address.
PM      The pixel mask, which is considered as an array of eight bits PM(7)..PM(0), where PM(0) is the least-significant bit.
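The brx function described in the key can be illustrated with a short sketch. It follows the stated steps (shift the offset left by two bits, sign-extend to 32 bits, add to the current instruction pointer plus four). Wrapping the result to 32 bits is an assumption made for the example, and the helper names are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as a two's complement number."""
    value &= (1 << bits) - 1
    sign_bit = 1 << (bits - 1)
    return (value ^ sign_bit) - sign_bit


def brx(offset_field, field_bits, ip):
    """Branch target for a relative branch offset.

    offset_field -- raw lbroff (26-bit) or sbroff (16-bit) field value
    field_bits   -- 26 for lbroff, 16 for sbroff
    ip           -- address of the current instruction
    """
    # Multiplying the signed field by 4 is equivalent to shifting it left
    # by two bits and then sign-extending the result.
    byte_offset = sign_extend(offset_field, field_bits) * 4
    return (ip + 4 + byte_offset) & MASK32  # assumed 32-bit wraparound
```

For example, an lbroff of 1 at instruction address 0x1000 gives a target of 0x1008, and an lbroff of all ones (that is, -1) gives a target of 0x1000.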
10.1 Instruction Definitions in Alphabetical Order
adds isrc1, isrc2, idest .......... Add Signed
    idest ← isrc1 + isrc2
    OF ← (bit 31 carry ≠ bit 30 carry)
    CC set if isrc2 + isrc1 < 0 (signed)
    CC clear if isrc2 + isrc1 ≥ 0 (signed)
addu isrc1, isrc2, idest .......... Add Unsigned
    idest ← isrc1 + isrc2
    OF ← bit 31 carry
    CC ← bit 31 carry
and isrc1, isrc2, idest .......... Logical AND
    idest ← isrc1 and isrc2
    CC set if result is zero, cleared otherwise
andh #const, isrc2, idest .......... Logical AND High
    idest ← (#const shifted left 16 bits) and isrc2
    CC set if result is zero, cleared otherwise
andnot isrc1, isrc2, idest .......... Logical AND NOT
    idest ← (not isrc1) and isrc2
    CC set if result is zero, cleared otherwise
andnoth #const, isrc2, idest .......... Logical AND NOT High
    idest ← (not (#const shifted left 16 bits)) and isrc2
    CC set if result is zero, cleared otherwise
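The "high" forms of the logical instructions differ from the plain forms only in that the 16-bit constant is moved into the upper half of the word before the operation. A minimal sketch of andh and andnoth, modeling registers as 32-bit unsigned integers and returning the result together with the CC value, might look like this. The Python function names and the tuple return are illustrative choices, not part of the instruction set definition.

```python
MASK32 = 0xFFFFFFFF


def andh(const16, src2):
    """idest = (#const shifted left 16 bits) and isrc2; CC set if result is zero."""
    result = ((const16 & 0xFFFF) << 16) & src2 & MASK32
    return result, result == 0


def andnoth(const16, src2):
    """idest = (not (#const shifted left 16 bits)) and isrc2; CC set if result is zero."""
    result = ~((const16 & 0xFFFF) << 16) & src2 & MASK32
    return result, result == 0
```

With a constant of 0xFFFF, andh keeps only the upper half of isrc2 and andnoth keeps only the lower half.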
bc lbroff .......... Branch on CC
    IF CC = 1
    THEN continue execution at brx(lbroff)
    FI
bc.t lbroff .......... Branch on CC, Taken
    IF CC = 1
    THEN execute one more sequential instruction
         continue execution at brx(lbroff)
    ELSE skip next sequential instruction
    FI
bla isrc1ni, isrc2, sbroff .......... Branch on LCC and Add
    LCC-temp clear if isrc2 + isrc1ni < 0 (signed)
    LCC-temp set if isrc2 + isrc1ni ≥ 0 (signed)
    isrc2 ← isrc1ni + isrc2
    Execute one more sequential instruction
    IF LCC
    THEN LCC ← LCC-temp
         continue execution at brx(sbroff)
    ELSE LCC ← LCC-temp
    FI
bnc lbroff .......... Branch on Not CC
    IF CC = 0
    THEN continue execution at brx(lbroff)
    FI
bnc.t lbroff .......... Branch on Not CC, Taken
    IF CC = 0
    THEN execute one more sequential instruction
         continue execution at brx(lbroff)
    ELSE skip next sequential instruction
    FI
br lbroff .......... Branch Direct Unconditionally
    Execute one more sequential instruction.
    Continue execution at brx(lbroff).
bri [isrc1ni] .......... Branch Indirect Unconditionally
    Execute one more sequential instruction
    IF any trap bit in psr is set
    THEN copy PU to U, PIM to IM in psr
         clear trap bits
         IF DS is set and DIM is reset
         THEN enter dual-instruction mode after executing one
              instruction in single-instruction mode
         ELSE IF DS is set and DIM is set
              THEN enter single-instruction mode after executing one
                   instruction in dual-instruction mode
              ELSE IF DIM is set
                   THEN enter dual-instruction mode
                        for next instruction pair
                   ELSE enter single-instruction mode
                        for next instruction pair
                   FI
              FI
         FI
    FI
    Continue execution at address in isrc1ni
    (The original contents of isrc1ni is used even if the next instruction
    modifies isrc1ni. Does not trap if isrc1ni is misaligned.)
bte isrc1s, isrc2, sbroff .......... Branch If Equal
    IF isrc1s = isrc2
    THEN continue execution at brx(sbroff)
    FI
btne isrc1s, isrc2, sbroff .......... Branch If Not Equal
    IF isrc1s ≠ isrc2
    THEN continue execution at brx(sbroff)
    FI
call lbroff .......... Subroutine Call
    r1 ← address of next sequential instruction + 4 (or + 8 in dual mode)
    Execute one more sequential instruction
    Continue execution at brx(lbroff)
calli [isrc1ni] .......... Indirect Subroutine Call
    r1 ← address of next sequential instruction + 4 (or + 8 in dual mode)
    Execute one more sequential instruction
    Continue execution at address in isrc1ni
    (The original contents of isrc1ni is used even if the next instruction
    modifies isrc1ni. Does not trap if isrc1ni is misaligned. The
    register isrc1ni must not be r1.)
fadd.p fsrc1, fsrc2, fdest .......... Floating-Point Add
    fdest ← fsrc1 + fsrc2
faddp fsrc1, fsrc2, fdest .......... Add with Pixel Merge
    fdest ← fsrc1 + fsrc2 (using integer arithmetic; 8-byte operands and destination)
    Shift and load MERGE register from fsrc1 + fsrc2 as defined in Table 10.2
faddz fsrc1, fsrc2, fdest .......... Add with Z Merge
    fdest ← fsrc1 + fsrc2 (using integer arithmetic; 8-byte operands and destination)
    Shift MERGE right 16 and load fields 31..16 and 63..48 from fsrc1 + fsrc2
famov.r fsrc1, fdest .......... Floating-Point Adder Move
    fdest ← fsrc1
fiadd.w fsrc1, fsrc2, fdest .......... Long-Integer Add
    fdest ← fsrc1 + fsrc2 (2's complement integer arithmetic)
fisub.w fsrc1, fsrc2, fdest .......... Long-Integer Subtract
    fdest ← fsrc1 - fsrc2 (2's complement integer arithmetic)
fix.v fsrc1, fdest .......... Floating-Point to Integer Conversion
    fdest ← 64-bit value with low-order 32 bits equal to integer part of fsrc1 rounded
Floating-Point Load
fld.y isrc1(isrc2), fdest .......... (Normal)
fld.y isrc1(isrc2)++, fdest .......... (Autoincrement)
    fdest ← mem.y (isrc1 + isrc2)
    IF autoincrement
    THEN isrc2 ← isrc1 + isrc2
    FI
Cache Flush
flush #const(isrc2) .......... (Normal)
flush #const(isrc2)++ .......... (Autoincrement)
    Write back (if modified) the line in data cache that has address (#const + isrc2)
    80860XR: and set tag value to (#const + isrc2).
    80860XP: and invalidate its virtual and physical tags.
    Contents of line undefined.
    IF autoincrement
    THEN isrc2 ← #const + isrc2
    FI
fmlow.dd fsrc1, fsrc2, fdest .......... Floating-Point Multiply Low
    fdest ← low-order 53 bits of (fsrc1 mantissa × fsrc2 mantissa)
    fdest bit 53 ← most significant bit of (fsrc1 mantissa × fsrc2 mantissa)
fmov.r fsrc1, fdest .......... Floating-Point Reg-Reg Move
    Assembler pseudo-operation
    fmov.ss fsrc1, fdest = fiadd.ss fsrc1, f0, fdest
    fmov.dd fsrc1, fdest = fiadd.dd fsrc1, f0, fdest
    fmov.sd fsrc1, fdest = famov.sd fsrc1, fdest
    fmov.ds fsrc1, fdest = famov.ds fsrc1, fdest
fmul.p fsrc1, fsrc2, fdest .......... Floating-Point Multiply
    fdest ← fsrc1 × fsrc2
fnop .......... Floating-Point No Operation
    Assembler pseudo-operation
    fnop = shrd r0, r0, r0
inteL
i860™ XP MICROPROCESSOR
form fsrc1, fdest ........................ OR with MERGE Register
fdest ← fsrc1 OR MERGE
MERGE ← 0
frcp.p fsrc2, fdest ........................ Floating-Point Reciprocal
fdest ← 1 / fsrc2 with maximum mantissa error < 2^-7
frsqr.p fsrc2, fdest ........................ Floating-Point Reciprocal Square Root
fdest ← 1 / √fsrc2 with maximum mantissa error < 2^-7
Floating-Point Store
fst.y fdest, isrc1(isrc2) ........................ (Normal)
fst.y fdest, isrc1(isrc2)++ ........................ (Autoincrement)
mem.y (isrc2 + isrc1) ← fdest
IF autoincrement
THEN isrc2 ← isrc1 + isrc2
FI
fsub.p fsrc1, fsrc2, fdest ........................ Floating-Point Subtract
fdest ← fsrc1 - fsrc2
ftrunc.v fsrc1, fdest ........................ Floating-Point to Integer Conversion
fdest ← 64-bit value with low-order 32 bits equal to integer part of fsrc1
fxfr fsrc1, idest ........................ Transfer F-P to Integer Register
idest ← fsrc1
fzchkl fsrc1, fsrc2, fdest ........................ 32-Bit Z-Buffer Check
Consider the 64-bit operands as arrays of two 32-bit
fields fsrc1(1)..fsrc1(0), fsrc2(1)..fsrc2(0), and fdest(1)..fdest(0)
where zero denotes the least-significant field.
PM ← PM shifted right by 2 bits
FOR i = 0 to 1
DO
PM [i + 6] ← fsrc2(i) ≤ fsrc1(i) (unsigned)
fdest(i) ← smaller of fsrc2(i) and fsrc1(i)
OD
MERGE ← 0
fzchks fsrc1, fsrc2, fdest ........................ 16-Bit Z-Buffer Check
Consider the 64-bit operands as arrays of four 16-bit
fields fsrc1(3)..fsrc1(0), fsrc2(3)..fsrc2(0), and fdest(3)..fdest(0)
where zero denotes the least-significant field.
PM ← PM shifted right by 4 bits
FOR i = 0 to 3
DO
PM [i + 4] ← fsrc2(i) ≤ fsrc1(i) (unsigned)
fdest(i) ← smaller of fsrc2(i) and fsrc1(i)
OD
MERGE ← 0
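The two Z-buffer check definitions above differ only in field width, so they can be modelled by one routine. The following is a minimal Python sketch, not Intel code: the function name fzchk and the treatment of PM as a plain 8-bit integer are assumptions made for illustration, and the MERGE update is left out.

```python
def fzchk(fsrc1, fsrc2, pm, field_bits):
    """Model fzchkl (field_bits=32) or fzchks (field_bits=16).

    fsrc1, fsrc2: 64-bit operands as Python ints.
    pm: current pixel mask, assumed here to be 8 bits wide.
    Returns (fdest, new_pm).
    """
    fields = 64 // field_bits              # 2 for fzchkl, 4 for fzchks
    mask = (1 << field_bits) - 1
    pm = (pm & 0xFF) >> fields             # PM shifted right by 2 or 4 bits
    fdest = 0
    for i in range(fields):
        a = (fsrc1 >> (i * field_bits)) & mask
        b = (fsrc2 >> (i * field_bits)) & mask
        if b <= a:                         # unsigned compare fsrc2(i) <= fsrc1(i)
            pm |= 1 << (i + 8 - fields)    # PM[i + 6] or PM[i + 4]
        fdest |= min(a, b) << (i * field_bits)
    return fdest, pm
```

For example, fzchk(0x0004000300020001, 0x0001000300050000, 0xFF, 16) compares four 16-bit depth values and keeps the smaller of each pair.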
intovr ........................ Software Trap on Integer Overflow
IF OF = 1
THEN generate trap with IT set in psr
FI
ixfr isrc1ni, fdest ........................ Transfer Integer to F-P Register
fdest ← isrc1ni
ld.c csrc2, idest ........................ Load from Control Register
idest ← csrc2
ld.x isrc1(isrc2), idest ........................ Load Integer
idest ← mem.x (isrc1 + isrc2)
ldint.x isrc2, idest ........................ Load Interrupt Vector
idest ← int_vector.x (isrc2)
NOTE: Not available with the i860 XR CPU
ldio.x isrc2, idest ........................ Load I/O
idest ← port.x (isrc2)
NOTE: Not available with the i860 XR CPU
lock ........................ Begin Interlocked Sequence
Set BL in dirbase.
The next load or store that appears on the bus locks that location.
Disable interrupts until the bus is unlocked.
mov isrc2, idest ........................ Register-Register Move
Assembler pseudo-operation
mov isrc2, idest = shl r0, isrc2, idest
mov const32, idest ........................ Constant-to-Register Move
Assembler pseudo-operation
when 0xFFFF8000 ≤ const32 < 0x8000 ...
adds l%const32, r0, idest
otherwise ...
orh h%const32, r0, idest
or l%const32, idest, idest
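The constant-to-register expansion above can be sketched in Python as follows. This is an illustrative model only: the function name expand_mov and the tuple representation of instructions are hypothetical, and the range test assumes that const32 is interpreted as a signed 32-bit value, which is how the bounds 0xFFFF8000 and 0x8000 are read here.

```python
def expand_mov(const32):
    """Return the instructions an assembler might emit for 'mov const32, idest'."""
    value = const32 & 0xFFFFFFFF
    # Interpret the constant as a signed 32-bit value (assumption).
    signed = value - (1 << 32) if value & 0x80000000 else value
    low = value & 0xFFFF      # l%const32: low-order 16 bits
    high = value >> 16        # h%const32: high-order 16 bits
    if -0x8000 <= signed < 0x8000:
        # Fits in a sign-extended 16-bit immediate: one instruction.
        return [("adds", low, "r0", "idest")]
    # Otherwise build the value in two halves.
    return [("orh", high, "r0", "idest"), ("or", low, "idest", "idest")]
```

A constant such as 0x12345678 needs the two-instruction form, while small positive or negative constants need only the adds.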
nop ........................ Core-Unit No Operation
Assembler pseudo-operation
nop = shl r0, r0, r0
or isrc1, isrc2, idest ........................ Logical OR
idest ← isrc1 OR isrc2
CC set if result is zero, cleared otherwise
orh #const, isrc2, idest ........................ Logical OR High
idest ← (#const shifted left 16 bits) OR isrc2
CC set if result is zero, cleared otherwise
pfadd.p fsrc1, fsrc2, fdest ........................ Pipelined Floating-Point Add
fdest ← last stage adder result
Advance A pipeline one stage
A pipeline first stage ← fsrc1 + fsrc2
pfaddp fsrc1, fsrc2, fdest ........................ Pipelined Add with Pixel Merge
fdest ← last-stage graphics-unit result
last-stage graphics-unit result ← fsrc1 + fsrc2
(using integer arithmetic; 8-byte operands and destination)
Shift, then load MERGE register from fsrc1 + fsrc2 as defined in Table 10.2
pfaddz fsrc1, fsrc2, fdest ........................ Pipelined Add with Z Merge
fdest ← last-stage graphics-unit result
last-stage graphics-unit result ← fsrc1 + fsrc2
(using integer arithmetic; 8-byte operands and destination)
Shift MERGE right 16, then load fields 31..16 and 63..48 from fsrc1 + fsrc2
pfam.p fsrc1, fsrc2, fdest ........................ Pipelined Floating-Point Add and Multiply
fdest ← last stage adder result
Advance A and M pipeline one stage (operands accessed before advancing pipeline)
A pipeline first stage ← A-op1 + A-op2
M pipeline first stage ← M-op1 × M-op2
pfamov.r fsrc1, fdest ........................ Pipelined Floating-Point Adder Move
fdest ← last stage adder result
Advance A pipeline one stage
A pipeline first stage ← fsrc1
pfeq.p fsrc1, fsrc2, fdest ........................ Pipelined Floating-Point Equal Compare
fdest ← last stage adder result
CC set if fsrc1 = fsrc2, else cleared
Advance A pipeline one stage
A pipeline first stage is undefined, but no result exception occurs
pfgt.p fsrc1, fsrc2, fdest ........................ Pipelined Floating-Point Greater-Than Compare
(Assembler clears R-bit of instruction)
fdest ← last stage adder result
CC set if fsrc1 > fsrc2, else cleared
Advance A pipeline one stage
A pipeline first stage is undefined, but no result exception occurs
pfiadd.w fsrc1, fsrc2, fdest ........................ Pipelined Long-Integer Add
fdest ← last-stage graphics-unit result
last-stage graphics-unit result ← fsrc1 + fsrc2 (2's complement integer arithmetic)
pfisub.w fsrc1, fsrc2, fdest ........................ Pipelined Long-Integer Subtract
fdest ← last-stage graphics-unit result
last-stage graphics-unit result ← fsrc1 - fsrc2 (2's complement integer arithmetic)
pfix.v fsrc1, fdest ........................ Pipelined Floating-Point to Integer Conversion
fdest ← last stage adder result
Advance A pipeline one stage
A pipeline first stage ← 64-bit value with low-order 32 bits
equal to integer part of fsrc1 rounded
Pipelined Floating-Point Load
pfld.y isrc1(isrc2), fdest ........................ (Normal)
pfld.y isrc1(isrc2)++, fdest ........................ (Autoincrement)
fdest ← mem.y (third previous pfld's (isrc1 + isrc2))
(where .y is precision of third previous pfld.y)
IF autoincrement
THEN isrc2 ← isrc1 + isrc2
FI
NOTE: pfld.q is not available with the i860 XR CPU
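The "third previous pfld" rule means each pfld delivers the data requested three pfld instructions earlier. A minimal Python sketch of that behavior follows; the class name PfldPipeline, the dictionary used as memory, and the use of None for results returned before the pipeline is primed are all assumptions made for illustration.

```python
from collections import deque


class PfldPipeline:
    """Toy model of the three-deep pfld address pipeline."""

    def __init__(self, memory):
        self.memory = memory                  # hypothetical address -> value mapping
        self.pending = deque([None] * 3)      # addresses of the three previous pflds

    def pfld(self, address):
        """Issue a pfld; return the data for the third previous pfld."""
        oldest = self.pending.popleft()
        self.pending.append(address)
        # Before three pflds have been issued, the result is not meaningful.
        return None if oldest is None else self.memory[oldest]
```

With memory {0: 10, 8: 20, 16: 30, 24: 40}, issuing pfld for addresses 0, 8, 16, 24 returns the value at address 0 only on the fourth call.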
pfle.p fsrc1, fsrc2, fdest ........................ Pipelined F-P Less-Than or Equal Compare
(Assembler sets R-bit of instruction)
fdest ← last stage adder result
CC clear if fsrc1 ≤ fsrc2, else set
Advance A pipeline one stage
A pipeline first stage is undefined, but no result exception occurs
pfmam.p fsrc1, fsrc2, fdest ........................ Pipelined Floating-Point Add and Multiply
fdest ← last stage multiplier result
Advance A and M pipeline one stage (operands accessed before advancing pipeline)
A pipeline first stage ← A-op1 + A-op2
M pipeline first stage ← M-op1 × M-op2
pfmov.r fsrc1, fdest ........................ Pipelined Floating-Point Reg-Reg Move
Assembler pseudo-operation
pfmov.ss fsrc1, fdest = pfiadd.ss fsrc1, f0, fdest
pfmov.dd fsrc1, fdest = pfiadd.dd fsrc1, f0, fdest
pfmov.sd fsrc1, fdest = pfamov.sd fsrc1, fdest
pfmov.ds fsrc1, fdest = pfamov.ds fsrc1, fdest
pfmsm.p fsrc1, fsrc2, fdest ........................ Pipelined Floating-Point Subtract and Multiply
fdest ← last stage multiplier result
Advance A and M pipeline one stage (operands accessed before advancing pipeline)
A pipeline first stage ← A-op1 - A-op2
M pipeline first stage ← M-op1 × M-op2
pfmul.p fsrc1, fsrc2, fdest ........................ Pipelined Floating-Point Multiply
fdest ← last stage multiplier result
Advance M pipeline one stage
M pipeline first stage ← fsrc1 × fsrc2
pfmul3.dd fsrc1, fsrc2, fdest ........................ Three-Stage Pipelined Multiply
fdest ← last stage multiplier result
Advance 3-Stage M pipeline one stage
M pipeline first stage ← fsrc1 × fsrc2
pform fsrc1, fdest ........................ Pipelined OR to MERGE Register
fdest ← last-stage graphics-unit result
last-stage graphics-unit result ← fsrc1 OR MERGE
MERGE ← 0
pfsm.p fsrc1, fsrc2, fdest ........................ Pipelined Floating-Point Subtract and Multiply
fdest ← last stage adder result
Advance A and M pipeline one stage (operands accessed before advancing pipeline)
A pipeline first stage ← A-op1 - A-op2
M pipeline first stage ← M-op1 × M-op2
pfsub.p fsrc1, fsrc2, fdest ........................ Pipelined Floating-Point Subtract
fdest ← last stage adder result
Advance A pipeline one stage
A pipeline first stage ← fsrc1 - fsrc2
pftrunc.v fsrc1, fdest ........................ Pipelined Floating-Point to Integer Conversion
fdest ← last stage adder result
Advance A pipeline one stage
A pipeline first stage ← 64-bit value with low-order 32 bits
equal to integer part of fsrc1
pfzchkl fsrc1, fsrc2, fdest ........................ Pipelined 32-Bit Z-Buffer Check
Consider the 64-bit operands as arrays of two 32-bit
fields fsrc1(1)..fsrc1(0), fsrc2(1)..fsrc2(0), and fdest(1)..fdest(0)
where zero denotes the least-significant field.
PM ← PM shifted right by 2 bits
FOR i = 0 to 1
DO
PM [i + 6] ← fsrc2(i) ≤ fsrc1(i) (unsigned)
fdest(i) ← last-stage graphics-unit result
last-stage graphics-unit result ← smaller of fsrc2(i) and fsrc1(i)
OD
MERGE ← 0
pfzchks fsrc1, fsrc2, fdest ........................ Pipelined 16-Bit Z-Buffer Check
Consider the 64-bit operands as arrays of four 16-bit
fields fsrc1(3)..fsrc1(0), fsrc2(3)..fsrc2(0), and fdest(3)..fdest(0)
where zero denotes the least-significant field.
PM ← PM shifted right by 4 bits
FOR i = 0 to 3
DO
PM [i + 4] ← fsrc2(i) ≤ fsrc1(i) (unsigned)
fdest ← last-stage graphics-unit result
last-stage graphics-unit result(i) ← smaller of fsrc2(i) and fsrc1(i)
OD
MERGE ← 0
pst.d fdest, #const(isrc2) ........................ Pixel Store
pst.d fdest, #const(isrc2)++ ........................ Pixel Store Autoincrement
Pixels enabled by PM in mem.d (isrc2 + #const) ← fdest
Shift PM right by 8/pixel size (in bytes) bits
IF autoincrement
THEN isrc2 ← #const + isrc2
FI
scyc.x isrc2 ........................ Special Cycles
Generate a special bus cycle (D/C# = 0, W/R# = 1, M/IO# = 0) and
set BE7#-BE0# according to the value contained in the register isrc2
NOTE: Not available with the i860 XR CPU
shl isrc1, isrc2, idest ........................ Shift Left
idest ← isrc2 shifted left by isrc1 bits
shr isrc1, isrc2, idest ........................ Shift Right
SC (in psr) ← isrc1
idest ← isrc2 shifted right by isrc1 bits
shra isrc1, isrc2, idest ........................ Shift Right Arithmetic
idest ← isrc2 arithmetically shifted right by isrc1 bits
shrd isrc1ni, isrc2, idest ........................ Shift Right Double
idest ← low-order 32 bits of isrc1ni:isrc2 shifted right by SC bits
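Because shrd takes its shift count from SC, which a preceding shr sets, the pair can implement a rotate or a funnel shift. The following Python sketch models only the shrd data path; the function name and the assumption that SC holds a count in the range 0 to 31 are illustrative and are not taken from this section.

```python
def shrd(isrc1ni, isrc2, sc):
    """Model of shrd: low-order 32 bits of isrc1ni:isrc2 shifted right by SC bits."""
    # Concatenate the two 32-bit registers, isrc1ni in the high half.
    combined = ((isrc1ni & 0xFFFFFFFF) << 32) | (isrc2 & 0xFFFFFFFF)
    # SC is assumed here to be a shift count of 0..31.
    return (combined >> (sc & 31)) & 0xFFFFFFFF
```

Passing the same register as both sources gives a rotate right, for example shrd(0x12345678, 0x12345678, 8).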
st.c isrc1ni, csrc2 ........................ Store to Control Register
csrc2 ← isrc1ni
st.x isrc1ni, #const(isrc2) ........................ Store Integer
mem.x (isrc2 + #const) ← isrc1ni
stio.x isrc1ni, isrc2 ........................ Store I/O
port.x (isrc2) ← isrc1ni
NOTE: Not available with the i860 XR CPU
subs isrc1, isrc2, idest ........................ Subtract Signed
idest ← isrc1 - isrc2
OF ← (bit 31 carry ≠ bit 30 carry)
CC set if isrc2 > isrc1 (signed)
CC clear if isrc2 ≤ isrc1 (signed)
subu isrc1, isrc2, idest ........................ Subtract Unsigned
idest ← isrc1 - isrc2
OF ← NOT (bit 31 carry)
CC ← bit 31 carry
(i.e. CC set if isrc2 ≤ isrc1 (unsigned)
CC clear if isrc2 > isrc1 (unsigned))
trap ........................ Software Trap
Generate trap with IT set in psr
unlock ........................ End Interlocked Sequence
Clear BL in dirbase. The next load or store
unlocks the bus. Interrupts are enabled.
xor isrc1, isrc2, idest ........................ Logical Exclusive OR
idest ← isrc1 XOR isrc2
CC set if result is zero, cleared otherwise
xorh #const, isrc2, idest ........................ Logical Exclusive OR High
idest ← (#const shifted left 16 bits) XOR isrc2
CC set if result is zero, cleared otherwise
Table 10.2. FADDP MERGE Update

Pixel Size    Fields Loaded from                  Right Shift Amount
(from PS)     Result into MERGE                   (Field Size)
8             63..56, 47..40, 31..24, 15..8       8
16            63..58, 47..42, 31..26, 15..10      6
32            63..56, 31..24                      8
10.2 Instruction Format and Encoding
All instructions are 32 bits long and begin on a four-byte boundary. When operands are registers, the
encodings shown in Table 10.3 are used.
There are two general core-instruction formats
(REG-format and CTRL-format) and a separate format for floating-point instructions.
10.2.1 REG-FORMAT INSTRUCTIONS
Within the REG-format are several variations as
shown in Figure 10.1. Table 10.4 gives the encodings for these instructions. One encoding is an escape code that defines yet another variation: the
core escape instructions. Figure 10.2 shows the format of this group, and Table 10.5 shows the encodings.
In these instructions, the src2 field selects one of
the 32 integer registers (most instructions) or one of
the control registers (st.c and ld.c). Dest selects
one of the 32 integer registers (most instructions) or
floating-point registers (fld, fst, pfld, pst, ixfr). For
instructions where src1 is optionally an immediate
value, bit 26 of the opcode (I-bit) indicates whether
src1 is an immediate. If bit 26 is clear, an integer
register is used; if bit 26 is set, src1 is contained in
the low-order 16 bits, except for bte and btne
instructions. For bte and btne, the five-bit immediate
value is contained in the src1 field. For st, bte, btne,
and bla, the upper five bits of the offset or broffset
are contained in the dest field instead of src1, and
the lower 11 bits of offset are the lower 11 bits of
the instruction.
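The field rules in this paragraph can be illustrated with a small decoder. This is a sketch under stated assumptions: the function name and dictionary keys are hypothetical, and the bit positions (opcode in bits 31..26, src2 in 25..21, dest in 20..16, src1 in 15..11) are read from the bit ruler of Figure 10.1 rather than stated in the text.

```python
def decode_reg_format(word):
    """Split a 32-bit REG-format instruction word into its fields."""
    word &= 0xFFFFFFFF
    fields = {
        "opcode": word >> 26,            # bits 31..26, includes the I-bit
        "i_bit": (word >> 26) & 1,       # bit 26: set when src1 is an immediate
        "src2": (word >> 21) & 0x1F,     # bits 25..21 (assumed from Figure 10.1)
        "dest": (word >> 16) & 0x1F,     # bits 20..16
        "src1": (word >> 11) & 0x1F,     # bits 15..11
        "imm16": word & 0xFFFF,          # low-order 16 bits, used when I-bit is set
    }
    # For st, bte, btne, and bla the dest field carries the upper five bits
    # of the offset and the low 11 instruction bits carry the rest.
    fields["split_offset"] = (fields["dest"] << 11) | (word & 0x7FF)
    return fields
```

A caller would consult imm16 only when i_bit is 1, and split_offset only for the four instructions that use the split form.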
Table 10.3. Register Encoding

Register                      Encoding
r0                            0
...
r31                           31
f0                            0
...
f31                           31
Fault Instruction             0
Processor Status              1
Directory Base                2
Data Breakpoint               3
Floating-Point Status         4
Extended Processor Status     5
Bus Error Address*            6
Concurrency Control*          7
p0*                           8
p1*                           9
p2*                           10
p3*                           11

NOTE:
*Available only with i860 XP CPU. Using these encodings
with the i860 XR CPU produces undefined results.

For ld and st, bits 28 and zero determine operand
size as follows:

Bit 28    Bit 0    Operand Size
0         0        8-bits
0         1        8-bits
1         0        16-bits
1         1        32-bits

When src1 is immediate and bit 28 is set, bit zero of
the immediate value is forced to zero.
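The ld and st operand-size rule (bits 28 and 0 of the instruction word) reduces to a two-bit lookup. The Python sketch below is illustrative only; the function name is hypothetical.

```python
def ld_st_operand_bits(word):
    """Return the ld/st operand size in bits, selected by bits 28 and 0."""
    bit28 = (word >> 28) & 1
    bit0 = word & 1
    if not bit28:
        return 8                  # bit 0 is ignored for byte accesses
    return 32 if bit0 else 16     # bit 0 selects between 16-bit and 32-bit
```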
For fld, fst, pfld, pst, and flush, bit 0 selects autoincrement addressing if set. For fld, fst, pfld, and pst,
bits one and two select the operand size as follows:

Bit 1    Bit 2    Operand Size
0        0        64-bits
0        1        128-bits
1        0        32-bits
1        1        32-bits

When src1 is immediate, bits zero and one of the
immediate value are forced to zero to maintain alignment. When bit one of the immediate value is clear,
bit two is also forced to zero.
For flush, bits one and two must be zero.
For the instructions ldio, stio, ldint, and scyc, the
operand size is encoded by bits 9 and 10 as follows.
For other instructions, these bits are reserved and
should be set to zero.

Operand Size     Bit 10    Bit 9
8 Bits (.b)      0         0
16 Bits (.s)     0         1
32 Bits (.l)     1         0
reserved         1         1
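The two size encodings described here, for floating-point memory operations and for the I/O-style instructions, can be sketched as lookups on the instruction word. The function names and the use of None for the reserved encoding are assumptions for illustration.

```python
def fp_mem_operand_bits(word):
    """Operand size for fld, fst, pfld, pst, selected by bits 1 and 2."""
    if (word >> 1) & 1:
        return 32                             # bit 1 set: 32 bits, bit 2 ignored
    return 128 if (word >> 2) & 1 else 64     # bit 1 clear: bit 2 selects 128 or 64


def fp_mem_autoincrement(word):
    """Bit 0 selects autoincrement addressing for fld, fst, pfld, pst, flush."""
    return bool(word & 1)


def io_operand_bits(word):
    """Operand size for ldio, stio, ldint, scyc, encoded in bits 10 and 9."""
    code = (word >> 9) & 0b11                 # bit 10 is the high bit of the pair
    return {0b00: 8, 0b01: 16, 0b10: 32}.get(code)   # None marks the reserved code
```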
[Figure: four REG-format layouts, each drawn against a bit ruler running from 31 down to 0.
General format: OPCODE/I (bits 31..26), SRC2 (25..21), DEST (20..16), SRC1 (15..11), IMMEDIATE, OFFSET, OR NULL (10..0).
16-bit immediate variant: OPCODE, I, SRC2, DEST, IMMEDIATE (15..0).
Split-offset variant: OPCODE/I, SRC2, OFFSET HIGH (20..16), SRC1 or SRC1S (15..11), OFFSET LOW (10..0).
Split-offset variant with immediate: OPCODE, I, SRC2, OFFSET HIGH, IMMEDIATE, OFFSET LOW.]
Figure 10.1. REG-Format Variations
Table 10.4. REG-Format Opcodes

Mnemonic              Description                    31  30  29  28  27  26
ld.x                  Load Integer                   0   0   0   L   0   I
st.x                  Store Integer                  0   0   0   L   1   1
ixfr                  Integer to F-P Reg Transfer    0   0   0   0   1   0
-                     (reserved)                     0   0   0   1   1   0
fld.x, fst.x          Load/Store F-P                 0   0   1   0   LS  I
flush                 Flush                          0   0   1   1   0   1
pst.d                 Pixel Store                    0   0   1   1   1   1
ld.c, st.c            Load/Store Control Register    0   0   1   1   LS  0
bri                   Branch Indirect                0   1   0   0   0   0
trap                  Trap                           0   1   0   0   0   1
-                     (Escape for F-P Unit)          0   1   0   0   1   0
-                     (Escape for Core Unit)         0   1   0   0   1   1
bte, btne             Branch Equal or Not Equal      0   1   0   1   E   I
pfld.y                Pipelined F-P Load             0   1   1   0   0   I
-                     (CTRL-Format Instructions)     0   1   1   x   x   x
addu, -s, subu, -s    Add/Subtract                   1   0   0   SO  AS  I
shl, shr              Logical Shift                  1   0   1   0   LR  I
shrd                  Double Shift                   1   0   1   1   0   0
bla                   Branch LCC Set and Add         1   0   1   1   0   1
shra                  Arithmetic Shift               1   0   1   1   1   I
and(h)                AND                            1   1   0   0   H   I
andnot(h)             ANDNOT                         1   1   0   1   H   I
or(h)                 OR                             1   1   1   0   H   I
xor(h)                XOR                            1   1   1   1   H   I

L     Integer Length     0 - 8 bits;  1 - 16 or 32 bits (selected by bit 0)
LS    Load/Store         0 - Load;  1 - Store
SO    Signed/Ordinal     0 - Ordinal;  1 - Signed
H     High               0 - and, or, andnot, xor;  1 - andh, orh, andnoth, xorh
AS    Add/Subtract       0 - Add;  1 - Subtract
LR    Left/Right         0 - Left Shift;  1 - Right Shift
E     Equal              0 - Branch on Unequal;  1 - Branch on Equal
I     Immediate          0 - src1 is register;  1 - src1 is immediate

[Figure: core escape instruction layout. Bits 31..26 hold the core escape code 010011, followed by the SRC2, DEST, and SRC1 fields, a field reserved by Intel Corporation (set to zero), and the opcode in bits 4..0.]
Figure 10.2. Core Escape Instructions
Table 10.5. Core Escape Opcodes

Mnemonic    Description                  4  3  2  1  0
-           (reserved)                   0  0  0  0  0
lock        Begin Interlocked Sequence   0  0  0  0  1
calli       Indirect Subroutine Call     0  0  0  1  0
-           (reserved)                   0  0  0  1  1
intovr      Trap on Integer Overflow     0  0  1  0  0
-           (reserved)                   0  0  1  0  1
-           (reserved)                   0  0  1  1  0
unlock      End Interlocked Sequence     0  0  1  1  1
ldio*       Load I/O                     0  1  0  0  0
stio*       Store I/O                    0  1  0  0  1
ldint*      Load Interrupt Vector        0  1  0  1  0
scyc*       Special Cycles               0  1  0  1  1
-           (reserved)                   0  1  1  x  x
-           (reserved)                   1  0  x  x  x
-           (reserved)                   1  1  x  x  x

NOTE:
*Available only with i860 XP CPU, not with i860 XR CPU
10.2.2 CTRL-FORMAT INSTRUCTIONS
The CTRL-Format instructions do not refer to registers; so, instead of the register fields, they have a 26-bit
relative branch offset. Figure 10.3 shows the format of these instructions and Table 10.6 defines the encodings.
[Figure: CTRL-format layout. Bits 31..29 hold the fixed pattern 011, bits 28..26 hold OPC, and bits 25..0 hold BROFFSET.]
NOTE:
BROFFSET is a signed 26-bit relative branch offset
Figure 10.3. CTRL-Format Instructions
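Decoding a CTRL-format word amounts to extracting the 3-bit OPC field and sign-extending the 26-bit BROFFSET. The Python sketch below assumes OPC occupies bits 28..26 (the column labels of Table 10.6) and BROFFSET occupies bits 25..0; the function name is hypothetical, and how the offset is scaled and added to form the branch target is outside what this section states, so it is not modelled.

```python
def decode_ctrl_format(word):
    """Return (opc, broffset) for a CTRL-format instruction word."""
    opc = (word >> 26) & 0b111           # bits 28..26
    raw = word & 0x03FFFFFF              # bits 25..0
    # Sign-extend the 26-bit field: bit 25 is the sign bit.
    broffset = raw - (1 << 26) if raw & (1 << 25) else raw
    return opc, broffset
```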
Table 10.6. CTRL-Format Opcodes

Mnemonic    Description            28  27  26
-           (reserved)             0   0   0
-           (reserved)             0   0   1
br          Branch Direct          0   1   0
call        Call                   0   1   1
bc(.t)      Branch on CC Set       1   0   T
bnc(.t)     Branch on CC Clear     1   1   T

T    Taken    0 - bc or bnc;  1 - bc.t or bnc.t
10.2.3 FLOATING-POINT INSTRUCTION ENCODING
The floating-point instructions also constitute an escape series. All these instructions begin with the bit
sequence 010010. Figure 10.4 shows the format of
the floating-point instructions, and Table 10.7 gives
the encodings. Within the dual-operation instructions
is a subcode DPC whose values are given in Table
10.8 along with the mnemonic that corresponds to
each.
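A decoder can recognize a floating-point instruction by its 010010 prefix and then pull out the mode bits. The sketch below is illustrative: the function name is hypothetical, and the bit positions used for P, D, S, R (bits 10, 9, 8, 7) and for the 7-bit opcode (bits 6..0) are assumptions read from the layout of Figure 10.4.

```python
FP_ESCAPE = 0b010010   # bits 31..26 of every floating-point instruction


def decode_fp(word):
    """Return the fields of a floating-point instruction, or None if not one."""
    word &= 0xFFFFFFFF
    if (word >> 26) != FP_ESCAPE:
        return None
    return {
        "src2": (word >> 21) & 0x1F,
        "dest": (word >> 16) & 0x1F,
        "src1": (word >> 11) & 0x1F,
        "P": (word >> 10) & 1,      # 1 = pipelined, 0 = scalar
        "D": (word >> 9) & 1,       # 1 = dual-instruction mode
        "S": (word >> 8) & 1,       # 1 = double-precision sources
        "R": (word >> 7) & 1,       # 1 = double-precision result
        "opcode": word & 0x7F,      # bits 6..0, see Table 10.7
    }
```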
[Figure: floating-point instruction layout. Bits 31..26 hold 010010, followed by SRC2, DEST, SRC1, the P, D, S, and R bits, and the OPCODE field in the low-order bits.]

SRC1, SRC2    Source; one of 32 floating-point registers
DEST          Destination; one of 32 floating-point registers (except fxfr; one of 32 integer registers)
P             Pipelining
              1  Pipelined instruction mode
              0  Scalar instruction mode
D             Dual-Instruction Mode
              1  Dual-instruction mode
              0  Single-instruction mode
S             Source Precision
              1  Double-precision source operands
              0  Single-precision source operands
R             Result Precision
              1  Double-precision result
              0  Single-precision result

Figure 10.4. Floating-Point Instruction Encoding
Table 10.7. Floating-Point Opcodes

Mnemonic      Description                    6  5  4  3  2  1  0
pfam          Add and Multiply*              0  0  0     DPC
pfmam         Multiply with Add*             0  0  0     DPC
pfsm          Subtract and Multiply*         0  0  1     DPC
pfmsm         Multiply with Subtract*        0  0  1     DPC
(p)fmul       Multiply                       0  1  0  0  0  0  0
fmlow         Multiply Low                   0  1  0  0  0  0  1
frcp          Reciprocal                     0  1  0  0  0  1  0
frsqr         Reciprocal Square Root         0  1  0  0  0  1  1
pfmul3.dd     3-Stage Pipelined Multiply     0  1  0  0  1  0  0
(p)fadd       Add                            0  1  1  0  0  0  0
(p)fsub       Subtract                       0  1  1  0  0  0  1
(p)fix        Fix                            0  1  1  0  0  1  0
(p)famov      Adder Move                     0  1  1  0  0  1  1
pfgt/pfle**   Greater Than                   0  1  1  0  1  0  0
pfeq          Equal                          0  1  1  0  1  0  1
(p)ftrunc     Truncate                       0  1  1  1  0  1  0
fxfr          Transfer to Integer Register   1  0  0  0  0  0  0
(p)fiadd      Long-Integer Add               1  0  0  1  0  0  1
(p)fisub      Long-Integer Subtract          1  0  0  1  1  0  1
(p)fzchkl     Z-Check Long                   1  0  1  0  1  1  1
(p)fzchks     Z-Check Short                  1  0  1  1  1  1  1
(p)faddp      Add with Pixel Merge           1  0  1  0  0  0  0
(p)faddz      Add with Z Merge               1  0  1  0  0  0  1
(p)form       OR with MERGE Register         1  0  1  1  0  1  0

NOTE:
All opcodes not shown are reserved.
*  pfam and pfsm have P-bit set; pfmam and pfmsm have P-bit clear.
** pfgt has R bit cleared; pfle has R bit set.
Table 10.8. DPC Encoding

DPC    PFAM       PFSM       M-Unit   M-Unit     A-Unit     A-Unit     T      K
       Mnemonic   Mnemonic   op1      op2        op1        op2        Load   Load*
0000   r2p1       r2s1       KR       src2       src1       M result   No     No
0001   r2pt       r2st       KR       src2       T          M result   No     Yes
0010   r2ap1      r2as1      KR       src2       src1       A result   Yes    No
0011   r2apt      r2ast      KR       src2       T          A result   Yes    Yes
0100   i2p1       i2s1       KI       src2       src1       M result   No     No
0101   i2pt       i2st       KI       src2       T          M result   No     Yes
0110   i2ap1      i2as1      KI       src2       src1       A result   Yes    No
0111   i2apt      i2ast      KI       src2       T          A result   Yes    Yes
1000   rat1p2     rat1s2     KR       A result   src1       src2       Yes    No
1001   m12apm     m12asm     src1     src2       A result   M result   No     No
1010   ra1p2      ra1s2      KR       A result   src1       src2       No     No
1011   m12ttpa    m12ttsa    src1     src2       T          A result   Yes    No
1100   iat1p2     iat1s2     KI       A result   src1       src2       Yes    No
1101   m12tpm     m12tsm     src1     src2       T          M result   No     No
1110   ia1p2      ia1s2      KI       A result   src1       src2       No     No
1111   m12tpa     m12tsa     src1     src2       T          A result   No     No

DPC    PFMAM      PFMSM      M-Unit   M-Unit     A-Unit     A-Unit     T      K
       Mnemonic   Mnemonic   op1      op2        op1        op2        Load   Load*
0000   mr2p1      mr2s1      KR       src2       src1       M result   No     No
0001   mr2pt      mr2st      KR       src2       T          M result   No     Yes
0010   mr2mp1     mr2ms1     KR       src2       src1       M result   Yes    No
0011   mr2mpt     mr2mst     KR       src2       T          M result   Yes    Yes
0100   mi2p1      mi2s1      KI       src2       src1       M result   No     No
0101   mi2pt      mi2st      KI       src2       T          M result   No     Yes
0110   mi2mp1     mi2ms1     KI       src2       src1       M result   Yes    No
0111   mi2mpt     mi2mst     KI       src2       T          M result   Yes    Yes
1000   mrmt1p2    mrmt1s2    KR       M result   src1       src2       Yes    No
1001   mm12mpm    mm12msm    src1     src2       M result   M result   No     No
1010   mrm1p2     mrm1s2     KR       M result   src1       src2       No     No
1011   mm12ttpm   mm12ttsm   src1     src2       T          M result   Yes    No
1100   mimt1p2    mimt1s2    KI       M result   src1       src2       Yes    No
1101   mm12tpm    mm12tsm    src1     src2       T          M result   No     No
1110   mim1p2     mim1s2     KI       M result   src1       src2       No     No
1111   Intel Reserved

NOTE:
* If K-load is set, KR is loaded when operand-1 of the multiplier is KR; KI is loaded when operand-1 of the multiplier is KI.
10.3 Instruction Timings
Generally, i860 XP microprocessor instructions take
one clock to execute unless a freeze condition is
invoked. Detailed times, along with freeze conditions
and their associated delays, are shown in the table
on the following pages. The following symbols are
used for brevity in the timing table:

+n      n clocks must be added to the execution
        time if the stated conditions apply.
~n      The processor requires at least n clocks between the indicated instructions. The actual
        delay will be n minus the number of clocks
        for executing intervening instructions (or
        dual-mode pairs). If the time for intervening
        instructions is ≥ n, there is no delay.
n..m    Indicates a range of clocks. These cases
        are accompanied by a reference to a note
        where further explanation is available.
XR:     Applies to i860 XR microprocessors only.
XP:     Applies to i860 XP microprocessors only.
OA      The number of clocks to finish all outstanding accesses.
R1      The number of clocks from ADS# through
        the first READY# (80860XR) or BRDY#
        (80860XP) of the indicated bus activity.
R2      The number of clocks from ADS# through
        the second READY# or BRDY#.
RL      The number of clocks from ADS# through
        the last READY# or BRDY#.
RL1     XP: The number of clocks through last
        BRDY# of first access.
RN      XR: The number of clocks until next nonrepeated address can be issued (i.e., an address that is not the 2nd-4th cycle of a
        cache fill, the 2nd-8th cycle of a CS8 mode
        instruction fetch, nor the 2nd cycle of a 128-bit write).
RX      The number of clocks through READY# or
        BRDY# for the next 64-bit-or-less write cycle or second READY# or BRDY# for the
        next 128-bit write cycle.
TLB     TLB miss. Five clocks plus the number of
        clocks to finish two reads plus the number of
        clocks to set A-bits (if necessary).

NOTES:
a. "Address path full" means one address internally waiting for bus while external bus pipeline
full.
b. "Store path full" means two stores or one 256-bit write-back internally waiting for bus plus external bus pipeline full.
c. If a floating-point instruction, graphics-unit instruction, fst, or pst is executed when a scalar
floating-point operation (other than frcp or
frsqr) is in progress, the scalar operation must
complete first: two additional clocks for fadd,
fix, fmlow, fmul.ss, fmul.sd, ftrunc, and
fsub; three additional clocks for fmul.dd. Add
one if either or both of these situations occur:
1. There is an overlap between the result register of the previous scalar operation and
the source of the floating-point operation,
and the destination precision of the scalar
operation differs from the source precision
of the floating-point operation.
2. The floating-point operation is pipelined
and its destination is not f0.

In addition, any instruction may be delayed due to an
instruction cache miss or TLB miss during the instruction fetch. The time for a TLB miss is shown
above in note TLB. An instruction cache miss adds
the following delays:
• The number of clocks to get the next instruction
from the bus (ADS# clock to first READY# or
BRDY# clock, inclusive).
• XR: When any of the instructions in the new instruction-cache line is a branch or call or causes
a freeze, the time through the last READY# for
the new line.
• If the data cache is being accessed when the instruction-cache miss occurs, two clocks for data
cache miss; one clock for hit.
Not included in the table is the delay caused by a
trap. This depends on the trap handler.
In dual instruction mode, each pair of instructions
requires the maximum of the times required by each
individual instruction.
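Two of the timing rules in this section are simple arithmetic: the minimum-separation rule (the delay is n minus the clocks spent on intervening instructions, never less than zero) and the dual-instruction-mode rule (a pair costs the larger of its two individual times). A minimal Python sketch follows; the function names are hypothetical and the clock counts passed in are whatever the timing table gives for the instructions involved.

```python
def inter_instruction_delay(n, intervening_clocks):
    """Delay when at least n clocks are required between two instructions."""
    # No delay once the intervening instructions already take n clocks or more.
    return max(0, n - intervening_clocks)


def dual_mode_pair_clocks(core_clocks, fp_clocks):
    """Clocks for a dual-instruction-mode pair: the maximum of the two times."""
    return max(core_clocks, fp_clocks)
```

For instance, a requirement of two clocks with one clock of intervening work leaves a one-clock delay, and a pair whose halves take 1 and 3 clocks takes 3 clocks.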
Instruction    Execution Clocks    Condition
adds
addu
and
andh
and not
andnoth
bc
1
2
+
bc.t
1
2
+1
bla
1
2
If branch not taken.
If branch taken.
If the prior instruction is addu, adds, subu, subs, pfeq, or pfgt.
If branch taken.
If branch not taken.
If the prior instruction is addu, adds, subu, subs, pfeq, or pfgt. .
If branch taken.
If branch not taken.
bnc
(same as bc)
bnc.t
(same as be.t)
br
brl
2
bte
1
3
btne
If branch not taken.
If branch taken.
(same as bte)
call
+1
+1 +R1
+1+R2
calli
If r1 referenced in next instruction.
If data cache load miss in progress for a read of less than 128 bits.
If data cache load miss in progress for 128-bit read.
2
+1
+1 +R1
+1 +R2
If r1 referenced in next instruction.
If data cache load miss in progress for a read of less than 128 bits.
If data cache load miss in progress for 128-bit read.
- 2..4
( ... and all other A-unit instructions except dual operations)
If executed when a scalar floating-point operation (other than frcp
or frsqr) is in progress.(e)
fadd.p
Instruction
E~~~~~:n
faddp
~
Condition
( ... and all other G-unit instructions except fladd.w, fxfr)
+ 1 If {dest is used by next instruction and next instruction is G-, M· or A-unit instruction
2..4 If executed when a scalar floating·point operation (other than frep or frsqr) is in
progress.(el
faddz
(same as faddp)
famov.r
(same as fadd.p)
fladd.w
~
1
+ 1 If {dest is used by next instruction and next instruction is M· or A·unit instruction
(except when fiadd is used for fmov.dd or fmov.ss).
+ 1 If (dest is used by next instruction and next instruction is G-unit instruction.
2 .. 4 If executed when a scalar floating-point operation (other than frep or frsqr) is in
progress.(el
flsub.w
(same as faddp)
fix.v
(same as fadd.p)
fld.y
+1
~2
+1 +R1
+1 +R2
+1 +Rl
~2
+2
+R2
+RN
+RL1
+TlB
If this is the instruction after a st, fst or pst that hits the data cache.
If fdest is referenced in the next two instructions.
If 32-bit fld.l or 64-bit fld.d misses the data cache.
If 128-bit fld.q misses the data cache.
If data cache load miss in progress (except in the following case).
XP: If this instruction follows a data cache access that misses in the virtual tags but
hits in the physical tags.
XP: If the prior instruction is a pfld.y that hits a modified line in the data cache.
XP: If data-cache line write-back due to snoop is in progress.
XR: If address path full.(a)
XP: If address path full.(a)
If TLB miss.
flush
~
3
2
+ R2
+ 1 + RX
+ TlB
~
fmlow.dd
~
XR: If preceded by another flush.
XP: If preceded by another flush.
XP: If data-cache line write-back due to snoop is in progress.
If flush to modified line when store path full.(b)
If TlB miss.
( ... and all other M-unit instruction except dual operations)
+ 1 If (src1 refers to result of the prior operation (either scalar or pipelined).
+ 1 If the prior operation is a double-precision multiply.
2 ..4 If executed when a scalar floating-point operation (other than frep or frsqr) is in
progress.(e)
fmov.r
fmov.ss and·fmov.dd same as fiadd.w
fmov.sd and fmov.ds same as iadd.p .
fmul.p
(same as fmlow.dd)
Instruction
Execution
Clocks
Condition
fnop
form
(same as faddp)
frcp.p
(same as fmlow.dd)
frsqr.p
(same as fmlow.dd)
fst.y
1
+1
+1+RL
+2
-2
+R2
- 2.. 4
+RN
+RL1
+1+RX
+TLB
If followed by pipelinedfloating-point operation that overwrites the register
being stored.
If data cache load miss in progress.
XP: If the prior instruction is a pfld.y that hits a modified line in the data cache.
XP: If this instruction follows a data cache access that misses in the virtual
tags but hits in the physical tags.
XP: If data-cache line write-back due to snoop is in progress.
If executed when a scalar floating-point operation (other than frcp or frsqr) is
in progress,(el
XR: If address path full.(al
XP: If address path full.(al
If cache miss when store path full,(bl
IfTLB miss.
fsub.p
(same as fadd.p)
ftrunc.v
(same as fadd.p)
fxfr
1
+1
+1+R1
+1+R2
- 2.. 4
If idest referenced in next instruction.
If data cache load miss in progress for 64-bit read.
If data cache load miss in progress for 128-bit read.
If executed when a scalar floating-point operation (other than frcp or frsqr) is
in progress.(e)
fzchkl
(same as faddp)
fzchks
(same as faddp)
intovr
ixfr
ld.c
1
+1+R1
+1+R2
-2
If data cache load miss in progress for 64-bit read.
If data cache load miss in progress for 128-bit read.
If fdest is referenced in the next two instructions.
1
+1
+1+R1
+1+R2
If idest referenced in next instruction.
If data cache load miss in progress for 64-bit read.
If data cache load miss in progress for 128-bit read.
Condition
If idest referenced in next instruction.
If this is the instruction after a st, fst or pst that hits the data cache.
If data cache load miss in progress.
If ld.x misses the data cache and a subsequent instruction references the
idest of the ld.x (except for following case).
XP: If this instruction follows a data cache access that misses in the virtual
tags but hits in the physical tags.
XP: If the prior instruction is a pfld.y that hits a modified line in the data cache.
XP: If data-cache line write-back due to snoop is in progress.
XR: If address path full.(a)
XP: If address path full.(a)
If cache miss when store path full.(b)
If TLB miss.
+ OA
+ OA
mov
nop
or
orh
pfadd.p
(same as fadd.p)
pfaddp
(same as faddp)
pfaddz
(same as faddp)
pfam.p
1
+1
~
+1
2 ..4
( ... and all other dual operations)
If fsrc1 refers to result of the prior operation (either scalar or pipelined).
If the prior operation is a double-precision multiply.
If executed when a scalar floating-point operation (other than frcp or frsqr) is
in progress.(e)
pfamov.r
(same as fadd.p)
pfeq.p
(same as fadd.p)
pfgt.p
(same as fadd.p)
pfiadd.w
(same as faddp)
pfisub.w
(same as faddp)
pfix.v
(same as fadd.p)
Instruction
pfld.y
Execution
Clocks
1
+1+RL
~2
+ 1 +RL1
+2+0A
+2
~2
+R2
+RN
+RL1
+TLB
Condition
If data cache load miss in progress.
If fdest is referenced in the next two instructions.
If three pfld's are outstanding.
XR: If pfld hits data cache.
XP: If the prior instruction is a pfld.y that hits a modified line in the
data cache.
XP: If this instruction follows a data cache access that misses in
the virtual tags but hits in the physical tags.
XP: If data-cache line write-back due to snoop is in progress.
XR: If address path full.(a)
XP: If address path full.(a)
If TLB miss.
pfle.p
pfmam.p
(same as pfam.p)
pfmov.r
pfmov.ss and pfmov.dd same as faddp
pfmov.sd and pfmov.ds same as fadd.p
pfmsm.p
(same as pfam.dd)
pfmul.p
(same as fmlow.dd)
pfmul3.dd
(same as fmlow.dd)
pform
(same as faddp)
pfsm.p
(same as pfam.dd)
pfsub.p
(same as fadd.p)
pftrunc.v
(same as fadd.p)
pfzchkl
(same as faddp)
pfzchks
(same as faddp)
pst.d
(same as fst.d)
scyc.x
1
+ OA
shl
shr
shra
shrd
st.c
3
+1 +R1
+1+R2
If data cache load miss in progress for a read of less than 128 bits.
If data cache load miss in progress for 128-bit read.
Instruction
st.x
Execution
Clocks
1
+1 +RL
+2
+---+0 2
+R2
+RN
+RL1
+1+RX
+TLB
stio.x
1
Condition
If data cache load miss in progress.
XP: If the prior instruction is a pfld.y that hits a modified line in the data cache.
XP: If this instruction follows a data cache access that misses in the virtual
tags but hits in the physical tags.
XP: If data-cache line write-back due to snoop is in progress.
XR: If address path full.(a)
XP: If address path full.(a)
If cache miss when store path full.(b)
If TLB miss.
+ OA
subs
subu
trap
unlock
xor
xorh
10.4 Instruction Characteristics
The following table lists some of the characteristics
of each instruction. The characteristics are:
• What processing unit executes the instruction.
The codes for processing units are:
A Floating-point adder unit
E Core execution unit
G Graphics unit
M Floating-point multiplier unit
• Whether the instruction is pipelined or not. A P
indicates that the instruction is pipelined.
• Whether the instruction is a delayed branch instruction. A D marks the delayed branches.
• Whether execution is suppressed in user mode.
An SU marks supervisor-only instructions.
• Whether the instruction is available on both the
i860 XR and i860 XP microprocessors. An XP
marks instructions that are available only on the
i860 XP microprocessor.
• Whether the instruction changes the condition
code CC. A CC marks those instructions that
change CC.
• Which faults can be caused by the instruction.
The codes used for exceptions are:
IT Instruction Fault
SE Floating-Point Source Exception
RE Floating-Point Result Exception, including
overflow, underflow, inexact result
DAT Data Access Fault
Note that this is not the same as specifying at which
instructions faults may be reported. A result exception is reported on the subsequent floating-point instruction, pst, fst, or sometimes fld, pfld, and ixfr.
The instruction access fault IAT and the interrupt
trap IN are not shown in the table because they can
occur for any instruction.
• Performance notes. These comments regarding
optimum performance are recommendations only.
If these recommendations are not followed, the
i860 XP microprocessor automatically waits the
necessary number of clocks to satisfy internal
hardware requirements. The following notes define the numeric codes that appear in the instruction table:
1. The following instruction should not be a conditional branch (bc, bnc, bc.t, or bnc.t).
2. The destination should not be a source operand of the next two instructions.
3. A load should not directly follow a store that is
expected to hit in the data cache.
4. When the prior instruction is scalar, fsrc1
should not be the same as the fdest of the prior
operation.
5. The fdest should not reference the destination
of the next instruction if that instruction is a
pipelined floating-point operation.
6. The destination should not be a source operand of the next instruction. (For call and calli,
the destination is r1.)
7. When the prior operation is scalar and multiplier op1 is fsrc1, fsrc2 should not be the same as
the fdest of the prior operation.
8. When the prior operation is scalar, src1 and
src2 of the current operation should not be the
same as dest of the prior operation.
9. A pfld should not immediately follow a pfld.
• Programming restrictions. These indicate combinations of conditions that must be avoided by programmers, assemblers, and compilers. The following notes define the alphabetic codes that
appear in the instruction table:
a. The sequential instruction following a delayed
control-transfer instruction may not be another
control-transfer instruction, nor a trap instruction, nor the target of a control-transfer instruction.
b. When using a bri to return from a trap handler,
programmers should take care to prevent traps
from occurring on that or on the next sequential instruction. IM should be zero (interrupts
disabled) when the bri is executed.
Instruction
Execution
Unit
adds
addu
and
andh
andnot
E
E
E
E
E
andnoth
bc
bc.t
bla
bnc
E
E
E
E
E
bnc.t
br
bri
bte
btne
call
calli
fadd.p
faddp
faddz
famov.r
fiadd.w
fisub.w
fix.p
fld.y
~
Pipelined?
Delayed?
Supervisor?
i860TM XP Only?
c. If fdest is not zero, fsrc1 must not be the same
as fdest.
d. When fsrc1 goes to multiplier op1 or to KR or
KI, fsrc1 must not be the same as fdest.
e. If dest is not zero, src1 and src2 must not be
the same as dest.
f. isrc1 must not be the same register as isrc2 for
the autoincrementing form of this instruction.
g. isrc1 must not be the same register as isrc2.
h. flush must not be used in a locked sequence
or in dual instruction mode.
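An assembler or compiler could enforce restrictions c and e mechanically. The following is a minimal sketch in C, not taken from the data sheet: the struct fp_operands and both function names are hypothetical, and it assumes that "not zero" means the destination register number is not 0.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified view of an instruction's register operands.
 * Each field holds a register number; the names are illustrative only. */
struct fp_operands {
    unsigned src1;
    unsigned src2;
    unsigned dest;
};

/* Restriction c: if fdest is not zero, fsrc1 must not be the same as fdest.
 * Assumption: "not zero" refers to register number 0. */
bool violates_restriction_c(const struct fp_operands *op)
{
    assert(op != NULL);
    return op->dest != 0 && op->src1 == op->dest;
}

/* Restriction e: if dest is not zero, src1 and src2 must not be
 * the same as dest. */
bool violates_restriction_e(const struct fp_operands *op)
{
    assert(op != NULL);
    return op->dest != 0 &&
           (op->src1 == op->dest || op->src2 == op->dest);
}
```

A tool would call the check that matches the code letter listed for the instruction in the table and reject the encoding when it returns true.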
Sets
CC?
Faults
CC
CC
CC
CC
CC
CC
Performance
Notes
Programming
Restrictions
1
1
D
D
a
a,g
E
E
E
E
E
D
D
D
a
a
a,b
E
E
A
G
G
D
D
A
G
G
A
E
6
6
a
a
SE,RE
8
8
SE,RE
8
8
,
SE,RE
DAT
2,3
NOTES:
* On the i860 XP microprocessor, the pipelined instructions can generate ITR with PI.
** On the i860 XR microprocessor, the 128-bit pfld.q is not available. If used it causes an instruction trap.
f
Instruction
flush
fmlow.dd
fmul.p
form
frcp.p
Execution
Unit
Pipelined?
Delayed?
Supervisor?
i860TM XP Only?
Sets
CC?
Faults
Performance
Notes
SE,RE
4
4
8
h
E
M
M
G
M
M
SE,RE
frsqr.p
fst.y
fsub.p
ftrunc.p
fxfr
G
6,8
fzchkl
fzchks
intovr
ixfr
ld.c
G
G
E
E
E
8
8
ld.x
ldint.x
ldio.x
lock
or
orh
E
E
E
E
E
E
pfadd.p
pfaddp
pfaddz
pfam.p
pfamov.r
pfeq.p
pfgt.p
pfiadd.w
pfisub.w
pfix.p
pfld.y
pfmam.p
pfmsm.p
pfmul.p
pfmul3.dd
pform
pfsm.p
pfsub.p
pftrunc.p
pfzchkl
Programming
Restrictions
SE,RE
E
DAT
A
A
SE,RE
SE,RE
A
G
G
A&M
A
A
A
G
G
A
E
A&M
A&M
M
M
G
A&M
A
A
G
f
5
IT
2
DAT
DAT
DAT
SU,XP
SU,XP
6
CC
CC
P
P
P
P
P
P
P
P
P
P
SE,RE*
*
*
SE,RE*
SE,RP
CC
CC
SE*
SE*
*
*
SE,RE*
1
1
8
8
DAT
2,9
7
7
4
4
P,(XP)**
P
P
P
P
SE,RE*
SE,RE*
SE,RE*
SE,RE*
P
P
P
P
P
SE,RP
SE,RE*
SE,RE*
*
•
e
e
8
8
7
d
e
e
"
8
7
f
d
d
t./)
8
c
c
e
d
Instruction
pfzchks
pst.d
scyc.x
shl
shr
Execution
Unit
Pipelined?
Delayed?
Supervisor?
i860TM XP Only?
G
E
E
E
E
shra
shrd
st.c
st.x
stio.x
E
E
E
E
E
subs
subu
trap
unlock
xor
xorh
E
E
E
E
E
E
Sets
CC?
P
Faults
SU,XP
*
DAT
DAT
SU,XP
DAT
DAT
CC
CC
Performance
Notes
Programming
Restrictions
8
5
f
1
1
IT
CC
CC
10.5 Software Compatibility
10.5.1 REQUIRED CHANGES
To port existing systems software from the i860 XR
microprocessor to the i860 XP microprocessor, the
following changes may be required. Applications
software does not require changes.
1. Data cache flush. All four ways of the data cache
must be flushed on the i860 XP microprocessor.
The cache flush routine can be modified to check
processor type in epsr or the DCS field of
dirbase and flush the appropriate number of
ways.
2. Parity and bus error traps. If the i860 XP system
signals these errors, the trap handler must be extended to handle them. Software must avoid testing the BEF and PEF bits unless executing on the
i860 XP microprocessor.
3. LOCK# deactivation. On the i860 XP microprocessor, traps do not automatically deactivate the
LOCK# signal, so the trap handler must do a
data access to deactivate LOCK#. Trap handlers
that already access data soon after invocation do
not require this modification.
4. Load pipe precision. The precision of the last
stage of the load pipeline is specified by the LRP
bit on the i860 XR microprocessor but by the
LRP0 and LRP1 bits on the i860 XP microprocessor. The procedure that restores the load pipe
must check the processor type, use the appropriate bits, and restore the correct precision. Pipe
restoration code for the i860 XR microprocessor
will work correctly on the i860 XP microprocessor
if pfld.q is not used.
5. Pre-accessed trap handler pages. Page-directory
and page-table entries for the instruction pages
of the trap handler and for the first data page
accessed by the trap handler must always have
A = 1. Software modified to allocate page tables
this way works on both i860 XR and i860 XP microprocessors.
6. Page directory entry bit 7 must be zero. This is
the bit that selects four-Mbyte or four-Kbyte page
size. On the i860 XR microprocessor, it is reserved and should be set to zero. It must be set
to zero for four-Kbyte pages to work on the
i860 XP microprocessor.
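The bit-7 rule in item 6 is easy to check when page directory entries are built. The following is a minimal sketch in C, assuming a page directory entry is held in a 32-bit word; the macro and function names are hypothetical and do not come from the data sheet.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit 7 of a page directory entry selects the page size on the i860 XP
 * and is reserved on the i860 XR (per item 6 above). */
#define PDE_PAGE_SIZE_BIT (UINT32_C(1) << 7)

/* True if the entry has bit 7 clear, so it selects four-Kbyte pages on
 * the i860 XP and respects the reserved bit on the i860 XR. */
bool pde_is_4k_compatible(uint32_t pde)
{
    return (pde & PDE_PAGE_SIZE_BIT) == 0;
}

/* Returns the entry with bit 7 cleared and every other bit unchanged. */
uint32_t pde_force_4k(uint32_t pde)
{
    return pde & ~PDE_PAGE_SIZE_BIT;
}

/* Small usage example with made-up entry values. */
void pde_example(void)
{
    assert(pde_is_4k_compatible(UINT32_C(0x00001001)));
    assert(!pde_is_4k_compatible(UINT32_C(0x00001081)));
    assert(pde_force_4k(UINT32_C(0x00001081)) == UINT32_C(0x00001001));
}
```

Clearing the bit when the entry is created, rather than patching tables later, keeps the same page-table code usable on both processors.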
10.5.2 PERFORMANCE OPTIMIZATIONS
Software developers may wish to make the following
performance enhancements in systems software for
the i860 XP microprocessor. Systems software that
must execute on both i860 XP and i860 XR systems
can contain code both with and without the optimizations. By testing the processor type, the appropriate
instruction path can be determined.
1. Data cache flush. On the i860 XP microprocessor, a complete flushing of the data cache is not
needed when changing context or marking a
page not present.
2. The epsr bits AI, DI, PI, and PT can be used on
the i860 XP microprocessor to make trap handlers more efficient.
3. Four-Mbyte pages can be allocated to frame buffers and the operating-system kernel, thereby reducing the cost of TLB misses.
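Software that must run on both processors can gather these choices into one table selected at start-up. The following is a minimal sketch in C under stated assumptions: the enum, struct, and function names are hypothetical, how the processor type is read from epsr is not shown, and the value of two ways for the i860 XR data cache is an assumption not stated in this section.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical processor type, as determined elsewhere at start-up. */
enum cpu_type { CPU_I860_XR, CPU_I860_XP };

/* Hypothetical set of choices made once at start-up. */
struct sys_config {
    unsigned flush_ways;       /* data cache ways the flush routine covers */
    bool use_4mb_pages;        /* four-Mbyte pages for kernel/frame buffer */
    bool use_epsr_trap_bits;   /* AI, DI, PI, PT shortcuts in trap handlers */
};

struct sys_config select_config(enum cpu_type cpu)
{
    struct sys_config cfg;
    if (cpu == CPU_I860_XP) {
        cfg.flush_ways = 4;          /* all four ways, per section 10.5.1 */
        cfg.use_4mb_pages = true;
        cfg.use_epsr_trap_bits = true;
    } else {
        cfg.flush_ways = 2;          /* assumption: XR data cache is 2-way */
        cfg.use_4mb_pages = false;
        cfg.use_epsr_trap_bits = false;
    }
    return cfg;
}

/* Small usage example. */
void select_config_example(void)
{
    struct sys_config xp = select_config(CPU_I860_XP);
    assert(xp.flush_ways == 4 && xp.use_4mb_pages);
}
```

Deciding once and storing the result avoids repeating the processor-type test on hot paths such as the trap handler.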
10.5.3 NEW FEATURES
Software that uses the new features available only
on the i860 XP microprocessor will not be compatible with the i860 XR microprocessor unless alternate instruction paths are provided.
Systems software features:
1. New instructions ldio, stio, ldint, and scyc.
2. Four-Mbyte pages.
3. Privileged registers p0, p1, p2, and p3.
4. Concurrency control unit.
5. 128-bit load instruction pfld.q.
6. Support for virtual address aliases.
Applications software features:
1. Concurrency control unit.
2. 128-bit load instruction pfld.q. The i860 XR microprocessor traps on pfld.q; therefore, software
has the opportunity to emulate a pfld.q with two
pfld.d instructions. However, this strategy does
not yield optimal performance on the i860 XR microprocessor.
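The data movement behind that emulation can be illustrated in C. This is only a sketch of the idea, not trap-handler code: the struct and function names are hypothetical, it models the two pfld.d loads as two 64-bit copies from consecutive addresses, and it ignores the pipelined behavior of the load pipe and instruction decoding.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical container for one 128-bit operand, kept as the two
 * 64-bit halves in memory order. */
struct quad128 {
    uint64_t first;   /* bytes 0..7 at the source address */
    uint64_t second;  /* bytes 8..15 at the source address */
};

/* Models a 128-bit load as two 64-bit loads from addr and addr + 8. */
struct quad128 load_quad_as_two_doubles(const unsigned char *addr)
{
    struct quad128 q;
    assert(addr != NULL);
    memcpy(&q.first, addr, sizeof q.first);
    memcpy(&q.second, addr + sizeof q.first, sizeof q.second);
    return q;
}
```

Each half costs a separate load, which is consistent with the remark above that the emulated form does not perform optimally.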
10.5.4 NOTES
On the i860 XP microprocessor, pages with WT = 1
are cached with the write-through policy, whereas
on the i860 XR microprocessor they are not cached
at all. Because this change in the function of WT
was anticipated in the i860 XR microprocessor documentation, no incompatibility should arise.
11.0 REVISION HISTORY
DATA SHEET REVISION REVIEW
The following list represents the major differences
between version 002 and version 001 of the i860 XP
Microprocessor Data Sheet.
Section 2.2.4      AI bit has been changed to TAI in Figure 2.5. The explanation for PI bit has been expanded.
Section 4.2.33     PCHK# signal description has been expanded.
Section 4.2.35     Output buffer configuration has been added in PEN# signal description.
Section 4.2.37     RESET description has been expanded.
Section 5.1.3      Table 5.2 has been corrected. The explanation of write/read and read/write pipelining has been revised.
Section 5.2.2.4-5  The explanation of late back-off mode has been expanded.
Section 5.2.4      Figure 5.27 has been corrected.
Section 5.3.4      The explanation of EWBE# timing has been corrected.
Section 5.5        RESET initialization description has been expanded.
Section 9.2        D.C. Characteristics are corrected.
Section 9.3        A.C. Characteristics are replaced with nominal timings based on CL = 0 pF. Figure 9.3 and Figure 9.4 have been replaced with nominal A.C. timings based on CL = 0 pF. Figure 9.5 has been corrected for normal and high-current output buffers.
Section 9.4        Component buffer model has been added.
Section 10.4       Programming restriction on flush instruction has been added.
A
8-bit pixel
data type, 2.1.4
AA
fsr U-bit (update bit), 2.2.8
access rights
address translation caches, 3.1
16-bit pixel
data type, 2.1.4
16-bit values
alignment requirements, 2.3
A.C. characteristics
electrical data, 9.3
addressing
i860 XP microprocessor, 2.3
32-bit binary floating-point
modes, 2.7
single-precision real, 2.1.3
32-bit integer
data type, 2.1.1
32-bit ordinal
address space
consistency, 3.3.1
address translation
algorithm, 2.4.5
data type, 2.1.2
caches, 3.1
32-bit pixel
faults, 2.4.6
P (present) bit, 2.4.4.2
data type, 2.1.4
virtual addressing, 2.4
32-bit values
alignment requirements, 2.3
adds (Add Signed)
epsr OF (overflow flag), 2.2.4
64-bit binary floating-point
instruction definition, 10.1
double-precision real, 2.1.3
instruction timing, 10.3
floating-point register file, 2.2.2
64-bit integer
addu (Add Unsigned)
epsr OF (overflow flag), 2.2.4
data type, 2.1.1
instruction definition, 10.1
floating-point register file, 2.2.2
64-bit values
alignment requirements, 2.3
instruction timing, 10.3
ADS# (address status)
AHOLD (address hold), 4.2.3
128-bit load and store instructions
floating-point register file, 2.2.2
signal description, 4.2.2
AE
fsr U-bit (update bit), 2.2.8
128-bit values
alignment requirements, 2.3
AHOLD (address hold)
bus arbitration, 5.2
82495XP/82490XP cache
signal description, 4.2.3
BRDY# (burst ready), 4.2.7
external secondary cache, 1.0
write-once policy, 3.2.4.2
algorithm
address translation, 2.4.5
A31-A3 (address pins)
signal description, 4.2.1
cache replacement, 3.2.3
aliasing
instruction cache, 3.2.2
A (accessed)
internal instruction and data caches, 3.2
page-table entries (PTEs), 2.4.4.6
alignment
BE7#-BE0# (byte enables)
signal description, 4.2.4
requirements, 2.3
andh (Logical AND High)
bear (bus error address register)
format description, 2.2.10
instruction definition, 10.1
instruction timing, 10.3
BE (big endian)
and (Logical AND)
data cache, 3.2.1
epsr format description, 2.2.4
instruction definition, 10.1
instruction timing, 10.3
BEF (bus error flag)
epsr format description, 2.2.4
andnoth (Logical AND NOT High)
instruction definition, 10.1
instruction timing, 10.3
andnot (Logical AND NOT)
instruction definition, 10.1
BEh#
BE7#-BE0# (byte enables), 4.2.4
BERR (bus error)
bear (bus error address register), 2.2.10
instruction timing, 10.3
bus error trap, 2.8.7
epsr BEF (bus error flag), 2.2.4
psr IM (interrupt mode), 2.2.3
ANSI/IEEE Standard 754-1985, 1.0
AO
signal description, 4.2.5
fsr U-bit (update bit), 2.2.8
arbitration
big endian mode
addressing, 2.3
bus operation, 5.2
HOLD and HLDA, 5.2.1
bla (Branch on LCC and Add)
epsr AI (trap on autoincrement instruction),2.2.4
ATE (address translation enable)
instruction definition, 10.1
address translation, 2.4
instruction timing, 10.3
dirbase format description, 2.2.6
BL (bus lock)
AU
dirbase format description, 2.2.6
fsr U-bit (update bit), 2.2.8
bnc (Branch on Not CC)
instruction definition, 10.1
B
instruction timing, 10.3
back-off
bus cycle, 5.2.2
bnc.t (Branch on Not CC, Taken)
late modes, 5.2.2.3
instruction definition, 10.1
one-clock late mode, 5.2.2.4
instruction timing, 10.3
two-clock late mode, 5.2.2.5
BOFF# (back-off)
bc (Branch on CC)
ADS# (address status), 4.2.2
instruction definition, 10.1
BERR(bus error), 4.2.5
instruction timing, 10.3
bus arbitration, 5.2
dirbase LB (late back-off mode), 2.2.6
bc.t (Branch on CC, Taken)
FLINE# choice, 5.3.5.1
instruction definition, 10.1
signal description, 4.2.6
instruction timing, 10.3
boundary scan
bus and cache control unit
register cell ordering, 6.5
BPR (bypass register)
function of, 1.0
bus cycles
back-off and restart, 5.2.2
test, 6.2
bus operation, 5.1
br (Branch Direct Unconditionally)
type output pins, 4.1
instruction definition, 10.1
instruction timing, 10.3
bus errors
bear (bus error address register), 2.2.10
BR (break read)
trap, 2.8.7
debugging i860 XP microprocessor, 2.9
psr format description, 2.2.3
bus operation
i860 XP microprocessor, 5.0
BRDY# (burst ready)
bear (bus error address register), 2.2.10
BW (break write)
debugging i860 XP microprocessor, 2.9
BERR (bus error), 4.2.5
psr format description, 2.2.3
epsr IL (interlock), 2.2.4
locked access, 3.2.4.3
BYPASS# (bypass)
signal description, 4.2.7
signal description, 4.2.9
write-once policy, 3.2.4.2
TAP encoding, 6.3
BREQ (bus request)
signal description, 4.2.8
C
CACHE# (cacheability)
BE7#-BE0# (byte enables), 4.2.4
bri (Branch Indirect Unconditionally)
signal description, 4.2.10
instruction definition, 10.1
brl (Branch Indirect Unconditionally)
cache
address translation, 3.1
instruction timing, 10.3
consistency protocol, 3.2.4
BS (bus or parity error trap in supervisory mode)
external secondary, 1.0
epsr format description, 2.2.4
inquiry cycles (snooping), 5.3
internal instruction and data, 3.2
BSR (boundary scan register)
invalidating entries, 3.3
test, 6.2
on-chip, 3.0
bte (Branch If Equal)
replacement algorithm, 3.2.3
instruction definition, 10.1
instruction timing, 10.3
cacheability
address translation caches, 3.1
btne (Branch If Not Equal)
consistency, 3.3.4
instruction timing, 10.3
calli (Indirect Subroutine Call)
buffer
instruction definition, 10.1
models, 9.4
size, selection with PEN#, 4.2.35, 5.5, 9.4.3
instruction timing, 10.3
call (Subroutine Call)
burst cycles
instruction definition, 10.1
bus cycle, 5.1.2
bus arbitration
instruction timing, 10.3
capture-DR
test state, 6.4.5
bus operation, 5.2
capture-IR
copy-back policy
data cache update, 3.2.1.1
test state, 6.4.11
CC (condition code)
core execution unit
psr format description, 2.2.3
function of, 1.0
ccr (concurrency control register)
CS8 (code size 8-bit)
DCCU initialization, 2.5.1
BE7#-BE0# (byte enables), 4.2.4
format description, 2.2.12
dirbase format description, 2.2.6
CCUBASE
CTRL-format
ccr (concurrency control register), 2.2.12
DCCU addressing, 2.5.2
DCCU initialization, 2.5.1
CD (cache disable)
bypassing instruction and data cache, 3.3
instructions, 10.2.2
CTYP (cycle type)
signal description, 4.2.12
current mode
high vs. normal, 4.2.35, 5.5, 9.3, 9.4.3
page-table entries (PTEs), 2.4.4.5
CLK (clock)
cycles
back-off, 5.2.2.1
signal description, 4.2.11
burst cycles, 5.1.2
co (CCU on)
interrupt acknowledge, 5.1.4
ccr (concurrency control register), 2.2.12
pipelined, 5.1.3
restart, 5.2.2.2
color intensity shading
special bus, 5.1.5
pixel formats, 2.1.4
compatibility
pipelined cycles, 5.1.3
D
D63-D0 (data pins)
software changes, 10.5.1
concurrency control unit (CCU)
signal description, 4.2.14
data access
ccr (concurrency control register), 2.2.12
fault, 2.8.5
detached CCU, 2.5
NEWCURR register, 2.2.13
data cache
bypassing, 3.3
consistency
flushing, 3.3
address space, 3.3.1
function of, 1.0
cacheability, 3.3.4
operation, 3.2
instruction cache, 3.3.2
organization, 3.2.1
internal cache, 3.3
states, 3.2.4.1
load pipe, 3.3.5
update policies, 3.2.1.1
page table, 3.3.3
protocol, 3.2.4
data types
i860 XP microprocessor, 2.1
write-once policy, 3.2.4.2
control registers
DAT (data access trap)
debugging i860 XP microprocessor, 2.9
register set, 2.2
psr format description, 2.2.3
db (data breakpoint register)
dirbase (directory base register)
debugging i860 XP microprocessor, 2.9
address space consistency, 3.3.1
format description, 2.2.5
cache replacement algorithm, 3.2.3
DCCU initialization, 2.5.1
psr BR (break read) and BW (break write), 2.2.3
format description, 2.2.6
D-bit
instruction cache consistency, 3.3.2
dual-instruction mode, 2.6.2
page directory, 2.4.3
D/C# (data/code)
page table consistency, 3.3.3
P (present) bit, 2.4.4.2
signal description, 4.2.13
D.C. characteristics
disassemblers
big endian mode, 2.3
electrical data, 9.2
DCCU (detached concurrency control unit)
DI (trap on delayed instruction)
epsr format description, 2.2.4
addressing, 2.5.2
ccr (concurrency control register), 2.2.12
function of, 1.0
DM (dual instruction mode)
psr format description, 2.2.3
initialization, 2.5.1
internals, 2.5.3
DO (detached only)
ccr (concurrency control register), 2.2.12
DCS (data cache size)
epsr format description, 2.2.4
double-precision real
data type, 2.1.3
D (dirty)
page-table entries (PTEs), 2.4.4.6
double real value
floating-point registers, 2.1.3
debugging
i860 XP microprocessor, 2.9
double-shift instruction
psr SC (shift count), 2.2.3
deferred-write policy
data cache update, 3.2.1.1
DP7-DPO (data parity)
signal description, 4.2.15
denormal
special floating-point values, 2.1.3
DPC (data-path control)
dual-operation instructions, 2.6.3
Detached
DPS (DRAM page size)
STAT register description, 2.2.14
dirbase format description, 2.2.6
detached CCU
DS (delayed switch)
i860 XP microprocessor, 2.5
psr format description, 2.2.3
d.fnop
DTB (directory table base)
dual-instruction mode, 2.6.2
dirbase format description, 2.2.6
DID (device identification register)
dual-instruction mode
test, 6.2
parallelism, 2.6.2
DIR
virtual address, 2.4.2
dual-operation instructions
floating-point, 2.6.3
fault
E
address translation, 2.4.6
EADS#
data access, 2.8.5
AHOLD (address hold), 4.2.3
floating-point, 2.8.3
EADS# (external address status)
instruction access, 2.8.4
signal description, 4.2.16
result exception fault, 2.8.3.1
source exception fault, 2.8.3.1
epsr (extended processor status register)
data cache, 3.2.1
fiadd.w (Long-Integer Add)
DCCU internals, 2.5.3
instruction definition, 10.1
format description, 2.2.4
instruction timing, 10.3
page-table entries (PTEs), 2.4.4.3
fir (fault instruction register)
EWBE# (external write buffer empty)
epsr DI (trap on delayed instruction), 2.2.4
format description, 2.2.7
epsr SO (strong ordering), 2.2.4
signal description, 4.2.17
fisub.w (Long-Integer Subtract)
instruction definition, 10.1
exit1-DR
test state, 6.4.7
exit1-IR
instruction timing, 10.3
fix.v (Floating-Point to Integer Conversion)
instruction definition, 10.1
test state, 6.4.13
instruction timing, 10.3
exit2-DR
test state, 6.4.9
fld.y (Floating-Point Load)
instruction definition, 10.1
exit2-IR
instruction timing, 10.3
test state, 6.4.15
EXTEST
FLINE# (flush line)
BOFF# choice, 5.3.5.1
TAP encoding, 6.3
signal description, 4.2.18
floating-point
F
adder, 1.0
faddp (Add with Pixel Merge)
instruction definition, 10.1
control unit, 1.0
instruction timing, 10.3
fault, 2.8.3
instruction encoding, 10.2.3
fadd.p (Floating-Point Add)
multiplier, 1.0
instruction definition, 10.1
register file, 2.2.2
instruction timing, 10.3
flush (Cache Flush)
faddz (Add with Z Merge)
cache replacement algorithm, 3.2.3
instruction definition, 10.1
dirbase RB (replacement block), 2.2.6
instruction timing, 10.3
flushing data cache, 3.3
instruction definition, 10.1
famov.r (Floating-Point Adder Move)
instruction timing, 10.3
instruction definition, 10.1
requirements summary, 3.3.6
instruction timing, 10.3
fmlow.dd (Floating-Point Multiply Low)
FT (floating-point trap)
instruction definition, 10.1
instruction timing, 10.3
psr format description, 2.2.3
ftrunc.v (Floating-Point to Integer Conversion)
instruction definition, 10.1
fmov.r (Floating-Point Reg-Reg Move)
instruction timing, 10.3
instruction definition, 10.1
instruction timing, 10.3
fxfr (Transfer F-P to Integer Register)
fmul.p (Floating-Point Multiply)
instruction definition, 10.1
instruction definition, 10.1
instruction timing, 10.3
instruction timing, 10.3
fzchkl (32-Bit Z-Buffer Check)
instruction definition, 10.1
fnop (Floating-Point No Operation)
instruction timing, 10.3
instruction definition, 10.1
instruction timing, 10.3
fzchks (16-Bit Z-Buffer Check)
instruction definition, 10.1
form (OR with MERGE Register)
instruction timing, 10.3
instruction definition, 10.1
instruction timing, 10.3
FZ (flush zero)
fsr format description, 2.2.8
frcp.p (Floating-Point Reciprocal)
instruction definition, 10.1
instruction timing, 10.3
frsqr.p (Floating-Point Reciprocal Square Root)
G
graphics unit
function of, 1.0
instruction definition, 10.1
instruction timing, 10.3
H
fsr (floating-point status register)
format description, 2.2.8
pipelining status information, 2.6.1.2
fst.y (Floating-Point Store)
instruction definition, 10.1
instruction timing, 10.3
fsub.p (Floating-Point Subtract)
instruction definition, 10.1
instruction timing, 10.3
FTE (floating-point trap enable)
fsr format description, 2.2.8
hardware interface
i860 XP microprocessor, 4.0
HIT# (cache inquiry hit)
signal description, 4.2.19
HITM# (hit modified line)
internal cache consistency, 3.3
signal description, 4.2.20
HLDA (bus hold acknowledge)
signal description, 4.2.21
HOLD (bus hold)
bus arbitration, 5.2
signal description, 4.2.22
i860 XP microprocessor
bus operation, 5.0
instruction
access fault, 2.8.4
characteristics, 10.4
CTRL-format, 10.2.2
functional description, 1.0
definitions, 10.1
hardware interface, 4.0
dual-operation, 2.6.3
instruction set, 8.0
encoding floating-point, 10.2.3
mechanical data, 7.0
fault, 2.8.2
on-chip caches, 3.0
format and encoding, 10.2
programming interface, 2.0
REG-format, 10.2.1
testability, 6.0
IAT (instruction access trap)
psr format description, 2.2.3
timing, 10.3
instruction cache
bypassing, 3.3
consistency, 3.3.2
IDCODE
function of, 1.0
TAP encoding, 6.3
operation, 3.2
IEEE Standard
organization, 3.2.2
for Binary Floating-Point Arithmetic, 1.0
P1149.1/06 testability, 6.0
instruction set
abbreviations, 10.0
IL (interlock)
extensions of i860 XR, 2.6
epsr format description, 2.2.4
IM (interrupt mode)
psr format description, 2.2.3
indefinite
special floating-point values, 2.1.3
i860 XP microprocessor, 8.0
INT/CS8 (interrupt/code-size 8 bits)
signal description, 4.2.24
integer
data type, 2.1.1
inexact result
result exception fault, 2.8.3.2
register file, 2.2.1
internal cache
initialization
at RESET, 5.5
consistency, 3.3
interrupt
infinity
acknowledge cycles, 5.1.4
special floating-point values, 2.1.3
i860 XP microprocessor, 2.8
trap, 2.8.8
IN (interrupt)
psr format description, 2.2.3
INT (interrupt)
epsr format description, 2.2.4
InLoop
STAT register description, 2.2.14
intovr (Software Trap on Integer Overflow)
instruction definition, 10.1
inquiry cycles
instruction timing, 10.3
data cache states, 3.2.4.1
for line being cached, 5.3.2.1
INT pin
epsr INT (interrupt), 2.2.4
psr IM (interrupt mode), 2.2.3
for line being replaced, 5.3.2.2
snooping, 5.3
write-back, 5.3.1
invalidation requirements
ld.c (Load from Control Register)
summary, 3.3.6
fir (fault instruction register), 2.2.7
instruction definition, 10.1
INV (invalidate)
instruction timing, 10.3
signal description, 4.2.23
IR (instruction register)
ldint.x (Load Interrupt Vector)
big endian mode, 2.3
test, 6.3
epsr BE (big endian), 2.2.4
IRP (integer graphics)
extensions of i860 XR, 2.6
fsr format description, 2.2.8
instruction definition, 10.1
instruction timing, 10.3
ITI (cache and TLB invalidate)
dirbase format description, 2.2.6
ldio.x (Load I/O)
big endian mode, 2.3
IT (instruction trap)
extensions of i860 XR, 2.6
psr format description, 2.2.3
instruction definition, 10.1
ixfr (Transfer Integer to F-P Register)
instruction definition, 10.1
instruction timing, 10.3
K
instruction timing, 10.3
ld.l
flushing data cache, 3.3
ld.x (Load Integer)
DCCU internals, 2.5.3
KBO, KB1 (cache block)
instruction definition, 10.1
signal description, 4.2.25
instruction timing, 10.3
KEN# (cache enable)
BE7#-BEO# (byte enables), 4.2.4
LEN (data length)
signal description, 4.2.27
bypassing instruction and data cache, 3.3
DCCU addressing, 2.5.2
LFBSR (linear feedback shift register)
internal instruction and data caches, 3.2
cache replacement algorithm, 3.2.3
locked access, 3.2.4.3
signal description, 4.2.26
little endian mode
addressing, 2.3
KI
special purpose register description, 2.2.9
load pipe
consistency, 3.3.5
KNF (kill next floating-point instruction)
psr format description, 2.2.3
LOCK# (address lock)
A (accessed) bit, 2.4.4.6
KR
cycle attribute, 5.4
special purpose register description, 2.2.9
dirbase BL (bus lock), 2.2.6
signal description, 4.2.28
L
lock (Begin Interlocked Sequence)
LB (late back-off mode)
dirbase BL (bus lock), 2.2.6
dirbase format description, 2.2.6
instruction definition, 10.1
instruction timing, 10.3
LCC (loop condition code)
locked access, 3.2.4.3
psr CC (condition code), 2.2.3
locked access
cache consistency, 3.2.4.3
N
NA# (next address request)
lock instruction
locked access, 3.2.4.3
signal description, 4.2.30
epsr IL (interlock), 2.2.4
write-once policy, 3.2.4.2
lock protocol
instruction fault, 2.8.2.1
NaN (Not a Number)
special floating-point values, 2.1.3
LRP0 (load pipe result precision)
fsr format description, 2.2.8
NENE# (next near)
dirbase DPS (DRAM page size), 2.2.6
LRP1 (load pipe result precision)
signal description, 4.2.31
fsr format description, 2.2.8
Nested
STAT register description, 2.2.14
M
MA
NEWCURR register
fsr U-bit (update bit), 2.2.8
DCCU internals, 2.5.3
format description, 2.2.13
mechanical data
i860 XP microprocessor, 7.0
nonpipelined cycle
bus cycle, 5.1.3
MERGE
special purpose register description, 2.2.9
nop (Core-Unit No Operation)
instruction definition, 10.1
MESI
instruction timing, 10.3
cache consistency protocol, 3.2.4
write cycle reordering, 5.3.3
MI
fsr U-bit (update bit), 2.2.8
o
offset
addressing modes, 2.7
M/IO# (memory-I/O)
I
signal description, 4.2.29
virtual address, 2.4.2
OF (overflow flag)
epsr format description, 2.2.4
MO
fsr U-bit (update bit), 2.2.8
on-chip caches
mov (Constant-to-Register Move)
instruction definition, 10.1
i860 XP microprocessor, 3.0
ordinal
mov (Register-Register Move)
instruction definition, 10.1
instruction timing, 10.3
MU
fsr U-bit (update bit), 2.2.8
data type, 2.1.2
orh (Logical OR High)
instruction definition, 10.1
instruction timing, 10.3
or (Logical OR)
instruction definition, 10.1
instruction timing, 10.3
output pins
PBM (page-table bit mode)
pins overview, 4.1
overflow
epsr format description, 2.2.4
PCD (page cache disable)
result exception fault, 2.8.3.2
bypassing instruction and data cache, 3.3
CD (cache disable), 2.4.4.5
p
signal description, 4.2.32
package
PCHK# (parity check)
thermal specifications, 8.0
PAGE
signal description, 4.2.33
PCYC (page cycle)
virtual address, 2.4.2
page directory
signal description, 4.2.34
PEF (parity error flag)
little endian mode, 2.3
epsr format description, 2.2.4
page tables, 2.4.3
PEN# (parity enable)
paged virtual-address space
bear (bus error address register), 2.2.10
addressing, 2.3
parity error trap, 2.8.6
signal description, 4.2.35
page frame
address, 2.4.4.1
performance optimizations
physical main memory, 2.4.1
page table
combining protection, 2.4.4.8
software compatibility, 10.5.2
pfaddp (Pipelined Add with Pixel Merge)
instruction definition, 10.1
consistency, 3.3.3
instruction timing, 10.3
entry format description, 2.4.4
format description, 2.4.3
little endian mode, 2.3
pfadd.p (Pipelined Floating-Point Add)
instruction definition, 10.1
instruction timing, 10.3
for trap handlers, 2.4.4.7
paging unit
pfaddz (Pipelined Add with Z Merge)
address translation caches, 3.1
instruction definition, 10.1
function of, 1.0
instruction timing, 10.3
parallelism
dual-instruction mode, 2.6.2
pfamov.r (Pipelined Floating-Point Adder Move)
instruction definition, 10.1
instruction timing, 10.3
use of, 2.6
parity error
pfam.p (Pipelined Floating-Point Add and Multiply)
dual-operation, 2.6.3
bear (bus error address register), 2.2.10
psr IM (interrupt mode), 2.2.3
instruction definition, 10.1
instruction timing, 10.3
trap, 2.8.6
special purpose registers, 2.2.9
pause-DR
test state, 6.4.8
pfeq.p (Pipelined Floating-Point Equal Compare)
instruction definition, 10.1
pause-IR
instruction timing, 10.3
test state, 6.4.14
pfgt.p (Pipelined Floating-Point Greater-Than
Compare)
pfmul3.dd (Three-Stage Pipelined Multiply
instruction definition, 10.1
instruction definition, 10.1
instruction timing, 10.3
instruction timing, 10.3
pfmul.p (Pipelined Floating-Point Multiply)
pfiadd.w (Pipelined Long-Integer Add)
instruction definition, 10.1
instruction definition, 10.1
instruction timing, 10.3
instruction timing, 10.3
pform (Pipelined OR to MERGE Register)
pfisub.w (Pipelined Long-Integer Subtract)
instruction definition, 10.1
instruction definition, 10.1
instruction timing, 10.3
instruction timing, 10.3
pfix.v (Pipelined Floating-Point to Integer
Conversion)
pfsm.p (Pipelined Floating-Point Subtract
and Multiply)
dual-operation, 2.6.3
instruction definition, 10.1
instruction definition, 10.1
instruction timing, 10.3
instruction timing, 10.3
special purpose registers, 2.2.9
pfld (Pipelined Floating-Point Load)
epsr PT (trap on pipeline use), 2.2.4
pfsub.p (Pipelined Floating-Point Subtract)
load pipe consistency, 3.3.5
instruction definition, 10.1
pipeline loads, 2.6.1.5
instruction timing, 10.3
pfld.q
extensions of i860 XR, 2.6
pftrunc.v (Pipelined Floating-Point to
Integer Conversion)
instruction definition, 10.1
pfld.y (Pipelined Floating-Point Load)
instruction timing, 10.3
instruction definition, 10.1
instruction timing, 10.3
pfzchkl (Pipelined 32-Bit Z-Buffer Check)
instruction definition, 10.1
pfle.p (Pipelined F-P Less-Than or Equal Compare)
instruction timing, 10.3
instruction definition, 10.1
instruction timing, 10.3
pfzchks (Pipelined 16-Bit Z-Buffer Check)
instruction definition, 10.1
pfmam.p (Pipelined Floating-Point Add and Multiply)
instruction timing, 10.3
dual operation, 2.6.3
instruction definition, 10.1
physical main memory
instruction timing, 10.3
special purpose registers, 2.2.9
page frame, 2.4.1
physical tags
internal instruction and data caches, 1.2
pfmov.r (Pipelined Floating-Point Reg-Reg Move)
instruction definition, 10.1
instruction timing, 10.3
pfmsm.p (Pipelined Floating-Point Subtract
and Multiply)
PI bit
using, 2.8.2.2
PIM (previous interrupt mode)
psr format description, 2.2.3
dual operation, 2.6.3
instruction definition, 10.1
pins overview
instruction timing, 10.3
hardware interface, 4.1
special purpose registers, 2.2.9
pipeline
PWT (page write-through)
cycles, 5.1.3
signal description, 4.2.36
loads, 2.6.1.5
WT (write-through), 2.4.4.4
operations, 2.6.1
R
precision in, 2.6.1.3
scalar transition, 2.6.1.4
ratings
status information, 2.6.1.2
absolute maximum, 9.1
PI (pipeline instruction)
RB (replacement block)
epsr format description, 2.2.4
dirbase format description, 2.2.6
pixel
RC (replacement control)
data type, 2.1.4
dirbase format description, 2.2.6
PM (pixel mask)
REG-format
psr format description, 2.2.3
instructions, 10.2.1
P (present)
register cell ordering
page-table entries (PTEs), 2.4.4.2
boundary scan, 6.5
privileged registers
replacement algorithm
format description, 2.2.11
cache, 3.2.3
processor
RESET (system reset)
revisions, 2.2.4
AHOLD (address hold), 4.2.3
type, 2.2.4
bear (bus error address register), 2.2.10
cache replacement algorithm, 3.2.3
programming interface
epsr BEF (bus error flag), 2.2.4
i860 XP microprocessor, 2.0
epsr SO (strong ordering), 2.2.4
PS (pixel size)
initialization, 5.5
psr format description, 2.2.3
signal description, 4.2.37
trap, 2.8.9
psr (processor status register)
debugging i860 XP microprocessor, 2.9
format description, 2.2.3
restart
bus cycle, 5.2.2
page-table entries (PTEs), 2.4.4.3
result exception fault
pst.d (Pixel Store)
floating-point, 2.8.3.1
instruction definition, 10.1
instruction timing, 10.3
right-shift instruction
psr SC (shift count), 2.2.3
psr PS (pixel size) and PM (pixel mask), 2.2.3
PT (trap on pipeline use)
RM (rounding mode)
fsr format description, 2.2.8
epsr format description, 2.2.4
using, 2.8.2.2
RR (result register)
fsr format description, 2.2.8
PU (previous user mode)
psr format description, 2.2.3
run-test/idle
test state, 6.4.2
shr (Shift Right)
S
instruction definition, 10.1
SAMPLE
instruction timing, 10.3
TAP encoding, 6.3
scalar
signal description
hardware interface, 4.2
mode, 2.6.1.1
operations, 2.6.1
single-precision real
pipelined transition, 2.6.1.4
SC (shift count)
data type, 2.1.3
single-transfer cycle
bus cycle, 5.1.1
psr format description, 2.2.3
scyc.x (Special Cycles)
SI (sticky inexact)
fsr format description, 2.2.8
big endian mode, 2.3
epsr BE (big endian), 2.2.4
extensions of i860 XR, 2.6
snooping
inquiry cycles, 5.3
instruction definition, 10.1
internal instruction and data caches, 3.2
instruction timing, 10.3
select-DR-scan
test state, 6.4.3
select-IR-scan
test state, 6.4.4
serializing
locked access, 3.2.4.3
SE (source exception)
fsr format description, 2.2.8
shift-DR
test state, 6.4.6
shift-IR
test state, 6.4.12
shl (Shift Left)
instruction definition, 10.1
responsibility limits, 5.3.2
software compatibility
required changes, 10.5.1
SO (strong ordering)
epsr format description, 2.2.4
source exception fault
floating-point, 2.8.3.1
spare
signal description, 4.2.38
special bus
cycles, 5.1.5
special-purpose registers
register set, 2.2
special values
floating-point numbers, 2.1.3
instruction timing, 10.3
shra (Shift Right Arithmetic)
STAT register
DCCU internals, 2.5.3
instruction definition, 10.1
format description, 2.2.14
instruction timing, 10.3
shrd (Shift Right Double)
instruction definition, 10.1
instruction timing, 10.3
st.c (Store to Control Register)
TAI (Trap On Autoincrement)
address translation, 2.4
epsr format description, 2.2.4
dirbase BL (bus lock), 2.2.6
fsr U-bit (update bit), 2.2.8
dirbase CS8 (code size 8-bit), 2.2.6
TAP (test access port)
fsr U-bit (update bit), 2.2.8
controller, 6.4
instruction definition, 10.1
controller initialization, 6.6
instruction timing, 10.3
testability, 6.0
privileged registers, 2.2.11
TCK (test clock)
stepping number
signal description, 4.2.39
epsr format description, 2.2.4
TDI (test data input)
stio.x (Store I/O)
signal description, 4.2.40
big endian mode, 2.3
epsr BE (big endian), 2.2.4
TDO (test data output)
extensions of i860 XR, 2.6
signal description, 4.2.41
instruction definition, 10.1
test
instruction timing, 10.3
architecture, 6.1
strong ordering mode
data registers, 6.2
inquiry cycle, 5.3.4
testability
st.x (Store Integer)
i860 XP microprocessor, 6.0
DCCU internals, 2.5.3
test-logic-reset
instruction definition, 10.1
test state, 6.4.1
instruction timing, 10.3
test state
subs (Subtract Signed)
capture-DR, 6.4.5
epsr OF (overflow flag), 2.2.4
capture-IR, 6.4.11
instruction definition, 10.1
exit1-DR, 6.4.7
instruction timing, 10.3
exit1-IR, 6.4.13
exit2-DR, 6.4.9
subu (Subtract Unsigned)
epsr OF (overflow flag), 2.2.4
exit2-IR, 6.4.15
instruction definition, 10.1
pause-DR, 6.4.8
instruction timing, 10.3
pause-IR, 6.4.14
run-test/idle, 6.4.2
supervisor/user mode
select-DR-scan, 6.4.3
addressing, 2.3
select-IR-scan, 6.4.4
ccr (concurrency control register), 2.2.12
shift-DR, 6.4.6
psr U (user mode), 2.2.3
shift-IR, 6.4.12
test-logic-reset, 6.4.1
T
update-DR, 6.4.10
special purpose register description, 2.2.9
update-IR, 6.4.16
tags
thermal specifications
internal instruction and data caches, 3.2
package, 8.0
TI (trap inexact)
update-IR
test state, 6.4.16
fsr format description, 2.2.8
TLB
user/supervisor mode
address translation caches, 3.1
ccr (concurrency control register), 2.2.12
DCCU addressing, 2.5.2
psr U (user mode), 2.2.3
internal cache consistency, 3.3
U (user)
page-table entries (PTEs), 2.4.4.3
TMS (test mode select)
psr format description, 2.2.3
signal description, 4.2.42
trap handler
invocation, 2.8.1
page tables, 2.4.4.7
V
VccCLK (clock power)
signal description, 4.2.45
trap (Software Trap)
bus error, 2.8.7
Vee (system ground)
signal description, 4.2.44
i860 XP microprocessor, 2.8
instruction cache consistency, 3.3.2
instruction definition, 10.1
virtual address
address translation caches, 3.1
instruction timing, 10.3
CCUBASE, 2.2.12
interrupt, 2.8.8
format description, 2.4.2
parity error, 2.8.6
i860 XP microprocessor, 2.4
RESET, 2.8.9
virtual tag
tri-state
instruction cache, 3.2.2
output pins, 4.1
TRST# (test reset)
signal description, 4.2.43
U
internal instruction and data caches, 3.2
Vss (ground)
signal description, 4.2.44
w
U-bit (update bit)
wait state
fsr format description, 2.2.8
underflow
single-transfer cycle, 5.1.1
WB/WT# (write-back/write-through)
result exception fault, 2.8.3.2
signal description, 4.2.46
unlock (End Interlocked Sequence)
write-once policy, 3.2.4.2
dirbase BL (bus lock), 2.2.6
WP (write protect)
epsr IL (interlock), 2.2.4
epsr format description, 2.2.4
instruction definition, 10.1
page-table entries (PTEs), 2.4.4.3
instruction timing, 10.3
W/R# (write/read)
update-DR
signal description, 4.2.47
test state, 6.4.10
write-once policy, 3.2.4.2
write-back
data cache update policy, 3.2.1.1
with FLINE#, 5.3.5.2
x
xorh (Logical Exclusive OR High)
instruction definition, 10.1
inquiry cycles, 5.3.1
instruction timing, 10.3
scheduling inquiry cycles, 5.3.5
write cycle
xor (Logical Exclusive OR)
instruction definition, 10.1
reordering due to buffering, 5.3.3
instruction timing, 10.3
write-once
cache consistency, 3.2.4.2
data cache update policy, 3.2.1.1
z
Z-buffer
special purpose registers, 2.2.9
write-through
data cache update policy, 3.2.1.1
WT (write-through)
page-table entries (PTEs), 2.4.4.4
write-through policy, 3.2.1.1
W (writable)
page-table entries (PTEs), 2.4.4.3
i860™ XR 64-BIT MICROPROCESSOR
• Parallel Architecture that Supports Up to Three Operations per Clock
- One Integer or Control Instruction per Clock
- Up to Two Floating-Point Results per Clock
• High Performance Design
- 25/33.3/40 MHz Clock Rates
- 80 Peak Single Precision MFLOPs
- 60 Peak Double Precision MFLOPs
- 64-Bit External Data Bus
- 64-Bit Internal Instruction Cache Bus
- 128-Bit Internal Data Cache Bus
• Compatible with Industry Standards
- ANSI/IEEE Standard 754-1985 for Binary Floating-Point Arithmetic
- Intel386™/486™ Microprocessor Data Formats and Page Table Entries
- JEDEC 168-pin Ceramic Pin Grid Array Package (see Packaging Outlines and Dimensions, order #231369)
• Easy to Use
- On-Chip Debug Register
- Assembler, Linker, Simulator, Debugger, C and FORTRAN Compilers, FORTRAN Vectorizer, Scalar and Vector Math Libraries for both OS/2* and UNIX* Environments
• High Level of Integration on One Chip
- 32-Bit Integer and Control Unit
- 32/64-Bit Pipelined Floating-Point Adder and Multiplier Units
- 64-Bit 3-D Graphics Unit
- Paging Unit with Translation Lookaside Buffer
- 4 Kbyte Instruction Cache
- 8 Kbyte Data Cache
The Intel i860™ XR Microprocessor (order codes A80860XR-25, A80860XR-33 and A80860XR-40) delivers
supercomputing performance in a single VLSI component. The 64-bit design of the i860 XR microprocessor
balances integer, floating-point, and graphics performance for applications such as engineering workstations,
scientific computing, 3-D graphics workstations, and multiuser systems. Its parallel architecture achieves high
throughput with RISC design techniques, pipelined processing units, wide data paths, large on-chip caches,
million-transistor design, and fast one-micron CHMOS IV silicon technology.
[Block diagram (240296-1): the external address (A31-A3), data (D63-D0), and control pins connect to the bus and cache control unit, which serves the instruction cache, data cache, and paging unit; the RISC core and floating-point control unit drive the floating-point register file, adder unit, multiplier unit, and graphics unit over 32-, 64-, and 128-bit internal buses.]
Figure 0.1. Block Diagram
Intel, intel, Intel386™, Intel486™, i860 XR, Multibus II and Parallel System Bus are trademarks of Intel Corporation.
*UNIX is a registered trademark of UNIX System Laboratories, Inc. OS/2 is a trademark of International Business Machines
Corporation.
August 1991
Order Number: 240296-005
i860™ XR 64-Bit Microprocessor
CONTENTS
1.0 FUNCTIONAL DESCRIPTION
2.0 PROGRAMMING INTERFACE
2.1 Data Types
2.1.1 Integer
2.1.2 Ordinal
2.1.3 Single- and Double-Precision Real
2.1.4 Pixel
2.2 Register Set
2.2.1 Integer Register File
2.2.2 Floating-Point Register File
2.2.3 Processor Status Register
2.2.4 Extended Processor Status Register
2.2.5 Data Breakpoint Register
2.2.6 Directory Base Register
2.2.7 Fault Instruction Register
2.2.8 Floating-Point Status Register
2.2.9 KR, KI, T, and MERGE Registers
2.3 Addressing
2.4 Virtual Addressing
2.4.1 Page Frame
2.4.2 Virtual Address
2.4.3 Page Tables
2.4.4 Page-Table Entries
2.4.4.1 Page Frame Address
2.4.4.2 Present Bit
2.4.4.3 Writable and User Bits
2.4.4.4 Write-Through Bit
2.4.4.5 Cache Disable Bit
2.4.4.6 Accessed and Dirty Bits
2.4.4.7 Combining Protection of Both Levels of Page Tables
2.4.5 Address Translation Algorithm
2.4.6 Address Translation Faults
2.4.7 Page Translation Cache
2.5 Caching and Cache Flushing
2.6 Instruction Set
2.6.1 Pipelined and Scalar Operations
2.6.1.1 Scalar Mode
2.6.1.2 Pipelining Status Information
2.6.1.3 Precision in the Pipelines
2.6.1.4 Transition between Scalar and Pipelined Operations
2.6.2 Dual-Instruction Mode
2.6.3 Dual-Operation Instruction
2.7 Addressing Modes
2.8 Traps and Interrupts
2.8.1 Trap Handler Invocation
2.8.2 Instruction Fault
2.8.3 Floating-Point Fault
2.8.3.1 Source Exception Faults
2.8.3.2 Result Exception Faults
2.8.4 Instruction Access Fault
2.8.5 Data Access Fault
2.8.6 Interrupt Trap
2.8.7 Reset Trap
2.9 Debugging
3.0 HARDWARE INTERFACE
3.1 Signal Description
3.1.1 Clock (CLK)
3.1.2 System Reset (RESET)
3.1.3 Bus Hold (HOLD) and Bus Hold Acknowledge (HLDA)
3.1.4 Bus Request (BREQ)
3.1.5 Interrupt/Code-Size (INT/CS8)
3.1.6 Address Pins (A31-A3) and Byte Enables (BE7#-BE0#)
3.1.7 Data Pins (D63-D0)
3.1.8 Bus Lock (LOCK#)
3.1.9 Write/Read Bus Cycle (W/R#)
3.1.10 Next Near (NENE#)
3.1.11 Next Address Request (NA#)
3.1.12 Transfer Acknowledge (READY#)
3.1.13 Address Status (ADS#)
3.1.14 Cache Enable (KEN#)
3.1.15 Page Table Bit (PTB)
3.1.16 Boundary Scan Shift Input (SHI)
3.1.17 Boundary Scan Enable (BSCN)
3.1.18 Shift Scan Path (SCAN)
3.1.19 Configuration (CC1-CC0)
3.1.20 System Power (Vcc) and Ground (Vss)
3.2 Initialization
3.3 Testability
3.3.1 Normal Mode
3.3.2 Shift Mode
4.0 BUS OPERATION
4.1 Pipelining
4.2 Bus State Machine
4.3 Bus Cycles
4.3.1 Nonpipelined Read Cycles
4.3.2 Nonpipelined Write Cycles
4.3.3 Pipelined Read and Write Cycles
4.3.4 Locked Cycles
4.3.5 HOLD and BREQ Arbitration Cycles
4.4 Bus States during RESET
5.0 MECHANICAL DATA
6.0 PACKAGE THERMAL SPECIFICATIONS
7.0 ELECTRICAL DATA
7.1 Absolute Maximum Ratings
7.2 D.C. Characteristics
7.3 A.C. Characteristics
8.0 INSTRUCTION SET
8.1 Instruction Definitions in Alphabetical Order
8.2 Instruction Format and Encoding
8.2.1 REG-Format Instructions
8.2.2 CTRL-Format Instructions
8.2.3 Floating-Point Instructions
8.3 Instruction Timings
8.4 Instruction Characteristics
FIGURES
Figure 0.1 Block Diagram
Figure 2.1 Real Number Formats
Figure 2.2 Pixel Format Example
Figure 2.3 Registers and Data Paths
Figure 2.4 Processor Status Register
Figure 2.5 Extended Processor Status Register
Figure 2.6 Directory Base Register
Figure 2.7 Floating-Point Status Register
Figure 2.8 Little and Big Endian Data Access
Figure 2.9 Format of a Virtual Address
Figure 2.10 Address Translation
Figure 2.11 Format of a Page Table Entry
Figure 2.12 Pipelined Instruction Execution
Figure 2.13 Dual-Instruction Mode Transitions
Figure 2.14 Dual-Operation Data Paths
Figure 3.1 Order of Boundary Scan Chain
Figure 4.1 Bus State Machine
Figure 4.2 Fastest Read Cycles
Figure 4.3 Fastest Write Cycles
Figure 4.4 Fastest Read/Write Cycles
Figure 4.5 Pipelined Read Followed by Pipelined Write
Figure 4.6 Pipelined Write Followed by Pipelined Read
Figure 4.7 Pipelining Driven by NA#
Figure 4.8 NA# Active with No Internal Bus Request
Figure 4.9 Locked Cycles
Figure 4.10 HOLD, HLDA, and BREQ
Figure 4.11 Reset Activities
Figure 5.1 Pin Configuration-View from Top Side
Figure 5.2 Pin Configuration-View from Pin Side
Figure 5.3 168-Lead Ceramic PGA Package Dimensions
Figure 6.1 Icc vs Case Temperature
Figure 7.1 CLK, Input, and Output Timings
Figure 7.2 Typical Output Delay vs Load Capacitance under Worst-Case Conditions
Figure 7.3 Typical Slew Time vs Load Capacitance under Worst-Case Conditions
Figure 7.4 Typical Icc vs Frequency
Figure 8.1 REG-Format Variations
Figure 8.2 Core Escape Instruction Format
Figure 8.3 CTRL Instruction Format
Figure 8.4 Floating-Point Instruction Encoding
TABLES
Table 2.1 Pixel Formats
Table 2.2 Values of PS
Table 2.3 Values of RB
Table 2.4 Values of RC
Table 2.5 Values of RM
Table 2.6 Combining Directory and Page Protection
Table 2.7 Instruction Set
Table 2.8 Types of Traps
Table 2.9 Register and Cache Values after Reset
Table 3.1 Pin Summary
Table 3.2 Identifying Instruction Fetches
Table 3.3 Cacheability based on KEN# and CD OR'ed WT
Table 3.4 Output Pin Status during RESET
Table 3.5 Test Mode Selection
Table 3.6 Test Mode Latches
Table 5.1 Pin Cross Reference by Location
Table 5.2 Pin Cross Reference by Pin Name
Table 5.3 Ceramic PGA Package Dimension Symbols
Table 6.1 Thermal Resistance (°C/W) θJC and θCA
Table 6.2 Maximum Allowable TA at Various Airflows
Table 7.1 D.C. Characteristics
Table 7.2 A.C. Characteristics
Table 8.1 Precision Specification
Table 8.2 FADDP MERGE Update
Table 8.3 Register Encoding
Table 8.4 REG-Format Opcodes
Table 8.5 Core Escape Opcodes
Table 8.6 CTRL-Format Opcodes
Table 8.7 Floating-Point Opcodes
Table 8.8 DPC Encoding
Table 8.9 Instruction Characteristics
1.0 FUNCTIONAL DESCRIPTION
As shown by the block diagram on the front page,
the i860 XR microprocessor consists of 9 units:
1. Core Execution Unit
2. Floating-Point Control Unit
3. Floating-Point Adder Unit
4. Floating-Point Multiplier Unit
5. Graphics Unit
6. Paging Unit
7. Instruction Cache
8. Data Cache
9. Bus and Cache Control Unit
The core execution unit controls overall operation of
the i860 XR microprocessor. The core unit executes
load, store, integer, bit, and control-transfer operations, and fetches instructions for the floating-point
unit as well. A set of 32 x 32-bit general-purpose
registers is provided for the manipulation of integer
data. Load and store instructions move 8-, 16-, and
32-bit data to and from these registers. Its full set of
integer, logical, and control-transfer instructions gives
the core unit the ability to execute complete systems
software and applications programs. A trap mechanism provides rapid response to exceptions and external interrupts. Debugging is supported by the ability to trap on data or instruction reference.
The floating-point hardware is connected to a separate set of floating-point registers, which can be
accessed as 16 x 64-bit registers, or 32 x 32-bit registers. Special load and store instructions can also
access these same registers as 8 x 128-bit registers.
All floating-point instructions use these registers as
their source and destination operands.
The floating-point control unit controls both the floating-point adder and the floating-point multiplier, issuing instructions, handling all source and result
exceptions, and updating status bits in the floating-point status register. The adder and multiplier can
operate in parallel, producing up to two results per
clock. The floating-point data types, floating-point instructions, and exception handling all support the
IEEE Standard for Binary Floating-Point Arithmetic
(ANSI/IEEE Std 754-1985).
The floating-point adder performs addition, subtraction, comparison, and conversions on 64- and 32-bit
floating-point values. An adder instruction executes
in three clocks; however, in pipelined mode, a new
result is generated every clock.
The floating-point multiplier performs floating-point
and integer multiply and floating-point reciprocal operations on 64- and 32-bit floating-point values. A
multiplier instruction executes in three to four clocks;
however, in pipelined mode, a new result can be
generated every clock for single-precision and every
other clock for double precision.
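The pipelined result rates quoted for the adder and multiplier can be related to the peak MFLOPs figures on the front page with a little arithmetic. The following is a minimal sketch, assuming that "peak" means both units deliver results at their best pipelined rate on every clock; the function name and its parameters are illustrative and do not come from the data sheet.

```python
def peak_mflops(clock_mhz, adder_results_per_clock, multiplier_results_per_clock):
    """Peak floating-point results per microsecond (MFLOPs).

    Assumes the adder and multiplier run in parallel at their best
    pipelined rates, which is an idealized upper bound.
    """
    return clock_mhz * (adder_results_per_clock + multiplier_results_per_clock)


# Single precision: adder and multiplier each produce one result per clock.
single_precision = peak_mflops(40, 1.0, 1.0)

# Double precision: adder one result per clock, multiplier one every other clock.
double_precision = peak_mflops(40, 1.0, 0.5)

print(single_precision, double_precision)
```

At the 40 MHz clock rate this gives 80 and 60, which agrees with the single- and double-precision peak figures listed among the product features.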
The graphics unit has special integer logic that supports three-dimensional drawing in a graphics frame
buffer, with color intensity shading and hidden surface elimination via the Z-buffer algorithm. The
graphics unit recognizes the pixel as an 8-, 16-, or
32-bit data type. It can compute individual red, blue,
and green color intensity values within a pixel; but it
does so with parallel operations that take advantage
of the 64-bit internal word size and 64-bit external
bus. The graphics features of the i860 XR microprocessor assume that the surface of a solid object
is drawn with polygon patches whose shapes approximate the original object. The color intensities of
the vertices of the polygon and their distances from
the viewer are known, but the distances and intensities of the other points must be calculated by interpolation. The graphics instructions of the i860 XR
microprocessor directly aid such interpolation.
The paging unit implements protected, paged, virtual
memory via a 64-entry, four-way set-associative
memory called the TLB (Translation Lookaside Buffer). The paging unit uses the TLB to perform the
translation of logical address to physical address,
and to check for access violations. The access protection scheme employs two levels of privilege: user
and supervisor.
The instruction cache is a two-way set-associative
memory of four Kbytes, with 32-byte blocks. It transfers up to 64 bits per clock (320 Mbyte/sec at
40 MHz).
The data cache is a two-way set-associative memory of eight Kbytes, with 32-byte blocks. It transfers
up to 128 bits per clock (640 Mbyte/sec at 40 MHz).
The i860 XR microprocessor normally uses writeback caching, i.e. memory writes update the cache
(if applicable) without necessarily updating memory
immediately; however, caching can be inhibited by
software where necessary.
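The cache sizes, associativity, and transfer widths above determine the number of blocks and sets in each cache and the quoted peak transfer rates. The sketch below works through that arithmetic; the helper names are hypothetical, and "Mbyte" is assumed to mean 10^6 bytes, which is the reading that reproduces the 320 and 640 Mbyte/sec figures.

```python
def cache_geometry(size_bytes, ways, block_bytes):
    """Return (number of blocks, number of sets) for a set-associative cache."""
    blocks = size_bytes // block_bytes
    sets = blocks // ways
    return blocks, sets


def peak_transfer_mbytes_per_sec(bits_per_clock, clock_mhz):
    """Peak transfer rate in Mbyte/sec, taking 1 Mbyte as 10**6 bytes."""
    bytes_per_clock = bits_per_clock // 8
    return bytes_per_clock * clock_mhz


# Instruction cache: 4 Kbytes, two-way, 32-byte blocks, 64 bits per clock.
icache_blocks, icache_sets = cache_geometry(4 * 1024, 2, 32)
icache_rate = peak_transfer_mbytes_per_sec(64, 40)

# Data cache: 8 Kbytes, two-way, 32-byte blocks, 128 bits per clock.
dcache_blocks, dcache_sets = cache_geometry(8 * 1024, 2, 32)
dcache_rate = peak_transfer_mbytes_per_sec(128, 40)

print(icache_blocks, icache_sets, icache_rate)
print(dcache_blocks, dcache_sets, dcache_rate)
```

Under these assumptions the instruction cache holds 128 blocks in 64 sets and the data cache holds 256 blocks in 128 sets. How address bits select a set is not stated in this section, so the sketch does not model it.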
The bus and cache control unit performs data and
instruction accesses for the core unit. It receives cycle requests and specifications from the core unit,
performs the data-cache or instruction-cache miss
processing, controls TLB translation, and provides
the interface to the external bus. Its pipelined structure supports up to three outstanding bus cycles.
2.0 PROGRAMMING INTERFACE
The programmer-visible aspects of the architecture
of the i860 XR microprocessor include data types,
registers, instructions, and traps.
2.1 Data Types
The i860 XR microprocessor provides operations for
integer and floating-point data. Integer operations
are performed on 32-bit operands, with some support
also for 64-bit operands. Load and store instructions
can reference 8-bit, 16-bit, 32-bit, 64-bit, and 128-bit
operands. Floating-point operations are performed
on IEEE-standard 32- and 64-bit formats. Graphics-oriented
instructions operate on arrays of 8-, 16-, or
32-bit pixels.
2.1.1 INTEGER
An integer is a 32-bit signed value in standard two's
complement form. A 32-bit integer can represent a
value in the range -2,147,483,648 (-2^31) to
2,147,483,647 (+2^31 - 1). Arithmetic operations on
8- and 16-bit integers can be performed by sign-extending the 8- or 16-bit values to 32 bits, then using
the 32-bit operations.
There are also add and subtract instructions that operate on 64-bit long integers.
Load and store instructions may also reference (in
addition to the 32- and 64-bit formats previously
mentioned) 8- and 16-bit items in memory. When an
8- or 16-bit item is loaded into a register, it is converted to an integer by sign-extending the value to
32 bits. When an 8- or 16-bit item is stored from a
register, the corresponding number of low-order bits
of the register are used.
2.1.2 ORDINAL
Arithmetic operations are available for 32-bit ordinals. An ordinal is an unsigned integer. An ordinal
can represent values in the range 0 to
4,294,967,295 (+2^32 - 1).
Also, there are add and subtract instructions that operate on 64-bit ordinals.
2.1.3 SINGLE- AND DOUBLE-PRECISION REAL
Figure 2.1 shows the real number formats. A single-precision real (also called "single real") data type is
a 32-bit binary floating-point number. Bit 31 is the
sign bit; bits 30..23 are the exponent; and bits 22..0
are the fraction. In accordance with ANSI/IEEE
standard 754, the value of a single-precision real is
defined as follows:
1. If e = 0 and f* 0 or e = 255 then generate a
floating-point source-exception trap when encountered in a floating-point operation.
2. If 0 < e < 255,then the value is C-'1)S'X 1.1 x
2 e - 127.
3. If e
= 0 and f = 0, then'the value is signed zero.
A double-precision real (also called "double real")
data type is a 64-bit binary floating-point riumber. Bit
63 is the sign bit; bits 62.~52 are the exponent; and
bits 51 .. 0 are the fraction. In accordance with ANSI/
IEEE standard 754,the value of a doublecprecision
real is defined as ,follows:
.
.
".
1. If e = 0 and f,= 0 or e = 2047, then generate a
floating-point source-exceptiori trap when Emcountered in a floating-point operation. . '
2. If 0
< e < 2047, then the value is (-1)S x 1.1 x
29-1023.
'
[Diagram: Single-Precision Real with SIGN (bit 31), EXPONENT (bits 30..23), and FRACTION (bits 22..0); Double-Precision Real with SIGN (bit 63), EXPONENT (bits 62..52), and FRACTION (bits 51..0).]
Figure 2.1. Real Number Formats
i860TM XR MICROPROCESSOR
3. If e = 0 and f = 0, then the value is signed zero.
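The three-case definitions of single- and double-precision reals can be sketched as one small classifier. This is a minimal illustration, assuming the bit layouts stated in this section (1 sign bit, then 8 or 11 exponent bits, then 23 or 52 fraction bits); the function name classify_real and its return convention are hypothetical, and exact rational arithmetic is used only to keep the example unambiguous.

```python
from fractions import Fraction

def classify_real(bits, exp_bits, frac_bits):
    """Classify a raw bit pattern using the three rules of this section.

    Returns ("source-exception trap", None), ("signed zero", sign_bit),
    or ("normal", exact_value).
    """
    max_e = (1 << exp_bits) - 1             # 255 (single) or 2047 (double)
    bias = (1 << (exp_bits - 1)) - 1        # 127 (single) or 1023 (double)
    s = (bits >> (exp_bits + frac_bits)) & 1
    e = (bits >> frac_bits) & max_e
    f = bits & ((1 << frac_bits) - 1)
    if e == max_e or (e == 0 and f != 0):   # rule 1: infinity, NaN, denormal
        return ("source-exception trap", None)
    if e == 0:                              # rule 3: e = 0 and f = 0
        return ("signed zero", s)
    significand = 1 + Fraction(f, 1 << frac_bits)   # the "1.f" of rule 2
    return ("normal", (-1) ** s * significand * Fraction(2) ** (e - bias))
```

Calling classify_real(pattern, 8, 23) applies the single-precision rules and classify_real(pattern, 11, 52) the double-precision rules.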
The special values infinity, NaN ("Not a Number"),
indefinite, and denormal generate a trap when encountered. The trap handler implements IEEE-standard results.
A double real value occupies an even/odd pair of
floating-point registers. Bits 31..0 are stored in the
even-numbered floating-point register; bits 63..32
are stored in the next higher odd-numbered floating-point register.

Table 2.1. Pixel Formats
Pixel Size   Bits of Color 1   Bits of Color 2   Bits of Color 3   Bits of Other
(in bits)    Intensity         Intensity         Intensity         Attribute (Texture)
8            N (≤ 8) bits of intensity*                            8 - N
16           6                 6                 4                 -
32           8                 8                 8                 8
The intensity attribute fields may be assigned to colors in
any order convenient to the application.
2.1.4 PIXEL
A pixel may be 8, 16, or 32 bits long depending on
color and intensity resolution requirements. Regardless
of the pixel size, the i860 XR microprocessor
always operates on 64 bits worth of pixels at a time.
The pixel data type is used by two kinds of instructions:
• The selective pixel-store instruction that helps implement hidden surface elimination.
• The pixel add instruction that helps implement
3-D color intensity shading.
*With 8-bit pixels, up to 8 bits can be used for intensity; the
remaining bits can be used for any other attribute, such as
color. The intensity bits must be the low-order bits of the
pixel.
2.2 Register Set
As Figure 2.3 shows, the i860 XR microprocessor
has the following registers:
• An integer register file
• A floating-point register file
• Six control registers (psr, epsr, db, dirbase, fir,
and fsr)
• Four special-purpose registers (KR, KI, T, and
MERGE)
To perform color intensity shading efficiently in a variety of applications, the i860 XR microprocessor defines three pixel formats according to Table 2.1.
Figure 2.2 illustrates one way of assigning meaning
to the fields of pixels. These assignments are for
illustration purposes only. The i860 XR microprocessor defines only the field sizes, not the specific use
of each field. Other ways of using the fields of pixels
are possible.
The control registers are accessible only by load
and store control-register instructions; the integer
and floating-point registers are accessed by arithmetic operations and load and store instructions. The
special-purpose registers KR, KI, T, and MERGE are
used by a few specific instructions.
[Diagram: example field assignments for an 8-BIT PIXEL (fields C and I), a 16-BIT PIXEL (fields R, G, and B), and a 32-BIT PIXEL (fields T, R, G, and B).]
I-Intensity, R-Red intensity, G-Green intensity, B-Blue intensity, C-Color, T-Texture
These assignments of specific meanings to the fields of pixels are for illustration purposes only. Only the field sizes are
defined, not the specific use of each field.
Figure 2.2. Pixel Format Example
2.2.1 INTEGER REGISTER FILE
There are 32 integer registers, each 32 bits wide,
referred to as r0 through r31, which are used for
address computation and scalar integer computations. Register r0 always returns zero when read,
independently of what is stored in it.
2.2.2 FLOATING-POINT REGISTER FILE
There are 32 floating-point registers, each 32 bits
wide, referred to as f0 through f31, which are used
for floating-point computations. Registers f0 and f1
always return zero when read, independently of
what is stored in them. The floating-point registers
are also used by a set of graphics operations, primarily for 3D graphics computations.
When accessing 64-bit floating-point or integer values, the i860 XR microprocessor uses an even/odd
pair of registers. When accessing 128-bit values, it
uses an aligned set of four registers (f0, f4, f8, ...,
f28). The instruction must designate the lowest register number of the set of registers containing 64- or
128-bit values. Misaligned register numbers produce
undefined results. The register with the lowest number contains the least significant part of the value.
For 128-bit values, the register pair with the lower
numbers contains the least significant 64 bits while
the register pair with the higher numbers contains the
most significant 64 bits.
The 128-bit load and store instructions, along with
the 128-bit data path between the floating-point registers and the data cache, help to sustain the extraordinarily high rate of computation.
2.2.3 PROCESSOR STATUS REGISTER
The processor status register (psr) contains miscellaneous state information for the current process.
Figure 2.4 shows the format of the psr.
• BR (Break Read) and BW (Break Write) enable a
data access trap when the operand address
matches the address in the db register and a
read or write (respectively) occurs.
• Various instructions set CC (Condition Code) according to tests they perform. The branch-on-condition-code
instructions test its value. The bla
instruction sets and tests LCC (Loop Condition
Code).
• IM (Interrupt Mode) enables external interrupts if
set; disables interrupts if clear.
• U (User Mode) is set when the i860 XR microprocessor
is executing in user mode; it is clear
when the i860 XR microprocessor is executing in
supervisor mode. In user mode, writes to some
control registers are inhibited. This bit also controls the memory protection mechanism. See
section 2.4.4.3 for a description of memory protection in user and supervisor modes.
Figure 2.3. Registers and Data Paths
[Diagram: psr bit fields, in the order shown in the figure: BREAK READ, BREAK WRITE, CONDITION CODE, LOOP CONDITION CODE, INTERRUPT MODE, PREVIOUS INTERRUPT MODE, USER MODE, PREVIOUS USER MODE, INSTRUCTION TRAP, INTERRUPT, INSTRUCTION ACCESS TRAP, DATA ACCESS TRAP, FLOATING-POINT TRAP, DELAYED SWITCH, DUAL INSTRUCTION MODE, KILL NEXT FLOATING-POINT INSTRUCTION, (RESERVED), SHIFT COUNT, PIXEL SIZE, and PIXEL MASK, which ends at bit 31.]
*Can be changed only from supervisor level.
Figure 2.4 Processor Status Register
[Diagram: epsr bit fields; the legible labels are PROCESSOR TYPE, INTERLOCK, WRITE-PROTECT MODE, DATA CACHE SIZE, OVERFLOW FLAG, and reserved fields.]
*Can be changed only from supervisor level.
Figure 2.5. Extended Processor Status Register
• PIM (Previous Interrupt Mode) and PU (Previous
User Mode) save the corresponding status bits
(IM and U) on a trap, because those status bits
are changed when a trap occurs. They are restored into their corresponding status bits when
returning from a trap handler with a branch indirect instruction when a trap flag is set in the psr.
• FT (Floating-Point Trap), DAT (Data Access
Trap), IAT (Instruction Access Trap), IN (Interrupt), and IT (Instruction Trap) are trap flags.
They are set when the corresponding trap condition occurs. The trap handler examines these bits
to determine which condition or conditions have
caused the trap.
• DS (Delayed Switch) is set if a trap occurs during
the instruction before dual-instruction mode is entered or exited. If DS is set and DIM (Dual Instruction Mode) is clear, the i860 XR microprocessor
switches to dual-instruction mode one instruction
after returning from the trap handler. If DS and
DIM are both set, the i860 XR microprocessor
switches to single-instruction mode one instruction after returning from the trap handler.
• When a trap occurs, the i860 XR microprocessor
sets DIM if it is executing in dual-instruction
mode; it clears DIM if it is executing in single-instruction mode. If DIM is set after returning from a
trap handler, the i860 XR microprocessor resumes execution in dual-instruction mode.
• When KNF (Kill Next Floating-Point Instruction) is
set, the next floating-point instruction is suppressed (except that its dual-instruction mode bit
is interpreted). A trap handler sets KNF if the
trapped floating-point instruction should not be
reexecuted.
• SC (Shift Count) stores the shift count used by
the last right-shift instruction. It controls the number of shifts executed by the double-shift instruction.
• PS (Pixel Size) and PM (Pixel Mask) are used by
the pixel-store instruction and by the graphics instructions. The values of PS control pixel size as
defined by Table 2.2. The bits in PM correspond
to pixels to be updated by the pixel-store instruction pst.d. The low-order bit of PM corresponds
to the low-order pixel of the 64-bit source operand of pst.d. The number of low-order bits of PM
that are actually used is the number of pixels that
fit into 64 bits, which depends upon PS. If a bit of
PM is set, then pst.d stores the corresponding
pixel. Refer also to the pst.d instruction in section 8.
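The relationship between PS, PM, and the pixels that pst.d updates can be sketched as follows. This is an illustrative model under the encodings of Table 2.2 (00 = 8 bits, 01 = 16 bits, 10 = 32 bits); the function name pixels_stored is hypothetical and the sketch does not model the store itself.

```python
# PS encodings from Table 2.2; the value 0b11 is undefined and is rejected.
PIXEL_BITS = {0b00: 8, 0b01: 16, 0b10: 32}

def pixels_stored(ps, pm):
    """Return the indices (0 = low-order pixel) of the pixels pst.d would store."""
    if ps not in PIXEL_BITS:
        raise ValueError("PS value is undefined")
    count = 64 // PIXEL_BITS[ps]       # pixels that fit in the 64-bit operand
    # Only the low-order `count` bits of PM are used.
    return [i for i in range(count) if (pm >> i) & 1]
```

With 32-bit pixels only two PM bits are significant, so pixels_stored(0b10, 0xFF) selects pixels 0 and 1 and ignores the remaining mask bits.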
2.2.4 EXTENDED PROCESSOR STATUS
REGISTER
The extended processor status register (epsr) contains additional state information for the current process beyond that stored in the psr. Figure 2.5 shows
the format of the epsr.
• The processor type is one for the i860 XR microprocessor.
• The stepping number has a unique value that distinguishes among different revisions of the processor.
• IL (Interlock) is set if a trap occurs after a lock
instruction but before the load or store following
the subsequent unlock instruction. IL indicates to
the trap handler that a locked sequence has
been interrupted. When the trap handler finds IL
set, it should scan backwards for the lock instruction and restart at that point. The absence of
a lock instruction within 30-33 instructions of the
trap indicates a programming error.
Table 2.2. Values of PS
Value   Pixel Size in bits   Pixel Size in bytes
00      8                    1
01      16                   2
10      32                   4
11      (undefined)          (undefined)
• WP (write protect) controls the semantics of the
W bit of page table entries. A clear W bit in either
the directory or the page table entry causes
writes to be trapped. When WP is clear, writes
are trapped in user mode, but not in supervisor
mode. When WP is set, writes are trapped in both
user and supervisor modes. After the value of the
WP bit is changed, the TLB must be invalidated
by setting the ITI bit of the dirbase register, before any stores are performed.
• INT (Interrupt) is the value of the INT input pin.
• DCS (Data Cache Size) is a read-only field that
tells the size of the on-chip data cache. The number of bytes actually available is 2^(12+DCS); therefore, a value of zero indicates 4 Kbytes, one indicates 8 Kbytes, etc.
[Diagram: dirbase bit fields, in the order shown in the figure: ADDRESS TRANSLATION ENABLE, DRAM PAGE SIZE, BUS LOCK, I-CACHE, TLB INVALIDATE, (RESERVED), CODE SIZE 8-BIT, REPLACEMENT BLOCK, REPLACEMENT CONTROL, and DIRECTORY TABLE BASE (DTB) in bits 31..12.]
*Can be changed only from supervisor level.
Figure 2.6. Directory Base Register
• PBM (Page-Table Bit Mode) determines which bit
of page-table entries is output on the PTB pin.
When PBM is clear, the PTB signal reflects bit CD
of the page-table entry used for the current cycle.
When PBM is set, the PTB signal reflects bit WT
of the page-table entry used for the current cycle.
• BE (Big Endian) controls the ordering of bytes
within a data item in memory. Normally (i.e. when
BE is clear) the i860 XR microprocessor operates
in little endian mode, in which the addressed byte
is the low-order byte. When BE is set (big endian
mode), the low-order three bits of all load and
store addresses are complemented, then
masked to the appropriate boundary for alignment. This causes the addressed byte to be the
most significant byte. Section 2.3 discusses little
and big endian addressing.
• OF (Overflow Flag) is set by adds, addu, subs,
and subu when integer overflow occurs. For
adds and subs, OF is set if the carry from bit 31
is different from the carry from bit 30. For addu,
OF is set if there is a carry from bit 31. For subu,
OF is set if there is no carry from bit 31. Under all
other conditions, it is cleared by these instructions. OF controls the function of the intovr
instruction. OF cannot be written in user mode
using st.c.
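The OF rules for adds, addu, subs, and subu can be sketched as a small model. This is an illustration only: it assumes that subtraction a - b is carried out as a + (NOT b) + 1, which is the usual two's complement formulation and is what makes "no carry from bit 31" mean a borrow for subu. The function names are hypothetical.

```python
MASK32 = 0xFFFFFFFF
MASK31 = 0x7FFFFFFF

def _carries(a, b, carry_in):
    """Return (carry out of bit 30, carry out of bit 31) for a + b + carry_in."""
    c30 = ((a & MASK31) + (b & MASK31) + carry_in) >> 31
    c31 = ((a & MASK32) + (b & MASK32) + carry_in) >> 32
    return c30, c31

def overflow_flag(op, a, b):
    """Model the OF result of op applied to 32-bit operands a and b."""
    if op in ("adds", "addu"):
        c30, c31 = _carries(a, b, 0)
    elif op in ("subs", "subu"):
        # Assumption: a - b is computed as a + ~b + 1.
        c30, c31 = _carries(a, ~b & MASK32, 1)
    else:
        raise ValueError("unknown operation: " + op)
    if op in ("adds", "subs"):
        return c30 != c31          # signed: carries from bits 31 and 30 differ
    if op == "addu":
        return c31 == 1            # unsigned add: carry from bit 31
    return c31 == 0                # unsigned subtract: no carry from bit 31
```

For instance, overflow_flag("adds", 0x7FFFFFFF, 1) is true because the signed sum does not fit in 32 bits, while overflow_flag("subu", 2, 1) is false because no borrow occurs.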
2.2.5 DATA BREAKPOINT REGISTER
The data breakpoint register (db) is used to generate a trap when the i860 XR microprocessor makes
a data-operand access to the address stored in this
register. The trap is enabled by BR and BW in psr.
The db register can only be changed from supervisor level. When comparing, a number of low order
bits of the address are ignored, depending on the
size of the operand. For example, a 16-bit access
ignores the low-order bit of the address when comparing to db; a 32-bit access ignores the low-order
two bits. This ensures that any access that overlaps
the address contained in the register will generate a
trap. The DAT occurs before the data is accessed
and prevents the load or store from completing.
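The address comparison described above can be sketched as follows. This is a simplified model, not the hardware comparator: the function name and parameters are hypothetical, and operand sizes are given in bytes and assumed to be powers of two.

```python
def data_breakpoint_hit(db, address, size_bytes, is_write, br, bw):
    """Return True if this access would raise the data-access trap.

    br and bw model the BR and BW bits of the psr.
    """
    enabled = bw if is_write else br
    # Ignore as many low-order address bits as the operand size covers,
    # so any access overlapping the address in db matches.
    mask = ~(size_bytes - 1) & 0xFFFFFFFF
    return bool(enabled) and (db & mask) == (address & mask)
```

A 32-bit read at address 0x1000 therefore matches a db value of 0x1003, because the low-order two bits are ignored.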
2.2.6 DIRECTORY BASE REGISTER
The directory base register dirbase (shown in Figure
2.6) controls address translation, caching, and bus
options. The dirbase register can only be changed
from supervisor level. The BL bit is changed from
user level with the lock and unlock instructions.
• ATE (Address Translation Enable), when set, enables the virtual-address translation algorithm.
The data cache must be flushed before changing
the ATE bit.
• DPS (DRAM Page Size) controls how many bits
to ignore when comparing the current bus-cycle
address with the previous bus-cycle address to
generate the NENE# signal. This feature allows
for higher speeds when using static column or
page-mode DRAMs and consecutive reads and
writes access the same row. The comparison ignores
the low-order 12 + DPS bits. A value of zero is
appropriate for one bank of 256K x n RAMs, 1
for 1M x n RAMs, etc. For interleaved memory,
increase DPS by one for each power of interleaving: add one for 2-way, two for 4-way, etc.
• When BL (Bus Lock) is set, external bus accesses are locked. The LOCK# signal is asserted the
next bus cycle whose internal bus request is generated after BL is set. It remains set on every
subsequent bus cycle as long as BL remains set.
The LOCK# signal is deasserted on the next
load or store instruction after BL is cleared. Traps
immediately clear BL. The lock and unlock
instructions control the BL bit. The result of modifying BL with the st.c instruction is not defined.
• ITI (I-Cache, TLB Invalidate), when set in the value that is loaded into dirbase, causes all entries
in the instruction cache and address-translation
cache (TLB) to be invalidated. The ITI bit does
not remain set in dirbase. ITI always appears as
zero when reading dirbase. Section 2.5 discusses flushing the data cache before invalidating the
TLB.
• When CS8 (Code Size 8-Bit) is set, instruction
cache misses are processed as 8-bit bus cycles.
When this bit is clear, instruction cache misses
are processed as 64-bit bus cycles. This bit cannot
be set by software; hardware sets this bit at
initialization time. It can be cleared by software
(one time only) to allow the system to execute out
of 64-bit memory after bootstrapping from 8-bit
EPROM. A nondelayed branch to code in 64-bit
memory should directly follow the st.c (store control register) instruction that clears CS8, in order
to make the transition from 8-bit to 64-bit memory
occur at the correct time. The branch instruction
must be aligned on a 64-bit boundary.
• RB (Replacement Block) identifies the cache
block to be replaced by cache replacement algorithms. The high-order bit of RB is ignored by the
instruction and data caches. RB conditions the
cache flush instruction flush, which is discussed
in Section 8. Table 2.3 explains the values of RB.
• RC (Replacement Control) controls cache replacement algorithms. Table 2.4 explains the significance of the values of RC.
• DTB (Directory Table Base) contains the high-order 20 bits of the physical address of the page
directory when address translation is enabled (i.e.
ATE = 1). The low-order 12 bits of the address
are zeros.
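Two of the dirbase computations described in this list reduce to simple shifts, sketched here for illustration. The function names are hypothetical. The first models only the address comparison that DPS controls (not the timing or polarity of the NENE# signal); the second forms the page directory address from the 20-bit DTB value.

```python
def same_dram_page(prev_addr, cur_addr, dps):
    """Compare two bus-cycle addresses, ignoring the low-order 12 + DPS bits."""
    shift = 12 + dps
    return (prev_addr >> shift) == (cur_addr >> shift)

def page_directory_address(dtb):
    """Physical address of the page directory: DTB is the high-order 20 bits,
    and the low-order 12 bits are zeros."""
    return (dtb & 0xFFFFF) << 12
```

With DPS = 0 the addresses 0x0FFF and 0x1000 fall in different 4-Kbyte regions; with DPS = 1 they compare equal.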
[Diagram: fsr bit fields, in the order shown in the figure: FLUSH ZERO, TRAP INEXACT, ROUNDING MODE, UPDATE, FLOATING-POINT TRAP ENABLE, (RESERVED), STICKY INEXACT FLAG, SOURCE EXCEPTION, MULTIPLIER UNDERFLOW, MULTIPLIER OVERFLOW, MULTIPLIER INEXACT, MULTIPLIER ADD ONE, ADDER UNDERFLOW, ADDER OVERFLOW, ADDER INEXACT, ADDER ADD ONE, RESULT REGISTER, ADDER EXPONENT, (RESERVED), LOAD PIPE RESULT PRECISION, INTEGER (GRAPHICS) PIPE RESULT PRECISION, MULTIPLIER PIPE RESULT PRECISION, ADDER PIPE RESULT PRECISION, (RESERVED).]
Figure 2.7. Floating-Point Status Register
Table 2.3. Values of RB
Value   Replace TLB Block   Replace Instruction and Data Cache Block
00      0                   0
01      1                   1
10      2                   0
11      3                   1
Table 2.4. Values of RC
Value   Meaning
00      Selects the normal replacement algorithm where any block in the set may be replaced on cache misses in all caches.
01      Instruction, data, and TLB cache misses replace the block selected by RB. The instruction and data caches ignore the high-order bit of RB. This mode is used for instruction cache and TLB testing.
10      Data cache misses replace the block selected by the low-order bit of RB. Instruction and TLB caches use random replacement.
11      Disables data cache replacement. Instruction and TLB caches use random replacement.
2.2.7 FAULT INSTRUCTION REGISTER
When a trap occurs, this register contains the address of the trapping instruction (not necessarily the
instruction that created the conditions that required
the trap). The fir is a read-only register. In single-instruction mode, using a ld.c instruction to read the
fir anytime except the first time after a trap saves in
idest the address of the ld.c instruction; in dual-instruction mode, the address of its floating-point companion (address of the ld.c - 4) is saved.
2.2.8 FLOATING-POINT STATUS REGISTER
The floating-point status register (fsr) contains the
floating-point trap and rounding-mode status for the
current process. Figure 2.7 shows its format. The fsr
is writable in user level.
• If FZ (Flush Zero) is clear and underflow occurs,
a result-exception trap is generated. When FZ is
set and underflow occurs, the result is set to zero,
and no trap due to underflow occurs.
• If TI (Trap Inexact) is clear, inexact results do not
cause a trap. If TI is set, inexact results cause a
trap. The sticky inexact flag (SI) is set whenever
an inexact result is produced, regardless of the
setting of TI.
• RM (Rounding Mode) specifies one of the four
rounding modes defined by the IEEE standard.
Given a true result b that cannot be represented
by the target data type, the i860 XR microprocessor determines the two representable numbers a
and c that most closely bracket b in value (a < b
< c). The i860 XR microprocessor then rounds
(changes) b to a or c according to the mode selected by RM as defined in Table 2.5. Rounding
introduces an error in the result that is less than
one least-significant bit.

Table 2.5. Values of RM
Value   Rounding Mode              Rounding Action
00      Round to nearest or even   Closer to b of a or c; if equally close, select even number (the one whose least significant bit is zero).
01      Round down (toward -∞)     a
10      Round up (toward +∞)       c
11      Chop (toward zero)         Smaller in magnitude of a or c.
• The U-bit (Update Bit), if set in the value that is
loaded into fsr by a st.c instruction, enables updating of the result-status bits (AE, AA, AI, AO,
AU, MA, MI, MO, and MU) in the first stage of the
floating-point adder and multiplier pipelines. If this
bit is clear, the result-status bits are unaffected
by a st.c instruction; st.c ignores the corresponding bits in the value that is being loaded. A st.c
always updates fsr bits 21..17 and 8..0 directly.
The U-bit does not remain set; it always appears
as zero when read.
• The FTE (Floating-Point Trap Enable) bit, if clear,
disables all floating-point traps (invalid input operand, overflow, underflow, and inexact result).
• SI (Sticky Inexact) is set when the last stage result of either the multiplier or adder is inexact (i.e.
when either AI or MI is set). SI is "sticky" in the
sense that it remains set until reset by software.
AI and MI, on the other hand, can be changed by
the subsequent floating-point instruction.
• SE (Source Exception) is set when one of the
source operands of a floating-point operation is
invalid; it is cleared when all the input operands
are valid. Invalid input operands include denormals, infinities, and all NaNs (both quiet and signaling).
• When read from the fsr, the result-status bits MA,
MI, MO, and MU (Multiplier Add-One, Inexact,
Overflow, and Underflow, respectively) describe
the last stage result of the multiplier.
• When read from the fsr, the result-status bits AA,
AI, AO, AU, and AE (Adder Add-One, Inexact,
Overflow, Underflow, and Exponent, respectively)
describe the last stage result of the adder. The
high-order three bits of the 11-bit exponent of the
adder result are stored in the AE field.
• The Adder Add One and Multiplier Add One bits
indicate that the absolute value of the result fraction
grew by one least-significant bit due to
rounding. AA and MA are not influenced by the
sign of the result.
• After a floating-point operation in a given unit (adder or multiplier), the result-status bits of that unit
are undefined until the point at which result exceptions are reported.
• When written to the fsr with the U-bit set, the
result-status bits are placed into the first stage of
the adder and multiplier pipelines. When the
processor executes pipelined operations, it propagates the result-status bits of a particular unit
(multiplier or adder) one stage for each pipelined
floating-point operation for that unit. When they
reach the last stage, they replace the normal result-status bits in the fsr. When the U-bit is not
set, result-status bits in the word being written to
the fsr are ignored.
• In a floating-point dual-operation instruction (e.g.
add-and-multiply or subtract-and-multiply), both
the multiplier and the adder may set exception
bits. The result-status bits for a particular unit remain set until the next operation that uses that
unit.
• RR (Result Register) specifies which floating-point register (f0-f31) was the destination register when a result-exception trap occurs due to a
scalar operation.
• LRP (Load Pipe Result Precision), IRP (Integer
(Graphics) Pipe Result Precision), MRP (Multiplier
Pipe Result Precision), and ARP (Adder Pipe Result Precision) aid in restoring pipeline state after
a trap or process switch. Each defines the precision of the last stage result in the corresponding
pipeline. One of these bits is set when the result
in the last stage of the corresponding pipeline is
double precision; it is cleared if the result is single
precision. These bits cannot be changed by software.
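The four rounding actions selected by RM can be illustrated with a toy model. As a stated simplification, the "target data type" here is the integers rather than a floating-point format, so the bracketing values a and c are simply the floor and ceiling of the true result b, and "even" means an even integer. The function name round_by_rm is hypothetical.

```python
import math
from fractions import Fraction

def round_by_rm(b, rm):
    """Round the exact value b to a neighboring integer according to RM."""
    a, c = math.floor(b), math.ceil(b)     # a <= b <= c
    if a == c:
        return a                           # b is representable; no rounding
    if rm == 0b00:                         # round to nearest or even
        if b - a != c - b:
            return a if b - a < c - b else c
        return a if a % 2 == 0 else c      # tie: pick the even neighbor
    if rm == 0b01:                         # round down (toward -infinity)
        return a
    if rm == 0b10:                         # round up (toward +infinity)
        return c
    if rm == 0b11:                         # chop (toward zero)
        return a if abs(a) < abs(c) else c
    raise ValueError("RM is a two-bit field")
```

For b = -5/2 the modes 01, 10, and 11 give -3, -2, and -2 respectively, and mode 00 resolves the tie to the even value -2.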
2.2.9 KR, KI, T, AND MERGE REGISTERS
The KR, KI, and T registers are special-purpose registers used by the dual-operation floating-point
instructions pfam, pfmam, pfsm, and pfmsm,
which initiate both an adder (A-unit) operation and a
multiplier (M-unit) operation. The KR, KI, and T registers can store values from one dual-operation instruction and supply them as inputs to subsequent
dual-operation instructions. (Refer to Figure 2.14.)
The MERGE register is used only by the graphics
instructions. The purpose of the MERGE register is
to accumulate (or merge) the results of multiple-addition operations that use as operands the color-intensity values from pixels or distance values from a
Z-buffer. The accumulated results can then be
stored in one 64-bit operation.
Two multiple-addition instructions and an OR instruction use the MERGE register. The addition instructions are designed to add interpolation values
to each color-intensity field in an array of pixels or to
each distance value in a Z-buffer.
Refer to the instruction descriptions in section 8 for
more information about these registers.
2.3 Addressing
Memory is addressed in byte units with a paged virtual-address space of 2^32 bytes. Data and instructions can be located anywhere in this address
space. Address arithmetic is performed using 32-bit
input values and produces 32-bit results. The low-order 32 bits of the result are used in case of overflow.
Normally, multibyte data values are stored in memory in little endian format, i.e., with the least significant
byte at the lowest memory address. As an option,
the ordering can be dynamically selected by software in supervisor mode. The i860 XR microprocessor also offers big endian mode, in which the most
significant byte of a data item is at the lowest address. Figure 2.8 shows the difference between the
two storage modes. Big endian and little endian data
areas should not be mixed within a 64-bit data word.
Illustrations of data structures in this data sheet
show data stored in little endian mode, i.e., the low-order byte is at the lowest memory address.
Code accesses are always done with little endian
addressing. This implies that code will appear differently than documented here when accessed as big
endian data. Intel recommends that disassemblers
running in a big endian system convert instructions
which have been read as data back to little endian
form and present them in the format documented
here.
Page directories and page tables are also accessed
in little endian mode, regardless of the value of the
BE bit.
Alignment requirements are as follows (any violation
results in a data-access trap):
• 128-bit values are aligned on 16-byte boundaries
when referenced in memory (i.e. the four least
significant address bits must be zero).
• 64-bit values are aligned on 8-byte boundaries
when referenced in memory (i.e. the three least
significant address bits must be zero).
• 32-bit values are aligned on 4-byte boundaries
when referenced in memory (i.e. the two least
significant address bits must be zero).
• 16-bit values are aligned on 2-byte boundaries
when referenced in memory (i.e. the least significant address bit must be zero).
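The alignment rules of this section, together with the big endian address transformation given in the description of the BE bit in section 2.2.4 (complement the low-order three address bits, then mask to the operand's alignment boundary), can be sketched as follows. The function names are hypothetical and operand sizes are in bytes.

```python
def is_aligned(address, size_bytes):
    """True if an operand of size_bytes (2, 4, 8, or 16) meets the alignment rule."""
    return address & (size_bytes - 1) == 0

def big_endian_address(address, size_bytes):
    """Address actually used for a load or store when BE is set."""
    complemented = address ^ 0b111                         # complement low 3 bits
    return complemented & ~(size_bytes - 1) & 0xFFFFFFFF   # mask to the boundary
```

Under this model a byte access to address 0 uses address 7 and a 32-bit access to address 0 uses address 4, while 64- and 128-bit accesses are unchanged.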
2.4 Virtual Addressing
When address translation is enabled, the i860 XR
microprocessor maps instruction and data virtual addresses into physical addresses before referencing
memory. This address transformation is compatible
with that of the Intel386™ microprocessor and implements the basic features needed for page-oriented virtual-memory systems and page-level protection.
The address translation is optional. Address translation is in effect only when the ATE bit of dirbase is
set. This bit is typically set by the operating system
during software initialization. The ATE bit must be
set if the operating system is to implement page-oriented protection or page-oriented virtual memory.
[Figure 2.8 diagram: byte placement for ld.b, ld.s, and ld.l loads of bytes A through H of a 64-bit MAIN MEMORY word, across the DATA BUS (d63..d0, with Byte Enables BE#) into register r16 (d31..d0), shown for LITTLE ENDIAN and BIG ENDIAN modes.]
NOTE:
64- and 128-bit big endian accesses are treated the same as little endian accesses.
[Diagram: 32-bit virtual address with DIR in bits 31..22, PAGE in bits 21..12, and OFFSET in bits 11..0.]
Figure 2.9. Format of a Virtual Address
Address translation is disabled when the processor
is reset. It is enabled when a store to dirbase sets
the ATE bit. It is disabled again when a store clears
the ATE bit.
2.4.1 PAGE FRAME
A page frame is a 4-Kbyte unit of contiguous addresses of physical main memory. Page frames begin on 4-Kbyte boundaries and are fixed in size. A
page is the collection of data that occupies a page
frame when that data is present in main memory.
The data may also occupy some location in secondary storage when there is not sufficient space in
main memory.
2.4.2 VIRTUAL ADDRESS
A virtual address refers indirectly to a physical address by specifying a page table, a page within that
table, and an offset within that page. Figure 2.9
shows the format of a virtual address.
Figure 2.10 shows how the i860 XR microprocessor
converts the DIR, PAGE, and OFFSET fields of a
virtual address into the physical address by consulting two levels of page tables. The addressing mechanism uses the DIR field as an index into a page
directory, uses the PAGE field as an index into the
page table determined by the page directory, and
uses the OFFSET field to address a byte within the
page determined by the page table.
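The two-level lookup can be sketched as follows. This is a simplified model under stated assumptions: the field widths are 10, 10, and 12 bits (consistent with 4-Kbyte pages and tables of 1K 32-bit entries), entries are 4 bytes wide, and the present, protection, and caching bits of each entry are ignored. The names split_virtual_address, translate, and read_word are hypothetical; read_word stands for any callable that returns the 32-bit word at a physical address.

```python
def split_virtual_address(va):
    """Return the (DIR, PAGE, OFFSET) fields of a 32-bit virtual address."""
    return (va >> 22) & 0x3FF, (va >> 12) & 0x3FF, va & 0xFFF

def translate(va, dtb, read_word):
    """Walk the page directory and page table to form a physical address."""
    dir_index, page_index, offset = split_virtual_address(va)
    directory_base = (dtb & 0xFFFFF) << 12              # DTB: high-order 20 bits
    dir_entry = read_word(directory_base + 4 * dir_index)
    table_base = dir_entry & 0xFFFFF000                 # page frame address field
    table_entry = read_word(table_base + 4 * page_index)
    return (table_entry & 0xFFFFF000) | offset
```

A dictionary keyed by physical address is enough to exercise the sketch: pass its __getitem__ method as read_word.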
2.4.3 PAGE TABLES
A page table is simply an array of 32-bit page specifiers.
A page table is itself a page, and therefore contains 4 Kbytes of memory or at most 1K 32-bit entries.
[Diagram: DTB locates the PAGE DIRECTORY; the DIR field selects a DIR ENTRY, which locates a PAGE TABLE; the PAGE field selects a PG TBL ENTRY, which locates the PAGE FRAME; the OFFSET field selects the PHYSICAL ADDRESS within that frame.]
Figure 2.10. Address Translation
Two levels of tables are used to address a page of
memory. At the higher level is a page directory. The
page directory addresses up to 1K page tables of
the second level. A page table of the second level
addresses up to 1K pages. All the tables addressed
by one page directory, therefore, can address 1M
pages (2^20). Because each page contains 4 Kbytes
(2^12 bytes), the tables of one page directory can
span the entire physical address space of the i860
XR microprocessor (2^20 × 2^12 = 2^32).
The physical address of the current page directory is
stored in the DTB field of the dirbase register. Memory
management software has the option of using one
page directory for all processes, one page directory
for each process, or some combination of the two.
2.4.4 PAGE-TABLE ENTRIES
Page-table entries (PTEs) in either level of page tables have the same format. Figure 2.11 illustrates
this format.
Figure 2.11. Format of a Page Table Entry
(Fields labeled in the figure: PRESENT, WRITABLE, USER, WRITE-THROUGH, CACHE DISABLE, ACCESSED, DIRTY, (RESERVED), AVAILABLE FOR SYSTEMS PROGRAM USE, PAGE FRAME ADDRESS 31..12.)
NOTE:
X indicates Intel reserved. Do not use.
2.4.4.1 Page Frame Address
The page frame address specifies the physical starting address of a page. Because pages are located
on 4K boundaries, the low-order 12 bits are always
zero. In a page directory, the page frame address is
the address of a page table. In a second-level page
table, the page frame address is the address of the
page frame that contains the desired memory operand.
2.4.4.2 Present Bit
The P (present) bit indicates whether a page table
entry can be used in address translation. P = 1 indicates that the entry can be used. When P = 0 in
either level of page tables, the entry is not valid for
address translation, and the rest of the entry is available for software use; none of the other bits in the
entry is tested by the hardware. If P = 0 in either
level of page tables when an attempt is made to use
a page-table entry for address translation, the processor signals either a data-access fault or an instruction-access fault. In software systems that support paged virtual memory, the trap handler can
bring the required page into physical memory.
Note that there is no P bit for the page directory
itself. The page directory may be not-present while
the associated process is suspended, but the operating system must ensure that the page directory
indicated by the dirbase image associated with the
process is present in physical memory before the
process is dispatched.
2.4.4.3 Writable and User Bits
The W (writable) and U (user) bits are used for page-level protection, which the i860 XR microprocessor
performs at the same time as address translation.
The concept of privilege for pages is implemented
by assigning each page to one of two levels:
1. Supervisor level (U = 0), for the operating system and other systems software and related data.
2. User level (U = 1), for applications procedures
and data.
The U bit of the psr indicates whether the i860 XR
microprocessor is executing at user or supervisor
level. The i860 XR microprocessor maintains the U
bit of psr as follows:
• The i860 XR microprocessor clears the psr U bit
to indicate supervisor level when a trap occurs
(including when the trap instruction causes the
trap). The prior value of U is copied into PU.
• The i860 XR microprocessor copies the psr PU
bit into the U bit when an indirect branch is executed and one of the trap bits is set. If PU was
one, the i860 XR microprocessor enters user
level.
With the U bit of psr and the W and U bits of the
page table entries, the i860 XR microprocessor implements the following protection rules:
• When at user level, a read or write of a supervisor-level page causes a trap.
• When at user level, a write to a page whose W bit
is clear causes a trap.
• When at user level, st.c to certain control registers is ignored.
When the i860 XR microprocessor is executing at
supervisor level, all pages are addressable, but,
when it is executing at user level, only pages that
belong to the user-level are addressable.
When the i860 XR microprocessor is executing at
supervisor level, all pages are readable. Whether a
page is writable depends upon the write-protection
mode controlled by WP of epsr:
WP = 0    All pages are writable.
WP = 1    A write to a page whose W bit is clear causes a trap.
When the i860 XR microprocessor is executing at
user level, only pages that belong to user level and
are marked writable are actually writable; pages that
belong to supervisor level are neither readable nor
writable from user level.
2.4.4.4 Write-Through Bit
The i860 XR microprocessor does not implement a
write-through caching policy for the on-chip data
cache; however, the WT (write-through) bit in the
second-level page-table entry does determine internal caching policy. If WT is set in a PTE, on-chip
caching of data from the corresponding page is inhibited. The i860 XR CPU may place pages having
WT = 1 into the instruction cache. Future implementations of the i860 XR architecture may adhere
to a write-through data caching policy. Therefore,
they may cache pages having the WT bit of the PTE
set. If WT is clear, the normal write-back policy is
applied to data from the page in the on-chip caches.
The WT bit of page directory entries is not referenced by the processor, but is reserved.
The WT bit is independent of the CD bit; therefore,
data may be placed in a second-level coherent
cache, but kept out of the on-chip caches.
2.4.4.5 Cache Disable Bit
If the CD (cache disable) bit in the second-level
page-table entry is set, data from the associated
page is not placed in instruction or data caches.
Clearing CD permits the cache hardware to place
data from the associated page into caches. The CD
bit of page directory entries is not referenced by the
processor, but is reserved.
To control external caches, the i860 XR microprocessor outputs on its PTB pin either the CD or WT bit.
The PBM bit of epsr determines which bit is output.
2.4.4.6 Accessed and Dirty Bits
The A (accessed) and D (dirty) bits provide data
about page usage in both levels of the page tables.
The i860 XR microprocessor sets the corresponding
accessed bits in both levels of page tables before a
read or write operation to a page. The processor
tests the dirty bit in the second-level page table before a write to an address covered by that page table
entry, and, under certain conditions, causes traps.
The trap handler then has the opportunity to maintain appropriate values in the dirty bits. The dirty bit
in directory entries is not tested by the i860 XR microprocessor. The precise algorithm for using these
bits is specified in Section 2.4.5.
An operating system that supports paged virtual
memory can use these bits to determine what pages
to eliminate from physical memory when the demand for memory exceeds the physical memory
available. The D and A bits in the PTE (page-table
entry) are normally initialized to zero by the operating system. The processor sets the A bit when a
page is accessed either by a read or write operation.
When a data- or instruction-access fault occurs, the
trap handler sets the D bit if an allowable write is
being performed, then re-executes the instruction.
The operating system is responsible for coordinating
its updates to the accessed and dirty bits with updates by the CPU and by other processors that may
share the page tables. The i860 XR microprocessor
automatically asserts the LOCK# signal while setting the A bit. If an A-bit of a PTE is found not set
during a locked sequence (created by the lock instruction), a trap will occur and the processor will not
update the A-bit.
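The entry format described in Section 2.4.4 can be sketched as a small decoder. This is a minimal Python illustration, not Intel code: the bit positions (P in bit 0 through D in bit 6) are an assumption that follows the order in which Figure 2.11 labels the fields, and the names `PTE` and `decode_pte` are hypothetical.

```python
from collections import namedtuple

# Assumed bit positions, following the label order of Figure 2.11.
P_BIT, W_BIT, U_BIT, WT_BIT, CD_BIT, A_BIT, D_BIT = range(7)
PFA_MASK = 0xFFFFF000  # page frame address occupies bits 31..12

PTE = namedtuple("PTE", "pfa p w u wt cd a d")


def decode_pte(entry):
    """Split a 32-bit page-table entry into its named fields."""
    def bit(n):
        return (entry >> n) & 1

    return PTE(
        pfa=entry & PFA_MASK,  # low-order 12 bits of the frame address are zero
        p=bit(P_BIT), w=bit(W_BIT), u=bit(U_BIT),
        wt=bit(WT_BIT), cd=bit(CD_BIT),
        a=bit(A_BIT), d=bit(D_BIT),
    )
```

With these assumed positions, `decode_pte(0x00123067)` describes a present, writable, user-level page at frame address 0x00123000 whose A and D bits are both set.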
2.4.4.7 Combining Protection of Both Levels of Page Tables
For any one page, the protection attributes of its
page directory entry may differ from those of its
page table entry. The i860 XR microprocessor computes the effective protection attributes for a page
by examining the protection attributes in both the
directory and the page table. Table 2.6 shows the
effective protection provided by the possible combinations of protection attributes.
2.4.5 ADDRESS TRANSLATION ALGORITHM
The algorithm below defines the translation of each
virtual address to a physical address. Let DIR,
PAGE, and OFFSET be the fields of the virtual address; let PFA1 and PFA2 be the page frame address fields of the first and second level page tables
respectively; DTB is the page directory table base
address stored in the dirbase register.
1. Read the PTE (page table entry) at the physical
address formed by DTB:DIR:00.
2. If P in the PTE is zero, generate a data- or instruction-access fault.
3. If W in the PTE is zero, the operation is a write,
and either the U-bit of the psr is set or WP = 1,
generate a data- or instruction-access fault.
4. If the U-bit in the PTE is zero and the U-bit in the
psr is set, generate a data- or instruction-access
fault.
5. If A in the PTE is zero, and if the TLB miss occurred while the bus was locked, generate a
data- or instruction-access fault. (The trap allows
software to set A to one and restart the sequence. This avoids ambiguity in determining
what address corresponds to a locked semaphore for external bus hardware use.)
6. If A in the PTE is zero, and if the TLB miss occurred while the bus was not locked, assert
LOCK#. Re-fetch and check the PTE, set A, and
store the PTE. Deassert LOCK# during the store.
7. Locate the PTE at the physical address formed by
PFA1:PAGE:00.
8. Perform the P, W, U, and A checks as in steps 2
through 6 with the second-level PTE.
9. If D in the PTE is clear and the operation is a
write, generate a data- or instruction-access fault.
10. Form the physical address as PFA2:OFFSET.
The i860 XR microprocessor looks only in external
memory for Page Directories and Page Tables in
the translation process. The data cache is not
searched. Therefore, any code which modifies Page
Directories or Page Tables must keep them out of
the cache. The tables should be kept in non-cacheable memory, or flushed from the cache.
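The translation steps of Section 2.4.5 can be sketched in Python as follows. This is a simplified model under stated assumptions, not a description of the hardware: memory is a dictionary from physical word address to 32-bit entry, the PTE bit positions are assumed (P, W, U in bits 0 to 2, A in bit 5, D in bit 6), the TLB and the LOCK# bus cycle of step 6 are not modeled, and all names (`AccessFault`, `check_entry`, `translate`) are hypothetical.

```python
P, W, U, A, D = 1 << 0, 1 << 1, 1 << 2, 1 << 5, 1 << 6  # assumed bit positions
PFA_MASK = 0xFFFFF000


class AccessFault(Exception):
    """Stands in for the data- or instruction-access fault."""


def check_entry(mem, addr, write, user, wp, bus_locked):
    """Apply the P, W, U, and A checks (steps 2 through 6) to one entry."""
    pte = mem.get(addr, 0)
    if not pte & P:                                # step 2
        raise AccessFault("P = 0")
    if write and not pte & W and (user or wp):     # step 3
        raise AccessFault("W = 0 on write")
    if user and not pte & U:                       # step 4
        raise AccessFault("U = 0 at user level")
    if not pte & A:
        if bus_locked:                             # step 5
            raise AccessFault("A = 0 during locked sequence")
        pte |= A                                   # step 6 (LOCK# not modeled)
        mem[addr] = pte
    return pte


def translate(mem, dtb, vaddr, write=False, user=False, wp=False,
              bus_locked=False):
    """Return the physical address for vaddr, or raise AccessFault."""
    dir_i = (vaddr >> 22) & 0x3FF
    page_i = (vaddr >> 12) & 0x3FF
    offset = vaddr & 0xFFF
    # Step 1: directory entry at DTB:DIR:00.
    pde = check_entry(mem, (dtb & PFA_MASK) | (dir_i << 2),
                      write, user, wp, bus_locked)
    # Steps 7 and 8: second-level entry at PFA1:PAGE:00.
    pte = check_entry(mem, (pde & PFA_MASK) | (page_i << 2),
                      write, user, wp, bus_locked)
    if write and not pte & D:                      # step 9
        raise AccessFault("D = 0 on write")
    return (pte & PFA_MASK) | offset               # step 10: PFA2:OFFSET
```

In this sketch a write to a page whose D bit is clear always faults, which gives a trap handler the opportunity to set D and re-execute the instruction, as described in Section 2.4.4.6.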
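Table 2.6. Combining Directory and Page Protections
| Directory U-bit | Directory W-bit | Page Table U-bit | Page Table W-bit | User Access (WP = X) | Supervisor Access (WP = 0) | Supervisor Access (WP = 1) |
| 0 | 0 | 0 | 0 | N | R/W | R |
| 0 | 0 | 0 | 1 | N | R/W | R |
| 0 | 0 | 1 | 0 | N | R/W | R |
| 0 | 0 | 1 | 1 | N | R/W | R |
| 0 | 1 | 0 | 0 | N | R/W | R |
| 0 | 1 | 0 | 1 | N | R/W | R/W |
| 0 | 1 | 1 | 0 | N | R/W | R |
| 0 | 1 | 1 | 1 | N | R/W | R/W |
| 1 | 0 | 0 | 0 | N | R/W | R |
| 1 | 0 | 0 | 1 | N | R/W | R |
| 1 | 0 | 1 | 0 | R | R/W | R |
| 1 | 0 | 1 | 1 | R | R/W | R |
| 1 | 1 | 0 | 0 | N | R/W | R |
| 1 | 1 | 0 | 1 | N | R/W | R/W |
| 1 | 1 | 1 | 0 | R | R/W | R |
| 1 | 1 | 1 | 1 | R/W | R/W | R/W |
NOTES:
N = No access allowed
R = Read access only
R/W = Both reads and writes allowed
X = Don't care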
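The rows of Table 2.6 follow from applying the U and W checks at both table levels. The Python sketch below is one way to express that combination; the function name `combined_access` and its argument names are illustrative assumptions, and the code models only the protection outcome, not the fault mechanism.

```python
def combined_access(dir_u, dir_w, pte_u, pte_w, user, wp=0):
    """Return 'N', 'R', or 'R/W' for one combination of protection bits.

    dir_u/dir_w are the U and W bits of the page directory entry,
    pte_u/pte_w those of the page table entry. user is the psr U bit
    and wp the epsr WP bit.
    """
    # At user level, both levels must mark the page as user-accessible.
    if user and not (dir_u and pte_u):
        return "N"
    # The W bit must be set at both levels for the page to be writable.
    writable = bool(dir_w and pte_w)
    # W is enforced at user level always, at supervisor level only if WP = 1.
    if user or wp:
        return "R/W" if writable else "R"
    return "R/W"
```

For instance, a directory entry with U = 1, W = 0 combined with a page table entry with U = 1, W = 1 yields read-only access at user level, matching the corresponding row of the table.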
The i860 XR microprocessor expects Page Directories and Page Tables to be in little endian format.
The operating system must maintain these tables in
little endian format by either setting BE = 0 when
manipulating the tables or by complementing bit 2 of
the address when loading or storing entries.
2.4.6 ADDRESS TRANSLATION FAULTS
The address translation fault is one instance of the
data-access fault. The instruction causing the fault
can be re-executed upon returning from the trap
handler.
2.4.7 PAGE TRANSLATION CACHE
For greatest efficiency in address translation, the
i860 XR microprocessor stores the most recently
used page-table data in an on-chip cache called the
TLB (translation lookaside buffer). Only if the necessary paging information is not in the cache must
both levels of page tables be referenced.
2.5 Caching and Cache Flushing
The i860 XR microprocessor has the ability to cache
instruction, data, and address-translation information in on-chip caches. Caching uses virtual-address
tags. The effects of mapping two different virtual addresses in the same address space to the same
physical address are undefined.
Instruction, data, and address-translation caching on
the i860 XR microprocessor are not transparent. Because the data cache uses a write-back protocol,
writes do not immediately update memory, and
writes to memory by other bus devices do not update the cache. Changes to page tables do not automatically update the TLB, and changes to instructions do not automatically update the instruction
cache. Under certain circumstances, such as I/O
references, self-modifying code, page-table updates, or shared data in a multiprocessing system, it
is necessary to bypass or to flush the caches. The
i860 XR microprocessor provides the following
methods for doing this:
• Bypassing Instruction and Data Caches. If
deasserted during cache-miss processing, the
KEN# pin disables instruction and data caching
of the referenced data. If the CD bit of the associated second-level PTE is set, caching of data and
instructions is disabled. The i860 XR CPU may
place pages having WT = 1 into the instruction
cache. Future implementations of the i860 XR architecture may adhere to a write-through data
cache policy. Thus, they may cache pages having
the WT bit of the PTE set. The value of the CD bit
or the WT bit is output on the PTB pin for use by
external caches.
• Invalidating Instruction and Address-Translation Caches. Storing to the dirbase register with
the ITI bit set invalidates the contents of the instruction and address-translation caches. This bit
should be set when modifying a page table, when
modifying a page containing instructions, or when
changing the DTB field of dirbase or the WP bit
of the epsr. Note that in order to make the instruction or address-translation caches consistent with the data cache, the data cache must be
flushed before invalidating the other caches.
NOTE:
The mapping of the page containing the
currently executing instruction and the
next six instructions should not be different in the new page tables when st.c dirbase changes DTB or activates ITI. The
six instructions following the st.c should
be nops and should lie in the same page
as the st.c.
• Flushing the Data Cache. The data cache is
flushed by a software routine using the flush instruction. The data cache must be flushed prior to
invalidating the instruction or address-translation
caches (as controlled by the ITI bit of dirbase) or
enabling or disabling address translation (via the
ATE bit). The data cache does not need flushing
if the program is modifying only the P, U, W, A, or
D bits of a PTE (as long as the Page Frame Address is not changed and the PTE itself was not
in the data cache). The i860 XR CPU does not
check these protection bits on cache line writeback. Thus, a trap handler can service a DAT for
D-bit-zero by setting D = 1 and then ITI = 1. In
the case of setting the P or A bits active, there is
no need to invalidate or flush any caches because the processor does not load entries into
the TLB that have P = 0 or A = 0. The i860 XR
microprocessor searches only external memory
for Page Directories and Page Tables in the
translation process. The data cache is not
searched. Therefore, Page Tables and Directories should be kept in non-cacheable memory, or
flushed from the cache by any code which accesses them.
2.6 Instruction Set
Table 2.7 shows the complete set of instructions
grouped by function within processing unit. Refer to
Section 8 for an algorithmic definition of each instruction.
The architecture of the i860 XR microprocessor
uses parallelism to increase the rate at which operations may be introduced into the unit. Parallelism in
the i860 XR microprocessor is not transparent; rather, programmers have complete control over parallelism and therefore can achieve maximum performance for a variety of computational problems.
2.6.1 PIPELINED AND SCALAR OPERATIONS
One type of parallelism used within the floating-point
unit is "pipelining". The pipelined architecture treats
each operation as a series of more primitive operations (called "stages") that can be executed in parallel. Consider just the floating-point adder unit as an
example. Let A represent the operation of the adder.
Let the stages be represented by A1, A2, and A3.
The stages are designed such that A(i+1) for one adder instruction can execute in parallel with A(i) for the
next adder instruction. Furthermore, each A(i) can be
executed in just one clock. The pipelining within the
multiplier and graphics units can be described similarly, except that the number of stages may be different.
Figure 2.12 illustrates three-stage pipelining as
found in the floating-point adder (also in the floating-point multiplier when single-precision input operands
are employed). The columns of the figure represent
the three stages of the pipeline. Each stage holds
intermediate results and also (when introduced into
the first stage by software) holds status information pertaining to those results. The figure assumes that the
instruction stream consists of a series of consecutive floating-point instructions, all of one type (i.e., all
adder instructions or all single-precision multiplier instructions). The instructions are represented as i,
i + 1, etc. The rows of the figure represent the states
of the unit at successive clock cycles. Each time a
pipelined operation is performed, the result of the
last stage of the pipeline is stored in the destination
register fdest, the pipeline is advanced one stage,
and the input operands fsrc1 and fsrc2 are transferred to the first stage of the pipeline.
In the i860 XR microprocessor, the number of pipeline stages ranges from one to three. A pipelined
operation with a three-stage pipeline stores the result of the third prior operation. A pipelined operation
with a two-stage pipeline stores the result of the second prior operation. A pipelined operation with a
one-stage pipeline stores the result of the prior operation.
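The delayed-result behavior described above can be sketched with a small Python model. This is a simplification for illustration only: each result is computed when the operation is issued rather than stage by stage, `None` stands in for the undefined initial contents of the pipeline, and the class name `Pipeline` is hypothetical.

```python
from collections import deque


class Pipeline:
    """Toy model of an N-stage floating-point pipeline."""

    def __init__(self, stages):
        # Index 0 is the first stage, the right end is the last stage.
        self.stages = deque([None] * stages)

    def issue(self, result):
        """Advance one stage and return the value that goes to fdest."""
        stored = self.stages.pop()       # last-stage result is stored
        self.stages.appendleft(result)   # new operation enters stage 1
        return stored


adder = Pipeline(3)
# Four pipelined adds; each call returns what would be written to fdest.
out = [adder.issue(a + b) for a, b in [(1, 2), (3, 4), (5, 6), (7, 8)]]
```

In this model the fourth add is the first to return a defined value, namely the sum produced by the third prior operation (1 + 2). A two-stage pipeline would return it one operation earlier.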
There are four floating-point pipelines: one for the
multiplier, one for the adder, one for the graphics
unit, and one for floating-point loads. The adder
pipeline has three stages. The number of stages in
the multiplier pipeline depends on the precision of
the source operands in the pipeline. Single precision
has three stages and double precision has two
stages. The graphics unit has one stage for all precisions. The load pipeline has three stages for all precisions.
Changing the FZ (flush zero), RM (rounding mode),
or RR (result register) bits of fsr while there are results in either the multiplier or adder pipeline produces effects that are not defined.
2.6.1.1 Scalar Mode
In addition to the pipelined execution mode, the i860
XR microprocessor also can execute floating-point
instructions in "scalar" mode. Most floating-point instructions have both pipelined and scalar variants,
distinguished by a bit in the instruction encoding. In
scalar mode, the floating-point unit does not start a
new operation until the previous floating-point operation is completed. The scalar operation passes
through all stages of its pipeline before a new operation is introduced, and the result is stored automatically. Scalar mode is used when the next operation
depends on results from the previous few floating-point operations (or when the compiler or programmer does not want to deal with pipelining).
2.6.1.2 Pipelining Status Information
Result status information in the fsr consists of the
AA, AI, AO, AU, and AE bits, in the case of the adder, and the MA, MI, MO, and MU bits, in the case of
the multiplier. This information arrives at the fsr via
the pipeline in one of two ways:
Table 2.7. Instruction Set
Core Unit
| Mnemonic | Description |
| Load and Store Instructions | |
| ld.x | Load integer |
| st.x | Store integer |
| fld.y | F-P load |
| pfld.z | Pipelined F-P load |
| fst.y | F-P store |
| pst.d | Pixel store |
| Register to Register Moves | |
| ixfr | Transfer integer to F-P register |
| Integer Arithmetic Instructions | |
| addu | Add unsigned |
| adds | Add signed |
| subu | Subtract unsigned |
| subs | Subtract signed |
| Shift Instructions | |
| shl | Shift left |
| shr | Shift right |
| shra | Shift right arithmetic |
| shrd | Shift right double |
| Logical Instructions | |
| and | Logical AND |
| andh | Logical AND high |
| andnot | Logical AND NOT |
| andnoth | Logical AND NOT high |
| or | Logical OR |
| orh | Logical OR high |
| xor | Logical exclusive OR |
| xorh | Logical exclusive OR high |
| Control-Transfer Instructions | |
| trap | Software trap |
| intovr | Software trap on integer overflow |
| br | Branch direct |
| bri | Branch indirect |
| bc | Branch on CC |
| bc.t | Branch on CC taken |
| bnc | Branch on not CC |
| bnc.t | Branch on not CC taken |
| bte | Branch if equal |
| btne | Branch if not equal |
| bla | Branch on LCC and add |
| call | Subroutine call |
| calli | Indirect subroutine call |
| System Control Instructions | |
| flush | Cache flush |
| ld.c | Load from control register |
| st.c | Store to control register |
| lock | Begin interlocked sequence |
| unlock | End interlocked sequence |
Floating-Point Unit
| Mnemonic | Description |
| Register to Register Moves | |
| fxfr | Transfer F-P to integer register |
| F-P Multiplier Instructions | |
| fmul.p | F-P multiply |
| pfmul.p | Pipelined F-P multiply |
| pfmul3.dd | 3-stage pipelined F-P multiply |
| fmlow.p | F-P multiply low |
| frcp.p | F-P reciprocal |
| frsqr.p | F-P reciprocal square root |
| F-P Adder Instructions | |
| fadd.p | F-P add |
| pfadd.p | Pipelined F-P add |
| famov.r | F-P adder move |
| pfamov.r | Pipelined F-P adder move |
| fsub.p | F-P subtract |
| pfsub.p | Pipelined F-P subtract |
| pfgt.p | Pipelined F-P greater-than compare |
| pfeq.p | Pipelined F-P equal compare |
| fix.p | F-P to integer conversion |
| pfix.p | Pipelined F-P to integer conversion |
| ftrunc.p | F-P to integer truncation |
| pftrunc.p | Pipelined F-P to integer truncation |
| Dual-Operation Instructions | |
| pfam.p | Pipelined F-P add and multiply |
| pfsm.p | Pipelined F-P subtract and multiply |
| pfmam.p | Pipelined F-P multiply with add |
| pfmsm.p | Pipelined F-P multiply with subtract |
| Long Integer Instructions | |
| fisub.z | Long-integer subtract |
| pfisub.z | Pipelined long-integer subtract |
| fiadd.z | Long-integer add |
| pfiadd.z | Pipelined long-integer add |
| Graphics Instructions | |
| fzchks | 16-bit Z-buffer check |
| pfzchks | Pipelined 16-bit Z-buffer check |
| fzchkl | 32-bit Z-buffer check |
| pfzchkl | Pipelined 32-bit Z-buffer check |
| faddp | Add with pixel merge |
| pfaddp | Pipelined add with pixel merge |
| faddz | Add with Z merge |
| pfaddz | Pipelined add with Z merge |
| form | OR with MERGE register |
| pform | Pipelined OR with MERGE register |
Assembler Pseudo-Operations
| Mnemonic | Description |
| mov | Integer register-register move |
| fmov.r | F-P reg-reg move |
| pfmov.r | Pipelined F-P reg-reg move |
| nop | Core no-operation |
| fnop | F-P no-operation |
| pfle.p | Pipelined F-P less-than or equal |
Figure 2.12. Pipelined Instruction Execution
1. It is calculated by the last stage of the pipeline.
This is the normal case.
2. It is propagated from the first stage of the pipeline. This method is used when restoring the state
of the pipeline after a preemption. When a store
instruction updates the fsr and the value of the
U bit in the word being written into the fsr is set,
the store updates the result status bits in the first
stage of both the adder and multiplier pipelines.
When software changes the result-status bits of
the first stage of a particular unit (multiplier or adder), the updated result-status bits are propagated one stage for each pipelined floating-point operation for that unit. In this case, each stage of the
adder and multiplier pipelines holds its own copy
of the relevant bits of the fsr. When they reach
the last stage, they override the normal result-status bits computed from the last stage result.
At the next floating-point instruction (or, at certain
core instructions), after the result reaches the last
stage, the i860 XR microprocessor traps if any of the
status bits of the fsr indicate exceptions. Note that
the instruction that creates the exceptional condition
is not the instruction at which the trap occurs.
2.6.1.3 Precision in the Pipelines
In pipelined mode, when a floating-point operation is
initiated, the result of an earlier pipelined floating-point operation is returned. The result precision of
the current instruction applies to the operation being
initiated. The precision of the value stored in fdest is
that which was specified by the instruction that initiated that operation.
Figure 2.13. Dual-Instruction Mode Transitions
If fdest is the same as fsrc1 or fsrc2, the value being
stored in fdest is used as the input operand. In this
case, the precision of fdest must be the same as the
source precision.
The multiplier pipeline has two stages when the
source operand is double-precision and three stages
when the precision of the source operand is single.
This means that a pipelined multiplier operation
stores the result of the second previous multiplier
operation for double-precision inputs and third previous for single-precision inputs (except when changing precisions).
2.6.1.4 Transition between Scalar and Pipelined
Operations
When a scalar operation is executed, it passes
through all stages of the pipeline; therefore, any unstored results in the affected pipeline are lost. To
avoid losing information, the last pipelined operations before a scalar operation should be dummy
pipelined operations that unload unstored results
from the affected pipeline.
After a scalar operation, the values of all pipeline
stages of the affected unit (except the last) are undefined. No spurious result-exception traps result
when the undefined values are subsequently stored
by pipelined operations; however, the values should
not be referenced as source operands.
For best performance a scalar operation should not
immediately precede a pipelined operation whose
fdest is nonzero.
2.6.2 DUAL-INSTRUCTION MODE
Another form of parallelism results from the fact that
the i860 XR microprocessor can execute both a
floating-point and a core instruction simultaneously.
Such parallel execution is called dual-instruction
mode. When executing in dual-instruction mode, the
instruction sequence consists of 64-bit aligned instructions with a floating-point instruction in the lower 32 bits and a core instruction in the upper 32 bits.
Table 2.7 identifies which instructions are executed
by the core unit and which by the floating-point unit.
Programmers specify dual-instruction mode either
by including in the mnemonic of a floating-point instruction a d. prefix or by using the Assembler directives .dual ... .enddual. Both of the specifications
cause the D-bit of floating-point instructions to be
set. If the i860 XR microprocessor is executing in
single-instruction mode and encounters a floating-point instruction with the D-bit set, one more 32-bit
instruction is executed before dual-mode execution
begins. If the i860 XR microprocessor is executing in
dual-instruction mode and a floating-point instruction
is encountered with a clear D-bit, then one more pair
of instructions is executed before resuming single-instruction mode. Figure 2.13 illustrates two variations
of this sequence of events: one for extended sequences of dual-instructions and one for a single instruction pair.
When a 64-bit dual-instruction pair sequentially follows a delayed branch instruction in dual-instruction
mode, both 32-bit instructions are executed.
2.6.3 DUAL-OPERATION INSTRUCTIONS
Special dual-operation floating-point instructions
(add-and-multiply, subtract-and-multiply) use both
the multiplier and adder units within the floating-point unit in parallel to efficiently execute such common tasks as evaluating systems of linear equations, performing the Fast Fourier Transform (FFT),
and performing graphics transformations.
The instructions pfam fsrc1, fsrc2, fdest (add and
multiply), pfsm fsrc1, fsrc2, fdest (subtract and multiply), pfmam fsrc1, fsrc2, fdest (multiply and add),
and pfmsm fsrc1, fsrc2, fdest (multiply and subtract)
initiate both an adder operation and a multiplier operation. Six operands are required, but the instruction format specifies only three operands; therefore,
there are special provisions for specifying the operands. These special provisions consist of:
• Three special registers (KR, KI, and T), that can
store values from one dual-operation instruction
and supply them as inputs to subsequent dual-operation instructions.
1. The constant registers KR and KI can store the
value of fsrc1 and subsequently supply that
value to the multiplier pipeline in place of fsrc1.
2. The transfer register T can store the last stage
result of the multiplier pipeline and subsequently supply that value to the adder pipeline
in place of fsrc1.
• A four-bit data-path control field in the opcode
(DPC) that specifies the operands and loading of
the special registers.
1. Operand-1 of the multiplier can be KR, KI, or
fsrc1.
2. Operand-2 of the multiplier can be fsrc2 or the
last stage result of the adder pipeline.
3. Operand-1 of the adder can be fsrc1, the
T-register, or the last stage result of the adder
pipeline.
4. Operand-2 of the adder can be fsrc2, the last
stage result of the multiplier pipeline, or the
last stage result of the adder pipeline.
Figure 2.14 shows all the possible data paths surrounding the adder and multiplier. A DPC field in
these instructions selects different data paths. Table
8.8 shows the various encodings of the DPC field.
Refer to the Dual Operation Instructions section in the
i860 Microprocessor Programmer's Reference Manual for a pictorial description.
Figure 2.14. Dual-Operation Data Paths
Note that the mnemonics pfam.p, pfsm.p,
pfmam.p, and pfmsm.p are never used as such in
the assembly language; these mnemonics are used
here to designate classes of related instructions.
Each value of DPC has a unique mnemonic associated with it.
2.7 Addressing Modes
Data access is limited to load and store instructions.
Memory addresses are computed from two fields of
load and store instructions: isrc1 and isrc2.
1. isrc1 either contains the identifier of a 32-bit integer register or contains an immediate 16-bit address offset.
2. isrc2 always specifies a register.
Table 2.8. Types of Traps
| Type | Indication: PSR, EPSR | Indication: FSR | Caused by: Condition | Caused by: Instruction |
| Instruction Fault | IT | | | |
| | OF | | Software traps | trap, intovr |
| | IL | | Missing unlock | Any |
| Floating Point Fault | FT | SE | Floating-point source exception | Any M- or A-unit except fmlow |
| | | | Floating-point result exception: | Any M- or A-unit except fmlow, pfgt, and pfeq. Reported on any F-P instruction plus pst, fst, and sometimes fld, pfld, ixfr |
| | | AO, MO | overflow | |
| | | AU, MU | underflow | |
| | | AI, MI | inexact result | |
| Instruction Access Fault | IAT | | Address translation exception during instruction fetch | Any |
| Data Access Fault | DAT* | | Load/store address translation exception | Any load/store |
| | | | Misaligned operand address | Any load/store |
| | | | Operand address matches db register | Any load/store |
| Interrupt | IN | | External interrupt | |
| Reset | No trap bits set | | Hardware RESET signal | |
NOTES:
*These cases can be distinguished by examining the operand addresses.
The IL bit of the epsr must be checked by the trap handler to tell if the bus is currently in a locked sequence.
Because either isrc1 or isrc2 may be null (zero), a
variety of useful addressing modes result:
offset + register    Useful for accessing fields within a record, where register points to the beginning of the record. Useful for accessing items in a stack frame, where register is r3, the register used for pointing to the beginning of the stack frame.
register + register  Useful for two-dimensional arrays or for array access within the stack frame.
register             Useful as the end result of any arbitrary address calculation.
offset               Absolute address into the first or last 32K of the logical address space.
In addition, the floating-point load and store instructions may select autoincrement addressing. In this
mode isrc2 is replaced by the sum of isrc1 and isrc2
after performing the load or store. This mode makes
stepping through arrays more efficient, because it
eliminates one address-calculation instruction.
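The addressing modes above reduce to a single sum of isrc1 and isrc2. The following Python sketch illustrates this under two assumptions that are not stated explicitly in this section: the 16-bit immediate offset is sign-extended (inferred from the "first or last 32K" wording), and addresses wrap at 32 bits. Register contents are modeled as a dictionary, and the function names are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def sign_extend16(value):
    """Interpret the low 16 bits of value as a signed offset (assumed)."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def effective_address(isrc1, isrc2, immediate=False):
    """isrc1 is a register value, or a 16-bit offset when immediate=True."""
    base = sign_extend16(isrc1) if immediate else isrc1
    return (base + isrc2) & MASK32


def autoincrement(regs, isrc1, isrc2_reg, immediate=False):
    """Return the access address and replace isrc2 with isrc1 + isrc2."""
    addr = effective_address(isrc1, regs[isrc2_reg], immediate)
    regs[isrc2_reg] = addr   # register update eliminates a separate add
    return addr
```

With a null (zero) isrc2, an immediate offset of 0xFFF0 addresses the last 32K of the address space in this model, while stepping through an array with `autoincrement` advances the base register by the offset on every access.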
2.8 Traps and Interrupts
Traps are caused by exceptional conditions detected in programs or by external interrupts. Traps
cause interruption of normal program flow to execute a special program known as a trap handler.
Traps are divided into the types shown in Table 2.8.
Interrupts and traps start execution in single instruction mode at virtual address 0xFFFFFF00 in supervisor level (U = 0).
2.8.1 TRAP HANDLER INVOCATION
This section applies to traps other than reset. When
a trap occurs, execution of the current instruction is
aborted. The instruction is restartable. The processor takes the following steps while transferring control to the trap handler:
1. Copies U (user mode) of the psr into PU (previous
U).
2. Copies IM (interrupt mode) into PIM (previous IM).
3. Sets U to zero (supervisor mode).
4. Sets IM to zero (interrupts disabled).
5. If the processor is in dual instruction mode, it sets
DIM; otherwise it clears DIM.
6. If the processor is in single-instruction mode and
the next instruction will be executed in dual-instruction mode or if the processor is in dual-instruction mode and the next instruction will be
executed in single-instruction mode, DS is set;
otherwise, it is cleared.
7. The appropriate trap type bits in psr are set (IT,
IN, IAT, DAT, FT). Several bits may be set if the
corresponding trap conditions occur simultaneously.
8. An address is placed in the fault instruction register (fir) to help locate the trapped instruction. In
single-instruction mode, the address in fir is the
address of the trapped instruction itself. In dual-instruction mode, the address in fir is that of the
floating-point half of the dual instruction. If an instruction or data access fault occurred, the associated core instruction is the high-order half of the
dual instruction (fir + 4). In dual-instruction
mode, when a data access fault occurs in the absence of other trap conditions, the floating-point
half of the dual instruction will already have been
executed.
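The psr updates in steps 1 through 7 above can be sketched in C as follows. This is an illustrative software model under stated assumptions: the Psr structure uses one boolean per named field rather than real bit positions, and the enter_trap function and TRAP_* constants are hypothetical names. The fir update of step 8 is not modeled.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the psr fields named in the text.
 * Real bit positions are deliberately not modeled. */
typedef struct {
    bool U, PU;      /* user mode, previous user mode           */
    bool IM, PIM;    /* interrupt mode, previous interrupt mode */
    bool DIM, DS;    /* dual-instruction mode, mode switch      */
    bool IT, IN, IAT, DAT, FT;   /* trap-type bits              */
} Psr;

/* Hypothetical cause flags; several may be combined. */
enum { TRAP_IT = 1, TRAP_IN = 2, TRAP_IAT = 4, TRAP_DAT = 8, TRAP_FT = 16 };

static void enter_trap(Psr *psr, bool dual_now, bool dual_next, unsigned causes)
{
    psr->PU  = psr->U;                   /* step 1 */
    psr->PIM = psr->IM;                  /* step 2 */
    psr->U   = false;                    /* step 3: supervisor mode      */
    psr->IM  = false;                    /* step 4: interrupts disabled  */
    psr->DIM = dual_now;                 /* step 5 */
    psr->DS  = (dual_now != dual_next);  /* step 6: mode is about to change */
    if (causes & TRAP_IT)  psr->IT  = true;   /* step 7 */
    if (causes & TRAP_IN)  psr->IN  = true;
    if (causes & TRAP_IAT) psr->IAT = true;
    if (causes & TRAP_DAT) psr->DAT = true;
    if (causes & TRAP_FT)  psr->FT  = true;
}
```

A handler written against this model would test each of the five trap-type fields in turn, since more than one may be set for simultaneous conditions.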
The processor begins executing the trap handler by transferring execution to virtual address
0xFFFFFF00. The trap handler begins execution in single-instruction mode. The trap handler must
examine the trap-type bits in psr (IT, IN, IAT, DAT, FT) to determine the cause or causes of the trap.
2.8.2 INSTRUCTION FAULT
This fault is caused by any of the following conditions. In all cases the processor sets the IT bit before entering the trap handler.
1. By the trap instruction. When trap is executed in dual-instruction mode, the floating-point
companion of the trap instruction is not executed before the trap is taken.
2. By the intovr instruction. The trap occurs only if OF in epsr is set when intovr is executed. The
trap handler should clear OF before returning. When intovr causes a trap in dual-instruction
mode, the floating-point companion of the intovr instruction is completely executed before the trap
is taken.
3. By violation of lock/unlock protocol, explained below. (Note that trap and intovr should not be
used within a locked sequence; otherwise, it would be difficult to distinguish between this and
the prior cases.)
The lock protocol requires the following sequence of activities:
1. lock
2. Any load or store instruction that misses the cache
3. unlock
4. Any load or store instruction (regardless of whether it misses the cache)
There may be other instructions between any of these steps. The bus is locked after step 2 and
remains locked until step 4. Step 4 must follow step 1 by 30 instructions or less; otherwise the
instruction trap occurs. In case of a trap, IL is also set. If the load or store instruction in
step 2 hits the cache, the sequence is legal, but the bus is not locked.
2.8.3 FLOATING-POINT FAULT
The floating-point fault is reported on floating-point instructions, pst, fst, and sometimes fld, pfld, ixfr.
The floating-point faults of the i860 XR microprocessor support the floating-point exceptions defined by
the IEEE standard as well as some other useful classes of exceptions. The i860 XR microprocessor
divides these into two classes: source exceptions and result exceptions. The numerics library supplied
by Intel provides the IEEE standard default handling for all these exceptions.
2.8.3.1 Source Exception Faults
When used as inputs to the multiplier or adder, all exceptional operands, including infinities,
denormalized numbers and NaNs, cause a floating-point fault and set SE in the fsr. Source exceptions
are reported on the instruction that initiates the operation. For pipelined operations, the pipeline
is not advanced.
The SE value is undefined for faults on fld, pfld, fst, pst, and ixfr instructions when in
single-instruction mode or when in dual-instruction mode and the companion instruction is not a
multiplier or adder operation.
2.8.3.2 Result Exception Faults
The class of result exceptions includes any of the following conditions:
• Overflow. The absolute value of the rounded true result would exceed the largest positive finite
number in the destination format.
• Underflow (when FZ is clear). The absolute value of the rounded true result would be smaller
than the smallest positive finite number in the destination format.
• Inexact result (when TI is set). The result is not exactly representable in the destination format.
For example, the fraction 1/3 cannot be precisely represented in binary form. This exception occurs
frequently and indicates that some (generally acceptable) accuracy has been lost.
The point at which a result exception is reported depends upon whether pipelined operations are being
used:
• Scalar (nonpipelined) operations. Result exceptions are reported on the next floating-point,
fst.x, or pst.x (and sometimes fld, pfld, ixfr) instruction after the scalar operation. When a trap
occurs, the last stage of the affected unit contains the result of the scalar operation.
• Pipelined operations. Result exceptions are reported when the result is in the last stage and the
next floating-point, fst.x or pst.x (and sometimes fld, pfld, ixfr) instruction is executed. When a
trap occurs, the pipeline is not advanced, and the last stage results (that caused the trap) remain
unchanged.
When no trap occurs (either because FTE is clear or because no exception occurred), the pipeline is
advanced normally by the new floating-point operation.
The result-status bits of the affected unit are undefined until the point that result exceptions are
reported. At this point, the last stage result-status bits (bits 29..22 and 16..9 of the fsr) reflect
the values in the last stages of both the adder and multiplier. For example, if the last stage result
in the multiplier has overflowed and a pipelined floating-point pfadd is started, a trap occurs and
MO is set.
For scalar operations, the RR bits of fsr specify the register in which the result was stored. RR is
updated when the scalar instruction is initiated. The trap, however, occurs on a subsequent
instruction. Programmers must prevent intervening stores to fsr from modifying the RR bits.
Prevention may take one of the following forms:
• Before any store to fsr when a result exception may be pending, execute a dummy floating-point
operation to trigger the result-exception trap.
• Always read from fsr before storing to it, and mask updates so that the RR bits are not changed.
For pipelined operations, RR is cleared and the result is in the last stage of the pipeline of the
appropriate unit. The trap handler must flush the pipeline, saving the results and the status bits.
In either pipelined or scalar mode, the trap handler must then compute the trapping result. In either
case, the result has the same fraction as the true result and has an exponent which is the low-order
bits of the true result. The trap handler can inspect the result, compute the result appropriate for
that instruction (a NaN or an infinity, for example), and store the correct result. The result is
either stored in the register specified by RR (if nonzero) or (if RR = 0) the trap handler must
reload the pipeline with the saved results and status bits.
Result exceptions may be reported for both the adder and multiplier units at the same time. In this
case, the trap handler should fix up the last stage of both pipelines.
2.8.4 INSTRUCTION ACCESS FAULT
This trap occurs during address translation for instruction fetches in any of these cases:
• The address fetched is in a page whose P (present) bit in the page table is clear (not present).
• The address fetched is in a supervisor mode page, but the processor is in user mode.
• The address fetched is in a page whose PTE has A = 0, and the access occurs during a locked
sequence (i.e., between lock and unlock).
Note that several instructions are fetched at one time, either due to instruction prefetching or to
instruction caching. Therefore, a trap handler can change from supervisor to user mode and continue
to execute instructions fetched from a supervisor page. An instruction access trap occurs only when
the next group of instructions is fetched from a supervisor page (up to eight instructions later).
If, in the meantime, the handler branches to a user page, no instruction access trap occurs. No
protection violation results, because the processor does not permit data accesses to supervisor
pages while running in user mode.
2.8.5 DATA ACCESS FAULT
This trap results from an abnormal condition detected during data operand fetch or store. Such an exception can be due only to one of the following causes:
• An attempt is being made to write to a page whose D (Dirty) bit is clear.
• A memory operand is misaligned (is not located at an address that is a multiple of the length of
the data).
• The address stored in the db register is equal to one of the addresses spanned by the operand.
• The operand is in a not-present page.
• An attempt is being made from user level to write to a read-only page or to access a
supervisor-level page.
• The operand was in a page whose PTE had A = 0, and the access occurred during a locked sequence
(i.e., between lock and unlock).
• Write protection (determined by epsr bit WP = 1) is violated in supervisor mode.
2.8.6 INTERRUPT TRAP
An interrupt is an event that is signaled from an external source. If the processor is executing
with interrupts enabled (IM set in the psr), the processor sets the interrupt bit IN in the psr, and
generates an interrupt trap. Vectored interrupts are implemented by interrupt controllers and software.
2.8.7 RESET TRAP
When the i860 XR microprocessor is reset, execution begins in single-instruction mode at physical
address 0xFFFFFF00. This is the same address as for other traps. The reset trap can be distinguished
from other traps by the fact that no trap bits are set. The instruction cache is flushed. The bits
DPS, BL, and ATE in dirbase are cleared. CS8 is initialized by the value at the INT pin at the end of
reset. The read-only fields of the epsr are set to identify the processor, while the IL, WP, and PBM
bits are cleared. The bits U, IM, BR, and BW in psr are cleared, as are the trap bits FT, DAT, IAT,
IN, and IT. All other bits of psr and all other register contents are undefined.
Refer to Table 2.9 for a summary of these initial settings.
Table 2.9. Register and Cache Values after Reset
Registers                   Initial Value
Integer Registers           Undefined
Floating-Point Registers    Undefined
psr                         U, IM, BR, BW, FT, DAT, IAT, IN, IT = 0; others are undefined
epsr                        IL, WP, PBM, BE = 0; Processor Type, Stepping Number, DCS are
                            read only; others are undefined
db                          Undefined
dirbase                     DPS, BL, ATE = 0; others are undefined
fir                         Undefined
fsr                         Undefined
KR, KI, T, MERGE            Undefined
Caches                      Initial Value
Instruction Cache           Flushed
Data Cache                  Undefined
TLB                         Flushed
The software must ensure that the data cache is flushed and control registers are properly initialized
before performing operations that depend on the values of the cache or registers. The data cache has
no "validity" bits, so memory accesses before the flush may result in false data cache hits.
Reset code must initialize the floating-point pipeline state to zero with floating-point traps
disabled to ensure that no spurious floating-point traps are generated.
After a RESET the i860 XR microprocessor starts execution at supervisor level (U = 0). Before
branching to the first user-level instruction, the RESET trap handler or subsequent initialization
code has to set PU and a trap bit so that an indirect branch instruction will copy PU to U, thereby
changing to user level.
2.9 Debugging
The i860 XR microprocessor supports debugging with both data and instruction breakpoints. The features of the i860 XR architecture that support debugging include:
• db (data breakpoint register), which permits specification of a data address that the i860 XR
microprocessor will monitor.
• BR (break read) and BW (break write) bits of the psr, which enable trapping of either reads or
writes (respectively) to the address in db.
• DAT (data access trap) bit of the psr, which allows the trap handler to determine when a data
breakpoint was the cause of the trap.
• trap instruction that can be used to set breakpoints in code. Any number of code breakpoints
can be set. The values of the isrc1 and isrc2 fields help identify which breakpoint has occurred.
• IT (instruction trap) bit of the psr, which allows the trap handler to determine when a trap
instruction was the cause of the trap.
3.0 HARDWARE INTERFACE
In the following description of hardware interface, the # symbol at the end of a signal name indicates
that the active or asserted state occurs when the signal is at a low voltage. When no # is present
after the signal name, the signal is asserted when at the high voltage level.
3.1 Signal Description
Table 3.1 identifies functional groupings of the pins, lists every pin by its identifier, gives a
brief description of its function, and lists some of its characteristics. All output pins are
tristate, except HLDA and BREQ. All inputs are synchronous, except HOLD and INT.
3.1.1 CLOCK (CLK)
The CLK input determines execution rate and timing
of the i860 XR microprocessor. Timing of other signals is specified relative to the rising edge of this
signal. The i860 XR microprocessor can utilize a
clock rate of 25 MHz, 33.3 MHz or 40 MHz. The
internal operating frequency is the same as the external clock.
3.1.2 SYSTEM RESET (RESET)
Asserting RESET for at least 16 CLK periods causes
initialization of the i860 XR microprocessor. Refer to
section 3.2 "Initialization" for more details related to
RESET.
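As a small worked example of the figures in sections 3.1.1 and 3.1.2, the 16-period minimum RESET assertion translates into a different duration at each supported clock rate. The C sketch below only does that arithmetic; the function names are hypothetical, and it is not a substitute for the timing specifications.

```c
#include <assert.h>

/* Length of one CLK period in nanoseconds for a rate given in MHz. */
static double clk_period_ns(double mhz)
{
    return 1000.0 / mhz;
}

/* Minimum RESET assertion time in nanoseconds: 16 CLK periods. */
static double min_reset_ns(double mhz)
{
    return 16.0 * clk_period_ns(mhz);
}
```

At 25 MHz this gives 640 ns, at 33.3 MHz roughly 480 ns, and at 40 MHz 400 ns.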
Table 3.1. Pin Summary
Pin Name      Function                     Active State   Input/Output
Execution Control Pins
CLK           CLocK                                       I
RESET         System reset                 High           I
HOLD          Bus hold                     High           I
HLDA          Bus hold acknowledge         High           O
BREQ          Bus request                  High           O
INT/CS8       Interrupt, code-size         High           I
Bus Interface Pins
A31-A3        Address bus                  High           O
BE7#-BE0#     Byte Enables                 Low            O
D63-D0        Data bus                     High           I/O
LOCK#         Bus lock                     Low            O
W/R#          Write/Read bus cycle         High/Low       O
NENE#         NExt NEar                    Low            O
NA#           Next Address request         Low            I
READY#        Transfer Acknowledge         Low            I
ADS#          ADdress Status               Low            O
Cache Interface Pins
KEN#          Cache ENable                 Low            I
PTB           Page Table Bit               High           O
Testability Pins
SHI           Boundary Scan Shift Input    High           I
BSCN          Boundary Scan Enable         High           I
SCAN          Shift Scan Path              High           I
Intel-Reserved Configuration Pins
CC1-CC0       Configuration                High           I
Power and Ground Pins
Vcc           System power
Vss           System ground
A # after a pin name indicates that the signal is active when at the low voltage level.
3.1.3 BUS HOLD (HOLD) AND BUS HOLD ACKNOWLEDGE (HLDA)
These pins are used for i860 XR microprocessor bus arbitration. At some clock after the HOLD signal
is asserted, the i860 XR microprocessor releases control of the local bus and puts all bus interface
outputs (except BREQ and HLDA) into a floating state, then asserts HLDA, all during the same clock
period. It maintains this state until HOLD is deasserted. Instruction execution stops only if
required instructions or data cannot be read from the on-chip instruction and data caches.
The time required to acknowledge a hold request is
one clock plus the number of clocks needed to finish
any outstanding bus cycles. HOLD is recognized
even while RESET or LOCK # is asserted.
When leaving a bus hold, the i860 XR microprocessor deactivates HLDA and, in the same clock period,
initiates a pending bus cycle, if any.
HOLD is an asynchronous input.
3.1.4 BUS REQUEST (BREQ)
This signal is asserted when the i860 XR microprocessor has a pending memory request, even when
HLDA is asserted. This allows an external bus arbiter to implement an "on demand only" policy for
granting the bus to the i860 XR microprocessor.
BREQ is asserted the clock after the i860 XR microprocessor realizes an internal request for the bus. In
normal operation, BREQ goes low the clock after
ADS# goes low for the final pending bus cycle. (Refer to Figure 4.10 for timing information.) During data
or instruction cache fills, however, BREQ may be
deasserted for one or more clocks, due to cache
and TLB logic.
3.1.5 INTERRUPT/CODE-SIZE (INT/CS8)
This input allows interruption of the current instruction stream. If interrupts are enabled (IM set
in psr) when INT is asserted, the i860 XR microprocessor fetches the next instruction from address
0xFFFFFF00. To assure that an interrupt is recognized, INT should remain asserted until the software
acknowledges the interrupt (by writing, for example, to a memory-mapped port of an interrupt controller).
When the bus is not locked, the maximum time between the assertion of INT and the execution of the
first instruction of the trap handler is ten clocks, plus the time for four sets of four pipelined
read cycles and two sets of four pipelined writes (instruction- and data-cache misses and write-back
cycles to update memory), plus the time for twenty nonpipelined read cycles (six TLB misses, with
eight refetches when the A-bit is zero), plus the time for eight nonpipelined writes (updates to the A-bit).
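The worst-case bound above is a sum of fixed counts multiplied by system-dependent cycle times. A minimal C sketch of that arithmetic follows; the BusTiming structure and its field names are hypothetical, and the per-cycle clock counts must come from the memory system in question, since this section does not give them.

```c
#include <assert.h>

/* Hypothetical description of a memory system's bus timing, in clocks. */
typedef struct {
    unsigned pipelined_read_set;   /* one set of four pipelined reads  */
    unsigned pipelined_write_set;  /* one set of four pipelined writes */
    unsigned nonpipelined_read;    /* one nonpipelined read cycle      */
    unsigned nonpipelined_write;   /* one nonpipelined write cycle     */
} BusTiming;

/* Maximum clocks from INT assertion to the first trap-handler
 * instruction when the bus is not locked, term by term as in the text. */
static unsigned max_int_latency_clocks(const BusTiming *t)
{
    return 10u
         + 4u  * t->pipelined_read_set
         + 2u  * t->pipelined_write_set
         + 20u * t->nonpipelined_read
         + 8u  * t->nonpipelined_write;
}
```

For example, with purely illustrative timings of 5 clocks per pipelined set and 2 clocks per nonpipelined cycle, the bound evaluates to 10 + 20 + 10 + 40 + 16 = 96 clocks.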
If the bus is locked from a lock instruction, the INT pin is ignored and the INT bit of epsr is
always zero. The lock instruction can only assert LOCK# for 30 to 33 instructions before trapping.
If INT is asserted during the clock before the falling
edge of RESET, the eight-bit code-size mode is selected. For more about this mode, refer to section
3.2 "Initialization".
INT is an asynchronous input.
3.1.6 ADDRESS PINS (A31-A3) AND BYTE ENABLES (BE7#-BE0#)
The 29-bit address bus (A31-A3) identifies addresses to a 64-bit location. Separate byte-enable
signals (BE7#-BE0#) identify which bytes should be accessed within the 64-bit location. In all
noncacheable read cycles (KEN# deasserted), the byte enables match the length and address of the
requested data. Cacheable read cycles (KEN# asserted), however, result in four 64-bit memory cycles
to fill an entire 32-byte cache line. The BEn# pins activated are those that represent the operand
of the load instruction that caused the line fill, and these same BEn# pins remain activated for all
four cycles of the line fill. All 64 bits must be returned for each cycle without regard for the
BEn# signals. In all write cycles (noncacheable writes as well as cache line write-backs) the BEn#
signals indicate the bytes that must be written.
Instruction fetches (W/R# is low) are distinguished from data accesses by the unique combinations of
BE7#-BE0# defined in Table 3.2. For an eight-bit code fetch in eight-bit code-size (CS8) mode,
BE2#-BE0# are redefined to be A2-A0 of the address. In this case BE7#-BE3# form the code shown in
Table 3.2 that identifies an instruction fetch. The A2 in the table does not represent a physical
pin, just a conceptual internal address line value. The "x" under A2 for CS8 mode means "not
applicable", or "don't care". All other combinations of byte enables indicate data accesses.
The address and byte-enable pins are driven until either NA# or READY# is asserted.
3.1.7 DATA PINS (D63-D0)
The bus interface has 64 bidirectional data pins (D63-D0) to transfer data in eight- to 64-bit
quantities. Pins D7-D0 transfer the least significant byte; pins D63-D56 transfer the most
significant byte.
In read bus cycles, all 64 bits of the data bus are latched, even in CS8-mode instruction fetches
when only the low-order eight bits are used.
In write bus cycles, the point at which data is driven onto the bus depends on the type of the
preceding cycle. If there was no preceding cycle (i.e., the bus was idle), data is driven with the
address. If the preceding cycle was a write, data is driven as soon as READY# is returned from the
previous cycle. If the preceding cycle was a read, data is driven one clock after READY# is returned
from the previous cycle, thereby allowing time for the bus to be turned around. Data continues to be
driven until READY# for the current cycle is returned.
3.1.8 BUS LOCK (LOCK#)
This signal is used to provide atomic (indivisible) read-modify-write sequences in multiprocessor
systems. A multiprocessor bus arbiter must permit only one processor a locked access to the address
which is on the bus when LOCK# first activates. The system must maintain the lock of that location
until LOCK# deactivates.
The i860 XR microprocessor coordinates the external LOCK# signal with the software-controlled BL
bit of the dirbase register. Programmers do not have to be concerned about the fact that bus activity
is not always synchronous with instruction execution. LOCK# is asserted with ADS# for the address
operand of the first load or store instruction executed after the BL bit is set by the lock instruction.
Pending bus cycles are locked according to the value of the BL bit when the instruction was executed.
Even if the BL bit is changed between the time that an instruction generates an internal bus request
and the time that the cycle appears on the bus, the i860 XR microprocessor still asserts LOCK# for
that bus cycle.
If ADS# is active when LOCK# deactivates, then that request should complete before the hardware
relinquishes the lock. If ADS# is not active, the locking of the location can immediately end when
LOCK# deactivates. Of course the simplest arbitration hardware can just lock the entire bus against
all other accesses during LOCK# assertion through READY# of the cycle in which LOCK# goes inactive.
Table 3.2. Identifying Instruction Fetches
Code Fetch          A2   BE7#  BE6#  BE5#  BE4#  BE3#  BE2#  BE1#  BE0#
Normal (Non-CS8)    0    1     1     1     1     1     0     1     0
Normal (Non-CS8)    1    1     0     1     0     1     1     1     1
CS8 Mode            x    1     0     1     0     1     Low-order address bits
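External logic that needs to tell code fetches from data reads can compare the byte enables against the Table 3.2 patterns. The C sketch below is illustrative only. It assumes the table rows read as 1 1 1 1 1 0 1 0 (A2 = 0), 1 0 1 0 1 1 1 1 (A2 = 1), and 1 0 1 0 1 on BE7#-BE3# for CS8 mode; the function names and the packing of pin levels into integers are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* be:  bit n holds the level on BEn# (1 = high voltage).
 * w_r: level on W/R# (false = read cycle).
 * cs8_mode: true while eight-bit code-size mode is in effect. */
static bool is_instruction_fetch(uint8_t be, bool w_r, bool cs8_mode)
{
    if (w_r)
        return false;               /* write cycles are never code fetches */
    if (cs8_mode)
        return (be >> 3) == 0x15;   /* BE7#-BE3# = 1 0 1 0 1 */
    return be == 0xFA               /* A2 = 0: 1 1 1 1 1 0 1 0 */
        || be == 0xAF;              /* A2 = 1: 1 0 1 0 1 1 1 1 */
}

/* In CS8 mode BE2#-BE0# carry A2-A0, so a full byte address can be
 * rebuilt from the 29 address pins (passed as a 29-bit value). */
static uint32_t cs8_byte_address(uint32_t a31_a3, uint8_t be)
{
    return (a31_a3 << 3) | (uint32_t)(be & 0x07u);
}
```

Any other byte-enable combination on a read cycle is treated as a data access, as the text states.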
When the BL bit is deasserted with the unlock instruction, LOCK# is deasserted with the next load
or store but after any pending bus cycles. Between locked sequences, at least one cycle of no LOCK#
is guaranteed by the behavior of the unlock instruction. LOCK# deassertion may occur independently
of ADS# for the case of a trap or a cache hit after unlock.
The i860 XR microprocessor also asserts LOCK# during TLB miss processing for updates of the
accessed bit in page-table entries. The maximum time that LOCK# can be asserted in this case is five
clocks plus the time required to perform a read-modify-write sequence. Instruction fetches do not
alter the LOCK# pin.
Between lock and unlock instructions, the INT pin is ignored and the INT bit of epsr is zero when
read by ld.c epsr. The time that interrupts are disabled is limited by the lock protocol outlined in
Section 2.8.2.
3.1.9 WRITE/READ BUS CYCLE (W/R#)
This pin specifies whether a bus cycle is a read (LOW) or write (HIGH) cycle. It is driven until
either NA# or READY# is asserted.
3.1.10 NEXT NEAR (NENE#)
This signal allows higher-speed reads and writes in the case of consecutive reads and writes that
access static column or page-mode DRAMs. The i860 XR microprocessor asserts NENE# when the current
address is in the same DRAM page as the previous bus cycle. The i860 XR microprocessor determines
the DRAM page size by inspecting the DPS field in the dirbase register. The page size can range from
2^9 to 2^16 64-bit words, supporting DRAM sizes from 256K x 1, 256K x 4, and up. NENE# is never
asserted on the next bus cycle after HLDA is deasserted.
3.1.11 NEXT ADDRESS REQUEST (NA#)
NA# makes address pipelining possible. The system asserts NA# for at least one clock to indicate
that it is ready to accept the next address from the i860 XR microprocessor. NA# may be asserted
before the current cycle ends. (If the system does not implement pipelining, NA# does not have to be
activated.) The i860 XR microprocessor samples NA# every clock, starting one clock after the prior
activation of ADS#. When NA# is active, the i860 XR microprocessor is free to drive address and
bus-cycle definition for the next pending bus cycle. The i860 XR microprocessor remembers that NA#
was asserted when no internal request is pending; therefore, NA# can be deactivated after the next
rising edge of the CLK signal. Up to three bus cycles can be outstanding simultaneously.
3.1.12 TRANSFER ACKNOWLEDGE (READY#)
The system must assert the READY# signal during read cycles when valid data is on the data pins and
during write cycles when the system has accepted data from the data pins. READY# must be asserted
for at least one clock. Sampling of READY# begins in the clock after an ADS# or in the second clock
after a prior READY#.
3.1.13 ADDRESS STATUS (ADS#)
The i860 XR microprocessor asserts ADS# during the first clock of each bus cycle to identify the
clock period during which it begins to assert outputs on the address bus. This signal is held active
for one clock.
3.1.14 CACHE ENABLE (KEN#)
The i860 XR microprocessor samples KEN# to determine whether the data being read for the current
cache-miss cycle is to be cached. This pin is internally NORed with the CD and WT bits to control
cacheability on a page by page basis (refer to Table 3.3).
If the address is one that is permitted to be in the cache, KEN# must be continuously asserted during
the sampling period starting from the second rising clock edge after ADS# is asserted, through the
clock in which NA# or READY# is asserted. The entire 64 bits of the data bus will be used for the
read, regardless of the state of the byte-enable pins. Three additional 64-bit bus cycles will be
generated to fill the rest of the 32-byte cache block.
If KEN# is found deasserted at any clock from the clock after ADS# through the clock of the first
NA# or READY#, the data being read will not be cached and two scenarios can occur: 1) if the cycle
is due to a data-cache miss, no subsequent cache-fill cycles will be generated; 2) if the cycle is
due to an instruction-cache miss, additional cycle(s) will be generated until the address reaches a
32-byte boundary. To avoid caching a line, external hardware must deassert KEN# during or before the
first NA# or READY#.
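The cacheability decision described in section 3.1.14 (KEN# NORed with the CD and WT bits, as summarized in Table 3.3) can be modeled in a few lines of C. This is a sketch under stated assumptions: the function names are hypothetical, pin levels are passed as booleans, and only the data-cache read-miss case is modeled for the cycle count.

```c
#include <assert.h>
#include <stdbool.h>

/* ken_n_high: level on KEN# (false = low = asserted).
 * cd, wt: the CD and WT bits of the page-table entry in use.
 * The access is cacheable only when all three NOR inputs are 0. */
static bool access_is_cacheable(bool ken_n_high, bool cd, bool wt)
{
    return !(ken_n_high || cd || wt);
}

/* 64-bit bus cycles for a data-cache read miss: a cacheable miss
 * fills the whole 32-byte line (first cycle plus three more);
 * a noncacheable one generates no further fill cycles. */
static unsigned data_read_miss_cycles(bool cacheable)
{
    return cacheable ? 4u : 1u;
}
```

The instruction-cache case differs for noncacheable fetches, where additional cycles run to the next 32-byte boundary, and is left out of this sketch.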
Table 3.3. Cacheability based on KEN# and CD OR WT
CD OR WT   KEN#   Meaning
0          0      Cacheable access
0          1      Noncacheable access
1          0      Noncacheable page
1          1      Noncacheable page
3.1.15 PAGE TABLE BIT (PTB)
Depending on the setting of the PBM (page-table bit mode) bit of the epsr, the PTB reflects the value
of either the CD (cache disable) bit or the WT (write through) bit of the page-table entry used for
the current cycle. When paging is disabled, PTB remains inactive.
3.1.16 BOUNDARY SCAN SHIFT INPUT (SHI)
This pin is used with the testability features. Refer to section 3.3.
3.1.17 BOUNDARY SCAN ENABLE (BSCN)
This pin is used with the testability features. Refer to section 3.3.
3.1.18 SHIFT SCAN PATH (SCAN)
This pin is used with the testability features. Refer to section 3.3.
3.1.19 CONFIGURATION (CC1-CC0)
These two pins are reserved by Intel. Strap both pins LOW.
3.1.20 SYSTEM POWER (Vcc) AND GROUND (Vss)
The i860 XR microprocessor has 48 pins for power and ground. All pins must be connected to the
appropriate low-inductance power and ground signals in the system.
3.2 Initialization
Initialization of the i860 XR microprocessor is caused by assertion of the RESET signal for at least
16 clocks. Table 3.4 shows the status of output pins during the time that RESET is asserted. Note
that HOLD requests are honored during RESET and that the status of output pins depends on whether a
HOLD request is being acknowledged.
AOS#, LOCK#
HIGH
Tri-State OFF
serted, the iB60 XR microprocessor enters boundary
scan mode on the next rising clock edge. Boundary
scan mode can be activated even while RESET is
active. When BSCN is deasserted while in boundary
scan mode, the iB60 XR microprocessor leaves
boundary scan mode on the next rising clock edge.
After leaving boundary scan mode, the internal state
is undefined; therefore, RESET should be asserted.
W/R#, PTB
LOW
Tri-State OFF
Table 3.5. Test Mode Selection
BREQ
LOW
LOW
BSCN
SCAN
HLOA
LOW
HIGH
LO
Tri-State OFF
Tri-State OFF
Undefined
Tri-State OFF
LO
HI
HI
LO
HI
LO
HI
Table 3.4. Output Pin Status during Reset
Pin Value
Pin Name
063-00
HOLD
HOLD
Not
Acknowledged
Acknowledged
A31-A3,
BE7#-BEO#,
NENE#
After a reset, the iB60 XR microprocessor begins executing at physical address OxFFFFFFOO. The program-visible state of the iB60 XR microprocessor after reset is detailed in section 2.B.7.
Eight-bit code-size mode is selected when INT ICSB
is asserted during the clock before the falling edge
of RESET: While in eight-bit code-size mode, instruction cache misses are byte reads (transferred
on 07-00 of the data' bus) instead of eight-byte
reads. This allows the iB60 XR microprocessor to be
bootstrapped from an eight-bit EPROM. For these
code reads, byte enables BE2#-BEO# are redefined to be the low order three bits of the address,
so that a complete byte address is available. These
reads update the instruction cache if KEN # is asserted (refer to section 3.1.14) and are not pipelined
even if NA # is asserted. While in this mode, instructions must reside in an eight-bit wide memory, while
data must reside in a separate 64-bit wide memory.
After the code has been loaded into 64-bit memory,
initialization code can initiate 64-bit code fetches by
clearing the CSB bit of the dirbase register (refer to
section 2). Once eight-bit code-size mode is disabled by software, it cannot be reenabled except by
resetting the iB60 XR microprocessor.
For testing purposes, each signal pin has associated
with it an internal latch. Table 3.6 indentifies these
latches by name and classifies them as input, output, or control. The input and output latches carry
.the name of the corresponding pins.
Table 3.6. Test Mode Latches
Input
Latch
SHI
BSCN
SCAN
RESET
00-063
eCl-CCO
The pins BSCN and SCAN control the boundary
scan mode (refer to Table 3.5). When BSCN is as-
Output
Latch
Associated
Control
Latch
00-063
OATAt
A31-A3
NENE#
PTB#
AODRt
NENEt
PTBt
W/R#
W/Rt
AOS#
HLOA
LOCK#
AOSt
BE7#-BEO#
BREQ
BEt
LOCKt
REAOY#
KEN#
NA#
INT/CSB
HOLD
3.3 Testability
The iB60 XR microprocessor has a boundary scan
mode that may be used in component- or board-level testing to test the signal traces leading to and
from the iB60 XR microprocessor. Boundary scan
mode provides a simple serial interface that makes it
possible to test all signal traces with only a few
probes. Probes need be connected only to CLK,
BSCN, SCAN, SHI, BREQ, RESET, and HOLD.
Testability Mode
No testability mode selected
(Reserved for Intel)
Boundary scan mode, normal
Boundary scan mode, shift
SHI as input; BREQ as
output
Within boundary scan mode the i860 XR microprocessor operates in one of two submodes: normal
mode or shift mode, depending on the value of the
SCAN input. A typical test sequence is:
1. Enter shift mode to assign values to the latches
that correspond with the pins.
2. Enter normal mode. In normal mode the i860 XR
microprocessor transfers the latched values to
the output pins and latches the values that are
being driven onto the input pins.
3. Reenter shift mode to read the new values of the
input pins.
3.3.1 NORMAL MODE
When SCAN is deasserted, the normal mode is selected. For each input pin (RESET, HOLD,
INT/CS8, NA#, READY#, KEN#, SHI, BSCN,
SCAN, CC1, and CC0), the corresponding latch is
loaded with the value that is being driven onto the
pin.
The tristate output pins (A31-A3, BE7#-BE0#,
W/R#, NENE#, ADS#, LOCK#, and PTB) are enabled by the control latches ADDRt (for A31-A3),
BEt, W/Rt, NENEt, ADSt, LOCKt, and PTBt. If a control latch is set, the corresponding output latches
drive their output pins; otherwise the pins are not
driven.
The I/O pins (D63-D0) are enabled by the control
latch DATAt, which is similar to the other control
latches. In addition, when DATAt is not set, the data
pins are treated as input pins and their values are
latched.
3.3.2 SHIFT MODE
When SCAN is asserted, the shift mode is selected.
In shift mode, the pins are organized into a boundary
scan chain. The scan chain is configured as a shift
register that is shifted on the rising edge of CLK. The
SHI pin is connected to the input of one end of the
boundary scan chain. The value of the most significant bit of the scan chain is output on the BREQ pin.
To avoid glitches while the values are being shifted
along the chain, the tester should assert both the
RESET and HOLD pins. Then all tristate outputs are
disabled. The order of the pins within the chain is
shown in Figure 3.1.
A tester causes entry into this mode for one of two
purposes:
1. To assign values to output latches to be driven
onto output pins upon subsequent entry into normal mode.
2. To read the values of input pins previously latched
in normal mode.
1 SHI -> 2 BSCN -> 3 SCAN -> 4 RESET -> 5 DATAt ->
6 D0 -> ... -> 69 D63 -> 70 CC1 -> 71 CC0 ->
72 A31 -> ... -> 100 A3 -> 101 ADDRt -> 102 NENEt -> 103 NENE# ->
104 PTBt -> 105 PTB# -> 106 W/Rt -> 107 W/R# -> 108 ADSt -> 109 ADS# ->
110 HLDA -> 111 LOCKt -> 112 LOCK# -> 113 READY# -> 114 KEN# ->
115 NA# -> 116 INT/CS8 -> 117 HOLD -> 118 BEt ->
119 BE7# -> ... -> 126 BE0# -> 127 BREQ
Figure 3.1. Order of Boundary Scan Chain
4.0 BUS OPERATION
With regard to how a bus cycle is generated by the
i860 XR microprocessor, there are two types of cycles: pipelined and nonpipelined. Both types of cycles can be either read or write cycles. A pipelined
cycle is one that starts while one or two other bus
cycles are outstanding. A nonpipelined cycle is one
that starts when no other bus cycles are outstanding.
4.1 Pipelining
A bus cycle begins when ADS# is activated and
ends when READY# is sampled active. READY# is
sampled one clock after assertion of ADS# and
thereafter until it becomes active. New cycles can
start as often as every other clock until three cycles
are outstanding. A bus cycle is considered outstanding as long as READY# has not been asserted to
terminate that cycle. After READY# becomes active, it is not sampled again for the following (outstanding) cycle until the second clock after the one
during which it became active. READY# is assumed
to be inactive when it is not sampled.
An m-n read or write cycle is a cycle with a total cycle
time of m clocks and a cycle-to-cycle time of n
clocks (m ≥ n). Total cycle time extends from the
clock in which ADS# is activated to the clock in
which READY# becomes active, whereas cycle-to-cycle time extends from the time that READY# is
sampled active for the previous cycle to the time
that it is sampled active again for the current cycle.
When m = n, a nonpipelined cycle is implied; m > n
implies a pipelined cycle.
Pipelining may occur for the next bus cycle any time
the current bus cycle requires more than two clock
periods to finish (m > 2). If a bus request is pending,
the next cycle will be initiated when NA# is sampled
active, even if the current cycle has not terminated.
In this case, pipelining occurs. NA# is not recognized until after ADS# has become inactive.
To allow high transfer rates in large memory systems, two-level pipelining is supported (i.e., there
may be up to three cycles in progress at one time).
Pipelining enables a new word of data to be transferred every two clocks, even though the total cycle
time may be up to six clocks.
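The m-n notation can be made concrete with a short sketch. It is not part of the data sheet; the function names are hypothetical, and the lower bound of two clocks on n is an assumption drawn from the statement that new cycles can start as often as every other clock.

```python
def classify_cycle(m, n):
    """Classify an m-n bus cycle.

    m: total cycle time in clocks (ADS# active to READY# active)
    n: cycle-to-cycle time in clocks (READY# active to READY# active)
    """
    if n < 2 or m < n:
        # Assumption: n is at least 2 because a new cycle can start
        # at most every other clock; the text requires m >= n.
        raise ValueError("expected m >= n >= 2")
    return "nonpipelined" if m == n else "pipelined"


def transfers_per_clock(n):
    """Sustained transfer rate for back-to-back cycles with spacing n."""
    return 1.0 / n


kind = classify_cycle(5, 2)      # a 5-2 cycle starts while others are outstanding
rate = transfers_per_clock(2)    # one transfer every two clocks
```

The classification depends only on whether m exceeds n; the sustained rate depends only on n, which is why a 6-2 cycle still delivers one transfer every two clocks.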
4.2 Bus State Machine
The operation of the bus is described in terms of a
bus state machine using a state transition diagram.
Figure 4.1 illustrates the i860 XR microprocessor
bus state machine. A bus cycle is composed of two
or more states. Each bus state lasts for one CLK
period.
The i860 XR microprocessor supports up to two levels of address pipelining. Once it has started the first
bus cycle, it can generate up to two more cycles as
long as READY# remains inactive. To start a new
bus cycle while other cycles are still outstanding,
NA# must be active for at least one clock cycle
starting with the clock after the previous ADS#.
NA# is latched internally.
States Tj and Tjk, for j = {1,2,3} and k = {1,2}, are
used to describe the state of the i860 XR microprocessor Bus State Machine. Index j indicates the number of outstanding bus cycles while index k distinguishes the intermediate states for the j-th outstanding cycle. Therefore there can be up to three outstanding cycles, and there are two possible intermediate states for each level of pipelining. Tj1 is the
next state after Tj, as long as j cycles are outstanding. Tj2 is entered when NA# is active but the i860
XR microprocessor is not ready to start a new cycle.
Five conditions have to be met to start a new cycle
while one or more cycles are already pending:
1. READY# inactive
2. NA# having been active
3. An internal request pending (BREQ active)
4. HOLD not active
5. Fewer than three cycles outstanding
Note that BREQ is asserted on the clock after the
i860 XR microprocessor realizes an internal request
for the bus.
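The five conditions can be read as a single predicate. The following is a minimal sketch with hypothetical names, not a description of the actual bus control logic; each argument is assumed to be the value seen by the state machine in the current clock.

```python
MAX_OUTSTANDING = 3  # two levels of pipelining: up to three cycles in progress


def can_start_pipelined_cycle(ready_active, na_latched, breq_active,
                              hold_active, outstanding):
    """Return True when a new cycle may start while others are pending.

    ready_active : READY# sampled active in this clock
    na_latched   : NA# has been active (it is latched internally)
    breq_active  : an internal bus request is pending
    hold_active  : synchronized HOLD is asserted
    outstanding  : number of bus cycles currently outstanding
    """
    return (
        not ready_active
        and na_latched
        and breq_active
        and not hold_active
        and 1 <= outstanding < MAX_OUTSTANDING
    )


ok = can_start_pipelined_cycle(False, True, True, False, outstanding=2)
```

The check `1 <= outstanding` reflects that this predicate covers only the pipelined case; starting a cycle from the idle state is governed by the request and HOLD conditions described in the following paragraph.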
Upon hardware RESET, the bus control logic enters
the idle state TI and awaits an internal request for a
bus cycle. If a bus cycle is requested while there is
no hold request from the system, a bus cycle begins,
advancing to state T1. On the next cycle, the state
machine automatically advances to state T11. If
READY# is active in state T11, the bus control logic
returns either to TI, if no new cycle is started, or to
T1, if a new cycle request is pending internally. In
fact, if an internal bus request is pending each time
READY# is active, the state machine continues to
cycle between T11 and T1.
However, if READY# is not active but the next address request is pending (as indicated by an active
NA#), the state machine advances either to state
T2 (if an internal bus request is pending, signifying
that two bus cycles are now outstanding), or to state
T12 (if no internal bus request is pending, signifying
NA# has been found active). Transitions from state
T12 are similar to those from T11.
[Figure 4.1: state transition diagram; transition labels include READY# DEASSERTED and HOLD ASSERTED.]
NOTES:
READY#: Once READY# has been sampled active, it is not sampled again until two clocks later
NA#: Not sampled during ADS# active clock
ADS#: Active in T1, T2 and T3
HLDA: Active in TH
HOLD: HOLD in this figure is the internally synchronized version of the external signal HOLD
REQUEST: Internal Bus Request Pending (BREQ asserted)
Figure 4.1. Bus State Machine
If two bus cycles are already outstanding (as indicated by T2k for k = {1,2}) and NA# is latched active
but READY# is not active, one more bus request
causes entry into state T3. Transitions from this
state are similar to those from T2.
In general, if there is an internal bus request each
time both READY# and NA# are active, the state
machine continues to oscillate between Tj1 and Tj,
for j = {2,3}.
When NA# is sampled active while there is a pending bus request, ADS# is activated in the next clock
period (provided no more than two cycles are already outstanding).
Internal pending bus requests start new bus cycles
only if no HOLD request has been recognized. TH is
entered from the idle state TI, T11, and T12. HLDA is
active in this state. There is a one clock delay to
synchronize the HOLD input when the signal meets
the respective minimum setup and hold time requirements. The state machine uses the synchronized
HOLD to move from state to state.
4.3 Bus Cycles
Figures 4.2 through 4.10 illustrate combinations of
bus cycles.
4.3.1 NONPIPELINED READ CYCLES
A read cycle begins with the clock in which ADS# is
asserted. The i860 XR microprocessor begins driving the address during this clock. It samples
READY# for active state every clock after the first
clock. A minimum of two clocks is required per cycle.
Data is latched when READY# is found active when
sampled at the end of a clock period. Figure 4.2 illustrates nonpipelined read cycles with zero wait
states.
[Figure 4.2 timing diagram: three back-to-back nonpipelined (2-2) read cycles, each occupying states T1 and T11. Signals shown: CLK, ADS#, A31-A3/W/R#/BEn#/NENE#/PTB, NA#, READY#, D63-D0.]
Figure 4.2. Fastest Read Cycles
[Figure 4.3 timing diagram: three back-to-back nonpipelined (2-2) write cycles, each occupying states T1 and T11. Signals shown: CLK, ADS#, A31-A3/W/R#/BEn#/NENE#/PTB, NA#, READY#.]
Figure 4.3. Fastest Write Cycles
4.3.2 NONPIPELINED WRITE CYCLES
The ADS# and READY# activity for write cycles
follows the same logic as that for read cycles, as
Figure 4.3 illustrates for back-to-back, nonpipelined
write cycles with zero wait-states.
The fastest write cycle takes only two clocks to complete. However, when a read cycle immediately precedes a write cycle, the write cycle must contain a
wait state, as illustrated in Figure 4.4. Because the
device being read might still be driving the data bus
during the first clock of the write cycle, there is a
potential for bus contention. To help avoid such contention, the i860 XR microprocessor does not drive
the data bus until the second clock of the write cycle. The wait state is required to provide the additional time necessary to terminate the write cycle. In
other read-write combinations, the i860 XR microprocessor does not require a wait state.
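The wait-state rule can be summarized in a few lines. This is a sketch with hypothetical names, assuming back-to-back nonpipelined cycles with no idle states between them and memory that is otherwise fast enough for zero wait states.

```python
MIN_CLOCKS = 2  # fastest nonpipelined cycle


def min_cycle_clocks(kind, previous_kind=None):
    """Minimum clocks for a nonpipelined cycle of the given kind.

    kind, previous_kind: "read" or "write"; previous_kind is the cycle
    that immediately precedes this one (None if there is none).
    """
    if kind == "write" and previous_kind == "read":
        return MIN_CLOCKS + 1  # one wait state avoids data bus contention
    return MIN_CLOCKS


def sequence_clocks(kinds):
    """Minimum clocks for each cycle in a back-to-back sequence."""
    result = []
    previous = None
    for kind in kinds:
        result.append(min_cycle_clocks(kind, previous))
        previous = kind
    return result


clocks = sequence_clocks(["read", "write", "read"])
```

For a read, write, read sequence the sketch gives cycle lengths of 2, 3, and 2 clocks, the same pattern of 2-2, 3-3, and 2-2 cycles labelled in Figure 4.4.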
[Figure 4.4 timing diagram: Cycle 1 nonpipelined read (2-2), Cycle 2 nonpipelined write (3-3), Cycle 3 nonpipelined read (2-2); states T1, T11, T1, T11, T11, T1, T11. Signals shown: CLK, ADS#, A31-A3/W/R#/BEn#/NENE#/PTB, NA#, READY#, D63-D0.]
Figure 4.4. Fastest Read/Write Cycles
[Figure 4.5 timing diagram: Cycle 1 nonpipelined read (5-5), Cycle 2 pipelined read (5-2), Cycle 3 pipelined write (6-3), Cycle 4 pipelined write (6-2). Signals shown: CLK, ADS#, A31-A3/W/R#/BEn#/NENE#/PTB, NA#, READY#, D63-D0.]
Figure 4.5. Pipelined Read Followed by Pipelined Write
[Figure 4.6 timing diagram: Cycle 1 nonpipelined write (5-5), Cycle 2 pipelined write (5-2), Cycle 3 pipelined read (5-2), Cycle 4 pipelined read (5-2). Signals shown: CLK, ADS#, A31-A3/W/R#/BEn#/NENE#/PTB, NA#, READY#, D63-D0.]
Figure 4.6. Pipelined Write Followed by Pipelined Read
4.3.3 PIPELINED READ AND WRITE CYCLES
Figures 4.5 and 4.6 illustrate combinations of nonpipelined and pipelined read and write cycles. The
following description applies to both diagrams. While
Cycle 1 is still in progress, two new cycles are initiated. By the time READY# first becomes active, the
state machine has moved through states T1, T11,
T2, T21, and T3. Cycles 3 and 4 show how activating
READY# terminates the corresponding outstanding
cycle, and yet activating NA# while there is an internal request pending adds a new outstanding cycle.
In Figure 4.5, Cycle 3 is a write cycle following a read
cycle; therefore, one wait state must be inserted.
The i860 XR microprocessor does not drive the data
bus until one clock after the read data is returned
from the preceding read cycle. During Cycles 3 and
4, the state machine oscillates between states T3
and T31 maintaining full bus capacity (two levels of
pipelining; three outstanding cycles). Cycles 2, 3,
and 4 in Figure 4.6 are 5-2 cycles; i.e., each requires
a total cycle time of five clocks while the throughput
rate is one cycle every two clocks.
Figure 4.7 illustrates in a more general manner how
the NA# signal controls pipelining. Cycle 1 is a 2-2
cycle, the fastest possible. The next cycle cannot be
started any earlier; therefore, there is no need to
activate NA# to start the next cycle early. Cycle 2, a
3-3 read, is different. Cycle 3 can be started during
the third state (a wait state) of Cycle 2, and NA# is
asserted to accomplish this.
NA# is not activated following the ADS# clock of
Cycle 3, thereby allowing Cycle 3 to terminate before the start of Cycle 4. As a result, Cycle 4 is a
nonpipelined cycle.
[Figure 4.7 timing diagram: Cycle 1 nonpipelined read (2-2), Cycle 2 nonpipelined read (3-3), Cycle 3 pipelined read (3-2), Cycle 4 nonpipelined read (2-2), with idle states marked. Signals shown: CLK, ADS#, A31-A3/W/R#/BEn#/NENE#/PTB, NA#, READY#, D63-D0.]
Figure 4.7. Pipelining Driven by NA#
[Figure 4.8 timing diagram. Signals shown: CLK, ADS#, A31-A3/W/R#/BEn#/NENE#/PTB, NA#, READY#, D63-D0.]
Figure 4.8. NA# Active with No Internal Bus Request
[Figure 4.9 timing diagram: Cycle 1 nonpipelined read (2-2), Cycle 2 nonpipelined write (3-3), Cycle 3 nonpipelined write (2-2); states T1, T11, T1, T11, T11, T1, T11. Signals shown: CLK, ADS#, A31-A3/W/R#/BEn#/NENE#/PTB, NA#, READY#, D63-D0, LOCK#.]
Figure 4.9. Locked Cycles
When there is no internal bus request, activating
NA# does not start a new cycle; the i860 XR microprocessor, however, remembers that NA# has been
activated. Figure 4.8 illustrates the situation where
NA# is active but no internal bus request is pending.
NA# is activated when two cycles are outstanding.
Because there is no internal request pending until
after one idle state, no new bus cycle is started during that period.
4.3.4 LOCKED CYCLES
The LOCK# signal is asserted when the current bus
cycle is to be locked with the next bus cycle. Assertion of LOCK# may be initiated by a program's setting the BL bit of the dirbase register using the lock
instruction (refer to section 2) or by the i860 XR microprocessor itself during page table updates.
In Figure 4.9, the first read cycle is to be locked with
the following write cycle. If there were idle states
between the cycles, the LOCK# signal would remain asserted. This is the case for a read/modify/
write operation. Cycle 3 is not locked because
LOCK# is no longer asserted when Cycle 2 starts.
4.3.5 HOLD AND BREQ ARBITRATION CYCLES
The HOLD, HLDA, and BREQ signals permit bus arbitration between the i860 XR microprocessor and
another bus master.
See Figure 4.10. When HOLD is asserted, the i860
XR microprocessor does not relinquish control of
the bus until all outstanding cycles are completed. If
HOLD were asserted one clock earlier, the last i860
XR microprocessor bus cycle before HLDA would
not be started.
HOLD is sampled at the end of the clock in which it
is activated. Recommended setup and hold times
must be met to guarantee sampling one clock after
external HOLD activation. When HOLD is sampled
active, a one clock delay for internal synchronization
follows. Likewise when HOLD is deasserted, there is
a one-clock delay for internal synchronization before
HLDA is deasserted. The outputs (except HLDA and
BREQ) float when HLDA is asserted.
[Figure 4.10 timing diagram, with states TI and T11 marked. Signals shown: CLK, ADS#, A31-A3/W/R#/BEn#/NENE#/PTB, READY#, HOLD, HLDA, BREQ.]
Figure 4.10. HOLD, HLDA, and BREQ
If, during a HOLD cycle, an internal bus request is
generated, BREQ is activated even though HLDA is
asserted. It remains active at least until the clock
after ADS# is activated for the requested cycle.
4.4 Bus States During RESET
Figure 4.11 shows how INT/CS8 is sampled during
the clock period just before the falling edge of RESET. If INT/CS8 is sampled active, the i860 XR microprocessor enters CS8 mode. No inputs (except
for HOLD and INT/CS8) are sampled during RESET.
Note that, because HOLD is recognized even while
RESET is active, the HLDA output signal may also
become active during RESET. Refer to Table 3.4
"Output Pin Status during Reset".
[Figure 4.11 timing diagram: CLK, RESET (annotated 16 CLKs), INT/CS8, and other inputs.]
Figure 4.11. Reset Activities
5.0 MECHANICAL DATA
Figures 5.1 and 5.2 show the locations of pins; Tables 5.1 and 5.2 help to locate pin identifiers.
[Figure 5.1: 17 x 17 pin grid diagram viewed from the top side, rows A-S and columns 1-17. Pin assignments are listed in Tables 5.1 and 5.2.]
Figure 5.1. Pin Configuration-View from Top Side
[Figure 5.2: 17 x 17 pin grid diagram viewed from the pin side, rows A-S and columns 1-17, with the metal lid at the center. Pin assignments are listed in Tables 5.1 and 5.2.]
Figure 5.2. Pin Configuration-View from Pin Side
Table 5.1. Pin Cross Reference by Location
Location         Signal
A1 ............. Vcc
A2 ............. Vss
A3 ............. Vcc
A4 ............. Vss
A5 ............. D56
A6 ............. D52
A7 ............. D50
A8 ............. D48
A9 ............. D46
A10 ............ D44
A11 ............ D40
A12 ............ D38
A13 ............ Vcc
A14 ............ Vss
A15 ............ Vcc
A16 ............ Vss
A17 ............ Vcc
B1 ............. Vss
B2 ............. Vcc
B3 ............. Vss
B4 ............. D59
B5 ............. D58
B6 ............. D54
B7 ............. D53
B8 ............. D49
B9 ............. D45
B10 ............ D42
B11 ............ D41
B12 ............ D36
B13 ............ D34
B14 ............ Vcc
B15 ............ Vss
B16 ............ Vcc
B17 ............ Vss
C1 ............. Vcc
C2 ............. Vss
C3 ............. D60
C4 ............. D63
C5 ............. D61
C6 ............. D57
C7 ............. D55
C8 ............. D51
C9 ............. D47
C10 ............ D43
C11 ............ D39
C12 ............ D37
C13 ............ D35
C14 ............ D33
C15 ............ D32
C16 ............ Vss
C17 ............ Vcc
D1 ............. Vss
D2 ............. Vcc
D3 ............. D62
D15 ............ D31
D16 ............ D30
D17 ............ Vss
E1 ............. Vcc
E2 ............. CC0
E3 ............. CC1
E15 ............ D29
E16 ............ D28
E17 ............ D26
F1 ............. A31
F2 ............. A28
F3 ............. A30
F15 ............ D27
F16 ............ D25
F17 ............ D24
G1 ............. A29
G2 ............. A27
G3 ............. A26
G15 ............ D23
G16 ............ D22
G17 ............ D20
H1 ............. A25
H2 ............. A24
H3 ............. A22
H15 ............ D21
H16 ............ D19
H17 ............ D18
J1 ............. A23
J2 ............. A20
J3 ............. CLK
J15 ............ D17
J16 ............ D14
J17 ............ D16
K1 ............. A21
K2 ............. A18
K3 ............. A16
K15 ............ D13
K16 ............ D15
K17 ............ D12
L1 ............. A19
L2 ............. A15
L3 ............. A14
L15 ............ D11
L16 ............ D8
L17 ............ D10
M1 ............. A17
M2 ............. A13
M3 ............. A11
M15 ............ D7
M16 ............ D9
M17 ............ D6
N1 ............. A12
N2 ............. A10
N3 ............. A9
N15 ............ D5
N16 ............ D4
N17 ............ Vcc
P1 ............. Vss
P2 ............. A8
P3 ............. A7
P15 ............ D3
P16 ............ Vcc
P17 ............ Vss
Q1 ............. Vcc
Q2 ............. Vss
Q3 ............. A6
Q4 ............. A5
Q5 ............. A3
Q6 ............. PTB
Q7 ............. BREQ
Q8 ............. READY#
Q9 ............. HOLD
Q10 ............ BE6#
Q11 ............ BE4#
Q12 ............ BE0#
Q13 ............ BSCN
Q14 ............ D1
Q15 ............ D2
Q16 ............ Vss
Q17 ............ Vcc
R1 ............. Vss
R2 ............. Vcc
R3 ............. Vss
R4 ............. Vcc
R5 ............. A4
R6 ............. NENE#
R7 ............. HLDA
R8 ............. KEN#
R9 ............. NA#
R10 ............ BE7#
R11 ............ BE2#
R12 ............ BE1#
R13 ............ SCAN
R14 ............ D0
R15 ............ Vss
R16 ............ Vcc
R17 ............ Vss
S1 ............. Vcc
S2 ............. Vss
S3 ............. Vcc
S4 ............. Vss
S5 ............. Vcc
S6 ............. W/R#
S7 ............. ADS#
S8 ............. LOCK#
S9 ............. INT/CS8
S10 ............ BE5#
S11 ............ BE3#
S12 ............ SHI
S13 ............ RESET
S14 ............ Vss
S15 ............ Vcc
S16 ............ Vss
S17 ............ Vcc
Table 5.2. Pin Cross Reference by Pin Name
Signal           Location
A3 ............. Q5
A4 ............. R5
A5 ............. Q4
A6 ............. Q3
A7 ............. P3
A8 ............. P2
A9 ............. N3
A10 ............ N2
A11 ............ M3
A12 ............ N1
A13 ............ M2
A14 ............ L3
A15 ............ L2
A16 ............ K3
A17 ............ M1
A18 ............ K2
A19 ............ L1
A20 ............ J2
A21 ............ K1
A22 ............ H3
A23 ............ J1
A24 ............ H2
A25 ............ H1
A26 ............ G3
A27 ............ G2
A28 ............ F2
A29 ............ G1
A30 ............ F3
A31 ............ F1
ADS# ........... S7
BE0# ........... Q12
BE1# ........... R12
BE2# ........... R11
BE3# ........... S11
BE4# ........... Q11
BE5# ........... S10
BE6# ........... Q10
BE7# ........... R10
BREQ ........... Q7
BSCN ........... Q13
CC0 ............ E2
CC1 ............ E3
CLK ............ J3
D0 ............. R14
D1 ............. Q14
D2 ............. Q15
D3 ............. P15
D4 ............. N16
D5 ............. N15
D6 ............. M17
D7 ............. M15
D8 ............. L16
D9 ............. M16
D10 ............ L17
D11 ............ L15
D12 ............ K17
D13 ............ K15
D14 ............ J16
D15 ............ K16
D16 ............ J17
D17 ............ J15
D18 ............ H17
D19 ............ H16
D20 ............ G17
D21 ............ H15
D22 ............ G16
D23 ............ G15
D24 ............ F17
D25 ............ F16
D26 ............ E17
D27 ............ F15
D28 ............ E16
D29 ............ E15
D30 ............ D16
D31 ............ D15
D32 ............ C15
D33 ............ C14
D34 ............ B13
D35 ............ C13
D36 ............ B12
D37 ............ C12
D38 ............ A12
D39 ............ C11
D40 ............ A11
D41 ............ B11
D42 ............ B10
D43 ............ C10
D44 ............ A10
D45 ............ B9
D46 ............ A9
D47 ............ C9
D48 ............ A8
D49 ............ B8
D50 ............ A7
D51 ............ C8
D52 ............ A6
D53 ............ B7
D54 ............ B6
D55 ............ C7
D56 ............ A5
D57 ............ C6
D58 ............ B5
D59 ............ B4
D60 ............ C3
D61 ............ C5
D62 ............ D3
D63 ............ C4
HLDA ........... R7
HOLD ........... Q9
INT/CS8 ........ S9
KEN# ........... R8
LOCK# .......... S8
NA# ............ R9
NENE# .......... R6
PTB ............ Q6
READY# ......... Q8
RESET .......... S13
SCAN ........... R13
SHI ............ S12
Vcc ............ A1, A3, A13, A15, A17, B2, B14, B16, C1, C17, D2, E1,
                 N17, P16, Q1, Q17, R2, R4, R16, S1, S3, S5, S15, S17
Vss ............ A2, A4, A14, A16, B1, B3, B15, B17, C2, C16, D1, D17,
                 P1, P17, Q2, Q16, R1, R3, R15, R17, S2, S4, S14, S16
W/R# ........... S6
Table 5.3. Ceramic PGA Package Dimension Symbols
Letter or
Symbol     Description of Dimensions
A          Distance from seating plane to highest point of body
A1         Distance between seating plane and base plane (lid)
A2         Distance from base plane to highest point of body
A3         Distance from seating plane to bottom of body
B          Diameter of terminal lead pin
D          Largest overall package dimension of length
D1         A body length dimension, outer lead center to outer lead center
e1         Linear spacing between true lead position centerlines
L          Distance from seating plane to end of lead
S1         Other body dimension, outer lead center to edge of body
NOTES:
1. Controlling dimension: millimeter.
2. Dimension "e1" ("e") is non-cumulative.
3. Seating plane (standoff) is defined by P.C. board hole size: 0.0415-0.0430 inch.
4. Dimensions "B", "B1" and "C" are nominal.
5. Details of Pin 1 identifier are optional.
[Figure 5.3 package drawing: top and side views of the pin grid array showing dimensions A, A1, A2, A3, B (all pins), D, D1, e1, L, and S1, the seating plane, the base plane, the swaged pin detail (4 places), pin C3, and the 45° chamfer at the index corner.]
Family: Ceramic Pin Grid Array Package
            Millimeters                  Inches
Symbol      Min      Max      Notes      Min      Max      Notes
A           3.56     4.57     Solid lid  0.140    0.180    Solid lid
A1          0.64     1.14     Solid lid  0.025    0.045    Solid lid
A2          2.79     3.56     Solid lid  0.110    0.140    Solid lid
A3          1.14     1.40                0.045    0.055
B           0.43     0.51                0.017    0.020
D           44.07    44.83               1.735    1.765
D1          40.51    40.77               1.595    1.605
e1          2.29     2.79                0.090    0.110
L           2.54     3.30                0.100    0.130
N           168               # of Pins  168               # of Pins
S1          1.52     2.54                0.060    0.100
ISSUE IWS REV X 7/15/88
Figure 5.3. 168 Lead Ceramic PGA Package Dimensions
6.0 PACKAGE THERMAL
SPECIFICATIONS
The i860 XR microprocessor is specified for operation when Tc is within the range of 0°C-85°C. Tc
may be measured in any environment to determine
whether the i860 XR microprocessor is within specified operating range. The case temperature should
be measured at the center of the top surface opposite the pins.
For this section, let:
P = maximum power consumption
Tc = case temperature
TA = ambient air temperature
θCA = thermal resistance from case to ambient air
θJC = thermal resistance from junction to case
θJA = thermal resistance from junction to ambient
air
TA can be calculated from θCA (thermal resistance
from case to ambient) with the following equation:
TA = Tc - P × θCA
Typical values for θCA and θJC at various airflows
are given in Table 6.1 for the 1.75 sq. in., 168 pin,
ceramic PGA. θJC is also shown so that θJA can be
calculated by:
θJA = θJC + θCA
Note that θJC with a heatsink differs from θJC without a heatsink because case temperature is measured differently. Case temperature for θJC with
heatsink is measured at the center of the heat fin
base. Case temperature for θJC without heatsink is
measured at the center of package top surface.
Table 6.2 shows the maximum TA allowable (without
exceeding Tc) at various airflows and operating frequencies (fCLK).
Note that TA is greatly improved by attaching "fins"
or a "heat sink" to the package. P (the maximum
power consumption) is calculated by using the maximum Icc at 5V as tabulated in the DC Characteristics of section 7.
Figure 6.1 gives typical Icc derating with case temperature. For more information on heat sinks, measurement techniques, or package characteristics, refer to Intel Packaging Handbook, order number
240800.
[Figure 6.1: plot of ICC (mA, axis 400 to 580) versus Tc (°C, axis 0 to 85) for a typical part at 5V with maximum load, with one curve each for 40.0 MHz, 33.3 MHz, and 25.0 MHz.]
Figure 6.1. Icc vs Case Temperature
Table 6.1. Thermal Resistance (°C/W) θJC and θCA
                              θCA at Airflow-ft/min (m/sec)
                     θJC      0       200     400     600     800     1000
                              (0)     (1.01)  (2.03)  (3.04)  (4.06)  (5.07)
With Heat Sink*      2        11      6       4       3.2     2.5     2.2
Without Heat Sink    1.5      17.5    13      11      9.5     8.5     8
*Nine-fin, unidirectional heat sink (fin dimensions: 0.350" height, 0.040" width,
0.115" center-to-center spacing, 1.530" length).
Table 6.2. Maximum Allowable TA at Various Airflows
In °C
                              Airflow-ft/min (m/sec)
                     fCLK     0       200     400     600     800     1000
                     (MHz)    (0)     (1.01)  (2.03)  (3.04)  (4.06)  (5.07)
TA with              25.0     57.5    70      75      77      78.8    79.5
Heat Sink*           33.3     52      67      73      75.5    77.4    78.5
                     40.0     49.3    65.5    72      74.6    76.9    77.9
TA without           25.0     41.3    52.5    57.5    61.3    63.8    65
Heat Sink            33.3     32.5    46      52      56.5    59.5    61
                     40.0     28.1    42.8    49.3    54.1    57.4    59
*Nine-fin unidirectional heat sink (fin dimensions: 0.350" height, 0.040" width,
0.115" center-to-center spacing, 1.530" length).
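The entries of Table 6.2 can be cross-checked numerically. The sketch below is an illustration rather than part of the data sheet: it assumes the relation TA = Tc - P × θCA with Tc at its 85°C limit, takes P as the maximum Icc from the DC Characteristics (500, 600, and 650 mA) multiplied by 5V, and uses hypothetical function and constant names.

```python
TC_MAX_C = 85.0  # upper limit of the specified case temperature range

# Maximum power supply current in mA, keyed by CLK frequency in MHz
# (values as tabulated in the DC Characteristics of section 7).
ICC_MAX_MA = {25.0: 500.0, 33.3: 600.0, 40.0: 650.0}


def max_power_w(f_mhz, vcc=5.0):
    """Maximum power consumption P in watts at the given CLK frequency."""
    return ICC_MAX_MA[f_mhz] / 1000.0 * vcc


def max_ambient_c(f_mhz, theta_ca):
    """Maximum allowable ambient temperature, assuming TA = Tc - P * theta_CA.

    theta_ca: case-to-ambient thermal resistance in °C/W from Table 6.1.
    """
    return TC_MAX_C - max_power_w(f_mhz) * theta_ca


# 25 MHz with heat sink and no airflow (theta_CA = 11 °C/W)
ta = max_ambient_c(25.0, 11.0)   # 57.5, matching the Table 6.2 entry
```

Under these assumptions the computed values agree with the tabulated ones to within rounding, for example 28.125°C against the listed 28.1°C at 40 MHz without a heat sink and with no airflow.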
7.0 ELECTRICAL DATA
NOTICE: This data sheet contains preliminary information on new products in production. The specifications are subject to change without notice. Verify with
your local Intel Sales office that you have the latest
data sheet before finalizing a design.
Inputs and outputs are TTL compatible, except for
CLK. All input and output timings are specified relative to the 1.5 volt level of the rising edge of CLK
and refer to the point that the signals reach 1.5V.
7.1 Absolute Maximum Ratings
Case Temperature Tc under Bias ...... 0°C to 85°C
Storage Temperature .......... -65°C to +150°C
Voltage on Any Pin
with Respect to Ground .............. -0.5 to 6.5V
WARNING: Stressing the device beyond the "Absolute
Maximum Ratings" may cause permanent damage.
These are stress ratings only. Operation beyond the
"Operating Conditions" is not recommended and extended exposure beyond the "Operating Conditions"
may affect device reliability.
7.2 D.C. Characteristics
Table 7.1. DC Characteristics
TC = 0°C to 85°C, VCC = 5V ±5%
Symbol  Parameter                   Min    Max        Units  Notes
VIL     Input LOW Voltage           -0.3   +0.8       V
VIH     Input HIGH Voltage          2.0    VCC + 0.3  V
VILC    CLK Input LOW Voltage       -0.3   +0.8       V
VIHC    CLK Input HIGH Voltage      3.0    VCC + 0.3  V
VOL     Output LOW Voltage                 0.45       V      (Note 1)
VOH     Output HIGH Voltage         2.4               V      (Note 2)
ICC     Power Supply Current
          CLK = 25.0 MHz                   500        mA     VCC @ 5V
          CLK = 33.3 MHz                   600        mA     VCC @ 5V
          CLK = 40.0 MHz                   650        mA     VCC @ 5V
ILI     Input Leakage Current              ±15        µA     No pullup
ILO     Output Leakage Current             ±15        µA     or pulldown
CIN     Input Capacitance                  15         pF     (Note 3)
CO      I/O or Output Capacitance          15         pF     (Note 3)
CCLK    Clock Capacitance                  20         pF     (Note 3)
NOTES:
1. This parameter is measured at 4.0 mA for A31-A3, D63-D0, BE7#-BE0#; at 5.0 mA for all other outputs.
2. This parameter is measured at 1.0 mA for A31-A3, D63-D0, BE7#-BE0#; at 0.9 mA for all other outputs.
3. These are not tested. They are guaranteed by design characterization.
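The input thresholds in Table 7.1 can be turned into a small checker. The following is a minimal sketch, not part of the data sheet: the function names and the "UNDEFINED" label for voltages between the LOW and HIGH windows are illustrative assumptions, and the limits are copied from the table rows for VIL, VIH, VILC, and VIHC.

```python
# Sketch only: classify an input voltage against the Table 7.1 limits.
# Function names and the "UNDEFINED" label are assumptions for illustration.

VCC_NOMINAL = 5.0      # volts, from "VCC = 5V ±5%"
VCC_TOLERANCE = 0.05   # ±5%


def vcc_range():
    """Return the (min, max) supply voltage allowed by the ±5% tolerance."""
    return (VCC_NOMINAL * (1 - VCC_TOLERANCE), VCC_NOMINAL * (1 + VCC_TOLERANCE))


def input_level(volts, vcc=VCC_NOMINAL, clk=False):
    """Classify a voltage on an input pin (or on CLK when clk=True)."""
    low_min, low_max = -0.3, 0.8        # VIL / VILC window
    high_min = 3.0 if clk else 2.0      # VIHC min is 3.0 V, VIH min is 2.0 V
    high_max = vcc + 0.3                # VIH / VIHC max is VCC + 0.3
    if low_min <= volts <= low_max:
        return "LOW"
    if high_min <= volts <= high_max:
        return "HIGH"
    return "UNDEFINED"
```

For example, 2.4 V is a valid HIGH on an ordinary input but falls below the 3.0 V minimum required on CLK.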
7.3 A.C. Characteristics
Table 7.2. A.C. Characteristics
TC = 0°C to 85°C, VCC = 5V ±5%
All timings measured at CLK = 1.5V unless otherwise specified.
                                              25 MHz        33 MHz        40 MHz
Symbol  Parameter                             Min    Max    Min    Max    Min    Max    Notes
                                              (ns)   (ns)   (ns)   (ns)   (ns)   (ns)
t1      CLK Period                            40     125    30     125    25     125
t2      CLK High Time                         6             5             3             at 3V
t3      CLK Low Time                          8             7             5             at 0.8V
t4      CLK Fall Time                                7             7             7      3V-0.8V
t5      CLK Rise Time                                7             7             7      0.8V-3V
t6a     A31-A3, PTB, W/R#, NENE# Valid Delay  3.5    25     3.5    23     3.5    19     50 pF load
t6b     BEn#* Valid Delay                     3.5    27     3.5    25     3.5    21     50 pF load
t7      Float Time, All Outputs               3.5    40     3.5    30     3.5    25     (Note 1)
t8      ADS#, BREQ, LOCK#, HLDA Valid Delay   3.5    22     3.5    20     3.5    15     50 pF load
t9      D63-D0 Valid Delay                    3.5    38     3.5    35     3.5    31     50 pF load
t10     Setup Time, All Inputs                13            11            8             (Note 2)
t11a    Hold Time, All Inputs except DATA     4             4             3             (Note 2)
t11b    DATA Hold Time                        5             4             3
NOTES:
1. Float condition occurs when maximum output current becomes less than ILO in magnitude. Float delay is not tested.
2. INT and HOLD are asynchronous inputs. The setup and hold specifications are given for test purposes or to assure
recognition on a specific rising edge of CLK.
* n = 0, 1, ..., 7
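The t1 row of Table 7.2 fixes the usable clock range for each speed grade. As a worked example (a sketch only; the dictionary and function names are illustrative assumptions, the numbers are the t1 Min/Max values from the table), the period limits convert to frequency limits as follows:

```python
# Sketch only: convert the t1 (CLK Period) limits of Table 7.2 into a
# frequency range. Names are assumptions; the values come from the t1 row.

T1_LIMITS_NS = {
    25: (40, 125),   # 25 MHz grade: t1 min 40 ns, max 125 ns
    33: (30, 125),   # 33 MHz grade: t1 min 30 ns, max 125 ns
    40: (25, 125),   # 40 MHz grade: t1 min 25 ns, max 125 ns
}


def clk_frequency_range_mhz(grade):
    """Return (lowest, highest) CLK frequency in MHz for a speed grade."""
    t_min_ns, t_max_ns = T1_LIMITS_NS[grade]
    # f [MHz] = 1000 / t [ns]; the longest period gives the lowest frequency.
    return (1000.0 / t_max_ns, 1000.0 / t_min_ns)
```

The 125 ns maximum period corresponds to a minimum clock of 8 MHz for every grade, which agrees with the lowest frequency on the axis of Figure 7.4.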
[Timing diagram: CLK waveform with its 3.0V, 1.5V, and 0.8V reference levels, showing t1, t2, and t3; input setup (t10 min) and input hold (t11 min) relative to CLK; output valid and float timing.]
Figure 7.1. CLK, Input, and Output Timings
[Graph: typical* output delay (ns) at 1.5V, plotted from nom -5 to nom +15, versus load capacitance CL (25 to 150 pF).]
NOTES:
Graphs are not linear outside the CL range shown.
nom = nominal value given in the AC timing table.
*Typical part under worst-case conditions.
Figure 7.2. Typical Output Delay vs Load Capacitance under Worst-Case Conditions
[Graph: typical* output slew time (ns), measured from 0.8V to 2.0V, versus load capacitance CL (25 to 150 pF), with separate curves per output group; the legible curve labels include ADS#, BREQ, LOCK#, HLDA and W/R#, NENE#.]
NOTES:
Graphs are not linear outside the CL range shown.
*Typical part under worst-case conditions.
Figure 7.3. Typical Slew Time vs Load Capacitance under Worst-Case Conditions
[Graph: supply current ICC* (mA, axis marked 200 to 700) versus CLK frequency (8 to 40 MHz).]
NOTES:
Graphs are not linear outside the frequency range shown.
*Worst-case supply current at 5V.
Figure 7.4. Typical ICC vs Frequency
8.0 INSTRUCTION SET
Key to abbreviations:
For register operands, the abbreviations that describe the operands are composed of two parts. The first part
describes the type of register:
c        One of the control registers fir, psr, epsr, dirbase, db, or fsr
f        One of the floating-point registers: f0 through f31
i        One of the integer registers: r0 through r31
The second part identifies the field of the machine instruction into which the operand is to be placed:
src1     The first of the two source-register designators, which may be either a register or a 16-bit
         immediate constant or address offset. The immediate value is zero-extended for logical
         operations and is sign-extended for add and subtract operations (including addu and subu)
         and for all addressing calculations.
src1ni   Same as src1 except that no immediate constant or address offset value is permitted.
src1s    Same as src1 except that the immediate constant is a 5-bit value that is zero-extended to 32
         bits.
src2     The second of the two source-register designators.
dest     The destination register designator.
Thus, the operand specifier isrc2, for example, means that an integer register is used and that the encoding of
that register must be placed in the src2 field of the machine instruction.
Other (nonregister) operands are specified by a one-part abbreviation that represents both the type of operand
required and the instruction field into which the value of the operand is placed:
#const   A 16-bit immediate constant or address offset that the i860 XR microprocessor sign-extends
         to 32 bits when computing the effective address.
lbroff   A signed, 26-bit, immediate, relative branch offset.
sbroff   A signed, 16-bit, immediate, relative branch offset.
brx      A function that computes the target address by shifting the offset (either lbroff or sbroff) left
         by two bits, sign-extending it to 32 bits, and adding the result to the current instruction pointer
         plus four. The resulting target address may lie anywhere within the address space.
Unless otherwise specified, floating-point operations accept single- or double-precision
source operands and produce a result of equal or greater precision. Both input operands
must have the same precision. The source and result precision are specified by a two-letter
suffix to the mnemonic of the operation.
Other abbreviations include:
.p       Precision specification .ss, .sd, or .dd (.ds not permitted). Refer to Table 8.1.
.r       Precision specification .ss, .sd, .ds, or .dd. Refer to Table 8.1.
.v       .sd or .dd. Refer to Table 8.1.
.w       .ss or .dd. Refer to Table 8.1.
.x       .b (8 bits), .s (16 bits), or .l (32 bits)
.y       .l (32 bits), .d (64 bits), or .q (128 bits)
.z       .l (32 bits) or .d (64 bits)
Table 8.1. Precision Specification
Suffix   Source Precision   Result Precision
.ss      single             single
.sd      single             double
.dd      double             double
.ds      double             single
mem.x(address)   The contents of the memory location indicated by address with a size of x.
PM               The pixel mask, which is considered as an array of eight bits PM[7]..PM[0], where PM[0] is
                 the least significant bit.
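The brx function described in the key above can be sketched in a few lines. This is an illustration under stated assumptions, not Intel's definition: the helper names are hypothetical, ip is taken to be the address of the branch instruction itself, and the sketch sign-extends before shifting, which gives the same 32-bit result as shifting first and then sign-extending.

```python
# Sketch only: branch-target computation as described for brx.
# sign_extend() and the ip convention are assumptions for illustration.

def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement number."""
    value &= (1 << bits) - 1
    sign = 1 << (bits - 1)
    return (value ^ sign) - sign


def brx(offset_field, offset_bits, ip):
    """Target address for a branch at address ip.

    offset_bits is 26 for lbroff and 16 for sbroff.
    """
    offset = sign_extend(offset_field, offset_bits)
    # Offset counts 4-byte instructions, relative to the instruction pointer plus four.
    return (ip + 4 + (offset << 2)) & 0xFFFFFFFF
```

An offset of -1 therefore branches back to the branch instruction itself, and an offset of 0 targets the next sequential instruction.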
8.1 Instruction Definitions in Alphabetical Order
adds     isrc1, isrc2, idest .............................. Add Signed
         idest ← isrc1 + isrc2
         OF ← (bit 31 carry ≠ bit 30 carry)
         CC set if isrc2 < -isrc1 (signed)
         CC clear if isrc2 ≥ -isrc1 (signed)
addu     isrc1, isrc2, idest .............................. Add Unsigned
         idest ← isrc1 + isrc2
         OF ← bit 31 carry
         CC ← bit 31 carry
and      isrc1, isrc2, idest .............................. Logical AND
         idest ← isrc1 AND isrc2
         CC set if result is zero, cleared otherwise
andh     #const, isrc2, idest ............................. Logical AND High
         idest ← (#const shifted left 16 bits) AND isrc2
         CC set if result is zero, cleared otherwise
andnot   isrc1, isrc2, idest .............................. Logical AND NOT
         idest ← NOT isrc1 AND isrc2
         CC set if result is zero, cleared otherwise
andnoth  #const, isrc2, idest ............................. Logical AND NOT High
         idest ← NOT (#const shifted left 16 bits) AND isrc2
         CC set if result is zero, cleared otherwise
bc       lbroff ........................................... Branch on CC
         IF CC = 1
         THEN continue execution at brx(lbroff)
         FI
bc.t     lbroff ........................................... Branch on CC, Taken
         IF CC = 1
         THEN execute one more sequential instruction
              continue execution at brx(lbroff)
         ELSE skip next sequential instruction
         FI
bla      isrc1ni, isrc2, sbroff ........................... Branch on LCC and Add
         LCC-temp clear if isrc2 < -isrc1ni (signed)
         LCC-temp set if isrc2 ≥ -isrc1ni (signed)
         isrc2 ← isrc1ni + isrc2
         Execute one more sequential instruction
         IF LCC
         THEN LCC ← LCC-temp
              continue execution at brx(sbroff)
         ELSE LCC ← LCC-temp
         FI
bnc      lbroff ........................................... Branch on Not CC
         IF CC = 0
         THEN continue execution at brx(lbroff)
         FI
bnc.t    lbroff ........................................... Branch on Not CC, Taken
         IF CC = 0
         THEN execute one more sequential instruction
              continue execution at brx(lbroff)
         ELSE skip next sequential instruction
         FI
br       lbroff ........................................... Branch Direct Unconditionally
         Execute one more sequential instruction.
         Continue execution at brx(lbroff).
bri      [isrc1ni] ........................................ Branch Indirect Unconditionally
         Execute one more sequential instruction
         IF any trap bit in psr is set
         THEN copy PU to U, PIM to IM in psr
              clear trap bits
              IF DS is set and DIM is reset
              THEN enter dual-instruction mode after executing one
                   instruction in single-instruction mode
              ELSE IF DS is set and DIM is set
                   THEN enter single-instruction mode after executing one
                        instruction in dual-instruction mode
                   ELSE IF DIM is set
                        THEN enter dual-instruction mode
                             for next two instructions
                        ELSE enter single-instruction mode
                             for next two instructions
                        FI
                   FI
              FI
         FI
         Continue execution at address in isrc1ni
         (The original contents of isrc1ni is used even if the next instruction
         modifies isrc1ni. Does not trap if isrc1ni is misaligned.)
bte      isrc1s, isrc2, sbroff ............................ Branch If Equal
         IF isrc1s = isrc2
         THEN continue execution at brx(sbroff)
         FI
btne     isrc1s, isrc2, sbroff ............................ Branch If Not Equal
         IF isrc1s ≠ isrc2
         THEN continue execution at brx(sbroff)
         FI
call     lbroff ........................................... Subroutine Call
         r1 ← address of next sequential instruction + 4 (+ 8 in dual mode)
         Execute one more sequential instruction
         Continue execution at brx(lbroff)
calli    [isrc1ni] ........................................ Indirect Subroutine Call
         r1 ← address of next sequential instruction + 4 (+ 8 in dual mode)
         Execute one more sequential instruction
         Continue execution at address in isrc1ni
         (The original contents of isrc1ni is used even if the next instruction
         modifies isrc1ni. Does not trap if isrc1ni is misaligned.
         The register isrc1ni must not be r1.)
fadd.p   fsrc1, fsrc2, fdest .............................. Floating-Point Add
         fdest ← fsrc1 + fsrc2
faddp    fsrc1, fsrc2, fdest .............................. Add with Pixel Merge
         fdest ← fsrc1 + fsrc2
         Shift and load MERGE register as defined in Table 8.2
faddz    fsrc1, fsrc2, fdest .............................. Add with Z Merge
         fdest ← fsrc1 + fsrc2
         Shift MERGE right 16 and load fields 31..16 and 63..48
famov.r  fsrc1, fdest ..................................... Floating-Point Adder Move
         fdest ← fsrc1
         Send fsrc1 through the floating-point adder. (Preserves -0 (minus zero) when fsrc1 is -0.) fsrc2
         must be coded as f0 by the assembler.
fiadd.w  fsrc1, fsrc2, fdest .............................. Long-Integer Add
         fdest ← fsrc1 + fsrc2
fisub.w  fsrc1, fsrc2, fdest .............................. Long-Integer Subtract
         fdest ← fsrc1 - fsrc2
fix.v    fsrc1, fdest ..................................... Floating-Point to Integer Conversion
         fdest ← 64-bit value with low-order 32 bits equal to integer part of fsrc1 rounded
Floating-Point Load
fld.y    isrc1(isrc2), fdest .............................. (Normal)
fld.y    isrc1(isrc2)++, fdest ............................ (Autoincrement)
         fdest ← mem.y (isrc1 + isrc2)
         IF autoincrement
         THEN isrc2 ← isrc1 + isrc2
         FI
Cache Flush
flush    #const(isrc2) .................................... (Normal)
flush    #const(isrc2)++ .................................. (Autoincrement)
         Replace block in data cache with address (#const + isrc2).
         Contents of block undefined.
         IF autoincrement
         THEN isrc2 ← #const + isrc2
         FI
fmlow.dd fsrc1, fsrc2, fdest .............................. Floating-Point Multiply Low
         fdest ← low-order 53 bits of fsrc1 mantissa × fsrc2 mantissa
         fdest bit 53 ← most significant bit of mantissa
fmov.r   fsrc1, fdest ..................................... Floating-Point Reg-Reg Move
         Assembler pseudo-operation
         fmov.ss fsrc1, fdest = fiadd.ss fsrc1, f0, fdest
         fmov.dd fsrc1, fdest = fiadd.dd fsrc1, f0, fdest
         fmov.sd fsrc1, fdest = famov.sd fsrc1, fdest
         fmov.ds fsrc1, fdest = famov.ds fsrc1, fdest
fmul.p   fsrc1, fsrc2, fdest .............................. Floating-Point Multiply
         fdest ← fsrc1 × fsrc2
fnop     .................................................. Floating-Point No Operation
         Assembler pseudo-operation
         fnop = shrd r0, r0, r0
form     fsrc1, fdest ..................................... OR with MERGE Register
         fdest ← fsrc1 OR MERGE
         MERGE ← 0
frcp.p   fsrc2, fdest ..................................... Floating-Point Reciprocal
         fdest ← 1/fsrc2 with maximum mantissa error < 2^-7
frsqr.p  fsrc2, fdest ..................................... Floating-Point Reciprocal Square Root
         fdest ← 1/SQRT (fsrc2) with maximum mantissa error < 2^-7
Floating-Point Store
fst.y    fdest, isrc1(isrc2) .............................. (Normal)
fst.y    fdest, isrc1(isrc2)++ ............................ (Autoincrement)
         mem.y (isrc2 + isrc1) ← fdest
         IF autoincrement
         THEN isrc2 ← isrc1 + isrc2
         FI
fsub.p   fsrc1, fsrc2, fdest .............................. Floating-Point Subtract
         fdest ← fsrc1 - fsrc2
ftrunc.v fsrc1, fdest ..................................... Floating-Point to Integer Conversion
         fdest ← 64-bit value with low-order 32 bits equal to integer part of fsrc1
fxfr     fsrc1, idest ..................................... Transfer F-P to Integer Register
         idest ← fsrc1
fzchkl   fsrc1, fsrc2, fdest .............................. 32-Bit Z-Buffer Check
         Consider fsrc1, fsrc2, and fdest as arrays of two 32-bit
         fields fsrc1(0)..fsrc1(1), fsrc2(0)..fsrc2(1), and fdest(0)..fdest(1)
         where zero denotes the least-significant field.
         PM ← PM shifted right by 2 bits
         FOR i = 0 to 1
         DO
              PM[i + 6] ← fsrc2(i) ≤ fsrc1(i) (unsigned)
              fdest(i) ← smaller of fsrc2(i) and fsrc1(i)
         OD
         MERGE ← 0
fzchks   fsrc1, fsrc2, fdest .............................. 16-Bit Z-Buffer Check
         Consider fsrc1, fsrc2, and fdest as arrays of four 16-bit
         fields fsrc1(0)..fsrc1(3), fsrc2(0)..fsrc2(3), and fdest(0)..fdest(3)
         where zero denotes the least-significant field.
         PM ← PM shifted right by 4 bits
         FOR i = 0 to 3
         DO
              PM[i + 4] ← fsrc2(i) ≤ fsrc1(i) (unsigned)
              fdest(i) ← smaller of fsrc2(i) and fsrc1(i)
         OD
         MERGE ← 0
intovr   .................................................. Software Trap on Integer Overflow
         If OF in epsr = 1, generate trap with IT set in psr.
ixfr     isrc1ni, fdest ................................... Transfer Integer to F-P Register
         fdest ← isrc1ni
ld.c     csrc2, idest ..................................... Load from Control Register
         idest ← csrc2
ld.x     isrc1(isrc2), idest .............................. Load Integer
         idest ← mem.x (isrc1 + isrc2)
lock     .................................................. Begin Interlocked Sequence
         Set BL in dirbase. The next load or store that misses the cache locks that location.
         Disable interrupts until the bus is unlocked.
mov      isrc2, idest ..................................... Register-Register Move
         Assembler pseudo-operation
         mov isrc2, idest = shl r0, isrc2, idest
mov      const32, idest ................................... Constant-to-Register Move
         Assembler pseudo-operation
         adds l%const32, r0, idest       ... when const32 < 0x8000
         orh h%const32, r0, idest
         or l%const32, idest, idest      ... when const32 ≥ 0x8000
nop      .................................................. Core-Unit No Operation
         Assembler pseudo-operation
         nop = shl r0, r0, r0
or       isrc1, isrc2, idest .............................. Logical OR
         idest ← isrc1 OR isrc2
         CC set if result is zero, cleared otherwise
orh      #const, isrc2, idest ............................. Logical OR High
         idest ← (#const shifted left 16 bits) OR isrc2
         CC set if result is zero, cleared otherwise
pfadd.p  fsrc1, fsrc2, fdest .............................. Pipelined Floating-Point Add
         fdest ← last stage Adder result
         Advance A pipeline one stage
         A pipeline first stage ← fsrc1 + fsrc2
pfaddp   fsrc1, fsrc2, fdest .............................. Pipelined Add with Pixel Merge
         fdest ← last stage Graphics result
         last stage Graphics result ← fsrc1 + fsrc2
         Shift and load MERGE register from last stage Graphics result as defined in Table 8.2
pfaddz   fsrc1, fsrc2, fdest .............................. Pipelined Add with Z Merge
         fdest ← last stage Graphics result
         last stage Graphics result ← fsrc1 + fsrc2
         Shift MERGE right 16 and load fields 31..16 and 63..48 from last stage Graphics result
pfam.p   fsrc1, fsrc2, fdest .............................. Pipelined Floating-Point Add and Multiply
         fdest ← last stage Adder result
         Advance A and M pipeline one stage (operands accessed before advancing pipeline)
         A pipeline first stage ← A-op1 + A-op2
         M pipeline first stage ← M-op1 × M-op2
pfamov.r fsrc1, fdest ..................................... Pipelined Floating-Point Adder Move
         fdest ← last stage Adder result
         Advance A pipeline one stage
         A pipeline first stage ← fsrc1
pfeq.p   fsrc1, fsrc2, fdest .............................. Pipelined Floating-Point Equal Compare
         fdest ← last stage Adder result
         CC set if fsrc1 = fsrc2, else cleared
         Advance A pipeline one stage
         A pipeline first stage is undefined, but no result exception occurs
pfgt.p   fsrc1, fsrc2, fdest .............................. Pipelined Floating-Point Greater-Than Compare
         (Assembler clears R-bit of instruction)
         fdest ← last stage Adder result
         CC set if fsrc1 > fsrc2, else cleared
         Advance A pipeline one stage
         A pipeline first stage is undefined, but no result exception occurs
pfiadd.w fsrc1, fsrc2, fdest .............................. Pipelined Long-Integer Add
         fdest ← last stage Graphics result
         last stage Graphics result ← fsrc1 + fsrc2
pfisub.w fsrc1, fsrc2, fdest .............................. Pipelined Long-Integer Subtract
         fdest ← last stage Graphics result
         last stage Graphics result ← fsrc1 - fsrc2
pfix.v   fsrc1, fdest ..................................... Pipelined Floating-Point to Integer Conversion
         fdest ← last stage Adder result
         Advance A pipeline one stage
         A pipeline first stage ← 64-bit value with low-order 32 bits
         equal to integer part of fsrc1 rounded
Pipelined Floating-Point Load
pfld.z   isrc1(isrc2), fdest .............................. (Normal)
pfld.z   isrc1(isrc2)++, fdest ............................ (Autoincrement)
         fdest ← mem.z (third previous pfld's (isrc1 + isrc2))
         (where .z is precision of third previous pfld.z)
         IF autoincrement
         THEN isrc2 ← isrc1 + isrc2
         FI
pfle.p   fsrc1, fsrc2, fdest .............................. Pipelined F-P Less-Than or Equal Compare
         Assembler pseudo-operation, identical to pfgt.p except that
         assembler sets R-bit of instruction.
         fdest ← last stage Adder result
         CC clear if fsrc1 ≤ fsrc2, else set
         Advance A pipeline one stage
         A pipeline first stage is undefined, but no result exception occurs
pfmam.p  fsrc1, fsrc2, fdest .............................. Pipelined Floating-Point Add and Multiply
         fdest ← last stage Multiplier result
         Advance A and M pipeline one stage (operands accessed before advancing pipeline)
         A pipeline first stage ← A-op1 + A-op2
         M pipeline first stage ← M-op1 × M-op2
pfmov.r  fsrc1, fdest ..................................... Pipelined Floating-Point Reg-Reg Move
         Assembler pseudo-operation
         pfmov.ss fsrc1, fdest = pfiadd.ss fsrc1, f0, fdest
         pfmov.dd fsrc1, fdest = pfiadd.dd fsrc1, f0, fdest
         pfmov.sd fsrc1, fdest = pfamov.sd fsrc1, fdest
         pfmov.ds fsrc1, fdest = pfamov.ds fsrc1, fdest
pfmsm.p  fsrc1, fsrc2, fdest .............................. Pipelined Floating-Point Subtract and Multiply
         fdest ← last stage Multiplier result
         Advance A and M pipeline one stage (operands accessed before advancing pipeline)
A pipeline first stage isrc1 (signed)
         CC clear if isrc2 ≤ isrc1 (signed)
subu     isrc1, isrc2, idest .............................. Subtract Unsigned
         idest ← isrc1 - isrc2
         OF ← NOT (bit 31 carry)
         CC ← bit 31 carry
         (i.e., CC set if isrc2 ≤ isrc1 (unsigned)
         CC clear if isrc2 > isrc1 (unsigned))
trap     isrc1ni, isrc2, idest ............................ Software Trap
         Generate trap with IT set in psr
unlock   .................................................. End Interlocked Sequence
         Clear BL in dirbase. The next load or store unlocks the bus.
         Enable interrupts after bus is unlocked.
xor      isrc1, isrc2, idest .............................. Logical Exclusive OR
         idest ← isrc1 XOR isrc2
         CC set if result is zero, cleared otherwise
xorh     #const, isrc2, idest ............................. Logical Exclusive OR High
         idest ← (#const shifted left 16 bits) XOR isrc2
         CC set if result is zero, cleared otherwise
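The flag rules given for adds and addu in Section 8.1 can be checked with a small emulation. This is a minimal sketch for illustration, not a model of the hardware: the function names mirror the mnemonics, register values are passed as 32-bit integers, and each function returns a tuple (idest, OF, CC), which is a convention chosen here.

```python
# Sketch only: emulate the result and the OF/CC rules of addu and adds.
# The (idest, OF, CC) return convention is an assumption for illustration.

MASK32 = 0xFFFFFFFF


def to_signed(x):
    """Interpret a 32-bit pattern as a two's-complement integer."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x


def addu(isrc1, isrc2):
    total = (isrc1 & MASK32) + (isrc2 & MASK32)
    carry31 = total >> 32                 # carry out of bit 31
    return total & MASK32, carry31, carry31   # OF and CC both take bit 31 carry


def adds(isrc1, isrc2):
    a, b = isrc1 & MASK32, isrc2 & MASK32
    carry31 = (a + b) >> 32                                   # carry out of bit 31
    carry30 = ((a & 0x7FFFFFFF) + (b & 0x7FFFFFFF)) >> 31     # carry out of bit 30
    overflow = int(carry31 != carry30)                        # OF rule
    cc = int(to_signed(b) < -to_signed(a))                    # CC set if isrc2 < -isrc1
    return (a + b) & MASK32, overflow, cc
```

Adding 1 to 0x7FFFFFFF with adds wraps to 0x80000000 and sets OF, while CC stays clear because 1 is not less than -0x7FFFFFFF.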
Table 8.2. FADDP MERGE Update
Pixel Size   Fields Loaded From                 Right Shift Amount
(from PS)    Result into MERGE                  (Field Size)
8            63..56, 47..40, 31..24, 15..8      8
16           63..58, 47..42, 31..26, 15..10     6
32           63..56, 31..24                     8
Table 8.3. Register Encoding
Register                   Encoding
r0                         0
...                        ...
r31                        31
f0                         0
...                        ...
f31                        31
Fault Instruction          0
Processor Status           1
Directory Base             2
Data Breakpoint            3
Floating-Point Status      4
Extended Process Status    5
8.2 Instruction Format and Encoding
All instructions are 32 bits long and begin on a four-byte boundary. When operands are registers, the
register encodings shown in Table 8.3 are used.
There are two general core-instruction formats, REG-format and CTRL-format, as well as a separate
format for floating-point instructions.
8.2.1 REG-FORMAT INSTRUCTIONS
Within the REG-format are several variations as shown in Figure 8.1. Table 8.4 gives the encodings
for these instructions. One encoding is an escape code that defines yet another variation: the core escape
instructions. Figure 8.2 shows the format of this group, and Table 8.5 shows the encodings.
In these instructions, the src2 field selects one of the 32 integer registers (most instructions) or five
control registers (st.c and ld.c). Dest selects one of the 32 integer registers (most instructions) or
floating-point registers (fld, fst, pfld, pst, ixfr). For instructions where src1 is optionally an immediate
value, bit 26 of the opcode (I-bit) indicates whether src1 is an immediate. If bit 26 is clear, an integer
register is used; if bit 26 is set, src1 is contained in the low-order 16 bits, except for bte and btne
instructions. For bte and btne, the five-bit immediate value is contained in the src1 field. For st, bte,
btne, and bla, the upper five bits of the offset or broffset are contained in the dest field instead of src1,
and the lower 11 bits of offset are the lower 11 bits of the instruction.
For ld and st, bits 28 and zero determine operand size as follows:
Bit 28   Bit 0   Operand Size
0        0       8-bits
0        1       8-bits
1        0       16-bits
1        1       32-bits
When src1 is an immediate and bit 28 is set, bit zero of the immediate value is forced to zero.
For fld, fst, pfld, pst, and flush, bit 0 selects autoincrement addressing if set. For fld, fst, pfld, and
pst, bits one and two select the operand size as follows:
Bit 1    Bit 2   Operand Size
0        0       64-bits
0        1       128-bits
1        0       32-bits
1        1       32-bits
When src1 is an immediate value, bits zero and one of the immediate value are forced to zero to
maintain alignment. When bit one of the immediate value is clear, bit two is also forced to zero.
For flush, bits one and two must be zero.
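The two operand-size tables and the split offset used by st, bte, btne, and bla can be expressed as small helpers. This is an illustrative sketch only; the function names are hypothetical, and the helpers work on already-extracted bit values rather than on whole instruction words.

```python
# Sketch only: operand-size selection and offset splitting as described in
# Section 8.2.1. All function names are assumptions for illustration.

def ld_st_operand_bits(bit28, bit0):
    """Operand size in bits for ld and st, from instruction bits 28 and 0."""
    if bit28 == 0:
        return 8                  # bit 0 does not matter when bit 28 is clear
    return 32 if bit0 else 16


def fp_operand_bits(bit1, bit2):
    """Operand size in bits for fld, fst, pfld, and pst, from bits 1 and 2."""
    if bit1:
        return 32                 # bit 2 does not matter when bit 1 is set
    return 128 if bit2 else 64


def split_offset(offset16):
    """Split a 16-bit offset into (upper five bits, lower 11 bits)."""
    offset16 &= 0xFFFF
    return offset16 >> 11, offset16 & 0x7FF


def join_offset(high5, low11):
    """Rebuild the 16-bit offset from the dest-field part and the low 11 bits."""
    return ((high5 & 0x1F) << 11) | (low11 & 0x7FF)
```

The upper five bits returned by split_offset are the part the text places in the dest field; the lower 11 bits occupy the low end of the instruction word.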
General Format
  31..26      25..21    20..16    15..11    10..0
  OPCODE/I    SRC2      DEST      SRC1      IMMEDIATE, OFFSET, OR NULL
16-Bit Immediate Variant (except bte and btne)
  31..26      25..21    20..16    15..0
  OPCODE/I    SRC2      DEST      IMMEDIATE
st, bla, bte, and btne
  31..26      25..21    20..16         15..11          10..0
  OPCODE/I    SRC2      OFFSET HIGH    SRC1 (SRC1S)    OFFSET LOW
bte and btne with 5-Bit Immediate
  31..26      25..21    20..16         15..11          10..0
  OPCODE/I    SRC2      OFFSET HIGH    IMMEDIATE       OFFSET LOW
Figure 8.1. REG-Format Variations
Table 8.4. REG-Format Opcodes
ld.x                   Load Integer
st.x                   Store Integer
ixfr                   Integer to F-P Reg Transfer
                       (reserved)
fld.x, fst.x           Load/Store F-P
flush                  Flush
pst.d                  Pixel Store
ld.c, st.c             Load/Store Control Register
bri                    Branch Indirect
trap                   Trap
                       (Escape for F-P Unit)
                       (Escape for Core Unit)
bte, btne              Branch Equal or Not Equal
pfld.y                 Pipelined F-P Load
                       (CTRL-Format Instructions)
addu, -s, subu, -s     Add/Subtract
shl, shr               Logical Shift
shrd                   Double Shift
bla                    Branch LCC Set and Add
shra                   Arithmetic Shift
and(h)                 AND
andnot(h)              ANDNOT
or(h)                  OR
xor(h)                 XOR
                       (reserved)
L    Integer Length
     0 - 8 bits
     1 - 16 or 32 bits (selected by bit 0)
LS   Load/Store
     0 - Load
     1 - Store
SO   Signed/Ordinal
     0 - Ordinal
     1 - Signed
H    High
     0 - and, or, andnot, xor
     1 - andh, orh, andnoth, xorh
31
26
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
1
1
1
1
1
1
1
1
1
1
AS
LR
E
I
0
1
0
1
1
1
0
0
0
0
1
0
x
SO
0
1
1
1
0
1
0
1
x
x
10
SRC1
L
L
0
1
1
1
LS
0
1
LS
0
0
1
1
E
0
I
1
0
0
I
1
1
0
0
1
0
1
I
I
x
AS
LR
x
I
I
0
0
1
H
H
H
H
1
0
1
I
I
I
I
I
0
5
reserved'
Figure 8.2. Core Escape Instruction Format
2-230
0
0
0
0
1
1
1
1
0
0
0
0
0
1
1
0
1
1
1
1
0
0
1
1
Add/Subtract
0 -Add
1 -Subtract
Left/Right
0 -Left Shift
1 -Right Shift
Equal
0 -Branch on Not Equal
1 -Branch on Equal
Immediate
0 -src 1 is register
1 -src1 is immediate
15
1 0 0 1
reserved'
0
1
1
'reserved (must be set to zero by assemblers)
0
0
0
0
0
0
0
0
1
1
1
1
1
1
1
0
0
0
0
0
1
1
1
1
1
0
OPCODE
I
Table 8.5. Core Escape Opcodes
                                          Bits 4..0
           (reserved)                     0 0 0 0 0
lock       Begin Interlocked Sequence     0 0 0 0 1
calli      Indirect Subroutine Call       0 0 0 1 0
           (reserved)                     0 0 0 1 1
intovr     Trap on Integer Overflow       0 0 1 0 0
           (reserved)                     0 0 1 0 1
           (reserved)                     0 0 1 1 0
unlock     End Interlocked Sequence       0 0 1 1 1
           (reserved)                     1 0 x x x
           (reserved)                     0 1 x x x
           (reserved)                     1 1 x x x
8.2.2 CTRL-FORMAT INSTRUCTIONS
The CTRL instructions do not refer to registers, so instead of the register fields, they have a 26-bit relative
branch offset. Figure 8.3 shows the format of these instructions and Table 8.6 defines the encodings.
  31..26      25..0
  OPCODE      BROFFSET
BROFFSET is a signed 26-bit relative branch offset.
Figure 8.3. CTRL Instruction Format
Table 8.6. CTRL-Format Opcodes
                                     Bits 28..26
            (reserved)               0 0 0
            (reserved)               0 0 1
br          Branch Direct            0 1 0
call        Call                     0 1 1
bc(.t)      Branch on CC Set         1 0 T
bnc(.t)     Branch on CC Clear       1 1 T
T   Taken
    0 - bc or bnc
    1 - bc.t or bnc.t
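Table 8.6 and Figure 8.3 are enough to pick apart the low 29 bits of a CTRL-format word. The decoder below is a sketch under stated assumptions: the name decode_ctrl is hypothetical, it looks only at bits 28..26 and the 26-bit BROFFSET field, and it does not check bits 31..29, which this section does not spell out.

```python
# Sketch only: decode bits 28..26 and BROFFSET of a CTRL-format word.
# decode_ctrl() is a hypothetical helper; bits 31..29 are deliberately ignored.

CTRL_OPS = {
    0b010: "br",
    0b011: "call",
    0b100: "bc",
    0b101: "bc.t",
    0b110: "bnc",
    0b111: "bnc.t",
}


def decode_ctrl(word):
    """Return (mnemonic, signed_offset) for a 32-bit CTRL-format word."""
    op = (word >> 26) & 0b111
    raw = word & 0x03FFFFFF                       # BROFFSET, bits 25..0
    # BROFFSET is signed: bit 25 is the sign bit of the 26-bit field.
    offset = raw - (1 << 26) if raw & (1 << 25) else raw
    return CTRL_OPS.get(op, "(reserved)"), offset
```

The offset returned here counts instructions, not bytes; the brx function described in the key to abbreviations converts it to a target address.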
8.2.3 FLOATING-POINT INSTRUCTIONS
The floating-point instructions also constitute an escape series. All these instructions begin with the bit sequence 010010. Figure 8.4 shows the format of the floating-point instructions, and Table 8.7 gives the encodings. Within the dual-operation instructions is a subcode DPC whose values are given in Table 8.8 along with
the mnemonic that corresponds to each.
  31..26      25..21    20..16    15..11    10..7      6..0
  010010      SRC2      DEST      SRC1      P D S R    OPCODE
SRC1, SRC2 - Source; one of 32 floating-point registers
DEST       - Destination register
             (instructions other than fxfr) one of 32 floating-point registers
             (fxfr) one of 32 integer registers
P  Pipelining
   1 - Pipelined instruction mode
   0 - Scalar instruction mode
D  Dual-Instruction Mode
   1 - Dual-instruction mode
   0 - Single-instruction mode
S  Source Precision
   1 - Double-precision source operands
   0 - Single-precision source operands
R  Result Precision
   1 - Double-precision result
   0 - Single-precision result
Figure 8.4. Floating-Point Instruction Encoding
Table 8.7. Floating-Point Opcodes
                                               Bits 6..0
pfam          Add and Multiply*                0 0 0 DPC
pfmam         Multiply with Add*               0 0 0 DPC
pfsm          Subtract and Multiply*           0 0 1 DPC
pfmsm         Multiply with Subtract*          0 0 1 DPC
(p)fmul       Multiply                         0 1 0 0 0 0 0
fmlow         Multiply Low                     0 1 0 0 0 0 1
frcp          Reciprocal                       0 1 0 0 0 1 0
frsqr         Reciprocal Square Root           0 1 0 0 0 1 1
pfmul3.dd     3-Stage Pipelined Multiply       0 1 0 0 1 0 0
(p)fadd       Add                              0 1 1 0 0 0 0
(p)fsub       Subtract                         0 1 1 0 0 0 1
(p)fix        Fix                              0 1 1 0 0 1 0
(p)famov      Adder Move                       0 1 1 0 0 1 1
pfgt/pfle**   Greater Than                     0 1 1 0 1 0 0
pfeq          Equal                            0 1 1 0 1 0 1
(p)ftrunc     Truncate                         0 1 1 1 0 1 0
fxfr          Transfer to Integer Register     1 0 0 0 0 0 0
(p)fiadd      Long-Integer Add                 1 0 0 1 0 0 1
(p)fisub      Long-Integer Subtract            1 0 0 1 1 0 1
(p)fzchkl     Z-Check Long                     1 0 1 0 1 1 1
(p)fzchks     Z-Check Short                    1 0 1 1 1 1 1
(p)faddp      Add with Pixel Merge             1 0 1 0 0 0 0
(p)faddz      Add with Z Merge                 1 0 1 0 0 0 1
(p)form       OR with MERGE Register           1 0 1 1 0 1 0
*pfam and pfsm have P-bit set; pfmam and pfmsm have P-bit clear.
**pfgt has R bit cleared; pfle has R bit set.
NOTE:
All opcodes not shown are reserved.
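The seven-bit opcode field of Table 8.7 maps naturally onto a lookup table. The sketch below is illustrative only: the dictionary and function names are hypothetical, the bit patterns are the ones read from the table's columns, and the dual-operation groups are identified by their three leading bits with the remaining four bits left as the DPC subcode.

```python
# Sketch only: map the 7-bit floating-point opcode field (bits 6..0) to a
# mnemonic per Table 8.7. Names are assumptions for illustration.

FP_OPCODES = {
    0b0100000: "(p)fmul",
    0b0100001: "fmlow",
    0b0100010: "frcp",
    0b0100011: "frsqr",
    0b0100100: "pfmul3.dd",
    0b0110000: "(p)fadd",
    0b0110001: "(p)fsub",
    0b0110010: "(p)fix",
    0b0110011: "(p)famov",
    0b0110100: "pfgt/pfle",
    0b0110101: "pfeq",
    0b0111010: "(p)ftrunc",
    0b1000000: "fxfr",
    0b1001001: "(p)fiadd",
    0b1001101: "(p)fisub",
    0b1010111: "(p)fzchkl",
    0b1011111: "(p)fzchks",
    0b1010000: "(p)faddp",
    0b1010001: "(p)faddz",
    0b1011010: "(p)form",
}


def fp_mnemonic(opcode7):
    """Return the mnemonic group for a 7-bit floating-point opcode."""
    opcode7 &= 0x7F
    top3 = opcode7 >> 4
    if top3 == 0b000:
        return "pfam/pfmam"       # low four bits are the DPC subcode
    if top3 == 0b001:
        return "pfsm/pfmsm"       # low four bits are the DPC subcode
    return FP_OPCODES.get(opcode7, "(reserved)")
```

Distinguishing pfam from pfmam (or pfgt from pfle) needs the P and R bits of the instruction as well, which this lookup does not examine.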
The following table shows the opcode mnemonics that generate the various encodings of DPC and explains
each encoding.
Table 8.8. DPC Encoding
DPC   PFAM      PFSM      M-Unit  M-Unit    A-Unit    A-Unit    T     K
      Mnemonic  Mnemonic  op1     op2       op1       op2       Load  Load*
0000  r2p1      r2s1      KR      src2      src1      M result  No    No
0001  r2pt      r2st      KR      src2      T         M result  No    Yes
0010  r2ap1     r2as1     KR      src2      src1      A result  Yes   No
0011  r2apt     r2ast     KR      src2      T         A result  Yes   Yes
0100  i2p1      i2s1      KI      src2      src1      M result  No    No
0101  i2pt      i2st      KI      src2      T         M result  No    Yes
0110  i2ap1     i2as1     KI      src2      src1      A result  Yes   No
0111  i2apt     i2ast     KI      src2      T         A result  Yes   Yes
1000  rat1p2    rat1s2    KR      A result  src1      src2      Yes   No
1001  m12apm    m12asm    src1    src2      A result  M result  No    No
1010  ra1p2     ra1s2     KR      A result  src1      src2      No    No
1011  m12ttpa   m12ttsa   src1    src2      T         A result  Yes   No
1100  iat1p2    iat1s2    KI      A result  src1      src2      Yes   No
1101  m12tpm    m12tsm    src1    src2      T         M result  No    No
1110  ia1p2     ia1s2     KI      A result  src1      src2      No    No
1111  m12tpa    m12tsa    src1    src2      T         A result  No    No
ope
PFMAM
Mnemonic
PFMSM
Mnemonic
M-Unit
op1
M-Unit
op2
A-Unit
op1
A-Unit
op2
T
Load
Load'
0000
0001
0010
0011
mr2p1
mr2pt
mr2mp1
mr2mpt
mr2s1
mr2st
mr2ms1
mr2mst
KR
KR
KR
KR
src2
src2
src2
src2
src1
T
src1
T
M result
M result
M result
M result
No
No
Yes
Yes
No.
Yes
No
Yes
0100
0101
0110
0111
mi2p1
mi2pt
mi2mp1
mi2mpt
mi2s1
mi2st
mi2ms1
mi2mst
KI
KI
KI
KI
src2
src2
src2
src2
src1
T
src1
T
M result
M result
M result
M result
No
No
Yes
Yes
No
Yes
No
Yes
1000
1001
1010
1011
mrmt1p2
mm12mpm
mrm1p2
mm12ttpm
mrmt1s2
mm12msm
mrm1s2
mm12ttsm
KR
src1
KR
src1
M result
src2
M result
src2
src1
M result
src1
T
src2
M result
src2
A result
Yes
No
No
Yes
No
No
No
No
1100
1101
1110
1111
mimt1p2
mm12tpm
mim1p2
mimt1s2
mm12tsm
mim1s2
KI
src1
KI
src2
M result
src2
Yes
No
No
No
No
No
M result
src1
src2
T
M result
src1
Intel-Reserved
. .
K
..
*If K-load is set, KR is loaded when operand-1 of the multiplier is KR; KI is loaded when operand-1 of the multiplier is KI.
8.3 Instruction Timings
i860 XR microprocessor instructions take one clock
to execute unless a freeze condition is invoked.
Freeze conditions and their associated delays are
shown in the table below. Freezes due to multiple
simultaneous cache misses result in a delay that is
the sum of the delays for processing each miss by
itself. Other multiple freeze conditions usually add
only the delay of the longest individual freeze.
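The two combining rules in the paragraph above can be written out directly. This is a rough sketch, not a timing model: the function names are hypothetical, delays are given in clocks, and the word "usually" in the text is treated as "always" for the second rule, which is a simplifying assumption.

```python
# Sketch only: combine freeze delays (in clocks) per the two rules above.
# Function names are assumptions; "usually" is simplified to "always".

def simultaneous_cache_miss_delay(miss_delays):
    """Multiple simultaneous cache misses: the individual delays add up."""
    return sum(miss_delays)


def other_multiple_freeze_delay(freeze_delays):
    """Other overlapping freezes: only the longest individual delay counts."""
    return max(freeze_delays, default=0)
```

For example, two simultaneous cache misses costing 6 and 4 clocks freeze the processor for 10 clocks, whereas two other overlapping freezes of 2 clocks and 1 clock cost 2 clocks under this reading.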
Freeze Condition: Instruction-cache miss
Delay: Number of clocks to read instruction (from ADS# clock to first READY# clock) plus time to last
READY# of block when jump or freeze occurs during miss processing plus two clocks if data-cache
being accessed when instruction-cache miss occurs.
Freeze Condition: Reference to destination of ld instruction that misses
Delay: One plus number of clocks to read data (from ADS# clock to first READY# clock) minus number
of instructions executed since load (not counting instruction that references load destination)
Freeze Condition: fld miss
Delay: One plus number of clocks until first READY# returned (for 32- or 64-bit read cycles) or until
second READY# returned (for 128-bit fld.q read cycles)
Freeze Condition: call, calli, ixfr, fxfr, ld.c, or st.c and data cache load miss processing in progress
Delay: One plus number of clocks until first READY# returned (for 64-bit read cycles) or until second
READY# returned (for 128-bit fld.q read cycles)
Freeze Condition: ld/st/pfld/fld/fst and data cache load miss processing in progress
Delay: One plus number of clocks until last READY# returned
Freeze Condition: Reference to dest of ld, call, calli, fxfr, or ld.c in the next instruction. (Dest of call and calli is r1.)
Delay: One clock
Freeze Condition: Reference to dest of fld/pfld/ixfr in the next two instructions
Delay: Two clocks in the first instruction; one in the second instruction

Freeze Condition: bc/bnc/bc.t/bnc.t following addu/adds/subu/subs/pfeq/pfle/pfgt
Delay: One clock

Freeze Condition: Fsrc1 of multiplier operation refers to result of previous operation
Delay: One clock

Freeze Condition: Floating-point operation or graphics-unit instruction or fst, and scalar operation in progress other than frcp or frsqr
Delay: If the scalar operation is fadd, fix, fmlow, fmul.ss, fmul.sd, ftrunc, or fsub, two minus the number of instructions (or dual-mode pairs) already executed after the scalar operation. If the scalar operation is fmul.dd, three minus the number of instructions (or dual-mode pairs) executed after it. Add one if either or both of these two situations occur:
  1. There is an overlap between the result register of the previous scalar operation and the source of the floating-point operation, and the destination precision of the scalar operation is different than the source precision of the floating-point operation.
  2. The floating-point operation is pipelined and its destination is not f0.
  There is no delay if the result is negative.

Freeze Condition: Multiplier operation preceded by a double precision multiply
Delay: One clock

Freeze Condition: TLB miss
Delay: Five plus the number of clocks to finish two reads plus the number of clocks to set A-bits (if necessary)

Freeze Condition: pfld when three pfld's are outstanding
Delay: One plus the number of clocks to return data from first pfld

Freeze Condition: pfld hits in the data cache
Delay: Two plus the number of clocks to finish all outstanding accesses

Freeze Condition: st, pst or fst miss, ld miss, or flush with modified block when store path full (two stores or one 256-bit write-back internally waiting for bus plus external bus pipeline full)
Delay: One plus the number of clocks until READY# active on next 64-bit write cycle or second READY# of next 128-bit write cycle.

Freeze Condition: ld, fld, pfld, st, pst, or fst when address path full (one address internally waiting for bus plus external bus pipeline full)
Delay: Number of clocks until next nonrepeated address can be issued (i.e., an address that is not the 2nd-4th cycle of a cache fill, the 2nd-8th cycle of a CS8 mode instruction fetch, nor the 2nd cycle of a 128-bit write)

Freeze Condition: ld/fld following st/fst hit
Delay: One clock
Freeze Condition: Delayed branch not taken
Delay: One clock

Freeze Condition: Nondelayed branch taken:
  bc, bnc: One clock
  bte, btne: Two clocks

Freeze Condition: Indirect branch bri or call calli
Delay: One clock

Freeze Condition: st.c
Delay: Two clocks

Freeze Condition: Result of graphics-unit instruction (other than fmov.dd) used in next instruction when the next instruction is an adder- or multiplier-unit instruction
Delay: One clock

Freeze Condition: Result of graphics-unit instruction used in next instruction when the next instruction is a graphics-unit instruction
Delay: One clock

Freeze Condition: flush followed by flush
Delay: Three clocks minus the number of instructions between the two flush instructions. There is no delay if the result is negative.

Freeze Condition: fst or pst followed by pipelined floating-point operation that overwrites the register being stored
Delay: One clock
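
The arithmetic in a few of the freeze entries above can be sketched in code. The following Python is only an illustration, not part of the data sheet: the function and parameter names are hypothetical, and it assumes that the "no delay if the result is negative" rule is applied after the optional extra clock has been added.

```python
# Scalar operations whose freeze starts from "two minus ..." in the table above.
TWO_CLOCK_SCALAR_OPS = {"fadd", "fix", "fmlow", "fmul.ss", "fmul.sd", "ftrunc", "fsub"}


def scalar_in_progress_freeze(scalar_op, executed_since,
                              precision_mismatch_overlap=False,
                              pipelined_nonzero_dest=False):
    """Freeze clocks for an FP/graphics/fst instruction issued while a
    scalar operation (other than frcp or frsqr) is still in progress.

    executed_since: instructions (or dual-mode pairs) already executed
    after the scalar operation.
    """
    if scalar_op in TWO_CLOCK_SCALAR_OPS:
        base = 2
    elif scalar_op == "fmul.dd":
        base = 3
    else:
        raise ValueError(f"no table entry for scalar operation {scalar_op!r}")
    delay = base - executed_since
    # One extra clock if either (or both) of the two listed situations occurs.
    if precision_mismatch_overlap or pipelined_nonzero_dest:
        delay += 1
    # Assumption: a negative result means no delay, checked after the addition.
    return max(delay, 0)


def flush_after_flush_freeze(instructions_between):
    """Three clocks minus the instructions between two flush instructions."""
    return max(3 - instructions_between, 0)


def simultaneous_cache_miss_freeze(delays):
    """Multiple simultaneous cache misses: the individual delays add up."""
    return sum(delays)


def other_simultaneous_freeze(delays):
    """Other multiple freezes: usually only the longest individual delay."""
    return max(delays)
```

For example, under these assumptions an fmul.dd followed one instruction later by a dependent floating-point operation would freeze for two clocks, and two flush instructions separated by three or more instructions would not freeze at all.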
8.4 Instruction Characteristics
The following table lists some of the characteristics
of each instruction. The characteristics are:
• What processing unit executes the instruction. The codes for processing units are:
  A    Floating-point adder unit
  E    Core execution unit
  G    Graphics unit
  M    Floating-point multiplier unit
• Whether the instruction is pipelined or not. A P indicates that the instruction is pipelined.
• Whether the instruction is a delayed branch instruction. A D marks the delayed branches.
• Whether the instruction changes the condition code CC. A CC marks those instructions that change CC.
• Which faults can be caused by the instruction. The codes used for exceptions are:
  IT   Instruction Fault
  SE   Floating-Point Source Exception
  RE   Floating-Point Result Exception, including overflow, underflow, inexact result
  DAT  Data Access Fault
  Note that this is not the same as specifying at which instructions faults may be reported. A result exception is reported on the subsequent floating-point instruction, pst, fst, or sometimes fld, pfld, and ixfr.
  The instruction access fault IAT and the interrupt trap IN are not shown in the table because they can occur for any instruction.
• Performance notes. These comments regarding optimum performance are recommendations only. If these recommendations are not followed, the i860 XR microprocessor automatically waits the necessary number of clocks to satisfy internal hardware requirements. The following notes define the numeric codes that appear in the instruction table:
  1. The following instruction should not be a conditional branch (bc, bnc, bc.t, or bnc.t).
  2. The destination should not be a source operand of the next two instructions.
  3. A load should not directly follow a store that is expected to hit in the data cache.
  4. When the prior instruction is scalar, fsrc1 should not be the same as the fdest of the prior operation.
  5. The fdest should not reference the destination of the next instruction if that instruction is a pipelined floating-point operation.
  6. The destination should not be a source operand of the next instruction. (For call and calli, the destination is r1.)
  7. When the prior operation is scalar and multiplier op1 is fsrc1, fsrc2 should not be the same as the fdest of the prior operation.
  8. When the prior operation is scalar, fsrc1 and fsrc2 of the current operation should not be the same as fdest of the prior operation.
  9. A pfld should not immediately follow a pfld.
• Programming restrictions. These indicate combinations of conditions that must be avoided by programmers, assemblers, and compilers. The following notes define the alphabetic codes that appear in the instruction table:
  a. The sequential instruction following a delayed control-transfer instruction may not be another control-transfer instruction (except in the case of external interrupts), nor a trap instruction, nor the target of a control-transfer instruction.
  b. When using a bri to return from a trap handler, programmers should take care to prevent traps from occurring on that or on the next sequential instruction. IM should be zero (interrupts disabled) when the bri is executed.
  c. If fdest is not zero, fsrc1 must not be the same as fdest.
  d. When fsrc1 goes to the multiplier op1, KR, or KI, fsrc1 must not be the same as fdest.
  e. If fdest is not zero, fsrc1 and fsrc2 must not be the same as fdest.
  f. isrc1 must not be the same as isrc2 for the autoincrementing form of this instruction.
  g. isrc1 must not be the same as isrc2.
• Core and Floating-Point Instruction Interaction in Dual-Instruction Mode
  1. If one of the branch-on-condition instructions bc or bnc is paired with a floating-point compare, the branch tests the value of the condition code prior to the compare.
  2. If an ixfr, fld, or pfld loads the same register as a source operand in the floating-point instruction, the floating-point instruction references the register value before the load updates it.
  3. An fst or pst that stores a register that is the destination register of the companion pipelined floating-point operation will store the result of the companion operation.
  4. When the core instruction sets CC and the floating-point instruction is pfgt, pfle, or pfeq, CC is set according to the result of pfgt, pfle, or pfeq.
  5. When a trap instruction causes a trap in dual-instruction mode, the floating-point instruction has neither completed execution nor has updated the FT bit or any result status bits. This is not a problem when the trap is inserted by a debugger, because the trap is replaced by the original instruction, and the dual-mode pair is reexecuted. However, when the trap is programmed, the trap handler must avoid reexecuting the trap by returning to user code at the address in fir + 8. In this case, the trap handler must emulate the floating-point instruction before returning to the user code. Emulation of the instruction must include all side-effects (for example, the effect of its D-bit, effect on the pipelines, and effect on FT and result-status bits), just as if the instruction had been executed by the processor in the original context.
  6. In dual-instruction mode, when the intovr instruction causes a trap, the floating-point companion instruction has completely finished execution before the trap is taken.
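
Programming restrictions c, e, and g are simple register-equality rules, so an assembler or compiler could screen for them mechanically. The sketch below is a hypothetical illustration in Python, not an Intel tool: registers are represented as plain integers (f0 is 0), the function names are invented here, and multi-register operands (for example double-precision pairs) are not modeled.

```python
def violates_restriction_c(fsrc1, fdest):
    """Restriction c: if fdest is not zero, fsrc1 must not equal fdest."""
    return fdest != 0 and fsrc1 == fdest


def violates_restriction_e(fsrc1, fsrc2, fdest):
    """Restriction e: if fdest is not zero, neither fsrc1 nor fsrc2 may equal fdest."""
    return fdest != 0 and fdest in (fsrc1, fsrc2)


def violates_restriction_g(isrc1, isrc2):
    """Restriction g: isrc1 must not be the same as isrc2."""
    return isrc1 == isrc2
```

A tool would apply each check only to the instructions that carry the corresponding letter in the instruction table.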
• Programming Restrictions for Dual-Instruction Mode
  1. The result of placing a core instruction in the low-order 32 bits or a floating-point instruction in the high-order 32 bits is not defined (except for shrd r0, r0, r0 which is interpreted as fnop).
  2. A floating-point instruction that has the D-bit set must be aligned on a 64-bit boundary (i.e., the three least-significant bits of its address must be zero). This applies as well to the initial 32-bit floating-point instruction that triggers the transition into dual-instruction mode, but does not apply to the following instruction.
  3. When the floating-point operation is scalar and the core operation is fst or pst, the store should not reference the result register of the floating-point operation. When the core operation is pst, the floating-point instruction cannot be (p)fzchks or (p)fzchkl.
  4. When the core instruction of a dual-mode pair is a control-transfer operation and the previous instruction had the D-bit set, the floating-point instruction must also have the D-bit set. In other words, an exit from dual-instruction mode cannot be initiated (first instruction pair without D-bit set) when the core instruction is a control-transfer instruction.
  5. When the core operation is a ld.c or st.c, the floating-point operation must be d.fnop.
  6. When the floating-point operation is fxfr, the core instruction cannot be ld, ld.c, st, st.c, call, ixfr, or any instruction that updates an integer register (including autoincrement indexing). Furthermore, the core instruction cannot be a fld, fst, pst, or pfld that uses as isrc1 or isrc2 the same register as the idest of the fxfr. Additionally, in dual-instruction mode, fxfr may not be used in a branch delay slot if its destination register is referenced by the preceding branch instruction.
  7. A bri must not be executed in dual-instruction mode if any trap bits are set.
  8. When the core operation is bc.t or bnc.t, the floating-point operation cannot be pfeq or pfgt. The floating-point operation in the sequentially following instruction pair cannot be pfeq or pfgt, either.
  9. A transition to or from dual-instruction mode cannot be initiated on the instruction following a bri.
  10. An ixfr, fld, or pfld cannot update the destination of the companion floating-point instruction (unless the destination is f0 or f1) or of the following pipelined floating-point instruction (regardless of its destination register). No overlap of register destinations is permitted; for example, the following instructions must not be paired:
      // Illegal case 1
      d.fmul.ss   f9, f10, f5
      fld.d       address, f4      ; Overlaps f5

      // Illegal case 2
      d.fmul.ss   f0, f0, f3
      fld.q       address, f0      ; Overlaps f3

      // Illegal case 3
      d.fmul.ss   f9, f10, f11
      fld.l       address, f5
      d.pfadd.ss  fx, fx, f4       ; Overlaps f5, if last stage result is double-precision
11. During a locked sequence, a transition to or
from dual-instruction mode is not permitted.
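
Restrictions 2 and 10 above lend themselves to a mechanical check. The Python below is a rough sketch under stated assumptions rather than a definitive tool: the helper names are hypothetical, registers are plain integers, and it assumes (consistent with the comments in the illegal cases) that fld.l, fld.d, and fld.q load one, two, and four consecutive registers starting at the named register. It covers only the companion-instruction half of restriction 10, not the following pipelined instruction.

```python
# Assumed number of consecutive floating-point registers written by each load.
LOAD_WIDTH = {"fld.l": 1, "fld.d": 2, "fld.q": 4}


def is_dual_mode_aligned(address):
    """Restriction 2: the three least-significant address bits must be zero."""
    return (address & 0x7) == 0


def loaded_registers(load_op, first_reg):
    """Set of register numbers written by the load."""
    return set(range(first_reg, first_reg + LOAD_WIDTH[load_op]))


def load_overlaps_companion_dest(load_op, load_reg, fdest, dest_regs=1):
    """Restriction 10 (companion part): the load must not update the
    destination of the companion floating-point instruction, unless that
    destination is f0 or f1.

    dest_regs is the number of registers the companion result occupies
    (hypothetical parameter; 1 for single precision).
    """
    if fdest in (0, 1):
        return False
    dest = set(range(fdest, fdest + dest_regs))
    return bool(loaded_registers(load_op, load_reg) & dest)
```

With these helpers, illegal case 1 corresponds to `load_overlaps_companion_dest("fld.d", 4, 5)` and illegal case 2 to `load_overlaps_companion_dest("fld.q", 0, 3)`, both of which report an overlap.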
Table 8.9 Instruction Characteristics
Instruction
Execution
Unit
adds
addu
and
andh
andnot
E
E
E
E
E
andnoth
bc
bc.t
bla
bnc
E
E
E
E
E
bnc.t
br
bri
bte
btne
call
calli
fadd.p
faddp
faddz
famov.r
fiadd.z
fisub.z
fix.p
fld.y
flush
fmlow.p
fmul.p
form
frep.p
Pipelined?
Delayed?
Sets
CC?
Faults
CC
CC
CC
CC
CC
CC
Performance
Notes
Programming
Restrictions
1
1
0
0
a
a,g
E
E
E
E
E
0
0
0
a
a
a, b
E
E
0
0
6
6
A
a
a
SE, RE
8
8
G
G
A
SE, RE
8
8
G
G
A
SE, RE
E
OAT
2,3
SE,RE
4
4
8
f
E
M
M
G
M
SE,RE
M
SE, RE
frsqr.p
fst.y
fsub.p
ftrunc.p
fxfr
G
6,8
fzchkl
fzchks
intovr
ixfr
Id.c
G
G
E
E
E
8
8
Id.x
or
orh
pfadd.p
pfaddp
E
E
E
E
OAT
A
A
SE, RE
SE, RE
A
G
5
f
IT
2
OAT
6
CC
CC
P
P
SE,RE
8
Table 8.9 Instruction Characteristics (Continued)
Instruction   Execution Unit   Pipelined?   Delayed?
pfaddz        G                P
pfam.p        A&M              P
pfamov.r      A                P
pfeq.p        A                P
pfgt.p        A                P
pfiadd.z      G                P
pfisub.z      G                P
pfix.p        A                P
pfld.z        E                P
pfmam.p       A&M              P
pfmsm.p       A&M              P
pfmul.p       M                P
pfmul3.dd     M                P
pform         G                P
pfsm.p        A&M              P
pfsub.p       A                P
SE, RE
SE, RE
SE, RE
pftrunc.p     A                P
pfzchkl       G                P
pfzchks       G                P
pst.d         E
shl           E
SE,RE
shr           E
shra          E
shrd          E
st.c          E
st.x          E
subs          E
subu          E
trap          E
xor           E
xorh          E
Instruction
Sets
CC?
Faults
SE, RE
SE,RE
SE
SE
CC
CC
SE, RE
OAT
SE, RE
SE,RE
SE, RE
Performance
Notes.
Programming
Restrictions
8
7
e
d
1
1
8
e
8
e
2,9
7
f
d
7
4.
4
8
7
d
c
c
e
d
8
8
OAT
f
OAT
CC
CC
1
1
IT
Ii
CC
CC
DATA SHEET REVISION REVIEW
The following list represents the key differences between version 002 and version 001 of the i860 XR Microprocessor Data Sheet.
1. Big-endian description in section 2.3 has been expanded.
2. Bit 17 of the Extended Processor Status Register (EPSR) is the INT bit which reflects the value on the interrupt pin (INT), as described in section 2.2.4 entitled "EXTENDED PROCESSOR STATUS REGISTER". This is a documentation update only.
3. The cacheability of a page is controlled by NOR'ing the value of the CD, WT bits and the KEN# input pin, as described in section 2.5 entitled "Caching and Cache Flushing" and section 3.1.14 entitled "Cache Enable (KEN#)". This is a documentation update only.
4. The NOTE section in section 2.5 entitled "Caching and Cache Flushing" has been updated to clarify the paging requirement on changing the DTB field in the dirbase register.
5. Information on register encoding is added in section 8.2 entitled "Instruction Format and Encoding". This is a documentation update only.
The following list represents the key differences between version 003 and version 002 of the i860 XR
Microprocessor Data Sheet.
Specification Changes:
1. Specification changes for improved AC performance are in section 7.3.
2. HOLD is acknowledged during locked bus cycles. See section 3.1.8.
3. Additional paths have been added to the bus
state diagram to allow direct transitions from
states T12 and T11 to state TH. See Figures 4.1
and 4.10.
4. Two new instructions, (p)famov.r, have been
added. These replace (p)fadd.ds and
(p)fadd.sd in the assembler pseudo-ops
(p)fmov.r. These changes are in section 8.1
and tables 2.7, 8.7, and 8.9.
Documentation Changes:
1. Big and little endian description has been expanded in sections 2.2.2, 2.3, and Figure 2.8.
2. The actions and explanations of the lock, unlock, and st.c dirbase changing the BL bit have been updated in sections 2.2.4, 3.1.5, 3.1.8, 4.3.4, 4.3.5, and 8.1.
3. The explanation of the AA and MA bits of the fpsr has been expanded in section 2.2.8.
4. The explanation of the WT bit of the Page Table
Entries has been expanded in sections 2.4.4.4
and 2.5.
5. A change concerning the locking of the bus during address translation is explained in sections
2.4.5 and 2.8.5.
6. A further explanation on when to flush the data
cache is given in section 2.5.
7. The explanation of the floating point multiplier
pipeline has been expanded in section 2.6.1.
8. The explanation of BREQ has been expanded
in section 3.1.4 and Figure 4.1.
9. The explanation of result exceptions has been
expanded in sections 2.8 and 3.2.
10. Instruction fetch identification has been clarified
in section 3.1.6 and table 3.2.
11. Bus cycle diagrams in Figures 4.7,4.8, and 4.10
have been clarified/corrected.
12. Precision specification .r has been added to
section 8.0 and table 8.1.
13. In section 8.4, performance note 9 has been
added, programming restriction d has been
changed, and programming restriction f has
been added. Table 8.9 has been updated to reflect these changes.
14. The description of testability has changed in
sections 3.3. and 3.3.2. RESET and HOLD must
be asserted by the tester to force the chip outputs to float (tri-state).
The following list represents the major differences
between version 004 and version 003 of the i860 XR
Microprocessor Data Sheet:
Section 2.2.4  The explanation of the WP bit of the epsr has been expanded.
Section 2.8.2  More information on the instruction trap has been added.
Section 2.8.4  The instruction access trap has been clarified.
Section 2.8.7  The values of registers after a reset trap have been specified.
Section 3.1.4  BREQ timing has been clarified.
Section 3.1.5  The calculation of interrupt latency has been corrected.
Section 3.1.6  The description of the byte-enable signals has been expanded.
Section 3.1.8  The relation between the lock instruction and the LOCK# signal has been clarified. The BL bit should no longer be changed by writing to the dirbase register.
Section 6.0    The thermal specifications have been updated.
Section 7.3    The A.C. Characteristics for CLK have changed.
Section 7.3    Advance timing information for the 50 MHz clock rate has been added. These timings are subject to change without notice.
Section 8.0    The operand naming conventions have improved.
Section 8.2.1  The encoding of the flush instruction has been corrected.
Section 8.3    The data-dependent multiplier freeze has been eliminated. Other freeze conditions have been corrected or clarified.
The following list represents the major differences
between version 005 and version 004 of the i860 XR
Microprocessor Data Sheet.
Section 2.2.4  OF bit is writable only in supervisor mode using ST.C.
Section 3.1.1  CLK rate has been updated.
Section 5.0    Figure 5.3 has been corrected.
Section 6.0    More information on measuring case temperature has been added.
Section 6.0    Figure 6.1 has been updated to include 25 MHz.
Section 6.0    Table 6.1 has been corrected.
Section 6.0    Table 6.2 has been updated to include 25 MHz.
Section 7.2    The D.C. Characteristics have been updated to include 25 MHz power supply current.
Section 7.3    The A.C. Characteristics for CLK have been changed.
Section 7.3    50 MHz clock rate has been deleted.
Section 7.3    25 MHz A.C. Specifications have been added.
Section 7.3    Figure 7.1 has been corrected.
Section 8.3    The data-dependent multiplier rounding freeze has been eliminated.
Section 8.4    Programming restrictions for dual-instruction mode are added.
82495XP CACHE CONTROLLER/
82490XP CACHE RAM
• Two-Way, Set Associative, Secondary Cache for i860™ XP Microprocessor
• 50 MHz "No Glue" Interface with CPU
• Configurable
  - Cache Size 256 or 512 Kbytes
  - Line Width 32, 64 or 128 Bytes
  - Memory Bus Width 64 or 128 Bits
• Dynamically Selectable Update Policies
  - Write-Through
  - Write-Once
  - Write-Back
• MESI Cache Consistency Protocol
• Hardware Cache Snooping
• Maintains Consistency with Primary Cache via Inclusion Principle
• Flexible User-Implemented Memory Interface Enables Wide Range of Product Differentiation
  - Clocked or Strobed
  - Synchronous or Asynchronous
  - Pipelining
  - Memory Bus Protocol
• Dual-Ported Structure Permits Simultaneous Operations on CPU and Memory Buses
• Efficient MRU Way Prediction
  - Zero Wait States on MRU Hit
  - One Wait State on MRU Miss
• 82495XP Cache Controller Available in 208-Lead Ceramic Pin Grid Array Package
• 82490XP Cache RAM Available in 84-Lead Plastic Quad Flatpack Package
(See Packaging Handbook, Order # 240800)
The Intel 82495XP cache controller and 82490XP cache RAM, when coupled with a user-implemented memory bus controller, provide a second-level cache subsystem that eliminates the memory latency and bandwidth bottleneck for a wide range of multiprocessor systems based on the i860 XP microprocessor. The CPU interface is optimized to serve the i860 XP microprocessor with zero wait states at up to 50 MHz. A secondary cache built from the 82495XP and 82490XP isolates the CPU from the memory subsystem; the memory can run slower and follow a different protocol than the i860 XP microprocessor.
[Block diagram not reproduced. Recoverable labels: TRANSCEIVER (OPTIONAL), ADDRESS, CONTROL, MEMORY BUS. Drawing 240956-60.]
Figure 0-1. Secondary Cache Configuration
Intel, intel, and i860 are trademarks of Intel Corporation.
Intel Corporation assumes no responsibility for the use of any circuitry other than circuitry embodied in an Intel product. No other circuit patent licenses are implied. Information contained herein supersedes previously published specifications on these devices from Intel.
June 1991
© INTEL CORPORATION, 1991
Order Number: 240956-001
82495XP Cache Controller/82490XP Cache RAM
1.0 82495XP/82490XP PINOUTS
[Pinout diagram not reproduced: 82495XP pin grid, bottom side view, rows A through S and columns 1 through 17. Drawing 240956-1. Pin assignments are listed in Table 1-1.]
Figure 1-1. 82495XP Pinout (Bottom View)
[Pinout diagram not reproduced: 82495XP pin grid, top side view, rows A through S and columns 1 through 17. Drawing 240956-2. Pin assignments are listed in Table 1-1.]
Figure 1-2. 82495XP Pinout (Top View)
[Pinout diagram not reproduced: 82490XP, top side view. Drawing 240956-3. Pin assignments are listed in Table 1-2.]
Figure 1-3. 82490XP Pinout (Top View)
[Pinout diagram not reproduced: 82490XP, bottom side view. Drawing 240956-4. Pin assignments are listed in Table 1-2.]
Figure 1-4. 82490XP Pinout (Bottom View)
1.1 Pin Cross Reference Tables
Table 1-1. 82495XP Pin Cross Reference by Name
Signal
Location
Signal
Location
Signal
Location
AOS#
B15
AHOLO
A17
BGT#
M03
B LAST #
C15
BLE#
C16
BOFF # [CLENO]
G15
BROY#
P01
BROYC1#
015
BROYC2#
F14
BUS#
P16
CACHE#
G14
CAOS#
E03
CAHOLO
G04
COC#
003
COTS#
F04
CFAO
E15
CFA1
B14
CFA2#
006
CFA3
B02
CFA4
A16
CFA5
E14
CFA6
014
CLK
011
CMIO#
004
CNA#[CFGO]
L04
CROY # [SLFTST # ]
M02
CWAY
J03
CWR#
E04
OC#
H14
ORCTM#
M01
J04
EAOS#
J15
FLUSH # [NCPFLO #]
N04
FPFLD # [FPFLOEN]
FSIOUT#
001
HITM # [CPUTYP]
017
INV[CLEN1]
K15
KEN#
016
KLOCK#
C03
KWENO# [CFG2]
M04
LEN
002
F15
LOCK#
B16
MALE[WWOR#]
MAOE#
S04
MAWEA#
017
MBALE[HIGHZ#]
P04
MBAOE#
P06
MCACHE#
CO2
MCFAO
016
MCFA1
N14
MCFA2
R04
MCFA3
006
MCFA4
P15
MCFA5
P14
MCFA6
P13
MCYC#
P17
MHITM#
H04
MIO#
F16
MKEN#
R01
MRO#
J01
MSETO
015
MSET1
P12
MSET10
011
MSET2
P11
MSET3
014
MSET4
R16
MSET5
013
MSET6
R17
MSET7
S17
MSET8
P10
MSET9
012
MTAGO
010
MTAG1
P09
MTAG10
007
MTAG11
P07
MTAG2
009
MTAG3
R14
MTAG4
008
MTAG5
R15
MTAG6
S14
MTAG7
S15
MTAG7
S17
MTAG8
P08
MTAG9
S16
MTHIT#
G03
MWBWT#
K03
NA#
J17
NENE#
005
PALLC#
002
PCO
H15
PCYC
J14
PWT
C17
ROYSRC
C01
RESET
005
SETO
013
SET1
C13
SET10
A09
SET2
C14
SET3
B12
SET4
C12
SET5
C11
SET6
012
SET7
009
SET8
010
SET9
B09
SMLN# .
C06
SNPAOS#
F03
SNPBSY#
F01
SNPCLK[SNPMD]
S03
SNPCYC#
H03
SNPINV
P05
SNPNCA
003
Table 1-1. 82495XP Pin Cross Reference by Name (Continued)
Signal
Location
SNPSTB#
R03
SWEND#[CFG1]
001
SYNC# [MEMLDRV]
TAGO
C08
TAG 1
A04
TAG10
B01
TAG11
C05
TAG2
008
TAG3
A03
TAG4
B04
TAG5
B03
TAG6
C07
TAG7
A02
TAG8
007
TAG9
A01
TCK
P03
TDI
N03
TOO
C04
TMS
P02
WAY
L15
WBA
M14
WBTYP
N15
WBWE#
M15
WBWT#[WRMRST]
K14
WR#
B17
WRARR#
L14
NC   A14, A15, S01, S02
Vcc  A05-A08, A10-A13, E01, E17, H01, H17, K01, K17, L01, L17, C09, N17, F17, G01, G17, M17, N01, S05-S13
004
Vss  B05-B08, B10-B11, B13, E02, E16, F02, H02, H16, J02, J16, K02, K04, K16, L02-L03, L16, C10, N16, G02, G16, R02, R05-R10, M16, N02, R11-R13
Table 1-2. 82490XP Pin Cross Reference by Name
Signal               Location
A0                   65
A1                   66
A10                  77
A11                  78
A12                  79
A13                  80
A14                  81
A15                  82
A2                   67
A3                   68
A4                   69
A5                   70
A6                   71
A7                   73
A8                   75
A9                   76
ADS#                 63
BE#                  64
BLAST#               59
BOFF#                36
BRDY#                60
BRDYC#               61
BUS#                 40
CDATA0               48
CDATA1               54
CDATA2               49
CDATA3               55
CDATA4               46
CDATA5               51
CDATA6               52
CDATA7               57
CLK                  30
CRDY#                43
HITM#                62
MAWEA#               41
MBRDY# [MISTB]       22
MCLK [MSTB # #]      26
MCYC#                42
MDATA0               18
MDATA1               14
MDATA2               10
MDATA3               6
MDATA4               16
MDATA5               12
MDATA6               8
MDATA7               4
MDOE#                20
MEOC#                23
MFRZ# [MEMLDRV]      24
MOCLK [MOSTB]        27
MSEL# [MTR4#/ ... ]  25
MZBT# [MX4#/ ... ]   21
PAR#                 32
RESET                28
TCK                  2
TDO                  84
TMS                  3
TDI                  1
WAY                  45
WBA                  38
WBWE#                39
WR#                  58
WBTYP                37
WRARR#               44
NC                   83
Vcc                  5, 9, 13, 17, 29, 35, 50, 56, 74
Vss                  7, 11, 15, 19, 31, 33, 34, 47, 53, 72
1.2 Quick Pin Reference
BGT# [C490LDRV]    I    Bus Guaranteed Transfer, [82490XP Low Drive]
  This signal is generated by the MBC to the 82495XP. It indicates to the 82495XP a commitment by the MBC to complete the cycle on the memory bus. Until BGT# activation the 82495XP owns the cycle and will abort it if intervening snoops happen. After BGT# the cycle is owned by the MBC until its completion. From BGT# until SWEND# snoops will be accepted, but none will be processed until SWEND# activation.
  During RESET's falling edge, this signal controls the driver's strength of the 82495XP to 82490XP interface signals. This strength is a function of the cache size, and therefore the number of 82490XPs. Refer to the layout specifications section for more details.

BLE#    O    BE Latch Enable
  The BLE# signal is used to control the enable line of an external '377-type latch. The latch captures the i860 XP CPU's BE (Byte Enable) signals and other CPU-provided cycle attributes which do not go through the 82495XP.

BRDY#    I    82495XP Burst Ready
  This is the burst ready indication from the memory bus controller. The MBC should connect its burst ready indication to the CPU BRDY#, the 82495XP BRDY# and the 82490XP BRDY#. In the CPU, it provides the same function as that described in the CPU data sheet. The 82495XP will only use this indication for burst tracking purposes. In the 82490XP, it increments the CPU latch burst counter.

CADS#    O    Cache Address Strobe
  This signal is generated by the 82495XP and used by the memory bus controller. Its assertion requests execution of a memory bus cycle by the memory bus controller. This signal when active indicates that the cache cycle control and attribute signals are valid.

CAHOLD    O    82495XP AHOLD
  This signal is generated by the 82495XP to track the CPU AHOLD signal when used for warm-reset and LOCKed sequences. It also provides information about CPU and cache BIST.

CD/C#    O    Cache Data/Control
  This is a cycle definition signal driven by the 82495XP. It indicates the type of memory bus cycle requested. This signal is valid with CADS# and can be pipelined by the memory bus controller.

CDTS#    O    Cache Data Strobe
  This signal is driven by the 82495XP to the memory bus controller. CDTS# for read cycles indicates that in the next CLK the memory bus controller can generate the first BRDY# for the read cycle. For write cycles it indicates when data is available on the memory bus. Usage of this signal allows complete independence between address strobes (CADS#, SNPADS#) and data strobe.

CFG0-2    I    Cache Configuration bits 0-2
  These signals are inputs to the 82495XP. CFG0-2 allow the 82495XP to be configured to 5 different modes. Different modes indicate 82495XP/CPU line ratio, tag size (4K/8K), lines per sector.
CLK
I
Clock
This signal provides the fundamental timing for the 82495XP, 82490XP and
CPU. It must be provided to the 82495XP, 82490XPs, CPU and memory bus
controller components with minimal skew~
CMIIO#
0
Cache Memory/IO
This signal is driven by the 82495XP and is a cycle definition signal. It
indicates the type of memory bus cycle requested. This signal is valid with
CADS# and can be pipelined by the memory bus controller.
CNA#[CFGO]
I
82495XP Next Address Enable, [Configuration Pin 0]
This signal is driven by the memory bus controller and supplied to the
82495XP. It is used by the memory bus controller to dynamically pipeline
CADS# cycles.
During RESET falling edge it functions as the 82495XP CFGO input.
CRDY # [SLFTST #]
I
Cache Memory Bus Ready, [82495XP Self Test]
This signal is generated by the memory bus controller and informs the
82495XP and 82490XP that a memory bus cycle has been completed .
. CRDY # activation ends the memory bus cycle.
During RESET's falling edge, if this signal is sampled low(active) and
MBALE is sampled high(active), 82495XP self test will be invoked .
CWAY
O
Cache Way
CWAY is driven by the 82495XP and is a cycle definition signal that
indicates to the memory bus controller the WAY to be used by the
requested cycle. On line-fills it indicates the way into which the line will
be loaded. For write-backs it indicates the WAY that was written back. This
signal is valid with CADS#.
CW/R#
O
Cache Write/Read
This signal is driven by the 82495XP and is an 82495XP cycle definition
signal. It indicates the type of memory bus cycle requested. This signal is
valid with CADS# and can be pipelined by the memory bus controller.
DRCTM#
I
Memory Bus Direct to [M] State
This signal is an input to the 82495XP. It is the mechanism by which the
memory bus can dynamically inform the 82495XP of a request to skip the
[E] state and move the line directly to the [M] state. This signal is sampled
by the 82495XP when SWEND# is asserted.
FLUSH# [NCPFLD#]
I
Flush the 82495XP cache, [Enable Non-Cacheable PFLD]
This signal is an input to the 82495XP. When active, FLUSH# causes the
82495XP to write back all of its modified lines to main memory and then
invalidate all tag locations. At the end of a flush operation the 82495XP tag
array will be completely invalidated.
During RESET activation, this pin functions as the NCPFLD# configuration
signal which, with FPFLDEN, selects one of three modes for handling
i860 XP CPU floating point load cycles.
FPFLD# [FPFLDEN]
I/O
FIFO PFLD Enable [PFLD Mode Select]
During RESET, the FPFLDEN and NCPFLD# inputs select one of three
modes to handle i860 XP CPU pipelined floating point load cycles. In the
mode which supports an external FIFO, the FPFLD# output indicates a
PFLD cycle to be loaded into the FIFO.
FSIOUT#
O
Flush/Sync/Initialization Output
This signal is an output of the 82495XP and indicates the start and end of
three operations: Flush, Sync, and Initialization. The output is activated
when the operation internally begins and is de-activated when the
operation ends.
KLOCK#
O
82495XP LOCK#
This signal is driven by the 82495XP and indicates to the memory bus
controller a request to execute atomic read-modify-write sequences.
KLOCK# is active with the CADS# of the first LOCKed operation and
remains active until at least the clock following CADS# of the last cycle of
the LOCKed operation.
KWEND# [CFG2]
I
Cacheability Window End, [Configuration Pin 2]
This signal is generated by the MBC and indicates to the 82495XP that the
Cacheability Window has expired. At this point the 82495XP will latch the
memory cacheability signal (MKEN#) and make decisions based on the
cacheability attribute. MRO#, which indicates the Read-Only cycle attribute,
is also sampled at this point.
During RESET's falling edge this line functions as the CFG2 configuration
signal, which is used to configure the 82495XP/82490XP with cache
parameters.
MALE [WWOR#]
I
Memory Bus Address Latch Enable, [Weak Write Ordering]
This signal is generated by the memory bus controller and controls an
82495XP internal transparent address latch (373-like). CADS# will
generate a new address at the input of the internal address latch. MALE
activation (high) allows this address to flow to the memory bus
provided MAOE# is active. When MALE is inactive (low), the address at the
latch input is latched.
WWOR# configures the 82495XP into strong or weak write-ordering
mode.
MAOE#
I
Memory Bus Address Output Enable
This signal is generated by the memory bus controller and controls the
82495XP's output buffer of the memory bus address latches. The 82495XP
drives the memory bus address lines if MAOE# is active (low). Otherwise,
they are tristated. MAOE# also serves as a qualifier for snooping cycles: when
it is inactive, snoops are enabled.
MBALE [HIGHZ#]
I
Memory Bus 82495XP Sub-Line Address Latch Enable, [High Impedance
Output]
This signal has the same function as MALE but controls only the 82495XP
sub-line addresses. This signal is generated by the memory bus controller
and controls an 82495XP internal transparent address latch (373-like).
CADS# will generate a new address at the input of the internal address
latch. MBALE activation (high) allows the sub-line address to flow
to the memory bus provided MBAOE# is active. When MBALE is inactive (low),
the sub-line address at the latch input is latched.
HIGHZ#, if active along with SLFTST#, causes the 82495XP to float all of
its outputs.
MBAOE#
I
Memory Bus 82495XP Sub-Line Address Output Enable
This signal has a function similar to MAOE#, but controls only the
82495XP sub-line addresses.
If MBAOE# is active (low), the 82495XP will drive the sub-line portion of the
address onto the memory bus. Otherwise, it is tristated. MBAOE# is also
sampled during snoop cycles. If MBAOE# is sampled inactive with
SNPSTB#, the snoop write-back cycle (if any) will begin at the sub-line
address provided. If MBAOE# is active with SNPSTB#, the snoop write-back
will begin at sub-line address 0.
MBRDY# (MISTB)
I
Memory Bus Ready, (Memory Input Strobe)
This pin is an input to the 82490XP. It is used in clocked bus mode to
indicate the end of a transfer. When active(low) it indicates that the
82490XP should increment the burst counter and either output the next
data or get ready to accept the next data.
In strobed memory bus mode this pin is the input data strobe to the
82490XP. On each MISTB edge, the 82490XP latches the data and
increments the burst counter.
MCACHE#
O
82495XP Internal Cacheability
This signal is driven by the 82495XP. On read cycles, this signal indicates
the cycle's internal cacheability attribute. On write cycles MCACHE# is
active only for write-back cycles. MCACHE# is not activated for I/O cycles,
special cycles, and Locked cycles.
MCFA6-MCFA0
I/O
Memory Bus Configurable address lines
MSET10-MSET0
I/O
Memory bus SET number
MTAG11-MTAG0
I/O
Memory bus TAG bits
These are the memory bus address lines of the 82495XP and should be
connected to the A31-A2 (A31-A3 for a 64-bit bus) signals of the Memory
Bus. These signals, along with the byte enables, define the physical area of
memory or I/O accessed.
The 82495XP drives these signals in normal memory bus cycles and uses
them as inputs during snooping.
MCLK [MSTBM#]
I
Memory Bus Clock, [Memory Input Strobe]
In clocked memory bus mode this pin provides the memory bus clock to the
82490XP. In clocked mode, memory bus signals and memory bus data are
sampled on the rising edge of MCLK. In a clocked memory bus write,
data is driven off MCLK or MOCLK depending on the configuration.
This pin is an input to the 82490XP. It is sampled during reset and
determines the memory bus type. If active (low), the memory bus will be
strobed. If inactive (high), the memory bus will be clocked.
If a clock is detected at this input, this pin becomes the memory bus clock,
and clocked memory bus mode is selected.
MDATA0-MDATA7
I/O
Memory Bus Data
These pins are the 8 memory data pins of the 82490XP. All or part of these
pins will be used depending on the cache configuration. In clocked memory
bus mode, these pins are sampled with the rising edge of MCLK. New data
is driven out on these pins with MEOC# or the rising edge of MCLK or
MOCLK together with MBRDY# active. In strobed memory bus mode,
these pins are sampled on each MISTB edge. New data is driven out on
these pins with each MOSTB edge.
MDOE#
I
Memory Data Output Enable
This signal is an input to the 82490XP. The memory bus output enable is
used to control the 82490XP's driving of data onto the memory bus. When
this pin is inactive (high), the MDATA[0:7] pins are tristated. When this pin is
active (low), the MDATA[0:7] pins are actively driving data. The function of
this pin is the same for strobed or clocked memory bus operation, as
MDOE# has no relation to CLK or MCLK.
MEOC#
I
Memory End of Cycle
This signal is an input to the 82490XP. Since it is synchronous to the
memory bus, it may be used to end a cycle on the memory bus and begin a
pending cycle without waiting for synchronization to the CPU CLK. MEOC#
also causes the latching or driving of data and resetting of the memory burst
counter.
MFRZ# [MEMLDRV]
I
Memory Freeze, [Memory Bus Low Drive]
This signal is an input to the 82490XP. It is used for write cycles that could
cause allocation cycles. When this pin is active (low), write data is latched in
the 82490XP. The subsequent allocation will not overwrite data latched by
the write. This prevents the actual write to memory from having to be
performed on the memory bus. The allocated line will be placed in the [M]
state in the cache since memory has not been updated.
During RESET's falling edge, this signal is sampled to indicate the
82490XP's memory bus driving strength. The 82490XP provides normal and
high drive capability buffers.
MHITM#
O
Memory Bus Hit to Modified Line
This signal is driven by the 82495XP during snoop cycles and indicates
whether the snooping address hit a Modified line in the 82495XP cache. The
82495XP automatically schedules the writing-back of modified lines when
snoop hits occur. MHITM# is activated the CLK after SNPCYC# and will
remain active until the next SNPSTB#.
MKEN#
I
Memory Bus Cacheability.
This signal is an input to the 82495XP. It is the memory bus cache enable
pin. It is used to indicate to the 82495XP if the current memory bus cycle is
cacheable or not. This pin is sampled by the 82495XP with KWEND#
assertion.
MOCLK (MOSTB)
I
Memory Output Clock, (Memory Output Strobe)
MOCLK controls a transparent latch at the 82490XP data outputs. By
providing a clock input, skewed from MCLK, MDATA hold time may be
increased.
In strobed bus mode this pin is the data output strobe. On each MOSTB
edge, new data will be output onto the memory bus.
MRO#
I
Memory Bus Read-Only
This pin is an input to the 82495XP. It is the READ-ONLY attribute pin. It is
used to indicate to the 82495XP that the accessed line should get a
READ-ONLY attribute. READ-ONLY lines will be non-cacheable in the first level
cache. READ-ONLY lines will be cached in the 82495XP if MKEN# is
sampled active during KWEND# and will be cached in the [S] state. This pin
is sampled by the 82495XP with KWEND# assertion.
MSEL# [MTR4/TR8#]
I
Memory Select, [Memory Transfer]
This signal is a chip select input to the 82490XP. MSEL# activation
qualifies the MBRDY# input of the 82490XP. MSEL# going active causes
the sampling of MZBT# for the next cycle. MSEL# going inactive resets
the 82490XP's internal memory burst counter.
During RESET's falling edge, this pin is used to determine the number of
transfers necessary on the memory bus for each cache line. If high, there are
4 transfers on the memory bus for each cache line. If low, there are 8
transfers on the memory bus for each cache line.
MTHIT#
O
Memory Bus Tag Hit
This signal is driven by the 82495XP during snoop cycles. It indicates
whether the snooping address hit any line (exclusive, shared, or modified)
in the 82495XP cache. MTHIT# is activated the CLK after SNPCYC# and
will remain active until the next SNPSTB#.
MWB/WT#
I
Memory Bus Write Policy
This signal is an input to the 82495XP. It is the mechanism by which the
memory bus can dynamically inform the 82495XP of the cycle write policy
(Write-Through/Write-Back). This signal is sampled by the 82495XP with
SWEND# activation.
MZBT# [MX4/MX8#]
I
Memory Zero Based Transfer, [Memory I/O Bits]
This signal is an input to the 82490XP. When this pin is sampled active
(with MSEL# or MEOC#) it indicates that the memory bus cycle should
start with burst location zero, independent of the sub-line address
requested by the CPU.
During RESET's falling edge, this pin is used to determine the number of
I/O pins used for the memory bus. When HIGH it indicates that 4 I/O pins are
used per 82490XP. When LOW it indicates that 8 I/O pins are used.
NENE#
O
Next Near
This signal is generated by the 82495XP and indicates to the memory bus
controller if the address of the requested memory cycle is "near" the
address of the previously generated one (in the same 2K DRAM page).
This information can be used by the memory bus controller to optimize
access to paged or static column DRAMs. This signal is valid together with
CADS#.
PALLC#
O
Potential Allocate
This signal is generated by the 82495XP and indicates to the memory bus
controller that the current write cycle can potentially allocate a cache line.
Potential allocate cycles are cycles which are 82495XP misses with PCD and
PWT inactive.
RDYSRC
O
Ready Source
This signal is an output of the 82495XP. It indicates the source of the
BRDY# generation for the CPU. When high it indicates that the memory bus
controller should generate BRDY#s to the CPU. When low it indicates that
the 82495XP will be the one providing BRDY#s.
RESET
I
Reset
This signal forces the 82495XP and 82490XP to begin execution at a known state. Its
falling edge will sample the state of the configuration pins. RESET is an asynchronous
input to the 82495XP and 82490XP.
The following 82495XP pins are sampled during reset falling edge:
CNA# [CFG0]: CFG0 line of 82495XP configuration inputs.
SWEND# [CFG1]: CFG1 line of 82495XP configuration inputs.
KWEND# [CFG2]: CFG2 line of 82495XP configuration inputs.
FLUSH# [NCPFLD#]: Enables decoding of the non-cacheable PFLD mode. Active if low.
FPFLD# [FPFLDEN]: Enables the external FIFO PFLD mode. Active high.
BGT# [C490LDRV]: Indicates the driving strength of the 82495XP/82490XP interface. If
high, the 82495XP can drive up to 10 82490XPs without derating. If low, the 82495XP
can drive up to 18 82490XPs without derating.
SYNC# [MEMLDRV]: Indicates the 82495XP's memory bus driving strength.
SNPCLK [SNPMD]: Indicates the snoop mode, synchronous or asynchronous.
CFG0-CFG2 signals are used to configure the 82495XP/82490XP with cache
parameters. They define the lines/sector, line ratio, and number of tags.
MALE [WWOR#]: Enforces strong or weak write-ordering consistency.
MBALE [HIGHZ#]: If active along with SLFTST#, will tristate all 82495XP outputs.
The following 82490XP pins are sampled during reset falling edge:
PAR#: If active (low), this pin configures the 82490XP as a parity storage device. The
parity configuration stores the parity bits belonging to data stored in other 82490XPs.
MZBT# [MX4/MX8#]: Determines the number of I/O pins used for the memory bus
interface. If high, four I/O pins are chosen. If low, eight I/O pins are chosen.
MSEL# [MT4/MT8#]: Determines the number of transfers necessary on the memory bus
for each cache line. If high, four memory bus transfers are needed to fill a cache line. If
low, eight memory bus transfers are needed to fill a cache line.
MCLK [MSTBM#]: If active (low), this pin indicates a strobed memory bus configuration. If
inactive (high), a clocked memory bus is chosen.
MFRZ# [MEMLDRV]: Indicates the 82490XP's memory bus driving strength.
SMLN#
O
Same Cache Line
This signal is an output of the 82495XP. It is used to indicate to the memory bus controller
that the current cycle is to the same 82495XP line as the previous one. This indication
can be used by the memory bus controller to selectively activate its SNPSTB# signal to
other caches. For example, back to back snoop hits to the same line may be snooped
only once. This signal is valid together with CADS#.
SNPADS#
O
Cache Snoop Address Strobe
This signal is an output of the 82495XP. It functions identically to
CADS#, but is generated only on snoop write-back cycles. Because
snoop write-back cycles are the only ones generated independently of
CPU bus activity, this separate address strobe should ease implementation of
the memory bus controller. Whenever SNPADS# is active, the memory bus
controller should abort all pending cycles (cycles for which BGT# has not yet
been issued). After BGT# the memory bus controller is responsible for the
cycle completion. The 82495XP assumes that non-committed cycles are aborted
upon SNPADS# and may re-issue them after the completion of the snoop.
SNPBSY#
O
Snoop Busy
This signal is driven by the 82495XP. When inactive (high), it indicates that the
82495XP is ready to accept another snoop cycle. SNPBSY# will be activated
for one of two reasons: a snoop hits a modified line, or a back-invalidation is
needed when there is one already in progress. In either of these cases, the
82495XP will not perform the look-up for a pending snoop until SNPBSY# is
de-activated.
SNPCLK [SNPMD]
I
Snoop Clock, [Snoop Mode]
This pin provides the 82495XP with the snoop clock to be used in clocked
memory interfaces. During clocked mode SNPSTB#, SNPINV, SNPNCA,
MBAOE#, MAOE#, and the address lines will be sampled by SNPCLK.
During RESET activation, this pin functions as the SNPMD (snoop mode)
signal. If high it indicates strobed snooping mode. If low it indicates
synchronous snooping mode. For clocked snooping mode, SNPCLK is
connected to the snoop clock source.
SNPCYC#
O
Snoop Cycle
This signal is an output of the 82495XP. It indicates when the snooping look-up
is actually taking place in the 82495XP tag RAM.
SNPINV
I
Snoop Invalidation
This signal is an input to the 82495XP and indicates the resulting line state in
case of a snoop hit cycle. If active, it forces the line to go to an invalid state.
This signal is sampled with SNPSTB#.
SNPNCA
I
Snoop Non Caching Device Access
This signal is an input to the 82495XP and provides the 82495XP information
on whether the current memory bus master is a non caching device (DMA,
etc). This indication allows the 82495XP to avoid changing line states from
exclusive to shared unnecessarily.
SNPSTB#
I
Snoop Strobe
This signal is an input to the 82495XP which is used to initiate a snoop.
SNPSTB# causes the latching of the snoop address and parameters. The
82495XP supports three latching modes: Clocked, Strobed, Synchronous. In
the clocked mode, address and attribute signals will be latched when
SNPSTB# is sampled active by SNPCLK. In the strobed mode, address and
attributes will be latched by the SNPSTB# falling edge. In synchronous mode,
address and attribute signals will be latched when SNPSTB# is sampled active
by CLK.
SWEND# [CFG1]
I
Snoop Window End, [Configuration Pin 1]
This signal is generated by the MBC and indicates to the 82495XP that the
Snoop Window has expired. At this point the 82495XP will latch the memory
bus attributes: write policy (MWB/WT#), and direct to [M] transfer
(DRCTM#). At the end of the snooping window, all other devices have
snooped the bus master's address and have generated address caching
attributes on the bus. Once a cycle begins, the 82495XP prevents snooping
until it has received SWEND#. The 82495XP will act based on those
attributes and will update its tag RAM.
During RESET's falling edge this line functions as the CFG1 configuration
signal, which is used to configure the 82495XP/82490XP with cache
parameters.
SYNC# [MEMLDRV]
I
Synchronize 82495XP cache, [Memory Bus Low Drive]
This signal is an input to the 82495XP. Activation of this line will cause the
synchronization of the 82495XP tag array with main memory. All 82495XP
modified lines will be written back to main memory. The difference between
FLUSH and SYNC is that on SYNC the 82495XP and CPU tag array will NOT
be invalidated. All the valid entries will be kept, with all modified lines
(M state) becoming non-modified (E state).
During RESET's falling edge, this signal is sampled to indicate the memory
bus driving strength. If it is sampled low, the maximum capacitive load
without derating is 100 pF. If it is sampled high, the maximum capacitive load
without derating is 50 pF.
TCK
I
Testability Clock
This signal is an input to both the 82495XP and 82490XP. This is the
boundary scan clock. This signal has to be connected to a clock
synchronous to CLK to ensure initialization of the test logic.
TDI
I
Testability serial input
This signal is an input to both the 82495XP and 82490XP.
TDO
O
Testability serial output
This signal is an output of both the 82495XP and 82490XP.
TMS
I
Testability Control
This signal is an input to both the 82495XP and 82490XP.
The following pins have internal pull-ups:
ADS#, NA#, FPFLD#, TDI, TMS, BGT#,
KWEND#, SWEND#, CNA#, BRDY#, SYNC#,
FLUSH#, SNPSTB#, MRO#, DRCTM#, TCK,
SNPCLK, MFRZ#, MZBT#, MCLK, MOCLK.
During the tri-state output testing sequence, all pull-ups
will be disabled.
The following signals are glitch free. These signals
are always at a valid logic level following RESET:
CADS#, CDTS#, SNPADS#, SNPCYC#.
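As an illustration of the reset-time sampling described for the 82490XP pins under RESET, the following Python sketch translates strap levels into a configuration summary. The names `Straps82490XP` and `decode_82490xp_straps` are hypothetical and exist only for this example. The level-to-setting mapping follows the pin descriptions above; MFRZ# [MEMLDRV] is left out because the text does not state which level selects which drive strength.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Straps82490XP:
    """Levels seen at the falling edge of RESET (True = high, False = low).

    Hypothetical container used for illustration only.
    """
    par_n: bool           # PAR#
    mzbt_mx4_mx8_n: bool  # MZBT# [MX4/MX8#]
    msel_mt4_mt8_n: bool  # MSEL# [MT4/MT8#]
    mclk_mstbm_n: bool    # MCLK [MSTBM#]


def decode_82490xp_straps(s: Straps82490XP) -> dict:
    """Translate strap levels into the settings named in the pin reference."""
    return {
        # PAR# active (low) selects the parity storage configuration.
        "parity_device": not s.par_n,
        # High selects four I/O pins, low selects eight.
        "io_pins": 4 if s.mzbt_mx4_mx8_n else 8,
        # High selects four transfers per cache line, low selects eight.
        "transfers_per_line": 4 if s.msel_mt4_mt8_n else 8,
        # MSTBM# active (low) selects a strobed memory bus.
        "memory_bus": "clocked" if s.mclk_mstbm_n else "strobed",
    }
```

A board description could build one `Straps82490XP` per cache RAM and compare the decoded settings against the intended cache configuration before power-on testing.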
1.3 Output Pins
Table 1-3 lists all output pins, the part(s) that drive them, and their active levels.
Table 1-3. Output Pins
| Name | Part | Active Level |
|---|---|---|
| BLE# | 82495XP | LOW |
| CADS# | 82495XP | LOW |
| CAHOLD | 82495XP | HIGH |
| CDTS# | 82495XP | LOW |
| CWAY | 82495XP | - |
| CW/R#, CD/C#, CM/IO# | 82495XP | - |
| FSIOUT# | 82495XP | LOW |
| KLOCK# | 82495XP | LOW |
| MCACHE# | 82495XP | LOW |
| MHITM# | 82495XP | LOW |
| MTHIT# | 82495XP | LOW |
| NENE# | 82495XP | LOW |
| PALLC# | 82495XP | LOW |
| RDYSRC | 82495XP | HIGH |
| SMLN# | 82495XP | LOW |
| SNPADS# | 82495XP | LOW |
| SNPBSY# | 82495XP | LOW |
| SNPCYC# | 82495XP | LOW |
| TDO | 82495XP/82490XP | - |
1.4 Input Pins
Table 1-4 lists all input pins, which part(s) they are input to, their active level, and whether they are synchronous or asynchronous inputs.
Table 1-4. Input Pins
| Name | Part | Active Level | Synchronous/Asynchronous |
|---|---|---|---|
| BGT# [C490LDRV] | 82495XP | LOW | Synchronous to CLK |
| BRDY# | 82495XP/82490XP | LOW | Synchronous to CLK |
| CLK | 82495XP/82490XP | - | |
| CFG3 | 82495XP | - | |
| CNA# [CFG0] | 82495XP | LOW | Synchronous to CLK |
| CRDY# [SLFTST#] | 82495XP/82490XP | LOW | Synchronous to CLK |
| DRCTM# | 82495XP | LOW | Note 2 |
| FLUSH# [NCPFLD#] | 82495XP | LOW | Asynchronous |
| CPUTYP | 82495XP | LOW | Synchronous to CLK |
| KWEND# [CFG2] | 82495XP | LOW | Synchronous to CLK |
| MALE, MBALE | 82495XP | HIGH | Asynchronous |
| MAOE#, MBAOE# | 82495XP | LOW | Asynchronous |
| MCLK [MSTBM#] | 82490XP | LOW | Synchronous to MCLK |
| MBRDY# (MISTB) | 82490XP | | |
| MDOE# | 82490XP | LOW | Asynchronous |
| MEOC# | 82490XP | LOW | Synchronous/Asynchronous, Note 1 |
| MFRZ# | 82490XP | LOW | Synchronous/Asynchronous, Note 1 |
| MOCLK (MOSTB) | 82490XP | | |
| MSEL# [MTR4/TR8#] | 82490XP | LOW | Synchronous/Asynchronous, Note 1 |
| MZBT# [MX4/MX8#] | 82490XP | LOW | Synchronous/Asynchronous, Note 1 |
| MKEN# | 82495XP | LOW | Note 2 |
| MRO# | 82495XP | LOW | Note 2 |
| MWB/WT# | 82495XP | - | Note 2 |
| PAR# | 82490XP | LOW | Synchronous to CLK |
| RESET | 82495XP/82490XP | HIGH | Asynchronous |
| SNPCLK [SNPMD] | 82495XP | - | - |
| SNPINV | 82495XP | HIGH | Note 3 |
| SNPNCA | 82495XP | HIGH | Note 3 |
| SNPSTB# | 82495XP | LOW | Note 3 |
| SWEND# [CFG1] | 82495XP | LOW | Synchronous to CLK |
| SYNC# [MEMLDRV] | 82495XP | LOW | Asynchronous |
| TCK | 82495XP/82490XP | | |
| TDI | 82495XP/82490XP | - | Synchronous to TCK |
| TMS | 82495XP/82490XP | - | Synchronous to TCK |
NOTES:
(1) In clocked memory bus mode these pins are synchronous with MCLK. In strobed memory bus mode these pins are
asynchronous.
(2) MWB/WT#, DRCTM# must be synchronous to CLK during SWEND#. MKEN#, MRO# must be synchronous to CLK
during KWEND#.
(3) In clocked memory bus mode these pins are synchronous with SNPCLK. In strobed memory mode these pins are
asynchronous.
1.5 Input/Output Pins
Table 1-5 lists all input/output pins, which part they interface with, and when they are floated.
Table 1-5. Input/Output Pins
| Name | Part | Synch/Asynch | When Floated |
|---|---|---|---|
| FPFLD# [FPFLDEN] | 82495XP | Synchronous to CLK | - |
| MCFA0-MCFA6 | 82495XP | Note 1 | MAOE# = High |
| MDATA0-MDATA7 | 82490XP | Note 2 | MDOE# = High and during Reset |
| MSET0-MSET10 | 82495XP | Note 1 | MAOE# = High |
| MTAG0-MTAG11 | 82495XP | Note 1 | MAOE# = High |
NOTES:
(1) With MALE high and MAOE# low, these pins are synchronous to ClK.
(2) In ClocllI
[Block diagram not reproduced. Labels recoverable from the original: Data, XVRs/Latches, Memory Bus, Optional.]
Figure 6-1. Memory Bus Controller Interface Model
Figure 6-1 shows the memory bus controller (MBC)
interface model. The memory bus controller interfaces to the i860 XP CPU, 82495XP, 82490XP, and
memory bus. The MBC interface was defined with a
minimal set of assumptions as to the memory bus
implementation. The chipset was designed to enable
flexibility in the design of a memory bus and controller.
The 82495XP requests control of the memory bus
by signalling the memory bus controller. The memory bus controller is responsible for arbitrating and
granting the bus to the 82495XP. Once granted, the
memory bus controller is responsible for executing
the requested cycle, snooping the other caches, and
ending the cycle. The 82495XP supports different
modes of snooping, different modes of memory bus
operation, and various special cycles. Memory Bus
Controller design dictates which of these features
are used, and exactly how they are used.
6.1 Cycle Attribute and Progress
[Figure not reproduced. It groups the signals as follows. Cycle Request: CADS#, SNPADS#, CDTS#. Cycle Progress: BGT#, KWEND# (attributes MKEN#, MRO#), SWEND# (attributes MWB/WT#, DRCTM#), CNA#, CRDY#.]
Figure 6-2. Cycle Attribute and Progress Signals
CADS# indicates the start of the cycle address
phase. CDTS# tracks CADS# and indicates the
start of the cycle data phase. For READ cycles it
indicates that, starting in the next CLK, the CPU data
bus is in read mode under the control of the MBC
until the last BRDY#. In READ cycles, if the MBC
already owns the CPU data bus, CDTS# will be activated
with CADS#. For ALLOCATE cycles the MBC
does not need the CPU data bus; therefore CDTS#
is activated together with CADS#.
For Write cycles CDTS# indicates that the first piece
of data is available on the memory bus. For write-back
cycles CDTS# indicates that all data is available
(write-back buffer or snoop buffer loaded with
correct write-back data).
As a response to the cycle request, the memory bus
controller responds with cycle progress signals. All
cycle progress signals are sampled ONCE in specific
windows and then ignored until CRDY# of the
corresponding cycle. BGT# indicates a commitment
by the memory bus controller to complete the cycle
execution on the memory bus. Up until this point the
82495XP owns the cycle. This means that intervening
snoop write-backs will abort it and the 82495XP
re-issues the cycle to the MBC. There is only one
case where the 82495XP will issue a new, not a
re-issued, cycle: if the original CADS# operation is a
write-back cycle, and the interrupting snoop cycle
hits that write-back buffer, then the subsequent
CADS# will be for a completely new cycle (not a
re-issuing of the interrupted CADS# operation).
After BGT# the memory bus controller owns the cycle.
The 82495XP assumes the cycle will terminate
and will not re-issue it on snoop write-backs. Following
BGT# comes KWEND#, which indicates that the
cacheability window is closed and that the 82495XP
can sample the MKEN# and MRO# attributes. These
indicate to the 82495XP cacheability and read-only
respectively. These attributes can be determined by
decoding the 82495XP address. Based on those
attributes the 82495XP executes allocations,
line-fills, replacements, etc.
Following KWEND#, SWEND# is activated. It indicates
that the Snoop Window is closed. The
82495XP samples the MWB/WT# and DRCTM# attributes.
These attributes are determined by snooping
the other caches in the system. At this point the
82495XP updates its tag RAM state related to the
line access in progress.
Lastly the MBC issues CRDY#, which indicates to
the 82495XP the end of the transaction data phase.
The 82495XP allows memory bus pipelining by providing
CNA#, which allows the MBC to request a
new address phase before the conclusion of the current
data phase. The 82495XP supports a 1 level
deep address pipeline on the Memory Bus.
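The ordering of the cycle progress signals and the change of cycle ownership at BGT# can be sketched as a small tracker. This is an illustrative model only, not an Intel reference design: the class `CycleTracker` and its attribute names are hypothetical, CNA# pipelining is not modelled, and the sketch assumes every cycle receives all four progress signals in the order given in the text.

```python
# Order of events for one cycle as described in Section 6.1.
CYCLE_ORDER = ("CADS#", "BGT#", "KWEND#", "SWEND#", "CRDY#")


class CycleTracker:
    """Tracks one memory bus cycle from CADS# to CRDY# (hypothetical model)."""

    def __init__(self):
        self.stage = 0        # index into CYCLE_ORDER; CADS# already seen
        self.aborted = False  # set when a snoop write-back aborts the cycle

    @property
    def owner(self):
        # The 82495XP owns the cycle until BGT#; afterwards the MBC owns it.
        return "82495XP" if self.stage == 0 else "MBC"

    @property
    def done(self):
        return self.stage == len(CYCLE_ORDER) - 1

    def on_signal(self, name):
        if self.aborted or self.done:
            raise ValueError("cycle is no longer in progress")
        if name == "SNPADS#":
            # Before BGT# the cycle is aborted and later re-issued.
            # After BGT# the MBC must complete it, so nothing changes here.
            if self.stage == 0:
                self.aborted = True
            return
        expected = CYCLE_ORDER[self.stage + 1]
        if name != expected:
            raise ValueError(f"expected {expected}, got {name}")
        self.stage += 1
```

Feeding a recorded signal trace through such a tracker is one way to check a memory bus controller simulation against the sequence BGT#, KWEND#, SWEND#, CRDY#.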
6.2 Snoop Operations
The 82495XP provides the capability of snooping
operations on the memory bus to ensure cache consistency. A snoop operation consists of two phases:
1) initiation phase and 2) response phase.
[Diagram not reproduced: an Initiation phase followed by a Response phase.]
Figure 6-3. 82495XP Snooping Operations
During the initiation phase the MBC provides the
82495XP with the snoop address information. During
the response phase the 82495XP provides the
snoop status information.
6.2.1 SNOOP INITIATION PHASE
NOTE:
The 82495XP samples the SNPCLK[SNPMD] signal at the falling edge of RESET to determine the
snoop mode. If a rising edge occurs on
SNPCLK[SNPMD] after RESET has gone inactive,
clocked mode will be selected. Systems using
strobed or synchronous mode must ensure that no
rising edge occurs on SNPCLK[SNPMD] after RESET has gone inactive.
The 82495XP provides three modes for initiating
snoops:
1. Strobed: the falling edge of SNPSTB# is used.
2. Clocked: SNPSTB# is sampled with SNPCLK.
3. Synchronous: SNPSTB# is sampled with CLK.
These three snooping modes are configured as follows:
1. Strobed: The SNPCLK[SNPMD] signal must be
strapped high.
2. Clocked: The SNPCLK[SNPMD] signal must be
connected to the snoop clock source.
3. Synchronous: The SNPCLK[SNPMD] signal
must be strapped low.
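The configuration rules above, together with the note on rising edges after RESET, can be summarised in a short function. This is a sketch under stated assumptions: the function name `snoop_mode` and its two boolean parameters are hypothetical, and the model simply treats any rising edge seen on SNPCLK[SNPMD] after RESET goes inactive as selecting clocked mode, as the note describes.

```python
def snoop_mode(snpmd_high_at_reset: bool, rising_edge_after_reset: bool) -> str:
    """Return the snoop initiation mode implied by SNPCLK[SNPMD].

    snpmd_high_at_reset: level sampled at the falling edge of RESET.
    rising_edge_after_reset: True if a rising edge occurs on the pin
        after RESET has gone inactive (a snoop clock is connected).
    """
    if rising_edge_after_reset:
        return "clocked"
    # With the pin strapped, the sampled level selects the mode.
    return "strobed" if snpmd_high_at_reset else "synchronous"
```

For example, a pin strapped high with no later edges yields `"strobed"`, while the same pin glitching high after RESET would yield `"clocked"`, which is why the note asks strobed and synchronous designs to keep the pin free of rising edges.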
Figure 6-4 shows the strobed method of snoop initiation. The memory address, SNPNCA, SNPINV,
and MBAOE# are latched with the falling edge of
the SNPSTB#. If MAOE# is sampled active (low),
the SNPSTB# will not cause a snoop. The snoop
initiation is recognized by the 82495XP, is synchronized in the next clock, and causes a snoop in the
following clock.
[Timing diagram not reproduced. Signals shown: SNPSTB#, SNPINV, SNPNCA, MSET0-10, MTAG0-11, MCFA0-6, MBAOE#, MAOE#, CLK, SNPCYC#, MTHIT#, MHITM#, SNPBSY#.]
Figure 6-4. Strobed Snoop Mode
Figure 6-5 shows the clocked method of snoop initiation.
The memory address, SNPNCA, SNPINV,
and MBAOE# are latched with the rising edge of
SNPCLK when SNPSTB# is first sampled low.
SNPSTB# must be sampled high for at least one
SNPCLK in order to rearm for another snoop. If
MAOE# is sampled active (low), the SNPSTB# will
not cause a snoop. The snoop initiation is recognized
by the 82495XP, is synchronized in the next
clock, and causes a snoop in the following clock.
[Timing diagram not reproduced. Signals shown: SNPCLK, SNPSTB#, SNPINV, SNPNCA, MSET0-10, MTAG0-11, MCFA0-6, MBAOE#, MAOE#, CLK, SNPCYC#, MTHIT#, MHITM#, SNPBSY#.]
Figure 6-5. Clocked Snoop Mode
Figure 6-6 shows the synchronous snoop mode. The
memory address, SNPNCA, SNPINV, and MBAOE#
are latched with the rising edge of CLK when
SNPSTB# is first sampled low. SNPSTB# must be
sampled high for at least one CLK in order to rearm
for another snoop. If MAOE# is sampled active
(low), the SNPSTB# will not cause a snoop. The
snoop initiation is recognized by the 82495XP, and
causes a snoop in the next clock.
[Timing diagram not reproduced. Signals shown: CLK, SNPSTB#, SNPINV, SNPNCA, MSET0-10, MTAG0-11, MCFA0-6, MBAOE#, MAOE#, SNPCYC#, MTHIT#, MHITM#, SNPBSY#.]
Figure 6-6. Synchronous Snoop Mode
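The three mode descriptions differ in how many clocks pass between recognition of the snoop and the tag look-up. The sketch below captures that difference as read from the text: strobed and clocked snoops are synchronized in the next clock and looked up in the one after, while synchronous snoops are looked up in the next clock. The names `SNOOP_DELAY_CLKS` and `snoop_lookup_clk` are hypothetical, and counting from "the CLK in which the 82495XP recognizes the snoop" is an assumption made for illustration, not a timing specification.

```python
# CLK periods between recognition of the snoop and the tag look-up,
# as read from the mode descriptions (illustrative, not a timing spec).
SNOOP_DELAY_CLKS = {
    "strobed": 2,      # synchronized in the next clock, snoop in the following one
    "clocked": 2,      # same sequence as strobed mode
    "synchronous": 1,  # snoop in the next clock
}


def snoop_lookup_clk(mode: str, recognized_clk: int) -> int:
    """Return the CLK number in which the snoop look-up takes place."""
    return recognized_clk + SNOOP_DELAY_CLKS[mode]
```

The one-clock saving in synchronous mode comes from SNPSTB# already being sampled by CLK, so no separate synchronization clock is needed.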
6.2.2 RESPONSE PHASE
The snoop response phase consists of two parts:
1) 82495XP state indication and 2) 82495XP snoop processing
completion. The response phase is ALWAYS synchronous with the CPU CLK. The
82495XP state indication is presented on MHITM#
and MTHIT# and remains stable until the next
snoop. These signals indicate the state of the
82495XP line just prior to the snoop operation. The
memory bus controller can predict the final state of
the 82495XP line knowing the initial state and the
SNPINV and SNPNCA inputs. The snoop completion
information is given by the SNPBSY# output.
SNPBSY# inactive indicates that
the 82495XP is ready to accept another snoop cycle.
Figure 6-7 shows the 82495XP response to snoops
without invalidation. The first snoop is to a line which
is not currently stored in the cache.
Figure 6-8 shows the 82495XP response to snoops
with invalidation.
The SNPBSY# signal will be activated for one of
two reasons: 1) a snoop hit to a modified line, in which case
SNPBSY# will remain active until the modified line
has been written back; 2) a back invalidation is
needed and there is a back invalidation in process.
The SNPBSY# minimum active time is two CLK periods.
This allows external logic to trap-hold active SNPBSY# using CLK.
The external logic must
first look for active SNPCYC# and then trap-hold
SNPBSY#.
6.2.3 PIPELINED SNOOPS
The 82495XP allows the memory bus controller to
pipeline snoop operations: the next snoop address may
be supplied and the next snoop requested before the
last snoop has completed.
A set of rules governs the operation of pipelined
snoops. These rules are as follows:
(1) For strobed mode snoops, the memory bus controller
cannot cause a second falling edge of SNPSTB# until
after the falling edge of SNPCYC#.
(2) For clocked mode snoops, the memory bus controller
cannot cause a second falling edge of SNPSTB# to be
sampled by SNPCLK until after the falling edge of
SNPCYC#.
[Timing diagram: SNPSTB#, SNPINV, MTHIT#, MHITM#, SNPBSY#; the last snoop shown hits a line in the M state]
Figure 6-7. Snoops without Invalidation
[Timing diagram: SNPSTB#, SNPINV, MTHIT#, MHITM#, SNPBSY#; snoops to lines in the I state, the E or S state, and the M state]
Figure 6-8. Snoops with Invalidation
[Timing diagram: CLK, SNPSTB#, SNPCYC#, MHITM#, SNPBSY#, MTHIT#; two back-to-back snoops]
Figure 6-9. Fastest Possible Synchronous Snooping
[Timing diagram: CLK, SNPSTB#, SNPCYC#, MTHIT#, MHITM#, SNPBSY#; two back-to-back snoops]
Figure 6-10. Fastest Possible Asynchronous Snooping
(3) For synchronous mode snoops, the memory bus
controller cannot cause a second falling edge of
SNPSTB# to be sampled by CLK until the CLK
after SNPCYC# is active.
6.2.4 OVERLAPPING SNOOPS WITH MEMORY
BUS CYCLES
The 82495XP allows snoops to be overlapped with
data transfers. The 82495XP divides the memory
bus cycle into 4 main regions as shown below:
CRDY# | Region 1 | CADS# | Region 2 | BGT# | Region 3 | SWEND# | Region 4 | CRDY# | Region 1 | CADS#
Region 1 is after a previous memory bus cycle (i.e.
after CRDY#) and before the new memory bus cycle
starts (before CADS#). A snoop in this region is
looked up immediately and serviced immediately.
Region 2 is after a memory bus cycle has started
(CADS#) but before the 82495XP has been granted
the bus (BGT#). A snoop in this region is looked up
immediately and serviced immediately. CADS# is
re-issued for the aborted cycle once the snoop
completes.
Region 3 is after the 82495XP has been granted the
bus and before SWEND# is completed. A snoop
in this region has its lookup blocked until after
SWEND#. After SWEND#, the snoop response is
given, but no write-back will be initiated until after
CRDY#.
Region 4 is after SWEND# and before CRDY#. A
snoop in this region is looked up immediately but
serviced after CRDY#. This snoop is logically treated
as if it occurred after CRDY# (snoop hits to modified
data will schedule a write-back which will be
executed after the conclusion of the current memory
bus cycle). Note that the result of the snoop
(MHITM#, MTHIT#) will be available immediately
with the look-up.
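The four regions can be summarized as a small lookup keyed by the most recent bus event. The following Python sketch is illustrative only; the enum member names (`IDLE`, `REQUESTED`, `GRANTED`, `CLOSING`) and the function `region_after` are hypothetical labels, not terms used by the data sheet.

```python
from enum import Enum


class Region(Enum):
    IDLE = 1        # after CRDY#, before the next CADS#
    REQUESTED = 2   # after CADS#, before BGT#
    GRANTED = 3     # after BGT#, before SWEND#
    CLOSING = 4     # after SWEND#, before CRDY#


# (lookup, servicing) for a snoop that arrives in each region
SNOOP_HANDLING = {
    Region.IDLE: ("immediate", "immediate"),
    Region.REQUESTED: ("immediate", "immediate; CADS# re-issued afterwards"),
    Region.GRANTED: ("blocked until SWEND#", "after CRDY#"),
    Region.CLOSING: ("immediate", "after CRDY#"),
}


def region_after(last_event):
    """Map the most recent memory bus event to the region it opens."""
    return {
        "CRDY#": Region.IDLE,
        "CADS#": Region.REQUESTED,
        "BGT#": Region.GRANTED,
        "SWEND#": Region.CLOSING,
    }[last_event]
```

For example, `SNOOP_HANDLING[region_after("BGT#")]` gives the Region 3 behavior: the lookup is held off until SWEND# and any write-back waits for CRDY#.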
6.2.5 SNOOP INTERLOCK
The 82495XP uses two interlock mechanisms to ensure
that snoops are identified within the proper region.
The first interlock ensures that once BGT# has been
given, snoops are blocked until after SWEND#. The
second interlock ensures that once a snoop has been
started, BGT# cannot be given until after the snoop
has been serviced.
Figure 6-11 shows how, once the 82495XP sees
BGT#, it blocks all snoops until after SWEND#. If a
snoop has been initiated, and no SNPCYC# has
been issued before BGT# assertion, the snoop has
been blocked.
Figure 6-12 shows a snoop occurring before BGT#.
Once the 82495XP has honored a snoop, the
82495XP, depending on the result of the snoop, may
ignore BGT# until the snoop is serviced. The
82495XP will always ignore BGT# when SNPCYC#
is active. If the snoop result is a hit to a modified line
(MHITM# active), the 82495XP will ignore BGT# as
long as both SNPBSY# and MHITM# remain active.
In this case, it is the memory bus controller's
responsibility to hold BGT# until SNPBSY# goes
inactive or to reassert it after SNPBSY# becomes
inactive. If the snoop result is not a hit to a modified
line (MHITM# inactive), the 82495XP is capable of
accepting BGT# even when SNPBSY# is active. This
allows the memory bus controller to proceed with a
memory bus cycle by asserting BGT# while the
82495XP is performing back-invalidations.
These two interlock mechanisms provide a flexible
method of ensuring predictable handling of overlapped snoops.
NOTE:
Even when snoops are delayed, address latching is
performed with SNPSTB# activation.
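The conditions under which the 82495XP ignores BGT# after honoring a snoop reduce to a short predicate. The Python sketch below restates the rules of this section; the function name is hypothetical, and the arguments are booleans meaning "signal is active" rather than pin levels.

```python
def bgt_ignored(snpcyc_active, mhitm_active, snpbsy_active):
    """Return True if a BGT# assertion would be ignored by the 82495XP."""
    if snpcyc_active:
        return True  # BGT# is always ignored while SNPCYC# is active
    # Hit to a modified line: ignored while both MHITM# and SNPBSY# are active.
    # Otherwise (e.g. back-invalidation in progress) BGT# is accepted.
    return mhitm_active and snpbsy_active
```

A memory bus controller model could use this to decide whether BGT# must be held or reasserted once SNPBSY# goes inactive.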
[Timing diagram: CLK, BGT#, SWEND# or CRDY#]
Figure 6-11. BGT# Blocking a Snoop
[Timing diagram: CLK, snoop request, SNPCYC#, MHITM#, SNPBSY#, BGT# (ignored)]
Figure 6-12. Snoop Occurring before BGT#
6.2.6 SNOOPS CONCURRENT WITH LINE FILL
CYCLES

During snoops concurrent with line-fills/allocates,
the following responsibility boundaries must be
fulfilled in order to ensure data consistency:
o If a snoop happens before BGT#, more precisely
  if SNPCYC# is active before BGT#, it is the
  system's responsibility not to return stale data
  within the line-fill/allocation.
o If a snoop happens after BGT#, more precisely if
  SNPCYC# is active after BGT#, then the
  82495XP ensures data consistency by providing
  interlocks with the CPU which avoid caching of
  stale data.

6.3 Memory Bus Controller Interface
Rules

To begin a cache cycle, the 82495XP outputs the
CADS# signal. The cache address and other cycle
parameters are guaranteed to be stable with
CADS# assertion. These parameters are guaranteed
to be stable until CNA# or CRDY# of that cycle.
After CNA# or CRDY# these parameters are
undefined.

Either during or after CADS#, the CDTS# signal is
asserted. Data is guaranteed to be stable with CDTS#
assertion, or the data path is available.

BGT# and CRDY# are required for all (non-snoop)
cycles. KWEND# and SWEND# are only required
for those cycles which sample them.

Once a signal has been sampled, it is a "don't care"
until CRDY# of that cycle. Additionally, these signals
plus the attributes MRO#, MKEN#, MWB/WT#,
and DRCTM# need only follow setup and
hold times when they are being sampled.

For pipelined cycles, the cycle attributes (BGT#,
KWEND#, ...) will only be sampled after CRDY#
of the previous cycle.

Note that there are many other rules that govern
when signals may be asserted in relation to one
another. These may be found in the specific pin
descriptions of each signal in chapter 7.

Snoop-Write-Back cycles are a subset of the normal
cycles. Snoop-Write-Back cycles are requested as a
consequence of snoop hits to Modified lines. Those
are intervening cycles and are requested by activating
SNPADS# instead of CADS#. For those cycles,
the 82495XP only samples the CRDY# response.
The 82495XP assumes that the memory bus controller
owns the bus to perform the intervening writeback
(restricted back-off protocol) and that no other
agents will snoop this cycle. Also, the 82495XP will
ignore CNA# during Snoop-Write-Backs.
[Diagram: 82495XP output signals CADS# and CDTS#, followed along the time axis by the 82495XP input signals BGT#, KWEND#, SWEND#, CRDY#, and CNA#]
Figure 6-13. Cycle Progress
[Diagram: 82495XP output signals SNPADS# and CDTS#, followed along the time axis by the 82495XP input signal CRDY#]
Figure 6-14. Cycle Progress for Snoop Cycles
6.4 LOCK# Protocol

The 82495XP provides a LOCK signal for the memory
bus called KLOCK#. KLOCK# is generated by
the 82495XP whenever the CPU generates the
LOCK# signal. KLOCK#, like the other cycle
attributes, is valid with CADS# assertion.

When the CPU generates a LOCK cycle, the
82495XP always generates a bus cycle. LOCK cycles
are non-cacheable to both the 82495XP and
CPU, so the information is passed through the
82490XPs to the CPU with BRDYs generated by the
MBC. If the LOCKed read cycle is a hit in the
82495XP, the 82495XP ignores the data that it is
receiving and supplies data from the 82490XP array
(in accordance with the BRDYs supplied by the
MBC). LOCKed writes are posted like any other write.
LOCKed cycles, both reads and writes, never
change the 82495XP tag state.

During a LOCKed cycle, the MBC must prevent other
masters from snooping the 82495XP. Specifically,
the MBC must prevent SNPSTB# between BGT#
of the first LOCKed transfer and SWEND# of the
last LOCKed transfer.

[Diagram: a sequence of LOCKed read and write cycles, each started with CADS# and LOCK, along a time axis; the MBC must not allow snoops from BGT# of the first LOCKed transfer until SWEND# of the last]
Figure 6-15. Snooping During LOCKed Cycles

6.5 Cycle Length

When CADS# is generated, the 82495XP outputs
CW/R# and MCACHE#. These signals provide the
MBC with enough information to determine the type
of 82495XP cycle. Table 6-1 summarizes the cycle
types for the 82495XP/82490XP. All line-fills and
write-backs to the 82495XP/82490XP cache operate
on the entire length of a cache line.

In addition to the length of the cycle from the
82495XP/82490XP, the memory bus controller may
need to determine the length of the cycle to the
CPU. Specifically, for those 82495XP cycles where
RDYSRC = 1, the MBC must decode the i860 XP
CPU's W/R#, LEN, and CACHE# outputs to determine
the number of BRDY#s which the MBC will
provide to the CPU. These signals are captured for
the current cycle by a user-provided BE latch (see
Section 7.2 for details). Table 6-2 presents the CPU
cycle length definitions; see the i860 XP microprocessor
Data Sheet (Order #240874) for further details.
Table 6-1. 82495XP/82490XP Cycle Determination

Cycle Type           CW/R#   RDYSRC   MCACHE#   MKEN#
Posted Write           1       0         1        X
Write Backs            1       0         0        X
Non-Cacheable Read     0       1         1        X
Non-Cacheable Read     0       1         0        1
Cacheable Read         0       1         0        0
Allocation             0       0         0        X
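As an illustration of how a memory bus controller might apply Table 6-1, the Python sketch below encodes the table as a pattern list and matches sampled pin levels against it. It assumes the table columns are read in the order CW/R#, RDYSRC, MCACHE#, MKEN#; the names `TABLE_6_1` and `cycle_type` are hypothetical, and combinations not listed in the table return `None` rather than a guess.

```python
# (CW/R#, RDYSRC, MCACHE#, MKEN#) -> cycle type; None stands for "X" (don't care)
TABLE_6_1 = [
    ((1, 0, 1, None), "posted write"),
    ((1, 0, 0, None), "write back"),
    ((0, 1, 1, None), "non-cacheable read"),
    ((0, 1, 0, 1), "non-cacheable read"),
    ((0, 1, 0, 0), "cacheable read"),
    ((0, 0, 0, None), "allocation"),
]


def cycle_type(cw_r, rdysrc, mcache_n, mken_n):
    """Return the Table 6-1 cycle type for the given pin levels (0 or 1)."""
    pins = (cw_r, rdysrc, mcache_n, mken_n)
    for pattern, name in TABLE_6_1:
        if all(p is None or p == v for p, v in zip(pattern, pins)):
            return name
    return None  # combination not listed in the table
```

Note that MKEN# is only known once the MBC returns it, so for reads with MCACHE# low the classification can change from "cacheable" to "non-cacheable" during the cycle.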
Table 6-2. i860 XP CPU Cycle Determination
W/R#
LEN
CACHE#
MKEN#
Cycle Description
Burst Length
0
0
1
-
Non-Cacheable 64-Bit Read
1
0
0
1
0
1
-
0
0
1
0
1
1
1
-
Non-Cacheable 64-Bit Read
1
-
64-Bit Write
1
1
-
I/O and Special Cycles
1
1
-
Non-Cacheable 128-Bit Read
2
1
Non-Cacheable 128-Bit Read
2
1
-
128-Bit Write
2
-
1
0
-
0
0
Cache Line Fill
4
1
-
0
-
Cache Write-Back
4
NOTE:
If MRO# is asserted to the 82495XP, the effect on i860 XP CPU cycle determination is the same as when MKEN# = 1.
6.6 Consecutive Cycles
Because an 82495XP line can be longer than a CPU
line, there are circumstances where a read miss will
be to a line that is currently being filled. If this is the
case, the 82495XP treats this like a read hit, but
supplies data after CRDY# for the line fill. Data is
supplied from the 82490XP array.
6.7 CPU/Memory Bus Concurrency
The 82495XP allows concurrency between the CPU
and memory buses. CPU bus cycles will either be
serviced locally by the 82495XP (hits) or require
memory bus service. Whenever a CPU cycle requires memory bus service, it will be scheduled to
run on the memory bus, and CPU bus activity will be
allowed to continue.
Examples of concurrency are:
- Snoops and CPU bus operations
- Posted writes with CPU and memory bus operations
- CPU bus operation on the back of long line fills
  (82495XP line longer than the CPU line)
- Allocations and replacements with CPU and
  memory bus operations.
In certain cases, consistency of data and prevention
of deadlocks preclude concurrency. Problems may
occur when the current memory bus cycle changes
the tag state and therefore affects the operation of
the next CPU cycle request. In those cases the
82495XP will hold concurrency to ensure data consistency. Handling of those cases is completely
transparent to the MBC.
6.8 Memory Bus Modes

The 82490XP supports two modes of memory bus
operation: clocked mode and strobed mode. In
clocked mode, memory bus signals are sampled by
the 82490XP on rising edges of MCLK. Similarly,
memory bus data and signals are output by the
82490XP with respect to MCLK (or MOCLK) rising
edge transitions.

In strobed mode, memory bus signals are sampled
or output with respect to rising and falling edges of
other signals. Strobed mode has the advantage of
not requiring setup and hold times to a CLK or MCLK
edge.

[Diagram: clocked memory bus mode (signal valid around the MCLK edge, with setup and hold times) and strobed memory bus mode (sampling relative to MSEL# going active and inactive)]
Figure 6-16. Clocked and Strobed
Mode Sampling

6.8.1 CLOCKED MODE

In clocked mode operation MCLK is used to reference
the signals MDATA0-MDATA7, MSEL#,
MFRZ#, MZBT#, MBRDY#, and MEOC#. Clocked
mode will be selected if the 82490XP detects a
clock at its MCLK input after RESET. MCLK need
not have any relation to CLK. If this is the case, the
memory bus is said to be operating in "clocked
asynchronous" mode. If MCLK = CLK, the memory
bus is operating in "clocked synchronous" mode. If
MCLK x N = CLK (where N = 2, 3, 4 ...), the
memory bus is operating in "clocked divided
synchronous" mode. These three clocked modes,
asynchronous, synchronous, and divided synchronous,
are not differentiated by the 82490XP.

MOCLK controls a transparent latch at the 82490XP
data output pins. If a clock is provided at this input,
data is latched with MOCLK going low. This clock is
available in clocked mode only. MOCLK allows the
system to provide a greater MDATA hold time by
skewing MOCLK from MCLK. If MOCLK is tied high,
MDATA is driven from MCLK.

6.8.1.1 Synchronous Clocked Mode

In synchronous clocked mode MCLK = CLK. This
means the CPU clock is used for the 82495XP, the
82490XP, and the memory bus. A synchronous
memory bus allows memory to communicate with
the 82495XP without synchronizers since the
82495XP runs with CLK. With a synchronous design,
however, high clock frequencies must be routed to
all parts of a system with minimal skew. This may
not be possible with future projected frequencies. A
synchronous memory system and memory bus
controller must be redesigned when future speed
upgrades are required.

6.8.1.2 Asynchronous Clocked Mode

In asynchronous clocked mode, MCLK is not the
same frequency as CLK. Some memory signals,
since they reference MCLK, must be synchronized
to CLK to communicate with the 82495XP. For
example, when a cycle completes, the memory system
asserts a signal, driven from MCLK, to the memory
bus controller, which synchronizes it to CLK to
produce CRDY#. This is because CRDY# is
synchronous to CLK and not MCLK.

Asynchronous mode allows the rest of the system to
run at a lower frequency than the CPU CLK. Not only
does this simplify system design, it also allows the
designer to place hooks so that the same design
scales easily to a higher frequency. If all the features
of the 82495XP are used properly, an asynchronous
memory design does not have to incur much
synchronization penalty. For example, MEOC# is
synchronous to the memory environment (MCLK). This
allows the memory system to end the current cycle
and start the next before CRDY# is synchronized in
the CPU environment.

6.8.1.3 Divided Synchronous Clocked Mode

Divided synchronous clocked mode is a subset of
synchronous clocked mode. It allows two things to
happen: One, the memory system is capable of
communicating with the 82495XP without
synchronization. Two, a slower frequency clock may be
routed around the system.

Divided synchronous mode still requires clock skew
restrictions. It also carries the same scalability
drawbacks that full synchronous mode does.
6.8.2 STROBED MODE
Strobed mode is configured on the 82490XP by
strapping MCLK high. In strobed mode:
- MDATA0-MDATA7 are sampled with respect to
  edges of MEOC#, MISTB, and MOSTB.
- For write cycles, MFRZ# is sampled when
  MEOC# goes active.
- MZBT# is sampled when MSEL# is inactive,
  and is latched when MSEL# goes active.
  MZBT# is also sampled for the next operation
  when MSEL# is active and MEOC# goes active.
By not using MCLK, strobed mode has no setup and
hold time restrictions, and is scalable to higher
frequencies. Strobed mode does, however, require
synchronization to 82495XP CLK synchronous signals.
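The naming of the memory bus modes in section 6.8 can be illustrated with a small classifier. This Python sketch is an assumption-laden illustration, not a description of what the 82490XP does (the part only distinguishes clocked from strobed): the function name is hypothetical, `mclk_hz=None` stands for MCLK strapped high, and an exact integer frequency ratio is taken to mean that MCLK is derived from CLK with controlled skew, which frequencies alone cannot actually prove.

```python
from fractions import Fraction


def memory_bus_mode(clk_hz, mclk_hz=None):
    """Name the memory bus mode for a CPU clock and an optional MCLK."""
    if mclk_hz is None:
        return "strobed"                      # MCLK strapped high
    ratio = Fraction(clk_hz) / Fraction(mclk_hz)
    if ratio == 1:
        return "clocked synchronous"          # MCLK = CLK
    if ratio.denominator == 1 and ratio >= 2:
        return "clocked divided synchronous"  # MCLK x N = CLK, N = 2, 3, 4 ...
    return "clocked asynchronous"             # no fixed relation assumed
```

For instance, a 50 MHz CLK with a 25 MHz MCLK is classified as divided synchronous (N = 2), while a 33 MHz MCLK falls into the asynchronous case.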
6.9 Memory Bus Operation

All data is handled by the 82490XP cache RAMs.
The 82495XP instructs the 82490XP whether to use
the data array or buffers, and specifically which
buffer to use. The MBC is responsible for bursting data
in and out of the 82490XPs, in and out of the CPU
during miss cycles, and indicating when the operation
is finished. Communication between the
82490XPs and memory bus may be done in a
clocked mode or strobed mode. See the Memory
Bus Modes section for more details.

An 82490XP has 4 memory buffers. It has 2 memory
cycle buffers, one write-back buffer, and one snoop
buffer. Each buffer is capable of holding an entire
82495XP line of the longest configurable length.
The memory cycle buffers of the 82490XP are used
for posting writes and holding data during
82495XP/82490XP line-fills. The write-back buffer is
used for holding data from a cache replacement.
This data is ready to be written out, and the write-back
buffer is snoopable. The snoop buffer is used
to hold modified data that has been hit by a snoop.
Since snoop hits are the highest priority cycle, this
buffer will be emptied before any other cycle or
snoop request begins.

6.9.1 82490XP BUFFERS AND MUXES

The 4 82490XP memory buffers are all multiplexed
(muxed) to the memory bus. The mux is used to select
which buffer is on the bus, and specifically which
slice of that buffer is on the bus. MBRDY# assertion
increments a counter for this mux which selects the
next slice of that buffer.

The counter used to increment through the buffer
slices is called the memory burst counter. The memory
burst counter follows the CPU burst order depending
on the subline address of the initial slice.
Once the MBC is finished with a buffer, MEOC# is
asserted to switch the mux to the next buffer to be
used. MEOC# will also reset the counter and latch
the last slice of data.

On the CPU side, the 82490XP contains a CPU buffer
and mux. The CPU buffer captures data from the
appropriate memory buffer or 82490XP array to feed
it to the CPU. The mux selects which slice is muxed
to the CPU bus. The counter for this mux is
incremented with BRDY#.

The 82490XP array contains a mux that selects
which way, based on the MRU algorithm, will be
read during hit cycles. This mux is also used during
write cycles to write to the correct way.

6.9.2 MEMORY CYCLE BUFFERS

There are 2 memory cycle buffers in the 82490XP.
They are used for line-fills, allocates, and memory
writes. The buffers are 64 bits wide (per 82490XP) to
support 8 transfers with 8 memory bus I/O pins
(maximum configuration). The 82490XP alternates
use of these buffers. When one buffer has a posted
write or is being used for a memory read, the other
one is available for the next cycle.

During allocation cycles, read for ownership may be
implemented by using the MFRZ# signal. If MFRZ#
is sampled active during the write cycle, the memory
cycle buffer will freeze the write data in the buffer so
the subsequent line-fill fills around it. This way the
write data need not be written to memory. The line
must be tagged as modified.
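The "fill around" behavior of a frozen memory cycle buffer amounts to a merge in which the frozen write data wins over the incoming line-fill data. The Python sketch below illustrates the idea only; the function name and the byte-granular list representation (with `None` marking bytes the CPU write did not touch) are assumptions made for the example, not a description of the 82490XP internals.

```python
def fill_around(frozen_write, fill_data):
    """Merge a line fill into a buffer that holds frozen write data.

    frozen_write: one entry per byte of the line; None where the CPU
                  write did not touch the byte.
    fill_data:    the line as read from memory (same length).
    """
    if len(frozen_write) != len(fill_data):
        raise ValueError("line lengths differ")
    # Frozen write bytes are kept; every other byte comes from the fill.
    return [w if w is not None else f for w, f in zip(frozen_write, fill_data)]
```

Because the merged line differs from memory wherever the write data was kept, the resulting line has to be tagged as modified, as stated above.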
6.9.3 WRITE BACK BUFFER AND SNOOP
BUFFER
The write back buffer and snoop buffer are both 64
bits wide to handle the maximum 82495XP line length.
The write back buffer is used when replaced data
must be written back to main memory (including
FLUSH and SYNC cycles) and the snoop buffer is
used when data must be written out from a snoop
hit.
Before a line fill begins, the 82495XP checks to see
if it must remove a modified line to make room for
the line-filled line. If so, the modified line is placed in
the write back buffer and the line-fill is filled through
a memory cycle buffer. Should the line-fill be selected
as non-cacheable, both buffer contents are discarded
and the 82490XP array value remains as it
was before the line-fill.
There is no need to run the line-fill, replacement
(write back), FLUSH, or SYNC cycles contiguously.
If a snoop is requested between the two cycles, the
write back buffer is snoopable, and data can be written
directly out of it if need be.
6.9.4 MEMORY BUS CONTROL SIGNALS

The main memory bus control signals are MSEL#,
MEOC#, MBRDY#, and CRDY#. These signals
control the 82490XP data path, buffers, and muxes.

MSEL# selects which 82490XPs are being used in
the current cycle by qualifying the MBRDY# signal.
If MSEL# is inactive, MBRDY# is not recognized for
that 82490XP. MSEL# is also used to reset the
memory burst counter. If MSEL# goes inactive, the
counter is initialized to its starting value. This is
useful for aborted/restarted cycles. MSEL# may remain
active for many or all cycles. MSEL# must, however,
be inactive sometime after RESET to initialize the
memory burst counter for the first time.

MEOC# is asserted by the MBC to finish with
the current buffer and switch the memory bus to the
next buffer to be used. MEOC# latches in the last
piece of data and resets the memory burst counter
before switching to the new buffer.

MBRDY# is used to increment the memory burst
counter to select the next slice of data. This will
strobe data out of the 82490XP (write cycles) or load
data into the 82490XP (read cycles). MBRDY# is
ignored by the 82490XP if MSEL# is inactive.

CRDY# finishes the current cycle. Once CRDY# is
asserted, the 82490XP disposes of the information
in the buffers used in that cycle, and loads information
into the 82490XP array. For a particular cycle, CRDY#
must be asserted on or after the clock in which
MEOC# is asserted.

6.9.5 82490XP DATA PATH

An example 82490XP read data path is shown in
Figure 6-17. The path between the CPU and memory
bus is a flow-through path, not a clocked path. Each
entire 82495XP cache line of data in the CPU buffer
is available at the memory buffer with some
propagation delay. Likewise, each entire 82495XP cache
line of data in the memory buffer is available in the
CPU buffer with some propagation delay. Data is
burst into and out of the memory buffer using
MBRDY# or MISTB/MOSTB. Data is burst into and
out of the CPU buffer using BRDY#. This means
there is no synchronization required between memory
and CPU data paths.

To give an example of how the path works, during a
CPU line fill, data may be returned to the CPU in two
different fashions. One, each time the memory buffer
fills a dword, BRDY# may be asserted a clock
later to burst it back to the CPU. Two, the memory
buffer can be filled and then BRDY# asserted on
four consecutive clocks to burst data back to the
CPU.
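The interaction of MSEL#, MBRDY#, and MEOC# with the memory burst counter, as described in sections 6.9.1 and 6.9.4, can be modeled with a small state machine. The Python sketch below is a toy model under stated assumptions: the class name is hypothetical, the counter is a plain transfer count rather than the real CPU burst order, the "next buffer" is chosen round-robin (in the real part the 82495XP selects it), and MEOC# is only modeled while MSEL# is active.

```python
class MemoryPortMux:
    """Toy model of the memory-side slice counter and buffer select."""

    def __init__(self, n_buffers=4):
        self.n_buffers = n_buffers
        self.buffer = 0  # which buffer is on the memory bus
        self.count = 0   # transfers completed in the current burst

    def clock(self, msel_active, mbrdy_active=False, meoc_active=False):
        if not msel_active:
            self.count = 0  # MSEL# inactive re-initializes the counter
            return          # and MBRDY# is not recognized
        if meoc_active:
            self.count = 0  # MEOC# resets the counter ...
            self.buffer = (self.buffer + 1) % self.n_buffers  # ... and switches buffers
        elif mbrdy_active:
            self.count += 1  # MBRDY# selects the next slice
```

Calling `clock(True, mbrdy_active=True)` twice and then `clock(False)` leaves the counter back at zero, which mirrors the aborted/restarted cycle case mentioned above.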
[Diagram: data from the memory mux enters the memory buffers (32-bit slices with output enables, selected by a burst counter stepped by MBRDY#) and passes over flow-through paths to the CPU latch and the CPU mux (burst counter stepped by BRDY#)]
Figure 6-17. 82490XP Read Data Path
6.9.6 WRITE CYCLES

There are 3 basic types of write cycles: CPU generated
write cycles, write back cycles caused by a
cache replacement, and snoop write back cycles
caused by a snoop hit. All write cycles, except the
snoop write back, begin with CADS# assertion. The
snoop write back cycle begins with SNPADS#.

6.9.6.1 CPU Generated Write Cycles

When the CPU begins a write cycle, four things can
happen to it. One, the CPU write is a hit to a modified
or exclusive line. In this case the write is terminated
by the cache immediately and invisibly to the
MBC.

Two, the write is to a shared location. This type of
write is posted to the 82490XP memory cycle buffers
and the cycle is terminated by the cache. If a memory
cycle buffer is occupied with a write cycle, the
CPU waits until the previous write completes. The
write cycle must be written to the memory bus so
that copies of the line in other caches can be
invalidated.

Three, the write is a cache miss. This type of write is
posted to a memory cycle buffer if the 82490XP is
not waiting for another posted write to complete. If
PALLC# is asserted, the write may be turned into an
allocation.

Four, the write is a LOCKed write. LOCKed writes
are posted regardless of the tag state. The write is
then treated as if it were a miss except that there is
no change in the tag state and no allocation allowed.

6.9.6.2 Cache Generated Write Cycles

The 82495XP/82490XP will generate a write cycle in
three situations: a line fill or allocation causing a
cache replacement, a snoop hit to a modified location,
and write backs caused by FLUSH or SYNC.
Write backs caused by FLUSH or SYNC are
indistinguishable from write-back cycles caused by
replacement. Cache generated write cycles are the
length of a cache line.

Cache replacements and FLUSH/SYNC cycles
cause a line (or two lines if sectored) of cache data
to be placed in the write-back buffer of the 82490XP.
If no cycle is pending, CADS# is asserted and the
data is written out. If a snoop hits the write-back
buffer, the data is written out via SNPADS# like a
normal snoop hit. The write back is then cancelled
since the data was written through the snoop hit.

A snoop hit to a modified location causes a line of
cache data to be written out to memory. Snoop hits
are the highest priority cycle and must be serviced
immediately. A snoop hit to a modified location causes
the snooped line to be written to the snoop buffer
of the 82490XP. SNPADS# is then asserted and the
snoop is written out.

6.9.6.3 Memory Bus Controller Responsibility

The MBC recognizes a write cycle with CADS# and
CW/R# (or SNPADS# for snoop cycles). If
MCACHE# is active, the MBC knows the cycle is a
write back cycle, otherwise it is a CPU-generated
cycle.

CPU-generated write cycles are written to the main
memory bus so that other caches can invalidate
their copies of this information. The other caches do
this by snooping with SNPINV active during snoop
initiation if they detect a write cycle on the bus.

Once the MBC detects CDTS# active, the data will
be available for writing in the next clock in the
appropriate 82490XP buffer. The MBC should assert
MSEL# so bursting is enabled, and burst through
the write using MBRDY# (or MOSTB). MSEL#
activation also causes MZBT# to be sampled. MZBT#
must be inactive at this time if the data will be written
according to CPU burst order.

Once the write cycle is complete, MEOC# must be
asserted to end the write cycle and switch to the
next pending cycle. If this write cycle is turned into
an allocation, MFRZ# is sampled with MEOC# to
freeze the write data in the 82490XP.

MEOC# simply switches buffers from the current
one in use to the buffer of the next pending cycle.
CRDY# needs to be asserted to actually end the
cycle and allow the 82495XP and 82490XP to dispose
of the information.
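The write-cycle cases of section 6.9.6 can be condensed into two small decision functions, one for how the MBC recognizes a write on the memory bus and one for what happens to a CPU write inside the cache. This Python sketch is illustrative: the function names are hypothetical, arguments named `*_active` are booleans meaning "signal is active", and a cache miss is represented by tag state `"I"`.

```python
def mbc_write_kind(cads_active, snpads_active, cw_r, mcache_active):
    """How the MBC can tell which kind of write cycle it is seeing."""
    if snpads_active:
        return "snoop write back"
    if cads_active and cw_r == 1:
        return "write back" if mcache_active else "CPU-generated write"
    return None  # not a write cycle


def cpu_write_outcome(tag_state, locked=False, pallc_active=False):
    """Fate of a CPU write; tag_state is 'M', 'E', 'S', or 'I' (miss)."""
    if locked:
        return "posted; tag state unchanged, no allocation"
    if tag_state in ("M", "E"):
        return "completed in the cache, invisible to the MBC"
    if tag_state == "S":
        return "posted and written to the memory bus"
    if pallc_active:
        return "posted; may be turned into an allocation"
    return "posted"
```

The LOCKed case is checked first because LOCKed writes are posted regardless of the tag state.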
6.9.6.4 Write Allocation and Read for
Ownership
The 82495XP/82490XP supports write allocation.
An allocation cycle is a read of a cache line caused
by a write miss to the same location. In its simplest
form, a write miss is written to memory, then the
82495XP requests a line from that same location.
Meanwhile, the CPU only sees the write cycle.
Write allocation may only be done if PALLC# is active
during CADS# of the write cycle. For the allocation
to occur, MKEN# must be returned active during
KWEND# of the write cycle. The write cycle may
be an actual write or a "dummy" write. Dummy
writes are write cycles that are terminated in the
82495XP and 82490XP as if they were normal
writes, but the data is not actually written to memory.
This saves a data write to memory.
During write allocation, the write cycle will progress
like a normal write cycle except MKEN# must be
active during KWEND# activation. If the write cycle
is a dummy write, MFRZ# must be used with
MEOC# so that the line filled data is read around
the write data into the 82490XP buffer. The line fill
cycle is like any other line fill cycle except the CPU
doesn't get any data. If a dummy write was
performed, DRCTM# must be asserted during
SWEND# activation to fill the line to the M state,
and any cache supplying the data must invalidate its
copy.
Using dummy write cycles and filling data to the M
state from another cache or memory is called Read
For Ownership. This is because ownership is being
transferred. In read for ownership cycles, memory is
avoided as much as possible. First, the dummy write
cycle avoids memory. Second, a line fill is performed
as a cache to cache transfer with DRCTM# asserted.
All caches were snooped with invalidate to eliminate
their copies.
For allocation cycles, SWEND# is not sampled for
the write portion of the allocation.
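The conditions for write allocation and the extra obligations of a dummy write can be collected into one helper. The Python sketch below is a hedged summary of the rules in this section; the function name and the returned strings are hypothetical and serve only as a checklist for an MBC designer.

```python
def allocation_requirements(pallc_active, mken_active_at_kwend, dummy_write):
    """Return the MBC obligations for a write allocation, or None if the
    write cannot be turned into an allocation."""
    if not (pallc_active and mken_active_at_kwend):
        return None  # PALLC# at CADS# and MKEN# at KWEND# are both required
    reqs = ["run the line fill after the write"]
    if dummy_write:
        # The write data never reached memory, so it must be frozen in the
        # buffer and the filled line must end up in the M state.
        reqs += ["assert MFRZ# with MEOC#", "assert DRCTM# with SWEND#"]
    return reqs
```

With an actual (non-dummy) write only the line fill itself remains; the two extra steps apply to the Read For Ownership case.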
6.9.7 READ CYCLES
The CPU initiates all read cycles. These are usually
line fills to the CPU and line fills to the
82495XP/82490XP. The signal MCACHE# is output
with CADS# to indicate whether this cycle may or
may not be cacheable. If cacheable, MKEN# is returned
by the MBC to ultimately determine cacheability.
Read hit cycles are serviced by the cache without
MBC intervention. The only read cycles seen by the
MBC (except I/O or special) are read misses and
locked read cycles.
Read misses cause CADS# to be asserted at most
two clocks after ADS# of the CPU read cycle. If
cacheable, as determined from MCACHE#, the
MBC will return 4 BRDYs back to the CPU and 4 or 8
MBRDYs to the 82495XP/82490XP. If the transfer is
non-cacheable, the i860 XP CPU LEN and CACHE#
outputs indicate the number of transfers to be given
to the CPU. MBRDY# need not be used in the
transfer if only a single piece of data is required by
the CPU.
If the read cycle is cacheable, it may cause another
cached line to be bumped out of the cache. This is
called a replacement and, if the line is modified, causes
a write back cycle. While one of the 82490XP memory
buffers is being filled for the line fill, the write back
buffer is loaded. If the line fill turns out to be
non-cacheable at the end of the transfer, the write-back
buffer is discarded, and the line in the cache remains valid.
Otherwise, CADS# will be generated after the read
cycle so the write back can be performed. The write
back need not happen immediately after the line fill
since the write-back buffer is snoopable.
All locked reads go to the memory bus. If the read is
a cache hit to the M state, the 82495XP/82490XP will
ignore the data that the MBC returns, and provide it
from its array. Locked reads are not cacheable by the
CPU or the 82495XP/82490XP. Snoop write-backs that
are a result of a LOCKed read/write request must
update memory.
6.9.7.1 Memory Bus Controller Responsibility
Once the MBC sees a read cycle on the memory
bus, it must determine whether the read is cacheable or non-cacheable using MCACHE# and its own
address decoding. If non-cacheable, the CPU expects a number of transfers as determined by its
LEN and CACHE# outputs. If cacheable, the CPU
expects 4 transfers, and the cache expects 4 or 8
(configuration dependent).
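The transfer counts above can be summarized in a short sketch. This is an illustration only: the function name and its parameters are hypothetical, and the non-cacheable count is passed in as a plain number because the LEN/CACHE# encoding is not given in this section.

```python
def expected_transfers(cacheable, cache_transfers=4, cpu_noncacheable_transfers=1):
    """Ready strobes the MBC must return for one read miss.

    cacheable: final cacheability (MCACHE# plus the MBC's own decoding).
    cache_transfers: 4 or 8 MBRDY#s, depending on cache configuration.
    cpu_noncacheable_transfers: count implied by the CPU's LEN and CACHE#
        outputs (assumed to be decoded elsewhere).
    """
    if cacheable:
        if cache_transfers not in (4, 8):
            raise ValueError("a line fill takes 4 or 8 MBRDY#s")
        return {"cpu_brdy": 4, "cache_mbrdy": cache_transfers}
    # Non-cacheable: the line is not stored, so no line-fill count applies.
    # The text only says MBRDY# may be skipped for a single transfer.
    return {"cpu_brdy": cpu_noncacheable_transfers, "cache_mbrdy": None}
```

For example, `expected_transfers(True, 8)` describes a cacheable miss in a configuration where the cache takes 8 memory-bus transfers per line.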
MKEN# is sampled during KWEND# to determine
cacheability. Before MKEN# is sampled, KEN# is
active, assuming cacheability for the CPU. MKEN#
must be sampled 1 clock before the first BRDY# to
make the cycle non-cacheable.
Once the read cycle is given to the memory system,
all 82495XP/82490XP caches snoop to see if they
contain the data in modified form. If so, the MBC
must abort the cycle in memory and receive the data
directly from the 82495XP/82490XP that has it, or
wait until that cache writes it to memory. If the data
transfer avoids memory, i.e., goes cache to cache,
DRCTM# must be asserted with SWEND# to place
the line in the [M] state, and the cache giving the data
must invalidate its copy.
MSEL# is activated and MBRDY# (or MISTB) used
to sample input data from the read cycle. Once
CDTS# has been seen active, the CPU read data
path is clear. BRDY# may be returned to the CPU
sometime after each MBRDY# for each piece of input data (see MDATA setup to CLK). Once the
transfer completes, MEOC# and CRDY# are asserted to complete the cycle in the 82495XP/
82490XP.
6.9.8 I/O AND SPECIAL CYCLES
I/O and special cycles (flush, etc.) are decoded by
the 82495XP and not posted. These cycles wait until
all buffers have been written, and all cycles have
been completed, before they cause CADS# assertion. The CPU waits until the special cycle ends with
the MBC's BRDY# assertion before it continues.
When the 82495XP/82490XP is performing a
FLUSH or SYNC, many write-back cycles are required. These cycles look like ordinary write-back
cycles, and should be handled as such. FSIOUT# is
active during these write-back cycles, so when FSIOUT# goes inactive the cycle is complete and the
memory bus controller can supply BRDY# to the
CPU.
In this example, the CPU port of the 82490XPs is in
x4 mode and the memory bus port is in x8 mode.
This allows all 128 bits of the memory bus to be
multiplexed to the 64-bit CPU bus.
For read cycles, each MBRDY# loads 8 bits into
each 82490XP. This is 128 bits of data. It will take 2
BRDY# assertions to load this into the CPU. The
first BRDY# assertion loads the first 4 bits of each 82490XP onto the
CPU bus, and the next BRDY# assertion loads the
remaining 4 bits.
For a 64-bit write cycle, the data is available
on the appropriate data bits. On the i860 XP CPU
with a 128-bit bus, this is determined by CPU address bit A3. The other data bits are undefined. For
write-back cycles, all 128 bits are available at once.
MBRDY# assertion will strobe the next 128 bits on
the memory bus.
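The read-side multiplexing in this example can be sketched as follows. This is a simplified model, not the device's actual data path: the count of 16 cache RAMs follows from 128 bits at 8 bits each, while the bit-slice assignment (device k holding bits 4k..4k+3 and 64+4k..64+4k+3) and the order of the two halves are assumptions read from the labels of Figure 6-18. All function names are hypothetical.

```python
NUM_CACHE_RAMS = 16  # 128-bit memory bus / 8 memory-side bits per 82490XP

def load_memory_word(word128):
    """One MBRDY#: each device latches 8 bits of the 128-bit word.

    Assumed slicing: device k takes bits 4k..4k+3 (low half) and
    bits 64+4k..64+4k+3 (high half).
    """
    devices = []
    for k in range(NUM_CACHE_RAMS):
        low = (word128 >> (4 * k)) & 0xF
        high = (word128 >> (64 + 4 * k)) & 0xF
        devices.append((low, high))
    return devices

def cpu_transfers(devices):
    """Two BRDY#s: each device drives 4 bits per transfer onto the 64-bit CPU bus."""
    first = sum(low << (4 * k) for k, (low, _) in enumerate(devices))
    second = sum(high << (4 * k) for k, (_, high) in enumerate(devices))
    return first, second  # assumed order: low 64 bits, then high 64 bits
```

Loading one 128-bit word and calling `cpu_transfers` yields the two 64-bit values the CPU would see on consecutive BRDY# assertions under these assumptions.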
6.10 Different Bus Widths
The 82490XP is capable of supporting either 64- or
128-bit memory bus widths. Depending on the configuration, the 82490XP's CPU and I/O busses may
be multiplexed. The following diagram shows how
an i860 XP CPU may be connected to a 128-bit
memory bus:
Figure 6-18. 82490XP On Wide Bus
7.0 DETAILED PIN DESCRIPTIONS
The following chapter provides a detailed description of each pin of the 82495XP and 82490XP. The
pins have been categorized by function. Each pin
description has a heading which summarizes the
most important aspects of the pin. The heading is
organized as:
Pin Name
Name Meaning
Pin Function
I/O, 82495XP/82490XP/i860 XP CPU, (location) Signal Type
Synchronous/Asynchronous
for example,
CADS#
Cache Address Strobe
Indicates beginning of cache cycle
Output from 82495XP (pin E3) Cycle Control Signal
Synchronous to CLK
Following the heading are three sections. The first
section, Signal Description, provides information of
what the signal does, how to use it, and in what
modes it operates. The second section, When Sampled or When Driven, indicates all the exact places
where the part samples the signal, generates the
signal, or neither. The third section, Relation to Other Signals, mentions the other signals that are affected by this signal, synchronization requirements,
and shared pins.
All specific information about each pin is provided in
this chapter.
7.0.1 CONFIGURATION SIGNALS
These signals are inputs to the 82495XP and
82490XP that are sampled at RESET and alter the
configuration and operation of the cache.
Figure 7-1. Configuration Input Setup and Hold (waveform labels: CLK, RESET, Config, Setup, Hold)
Each set of configuration inputs may have different
setup times, but all signals have the same hold time:
the signals may be released on the CPU clock edge
on which RESET is detected inactive. Some
configuration signals are strapping options and
cannot change their value during 82495XP operation.
7.0.2 CPU BUS INTERFACE SIGNALS
These pins comprise the interface between the CPU
and the 82495XP/82490XP. The signals in this interface
are not flexible; Chapter 10 addresses the use of
these signals. The following are the CPU bus interface signals:
SET0-SET10
ADS#
M/IO#
PWT
BRDYC1#
EADS#
BOFF#
TAG0-TAG11
W/R#
HITM#
PCD
KEN#
BE0#-BE7#
CFA0-CFA6
D/C#
LOCK#
LEN
AHOLD
INV
The majority of these signals must be connected
strictly between the i860 XP CPU and the 82495XP.
However, a subset of these signals is needed by the
MBC to decode the i860 XP CPU cycle in cases
where the MBC provides BRDY#s to the CPU. For
these purposes the following signals must also be
inputs to a latch controlled by the 82495XP's BLE#
output:
BE0#-BE7#
LEN
PWT
CACHE#
PCD
CTYP
PCYC
7.0.3 82495XP/82490XP INTERFACE SIGNALS
These pins comprise the interface between the
82495XP and 82490XP. The 82495XP uses these
pins to control the 82490XP and its buffers. The signals in this interface are not flexible; Chapter 10 addresses the use of these signals. The following are
the 82495XP/82490XP interface signals:
WRARR#
BUS#
WBA[SEC2]
BLAST#
WAY
MCYC#
WBTYP[LR0]
BOFF#
MAWEA#
WBWE#[LR1]
BRDYC2#
SIGNAL DESCRIPTIONS
7.1 BGT#
Bus Guaranteed Transfer
Signals the 82495XP of the memory bus controller's commitment to complete the bus cycle.
Input to 82495XP (pin M4) Cycle Progress Signal
Synchronous
7.1.1 SIGNAL DESCRIPTION
The 82495XP owns all bus cycles (initiated by
CADS#) until the memory bus controller accepts
ownership. During this time cycles may be aborted
due to a snoop. The memory bus controller signals
its acceptance of ownership by driving BGT# active
into the 82495XP. Once BGT# is driven active, the
memory bus controller is responsible for completing
the cycle on the memory bus. CRDY# signals completion of the cycle.
Once BGT# is asserted, other devices may not perform snoops into the 82495XP until the end of the
snooping window, SWEND# activation. The snoop
address is latched if a snoop is requested between
BGT# and SWEND#, but the snoop does not begin
until after SWEND# is asserted. SNPCYC# will not
be asserted until the snoop window ends with
SWEND# asserted. The advantage of asserting
BGT# early is that it allows the 82495XP to start
inquiries to the CPU, load the write-back buffer, and
progress forward in the CPU bus pipeline. The disadvantage is that snooping of this 82495XP is now
blocked until SWEND# is asserted.
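The blocking window described above can be expressed as a small predicate. This is only a sketch: clocks are modeled as integers, the function name is hypothetical, and treating a snoop in the same clock as BGT# as deferred (and one in the same clock as SWEND# as not deferred) is an assumption the text does not settle.

```python
def snoop_is_deferred(snoop_clk, bgt_clk, swend_clk):
    """True if a snoop requested at snoop_clk is latched but held off.

    Models the window between BGT# assertion and SWEND# assertion,
    during which the snoop lookup does not start.
    """
    return bgt_clk <= snoop_clk < swend_clk
```

An MBC designer weighing early BGT# assertion could use a check like this in a bus simulation to count how many snoops land inside the blocked window.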
7.1.2 WHEN SAMPLED
After the 82495XP asserts CADS#, it begins sampling BGT# until it is sampled active.
BGT# is a "Don't Care" after it has been recognized for cycle N and prior to the assertion of
CADS# for cycle N + 1. In addition, BGT# is a
"Don't Care" once a cycle started by CADS# is
aborted by a snoop, until the cycle is restored by the
re-issuing of CADS#.
7.1.3 RELATION TO OTHER SIGNALS
When implementing BGT# in the MBC the following
rules should be used:
1. BGT# must follow every assertion of CADS#,
unless the cycle is aborted due to a snoop.
2. It must precede CRDY# (for line fills and allocations BGT# must precede CRDY# by at least 3
CLKs).
3. In addition BGT# must be asserted with or before the assertion of KWEND# and SWEND#.
4. BGT# must be asserted with or before the assertion of BRDY# by the MBC.
5. BGT# is not required following the assertion of
SNPADS#.
6. BGT# must be asserted with or before MEOC#
is asserted.
7.2 BLE#
BE Latch Enable
Controls latching of i860 XP CPU's byte enable and
cycle attribute signals
Output of 82495XP (pin C16) Cycle Control Signal
Synchronous to CLK
7.2.1 SIGNAL DESCRIPTION
BLE# is used to control the enable line of an external latch (clock edge triggered, '377 type). This latch
is used to capture the i860 XP CPU's byte enables
(BE0#-BE7#) and CPU cycle attribute signals
which do not go through the 82495XP. The 82495XP
manages the opening and closing of this latch: when
BLE# is active, new values from the CPU enter the
latch at each rising edge of CLK.
The 82495XP latches the byte enables after ADS#
of a memory bus bound cycle. It relatches this information with CRDY# or CNA# of that cycle if another cycle is pending.
7.2.2 WHEN DRIVEN
The 82495XP latches the BE latch signals 1 clock
after ADS# of a memory-bound cycle. Thus latched
BE0#-BE7# are valid with CADS#. The 82495XP
opens, then closes this latch if a cycle is pending
and CNA# or CRDY# is asserted. Thus latched
BE0#-BE7# are valid two clocks after CNA# or
CRDY#, which is one clock AFTER CADS# for
back-to-back cycles. The signals latched in the BE
latch are only valid for CPU-generated memory bus
cycles (i.e., not an 82495XP-generated write-back or
allocation).
7.2.3 RELATION TO OTHER SIGNALS
The following CPU signals must be latched in the BE
latch:
BE0#-BE7#
LEN
PWT
CACHE#
PCD
CTYP
PCYC
All other signals in the 82495XP to CPU interface
(listed in sec. 7.0.2) must be connected only between the i860 XP CPU and the 82495XP.
7.3 BRDY#
Burst Ready
Memory Bus Controller Burst Ready input to
82495XP, 82490XP, and i860 XP CPU
Input to 82495XP and 82490XP (82495XP pin P1,
82490XP pin 60) Cycle Progress Signal
Input to i860 XP CPU (BRDY2#, pin U1)
Synchronous to CLK
7.3.1 SIGNAL DESCRIPTION
The BRDY# input to both the 82495XP and
82490XP must be connected to the BRDY# signal
which the MBC is providing to the i860 XP CPU's
BRDY2# pin. The signal is used by the 82495XP for
burst tracking purposes. In the 82490XP, it increments the CPU latch burst counter.
During CPU read cycles, BRDY# allows the next 32-
or 64-bit slice of read data to be available at the
82490XP's CDATA outputs (CPU bus) by advancing
the CPU latch burst counter. At the same time,
BRDY# is latching the previous slice of data into the
i860 XP CPU. Refer to chapter 6 for more details.
During CPU write cycles, BRDY# is used to latch
each slice of write data into the CPU latches and
advance the latch counter.
During CPU special and I/O cycles (which are not
posted) BRDY# is used to end the cycle.
BRDY# must not be asserted until the bus is granted (BGT# asserted) and until the data path is ready
for transferring (CDTS# is asserted).
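The BRDY# conditions above, together with the BGT# ordering rules of section 7.1.3, lend themselves to a simple trace checker for MBC simulation. The sketch below is an assumption-laden illustration: each signal is reduced to the integer clock of its first assertion, the function and key names are hypothetical, and "must follow" in rule 1 is interpreted as a strictly later clock.

```python
def check_bgt_rules(clk, line_fill_or_allocation=False):
    """Return a list of rule violations for one CADS#-initiated cycle.

    clk maps signal names ("CADS", "BGT", "KWEND", "SWEND", "CDTS",
    "BRDY", "MEOC", "CRDY") to the clock of first assertion.
    """
    problems = []
    bgt = clk["BGT"]
    if bgt <= clk["CADS"]:
        problems.append("rule 1: BGT# must follow CADS#")
    min_gap = 3 if line_fill_or_allocation else 1
    if clk["CRDY"] - bgt < min_gap:
        problems.append("rule 2: BGT# too close to CRDY#")
    for name, rule in (("KWEND", "rule 3"), ("SWEND", "rule 3"),
                       ("BRDY", "rule 4"), ("MEOC", "rule 6")):
        if name in clk and clk[name] < bgt:
            problems.append(f"{rule}: {name}# asserted before BGT#")
    # Section 7.3.1: BRDY# only after the data path is ready (CDTS#).
    if "BRDY" in clk and "CDTS" in clk and clk["BRDY"] <= clk["CDTS"]:
        problems.append("BRDY# not after CDTS#")
    return problems
```

An empty list means the trace satisfies the modeled rules; rule 5 (no BGT# after SNPADS#) is not covered because snoop cycles are outside this sketch.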
7.3.2 WHEN SAMPLED
BRDY# is sampled by the CPU, the 82495XP, and
the 82490XP at every CLK edge. It must always
meet proper setup and hold times to CLK. Even
though the CPU latch may not be in use, BRDY#
assertion will still advance the latch counter.
7.3.3 RELATION TO OTHER SIGNALS
BRDY# controls the CPU and 82490XP CPU latches. BRDY# has the following implication rules:
1. The last BRDY# for cycle N must be asserted 2
clocks before MEOC# for cycle N + 1.
2. BRDY# >= BGT#
3. BRDY# > CDTS#
7.4 C490LDRV
82490XP Low Drive Buffer
Selects the 82495XP low capacitance driving buffers
Input to 82495XP (pin M3) Configuration Signal
Synchronous to CLK
7.4.1 SIGNAL DESCRIPTION
C490LDRV selects the driving strength of the
82495XP buffers that interface to the 82490XP. Refer to the layout specifications for information on how
C490LDRV should be connected.
7.4.2 WHEN SAMPLED
C490LDRV is a configuration input sampled as shown in Figure 7-1. C490LDRV requires a setup time of 4 CPU
clocks. After sampling, C490LDRV is a "don't care"
until it is sampled as the BGT# pin after the first
CADS# assertion.
7.4.3 RELATION TO OTHER SIGNALS
C490LDRV shares a pin with BGT#.
7.5 CADS#
Cache Address Strobe
Indicates beginning of cache cycle
Output from 82495XP (pin E3) Cycle Control Signal
Synchronous to CLK
7.5.1 SIGNAL DESCRIPTION
CADS# requests the execution of a memory bus
cycle to the MBC, and indicates that the cycle attributes (i.e., CD/C#, CM/IO#, CW/R#, PALLC#,
etc.) are valid.
If the 82495XP receives a snoop hit to an [M] state
line before BGT# is asserted by the MBC, the current CADS# is aborted and reissued after the snoop
has completed. If the current line (issued by the
stalled CADS#) is invalidated by the snoop, then
that CADS# is cancelled (i.e., will not be reissued
after the snoop is completed).
CADS# is a glitch-free signal.
7.5.2 WHEN DRIVEN
CADS# is asserted by the 82495XP for exactly one
CLK, and is always a valid logic level.
7.5.3 RELATION TO OTHER SIGNALS
CADS#, when asserted, indicates that the cache
cycle control and attribute signals (e.g., CD/C#,
NENE#, CW/R#, etc.) are valid.
Since allocations do not require BRDY#s to the
CPU, the CDTS# of an allocation cycle will always
occur with CADS# of the allocation. In normal cycles the 82495XP will generate CADS# followed by
CDTS#.
CADS# == CDTS# for all write-through cycles.
Once CADS# is active, PALLC#, CWAY, CDTS#,
and BUS# are valid. Address and cycle specification signals (MSET0-MSET10, MTAG0-MTAG11,
MCFA0-MCFA6, CW/R#, CM/IO#, CD/C#,
RDYSRC, MCACHE#, NENE#, SMLN#, KLOCK#,
and CPLOCK#) will be valid with CADS# active as
well.
Every CADS#-initiated cycle requires a BGT# and
CRDY# input from the MBC.
CADS# and SNPADS# will never be asserted on
the same CLK.
7.6 CAHOLD
82495XP AHOLD Output
Self-test result and AHOLD output status
Output of 82495XP (pin G4) Test Signal
Synchronous to CLK
7.6.1 SIGNAL DESCRIPTION
CAHOLD has two functions. One, it indicates the result of the built-in self-tests of the 82495XP. Two, it
represents the 82495XP AHOLD into the i860 XP
CPU.
The 82495XP drives CAHOLD after the 82495XP
self-tests have completed. CAHOLD should be
latched when FSIOUT# goes inactive after reset. If
CAHOLD is high, the self-tests have passed; otherwise they have failed.
When the 82495XP drives AHOLD to the i860 XP
CPU, it also drives CAHOLD, thus providing a means
of tracking inquire cycles and back invalidations for
performance monitoring.
7.6.2 WHEN DRIVEN
CAHOLD is always at a valid logic level. During self-test, CAHOLD is held until the clock edge on which FSIOUT# is sampled inactive. After self-test, or reset,
CAHOLD is asserted whenever the 82495XP asserts AHOLD.
7.6.3 RELATION TO OTHER SIGNALS
CAHOLD reflects the value of AHOLD except during
self-test. During self-test, the value of CAHOLD
should be latched with the falling edge of FSIOUT#
to determine pass/fail.
7.7 CD/C#
Cache Data/Code
Indicates whether current cycle is Code or Data
Output from 82495XP (pin D3) Cycle Control Signal
Synchronous to CLK
7.7.1 SIGNAL DESCRIPTION
CD/C#, along with CW/R# and CM/IO#, is an
82495XP cycle definition signal. It indicates the type
of bus cycle being requested of the MBC. CD/C#
can be pipelined by the memory bus controller (by
using the CNA# input to the 82495XP).
7.7.2 WHEN DRIVEN
CD/C# is valid in the same CLK as CADS# and
remains valid until CRDY# or CNA#. CD/C# is always a valid logic level.
7.7.3 RELATION TO OTHER SIGNALS
Address and cycle specification signals (MSET0-MSET10,
MTAG0-MTAG11, MCFA0-MCFA6,
CW/R#, CM/IO#, CD/C#, RDYSRC, MCACHE#,
NENE#, SMLN#, KLOCK#, and CPLOCK#) will be
valid with CADS#.
7.8 CDATA0-CDATA7
CPU Data Bus Connection
Data Bus Connection from 82490XP to CPU
Input/Output to 82490XP (pins 48, 54, 49, 55, 46,
51, 52, 57)
Isolated Interface
7.8.1 SIGNAL DESCRIPTION
CDATA0-CDATA7 is the 82490XP data bus connection to
the CPU. All or part of these 8 pins will be used in
connecting the 82490XP to the CPU depending on
the cache configuration. See layout information for
details.
7.9 CDTS#
Cache Data Strobe
Indicates availability of CPU data/data bus
Output from 82495XP (pin F4) Cycle Control Signal
Synchronous to CLK
7.9.1 SIGNAL DESCRIPTION
For read cycles, CDTS#, when asserted, indicates
that in the next CPU clock the data bus path is available. This is the earliest time in which BRDY# may
be supplied to the CPU. For CPU initiated write cycles, it indicates that the data is available on the
memory bus. For i860 XP CPU inquire cycles,
CDTS# informs the MBC that the last piece of inquire data is valid on the CPU bus.
Usage of this signal allows complete independence
between address strobes (CADS# and SNPADS#)
and data strobe. CDTS# allows the 82495XP to signal the MBC that a new cycle has begun as soon as
addresses are available. This allows memory bus cycles to start before data is ready to be given/taken.
CDTS# is a glitch-free signal.
7.9.2 WHEN DRIVEN
CDTS# is asserted for one CLK, at the same time or
later than CADS# for any given cycle.
7.9.3 RELATION TO OTHER SIGNALS
When the MBC samples CDTS# asserted, it can
begin providing BRDY#s for the read cycle to the
CPU in the next CLK. CDTS# must always be asserted before CRDY# and must be asserted prior to
the first BRDY#.
The CDTS# of an allocation will always occur with
CADS# of the allocation. In normal cycles the
82495XP will generate CDTS# following CADS#.
CDTS# will be asserted at least one CLK after
SNPADS#.
7.10 CFG0-CFG2
Configuration Pins
Determine Cache Characteristics
Input to 82495XP (pins L4, Q1, M4) Configuration
Signals
Synchronous to CLK
7.10.1 SIGNAL DESCRIPTION
CFG0-CFG2 are the 3 cache configuration inputs
that determine cache characteristics such as line ratio, tag size, and lines per sector. During RESET, this
information is passed on to the 82490XPs. The following table maps CFG0-CFG2 to their respective
configurations for the i860 XP CPU:
Config No.  Line Ratio  Lines/Sector  No. of Tags  CFG2  CFG1  CFG0
1           1           1             8K           0     0     1
2           2           1             4K           1     1     1
3           1           2             8K           0     0     0
4           2           1             8K           0     1     1
5           4           1             4K           1     1     0
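A strap decoder for the configuration table above might look like the following sketch. The pairing of each CFG2/CFG1/CFG0 triple with its configuration row is read from the flattened table in this section and should be checked against the original datasheet page; the function and dictionary names are hypothetical.

```python
# (CFG2, CFG1, CFG0) -> (config no., line ratio, lines/sector, number of tags)
# Assumption: triples are paired with rows in the order they appear in the table.
CFG_TABLE = {
    (0, 0, 1): (1, 1, 1, 8 * 1024),
    (1, 1, 1): (2, 2, 1, 4 * 1024),
    (0, 0, 0): (3, 1, 2, 8 * 1024),
    (0, 1, 1): (4, 2, 1, 8 * 1024),
    (1, 1, 0): (5, 4, 1, 4 * 1024),
}

def decode_cfg(cfg2, cfg1, cfg0):
    """Return the cache configuration selected by the reset-time straps."""
    try:
        return CFG_TABLE[(cfg2, cfg1, cfg0)]
    except KeyError:
        raise ValueError(f"CFG2..0 = {cfg2}{cfg1}{cfg0} is not listed in the table")
```

Rejecting unlisted combinations keeps a board-level configuration script from silently accepting a strap setting the table does not define.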
7.10.2 WHEN SAMPLED
CFG0-CFG2 are sampled as shown in Figure 7-1 with a setup time of at least 10 CPU clocks. After sampling,
CFG0, CFG1, and CFG2 become cycle progress input signals to the 82495XP and are sampled after
CADS# of the first cycle.
7.10.3 RELATION TO OTHER SIGNALS
CFG0 shares a pin with CNA#, CFG1 shares a pin
with SWEND#, and CFG2 shares a pin with
KWEND#.
7.11 CLK
i860 XP CPU, 82495XP, 82490XP Clock
Input to the 82495XP (D11)
7.11.1 SIGNAL DESCRIPTION
The CLK input determines the execution rate and
timing of the 82495XP, 82490XP, and CPU. Pin timings are specified relative to the rising edge of this
signal. The i860 XP CPU, 82495XP, and 82490XP
require TTL levels on CLK for proper operation.
7.12 CM/IO#
Cache Memory/IO
Indicates whether current cycle is Memory or I/O
Output from 82495XP (D4) Cycle Control Signal
Synchronous to CLK
7.12.1 SIGNAL DESCRIPTION
CM/IO#, along with CW/R# and CD/C#, is an
82495XP cycle definition signal. It indicates the type
of bus cycle being requested of the MBC. CM/IO#
can be pipelined by the memory bus controller
(CNA# input to the 82495XP).
7.12.2 WHEN DRIVEN
CM/IO# is valid in the same CLK as CADS#, and
remains valid until CRDY# or CNA#.
7.12.3 RELATION TO OTHER SIGNALS
Address and cycle specification signals (MSET0-MSET10, MTAG0-MTAG11, MCFA0-MCFA6, CW/R#,
CM/IO#, CD/C#, RDYSRC, MCACHE#,
NENE#, SMLN#, KLOCK#, and CPLOCK#) will be
valid with CADS# assertion.
7.13 CNA#[CFG0]
82495XP Next Address Enable
Dynamically pipelines CADS# cycles
Input to 82495XP (pin L4) Cycle Progress Signal
Synchronous to CLK
7.13.1 SIGNAL DESCRIPTION
CNA# is used by the MBC to dynamically pipeline
CADS# cycles. When active it indicates to the
82495XP that the next MBC request can be started.
Only one level of pipelining is allowed in the
82495XP.
CNA# is an optional input for all cycles initiated with
CADS#.
7.13.2 WHEN SAMPLED
CNA# is sampled starting in the first CLK in which
BGT# is sampled active until CRDY# is sampled
active. CNA# is then ignored until the BGT# of the
next cycle.
CNA# is ignored during snoop write-back cycles.
7.13.3 RELATION TO OTHER SIGNALS
Once the 82495XP samples this signal active, it issues the CADS# for the next memory bus cycle as
soon as one begins.
CNA# is recognized between BGT# and CRDY#
or CDTS# and CRDY# of a given cycle.
7.14 CRDY#
Cache Ready
Ends a cycle in the 82495XP/82490XP
Input to 82495XP and 82490XP (pins M2, 43) Cycle
Progress Signal
Synchronous to CLK
7.14.1 SIGNAL DESCRIPTION
CRDY# is used by the 82495XP and 82490XP to
end a memory bus cycle. CRDY# indicates full completion
of the cycle and allows the
82495XP/82490XP to free internal resources for the
next cycle. In the 82490XP, this means that the current memory buffer in use is emptied (put in array,
discarded, etc.). In the 82495XP, CRDY# assertion
allows 82495XP cycle progress signals (BGT#,
KWEND#, SWEND#) to be sampled for the next
cycle if pipelining is used.
CRDY# is required for all 82495XP/82490XP memory bus cycles, including snoop cycles. CRDY#
must be asserted to the 82495XP and 82490XP at
the same time.
7.14.2 WHEN SAMPLED
CRDY# for a given cycle is ignored until KWEND#
is returned for that cycle. If KWEND# is not required
for the cycle, CRDY# is ignored until BGT#. When
CRDY# is ignored, it may violate setup and hold
times.
7.14.3 RELATION TO OTHER SIGNALS
CRDY# must be sampled by the 82495XP and
82490XP at the same time. For the 82495XP,
CRDY# has many cycle implication rules:
1. CRDY# > CDTS#
2. CRDY# > BGT#
3. CRDY# > BGT# + 2 clocks if cycle is a line-fill
or allocation
4. CRDY# > KWEND# if cycle is a line-fill or write-through with potential allocation (PALLC# = 0)
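The four 82495XP rules above amount to a lower bound on when CRDY# may be asserted. A minimal sketch follows, assuming each signal is represented by the integer clock in which it is asserted and that ">" means a strictly later clock; the function and parameter names are hypothetical.

```python
def earliest_crdy(cdts, bgt, kwend=None, line_fill=False,
                  allocation=False, wt_potential_alloc=False):
    """Earliest clock in which CRDY# satisfies rules 1-4 for the 82495XP."""
    earliest = max(cdts + 1, bgt + 1)            # rules 1 and 2
    if line_fill or allocation:
        earliest = max(earliest, bgt + 3)        # rule 3: CRDY# > BGT# + 2
    if line_fill or wt_potential_alloc:
        if kwend is None:
            raise ValueError("KWEND# clock required for this cycle type")
        earliest = max(earliest, kwend + 1)      # rule 4
    return earliest
```

For a line fill with BGT# and CDTS# in clock 1 and KWEND# in clock 2, the function returns 4, set by rule 3.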
For the 82490XP, CRDY# has three basic rules:
1. MEOC# for cycle N must be sampled with or before CRDY# for cycle N.
2. MEOC# for cycle N + 1 must be sampled at least
2 CPU clocks after CRDY# for cycle N.
3. CRDY# for cycle N + 1 must be after the last
BRDY# for cycle N.
MBRDY# fills the current 82490XP memory buffer.
CRDY# empties this buffer and makes it available for
new cycles. CRDY# may be asserted on the same
clock as MEOC#, which may be asserted on the
same clock as MBRDY#.
CRDY# shares a pin with SLFTST#.
7.15 CWAY
Cache Way
Indicates WAY used by the current cycle
Output from 82495XP (pin J3) Cycle Control Signal
Synchronous to CLK
7.15.1 SIGNAL DESCRIPTION
CWAY is a cycle definition signal which indicates to
the MBC the WAY used by the requested cycle. On
line-fills it indicates the way the line will be loaded.
For write-hits (to [S] state or LOCKed) it indicates
the way which was a hit. For write-backs it indicates
the way that was written back.
CWAY is utilized by external tracking machines in
order for the 82495XP tags to be accurately duplicated.
7.15.2 WHEN DRIVEN
CWAY is valid together with CADS# and remains
valid until CRDY# or CNA#.
7.15.3 RELATION TO OTHER SIGNALS
CWAY is valid with CADS#.
7.16 CW/R#
Cache Write/Read
Indicates whether current cycle is write or read
Output from 82495XP (pin E4) Cycle Control Signal
Synchronous to CLK
7.16.1 SIGNAL DESCRIPTION
CW/R#, along with CD/C# and CM/IO#, is an
82495XP cycle definition signal. It indicates the type
of bus cycle being requested of the MBC. CW/R#
can be pipelined by the memory bus controller
(CNA# input to the 82495XP).
7.16.2 WHEN DRIVEN
CW/R# is valid in the same CLK as CADS# and is
valid until CRDY# or CNA#.
7.16.3 RELATION TO OTHER SIGNALS
Address and cycle specification signals (MSET0-MSET10,
MTAG0-MTAG11,
MCFA0-MCFA6,
CW/R#, CM/IO#, CD/C#, RDYSRC, MCACHE#,
NENE#, SMLN#, KLOCK#, and CPLOCK#) will be
valid with CADS#.
7.17 DRCTM#
Memory Bus Direct to [M] State
Signals 82495XP to tag data direct to the [M] state,
skipping the [E] and [S] states.
Input to the 82495XP (pin M1) Cycle Attribute Signal
Synchronous to CLK
7.17.1 SIGNAL DESCRIPTION
DRCTM# is an input to the 82495XP from the memory bus. When sampled active at the end of the
snooping window (SWEND# activation), the
82495XP moves the line fill in progress directly to
the [M] state.
There are three cases in which this is useful.
1. Simplifies External State Tracker
External trackers can only track the [M], [S], and
[I] states. The [E] state cannot be tracked externally since cache write hits internally change [E]
state lines to [M] state. DRCTM# can be used to
eliminate the [E] state from the MESI protocol.
2. Read For Ownership
During a write miss with allocation the write may
go to the memory buffer and not be written to
memory. A read from memory, in conjunction with
the MFRZ# signal asserted, reads the data to fill
around the bytes written by the CPU. The contents of the memory buffer are then entered into
the cache. The cache would normally tag this
data in the [E] state (the cache assumes the
write went to main memory). The system has the
option of never completing the write to memory
(increases performance by completing the allocation quicker). If the write is not performed to
memory, the cache is the only owner of the new
data and therefore the cache entry must be
tagged to the [M] state.
3. Cache to Cache Transfer
A cache to cache transfer may occur as a result
of a snoop. For example, CPU/Cache 1 performs a read from main memory and CPU/Cache
2 flags it as a snoop hit to an [M] state line. To
expedite the transfer, the system may perform
the write-back from CPU/Cache 2 directly to
CPU/Cache 1, bypassing memory. CPU/Cache 1
assumes the write-back went to memory and
would normally tag the line to the [S] state. Since
the system did not perform the write to memory,
the system should drive DRCTM# to force the
line to the [M] state. In addition, the line should
be invalidated in CPU/Cache 2 by driving
SNPINV.
7.17.2 WHEN SAMPLED
DRCTM# is synchronous to CLK. It is only sampled
when SWEND# is active (the end of the snooping
window). When SWEND# is inactive DRCTM# is
ignored and does not have to meet setup and hold
times.
7.17.3 RELATION TO OTHER SIGNALS
DRCTM# (direct to [M]) and MWB/WT# (write policy) combine to define the memory bus attributes and
are sampled on CLK at the end of the snooping window (SWEND# activation).
If MRO# is sampled active during KWEND#,
DRCTM# is ignored.
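The sampling conditions for DRCTM# reduce to a short predicate. The sketch below is an illustration under stated assumptions: the active-low pins are modeled as booleans meaning "asserted", the function name is hypothetical, and the state a line takes when DRCTM# is not honored is left out because it depends on MWB/WT# and other attributes not detailed here.

```python
def line_forced_to_modified(swend_asserted, drctm_asserted, mro_asserted_at_kwend):
    """True if the line fill in progress is tagged directly to [M].

    DRCTM# is only looked at when SWEND# is asserted, and it is ignored
    if MRO# was sampled active during KWEND#.
    """
    if not swend_asserted:
        return False  # DRCTM# is a don't-care outside the SWEND# sample point
    return drctm_asserted and not mro_asserted_at_kwend
```

In the cache-to-cache case, an MBC model would assert DRCTM# with SWEND# for the requesting cache and drive SNPINV to the supplying cache so that only one modified copy remains.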
7.18 FLUSH#
Flush
Causes a 82495XP Cache Flush
Input to 82495XP (N4) Cache Synchronization Signal
Asynchronous input
7.18.1 SIGNAL DESCRIPTION
This signal causes the 82495XP to flush all its modified lines to main memory. The flushing of modified
lines requires the 82495XP to perform back-invalidation and inquire cycles to the CPU. At the end of
flush, the 82495XP tag array will be completely invalidated.
FLUSH# will invalidate the entire 82495XP tag array. It takes two clocks to look up and invalidate a
tag entry. The 82495XP will also invalidate tags in
the CPU cache by running back-invalidation cycles.
If the 82495XP tag state is modified, the 82495XP
will run inquire cycles to the i860 XP CPU to see if
the line is modified in its cache. If so, the i860 XP
CPU will write back the line into the 82495XP write
buffer. All modified 82495XP cache lines must be
written to memory.
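The two-clocks-per-tag figure gives a quick lower bound on flush duration. The arithmetic below is a rough sketch: it counts only the tag sweep and ignores write-back, inquire and back-invalidation cycles, and the 50 MHz clock used in the example is an assumed value, not one taken from this section.

```python
def flush_tag_sweep_clocks(num_tags, clocks_per_tag=2):
    """Minimum CLKs to look up and invalidate every tag entry."""
    return num_tags * clocks_per_tag

def clocks_to_microseconds(clocks, clk_hz):
    """Convert a clock count to microseconds for a given CLK frequency."""
    return clocks / clk_hz * 1e6
```

With 8K tags the sweep alone is 16384 clocks, which at an assumed 50 MHz is about 328 microseconds; every modified line adds memory bus write-back time on top of that.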
7.18.2 WHEN SAMPLED
FLUSH# can be asserted at any time. The 82495XP
will complete all outstanding transactions on the
CPU and memory bus before beginning the
FLUSH# process. The memory bus controller does
not have to prevent FLUSH# during locked cycles
because the 82495XP will complete its locked transaction before the FLUSH# process will begin.
Once a FLUSH# operation has begun, the FLUSH#
signal is ignored until the operation completes. If
RESET is activated while the FLUSH# operation is
in progress, the FLUSH# operation will be aborted
and the RESET immediately executed.
FLUSH# is an asynchronous input. FLUSH# must
have a pulse width of 2 CLKs in order to guarantee
82495XP recognition.
7.18.3 RELATION TO OTHER SIGNALS
To initiate a FLUSH#, the 82495XP will complete all
pending cycles and prohibit the processor from issuing any further ADS#s while the FLUSH# is in
progress. The FSIOUT# output signal is used to indicate the start and end of the FLUSH# operation. It
will become active when the FLUSH# signal is internally recognized (all outstanding cycles have completed) and will de-activate with the CRDY# of the
last FLUSH# write-back.
The memory bus controller supplies BRDY# to the
CPU once FSIOUT# has gone inactive and the
FLUSH is complete. Once FLUSH# has begun, and
FSIOUT# is active, all CADS#s and CRDY#s correspond to write-backs caused by the FLUSH# operation.
The 82495XP can be snooped during FLUSH# cycles and the snooping protocols will be the same as
that for any memory bus cycle.
7.19 FPFLD# [FPFLDEN]
External FIFO PFLD
Indicates PFLD cycle during external PFLD FIFO
mode
Output of the 82495XP (J4) Cycle Control Signal
Sync to CLK
7.19.1 SIGNAL DESCRIPTION
During RESET, this pin functions as the FPFLDEN
configuration signal. The 82495XP can be configured to decode the i860 XP microprocessor's PFLD
cycles. The 82495XP supports 3 operational modes
for PFLD cycle decoding, as defined by FPFLDEN
and NCPFLD#:
Mode          FPFLDEN  NCPFLD#
1             0        1
2             0        0
3             1        1
Illegal Mode  1        0
Mode #1. PFLD cycles are cached in the 82495XP.
Mode #2. PFLD cycles are not cached in the
82495XP, without an external PFLD extension FIFO.
Mode #3. PFLD cycles are not cached in the 82495XP,
with an external PFLD extension FIFO.
If mode 3 has been selected, the 82495XP allows
the PFLD pipeline to be extended with an external
FIFO. After RESET, when this mode has been selected, the FPFLD# output will indicate that the requested cycle is a PFLD cycle. See Section 5.2.5 for
more details.
7.19.2 WHEN DRIVEN
FPFLDEN is sampled on RESET as in Figure 7-1,
with a setup time of 4 CPU clocks. In PFLD mode
#3, the FPFLD# output is valid in the same CLK as
CADS# and remains valid until CRDY# or CNA#.
7.19.3 RELATION TO OTHER SIGNALS
Address and cycle specification signals (MSET0-MSET10,
MTAG0-MTAG11,
MCFA0-MCFA6,
CW/R#, CM/IO#, CD/C#, RDYSRC, MCACHE#,
NENE#, SMLN#, KLOCK#, and CPLOCK#) will be
valid with CADS#.
7.20 FSIOUT#
Flush, Sync, Initialization Output
Indicates the start and end of the Flush,
Sync, and Initialization operations.
Output of the 82495XP (D1) Cache Synchronization
Signal
Sync to CLK
7.20.1 SIGNAL DESCRIPTION
This signal indicates the start and the end of either a
Flush, Sync, or Initialization (including self-test if requested) operation. These operations are mutually
exclusive. This signal is activated when the 82495XP
begins the operation and goes inactive upon completion of the operation.
7.20.2 WHEN DRIVEN
This signal will be asserted whenever a Flush, Sync,
or Initialization operation is internally recognized by
the 82495XP and is in progress.
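Because only one of these operations runs at a time, the 82495XP has to arbitrate between their triggers; section 7.20.3 ranks them Initialization (RESET) highest, then Flush (FLUSH#), then Sync (SYNC#). A minimal sketch of that arbitration follows, with hypothetical names and with triggers modeled as simple strings rather than asynchronous inputs.

```python
PRIORITY = {"RESET": 3, "FLUSH#": 2, "SYNC#": 1}  # higher number wins

def next_operation(running, trigger):
    """Return the operation in progress after a trigger arrives.

    running: currently executing trigger name, or None if idle.
    A higher-priority trigger aborts the running operation; a trigger of
    lower or equal priority is ignored until the operation completes.
    """
    if running is None:
        return trigger
    if PRIORITY[trigger] > PRIORITY[running]:
        return trigger
    return running
```

In both the abort and the ignore case FSIOUT# would stay active, since some operation is still in progress.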
2-304
inteL
82495XP Cache Controller/82490XP Cache RAM
[F>!ru~[1,O[M]OOO~!ruW
7.20.3 RELATION TO OTHER SIGNALS
7.21.3 RELATION TO OTHER SIGNALS
FSIOUT # active indicates that either Flush, Sync, or
Initialization operation is in progress. Only one of
these operations can be run within the 82495XP at a
time.
HIGHZ# shares a pin with MBAlE. 82495XP outputs are tristated if both HIGHZ# and SlFTST# are
sampled active during reset.
The table below shows the priorities of these three
operations:
7.22 KLOCK#
Operation
Trigger
Initialization
RESET
Flush
FLUSH#
Sync
SYNC#
Priority
Highest
82495XP lOCK #
Request to MBC of lOCKed cycle
Output from 82495XP (pin C3) Cycle Control Signal
Synchronous to ClK
7.22.1 SIGNAL DESCRIPTION
lowest
If a trigger of higher priority occurs while a lower
priority operation is running, the lower priority operation is aborted and the higher priority one executed.
If a trigger of lower priority occurs when a higher
priority one is running, the lower priority trigger is
ignored. Once a FLUSH# or SYNC# operation has
begun, its trigger is ignored until the operation completes.
When a higher priority operation aborts a lower priority one, FSIOUT# remains active.
Since RESET, FLUSH# and SYNC# are all asynchronous, FSIOUT# will be activated when the
82495XP is actually internally executing the operation.
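The priority rules for Flush, Sync, and Initialization can be sketched as a small arbitration function. This is only an illustrative model, not the 82495XP's internal logic: the names Op, arbitrate, and fsiout_active are hypothetical, and the handling of a RESET that arrives during Initialization is an assumption, since the text only covers repeated FLUSH# and SYNC# triggers.

```python
from enum import IntEnum
from typing import Optional


class Op(IntEnum):
    """The three operations; a larger value means a higher priority."""
    SYNC = 1            # triggered by SYNC#
    FLUSH = 2           # triggered by FLUSH#
    INITIALIZATION = 3  # triggered by RESET


def arbitrate(running: Optional[Op], trigger: Op) -> Optional[Op]:
    """Return the operation that is running after `trigger` is recognized."""
    if running is None:
        return trigger   # nothing in progress: start the requested operation
    if trigger > running:
        return trigger   # the lower priority operation is aborted
    # A lower priority trigger is ignored. A repeated trigger of the same
    # operation is also ignored (assumed for RESET as well).
    return running


def fsiout_active(running: Optional[Op]) -> bool:
    """FSIOUT# is active for as long as any of the three operations runs."""
    return running is not None
```

For example, arbitrate(Op.SYNC, Op.FLUSH) returns Op.FLUSH (the Sync is aborted), and fsiout_active stays true across the switch, matching the statement that FSIOUT# remains active when a lower priority operation is aborted.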
7.21 HIGHZ#
High Impedance Outputs
Causes 82495XP outputs to be tristated
Input to 82495XP (pin P4) Test Signal
Synchronous to CLK
KLOCK# indicates to the MBC that there is a request to execute a locked cycle. This signal follows
the CPU lock request.
KLOCK# is simply a one-clock flow-through version
of the CPU LOCK# signal. The 82495XP will activate KLOCK# with CADS# of the first cycle of a
LOCKed operation and it will remain active until the
CADS# of the last cycle of the LOCKed operation.
Note that if the memory bus is pipelined, there may
be a situation in which KLOCK# deactivation is in
the same CLK as its new activation (together with
CADS#). In this case KLOCK# won't go inactive
between back-to-back locked sequences. KLOCK#
will never go inactive if the CPU LOCK# does not go
inactive. The 82495XP will not open arbitration windows between back-to-back locked sequences; it is
the memory bus controller's responsibility to implement this functionality by detecting a LOCKed write
followed by a LOCKed read.
KLOCK# activation is not qualified by the tag array
look-up (hit/miss indications); therefore, KLOCK#
can be active before CADS# is asserted.
7.22.2 WHEN DRIVEN
7.21.1 SIGNAL DESCRIPTION
The 82495XP will enter self-test if SLFTST# is
active and HIGHZ# is inactive during reset. If
SLFTST# is sampled active and HIGHZ# is sampled active during reset, the 82495XP floats all its
outputs until the 82495XP is reset again. Activation
of HIGHZ# without SLFTST# does nothing.
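The reset-time decoding of SLFTST# and HIGHZ# described above amounts to a three-way decision. A minimal sketch follows; the function name and the returned mode strings are hypothetical labels chosen for illustration, and the booleans represent "sampled active during reset".

```python
def reset_mode(slftst_active: bool, highz_active: bool) -> str:
    """Decode the SLFTST#/HIGHZ# combination sampled during reset."""
    if slftst_active and highz_active:
        # All outputs are floated until the 82495XP is reset again.
        return "outputs_floated"
    if slftst_active:
        # SLFTST# active with HIGHZ# inactive selects self-test.
        return "self_test"
    # HIGHZ# without SLFTST# does nothing, so both remaining cases are normal.
    return "normal"
```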
KLOCK# assertion is a flow-through of 1 CLK from
the CPU LOCK# after the 82495XP completes all
pending cycles. KLOCK# deassertion is a flow-through of 1 CLK from the CPU LOCK# signal, and
must be at least 1 CLK after the last CADS# of a
LOCKed sequence. KLOCK# is always driven to a
valid logic level.
7.22.3 RELATION TO OTHER SIGNALS
7.21.2 WHEN SAMPLED
HIGHZ# is sampled as shown in Figure 7-1 with a setup time
of 10 CPU clocks. HIGHZ# is then a don't care until
the 82495XP reset sequence is complete (with FSIOUT# going inactive), at which point the pin becomes the MBALE
pin.
Address and cycle specification signals (MSET0-MSET10, MTAG0-MTAG11, MCFA0-MCFA6, CW/R#,
CM/IO#, CD/C#, RDYSRC, MCACHE#,
NENE#, SMLN#, KLOCK#, and CPLOCK#) will be
valid with CADS#.
7.23 KWEND#
7.24 MALE
Cacheability Window End
Closes 82495XP Cacheability Window
Input to 82495XP (pin M4) Cycle Progress Signal
Synchronous to CLK
Memory Address Latch Enable
Tristates/Enables Memory Address Outputs
Input to 82495XP (pin 02) Cycle Control Signal
Asynchronous
7.23.1 SIGNAL DESCRIPTION
7.24.1 SIGNAL DESCRIPTION
KWEND# is a cycle progress input to the 82495XP
that, when active, closes the cacheability window
and causes the cacheability attributes MKEN# and
MRO# to be sampled.
The 82495XP contains an address latch which controls the last stage of the 82495XP address output. It
is controlled by four signals: MAOE#, MBAOE#,
MALE, and MBALE. The signals MALE and MBALE
control the latching of the entire 82495XP address
where MBALE controls the subline portion and
MALE controls the rest.
KWEND# is sampled by the 82495XP after BGT#
has been sampled active. KWEND# should be asserted by the MBC once the memory address has
been decoded and cacheability (MKEN#) and read-only (MRO#) attributes have been determined.
The sampling of KWEND# active allows SWEND#
to be sampled. Resolving KWEND# quickly allows
the non-cacheable window between BGT# and
SWEND# to be closed more quickly. KWEND# activation also allows the 82495XP to start allocations
and begin replacements.
7.23.2 WHEN SAMPLED
KWEND# is sampled by the 82495XP on, or after, the clock
in which BGT# has been sampled active. Once
KWEND# is sampled active it is not sampled again
until BGT# of the next cycle. KWEND# need not
follow setup and hold times if it is not being sampled.
BGT#, KWEND# and SWEND# may be asserted
on the same clock edge.
KWEND# need only be activated for those cycles
which require the sampling of MKEN# and MRO#.
These are line-fills and write cycles with potential
allocation.
7.23.3 RELATION TO OTHER SIGNALS
KWEND# is sampled on or after BGT# and allows
the sampling of SWEND#. KWEND# activation
causes the sampling of MKEN# and MRO#.
According to cycle progress implication rules,
CRDY# must be at least one clock after KWEND#
for line fills and write-through cycles with potential
allocate.
MALE is provided so that the memory bus controller
can control when the next pipelined address is driven. With MALE high, the 82495XP address latch is in
"flow-through" mode and the 82495XP address is
available at the memory bus. Changes in the
82495XP address are seen immediately at the memory bus. When MALE is driven low the address at
the latch input is latched. Any subsequent address
driven by the 82495XP will not be seen at the memory bus outputs until MALE is driven high again.
MALE will latch 82495XP addresses regardless of
the state of MAOE#. If MAOE# is inactive, MALE
will still operate the latch properly, but the memory
bus will be tristated.
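The interaction between MALE (latch enable) and MAOE# (output enable) can be illustrated with a simple level-sensitive model. This is a behavioral sketch only: the class AddressLatch and its update method are hypothetical, propagation and float delays are ignored, and None stands in for a tristated bus.

```python
from typing import Optional


class AddressLatch:
    """Transparent latch followed by an output enable, as described above."""

    def __init__(self) -> None:
        self._held: Optional[int] = None  # value currently at the latch output

    def update(self, address_in: int, male: bool,
               maoe_active: bool) -> Optional[int]:
        """Apply one set of input levels and return the memory bus value."""
        if male:
            # MALE high: flow-through, the latch output follows its input.
            self._held = address_in
        # MALE low: the last value seen while MALE was high is retained.
        # The latch operates regardless of MAOE#; MAOE# only gates the pins.
        return self._held if maoe_active else None
```

With MALE low, a new address presented by the 82495XP does not reach the bus: after update(0x100, True, True), a call to update(0x200, False, True) still returns 0x100.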
7.24.2 WHEN SAMPLED
MALE is asynchronous and can be asserted and
deasserted at any time. MALE should always be
driven to a valid state since it directly controls the
operation of the address latch.
7.24.3 RELATION TO OTHER SIGNALS
MALE together with MBALE control the latching of
the entire 82495XP output address. The other latch
control signals, MAOE# and MBAOE#, provide the
memory bus controller complete command over the
address outputs. MAOE# and MBAOE# do not affect the operation of MALE or MBALE.
MALE shares a pin with the WWOR# configuration
pin.
KWEND# shares a pin with CFG2.
7.25 MAOE#
MALE and MAOE# together provide full control
over the 82495XP address output latch.
Memory Address Output Enable
Tristates/Enables Memory Address Outputs
Input to 82495XP (pin S4) Cycle Control Signal
Asynchronous except during snoop cycles
7.26 MBALE
7.25.1 SIGNAL DESCRIPTION
The 82495XP has an address latch which is controlled by a latch input, MALE, and an output enable
input, MAOE#. MAOE# has two main functions.
One, driving MAOE# active will enable the 82495XP
to drive its address lines MTAG0-11, MSET0-10,
and MCFA0-6. Two, MAOE# is a qualifier for snoop
cycles and must be inactive for the 82495XP to
snoop.
In general, MAOE# should be active if its 82495XP
is the current bus master. When that 82495XP gives
up the bus, MAOE# should be inactive to float the
address lines and allow another master to snoop.
MAOE# controls the output of the 82495XP address except the subline (burst) portion. This portion
has a separate output control: MBAOE#.
7.25.2 WHEN SAMPLED
MAOE# is an asynchronous input (except during
snoop cycles) and always has full control over the
address output. For this reason, MAOE# must always be driven to a valid state.
The 82495XP does, however, sample MAOE# during snoop cycles. When sampled, MAOE# must
meet proper setup and hold times. In synchronous
snoop mode MAOE# is sampled on a CLK edge. In
clocked mode MAOE# is sampled on a SNPCLK
edge. In strobed mode MAOE# is sampled with the
falling edge of SNPSTB#. If MAOE# is sampled active, the snoop will be ignored. This allows
SNPSTB# to share a common line for multiple
82495XPs.
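The way MAOE# qualifies a snoop can be sketched as follows. The dictionary simply restates the sampling reference for each snoop mode from the paragraph above; the names SNOOP_SAMPLE_REFERENCE and snoop_recognized are hypothetical and the model ignores setup and hold timing.

```python
# Sampling reference for MAOE# in each snoop mode, as stated in the text.
SNOOP_SAMPLE_REFERENCE = {
    "synchronous": "CLK edge",
    "clocked": "SNPCLK edge",
    "strobed": "SNPSTB# falling edge",
}


def snoop_recognized(mode: str, maoe_active: bool) -> bool:
    """Return True if a snoop request is accepted by this 82495XP.

    `maoe_active` is the level of MAOE# as sampled at the reference edge
    for `mode`. A controller whose MAOE# is active is driving the address
    itself (it is the bus master), so it ignores the snoop.
    """
    if mode not in SNOOP_SAMPLE_REFERENCE:
        raise ValueError(f"unknown snoop mode: {mode!r}")
    return not maoe_active
```

This is why several 82495XPs can share one SNPSTB# line: only those whose MAOE# is sampled inactive respond to the strobe.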
Memory Burst Address Latch Enable
Tristates/Enables Memory Burst Address Outputs
Input to 82495XP (pin P4) Cycle Control Signal
Asynchronous
7.26.1 SIGNAL DESCRIPTION
The 82495XP address latch is controlled by four signals: MAOE#, MBAOE#, MALE, and MBALE. The
signals MALE and MBALE control the latching of the
entire 82495XP address where MBALE controls the
subline portion and MALE controls the rest.
MALE and MBALE are provided so that the memory
bus controller has complete flexibility when the next
address is driven. With MBALE high, the subline portion of the 82495XP address latch is in "flow-through" mode and the 82495XP subline address is
available at the memory bus. Changes in the
82495XP subline address are seen immediately at
the memory bus. When MBALE is driven low the
subline address at the latch input is latched. Any
subsequent subline address driven by the 82495XP
will not be seen at the memory bus outputs until
MBALE is driven high again.
MBALE will latch 82495XP addresses regardless of
the state of MAOE# or MBAOE#. If MBAOE# is
inactive, MBALE will still operate the latch properly,
but the subline portion of the memory bus will be
tristated.
Separate line and subline address latch controls are
provided so that the latch outputs may be driven at
different times. The table below indicates the subline
address bits for each line size.
MAOE# need not meet any setup or hold time if it is
not being sampled during a snoop cycle.
Line Size (Bytes)   Subline Address
32                  A3, A4
64                  A4, A5
128                 A5, A6
7.25.3 RELATION TO OTHER SIGNALS
MAOE# together with MBAOE# control the entire
82495XP address. Both signals are asynchronous
and thus need never be synchronized to any clock.
Both signals are, however, sampled during snoop
cycles and require proper setup and hold times in
these situations.
7.26.2 WHEN SAMPLED
MBALE is asynchronous and can be asserted and
deasserted at any time. MBALE should always be
driven to a valid state since it directly controls the
operation of the address latch.
7.26.3 RELATION TO OTHER SIGNALS
MALE together with MBALE control the latching of
the entire 82495XP output address. The other latch
control signals, MAOE# and MBAOE#, provide the
memory bus controller complete command over the
address outputs. MAOE# and MBAOE# do not affect
the operation of MALE or MBALE.
MBALE shares a pin with the HIGHZ# configuration
pin.
7.27 MBAOE#
Memory Burst Address Output Enable
Tristates/Enables Memory Subline Address Outputs
Input to 82495XP (pin P6) Cycle Control Signal
Asynchronous except during snoop cycles
must meet proper setup and hold times to CLK's
rising edge. In clocked mode, MBAOE# must meet
setup and hold times to SNPCLK's rising edge. In
strobed mode, MBAOE# must meet setup and hold
times to SNPSTB#'s falling edge.
If MBAOE# is not being sampled for a snoop, i.e.
SNPSTB# is not asserted, MBAOE# need not meet
any setup or hold time.
7.27.3 RELATION TO OTHER SIGNALS
MAOE# and MBAOE# control the entire 82495XP
address output asynchronously. This address latch
is completely controlled by MALE, MBALE, MAOE#,
and MBAOE#.
MBAOE# is only sampled by the 82495XP during
snoop cycles with SNPSTB#.
7.27.1 SIGNAL DESCRIPTION
7.28 MBRDY#
The 82495XP address latch is controlled by four signals:
MAOE#, MBAOE#, MALE, and MBALE.
MAOE# and MBAOE# are the output enables of
this latch for the entire 82495XP address. Specifically, MBAOE# controls the subline address portion
and MAOE# controls the rest.
Memory Burst Ready
Burst Ready input to 82490XP memory buffers
Input to 82490XP (pin 22) Cycle Progress Signal
Synchronous to MCLK
MBAOE# has two functions. One, it can tristate the
subline portion of the address separately from the
rest of the address. Since the 82495XP does not
sequence through burst addresses, the memory system may wish to provide the burst count. This requires that the 82495XP address burst portion be
tristated after the first transfer. The Subline Address
table appears in Section 7.26, MBALE.
7.28.1 SIGNAL DESCRIPTION
Two, MBAOE# is sampled during snoop cycles. If
MBAOE# is sampled inactive, the snoop write back
cycle, if any, will begin at the subline address provided. If MBAOE# is sampled active, the snoop write
back will begin at subline address 0. This allows
snoop write backs to begin at the snooped subline
address and progress through the normal burst order.
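The second function of MBAOE# reduces to a choice of starting subline for the snoop write back. A minimal sketch, with a hypothetical function name and with the subline address represented as a small integer index:

```python
def snoop_writeback_start(snooped_subline: int, mbaoe_active: bool) -> int:
    """Return the subline address at which a snoop write back begins.

    `mbaoe_active` is the level of MBAOE# as sampled during the snoop.
    """
    if mbaoe_active:
        return 0              # MBAOE# sampled active: start at subline 0
    return snooped_subline    # MBAOE# sampled inactive: start where snooped
```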
7.27.2 WHEN SAMPLED
Like MAOE#, MBAOE# is asynchronous except
during snoop cycles and can be asserted or deasserted at any time. Since MBAOE# has direct control over the address latch, it must always be driven
to a valid state.
MBAOE# is, however, sampled during snoop cycles. In synchronous snooping mode, MBAOE#
When in clocked memory bus mode, MBRDY# (with
MSEL# active) is used to advance the memory
burst counter for the 82490XP buffer in use. This
causes either new data to be latched from the memory bus (read cycle), or new data to be driven from
the 82490XP buffer (write cycle). MBRDY# is sampled on all MCLK edges in which MSEL# is sampled
active and has no relation to CLK. In strobed mode,
MBRDY# must be tied high as MISTB/MOSTB
strobes data in/out of the 82490XP.
For write cycles, the first piece of write data is available at the MDATA pins. MBRDY# assertion with
MSEL# active causes the next 32, 64, or 128-bit
slice of write data to be available. If only one slice is
required, MSEL# and MBRDY# need never go active.
For read cycles, the first piece of read data flows
through to the CPU. MBRDY# assertion with
MSEL# active causes the next slice of memory data
to be latched in the 82490XP buffer. BRDY# assertion will allow this data to be available on the CPU
bus and latch it into the CPU. For cacheable cycles,
MBRDY# needs to be asserted 4 or 8 times depending on the cache configuration.
7.28.2 WHEN SAMPLED
7.29.3 RELATION TO OTHER SIGNALS
MBRDY# is sampled on all MCLK edges where
MSEL# is sampled active. In this way MSEL# qualifies the MBRDY# input. If MSEL# is sampled inactive, MBRDY# need not follow setup and hold times
to MCLK.
Address and cycle specification signals (MSET0-MSET10,
MTAG0-MTAG11,
MCFA0-MCFA6,
CW/R#, CM/IO#, CD/C#, RDYSRC, MCACHE#,
NENE#, SMLN#, KLOCK#, and CPLOCK#) will be
valid with CADS#.
7.28.3 RELATION TO OTHER SIGNALS
7.30 MCFA0-MCFA6
MSET0-MSET10
MTAG0-MTAG11
MBRDY# is qualified by the MSEL# input.
MBRDY# advances the memory burst counter for
the 82490XP in use which either inputs or outputs
data through MDATA.
MEOC# switches the 82490XP buffers to the next
pending cycle, so the last MBRDY# must come before or on the clock of MEOC# assertion.
7.29 MCACHE#
82495XP Internal Cacheability
Indicates cycle cacheability attribute
Output from 82495XP (pin C2) Cycle Control Signal
Synchronous to CLK
7.29.1 SIGNAL DESCRIPTION
MCACHE# is driven by the 82495XP and indicates
that the current cycle may be cached. Data cacheability is determined later in the cycle by MKEN# assertion. MCACHE# is asserted for allocation, replacement write-back cycles, and during cacheable
read-miss cycles (i.e. read-miss cycles in which PCD
is not asserted). It is not asserted for I/O, special, or
locked cycles.
Cycle Type        MCACHE#
Posted Writes     1
Write Backs       0
Read, PCD = 0     0
Read, PCD = 1     1
Allocation        0
I/O Cycles        1
Locked Cycles     1
MCFA0-MCFA6 Memory Configuration Address I/O
MSET0-MSET10 Memory Set Address I/O
MTAG0-MTAG11 Memory Tag Address I/O
82495XP Memory Address Inputs/Outputs
Input/Output of 82495XP (pins N14, P7-P15, 06016, R4, R14-R17, S14-S17) Cycle Control Signals
Input Synchronous to CLK, SNPCLK, or SNPSTB#.
Output from CLK, MAOE# active or MALE high.
7.30.1 SIGNAL DESCRIPTION
MSET0-10, MTAG0-11, and MCFA0-6 provide the
complete 30 bit address input/output interface of
the 82495XP to the memory bus. Together they
span the entire CPU address range A2-A31. Depending on the cache configuration, each pin represents a different CPU address line (see configuration section for details).
MSET0-10, MTAG0-11, and MCFA0-6 pass
through an 82495XP output latch. The latching of this
latch is controlled by MALE/MBALE, and the output
of this latch is controlled by MAOE#/MBAOE#.
With MAOE#/MBAOE# active, MSET/MTAG/
MCFA are 82495XP outputs. They are valid at the
start of a memory bus cycle at the input of the
82495XP address latch. If MALE/MBALE is high
(flow-through) and MAOE#/MBAOE# is active
(outputs enabled), they are driven to the memory
bus with CADS#.
1
7.29.2 WHEN DRIVEN
MCACHE# is valid in the same CLK as CADS# and
remains valid until CRDY# or CNA#.
If a new cycle starts and MALE/MBALE is low, the
previous address remains valid at the 82495XP
MSET/MTAG/MCFA outputs. Once MALE/MBALE
goes high, the new address flows through with the
appropriate propagation delay (MSET/MTAG/
MCFA address valid delay from MALE/MBALE going high). The new address will be driven to the
82495XP MSET/MTAG/MCFA outputs if MAOE#/
MBAOE# is active.
If a new cycle starts, MALE/MBALE is high, and
MAOE#/MBAOE# is inactive, the 82495XP MSET/
MTAG/MCFA outputs will remain tristated. Once
MAOE#/MBAOE# is asserted, the new address
flows through with the appropriate propagation delay
(MSET/MTAG/MCFA address valid from MAOE#/
MBAOE# going active).
MSET0-10, MTAG0-11, and MCFA0-6 are used
as inputs during snoop cycles. They are sampled
with SNPSTB# like any other snoop attribute signal.
7.31
MCLK
Memory Bus Clock
Input to the 82490XP (Pin 26)
MSET0-10, MTAG0-11, and MCFA0-6 are used
as inputs to the 82495XP during snoop cycles. Here,
MAOE#/MBAOE# is inactive. MSET/MTAG/
MCFA are sampled by the 82495XP during snoop
initiation just like the other snoop attributes.
7.31.1 SIGNAL DESCRIPTION
7.30.2 WHEN SAMPLED
If MALE/MBALE is high and MAOE#/MBAOE# is
low, MSET0-10, MTAG0-11, and MCFA0-6 are
valid with CADS# with a timing reference to CLK.
Otherwise, they are asserted with a delay from
MALE/MBALE high or MAOE#/MBAOE# active.
MSET0-10, MTAG0-11, and MCFA0-6 change
once CNA# or CRDY# is sampled active. MSET0-10, MTAG0-11, and MCFA0-6 have a float delay
from MAOE#/MBAOE# going inactive. These outputs are undefined after CRDY#/CNA# assertion
and before the next CADS# assertion.
As inputs during snoop cycles (SNPSTB# asserted),
they must be sampled like other snoop attributes
with proper setup and hold times. In synchronous
snoop mode this is with respect to CLK; in clocked
mode, this is with respect to SNPCLK; and in
strobed mode this is with respect to the SNPSTB# falling edge.
If MAOE# is inactive and SNPSTB# is not asserted
(no snoop), MSET0-10, MTAG0-11, and MCFA0-6 need not meet any setup or hold time.
In a clocked memory bus mode, this pin provides the
memory bus clock. Memory bus signals and memory
bus data are sampled on the rising edge of MCLK.
Memory bus write data is driven off MCLK or
MOCLK depending upon the configuration. MCLK
has no relation to CLK.
7.31.3 RELATION TO OTHER SIGNALS
MCLK shares a pin with MISTB.
In clocked memory bus mode, the MDATA7-MDATA0, MSEL#, MFRZ#, MBRDY#, MZBT#,
and MEOC# pins are sampled synchronously with
the rising edge of MCLK. In a clocked memory bus
write, MDATA7-MDATA0 are driven synchronous
with MCLK or MOCLK.
MOCLK is a delayed version of MCLK. If a clocked
memory bus configuration is chosen, and the
MOCLK rising edge is detected by the 82490XP after RESET, data will be driven off of MOCLK rather
than MCLK. Only data is affected by MOCLK.
MOCLK is used to allow the system designer to increase the minimum output time of MDATA relative
to MCLK.
7.32 MDATA0-MDATA7
7.30.3 RELATION TO OTHER SIGNALS
MSET0-10, MTAG0-11, and MCFA0-6 are asserted with CADS# so they are valid when CADS# is
sampled active. This is true as long as MALE/MBALE is high and MAOE#/MBAOE# is active. If
MSET0-10, MTAG0-11, and MCFA0-6 have been
asserted but are blocked by MALE/MBALE or
MAOE#/MBAOE#, they are asserted from MALE/
MBALE going high or MAOE#/MBAOE# going active.
MSET0-10, MTAG0-11, and MCFA0-6 are deasserted or changed with CADS# or CNA# active.
They may also be floated with MAOE# going inactive.
Memory Bus Data Pins
82490XP Connection to the Memory Bus
Input/Output of 82490XP (pins 18,14, 10,6, 16, 12,
8,4)
Synchronous to CLK or MCLK or MOCLK or MISTB
or MOSTB.
7.32.1 SIGNAL DESCRIPTION
MDATA0-7 is the 82490XP data bus connection to
the memory bus. All or part of these pins will be used
depending on the cache configuration. These pins
are directly controlled by the MDOE# input. With
MDOE# inactive, these pins are tristated and may
be used as inputs.
For write cycles, the 82495XP asserts CDTS# to
indicate that data will be available at the MDATA
pins or in its buffer. Data is output with respect to
CLK, MCLK, MOCLK, or MEOC# and is strobed
with MBRDY#. In strobed memory bus mode, data
is output using MOSTB.
For read cycles, CDTS# indicates that the CPU data
path will be available for read data in the next clock.
BRDY# reads data into the CPU from the 82490XP.
Data is read into the 82490XPs through MDATA using MBRDY# or MISTB.
MDOE# must be inactive for MDATA to read data.
CDTS# assertion by the 82495XP indicates that the
read path is available in the next clock. Data must be
read into MDATA with respect to MCLK or MISTB
and must follow proper setup and hold times if
MBRDY# is active or MISTB is changing.
The memory bus controller must account for the
large setup time required to read data into the CPU.
If properly done, data can be read into MDATA by
asserting MBRDY# and in the next full CPU clock
read into the CPU using BRDY#.
7.33 MDOE#
Memory Data Output Enable
Tristates/Enables Memory Data Outputs
7.32.2 WHEN DRIVEN
Input to 82490XP (pin 20) Cycle Control Signal
When the CPU or 82495XP initiates a write cycle,
the write data is written to the appropriate 82490XP
buffer and CDTS# is asserted. If MDOE# is active,
that first piece of write data will be available at the
MDATA pins with some delay from the CPU CLK
edge that CDTS# is asserted. Subsequent pieces of
write data are output with some delay from MCLK or
MOCLK (mode dependent) from the edge that
MBRDY# is sampled active. In strobed mode, subsequent data is output with MOSTB assertion.
MDATA has no value before CDTS# assertion, after
MEOC# with no pending cycle, or with MDOE# inactive.
For read cycles, the 82495XP asserts CDTS# the
clock before the MDATA path is available for read
data. MDOE# must be inactive for the 82490XP to
read data. Read data is strobed into the 82490XP by
asserting MBRDY# on MCLK edges. MEOC# will
latch the last piece of data as it switches buffers. In
strobed mode, data is read by MISTB. Data that is
read into MDATA must meet proper setup and hold
times.
Data at the MDATA inputs need not follow setup and
hold times to MCLK edges that sample MBRDY#
inactive.
Asynchronous
7.33.1 SIGNAL DESCRIPTION
MDOE# is an input to the 82490XP that, when asserted, causes the 82490XP to drive its MDATA0-MDATA7 outputs. When MDOE# is inactive, these
lines are floated and may be used as inputs to the
82490XP. MDOE# is not sampled by any clock and
is a direct connection to the 82490XP memory output
driver.
7.33.2 WHEN SAMPLED
Since MDOE# is a direct connection to the
82490XP memory output drivers, MDOE# must always be driven to a valid level. When MDOE# goes active, data in the 82490XPs is driven to the MDATA
outputs with some propagation delay from MDOE#
going active. Similarly, there is some float delay from
MDOE# going inactive.
MDOE# must be inactive for the 82490XP to read
memory data.
7.33.3 RELATION TO OTHER SIGNALS
7.32.3 RELATION TO OTHER SIGNALS
CDTS# indicates that write data is in the 82490XP
buffers. If MDOE# is active, write data is available at
MDATA some time after CDTS# or MEOC# is sampled active. Subsequent write data is available at
MDATA after MBRDY# assertion or MOSTB changing.
MDOE# has no relation to MCLK, MOCLK, or
MOSTB. Since MDOE# controls the final stage of
the MDATA output buffers, it has no effect on any
other signal of the 82490XP.
7.34 MEMLDRV
Memory Low Capacitance Drivers
Selects the Low Capacitance Drivers for the
82495XP and the 82490XP
Inputs to 82495XP and 82490XP (pins Q4, 24) Configuration Signal
Synchronous to CLK
7.34.1 SIGNAL DESCRIPTION
MEMLDRV is a pin on both the 82495XP and
82490XP that, when high during reset, selects the normal
memory output buffers. If this pin is driven
low at reset, the high capacitance drivers are selected. Specifically, these are the 82495XP address outputs to the memory bus, and the 82490XP MDATA
outputs. The normal output drivers are designed to
drive up to 50 pF loads. The high capacitance drivers can drive up to 100 pF without derating.
For read or write cycles, MEOC# may be activated
on or after the clock edge of the last MBRDY# of
the current cycle. If a cycle is pending (pipelining is
used), the next cycle will flow-through with a propagation delay from MEOC# assertion. MEOC# is required for all memory bus cycles.
In addition to switching memory buffers, MEOC#
does three other things. One, MEOC# activation
causes the memory burst counter to be reset to its
start value and if MSEL# is active, MZBT# is sampled. This allows MSEL# to stay active between cycles. Two, MEOC# activation during a write cycle
causes MFRZ# to be sampled for a subsequent
allocation (line-fill). Three, MEOC# latches in the
last slice of data (like MBRDY#) before switching
buffers.
7.34.2 WHEN SAMPLED
MEMLDRV is sampled as shown in Figure 7-1 with a setup
time of 4 CPU clocks for the 82495XP and 1 CPU
clock for the 82490XP. On the 82495XP, MEMLDRV
becomes the SYNC# input once FSIOUT# goes
inactive. On the 82490XP, MEMLDRV becomes the
MFRZ# signal which is sampled after the first memory cycle begins.
7.35.2 WHEN SAMPLED
In clocked memory bus mode, MEOC# is sampled
on every MCLK edge. It must always observe setup
and hold times to MCLK. In strobed memory bus
mode, MEOC# is always sampled and must meet
proper active/inactive times.
7.34.3 RELATION TO OTHER SIGNALS
MEMLDRV shares a pin with SYNC# on the
82495XP and MFRZ# on the 82490XP.
7.35.3 RELATION TO OTHER SIGNALS
MEOC# is provided so that a cycle may end on the
memory bus before CRDY# can be asserted. The
implication rules surrounding MEOC# are:
1. MEOC# ≤ CRDY#
2. MEOC# for cycle N+1 ≥ 2 clocks after CRDY#
of cycle N
3. MEOC# for cycle N+1 ≥ 2 clocks after last
BRDY# of cycle N
4. MEOC# ≥ BGT#
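The four implication rules can be expressed as a simple checker over clock indices. This is a sketch under stated assumptions: the relational symbols in the rules are read as "no later than" (rule 1) and "no earlier than" (rules 2 to 4); all events are expressed in a single common clock count, although MEOC# is actually referenced to MCLK; and the names CycleTimes and check_meoc_rules are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class CycleTimes:
    """Clock index at which each signal is sampled active for one cycle."""
    bgt: int
    meoc: int
    crdy: int
    last_brdy: Optional[int] = None  # None if the cycle has no BRDY#


def check_meoc_rules(cur: CycleTimes,
                     prev: Optional[CycleTimes] = None) -> List[str]:
    """Return a list of violated rules (empty if all rules hold)."""
    violations: List[str] = []
    if cur.meoc > cur.crdy:
        violations.append("rule 1: MEOC# must not come after CRDY#")
    if prev is not None:
        if cur.meoc < prev.crdy + 2:
            violations.append(
                "rule 2: MEOC# must be at least 2 clocks after the "
                "previous cycle's CRDY#")
        if prev.last_brdy is not None and cur.meoc < prev.last_brdy + 2:
            violations.append(
                "rule 3: MEOC# must be at least 2 clocks after the "
                "previous cycle's last BRDY#")
    if cur.meoc < cur.bgt:
        violations.append("rule 4: MEOC# must not come before BGT#")
    return violations
```

A memory bus controller testbench could call check_meoc_rules for each pair of consecutive cycles and flag any non-empty result.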
7.35 MEOC#
Memory End of Cycle
Ends a cycle in 82490XP by switching buffers
Input to 82490XP (pin 23) Cycle Control Signal
Synchronous to MCLK or Asynchronous (strobed
mode)
MEOC# active with MSEL# active causes the sampling of MZBT# and MFRZ#.
7.35.1 SIGNAL DESCRIPTION
MEOC# is an input to the 82490XP that ends the
current cycle and switches memory buffers for a new
cycle. Switching to the next cycle does not cause
information to be lost in the memory or CPU buffers
in the 82490XP, but rather switches new buffers to
the memory I/O bus of the 82490XP.
MEOC# is provided so that the memory system,
which is synchronous to MCLK, can switch to a new
cycle without synchronization. In clocked memory
bus mode MEOC# is sampled with the rising edge
of MCLK. In strobed memory bus mode the MEOC#
function is performed with rising or falling edges of
MEOC#.
7.36 MFRZ#
Memory Data Freeze
Freezes Memory Write Data in 82490XP Buffer
Input to 82490XP (pin 24) Cycle Control Signal
Synchronous to MCLK or Strobed
7.36.1 SIGNAL DESCRIPTION
MFRZ# is an input to the 82490XP that when active
causes the 82490XP to "freeze" write data in the
82490XP memory buffer and allow a subsequent allocation to fill a cache line around it. MFRZ# is provided so that an actual write to memory need not be
done to perform an allocation. Using MFRZ# to perform this dummy write cycle requires that the memory bus controller put the allocated line into the "M"
state.
PALLC# must be active and MKEN# must be returned active for the write cycle to be turned into an
allocation. MFRZ# is sampled when MEOC# goes
active at the end of the write cycle. The subsequent
line fill is then filled around the write data to complete the allocation.
7.36.2 WHEN SAMPLED
In clocked memory bus mode, MFRZ# is sampled
with the MCLK rising edge that MEOC# is sampled
active for all CPU write cycles. MFRZ# need only
follow a proper setup and hold time in this situation.
When the device which controls the memory bus
(the master) performs a memory access, a snoop is
requested of all other caching devices on the bus
(snoopers). An asserted MHITM# pin from any of
the snooper 82495XPs alerts the master that main
memory's data is stale, and that the bus must be
temporarily given to the snooper which has its
MHITM# asserted so that the modified line can be
written out to the memory bus.
7.37.2 WHEN DRIVEN
The snoop lookup is performed in the clock in which
SNPCYC# is asserted. The MHITM# result for the
snoop is driven on the CLK following SNPCYC#,
and remains valid until the next assertion of
SNPSTB#. The MHITM# signal is not valid from
SNPSTB# until the CLK after SNPCYC#.
In strobed mode, MFRZ# is sampled with the falling
edge of MEOC# for write cycles. MFRZ# need only
follow a proper setup and hold time in this situation.
7.37.3 RELATION TO OTHER SIGNALS
7.36.3 RELATION TO OTHER SIGNALS
An 82495XP can accept a snoop request while performing memory bus transfers of its own. If a snoop
is requested of an 82495XP while it is performing a
data transfer of its own, the results of the snoop may
be delayed. If SNPSTB# is sampled at an 82495XP
after it has received BGT# for its own cycle, the
snoop lookup is performed (SNPCYC# active) after
the SWEND# of its own cycle, and MHITM# is driven with valid results one CLK after SNPCYC# (see
Sections 6.2.4 and 6.2.5).
MFRZ# is sampled with MEOC# going active or
being active for write cycles. MFRZ# is used so that
a dummy write cycle can be performed. If an allocation is done, DRCTM# must be asserted during the
SWEND# window of the line fill to put the allocated
line in the "M" state.
MFRZ# shares a pin with the MEMLDRV configuration input.
MHITM# and MTHIT# outputs together indicate the
results of a snoop lookup in the 82495XP.
7.38 MISTB
7.37 MHITM#
Memory Bus Input Strobe
Strobes data into the 82490XP
Input to 82490XP (pin 22) Cycle Control Signal
Asynchronous
Memory Bus Hit [M]
Indicates snoop hit to modified line
Output from 82495XP (pin H4) Snooping Signal
Sync to CLK
7.38.1 SIGNAL DESCRIPTION
7.37.1 SIGNAL DESCRIPTION
The MHITM# output is driven by the 82495XP during a snoop cycle to indicate that the snooping address has hit a Modified line. If the signal is logic
high, the snoop has not hit a modified line; if the
signal is logic low, the snoop has hit a modified line.
When a snoop hits a modified line, the 82495XP automatically schedules a write-back of the hit modified line to the memory bus.
MISTB is an input to the 82490XP that, on rising or
falling edges, causes the 82490XP to latch its MDATA inputs. MISTB is used in strobed memory bus
mode. In clocked memory bus mode, MISTB is the
MBRDY# input.
7.38.2 WHEN SAMPLED
7.39.3 RELATION TO OTHER SIGNALS
MISTB is always sampled by the 82490XP. MISTB
must meet proper strobed mode active and inactive
times.
MKEN# and MRO# are sampled with KWEND#
active. MKEN# must be sampled at least 2 clocks
before BRDY# assertion to make a line-fill non-cacheable.
7.38.3 RELATION TO OTHER SIGNALS
7.40 MOClK
MISTB causes the latching of the 82490XP MDATA
inputs in strobed mode. MISTB shares a pin with
MBRDY#.
Memory Data Output Clock
Separate Clock Reference for Memory Data Output
7.39 MKEN#
Input to 82490XP (pin 27)
Asynchronous
Memory Cache Enable
Determines 82495XP and CPU cacheability
Input to 82495XP (pin R1) Cycle Attribute Signal
7.40.1 SIGNAL DESCRIPTION
MOCLK is the latch enable for the 82490XP memory
data outputs (MDATA). MOCLKcontrols the latching
of. a transparent latch which, when high, causes
MDATAto be driven from MCLK.When low, MDATA
is latched. MOCLK may only .beused in clocked
memory bus mode and only affects output data. It is
provided so that a greater MDATA output hold time
can be generated.
Synchronous to CLK
7.39.1 SIGNAL DESCRIPTION
MKEN# is an input to the 82495XP that is sampled
at the closing of the cacheability window (KWEND#
is sampled active). The 82495XP drives KEN# back
to the CPU one clock after sampling the value of
MKEN#. MKEN# thus determines whether the current
cycle is cacheable in the 82495XP and in the
CPU.
To be used effectively, MOCLK must be a clock input that is skewed from MCLK. The following figure
shows how MOCLK has increased the hold time of
the output burst data:
For read cycles, if MCACHE# is active (cacheable),
KEN# is driven out of the 82495XP to the CPU to
indicate cacheability. If MKEN# is sampled inactive
during KWEND# activation, KEN# is brought inactive by the 82495XP, and the line will not be cacheable by the CPU or 82495XP. If MCACHE# is inactive, the line will be non-cacheable regardless of
MKEN#. PCD active will cause MCACHE# to be
inactive.
[Figure: timing waveforms for MCLK, MOCLK, MDATA, and MBRDY#]
MKEN # is sampled during write-through cycles that
are potentially allocatable (PALLC# is active during
the write cycle). If MKEN# is sampled active during
KWEND# activation of the write cycle, an allocation
will occur, and a line-fill will follow the write cycle.
MKEN # during the line-fill is ignored. The MSC indicates to the 82495XP that it intends to perform an
allocation by asserting MKEN #.
240956-32
7.40.2 WHEN SAMPLED
MKEN# must be sampled 1 clock before the first
BRDY# assertion to make a line-fill non-cacheable
to the CPU.
7.39.2 WHEN SAMPLED
MKEN # is sampled on the clock edge that
KWEND# is first sampled active. In all other places
MKEN # may violate setup and hold times.
MOCLK is sampled during and after RESET to determine whether output data should be driven from
MCLK or MOCLK. If toggling, MOCLK controls the
MDATA outputs with MCLK. If high, data is driven
from MCLK alone. Regardless, input data is never
referenced to MOCLK.
In strobed memory bus mode the MOCLK signal becomes MOSTB. MOCLK is only used in clocked
memory bus mode.
7.40.3 RELATION TO OTHER SIGNALS
To be used effectively, MOCLK must be the same
frequency as MCLK but be skewed. This effectively
increases MDATA hold time to main memory. Main
memory must sample the data on MCLK edges.
Once MRO# is sampled active during KWEND# activation, KEN# to the CPU is driven inactive regardless of the state of MKEN#. MKEN# does, however, determine whether the 82495XP will cache the
read-only line. Once MRO# is returned active, the
CPU will only require the number of transfers as indicated by LEN and CACHE#. If MKEN# is returned
active, the 82495XP will require an entire cache line.
82495XP read-only cache lines are filled to the [S]
state.
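The read-cycle cacheability rules stated for MCACHE#, MKEN#, and MRO# can be summarized in a small sketch. The function name and the boolean arguments (True meaning "sampled active") are illustrative and not part of the datasheet; the case where MCACHE# is inactive while MRO# is active is assumed here to be non-cacheable.

```python
def read_cacheability(mcache_active, mken_active, mro_active):
    """Return (cpu_cacheable, controller_cacheable, read_only) for a read cycle.

    Arguments are True when the signal is sampled active at KWEND#.
    Illustrative sketch only; not a model of the actual silicon.
    """
    if not mcache_active:
        # MCACHE# inactive: non-cacheable regardless of MKEN#.
        # (Assumed to hold regardless of MRO# as well.)
        return (False, False, False)
    if mro_active:
        # KEN# to the CPU is driven inactive; MKEN# decides whether
        # the 82495XP itself caches the read-only line.
        return (False, mken_active, mken_active)
    # Normal case: MKEN# decides cacheability for both CPU and 82495XP.
    return (mken_active, mken_active, False)


print(read_cacheability(True, True, True))
```

In this sketch a read-only line is never cacheable by the CPU, while the controller caches it only when MKEN# is also returned active.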
MOCLK shares a pin with the MOSTB signal.
state, and causes the line to be non-cacheable to
the CPU. Writes to read-only lines in the 82495XP
are treated as write-misses that are non-allocatable
(PALLC# is inactive). MRO# is a bit in each
82495XP tag entry.
7.41 MOSTB
Memory Bus Output Strobe
Strobes data out of 82490XP
Input to 82490XP (pin 27) Cycle Control Signal
Asynchronous
7.41.1 SIGNAL DESCRIPTION
MOSTB is an input to the 82490XP that, on rising
and falling edges, causes the 82490XP to output
data through its MDATA outputs. MOSTB is only
used in strobed memory bus mode. In clocked memory bus mode, MOSTB is the MOCLK input.
7.41.2 WHEN SAMPLED
The line-fill portion of an allocation may be filled to
the read-only state by returning MRO# active during
KWEND# of the line-fill. MRO# is ignored during
the write portion.
If MRO# is returned active during KWEND#,
DRCTM# and MWB/WT# are ignored during
SWEND#.
MRO# must be returned to the 82495XP at least 2
clocks before SRDY# is returned to the CPU so
KEN# can be sampled properly.
MOSTB is always sampled by the 82490XP. MOSTB
must meet strobed mode active and inactive times.
There is one Read-Only bit per tag in the 82495XP.
7.41.3 RELATION TO OTHER SIGNALS
7.42.2 WHEN SAMPLED
MOSTB strobes data out of the 82490XP through
MDATA. MOSTB shares a pin with MOCLK.
MRO# is sampled on the first clock that KWEND#
is sampled active. In all other clocks, MRO# need
not follow setup and hold times.
7.42 MRO#
7.42.3 RELATION TO OTHER SIGNALS
Memory Read-Only
Designates current line as read-only
Input to 82495XP (pin J1) Cycle Attribute Signal
Synchronous to CLK
MRO# and MKEN# are sampled with KWEND#
activation. MRO# must be returned at least 2 clocks
prior to the first SRDY#.
7.43 MSEL#
7.42.1 SIGNAL DESCRIPTION
MRO# is an input to the 82495XP that is sampled at
the closing of the cacheability window (KWEND#
activation). If sampled active, it causes the current
line fill to the 82495XP to be put in the read-only
Memory Buffer Chip Select
Selects 82490XP, Causes Sampling of MZBT#
Input to 82490XP (pin 25) Cycle Control Signal
Synchronous to MCLK or Strobed
7.43.1 SIGNAL DESCRIPTION
MSEL# is an input to the 82490XP that has 3 main
functions. One, MSEL# active qualifies the
MBRDY# input to the 82490XP. If MSEL# is inactive for a particular 82490XP, MBRDY# will not be
recognized by that 82490XP.
Two, MSEL# going active causes the sampling of
MZBT# for the next transfer.
Three, MSEL# going inactive resets the 82490XP
internal memory burst counter. The 82490XP contains a memory burst counter that counts through
the CPU burst order with each MBRDY# assertion
and increments a pointer to the 82490XP memory
buffer being accessed.
MSEL# going inactive will reset this burst counter to
its original burst value. By resetting this counter before MEOC# assertion, all information currently being read into the 82490XP is lost, but information
that is being written out is maintained and may be
rewritten.
In general, MSEL# may stay inactive for single
transfer cycles such as posted 64-bit write cycles.
Once active, MSEL# need not go inactive as the
burst counter is reset with MEOC# activation. Since
MZBT# may also be sampled with MEOC#, it is
possible to leave MSEL# asserted throughout most
basic transfers.
MSEL# or MEOC# must be used to reset the burst
counter before any transfer begins. If transfers are
interrupted (by a snoop hit before BGT# assertion
for example), MSEL# must be brought inactive so
the burst counter may be reset for the snoop write
back.
MSEL# must be sampled inactive for at least 1
MCLK after reset. This resets the memory burst
counter for the first transfer.
7.43.2 WHEN SAMPLED
In clocked memory bus mode, MSEL# is sampled
with all rising edges of MCLK. In this mode, if
MSEL# is sampled inactive, the memory burst
counter is reset and MZBT# is sampled. If MSEL#
is sampled active and MBRDY# is sampled active,
the memory burst counter is incremented. Since it is
constantly sampled with MCLK, MSEL# must always be driven to a known state and must always
meet setup and hold times to every MCLK edge.
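The clocked-mode behavior described above can be illustrated with a toy model. This is only a sketch: the class and method names are hypothetical, pin levels are passed as 0 (active) or 1 (inactive), and the counter here is a plain index that wraps after an assumed number of transfers rather than following the real CPU burst order.

```python
class BurstCounterModel:
    """Toy model of the 82490XP memory burst counter in clocked mode."""

    def __init__(self, transfers=4):
        self.transfers = transfers  # assumed transfers per line (4 or 8)
        self.count = 0              # simplified linear burst index
        self.zero_based = False     # last sampled MZBT# (True = asserted)

    def mclk_rising_edge(self, msel_n, mbrdy_n, mzbt_n):
        """Apply one MCLK rising edge; arguments are pin levels (0 = active)."""
        if msel_n == 1:
            # MSEL# sampled inactive: reset the counter and sample MZBT#.
            self.count = 0
            self.zero_based = (mzbt_n == 0)
        elif mbrdy_n == 0:
            # MSEL# active and MBRDY# active: advance the counter.
            self.count = (self.count + 1) % self.transfers
        return self.count


model = BurstCounterModel()
model.mclk_rising_edge(msel_n=1, mbrdy_n=1, mzbt_n=0)  # reset, MZBT# sampled
model.mclk_rising_edge(msel_n=0, mbrdy_n=0, mzbt_n=1)  # first transfer
print(model.count, model.zero_based)
```

The model makes the qualification visible: MBRDY# has no effect on the counter unless MSEL# is sampled active on the same edge.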
In strobed mode, MSEL# falling edge causes the
sampling of MZBT#. While MSEL# is active, MISTB
and MOSTB cause the memory burst counter to be
incremented. The rising edge of MSEL# causes the
memory burst counter to be reset.
MSEL# must be inactive sometime after RESET before the first transfer to initialize the burst counter.
7.43.3 RELATION TO OTHER SIGNALS
MSEL# causes the sampling of MZBT#, and qualifies the use of MBRDY#, MOSTB, and MISTB.
Since MSEL# acts as a qualifier for these signals,
MSEL# may be asserted at the same time as
MBRDY#, MOSTB, or MISTB.
7.44 MTHIT#
Memory Bus Tag Hit
Indicates snoop hit
Output from 82495XP (pin G3) Snooping Signal
Sync to CLK
7.44.1 SIGNAL DESCRIPTION
The MTHIT# output is asserted by the 82495XP
during snoop cycles to indicate that the snoop address has hit a line in the 82495XP cache. An asserted MTHIT# signal from any of the snooping
82495XPs alerts a bus master that the data being
accessed resides in another cache. If SNPINV was
not asserted on the snoop request, the copy of the
data in an 82495XP asserting MTHIT# will remain
valid and in the Shared state, so a caching master
must also place its copy of the data in the Shared
state.
7.44.2 WHEN DRIVEN
The snoop lookup is performed in the CLK in which
SNPCYC# is asserted. The MTHIT# result for the
snoop is driven on the next CLK and remains valid
until the next assertion of SNPSTB#. The MTHIT#
signal is not valid from SNPSTB# until the CLK after
SNPCYC#.
7.44.3 RELATION TO OTHER SIGNALS
MTHIT# and MHITM# together indicate the results
of a snoop lookup in the 82495XP.
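As an illustration of how a memory bus controller might interpret the two active-low snoop outputs, the sketch below decodes the pin levels one CLK after SNPCYC#. The function name and result strings are hypothetical. The combination of MTHIT# high with MHITM# low is not described in this section, so the sketch treats it as an error.

```python
def decode_snoop_result(mthit_n, mhitm_n):
    """Decode snoop results from MTHIT# and MHITM# pin levels (0 = asserted)."""
    if mthit_n == 0 and mhitm_n == 0:
        return "hit_modified"    # a snoop write-back will follow
    if mthit_n == 0 and mhitm_n == 1:
        return "hit_unmodified"  # line present but not Modified
    if mthit_n == 1 and mhitm_n == 1:
        return "miss"
    # Assumption: a Modified hit without a tag hit is not a valid result.
    raise ValueError("unexpected combination: MHITM# low with MTHIT# high")


print(decode_snoop_result(0, 0))
```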
An 82495XP can accept a snoop request while performing memory bus transfers of its own. If a snoop
is requested while it is performing a transfer of its
own, the results of the snoop may be delayed. If
SNPSTB# is sampled at an 82495XP after it has received BGT# for its own cycle, the snoop lookup is
performed (SNPCYC# active) after the SWEND# of
its own cycle, and MTHIT# is driven with the valid
result one CLK after SNPCYC# (see Sections 6.2.4
and 6.2.5).
Because an asserted MTHIT# from any snooping
82495XP requires the master to place the fetched
line in the Shared state (unless it is an invalidating
snoop), the memory bus controller should include
the MTHIT# signals of other processors when generating the MWB/WT# signal to its own 82495XP.
7.45 MWB/WT#
Memory Write-back/Write-through
Forces lines to be filled to the [S] state
Input to 82495XP (pin K3) Cycle Attribute Signal
Synchronous to CLK
7.45.1 SIGNAL DESCRIPTION
MWB/WT # is an input to the 82495XP that is sampled at the closing of the snoop window (SWEND#
activation). If sampled active, the current line-fill is
filled to the [S] state in the 82495XP. The [S] state
is a write-through state in the 82495XP.
MWB/WT# is used in many cases. If a cache-to-cache
transfer updates memory and leaves the data
valid in the other cache, the line must be filled to the
[S] state instead of the [E] state default. A portion of
memory may be designated as write-through by asserting MWB/WT# for appropriate addresses.
MWB/WT# has no effect on the 82495XP if
DRCTM# is sampled active or MRO# has been
sampled active during KWEND #. If PWT is active,
MWB/WT # has no effect and the line is filled to the
[S] state.
7.46 MX4/MX8#, MTR4/MTR8#
Memory 4/8 I/O bits
Memory 4/8 Transfers
Selects MDATA Input/Output width and number of
memory bus transfers
Inputs to 82490XP (pins 21, 25) Configuration Signals
Synchronous to CLK
7.46.1 SIGNAL DESCRIPTION
MX4/MX8# configures the 82490XP to use
MDATA[0:3] or MDATA[0:7] memory bus I/O pins.
MTR4/MTR8# selects whether a cache line will
take 4 or 8 transfers. These selections depend on
the line ratio (82495XP line size / CPU line size) and
must be configured according to the following table:
Line    MX4/    MTR4/    Membus      CPUbus
Ratio   MX8#    MTR8#    I/O Pins    I/O Pins
1       1       1        4           4
2       1       0        4           4
2       0       1        8           4
4       0       0        8           4
1       0       1        8           8
2       0       0        8           8
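A configuration helper could encode the table above as a lookup. The sketch below is illustrative only: the dictionary and function names are hypothetical, and the values are transcribed from the table as it appears in this section, so they should be checked against the original datasheet before use.

```python
# (line ratio, membus I/O pins, CPU bus I/O pins) -> (MX4/MX8#, MTR4/MTR8#)
# Values transcribed from the configuration table in this section.
STRAP_TABLE = {
    (1, 4, 4): (1, 1),
    (2, 4, 4): (1, 0),
    (2, 8, 4): (0, 1),
    (4, 8, 4): (0, 0),
    (1, 8, 8): (0, 1),
    (2, 8, 8): (0, 0),
}


def strap_values(line_ratio, membus_pins, cpubus_pins):
    """Return the (MX4/MX8#, MTR4/MTR8#) levels for a listed configuration."""
    key = (line_ratio, membus_pins, cpubus_pins)
    if key not in STRAP_TABLE:
        raise ValueError(f"configuration {key} is not listed in the table")
    return STRAP_TABLE[key]


print(strap_values(2, 8, 4))
```

Rejecting unlisted combinations keeps the helper from silently producing strap levels for a configuration the table does not cover.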
7.46.2 WHEN SAMPLED
These signals are sampled like Figure 7-1 with a setup time of 1 clock. Once the first CADS# is issued
by the 82495XP these signals are sampled for the
MZBT# and MSEL# functions.
7.46.3 RELATION TO OTHER SIGNALS
MX4/MX8# shares a pin with MZBT# and MTR4/MTR8#
shares a pin with MSEL#.
7.45.2 WHEN SAMPLED
7.47 MZBT#
MWB/WT # is sampled on the first clock edge that
SWEND# is sampled active. If MWB/WT# is not
being sampled, it need not follow setup and hold
times.
Memory Zero Base Transfer
Forces cycles to begin at subline address 0
Input to 82490XP (pin 21) Cycle Control Signal
Synchronous to MCLK or Strobed
7.45.3 RELATION TO OTHER SIGNALS
Both MWB/WT # and DRCTM # are sampled with
SWEND#.
7.47.1 SIGNAL DESCRIPTION
7.47.3 RELATION TO OTHER SIGNALS
MZBT# is an input to the 82490XP that forces a
read or write cycle to begin with burst address 0
regardless of the CPU generated address.
MZBT# is sampled with MSEL# and MEOC# and
has no effect otherwise. In systems that will never
force a zero-based transfer, MZBT # may be driven
high after RESET.
MZBT # is sampled before the transfer begins.
MZBT# is sampled with MSEL# and MEOC#.
MZBT # is sampled with MSEL # going active for the
current cycle. If MSEL # stays active between cycles, MZBT# is sampled with MEOC# going active
for the previous cycle.
Once sampled, data input to the 82490XP's will start
at burst address 0 and continue through 4, 8, C, etc.
If the CPU is requesting a burst location other than
0, the memory bus controller must hold off any
BRDY # until that bursted item is read from the
memory bus.
MZBT# shares a pin with the MX4/MX8# configuration input.
7.48 NCPFLD #
Non-Cacheable PFLD
Enables Non-Cacheable Floating Point Loads
Input to 82495XP (N4) Configuration Signal
Asynchronous
7.48.1 SIGNAL DESCRIPTION
7.47.2 WHEN SAMPLED
In clocked mode, MZBT # is sampled in two locations. First, MZBT # is sampled on all MCLK rising
edges where MSEL # is sampled inactive. Once
MSEL# is sampled active, the value of MZBT# that
was sampled one MCLK before is used for the next
transfer.
Second, MZBT# is sampled on MCLK rising edges
where MEOC# is sampled active with MSEL# active. The MZBT # value sampled will be used for the
next transfer. This allows MSEL# to stay asserted
between transfers if so desired.
During RESET, this pin functions as the NCPFLD#
configuration signal. The 82495XP can be configured to decode i860 XP CPU PFLD (Pipelined Floating Point Load) cycles. The 82495XP supports 3 operational modes for PFLD cycle decoding as defined
by FPFLDEN and NCPFLD#:
Mode # 1. PFLD cycles that are cached in the
82495XP.
Mode #2. PFLD cycles not cached in the 82495XP,
without an external PFLD extension
FIFO.
Mode #3. PFLD cycles not cached in the 82495XP,
with an external PFLD extension FIFO.
In strobed mode, MZBT # is sampled with the same
two signals. First, it is sampled with the falling edge
of MSEL#. Second, it is sampled with the falling
edge of MEOC# if MSEL# is active.
In clocked memory bus mode MZBT# must follow
setup and hold times to all MCLK edges where
MSEL# is sampled inactive or MEOC# is sampled
active with MSEL# active.
In strobed memory bus mode MZBT # must meet
setup and hold times to MSEL # falling edge and
MEOC# falling edge if MSEL# is active.
Mode #         FPFLDEN   NCPFLD#
1              0         1
2              0         0
3              1         1
Illegal Mode   1         0
See Section 5.2.5 for details.
7.50.1 SIGNAL DESCRIPTION
7.48.2 CASES IT IS ASSERTED AND
DEASSERTED
NCPFLD# is sampled on the falling edge of RESET
and is a don't care at any other time. NCPFLD#
must be valid for at least 10 CLKs before RESET's
falling edge.
PALLC# indicates to the MBC that the current write
cycle may allocate (perform a line-fill on) a cache
line. The MBC chooses to perform an allocation by
asserting MKEN# during KWEND# of the write cycle. Potential allocate cycles are cycles which are
82495XP misses with PCD and PWT inactive.
7.48.3 RELATION TO OTHER SIGNALS
NCPFLD# shares a pin with FLUSH#. Both
NCPFLD# and FPFLDEN describe the PFLD mode
used.
The exact condition for assertion of PALLC# is:
Miss * !PCD * !PWT * LOCK# * W/R# * D/C# * M/IO#
PALLC# is inactive (HIGH) for any write-hit to a
Read-Only line.
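The PALLC# assertion condition can be written out as a boolean function. This is a sketch under stated assumptions: the function and parameter names are hypothetical, the last four terms are taken as pin levels (so LOCK# = 1 means not locked, W/R# = 1 a write, D/C# = 1 a data cycle, and M/IO# = 1 a memory cycle, following the usual Intel bus conventions), and the read-only case is passed in as a separate flag.

```python
def pallc_asserted(miss, pcd, pwt, lock_n, w_r, d_c, m_io, read_only_hit=False):
    """Return True when PALLC# would be asserted (driven low).

    miss, pcd, pwt: True when the condition or attribute is active.
    lock_n, w_r, d_c, m_io: pin levels (0 or 1), per the assumptions above.
    read_only_hit: True for a write-hit to a Read-Only line.
    """
    if read_only_hit:
        # PALLC# is inactive (HIGH) for any write-hit to a Read-Only line.
        return False
    return bool(miss and not pcd and not pwt
                and lock_n and w_r and d_c and m_io)


# Unlocked memory data write that misses, with PCD and PWT inactive.
print(pallc_asserted(True, False, False, 1, 1, 1, 1))
```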
7.49 NENE#
Next Near
Indicates current cycle address is near previous one.
Output from 82495XP (pin D5) Cycle Control Signal
Synchronous to CLK
7.50.2 WHEN DRIVEN
PALLC# is valid in the same CLK as CADS# and is
valid until CRDY# or CNA#.
7.50.3 RELATION TO OTHER SIGNALS
PALLC# is valid with CADS#.
7.49.1 SIGNAL DESCRIPTION
NENE# indicates to the MBC that the address of
the requested memory cycle is "near" the address
of the previously generated one (in the same 2K
DRAM page). This information may be used by the
MBC to optimize access to paged or static column
DRAMs.
7.51 PAR#
Parity Selection
Selects 82490XP as a Parity Device
Input to 82490XP (pin 32) Configuration Signal
Synchronous to CLK
7.49.2 WHEN DRIVEN
NENE# is valid together with CADS# and will stay
valid until CNA# or CRDY#.
7.49.3 RELATION TO OTHER SIGNALS
Address and cycle specification signals (MSET0-MSET10,
MTAG0-MTAG11, MCFA0-MCFA6,
CW/R#, CM/IO#, CD/C#, RDYSRC, MCACHE#,
NENE#, SMLN#, KLOCK#, and CPLOCK#) will be
valid with CADS#.
7.51.1 SIGNAL DESCRIPTION
PAR# is a strapping option on the 82490XP that,
when strapped low, configures that 82490XP device
to be a dedicated parity device. An 82490XP parity
device must be configured the same as all the other
devices; however, the data lines are defined differently. CDATA[0:3] are 4 parity bit I/O lines and
CDATA[4:7] are 4 bit select lines so each parity line
may be written individually. Parity devices must be
used as follows:
NENE# may change state after CNA# or CRDY#
are asserted to the 82495XP.
7.50 PALLC#
Potential Allocate
Indicates 82495XP intent to allocate current cycle
Output from 82495XP (pin D2) Cycle Control Signal
Synchronous to CLK
Cache   Memory      Number of        82490XP I/O Bits
Size    Bus Width   Parity Devices   (CPU:Mem)
256K    64          2                4:4
512K    128         2                4:8
7.51.2 WHEN SAMPLED
7.53.1 SIGNAL DESCRIPTION
PAR# is a strapping option and must be tied either
high or low.
The falling edge of this signal tells the 82495XP to
sample all configuration inputs and initializes the
82495XP to a known state. See the specific configuration signals for setup and hold times relative to
RESET's falling edge. RESET can be asserted at
any time.
7.51.3 RELATION TO OTHER SIGNALS
PAR# affects the definition of the CDATA and MDATA lines of the 82490XP.
7.52 RDYSRC
Ready Source
Cycle control signal to the MBC
Output from 82495XP (pin C1) Cycle Control Signal
Synchronous to CLK
During initialization, the 82495XP LRU bits are set
to 1 indicating that the 82495XP LRU way is way 1.
The 82490XP MRU bits are initialized to 0 as are all
tag array bits.
RESET takes about 4100 clocks in the 82495XP.
RESET with self-test takes about 80,000 clocks.
7.53.2 WHEN SAMPLED
7.52.1 SIGNAL DESCRIPTION
RDYSRC serves as a cycle control signal to the
MBC. It indicates the source of the BRDY# generation
(either 82495XP or MBC) for the CPU. When
high it indicates that the MBC should generate the
BRDY#s to the CPU; when low it indicates that the
82495XP will provide the BRDY#s.
RESET is an asynchronous input. RESET must have
a pulse width of at least 8 CLKs in order to guarantee 82495XP recognition.
7.53.3 RELATION TO OTHER SIGNALS
The following signals are sampled at RESET:
CNA# [CFG0]: CFG0 line of 82495XP configuration inputs
SWEND# [CFG1]: CFG1 line of 82495XP configuration inputs
KWEND# [CFG2]: CFG2 line of 82495XP configuration inputs
FLUSH# [NCPFLD#]: If low, enables decoding of i860XL non-cacheable PFLD mode.
FPFLD# [FPFLDEN]: If high, enables the external FIFO for i860XL PFLD mode.
BGT# [C490LDRV]: Indicates the driving strength of the 82495XP/82490XP interface.
SYNC# [MEMLDRV]: Indicates the memory bus driving strength.
SNPCLK# [SNPMD]: Indicates the snooping mode; synchronous or strobed.
CFG2-CFG0: Configure cache parameters such as lines/sector, line ratio, and number of tags.
7.52.2 WHEN DRIVEN
RDYSRC is valid in the same CLK as CADS# and is
valid until CRDY# or CNA#.
RDYSRC is asserted for line-fill and not asserted for
the write portion of allocation cycles.
7.52.3 RELATION TO OTHER SIGNALS
Address and cycle specification signals (MSET0-MSET10,
MTAG0-MTAG11, MCFA0-MCFA6,
CW/R#, CM/IO#, CD/C#, RDYSRC, MCACHE#,
NENE#, SMLN#, KLOCK#, and CPLOCK#) will be
valid with CADS#.
7.53 RESET
Reset
Forces the 82495XP to begin execution in a known
state
Input to 82495XP (05)
Asynchronous
7.54 SLFTST #
7.55.3 RELATION TO OTHER SIGNALS
Self Test
Executes 82495XP self-test
Input to 82495XP (pin M2) Test Signal
Synchronous to CLK
Address and cycle specification signals (MSET0-MSET10,
MTAG0-MTAG11, MCFA0-MCFA6,
CW/R#, CM/IO#, CD/C#, RDYSRC, MCACHE#,
NENE#, SMLN#, KLOCK#, and CPLOCK#) will be
valid with CADS#.
7.54.1 SIGNAL DESCRIPTION
7.56 SNPADS#
If SLFTST# is sampled low and HIGHZ# is sampled high, the 82495XP will perform a self-test after
reset. The results of the self-tests are given by CAHOLD when FSIOUT# goes inactive.
Cache Snoop Address Strobe
Initiates a snoop write back cycle
Output from 82495XP (pin F3) Snooping Signal
Sync to CLK
7.54.2 WHEN SAMPLED
SLFTST# is sampled with reset like Figure 7-1 with a
setup time of 10 CPU clocks. SLFTST# is then a
"don't care" until after the first CADS# activation
when it becomes the CRDY# pin.
7.54.3 RELATION TO OTHER SIGNALS
SLFTST# shares a pin with CRDY#. The 82495XP
enters self-test if both SLFTST# is sampled active
and HIGHZ# is sampled inactive.
7.55 SMLN#
7.56.1 SIGNAL DESCRIPTION
The SNPADS# signal indicates valid cache control
and attribute signals, functioning identically to
CADS #, but is generated only on snoop writebacks. The separation of address status signals for
normal and snoop write-back cycles eases memory
bus controller implementation. When SNPADS# is
activated, the memory bus controller should abort all
pending cycles for which BGT # has not been issued. The 82495XP reissues these non-committed
cycles after the snoop write-back has completed.
7.56.2 WHEN DRIVEN
Same Line
Current cycle is same 82495XP line as previous one.
Output from 82495XP (pin C6) Cycle Control Signal
Synchronous to CLK
7.55.1 SIGNAL DESCRIPTION
SMLN# is used to indicate to the MBC that the current cycle is accessing the same 82495XP cache
line as the previous cycle. This indication can be
used by the MBC to selectively activate its
SNPSTB# signal to other caches in the system. For
example, back-to-back snoop hits to the same line
may be snooped only once.
SNPADS# is produced when a snoop hits a modified line. A modified line condition exists when a line
in the cache has been updated, and copies of that
memory location in other devices are no longer valid. A snoop is initiated by the master of a shared bus
when accessing a memory location on the shared
bus.
The response of the 82495XP to a snoop appears
on the MTHIT# and MHITM# pins in the clock after
SNPCYC# is active. If these pins are both driven
low, the snoop resulted in a hit to a modified line,
and a snoop write-back is initiated with the assertion
of SNPADS#. SNPADS# is driven, at earliest, two
clocks after SNPCYC#. Like CADS#, SNPADS# is
active for one CLK, and is always valid.
7.55.2 WHEN DRIVEN
SMLN# is asserted with CADS# and will stay valid
until CNA# or CRDY#.
7.56.3 RELATION TO OTHER SIGNALS
Cycles initiated by SNPADS# require only CRDY#;
they do not require the other cycle progress signals
(BGT#, KWEND#, SWEND#) .
The SNPADS# signal is driven by the 82495XP to
indicate the start of the write-back cycle; the
82495XP drives the following address and cycle
specification signals valid with SNPADS#: CW/R#,
CD/C#, CM/IO#, MCACHE#, RDYSRC, NENE#,
SMLN#, and the address on MSET[0:10],
MTAG[0:11], and MCFA[0:6]. Upon assertion of
SNPADS#, the memory bus controller should cancel all pending cycles for which BGT# has not yet
been asserted, because they will be reissued after
the snoop write-back. The 82495XP will ignore
BGT# while SNPBSY# and MHITM# are active (i.e.,
during the write-back).
The 82495XP can accept a snoop request while performing memory bus transfers of its own. If a snoop
is requested while it is performing a transfer of its
own, the results of the snoop and any necessary
snoop write-backs may be delayed. If SNPSTB# is
sampled at an 82495XP after it has received BGT#
for its own cycle, and the snoop hits a modified line,
the snoop write-back will occur after CRDY# for the
82495XP's own cycle. See Sections 6.2.4 and 6.2.5
for details.
7.57.3 RELATION TO OTHER SIGNALS
After SNPCYC# occurs for a snoop, a new snoop
may be initiated. If SNPBSY# is asserted for the
initial snoop, the SNPCYC# of the second snoop is
delayed until the SNPBSY# signal is deasserted for
the initial snoop, indicating that its snoop processing
has completed.
7.58 SNPCLK [SNPMD]
Snoop Clock [Snooping Mode]
Selects 82495XP snooping mode
Input to 82495XP (pin S3) Snooping Signal
Synchronous to CLK
7.58.1 SIGNAL DESCRIPTION
SNPMD selects whether 82495XP snoop initiation is in synchronous, clocked, or strobed mode.
The 82495XP snoop response is always synchronous to
CLK.
Synchronous mode (to CLK) is selected by SNPMD
sampled low during reset. Strobed mode is selected
by SNPMD sampled high during reset. Clocked
mode is selected by connecting the snoop clock
source to SNPMD, and thus SNPMD becomes the
actual snoop clock (SNPCLK).
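The mode selection can be pictured as a classification of what is seen on the SNPMD pin around reset. The sketch below is purely illustrative: the function name is hypothetical, and the list of sampled levels stands in for whatever clock-detection mechanism the device actually uses, which this section does not describe.

```python
def snoop_mode(snpmd_levels):
    """Classify the snooping mode from SNPMD levels (0 or 1) observed at reset."""
    observed = set(snpmd_levels)
    if observed == {0}:
        return "synchronous"  # SNPMD held low
    if observed == {1}:
        return "strobed"      # SNPMD held high
    if observed == {0, 1}:
        return "clocked"      # SNPMD toggling: pin acts as SNPCLK
    raise ValueError("levels must be a non-empty sequence of 0s and 1s")


print(snoop_mode([0, 1, 0, 1]))
```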
7.57 SNPBSY#
Snoop Busy
Indicates additional snoop processing in progress
Output from 82495XP (pin F1) Snooping Signal
Sync to CLK
7.58.2 WHEN SAMPLED
7.57.1 SIGNAL DESCRIPTION
SNPBSY# and SNPCYC# indicate a snoop in progress. The SNPCYC# signal is asserted on the actual
snoop look-up to the 82495XP tags. If the snoop
look-up indicates a valid line is hit and the snoop is
invalidating, the 82495XP must perform a back invalidation on the CPU. If a snoop hit occurs to a modified line, a snoop write-back must occur. SNPBSY#
is asserted and remains active while either a back
invalidation or a snoop write-back is in progress.
7.57.2 WHEN DRIVEN
SNPBSY # is activated for two conditions. First,
SNPBSY # is activated whenever a back invalidation
is necessary: the snoop returns MTHIT # active and
SNPINV was asserted on the snoop initiation. Second, SNPBSY# is activated when a modified cache
line is hit on a snoop, as indicated by MHITM#, until
the modified line has been written back (CRDY # returned for the write-back).
SNPMD is sampled like Figure 7-1 with a setup time
of 4 CPU clocks. SNPMD is then not used unless
clocked mode is being selected. If clocked mode is
selected, SNPMD becomes SNPCLK to clock in
snoop requests.
7.58.3 RELATION TO OTHER SIGNALS
SNPMD becomes SNPCLK if a clock signal is detected at reset. In this clocked mode, SNPCLK is
then used to clock in SNPSTB#, the snoop address, and all snoop attributes.
7.59 SNPCYC#
Snoop Cycle
Indicates snoop look-up occurring in 82495XP tags
Output from 82495XP (pin H3) Snooping Signal
Sync to CLK
SNPBSY# is valid in the CLK following SNPCYC#,
and if active, remains active for a minimum of two
CLKs.
7.59.1 SIGNAL DESCRIPTION
7.60.2 WHEN SAMPLED
SNPCYC# is asserted by the 82495XP during the
clock when the actual tag look-up for the snoop is
performed. SNPCYC# may appear as early as the
CLK following SNPSTB# assertion, or may be delayed several clocks while a snoop write-back or
82495XP memory bus cycle takes place.
When a bus master performs a bus access, the
SNPSTB# of all other 82495XPs is asserted to initiate a snoop for that address. If the master's access
is one which is modifying the data (a write to memory, etc.), the SNPINV pin of all snooping 82495XPs
must be asserted during SNPSTB # so that the line
is properly marked Invalid.
7.59.2 WHEN DRIVEN
SNPINV is not asserted during SNPSTB# assertion
if snoop hits are to remain valid: the master issuing
the snoop does not require their invalidation (a
read).
SNPCYC# is always a valid 82495XP output. It is
asserted once, for a single clock, for every snoop
which is initiated in the 82495XP.
7.59.3 RELATION TO OTHER SIGNALS
A snoop is initiated by assertion of the SNPSTB#
input if MAOE# is not asserted. The actual snoop,
signalled by the assertion of SNPCYC#, can be delayed by a prior snoop's write-back in progress
(SNPBSY# asserted) or by an 82495XP memory cycle in progress (SNPSTB# occurs after BGT#); see SNPSTB# for details. If neither of these is occurring, strobed and clocked snooping modes can
also delay snoop look-up for a clock while the snoop
address and attributes are synchronized.
In the clock following SNPCYC#, MHITM # and
MTHIT# report valid snoop results.
7.60 SNPINV
Snoop Invalidation
Forces invalidation of snoop hits
Input to 82495XP (pin P5) Snooping Signal
Sampled with SNPSTB# (see SNPSTB#)
SNPINV assertion forces all snoop hits to be invalidated, overriding other inputs or attributes (i.e.,
SNPNCA). When SNPINV is not asserted, cache
states change according to normal protocol.
SNPINV is only sampled with SNPSTB#, which may
be qualified by CLK or SNPCLK depending on the
snooping mode, and must meet setup and hold
times for the edge of its sampling. When SNPSTB#
is not being asserted, SNPINV is a don't care and
need not follow setup and hold times.
7.60.3 RELATION TO OTHER SIGNALS
SNPINV is sampled according to SNPSTB#, which
may be qualified by SNPCLK or CLK, depending on
the snooping mode. SNPINV overrides the SNPNCA
input, which may also be asserted with SNPSTB#. If
MAOE# is active with SNPSTB# sampling, the
snoop request is ignored.
7.61 SNPNCA
7.60.1 SIGNAL DESCRIPTION
Assertion of the SNPINV signal during the initiation
of a snoop request forces a snoop hit for that request into the Invalid state.
The SNPINV pin is sampled upon initiation of a
snoop request with SNPSTB# activation, depending
on snooping mode: rising edge of first CLK when
SNPSTB# is asserted (synchronous snooping mode),
or rising edge of first SNPCLK when SNPSTB# is
asserted (clocked mode), or falling edge of
SNPSTB# (strobed mode).
Snoop Non Caching device Access
Indicates to snooping 82495XP that the initiating
master is a non- caching device
Input to 82495XP (pin 03) Snooping Signal
Sampled with SNPSTB# (see SNPSTB#)
7.61.1 SIGNAL DESCRIPTION
SNPNCA indicates that the master which is initiating
the snoop request will not cache the data. If the
SNPNCA pin is not asserted and the snoop is noninvalidating (where noninvalidating = SNPINV not asserted), a snoop hit line must be placed in the
Shared state, since the data will exist in another
cache. If SNPNCA is asserted and the snoop is noninvalidating, a snoop hit line will not be entered into a
new cache, so a hit Exclusive or Modified line will be
placed in the Exclusive state by the 82495XP. A
noninvalidating snoop hit to a Shared line must keep
the hit line in the Shared state, regardless of
SNPNCA.
SNPNCA, and places all snoop hit lines into the Invalid state. If MAOE # is active on SNPSTB # sampling, the snoop request is ignored.
7.62 SNPSTB#
Snoop Strobe
SNPNCA is sampled upon initiation of a snoop request with SNPSTB# activation, depending on the
snooping mode: rising edge of first CLK when
SNPSTB# is asserted (synchronous snooping mode),
or the rising edge of SNPCLK when SNPSTB# is
asserted (clocked snooping mode), or the falling
edge of SNPSTB# (strobed snooping mode).
Initiates 82495XP snoop and latches snoop address
& attributes
Input to 82495XP (pin R3) Snooping Signal
Sync to CLK or SNPCLK, or strobed
7.62.1 SIGNAL DESCRIPTION
Snoop strobe initiates an 82495XP snoop request. It
controls the latching of the snoop address and
snoop attribute signals, in the manner specified by
one of three snooping modes:
7.61.2 WHEN SAMPLED
To achieve maximum processor performance and
minimum bus traffic, SNPNCA should be asserted
when the noninvalidating snoop is caused by an access from a non-caching device like a DMA.
Snooping Modes
If the snoop is being caused by a device which will
also be caching the data, SNPNCA must not be asserted, so that the 82495XP does not leave the hit
line in an Exclusive state. Subsequent writes to
lines in this state do not appear on the bus, and stale
data would result in the cache which incorrectly asserted SNPNCA.
If SNPNCA is asserted on a noninvalidating snoop
request, the following outlines the behavior of the
cache for a snoop hit in each of the MESI states:
Modified The data is written to the bus, and the
line is placed in the Exclusive state
Exclusive The line remains in the Exclusive state
Shared
Invalid
The line remains in the Shared state
This is a cache miss. The line remains
Invalid.
If SNPNCA is NOT asserted on a noninvalidating
snoop request, an M, E, or S state hit line will be
placed in the Shared state. Again, M state causes a
write to the bus, Invalid lines remain Invalid.
SNPNCA is only sampled with SNPSTB#, which
may be qualified by ClK or SNPClK depending on
the snooping mode, and must meet setup and hold
times for the edge of this sampling. When
SNPSTB# is not being sampled, SNPNCA is a don't
care and need not follOW set-up and hold times.
7.61.3 RELATION TO OTHER SIGNALS
SNPNCA is sampled with SNPSTB#, which may be
qualified by SNPClK or ClK, depending on snooping mode. The assertion of SNPINV overrides
Mode
Strobed
Clocked
,
Synchronous
Snoop Address!
Attributes Sampled on:
falling edge of SNPSTB#
rising edge of SNPClK when
SNPSTB# sampled active
rising edge of ClK when
SNPSTB# sampled
SNPSTB# must be asserted to initiate asn60p request. Snoops are initiated by a bus master for all
memory accesses, to ensure that data residing in
other caches is flushed if modified and invalidated if
necessary.
SNPSTB # must be deasserted for at least one
SNPClK or ClK when clocked or synchronous
snooping mode (respectively) is used, in order to
rearm for the next snoop.
SNPSTB # can be asserted while a snoop is in progress, allowing one level of pipelining. However, the
reassertion of SNPSTB # while snooping is in progress must not occur until after SNPCYC#-preciseIy, after the falling edge ofSNPCYC# for strobed
and clocked modes, or in the clock after SNPCYC#
is active for synchronous mode. SNPSTB# must not
be asserted between the first and last BGT # of a
locked sequence. Similarly, SNPSTB # must not occur after the BGT # of the write through and before
the BGT # of the allocation when a Read-far-Ownership transaction is occurring.
SNPSTB # itself does not affect the cache contents
or states, but the snoop signals SNPINV and
SNPNCA, latched upon SNPSTB#, force various
changes in the cache on a snoop hit.
82495XP Cache Controller/82490XP Cache RAM
7.62.2 WHEN SAMPLED
SNPSTB# is sampled on every SNPCLK or CLK in
clocked or synchronous modes, and is sampled constantly in strobed mode. While a snoop is in progress, a new SNPSTB# is recognized as a new, possibly pipelined, snoop request. After the assertion of
a pipelined SNPSTB#, the SNPSTB# signal must
not be reasserted until after the next SNPCYC#.
SNPSTB# should always meet proper set-up and
hold times when operating in clocked or synchronous modes. When operating in strobed mode, it
must meet minimum active/inactive times to be
properly recognized in the next clock.
7.62.3 RELATION TO OTHER SIGNALS
SNPSTB# latches the following signals: SNPINV,
SNPNCA, MBAOE#, and MAOE#, and the address
on the MSET, MTAG, and MCFA pins. The address
which appears on the MSET, MTAG, and MCFA address pins is to be snooped in the 82495XP.
MAOE# acts as a qualifier for a snoop; if MAOE# is
active when sampled on a SNPSTB# assertion, the
snoop request is ignored. SNPINV and SNPNCA
provide the 82495XP with snoop attributes which affect the state of a snoop hit cache entry.
If MBAOE# is active during SNPSTB# assertion,
the 82495XP forces all bits in the subline address
(those address bits which MBAOE# controls) to 0
on a snoop write back for that snoop.
Snoops and memory accesses are interlocked, such
that after BGT# for a memory access has been issued, a SNPSTB# which is asserted will be latched,
with its address and attributes, but will not cause a
snoop until after SWEND# for that memory cycle.
After BGT# has been issued for a cycle, snoop
write-backs are delayed until after the CRDY# for
that cycle. Likewise, once a snoop is underway
(SNPCYC# active) BGT# is ignored until snoop
completion.
SNPSTB# must not be deasserted and reasserted
(specifically, cause a second falling edge) between
its initial recognition and SNPCYC#; i.e., SNPSTB#
must not be asserted before the SNPCYC# of the
previous SNPSTB#. In strobed and clocked modes,
SNPSTB# can be reasserted after the falling edge
of SNPCYC#; in synchronous mode, SNPSTB# can
be reasserted in the CLK after SNPCYC# is active.
This second assertion of SNPSTB#, after
SNPCYC#, can occur while the first snoop is still
progressing (SNPBSY# is active), allowing one level
of snoop pipelining. In this case, a third assertion of
SNPSTB# must not occur until after the SNPCYC#
for the second, pipelined snoop request.
SNPSTB# must not be asserted while the 82495XP
is executing a locked sequence (LOCK# active).
Specifically, SNPSTB# must not be asserted after
the BGT# for the first locked access and before the
BGT# of the last locked access.
Systems which support Read-for-Ownership must
not assert SNPSTB# between the BGT# of the
write through and the BGT# of the allocation during
a Read-for-Ownership operation.
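The reassertion rules above lend themselves to a simple sequence check, for instance in a memory bus controller test bench. The following is a minimal sketch under stated assumptions: the event names (`"STB"`, `"CYC"`, `"LOCK_FIRST_BGT"`, `"LOCK_LAST_BGT"`) and the function `check_snpstb_sequence` are hypothetical, and timing within a clock is not modelled.

```python
def check_snpstb_sequence(events):
    """Return the index of the first illegal event, or None if all are legal.

    "STB" is a SNPSTB# assertion and "CYC" is the SNPCYC# that answers the
    most recent strobe. A strobe is illegal while an earlier strobe is still
    waiting for its SNPCYC#, or inside a locked sequence.
    """
    awaiting_cyc = False   # a strobe has been latched but not yet looked up
    in_locked = False      # between first and last BGT# of a locked sequence
    for i, ev in enumerate(events):
        if ev == "STB":
            if awaiting_cyc or in_locked:
                return i
            awaiting_cyc = True
        elif ev == "CYC":
            if not awaiting_cyc:
                return i   # SNPCYC# without a pending strobe (model error)
            awaiting_cyc = False
        elif ev == "LOCK_FIRST_BGT":
            in_locked = True
        elif ev == "LOCK_LAST_BGT":
            in_locked = False
        else:
            raise ValueError(f"unknown event: {ev}")
    return None
```

A pipelined pair such as `["STB", "CYC", "STB", "CYC"]` passes, while `["STB", "STB"]` is flagged at index 1 because the second strobe arrives before the SNPCYC# of the first.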
7.63 SWEND#
Snoop Window End
Closes Snooping Window
Input to 82495XP (pin Q1) Cycle Progress Signal
Synchronous to ClK
7.63.1 SIGNAL DESCRIPTION
SWEND# is an input to the 82495XP that, when
asserted, closes the snooping window and causes
sampling of MWB/WT# and DRCTM#. Once
snooping of all other 82495XPs is complete,
DRCTM# and MWB/WT# can be determined.
Snoop response is blocked by the 82495XP between BGT# and SWEND# activation. Therefore,
the sooner SWEND# is asserted, the sooner snoop
responses can be determined.
All CPU-generated write cycles and cache read miss
cycles must cause a snoop on the memory bus.
SWEND# may be activated once snooping has
completed for these cycles. SWEND# activation
causes the 82495XP's internal tags to change state
for the current cycle (if necessary). DRCTM# and
MWB/WT# influence the state change decision.
SWEND# need only be activated for those cycles
which require the sampling of DRCTM# and
MWB/WT#.
If a cycle does not specifically require SWEND#,
and SWEND# is not returned, snooping is blocked
from BGT# to CRDY#. For this reason, it may be
more efficient to always return SWEND#.
7.63.2 WHEN SAMPLED
SWEND# is sampled by the 82495XP in the clock in
which KWEND# is sampled active, or after, for those cycles
that sample KWEND#. For cycles that do not sample
KWEND#, SWEND# is sampled with or after
BGT#. Once SWEND# is sampled active, it is ignored until KWEND# of the next cycle. If SWEND#
is not being sampled, it may violate setup and hold
times.
Snoop response is blocked between BGT# and
SWEND#. If a snoop is initiated between BGT#
and SWEND#, the MTHIT# and MHITM# response is given after SWEND# activation. Any subsequent snoop write back would begin after
CRDY#.
7.63.3 RELATION TO OTHER SIGNALS
SWEND# causes the sampling of MWB/WT# and
DRCTM#. SWEND# is sampled once KWEND# is
sampled active. BGT#, KWEND#, and SWEND#
may be asserted in the same clock.
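The effect of returning SWEND# on the length of the blocked snoop window can be illustrated with a few lines of code. This is a sketch, not a timing model: the function name `snoop_blocked_window` is hypothetical, and clocks are plain integers as in the timing examples of this document.

```python
def snoop_blocked_window(bgt, crdy, swend=None):
    """Return (first, last) clock bounding the window in which snoop
    response is blocked.

    The window opens at BGT#. It closes at SWEND# when SWEND# is returned,
    otherwise it stays open until CRDY#.
    """
    end = swend if swend is not None else crdy
    if end < bgt:
        raise ValueError("the window cannot close before BGT#")
    return bgt, end
```

With BGT# in clock 3 and CRDY# in clock 10, `snoop_blocked_window(3, 10, swend=7)` gives `(3, 7)`, whereas omitting SWEND# gives `(3, 10)`, which is why always returning SWEND# may be more efficient.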
SWEND# shares a pin with CFG1.
7.64 SYNC#
Sync
Synchronizes 82495XP TAG array with Main Memory
Input to 82495XP (04) Cache Synchronization Signal
Asynchronous
7.64.1 SIGNAL DESCRIPTION
SYNC# activation will cause the synchronization of the 82495XP and i860 XP CPU tag arrays with main memory. The 82495XP will flush all modified entries to memory. All valid tag entries will be kept, with modified [M] state lines becoming non-modified [E] state lines.
7.64.2 WHEN SAMPLED
SYNC# can be asserted at any time. The 82495XP will complete all outstanding cycles on the CPU and memory bus before beginning the SYNC process. The memory bus controller does not have to prevent SYNC# during locked cycles because the 82495XP will complete its locked cycle before the SYNC process will begin.
Once a SYNC operation has begun, the SYNC# signal is ignored until the operation completes. If RESET or FLUSH# is asserted while the SYNC operation is in progress, the SYNC operation will be aborted and the RESET or FLUSH immediately executed.
SYNC# is an asynchronous input. SYNC# must have a pulse width of 2 CLKs in order to guarantee 82495XP recognition.
7.64.3 RELATION TO OTHER SIGNALS
To initiate a SYNC, the 82495XP will complete all pending cycles and prohibit further ADS#s from occurring while a SYNC is in progress. The FSIOUT# output signal is used to indicate the start and end of the SYNC operation. It will become active when the SYNC# signal is internally recognized (all outstanding cycles have completed) and will de-activate when the SYNC operation has completed.
The memory bus controller supplies BRDY# to the CPU once the SYNC has completed. Once SYNC has begun, and FSIOUT# is active, all CADS#s and CRDY#s correspond to the write-backs caused by the SYNC operation.
The 82495XP can be snooped during SYNC cycles and the snooping protocols will be the same as that for any memory bus cycle.
7.65 TCK
Test Clock
Clock for the JTAG boundary scan tests
Input to the i860 XP CPU (pin 01) Test Signal
Input to the 82490XP (pin 3)
Input to the 82495XP (pin P3)
Synchronous
7.65.1 SIGNAL DESCRIPTION
TCK is an input to the i860 XP CPU, 82495XP and
82490XP and provides the clocking function required by the JTAG boundary scan feature. TCK is
used to clock state information and data into and out
of the component. State select information and data
are clocked into the component on the rising edge
of TCK on TMS and TDI, respectively. Data is
clocked out of the part on the falling edge of TCK on
TDO.
In addition to using TCK as a free running clock, it
may be stopped in a low, logic 0, state, indefinitely
as described in IEEE 1149.1. While TCK is stopped
in the low state, the boundary scan latches retain
their state.
When boundary scan is not used, TCK should be
tied low.
7.65.2 WHEN SAMPLED
TCK is a clock signal and is used as a reference for sampling other JTAG signals.
7.65.3 RELATION TO OTHER SIGNALS
On the rising edge of TCK, TMS and TDI are sampled. On the falling edge of TCK, TDO is driven.
7.66 TDI
Test Data Input
Receives serial test instructions and data
Input to the i860 XP CPU (pin S14) Test Signal
Input to the 82495XP (pin N3)
Input to the 82490XP (pin 2)
Synchronous to TCK
7.66.1 SIGNAL DESCRIPTION
TDI is the serial input used to shift JTAG instructions and data into the component. The shifting of instructions and data occurs during the SHIFT-IR and SHIFT-DR TAP controller states, respectively. These states are selected using the TMS signal as described in chapter 9.
An internal pull up resistor is provided on TDI to ensure a known logic state if an open circuit occurs on the TDI path. Note that when "1" is continuously shifted into the instruction register, the BYPASS instruction is selected.
7.66.2 WHEN SAMPLED
TDI is sampled on the rising edge of TCK, during the SHIFT-IR and the SHIFT-DR states. During all other TAP controller states, TDI is a "don't care".
7.66.3 RELATION TO OTHER SIGNALS
TDI is only sampled when TMS and TCK have been used to select the SHIFT-IR or SHIFT-DR states in the TAP controller.
For proper initialization of JTAG logic, TDI should be driven high, "1", for at least four TCK cycles following the rising edge of RESET.
7.67 TDO
Test Data Output
Outputs serial test instructions and data
Output from the i860 XP CPU (pin R10) Test Signal
Output from the 82495XP (pin C4)
Output from the 82490XP (pin 84)
Synchronous to TCK
7.67.1 SIGNAL DESCRIPTION
TDO is the serial output used to shift JTAG instructions and data out of the component. The shifting of instructions and data occurs during the SHIFT-IR and SHIFT-DR TAP controller states, respectively. These states are selected using the TMS signal as described in chapter 9.
When not in SHIFT-IR or SHIFT-DR state, TDO is driven to a high impedance state to allow connecting TDO of different devices in parallel.
7.67.2
TDO is driven on the falling edge of TCK during the SHIFT-IR and SHIFT-DR TAP controller states. At all other times TDO is driven to the high impedance state.
7.67.3
TDO is only driven when TMS and TCK have been used to select the SHIFT-IR or SHIFT-DR states in the TAP controller.
7.68 TMS
Test Mode Select
Controls testing by selecting mode of operation
Input to the i860 XP CPU Test Signal
Input to the 82495XP (pin P2)
Input to the 82490XP (pin 1)
Synchronous to TCK
7.68.1 SIGNAL DESCRIPTION
TMS is decoded by the JTAG TAP (Test Access Port) to select the operation of the test logic, as described in chapter 9.
To guarantee deterministic behavior of the TAP controller, TMS is provided with an internal pull-up resistor. If boundary scan is not used, TMS may be tied high or left unconnected.
7.68.2 WHEN SAMPLED
TMS is sampled on every rising edge of TCK.
7.68.3 RELATION TO OTHER SIGNALS
TMS is used to select the internal TAP states required to load boundary scan instructions to data on TDI.
For proper initialization of the JTAG logic, TMS should be driven high, "1", for at least four TCK cycles following the rising edge of RESET.
7.69 Vcc and Vss
Power and Ground Pins
See Tables 1.1 and 1.2 for locations.
7.70 WWOR#
Weak Write Ordering Mode
Enforces strong/weak write-ordering policy
Input to 82495XP (pin Q2) Configuration Signal
Synchronous to CLK
7.70.1 SIGNAL DESCRIPTION
When asserted during reset, the 82495XP enforces a weak write ordering policy. If WWOR# is deasserted during reset, the 82495XP enforces a strong write-ordering policy.
In a strong write-ordering mode, writes to the memory bus are forced to occur in the order in which they were posted by the CPU. In a weak write-ordering mode it is possible for:
1. A CPU posted write (A) to be waiting in an 82495XP/82490XP memory buffer.
2. A subsequent CPU write (B) to complete in the 82495XP/82490XP because it was a hit to M or E state.
3. A snoop hit to B to cause a write back of B before A is written.
In this scenario, B is written to memory before A is, and thus CPU writes have been reordered.
7.70.2 WHEN SAMPLED
WWOR# is sampled during reset as shown in Figure 7-1, with a setup time of 4 CPU clocks. WWOR# becomes MALE once FSIOUT# indicates that the 82495XP reset sequence has completed.
7.70.3 RELATION TO OTHER SIGNALS
WWOR# shares a pin with MALE.
8.0 BUS FUNCTIONAL DESCRIPTION AND TIMING
The 82495XP/82490XP cache core supports a wide variety of bus transfers to meet the needs of high performance systems. Bus transfers can be single cycle or multiple cycle, cacheable or non-cacheable, 64- or 128-bit (memory bus), and locked. To support multiprocessing systems there are cache back-invalidation, inquire, snooping, read for ownership, cache to cache transfers, and locked cycles.
This section begins with read cycles, both cacheable and non-cacheable. It moves on to write cycles, cacheable and non-cacheable. Snooping cycles are discussed next with an example of each snooping mode. The remaining sections describe special cycles: read for ownership, I/O, and locked cycles.
The cycles shown in this chapter are examples of various types of 82495XP/82490XP cycles. The purpose of these examples is to show signal relationships; they are not necessarily best case scenarios.
8.1 Read Cycles
8.1.1 READ HITS
Read Hit cycles are executed completely within the
CPU/Cache core, and will not be seen by the MBC.
Figure 8-1. Cacheable Read Miss with Clean Replacement
(Timing diagram over clocks 1 to 21. Signals shown: CLK, CADS#, ADDRESS, CW/R#, RDYSRC, MCACHE#, BGT#, CNA#, KWEND#, MKEN#, SWEND#, BRDY#, CRDY#; Clocked Memory Bus Mode: MCLK, MSEL#.)
8.1.2 CACHEABLE READ MISSES
8.1.2.1 Read Miss with Clean Replacement
Figure 8-1 illustrates CPU initiated Read cycles that miss the 82495XP/82490XP cache and replace a non-dirty (e.g., clean or empty) line in the cache. In such cycles, the 82495XP will instruct the MBC to perform a cache line-fill cycle on the memory bus. A cache line-fill is a read of a complete 82495XP/82490XP line from main memory. The line is then written into the 82490XP's array, and data transferred to the CPU as requested. If the line fetched from main memory replaces an 82495XP/82490XP cache line which is in valid unmodified state ([E] or [S]), then a back-invalidation cycle is performed on the CPU bus to guarantee that the replaced data is also removed from the CPU's first level cache, thus maintaining the inclusion property.
CACHE CONTROL SIGNALS:
The CPU initiates the read cycle to the 82495XP/82490XP cache where the cache tag state is looked up. Once the 82495XP determines the cycle to be a cache miss, it issues CADS# (clock 2) and the associated cycle control signals to the MBC (e.g., CW/R#, CM/IO#, CD/C#, RDYSRC, MCACHE#) in order to schedule the cache line-fill operation. MCACHE# is active, indicating that the read miss is potentially cacheable by the 82495XP; RDYSRC is active, indicating that the MBC must supply BRDY#s to the CPU cache core.
The memory bus address (MSET[10:0], MTAG[11:0], MCFA[6:0]) is valid with CADS# (clocks 2 and 13 for the two cycles in this example) and remains valid until after CNA# is sampled active by the 82495XP (clocks 5 and 16). MALE and MBALE may be used to hold the address as necessary.
The MBC arbitrates for the memory bus and returns BGT# asserted (clock 3), indicating that the cycle is guaranteed to complete on the memory bus. Once the 82495XP samples BGT# asserted, it must finish that cycle on the memory bus. Prior to this point, the cycle can be aborted by a snoop hit in the cache.
CNA# is asserted by the MBC (clock 4) to indicate that it is ready to schedule a new memory bus cycle. Note that after CNA# activation, cycle control signals are not guaranteed to be valid.
When the MBC has determined the cacheability attribute of the cycle, it drives the MKEN# signal accordingly. The MBC also drives the KWEND# signal at this time, indicating the end of the cacheability window. The 82495XP samples MKEN# during KWEND# (clock 5) to determine that the cycle is indeed cacheable.
The MBC asserts SWEND# when the snoop window ends on the memory bus. The 82495XP samples MWB/WT# and DRCTM# during SWEND# (clock 7) and updates the cache tag state according to the consistency protocol. The closure of the snoop window also enables the MBC to start providing the CPU with data that has been stored in the 82490XP's memory cycle buffer. The MBC supplies BRDY#s to the CPU (clocks 7-10).
The first cycle ends when CRDY# is driven active by the MBC (clock 10). It is at this time that the data in the 82490XP's memory cycle buffers is loaded into the cache SRAM.
The 82495XP issues a new CADS# in clock 13 which also misses the 82495XP/82490XP cache.
Note that once the cycle progress signals (BGT#, CNA#, KWEND#, SWEND#) of a cycle are sampled asserted, the 82495XP ignores them until the CRDY# of that cycle. The 82495XP does not pipeline the cycle progress signals (i.e., it will not sample them again until after CRDY# of the current memory bus cycle).
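The "sampled once, then ignored until CRDY#" behavior of the cycle progress signals can be sketched as a small tracker, for example to sanity-check an MBC design in simulation. The class name `CycleProgress` and its `sample` method are hypothetical; only the four progress signals and CRDY# named above are modelled.

```python
PROGRESS_SIGNALS = ("BGT#", "CNA#", "KWEND#", "SWEND#")


class CycleProgress:
    """Tracks which progress signals have been seen in the current cycle."""

    def __init__(self):
        self.seen = set()

    def sample(self, signal):
        """Return True if the assertion is recognized, False if ignored."""
        if signal == "CRDY#":
            self.seen.clear()        # cycle ends; signals are sampled again
            return True
        if signal not in PROGRESS_SIGNALS:
            raise ValueError(f"not a cycle progress signal: {signal}")
        if signal in self.seen:
            return False             # already sampled asserted this cycle
        self.seen.add(signal)
        return True
```

Feeding the first cycle of the example (BGT#, CNA#, KWEND#, SWEND#, then CRDY#) recognizes every signal once; a second BGT# arriving before CRDY# would return `False`.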
MEMORY BUS SIGNALS:
The memory address latch enables (MALE and MBALE) may remain asserted by the MBC to place the address latches in flow through mode. If the 82495XP is the current bus master, the memory address output enables (MAOE# and MBAOE#) should be asserted by the MBC. MDOE# must be inactive to allow the data pins to be used as inputs. Some time after the address has been driven onto the memory bus, data will be supplied from the DRAM (main memory) to the 82490XP cache SRAM.
For Clocked Memory Bus Mode, MSEL# is driven active by the MBC (clock 4) to allow sampling of MBRDY# and to latch MZBT# for the transfer. MZBT# is sampled on all MCLK edges where MSEL# is inactive. Once MSEL# is sampled active by the 82495XP, the value of MZBT# sampled on the prior MCLK is used for the next transfer.
MBRDY# is driven active by the MBC in clocks 4 to 6 to cause the memory burst counter to be incremented and data to be placed into the 82490XP cache memory cycle buffers. The MBC drives MEOC# asserted (clock 7) to end the current cycle on the memory bus and switch memory cycle buffers for the new cycle. MZBT# is latched at this time (when MEOC# is sampled asserted and MSEL# remains low) for the next transfer.
the MBC (e.g., CW/R#, CM/IO#, CD/C#, RDYSRC,
MCACHE#) in order to schedule the cache line-fill
operation. MCACHE# is active, indicating that the
read miss is potentially cacheable by the 82495XP;
RDYSRC is active, indicating that the MBC must
supply BRDY#s to the CPU cache core.
MBRDY # is driven active by the MBC in clocks 15
to 17 to read data into the 82490XP cache memory
cycle buffers. The MBC asserts MEOC # (clock 18)
to end the second read miss cycle on the memory
bus and switch the memory cycle buffers for a new
cycle.
The memory bus address (MSET[10:0], MTAG[11:0], MCFA[6:0]) is valid with CADS#
(clocks 1 and 5 for the two cycles in this example)
and remains valid until after CNA# is sampled active
by the 82495XP (clocks 4 and 10). MALE and MBALE
may be used to hold the address as necessary.
For Strobed Memory Bus Mode, MSEL# is driven
active by the MBC (clock 4) to allow MISTB operation and to latch MZBT# (on the falling edge of
MSEL#) for the transfer. MISTB is toggled in clock ...
Figure 8-7. Synchronous Snooping Mode
(Timing diagram. Signals shown include CADS#, SNPADS#, SNPCYC#, SNPBSY#, SNPSTB#, SNPINV, MDOE#, MDATA; Strobed Memory Bus Mode: MSEL#, MEOC#, MxSTB, MZBT#, MFRZ#, MDOE#, MDATA.)
CACHE CONTROL SIGNALS:
In clock 1 SNPSTB# is asserted by the MBC, indicating to the 82495XP a request for snooping. The 82495XP samples MAOE# (it must be inactive) in order to recognize the snoop request. It is latched together with the snoop address (MSET[0:10], MTAG[0:11], MCFA[0:6]), SNPINV, MBAOE#, and SNPNCA on the 82495XP's CLK during SNPSTB# assertion. The tag look-up is done immediately after SNPSTB# is sampled active since snoop operations have the highest priority in the cache tag state arbiter. The 82495XP issues SNPCYC# (clock 2), indicating that the snoop look-up is in progress. The results of the look-up are driven to the memory bus via MTHIT# and MHITM# in the next clock after SNPCYC#. Since the snoop hit a modified line, both signals are asserted (clock 3). SNPBSY# is also issued to indicate that the 82495XP is busy with CPU back-invalidations, the 82490XP's snoop buffer is full, or a write back is to follow. The 82495XP will accept snoops only when SNPBSY# is inactive.
Simultaneously with the memory bus activity due to the snoop request, the CPU initiates a read miss cycle. The 82495XP issues a memory bus request (CADS#), CDTS#, and cycle control signals to the MBC in clock 3. The MBC must wait for the pending snoop cycle to complete on the memory bus prior to servicing this read miss cycle.
The memory bus address (MSET[10:0], MTAG[11:0], MCFA[6:0]) is not valid until MAOE# goes active after CRDY# of the snoop write back cycle is sampled active by the 82495XP and the CADS# is reissued (clock 13).
In clock 4 the 82495XP issues SNPADS# and cycle control signals to the MBC, indicating a request to flush a modified line out of the cache. SNPADS# activation causes the MBC to abort the pending read miss cycle. It is the 82495XP's responsibility to re-issue the aborted cycle after the completion of the write back, since BGT# was not asserted by the MBC.
Data is loaded into the 82490XP's snoop buffer. Since SNPINV was sampled asserted by the 82495XP (clock 1) during SNPSTB# assertion, it back-invalidated the CPU's first level cache.
The 82495XP asserts CDTS# (clock 8) indicating to the MBC that data is available in the snoop buffer.
When the MBC completes the write back cycle on the memory bus, it activates CRDY# to the 82495XP/82490XP cache. At this time, the 82495XP deasserts SNPBSY# (clock 13) and re-issues the aborted read miss cycle (clock 13) by asserting CADS# and CDTS#.
MEMORY BUS SIGNALS:
For Clocked Memory Bus Mode, the memory data output enable (MDOE#) is not activated by the MBC to allow the memory data pins to be used as inputs. MSEL# is driven active by the MBC (clock 4) to allow sampling of MBRDY# and to latch MZBT# for the read miss transfer. MZBT# is sampled on all MCLK rising edges where MSEL# is inactive. Once MSEL# is sampled active by the 82495XP, the value of MZBT# sampled on the prior MCLK is used for the next transfer.
Since the read miss cycle is aborted due to the snoop hit to a modified line (requires a write back cycle), no MEOC# is given. MSEL# is deasserted by the MBC (clock 6) and reasserted (clock 8) to allow latching of MZBT# for the snoop write back cycle and sampling of MBRDY# for that cycle. MFRZ# is also sampled at this time.
The memory data output enable (MDOE#) signal is driven active by the MBC (clock 7) to drive the memory data outputs.
MBRDY# is driven active by the MBC in clocks 10 to 12 to cause the memory burst counter to be incremented and data to be written from the 82490XP cache snoop buffers. The MBC drives MEOC# asserted (clock 13) to end the write back cycle on the memory bus and switch memory cycle buffers for the new cycle. MZBT# and MFRZ# are sampled and latched at this time for the next data transfer.
MDOE# is deasserted by the MBC (clock 14) to allow the memory data pins to be used as inputs for the reissued read cycle.
For Strobed Memory Bus Mode, the memory data output enable (MDOE#) has not been asserted by the MBC to allow the memory data pins to be used as inputs for the read miss cycle.
MSEL# is asserted by the MBC (clock 4) to allow sampling of MISTB and latch MZBT# (on the falling edge of MSEL#) for the read miss transfer.
Since the read miss cycle is aborted due to the snoop hit to a modified line (requires a write back cycle), no MEOC# is given. MSEL# is deasserted by the MBC (clock 5) and reasserted (clock 6) to allow latching of MZBT# for the snoop write back cycle and sampling of MOSTB for that cycle. MFRZ# is also sampled at this time.
MOSTB is toggled in clocks 11 to 13 to cause the memory burst counter to be incremented, and data to be read from the 82490XP cache memory cycle buffers. Note: MOSTB latches the memory bus data on both the rising and falling edges. The MBC drives MEOC# asserted (clock 14) to end the snoop write back cycle on the memory bus and switch memory cycle buffers for the new cycle. MZBT# and MFRZ# for the next cycle are latched at this time on the falling edge of MEOC#.
MDOE# is deasserted by the MBC (clock 14) to allow the memory data pins to be used as inputs for the reissued read miss cycle.
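The interaction between a pending read miss and a snoop hit to a modified line depends on whether BGT# has already been returned. The sketch below illustrates the resulting order of memory bus cycles; the function name `memory_bus_order` and the string labels are hypothetical, and the case where BGT# has already been asserted follows the interlock rule in the SNPSTB# signal description (snoop write-backs are delayed until after CRDY# of that cycle).

```python
def memory_bus_order(bgt_asserted, snoop_hit_modified):
    """Return the order in which cycles appear on the memory bus.

    bgt_asserted: BGT# was returned for the read miss before the snoop
        write-back was requested.
    snoop_hit_modified: the snoop hit a Modified line, so a write-back
        is required.
    """
    if not snoop_hit_modified:
        return ["read miss"]                      # no write-back needed
    if bgt_asserted:
        # The read miss is committed; the write-back waits for its CRDY#.
        return ["read miss", "snoop write-back"]
    # SNPADS# aborts the pending cycle; it is reissued after the write-back.
    return ["snoop write-back", "read miss (reissued)"]
```

The example in this section corresponds to `memory_bus_order(False, True)`: the write-back goes first and the 82495XP reissues CADS# afterwards.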
8.3.2 CLOCKED SNOOPING MODE
Figure 8-8 illustrates a CPU initiated Read cycle
which misses the 82495XP/82490XP cache and the
subsequent line fill replaces non-dirty data (e.g., clean
or empty). Simultaneous with the read request to the
MBC, that device initiates a snoop to the 82495XP
which misses that line in the cache. The snoop is the
result of a write cycle on the memory bus by some
other cache core; therefore, the snoop invalidation signal (SNPINV) is asserted to this 82495XP. This example assumes Clocked Snooping Mode (i.e., the requests for snoops are done via SNPSTB# from the
MBC, sampled on the MBC's SNPCLK).
CACHE CONTROL SIGNALS:
The CPU initiates the read cycle to the
82495XP/82490XP cache where the cache tag
state is looked up. Once the 82495XP determines
the cycle to be a cache miss, it issues CADS#
(clock 1) and the associated cycle control signals to
the MBC (e.g., CW/R#, CM/IO#, CD/C#, RDYSRC,
MCACHE#) in order to schedule the cache line-fill
operation. MCACHE# is active, indicating that the
read miss is potentially cacheable by the 82495XP;
RDYSRC is active, indicating that the MBC must
supply BRDY#s to the CPU cache core.
In clock 3, SNPSTB# is asserted by the MBC, indicating to the 82495XP a request for snooping. MAOE# is deasserted to allow the forthcoming
snoop (the 82495XP will not recognize the snoop if
MAOE# is active). It is latched together with the
snoop address (MSET[0:10], MTAG[0:11],
MCFA[0:6]), SNPINV, MBAOE#, and SNPNCA on
the MBC's SNPCLK rising edge during SNPSTB#
assertion. SNPINV is asserted from the MBC since
the cache core which initiated the snoop issued a
write cycle on the memory bus. If the response of
the snoop to this 82495XP was a cache hit, the contents would no longer be valid due to that write.
Following synchronization to the 82495XP CLK, the 82495XP
issues SNPCYC# (clock 5), indicating that the
snoop look-up is in progress. The results of the look-up are driven to the memory bus via MTHIT# and
MHITM# in the next clock after SNPCYC#. Since
the snoop was a miss in the cache, both signals are
inactive (clock 6). Note that SNPBSY# will not be
asserted since the snoop was a miss to this cache.
The snoop from another cache is complete at this
point, and the read miss cycle will continue.
The MBC asserts MAOE# to allow this 82495XP to
drive its address on the memory bus in order to complete the read miss cycle. The memory bus address
(MSET[10:0], MTAG[11:0], MCFA[6:0]) is valid after
MAOE# assertion (clock 6 for the read cycle in
this example) and remains valid until after CNA# is
sampled active by the 82495XP (clock 8). MALE and
MBALE may be used to hold the address as necessary.
The MBC arbitrates for the memory bus and returns
BGT# asserted (clock 6), indicating that the cycle is
guaranteed to complete on the memory bus. Once
the 82495XP samples BGT# asserted, it must finish
that cycle on the memory bus. Prior to this point, the
cycle can be aborted by a snoop hit from another
cache.
CNA# is asserted by the MBC (clock 7) to indicate
that it is ready to schedule a new memory bus cycle.
Note that after CNA# activation, cycle control signals are not guaranteed to be valid.
When the MBC has determined the cacheability attribute of the cycle, it drives the MKEN# signal accordingly. The MBC also drives the KWEND# signal
at this time, indicating the end of the cacheability
window. The 82495XP samples MKEN# during
KWEND# (clock 7) to determine that the cycle is
indeed cacheable.
The MBC asserts SWEND# when the snoop window ends on the memory bus. The 82495XP samples MWB/WT# during SWEND# (clock 9) and updates the cache tag state according to the consistency protocol. The closure of the snoop window also
enables the MBC to start providing the CPU with
data that has been stored in the 82490XP's memory
cycle buffer. The MBC supplies BRDY#s to the CPU
(clocks 9-12).
The read miss cycle ends when CRDY# is driven
active by the MBC (clock 12). It is at this time that
the data in the 82490XP's memory cycle buffers is
loaded into the cache SRAM.
82495XP Cache Controller/82490XP Cache RAM
[Timing diagram. Signals shown: CLK, CADS#, ADDRESS, CW/R#, RDYSRC, MCACHE#, BGT#, CNA#, BRDY#, CRDY#, KWEND#, MKEN#, SWEND#, MTHIT#, MHITM#, SNPCYC#, SNPCLK, SNPSTB#, SNPINV, MAOE#. Clocked memory bus mode: MCLK, MSEL#, MEOC#, MBRDY#, MZBT#, MDATA. Strobed memory bus mode: MSEL#, MEOC#, MxSTB, MZBT#, MDATA.]
Figure 8-8. Clocked Snooping Mode
MEMORY BUS SIGNALS:
CACHE CONTROL SIGNALS:
The memory address latch enables (MALE and
MBALE) may remain asserted by the MBC to place
the address latches in flow through mode. If the
82495XP is the current bus master, the memory address output enables (MAOE# and MBAOE#)
should be asserted by the MBC. (Note the use of
MAOE# for snooping at the beginning of the cache
control signals section.) MDOE# must be inactive to
allow the data pins to be used as inputs.
In clock 1 (totally asynchronous to any clock)
SNPSTB# is asserted by the MBC, indicating to the
82495XP a request for snooping. The 82495XP
samples MAOE# (it must be inactive) in order to
recognize the snoop request. It is latched together
with the snoop address (MSET[0:10], MTAG[0:11],
MCFA[0:6]), SNPINV, MBAOE#, and SNPNCA on
the falling edge of SNPSTB#. The 82495XP issues
SNPCYC# (clock 3), indicating that the snoop look-up is in progress. The results of the look-up are driven to the memory bus via MTHIT# and MHITM# in
the next clock after SNPCYC#. Since the snoop hit
a modified line, both signals are asserted (clock 4).
SNPBSY# is also issued to indicate that the
82495XP is busy with CPU back-invalidations, the
82490XP's snoop buffer is full, or a write back is to
follow. The 82495XP will accept snoops only when
SNPBSY# is inactive.
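The snoop responses described in these examples can be summarized as a small lookup. The following Python sketch is illustrative only and is not taken from the chipset documentation. Signals are modeled as active low (0 means asserted). The function name is hypothetical, and the "hit unmodified" case (MTHIT# asserted alone) is an assumption; the text here describes only the miss case and the hit-to-[M] case.

```python
def decode_snoop_result(mthit_n: int, mhitm_n: int) -> str:
    """Classify a snoop look-up from the active-low MTHIT# and MHITM# pins."""
    hit = (mthit_n == 0)
    modified = (mhitm_n == 0)
    if hit and modified:
        return "hit modified"    # both asserted: a write back follows
    if hit:
        return "hit unmodified"  # assumption: MTHIT# alone means a clean hit
    if modified:
        return "invalid"         # MHITM# without MTHIT# is not described
    return "miss"                # both inactive


print(decode_snoop_result(1, 1))  # snoop miss, as in the clocked example
print(decode_snoop_result(0, 0))  # hit to [M], as in the strobed example
```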
Some time after the address has been driven onto
the memory bus, data will be supplied from the
DRAM (main memory) to the 82490XP cache
SRAM.
For Clocked Memory Bus Mode, MSEL# is driven
active by the MBC (clock 6) to allow sampling of
MBRDY# and to latch MZBT# for the transfer.
MZBT# is sampled on all MCLK edges where
MSEL# is inactive. Once MSEL# is sampled active
by the 82495XP, the value of MZBT# sampled on
the prior MCLK is used for the next transfer.
MBRDY# is driven active by the MBC in clocks 7 to
9 to cause the memory burst counter to be incremented and data to be placed into the 82490XP
cache memory cycle buffers. The MBC drives
MEOC# asserted (clock 10) to end the current cycle
on the memory bus and switch memory cycle buffers
for the new cycle. MZBT# is sampled at this time
(when MEOC# is sampled asserted and MSEL# remains low) for the next transfer.
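The MZBT# sampling rule for clocked memory bus mode can be sketched as follows. This is a minimal Python model under stated assumptions, not a description of the device internals: each tuple stands for one MCLK edge, signals are active low (0 means asserted), and the function name `latched_mzbt` is hypothetical.

```python
def latched_mzbt(samples):
    """Return the MZBT# value used for the transfer, or None.

    samples: iterable of (msel_n, mzbt_n) pairs, one per MCLK edge.
    MZBT# is tracked on every edge where MSEL# is inactive (1). On the
    first edge where MSEL# is sampled active (0), the value captured on
    the prior MCLK edge is the one that applies.
    """
    last = None
    for msel_n, mzbt_n in samples:
        if msel_n == 1:
            last = mzbt_n      # MSEL# inactive: keep sampling MZBT#
        else:
            return last        # MSEL# active: use the prior sample
    return None                # MSEL# never went active


# MZBT# is high on the edge before MSEL# goes active, so 1 is latched.
print(latched_mzbt([(1, 0), (1, 1), (0, 0), (0, 1)]))
```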
For Strobed Memory Bus Mode, MSEL# is driven
active by the MBC (clock 6) to allow MISTB operation and to latch MZBT# (on the falling edge of
MSEL#) for the transfer. MISTB is toggled in clocks
8 to 10 to cause the memory burst counter to be
incremented and data to be placed into the
82490XP cache memory cycle buffers. Note: MISTB
latches the memory bus data on both the rising and
falling edges. The MBC drives MEOC# asserted
(clock 11) to end the current cycle on the memory
bus and switch memory cycle buffers for the new
cycle. MZBT# for the next cycle is sampled at this
time on the falling edge of MEOC#.
8.3.3 STROBED SNOOPING MODE (HIT TO [M] LINE)
Figure 8-9 illustrates a snoop hit to a dirty line sequence occurring simultaneously with a CPU initiated read miss cycle. This example assumes strobed
snooping mode (i.e., snoop requests are initiated
on the falling edge of SNPSTB#).
Simultaneously with the memory bus activity due to
the snoop request, the CPU initiates a read miss cycle. The 82495XP issues a memory bus request
(CADS#), CDTS#, and cycle control signals to the
MBC in clock 1. The MBC must wait for the pending
snoop cycle to complete on the memory bus prior to
servicing this read miss cycle.
The memory bus address (MSET[10:0],
MTAG[11:0], MCFA[6:0]) is not valid until MAOE#
goes active after CRDY# of the snoop write back
cycle is sampled active by the 82495XP and the
CADS# is reissued (clock 15).
In clock 5 the 82495XP issues SNPADS# and cycle
control signals to the MBC, indicating a request to
flush a modified line out of the cache. SNPADS#
activation causes the MBC to abort the pending read
miss cycle. It is the 82495XP's responsibility to re-issue the aborted cycle after the completion of the
write back, since BGT# was not asserted by the
MBC.
Data is loaded into the 82490XP's snoop buffer.
Since SNPINV was sampled asserted by the
82495XP (clock 1) during SNPSTB# assertion, it
back-invalidated the CPU's first-level cache.
The 82495XP asserts CDTS# (clock 9), indicating to
the MBC that data is available in the snoop buffer.
When the MBC completes the write back cycle on the
memory bus, it activates CRDY# to the
82495XP/82490XP cache. At this time, the
82495XP deasserts SNPBSY# (clock 15) and re-issues the aborted read miss cycle by asserting
CADS# and CDTS#.
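The abort-and-reissue rule in this example turns on whether BGT# was returned before the snoop write back was requested. The following Python sketch is a hedged illustration of that decision only. The function name and the string results are hypothetical, and treating a BGT# and SNPADS# that fall in the same clock as an abort is an assumption that the text does not address.

```python
from typing import Optional


def cycle_fate(bgt_clock: Optional[int], snpads_clock: Optional[int]) -> str:
    """Decide whether a pending CADS# cycle survives a snoop write back.

    bgt_clock:    clock in which the MBC returned BGT#, or None if never.
    snpads_clock: clock in which the 82495XP issued SNPADS#, or None.
    """
    if snpads_clock is None:
        return "committed" if bgt_clock is not None else "pending"
    if bgt_clock is not None and bgt_clock < snpads_clock:
        return "committed"  # BGT# came first: the cycle must finish
    return "aborted"        # no BGT# yet: reissue CADS# after CRDY#


# In the strobed snooping example BGT# is never returned for the first
# CADS#, and SNPADS# is issued in clock 5, so the read miss is aborted.
print(cycle_fate(None, 5))
```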
[Timing diagram (partial). Signals shown: CLK, CADS#, ADDRESS, CDTS#, CW/R#, RDYSRC, MCACHE#, BGT#, CRDY#, SNPADS#, SNPCYC#, MTHIT#, MHITM#, SNPBSY#. Annotations: CADS# aborted, CADS# reissued.]
and data to be placed into the 82490XP cache memory cycle buffers. During the line fill, the 82490XP
will merge the data from the write through buffer with
the incoming data from either main memory or another cache (if that line was a write hit to [M] in
another cache).
The MBC drives MEOC# asserted (clock 11) to end
the allocation cycle on the memory bus and switch
memory cycle buffers for the new cycle. MZBT# is
sampled at this time for the next data transfer.
8.5 I/O Cycles
Figure 8-12 illustrates CPU initiated I/O cycles, both
read and write. I/O writes are the only write cycles
not posted by the 82495XP/82490XP cache (i.e., the
cycle is not fully acknowledged to the CPU until it
has completed on the memory bus).
CACHE CONTROL SIGNALS:
The CPU initiates an I/O write cycle to the
82495XP/82490XP. The 82495XP then issues
CADS# and CDTS# (clock 1) and the associated
cycle control signals to the MBC (e.g., CW/R#, CM/IO#,
CD/C#, RDYSRC, MCACHE#). MCACHE# is
not active, indicating that the cycle is not cacheable;
RDYSRC is active, indicating that the MBC must
supply BRDY#s to the CPU/Cache core.
The memory bus address (MSET[10:0],
MTAG[11:0], MCFA[6:0]) is valid with CADS#
(clocks 1 and 10 for the two cycles in this example)
and remains valid until after CNA# is sampled active
by the 82495XP (clocks 6 and 17). MALE and MBALE may be used to hold the address as necessary.
The MBC arbitrates for the memory bus and returns
BGT# asserted (clock 2) for the I/O write cycle, indicating that the cycle is guaranteed to complete on
the memory bus. Once the 82495XP samples BGT#
asserted, it must finish that cycle on the memory
bus. Prior to this point, the cycle can be aborted by a
snoop hit from another cache.
CNA# for the write cycle is asserted by the MBC
(clock 5) to indicate that it is ready to schedule a
new memory bus cycle. Note that SWEND# and
KWEND# are not needed for I/O cycles since they
are not cacheable.
The MBC asserts BRDY# in clock 7 to complete the
I/O write cycle from the CPU, and CRDY# in clock 8
to complete the cycle on the memory bus from the
82495XP/82490XP cache.
[Timing diagram, clocks 1 to 18. Signals shown: RDYSRC, MCACHE#, BGT#, KWEND#, SWEND#, CNA#, BRDY#, CRDY#. Clocked memory bus mode: MCLK, MSEL#, MEOC#, MBRDY#, MZBT#, MDOE#, MDATA, MFRZ#. Strobed memory bus mode: MSEL#, MEOC#, MxSTB, MZBT#, MFRZ#, MDOE#, MDATA.]
Figure 8-12. I/O Write and Read Cycles
A new CADS# is issued from the 82495XP in clock
10 for an I/O read cycle, along with the associated
cycle control signals. MCACHE# is again not active,
and RDYSRC is again active.
The MBC returns BGT# asserted right away (clock
11). The 82495XP can pipeline I/O cycles, but does
not for the I/O read in this example.
Upon completing the access on the memory bus,
the MBC activates BRDY# (clock 17) and CRDY#
(clock 16). Note that BRDY# of a cycle may come
before (as in the I/O write cycle of this example),
with, or after the CRDY# of the same cycle.
For Strobed Memory Bus Mode, the memory data
output enable signal (MDOE#) has been asserted
by the MBC to drive the memory data outputs.
MEOC# is asserted by the MBC (clock 5) to latch
MZBT# for the I/O write transfer (on MEOC# falling
edge), and end that cycle on the memory bus
(MOSTB is not necessary since this example shows
a single transfer cycle). MZBT# is driven high by the
MBC in order to force the write cycle to begin with
the correct burst address. MFRZ# is also sampled
here (it need not be active since the cycle is not
potentially allocatable).
For the I/O read cycle, MDOE# is deasserted (clock
10) by the MBC to allow the data pins to be used as
inputs.
MEMORY BUS SIGNALS:
The memory address latch enables (MALE and
MBALE) may remain asserted by the MBC to place
the address latches in flow through mode. If the
82495XP is the current bus master, the memory address output enables (MAOE# and MBAOE#)
should be asserted by the MBC.
For Clocked Memory Bus Mode, the memory data
output enable signal (MDOE#) is asserted by the
MBC in clock 3 to drive the memory data outputs.
MEOC# is asserted by the MBC (clock 5) to latch
MZBT# for the I/O write transfer, and end that cycle
on the memory bus (MBRDY# is not necessary
since this example shows a single transfer cycle).
MZBT# is driven high by the MBC in order to force
the write cycle to begin with the correct burst address. MFRZ# is also sampled here (it need not be
active since the cycle is not potentially allocatable).
MSEL# is driven active by the MBC (clock 10) to
allow operation of MISTB and to latch MZBT# for
the transfer (on MSEL# falling edge). Again,
MZBT# is driven high by the MBC to force the transfer to begin with the correct burst address.
The MBC toggles MISTB (clock 15) to cause the
memory burst counter to be incremented and data to
be placed into the 82490XP cache memory cycle
buffers for the I/O read cycle. Note: MISTB latches
the memory bus data on both the rising and falling
edges. The MBC drives MEOC# asserted (clock 16)
to end the read cycle on the memory bus and switch
memory cycle buffers for a new cycle. MZBT# for
the next transfer is latched at this time (on the falling
edge of MEOC#).
8.6 LOCKed Cycles
For the I/O read cycle, MDOE# is deasserted (clock
12) by the MBC to allow the data pins to be used as
inputs.
MSEL# is driven active by the MBC (clock 12) to
allow sampling of MBRDY# and to latch MZBT# for
the transfer. MZBT# is sampled on all MCLK edges
where MSEL# is inactive. Once MSEL# is sampled
active by the 82495XP, the value of MZBT# sampled on the prior MCLK is used for the next transfer.
Again, MZBT# is driven high by the MBC to force
the transfer to begin with the correct burst address.
The MBC asserts MBRDY# (clock 14) to cause the
memory burst counter to be incremented and data to
be placed into the 82490XP cache memory cycle
buffers. The MBC drives MEOC# asserted (clock
15) to end the read cycle on the memory bus and
switch memory cycle buffers for a new cycle.
MZBT# for the next transfer is latched at this time.
8.6.1 CPU READ MODIFY WRITE CYCLES
The 82495XP provides a facility to allow atomic accesses requested by the CPU (via CPU LOCK# activation) through the 82495XP KLOCK# signal. Figure 8-13 illustrates two back-to-back CPU initiated
Locked read-modify-write cycles. KLOCK# activation indicates to the MBC that the memory bus
should not be released between the KLOCKed cycles. KLOCK# will remain asserted from the beginning of the first cycle (with CADS#) until one clock
after the CADS# of the last cycle. The 82495XP does
not distinguish between back-to-back locked operations and will not open an arbitration window (deassert KLOCK#) between them. It is the responsibility
of the MBC to distinguish between the multiple RMW
sequences, if it is so desired.
[Timing diagram. Signals shown: CLK, CADS#, ADDRESS, CW/R#, RDYSRC, BGT#, CNA#, KLOCK#, BRDY#, CRDY#. Clocked memory bus mode: MCLK, MSEL#, MEOC#, MBRDY#, MZBT#, MFRZ#, MDOE#, MDATA. Strobed memory bus mode: MSEL#, MEOC#, MxSTB, MZBT#, MFRZ#, MDOE#, MDATA.]
Figure 8-13. LOCKed Read-Modify-Write Cycles
The 82495XP issues a request for a memory bus
access (CADS#) for every locked cycle (read or
write), regardless of whether it hits the cache tag state.
Locked read cycles are treated by the 82495XP as
cache misses, and, if the line is in the [M] state, the
82495XP ignores the data on the memory bus and
uses the data in the 82490XP array. Locked write
cycles are treated as write through, and the tag state
does not change even if the line is in the 82490XP
array.
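The locked-cycle handling just described can be written out as a small decision function. This Python sketch is illustrative only; the function name, the dictionary keys, and the single-letter tag state strings are hypothetical and are not part of the chipset interface.

```python
def locked_cycle_plan(is_write: bool, tag_state: str) -> dict:
    """Summarize how a locked cycle is handled for a given tag state.

    tag_state is a label such as "M" (modified) or "I" (invalid).
    """
    plan = {"issue_cads": True}  # every locked cycle goes to the memory bus
    if is_write:
        plan["write_through"] = True
        plan["tag_state_after"] = tag_state  # the tag state does not change
    else:
        # Locked reads are treated as misses, but a line in [M] supplies
        # the data from the 82490XP array rather than from the memory bus.
        plan["data_source"] = "cache array" if tag_state == "M" else "memory bus"
    return plan


print(locked_cycle_plan(False, "M"))
print(locked_cycle_plan(True, "E"))
```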
CACHE CONTROL SIGNALS:
The CPU initiates a Locked read cycle to the
82495XP/82490XP cache where, due to the assertion of CPU LOCK#, it assumes a cache miss and
issues CADS# to the MBC (clock 1) along with the
associated cycle control signals (e.g., CW/R#, CM/IO#,
CD/C#, RDYSRC, MCACHE#). MCACHE# is
never asserted for LOCKed cycles; RDYSRC is active, indicating that the MBC must supply BRDY# to
the CPU/Cache core.
The memory bus address (MSET[10:0],
MTAG[11:0], MCFA[6:0]) is valid with CADS#
(clocks 1 and 5, then 7 and 11 for the two locked
RMW sequences in this example) and remains valid
until after CNA# is sampled active by the 82495XP
(clocks 3 and 7, then 9 and 13). MALE and MBALE
may be used to hold the address as necessary.
The MBC arbitrates for the memory bus and returns
BGT# asserted (clock 2), indicating that the cycle is
guaranteed to complete on the memory bus. Once
the 82495XP samples BGT# asserted, it must finish
that cycle on the memory bus. Prior to this point, the
cycle can be aborted by a snoop hit from another
cache.
CNA# for the read cycle is also asserted by the
MBC (clock 2) to indicate that it may schedule a new
memory bus cycle. Note that the cycle control signals are not guaranteed to be valid after CNA# activation.
The MBC asserts BRDY# to the CPU/Cache core
in clock 4. CRDY# for the locked read cycle is asserted to the 82495XP/82490XP from the MBC
(clock 5) to load the data stored in the 82490XP's
memory cycle buffers into the cache array. If the
read was to a dirty line, the 82495XP
ignores the data in the memory cycle buffers and uses the data in the cache array.
Locked sequences always end in a write cycle; no
new CPU initiated cycles may be inserted between
the Locked read and Locked write cycles. Therefore,
the 82495XP issues a new memory cycle request
(CADS# in clock 5) for the Locked write as soon as
it completes the Locked read cycle. The cycle control signals are also valid at this time. RDYSRC is not
active, indicating that the 82495XP will supply
BRDY# to the CPU.
The locked write cycle is posted like any other memory write cycle.
In this example, the CPU initiates a second read-modify-write cycle immediately. KLOCK# is not
deasserted between the back-to-back locked sequences since the CPU LOCK# remains asserted. If
snooping is required between these cycles, it is the
MBC's responsibility to predict this boundary and allow snooping. The 82495XP issues a memory bus
request (CADS#) in clock 7 for the second locked
read cycle, along with the new cycle control signals.
The second locked RMW sequence repeats the actions of the first. Its purpose in this example is to
demonstrate that an arbitration window may not
open between locked sequences if they follow one
another with no idle or non-locked cycles between
them.
MEMORY BUS SIGNALS:
The memory address latch enables (MALE and
MBALE) may remain asserted by the MBC to place
the address latches in flow through mode. If the
82495XP is the current bus master, the memory address output enables (MAOE# and MBAOE#)
should be asserted by the MBC.
For Clocked Memory Bus Mode, MSEL# is driven
active by the MBC (clock 3) to allow sampling of
MBRDY# and to latch MZBT# for the transfer.
MZBT# is sampled on all MCLK edges where
MSEL# is inactive. Once MSEL# is sampled active
by the 82495XP, the value of MZBT# sampled on
the prior MCLK is used for the next transfer.
The memory data output enable signal (MDOE#)
must be inactive to allow the data pins to be used as
inputs for the first locked read cycle. The MBC asserts MEOC# (clock 4) to latch MZBT# for the next
transfer, and end the current locked read cycle on
the memory bus (MBRDY# is not necessary since
this example shows a single transfer cycle). MZBT#
is driven high by the MBC in order to force the read
cycle to begin with the correct burst address.
For the locked write cycle, MDOE# is asserted by
the MBC (clock 5) to drive the memory data outputs.
MEOC# is again asserted (clock 6) to latch MZBT#
for the next transfer, and end the current locked
write cycle on the memory bus (MBRDY# is not
necessary since this is a single transfer cycle).
MZBT# is again driven high. MFRZ# is also sampled during write cycles when MEOC# is sampled
active by the 82495XP.
MDOE# is deasserted by the MBC (clock 7) to allow
the data pins to be used as inputs for the second
locked read cycle. MEOC# is again asserted (clock
8) to latch MZBT# for the next transfer, and end the
locked read cycle on the memory bus. MZBT# is
again driven high.
MDOE# is asserted by the MBC (clock 9) to drive
the memory data outputs for the second locked write
cycle. MBRDY# is asserted (clock 13) to cause the
memory burst counter to be incremented and data to
be placed into the 82490XP cache memory cycle
buffers. The MBC drives MEOC# active and
MSEL# inactive (clock 14) to end the locked write
cycle on the memory bus and switch memory cycle
buffers for a new cycle. MZBT# and MFRZ# for the
next transfer are sampled at this time.
For Strobed Memory Bus Mode, MSEL# is driven
active by the MBC (clock 1) to allow sampling of
MxSTB and to latch MZBT# for the first locked read
transfer (on the falling edge of MSEL#).
The memory data output enable signal (MDOE#)
must be inactive to allow the data pins to be used as
inputs for the first locked read cycle. The MBC asserts MEOC# (clock 3) to latch MZBT# for the next
transfer (on MEOC# falling edge while MSEL# is
active), and end the current locked read cycle on the
memory bus (MISTB is not necessary since this example shows a single transfer cycle). MZBT# is
driven high by the MBC in order to force the read
cycle to begin with the correct burst address.
For the locked write cycle, MDOE# is asserted by
the MBC (clock 4) to drive the memory data outputs.
MEOC# is again asserted (clock 6) to latch MZBT#
for the next transfer, and end the current locked
write cycle on the memory bus (MOSTB is not necessary since this is a single transfer cycle). MZBT#
is again driven high. MFRZ# is also sampled on the
falling edge of MEOC#.
MDOE# is deasserted by the MBC (clock 7) to allow
the data pins to be used as inputs for the second
locked read cycle. MEOC# is again asserted (clock
8) to latch MZBT# for the next transfer, and end the
locked read cycle on the memory bus. MZBT# is
again driven high.
MDOE# is asserted by the MBC (clock 9) to drive
the memory data outputs for the second locked write
cycle. MOSTB is toggled (clock 12) to cause the
memory burst counter to be incremented and data to
be placed into the 82490XP cache memory cycle
buffers. The MBC drives MEOC# active and
MSEL# inactive (clock 13) to end the locked write
cycle on the memory bus and switch memory cycle
buffers for a new cycle. MZBT# and MFRZ# for the
next transfer are sampled at this time.
9.0 TESTABILITY
Testing the 82495XP/82490XP chipset can be divided into three categories: Built-In Self Test (BIST),
Boundary Scan, and external testing. BIST performs
basic device testing on the 82495XP. Boundary
Scan provides additional test hooks that conform to
the IEEE Standard Test Access Port and Boundary
Scan Architecture (IEEE Std. 1149.1). Additional
testing can be performed by using software written
to test the 82490XP cache SRAM.
9.1 Built-In Self Test (BIST)
BIST tests the internal functionality of the 82495XP.
The 82495XP's BIST tests approximately 90% of
the cache controller. It tests the tag RAM and comparators.
The 82495XP BIST is initiated by driving
SLFTST# (CRDY#) low and HIGHZ# (MBALE) high
at least 10 clocks before RESET goes inactive. The
82495XP Cache Controller reports the result of BIST
on the CAHOLD signal. When the self test completes, the 82495XP drives FSIOUT# inactive and
the BIST result on CAHOLD. If CAHOLD is driven
active, the BIST passed. If CAHOLD is
driven inactive, BIST detected a flaw in the cache
controller. CAHOLD is valid for one clock after
FSIOUT# deactivation and should be sampled on
the rising edge of FSIOUT#.
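The sampling rule for the BIST result can be sketched in a few lines. The following Python is a hypothetical test-bench helper, not Intel-supplied code. It assumes a per-clock trace of the two pins, with FSIOUT# modeled as active low (0 means asserted) and CAHOLD as active high; the function name is an assumption.

```python
def bist_result(trace):
    """Return True (pass), False (fail), or None (no result seen yet).

    trace: sequence of (fsiout_n, cahold) samples, one per clock.
    CAHOLD is read at the rising edge of FSIOUT#, that is, on the first
    sample where FSIOUT# goes from asserted (0) to inactive (1).
    """
    prev = None
    for fsiout_n, cahold in trace:
        if prev == 0 and fsiout_n == 1:
            return cahold == 1   # CAHOLD active means BIST passed
        prev = fsiout_n
    return None                  # FSIOUT# never deactivated in this trace


print(bist_result([(0, 0), (0, 0), (1, 1), (1, 0)]))
```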
On the 82495XP, BIST only informs the system that
a failure did or did not occur. BIST is not able to
indicate where a failure occurred. After completing
BIST, the cache controller performs a reset and begins
normal operation.
9.2 Boundary Scan
The 82495XP/82490XP chipset provides additional
testability features compatible with the IEEE Standard Test Access Port and Boundary Scan Architecture (IEEE Std. 1149.1). The test logic provided allows for testing to ensure that components function
correctly, that interconnections between various
components are correct, and that various components interact correctly on the printed circuit board.
The boundary scan test logic consists of a boundary
scan register and support logic that are accessed
through a test access port (TAP). The TAP provides
a simple serial interface that makes it possible to
test all signal traces with only a few probes.
The TAP can be controlled via a bus master. The
bus master can be either automatic test equipment
or a component (PLD) that interfaces to the four-pin
test bus.
9.2.1 BOUNDARY SCAN ARCHITECTURE
The boundary scan test logic contains the following
elements:
- Test access port (TAP), consisting of input pins
TMS, TCK, and TDI; and output pin TDO.
- TAP controller, which interprets the inputs on the
test mode select (TMS) line and performs the
corresponding operation. The operations performed by the TAP include controlling the instruction and data registers within the component.
- Instruction register (IR), which accepts instruction codes shifted into the test logic on the test
data input (TDI) pin. The instruction codes are
used to select the specific test operation to be
performed or the test data register to be accessed.
- Test data registers: The 82495XP/82490XP
chipset components each contain three test data
registers: Bypass register (BPR), Device Identification register (DID), and Boundary Scan register (BSR).
The instruction and test data registers are separate
shift-register paths connected in parallel and have a
common serial data input and a common serial data
output connected to the TAP signals, TDI and TDO,
respectively.
Each test data register is serially connected to TDI
and TDO, with TDI connected to the most significant
bit and TDO connected to the least significant bit of
the test data register. Data is shifted one stage (bit
position within the register) on each rising edge of
the test clock (TCK).
9.2.2.1 Bypass Register
The Bypass Register is a one-bit shift register that
provides the minimal length path between TDI and
TDO. This path can be selected when no test operation is being performed by the component to allow
rapid movement of test data to and from other components on the board. While the bypass register is
selected, data is transferred from TDI to TDO without
inversion.
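The shift behavior of a test data register, including the one-bit bypass case, can be modeled as follows. This Python sketch is illustrative and assumes only what the text states: TDI feeds the most significant bit, TDO is taken from the least significant bit, and one stage is shifted per rising TCK edge. The class name `ShiftRegister` is hypothetical.

```python
class ShiftRegister:
    """Serial register with TDI at the MSB and TDO at the LSB."""

    def __init__(self, width: int, value: int = 0):
        self.width = width
        self.value = value & ((1 << width) - 1)

    def shift(self, tdi: int) -> int:
        """Model one rising TCK edge and return the bit seen on TDO."""
        tdo = self.value & 1
        self.value = (self.value >> 1) | ((tdi & 1) << (self.width - 1))
        return tdo


# A 4-bit register preloaded with 1010 shifts its old contents out LSB first.
reg = ShiftRegister(4, 0b1010)
print([reg.shift(bit) for bit in (1, 1, 0, 0)], bin(reg.value))

# A one-bit bypass register passes TDI to TDO one TCK later, uninverted.
bypass = ShiftRegister(1)
print([bypass.shift(bit) for bit in (1, 0, 1)])
```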
9.2.2.2 Boundary Scan Register
The Boundary Scan Register is a single shift register
path containing the boundary scan cells that are
connected to all input and output pins of the
82495XP/82490XP chipset. Figure 9-1 shows the
logical structure of the boundary scan register. While
output cells determine the value of the signal driven
on the corresponding pin, input cells only capture
data; they do not affect the normal operation of the
device. Data is transferred without inversion from
TDI to TDO through the boundary scan register during scanning. The boundary scan register can be operated by the EXTEST and SAMPLE instructions.
The boundary scan register order is described in
section 9.2.5.
9.2.2.3 Device Identification Register
The Device Identification Register contains the manufacturer's identification code, part number code,
and version code in the format shown in Figure 9-2.
Table 9-1 lists the codes corresponding to the
82495XP and 82490XP.
Table 9-1. Device ID Register Values
Component             Version Code   Part Number Code   Manufacturer Identity
82495XP (A0 or A1)    0Ah            0495h              09h
82495XP (B0)          0Bh            0495h              09h
82490XP (A0 or A1)    00h            49A0h              09h
9.2.2 DATA REGISTERS
The 82495XP and 82490XP both contain the two
required test data registers: the bypass register and
the boundary scan register. In addition, they also have a
device identification register.
[Diagram. Labels: boundary scan register; system logic; system input, system 3-state output, and system bidirectional pins; TCK, TDI, TDO.]
Figure 9-1. Boundary Scan Register Structure
[Register layout, bits 31 to 0. Fields, from bit 31 down: VERSION, PART NUMBER, MANUFACTURER IDENTITY, and a final bit fixed at 1.]
Figure 9-2. Device ID Register
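As a worked example, the codes in Table 9-1 can be packed into a 32-bit device identification word. The sketch below is Python written for illustration. The field widths (4-bit version, 16-bit part number, 11-bit manufacturer identity, and a least significant bit fixed at 1) follow the IEEE Std. 1149.1 device identification layout and are an assumption here, since the figure does not survive in legible form. The function names are hypothetical.

```python
def pack_idcode(version: int, part_number: int, manufacturer: int) -> int:
    """Assemble a 32-bit ID word: version, part number, manufacturer, then 1."""
    return (((version & 0xF) << 28)
            | ((part_number & 0xFFFF) << 12)
            | ((manufacturer & 0x7FF) << 1)
            | 1)


def unpack_idcode(idcode: int) -> dict:
    """Split a 32-bit ID word back into its fields."""
    return {
        "version": (idcode >> 28) & 0xF,
        "part_number": (idcode >> 12) & 0xFFFF,
        "manufacturer": (idcode >> 1) & 0x7FF,
        "lsb": idcode & 1,
    }


# 82495XP (A0 or A1) and 82490XP (A0 or A1), using the Table 9-1 codes.
print(hex(pack_idcode(0x0A, 0x0495, 0x09)))
print(hex(pack_idcode(0x00, 0x49A0, 0x09)))
```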
9.2.2.4 Runbist Register
The Runbist Register is a one-bit register used to
report the results of the 82495XP BIST when it is
initiated by the RUNBIST instruction. This register is
loaded with a "1" prior to invoking the BIST and is
loaded with "1" upon successful completion. "0"
indicates a failure occurred during BIST.
NOTE:
82495XP RUNBIST is not available in the A-stepping.
9.2.3 INSTRUCTION REGISTER
The Instruction Register (IR) allows instructions to
be serially shifted into the device. The instruction
selects the particular test to be performed, the test
data register to be accessed, or both. The instruction register is four (4) bits wide. The most significant
bit is connected to TDI and the least significant bit is
connected to TDO. There are no parity bits associated with the Instruction register. Upon entering the
Capture-IR TAP controller state, the Instruction register is loaded with the default instruction "0001",
SAMPLE/PRELOAD. Instructions are shifted into
the instruction register on the rising edge of TCK
while the TAP controller is in the Shift-IR state.
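Loading the four-bit instruction register can be sketched as a short shift sequence. This Python model is an illustration under stated assumptions: it starts from the value "0001" loaded in the Capture-IR state, takes TDI into the most significant bit, drives TDO from the least significant bit, and therefore presents the new instruction least significant bit first. The names `shift_ir`, `IR_WIDTH`, and `CAPTURE_IR_VALUE` are hypothetical.

```python
IR_WIDTH = 4
CAPTURE_IR_VALUE = 0b0001  # default loaded on entering Capture-IR


def shift_ir(instruction: int):
    """Shift a 4-bit instruction in during Shift-IR.

    Returns (tdi_bits, tdo_bits, final_ir). Bits enter at the MSB and
    leave at the LSB, one per rising TCK edge.
    """
    ir = CAPTURE_IR_VALUE
    tdi_bits = [(instruction >> i) & 1 for i in range(IR_WIDTH)]  # LSB first
    tdo_bits = []
    for bit in tdi_bits:
        tdo_bits.append(ir & 1)
        ir = (ir >> 1) | (bit << (IR_WIDTH - 1))
    return tdi_bits, tdo_bits, ir


# Shifting in IDCODE ("0010") pushes the captured "0001" out on TDO.
print(shift_ir(0b0010))
```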
9.2.3.1 82495XP Boundary Scan Instruction Set
The 82495XP cache controller supports all three
mandatory boundary scan instructions (BYPASS,
SAMPLE/PRELOAD, and EXTEST) along with one
optional instruction (IDCODE). On the B-stepping of
the 82495XP two additional optional instructions will
be implemented (RUNBIST and TRISTATE). Table
9-2 lists the 82495XP boundary scan instruction
codes. The instructions listed as PRIVATE cause
TDO to become enabled in the Shift-DR state and
cause "0" to be shifted out of TDO on the rising
edge of TCK. Execution of the PRIVATE instructions
will not cause hazardous operation of the 82495XP.
Note that system tests should not execute instruction codes labeled "RESERVED". These instructions can put the component in an indeterminate
state which can only be cleared by power on reset.
Table 9-2. 82495XP Boundary Scan Instruction Codes
Instruction Code    Instruction Name
0000                EXTEST
0001                SAMPLE
0010                IDCODE
0011                RESERVED
0100                RESERVED
0101                RESERVED
0110                RESERVED
0111                *RUNBIST
1000                *TRISTATE
1001                RESERVED
1010                PRIVATE
1011                PRIVATE
1100                PRIVATE
1101                PRIVATE
1110                PRIVATE
1111                BYPASS
* RUNBIST and TRISTATE are boundary scan instructions
that will be implemented in the B-stepping of the 82495XP.
They are not available on the A-stepping.
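The instruction code table lends itself to a simple lookup, for example in test software that must avoid the RESERVED codes. The following Python sketch is illustrative; the names `NAMED_INSTRUCTIONS` and `instruction_name` are hypothetical, and the code does not model stepping differences (RUNBIST and TRISTATE are listed even though they are not available on the A-stepping).

```python
NAMED_INSTRUCTIONS = {
    0b0000: "EXTEST",
    0b0001: "SAMPLE",
    0b0010: "IDCODE",
    0b0111: "RUNBIST",    # B-stepping only
    0b1000: "TRISTATE",   # B-stepping only
    0b1111: "BYPASS",
}


def instruction_name(code: int) -> str:
    """Map a 4-bit instruction code to its name in the instruction table."""
    if not 0 <= code <= 0b1111:
        raise ValueError("instruction codes are four bits wide")
    if code in NAMED_INSTRUCTIONS:
        return NAMED_INSTRUCTIONS[code]
    if 0b1010 <= code <= 0b1110:
        return "PRIVATE"
    return "RESERVED"     # 0011 to 0110 and 1001: do not execute


# List the codes that system tests should avoid.
print([format(c, "04b") for c in range(16) if instruction_name(c) == "RESERVED"])
```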
EXTEST
The instruction code is "0000". The EXTEST instruction allows testing of circuitry external to the component package, typically board interconnects. It
does so by driving the values loaded
into the 82495XP boundary scan register out on the output pins corresponding
to each boundary scan cell and capturing the values on 82495XP input pins
to be loaded into their corresponding
boundary scan register locations. I/O
pins are selected as input or output, depending on the value loaded into their
control setting locations in the boundary
scan register. Values shifted into input
latches in the boundary scan register
are never used by the internal logic of
the 82495XP. Note: after using the EXTEST instruction, the 82495XP must be
reset before normal (non-boundary
scan) use.
SAMPLE/PRELOAD
The instruction code is "0001". The SAMPLE/PRELOAD instruction performs two functions.
When the TAP controller is in the Capture-DR state, the SAMPLE/PRELOAD instruction allows a
"snap-shot" of the normal operation of
the component without interfering with
that normal operation. The instruction
causes boundary scan register cells associated with outputs to sample the value being driven by the 82495XP. It causes the cells associated with inputs to
sample the value being driven into the
82495XP. On both outputs and inputs
the sampling occurs on the rising edge
of TCK. When the TAP controller is in
the Update-DR state, the SAMPLE/PRELOAD
instruction preloads data to
the device pins to be driven to the board
by executing the EXTEST instruction.
Data is preloaded to the pins from the
boundary scan register on the falling
edge of TCK.
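The capture, shift, and preload behavior described for SAMPLE/PRELOAD and EXTEST can be pictured with a toy model of a boundary scan register. This is a sketch, not the device's implementation: the class name, the cell ordering (index 0 nearest TDI, last index nearest TDO), and the absence of TCK edge modeling are all assumptions.

```python
class BoundaryScanRegister:
    """Toy model of shift-register stages plus latched parallel outputs."""

    def __init__(self, n_cells):
        self.shift = [0] * n_cells   # shift-register stages
        self.latch = [0] * n_cells   # latched parallel outputs (drive pins in EXTEST)

    def capture(self, pin_values):
        """Capture-DR: sample the pin values into the shift stages."""
        self.shift = list(pin_values)

    def shift_bit(self, tdi):
        """Shift-DR: move one stage toward TDO and return the bit leaving on TDO."""
        tdo = self.shift[-1]
        self.shift = [tdi] + self.shift[:-1]
        return tdo

    def update(self):
        """Update-DR: copy the shift stages to the latched parallel outputs."""
        self.latch = list(self.shift)


reg = BoundaryScanRegister(4)
reg.capture([1, 0, 1, 1])                             # snapshot of the pins
snapshot = [reg.shift_bit(b) for b in [0, 1, 1, 0]]   # read out while preloading
reg.update()                                          # preloaded values now latched
```

In this model the snapshot leaves on TDO last cell first, while the new values shifted in on TDI only reach the latched outputs when `update` is called.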
IDCODE
The instruction code is "0010". The IDCODE instruction selects the device
identification register to be connected to
TDI and TDO, allowing the device's identification code to be shifted out of the
device on TDO. Note that the device
identification register is not altered by
data being shifted in on TDI.
BYPASS
The instruction code is "1111". The BYPASS instruction selects the bypass
register to be connected to TDI and
TDO, effectively bypassing the test logic
on the 82495XP by reducing the shift
length of the device to one bit. Note that
an open circuit fault in the board level
test data path will cause the bypass register to be selected following an instruction scan cycle due to the pull-up resistor on the TDI input. This has been done
to prevent any unwanted interference
with the proper operation of the system
logic.
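The effect of the one-bit shift length can be sketched for a chain of devices that are all in BYPASS. The function name is hypothetical, and the sketch assumes each bypass register holds "0" when shifting starts.

```python
def shift_through_bypass_chain(bits_in, num_devices):
    """Shift bits through num_devices (>= 1) one-bit bypass registers in series."""
    stages = [0] * num_devices           # assumption: each register starts at 0
    out = []
    for bit in bits_in:
        out.append(stages[-1])           # bit presented on the last device's TDO
        stages = [bit] + stages[:-1]     # all registers shift on the same TCK edge
    return out


# Three bypassed devices delay the serial stream by three TCK cycles.
delayed = shift_through_bypass_chain([1, 0, 1, 1, 0, 0, 0], 3)
```

Each bypassed device adds exactly one TCK cycle of delay, so a long board-level chain can be shortened to one bit per device that is not under test.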
RUNBIST
The instruction code is "0111". The
RUNBIST instruction selects the one (1)
bit runbist register, loads a value of "0"
into the runbist register, and connects it
to TDO. It also initiates the built-in self
test (BIST) feature of the 82495XP,
which is able to detect approximately
90% of the stuck-at faults on the
82495XP. The 82495XP AC/DC specifications for VCC and CLK must be met
and reset must have been asserted at
least once prior to executing the
RUNBIST boundary scan instruction. After loading the RUNBIST instruction
code in the instruction register, the TAP
controller must be placed in the Run-Test/Idle state. BIST begins on the first
rising edge of TCK after entering the
Run-Test/Idle state. The TAP controller
must remain in the Run-Test/Idle state
until BIST is completed. It requires 100K
clock (CLK) cycles to complete BIST
and report the result to the runbist register. After completing the 100K clock
(CLK) cycles, the value in the runbist
register should be shifted out on TDO
during the Shift-DR state. A value of "1"
being shifted out on TDO indicates BIST
successfully completed. A value of "0"
indicates a failure occurred. After executing the RUNBIST instruction, the
82495XP must be reset prior to normal
operation. NOTE: This instruction is not
available on the A-stepping of the
82495XP. It will be implemented in the
B-stepping.
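Because BIST length is counted in CLK cycles while the TAP controller is held in Run-Test/Idle by TCK, a tester has to convert between the two clocks. The following sketch shows that arithmetic. It assumes "100K" means 100,000 cycles, and the function names are hypothetical.

```python
BIST_CLK_CYCLES = 100_000   # assumption: "100K clock (CLK) cycles" read as 100,000


def bist_duration_s(clk_hz):
    """Time needed for BIST to complete at a given CLK frequency."""
    return BIST_CLK_CYCLES / clk_hz


def min_tck_cycles_in_idle(clk_hz, tck_hz):
    """TCK cycles to remain in Run-Test/Idle so that BIST has finished.

    Integer arithmetic (ceiling division) avoids floating-point rounding.
    """
    return (BIST_CLK_CYCLES * tck_hz + clk_hz - 1) // clk_hz


def bist_passed(runbist_bit):
    """A '1' shifted out of the runbist register indicates success."""
    return runbist_bit == 1


# At CLK = 50 MHz and TCK = 25 MHz, BIST takes 2 ms, i.e. 50,000 TCK cycles.
idle_cycles = min_tck_cycles_in_idle(50_000_000, 25_000_000)
```

A slower TCK needs proportionally fewer cycles in Run-Test/Idle, since only the elapsed CLK cycles matter.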
TRISTATE
The instruction code is "1000". The
TRISTATE instruction initiates the tristate output test mode. After loading the
TRISTATE boundary scan instruction
into the instruction register, the TAP
controller must be placed in the Run-Test/Idle state. To terminate the tristate
output test mode, the 82495XP must be
reset. NOTE: This instruction is not
available on the A-stepping of the
82495XP. It will be implemented in the
B-stepping.
9.2.3.2 82490XP Boundary Scan Instruction Set
The 82490XP supports all three
mandatory boundary scan instructions (BYPASS,
SAMPLE/PRELOAD, and EXTEST) along with one
optional instruction (IDCODE). Table 9.4 lists the
82490XP boundary scan instruction codes. The instructions listed as PRIVATE cause TDO to become
enabled in the Shift-DR state and cause "0" to be
shifted out of TDO on the rising edge of TCK. Execution of the PRIVATE instructions will not cause hazardous operation of the 82490XP. Note that system
tests should not execute instruction codes labeled
"INTEL RESERVED". These instructions can put
the component in an indeterminate state which can
only be cleared by power-on reset.

Table 9-3. 82490XP Boundary Scan Instruction Codes

Instruction Code   Instruction Name
0000               EXTEST
0001               SAMPLE
0010               IDCODE
0011               INTEL RESERVED
0100               INTEL RESERVED
0101               INTEL RESERVED
0110               INTEL RESERVED
0111               INTEL RESERVED
1000               INTEL RESERVED
1001               INTEL RESERVED
1010               PRIVATE
1011               PRIVATE
1100               PRIVATE
1101               PRIVATE
1110               PRIVATE
1111               BYPASS

EXTEST
The instruction code is "0000". The EXTEST instruction allows testing of circuitry external to the component package, typically board interconnects. It
does so by driving the values loaded
into the 82490XP boundary scan register out on the output pins corresponding
to each boundary scan cell and capturing the values on 82490XP input pins to
be loaded into their corresponding
boundary scan register locations. I/O
pins are selected as input or output, depending on the value loaded into their
control setting locations in the boundary
scan register. Values shifted into input
latches in the boundary scan register
are never used by the internal logic of
the 82490XP. Note: after using the EXTEST instruction, the 82490XP must be
reset before normal (non-boundary
scan) use.
SAMPLE/PRELOAD
The instruction code is "0001". The SAMPLE/PRELOAD instruction performs two functions.
When the TAP controller is in the Capture-DR state, the SAMPLE/PRELOAD instruction allows a
"snap-shot" of the normal operation of
the component without interfering with
that normal operation. The instruction
causes boundary scan register cells associated with outputs to sample the value being driven by the 82490XP. It causes the cells associated with inputs to
sample the value being driven into the
82490XP. On both outputs and inputs
the sampling occurs on the rising edge
of TCK. When the TAP controller is in
the Update-DR state, the SAMPLE/PRELOAD
instruction preloads data to
the device pins to be driven to the board
by executing the EXTEST instruction.
Data is preloaded to the pins from the
boundary scan register on the falling
edge of TCK.
IDCODE
The instruction code is "0010". The IDCODE instruction selects the device
identification register to be connected to
TDI and TDO, allowing the device's identification code to be shifted out of the
device on TDO. Note that the device
identification register is not altered by
data being shifted in on TDI.
BYPASS
The instruction code is "1111". The BYPASS instruction selects the bypass
register to be connected to TDI and
TDO, effectively bypassing the test logic
on the 82490XP by reducing the shift
length of the device to one bit. Note that
an open circuit fault in the board level
test data path will cause the bypass register to be selected following an instruction scan cycle due to the pull-up resistor on the TDI input. This has been done
to prevent any unwanted interference
with the proper operation of the system
logic.
9.2.4 TEST ACCESS PORT (TAP) CONTROLLER
The TAP controller is a synchronous, finite state machine. It controls the sequence of operations of the
test logic. The TAP controller changes state only in
response to the following events:
1. A rising edge of TCK
2. Power-up.
The value of the test mode select (TMS) input signal
at a rising edge of TCK controls the sequence of the
state changes. The state diagram for the TAP controller is shown in Figure 9-3. Test designers must
consider the operation of the state machine in order
to design the correct sequence of values to drive on
TMS.
9.2.4.1 Test-Logic-Reset State
In this state, the test logic is disabled so that normal
operation of the device can continue unhindered.
This is achieved by initializing the instruction register
such that the IDCODE instruction is loaded. No matter what the original state of the controller, the controller enters the Test-Logic-Reset state when the TMS
input is held high (1) for at least five rising edges of
TCK. The controller remains in this state while TMS
is high. The TAP controller is also forced to enter
this state at power-up.
9.2.4.2 Run-Test/Idle State
This is a controller state between scan operations. Once in
this state, the controller remains in this state as
long as TMS is held low. In devices supporting the
RUNBIST instruction, the BIST is performed during
this state and the result is reported in the runbist
register. For instructions not causing functions to execute during this state, no activity occurs in the test
logic. The instruction register and all test data registers retain their previous state. When TMS is high
and a rising edge is applied to TCK, the controller
moves to the Select-DR-Scan state.
9.2.4.3 Select-DR-Scan State
This is a temporary controller state. The test data
register selected by the current instruction retains its
previous state. If TMS is held low and a rising edge
is applied to TCK when in this state, the controller
moves into the Capture-DR state, and a scan sequence for the selected test data register is initiated.
If TMS is held high and a rising edge is applied to
TCK, the controller moves to the Select-IR-Scan
state.
The instruction does not change in this state.
Figure 9-3. Tap Controller State Diagram
9.2.4.4 Capture-DR State
In this state, the boundary scan register captures
input pin data if the current instruction is EXTEST or
SAMPLE/PRELOAD. The other test data registers,
which do not have parallel input, are not changed.
The instruction does not change in this state.
When the TAP controller is in this state and a rising
edge is applied to TCK, the controller enters the
Exit1-DR state if TMS is high or the Shift-DR state if
TMS is low.
9.2.4.5 Shift-DR State
In this controller state, the test data register connected between TDI and TDO as a result of the current instruction shifts data one stage toward its serial output on each rising edge of TCK.
The instruction does not change in this state.
When the TAP controller is in this state and a rising
edge is applied to TCK, the controller enters the
Exit1-DR state if TMS is high or remains in the Shift-DR state if TMS is low.
9.2.4.6 Exit1-DR State
This is a temporary state. While in this state, if TMS
is held high, a rising edge applied to TCK causes the
controller to enter the Update-DR state, which terminates the scanning process. If TMS is held low and a
rising edge is applied to TCK, the controller enters
the Pause-DR state.
The test data register selected by the current instruction retains its previous value during this state. The
instruction does not change in this state.
9.2.4.7 Pause-DR State
The pause state allows the test controller to temporarily halt the shifting of data through the test data
register in the serial path between TDI and TDO. An
example of using this state could be to allow a tester
to reload its pin memory from disk during application
of a long test sequence.
The test data register selected by the current instruction retains its previous value during this state. The
instruction does not change in this state.
The controller remains in this state as long as TMS
is low. When TMS goes high and a rising edge is
applied to TCK, the controller moves to the Exit2-DR
state.
9.2.4.8 Exit2-DR State
This is a temporary state. While in this state, if TMS
is held high, a rising edge applied to TCK causes the
controller to enter the Update-DR state, which terminates the scanning process. If TMS is held low and a
rising edge is applied to TCK, the controller enters
the Shift-DR state.
The test data register selected by the current instruction retains its previous value during this state. The
instruction does not change in this state.
9.2.4.9 Update-DR State
The boundary scan register is provided with a
latched parallel output to prevent changes at the
parallel output while data is shifted in response to
the EXTEST and SAMPLE/PRELOAD instructions.
When the TAP controller is in this state and the
boundary scan register is selected, data is latched
onto the parallel output of this register from the shift-register path on the falling edge of TCK. The data
held at the latched parallel output does not change
other than in this state.
All shift-register stages in the test data register selected
by the current instruction retain their previous values
during this state. The instruction does not change in
this state.
9.2.4.10 Select-IR-Scan State
This is a temporary controller state. The test data
register selected by the current instruction retains its
previous state. If TMS is held low and a rising edge
is applied to TCK when in this state, the controller
moves into the Capture-IR state, and a scan sequence for the instruction register is initiated. If TMS
is held high and a rising edge is applied to TCK, the
controller moves to the Test-Logic-Reset state.
The instruction does not change in this state.
9.2.4.11 Capture-IR State
In this controller state the shift register contained in
the instruction register loads the fixed value "0001"
on the rising edge of TCK.
The test data register selected by the current instruction retains its previous value during this state. The
instruction does not change in this state.
When the controller is in this state and a rising edge
is applied to TCK, the controller enters the Exit1-IR
state if TMS is held high, or the Shift-IR state if TMS
is held low.
9.2.4.12 Shift-IR State
In this state the shift register contained in the instruction register is connected between TDI and
TDO and shifts data one stage towards its serial output on each rising edge of TCK.
The test data register selected by the current instruction retains its previous value during this state. The
instruction does not change in this state.
When the controller is in this state and a rising edge
is applied to TCK, the controller enters the Exit1-IR
state if TMS is held high, or remains in the Shift-IR
state if TMS is held low.
9.2.4.13 Exit1-IR State
This is a temporary state. While in this state, if TMS
is held high, a rising edge applied to TCK causes the
controller to enter the Update-IR state, which terminates the scanning process. If TMS is held low and a
rising edge is applied to TCK, the controller enters
the Pause-IR state.
The test data register selected by the current instruction retains its previous value during this state. The
instruction does not change in this state.
9.2.4.14 Pause-IR State
The pause state allows the test controller to temporarily halt the shifting of data through the instruction
register.
The test data register selected by the current instruction retains its previous value during this state. The
instruction does not change in this state.
The controller remains in this state as long as TMS
is low. When TMS goes high and a rising edge is
applied to TCK, the controller moves to the Exit2-IR
state.
9.2.4.15 Exit2-IR State
This is a temporary state. While in this state, if TMS
is held high, a rising edge applied to TCK causes the
controller to enter the Update-IR state, which terminates the scanning process. If TMS is held low and a
rising edge is applied to TCK, the controller enters
the Shift-IR state.
The test data register selected by the current instruction retains its previous value during this state. The
instruction does not change in this state.
9.2.4.16 Update-IR State
The instruction shifted into the instruction register is
latched onto the parallel output from the shift-register path on the falling edge of TCK. Once the new
instruction has been latched, it becomes the current
instruction.
Test data registers selected by the current instruction retain the previous value.
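The state descriptions of section 9.2.4 can be condensed into a next-state table. The sketch below is illustrative only. The transitions out of Test-Logic-Reset with TMS low, and out of Update-DR and Update-IR, are not spelled out in the text above; they are taken from the IEEE 1149.1 state diagram, which is assumed here to match the device's TAP controller state diagram. The names `TAP_NEXT` and `run` are hypothetical.

```python
# state -> (next state if TMS = 0, next state if TMS = 1), sampled on rising TCK
TAP_NEXT = {
    "Test-Logic-Reset": ("Run-Test/Idle", "Test-Logic-Reset"),
    "Run-Test/Idle":    ("Run-Test/Idle", "Select-DR-Scan"),
    "Select-DR-Scan":   ("Capture-DR", "Select-IR-Scan"),
    "Capture-DR":       ("Shift-DR", "Exit1-DR"),
    "Shift-DR":         ("Shift-DR", "Exit1-DR"),
    "Exit1-DR":         ("Pause-DR", "Update-DR"),
    "Pause-DR":         ("Pause-DR", "Exit2-DR"),
    "Exit2-DR":         ("Shift-DR", "Update-DR"),
    "Update-DR":        ("Run-Test/Idle", "Select-DR-Scan"),   # per IEEE 1149.1
    "Select-IR-Scan":   ("Capture-IR", "Test-Logic-Reset"),
    "Capture-IR":       ("Shift-IR", "Exit1-IR"),
    "Shift-IR":         ("Shift-IR", "Exit1-IR"),
    "Exit1-IR":         ("Pause-IR", "Update-IR"),
    "Pause-IR":         ("Pause-IR", "Exit2-IR"),
    "Exit2-IR":         ("Shift-IR", "Update-IR"),
    "Update-IR":        ("Run-Test/Idle", "Select-DR-Scan"),   # per IEEE 1149.1
}


def run(state, tms_bits):
    """Apply one TMS value per rising TCK edge and return the final state."""
    for tms in tms_bits:
        state = TAP_NEXT[state][tms]
    return state


# Holding TMS high for five TCK edges reaches Test-Logic-Reset from any state.
reset_ok = all(run(s, [1] * 5) == "Test-Logic-Reset" for s in TAP_NEXT)

# From reset, TMS = 0, 1, 0, 0 walks to Shift-DR.
state_after_walk = run("Test-Logic-Reset", [0, 1, 0, 0])
```

A table of this kind lets a test designer check a planned TMS sequence before it is applied to hardware.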
9.2.5 BOUNDARY SCAN REGISTER CELL
The boundary scan register for each component
contains a cell for each pin, as well as cells for control of I/O and tristate pins.
9.2.5.1 82495XP Boundary Scan Register Cell
The following is the bit order of the 82495XP boundary scan register: (from left to right and top to bottom)
TDI → MKEN# KWEND# SWEND# BGT#
CNA# BRDY# RESERVED CRDY# MWBWT#
DRCTM# MRO# CWAY# FPFLD# SNPCYC#
SNPBSY# MHITM# MTHIT# CAHOLD FSIOUT#
PALLC# SNPADS# CADS# CDTS# CWR#
CDC# CMIO# RDYSRC MCACHE# KLOCK#
SMLN# NENE# CFA3 CFA2 TAG11 TAG10 TAG9
TAG8 TAG7 TAG6 TAG5 TAG4 TAG3 TAG2 TAG1
TAG0 SET10 SET9 SET8 SET7 CLK SET6 SET5
SET4 SET3 SET2 SET1 SET0 CFA6 CFA5 CFA4
CFA1 CFA0 ADS# LEN BLAST# BRDYC1#
BRDYC2# CACHE# LOCK# BLE# BOFF# KEN#
AHOLD WR# MIO# DC# PWTPCD HITM# PCYC
EADS# NA# INV WBWT# WAY WRARR#
MCYC# BUS# MAWEA# WBWE# WBA WBTYP
MCFA0 MCFA1 MCFA4 MCFA5 MCFA6 MSET0
MSET1 MSET2 MSET3 MSET4 MSET5 MSET6
MSET7 MSET8 MSET9 MSET10 MTAG0 MTAG1
MTAG2 MTAG3 MTAG4 MTAG5 MTAG6 MTAG7
MTAG8 MTAG9 MTAG10 MTAG11 MCFA2 MCFA3
RESET MAOE# MBAOE# SNPCLK SNPSTB#
EWBE# MPIC# SNPINV FLUSH# SYNC#
SNPNCA MBALE MALE MACTL OCTL CFA4CTL
CFA5CTL CACTL FPFLDCTL WBWTCTL
NACTL → TDO
"RESERVED" signals correspond to no connect
"NC" signals on the 82495XP.
EWBE# and MPIC# will be implemented in the
82495XP B-stepping, omit from boundary scan register for A-stepping 82495XPs.
All the *CTL cells are control cells that are used to
select the direction of bidirectional pins or tristate
output pins. If "1" is loaded into the control
cell (*CTL), the associated pin(s) are tristated or selected as input. The following lists the control cells
and their corresponding pins.
1. MACTL controls the MSET0-10, MTAG0-11,
and MCFA0-6 pins.
2. OCTL controls the WAY, WRARR#, MCYC#,
MAWEA#, BUS#, WBWE#, WBA, WBTYP, INV,
EADS#, AHOLD, KEN#, BOFF#, BLE#,
BRDYC2#, BRDYC1#, BLAST#, NENE#,
SMLN#, KLOCK#, MCACHE#, RDYSRC,
CMIO#, CDC#, CWR#, CDTS#, CADS#,
SNPADS#, PALLC#, FSIOUT#, CAHOLD,
MTHIT#, MHITM#, SNPBSY#, SNPCYC#,
CWAY, EWBE#, and MPIC# output pins.
3. CFA4CTL controls the CFA4 pin.
4. CFA5CTL controls the CFA5 pin.
5. CACTL controls the SET0-10, TAG0-11,
CFA0-3, and CFA6 pins.
6. FPFLDCTL controls the FPFLD# pin.
7. WBWTCTL controls the WB/WT# pin.
8. NACTL controls the NA# pin.
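When building EXTEST vectors it can help to know which pins a given setting of the control cells disables. The sketch below encodes the control cell list above. OCTL is left out only to keep the example short, and the helper names are hypothetical.

```python
def expand(prefix, lo, hi):
    """Expand a pin range such as MSET0-10 into individual pin names."""
    return [f"{prefix}{i}" for i in range(lo, hi + 1)]


# Control cell -> pins it governs (OCTL omitted for brevity).
CONTROL_CELLS = {
    "MACTL": expand("MSET", 0, 10) + expand("MTAG", 0, 11) + expand("MCFA", 0, 6),
    "CFA4CTL": ["CFA4"],
    "CFA5CTL": ["CFA5"],
    "CACTL": expand("SET", 0, 10) + expand("TAG", 0, 11)
             + expand("CFA", 0, 3) + ["CFA6"],
    "FPFLDCTL": ["FPFLD#"],
    "WBWTCTL": ["WB/WT#"],
    "NACTL": ["NA#"],
}


def disabled_pins(control_bits):
    """Pins tristated or selected as input: those whose control cell holds 1."""
    pins = []
    for cell, value in control_bits.items():
        if value == 1:
            pins.extend(CONTROL_CELLS[cell])
    return pins


pins_off = disabled_pins({"NACTL": 1, "CFA4CTL": 0, "CFA5CTL": 1})
```

Loading "0" into a control cell leaves its pins enabled as outputs, so only cells set to "1" contribute to the returned list.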
9.2.5.2 82490XP Boundary Scan Register Cell
The following is the bit order of the 82490XP boundary scan register: (from left to right and top to bottom)
TDI → CDCTL WR# BLAST# BRDYC#
BRDY# HITM# ADS# BE# A0 A1 A2 A3 A4 A5 A6
A7 A8 A9 A10 A11 A12 A13 A14 A15 MDATA7
MDATA3 MDATA6 MDATA2 MDATA5 MDATA1
MDATA4 MDATA0 MDCTL MDOE# MZBT#
MBRDY# MEOC# MFRZ# MSEL# MCLK MOCLK
RESET PAR# RESERVED BOFF# WBTYP WBA
WBWE# BUS# MAWEA# MCYC# CRDY#
WRARR# WAY CDATA4 CDATA0 CDATA2
CDATA5 CDATA6 CDATA1 CDATA3
CDATA7 → TDO
"RESERVED" signals correspond to no connect
"NC" signals on the 82490XP.
All the *CTL cells are control cells that are used to
select the direction of bidirectional pins or tristate
output pins. If "1" is loaded into the control
cell (*CTL), the associated pin(s) are tristated or selected as input. The following lists the control cells
and their corresponding pins.
1. CDCTL controls the CDATA0-7 pins.
2. MDCTL controls the MDATA0-7 pins.
9.2.6 TAP CONTROLLER INITIALIZATION
The TAP controller is automatically initialized when a
device is powered up. In addition, the TAP controller
can be initialized by applying a high signal level on
the TMS input for five TCK periods.
9.2.7 BOUNDARY SCAN SIGNAL DESCRIPTION AND TIMINGS
The functionality of TDI, TMS, TDO, and TCK is
described in Chapter 7. The A.C. timing specifications for the boundary scan signals are located in
Chapter 10.
9.3 Tri-State Output Test Mode
The 82495XP has the ability to tri-state all of its outputs and bidirectional pins and to disable all pull-ups
and pull-downs. During tri-state output test mode all
pins floated during bus hold as well as those which
are never floated during normal operation are
tri-stated. When the 82495XP is in tri-state output
test mode, external testing can be used to test
board interconnections.
On the 82495XP, tri-state output test mode is invoked by driving HIGHZ# (MBALE) and SLFTST# (CRDY#) active to the 82495XP at least 10 clocks
prior to the deassertion of RESET. Note that
HIGHZ# has priority over SLFTST#. When both
HIGHZ# and SLFTST# are driven active the
82495XP will invoke the tri-state output mode and
not invoke BIST.
Once tri-state output test mode is invoked, the
82495XP remains in it until the next RESET.
9.4 82490XP Cache SRAM Testing
The 82490XP cache SRAM can be tested using
standard cache memory testing techniques. Code
must be written to:
1. Flush and reset the 82495XP/82490XP/CPU
cache
2. Write 1's to every bit of a block of memory equal
to the cache size
3. Read the block of memory to fill the cache, tagging the data as read-only using the MRO# signal
4. Write 0's to every bit in the block of memory
5. Read the block; the cache hits should be all 1's
6. Repeat the process, exchanging 0 for 1 and 1 for
0
In this example, the code to test the cache must be
non-cacheable to the 82495XP. Also, the CPU
cache must be on so that the 82495XP will perform
line-fills.
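The cache SRAM test steps of section 9.4 can be illustrated with a toy software model. This is a sketch under stated assumptions, not test code for the 82490XP: the class, the byte-wide data, and the way a stuck cell is injected are all hypothetical. The only property modeled is that a line filled as read-only is not updated by later writes.

```python
class ToyReadOnlyCache:
    """Toy model: lines filled as read-only are not updated by later writes."""

    def __init__(self, size):
        self.memory = [0] * size
        self.lines = {}    # address -> cached byte
        self.stuck = {}    # address -> forced byte (injected SRAM fault)

    def flush(self):
        self.lines.clear()

    def write(self, addr, value):
        self.memory[addr] = value          # goes to memory only (line is read-only)

    def read(self, addr):
        if addr not in self.lines:         # miss: line fill, tagged read-only
            self.lines[addr] = self.stuck.get(addr, self.memory[addr])
        return self.lines[addr]


def one_pass(cache, size, pattern):
    inverse = pattern ^ 0xFF
    cache.flush()                                       # step 1
    for a in range(size):
        cache.write(a, pattern)                         # step 2
    for a in range(size):
        cache.read(a)                                   # step 3
    for a in range(size):
        cache.write(a, inverse)                         # step 4
    return [a for a in range(size) if cache.read(a) != pattern]   # step 5


def run_cache_test(cache, size):
    """Step 6: run with all 1's, then all 0's; return the failing addresses."""
    return sorted(set(one_pass(cache, size, 0xFF) + one_pass(cache, size, 0x00)))


good = ToyReadOnlyCache(16)
bad = ToyReadOnlyCache(16)
bad.stuck[3] = 0x00            # cell at address 3 cannot hold 1's
good_failures = run_cache_test(good, 16)
bad_failures = run_cache_test(bad, 16)
```

Because memory holds the inverse pattern by the time of the final read, a correct value can only have come from the cache, which is the point of tagging the lines read-only.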
10.0 AC/DC SPECIFICATIONS
10.1 Background
The 82495XP has four main interfaces: CPU Bus,
memory bus controller, memory bus, and 82490XP.
The memory bus controller (MBC) is typically implemented
with PLD devices. The MBC interface signal timings
are, therefore, generated based on available, off-the-shelf PLD specs. The memory bus interface was
specified to suit a generic memory interface which
works up to CPU frequency.
10.2 D.C. Specifications
Table 10-1. D.C. Specifications
Vcc = 5V ± 5%, Tcase = 0 to +85°C

Symbol  Parameter                 Min    Max        Unit  Notes
VIL     Input Low Voltage         -0.3   +0.8       V     TTL Level
VIH     Input High Voltage        2.0    Vcc + 0.3  V     TTL Level
VOL     Output Low Voltage               0.45       V     TTL Level (1)
VOH     Output High Voltage       2.4               V     TTL Level (2)
Icc     Power Supply Current             550        mA    82495XP @ 50 MHz, (3)
                                         300        mA    82490XP @ 50 MHz
Power   Power Dissipation                2.75       W     82495XP @ 50 MHz, (4)
                                         1.50       W     82490XP @ 50 MHz
ILI     Input Leakage Current            ±15        uA    0 < VIN < Vcc
ILO     Output Leakage Current           ±15        uA    0 ≤ VOUT ≤ Vcc, Tristate
IIL     Input Leakage Current            200        uA    VIN = 0.45V, (5)
CIN     Input Capacitance                14         pF    for 82495XP
                                         5          pF    for 82490XP
CO      Output Capacitance               18         pF    for 82495XP
                                         15         pF    for 82490XP
CI/O    I/O Capacitance                  18         pF    for 82495XP
                                         15         pF    for 82490XP
CCLK    CLK Input Capacitance            14         pF    for 82495XP
                                         5          pF    for 82490XP
CTIN    Test Input Capacitance           15         pF    for 82495XP
                                         10         pF    for 82490XP
CTOUT   Test Output Capacitance          15         pF    for 82495XP
                                         10         pF    for 82490XP
CTCK    Test Clock Capacitance           15         pF    for 82495XP
                                         10         pF    for 82490XP

NOTES:
(1) Parameter measured at 4 mA load.
For MCFA6-MCFA0, MSET10-MSET0, and MTAG11-MTAG0, this parameter is measured at 16 mA load.
(2) Parameter measured at 1 mA load.
For MCFA6-MCFA0, MSET10-MSET0, and MTAG11-MTAG0, this parameter is measured at 2 mA load.
(3) Typical supply current is 400 mA.
(4) Typical power dissipation is 2W.
(5) This parameter is for input with pullup.
10.3 A.C. Specifications
All TTL timing specs are measured at 1.5V for both "0" and "1" logic level.

Table 10-2. Clock, Reset, and Configuration
Vcc = 5V ± 5%, Tcase = 0 to +85°C
Maximum CL = 50 pF unless otherwise specified.
Minimum CL = 20 pF unless otherwise specified.
All Inputs and Outputs are TTL Level.

Symbol  Parameter                                Min      Max   Unit  Figure  Notes
t0      CLK, MCLK, MOCLK Frequency               16.6     50    MHz           1x clock
t1      CLK, MCLK, MOCLK Stability                        0.1   %
t2      CLK, MCLK, MOCLK Period                  20       60    ns    10-1
t3      CLK, MCLK, MOCLK High Time               7              ns    10-1    (1)
t4      CLK, MCLK, MOCLK Low Time                7              ns    10-1    (1)
t5      CLK, MCLK, MOCLK Rise Time                        2     ns            (1)
t6      CLK, MCLK, MOCLK Fall Time                        2     ns            (1)
t7      RESET Setup Time                         7              ns    10-4
t8      RESET Hold Time                          2              ns    10-4
t9      RESET Duration                           8 x t2         ns    10-4    for 82495XP, (2)
                                                 15 x t2        ns    10-4    for 82490XP
t10     All Configurations CFG3-CFG0,            10 x t2        ns    10-4    (3), (4)
        CPUTYP, SNPMD, PLOCKEN, MEMLDRV,
        82490XPLDRV, HIGHZ#, SLFTST#
        Setup Time
t11     All Configurations CFG3-CFG0,            0              ns    10-4    (3), (5)
        CPUTYP, SNPMD, PLOCKEN, MEMLDRV,
        82490XPLDRV, HIGHZ#, SLFTST#
        Hold Time
t12     FLUSH#, SYNC# Setup Time                 8              ns    10-3    for 82495XP, (6)
t13     FLUSH#, SYNC# Hold Time                  1              ns    10-3    for 82495XP, (7)
t14     FLUSH#, SYNC# Duration                   2 x t2         ns            (8)
t15     MOCLK falling edge to MCLK rising edge   2              ns
t16     FERR#, HLDA Valid Delay                  2        15    ns    10-2
t17     FERR#, HLDA Float Delay                           18    ns
t18     HOLD, BOFF# Setup Time                   7              ns    10-3
t19     HOLD, BOFF# Hold Time                    2              ns    10-3

NOTES:
(1) Rise/Fall, High/Low times measured between 0.8V and 2.0V.
(2) Power up reset duration should be 1 ms after Vcc and CLK are stable. If configuration inputs with pullups are left floated,
10 us RESET duration is required.
(3) Timing is referenced to reset falling edge.
(4) 8 ns setup time is required to guarantee recognition on next clock.
(5) 1 ns hold time is required to guarantee recognition on next clock.
(6) To guarantee recognition on next clock.
(7) Synchronous mode only.
(8) Asynchronous mode only. To guarantee recognition.
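Several entries in the clock, reset, and configuration table are expressed as multiples of the clock period t2. The short sketch below shows the conversion between frequency and period and how a multiple of t2 becomes an absolute time. The function names are hypothetical, and the 10-clock figure in the usage line is only an example of a multiple.

```python
def period_ns(freq_mhz):
    """Clock period in nanoseconds for a frequency in MHz."""
    return 1000.0 / freq_mhz


def clocks_to_ns(clocks, freq_mhz):
    """Duration of a whole number of clock periods, in nanoseconds."""
    return clocks * period_ns(freq_mhz)


# At the 50 MHz maximum frequency the period is the 20 ns minimum of t2.
t2_min = period_ns(50)
# Ten clock periods at 50 MHz.
ten_clocks = clocks_to_ns(10, 50)
```

The 16.6 MHz minimum frequency corresponds to a period of about 60.2 ns, which matches the 60 ns maximum period once the frequency figure is read as rounded.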
Table 10-3. Memory Bus Controller 82495XP/82490XP Interface
Vcc = 5V ± 5%, Tcase = 0 to +85°C
Maximum CL = 50 pF unless otherwise specified.
Minimum CL = 20 pF unless otherwise specified.
All Inputs and Outputs are TTL Level.

Symbol  Parameter                                Min  Max  Unit  Figure  Notes
t30     BRDY#, CRDY#, KWEND#, SWEND#,            8         ns    10-3    82495XP Only
        BGT#, CNA#, [WRMRST] Setup Time
t30a    BRDY#, CRDY# Setup Time                  7         ns    10-3    82490XP Only
t31     BRDY#, CRDY#, KWEND#, SWEND#,            1         ns    10-3    82495XP Only
        BGT#, CNA#, [WRMRST] Hold Time
t32     CW/R#, CD/C#, CMI/O#, RDYSRC,            2    12   ns    10-2
        MCACHE#, KLOCK#, BLE#, PALLC#,
        CAHOLD, CWAY, FSIOUT#, CADS#,
        CDTS#, SNPADS# Valid Delay
t33     NENE#, SMLN# Valid Delay                 2    15   ns    10-2
t34     MDATA Setup to CLK (clock before         6         ns    10-3
        BRDY# active)
t35     MDATA Valid Delay from CLK (CLK from     3    15   ns    10-2
        CDTS# valid, MDOE# active)
t36     MDATA Valid Delay from MDOE# active           10   ns    10-2
t37     MDATA Float Delay from MDOE# inactive    0    14   ns
Table 10-4. 82495XP Memory Interface
Vcc = 5V ± 5%, Tcase = 0 to +85°C
Maximum CL = 50 pF unless otherwise specified.
Minimum CL = 20 pF unless otherwise specified.
All Inputs and Outputs are TTL Level.

Symbol  Parameter                                Min  Max  Unit  Figure  Notes
t50     SNPCLK Frequency                              50   MHz           1x clock (10)
t51     SNPCLK Period                            20        ns    10-1    (11)
t52     SNPCLK High Time                         8         ns    10-1
t53     SNPCLK Low Time                          8         ns    10-1
t54     SNPCLK Rise Time                              2    ns            (1)
t55     SNPCLK Fall Time                              2    ns            (1)
t56     MCFA6-MCFA0, MSET10-MSET0,               2    13   ns    10-5    (2), (3)
        MTAG11-MTAG0 Valid Delay
t57     MCFA6-MCFA0, MSET10-MSET0,               2    15   ns    10-5    (4)
        MTAG11-MTAG0 Float Delay
t58     MCFA6-MCFA0, MSET10-MSET0,               2    15   ns    10-5    (5)
        MTAG11-MTAG0 Valid Delay
t60     MCFA6-MCFA0, MSET10-MSET0,               2    15   ns    10-2    (6), (12)
        MTAG11-MTAG0 Valid Delay
t62a    MCFA6-MCFA0, MSET10-MSET0,               8         ns    10-3    (7a)
        MTAG11-MTAG0, SNPINV, SNPNCA, MAOE#,
        MBAOE#, SNPSTB# Setup Time
t62b    MCFA6-MCFA0, MSET10-MSET0,               1         ns    10-3    (7b)
        MTAG11-MTAG0, SNPINV, SNPNCA, MAOE#,
        MBAOE# Setup Time
t62c    MCFA6-MCFA0, MSET10-MSET0,               8         ns    10-3    (7c)
        MTAG11-MTAG0, SNPINV, SNPNCA, MAOE#,
        MBAOE#, SNPSTB# Setup Time
t63a    MCFA6-MCFA0, MSET10-MSET0,               1         ns    10-3    (7a)
        MTAG11-MTAG0, SNPINV, SNPNCA, MAOE#,
        MBAOE#, SNPSTB# Hold Time
t63b    MCFA6-MCFA0, MSET10-MSET0,               8         ns    10-3    (7b)
        MTAG11-MTAG0, SNPINV, SNPNCA, MAOE#,
        MBAOE# Hold Time
t63c    MCFA6-MCFA0, MSET10-MSET0,               1         ns    10-3    (7c)
        MTAG11-MTAG0, SNPINV, SNPNCA, MAOE#,
        MBAOE#, SNPSTB# Hold Time
t64     SNPSTB# Setup Time                       8         ns    10-3    (8)
t65     SNPSTB# Hold Time                        1         ns    10-3    (8)
t66     SNPSTB# Active/Inactive Time             8         ns    10-3    (9)
t67     MRO#, MKEN#, DRCTM#, MWB/WT#             8         ns    10-3
        Setup Time
t68     MRO#, MKEN#, DRCTM#, MWB/WT#             1         ns    10-3
        Hold Time
t69     MTHIT#, MHITM#, SNPBSY#, SNPCYC#         2    13   ns    10-2
        Valid Delay
t69a    SNPCYC# Valid Delay                      2    12   ns    10-2

NOTES:
(1) Rise/fall times measured between 0.45V and 2.4V
(2) See capacitive derating curves for loads above the 50 pF specification
(3) Valid delay from MAOE#, MBAOE# going active (low)
(4) Float delay from MAOE#, MBAOE# going inactive (high)
(5) Valid delay from MALE or MBALE if both MAOE#, MBAOE# are active
(6) Valid delay from CLK only if MALE or MBALE, MAOE# and MBAOE# are active
(7) a. In clocked mode referenced to SNPCLK rising edge
b. In strobed mode referenced to SNPSTB# falling edge
c. In synchronous mode, refer to CLK
(8) Asynchronous clocked mode only. Timings referenced to SNPCLK
(9) Asynchronous signal. Time to guarantee recognition on next clock
(10) SNPCLK is only used for the clocked memory bus mode
(11) t51 > t2
(12) This parameter is valid either from SNPCLK or CLK
Table 10-5. 82490XP Clocked Mode
Vcc = 5V ± 5%, Tcase = 0 to +85°C
Maximum CL = 50 pF unless otherwise specified.
Minimum CL = 20 pF unless otherwise specified.
All Inputs and Outputs are TTL Level.

Symbol  Parameter                                Min  Max  Unit  Figure  Notes
t38     MBRDY#, MSEL#, MEOC# Setup to MCLK       5         ns    10-3
t39     MBRDY#, MSEL#, MEOC# Hold from MCLK      2         ns    10-3
t40     MZBT#, MFRZ# Setup to MCLK               5         ns    10-3
t41     MZBT#, MFRZ# Hold from MCLK              2         ns    10-3
t42     MDATA Setup to MCLK                      5         ns    10-3
t43     MDATA Hold from MCLK                     3         ns    10-3
t44     MDATA Valid Delay from MCLK•MBRDY#       2    16   ns    10-2
t45     MDATA Valid Delay from MCLK•MEOC#,       2    20   ns    10-2
        MCLK•MSEL#
t46     MDATA Valid Delay from MOCLK             2    12   ns    10-2
Table 10-6. 82490XP Strobed Mode
Vcc = 5V ± 5%, Tcase = 0 to +85°C
Maximum CL = 50 pF unless otherwise specified.
Minimum CL = 20 pF unless otherwise specified.
All Inputs and Outputs are TTL Level.

Symbol  Parameter                                Min  Max  Unit  Figure  Notes
t85     MISTB, MOSTB High Time                   12        ns    10-6
t86     MISTB, MOSTB Low Time                    12        ns    10-6
t87     MEOC# High Time                          8         ns    10-6
t88     MEOC# Low Time                           8         ns    10-6
t89     MxSTB, MEOC# Rise Time                        2    ns            (1)
t90     MxSTB, MEOC# Fall Time                        2    ns            (1)
t91     MSEL# High Time for restart              8         ns    10-6
t92     MSEL# Setup before transition on MxSTB   5         ns    10-8
t93     MSEL# Hold after transition on MxSTB     10        ns    10-8
t94     MSEL# Hold after transition on MEOC#     2         ns    10-8
t95     MxSTB transition to/from MEOC#           10        ns
        falling transition
t96     MZBT# Setup to MSEL# or MEOC#            5         ns    10-7
        falling edge
t97     MZBT# Hold from MSEL# or MEOC#           2         ns    10-7
        falling edge
t98     MFRZ# Setup to MEOC# falling edge        5         ns    10-7
t99     MFRZ# Hold from MEOC# falling edge       2         ns    10-7
t100    MDATA Setup to MxSTB or MEOC#            5         ns    10-7
        falling transition
t101    MDATA Hold from MxSTB or MEOC#           2         ns    10-7
        falling transition
t102    MDATA Valid Delay from MxSTB transition  2    16   ns    10-9
t103    MDATA Valid Delay from MEOC# falling     2    20   ns    10-9
        transition or MSEL# deactivation

NOTE:
(1) Rise/Fall times are measured between 0.8V and 2.0V
Table 10-7. Test Mode
Vcc = 5V ± 5%, Tcase = 0 to +85°C
Maximum CL = 50 pF unless otherwise specified.
Minimum CL = 20 pF unless otherwise specified.
All Inputs and Outputs are TTL Level.

Symbol  Parameter                  Min  Max  Unit  Figure  Notes
t120    TCK Frequency                   25   MHz           1x clock
t121    TCK Period                 40        ns            (2)
t122    TCK High Time              10        ns            @2.0V
t123    TCK Low Time               10        ns            @0.8V
t124    TCK Rise Time                   4    ns            (1)
t125    TCK Fall Time                   4    ns            (1)
t126    TDI, TMS Setup Time        8         ns    10-10
t127    TDI, TMS Hold Time         7         ns    10-10
t128    TDO Valid Delay            3    25   ns    10-10
t129    TDO Float Delay                      ns    10-10
t130    All Outputs Valid Delay    3    25   ns    10-10   (3)
t131    All Outputs Float Delay         36   ns    10-10   (3)

NOTES:
(1) Rise/Fall times are measured between 0.8V and 2.0V. Rise/Fall times can be relaxed by 1 ns per 10 ns increase in TCK
period
(2) TCK period ≥ CLK period
(3) Parameter measured from TCK
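The relaxation rule in the rise/fall note of the test mode table can be written as a small formula. The sketch assumes the relaxation is linear in the TCK period, starting from the 4 ns limit at the 40 ns minimum period; a stepwise reading (whole 10 ns increments only) would give smaller values between steps. The function name is hypothetical.

```python
BASE_PERIOD_NS = 40.0      # minimum TCK period
BASE_RISE_FALL_NS = 4.0    # maximum rise/fall time at the minimum period


def max_tck_rise_fall_ns(tck_period_ns):
    """Allowed TCK rise/fall time, relaxed by 1 ns per 10 ns of extra period."""
    if tck_period_ns < BASE_PERIOD_NS:
        raise ValueError("TCK period is below the 40 ns minimum")
    return BASE_RISE_FALL_NS + (tck_period_ns - BASE_PERIOD_NS) / 10.0


# A 100 ns TCK period (10 MHz) is 60 ns longer than the minimum.
relaxed = max_tck_rise_fall_ns(100)
```

Under this reading a 10 MHz TCK may have rise and fall times of up to 10 ns.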
[Figure 10-1. Clock Waveform]
[Figure 10-2. Valid Delay Timings (tx = t16, 32, 33, 35, 36, 44, 45, 60, 69)]
[Figure 10-3. Setup and Hold Timings; Figure 10-3a. Setup and Hold Timings in Strobed Snooping Mode (tx = t30, 62a, 62c, 64, 67, 76, 85; ty = t31, 63a, 63c, 65, 68, 77, 86)]
[Figure 10-4. Reset and Configuration Timings]
[Figure 10-5. Memory Interface Signals]
[Figure 10-6. Active/Inactive Timing]
[Figure 10-7. Setup and Hold Timing (setup: t96, 98, 100; hold: t97, 99, 101)]
[Figure 10-8. Setup and Hold Timing (t92, 93, 94)]
[Figure 10-9. Valid Delay Timing]
[Figure 10-10. Test Timings]
APPLICATION NOTE AP-434
November 1989
Using i860™ Microprocessor Graphics Instructions for 3-D Rendering
Order Number: 240856-001
CONTENTS
INTRODUCTION
1.0 3-D RENDERING
2.0 DISTANCE INTERPOLATION
3.0 COLOR INTERPOLATION
4.0 BOUNDARY CONDITIONS
  4.1 Z-Buffer Masking
  4.2 Accumulator Initialization
5.0 THE INNER LOOP
6.0 ALTERNATIVE IMPLEMENTATIONS
FIGURES
Figure 1 Z-Buffer Interpolation
Figure 2 faddz Operands
Figure 3 Pixel Interpolation for Gouraud Shading
Figure 4 faddp Operands

TABLES
Table 1 faddz Visualization
Table 2 Accumulator Initial Values
Table 3 Accumulator Initialization Table

EXAMPLES
Example 1: Setting Pixel Size
Example 2: Register Assignments
Example 3: Construction of Z Interpolants
Example 4: Construction of Color Interpolants
Example 5: Z Mask Procedure
Example 6: Accumulator Initialization
Example 7: 3-D Rendering (1 of 2)
Example 7: 3-D Rendering (2 of 2)
Example 8: Inner Loop of Renderers for Two Pixel Sizes
Introduction
The i860™ 64-bit microprocessor is a general-purpose
CPU with on-chip integer unit, floating point, memory
management, caches, and graphics. The i860 microprocessor supports 3-D graphics software with the following functions:
1. Hidden surface elimination
2. Distance interpolation
3. Intensity interpolation for 3-D shading
The fzchks (Z-buffer Check) and pst (Pixel Store) instructions expedite hidden surface elimination. Distance interpolation is accomplished with faddz (Add
with Z merge), and intensity interpolation occurs with
faddp (Add with Pixel Merge). The purpose of this application note is to illustrate the intended use of these
instructions in a manner independent of any graphics
environment in which the instructions might be used. It
is not the purpose of this application note to present the
most efficient instruction sequences. While the inner
loop of Example 7 has as few instructions as logically
possible, the other examples are intended to present
general concepts, not optimum implementations. Tuning for maximum performance depends on the specific
environment.
This application note assumes familiarity with the
i860™ 64-bit Microprocessor Programmer's Reference
Manual (Intel order number 240329); the i860 microprocessor instructions for graphics are detailed in section 6.6.
1.0 3-D RENDERING
This series of examples are routines that might be used at the lowest level of a graphics software system to convert a machine-independent description of a 3-D image into values for the frame buffer of a color video display. Typically, higher-level graphics routines represent an object as a set of polygons that together roughly describe the surfaces of the objects to be displayed. The graphics system maintains a database that describes these polygons in terms of their colors, properties of reflectance or translucence, and the locations in 3-D space of their vertices. Due to the roughness of the representation, the amount of information in the database is considerably less than that which must be delivered to the video display. A rendering procedure, such as Example 7, uses interpolation to derive the detailed information needed for each pixel in the graphics frame buffer. The rendering procedure also performs pixel-by-pixel hidden-surface elimination.

The focus of this series of examples is Example 7, which operates on a segment of a scan line. The segment is bounded by two points of given location and color: from point (X1, Y0, Z1) with color intensities Red1, Grn1, Blu1 to point (X2, Y0, Z2) with color intensities Red2, Grn2, Blu2. The points and color intensities are determined by higher-level graphics software. The points represent the intersection of the scan line with two edges of the projected image of a polygon. For a given scan line, the rendering procedure is executed once for each polygon that projects onto that scan line. The higher-level graphics software is responsible for orienting the objects with respect to the viewer, for making perspective calculations, for scaling, and for determining the amount of light that falls on each polygon vertex.

The 16-bit pixel format is used, giving ample resolution for color shading: 2^6 intensity values for red, 2^6 intensity values for green, and 2^4 intensity values for blue. Example 1 shows how to set the pixel size. For hidden-surface elimination, the Z-buffer (or depth buffer) technique is employed, each Z value having a resolution of 16 bits.

Because the examples presented here use almost all of the registers of the i860 microprocessor, the registers are given symbolic names, as defined by Example 2. In a real application, it is likely that some of the inputs to the rendering procedure would be passed in floating-point registers instead of the integer registers employed here. The register allocation shown in Example 2 simplifies the examples by avoiding the need to use any register for multiple purposes.

// SET PIXEL SIZE TO 16
    ld.c     psr,    Ra          // Work on psr
    andnoth  0x00C0, Ra,  Ra     // Clear PS
    orh      0x0040, Ra,  Ra     // PS = 16-bit pixels
    st.c     Ra,     psr         //

Example 1. Setting Pixel Size
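The read-modify-write of Example 1 can be modeled on a plain 32-bit integer. The following is a minimal sketch in Python, not i860 code. The function name is hypothetical; the only facts taken from the example are the two immediate values and that andnoth and orh apply their immediate to the high-order 16 bits of the register.

```python
MASK32 = 0xFFFFFFFF

def set_pixel_size_16(psr):
    """Model of Example 1: clear the PS field, then select 16-bit pixels."""
    psr &= ~(0x00C0 << 16) & MASK32   # andnoth 0x00C0, Ra, Ra: clear PS
    psr |= 0x0040 << 16               # orh 0x0040, Ra, Ra: PS = 16-bit pixels
    return psr & MASK32
```

All other bits of the register pass through unchanged, which is why the example loads psr first instead of storing a constant.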
// REGISTER DEFINITIONS FOR RENDERING PROCEDURE

// INTEGER LOCALS
Ra      = r4    // Temporary
Rb      = r5    // Temporary
Rc      = r6    // Temporary
Rd      = r7    // Temporary

// INTEGER INPUTS
X1      = r16   // X coordinate of starting point of line segment in pixels
dX      = r17   // Width of scan line segment in number of pixels
ZBP     = r18   // Z-buffer pointer to the current line segment
Z1      = r19   // Initial Z value, fixed-point 16.16 format
mZ      = r20   // Z slope, fixed-point 16.16 format
FBP     = r21   // Graphics frame buffer pointer to the current line segment
Red1    = r22   // Initial red intensity, fixed-point 6.10 format, plus .5
Grn1    = r23   // Initial green intensity, fixed-point 6.10 format, plus .5
Blu1    = r24   // Initial blue intensity, fixed-point 6.10 format, plus .5
mR      = r25   // Red slope, fixed-point 6.10 format
mG      = r26   // Green slope, fixed-point 6.10 format
mB      = r27   // Blue slope, fixed-point 6.10 format

// REAL LOCALS
aZ      = f2    // Accumulated Z values
aZh     = f3    //
iZ1     = f4    // Z interpolant, coefficient 1.0
iZ1h    = f5    //
iZ3     = f6    // Z interpolant, coefficient 3.0
iZ3h    = f7    //
oldz    = f8    // Original values from the Z-buffer
newz    = f10   // New Z-buffer values
newzh   = f11   //
newi    = f12   // New pixel values
iR      = f14   // Red interpolant, coefficient 4.0
iRh     = f15   //
aR      = f16   // Accumulated red intensities
aRh     = f17   //
iG      = f18   // Green interpolant, coefficient 4.0
iGh     = f19   //
aG      = f20   // Accumulated green intensities
aGh     = f21   //
iB      = f22   // Blue interpolant, coefficient 4.0
iBh     = f23   //
aB      = f24   // Accumulated blue intensities
aBh     = f25   //
lZmask  = f26   // left-end Z mask
lZmaskh = f27   //
rZmask  = f28   // right-end Z mask
rZmaskh = f29   //

Example 2. Register Assignments
2.0 DISTANCE INTERPOLATION
To perform hidden surface elimination at each pixel, the rendering routine first interpolates the value of Z at each pixel. Distance interpolation consists of calculating the slope of Z over the given line segment, then increasing the Z value of each successive pixel by that amount, starting from X1. The width of the line segment in pixels is ...

    dX = X2 - X1

Calculate the reciprocal of dX:

    RdX = 1/dX

The value of dX is used several times as a divisor. It is most efficient to calculate its reciprocal once, then, instead of dividing by dX, multiply by RdX. The slope of Z is ...

    mZ = (Z2 - Z1) * RdX

Because each polygon is a plane, the value of mZ is constant for all scan lines that intersect the polygon; therefore mZ needs to be calculated only once for each polygon. Example 7 assumes that dX and mZ have already been calculated, and all that remains is to apply mZ to successive pixels. Let Z(Xn) be the Z value at pixel Xn. Then ...

    Z(X1)      = Z1
    Z(X1 + 1)  = Z1 + mZ
    Z(X1 + 2)  = Z1 + 2*mZ
    ...
    Z(X1 + N)  = Z1 + N*mZ
    ...
    Z(X1 + dX) = Z1 + dX*mZ = Z(X2)

Figure 1 illustrates this Z-value interpolation.
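The interpolation equations above can be checked with a few lines of fixed-point arithmetic. This is a minimal sketch in Python, assuming the 16.16 format that Example 2 gives for Z1 and mZ. The function names are hypothetical, and the slope is computed with an integer division for brevity rather than by multiplying with the reciprocal RdX.

```python
FRAC_BITS = 16  # 16.16 fixed point, as listed for Z1 and mZ in Example 2

def z_slope(z1, z2, dx):
    """mZ = (Z2 - Z1) / dX, returned in 16.16 fixed point."""
    return ((z2 - z1) << FRAC_BITS) // dx

def z_values(z1, mz, dx):
    """Integer Z value at each pixel X1 .. X1 + dX (dX + 1 values)."""
    acc = z1 << FRAC_BITS          # accumulator starts at Z(X1) = Z1
    values = []
    for _ in range(dx + 1):
        values.append(acc >> FRAC_BITS)
        acc += mz                  # Z(X1 + N + 1) = Z(X1 + N) + mZ
    return values
```

For instance, z_values(2400, z_slope(2400, 3000, 12), 12) walks from 2400 to 3000 in steps of 50.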
[Figure 1. Z-Buffer Interpolation (example slope: mZ = (3000 - 2400) / 12 pixels)]

The faddz instruction helps to perform the above calculations 64 bits at a time. Because a Z value is 16 bits wide, Example 7 operates on the Z buffer in groups of four. The faddz instruction, however, treats the interpolation values (N*mZ) as 32-bit fixed-point numbers; therefore, two faddz instructions are executed for each group of four pixels. Because of the way the faddz shifts the MERGE register, the first faddz corresponds to even-numbered pixels, while the second corresponds to odd-numbered pixels. Instead of starting with the value for the first pixel (Z(X1)) and adding mZ to each pixel to produce the value for the next pixel, the example procedure starts with the values for the first two even-numbered pixels and adds 1*mZ to each of these values to produce the values for the adjacent odd-numbered pair. Adding 3*mZ to each of the Z values of an odd-numbered pair produces the values for the next even-numbered pair. Figure 2 shows one way of constructing the operands before starting the distance interpolations. (The initial value given to src1 depends on the alignment of the first pixel.) Table 1 helps to visualize the process.

After two faddz instructions, the MERGE register holds the Z values for four adjacent pixels (in the correct order). The form instruction copies MERGE into one of the 64-bit floating-point registers.
Figure 2. faddz Operands

  Operand                        Bits 63..32     Bits 31..0
  Accumulator (initial src1)     Z1 - 1.0*mZ     Z1 - 3.0*mZ
  Interpolant (first src2)       3.0*mZ          3.0*mZ
  Interpolant (second src2)      1.0*mZ          1.0*mZ

  In each 32-bit half, the low-order 16 bits (47..32 and 15..0) hold the fraction.
Table 1. faddz Visualization

                                     MERGE Register
  Operands      63-32    31-0     63-48   47-32   31-16   15-0
  src1           -1.0    -3.0
  src2            3.0     3.0
  rdest/src1      2.0     0.0       2       -       0      -
  src2            1.0     1.0
  rdest/src1      3.0     1.0       3       2       1      0
  src2            3.0     3.0
  rdest/src1      6.0     4.0       6       -       4      -
  src2            1.0     1.0
  rdest/src1      7.0     5.0       7       6       5      4
  src2            3.0     3.0
  rdest/src1     10.0     8.0      10       -       8      -
  src2            1.0     1.0
  rdest/src1     11.0     9.0      11      10       9      8
  src2            3.0     3.0
  rdest/src1     14.0    12.0      14       -      12      -
  src2            1.0     1.0
  rdest          15.0    13.0      15      14      13     12

Because the values of Z1 and mZ are constant for each loop through the rendering routine, the numbers shown here are the values of the coefficient N, where the actual operands have the values Z1 + N*mZ. For each execution of faddz, src1 is the same as rdest of the prior faddz. After every two faddz instructions, a form instruction empties the MERGE register.
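The sequence in Table 1 can be traced with a small behavioural model. This is a sketch in Python under stated assumptions, not a definition of the instruction: faddz is modeled as two independent 32-bit adds whose integer parts (bits 63..48 and 31..16 of the sum) are loaded into MERGE after MERGE is shifted right by 16, and MERGE is taken to be empty at the start of each group because the preceding form instruction cleared it. The helper names pack, faddz and four_z_values are hypothetical.

```python
MASK32 = 0xFFFFFFFF
Z_FIELDS = 0xFFFF0000FFFF0000   # bits 63..48 and 31..16

def pack(hi, lo):
    """Join two 32-bit 16.16 values into one 64-bit operand."""
    return ((hi & MASK32) << 32) | (lo & MASK32)

def faddz(src1, src2, merge):
    """Modeled faddz: add each 32-bit half, then merge the integer parts."""
    hi = ((src1 >> 32) + (src2 >> 32)) & MASK32
    lo = ((src1 & MASK32) + (src2 & MASK32)) & MASK32
    total = (hi << 32) | lo
    merge = ((merge >> 16) & ~Z_FIELDS) | (total & Z_FIELDS)
    return total, merge

def four_z_values(acc, iz3, iz1):
    """Two faddz steps; returns the new accumulator and four Z values."""
    acc, merge = faddz(acc, iz3, 0)      # even-numbered pixels
    acc, merge = faddz(acc, iz1, merge)  # odd-numbered pixels
    z = [(merge >> (16 * i)) & 0xFFFF for i in range(4)]  # pixel order
    return acc, z
```

With Z1 = 100 and mZ = 2 (both shifted left by 16), an accumulator of pack(Z1 - mZ, Z1 - 3*mZ) yields the Z values 100, 102, 104, 106 for the first group and 108, 110, 112, 114 for the second, matching coefficients 0..3 and 4..7 in the table.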
// CONSTRUCT INTERPOLANTS iZ1 AND iZ3 GIVEN mZ
    ixfr     mZ,   iZ1           // Join each half in 64-bit register
    shl      1,    mZ,   Ra      // Ra = 2*mZ
    adds     Ra,   mZ,   Ra      // Ra = 3*mZ
    ixfr     Ra,   iZ3           // Join each half in 64-bit register
    fmov.ss  iZ1,  iZ1h          // Join each half in 64-bit register
    fmov.ss  iZ3,  iZ3h          // Join each half in 64-bit register

Example 3. Construction of Z Interpolants
[Figure 3. Pixel Interpolation for Gouraud Shading (red color, range 0-63, interpolated across 12 pixels)]
The same register is used as both src1 and rdest in all faddz instructions. This register serves to accumulate Z values for successive pixels; therefore, it is called an accumulator. The registers used as src2 are called interpolants. The code in Example 3 constructs the interpolants; it needs to be executed only once for each polygon.

3.0 COLOR INTERPOLATION
To determine the RGB color intensities at each pixel, the rendering routine interpolates between the color intensities at the end points. (This rendering technique is called "Gouraud shading" after H. Gouraud, "Continuous Shading of Curved Surfaces," IEEE Transactions on Computers, C-20(6), June 1971, pp. 623-628.) Let the symbol C (color) represent either R (red), G (green), or B (blue). Color interpolation consists of calculating the slope of C over the given line segment, then increasing the C values of each successive pixel by that amount, starting from the values for X1. This must be done for C=R, C=G, and C=B. The slope of C is ...

    mC = (C2 - C1) * RdX

... where RdX = 1/dX

The value of mC is constant for all scan lines that intersect a given pair of polygon edges; therefore mC needs to be calculated only once for each such pair. Example 7 assumes that mC has already been calculated for all colors, and all that remains is to apply mC to successive pixels. Let C(Xn) be a C value at pixel Xn. Then ...

    C(X1)      = C1
    C(X1 + 1)  = C1 + mC
    C(X1 + 2)  = C1 + 2*mC
    ...
    C(X1 + N)  = C1 + N*mC
    ...
    C(X1 + dX) = C1 + dX*mC = C(X2)

Figure 3 illustrates Gouraud shading of a triangle.

The faddp instruction performs the above calculations 64 bits at a time. Because a pixel is 16 bits wide, Example 7 operates on pixels in groups of four. Instead of starting with the value for the first pixel (C(X1)) and adding mC to each pixel to produce the value for the next pixel, the example procedure starts with the values for the first four pixels and adds 4*mC to each group of four to produce the values for the next four. Three faddp instructions are executed for each group of four pixels. The first increments the blue values; the second, green; the third, red. Figure 4 shows one way of constructing the operands for each color before starting the color interpolations. (The initial value given to src1 depends on the alignment of the first pixel.)

Setup of the accumulator and interpolants is similar to that of the Z-buffer. The code in Example 4 constructs the interpolants; it needs to be executed only once for each pair of edges in each polygon.
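The way three faddp steps assemble four 16-bit pixels can be sketched with a behavioural model. This is Python written under explicit assumptions rather than a definition of the instruction: for 16-bit pixels, faddp is modeled as shifting MERGE right by 6 and loading the top 6 bits of each 16-bit field of the sum; the add is done per field, so carries between fields are not modeled; and MERGE is assumed empty at the start of each group. The names pack4, faddp16 and shade4 are hypothetical.

```python
MASK16 = 0xFFFF
PIX_FIELDS = 0xFC00FC00FC00FC00   # top 6 bits of each 16-bit field

def pack4(fields):
    """Join four 16-bit values; fields[0] is the low-order (first) pixel."""
    word = 0
    for i, f in enumerate(fields):
        word |= (f & MASK16) << (16 * i)
    return word

def faddp16(src1, src2, merge):
    """Modeled faddp for 16-bit pixels: per-field add, then merge."""
    total = 0
    for i in range(4):
        s = ((src1 >> (16 * i)) + (src2 >> (16 * i))) & MASK16
        total |= s << (16 * i)
    merge = ((merge >> 6) & ~PIX_FIELDS) | (total & PIX_FIELDS)
    return total, merge

def shade4(acc_b, acc_g, acc_r, i_b, i_g, i_r):
    """Blue, green, red in that order; returns accumulators and 4 pixels."""
    acc_b, merge = faddp16(acc_b, i_b, 0)
    acc_g, merge = faddp16(acc_g, i_g, merge)
    acc_r, merge = faddp16(acc_r, i_r, merge)
    pixels = [(merge >> (16 * i)) & MASK16 for i in range(4)]
    return (acc_b, acc_g, acc_r), pixels
```

In this model each finished pixel holds 6 bits of red in bits 15..10, 6 bits of green in bits 9..4, and the top 4 bits of blue in bits 3..0, which agrees with the 2^6, 2^6 and 2^4 intensity levels quoted for the 16-bit pixel format.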
4.0 BOUNDARY CONDITIONS
The i860 microprocessor operates on 64-bit quantities that are aligned on 8-byte boundaries. The code in this example takes full advantage of this design, handling four 16-bit pixels in each loop. However, if the first or last pixel of a line segment is not on an 8-byte boundary, two kinds of special considerations are required:
1. Masking of Z values near the end points.
2. Initialization of the accumulators.

4.1 Z-Buffer Masking
When either the first or last pixel of the line segment is not at an 8-byte boundary, the rendering procedure must mask the first or last set of new Z-buffer values (newz) so that the Z-buffer and the frame buffer are not erroneously updated. Sometimes both the first and last pixels are in the same 4-pixel set, in which case either one may not be on an 8-byte boundary. A function that looks up and calculates masks is shown in Example 5. Because the value 0xFFFF is used for masking, the Z-buffer is initialized with 0xFFFE, so that the fzchks instruction always finds the mask to be greater than any Z-buffer contents.

Figure 4. faddp Operands

  Operand                       63..48       47..32       31..16      15..0
  Accumulator (initial src1)    C1 + 3*mC    C1 + 2*mC    C1 + mC     C1
  Interpolant (src2)            4*mC         4*mC         4*mC        4*mC

  Each 16-bit field holds an integer part and a fraction.
// CONSTRUCT INTERPOLANTS iR, iG, iB GIVEN mR, mG, mB
    shl      18,  mR,  Ra        // Multiply each color slope by four, then
    shl      18,  mG,  Rb        // shift by 16 to put the significant
    shl      18,  mB,  Rc        // bits into the high-order half
    shr      16,  Ra,  mR        // Return significant 16 bits
    shr      16,  Rb,  mG        // to low-order half. Any sign bits
    shr      16,  Rc,  mB        // in high-order half are gone.
    or       mR,  Ra,  Ra        // Join 16-bit quarters
    or       mG,  Rb,  Rb        // in 32-bit register
    or       mB,  Rc,  Rc        //
    ixfr     Ra,  iR             // Join 32-bit halves
    ixfr     Rb,  iG             // in 64-bit register
    ixfr     Rc,  iB             //
    fmov.ss  iR,  iRh            //
    fmov.ss  iG,  iGh            //
    fmov.ss  iB,  iBh            //

Example 4. Construction of Color Interpolants
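The shift-and-join sequence of Example 4 reduces to replicating the low 16 bits of 4*mC into all four fields of a 64-bit value. A minimal sketch in Python, assuming 32-bit integer registers and a logical right shift as the example's comments imply; the function name is hypothetical.

```python
MASK32 = 0xFFFFFFFF

def color_interpolant(m):
    """Replicate (4*m) & 0xFFFF into four 16-bit fields, as in Example 4."""
    high = (m << 18) & MASK32      # shl 18: times four, moved to high half
    quarter = high >> 16           # shr 16: back to low half, sign bits gone
    word = high | quarter          # or: join 16-bit quarters
    return (word << 32) | word     # ixfr + fmov.ss: join 32-bit halves
```

A slope of 0x0100 (0.25 in 6.10 format) gives 0x0400 in every field. A slope of -1 gives 0xFFFC in every field, so a negative slope survives as a 16-bit two's-complement quantity.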
.macro zmask l_align, r_align, Rx, Ry
// l_align, r_align    left- and right-end alignment [0..3] in 2-byte units
// Rx, Ry              scratch registers
    .data
    .align 8
left_mask::    //  low          high
    .long 0x00000000,  0x00000000      // 0 mod 4
    .long 0x0000FFFF,  0x00000000      // 1 mod 4
    .long 0xFFFFFFFF,  0x00000000      // 2 mod 4
    .long 0xFFFFFFFF,  0x0000FFFF      // 3 mod 4
right_mask::   //  low          high
    .long 0xFFFF0000,  0xFFFFFFFF      // 0 mod 4
    .long 0x00000000,  0xFFFFFFFF      // 1 mod 4
    .long 0x00000000,  0xFFFF0000      // 2 mod 4
    .long 0x00000000,  0x00000000      // 3 mod 4
    .text
    shl      3,  l_align,  l_align     // Multiply by 8
    mov      left_mask,  Rx            //
    fld.d    l_align(Rx),  lZmask      // Load 8-byte mask
    shl      3,  r_align,  r_align     // Multiply by 8
    mov      right_mask,  Rx           //
    fld.d    r_align(Rx),  rZmask      // Load 8-byte mask
// If the first and last pixels are contained in the same 64-bit
// aligned set, then lZmask = lZmask OR rZmask.
    andh     0x8000,  dX,  r0          // Is dX negative
    bc       L2                        // If not, right end is in other set
    fxfr     lZmask,  Rx               //
    fxfr     rZmask,  Ry               //
    or       Ry,  Rx,  Rx              // OR low-order half
    ixfr     Rx,  lZmask               //
    fxfr     lZmaskh,  Rx              //
    fxfr     rZmaskh,  Ry              //
    or       Ry,  Rx,  Rx              // OR high-order half
    ixfr     Rx,  lZmaskh              //
L2: nop                                //
.endm

Example 5. Z Mask Procedure
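The two lookup tables in Example 5 follow a simple pattern: the left mask sets 0xFFFF in every 16-bit field below the first valid pixel, and the right mask sets 0xFFFF in every field above the last valid pixel. The sketch below, in Python with hypothetical function names, computes the same 64-bit values arithmetically, treating each .long pair as little-endian (low word first) as the table's column headings indicate.

```python
MASK64 = (1 << 64) - 1

def left_mask(align):
    """0xFFFF in each field below the first pixel (align in 2-byte units)."""
    return (1 << (16 * align)) - 1

def right_mask(align):
    """0xFFFF in each field above the last pixel (align in 2-byte units)."""
    return MASK64 & ~((1 << (16 * (align + 1))) - 1)

def z_masks(l_align, r_align, same_set):
    """Return (lZmask, rZmask); the masks are ORed if both ends share a set."""
    lz = left_mask(l_align)
    rz = right_mask(r_align)
    if same_set:
        lz |= rz
    return lz, rz
```

When the form instruction ORs one of these masks into the new Z values, each masked field becomes 0xFFFF, which can never compare as closer than a Z-buffer entry of at most 0xFFFE.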
Table 2. Accumulator Initial Values

  Alignment    Initial Z Accumulator Values
      0        Z1 - 1*mZ    Z1 - 3*mZ
      2        Z1 - 2*mZ    Z1 - 4*mZ
      4        Z1 - 3*mZ    Z1 - 5*mZ
      6        Z1 - 4*mZ    Z1 - 6*mZ

  Alignment    Initial Color Accumulator Values (C = R, G, B)
      0        C1 - 1*mC    C1 - 2*mC    C1 - 3*mC    C1 - 4*mC
      2        C1 - 2*mC    C1 - 3*mC    C1 - 4*mC    C1 - 5*mC
      4        C1 - 3*mC    C1 - 4*mC    C1 - 5*mC    C1 - 6*mC
      6        C1 - 4*mC    C1 - 5*mC    C1 - 6*mC    C1 - 7*mC

Table 3. Accumulator Initialization Table

               Table Values
  Alignment    *mZ       *mR               *mG               *mB
      0        -1, -3    -1, -2, -3, -4    -1, -2, -3, -4    -1, -2, -3, -4
      2        -2, -4    -2, -3, -4, -5    -2, -3, -4, -5    -2, -3, -4, -5
      4        -3, -5    -3, -4, -5, -6    -3, -4, -5, -6    -3, -4, -5, -6
      6        -4, -6    -4, -5, -6, -7    -4, -5, -6, -7    -4, -5, -6, -7

4.2 Accumulator Initialization
When the first pixel of the line segment is not at an 8-byte boundary, initial values placed in the accumulators (aZ, aB, aG, and aR) must be selected so that Z1, Red1, Grn1, and Blu1 correspond to the correct pixel. The desired result is that shown by Table 2. However, each value is a composite of two terms: one that is constant for each edge pair (n*mZ, n*mR, n*mG, n*mB) and one that can vary with each scan line (Z1, Red1, Grn1, Blu1). The example assumes that the constant values have all been calculated and stored in a memory table of the format shown by Table 3. At the beginning of each line segment the values appropriate to the alignment of the line segment are retrieved from the table and added to the initial Z and color values, as shown in Example 6.

5.0 THE INNER LOOP
Once the proper preparations have been made, only a minimal amount of code is needed to render each scan-line segment of a polygon. The code shown in Example 7 operates on four pixels in each loop. The left and right ends of the line segment go through different logic paths so that the Z-buffer masks can be applied by the form instruction. All the interior points are handled by the tight inner loop.

The controlling variable dX is zero-relative and is expressed as a number of pixels. The value of dX also indicates alignment of the end-points with respect to the 4-pixel groups. Unaligned left-end pixels are subtracted from dX before entering the inner loop; therefore, subsequent values of dX indicate the alignment of the right end. A value that is 3 mod 4 indicates that the right end is aligned, which explains the test for a value of -5 near the end of the loop (-5 mod 4 = 3). The fact that the value -5 is loaded into register Rb on every execution of the loop does not represent a programming inefficiency, because there is nothing else for the core unit to do at that point anyway.
// ACCUMULATOR INITIALIZATION TABLE
    .data; .align .double
acc_init_tab:: .double [16] 0
    .dsect
aBi:    .double          // Four initial 16-bit blue values
aGi:    .double          // Four initial 16-bit green values
aRi:    .double          // Four initial 16-bit red values
aZi:    .double          // Two initial 32-bit Z values
    .end
    .text

// INITIALIZE ACCUMULATORS
.macro acc_init Lalign, Rtab, Rx, Ry, Fx, Fxh
// Lalign - left-end alignment (0..3) in two-byte units
// Rtab - register to use for addressing the table
// Rx, Ry, Fx, Fxh - scratch registers
    mov       acc_init_tab, Rtab       //
    shl       5,  Lalign,  Lalign      // Multiply by row width
    adds      Lalign,  Rtab,  Rtab     // Index row corresponding to alignment
    fld.d     aZi(Rtab),  aZ           // Z
    ixfr      Z1,  Fx                  // Z
    fld.d     aRi(Rtab),  aR           // R-Load constant values
    shl       16,  Red1,  Rx           // R-Shift starting value to hi-order
    fmov.ss   Fx,  Fxh                 // Z
    shr       16,  Rx,  Ry             // R-Red1 stripped of sign bits
    fiadd.dd  Fx,  aZ,  aZ             // Z
    or        Rx,  Ry,  Ry             // R-Form (Red1,Red1)
    ixfr      Ry,  Fx                  // R-Put in 64-bit register
    fld.d     aGi(Rtab),  aG           // G
    shl       16,  Grn1,  Rx           // G
    fmov.ss   Fx,  Fxh                 // R-Form (Red1,Red1,Red1,Red1)
    shr       16,  Rx,  Ry             // G
    fiadd.dd  Fx,  aR,  aR             // R-Add variables to constants
    or        Rx,  Ry,  Ry             // G
    ixfr      Ry,  Fx                  // G
    fld.d     aBi(Rtab),  aB           // B
    shl       16,  Blu1,  Rx           // B
    fmov.ss   Fx,  Fxh                 // G
    shr       16,  Rx,  Ry             // B
    fiadd.dd  Fx,  aG,  aG             // G
    or        Rx,  Ry,  Ry             // B
    ixfr      Ry,  Fx                  // B
    fmov.ss   Fx,  Fxh                 // B
    fiadd.dd  Fx,  aB,  aB             // B
.endm

Example 6. Accumulator Initialization
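The coefficients of Table 3 and the sums of Table 2 follow directly from the alignment. The sketch below, in Python with hypothetical function names, generates one table row and the corresponding initial accumulator values. It takes the alignment in two-byte units (0..3), as the acc_init macro does, whereas the tables list it in bytes (0, 2, 4, 6); field packing and fixed-point scaling are left out to keep the arithmetic visible.

```python
def table3_row(align_units):
    """Coefficients for one alignment: (Z pair, color quadruple)."""
    z = (-(align_units + 1), -(align_units + 3))
    c = tuple(-(align_units + k) for k in range(1, 5))
    return z, c

def initial_accumulators(align_units, z1, mz, c1, mc):
    """Table 2 values: the scan-line term plus the per-edge-pair term."""
    z_coef, c_coef = table3_row(align_units)
    z_init = [z1 + n * mz for n in z_coef]
    c_init = [c1 + n * mc for n in c_coef]
    return z_init, c_init
```

The split mirrors the text: table3_row depends only on the alignment and so can be precomputed (scaled by the slopes) once per edge pair, while z1 and c1 change with every scan line.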
// RENDERING PROCEDURE
// 16-bit pixels, 16-bit Z-buffer
    and       3,  X1,  Ra              // Determine alignment of starting-point
    acc_init  Ra, Rb, Rc, Rd, Fa, Fah  // Initialize accumulators
    subs      4,  Ra,  Rb              // 4 - alignment
    subs      dX,  Rb,  dX             // Adjust dX by X1 alignment
// If dX <= 0, then right end is in same set as left end
    and       3,  dX,  Rb              // Determine alignment of right end
    zmask     Ra, Rb, Rc, Rd           // Prepare both left- and right-end masks

left_end::                             // Handle boundary conditions
    d.faddz   aZ,  iZ3,  aZ            // Interpolate 2 even Z values
    adds      -8,  FBP,  FBP           // Anticipate autoincrement
    d.faddz   aZ,  iZ1,  aZ            // Interpolate 2 odd Z values
    adds      -8,  ZBP,  ZBP           // Anticipate autoincrement
    d.form    lZmask,  newz            // Mask 4 new Z values
    fld.d     8(ZBP),  oldz            // Fetch 4 old Z values
    d.faddp   aB,  iB,  aB             // Interpolate 4 blue intensities
    mov       -4,  Ra                  // Loop increment: 4 pixels
    d.faddp   aG,  iG,  aG             // Interpolate 4 green intensities
    adds      -4,  dX,  dX             // Prepare dX for bla at end of loop
    d.faddp   aR,  iR,  aR             // Interpolate 4 red intensities
    bla       Ra,  dX,  L1             // Initialize LCC
    d.form    f0,  newi                // Move 4 new pixels to 64-bit reg
    adds      5,  dX,  r0              // Are there any whole sets (dX < -5)?
L1: d.fzchks  oldz,  newz,  newz       // Mark closer points in PM[7..4]
    bc        short_segment            // Get out now if no whole set
    d.fnop                             //
    fld.d     16(ZBP),  oldz           // Fetch 4 old Z values

inner_loop::                           // Handle all interior points
    d.faddz   aZ,  iZ3,  aZ            // Interpolate 2 even Z values
    nop                                //
    d.faddz   aZ,  iZ1,  aZ            // Interpolate 2 odd Z values
    fst.d     newz,  8(ZBP)++          // Update Z buf from prior loop
    d.form    f0,  newz                // Move 4 new Z values to 64-bit reg
    nop                                //
    d.fzchks  f0,  f0,  f0             // Shift PM[7..4] to PM[3..0]
    mov       -5,  Rb                  // -5 mod 4 = 3, aligned right end
    d.faddp   aB,  iB,  aB             // Interpolate 4 blue intensities
    pst.d     newi,  8(FBP)++          // Store pixels indicated by PM[3..0]
    d.faddp   aG,  iG,  aG             // Interpolate 4 green intensities
    xor       Rb,  dX,  r0             // Are we at an aligned right end?
    d.faddp   aR,  iR,  aR             // Interpolate 4 red intensities
    bc        aligned_end              // Taken if at an aligned right end
    d.form    f0,  newi                // Move 4 new pixels to 64-bit reg
    bla       Ra,  dX,  inner_loop     // Loop if not at end of line segment
    d.fzchks  oldz,  newz,  newz       // Mark closer points in PM[7..4]
    fld.d     16(ZBP),  oldz           // Fetch 4 old Z values for next loop
// End of inner_loop. Right end not aligned

Example 7. 3-D Rendering (1 of 2)
right_end::                            // Handle boundary conditions
    d.faddz   aZ,  iZ3,  aZ            // Interpolate 2 even Z values
    nop                                //
    d.faddz   aZ,  iZ1,  aZ            // Interpolate 2 odd Z values
    fst.d     newz,  8(ZBP)++          // Update Z buf from prior loop
    d.form    rZmask,  newz            // Mask 4 new Z values
    nop                                //
    d.fzchks  f0,  f0,  f0             // Shift PM[7..4] to PM[3..0]
    nop                                //
    d.faddp   aB,  iB,  aB             // Interpolate 4 blue intensities
    pst.d     newi,  8(FBP)++          // Store pixels indicated by PM[3..0]
    d.faddp   aG,  iG,  aG             // Interpolate 4 green intensities
    nop                                //
    d.faddp   aR,  iR,  aR             // Interpolate 4 red intensities
    nop                                //

aligned_end::                          // No special boundary conditions
    d.form    f0,  newi                // Move 4 new pixels to 64-bit reg
    br        wrap_up                  //
    d.fzchks  oldz,  newz,  newz       // Mark closer points in PM[7..4]
    nop                                //

short_segment::
    d.fnop                             //
    adds      8,  dX,  r0              // Is right end in same set as left?
    d.fnop                             //
    bnc.t     right_end                // Branch taken if no.
    d.fnop                             //
    fld.d     16(ZBP),  oldz           // Fetch 4 old Z values

wrap_up::                              // Store the unstored and leave dual mode.
    fzchks    f0,  f0,  f0             // Shift PM[7..4] to PM[3..0]
    fst.d     newz,  8(ZBP)++          // Update Z buf from prior loop
    fnop
    pst.d     newi,  8(FBP)++          // Store pixels indicated by PM[3..0]

Example 7. 3-D Rendering (2 of 2)
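The hidden-surface step that Example 7 performs with fzchks and pst.d can be pictured with a behavioural model over one group of four pixels. This Python sketch rests on assumptions that the listing does not spell out: Z values are compared as unsigned 16-bit numbers, a new pixel counts as closer when its Z is less than or equal to the stored Z, and the pixel mask is represented as a list of booleans rather than as PM bits. The function names are hypothetical.

```python
def fzchks16(old_z, new_z):
    """Keep the smaller Z per pixel and report which new pixels are closer."""
    closer = [n <= o for o, n in zip(old_z, new_z)]
    result = [min(o, n) for o, n in zip(old_z, new_z)]
    return result, closer

def pixel_store(frame, new_pixels, closer):
    """Modeled pst.d: write only the pixels whose mask entry is set."""
    return [p if c else f for f, p, c in zip(frame, new_pixels, closer)]
```

A new Z of 0xFFFF, which is what the end masks of Example 5 produce, never wins against a Z-buffer entry of 0xFFFE or less, so masked pixels leave both buffers untouched.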
6.0 ALTERNATIVE IMPLEMENTATIONS
Example 8 contrasts the inner loop of the 16-bit pixel rendering procedure with that of an 8-bit procedure. For 8-bit
pixels, two faddp instructions accomplish 64-bits of pixel intensity interpolation; there is no need to maintain three
separate color accumulators. Four faddz instructions (rather than two) are required, because eight Z values are
created for the eight pixels per loop.
// 8-bit Pixels, 16-Bit Zbuffer = 8 Pixels in 15 Clocks
//
//  G-Unit                                  |  Core Unit
inner_loop::
    d.faddz   aZ,deltaZ1,aZ                    fld.q   16(ZBP),oldZ_A
    d.faddz   aZ,deltaZ2,aZ                    nop
    d.form    f0,newZ_A                        nop
    d.faddz   aZ,deltaZ1,aZ                    andh    0x8000,dX,r0
    d.faddz   aZ,deltaZ2,aZ                    bnc     right_end
    d.form    f0,newZ_B                        nop
    d.fzchks  oldZ_A,newZ_A,newZ_A             nop
    d.fzchks  oldZ_B,newZ_B,newZ_B             nop
    d.faddp   intens,dI,intens                 fst.q   newZ_A,16(ZBP)++
    d.faddp   intens,dI2,intens                bte     0,dX,end
    d.form    f0,newi                          bla     neg8,dX,inner_loop
    d.fnop                                     pst.d   newi,8(FBP)++
//-----------------------------------------------------------
// 16-Bit Pixels, 16-Bit Zbuffer = 4 Pixels in 10 Clocks
//
//  G-Unit                                  |  Core Unit
inner_loop::
    d.faddz   aZ,iz3,aZ                        nop
    d.faddz   aZ,iz1,aZ                        fst.d   newz,8(ZBP)++
    d.form    f0,newz                          nop
    d.fzchks  f0,f0,f0                         mov     -5,Rb
    d.faddp   aB,iB,aB                         pst.d   newi,8(FBP)++
    d.faddp   aG,iG,aG                         xor     Rb,dX,r0
    d.faddp   aR,iR,aR                         bc      aligned_end
    d.form    f0,newi                          bla     neg4,dX,inner_loop
    d.fzchks  oldz,newz,newz                   fld.d   16(ZBP),oldz
//-----------------------------------------------------------

Example 8. Inner Loop of Renderers for Two Pixel Sizes
APPLICATION NOTE
March 1990
Fast Fourier Transforms on the i860™ Microprocessor
MARK ATKINS
APPLICATIONS ENGINEER
Order Number: 240658-001
CONTENTS
1.0 INTRODUCTION TO FFTs
2.0 BUTTERFLY DEFINED
3.0 BIT REVERSAL
4.0 FFT IMPLEMENTATION ON THE i860™ CPU
5.0 CODE DESIGN
  5.1 Cache Utilization
  5.2 Pfld
  5.3 Fst.q
  5.4 Bit Reversal Code
6.0 PIPELINE SCHEDULING
7.0 PERFORMANCE MEASUREMENTS
  7.1 Cache Fill and Writeback Time
8.0 CODE HIERARCHY
9.0 CONCLUSION
APPENDIX A: PROGRAM LISTINGS
AP-435
ABSTRACT
The i860 Processor computes floating-point results rapidly, lending itself to DSP (digital signal processing) as
well as general-purpose computing. With this high performance, DSP functions can be added to any system
containing an i860 CPU. A Fast Fourier Transform
(FFT) illustrates this DSP power. Complete code for
the FFT is presented in this application note, as well as
performance measurements. Both complex and real input data FFTs are included, as well as both Decimation
in Time and Decimation in Frequency.
1.0 INTRODUCTION TO FAST FOURIER TRANSFORMS
Discrete Fourier Transforms (DFTs) change time-domain data samples into a frequency-domain profile of the sampled signal. The frequency-domain representation consists of the magnitudes of sine waves at various frequencies, which would recreate the original data if superimposed. To accomplish the transform, a DFT adds combinations of the input data samples, after multiplying some of those inputs with weighting factors. The number of samples, "N", is usually a power of two.

Each result in the frequency domain comes from a weighted sum of all data samples. The weighting ("W") factors are called "twiddles", and are complex cosine/sine values for each particular frequency.

The FFT (Fast Fourier Transform) is an efficient implementation of the DFT, defined by:

    x(n) = time domain samples of the signal, n = 0, 1, ... N-1
    X(k) = the Discrete Fourier Transform of x(n), k = 0, 1, ... N-1,
           a "frequency domain" equivalent of x(n)
         = Σ x(n) * W^(nk), n = 0 to N-1, and
           W^(nk) = e^(-j2πnk/N), where j = sqrt(-1)
         = Σ x(n) * (cos(2πnk/N) - j * sin(2πnk/N))

The (N-1) complex adds and (N-1) complex multiplications required for each X(k) make the DFT an Order (N^2) computation. Fortunately, the FFT decomposes this to an Order (N * log2 N) algorithm by splitting the N-sum into units of 2-sums. These units are called "butterflies" because they produce 2 output values from 2 inputs, with the butterfly-shaped dataflow shown below. (Some FFT algorithms, called Radix-4, use 4-input, 4-output butterflies.) The butterfly calculations are executed in stages, with log2 N stages and N/2 butterflies per stage.

The subdivision, or decimation, of the N-sum into butterflies can be done via two different methods: "Decimation in Time" (DIT) or "Decimation in Frequency" (DIF). The methods differ in the ordering of twiddles and the form of the butterfly arithmetic, but they yield the same answer. They are based on different mathematical derivations of the FFT: DIT results from recursively splitting the input time-domain samples into an even-indexed group and an odd-indexed group, while DIF comes from splitting the DFT output frequency-domain points into odd/even groups.

2.0 BUTTERFLY DEFINED
Let
    A    = the first input to the butterfly (complex number, composed of Real part AR and Imaginary part AI)
    B    = the second input to the butterfly (complex, BR and BI)
    W    = twiddle factor (also complex, WR and WI)
    Anew = complex result #1, which overwrites A
    Bnew = result #2, which overwrites B

For a "Decimation-in-Frequency" butterfly,

    Anew = A + B
    Bnew = (A - B) * W

The complex add, subtract, and multiply of a butterfly decompose into 4 real multiplies, 3 real adds, and 3 real subtracts:

    AnewR = AR + BR        tempR = AR - BR
    AnewI = AI + BI        tempI = AI - BI
    BnewR = (tempR * WR) - (tempI * WI)
    BnewI = (tempR * WI) + (tempI * WR)

For a "Decimation-in-Time" butterfly,

    Anew = A + (B * W)
    Bnew = A - (B * W)

The number of real operations remains 4 multiplies and 6 add/subtracts, but the equations differ and the multiplies must be done first:

    tempR = (WR * BR) - (WI * BI)
    tempI = (WR * BI) + (WI * BR)
    AnewR = AR + tempR     BnewR = AR - tempR
    AnewI = AI + tempI     BnewI = AI - tempI
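The real-arithmetic decompositions of the two butterflies can be checked against ordinary complex arithmetic. A minimal sketch in Python; the function names are hypothetical, and the variable names follow the definitions in section 2.0.

```python
def dif_butterfly(ar, ai, br, bi, wr, wi):
    """Decimation-in-Frequency: Anew = A + B, Bnew = (A - B) * W."""
    anew_r = ar + br
    anew_i = ai + bi
    temp_r = ar - br
    temp_i = ai - bi
    bnew_r = temp_r * wr - temp_i * wi
    bnew_i = temp_r * wi + temp_i * wr
    return anew_r, anew_i, bnew_r, bnew_i

def dit_butterfly(ar, ai, br, bi, wr, wi):
    """Decimation-in-Time: Anew = A + B * W, Bnew = A - B * W."""
    temp_r = wr * br - wi * bi     # multiplies come first in DIT
    temp_i = wr * bi + wi * br
    return ar + temp_r, ai + temp_i, ar - temp_r, ai - temp_i
```

Each function uses 4 real multiplies and 6 real adds or subtracts, matching the operation counts given in the text.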
infel .
AP-435
Butterfly Dataflow:'
(Decimation In Frequency)
(Decimation In TIme)
'240658-1
The stages, twiddles, and butterflies for 8-point FFTs
are shown in Figures 1 and 2. For larger values of N,
the dataflow patterns are very similar, with N/2 butterflies executed at each stage, and a greater number of
stages. Refer to a text on Digital Signal Processing for a
complete discussion of FFT design, such as chapter 6 of
Theory and Application of Digital Signal Processing (see
the Bibliography at the end of this note).
[Figure 240658-2]
Figure 1. Decimation-In-Frequency FFT for 8 points

[Figure 240658-3]
Figure 2. Decimation-In-Time FFT for 8 points
3.0 BIT REVERSAL
Due to their structure, FFT algorithms have the side-effect of scrambling the ordering of output data. For
radix-2 FFTs, the output is in "bit-reversed" order; for example, the value for frequency one is NOT at
location one in the output array, but at location N/2.
Time to unscramble the output is often NOT included
in FFT benchmarking, because scrambled output is fine
for some signal-processing uses such as convolution. In
any event, unscrambling consists of swapping the locations of pairs of output values. Alternatively, input values can be shuffled, as Decimation in Time usually does
before the first stage (as shown in Figure 2). Otherwise,
to avoid the shuffling of input in DIT, the twiddles
must be accessed in bit-reversed order. As an example
of bit-reversal, for 256 points the reordering involves:
SWAP X(i) and X(j), where i = 'klmnopqr'b and j =
'rqponmlk'b. The second index (j) contains the same
bits as (i), but in opposite order.
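
As a minimal sketch of the swap rule just described, the following Python fragment reverses the bits of an index and unscrambles an array in place. The names `bit_reverse` and `unscramble` are hypothetical and chosen for this illustration only; the note's own implementation is the assembly routine bitrev.ss.

```python
def bit_reverse(i, bits):
    """Return i with its lowest `bits` bits written in opposite order."""
    j = 0
    for _ in range(bits):
        j = (j << 1) | (i & 1)
        i >>= 1
    return j


def unscramble(x):
    """Swap X(i) and X(j) for every bit-reversed index pair, in place."""
    n = len(x)                  # assumed to be a power of 2
    bits = n.bit_length() - 1
    for i in range(n):
        j = bit_reverse(i, bits)
        if i < j:               # swap each pair only once
            x[i], x[j] = x[j], x[i]
    return x
```

For 256 points, `bit_reverse(1, 8)` gives 128, which is the N/2 location mentioned above for frequency one.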
4.0 FFT IMPLEMENTATION ON THE
i860 CPU
Several features of the i860 CPU contribute to FFT
performance. The floating-point multiplier and adder
can simultaneously produce 1 product and 1 sum per
cycle, using Dual-Operation FP instructions. To fetch
the butterfly inputs and store outputs, Dual-Instruction-Mode allows a memory fetch or store simultaneous
with the multiply and add. Four floating-point numbers
can be stored by one instruction, using the 16-byte-operand "fst.q" instruction. Likewise, 16 bytes can be
fetched from the data cache in one fld.q op.
The floating-point arithmetic of the i860 CPU conforms to IEEE 754 format, which some DSPs fail to do.
Shown below is code for the crucial inner loop of the
FFT:
//--------------------
// inner_loop: do 2 Decimation-In-Frequency FFT butterflies.
// Twelve clocks for 2 butterflies - 12 FP add/sub, 8 multiplies,
//   6 8-byte loads, 4 8-byte stores.
//
//   FP-op                            Core-op
inner_loop::
    d.r2pt.ss   WR,DI,BnewR           pfld.d  wind(wstart),WRo
    d.pfsub.ss  AR,BR,AnewRo          fld.d   8(fetch)++,ARo
    d.rat1s2.ss AI,BI,AnewIo          fld.d   offset(fetch),BRo
    d.i2st.ss   WI,DR,BnewI           fst.q   AnewR,16(store)++
    d.rat1p2.ss AR,BR,DR              adds    wincr,wind,wind
    d.ia1p2.ss  AI,BI,DI              pfld.d  wind(wstart),WR
//--------------------
    d.r2pt.ss   WRo,DI,BnewRo         adds    wincr,wind,wind
    d.pfsub.ss  ARo,BRo,AnewR         fld.d   8(fetch)++,AR
    d.rat1s2.ss AIo,BIo,AnewI         fld.d   offset(fetch),BR
    d.i2st.ss   WIo,DR,BnewIo         fst.q   BnewR,offset(store)
    d.rat1p2.ss ARo,BRo,DR            bla     decrem,count,inner_loop
    d.ia1p2.ss  AIo,BIo,DI            and     wlimit,wind,wind //modulo.
//--------------------
5.0 CODE DESIGN
Refer to the inner_loop above and code listings at the
end of this application note for the discussions that follow. Refer to the "i860™ 64-bit Microprocessor Programmer's Reference Manual" (Intel order number
240329) for details on instructions and formats.
The programs include both assembly and Fortran components. Input data can number any power of 2 from
16 to 1024 points. The algorithms are radix-2, floating-point, in-place. Included in the listing are both Decimation-in-Time and Decimation-in-Frequency, and both complex-input
and real-input FFTs.
5.1 Cache Utilization
Because the instruction cache contains 4 Kbytes, all required code easily fits in cache. However, a 1024-point
complex FFT fills the 8-Kbyte data cache with the input X() array. Thus the more rarely-used twiddle W()
array is intentionally kept out of cache, as described in
the "pfld" section.
A subroutine ("fetch.ss") is used to move the input data
array efficiently into cache for the 1024-point FFT.
"Fetch" allows all data to be brought into cache using
the next-near (NENE#) accesses to DRAM. Without
that routine, getting A and B from locations separated
by 4 Kbytes (NOT the same DRAM page) makes
fetches and writebacks from DRAM for the first stage
slower, and adds 30% to overall execution time.
For larger FFTs (2048 points = 16 Kbytes), straightforward expansion of the present algorithm would cause
increased cache misses. Thus a larger FFT should be
broken into multiple FFTs of 1024 points so that all 10
stages of each can achieve high cache hits. The algorithm becomes (assuming 2048 points, Decimation-In-Time):
1) Bit-reverse the entire input array.
2) Do a 10-stage FFT on the second set of 1024 points.
Cache hits should be high on those, since they were
most recently accessed by the bit-reversal.
3) Do a 10-stage FFT on the first 1024 points. Prefetch
before the first stage to ensure cache hits.
4) Combine the 2 separate 1024-point results with a final stage of butterflies, where A is offset from B by
8 Kbytes.
5.2 Pfld
Twiddle factors (W) are fetched with pfld (Pipelined
Floating-Point Load), to avoid caching them. Only in
the first stage are all the W() elements used; successive
stages use fewer and fewer elements, which are separated by larger and larger strides. Thus placing W() in
cache would be inefficient. The streaming of W() from
main memory actually yields better performance than
caching W(), for 512 and 1024 points. With the i860
CPU's 8-byte external data bus, a complex W() value
can be transferred in a single bus cycle. Some FFT routines calculate W() on the fly, rather than fetching precalculated values; however, performance decreases due
to the added run-time calculations.
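
The four-step decomposition of a large FFT outlined in Section 5.1 can be illustrated with a small Python sketch. It is not taken from the application note: the function names are hypothetical, twiddles are computed on the fly for brevity rather than read from a table, and a 16-point input stands in for the 2048-point case so the result is easy to check.

```python
import cmath


def bit_reverse_permute(x):
    """Step 1: return a copy of x in bit-reversed order (len(x) a power of 2)."""
    n = len(x)
    bits = n.bit_length() - 1
    out = list(x)
    for i in range(n):
        j, t = 0, i
        for _ in range(bits):
            j = (j << 1) | (t & 1)
            t >>= 1
        out[j] = x[i]
    return out


def dit_stages(a, lo, n):
    """In-place radix-2 DIT stages on a[lo:lo+n]; input already bit-reversed."""
    span = 1
    while span < n:
        for start in range(lo, lo + n, 2 * span):
            for k in range(span):
                w = cmath.exp(-2j * cmath.pi * k / (2 * span))
                t = w * a[start + k + span]
                u = a[start + k]
                a[start + k] = u + t
                a[start + k + span] = u - t
        span *= 2


def split_fft(x):
    """FFT of x done as two half-size FFTs plus one combining stage."""
    n = len(x)
    half = n // 2
    a = bit_reverse_permute(x)      # 1) bit-reverse the entire input
    dit_stages(a, half, half)       # 2) FFT on the second half
    dit_stages(a, 0, half)          # 3) FFT on the first half
    for k in range(half):           # 4) final stage, A and B half an array apart
        t = cmath.exp(-2j * cmath.pi * k / n) * a[k + half]
        u = a[k]
        a[k] = u + t
        a[k + half] = u - t
    return a
```

After the full bit-reversal, the first half of the array holds the even-indexed samples and the second half the odd-indexed samples, so each half is a self-contained smaller FFT that can stay resident in cache.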
5.3 Fst.q
Quad-word (16-byte) stores allow 4 floating-point register values to update the cache in one cycle. Likewise,
fld.q (Quad Floating Point Load) transfers 4 values to
the registers in a cycle. However, in some FFT stages,
double-word fetches (fld.d) are used instead of fld.q;
that allows the "background" fetch of a set of operands
concurrent with arithmetic on the other set. For the
same reason, the inner loop does 2 butterflies, rather
than one.
5.4 Bit Reversal Code
The code for bit-reversal fetches the indices of 2 elements to be swapped from a pre-allocated array of indices, and swaps the data elements. Again, pfld.d keeps
the indices out of cache, for the 1024 point case. That
assembly version of bit-reversal is approximately 7
times faster than the standard Fortran routine. The array of indices was generated by printing out the values
generated during operation of the standard Fortran version; similarly, the twiddle W() values can be pre-allocated and generated using a high-level language program.
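
A high-level program for pre-generating both tables might look like the following Python sketch. The function names and the list-of-tuples layout are assumptions for illustration; the note itself generated the index array from its Fortran routine. The twiddle definition W(k) = (cos(2πk/N), -sin(2πk/N)) for k = 0 to N/2-1 follows the comments in the listings.

```python
import math


def twiddle_table(n):
    """W(k) = (cos(2*pi*k/n), -sin(2*pi*k/n)) for k = 0 .. n/2 - 1."""
    return [(math.cos(2 * math.pi * k / n), -math.sin(2 * math.pi * k / n))
            for k in range(n // 2)]


def swap_pairs(n):
    """Index pairs (i, j), i < j, that a bit-reversal pass must swap."""
    bits = n.bit_length() - 1       # n assumed to be a power of 2
    pairs = []
    for i in range(n):
        j = int(format(i, "0%db" % bits)[::-1], 2)
        if i < j:
            pairs.append((i, j))
    return pairs
```

Indices whose bit pattern is a palindrome map to themselves and need no swap, so the pair table is shorter than N/2 entries.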
6.0 PIPELINE SCHEDULING
The adder pipeline is 3 stages, as is the multiplier; for
the calculation of
BnewR = (AR - BR) • WR - (AI - BI) • WI
the adder result is fed back into the multiplier, and the
product again feeds into the adder. The adder and multiplier pipes each advance one stage for each floating-point instruction issued.
The butterfly decomposes into 6 real add/subtracts and
4 real multiplies. Thus the best possible performance
would be 6 clocks per butterfly, with the multiplies totally overlapping the adds. The overlap is accomplished
with the Dual-Operation instructions:
r2pt    (KR • src2, Treg + Mout, load KR <- src1)
rat1s2  (KR • Aout, src1 - src2, load T <- Mout)
i2st    (KI • src2, Treg - Mout, load KI <- src1)
rat1p2  (KR • Aout, src1 + src2, load T <- Mout)
ia1p2   (KI • Aout, src1 + src2, load KI <- src1)
KR, KI, and T are operand registers feeding the multiplier and adder, separate from the floating-point register file. They permit the 4 inputs for multiply and add,
even though the instruction format holds only 2 registers. "Aout" and "Mout" are adder and multiplier outputs.
The data path arrangements of some of these ops are
illustrated in Figures 3 and 4. Fetching and storing of
butterfly operands is overlapped with the calculations,
using Dual Instruction Mode - the integer core op
(such as a load or branch) and FP op are fetched simultaneously from the instruction cache and executed
simultaneously.
Scheduling of instructions was done with a pipeline diagram, as illustrated in the comments of the code listing
of difstep.ss in the Appendix. (The comments show the
machine state after the instruction is processed.) Begin
by placing the desired results in the rightmost column,
then tracing progress backwards through the adder.
When adder inputs are products (of the multiplier), one
product is kept in the Treg for a cycle while the other
propagates through the multiplier final stage. Those
products can be traced back on the multiplier pipeline,
to determine at what instruction the multiplier inputs
must be provided.
For example, place the BnewR label in the "Write"
stage of the pipe (the output of the Adder). Now
BnewR = WR • DR - WI • DI
Three instructions earlier, the adder inputs for BnewR
must be fed to the adder; those inputs are products, one of
which comes directly from the multiplier output, and
the other from the Treg. The multiplier output and
Treg value must then be traced back through multiplier
stages, requiring the following instructions:
i2st.ss   WIo,DR,BnewIo    as the 10th op of 12, to start (T - Mout)
rat1s2.ss AIo,BIo,AnewI    as the 9th instruction, to update the Treg
ia1p2.ss  AI,BI,DI         as the 6th op, to multiply DI • WI
rat1p2.ss AR,BR,DR         as the 5th op, to multiply DR • WR
rat1s2.ss AI,BI,AnewIo     as the 3rd, to start DI into the adder
pfsub.ss  AR,BR,AnewRo     as the 2nd, to start DR into the adder

[Figure 240658-4: r2pt & r2st]
Figure 3. Datapath for r2pt op

[Figure 240658-5: rat1p2 & rat1s2]
Figure 4. Datapath for rat1p2 op
Some trial-and-error ordering of the desired outputs is
needed to devise a sequence which keeps the adder
pipeline full. An op is chosen for each slot for its ability
to load the KR or KI register, or to initiate an adder
operation simultaneous with the multiplies required to
calculate BnewR and BnewI.
Handy hints to assist dual-operation scheduling include:
1) Feed back the adder result to the multiplier, or vice
versa, whenever possible. For example, the rat1p2
op feeds adder-out to multiplier. Thus both src1 and
src2 fields of the instruction are available to feed the
adder-in, and a simultaneous useful add and multiply are initiated.
2) Freeze one of the pipes, by using a pfadd or pfmul,
when appropriate. In the butterfly, where 6 adds are
done for every 4 multiplies, freezing of the multiplier does not degrade performance. The freeze allows
multiplier results to be held until needed in the adder.
3) The Treg can hold a multiplier result for several
cycles until needed in the adder.
4) Unroll a loop to do 2 iterations per loop. That provides time to fetch inputs for iteration 2 while calculating iteration 1, and store results of iteration 1
(and fetch more inputs) while calculating iteration 2.
7.0 PERFORMANCE MEASUREMENTS
The code was run on an evaluation card with DRAM
memory only, no external cache, 33.33 MHz clock, and
5 wait-states or more for some accesses. Next-near accesses (address falls into the same DRAM page as the
previous access) are zero wait-state, but far accesses
take 5 or more wait-states. The code was run under a
virtual-memory multitasking executive. Shown below
are measured results:
System: 33.3 MHz 80860 with a single bank of
static-column DRAM

Type of FFT                 Time       Time (including bit-reversal)
1024-point-complex, DIF     1.17 ms    1.33 ms
1024-point-real                        0.67 ms
512-point-complex, DIF      0.48 ms    0.56 ms
512-point-real                         0.33 ms
256-point-complex, DIF      0.22 ms    0.26 ms
1024-point-complex, DIT                1.37 ms
512-point-complex, DIT                 0.59 ms
7.1 Cache Fill and Writeback Time
Measured times do not include cache-fill and writeback. That is, the timings measured 200,000 executions
of the FFT using the same input array. (Performance
figures offered by other manufacturers for DSP chips
likewise assume that the data is already in on-chip
RAM. Of course, the i860 CPU will do that fetching
automatically into its data cache.) The additional time
for cache fill and writeback was measured as:
1024-point-complex 0.25 ms (8 Kbytes fetched,
8 Kbytes writeback)
512-point-complex 0.12 ms (4 Kbytes)
To quantify the calculations in MFlops (Millions of
FLoating-point OPerations per Second), consider that
the 1024-point complex FFT is implemented with
about 16,400 multiplies and 28,700 adds/subtracts.
Thus the 1.17 ms translates to a sustained 38.5 MFlops
rate. For 512 points, the required 20,000 Flops means
41.6 MFlops.
The overall FFT is about 10 times faster than the equivalent Fortran. Inner loop performance was measured at
13 cycles for the 24 instructions, which is 6.5 cycles per
butterfly.
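
The operation counts and MFlops figures quoted in Section 7.1 can be reproduced with a short calculation. The Python sketch below is an illustration, not part of the original note; it assumes, based on the offset_1 and offset_2 special cases in difstep.ss, that the last two stages use trivial twiddles and therefore need no multiplies and only 4 add/subtracts per butterfly. The function names are hypothetical.

```python
def fft_flops(n):
    """Return (multiplies, add_subtracts) for an n-point radix-2 complex FFT.

    Assumes the final two stages are coded without complex multiplies.
    """
    stages = n.bit_length() - 1          # log2(n), n a power of 2
    bflies = n // 2                      # butterflies per stage
    full = stages - 2                    # stages with a full complex multiply
    mults = full * bflies * 4
    adds = full * bflies * 6 + 2 * bflies * 4
    return mults, adds


def mflops(n, seconds):
    """Sustained rate in millions of floating-point operations per second."""
    mults, adds = fft_flops(n)
    return (mults + adds) / seconds / 1e6
```

Under that assumption, `fft_flops(1024)` gives 16,384 multiplies and 28,672 add/subtracts, close to the rounded 16,400 and 28,700 in the text, and `mflops(1024, 1.17e-3)` and `mflops(512, 0.48e-3)` come out near 38.5 and 41.6.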
Algorithm: Radix-2 FFT, in-place. Data is IEEE 754
single-precision floating point. Implemented in assembly-language and Fortran code.
8.0 CODE HIERARCHY
Pictured below are the programs developed for the i860 CPU FFT:
[Figure 240658-6: call hierarchy of the FFT programs, including diff.f, fft.f, dirr.f, difstep.ss, bitrev.ss, fetch, and realfix.ss]
The Fortran program ffttest.f is the highest-level program of those listed on the following pages. It calls two
FFT subroutines, diff.f and fft.f, then compares their
outputs. Fft.f is a Fortran decimation-in-time algorithm, while diff.f is the high-speed DIF routine. Diff.f
is callable by C or Fortran applications. It in turn calls
difstep, which is implemented in assembly code
(difstep.ss). Difstep is called once per stage of the FFT.
A Fortran version (difstepf.f) is shown, for comparison.
Other assembly routines are the bit-reversal-data-movement (bitrev.ss) and prefetch ("fetch" inside bitrev.ss).
Difstep.ss contains approximately 225 assembly instructions, and bitrev.ss contains about 24. The Fortran
diff.f compiles to about 80 instructions.
A Decimation-in-Time version of diff.f and difstep.ss
can be found in ditt.f and ditstep.ss. The DIT version
performs 5-10% slower than the Decimation-in-Frequency because the DIT loop takes 7 cycles per butterfly, while DIF takes 6.
A real-input algorithm is dirr.f, which can be called
and tested using program real.f. Dirr.f calls difstep to
do a complex DIF FFT on N real data points, but
treats them as N/2 complex points. Then realfix.ss is
called by dirr.f to fix the DIF output, compensating for
the treatment of the N real points as N/2 complex. The
derivation of the real-fix can be found in reference 3,
Numerical Recipes in C.
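
The idea behind the real-input path can be sketched in Python as follows. This shows the standard "pack N real points as N/2 complex points, then fix up" technique; it is not a transcription of realfix.ss, whose source is not reproduced here, and the names `dft` and `real_fft` are hypothetical. A naive DFT stands in for the half-length complex FFT.

```python
import cmath


def dft(z):
    """Naive reference DFT, used here as the half-length complex transform."""
    n = len(z)
    return [sum(z[m] * cmath.exp(-2j * cmath.pi * m * k / n) for m in range(n))
            for k in range(n)]


def real_fft(x, complex_fft=dft):
    """Return X(0..N/2) for N real samples using one N/2-point complex FFT."""
    n = len(x)
    h = n // 2
    # Treat the N real points as N/2 complex points (even -> real, odd -> imag).
    z = [complex(x[2 * m], x[2 * m + 1]) for m in range(h)]
    zf = complex_fft(z)
    out = []
    for k in range(h + 1):
        zk = zf[k % h]
        zc = zf[(h - k) % h].conjugate()
        even = 0.5 * (zk + zc)           # transform of the even samples
        odd = -0.5j * (zk - zc)          # transform of the odd samples
        out.append(even + cmath.exp(-2j * cmath.pi * k / n) * odd)
    return out
```

Only X(0) through X(N/2) are returned; for real input the remaining values are the complex conjugates of these.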
The mixture of Fortran, C, and assembly code is accomplished by passing function inputs and outputs in
registers. Only pointers and integer values were used in
the above code, but floating point parameters can also
be exchanged. A calling program feeds arguments to a
function in r16, r17, and higher-numbered integer registers. The callee is permitted to destroy the contents of
those registers, but r1:r15 must be preserved. For more
details on parameter-passing conventions see the i860
64-bit Microprocessor Programmer's Reference Manual,
Chapter 8.
9.0 CONCLUSION
The i860 CPU computes very Fast Fourier Transforms,
quicker than most high-end dedicated DSP chips. Contributing to the FFT performance are the 8-Kbyte on-chip data cache and 4-Kbyte instruction cache. Also the
8-byte external data bus, pfld instruction, and 16-byte
data cache width provide sufficient bandwidth to keep
the arithmetic units busy. Dual-Operation instructions
and Dual-Instruction-Mode allow parallel data movement and calculations. The 33.3 MHz clock rate allows
both an add and a multiply every 30 ns, giving a time of
1.17 ms for a 1024-point complex FFT. A 40 MHz i860
Microprocessor will yield a time of less than 1 ms.
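
The clock-rate figures in the conclusion follow from simple scaling, sketched below in Python. The sketch assumes the cycle count of the FFT is unchanged at the higher clock rate (that is, memory timing scales with the CPU clock), which the note does not state explicitly; the function names are hypothetical.

```python
def cycle_ns(mhz):
    """Clock period in nanoseconds for a clock rate given in MHz."""
    return 1000.0 / mhz


def scaled_time_ms(time_ms, old_mhz, new_mhz):
    """Execution time at a new clock rate, assuming an unchanged cycle count."""
    return time_ms * old_mhz / new_mhz
```

With these assumptions, `cycle_ns(33.33)` is about 30 ns and `scaled_time_ms(1.17, 33.33, 40.0)` is about 0.975 ms, consistent with the "less than 1 ms" estimate.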
ACKNOWLEDGEMENTS
The author wishes to thank Tricord Systems, Inc. for
providing the key inner loop kernel design of the FFT.
BIBLIOGRAPHY
1. Gold, Bernard and Rabiner, Lawrence, Theory and
Application of Digital Signal Processing, 1975, Prentice-Hall Inc., Englewood Cliffs, NJ. Pages 356-381, 573ff.
[This text explains DFT and FFT basics well, with
ample pictures]
2. Horden, Ira, "An FFT Algorithm For MCS®-96
Products Including Supporting Routines and Examples", Intel Application Note AP-275, order number
270189. (That Application Note can also be found
in the Intel Embedded Controller Handbook, Volume II, order number 210918)
[The note, dated 9/87, reviews FFT theory, real vs.
complex, A/D issues, and waveforms]
3. Press, William, Flannery, Brian, et al., Numerical
Recipes in C, 1988, Cambridge University Press.
Pages 398-424.
[Numerical Recipes contains the C-code source for
"realfix"]
APPENDIX A
PROGRAM LISTINGS
1) diff.f: Fortran module to do fast Decimation-In-Frequency (DIF) Radix-2 FFT.
2) difstep.ss: Assembly code which does all DIF FFT butterflies; called by diff.f.
3) difstepf.f: Fortran equivalent of difstep.ss. Included here for clarity.
4) bitrev.ss: Assembly code to do bit-reversal.
5) ffttest.f: Highest-level Fortran code. Tests diff.f or ditt.f.
6) ditt.f: Fortran module to do fast Decimation-In-Time (DIT) Radix-2 FFT.
7) ditstep.ss: Assembly code which does all DIT FFT butterflies; called by ditt.f.
8) dirr.f: Fortran module for Real-Input Decimation-In-Frequency (DIF) Radix-2 FFT.
9) realfix.ss: Assembly code required by dirr.f to compensate for Real-Input.
10) real.f: Highest-level Fortran code, for Real-value input. Tests dirr.f.
11) fft.f: Fortran FFT algorithm. Generates "correct" answers for comparison against the other code.
12) makefile: Unix V/386 version of a makefile to maintain the FFT code, using the Unix "make" program-maintenance utility. Note that this makefile uses the Unix macro preprocessor "m4" to convert symbolic names
to register numbers.
13) start.ss: Assembly code preamble for Fortran runtime.
14) time.c: Dummy routine, used to install breakpoints.
C-------------------
C File: diff.f
C FFT - Decimation in Freq, radix-2, inplace, 1-dimen
C Intel assumes no responsibility for use or misuse of this code.
C 5/19/89: call fetch8() added for 1024-point caching.
C 6/01/89: fetch() CRUCIAL-30% performance loss if removed
C Inputs:
C   A= complex array of input, up to 1024 pts, single-prec float
C   M= log of number of pts
C    = (number of stages of FFT)
C   N = number of points, ie. N= 2**M = number of pts
C   W= complex array of twiddle factors, length N/2.
C   REV= 0 if bitreversed output ok, 1=must re-order output
C
C Outputs:
C   A= complex fft of input A
C
      subroutine diff(a,m,N,W,REV)
      integer m,N,i,j,k,REV,wlimit
      integer offset,stage,groups,wincr,powers2(0:10)
      complex a(n),w(N/2),temp
      data powers2 /1,2,4,8,16,32,64,128,256,512,1024/
C Powers2 to avoid calls to POW, DIV
C Twiddle factor array w(k) has (cos,-sin) of 2pi*k/N
CC Assume the caller provides w(k) constants ALREADY initialized
C------------
C Pre-touch data, lock into cache, for 8kByte fft:
      IF (N .gt. 513) THEN
        call fetch(a,%VAL(n))
      ENDIF
C------------
      wlimit = 8*((N/2) - 1)
C "DO 20" stage-loop
      DO 20 stage = 1,m
        groups = powers2(stage-1)
C groups=number of times the twiddle factors are used, ie. the number of
C smaller DFTs the stage is split into.
C offset gets N/2,N/4,N/8,N/16 ....
        offset = powers2(m-stage)
        wincr = groups
        call difstep(a,w,groups,offset,wincr,wlimit)
   20 CONTINUE
      IF (REV .ne. 0) THEN
CC REV .ne. 0 means must do bit-reversal reordering of output
        call bitrev(a,%VAL(M),n)
      ENDIF
      RETURN
      END
C-----------
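
The stage loop of diff.f and the role of the groups, offset, wincr, and wlimit arguments can be followed more easily in a compact high-level model. The Python sketch below is a hypothetical illustration, not a translation of the assembly: it works in element indices rather than byte offsets, so the modulo mask is (N/2 - 1) instead of 8*((N/2) - 1), and it builds the twiddle table itself instead of receiving it from the caller.

```python
import cmath


def difstep(a, w, groups, offset, wincr, wmask):
    """One DIF stage: groups * offset butterflies, done in place."""
    wind = 0                                   # current index into W table
    for g in range(groups):
        base = g * 2 * offset
        for k in range(offset):
            i, j = base + k, base + k + offset
            e = a[i] + a[j]                    # E = A + B
            f = (a[i] - a[j]) * w[wind]        # F = W * (A - B)
            a[i], a[j] = e, f
            wind = (wind + wincr) & wmask      # modulo addressing of W()


def diff(a, m):
    """In-place radix-2 DIF FFT of 2**m points; output is bit-reversed."""
    n = 1 << m
    w = [cmath.exp(-2j * cmath.pi * k / n) for k in range(n // 2)]
    wmask = n // 2 - 1
    for stage in range(1, m + 1):
        groups = 1 << (stage - 1)              # 1, 2, 4, ...
        offset = 1 << (m - stage)              # N/2, N/4, N/8, ...
        wincr = groups
        difstep(a, w, groups, offset, wincr, wmask)
    return a
```

Because offset * wincr always equals N/2, the masked index wraps back to W(0) exactly at the end of each group, which is why a single AND suffices for the wrap.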
//--------------
// difstep.ss: do one stage of fft butterflies
// DIF = Decimation in Frequency, radix-2, inplace, 1-dimension
// (C) Copyright 1989 INTEL Corporation.
// Inner loop developed with assistance from Tricord Systems, Inc.
//
// 5/18/89: 1 pm - offset_2 added, as next-to-last stage was slow
// 5/19/89: 4 pm - fetch8() routine added, for cache miss avoidance.
// 5/31/89: am - use fst.q (13% perf improvement of inner_loop!)
//          last_bfly added, for performance.
// 6/02/89: am - bptr deleted. Modulo-address W (5% perf improved)
//--------------
// Intel is not responsible for use nor for misuse of this program.
//--------------
// Do one entire stage (n/2 butterflies). Sample invocation:
//   call difstep(a,w,groups,offset,wincr,wlimit)
//====================================================
// Inputs:
//   A= complex array of input, single-prec float
//      (complex stored as 4byte real, 4byte imag contiguously).
//   W= pointer to array of twiddle factors. Assuming W(k) is
//      CMPLX(cos(2pi*k/N),-sin(2pi*k/N)) for k=0 to (N/2)-1.
//   offset = distance (except for scale-by-8byte sizeof(complex)) between
//      the 2 input values for each butterfly.
//      Offset also is the number of butterflies done per "group".
//   groups = N/(2*offset). The number of sub-DFTs this stage is split into.
//   wincr = distance (except for scale-by-8byte sizeof(complex)) between
//      successive w values for successive butterflies
//   wlimit = max index, in bytes, of W table.
//
// Outputs:
//   A= complex radix-2 butterflied version of input.
//-------------------
define(astart,r16)    //input data base address
define(wstart,r17)    //twiddle array ptr. Because w-contents depend on N,
                      // we will assume the caller has initialized w() array.
define(groups,r18)    //groups=number of sub-DFTs this stage is split into.
define(offset,r19)    //offset (initially elements, mult by 8 to get bytes)
                      // between node and its dual (the 2 numbers to butterfly, ie. A and B)
define(wincr,r20)     // increment between successive W values. Remains constant
                      // within a given stage. For Decimation in Freq, wincr addressing is:
                      // +8 for offset=N/2 (W0,W1,W2,W3....W(n-1))
                      // +16    offset=N/4 (W0,   W2,   W4....) etc...
define(wlimit,r21)    //max index, in bytes, of W table.
define(wind,r22)      // current index, in bytes, of W table.
define(offset2,r23)   //offset*2
define(decrem,r24)    //bla decrement
define(somecount,r25) // bla counter
define(FEtch,r26)     //pointer to 1st component of butterfly (load)
define(STore,r27)     // "  " 1st component of butterfly (store)
// f4:f7 spare
define(AR,f12)   //element A, real component
define(AI,f13)   // "  ", imag
define(ARo,f14)  // extra A value, for prefetch (o="odd")
define(AIo,f15)
define(BR,f16)   //element B, real component
define(BI,f17)
define(BRo,f18)  // extra B value, for prefetch
define(BIo,f19)
define(ER,f20)   //A+B, real (ER = AR + BR)
define(EI,f21)   // "    imag "
define(ERo,f22)  //A+B, real, previous loop's value
define(EIo,f23)  // "    imag "
define(FR,f24)   //W*(A-B), real
define(FI,f25)   // "    imag "
define(FRo,f26)
define(FIo,f27)
define(DR,f28)   //Difference of A-B, real part
define(DI,f29)   // "  ", imag "
define(WR,f30)   //W (twiddle factor), real part
define(WI,f31)   // "  " , imag
define(WRo,f10)  //W (twiddle factor), real part (EXTRA copy)
define(WIo,f11)  // "  " , imag
    .text
    .align .quad
_difstep_::
    ld.l    0(groups),groups      //fix Fortran call-by-ref
    ld.l    0(offset),offset      //
    shl     3,offset,offset       // change from elements to bytes
    shl     1,offset,offset2
    fst.q   f8,-16(sp)++          //save "local" regs
    fst.q   f12,-16(sp)++         // "  "
    adds    -1,groups,groups      // pre-decrement for bnc usage, or bla usage
    adds    -16,r0,decrem         //bla decrement
// We code the last 2 stages as special cases:
//-------
    xor     8,offset,r0   //offset=1, special case, no complex mult, funny addressing
    bc      offset_1      // (ASSUMING offset=1 means wincr=0, and no twiddle used)
    xor     16,offset,r0  //offset=2, special case, no complex mult, funny addressing
    bc      offset_2      // (ASSUMING offset=2 means wincr=N/4)
//-------
    ld.l    0(wincr),wincr
    ld.l    0(wlimit),wlimit
    pfadd.ss fO,fO,fO             // init A1,A2,A3=0
    pfadd.ss fO,fO,fO
    pfadd.ss fO,fO,fO
    pfmul.ss fO,fO,fO
    pfmul.ss fO,fO,fO
    pfmul.ss fO,fO,fO
//--------
// init pointers:
    shl     3,wincr,wincr         //scale for bytes.
    shl     1,wincr,wind          //init wind =2*wincr
    pfld.d  0(wstart),f0
    pfld.d  wincr(wstart),f0
    adds    -8,astart,FEtch
    pfld.d  wind(wstart),f0
    adds    wincr,wind,wind       //wind now 3*wincr
// here fetch first set of A,B,W before bla-loop
    pfld.d  wind(wstart),WR
    adds    wincr,wind,wind
    and     wlimit,wind,wind      //modulo-wlimit the w index
// We do modulo-addressing on W(), to keep the pfld pipeline full. We
// never do a W-fetch beyond the end of the table.
// And the modulo-check needs to be done only every 4th pfld, as always
// we use a multiple of 4 W() factors.
    fld.d   8(FEtch)++,AR
    fld.d   offset(FEtch),BR
    d.r2ap1.ss f0,f0,f0           //clear Treg.
    adds    -32,offset,somecount  // bla counter (predecrement by 4 elements)
// ----------
// Definitions for pipe diagram:
// (the complex multiply product, F, broken into 4 real mult and 2 adds):
//   WR = cos(), WI=-sin().
//   DR = AR - BR; (difference of Real components of A,B)
//   DI = AI - BI; (difference of Imag components)
//   ER = AR + BR; EI = AI + BI;
//   FR = K - L; where K= WR*DR, L=WI*DI
//   FI = N + M; where M= WI*DR, N=WR*DI
// For 1st time thru inner_loop, don't have correct values to store.
// Must do 1 loop before the loop, sans the stores.
first_bfly::   //fill pipe
//                              KR  KI  M1  M2  M3  T   A1  A2  A3  Write
  d.r2pt.ss   WR,fO,fO       // WR0 --  --  --  --  --  --  --  --  --
      pfld.d  wind(wstart),WRo
  d.pfsub.ss  AR,BR,fO       // --  --  --  --  --  --  DR0 --  --  --
      fld.d   8(FEtch)++,ARo
  d.rat1s2.ss AI,BI,fO       // --  --  --  --  --  --  DI0 DR0 --  --
      fld.d   offset(FEtch),BRo
  d.i2st.ss   WI,fO,fO       // --  WI0 --  --  --  --  --  DI0 DR0 --
      adds    wincr,wind,wind
  d.rat1p2.ss AR,BR,DR       // --  --  K0  --  --  --  ER0 --  DI0 DR0
      nop
  d.ia1p2.ss  AI,BI,DI       // --  --  L0  K0  --  --  EI0 ER0 --  DI0
      pfld.d  wind(wstart),WR
  d.r2pt.ss   WRo,DI,fO      // WR1 --  N0  L0  K0  --  --  EI0 ER0 --
      fld.d   8(FEtch)++,AR
  d.pfsub.ss  ARo,BRo,ER     // --  --  N0  L0  K0  --  DR1 --  EI0 ER0
      fld.d   offset(FEtch),BR
  d.rat1s2.ss AIo,BIo,EI     // --  --  --  N0  L0  K0  DI1 DR1 --  EI0
      adds    wincr,wind,wind
  d.i2st.ss   WIo,DR,fO      // --  WI1 M0  --  N0  K0  K-L DI1 DR1 --
      and     wlimit,wind,wind
quickstart::
  d.rat1p2.ss ARo,BRo,DR     // --  --  K1  M0  --  N0  ER1 FR0 DI1 DR1
      bla     decrem,somecount,inner_loop //init LCC
  d.ia1p2.ss  AIo,BIo,DI     // --  --  L1  K1  M0  N0  EI1 ER1 FR0 DI1
      adds    -16,astart,STore // ptrs init 16 low, for fst.q instructions
//-------------------
// Each butterfly = 1 complx multiply, 1 complx add, 1 complx subtract
//  = 4 multiply,
//    3 add
//    3 subtract
//    3 8-byte fetches (A, B, W)
//    2 8-byte stores (A, B)
//
// 6 cycles per butterfly
//
// inner_loop: iterates "offset/2" times (eg, N/4 for stage 1, N/8 for stage2),
// for each group. It does 2 butterflies per iteration
inner_loop::
//                              KR  KI  M1  M2  M3  T   A1  A2  A3  Write
  d.r2pt.ss   WR,DI,FR       // WR2 --  N1  L1  K1  N0  N+M EI1 ER1 FR0
      pfld.d  wind(wstart),WRo
  d.pfsub.ss  AR,BR,ERo      // --  --  N1  L1  K1  N0  DR2 FI0 EI1 ER1
      fld.d   8(FEtch)++,ARo
  d.rat1s2.ss AI,BI,EIo      // --  --  --  N1  L1  K1  DI2 DR2 FI0 EI1
      fld.d   offset(FEtch),BRo
  d.i2st.ss   WI,DR,FI       // --  WI2 M1  --  N1  K1  K-L DI2 DR2 FI0
      fst.q   ER,16(STore)++ //update ER/EI/ERo/EIo
  d.rat1p2.ss AR,BR,DR       // --  --  K2  M1  --  N1  ER2 FR1 DI2 DR2
      adds    wincr,wind,wind
  d.ia1p2.ss  AI,BI,DI       // --  --  L2  K2  M1  N1  EI2 ER2 FR1 DI2
//no need for modulo-check ("and") here, as odd num of W's have been fetched.
      pfld.d  wind(wstart),WR
// ..................................................................
//                              KR  KI  M1  M2  M3  T   A1  A2  A3  Write
  d.r2pt.ss   WRo,DI,FRo     // WR3 --  N2  L2  K2  N1  N+M EI2 ER2 FR1
      adds    wincr,wind,wind
  d.pfsub.ss  ARo,BRo,ER     // --  --  N2  L2  K2  N1  DR3 FI1 EI2 ER2
      fld.d   8(FEtch)++,AR
  d.rat1s2.ss AIo,BIo,EI     // --  --  --  N2  L2  K2  DI3 DR3 FI1 EI2
      fld.d   offset(FEtch),BR
  d.i2st.ss   WIo,DR,FIo     // --  WI3 M2  --  N2  K2  K-L DI3 DR3 FI1
      fst.q   FR,offset(STore) //update FR/FI/FRo/FIo
  d.rat1p2.ss ARo,BRo,DR     // --  --  K3  M2  --  N2  ER3 FR2 DI3 DR3
      bla     decrem,somecount,inner_loop
  d.ia1p2.ss  AIo,BIo,DI     // --  --  L3  K3  M2  N2  EI3 ER3 FR2 DI3
      and     wlimit,wind,wind //modulo.
end_inner_loop::
//KEEP Pipelines full
// RE-init pointers for fetches
  d.fiadd.ss  fO,fO,fO
      adds    offset2,astart,astart //bump to next group
//redo A,B fetches, with proper ptr.
  d.fiadd.ss  fO,fO,fO
      fld.d   0(astart),AR   //get first AR/AI in next group
  d.fiadd.ss  fO,fO,fO
      fld.d   offset(astart),BR
  d.fiadd.ss  fO,fO,fO
      adds    0,astart,FEtch
last_bfly::    //do final 2 butterflies, start next group
//                              KR  KI  M1  M2  M3  T   A1  A2  A3  Write
  d.r2pt.ss   WR,DI,FR       // WR4 --  N3  L3  K3  N2  N+M EI3 ER3 FR2
      pfld.d  wind(wstart),WRo
  d.pfsub.ss  AR,BR,ERo      // --  --  N3  L3  K3  N2  DR4 FI2 EI3 ER3
      fld.d   8(FEtch)++,ARo
  d.rat1s2.ss AI,BI,EIo      // --  --  --  N3  L3  K3  DI4 DR4 FI2 EI3
      fld.d   offset(FEtch),BRo
  d.i2st.ss   WI,DR,FI       // --  WI4 M3  --  N3  K3  K-L DI4 DR4 FI2
      fst.q   ER,16(STore)++
  d.rat1p2.ss AR,BR,DR       // --  --  K4  M3  --  N3  ER4 FR3 DI4 DR4
      adds    wincr,wind,wind
  d.ia1p2.ss  AI,BI,DI       // --  --  L4  K4  M3  N3  EI4 ER4 FR3 DI4
      pfld.d  wind(wstart),WR
// ..................................................................
//                              KR  KI  M1  M2  M3  T   A1  A2  A3  Write
  d.r2pt.ss   WRo,DI,FRo     // WR5 --  N4  L4  K4  N3  N+M EI4 ER4 FR3
      fld.d   8(FEtch)++,AR
  d.pfsub.ss  ARo,BRo,ER     // --  --  N4  L4  K4  N3  DR5 FI3 EI4 ER4
      adds    -32,offset,somecount // reset bla counter
  d.rat1s2.ss AIo,BIo,EI     // --  --  --  N4  L4  K4  DI5 DR5 FI3 EI4
      adds    wincr,wind,wind
  d.i2st.ss   WIo,DR,FIo     // --  WI5 M4  --  N4  K4  K-L DI5 DR5 FI3
      adds    -1,groups,groups
  d.fnop
      fld.d   offset(FEtch),BR
  d.fnop
      bnc.t   quickstart     //branch on value of groups
  d.fnop
      fst.q   FR,offset(STore)
end_last_bfly::
  d.fnop
      br      endit
  fiadd.ss    fO,fO,fO
      fst.q   FR,offset(STore) //repeated for bnc.t untaken case
    .align .quad
//======
offset_1::
// want FEtch=0,2,4,6,8.... elements. ASSUMING wincr=0,
// and that w=(1,0), so that no complex mult needed, and NO W will be fetched.
// E=A+B, F=A-B. (Per double-butterfly loop: 8 pfadd, 4 dword fld, 4 fst,
// 1 bla) (fld.q required, to reduce # flds to avoid pipe stalls)
// Performance = 4 cyc/bfly best case.
//=======================================================
//Redefine regs for fld.q,fst.q usage, when A and B adjacent:
define(AR3,f12)  //element A, real component
define(AI3,f13)  // "  ", imag
define(BR3,f14)  //element B, real component
define(BI3,f15)
define(AR4,f16)  // extra A value, for prefetch
define(AI4,f17)
define(BR4,f18)  // extra A value, for prefetch
define(BI4,f19)
define(ER3,f20)  //A+B, real (ER = AR + BR)
define(EI3,f21)  // "    imag "
define(FR3,f22)  //(A-B), real
define(FI3,f23)  // "    imag
define(ER4,f24)  //A+B, real, extra copy
define(EI4,f25)  // "    imag  "  "
define(FR4,f26)
define(FI4,f27)
//=======================================================
    adds    -16,astart,FEtch
    fld.q   16(FEtch)++,AR4
    adds    -1,groups,somecount // bla counter (predecremented already by 1)
//using groups=blacount on the offset_1 loop, intentionally.
    adds    -16,FEtch,STore
//startup the loop:
//                              A1       A2   A3   Write:
  d.pfadd.ss  AR4,BR4,fO     // ARn+BRn
      fld.q   16(FEtch)++,AR3
  d.pfadd.ss  AI4,BI4,fO     // AIn+BIn  ERn
      adds    -2,rO,decrem   //2 bflies per loop
  d.pfsub.ss  AR4,BR4,fO     // ARn-BRn  EIn  ERn
      bla     decrem,somecount,offset1_loop //init LCC
  d.pfsub.ss  AI4,BI4,ER4    // AIn-BIn  FRn  EIn  ERnext
      nop
offset1_loop::
// --------------------
//                              A1       A2   A3   Write:
  d.pfadd.ss  AR3,BR3,EI4    // AR+BR    FI-  FR-  EI-
      nop
  d.pfadd.ss  AI3,BI3,FR4    // AI+BI    ER   FI-  FR-
      fld.q   16(FEtch)++,AR4
  d.pfsub.ss  AR3,BR3,FI4    // AR-BR    EI   ER   FI-
      fst.q   ER4,16(STore)++
  d.pfsub.ss  AI3,BI3,ER3    // AI-BI    FR   EI   ER
      nop
  d.pfadd.ss  AR4,BR4,EI3    // AR2+BR2  FI   FR   EI
      fld.q   16(FEtch)++,AR3
  d.pfadd.ss  AI4,BI4,FR3    // AI2+BI2  ER2  FI   FR
      nop
  d.pfsub.ss  AR4,BR4,FI3    // AR2-BR2  EI2  ER2  FI
      bla     decrem,somecount,offset1_loop
  d.pfsub.ss  AI4,BI4,ER4    // AI2-BI2  FR2  EI2  ERnext
      fst.q   ER3,16(STore)++
//--------------------------
end_offset1_loop::
  d.fiadd.ss  fO,fO,fO
      br      endit
  fiadd.ss    fO,fO,fO
      nop
//--------------------------
.align .quad
offset_2::
// want FEtch=0,1;4,5;8,9;12,13; ... elements.
// ASSUMING wincr=N/4 (W_addr=0,N/4,0,N/4,0 ....). Trivial W() factors.
// USE bla loop, incrementing FEtch by 16 (2*offset).
// Even-indexed elements identical to offset_1, W=W0, no complex mult.
//   So FReven=(AR-BR), FIeven=(AI-BI).
// Odd components have W=(0,-1). So FRodd=(AI-BI), FIodd=(BR-AR).
// Each fld.q fetches AReven,AIeven,ARodd,AIodd.
//Assume ER,EI,ERo,EIo are 4 contiguous regs.
//Assume FR,FI,FRo,FIo are 4 contiguous regs.
    adds    -16,astart,FEtch
    fld.q   16(FEtch)++,AR
    fld.q   16(FEtch)++,BR
    adds    0,groups,somecount  //bla counter
//startup the loop:
// ---------------------        // A1 ...... A2 ...... A3 ...... Write:
    pfadd.ss AR,BR,f0           // AR+BRe
    pfadd.ss AI,BI,f0           // AI+BIe    ER
    d.pfadd.ss ARo,BRo,f0       // ARo+BRo   EI      ER
    nop
    d.pfadd.ss AIo,BIo,ER       // AIo+BIo   ERo     EI      ER
    nop
    d.pfsub.ss AR,BR,EI         // AR-BRe    EIo     ERo     EI
    adds    -1,r0,decrem        //2 bflies per loop, but groups is half desired value.
    d.pfsub.ss AI,BI,ERo        // AI-BIe    FR      EIo     ERo
    adds    -16,astart,STore
    d.pfsub.ss AIo,BIo,EIo      // AIo-BIo   FI      FR      EIo
    bla     decrem,somecount,offset2_loop //init LCC
    d.pfsub.ss BRo,ARo,FR       // BRo-ARo   FRo     FI      FR
    nop
offset2_loop::
    d.fnop
    fld.q   16(FEtch)++,AR      //fetch AR,AI,ARo,AIo
    d.fnop
    fld.q   16(FEtch)++,BR      //fetch BR,BI,BRo,BIo
// ---------------------        // A1 ...... A2 ...... A3 ...... Write:
    d.pfadd.ss AR,BR,FI         // AR+BRe    FIo     FRo     FI
    nop
    d.pfadd.ss AI,BI,FRo        // AI+BIe    ER      FIo     FRo
    nop
    d.pfadd.ss ARo,BRo,FIo      // ARo+BRo   EI      ER      FIo
    fst.q   ER,16(STore)++      //update ER,EI,ERo,EIo
    d.pfadd.ss AIo,BIo,ER       // AIo+BIo   ERo     EI      ER
    nop
    d.pfsub.ss AR,BR,EI         // AR-BRe    EIo     ERo     EI
    nop
    d.pfsub.ss AI,BI,ERo        // AI-BIe    FR      EIo     ERo
    fst.q   FR,16(STore)++
    d.pfsub.ss AIo,BIo,EIo      // AIo-BIo   FI      FR      EIo
    bla     decrem,somecount,offset2_loop
    d.pfsub.ss BRo,ARo,FR       // BRo-ARo   FRo     FI      FR
    nop
endit::
// restore regs
    fiadd.ss f0,f0,f0           //exit DIM
    fld.q   0(sp),f12
    fiadd.ss f0,f0,f0           //last DIM pair
    fld.q   16(sp),f8
    adds    32,sp,sp
    bri     r1
    nop
//----------------------------------
c-----------------------------------------------------------
c difstepf.f: do one stage of fft (DIF) butterflies
c (C) Copyright 1989 INTEL Corporation. ALL RIGHTS RESERVED.
c-----------------------------------------------------------
c Decimation in Freq, radix-2, inplace; 1-dimen
c 6/20/89
c Do one entire stage (n/2 butterflies). Sample invocation:
c     call difstep(a,w,groups,offset,wincr)
c Inputs:
c   A= complex array of input, single-prec float
c      (complex stored as 4byte real, 4byte imag contiguously)
c   W= pointer to array of twiddle factors. Assuming W(k) is
c      CMPLX(cos(2pi*k/N),-sin(2pi*k/N)) for k=0 to (N/2)-1.
c   offset = distance (in "elements") between
c      the 2 input values for each butterfly
c   groups = number of sub-DFTs this stage is split into.
c      (groups*offset*2 = N)
c   wincr = distance between successive w values for successive butterflies
c
c Outputs:
c   A= complex butterflied version of input.
      SUBROUTINE difstep(a,w,groups,offset,wincr)
      integer groups,offset,wincr
      integer i,j,index1,iplus
      complex a(groups*offset*2),w(groups*offset),wtemp,temp
c-------------------------------------------------------
c We implement a ...
c Special case for offset=1 (last stage): no complex multiplies, simple add
c (Performance enhancement)
      IF (offset .eq. 1) THEN
CVD$ NODEPCHK
        DO 8 i = 1,(2*groups),2
          iplus = i + 1
          temp = a(iplus)
          a(iplus) = a(i) - temp
    8     a(i) = a(i) + temp
      ELSE
C------------
C Special case for offset=2 (next-to-last stage): no complex multiplies,
cc simple add. (Performance enhancement)
cc For half the butterflies, W=(1,0). For the other half, W=(0,-1)
        IF (offset .eq. 2) THEN
CVD$ NODEPCHK
          DO 90 i = 1,(4*groups),4
            iplus = i + 2
            temp = a(iplus)
            a(iplus) = a(i) - temp
   90       a(i) = a(i) + temp
C 2nd call to i-loop: w=cmplx(0,-1.)
CVD$ NODEPCHK
CVD$ NOVECTOR
          DO 92 i = 2,(4*groups),4
            iplus = i + 2
            temp = a(i) - a(iplus)
            a(i) = a(i) + a(iplus)
   92       a(iplus) = CMPLX(AIMAG(temp),-REAL(temp))
        ELSE
C------------
c "DO 20" index1-loop is "outer loop"
CVD$ VECTOR
CVD$ NODEPCHK
          DO 20 index1 = 1,(2*offset*groups),(2*offset)
            j = 1
CVD$ NODEPCHK
CVD$ ALTCODE
            DO 10 i = index1,(index1+offset-1)
              iplus = i + offset
              temp = a(i) - a(iplus)
              a(i) = a(i) + a(iplus)
              a(iplus) = w(j) * temp
   10         j = j + wincr
   20     CONTINUE
        ENDIF
      ENDIF
      RETURN
      END
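
The general-case loops of difstep can be cross-checked outside Fortran. The following is a minimal Python sketch, not part of the original listing: it ports the "DO 20"/"DO 10" loops with 0-based indexing and wraps them in a hypothetical driver (`dif_fft`) that assumes a full-length twiddle table, so that wincr equals groups at every stage. The helper names `naive_dft` and `bit_reverse` are illustrative only.

```python
import cmath


def difstep(a, w, groups, offset, wincr):
    """One in-place radix-2 DIF stage (0-based port of the general-case loops)."""
    for index1 in range(0, 2 * offset * groups, 2 * offset):
        j = 0
        for i in range(index1, index1 + offset):
            iplus = i + offset
            temp = a[i] - a[iplus]
            a[i] = a[i] + a[iplus]
            a[iplus] = w[j] * temp
            j += wincr


def dif_fft(x):
    """Hypothetical driver: run all log2(N) stages; output is bit-reversed."""
    n = len(x)
    a = list(x)
    # Assumed table: w[k] = (cos, -sin) of 2*pi*k/N for k = 0 .. N/2-1.
    w = [cmath.exp(-2j * cmath.pi * k / n) for k in range(n // 2)]
    groups, offset = 1, n // 2
    while offset >= 1:
        difstep(a, w, groups, offset, groups)
        groups, offset = groups * 2, offset // 2
    return a


def naive_dft(x):
    """Direct O(N^2) DFT, used only as a reference."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def bit_reverse(i, bits):
    """Reverse the low `bits` bits of i."""
    return int(format(i, "0{}b".format(bits))[::-1], 2)
```

With this table, the offset=1 stage only ever reads w[0] = (1,0) and the offset=2 stage reads w[0] and w[N/4] = (0,-1), which is why the listing can treat both as multiply-free special cases.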
cccccccccccccccccccccccccccccccccc
      subroutine fetch(a,n)
      integer n
      complex a(n),temp
cc Kludge do-nothing prefetch.
      temp = a(1)
      RETURN
      END
cccccccccccccccccccccccccccccccccc
      subroutine bitrev(a,dummy,n)
C Bit-Reverse
C Inputs:
C A= complex array of input, single-prec float
C dummy = %val(m). Probably unusable from Fortran.
C N = number of input points (and output points)
C Output:
C A = original A data, but in bit-reversed order from A
      integer n,i,j,k,ndiv2
      complex a(n),temp
C------------
C "DO 7" loop to in-place-bit-reverse-shuffle output
      j=1
      ndiv2 = n / 2
      DO 7 i= 1, n-1
        IF (i .lt. j) THEN
          temp = a(j)
          a(j) = a(i)
          a(i) = temp
        ENDIF
        k = ndiv2
C "While (j .gt. k)" /* decrease j by 2**something */
    6   IF (j .gt. k) THEN
          j = j-k
          k = k / 2
          GOTO 6
        ENDIF
C Add next lower power of 2 to j
    7 j = j+k
      RETURN
      END
C------------
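
The "DO 7" shuffle above keeps a running bit-reversed counter j instead of reversing each index from scratch. A minimal Python sketch of the same loop follows, keeping the listing's 1-based i and j and converting to 0-based only when touching the list; the function name `bitrev_shuffle` is illustrative, not from the original.

```python
def bitrev_shuffle(a):
    """In-place bit-reversal permutation, following the 'DO 7' loop."""
    n = len(a)
    j = 1                      # 1-based bit-reversed partner of i
    for i in range(1, n):      # i = 1 .. n-1, as in the Fortran loop
        if i < j:
            a[i - 1], a[j - 1] = a[j - 1], a[i - 1]
        k = n // 2
        while j > k:           # "decrease j by 2**something"
            j -= k
            k //= 2
        j += k                 # add next lower power of 2 to j
    return a
```

For an 8-element input this swaps positions 1 and 4, then 3 and 6 (0-based), which matches the two swap pairs listed for rev8 in the assembly table.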
//-------------
// bitrev.ss
// (C) Copyright 1989 INTEL Corporation. ALL RIGHTS RESERVED.
//
// BIT-reversal of 8byte array elements. IN PLACE.
// (Allows arrays of 8,16,32,64,128,256,512, or 1024 elements)
//-------------
// INTEL is not responsible for use nor misuse of this code.
//-------------
// 8/13/89
//====================================================
// Invocation: (from Fortran)
//     call bitrev(a,%VAL(m))
// Inputs:
//   a = r16 = pointer to array of 8byte elements
//   m = r17 (call by value) = base-2 log of total number of elements
//       (2**m = N)
//
// Outputs:
//   a = Bit-reversed ordered version of A
//
// Expected best-can-do performance, and measured performance =
// approx 4*N clocks (0.06 mSec for 512 points)
//-------------------
define(astart,r16)  //initial input data base address
define(m,r17)
define(logN,r17)
define(dest1,r19)
define(dest2,r20)
define(dest3,r21)
define(dest4,r22)
define(iptr,r23)    //index-array pointer
define(decrem,r24)  //bla decrement
define(count,r25)   // bla counter
.text
.align .quad
//=========================================
_bitrev_::
_bitr_::
//fetch base address for index table (rbasetab)
// base-addr-table elements = (baseaddr, number_of_swaps-2)
// base-addr-table indexed by logN.
    shl     3,logN,r30          //scale to 8-byte-entry length
    mov     rbasetab,r29
    ld.l    r29(r30),iptr
    addu    4,r29,r29
    ld.l    r29(r30),count      //number of swaps required for this value N
    pfld.d  0(iptr),f0          //initiate fetch of first 2 bit-rev indices
    pfld.d  8(iptr)++,f0
    adds    -2,r0,decrem        //2 swaps per loop
    pfld.d  8(iptr)++,f0
    bla     decrem,count,revloop //init LCC
    pfld.d  8(iptr)++,f16
//get 2 indices, but don't cache the indices
revloop::                       //2 swaps per loop
//7.5 cycles consumed for each swap, best case.
    pfld.d  8(iptr)++,f18       //2 more indices
    fxfr    f16,dest1           //transfer to integer index regs
    fxfr    f17,dest2
    fld.d   dest1(astart),f24   //fetch 2 elements to swap
    fld.d   dest2(astart),f26
    fxfr    f18,dest3
    fst.d   f24,dest2(astart)
    fst.d   f26,dest1(astart)
    fxfr    f19,dest4
    fld.d   dest3(astart),f28
    fld.d   dest4(astart),f30
    pfld.d  8(iptr)++,f16       //2 more indices
    fst.d   f28,dest4(astart)
    bla     decrem,count,revloop //
    fst.d   f30,dest3(astart)
    bri     r1
    nop
//---------------
// _fetch8_: Touch all 32-byte lines in the 8k data bytes, to get them
//   into dcache. (ASSUMING .lte. 8Kbytes and .gte. 4Kbytes)
//
// Invocation= fetch(astart,num8)
// Inputs=
//   astart=r16=pointer to data which is to be touched.
//   num8=r17 (passed by VALUE, %VAL(), not by reference)
//------------
// Using RC and RB to improve dcache hit rates, for FFTs bigger than
// 1024 complex (8kB).
// RC=10 causes replacement only of block denoted by RB lsbit. RC=11 disables
// replacement.
//-------
define(num8,r17)
define(FEtch,r26)
_fetch8_::
_fetch_::
    ld.c    dirbase,r30
    or      0x800,r30,r30       // Replace Dcache slot 0 only (RC=10,RB=00)
    st.c    r30,dirbase
// Put 4Kbytes into Dcache slot 0. (The rest after 4kB goes to slot 1).
    adds    -4,r0,decrem        //4 8-byte-groups per cache line
    adds    508,r0,count        //512, but pre-decremented for bla usage
    bla     decrem,count,floop
    adds    -32,astart,FEtch
floop::
    bla     decrem,count,floop
    fld.d   32(FEtch)++,f30     //dummy load.
    adds    -512,num8,count
    bc      fdone               //if data exhausted, quit
// ld.c dirbase,r30
    or      0x900,r30,r30       // Replace Dcache slot 1 only (RC=10,RB=01)
    st.c    r30,dirbase
    adds    -8,count,count      //predecr for bla
    bla     decrem,count,floop2 //set LCC
    fld.d   32(FEtch)++,f30
floop2::
    bla     decrem,count,floop2
    fld.d   32(FEtch)++,f30     //dummy load.
fdone::
// unlock dcache
    andnot  0xF00,r30,r30       //clear RC,RB (dirbase(11:8))
    st.c    r30,dirbase
    bri     r1
    nop
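
The three masks used by the fetch routine (0x800, 0x900, 0xF00) can be read as two 2-bit fields in dirbase bits 11:8. The Python sketch below only illustrates that bit arithmetic; the field positions (RC in bits 11:10, RB in bits 9:8) are inferred from the listing's comments, and the function names are hypothetical.

```python
RC_RB_MASK = 0xF00  # dirbase(11:8), as cleared by the andnot in the listing


def decode_rc_rb(dirbase):
    """Return (RC, RB), assuming RC = bits 11:10 and RB = bits 9:8."""
    return (dirbase >> 10) & 0x3, (dirbase >> 8) & 0x3


def select_slot0(dirbase):
    """Mirror of 'or 0x800': RC=10, RB=00 (replace slot 0 only)."""
    return dirbase | 0x800


def select_slot1(dirbase):
    """Mirror of 'or 0x900': RC=10, RB=01 (replace slot 1 only)."""
    return dirbase | 0x900


def unlock(dirbase):
    """Mirror of 'andnot 0xF00': clear RC and RB."""
    return dirbase & ~RC_RB_MASK
```

Because the second `or` is applied to a register that already holds 0x800, the combined value still decodes to RC=10, RB=01, so the commented-out reload of dirbase is not needed for the field values to come out right.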
.data
//--------------
// rbasetab:: (Table of bit-reversed indices for bitrev subroutine)
// base-addr-table elements = (baseaddr, number_of_swaps-2)
// base-addr-table indexed by logN.
.align .quad
rbasetab::
    .long [6]0          //don't bother with log(n)=0,1,2
    .long rev8,    0
    .long rev16,   4
    .long rev32,   10
    .long rev64,   26
    .long rev128,  54
    .long rev256,  118
    .long rev512,  238
    .long rev1024, 494
//
//=====================
//number of swaps=240 for N=512 (ie. 32 symmetrical patterns
// exist between 0 and 511.)
// rev512: array of bit-reversed indices, for N=512.
// Each entry is ("i", and "bit-reversed-i"), shifted left by 3
// to account for 8-byte-elements.
// NOTE: This listing DOES NOT SHOW all the table elements, to save paper.
.align .quad
rev512::
    .long 8, 2048, 16, 1024
    .long 24, 3072, 32, 512
    .long 40, 2560, 48, 1536
// ETC .... ETC ..... ETC ...
//===============
.align .quad
rev1024::
    .long 8, 4096, 16, 2048
    .long 24, 6144, 32, 1024
    .long 40, 5120, 48, 3072
    .long 56, 7168, 64, 512
// ETC .... ETC ..... ETC ...
//Number of swaps = 496
//N (Number of elements) = 1024
//=================
.align .quad
rev16::
    .long 1*8,8*8,2*8,4*8
    .long 3*8,12*8,5*8,10*8
    .long 7*8,14*8,11*8,13*8
rev8::
    .long 1*8,4*8,3*8,6*8
//=================
.align .quad
rev32::
    .long 8, 128, 16, 64, 24, 192, 40, 160, 48, 96, 56, 224
    .long 72, 144, 88, 208, 104, 176, 120, 240, 152, 200, 184, 232
//=================
.align .quad
rev64::
    .long 8, 256, 16, 128
    .long 24, 384, 32, 64
    .long 40, 320, 48, 192
    .long 56, 448, 72, 288
// ETC .... ETC ..... ETC ...
//=================
.align .quad
rev128::
    .long 8, 512, 16, 256
    .long 24, 768, 32, 128
    .long 40, 640, 48, 384
    .long 56, 896, 72, 576
// ETC .... ETC ..... ETC ...
//Number of swaps = 56 (Number of elements) = 128
//=================
.align .quad
rev256::
    .long 8, 1024, 16, 512
    .long 24, 1536, 32, 256
    .long 40, 1280, 48, 768
    .long 56, 1792, 64, 128
// ETC .... ETC ..... ETC ...
//Number of swaps = 120, N (Number of elements) = 256
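
Since the listing abbreviates the larger tables, it can help to regenerate them. The Python sketch below is an illustration rather than Intel's generator: it assumes each table holds the pairs (i, bit-reversed i) with i smaller than its reversal, in ascending i, each scaled by the 8-byte element size. The name `rev_table` is hypothetical.

```python
def rev_table(log_n, elem_bytes=8):
    """Byte-offset swap pairs (i*8, rev(i)*8) for an N = 2**log_n element array."""
    n = 1 << log_n
    entries = []
    for i in range(n):
        r = int(format(i, "0{}b".format(log_n))[::-1], 2)
        if i < r:  # list each swap once; skip indices that are their own reversal
            entries.append((i * elem_bytes, r * elem_bytes))
    return entries


def swaps_minus_2(log_n):
    """Second field of an rbasetab entry: number_of_swaps - 2."""
    return len(rev_table(log_n)) - 2
```

Under these assumptions the generated counts reproduce the second column of rbasetab, and the first pairs agree with the rev8, rev32, and rev1024 entries that the listing does show.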
      PROGRAM FFTTEST
C
C 1-D FFT TEST PROGRAM
C
C Intel assumes no responsibility for use or misuse of this code.
C
C 7/20/89
C------------------
C
c
c
c
c
c
c
c
      character*8 REALLY
      PARAMETER (IREV=0)
      PARAMETER (REALLY='complex')
      PARAMETER (TIMEIT=1, CACHETIME=0)
      DATA IT/200000/
      PARAMETER (N=1024,M=10)
      PARAMETER (N=512,M= 9)
      PARAMETER (N=256,M= 8)
      PARAMETER (N=128,M= 7)
      PARAMETER (N=64,M= 6)
      PARAMETER (N=32,M= 5)
      PARAMETER (N=16, M=4)
      PARAMETER (PI=3.1415926536)
      COMPLEX X(N),X1(N),X2(N),X3(N), W(N/2)
c Fortran complex values stored R,I, R,I for arrays.
      Real ASQR(N),ASQR2(N),XR(N)
      complex wtemp
      real rtemp
C
      PRINT *,' FFT test program (ffttest.f)'
      print *,'==============================='
      IF (IREV .eq. 0) THEN
        print *,'NOT counting time for bit-reversal.'
        print *,'DO NOT expect matching answers,without bit-rev'
      ELSE
        print *,'Time for bit-reversal included.'
      ENDIF
      print *,'Time for cache writeback and fills ...'
      IF (CACHETIME .eq. 0) THEN
        print *,' NOT included, if iterating.'
      ELSE
        print *,' included.'
      ENDIF
      print *,'=============================='
      print *,'If iterating ... Number of Iterations =',IT
      print *,'==============================='
      print *,'Number of Points = ',N
      print *,'(',REALLY,' data)'
      print *,'================================'
C------------------
C Init twiddle factor array w(k) with (cos,-sin) of 2pi*k/N
C (Should just declare this as constant, if N is non-variable)
C (OR could have one constant 512-entry W (for N=1024), adjust wincr accordingly
C in diff.f for smaller N)
      rtemp = 2.0*pi/N
      wtemp= CMPLX(cos(rtemp), -sin(rtemp))
      w(1) = (1.0, 0.0)
      DO 200 k = 2,N/2
  200 w(k) = wtemp * w(k-1)
cc print *,' W (twiddle) initialization completed ......'
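
The "DO 200" loop builds the table by repeated multiplication rather than by calling cos and sin for every entry. The Python sketch below compares that recurrence with direct evaluation. It runs in double precision, whereas the Fortran uses single-precision COMPLEX, so it shows the structure of the recurrence and not the rounding behaviour of the original; both function names are illustrative.

```python
import cmath


def twiddles_recurrence(n):
    """w[k] = wtemp * w[k-1], with w[0] = 1 and wtemp = (cos, -sin) of 2*pi/n."""
    wtemp = cmath.exp(-2j * cmath.pi / n)
    w = [1 + 0j]
    for _ in range(1, n // 2):
        w.append(wtemp * w[-1])
    return w


def twiddles_direct(n):
    """w[k] = (cos, -sin) of 2*pi*k/n, evaluated independently for each k."""
    return [cmath.exp(-2j * cmath.pi * k / n) for k in range(n // 2)]
```

The recurrence accumulates rounding error with k, which is one reason the listing's comment suggests declaring the table as a constant when N is fixed.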
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
C INITIALIZE input data
C
      PIN = (4*PI)/ N
      DO 100 I = 1, N
c For testing with sinewave input data:
c       Treal = COS( I*PIN)
c       Timag = SIN( I*PIN)
c For testing with squarewave input:
cc      IF (I .lt. N/2) THEN
cc        Treal = 1.0
cc        Timag = 0.5
cc      ELSE
cc        Treal = 0.0
cc        Timag = 0.0
cc      ENDIF
C For testing with ramp function input data:
        Treal = I - 1.0
        Timag = Treal + 0.5
        X(I) = CMPLX (Treal, Timag)
        X1(I) = CMPLX (Treal, Timag)
        X2(I) = CMPLX (Treal, Timag)
        X3(I) = CMPLX (Treal, Timag)
  100 CONTINUE
C
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
      IF (TIMEIT .ne. 0) THEN
        CALL fft (X2, M, N)
cc Subroutine fft is Decimation-In-Time, Fortran version.
c       CALL ditt(X, M, N,W,IREV)
        CALL diff(X, M, N,W,IREV)
      ENDIF
ccccccccccccccccccccccccccccccccccccccc
      IF (IREV .ne. 0) THEN
        IF (TIMEIT .eq. 0) THEN
          call vcompare(X,X2,2*N)
          call cmags(X,N,ASQR)
c cmags to take squared magnitude of complex values
          call cmags(X2,N,ASQR2)
c----------------------
C print non-zero results:
          J=0
          DO 700 I = 1,N
            IF ((ASQR(I) .GT. 1.0) .OR. (ASQR2(I) .GT. 1.0)) THEN
              WRITE (6,22) (I-1), ASQR(I), ASQR2(I)
   22         FORMAT (' I-1=',I4,' ASQR(I)= ',F14.2,' ASQR2(I)= ',F14.2//)
              J = J+1
              IF (J .GT. 32) GO TO 725
            ENDIF
  700     CONTINUE
  725     CALL TIME
        ENDIF
      ENDIF
      IF (TIMEIT .ne. 0) THEN
ccccccccccccccccccccccccccccccccccccccc
cc- Timing loop follows:
        print *,' Start Ass.FFT'
        IF (CACHETIME .eq. 0) THEN
          DO 500 I = 1, IT,4
C Reuse same array, so cache fill and writeback time NOT included.
            CALL diff(X, M, N,W,IREV)
            CALL diff(X, M, N,W,IREV)
            CALL diff(X, M, N,W,IREV)
  500     CALL diff(X, M, N,W,IREV)
        ELSE
          DO 504 I = 1, IT,4
C Alternating between X,X1,X2,X3 should provide cache misses.
            CALL diff(X, M, N,W,IREV)
            CALL diff(X1, M, N,W,IREV)
            CALL diff(X2, M, N,W,IREV)
  504     CALL diff(X3, M, N,W,IREV)
        ENDIF
        print *,' END Ass. FFT'
ccccccccccccccccccccccccccccccccccccccc
      ENDIF
      STOP
      END
c----------------------
      subroutine vcompare(res,exp,n)
c VCOMPARE compares 2 REAL vectors, prints out 1st few miscompares
c
      integer n, errcnt
      real res(n), exp(n)
      write(6,12)
   12 format('*** VCOMPARE: vector comparison beginning ***')
      data errcnt/0/
      do 30 i = 1,n
        if(AINT(res(i)) .ne. AINT(exp(i))) then
c (print out error, exit if alot already)
  120     print *,'*** Error in compares ***'
          write(6,121) i
  121     format (' Item number = ',I6)
          write (6,124) res(i), exp(i)
  124     format (' Res_=',F14.2,' Expected_=',F14.2)
          errcnt = errcnt + 1
          if (errcnt .gt. 19) then
            return
          end if
        end if
   30 continue
  190 if (errcnt .eq. 0) then
        print *,' *** vector compares SUCCESSFUL ***'
      end if
   99 return
      end
c----------------------
c--------------
C File: ditt.f
C 6/15/89
C Intel assumes no responsibility for use or misuse of this code.
C FFT - Decimation in TIME, radix-2, inplace, 1-dimen
C Inputs:
C A= complex array of input, up to 1024 pts, single-prec float
C M= log of number of pts
C  = (Number of stages of FFT)
C N = number of points. ie, N= 2**M = number of pts
C W= complex array of twiddle factors, length=N/2.
C REV= ignored parameter.
C
C Outputs:
C A= complex fft of input A. Correct order (bit-reversal done).
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
      subroutine ditt(a,m,N,W,REV)
      integer m,N, i, REV,wlimit
      integer offset, stage, groups, wincr,powers2(0:10)
      complex a(n),w(N/2),temp
      data powers2 /1,2,4,8,16,32,64,128,256,512,1024/
C Powers2 to avoid calls to POW, DIV
C Twiddle factor array w(i) has (cos,-sin) of 2pi*i/N
CC Assume the caller provides w(i), constants ALREADY initialized
C------------
C Pre-touch data, lock into cache, for 8kByte fft:
      IF (N .gt. 513) THEN
        call fetch(a,%VAL(n))
      ENDIF
C------------
      call bitrev(a,%VAL(M),n)
C Bitreversal of input needed for in-place decim in time FFT, to avoid
C fetching twiddle-factors in bitrev order.
      wlimit = 8*((N/2) - 1)
      DO 20 stage = 1,m
        groups = powers2(m-stage)
C groups=number of times the twiddle factors are used, ie, the number of
C smaller DFTs the stage is split into.
C offset gets 1,2,4,8, ... N/2
        offset = powers2(stage-1)
        wincr = groups
        call ditstep(a,w,groups,offset,wincr,wlimit)
   20 CONTINUE
      RETURN
      END
C------------
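
The stage loop of ditt fixes how groups, offset, and wincr evolve, while the butterfly itself lives in the assembly routine ditstep. The Python sketch below mirrors the stage loop and substitutes a plain butterfly, E = A + B*W and F = A - B*W, for the pipelined assembly. It is an assumption-laden stand-in, not a transcription of ditstep: the wlimit argument and the modulo addressing on W are omitted, and the helper names are hypothetical.

```python
import cmath


def ditstep(a, w, groups, offset, wincr):
    """One in-place radix-2 DIT stage: E = A + B*W, F = A - B*W."""
    for g in range(groups):
        base = 2 * offset * g
        j = 0
        for i in range(base, base + offset):
            p = a[i + offset] * w[j]   # P = B*W
            a[i + offset] = a[i] - p   # F overwrites B
            a[i] = a[i] + p            # E overwrites A
            j += wincr


def ditt(x):
    """Bit-reverse the input, then run stages 1..m as in the Fortran driver."""
    n = len(x)
    m = n.bit_length() - 1
    a = [x[int(format(i, "0{}b".format(m))[::-1], 2)] for i in range(n)]
    w = [cmath.exp(-2j * cmath.pi * k / n) for k in range(n // 2)]
    for stage in range(1, m + 1):
        groups = 1 << (m - stage)
        offset = 1 << (stage - 1)
        ditstep(a, w, groups, offset, groups)  # wincr = groups
    return a


def naive_dft(x):
    """Direct O(N^2) DFT, used only as a reference."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]
```

Because the input is shuffled first, the output comes out in natural order and can be compared with a direct DFT element by element.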
//-------------
// ditstep.ss: do one stage of fft butterflies
// DIT = Decimation in Time, radix-2, inplace, 1-dimension
// (C) Copyright 1989 INTEL Corporation. ALL RIGHTS RESERVED.
// 7/15/89
//-----------
// Intel is not responsible for use nor for misuse of this program.
//-----------
// Do one entire stage (n/2 butterflies). Sample invocation:
//     call ditstep(a,w,groups,offset,wincr,wlimit)
//====================================================
// Inputs:
//   A= complex array of input, single-prec float
//      (complex stored as 4byte real, 4byte imag contiguously)
//   W= pointer to array of twiddle factors. Assuming W(k) is
//      CMPLX(cos(2pi*k/N),-sin(2pi*k/N)) for k=0 to (N/2)-1.
//   offset = distance (except for scale-by-8byte sizeof(complex)) between
//      the 2 input values for each butterfly.
//      Offset also is the number of butterflies done per "group".
//   groups = N/(2*offset). The number of sub-DFTs this stage is split into.
//   wincr = distance (except for scale-by-8byte sizeof(complex)) between
//      successive w values for successive butterflies
//   wlimit = max index, in bytes, of W table.
//
// Outputs:
//   A= complex radix-2 butterflied version of input.
//------------------
define(astart,r16)    // input data base address
define(wstart,r17)    //twiddle array ptr. Because w-contents depend on N,
                      // we will assume the caller has initialized w() array.
define(groups,r18)    //groups=number of sub-DFTs this stage is split into.
define(offset,r19)    //offset (initially elements, mult by 8 to get bytes)
                      // between node and its dual (the 2 numbers to butterfly, ie. A and B)
define(wincr,r20)     //increment between successive W values. Remains constant
                      // within a given stage.
define(wlimit,r21)    //max index, in bytes, of W table.
define(wind,r22)      //current index, in bytes, of W table.
define(offset2,r23)   //offset*2
define(decrem,r24)    //bla decrement
define(somecount,r25) // bla counter
define(FEtch,r26)     //pointer to 1st component of butterfly (load)
define(STore,r27)     // " " 1st component of butterfly (store)
define(offsetp8,r28)  //offset+8
// f4:f7 spare
define(ARe,f12)  //element A, real component
define(AIe,f13)  // " ", imag
define(ARo,f14)  // extra A value, for prefetch (o="odd")
define(AIo,f15)
define(BRe,f16)  //element B, real component
define(BIe,f17)
define(BRo,f18)  // extra B value, for prefetch
define(BIo,f19)
define(ERe,f20)  //A+(B*W), real (ER = AR + BR)
define(EIe,f21)  // " imag "
define(ERo,f22)  // previous loop's value
define(EIo,f23)  // " imag "
define(FRe,f24)  //A-(B*W), real
define(FIe,f25)  // " imag "
define(FRo,f26)  // previous loop's value
define(FIo,f27)  // " imag "
define(PR,f28)   //(B*W), real
define(PI,f29)   //(B*W), imag
define(WRe,f30)  //W (twiddle factor), real part
define(WIe,f31)  // " " , imag
define(WRo,f10)  //W (twiddle factor), real part (EXTRA copy)
define(WIo,f11)  // " " , imag
.text
.align .quad
_ditstep_::
    ld.l    0(groups),groups    //fix Fortran call-by-ref
    ld.l    0(offset),offset    //
    shl     3,offset,offset     // change from elements to bytes
    shl     1,offset,offset2
    adds    8,offset,offsetp8
    fst.q   f8,-16(sp)++        //save "local" regs
    fst.q   f12,-16(sp)++       // " "
    adds    -1,groups,groups    // pre-decrement for bnc usage, or bla usage
    adds    -16,r0,decrem       //bla decrement
// We code the last 2 stages as special cases:
//-------
    xor     8,offset,r0         //offset=1, special case, no complex mult, funny addressing
    bc      offset_1            // (ASSUMING offset=1 means wincr=0, and no twiddle used)
    xor     16,offset,r0        //offset=2, special case, no complex mult
    bc      offset_2
//-------
    ld.l    0(wincr),wincr
    ld.l    0(wlimit),wlimit
    pfadd.ss f0,f0,f0
    pfadd.ss f0,f0,f0
    pfadd.ss f0,f0,f0           // init A1,A2,A3=0
    pfmul.ss f0,f0,f0
    pfmul.ss f0,f0,f0
    pfmul.ss f0,f0,f0
//-------
// init pointers:
    shl     3,wincr,wincr       //scale for bytes.
    shl     1,wincr,wind        //init wind =2*wincr
    pfld.d  0(wstart),f0
    pfld.d  wincr(wstart),f0
    adds    -8,astart,FEtch
    pfld.d  wind(wstart),f0
    adds    wincr,wind,wind     //wind now 3*wincr
// here fetch first set of B,W before bla-loop
    pfld.d  wind(wstart),WRe
    adds    wincr,wind,wind
//first Bfetch from offset, then 1st afetch from 0.
    fld.d   offsetp8(FEtch),BRe //first B value
    and     wlimit,wind,wind    //modulo-wlimit the w index
// We do modulo-addressing on W(), to keep the pfld pipeline full. We
// never do a W-fetch beyond the end of the table.
// And the modulo-check needs to be done only every 4th pfld, as always
// we use a multiple of 4 W() factors.
    d.r2ap1.ss f0,f0,f0         //clear Treg.
    adds    -32,offset,somecount // bla counter (predecrement by 4 elements)
// ----------
// Definitions for pipe diagram:
// Anew = E = A+(B*W)
// Bnew = F = A-(B*W)
// Let P=(B*W).
//   (the complex multiply product, P, broken into 4 real mult and 2 adds):
//   WR = cos(), WI=-sin().
//   PR = K - L; where K= WR*BR, L=WI*BI
//   PI = N + M; where N= WI*BR, M=WR*BI
//   ER = AR + PR (Overwrites AR)
//   EI = AI + PI (    "      AI)
//   FR = AR - PR (    "      BR)
//   FI = AI - PI (    "      BI)
// For 1st time thru inner-loop, don't have correct values to store.
// Must do 1 loop before the loop, sans the stores.
//-----------------
first_bfly::
II fill
pipe
II KR ••• KI ••• Ml •••• M2~ ••• M3
d.r2pt.ss WRe,fO,fO
II WRe
pfld.d wind (wstart),WRo
d.i2st.ss Wle,fO,fO II
Wle
adds
wincr,wind,wind
d.r2apl.ss fO ,BRe,fO II
KO
fld.d 8 (FEtch)++,ARe Ilfirst A value
d.pfmul.ss Wle,Ble,fO I!
LO
KO
pfld.d wind (wstart),WRe
d.r2pt.ss WRo,Ble,fO II WRo
MO
LO
·KO
fld.d offsetp8 (FEtch),BRo
d.ratls2.ss fO ,PR ,fOI!
MO
LO
adds
wincr,wind,wind
d.i2st.ss Wlo,BRe,fO /1
Wlo NO
MO
nop
T
Al •••• A2 •••• A3 •••• Write
KO
KO
K-LO
/ I· ••••••••••••••••••••••••••••••••••••••••••••.••.•••••••••••.•••
d.r2apl.ss fO ,BRo,fO I!
Kl
NO
MO
PRO
and
wlimit,wind,wind
PRO
d.pfsub.ss fO ,PI ,fO I!
Kl
NO
MO
fld.d 8 (FEtch)++,ARo
d.pfadd.ss ARe,PR ,PR II
Kl
NO
MO ERO
fld.d offsetp8 (FEtch) ,BRe
d.pfmul.ss Wlo,Blo,fO II
Ll
Kl
NO
MO ERO
nop
d.r2pt.ss WRe,Blo,fO II WRe
Ml
Ll
Kl
MO M+NO ERO
bla
decrem,somecount,restart IlinitLCC
Kl FRO
PIO . ERO
d.ratls2.ss ARe,PR ,fOI!
Ml
Ll
nop
restart: :
d.i2st.ss Wle,BRo,ERel!
Wle Nl
Ml
Kl K-Ll FRO
PIO
adds
-16,astart,STore II ptrs init 16 low, for fst.q instructions
PRO.
ERO
//------------------
// Each butterfly = 1 complx multiply, 1 complx add, 1 complx subtract
//   = 4 multiply, 3 add, 3 subtract
//   3 8-byte fetches (A, B, W)
//   2 8-byte stores (A, B)
//   7 cycles per butterfly
// inner_loop: iterates "offset/2" times
// for each group. It does 2 butterflies per iteration
// AR/AI fetches need to be a cycle behind BR/BI fetches here. So we
// must index with offset+8 into B.
// AR is used 1/2 loop before AI.
// Pattern= AI0,AR1,BR2,BI2;AI1,AR2,BR3,BI3.
inner _loop: :
II KR ••• KI ••• Ml •••• M2 •••• M3
d.r2apl.ss Ale,BRe,PI II
K2
Nl
pfld.d wind (wstart) ,WRo
d.pfsub.ss Ale,PI ,FRel!
K2
Nl
fld.d 8(FEtch)++,ARe
Nl
d.pfadd.ss ARo,PR ,PR II
K2
fld.d offsetp8 (FEtch) ,BRo
L2
Nl
d.pfmul.ss Wle,Ble,fO II
K2
adds
wincr,wind,wind
Ml
Al •••• A2 •••• A 3 •••• Write
EIO
PRI
FRO
PlO
Ml
FlO
EIO
PRI
FRO
Ml
ERI
FlO
EIO
PRI
Ml
ERI
FlO
EIO
T
d.r2pt.ss WRo,Ble,Ele II WRo
M2
L2
K2
pfld.d wind (wstart) ,WRe
d.ratls2.ss ARo,PR ,Flell
M2
L2
adds
winer,wind,wind
d.i2st.ss Wlo,BRe,ERol1
Wlo N2
M2
and
wlimit,wind,wind Ilmodulo.
/! KR ••• KI ••• Ml •••• M2 •••• M3
d. r2apl. 55 Alo,BRo,PI /!
K3
N2
nop
d.pfsub.ss Alo,PI ,FRolI
K3
N2
fld.d 8 (FEtch)++,ARo
d.pfadd.ss ARe,PR ,PR II
K3
N2
fld.d offsetp8 (FEtch) ,BRe
d.pfmul.ss Wlo,Blo,fO II
L3
K3
N2
nop
d.r2pt.ss WRe,Blo,Elo II WRe
M3
L3
K3
fst.q ERe,16(STore)++ Ilupdate ERe/Ele/ERo/Elo
d.ratls2.ss ARe,PR ,Flol1
M3
L3
bla decrem,someeount, inner_loop
d.i2st.ss Wle,BRo,ERell
Wle N3
M3
fst.q FRe, offset (STore)
Ilupdate FRe/Fle/FRo/Flo
M+Nl
ERl
FlO
EIO
K2
FRl
Pll
ERl
FlO
K2
K-L2
FRl
Pll
ERl
T
M2
Al •••• A2 •••• A3 •••• Write
Ell
PR2
FRl
Pll
M2
Fll
Ell
PR2
FRl
M2
ER2
Fll
Ell
PR2
M2
ER2
Fll
Ell
M+N2
ER2
Fll
Ell
K3
FR2
PI2
ER2
Fll
K3
K-L3
FR2
PI2
ER2
end_inner_loop:: IIKEEP Pipelines full
II RE-init pointers for fetches
d.fiadd.ss fO,fO,fO
adds
offset2,astart,astart Ilbump to next group
lIre do A,B fetches, with proper ptr.
d.fiadd.ss fO,fO,fO
fld.d offset (astart) ,BRe Ilget first BR/BI in next group
d.fiadd.ss fO,fO,fO
adds
-8,astart,FEtch
last_bflY:: lIdo final 2 butterflies, start next group
II KR ••• KI ••• Ml •••• M2 •••• M3
T Al •••• A2 •••• A3 •••• Write
d.r2apl.ss Ale,BRe,PI II
KO
N3
M3 EI2
PR3
FR2
PI2
pfld.d wind (wstart) ,WRo
d.pfsub.ss Ale,PI ,FRell
KO
N3
M3 FI2
EI2
PR3
FR2
fld.d 8(FEtch)++,ARe
d. pfadd .• ss ARo, PR ,PR II
KO
N3
M3 ER3
FI2
EI2
PR3
fld.d offsetp8 (FEtch) ,BRo
d.pfmul.ss Wle,Ble,fO II
LO
KO
N3
M3 ER3
FI2
EI2
adds
wincr,wind,wind
d.r2pt.ss WRo,Ble,Ele II WRo
MO
LO
KO
M+N3 ER3
FI2
EI2
pfld.d wind (wstart) ,WRe
d.ratls2.ss ARo,PR ,Flell
MO
LO
KO FR3
PI3
ER3
FI2
adds
winer,wind,wind
d.i2st.ss Wlo,BRe,ERo/!
Wlo NO
MO
KO K-LO FR3
PI3
ER3
and
wlimit,wind,wind Ilmodulo
I I • ..............................................................
d.r2apl.ss Alo,BRo,PI II
Kl
NO
MO EI3
PRO
FR3
PI3
adds -32, offset, somecount I I reset ·bla counter
d.pfsub.ss Alo,PI ,FRol1
Kl
NO
MO FI3
EI3
PRO
FR3
fld.d 8 (FEtch)++,ARo
d.pfadd.ss ARe,PR ,PR II
Kl
NO
fld.d offsetp8 (FEtch) ,BRe
d.pfmul.ss Wlo,Blo,fO II
Ll
Kl
bla
decrem,somecount,nowhere Ilre-init
d.r2pt.ss WRe,Blo,EIo II WRe
Ll
Ml
adds -l,groups,groups
nowhere: :
d.ratls2.ss ARe,PH. ,Floll
Ml
fst.q ERe,16(STore)++
d.fnop
bnc.t restart Ilbranch on value of groups
d.fnop
fst.q FRe, offset (STore)
MO
ERO
FI3
EI3
NO
MO
LCC=l
Kl
ERO
FI3
EI3
M+NO
ERO
FI3
EI3
Ll
FRO
PIO
ERO
FI3
Kl
PRO
end_lasLbfly: :
d.fnop
br endit
fiadd.ss fO,fO,fO
fst.q FRe, offset (STore) Ilrepeated for bnc.t untaken case
.align .quad
//=========
offset_1::
// want FEtch=0,2,4,6,8, ... elements. ASSUMING wincr=0,
// and that w=(1,0), so that no complex mult needed.
// E=A+B, F=A-B. (Per double-butterfly loop: 8 pfadd, 4 dword fld, 4 fst,
// 1 bla) (fld.q used to reduce # flds)
// Performance = 4 cyc/bfly best case.
//===================================================
//Redefine regs for fld.q,fst.q usage, when A and B adjacent:
define(AR3,f12)  //element A, real component
define(AI3,f13)  // " ", imag
define(BR3,f14)  //element B, real component
define(BI3,f15)
define(AR4,f16)  // extra A value, for prefetch
define(AI4,f17)
define(BR4,f18)
define(BI4,f19)
define(ER3,f20)  //A+B, real (ER = AR + BR)
define(EI3,f21)  // " imag "
define(FR3,f22)  //(A-B), real
define(FI3,f23)  // " imag
define(ER4,f24)  //A+B, real
define(EI4,f25)  // " imag
define(FR4,f26)  //(A-B), real
define(FI4,f27)  // " imag
//=========================================
    adds    -16,astart,FEtch
    fld.q   16(FEtch)++,AR4
    adds    -1,groups,somecount // bla counter (predecremented already by 1)
//using groups=blacount on the offset_1 loop, intentionally.
    adds    -16,FEtch,STore
//startup the loop:
// ---------------------        // A1 ...... A2 ...... A3 ...... Write:
    d.pfadd.ss AR4,BR4,f0       // ARn+BRn
    fld.q   16(FEtch)++,AR3
    d.pfadd.ss AI4,BI4,f0       // AIn+BIn   ERn
    adds    -2,r0,decrem        //2 bflies per loop
    d.pfsub.ss AR4,BR4,f0       // ARn-BRn   EIn     ERn
    bla     decrem,somecount,offset1_loop //init LCC
    d.pfsub.ss AI4,BI4,ER4      // AIn-BIn   FRn     EIn     ERnext
    nop
// ---------------------        // A1 ...... A2 ...... A3 ...... Write:
offset1_loop::
    d.pfadd.ss AR3,BR3,EI4      // AR+BR     FI-     FR-     EI-
    nop
    d.pfadd.ss AI3,BI3,FR4      // AI+BI     ER      FI-     FR-
    fld.q   16(FEtch)++,AR4
    d.pfsub.ss AR3,BR3,FI4      // AR-BR     EI      ER      FI-
    fst.q   ER4,16(STore)++
    d.pfsub.ss AI3,BI3,ER3      // AI-BI     FR      EI      ER
    nop
    d.pfadd.ss AR4,BR4,EI3      // AR2+BR2   FI      FR      EI
    fld.q   16(FEtch)++,AR3
    d.pfadd.ss AI4,BI4,FR3      // AI2+BI2   ER2     FI      FR
    nop
    d.pfsub.ss AR4,BR4,FI3      // AR2-BR2   EI2     ER2     FI
    bla     decrem,somecount,offset1_loop
    d.pfsub.ss AI4,BI4,ER4      // AI2-BI2   FR2     EI2     ERnext
    fst.q   ER3,16(STore)++
//---------
end_offset1_loop::
    d.fiadd.ss f0,f0,f0
    br      endit
    fiadd.ss f0,f0,f0
    nop
//---------
.align .quad
offset_2::
// want FEtch=0,1;4,5;8,9;12,13; ... elements.
// ASSUMING wincr=N/4 (W_addr=0,N/4,0,N/4,0 ....). Trivial W() factors.
// Even-indexed elements identical to offset_1, W=W0, no complex mult.
//   So EReven=(AR+BR), EIeven=(AI+BI).
//   So FReven=(AR-BR), FIeven=(AI-BI).
// Odd components have W=(0,-1). So B*W = (BI,-BR).
//   So ERodd=Re(A+(B*W)) = (AR+BI), EIodd=(AI-BR).
//   So FRodd=Re(A-(B*W)) = (AR-BI), FIodd=(AI+BR).
// Each fld.q fetches AReven,AIeven,ARodd,AIodd.
//
//Assume ERe,EIe,ERo,EIo are 4 contiguous regs.
//Assume FRe,FIe,FRo,FIo are 4 contiguous regs.
//Assume ARe,AIe,ARo,AIo are 4 contiguous regs.
    adds    -16,astart,FEtch
    fld.q   16(FEtch)++,ARe
    fld.q   16(FEtch)++,BRe
    adds    0,groups,somecount  //bla counter
//startup the loop:
// ---------------------        // A1 ...... A2 ...... A3 ...... Write:
    pfadd.ss ARe,BRe,f0         // AR+BRe
    pfadd.ss AIe,BIe,f0         // AI+BIe    ER
    d.pfadd.ss ARo,BIo,f0       // ARo+BIo   EI      ER
    nop
    d.pfsub.ss AIo,BRo,ERe      // AIo-BRo   ERo     EI      ER
    nop
    d.pfsub.ss ARe,BRe,EIe      // AR-BRe    EIo     ERo     EI
    adds    -1,r0,decrem        //2 bflies per loop, but groups is half desired value.
    d.pfsub.ss AIe,BIe,ERo      // AI-BIe    FR      EIo     ERo
    adds    -16,astart,STore
    d.pfsub.ss ARo,BIo,EIo      // ARo-BIo   FI      FR      EIo
    bla     decrem,somecount,offset2_loop //init LCC
    d.pfadd.ss AIo,BRo,FRe      // AIo+BRo   FRo     FI      FR
    nop
offset2_loop::
    d.fnop
    fld.q   16(FEtch)++,ARe     //fetch AR,AI,ARo,AIo
    d.fnop
    fld.q   16(FEtch)++,BRe
// -------                      // A1 ...... A2 ...... A3 ...... Write:
    d.pfadd.ss ARe,BRe,FIe      // AR+BRe    FIo     FRo     FI
    nop
    d.pfadd.ss AIe,BIe,FRo      // AI+BIe    ER      FIo     FRo
    nop
    d.pfadd.ss ARo,BIo,FIo      // ARo+BIo   EI      ER      FIo
    fst.q   ERe,16(STore)++     //update ER,EI,ERo,EIo
    d.pfsub.ss AIo,BRo,ERe      // AIo-BRo   ERo     EI      ER
    nop
    d.pfsub.ss ARe,BRe,EIe      // AR-BRe    EIo     ERo     EI
    nop
    d.pfsub.ss AIe,BIe,ERo      // AI-BIe    FR      EIo     ERo
    fst.q   FRe,16(STore)++
    d.pfsub.ss ARo,BIo,EIo      // ARo-BIo   FI      FR      EIo
    bla     decrem,somecount,offset2_loop
    d.pfadd.ss AIo,BRo,FRe      // AIo+BRo   FRo     FI      FR
    nop
endit::
// restore regs
    fiadd.ss f0,f0,f0           //exit DIM
    fld.q   0(sp),f12
    fiadd.ss f0,f0,f0           //last DIM pair
    fld.q   16(sp),f8
    adds    32,sp,sp
    bri     r1
    nop
//===============================================
C---------------
C File: dirr.f
C FFT - Decimation in Freq, radix-2, inplace, 1-dimen,
C REAL input
C Intel is not responsible for use nor misuse of this code.
C 8/14/89
C Inputs:
C A= REAL array of input, up to 1024 pts, single-prec float
C M= log of number of pts
C  = (Number of stages of FFT)
C N = number of points. ie, N= 2**M = number of pts
C W= complex array of twiddle factors, length N/2.
C REV= 0 if bitreversed output ok. 1=must re-order output
C (REV will be ignored, and output will be properly ordered. Bit
C reversal WILL be done.)
C
C Outputs:
C A= complex fft of input A, but only the positive-frequency half.
C Length = N/2+1 complex numbers. A(0:n/2)
C
      subroutine dirr(a,m,N,W,REV)
      integer m,N, i, j,k, REV,wlimit
      integer offset, stage, groups, wincr,powers2(0:10)
      real a(N)
      complex w(N/2),temp
      data powers2 /1,2,4,8,16,32,64,128,256,512,1024/
C Powers2 to avoid calls to POW, DIV
C Twiddle factor array w(k) has (cos,-sin) of 2pi*k/N
CC Assume the caller provides w(k) constants ALREADY initialized
C------------
C Pre-touch data, for 8kByte fft: (2048 points real)
      IF (N .gt. 1025) THEN
        call fetch(a,%VAL(n/2))
      ENDIF
C------------
      wlimit = 8*((N/2) - 1)
C "DO 20" stage-loop: doing Complex FFT on length N/2 array. Twiddles are
C for a length N array, so wincr gets scaled by 2.
      DO 20 stage = 1,m-1
        groups = powers2(stage-1)
C groups=number of times the twiddle factors are used, ie, the number of
C smaller DFTs the stage is split into.
C offset gets N/4,N/8,N/16, ...
        offset = powers2(m-1-stage)
        wincr = groups * 2
        call difstep(a,w,groups,offset,wincr,wlimit)
   20 CONTINUE
      call bitrev(a,%VAL(M-1),n/2)
      call realfix(a,w,%VAL(n))
      RETURN
      END
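
dirr treats the N real samples as N/2 packed complex values, transforms them, and then calls realfix to turn that half-length result into the positive-frequency half of the real-input spectrum. The Python sketch below illustrates only that final fix-up, under stated assumptions: a direct DFT stands in for the difstep stages and bitrev, the twiddle is the stored form w(k) = (cos, -sin) of 2*pi*k/N, and the function names are hypothetical.

```python
import cmath


def naive_dft(x):
    """Direct O(N^2) DFT; stands in for the N/2-point complex FFT plus bitrev."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def real_fft_half(x):
    """Positive-frequency half (N/2+1 values) of the DFT of a real sequence x."""
    n = len(x)
    half = n // 2
    # Pack even samples as real parts and odd samples as imaginary parts.
    z = [complex(x[2 * i], x[2 * i + 1]) for i in range(half)]
    h = naive_dft(z)
    f = [0j] * (half + 1)
    # First and last outputs are purely real (W = (1,0) special case).
    f[0] = complex(h[0].real + h[0].imag, 0.0)
    f[half] = complex(h[0].real - h[0].imag, 0.0)
    for k in range(1, half):
        a = h[k]
        b = h[half - k].conjugate()
        w = cmath.exp(-2j * cmath.pi * k / n)  # assumed stored twiddle (cos, -sin)
        f[k] = 0.5 * ((a + b) - 1j * (a - b) * w)
    return f
```

The output has one more complex element than the packed input, which is why the Fortran comment gives the result length as N/2+1.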
C-----------
// realfix.ss: This is i860(tm) CPU assembly code to revise data from an
// N/2 length Complex FFT.
// (assumes the input data fed to Complex FFT was N real values)
//
// INTEL is not responsible for use nor misuse of this code.
// 8/14/89
// This 18-cycle-butterfly loop may be sub-optimal.
// output = overwrite the data array used for input. Results are complex.
//   Re0,Im0,Re1,Im1, ... Re(N/2),Im(N/2).
// NOTE that output array is 1 element longer than input.
// Input is H(k), output is F(k) ...
// F(k)=.5*( H(k)+ Hconj(N/2-k) -j*(H(k) -Hconj(N/2-k))*Wconj(k))
// Algorithm from "Numerical Recipes in C", by Flannery, Press, Teukolsky, and
// Vetterling. Cambridge Univ. Press 1988. p.417.
///*************************/
11* The C-version of realfix: *1 void realfix_(a.w.n)
111*Input =
1/ a(O:n+l): length n/2+l complex array. Entries O:n/2-l are the complex FFT
II * result. in correct (NON BIT. REVERSED) order. Entry n/2 is undefined.
II * w: length n/2 complex array of twiddles. (cos.-sin(2pi*k/n))
II * n: call-by-value. number of REAL input samples
II
II
II
II
II
*Output =
* a(O:n+l): length n/2+l. complex array.
*
Format is ReO.ImO.Rel.Iml ••••• Re(N/2).Im(N/2).
*
NOTE: To generate entire N-lepgth complex output spectrum. you can copy
*
conjugate of element(i) to element(N-i).
1/ *1
II float
1/
1/
1/
/I
1/
/I
a[]. w[]; int n;
{int aptr. bptr. wptr; float half=O.5.
AR.AI.BR.BI. 1* input values for A.B*I
PR.PI.SR.SI.DR.DI. I*temporary differences.sums.products*1
K.L.M.N .. I*temporary products *1
ER.EI.ERD.EID.
FR.FI.FRD.FID.
WR.WI;
111*We do first and last elements as special case {Imag=O. W=(l.O))*1
1/ AR = a[O];
AI = a[l];
/I a[O]
AR + AI; a[l]
0;
1/ a[nl
AR - AI; a[n+l] = 0;
=
=
=
2-432
inteL
AP-435
Ilfor(aptr=2. bptr=(n-2). wptr=2; aptr < n/2; aptr +=2. bptr -=2. wptr +=2)
IIlwR = w[wptr];
WI = w[wptr+l];
I I AR = a[aptr];
AI = a[aptr+l];
I I BR = a[bptr];
BI = a[bptr+l];
II 1* aptr =2.4.6 •••• 14; bptr=30.2S.26 ••••• 1S (if n=32) *1
II 1* Note that there is no need to revise the value at the middle of the
II
list. as it is already correct. (.5*(H(n/4)+Hconj(n/4)) *1
I I SI = (AI + BI) ;
II DR = (BR - AR) ;
II K = WR*SI; L= WI*DR;
PR
K-L;
II M = WR*DR; N= WI*SI;
PI = M+N;
II SR = (AR + BR) ;
II DI = (AI - BI) ;
=
II ERD = SR+PR; ER = half*ERD;
I I a[aptr] = ER;
II EID = DI+PI; EI = half*EID;
I I a[aptr+l]= EI;
II FRD = SR-PR; FR = half*FRD;
II a[bptr] == FR;
II FID = PI-DI; FI = half*FID;
I I a[bptr+l]= FI; I I*end of for-loop * I I
11************* End of C-code for realfix.***********************
.text
.align .quad
//--------------
define(astart,r16) // input data base address
define(wptr,r17) // pointer to W table. Because w-contents depend on N,
                 // we will assume the caller has initialized w() array.
define(N,r18) //
define(aptr,r20) // pointer to 1st component of butterfly (load)
define(bptr,r21) // pointer to 2nd component of bfly (load); DOWN COUNTER
define(decrem,r24) // bla decrement
define(count,r25) // bla counter
define(WR,f18) // W (twiddle factor), real part
define(WI,f19) // "  "  , imag
define(AR,f12) // element A, real component
define(AI,f13) // "  ", imag
define(ARo,f14) // extra A value, for prefetch (o="odd")
define(AIo,f15)
define(BR,f16) // element B, real component
define(BI,f17)
define(ER,f20) // Result of butterfly which overwrites AR
define(EI,f21) // "  AI
define(half,f22) // constant 0.5
define(FR,f24) // Result of butterfly which overwrites BR
define(FI,f25)
define(PR,f26)
define(PI,f27)
define(DR,f28)
define(DI,f29)
define(SR,f30) // Sum of A+B, real part
define(SI,f31) // "  " , imag "
.data
.align .double
halfloc:: .float 0.5
//-------
.text
.align .quad
_realfix_::
fst.q f12,-16(sp)++ // save "local" regs
adds -4,r0,decrem // bla decrement
//-------
// We do not bother to initialize FP pipes to zero here, as we assume
// this routine is called after another, "safe", pipelined FP routine.
pfld.l halfloc,f0
pfld.d 8(wptr)++,f0 // skip W(0) intentionally. Is a trivial (1,0) value
// init pointers:
adds 0,astart,aptr
pfld.d 8(wptr)++,f0
shl 2,N,bptr // bptr=total # bytes of input data
pfld.d 8(wptr)++,half // 0.5 into an fpr
adds bptr,astart,bptr // bptr points to a(N)
// here fetch first set of A,B,W before bla-loop
pfld.d 8(wptr)++,WR
fld.d 0(aptr),AR // for 1st and last elements
adds -8,N,count // bla counter (predecrement by 2 butterflies worth)
// ----------
// Do n/4 butterflies: (computing only N/2 elements of complex output, because
//   the second N/2 are just complex conjugates of the 1st N/2)
//
// Definitions for pipe diagram:
//   WR = cos(), WI=-sin().
//   DR = BR - AR; (difference of Real components of A,B)
//   DI = AI - BI; (difference of Imag components)
//   SR,SI = sum of A,B
//   PR = K - L; where K= WR*SI, L=WI*DR
//   PI = M + N; where M= WR*DR, N=WI*SI
//   (ER,EI)=complex result to overwrite A.
//   (FR,FI)="  "  "  "  B.
first_fly:: // fill pipe.
// For 0th butterfly:
// AR = a[0]; AI = a[1];
// a[0] = AR + AI; a[1] = 0;
// a[n] = AR - AI; a[n+1]= 0;
r2pt.ss f0,f0,f0
mrm1p2.ss AR,AI,f0
mrm1s2.ss AR,AI,f0
fld.d 8(aptr)++,AR
fld.d -8(bptr)++,BR
d.pfadd.ss f0,f0,f0
d.pfadd.ss f0,f0,ER
// [Pipe diagram columns: KR..KI..M1....M2....M3  T  A1....A2....A3....Write;
//  per-cycle entries are not legible in this copy]
d.ra1p2.ss AI,BI,FR //
nop
d.mrm1s2.ss BR,AR,EI //
fst.d ER,-8(aptr)
d.mr2pt.ss WR,f0,FI // WR
fst.d FR,8(bptr)
d.ra1p2.ss BR,AR,SI //
andh 0x8000,count,r0 // check for negative
d.m12tpm.ss WI,DR,DR //
bnc endfix
d.r2pt.ss half,DR,f0 // half
nop
d.m12ttpa.ss WI,SI,SR //
nop
d.i2st.ss f0,f0,f0 //
nop
d.rat1s2.ss AI,BI,f0 //
nop
d.i2pt.ss f0,f0,f0 //
fld.d 8(aptr)++,AR
d.r2ap1.ss SR,f0,PR //
fld.d -8(bptr)++,BR
d.ra1s2.ss SR,PR,DI //
pfld.d 8(wptr)++,WR
d.r2ap1.ss DI,f0,PI //
nop
d.ra1s2.ss PI,DI,f0 //
nop
d.ra1p2.ss f0,f0,f0 //
nop
d.ra1s2.ss f0,f0,f0 //
bla decrem,count,fix_loop
d.pfadd.ss f0,f0,FI //
nop
//------------------
// Each butterfly = 1 complx multiply, 3 complx add, 1 real multiply
//   = 8 multiply, 10 add/subtract
//     3 8-byte fetches (A, B, W)
//     2 8-byte stores (E, F)
//   approx. 18 cycles per butterfly
fix_loop::
// [Pipe diagram columns: KR..KI..M1....M2....M3  T  A1....A2....A3....Write;
//  per-cycle entries are not legible in this copy]
d.mr2pt.ss f0,FI,ER // 0
nop
d.mrm1p2.ss AI,BI,FR //
nop
d.mrm1s2.ss BR,AR,EI //
fst.d ER,-8(aptr)
d.mr2pt.ss WR,f0,FI // WR
fst.d FR,8(bptr)
d.ra1p2.ss BR,AR,SI //
andh 0x8000,count,r0 // check for negative
d.m12tpm.ss WI,DR,DR //
bnc endfix
d.r2pt.ss half,DR,f0 // half
nop
d.m12ttpa.ss WI,SI,SR //
nop
d.i2st.ss f0,f0,f0 //
nop
d.rat1s2.ss AI,BI,f0 //
nop
d.i2pt.ss f0,f0,f0 //
fld.d 8(aptr)++,AR
d.r2ap1.ss SR,f0,PR //
fld.d -8(bptr)++,BR
d.ra1s2.ss SR,PR,DI //
pfld.d 8(wptr)++,WR
d.r2ap1.ss DI,f0,PI //
nop
d.ra1s2.ss PI,DI,f0 //
nop
d.ra1p2.ss f0,f0,f0 //
nop
d.ra1s2.ss f0,f0,f0 //
bla decrem,count,fix_loop
d.pfadd.ss f0,f0,FI //
nop
//--------
endfix::
// restore regs
fiadd.ss f0,f0,f0 // exit DIM
fld.q 0(sp),f12
fiadd.ss f0,f0,f0 // last DIM pair
adds 16,sp,sp
bri r1
nop
//------------
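
The post-processing step that realfix performs can be checked numerically. The following is a minimal Python sketch, not Intel's code: the function names `dft` and `real_fft_via_half_length` are assumptions introduced here, the half-length transform is done with a direct O(n^2) DFT rather than difstep/bitrev, and the twiddle sign follows the C loop in the listing (PR = WR*SI - WI*DR, with w = (cos, -sin)), which multiplies by w itself. The sketch evaluates the formula for every k from 0 to n/2, including the first, middle, and last entries that the listing treats separately.

```python
import cmath


def dft(x):
    """Direct O(n^2) DFT, used here only as a reference."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def real_fft_via_half_length(x):
    """Real-input FFT of even length n from one complex DFT of length n/2.

    Returns F(0..n/2), the non-negative-frequency half of the spectrum.
    """
    n = len(x)
    half = n // 2
    # Pack even samples into the real part, odd samples into the imaginary part.
    z = [complex(x[2 * i], x[2 * i + 1]) for i in range(half)]
    h = dft(z)
    h.append(h[0])                       # H(n/2) wraps around to H(0)
    out = []
    for k in range(half + 1):
        a = h[k]
        b = h[half - k].conjugate()
        w = cmath.exp(-2j * cmath.pi * k / n)   # (cos, -sin) of 2*pi*k/n
        out.append(0.5 * ((a + b) - 1j * w * (a - b)))
    return out


if __name__ == "__main__":
    samples = [1.0, 2.0, -1.0, 0.5, 3.0, 0.0, -2.0, 4.0]
    for value in real_fft_via_half_length(samples):
        print(value)
```

Comparing the result against `dft(samples)[:n//2 + 1]` is a quick way to confirm the sign conventions before porting the same arithmetic to another language.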
PROGRAM FFTTEST
c file = real.f
C
C 1-D FFT TEST PROGRAM
C
C 8/14/89
C Intel assumes no responsibility for use or misuse of this code.
C-----------------
PARAMETER (IREV=1)
c
c
c
c
c
c
c
c
c
character*8 really
PARAMETER (REALLY='real')
PARAMETER (REALLY='complex')
PARAMETER (TIMEIT=O, CACHETIME=O)
c REALLY='real' means real-only input, otherwise assume complex input
DATA IT/200000/
PARAMETER (N=2048,M=11)
PARAMETER (N=1024,M=10)
PARAMETER (N=512,M= 9)
PARAMETER (N=256,M= 8)
PARAMETER (N=128,M= 7)
PARAMETER (N=64,M= 6)
PARAMETER (N=32,M= 5)
PARAMETER (N=16, M=4)
PARAMETER (PI=3.1415926536)
COMPLEX X2(N),X(N),X3(N),W(N/2)
Real ASQR(N),ASQR2(N),XR(N+2),XR1(N+2),XR2(N+2),XR3(N+2)
complex wtemp
real rtemp
C
PRINT *,' FFT test program'
print *,'==============================='
IF (IREV .eq. 0) THEN
print *,' NOT counting time for bit-reversal.'
print *,'DO NOT expect matching answers,without bit-rev'
ELSE
print *,'Time for bit-reversal included.'
ENDIF
print *,'Time for cache writeback and fills...'
IF (CACHETIME .eq. 0) THEN
print *,' NOT included, if iterating.'
ELSE
print *,' included.'
ENDIF
print *,'==============================='
print *,'If iterating... Number of Iterations =',IT
print *,'==============================='
print *,'Number of Points = ',N
print *,'(',REALLY,' data)'
print *,'==============================='
C------------------
C Init twiddle factor array w(k) with (cos,-sin) of 2pi*k/N
rtemp = 2.0*pi/N
wtemp= CMPLX(cos(rtemp), -sin(rtemp))
w(1) = (1.0, 0.0)
DO 200 k = 2,N/2
200 w(k) = wtemp * w(k-1)
cc print *,' W (twiddle) initialization completed...'
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
C INITIALIZE input data
C
DO 100 I = 1, N
c :constant:
c Treal = 1.0
c Timag = 0.0
c :squarewave:
cc IF (I .lt. N/2) THEN
cc Treal = 1.0
cc Timag = 0.5
cc ELSE
cc Treal = 0.0
cc Timag = 0.0
cc ENDIF
C :ramp function:
Treal = I - 1.0
Timag = Treal + 0.5
IF (REALLY .ne. 'real') THEN
X(I) = CMPLX (Treal, Timag)
X2(I) = CMPLX (Treal, Timag)
X3(I) = CMPLX (Treal, Timag)
ELSE
X(I) = CMPLX (Treal,0.0)
X2(I) = CMPLX (Treal,0.0)
XR(I) = Treal
XR1(I) = Treal
XR2(I) = Treal
XR3(I) = Treal
ENDIF
100 CONTINUE
C
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
CALL fft (X2, M, N)
cc Subroutine fft is Decimation-In-Time, Fortran version.
CALL dirr(XR,M,N,W,1)
c (Assuming dirr produces inplace result, items 0:N/2 complex results)
ccccccccccccccccccccccccccccccccccccccc
IF (IREV .ne. 0) THEN
IF (TIMEIT .eq. 0) THEN
call vcompare(XR,X2,N/2+2)
call cmags(XR,N/2+1,ASQR)
c cmags to take squared magnitude of complex values in X
call cmags(X2,N,ASQR2)
c----------------------c
C print non-zero results:
J=0
DO 700 I = 1,N/2+1
IF ((ASQR(I) .GT. 1.0) .OR. (ASQR2(I) .GT. 1.0)) THEN
WRITE (6,22) (I-1), ASQR(I), ASQR2(I)
22 FORMAT (' I-1=',I4,' ASQR(I)= ',F14.2,' ASQR2(I)= ',F14.2//)
J = J+1
IF (J .GT. 32) GOTO 725
ENDIF
700 CONTINUE
725 CALL TIME
ENDIF
ENDIF
IF (TIMEIT .ne. 0) THEN
ccccccccccccccccccccccccccccccccccccccc
cc- Timing loop follows:
print *,' Start Ass.FFT'
IF (CACHETIME .eq. 0) THEN
DO 500 I = 1, IT,4
C Reuse same array, so cache fill and writeback time NOT included.
CALL dirr(XR, M, N,W,IREV)
CALL dirr(XR, M, N,W,IREV)
CALL dirr(XR, M, N,W,IREV)
500 CALL dirr(XR, M, N,W,IREV)
ELSE
DO 504 I = 1, IT,4
C Alternating between XR,XR1,XR2,XR3 should provide cache misses.
CALL dirr(XR, M, N,W,IREV)
CALL dirr(XR1, M, N,W,IREV)
CALL dirr(XR2, M, N,W,IREV)
504 CALL dirr(XR3, M, N,W,IREV)
ENDIF
print *,' END Ass. FFT'
ccccccccccccccccccccccccccccccccccccccc
ENDIF
STOP
END
c----------------------c
subroutine vcompare(res,exp,n)
c VCOMPARE compares 2 vectors, prints out 1st few miscompares
c
integer n, errcnt
real res(n), exp(n)
write(6,12)
12 format('*** VCOMPARE: vector comparison beginning ***')
data errcnt/0/
do 30 i = 1,n
if(AINT(res(i)) .ne. AINT(exp(i))) then
c (print out error, exit if alot already)
120 print *,'*** Error in compares ***'
write(6,121) i
121 format (' Item number = ',I6)
write(6,124) res(i), exp(i)
124 format (' Res_=',F14.2,' Expected_=',F14.2)
errcnt = errcnt + 1
if (errcnt .gt. 19) then
return
end if
end if
30 continue
190 if (errcnt .eq. 0) then
print *,' *** vector compares SUCCESSFUL ***'
end if
99 return
end
c----------------------c
C--------------
C file: fft.f
C FFT routine from Rabiner & Gold, 1975, who copied it
C from Cooley, Lewis, Welch
C 6/02/89
C
C Decimation in Time, radix-2, inplace, 1-dimen
C Inputs:
C A= complex array of input, up to 1024 pts, single-prec float
C   (maybe more than 1024, uncertain what limit is)
C M= log of number of pts
C  = (Number of stages of FFT)
C N = number of points, ie, N= 2**M = number of pts
C
C Outputs:
C A= complex fft of input A, in NON-bit-reversed order.
C
C w (twiddle factor) calculated by recursion. Supposedly takes 15% more
C operations than keeping entire twiddle array as constants pre-allocated.
C
subroutine fft(a,m,n)
integer m,n, i, j,k, ndiv2,powers2(0:10)
integer iplus,offset, stage, index1, groups
complex a(n),wtemp(2),w(11),temp
C Init twiddle factor array w() with (cos,-sin) of pi,pi/2,pi/4,...
data w(1) /(-1.0,0.0)/
data w(2) /(0.0,-1.0)/
data w(3) /(0.7071068,-0.7071068)/
data w(4) /(0.9238795,-0.3826834)/
data w(5) /(0.9807853,-0.1950903)/
data w(6) /(0.9951847,-0.0980171)/
data w(7) /(0.9987955,-0.0490677)/
data w(8) /(0.9996988,-0.0245412)/
data w(9) /(0.9999247,-0.0122715)/
data w(10) /(0.9999812,-0.0061359)/
data w(11) /(0.9999953,-0.003068)/
data powers2 /1,2,4,8,16,32,64,128,256,512,1024/
C Powers2 to avoid calls to POW, DIV
C Setup for bit-reversal loop:
ndiv2 = n / 2
j = 1
C-----------
C "DO 7" loop to in-place-bit-reverse-shuffle input
DO 7 i= 1, n-1
IF (i .lt. j) THEN
temp = a(j)
a(j) = a(i)
a(i) = temp
ENDIF
k = ndiv2
C "While (j .gt. k)" /*decrease j by 2**something */
6 IF (j .gt. k) THEN
j = j-k
k = k / 2
GOTO 6
ENDIF
C Add next lower power of 2 to j
7 j = j+k
C-----------
C Special case for stage 1: no complex multiplies, simple add
C (Performance enhancement)
groups = 2
offset = 1
index1 = 1
C i-loop iterates N/2 times for 1st stage (and would do twice N/4 x for 2nd)
CVD$ NODEPCHK
DO 8 i = 1,n,2
iplus = i + 1
temp = a(iplus)
a(iplus) = a(i) - temp
8 a(i) = a(i) + temp
C-----------
C Special case for stage 2: no complex multiplies, simple add
C (Performance enhancement)
groups = 4
offset = 2
index1 = 1
C i-loop iterates N/4 times for 2nd stage
C 1st call to i-loop,in stage2: index1=1, wtemp(1)=(1,0)
CVD$ NODEPCHK
DO 90 i = 1,n,4
iplus = i + 2
temp = a(iplus)
a(iplus) = a(i) - temp
90 a(i) = a(i) + temp
index1 = 2
CVD$ NODEPCHK
CVD$ NOVECTOR
DO 92 i = 2,n,4
iplus = i + 2
temp = CMPLX(AIMAG(a(iplus)),-REAL(a(iplus)))
a(iplus) = a(i) - temp
92 a(i) = a(i) + temp
CVD$ VECTOR
C-----------
C "DO 20" stage-loop executed once for each of the (m) stages of FFT
C (Except 1st and 2nd stage)
C offset gets 4,8,16,32,64,128,256,...
DO 20 stage = 3,m
groups = powers2(stage)
offset = groups/2
wtemp(1) =(1.0, 0.0)
C One twiddle seed (W) calc per stage.
C We pre-allocated w(12)-array with those values, avoid cos/sin calls
C-------------------
DO 20 index1 = 1,offset
C "DO 10" i-loop does each butterfly of each stage, with varying twiddles
C i-loop iterates N/2 times for 1st stage, N/4 x for 2nd, N/8 x for 3rd
C stage, N/16 x for 4th stage,... 1 time for last stage.
CVD$ NODEPCHK
CVD$ ALTCODE
DO 10 i = index1,n,groups
iplus = i + offset
temp = a(iplus) * wtemp(1)
a(iplus) = a(i) - temp
10 a(i) = a(i) + temp
20 wtemp(1) = wtemp(1) * w(stage)
RETURN
END
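
The structure of fft.f (bit-reversal shuffle, then stages whose twiddle factor advances by repeated multiplication) can be mirrored in a few lines. This is a minimal Python sketch under stated assumptions, not a translation of the listing: the names `bit_reverse_shuffle` and `fft_dit` are introduced here, the per-stage seed is computed with `cmath.exp` instead of the pre-allocated w() table, and stages 1 and 2 go through the general loop rather than the special-cased loops of the Fortran.

```python
import cmath


def bit_reverse_shuffle(a):
    """In-place bit-reversal permutation, following the DO 7 loop (1-based i, j)."""
    n = len(a)
    j = 1
    for i in range(1, n):                # DO 7 i = 1, n-1
        if i < j:
            a[i - 1], a[j - 1] = a[j - 1], a[i - 1]
        k = n // 2
        while j > k:                     # label 6: decrease j by 2**something
            j -= k
            k //= 2
        j += k                           # add next lower power of 2 to j
    return a


def fft_dit(x):
    """Radix-2 decimation-in-time FFT; len(x) must be a power of two."""
    a = bit_reverse_shuffle(list(x))
    n = len(a)
    groups = 2
    while groups <= n:
        offset = groups // 2
        seed = cmath.exp(-2j * cmath.pi / groups)   # plays the role of w(stage)
        w = 1 + 0j
        for index in range(offset):
            for i in range(index, n, groups):
                temp = a[i + offset] * w
                a[i + offset] = a[i] - temp
                a[i] = a[i] + temp
            w *= seed                    # wtemp(1) = wtemp(1) * w(stage)
        groups *= 2
    return a


if __name__ == "__main__":
    print(bit_reverse_shuffle(list(range(8))))
    print(fft_dit([complex(i, 0.5 * i) for i in range(8)]))
```

Because the twiddle is advanced by recursion, rounding error accumulates along each stage; the listing's own comment notes the trade-off against keeping a full pre-computed twiddle array.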
C------------
subroutine cmags(a,n,asqr)
C Complex magnitude squared.
C Inputs:
C A= complex array of input, single-prec float
C N = number of input points (and output points)
C Output:
C asqr = real squared magnitude (R*R + I*I), N elements, single-prec float
integer n,i
real asqr(n)
complex a(n)
DO 100 i = 1, n
asqr(i) = (REAL(a(i))*REAL(a(i))) + (AIMAG(a(i))*AIMAG(a(i)))
100 CONTINUE
RETURN
END
## makefile for i860(tm) CPU FFTs (for Unix V/386 programming environment)
## 8/7/89
##
GH=/usr/i860/bin
GHL=/usr/i860/lib
CC=$(GH)/c860
FC=$(GH)/f860
CFLAGS= -OLM -X393 -X405 -X188 -X370
FFLAGS= -OLM -X370 -X393 -X71 -X422
## -X71 uses single-precision math routines
FLFLAGS= -Mx map -e start
LFLAGS= -Mx map -e _main
CLIB=$(GHL)/libc.a
MLIBPSR=$(GHL)/860mtlib.a
MLIB=$(GHL)/libm.a
FLIB=$(GHL)/libf.a
ASM=$(GH)/as860
FLINK=$(GH)/ld860 $(FLFLAGS)
RT=$(GHL)/s5lib.a
LIBS= $(FLIB) $(MLIBPSR) $(MLIB) $(CLIB) $(RT)
LIBCC= $(MLIB) $(CLIB) $(RT)
## NOTE: Order of linked files is CRUCIAL, other orders may give errors
.SUFFIXES:
.SUFFIXES: .f .c .s .ss .o .8
.IGNORE:
## .ignore causes make to ignore error codes from compilers
## To test Fortran plus assembler-fft-stage version:
FILE= ffttest.o fft.o diff.o bitrev.o difstep.o start.o time.o
## To test all-Fortran version of fft:
##FILE= ffttest.o fft.o diff.o difstepf.o start.o time.o
## To test REAL-input version of fft:
RFILE= real.o fft.o dirr.o realfix.o difstep.o bitrev.o start.o time.o
.f.o:
	$(FC) $(FFLAGS) $*.f
	$(ASM) -x -o $*.o $*.s
.c.o:
	$(CC) $(CFLAGS) $*.c
	$(ASM) -x -o $*.o $*.s
.s.o:
	m4 $*.s > temp2.s
	$(ASM) -x -o $*.o temp2.s
ffttest.S: $(FILE)
	$(FLINK) -o ffttest.S $(FILE) $(LIBS)
real.S: $(RFILE)
	$(FLINK) -o real.S $(RFILE) $(LIBS)
clean:
	rm -f *.o *.8
.ss.o:
	m4 $*.ss > temp.s
	$(ASM) -x -o $*.o temp.s
//start.ss
// 8/18/89
// Fortran runtime startoff routine
//
.text
.globl start
.globl finish
start::
orh h%_stack+262128+262144,r0,sp
or l%_stack+262128+262144,sp,sp
adds -16,sp,sp
st.l r1,12(sp)
call _main
nop
finish::
call
nop
.file "start.c"
.data
.align .quad
.lcomm _stack,262144+262144
.end
//===============================================================
/* file: time.c.
   Purpose: establish a label to use for breakpoints
*/
long
time_(x)
long *x;
{ x = x+4;
  return ((long) x);
}
long
timestop_(x)
long *x;
{ x = x+4;
  return((long) x);
}
APPLICATION NOTE
December 1991
Designing a Memory Bus Controller for the 82495/82490 Cache
MARK ATKINS
ISIC SILAS
CHRIS KARLE
Order Number: 240957-001
Designing a Memory Bus Controller
for the 82495/82490 Cache

CONTENTS
1.0 BACKGROUND
2.0 WHY A CUSTOM BUS INTERFACE?
3.0 GUIDELINES
  Shared Bus Interconnect
4.0 MBC BLOCK DIAGRAM
5.0 DESIGN EXAMPLE: A UNIPROCESSOR MBC
6.0 DESIGN EXAMPLE: A MULTIPROCESSOR MBC
7.0 MBC FUNCTIONS
  MBC Functions for Uni and Multiprocessors
  Reset and Configuration Control
  Intel486 DX CPU Resets
  FLUSH# (and SYNC#)
  Bus Error or Timeout Detection
  Scenarios Requiring MBC Action
  Transfer Tracking
  Clock Boundaries and Synchronization
  Synchronizer Delays
  BRDY# Generation
  Pipelining
  Pipelining the MBC-to-82495
  Pipelining the M-bus
  M-bus Arbitration
  Sequencing
  Flowchart of MBC Algorithm
  Cacheability
  Snooping
  Snoop Handshaking
  Bus Size Adaptation
  Bus Signal Levels
8.0 MBC FUNCTIONS FOR MULTIPROCESSORS
  Snooping Results
  Snoop Window Time
  Read for Ownership
  Cache-to-Cache Transfers
  Snoop Filtering
  Split Transaction
  Memory Cycle Abort
  Locking
  Bus Lock vs. Address Lock
  KLOCK# De-Assertion
  CPLOCK#
9.0 MORE ALTERNATIVES
  Strobed or Clocked M-bus
  Write back
  M-bus Clocking
  Line Size and M-bus Width
10.0 MBC DIFFERENCES FOR i860 XP CPU VERSUS Intel486 DX CPU
11.0 SUMMARY
12.0 BIBLIOGRAPHY
APPENDIX A: Questions and Answers on MBC Design
APPENDIX B: Intel486 DX CPU Uniprocessor MBC Design
APPENDIX C: i860 XP CPU Dual-Processor MBC

FIGURES
Figure 1 CPU + 82495 + 82490 Systems
Figure 2 CPU + 82495 + 82490 Core
Figure 3 System Type and Bus Requirements
Figure 4 Generic Block Diagram of MBC
Figure 5 Block Diagram of Uniprocessor MBC
Figure 6 Block Diagram of Multiprocessor MBC
Figure 7 Synchronizer Hardware
Figure 8 Data Transfers, M-bus Width = CPU bus, MCLK = CLK
Figure 9 Data Transfers, M-bus Width = CPU bus, MCLK < CLK
Figure 10 Data Transfers, M-bus Width = 2*CPU bus
Figure 11 Data Transfers, M-bus Width = 4*CPU bus
Figure 12 Data Transfers for Non-Pipelined M-bus
Figure 13 Data Transfers for Pipelined M-bus
Figure 14 MBC Signals and Protocol Layers
Figure 15 Creating Snoop Results
Figure 16 Snoop Waveforms
Figure C-1 Pinout Environment of MBC
Figure C-2 Non-Aborted Read Cycles
Figure C-3 Aborted Non-Pipelined Cycles
Figure C-4 Aborted Pipelined Cycles
Figure C-5 Potentially Allocatable Write
Figure C-6 Non-Allocatable Write
Figure C-7 Interprocessor Communications in Two Processor System
Figure C-8 Extension Glue
State Diagrams
PLD Codes
Appendix C Schematic

TABLES
Table 1 Functions of the Memory Bus Controller
Table 2 Clocked vs. Strobed M-bus Tradeoffs
AP-452
1.0 BACKGROUND
The Intel 82495 Cache Controller and 82490 Cache
RAM form a high-speed cache subsystem for the
Intel486 DX CPU (82495DX/490DX) or the i860 XP
CPU (82495XP/490XP). The reader should be familiar
with these chips, as described in:
1) i860 XP CPU Microprocessor Data Sheet (Intel order #240874)
2) Intel486 DX Microprocessor Data Sheet (Intel order #240440)
3) 82495XP Cache Controller/82490XP Cache RAM
Data Sheet (Intel order #240956, June 1991)
or Intel486 DX CPU Microprocessor Cache-Chip
Set Data Sheet (Intel order #241084, June 1991)

Diagrams of systems containing the 82495 and 82490
appear in Figure 1, and a more detailed diagram of the
CPU/82495/82490 core appears in Figure 2. (Note: for
simplicity, the 82495XP/82490XP and 82495DX/82490DX
will be referred to generally as 82495/82490; the XP or DX
suffix should be inferred from the CPU being used.) In such
systems, the 82495 controls a cache external to the CPU, and
includes the cache tags. It can interface gluelessly to an
Intel486 DX CPU or i860 XP CPU microprocessor,
allowing the processor bus to run at 50 MHz with zero
wait-states, while the memory bus can remain at a lower
frequency. Both writeback and writethrough protocols are
supported. Operations can proceed concurrently on the local
CPU bus and the shared memory bus. All requisites for
multiprocessors are included in the 82495, Intel486 DX CPU,
and i860 XP CPU, but the 82495 is also useful as a
performance enhancement for a uniprocessor system.

The 82490 cache RAM contains 32 kBytes per chip,
and is used in groups of 4, 8, or 16 to implement caches
from 128 to 512 kBytes. It supports two-way associativity,
delayed writebacks, burst transfers, and boundary
scan test. The 82490 contains much more than RAM
cells: it includes various buffers, queues, and support
for several bus protocols. It is two-ported, with simultaneous
access on both the CPU side and the Memory-Bus
side. The cache optionally supports parity using additional
82490 chips.

Configuration options allow a variety of memory bus
widths (32 to 144 bits), cache line widths (16 to 128
bytes), and asynchronous or synchronous transfers.
The configuration is selected by the polarity of various
pins at reset time.
[Figure 1 panels: 1. Uniprocessor; 2. Homogeneous Multiprocessor; 3. Heterogeneous Multiprocessor]
Figure 1. CPU + 82495 + 82490 Systems

Figure 2. CPU + 82495 + 82490 Core
The Memory Bus Controller (MBC) portion of the system interfaces
the 82495 and CPU to the system bus. The MBC converts bus status and
command lines into requests to the 82495, for example, to monitor the
progress of an ongoing bus transaction from another CPU subsystem to
ensure consistency with the 82495 + 82490 cache contents. Likewise,
the MBC adapts 82495 requests to the bus protocol and arbitrates for
ownership of the bus. Most CPU requests will not require MBC
action; only I/O cycles, cache bypass requests, and
82495 cache misses are forwarded by the 82495 to the
MBC, while external cache hits are handled entirely by the
82495 + 82490.
2.0 WHY A CUSTOM BUS INTERFACE?
Clearly the entire interface to a memory bus (abbreviated M-bus)
could have been incorporated in the 82495 and 82490 chips. This
approach has been followed by some other cache chipsets.
However, such integration suffers from inflexibility and
bandwidth limitations. As shown in Figure 3, the performance and
cost targets of the system determine the size and complexity of the
bus, so if the bus is "hardwired" into the cache controller chip, it
will be too costly for small systems and too slow for larger systems.
With the bus interface implemented separately, it
can be a complex ASIC for a high-bandwidth complex
system, or a few EPLDs for a PC. The same cache
controller can improve performance of a variety of bus-based CPUs.
For a desktop PC, a 32-bit simple memory bus is adequate. For a
workstation or small multiprocessor of
two CPUs, a faster 64-bit bus may be required to give
adequate bandwidth for graphics frame buffers and intensive numeric
calculations. Bus bandwidth requirements grow as the MIPS rating of
each CPU in a system grows; for example, a bus adequate for 12 386
CPUs may be too slow for 6 Intel486 DX CPUs, as
they process far more data per second.
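
A rough feel for how bus width and clock rate trade off can be had from a peak-rate calculation. This is a minimal Python sketch, not a figure from the application note: the function name `peak_bandwidth_mbytes_per_s` and the `clocks_per_transfer` parameter are assumptions introduced here, and the result is an upper bound that ignores arbitration, address cycles, and wait states.

```python
def peak_bandwidth_mbytes_per_s(width_bits, clock_mhz, clocks_per_transfer=1):
    """Peak data rate of a bus in MBytes/s, one transfer every N clocks."""
    return (width_bits / 8) * clock_mhz / clocks_per_transfer


if __name__ == "__main__":
    # Widths and clock rates of the kind discussed in the text.
    for width, mhz in [(32, 33), (64, 33), (128, 33)]:
        print(width, "bits at", mhz, "MHz:",
              peak_bandwidth_mbytes_per_s(width, mhz), "MBytes/s peak")
```

Doubling the width doubles the peak rate at the same clock, which is why wider buses appear as the CPU count grows.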
A large multiprocessor of 6 or more CPUs needs a wide
and fast bus such as Futurebus+, with split-transaction capability
to prevent bus bottlenecks from slowing
the performance of every processor. Hierarchies of buses and caches
can further allow more CPUs with reasonable performance increases as
CPUs are added. A Futurebus+ hierarchy maintains concurrent
transactions on each bus, and "bridge" caches at the junctions
of buses echo them from bus to bus when the bridge
detects that one transaction may affect cached copies
on the other bus.
Compatibility with existing buses is often crucial in
product design, so that new faster components can plug
into existing machines and I/O devices. The flexible
82495/82490 bus interface allows compatibility as well
as extension.
Thus the 82495 and 82490 will be used in a wide variety
of systems, including standard buses like Futurebus+.
For proprietary buses, the "proprietor" can design an
ASIC or PAL MBC incorporating the required features.
3.0 GUIDELINES
This document exists to clarify the necessary components and tradeoffs of a Memory Bus Controller. The
example designs here have not been tested, and signal
definitions of the i860 XP CPU, Intel486 DX CPU,
82495, and 82490 chips are subject to change.
The memory bus controller is not allowed to use (and
thus add capacitance to) any of the CPU pins used by
the 82495/82490, except those listed in the 82495 Data
Sheet [82495/490DS] description of the BLE# pin.
Only the CPU pins BE7-0#, PWT, PCD, LEN,
CACHE#, BRDY#, PCYC, and CTYP have sufficient timing margin to tolerate the MBC load.
[Figure 3 chart: system types (Uniprocessor; Small, 2-3 CPUs; Medium, 4-8 CPUs; Large, 8+ CPUs) against features (CPU<->Memory Interconnect; Bus Width, Frequency; Cache; Arbitration; Error Detect; Bus Protocol; LOCKing; Extra Features)]
Figure 3. System Type and Bus Requirements
Shared Bus Interconnect
When used in a multiprocessor, the 82495 assumes a
shared-memory, shared-bus environment so that it can
observe and "snoop" accesses by others which might
conflict with the memory locations it has cached. In a
crossbar or other multipath interconnect, shared-bus
coherency can be emulated for the 82495 or it can be
used non-coherently. Either a centralized directory or a
hierarchy of buses and caches can do the emulation. A
directory would keep a record, for each line of main
memory, of caches which have the line. When a cache
first writes to a line of memory, the central directory
broadcasts an invalidation message to all other caches
containing that line. [Agarwal88]

4.0 MBC BLOCK DIAGRAM
Shown in Figure 4 is a high-level block diagram of the
functions and interfaces involved in the Memory Bus
Controller. Part of the MBC operates on the high-speed
clock (CLK) which the CPU and 82495 use. While the
M-bus could use the 50 MHz CPU CLK, such a fast
M-bus is hard to design. The part of the MBC which
interacts with the memory bus protocol runs on an M-bus
clock (MCLK), if that protocol is clocked. Also
possible is an unclocked M-bus protocol using the
82495/82490 in "strobed" mode. The MBC contains
synchronizers and a few signals which cross between
the two clock domains. Synchronizers, consisting of
specially-designed flip-flops, allow a clocked state machine
to use data which may be transitioning near the
edge of the clock. Unsynchronized data can cause
metastability in latches, where their output changes
slowly and unpredictably.
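
The timing effect of a synchronizer can be illustrated with a small behavioural model. This is a minimal Python sketch under stated assumptions, not the MBC's circuit: the class name `DualRankSynchronizer` and the helper `synchronize` are introduced here, the model assumes two flip-flops in series on the destination clock, and it shows only the added latency. A software model of this kind cannot reproduce metastability itself.

```python
class DualRankSynchronizer:
    """Two flip-flops in series, both clocked by the destination clock."""

    def __init__(self):
        self.ff1 = 0
        self.ff2 = 0

    def clock(self, async_in):
        """Advance one destination-clock edge; return the synchronized output."""
        self.ff2 = self.ff1      # second rank takes the first rank's settled value
        self.ff1 = async_in      # first rank samples the asynchronous input
        return self.ff2


def synchronize(samples):
    """Feed one input sample per destination-clock edge and collect the outputs."""
    sync = DualRankSynchronizer()
    return [sync.clock(s) for s in samples]


if __name__ == "__main__":
    print(synchronize([0, 1, 1, 1, 0, 0]))
```

In this model a change on the input is visible at the output one destination-clock edge after the edge that first sampled it, which is the latency a state machine on the other side has to budget for.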
[Figure 4 blocks: BE LATCH (74F377); SNOOP LOGIC; CYCLE TRACKING; SPECIAL CYCLE TRACKING; READY LOGIC; RESET AND CONFIGURATION CONTROL; ARB; MEMORY BUS. Some blocks are marked MULTIPROCESSOR ONLY.]
Figure 4. Generic Block Diagram of MBC
5.0 DESIGN EXAMPLE: A UNIPROCESSOR MBC
A simple MBC design example is an adapter to allow
plugging a daughtercard module with an Intel486 DX
CPU, 82495, and 4 82490s into an Intel486 DX CPU
microprocessor PGA socket. The memory bus is an
Intel486 DX CPU bus, allowing the external cache to
be a performance enhancing option. It assumes a "divided
synchronous" M-bus clock, where the M-bus
runs at 1/2 the CPU CLK speed. Thus no synchronizers
are needed. The MBC uses both the CPU CLK and the
M-bus MCLK.
This design requires
• 1 74F377 latch
• 6 PLDs containing 10 state machines
• 2 chips for clock generation, not part of the MBC
Approximately 70 signal pins connect the MBC block
to the CPU, cache, and memory. Only a uniprocessor is
supported, although the bus protocol and MBC could
be enhanced for multiprocessing coherency. Figure 5
shows a block diagram. Details of the design can be
found in Appendix B.

Figure 5. Block Diagram of Uniprocessor MBC

6.0 DESIGN EXAMPLE: A MULTIPROCESSOR MBC
An i860 XP CPU multiprocessor-capable MBC (Figure
6) using an M-bus similar to the i860 XP CPU bus is
proposed. For clocking, it uses an MCLK of 33 MHz,
totally asynchronous to the 50 MHz CPU CLK. It
could therefore be upgraded to faster CPU CLK rates
in the future without changing the design or M-bus.
The design requires:
• 2 74F377 octal latches (for BE7-0#, etc.)
• 2 74AS4374 dual-rank-synchronizer octal registers
• 16 PLDs
• 2 GA1110 clock drivers for clock distribution
These components could be integrated into a single
ASIC chip, as about 120 signals connect to the MBC.
The MBC can be used for a uniprocessor or multiprocessor
i860 XP CPU design. Details can be found in
Appendix C.
7.0 MBC FUNCTIONS
Table 1 shows the responsibilities of the Memory Bus
Controller for uniprocessors and multiprocessors (MP).
The multiprocessor features exist mainly to prevent bus
over-utilization. However, some of the jobs common to
both are more complex in MP; for example, arbitration
and snooping. The pin lists in the table are not exhaustive.

Table 1. Functions of the Memory Bus Controller

MBC Functions for Uni and Multiprocessors              Pins
   1. RESET and Configuration                          RESET, HOLD, CAHOLD
   2. FLUSH# and SYNC#                                 CAHOLD, FSIOUT#, FLUSH#, SYNC#
** 3. Bus Error Detection, Retry                       PCHK#, BERR
   4. CPU transfer tracking (burst count)              CLEN1:0, RDYSRC, BRDY#
   5. M-bus transfer tracking (burst count)            CRDY#, MBRDY#
      (including write back, allocation)
   6. Synchronization between clock domains            BGT#, CADS#, MBRDY#
** 7. Memory-bus pipelining                            BGT#, CNA#, MEOC#, CRDY#
** 8. MBC-to-82495 pipelining                          CNA#, MALE
   9. Memory Bus Arbitration                           BGT#
  10. Cacheability decode                              KWEND#, MKEN#
**11. Redrive bus signals for BTL or ECL levels
      or heavy capacitive loads
**12. Packing (convert 32-bit M-bus for 64-bit         MBRDY#
      82490 size, or 8-bit ROM)
**13. Bus messages (interrupts, flushes)               INT(R), FLUSH#
**14. Boundary scan and selftest                       TCK, TMS, SLFTST#
**15. Performance monitoring (M-bus utilization,       CW/R#, CADS#
      read vs. write)
  16. Snoop handshake (snooping DMA or other CPU)      SNPSTB#, SNPCLK, SNPCYC#
  17. Snoop writebacks                                 MHITM#, SNPADS#

Additional MBC Functions for Multiprocessors           Pins
   M1. Snoop window (as master)                        SWEND#, MWB/WT#
   M2. Backoff 82495 when request was to M-line        MAOE#
       in another 82495
** M3. Snoop filtering (via SMLN#)                     SMLN#
** M4. Cache-to-cache transfers (CTCT)                 DRCTM#, MBAOE#
** M5. Read-For-Ownership (RFO)                        PALLC#, DRCTM#, MFRZ#
** M6. Split transactions (requires duplicate          CWAY
       tag array)
** M7. Memory cycle abort (after MHITM#)               MHITM#
   M8. LOCK# protection                                KLOCK#, CAHOLD, SNPCYC#
** M9. LOCK# de-assertion (for back-to-back            KLOCK#
       Intel486 DX CPU locks)
**M10. CPLOCK# (Intel486 DX CPU only)                  CPLOCK#
**M11. Snoop during LOCK#                              KLOCK#
**M12. Multiprocessor Interrupts (for Message-Based    INT, NMI (BERR)
       Interrupts or TLB shootdown)

** = optional and implementation dependent

MBC Functions for Uni and Multiprocessors
Reset and configuration control includes strapping of
the following pins to resistors at Vcc or Ground, or
"temporary strapping" of multifunction pins whose
state during the last 16 clocks before falling edge of
RESET determines 82495, 82490, or CPU configuration.
The circuit feeding RESET to these chips should
keep it active at least 16 CLK periods. "Temporary
strapping" means including RESET or /RESET in
the logic equation for the pin. The multifunction pins
are indicated with brackets [ ] below:
i860 XP CPU pins:
PEN#, FLINE#, HOLD
Intel486 DX CPU pins:
RDY#, BOFF#, BS8#, BS16#, HOLD, FLUSH#
82495 pins:
CFG3, CFG2[KWEND#], CFG1[SWEND#],
CFG0[CNA#], CPUTYP[HITM#],
FPFLDEN[FPFLD#], NCPFLD#[FLUSH#],
SNPMD[SNPCLK], C490LDRV[BGT#],
MEMLDRV[SYNC#], SLFTST#[CRDY#], TEST,
HIGHZ#[MBALE], CACHE# (NOTE: the
FPFLDEN pin is defined for Intel486 DX CPU as
PLOCKEN[CPLOCK#]. The 82495XP does not use
CFG3 for configuration in i860 XP CPU systems.)
82490 pins:
MTR4/TR8#[MSEL#], MX4/MX8#[MZBT#],
MSTBM[MCLK], MEMLDRV[MFRZ#], PAR#,
MOCLK, (BOFF#, HITM#)

Intel486 DX CPU: The "unused" Intel486 DX CPU
inputs (RDY#, BS8#, BS16#, BOFF#) with 82495
should be connected as described in the Intel486 DX
CPU Chipset EDS.
The Intel486 DX CPU FLUSH# input should be tied
up, unless the system requires FLUSH messages from
the M-bus to be interpreted. Then the MBC must assert
the FLUSH# inputs to both Intel486 DX CPU and
82495, because 82495 does not do back-invalidates to
the Intel486 DX CPU for FLUSH#. During RESET,
the Intel486 DX CPU FLUSH# input must be kept
high to avoid putting the CPU in tristate-output-test-mode
(Intel486 DX CPU Data Sheet Section 8.4).

i860 XP CPU: The i860 XP CPU input PEN# (Parity
trap ENable) must be strapped high unless the memory
data bus feeding the 82490s always contains good parity
and the i860 XP CPU system uses 2 82490s in parity
mode; in the latter case, strap PEN# low. HOLD
should be strapped low and FLINE# strapped high, as
those features cannot be used with 82495.

82495: The multiplexed 82495 pin FPFLDEN
[FPFLD#] becomes an output after RESET, so the
PAL or ASIC which creates FPFLDEN must float it
as soon as RESET = 0. The same multiplexing applies
to Intel486 DX CPU mode, where the pin is named
PLOCKEN[CPLOCK#]. Likewise, the multiplexed
input FLUSH#[NCPFLD#] should be driven high
the same clock that RESET falls, to prevent an unnecessary
82495 cache flush. In Intel486 DX CPU systems,
the 82495 input CACHE# must be tied low and
HITM#[CPUTYP] must be tied low, as it signals
CPUTYPE to 82495.

82490: The 82490DX inputs HITM# and BOFF#
must be tied high in an Intel486 DX CPU system, as
they exist to support the i860 XP CPU writeback
cache. With an i860 XP CPU, the 82490XP input
BOFF# comes from 82495XP but HITM# from i860
XP CPU feeds 82495XP and 82490XP.
The 82490 input MOCLK must also be tied low or to a
delayed version of MCLK, if clocked-M-bus mode is
used. This is because the 82490 senses the state of
MOCLK after RESET ends-if MOCLK stays low,
the 82490 uses MCLK to drive MDATA. If MOCLK
toggles after RESET, the 82490 will use MOCLK to
switch output data. Using a delay-line externally to the
82490 to generate MOCLK from MCLK allows the
design a longer hold-time at other receivers of MDATA in the system. For a clocked M-bus (non-synchronous to CLK), the undelayed MCLK should be connected to the 82495's SNPCLK input and should be
toggling during RESET to tell the 82495 to snoop in
clocked mode.
During RESET, the 82495 and 82490 will float the bidirectional lines they share with the CPU, such as
CDATA and A31:A3. Thus driver contention is avoided. The RESET input should be synchronous to CLK
and deasserted to the 82495, 82490s, and CPU at the
same time, to assure that the configuration controls get
properly passed between them.
For Intel486 DX CPU resets, refer to [82495/490DS]
for the sequencing of HOLD, HLDA, CAHOLD, and
RESET required to reset only the processor without
destroying 82495 cache contents. For that purpose, a
separate RESET line is advised for the CPU and
82495/82490. The CPU RESET line must be wired to
the WRMRST input of 82495, to force 82495 to assert
the BRDYI# input to the CPU during a CPU-only reset
(the CPU uses the BRDYI# input during RESET
to know of the 82495's existence). The HOLD input of
the Intel486 DX CPU and i860 XP CPU processors
should be kept low during normal operation with the
82495, because floating the processor outputs may yield
undefined 82495 behavior.
FLUSH# (and SYNC#) of caches requested by software must be decoded from the 82495 outputs CM/IO#,
CD/C#, and CW/R# (=001) and latched
BE3-0# from the CPU. BE3-0# values of 0111 or
1101 should activate the 82495 FLUSH# input, as the
Intel486 DX CPU outputs them in response to the
WBINVD and INVD instructions, respectively. Sync
and flush commands may also come from the bus as a
message in a multiprocessor system. The 82495 is smart
enough to allow assertion of FLUSH# or SYNC# at
any time, and will delay the beginning of the flushing
action until all current CPU and M-bus cycles have
completed. The inputs are edge-sensitive. If the bus defines
cache flush messages, the MBC may activate the
Intel486 DX CPU FLUSH# input as well as the
82495's in response to bus message decodes.
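The software-flush decode described above can be sketched as a small truth function. This is only an illustration of the stated encoding (CM/IO#, CD/C#, CW/R# = 001 with BE3-0# of 0111 or 1101); the function name and the integer representation of the pins are assumptions made for the example, not part of any Intel design.

```python
# Sketch of the software-flush decode (hypothetical helper, not Intel logic).
# Pin levels are modeled as integers: 1 = high, 0 = low.

FLUSH_BYTE_ENABLES = {0b0111, 0b1101}  # BE3-0# values named in the text


def should_assert_flush(cm_io, cd_c, cw_r, be3_0):
    """Return True when the latched cycle should activate the 82495 FLUSH# input.

    cm_io, cd_c, cw_r: levels of the 82495 outputs CM/IO#, CD/C#, CW/R#.
    be3_0: the latched CPU byte enables BE3#..BE0# packed as a 4-bit integer.
    """
    is_special_cycle = (cm_io, cd_c, cw_r) == (0, 0, 1)
    return is_special_cycle and be3_0 in FLUSH_BYTE_ENABLES
```

In a PLD the same condition would be a single product term per byte-enable pattern; the Python form is only meant to make the encoding easy to check.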
Bus Error or Timeout Detection logic in the MBC can
use the CPU's PCHK# output or other M-bus-specific
signals to detect errors. Note that the assertion of
PCHK# will occur near the time of the error on the
M-bus ONLY for non-cacheable reads or 82495-cache-miss
reads. For 82495-hits and CPU-idle cycles,
PCHK# may arise due to a floating or erroneous CPU
data bus value transferred on the M-bus much earlier.
PCHK# must be ignored by the MBC except during
the CLK after data transfer to the CPU was signalled
by the MBC's CPU BRDY#, because PCHK# indicates
i860 XP CPU bus parity status at all times, not
just during clocks of BRDY# activation. The processor
inputs INT, BERR, or NMI can be asserted by the
MBC to signal errors. To detect errors originating in
the CPU or 82490 upon a write(back), the MBC can
check parity on the 82490 MDATA pins or on the M-bus.
If the memory bus includes a retry protocol, the MBC
bears the responsibility to implement it, because the
82495 will not retry accesses. For a pipelined MBC interface
when the retry occurs after CNA# to the
82495, the MBC must latch the address and other controls
(CW/R#, CM/IO#, etc...) from the 82495 to use
in retries. Retry should be triggered by signals other
than the CPU PCHK# output, because the CPU data
transfer cannot be retried although the M-bus transfer
can.
The 82490 can restart a burst data transfer (for the case
of an error detected after the first MBRDY# but before
MEOC# and before CRDY#). To restart the
82490, the MBC must deassert MSEL# for at least 1
MCLK.
While parity is supported by the 82495 and 82490,
ECC (Error Correcting Codes) cannot conveniently be
used within the cache. ECC can be implemented on the
memory system, but no loads are permitted on the
CPU-to-82495/82490 interface wires for error checking
logic.
Scenarios requiring MBC action are:
1) CPU based requests ("Master" mode):
• 82495 cache read miss (and line fill)
• 82495 cache write miss
• Non-cacheable CPU read (including i860 XP CPU pfld)
• Writethrough (to S-state line) or Non-cacheable CPU write
• I/O reads and writes
• LOCKed reads and writes (will be readthrough or writethrough)
2) 82495 based requests ("Master" mode):
• Allocation due to write-miss (line fill)
• Replacement writebacks
• SNPADS# writebacks
3) Requests from other masters ("Slave" mode):
• Snooping of DMA accesses
• Snooping of accesses of other CPUs (in a multiprocessor)
• Bus-specific requests, like interrupt messages, reset
requests, cache flushes, configuration registers, ID
registers, timeout detection, acknowledgements,
TLB shootdown
Transfer Tracking
Tracking of transfers on the M-bus and CPUbus is required
of the MBC during all of the above scenarios.
This tracking (counting) of transfers involves activating
BRDY# the correct number of times for the CPU and
MBRDY# (a possibly different number) for the 82495
and 82490. Transactions on the CPUbus which must be
MBC-controlled can be 1, 2, or 4 data transfers, decoded
from the BLE#-latched CPU pins:
Intel486 DX CPU: BE3-0#, PWT, PCD
i860 XP CPU: BE7-0#, PWT, PCD, LEN, CACHE#
and from the 82495 pins CW/R#, MCACHE#,
RDYSRC (and CLEN1:CLEN0 for Intel486 DX CPU
mode).
See [82495/490DS] for a complete definition of the encodings.
The BRDY# activations must be done only if
RDYSRC = 1, and always correspond to the first 1, 2,
or 4 MBRDY#s for the 82490-M-bus interface. The
number of MBRDY#s always exceeds or is equal to
the number of BRDY#s, even for a 128-bit M-bus.
Bursts for line fills and writebacks on the CPUbus always are 4 transfers, but with some 82495 configurations the M-bus is 8 transfers. The addresses are non-sequential when the first access is not at the zeroth word
of the line. The addresses corresponding to each
BRDY# and MBRDY# follow these rules:
1) CPU burst addresses wrap at CPU line length.
2) When the line address is odd (A2 = 1 for 4-byte bus;
A3 = 1 for 8-byte bus; A4 = 1 for 16-byte M-bus), the
next address transferred on CDATA and MDATA
is the LOWER address (e.g., 3 followed by 2). The
odd-first-then-even pattern continues for all transfers
of the burst. This order optimizes interleaved
DRAM systems, and applies to both the M-bus and
CPUbus.
3) 82490 bursts on CDATA wrap at CPU line length.
82490 MDATA burst addresses wrap at 82490 line
length. For example, a linefill with LR = 4 and a first
Intel486 DX CPU address (A5:A2) = E,
• 82490 CDATA ordering is E F C D
• 82490 MDATA ordering is CDEF 89AB 4567 0123
(128-bit M-bus) OR EF CD AB 89 67 45 23 01
(64-bit M-bus)
For LR = 2 (Line Ratio of 82495 to CPU) and CPUbus width = M-bus, below are the burst orders. Each address
corresponds to one 4-byte transfer (for Intel486 DX CPUs) or 8 bytes (for i860 XP CPU). Time is increasing left-to-right:

First Address   CPU transfers   M-bus transfers
      0           0 1 2 3       0 1 2 3 4 5 6 7
      1           1 0 3 2       1 0 3 2 5 4 7 6
      2           2 3 0 1       2 3 0 1 6 7 4 5
      3           3 2 1 0       3 2 1 0 7 6 5 4
      4           4 5 6 7       4 5 6 7 0 1 2 3
      5           5 4 7 6       5 4 7 6 1 0 3 2
      6           6 7 4 5       6 7 4 5 2 3 0 1
      7           7 6 5 4       7 6 5 4 3 2 1 0

For LR = 2 and M-bus = 2*CPUbus width (both buses using 4 transfers):

First Address   CPU transfers   M-bus transfers
      0           0 1 2 3       01 23 45 67
      1           1 0 3 2       01 23 45 67
      2           2 3 0 1       23 01 67 45
      3           3 2 1 0       23 01 67 45
      4           4 5 6 7       45 67 01 23
      5           5 4 7 6       45 67 01 23
      6           6 7 4 5       67 45 23 01
      7           7 6 5 4       67 45 23 01

The remaining transfer orderings for other LR values
can be generated similarly, as an exercise for the reader.
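Every ordering listed above is consistent with XOR-ing the starting index with a transfer counter, which gives one way to generate the orderings for other LR values. The sketch below is an illustration under that observation; the function name, its parameters, and the restriction to power-of-two transfer counts are assumptions of the example, not a statement of how the 82490 is implemented.

```python
# Sketch: generate burst orders by XOR-ing the first index with a counter.
# Assumes power-of-two burst lengths, as in all the cases tabulated above.

def burst_order(first, line_ratio=2, cpu_transfers=4, mbus_width_ratio=1):
    """Return (cpu_order, mbus_order) for a burst starting at `first`.

    first: index of the first CPU-sized unit requested within the 82490 line.
    line_ratio: LR, the ratio of 82495 line length to CPU line length.
    mbus_width_ratio: M-bus width divided by CPUbus width (1, 2, or 4).
    Each M-bus transfer is returned as a tuple of the CPU-sized units it carries.
    """
    cpu_order = [first ^ i for i in range(cpu_transfers)]
    units_per_line = cpu_transfers * line_ratio
    mbus_transfers = units_per_line // mbus_width_ratio
    first_group = first // mbus_width_ratio
    mbus_order = []
    for i in range(mbus_transfers):
        group = first_group ^ i
        low = group * mbus_width_ratio
        mbus_order.append(tuple(range(low, low + mbus_width_ratio)))
    return cpu_order, mbus_order
```

For example, `burst_order(0xE, line_ratio=4, mbus_width_ratio=2)` reproduces the 64-bit M-bus case of rule 3: CDATA order E F C D and MDATA order EF CD AB 89 67 45 23 01.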
For requests originated by the 82495, the MBC must
ignore the CPU pins (CACHE#, LEN, PWT, PCD,
PCYC, CTYP, and BE7#-BE0#). These requests are
writebacks, allocations, or linefills. Also the MBC must
prevent the transfer of those signals to the M-bus for
82495 requests; for example, it must force all BE7#-BE0# active during writebacks. The 82495 based requests can be recognized by:
RDYSRC = 0 .AND. MCACHE# = 0 (for writebacks, linefills, allocations)
RDYSRC = 0 .AND. MCACHE# = 0 .AND.
MKEN# = 0 (for linefills, allocations)
For posted write requests (RDYSRC = 0 and
MCACHE# = 1), the length is 1, 2, or 4 transfers and
the MBC must heed the BLE#-latched BE7-0#,
LEN, and CACHE#.
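The recognition conditions above can be written as a small classifier. This is a minimal sketch: the function name and the returned labels are hypothetical, and only the RDYSRC and MCACHE# conditions quoted in the text are modeled.

```python
# Sketch: classify a request from RDYSRC and MCACHE# (hypothetical labels).

def classify_request(rdysrc, mcache_n):
    """Return a label describing how the MBC should treat the request.

    rdysrc: level of the 82495 RDYSRC output (1 = MBC supplies BRDY#).
    mcache_n: level of the 82495 MCACHE# output (0 = active).
    """
    if rdysrc == 1:
        # The MBC must generate BRDY# for the CPU itself.
        return "mbc_supplies_brdy"
    if mcache_n == 0:
        # Writeback, linefill, or allocation: ignore the CPU pins and
        # force all byte enables active on the M-bus.
        return "82495_request"
    # Posted write: heed the BLE#-latched BE7-0#, LEN, and CACHE#.
    return "posted_write"
```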
Clock Boundaries and Synchronization
To optimize performance, the 82495/82490 allow total
decoupling of the CPU clock at 50 MHz from the
M-bus clock. While both the CPU and M-bus could
run at 50 MHz, the physical size of the M-bus would be
severely constrained. Future faster versions of CPU and
82495/82490 would make a synchronous M-bus even
less feasible. However, with a 100% synchronous interface,
little time is lost in relaying requests from the
82495 CADS# to the M-bus, and in transferring data
from the M-bus to the CPUbus.
Yet with careful design, a slower M-bus such as
33 MHz can handshake with a 50 MHz 82495 with
only a couple of clocks spent on synchronizing. Furthermore,
the transfers requiring synchronizing are
fairly rare: uncached cycles, cache misses, and snooping.
CPU performance is improved further because
82495/82490 always post writes destined for the
M-bus, allowing the CPU to continue processing upon
write cache-misses and non-cacheable writes.
Most of the 82495 operates on the CPU CLK. Only the
snooping control inputs operate on another clock,
called SNPCLK (SNPSTB#, SNPINV, SNPNCA).
SNPCLK can be the same as the MCLK controlling
82490 MDATA. A SNPCLK can be used with 82495,
even if the 82490 is strobed without an MCLK. All
82495 outputs, including snooping results (MHITM #,
MTHIT#, SNPCYC#, and SNPBSY#) remain on
the CPU CLK.
The 82490 operates half in the CPU CLK domain and
half in the M-bus domain. While no control signals flow
through 82490 between memory and the CPU, 82490
implements a flow-through data connection of CDATA
to MDATA. Synchronization of the 2 DATA
paths is unneeded, as the control signal MBRDY# gets
synched by the MBC to the CPU-clocked BRDY#.
The MBRDY# and BRDY# inputs control multiplexers
inside 82490 to choose which part of a line-fill or
write is transferred to/from the bus. The MDATA input
latches are closed on MCLK (or MISTB for non-clocked
operation), and CDATA input latches are
closed with CLK.
If MCLK = CLK at 50 MHz, approximately 1.5 CLK
periods are required to transfer data through the 82490,
including 82490 propagation delay (15 ns) and setup
time to both the 82490 (5 ns) and CPU (7 ns for i860
XP CPU "CMOS" levels). The MBC must assure data
setup time at the CPU D0-D31 (D63) pins to the rising
edge of CLK for the cycle of BRDY# assertion
during reads, based on the propagation delay from
MDATA to CDATA listed in the 82490 AC timing
specs. Writes are not flow-through, as 82490 always
buffers the write-data and later 82495 gives CDTS# for
the write.
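The flow-through estimate can be checked with simple arithmetic. The sketch below only adds up the three delays quoted in the text and expresses the sum in CLK periods; the function name and its default arguments are assumptions for the example, and real designs should use the 82490 AC timing specifications.

```python
# Sketch: sum the quoted delays and express them in CLK periods.

def flow_through_delay(clk_mhz=50.0, t_prop_ns=15.0,
                       t_setup_82490_ns=5.0, t_setup_cpu_ns=7.0):
    """Return (total_ns, clk_periods) for data passing through the 82490."""
    period_ns = 1000.0 / clk_mhz
    total_ns = t_prop_ns + t_setup_82490_ns + t_setup_cpu_ns
    return total_ns, total_ns / period_ns
```

With the default values the sum is 27 ns against a 20 ns CLK period, which is more than one period and therefore occupies part of a second one, in line with the rounded figure of about 1.5 CLK periods given above.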
Most of the MBC-to-82490 signals are sampled by
82490 with MCLK, except for BRDY# and CRDY#:

MBC-82490 Signals
MCLK        CLK
MBRDY#      BRDY#
MFRZ#       CRDY#
MZBT#
MDATA       CDATA
MSEL#
MEOC#
MDOE# (asynchronous to both clocks)
The MBC must be partitioned into an MCLK side and
a CLK side. Fortunately, the CPU side of the MBC passes
only a few signals to the MCLK side, and vice versa.
The signals listed below from the dual-i860 XP CPU
MBC design in Appendix C must go through a synchronizer.
Refer to the Appendix for signal definitions.
Each row pairs a signal in the MCLK (or SNPCLK) domain
with its counterpart in the CLK domain:

Clock Domain of the Signal:
MCLK or SNPCLK    Neither     CLK
MRESET                        RESET
YBGT#                         BGT#
YMEOC#                        CRDY#
YCEOC#                        BRDY#_maybe
MBRDY#                        BRDY#_maybe
MSWEND#           MSWENDA     SWEND#
MADS#                         CADS# .or. SNPADS# .or. CDTS#
The signals MKWEND# and MNA# might also need synchronizing to CLK, if they are derived from M-bus
responses.
Two TI 74AS4374 "Dual-Rank Synchronizer" chips
(Figure 7) are used to transfer critical signals between
clock domains, while avoiding metastability. This 20-pin DIP has one clock input and 8 pairs of flip-flops.
Thus each of the 8 "Q" outputs reflects the value of its
"D" input after 2 clock periods. One chip is clocked by
CLK and the other by MCLK. If fewer than 8 signals
need synchronizing, chips such as the Signetics
74F50728 or Intel's 85C220 EPLD can combine synchronization with other functions [Ham90].
For an asynchronous or strobed memory bus, M-bus
signals (such as MBRDY #) get delayed by the synchronizer for 2 CLK periods before the 82495 can see
them. For a clocked (but not by CLK) M-bus, 82495
outputs (such as CADS#) get delayed by 2 MCLKs by
the other synchronizer before the M-bus sees them.
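The two-clock delay of a dual-rank synchronizer can be illustrated with a behavioral model of one flip-flop pair. This is a sketch only: the class name is hypothetical, and the model captures the sampling delay but not metastability or analog timing.

```python
# Sketch: behavioral model of one D/Q pair of a dual-rank synchronizer.

class DualRankSynchronizer:
    """Two flip-flops in series, both clocked by the destination clock."""

    def __init__(self, reset_value=1):
        # Active-low signals such as MBRDY# idle high, hence the default.
        self.rank1 = reset_value
        self.rank2 = reset_value

    def clock(self, d):
        """Apply one rising clock edge with input `d`; return the Q output."""
        self.rank2 = self.rank1  # second rank takes the first rank's old value
        self.rank1 = d           # first rank samples the asynchronous input
        return self.rank2
```

Feeding the samples 0, 0, 1, 1, 1 into a synchronizer that was reset high yields 1, 0, 0, 1, 1: each input change appears at Q on the second clock edge after it was first sampled.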
The following 82495 signals are defined as "asynchronous", meaning that no external synchronizer is required:
• FLUSH#, SYNC#
• MALE, MBALE
• MAOE#, MBAOE#
Many signals can cross clock boundaries without synchronizing,
because they will be ignored until corresponding
status signals such as SWEND# and
CADS# have been synchronized by the MBC. Thus
they will be stable when sampled:
• MWB/WT#, DRCTM#, MTHIT#, MHITM#
(sampled when SWEND#)
• RDYSRC, KLOCK#, CPLOCK#, CW/R#,
CD/C#, CM/IO#, MCACHE#, BE7:0# (sampled
when CADS#)
Other signals do not cross clock boundaries, but remain
within the MBC CLK logic:
• CNA#, PALLC#, CACHE#, LEN, PCD, PWT,
CTYP, PCYC, MFRZ# ...

Synchronizer Delays
To avoid lost time due to synchronizer delays, the following options exist:
1. Pipeline the 82495/MBC interface. This hides the
delay in synchronizing CADS# to its MCLK counterpart MADS#.
2. Define the M-bus protocol so that MBRDY# precedes
MDATA by 1 MCLK for reads. Thus the 2
CLK delay in creating BRDY# from MBRDY# is
hidden. Likewise define MSWEND# to precede
MHITM# and MTHIT# by a CLK, by generating
MSWEND# from SNPCYC#.
3. Keep the snooping signals (SWEND#, MHITM#,
MTHIT#, SNPINV, SNPCYC#) which flow between
82495s on the same CLK, so that no synchronizers
enter the snoop path. This is feasible only for a
small number of physically proximate CPUs.
4. Synchronize the snooping feedback signals from the
M-bus (MSWEND#, etc...) only at the destination.
They will be asynchronous to MCLK, transitioning
with the individual CLK of their source.
5. Avoid MCLK, using a strobed-only M-bus. Strobed
buses appear in single-CPU systems with an unclocked
DRAM interface.
6. Activate MEOC# to 82490 as soon as possible after
the last MBRDY#. MEOC# allows 82490 to begin
the next data transfer without waiting for CRDY#
synchronization.

Figure 7. Synchronizer Hardware and Waveforms (1/8 of a 74AS4374)
BRDY # Generation
Below are recommended sequences of the 82490 and
CPU burst-transfer "Readys" for CPU reads, assuming
the bus widths are equal. Sequences with more clocks of
delay are acceptable but suboptimal.
1) Synchronous M-bus (MCLK = CLK): MBRDY#
precedes BRDY# by 1 or 2 CLKs, to allow propagation
time for data through the 82490 and setup
time at the CPU pins.
2) "Divided Synchronous" M-bus (e.g., CLK = 50
MHz, MCLK = 25 MHz, skew controlled):
MBRDY# precedes BRDY# by 1 or 2 CLKs. The
BRDY# state machine must ignore MBRDY# in
the CLK period after it was sampled active.
3) Other Clocked M-bus (MCLK < CLK):
MBRDY# must go through a dual-rank synchronizer
latch (such as the TI 74AS4374) clocked by
CLK to produce BRDY#. That means 2 CLK delays
between MBRDY# and BRDY#. MBRDY#
MUST remain active for at least 1 CLK period to
assure that the synchronizer latched it active. To
avoid one MBRDY# getting wrongly sampled active
twice, the BRDY# state machine should ignore
any second MBRDY# in the CLK period after it
was sampled active.
4) Strobed M-bus: here MISTB# must go through the
synchronizer with 2 CLK delays to create BRDY#.
An edge-sensitive strobed M-bus avoids the problem
of wrongly converting one M-bus transfer to 2
BRDY#s, as a level-change marks each M-bus
transfer.
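The "ignore MBRDY# in the CLK period after it was sampled active" rule of cases 2 and 3 can be sketched as a two-state machine operating on per-CLK samples. The function below is an illustrative model, not the Appendix design: its name is hypothetical, and it assumes the input list already holds the synchronized MBRDY# value seen at each rising CLK edge (0 = active).

```python
# Sketch: turn per-CLK samples of synchronized MBRDY# into BRDY# pulses,
# ignoring a sample taken in the CLK right after one was accepted.

def brdy_from_mbrdy(synced_mbrdy_n):
    """Return the BRDY# level (0 = active) for each CLK sample."""
    brdy_n = []
    ignore_next = False
    for sample in synced_mbrdy_n:
        if sample == 0 and not ignore_next:
            brdy_n.append(0)     # accept this M-bus transfer
            ignore_next = True   # the next CLK may resample the same MBRDY#
        else:
            brdy_n.append(1)
            ignore_next = False
    return brdy_n
```

With a slower MCLK, one MBRDY# can span two CLK edges; the samples 1, 0, 0, 1, 0, 1 therefore produce BRDY# levels 1, 0, 1, 1, 0, 1, that is, one BRDY# per M-bus transfer. The model assumes an MBRDY# never spans more than two CLK edges.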
When M-bus width is greater than CPUbus width, the
above rule holds only for the first BRDY#. Successive
BRDY# activations follow the rules below:
• M-bus = 2*CPUbus: 2 BRDY#s occur for each of
the first 2 MBRDY#s. The second BRDY# should
occur 1 CLK after the first. The third BRDY# cannot
begin until after the second MBRDY#.
• M-bus = 4*CPUbus: 4 BRDY#s occur for the
MBRDY#. The last 3 BRDY#s can occur immediately
in the 3 CLKs after the first BRDY#.
For asynchronous systems (MCLK < CLK), high performance design choices are:
M-bus width = 2 * CPUbus width OR
M-bus width = 4 * CPUbus width
The wider M-bus allows each M-bus transfer to satisfy
2 or 4 CPU transfers, so that the CPU is not starved for
data during a line fill. The 82490 switches its CDATA
outputs to the next value the CLK after BRDY# assertion
by the MBC for the current value, so the MBC
controls the provision of data to the CPU on linefills.
A low-cost MBC can use M-bus width = CPUbus with
a slower MCLK, by converting the first MBRDY# to
BRDY# through a synchronizer. The last 3 BRDY#s
can be asserted by MBC after completion of all the M-bus
transfers. That will allow the CPU to proceed executing
after receiving the first datum, which is the one
it was waiting for in most cases. Alternatively, the M-bus
protocol can be defined so that no idle clocks occur
on M-bus after the first MBRDY# and the MBC
knows by counting CLKs when to assert successive
BRDY#s.
Shown in the following timing diagrams are data transfers
on both buses for CPU reads. Although they assume
no dead clocks (wait states) during the M-bus
burst, dead clocks are allowable.
Writes are not shown in the diagrams because the MBC
never supplies the CPU BRDY#s for burst writes.
RDYSRC = 0 for most writes, and the 82495 controls
the CPUbus transfers. The exception to this rule is I/O
writes, which 82495 does not post; for I/O writes, the
MBC supplies BRDY# to the CPU, but I/O accesses
are always 1 non-bursting transfer.
Figure 8. Data Transfers, M-bus Width = CPUbus Width, MCLK = CLK
Figure 9. Data Transfers, M-bus Width = CPUbus Width, CLK/2 < MCLK < CLK. Note the starvation on the CPUbus (extra wait state).
Figure 10. Data Transfers, M-bus Width = 2*CPUbus, CLK/2 < MCLK < CLK
Figure 11. Data Transfers, M-bus Width = 4*CPUbus
Pipelining
Pipelining the MBC-to-82495 interface reduces latency
by allowing the MBC to arbitrate for the next M-bus
transaction while the first is proceeding. If the M-bus is
also pipelined, it allows the snoop for the next to begin
during the data transfer for the first.
Signals used in pipelining the 82495 are CNA#,
BGT#, MALE, KWEND#, SWEND#, and
CDTS#. The 82495 will not listen to CNA# until the
clock of BGT# activation. Also, KWEND# activation
sometimes allows the 82495 to create a next cycle, such
as an allocation after a write miss. MALE deassertion
allows the memory address to remain at the value for a
previous request, even though the next request CADS#
and other control signals have already occurred in response
to CNA#. The MBC must latch the 82495 output
signals which change in response to CNA#, until
their status no longer matters to ongoing cycles.
Note that 82495 and 82490 automatically pipeline the
CPUbus interface to i860 XP CPU by activating NA#
and latching address and data.
Pipelining the M-bus itself involves sending a next address
for snooping and DRAM access while data transfer
from the current address still remains incomplete.
This increases bandwidth by overlapping slow DRAM
access with bus data and address transfers, as in the
i860 XP CPU pipelined bus.
While each 82495 allows only a one-stage deep pipeline,
the M-bus can have a deeper pipe as requests from several
different 82495s can be in progress. The number of
stages in the M-bus pipe should match memory access
latency. For example, use 3 stages for a 240 ns memory
with a 120 ns bus MADS#-to-MNA# (and
SWEND#) time, so that a second and third request get
issued during the memory latency of the first. Pipelining
does not imply that multiple snoops are ongoing
waiting for SWEND#; that is a split-transaction bus,
defined in a later section. Thus a quick SWEND#
turnaround time speeds a new request onto the M-bus.
The advantage of a pipelined bus using a 4-transfer
burst is illustrated in Figures 12 and 13. Assumed is a
fast memory access time of 4 MCLKs. With a slower
access time, pipelining becomes more important for
maintaining data bus bandwidth; even with the
4-MCLK access, the unpipelined data bus is idle 50%
of the time.
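The 50% figure follows from a simple ratio of busy clocks to total clocks per transaction. The sketch below uses an idealized model that is an assumption of this example: an unpipelined bus pays the access time and the burst back to back, while a pipelined bus fully overlaps the next access with the current burst. The function name is hypothetical.

```python
# Sketch: idealized M-bus data bus utilization, counted in MCLKs.

def data_bus_utilization(access_mclks, burst_transfers, pipelined):
    """Return the fraction of MCLKs in which MDATA carries data."""
    if pipelined:
        # The next access overlaps the current burst, so each transaction
        # occupies whichever of the two phases is longer.
        clocks_per_transaction = max(access_mclks, burst_transfers)
    else:
        # Access and burst happen strictly one after the other.
        clocks_per_transaction = access_mclks + burst_transfers
    return burst_transfers / clocks_per_transaction
```

With a 4-MCLK access and a 4-transfer burst, the unpipelined result is 0.5, matching the 50% idle time stated above. With an 8-MCLK access, the unpipelined bus carries data only one third of the time, against one half for the idealized pipelined bus.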
Figure 12. Data Transfers for Non-Pipelined M-bus. Note low MDATA Bandwidth.
Figure 13. Data Transfers for Pipelined M-bus

M-bus Arbitration
If the M-bus possesses more than one master, each
MBC must arbitrate to gain control of the M-bus whenever
its 82495 activates CADS#. No arbitration logic is
included in 82495 nor 82490, except for the ability to
float (Hi-Impedance) the 82495 and 82490 M-bus outputs
via the MAOE# and MDOE# signals. The
BGT# and MAOE# inputs to 82495 are from MBC
arbitration logic. The simplest systems can use a
HOLD/HLDA/BREQ protocol like the i860 XP CPU
and Intel486 DX CPUs themselves, which is centralized arbitration.
Expandable buses like Futurebus+ and Multibus-II use
distributed arbitration to allow a variable number of
masters. Bus parking (retaining ownership of the M-bus
until another master requests it) is advised to avoid unnecessary delay.
The "restricted backoff protocol" of 82495 requires
that it be granted the bus for a modified-line writeback
after it activates MHITM#, before it will snoop or initiate
any other transactions. The snooping MBC must
relinquish the M-bus immediately after the CRDY# of
the M-line writeback so that the original owner can
complete its work.
Sequencing
A typical sequence of request and response signals between the 82495 and MBC is shown in Figure 14. The
"SL" entities (CPUSL, 82495SL, 82490SL, MBCSL) are
for another CPU/Cache core, the SLave(s) who snoop
when the master CPU owns the bus. No DMA (such as
EISA or MCA) interaction is shown, but it will be similar to the CPU responses, except that no writeback will
be done by DMA. Time increases downward. A minussign prefix means deassertion.
The arbitration for the M-bus shown in the diagram
assumes a HOLD/HLDA protocol like the CPUs use.
That is a primitive centralized scheme, suitable only for
a small number of processors.
The sequencing may vary from that shown; for example, MSEL# may precede CDTS#. MADS#,
MW/R#, MA31:3, MM/IO#, MD/C#, and
MBE7-0# would all be valid simultaneously. The signals in parentheses would be asserted only in the case of
an M-line hit in the snooper, and some signals for that
writeback and possible cache-to-cache transfer are not
shown.
Figure 14. MBC Signals and Protocol Layers
* = Signal might occur sooner or not at all, depending on the type of request and bus protocol.
** = These lines of the sequence occur only on a Hit-to-Modified (MHITM#).
Flowchart of MBC Algorithm (not applicable to all cases)
1. CADS# is received.
2. M-bus already owned by this MBC?
   N: Arbitrate for bus.
   Y: Continue.
3. Enable 82495 to drive address to bus (MAOE#, MALE).
   Echo other request parameters (MW/R#, MCACHE#, etc...) to the bus.
4. Assert BGT#.
   Determine cacheability, assert pins KWEND#, MKEN#, MRO#.
   Latch control signals (MW/R#, etc...).
   Assert CNA# to invoke next 82495 request.
5. MHITM# from other masters?
   Y: Abort memory cycle. Do cache-to-cache transfer.
   N: Continue.
6. Wait for CDTS# (before beginning data transfer).
7. Forward snoop responses to master 82495
   using SWEND#, MWB/WT#, DRCTM#.
8. Signal burst transfers of M-bus via MBRDY#.
   If RDYSRC = 1, echo burst transfer acknowledgments on BRDY#.
   Compensate for LR <> 1 by stopping BRDY# assertion when CPU line filled.
9. Notify 82495 and 82490 of completion of transfer via MEOC# and CRDY#.
10. New CADS#?
   N: Relinquish bus ownership.
      Deassert MAOE# to re-enable snooping by this 82495.
Cacheability of each request must be determined by the
MBC to prevent the 82495 and CPU from caching
things like memory-mapped I/O device registers. The
i860 XP CPU samples its KEN# (Cache ENable)
pin at the time of the first BRDY# for a transfer or at
NA#, whichever comes first. The 82495 offers more
flexibility than the CPU cacheability indicators, by using
the KWEND# (cacheability Window END) input
to indicate validity of the MKEN# and MRO#
pins. The values of MKEN# and MRO# are based on
address decode, either locally in the MBC or from a
centralized decoder on the memory bus. For best performance,
KWEND# should come as soon as possible,
as it allows 82495 to decide what the next CADS#
should be; for example, to begin an allocation for a
write miss, or to start another writethrough.
A typical implementation would activate KWEND# 2
clocks after CADS #, using a PLD or fast SRAM to
decode the upper bits of the address to generate
MKEN# and MRO#.
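As an illustration of the address decode just described, the following Python sketch models the lookup a PLD or fast SRAM would perform. The memory map, the region boundaries, and the function name are hypothetical. The sketch also assumes the usual active-low convention for "#" signals, so 0 means asserted (cacheable for MKEN#, read-only for MRO#).

```python
# Hypothetical memory map: (start, end_exclusive, cacheable, read_only).
# Real ranges depend entirely on the system being designed.
REGIONS = [
    (0x0000_0000, 0x0400_0000, True, False),    # assumed DRAM
    (0xFFF0_0000, 0x1_0000_0000, True, True),   # assumed boot ROM
]

# From the text: a typical design activates KWEND# 2 clocks after CADS#.
KWEND_DELAY_CLKS = 2


def decode_cacheability(address: int) -> dict:
    """Return the active-low MKEN# and MRO# values for an address."""
    for start, end, cacheable, read_only in REGIONS:
        if start <= address < end:
            return {"MKEN#": 0 if cacheable else 1,
                    "MRO#": 0 if read_only else 1}
    # Anything unmapped (for example memory-mapped I/O) is not cached.
    return {"MKEN#": 1, "MRO#": 1}
```

Defaulting unmapped addresses to non-cacheable reflects the stated goal of keeping device registers out of the cache.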
Note that KWEND#, SWEND#, and BGT# need
not be asserted by the MBC for SNPADS# cycles
(snoop writebacks), but it may be simpler to assert
them always.
Snooping
Snoop handshaking (bus watching) is useful in a multiprocessor
system, and may be needed in a uniprocessor
system where the 82495 and CPU caches must be kept
consistent with DMA accesses. The 82495 must snoop
all DMA accesses to memory. The MBC sees requests
from DMA (or other processors) on the M-bus and converts
them to SNPSTB# activations to the 82495. The
following scenarios are possible:
• DMA (or other processor) read causes 82495
MHITM#: 82495/82490 must write back the modified
line to memory before the first DMA data
transfer occurs, unless the DMA controller is capable
of retrying the read. If the DMA can retry, then
the 82495 writeback must cause the initial DMA
access to be aborted. The MBC can assert
SNPNCA (SNooP Non-CAcheable Access) to the
82495 for a DMA read, so that the 82495 knows it
can keep the block Exclusive upon a hit.
• DMA (or other) read causes 82495 MTHIT# but
not MHITM#: MBC must assert the "shared"
status line of the M-bus, if the bus includes such a
line.
• DMA (or other) write causes 82495 MHITM#:
82495/82490 must write back the modified line to
memory before the first DMA data transfer occurs.
SNPINV should be activated to 82495 to invalidate
the line.
• DMA (or other) write causes 82495 MTHIT# but
not MHITM#: SNPINV should be activated to
82495 to invalidate the line.
Note that 82495/82490 cannot "write snarf": they do not
absorb write data from the memory bus and merge it with
the current cached contents of the line. However, they can
absorb a full-line writeback from the M-bus when doing a
linefill of the same address (see the section on
Cache-to-Cache Transfers).
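The four snoop scenarios above reduce to a small decision table. The Python sketch below is one hedged way to express it; the function name and the action strings are hypothetical, and the miss case (neither MTHIT# nor MHITM#) is assumed to need no action.

```python
from typing import List


def snoop_actions(is_write: bool, mthit: bool, mhitm: bool) -> List[str]:
    """Map a snooped DMA (or other-processor) access to MBC actions."""
    actions = []
    if mhitm:
        # Modified hit: memory's copy is stale for reads and writes alike.
        actions.append("write back modified line before first data transfer")
    elif mthit and not is_write:
        # Clean hit on a read: report sharing if the bus has such a line.
        actions.append("assert M-bus shared line (if present)")
    if is_write and (mthit or mhitm):
        # Any hit on a write leaves the cached copy obsolete.
        actions.append("assert SNPINV to invalidate line")
    return actions
```

For example, snoop_actions(True, True, True) returns the writeback action followed by the SNPINV action, matching the third bullet.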
Bus size adaptation can be done by the MBC, although
it is not necessary in most systems. In an Intel486 DX
CPU or i860 XP CPU system without an 82495/82490,
an 8-bit device like a ROM can be used to contain code,
and the CPU will automatically fetch at byte-width
when the BS8# (Intel486 DX CPU) or CS8 (i860 XP
CPU) pin is asserted. However, if a byte-wide ROM is
used with an 82495/82490, the MBC must adapt this
byte-wide interface.
If the ROM code is to be cacheable, the MBC must
convert the 82495 line fetches at the ROM location to
the appropriate number of byte-wide ROM reads.
Latching transceivers must be employed at the 82490
MDATA inputs or at the ROM output, to assemble the
single-byte ROM reads into 4 (or 8) bus-width-wide
transfers to the 82490s.
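The byte assembly performed by the latching transceivers can be sketched in a few lines of Python. This is only an illustration: the function name is hypothetical, and the sketch assumes little-endian lane ordering (the byte at the lowest address lands in the least significant byte lane), which the text does not state.

```python
from typing import List, Sequence


def assemble_line(rom_bytes: Sequence[int], bus_width_bytes: int = 8) -> List[int]:
    """Group single-byte ROM reads into bus-width-wide transfers."""
    if len(rom_bytes) % bus_width_bytes != 0:
        raise ValueError("line length must be a multiple of the bus width")
    transfers = []
    for i in range(0, len(rom_bytes), bus_width_bytes):
        chunk = bytes(rom_bytes[i:i + bus_width_bytes])
        # Assumed lane order: lowest address in the least significant lane.
        transfers.append(int.from_bytes(chunk, "little"))
    return transfers
```

Under these assumptions a 32-byte line on a 64-bit MDATA path takes 32 byte-wide ROM reads and yields 4 transfers to the 82490s.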
If the particular M-bus protocol requires transfer
widths shorter than the 82490 data width used, the address
range requiring such transfers can be made noncacheable
to force 82495 and 82490 to use the width
given in the request from the CPU.
Bus size adaptation would also be needed to support a
512kB cache on a 32-bit memory bus. In that case, the
MBC must control transceivers and MBRDY#s to interface
between the 64-bit 82490 MDATA path and the
32-bit M-bus.
Bus Signal Levels
Redriving 82495/82490 signals to the M-bus (such as
MDATA, addresses, and 82495 control outputs) can
optionally be done by the MBC. If the M-bus signal
levels are not TTL (for example, ECL or Futurebus+ BTL,
Backplane Transceiver Level), then appropriate transceivers
must lie between the M-bus and 82495/82490.
Also, M-buses with heavy capacitive loads should be
redriven by transceivers, although 82495 and 82490 can
tolerate loads of up to 100 pF.
An additional advantage of buffering the 82495/82490
signals with transceivers in a multiprocessor is that a
"local M-bus" will exist between the chips and the
main system M-bus. That allows some local traffic from
the CPU module to attached peripherals to avoid traversing the M-bus. Such peripherals might include an
MPIC/CCU (MultiProcessor Interrupt Controller/
Concurrency Control Unit), a JTAG boundary-scan
controller, or a time-of-day clock, as in the Sequent
Symmetry multiprocessor.
8.0 MBC FUNCTIONS FOR MULTIPROCESSORS
Multiprocessor cache designs have additional motivations
beyond the uniprocessor goal of reducing memory
access latency. Reducing memory bus usage is especially
important because the sharing of the bus creates a
bottleneck. Thus multi-82495 systems need to minimize
the number of transactions and make each one as short
as possible. Large caches (256k or 512k) are recommended
for multiprocessors, to keep the miss rate as low as possible.
In addition to the uniprocessor functions, an MBC in a
multiprocessor must handle consistency with caching
agents other than its own 82495. For performance reasons,
the multiprocessor MBC may also implement
snoop filtering, cache-to-cache transfers, read-for-ownership,
and split transactions.
Snooping results from listeners (slaves) on the bus must
be fed back to the master 82495 by the time SWEND#
is activated, if the system uses writeback policy
(writethrough requires no feedback). These results
(DRCTM#, MWB/WT#) are translations of the
slaves' MHITM# and MTHIT# outputs. As shown in
Figure 15, typically all MHITM# outputs would be
wired-or via open-collector transceivers. Because slaves
on the bus may be busy with CPU operations and
back-invalidations, the snoop delay can vary. Thus a latched
derivative of the SNPCYC# output of all 82495s
would be wired-or to derive SWEND#. Alternatively,
the MBC can count CLKs to generate SWEND#, using
the worst-case upper bound of CLKs required for
all 82495s to snoop, but that makes all snoop windows
long.
Because 82495 will tolerate SWEND# arrival up until
CRDY#, the M-bus data transfer for reads can overlap
the snooping delay. The transfers (MBRDY#s) can occur
during snoop latency, and an MHITM# activation
would cause the MBC to restart the transfer using
82490's MSEL# pin.
[Figure 15. Creating Snoop Results from MHITM#, MTHIT#, and SNPCYC#.
Block diagram: each 82495 feeds cache-to-cache transfer and
read-for-ownership logic in its MBC; dual-rank synchronizers (74AS4374)
and open-collector buffers (7407) connect the MBCs to the M-bus signals
MCLK, MADS#, MSWENDA, MINHIBIT# (MHITM#), MSHARED# (MTHIT#), and MSNPINV.]
If an 82495 linefill or writethrough hits a dirty line in
another cache, the MBC cannot BACKOFF the 82495.
Label that other cache "the dirty 82495" and the
initiating 82495 "the master 82495." The master MBC
must force a retry of the access after the dirty 82495
dumps the line, but the master 82495 has no "Backoff
and Retry" input pin. Rather, on a linefill the master
82495 must see the data transfer as if it had come from
memory. On a write, the master 82495/82490 data
write must wait until the modified line from the dirty
82495 has been dumped to memory. To do so, the master
MBC can either:
1) Delay the corresponding MBRDY#s to the master
82490 until the modified line is completely written
into memory and read out of memory. That implies
the master MBC will remake the initial request to
the memory controller after the writeback.
OR
2) Create a cache-to-cache transfer, so that the writeback
data movements go directly into the master
82490 over the M-bus. A later section describes
cache-to-cache transfers. Such transfers are quicker
than waiting for the entire modified line to be written
back to memory.
Note that the 82490 can restart the data transfer for
reads or writes, in the case of MHITM# activation
after the first MBRDY# but before MEOC# and before
CRDY#. To restart the 82490, the MBC must
deassert MSEL# for at least 1 MCLK.
Snoop Window Time (the delay from MADS# to
SWEND#) limits address-bus bandwidth. In the interval
from the address on M-bus until the acknowledgement
(SNPCYC#) by all listeners, no more requests
(addresses) can be on the bus. This restriction is implied by:
1) A typical M-bus has only one MSWEND# wire,
which cannot be identified with the proper request if
several requests are outstanding.
2) 82495 does not snoop between BGT# and
SWEND#.
3) 82495's "restricted backoff protocol". That protocol
requires the M-line writeback to be the first transaction
by any 82495 which generates MHITM#, and
82495 cannot snoop anymore until it finishes the
MHITM# writeback.
Data for read-misses cannot be transferred on the
CPU bus until SWEND#, because the MBC cannot
abort a CPU transfer after giving the first BRDY#.
Thus the snoop window length influences CPU performance.
Depending on the number of processors, bus
speed, and memory speed, two scenarios arise from
snoop window length versus memory access latency:
1) SWindow < Memory Latency: SWEND# precedes
the MBRDY#s. If MHITM# occurs, the original
memory access can be aborted and its MBRDY#s
must be ignored.
2) SWindow > Memory Latency: data transfer on
M-bus can proceed, with MBRDY#s causing 82490
linefill buffers to advance. After SWEND#, the
MBC can begin BRDY#s to the CPU and 82490 if
MHITM# is inactive. If MHITM# is active, the
MBC must restart the M-bus data transfers after (or
during) the writeback from the modified snooper,
and can begin BRDY#s immediately after the first
MBRDY#.
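For the case where MHITM# stays inactive, the two scenarios mean that the first BRDY# to the CPU is gated by whichever finishes later, the snoop window or the memory access. A minimal Python sketch of that relationship follows; the function name is hypothetical, and it deliberately ignores any extra pipeline clocks inside a real MBC.

```python
def earliest_first_brdy(snoop_window_clks: float,
                        memory_latency_clks: float) -> float:
    """Clock (measured from MADS#) of the first BRDY#, assuming no MHITM#.

    BRDY# needs both SWEND# (end of the snoop window) and the first
    MBRDY# (data available), so the later of the two events governs.
    """
    return max(snoop_window_clks, memory_latency_clks)
```

With a 7 CLK snoop window, memory faster than 7 CLKs gains nothing on a read miss, which is why the snoop window length influences CPU performance.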
The typical snoop window in a multiprocessor using
the hardware of Figure 15 is about 7 CLKs total snoop
turnaround delay, shown in Figure 16:
  1 CLK for propagation delay of master's MADS# (to slave
  82495s' SNPSTB# inputs)
+ 0.5 to 1 CLK for 82495 to internally latch SNPSTB# and
  synchronize it to CLK
+ 1 CLK for 82495 tag lookup and SNPCYC# (or more, if 82495
  is busy with SNPBSY#)
+ 1 CLK to latch SNPCYC# into the MBC Set/Reset flip-flop
  generating MSWENDA
+ 1 CLK for MSWENDA open-collector buffer and settling time
  from all slaves
+ 2 CLKs for MSWENDA to get through synchronizer (on the
  master MBC's CLK) and inverter to generate SWEND# to the
  master 82495
The window total assumes that the slave 82495s' one
CLK delay from SNPCYC# until MHITM# is concurrent
with the synchronizer delay for creating
SWEND# from MSWENDA at the master. Those 2
CLKs can overlap with the next MADS# if it is
asynchronously generated from MSWENDA. Shorter
snoop window times can be obtained using duplicate
external tags as explained later, but this is not trivial.
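The "about 7 CLKs" figure can be checked by adding up the stages listed above. The sketch below does that arithmetic in Python. The stage labels are paraphrased, and the first stage (MADS# propagation) is taken as 1 CLK; that value is an assumption, chosen because it makes the total agree with the stated 7 CLKs.

```python
# (stage, minimum CLKs, maximum CLKs); values follow the list in the text.
SNOOP_STAGES = [
    ("MADS# propagation to slave SNPSTB# inputs", 1.0, 1.0),  # assumed 1 CLK
    ("82495 latches and synchronizes SNPSTB#", 0.5, 1.0),
    ("82495 tag lookup and SNPCYC#", 1.0, 1.0),   # more if SNPBSY# is active
    ("latch SNPCYC# into MBC flip-flop (MSWENDA)", 1.0, 1.0),
    ("MSWENDA open-collector buffer and settling", 1.0, 1.0),
    ("MSWENDA synchronizer and inverter to SWEND#", 2.0, 2.0),
]


def snoop_window_clks():
    """Return (best case, worst case) snoop turnaround in CLKs."""
    best = sum(lo for _, lo, _ in SNOOP_STAGES)
    worst = sum(hi for _, _, hi in SNOOP_STAGES)
    return best, worst
```

The worst case excludes a slave that is busy (SNPBSY#), which the text notes can lengthen the tag lookup stage.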
Read for Ownership (RFO) protocols decrease bus traffic
by avoiding the M-bus write which would occur
upon a write-miss. Without RFO, a write-miss would go to
the bus, followed by an 82495 line allocation request for the
missed area. With RFO, the MBC does not echo the
82495 write request to the M-bus. Instead, it asserts
MFRZ# to freeze the written data in the 82490 memory
buffer, and allows the subsequent 82495 allocate line
request to go to the bus. When the line data returns on
the M-bus, MBC asserts DRCTM# to cause the 82495
to mark the line as Modified (the memory system and
other caching agents do not know of the original write
miss, so they have invalid copies of the line).
[Figure 16. Snoop Waveforms. Traces for SNPCLK, CLK, MADS# (SNPSTB#),
SNPCYC#, LSNPCYC#, MSWENDA#, and SWEND#, marking where the slave 82495s
see SNPSTB#, where they begin the snoop (on CLK, after internally
synchronizing SNPSTB#), and where the master 82495 sees SWEND#.]
Signals which the MBC must use to do RFO are:
1) PALLC# (Potential ALLoCate): from the 82495
must be active on the write miss. If not, RFO cannot
be performed.
2) MKEN# and CRDY#: must be activated by the
MBC for the write, to trigger the 82495's subsequent
allocation request.
3) MFRZ#: must be activated by the MBC to the
82490 at the time of the MEOC# and CRDY# for
the write.
4) INVAL (memory bus Invalidate indication): must
be asserted by the MBC during the allocate-read to
force all other 82495s to invalidate their now-obsolete
copies of the line. Slave MBCs will assert
SNPINV to 82495s.
5) DRCTM# (DiReCt To Modified): must be asserted
by the MBC during the SWEND# of the allocate, to
make the 82495 put the line in M-state.
6) MWB/WT#: must be asserted during the
SWEND# of the allocate.
7) CPLOCK# (82495 Pseudo Lock in Intel486 DX
CPU systems): if active, the MBC must NOT do
RFO, because 82495 will activate PALLC# only on
the second of the 2 writes. If the MBC tried to RFO,
it would merge only half of the data into the modified line.
See [82495/490DS] for RFO information.
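The RFO conditions and signal sequence listed above can be summarized as a small Python sketch. The function names and step strings are hypothetical and only restate the list; they are not taken from the 82495 data sheet, which remains the authority for timing.

```python
from typing import List


def can_do_rfo(pallc_active: bool, cplock_active: bool) -> bool:
    """RFO needs PALLC# active on the write miss and CPLOCK# inactive."""
    return pallc_active and not cplock_active


def rfo_sequence(pallc_active: bool, cplock_active: bool) -> List[str]:
    """Ordered MBC actions for a write miss (illustrative only)."""
    if not can_do_rfo(pallc_active, cplock_active):
        return ["echo write to M-bus (no RFO)"]
    return [
        "write: do not echo to M-bus",
        "write: assert MKEN#",
        "write: assert MFRZ# with MEOC# and CRDY#",
        "allocate: assert INVAL on M-bus during the read",
        "allocate: assert DRCTM# and MWB/WT# at SWEND#",
    ]
```

The CPLOCK# check comes first because, per item 7, attempting RFO on a pseudo-locked pair of writes would merge only half of the data.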
Cache-to-cache transfers (CTCT) optimize the speed of
consistency actions in a multiprocessor. For a read linefill
by a master causing an MHITM# from a slave, the
writeback data movements go directly into the master
82490 over the M-bus from the dirty 82490. For a
write, Read-for-Ownership (RFO) is required for the
CTCT. If RFO is not implemented, then the cache-to-cache
option can be used only on linefill (read) misses.
In fact, RFO makes every write-miss into a linefill. The
82495/82490 do CTCT only on entire lines, not bytes
or words.
For CTCT on a linefill causing MHITM#, the MBC
doing the writeback must initiate the writeback at the
subline address of the initial read. Starting the writeback
from the first word of the line is NOT acceptable.
While CTCT is faster than re-reading the line after
waiting for the dirty writeback, the latency will be longer
in most systems than for fetching lines from main
memory. CTCT would actually waste time for such
items as shared instruction pages. For non-written data,
transferring from memory to a CPU is probably faster
than transferring from another cache. So 82495 supports
only M-line CTCT (no writeback occurs unless
MHITM#).
Signals involved in CTCT are DRCTM#, MZBT#, MHITM#,
MBAOE#, and MSEL#. See [82495/490DS] for CTCT information.
Snoop filtering can be implemented by the MBC using
the 82495 SMLN# (SaMe LiNe) output to reduce the
latency for snooping. That is, SWEND# can be asserted
immediately to the requesting 82495, if the 82495
asserts SMLN# to indicate the current request is to the
same line as the previous request. In that case, other
caches already have checked this line. SMLN# must
be ignored if the M-bus has been used by other agents
between the 2 82495 requests. The M-bus protocol need
not include a "non-snooped transfer type" for the use
of this feature, as the MBC can simply ignore the snoop
responses from other MBC/82495 modules.
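The snoop-filter rule amounts to a two-input check. A minimal Python sketch follows, with a hypothetical function name and arguments; how an MBC tracks intervening bus use by other agents is left to the implementation.

```python
def can_assert_swend_immediately(smln_asserted: bool,
                                 bus_used_by_others_since_last: bool) -> bool:
    """Decide whether the snoop window may be skipped for this request.

    SMLN# says the request targets the same line as the previous one,
    but it is only trustworthy if no other agent has used the M-bus in
    between the two requests.
    """
    return smln_asserted and not bus_used_by_others_since_last
```

When the function returns False, the MBC waits for the normal snoop responses before asserting SWEND#.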
Split transaction (ST) memory buses such as Futurebus+
prove valuable in high performance systems. An
ST (also called "connect/disconnect" or "packet
switching") bus divides a single read request into a separate
address-transfer phase and a data-transfer phase.
Thus the bus is not monopolized during the long latency
involved in accessing data across bus hierarchies.
Writes typically are not split, as the data and address
are available simultaneously from the writer. In a hierarchical
bus, requests must be forwarded across bridges
for the purposes of snooping and memory access at remote
nodes, and the snoop latency may be long. Thus
the bus should be freed between initial request and
snoop-response for use in other transactions.
The 82495 does not support ST directly. That would
require it to snoop current cache contents and queue up
possible writebacks, for the accesses from other bus
agents between the time of the BGT# (the address
phase) and SWEND# (end of the address phase or
later). Also 82495 cannot write back dirty data between
SWEND# and CRDY# (end of the data phase) of an
ongoing cycle; it cannot suspend a transfer for later
resumption after a snoop writeback.
CADS#      BGT#         SWEND#       CRDY#
  |----------|NNNNNNNNNNN|DDDDDDDDDDD|
NN = No snooping by 82495 will occur in this area
DD = Delayed response by 82495 to snoop requests
here. MTHIT# and MHITM# asserted immediately, but
writeback of dirty data delayed until after CRDY# for
ongoing cycle.
82495's inability to snoop during the NN period comes
from the need to keep 2 addresses into the tags active:
one for the outstanding 82495 request, whose tag must
be updated at SWEND# based on MWB/WT# and
DRCTM#, and one for the snoop inquiry. Furthermore,
any MHITM# on the M-bus could not be easily
linked to the request causing the snoop if 2 snoops are
outstanding.
To support split transactions by snooping between
BGT# and SWEND#, a set of tags external to the
82495 can be implemented in the MBC. Those tags
would replicate the contents of the 82495 internal tags,
listening to all memory bus requests and responding
with snoop results. Only when an 82495 state change (to
I or S) is needed will the 82495 be informed of snooping
action; only then will the external tags relay the snoop
request to it.
Duplicate tags provide quicker snoop turnaround because
no SNPCLK-to-CLK synchronization is required;
the duplicate tags are in the SNPCLK/MCLK
logic. While they are a high-performance option, they
are costly and complex.
Memory cycle abort is required in multiprocessors
when a snooping 82495 activates MHITM# to signal
that the memory's copy of the data requested by another
82495 is obsolete. As explained above, the memory
read or write must be INHIBITED until the writeback
is done. Depending on implementation, the original access
may need to be retried or abandoned. If CTCT and
RFO are implemented, then abandonment is probably
adequate. Although the complexity of aborting could
be avoided by delaying all memory action until
SWEND#, that would decrease performance. An
M-bus signal such as "SIV" (System InterVene) or
"MBOFF#" (M-bus Back OFF) allows the MBC of
the snooper to tell memory to abort.
If the M-bus is pipelined, there may be constraints on
when the MBC can assert the "abort" signal to avoid
cancelling the access in progress for the transfer preceding
the one causing MHITM#.
Locking
Locking of the M-bus using the 82495's KLOCK#
output is required to ensure atomic accesses for CPU
locks. For example, memory variables called semaphores
in a multitasking airline-reservation system prevent
two processes from trying to update the same list
of flight reservations simultaneously. A task would read
the value of the semaphore in an uninterrupted
read-modify-write (RMW) sequence, asserting the CPU's
LOCK# signal during the RMW to block interrupts¹
(and block locked accesses by other processors to the
same semaphore in a multiprocessor). If interrupts or
other accesses were allowed during the sequence, two
processes (or processors) might both read the semaphore
as "available" (zero) and both assume ownership,
setting it to "unavailable" (nonzero). Then both might
find the same empty seat and write their individual
passenger's name in the same seat location. In the end, 101
passengers would have tickets for a 100-seat plane
flight.
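The double-booking hazard can be reproduced with a deterministic two-process model. The Python sketch below is only an illustration of the semaphore example: the function name and the process names "A" and "B" are hypothetical, and the interleaving is fixed by hand rather than produced by real concurrency.

```python
from typing import List


def claim_seat(atomic_rmw: bool) -> List[str]:
    """Return which processes end up believing they own the semaphore."""
    semaphore = 0          # 0 = available, nonzero = unavailable
    owners = []
    if atomic_rmw:
        # LOCKed case: each read-modify-write completes before the next.
        for name in ("A", "B"):
            seen = semaphore
            if seen == 0:
                semaphore = 1
                owners.append(name)
    else:
        # Unlocked case: both reads happen before either write.
        seen_a = semaphore
        seen_b = semaphore
        if seen_a == 0:
            semaphore = 1
            owners.append("A")
        if seen_b == 0:
            semaphore = 1
            owners.append("B")
    return owners
```

With atomic_rmw=False both processes read zero and both take ownership, which is the overbooked flight; with atomic_rmw=True only the first succeeds.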
The 82495 and i860 XP CPU implement locks in a
sequentially consistent, or serializing, manner. That is,
all data loads and stores within the locked sequence
occur on the external bus in the same order as they
appear in the program. Also, all accesses in the program
before the LOCK instruction are completed before
the first locked read or write, and all the locked
reads/writes complete before other accesses after the
locked sequence. This sequentiality is required by the
semaphore example above, to prevent the CPU from
updating the reservation list before it has obtained
ownership using the semaphore.
¹ The CPU automatically blocks interrupts during the LOCKed sequence. The bus arbiter is responsible for blocking other accesses.
The MBC must serialize by ensuring all back-invalidates
from 82495 to the CPU have completed before
activating BRDY# for any locked read or write. So the
MBC must postpone locked BRDY#s until CAHOLD
is inactive and SNPCYC# has been inactive at least 2
CLKs (refer to [82495/490DS] section 5.1.1).
Bus Lock vs. Address Lock
The 82495 echoes the CPU's LOCK# signal onto its
KLOCK# output, and forces all CPU accesses to go to
the M-bus, even if they are 82495 cache hits. That guarantees
that other processors know of the LOCK and
the accesses. The 82495 assumes a BUS LOCK, where
all other processors are kept off the bus during
KLOCK# activation. Most existing "standard" buses,
such as Multibus II, have lock protocols which do such
an exclusive lock. 82495 snoop behavior during assertion
of its own KLOCK# is undefined, since it expects
no other requests will be permitted then. The 82495's
KLOCK# can remain asserted for multiple cycles
when used with the i860 XP CPU, because the processor
allows up to 32 instructions inside a LOCKed sequence.
The 32-instruction i860 XP CPU LOCKed intervals
may exceed 32 CLKs, as each instruction could take
several clocks and cause a TLB miss (the intervals
would be even longer if the i860 XP CPU did data
cache line fills and line writebacks during LOCK#, but
the 82495 prevents that by making KEN# = 1). Unfortunately,
this limits bus concurrency. When several
82495s share a bus or interconnection network, performance
would improve if a LOCK# from one processor
did not block all others from accessing memory
and I/O. Multiprocessors based on the Intel486 DX
CPU are not affected as severely by LOCK#, because
its lock endures only a few clocks: two memory accesses
at most.
To improve performance of locks in a multiprocessor, a
scheme of ADDRESS LOCKING may be implemented.
This non-blocking protocol allows other accesses to
the bus and memory in spite of LOCK# activation,
and requires only that no other CPU tries to access the
same LOCKed address. If another CPU does try to
access the same location, that second CPU must be
stalled until the first LOCK is de-asserted. To ensure
that the second CPU continues to snoop accesses while
stalled, BGT# to it for its request must be delayed
until the lock is obtained, as signalled by the bus arbiter.
Semaphore integrity is preserved if all CPUs follow
the software convention of locking their RMW
(Read-Modify-Write) semaphore accesses. Also by convention,
the address corresponding to the first access with
LOCK# asserted is the only locked location permitted
to that processor, until LOCK# deasserts (refer to the
i860 Microprocessor Family Programmer's Reference
Manual, Intel order #240875, Section 5-14).
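The bookkeeping a bus arbiter would need for address locking can be sketched as a small lock table. The Python class below is hypothetical (its name, methods, and exact-address granularity are all assumptions); a real arbiter might lock on a line or wider granule.

```python
from typing import Dict


class AddressLockManager:
    """Illustrative lock table: one locked address per CPU."""

    def __init__(self) -> None:
        self._owner_of: Dict[int, int] = {}   # locked address -> owning CPU

    def request(self, cpu: int, address: int, lock_asserted: bool) -> bool:
        """Return True if BGT# may be given now, False if the CPU must stall."""
        owner = self._owner_of.get(address)
        if owner is not None and owner != cpu:
            return False                      # another CPU holds this address
        already_holds = cpu in self._owner_of.values()
        if lock_asserted and not already_holds:
            # By convention only the first locked access defines the lock.
            self._owner_of[address] = cpu
        return True

    def release(self, cpu: int) -> None:
        """Called when this CPU's LOCK# deasserts."""
        for addr in [a for a, c in self._owner_of.items() if c == cpu]:
            del self._owner_of[addr]
```

A stalled CPU simply retries request() later; because BGT# has not been given, its 82495 keeps snooping in the meantime, as the text requires.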
Would software want to be able to cache lockable locations? Since they are used for interprocessor or interprocess communication, it might seem dangerous to
keep them "hidden" in a cache. However, caching allows a CPU to read a semaphore repeatedly without
generating bus traffic, waiting until the semaphore is
free as indicated by a zero value. These reads can be
done in non-locked fashion. If a copy of the semaphore
is cached, no bus traffic is used for the reads, and the
semaphore value still gets updated via the normal
MESI consistency hardware when the semaphore's
owner writes it with a new value.
KLOCK# de-assertion for back-to-back Intel486 DX
CPU locked accesses is required of the MBC if it uses
address-based locking, so that the lock-manager knows
the correct address. The i860 XP CPU always deactivates
LOCK# for at least one clock between separate
locked regions, by virtue of its deactivation in the clock
after the last locked ADS#. However, the Intel486 DX
CPU deactivates LOCK# only in the clock after the
last BRDY# of the last locked access. Thus LOCK#
and KLOCK# may not deactivate when two XCHG
instructions occur in succession. The MBC can insert a
deactivation of the M-bus MLOCK# signal by knowing
that all Intel486 DX CPU locked accesses are
Read-Modify-Write sequences. The MBC should deassert
MLOCK# regardless of KLOCK#'s value, after the
write.
Deassertion of KLOCK# by the MBC hardware may
be required in any Intel486 DX CPU system, to avoid
bus timeout and starvation of other bus masters when a
continuous stream of locked accesses occurs in one
processor's program. Without it, one processor could
monopolize the bus and prevent re-arbitration.
CPLOCK#
CPLOCK# has a purpose similar to KLOCK# in
Intel486 DX CPU systems, but is unused in i860 XP
CPU systems. PLOCK# (Pseudo-LOCK) indicates an
atomic 8-byte 2-transfer write for floating-point data
which should not be interrupted. The 4-byte bus of the
Intel486 DX CPU requires 2 transfers for an 8-byte
datum, and if only half the transfer gets done before
another bus master reads memory, half-wrong data
could be read.
Thus the MBC should neither relinquish the bus nor
require snoops of its 82495 from the time of the BGT#
for the first write (when CPLOCK# was asserted by
82495) through the BGT# of the second write. This
increases the worst-case delay of writeback for an 82495
snoop-hit to a modified line; to avoid the delay, the
MBC can tie the CPLOCK# [PLOCKEN] pin low to
disable PLOCK functionality.
9.0 MORE ALTERNATIVES
In addition to the options discussed above, several other choices affect Memory Bus Controller design.
M-bus clocking should be chosen to allow future versions
of 82495 and 82490 at higher clock speeds. Upgrading
the CPU module performance by replacing the
processor and 82495/82490 will be possible. While
some redesign of the CPU-side MBC state machines
may be needed for faster clocks, the memory bus can
remain the same. Thus an asynchronous interface with
either a strobed unclocked M-bus or a clocked M-bus at
less than 50 MHz is advised. A fully synchronous
M-bus/CPU MBC would be difficult to move to higher
clock speed.
One convenient way to design the MBC is with the
M-bus MCLK = 0.5·CLK. Probably it will be possible to keep the M-bus at half the CPU CLK rate, even
with faster CPUs. The big advantage of this half-speed
link is that no synchronizers are needed within the
MBC if the MCLK and CLK edges are skew-controlled. The MBC can be totally on CLK, as in the
design example of Appendix B.
The choice between a Strobed or Clocked M-bus is often
determined by existing bus protocols in which
82495/82490 will be used. Most existing buses are
clocked; however, Futurebus+ requires all bus entities
to use strobed transfers, but allows an optional clocked
mode for high-speed packet transfers [Fbus90]. The
tradeoffs are shown in Table 2.
Line size and M-bus width also determine upgradability
to possible future versions of 82490 on the same
M-bus, with more than 32kB per chip. If a higher-density
82490 becomes available, the fact that 82495 has 8k
tags requires:
128 data bytes per tag (128 byte line, or sectored
64-byte lines)
AND
8-byte or 16-byte memory bus width
to allow a 1 MByte or 2 MByte 82490 configuration. If
a smaller bus is used, a larger 82490 is possible, but
the bus-size multiplexing described earlier would be
needed.
Writeback (WB) cache policy is advised for high-performance
(multi)processors to limit bus traffic. However,
a writethrough (WT) design is simpler for the MBC
because there never is a need to back off the 82495 due
to MHITM#. In fact, the snoop window in a WT system
becomes unnecessary and SWEND# can be activated
simultaneously with KWEND#. In such a system,
the only states of cache lines are S or I. Snooping has
no effect during reads and only causes invalidations (in
the slaves) for writes in a WT design. Cache-to-cache
transfers and RFO are irrelevant.
Table 2. Clocked vs. Strobed MBUS Tradeoffs
| CLOCKED MBUS Advantages | STROBED MBUS Disadvantages |
| Design techniques for clocked systems are well known. | MBC design may require delay lines and nonconventional design techniques. |
| Fast arbitration using MCLK state machines. | Arbitration slow because signal must be synchronized at arbiter and at modules. |
| Burst transfers proceed at one datum per MCLK. | Burst throughput slowed if each transfer requires acknowledgement from receiver. |
| CLOCKED MBUS Disadvantages | STROBED MBUS Advantages |
| Must round up delays to MCLK period quanta, e.g., 33 ns delay means two 30 ns MCLKs needed. | Delays determined by device speed and physics, not by MCLK quanta. |
| Some 82495-to-82495 signals must be twice synchronized: once at sender, once at receiver. | Each signal goes through synchronizer once, only at receiver, so less time is lost at synchronizers. |
| Backplane length limited. | Fewer limits on backplane length or capacitance or number of boards. |
| MCLK skew must be controlled. | No clock skew worries. |
| Requires assumptions on CLK vs. MCLK speed ratio: for example, CLK > MCLK > CLK/2. | Any CLK frequency will work. |
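The rounding penalty named in the table for clocked buses is a ceiling division. A one-function Python sketch, with a hypothetical function name, reproduces the 33 ns example:

```python
import math


def mclks_needed(delay_ns: float, mclk_period_ns: float) -> int:
    """Whole MCLK periods a clocked M-bus must spend to cover a delay."""
    return math.ceil(delay_ns / mclk_period_ns)
```

A 33 ns delay on a 30 ns MCLK costs 2 periods (60 ns), so 27 ns are lost to quantization; a strobed bus would not pay that penalty.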
10.0 MBC DIFFERENCES FOR i860 XP CPU VERSUS Intel486 DX CPU
The same MBC design can be used for either i860 XP
CPU or Intel486 DX CPU if the MBC supersets the
requirements of the two. A "CPU_TYPE" configuration
pin can be included in the MBC to modify its behavior.
First, make the features as common as possible:
• Choose a configuration acceptable for both CPUs:
a) 256 kBytes, 4 transfers/line, 64-bit M-bus, 32-byte
line.
b) 512 kBytes, 4 transfers/line, 128-bit M-bus,
64-byte line.
c) 256 kBytes, 8 transfers/line, 64-bit M-bus, 64-byte
line.
d) 512 kBytes, 8 transfers/line, 128-bit M-bus,
128-byte line.
• i860 XP CPU pfld data is cached in 82490; no
optimizations are included for pfld.
• Assume that LOCK# duration does not matter (i.e.,
that back-to-back LOCK#ed requests from
Intel486 DX CPUs and long LOCK# cycles in i860
XP CPU do not cause bus ownership timeout).
Features Strictly for the Intel486 DX CPU:
• BE7-4# for M-bus must be synthesized by the
MBC from A2 and BE3-0#.
• CPLOCK# protection.
• WRMRST (warm reset) can be included for both
CPUs, but is optional.
Features Strictly for the i860 XP CPU:
• Burst writes from the CPU (Length = 2 and
Length = 4).
• A second 74F377 BE#-latch is needed, for i860 XP
CPU pins BE7#-BE4#, LEN, and CACHE#.
PCYC and CTYP can also be latched for debug purposes.
• PCHK# output from i860 XP CPU must be ignored
except during the CLK after BRDY# comes
from the MBC. PCHK# from Intel486 DX CPU is
always valid.
Differences between the MBCs:
• Configuration pin strapping of 82495 inputs.
• Decoding CPU request burst length from
CLEN1:0 (82495 pins in Intel486 DX CPU systems)
or LEN and CACHE# (i860 XP CPU).
• CPU line length: 16 bytes (Intel486 DX CPU) vs.
32 bytes (i860 XP CPU) means that the Intel486 DX
CPU MBC will give 2 BRDY#s for every 1 BRDY# of
the i860 XP CPU MBC.
Differences between Intel486 DX CPU and i860 XP
CPUs which have no impact on MBC:
• Intel486 DX CPU FLUSH# input pin.
• i860 XP CPU writeback caching, HITM#, and
BOFF#.
• i860 XP CPU CS8 vs. Intel486 DX CPU BS8#,
BS16# (none are really usable).
• Intel486 DX CPU RDY# pin and interruptible
bursts (not usable with 82495).
• i860 XP CPU acknowledges HOLD during
LOCK#.
• EADS# duty cycle (50% maximum for i860 XP
CPU and 100% for Intel486 DX CPU, but handled
by 82495).
• KEN# pin sampling interval by the CPU.
• Behavior of CPU in response to BOFF# assertion.
• i860 XP CPU BERR (Bus ERRor) pin versus
Intel486 DX CPU NMI (Non Maskable Interrupt).
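The byte-enable synthesis listed under the Intel486 DX CPU features can be illustrated as follows. This Python sketch rests on assumptions the text does not spell out: that A2 = 0 selects the lower four byte lanes of a 64-bit M-bus, that A2 = 1 selects the upper four, and that the unused half is driven inactive (all ones, since the enables are active-low). The function name is hypothetical.

```python
def synthesize_be7_0(a2: int, be3_0_n: int) -> int:
    """Return 8-bit active-low byte enables BE7-0# for a 64-bit M-bus.

    a2      : CPU address bit 2 (0 or 1)
    be3_0_n : the CPU's four active-low byte enables, BE3# in bit 3
    """
    if be3_0_n & ~0xF:
        raise ValueError("be3_0_n must fit in 4 bits")
    inactive = 0xF                       # four deasserted active-low enables
    if a2:
        # Assumed: upper dword addressed, so BE7-4# carry the CPU enables.
        return (be3_0_n << 4) | inactive
    # Assumed: lower dword addressed, so BE7-4# stay inactive.
    return (inactive << 4) | be3_0_n
```

For a full dword access (be3_0_n = 0b0000) the result is 0xF0 when A2 = 0 and 0x0F when A2 = 1.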
11.0 SUMMARY
The interface between a CPU/82495/82490 chip set
and a system memory bus allows much flexibility and a
wide range of performance options. The simplest MBC
can be a few PALs, while a top-performance multiprocessing
version may take thousands of gates on an
ASIC. Signal pin counts for the MBC can range from
70 to 120, varying with the memory bus definition
implemented by the MBC.
While beyond the scope of this document, topics for
consideration include detailed timing diagrams, critical
path analysis, simulation of bus traffic, and hit rates.
Useful also are simulations of performance impact of
the number of CPUs, WB versus WT policy, memory
latency, CTCT, RFO, and duplicate tags. Also at issue
are interrupt controller hardware, PAX concurrency
control, boundary scan and selftest, PC-compatibility
implications, i860 XP CPU pfld options, and high-speed
design issues of impedance, termination, and
noise.
12.0 BIBLIOGRAPHY
[i401] Intel486 DX CPU Microcomputer Model 401
Board Technical Reference Manual, order #504366.
[Agarwal88] "An Evaluation of Directory Schemes for
Cache Coherence", by A. Agarwal, R. Simoni, J. Hennessy,
and M. Horowitz (CH2524-2/88/000/0280
IEEE 1988).
[Intel486] Intel486 DX CPU Microprocessor Data
Sheet, Intel order #240440.
[Fbus90] "Futurebus+: Its features, and how to use
them," John Black, in VMEbus Systems Magazine,
Feb. 1990, p. 23:40.
[Ham90] Tucker Hammerstrom, "Metastability," Intel
Techbit #PLD-0390, March 1990.
[i860XPDS] i860 XP CPU Microprocessor Data Sheet,
Intel order #240874.
[Thakkar90] "Performance of an OLTP Application
on Symmetry Multiprocessor System," by S. Thakkar
and M. Sweiger (CH2887-8/90/0000/0228), 1990
IEEE Intl Conference on Computer Architecture.
[82495/490DS] 82495XP Cache Controller/82490XP
Cache RAM Data Sheet, Intel order #240956.
Figure C-2. Non-Aborted Read Cycles
(Timing diagram 240957-19. Signals shown include MD[63:0], MBRDY#, MSNPSTB#, MSWENDI#, MHITMI#, YMBRDY#, YMEOC#, and YCEOC#.)
Figure C-2. Non-Aborted Read Cycles (Continued)
(Timing diagram 240957-20. Signals shown include CLK, ADS#, CADS#, BGT#, KWEND#, CRDY#, BRDY#, MCLK, MBREQ, MADS#, MA[31:3], MW/R#, MNA#, MKEN#, MD[63:0], MBRDY#, MSNPSTB#, MSWENDI#, MHITMI#, YMBRDY#, YMEOC#, and YCEOC#.)
Aborted Non-Pipelined Cycles
Figure C-3 illustrates an aborted non-pipelined cycle.
MHITMI# is sampled active during MSWENDI#
(clock 4), indicating a snoop hit to a modified line. Since
the cycle is non-pipelined, MABORT# is issued immediately
and the core floats its bus (clock 5). Although
the bus is floated by the master core, the master still
owns the bus (MHLDA remains inactive).
MABORT# in clock 5 causes the main memory to
abort its cycle regardless of the number of MBRDY#s that
have been issued. MBOFF# is also asserted in clock 5
to indicate to the snooping core that the master is floating
its signals and the write-back may begin. The main
memory floats its data bus in clock 6 in response to
MABORT#. In the following clocks a snoop write-back
cycle is performed by the snooper. The snooper
will release the bus at the end of the write-back.
Note that MSNPSTB# is not asserted during the
write-back cycle since it obviously will not hit any
cache.
Aborted Pipelined Cycles
Figure C-4 illustrates an aborted pipelined cycle. Although
MHITMI# is sampled active during
MSWENDI# (clock 7), MABORT# will not be issued
immediately since the previous cycle has not been completed
yet. MABORT# is issued in clock 9 after
the last data slice was read into the core. The core floats
its bus and asserts MBOFF# concurrently with
MABORT#. Upon sampling MBOFF#, the snooping
MBC begins the snoop write-back in clock 10.
Write Allocate
Figure C-5 illustrates a write cycle which is potentially
allocatable. This write is performed on the bus only in
order to sample MKEN#, since the allocation cycle
will only be guaranteed if MKEN# is active.
MKEN# is sampled active in clock 2, causing
MABORT# to be issued immediately. The write cycle
is aborted even before MSWEND# because a read for
ownership cycle is guaranteed to be performed after
the aborted write.
In clock 4 the MADS# of the allocation cycle, which
becomes the MADS# of the read for ownership cycle,
is issued. This MADS# is issued only if MSWEND#
has not been issued yet, or if MSWEND# was issued
and MHITMI# was negated. If MHITMI# is asserted
during the MSWEND# that was issued, MADS# will
not be issued (since the snooper issues its MADS#).
A second MABORT# is issued in clock 8, signaling
the memory to abort the allocation and the snooper to
start flushing the modified line. Note that a second
MABORT# will be issued regardless of whether the MADS# of
Figure C-3. Aborted Non-Pipelined Cycle
(Timing diagram 240957-21. Signals shown: MCLK, MADS#, MA[31:3], MW/R#, MNA#, MKEN#, MD[63:0], MBRDY#, MSNPSTB#, MSWEND#, MHITMI#, MABORT#, MBOFF#.)
Figure C-4. Aborted Pipelined Cycles
(Timing diagram 240957-22. Signals shown: MCLK, MADS#, MA[31:3], MW/R#, MNA#, MKEN#, MD[63:0], MBRDY#, MSNPSTB#, MSWEND#, MHITMI#, MABORT#, MBOFF#.)
the allocation was issued or not. The first MABORT#
(clock 3) aborts the write cycle in the memory module
and does not affect the snooper. The second
MABORT# (clock 8) tells the snooper to start
its write-back cycle (and, if the MADS# of an allocation
was issued, also aborts it in the memory module).
MSNPSTB# is not issued for the allocation cycle since
write and allocation cycles access the same line.
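The write-allocate decisions described for Figure C-5 can be summarized as a few boolean rules. The following Python sketch is only an illustration under stated assumptions: the class and function names are hypothetical, signals are modeled as booleans meaning "asserted", and clock-by-clock timing is ignored. It is not the MBC state machine itself.

```python
from dataclasses import dataclass


@dataclass
class SnoopWindow:
    """Snoop status as seen by the master MBC (hypothetical model)."""
    mswend_seen: bool  # True once MSWEND# has been sampled active
    mhitm: bool        # True if MHITMI# was active when MSWEND# was sampled


def write_is_aborted_for_allocation(mken_active: bool) -> bool:
    # Active MKEN# guarantees the allocation, so the write is aborted at once;
    # inactive MKEN# lets the write continue as a non-allocatable write.
    return mken_active


def issue_allocation_mads(snoop: SnoopWindow) -> bool:
    # MADS# of the allocation goes out if the snoop window is still open, or
    # if it closed without a hit to a modified line. Otherwise the snooper
    # issues its own MADS# for the write-back.
    return (not snoop.mswend_seen) or (not snoop.mhitm)


def second_mabort_needed(snoop: SnoopWindow) -> bool:
    # A hit to a modified line triggers the second MABORT#, whether or not
    # the allocation MADS# was already issued.
    return snoop.mswend_seen and snoop.mhitm
```

For example, `issue_allocation_mads(SnoopWindow(mswend_seen=True, mhitm=True))` evaluates to `False`, matching the case in which the snooper takes over the bus.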
If MKEN# had been negated in clock 2, then an allocation
would not have been performed and the write cycle
would have continued as a non-allocatable write cycle
(see Figure C-6).
Non-Allocatable Write
Figure C-6 illustrates a write cycle without an allocation.
It can be either a non-potentially allocatable write
cycle or a potentially allocatable write with inactive
MKEN# (clock 1).
The write cycle is aborted (MABORT# in clock 3)
after sampling active MHITM# during MSWEND#
(clock 2). In clock 11 the master core re-issues the
MADS# of the aborted write cycle (after the snoop
write-back has been completed). MSNPSTB# will not
be issued again since the updated data had been written
into the main memory and the snooper has gone to the
invalid state.
LIMITATIONS OF DESIGN
The primary limitation of the implementation as it has
been presented so far is that it includes only two processors.
The protocol set up in the design is not limited to
two processors. The next section outlines the implementation
details which must be modified to extend the
design to more than two processors.
The design has no support for CS8 mode, so the processors
cannot be booted from 8-bit EPROMs. Instead,
both processors boot in 64-bit mode, which may complicate
the use of the design in stand-alone systems.
The i860 XP CPU's BERR, or Bus ERRor, input is not
utilized in this design. The pin could be used simply as
a non-maskable interrupt pin, but the memory bus controller
as designed makes no provision to use BERR to
correct a faulty bus access. Likewise, the parity check
results from the i860 XP CPU's PCHK# pin are of
little value in this design outside of testing the i860 XP
CPU's parity functions. The MBC itself does not check
the PCHK# output, and has no means of reissuing an
access in case of parity error.
The memory bus controller design here does not decode
and utilize the i860 XP CPU INTA cycles. The INT
pin itself is connected directly to the i860 XP CPU,
without affecting MBC operation.
Figure C-5. Potentially Allocatable Write
(Timing diagram 240957-23. Signals shown: MCLK, MADS#, MA[31:3], MW/R#, MNA#, MKEN#, MD[63:0], MBRDY#, MSNPSTB#, MSWEND#, MHITMI#, MABORT#, MBOFF#.)
Figure C-6. Non-Allocatable Write
(Timing diagram 240957-24. Signals shown: MCLK, MADS#, MA[31:3], MW/R#, MNA#, MKEN#, MD[63:0], MBRDY#, MSNPSTB#, MSWEND#, MHITMI#, MABORT#, MBOFF#.)
The MultiProcessor Interrupt Controller (MPIC) currently being designed by Intel is not utilized in or supported by this memory bus controller.
The memory bus controller's treatment of LOCKed cycles is simple and straightforward: when the 82495 issues
a memory access which is LOCKed (KLOCK# active), the MBC will not relinquish the bus until a cycle
which is not LOCKed is issued. While this is adequate
for simple systems, it will not suffice for dual ported
memories, where a given block of memory can be accessed
through more than one bus. In such systems, a
LOCK signal must be introduced to alert all possible
simultaneous users of memory that a LOCKed access is
in progress.
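The single-bus LOCK rule above amounts to holding the bus through any run of LOCKed cycles. A minimal Python sketch, assuming a hypothetical model in which each bus cycle carries one boolean for KLOCK# (True meaning asserted) and another master requests the bus at a known cycle index:

```python
from typing import List, Optional


def handover_cycle(klock_active: List[bool], request_at: int) -> Optional[int]:
    """Return the index of the first cycle at which the bus may be given up.

    klock_active[i] is True when cycle i is issued with KLOCK# active.
    request_at is the cycle index at which another master asks for the bus.
    Returns None if every remaining cycle is LOCKed.
    """
    for i in range(request_at, len(klock_active)):
        if not klock_active[i]:
            # First non-LOCKed cycle: the MBC may relinquish the bus here.
            return i
    return None
```

With cycles `[False, True, True, False]` and a request arriving at cycle 1, the function returns 3: the request waits out both LOCKed cycles. The sketch does not cover the dual-ported case, which needs a separate LOCK signal as noted above.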
EXTENSION OF DESIGN TO THREE
OR MORE CPUs
Two Processor Implementation Overview
Figure C-7 presents a simplified view of the multiprocessing signals for the two processor implementation.
The basic address, data, and memory cycle control lines
are attached to a common bus. Only the core which
controls the bus will drive these signals, with all other
cores floating these lines and asserting MHLDA#.
When the bus master MBC issues a cycle, the
MCACHE# and MW/R# cycle attributes also serve
to drive the 82495s' SNPINV and SNPNCA inputs of
both cores. SNPSTB# is issued by the master in the
clock following MADS#. In reality, both cores have a
SNPSTB# output at their V-side state machines driving a common line which connects to the SNPSTB#
input of both 82495s. The core which does not own the
bus floats its state machine driver on MHLDA, so the
signal acts only as an input in that core. The master
drives the SNPSTB# line, but the action of SNPSTB#
is blocked in its own 82495 because its MAOE# signal
is asserted.
The results of the snoop are driven out on the snooping
core's MTHIT# and MHITMO# outputs, and
MSWENDO# is asserted. These signals are connected
directly to the MWB/WT#, MHITMI#, and
MSWENDI# inputs in the master core, respectively.
The MBOFF# signals of the two MBCs are also connected together. During MHLDA (in a snooping
MBC) MBOFF# is an input, and in the master it is an
output. If the master asserts MBOFF#, control of the
data and control busses is given to the snooping MBC
so that a snoop write-back can be performed.
Three or More Processors
This section gives one method of extending the design
given here to three or more processors. The solution
presented here assumes that no changes are made to the
state machines as they are written for the two processor
system. Instead, some minor glue logic is added to three
of the signals to make the core an element in a scalable
multiprocessing system. However, modifying the state
machines is also a plausible solution.
In an implementation with three or more processors,
the primary address, data, and cycle control lines are
still connected to a common bus, as in the two processor
version. MCACHE# and MW/R# are also utilized in the same way as the two processor version: the
outputs of the cores drive a common line which in turn
also drives the 82495 SNPNCA and SNPINV inputs of
all cores.
The SNPSTB# signal connects directly from core to
core in a two processor version. In an implementation
with three or more processors, the SNPSTB# line is
simply extended to all the processors in the system.
Only the bus master will actually drive the line, and
snoopers will be floating the SNPSTB# output from
their state machines. Again, the snoop request is ignored in the master because its MAOE# is asserted.
Similarly, the MBOFF# signal becomes a common line
which only the master will drive and which all other
cores will sample.
The six signals in the upper portion of Figure C-7,
which communicate MSWEND and the snoop results
MHITMO# and MTHIT#, will require more glue
logic to extend the design to three or more processors.
The snoop results MHITMO# and MTHIT# must
now be considered for multiple cores when a snoop has
been issued, and the master MBC must not sample
these results until all snooping cores have issued their
MSWENDO#.
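The sampling rule just stated can be sketched as a small model. The Python below is an illustration only: the type and function names are hypothetical, every signal is represented by its electrical level (0 meaning asserted, since the signals are active low), and the combining is shown as pure logic rather than as any particular glue circuit.

```python
from typing import List, NamedTuple, Optional


class SnoopOutputs(NamedTuple):
    """Levels driven by one snooping core (0 = asserted, active low)."""
    mswendo_n: int
    mhitmo_n: int
    mthit_n: int


class MasterInputs(NamedTuple):
    """Levels presented to the master core's MHITMI# and MWB/WT# inputs."""
    mhitmi_n: int
    mwbwt_n: int


def combine(snoopers: List[SnoopOutputs]) -> Optional[MasterInputs]:
    # The master must not sample until every snooper has asserted MSWENDO#.
    if any(s.mswendo_n == 1 for s in snoopers):
        return None
    # An active-low result is asserted if any snooper asserts it, which is
    # the AND (here: minimum) of the individual levels.
    mhitmi_n = min((s.mhitmo_n for s in snoopers), default=1)
    mwbwt_n = min((s.mthit_n for s in snoopers), default=1)
    return MasterInputs(mhitmi_n, mwbwt_n)
```

With two snoopers where only one has finished its snoop, `combine` returns `None`; once both have finished and one reports a tag hit, the master sees `MasterInputs(mhitmi_n=1, mwbwt_n=0)`.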
To resolve these issues, common bus lines carrying
these signals are introduced, where all cores have outputs driving these lines, and inputs to sample them. The
characteristics of such MTHIT# and MHITM# lines
are straightforward: the line should default to 1, and if
any core drives one of these outputs low, the line
should be pulled low. The MTHIT# line has the simplest solution. As shown in Figure C-8, by passing the
signal which is produced by the core through an open
collector buffer, the buffered MTHIT#s can be tied to
a single line which is sampled directly by all cores'
MWB/WT# pins. The open collector buffer sinks current like a normal gate output to drive a logic 0, but
instead of driving current for a logic 1, the open collector
device assumes a high impedance state for logic 1.
Thus, if all of the cores output MTHIT# as 1, the
MTHIT# line remains at a logic 1 level because of the
pull-up resistor. If one or more cores output a logic 0,
the MTHIT# line will be pulled to the logic 0 level.
This precisely matches the desired behavior of
MTHIT# for the system: if one or more cores have
the snooped data cached, the master MWB/WT# input must be asserted low. It is important to note that
[Figure: multiprocessing signal connections between CORE A and CORE B. Paired signals: MSWENDO# (O) / MSWENDI# (I), MHITMO# (O) / MHITMI# (I), MTHIT# (O) / MWB/WT# (I). Shared lines: MBOFF# (I/O), MSNPSTB# (I/O), SNPINV (I), SNPNCA (I).]
Appendix C Schematic: i860 XP CPU
[Appendix C schematic sheets 240957-98 through 240957-A1: 82490XP cache RAM devices U2201, U2202, U2301, U2302, U2401, U2402, U2501, and U2502.]
Appendix C Schematic: Clock Generator
[Appendix C schematic sheets 240957-A3 through 240957-B1: 85C224 EPLDs labeled EASTB, EBGTKWN, ESWND, EWCPLB, EBRDY, AYMBTRCK, EWMSWND, EABORT, EMSNPST, EMEMALE, ECYCDEF, ESIGGEN, EMBE, and EMZBT, followed by the memory bus pull-up resistors.]
i960™ Microprocessor Family
80960SA/80960SB
EMBEDDED 32-BIT PROCESSORS
WITH 16-BIT BURST DATA BUS
• High-Performance Embedded Architecture
- 16 MIPS Burst Execution at 16 MHz
- 5 MIPS* Sustained Execution at 16 MHz
• Built-In Interrupt Controller
- 4 Direct Interrupt Pins
- 32 Priority Levels, 256 Vectors
• Built-In Floating Point Unit (80960SB only)
- Fully IEEE 754 Compatible
• 512-Byte On-Chip Instruction Cache
- Direct Mapped
- Parallel Load/Decode for Uncached Instructions
• Easy to Use, High Bandwidth 16-Bit Bus
- 25.6 Mbyte/sec Burst
- Up to 16 Bytes Transferred per Burst
• Multiple Register Sets
- Sixteen Global 32-Bit Registers
- Sixteen Local 32-Bit Registers
- Four Local Register Sets Stored On-Chip
- Register Scoreboarding
• 32-Bit Address Space, 4 Gigabytes
• 80-Lead Quad Flat Pack (EIAJ QFP)
• 84-Lead Plastic Leaded Chip Carrier (PLCC)
• Software Compatible with 80960KA/KB/CA Processors
The 80960SA and 80960SB are members of Intel's i960 32-bit processor family, which are designed especially
for low cost embedded applications. They are based on the family's high performance, common core architecture, and include a 512-byte instruction cache and a built-in interrupt controller. The 80960SA and 80960SB
have a large register set, multiple parallel execution units and a high bandwidth, 16-bit, burst bus. Using
advanced RISC technology, these high performance processors are capable of execution rates in excess of
5 million instructions per second.* The 80960SA and 80960SB are well-suited for a wide range of cost
sensitive embedded applications such as laser printers, EISA and MCA adapters, disk controllers and X
Terminals.
*Relative to Digital Equipment Corporation's VAX-11/780 at 1 MIPS.
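The 25.6 Mbyte/sec burst figure in the feature list is consistent with a 16-byte burst on a 16-bit bus at 16 MHz if each burst costs two clocks beyond its eight data transfers. That two-clock overhead is an assumption made here to reproduce the quoted number (for instance one address clock and one recovery clock); this page does not state it. A small Python check of the arithmetic, with a hypothetical helper name:

```python
def burst_bandwidth(clock_hz: float, bus_bytes: int,
                    transfers: int, overhead_clocks: int) -> float:
    """Bytes per second sustained by back-to-back bursts.

    Each burst moves bus_bytes * transfers bytes and occupies
    transfers + overhead_clocks clock periods (overhead is an assumption).
    """
    bytes_per_burst = bus_bytes * transfers
    clocks_per_burst = transfers + overhead_clocks
    return clock_hz * bytes_per_burst / clocks_per_burst


# 16 MHz clock, 2-byte bus, 8 transfers (16 bytes), 2 assumed overhead clocks
rate = burst_bandwidth(16e6, 2, 8, 2)
```

Under these assumptions `rate` is 25.6 million bytes per second, while a bus with no overhead at all would peak at 32 million.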
[Block diagram 270917-1. Blocks shown: Floating Point Registers and 80-bit Floating Point Unit (80960SB), sixteen 32-bit Global Registers, 64 by 32-bit Local Register Cache, 32-bit Integer Execution Unit, Instruction Fetch Unit, 512-Byte Instruction Cache, Instruction Decoder, Micro-Instruction Sequencer, ROM, Bus Control Logic, Interrupt Controller, and the 32-bit Address / 16-bit Data Burst Bus.]
VAX-11™ is a trademark of Digital Equipment Corporation.
October 1991
Order Number: 270917-003
THE i960™ PROCESSOR SERIES
The 80960SA and 80960SB are members of a new
family of 32-bit microprocessors from Intel known as
the i960 Series. This series was especially designed
to serve the needs of embedded applications. The
embedded market includes applications as diverse
as industrial automation, avionics, image processing,
graphics, robotics, telecommunications and automobiles. These types of applications require high integration, low power consumption, quick interrupt response times and high performance. Since time to
market is critical, embedded microprocessors need
to be easy to use in both hardware and software
designs.
All members of the 80960 series share a common
core architecture which utilizes RISC technology so
that, except for special functions, the family members are object code compatible. Each new processor in the series will add its own special set of functions to the core to satisfy the needs of a specific
application or range of applications for the embedded market. For example, future processors may include
a DMA controller, a timer or an A/D converter.
Software written for the 80960SA and 80960SB will
run without modification on any other member of the
80960 family. The 80960SA is pin compatible with
the 80960SB, which includes an integrated floating-point unit.
[Diagram 270917-2: register set and address space.]
NOTES:
1. Register g15 is reserved for stack management functions.
2. Floating-Point registers and operations are available only in the 960SB and 960KB processors.
3. Registers r0, r1 and r2 are reserved for stack management functions.
4. Register g14 is used by BAL and BALX instructions.
Figure 2. 80960 Register Set
Key Performance Features
The 80960SA and 80960SB architecture is based
on the most recent advances in RISC technology
and is grounded in Intel's long experience in designing embedded controllers. Many features contribute
to the 80960SA and 80960SB's exceptional performance.
1. Large Register Set. Modern compilers can take
advantage of a large number of registers to optimize
execution speed. For maximum flexibility, the
80960SA and 80960SB provide 32 32-bit registers
and four 80-bit floating-point registers. (See
Figure 2.)
2. Fast Instruction Execution. Simple functions
make up the bulk of instructions in most programs,
so that execution speed can be greatly improved by
ensuring that these core instructions execute in as
short a time as possible. The most-frequently executed instructions such as register-register moves,
add/subtract, logical operations, and shifts execute
in one to two cycles (Table 1 contains a list of instructions).
3. Load/Store Architecture. Like other processors
based on RISC technology, the 80960SA and
80960SB have a Load/Store architecture. Only the
LOAD and STORE instructions reference memory;
all other instructions operate on registers. This type
of architecture simplifies instruction decoding and is
used in combination with other techniques to increase parallelism.
4. Simple Instruction Formats. All instructions in
the 80960SA and 80960SB are 32 bits long and
must be aligned on word boundaries. This alignment
makes it possible to eliminate the instruction-alignment stage in the pipeline. To simplify the instruction
decoder further, there are only five instruction formats and each instruction type uses only one format. (See Figure 3.)
5. Overlapped Instruction Execution. A load operation allows execution of subsequent instructions to
continue before the data has been returned from
memory, so that these instructions can overlap the
load. The 80960SA and 80960SB manage this process transparently to software through the use of a
register scoreboard. Conditional instructions also
make use of a scoreboard so that subsequent unrelated instructions can be executed while the conditional instruction is pending.
[Diagram 270917-3: field layouts of the five instruction formats: Control, Compare and Branch, Register to Register, Memory Access-Short, and Memory Access-Long.]
Figure 3. Instruction Formats
Table 1. 80960SA and 80960SB Instruction Set
Data Movement: Load; Store; Move; Load Address
Arithmetic: Add; Subtract; Multiply; Divide; Remainder; Modulo; Shift; Extended Multiply; Extended Divide
Logical: And; Not And; And Not; Or; Exclusive Or; Not Or; Or Not; Nor; Exclusive Nor; Not; Nand; Rotate
Bit and Bit Field: Set Bit; Clear Bit; Not Bit; Check Bit; Alter Bit; Scan for Bit; Scan over Bit; Extract; Modify
Comparison: Compare; Conditional Compare; Compare and Increment; Compare and Decrement
Branch: Unconditional Branch; Conditional Branch; Compare and Branch
Call/Return: Call; Call Extended; Call System; Return; Branch and Link
Fault: Conditional Fault; Synchronize Faults
Debug: Modify Trace Controls; Mark; Force Mark
Miscellaneous: Atomic Add; Atomic Modify; Flush Local Registers; Modify Arithmetic Controls; Scan Byte for Equal; Test Condition Code
Decimal: Move; Add with Carry; Subtract with Carry
Synchronous: Synchronous Load; Synchronous Move
Conversion (80960SB only): Convert Real to Integer; Convert Integer to Real
Floating-Point (80960SB only): Move Real; Add; Subtract; Multiply; Divide; Remainder; Scale; Round; Square Root; Sine; Cosine; Tangent; Arctangent; Log; Log Binary; Log Natural; Exponent; Classify; Copy Real Extended; Compare
6. Integer Execution Optimization. When the result of an operation is used as an operand in a subsequent calculation, the value is sent immediately to its destination register. At the same time, the value is put back on a bypass path to the ALU, thereby saving the time that otherwise would be required to retrieve the value for the next operation.
7. Bandwidth Optimizations. The 80960SA and 80960SB get optimal use of their memory bus bandwidth because the bus is tuned for use with the cache; the line size of the instruction cache matches the maximum burst size for instruction fetches. The 80960SA and 80960SB automatically fetch four words in a burst and store them directly in the cache. Due to the size of the cache and the fact that it is continually filled in anticipation of needed instructions in the program flow, the 80960SA and 80960SB are exceptionally insensitive to memory wait states. In fact, each wait state causes only a 10% degradation in system performance. The benefit is that the 80960SA and 80960SB will deliver outstanding performance even with a low cost memory system.
8. Cache Bypass. If there is a cache miss, the processor fetches the needed instruction, then sends it on to the instruction decoder at the same time it updates the cache. Thus, no extra time is taken to load and read the cache.
Memory Space and Addressing Modes
The 80960SA and 80960SB offer a linear programming environment so that all programs running on the processors are contained in a single address space. The maximum size of the address space is 4 Gigabytes.
For ease of use, the 80960SA and 80960SB have a small number of addressing modes, but include all those necessary to ensure efficient compiler implementations of high-level languages such as C, Fortran and Ada. Table 2 lists the memory addressing modes.
Data Types
The 80960SA and 80960SB recognize the following data types:
Numeric:
• 8-, 16-, 32- and 64-bit ordinals
• 8-, 16-, 32- and 64-bit integers
• 32-, 64- and 80-bit reals
Non-Numeric:
• bit
• bit field
• Triple-Word (96 bits)
• Quad-Word (128 bits)
Large Register Set
The programming environment of the 80960SA and 80960SB includes a large number of registers. In fact, 32 registers are available at any time. The availability of this many registers greatly reduces the number of memory accesses required to execute most programs, which leads to greater instruction processing speed.
There are two types of general-purpose registers: local and global. The global registers consist of sixteen 32-bit registers (G0 through G15). These registers perform the same function as the general-purpose registers provided in other popular microprocessors. The term global refers to the fact that these registers retain their contents across procedure calls.
The local registers, on the other hand, are procedure specific. For each procedure call, the 80960SA and 80960SB allocate 16 local registers (R0 through R15). Each local register is 32 bits wide.
Multiple Register Sets
To further increase the efficiency of the register set, multiple sets of local registers are stored on-chip. This cache holds up to four local register frames, which means that up to three procedure calls can be made without having to access the procedure stack resident in memory.
Table 2. Memory Addressing Modes
• 12-Bit Offset
• 32-Bit Offset
• Register-Indirect
• Register + 12-Bit Offset
• Register + 32-Bit Offset
• Register + (Index-Register x Scale-Factor)
• Register x Scale-Factor + 32-Bit Displacement
• Register + (Index-Register x Scale-Factor) + 32-Bit Displacement
Scale-Factor is 1, 2, 4, 8 or 16
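The addressing modes in Table 2 all reduce to one sum of an optional base register, an optional scaled index register and an optional offset or displacement. The following is a minimal sketch of that sum, not the processor's actual address-generation logic. The function name, the keyword arguments and the wrap-around at 32 bits are assumptions made for illustration; the text only states the mode list, the scale factors and the 4 Gigabyte address space.

```python
# Illustrative model of the Table 2 addressing modes (names are hypothetical).
ALLOWED_SCALES = (1, 2, 4, 8, 16)   # scale factors listed under Table 2
MASK32 = 0xFFFFFFFF                 # assumed wrap-around in the 4 Gigabyte space


def effective_address(base=0, index=0, scale=1, displacement=0):
    """Return base + index * scale + displacement, truncated to 32 bits.

    Leaving an argument at its default drops that term, so each of the
    eight modes in Table 2 is a special case of this one expression.
    """
    if scale not in ALLOWED_SCALES:
        raise ValueError("scale factor must be 1, 2, 4, 8 or 16")
    return (base + index * scale + displacement) & MASK32


# Register + (Index-Register x Scale-Factor) + 32-Bit Displacement
full_mode = effective_address(base=0x1000, index=3, scale=4, displacement=0x20)

# Register-Indirect: only the base register contributes
indirect = effective_address(base=0x2000)
```

In this sketch the 12-bit and 32-bit offset modes differ only in the range of the value passed as displacement, which is an encoding detail the model does not check.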
Although programs may have procedure calls nested many calls deep, a program typically oscillates back and forth between only two or three levels. As a result, with four stack frames in the cache, the probability of there being a free frame in the cache when a call is made is very high. In fact, runs of representative C-language programs show that 80% of the calls are handled without needing to access memory.
If there are four or more active procedures and a new procedure is called, the processor moves the oldest set of local registers in the register cache to a procedure stack in memory to make room for a new set of registers. Global register G15 is used by the processor as the frame pointer (FP) for the procedure stack.
Note that the global registers are not exchanged on a procedure call, but retain their contents, making them available to all procedures for fast parameter passing. An illustration of the register cache is shown in Figure 4.
Figure 4. Multiple Register Sets are Stored On-Chip (the register cache holds four local register sets; each register is 32 bits wide)
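The behavior of the register cache described above can be sketched with a small counter model. This is an illustration under stated assumptions, not a description of the hardware: the class name, the method names and the bookkeeping are hypothetical, and the model only tracks how many frames are on-chip and how many have been moved to the procedure stack in memory.

```python
class RegisterCache:
    """Toy model of an on-chip cache of local register frames."""

    def __init__(self, frames=4):
        self.frames = frames    # four local register sets are held on-chip
        self.resident = 1       # the running procedure owns one frame
        self.in_memory = 0      # frames moved out to the procedure stack
        self.spills = 0         # calls that had to write a frame to memory
        self.fills = 0          # returns that had to read a frame back

    def call(self):
        if self.resident == self.frames:
            # Cache is full: the oldest frame goes to memory to make room.
            self.in_memory += 1
            self.spills += 1
        else:
            self.resident += 1

    def ret(self):
        self.resident -= 1      # the returning procedure's frame is freed
        if self.resident == 0:
            if self.in_memory == 0:
                raise RuntimeError("return from the outermost procedure")
            # The caller's frame was spilled earlier and must be reloaded.
            self.in_memory -= 1
            self.fills += 1
            self.resident = 1


cache = RegisterCache()
for _ in range(3):
    cache.call()            # three nested calls fit in the four frames
spills_after_three_calls = cache.spills     # 0
cache.call()                # the fourth nested call forces a spill
spills_after_four_calls = cache.spills      # 1
```

A program that oscillates between two or three call levels never reaches the spill path in this model, which is the situation the 80% figure above refers to.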
Instruction Cache
To further reduce memory accesses, the 80960SA and 80960SB include a 512-byte on-chip instruction cache. The instruction cache is based on the concept of locality of reference; that is, most programs are not usually executed in a steady stream but consist of many branches and loops that lead to jumping back and forth within the same small section of code. Thus, by maintaining a block of instructions in a cache, the number of memory references required to read instructions into the processor can be greatly reduced.
To load the instruction cache, instructions are fetched in 16-byte blocks, so that up to four instructions can be fetched at one time.
Code for small loops will often fit entirely within the cache, leading to a great increase in processing speed since further memory references might not be necessary until the program exits the loop. Similarly, when calling short procedures, the code for the calling procedure is likely to remain in the cache, so it will be there on the procedure's return.
Register Scoreboarding
The instruction decoder has been optimized in several ways. One of these optimizations is the ability to do instruction overlapping by means of register scoreboarding.
Register scoreboarding occurs when a LOAD instruction is executed to move a variable from memory into a register. When the instruction is initiated, a scoreboard bit on the target register is set. When the register is actually loaded, the bit is reset. In between, any reference to the register contents is accompanied by a test of the scoreboard bit to ensure that the load has completed before processing continues. Since the processor does not have to wait for the LOAD to be completed, it can go on to execute additional instructions placed in between the LOAD instruction and the instruction that uses the register contents, as shown in the following example:
LOAD address 1, R4
LOAD address 2, R5
Unrelated instruction
Unrelated instruction
ADD R4, R5, R6
In essence, the two unrelated instructions between the LOAD and ADD instructions are executed for free (i.e., take no apparent time to execute) because they are executed while the register is being loaded. Up to three instructions can be pending at one time with three corresponding scoreboard bits set. By exploiting this feature, system programmers and compilers have a useful tool for optimizing execution speed.
Floating-Point Arithmetic
In the 80960SB, floating-point arithmetic has been made an integral part of the architecture. Having the floating-point unit integrated on-chip provides two advantages. First, it improves the performance of the chip for floating-point applications, since no additional bus overhead is associated with floating-point calculations, thereby leaving more time for other bus operations such as I/O. Second, the cost of using floating-point operations is reduced because a separate coprocessor chip is not required.
The 80960SB floating-point (real number) data types include single-precision (32-bit), double-precision (64-bit) and extended precision (80-bit) floating-point numbers. Any register may be used to execute floating-point operations.
The processor provides hardware support for both mandatory and recommended portions of IEEE Standard 754 for floating-point arithmetic, including exponential, logarithmic and other transcendental functions. Table 3 shows execution times for some representative instructions.
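The register scoreboarding example (two LOADs, two unrelated instructions, one ADD) can be checked with a small cycle-counting sketch. Everything numeric here is an assumption made for illustration: the load latency of three cycles, the one-cycle issue time and the function name do not come from the data sheet, and the limit of three pending loads is not modeled.

```python
def cycles_to_finish(program, load_latency=3):
    """Count cycles for a list of (op, source_registers, dest_register).

    A LOAD issues in one cycle and sets a scoreboard bit on its destination;
    the bit clears load_latency cycles later (assumed value). Any instruction
    that touches a scoreboarded register waits until the bit clears.
    """
    clears_at = {}   # register -> cycle at which its scoreboard bit clears
    cycle = 0
    for op, sources, dest in program:
        touched = tuple(sources) + (dest,)
        issue = max([cycle] + [clears_at.get(reg, 0) for reg in touched])
        if op == "LOAD":
            clears_at[dest] = issue + 1 + load_latency
        cycle = issue + 1
    return cycle


with_unrelated = [
    ("LOAD", (), "R4"),
    ("LOAD", (), "R5"),
    ("OTHER", ("R8",), "R9"),      # unrelated instruction
    ("OTHER", ("R10",), "R11"),    # unrelated instruction
    ("ADD", ("R4", "R5"), "R6"),
]
without_unrelated = [
    ("LOAD", (), "R4"),
    ("LOAD", (), "R5"),
    ("ADD", ("R4", "R5"), "R6"),
]

# Both sequences finish in the same number of cycles under these assumptions,
# so the two unrelated instructions add no apparent time.
same_time = cycles_to_finish(with_unrelated) == cycles_to_finish(without_unrelated)
```

With a shorter assumed latency the unrelated instructions would no longer be fully hidden, so the benefit depends on how long the memory system takes to return data.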
Table 3. Sample Floating-Point Execution Times (µs) at 16 MHz
Function       32-Bit    64-Bit
Add            0.6       0.8
Subtract       0.6       0.8
Multiply       1.1       2.0
Divide         2.0       4.5
Square Root    5.8       6.1
Arctangent     15.8      20.5
Exponent       17.7      19.5
Sine           23.8      25.9
Cosine         23.8      25.9
High Bandwidth Bus
The 80960SA and 80960SB CPUs reside on a high-bandwidth address/data bus. The bus provides a direct communication path between the processor and the memory and I/O subsystem interfaces. The processor uses the bus to fetch instructions, manipulate memory and respond to interrupts. Its features include:
• 16-bit data path multiplexed onto the lower bits of the 32-bit address path
• Eight 16-bit half-word burst capacity, which allows transfers from 1 to 16 bytes at a time
• High bandwidth reads and writes at 25.6 Mbytes per second
Figure 5 identifies the groups of signals which constitute the bus. Table 4 lists the function of the bus and other processor-support signals, such as the interrupt lines.
Interrupt Handling
The 80960SA and 80960SB can be interrupted in one of two ways: by the activation of one of four interrupt pins or by sending a message on the processor's data bus.
The 80960SA and 80960SB are unusual in that they automatically handle interrupts on a priority basis and track pending interrupts through their on-chip interrupt controller. Two of the interrupt pins can be configured to provide 8259A handshaking for expansion beyond four interrupt lines.
Figure 5. 80960SA and 80960SB Bus Signal Groups (Address: 32 lines; Data: 16 lines; Control (operation signals): 15 lines; Arbitration: 2 lines)
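The 25.6 Mbytes per second figure quoted for the bus is consistent with a full 16-byte burst at 16 MHz if the burst occupies ten processor clocks. The sketch below shows that arithmetic. The ten-clock burst length (for example one address cycle, eight data cycles and one recovery cycle, with no wait states) is an assumption used to reproduce the quoted number; the text itself states only the burst size and the resulting bandwidth.

```python
def burst_bandwidth_mbytes_per_s(clock_mhz=16.0, bytes_per_burst=16,
                                 clocks_per_burst=10):
    """Peak bandwidth of back-to-back bursts, in Mbytes per second.

    clocks_per_burst=10 is an assumed figure, chosen because it reproduces
    the bandwidth quoted in the feature list; it is not given in the text.
    """
    return bytes_per_burst * clock_mhz / clocks_per_burst


peak = burst_bandwidth_mbytes_per_s()   # 16 bytes * 16 MHz / 10 clocks

# Each added wait state lengthens the burst by one clock in this model.
one_wait_state = burst_bandwidth_mbytes_per_s(clocks_per_burst=11)
```

The second call only illustrates how the same formula responds to a longer burst; it says nothing about the 10% per wait state figure quoted earlier, which refers to overall system performance rather than raw bus bandwidth.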
Debug Features
The 80960SA and 80960SB have built-in debug capabilities. There are two types of breakpoints and six different trace modes. The debug features are controlled by two internal 32-bit registers, the Process-Controls Word and the Trace-Controls Word. By setting bits in these control words, a software debug monitor can closely control how the processor responds during program execution.
The 80960SA and 80960SB have both hardware and software breakpoints. They provide two hardware breakpoint registers on-chip which can be set by a special command to any value. When the instruction pointer matches the value in one of the breakpoint registers, the breakpoint will fire, and a breakpoint handling routine is called automatically.
Tracing is available for all instructions (single-step execution), calls and returns and branching. Each different type of trace may be enabled separately by a special debug instruction. In each case, the 80960SA and 80960SB execute the instruction first and then call a trace handling routine (usually part of a software debug monitor). Further program execution is halted until the trace routine is completed. When the trace event handling routine is completed, instruction execution resumes at the next instruction. The 80960SA and 80960SB's tracing mechanisms, which are implemented completely in hardware, greatly simplify the task of testing and debugging software.
FAULT DETECTION
The 80960SA and 80960SB have an automatic mechanism to handle faults. There are ten fault types including trace, arithmetic, and floating-point faults. When the processor detects a fault, it automatically calls the appropriate fault handling routine and saves the current instruction pointer and necessary state information to make efficient recovery possible. The processor posts diagnostic information on the type of fault to a Fault Record. Like interrupt handling routines, fault handling routines are usually written to meet the needs of a specific application and are often included as part of the operating system or kernel.
For each of the ten fault types, there are numerous subtypes that provide specific information about a fault. For example, a floating-point fault may have its subtype set to an Overflow or Zero-Divide fault. The fault handler can use this specific information to respond correctly to the fault.
BUILT-IN TESTABILITY
Upon reset, the 80960SA and 80960SB automatically conduct an extensive internal test (self-test) of their major blocks of logic. Then, before executing its first instruction, the processor does a zero checksum on the first eight words in memory to ensure that the system has been loaded correctly. If a problem is discovered at any point during the self-test, the 80960SA and 80960SB will indicate a failure and will not begin program execution. The self-test takes approximately 47,000 cycles to complete, and can be disabled.
System manufacturers can use the 80960SA and 80960SB's self-test feature during incoming parts inspection. No special diagnostic programs need to be written, and the test is both thorough and fast. The self-test capability helps ensure that defective parts will be discovered before systems are shipped, and once in the field, the self-test makes it easier to distinguish between problems caused by processor failure and problems resulting from other causes.
CHMOS
The 80960SA and 80960SB are fabricated using Intel's CHMOS IV (Complementary High Speed Metal Oxide Semiconductor) process. This advanced technology eliminates the frequency and reliability limitations of older CMOS processes and opens a new era in microprocessor performance. It combines the high performance capabilities of Intel's industry-leading HMOS technology with the high density and low power characteristics of CMOS. The 80960SA and 80960SB are available at 10 MHz in both PLCC and QFP packages, and at 16 MHz in the PLCC package.
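For a sense of scale, the self-test length of approximately 47,000 cycles can be converted to elapsed time at the two available clock rates. This is a rough sketch only: it assumes the quoted cycles are processor clock cycles at the rated frequency, which the text does not state explicitly, and the function name is hypothetical.

```python
def self_test_time_ms(processor_mhz, cycles=47_000):
    """Approximate self-test duration in milliseconds.

    Assumes one cycle equals one period of the processor clock at the
    rated frequency (an assumption, not a data sheet statement).
    """
    return cycles / (processor_mhz * 1e6) * 1e3


time_at_10_mhz = self_test_time_ms(10)   # roughly 4.7 ms
time_at_16_mhz = self_test_time_ms(16)   # roughly 2.9 ms
```

Under this assumption the self-test adds only a few milliseconds to start-up, which fits the description of the test as both thorough and fast.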
Table 4. 80960SA and 80960SB Pin Description: Bus Signals
Symbol / Type / Name and Function
CLK2 / I / SYSTEM CLOCK provides the fundamental timing for 80960SA and 80960SB systems. CLK2 is divided by two inside the 80960SA and 80960SB to generate the internal processor clock.
A31-A16 / O, T.S. / ADDRESS BUS carries the upper 16 bits of the 32-bit address to memory. It is valid throughout the burst cycle; no latch is required.
AD15-AD1, D0 / I/O, T.S. / ADDRESS/DATA BUS carries the low-order bits of the 32-bit address and 16-bit data to and from memory. AD15-AD4 must be latched since the cycle following the address cycle carries data on the bus.
A3-A1 / O, T.S. / ADDRESS BUS carries the word addresses of the 32-bit address to memory. These three bits are incremented during a burst access indicating the next word address of the burst access. Note that A3-A1 are duplicated with AD3-AD1 during the address cycle.
ALE / O, T.S. / ADDRESS LATCH ENABLE indicates the transfer of a physical address. ALE is asserted during a Ta cycle and deasserted before the beginning of the following Td state. It is active high and floats to a high impedance state during a hold cycle (Th or Thr).
AS / O, T.S. / ADDRESS STATUS indicates an address state. AS is asserted every Ta state and deasserted during the following Td state. AS is driven HIGH during reset.
W/R / O, T.S. / WRITE/READ specifies, during a Ta cycle, whether the operation is write or read. It is latched on-chip and remains valid during Td cycles.
DEN / O, T.S. / DATA ENABLE is asserted during Td cycles and indicates transfer of data on the AD lines. The AD lines should not be driven by an external source unless DEN is asserted. When DEN is asserted, the outputs from the previous cycle are guaranteed to be 3-stated. In addition, DEN deasserted indicates inputs have been captured and therefore input hold times can be disregarded. DEN is driven HIGH during reset.
READY / I / READY indicates that data on AD lines can be sampled or removed. If READY is not asserted during a Td cycle, the Td cycle is extended to the next cycle by inserting a wait state (Tw).
DT/R / O, T.S. / DATA TRANSMIT/RECEIVE indicates the direction of the data transfer to and from the bus. It is low during Ta and Td cycles for a read or interrupt acknowledgement; it is high during Ta and Td cycles for a write. DT/R never changes state when DEN is asserted. DT/R is driven HIGH during reset.
BLAST/FAIL / O, T.S. / BURST LAST indicates the last data cycle (Td) of a burst access. It is asserted low during the last Td and associated Tw cycles in a burst access.
INITIALIZATION FAILURE indicates that the processor has failed to initialize correctly. The failure state is indicated by a combination of BLAST asserted and both BE signals not asserted. This condition occurs after RESET is deasserted and before the first bus transaction begins. FAIL is asserted while the processor performs a self-test. If the self-test completes successfully, then FAIL is deasserted. Next, the processor performs a zero checksum on the first eight words of memory. If it fails, FAIL is asserted for a second time and remains asserted; if it passes, system initialization continues and FAIL remains deasserted.
I/O = Input/Output, O = Output, I = Input, O.D. = Open-Drain, T.S. = 3-State.
RESET / I / RESET clears the internal logic of the processor and causes it to reinitialize.
During RESET assertion, the input pins are ignored (except for INT0, INT1, INT3, LOCK), the tri-state output pins are placed in a HIGH impedance state (except for DT/R, DEN, and AS), and other output pins are placed in their non-asserted state.
RESET must be asserted for at least 41 CLK2 cycles for a predictable reset. Optionally, for a synchronous reset, the LOW to HIGH transition of RESET should occur after the rising edge of both CLK2 and the external bus clock, and before the next rising edge of CLK2.
The interrupt pins indicate the initialization sequence executed. Typical initialization requires driving only INT0 and INT3 to a HIGH state. The reset conditions follow:
INT0   INT1   INT3   LOCK   Action Taken
1      x      1      1      Run self-test (core initialization)
0      0      1      1      Disable self-test
0      1      x      x      Reserved
x      x      0      x      Reserved
x      x      x      0      ONCE mode (see LOCK pin)
BE1-BE0 / O, T.S. / BYTE ENABLE LINES specify which data bytes (up to two) on the bus take part in the current bus cycle. BE1 corresponds to AD15-AD8 and BE0 corresponds to AD7-AD1, D0. The byte enable lines are asserted appropriately during each data cycle.
INT0 / I / INTERRUPT 0 indicates a pending interrupt. The bus interrupt control register determines in which way the signal should be interpreted. To signal an interrupt request in a synchronous system, this pin (as well as the other interrupt pins) must be enabled by being deasserted for at least one bus cycle and then asserted for at least one additional bus cycle; in an asynchronous system, the pin must remain deasserted for at least two bus cycles and then be asserted for at least two more bus cycles. INT0 is sampled during RESET to determine if the self-test sequence is to be executed.
INT1 / I / INTERRUPT 1 indicates a direct interrupt, like INT0. INT1 is sampled during RESET to determine if the self-test sequence is to be executed.
INT2/INTR / I / INTERRUPT 2/INTERRUPT REQUEST: The interrupt control register determines how this pin is interpreted. If INT2, it has the same interpretation as the INT0 and INT1 pins. If INTR, it is used to receive an interrupt request from an external 8259A compatible interrupt controller.
INT3/INTA / I/O, T.S. / INTERRUPT 3/INTERRUPT ACKNOWLEDGE: The interrupt control register determines how this pin is interpreted. If INT3, it has the same interpretation as the INT0 and INT1 pins. If INTA, it is used as an output to control interrupt-acknowledge bus transactions. The INTA output is latched on-chip and remains valid during Td cycles. INT3 must be pulled to a HIGH state during RESET.
LOCK / I/O, O.D. / BUS LOCK prevents other bus masters from gaining control of the bus following the current cycle (if they would assert LOCK to do so). LOCK is used by the processor or any bus agent when it performs indivisible Read/Modify/Write (RMW) operations. Do not leave LOCK unconnected. It must be pulled HIGH for the processor to function properly.
For a read that is designated as an RMW-read, LOCK is examined. If asserted, the processor waits until it is not asserted; if not asserted, the processor asserts LOCK during the Ta cycle and leaves it asserted.
A write that is designated as an RMW-write deasserts LOCK in the Ta cycle. During the time LOCK is asserted, a bus agent can perform a normal read or write but no RMW operations. LOCK is also held asserted during an interrupt-acknowledge transaction.
ONCE MODE: The LOCK pin is sampled during reset. If it is asserted LOW at the end of RESET, all outputs will be 3-stated until the part is reset. ONCE MODE is used in conjunction with an ICE.
HOLD / I / HOLD: HOLD indicates a request from a secondary bus master to acquire the bus. When the processor receives HOLD and grants another master control of the bus, it floats its tri-state bus lines and then asserts HLDA and enters the Th state. When HOLD is deasserted, the processor will deassert HLDA and go to either the Ti or Ta state.
HLDA / O, T.S. / HOLD ACKNOWLEDGE: HLDA indicates that bus control has been relinquished to another bus master. This signal is always driven. At RESET it is driven LOW.
N.C. / N/A / NOT CONNECTED indicates pins should not be connected. Never connect any pin marked N.C.
ELECTRICAL SPECIFICATIONS
Power and Grounding
The 80960SA and 80960SB are implemented in CHMOS IV technology and have modest power requirements. Their high clock frequency and numerous output buffers (address/data, control, error, and arbitration signals) can cause power surges as multiple output buffers drive new signal levels simultaneously. For clean on-chip power distribution at high frequency, 12 VCC and 13 VSS pins separately feed functional units of the 80960SA and 80960SB in the package.
Power and ground connections must be made to all power and ground pins of the 80960SA and 80960SB. On the circuit board, all VCC pins must be strapped closely together, preferably on a power plane. Likewise, all VSS pins should be strapped together, preferably on a ground plane. These pins may not be connected together within the chip.
Power Decoupling Recommendations
Liberal decoupling capacitance should be placed near the 80960SA and 80960SB. The processor can cause transient power surges when driving the bus, particularly when it is connected to a large capacitive load.
Low inductance capacitors and interconnects are recommended for best high frequency electrical performance. Inductance can be reduced by shortening the board traces between the processor and decoupling capacitors as much as possible.
Connection Recommendations
For reliable operation, always connect unused inputs to an appropriate signal level. In particular, if one or more interrupt lines are not used, they should be deasserted. No inputs should ever be left floating.
All open-drain outputs require a pullup device. While in some cases a simple pullup resistor will be adequate, we recommend a network of pullup and pulldown resistors biased to a valid VIH (≥ 2.0V) and terminated in the characteristic impedance of the circuit board. Figure 6 shows our recommendations for the resistor values for both a low and high current drive network, which assumes that the circuit board has a characteristic impedance of 100Ω. The advantage of terminating the output signals in this fashion is that it limits signal swing and reduces AC power consumption.
Characteristic Curves
The 80960SA and 80960SB characteristic curves shown in Figures 7 through 10 supply information regarding typical supply currents, typical current versus frequency, worst case voltage versus output current on open drain pins and capacitive derating curves.
Figure 7 shows the typical supply current requirements over the operating temperature range of the processor at supply voltage (VCC) of 5V. Figure 8 shows the typical power supply current (ICC) required by the 80960SA and 80960SB at various operating frequencies when measured at three input voltage (VCC) levels.
For a given output current (IOL), the curve in Figure 9 shows the worst case output low voltage (VOL). Figure 10 shows the typical capacitive derating curve for the 80960SA and 80960SB measured from 1.5V on the system clock (CLK) to 0.8V on the falling edge and 2.0V on the rising edge of the bus address/data (AD) signals.
Test Load Circuit
Figure 11 illustrates the load circuit used to test the 80960SA and 80960SB's 3-state pins, and Figure 12 shows the load circuit used to test the open drain output. The open drain test uses an active load circuit in the form of a matched diode bridge. Since the open-drain output sinks current, only the IOL legs of the bridge are necessary and the IOH legs are not used. When the 80960SA and 80960SB driver under test is turned off, the output pin is pulled up to VREF (i.e., VOH). Diode D1 is turned off and the IOL current source flows through diode D2.
When the 80960SA and 80960SB open-drain driver under test is on, diode D1 is also on, and the voltage on the pin being tested drops to VOL. Diode D2 turns off and IOL flows through diode D1.
(Schematics: two pullup/pulldown resistor networks between VCC and ground on the open-drain output; resistor values shown are 220Ω, 390Ω, 270Ω and 470Ω.)
Low Drive Network:
• VOH = 2.45V to 3.0V
• IOL = 9.5 mA to 12 mA
High Drive Network:
• VOH = 2.48V to 3.0V
• IOL = 16 mA to 20 mA
Figure 6. Open Drain Connection Recommendations for Low and High Current Drive Networks for the LOCK Pin
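The VOH ranges listed for the two networks can be cross-checked with the ordinary resistor-divider formula. The sketch below is a hedged illustration: the pairing of resistors (390Ω pullup with 470Ω pulldown for the low drive network, 220Ω pullup with 270Ω pulldown for the high drive network) is an assumption inferred from the listed values, since the schematic itself is not reproduced here, and the function names are hypothetical. The supply range of 4.5V to 5.5V follows from VCC = 5V ± 10%.

```python
def open_circuit_voltage(vcc, r_pullup, r_pulldown):
    """Pin voltage when the open-drain driver is off (plain voltage divider)."""
    return vcc * r_pulldown / (r_pullup + r_pulldown)


def equivalent_resistance(r_pullup, r_pulldown):
    """Resistance of the two resistors in parallel, as seen from the pin."""
    return r_pullup * r_pulldown / (r_pullup + r_pulldown)


# Assumed resistor pairings (pullup, pulldown) in ohms; see the note above.
LOW_DRIVE = (390.0, 470.0)
HIGH_DRIVE = (220.0, 270.0)

for name, (r_up, r_down) in (("low drive", LOW_DRIVE), ("high drive", HIGH_DRIVE)):
    v_min = open_circuit_voltage(4.5, r_up, r_down)   # VCC at 5V - 10%
    v_max = open_circuit_voltage(5.5, r_up, r_down)   # VCC at 5V + 10%
    r_eq = equivalent_resistance(r_up, r_down)
    print(f"{name}: VOH {v_min:.2f}V to {v_max:.2f}V, equivalent {r_eq:.0f} ohms")
```

With these pairings the computed VOH spans roughly 2.46V to 3.01V for the low drive network and 2.48V to 3.03V for the high drive network, which agrees closely with the ranges listed above and is the reason for assuming this pairing.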
Figure 7. Typical Supply Current vs Supply Voltage (axes: typical supply current vs supply voltage in V)
Figure 8. Typical Current vs Frequency (axes: typical supply current vs operating frequency in MHz; curves at 4.5V, 5.0V and 5.5V)
(Plot annotation on the supply current curves: Temp = +22°C)
Figure 9. Worst Case Voltage vs Output Current on Open-Drain Pin (Temp = +85°C, VCC = 4.5V; axes: output low voltage vs output low current in mA)
Figure 10. Capacitive Derating Curve (Temp = +85°C, VCC = 4.5V; rising and falling curves vs capacitive load in pF)
Figure 11. Test Load Circuit for 3-State Output Pins
Figure 12. Test Load Circuit for Open-Drain Output Pins (IOL tested at 12 and 20 mA; VREF = VCC; D1 and D2 are matched)
ABSOLUTE MAXIMUM RATINGS*
Operating Temperature (PLCC) ................... 0°C to +100°C Case
Operating Temperature (QFP) .................... 0°C to +100°C Case
Storage Temperature .......... -65°C to +150°C
Voltage on Any Pin (PLCC) ... -0.5V to VCC + 0.5V
Voltage on Any Pin (QFP) .. -0.25V to VCC + 0.25V
Power Dissipation ................. 1.9W (16 MHz)
NOTICE: This data sheet contains preliminary information on new products in production. The specifications are subject to change without notice. Verify with your local Intel Sales office that you have the latest data sheet before finalizing a design.
*WARNING: Stressing the device beyond the "Absolute Maximum Ratings" may cause permanent damage. These are stress ratings only. Operation beyond the "Operating Conditions" is not recommended and extended exposure beyond the "Operating Conditions" may affect device reliability.
DC CHARACTERISTICS
960SA/SB (10 MHz and 16 MHz): TCASE = 0°C to +100°C, VCC = 5V ± 10% unless otherwise noted.
Symbol  Parameter                    Min       Max         Units  Conditions
VIL     Input Low Voltage            -0.3      +0.8        V
VIH     Input High Voltage           2.0       VCC + 0.3   V
VCL     CLK2 Input Low Voltage       -0.3      +0.8        V
VCH     CLK2 Input High Voltage      0.7 VCC   VCC + 0.3   V
VOL     Output Low Voltage                     0.45        V      IOL = 2.5 mA
                                               0.45        V      IOL = 12 mA, LOCK Pin
                                               0.60        V      IOL = 20 mA, LOCK Pin
VOH     Output High Voltage          2.4                   V      All TS, -2.5 mA(4)
ICC     Power Supply Current
          10 MHz-QFP                           280         mA     TCASE = 0°C(1)
          10 MHz-PLCC                          280         mA     TCASE = 0°C
          16 MHz-PLCC                          350         mA     TCASE = 0°C
ILO     Output Leakage Current                 ±15         µA     (Note 5)
ILI     Input Leakage Current                  ±15         µA     0 ≤ VO ≤ VCC(2)
CIN     Input Capacitance                      10          pF     fC = 1 MHz(3)
CO      I/O or Output Capacitance              12          pF     fC = 1 MHz(3)
CCLK    Clock Capacitance                      10          pF     fC = 1 MHz(3)
NOTES:
1. TCASE is specified at 0°C to +100°C for the QFP at 10 MHz and VCC = 5V ± 5%.
2. INT0 has an internal pullup that sources 100 µA.
3. Input, output and clock capacitance are not tested.
4. Not measured on open-drain outputs.
5. LOCK has an internal pullup that sources 100 µA.
AC SPECIFICATIONS
This section describes the AC specifications for the 80960SA and 80960SB pins. All input and output timings are specified relative to the 1.5V level of the rising edge of CLK2, and refer to the time at which the signal crosses (for output delay and input setup) 1.5V. All AC testing should be done with input voltages of 0.4V and 2.4V, except for the clock (CLK2), which should be tested with input voltages of 0.45V and 0.7 x VCC. See Figure 13 for timing relationships for the 80960SA and 80960SB signals.
Figure 13. Drive Levels and Timing Relationships of 80960SA and 80960SB Signals (CLK2 edges A, B, C and D; outputs: AD(1:15), A(1:3), D0, A(16:31), BE(0:1), DEN, BLAST, W/R, HLDA, LOCK, INTA, ALE, DT/R; inputs: AD(1:15), D0, INT0, INT1, INT2/INTR, INT3, HOLD, LOCK, READY)
AC Specification Tables
80960SA and 80960SB AC Characteristics (10 MHz)
Symbol  Parameter                           Min    Max    Units  Test Conditions
T1      Processor Clock Period (CLK2)       50     125    ns     VIN = 1.5V
T2      Processor Clock Low Time (CLK2)     8             ns     VT = 10% Point = VCL + (VCH - VCL) x 0.1
T3      Processor Clock High Time (CLK2)    8             ns     VT = 90% Point = VCL + (VCH - VCL) x 0.9
T4      Processor Clock Fall Time (CLK2)           10     ns     VT = 90% Point to 10% Point(3)
T5      Processor Clock Rise Time (CLK2)           10     ns     VT = 10% Point to 90% Point(3)
T6      Output Valid Delay                  2      31     ns     CL = 100 pF (AD and Control)
T6AS    AS Output Valid Delay               2      25     ns     CL = 50 pF
T7      ALE Width                           24            ns     CL = 100 pF
T8      ALE Output Valid Delay              4      33     ns     CL = 100 pF(1)
T9      Output Float Delay                  2      20     ns     CL = 100 pF (AD), CL = 100 pF (Controls)(1)
T10     Input Setup 1                       10            ns
T11     Input Hold                          2             ns     (Note 4)
T12     Input Setup 2                       13            ns
T13     Setup to ALE Inactive               10            ns     CL = 100 pF
T14     Hold after ALE Inactive             8             ns     CL = 100 pF
T15     RESET Hold                          3             ns     (Note 2)
T16     RESET Setup                         5             ns     (Note 2)
T17     RESET Width                         2050          ns     41 CLK2 Periods Minimum
NOTES:
1. A float condition occurs when the maximum output current becomes less than ILO. Float delay is not tested, but should
be no longer than the valid delay.
2. Meeting RESET setup and hold times is an optional method of synchronizing your clocks. If you decide to use an asynchronous reset, then synchronizing the clock can be accomplished by using AS.
3. Processor clock (CLK2) rise time and fall time are not tested.
4. ICE requires a minimum of 4 ns input hold time.
80960SA and 80960SB AC Characteristics (16 MHz PLCC)

| Symbol | Parameter | Min | Max | Units | Test Conditions |
|--------|-----------|-----|-----|-------|-----------------|
| T1 | Processor Clock Period (CLK2) | 31.25 | 125 | ns | VIN = 1.5V |
| T2 | Processor Clock Low Time (CLK2) | 8 | | ns | VT = 10% Point = VCL + (VCH - VCL) × 0.1 |
| T3 | Processor Clock High Time (CLK2) | 8 | | ns | VT = 90% Point = VCL + (VCH - VCL) × 0.9 |
| T4 | Processor Clock Fall Time (CLK2) | | 10 | ns | VT = 90% Point to 10% Point(3) |
| T5 | Processor Clock Rise Time (CLK2) | | 10 | ns | VT = 10% Point to 90% Point(3) |
| T6 | Output Valid Delay | 2 | 25 | ns | CL = 100 pF (AD and Control) |
| T6AS | AS Output Valid Delay | 2 | 21 | ns | CL = 50 pF |
| T7 | ALE Width | 15 | | ns | CL = 100 pF |
| T8 | ALE Output Valid Delay | 2 | 22 | ns | CL = 100 pF(1) |
| T9 | Output Float Delay | 2 | 20 | ns | CL = 100 pF (AD), CL = 100 pF (Controls)(1) |
| T10 | Input Setup 1 | 10 | | ns | |
| T11 | Input Hold | 2 | | ns | (Note 4) |
| T12 | Input Setup 2 | 13 | | ns | |
| T13 | Setup to ALE Inactive | 10 | | ns | CL = 100 pF |
| T14 | Hold after ALE Inactive | 8 | | ns | CL = 100 pF |
| T15 | RESET Hold | 3 | | ns | (Note 2) |
| T16 | RESET Setup | 5 | | ns | (Note 2) |
| T17 | RESET Width | 1281 | | ns | 41 CLK2 Periods Minimum |
NOTES:
1. A float condition occurs when the maximum output current becomes less than ILO. Float delay is not tested, but should
be no longer than the valid delay.
2. Meeting RESET setup and hold times is an optional method of synchronizing your clocks. If you decide to use an asynchronous reset, then synchronizing the clock can be accomplished by using AS.
3. Processor clock (CLK2) rise time and fall time are not tested.
4. ICE requires a minimum of 4 ns input hold time.
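
The T1 and T17 entries in the two AC tables are related by the "41 CLK2 Periods Minimum" test condition. The sketch below reproduces that arithmetic. It assumes CLK2 runs at twice the processor frequency, which is inferred from the 50 ns minimum T1 in the 10 MHz table rather than stated in this section; the function names are illustrative only.

```python
# Minimum RESET width as a multiple of the CLK2 period.
# Assumption: CLK2 frequency = 2 x processor frequency (inferred from T1).

CLK2_PERIODS_PER_RESET = 41  # from the T17 test condition


def clk2_period_ns(processor_mhz: float) -> float:
    """Return the minimum CLK2 period (T1) in nanoseconds."""
    return 1000.0 / (2.0 * processor_mhz)


def min_reset_width_ns(processor_mhz: float) -> float:
    """Return the minimum RESET width (T17) in nanoseconds."""
    return CLK2_PERIODS_PER_RESET * clk2_period_ns(processor_mhz)


if __name__ == "__main__":
    for mhz in (10, 16):
        print(mhz, "MHz:", clk2_period_ns(mhz), "ns CLK2 period,",
              min_reset_width_ns(mhz), "ns minimum RESET width")
```

At 16 MHz the product is 1281.25 ns, which the table lists as 1281 ns.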
[Figure 14: bus timing diagram showing CLK, CLK2, A(4:15)/D(0:15), A(1:3), BE(0:1), A(16:31), ALE, W/R and DT/R across Ta, Td, Tw and Tr cycles.]
NOTES:
1. The AD and control signals are driven at all times except during a HOLD acknowledge (HLDA asserted), RESET, and ONCE mode.
2. The AD and control signals may toggle during idle (Ti) or recovery (Tr) cycles.
Figure 14. Timing Relationships of the 80960SA and 80960SB Bus
[Figure 15: RESET timing diagram showing CLK2 edges A, B, C and D, RESET, OUTPUTS, INT0, INT1, INT3 and LOCK, with parameters T15, T16 and T17.]
NOTES:
1. The A edge is defined as the first rising CLK2 edge after RESET is deasserted meeting the RESET hold and setup times.
2. Initialization parameters must be set up at least four CLK2s prior to the first A edge.
Figure 15. RESET Signal Timing
[Figure 16: timing diagram showing CLK, CLK2, HOLD and HLDA across Th cycles.]
Figure 16. HOLD Timing Relationships
Design Considerations
Input hold times can be disregarded by the designer whenever the input is removed because a subsequent output from the processor is deasserted (e.g., DEN becomes deasserted).

Whenever the processor generates an output that indicates a transition into a subsequent state, any outputs that are specified to be 3-stated in this new state are guaranteed to be 3-stated. For example, in the Td cycle following a Ta cycle for a read, the minimum output delay of DEN is 2 ns, but the maximum float time of AD is 20 ns. When DEN is asserted, however, the AD outputs are guaranteed to have been 3-stated.

Designing for the ICE-960SB
The 80960SA and 80960SB In-Circuit Emulator assists in debugging 80960SA and 80960SB hardware and software designs. The product consists of a probe module, cable, control unit and power supply. Because of the high operating frequency of the 80960SA and 80960SB systems, the probe module connects directly to the 80960SA and 80960SB component (EIAJ QFP or PLCC) or a socket for the PLCC.

When designing an 80960SA and 80960SB hardware system that uses the ICE-960SB to debug the system, several electrical and mechanical characteristics should be considered. These considerations include capacitive loading, drive requirement, power requirement, and physical layout.

The ICE-960SB probe module increases the load capacitance of each line by up to 25 pF. This load originates from the probe module and is driven by the 80960SA and 80960SB processor.

To achieve high noise immunity, the ICE-960SB probe is powered by the user's system. The high-speed probe circuitry draws up to 1.1A plus the maximum current (ICC) of the 80960SA and 80960SB processor.

The AD bus should not be driven by an external source unless DEN is asserted. In addition, the ICE requires a minimum data hold time of 4 ns.

The ICE-960SB probe will drive LOCK to a LOW state during RESET to force the target 80960SA and 80960SB to enter ONCE mode. To guarantee timings, the ICE requires ±5% supply voltage supplied to the 80960SA and 80960SB. The ICE probe requires a minimum of 0.25 inches clearance on all sides of both the EIAJ QFP and PLCC.

Lock Line Termination
You must terminate the LOCK line as described in Figure 6 in order for the ICE to properly function.

MECHANICAL DATA

Package Dimensions and Mounting
The 80960SA and 80960SB is available in two different packages: an 80-lead quad flat pack (EIAJ QFP), shown in Figure 17, and an 84-lead plastic leaded chip carrier (PLCC), shown in Figure 18.

Pin Assignment
The QFP and PLCC have different pin assignments. The QFP pins are numbered in order from 1 to 80 around the package's perimeter. The PLCC pins are numbered in order from 1 to 84 around the package's perimeter. Tables 9 and 10 list the function of each pin in the QFP. Tables 11 and 12 list the function of each pin in the PLCC.

VCC and GND connection must be made to multiple VCC and GND pins. Each VCC and GND pin must be connected to the appropriate voltage or ground and externally strapped close to the package. We recommend that you include separate power and ground planes in your circuit board for power distribution.

NOTE:
Pins identified as N.C., "No Connect," should never be connected. The 80960SA and 80960SB QFP package contains two N.C. pins and the PLCC package contains six N.C. pins.

Package Thermal Specification
The 80960SA and 80960SB is specified for operation when case temperature is within the range 0°C to +85°C. The case temperature should be measured at the top center of the package.

The ambient temperature can be calculated from θJC and θJA by using the following equations:

TJ = TC + P × θJC
TA = TJ - P × θJA
TC = TA + P × [θJA - θJC]

Values for θJA and θJC are given in Table 7 for the QFP package and in Table 8 for the PLCC package for various airflows.

Example:
TA = TC - P × (θJA - θJC)
TC = Maximum Case Temperature
P = Maximum Supply Voltage times ICC at 100°C and 10 MHz
θJA and θJC = QFP Package Thermal Resistance at 0 ft/min airflow

TA = 51 = 100 - (5.5 × 0.213) × (45.7 - 4)

WAVEFORMS
Figures 19 through 22 show the waveforms for various signals on the 80960SA and 80960SB's bus.
Table 7. QFP Package, Thermal Resistance, °C/Watt

| Parameter \ Airflow (ft/min) | 0 | 50 | 100 | 200 | 400 | 600 | 800 |
|------------------------------|---|----|-----|-----|-----|-----|-----|
| θJA Junction to Ambient (No Heatsink) | 45.7 | na | na | 40 | 31 | na | na |
| θJC Junction to Case (Case measured in the middle of the top of the package) | 4.0 | na | na | 4.5 | 5.5 | na | na |

NOTES:
1. This table applies to an 80960SA and 80960SB QFP soldered directly onto a board.
2. θJA = θJC + θCA.
3. Thermal data are based on copper lead frames.
Table 8. PLCC Package, Thermal Resistance, °C/Watt

| Parameter \ Airflow (ft/min) | 0 | 50 | 100 | 200 | 400 | 600 | 800 | 1000 |
|------------------------------|---|----|-----|-----|-----|-----|-----|------|
| θJA Junction to Ambient (No Heatsink) | 33 | na | na | 27 | 23.8 | 22 | 20 | 19.5 |
| θJC Junction to Case | 13 | na | na | na | na | na | na | na |

NOTES:
1. This table applies to an 80960SA and 80960SB PLCC soldered directly onto a board.
2. θJA = θJC + θCA.
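
The equations in the Package Thermal Specification can be checked numerically. The following is a minimal sketch that reproduces the datasheet's worked example for the QFP at 0 ft/min airflow (TC = 100°C, P = 5.5 V × 0.213 A, θJA = 45.7, θJC = 4.0). The function names are illustrative and not part of any Intel tool.

```python
# Thermal relations: TJ = TC + P*theta_JC and TA = TC - P*(theta_JA - theta_JC).
# Temperatures in degrees C, power in watts, thermal resistance in C/W.


def junction_temp_c(t_case_c: float, power_w: float, theta_jc: float) -> float:
    """Junction temperature from case temperature."""
    return t_case_c + power_w * theta_jc


def ambient_temp_c(t_case_c: float, power_w: float,
                   theta_ja: float, theta_jc: float) -> float:
    """Ambient temperature corresponding to a given case temperature."""
    return t_case_c - power_w * (theta_ja - theta_jc)


if __name__ == "__main__":
    power = 5.5 * 0.213  # supply voltage times ICC, from the worked example
    ta = ambient_temp_c(100.0, power, theta_ja=45.7, theta_jc=4.0)
    print(f"P = {power:.3f} W, TA = {ta:.1f} C")
```

The result is about 51.1°C, which the datasheet rounds to 51. Substituting a row from Table 8 gives the corresponding PLCC figure, provided the case temperature and power used are valid for that package.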
Figure 17. 80-Lead EIAJ Quad Flat Pack Package
Figure 18. 84-Lead Plastic Leaded Chip Carrier
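Table 9. 80960SA and 80960SB QFP Pinout, In Pin Order

| Pin | Signal | Pin | Signal | Pin | Signal | Pin | Signal |
|-----|--------|-----|--------|-----|--------|-----|--------|
| 1 | A22 | 21 | VCC | 41 | BE0 | 61 | VCC |
| 2 | A21 | 22 | VSS | 42 | VCC | 62 | VSS |
| 3 | A20 | 23 | VCC | 43 | VSS | 63 | N.C. |
| 4 | A19 | 24 | VSS | 44 | CLK2 | 64 | AS |
| 5 | A18 | 25 | AD6 | 45 | RESET | 65 | VSS |
| 6 | A17 | 26 | AD5 | 46 | INT0 | 66 | ALE |
| 7 | A16 | 27 | AD4 | 47 | INT1 | 67 | READY |
| 8 | VCC | 28 | AD3 | 48 | INT2/INTR | 68 | A31 |
| 9 | VSS | 29 | AD2 | 49 | INT3/INTA | 69 | A30 |
| 10 | AD15 | 30 | AD1 | 50 | HLDA | 70 | A29 |
| 11 | AD14 | 31 | D0 | 51 | VCC | 71 | A28 |
| 12 | VCC | 32 | VSS | 52 | VSS | 72 | VSS |
| 13 | VSS | 33 | VCC | 53 | HOLD | 73 | VCC |
| 14 | AD13 | 34 | A3 | 54 | W/R | 74 | A27 |
| 15 | AD12 | 35 | A2 | 55 | DEN | 75 | A26 |
| 16 | AD11 | 36 | VCC | 56 | DT/R | 76 | A25 |
| 17 | AD10 | 37 | VSS | 57 | BLAST | 77 | VCC |
| 18 | AD9 | 38 | A1 | 58 | LOCK | 78 | VSS |
| 19 | AD8 | 39 | N.C. | 59 | VCC | 79 | A24 |
| 20 | AD7 | 40 | BE1 | 60 | VSS | 80 | A23 |

Table 10. 80960SA and 80960SB QFP Pinout, In Signal Order

| Signal | Pin | Signal | Pin | Signal | Pin | Signal | Pin |
|--------|-----|--------|-----|--------|-----|--------|-----|
| A1 | 38 | A18 | 5 | D0 | 31 | VCC | 42 |
| A2 | 35 | A19 | 4 | DEN | 55 | VCC | 51 |
| A3 | 34 | A20 | 3 | DT/R | 56 | VCC | 59 |
| AD1 | 30 | A21 | 2 | HLDA | 50 | VCC | 61 |
| AD2 | 29 | A22 | 1 | HOLD | 53 | VCC | 73 |
| AD3 | 28 | A23 | 80 | INT0 | 46 | VCC | 77 |
| AD4 | 27 | A24 | 79 | INT1 | 47 | VSS | 9 |
| AD5 | 26 | A25 | 76 | INT2/INTR | 48 | VSS | 13 |
| AD6 | 25 | A26 | 75 | INT3/INTA | 49 | VSS | 22 |
| AD7 | 20 | A27 | 74 | LOCK | 58 | VSS | 24 |
| AD8 | 19 | A28 | 71 | N.C. | 39 | VSS | 32 |
| AD9 | 18 | A29 | 70 | N.C. | 63 | VSS | 37 |
| AD10 | 17 | A30 | 69 | READY | 67 | VSS | 43 |
| AD11 | 16 | A31 | 68 | RESET | 45 | VSS | 52 |
| AD12 | 15 | ALE | 66 | VCC | 8 | VSS | 60 |
| AD13 | 14 | AS | 64 | VCC | 12 | VSS | 62 |
| AD14 | 11 | BE0 | 41 | VCC | 21 | VSS | 65 |
| AD15 | 10 | BE1 | 40 | VCC | 23 | VSS | 72 |
| A16 | 7 | BLAST | 57 | VCC | 33 | VSS | 78 |
| A17 | 6 | CLK2 | 44 | VCC | 36 | W/R | 54 |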
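Table 11. 80960SA and 80960SB PLCC Pinout, In Pin Order

| Pin | Signal | Pin | Signal | Pin | Signal | Pin | Signal |
|-----|--------|-----|--------|-----|--------|-----|--------|
| 1 | VCC | 22 | VSS | 43 | VSS | 64 | HOLD |
| 2 | N.C. | 23 | N.C. | 44 | VCC | 65 | N.C. |
| 3 | A27 | 24 | AD13 | 45 | A3 | 66 | W/R |
| 4 | A26 | 25 | AD12 | 46 | A2 | 67 | DEN |
| 5 | A25 | 26 | AD11 | 47 | VCC | 68 | DT/R |
| 6 | VCC | 27 | AD10 | 48 | VSS | 69 | BLAST |
| 7 | VSS | 28 | AD9 | 49 | A1 | 70 | LOCK |
| 8 | A24 | 29 | AD8 | 50 | N.C. | 71 | VCC |
| 9 | A23 | 30 | AD7 | 51 | BE1 | 72 | VSS |
| 10 | A22 | 31 | VCC | 52 | BE0 | 73 | VCC |
| 11 | A21 | 32 | VSS | 53 | VCC | 74 | VSS |
| 12 | A20 | 33 | VCC | 54 | VSS | 75 | N.C. |
| 13 | A19 | 34 | VSS | 55 | CLK2 | 76 | AS |
| 14 | A18 | 35 | AD6 | 56 | RESET | 77 | VSS |
| 15 | A17 | 36 | AD5 | 57 | INT0 | 78 | ALE |
| 16 | A16 | 37 | AD4 | 58 | INT1 | 79 | READY |
| 17 | VCC | 38 | AD3 | 59 | INT2/INTR | 80 | A31 |
| 18 | VSS | 39 | AD2 | 60 | INT3/INTA | 81 | A30 |
| 19 | AD15 | 40 | AD1 | 61 | HLDA | 82 | A29 |
| 20 | AD14 | 41 | D0 | 62 | VCC | 83 | A28 |
| 21 | VCC | 42 | N.C. | 63 | VSS | 84 | VSS |

Table 12. 80960SA and 80960SB PLCC Pinout, In Signal Order

| Signal | Pin | Signal | Pin | Signal | Pin | Signal | Pin |
|--------|-----|--------|-----|--------|-----|--------|-----|
| A1 | 49 | A18 | 14 | DT/R | 68 | VCC | 33 |
| A2 | 46 | A19 | 13 | HLDA | 61 | VCC | 44 |
| A3 | 45 | A20 | 12 | HOLD | 64 | VCC | 47 |
| D0 | 41 | A21 | 11 | INT0 | 57 | VCC | 53 |
| AD1 | 40 | A22 | 10 | INT1 | 58 | VCC | 62 |
| AD2 | 39 | A23 | 9 | INT2/INTR | 59 | VCC | 71 |
| AD3 | 38 | A24 | 8 | INT3/INTA | 60 | VCC | 73 |
| AD4 | 37 | A25 | 5 | LOCK | 70 | VSS | 7 |
| AD5 | 36 | A26 | 4 | N.C. | 2 | VSS | 18 |
| AD6 | 35 | A27 | 3 | N.C. | 23 | VSS | 22 |
| AD7 | 30 | A28 | 83 | N.C. | 42 | VSS | 32 |
| AD8 | 29 | A29 | 82 | N.C. | 50 | VSS | 34 |
| AD9 | 28 | A30 | 81 | N.C. | 65 | VSS | 43 |
| AD10 | 27 | A31 | 80 | N.C. | 75 | VSS | 48 |
| AD11 | 26 | ALE | 78 | READY | 79 | VSS | 54 |
| AD12 | 25 | AS | 76 | RESET | 56 | VSS | 63 |
| AD13 | 24 | BE0 | 52 | VCC | 1 | VSS | 72 |
| AD14 | 20 | BE1 | 51 | VCC | 6 | VSS | 74 |
| AD15 | 19 | BLAST | 69 | VCC | 17 | VSS | 77 |
| A16 | 16 | CLK2 | 55 | VCC | 21 | VSS | 84 |
| A17 | 15 | DEN | 67 | VCC | 31 | W/R | 66 |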
[Figures 19 through 24: bus waveform diagrams. Signals shown include CLK, CLK2, A(4:15)/D(0:15), A(1:3), BE(0:1), A(16:31), ALE, W/R and DT/R.]
Figure 19. Basic 80960SA and 80960SB Timing
Figure 20. 80960SA and 80960SB Timing Showing a Four Word Aligned Read Burst
Figure 21. 80960SA and 80960SB Double Word Read Timing with Wait States
Figure 22. 80960SA and 80960SB Aligned Double Word Write Timing with Wait States
Figure 23. 80960SA and 80960SB Timing with a Four Word Read Burst Misaligned by One Byte
Figure 24. 80960SA and 80960SB Timing with a Three Word Write Burst Misaligned by One Byte and One Wait State
i960™ KA/KB PROCESSOR PRODUCT OVERVIEW

INTRODUCTION
This chapter provides an overview of the Intel i960 KB processor (which is part of the i960 K series of embedded-processor products). All of the processors in the i960 K series of products are based on the Intel i960™ architecture. Most of the information in this overview also applies to the i960 KA processor. The only difference between the i960 KB and i960 KA processors is that the i960 KA processor does not provide on-chip support for floating-point operations or operations on decimal numbers.

OVERVIEW OF THE i960™ KB ARCHITECTURE
The i960 KB processor introduced the i960 architecture, a new 32-bit architecture from Intel. This architecture has been designed to meet the needs of embedded applications such as machine control, robotics, process control, avionics and instrumentation.

The i960 architecture can best be characterized as a high-performance computing engine. It features high-speed instruction execution and ease of programming. It is also easily extensible, allowing processors and controllers based on this architecture to be conveniently customized to meet the needs of specific processing and control applications.

The following are some of the important attributes of the i960 architecture:
• full 32-bit registers
• high-speed, pipelined instruction execution
• a convenient program execution environment with 32 general-purpose registers and a versatile set of special-function registers
• a highly optimized procedure call mechanism that features on-chip caching of local variables and parameters
• extensive facilities for handling interrupts and faults
• extensive tracing facilities to support efficient program debugging and monitoring
• register scoreboarding and write buffering to permit efficient operation when used with lower performance memory subsystems

OVERVIEW OF THE SINGLE PROCESSOR SYSTEM ARCHITECTURE
The central processing module, memory module and I/O module form the natural boundaries for the hardware system architecture. The modules are connected together by the high bandwidth 32-bit multiplexed L-bus, which can transfer data at a maximum sustained rate of 53 Mbytes per second for an i960 processor operating at 20 MHz.

Figure 1 shows a simplified block diagram of one possible system configuration. The heart of this system is the i960 KB processor, which fetches instructions, executes code, manipulates stored information and interacts with I/O devices. The high bandwidth L-bus connects the i960 KB processor to memory and I/O modules. The i960 KB processor stores system data, instructions and programs in the memory module. By accessing various peripheral devices in the I/O module, the i960 KB processor supports communication to terminals, modems, printers, disks and other I/O devices.

i960™ KB Processor and the L-Bus
The i960 KB processor performs bus operations using multiplexed address and data signals, and provides all the necessary control signals. For example, standard control signals, such as Address Latch Enable (ALE), Address/Data Status (ADS), Write/Read Command (W/R), Data Transmit/Receive (DT/R) and Data Enable (DEN), are provided by the i960 KB processor. The i960 processor also generates byte enable signals that specify which bytes on the 32-bit data lines are valid for the transfer.

The L-bus supports burst transactions, which access up to four data words at a maximum rate of one word per clock cycle. The i960 KB processor uses the two low-order address lines to indicate how many words are to be transferred. The i960 KB processor performs burst transactions to load the on-chip 512-byte instruction cache to minimize memory accesses for instruction fetches. Burst transactions can also be used for data access.

To transfer control of the bus to an external bus master, the i960 KB provides two arbitration signals: hold request (HOLD) and hold acknowledge (HLDA). After receiving HOLD, the processor grants control of the bus to an external master by asserting HLDA.
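
The L-bus rate quoted above (53 Mbytes per second at 20 MHz, with up to four 32-bit words per burst) can be reproduced with simple arithmetic. The sketch below assumes that a back-to-back four-word burst occupies six clock cycles in total. That cycle count is an inference chosen because it matches the quoted figure, and also the 66.7 Mbytes/s at 25 MHz listed for the 80960KA later in this book; it is not stated in this overview.

```python
# Burst bandwidth estimate for a 32-bit multiplexed bus.
# Assumption (not from the source): one 4-word burst takes 6 clocks in total.

BYTES_PER_WORD = 4
WORDS_PER_BURST = 4


def burst_bandwidth_mbytes_per_s(clock_mhz: float,
                                 clocks_per_burst: int = 6) -> float:
    """Return the transfer rate in Mbytes/s (1 Mbyte = 10**6 bytes)."""
    bytes_per_burst = BYTES_PER_WORD * WORDS_PER_BURST
    return bytes_per_burst * clock_mhz / clocks_per_burst


if __name__ == "__main__":
    for mhz in (20, 25):
        rate = burst_bandwidth_mbytes_per_s(mhz)
        print(f"{mhz} MHz: {rate:.1f} Mbytes/s")
```

Changing `clocks_per_burst` shows how added wait states in the memory system would reduce the achievable rate.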
Order Number: 272030-001
Figure 1. Basic i960™ KB System Configuration
The i960 KB processor provides a flexible interrupt structure by using an on-chip interrupt controller, an external interrupt controller or both. The type of interrupt structure is specified by an internal interrupt vector register. For a system with multiple processors, another method is available, called inter-agent communication (IAC), where a processor can interrupt another processor by sending an IAC message.

Memory Module
A memory module can consist of a memory controller, Erasable Programmable Read Only Memory (EPROM), and static or dynamic Random Access Memory (RAM). The memory controller first conditions the L-bus signals for memory operation. It demultiplexes the address and data lines, generates the chip select signals from the address, detects the start of the cycle for burst mode operation and latches the byte enable signals.

The memory controller generates the control signals for EPROM, SRAM and DRAM. Specifically, it provides the control signals, multiplexed row/column address and refresh control for dynamic RAMs. The controller can be designed to accommodate the burst transaction of the i960 KB processor by using the static column mode or nibble mode features of the dynamic RAM. In addition to supplying the operational signals, the controller generates the READY signal to indicate that data can be transferred to or from the i960 KB processor.

The i960 KB processor directly addresses up to 4 Gbytes of physical memory. The processor does not allow burst accesses to cross a 16-byte boundary, to ease the design of the controller. Each address specifies a four-byte data word within the block. Individual data bytes can be accessed by using the four byte-enable signals from the i960 KB processor. Chapter 5 provides design guidelines for the memory controller.

I/O Module
The I/O module consists of the I/O components and the interface circuit. I/O components can be used to allow the i960 KB processor to use most of its clock cycles for computational and system management activities. Time consuming tasks can be off-loaded to specialized slave-type components, such as the 8259A Programmable Interrupt Controller or the 82530 Serial Communication Controller. Some tasks may require a master-type component, such as the 82586 Local Area Network Control.
The interface circuit performs several functions. It demultiplexes the address and data lines, generates the chip select signals from the address, produces the I/O read or I/O write command from the processor's W/R signal, latches the byte enable signals and generates the READY signals. Since some of these functions are identical to those of the memory controller, the same logic can be used for both interfaces. For master-type peripherals that operate on a 16-bit data bus, the interface circuit translates the 32-bit data bus to a 16-bit data bus.

The i960 KB processor uses memory-mapped addresses to access I/O devices. This allows the CPU to use many of the same instructions to exchange information for both memory and peripheral devices. Thus, the powerful memory-type instructions can be used to perform 8-, 16- and 32-bit data transfers.

HIGH PERFORMANCE PROGRAM EXECUTION
Much of the design of the i960 architecture has been aimed at maximizing the processor's computational and data processing speed through the use of increased parallelism. The following paragraphs describe several of the mechanisms and techniques used to accomplish this goal.

Load and Store Model
One of the more important features of the i960 architecture is its performance of most operations on operands in registers, rather than in memory. For example, all arithmetic, logic, comparison, branching and bit operations are performed with registers and literals.

This feature provides two benefits. First, it increases program execution speed by minimizing the number of memory accesses necessary to execute a program. Second, it reduces the memory latency encountered when using slower, lower-cost memory parts.

To support this concept, the architecture provides a generous supply of general-purpose registers. For each procedure, 32 registers are available, 28 of which are available for general use. These registers are divided into two types: global and local. Both types of registers can be used for general storage of operands. The only difference is that global registers retain their contents across procedure boundaries, whereas the processor allocates a new set of local registers each time a new procedure is called.

The architecture also provides a set of fast, versatile load and store instructions. These instructions allow burst transfers of 1, 2, 4, 8, 12 or 16 bytes of information between memory and the registers.

On-Chip Caching of Code and Data
To further reduce memory accesses, the architecture offers two mechanisms for caching code and data on chip: an instruction cache and multiple sets of local registers. The instruction cache allows prefetching of blocks of instructions from memory. This helps ensure that the instruction execution pipeline is supplied with a steady stream of instructions. It also reduces the number of memory accesses required when performing iterative operations such as loops. The architecture allows the size of the instruction cache to vary. For the i960 KB processor, it is 512 bytes.

To optimize the architecture's procedure call mechanism, the processor provides multiple sets of local registers. This allows the processor to perform procedure calls without having to write the local registers out to the stack in memory. The number of register sets depends on the processor implementation. The i960 KB processor provides four sets of local registers.

Overlapped Instruction Execution
The i960 architecture also enhances program execution speed by overlapping the execution of some instructions. In the i960 K series of processors, this is accomplished through register scoreboarding.

Register scoreboarding permits instruction execution to continue while data is being fetched from memory. When a load instruction is executed, the processor sets one or more scoreboard bits to indicate the target registers to be loaded. After the target registers are loaded, the scoreboard bits are cleared. While the target registers are being loaded, the processor is allowed to execute other instructions that do not use these registers.

The processor uses the scoreboard bits to ensure that the target registers are not used until the load is complete. (Scoreboard bits are checked transparently from software.) This technique allows code to be executed such that some instructions can be executed in zero clock cycles (that is, executed for free).

Single-Clock Instructions
The i960 architecture is designed to let a processor execute commonly used instructions, such as moves, adds, subtracts, logical operations and branches, in a minimum number of clock cycles (preferably one cycle). The architecture supports this concept in several ways. For example, the load and store model described earlier eliminates the clock cycles required to perform memory-to-memory operations, by concentrating on register-to-register operations.

In addition, all of the instructions in the i960 architecture are 32 bits long and aligned on 32-bit boundaries. This lets instructions be decoded in one clock cycle, and eliminates the need for an instruction-alignment stage in the pipeline.

The i960 KB processor takes full advantage of these features of the architecture, resulting in more than 50 instructions that can be executed in a single clock cycle.

Efficient Interrupt Model
The i960 architecture provides an efficient mechanism for servicing interrupts from external sources. To handle interrupts, the processor maintains an interrupt table of 248 interrupt vectors, 240 of which are available for general use. When an interrupt is signaled, the processor uses a pointer to the interrupt table to perform an implicit call to an interrupt handler procedure. In performing this call, the processor automatically saves the state of the processor prior to receiving the interrupt, performs the interrupt routine, then restores the state of the processor. A separate interrupt stack is also provided to segregate interrupt handling from application programs.

The interrupt handling facilities also allow interrupts to be evaluated by priority. The processor is then able to store interrupt vectors that are lower in priority than the current processor task in a pending interrupt section of the interrupt table. The processor checks and services the pending interrupts at defined times.

SIMPLIFIED PROGRAMMING ENVIRONMENT
Because of its streamlined execution environment, processors based on the i960 architecture are particularly easy to program. The following paragraphs describe some of the architecture features that simplify programming.

Highly Efficient Procedure Call Mechanism
The procedure call mechanism makes procedure calls and parameter passing between procedures simple and compact. Each time a call instruction is issued, the processor automatically saves the current set of local registers and allocates a new set for the called procedure. Likewise, on a return from a procedure, the current set of local registers is deallocated and the local registers for the procedure being returned to are restored. This means a program never has to explicitly save and restore those local variables that are stored in local registers.

Versatile Instruction Set and Addressing
The selection of instructions and addressing modes also simplifies programming. A full set of load, store, move, arithmetic, comparison and branch instructions is provided, with operations on both integer and ordinal data types. Operations on bits and bit strings are simplified by a complete set of Boolean and bit-field instructions.

The addressing modes are efficient and straightforward, while at the same time providing the necessary indexing and scaling modes required to address complex arrays and record structures. The large 4-gigabyte address space provides ample room to store programs and data. The availability of 32 addressing lines allows some address lines to be memory-mapped to control hardware functions.

Extensive Fault Handling Capability
To aid in program development, the i960 architecture defines a wide range of faults that the processor detects, including arithmetic faults, invalid operations, invalid operands and machine faults. When a fault is detected, the processor makes an implicit call to a fault handler routine, in a way similar to the interrupt mechanism described previously. The information collected for each fault allows program developers to quickly correct faulting code, and allows automatic recovery from some faults.

Debugging and Monitoring
To support debugging systems, the i960 architecture provides a mechanism for monitoring processor activity by means of trace events. When the processor detects a trace event, it signals a trace fault and calls a fault handler. Intel provides several tools that use this feature, including an in-circuit emulator (ICE) device.

SUPPORT FOR ARCHITECTURAL EXTENSIONS
The i960 architecture provides several features that enable processors based on this architecture to be easily customized to meet the needs of specific embedded applications, such as signal processing, array processing or graphics processing.
The most important of these features is the set of 32 special function registers. These registers provide a convenient interface to circuitry in the processor or pins that can be connected to external hardware. They can be used to control timers, to perform operations on special data types or to perform I/O functions. The special function registers are similar to the global registers. They can be addressed by all of the register access instructions.

EXTENSIONS INCLUDED IN THE i960™ K SERIES PROCESSORS
The i960 K series of processors provides a complete implementation of the i960 architecture, plus several extensions to that architecture. These extensions fall into two categories: floating-point processing and interagent communication.

On-Chip Floating Point
The i960 KB processor provides a complete implementation of the IEEE standard for binary floating-point arithmetic (IEEE 754-1985). This implementation includes a full set of floating-point operations, including add, subtract, multiply, divide, trigonometric functions and logarithmic functions. These operations are performed on single precision (32-bit), double precision (64-bit) and extended precision (80-bit) real numbers.

One of the benefits of this implementation is that the floating-point handling facilities are integrated into the normal instruction execution environment. Single and double precision floating-point values are stored in the same registers as non-floating point values. Four 80-bit floating-point registers are provided to hold extended-precision values.

Interagent Communication
All of the processors in the i960 K series provide an inter-agent communication (IAC) mechanism, allowing agents connected to the processor's bus to communicate with one another. This mechanism operates similarly to the interrupt mechanism, except that IAC messages are passed through dedicated sections of memory. The sort of tasks handled with IAC messages are processor reinitialization, stopping the processor, purging the instruction cache and forcing the processor to check pending interrupts.
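
As a closing illustration for this overview, the register scoreboarding described under Overlapped Instruction Execution can be modeled in a few lines. This is a toy sketch, not a description of the i960 KB pipeline: the three-cycle load latency, the one-instruction-per-cycle issue rule, and the names `Scoreboard` and `run` are all assumptions made for the example.

```python
# Toy register scoreboard: a load marks its target register as pending,
# and any later instruction that touches a pending register must stall.
# Assumptions: fixed load latency, one instruction issued per cycle.


class Scoreboard:
    def __init__(self):
        self.pending = {}  # register name -> cycles until the load completes

    def mark_load(self, dest, latency):
        self.pending[dest] = latency

    def is_clear(self, regs):
        return not any(reg in self.pending for reg in regs)

    def tick(self):
        """Advance one clock; clear bits for loads that have completed."""
        for reg in list(self.pending):
            self.pending[reg] -= 1
            if self.pending[reg] == 0:
                del self.pending[reg]


def run(program, load_latency=3):
    """Run a list of (op, dest, sources) tuples; return total clock cycles."""
    board = Scoreboard()
    cycles = 0
    for op, dest, sources in program:
        while not board.is_clear([dest, *sources]):
            board.tick()  # stall until the needed registers are loaded
            cycles += 1
        if op == "load":
            board.mark_load(dest, load_latency)
        board.tick()  # the issue cycle itself
        cycles += 1
    return cycles


if __name__ == "__main__":
    overlapped = [("load", "r4", []), ("add", "r5", ["r6", "r7"]),
                  ("add", "r8", ["r5", "r6"]), ("add", "r9", ["r4", "r8"])]
    dependent = [("load", "r4", []), ("add", "r9", ["r4", "r8"])]
    print(run(overlapped), run(dependent))
```

In this model both programs finish in the same number of cycles, so the two independent adds in the first program cost nothing extra: they execute while the load is outstanding, which is the effect the text describes as instructions being "executed for free".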
80960KA
EMBEDDED 32-BIT PROCESSOR

• High-Performance Embedded Architecture
  - 25 MIPS Burst Execution at 25 MHz
  - 9.4 MIPS* Sustained Execution at 25 MHz
• 512-Byte On-Chip Instruction Cache
  - Direct Mapped
  - Parallel Load/Decode for Uncached Instructions
• Multiple Register Sets
  - Sixteen Global 32-Bit Registers
  - Sixteen Local 32-Bit Registers
  - Four Local Register Sets Stored On-Chip
  - Register Scoreboarding
• 4 Gigabyte, Linear Address Space
• Pin Compatible with 80960KB
• Built-In Interrupt Controller
  - 32 Priority Levels, 256 Vectors
  - 3.4 µs Latency @ 25 MHz
• Easy to Use, High Bandwidth 32-Bit Bus
  - 66.7 Mbytes/s Burst
  - Up to 16 Bytes Transferred per Burst
• 132-Lead Pin Grid Array (PGA) Package
• 132-Lead Plastic Quad Flat Pack (PQFP)
• Uses 85C960 Bus Controller
• Supported by 27960KX Burst EPROMs
The 80960KA is a member of Intel's new 32-bit processor family, the i960 series, which is designed especially for embedded applications. It is based on the family's high performance, common core architecture, and includes a 512-byte instruction cache and a built-in interrupt controller. The 80960KA has a large register set, multiple parallel execution units and a high-bandwidth, burst bus. Using advanced RISC technology, this high performance processor is capable of execution rates in excess of 9.4 million instructions per second.* The 80960KA is well-suited for a wide range of embedded applications, including laser printers, image processing, industrial control, robotics and telecommunications.
*Relative to Digital Equipment Corporation's VAX-11/780** at 1 MIPS
Figure 1. The 80960KA's Highly Parallel Microarchitecture (block diagram; legible labels include the bus control logic and interrupt controller and the 32-bit burst bus)
**VAX-11™ is a trademark of Digital Equipment Corporation.
September 1991
Order Number: 270775-004
THE 960 SERIES
The 80960KA is a member of a new family of 32-bit microprocessors from Intel known as the i960 Series. This series was especially designed to serve the needs of embedded applications. The embedded market includes applications as diverse as industrial automation, avionics, image processing, graphics, robotics, telecommunications and automobiles. These types of applications require high integration, low power consumption, quick interrupt response times and high performance. Since time to market is critical, embedded microprocessors need to be easy to use in both hardware and software designs.
All members of the 80960 series share a common core architecture which utilizes RISC technology so that, except for special functions, the family members are object code compatible. Each new processor in the series will add its own special set of functions to the core to satisfy the needs of a specific application or range of applications in the embedded market. For example, future processors may include a DMA controller, a timer or an A/D converter.
Software written for the 80960KA will run without modification on any other member of the 80960 family. It is also pin-compatible with the 80960KB, which includes an integrated floating-point unit, and the 80960MC, a military-grade version with support for multitasking, memory management, multiprocessing and fault tolerance.
Figure 2. Register Set (sixteen 32-bit global registers, g0 through g15; sixteen 32-bit local registers, r0 through r15; and the 32-bit arithmetic controls, instruction pointer, process controls and trace controls)
NOTES:
1. Register g15 is reserved for stack management functions.
2. Registers r0, r1, and r2 are reserved for stack management functions.
KEY PERFORMANCE FEATURES
The 80960KA's architecture is based on the most recent advances in RISC technology and is grounded in Intel's long experience in designing embedded controllers. Many features contribute to the 80960KA's exceptional performance:
1. Large Register Set. Modern compilers can take advantage of a large number of registers to optimize execution speed. For maximum flexibility, the 80960KA provides 32 32-bit registers and four 80-bit floating-point registers. (See Figure 2.)
2. Fast Instruction Execution. Simple functions make up the bulk of instructions in most programs, so that execution speed can be greatly improved by ensuring that these core instructions execute in as short a time as possible. The most-frequently executed instructions such as register-register moves, add/subtract, logical operations, and shifts execute in one to two cycles. (Table 1 contains a list of instructions.)
3. Load/Store Architecture. One way to improve execution speed is to reduce the number of times that the processor must access memory to perform an operation. Like other processors based on RISC technology, the 80960KA has a Load/Store architecture: only the LOAD and STORE instructions reference memory; all other instructions operate on registers.
Figure 3. Instruction Formats (fields as labeled in the figure)
• Control: Opcode, Displacement
• Compare and Branch: Opcode, Reg/Lit, Reg, M, Displacement
• Register to Register: Opcode, Reg, Reg/Lit, Modes, Ext'd Op, Reg/Lit
• Memory Access-Short: Opcode, Reg, Base, M, X, Offset
• Memory Access-Long: Opcode, Reg, Base, Mode, Scale, xx, Index, followed by Displacement
Table 1. 80960KA Instruction Set
Data Movement: Load; Store; Move; Load Address
Arithmetic: Add; Subtract; Multiply; Divide; Remainder; Modulo; Shift
Logical: And; Not And; And Not; Or; Exclusive Or; Not Or; Or Not; Nor; Exclusive Nor; Not; Nand; Rotate
Bit and Bit Field: Set Bit; Clear Bit; Not Bit; Check Bit; Alter Bit; Scan for Bit; Scan over Bit; Extract; Modify
Comparison: Compare; Conditional Compare; Compare and Increment; Compare and Decrement
Branch: Unconditional Branch; Conditional Branch; Compare and Branch
Call/Return: Call; Call Extended; Call System; Return; Branch and Link
Fault: Conditional Fault; Synchronize Faults
Debug: Modify Trace Controls; Mark; Force Mark
Miscellaneous: Atomic Add; Atomic Modify; Flush Local Registers; Modify Arithmetic Controls; Scan Byte for Equal; Test Condition Code; Modify Process Controls
Decimal: Move; Add with Carry; Subtract with Carry
Synchronous: Synchronous Load; Synchronous Move
4. Simple Instruction Formats. All instructions in the 80960KA are 32 bits long and must be aligned on word boundaries. This alignment makes it possible to eliminate the instruction-alignment stage in the pipeline. To simplify the instruction decoder further, there are only five instruction formats and each instruction uses only one format. (See Figure 3.)
5. Overlapped Instruction Execution. A load operation allows execution of subsequent instructions to continue before the data has been returned from memory, so that these instructions can overlap the load. The 80960KA manages this process transparently to software through the use of a register scoreboard. Conditional instructions also make use of a scoreboard so that subsequent unrelated instructions can be executed while the conditional instruction is pending.
6. Integer Execution Optimization. When the result of an operation is used as an operand in a subsequent calculation, the value is sent immediately to its destination register. Yet at the same time, the value is put back on a bypass path to the ALU, thereby saving the time that otherwise would be required to retrieve the value for the next operation.
7. Bandwidth Optimizations. The 80960KA gets optimal use of its memory bus bandwidth because the bus is tuned for use with the cache: the line size of the instruction cache matches the maximum burst size for instruction fetches. The 80960KA automatically fetches four words in a burst and stores them directly in the cache. Due to the size of the cache and the fact that it is continually filled in anticipation of needed instructions in the program flow, the 80960KA is exceptionally insensitive to memory wait states. In fact, each wait state causes only a 7% degradation in system performance. The benefit is that the 80960KA will deliver outstanding performance even with a low cost memory system.
8. Cache Bypass. If there is a cache miss, the processor fetches the needed instruction, then sends it on to the instruction decoder at the same time it updates the cache. Thus, no extra time is taken to load and read the cache.
Memory Space and Addressing Modes
The 80960KA offers a linear programming environment so that all programs running on the processor are contained in a single address space. The maximum size of the address space is 4 Gigabytes (2^32 bytes).
For ease of use, the 80960KA has a small number of addressing modes, but includes all those necessary to ensure efficient compiler implementations of high-level languages such as C, Fortran and Ada. Table 2 lists the memory addressing modes.
Table 2. Memory Addressing Modes
• 12-Bit Offset
• 32-Bit Offset
• Register-Indirect
• Register + 12-Bit Offset
• Register + 32-Bit Offset
• Register + (Index-Register x Scale-Factor)
• Register x Scale-Factor + 32-Bit Displacement
• Register + (Index-Register x Scale-Factor) + 32-Bit Displacement
Scale-Factor is 1, 2, 4, 8 or 16
Data Types
The 80960KA recognizes the following data types:
Numeric:
• 8-, 16-, 32- and 64-bit ordinals
• 8-, 16-, 32- and 64-bit integers
Non-Numeric:
• Bit
• Bit Field
• Triple-Word (96 bits)
• Quad-Word (128 bits)
Large Register Set
The programming environment of the 80960KA includes a large number of registers. In fact, 32 registers are available at any time. The availability of this many registers greatly reduces the number of memory accesses required to execute most programs, which leads to greater instruction processing speed.
There are two types of general-purpose registers: local and global. The global registers consist of sixteen 32-bit registers (G0 through G15). These registers perform the same function as the general-purpose registers provided in other popular microprocessors. The term global refers to the fact that these registers retain their contents across procedure calls.
The local registers, on the other hand, are procedure specific. For each procedure call, the 80960KA allocates 16 local registers (R0 through R15). Each local register is 32 bits wide.
Multiple Register Sets
To further increase the efficiency of the register set, multiple sets of local registers are stored on-chip. This cache holds up to four local register frames, which means that up to three procedure calls can be made without having to access the procedure stack resident in memory.
Although programs may have procedure calls nested many calls deep, a program typically oscillates back and forth between only two or three levels. As
a result, with four stack frames in the cache, the probability of there being a free frame in the cache when a call is made is very high. In fact, runs of representative C-language programs show that 80% of the calls are handled without needing to access memory.
If there are four or more active procedures and a new procedure is called, the processor moves the oldest set of local registers in the register cache to a procedure stack in memory to make room for a new set of registers. Global register G15 is used by the processor as the frame pointer (FP) for the procedure stack.
Note that the global registers are not exchanged on a procedure call, but retain their contents, making them available to all procedures for fast parameter passing. An illustration of the register cache is shown in Figure 4.
Figure 4. Multiple Register Sets Are Stored On-Chip (the register cache holds four local register sets; each register in a set is 32 bits wide, bits 31 through 0)
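The effect of the four-frame register cache on memory traffic can be illustrated with a small model. The following Python sketch is not the documented hardware algorithm: it assumes that the cache simply holds the four most recent frames, that a call made with a full cache spills exactly one frame (the oldest), and that a return to a spilled frame reloads exactly one frame. The function name `count_memory_accesses` and the event strings are hypothetical.

```python
FRAMES_ON_CHIP = 4  # the data sheet states that four local register sets are stored on-chip


def count_memory_accesses(events, capacity=FRAMES_ON_CHIP):
    """Count frame spills and reloads for a trace of "call"/"ret" events.

    Simplified model (an assumption): the frame of the running procedure is
    always on-chip, a call with a full cache spills the oldest frame to the
    procedure stack in memory, and a return to a spilled frame reloads it.
    """
    cached = 1    # frames currently on-chip (the running procedure's frame)
    spilled = 0   # frames currently saved on the memory-resident stack
    spills = 0
    reloads = 0
    for event in events:
        if event == "call":
            if cached == capacity:
                spilled += 1      # oldest frame goes to memory
                spills += 1
            else:
                cached += 1
        elif event == "ret":
            cached -= 1
            if cached == 0:
                if spilled == 0:
                    raise ValueError("return without a matching call")
                spilled -= 1      # caller's frame comes back from memory
                reloads += 1
                cached = 1
        else:
            raise ValueError(f"unknown event: {event!r}")
    return spills, reloads


if __name__ == "__main__":
    # Three nested calls fit on-chip; a fourth forces one spill and, later, one reload.
    print(count_memory_accesses(["call"] * 3 + ["ret"] * 3))
    print(count_memory_accesses(["call"] * 4 + ["ret"] * 4))
```

Under this model a program that oscillates between two or three call levels never touches the procedure stack in memory, which is the behavior the text describes.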
Instruction Cache
To further reduce memory accesses, the 80960KA
includes a 512-byte on-chip instruction cache. The
instruction cache is based on the concept of locality
of reference; that is, most programs are not usually
executed in a steady stream but consist of many
branches and loops that lead to jumping back and
forth within the same small section of code. Thus, by
maintaining a block of instructions in a cache, the
number of memory references required to read instructions into the processor can be greatly reduced.
To load the instruction cache, instructions are
fetched in 16-byte blocks, so that up to four instructions can be fetched at one time. An efficient
prefetch algorithm increases the probability that an
instruction will already be in the cache when it is
needed.
Code for small loops will often fit entirely within the
cache, leading to a great increase in processing
speed since further memory references might not be
necessary until the program exits the loop. Similarly,
when calling short procedures, the code for the calling procedure is likely to remain in the cache, so it
will be there on the procedure's return.
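The locality argument can be made concrete with a small model of a 512-byte, direct-mapped cache filled in 16-byte blocks. The Python sketch below is only an illustration: the split of an address into tag, line index and byte offset is the standard direct-mapped arrangement and is assumed rather than taken from the data sheet, the prefetch algorithm is ignored, and the names `split_address` and `count_misses` are hypothetical.

```python
CACHE_BYTES = 512                       # instruction cache size from the data sheet
LINE_BYTES = 16                         # instructions are fetched in 16-byte blocks
NUM_LINES = CACHE_BYTES // LINE_BYTES   # 32 lines in this model


def split_address(addr):
    """Return (tag, line index, byte offset) for a byte address."""
    offset = addr % LINE_BYTES
    index = (addr // LINE_BYTES) % NUM_LINES
    tag = addr // CACHE_BYTES
    return tag, index, offset


def count_misses(fetch_addresses):
    """Count line fills for a sequence of instruction fetch addresses."""
    lines = [None] * NUM_LINES          # tag held by each line, None if empty
    misses = 0
    for addr in fetch_addresses:
        tag, index, _ = split_address(addr)
        if lines[index] != tag:
            lines[index] = tag          # fill the line on a miss
            misses += 1
    return misses


if __name__ == "__main__":
    # A 64-instruction loop (256 bytes) executed ten times: only the first
    # pass misses, once per 16-byte block.
    small_loop = [a for _ in range(10) for a in range(0x1000, 0x1100, 4)]
    print(count_misses(small_loop))
    # A 256-instruction loop (1024 bytes) is larger than the cache, so in this
    # model every block misses on every pass.
    large_loop = [a for _ in range(2) for a in range(0x0000, 0x0400, 4)]
    print(count_misses(large_loop))
```

In this model the small loop costs 16 line fills no matter how many times it iterates, while the oversized loop refills every block on each pass.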
Register Scoreboarding
The instruction decoder has been optimized in several ways. One of these optimizations is the ability to
do instruction overlapping by means of register
scoreboarding.
Register scoreboarding occurs when a LOAD instruction is executed to move a variable from memory into a register. When the instruction is initiated, a scoreboard bit on the target register is set. When the register is actually loaded, the bit is reset. In between, any reference to the register contents is accompanied by a test of the scoreboard bit to ensure that the load has completed before processing continues. Since the processor does not have to wait for the LOAD to be completed, it can go on to execute additional instructions placed in between the LOAD instruction and the instruction that uses the register contents, as shown in the following example:
LOAD R4, address 1
LOAD R5, address 2
Unrelated instruction
Unrelated instruction
ADD R4, R5, R6
In essence, the two unrelated instructions between the LOAD and ADD instructions are executed for free (i.e., take no apparent time to execute) because they are executed while the register is being loaded. Up to three instructions can be pending at one time with three corresponding scoreboard bits set. By exploiting this feature, system programmers and compilers have a useful tool for optimizing execution speed.
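The "free" instructions in the example above can be reproduced with a toy timing model. The Python sketch below is an illustration under stated assumptions, not a description of the 80960KA pipeline: every instruction is assumed to issue in one cycle, load data is assumed to arrive a fixed `load_latency` cycles after the load issues (three cycles here, an arbitrary value), and the limit of three pending loads is not modeled. The function name `run` and the tuple format are hypothetical.

```python
def run(program, load_latency=3):
    """Return the cycle count for a list of (op, dest, sources) tuples.

    An instruction stalls until every source register it reads has had its
    scoreboard bit cleared, that is, until the pending load has completed.
    """
    ready_at = {}   # register -> cycle at which its pending load completes
    cycle = 0
    for op, dest, sources in program:
        for reg in sources:
            # Test the scoreboard: wait for any load still in flight.
            cycle = max(cycle, ready_at.get(reg, 0))
        issue = cycle
        cycle = issue + 1                        # one issue slot per cycle
        if op == "LOAD":
            ready_at[dest] = issue + load_latency
    return cycle


LOADS = [("LOAD", "R4", []), ("LOAD", "R5", [])]
UNRELATED = [("OTHER", "R8", ["R9"]), ("OTHER", "R10", ["R11"])]
ADD = [("ADD", "R6", ["R4", "R5"])]

if __name__ == "__main__":
    print(run(LOADS + ADD))               # ADD stalls waiting for the loads
    print(run(LOADS + UNRELATED + ADD))   # same total: the fillers are hidden
```

With the assumed three-cycle latency both sequences finish in the same number of cycles, so the two unrelated instructions add no apparent execution time.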
High Bandwidth Local Bus
An 80960KA CPU resides on a high-bandwidth address/data bus known as the local bus (L-Bus). The L-Bus provides a direct communication path between the processor and the memory and I/O subsystem interfaces. The processor uses the local bus to fetch instructions, manipulate memory, and respond to interrupts. Its features include:
• 32-bit multiplexed address/data path
• Four-word burst capability, which allows transfers from 1 to 16 bytes at a time
• High bandwidth reads and writes at 66.7 Mbytes per second
• Special signal to indicate whether a memory transaction can be cached
Figure 5 identifies the groups of signals which constitute the L-Bus. Table 4 lists the function of the L-Bus and other processor-support signals, such as the interrupt lines.
Figure 5. Local Bus Signal Groups (address/data, 32 lines; control: address, data, and operation signals, 15 lines; arbitration, 2 lines)
Interrupt Handling
The 80960KA can be interrupted in one of two ways: by the activation of one of four interrupt pins or by sending a message on the processor's data bus.
The 80960KA is unusual in that it automatically handles interrupts on a priority basis and tracks pending interrupts through its on-chip interrupt controller. Two of the interrupt pins can be configured to provide 8259A handshaking for expansion beyond four interrupt lines.
Debug Features
The 80960KA has built-in debug capabilities. There are two types of breakpoints and six different trace modes. The debug features are controlled by two internal 32-bit registers, the Process-Controls Word and the Trace-Controls Word. By setting bits in these control words, a software debug monitor can closely control how the processor responds during program execution.
The 80960KA has both hardware and software breakpoints. It provides two hardware breakpoint registers on-chip which can be set by a special command to any value. When the instruction pointer matches the value in one of the breakpoint registers, the breakpoint will fire, and a breakpoint handling routine is called automatically.
The 80960KA also provides software breakpoints through the use of two instructions, MARK and FMARK. These instructions can be placed at any point in a program and will cause the processor to halt execution at that point and call the breakpoint handling routine. The breakpoint mechanism is easy to use and provides a powerful debugging tool.
Tracing is available for instructions (single-step execution), calls and returns, and branching. Each different type of trace may be enabled separately by a special debug instruction. In each case, the 80960KA executes the instruction first and then calls a trace handling routine (usually part of a software debug monitor). Further program execution is halted until the trace routine is completed. When the trace event handling routine is completed, instruction execution resumes at the next instruction. The
80960KA's tracing mechanisms, which are implemented completely in hardware, greatly simplify the task of testing and debugging software.
FAULT DETECTION
The 80960KA has an automatic mechanism to handle faults. There are ten fault types including trace, arithmetic, and floating-point faults. When the processor detects a fault, it automatically calls the appropriate fault handling routine and saves the current instruction pointer and necessary state information to make efficient recovery possible. The processor posts diagnostic information on the type of fault to a Fault Record. Like interrupt handling routines, fault handling routines are usually written to meet the needs of a specific application and are often included as part of the operating system or kernel.
For each of the ten fault types, there are numerous subtypes that provide specific information about a fault. For example, a floating-point fault may have its subtype set to an Overflow or Zero-Divide fault. The fault handler can use this specific information to respond correctly to the fault.
BUILT-IN TESTABILITY
Upon reset, the 80960KA automatically conducts an extensive internal test (self-test) of its major blocks of logic. Then, before executing its first instruction, it does a zero checksum on the first eight words in memory to ensure that the system has been loaded correctly. If a problem is discovered at any point during the self-test, the 80960KA will assert its FAILURE pin and will not begin program execution. The self-test takes approximately 47,000 cycles to complete.
System manufacturers can use the 80960KA's self-test feature during incoming parts inspection. No special diagnostic programs need to be written, and the test is both thorough and fast. The self-test capability helps ensure that defective parts will be discovered before systems are shipped, and once in the field, the self-test makes it easier to distinguish between problems caused by processor failure and problems resulting from other causes.
CHMOS
The 80960KA is fabricated using Intel's CHMOS IV (Complementary High Speed Metal Oxide Semiconductor) process. This advanced technology eliminates the frequency and reliability limitations of older CMOS processes and opens a new era in microprocessor performance. It combines the high performance capabilities of Intel's industry-leading HMOS technology with the high density and low power characteristics of CMOS. The 80960KA is available at 10, 16, 20 and 25 MHz.
Table 4a. 80960KA Pin Description: L-Bus Signals
Symbol: CLK2
Type: I
Name and Function: SYSTEM CLOCK provides the fundamental timing for 80960KA systems. It is divided by two inside the 80960KA to generate the internal processor clock.
Symbol: LAD31-LAD0
Type: I/O, T.S.
Name and Function: LOCAL ADDRESS/DATA BUS carries 32-bit physical addresses and data to and from memory. During an address (Ta) cycle, bits 2-31 contain a physical word address (bits 0-1 indicate SIZE; see below). During a data (Td) cycle, bits 0-31 contain read or write data. The LAD lines are active HIGH and float to a high impedance state when not active.
SIZE, which is comprised of bits 0-1 of the LAD lines during a Ta cycle, specifies the size of a burst transfer in words.
  LAD1  LAD0  Burst Size
  0     0     1 Word
  0     1     2 Words
  1     0     3 Words
  1     1     4 Words
Symbol: ALE
Type: O, T.S.
Name and Function: ADDRESS-LATCH ENABLE indicates the transfer of a physical address. ALE is asserted during a Ta cycle and deasserted before the beginning of the Td state. It is active LOW and floats to a high impedance state during a hold cycle (Th or Thr).
I/O = Input/Output, O = Output, I = Input, O.D. = Open-Drain, T.S. = tri-state
Table 4a. 80960KA Pin Description: L-Bus Signals (Continued)
Symbol: ADS
Type: O, O.D.
Name and Function: ADDRESS/DATA STATUS indicates an address state. ADS is asserted every Ta state and deasserted during the following Td state. For a burst transaction, ADS is asserted again every Td state where READY was asserted in the previous cycle.
Symbol: W/R
Type: O, O.D.
Name and Function: WRITE/READ specifies, during a Ta cycle, whether the operation is a write or read. It is latched on-chip and remains valid during Td cycles.
Symbol: DT/R
Type: O, O.D.
Name and Function: DATA TRANSMIT/RECEIVE indicates the direction of data transfer to and from the L-Bus. It is low during Ta and Td cycles for a read or interrupt acknowledgement; it is high during Ta and Td cycles for a write. DT/R never changes state when DEN is asserted (see Timing Diagrams).
Symbol: DEN
Type: O, O.D.
Name and Function: DATA ENABLE is asserted during Td cycles and indicates transfer of data on the LAD bus lines.
Symbol: READY
Type: I
Name and Function: READY indicates that data on LAD lines can be sampled or removed. If READY is not asserted during a Td cycle, the Td cycle is extended to the next cycle by inserting a wait state (Tw), and ADS is not asserted in the next cycle.
Symbol: LOCK
Type: I/O, O.D.
Name and Function: BUS LOCK prevents other bus masters from gaining control of the L-Bus following the current cycle (if they would assert LOCK to do so). LOCK is used by the processor or any bus agent when it performs indivisible Read/Modify/Write (RMW) operations. Do not leave LOCK unconnected. It must be pulled high for the processor to function properly.
For a read that is designated as a RMW-read, LOCK is examined. If asserted, the processor waits until it is not asserted; if not asserted, the processor asserts LOCK during the Ta cycle and leaves it asserted.
A write that is designated as an RMW-write deasserts LOCK in the Ta cycle. During the time LOCK is asserted, a bus agent can perform a normal read or write but no RMW operations. LOCK is also held asserted during an interrupt-acknowledge transaction.
Symbol: BE3-BE0
Type: O, O.D.
Name and Function: BYTE ENABLE LINES specify which data bytes (up to four) on the bus take part in the current bus cycle. BE3 corresponds to LAD31-LAD24 and BE0 corresponds to LAD7-LAD0.
The byte enables are provided in advance of data. The byte enables asserted during Ta specify the bytes of the first data word. The byte enables asserted during Td specify the bytes of the next data word (if any), that is, the word to be transmitted following the next assertion of READY. The byte enables during the Td cycles preceding the last assertion of READY are undefined. The byte enables are latched on-chip and remain constant from one Td cycle to the next when READY is not asserted.
For reads, the byte enables specify the byte(s) that the processor will actually use. L-Bus agents are required to assert only adjacent byte enables (e.g., asserting just BE0 and BE2 is not permitted), and are required to assert at least one byte enable. To produce address bits A0 and A1 externally, they can be decoded from the byte enables.
I/O = Input/Output, O = Output, I = Input, O.D. = Open-Drain, T.S. = tri-state
Table 4a. 80960KA Pin Description: L-Bus Signals (Continued)
Symbol: HOLD/HLDAR
Type: I
Name and Function: HOLD: If the processor is the primary bus master (PBM), the input is interpreted as HOLD, a request from a secondary bus master to acquire the bus. When the processor receives HOLD and grants another master control of the bus, it floats its tri-state bus lines and then asserts HLDA and enters the Th state. When HOLD is deasserted, the processor will deassert HLDA and go to either the Ti or Ta state.
HOLD ACKNOWLEDGE RECEIVED: If the processor is a secondary bus master (SBM), the input is HLDAR, which indicates, when HOLDR output is high, that the processor has acquired the bus. Processors and other agents can be told at reset if they are the primary bus master (PBM).
Symbol: HLDA/HOLDR
Type: O, T.S.
Name and Function: HOLD ACKNOWLEDGE: If the processor is a primary bus master, the output is HLDA, which relinquishes control of the bus to another bus master.
HOLD REQUEST: For secondary bus masters (SBM), the output is HOLDR, which is a request to acquire the bus. The bus is said to be acquired if the agent is a primary bus master and does not have its HLDA output asserted, or if the agent is a secondary bus master and has its HOLD input and HLDA output asserted.
Symbol: CACHE
Type: O, T.S.
Name and Function: CACHE indicates if an access is cacheable during a Ta cycle. It is not asserted during any synchronous access, such as a synchronous load or move instruction used for sending an IAC message. The CACHE signal floats to a high impedance state when the processor is idle.
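The address-cycle encoding and the byte-enable rules described in Table 4a can be sketched in a few lines. The Python example below is a hedged illustration rather than part of the data sheet: the function names are hypothetical, byte enables are represented abstractly as the set of asserted lane numbers (electrical polarity is ignored), and the derivation of A1:A0 as the lowest asserted byte lane is an assumption consistent with BE0 corresponding to LAD7-LAD0.

```python
def decode_address_cycle(lad):
    """Split a 32-bit LAD value sampled during Ta into (word address, burst words).

    Bits 2-31 carry the physical word address; bits 0-1 carry SIZE, where
    00 means 1 word, 01 means 2 words, 10 means 3 words and 11 means 4 words.
    """
    if not 0 <= lad <= 0xFFFFFFFF:
        raise ValueError("LAD is a 32-bit value")
    word_address = lad & 0xFFFFFFFC
    burst_words = (lad & 0x3) + 1
    return word_address, burst_words


def low_address_bits(asserted):
    """Derive A1:A0 from the asserted byte enables, e.g. {2, 3} for BE2 and BE3.

    Enforces the two rules stated in the table: at least one byte enable
    must be asserted, and the asserted enables must be adjacent.
    """
    lanes = sorted(set(asserted))
    if not lanes:
        raise ValueError("at least one byte enable must be asserted")
    if lanes[0] < 0 or lanes[-1] > 3:
        raise ValueError("byte enables are numbered 0 through 3")
    if lanes != list(range(lanes[0], lanes[-1] + 1)):
        raise ValueError("only adjacent byte enables may be asserted")
    return lanes[0]   # assumption: the lowest asserted lane gives A1:A0


if __name__ == "__main__":
    address, words = decode_address_cycle(0x00001003)
    print(hex(address), words)        # word address 0x1000, four-word burst
    print(low_address_bits({2, 3}))   # upper halfword: A1:A0 = 2
```

A memory controller model built this way would multiply `burst_words` by four to obtain the 1-to-16-byte transfer sizes quoted for the L-Bus.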
Table 4b. 80960KA Pin Description: Module Support Signals
Symbol: BADAC
Type: I
Name and Function: BAD ACCESS, if asserted in the cycle following the one in which the last READY of a transaction is asserted, indicates that an unrecoverable error has occurred on the current bus transaction, or that a synchronous load/store instruction has not been acknowledged.
STARTUP: During system reset, the BADAC signal is interpreted differently. If the signal is high, it indicates that this processor will perform system initialization. If it is low, another processor in the system will perform system initialization instead.
Symbol: RESET
Type: I
Name and Function: RESET clears the internal logic of the processor and causes it to re-initialize.
During RESET assertion, the input pins are ignored (except for BADAC and IAC/INT0), the tri-state output pins are placed in a high impedance state, and other output pins are placed in their non-asserted state.
RESET must be asserted for at least 41 CLK2 cycles for a predictable RESET. The HIGH to LOW transition of RESET should occur after the rising edge of both CLK2 and the external bus CLK, and before the next rising edge of CLK2.
Symbol: FAILURE
Type: O, O.D.
Name and Function: INITIALIZATION FAILURE indicates that the processor has failed to initialize correctly. After RESET is deasserted and before the first bus transaction begins, FAILURE is asserted while the processor performs a self-test. If the self-test completes successfully, then FAILURE is deasserted. Next, the processor performs a zero checksum on the first eight words of memory. If it fails, FAILURE is asserted for a second time and remains asserted; if it passes, system initialization continues and FAILURE remains deasserted.
Symbol: N.C.
Type: N/A
Name and Function: NOT CONNECTED indicates pins should not be connected. Never connect any pin marked N.C.
I/O = Input/Output, O = Output, I = Input, O.D. = Open-Drain, T.S. = tri-state
Table 4b. 80960KA Pin Description: Module Support Signals (Continued)
Symbol: IAC/INT0
Type: I
Name and Function: INTERAGENT COMMUNICATION REQUEST/INTERRUPT 0 indicates either that there is a pending IAC message for the processor or an interrupt. The bus interrupt control register determines in which way the signal should be interpreted. To signal an interrupt or IAC request in a synchronous system, this pin (as well as the other interrupt pins) must be enabled by being deasserted for at least one bus cycle and then asserted for at least one additional bus cycle; in an asynchronous system, the pin must remain deasserted for at least two bus cycles and then be asserted for at least two more bus cycles.
LOCAL PROCESSOR NUMBER: This signal is interpreted differently during system reset. If the signal is at a high voltage level, it indicates that this processor is a primary bus master (Local Processor Number = 0); if it is at a low voltage level, it indicates that this processor is a secondary bus master (Local Processor Number = 1).
Symbol: INT1
Type: I
Name and Function: INTERRUPT 1, like INT0, provides direct interrupt signaling.
Symbol: INT2/INTR
Type: I
Name and Function: INTERRUPT 2/INTERRUPT REQUEST: The bus control register determines how this pin is interpreted. If INT2, it has the same interpretation as the INT0 and INT1 pins. If INTR, it is used to receive an interrupt request from an external interrupt controller.
Symbol: INT3/INTA
Type: I/O, O.D.
Name and Function: INTERRUPT 3/INTERRUPT ACKNOWLEDGE: The bus interrupt control register determines how this pin is interpreted. If INT3, it has the same interpretation as the INT0, INT1, and INT2 pins. If INTA, it is used as an output to control interrupt-acknowledge bus transactions. The INTA output is latched on-chip and remains valid during Td cycles; as an output, it is open-drain.
I/O = Input/Output, O = Output, I = Input, O.D. = Open-Drain, T.S. = tri-state
ELECTRICAL SPECIFICATIONS
Power and Grounding
The 80960KA is implemented in CHMOS IV technology and has modest power requirements. Its high clock frequency and numerous output buffers (address/data, control, error, and arbitration signals) can cause power surges as multiple output buffers drive new signal levels simultaneously. For clean on-chip power distribution at high frequency, 12 VCC and 13 VSS pins separately feed functional units of the 80960KA in the PGA.
Power and ground connections must be made to all power and ground pins of the 80960KA. On the circuit board, all VCC pins must be strapped closely together, preferably on a power plane. Likewise, all VSS pins should be strapped together, preferably on a ground plane. These pins may not be connected together within the chip.
Power Decoupling Recommendations
Liberal decoupling capacitance should be placed near the 80960KA. The processor can cause transient power surges when driving the L-Bus, particularly when it is connected to a large capacitive load.
Low inductance capacitors and interconnects are recommended for best high frequency electrical performance. Inductance can be reduced by shortening the board traces between the processor and decoupling capacitors as much as possible. Capacitors specifically designed for PGA packages are also commercially available and offer the lowest possible inductance.
Connection Recommendations
For reliable operation, always connect unused inputs to an appropriate signal level. In particular, if one or more interrupt lines are not used, they should be pulled up. No inputs should ever be left floating.
All open-drain outputs require a pullup device.. While
in some cases a simple pullup resistor will be adequate, we recommend a network of pullup and pulldown resistors biased to a valid V,H (~3.4V) and
terminated in the characteristic impedahce of the circuit board. Figure 6 shows our recommendations for
the resistor values for both a low and high current
drive network, which assumes that the circuit board
has a characteristic impedance of 100n. The advantage of terminating the output signals in this fashion
is that it limits signal swing and reduces AC power
consumption.
.
Figure 10 shows the typical capacitive derating
curve for the B0960KA measured from 1.SV on the
system clock (ClK) to 1.SV on the falling edge and
1.SV on the rising edge of the LoBus address/data
(lAD) signals.
Test Load Circuit
Figure .13 illustrates the load circuit used to test the
B0960KA's tristate pins, and Figure 14 shows the
load circuit used to test the open drain outputs. The
open drain test uses an active load circuit in the form
of a matched diode bridge. Since the open-drain outputs sink current, only the 10L legs of the bridge are
necessary and the 10H legs are not used. When the
B0960KA driver under test is turned off, the output
pin is pulled up to VREF (i.e., VOH). Diode 01 is
turned off and the 10L current source flows through
diode 02.
Characteristic Curves
Figure 7 shows the typical supply current requirements over the operating temperature range of the
processor at supply voltage (Veel of SV. Figure B
shows· the typical power supply current (Ieel required by the B0960KA at various operating frequencies when measured at three input voltage (Vecl
levels.
When the 80960KA open-drain driver under test is
on, diode D1 is also on, and the voltage on the pin
being tested drops to VOL. Diode D2 turns off and
IOL flows through diode D1.
For a given output current (IOL), the curve in Figure 9
shows the worst case output low voltage (VOL).
[Figure 6 schematics (270775-4, 270775-5): each 80960KA open-drain output is tied to a pullup resistor to VCC and a pulldown resistor; the resistor values shown are 130Ω, 180Ω, 280Ω and 390Ω.]
Low Drive Network: VOH = 3.42V, IOL = 25.3 mA
High Drive Network: VOH = 3.41V, IOL = 33.8 mA
Figure 6. Connection Recommendations for Low and High Current Drive Networks
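The Figure 6 values can be cross-checked with a short calculation. The sketch below is illustrative only: the pairing of resistors (180Ω/390Ω for the low drive network, 130Ω/280Ω for the high drive network) is an assumption inferred from the listed VOH values, the pulldown is assumed to go to ground, and the function names are hypothetical.

```python
def divider(r_up_ohm, r_down_ohm, vcc=5.0):
    """Return (VOH, Thevenin impedance) of a pullup/pulldown pair.

    VOH is the unloaded divider voltage seen when the open-drain
    driver is off; the impedance is the parallel combination that
    terminates the board trace.
    """
    voh = vcc * r_down_ohm / (r_up_ohm + r_down_ohm)
    z_thevenin = r_up_ohm * r_down_ohm / (r_up_ohm + r_down_ohm)
    return voh, z_thevenin


def pullup_current_ma(r_up_ohm, vol, vcc=5.0):
    """Current (mA) the driver sinks through the pullup while holding VOL."""
    return (vcc - vol) / r_up_ohm * 1000.0


# Assumed pairing of the Figure 6 resistor values (not stated in the text).
low_voh, low_z = divider(180, 390)
high_voh, high_z = divider(130, 280)

print(f"low drive:  VOH = {low_voh:.2f} V, Z = {low_z:.1f} ohm")
print(f"high drive: VOH = {high_voh:.2f} V, Z = {high_z:.1f} ohm")
# Low drive network at the 0.45 V VOL limit from the DC table.
print(f"low drive IOL = {pullup_current_ma(180, 0.45):.1f} mA")
```

Under these assumptions both networks present an impedance in the neighborhood of the 100Ω board impedance, and the low drive current matches the 25.3 mA listed in Figure 6.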
[Figure 7 plot (270775-6): supply current versus case temperature (°C) at VCC = 5.0V, with curves for 25 MHz, 20 MHz, 16 MHz and 10 MHz.]
Figure 7. Typical Supply Current (ICC)
[Figure 8 plot (270775-7): supply current versus operating frequency (MHz), with curves at 4.5V, 5.0V and 5.5V.]
Figure 8. Typical Current vs Frequency
[Figure 9 plot (270775-8): output low voltage versus output low current (mA) at Temp = +85°C, VCC = 4.5V.]
Figure 9. Worst Case Voltage vs
Output Current on Open-Drain Pins
[Figure 10 plot (270775-9): rising and falling edge derating versus capacitive load (pF) at Temp = +85°C, VCC = 4.5V.]
Figure 10. Capacitive Derating Curve
ABSOLUTE MAXIMUM RATINGS*
Operating Temperature ........ 0°C to +85°C Case
Storage Temperature .......... -65°C to +150°C
Voltage on Any Pin .......... -0.5V to VCC + 0.5V
Power Dissipation ................. 2.5W (25 MHz)
NOTICE: This is a production data sheet. The specifications are subject to change without notice.
*WARNING: Stressing the device beyond the "Absolute
Maximum Ratings" may cause permanent damage.
These are stress ratings only. Operation beyond the
"Operating Conditions" is not recommended and extended exposure beyond the "Operating Conditions"
may affect device reliability.
DC CHARACTERISTICS
PGA:
80960KA (16 MHz): TCASE = 0°C to +85°C, VCC = 5V ± 10%
80960KA (20 and 25 MHz): TCASE = 0°C to +85°C, VCC = 5V ± 5%
PQFP:
80960KA (10 and 16 MHz): TCASE = 0°C to +100°C, VCC = 5V ± 10%
80960KA (20 MHz): TCASE = 0°C to +100°C, VCC = 5V ± 5%

| Symbol | Parameter | Min | Max | Units | Test Conditions |
|---|---|---|---|---|---|
| VIL | Input Low Voltage | -0.3 | +0.8 | V | |
| VIH | Input High Voltage | 2.0 | VCC + 0.3 | V | |
| VCL | CLK2 Input Low Voltage | -0.3 | +0.8 | V | |
| VCH | CLK2 Input High Voltage | 0.55 VCC | VCC + 0.3 | V | |
| VOL | Output Low Voltage | | 0.45 | V | (1,5) |
| VOH | Output High Voltage | 2.4 | | V | (2,4) |
| ICC | Power Supply Current: 10 MHz | | 300 | mA | |
| | 16 MHz | | 375 | mA | |
| | 20 MHz | | 420 | mA | |
| | 25 MHz | | 480 | mA | |
| ILI | Input Leakage Current | | ±15 | µA | 0 ≤ VIN ≤ VCC |
| ILO | Output Leakage Current | | ±15 | µA | 0.45 ≤ VO ≤ VCC |
| CIN | Input Capacitance | | 10 | pF | fC = 1 MHz(3) |
| CO | I/O or Output Capacitance | | 12 | pF | fC = 1 MHz(3) |
| CCLK | Clock Capacitance | | 10 | pF | fC = 1 MHz(3) |
NOTES:
1. For tri-state outputs, this parameter is measured at:
Address/Data: 4.0 mA
Controls: 5.0 mA
2. This parameter is measured at:
Address/Data: -1.0 mA
Controls: -0.9 mA
ALE: -5.0 mA
3. Input, output, and clock capacitance are not tested.
4. Not measured on open-drain outputs.
5. For open-drain outputs: 25 mA
AC SPECIFICATIONS
This section describes the AC specifications for the
80960KA pins. All input and output timings are specified relative to the 1.5V level of the rising edge. For
output timings, the specifications refer to the time it
takes the signal to reach 1.5V. For input timings,
the specifications refer to the time at which the signal reaches (for input setup) or leaves (for hold time)
the TTL levels of LOW (0.8V) or HIGH (2.0V). All AC
testing should be done with input clock voltages of
0.4V and 2.4V, except for the clock (CLK2), which
should be tested with input voltages of 0.45 VCC and
0.55 VCC.
[Figure 11 timing diagram (270775-10): drive levels and timing relative to CLK2 edge A.
Outputs: LAD31-LAD0, ADS, W/R, DEN, BE3-BE0, HLDA/HOLDR, CACHE, LOCK, INTA, DT/R.
Inputs: LAD31-LAD0, BADAC, IAC/INT0, INT1, INT2/INTR, INT3, HOLD/HLDAR, LOCK, READY.]
Figure 11. Drive Levels and Timing Relationships for 80960KA Signals
[Figure 12 timing diagram (270775-11): CLK2, CLK, A/D, ALE, ADS, BE(0:3), W/R, DT/R, DEN and READY across the bus states.]
Figure 12. Timing Relationship of L-Bus Signals
AC Specification Tables
80960KA AC Characteristics (10 MHz, PQFP Only)

| Symbol | Parameter | Min | Max | Units | Test Conditions |
|---|---|---|---|---|---|
| T1 | Processor Clock Period (CLK2) | 50 | 125 | ns | VIN = 1.5V |
| T2 | Processor Clock Low Time (CLK2) | 12 | | ns | VIL = 10% Point = 1.2V |
| T3 | Processor Clock High Time (CLK2) | 12 | | ns | VIH = 90% Point = 0.1V + 0.5 VCC |
| T4 | Processor Clock Fall Time (CLK2) | | 10 | ns | VIN = 90% Point to 10% Point |
| T5 | Processor Clock Rise Time (CLK2) | | 10 | ns | VIN = 10% Point to 90% Point |
| T6 | Output Valid Delay | 2 | 25 | ns | CL = 100 pF (LAD), CL = 75 pF (Controls) |
| T6H | HOLDA Output Valid Delay | 4 | 31 | ns | CL = 75 pF |
| T7 | ALE Width | 25 | | ns | CL = 75 pF |
| T8 | ALE Output Valid Delay | 0 | 20 | ns | CL = 75 pF(2) |
| T9 | Output Float Delay | 2 | 20 | ns | CL = 100 pF (LAD), CL = 75 pF (Controls)(2) |
| T9H | HOLDA Output Float Delay | 4 | 20 | ns | CL = 75 pF |
| T10 | Input Setup 1 | 3 | | ns | |
| T11 | Input Hold | 5 | | ns | |
| T11H | HOLD Input Hold | 4 | | ns | |
| T12 | Input Setup 2 | 8 | | ns | |
| T13 | Setup to ALE Inactive | 10 | | ns | CL = 100 pF (LAD), CL = 75 pF (Controls) |
| T14 | Hold after ALE Inactive | 8 | | ns | CL = 100 pF (LAD), CL = 75 pF (Controls) |
| T15 | Reset Hold | 3 | | ns | |
| T16 | Reset Setup | 5 | | ns | |
| T17 | Reset Width | 1640 | | ns | 41 CLK2 Periods Minimum |

NOTES:
1. IAC/INT0, INT1, INT2/INTR, INT3 can be asynchronous.
2. A float condition occurs when the maximum output current becomes less than ILO. Float delay is not tested, but should be
no longer than the valid delay.
3. Clock rise and fall time is not tested.
80960KA AC Characteristics (16 MHz)

| Symbol | Parameter | Min | Max | Units | Test Conditions |
|---|---|---|---|---|---|
| T1 | Processor Clock Period (CLK2) | 31.25 | 125 | ns | VIN = 1.5V |
| T2 | Processor Clock Low Time (CLK2) | 8 | | ns | VIL = 10% Point = 1.2V |
| T3 | Processor Clock High Time (CLK2) | 8 | | ns | VIH = 90% Point = 0.1V + 0.5 VCC |
| T4 | Processor Clock Fall Time (CLK2) | | 10 | ns | VIN = 90% Point to 10% Point |
| T5 | Processor Clock Rise Time (CLK2) | | 10 | ns | VIN = 10% Point to 90% Point |
| T6 | Output Valid Delay | 2 | 25 | ns | CL = 100 pF (LAD), CL = 75 pF (Controls) |
| T6H | HOLDA Output Valid Delay | 4 | 31 | ns | CL = 75 pF |
| T7 | ALE Width | 15 | | ns | CL = 75 pF |
| T8 | ALE Output Valid Delay | 0 | 20 | ns | CL = 75 pF(2) |
| T9 | Output Float Delay | 2 | 20 | ns | CL = 100 pF (LAD), CL = 75 pF (Controls)(2) |
| T9H | HOLDA Output Float Delay | 4 | 20 | ns | CL = 75 pF |
| T10 | Input Setup 1 | 3 | | ns | |
| T11 | Input Hold | 5 | | ns | |
| T11H | HOLD Input Hold | 4 | | ns | |
| T12 | Input Setup 2 | 8 | | ns | |
| T13 | Setup to ALE Inactive | 10 | | ns | CL = 100 pF (LAD), CL = 75 pF (Controls) |
| T14 | Hold after ALE Inactive | 8 | | ns | CL = 100 pF (LAD), CL = 75 pF (Controls) |
| T15 | Reset Hold | 3 | | ns | |
| T16 | Reset Setup | 5 | | ns | |
| T17 | Reset Width | 1281 | | ns | 41 CLK2 Periods Minimum |

NOTES:
1. IAC/INT0, INT1, INT2/INTR, INT3 can be asynchronous.
2. A float condition occurs when the maximum output current becomes less than ILO. Float delay is not tested, but should be
no longer than the valid delay.
3. Clock rise and fall time is not tested.
80960KA AC Characteristics (20 MHz)

| Symbol | Parameter | Min | Max | Units | Test Conditions |
|---|---|---|---|---|---|
| T1 | Processor Clock Period (CLK2) | 25 | 125 | ns | VIN = 1.5V |
| T2 | Processor Clock Low Time (CLK2) | 6 | | ns | VIL = 10% Point = 1.2V |
| T3 | Processor Clock High Time (CLK2) | 6 | | ns | VIH = 90% Point = 0.1V + 0.5 VCC |
| T4 | Processor Clock Fall Time (CLK2) | | 10 | ns | VIN = 90% Point to 10% Point |
| T5 | Processor Clock Rise Time (CLK2) | | 10 | ns | VIN = 10% Point to 90% Point |
| T6 | Output Valid Delay | 2 | 20 | ns | CL = 60 pF (LAD), CL = 50 pF (Controls) |
| T6H | HOLDA Output Valid Delay | 4 | 26 | ns | CL = 50 pF |
| T7 | ALE Width | 12 | | ns | CL = 50 pF |
| T8 | ALE Output Valid Delay | 0 | 20 | ns | CL = 50 pF(2) |
| T9 | Output Float Delay | 2 | 20 | ns | CL = 60 pF (LAD), CL = 50 pF (Controls)(2) |
| T9H | HOLDA Output Float Delay | 4 | 20 | ns | CL = 50 pF |
| T10 | Input Setup 1 | 3 | | ns | |
| T11 | Input Hold | 5 | | ns | |
| T11H | HOLD Input Hold | 4 | | ns | |
| T12 | Input Setup 2 | 7 | | ns | |
| T13 | Setup to ALE Inactive | 10 | | ns | CL = 60 pF (LAD), CL = 50 pF (Controls) |
| T14 | Hold after ALE Inactive | 8 | | ns | CL = 60 pF (LAD), CL = 50 pF (Controls) |
| T15 | Reset Hold | 3 | | ns | |
| T16 | Reset Setup | 5 | | ns | |
| T17 | Reset Width | 1025 | | ns | 41 CLK2 Periods Minimum |

NOTES:
1. IAC/INT0, INT1, INT2/INTR, INT3 can be asynchronous.
2. A float condition occurs when the maximum output current becomes less than ILO. Float delay is not tested, but should be
no longer than the valid delay.
3. Clock rise and fall time is not tested.
[Figure 13 schematic (270775-12): 80960KA tristate output driving load capacitance CL.]
Figure 13. Test Load Circuit for
Tri-State Output Pins
[Figure 14 schematic (270775-13): diode-bridge active load referenced to VREF. IOL tested at 25 mA; VREF = VCC; D1 and D2 are matched.]
Figure 14. Test Load Circuit for Open-Drain Output Pins
80960KA AC Characteristics (25 MHz, PGA Only)

| Symbol | Parameter | Min | Max | Units | Test Conditions |
|---|---|---|---|---|---|
| T1 | Processor Clock Period (CLK2) | 20 | 125 | ns | VIN = 1.5V |
| T2 | Processor Clock Low Time (CLK2) | 5 | | ns | VIL = 10% Point = 1.2V |
| T3 | Processor Clock High Time (CLK2) | 5 | | ns | VIH = 90% Point = 0.1V + 0.5 VCC |
| T4 | Processor Clock Fall Time (CLK2) | | 10 | ns | VIN = 90% Point to 10% Point |
| T5 | Processor Clock Rise Time (CLK2) | | 10 | ns | VIN = 10% Point to 90% Point |
| T6 | Output Valid Delay | 2 | 18 | ns | CL = 60 pF (LAD), CL = 50 pF (Controls) |
| T6H | HOLDA Output Valid Delay | 4 | 24 | ns | CL = 50 pF |
| T7 | ALE Width | 12 | | ns | CL = 50 pF |
| T8 | ALE Output Valid Delay | 0 | 20 | ns | CL = 50 pF(2) |
| T9 | Output Float Delay | 2 | 18 | ns | CL = 60 pF (LAD), CL = 50 pF (Controls) |
| T9H | HOLDA Output Float Delay | 4 | 20 | ns | CL = 50 pF |
| T10 | Input Setup 1 | 3 | | ns | |
| T11 | Input Hold | 5 | | ns | |
| T11H | HOLD Input Hold | 4 | | ns | |
| T12 | Input Setup 2 | 7 | | ns | |
| T13 | Setup to ALE Inactive | 8 | | ns | CL = 60 pF (LAD), CL = 50 pF (Controls) |
| T14 | Hold after ALE Inactive | 8 | | ns | CL = 60 pF (LAD), CL = 50 pF (Controls) |
| T15 | Reset Hold | 3 | | ns | |
| T16 | Reset Setup | 5 | | ns | |
| T17 | Reset Width | 820 | | ns | 41 CLK2 Periods Minimum |

NOTES:
1. IAC/INT0, INT1, INT2/INTR, INT3 can be asynchronous.
2. A float condition occurs when the maximum output current becomes less than ILO. Float delay is not tested, but should be
no longer than the valid delay.
3. Clock rise and fall time is not tested.
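Each AC table specifies the reset width (T17) as a minimum of 41 CLK2 periods. The sketch below, with hypothetical names, derives T17 from the minimum CLK2 period (T1) listed for the 16, 20 and 25 MHz parts; it is a cross-check of the tabulated figures under that reading, not a timing tool. The 10 MHz table's listed 1640 ns is left out because it does not equal 41 × 50 ns.

```python
RESET_CLK2_PERIODS = 41  # "41 CLK2 Periods Minimum" from the AC tables


def min_reset_width_ns(t1_min_ns):
    """Minimum RESET width in ns for a given minimum CLK2 period T1."""
    return RESET_CLK2_PERIODS * t1_min_ns


# Minimum T1 (ns) taken from the 16, 20 and 25 MHz AC tables.
T1_MIN_NS = {16: 31.25, 20: 25.0, 25: 20.0}

for mhz, t1 in T1_MIN_NS.items():
    print(f"{mhz} MHz part: T17 >= {min_reset_width_ns(t1)} ns")
```

The computed values (1281.25, 1025 and 820 ns) agree with the T17 rows of those three tables.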
[Figure 15 waveform (270775-14): CLK2 pulse showing T1 through T5 measured between the 10% and 90% points. HIGH LEVEL (MIN) 0.55 VCC; LOW LEVEL (MAX) 0.8V.]
Figure 15. Processor Clock Pulse (CLK2)
[Figure 16 timing diagram (270775-15): CLK2, CLK, RESET and outputs.
INIT parameters (BADAC, IAC0) must be set up 8 clocks prior to the marked CLK2 edge.
INIT parameters must be held beyond the marked CLK2 edge.
T15 = RESET HOLD, T16 = RESET SETUP, T17 = RESET WIDTH.]
Figure 16. RESET Signal Timing
[Figure 17 timing diagram (270775-16): CLK2, CLK, HOLDR, HOLD, HLDA and HLDAR during Th states. Block diagram (270775-17): primary processor HOLD/HLDA connected to secondary processor HOLDR/HLDAR.]
Figure 17. Hold Timing
Input hold times can be disregarded by the designer
whenever the input is removed because a subsequent output from the processor is deasserted (e.g.,
DEN becomes deasserted).
Whenever the processor generates an output that
indicates a transition into a subsequent state, any
outputs that are specified to be tri-stated in this new
state are guaranteed to be tri-stated. For example, in
the Td cycle following a Ta cycle for a read, the minimum output delay of DEN is 2 ns, but the maximum
float time of LAD is 20 ns. When DEN is asserted,
however, the LAD outputs are guaranteed to have
been tri-stated.
Design Considerations
Designing for the ICE-960KB
The 80960KB In-Circuit Emulator assists in debugging both 80960KA and 80960KB hardware and
software designs. The product consists of a probe
module, cable, and control unit. Because of the high
operating frequency of 80960KA systems, the probe
module connects directly to the 80960KA socket.
When designing an 80960KA hardware system that
uses the ICE-960KB to debug the system, several
electrical and mechanical characteristics should be
considered. These considerations include capacitive
loading, drive requirement, power requirement and
physical layout.
The ICE-960KB probe module increases the load
capacitance of each line by up to 25 pF. It also adds
one standard Schottky TTL load on the CLK2 line,
up to one advanced low-power Schottky TTL load
for each control signal line, and one advanced low-power Schottky TTL load for each address/data and
byte enable line. These loads originate from the
probe module and are driven by the 80960KA processor.
To achieve high noise immunity, the ICE-960KB
probe is powered by the user's system. The high-speed probe circuitry draws up to 1.1A plus the maximum current (ICC) of the 80960KA processor.
The mechanical considerations are shown in Figure
18, which illustrates the lateral clearance requirements for the ICE-960KB probe as viewed from
above the socket of the 80960KA processor.
[Figure 27 plot (270775-46): maximum allowable ambient temperature versus airflow (ft/min), with curves for PQFP, PGA with no heatsink, PGA with omnidirectional heatsink and PGA with unidirectional heatsink.]
Figure 27. 10 MHz 80960 K-Series Maximum Allowable Ambient Temperature
[Figure 28 plot (270775-33): maximum allowable ambient temperature versus airflow (ft/min), with curves for PQFP, PGA with no heatsink, PGA with omnidirectional heatsink and PGA with unidirectional heatsink.]
Figure 28. 16 MHz 80960 K-Series Maximum Allowable Ambient Temperature
[Figure 29 plot (270775-34): maximum allowable ambient temperature versus airflow (ft/min), with curves for PQFP, PGA with no heatsink, PGA with omnidirectional heatsink and PGA with unidirectional heatsink.]
Figure 29. 20 MHz 80960 K-Series Maximum Allowable Ambient Temperature
[Figure 30 plot (270775-35): maximum allowable ambient temperature versus airflow (ft/min), with curves for PGA with no heatsink, PGA with omnidirectional heatsink and PGA with unidirectional heatsink.]
Figure 30. Maximum Allowable Ambient Temperature for
the 80960KA at 25 MHz (available in PGA only)
[Figure 31 plot (270775-36): maximum allowable ambient temperature versus airflow (ft/min), with curves for PGA with no heatsink, PGA with omnidirectional heatsink and PGA with unidirectional heatsink.]
Figure 31. Maximum Allowable Ambient Temperature for the Extended
Temperature TA-80960KA at 20 MHz (available in PGA only)
[Figure 32 package drawing (270775-47): top view of the 132-lead PQFP with pin numbers and signal names; the same assignments are listed in Tables 8 and 9.]
Figure 32. 80960KA PQFP Pinout-View from Top
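Table 8. 80960KA Plastic Package Pinout-In Pin Order

| Pin | Signal | Pin | Signal | Pin | Signal | Pin | Signal |
|---|---|---|---|---|---|---|---|
| 1 | HLDA/HOLDR | 34 | N.C. | 67 | VSS | 100 | LAD0 |
| 2 | ALE | 35 | VCC | 68 | VSS | 101 | LAD1 |
| 3 | LAD26 | 36 | VCC | 69 | N.C. | 102 | LAD2 |
| 4 | LAD27 | 37 | N.C. | 70 | VCC | 103 | VSS |
| 5 | LAD28 | 38 | N.C. | 71 | VCC | 104 | LAD3 |
| 6 | LAD29 | 39 | N.C. | 72 | N.C. | 105 | LAD4 |
| 7 | LAD30 | 40 | N.C. | 73 | VSS | 106 | LAD5 |
| 8 | LAD31 | 41 | VCC | 74 | VCC | 107 | LAD6 |
| 9 | VSS | 42 | VSS | 75 | N.C. | 108 | LAD7 |
| 10 | CACHE | 43 | N.C. | 76 | N.C. | 109 | LAD8 |
| 11 | W/R | 44 | N.C. | 77 | N.C. | 110 | LAD9 |
| 12 | READY | 45 | N.C. | 78 | N.C. | 111 | LAD10 |
| 13 | DT/R | 46 | N.C. | 79 | VSS | 112 | LAD11 |
| 14 | BE0 | 47 | N.C. | 80 | VSS | 113 | LAD12 |
| 15 | BE1 | 48 | N.C. | 81 | N.C. | 114 | VSS |
| 16 | BE2 | 49 | N.C. | 82 | VCC | 115 | LAD13 |
| 17 | BE3 | 50 | N.C. | 83 | VCC | 116 | LAD14 |
| 18 | FAILURE | 51 | N.C. | 84 | VSS | 117 | LAD15 |
| 19 | VSS | 52 | VSS | 85 | IAC/INT0 | 118 | LAD16 |
| 20 | LOCK | 53 | VSS | 86 | INT1 | 119 | LAD17 |
| 21 | DEN | 54 | N.C. | 87 | INT2/INTR | 120 | LAD18 |
| 22 | VSS | 55 | VCC | 88 | INT3/INTA | 121 | LAD19 |
| 23 | VSS | 56 | VCC | 89 | N.C. | 122 | LAD20 |
| 24 | N.C. | 57 | VSS | 90 | VSS | 123 | LAD21 |
| 25 | N.C. | 58 | N.C. | 91 | CLK2 | 124 | LAD22 |
| 26 | VSS | 59 | N.C. | 92 | VCC | 125 | VSS |
| 27 | VSS | 60 | N.C. | 93 | RESET | 126 | LAD23 |
| 28 | N.C. | 61 | N.C. | 94 | N.C. | 127 | LAD24 |
| 29 | VCC | 62 | N.C. | 95 | N.C. | 128 | LAD25 |
| 30 | VCC | 63 | N.C. | 96 | N.C. | 129 | BADAC |
| 31 | N.C. | 64 | N.C. | 97 | N.C. | 130 | HOLD/HLDAR |
| 32 | VSS | 65 | N.C. | 98 | N.C. | 131 | N.C. |
| 33 | VSS | 66 | N.C. | 99 | VSS | 132 | ADS |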
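Table 9. 80960KA Plastic Package Pinout-In Signal Order

| Signal | Pin | Signal | Pin | Signal | Pin | Signal | Pin |
|---|---|---|---|---|---|---|---|
| ADS | 132 | LAD22 | 124 | N.C. | 49 | VCC | 41 |
| ALE | 2 | LAD23 | 126 | N.C. | 50 | VCC | 55 |
| BADAC | 129 | LAD24 | 127 | N.C. | 51 | VCC | 56 |
| BE0 | 14 | LAD25 | 128 | N.C. | 54 | VCC | 70 |
| BE1 | 15 | LAD26 | 3 | N.C. | 58 | VCC | 71 |
| BE2 | 16 | LAD27 | 4 | N.C. | 59 | VCC | 74 |
| BE3 | 17 | LAD28 | 5 | N.C. | 60 | VCC | 82 |
| CACHE | 10 | LAD29 | 6 | N.C. | 61 | VCC | 83 |
| CLK2 | 91 | LAD3 | 104 | N.C. | 62 | VCC | 92 |
| DEN | 21 | LAD30 | 7 | N.C. | 63 | VSS | 9 |
| DT/R | 13 | LAD31 | 8 | N.C. | 64 | VSS | 19 |
| FAILURE | 18 | LAD4 | 105 | N.C. | 65 | VSS | 22 |
| HLDA/HOLDR | 1 | LAD5 | 106 | N.C. | 66 | VSS | 23 |
| HOLD/HLDAR | 130 | LAD6 | 107 | N.C. | 69 | VSS | 26 |
| IAC/INT0 | 85 | LAD7 | 108 | N.C. | 72 | VSS | 27 |
| INT1 | 86 | LAD8 | 109 | N.C. | 75 | VSS | 32 |
| INT2/INTR | 87 | LAD9 | 110 | N.C. | 76 | VSS | 33 |
| INT3/INTA | 88 | LOCK | 20 | N.C. | 77 | VSS | 42 |
| LAD0 | 100 | N.C. | 24 | N.C. | 78 | VSS | 52 |
| LAD1 | 101 | N.C. | 25 | N.C. | 81 | VSS | 53 |
| LAD10 | 111 | N.C. | 28 | N.C. | 89 | VSS | 57 |
| LAD11 | 112 | N.C. | 31 | N.C. | 94 | VSS | 67 |
| LAD12 | 113 | N.C. | 34 | N.C. | 95 | VSS | 68 |
| LAD13 | 115 | N.C. | 37 | N.C. | 96 | VSS | 73 |
| LAD14 | 116 | N.C. | 38 | N.C. | 97 | VSS | 79 |
| LAD15 | 117 | N.C. | 39 | N.C. | 98 | VSS | 80 |
| LAD16 | 118 | N.C. | 40 | N.C. | 131 | VSS | 84 |
| LAD17 | 119 | N.C. | 43 | READY | 12 | VSS | 90 |
| LAD18 | 120 | N.C. | 44 | RESET | 93 | VSS | 99 |
| LAD19 | 121 | N.C. | 45 | VCC | 29 | VSS | 103 |
| LAD2 | 102 | N.C. | 46 | VCC | 30 | VSS | 114 |
| LAD20 | 122 | N.C. | 47 | VCC | 35 | VSS | 125 |
| LAD21 | 123 | N.C. | 48 | VCC | 36 | W/R | 11 |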
Table 10. 80960KA PGA Package Thermal Characteristics
Thermal Resistance-°C/Watt, by Airflow-ft./min (m/sec)

| Parameter | 0 (0) | 50 (0.25) | 100 (0.50) | 200 (1.01) | 400 (2.03) | 600 (3.04) | 800 (4.06) |
|---|---|---|---|---|---|---|---|
| θ Junction-to-Case (Case Measured as shown in Figure 26) | 2 | 2 | 2 | 2 | 2 | 2 | 2 |
| θ Case-to-Ambient (No Heatsink) | 19 | 18 | 17 | 15 | 12 | 10 | 9 |
| θ Case-to-Ambient (with Omnidirectional Heatsink) | 16 | 15 | 14 | 12 | 9 | 7 | 6 |
| θ Case-to-Ambient (with Unidirectional Heatsink) | 15 | 14 | 13 | 11 | 8 | 6 | 5 |

NOTES:
1. This table applies to 80960KA PGA plugged into socket or soldered directly into board.
2. θJA = θJC + θCA.
3. θJ-CAP = 4°C/W (approx.)
θJ-PIN = 4°C/W (inner pins) (approx.)
θJ-PIN = 8°C/W (outer pins) (approx.)
[Package cross-section drawing (270775-38) showing θJ-CAP and θJ-PIN.]
Table 11. 80960KA PQFP Package Thermal Characteristics
PQFP Thermal Resistance-°C/Watt, by Airflow-ft./min (m/sec)

| Parameter | 0 (0) | 50 (0.25) | 100 (0.50) | 200 (1.01) | 400 (2.03) | 600 (3.04) | 800 (4.06) |
|---|---|---|---|---|---|---|---|
| θ Junction-to-Case (Case Measured as shown in Figure 26) | 9 | 9 | 9 | 9 | 9 | 9 | 9 |
| θ Case-to-Ambient (No Heatsink) | 22 | 19 | 18 | 16 | 11 | 9 | 8 |

NOTES:
1. This table applies to 80960KA PQFP soldered directly into board.
2. θJA = θJC + θCA.
3. θJL = 18°C/Watt
θJB = 18°C/Watt
[Package drawing (270775-39).]
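The thermal resistances in Tables 10 and 11 relate case, junction and ambient temperature through θJA = θJC + θCA. The sketch below shows one way to apply them. It assumes, for illustration only, the 2.5W power dissipation listed for 25 MHz and the 85°C PGA case limit from the DC operating conditions; real dissipation depends on ICC and VCC in the target system, and the function names are hypothetical.

```python
def max_ambient_c(t_case_max_c, power_w, theta_ca):
    """Highest ambient temperature that keeps the case at or below its limit."""
    return t_case_max_c - power_w * theta_ca


def junction_c(t_ambient_c, power_w, theta_jc, theta_ca):
    """Junction temperature from ambient using theta_JA = theta_JC + theta_CA."""
    return t_ambient_c + power_w * (theta_jc + theta_ca)


# PGA, no heatsink, theta_CA (deg C/W) by airflow in ft/min, from Table 10.
THETA_CA_PGA_NO_HEATSINK = {0: 19, 50: 18, 100: 17, 200: 15, 400: 12, 600: 10, 800: 9}

POWER_W = 2.5        # assumed worst case: power dissipation listed for 25 MHz
T_CASE_MAX_C = 85.0  # PGA case temperature limit

for airflow, theta_ca in THETA_CA_PGA_NO_HEATSINK.items():
    ta = max_ambient_c(T_CASE_MAX_C, POWER_W, theta_ca)
    print(f"{airflow:4d} ft/min: ambient <= {ta:.1f} C")
```

With these assumptions the allowable ambient rises from 37.5°C in still air to 62.5°C at 800 ft/min, which is the trend the airflow figures plot.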
[Figure 33 timing diagram (270775-37): CLK2, CLK, LAD31-LAD0, ALE, ADS, BE3-BE0, W/R, DT/R, DEN and READY across the Ta, Td and Tr states.]
Figure 33. Read Transaction
[Figure 34 timing diagram (270775-40): CLK2, CLK, LAD31-LAD0, W/R and DT/R.]
Figure 34. Write Transaction with One Wait State
[Figure 35 timing diagram (270775-41): CLK2, CLK, LAD31-LAD0, ALE, ADS, BE3-BE0, W/R, DT/R, DEN and READY across an address state, repeated Td states and Tr.]
Figure 35. Burst Read Transaction
[Figure 36 timing diagram (270775-42): CLK2, CLK, W/R and DT/R.]
Figure 36. Burst Write Transaction with One Wait State
[Figure 37 timing diagram (270775-43): CLK, INTR, address/data and DT/R across the previous cycle, interrupt acknowledgement cycle 1, idle (5 bus states) and interrupt acknowledgement cycle 2.]
NOTE:
INTR can go low no sooner than 5 ns (input hold time) following the beginning of interrupt acknowledgement cycle 1.
For a second interrupt to be acknowledged. INTR must be low for at least three cycles before it can be reasserted.
Figure 37. Interrupt Acknowledge Transaction
[Figure 38 timing diagram (270775-44): CLK, LAD31-LAD0, W/R, ALE, HOLD, HLDA, HOLDR and HLDAR for the PBM and SBM, with the bus state of each master shown across the exchange.]
Figure 38. Bus Exchange Transaction (PBM
= Primary Bus Master, SBM = Secondary Bus Master)
80960KB
EMBEDDED 32-BIT PROCESSOR
WITH INTEGRATED FLOATING-POINT UNIT
• High-Performance Embedded Architecture
- 25 MIPS Burst Execution at 25 MHz
- 9.4 MIPS* Sustained Execution at 25 MHz
• 512-Byte On-Chip Instruction Cache
- Direct Mapped
- Parallel Load/Decode for Uncached Instructions
• Multiple Register Sets
- Sixteen Global 32-Bit Registers
- Sixteen Local 32-Bit Registers
- Four Local Register Sets Stored On-Chip
- Register Scoreboarding
• 4 Gigabyte, Linear Address Space
• On-Chip Floating-Point Unit
- Supports IEEE 754 Standard
- Four 80-Bit Registers
- 5.2 Million Whetstones/s at 25 MHz
• Built-In Interrupt Controller
- 32 Priority Levels 256 Vectors
- 3.4 µs Latency
• Easy to Use, High Bandwidth 32-Bit Bus
- 66.7 Mbytes/s Burst
- Up to 16 Bytes Transferred per Burst
• Uses 85C960 Bus Controller
• Supported by 27960KX Burst EPROMs
• 132-Lead PGA and PQFP Packages
The 80960KB is the first member of Intel's new 32-bit processor family, the i960 series, which is designed
especially for embedded applications. It is based on the family's high performance, common core architecture,
and includes a 512-byte instruction cache, a built-in interrupt controller, and an integrated floating-point unit.
The 80960KB has a large register set, multiple parallel execution units and a high-bandwidth, burst bus. Using
advanced RISC technology, this high performance processor is capable of execution rates in excess of 9.4
million instructions per second.* The 80960KB is well-suited for a wide range of embedded applications,
including laser printers, image processing, industrial control, robotics and telecommunications.
*Relative to Digital Equipment Corporation's VAX-11/780** at 1 MIPS
[Figure 1 block diagram (270565-1): includes the bus control logic and interrupt controller connected to the 32-bit burst bus.]
Figure 1. The 80960KB's Highly Parallel Microarchitecture
**VAX-11™ is a trademark of Digital Equipment Corporation.
September 1991
Order Number: 270565-006
THE 960 SERIES
The 80960KB is the first member of a new family of
32-bit microprocessors from Intel known as the 960
Series. This series was especially designed to serve
the needs of embedded applications. The embedded market includes applications as diverse as industrial automation, avionics, image processing,
graphics, robotics, telecommunications and automobiles. These types of applications require high
integration, low power consumption, quick interrupt
response times and high performance. Since time to
market is critical, embedded microprocessors need
to be easy to use in both hardware and software
designs.
All members of the 80960 series share a common
core architecture which utilizes RISC technology so
that, except for special functions, the family members are object code compatible. Each new processor in the series will add its own special set of functions to the core to satisfy the needs of a specific
application or range of applications in the embedded
market. For example, future processors may include
a DMA controller, a timer or an A/D converter.
The 80960KB includes an integrated floating-point
unit. Intel also offers a pin-compatible version, called
the 80960KA, without an FPU, and a military-grade
version, the 80960MC, with support for memory
management, multitasking, multiprocessing and fault
tolerance.
[Figure 2 diagram: the address space runs from 0 to 2^32 - 1. The register set consists of sixteen 32-bit global registers g0 through g15 (1), four 80-bit floating-point registers fp0 through fp3, sixteen 32-bit local registers r0 through r15 (2), and the 32-bit arithmetic controls, instruction pointer, process controls and trace controls.]
NOTES:
1. Register g15 is reserved for stack management functions.
2. Registers r0, r1, and r2 are reserved for stack management functions.
Figure 2. Register Set
KEY PERFORMANCE FEATURES
The 80960KB's architecture is based on the most
recent advances in RISC technology and is grounded in Intel's long experience in designing embedded
controllers. Many features contribute to the
80960KB's exceptional performance:
1. Large Register Set. Having a large number of
registers reduces the number of times that a processor needs to access memory. Modern compilers can
take advantage of this feature to optimize execution
speed. For maximum flexibility, the 80960KB provides 32 32-bit registers and four 80-bit floating-point registers. (See Figure 2.)
2. Fast Instruction Execution. Simple functions
make up the bulk of instructions in most programs,
so that execution speed can be greatly improved by
ensuring that these core instructions execute in as
short a time as possible. The most-frequently executed instructions such as register-register moves,
add/subtract, logical operations, and shifts execute
in one to two cycles. (Table 1 contains a list of instructions.)
3. Load/Store Architecture. Like other processors
based on RISC technology, the 80960KB has a
Load/Store architecture: only the LOAD and STORE
instructions reference memory; all other instructions
operate on registers. This type of architecture simplifies instruction decoding and is used in combination
with other techniques to increase parallelism.
[Figure 3 diagram: the five 32-bit instruction formats and their fields.
Control: Opcode, Displacement.
Compare and Branch: Opcode, Reg/Lit, Reg, M, Displacement.
Register to Register: Opcode, Reg, Reg/Lit, Modes, Ext'd Op, Reg/Lit.
Memory Access-Short: Opcode, Reg, Base, M, X, Offset.
Memory Access-Long: Opcode, Reg, Base, Mode, Scale, xx, Index, followed by a Displacement word.]
Figure 3. Instruction Formats
Table 1. 80960KB Instruction Set

| Category | Instructions |
|---|---|
| Data Movement | Load, Store, Move, Load Address |
| Arithmetic | Add, Subtract, Multiply, Divide, Remainder, Modulo, Shift, Extended Multiply, Extended Divide |
| Logical | And, Not And, And Not, Or, Exclusive Or, Not Or, Or Not, Nor, Exclusive Nor, Not, Nand, Rotate |
| Bit and Bit Field | Set Bit, Clear Bit, Not Bit, Check Bit, Alter Bit, Scan for Bit, Scan over Bit, Extract, Modify |
| Comparison | Compare, Conditional Compare, Compare and Increment, Compare and Decrement |
| Branch | Unconditional Branch, Conditional Branch, Compare and Branch |
| Call/Return | Call, Call Extended, Call System, Return, Branch and Link |
| Fault | Conditional Fault, Synchronize Faults |
| Debug | Modify Trace Controls, Mark, Force Mark |
| Miscellaneous | Atomic Add, Atomic Modify, Flush Local Registers, Modify Arithmetic Controls, Modify Process Controls, Scan Byte for Equal, Test Condition Code |
| Decimal | Move, Add with Carry, Subtract with Carry |
| Conversion | Convert Real to Integer, Convert Integer to Real |
| Floating-Point | Move Real, Add, Subtract, Multiply, Divide, Remainder, Scale, Round, Square Root, Sine, Cosine, Tangent, Arctangent, Log, Log Binary, Log Natural, Exponent, Classify, Copy Real Extended, Compare |
| Synchronous | Synchronous Load, Synchronous Move |
4. Simple Instruction Formats. All instructions in
the 80960KB are 32 bits long and must be aligned
on word boundaries. This alignment makes it possible to eliminate the instruction-alignment stage in
the pipeline. To simplify the instruction decoder further, there are only five instruction formats and each
instruction uses only one format. (See Figure 3.)
5. Overlapped Instruction Execution. A load operation allows execution of subsequent instructions to
continue before the data has been returned from
memory, so that these instructions can overlap the
load. The 80960KB manages this process transparently to software through the use of a register scoreboard. Conditional instructions also make use of a
scoreboard so that subsequent unrelated instructions can be executed while the conditional instruction is pending.
6. Integer Execution Optimization. When the result of an operation is used as an operand in a subsequent calculation, the value is sent immediately to
its destination register. Yet at the same time, the
value is put back on a bypass path to the ALU,
thereby saving the time that otherwise would be required to retrieve the value for the next operation.
7. Bandwidth Optimizations. The 80960KB gets
optimal use of its memory bus bandwidth because
the bus is tuned for use with the cache: the line size
of the instruction cache matches the maximum burst
size for instruction fetches. The 80960KB automatically fetches four words in a burst and stores them
directly in the cache. Due to the size of the cache
and the fact that it is continually filled in anticipation
of needed instructions in the program flow, the
80960KB is exceptionally insensitive to memory wait
states. In fact, each wait state causes only a 7%
degradation in system performance. The benefit is
that the 80960KB will deliver outstanding performance even with a low cost memory system.
8. Cache Bypass. If there is a cache miss, the processor fetches the needed instruction, then sends it
on to the instruction decoder at the same time it
updates the cache. Thus, no extra time is taken to
load and read the cache.
Memory Space and Addressing Modes
The 80960KB offers a linear programming environment so that all programs running on the processor
are contained in a single address space. The maximum size of the address space is 4 Gigabytes (2^32
bytes).
For ease of use, the 80960KB has a small number of
addressing modes, but includes all those necessary
to ensure efficient compiler implementations of high-level languages such as C, Fortran and Ada. Table 2
lists the memory addressing modes.
Data Types
The 80960KB recognizes the following data types:
Numeric:
• 8-, 16-, 32- and 64-bit ordinals
• 8-, 16-, 32- and 64-bit integers
• 32-, 64- and 80-bit real numbers
Non-Numeric:
• Bit
• Bit Field
• Triple-Word (96 bits)
• Quad-Word (128 bits)
Large Register Set
The programming environment of the 80960KB includes a large number of registers. In fact, 36 registers are available at any time. The availability of this
many registers greatly reduces the number of memory accesses required to execute most programs,
which leads to greater instruction processing speed.
There are two types of general-purpose registers:
local and global. The 20 global registers consist of
sixteen 32-bit registers (G0 through G15) and four
80-bit registers (FP0 through FP3). These registers
perform the same function as the general-purpose
registers provided in other popular microprocessors.
The term global refers to the fact that these registers retain their contents across procedure calls.
The local registers, on the other hand, are procedure specific. For each procedure call, the 80960KB
allocates 16 local registers (R0 through R15). Each
local register is 32 bits wide. Any register can also
be used for single or double-precision floating-point
operations; the 80-bit floating-point registers are provided for extended precision.
Multiple Register Sets
To further increase the efficiency of the register set,
multiple sets of local registers are stored on-chip.
This cache holds up to four local register frames,
which means that up to three procedure calls can be
made without having to access the procedure stack
resident in memory.
Although programs may have procedure calls nested many calls deep, a program typically oscillates
back and forth between only two or three levels. As
InteL
80960KB
Table 2. Memory Addressing Modes
• 12-Bit Offset
• 32-Bit Offset
• Register-Indirect
•
•
•
•
Register +
Register +
Register +
Register x
• Register
+
12-Bit Offset
32-Bit Offset
(Index-Register x Scale-Factor)
Scale Factor + 32-Bit Displacement
(Index-Register x Scale-Factor)
+
32-Bit Displacement
Scale-Factor is 1, 2, 4, 8 or 16
a result, with four stack frames in the cache, the
probability of there being a free frame on the cache
when a call is made is very high. In fact, runs of
representative C-Ianguage programs show that 80%
of the calls are handled without needing to access
memory.
procedure stack in memory to make room for a new
set of registers. Global register G15 is used by the
processor as the frame pointer (FP) for the procedure stack.
Note that the global and floating-point registers are
not exchanged on a procedure call, but retain their
contents, making them available to all procedures
for fast parameter passing. An illustration of the register cache is shown in Figure 4.
If there are four or more active procedures and a
new procedure is called, the processor moves the
oldest set of local registers in the register cache to a
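As a rough illustration of how the modes in Table 2 combine a base register, a scaled index and a displacement, the following Python sketch computes an effective address. The function name, its parameters, and the wrap-around at 2^32 (chosen to match the 4-Gigabyte address space) are assumptions made for illustration; they are not taken from the 80960KB specification.

```python
MASK32 = 0xFFFFFFFF                 # 4-Gigabyte (2**32 byte) linear address space
VALID_SCALES = (1, 2, 4, 8, 16)     # scale factors listed under Table 2


def effective_address(base=0, index=0, scale=1, displacement=0):
    """Hypothetical helper: base + index * scale + displacement, modulo 2**32."""
    if scale not in VALID_SCALES:
        raise ValueError("scale factor must be 1, 2, 4, 8 or 16")
    return (base + index * scale + displacement) & MASK32


# Register + (Index-Register x Scale-Factor) + 32-Bit Displacement
addr = effective_address(base=0x1000, index=3, scale=4, displacement=0x20)
print(hex(addr))
```

The simpler modes fall out of the same expression by leaving the unused terms at their defaults, for example `effective_address(base=r)` for Register-Indirect.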
Figure 4. Multiple Register Sets Are Stored On-Chip
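The register cache behavior described under Multiple Register Sets can be approximated with a small model. The Python sketch below is a simplification under stated assumptions: it tracks only how many of the most recent local register frames are on-chip, counts a spill when a call finds all four frames in use, and counts a fill when a return needs a frame that was previously moved to memory. The function name and the event encoding are hypothetical.

```python
def simulate_register_cache(trace, frames=4):
    """Count calls, spills to memory and fills from memory for a call/return trace.

    trace is an iterable of "call" and "ret" events. Simplified model only.
    """
    depth = 1     # active procedures (the initial procedure owns one frame)
    cached = 1    # how many of the most recent frames are held on-chip
    calls = spills = fills = 0
    for event in trace:
        if event == "call":
            calls += 1
            if cached == frames:
                spills += 1          # oldest frame moves to the procedure stack
            else:
                cached += 1
            depth += 1
        elif event == "ret":
            if depth == 1:
                raise ValueError("return without a matching call")
            depth -= 1
            cached -= 1
            if cached == 0:
                fills += 1           # caller's frame must be read back from memory
                cached = 1
        else:
            raise ValueError("unknown event: %r" % (event,))
    return calls, spills, fills


# Three nested calls fit in the cache; a program oscillating between
# two levels afterwards never touches the procedure stack in this model.
print(simulate_register_cache(["call"] * 3 + ["ret", "call"] * 10))
```

In this model the first three nested calls cause no memory traffic, which matches the statement that up to three procedure calls can be made without accessing the procedure stack.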
Instruction Cache
To further reduce memory accesses, the 80960KB includes a 512-byte on-chip instruction cache. The instruction cache is based on the concept of locality of reference; that is, most programs are not usually executed in a steady stream but consist of many branches and loops that lead to jumping back and forth within the same small section of code. Thus, by maintaining a block of instructions in a cache, the number of memory references required to read instructions into the processor can be greatly reduced.
To load the instruction cache, instructions are fetched in 16-byte blocks, so that up to four instructions can be fetched at one time. An efficient prefetch algorithm increases the probability that an instruction will already be in the cache when it is needed.
Code for small loops will often fit entirely within the cache, leading to a great increase in processing speed since further memory references might not be necessary until the program exits the loop. Similarly, when calling short procedures, the code for the calling procedure is likely to remain in the cache, so it will be there on the procedure's return.
Floating-Point Arithmetic
In the 80960KB, floating-point arithmetic has been made an integral part of the architecture. Having the floating-point unit integrated on-chip provides two advantages. First, it improves the performance of the chip for floating-point applications, since no additional bus overhead is associated with floating-point calculations, thereby leaving more time for other bus operations such as I/O. Second, the cost of using floating-point operations is reduced because a separate coprocessor chip is not required.
The 80960KB floating-point (real number) data types include single-precision (32-bit), double-precision (64-bit), and extended precision (80-bit) floating-point numbers. Any register may be used to execute floating-point operations.
The processor provides hardware support for both mandatory and recommended portions of IEEE Standard 754 for floating-point arithmetic, including all arithmetic, exponential, logarithmic, and other transcendental functions. Table 3 shows execution times for some representative instructions.
Table 3. Sample Floating-Point Execution Times (µs) at 25 MHz
Function | 32-Bit | 64-Bit
Add | 0.4 | 0.5
Subtract | 0.4 | 0.5
Multiply | 0.7 | 1.3
Divide | 1.3 | 2.9
Square Root | 3.7 | 3.9
Arctangent | 10.1 | 13.1
Exponent | 11.3 | 12.5
Sine | 15.2 | 16.6
Cosine | 15.2 | 16.6
Register Scoreboarding
The instruction decoder has been optimized in several ways. One of these optimizations is the ability to do instruction overlapping by means of register scoreboarding.
Register scoreboarding occurs when a LOAD instruction is executed to move a variable from memory into a register. When the instruction is initiated, a scoreboard bit on the target register is set. When the register is actually loaded, the bit is reset. In between, any reference to the register contents is accompanied by a test of the scoreboard bit to ensure that the load has completed before processing continues. Since the processor does not have to wait for the LOAD to be completed, it can go on to execute additional instructions placed in between the LOAD instruction and the instruction that uses the register contents, as shown in the following example:
LOAD R4, address 1
LOAD R5, address 2
Unrelated instruction
Unrelated instruction
ADD R4, R5, R6
In essence, the two unrelated instructions between the LOAD and ADD instructions are executed for free (i.e., take no apparent time to execute) because they are executed while the register is being loaded. Up to three LOAD instructions can be pending at one time with three corresponding scoreboard bits set. By exploiting this feature, system programmers and compilers have a useful tool for optimizing execution speed.
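The effect of register scoreboarding on the LOAD/ADD example can be sketched with a toy timing model. The Python code below is only an illustration: the four-cycle load latency, the one-instruction-per-cycle issue rate, and the tuple encoding of instructions are assumptions, not figures from this data sheet. Only the limit of three pending loads comes from the text.

```python
def run(program, load_latency=4, max_pending=3):
    """Return the number of cycles needed to issue every instruction.

    program is a list of (opcode, destination, sources) tuples.
    load_latency and the single-issue rate are assumed values.
    """
    ready_at = {}   # register -> cycle at which its scoreboard bit is cleared
    cycle = 0
    for opcode, dest, sources in program:
        # A set scoreboard bit on a source register stalls the instruction.
        for reg in sources:
            cycle = max(cycle, ready_at.get(reg, 0))
        if opcode == "LOAD":
            pending = sorted(t for t in ready_at.values() if t > cycle)
            if len(pending) >= max_pending:
                # Wait until enough outstanding loads have completed.
                cycle = pending[len(pending) - max_pending]
            ready_at[dest] = cycle + load_latency
        cycle += 1
    return cycle


with_fillers = [
    ("LOAD", "R4", []),
    ("LOAD", "R5", []),
    ("OTHER", None, []),            # unrelated instruction
    ("OTHER", None, []),            # unrelated instruction
    ("ADD", "R6", ["R4", "R5"]),
]
without_fillers = [with_fillers[0], with_fillers[1], with_fillers[4]]

print(run(with_fillers), run(without_fillers))
```

Under these assumptions both sequences finish in the same number of cycles, which is the sense in which the two unrelated instructions take no apparent time.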
Debug Features
The 80960KB has built-in debug capabilities. There are two types of breakpoints and six different trace modes. The debug features are controlled by two internal 32-bit registers, the Process-Controls Word and the Trace-Controls Word. By setting bits in these control words, a software debug monitor can closely control how the processor responds during program execution.
The 80960KB has both hardware and software breakpoints. It provides two hardware breakpoint registers on-chip which can be set by a special command to any value. When the instruction pointer matches the value in one of the breakpoint registers, the breakpoint will fire, and a breakpoint handling routine is called automatically.
The 80960KB also provides software breakpoints through the use of two instructions, MARK and FMARK. These instructions can be placed at any point in a program and will cause the processor to halt execution at that point and call the breakpoint handling routine. The breakpoint mechanism is easy to use and provides a powerful debugging tool.
Tracing is available for instructions (single-step execution), calls and returns, and branching. Each different type of trace may be enabled separately by a special debug instruction. In each case, the 80960KB executes the instruction first and then calls a trace handling routine (usually part of a software debug monitor). Further program execution is halted until the trace routine is completed. When the trace event handling routine is completed, instruction execution resumes at the next instruction. The 80960KB's tracing mechanisms, which are implemented completely in hardware, greatly simplify the task of testing and debugging software.
Interrupt Handling
The 80960KB can be interrupted in one of two ways: by the activation of one of four interrupt pins or by sending a message on the processor's data bus.
The 80960KB is unusual in that it automatically handles interrupts on a priority basis and tracks pending interrupts through its on-chip interrupt controller. Two of the interrupt pins can be configured to provide 8259A handshaking for expansion beyond four interrupt lines.
High Bandwidth Local Bus
An 80960KB CPU resides on a high-bandwidth address/data bus known as the local bus (L-Bus). The L-Bus provides a direct communication path between the processor and the memory and I/O subsystem interfaces. The processor uses the local bus to fetch instructions, manipulate memory, and respond to interrupts. Its features include:
• 32-bit multiplexed address/data path
• Four-word burst capability, which allows transfers from 1 to 16 bytes at a time
• High bandwidth reads and writes at 66.7 Mbytes per second
• Special signal to indicate whether a memory transaction can be cached
Figure 5 identifies the groups of signals which constitute the L-Bus. Table 4 lists the function of the L-Bus and other processor-support signals, such as the interrupt lines.
Figure 5. Local Bus Signal Groups (address/data: 32 lines; control (address, data, and operation signals): 15 lines; arbitration: 2 lines)
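The 66.7 Mbytes per second figure quoted for the L-Bus is consistent with a four-word burst at 25 MHz if one assumes a single address cycle and a single recovery cycle around the four data cycles. That cycle breakdown is an assumption made here for illustration; the Python sketch below simply works through the arithmetic, and the function name and parameters are hypothetical.

```python
def burst_bandwidth_mbytes_per_s(clock_mhz, words=4, bytes_per_word=4,
                                 overhead_cycles=2, wait_states=0):
    """Peak burst bandwidth in Mbytes/s for one burst transaction.

    overhead_cycles=2 assumes one address cycle plus one recovery cycle.
    """
    cycles = overhead_cycles + words * (1 + wait_states)
    return words * bytes_per_word * clock_mhz / cycles


print(round(burst_bandwidth_mbytes_per_s(25), 1))                 # zero wait states
print(round(burst_bandwidth_mbytes_per_s(25, wait_states=1), 1))  # one wait state per word
```

The second call shows how quickly wait states erode the peak figure, which is why memory subsystems for burst access are usually designed for zero wait states on the later words of a burst.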
FAULT DETECTION
The 80960KB has an automatic mechanism to handle faults. There are ten fault types including trace, arithmetic, and floating-point faults. When the processor detects a fault, it automatically calls the appropriate fault handling routine and saves the current instruction pointer and necessary state information to make efficient recovery possible. The processor posts diagnostic information on the type of fault to a Fault Record. Like interrupt handling routines, fault handling routines are usually written to meet the needs of a specific application and are often included as part of the operating system or kernel.
For each of the ten fault types, there are numerous subtypes that provide specific information about a fault. For example, a floating-point fault may have its subtype set to an Overflow or Zero-Divide fault. The fault handler can use this specific information to respond correctly to the fault.
CHMOS
The 80960KB is fabricated using Intel's CHMOS IV (Complementary High Speed Metal Oxide Semiconductor) process. This advanced technology eliminates the frequency and reliability limitations of older CMOS processes and opens a new era in microprocessor performance. It combines the high performance capabilities of Intel's industry-leading HMOS technology with the high density and low power characteristics of CMOS. The 80960KB is available at 10, 16, 20 and 25 MHz.
BUILT-IN TESTABILITY
Upon reset, the 80960KB automatically conducts an extensive internal test (self-test) of its major blocks of logic. Then, before executing its first instruction, it does a zero check sum on the first eight words in memory to ensure that the system has been loaded correctly. If a problem is discovered at any point during the self-test, the 80960KB will assert its FAILURE pin and will not begin program execution. The self-test takes approximately 47,000 cycles to complete.
System manufacturers can use the 80960KB's self-test feature during incoming parts inspection. No special diagnostic programs need to be written, and the test is both thorough and fast. The self-test capability helps ensure that defective parts will be discovered before systems are shipped, and once in the field, the self-test makes it easier to distinguish between problems caused by processor failure and problems resulting from other causes.
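To put the approximate 47,000-cycle self-test in perspective, the cycle count can be converted to elapsed time at each available speed grade. The short Python sketch below assumes the cycles are counted at the processor clock frequency (10, 16, 20 or 25 MHz); the function name is hypothetical.

```python
SELF_TEST_CYCLES = 47_000   # approximate figure quoted for the self-test


def self_test_ms(clock_mhz):
    """Approximate self-test duration in milliseconds (assumes processor-clock cycles)."""
    return SELF_TEST_CYCLES / (clock_mhz * 1000.0)


for mhz in (10, 16, 20, 25):
    print(mhz, "MHz:", round(self_test_ms(mhz), 2), "ms")
```

Under that assumption the self-test completes in a few milliseconds at every speed grade, which supports its use during incoming parts inspection.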
Table 4a. 80960l of 5V. Figure 8
shows the typical power supply current (ICC) required by the 80960KB at various operating frequencies when measured at three input voltage (VCC) levels.
When the 80960KB open-drain driver under test is on, diode D1 is also on, and the voltage on the pin being tested drops to VOL. Diode D2 turns off and IOL flows through diode D1.
For a given output current (IOL), the curve in Figure 9 shows the worst case output low voltage (VOL).
Low Drive Network: VOH = 3.42V, IOL = 25.3 mA
High Drive Network: VOH = 3.41V, IOL = 33.8 mA
(Network resistor values shown: 130 Ω, 180 Ω, 280 Ω and 390 Ω.)
Figure 6. Connection Recommendations for Low and High Current Drive Networks
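The VOH values quoted for the two drive networks can be reproduced with a simple resistor-divider calculation. The pairing of resistors used below (180 Ω pull-up with 390 Ω pull-down for the low drive network, 130 Ω pull-up with 280 Ω pull-down for the high drive network) and the 5V supply are inferred from the numbers rather than stated in the text, so treat this Python sketch as an assumption-laden check, not as the documented circuit.

```python
def divider_voh(vcc, r_pullup_ohms, r_pulldown_ohms):
    """Output-high voltage of an open-drain pin terminated by a resistor divider."""
    return vcc * r_pulldown_ohms / (r_pullup_ohms + r_pulldown_ohms)


# Assumed pairings (inferred, see the lead-in above).
low_drive_voh = divider_voh(5.0, 180, 390)
high_drive_voh = divider_voh(5.0, 130, 280)
print(round(low_drive_voh, 2), round(high_drive_voh, 2))
```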
Figure 7. Typical Supply Current (ICC) (supply current vs case temperature in °C, VCC = 5.0V; curves for 25 MHz, 20 MHz, 16 MHz and 10 MHz)
Figure 8. Typical Current vs Frequency (Temp = +22°C; supply current vs operating frequency in MHz; curves at 4.5V, 5.0V and 5.5V)
Figure 9. Worst Case Voltage vs Output Current on Open-Drain Pins (Temp = +85°C, VCC = 4.5V; output low voltage vs output low current in mA)
Figure 10. Capacitive Derating Curve (Temp = +85°C, VCC = 4.5V; rising and falling curves vs capacitive load in pF)
ABSOLUTE MAXIMUM RATINGS*
Operating Temperature ........ 0°C to +85°C Case
Storage Temperature .......... -65°C to +150°C
Voltage on Any Pin .......... -0.5V to VCC + 0.5V
Power Dissipation ................. 2.5W (25 MHz)
NOTICE: This is a production data sheet. The specifications are subject to change without notice.
* WARNING: Stressing the device beyond the "Absolute Maximum Ratings" may cause permanent damage. These are stress ratings only. Operation beyond the "Operating Conditions" is not recommended and extended exposure beyond the "Operating Conditions" may affect device reliability.
DC CHARACTERISTICS
PGA:
80960KB (16 MHz): TCASE = 0°C to +85°C, VCC = 5V ±10%
80960KB (20 and 25 MHz): TCASE = 0°C to +85°C, VCC = 5V ±5%
PQFP:
80960KA (10 and 16 MHz): TCASE = 0°C to +100°C, VCC = 5V ±10%
80960KA (20 MHz): TCASE = 0°C to +100°C, VCC = 5V ±5%
Symbol | Parameter | Min | Max | Units | Test Conditions
VIL | Input Low Voltage | -0.3 | +0.8 | V |
VIH | Input High Voltage | 2.0 | VCC + 0.3 | V |
VCL | CLK2 Input Low Voltage | -0.3 | +0.8 | V |
VCH | CLK2 Input High Voltage | 0.55 VCC | VCC + 0.3 | V |
VOL | Output Low Voltage | | 0.45 | V | (1,5)
VOH | Output High Voltage | 2.4 | | V | (2,4)
ICC | Power Supply Current: 10 MHz | | 300 | mA |
ICC | Power Supply Current: 16 MHz | | 375 | mA |
ICC | Power Supply Current: 20 MHz | | 420 | mA |
ICC | Power Supply Current: 25 MHz | | 480 | mA |
ILI | Input Leakage Current | | ±15 | µA | 0 ≤ VIN ≤ VCC
ILO | Output Leakage Current | | ±15 | µA | 0.45 ≤ VO ≤ VCC
CIN | Input Capacitance | | 10 | pF | fC = 1 MHz(3)
CO | I/O or Output Capacitance | | 12 | pF | fC = 1 MHz(3)
CCLK | Clock Capacitance | | 10 | pF | fC = 1 MHz(3)
NOTES:
1. For tri-state outputs, this parameter is measured at:
Address/Data: 4.0 mA
Controls: 5.0 mA
2. This parameter is measured at:
Address/Data: -1.0 mA
Controls: -0.9 mA
ALE: -5.0 mA
3. Input, output, and clock capacitance are not tested.
4. Not measured on open-drain outputs.
5. For open-drain outputs: 25 mA
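As a sanity check on the ratings above, the worst-case supply power can be estimated from the maximum ICC and the upper VCC limit for each speed grade. The Python sketch below is a back-of-the-envelope calculation; the function name is hypothetical, and multiplying maximum current by maximum voltage is an assumed (pessimistic) way of estimating dissipation.

```python
def worst_case_power_w(icc_max_ma, vcc_nominal_v, tolerance):
    """Maximum supply current times the highest allowed supply voltage, in watts."""
    return icc_max_ma / 1000.0 * vcc_nominal_v * (1.0 + tolerance)


# (clock MHz, ICC max in mA, VCC tolerance) taken from the DC characteristics
grades = [(10, 300, 0.10), (16, 375, 0.10), (20, 420, 0.05), (25, 480, 0.05)]
for mhz, icc_ma, tol in grades:
    print(mhz, "MHz:", round(worst_case_power_w(icc_ma, 5.0, tol), 2), "W")
```

For the 25 MHz part this comes to about 2.52W, roughly in line with the 2.5W power dissipation listed under the absolute maximum ratings.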
AC SPECIFICATIONS
This section describes the AC specifications for the 80960KB pins. All input and output timings are specified relative to the 1.5V level of the rising edge. For output timings, the specifications refer to the time it takes the signal to reach 1.5V.
For input timings, the specifications refer to the time at which the signal reaches (for input setup) or leaves (for hold time) the TTL levels of LOW (0.8V) or HIGH (2.0V). All AC testing should be done with input clock voltages of 0.4V and 2.4V, except for the clock (CLK2), which should be tested with input voltages of 0.45 VCC and 0.55 VCC.
Figure 11. Drive Levels and Timing Relationships for 80960KB Signals (CLK2 edges A, B, C, D. Outputs: LAD31-LAD0, ADS, W/R, DEN, BE3-BE0, HLDA/HOLDR, CACHE, LOCK, INTA, DT/R. Inputs: LAD31-LAD0, BADAC, IAC/INT0, INT1, INT2/INTR, INT3, HOLD, HLDAR, LOCK, READY)
Figure 12. Timing Relationship of L-Bus Signals (bus states Ta, Td, Tr; signals CLK2, CLK, A/D, ALE, ADS, BE(0:3), W/R, DT/R, DEN, READY)
AC Specification Tables
80960KB AC Characteristics (10 MHz, PQFP Only)
Symbol | Parameter | Min | Max | Units | Test Conditions
T1 | Processor Clock Period (CLK2) | 50 | 125 | ns | VIN = 1.5V
T2 | Processor Clock Low Time (CLK2) | 12 | | ns | VIL = 10% Point = 1.2V
T3 | Processor Clock High Time (CLK2) | 12 | | ns | VIH = 90% Point = 0.1V + 0.5 VCC
T4 | Processor Clock Fall Time (CLK2) | | 10 | ns | VIN = 90% Point to 10% Point
T5 | Processor Clock Rise Time (CLK2) | | 10 | ns | VIN = 10% Point to 90% Point
T6 | Output Valid Delay | 2 | 25 | ns | CL = 100 pF (LAD); CL = 75 pF (Controls)(2)
T6H | HOLDA Output Valid Delay | 4 | 31 | ns | CL = 75 pF
T7 | ALE Width | 25 | | ns | CL = 75 pF
T8 | ALE Output Valid Delay | 0 | 20 | ns | CL = 75 pF(2)
T9 | Output Float Delay | 2 | 20 | ns | CL = 100 pF (LAD); CL = 75 pF (Controls)
T9H | HOLDA Output Float Delay | 4 | 20 | ns | CL = 75 pF
T10 | Input Setup 1 | 3 | | ns |
T11 | Input Hold | 5 | | ns |
T11H | HOLD Input Hold | 4 | | ns |
T12 | Input Setup 2 | 8 | | ns |
T13 | Setup to ALE Inactive | 10 | | ns | CL = 100 pF (LAD); CL = 75 pF (Controls)
T14 | Hold after ALE Inactive | 8 | | ns | CL = 100 pF (LAD); CL = 75 pF (Controls)
T15 | Reset Hold | 3 | | ns |
T16 | Reset Setup | 5 | | ns |
T17 | Reset Width | 1640 | | ns | 41 CLK2 Periods Minimum
NOTES:
1. IAC/INT0, INT1, INT2/INTR, INT3 can be asynchronous.
2. A float condition occurs when the maximum output current becomes less than ILO. Float delay is not tested, but should be no longer than the valid delay.
3. Clock rise and fall times are not tested.
80960KB AC Characteristics (16 MHz)
Symbol | Parameter | Min | Max | Units | Test Conditions
T1 | Processor Clock Period (CLK2) | 31.25 | 125 | ns | VIN = 1.5V
T2 | Processor Clock Low Time (CLK2) | 8 | | ns | VIL = 10% Point = 1.2V
T3 | Processor Clock High Time (CLK2) | 8 | | ns | VIH = 90% Point = 0.1V + 0.5 VCC
T4 | Processor Clock Fall Time (CLK2) | | 10 | ns | VIN = 90% Point to 10% Point
T5 | Processor Clock Rise Time (CLK2) | | 10 | ns | VIN = 10% Point to 90% Point
T6 | Output Valid Delay | 2 | 25 | ns | CL = 100 pF (LAD); CL = 75 pF (Controls)
T6H | HOLDA Output Valid Delay | 4 | 31 | ns | CL = 75 pF
T7 | ALE Width | 15 | | ns | CL = 75 pF
T8 | ALE Output Valid Delay | 0 | 20 | ns | CL = 75 pF(2)
T9 | Output Float Delay | 2 | 20 | ns | CL = 100 pF (LAD); CL = 75 pF (Controls)(2)
T9H | HOLDA Output Float Delay | 4 | 20 | ns | CL = 75 pF
T10 | Input Setup 1 | 3 | | ns |
T11 | Input Hold | 5 | | ns |
T11H | HOLD Input Hold | 4 | | ns |
T12 | Input Setup 2 | 8 | | ns |
T13 | Setup to ALE Inactive | 10 | | ns | CL = 100 pF (LAD); CL = 75 pF (Controls)
T14 | Hold after ALE Inactive | 8 | | ns | CL = 100 pF (LAD); CL = 75 pF (Controls)
T15 | Reset Hold | 3 | | ns |
T16 | Reset Setup | 5 | | ns |
T17 | Reset Width | 1281 | | ns | 41 CLK2 Periods Minimum
NOTES:
1. IAC/INT0, INT1, INT2/INTR, INT3 can be asynchronous.
2. A float condition occurs when the maximum output current becomes less than ILO. Float delay is not tested, but should be no longer than the valid delay.
3. Clock rise and fall times are not tested.
80960KB AC Characteristics (20 MHz)
Symbol | Parameter | Min | Max | Units | Test Conditions
T1 | Processor Clock Period (CLK2) | 25 | 125 | ns | VIN = 1.5V
T2 | Processor Clock Low Time (CLK2) | 6 | | ns | VIL = 10% Point = 1.2V
T3 | Processor Clock High Time (CLK2) | 6 | | ns | VIH = 90% Point = 0.1V + 0.5 VCC
T4 | Processor Clock Fall Time (CLK2) | | 10 | ns | VIN = 90% Point to 10% Point
T5 | Processor Clock Rise Time (CLK2) | | 10 | ns | VIN = 10% Point to 90% Point
T6 | Output Valid Delay | 2 | 20 | ns | CL = 60 pF (LAD); CL = 50 pF (Controls)
T6H | HOLDA Output Valid Delay | 4 | 26 | ns | CL = 50 pF
T7 | ALE Width | 12 | | ns | CL = 50 pF
T8 | ALE Output Valid Delay | 0 | 20 | ns | CL = 50 pF(2)
T9 | Output Float Delay | 2 | 20 | ns | CL = 60 pF (LAD); CL = 50 pF (Controls)(2)
T9H | HOLDA Output Float Delay | 4 | 20 | ns | CL = 50 pF
T10 | Input Setup 1 | 3 | | ns |
T11 | Input Hold | 5 | | ns |
T11H | HOLD Input Hold | 4 | | ns |
T12 | Input Setup 2 | 7 | | ns |
T13 | Setup to ALE Inactive | 10 | | ns | CL = 60 pF (LAD); CL = 50 pF (Controls)
T14 | Hold after ALE Inactive | 8 | | ns | CL = 60 pF (LAD); CL = 50 pF (Controls)
T15 | Reset Hold | 3 | | ns |
T16 | Reset Setup | 5 | | ns |
T17 | Reset Width | 1025 | | ns | 41 CLK2 Periods Minimum
NOTES:
1. IAC/INT0, INT1, INT2/INTR, INT3 can be asynchronous.
2. A float condition occurs when the maximum output current becomes less than ILO. Float delay is not tested, but should be no longer than the valid delay.
3. Clock rise and fall times are not tested.
Figure 13. Test Load Circuit for Tri-State Output Pins
Figure 14. Test Load Circuit for Open-Drain Output Pins (IOL tested at 25 and 40 mA; VREF = VCC; D1 and D2 are matched)
80960KB AC Characteristics (25 MHz, PGA Only)
Symbol | Parameter | Min | Max | Units | Test Conditions
T1 | Processor Clock Period (CLK2) | 20 | 125 | ns | VIN = 1.5V
T2 | Processor Clock Low Time (CLK2) | 5 | | ns | VIL = 10% Point = 1.2V
T3 | Processor Clock High Time (CLK2) | 5 | | ns | VIH = 90% Point = 0.1V + 0.5 VCC
T4 | Processor Clock Fall Time (CLK2) | | 10 | ns | VIN = 90% Point to 10% Point
T5 | Processor Clock Rise Time (CLK2) | | 10 | ns | VIN = 10% Point to 90% Point
T6 | Output Valid Delay | 2 | 18 | ns | CL = 60 pF (LAD); CL = 50 pF (Controls)
T6H | HOLDA Output Valid Delay | 4 | 24 | ns | CL = 50 pF
T7 | ALE Width | 12 | | ns | CL = 50 pF
T8 | ALE Output Valid Delay | 0 | 20 | ns | CL = 50 pF(2)
T9 | Output Float Delay | 2 | 18 | ns | CL = 60 pF (LAD); CL = 50 pF (Controls)
T9H | HOLDA Output Float Delay | 4 | 20 | ns | CL = 50 pF
T10 | Input Setup 1 | 3 | | ns |
T11 | Input Hold | 5 | | ns |
T11H | HOLD Input Hold | 4 | | ns |
T12 | Input Setup 2 | 7 | | ns |
T13 | Setup to ALE Inactive | 8 | | ns | CL = 60 pF (LAD); CL = 50 pF (Controls)
T14 | Hold after ALE Inactive | 8 | | ns | CL = 60 pF (LAD); CL = 50 pF (Controls)
T15 | Reset Hold | 3 | | ns |
T16 | Reset Setup | 5 | | ns |
T17 | Reset Width | 820 | | ns | 41 CLK2 Periods Minimum
NOTES:
1. IAC/INT0, INT1, INT2/INTR, INT3 can be asynchronous.
2. A float condition occurs when the maximum output current becomes less than ILO. Float delay is not tested, but should be no longer than the valid delay.
3. Clock rise and fall times are not tested.
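The T17 (Reset Width) entries are annotated "41 CLK2 Periods Minimum", so they can be cross-checked against the minimum CLK2 period T1 of each table. The Python sketch below does that arithmetic; the dictionaries simply restate the T1 and T17 minimums from the four AC tables, and the function name is hypothetical.

```python
RESET_PERIODS = 41   # "41 CLK2 Periods Minimum"

# Minimum CLK2 period (T1) and listed reset width (T17), both in ns, per speed grade.
T1_MIN_NS = {10: 50.0, 16: 31.25, 20: 25.0, 25: 20.0}
LISTED_T17_NS = {10: 1640, 16: 1281, 20: 1025, 25: 820}


def reset_width_ns(clk2_period_ns, periods=RESET_PERIODS):
    """Reset width corresponding to a whole number of CLK2 periods."""
    return periods * clk2_period_ns


for mhz in sorted(T1_MIN_NS):
    print(mhz, "MHz: computed", reset_width_ns(T1_MIN_NS[mhz]),
          "ns, listed", LISTED_T17_NS[mhz], "ns")
```

The 16, 20 and 25 MHz entries agree with 41 times T1 (to rounding). The 10 MHz entry of 1640 ns corresponds to 41 periods of 40 ns rather than of the 50 ns minimum period listed in that table, so a designer may want to confirm that value against the latest specification.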
Figure 15. Processor Clock Pulse (CLK2) (high level (min) 0.55 VCC; low level (max) 0.8V; 90% and 10% points marked)
Figure 16. RESET Signal Timing (T15 = RESET hold, T16 = RESET setup, T17 = RESET width. INIT parameters (BADAC, IAC) must be set up 8 clocks prior to the marked CLK2 edge and must be held beyond it.)
Figure 17. Hold Timing (signals CLK2, CLK, HOLDR, HOLD, HLDA, HLDAR; primary and secondary bus masters)
Design Considerations
Input hold times can be disregarded by the designer whenever the input is removed because a subsequent output from the processor is deasserted (e.g., DEN becomes deasserted).
Whenever the processor generates an output that indicates a transition into a subsequent state, any outputs that are specified to be tri-stated in this new state are guaranteed to be tri-stated. For example, in the Td cycle following a Ta cycle for a read, the minimum output delay of DEN is 2 ns, but the maximum float time of LAD is 20 ns. When DEN is asserted, however, the LAD outputs are guaranteed to have been tri-stated.
Designing for the ICE-960KB
The 80960KB In-Circuit Emulator assists in debugging both 80960KA and 80960KB hardware and software designs. The product consists of a probe module, cable, and control unit. Because of the high operating frequency of 80960KB systems, the probe module connects directly to the 80960KB socket.
When designing an 80960KB hardware system that uses the ICE-960KB to debug the system, several electrical and mechanical characteristics should be considered. These considerations include capacitive loading, drive requirement, power requirement and physical layout.
The ICE-960KB probe module increases the load capacitance of each line by up to 25 pF. It also adds one standard Schottky TTL load on the CLK2 line, up to one advanced low-power Schottky TTL load for each control signal line, and one advanced low-power Schottky TTL load for each address/data and byte enable line. These loads originate from the probe module and are driven by the 80960KB processor.
To achieve high noise immunity, the ICE-960KB probe is powered by the user's system. The high-speed probe circuitry draws up to 1.1A plus the maximum current (ICC) of the 80960KB processor.
The mechanical considerations are shown in Figure 18, which illustrates the lateral clearance requirements for the ICE-960KB probe as viewed from above the socket of the 80960KB processor.
Figure 18. ICE-960KB Lateral Clearance Requirements (view from above user CPU socket; vertical clearance 1.2"; ribbon cable connector with cable to ICE control unit; minimum cable bend radius: less than 3.0")
MECHANICAL DATA
Package Dimensions and Mounting
The 80960KB is available in two different packages: a 132-lead ceramic pin-grid array (PGA) and a 132-lead plastic quad flat pack (PQFP). Pins in the ceramic package are arranged 0.100 inch (2.54 mm) center-to-center, in a 14 by 14 matrix, three rows around. (See Figure 19.) The plastic package uses fine-pitch gull wing leads arranged in a single row along the perimeter of the package with 0.025 inch (0.64 mm) spacing. (See Figure 20.) Dimensions are given in Figure 21 and Table 7.
There are a wide variety of sockets available for the ceramic PGA package including low-insertion or zero-insertion force mountings, and a choice of terminals such as soldertail, surface mount, or wire wrap. Several applicable sockets are shown in Figure 22.
The PQFP is normally surface mounted to take best advantage of the plastic package's small footprint and low cost. In some applications, however, designers may prefer to use a socket, either to improve heat dissipation or reduce repair costs. Figures 23a and 23b show two of the many sockets available.
Pin Assignment
The PGA and PQFP have different pin assignments. Figure 24 shows the view from the bottom of the PGA (pins facing up) and Figure 25 shows a view from the top of the PGA (pins facing down). Figures 20 and 32 show the top view of the PQFP; notice that the pins are numbered in order from 1 to 132 around the package's perimeter. Tables 5 and 6 list the function of each pin in the PGA, and Tables 8 and 9 list the function of each pin in the PQFP.
VCC and GND connections must be made to multiple VCC and GND pins. Each VCC and GND pin must be connected to the appropriate voltage or ground and externally strapped close to the package. We recommend that you include separate power and ground planes in your circuit board for power distribution.
NOTE:
Pins identified as N.C., "No Connect," should never be connected.
Package Thermal Specification
The 80960KB is specified for operation when case temperature is within the range 0°C to +85°C (PGA) or +100°C (PQFP). The case temperature should be measured at the top center of the package as shown in Figure 26.
The ambient temperature can be calculated from θjc and θja by using the following equations:
TJ = TC + P*θjc
TA = TJ - P*θja
TC = TA + P*[θja - θjc]
Values for θja and θjc are given in Table 10 for the PGA package.
SUPPORT COMPONENTS
85C960 Burst Bus Controller
Figure 27. 10 MHz 80960 K-Series Maximum Allowable Ambient Temperature (vs airflow in ft/min; curves for PQFP, PGA with no heatsink, PGA with omnidirectional heatsink, PGA with unidirectional heatsink)
Figure 28. 16 MHz 80960 K-Series Maximum Allowable Ambient Temperature (vs airflow in ft/min)
Figure 29. 20 MHz 80960 K-Series Maximum Allowable Ambient Temperature (vs airflow in ft/min; curves for PQFP, PGA with no heatsink, PGA with omnidirectional heatsink, PGA with unidirectional heatsink)
Figure 30. Maximum Allowable Ambient Temperature for the 80960KB at 25 MHz (available in PGA only) (vs airflow in ft/min; curves for PGA with no heatsink, with omnidirectional heatsink, with unidirectional heatsink)
Figure 31. Maximum Allowable Ambient Temperature for the Extended Temperature TA-80960KB at 20 MHz (available in PGA only) (vs airflow in ft/min; curves for PGA with no heatsink, with omnidirectional heatsink, with unidirectional heatsink)
Figure 32. 80960KB PQFP Pinout-View from Top (pin-by-pin assignments are listed in Tables 8 and 9)
Table 8. 80960KB Plastic Package Pinout-In Pin Order
Pin | Signal | Pin | Signal | Pin | Signal | Pin | Signal
1 | HLDA/HOLDR | 34 | N.C. | 67 | VSS | 100 | LAD0
2 | ALE | 35 | VCC | 68 | VSS | 101 | LAD1
3 | LAD26 | 36 | VCC | 69 | N.C. | 102 | LAD2
4 | LAD27 | 37 | N.C. | 70 | VCC | 103 | VSS
5 | LAD28 | 38 | N.C. | 71 | VCC | 104 | LAD3
6 | LAD29 | 39 | N.C. | 72 | N.C. | 105 | LAD4
7 | LAD30 | 40 | N.C. | 73 | VSS | 106 | LAD5
8 | LAD31 | 41 | VCC | 74 | VCC | 107 | LAD6
9 | VSS | 42 | VSS | 75 | N.C. | 108 | LAD7
10 | CACHE | 43 | N.C. | 76 | N.C. | 109 | LAD8
11 | W/R | 44 | N.C. | 77 | N.C. | 110 | LAD9
12 | READY | 45 | N.C. | 78 | N.C. | 111 | LAD10
13 | DT/R | 46 | N.C. | 79 | VSS | 112 | LAD11
14 | BE0 | 47 | N.C. | 80 | VSS | 113 | LAD12
15 | BE1 | 48 | N.C. | 81 | N.C. | 114 | VSS
16 | BE2 | 49 | N.C. | 82 | VCC | 115 | LAD13
17 | BE3 | 50 | N.C. | 83 | VCC | 116 | LAD14
18 | FAILURE | 51 | N.C. | 84 | VSS | 117 | LAD15
19 | VSS | 52 | VSS | 85 | IAC/INT0 | 118 | LAD16
20 | LOCK | 53 | VSS | 86 | INT1 | 119 | LAD17
21 | DEN | 54 | N.C. | 87 | INT2/INTR | 120 | LAD18
22 | VSS | 55 | VCC | 88 | INT3/INTA | 121 | LAD19
23 | VSS | 56 | VCC | 89 | N.C. | 122 | LAD20
24 | N.C. | 57 | VSS | 90 | VSS | 123 | LAD21
25 | N.C. | 58 | N.C. | 91 | CLK2 | 124 | LAD22
26 | VSS | 59 | N.C. | 92 | VCC | 125 | VSS
27 | VSS | 60 | N.C. | 93 | RESET | 126 | LAD23
28 | N.C. | 61 | N.C. | 94 | N.C. | 127 | LAD24
29 | VCC | 62 | N.C. | 95 | N.C. | 128 | LAD25
30 | VCC | 63 | N.C. | 96 | N.C. | 129 | BADAC
31 | N.C. | 64 | N.C. | 97 | N.C. | 130 | HOLD/HLDAR
32 | VSS | 65 | N.C. | 98 | N.C. | 131 | N.C.
33 | VSS | 66 | N.C. | 99 | VSS | 132 | ADS
Table 9. 80960KB Plastic Package Pinout-In Signal Order
Signal | Pin | Signal | Pin | Signal | Pin | Signal | Pin
ADS | 132 | LAD22 | 124 | N.C. | 49 | VCC | 41
ALE | 2 | LAD23 | 126 | N.C. | 50 | VCC | 55
BADAC | 129 | LAD24 | 127 | N.C. | 51 | VCC | 56
BE0 | 14 | LAD25 | 128 | N.C. | 54 | VCC | 70
BE1 | 15 | LAD26 | 3 | N.C. | 58 | VCC | 71
BE2 | 16 | LAD27 | 4 | N.C. | 59 | VCC | 74
BE3 | 17 | LAD28 | 5 | N.C. | 60 | VCC | 82
CACHE | 10 | LAD29 | 6 | N.C. | 61 | VCC | 83
CLK2 | 91 | LAD3 | 104 | N.C. | 62 | VCC | 92
DEN | 21 | LAD30 | 7 | N.C. | 63 | VSS | 9
DT/R | 13 | LAD31 | 8 | N.C. | 64 | VSS | 19
FAILURE | 18 | LAD4 | 105 | N.C. | 65 | VSS | 22
HLDA/HOLDR | 1 | LAD5 | 106 | N.C. | 66 | VSS | 23
HOLD/HLDAR | 130 | LAD6 | 107 | N.C. | 69 | VSS | 26
IAC/INT0 | 85 | LAD7 | 108 | N.C. | 72 | VSS | 27
INT1 | 86 | LAD8 | 109 | N.C. | 75 | VSS | 32
INT2/INTR | 87 | LAD9 | 110 | N.C. | 76 | VSS | 33
INT3/INTA | 88 | LOCK | 20 | N.C. | 77 | VSS | 42
LAD0 | 100 | N.C. | 24 | N.C. | 78 | VSS | 52
LAD1 | 101 | N.C. | 25 | N.C. | 81 | VSS | 53
LAD10 | 111 | N.C. | 28 | N.C. | 89 | VSS | 57
LAD11 | 112 | N.C. | 31 | N.C. | 94 | VSS | 67
LAD12 | 113 | N.C. | 34 | N.C. | 95 | VSS | 68
LAD13 | 115 | N.C. | 37 | N.C. | 96 | VSS | 73
LAD14 | 116 | N.C. | 38 | N.C. | 97 | VSS | 79
LAD15 | 117 | N.C. | 39 | N.C. | 98 | VSS | 80
LAD16 | 118 | N.C. | 40 | N.C. | 131 | VSS | 84
LAD17 | 119 | N.C. | 43 | READY | 12 | VSS | 90
LAD18 | 120 | N.C. | 44 | RESET | 93 | VSS | 99
LAD19 | 121 | N.C. | 45 | VCC | 29 | VSS | 103
LAD2 | 102 | N.C. | 46 | VCC | 30 | VSS | 114
LAD20 | 122 | N.C. | 47 | VCC | 35 | VSS | 125
LAD21 | 123 | N.C. | 48 | VCC | 36 | W/R | 11
Table 10. 80960KB PGA Package Thermal Characteristics
Thermal Resistance-°CIWatt
Airflow-ft.!mln (m/sec)
Parameter
100
0
50
200
400
600
800
(0) (0.25) (0.50) (1.01) (2.03) (3.04) (4.06)
(} Junction-to-Case
(Case Measured
as shown in Figure 26)
2
2
2
2
2
2
2
(} Case-to-Ambient
(No Heatsink)
19
18
17
15
12
10
9
16
15
14
12
9
7
6
(} Case-to-Ambient
(with Unidirectional)
Heatsink)
15
14
13
11
8
6
5
NOTES:
1. This table applies to.S0960KB PGA
plugged into socket or soldered directly into board.
+
'J pin ' "
I
(} Case-to-Ambient
(with Omnidirectional
Heatsink)
2. OJA = OJC
'Ja
UUl'
'J,
'J cap
I
UUU
270565-45
3.0J.CAP = 4°C/w (approx:)
J-PIN= 4°C/w (inner pins) (approx.).
OJ-PIN = SOC/w (outer pins) (approx.)
o
0CA.
Table 11. 80960KB PQFP Package Thermal Characteristics
PQFP Thermal Resistance (°C/Watt) by Airflow, ft./min (m/sec)
| Parameter | 0 (0) | 50 (0.25) | 100 (0.50) | 200 (1.01) | 400 (2.03) | 600 (3.04) | 800 (4.06) |
| θ Junction-to-Case (case measured as shown in Figure 26) | 9 | 9 | 9 | 9 | 9 | 9 | 9 |
| θ Case-to-Ambient (No Heatsink) | 22 | 19 | 18 | 16 | 11 | 9 | 8 |
NOTES:
1. This table applies to 80960KB PQFP soldered directly into board.
2. θJA = θJC + θCA
3. θJL = 18°C/Watt
θJB = 18°C/Watt
(Diagram 270565-44)
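The thermal notes state that θJA = θJC + θCA. As a minimal sketch of how these tables might be used, the Python below adds the two resistances and estimates a junction temperature with the standard relation TJ = TA + P × θJA. The table values are copied from the No Heatsink rows of Tables 10 and 11; the ambient temperature and the power dissipation figure are hypothetical inputs chosen for illustration, not data sheet values.

```python
# Thermal resistance values (degrees C per watt) copied from the
# "No Heatsink" rows of Tables 10 and 11, keyed by airflow in ft./min.
AIRFLOW_FT_PER_MIN = (0, 50, 100, 200, 400, 600, 800)

PGA_THETA_JC = 2
PGA_THETA_CA_NO_HEATSINK = dict(zip(AIRFLOW_FT_PER_MIN, (19, 18, 17, 15, 12, 10, 9)))

PQFP_THETA_JC = 9
PQFP_THETA_CA_NO_HEATSINK = dict(zip(AIRFLOW_FT_PER_MIN, (22, 19, 18, 16, 11, 9, 8)))


def theta_ja(theta_jc, theta_ca):
    """Junction-to-ambient resistance, per note 2: theta_JA = theta_JC + theta_CA."""
    return theta_jc + theta_ca


def junction_temperature(ambient_c, power_w, theta_jc, theta_ca):
    """Estimate junction temperature as T_A + P * theta_JA.

    ambient_c and power_w are caller-supplied assumptions; take the real
    power figure from the device's DC characteristics.
    """
    return ambient_c + power_w * theta_ja(theta_jc, theta_ca)


if __name__ == "__main__":
    # Hypothetical operating point: 25 C ambient, 2.0 W dissipation, still air.
    tj = junction_temperature(25, 2.0, PQFP_THETA_JC, PQFP_THETA_CA_NO_HEATSINK[0])
    print(f"PQFP, no airflow: estimated junction temperature {tj:.1f} C")
```

Running the same calculation across the airflow columns shows how much margin forced air buys for a given package.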
Figure 33. Read Transaction (timing diagram 270565-18; bus states Ta, Td, Tr; signals CLK2, CLK, LAD31-LAD0, ALE, ADS, BE3-BE0, W/R, DT/R, DEN, READY)
Figure 34. Write Transaction with One Wait State (timing diagram 270565-19)
Figure 35. Burst Read Transaction (timing diagram 270565-20; bus states Ta, Td, Td, Td, Tr; signals CLK2, CLK, LAD31-LAD0, ALE, ADS, BE3-BE0, W/R, DT/R, DEN, READY)
Figure 36. Burst Write Transaction with One Wait State (timing diagram 270565-21; signals include CLK2, CLK, LAD31-LAD0, W/R, DT/R)
(Timing diagram 270565-22: interrupt acknowledgement cycle 1, five idle bus states, then interrupt acknowledgement cycle 2; signals shown include INTR and DT/R.)
NOTE:
INTR can go low no sooner than 5 ns (input hold time) following the beginning of interrupt acknowledgement cycle 1.
For a second interrupt to be acknowledged, INTR must be low for at least three cycles before it can be reasserted.
Figure 37. Interrupt Acknowledge Transaction
Figure 38. Bus Exchange Transaction (PBM = Primary Bus Master, SBM = Secondary Bus Master) (timing diagram 270565-23; signals shown: CLK, W/R, PBM ALE, SBM ALE, PBM HOLD, PBM HLDA, SBM HLDAR, SBM HOLDR, with PBM and SBM bus states)
October 1989
80960CA
Product Overview
32-Bit High-Performance Embedded Processor
with On-Chip DMA Controller, Interrupt Controller,
High-Speed Bus Unit, Instruction and Register Caches
Order Number: 270669-001
80960CA PRODUCT OVERVIEW
CONTENTS
1.0 PURPOSE
2.0 80960CA 32-BIT EMBEDDED PROCESSOR
2.1 80960 Architecture
2.2 80960 C-Series Core
2.3 80960CA System Peripherals
3.0 EXECUTION ENVIRONMENT
3.1 Registers and Literals
3.2 Address Space and Memory
3.3 Memory Addressing Modes
3.4 Data Types
3.5 Instruction Set
3.6 Arithmetic Controls
3.7 Process Management
3.8 Call and Return Mechanism
3.9 Interrupts
3.10 Fault Handling and Instruction Tracing
4.0 80960CA SYSTEM IMPLEMENTATION
4.1 Peripheral Interface
4.2 Bus Controller Unit
4.3 DMA Controller
4.4 Interrupt Controller
APPENDIX A 80960CA CORE IMPLEMENTATION
A.1 Instruction Sequencer
A.2 Register File
A.3 Execution Unit
A.4 Multiply Divide Unit
A.5 Address Generation Unit
A.6 Data RAM and Local Register Cache
1.0 PURPOSE
The 80960CA Product Overview is a summary of the
features and operation of Intel's 80960CA Embedded
Processor. The Product Overview is intended for those
who are not familiar with the 80960 architecture or the
80960CA, a product built around this architecture. The
80960CA Product Overview provides a programmer or
a system designer with a quick, global view of software
and hardware design considerations for the 80960CA.
For further information, refer to the following reference documents:
The 80960CA User's Manual contains detailed technical information and examples for designing embedded systems using the 80960CA.
The 80960CA Data Sheet provides electrical specifications for the device, such as the DC and AC parameters, operating conditions, and packaging specifications.
2.0 80960CA 32-BIT EMBEDDED PROCESSOR
The 80960CA (Figure 2-1) is optimized for embedded processing applications. This product features the high-performance C-Series core plus built-in system peripherals, effectively integrating a high-speed CPU and system components onto a single silicon die. The 80960CA is a member of Intel's 80960 embedded processor family. Each member of the 80960 family is based on a common architectural definition referred to as the core architecture.
An 80960 family member, such as the 80960CA, is made up of an implementation of the core architecture plus application-specific extensions. These extensions may consist of integrated peripherals, instruction-set extensions, or additional registers and caches beyond those defined by the architecture. The common core architecture provides a basis for code compatibility for all 80960 family products, while application-specific extensions optimize a particular product for a class of applications.
The 80960 architectural target is the execution of multiple instructions per clock (i.e., fractional clocks per instruction). By defining an architecture which supports parallel instruction execution and out-of-order instruction execution, performance advances are not constrained by the system clock.
The 80960CA is capable of launching and executing instructions in parallel. This is accomplished by the use of advanced silicon technology as well as innovative "microarchitectural" constructs. The term microarchitecture refers to the implementation of the instruction set and programming resources. For example, different microarchitectures may have different pipeline construction, internal bus widths, register set porting, degrees of parallelism, and cache parameterization (two-way, four-way, etc.).
A principal objective of the 80960 architecture is to provide the framework to allow microarchitectural advances to translate directly into increased performance without architectural limitations.
Figure 2-1. 80960CA (diagram 270669-2)
2.1 80960 Architecture
Embedded applications are cost sensitive, require a different mix of instructions than reprogrammable applications, have demanding interrupt response requirements, and often use real-time executives rather than full-blown operating systems. The 80960 architecture was developed with these factors in mind. Several key optimizations which are provided by the architecture are explained below.
Instruction Set: Powerful Boolean operations are provided. Frequently executed functions are available as single instructions for greater code density and performance. Call, Return, Compare-and-Branch, Conditional-Compare, Compare-and-Increment or Decrement, and Bit-Field-Extract are each single instructions.
Interrupts: A priority interrupt structure simplifies the management of real-time events. With 31 discrete levels of priority and 248 possible interrupt-handling procedures, this structure provides the low latency and high throughput interrupt handling required in embedded processor applications.
Faults: A generalized fault-handling mechanism simplifies the task of detecting errant arithmetic calculations or other conditions that typically require a significant amount of in-line user code.
Application-Specific Extensions: The core architecture is designed to accept application-specific extensions such as instruction set extensions (e.g., string functions, floating point), special purpose registers, larger caches, on-chip program and data memory, a memory management and protection unit, fault-tolerance support, multiprocessing support, and real-time peripherals (DMA, serial ports, etc.).
2.2 80960 C-Series Core
The C-Series core is an implementation of the 80960 core architecture. The core can execute instructions at a sustained speed of 66 MIPS(1) with bursts of performance up to 99 MIPS. To achieve this level of performance, Intel has incorporated state-of-the-art silicon technology and innovative microarchitectural constructs into the C-Series core. Factors which contribute to the core's performance are listed below.
- Parallel instruction decoding allows the 80960CA to start two instructions in every clock, with bursts of three instructions per clock.
- Most instructions execute in a single clock cycle.
- Multiple independent execution units enable overlapping instruction execution.
- Advanced silicon technology allows operation with a 33 MHz internal clock.
- Efficient instruction pipeline is designed to minimize pipeline break losses.
- Register and resource scoreboarding transparently manage parallel execution.
- Branch look-ahead feature enables branches to execute in parallel with other instructions.
- Local register cache is integrated on-chip.
- 1 Kbyte two-way set associative instruction cache is integrated on-chip.
- 1 Kbyte static data RAM is integrated on-chip.
These factors combine to make the 80960CA an ultra-high performance computing engine.
NOTE:
1. Single clock instructions at 33 MHz.
2.3 80960CA System Peripherals
The 80960CA features several extensions to the core architecture in the form of integrated peripherals. These peripherals are intended to reduce the external system requirements needed for embedded applications. These peripherals are described below.
Bus Controller Unit: A 32-bit high-performance bus controller interfaces the 80960CA to external memory and peripherals. The bus controller transfers instructions or data at a maximum rate of 132 Mbytes per second(2). Internally programmable wait states and 16 separately configurable memory regions allow the bus controller to interface with a variety of memory subsystems with minimum system complexity and maximum performance.
DMA Controller: A four channel DMA controller performs high speed data transfers between peripherals and memory. The DMA controller provides advanced features such as data chaining, byte assembly and disassembly, and a fly-by mode capable of transfer speeds of up to 66 Mbytes per second. The DMA controller offers performance and flexibility that are possible only because the DMA controller is integrated with the 80960CA core.
Interrupt Controller: A priority interrupt controller manages 8 external interrupt inputs, 4 internal interrupt sources from the DMA controller, and a single non-maskable interrupt input (NMI). A total of 248 external interrupt sources are supported by the interrupt controller by configuring the 8 external interrupt pins as an 8-bit input port. The interrupt controller provides the mechanism for the low latency and high throughput interrupt service featured by the 80960CA. The interrupt latency for the 80960CA is typically less than 1 µs.
NOTE:
2. 33 MHz internal clock, load or instruction fetch on 0 wait state, pipelined burst bus.
3.0 EXECUTION ENVIRONMENT
The Execution Environment (Figure 3-1) refers to the resources which are available for executing code on the 80960CA. The following sections describe the elements of the execution environment.
3.1 Registers and Literals
The 80960CA provides four types of working data registers: Global Registers, Local Registers, Special Function Registers (SFRs), and Control Registers.
Global and local registers are general purpose 32-bit data registers. The SFRs and the control registers provide a programmer's interface to the on-chip peripherals (i.e., the DMA controller, interrupt controller, and bus controller).
Figure 3-1. Execution Environment (diagram 270669-4; elements shown: address space starting at 00000000H, load, store and fetch paths, instruction stream, 3 Special Function Registers, Control Registers)
The 80960 architecture is a register-oriented architecture. That is, operands and results of instructions are placed in working data registers rather than in memory. Since the architecture is register oriented, an ample supply of registers is provided. The architecture's working register set consists of 16 32-bit global registers and 16 32-bit local registers.
3.1.1 GLOBAL AND LOCAL REGISTERS
The procedure call and return mechanism, which is part of the 80960 architecture, inspires the names given to the local and global registers. When a procedure call or return is executed, the contents of global registers are preserved across procedure boundaries. In other words, the same set of global registers is used for each procedure. A new set of local registers, however, is allocated for each procedure. The 80960's call and return mechanism is explained in Section 3.8.
The 80960CA supplies 16 32-bit global registers designated g0 through g15. Registers g0 through g14 are general purpose global registers. Register g15 is reserved for the current Frame Pointer. This register is available in assembly language as the fp register. The fp contains the address of the first byte in the current stack frame. The fp register and the stack frame are described in Section 3.8.
The 80960CA supplies 16 32-bit local registers designated r0 through r15. Registers r3 through r15 are general purpose local registers. Registers r0, r1, and r2 are reserved for special functions as follows: r0 contains the Previous Frame Pointer, r1 contains the Stack Pointer, and r2 is reserved for the Return Instruction Pointer. These registers are available in assembly language as, respectively, the pfp, sp, and rip registers. The pfp, sp, and rip registers manage stack frame linkage for the 80960's procedure call and return mechanism. The function of these registers is described in Section 3.8.
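The register naming described in this section can be illustrated with a small model. The Python sketch below is only a toy: the class name RegisterModel and its methods are hypothetical, a call simply pushes a fresh zeroed local set (real local register contents after a call are not specified here), and the stack frame linkage through pfp, sp, and rip that Section 3.8 describes is deliberately not modelled.

```python
MASK32 = 0xFFFFFFFF

# Assembly-language aliases named in the text: fp = g15, pfp = r0, sp = r1, rip = r2.
ALIASES = {"fp": "g15", "pfp": "r0", "sp": "r1", "rip": "r2"}


class RegisterModel:
    """Toy model: one shared global set, one local set per active procedure."""

    def __init__(self):
        self.global_regs = {f"g{i}": 0 for i in range(16)}
        self.local_sets = [self._new_local_set()]

    @staticmethod
    def _new_local_set():
        return {f"r{i}": 0 for i in range(16)}

    def _bank(self, name):
        # Resolve an alias, then pick the global bank or the current local set.
        name = ALIASES.get(name, name)
        bank = self.global_regs if name.startswith("g") else self.local_sets[-1]
        if name not in bank:
            raise KeyError(f"unknown register: {name}")
        return bank, name

    def read(self, name):
        bank, name = self._bank(name)
        return bank[name]

    def write(self, name, value):
        bank, name = self._bank(name)
        bank[name] = value & MASK32  # registers are 32 bits wide

    def call(self):
        # A new set of local registers is allocated for each procedure.
        self.local_sets.append(self._new_local_set())

    def ret(self):
        if len(self.local_sets) == 1:
            raise RuntimeError("no caller to return to")
        self.local_sets.pop()
```

In this model a value written to g3 before call() is still visible afterwards, while r5 reads as a fresh value until ret() restores the caller's local set.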
3.1.2 SPECIAL FUNCTION REGISTERS AND
CONTROL REGISTERS
The 80960CA uses 3 Special Function Registers (SFRs) for communicating with on-chip peripherals. These SFRs are an architectural extension specific to the 80960CA. The SFRs on the 80960CA are designated as sf0, sf1, and sf2. SFRs are accessed as source operands by most of the 80960CA's instructions. The registers serve as part of the programmer's interface to the DMA and interrupt controller.
Control registers, like SFRs, are used to communicate with the on-chip peripherals. Configuration information for the peripherals is generally stored in these registers. Control registers can only be accessed by using the system control (sysctl) instruction. The sysctl instruction is used to load the internal control registers from a table in external memory called the control table. In order to simplify the process of peripheral configuration, the control registers are automatically loaded from this table at initialization.
3.1.3 LITERALS
The 80960CA provides literals which may be used in
the place of source register operands in most instructions. The literals range from 0 to 31 (5 bits). When a
literal is used as an operand, the processor expands it to
32 bits by adding leading zeros. If the instruction defines an operand larger than 32 bits, the processor zero
extends the literal to the operand size.
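As a brief illustration of the literal rule above, the Python sketch below checks whether a value fits the 5-bit literal range and shows the zero-extended bit pattern for a given operand size. The function names are hypothetical and the code models only the rule stated in the text, not the instruction encoding.

```python
LITERAL_MIN, LITERAL_MAX = 0, 31  # 5-bit literal range given in the text


def is_literal(value):
    """True if value can be written as a literal instead of a register operand."""
    return isinstance(value, int) and LITERAL_MIN <= value <= LITERAL_MAX


def expand_literal(value, operand_bits=32):
    """Return the zero-extended bit pattern of a literal as a string.

    operand_bits is 32 by default; larger operands are also zero extended.
    """
    if not is_literal(value):
        raise ValueError(f"{value!r} is outside the literal range 0 to 31")
    if operand_bits < 32:
        raise ValueError("literals are expanded to at least 32 bits")
    return format(value, f"0{operand_bits}b")


if __name__ == "__main__":
    print(expand_literal(31))      # 27 leading zeros followed by 11111
    print(expand_literal(5, 64))   # the same rule applied to a 64-bit operand
```

Values outside 0 to 31 cannot be encoded this way and must come from a register.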
3.2 Address Space and Memory
The address space of the 80960CA (Figure 3-2) is considered a subset of the execution environment since the code, data, data structures, and external peripherals for the processor reside here. The 80960 family has an address space which is 2^32 bytes (4 Gbytes) in size. This address space is linear (unsegmented); therefore, code, data, and peripherals may be placed anywhere in the usable space. For the 80960CA, some memory locations are reserved or are assigned special functions as shown in Figure 3-2.
3.2.1 INTERNAL DATA RAM
The 80960CA provides 1 Kbyte of internal static RAM for fast access of frequently used data. The data RAM allows time critical data storage and retrieval, with no dependence on the performance of the external bus. Any load or store, including quad-word operations, executes in a single clock cycle when directed to internal data RAM. The data RAM is located at address 00H in the processor's address space. When the DMA controller is in use, 32 bytes of data RAM are reserved for each active DMA channel. Additionally, 64 bytes of data RAM are reserved for 16 interrupt vectors which may be cached internally to reduce interrupt latency. The data RAM reserved for the DMA controller and the interrupt controller can be used for additional data storage when these peripherals are not used.
Two execution modes are possible on the 80960CA, user mode or supervisor mode. These modes are used to implement a protection model in which system data structures are isolated from user code. As shown in Figure 3-2, the first 256 bytes of data RAM are always write protected when a program is executing in user mode but may always be written when executing in supervisor mode. The remainder of the data RAM can be programmed for this protection feature. The user and supervisor modes are described further in Section 3.7.
Figure 3-2. Address Space (diagram 270669-5)
| Address Range | Contents |
| 0000 0000H - 0000 003FH | Interrupt Vectors (optional) (Internal SRAM) |
| 0000 0040H - 0000 00BFH | DMA Registers (optional) (Internal SRAM) |
| 0000 00C0H - 0000 00FFH | Data RAM (Internal SRAM, User Write Protected) |
| 0000 0100H - 0000 03FFH | Data RAM (Internal SRAM, Programmable User Write Protection) |
| 0000 0400H - FEFF FFFFH | Code/Data, Architecturally Defined Data Structures (External Memory) |
| FF00 0000H - FFFF FEFFH | Reserved |
| FFFF FF00H - FFFF FF2CH | Initialization Boot Record (External Memory) |
| FFFF FF2DH - FFFF FFFFH | Reserved |
3.2.2 RESERVED ADDRESS SPACE
The upper 16 Mbytes of memory (FF000000H - FFFFFFFFH) are reserved for specific functions and extensions to the 80960 architecture. The 12 words in reserved space (FFFFFF00H - FFFFFF2CH) are used to start up the processor when it comes out of reset. These 12 words are called the initialization boot record.
3.2.3 ARCHITECTURALLY DEFINED DATA STRUCTURES
To execute a program on the 80960CA, data structures specific to the 80960 architecture must reside in the processor's address space. Architecture-defined data structures include stacks, initialization structures, and various procedure entry tables. These data structures may generally be located anywhere in the address space. Pointers to each data structure are specified when the 80960CA is initialized. The architecture-defined data structures include:
- User Stack
- Interrupt Stack
- Supervisor Stack
- Interrupt Table
- System-Procedure Table
- Fault Table
In addition to the data structures defined by the architecture, the 80960CA requires several implementation-specific data structures which are used for configuring peripherals and initialization. These data structures include:
- Control Table
- Process Control Block
- Initialization Boot Record
Each data structure will be explained in more detail later in this product overview.
3.3 Memory Addressing Modes
The 80960CA offers a variety of modes for memory addressing. The addressing modes available are summarized in Table 3-1.
Absolute addressing is used to reference an address as an offset from address 0 of the processor's address space. At the machine level, absolute addressing may be implemented in one of two ways depending on the size of the absolute offset from address 0. Two instruction formats, MEMA and MEMB, are used to provide absolute addressing modes. For the MEMA format, the offset is an ordinal number ranging from 0 to 2048. For the MEMB format, the offset is an integer (called a displacement) ranging from -2^31 to 2^31 - 1. An assembler will choose the MEMA or MEMB format based on the size of the offset.
Register-indirect addressing modes use a 32-bit ordinal value in a register as the base for the address calculation. Offsets and indexes are added to this address base depending on the particular addressing mode. The register-indirect-with-index addressing mode adds a scaled index to the address base. The index is specified as a value in a register. The scale value may be selected as 1, 2, 4, 8, or 16.
The index-with-displacement addressing mode uses a scaled index plus an integer displacement. No address base is used in this address calculation.
The IP-with-displacement addressing mode is used with load and store instructions to make them IP relative. In this mode, an integer displacement plus a constant of 8 is added to the IP of the instruction to calculate the next address.
Table 3-1. Memory Addressing Modes
| Mode | Description |
| Absolute Offset | Offset |
| Absolute Displacement | Displacement |
| Register Indirect | Abase |
| Register Indirect with Offset | Abase + Offset |
| Register Indirect with Displacement | Abase + Displacement |
| Register Indirect with Index | Abase + (Index * Scale) |
| Register Indirect with Index and Displacement | Abase + (Index * Scale) + Displacement |
| Index with Displacement | (Index * Scale) + Displacement |
| IP with Displacement | IP + Displacement + 8 |
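The address calculations of the memory addressing modes (Table 3-1) can be written out directly. The Python sketch below is illustrative only: the mode names and the function effective_address are hypothetical, and wrapping the result to 32 bits is an assumption made here because the address space is 2^32 bytes; the text does not describe overflow behaviour.

```python
MASK32 = 0xFFFFFFFF
VALID_SCALES = (1, 2, 4, 8, 16)  # scale values listed in the text


def effective_address(mode, *, offset=0, displacement=0, abase=0,
                      index=0, scale=1, ip=0):
    """Compute the address for one addressing mode of Table 3-1.

    abase and index stand for register contents, offset is an ordinal,
    displacement is a signed integer, ip is the instruction's own address.
    """
    if scale not in VALID_SCALES:
        raise ValueError(f"scale must be one of {VALID_SCALES}")
    scaled_index = index * scale
    formulas = {
        "absolute_offset": offset,
        "absolute_displacement": displacement,
        "register_indirect": abase,
        "register_indirect_offset": abase + offset,
        "register_indirect_displacement": abase + displacement,
        "register_indirect_index": abase + scaled_index,
        "register_indirect_index_displacement": abase + scaled_index + displacement,
        "index_displacement": scaled_index + displacement,
        "ip_displacement": ip + displacement + 8,
    }
    if mode not in formulas:
        raise ValueError(f"unknown addressing mode: {mode}")
    return formulas[mode] & MASK32  # assumed 32-bit wraparound


if __name__ == "__main__":
    # Element 3 of an array of 4-byte words starting at 0x1000.
    print(hex(effective_address("register_indirect_index",
                                abase=0x1000, index=3, scale=4)))
```

The scaled-index modes map naturally onto array indexing, where the scale equals the element size in bytes.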
3.4 Data Types
The 80960CA operates on the following data types (Figure 3-3):
- Integer (8, 16, 32, and 64 bits)
- Ordinal (8, 16, 32, and 64 bits)
- Bit
- Bit Field
- Triple Word (96 bits)
- Quad Word (128 bits)
Figure 3-3. Data Types (diagram 270669-6: byte, 8 bits; short, 16 bits; word, 32 bits; long, 64 bits; triple word, 96 bits; quad word, 128 bits; bit field, length 1 to 32 bits)
| Class | Data Type | Length | Range |
| Numeric (Integer) | Byte Integer | 8 bits | -2^7 to 2^7 - 1 |
| Numeric (Integer) | Short Integer | 16 bits | -2^15 to 2^15 - 1 |
| Numeric (Integer) | Integer | 32 bits | -2^31 to 2^31 - 1 |
| Numeric (Integer) | Long Integer | 64 bits | -2^63 to 2^63 - 1 |
| Numeric (Ordinal) | Byte Ordinal | 8 bits | 0 to 2^8 - 1 |
| Numeric (Ordinal) | Short Ordinal | 16 bits | 0 to 2^16 - 1 |
| Numeric (Ordinal) | Ordinal | 32 bits | 0 to 2^32 - 1 |
| Numeric (Ordinal) | Long Ordinal | 64 bits | 0 to 2^64 - 1 |
| Non-Numeric | Bit | 1 bit | N/A |
| Non-Numeric | Bit Field | 1-32 bits | N/A |
| Non-Numeric | Triple Word | 96 bits | N/A |
| Non-Numeric | Quad Word | 128 bits | N/A |
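The value ranges of the numeric data types follow from their lengths: integers are signed 2's complement values and ordinals are unsigned. A minimal Python sketch, with hypothetical helper names, that computes the ranges and reinterprets a raw bit pattern either way:

```python
def integer_range(bits):
    """Range of a signed 2's complement integer: -2^(bits-1) to 2^(bits-1) - 1."""
    return -(1 << (bits - 1)), (1 << (bits - 1)) - 1


def ordinal_range(bits):
    """Range of an unsigned ordinal: 0 to 2^bits - 1."""
    return 0, (1 << bits) - 1


def as_ordinal(raw, bits):
    """Interpret the low `bits` bits of raw as an unsigned ordinal."""
    return raw & ((1 << bits) - 1)


def as_integer(raw, bits):
    """Interpret the low `bits` bits of raw as a signed 2's complement integer."""
    value = as_ordinal(raw, bits)
    sign_bit = 1 << (bits - 1)
    return value - (1 << bits) if value & sign_bit else value


if __name__ == "__main__":
    for name, bits in (("byte", 8), ("short", 16), ("word", 32), ("long", 64)):
        print(name, integer_range(bits), ordinal_range(bits))
```

The same 8-bit pattern FFH is 255 as a byte ordinal and -1 as a byte integer, which is why separate integer and ordinal forms of the arithmetic and compare instructions exist.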
The following sections describe the data types supported by the 80960CA.
3.4.1 NUMERIC DATA TYPES
Integers and ordinals are considered numeric data types since the processor performs arithmetic operations with this data. The integer data type is a signed binary value in standard 2's complement representation. The ordinal data type is an unsigned binary value.
3.4.2 NON-NUMERIC DATA TYPES
The remaining data types (bit field, triple word, and quad word) represent groupings of bits or bytes that the processor can operate on as a whole, regardless of the nature of the data contained in the group. These data types facilitate the moving of blocks of bits or bytes.
3.5 Instruction Set
The 80960CA features a comprehensive instruction set (Table 3-2). Much of the instruction set is that of a RISC architecture. Unlike pure RISC machines, however, the 80960CA provides an extension to the RISC instruction set with instructions that perform complex functions such as procedure calls and returns, high-speed multiplies, and other complex control, arithmetic, and logical operations. The instruction set allows functionally complex yet highly compact code to be written for embedded control applications where memory is a valuable commodity.
Table 3-2. Instruction Set Summary
| Group | Instructions |
| Data Movement | Load, Store, Move |
| Arithmetic | Add, Subtract, Multiply, Divide, Remainder, Modulo, Shift, Extended Shift, Extended Multiply, Extended Divide, Add with Carry, Subtract with Carry, Rotate |
| Logical | And, Not And, And Not, Or, Exclusive Or, Not Or, Or Not, Nor, Exclusive Nor, Not, Nand |
| Bit and Bit Field | Set Bit, Clear Bit, Not Bit, Check Bit, Alter Bit, Scan for Bit, Scan for Byte, Span over Bit, Extract, Modify |
| Comparison | Compare, Conditional Compare, Compare and Increment, Compare and Decrement, Condition Test |
| Branch | Unconditional Branch, Conditional Branch, Branch and Link, Compare and Conditional Branch |
| Call and Return | Call, Call Extended, Call System, Return |
| Fault | Conditional Fault, Synchronize Faults |
| Debug | Modify Trace Controls, Mark, Force Mark |
| Processor Management | Modify Process Controls, Modify Arithmetic Controls, System Control, Update DMA, Setup DMA, Flush Local Registers |
| Address Computation | Load Address |
| Atomic | Atomic Add, Atomic Modify |
3.5.1 INSTRUCTION GROUPS
The 80960CA instruction set is most easily described if grouped by the functions listed below:
- Data Movement
- Address Computation
- Logical and Arithmetic
- Bit and Bit Field
- Comparison
- Branch
- Call and Return
- Fault
- Debug
- Processor Management
The instructions which make up each of these groups are described in the following sections.
3.5.1.1 Data Movement Instructions
The data movement instructions move data from memory to registers, from registers to memory, and between registers. The load instructions copy bytes, words, or multiple words from memory to a selected register or group of registers. Conversely, the store instructions copy bytes, words, or groups of words from a selected register or group of registers to memory. The move instructions copy data between registers.
Load Instructions
- ld      load word
- ldob    load ordinal byte
- ldos    load ordinal short
- ldib    load integer byte
- ldis    load integer short
- ldl     load long
- ldt     load triple
- ldq     load quad
Store Instructions
- st      store word
- stob    store ordinal byte
- stos    store ordinal short
- stib    store integer byte
- stis    store integer short
- stl     store long
- stt     store triple
- stq     store quad
Move Instructions
- mov     move word
- movl    move long
- movt    move triple
- movq    move quad
3.5.1.2 Address Computation Instructions
The load address (lda) instruction causes a 32-bit address to be computed and placed in a destination register. The address is computed based on the addressing mode selected. The load and store instructions perform a function identical to that of the lda instruction when calculating a source or destination address. The lda instruction is useful for loading a 32-bit constant into a register.
3.5.1.3 Logical and Arithmetic Instructions
Logical instructions perform bitwise Boolean operations on operands in registers. Since this group of instructions performs only bitwise manipulations of data, separate logical instructions for integer and ordinal data types do not exist. In the table below, src1 and src2 represent processor registers or literals which are the operands for these instructions.
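The logical group consists of bitwise operations on 32-bit operands, defined in the Logical Instructions list in terms of src1 and src2. The Python sketch below models those definitions on 32-bit values. It is an illustration of the Boolean functions only: the helper names are hypothetical, and the sketch says nothing about the assembler's operand order or the destination register.

```python
MASK32 = 0xFFFFFFFF


def _not32(x):
    """Bitwise complement confined to 32 bits."""
    return ~x & MASK32


# Each entry follows the definition given in the Logical Instructions list.
LOGICAL_OPS = {
    "and":    lambda s1, s2: s1 & s2,
    "notand": lambda s1, s2: s1 & _not32(s2),
    "andnot": lambda s1, s2: _not32(s1) & s2,
    "or":     lambda s1, s2: s1 | s2,
    "notor":  lambda s1, s2: s1 | _not32(s2),
    "ornot":  lambda s1, s2: _not32(s1) | s2,
    "xor":    lambda s1, s2: s1 ^ s2,
    "xnor":   lambda s1, s2: _not32(s1 ^ s2),
    "nor":    lambda s1, s2: _not32(s1 | s2),
    "nand":   lambda s1, s2: _not32(s1 & s2),
    "not":    lambda s1, s2=0: _not32(s1),  # single-operand operation
}


def logical(op, src1, src2=0):
    """Apply one logical operation to 32-bit operands."""
    return LOGICAL_OPS[op](src1 & MASK32, src2 & MASK32) & MASK32


if __name__ == "__main__":
    for name in LOGICAL_OPS:
        print(f"{name:7s} {logical(name, 0b1100, 0b1010):032b}")
```

Because the operations are purely bitwise, the same functions serve integer and ordinal data alike, which matches the statement that no separate integer and ordinal forms exist.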
Logical Instructions
- and     src1 and src2
- notand  src1 and (not src2)
- andnot  (not src1) and src2
- or      src1 or src2
- notor   src1 or (not src2)
- ornot   (not src1) or src2
- xor     src1 xor src2
- xnor    src1 xnor src2
- nor     not (src1 or src2)
- nand    not (src1 and src2)
- not     not (src1)
Arithmetic instructions perform add, subtract, multiply, divide, and shift operations on integer or ordinal operands in registers.
Arithmetic Instructions
- addi    add integer
- addo    add ordinal
- subi    subtract integer
- subo    subtract ordinal
- muli    multiply integer
- mulo    multiply ordinal
- divi    divide integer
- divo    divide ordinal
- remi    remainder integer
- remo    remainder ordinal
- modi    modulo integer
- rotate  rotate bit left
- shli    shift left integer
- shlo    shift left ordinal
- shri    shift right integer
- shro    shift right ordinal
- shrdi   shift right dividing integer
Extended arithmetic instructions facilitate computation on ordinals and integers which are longer than 32 bits. In add with carry and subtract with carry instructions, the carry out from the previous arithmetic instruction is used in the computation. The extended multiply instruction multiplies two ordinal source operands producing a long ordinal result (64 bits). The extended divide instruction divides a long ordinal dividend by an ordinal divisor and produces a 64-bit result. The extended shift right instruction shifts a 64-bit source value and produces the lower order 32 bits of the shifted value.
Extended Arithmetic Instructions
- addc    add ordinal with carry
- subc    subtract ordinal with carry
- emul    extended multiply
- ediv    extended divide
- eshro   shift right extended ordinal
The atomic instructions perform read-modify-write operations on operands in memory. They allow a system to ensure that when an atomic operation is performed on a specified memory location, the operation will be completed before another agent is allowed to perform an operation on the same memory. These instructions are required to enable synchronization between interrupt handlers and background tasks in any system. They are also particularly useful in systems where several agents (processors, coprocessors, or external logic) have access to the same system memory for communication.
Atomic Instructions
- atadd   atomic add
- atmod   atomic modify
3.5.1.4 Bit and Bit Field Instructions
The bit instructions operate on a specified bit in a register.
Bit Instructions
- setbit    set bit
- clrbit    clear bit
- notbit    not bit
- alterbit  alter bit
- scanbit   scan for bit
- spanbit   span over bit
Bit field instructions operate on a specified contiguous group of bits in a register. This group of bits can be from 0 to 32 bits in length.
Bit Field Instructions
- extract   extract field
- modify    modify field
- scanbyte  scan for byte
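The bit field operations of Section 3.5.1.4 act on a contiguous group of 0 to 32 bits in a register. The Python sketch below shows the general idea behind extracting a field and modifying selected bits under a mask. The function names and parameter layout are hypothetical and are not claimed to match the operand encoding of the extract and modify instructions.

```python
MASK32 = 0xFFFFFFFF


def field_mask(length):
    """Mask with `length` low-order ones, for field lengths of 0 to 32 bits."""
    if not 0 <= length <= 32:
        raise ValueError("field length must be 0 to 32 bits")
    return (1 << length) - 1


def extract_field(value, bit_position, length):
    """Return the `length`-bit field of value that starts at bit_position,
    right-justified in the result."""
    if not 0 <= bit_position <= 31:
        raise ValueError("bit position must be 0 to 31")
    return ((value & MASK32) >> bit_position) & field_mask(length)


def modify_field(value, mask, source):
    """Replace the bits of value selected by mask with the matching bits of
    source, leaving all other bits of value unchanged."""
    mask &= MASK32
    return ((source & mask) | (value & ~mask)) & MASK32


if __name__ == "__main__":
    word = 0xABCD1234
    print(hex(extract_field(word, 8, 8)))        # the second-lowest byte
    print(hex(modify_field(word, 0xFF00, 0)))    # clear that byte in place
```

Pairing the two helpers gives a read-modify-write of a packed field without disturbing its neighbours, which is the typical use in device control words.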
3.5.1.5 Branch Instructions
The branch instructions allow the direction of program
flow to be changed by explicitly modifying the
Instruction Pointer (IP). The target IP in a branch instruction is generally specified as a displacement to be
added to the current IP. The extended branch instructions allow IP calculation using any addressing mode.
The unconditional branch instructions always alter program flow when executed.
Unconditional Branch
Instructions
-b
- bx
branch
branch extended
The RISC branch-and-link instructions automatically
save a Return Instruction Pointer (RIP) before the
jump is taken. The RIP is the address of the instruction
following the branch and link.
Branch and Link Instructions
- bal    branch and link
- balx   branch and link extended
Conditional branch instructions alter program flow
only if the condition code flags in the arithmetic control
register match a value specified in the instruction. The
condition code flags indicate conditions of equality or
inequality between two operands in a previously executed instruction. The arithmetic control register and condition code flags are described in Section 3.6.
Based on a branch prediction flag located in the machine level instruction, the 80960CA will assume that
an instruction usually takes or does not take a conditional branch. By executing along the predicted path of
program flow, delays due to breaks in the instruction
stream are often avoided. This feature of the 80960CA
is referred to as branch prediction. The 80960CA incorporates the branch prediction feature because code using a conditional branch instruction usually favors a
single direction of program flow.
The branch prediction flag is specified at the assembly
level by appending a .t or .f to a conditional branch
instruction, meaning, respectively, "assume branch taken" or "assume branch not taken". For example, the
assembler mnemonic be.t means that the processor will
assume that this branch-if-equal instruction usually
branches when encountered. In the following tables, .p
represents the branch prediction flag.
Conditional Branch Instructions
- be.p     branch if equal
- bne.p    branch if not equal
- bl.p     branch if less
- ble.p    branch if less or equal
- bg.p     branch if greater
- bge.p    branch if greater or equal
- bo.p     branch if ordered
- bno.p    branch if unordered
Compare and conditional branch instructions compare
two operands, then branch according to the immediate
results.
Compare and Conditional Branch Instructions
- cmpibe.p    compare integer and branch if equal
- cmpibne.p   compare integer and branch if not equal
- cmpibl.p    compare integer and branch if less
- cmpible.p   compare integer and branch if less or equal
- cmpibg.p    compare integer and branch if greater
- cmpibge.p   compare integer and branch if greater or equal
- cmpibo.p    compare integer and branch if ordered
- cmpibno.p   compare integer and branch if unordered
- cmpobe.p    compare ordinal and branch if equal
- cmpobne.p   compare ordinal and branch if not equal
- cmpobl.p    compare ordinal and branch if less
- cmpoble.p   compare ordinal and branch if less or equal
- cmpobg.p    compare ordinal and branch if greater
- cmpobge.p   compare ordinal and branch if greater or equal
- bbs.p       check bit and branch if set
- bbc.p       check bit and branch if clear
3.5.1.6 Compare and Condition Test Instructions
The 80960CA provides several types of instructions
that are used to compare two operands. The condition
code flags in the arithmetic control register are set to
indicate whether one operand is less than, equal to, or
greater than the other operand.
Compare Instructions
- cmpi     compare integer
- cmpo     compare ordinal
- chkbit   check bit
Conditional compare instructions test the existing
status of the condition code flags before a compare is
performed. These conditional compare instructions are
provided to optimize two-sided range comparisons (i.e.
to test if a value is less than one number but greater
than another).
Conditional Compare Instructions
- concmpi   conditional compare integer
- concmpo   conditional compare ordinal
The compare and increment and compare and decrement instructions set the condition code flags based on
a comparison of two register sources, decrement or
increment one of the sources, and finally store the
result in a destination register.
- cmpinci   compare and increment integer
- cmpinco   compare and increment ordinal
- cmpdeci   compare and decrement integer
- cmpdeco   compare and decrement ordinal
The condition test instructions allow the state of the
condition code flags to be tested. Based on the outcome
of the comparison, a true or false code is stored in a
destination register. The branch prediction flag is used
in this instruction to reduce the execution time of the
instruction when the test outcome is predicted correctly. For example, teste.t (test if equal) will execute in a
shorter time if the condition code flags test true for the
equal condition. Analogous to the function of the
branch prediction flag in the conditional compare and
branch instructions, the prediction flag in this case
eliminates breaks in the micro-instruction sequence
which is used to implement the condition test instructions.
Condition Test Instructions
- teste.p    test if equal
- testne.p   test if not equal
- testl.p    test if less
- testle.p   test if less or equal
- testg.p    test if greater
- testge.p   test if greater or equal
- testo.p    test if ordered
- testno.p   test if not ordered
3.5.1.7 Call and Return Instructions
The 80960CA features an on-chip call and return
mechanism for making procedure calls to local and system procedures. The call instructions and the call and
return mechanism are described in Section 3.8.
Call and Return Instructions
- call    call
- callx   call extended
- calls   call system
- ret     return
3.5.1.8 Fault Instructions
The 80960CA will fault automatically as the result of
certain errant operations which may occur when executing code. Fault procedures are then invoked automatically to handle the various types of faults. In addition, the fault instructions permit a fault to be generated explicitly based on the value of the condition code
flags. The branch prediction flag in these instructions is
used to reduce the execution time of these instructions
when the state of the condition code flags is guessed
correctly.
Conditional Fault Instructions
- faulte.p    fault if equal
- faultne.p   fault if not equal
- faultl.p    fault if less
- faultle.p   fault if less or equal
- faultg.p    fault if greater
- faultge.p   fault if greater or equal
- faulto.p    fault if ordered
- faultno.p   fault if unordered
The syncf instruction causes the processor to wait for
all faults to be generated which are associated with any
prior uncompleted instructions.
- syncf   synchronize faults
3.5.1.9 Debug Instructions
The processor supports debugging and monitoring of
program activity through the use of trace events. The
debug instructions support debugging and monitoring
software.
Debug Instructions
- modtc   modify trace controls
- mark    mark
- fmark   force mark
3.5.1.10 Processor Management Instructions
The 80960CA provides several instructions for direct
control of processor functions and for configuring the
80960CA's peripherals. A brief description of the processor management instructions is given below.
Processor Management Instructions
- modpc      modify process controls
- modac      modify arithmetic controls
- sysctl     system control
- udma       update DMA SRAM
- sdma       setup DMA
- flushreg   flush local registers
3.6 Arithmetic Controls
The Arithmetic Control (AC) Register is a 32-bit on-chip
register (Figure 3-4). The AC register is used primarily
to monitor and control the execution of 80960CA arithmetic instructions. The processor reads and modifies
bits in the AC register when performing many arithmetic operations. The AC register is also used to control
the faulting conditions for some instructions. The
modac instruction allows the user to directly read or
modify the AC register.
The processor sets the condition code flags (bits 0-2) to
indicate equality or inequality as the result of certain
instructions (such as the compare instructions). Other
instructions, such as the conditional branch instructions, take action based on the value of the condition
code flags. Table 3-3 shows the functional assignment
for each condition code flag.
Table 3-3. Arithmetic Condition Codes
Condition Code    Condition
001               Greater Than
010               Equal
100               Less Than
The integer overflow flag (bit 8) and the integer overflow mask (bit 12) are used in conjunction with the
arithmetic integer overflow fault. The mask bit masks
the integer overflow fault. When the fault is masked,
and an integer overflow occurs, the integer overflow
flag is set but no fault handling action is taken. If the
fault is not masked, and an integer overflow occurs, the
integer overflow fault is taken and the integer overflow
flag is not set.
The no imprecise faults flag (bit 15) determines if imprecise faults are allowed to occur. Fault handling and
precise and imprecise faults in the 80960CA are discussed in Section 3.10.
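The field layout described in this section can be illustrated with a short Python sketch. It only models the bit positions and the masked/unmasked overflow rule stated above; the function names are hypothetical and are not part of any Intel tool or library.

```python
# Illustrative model of the AC register fields named in Section 3.6.
# All names here are hypothetical helpers, not an Intel API.
CC_MASK = 0b111                 # condition code flags, bits 0-2
OVERFLOW_FLAG = 1 << 8          # integer overflow flag, bit 8
OVERFLOW_MASK = 1 << 12         # integer overflow mask, bit 12
NO_IMPRECISE_FAULTS = 1 << 15   # no imprecise faults flag, bit 15

# Encodings listed in Table 3-3.
CONDITIONS = {0b001: "greater than", 0b010: "equal", 0b100: "less than"}


def decode_ac(ac):
    """Split a 32-bit AC register value into its named fields."""
    cc = ac & CC_MASK
    return {
        "condition": CONDITIONS.get(cc, "not listed in Table 3-3"),
        "integer_overflow_flag": bool(ac & OVERFLOW_FLAG),
        "integer_overflow_masked": bool(ac & OVERFLOW_MASK),
        "no_imprecise_faults": bool(ac & NO_IMPRECISE_FAULTS),
    }


def on_integer_overflow(ac):
    """Return (new_ac, fault_taken) for an integer overflow.

    Masked: the flag is set and no fault is taken.
    Unmasked: the fault is taken and the flag is left unchanged.
    """
    if ac & OVERFLOW_MASK:
        return ac | OVERFLOW_FLAG, False
    return ac, True
```

For instance, `decode_ac(0x1002)` reports the "equal" condition with the overflow fault masked.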
Figure 3-4. Arithmetic Control Register
(Bit fields: condition code, bits 0-2; integer overflow flag, bit 8; integer overflow mask, bit 12; no imprecise faults, bit 15. All other bits are reserved and should be initialized to 0.)
3.7 Process Management
Process management refers to the monitoring and control of certain properties of an executing process. The
following sections describe the mechanisms available on
the 80960CA to perform this function.
3.7.1 PROCESS CONTROL REGISTER
The Process Control (PC) Register (Figure 3-5) provides
access to process state information. The fields of
the PC register are described below.
Execution Mode Flag - This flag indicates whether the
processor is executing in user mode (0) or supervisor
mode (1).
Priority Field - This 5-bit field indicates the current executing priority of the processor. Priority values range
from 0 to 31, with 0 as the lowest and 31 as the highest
priority.
State Flag - This flag determines the executing state of
the processor. The processor state is either executing
state (0) or interrupted state (1).
Trace Enable Bit and Trace Fault Pending Flag - These fields control and monitor trace activity in the
processor. The Trace Enable Bit enables fault generation for trace events. The Trace Fault Pending Flag
indicates that a trace event has been detected.
The process controls can be modified by software with
the modify process controls (modpc) instruction. The
modpc instruction may only write the PC register when
the processor is in supervisor mode.
Figure 3-5. Process Control Register
(Bit fields: trace enable, bit 0; execution mode, bit 1; trace fault pending, bit 10; state, bit 13; priority, bits 16-20. All other bits are reserved and should be initialized to 0.)
3.7.2 PRIORITIES
The 80960 architecture defines a means to assign priorities to executing programs and interrupts. The current
priority of the processor is stored in the priority field of
the PC register. This priority is used to determine if an
interrupt will be serviced and in which order multiple
pending interrupts will be serviced. Setting the priority
of an executing program above that of interrupts allows
critical code to be prioritized and executed without interruption.
The priority field of the PC register can be modified
directly using the modpc instruction. The priority field
is also modified to reflect the priority of serviced interrupts. On a return from an interrupt routine, the priority of the processor is restored to its priority before the
interrupt occurred.
3.7.3 PROCESSOR STATES AND MODES
The 80960CA may execute programs in user mode or
supervisor mode. The user-supervisor protection mechanism allows a system to be designed in which kernel
code and data reside in the same address space as user
code and data, but access to the kernel procedures and
data is only allowed through a tightly controlled interface. This interface is the system call table and the interrupt mechanism. The 80960CA provides a supervisor pin (SUP) to implement memory systems which
protect code and data from possible corruption by programs executing in user mode. Some instructions and
functions of the 80960CA are also insulated from code
executing in user mode.
The processor has two operating states: executing and
interrupted. In executing state, the processor can execute in user or supervisor mode. In the interrupted
state, the processor always executes in supervisor mode.
3.8 Call and Return Mechanism
The 80960 architecture features a built-in call and return mechanism. This mechanism is designed to make
procedure calls simple and fast, and to provide a flexible method for storing and handling variables that are
local to a procedure. A call automatically allocates a
new set of local registers and a new stack frame. All
linkage information is maintained by the processor,
making procedure calls and returns virtually transparent to the user. A system call instruction is provided as
a method for calling privileged procedures such as a
kernel service. The call and return model supports efficient translation of structured high level code (such as
C or Ada) to 80960 machine language.
The procedure call and return mechanism provides a
number of significant benefits which contribute to the
performance and ease of use of the 80960CA.
1) The call and return instructions are implemented entirely on-chip, resulting in an extremely high performance implementation of these commonly used
functions.
2) A single instruction to implement each call or return
operation results in code density improvements compared to processors which require multiple instructions to encode these functions.
3) By implementing the call and return functions as
single instructions, the 80960 architecture is open for
further optimization of these instructions, while
maintaining assembly-level compatibility.
4) A program does not have to explicitly save or restore
the variables stored in the local registers when a call
or return is executed. The processor does this implicitly on procedure calls and on returns.
5) The call and return mechanism provides a structure
for storing a virtually unlimited number of local
variables for each procedure: the on-chip local registers provide quick access to often used variables and
the stack provides space for additional variables.
3.8.1 LOCAL REGISTERS AND THE STACK FRAME
At any point in a program, the 80960 has access to a
local register set and a section of the procedure stack
referred to as a stack frame. When a call is executed, a
new stack frame is allocated for the called procedure.
Additionally, the current local register set is saved by
the processor, freeing these registers for use by the newly called procedure. In this way, every procedure has a
unique stack frame and a unique set of local registers. When a
return is executed, the current local register set and
current stack frame are deallocated. The previous local
register set and previous stack frame are restored. This
call and return mechanism is illustrated in Figure 3-6
where n is procedure depth for the currently executing
procedure.
The procedure stack structure is defined by the 80960
architecture. The procedure stack always grows upward (i.e. towards higher addresses) and the stack
pointer (SP) always points to the next available byte of
the stack frame. The 80960CA requires that each stack
frame begins on a 16-byte boundary. Due to this alignment requirement, a padding space of 0 to 15 bytes may
exist between adjacent stack frames in memory. When
a stack frame is allocated, the first 16 words are always
assigned as storage for the local registers; therefore, the
SP initially points to the 17th word in the stack frame.
It should be noted that although each stack frame is
assigned storage space for the local registers, these locations in the stack are not guaranteed to contain the
values of the saved local registers. This is because several sets of local registers are cached on-chip rather than
written to the stack in external memory. This caching
mechanism is described in detail later in this section.
3.8.2 PROCEDURE LINKING
The 80960 architecture automatically manages procedure linkage. One global register and three local registers are reserved for procedure linkage information.
Figure 3-6. Call and Return Mechanism
(Diagram: nested calls from procedure 1 through procedure n, each with its own stack frame and saved register set. The stack grows toward higher addresses; the current registers and the stack pointer belong to procedure n, the one executing.)
Figure 3-7 describes the pointer structure used to link
frames and to provide a unique SP for each frame. Register g15 is the Frame Pointer (FP). The FP is the address of the first byte of the current (topmost) stack
frame. The FP is always updated to point to the current
frame when calls and returns are executed. Register r0
is the Previous Frame Pointer (PFP). The PFP is the
address of the first byte of the stack frame which was
created prior to the frame containing this PFP. Register
r1 is the Stack Pointer (SP). The SP points to the next
available byte of the stack frame. Register r2 is reserved
for the Return Instruction Pointer (RIP). The RIP is
the address of the instruction which follows a call instruction; this is also the target address for the return
from that procedure. The RIP is automatically stored
in register r2 of the calling procedure when a call is
executed.
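The frame allocation and linkage rules above can be sketched as a small Python model. This is only an illustration under stated assumptions: the `Frame` class, the `frames` dictionary keyed by frame address, and the function names are hypothetical, and the model ignores the on-chip register cache.

```python
# Illustrative model of stack frame linkage (FP, PFP, SP, RIP).
# Names are hypothetical; this is not processor microcode.
FRAME_ALIGN = 16           # each frame begins on a 16-byte boundary
LOCAL_REG_BYTES = 16 * 4   # first 16 words are reserved for r0-r15


class Frame:
    def __init__(self, fp, pfp):
        self.fp = fp                     # address held in g15 while current
        self.pfp = pfp                   # r0: previous frame pointer
        self.sp = fp + LOCAL_REG_BYTES   # r1: starts at the 17th word
        self.rip = None                  # r2: filled in when this frame calls


def align_up(addr, align=FRAME_ALIGN):
    """Round addr up to the next multiple of align."""
    return (addr + align - 1) & ~(align - 1)


def call(frames, current, return_ip):
    """Allocate a new frame above the caller's SP and link it."""
    current.rip = return_ip              # RIP saved in the caller's r2
    new = Frame(align_up(current.sp), pfp=current.fp)
    frames[new.fp] = new
    return new


def ret(frames, current):
    """Deallocate the current frame; resume the caller at its RIP."""
    caller = frames[current.pfp]
    del frames[current.fp]
    return caller, caller.rip
```

If a caller whose frame starts at 0x1000 has used 10 bytes of stack variables, its SP is 0x104A, so the new frame starts at 0x1050 and 6 bytes of padding separate the two frames.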
3.8.3 PARAMETER PASSING
Parameters may be passed by value or passed by reference between procedures. The global registers, the
stack, or predefined data structures in memory may be
used to pass these parameters.
The global registers provide the fastest method for passing parameters. The values to be passed into a procedure reside in the global registers of the calling procedure. When a procedure is called, the values in the
global registers are preserved. If more parameters are to
be passed than will fit in the global registers, additional
parameters may be passed in the stack of the calling
procedure, or in a data structure which is referenced by
a pointer passed in the global registers.
3.8.4 LOCAL REGISTER CACHE
The 80960CA provides an on-chip cache for saving and
restoring the local registers on calls and returns. This
cache greatly enhances performance of the call and return mechanism on the 80960CA. Movement of data
between the local registers and the register cache is typically accomplished in only 4 processor clocks with no
external bus traffic. When this cache is filled, the registers associated with the oldest stack frame are moved to
the area reserved for those registers on the physical
stack (Figure 3-7).
Figure 3-7. Stack Frame Linkage
(Diagram: the 16 on-chip global registers g0-g15, with g15 as the frame pointer; the previous and current frames on the stack, each beginning with the area reserved for local registers r0-r15 (r0 previous frame pointer, r1 stack pointer, r2 return instruction pointer) followed by optional stack variables; a padding area between frames; and unused stack above the current frame. The stack grows toward higher addresses.)
The local register cache is a physical extension of the
internal data RAM. The part of the data RAM used for
this cache is not visible to the user and is large enough
to hold up to 5 sets of local registers. The register cache
may be extended to hold up to 15 sets of local registers.
When extended, each new register set consumes 16
words of the user's data RAM, beginning at the highest
address and growing downward. The size of the local
register cache is selected when the processor is initialized.
In some cases, the contents of the cached local register
sets may require examination or modification (e.g. for
fault handling). Since the local registers are cached, the
flushreg instruction is provided to flush the local register cache to the locations reserved for the registers on
the stack. This ensures that the values in external memory are consistent with the values held in the local register cache.
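The spill and flush behavior described above can be sketched as follows. This is a simplified model, assuming a cache configured for between 5 and 15 register sets; the class and method names are hypothetical, and external memory is represented by a plain dictionary.

```python
from collections import deque

# Illustrative model of the local register cache; names are hypothetical.
WORDS_PER_SET = 16
DEFAULT_SETS = 5    # held in data RAM not visible to the user
MAX_SETS = 15       # maximum size when the cache is extended


class LocalRegisterCache:
    def __init__(self, num_sets=DEFAULT_SETS):
        if not DEFAULT_SETS <= num_sets <= MAX_SETS:
            raise ValueError("this sketch models 5 to 15 cached sets")
        self.num_sets = num_sets
        self.cached = deque()     # (frame pointer, registers), oldest first
        self.stack_memory = {}    # register sets written to the external stack

    def user_data_ram_words_used(self):
        """Words of user data RAM consumed by the extension."""
        return (self.num_sets - DEFAULT_SETS) * WORDS_PER_SET

    def save_on_call(self, fp, regs):
        """Cache the caller's registers, spilling the oldest set if full."""
        if len(self.cached) == self.num_sets:
            old_fp, old_regs = self.cached.popleft()
            self.stack_memory[old_fp] = old_regs
        self.cached.append((fp, regs))

    def flushreg(self):
        """Write every cached set to its reserved area on the stack."""
        while self.cached:
            fp, regs = self.cached.popleft()
            self.stack_memory[fp] = regs
```

With the default size, the sixth nested call spills only the oldest frame; `flushreg` then makes external memory consistent for all remaining frames.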
3.8.5 LOCAL AND SYSTEM CALLS
The 80960CA provides two methods for making procedure calls: local calls and system calls. Local and system calls differ in their operation and use in an application.
The local call instructions initiate a procedure call using the call and return mechanism described earlier.
The stack frames for these procedure calls are allocated
on the local procedure stack. A local call is made using
either of two local call instructions: call or callx. The
call instruction specifies the address of the called procedure using an IP plus displacement addressing mode
with a range of -2^23 to 2^23 - 4 bytes from the current
IP. The callx (call extended) instruction specifies the
address of the called procedure using any of the
80960's addressing modes.
A system call is made using the calls instruction. This
call is similar to a local call except that the processor
gets the IP for the called procedure from a data structure called the system procedure table. The calls instruction requires a procedure number operand. This
procedure number serves as an index into the system
procedure table, which contains IPs for specific procedures. The system procedure table is shown in Figure
3-8.
Figure 3-8. System Procedure Table
(Diagram: a table of procedure entries, the last being Procedure Entry 259 at byte offset 1084. Each entry holds a procedure address in bits 31-2 and an entry type in bits 1-0: 00 = local, 10 = supervisor. Other fields are marked reserved (initialize to 0) or preserved.)
The system call mechanism supports two types of procedure calls: system-local calls and system-supervisor
calls (also referred to as supervisor calls). The system-local call performs the same action as the local call
instructions with one exception: the IP target for a system-local call is fetched from the system procedure table. The supervisor call differs from the local call as
follows:
1) A supervisor call causes the processor to switch to
another stack (called the supervisor stack).
2) A supervisor call causes the processor to switch to
the supervisor execution mode and asserts the
80960CA's supervisor (SUP) pin for all bus accesses.
The system call mechanism offers several benefits. The
system call promotes the portability of application software. System calls are commonly used for kernel services. By calling these services with a procedure number
rather than a specific IP, application software does not
have to be changed each time the implementation of the
kernel service is modified. Additionally, the ability to
switch to a different execution mode and stack allows
kernel procedures and data to be insulated from application code.
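The lookup performed by a system call can be sketched in Python. The entry encoding (address in bits 31-2, type in bits 1-0 with 00 for local and 10 for supervisor) is taken from Figure 3-8; the function name and the list used to hold the table are hypothetical.

```python
# Illustrative lookup in the system procedure table; names are hypothetical.
LOCAL = 0b00        # entry type for a system-local call
SUPERVISOR = 0b10   # entry type for a supervisor call


def resolve_system_call(table, procedure_number):
    """Return (target_ip, switches_to_supervisor) for a calls operand."""
    entry = table[procedure_number]
    entry_type = entry & 0b11        # bits 1-0 select the call type
    address = entry & ~0b11          # bits 31-2 hold the procedure address
    if entry_type not in (LOCAL, SUPERVISOR):
        raise ValueError("entry type not described in Figure 3-8")
    return address, entry_type == SUPERVISOR
```

Because the application names only a procedure number, replacing an entry in `table` retargets the service without any change to the calling code.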
3.8.6 IMPLICIT PROCEDURE CALLS
The call and return mechanism described for procedure
calls applies to several classes of call instructions as
well as to the context switching initiated by interrupts
and faults. When an interrupt or fault condition occurs,
an implicit call is performed that saves the current state
of the processor before branching to the interrupt or
fault handling procedure. When this context switch occurs, the local registers are saved and a new stack frame
is allocated. Additionally, the values of the AC register
and PC register are saved when the implicit call occurs.
These values are restored on the return from the interrupt or fault handler.
3.9 Interrupts
An interrupt is a temporary break in the control stream
of a program so that the processor can handle another
task. Interrupts may be triggered by the instruction
stream or by hardware sources internal and external to
the 80960CA. An interrupt request is associated with a
vector (i.e. an address) of an interrupt handling procedure. The processor will branch to the handling procedure when an interrupt is serviced. When the handling
action is completed, the processor is restored to its state
prior to the interrupt.
3.9.1 INTERRUPT VECTORS AND PRIORITY
Interrupt vectors are simply instruction pointers (addresses) to interrupt handling procedures. The 80960
architecture defines 248 interrupt vectors. This means
that 248 unique interrupt handling procedures may be
used. An 8-bit interrupt vector number is associated
with each interrupt vector. This number ranges from 8
to 255. Each interrupt vector has a priority from 1 to
31, which is determined by the 5 most significant bits of
the interrupt vector number. Priority 1 is the lowest
priority and 31 is the highest. Priority 0 interrupts are
not defined.
The 80960CA executes with a unique priority ranging
from 0 to 31. When an interrupt is serviced, the processor's priority switches to the priority corresponding to
that of the interrupt request. When a return from an
interrupt procedure is executed, the process priority is
restored to its value prior to servicing the interrupt.
This priority switching is handled automatically by the
80960CA.
The 80960CA compares its current priority and the priority of an interrupt request to determine whether to
service an interrupt immediately or to delay service. If a
requested interrupt priority is greater than the processor's current priority or equal to 31, the processor services the interrupt immediately; otherwise, the processor
saves (posts) the interrupt request as a pending interrupt so that it can be serviced later. When the processor's priority falls below the priority of a pending interrupt, the pending interrupt is serviced. With the mechanism described, interrupts with a priority of 0 will never be serviced. For this reason, vectors numbered 0 to 7
are not defined.
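The priority rule above reduces to a shift and a comparison. The following Python sketch illustrates it; the function names are hypothetical.

```python
# Illustrative priority arithmetic for interrupt vectors; names are hypothetical.
def vector_priority(vector):
    """Priority is the 5 most significant bits of the 8-bit vector number."""
    if not 8 <= vector <= 255:
        raise ValueError("vectors 0 to 7 are not defined")
    return vector >> 3


def services_immediately(vector, current_priority):
    """True if the request is serviced now, False if it is posted."""
    priority = vector_priority(vector)
    return priority > current_priority or priority == 31
```

Vector 82, for example, has priority 10: it is posted while the processor runs at priority 10 or above and serviced immediately at priority 9 or below. A priority 31 request is serviced even when the processor itself is at priority 31.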
3.9.2 INTERRUPT TABLE
The interrupt table (Figure 3-9) is an architecturally
defined data structure which holds the interrupt vectors
and information on pending interrupts. The first 36
bytes of the table are used to post interrupts. Each of the 31
most significant bits in the 32-bit pending priorities
field represents a possible priority (1 to 31) of a pending
interrupt. When the processor posts an interrupt in the
interrupt table, the bit corresponding to the interrupt's
priority is set. For example, if an interrupt with a priority of 10 is posted in the interrupt table, bit 10 is set in
the pending priorities field.
The pending interrupts field contains a 256-bit string in
which each bit represents an interrupt vector. When the
processor posts an interrupt in the interrupt table, the
bit corresponding to the vector number of that interrupt is set.
Portions of the interrupt table are cached on-chip in a
non-transparent fashion. This caching is implemented
to minimize interrupt latency by reducing the number
of accesses to the table in external memory when an
interrupt is serviced.
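Posting an interrupt can be sketched as two bit-set operations. In the Python model below, the class and function names are hypothetical, the 256-bit pending interrupts string is held as a single integer with bit N standing for vector N (an assumption about bit ordering), and the vector entry offsets are inferred from the byte offsets labeled in Figure 3-9 (Vector 8 at 36, Vector 255 at 1024).

```python
# Illustrative model of the posting fields of the interrupt table.
# Names are hypothetical; bit ordering in the 256-bit string is assumed.
class InterruptTable:
    def __init__(self):
        self.pending_priorities = 0   # 32-bit field at byte offset 0
        self.pending_interrupts = 0   # 256-bit string starting at byte offset 4

    def post(self, vector):
        """Record a pending interrupt for the given vector number."""
        if not 8 <= vector <= 255:
            raise ValueError("vectors 0 to 7 are not defined")
        self.pending_priorities |= 1 << (vector >> 3)
        self.pending_interrupts |= 1 << vector


def vector_entry_offset(vector):
    """Byte offset of a vector entry, inferred from Figure 3-9."""
    return 36 + 4 * (vector - 8)
```

Posting vector 82 sets bit 10 of the pending priorities field, matching the priority 10 example in the text.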
Figure 3-9. Interrupt Table
(Diagram: the pending priorities field at byte offset 0, the pending interrupts field beginning at byte offset 4, then the vector entries from Vector 8 at byte offset 36 through Vector 255 at byte offset 1024. Each vector entry holds the procedure address in bits 31-2.)
3.9.3 INTERRUPT STACK
Stack frames for interrupt handling procedures are allocated on a separate interrupt stack. The interrupt stack
can be located anywhere in the processor's address
space. The beginning address of the interrupt stack is
specified when the processor is initialized.
3.9.4 INTERRUPT HANDLING ACTION
When an interrupt is serviced, the processor saves the
processor state and calls the interrupt procedure. The
processor state is restored upon return from the interrupt procedure.
This interrupt service mechanism is handled by an implicit call operation. When the interrupt is serviced, the
current local registers are saved. A new local register
set and stack frame are allocated on the interrupt stack
for the interrupt handler procedure and the processor
switches to supervisor execution mode. In addition to
the local registers, the current values of the AC and PC
registers are saved as an interrupt record on the interrupt stack.
3.9.5 PENDING INTERRUPTS
Any of the 248 interrupts can be requested by software.
The system control instruction (sysctl) is provided to
support this feature. When the system control instruction requests an interrupt, one of two actions may occur, depending on the priority of the requested interrupt
and the current process priority: 1) the interrupt is
serviced immediately, or 2) the interrupt is posted (the
pending priorities field and the pending interrupts field
are modified to reflect a pending interrupt).
Interrupts may also be requested by hardware sources
internal and external to the 80960CA. Managing the
hardware sources and posting these interrupts is handled by the interrupt controller. Interrupts requested by
hardware are posted in an internal register, not in the
interrupt table. A mask register enables or disables interrupts from each hardware source. Requesting and
posting hardware interrupts is described in Section 4.4
Interrupt Controller.
3.9.6 INTERRUPT LATENCY
The time required to perform an interrupt task switch
is referred to as the interrupt latency. The latency is the
time measured between the activation of an interrupt
source and the execution of the first instruction for the
interrupt-handling procedure for the source.
Interrupt latency for the 80960CA varies depending on
conditions such as:
- Complex instructions are executing when the interrupt occurs (e.g. sysctl, call, ret, etc.).
- Outstanding loads to a local register are pending,
delaying the interrupt context switch.
- Division, multiplication, or other multi-cycle instructions with a local register as destination are
executing.
The 80960CA has been designed to optimize latency
and throughput for interrupts. Two processor features
are designed for this purpose:
First, in the interrupt table, all interrupt vectors with
an index whose least significant four bits are 0010 (binary) can
be cached in internal data RAM. The processor will
automatically read these vectors from data RAM when
the interrupt is serviced. This feature reduces the added
latency due to an external access of the interrupt table
for that vector. The NMI vector is always cached in
data RAM.
Second, an instruction cache locking mechanism allows
interrupt procedures or segments of interrupt procedures to be stored in the instruction cache. These routines are always executed from the internal cache, eliminating external code fetches, reducing latency and
increasing throughput for the interrupt.
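The vector caching rule from Section 3.9.6 can be made concrete with a short Python sketch. It assumes that the "index" in the text is the interrupt vector number; the names are hypothetical, and the NMI vector, which is always cached, is not modeled.

```python
# Illustrative enumeration of vectors eligible for caching in data RAM.
# Assumes the "index" in the text is the vector number; names are hypothetical.
def is_cacheable_vector(vector):
    """True if the low four bits of a defined vector number are 0010."""
    return 8 <= vector <= 255 and (vector & 0b1111) == 0b0010


CACHEABLE_VECTORS = [v for v in range(8, 256) if is_cacheable_vector(v)]
```

Under this reading the eligible vectors are spaced 16 apart, so a latency-critical handler would be assigned one of these vector numbers.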
3.10 Fault Handling and Instruction Tracing
The 80960CA is able to detect various conditions in
code or in its internal state that could cause the processor to deliver incorrect or inappropriate results or that
could cause it to head down an undesirable control
path. These conditions are referred to as faults. The
80960 architecture provides fault handling mechanisms
to detect and, in most cases, fully recover from a fault.
The 80960CA provides on-chip debug support by triggering trace events and servicing the trace fault. A trace
event is activated when a particular instruction or type
of instruction is encountered in an instruction stream.
The trace event optionally signals a fault. A fault handling procedure for the trace fault can act as a debug
monitor and analyze the state of the processor when the
trace event occurred.
3.10.1 FAULT TYPES AND SUBTYPES
All of the faults that the processor detects are predefined. These faults are divided into types and subtypes, each of which is given a number. Table 3-4 lists
the faults that the processor detects arranged by type
and subtype.
Table 3-4. Fault Types and Subtypes
Fault Type    Fault Subtype             Fault Record
Parallel                                XX00 00XX
Trace         Instruction Trace         XX01 0002
              Branch Trace              XX01 0004
              Call Trace                XX01 0008
              Return Trace              XX01 0010
              Prereturn Trace           XX01 0020
              Supervisor Trace          XX01 0040
              Breakpoint Trace          XX01 0080
Operation     Invalid Opcode            XX02 0001
              Unimplemented             XX02 0002
              Invalid Operand           XX02 0004
Arithmetic    Integer Overflow          XX03 0001
              Arithmetic Zero-Divide    XX03 0002
Constraint    Range                     XX05 0001
              Privileged                XX05 0002
Protection    Length                    XX07 0001
Type          Mismatch                  XX0A 0001
NOTE:
X refers to preserved locations in the fault record.
Figure 3-10. Fault Table
(Diagram: fault table entries at byte offsets 0 (parallel), 8 (trace), 16 (operation), 24 (arithmetic), 40 (constraint), 56 (protection) and 80 (type); the remaining entries up to byte offset 252 are reserved and initialized to 0. A local procedure entry holds the address of the fault handler; a system procedure table entry holds an index, with its second word set to 0000027F (hex).)
Figure 3-11. Fault Record
(Diagram: Process Controls at byte offset 0; Arithmetic Controls at byte offset 4; a word holding the Fault Flags, Fault Type and Fault Subtype fields, with reserved bits, at byte offset 8; Address of Faulting Instruction at byte offset 12.)
3.10.2 FAULT TABLE
The fault table (Figure 3-10) provides the processor
with a pathway to fault handling procedures. The fault
table is an architecture-defined data structure, which
may be located anywhere in the processor's address
space. The location of the fault table is specified at initialization. When a fault occurs, an entry in the table is
selected based on the type of fault that occurs. The
entry in the fault table contains a pointer to a specific
fault handler.
The fault table can contain two types of entries (Figure
3-10). The first type of entry is simply a pointer to the
address of the fault-handling procedure. The second
type of entry is an index into the system-procedure table. Fault-handling procedures accessed through the
system-procedure table may be executed in user or supervisor execution mode.
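The entry selection described above can be sketched in C. This is only an illustrative model, not Intel code: the function names are hypothetical, the 8-byte entry spacing is read from the offsets in Figure 3-10, and the two-bit entry-type encoding (00 for a local procedure entry, 10 for a system-procedure-table entry) is an assumption based on the general i960 architecture convention, since it is not legible in this copy of the figure.

```c
#include <stdint.h>

/* Each fault table entry is assumed to occupy two words (8 bytes),
 * so the entry for fault type N starts at byte offset 8 * N. */
enum { FAULT_ENTRY_BYTES = 8 };

uint32_t fault_entry_offset(uint32_t fault_type)
{
    return fault_type * FAULT_ENTRY_BYTES;
}

/* ASSUMPTION: the two low-order bits of the first entry word select
 * the entry kind (00 = local procedure, 10 = system procedure table). */
typedef enum { ENTRY_LOCAL = 0, ENTRY_SYSTEM = 2 } fault_entry_kind;

fault_entry_kind fault_entry_kind_of(uint32_t word0)
{
    return (fault_entry_kind)(word0 & 0x3u);
}

/* Local entry: handler address with the kind bits masked off. */
uint32_t fault_entry_address(uint32_t word0)
{
    return word0 & ~0x3u;
}

/* System entry: index into the system-procedure table (assumed to sit
 * directly above the kind bits). */
uint32_t fault_entry_index(uint32_t word0)
{
    return word0 >> 2;
}
```

With these helpers, a Type fault (type 0AH) selects the entry at offset 80, which agrees with the offsets shown in Figure 3-10.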
3.10.3 FAULT HANDLING ACTION
When a fault occurs, the processor performs an implicit
call operation to the procedure specified in the fault
table. In addition to performing the implicit call operation, the processor creates a fault record in its newly
allocated stack frame. This fault record contains information on the state of the processor when the fault
occurred and the fault type and subtype (Figure 3-11).
Some faults can be recovered from easily. When recovery from a fault is possible, the processor's fault handling mechanism allows the processor to automatically
resume work where the fault was signalled. The resumption action is initiated with the ret instruction. If
simple recovery from a fault is not possible, then the
fault handling procedure may call a debug monitor, initiate a reset, or take other actions to recover from the
fault.
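A fault handler typically begins by reading the fault type and subtype out of the fault record. The following sketch shows one way to do that in C. It assumes that the fault record values in Table 3-4 (written XXTT SSSS) mean that the type occupies bits 23 through 16 and the subtype the low 16 bits of the word, with the X positions preserved; the function and type names are hypothetical.

```c
#include <stdint.h>

typedef struct {
    uint8_t  type;     /* e.g. 0x01 = Trace, 0x03 = Arithmetic */
    uint16_t subtype;  /* e.g. 0x0002 = Arithmetic Zero-Divide */
} fault_id;

/* Split the type/subtype word of the fault record.
 * ASSUMPTION: layout inferred from the "XXTT SSSS" notation in Table 3-4. */
fault_id decode_fault_word(uint32_t word)
{
    fault_id id;
    id.type    = (uint8_t)((word >> 16) & 0xFFu);
    id.subtype = (uint16_t)(word & 0xFFFFu);
    return id;
}

/* Map a fault type number to the name used in Table 3-4. */
const char *fault_type_name(uint8_t type)
{
    switch (type) {
    case 0x00: return "Parallel";
    case 0x01: return "Trace";
    case 0x02: return "Operation";
    case 0x03: return "Arithmetic";
    case 0x05: return "Constraint";
    case 0x07: return "Protection";
    case 0x0A: return "Type";
    default:   return "Unknown";
    }
}
```

A handler that cannot recover from the decoded fault would then take one of the alternative actions described above, such as calling a debug monitor.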
3.10.4 TRACING AND DEBUG
The 80960CA provides a facility for monitoring the activity of the processor by tracing the instruction stream.
A trace event occurs at points in a program where certain types of instructions are encountered or a certain
IP or data address is encountered. When a trace event
occurs, a trace fault can be generated and a trace-fault
handler called which displays or analyzes the state of
the processor.
3.10.4.1 Trace Events
The Trace Control (TC) Register (Figure 3-12) is used
to specify the types of instructions which cause trace
events. When a mode bit in the TC register is set, specific instructions will generate trace events. For example, if the branch trace mode bit is enabled and a
branch instruction is executed, a branch trace event will
be signalled. An event flag is used to record trace
events. A single event flag is provided for each mode
bit. Any trace event generates a trace fault when the
trace enable bit in the process control register is set.
The 80960CA recognizes 7 trace events. These events
are described below.
Instruction Trace Event-Signalled each time an instruction is executed. This trace event can be used with
a debug monitor to single step the processor.
Branch Trace Event-Signalled each time a branch instruction is executed. For conditional branch instructions, this event is only signalled when the branch is
taken. Branch-and-link, call, and return instructions do
not signal this trace event.
Call Trace Event-Signalled each time a branch-and-link or call instruction is executed. Implicit calls, such
as those used in interrupt or fault handling, signal this
event. When a call trace event occurs, the prereturn
trace flag (bit 3 in local register r0) is set by the processor to indicate a prereturn trace pending.
Pre-Return Trace Event-Signalled just prior to any ret
instruction. This event is only signalled if the pre-return
trace flag in register r0 is set. Since the pre-return trace
flag is set when a call trace event occurs, the call trace
mode must be enabled before a pre-return trace event
can be signalled.
Return Trace Event-Signalled each time a ret instruction is executed.
[Figure 3-12. Trace Control Register (270669-15). Trace mode bits occupy bits 1 through 7: Instruction Trace Mode, Branch Trace Mode, Call Trace Mode, Return Trace Mode, Prereturn Trace Mode, Supervisor Trace Mode, Breakpoint Trace Mode. Bits 17 through 27 hold the trace event flags (Instruction Trace, Branch Trace, Call Trace, Return Trace, Prereturn Trace, Supervisor Trace, Breakpoint Trace) and the breakpoint flags (Data Address Breakpoint 0, Data Address Breakpoint 1, Instruction Address Breakpoint 0, Instruction Address Breakpoint 1). All other bits are reserved (initialize to 0).]
Supervisor Trace Event-Signalled each time a calls
instruction is executed where the selected entry type is
supervisor, or when a ret from supervisor mode is executed.
Breakpoint Trace Events-Signalled each time a mark
instruction, fmark instruction, or specified address is
encountered in the instruction stream. The mark instruction signals an event only when the breakpoint trace
mode is enabled; the fmark (force mark) instruction
will generate a breakpoint trace event regardless of the
value of the breakpoint trace mode bit.
Two IP breakpoint registers and two internal data address breakpoint registers are provided on the
80960CA. These breakpoints are loaded with an instruction or data address using the system control
(sysctl) instruction. When the address is encountered
and the breakpoint trace mode bit is set, a breakpoint
trace event occurs. A corresponding instruction or data
address event flag is set in the TC register when the
address is encountered.
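The interaction between mode bits, event flags, and the trace enable bit can be modelled with a few lines of C. This is a software sketch only. The mode bit positions (bits 1 through 7) follow Figure 3-12 and are assumed to be in the same order as the trace subtypes in Table 3-4; placing each event flag 16 bits above its mode bit is also an assumption, and the function name is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Trace mode bits in the TC register (assumed order, bits 1..7). */
enum {
    TC_INSTRUCTION = 1u << 1,
    TC_BRANCH      = 1u << 2,
    TC_CALL        = 1u << 3,
    TC_RETURN      = 1u << 4,
    TC_PRERETURN   = 1u << 5,
    TC_SUPERVISOR  = 1u << 6,
    TC_BREAKPOINT  = 1u << 7
};

/* ASSUMPTION: the event flag for a mode bit sits 16 bits above it. */
#define TC_EVENT_FLAG(mode_bit) ((uint32_t)(mode_bit) << 16)

/* Model of signalling one trace event.
 * - The event is recorded only if its mode bit is set, unless it is
 *   forced (as the fmark instruction does for breakpoint events).
 * - A trace fault is generated only if the trace enable bit of the
 *   process controls (passed in as trace_enable) is also set.
 * Returns true when a trace fault should be raised. */
bool signal_trace_event(uint32_t *tc, uint32_t mode_bit,
                        bool forced, bool trace_enable)
{
    if (!forced && (*tc & mode_bit) == 0) {
        return false;               /* mode disabled: no event */
    }
    *tc |= TC_EVENT_FLAG(mode_bit); /* record the event */
    return trace_enable;
}
```

For example, with only the branch trace mode bit set, a taken branch records a branch event, while a call instruction records nothing.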
3.10.5 PROCESSOR INITIALIZATION
The Initial Memory Image (IMI) comprises the data structures needed to initialize the 80960CA (Figure 3-13).
The initialization boot record, in reserved memory beginning at FFFFFF00H, contains a pointer to the Processor Control Block (PRCB). The PRCB in turn holds
pointers to the data structures which are necessary to
execute code on the 80960CA. The PRCB also holds
several fields which contain information to initially
configure the 80960CA.
Processor initialization begins by asserting the RESET
pin. At initialization the processor optionally performs
an internal self-test. A bus confidence test is also performed by calculating a checksum of 8 words read from
external memory. If either of these self-tests fails, the
FAIL pin indicates the failure and the processor aborts
initialization. If the self-tests pass, the 80960CA continues with initialization and branches to the first address of the user's code.
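The PRCB layout shown in Figure 3-13 maps naturally onto a C structure. The sketch below is illustrative: the field names are invented for this example, the word at offset 18H is unlabeled (shaded) in the figure and is assumed here to be reserved, and the address macros simply restate the initialization boot record addresses from the figure.

```c
#include <stddef.h>
#include <stdint.h>

/* Processor Control Block, byte offsets as listed in Figure 3-13. */
typedef struct {
    uint32_t fault_table_base;      /* 00H */
    uint32_t control_table_base;    /* 04H */
    uint32_t ac_initial_image;      /* 08H */
    uint32_t fault_config;          /* 0CH */
    uint32_t interrupt_table_base;  /* 10H */
    uint32_t sys_proc_table_base;   /* 14H */
    uint32_t reserved_18;           /* 18H: assumed reserved */
    uint32_t interrupt_stack_ptr;   /* 1CH */
    uint32_t icache_config;         /* 20H */
    uint32_t regcache_config;       /* 24H */
} prcb_t;

/* Compile-time checks that the structure matches the listed offsets. */
_Static_assert(offsetof(prcb_t, interrupt_stack_ptr) == 0x1C,
               "interrupt stack pointer must sit at offset 1CH");
_Static_assert(sizeof(prcb_t) == 0x28, "PRCB sketch must be 10 words");

/* Initialization boot record addresses from Figure 3-13. */
#define IBR_BASE         0xFFFFFF00u
#define IBR_FIRST_IP     (IBR_BASE + 0x10u)  /* First Instruction Pointer */
#define IBR_PRCB_PTR     (IBR_BASE + 0x14u)  /* PRCB Pointer */
#define IBR_CHECK_WORDS  (IBR_BASE + 0x18u)  /* first of 6 check words */
```

The first instruction pointer, the PRCB pointer, and the six check words together make up the 8 words read by the bus confidence test.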
[Figure 3-13. Initial Memory Image (270669-19).

Fixed data structures (Initialization Boot Record):
Address                    Contents
FFFFFF00H                  Bus Configuration (least significant byte of each word)
FFFFFF10H                  First Instruction Pointer
FFFFFF14H                  PRCB Pointer
FFFFFF18H-FFFFFF2CH        6 Check Words (for bus confidence self-test)

Relocatable data structures: User Code, Control Table, Interrupt Table, System Procedure Table, other architecturally defined data structures (not required as part of IMI), and the Processor Control Block (PRCB):
Byte Offset    Contents
0H             Fault Table Base Address
4H             Control Table Base Address
8H             AC Register Initial Image
CH             Fault Configuration Word
10H            Interrupt Table Base Address
14H            System Procedure Table Base Address
18H            (shaded in the figure)
1CH            Interrupt Stack Pointer
20H            Instruction Cache Configuration Word
24H            Register Cache Configuration Word]
4.0 80960CA SYSTEM
IMPLEMENTATION
This section is an overview of the peripherals integrated
with the 80960CA core. The features and operation of
the Bus Controller, DMA Controller, Interrupt Controller, and the interfaces between these peripherals and
the core are described.
4.1 Peripheral Interface
A program communicates with the on-chip peripherals
by reading or modifying the special function registers
(SFRs) or by loading control registers. The SFRs generally serve to transfer status information and data between a peripheral and the core, and the control registers serve to configure the peripherals. SFRs are accessed directly as instruction operands. The control
registers are loaded by using the system control (sysctl)
instruction.
4.2 Bus Controller Unit
The Bus Controller Unit (BCU) manages the data and
instruction path between the 80960CA and external
memory. Data operations and instruction fetches share
a 32-bit data bus. Memory addresses are output on a
separate 32-bit address bus. The BCU incorporates several advanced features to simplify the bus interface to
external memory. A programmable memory region configuration table allows the characteristics of the external bus to be programmed differently for 16 separate
regions in memory. The attributes of the external bus
which are programmable include wait states and external ready control, data bus width (8, 16, or 32 bits),
burst mode, address pipelining, and byte ordering. The
region programmable bus options are described in this
section.
4.2.1 BUS TRANSFERS, ACCESSES, AND
REQUESTS
The distinction between transfer, bus access, and bus
request, as these terms apply to the 80960CA, must be
presented before beginning a discussion of the BCU.
Transfer-A bus transfer is defined simply as a movement of code or data between a memory system and the
80960CA. A write transfer occurs when the memory
system is the destination of a data movement. A read
transfer occurs when the 80960CA is the destination
for a data or a code fetch from memory.
Bus Access-A bus access is defined as an address cycle
and one or more transfers. In burst mode, an access can
consist of a single address cycle and 1 to 4 transfers.
Bus Request-A bus request is issued by the core and
directed to the Bus Controller. A bus request is sent to
the BCU when a load, store, or an atomic instruction is
executed, or when an instruction fetch is needed. Bus
requests are also issued by the core to perform DMA
transfers. A bus request can consist of one or more bus
accesses. For example, an aligned word (32-bit) request
to an 8-bit memory region will result in four byte-length accesses.
4.2.2 BUS CONTROL COPROCESSOR
The 80960CA's peripherals are often referred to as coprocessors, since their operation is decoupled from the
execution of the instruction stream. As an integrated
coprocessor, the BCU receives bus requests and independently carries out the action of moving data or code
between the processor and external memory. The BCU
uses a three deep queue to store pending bus requests.
The queue decouples the core from the BCU, since a
series of adjacent requests may be issued faster than the
BCU can service each request. Two of the three queue
entries store requests from a user's program (loads,
stores, fetches, etc.). The third queue entry is used by
requests originating from a DMA operation. This
queue entry takes user requests when the DMA is
turned off. The 80960CA alternates service of requests
issued by the user program and requests issued by a
DMA operation.
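The relationship between requests, accesses, and transfers defined in Section 4.2.1 can be made concrete with a small calculation. The following C sketch is a simplified model with hypothetical function names: it assumes an aligned request and assumes that, in burst mode, up to 4 transfers are always packed into each access.

```c
#include <stdint.h>

/* Number of transfers needed to move request_bytes over a bus that is
 * bus_width_bits wide (8, 16, or 32). */
uint32_t transfers_for_request(uint32_t request_bytes, uint32_t bus_width_bits)
{
    uint32_t bus_bytes = bus_width_bits / 8u;
    return (request_bytes + bus_bytes - 1u) / bus_bytes;
}

/* Number of bus accesses (address cycles) for the same request.
 * Without burst mode each transfer needs its own access; with burst
 * mode one access carries 1 to 4 transfers (simplifying assumption:
 * bursts are always filled). */
uint32_t accesses_for_request(uint32_t request_bytes,
                              uint32_t bus_width_bits, int burst)
{
    uint32_t transfers = transfers_for_request(request_bytes, bus_width_bits);
    return burst ? (transfers + 3u) / 4u : transfers;
}
```

Under this model, the example from the text, an aligned word request to a non-burst 8-bit region, gives `accesses_for_request(4, 8, 0)`, which evaluates to four byte-length accesses.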
4.2.3 SIGNAL DESCRIPTIONS
The external bus signals consist of 30 address signals, 4
byte enables, 32 data lines, and various control signals.
D31-D0     32-bit Data Bus (bi-directional): 32-, 16-, and 8-bit values are transmitted and received on these lines. The 8- and 16-bit quantities are transferred on the low order data lines when a memory region is configured respectively for an 8- or 16-bit bus.
A31-A2     30-bit Address (outputs): The 30-bit address bus identifies all external addresses to word (4-byte) boundaries. The byte enable lines indicate the selected byte in each word.
BE3-BE0    Byte Enables (outputs): The byte enables select which of 4 addressed bytes are active in a memory access. When a memory region is configured for an 8-bit bus width, BE1 and BE0 act as the lower two bits of the address. For a 16-bit memory region, BE1, BE3, and BE0 are encoded to provide A1, BHE, and BLE respectively.
W/R        Write or Read (output): This signal is low for read accesses and high for write accesses.
ADS        Address Strobe (output): Indicates valid address and the start of a new bus access.
DT/R       Data Transmit or Receive (output): Direction control for data transceivers; similar to W/R.
DEN        Data Enable (output): Low during a bus request after the first address cycle. This signal is used to control data transceivers and to indicate the end of a bus request.
WAIT       Wait (output): Indicates that wait states are being inserted by the internal wait state generator.
READY      Ready (input): Signals that data is valid for a read transfer or ends data hold for a write transfer. This function can be disabled for a memory region.
BTERM      Burst Terminate (input): Terminates a burst access. Another address is generated to complete the request when the signal is deasserted. This function can be disabled for a memory region.
LOCK       Lock (output): Indicates that an atomic memory operation is in progress. This signal can be used to inhibit external agents from modifying memory which is atomically accessed.
BLAST      Burst Last (output): Indicates the last transfer in a burst access.
HOLD       Hold (input): HOLD can be used by a bus requester to request access to the bus. The processor asserts HOLDA after the current bus request or locked requests have completed.
HOLDA      Hold Acknowledge (output): Indicates to a bus requester that the processor has relinquished control of the bus.
BREQ       Bus Request (output): Indicates that requests are queued in the bus controller and are waiting to be serviced. BREQ can be used for external bus arbitration logic in conjunction with HOLD and HOLDA to regain bus mastership.
D/C        Data or Code (output): Indicates a data transfer or a code fetch.
DMA        DMA Access (output): Indicates that a bus request was initiated by either the user program or the DMA.
SUP        Supervisor Access (output): Indicates that a bus access originated from a bus request issued in supervisor mode. This signal can be used to protect system data structures or peripherals from errant modification by the user code.
Figure 4-1 shows the timing for a simple, non-burst,
non-pipelined read and write access. The timing relations for the key control signals are shown in this figure.
[Figure 4-1. Basic Read and Write Request (270669-20). Timing diagram showing PCLK, ADS, the address lines, DMA, D/C, BE3:0, W/R, BLAST, DT/R, and DEN for a non-burst, non-pipelined read access followed by a write access.]
4.2.4 MEMORY REGION CONFIGURATION
TABLE
The BCU can be configured differently for 16 separate
sections (referred to as regions) of the address space.
The four most significant bits of a memory address define the location of each region in memory. The bus
characteristics in a region are specified in the memory
region configuration table. When a bus request is serviced, the BCU accesses the configuration table entry for
the region addressed and services the request based on
the bus characteristics programmed for that region.
The characteristics programmed for each region are
listed below:
Burst Mode            (on/off)
Ready Inputs          (on/off)
Address Pipelining    (on/off)
Wait States           (5 parameters)
Byte Ordering         (Big/Little Endian)
Bus Width             (8-, 16-, or 32-bit)
The flexibility of region programming simplifies the bus
interface in applications where a memory system is
made up of a variety of sub-systems, such as SRAM,
DRAM, ROM, and memory mapped peripherals. Each
memory sub-system can be mapped into a different region in memory, and that region can be configured specifically for the requirements of the particular memory
sub-system.
The configuration table is made up of 16 on-chip control registers (Figure 4-2). Each register is programmed
with the configuration information for a single region.
Since the region table is located on-chip, access to region information does not affect the performance of the
bus.
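Because the region is selected by the four most significant address bits, looking up the bus characteristics for an address reduces to a shift and an array index. The C sketch below models this in software. The structure is a convenience for illustration only and does not reflect the bit layout of the on-chip control registers; all names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Software model of the per-region characteristics listed above.
 * This is NOT the hardware register format. */
typedef struct {
    bool    burst;           /* Burst Mode on/off */
    bool    ready_enabled;   /* Ready Inputs on/off */
    bool    pipelined;       /* Address Pipelining on/off */
    bool    big_endian;      /* Byte Ordering */
    uint8_t bus_width_bits;  /* 8, 16, or 32 */
    uint8_t nrad, nrdd, nwad, nwdd, nxda;  /* 5 wait state parameters */
} region_config;

/* Region number = upper 4 bits of the 32-bit address. */
unsigned region_of(uint32_t address)
{
    return (unsigned)(address >> 28);
}

/* Return the configuration entry that governs the given address. */
const region_config *lookup_region(const region_config table[16],
                                   uint32_t address)
{
    return &table[region_of(address)];
}
```

Each region therefore covers 256 Mbytes of the address space; for example, the initialization boot record at FFFFFF00H falls in region 15.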
4.2.4.1 Burst Accesses
The 80960CA BCU is capable of burst accesses to
memory systems which are designed to support this feature. Burst mode is intended to get the most performance from low cost memory systems. A burst access is a
single address cycle followed by successive data or instruction transfers. The transfers reference data or instructions at sequential addresses starting at the address
which began the burst access (Figure 4-3). In a burst
memory system, the upper 28 bits of an address remain
fixed while the lower two bits A2 and A3 increment to
access subsequent locations.
Wait state timing for the first access of a burst request
is controlled independently from the timing for subsequent accesses. A memory sub-system using static column mode or page mode DRAMs, for example, can
take advantage of the short column access times for
these devices by using burst mode. Interleaved ROM or
EPROM systems can also be constructed which simultaneously access several words and then use burst mode
to multiplex the multi-word array onto the data bus.
[Figure 4-2. Memory Region Configuration Table (270669-22). The memory region configuration table holds 16 region table entries, numbered 0 through 15, one for each region. Unused fields in a region table entry are reserved (initialize to 0).]
[Figure 4-3. Burst Memory Request (270669-21). Read request timing showing Clock, Address, and Data: a single address cycle is followed by successive data transfers within one bus access.]
[Figure 4-4. Programmable Wait States (270669-23, 270669-24). Read request and write request timing showing the wait state counter, Clock, Address, and Data for a burst access. In the read example, NRAD = 3 and NRDD = 2. In the write example, NWDD = 2 and NXDA = 3.]
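As a rough illustration of the wait state parameters shown in Figure 4-4 (NRAD, NRDD, NWAD, NWDD, and NXDA, which are defined in Section 4.2.4.2), the total number of internally generated wait states for one access can be computed as follows. This is a sketch under the assumption that the ready input is disabled, so that all wait states come from the internal generator; the type and function names are hypothetical.

```c
#include <stdint.h>

typedef struct {
    uint32_t nrad;  /* read: address cycle to first data cycle */
    uint32_t nrdd;  /* read: between consecutive burst data cycles */
    uint32_t nwad;  /* write: data hold after the address cycle */
    uint32_t nwdd;  /* write: data hold between burst data cycles */
    uint32_t nxda;  /* dead cycles after the last data cycle */
} wait_params;

/* Wait states inserted for a read access made of `transfers` data
 * cycles (1 for non-burst, up to 4 for burst). */
uint32_t read_wait_states(const wait_params *p, uint32_t transfers)
{
    if (transfers == 0) {
        return 0;
    }
    return p->nrad + (transfers - 1u) * p->nrdd + p->nxda;
}

/* Same calculation for a write access. */
uint32_t write_wait_states(const wait_params *p, uint32_t transfers)
{
    if (transfers == 0) {
        return 0;
    }
    return p->nwad + (transfers - 1u) * p->nwdd + p->nxda;
}
```

The formula makes the benefit of burst mode visible: only the first transfer pays the NRAD (or NWAD) cost, and each later transfer pays only NRDD (or NWDD).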
4.2.4.2 Programmable Wait State Generation
The 80960CA may be interfaced with a variety of memory sub-systems and peripherals with a minimum system cost and complexity. To achieve this interface flexibility, the 80960CA implements an internal programmable wait state generator. Internally generated wait
states eliminate the potential system delays which come
from generating wait states with external logic.
Wait states are programmed for each region in the
memory region configuration table. The number of wait
states is programmable over a range which allows efficient control of memory devices ranging from ultra-fast
SRAMs to slow peripherals. An external ready signal is
also provided for external wait state control.
The wait states which can be generated by the
80960CA are shown in Figure 4-4. In this figure, N is the
number of wait states inserted. The wait states for read
accesses and for write accesses are described by three
parameters each. For read accesses, NRAD is the number of states between the address cycle and the first
data cycle and NRDD is the number of states between
consecutive data cycles in a burst access. For writes,
NWAD is the number of states that data is held after an
address cycle, and NWDD is the number of states that
data is held for consecutive data cycles in a burst write.
For both reads and writes, NXDA is the number of
dead cycles after the last data cycle and before the next
address.
4.2.4.3 READY Control
The memory region configuration table allows the
ready input (READY) to be enabled or disabled for
each region. If the ready input is disabled, the external
input has no effect on the wait states generated for a
memory access; all wait states are generated internally.
If the ready input is enabled, it works in conjunction
with the programmable wait state generator. In this
case, the ready input has no effect until the number of
programmed wait states has expired. When the wait
state counter reaches 0, the ready input is sampled, and
wait states continue or are terminated based on the value of the ready input. In order to gain complete external control over wait states, all wait state parameters
for a region can be set to 0.
4.2.4.4 Pipelined Reads
The 80960CA BCU provides an address pipelining
mode (Figure 4-5) to optimize the performance of instruction and data fetches from external memory.
When the pipelined read mode is enabled, an address
cycle overlaps with the last data cycle in each access,
effectively reducing the total time needed for each access. Pipelining mode is selected in each region by programming the memory region configuration table.
[Figure 4-5. Pipelined Read Request (270669-25). Read request timing showing the wait state counter, Clock, Address, and Data, with each address cycle overlapping the last data cycle of the previous bus access.]
4.2.4.5 Byte Ordering
One of two configurations for byte ordering, often referred to as little endian or big endian, is selected for
each region by programming the memory region configuration table. The byte ordering options make the
80960CA capable of sharing memory with a processor
which uses either byte ordering scheme. Byte ordering
refers to the way that the 80960CA relates internal data
to the way that data is stored or fetched from memory.
The little endian configuration orders the bytes in a
short-word or word so that the least significant byte of
the quantity is positioned at the lowest address and the
most significant byte at the highest address in memory.
Conversely, for the big endian configuration, the least
significant byte is positioned at the highest address, and
the most significant byte at the lowest address. For example, for little endian ordering, byte 0 for word data
would be found in memory at an address of the form
XXXX XXX0H and, for big endian, at address XXXX
XXX3H.
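The byte ordering rule of Section 4.2.4.5 can be expressed as a one-line address calculation. The following C sketch, with a hypothetical function name, returns the memory address at which byte n of an aligned short-word or word is found, where byte 0 is the least significant byte.

```c
#include <stdbool.h>
#include <stdint.h>

/* Address of byte n (0 = least significant) of an aligned quantity
 * of size_bytes (2 for a short-word, 4 for a word) stored at base.
 * Little endian: byte 0 at the lowest address.
 * Big endian:    byte 0 at the highest address. */
uint32_t byte_address(uint32_t base, unsigned size_bytes,
                      unsigned n, bool big_endian)
{
    return base + (big_endian ? (size_bytes - 1u - n) : n);
}
```

For a word at an address of the form XXXX XXX0H, byte 0 is located at XXXX XXX0H in a little endian region and at XXXX XXX3H in a big endian region, matching the example in the text.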
4.2.4.6 Data Alignment
The 80960CA can service any aligned or non-aligned
bus request. Aligned requests are directed to their natural boundary in memory. In other words, the addresses
for aligned requests are even multiples of the length of
the data transferred. Non-aligned requests are not serviced directly by the BCU but are assisted by microcode.
Microcode automatically breaks non-aligned requests
into multiple aligned requests which are then reissued
to the BCU. Depending on the degree of non-alignment
and the length of the original request, the resulting requests by microcode will consist of a combination of
byte, short-word, and double-word requests. The BCU
is able to generate an operation-unaligned fault when a
non-aligned bus request is first received. This fault can
be selectively masked at initialization.
4.3 DMA Controller
The DMA controller is a high-performance, full-functioned integrated peripheral. The DMA controller can
manage 4 channels of DMA transfer concurrent with
program execution. Separate external control for each
channel is provided. Each channel supports high-performance memory to memory transfers where the
source and destination can be any combination of internal data RAM or external memory. The DMA Controller supports various types of transfers such as high-speed fly-by transfers and data chaining with the use of
linked descriptor lists in memory.
The 80960CA's DMA controller is implemented using
dedicated hardware and microcode. Because of the efficiency of the core, it is possible for the microcode to
execute DMA transfers at high speeds. DMA transfers
are performed by the core concurrently with execution
of the user's program. Internal DMA logic is used for
sampling requests, synchronizing transfers with external devices, and handling the service of multiple active
channels.
4.3.1 SIGNAL DESCRIPTIONS
Twelve pins are dedicated to the DMA controller.
Three pins are associated with each DMA channel.
These pins are described below. In this description, the
pin number corresponds to the channel number. For
example, the DREQ0 pin is the request pin for
channel 0.
DREQ3-DREQ0          DMA Request (input): This input indicates that an external device is requesting a DMA transfer. A DMA transfer refers to the complete transfer of one byte, short-word, word, or quad-word, depending on the transfer data width selected for the channel.
DACK3-DACK0          DMA Acknowledge (output): This output becomes active when the requesting device is accessed.
EOP3/TC3-EOP0/TC0    End of Process (input) or Terminal Count (output): This pin functions either as an input (EOPx) or as an output (TCx). When programmed as an output, the pin is driven active for one clock after byte count reaches zero and a DMA terminates. When programmed as an input, an external device can cause the DMA operation to terminate.
4.3.2 DMA TRANSFERS
The 80960CA DMA controller supports a variety of
transfer modes and variations of these modes, allowing
the DMA to adapt to a number of hardware systems
and the performance requirements of these systems.
4.3.2.1 Standard Block and Demand Mode
Transfers
A standard DMA transfer is made up of multiple bus
requests. Loads from a source address are followed by
stores to a destination address. The DMA controller
issues the proper combination of these bus requests to
execute the DMA transfer. For example, a typical
DMA transfer between memory and an 8-bit peripheral
could appear as a single byte load request directed to
the source memory, followed by a single byte store request directed to the 8-bit peripheral.
The DMA controller has two basic transfer modes:
block mode (unsynchronized) and demand mode (synchronized). Any DMA transfer will be serviced by one
of these basic transfer modes.
A block mode DMA is initiated by software. Block
mode DMAs are generally between memory. Block
mode DMA transfers are not synchronized with any
type of request from an external device. Once the DMA
begins, it will continue until the entire block is complete or until it is suspended. The source and destination addresses for block mode transfers can be incremented or held constant for a DMA.
A demand mode DMA is controlled by an external
device. Demand mode DMAs are generally between an
external device and memory. In demand mode, each
individual DMA transfer can be synchronized with a
request. The request is signalled when an external device activates a DMA channel request pin (DREQ3-DREQ0). The DMA controller acknowledges this request with the DMA acknowledge pin (DACK3-DACK0) when the requesting device is accessed. A demand mode transfer may be synchronized with either
the source or the destination device.
4.3.2.2 Fly-by Transfers
A fly-by transfer mode is provided for the most performance-critical DMA applications. Fly-by mode also
makes very efficient use of the external bus during a
DMA. Standard DMA transfers involve multiple bus
requests: load requests directed to the source and a
store request directed to the destination. Fly-by transfers only require a single bus request. For a fly-by transfer, memory sees a load or a store on the bus while the
requesting device is selected by the DMA acknowledge
pin. The data is never actually read from or written to
the 80960CA. For memory to device transfers, the
processor issues a load and, while reading the memory,
accesses the external device with the DMA acknowledge pin. The data is then written directly to the destination device with a single bus request. For a device to
memory transfer, the reverse operation is performed.
The DMA issues a store and, while writing the memory, accesses the source device with the DMA acknowledge pin. In this case, the processor floats the data bus
and the device's data is written directly into memory.
4.3.2.3 Data Chaining
Each DMA channel can be programmed in a data
chaining mode. In this mode, all transfer information is
taken from a linked-list descriptor in memory (Figure
4-6). Data chaining is started by specifying a pointer to
a descriptor in memory. The transfer continues until
the number of bytes in the byte count field in the descriptor is transferred. At this time, another linked-list
descriptor may be executed. The next descriptor is
specified by the next-pointer field in the current descriptor. Data chaining continues until a null pointer
is encountered in the next-pointer field. Data chaining
can be designated as source chaining, destination chaining, or both.
In data chaining mode, an option exists which allows
chaining descriptors to be updated while the DMA is
running. When this option is enabled, the DMA sets a
bit in the DMA's special function register after loading
a descriptor and then checks this bit before loading the
next descriptor. If the bit has been cleared by the user,
the DMA continues; otherwise, the DMA waits for the
next descriptor to be set up and for the user to clear the
bit. An interrupt can be generated when each buffer is
complete or when the DMA is terminated with a null
pointer or the EOP pin.
[Figure 4-6. Source Data Chaining (270669-26). The user loads an internal register to start the chain. Each descriptor contains BC (Byte Count), SA (Source Address), DA (Destination Address), and NPTR (Next Pointer) fields, one of which is marked as not used for source chaining. The source buffers are transferred to a destination buffer until the chain terminates.]
4.3.3 TRANSFER CHARACTERISTICS
The DMA controller provides the programmer with a
number of options for configuring the characteristics of
a DMA transfer. Intelligent selection of transfer characteristics works to balance DMA performance and
functionality with performance of the user program
when the DMA is in progress.
The DMA controller provides features to optimize
transfers by moving a maximum amount of data for
each bus request issued. This is controlled by specifying
the width of the source and destination directed bus
requests for a DMA transfer, and by on-chip assembly
or disassembly of the transfer when source and destination are not of equal widths.
Data alignment is performed automatically by the
DMA controller when the source and destination of a
transfer are not aligned. The alignment algorithm is
optimized for many transfers, providing a performance
comparable to the aligned transfer cases.
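The data chaining mechanism of Section 4.3.2.3 is, in essence, a walk over a linked list that ends at a null next pointer. The C sketch below is a software model of that walk, not a description of the hardware descriptor format: the field order, the use of native pointers, and the function name are all assumptions made for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Software model of a chaining descriptor. The fields mirror the
 * BC, SA, DA, and NPTR labels of Figure 4-6; the in-memory layout
 * used by the 80960CA is not reproduced here. */
typedef struct chain_descriptor {
    uint32_t byte_count;                   /* BC   */
    const uint8_t *source;                 /* SA   */
    uint8_t *destination;                  /* DA   */
    const struct chain_descriptor *next;   /* NPTR, NULL ends the chain */
} chain_descriptor;

/* Transfer every buffer in the chain, one descriptor at a time, and
 * stop when a null next pointer is reached. Returns the total number
 * of bytes moved. */
size_t run_chain(const chain_descriptor *d)
{
    size_t total = 0;
    while (d != NULL) {
        for (uint32_t i = 0; i < d->byte_count; i++) {
            d->destination[i] = d->source[i];
        }
        total += d->byte_count;
        d = d->next;   /* follow the next-pointer field */
    }
    return total;
}
```

A two-descriptor chain that gathers separate source buffers into one contiguous destination buffer behaves like the source chaining case illustrated in Figure 4-6.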
4.3.3.1 Transfer Data Length
The transfer data length specifies the length of bus requests directed to the source and destination in a standard DMA transfer. Byte, short, word, or quad-word
loads and stores are selected for either source or destination when a DMA channel is set up. Assembly and
disassembly of data is automatically performed when
the source and destination widths are different. This
feature provides the most efficient use of the bus when
DMA transfers occur between a source and a destination with different external bus widths.
The DMA controller provides the option of using quad
word transfers to enhance DMA performance. When
quad transfers are specified, the DMA will issue a
four-word load request and a four-word store request for
each DMA transfer. The trade-off for the added DMA
performance is latency on the external bus, preventing
requests by the core, or by another DMA channel from
being immediately serviced.
4.3.3.2 Data Alignment
The DMA controller supports transfer of source and
destination data aligned to different byte boundaries in
memory. The DMA implements microcode algorithms
to transfer some non-aligned data with a performance
level approaching that for aligned transfers. The DMA
accomplishes this by attempting to issue the maximum
number of aligned bus requests during a DMA (Figure
4-7). As shown, most of the overhead due to non-aligned DMAs is incurred at the beginning and end of
the DMA. DMAs with low byte counts, therefore, do
not benefit as much from the data alignment features of
the DMA. The alignment feature is optimized for 8-bit
to 8-bit, 32-bit to 32-bit and for 8-bit and 32-bit combinations of source and destination lengths.
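The idea of issuing as many aligned bus requests as possible can be sketched for the destination side of a transfer. The C function below is an illustration only and is not the 80960CA microcode algorithm: it plans byte, short-word, and word stores so that every store is naturally aligned, using small stores only at the start and end of the block. The names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint32_t address;  /* destination address of the store */
    uint8_t  bytes;    /* 1 = byte, 2 = short-word, 4 = word */
} store_op;

/* Plan naturally aligned stores covering `count` bytes starting at
 * `dest`. Writes at most max_ops entries to `out` and returns the
 * number of entries written. */
size_t plan_stores(uint32_t dest, uint32_t count,
                   store_op *out, size_t max_ops)
{
    size_t n = 0;
    while (count > 0 && n < max_ops) {
        uint8_t size;
        if ((dest & 1u) != 0 || count == 1u) {
            size = 1;   /* odd address or single byte left */
        } else if ((dest & 2u) != 0 || count < 4u) {
            size = 2;   /* short-word aligned only, or 2-3 bytes left */
        } else {
            size = 4;   /* word aligned with at least a word left */
        }
        out[n].address = dest;
        out[n].bytes = size;
        n++;
        dest += size;
        count -= size;
    }
    return n;
}
```

For a 12-byte block written to destination address 0000 0303H, this plan yields a byte store, two word stores, a short-word store, and a final byte store, so the unaligned overhead is confined to the beginning and end of the block.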
Bus operations shown in the figure (source memory region words at 0000 0200H through 0000 020CH, destination memory region words at 0000 0300H through 0000 030CH):
Access Number   Operation     Address
1               load_word     0000 0200H
2               store_byte    0000 0303H
3               load_word     0000 0204H
4               store_word    0000 0304H
5               load_word     0000 0208H
6               store_word    0000 0308H
7               load_word     0000 020CH
8               store_short   0000 030CH
9               store_byte    0000 030EH
Figure 4-7. DMA Data Alignment
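The store sequence in Figure 4-7 is consistent with a simple rule: at each destination address, issue the widest naturally aligned store that still fits in the remaining byte count. The sketch below illustrates that rule only. It is not the DMA microcode, and the function name `split_stores` and the greedy policy are assumptions made for this example.

```python
def split_stores(dest_addr, byte_count):
    """Split a destination byte range into naturally aligned store requests.

    Hypothetical helper: picks the widest of word (4), short (2) or byte (1)
    that is aligned at the current address and does not overrun the range.
    """
    names = {4: "store_word", 2: "store_short", 1: "store_byte"}
    requests = []
    addr, remaining = dest_addr, byte_count
    while remaining > 0:
        for size in (4, 2, 1):
            if addr % size == 0 and size <= remaining:
                requests.append((names[size], addr))
                addr += size
                remaining -= size
                break
    return requests


if __name__ == "__main__":
    # Destination range from Figure 4-7: 12 bytes starting at 0000 0303H.
    for op, addr in split_stores(0x303, 12):
        print(f"{op:12s} {addr:08X}H")
```

For the destination range in Figure 4-7 the sketch yields one byte store, two word stores, one short store and one byte store, which matches the figure: the unaligned overhead sits at the two ends of the transfer.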
4.3.3.3 Channel Priority
The DMA controller arbitrates the priority of the 4
DMA channels. If multiple DMA channels are enabled, the DMA controller will determine in which order each channel is serviced.
The DMA controller can be configured in one of two
priority modes, fixed mode or rotating mode. The fixed
mode assumes a fixed priority for each channel with
channel 0 having the highest priority, followed by channels 1, 2, and 3, with channel 3 having the lowest priority. The rotating mode updates a channel's priority to
the lowest priority after that channel's DMA is made.
This ensures that a single channel is never locked out by
other active channels. The priority sequence is always
in the same order, with priority rotating from the low
channel numbers to the high channel numbers.
4.3.3.4 Performance and Latency Considerations
DMA operations and the user program share the resources of the core and of the external bus. DMA performance and the performance of the user program are
coupled directly to the balance of load sharing between
these two processes. The core resources necessary to
perform a DMA transfer vary depending on the way a
channel has been configured. For example, byte assembly and disassembly requires more processor overhead
per byte of transfer than does a transfer in which the
source and destination transfer lengths are equal. The
performance of a DMA is also tightly coupled to the
user program's use of the external bus. If the user program does not make frequent bus requests, the requests
by the DMA controller will be serviced with little or no
delay.
The user can enhance performance of the DMA with
trade-offs in system complexity and flexibility. Aligned
transfers eliminate the microcode overhead needed to
perform the internal alignments. DMAs between regions of equal transfer widths eliminate overhead for
assembly and disassembly. Source or destination memory configured as burst memory will provide the most
efficient use of the DMA controller when the quad-transfer feature is enabled. Using the fly-by mode reduces the number of bus requests needed for a DMA
since fly-by mode uses only a single load or a single
store request for each transfer.
4.3.4 DMA CONTROL AND CONFIGURATION
The DMA Controller uses an SFR register, the DMA
command (DMAC) register, and the setup DMA
(sdma) instruction for configuration and control of a
DMA. The sdma instruction is used to configure each
DMA channel. Transfer widths, byte count, source and
destination addresses for a DMA are specified in this
instruction.
The DMAC register (Figure 4-8) is described below.
The channel enable field enables a DMA once the
channel is set up. Clearing these bits will also cause a
DMA transfer to be suspended.
The terminal count field signals that byte count has
reached zero and a DMA has ended.
The channel active field indicates that a channel is idle
or active. If set, this bit indicates that the channel is
active. This implies that the channel is servicing a
transfer or has a request pending. The active bits are
status information only.
The channel done field indicates that a DMA operation
is complete. The done bits are status information only.
The channel wait field is used for handshaking with a
user program in data chaining mode. The DMA sets
these bits when a new linked-list descriptor is read. The
DMA will not read the next descriptor until this bit is
cleared by the user. The user can set up the next descriptor and then clear the channel wait bits to dynamically change descriptors.
Figure 4-8. DMA Command Register
A priority mode bit selects rotating or fixed priority
mode.
The throttle bit selects the maximum amount of core
resources that the DMA microcode will receive in relation to the execution of the user program.
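The two channel priority modes selected by the priority mode bit (fixed and rotating, Section 4.3.3.3) can be illustrated with a small arbitration model. This is a sketch under stated assumptions, not the controller's logic: the names `pick_fixed` and `RotatingArbiter` are hypothetical, and the starting rotation (channel 0 highest after setup) is assumed.

```python
def pick_fixed(requests):
    """Fixed mode: channel 0 is highest priority, channel 3 lowest."""
    return min(requests) if requests else None


class RotatingArbiter:
    """Rotating mode: a serviced channel drops to the lowest priority."""

    def __init__(self, channels=4):
        self.channels = channels
        # Assumption: start as if the last channel was just serviced,
        # so channel 0 is the highest priority initially.
        self.last = channels - 1

    def pick(self, requests):
        # Scan upward from the channel after the one last serviced.
        for offset in range(1, self.channels + 1):
            channel = (self.last + offset) % self.channels
            if channel in requests:
                self.last = channel
                return channel
        return None


if __name__ == "__main__":
    arbiter = RotatingArbiter()
    busy = {0, 3}
    print("fixed:   ", [pick_fixed(busy) for _ in range(4)])
    print("rotating:", [arbiter.pick(busy) for _ in range(4)])
```

With channels 0 and 3 both requesting continuously, the fixed model always selects channel 0, while the rotating model alternates between the two, which is the lock-out protection the text describes.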
4.3.5 DMA INTERRUPTS
The DMA controller is the source of 4 hardware interrupts in the 80960CA. The DMA Controller can be
programmed to request an interrupt when a DMA is
complete, or when a buffer transfer is completed in
chaining mode. Each channel requests a different interrupt.
4.4 Interrupt Controller
The 80960CA Interrupt Controller manages interrupts
which are requested by external agents or by the DMA
Controller. The interrupt controller manages 4 internal
DMA interrupt sources, a single NMI (Non-Maskable
Interrupt) pin, and 8 external interrupt pins. Up to 248
external interrupt sources can be supported by the interrupt controller. The interrupt controller handles the
prioritization of software interrupts, hardware interrupts, and the process priority, and signals the core
when interrupts are to be serviced. The interrupt controller provides the low-latency interrupt service featured on the 80960CA.
4.4.1 EXTERNAL INTERRUPTS
The 80960CA provides 8 interrupt pins and one NMI
pin for detecting external requests. The interrupt controller allows the 8 interrupt pins to be configured as
dedicated inputs capable of requesting 8 interrupts, or
as a vectored input capable of requesting up to 248
interrupts. The NMI pin is always a dedicated input.
The interrupt controller pins are described below.
XINT7-XINT0   External Interrupts (inputs): These pins can be used as dedicated inputs or, acting together as an 8-bit number, can request any interrupt. The inputs are edge or level detected, and are optionally debounced internally.
NMI   Non-Maskable Interrupt (input): NMI requests the highest priority interrupt. NMI is always taken, is not maskable (as the name implies), and is not interruptible.
4.4.2 INTERRUPT MODES
The 8 external interrupt pins can be configured in one
of three modes: dedicated mode, expanded mode, or
mixed mode (Figure 4-9).
Figure 4-9. Interrupt Modes
4.4.2.1 Dedicated Mode Interrupts
In dedicated mode, each of the 8 interrupt pins acts as a
dedicated input. When an external event is detected on
an interrupt pin, a unique interrupt is requested for that
pin. It is possible to map each dedicated pin to one of a
number of possible interrupt vectors. This is accomplished by programming the interrupt map (IMAP)
control registers with an interrupt vector number for
each pin. (Recall that interrupt vector numbers are
8-bit values which reference the 248 vectors in the interrupt table.)
Only the upper four bits of the vector number can be
programmed for a dedicated mode interrupt. The lower
four bits are fixed at the value 0010₂. With four programmable bits, one of 15 interrupt vectors is available
for each dedicated pin. These interrupt vectors span the
even priority levels from priority 2 to 30. The vector at
priority 0 is not defined.
The 15 interrupt vectors available to dedicated sources
can be cached in internal data RAM. If this interrupt
vector caching feature is selected, the processor will automatically fetch the vector from data RAM, eliminating the latency caused by a bus request for a vector in
external memory.
The DMA Controller can request four interrupts to signal the end of a DMA for each of four channels. The
four interrupt signals from the DMA are handled by
the interrupt controller in the same way as an interrupt
pin configured as a dedicated input. Each of the four
DMA sources may request one of 15 interrupts by programming the IMAP for that source.
4.4.2.2 Expanded Mode Interrupts
In expanded mode, external hardware considers the interrupt pins (XINT0-XINT7) as an 8-bit binary number. This number is used directly as the interrupt vector
number. Each of the 248 possible interrupt vectors can
be referenced in this way, allowing a separate external
source for each vector. External hardware is responsible for recognizing individual hardware sources and
then driving the interrupt vector number corresponding
to that source onto the interrupt pins.
4.4.2.3 Mixed Mode Interrupts
In mixed mode, the 8 interrupt pins are divided into
two functional sets. One set functions in dedicated
mode, the other in expanded mode. In mixed mode,
three pins are dedicated interrupt pins (XINT7-XINT5). A programmable vector number is associated
with each of these pins. The remaining five interrupt
pins (XINT4-XINT0) are treated as the most significant five bits of the expanded mode vector number. The
lower order bits are internally forced to 010₂ to form
the full 8-bit value for the vector number.
4.4.3 INTERRUPT CONTROLLER SETUP
The interrupt controller uses two special function registers to manage interrupt requests by hardware sources.
The hardware interrupt pending register (IPND) and
the hardware interrupt mask register (IMSK) are addressed as sf0 and sf1 respectively. A single bit in each
register corresponds to each of the 8 possible external
sources and 4 DMA sources for hardware interrupts.
The IMSK register performs the function of masking
hardware interrupts and the IPND register implements
posting of interrupts requested by hardware. When
configured for expanded or mixed mode interrupts, bit
0 of the IMSK register globally masks the expanded
mode interrupts.
4.4.4 NON-MASKABLE INTERRUPT
In addition to the maskable hardware interrupts, a single Non-Maskable Interrupt (NMI) is provided. A dedicated NMI pin is used to request this interrupt. NMI is
defined as a higher priority than any hardware interrupt, software interrupt, or process priority. The NMI
procedure, therefore, can never be interrupted and
must execute the return instruction before other procedures can execute. The NMI procedure is entered
through vector 248. This vector is cached in internal
data RAM at initialization to reduce latency for the
NMI.
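The vector-number rules from Sections 4.4.2.1 and 4.4.2.3, together with the NMI vector above, can be checked with a few lines of arithmetic. The sketch below is illustrative only. The function names are hypothetical, and the relation "priority = vector number divided by 8" is taken from the 80960 architecture rather than from this section; it is consistent with the even priorities 2 to 30 quoted for dedicated sources.

```python
NMI_VECTOR = 248  # vector used to enter the NMI procedure (Section 4.4.4)


def dedicated_vector(upper_bits):
    """Dedicated mode: 4 programmable upper bits, lower bits fixed at 0010."""
    if not 1 <= upper_bits <= 15:
        raise ValueError("upper bits 0 select priority 0, which is not defined")
    return (upper_bits << 4) | 0b0010


def mixed_expanded_vector(xint4_0):
    """Mixed mode: XINT4-XINT0 are the 5 MSBs, lower bits forced to 010."""
    return ((xint4_0 & 0b11111) << 3) | 0b010


def priority(vector):
    """Assumed 80960 convention: priority is the vector number divided by 8."""
    return vector >> 3


if __name__ == "__main__":
    for upper in range(1, 16):
        vector = dedicated_vector(upper)
        print(f"IMAP field {upper:2d} -> vector {vector:3d}, priority {priority(vector)}")
```

Running the loop lists the 15 dedicated vectors and shows why only even priority levels are reachable: the fixed low bits leave a single programmable bit pattern per pair of priorities.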
APPENDIX A
80960CA CORE IMPLEMENTATION
The 80960CA Core is a high-performance implementation of the 80960 Core Architecture. This section briefly describes the microarchitecture of the 80960CA core
and the key constructs used to achieve parallel instruction execution.
The 80960CA core can be divided into the 6 main subunits listed below.
Instruction Sequencer
Register File
Execution Unit
Multiply and Divide Unit
Address Generation Unit
Static Data RAM and Local Register Cache
Figure A-1 is a simple block diagram of the 80960CA.
The nucleus of the processor is the Instruction Sequencer and Register File. The other subunits of the
core, referred to as coprocessors, radiate from these
units, connecting to either the register (REG) side or
the memory (MEM) side of the processor. The Instruction Sequencer issues directives, via the REG and
MEM interfaces, which target a specific coprocessor.
That coprocessor then executes an express function virtually decoupled from the IS and the other coprocessors. The REG and MEM data busses shown in Figure
A-1 are used to transfer data between the common
Register File and the coprocessors.
Figure A-1. 80960CA Block Diagram
A.1 Instruction Sequencer
The Instruction Sequencer (IS) decodes the instruction
stream and drives the decoded instruction stream onto
the coprocessor interfaces. In a single clock, the IS decodes up to 4 instructions and issues up to three of these
instructions to the on-chip coprocessors or to the IS
itself. One register (REG) format, one memory (MEM)
format, and one control or control and branch (CTRL
or COBR) format instruction can be issued at one time.
These instructions are directed respectively to the REG
coprocessors, the MEM coprocessors, or to the IS. The
ability to issue multiple instructions in parallel can result in the simultaneous execution of many instructions
at once. An optimizing compiler or hand optimization
of assembly code can easily produce an instruction
stream which takes full advantage of the parallel execution of the core.
A technique known as resource scoreboarding is used to
manage the parallel execution of instructions and the
common resources of the processor. A coprocessor, for
example, can scoreboard itself, indicating that it cannot
act on another instruction until an instruction currently
executing on that coprocessor is completed. A specific
form of resource scoreboarding is referred to as register
scoreboarding. When the computation stage of an instruction takes more than one clock, the destination
register or registers for the result are scoreboarded as
busy. A subsequent operation needing that particular
register will be delayed until the multi-clock operation
is completed. Instructions which do not use the scoreboarded registers can be executed in parallel.
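The issue rule (at most one REG, one MEM and one CTRL or COBR format instruction per clock, chosen from up to 4 decoded instructions) and register scoreboarding can be combined in a small model. This is a sketch under stated assumptions, not a description of the actual sequencer: the names `Instr` and `issue_one_clock` are hypothetical, COBR is folded into the CTRL slot, and the in-order policy that stops at the first instruction unable to issue is an assumption of the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    name: str
    fmt: str            # "REG", "MEM" or "CTRL" (COBR treated as CTRL here)
    srcs: tuple = ()    # source register names
    dsts: tuple = ()    # destination register names


def issue_one_clock(window, scoreboard):
    """Return the instructions issued this clock.

    window:     decoded instructions in program order (only 4 are examined)
    scoreboard: set of registers marked busy by multi-clock operations
    """
    issued = []
    formats_used = set()
    pending_dsts = set()
    for instr in window[:4]:
        regs = set(instr.srcs) | set(instr.dsts)
        if instr.fmt in formats_used:
            break  # only one instruction of each format per clock
        if regs & scoreboard or regs & pending_dsts:
            break  # register is busy: delay this and later instructions
        issued.append(instr)
        formats_used.add(instr.fmt)
        pending_dsts |= set(instr.dsts)
    return issued


if __name__ == "__main__":
    window = [
        Instr("addo", "REG", ("g0", "g1"), ("g2",)),
        Instr("ld", "MEM", ("g3",), ("g4",)),
        Instr("b", "CTRL"),
        Instr("subo", "REG", ("g5",), ("g6",)),
    ]
    print([i.name for i in issue_one_clock(window, set())])
    print([i.name for i in issue_one_clock(window, {"g4"})])
```

In the first call three instructions issue together and the second REG instruction waits for the next clock. In the second call the load stalls because its destination register is scoreboarded, so only the first instruction issues.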
The IS manages a three stage parallel instruction pipeline (Figure A-2). In the first stage of the pipeline (pipe
0), the address of the next instruction is calculated.
This address may be the next sequential instruction, the
target of a branch, or a location in microcode. In the
second stage of the pipeline (pipe 1), the instructions
are issued to the rest of the machine. In the third stage
(pipe 2), the instruction computation is started, and for
single cycle instructions, a result is returned.
Several microarchitectural features of the core are designed to minimize performance loss due to pipeline
breaks.
Branch Prediction-To minimize pipeline breaks due to
branching, the user can specify the direction that a conditional branch instruction will usually follow. The
processor will execute along the specified instruction
path with no pipeline break. If the branch direction
specified was the direction actually selected by execution of the conditional branch, no pipeline break occurs. The direction of the branch guess is determined
by a bit value in the CTRL format instructions.
Register Bypassing-Register bypassing is a feature
which forwards the result of an instruction for immediate use as the source of another instruction. This forwarding occurs at the same time that the value is writ-
ten to its destination register. Bypassing the register file
saves the one clock cycle break which would otherwise
occur while waiting for the value to be written to the
register file and the register scoreboard to be cleared.
On-chip Cache-The on-chip instruction cache and local register cache eliminate many pipeline breaks which
will occur if the IS is forced to wait for code or data to
be moved between the 80960CA and external memory.
Register File Access-The Register File allows multiple
instructions to gain access to the register set simultaneously. This eliminates pipeline breaks which would
be caused by a loss of access to the register set by any
coprocessor.
A.1.1 INSTRUCTION CACHE
The IS includes a 1 Kbyte two-way set associative instruction cache capable of delivering up to four instructions each clock to the Instruction Sequencer. The
cache allows inner loops of code to execute with no
external instruction fetches.
A.1.2 MICROCODE ROM
The 80960CA uses microcode ROM to implement complex instructions and functions. This includes calls, returns, DMA transfers, and initialization sequences. Microcode provides an inexpensive and simple method for
implementing complex instructions in the mostly RISC
environment of the 80960CA. When the IS encounters
a microcoded instruction, it automatically branches to
the microcode routine. The 80960CA performs this microcode branch in 0 clocks.
State    1        2        3
Pipe 0   decode   decode   decode
Pipe 1   xxxxx    issue    issue
Pipe 2   xxxxx    xxxxx    execute & return
Figure A-2. Instruction Pipeline
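The way the three pipeline stages fill over successive states in Figure A-2 can be reproduced with a short loop. This is only an illustration of the figure; the function name `pipeline_table` is hypothetical, and the stage labels are copied from the figure (the accompanying text describes pipe 0 as calculating the next instruction address).

```python
STAGES = ("decode", "issue", "execute & return")  # pipe 0, pipe 1, pipe 2


def pipeline_table(states):
    """Return one row per pipe stage, one column per state (1-based).

    A stage shows its action once the first instruction has reached it,
    and 'xxxxx' while the pipeline is still filling.
    """
    rows = []
    for pipe, action in enumerate(STAGES):
        rows.append([action if state > pipe else "xxxxx"
                     for state in range(1, states + 1)])
    return rows


if __name__ == "__main__":
    for pipe, row in enumerate(pipeline_table(3)):
        print(f"Pipe {pipe}: " + " | ".join(f"{cell:16s}" for cell in row))
```

After three states all stages are occupied, which is the steady state in which a single-cycle instruction can return a result every clock.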
A.2 Register File
The Register File (RF) contains the 16 local and 16
global registers. The register file has six ports (Figure
A-3), allowing parallel access of the register set by several 80960CA coprocessors. This parallel access results
in an ability to execute one simple logic or arithmetic
instruction, one memory operation (load/store), and
one address calculation per clock.
MEM coprocessors interface to the RF with a 128-bit
wide load bus and a 128-bit wide store bus. These busses
enable movement of up to 4 words per clock to and
from the RF. These busses also allow LOAD data from
a previous read access and STORE data from a current
write access to be processed in the register file simultaneously. An additional 32-bit port allows an address or
address reduction operand to be simultaneously fetched
by the Address Generation Unit.
REG coprocessors interface to the RF with two 64-bit
source busses and a single 64-bit destination bus. With
this bus structure, two source operands are simultaneously issued to a REG coprocessor when an instruction is issued. A 64-bit destination bus allows the result
from the previous operation to be written to the RF at
the same time that the current operation's source operands are issued.
A.3 Execution Unit
The Execution Unit is the 32-bit Arithmetic and Logic
Unit of the 80960CA Core. The EU can be viewed as a
self-contained REG coprocessor with its own instruction set. As such, the EU is responsible for executing or
supporting the execution of all the integer and ordinal
arithmetic instructions, the logic and shift instructions,
the move instructions, the bit and bit field instructions,
and the compare operations. The EU performs any
arithmetic or logical instructions in a single clock.
A.4 Multiply Divide Unit
The Multiply and Divide Unit (MDU) is a REG coprocessor which performs integer and ordinal multiply, divide, remainder, and modulo operations. The MDU detects integer overflow and divide by zero errors. The
MDU is optimized for multiplication, performing 32-bit multiplies in 4 clocks. The MDU performs multiplies and divides in parallel with the main execution
unit.
A.5 Address Generation Unit
The Address Generation Unit (AGU) is a MEM coprocessor which computes the effective addresses for memory operations. It directly executes the load address instruction (lda) and calculates addresses for loads and
stores based on the addressing mode specified in these
instructions. The address calculations are performed in
parallel with the main execution unit (EU).
A.6 Data RAM and Local Register Cache
The Data RAM and Local Register Cache is part of a
1.5 Kbyte block of on-chip Static RAM (SRAM).
1 Kbyte of this SRAM is mapped into the 80960CA's
address space from location 00000000H to
000003FFH. A portion of the remaining 512 bytes is
dedicated to the Local Register Cache. This part of
internal SRAM is not directly visible to the user. Loads
and Stores, including quad-word accesses, to the internal SRAM are typically performed in only one clock.
The complete local register set, therefore, can be moved
to the local register cache in only four clocks.
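The four-clock figure follows directly from the register and bus widths quoted in this appendix: 16 local registers of 32 bits each, moved one quad word (128 bits) per clock. The short check below also encodes the user-visible data RAM address range. It is a worked example only, and the function names are hypothetical.

```python
DATA_RAM_FIRST = 0x00000000  # first user-visible data RAM address
DATA_RAM_LAST = 0x000003FF   # last user-visible data RAM address (1 Kbyte)


def clocks_to_move(registers=16, bits_per_register=32, bits_per_clock=128):
    """Clocks needed to move a register set at one quad word per clock."""
    total_bits = registers * bits_per_register
    return -(-total_bits // bits_per_clock)  # ceiling division


def in_user_data_ram(address):
    """True if the address falls in the mapped 1 Kbyte of on-chip SRAM."""
    return DATA_RAM_FIRST <= address <= DATA_RAM_LAST


if __name__ == "__main__":
    print("clocks to save local registers:", clocks_to_move())
    print("mapped data RAM bytes:", DATA_RAM_LAST - DATA_RAM_FIRST + 1)
```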
Register file ports shown in the figure (16 local registers, 16 global registers):
Port           Width (bits)   Bus group
SRC1           64             REG data busses
SRC2           64             REG data busses
DEST           64             REG data busses
Load           128            MEM data busses
Store          128            MEM data busses
Address Base   32             MEM data busses
Figure A-3. Six-Port Register File
80960CA-33, -25, -16
32-BIT HIGH PERFORMANCE EMBEDDED PROCESSOR
• Two Instructions/Clock Sustained Execution
• Four 59 Mbytes/s DMA Channels with Data Chaining
• Demultiplexed 32-Bit Burst Bus with Pipelining
• 32-Bit Parallel Architecture
- Two Instructions/clock Execution
- Load/Store Architecture
- 16, 32-bit Global Registers
- 16, 32-bit Local Registers
- Manipulate 64-Bit Bit Fields
- 11 Addressing Modes
- Full Parallel Fault Model
- Supervisor Protection Model
• Fast Procedure Call/Return Model
- Full Procedure Call in 4 clocks
- RISC Call in 2 clocks (BAL)
• On-Chip Register Cache
- Caches Registers on Call/Ret
- Minimum of 6 Frames provided
- Number of Frames Programmable, up to 15
• On-Chip Instruction Cache
- 1 Kbyte Two-Way Set Associative
- 128-bit Path to Instruction Sequencer
- Cache-Lock Modes
- Cache-Off Mode
• High Bandwidth On-Chip Data RAM
- 1 Kbyte On-chip RAM for Data
- Sustain 128-bits per clock access
• Four On-Chip DMA Channels
- 59 Mbytes/s Fly-by Transfers
- 32 Mbytes/s Two-Cycle Transfers
- Data Chaining
- Data Packing/Unpacking
- Programmable Priority Method
• 32-Bit Demultiplexed Burst Bus
- 128-Bit Internal Data Paths to and from Registers
- Burst Bus for DRAM Interfacing
- Address Pipelining Option
- Fully Programmable Wait States
- Supports 8, 16 or 32-bit Bus Widths
- Supports Unaligned Accesses
- Supervisor Protection Pin
• High-Speed Interrupt Controller
- Up to 248 External Interrupts
- 32 Fully Programmable Priorities
- Multi-mode 8-bit Interrupt Port
- Four Internal DMA Interrupts
- Separate, Non-maskable Interrupt Pin
- Context Switch in 750 ns Typical
Figure 1. 80960CA Die Photo
November 1991
Order Number: 270727-004
CONTENTS
1.0 PURPOSE
2.0 80960CA OVERVIEW
2.1 The C-Series Core
2.2 Pipelined, Burst Bus
2.3 Flexible DMA Controller
2.4 Priority Interrupt Controller
2.5 Instruction Set Summary
3.0 PACKAGE INFORMATION
3.1 Package Introduction
3.2 Pin Descriptions
3.3 80960CA Pinout
3.4 Mechanical Data
3.5 Package Thermal Specifications
3.6 Stepping Register Information
3.7 Suggested Sources for 80960CA Accessories
4.0 ELECTRICAL SPECIFICATIONS
4.1 Absolute Maximum Ratings
4.2 Operating Conditions
4.3 Recommended Connections
4.4 DC Specifications
4.5 AC Specifications
5.0 RESET, BACKOFF AND HOLD ACKNOWLEDGE
6.0 BUS WAVEFORMS
FIGURES
Figure 1 80960CA Die Photo
Figure 2 80960CA Block Diagram
Figure 3 Example Pin Description Entry
Figure 4a 80960CA PGA Pinout (View from Top Side)
Figure 4b 80960CA PGA Pinout (View from Bottom Side)
Figure 4c 80960CA PQFP Pinout (View from Top Side)
Figure 5 168-Lead Ceramic PGA Package Dimensions
Figure 6 Principal Dimensions and Datum
Figure 7 Molded Details
Figure 8 Detail M
Figure 9 Terminal Details
Figure 10 Typical Lead
Figure 11 80960CA PGA Package Thermal Characteristics
Figure 12 80960CA PQFP Package Thermal Characteristics
Figure 13 Measuring 80960CA PGA and PQFP Case Temperature
Figure 14 Register G0
Figure 15 AC Test Load
Figure 16a Input and Output Clocks Waveform
Figure 16b CLKIN Waveform
Figure 17 Output Delay and Float Waveform
Figure 18a Input Setup and Hold Waveform
Figure 18b XINT0:7 Input Setup and Hold Waveform
Figure 19 Hold Acknowledge Timings
Figure 20 Bus Back-Off (BOFF) Timings
Figure 21 Relative Timings Waveforms
Figure 22 Output Delay or Hold vs Load Capacitance
Figure 23 Rise and Fall Time Derating at Highest Operating Temperature and Minimum Vcc
Figure 24 Icc vs Frequency and Temperature
Figure 25 Cold Reset Waveform
Figure 26 Warm Reset Waveform
Figure 27 Entering the ONCE™ State
Figure 28 Clock Synchronization in the 2x Clock Mode
Figure 29 Non-Burst, Non-Pipelined Accesses without Wait States
Figure 30 Non-Burst, Non-Pipelined Read with Wait States
Figure 31 Non-Burst, Non-Pipelined Write with Wait States
Figure 32 Burst, Non-Pipelined Read without Wait States, 32-Bit Bus
Figure 33 Burst, Non-Pipelined Read with Wait States, 32-Bit Bus
Figure 34 Burst, Non-Pipelined Write without Wait States, 32-Bit Bus
Figure 35 Burst, Non-Pipelined Write with Wait States, 32-Bit Bus
Figure 36 Burst, Non-Pipelined Read with Wait States, 16-Bit Bus
Figure 37 Burst, Non-Pipelined Read with Wait States, 8-Bit Bus
Figure 38 Non-Burst, Pipelined Read without Wait States, 32-Bit Bus
Figure 39 Non-Burst, Pipelined Read with Wait States, 32-Bit Bus
Figure 40 Burst, Pipelined Read without Wait States, 32-Bit Bus
Figure 41 Burst, Pipelined Read with Wait States, 32-Bit Bus
Figure 42 Burst, Pipelined Read with Wait States, 16-Bit Bus
Figure 43 Burst, Pipelined Read with Wait States, 8-Bit Bus
Figure 44 Using External READY
Figure 45 Terminating a Burst with BTERM
Figure 46 BOFF Functional Timing
Figure 47 HOLD Functional Timing
Figure 48 DREQ and DACK Functional Timing
Figure 49 EOP Functional Timing
Figure 50 Terminal Count Functional Timing
Figure 51 FAIL Functional Timing
Figure 52 A Summary of Aligned and Unaligned Transfers for Little Endian Regions
Figure 53 A Summary of Aligned and Unaligned Transfers for Little Endian Regions (Continued)
Figure 54 Idle Bus Operation
1.0 PURPOSE
This document provides a preview of the electrical
characteristics expected of the 33, 25 and 16 MHz
versions of the 80960CA. For a detailed description
of any 80960CA functional topic, other than parametric performance, consult the latest 80960CA
Product Overview (Order No. 270669), or the
80960CA User's Manual (Order No. 270710).
2.0 80960CA OVERVIEW
The 80960CA is the second-generation member of
the 80960 Family of embedded processors. The
80960CA is object code compatible with the 32-bit
80960 Core Architecture while including Special
Function Register extensions to control on-chip peripherals, and instruction set extensions to shift 64-bit operands and configure on-chip hardware. Multiple 128-bit internal busses, on-chip instruction caching and a sophisticated instruction scheduler allow
the processor to sustain execution of two instructions every clock, and peak at execution of 3 instructions per clock.
A 32-bit demultiplexed and pipelined burst bus provides a 132 Mbyte/s bandwidth to a system's highspeed external memory sub-system. In addition, the
80960CA's on-chip caching of instructions, procedure context and critical program data substantially
decouples system performance from the wait states
associated with accesses to the system's slower,
cost sensitive, main memory sub-system.
The 80960CA bus controller also integrates full wait
state and bus width control for highest system performance with minimal system design complexity.
Unaligned access and Big Endian byte order support
reduces the cost of porting existing applications to
the 80960CA.
The processor also integrates four complete data-chaining DMA channels and a high-speed interrupt
controller on-chip. The DMA channels perform single-cycle or two-cycle transfers, data packing and
unpacking, and data chaining. Block transfers, in addition to source or destination synchronized transfers, are provided.
The interrupt controller provides full programmability
of 248 interrupt sources into 32 priority levels with a
typical interrupt task switch ("latency") time of
750 ns.
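The headline figures quoted for the 33 MHz part follow from simple arithmetic, shown here as a worked check. Two points are assumptions of this sketch rather than statements from the data sheet: that the 132 Mbytes/s bus figure corresponds to one 32-bit (4-byte) transfer per clock, and that the 750 ns typical latency is the sum of the 270 ns prioritization time and the 480 ns context switch time given in Section 2.4. The function names are hypothetical.

```python
CLOCK_HZ = 33_000_000  # 80960CA-33


def sustained_mips(clock_hz, instructions_per_clock=2):
    """Sustained execution rate in millions of instructions per second."""
    return clock_hz * instructions_per_clock // 1_000_000


def peak_bus_mbytes_per_s(clock_hz, bus_bytes=4):
    """Peak bus bandwidth, assuming one full-width transfer per clock."""
    return clock_hz * bus_bytes // 1_000_000


def interrupt_latency_ns(prioritize_ns=270, context_switch_ns=480):
    """Typical latency as prioritization time plus context switch time."""
    return prioritize_ns + context_switch_ns


if __name__ == "__main__":
    print("sustained MIPS:      ", sustained_mips(CLOCK_HZ))
    print("peak bus Mbytes/s:   ", peak_bus_mbytes_per_s(CLOCK_HZ))
    print("interrupt latency ns:", interrupt_latency_ns())
```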
[Block diagram; labels legible in the source: Four-Channel DMA Controller; Programmable Interrupt Controller; Instruction Cache (1 Kbyte, two-way set associative); Multiply/Divide Unit; Six-Port Register File]
Figure 2. 80960CA Block Diagram
2.1. The C-Series Core
-
The C-Series core is a very high performance microarchitectural implementation of the 80960 Core Architecture. The C-Series core can sustain execution
of two instructions per clock (66 MIPs at 33 MHz).
To achieve this level of performance, Intel has incorporated state-of-the-art silicon technology and innovative microarchitectural constructs into the implementation of the C-Series core. Factors that contribute to the core's performance include:
- Parallel instruction decoding allows issue of up
to three instructions per clock.
- Parallel instruction decode allows sustained, simultaneous execution of two single-clock
  instructions every clock cycle.
- Most instructions execute in a single clock.
- Efficient instruction pipeline is designed to minimize pipeline break losses.
- Register and resource scoreboarding allow simultaneous multi-clock instruction execution.
- Branch look-ahead and prediction allow many branches to execute with no pipeline break.
- Local Register Cache integrated on-chip caches Call/Return context.
- Two-way set associative, 1 Kbyte integrated instruction cache.
- 1 Kbyte integrated Data RAM sustains a four-word (128-bit) access every clock cycle.

2.2. Pipelined, Burst Bus
A 32-bit high performance bus controller interfaces the 80960CA to external memory and
peripherals. The Bus Control Unit features a maximum transfer rate of 132 Mbytes per second
(at 33 MHz). Internally programmable wait states and 16 separately configurable memory
regions allow the processor to interface with a variety of memory subsystems with a minimum
of system complexity and a maximum of performance. The Bus Controller's main features include:
- Demultiplexed, Burst Bus to exploit most efficient DRAM access modes
- Address Pipelining to reduce memory cost while maintaining performance
- 32-, 16- and 8-bit modes for I/O interfacing ease
- Full internal wait state generation to reduce system cost
- Little and Big Endian support to ease application development
- Unaligned access support for code portability
- Three-deep request queue to decouple the bus from the core
- Direct interface to Intel's 27C960 Burst EPROM and 82596 Ethernet Controller

2.3. Flexible DMA Controller
A four channel DMA controller provides high speed DMA control for data transfers involving
peripherals and memory. The DMA provides advanced features such as data chaining, byte
assembly and disassembly, and a high performance fly-by mode capable of transfer speeds of
up to 59 Mbytes per second at 33 MHz. The DMA controller offers performance and flexibility
that are only possible by integrating the DMA controller and the 80960CA core.

2.4. Priority Interrupt Controller
A programmable-priority interrupt controller manages up to 248 external sources through the
8-bit external interrupt port. The Interrupt Unit also handles the 4 internal sources from the
DMA controller, and a single non-maskable interrupt input. The 8-bit interrupt port can also
be configured to provide individual interrupt sources that are level or edge triggered.
Interrupts in the 80960CA are prioritized and signaled within 270 ns of the request. If the
interrupt is of higher priority than the processor priority, the context switch to the
interrupt routine typically is complete in another 480 ns. The interrupt unit provides the
mechanism for the low latency and high throughput interrupt service which is essential for
embedded applications.
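The 132 Mbytes per second figure quoted for the Bus Control Unit is consistent with one full 32-bit transfer on every clock at 33 MHz. The sketch below only reproduces that arithmetic. The function name is hypothetical, and the one-transfer-per-clock assumption and the results for the 25 MHz and 16 MHz speed grades are derived here, not quoted from the data sheet.

```python
# Peak external-bus transfer rate, assuming one full-width transfer per clock.
# Only the 33 MHz / 132 Mbytes per second pair is stated in the data sheet.

BUS_WIDTH_BYTES = 4  # 32-bit data bus


def peak_bus_rate_mbytes(pclk_mhz, bus_width_bytes=BUS_WIDTH_BYTES):
    """Return the peak transfer rate in Mbytes per second (assumed model)."""
    return pclk_mhz * bus_width_bytes


if __name__ == "__main__":
    for grade in (33, 25, 16):
        print(f"80960CA-{grade}: {peak_bus_rate_mbytes(grade)} Mbytes/s peak")
```

Sustained rates depend on the wait states programmed for each memory region, so these values are upper bounds under the stated assumption.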
80960CA-33, -25, -16
2.5. Instruction Set Summary
The following table summarizes the 80960CA instruction set by logical groupings. See the 80960CA User's
Manual for a complete description of the instruction set.
Data Movement: Load; Store; Move; Load Address

Arithmetic: Add; Subtract; Multiply; Divide; Remainder; Modulo; Shift; *Extended Shift;
Extended Multiply; Extended Divide; Add with Carry; Subtract with Carry; Rotate

Logical: And; Not And; And Not; Or; Exclusive Or; Not Or; Or Not; Nor; Exclusive Nor;
Not; Nand

Bit, Bit Field and Byte: Set Bit; Clear Bit; Not Bit; Alter Bit; Scan for Bit;
Span over Bit; Extract; Modify; Scan Byte for Equal

Comparison: Compare; Conditional Compare; Compare and Increment; Compare and Decrement;
Condition Test; Check Bit

Branch: Unconditional Branch; Conditional Branch; Compare and Branch

Call and Return: Call; Call Extended; Call System; Return; Branch and Link

Fault: Conditional Fault; Synchronize Faults

Debug: Modify Trace Controls; Mark; Force Mark

Processor Management: Flush Local Registers; Modify Arithmetic Controls;
Modify Process Controls; *System Control; *DMA Control

Atomic: Atomic Add; Atomic Modify

NOTE:
Instructions marked by (*) are 80960CA extensions to the 80960 instruction set.
3.0 PACKAGE INFORMATION

3.1. Package Introduction
This section describes the pins, pinouts and thermal characteristics for the 80960CA in the
168-pin Ceramic Pin Grid Array (PGA) package and the 196-pin Plastic Quad Flat Package (PQFP).
For complete package specifications and information, see the Intel Packaging Specification
(Order # 231369).

3.2. Pin Descriptions
The 80960CA pins are described in this section. Table 1 presents the legend for interpreting
the pin descriptions in the following tables. The pins associated with the 32-bit demultiplexed
processor bus are described in Table 2. The pins associated with basic processor configuration
and control are described in Table 3. The pins associated with the 80960CA DMA Controller and
Interrupt Unit are described in Table 4.

All pins float while the processor is in the ONCE mode.

Figure 3 provides an example pin description table entry. The "I/O" signifies that the data
pins are input-output. The "S" indicates the pins are synchronous to PCLK2:1. The "H(Z)"
indicates that these pins float while the processor bus is in a Hold Acknowledge state. The
"R(Z)" notation indicates that the pins also float while RESET is low.

Table 1. Pin Description Nomenclature

Symbol    Description
I         Input only pin
O         Output only pin
I/O       Pin can be either an input or output
-         Pins "must be" connected as described
S(...)    Synchronous. Inputs must meet setup and hold times relative to PCLK2:1 for
          proper operation of the processor. All outputs are synchronous to PCLK2:1.
            S(E)  Edge sensitive input
            S(L)  Level sensitive input
A(...)    Asynchronous. Inputs may be asynchronous to PCLK2:1.
            A(E)  Edge sensitive input
            A(L)  Level sensitive input
H(...)    While the processor's bus is in the Hold Acknowledge or Bus Backoff state,
          the pin:
            H(1)  is driven to VCC
            H(0)  is driven to VSS
            H(Z)  floats
            H(Q)  continues to be a valid output
R(...)    While the processor's RESET pin is low, the pin:
            R(1)  is driven to VCC
            R(0)  is driven to VSS
            R(Z)  floats
            R(Q)  continues to be a valid output

Name: D31:0
Type: I/O, S(L), H(Z), R(Z)
Description: DATA BUS carries 32, 16 or 8-bit data quantities depending on bus width
configuration. The least significant bit of the data is carried on D0 and the most
significant on D31. When the bus is configured for 8 bit data, the lower 8 data lines,
D7:0, are used. For 16 bit data widths, D15:0 are used. For 32 bit data the full data bus
is used.

Figure 3. Example Pin Description Entry
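The Type notation defined in Table 1 is regular enough to decode mechanically. The following is a minimal sketch, assuming the Type field is written as whitespace-separated symbols such as "I/O S(L) H(Z) R(Z)". The function name, dictionary keys and wording of the decoded states are illustrative choices, not part of the data sheet.

```python
import re

# Decoded meanings taken from the Table 1 legend.
STATES = {"1": "driven to VCC", "0": "driven to VSS",
          "Z": "floats", "Q": "valid output"}
SENSITIVITY = {"E": "edge", "L": "level"}
FLAG = re.compile(r"([SAHR])(?:\(([^)]*)\))?")


def parse_pin_type(type_field):
    """Decode a Type entry such as 'I/O S(L) H(Z) R(Z)' into a dictionary."""
    direction, *flags = type_field.split()
    info = {"direction": direction}
    for flag in flags:
        match = FLAG.fullmatch(flag)
        if match is None:
            raise ValueError(f"unrecognized type flag: {flag!r}")
        letter, arg = match.groups()
        if letter in ("S", "A"):
            info["timing"] = "synchronous" if letter == "S" else "asynchronous"
            # Outputs are listed as a bare "S", so sensitivity may be None.
            info["sensitivity"] = SENSITIVITY.get(arg)
        elif letter == "H":
            info["hold"] = STATES.get(arg, arg)   # unknown codes pass through
        else:
            info["reset"] = STATES.get(arg, arg)
    return info


if __name__ == "__main__":
    print(parse_pin_type("I/O S(L) H(Z) R(Z)"))
```

Combined codes that the legend does not define individually are passed through unchanged, so nothing is guessed for them.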
Table 2. 80960CA Pin Description-External Bus Signals

Name: A31:2
Type: O, S, H(Z), R(Z)
Description: ADDRESS BUS carries the upper 30 bits of the physical address. A31 is the most
significant address bit and A2 is the least significant. During a bus access, A31:2
identify all external addresses to word (4-byte) boundaries. The byte enable signals
indicate the selected byte in each word. During burst accesses, A3 and A2 increment to
indicate successive data cycles.

Name: D31:0
Type: I/O, S(L), H(Z), R(Z)
Description: DATA BUS carries 32, 16 or 8-bit data quantities depending on bus width
configuration. The least significant bit of the data is carried on D0 and the most
significant on D31. When the bus is configured for 8 bit data, the lower 8 data lines,
D7:0, are used. For 16 bit bus widths, D15:0 are used. For 32 bit bus widths the full data
bus is used.

Name: BE3, BE2, BE1, BE0
Type: O, S, H(Z), R(1)
Description: BYTE ENABLES select which of the four bytes addressed by A31:2 are active
during an access to a memory region configured for a 32-bit data-bus width. BE3 applies to
D31:24; BE2 applies to D23:16; BE1 applies to D15:8; and BE0 applies to D7:0.
  32-bit bus:
    BE3 - Byte Enable 3 - enable D31:24
    BE2 - Byte Enable 2 - enable D23:16
    BE1 - Byte Enable 1 - enable D15:8
    BE0 - Byte Enable 0 - enable D7:0
For accesses to a memory region configured for a 16-bit data-bus width, the processor
directly encodes BE3, BE1 and BE0 to provide BHE, A1 and BLE respectively.
  16-bit bus:
    BE3 - Byte High Enable (BHE) - enable D15:8
    BE2 - Not used (is driven high or low)
    BE1 - Address Bit 1 (A1)
    BE0 - Byte Low Enable (BLE) - enable D7:0
For accesses to a memory region configured for an 8-bit data-bus width, the processor
directly encodes BE1 and BE0 to provide A1 and A0 respectively.
  8-bit bus:
    BE3 - Not used (is driven high or low)
    BE2 - Not used (is driven high or low)
    BE1 - Address Bit 1 (A1)
    BE0 - Address Bit 0 (A0)

Name: W/R
Type: O, S, H(Z), R(0)
Description: WRITE/READ is low (0) for read requests and high (1) for write requests. The
W/R signal changes in the same clock cycle as ADS. It remains valid for the entire access
in non-pipelined regions. In pipelined regions, W/R may not be valid in the last cycle of
a read access.

Name: ADS
Type: O, S, H(Z), R(1)
Description: ADDRESS STROBE indicates valid address and the start of a new bus access.
ADS is asserted for the first clock of a bus access.

Name: READY
Type: I, S(L), H(Z), R(Z)
Description: READY is an input which signals the termination of a data transfer. READY is
used to indicate that read data on the bus is valid, or that a write-data transfer has
completed. The READY signal works in conjunction with the internally programmed wait-state
generator. If READY is enabled in a region, the pin is sampled after the programmed number
of wait-states has expired. If the READY pin is deasserted high, wait states will continue
to be inserted until READY becomes asserted low. This is true for the NRAD, NRDD, NWAD,
and NWDD wait states. The NXDA wait states cannot be extended.
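The three byte-enable encodings described for BE3:0 can be summarized in a small decoder. This is a sketch under stated assumptions: the byte enables are treated as active low (the reset-state table in this data sheet lists BE3:0 driven high as inactive), and the function name and return format are hypothetical.

```python
def decode_byte_enables(bus_width, be3, be2, be1, be0):
    """Interpret BE3:0 pin levels (0 or 1) for a 32-, 16- or 8-bit region.

    Assumption: enables are active low, so a level of 0 selects a byte lane.
    """
    if bus_width == 32:
        lanes = ["D31:24", "D23:16", "D15:8", "D7:0"]
        levels = [be3, be2, be1, be0]
        return {"active_lanes": [lane for lane, level in zip(lanes, levels)
                                 if level == 0]}
    if bus_width == 16:
        active = []
        if be3 == 0:          # BE3 acts as Byte High Enable (BHE)
            active.append("D15:8")
        if be0 == 0:          # BE0 acts as Byte Low Enable (BLE)
            active.append("D7:0")
        return {"active_lanes": active, "A1": be1}   # BE2 is not used
    if bus_width == 8:
        # BE3 and BE2 are not used; BE1 and BE0 carry address bits A1 and A0.
        return {"active_lanes": ["D7:0"], "A1": be1, "A0": be0}
    raise ValueError("bus_width must be 32, 16 or 8")


if __name__ == "__main__":
    print(decode_byte_enables(32, 1, 1, 0, 0))
    print(decode_byte_enables(16, 0, 1, 1, 1))
    print(decode_byte_enables(8, 1, 1, 1, 0))
```

Keeping the bus width as an explicit argument mirrors the fact that the same four pins change meaning per memory region.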
Name: BTERM
Type: I, S(L), H(Z), R(Z)
Description: BURST TERMINATE breaks up a burst access and causes another address cycle to
occur. The BTERM signal works in conjunction with the internally programmed wait-state
generator. If READY and BTERM are enabled in a region, the BTERM pin is sampled after the
programmed number of wait states has expired. When BTERM is asserted, additional wait
states are inserted until BTERM is deasserted. When BTERM is deasserted, a new ADS signal
is generated and the access is completed. The READY input is ignored when BTERM is
asserted. BTERM must be externally synchronized to satisfy the BTERM setup and hold times.

Name: WAIT
Type: O, S, H(Z), R(1)
Description: WAIT indicates the status of the internal wait state generator. WAIT is active
when wait states are being caused by the internal wait state generator and not by the
READY or BTERM inputs. WAIT can be used to derive a write-data strobe. WAIT can also be
thought of as a READY output that the processor provides when it is inserting wait states.

Name: BLAST
Type: O, S, H(Z), R(0)
Description: BURST LAST indicates the last transfer in a bus access. BLAST is asserted in
the last data transfer of burst and non-burst accesses after the wait state counter
reaches zero. BLAST remains active until the clock following the last cycle of the last
data transfer of a bus access. If the READY or BTERM input is used to extend wait states,
the BLAST signal remains active until READY or BTERM terminates the access.

Name: DT/R
Type: O, S, H(Z), R(0)
Description: DATA TRANSMIT/RECEIVE indicates direction for data transceivers. DT/R is used
in conjunction with DEN to provide control for data transceivers attached to the external
bus. When DT/R is low (0), the signal indicates that the processor will receive data.
Conversely, when high (1) the processor will send data. DT/R will change only while DEN
is high.

Name: DEN
Type: O, S, H(Z), R(1)
Description: DATA ENABLE indicates data cycles in a bus access. DEN is asserted (low) at
the start of the first data cycle of a bus request and is deasserted (high) at the end of
the last data cycle. DEN is used in conjunction with DT/R to provide control for data
transceivers attached to the external bus. DEN remains asserted for sequential reads from
pipelined memory regions. DEN is high when DT/R changes.

Name: LOCK
Type: O, S, H(Z), R(1)
Description: BUS LOCK indicates that an atomic read-modify-write operation is in progress.
LOCK may be used to prevent external agents from accessing memory which is currently
involved in an atomic operation. LOCK is asserted (0) in the first clock of an atomic
operation, and deasserted in the clock cycle following the last bus access for the atomic
operation. To allow the most flexibility for a memory system enforcement of locked
accesses, the processor will acknowledge a bus hold request when LOCK is asserted. The
processor will perform DMA transfers while LOCK is active.

Name: HOLD
Type: I, S(L), H(Z), R(Z)
Description: HOLD REQUEST signals that an external agent requests access to the external
bus. The processor asserts HOLDA after completing the current bus request. HOLD, HOLDA,
and BREQ are used together to arbitrate access to the processor's external bus by external
bus agents.

Name: BOFF
Type: I, S(L), H(Z), R(Z)
Description: BUS BACKOFF, when asserted (0), suspends the current access and causes the bus
pins to float. When the pin is deasserted (1), the ADS signal is asserted on the next
clock cycle and the access is resumed.
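The READY entry in Table 2 describes how external logic stretches an access: the pin is first sampled once the programmed wait states have expired, and further wait states are inserted for as long as READY is sampled high. A minimal sketch of that counting rule follows. The function name and the representation of READY as a list of sampled levels are assumptions made for illustration, and the NXDA wait states, which cannot be extended, are outside this model.

```python
def total_wait_states(programmed, ready_levels):
    """Count wait states for one data cycle in a READY-enabled region.

    programmed   -- wait states from the internal wait-state generator
    ready_levels -- READY pin levels (0 = asserted, 1 = deasserted) at each
                    successive sample taken after the programmed count expires
    """
    extra = 0
    for level in ready_levels:
        if level == 0:            # READY asserted low: the transfer completes
            return programmed + extra
        extra += 1                # READY high: one more wait state is inserted
    raise ValueError("READY was never asserted in the supplied samples")


if __name__ == "__main__":
    # Two programmed wait states, then READY stays high for two more samples.
    print(total_wait_states(2, [1, 1, 0]))
```

In this model, a region with READY tied low behaves exactly as if only the internal generator were in use.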
Name: HOLDA
Type: O, S, H(1), R(Q)
Description: HOLD ACKNOWLEDGE indicates to a bus requestor that the processor has
relinquished control of the external bus. When HOLDA is asserted, the external address
bus, data bus, and bus control signals are floated. HOLD, BOFF, HOLDA and BREQ are used
together to arbitrate access to the processor's external bus by external bus agents. Since
the processor will grant HOLD requests and enter the Hold Acknowledge state even while
RESET is active, the state of the HOLDA pin will be independent of the RESET pin.

Name: BREQ
Type: O, S, H(Q), R(0)
Description: BUS REQUEST indicates that the processor wishes to perform a bus request.
BREQ can be used by external bus arbitration logic in conjunction with HOLD and HOLDA to
determine when to return mastership of the external bus to the processor.

Name: D/C
Type: O, S, H(Z), R(Z)
Description: DATA OR CODE indicates that a bus request is a data request (1) or an
instruction request (0). D/C has the same timing as W/R.

Name: DMA
Type: O, S, H(Z), R(Z)
Description: DMA ACCESS indicates whether the bus request was initiated by the DMA
controller. DMA will be asserted (low) for any DMA request. DMA will be deasserted (high)
for all other requests.

Name: SUP
Type: O, S, H(Z), R(Z)
Description: SUPERVISOR ACCESS indicates whether the bus request is issued while in
supervisor mode. SUP will be asserted (low) when the request has supervisor privileges,
and will be deasserted (high) otherwise. SUP can be used to isolate supervisor code and
data structures from non-supervisor requests.
Table 3. 80960CA Pin Description-Processor Control Signals

Name: RESET
Type: I, A(L), H(Z), R(Z), N(Z)
Description: RESET causes the chip to reset. When RESET is asserted (low), all external
signals return to the reset state. When RESET is deasserted, initialization begins. When
the two-x clock mode is selected, RESET must remain asserted for 16 PCLK2:1 cycles before
being deasserted in order to guarantee correct initialization of the processor. When the
one-x clock mode is selected, RESET must remain asserted for 10,000 PCLK2:1 cycles before
being deasserted in order to guarantee correct initialization of the processor. The
CLKMODE pin selects one-x or two-x input clock division of the CLKIN pin.
The processor's Hold Acknowledge bus state functions while the chip is reset. If the
processor's bus is in the Hold Acknowledge state when RESET is activated, the processor
will internally reset, but will maintain the Hold Acknowledge state on external pins until
the Hold request is removed. If a hold request is made while the processor is in the reset
state, the processor bus will grant HOLDA and enter the Hold Acknowledge state.

Name: FAIL
Type: O, S, H(Q), R(0)
Description: FAIL indicates failure of the processor's self-test performed at
initialization. When RESET is deasserted and the processor begins initialization, the FAIL
pin is asserted (0). An internal self-test is performed as part of the initialization
process. If this self-test passes, the FAIL pin is deasserted (1); otherwise it remains
asserted. The FAIL pin is reasserted while the processor performs an external bus
self-confidence test. If this self-test passes, the processor deasserts the FAIL pin and
branches to the user's initialization routine; otherwise the FAIL pin remains asserted.
Internal self-test and the use of the FAIL pin can be disabled with the STEST pin.
Name: STEST
Type: I, S(L), H(Z), R(Z)
Description: SELF TEST causes the processor's internal self-test feature to be enabled or
disabled at initialization. STEST is read on the rising edge of RESET. When asserted
(high) the processor's internal self-test and external bus confidence tests are performed
during processor initialization. When deasserted (low), only the internal self-test is not
performed during initialization.

Name: ONCE
Type: I, A(L), H(Z), R(Z)
Description: ON CIRCUIT EMULATION causes all outputs to be floated when asserted (low).
ONCE is continuously sampled while RESET is low, and is latched on the rising edge of
RESET. To place the processor in the ONCE state:
  (1) assert RESET and ONCE (order does not matter)
  (2) wait for at least 16 CLKIN periods in two-x mode, or 10,000 CLKIN periods in one-x
      mode, after VCC and CLKIN are within operating specifications
  (3) deassert RESET
  (4) wait at least 32 CLKIN periods
(The processor will now be latched in the ONCE state as long as RESET is high.)
To exit the ONCE state, bring VCC and CLKIN to operating conditions, then assert RESET and
bring ONCE high prior to deasserting RESET.
CLKIN must operate within the specified operating conditions of the processor until step 4
above has been completed. The CLKIN may then be changed to DC to achieve the lowest
possible ONCE mode leakage current.
ONCE can be used by emulator products or for board testers to effectively make an
installed processor transparent in the board.

Name: CLKIN
Type: I, A(E), H(Z), R(Z)
Description: CLOCK INPUT is an input for the external clock needed to run the processor.
The external clock is internally divided as prescribed by the CLKMODE pin to produce
PCLK2:1.

Name: CLKMODE
Type: I, A(L), H(Z), R(Z)
Description: CLOCK MODE selects the division factor applied to the external clock input
(CLKIN). When CLKMODE is high (1), CLKIN is divided by one to create PCLK2:1 and the
processor's internal clock. When CLKMODE is low (0), CLKIN is divided by two to create
PCLK2:1 and the processor's internal clock. CLKMODE should be tied high or low in a
system, as the clock mode is not latched by the processor. If left unconnected, the
processor will internally pull the CLKMODE pin low (0), enabling the two-x clock mode.

Name: PCLK2, PCLK1
Type: O, S, H(Q), R(Q)
Description: PROCESSOR OUTPUT CLOCKS provide a timing reference for all inputs and outputs
of the processor. All input and output timings are specified in relation to PCLK2 and
PCLK1. PCLK2 and PCLK1 are identical signals. Two output pins are provided to allow
flexibility in the system's allocation of capacitive loading on the clock. PCLK2:1 may
also be connected at the processor to form a single clock signal.

Name: VSS
Type: -
Description: GROUND connections consist of 24 pins which must be connected externally to a
VSS board plane.

Name: VCC
Type: -
Description: POWER connections consist of 24 pins which must be connected externally to a
VCC board plane.

Name: N/C
Type: -
Description: NO CONNECT pins must not be connected in a system.
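The CLKMODE and RESET entries in Table 3 together determine how long RESET has to be held low for a given input clock. The sketch below works through that arithmetic with exact fractions. The function names are hypothetical, and the conversion from cycle counts to nanoseconds is a derived convenience, not a figure taken from the data sheet.

```python
from fractions import Fraction


def pclk_mhz(clkin_mhz, clkmode):
    """PCLK2:1 frequency: CLKIN divided by one (CLKMODE = 1) or two (CLKMODE = 0)."""
    divisor = 1 if clkmode == 1 else 2
    return Fraction(clkin_mhz, divisor)


def min_reset_ns(clkin_mhz, clkmode):
    """Minimum RESET assertion time in nanoseconds.

    Uses the cycle counts from the RESET description: 16 PCLK2:1 cycles in
    two-x mode and 10,000 PCLK2:1 cycles in one-x mode.
    """
    cycles = 10_000 if clkmode == 1 else 16
    period_ns = Fraction(1000) / pclk_mhz(clkin_mhz, clkmode)
    return cycles * period_ns


if __name__ == "__main__":
    # Two-x mode with a 50 MHz CLKIN gives a 25 MHz PCLK (40 ns period).
    print(pclk_mhz(50, 0), "MHz,", min_reset_ns(50, 0), "ns minimum RESET low")
    # One-x mode with a 25 MHz CLKIN.
    print(pclk_mhz(25, 1), "MHz,", min_reset_ns(25, 1), "ns minimum RESET low")
```

The much longer one-x requirement dominates any board-level reset timing budget, which is why the mode is worth passing explicitly.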
Table 4. 80960CA Pin Description-DMA and Interrupt Unit Control Signals

Name: DREQ3, DREQ2, DREQ1, DREQ0
Type: I, A(L), H(Z), R(Z)
Description: DMA REQUEST causes a DMA transfer to be requested. Each of the four signals
requests a transfer on a single channel. DREQ0 requests channel 0, DREQ1 requests channel
1, etc. When two or more channels are requested simultaneously, the channel with the
highest priority is serviced first. The channel priority mode is programmable.

Name: DACK3, DACK2, DACK1, DACK0
Type: O, S, H(1), R(1)
Description: DMA ACKNOWLEDGE indicates that a DMA transfer is being executed. Each of the
four signals acknowledges a transfer for a single channel. DACK0 acknowledges channel 0,
DACK1 acknowledges channel 1, etc. DACK3:0 are active (0) when the requesting device of a
DMA is accessed.

Name: EOP3/TC3, EOP2/TC2, EOP1/TC1, EOP0/TC0
Type: I/O, A(L), H(Z/Q), R(Z)
Description: END OF PROCESS/TERMINAL COUNT can be programmed as either an input (EOP3:0)
or as an output (TC3:0), but not both. Each pin is individually programmable. When
programmed as an input, EOPx causes the termination of a current DMA transfer for the
channel corresponding to the EOPx pin. EOP0 corresponds to channel 0, EOP1 corresponds to
channel 1, etc. When a channel is configured for source and destination chaining, the EOP
pin for that channel causes termination of only the current buffer transferred and causes
the next buffer to be transferred. EOP3:0 are asynchronous inputs.
When programmed as an output, the channel's TCx pin indicates that the channel byte count
has reached 0 and a DMA has terminated. TCx is driven with the same timing as DACKx during
the last DMA transfer for a buffer. If the last bus request is executed as multiple bus
accesses, TCx will stay asserted for the entire bus request.

Name: XINT7, XINT6, XINT5, XINT4, XINT3, XINT2, XINT1, XINT0
Type: I, A(E/L), H(Z), R(Z)
Description: EXTERNAL INTERRUPT PINS cause interrupts to be requested. These pins can be
configured in three modes.
In the Dedicated Mode, each pin is a dedicated external interrupt source. Dedicated inputs
can be individually programmed to be level (low) or edge (falling) activated.
In the Expanded Mode, the 8 pins act together as an 8-bit vectored interrupt source. The
interrupt pins in this mode are level activated. Since the interrupt pins are active low,
the vector number requested is the one's complement of the positive logic value placed on
the port. This eliminates glue logic to interface to combinational priority encoders which
output negative logic.
In the Mixed Mode, XINT7:5 are dedicated sources and XINT4:0 act as the 5 most significant
bits of an expanded mode vector. The least significant bits are set to 010 internally.

Name: NMI
Type: I, A(E), H(Z), R(Z)
Description: NON-MASKABLE INTERRUPT causes a non-maskable interrupt event to occur. NMI is
the highest priority interrupt recognized. NMI is an edge (falling) activated source.
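The vector arithmetic described for the Expanded and Mixed interrupt modes can be illustrated in a few lines. This is a sketch, not the Interrupt Unit's actual implementation: the function names are hypothetical, and applying the same one's complement to XINT4:0 in Mixed Mode is an assumption, since the description only states the inversion explicitly for Expanded Mode.

```python
def expanded_mode_vector(xint7_0_levels):
    """Vector requested in Expanded Mode.

    xint7_0_levels -- the 8-bit positive-logic value on XINT7:0.
    The pins are active low, so the vector is the one's complement.
    """
    return ~xint7_0_levels & 0xFF


def mixed_mode_vector(xint4_0_levels):
    """Vector requested in Mixed Mode (inversion of XINT4:0 is assumed).

    XINT4:0 supply the 5 most significant bits; the 3 least significant
    bits are fixed at binary 010.
    """
    upper = ~xint4_0_levels & 0x1F
    return (upper << 3) | 0b010


if __name__ == "__main__":
    print(expanded_mode_vector(0xF7))      # only XINT3 driven low
    print(mixed_mode_vector(0b11110))      # only XINT0 driven low
```

Because the inversion is done by the processor, a priority encoder with negative-logic outputs can drive the port directly, as the Expanded Mode description notes.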
3.3. 80960CA Pinout

3.3.1 80960CA CPGA PINOUT
Tables 5 and 6 list the 80960CA pin names with package location. Figure 4a depicts the
complete 80960CA pinout as viewed from the top side of the component (i.e., pins facing
down). Figure 4b shows the complete 80960CA pinout as viewed from the pin-side of the
package (i.e., pins facing up). See Section 4.0, Electrical Specifications for
specifications and recommended connections.
Table 5. PGA Pin Name with Package Location (Signal Order)

Address Bus
Name ...... Location
A31 ........ S15
A30 ........ Q13
A29 ........ R14
A28 ........ Q14
A27 ........ S16
A26 ........ R15
A25 ........ S17
A24 ........ Q15
A23 ........ R16
A22 ........ R17
A21 ........ Q16
A20 ........ P15
A19 ........ P16
A18 ........ Q17
A17 ........ P17
A16 ........ N16
A15 ........ N17
A14 ........ M17
A13 ........ L16
A12 ........ L17
A11 ........ K17
A10 ........ J17
A9 ......... H17
A8 ......... G17
A7 ......... G16
A6 ......... F17
A5 ......... E17
A4 ......... E16
A3 ......... D17
A2 ......... D16

Data Bus
Name ...... Location
D31 ........ R03
D30 ........ Q05
D29 ........ S02
D28 ........ Q04
D27 ........ R02
D26 ........ Q03
D25 ........ S01
D24 ........ R01
D23 ........ Q02
D22 ........ P03
D21 ........ Q01
D20 ........ P02
D19 ........ P01
D18 ........ N02
D17 ........ N01
D16 ........ M01
D15 ........ L01
D14 ........ L02
D13 ........ K01
D12 ........ J01
D11 ........ H01
D10 ........ H02
D9 ......... G01
D8 ......... F01
D7 ......... E01
D6 ......... F02
D5 ......... D01
D4 ......... E02
D3 ......... C01
D2 ......... D02
D1 ......... C02
D0 ......... E03

Bus Control
Name ...... Location
BE3 ........ S05
BE2 ........ S06
BE1 ........ S07
BE0 ........ R09
W/R ........ S10
ADS ........ R06
READY ...... S03
BTERM ...... R04
WAIT ....... S12
BLAST ...... S08
DT/R ....... S11
DEN ........ S09
LOCK ....... S14
HOLD ....... R05
HOLDA ...... S04
BREQ ....... R13
D/C ........ S13
DMA ........ R12
SUP ........ Q12
BOFF ....... B01

Processor Control
Name ...... Location
RESET ...... A16
FAIL ....... A02
STEST ...... B02
ONCE ....... C03
CLKIN ...... C13
CLKMODE .... C14
PCLK1 ...... B14
PCLK2 ...... B13
VSS ........ C07, C08, C09, C10, C11, C12, F15, G03, G15, H03, H15, J03, J15, K03, K15,
             L03, L15, M03, M15, Q07, Q08, Q09, Q10, Q11
VCC ........ B07, B09, B10, B11, B12, C06, E15, F03, F16, G02, H16, J02, J16, K02, K16,
             M02, M16, N03, N15, Q06, R07, R08, R10, R11
No Connect . A01, A03, A04, A05, B03, B04, C04, C05, D03

I/O
Name ...... Location
DREQ3 ...... A07
DREQ2 ...... B06
DREQ1 ...... A06
DREQ0 ...... B05
DACK3 ...... A10
DACK2 ...... A09
DACK1 ...... A08
DACK0 ...... B08
EOP/TC3 .... A14
EOP/TC2 .... A13
EOP/TC1 .... A12
EOP/TC0 .... A11
XINT7 ...... C17
XINT6 ...... C16
XINT5 ...... B17
XINT4 ...... C15
XINT3 ...... B16
XINT2 ...... A17
XINT1 ...... A15
XINT0 ...... B15
NMI ........ D15
Table 6. PGA Pin Name with Package Location (Pin Order)

Location ... Name
A01 ........ NC
A02 ........ FAIL
A03 ........ NC
A04 ........ NC
A05 ........ NC
A06 ........ DREQ1
A07 ........ DREQ3
A08 ........ DACK1
A09 ........ DACK2
A10 ........ DACK3
A11 ........ EOP/TC0
A12 ........ EOP/TC1
A13 ........ EOP/TC2
A14 ........ EOP/TC3
A15 ........ XINT1
A16 ........ RESET
A17 ........ XINT2
B01 ........ BOFF
B02 ........ STEST
B03 ........ NC
B04 ........ NC
B05 ........ DREQ0
B06 ........ DREQ2
B07 ........ VCC
B08 ........ DACK0
B09 ........ VCC
B10 ........ VCC
B11 ........ VCC
B12 ........ VCC
B13 ........ PCLK2
B14 ........ PCLK1
B15 ........ XINT0
B16 ........ XINT3
B17 ........ XINT5
C01 ........ D3
C02 ........ D1
C03 ........ ONCE
C04 ........ NC
C05 ........ NC
C06 ........ VCC
C07 ........ VSS
C08 ........ VSS
C09 ........ VSS
C10 ........ VSS
C11 ........ VSS
C12 ........ VSS
C13 ........ CLKIN
C14 ........ CLKMODE
C15 ........ XINT4
C16 ........ XINT6
C17 ........ XINT7
D01 ........ D5
D02 ........ D2
D03 ........ NC
D15 ........ NMI
D16 ........ A2
D17 ........ A3
E01 ........ D7
E02 ........ D4
E03 ........ D0
E15 ........ VCC
E16 ........ A4
E17 ........ A5
F01 ........ D8
F02 ........ D6
F03 ........ VCC
F15 ........ VSS
F16 ........ VCC
F17 ........ A6
G01 ........ D9
G02 ........ VCC
G03 ........ VSS
G15 ........ VSS
G16 ........ A7
G17 ........ A8
H01 ........ D11
H02 ........ D10
H03 ........ VSS
H15 ........ VSS
H16 ........ VCC
H17 ........ A9
J01 ........ D12
J02 ........ VCC
J03 ........ VSS
J15 ........ VSS
J16 ........ VCC
J17 ........ A10
K01 ........ D13
K02 ........ VCC
K03 ........ VSS
K15 ........ VSS
K16 ........ VCC
K17 ........ A11
L01 ........ D15
L02 ........ D14
L03 ........ VSS
L15 ........ VSS
L16 ........ A13
L17 ........ A12
M01 ........ D16
M02 ........ VCC
M03 ........ VSS
M15 ........ VSS
M16 ........ VCC
M17 ........ A14
N01 ........ D17
N02 ........ D18
N03 ........ VCC
N15 ........ VCC
N16 ........ A16
N17 ........ A15
P01 ........ D19
P02 ........ D20
P03 ........ D22
P15 ........ A20
P16 ........ A19
P17 ........ A17
Q01 ........ D21
Q02 ........ D23
Q03 ........ D26
Q04 ........ D28
Q05 ........ D30
Q06 ........ VCC
Q07 ........ VSS
Q08 ........ VSS
Q09 ........ VSS
Q10 ........ VSS
Q11 ........ VSS
Q12 ........ SUP
Q13 ........ A30
Q14 ........ A28
Q15 ........ A24
Q16 ........ A21
Q17 ........ A18
R01 ........ D24
R02 ........ D27
R03 ........ D31
R04 ........ BTERM
R05 ........ HOLD
R06 ........ ADS
R07 ........ VCC
R08 ........ VCC
R09 ........ BE0
R10 ........ VCC
R11 ........ VCC
R12 ........ DMA
R13 ........ BREQ
R14 ........ A29
R15 ........ A26
R16 ........ A23
R17 ........ A22
S01 ........ D25
S02 ........ D29
S03 ........ READY
S04 ........ HOLDA
S05 ........ BE3
S06 ........ BE2
S07 ........ BE1
S08 ........ BLAST
S09 ........ DEN
S10 ........ W/R
S11 ........ DT/R
S12 ........ WAIT
S13 ........ D/C
S14 ........ LOCK
S15 ........ A31
S16 ........ A27
S17 ........ A25
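The locations in the pin-order table follow a regular pattern: rows A, B, C, Q, R and S use all 17 columns, while the rows between them use only columns 01 to 03 and 15 to 17. The sketch below encodes that pattern, which is inferred from the table and is not a package drawing, and checks that it yields the 168 positions of the PGA package. All names in the code are hypothetical.

```python
# Grid layout inferred from the pin-order table (row letters skip I and O).
FULL_ROWS = "ABCQRS"                 # rows populated in all 17 columns
EDGE_ONLY_ROWS = "DEFGHJKLMNP"       # rows populated only near the package edge
EDGE_COLUMNS = (1, 2, 3, 15, 16, 17)


def is_valid_location(location):
    """Return True if a string such as 'S15' names a populated PGA position."""
    if len(location) < 2 or not location[1:].isdigit():
        return False
    row, column = location[0].upper(), int(location[1:])
    if row in FULL_ROWS:
        return 1 <= column <= 17
    if row in EDGE_ONLY_ROWS:
        return column in EDGE_COLUMNS
    return False


def all_locations():
    """List every populated position as a zero-padded location string."""
    return [f"{row}{column:02d}"
            for row in sorted(FULL_ROWS + EDGE_ONLY_ROWS)
            for column in range(1, 18)
            if is_valid_location(f"{row}{column}")]


if __name__ == "__main__":
    print(len(all_locations()), "pin positions")
```

A check like this is mainly useful for catching transcription errors in a netlist, for example the letter O appearing where the digit 0 or the row Q was intended.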
[Figures 4a and 4b: 80960CA PGA pinout diagrams (top view and pin-side view). The pin
assignments are listed in Tables 5 and 6.]

[Plot: output delay or hold time relative to nominal versus load capacitance CL (pF). One
curve covers all outputs except LOCK, DMA, SUP, HOLDA, BREQ, DACK3:0, EOP3:0/TC3:0 and
FAIL; a second curve covers LOCK, DMA, SUP, HOLDA, BREQ, DACK3:0, EOP3:0/TC3:0 and FAIL.]
NOTE:
PCLK Load = 50 pF
Figure 22. Output Delay or Hold vs Load Capacitance
[Plot: rise and fall time derating, measured from 0.8V to 2.0V, versus load capacitance
CL (pF). Curve (a): all outputs except LOCK, DMA, SUP, HOLDA, BREQ, DACK3:0, EOP3:0/TC3:0
and FAIL. Curve (b): LOCK, DMA, SUP, HOLDA, BREQ, DACK3:0, EOP3:0/TC3:0 and FAIL.]
Figure 23. Rise and Fall Time Derating at Highest Operating Temperature and Minimum VCC
[Plot: ICC versus fPCLK (MHz). ICC is ICC under test conditions.]
Figure 24. ICC vs Frequency and Temperature
5.0 RESET, BACKOFF AND HOLD ACKNOWLEDGE

The following table lists the condition of each processor output pin while RESET is
asserted (low).

Table 13. Reset Conditions

Pins        State During Reset (HOLDA inactive)(1)
A31:A2      Floating
D31:D0      Floating
BE3:0       Driven high (Inactive)
W/R         Driven low (Read)
ADS         Driven high (Inactive)
WAIT        Driven high (Inactive)
BLAST       Driven low (Active)
DT/R        Driven low (Receive)
DEN         Driven high (Inactive)
LOCK        Driven high (Inactive)
BREQ        Driven low (Inactive)
D/C         Floating
DMA         Floating
SUP         Floating
FAIL        Driven low (Active)
DACK3       Driven high (Inactive)
DACK2       Driven high (Inactive)
DACK1       Driven high (Inactive)
DACK0       Driven high (Inactive)
EOP/TC3     Floating (set to input mode)
EOP/TC2     Floating (set to input mode)
EOP/TC1     Floating (set to input mode)
EOP/TC0     Floating (set to input mode)

NOTE:
(1) With regard to bus output pin state only, the Hold Acknowledge state takes precedence
over the reset state. Although asserting the RESET pin will internally reset the
processor, the processor's bus output pins will not enter the reset state if it has
granted Hold Acknowledge to a previous HOLD request (HOLDA is active). Furthermore, the
processor will grant new HOLD requests and enter the Hold Acknowledge state even while in
reset.
For example, if HOLDA is not active and the processor is in the reset state, then HOLD is
asserted, the processor's bus pins will enter the Hold Acknowledge state and HOLDA will be
granted. The processor will not be able to perform memory accesses until the HOLD request
is removed, even if the RESET pin is brought high. This operation is provided to simplify
boot-up synchronization among multiple processors sharing the same bus.

The following table lists the condition of each processor output pin while HOLDA is
asserted (low).

Table 14. Hold Acknowledge and Backoff Conditions

Pins        State During HOLDA
A31:A2      Floating
D31:D0      Floating
BE3:0       Floating
W/R         Floating
ADS         Floating
WAIT        Floating
BLAST       Floating
DT/R        Floating
DEN         Floating
LOCK        Floating
BREQ        Driven (high or low)
D/C         Floating
DMA         Floating
SUP         Floating
FAIL        Driven high (Inactive)
DACK3       Driven high (Inactive)
DACK2       Driven high (Inactive)
DACK1       Driven high (Inactive)
DACK0       Driven high (Inactive)
EOP/TC3     Driven if output
EOP/TC2     Driven if output
EOP/TC1     Driven if output
EOP/TC0     Driven if output
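The precedence rule in the note to Table 13, that the Hold Acknowledge state wins over the reset state for bus output pins, can be expressed as a simple lookup order. The sketch below covers only a handful of pins copied from Tables 13 and 14. The function name, the dictionary names and the "normal operation" label are illustrative assumptions.

```python
# Pin states copied from Table 13 (reset) and Table 14 (hold acknowledge)
# for a few representative bus output pins.
RESET_STATE = {
    "ADS": "driven high", "W/R": "driven low", "BLAST": "driven low",
    "BE3:0": "driven high", "DEN": "driven high", "D/C": "floating",
}
HOLDA_STATE = {
    "ADS": "floating", "W/R": "floating", "BLAST": "floating",
    "BE3:0": "floating", "DEN": "floating", "D/C": "floating",
}


def bus_pin_state(pin, reset_asserted, holda_active):
    """Return the state of a bus output pin, giving HOLDA precedence over RESET."""
    if holda_active:              # Hold Acknowledge takes precedence
        return HOLDA_STATE[pin]
    if reset_asserted:
        return RESET_STATE[pin]
    return "normal operation"


if __name__ == "__main__":
    print(bus_pin_state("ADS", reset_asserted=True, holda_active=False))
    print(bus_pin_state("ADS", reset_asserted=True, holda_active=True))
```

Checking HOLDA first is the whole point of the design: in a multiprocessor system a bus master held off during boot keeps its bus pins floating even though it is internally reset.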
[Reset and ONCE-mode timing diagrams. Annotations recoverable from the diagrams:
- Minimum RESET low time: 16 PCLK periods.
- RESET high to first bus activity: approximately 32 PCLK periods.
- VCC and CLKIN stable to outputs valid: maximum 32 CLKIN periods.
- CLKIN may not float. It must be driven high or low, or continue to run.
- CLKIN and VCC stable, with RESET low and ONCE low, to RESET high: minimum 32 CLKIN
  periods in two-x mode, 10,000 CLKIN periods in one-x mode.]
[Waveforms: CLKIN; PCLK2:1 (Case 1); PCLK2:1 (Case 2); timing parameter tSYNC shown with
its MIN and MAX bounds.]
NOTE:
Case 1 and Case 2 show two possible polarities of PCLK2:1.
Figure 28. Clock Synchronization in the 2x Clock Mode
[Timing diagram with its Region Table Entry settings: pipelining off, external ready
control disabled, burst disabled. Signals shown: PCLK, ADS, A31:4, SUP, DMA, D/C, LOCK,
BE3:0, W/R, BLAST, DT/R, DEN, A3:2, WAIT, D31:0.]
Figure 29. Non-Burst, Non-Pipelined Accesses without wait states
Region Table Entry
v
v
.~
By1.
Order
0
X
b!1s.31·23
0 ... 0
till 22
,
..
bot"
Bus
Wld1h
[
ADS
[
BE3:O
[
wlii
[
BLAST
[
DTIA
[
DEN
[
A31:2,
SiJP,OOA,
O/~,~
WAIT
X
0
0
031:0 [
N.d.
X
X
"
"
3
blernal
R•• dy
bits 7 3
bII'
bill
X
3
Oft
"
000"
"
Nrad
bits 9·8
2
'Plpe-
lining
Nrdd
,,,1
XXXXI
[
[
NWDd
Ms20-19 bits 18 17 bIIs 16-12 Ms 1110
A
PCLK
Nwdd
D
Burat
Conlrot
""0
Disabled DI •• blad
"
0
A
Valid
I
I
:
,
:: .:: :: :: 0!
I
I
,
I
I
I
I
I
,
I
-- .... --- .. '......... _ .. - ....... -- .... - .. -- .. _.................
I
I
I
I,
IN
-- .. --
,
I
,
,
I
0
270727-27
Figure 30. Non-Burst, Non-Pipelined Read with wait states
3-211
[Timing diagram 270727-28: region table entry settings and bus signal waveforms.]
Figure 31. Non-Burst, Non-Pipelined Write with wait states
[Timing diagram 270727-29: region table entry settings and bus signal waveforms.]
Figure 32. Burst, Non-Pipelined Read without wait states, 32-bit bus
[Timing diagram 270727-30: region table entry settings and bus signal waveforms.]
Figure 33. Burst, Non-Pipelined Read with wait states, 32-bit bus
[Timing diagram 270727-31: region table entry settings and bus signal waveforms.]
Figure 34. Burst, Non-Pipelined Write without wait states, 32-bit bus
[Timing diagram 270727-32: region table entry settings and bus signal waveforms; the data bus carries OUT0 through OUT3.]
Figure 35. Burst, Non-Pipelined Write with wait states, 32-bit bus
[Timing diagram 270727-33: region table entry settings and bus signal waveforms; data is transferred on D15:0.]
Figure 36. Burst, Non-Pipelined Read with wait states, 16-bit bus
[Timing diagram 270727-34: region table entry settings and bus signal waveforms; data is transferred on D7:0.]
Figure 37. Burst, Non-Pipelined Read with wait states, 8-bit bus
[Timing diagram 270727-35: region table entry settings and bus signal waveforms. Annotations: "Non-pipelined access concludes, pipelined reads begin" and "Pipelined reads conclude, non-pipelined accesses begin".]
Figure 38. Non-Burst, Pipelined Read without wait states, 32-bit bus
[Timing diagram 270727-36: region table entry settings and bus signal waveforms. Annotation: "Non-pipelined access concludes, pipelined reads begin".]
Figure 39. Non-Burst, Pipelined Read with wait states, 32-bit bus
[Timing diagram 270727-37: region table entry settings and bus signal waveforms.]
Figure 40. Burst, Pipelined Read without wait states, 32-bit bus
[Timing diagram 270727-38: region table entry settings and bus signal waveforms.]
Figure 41. Burst, Pipelined Read with wait states, 32-bit bus
[Timing diagram 270727-39: region table entry settings and bus signal waveforms. Annotation: "Pipelined reads conclude, non-pipelined accesses begin".]
Figure 42. Burst, Pipelined Read with wait states, 16-bit bus
[Timing diagram 270727-40: region table entry settings and bus signal waveforms.]
Figure 43. Burst, Pipelined Read with wait states, 8-bit bus
[Timing diagram 270727-41. Two accesses are shown: a quad-word read (NRAD = 0, NRDD = 0, NXDA = 0, Ready Enabled) and a quad-word write (NWAD = 1, NWDD = 0, NXDA = 0, Ready Enabled).]
Figure 44. Using External READY
[Timing diagram 270727-42: bus signal waveforms for a burst access terminated by BTERM.]
NOTE:
READY adds memory access time to data transfers, whether or not the bus access is a burst access. BTERM interrupts
a bus access, whether or not the bus access has more data transfers pending. Either the READY signal or the BTERM
signal will terminate a bus access if the signal is asserted during the last (or only) data transfer of the bus access.
Figure 45. Terminating a Burst with BTERM
[Timing diagram 270727-66: burst and non-burst accesses, including D31:0 for writes. The request is marked with "Begin Request", "Suspend Request", "Resume Request" and "End Request"; regions are labeled "BOFF may not be asserted" and "BOFF may be asserted to suspend request".]
NOTE: READY/BTERM must be enabled; NRAD, NRDD, NWAD, NWDD = 0.
Figure 46. BOFF Functional Timing
[Timing diagram 270727-67: a word read with NRAD = 1, NXDA = 1, a hold state, a word read with NRAD = 0, NXDA = 0, and a second hold state. Signals shown: PCLK2:1; A31:2, SUP, DMA, D/C, BE3:0, WAIT, DEN, DT/R; HOLD; HOLDA.]
Figure 47. HOLD Functional Timing
[Timing diagram 270727-70. Signals shown: PCLK2:1 (system clock), ADS, !(BLAST & READY), DACKx (all modes), DREQx (Case 1, Note 1) and DREQx (Case 2, Note 2). Annotations mark the start of the DMA bus request, the DMA acknowledge, and DREQ held high to prevent the next DMA cycle.]
NOTES:
1. Case 1: DREQ must deassert before DACK deasserts. Applications are Fly-by and some packing and unpacking
modes, adjacent load-stores or store-loads, loads followed by loads, and stores followed by stores.
2. Case 2: DREQ must be deasserted by the second clock (rising edge) after DACK is driven high. Applications are non
fly-by transfers and adjacent load-stores or store-loads.
3. DACK is asserted for the duration of a DMA bus request. The request may consist of multiple bus accesses (defined
by ADS and BLAST). Refer to the User's Manual for the definitions of "access" and "request".
Figure 48. DREQ and DACK Functional Timing
[Timing diagram 270727-71: EOP pulse, marked "2 CLKS MIN" and "15 CLKS MAX".]
NOTE:
EOP has the same AC Timing Requirements as DREQ to prevent unwanted DMA requests.
EOP is NOT edge triggered. EOP must be held for a minimum of 2 clock cycles; then EOP must be deasserted
within 15 clock cycles.
Figure 49. EOP Functional Timing
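The EOP note amounts to a window on the asserted pulse width. The following is a minimal sketch of a check a test bench might apply, assuming the note means the pulse must last at least 2 and at most 15 clock cycles, both bounds inclusive; the function and constant names are illustrative.

```python
# Pulse-width limits for EOP, taken from the note under Figure 49.
# Treating both limits as inclusive is an assumption of this sketch.
EOP_MIN_CLOCKS = 2
EOP_MAX_CLOCKS = 15


def eop_pulse_ok(asserted_clocks: int) -> bool:
    """Return True if an EOP pulse of this many clocks meets the note."""
    return EOP_MIN_CLOCKS <= asserted_clocks <= EOP_MAX_CLOCKS


if __name__ == "__main__":
    for width in (1, 2, 15, 16):
        print(width, eop_pulse_ok(width))
```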
[Timing diagram 270727-72. Signals shown: PCLK2, DREQ, ADS, DACK, TC.]
NOTE:
Terminal Count becomes active during the last bus request of a buffer transfer. If the last LOAD/STORE bus request is
executed as multiple bus accesses, the TC will be active for the entire bus request. Refer to the User's Manual for
further information.
Figure 50. Terminal Count Functional Timing
[Timing diagram 270727-73: FAIL behavior after RESET, through the internal self test (approximately 60,000 cycles, marked PASS) and the bus test (marked PASS). Intervals of 5 cycles and 102 cycles are also marked.]
Figure 51. FAIL Functional Timing
[Diagram 270727-68: short-word, word and double-word loads and stores plotted against word offset, showing whether each access is issued as a single aligned request or split into a sequence of byte, short and word requests. Sequences legible in the scan include "BYTE, BYTE", "SHORT, SHORT", "BYTE, SHORT, BYTE", "BYTE, SHORT, WORD, BYTE", "SHORT, WORD, SHORT" and "BYTE, WORD, SHORT, BYTE".]
Figure 52. A Summary of Aligned and Unaligned Transfers for Little Endian Regions
[Diagram 270727-69: triple-word and quad-word loads and stores plotted against word offset. Legible entries include "ONE THREE-WORD REQUEST (ALIGNED)", "BYTE, SHORT, WORD, WORD, BYTE", "SHORT, WORD, WORD, SHORT", "BYTE, WORD, WORD, SHORT, BYTE", "WORD, WORD, WORD", "ONE FOUR-WORD REQUEST (ALIGNED)", "WORD, WORD, WORD, WORD" and "DOUBLE-WORD, DOUBLE-WORD"; the remaining quad-word entries are only partly legible.]
Figure 53. A Summary of Aligned and Unaligned Transfers for Little Endian Regions (Continued)
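The request sequences listed in Figures 52 and 53 for accesses that start on an odd byte or short boundary are consistent with a simple greedy rule: at each step, issue the largest naturally aligned piece (word, short or byte) that fits in the bytes remaining. The sketch below illustrates that rule only. It is an assumption drawn from the legible figure labels, not a description of the bus controller, and it deliberately does not model the cases where the processor issues a single multi-word or double-word request.

```python
# Greedy split of an access into naturally aligned BYTE/SHORT/WORD pieces.
# The rule is inferred from the figure labels; names are illustrative.
PIECE_NAMES = {1: "BYTE", 2: "SHORT", 4: "WORD"}


def split_access(address: int, length: int) -> list[tuple[int, str]]:
    """Return (address, piece name) pairs covering `length` bytes."""
    pieces = []
    while length > 0:
        # Pick the largest piece that is aligned here and still fits.
        # A byte always qualifies, so the loop always finds a size.
        for size in (4, 2, 1):
            if address % size == 0 and size <= length:
                break
        pieces.append((address, PIECE_NAMES[size]))
        address += size
        length -= size
    return pieces


if __name__ == "__main__":
    # Word (4 bytes) at byte offset 1, then double-word (8 bytes) at offset 3.
    print([name for _, name in split_access(1, 4)])
    print([name for _, name in split_access(3, 8)])
```

For example, a word access at byte offset 1 comes out as byte, short, byte, which matches the "BYTE, SHORT, BYTE REQUESTS" label in Figure 52.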
[Timing diagram 270727-46; the figure caption is not legible. Legible labels: "Write, NWAD = 2, NXDA = 0", a second access labeled "NRAD = 1, NXDA = 0", "Idle bus (not in Hold Acknowledge state)" and "Ready Disabled". Signals shown: PCLK, ADS, A31:4, SUP, DMA, INST, D/C, BE3:0, W/R, BLAST, DT/R, DEN, A3, A2, D31:0, WAIT, READY, BTERM.]
i960™ MC PROCESSOR
PRODUCT OVERVIEW
This chapter provides an overview of the architecture
of the i960 MC processor.
The i960 MC processor is the military-grade member of
a new family of processors from Intel. This processor
family is based on a new 32-bit architecture called the
i960 architecture. The i960 architecture has been designed specifically to meet the needs of embedded applications such as avionics, aerospace, weapons systems,
robotics and instrumentation, where high reliability is
critical. It represents a renewed commitment from Intel
to provide reliable, high-performance processors and
controllers for the embedded processor marketplace.
The i960 architecture can best be characterized as a
high-performance computing engine. It features high-speed instruction execution and ease of programming. It is also easily extensible, allowing processors
and controllers based on this architecture to be conveniently customized to meet the needs of specific processing and control applications.
Some of the important attributes of the i960 architecture include:
• full 32-bit registers
• high-speed, pipelined instruction execution
• a convenient program execution environment with 32 general-purpose registers and a versatile set of special-function registers
• a highly optimized procedure call mechanism that features on-chip caching of local variables and parameters
• extensive facilities for handling interrupts and faults
• extensive tracing facilities to support efficient program debugging and monitoring
• register scoreboarding and write buffering to permit efficient operation with lower performance memory subsystems
The i960 MC processor implements the i960 architecture, plus it offers several extensions to the architecture.
Some of these extensions, such as on-chip support for
floating-point arithmetic, virtual memory management
and multitasking, are designed to enhance overall system performance. Several other extensions are designed
to enhance system reliability and robustness. These extensions include facilities for hardware enforced protection of software modules and for creating fault tolerant
systems through the use of redundant processors.
The following sections describe those features of the
i960 architecture that are provided to streamline code
execution and simplify programming. The extensions to
this architecture provided in the i960 MC processor are
described at the end of the chapter.
HIGH PERFORMANCE PROGRAM
EXECUTION
Much of the design of the i960 architecture has been
aimed at maximizing the processor's computational
and data processing speed through increased parallelism. The following paragraphs describe several of the
mechanisms and techniques used to accomplish this
goal, including:
• an efficient load and store memory-access model
• caching of code and procedural data
• overlapped execution of instructions
• many one- or two-clock-cycle instructions
Load and Store Model
One of the more important features of the i960 architecture is that most of its operations are performed on
operands in registers, rather than in memory. For example, all the arithmetic, logical, comparison, branching and bit operations are performed with registers and
literals.
This feature provides two benefits. First, it increases
program execution speed by minimizing the number of
memory accesses required to execute a program. Second, it reduces memory latency encountered when using slower, lower-cost memory parts.
To support this concept, the architecture provides a
generous supply of general-purpose registers. For each
procedure, 32 registers are available (28 of which are
available for general use). These registers are divided
into two types: global and local. Both of these types of
registers can be used for general storage of operands.
The only difference is that global registers retain their
contents across procedure boundaries, whereas the
processor allocates a new set of local registers each time
a new procedure is called.
Order Number: 272031-001
The architecture also provides a set of fast, versatile
load and store instructions. These instructions allow
burst transfers of 1, 2, 4, 8, 12 or 16 bytes of information between memory and the registers.
On-Chip Caching of Code and Data
To further reduce memory accesses, the architecture
offers two mechanisms for caching code and data on
chip: an instruction cache and multiple sets of local
registers. The instruction cache allows prefetching of
blocks of instructions from memory, which helps ensure
that the instruction execution pipeline is supplied with
a steady stream of instructions. It also reduces the
number of memory accesses required when performing
iterative operations such as loops. (The size of the instruction cache can vary. With the i960 MC processor,
it is 512 bytes.)
To optimize the architecture's procedure call mechanism, the processor provides multiple sets of local registers. This allows the processor to perform most procedure calls without having to write the local registers out
to the stack in memory.
(The number of local-register sets provided depends on
the processor implementation. The i960 MC processor
provides four sets of local registers.)
Single-Clock Instructions
It is the intent of the i960 architecture that a processor
be able to execute commonly used instructions such as
move, add, subtract, logical operations, compare and
branch in a minimum number of clock cycles (preferably one clock cycle). The architecture supports this
concept in several ways. For example, the load and
store model described earlier in this chapter (with its
concentration on register-to-register operations) allows
simple operations to be performed without the overhead of memory-to-memory operations.
Also, all the instructions in the i960 architecture are
32 bits or 64 bits long and aligned on 32-bit boundaries.
This feature allows instructions to be decoded in one
clock cycle. It also eliminates the need for an instruction-alignment stage in the pipeline.
The design of the i960 MC processor takes full advantage of these features of the architecture, resulting in
more than 50 instructions that can be executed in a
single clock cycle.
Overlapped Instruction Execution
Another technique that the i960 architecture employs
to enhance program execution speed is overlapping the
execution of some instructions. This is accomplished
through two mechanisms: register scoreboarding and
branch prediction.
Register scoreboarding permits instruction execution to
continue while data is being fetched from memory.
When a load instruction is executed, the processor sets
one or more scoreboard bits to indicate the target registers to be loaded. After the target registers are loaded,
the scoreboard bits are cleared. While the target registers are being loaded, the processor is allowed to execute other instructions that do not use these registers.
The processor uses the scoreboard bits to ensure that
target registers are not used until the loads are complete. (The checking of scoreboard bits is transparent to
software.) The net result of using this technique is that
code can often be optimized in such a way as to allow
some instructions to be executed in parallel.
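The scoreboarding behavior described above can be illustrated with a small software model. This is a sketch only: the class and method names are hypothetical, the register names in the demonstration are used as plain labels, and the real mechanism is implemented in hardware and is invisible to programs.

```python
# Toy model of register scoreboarding: a load marks its target registers
# busy, and an instruction may issue only if it touches no busy register.
class Scoreboard:
    def __init__(self):
        self.busy = set()  # registers with a load still outstanding

    def issue_load(self, target_regs):
        """Set the scoreboard bits for the registers being loaded."""
        self.busy.update(target_regs)

    def load_complete(self, target_regs):
        """Clear the scoreboard bits once the data has arrived."""
        self.busy.difference_update(target_regs)

    def can_issue(self, regs_used):
        """An instruction may proceed if none of its registers are busy."""
        return self.busy.isdisjoint(regs_used)


if __name__ == "__main__":
    sb = Scoreboard()
    sb.issue_load({"g4"})
    print(sb.can_issue({"g0", "g1"}))  # independent work may proceed
    print(sb.can_issue({"g4", "g5"}))  # must wait for the load
    sb.load_complete({"g4"})
    print(sb.can_issue({"g4", "g5"}))
```

A compiler or assembly programmer exploits this by placing instructions that do not depend on the loaded value between the load and its first use.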
Efficient Interrupt Model
The i960 architecture provides an efficient mechanism
for servicing interrupts from external sources. To handle interrupts, the processor maintains an interrupt table of 248 interrupt vectors (240 of which are available
for general use). When an interrupt is signaled, the
processor uses a pointer from the interrupt table to perform an implicit call to an interrupt handler procedure.
In performing this call, the processor automatically
saves the state of the processor prior to receiving the
interrupt, performs the interrupt routine, and then restores the state of the processor. A separate interrupt
stack is also provided to segregate interrupt handling
from application programs.
The interrupt handling facilities also feature a method
of prioritizing interrupts. Using this technique, the
processor is able to store interrupts that are lower in
priority than the task the processor is currently working on in a pending interrupt section of the interrupt
table. At certain defined times, the processor checks the
pending interrupts and services them.
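The prioritizing scheme can be pictured as a comparison followed by either immediate service or posting to a pending list. The sketch below is a software illustration under stated assumptions: priorities are plain integers where a larger number means higher priority, an interrupt of equal priority is treated as pending, and the class and method names are hypothetical. The actual layout of the pending section of the interrupt table is not described in this overview and is not modeled.

```python
import heapq


class InterruptModel:
    """Toy model: service higher-priority interrupts, post the rest."""

    def __init__(self, current_priority=0):
        self.current_priority = current_priority
        self._pending = []  # max-heap, built by negating the priority

    def signal(self, vector, priority):
        """Return 'service' or 'pending' for a newly signaled interrupt."""
        if priority > self.current_priority:
            return "service"
        heapq.heappush(self._pending, (-priority, vector))
        return "pending"

    def check_pending(self):
        """Return the best pending vector if it now outranks the
        current priority, otherwise None."""
        if self._pending and -self._pending[0][0] > self.current_priority:
            return heapq.heappop(self._pending)[1]
        return None


if __name__ == "__main__":
    model = InterruptModel(current_priority=10)
    print(model.signal(vector=40, priority=5))
    print(model.signal(vector=200, priority=25))
    model.current_priority = 0  # the running task drops its priority
    print(model.check_pending())
```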
SIMPLIFIED PROGRAMMING
ENVIRONMENT
Partly as a side benefit of its streamlined execution environment and partly by design, processors based on
the i960 architecture are particularly easy to program.
For example, the large number of general-purpose registers allows relatively complex algorithms to be executed with a minimum number of memory accesses. The
following paragraphs describe some of the other features that simplify programming.
Highly Efficient Procedure Call
Mechanism
The procedure call mechanism makes procedure calls
and parameter passing between procedures simple and
compact. Each time a call instruction is issued, the
processor automatically saves the current set of local
registers and allocates a new set of local registers for
the called procedure. Likewise, on a return from a procedure, the current set of local registers is deallocated
and the local registers for the procedure being returned
to are restored. On a procedure call, the program thus
never has to explicitly save and restore those local variables and parameters that are stored in local registers.
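The effect of keeping several local-register sets on chip can be sketched as a small cache of call frames that spills to memory only when the nesting depth exceeds the number of sets. The model below assumes four sets, as stated for the i960 MC processor, and assumes that the oldest frame is the one written out; that eviction order and all names are illustrative assumptions, not documented behavior.

```python
class LocalRegisterCache:
    """Toy model of on-chip local-register sets with spill to memory."""

    def __init__(self, on_chip_sets=4):
        self.on_chip_sets = on_chip_sets
        self.on_chip = []   # frames held on chip, oldest first
        self.spilled = []   # frames written out to the stack in memory
        self.spills = 0     # number of frames written to memory so far

    def call(self, frame):
        """Allocate a set for a new procedure, spilling if none is free."""
        if len(self.on_chip) == self.on_chip_sets:
            self.spilled.append(self.on_chip.pop(0))
            self.spills += 1
        self.on_chip.append(frame)

    def ret(self):
        """Release the current set; reload the caller's frame if needed."""
        frame = self.on_chip.pop()
        if not self.on_chip and self.spilled:
            self.on_chip.append(self.spilled.pop())
        return frame


if __name__ == "__main__":
    cache = LocalRegisterCache()
    for name in ("main", "a", "b", "c"):
        cache.call(name)
    print(cache.spills)   # four nested frames fit on chip
    cache.call("d")
    print(cache.spills)   # the fifth frame forces one spill
```

In this model, programs whose call depth oscillates within four levels never touch memory for local registers, which is the benefit the text describes.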
Versatile Instruction Set and
Addressing
The selection of instructions and addressing modes also
simplifies programming. The architecture offers a full
set of load, store, move, arithmetic, comparison and
branch instructions, with operations on both integer
and ordinal data types. It also provides a complete set
of Boolean and bit-field instructions, to simplify operations on bits and bit strings.
The addressing modes are efficient and straightforward,
while at the same time providing the necessary indexing
and scaling modes required to address complex arrays
and record structures.
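An indexed, scaled addressing mode typically forms an address from a base, an index multiplied by an element size, and a fixed offset. The sketch below shows that arithmetic for a field inside an array of records. The function name, its parameters and the wrap to 32 bits are assumptions made for illustration; the overview does not list the exact addressing-mode encodings.

```python
ADDRESS_MASK = 0xFFFF_FFFF  # 32-bit address space


def effective_address(base, index=0, scale=1, displacement=0):
    """Compute base + index * scale + displacement, wrapped to 32 bits."""
    return (base + index * scale + displacement) & ADDRESS_MASK


if __name__ == "__main__":
    # Field at byte offset 4 of element 5 in an array of 8-byte records
    # that starts at address 0x1000.
    print(hex(effective_address(0x1000, index=5, scale=8, displacement=4)))
```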
The large 4-gigabyte address space provides ample
room to store programs and data. The availability of 32
addressing lines allows some address lines to be memory-mapped to control hardware functions.
Extensive Fault Handling Capability
To aid in program development, the i960 architecture
defines a wide selection of faults that the processor detects, including arithmetic faults, invalid operands,
invalid operations and machine faults. When a fault is
detected, the processor makes an implicit call to a fault
handler routine, using a mechanism similar to that described above for interrupts. The information collected
for each fault allows program developers to quickly
correct faulting code. It also allows automatic recovery
from some faults.
Debugging and Monitoring
To support debugging systems, the i960 architecture
provides a mechanism for monitoring processor activity
by means of trace events. The processor can be configured to detect as many as seven different trace events,
including branches, calls, supervisor calls, returns, prereturns, breakpoints and the execution of any instruction. When the processor detects a trace event, it signals a trace fault and calls a fault handler. Intel provides several tools that use this feature, including an in-circuit emulator (ICE™) device.
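A debugger built on this facility usually treats the seven trace events as independently selectable flags. The sketch below models that idea with a flag enumeration. The member names follow the events listed above, but the bit assignments and the function name are hypothetical and do not reflect the processor's trace-control register layout.

```python
from enum import Flag, auto


class TraceEvent(Flag):
    """The seven trace events named in the overview (bit values assumed)."""
    INSTRUCTION = auto()
    BRANCH = auto()
    CALL = auto()
    RETURN = auto()
    PRERETURN = auto()
    SUPERVISOR_CALL = auto()
    BREAKPOINT = auto()


def should_signal_trace_fault(enabled: TraceEvent, event: TraceEvent) -> bool:
    """A trace fault is signaled only for events that are enabled."""
    return bool(enabled & event)


if __name__ == "__main__":
    enabled = TraceEvent.CALL | TraceEvent.RETURN
    print(should_signal_trace_fault(enabled, TraceEvent.CALL))
    print(should_signal_trace_fault(enabled, TraceEvent.BRANCH))
```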
SUPPORT FOR ARCHITECTURAL
EXTENSIONS
The i960 architecture described earlier in this chapter
provides a high-performance computing engine for use
as the computational and data-processing core of embedded processors or controllers. The architecture also
provides several features that enable processors based
on this architecture to be easily customized to meet the
needs of specific embedded applications, such as signal
processing, array processing or graphics processing.
The most important of these features is a set of 32 special-function registers. These registers provide a convenient interface to circuitry in the processor or to pins
that can be connected to external hardware. They can
be used to control timers, to perform operations on special data types or to perform I/O functions.
The special-function registers are similar to the global
registers. They can be addressed by all the register-access instructions.
EXTENSIONS INCLUDED IN THE
80960MC PROCESSOR
The extensions to the i960 architecture included in the
i960 MC processor are built on top of the processor's
core computing engine. These extensions are aimed at
improving the efficiency and reliability of embedded
systems.
On-Chip Floating Point
The i960 MC processor provides a complete implementation of the IEEE standard for binary floating-point
arithmetic (IEEE 754-1985). This implementation includes a full set of floating-point operations, including
add, subtract, multiply, divide, trigonometric functions
and logarithmic functions. These operations are performed on single-precision (32-bit), double-precision
(64-bit) and extended-precision (80-bit) real numbers.
One of the benefits of this implementation is that the
floating-point handling facilities are completely integrated into the normal instruction execution environment. Single- and double-precision floating-point values
are stored in the same registers as non-floating-point
values. Also, four 80-bit floating-point registers are provided to hold extended-precision values.
String and Decimal Operations
The i960 MC processor provides several instructions
for moving, filling and comparing byte strings in memory. These instructions speed up string operations and
reduce the amount of code required to handle strings.
The decimal instructions perform move, add with carry
and subtract with carry operations on binary-coded
decimal (BCD) strings.
Virtual-Memory Support
Another of the i960 MC processor's important features
is support for virtual-memory management. When using the processor in virtual-memory mode, the processor provides each process (or task) with an address
space of up to 2^32 bytes. This address space is paged
into physical memory in 4-Kbyte pages. On-chip memory-management facilities handle virtual-to-physical
address translation. A translation look-aside buffer
(TLB) speeds address translation by storing virtual-to-physical address translations for frequently accessed
parts of memory, such as the location of the page tables
and the location of often used system data structures.
Protection
The i960 MC processor offers two mechanisms for protecting critical data structures or software modules.
The first is the ability to use page rights bits to restrict
access to individual pages. Page rights allow various
levels of access to be assigned to a page, ranging from
no access to read-only to read-write.
The second protection mechanism is a user/supervisor
protection model. This two-level protection model provides hardware-enforced protection of kernel procedures and data structures. When using this protection
mechanism, privileged procedures and data are placed
in protected pages of memory. These pages can then be
accessed only through a procedure table, which provides a tightly controlled interface to kernel functions.
Multitasking
The i960 MC processor offers a variety of process management facilities to support concurrent execution of
multiple tasks. These facilities can be divided into two
groups: process scheduling and interprocess communications.
The process scheduling facilities consist of a set of general-purpose data structures and instructions, which are
designed to support several different multitasking
schemes. For example, the processor provides a set of
instructions that allow the kernel to explicitly dispatch
a task (bind it to the processor) and to suspend a task
(save the current state of a task so that another task can
be bound to the processor). These instructions can be
used within kernel procedures to schedule, dispatch
and preempt multiple tasks.
The processor also provides a unique feature called self-dispatching. Here, the kernel schedules tasks by queuing them to a dispatch port. Thereafter, the processor
handles the dispatching, preempting and rescheduling
of the tasks automatically, independent of the kernel.
When using this mechanism, tasks can be scheduled by
priority, with up to 32 priority levels to choose from.
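Self-dispatching by priority behaves like a priority queue that the kernel fills and the processor drains. The sketch below is a software analogy only. It assumes priorities numbered 0 through 31 with larger numbers dispatched first and first-in, first-out order within a level; the class and method names are hypothetical, and the real dispatch port is a data structure managed by the processor.

```python
import heapq
from itertools import count


class DispatchPort:
    """Toy dispatch port: queue tasks, dispatch the highest priority."""

    PRIORITY_LEVELS = 32

    def __init__(self):
        self._heap = []
        self._arrival = count()  # tie-breaker keeps FIFO order per level

    def queue(self, task, priority):
        """Kernel side: schedule a task at the given priority."""
        if not 0 <= priority < self.PRIORITY_LEVELS:
            raise ValueError("priority must be in the range 0..31")
        heapq.heappush(self._heap, (-priority, next(self._arrival), task))

    def dispatch(self):
        """Processor side: take the next task to run, or None if empty."""
        if not self._heap:
            return None
        return heapq.heappop(self._heap)[2]


if __name__ == "__main__":
    port = DispatchPort()
    port.queue("logger", 3)
    port.queue("flight-control", 30)
    port.queue("telemetry", 3)
    print(port.dispatch(), port.dispatch(), port.dispatch())
```

Several processors draining one such queue is the arrangement the multiprocessing discussion refers to when it mentions a single shared dispatch port.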
The processor's interprocess communication facilities
include support for semaphores and communication
ports. These facilities allow synchronization of interdependent tasks and asynchronous communication between tasks.
Multiprocessing
The i960 MC processor provides several mechanisms
designed to simplify the design of multiple-processor
systems, allowing several processors to run in parallel,
using shared memory resources. One of these mechanisms is the self-dispatching capability described above.
Here, two or more processors can schedule and dispatch processes from a single dispatch port, with each
processor equally sharing the processing load.
The processor also provides an inter-agent communication (IAC) mechanism that allows processors to exchange messages among themselves on the bus. This
mechanism operates similarly to the interrupt mechanism, except that IAC messages are passed through
dedicated sections of memory. The IAC mechanism
can be used to preempt processes running on another
processor, to manage interrupt handling or to initialize
and synchronize several processors.
A set of atomic instructions is also provided to synchronize memory accesses. Multiple processors can
then access shared memory without inserting inaccuracies and ambiguities into shared data structures.
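To show why indivisible read-modify-write operations keep shared data consistent, the sketch below models an atomic add and an atomic masked modify on a shared word. It is a software analogy only: the `SharedMemory` class, its method names and the use of a lock in place of the bus are assumptions made for illustration.

```python
import threading

class SharedMemory:
    """Word-addressed memory whose read-modify-write operations are indivisible."""
    def __init__(self):
        self._words = {}
        self._lock = threading.Lock()     # stands in for locking the bus

    def atomic_add(self, addr, value):
        """Add value to the word at addr (mod 2**32); return the old contents."""
        with self._lock:
            old = self._words.get(addr, 0)
            self._words[addr] = (old + value) & 0xFFFFFFFF
            return old

    def atomic_modify(self, addr, mask, value):
        """Replace only the masked bits of the word at addr; return the old contents."""
        with self._lock:
            old = self._words.get(addr, 0)
            self._words[addr] = ((old & ~mask) | (value & mask)) & 0xFFFFFFFF
            return old

    def read(self, addr):
        with self._lock:
            return self._words.get(addr, 0)

mem = SharedMemory()

def worker():
    for _ in range(1000):
        mem.atomic_add(0x100, 1)

threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
counter = mem.read(0x100)                 # no increments are lost
old_flags = mem.atomic_modify(0x200, 0x0F, 0xA5)
```

Because each update happens under the lock, four concurrent writers cannot interleave their read and write halves, which is the property the atomic instructions are described as providing.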
Fault Tolerance
The i960 family of components supports fault-tolerant
system design through the use of the M82965 Bus Extension Unit component. The M82965 allows two processors to be operated in tandem to form a self-checking
module. The two M82965s check the outputs of two
processors (a master and a checker) cycle-by-cycle. If
the checking M82965 detects a difference between outputs, it signals an error. A software recovery procedure
can then be initiated.
This fault detection mechanism supports several fault
detection and recovery techniques, including self-healing and continuous-operation (non-stop) systems.
LOOK FOR MORE IN THE FUTURE
The i960 architecture offers exceptional performance,
plus a wealth of useful features to help in the design of
efficient and reliable embedded systems. But equally
important, it offers lots of room to grow. The i960 MC
processor provides average instruction processing rates
of 7.5 million instructions per second (7.5 MIPS) at
20 MHz clock rate and 10 MIPS at a 25 MHz clock
rate(1).
However, the i960 MC processor is only the beginning.
With improvements in VLSI technology, future implementations of the i960 architecture will offer even
greater performance. They will also offer a variety of
useful extensions to solve specific control and monitoring needs in the field of embedded applications.
1. 1 MIPS is equivalent to the performance of a Digital Equipment Corp. VAX 11/780.
80960MC
EMBEDDED 32-BIT MICROPROCESSOR
WITH INTEGRATED FLOATING-POINT UNIT
AND MEMORY MANAGEMENT UNIT
Military
• High-Performance Embedded Architecture
- 25 MIPS Burst Execution at 25 MHz
- 9.4 MIPS* Sustained Execution at 25 MHz
• On-Chip Memory Management Unit
- 4 Gigabyte Virtual Address Space per Task
- 4 Kbyte Pages with Supervisor/User Protection
• On-Chip Floating-Point Unit
- Supports IEEE 754 Floating-Point Standard
- Full Transcendental Support
- Four 80-Bit Registers
- 5.2 Million Whetstones/Second at 25 MHz
• Built-In Interrupt Controller
- 32 Priority Levels
- 248 Vectors
- Supports M8259A
- 3.4 µs Latency
• Easy to Use, High Bandwidth 32-Bit Bus
- 66.7 MBytes/s Burst
- Up to 16 Bytes Transferred per Burst
• Multitasking and Multiprocessor Support
- Automatic Task Dispatching
- Prioritized Task Queues
• Advanced Package Technology
- 132 Lead Ceramic Pin Grid Array
- 164 Lead Ceramic Quad Flatpack
• Military Temperature Range
- -55°C to +125°C (TC)
• 512-Byte On-Chip Instruction Cache
- Direct Mapped
- Parallel Load/Decode for Uncached Instructions
• Multiple Register Sets
- Sixteen Global 32-Bit Registers
- Sixteen Local 32-Bit Registers
- Four Local Register Sets Stored On-Chip (Sixteen 32-Bit Registers per Set)
- Register Scoreboarding
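The 66.7 MBytes/s burst figure in the list above can be reproduced with simple arithmetic. The calculation below is a sketch under an assumption that is not stated in the list: that a 16-byte burst occupies six bus cycles at 25 MHz (one address cycle, four data cycles and one recovery cycle). The function name is hypothetical.

```python
def burst_bandwidth_mbytes(clock_mhz, bytes_per_burst, cycles_per_burst):
    """Bytes moved per microsecond (MBytes/s, with 1 MByte = 10**6 bytes)."""
    burst_time_us = cycles_per_burst / clock_mhz
    return bytes_per_burst / burst_time_us

# Assumed cycle budget: 1 address + 4 data + 1 recovery cycle per 16-byte burst.
rate = burst_bandwidth_mbytes(25.0, 16, 1 + 4 + 1)
```

With these inputs the result is about 66.7, matching the quoted burst rate; a different cycle budget would give a different figure.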
The 80960MC is the enhanced military member of Intel's new 32-bit microprocessor family, the 960 series,
which is designed especially for embedded applications. It is based on the family's high performance, common core architecture, and includes a 512-byte instruction cache, a built-in interrupt controller, an integrated
floating-point unit and a memory management unit. The 80960MC has a large register set, multiple parallel
execution units, and a high-bandwidth, burst bus. Using advanced RISC technology, this high performance
processor can respond to interrupts in under 3.4 µs and is capable of execution rates in excess of 9.4 million
instructions per second.* The 80960MC is well-suited for a wide range of military and other high reliability
applications, including avionics, airborne radar, navigation, and instrumentation.
*Relative to Digital Equipment Corporation's VAX-11/780** at 1 MIPS
271080-1
Figure 1. The 80960MC's Highly Parallel Microarchitecture
**VAX-11™ is a trademark of Digital Equipment Corporation.
February 1991
Order Number: 271080-007
THE 960 SERIES
The 80960MC is the enhanced military member of a
new family of 32-bit microprocessors from Intel
known as the 960 Series. This series was especially
designed to serve the needs of embedded applications. The embedded market includes applications
as diverse as industrial automation, avionics, image
processing, graphics, robotics, telecommunications,
and automobiles. These types of applications require high integration, low power consumption, quick
interrupt response times, and high performance.
Since time to market is critical, embedded microprocessors need to be easy to use in both hardware
and software designs.
All members of the 80960 series share a common
core architecture which utilizes RISC technology so
that, except for special functions, the family members are object code compatible. Each new processor in the series will add its own special set of functions to the core to satisfy the needs of a specific
application or range of applications in the embedded
market. For example, future processors may include
a DMA controller, a timer, or an A/D converter.
The 80960MC includes an integrated Floating Point
Unit (FPU), a Memory Management Unit (MMU),
multitasking support, and multiprocessor support.
There are also two commercial members of the family: the 80960KB processor with integrated FPU and
the 80960KA without floating-point.
Figure 2. Register Set
(The figure shows sixteen 32-bit global registers, g0 through g15; four 80-bit floating-point registers, fp0 through fp3; sixteen 32-bit local registers, r0 through r15; the 32-bit arithmetic controls, instruction pointer, process controls and trace controls; and an address space running from 0 to 2^32 - 1.)
NOTES:
1. Register g15 is reserved for stack management functions.
2. Registers r0, r1, and r2 are reserved for stack management functions.
KEY PERFORMANCE FEATURES
The 80960MC's architecture is based on the most
recent advances in RISC technology and is grounded in Intel's long experience in designing embedded
controllers. Many features contribute to the
80960MC's exceptional performance:
1. Large Register Set. Having a large number of
registers reduces the number of times that a processor needs to access memory. Modern compilers can
take advantage of this feature to optimize execution
speed. For maximum flexibility, the 80960MC provides thirty-two 32-bit registers (sixteen local and
sixteen global) and four 80-bit floating-point global
registers. (See Figure 2.)
2. Fast Instruction Execution. Simple functions
make up the bulk of instructions in most programs,
so that execution speed can be greatly improved by
ensuring that these core instructions execute in as
short a time as possible. The most-frequently executed instructions such as register-register moves,
add/subtract, logical operations, and shifts execute
in one to two cycles. (Table 1 contains a list of instructions.)
3. Load/Store Architecture. Like other processors
based on RISC technology, the 80960MC has a
Load/Store architecture: only the LOAD and STORE
instructions reference memory; all other instructions
operate on registers. This type of architecture simplifies instruction decoding and is used in combination
with other techniques to increase parallelism.
Figure 3. Instruction Formats
(The figure shows the five instruction formats: Control, Compare and Branch, Register to Register, Memory Access-Short, and Memory Access-Long. The fields shown include Opcode, Displacement, Reg/Lit, Reg, M, Modes, Ext'd Op, Base, X, Offset, Mode, Scale, and Index.)
Table 1. 80960MC Instruction Set
Data Movement: Load, Store, Move, Load Address, Load Physical Address
Arithmetic: Add, Subtract, Multiply, Divide, Remainder, Modulo, Shift
Logical: And, Not And, And Not, Or, Exclusive Or, Not Or, Or Not, Nor, Exclusive Nor, Not, Nand, Rotate
Bit and Bit Field: Set Bit, Clear Bit, Not Bit, Check Bit, Alter Bit, Scan for Bit, Scan over Bit, Extract, Modify
Comparison: Compare, Conditional Compare, Compare and Increment, Compare and Decrement
Branch: Unconditional Branch, Conditional Branch, Compare and Branch
Call/Return: Call, Call Extended, Call System, Return, Branch and Link
Fault: Conditional Fault, Synchronize Faults
Debug: Modify Trace Controls, Mark, Force Mark
Miscellaneous: Flush Local Registers, Inspect Access, Modify Arithmetic Controls, Test Condition Code
Decimal: Move, Add with Carry, Subtract with Carry
Conversion: Convert Real to Integer, Convert Integer to Real
Floating Point: Add, Subtract, Multiply, Divide, Remainder, Scale, Round, Square Root, Sine, Cosine, Tangent, Arctangent, Log, Log Binary, Log Natural, Exponent, Classify, Copy Real Extended, Compare
String: Move String, Move Quick String, Fill String, Compare String, Scan Byte for Equal
Process Management: Schedule Process, Save Process, Resume Process, Load Process Time, Modify Process Controls, Wait, Conditional Wait, Signal, Receive, Conditional Receive, Send, Send Service, Atomic Add, Atomic Modify
4. Simple Instruction Formats. All instructions in
the 80960MC are 32 bits long and must be aligned
on word boundaries. This alignment makes it possible to eliminate the instruction-alignment stage in
the pipeline. To simplify the instruction decoder further, there are only five instruction formats and each
instruction uses only one format. (See Figure 3.)
5. Overlapped Instruction Execution. A load operation allows execution of subsequent instructions to
continue before the data has been returned from
memory, so that these instructions can overlap the
load. The 80960MC manages this process transparently to software through the use of a register scoreboard. Conditional instructions also make use of a
scoreboard so that subsequent unrelated instructions can be executed while the conditional instruction is pending.
6. Integer Execution Optimization. When the result of an operation is used as an operand in a subsequent calculation, the value is sent immediately to
its destination register. Yet at the same time, the
value is put back on a bypass path to the ALU,
thereby saving the time that otherwise would be required to retrieve the value for the next operation.
7. Bandwidth Optimizations. The 80960MC gets
optimal use of its memory bus bandwidth because
the bus is tuned for use with the cache: the line size
of the instruction cache matches the maximum burst
size for instruction fetches. The 80960MC automatically fetches four words in a burst and stores them
directly in the cache. Due to the size of the cache
and the fact that it is continually filled in anticipation
of needed instructions in the program flow, the
80960MC is exceptionally insensitive to memory
wait states. In fact, each wait state causes only a
7% degradation in system performance. The benefit
is that the 80960MC will deliver outstanding performance even with a low cost memory system.
8. Cache Bypass. If there is a cache miss, the processor fetches the needed instruction, then sends it
on to the instruction decoder at the same time it
updates the cache. Thus, no extra time is taken to
load and read the cache.
Memory Space and Addressing Modes
The 80960MC allows each task (process) to address a logical memory space of up to 4 Gbytes. In
turn, each task's address space is divided into four
1-Gbyte regions and each region can be mapped to
physical addresses by zero, one, or two levels of
page tables. The region with the highest addresses
(Region 3) is common to all tasks.
In keeping with RISC design principles, the number
of addressing modes has been kept to a minimum
but includes all those necessary to ensure efficient
execution of high-level languages such as Ada, C,
and Fortran. Table 2 lists the memory addressing
modes.
Data Types
The 80960MC recognizes the following data types:
Numeric:
• 8-, 16-, 32- and 64-bit ordinals
• 8-, 16-, 32- and 64-bit integers
• 32-, 64- and 80-bit real numbers
Non-Numeric:
• Bit
• Bit Field
• Triple-Word (96 bits)
• Quad-Word (128 bits)
Large Register Set
The programming environment of the 80960MC includes a large number of registers. In fact, 36 registers are available at any time. The availability of this
many registers greatly reduces the number of memory accesses required to execute most programs,
which leads to greater instruction processing speed.
There are two types of general-purpose registers:
local and global. The 20 global registers consist of
sixteen 32-bit registers (G0 through G15) and four
80-bit registers (FP0 through FP3). These registers
perform the same function as the general-purpose
registers provided in other popular microprocessors.
The term global refers to the fact that these registers retain their contents across procedure calls.
The local registers, on the other hand, are procedure specific. For each procedure call, the 80960MC
allocates 16 local registers (R0 through R15). Each
local register is 32 bits wide. Any register can also
be used for floating-point operations; the 80-bit floating-point registers are provided for extended precision.
Multiple Register Sets
To further increase the efficiency of the register set,
multiple sets of local registers are stored on-chip.
This cache holds up to four local register frames,
which means that up to three procedure calls can be
made without having to access the procedure stack
resident in memory.
Table 2. Memory Addressing Modes
• 12-Bit Offset
• 32-Bit Offset
• Register-Indirect
• Register + 12-Bit Offset
• Register + 32-Bit Offset
• Register + (Index-Register x Scale-Factor)
• Register x Scale Factor + 32-Bit Displacement
• Register + (Index-Register x Scale-Factor) + 32-Bit Displacement
Scale-Factor is 1, 2, 4, 8 or 16
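The modes in Table 2 are all special cases of one sum: a base register, an optionally scaled index register, and an offset or displacement. The sketch below illustrates that arithmetic; the `effective_address` helper and its keyword names are hypothetical, and wrapping the result to 32 bits is an assumption rather than something the table states.

```python
MASK32 = 0xFFFFFFFF
VALID_SCALES = (1, 2, 4, 8, 16)       # scale factors listed under Table 2

def effective_address(base=0, index=0, scale=1, displacement=0):
    """Compute base + index*scale + displacement, wrapped to 32 bits."""
    if scale not in VALID_SCALES:
        raise ValueError("scale factor must be 1, 2, 4, 8 or 16")
    return (base + index * scale + displacement) & MASK32

# Register + (Index-Register x Scale-Factor) + 32-Bit Displacement
ea = effective_address(base=0x1000, index=3, scale=4, displacement=0x20)
# Register-Indirect: only the base register contributes
ea_indirect = effective_address(base=0x2000)
```

Setting the unused terms to zero (or the scale to 1) yields the simpler modes, such as a plain offset or register-indirect access.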
Although programs may have procedure calls nested many calls deep, a program typically oscillates
back and forth between only two or three levels. As
a result, with four stack frames in the cache, the
probability of there being a free frame on the cache
when a call is made is very high. In fact, runs of
representative C-language programs show that 80%
of the calls are handled without needing to access
memory.
If there are four or more active procedures and a
new procedure is called, the processor moves the
oldest set of local registers in the register cache to a
procedure stack in memory to make room for a new
set of registers. Global register G15 is used by the
processor as the frame pointer (FP) for the procedure stack.
Note that the global and floating-point registers are
not exchanged on a procedure call, but retain their
contents, making them available to all procedures
for fast parameter passing. An illustration of the register cache is shown in Figure 4.
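The spill behaviour described above can be modelled in a few lines. This is a minimal sketch, assuming four on-chip frames with the oldest frame written to memory when a fifth is needed; the `RegisterCache` class and the policy of reloading a caller's frame only when no frame remains on-chip are illustrative assumptions.

```python
from collections import deque

CACHE_FRAMES = 4                      # local register sets held on-chip

class RegisterCache:
    """Counts how many calls and returns have to touch the in-memory stack."""
    def __init__(self):
        self.on_chip = deque()        # frame ids, oldest on the left
        self.depth = 0                # active procedures, including the first
        self.spills = 0               # frames written out to memory
        self.fills = 0                # frames read back from memory
        self._enter()

    def _enter(self):
        if len(self.on_chip) == CACHE_FRAMES:
            self.on_chip.popleft()    # oldest frame goes to the stack in memory
            self.spills += 1
        self.depth += 1
        self.on_chip.append(self.depth)

    def call(self):
        self._enter()

    def ret(self):
        self.on_chip.pop()            # discard the returning procedure's frame
        self.depth -= 1
        if self.depth and not self.on_chip:
            self.on_chip.append(self.depth)   # caller's frame must be reloaded
            self.fills += 1

cache = RegisterCache()
for _ in range(3):
    cache.call()                      # depth 4: all four frames still on-chip
spills_after_three = cache.spills
cache.call()                          # depth 5: the oldest frame is spilled
spills_after_four = cache.spills
```

The first three nested calls cause no memory traffic, and only the fourth forces a spill, which is consistent with the text's observation that programs oscillating between a few call levels rarely touch the stack in memory.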
271080-2
Figure 4. Multiple Register Sets Are Stored On-Chip
(The figure shows the register cache holding four local register sets, with the current local register set drawn as one of the four; registers are 32 bits wide, bits 31 through 0.)
Instruction Cache
To further reduce memory accesses, the 80960MC
includes a 512-byte on-chip instruction cache. The
instruction cache is based on the concept of locality
of reference; that is, most programs are not usually
executed in a steady stream but consist of many
branches and loops that lead to jumping back and
forth within the same small section of code. Thus, by
maintaining a block of instructions in a cache, the
number of memory references required to read instructions into the processor can be greatly reduced.
To load the instruction cache, instructions are
fetched in 16-byte blocks, so that up to four instructions can be fetched at one time. An efficient
prefetch algorithm increases the probability that an
instruction will already be in the cache when it is
needed.
Code for small loops will often fit entirely within the
cache, leading to a great increase in processing
speed since further memory references might not be
necessary until the program exits the loop. Similarly,
when calling short procedures, the code for the calling procedure is likely to remain in the cache, so it
will be there on the procedure's return.
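The effect of locality on a small cache can be sketched with a toy model. The code below assumes a 512-byte direct-mapped cache with 16-byte lines (32 lines), taking the line size from the 16-byte fetch blocks described above; the class name and the sample loop addresses are hypothetical, and the prefetch algorithm is not modelled.

```python
CACHE_BYTES = 512
LINE_BYTES = 16                           # one four-word fetch block
NUM_LINES = CACHE_BYTES // LINE_BYTES     # 32 lines

class DirectMappedICache:
    def __init__(self):
        self.tags = [None] * NUM_LINES
        self.hits = 0
        self.misses = 0

    def fetch(self, addr):
        """Look up the line holding addr; load it on a miss. Returns True on a hit."""
        line = addr // LINE_BYTES
        index, tag = line % NUM_LINES, line // NUM_LINES
        if self.tags[index] == tag:
            self.hits += 1
            return True
        self.tags[index] = tag            # replace whatever occupied this line
        self.misses += 1
        return False

icache = DirectMappedICache()
loop = range(0x2000, 0x2040, 4)           # a 16-instruction (64-byte) loop body
for _ in range(10):                       # ten passes through the loop
    for pc in loop:
        icache.fetch(pc)
```

In this model the loop costs four line fills on its first pass and none afterwards, which illustrates why small loops that fit in the cache need few further memory references.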
Register Scoreboarding
The instruction decoder has been optimized in several ways. One of these optimizations is the ability to
do instruction overlapping by means of register
scoreboarding.
Register scoreboarding occurs when a LOAD instruction is executed to move a variable from memory into a register. When the instruction is initiated, a
scoreboard bit on the target register is set. When the
register is actually loaded, the bit is reset. In between, any reference to the register contents is accompanied by a test of the scoreboard bit to ensure
that the load has completed before processing continues. Since the processor does not have to wait for
the LOAD to be completed, it can go on to execute
additional instructions placed in between the LOAD
instruction and the instruction that uses the register
contents, as shown in the following example:
LOAD R4, address 1
LOAD R5, address 2
Unrelated instruction
Unrelated instruction
ADD R4, R5, R6
In essence, the two unrelated instructions between
the LOAD and ADD instructions are executed for
free (i.e., take no apparent time to execute) because
they are executed while the register is being loaded.
Up to three LOAD instructions can be pending at
one time with three corresponding scoreboard bits
set. By exploiting this feature, system programmers
and compilers have a useful tool for optimizing execution speed.
Memory Management and Protection
The 80960MC will be especially useful for multitasking applications that require software protection and
a very large address space. To ensure the highest
level of performance possible, the memory management unit and translation look-aside buffer (TLB) are
contained on-chip.
The 80960MC supports a conventional form of demand-paged virtual memory in which the address
space is divided into 4 Kbyte pages. Studies have
shown that a 4 Kbyte page is the optimum size for a
broad range of applications.
Each page table entry includes a 2-bit page rights
field that specifies whether the page is a no-access,
read-only, or read-write page. This field is interpreted differently depending on whether the current task
(process) is executing in user or supervisor mode, as
shown below:
Rights | User | Supervisor
00 | No Access | Read-Only
01 | No Access | Read-Write
10 | Read-Only | Read-Write
11 | Read-Write | Read-Write
Floating-Point Arithmetic
In the 80960MC, floating-point arithmetic has been
made an integral part of the architecture. Having the
floating-point unit integrated on-chip provides two
advantages. First, it improves the performance of
the chip for floating-point applications, since no
additional bus overhead is associated with floating-point calculations, thereby leaving more time for other bus operations such as I/O. Second, the cost of
using floating-point operations is reduced because a
separate coprocessor chip is not required.
The 80960MC floating-point (real number) data
types include single-precision (32-bit), double-precision (64-bit), and extended precision (80-bit) floating-point numbers. Any register may be used to execute floating-point operations.
The processor provides hardware support for both
mandatory and recommended portions of IEEE
Standard 754 for floating-point arithmetic, including
all arithmetic, exponential, logarithmic, and other
transcendental functions. Table 3 shows execution
times for some representative instructions.
Table 3. Sample Floating-Point Execution Times (µs) at 25 MHz
Function | 32-Bit | 64-Bit
Add | 0.4 | 0.5
Subtract | 0.4 | 0.5
Multiply | 0.7 | 1.3
Divide | 1.3 | 2.9
Square Root | 3.7 | 3.9
Arctangent | 10.1 | 13.1
Exponent | 11.3 | 12.5
Sine | 15.2 | 16.6
Cosine | 15.2 | 16.6
Multitasking Support
During these operations, no communication between the processor and the operating system is
necessary until the running task is complete or an
interrupt is issued.
Synchronization and Communication
The 80960MC also offers instructions to set up and
test semaphores to ensure that concurrent tasks
remain synchronized and no data inconsistency
results. Special data structures, known as communication ports, provide the means for exchanging
parameters and data structures. Transmission of information by means of communication ports is asynchronous and automatically buffered by the processor.
Communication between tasks by means of ports
can be carried out independently of the operating
system. Once the ports have been set up by the
programmer, the processor handles the message
passing automatically.
271080-3
Figure 5. Local Bus Signal Groups
Multiple Processor Support
One means of increasing the processing power of a
system is to run two or more processors in parallel.
Since microprocessors are not generally designed to
run in tandem with other processors, designing such
a system is usually difficult and costly.
The 80960MC solves this problem by offering a
number of functions to coordinate the actions of
multiple processors. First, messages can be passed
between processors to initiate actions such as flushing a cache, stopping or starting another processor,
or preempting a task. The messages are passed on
the bus and allow multiple processors to run together smoothly, with rare need to lock the bus or memory.
Second, a set of synchronization instructions helps
maintain the coherency of memory. These instructions permit several processors to modify memory at
the same time without inserting inaccuracies or ambiguities into shared data structures.
The self-dispatching mechanism, in addition to being
used in single-processor systems, provides the
means to increase the performance of a system
merely by adding processors. Each processor can
either work on the same pool of tasks (sharing the
same queue with other processors) or can be restricted to its own queue.
When processors perform system operations, they
synchronize themselves by using atomic operations
and sending special messages between each other.
And changing the number of processors in a system
never requires a software change. Software will execute correctly regardless of the number of processors in the system; systems with more processors
simply execute faster.
Interrupt Handling
The 80960MC can be interrupted in one of two
ways: by the activation of one of four interrupt pins
or by sending a message on the processor's data
bus.
The 80960MC is unusual in that it automatically handles interrupts on a priority basis and tracks pending
interrupts through its on-chip interrupt controller.
Two of the interrupt pins can be configured to provide M8259A handshaking for expansion beyond
four interrupt lines.
An interrupt message is made up of a vector number
and an interrupt priority. If the interrupt priority is
greater than that of the currently running task, the
processor accepts the interrupt and uses the vector
as an index into the interrupt table. If the priority of
the interrupt message is below that of the current
task, the processor saves the information in a section of the interrupt table reserved for pending interrupts.
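The accept-or-pend decision described above can be summarized in code. This is a minimal sketch: the `post_interrupt` helper, the list used for pending interrupts and the sample vector numbers are hypothetical, and treating an interrupt whose priority equals that of the current task as pended is an assumption, since the text only covers the greater-than and below cases.

```python
def post_interrupt(vector, priority, current_priority, pending):
    """Return the vector to service now, or None if the interrupt was pended."""
    if priority > current_priority:
        return vector                     # accepted: vector indexes the interrupt table
    pending.append((priority, vector))    # saved for servicing later
    return None

pending = []
taken = post_interrupt(vector=72, priority=20, current_priority=15, pending=pending)
deferred = post_interrupt(vector=40, priority=10, current_priority=15, pending=pending)
```

Here the first message preempts the running task, while the second is recorded in the pending list until the processor's priority drops.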
Debug Features
The 80960MC has built-in debug capabilities. There
are two types of breakpoints and six different trace
modes. The debug features are controlled by two
internal 32-bit registers, the Process-Controls Word
and the Trace-Controls Word. By setting bits in
these control words, a software debug monitor can
closely control how the processor responds during
program execution.
The 80960MC has both hardware and software
breakpoints. It provides two hardware breakpoint
registers on-chip which can be set by a special command to any value. When the instruction pointer
matches the value in one of the breakpoint registers,
the breakpoint will fire, and a breakpoint handling
routine is called automatically.
The 80960MC also provides software breakpoints
through the use of two instructions, MARK and
FMARK. These instructions can be placed at any
point in a program and will cause the processor to
halt execution at that point and call the breakpoint
handling routine. The breakpoint mechanism is easy
to use and provides a powerful debugging tool.
Tracing is available for instructions (single-step execution), calls and returns, and branching. Each different type of trace may be enabled separately by a
special debug instruction. In each case, the
80960MC executes the instruction first and then
calls a trace handling routine (usually part of a software debug monitor). Further program execution is
halted until the trace routine is completed. When the
trace event handling routine is completed, instruction execution resumes at the next instruction. The
80960MC's tracing mechanisms, which are implemented completely in hardware, greatly simplify the
task of testing and debugging software.
FAULT DETECTION
The 80960MC has an automatic mechanism to
handle faults. There are ten fault types including
trace, arithmetic, and floating-point faults. When the
processor detects a fault, it automatically calls the
appropriate fault handling routine and saves the current instruction pointer and necessary state information to make efficient recovery possible. The processor posts diagnostic information on the type of fault
to a Fault Record. Like interrupt handling routines,
fault handling routines are usually written to meet
the needs of a specific application and are often included as part of the operating system or kernel.
For each of the ten fault types, there are numerous
subtypes that provide specific information about a
fault. For example, a floating-point fault may have its
subtype set to an Overflow or Zero-Divide fault. The
fault handler can use this specific information to respond correctly to the fault.
Interagent Communications (IAC)
In order to coordinate their actions, processors in a
multiple processor system need a means for communicating with each other. The 80960MC does this
through a mechanism known as Interagent Communication messages or IACs.
IAC messages cause a variety of actions including
starting and stopping processors, flushing instruction caches and TLBs, and sending interrupts to other processors in the system. The upper 16 Mbytes of
the processor's physical memory space is reserved
for sending and receiving IAC messages.
BUILT-IN TESTABILITY
Upon reset, the 80960MC automatically conducts an
exhaustive internal test of its major blocks of logic.
Then, before executing its first instruction, it does a
zero check sum on the first eight words in memory
to ensure that the system has been loaded correctly.
If a problem is discovered at any point during the
self-test, the 80960MC will assert its FAILURE pin
and will not begin program execution. The self-test
takes approximately 47,000 cycles to complete.
System manufacturers can use the 80960MC's self-test feature during incoming parts inspection. No
special diagnostic programs need to be written, and
the test is both thorough and fast. The self-test capability helps ensure that defective parts will be discovered before systems are shipped, and once in
the field, the self-test makes it easier to distinguish
between problems caused by processor failure and
problems resulting from other causes.
COMPATIBILITY WITH 80960K-SERIES
Application programs written for the 80960K-Series
microprocessors can be run on the 80960MC without modification. The 80960K-Series instruction set
forms the core of the 80960MC's instructions, so binary compatibility is assured.
CHMOS
The 80960MC is fabricated using Intel's CHMOS IV
(Complementary High Speed Metal Oxide Semiconductor) process. This advanced technology eliminates the frequency and reliability limitations of older
CMOS processes and opens a new era in microprocessor performance. It combines the high performance capabilities of Intel's industry-leading
HMOS technology with the high density and low
power characteristics of CMOS. The 80960MC is
available at 16, 20 and 25 MHz.
Table 4a. 80960MC Pin Description: L-Bus Signals
Symbol | Type | Name and Function
CLK2 | I | SYSTEM CLOCK provides the fundamental timing for 80960MC systems. It is divided by two inside the 80960MC to generate the internal processor clock. CLK2 is shown in Figure 9.
LAD31-LAD0 | I/O, T.S. | LOCAL ADDRESS/DATA BUS carries 32-bit physical addresses and data to and from memory. During an address (Ta) cycle, bits 2-31 contain a physical word address (bits 0-1 indicate SIZE; see below). During a data (Td) cycle, bits 0-31 contain read or write data. The LAD lines are active HIGH and float to a high impedance state when not active. SIZE, which is comprised of bits 0-1 of the LAD lines during a Ta cycle, specifies the size of a transfer in words for a burst transaction (LAD1 LAD0): 0 0 = 1 Word; 0 1 = 2 Words; 1 0 = 3 Words; 1 1 = 4 Words.
ALE | O, T.S. | ADDRESS-LATCH ENABLE indicates the transfer of a physical address. ALE is asserted during a Ta cycle and deasserted before the beginning of the Td state. It is active LOW and floats to a high impedance state when the processor is idle or is at the end of any bus access.
ADS | O, O.D. | ADDRESS STATUS indicates an address state. ADS is asserted every Ta state and deasserted during the following Td state. For a burst transaction, ADS is asserted again every Td state where READY was asserted in the previous cycle.
W/R | O, O.D. | WRITE/READ specifies, during a Ta cycle, whether the operation is a write or read. It is latched on-chip and remains valid during Td and Tw states.
DT/R | O, O.D. | DATA TRANSMIT/RECEIVE indicates the direction of data transfer to and from the L-Bus. It is low during Ta, Tw and Td cycles for a read or interrupt acknowledgement; it is high during Ta, Tw and Td cycles for a write. DT/R never changes state when DEN is asserted (see Timing Diagrams).
DEN | O, O.D. | DATA ENABLE is asserted during Td and Tw cycles and indicates transfer of data on the LAD bus lines.
READY | I | READY indicates that data on LAD lines can be sampled or removed. If READY is not asserted during a Td cycle, the Td cycle is extended to the next cycle by inserting wait states (Tw), and ADS is not asserted in the next cycle.
LOCK | I/O, O.D. | BUS LOCK prevents other bus masters from gaining control of the L-Bus following the current cycle (if they would assert LOCK to do so). LOCK is used by the processor or any bus agent when it performs indivisible Read/Modify/Write (RMW) operations. For a read that is designated as a RMW-read, LOCK is examined: if asserted, the processor waits until it is not asserted; if not asserted, the processor asserts LOCK during the Ta cycle and leaves it asserted. A write that is designated as an RMW-write deasserts LOCK in the Ta cycle.
I/O = Input/Output, O = Output, I = Input, O.D. = Open-Drain, T.S. = three state
Ta = T Address, Td = T Data, Tw = T Wait, Tr = T Recovery, Ti = T Idle, Th = T Hold
Table 4a. 80960MC Pin Description: L-Bus Signals (Continued)
Symbol
Type
Name and Function
BE3-BE0
O
O.D.
BYTE ENABLE LINES specify which data bytes (up to four) on the bus take part
in the current bus cycle. BE3 corresponds to LAD31-LAD24 and BE0 corresponds
to LAD7-LAD0.
The byte enables are provided in advance of data. The byte enables asserted
during Ta specify the bytes of the first data word. The byte enables asserted
during Td specify the bytes of the next data word (if any), that is, the word to be
transmitted following the next assertion of READY. The byte enables during the
Td cycles preceding the last assertion of READY are undefined. The byte enables
are latched on-chip and remain constant from one Td cycle to the next when
READY is not asserted.
For reads, the byte enables specify the byte(s) that the processor will actually use.
80960MCs will assert only adjacent byte enables (e.g., asserting just BE0 and
BE2 is not permitted), and are required to assert at least one byte enable.
Accesses must also be naturally aligned (e.g., asserting BE1 and BE2 is not
allowed even though they are adjacent). To produce address bits A0 and A1
externally, they can be decoded from the byte enables.
HOLD
(HLDAR)
I
HOLD indicates a request from a secondary bus master to acquire the bus. If the
processor is initialized as the primary bus master this input will be interpreted as
HOLD. When the processor receives HOLD and grants another master control of
the bus, it floats its three-state bus lines, asserts HOLD ACKNOWLEDGE, and
enters the Th state. When HOLD is deasserted, the processor will deassert HOLD
ACKNOWLEDGE and go to either the Ti or Ta state.
HOLD ACKNOWLEDGE RECEIVED indicates that the processor has acquired
the bus. If the processor is initialized as the secondary bus master this input is
interpreted as HLDAR.
HOLD timing is shown in Figure 16.
HLDA
(HOLDR)
O
T.S.
HOLD ACKNOWLEDGE relinquishes control of the bus to another bus master. If
the processor is initialized as the primary bus master this output will be interpreted
as HLDA. When HOLD is deasserted, the processor will deassert HLDA and go to
either the Ti or Ta state.
HOLD REQUEST indicates a request to acquire the bus. If the processor is
initialized as the secondary bus master this output will be interpreted as HOLDR.
HOLD timing is shown in Figure 16.
CACHE
O
T.S.
CACHE indicates if an access is cacheable during a Ta cycle. The CACHE signal
floats to a high impedance state when the processor is idle.
I/O = Input/Output, O = Output, I = Input, O.D. = Open-Drain, T.S. = three state
Ta = TAddress, Td = TData, Tw = TWait, Tr = TRecovery, Ti = TIdle, Th = THold
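The byte enable rules in Table 4a (at least one enable asserted, only adjacent enables, naturally aligned accesses) and the remark that A0 and A1 can be decoded from the byte enables can be sketched as follows. This is an illustrative model only: `decode_byte_enables` is a hypothetical helper, the mask is a logical value in which bit n set means BEn is asserted (not the electrical level on the pin), and rejecting three-byte patterns as "not naturally aligned" is an assumption rather than something the table states.

```python
def decode_byte_enables(be_mask):
    """Return (A1:A0 value, access size in bytes) for a 4-bit byte enable mask.

    be_mask bit n set means BEn is asserted (logical view; an assumption).
    Raises ValueError for patterns the pin description rules out.
    """
    if be_mask not in range(1, 16):
        raise ValueError("at least one of BE3-BE0 must be asserted")
    size = bin(be_mask).count("1")                  # number of bytes taking part
    start = (be_mask & -be_mask).bit_length() - 1   # lowest asserted enable
    # A contiguous run of `size` ones starting at bit `start`.
    if be_mask != (2 ** size - 1) * 2 ** start:
        raise ValueError("asserted byte enables are not adjacent")
    # Assumption: only 1-, 2- and 4-byte accesses on their natural boundary.
    if size == 3 or start % size != 0:
        raise ValueError("access is not naturally aligned")
    return start, size


if __name__ == "__main__":
    for mask in (0b0001, 0b1100, 0b1111, 0b0101, 0b0110):
        try:
            print(format(mask, "04b"), decode_byte_enables(mask))
        except ValueError as err:
            print(format(mask, "04b"), "rejected:", err)
```

The first element of the result is the value an external decoder would drive onto A1:A0; the two rejected masks correspond to the BE0/BE2 and BE1/BE2 examples in the table text.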
Table 4b. 80960MC Pin Description: Module Support Signals
Symbol
Type
Name and Function
BADAC
I
BAD ACCESS, if asserted in the cycle following the one in which the last READY
of a transaction is asserted, indicates that an unrecoverable error has occurred on
the current bus transaction, or that a synchronous load/store instruction has not
been acknowledged.
STARTUP: During system reset, the BADAC signal is interpreted differently. If the
signal is high, it indicates that this processor will perform system initialization. If it
is low, another processor in the system will perform system initialization instead.
RESET
I
RESET clears the internal logic of the processor and causes it to re-initialize.
During RESET assertion, the input pins are ignored (except for BADAC and
IAC/INT0), the tri-state output pins are placed in a high impedance state, and
other output pins are placed in their non-asserted state.
RESET must be asserted for at least 41 CLK2 cycles for a predictable RESET.
The HIGH to LOW transition of RESET should occur after the rising edge of both
CLK2 and the external bus CLK, and before the next rising edge of CLK2.
RESET timing is shown in Figure 15.
FAILURE
O
O.D.
INITIALIZATION FAILURE indicates that the processor has failed to initialize
correctly. After RESET is deasserted and before the first bus transaction begins,
FAILURE is asserted while the processor performs a self-test. If the self-test
completes successfully, then FAILURE is deasserted. Next, the processor
performs a zero checksum on the first eight words of memory. If it fails, FAILURE
is asserted for a second time and remains asserted; if it passes, system
initialization continues and FAILURE remains deasserted.
N.C.
N/A
NOT CONNECTED indicates pins should not be connected. Never connect any
pin marked N.C.
IAC
(INT0)
I
INTERAGENT COMMUNICATION REQUEST/INTERRUPT 0 indicates either
that there is a pending IAC message for the processor or an interrupt. The bus
interrupt control register determines in which way the signal should be interpreted.
To signal an interrupt or IAC request in a synchronous system, this pin (as well as
the other interrupt pins) must be enabled by being deasserted for at least one bus
cycle and then asserted for at least one additional bus cycle; in an asynchronous
system, the pin must remain deasserted for at least two bus cycles and then be
asserted for at least two more bus cycles.
LOCAL PROCESSOR NUMBER: This signal is interpreted differently during
system reset. If the signal is at a high voltage level, it indicates that this processor
is a primary bus master (Local Processor Number = 0); if it is at a low voltage
level, it indicates that this processor is a secondary bus master (Local Processor
Number = 1).
INT1
I
INTERRUPT 1, like INT0, provides direct interrupt signaling.
INT2
(INTR)
I
INTERRUPT 2/INTERRUPT REQUEST: The bus control registers determine
how this pin is interpreted. If INT2, it has the same interpretation as the INT0 and
INT1 pins. If INTR, it is used to receive an interrupt request from an external
interrupt controller.
INT3
(INTA)
I/O
O.D.
INTERRUPT 3/INTERRUPT ACKNOWLEDGE: The bus interrupt control register
determines how this pin is interpreted. If INT3, it has the same interpretation as
the INT0, INT1, and INT2 pins. If INTA, it is used as an output to control interrupt
acknowledge bus transactions. The INTA output is latched on-chip and remains
valid during Td cycles; as an output, it is open-drain.
I/O = Input/Output, O = Output, I = Input, O.D. = Open-Drain, T.S. = three state
Ta = TAddress, Td = TData, Tw = TWait, Tr = TRecovery, Ti = TIdle, Th = THold
ELECTRICAL SPECIFICATIONS
Power and Grounding
The 80960MC is implemented in CHMOS III technology and has modest power requirements. Its high
clock frequency and numerous output buffers (address/data, control, error and arbitration signals)
can cause power surges as multiple output buffers drive new signal levels simultaneously. For clean
on-chip power distribution at high frequency, 12 VCC and 13 VSS pins separately feed functional
units of the 80960MC.
Power and ground connections must be made to all power and ground pins of the 80960MC. On the
circuit board, all VCC pins must be strapped closely together, preferably on a power plane.
Likewise, all VSS pins should be strapped together, preferably on a ground plane.
Power Decoupling Recommendations
Liberal decoupling capacitance should be placed near the 80960MC. The processor can cause
transient power surges when driving the L-Bus, particularly when it is connected to a large
capacitive load.
Low inductance capacitors and interconnects are recommended for best high frequency electrical
performance. Inductance can be reduced by shortening the board traces between the processor and
decoupling capacitors as much as possible.
Connection Recommendations
For reliable operation, always connect unused inputs to an appropriate signal level. In particular,
if one or more interrupt lines are not used, they should be pulled up or down to their respective
deasserted states. No inputs should ever be left floating.
All open-drain outputs require a pullup device. While in some cases a simple pullup resistor will
be adequate, we recommend a network of pullup and pulldown resistors biased to a valid VIH
(≥ 3.4V) and terminated in the characteristic impedance of the circuit board. Figure 6 shows our
recommendations for the resistor values for both a low and high current drive network, which
assumes that the circuit board has a characteristic impedance of 100Ω. The advantage of
terminating the output signals in this fashion is that it limits signal swing and reduces AC power
consumption.
Characteristic Curves
Figure 7 shows the typical supply current requirements over the operating temperature range of the
processor at a supply voltage (VCC) of 5V. Figure 8 shows the typical power supply current (ICC)
required by the 80960MC at various operating frequencies when measured at three input voltage
(VCC) levels.
Figure 9 shows the typical capacitive derating curve for the 80960MC measured from 1.5V on the
system clock (CLK) to 0.8V on the falling edge and 2.0V on the rising edge of the L-Bus
address/data (LAD) signals.
Figure 6. Connection Recommendations for Low and High Current Drive Networks
(Each network ties the 80960MC open-drain output to VCC through a pullup resistor and to ground
through a pulldown resistor. Resistor values shown: 130Ω, 180Ω, 390Ω, 280Ω.
Low Drive Network: VOH = 3.42V, IOL = 25.3 mA.
High Drive Network: VOH = 3.41V, IOL = 33.8 mA.)
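As a rough cross-check of the Figure 6 values, the pullup/pulldown network can be treated as a resistive divider from a 5V supply. The sketch below assumes that the resistor values legible in the figure pair up as 180Ω/390Ω (low drive) and 130Ω/280Ω (high drive); that pairing is an inference from the printed VOH values, not something the text states, and `termination` is a hypothetical helper name.

```python
def termination(r_up, r_down, vcc=5.0):
    """Return (VOH in volts, Thevenin resistance in ohms) of a pullup/pulldown pair.

    VOH is the divider voltage when the open-drain driver is off; the Thevenin
    resistance is the parallel combination seen by the board trace.
    """
    voh = vcc * r_down / (r_up + r_down)
    r_thevenin = r_up * r_down / (r_up + r_down)
    return voh, r_thevenin


if __name__ == "__main__":
    # Assumed pairings (see lead-in): low drive 180/390, high drive 130/280.
    for name, r_up, r_down in (("low drive", 180.0, 390.0), ("high drive", 130.0, 280.0)):
        voh, r_th = termination(r_up, r_down)
        print(f"{name}: VOH = {voh:.2f} V, Thevenin resistance = {r_th:.0f} ohms")
```

Under these assumptions both networks bias the line above the 3.4V level mentioned in the text and present a source resistance in the neighbourhood of the 100Ω board impedance.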
Figure 7. Typical Supply Current (ICC)
(Supply current versus case temperature in °C at VCC = 5.0V, with curves for 16 MHz, 20 MHz and
25 MHz; data points taken at -60, -5, 25, 95 and 130°C.)
Figure 8. Typical Current vs Frequency
(Supply current versus operating frequency in MHz, measured at 4.5V, 5.0V and 5.5V.)
Test Load Circuit
Figure 10 illustrates the load circuit used to test the 80960MC's tristate pins, and Figure 11
shows the load circuit used to test the open drain outputs. The open drain test uses an active load
circuit in the form of a matched diode bridge. Since the open-drain outputs sink current, only the
IOL legs of the bridge are necessary and the IOH legs are not used. When the 80960MC driver under
test is turned off, the output pin is pulled up to VREF (i.e., VOH). Diode D1 is turned off and the
IOL current source flows through diode D2.
When the 80960MC open-drain driver under test is on, diode D1 is also on, and the voltage on the
pin being tested drops to VOL. Diode D2 turns off and IOL flows through diode D1.
Figure 9. Capacitive Derating Curve
(Curves for rising and falling edges versus capacitive load in pF.)
Figure 10. Test Load Circuit for TRI-STATE Output Pins
Figure 11. Test Load Circuit for Open-Drain Output Pins
ABSOLUTE MAXIMUM RATINGS*
Case Temperature under Bias(7) .......... -55°C to +125°C
Storage Temperature ..................... -65°C to +150°C
Voltage on Any Pin ...................... -0.5V to VCC + 0.5V
Power Dissipation ....................... 2.6W (25 MHz)
NOTICE: This data sheet contains information on products in the sampling and initial production
phases of development. The specifications are subject to change without notice. Verify with your
local Intel Sales office that you have the latest data sheet before finalizing a design.
*WARNING: Stressing the device beyond the "Absolute Maximum Ratings" may cause permanent damage.
These are stress ratings only. Operation beyond the "Operating Conditions" is not recommended and
extended exposure beyond the "Operating Conditions" may affect device reliability.
D.C. CHARACTERISTICS
80960MC: TCASE(6) = -55°C to +125°C, VCC = 5V ± 5%

Symbol  Parameter                        Min       Max        Units  Test Conditions
VIL     Input Low Voltage                -0.3      +0.8       V
VIH     Input High Voltage               2.0       VCC + 0.3  V
VCL     CLK2 Input Low Voltage           -0.3      +0.8       V
VCH     CLK2 Input High Voltage          0.55 VCC  VCC + 0.3  V
VOL     Output Low Voltage                         0.45       V      (1,5)
VOH     Output High Voltage              2.4                  V      (2,4)
ICC     Power Supply Current:
          16 MHz                                   375        mA
          20 MHz                                   420        mA
          25 MHz                                   480        mA
ILI     Input Leakage Current                      ±15        µA     0 ≤ VO ≤ VCC
ILO     Output Leakage Current                     ±15        µA     0.45 ≤ VO ≤ VCC
CIN     Input Capacitance                          10         pF     fC = 1 MHz(3)
CO      I/O or Output Capacitance                  12         pF     fC = 1 MHz(3)
CCLK    Clock Capacitance                          10         pF     fC = 1 MHz(3)
θJA     Thermal Resistance (Junction-to-Ambient):
          Pin Grid Array                           21         °C/W
          Ceramic Quad Flatpack                    29         °C/W
θJC     Thermal Resistance (Junction-to-Case):
          Pin Grid Array                           4          °C/W
          Ceramic Quad Flatpack                    8          °C/W

NOTES:
1. For three-state outputs, this parameter is measured at:
   Address/Data .......... 4.0 mA
   Controls .............. 5.0 mA
2. This parameter is measured at:
   Address/Data .......... -1.0 mA
   Controls .............. -0.9 mA
   ALE ................... -5.0 mA
3. Input, output, and clock capacitance are not tested.
4. Not measured on open-drain outputs.
5. For open-drain outputs .......... 25 mA
6. Case temperatures are "instant on".
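The thermal resistance rows above can be combined with the 2.6W power dissipation rating to estimate temperature differences. The sketch below uses the conventional steady-state relations (junction rise = P × θJC, and case-to-ambient resistance θCA = θJA - θJC); applying them to this part, and the helper names `junction_rise_c` and `max_ambient_c`, are illustrative assumptions rather than a procedure given in the data sheet. Real θJA values also depend on airflow and mounting, which this section does not specify.

```python
def junction_rise_c(power_w, theta_jc):
    """Junction temperature rise above the case, in degrees C."""
    return power_w * theta_jc


def max_ambient_c(case_max_c, power_w, theta_ja, theta_jc):
    """Highest ambient temperature that keeps the case at or below case_max_c."""
    theta_ca = theta_ja - theta_jc  # case-to-ambient thermal resistance
    return case_max_c - power_w * theta_ca


if __name__ == "__main__":
    power = 2.6  # W, the rating listed for 25 MHz
    for package, theta_ja, theta_jc in (("Pin Grid Array", 21.0, 4.0),
                                        ("Ceramic Quad Flatpack", 29.0, 8.0)):
        rise = junction_rise_c(power, theta_jc)
        ambient = max_ambient_c(125.0, power, theta_ja, theta_jc)
        print(f"{package}: junction rise {rise:.1f} C, max ambient {ambient:.1f} C")
```

The 2.6W figure is itself close to the product of the 25 MHz supply current (480 mA) and the upper supply limit (5.25V), which is about 2.5W.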
AC SPECIFICATIONS
This section describes the AC specifications for the 80960MC pins. All input and output timings are
specified relative to the 1.5V level of the rising edge of CLK2, and refer to the time at which the
signal reaches (for output delay and input setup) or leaves (for hold time) the TTL levels of LOW
(0.8V) or HIGH (2.0V). All AC testing should be done with input voltages of 0.4V and 2.4V, except
for the clock (CLK2), which should be tested with input voltages of 0.45V and 0.55 VCC.
(Figure 12 shows CLK2 edges A, B, C and D against the following signal groups.
OUTPUTS: LAD31-LAD0, ADS, W/R, DEN, BE3-BE0, HLDA/HOLDR, CACHE, LOCK, INTA, DT/R.
INPUTS: LAD31-LAD0, BADAC, IAC/INT0, INT1, INT2/INTR, INT3, HOLD, HLDAR, LOCK, READY.)
NOTE 1:
For Tri-State pins, T6 and T9 are measured at 1.5V.
For Open-Drain pins, T6 is measured at 1.5V, T9 at 0.8V.
Figure 12. Drive Levels and Timing Relationships for 80960MC Signals
Figure 13. Timing Relationship of L-Bus Signals
(Signals shown: CLK2, CLK, ADS, BE(0:3), DEN, W/R, READY, DT/R, ALE, LAD(31-0).)
A.C. Specification Tables
80960MC A.C. Characteristics (16 MHz)
TCASE(3) = -55°C to +125°C, VCC = 5V ± 5%

Symbol  Parameter                          Min    Max  Units  Test Conditions
T1      Processor Clock Period (CLK2)      31.25  125  ns     VIN = 1.5V
T2      Processor Clock Low Time (CLK2)    8           ns     VIL = 10% Point = 1.2V
T3      Processor Clock High Time (CLK2)   8           ns     VIH = 90% Point = 0.1V + 0.5 VCC
T4      Processor Clock Fall Time (CLK2)          10   ns     VIN = 90% Point to 10% Point
T5      Processor Clock Rise Time (CLK2)          10   ns     VIN = 10% Point to 90% Point
T6      Output Valid Delay                 2      25   ns     CL = 100 pF (LAD), CL = 75 pF (Controls)
T6H     HOLDA Output Valid Delay           4      31   ns     CL = 75 pF
T7      ALE Width                          15          ns     CL = 75 pF
T8      ALE Invalid Delay                  0      20   ns     CL = 75 pF(2)
T9      Output Float Delay                 2      20   ns     CL = 100 pF (LAD), CL = 75 pF (Controls)(2)
T9H     HOLDA Output Float Delay           4      20   ns     CL = 75 pF
T10     Input Setup 1                      3           ns     (Note 1)
T11     Input Hold                         5           ns     (Note 1)
T11H    HOLD Input Hold                    4           ns
T12     Input Setup 2                      8           ns
T13     Setup to ALE Inactive              10          ns     CL = 100 pF (LAD), CL = 75 pF (Controls)
T14     Hold after ALE Inactive            8           ns     CL = 100 pF (LAD), CL = 75 pF (Controls)
T15     Reset Hold                         3           ns
T16     Reset Setup                        5           ns
T17     Reset Width                        1281        ns     41 CLK2 Periods Minimum

NOTES:
1. IAC/INT0, INT1, INT2/INTR, INT3 can be asynchronous.
2. A float condition occurs when the maximum output current becomes less than ILO. Float delay is
not tested, but should be no longer than the valid delay.
3. Case temperatures are "instant on".
A.C. Specification Tables (Continued)
80960MC A.C. Characteristics (20 MHz)
TCASE(3) = -55°C to +125°C, VCC = 5V ± 5%

Symbol  Parameter                          Min    Max  Units  Test Conditions
T1      Processor Clock Period (CLK2)      25     125  ns     VIN = 1.5V
T2      Processor Clock Low Time (CLK2)    6           ns     VIL = 10% Point = 1.2V
T3      Processor Clock High Time (CLK2)   6           ns     VIH = 90% Point = 0.1V + 0.5 VCC
T4      Processor Clock Fall Time (CLK2)          10   ns     VIN = 90% Point to 10% Point
T5      Processor Clock Rise Time (CLK2)          10   ns     VIN = 10% Point to 90% Point
T6      Output Valid Delay                 2      20   ns     CL = 60 pF (LAD), CL = 50 pF (Controls)
T6H     HOLDA Output Valid Delay           4      26   ns     CL = 50 pF
T7      ALE Width                          12          ns     CL = 50 pF
T8      ALE Invalid Delay                  0      20   ns     CL = 50 pF(2)
T9      Output Float Delay                 2      20   ns     CL = 60 pF (LAD), CL = 50 pF (Controls)(2)
T9H     HOLDA Output Float Delay           4      20   ns     CL = 50 pF
T10     Input Setup 1                      3           ns     (Note 1)
T11     Input Hold                         5           ns     (Note 1)
T11H    HOLD Input Hold                    4           ns
T12     Input Setup 2                      7           ns
T13     Setup to ALE Inactive              10          ns     CL = 60 pF (LAD), CL = 50 pF (Controls)
T14     Hold after ALE Inactive            8           ns     CL = 60 pF (LAD), CL = 50 pF (Controls)
T15     Reset Hold                         3           ns
T16     Reset Setup                        5           ns
T17     Reset Width                        1025        ns     41 CLK2 Periods Minimum

NOTES:
1. IAC/INT0, INT1, INT2/INTR, INT3 can be asynchronous.
2. A float condition occurs when the maximum output current becomes less than ILO. Float delay is
not tested, but should be no longer than the valid delay.
3. Case temperatures are "instant on".
A.C. Specification Tables (Continued)
80960MC A.C. Characteristics (25 MHz)
TCASE(3) = -55°C to +125°C, VCC = 5V ± 5%

Symbol  Parameter                          Min    Max  Units  Test Conditions
T1      Processor Clock Period (CLK2)      20     125  ns     VIN = 1.5V
T2      Processor Clock Low Time (CLK2)    5           ns     VIL = 10% Point = 1.2V
T3      Processor Clock High Time (CLK2)   5           ns     VIH = 90% Point = 0.1V + 0.5 VCC
T4      Processor Clock Fall Time (CLK2)          10   ns     VIN = 90% Point to 10% Point
T5      Processor Clock Rise Time (CLK2)          10   ns     VIN = 10% Point to 90% Point
T6      Output Valid Delay                 2      19   ns     CL = 60 pF (LAD), CL = 50 pF (Controls)
T6H     HOLDA Output Valid Delay           4      24   ns     CL = 50 pF
T7      ALE Width                          12          ns     CL = 50 pF
T8      ALE Invalid Delay                  0      20   ns     CL = 50 pF(2)
T9      Output Float Delay                 2      19   ns     CL = 60 pF (LAD), CL = 50 pF (Controls)(2)
T9H     HOLDA Output Float Delay           4      20   ns     CL = 50 pF
T10     Input Setup 1                      3           ns     (Note 1)
T11     Input Hold                         5           ns     (Note 1)
T11H    HOLD Input Hold                    4           ns
T12     Input Setup 2                      7           ns
T13     Setup to ALE Inactive              8           ns     CL = 60 pF (LAD), CL = 50 pF (Controls)
T14     Hold after ALE Inactive            8           ns     CL = 60 pF (LAD), CL = 50 pF (Controls)
T15     Reset Hold                         3           ns
T16     Reset Setup                        5           ns
T17     Reset Width                        820         ns     41 CLK2 Periods Minimum

NOTES:
1. IAC/INT0, INT1, INT2/INTR, INT3 can be asynchronous.
2. A float condition occurs when the maximum output current becomes less than ILO. Float delay is
not tested, but should be no longer than the valid delay.
3. Case temperatures are "instant on".
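The T1 and T17 rows of the three tables are related by simple arithmetic: the minimum reset width is 41 CLK2 periods. The sketch below reproduces the listed values under the assumption, inferred from the T1 minimums (31.25 ns at 16 MHz), that CLK2 runs at twice the rated processor frequency. The function names are hypothetical.

```python
def clk2_period_ns(rated_mhz):
    """Minimum CLK2 period in ns, assuming CLK2 runs at twice the rated frequency."""
    return 1000.0 / (2 * rated_mhz)


def min_reset_width_ns(rated_mhz, clk2_periods=41):
    """Minimum RESET width in ns for the stated number of CLK2 periods."""
    return clk2_periods * clk2_period_ns(rated_mhz)


if __name__ == "__main__":
    for mhz in (16, 20, 25):
        print(f"{mhz} MHz: T1 = {clk2_period_ns(mhz)} ns, "
              f"T17 = {min_reset_width_ns(mhz)} ns")
```

At 16 MHz the product is 1281.25 ns, which the table lists as 1281 ns; the 20 MHz and 25 MHz entries (1025 ns and 820 ns) come out exactly.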
Figure 14. Processor Clock Pulse (CLK2)
Figure 15. RESET Signal Timing
(Signals shown: CLK2, CLK, RESET, OUTPUTS. T15 = RESET HOLD, T16 = RESET SETUP,
T17 = RESET WIDTH. Init parameters (BADAC, IAC/INT0) must be set up 8 clocks prior to the marked
CLK2 edge and held beyond the following marked CLK2 edge.)
Figure 16. Hold Timing
(Signals shown: CLK2, CLK, HOLDR, HOLD, HLDA, HLDAR during Th states. The primary bus master's
HOLD and HLDA pins connect to the secondary bus master's HOLDR and HLDAR pins; a delay of 5 ns
minimum is required.)
Design Considerations
Input hold times can be disregarded by the designer whenever the input is removed because a
subsequent output from the processor is deasserted (e.g., DEN becomes deasserted).
In other words, whenever the processor generates an output that indicates a transition into a
subsequent state, the processor must have sampled any inputs for the previous state.
Similarly, whenever the processor generates an output that indicates a transition into a subsequent
state, any outputs that are specified to be three-stated in this new state are guaranteed to be
three-stated.
Designing for the ICE-960MC
The 80960MC In-Circuit Emulator assists in debugging 80960MC hardware and software designs. The
product consists of a probe module, cable, and control unit. Because of the high operating
frequency of 80960MC systems, the probe module connects directly to the 80960MC socket.
When designing an 80960MC hardware system that uses the ICE-960MC to debug the system, several
electrical and mechanical characteristics should be considered. These considerations include
capacitive loading, drive requirement, power requirement, and physical layout.
The ICE-960MC probe module increases the load capacitance of each line by up to 25 pF. It also adds
one standard Schottky TTL load on the CLK2 line, up to one advanced low-power Schottky TTL load
for each control signal line, and one advanced low-power Schottky TTL load for each address/data
and byte enable line. These loads originate from the probe module and are driven by the 80960MC
processor.
To achieve high noise immunity, the ICE-960MC probe is powered by the user's system. The
high-speed probe circuitry draws up to 1.1A plus the maximum current (ICC) of the 80960MC
processor.
The mechanical considerations are shown in Figure 17, which illustrates the lateral clearance
requirements for the ICE-960MC probe as viewed from above the socket of the 80960MC processor.
Figure 17. ICE-960MC Lateral Clearance Requirements
(View from above the user CPU socket, showing the ICE processor module, the emulation processor,
the ribbon cable connector and the cable to the ICE control unit. Vertical clearance: 1.2".)
MECHANICAL DATA
Pin Assignment
The 80960MC is packaged in a 132-lead ceramic pin grid array and a 164-lead ceramic quad flatpack.
The 80960MC pin grid array pinout as viewed from the substrate side of the component is shown in
Figure 18 and from the pin side in Figure 19. The 80960MC ceramic quad flatpack pinout as viewed
from the top of the package is shown in Figure 20.
VCC and GND connections must be made to multiple VCC and GND pins. Each VCC and GND pin must be
connected to the appropriate voltage or ground and externally strapped close to the package.
Preferably, the circuit board should include power and ground planes for power distribution.
Tables 5, 6, 7 and 8 list the function of each pin.
NOTE:
Pins identified as N.C., "No Connect," should never be connected under any circumstances.
Package Dimensions and Mounting
Pins in the pin grid array package are arranged 0.100 inch (2.54 mm) center-to-center, in a 14 by 14
matrix, three rows around. (See Figure 21.)
A wide variety of available sockets allow low-insertion or zero-insertion force mountings, and a
choice of terminals such as soldertail, surface mount, or wire wrap. Several applicable sockets are
shown in Figure 22.
Package Thermal Specification
The 80960MC is specified for operation when its case temperature is within the range of -55°C to
+125°C. The PGA case temperature should be measured at the center of the top surface opposite the
pins as shown in Figure 23. The ceramic quad flatpack case temperature should be measured at the
center of the lid on the top surface of the package.
WAVEFORMS
Figures 24 through 30 show the waveforms for various transactions on the 80960MC's local bus.
Figure 18. MG80960MC Pinout-View from Top (Pins Facing Down)
Figure 19. MG80960MC Pinout-View from Bottom (Pins Facing Up)
The M82965 Bus Extension Unit (BXU) is the key to building multiprocessor and fault-tolerant systems with the
80960MC 32-bit microprocessor. BXUs connect to each other in an expandable matrix that can support up to
32 processor and memory modules in a single, high-performance system. No external interface logic is required. The BXU increases overall system performance by providing hardware support for local caches, I/O
prefetch, message passing, and multiprocessor arbitration. Through redundant modules, fault-tolerant systems
based on the BXU can sustain a single-point failure and then reconfigure themselves automatically, while
application programs continue undisrupted. Truly a VLSI building block, the M82965 BXU supports a wide
range of fault tolerance and performance options to meet a diverse set of cost, performance, and reliability
needs.
Figure 1. M82965 Block Diagram
January 1990
Order Number: 271082-003
M82965
FUNCTIONAL OVERVIEW
The M82965 Bus Extension Unit (BXU) is the key component in building multiprocessor and
fault-tolerant system designs with the 80960MC 32-bit microprocessor. Its primary function is to
connect the Local bus (L-Bus) of a system module to a system-wide bus called the Advanced Processor
Bus (AP-Bus), allowing the system to expand incrementally as each new module or AP-Bus is added.
Several important features are provided within the BXU which streamline 80960MC multiprocessor
system operation. To increase the available system bus bandwidth, multiple BXUs can be employed
within each system module to support up to four AP-Buses. To reduce AP-Bus traffic, BXU components
can directly support a two-way set-associative cache. I/O prefetch channels are incorporated within
each BXU to reduce the time necessary to transfer large blocks of data from shared system memory or
I/O. BXUs support processor-to-processor communication by recognizing, storing, and exchanging
Interagent Communication (IAC) messages with other BXUs along the AP-Bus. Requests for access to
the AP-Bus are resolved through BXU arbitration logic which ensures that no system modules will
suffer from resource starvation.
BXUs support fault tolerant system operation through several mechanisms used to detect, isolate
and recover from hardware errors. Paired BXUs monitor each other's operation on a cycle-by-cycle
basis through a method called Functional Redundancy Checking (FRC). Errors on the AP-Bus are
detected through interlaced parity bits on the address/data and control lines, signal duplication on
the transaction control lines, and a bus timer used to monitor the bus for non-response to a
request. Recovery mechanisms include the capability to marry FRC modules in a primary-shadow pair
(Quad Modular Redundancy), so that if either fails, the surviving spouse can take over operations
immediately. Transient errors on the AP-Bus are automatically retried, and in the case of permanent
errors, the failed bus is disabled and all memory accesses switched to a backup bus.
MULTIPROCESSOR SUPPORT
A multiprocessor 80960MC system is composed of a set of modules connected to an AP-Bus. Figure 2
shows the three possible types of modules: active, passive, and the combination of both an active
and passive module. Active modules contain up to two 80960MC processors, cache or private memory,
and a BXU. Passive modules contain a memory array and controller and a BXU. Active/Passive modules
contain either processors and global memory, or master and slave I/O devices.
Figure 2. Types of Modules
(Modules shown attached to the AP-Bus: active module, passive module, and two active/passive
modules.)
intel®
M82965
Local Bus
In a multiprocessor system each module has its own
Local Bus (L-Bus), which is typically confined to a
single board. The L-Bus is provided to interconnect
components within a module. It is a 32-bit multiplexed, synchronous bus with a maximum bandwidth
of 43 Mbytes per second at 16 MHz. It has been
designed to interface with standard support components using minimal glue logic. The L-Bus uses
HOLD/HOLDA for arbitration with bus slaves and
LOCK for signaling indivisible operations. A READY
signal can be used to lengthen bus transactions.
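The quoted peak of 43 Mbytes per second at 16 MHz is consistent with a four-word burst on a 32-bit bus completing in six bus cycles. That cycle count (one address cycle, four data cycles and one recovery cycle, with no wait states) is an assumption used here to reproduce the figure; this section does not spell out the derivation, and `burst_bandwidth_mbytes` is a hypothetical helper.

```python
def burst_bandwidth_mbytes(clock_mhz, words=4, cycles=6, bytes_per_word=4):
    """Peak transfer rate in Mbytes/s for `words` 32-bit words moved in `cycles` bus cycles."""
    return clock_mhz * words * bytes_per_word / cycles


if __name__ == "__main__":
    rate = burst_bandwidth_mbytes(16)
    print(f"16 MHz burst rate: {rate:.1f} Mbytes/s")
```

Each wait state inserted by withholding READY adds a cycle to the denominator, so a burst with one wait state per data word would take ten cycles under the same assumptions and the rate would drop accordingly.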
Local Bus protocol permits both primary and secondary bus masters to coexist on the bus (often a
processor and a DMA, or occasionally two processors). A secondary bus master must obtain use of
the L-Bus from the bus master through the use of
HOLDR/HOLDAR. A BXU is always used as a master in a memory module and is generally used as a
slave in a processor module. Fifty BXU pins are dedicated to L-Bus and module support operations (including cache control). The L-Bus control registers
are shown in Table 1.
Table 1. L-Bus Control Registers

Register               Description
Physical-ID (Local)    This register contains a unique identifier for a specific BXU on the L-Bus.
                       It corresponds to the AP-Bus Physical-ID register.
Logical-ID (Local)     This register holds the Logical-ID of the BXU. It corresponds to the AP-Bus
                       Logical-ID register.
LBI Control            This is the major control register for BXU functions on the L-Bus. It is used
                       to set the interleaving factor for the cache, determines if the BXU should act
                       as a master on the L-Bus, and indicates whether the BXU is in memory or
                       processor mode.
System Bus ID          This register uniquely identifies the BXU as attached to one of four AP-Buses.
Local-Bus Test         This register allows system diagnostics to check on the type of recognition
                       that was done on the previous L-Bus request.
Match 0                The contents of this register determine which bits in the L-Bus address should
                       be recognized by the BXU. This register provides a base address for a
                       partition of memory recognized by the BXU.
Mask 0                 The contents of this register determine if certain bits in the Match 0
                       register should be ignored (i.e., marked "don't care") during address
                       recognition.
Match 1                Same function as Match Register 0.
Mask 1                 Same function as Mask Register 0.
Match 2                Same function as Match Register 0.
Mask 2                 Same function as Mask Register 0.
Private Memory Match   Private memory address recognizer.
Private Memory Mask    Private memory mask register.
Advanced Processor Bus
A highly optimized multiprocessing bus called the
Advanced Processor Bus (AP-Bus) interconnects
80960MC system modules. The AP-Bus is synchronous, in that all components in the system, including
processors and BXUs, are driven by the same clock
edge. It is a 32-bit multiplexed bus with a maximum
bandwidth of 43 Mbytes per second at 16 MHz.
BXUs connect to each other in the form of a matrix
to allow orderly growth in the system by the addition
of buses or modules. An 80960MC multiprocessing
system allows up to 32 modules and four AP-Buses.
In practice, the number of modules in a system will
be somewhat less in order to meet the AP-Bus's
timing and electrical specifications; a practical limit
may be 20 to 25 connections to an AP-Bus. Table 3
contains a summary of the functions of the AP-Bus
Interface Registers.
Transactions over the AP-Bus are encoded into
pairs of request and reply packets. A request packet
defines the operation, amount of data, and the location (or address) where the transaction will occur. In
the case of a write request, the packet will also include data. The reply packet indicates whether or
not the action completed successfully, and in the
case of read replies, will also include the requested
data. Table 2 lists the various types of AP-Bus operations.
The AP-Bus supports a pipelining feature that allows
up to three requests to be pending at any time. Reply packets are returned in the order requested unless deferred, but requests and replies may be intermixed. For example, two requests may be made, followed by a single reply packet, then another request
packet, before being completed by two reply packets.
The AP-Bus consists of 47 bi-directional signals, a
clock signal, a RESET signal, and five module support signals which are used to interface system modules to the AP-Bus (see Figure 3). The BXU is the
only component that attaches to the AP-Bus.
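Table 2. Types of AP-Bus Operations
| Packet Type | Base Action | Specific Operation |
| Request | Write | Write Word(s) |
| Request | Write | RMW Write Word(s) |
| Request | Read | Read Word(s) |
| Request | Read | RMW Read Word(s) |
| Reply | Accepted | Read Reply Word(s) |
| Reply | Accepted | Acknowledge (Write Reply) |
| Reply | Refused | Reissue |
| Reply | Refused | Not Acknowledged (NACK) |
| Reply | Refused | Bad Access |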
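The RMW operations and the Reissue reply in Table 2 work together as a lock, as described under Read-Modify-Write Transactions. The following sketch models a memory-side BXU in software. All names are hypothetical, and the lock granularity is assumed to be one 16-byte block for simplicity; the data sheet defines the locked range in terms of address bits, which this model does not reproduce.

```python
from enum import Enum


class Reply(Enum):
    READ_REPLY = "Read Reply Word(s)"
    ACK = "Acknowledge (Write Reply)"
    REISSUE = "Reissue"


BLOCK_BYTES = 16  # assumed lock granularity for this sketch only


class MemoryBxu:
    """Toy model of the BXU that controls a region of memory."""

    def __init__(self):
        self.memory = {}
        self.locked_blocks = set()

    def rmw_read(self, addr):
        """RMW-Read: return data and set the lock, or refuse with Reissue."""
        block = addr // BLOCK_BYTES
        if block in self.locked_blocks:
            return Reply.REISSUE, None
        self.locked_blocks.add(block)
        return Reply.READ_REPLY, self.memory.get(addr, 0)

    def rmw_write(self, addr, value):
        """RMW-Write: behaves like a write, then resets the lock."""
        self.memory[addr] = value
        self.locked_blocks.discard(addr // BLOCK_BYTES)
        return Reply.ACK


def atomic_increment(bxu, addr, max_tries=8):
    """Requester side: retry the RMW-Read while the reply is Reissue."""
    for _ in range(max_tries):
        reply, value = bxu.rmw_read(addr)
        if reply is Reply.READ_REPLY:
            bxu.rmw_write(addr, value + 1)
            return value + 1
    raise RuntimeError("block stayed locked")
```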
Figure 3. Advanced Processor Bus
Signal groups shown in the figure (drawing 271082-3): Packet Signals (38 lines), Error Signal Group (4 lines), Synchronization (2 lines), Module Support (7 lines).
Transaction Control
• Arbitration: ARB (3..0)
• Reply Ordering: RPYDEF
Synchronization and Initialization Group
• System Clock: CLK2
• Initialization: RESET
Packet Signals
• Specification: SPEC (5..0)
• Address/Data: AD (31..0)
Module Support Group
• Identification: INITID
• Module Check: MODCHK
• Bus Output Control: BOUT
• Communication: COM
• Voltage Reference: VREF
• Pop Queue: POPQUE
• Subsystem Busy: SSBUSY
Error Signal Group
• Check Signal: CHK (1..0)
• Bus Error: BERL (1..0)
Table 3. AP-Bus Interface Registers
| Register | Description |
| Physical ID | This register contains a unique identifier for a specific BXU (or FRC pair of BXUs) on an AP-Bus. |
| Logical ID | This register holds the logical ID for the BXU. In every case, all BXUs in the same module will share the same logical ID. When two modules are married in a QMR configuration, they will also share the same logical ID. |
| Component Specifier | The contents of this read-only register are fixed at manufacture and specify the type and stepping of the component. |
| Arbitration ID | When the BXU needs to issue a request on the AP-Bus, it must actively arbitrate for the bus. The time and order in which a BXU arbitrates is determined by the contents of this write-only register. |
| Com | This register is used for loading external information, such as the type of board the BXU resides on, into the BXU. The register is useful for both initialization and diagnostics. |
| AP-Bus Control | This register is the general control and status register for the BXU's AP-Bus interface. |
| FT1 | Most of the BXU fault-tolerant capabilities can be selectively enabled by altering control bits in this register. |
| Maxtime | The value in this register determines the length of time that BXUs will remain quiescent following the beginning of an error report. |
| FRC Splitting Control | Writing to this register allows a master/checker pair of BXUs to be split into separately functioning components. |
| FRC Register | The contents of this register determine if a BXU is part of a master/checker pair and how the component responds if it is part of a QMR module. |
| Test Detection | Bits in this register enable parity logic and other internal self-testing diagnostic features. |
| AP Match | Bits in this register are compared against the corresponding bits in the AP-Bus address cycle and determine which partition of the address space is recognized by this BXU. |
| AP Mask | If a bit in this register is cleared, it will cause the corresponding bit position in the Address Match register to be ignored during comparisons. |
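The match/mask comparison described for the AP Match and AP Mask registers can be sketched as a single bitwise test. The function name and the example register values are hypothetical; only the rule that a cleared mask bit makes the corresponding address bit "don't care" comes from the table.

```python
def recognizes(address, match, mask):
    """Return True when `address` agrees with `match` on every bit
    position whose mask bit is set. A cleared mask bit means that
    position is ignored during the comparison."""
    return ((address ^ match) & mask & 0xFFFFFFFF) == 0


# Hypothetical setup: recognize the 256-Mbyte partition that starts
# at 0x40000000 by comparing only the top four address bits.
MATCH = 0x40000000
MASK = 0xF0000000
in_partition = recognizes(0x41234567, MATCH, MASK)
out_of_partition = recognizes(0x50000000, MATCH, MASK)
```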
Memory addressing over the AP-Bus is divided into
16-byte blocks. The location of a bus transaction is
defined by a 32-bit address. Each address points to
a single byte that is part of a larger 16-byte block. All
transactions are performed on a single block or portion of a block, and do not overlap multiple blocks.
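Because no transaction may overlap two 16-byte blocks, a requester has to break up any access that crosses a block boundary. The helper below is a minimal sketch of that split; the function name is hypothetical and lengths are in bytes.

```python
BLOCK = 16  # AP-Bus memory addressing is divided into 16-byte blocks


def split_at_block_boundary(addr, length):
    """Split an access into (address, byte_count) pieces so that
    no piece overlaps two 16-byte blocks."""
    pieces = []
    while length > 0:
        room = BLOCK - (addr % BLOCK)  # bytes left in the current block
        count = min(length, room)
        pieces.append((addr, count))
        addr += count
        length -= count
    return pieces


aligned = split_at_block_boundary(0x1004, 8)
crossing = split_at_block_boundary(0x100C, 8)
```

The first call stays within one block and is returned unchanged; the second straddles the boundary at 0x1010 and is returned as two four-byte pieces.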
Modes of Operation
The BXU operates in either Processor or Memory mode. Processor mode provides support for Active or Active/Passive modules, while Memory mode supports Passive modules. The functions of several BXU signals are dependent on the operating mode of the BXU.
In Processor mode, the BXU supports cache, I/O prefetch and IAC message functions. The BXU can act as either a master or slave on the L-Bus and requests can flow in either direction between the AP-Bus and the L-Bus. The assumption is, however, that most traffic will flow from the L-Bus out onto the AP-Bus. In a processor-only module, there is no need for the BXU to participate in arbitration for the L-Bus, since it will operate only as a slave.
In Memory mode, the BXU always operates as a master on the L-Bus and no requests are ever accepted from the L-Bus. All requests flow from the AP-Bus into the module. In this mode, the BXU supports memory functions and signaling, but does not provide caching or I/O prefetch.
Read-Modify-Write Transactions
Read-Modify-Write (RMW) operations are provided to give BXUs the ability to read and modify a location as a single indivisible action. An RMW-Read operation initiates the indivisible action by asserting the LOCK signal on the L-Bus. An RMW-Write operation is used to terminate the action.
When an RMW-Read transaction occurs, the block of memory addressed is marked by the BXU controlling that portion of memory as locked (the lock covers a fixed address space based on address bits 4 and 6). Once locked, any other RMW-Reads to this block will be rejected, but the block remains available for other types of memory operations.
When an RMW-Read is issued, the BXU controlling the affected memory will either respond with data in a normal Read Reply (and set the appropriate lock), or it will respond with a Reissue Reply indicating that the requested block is already locked. If refused, the requesting BXU will wait a short interval and then put the RMW-Read request back into the arbitration process and try again.
An RMW-Write is equivalent to Write Word(s) except that it resets the lock for that memory location. The only valid reply packet is the Ack (Write Reply).
Interagent Communications (IAC) Support
Bus Extension Units and 80960MC processors communicate by sending Interagent Communication (IAC) messages, which are a set of memory-mapped addresses recognized by all BXUs. These messages are used for such system functions as initialization, cache flushing, access to error logs and interrupts. The upper 16 Mbytes of the 80960MC's 4 Gigabyte address range are reserved for IAC communications.
IAC requests fall into two major groups: messages and register requests. Messages are sent between processors to cause a processor to perform a specific action (e.g., start, stop, flush cache, etc.) and are held in the IAC message support registers; Table 4 summarizes the function of these four registers. Register requests are used by software to read and write to BXU registers in order to control the system operation or configuration.
An IAC message always originates on an L-Bus and usually from a processor. From the originator, the request flows to the BXU where it may be handled internally or propagated on to the AP-Bus. If the IAC is sent on to the AP-Bus, the final destination of the IAC (another BXU) must reside on that bus. The IAC will not be propagated onto another L-Bus or AP-Bus. IAC messages can be one to four words long.
Although each L-Bus (processor or memory module) may be connected to as many as four AP-Buses, at any point in time only one bus will be designated as the message bus. All IAC messages will flow over that bus. The BXUs on the message bus are responsible for handling the IAC message traffic on behalf of the processors residing on their L-Bus (an L-Bus may support one or two processors).
AP-Bus 0 normally serves as the message bus. If AP-Bus 0 is not functional, then AP-Bus 1 serves as the message bus, completely transparent to the software. Processors are unaware of which bus is actually acting as the message bus.
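The reserved IAC region follows directly from the figures in this section: the upper 16 Mbytes of a 4-Gbyte address space. The sketch below computes the base of that region and tests whether an address falls inside it; the constant and function names are hypothetical.

```python
ADDRESS_SPACE = 2**32        # 4 Gbytes
IAC_REGION = 16 * 2**20      # upper 16 Mbytes reserved for IAC
IAC_BASE = ADDRESS_SPACE - IAC_REGION


def is_iac_address(addr):
    """True when a 32-bit address lies in the region reserved for IAC."""
    return IAC_BASE <= addr < ADDRESS_SPACE
```

Under these definitions the reserved region begins at 0xFF000000, so an address is an IAC address exactly when its top eight bits are all ones.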
Table 4. IAC Support Registers
| Register | Description |
| Processor 0 Priority | This register holds the priority of the task (process) which Processor 0 on the BXU's L-Bus is currently executing. |
| Processor 0 Message | This register buffers four words of data from an IAC message for Processor 0. |
| Processor 1 Priority | This register holds the priority of the task (process) which Processor 1 on the BXU's L-Bus is currently executing. |
| Processor 1 Message | This register buffers four words of data from an IAC message for Processor 1. |
I/O Prefetch Support
The BXU offers two I/O prefetch channels to provide high bandwidth, low latency access to memory for sequential transfers. Each channel buffers 32 bytes of data in two 16-byte blocks. As data is requested from the buffers, the BXU automatically prefetches the next data block. The BXU can take advantage of the three-deep AP-Bus pipeline to quickly fill the buffers if it ever gets behind because of momentary surges in AP-Bus traffic. In this way, the prefetch logic acts to provide stable, bounded response times, even in large multiprocessor configurations.
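The double-buffered behavior of one prefetch channel can be sketched in software. This is an illustration only: the class, its `fetch_block` callback, and the policy of refilling as soon as a block's last byte is read are assumptions, and reads are assumed to be sequential and never to cross a 16-byte boundary.

```python
BLOCK = 16  # each channel holds two 16-byte blocks (32 bytes)


class PrefetchChannel:
    """Toy model of one I/O prefetch channel (hypothetical)."""

    def __init__(self, fetch_block, start):
        self.fetch_block = fetch_block      # callback: block base -> 16 bytes
        self.next_block = start - start % BLOCK
        self.buffers = {}
        self._fill()                        # fill both buffers up front
        self._fill()

    def _fill(self):
        self.buffers[self.next_block] = self.fetch_block(self.next_block)
        self.next_block += BLOCK

    def read(self, addr, count):
        base = addr - addr % BLOCK
        offset = addr - base
        data = self.buffers[base][offset:offset + count]
        if offset + count == BLOCK:         # block consumed: prefetch the next
            del self.buffers[base]
            self._fill()
        return data


memory = bytes(range(256))
fetched = []


def fetch(base):
    fetched.append(base)
    return memory[base:base + BLOCK]


channel = PrefetchChannel(fetch, 0)
first_half = channel.read(0, 8)
second_half = channel.read(8, 8)
```

After the second read drains block 0, the model has already requested block 32, so two full blocks are buffered again before the next sequential read arrives.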
Because the normal operation of the BXU hides the latency of write requests by replying immediately on the L-Bus, the prefetch unit operates only for read requests. On a read request from the L-Bus, the prefetch logic returns the amount of data requested. Any processor or intelligent device used with the BXU must guarantee that it will split all memory requests that cross 16-byte boundaries into two requests.
Cache Support
The main function of a cache is to provide local high speed storage for frequently accessed memory locations. Storing the information locally, the cache intercepts memory references and handles them directly without transferring the request to the AP-Bus. This action results in lower traffic on the AP-Bus and decreased latency on the L-Bus, leading to improved performance for a processor on the L-Bus. It also increases potential system performance in a multiprocessor system by reducing each processor's demand for AP-Bus bandwidth, thereby allowing more processors in a system.
The BXU provides cache directory, coherency logic, and control signals, while external SRAM is used for data storage. A CACHE signal output from the 80960MC processor indicates to the BXU whether a request is cacheable. The operation of the BXU cache is not dependent on the size of the data transfer and therefore can support partial writes. Both data and instructions can be contained within the local cache.
The BXU supports a two-way, set associative cache with 64 sets. The (read address) tag field is 20 bits long and consists of LAD lines 31-12. There are eight bits that indicate if a line is valid (a line is 16 bytes). The control bits in the cache control registers can be used to mask some of these bits to change cache configurations. All entries in the directory can be invalidated by sending an INVALIDATE CACHE Command to each BXU in the module. Figure 4 shows one example of a BXU cache directory and its relation to L-Bus addresses.
Figure 4. Example of a Cache Directory Array
(Drawing 271082-4. Labels in the figure: Way 0; AP-Bus address fields LAD31-LAD12, LAD12-LAD7, LAD6-LAD4 and LAD3-LAD0; stored addresses (tags) for Set 0 through Set 63; tag compare; encoder; and a cache address made up of the way bit, line select and word select.)
A single BXU supports 16 Kbytes of cache. When a
processor module uses multiple BXUs (and therefore multiple buses), the BXUs cooperate to provide
a larger directory and addressing for a larger cache.
The best way to view this larger directory is to think
of it as having an increased number of sets. Thus a
cache managed by two BXUs will have a directory
consisting of 128 sets instead of 64. The maximum
size cache is 64 Kbytes (four BXUs supporting four
AP-Buses per processor module).
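The cache sizes quoted above are consistent with the directory geometry given under Cache Support: two ways, 64 sets per BXU, eight valid bits (lines) per directory entry, and 16 bytes per line. The sketch below checks that arithmetic and splits an address into fields. The set and line bit positions are read from the Figure 4 labels (LAD12-LAD7 and LAD6-LAD4) and should be treated as an assumption for the single-BXU case; the function names are hypothetical.

```python
WAYS = 2
SETS_PER_BXU = 64
LINES_PER_ENTRY = 8   # eight valid bits per directory entry
LINE_BYTES = 16


def cache_bytes(n_bxus):
    """Cache capacity managed by n cooperating BXUs."""
    return WAYS * SETS_PER_BXU * n_bxus * LINES_PER_ENTRY * LINE_BYTES


def directory_sets(n_bxus):
    """Number of sets in the combined directory."""
    return SETS_PER_BXU * n_bxus


def address_fields(addr):
    """Return (tag, set_index, line, byte) for a single-BXU cache.
    Tag is LAD31-LAD12 as stated in the text; the set and line
    positions are assumed from Figure 4."""
    tag = addr >> 12
    set_index = (addr >> 7) & 0x3F
    line = (addr >> 4) & 0x7
    byte = addr & 0xF
    return tag, set_index, line, byte
```

With these constants, one BXU gives 16 Kbytes and four BXUs give 64 Kbytes, and two BXUs give a 128-set directory, matching the figures in the text.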
The cache is managed using a write-through policy
that guarantees that the shared system memory will
always have the most recent copy of all data; BXU
caches never contain the only copy of revised data.
Any time a processor updates a cache entry, it always causes a write request on the AP-Bus, so that
there are never any hidden updates. In addition, all
BXUs monitor AP-Bus traffic to detect if an update is
being made to a location which they are storing in
their own cache. If so, that line in the cache directory
is marked invalid. This procedure guarantees that a
BXU cache will always return correct data even
when a system uses multiple caches, when multiple
processors treat a single data item differently (some
caching, some not), or when two processors are
used on a single L-Bus.
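The write-through policy with invalidation on observed updates can be sketched with two software caches sharing one memory. This is a simplified model under stated assumptions: the class and method names are hypothetical, memory is a dictionary of byte values, and the bus monitoring is represented by an explicit `snoop` call rather than real AP-Bus traffic.

```python
LINE = 16  # bytes per cache line


class WriteThroughCache:
    """Toy write-through cache with snoop invalidation (hypothetical)."""

    def __init__(self, memory):
        self.memory = memory   # shared dict: address -> value
        self.lines = {}        # line base -> {address: value}

    def read(self, addr):
        base = addr - addr % LINE
        if base not in self.lines:  # miss: fill the whole line from memory
            self.lines[base] = {a: self.memory.get(a, 0)
                                for a in range(base, base + LINE)}
        return self.lines[base][addr]

    def write(self, addr, value):
        """Every write goes to shared memory; a cached copy is updated too.
        Returns the line base that other caches would see on the bus."""
        self.memory[addr] = value
        base = addr - addr % LINE
        if base in self.lines:
            self.lines[base][addr] = value
        return base

    def snoop(self, base):
        """Another agent updated this line: mark the local copy invalid."""
        self.lines.pop(base, None)


shared = {0x20: 1}
cache_a = WriteThroughCache(shared)
cache_b = WriteThroughCache(shared)
before = (cache_a.read(0x20), cache_b.read(0x20))
cache_a.snoop(cache_b.write(0x20, 9))
after = cache_a.read(0x20)
```

Because the write reaches shared memory and the other cache drops its copy, the next read from `cache_a` misses and returns the updated value.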
An example of an SRAM control design using a single BXU is shown in Figure 5. The BXU supplies six memory control signals to interface the directory and control logic with an external cache composed of static RAM: Cache Read (CR), Cache Write (CW), Way0 (WY0), Way1 (WY1), Word0 (WD0), and Word1 (WD1). SRAM control also requires use of the L-Bus byte enable (BE3-BE0) signals and certain address lines. To simplify latching the byte enable signals, the BXU asserts READY on all address and recovery cycles as well as when it is transferring data.
Figure 5. Sample Cache SRAM Control Design Using a BXU
(Drawing 271082-5. The figure shows an M82965 BXU driving M51C68 SRAM, eight 4K x 4 devices, from LAD31-LAD0, BE3-BE0, W/R, READY and the BXU cache control outputs.)
The tight timing specifications of SRAMs require a small amount of external logic to interface a static RAM cache to a BXU. Since all BXU cache signals have a relatively wide clock to data valid specification (Tcd), external flip-flops are used to achieve tighter resolution of the Cache Write and Word edges. The address bits are latched using ALE from the processor. Way0 selects between the two "ways" in the cache directory, and Way1 selects between the cache and private memory (if present on the L-Bus).
In order to ensure that the cache is filled properly, the byte enable latch is cleared on read requests. If the processor made a read request for two bytes that missed the cache, the BXU would first write the entire 16-byte block, then return the requested information to the processor. If the byte enable latches weren't set, then the write into the cache wouldn't work correctly because not all byte enables would be asserted. Byte enable information does not need to be held on reads because data is always returned in full words and the processor selects the portion of the word that it needs internally. Signal timings are shown in Figures 6-10.
Figure 6. Cache Read Signal Timing for 35 ns SRAMs
Figure 7. Cache Write Signal Timing for 35 ns SRAMs
Figure 8. Cache Read Signal Timing for 70 ns SRAMs
Figure 9. Cache Write Signal Timing for 70 ns SRAMs
NOTE:
LERL signals must be asserted at both edges A2 and A3 in order for them to be recognized by the BXU.
Figure 23. Drive Levels and Measurement Points for A.C. Specifications. L-Bus Timings for the BXU as a Bus Slave
NOTE:
LERL signals must be asserted at both edges A2 and A3 in order for them to be recognized by the BXU.
Figure 24. Drive Levels and Measurement Points for A.C. Specifications. L-Bus Timings for the BXU as a Bus Master
Figure 25. Relative Timing for L-Bus Signals
NOTE:
BERL signals must be asserted at both edges A2 and A3 in order for them to be recognized by the BXU.
Figure 26. Relative Timing for AP-Bus Signals
Figure 27. RESET Setup and HOLD Timing
(Drawing 271082-31. Annotations in the figure: INIT parameters (BADAC, IAC0) must be set up 8 clocks prior to the marked CLK2 edge; all components must agree on this edge as edge A3, not edge A2; INIT parameters must be held beyond this CLK2 edge.)
L-BUS DESIGN CONSIDERATIONS
Input hold times can be disregarded by the designer
whenever the input is removed because of a subsequent output from the BXU (e.g., DEN becomes
deasserted). In other words, whenever the BXU generates an output that indicates a transition into a
subsequent state, the BXU will have sampled any
inputs for the previous state.
As an example, in the recovery (Tr) cycle following a
read, the minimum time (t6 Min) that DEN becomes
asserted is specified to be less than the minimum
hold time on the data (t11 Min). When DEN is asserted, however, the data is guaranteed to have been
sampled.
Similarly, whenever the BXU generates an output
that indicates a transition to a subsequent state, any
outputs that are specified to be tri-stated in this new
state will be tri-stated.
For example, in the data (Td) cycle following an address (Ta) cycle for a read, the minimum output delay (t6 Min) of DEN is specified to be less than the
maximum float time of LAD (t9 Max). When DEN is
asserted, however, the LAD outputs are guaranteed
to have been tri-stated.
AP-BUS SIGNAL TIMING
CONSIDERATIONS
The AP-Bus uses three-quarter cycle signaling for
data transmission. Data is driven on edge D and
sampled on edge C. This approach allows three-quarters of the bus cycle to be used for data transmission.
The remaining (one-quarter) time allows for clock
skew and signal hold time. All AP-Bus signals except
for the ARB, CHK, and BERL signals use this timing.
The relationship of the AP-Bus signals is shown in
Figure 28.
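The split between transmission time and skew/hold margin under three-quarter cycle signaling is simple arithmetic. The sketch below assumes the bus cycle is one period of the 16 MHz rate quoted for the AP-Bus earlier in this data sheet; the function name is hypothetical, and the result is the raw window before any skew or settling terms are subtracted.

```python
def three_quarter_cycle_ns(bus_mhz):
    """Return (transmission_ns, margin_ns): three-quarters of the bus
    cycle for data transmission, one-quarter for clock skew and hold."""
    cycle_ns = 1000.0 / bus_mhz
    return 0.75 * cycle_ns, 0.25 * cycle_ns


transmission_ns, margin_ns = three_quarter_cycle_ns(16)
```

At 16 MHz the cycle is 62.5 ns, so roughly 46.9 ns is available for transmission and about 15.6 ns remains for clock skew and signal hold time.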
The CHK signals (interlaced parity) are delayed by
one-half cycle or one phase to allow for generation
of parity from the internal data that is being transmitted. The CHK lines are sampled one phase after the
data has been sampled and compared against the
parity generated for the received data.
Most input signals on the AP-Bus are sampled on
the rising edge of CLK2 at edge C. The exceptions
are the error signals CHK, BERL and ARB, which
are sampled on the rising edge of CLK2 at edge A.
Regardless of the edge, the setup and hold times
are the same.
All outputs on the AP-Bus are driven relative to the
falling edge of CLK2 at the middle of phase 2, except CHK, BERL and ARB, which transition on the
falling edge of CLK2 at the middle of phase 1.
When designing a system based on the AP-Bus, the
system topology will be limited by the available propagation time for signals in the system. The propagation time must allow for settling of ringing, ground
shift, and crosstalk, all of which are dependent on
board and system materials and design.
The following equation gives the propagation time
available, given a specific clock implementation and
frequency:
Where Tskew is the worst case clock skew between
BXUs (clock skew is the time delay between any two
clocks in the system due to physical distribution limits).
In AP-Bus systems, this skew is defined as follows:
L-Bus Waveforms
Figures 30 through 36 illustrate the relationship of L-Bus signals during a variety of bus transactions. For
a detailed discussion of the operation of the L-Bus,
consult the 80960MC Hardware Designer's Reference Manual.
Figure 28. AP-Bus Signal Timing
(Drawing 271082-32. The figure plots CLK2 against the AP-Bus address/data cycle, the AP-Bus arbitration cycle, and the AP-Bus error cycle.)
Figure 29. System and Processor Clock Relationship
(Drawing 271082-33. The figure plots CLK2 against CLK.)
Figure 30. L-Bus Read Transaction (bus states Ta, Td, Tr)
Figure 31. L-Bus Write Transaction (bus states Ta, Tw, Td, Tr)
Figure 32. L-Bus Burst Read Transaction (bus states Ta, Td, Td, Td, Tr)
Figure 33. L-Bus Burst Write Transaction with One Wait State (bus states Ta, Tw, Td, Td, Tr)
(Drawings 271082-34 through 271082-37. Each figure plots CLK2, CLK, LAD31-LAD0, ALE, ADS, BE3-BE0, W/R, DT/R, DEN and READY.)
Figure 34. Hold Timing
(Drawings 271082-38 and 271082-39. The figure plots CLK2, CLK, HOLDR, HOLD, HLDA and HLDAR across hold (Th) states, and shows the primary bus master's HOLD and HLDA wired to the secondary bus master's HOLDR and HLDAR. It notes that a delay of 5 ns minimum is required.)
Figure 35. Interrupt Acknowledge Transaction
(Drawing 271082-40. The figure shows the previous cycle, interrupt acknowledgement cycle 1, five bus idle states, and interrupt acknowledgement cycle 2.)
Figure 36. Bus Exchange Transaction (PBM = Primary Bus Master, SBM = Secondary Bus Master)
(Drawing 271082-41. The figure plots the PBM and SBM bus states against CLK, LAD31-LAD0, W/R, PBM ALE, SBM ALE, HOLDR, HOLD, HLDA and HLDAR.)
Memories and Peripherals
4
85C960
1-MICRON CHMOS
80960 K-SERIES BUS CONTROL µPLD
• Burst Logic, Ready Control, and Address Decode Support for 80960 KA/KB Embedded Controllers in Single Chip
• Operates with 80960KA/KB at 20 MHz and 25 MHz
• Burst Logic Supports Both Standard and New Generation "Burst Mode" Memories and Peripherals
• Ready/Timing Control Supports 0-15 Wait States across 8 Address Ranges, Read/Write Accesses, Burst Transactions
• 8 Dedicated Inputs Decoded into 8 Latched Chip Selects (4 External/Internal; 4 Internal Only)
• Icc = 50 mA Max.
• UV Erasable (CerDIP) or OTP
• 100% Generically Testable Logic Array
• Based on Low Power CHMOS IIIE* Technology
• Available in 28-Pin 300-mil CerDIP and PDIP Packages and in 28-Pin PLCC Package (See Packaging Spec., Order Number #231369)
*CHMOS is a patented technology of Intel Corporation.
Figure 1. Pinout Diagram.
(Drawings 290192-1 and 290192-2. Pins shown: RESET/VPP, I7-I0, BLAST#, RDY#, WCLK#, A3, A2, GND, VCC, ADS#, DEN#, W/R#, CS0#-CS3#, AD0-AD3 and CLK2. # = Active Low Signals.)
August 1990
Order Number: 290192-002
GENERAL DESCRIPTION
The Intel 85C960 is a single-chip burst/ready/decode µPLD (Microcomputer Programmable Logic Device) designed to interface 80960 KA/KB embedded controllers to system memory and I/O. The 85C960 provides programmable chip selects, a programmable read/write access wait state/ready generator, and burst address (A2, A3) cycling. Burst transaction cycling of A2, A3, and WCLK# (Write Clock) is also supported for intelligent peripherals on the bus.
For its programmable functions, the 85C960 uses advanced EPROM cells as logic array and wait-state table memory elements. Coupled with Intel's proprietary CHMOS IIIE technology, the result is a programmable device able to support Intel's 32-bit 80960 KA/KB embedded controllers at speeds up to 25 MHz.
ARCHITECTURE DESCRIPTION
The 85C960 µPLD integrates burst control, ready generation, and chip select decoding into a single device. Figure 2 shows the architecture of the 85C960. Table 1 lists and describes each signal on the device. The 85C960 replaces 6-10 separate PLD/discrete logic devices in small- and medium-sized 80960 systems. For medium- to large-sized systems, the 85C960 can be supplemented with an additional decoder, such as the 85C508, and a second 85C960. Figure 3 shows a single 85C960 in a typical application.
Figure 2. 85C960 Block Diagram
(Drawing 290192-3. Labels in the figure: inputs I7-I0, W/R#, AD0-AD3, ADS#, DEN#, CLK2 and RESET; the Bus State Tracker; the internal latch enable LE; chip selects CS0#-CS3# and internal selects CS4#-CS7#; outputs RDY# (open drain), BLAST#, WCLK#, A3 and A2. Shading marks programmable resources.)
Figure 3. Single 85C960 in a Typical Application
(Drawing 290192-4. The figure shows an 80960KA/KB processor on the L-Bus (LAD31-LAD0) with data transceivers, address latches, burst mode memory, non-burst mode memory and I/O devices. The 85C960 burst/ready/decode device receives I7-I0, AD3-AD0, ADS, DEN, W/R, CLK2 from the system clock and the system reset, and drives the memory and I/O selects, A3, A2, WCLK, BLAST and READY. READY from other subsystems is tied to the same line.)
Table 1. 85C960 Pin Descriptions
| Symbol | Type | Name and Function |
| RESET | I | RESET. When RESET is high for a minimum of four CLK2 cycles, internal circuits are reset to a known state. |
| I7-I0 | I | INPUT 7-INPUT 0. These are the address range inputs to the programmable decode logic array. |
| CLK2 | I | SYSTEM CLOCK. This input, which connects to the 80960 CLK2 signal, provides the timing reference for all 85C960 operations. |
| AD3-AD0 | I | ADDRESS IN 3-ADDRESS IN 0. These inputs are driven by LAD0-LAD3 from the Local Bus (L-Bus) to provide addressing and burst access decode information. |
| W/R# | I | WRITE/READ. Write/Read from controller. When low, indicates that the current access is a read. When high, indicates that the current access is a write. |
| DEN# | I | DATA ENABLE. This input from the controller indicates that data is present on the L-Bus. |
| ADS# | I | ADDRESS/DATA STROBE. This input from the 80960 indicates whether address or data information is currently on the L-Bus. When low, address information is changing. The 85C960 chip select timing is based in part on ADS# low during Ta states. |
| BLAST# | O | BURST LAST. This signal, when low, indicates that the current read/write access is the last access in a burst transaction. BLAST# is not cycled if RDY# is generated off-chip. |
| WCLK# | O | WRITE CLOCK. This output provides a write enable strobe to memories that do not support burst mode access. |
| A3, A2 | O | ADDRESS OUT 3, 2. These outputs cycle during burst transactions. Typically connected to lowest memory address signals. |
| CS3#-CS0# | O | CHIP SELECT 3-CHIP SELECT 0. Single p-term select outputs that are driven active (low) for the programmed address combination on I7-I0. |
| RDY# | I/O | READY. RDY# is an active low, bidirectional, open-drain signal that should be connected to the controller's Ready input. As an output, RDY# goes high to cause the controller to extend the current access. RDY# goes low to indicate that the data on the L-Bus may be sampled (read) or removed (write). RDY# is controlled by the 85C960 Ready Generation and Wait-State Logic. The open-drain output allows RDY# to be OR-tied to other circuitry that may drive the controller's Ready input. As a bidirectional input, RDY# allows the 85C960 to provide Ready timing and burst cycling for intelligent peripherals that do not generate these signals themselves. |
80960 L-Bus (Local Bus) cycles are monitored by
the Bus State Tracker to synchronize the functional
blocks in the 85C960 to the L-Bus. CLK2 provides
the timing reference for all 85C960 operations.
and complements of all inputs (I7-I0) are available
to all eight NAND p-terms.
Each intersecting point in the logic array is connected or not connected based on the value programmed in the EPROM array. Initially (EPROM
erased state), no connections exist between any
p-term and any input. Connections can be made by
programming the appropriate EPROM cells. Since
p-terms are implemented as NANDs, a true condition on a p-term drives the output low. Current consumption is higher when both true- and complement
p-terms for the same input are programmed.
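A NAND p-term that sees both the true and the complement of each input can be modeled with two connection masks. This is a sketch only: the function name and mask encoding are hypothetical, inputs are packed as an 8-bit value with I7 as the most significant bit, and the behavior of a p-term with no connections programmed is not modeled.

```python
def pterm_output(inputs, true_mask, comp_mask):
    """Evaluate one NAND p-term of the chip select decoder.

    inputs    -- 8-bit value of I7-I0
    true_mask -- bits whose true input is connected to the p-term
    comp_mask -- bits whose complemented input is connected
    Returns 0 (select active, low) when every connected literal is 1,
    otherwise 1 (inactive, high)."""
    complements = ~inputs & 0xFF
    all_true = (inputs & true_mask) == true_mask
    all_comp = (complements & comp_mask) == comp_mask
    return 0 if (all_true and all_comp) else 1


# Hypothetical programming: select when I7-I4 = 1010, I3-I0 don't care.
TRUE_MASK = 0b10100000   # I7 and I5 must be 1
COMP_MASK = 0b01010000   # I6 and I4 must be 0
selected = pterm_output(0xA3, TRUE_MASK, COMP_MASK)
not_selected = pterm_output(0xB3, TRUE_MASK, COMP_MASK)
```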
Four external chip selects (CS0#-CS3#) are generated by the programmable Chip Select Decoder.
These four signals provide decoded selects to memory and I/O devices and are routed to the programmable Wait-State Table so that the 85C960 can
generate RDY# at the appropriate time. Four additional selects are decoded (internal only) and routed
to the Wait-State Table so that the 85C960 can generate RDY# for up to four additional address
ranges.
Selects are latched on the falling edge of an internal
Latch Enable (LE), which is generated from ADS#,
DEN #, and CLK2. The proper combination of these
signals occurs during an 80960 address state (Ta).
Figure 5 shows the relationship of the internal LE
and external chip selects to the three signals at the
end of a Ta state. All selects are cleared to an inactive high state at the start of a recovery state. (Tr).
All eight selects (four external and four internal) are
routed to the Wait-State Table.
The Ready Generation block generates ROY # to
the controller under control of the Wait-State Table.
Depending on the contents programmed into this table and the current type of access, from 0-15 wait
states can be introduced into each bus cycle. An
independent wait state value can be chosen for
each select and each access type. Four access
types are possible: read first, read subsequent, write
first, and write subsequent.
Wait State Table
The Burst Control and Address Counter blocks
control burst transaction timing to memory and I/O.
Note that the RDY# pin is sampled by the Burst
Control block to allow the 85C960 to generate burst
transaction timing for other bus peripherals. WCLK#
provides a write enable strobe for memory and I/O
that do not support burst mode. BLAST# informs
burst-mode devices that the current access is the
last one in a burst transaction. A2 and A3 are cycled
to select the address location for each access.
Chip selects, WR (Write/Read), and SW (Subsequent Word) feed the Wait-State Table. Each chip
select points to a set of four wait state values while
WR and SW determine which of the four values to
route to the Ready Generation block (see Figure 6).
The four values are grouped into read and write
groups with each group having a value for the first
access and subsequent access (second through
fourth). The four-bit wait-state value is sent to the
Ready Generation block (via WS0#-WS3#) to be
used as an initial count value. If two selects are active, the resulting count value is the logical bit AND
of the two individual values. If more than two selects
are active and the individual count values are not the
same, the resulting count value is indeterminate. If
no select is active, no count value is loaded (and the
Ready Generation circuit is disabled).
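The Wait-State Table lookup described above can be sketched as follows. The table contents, the dictionary layout, and the function name `wait_state_count` are hypothetical and chosen only for illustration; the selection by WR and SW, the bitwise AND for two active selects, the indeterminate case, and the "no count loaded" case follow the text.

```python
from functools import reduce

# Hypothetical table contents: select number -> {(wr, sw): 4-bit count}.
# wr: 0 = read, 1 = write; sw: 0 = first word, 1 = subsequent word.
WAIT_STATE_TABLE = {
    0: {(0, 0): 0b0011, (0, 1): 0b0001, (1, 0): 0b0010, (1, 1): 0b0000},
    1: {(0, 0): 0b0110, (0, 1): 0b0011, (1, 0): 0b0110, (1, 1): 0b0010},
    2: {(0, 0): 0b0110, (0, 1): 0b0000, (1, 0): 0b0001, (1, 1): 0b0001},
}


def wait_state_count(active_selects, wr, sw, table=WAIT_STATE_TABLE):
    """Return the initial count for the Ready Generation block, or None."""
    if not active_selects:
        return None  # no select active: no count value is loaded
    values = [table[s][(wr, sw)] for s in active_selects]
    if len(values) > 2 and len(set(values)) > 1:
        # More than two selects with differing values: result is indeterminate.
        raise ValueError("indeterminate count value")
    return reduce(lambda a, b: a & b, values)  # bitwise AND of the active entries


print(wait_state_count([0], wr=0, sw=0))     # single select, read first
print(wait_state_count([0, 1], wr=0, sw=0))  # two selects: 0b0011 AND 0b0110
```

Because two overlapping selects are ANDed bit by bit, the loaded count can be smaller than either individual entry, so overlapping address ranges deserve care when the table is programmed.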
FUNCTIONAL DESCRIPTION
The following paragraphs provide a detailed description of each functional block in the 85C960 µPLD.
Chip Select Decoder
The Chip Select Decoder, shown in Figure 4, is a
high speed, single p-term (product-term) latched decoder circuit with eight inputs (I0-I7) and eight
latched outputs. Each output goes low when its associated product term is true. Four of these outputs
(CS0#-CS3#) are available externally to be used
as device selects. The remaining four outputs
(CS4#-CS7#) are available internally so that the
85C960 can provide ready and burst timing for four
more device selects. (The actual selects for these
four additional devices/resources must be generated by external logic.)
Ready Generation
RDY# is high at the start of each burst transaction.
The RDY Generator begins to count down from the
wait state value, decrementing the counter at the
start of each wait state. When the internal counter
reaches 0000, RDY# is pulled low (CLK2c during
the data state). On the next CLK2c edge (for a wait
state), RDY# is released, allowing an external resistor to pull RDY# high. Figure 7 shows the timing for
a four-word burst write transaction with 1 wait state
for the first access and 0 wait states for the remaining three accesses (Burst Write 1-0-0-0).
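The "1-0-0-0" notation maps directly onto a sequence of bus states. The sketch below expands a list of per-access wait-state counts into the state labels used in the timing figures (Ta, Tw, Td, Tr); the function name `burst_states` is illustrative, and the limits of four accesses and 0-15 wait states are taken from the text.

```python
def burst_states(wait_states):
    """Expand per-access wait-state counts into a bus-state sequence.

    wait_states -- one entry per access in the burst, e.g. [1, 0, 0, 0]
    """
    if not 1 <= len(wait_states) <= 4:
        raise ValueError("a burst has 1 to 4 accesses")
    states = ["Ta"]                      # address state
    for count in wait_states:
        if not 0 <= count <= 15:
            raise ValueError("wait states must be in the range 0-15")
        states.extend(["Tw"] * count)    # RDY# held high: access extended
        states.append("Td")              # RDY# low: data sampled or removed
    states.append("Tr")                  # recovery state
    return states


print(burst_states([1, 0, 0, 0]))   # Burst Write 1-0-0-0
print(burst_states([1, 1, 1, 1]))   # one wait state on each access
```

The first call reproduces the state order labeled in Figure 7; the second corresponds to the four-word burst with one wait state on each access shown in the A.C. waveforms.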
The input to each latch is a single NAND p-term that
can be connected to the dedicated inputs. The true
RDY# is an open-drain I/O pin, which must be connected to pullup and pulldown resistors as shown in
Figure 8. During a wait-state access, RDY# is pulled
high to cause the controller to extend the current
access so that the memory or peripheral chip has
time to present data to the bus (read), or sample
data on the bus (write). RDY# is released on the
CLK2a edge of a Tr state. If a Read or Write access
occurs without a chip select having been decoded
on-chip, the RDY# output buffer is disabled and
RDY# is sampled as an input. This allows the
85C960 to cycle A2, A3, and WCLK# to provide
burst transaction timing for other bus controllers.
RDY# may be OR-tied with other bus controllers so
they can access the processor Ready signal.
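The high level seen on RDY# when the open-drain output is released follows from the resistor divider in Figure 8. The sketch below assumes the 150 Ω resistor is the pullup to VCC and the 220 Ω resistor is the pulldown to ground, which is the assignment that reproduces the VOH = 3.0V noted in that figure; the function name and the nominal 5.0V supply are illustrative.

```python
def rdy_released_voltage(vcc, r_pullup=150.0, r_pulldown=220.0):
    """Voltage on RDY# when the open-drain output is off (simple divider)."""
    return vcc * r_pulldown / (r_pullup + r_pulldown)


voh = rdy_released_voltage(5.0)
print(round(voh, 2))   # close to the 3.0V level noted in Figure 8
```

Only the unloaded divider is modeled here; input leakage of the devices tied to RDY# is ignored.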
[Figure 4 block diagram: inputs I0-I7 feed the latched p-term decoder that produces CS0#-CS3#; CLK2, ADS#, and DEN# generate the internal LE. Drawing 290192-5]
Figure 4. 85C960 Chip Select Decoder Block
[Figure 5 timing diagram: CLK2, ADS#, internal LE, and chip select active (based on I0-I7) across the Ta and Tw states. Drawing 290192-6]
Latch opens when CLK2 and DEN# go high and ADS# goes low.
Latch closes when DEN# goes low or ADS# or CLK2 go high.
Figure 5. Internal LE and External Chip Select Timing
WCLK#, BLAST# Generation
WCLK# is the write enable signal for writing to non-burst mode memories. When low, address outputs
A2 and A3 are valid. Its trailing edge (low-to-high
transition) can be used to latch data into non-burst
mode memories. WCLK# is only provided during
writes; during reads, WCLK# remains high.
BLAST# indicates that the current access is the last
access in a burst transaction. BLAST# is used by
burst-mode memories to reset internal address
counters. BLAST# is not cycled when RDY# is generated off-chip.
Burst Transactions
AD3, AD2 are latched to indicate the starting address of a burst transaction. The 85C960 places
these two signals out on A3 and A2, respectively,
then cycles the two addresses upward until the last
access of the burst. The 85C960 assumes that the
processor handles splitting of the burst transaction
when a 16-byte boundary is crossed.
AD0 and AD1 specify the size of the burst transfer in
double-words as shown in Table 2.
Table 2. AD0-AD1 vs Burst Size
| AD1 | AD0 | No. of Words Transferred |
|---|---|---|
| 0 | 0 | 1 |
| 0 | 1 | 2 |
| 1 | 0 | 3 |
| 1 | 1 | 4 |
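The burst address cycling and the Table 2 size encoding can be sketched together. The function below takes the four latched AD3-AD0 bits as one integer and returns the (A3, A2) pair driven for each access; the function name and the choice to raise an error when a burst would run past the 16-byte boundary are assumptions of this sketch, since the text says only that the processor is expected to split such bursts.

```python
def burst_accesses(ad3_to_ad0):
    """Return the (A3, A2) values driven for each access of a burst.

    ad3_to_ad0 -- 4-bit integer: AD3, AD2 = starting word, AD1, AD0 = size code
    """
    start = (ad3_to_ad0 >> 2) & 0b11     # AD3, AD2: starting address
    count = (ad3_to_ad0 & 0b11) + 1      # AD1, AD0: 00 -> 1 word ... 11 -> 4 words
    if start + count > 4:
        # Assumption: the processor splits bursts that cross a 16-byte boundary,
        # so this combination is treated as invalid input here.
        raise ValueError("burst would cross a 16-byte boundary")
    return [((word >> 1) & 1, word & 1) for word in range(start, start + count)]


print(burst_accesses(0b0011))   # start at word 0, four words
print(burst_accesses(0b1001))   # start at word 2, two words
```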
POWER-ON CHARACTERISTICS
85C960 inputs and outputs begin responding 1 µs
(max.) after VCC power-up (VCC = 4.75V) or after a
power-loss/power-up sequence. RESET must be
synchronous to CLK2 and must be held high for a
minimum of 4 clock cycles after VCC reaches 4.75V.
After 4 clock cycles, A2 and A3 are high, CS0#-CS3# (and CS4#-CS7#), BLAST#, and WCLK# are
high, and the open-drain RDY# signal is inactive.
[Figure 6 diagram: example Wait-State Table entries for CS0#, indexed by WR (0 = Read, 1 = Write) and SW (0 = First Word, 1 = Subsequent Word). Each entry is a four-bit value written msb to lsb (msb = most significant bit, lsb = least significant bit); the four example entries shown are 0000, 0000, 0011, and 0010.]
Figure 6. Example Wait-State Entries for CS0#
LATCH-UP IMMUNITY
All of the input, output, and clock pins of the device
have been designed to resist latch-up which is inherent in inferior CMOS processes. The 85C960 is designed with Intel's proprietary 1-micron CHMOS
EPROM process. Thus, each of the pins will not experience latch-up with currents up to ±100 mA and
voltages ranging from -0.5V to (VCC + 0.5V). The
programming pin is designed to resist latch-up to the
13.5V max. device limit.
DESIGN RECOMMENDATIONS
For proper operation, it is recommended that all input and output pins be constrained to the voltage
range GND ≤ (VIN or VOUT) ≤ VCC. All unused inputs should be tied high or low to minimize power
consumption (do not leave them floating). Unused
outputs may be left floating. A high-speed ceramic
decoupling capacitor of at least 0.2 µF must be connected directly between the VCC and GND pins.
As with all CMOS devices, ESD handling procedures
should be used with the 85C960 to prevent damage
to the device during programming, assembly, and
test.
FUNCTIONAL TESTING
Since the programmable sections of the 85C960 are
controlled by EPROM elements, the device is completely testable during the manufacturing process.
Each programmable EPROM bit controlling the internal logic is tested using application-independent
test patterns. EPROM cells in the device are 100%
tested for programming and erasure. After testing,
the devices are erased before shipment to customers. No post-programming tests of the EPROM
array are required.
The testability and reliability of EPROM-based programmable logic devices is an important feature
over similar devices based on fuse technology.
Fuse-based programmable logic devices require a
user to perform post-programming tests to ensure
device functionality. During the manufacturing process, tests on fuse-based parts can only be performed in very restricted ways in order to avoid preprogramming the array.
ERASURE CHARACTERISTICS
Erasure time for the 85C960 is 20 minutes at
12,000 µW/cm² with a 2537Å UV lamp.
Erasure characteristics of the device are such that
erasure begins to occur upon exposure to light with
wavelengths shorter than approximately 4000Å. It
should be noted that sunlight and certain types of
fluorescent lamps have wavelengths in the 3000Å-4000Å range. Data shows that constant exposure to
room level fluorescent lighting could erase the typical 85C960 in approximately two years, while it
would take approximately two weeks to erase the
device when exposed to direct sunlight. If the device
is to be exposed to these lighting conditions for extended periods of time, conductive opaque labels
should be placed over the device window to prevent
unintentional erasure.
The recommended erasure procedure for the
85C960 is exposure to shortwave ultraviolet light
with a wavelength of 2537Å. The integrated dose
(i.e., UV intensity x exposure time) for erasure
should be a minimum of fifteen (15) Wsec/cm². The
erasure time with this dosage is approximately 20
minutes using an ultraviolet lamp with a 12,000 µW/cm²
power rating. The device should be placed within 1 inch of the lamp tubes during exposure. The
maximum integrated dose the 85C960 can be exposed to without damage is 7258 Wsec/cm² (1
week at 12,000 µW/cm²). Exposure to high intensity
UV light for longer periods may cause permanent
damage to the device.
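The erasure figures quoted in the Erasure Characteristics text are related by the rule that integrated dose equals UV intensity times exposure time. The sketch below works through that arithmetic; the function names are illustrative, and the lamp rating, minimum dose, and one-week limit are the values given in the text.

```python
def integrated_dose(intensity_uw_per_cm2, seconds):
    """Integrated UV dose in W*sec/cm^2 for a lamp rated in microwatts/cm^2."""
    return intensity_uw_per_cm2 * 1e-6 * seconds


def exposure_seconds(dose_wsec_per_cm2, intensity_uw_per_cm2):
    """Exposure time in seconds needed to reach a given integrated dose."""
    return dose_wsec_per_cm2 / (intensity_uw_per_cm2 * 1e-6)


LAMP = 12_000  # microwatts/cm^2, the lamp rating quoted in the text

print(integrated_dose(LAMP, 20 * 60))          # dose after 20 minutes
print(exposure_seconds(15, LAMP) / 60)         # minutes needed for 15 W*sec/cm^2
print(integrated_dose(LAMP, 7 * 24 * 3600))    # dose after one week
```

Twenty minutes at this lamp rating gives about 14.4 Wsec/cm², and the 15 Wsec/cm² minimum takes a little under 21 minutes, which the text rounds to approximately 20 minutes. One week gives about 7257.6 Wsec/cm², matching the 7258 Wsec/cm² damage limit.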
[Figure 7 timing diagram: CLK2, ADS#, AD0-AD3, I0-I7, W/R#, CS0#-CS3#, A2/A3, and RDY# across the states Ta, Tw, Td, Td, Td, Td, Tr. Drawing 290192-7]
Figure 7. Burst Write Transaction (1-0-0-0)
[Figure 8 schematic: the 85C960 open-drain output with a 150 Ω resistor to VCC and a 220 Ω resistor to ground; IOL = 28.8 mA, VOH = 3.0V. Drawing 290192-8]
Figure 8. RDY# Pullup/Pulldown Resistors
IN-CIRCUIT RECONFIGURATION
The 85C960 allows in-circuit configuration changes
after the device has powered up. At power-up, the
device is configured according to the information
programmed into the EPROM cells. After power-up,
new information can be shifted in on select pins to
alter device configuration. The new configuration is
retained until the device is powered down or until the
information is overwritten by another configuration
change.
Note that in-circuit configuration changes allow "on-the-fly" changes to be made, but do not alter
EPROM cell data. At the next power-up, the device
will be configured according to the original data programmed into the EPROM cells. In-circuit reconfiguration requires additional circuitry external to the
85C960. For details on in-circuit configuration
changes, refer to AP-337, In-Circuit Reconfiguration
of 85C960 and 85C508 µPLDs, order number
292072.
DESIGN SOFTWARE
Software support is provided by version 2.1 (or later)
of iPLS II (Intel Programmable Logic Software II).
Programming is supported on the iUP-PC PC-based
programmer or iUP-200A/201A Universal Programmer via the GUPI base module and the GUPI
85EPLD28 programming adaptor.
For detailed information on iPLS II, refer to the
iPLDS II Data Sheet, order number: 290134. The
tools section of the Programmable Logic handbook
contains a complete listing of all design tools for Intel EPLDs.
ORDERING INFORMATION
| 80960KA/KB Clock Frequency | µPLD Order Code | Package | Operating Range |
|---|---|---|---|
| 20 MHz | D85C960-20* | CERDIP | Commercial |
| 20 MHz | N85C960-20 | PLCC | Commercial |
| 25 MHz | D85C960-25* | CERDIP | Commercial |
| 25 MHz | N85C960-25 | PLCC | Commercial |
*Only windowed CERDIP allows UV-erase.
ABSOLUTE MAXIMUM RATINGS*
Supply Voltage (VCC)(1) .......... -2.0V to +7.0V
Programming Supply
Voltage (VPP)(1) .............. -2.0V to +13.5V
D.C. Input Voltage (VI)(1, 2) ... -0.5V to VCC + 0.5V
Storage Temperature (Tstg) ..... -65°C to +150°C
Ambient Temperature (TA)(3) ..... -10°C to +85°C
NOTICE: This is a production data sheet. The specifications are subject to change without notice.
*WARNING: Stressing the device beyond the "Absolute
Maximum Ratings" may cause permanent damage.
These are stress ratings only. Operation beyond the
"Operating Conditions" is not recommended and extended exposure beyond the "Operating Conditions"
may affect device reliability.
NOTES:
1. Voltages with respect to GND.
2. Minimum D.C. input is -0.5V. During transitions, the inputs may undershoot to -2.0V or overshoot to +7.0V for
periods of less than 20 ns under no load conditions.
3. Under bias. Extended Temperature versions are also
available.
RECOMMENDED OPERATING CONDITIONS
| Symbol | Parameter | Min | Max | Units |
|---|---|---|---|---|
| VCC | Supply Voltage | 4.75 | 5.25 | V |
| VIN | Input Voltage | 0 | VCC | V |
| VO | Output Voltage | 0 | VCC | V |
| TA | Operating Temperature | 0 | +70 | °C |
D.C. CHARACTERISTICS (TA = 0°C to +70°C, VCC = 5.0V ±5%)
Symbol
Parameter
Min
VIH1(4)
High Level Input Voltage
(All Inputs except for
ADS #, ADO-ADS, DEN #,
andW/R#)
2.0
VIH2(4)
High Level Input Voltage
for ADS #, ADO-ADS,
DEN#, and W/R#
2.2
VIL(4)
Low Level Input Voltage
-O.S
VOH
High Level Output Voltage
Vou
Low Level Output Voltage
VOL2
Typ
Max
Vee
+ O.S
Unit
Test Conditions
V
V
0.8
2.4
V
V
10H = -4.0 mA D.C.,
Vee = Min.
0.4 .
V
IOL = 4.0 mA D.C., Vee = Min.,
CL = SO pF
Low Level Output Voltage
for A2, AS
0.45
V
10L = 24 mA D.C., Vee = Min.,
CL = 60 pF
VOL3
Low Level Output Voltage
for Open Drain (RDY #)
0.5
V
10L = SO mA D.C., Vee = Min.,
CL = SO pF
II
Input Leakage Current
±10
p,A
Vee = Max.,
GND :;:; VIN :;:; Vee
loz
Output Leakage Current
±10
p,A
Vee = Max.,
GND :;:; VOUT :;:; Vee
ISc!5)
Output Short Circuit Current
-90
mA
Vee = Max., VOUT = 0.5V
Icc
Power Supply Current
50
mA
Vee = Max., VIN = Vee or GND,
No Load, CLK2 =50 MHz
,-SO
10
NOTES:
4. Absolute values with respect to device GND; all over and undershoots due to system or tester noise are included.
5. Not more than 1 output should be tested at a time. Duration of that test should not exceed 1 second.
A.C. TESTING LOAD CIRCUIT (ALL OUTPUTS EXCEPT RDY#)
[Load circuit diagram. Drawing 290192-9]
See D.C. Characteristics Table for Current and Capacitance Specifications.
D1 and D2 are matched.
D3 and D4 are matched.
A.C. TESTING LOAD CIRCUIT (RDY#)
[Load circuit diagram. Drawing 290192-18]
See D.C. Characteristics Table for Current and Capacitance Specifications.
D1 and D2 are matched.
A.C. TESTING WAVEFORM-SYNCHRONOUS INPUTS AND OUTPUTS
[Waveform diagram: CLK2 swings between 0.45V and 3.0V with test points at 1.5V; inputs (setup and hold) swing between 0.4V and 2.4V; input and output test points at 2.0V and 0.8V. Drawing 290192-10]
A.C. Testing: Inputs are driven at 2.4V for a Logic "1" and 0.4V for a Logic "0". CLK2 is driven at 3.0V for a Logic "1"
and 0.45V for a Logic "0". Timing measurements made relative to CLK2 are made from 1.5V on CLK2. Inputs and
outputs are measured at 2.0V for a high and 0.8V for a low. Device input rise and fall times are less than 3 ns.
A.C. TESTING WAVEFORM-ASYNCHRONOUS INPUTS AND OUTPUTS
[Waveform diagram: inputs swing between 0.4V and 2.4V; output test points at 2.0V and 0.8V. Drawing 290192-11]
A.C. Testing: Inputs are driven at 2.4V for a Logic "1" and 0.4V for a Logic "0". Input timing is measured at 1.5V for
high-to-low and low-to-high transitions. Outputs are measured at 2.0V for a high and 0.8V for a low. Device input rise and
fall times are less than 3 ns.
A.C. CHARACTERISTICS (TA = 0°C to +70°C, VCC = 5.0V ±5%)
Symbol
85C960-25
Parameter
Min
Max
85C960-20
Min
Units
Max
11(6)
Input Setup to CLK2a
12
15
ns
12(6)
Input Hold from CLK2a
2
2
ns
t3
CLK2a to A2, A3 Valid Delay
0
t4
CLK2c to ROY # Output Low Delay
t5(7)
CLK2c to RDY# Output High Delay
t6
CLK2a to CSO#-CS3# High Delay
t7
CLK2a to BLAST # Low Delay
t8
CLK2a to BLAST # High Delay
5
t9(8)
CLK2b to WCLK # Low Delay
0
10
0
12
ns
0
10
0
12
ns
12
ns
15
ns
8
0
10
10
5
40
5
20
10
ns
15
ns
15
ns
50
ns
20
ns
5
ns
tlO(8)
CLK2d to WCLK # High Delay
t11 (9)
ADS# Low to CSO#-CS3# Low Delay
t12(9)
CLK2c to CSO # -CS3 # Low Delay
t13(10)
10-17 Setup to CLK2a
5
t14(10)
10-17 Hold from CLK2a
2
115(11)
10-17 Valid to CSO#-CS3#Valid Delay-(tPD)
t16
ROY # Input Setup to CLK2d (Write)
7.5
10
ns
t17
ROY # Input Setup to CLK2a (Read)
9
9
ns
t18
ROY # Input Hold after CLK2a (Read/Write)
5
10
ns
t19(12)
RESET High Setup to CLK2
0
0
ns
t20(13)
RESET High Hold from CLK2
3
3
ns
t21 (12)
RESET Low Setup to CLK2a
5
5
ns
10
12
i
i
7
ns
2
10
ns
12
ns
NOTES:
6. Applies to ADS#, DEN#, W/R#, and AD0-AD3. DEN# is high during the entire Ta state in 80960KA/KB systems.
7. RDY# is an open-drain output. Specified time includes RDY# output float delay and pull-up/pull-down resistors
(Figure 8). RDY# remains low for a minimum of 10 ns at the start of a Tr state and goes high by CLK2a of the next Tx state.
8. Minimum WCLK# pulse width is one clock period minus 3 ns. For example, at 25 MHz: 20 ns - 3 ns = a 17 ns minimum
WCLK# pulse.
9. Chip Select Decoder latches are transparent flow-through types. Latches open when ADS# is low, DEN# is high, and
CLK2 goes high during the middle of a Tx state (CLK2c). Since DEN# is high during the entire Ta state in 80960KA/KB
systems, only CLK2c and ADS# are specified.
10. Chip Select Decoder latches are transparent flow-through types. Latches close when ADS# is high or DEN# is low, or
when CLK2 goes high at the start of a Tx state (CLK2a) after the latches have opened. Since ADS# is low and DEN# is
high at the end of a Ta in 80960KA/KB systems, setup and hold times are specified with reference to CLK2a only.
11. Propagation delay while latches are open (transparent); one output switching (high-to-low).
12. RESET must be held high for a minimum of 4 CLK2 cycles (80960 specifies 41 CLK2 cycles minimum).
13. RESET must hold after the low-to-high transition immediately prior to CLK2a. CLK2a is defined as the first low-to-high
transition after RESET goes low.
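The WCLK# arithmetic in note 8 can be checked for both speed grades. The sketch assumes that the "clock period" in that note is the CLK2 period and that CLK2 runs at twice the rated frequency, which is what the note's own example implies (20 ns at 25 MHz); the function name is illustrative.

```python
def min_wclk_pulse_ns(rated_mhz):
    """Minimum WCLK# pulse width: one clock period minus 3 ns (note 8)."""
    clk2_mhz = 2 * rated_mhz            # assumption: CLK2 is twice the rated frequency
    period_ns = 1000.0 / clk2_mhz       # 1000 / MHz gives the period in nanoseconds
    return period_ns - 3.0


print(min_wclk_pulse_ns(25))   # 85C960-25: 20 ns - 3 ns
print(min_wclk_pulse_ns(20))   # 85C960-20: 25 ns - 3 ns
```

Any non-burst memory written with WCLK# therefore needs a write-pulse requirement at or below these values.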
CLK2 EDGES
[CLK2 edge diagram identifying the CLK2 edges referenced in the A.C. Characteristics. Drawing 290192-12]
NOTE:
Minimum CLK2 high and low times are 8 ns measured from 1.5V to 1.5V.
CAPACITANCE (TA = 0°C to +70°C; VCC = 5.0V ±5%)
| Symbol | Parameter | Min | Typ | Max | Unit | Conditions |
|---|---|---|---|---|---|---|
| CIN | Input Capacitance | | 6 | 10 | pF | VIN = 0V, f = 1.0 MHz |
| COUT | Output Capacitance | | 6 | 10 | pF | VOUT = 0V, f = 1.0 MHz |
| CCLK | CLK2 Capacitance | | 6 | 10 | pF | VIN = 0V, f = 1.0 MHz |
| CVPP | VPP Pin Capacitance | | 10 | 25 | pF | VPP on Pin 1 (RESET) |
| CRDY | RDY# Capacitance | | 6 | 10 | pF | VOUT = 0V, f = 1.0 MHz |
4 Word Burst Write with 1 Wait State on Each Access
RDY# is Generated by the 85C960
(Same Timing for Read Cycle, Except WCLK# Remains High)
[Timing diagram: CLK2, ADS#, DEN#, W/R#, AD0-AD3, I0-I7, CS0#-CS3#, A2/A3, WCLK#, BLAST#, and RDY# across the states Tw, Td, Tw, Td, Tw, Td, Tw, Td, Tr, with numbered A.C. timing references. Drawing 290192-13]
WCLK# TIMING
[Timing diagram: CLK2 and WCLK# within one Tx state, with the CLK2 edges marked. Drawing 290192-14]
I0-I7 AND CS0#-CS3# TIMING
[Timing diagram: CLK2, ADS#, DEN#, AD0-AD3, and CS0#-CS3# across the Ta and Tw/Td states. Drawing 290192-15]
NOTE:
CLK2, ADS#, and DEN# generate internal latch enable. See Figure 7 for details.
3 Word Burst with 0 Wait States on Each Access
RDY# is Generated Externally
(WCLK# is Only Generated During Burst Write Transactions)
[Timing diagram: CLK2, W/R#, and A2/A3 across the states Ta, Td, Td, Td, Tr. Drawing 290192-16]
RESET INPUT TIMING
[Timing diagram: CLK2 and RESET, with RESET held for 4 CLK2 cycles (minimum). Drawing 290192-17]
27960CX
PIPELINED BURST ACCESS 1M (128K x 8) CHMOS EPROM
■ Synchronous 4 Byte Data Burst Access
■ No Glue Interface to 80960CA
■ High Performance Clock to Data Out
  - Zero Wait State Data to Data Burst
  - Up to 33 MHz 80960CA Performance
■ Asynch Microcontroller Reset Function
  - Returns to Known State with High-Z Outputs
■ Pipelined Addressing for Optimal Bus Bandwidth on 80960CA
  - Next Addressing Overlaps Last Data Byte
■ CHMOS III-E for High Performance and Low Power
  - 125 mA Active, 30 mA Standby
  - TTL Compatible Inputs
■ 1 Mbit Density Configures as 128K x 8
Intel's 27960CX is a 5V only, 1,048,576 bit, Erasable Programmable Read Only Memory, organized as 128K
words of 8 bits.
The 27960CX provides a no glue synchronous burst interface to the 80960CA bus. Internally the 27960CX is
organized in 4 byte blocks, in which each byte is accessed sequentially. The internal state machine is factory
configured to generate either 1 or 2 wait-states between the address and first data byte. High performance
outputs provide zero wait-state data to data accesses at clock frequencies up to 33 MHz.
Pipelining capability allows addresses to overlap previous data, further optimizing bus bandwidth in 80960CA
applications. An asynchronous microcontroller RESET feature puts the outputs in the high impedance state
and takes the internal state machine to a known state where a new burst access can begin.
The 27960CX is available in 44-lead PLCC package, providing optimum cost effectiveness.
The 27960CX is manufactured on Intel's 1 micron CHMOS III-E technology. The Quick-Pulse Programming™
algorithm provides fast, reliable programming with throughput under 17 seconds for optimized equipment.
*CHMOS is a Patented Process of Intel Corporation.
[Figure 1 block diagram: A0-A16 address latches, state machine, CLK, and CS. Drawing 290236-1]
Figure 1. 27960CX Burst EPROM Block Diagram
September 1991
Order Number: 290236-006
27960CX BURST EPROM
EPROMs are established as the preferred code storage device in embedded applications. The non-volatile, flexible, reliable, cost effective EPROM makes a
product easier to design, manufacture and service.
Until recently, however, EPROMs could not match
the performance needs of high-end systems. The
27960CX was designed to support the 80960CA embedded processor. It utilizes the burst interface to
offer near zero wait-state performance without the
high cost normally associated with this performance.
In embedded designs, board space and cost must
be kept at a minimum without impacting performance and reliability. The 27960CX removes the need
for expensive high-speed shadow RAM backed up
by slow EPROM or ROM for non-volatile code storage. Code optimization concerns are reduced with
"off-chip" code fetches no longer crippling to system performance. FONTs can be run directly out of
these EPROMs at the same performance as high-speed DRAMs. With the 27960CX, the EPROM is
the ideal code or FONT storage device for your
80960CA system.
Architecture
The 27960CX provides a no-glue, synchronous burst
interface to the 80960CA's bus. It operates in pipelined or non-pipelined modes. Internally, the
27960CX is organized in 4 byte blocks which are
accessed sequentially. A burst access begins on the
first clock pulse after ADS and CS are asserted. The
address of the 4 byte block is latched on the rising
edge of clock following ADS. After a preset number
of wait-states (1 or 2), data is output one byte at a
time on each subsequent clock cycle. A burst access is terminated on the rising edge of clock with
BLAST asserted. High performance outputs provide
zero wait-state data to data accesses at clock frequencies up to 33 MHz. Extra power and ground
pins dedicated to the outputs reduce the effects of
fast output switching on device performance.
The pipelining capability of the 27960CX allows the
address to overlap the last data byte of the burst,
further optimizing bus bandwidth in 80960CA applications. In the pipelined mode, with a non-buffered
interface, the 27960CX delivers 4 bytes of data in
6 clock cycles at 33 MHz. In a 32-bit configuration,
this translates into a read bandwidth of 88 Mbytes/sec. Performance capability of the 27960CX in different 80960CA systems is given in Table 1.
*CERQUAD is available in a socket only version.
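The 88 Mbytes/sec figure follows from the burst length and the clock rate. The sketch below assumes a 32-bit configuration built from four byte-wide devices, each delivering 4 bytes per burst, so that one burst moves 16 bytes; the function name and parameters are illustrative. The same arithmetic reproduces the 66 and 51 Mbytes/sec entries that Table 1 lists for 25 MHz (6 clock cycles) and 16 MHz (5 clock cycles).

```python
def burst_read_bandwidth(clock_mhz, clocks_per_burst, devices=4, bytes_per_device=4):
    """Read bandwidth in Mbytes/sec for back-to-back pipelined bursts."""
    bytes_per_burst = devices * bytes_per_device   # 16 bytes in a 32-bit system
    return bytes_per_burst * clock_mhz / clocks_per_burst


for mhz, clocks in [(33, 6), (25, 6), (16, 5)]:
    print(mhz, "MHz:", int(burst_read_bandwidth(mhz, clocks)), "Mbytes/sec")
```

The results are truncated to whole Mbytes/sec to match the way the data sheet quotes them.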
[Figure 2 signal diagram: 27960CX Burst EPROM (128K x 8) with a 17-bit ADDRESS input, an 8-bit DATA bus, and the ADS, BLAST, RESET, CLK, and PGM signals. Drawing 290236-2]
Figure 2. 27960CX Burst EPROM Signal Set
Table 1. Performance Capability
| Clock | Wait States | Interface | Burst | Read Bandwidth |
|---|---|---|---|---|
| 33 MHz | 2 WS | Non-Buffered | 4 Words/6 Clock Cycles | 88 Mbytes/Sec |
| 25 MHz | 2 WS | Buffered | 4 Words/6 Clock Cycles | 66 Mbytes/Sec |
| 16 MHz | 1 WS | Buffered | 4 Words/5 Clock Cycles | 51 Mbytes/Sec |
[Per-clock ADDR, DATA, and PCLK timing rows for each entry are not reproduced.]
[Figure 3 pinout diagram: N27960CX, 44 lead PLCC, 0.650" x 0.650", top view. Drawing 290236-3]
Figure 3. 27960CX 44 Lead PLCC Pinout
PIN DESCRIPTIONS
| Symbol | Pin | Function |
|---|---|---|
| A0-A16 | 23-39 | ADDRESS INPUTS: During a burst operation, A2-A16 provide the base address pointing to a block of four consecutive bytes. A0 and A1 select the first byte of the burst access. The 27960CX latches addresses in the first clock cycle. An internal address generator increments addresses A0 and A1 for subsequent bytes of the burst. |
| D0-D7 | 18, 17, 14, 13, 11, 10, 7, 6 | DATA INPUTS/OUTPUTS |
| ADS | 42 | ADDRESS STROBE: Indicates the start of a new bus access. ADS is active low in the first clock cycle of a bus access. |
| CS | 3 | CHIP SELECT: Master device enable. When asserted (active low) data can be written to and read from the device. In read mode, CS enables the state machine and the I/O circuitry. NOTE: 1. The address decode path is independent of CS, i.e., X and Y decoding is always powered up. 2. For programming, CS should remain low for the entire cycle. Program and verify functions are done one byte at a time. 3. CS going high does not terminate a concurrent burst cycle. |
| BLAST | 1 | BURST LAST: Terminates a concurrent burst data cycle at the rising edge of the CLK. It must be asserted by the fourth data byte. |
| RESET | 22 | RESET: Resets the state machine into a known state, tri-states the outputs. RESET must be asserted for a minimum of 10 clock cycles. At least 5 clock cycles are required after deassertion of RESET before beginning the next cycle. RESET will abort a concurrent bus cycle. |
| PGM | 43 | PROGRAM-PULSE CONTROL INPUT |
| VPP | 2 | PROGRAMMING POWER SUPPLY |
| VSS | 5, 8, 12, 15, 19, 21 | GROUND |
| VCC | 9, 16, 20, 44 | SUPPLY VOLTAGE INPUT |
INTERFACE EXAMPLE
Overview
This example illustrates 8-, 16- and 32-bit wide
27960CX interfaces to the 80960CA. The designs
offer a simple "no-glue" interface.
A non-buffered 27960CX system organized as 256K
x 32 is shown in Figure 4A. Since the 27960CX is
capable of driving an 80 pF load, large, non-buffered
systems can be implemented by stacking up to 2
banks of 4 EPROMs, resulting in a 256K x 32 memory subsystem. The input capacitive load seen
on the address lines (due to the EPROM only) is
24 pF for a 128K x 32 system and 48 pF for a 256K x
32 system. The EPROM is specified at 6 pF for input
capacitance (15 pF max) and 12 pF typical for output capacitance. Larger systems can be implemented with buffers (Figure 4B).
Chip Select Logic
High order address lines are decoded to provide CS.
Qualification with other signals is not required. The
chip select logic can be implemented with standard
asynchronous decoders, PALs or PLDs (like Intel's
85C508).
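The address-line loading quoted for the non-buffered systems (24 pF and 48 pF) is the 6 pF input capacitance multiplied by the number of EPROMs on each address line. A minimal sketch of that calculation follows; the function name and the `max` switch are illustrative, and only EPROM input capacitance is counted, as in the text.

```python
CIN_TYP_PF = 6.0    # 27960CX input capacitance quoted in the text
CIN_MAX_PF = 15.0   # maximum input capacitance quoted in the text


def address_line_load_pf(banks, devices_per_bank=4, use_max=False):
    """EPROM-only capacitive load on one address line, in pF."""
    cin = CIN_MAX_PF if use_max else CIN_TYP_PF
    return banks * devices_per_bank * cin


print(address_line_load_pf(1))   # 128K x 32: one bank of four devices
print(address_line_load_pf(2))   # 256K x 32: two banks of four devices
```

Board trace and connector capacitance add to these values in a real layout, so the figures are a lower bound on what the address driver sees.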
[Figure 4A block diagram: the 80960CA address and data buses connect directly to two banks of four 27960CX (128K x 8) devices; a decoder (DEC) generates CS1 and CS2. Drawing 290236-4]
Figure 4A. 256K x 32 Non-Buffered Burst EPROM Memory System
[Figure 4B block diagram: the 80960CA address bus passes through an address driver and the data bus through a transceiver (XCVR) to banks of 27960CX (128K x 8) devices; a decoder (DEC) generates CS1 through CSN. Drawing 290236-5]
Figure 4B. Buffered Burst EPROM Memory System
Schematics
Figure 5 shows a non-buffered, 128K x 32 27960CX
EPROM system.
Chip select logic, the only external logic that is required for this interface, can be derived from the
global system chip select circuitry.
In a non-buffered, 16-bit system (Figure 6A) BE1
and A2 connect to the lower order address bits of
the 27960CX. BE1 connects to A0 of both EPROMs,
while A2 connects to both A1's.
In a non-buffered, 8-bit system (Figure 6B) BE0 and
BE1 connect to A0 and A1 respectively.
[Figure 5 schematic: 80960CA address lines A2-A18 drive A0-A16 of the 27960CX (128K x 8) devices; an 85C508 decoder generates CS; ADS, PCLK (to CLK), and BLAST connect directly; 32-bit data bus. Drawing 290236-6]
Figure 5. 128K x 32 27960CX Burst EPROM System
[Figure 6A schematic: an 85C508 decoder generates CS from the 80960CA address; 15 address lines drive A2-A16 of two 27960CX (128K x 8) devices; A2 connects to A1 and BE1 to A0; the devices supply D0-D7 and D8-D15 of the 16-bit data bus; ADS, PCLK (to CLK), and BLAST connect directly. Drawing 290236-7]
Figure 6A. 27960CX Burst EPROM in a 16-Bit System
[Figure 6B schematic: an 85C508 decoder generates CS from the 80960CA address; 15 address lines drive A2-A16 of one 27960CX (128K x 8) device; BE1 connects to A1 and BE0 to A0; the device supplies D0-D7 of the 8-bit data bus; ADS, PCLK (to CLK), and BLAST connect directly. Drawing 290236-8]
Figure 6B. 27960CX Burst EPROM in an 8-Bit System
CS Setup Time
CS setup time is the time between CS being asserted and the first CLK rising edge (during the address
cycle). Since a memory access begins on the first
CLK rising edge after ADS and CS are asserted, a
minimum CS setup time of 7 ns (tSVCH) at 33 MHz is
required. With the 80960CA's maximum valid address delay of 14 ns at 33 MHz, 9 ns remains for CS
decoding logic.
Waveforms
Figure 7 shows the timing waveforms of a 27960CX
pipelined read in a 32-bit system.
Bootup
The wait state configuration (1 or 2) of the 27960CX
is programmed by the user into the 80960CA Region
Table parameters NRAD, NRDD, and NXDA.
NRDD is always 0 for the 27960CX.
[Figure 7 timing diagram: pipelined burst read showing CLK, address/data (A/D), and DATA with NRAD = 2, NRDD = 0, NXDA = 0. Drawing 290236-9]
NOTES:
1. The EPROM can also operate in non-pipelined mode, i.e., next address and ADS can be asserted in the clock cycle
following the last data word of the burst.
2. 2-0-0-0 Burst Read: 2 indicates the number of wait states to access the first word; the 0's indicate the number of wait states for subsequent data words (0 in this case).
Figure 7. Two Cycles of a 27960CX 2 Wait State 4 Byte Read (2-0-0-0 Burst Read) in a 32 Bit System
During boot-up (Figure 8), the 80960CA picks up its
Region Table data from addresses FFFF FF00;
FFFF FF04; FFFF FF08 and FFFF FF0C. Only the
least significant byte of each of the above four 32-bit
accesses is used to configure the Region Table. For
boot-up, the wait-state parameters NRAD and NXDA
default to 31 and 3 respectively. During boot-up, the
27960CX will wrap around the first word of the four-word burst and hold the first word until BLAST is
asserted.
27960CX DEVICE NAMES
The device names on the 27960CX were derived as
mnemonics that correspond to the number of wait
states and expected operating frequency for the device. For example, the 25 MHz, 2 wait state
27960CX is named 27960C2-25.
AC TIMING DERIVATIONS
The AC timings for the 27960CX were generated
specifically to meet the requirements of the
80960CA microprocessor. In each case the applicable 80960CA clock frequency and AC timing were
taken together with an address buffer delay (if needed) and a typical 2 ns guardband to generate the
27960CX AC timing. Worst case timings were
always assumed. On timings where the EPROM is
faster than the microprocessor, we specified the
time required by the EPROM and left the excess
time as additional system guardband. The example
below shows how the 27960C2-33 tavcoh timing
was derived.
@33 MHz the clock cycle is ≈ 30 ns.
tOV2 of the 80960CA is 3 ns - 14 ns.
Typical 2 ns guardband.
27960C2-33 tavcoh = 30 ns - 14 ns - 2 ns = 14 ns
Decoders are needed for the system's chip select
decoding. For the 27960CX timings we assumed a
10 ns chip select decoder for 16 MHz and a 7 ns
decoder for 25 MHz and 33 MHz systems. The example below shows how the 27960C2-33 tsvch timing was derived.
@33 MHz the clock cycle is ≈ 30 ns.
tOV2 of the 80960CA is 3 ns - 14 ns.
Decoder = 7 ns
27960C2-33 tsvch = 30 ns - 14 ns - 7 ns = 9 ns
""".
d
(2:
G
a
PCLK
"T1
I
cE'
e::
...
ADDR
CD
!»
N
-.J
:gl
ADS
0
0
-.J
.....
gl
2
CS
CD
en
5
6
31
7
~
~,.
\fA.
XI; m: ml IX
FFFF FFOO
11
'\
FFFF FFO'
IA.
II .m 1m 11K
FFFF FF06
ll:
m: 1mB
\V
\1
FFFF FFOC
:roDATA
...e::
\J
"'coOl"
V
00~01~02~031JJ
Ill:
o
\
FIRST BYTE
\ 00
W~APPED
n:
C')
AROUND
m:l!:
X
m m04~05~06~07n
mm m: m: mmloB~09~
A
~ B U: (I:
m: mOC~OD~OE~OFt:tJJ0
-NWS·
't:I
::II
3
S·
(Q
I:
/ ' 1----"
0
0
0
•
n. n. n.rL rL rt. rL n. ~t n. rt rt rt rL rL rL n. rt rt. rt. rt rL rt. rt. rt. rt rL n. rL n. rt. rt. rtrL n. n.. n. rtn..rt rL ~
0
IDI
3
I\)
)(
f"
I\)
1
BLAST
\~
Wi
NRAD = 31
!--NXDA=3-
\1
-2WS-
-NXDA=3-
\~
,
!--2WS-
!--NXDA=3-
\
r
-2WS-
290236-10
~
2$
!fiiil
F
~
~
~
2$
~
II
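The tavcoh and tsvch derivations above follow one pattern: clock period, minus the processor's worst-case output delay, minus one further term (guardband or decoder delay). A minimal sketch of that arithmetic is shown here; the function and variable names are illustrative and not part of the data sheet.

```python
def eprom_timing_ns(cycle_ns, cpu_max_delay_ns, extra_ns):
    """Time left for the EPROM: clock period minus the processor's
    worst-case output delay minus one extra term (guardband or decoder)."""
    return cycle_ns - cpu_max_delay_ns - extra_ns


# 27960C2-33 values quoted in the text (33 MHz, cycle ~30 ns, tOV2 max 14 ns).
tavcoh_33 = eprom_timing_ns(30, 14, 2)  # 2 ns typical guardband
tsvch_33 = eprom_timing_ns(30, 14, 7)   # 7 ns chip select decoder
print(tavcoh_33, tsvch_33)
```

Only the worst-case (maximum) processor delay enters the calculation, in line with the statement that worst case timings were always assumed.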
[Figure 9, drawings 290236-11 and 290236-12: burn-in bias conditions VPP = +5V, VCC = +5V, GND = 0V, CS = GND, CLK = 1 MHz, R = 1 kΩ, with a binary sequence applied from A0 to A16.]
Figure 9. 27960CX Burn In Biasing Diagram
System Buffering Considerations
For large system applications buffering may be required between the microprocessor and memory devices. The 25 and 16 MHz 27960CX AC timings take this into account. For applications not requiring buffering these devices will provide additional system guardband.

The list below shows the buffers used in generating the 27960CX timings:

          Input Buffer   Output Buffer
25 MHz    8 ns           5 ns
16 MHz    10 ns          7 ns

Note that the 25 MHz buffers are slightly faster in keeping with the increased sensitivity for higher performance. Significantly faster buffers are available for applications requiring them. The example below shows the tchqv timing analysis for a buffered 27960C2-25.

@25 MHz the clock cycle is ~40 ns.
tIH1 of the 80960CA is 5 ns.
Output buffer for 25 MHz = 5 ns
27960C2-25 tCHQV = 40 ns - 5 ns - 5 ns
                 = 30 ns
ABSOLUTE MAXIMUM RATINGS*
Read Operating Temperature ...... 0°C to +70°C(8)
Case Temperature Under Bias .. -10°C to +80°C(8)
Storage Temperature .......... -65°C to +125°C
All Input or Output Voltages with Respect to Ground ...... -0.6V to +6.5V(4)
Voltage on A9 with Respect to Ground ..... -0.6V to +13.0V(4)
VPP Supply Voltage with Respect to Ground ..... -0.6V to +14.0V(4)
VCC Supply Voltage with Respect to Ground ...... -0.6V to +7.0V(4)

NOTICE: This data sheet contains preliminary information on new products in production. The specifications are subject to change without notice. Verify with your local Intel Sales office that you have the latest data sheet before finalizing a design.

*WARNING: Stressing the device beyond the "Absolute Maximum Ratings" may cause permanent damage. These are stress ratings only. Operation beyond the "Operating Conditions" is not recommended and extended exposure beyond the "Operating Conditions" may affect device reliability.
DC CHARACTERISTICS: READ OPERATION
TA = 0°C to +70°C, VCC = 5V ±10%, TTL Inputs

Symbol  Parameter                Notes    Min        Max      Unit  Test Condition
ILI     Input Load Current                           1        µA    VIN = 5.5V
ILO     Output Leakage Current                       10       µA    VOUT = 5.5V
IPP     VPP Load Current Read                        10       µA    VPP = 0 to VCC, PGM = VIH
ISB     VCC Standby, Switching   2                   45       mA    CS = VIH, f = 33 MHz
        VCC Standby, Stable      2                   30       mA    CS = VIH
ICC     VCC Active Current       1, 3, 7             125      mA    CS = VIL, f = 33 MHz, IOUT = 0 mA
VIL     Input Low Voltage        4        -0.5       0.8      V
VIH     Input High Voltage                2.0        VCC + 1  V
VOL     Output Low Voltage                           0.45     V     IOL = 2.1 mA
VOH     Output High Voltage      5        VCC - 0.8           V     IOH = -100 µA
                                 5        2.4                 V     IOH = -400 µA
IOS     Output Short Circuit     6                   100      mA
NOTES:
1. Maximum current is with outputs unloaded.
2. ICC standby current assumes no output loading, i.e., IOH = IOL = 0 mA.
3. ICC is the sum of current through VCC3 + VCC4 and does not include the current through VCC1 and VCC2. (VCC1 and VCC2 supply power to the output drivers. VCC3 and VCC4 supply power to the rest of the device.)
4. Minimum DC input voltage on input and output pins is -0.5V. During transitions, this level may undershoot to -2.0V for periods less than 20 ns.
5. Maximum DC voltage on input and output pins is VCC + 0.5V which may overshoot to VCC + 2.0V for periods less than 20 ns.
6. One output shorted for no more than one second. IOS is sampled but not 100% tested.
7. ICC max measured with a 10.11 µF capacitor between VCC and VSS.
8. This specification defines commercial product operating temperatures.
EXPLANATION OF AC SYMBOLS
The nomenclature used for timing parameters is as per IEEE STD 662-1980 IEEE Standard Terminology for Semiconductor Memory.

Each timing symbol has five characters. The first is always a "t" (for time). The second character represents a signal name (e.g., CLK, ADS, etc.). The third character represents the signal's level (high or low) for the signal indicated by the second character. The fourth character represents a signal name at which a transition occurs marking the end of the time interval being specified. The fifth character represents the signal level indicated for the fourth character. The list below shows character representations.

A: Address                      R: Reset
B: BLAST                        Q: Data
C: Clock                        S: CS
H: Logic High Level             t: Time
L: ADS/Logic Low Level          V: Valid
P: VPP Programming Voltage      Z: Tri-state level
X: No longer a valid "driven" logic level
AC CHARACTERISTICS: READ OPERATION
Versions
27960C2-25
27960Cl-16
33 MHz
2 Wait State
25 MHz
2 Wait State
16MHz
1 Wait State
Min
Min
Unit
Notes
Min
tAvcoH
ClKo
12
10
22
ns
2
tCNHAX
ClK High to
Address Invalid
2
0
0
0
ns
3
tLLCH
ADS low to ClK High
ClKo
8
8
22
ns
4
tCHLH
ClK high to ADS High
5
6
5
tSVCH
CS Valid to
ClK High
1
7
6
tCNHSX
ClK High to
CS Invalid
2
0
7
tCHOV
ClK High to Data Valid
7
8
tCHOX
ClK High to Data Invalid
9
tCHOZ
ClK High to Data High Z
10
tBVCH
BLAST Valid to
ClKHigh
11
tCHBX
ClK High to
BLAST Invalid
1
Parameter
< TA < + 70°C, Vcc = 5V ±10%
27960C2-33
Address Valid to
ClK High
No.
Symbol
O°C
Reset
Data
CS
Time
Valid
Tri-state level
level
Max
22
6
Max
32
6
Max
40
ns
7
14
ns
0
0
ns
,
27
5
6
5
25
5
22
5
40
ns
30
ns
5
30
ns
22
8
8
3
30
32
5
ns
40
ns
NOTES:
1. Valid signal level is meant to be either a logic high or logic low.
2. The subscript N represents the number of wait states for this parameter. CS can be de-asserted (high) after the number of wait states (N) has expired and the EPROM will continue to burst out data for the current cycle.
3. BLAST must be returned high before the next rising clock edge.
4. The sum of tCHQV + tAVCH + NCLK will not equal actual tAVQV if independent test conditions are used to obtain tAVCH and tCHQV (N = number of wait states).
5. ADS must be returned high before the next rising clock edge.
6. Sampled, not 100% tested. The transition is measured ±500 mV from steady state voltage.
7. For capacitive loads above 80 pF, tCHQV can be derated by 1 ns/20 pF.
Figure 10. 27960CX Pipelined 2 Wait State AC Waveforms
AC CONDITIONS OF TEST
Input Rise and Fall Times (10% to 90%) ............ 4 ns
Input Timing Reference Level ................ 1.5V
Input Pulse Levels .................. 0.45V to 2.4V
Output Timing Reference Level .............. 1.5V
Table 2. Mode Table

Mode                       CS    PGM   BLAST    ADS      RESET  A9       VPP      VCC   OUTPUT
Read                       VIL   VIH   VIH(1)   VIH(2)   VIH    X        VCC      VCC   DOUT
Standby(6)                 VIH   X     X        X        VIH    X        VCC(5)   VCC   High Z
Program                    VIL   VIL   VIH      VIH(2)   VIH    X        (3)      (3)   DIN
Program Verify             VIL   VIH   VIH(1)   VIH      VIH    X        (3)      (3)   DOUT
Program Inhibit            VIH   X     X        X        VIH    X        (3)      (3)   High Z
ID Byte 0: Manufacturer    VIL   VIH   VIH(1)   VIH(2)   VIH    VID(3)   VCC      VCC   89H
ID Byte 1: Part (27960)    VIL   VIH   VIH(1)   VIH(2)   VIH    VID(3)   VCC      VCC   E0H
ID Byte 2: CX              VIL   VIH   VIH(1)   VIH(2)   VIH    VID(3)   VCC      VCC   01B
ID Byte 3: 1 Wait State    VIL   VIH   VIH(1)   VIH(2)   VIH    VID(3)   VCC      VCC   01B
           2 Wait States                                                                10B
Reset                      X     X     X        X        VIL    X        VCC      VCC   High Z

NOTES:
1. VIH until data terminated, at which time BLAST must go to VIL.
2. Need to toggle from VIH to VIL to VIH.
3. See DC Programming Characteristics for VCC, VID and VPP voltages.
4. X can be VIL or VIH.
5. VPP = VCC to meet standby current specification. VCC > VPP > VIL will cause a slight increase in standby current.
6. The device must be in the idle state (by asserting RESET or using BLAST) before going into standby.
CAPACITANCE(1) TA = 25°C, f = 1.0 MHz

Symbol  Parameter            Typ  Max  Unit  Condition
CIN     Input Capacitance    4    6    pF    VIN = 0V
COUT    Output Capacitance   12   15   pF    VOUT = 0V
CVPP    VPP Capacitance      40   45   pF    VIN = 0V

NOTE:
1. Sampled. Not 100% tested.
AC INPUT/OUTPUT REFERENCE WAVEFORMS
[Drawing 290236-14: input and output levels referenced to 1.5V at the timing parameter.]

AC TESTING LOAD CIRCUIT
[Drawing 290236-15: device under test driving a resistive load and load capacitance CL.]
CL includes jig capacitance.
For tCHQZ, CL = 5 pF and RL = 405 Ω.

Input and output timings are measured from 1.5V. Timing values are specified assuming maximum input and output rise and fall time = 4 ns.
CLOCK CHARACTERISTICS
Versions
33 MHz
20 MHz
25 MHz
Mal(
Mal(
16MHz
Max
Min
Parameter
Min
elK
Period
30,3
tpR
Rise Time
tpF
Fall Time
1
4
1
4
1
4
tpL
low Time
(tl2) - 2
tl2
(tl2) - 3
tl2
(tl2) - 4
tl2
High Time
(tl2) - 2
tl2
(tl2) - 4
tl2
(tl2) - 4
tPH
1
Min
Min
Symbol
40
4
tl2
62,5
50
4
1
(tl2) - 3
1
Max Rise Time for Programming elK
=
4
100 ns
CLOCK WAVEFORM
i-o----cLK-----i
290236-16
Units
Max
ns
4
ns
1
4
ns
(tl2) - 4
tl2
ns
tl2
ns
1
Program/Program Verify
Initially, and after each erasure, all bits of the EPROM are in the "1's" state. Data is introduced by selectively programming "0's" into the desired bit locations. Although only "0's" can be programmed, both "1's" and "0's" can be present in the data word. Ultraviolet erasure is the only way to change "0's" to "1's".

Programming mode is entered when VPP is raised to 12.75V. Program/Verify operation is synchronous with the clock and can only be initiated following an idle state. Program and Program Verify take place in 3 clock cycles. In the first clock cycle, addresses and data are input and programming occurs. Program Verify follows in the second clock cycle and the third clock cycle terminates synchronous Program/Verify operation, returning the state machine to the idle state with outputs at high impedance.

As in the Read mode, A2-A16 point to a four byte block in the memory array. During programming, the internal address increment circuitry is disabled and the programmer must supply A0 and A1 to point to an individual byte within the four byte block that is to be programmed. Only one byte is programmed in each 3 cycle Program/Verify sequence.

Program Inhibit
The Program Inhibit mode allows parallel programming and verification of multiple devices with different data. With VPP at 12.75V, a Program/Verify sequence is initiated for any device that receives a valid ADS pulse and rising clock edge while CS is asserted. A PGM pulse programs data in the first cycle of the sequence and data for Program Verify is output in the second cycle. The Program/Verify sequence is inhibited on any devices for which CS is not asserted. Data will not be programmed and the outputs will remain in their high impedance state.

RESET MODE
Due to the synchronous nature of the 27960CX, the various operating modes must be initiated from a known idle state. During normal operation, the internal state machine returns to an idle state at the termination of a bus access (after BLAST is asserted). During initial device power up, the state machine is in an indeterminate state. The reset mode is provided to force operation into the idle state. Reset mode is entered when the RESET pin is asserted. Output pins are asynchronously set to the high impedance state and address latches are put into the flow through mode. A reset is successfully completed and the state machine set in an idle state when RESET has been asserted for a minimum of 10 clock cycles and deasserted for five clock cycles.

inteligent Identifier™ Mode
The device's manufacturer, product type, and configuration are stored in a four byte block that can be accessed by using the inteligent Identifier™ mode. The programmer can verify the device identifier and choose the programming algorithm that corresponds to the Intel 27960CX. The inteligent Identifier can also be used to verify that the product is configured with the desired Read mode options for wait states.

inteligent Identifier mode is entered when A9 (pin 32) is raised to its high voltage (VID) level. The internal state machine is then set for inteligent Identifier Read operation. Reading the identifier is similar to a Read operation on a one wait state configured product. Up to four bytes can be read in a single burst access. inteligent Identifier read is terminated by a synchronous BLAST input, returning the state machine to the idle state with outputs at high impedance.

The four byte block code for the inteligent Identifier code is located at address 00H through 03H and is encoded as follows:

MEANING          (A1, A0)   DATA
Intel ID         Byte 00    89h
27960            Byte 01    E0h
CX               Byte 10    01b
1 Wait State     Byte 11    01b
2 Wait States    Byte 11    10b
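To illustrate how a programmer might use the four identifier bytes listed for the 27960CX (89h, E0h, 01b, and 01b or 10b), the following sketch decodes a four-byte identifier read. The function name, the returned dictionary layout, and the choice to compare whole bytes rather than masked bit fields are assumptions made for this example only.

```python
MANUFACTURER_INTEL = 0x89            # identifier byte 0
PART_27960 = 0xE0                    # identifier byte 1
FAMILY_CX = 0b01                     # identifier byte 2
WAIT_STATES = {0b01: 1, 0b10: 2}     # identifier byte 3


def decode_identifier(id_bytes):
    """Decode the four bytes read at (A1, A0) = 00, 01, 10, 11."""
    if len(id_bytes) != 4:
        raise ValueError("expected four identifier bytes")
    mfr, part, family, ws = id_bytes
    return {
        "is_intel": mfr == MANUFACTURER_INTEL,
        "is_27960": part == PART_27960,
        "is_cx": family == FAMILY_CX,
        "wait_states": WAIT_STATES.get(ws),  # None if neither 01b nor 10b
    }


print(decode_identifier([0x89, 0xE0, 0b01, 0b10]))
```

A programmer could refuse to proceed when any of the first three checks is false, or when `wait_states` does not match the configuration the design expects.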
[Figure 11 flowchart, drawing 290236-17.]
Figure 11. Quick-Pulse Programming™ Algorithm
QUICK-PULSE PROGRAMMING™ ALGORITHM
The Quick-Pulse Programming algorithm programs Intel's 27960CX. Developed to substantially reduce programming throughput time, this algorithm allows optimized equipment to program a 27960CX in under 17 seconds. Actual programming time depends on the programmer used.

The Quick-Pulse Programming algorithm uses a 100 µs pulse followed by a byte verification to determine when the addressed byte is correctly programmed. The algorithm terminates if 25 100 µs pulses fail to program a byte. Figure 11 shows the 27960CX Quick-Pulse Programming algorithm flowchart.

The entire program-pulse/byte-verify sequence is performed with VCC = 6.25V and VPP = 12.75V. The program equipment must establish VCC before applying voltages to any other pins. When programming is complete, all bytes should be compared to the original data with VCC = 5.0V and VPP = 12.75V.
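The pulse-then-verify loop described above can be sketched as follows. This is a simplified model, not the flowchart of Figure 11: the `FakeEprom` class is a hypothetical stand-in for real programmer hardware, and its "pulses needed" behaviour is invented so the loop has something to exercise. Only the 25-pulse limit, the 100 µs pulse width, and the rule that programming can only change 1's to 0's come from the text.

```python
MAX_PULSES = 25   # the algorithm terminates after 25 failed pulses
PULSE_US = 100    # width of each program pulse in microseconds


class FakeEprom:
    """Hypothetical device model: erased bytes read 0xFF and a byte
    takes effect only after `pulses_needed` pulses at its address."""

    def __init__(self, size, pulses_needed=1):
        self.mem = [0xFF] * size
        self.pulses_needed = pulses_needed
        self._count = {}

    def pulse(self, addr, data):
        n = self._count.get(addr, 0) + 1
        self._count[addr] = n
        if n >= self.pulses_needed:
            self.mem[addr] &= data  # programming can only clear bits

    def verify(self, addr, data):
        return self.mem[addr] == data


def quick_pulse_program(dev, image):
    """Program `image` byte by byte; return the total pulse count."""
    total = 0
    for addr, data in enumerate(image):
        for _ in range(MAX_PULSES):
            dev.pulse(addr, data)
            total += 1
            if dev.verify(addr, data):
                break
        else:
            raise RuntimeError(f"byte {addr:#x} failed after {MAX_PULSES} pulses")
    return total


dev = FakeEprom(4, pulses_needed=3)
pulses = quick_pulse_program(dev, [0x12, 0x00, 0xFF, 0xA5])
print(pulses, pulses * PULSE_US, "us of pulse time")
```

With one pulse per byte, the 131,072 bytes of a 128K x 8 device need about 13.1 seconds of pulse time (131,072 x 100 µs), which is consistent with the quoted figure of under 17 seconds for optimized equipment.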
D.C. PROGRAMMING CHARACTERISTICS TA = 25°C ±5°C

Symbol  Parameter                          Notes  Min        Max        Unit  Condition
ILI     Input Load Current                                   10         µA    VIN = VIH or VIL
ICC     VCC Program Current                1                 125        mA    CS = VIL
IPP     VPP Program Current                1                 50         mA    CS = VIL
VIL     Input Low Voltage                         -0.5       0.8        V
VIH     Input High Voltage                        2.0        VCC + 0.5  V
VOL     Output Low Voltage (Verify)                          0.40       V     IOL = 2.1 mA
VOH     Output High Voltage (Verify)              VCC - 0.8             V     IOH = -400 µA
VID     A9 inteligent Identifier Voltage          11.5       12.5       V
VCC     Supply Voltage (Program)           2      6.0        6.5        V
VPP     Program Voltage                    2      12.5       13.0       V

NOTES:
1. The maximum current value is with outputs unloaded.
2. VCC must be applied simultaneously or before VPP and removed simultaneously or after VPP.
3. During programming clock levels are VIH and VIL.
A.C. PROGRAMMING, RESET AND ID CHARACTERISTICS
T A = 25°C
± 5°C
No.
Symbol
1
tAVPl
Address Valid to PGM low
2
/Ls
2
tCHAX
ClK High to Address Invalid
50
ns
3
tllCH
ADS low to ClK High
1
50
ns
4
tCHlH
ClK High to ADS High
2
50
ns
5
Parameter
tSVCH
CS Valid to ClK High
6
tCHSX
ClK High to CS Invalid
7
tCHQV
ClK High to DOUT Valid
8
tCHOX
9
Notes
Min
Max
50
Unit
ns
3
ns
100
ns
ClK High to DOUT Invalid
0
ns
tBVCH
BLAST Valid to ClK High
50
ns
10
tCHBX
ClK High to BLAST Invalid
50
ns
11
4
tOVPl
DATA Valid to PGM low
12
tplPH
PGM Program Pulse Width
95
13
tpHQX
PGM High to DIN Invalid
2
/Ls
14
tClPl
ClK low to PGM low
50
ns
15
tQZCH
DIN Tri-State to ClK High
2
/Ls
16
tvcs
VCC Program Voltage to ClK High
7
2
/Ls
17
tvps
VPP Program Voltage to ClK High
7
2
/Ls
,
18
105
/Ls
2
/Ls
2
/Ls
2
/Ls
tAgHCH
Ag VID Voltage to ClK High
19
tCHAgX
ClK High to Ag Not VID Voltage
20
tRVCH
RESET Valid to ClK High
6
50
ns
21
tCHCl
ClK High to ClK low
5
100
ns
22
tClCH
ClK low to ClK High
5
100
ns
NOTES:
1. If CS is low, ADS can go low no sooner than the falling edge of the previous CLK.
2. ADS must return high prior to the next rising edge of clock.
3. CS must remain low until after the rising edge of CLK1.
4. BLAST must return high prior to the next rising edge of CLK.
5. Max CLK rise/fall time is 100 ns.
6. RESET must be low for 10 clock cycles and high for 5 clock cycles.
7. VCC must be applied simultaneously or before VPP and removed simultaneously or after VPP.
Figure 12. 27960CX Programming Waveforms
RESET and inteligent Identifier Waveforms
[Waveform drawing 290236-19: ADDR, DATA (Byte 0), VPP and A9 at VID.]
Figure 13. 27960CX RESET and ID Waveforms
27960KX
BURST ACCESS 1M (128K x 8) CHMOS EPROM
• Synchronous 4-Byte Data Burst Access
• Asynch Microcontroller Reset Function
  - Returns to Known State with High Z Outputs
• CHMOS* III-E for High Performance and Low Power
  - 125 mA Active, 30 mA Standby
  - TTL Compatible Inputs
• 1 Mbit Density Configured as 128K x 8
• Simple Interface to the 80960KA/KB
• High Performance Clock to Data Out
  - Zero Wait State Data-to-Data Burst
  - Supports 16, 20 and 25 MHz 80960KA/KB Devices
Intel's 27960KX is a 5V only, 1,048,576 bit, Erasable Programmable Read Only Memory, organized as 128K words of 8 bits.
The 27960KX provides a simple synchronous burst interface to the 80960KA/KB bus. Internally the 27960KX is organized in 4 byte blocks, in which each byte is accessed sequentially. The internal state machine is factory configured to generate either 1 or 2 wait-states between the address and first data byte. High performance outputs provide zero wait-state data to data accesses at clock frequencies up to 25 MHz.
An asynchronous microcontroller RESET feature puts the outputs in the high impedance state and takes the internal state machine to a known state where a new burst access can begin.
The 27960KX is available in a 44 lead PLCC package, providing optimum cost effectiveness.
The 27960KX is manufactured on Intel's 1 micron CHMOS III-E technology. The Quick-Pulse Programming™ algorithm provides fast, reliable programming with throughput under 17 seconds for optimized equipment.
*CHMOS is a patented process of Intel Corporation.
[Block diagram, drawing 290237-1.]
Figure 1. 27960KX Burst EPROM Block Diagram
September 1991
Order Number: 290237-006
27960KX BURST EPROM
EPROMs are established as the preferred code storage device in embedded applications. The non-volatile, flexible, reliable, cost effective EPROM makes a product easier to design, manufacture and service. Until recently, however, EPROMs could not match the performance needs of high-end systems. The 27960KX was designed to support the 80960KA/KB embedded processor. It utilizes the burst interface to offer near zero-wait state performance without the high cost normally associated with this performance.

In embedded designs, board space and cost must be kept at a minimum without impacting performance and reliability. The 27960KX removes the need for expensive high-speed shadow RAM backed up by slow EPROM or ROM for non-volatile code storage. Code optimization concerns are reduced with "off-chip" code fetches no longer crippling to system performance. FONTs can be run directly out of these EPROMs at the same performance as high-speed DRAMs. With the 27960KX, the EPROM is the ideal code or FONT storage device for your 80960KA/KB system.

Architecture
The 27960KX provides a simple, synchronous burst interface to the 80960KA/KB's bus. Internally, the 27960KX is organized in 4 byte blocks in which each byte is accessed sequentially. A burst access begins on the first clock pulse after CS is asserted. The address of the four byte block is latched by the rising edge of ALE. After a preset number of wait-states (1 or 2), data is output one byte at a time on each subsequent clock cycle. A burst access is terminated on the rising edge of CLOCK if BLAST is asserted. High performance outputs provide zero wait-state data to data accesses at clock frequencies up to 25 MHz. Extra power and ground pins dedicated to the outputs reduce the effects of fast output switching on device performance.

The 27960KX delivers 4 bytes of data in 8 clock cycles at 25 MHz and 4 bytes of data in 7 clock cycles at 20 MHz. In a 32-bit configuration, this translates into a read bandwidth of 50 Mbytes/sec and 45 Mbytes/sec respectively. Performance capability of the 27960KX in different 80960KA/KB systems is given in Table 1.
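The bandwidth figures quoted above can be checked with a short calculation. The sketch assumes that a 32-bit configuration means four byte-wide 27960KX devices read in parallel, so each burst returns 16 bytes; the function name and parameters are illustrative.

```python
def burst_bandwidth_mbytes(clock_mhz, clocks_per_burst,
                           bytes_per_device=4, devices=4):
    """Read bandwidth in Mbytes/sec for back-to-back bursts.

    Each device delivers `bytes_per_device` bytes per burst and
    `devices` byte-wide EPROMs sit side by side on a 32-bit bus.
    """
    return devices * bytes_per_device * clock_mhz / clocks_per_burst


bw_25 = burst_bandwidth_mbytes(25, 8)   # 2 wait states at 25 MHz
bw_20 = burst_bandwidth_mbytes(20, 7)   # 1 wait state at 20 MHz
print(bw_25, int(bw_20))
```

The 20 MHz result is about 45.7 Mbytes/sec, so the quoted 45 Mbytes/sec appears to be a truncated value rather than a rounded one.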
[Figure 2, drawing 290237-2: 27960KX Burst EPROM (128K x 8) with ADDRESS (17 lines), DATA (8 lines), ALE, BLAST, RESET, CLK and PGM signals.]
Figure 2. 27960KX Burst EPROM Signal Set
[Pinout drawing 290237-3: N27960KX, 44-lead PLCC, 0.650" x 0.650", top view.]
Figure 3. 27960KX 44-Lead PLCC Pinout
PIN DESCRIPTIONS
Symbol / Pin / Function

A0-A16 / 23-39 / ADDRESS INPUTS: During a burst operation, A2 through A16 provide the base address pointing to a block of four consecutive bytes. A0 and A1 select the first byte of the burst access. The 27960KX latches valid addresses in the first clock cycle. An internal address generator increments addresses A0 and A1 for subsequent bytes of the burst.

D0-D7 / 18, 17, 14, 13, 11, 10, 7, 6 / DATA INPUTS/OUTPUTS

ALE / 42 / ADDRESS LATCH ENABLE: Indicates the transfer of a physical address. ALE is an active low signal used to latch the addresses from the processor. Addresses are latched on the rising edge of ALE. Valid addresses must be present at or before ALE becomes valid.

CS / 3 / CHIP SELECT: Master device enable. When asserted (active low) data can be written to and read from the device. In read mode, CS enables the state machine and the I/O circuitry.
NOTES:
1. The address decode path is independent of CS, i.e., X and Y decoding is always powered up.
2. For programming, CS should remain low for the entire cycle. Program and verify functions are done one byte at a time.
3. CS going high does not terminate a concurrent burst cycle.
4. CS must be deasserted between bursts.

BLAST / 1 / BURST LAST: Terminates a concurrent burst data cycle at the rising edge of the CLK. It must be asserted by the fourth data byte.

RESET / 22 / RESET: Resets the state machine into a known state and tri-states the outputs. The duration of RESET should be 10 CLK cycles minimum. At least 5 clock cycles are required after deassertion of RESET before beginning the next cycle. Reset will abort a concurrent bus cycle.

PGM / 43 / PROGRAM-PULSE CONTROL INPUT

VPP / 2 / PROGRAMMING POWER SUPPLY VPP

VSS / 5, 8, 12, 15, 19, 21 / GROUND

VCC / 9, 16, 20, 44 / SUPPLY VOLTAGE INPUT
Table 1. Performance Capability

Configuration                    Burst                      Bandwidth
25/20 MHz 2 WS NON-BUFFERED      4 WORDS/8 CLOCK CYCLES     50/40 MBYTES/SEC
20 MHz 1 WS NON-BUFFERED         4 WORDS/7 CLOCK CYCLES     45 MBYTES/SEC
16 MHz 1 WS BUFFERED             4 WORDS/7 CLOCK CYCLES     36 MBYTES/SEC

[Per-cycle ADDR, DATA and CLK rows (address A00/A01, wait states WS, data D00-D03 and D10-D13, RS, clock cycles C1-C8) are not recoverable from the source scan.]
INTERFACE EXAMPLE
Overview
The following design offers a simple interface to the 80960KA/KB's bus.

A non-buffered 27960KX burst EPROM system is shown in Figure 4. Since the 27960KX is capable of driving a 120 pF load, large, non-buffered systems can be implemented by stacking up to 2 banks of 4 EPROMs, giving a memory size of 256K x 32. The input capacitive load seen on the address lines (due to the EPROM only) is 24 pF for a 128K x 32 system (shown) and 48 pF for a 256K x 32 system. The EPROM is specified at 4 pF for input capacitance and 12 pF typical for output capacitance. Larger systems can be implemented with buffers.

Chip Select Logic
High order address lines are decoded to provide CS. Qualification with other signals is not required. The chip select logic can be implemented with standard asynchronous decoders, PALs or PLDs (like Intel's 85C960).
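The 24 pF and 48 pF address-line loads quoted above can be reproduced with a simple product. Note the assumption: the numbers only work out if each EPROM input is counted at 6 pF (the maximum CIN listed for the companion 27960CX) rather than the 4 pF typical value mentioned in the text. That reading is inferred from the arithmetic and is not stated in the data sheet; the function name is illustrative.

```python
def address_line_load_pf(banks, devices_per_bank=4, cin_pf=6):
    """Capacitive load on one address line from the EPROM inputs alone.

    cin_pf defaults to 6 pF, an assumed worst-case input capacitance
    per device; every device in every bank sees every address line.
    """
    return banks * devices_per_bank * cin_pf


load_128k = address_line_load_pf(banks=1)   # 128K x 32 system
load_256k = address_line_load_pf(banks=2)   # 256K x 32 system
print(load_128k, load_256k)
```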
[Figure 4, drawing 290237-6: 80960KX with the LAD(31:0) bus, CLK2, CLK (25 MHz) and RESET connected to the burst EPROMs; address latches feed non-burst mode memory only (see note).]
NOTE:
27960KX does not require address latches
Figure 4. 128K x 32 Burst EPROM System
Waveforms
Figure 5 shows the timing waveforms of 27960KX reads in a 32-bit system.

CS setup time
CS setup time is the time between CS asserted and the first rising edge of CLK (during the address cycle). Since a memory access begins on the first CLK rising edge after CS asserted, a minimum CS setup time of 5 ns (tSVCH) at 25 MHz is required. With the 80960KA/KB's maximum valid address delay of 18 ns at 25 MHz, 13 ns remains for CS decoding logic.

CS Deassert between bursts
After every EPROM read (one to four words) CS must be deasserted.

Reset and RESET
The 27960KX uses an active-low RESET. The 80960KA/KB RESET signal must be inverted for the 27960KX.

Clock Phase
The initial rising edge of CLK and CLK2 must be in phase with as small a skew as possible.
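The 13 ns decoder allowance mentioned under CS setup time can be reproduced as shown below. The calculation assumes that the figure also subtracts the 4 ns clock skew guardband that this data sheet uses in its AC timing derivations (40 - 18 - 5 alone would leave 17 ns); that reading is an inference, and the function name is illustrative.

```python
def cs_decode_budget_ns(cycle_ns, addr_delay_max_ns, cs_setup_ns, skew_ns):
    """Time left for chip select decoding within one clock period."""
    return cycle_ns - addr_delay_max_ns - cs_setup_ns - skew_ns


# 25 MHz: 40 ns cycle, 18 ns max valid address delay, 5 ns tSVCH,
# plus an assumed 4 ns clock skew guardband.
budget_25 = cs_decode_budget_ns(40, 18, 5, 4)
print(budget_25)
```

A decoder slower than this budget would need either a slower clock or an extra wait state in the surrounding design.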
[Figure 5 waveform, drawing 290237-9: CLK with address (A), wait-state (WS) and RC cycles for two burst reads.]
NOTES:
1. 1-0-0-0 Burst Read: the 1 indicates the number of wait states to access the first word; the 0's indicate the number of wait states for subsequent data words (0 in this case).
2. 27960KX latches addresses on the rising edge of ALE; it has an internal address generator which increments addresses for subsequent words of the burst.
Figure 5. Two Cycles of a 27960KX 1 Wait State, 4-Byte Read (1-0-0-0 Burst Read) in a 32-Bit System
27960KX DEVICE NAMES
The device names on the 27960KX were derived as mnemonics that correspond to the number of wait states and expected operating frequency for the device. For example, the 25 MHz, 2 wait state 27960KX is named 27960K2-25.

AC TIMING DERIVATIONS
The AC timings for the 27960KX were generated specifically to meet the requirements of the 80960KA/KB microprocessor. In each case the applicable 80960KA/KB clock frequency and AC timing were taken together with an address buffer delay (if needed) and a 4 ns positive clock skew or a 2 ns negative clock skew guardband (see Figure 6A) to generate the 27960KX AC timing. Worst case timings were always assumed. The example below shows how the 27960K1-20 tavcoh timing was derived.

@20 MHz the clock cycle is ~50 ns.
t6 of the 80960KA/KB is 2-20 ns.
4 ns clock skew guardband.
27960K1-20 tavcoh = 50 ns - 20 ns - 4 ns
                  = 26 ns

On timings such as this, where the EPROM is faster than the microprocessor, we specified the EPROM's timing leaving the excess time as system guardband.
[Figure 6A, drawing 290237-11: CLK2 (to 80960) and CLK edges showing the allowed skew window.]
NOTE:
The 27960KX allows a positive clock skew (CLK2 leading CLK) of up to 4 ns and a negative clock skew (CLK2 lagging CLK) of up to 2 ns. The larger positive clock skew takes into account longer trace lengths and heavier loading on the 1x clock trace.
Figure 6A. Definition of Positive and Negative Clock Skew

[Figure 6B, drawing 290237-12: a clock source feeds a 16L8-7 combinatorial PAL; its CLK2 and CLK outputs pass through a 74F244 driver to the 80960KB and to four 27960KX devices.]
NOTE:
CLK and CLK2 are generated by the same PAL. This minimizes skew between CLK and CLK2. Both PAL outputs are fed to a 74F244 driver. The EPROMs should be as close to the clock driver as possible.
Figure 6B. Example Clock Circuit with Minimum Skew

[Figure 6C, drawing 290237-20: a 100 MHz oscillator clocks a 74ACT163 counter; 74AS1804 NAND drivers produce CLK2 (50 MHz) and CLK (25 MHz).]
NOTE:
This clock generation circuit uses a 100 MHz oscillator. The EPROMs should be as close to the NAND drivers as possible.
Figure 6C. Example Clock Circuit Using a 100 MHz Oscillator
Decoders are needed for the system's address (chip select) decoding. For the 27960KX's timings we assumed a 5-10 ns chip select decoder for 16 MHz and 20 MHz frequencies and a 5-9 ns decoder for 25 MHz systems. The example below shows how the 27960K2-25 tsvch timing was derived.

@25 MHz the clock cycle is ~40 ns.
t6 of the 80960KA/KB is 2-18 ns.
Decoder = 9 ns
4 ns clock skew guardband
27960K2-25 tsvch = 40 ns - 18 ns - 9 ns - 4 ns
                 = 9 ns

SYSTEM BUFFERING CONSIDERATIONS
For many large system applications buffering may be required between the microprocessor and memory devices. The 20 MHz - 2 WS and 16 MHz 27960KX AC timings take this into account. For applications at these frequencies not requiring buffering these devices will provide an additional 5-10 ns of system guardband.

The list below shows the buffers used in generating these timings:

          Input Buffer   Output Buffer
20 MHz    9 ns           5 ns
16 MHz    10 ns          7 ns

The 20 MHz buffers are slightly faster in keeping with the increased sensitivity for higher performance. We chose the above buffers because of their wide availability. Significantly faster buffers are available for applications requiring them. The example below shows tchqv for the 27960K2-20.

@20 MHz the clock cycle is ~50 ns.
t10 of the 80960KA/KB is 3 ns.
Output buffer for 20 MHz = 5 ns.
4 ns clock skew guardband
27960K2-20 tchqv = 50 ns - 5 ns - 3 ns - 4 ns
                 = 38 ns
ABSOLUTE MAXIMUM RATINGS*
Read Operating Temperature ...... 0°C to +70°C(8)
Case Temperature under Bias .. -10°C to +80°C(8)
Storage Temperature .......... -65°C to +125°C
All Input or Output Voltages with Respect to Ground ..... -0.6V to +6.5V(4)
Voltage on A9 with Respect to Ground ................ -0.6V to +13.0V(4)
VPP Supply Voltage with Respect to Ground ........... -0.6V to +14.0V(4)
VCC Supply Voltage with Respect to Ground ........... -0.6V to +7.0V(4)

NOTICE: This data sheet contains preliminary information on new products in production. The specifications are subject to change without notice. Verify with your local Intel Sales office that you have the latest data sheet before finalizing a design.

*WARNING: Stressing the device beyond the "Absolute Maximum Ratings" may cause permanent damage. These are stress ratings only. Operation beyond the "Operating Conditions" is not recommended and extended exposure beyond the "Operating Conditions" may affect device reliability.
DC CHARACTERISTICS: READ OPERATION
TA = 0°C to +70°C, VCC = 5V ±10%, TTL Inputs

Symbol  Parameter                Notes    Min        Max      Unit  Test Condition
ILI     Input Load Current                           1        µA    VIN = 5.5V
ILO     Output Leakage Current                       10       µA    VOUT = 5.5V
IPP     VPP Load Current Read                        10       µA    VPP = 0 to VCC, PGM = VIH
ISB     VCC Standby, Switching   2                   45       mA    CS = VIH, f = 25 MHz
        VCC Standby, Stable      2                   30       mA    CS = VIH
ICC     VCC Active Current       1, 3, 7             125      mA    CS = VIL, f = 25 MHz, IOUT = 0 mA
VIL     Input Low Voltage        4        -0.5       0.8      V
VIH     Input High Voltage                2.0        VCC + 1  V
VOL     Output Low Voltage                           0.45     V     IOL = 2.1 mA
VOH     Output High Voltage      5        VCC - 0.8           V     IOH = -100 µA
                                 5        2.4                 V     IOH = -400 µA
IOS     Output Short Circuit     6                   100      mA
NOTES:
1. Maximum current is with outputs unloaded.
2. ICC standby current assumes no output loading, i.e., IOH = IOL = 0 mA.
3. ICC is the sum of current through VCC3 + VCC4 and does not include the current through VCC1 and VCC2. (VCC1 and VCC2 supply power to the output drivers. VCC3 and VCC4 supply power to the rest of the device.)
4. Minimum DC voltage on input and output pins is -0.5V. During transitions, this level may undershoot to -2.0V for periods less than 20 ns.
5. Maximum DC voltage on input and output pins is VCC + 0.5V which may overshoot to VCC + 2.0V for periods less than 20 ns.
6. One output shorted for no more than one second. IOS is sampled but not 100% tested.
7. ICC max measured with a 10.11 µF capacitor between VCC and VSS.
8. This specification defines commercial product operating temperatures.
EXPLANATION OF AC SYMBOLS
The nomenclature used for timing parameters is as
per IEEE STD 662-1980, IEEE Standard Terminology
for Semiconductor Memory.

Each timing symbol has five characters. The first is
always a "t" (for time). The second character represents
a signal name (e.g., CLK, ALE). The third
character represents the signal's level (high or low)
for the signal indicated by the second character. The
fourth character represents a signal name at which a
transition occurs. The fifth character represents the
signal level indicated for the fourth character. The
list below shows character representations.
A:
B:
C:
H:
L:
P:
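The five-character convention can be illustrated with a small decoder. This is a sketch only: the letter meanings below are inferred from the parameter names in the AC tables of this data sheet (for example tCHQV, "CLK High to DOUT Valid"), not copied from the list above, and the function name is hypothetical.

```python
# Letter meanings are assumptions inferred from the AC parameter names
# (tAVPL, tCHAX, tLLCH, tSVCH, tCHQV, tBVCH, tQZCH), not an official list.
SIGNALS = {
    "A": "Address", "B": "BLAST", "C": "CLK", "L": "ALE",
    "P": "PGM", "Q": "Data", "S": "CS", "R": "RESET",
}
LEVELS = {
    "H": "High", "L": "Low", "V": "Valid", "X": "Invalid", "Z": "High-Z",
}


def decode_timing_symbol(symbol: str) -> str:
    """Expand a five-character timing symbol such as 'tCHQV'."""
    if len(symbol) != 5 or symbol[0].lower() != "t":
        raise ValueError("expected 't' followed by four characters")
    # Characters 2 and 4 name signals; characters 3 and 5 name their levels.
    _, sig1, lvl1, sig2, lvl2 = symbol.upper()
    return f"{SIGNALS[sig1]} {LEVELS[lvl1]} to {SIGNALS[sig2]} {LEVELS[lvl2]}"


print(decode_timing_symbol("tCHQV"))
print(decode_timing_symbol("tLLCH"))
```

Note that "L" is read as the ALE signal in a signal position and as "Low" in a level position, which is why the two lookups are kept in separate tables.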
AC CONDITIONS OF TEST
Input Rise and Fall Times
(10% to 90%) ............................ 4 ns
Input Pulse Levels .................. 0.45V to 2.4V
Input Timing Reference Level ................ 1.5V
Output Timing Reference Level ...... 0.8V and 2.0V
Table 2. Mode Table

MODE                      CS   PGM  BLAST   ALE     RESET  A9      Vpp     Vcc  OUTPUT
Read                      VIL  VIH  VIH(1)  VIH(2)  VIH    X(4)    Vcc     Vcc  DOUT
Standby(6)                VIH  X    X       X       VIH    X       Vcc(5)  Vcc  High Z
Program                   VIL  VIL  VIH     VIH(2)  VIH    X       (3)     (3)  DIN
Program Verify            VIL  VIH  VIH(1)  VIH     VIH    X       (3)     (3)  DOUT
Program Inhibit           VIH  X    X       X       VIH    X       (3)     (3)  High Z
ID Byte 0: Manufacturer   VIL  VIH  VIH(1)  VIH(2)  VIH    VID(3)  Vcc     Vcc  89H
ID Byte 1: Part (27960)   VIL  VIH  VIH(1)  VIH(2)  VIH    VID(3)  Vcc     Vcc  E0H
ID Byte 2: KX             VIL  VIH  VIH(1)  VIH(2)  VIH    VID(3)  Vcc     Vcc  00B
ID Byte 3: 1 Wait-State   VIL  VIH  VIH(1)  VIH(2)  VIH    VID(3)  Vcc     Vcc  01B
           2 Wait-States                                                        10B
Reset                     X    X    X       X       VIL    X       Vcc     Vcc  High Z

NOTES:
1. VIH until data terminated, at which time BLAST must go to VIL.
2. Need to toggle from VIH to VIL to VIH to latch address.
3. See DC Programming Characteristics for Vcc, VID and Vpp voltages.
4. X can be VIL or VIH.
5. Vpp = Vcc to meet standby current specification. Vcc > Vpp > VIL will cause a slight increase in standby current.
6. The device must be in the idle state (by asserting RESET or using BLAST) before going into standby.
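The identifier bytes listed in the Mode Table lend themselves to a small decoding routine, for instance in programmer software. The sketch below assumes the four bytes have already been read in a single burst; the function name and the choice to examine only the two low-order bits of bytes 2 and 3 are illustrative assumptions, since the table gives those values only as two-bit binary codes.

```python
def decode_identifier(block: bytes) -> dict:
    """Interpret the four identifier bytes (addresses 00H through 03H)."""
    if len(block) != 4:
        raise ValueError("identifier block is four bytes long")
    # Assumption: only bits 1:0 of bytes 2 and 3 carry the listed codes.
    wait_code = block[3] & 0b11
    return {
        "manufacturer_is_intel": block[0] == 0x89,  # ID Byte 0: 89H
        "part_is_27960": block[1] == 0xE0,          # ID Byte 1: E0H
        "is_kx": (block[2] & 0b11) == 0b00,         # ID Byte 2: 00B
        # ID Byte 3: 01B = 1 wait state, 10B = 2 wait states
        "wait_states": {0b01: 1, 0b10: 2}.get(wait_code),
    }


info = decode_identifier(bytes([0x89, 0xE0, 0x00, 0x02]))
print(info)
```

A value of None for "wait_states" signals a code the table does not list, which a programmer would normally treat as a mismatch.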
CAPACITANCE(1) TA = 25°C, f = 1.0 MHz

Symbol  Parameter           Typ  Max  Unit  Condition
CIN     Input Capacitance   4    6    pF    VIN = 0V
COUT    Output Capacitance  12   15   pF    VOUT = 0V
CVPP    VPP Capacitance     40   45   pF    VIN = 0V

NOTE:
1. Sampled, not 100% tested.
AC INPUT/OUTPUT REFERENCE WAVEFORMS
[Figure 290237-14: input/output reference waveform]
AC test inputs are driven at 2.4V (VOH) for a logic '1'
and 0.45V (VOL) for a logic '0'.
Input timing begins at 1.5V.
Output timing ends at VIH (2.0V) and VIL (0.8V).
Input rise and fall times (10% to 90%) < 4.0 ns.

AC TESTING LOAD CIRCUIT
[Figure 290237-15: load circuit, 2.1V, 780Ω, CL = 120 pF]
For tCHQZ, CL = 5 pF and RL = 405Ω.
CL includes jig capacitance.
CLOCK CHARACTERISTICS

                       25 MHz      20 MHz      16 MHz
Symbol  Parameter      Min   Max   Min   Max   Min   Max   Units
        CLK Period     40          50          62.5        ns
T5      Rise Time            10          10          10    ns
T4      Fall Time            10          10          10    ns
T2      Low Time       7           8           11          ns
T3      High Time      7           8           11          ns

Max CLK Rise Time during Programming is 100 ns.
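The period entries follow directly from the clock frequency (period in ns = 1000 / frequency in MHz). As a rough illustration, the check below compares a measured clock against the limits above. It assumes that the period, low time and high time figures are minimums and that the rise and fall figures are maximums; the function and dictionary names are hypothetical.

```python
# Limits in ns, keyed by device version in MHz.
# Assumption: period/high/low are minimums, rise/fall are maximums.
CLOCK_LIMITS_NS = {
    25: {"period": 40.0, "high": 7.0, "low": 7.0},
    20: {"period": 50.0, "high": 8.0, "low": 8.0},
    16: {"period": 62.5, "high": 11.0, "low": 11.0},
}
MAX_EDGE_NS = 10.0


def nominal_period_ns(freq_mhz: float) -> float:
    """Clock period in nanoseconds for a frequency given in MHz."""
    return 1000.0 / freq_mhz


def clock_ok(version_mhz, period_ns, high_ns, low_ns, rise_ns, fall_ns):
    """Return True when the measured clock respects every listed limit."""
    lim = CLOCK_LIMITS_NS[version_mhz]
    return (period_ns >= lim["period"]
            and high_ns >= lim["high"]
            and low_ns >= lim["low"]
            and rise_ns <= MAX_EDGE_NS
            and fall_ns <= MAX_EDGE_NS)


print(nominal_period_ns(16))
print(clock_ok(25, 40.0, 15.0, 15.0, 5.0, 5.0))
```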
CLOCK WAVEFORM
290237-16
RESET MODE
Due to the synchronous nature of the 27960KX, the
various operating modes must be initiated from a
known idle state. During normal operation, the internal
state machine returns to an idle state at the termination
of a bus access (after BLAST is asserted).
During initial device power-up, the state machine is
in an indeterminate state. The reset mode is provided
to force operation into the idle state. Reset mode
is entered when the RESET pin is asserted. Output
pins are asynchronously set to the high impedance
state and address latches are put into the flow-through
mode. A reset is successfully completed,
and the state machine set in an idle state, in the
cycle after RESET has been asserted for a minimum
of 10 clock cycles and deasserted for five clock cycles.

inteligent Identifier™ Mode
The device's manufacturer, product type, and configuration
are stored in a four byte block that can be
accessed by using the inteligent Identifier™ mode.
The programmer can verify the device identifier and
choose the programming algorithm that corresponds
to the Intel 27960KX. The inteligent Identifier can
also be used to verify that the product is configured
with the desired Read mode options for wait states.

inteligent Identifier mode is entered when A9 (pin
32) is raised to its high voltage (VH) level. The internal
state machine is then set for inteligent Identifier
Read operation. Reading the Identifier is similar to a
Read operation on a one wait state configured product.
Up to four bytes can be read in a single burst
access. inteligent Identifier read is terminated by a
synchronous BLAST input, returning the state machine
to the idle state with outputs at high impedance.

The four byte block code for the inteligent Identifier
code is located at address 00H through 03H and is
encoded as follows:

MEANING         (A1, A0)   DATA
Intel ID        Byte 00    89h
27960           Byte 01    E0h
KX              Byte 10    00b
1 wait state    Byte 11    01b
2 wait states   Byte 11    10b

Program/Program Verify
Initially, and after each erasure, all bits of the
EPROM are in the "1's" state. Data is introduced by
selectively programming "0's" into the desired bit
locations. Although only "0's" can be programmed,
both "1's" and "0's" can be present in the data
word. Ultraviolet erasure is the only way to change
"0's" to "1's".

Program mode is entered when Vpp is raised to
12.75V. Program/Verify operation is synchronous
with the clock and can only be initiated following an
idle state. Program and Program Verify take place in
3 clock cycles. In the first clock cycle, addresses
and data are input and programming occurs. Program
Verify follows in the second clock cycle, and
the third clock cycle terminates synchronous Program/Verify
operation, returning the state machine
to the idle state with outputs at high impedance.

As in the Read mode, A2-A16 point to a four byte
block in the memory array. During Programming the
internal address increment circuitry is disabled and
the programmer must supply A0 and A1 to point to
an individual byte within the four byte block that is to
be programmed. Only one byte is programmed in
each 3 cycle Program/Verify sequence.

Program Inhibit
Program Inhibit mode allows parallel programming
and verification of multiple devices with different
data. With Vpp at 12.75V, a Program/Verify sequence
is initiated for any device that receives a valid
ALE pulse and rising clock edge while CS is asserted.
A PGM pulse programs data in the first cycle
of the sequence and data for Program Verify is output
in the second cycle. The Program/Verify sequence
is inhibited on any devices for which CS is
not asserted during the first (ALE) cycle. Data will
not be programmed and the outputs will remain in
their high impedance state.
[Flowchart figure 290237-17]
Figure 8. Quick-Pulse Programming™ Algorithm
QUICK-PULSE PROGRAMMING
ALGORITHM
The Quick-Pulse Programming algorithm programs
Intel's 27960KX. Developed to substantially reduce
programming throughput time, this algorithm allows
optimized equipment to program a 27960KX in under
17 seconds. Actual programming time depends
on the programmer used.

The Quick-Pulse Programming algorithm uses a
100 µs pulse followed by a byte verification to determine
when the addressed byte is correctly programmed.
The algorithm terminates if 25 100 µs
pulses fail to program a byte. Figure 8 shows the
27960KX Quick-Pulse Programming algorithm flowchart.

The entire program-pulse, byte-verify sequence is
performed with Vcc = 6.25V and Vpp = 12.75V.
The programming equipment must establish Vcc before
applying voltages to any other pins. When programming
is complete, all bytes should be compared
to the original data with Vcc = 5.0V and Vpp =
12.75V.
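The pulse-then-verify loop of the Quick-Pulse Programming algorithm can be sketched as follows. This is not programmer firmware: `SimulatedByte` is a hypothetical stand-in for one EPROM byte location, and its "programs after N pulses" behavior is an assumption made only so the loop has something to run against. The 25-pulse limit and the 100 µs pulse width come from the text.

```python
MAX_PULSES = 25   # the algorithm gives up after 25 pulses
PULSE_US = 100    # nominal pulse width, shown for reference only


class SimulatedByte:
    """Hypothetical model of one EPROM byte that programs after N pulses."""

    def __init__(self, pulses_needed: int):
        self.value = 0xFF            # erased state: all bits are 1
        self.pulses_needed = pulses_needed
        self.pulses_seen = 0

    def pulse(self, data: int) -> None:
        self.pulses_seen += 1
        if self.pulses_seen >= self.pulses_needed:
            self.value &= data       # programming can only clear bits to 0

    def read(self) -> int:
        return self.value


def quick_pulse_program(device, data: int) -> int:
    """Pulse and verify until the byte reads back correctly.

    Returns the number of pulses used; raises RuntimeError on failure.
    """
    for count in range(1, MAX_PULSES + 1):
        device.pulse(data)           # one 100 us program pulse
        if device.read() == data:    # byte verify
            return count
    raise RuntimeError("byte failed to program after 25 pulses")


print(quick_pulse_program(SimulatedByte(pulses_needed=3), 0x5A))
```

A real programmer would additionally hold Vcc at 6.25V and Vpp at 12.75V during this loop and perform the final compare at Vcc = 5.0V, as described above.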
D.C. PROGRAMMING CHARACTERISTICS TA = 25°C ±5°C

Symbol  Parameter                         Notes  Min        Max        Unit  Test Condition
ILI     Input Load Current                                  10         µA    VIN = VIH or VIL
ICC     Vcc Program Current               1                 125        mA    CS = VIL
IPP     Vpp Program Current               1                 50         mA    CS = VIL
VIL     Input Low Voltage                        -0.5       0.8        V
VIH     Input High Voltage                       2.0        Vcc + 0.5  V
VOL     Output Low Voltage (Verify)                         0.40       V     IOL = 2.1 mA
VOH     Output High Voltage (Verify)             Vcc - 0.8             V     IOH = -400 µA
VID     A9 inteligent Identifier Voltage         11.5       12.5       V
VCC     Supply Voltage (Program)          2      6.0        6.5        V
VPP     Program Voltage                   2      12.5       13.0       V

NOTES:
1. The maximum current value is with outputs unloaded.
2. Vcc must be applied simultaneously or before Vpp and removed simultaneously or after Vpp.
3. During programming, clock levels are VIH and VIL.
AC PROGRAMMING, RESET AND ID CHARACTERISTICS
No
Symbol
Parameter
Notes
TA = 25°C
Min
Units
tAVPl
Address Valid to PGM low
2
/Ls
tCHAX
ClK High to Address Invalid
50
ns
3
tlLCH
ALE low to ClK High
1
50
ns
4
2
50
ns
50
ns
1
2
tCHLH
ClK High to ALE High
5
tSVCH
CS Valid to ClK High
6
tCHSX
ClK High to CS Invalid
7
tCHQV
ClKHigh to DOUT·Valid
8
tCHQX
ClK High to DOUT Invalid
0
ns
9
tBVCH
BLAST Valid to ClK High
50
ns
10
tCHBX
ClK High to BLAST Invalid
50
ns
11
tQVPL
DATA Valid to PGM low
2
/Ls
12
tpLPH
PGM Program Pulse Width
95
13
tpHQX
PGM High to DIN Invalid
14
tCLPL
ClK low to PGM low
15
tQZCH
DIN in Tri-State to ClK High
16
tvcs
tvps
VCC Program Voltage to ClK High
Vpp Program Voltage to ClK High
18
tAgHCH
Ag VID Voltage to ClK High
2
/Ls
19
tCHA9X
ClK High to A9 not VID Voltage
2
/Ls
20
tRVCH
RESET Valid to ClK High
6
50
ns
21
tCHCL
ClK High to ClK low
5
100
ns
tCLCH
ClK low to ClK High
5
100
ns
17
22
3
ns
100
4
2
/Ls
/Ls
ns
2
/Ls
7
2
/Ls
7
2
/Ls
105
ns
.50
NOTES:
1. If CS is low, ALE can go low no sooner than the falling edge of the previous CLK.
2. ALE must return high prior to the next rising edge of clock.
3. CS must remain low until after the rising edge of CLK1.
4. BLAST must return high prior to the next rising edge of CLK.
5. Max CLK rise/fall time is 100 ns.
6. RESET must be held low for 10 cycles and high for 5 cycles before performing a read.
7. Vcc must be applied simultaneously or before Vpp and removed simultaneously or after Vpp.
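The reset requirement in Note 6 (RESET held low for 10 cycles, then high for 5 cycles) can be expressed as a simple check over per-clock samples of the RESET pin. The sketch below is illustrative only: it assumes one sample per rising clock edge, with True meaning RESET is asserted, and the function name is hypothetical.

```python
RESET_ASSERT_MIN = 10    # clock cycles RESET must stay asserted
RESET_RELEASE_MIN = 5    # clock cycles RESET must then stay deasserted


def reset_complete(samples) -> bool:
    """Return True if the samples contain a valid reset sequence.

    samples: iterable of booleans, one per clock, True = RESET asserted.
    """
    asserted = 0     # length of the current asserted run
    released = 0     # deasserted clocks counted after a long enough run
    armed = False    # True once the asserted run reached the minimum
    for is_asserted in samples:
        if is_asserted:
            asserted += 1
            released = 0
            armed = asserted >= RESET_ASSERT_MIN
        else:
            asserted = 0
            if armed:
                released += 1
                if released >= RESET_RELEASE_MIN:
                    return True
    return False


print(reset_complete([True] * 10 + [False] * 5))
print(reset_complete([True] * 9 + [False] * 5))
```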
± 5°C
Max
""".
[Waveform figure 290237-18: ADDR, DATA, PGM, BLAST, CS, Vpp (12.75V) and Vcc (6.25V) during programming]
[Waveform figure 290237-19: ADDR, DATA (I.D. Byte 0), Vpp and A9 (VID, VIH)]
Figure 10. 27960KX RESET and ID Waveforms
82596CA
HIGH-PERFORMANCE 32-BIT LOCAL
AREA NETWORK COPROCESSOR

■ Performs Complete CSMA/CD Medium Access Control (MAC) Functions Independently of CPU
  - IEEE 802.3 (EOC) Frame Delimiting
  - HDLC Frame Delimiting
■ Supports Industry Standard LANs
  - IEEE TYPE 10BASE-T, IEEE TYPE 10BASE5 (Ethernet*), IEEE TYPE 10BASE2 (Cheapernet), IEEE TYPE 1BASE5 (StarLAN), and the Proposed Standard 10BASE-F
  - Proprietary CSMA/CD Networks Up to 20 Mb/s
■ On-Chip Memory Management
  - Automatic Buffer Chaining
  - Buffer Reclamation after Receipt of Bad Frames; Optional Save Bad Frames
  - 32-Bit Segmented or Linear (Flat) Memory Addressing Formats
■ 82586 Software Compatible
■ Optimized CPU Interface
  - Optimized Bus Interface to Intel's i486™ DX, i486™ SX and 80960CA Processors
  - Supports Big Endian and Little Endian Byte Ordering
■ 32-Bit Bus Master Interface
  - 106 MB/s Bus Bandwidth
  - Burst Bus Transfers
  - Bus Throttle Timers
  - Transfers Data at 100% of Serial Bandwidth
  - 128-Byte Receive FIFO, 64-Byte Transmit FIFO
■ Network Management and Diagnostics
  - Monitor Mode
  - 32-Bit Statistical Counters
■ Self-Test Diagnostics
■ Configurable Initialization Root for Data Structures
■ High-Speed, 5V, CHMOS** IV Technology
■ 132-Pin Plastic Quad Flat Pack (PQFP) and PGA Package
  (See Packaging Spec Order No. 240800-001, Package Type KU and A)

i486 is a trademark of Intel Corporation.
*Ethernet is a registered trademark of Xerox Corporation.
**CHMOS is a patented process of Intel Corporation.

[Block diagram 290218-1: serial subsystem (transmit and receive bit and byte machines, carrier sense, collision detect, exponential backoff logic, timer unit), FIFO subsystem, and parallel subsystem (micro machine, DMA control, data interface unit (DIU), bus interface unit (BIU))]
Figure 1. 82596CA Block Diagram

November 1991
Order Number: 290218-004
82596CA High-Performance 32-Bit
Local Area Network Coprocessor

CONTENTS
INTRODUCTION
PIN DESCRIPTIONS
82596 AND HOST CPU INTERACTION
82596 BUS INTERFACE
82596 MEMORY ADDRESSING
82596 SYSTEM MEMORY STRUCTURE
TRANSMIT AND RECEIVE MEMORY STRUCTURES
TRANSMITTING FRAMES
RECEIVING FRAMES
82596 NETWORK MANAGEMENT AND DIAGNOSTICS
NETWORK PLANNING AND MAINTENANCE
STATION DIAGNOSTICS AND SELF-TEST
82586 SOFTWARE COMPATIBILITY
INITIALIZING THE 82596
SYSTEM CONFIGURATION POINTER (SCP)
  Writing the Sysbus
INTERMEDIATE SYSTEM CONFIGURATION POINTER (ISCP)
INITIALIZATION PROCESS
CONTROLLING THE 82596CA
82596 CPU ACCESS INTERFACE (PORT)
MEMORY ADDRESSING FORMATS
LITTLE ENDIAN AND BIG ENDIAN BYTE ORDERING
COMMAND UNIT (CU)
RECEIVE UNIT (RU)
SYSTEM CONTROL BLOCK (SCB)
SCB OFFSET ADDRESSES
  CBL Offset (Address)
  RFA Offset (Address)
SCB STATISTICAL COUNTERS
  Statistical Counter Operation
ACTION COMMANDS AND OPERATING MODES
  NOP
  Individual Address Setup
  Configure
  Multicast-Setup
  Transmit
  Jamming Rules
  TDR
  Dump
  Diagnose
RECEIVE FRAME DESCRIPTOR
  Simplified Memory Structure
  Flexible Memory Structure
  Receive Buffer Descriptor (RBD)
PGA PACKAGE THERMAL SPECIFICATIONS
ELECTRICAL AND TIMING CHARACTERISTICS
  Absolute Maximum Ratings
  DC Characteristics
  AC Characteristics
  82596CA Input/Output System Timings
  Transmit/Receive Clock Parameters
  82596CA BUS Operation
  System Interface AC Timing Characteristics
  Input Waveforms
  Serial AC Timing Characteristics
OUTLINE DIAGRAMS
INTRODUCTION
The 82596CA is an intelligent, high-performance
32-bit Local Area Network coprocessor. The
82596CA implements the CSMA/CD access method
and can be configured to support all existing IEEE
802.3 standards: TYPEs 10BASE-T, 10BASE5,
10BASE2, 1BASE5, and 10BROAD36. It can also be
used to implement the proposed standard TYPE
10BASE-F. The 82596CA performs high-level commands,
command chaining, and interprocessor communications
via shared memory, thus relieving the
host CPU of many tasks associated with network
control. All time-critical functions are performed independently
of the CPU; this increases network performance
and efficiency. The 82596CA bus interface
is optimized for Intel's i486™ SX, i486™ DX,
80960CA, and 80960KB processors.

The 82596CA implements all IEEE 802.3 Medium
Access Control and channel interface functions;
these include framing, preamble generation and
stripping, source address generation, destination address
checking, short-frame detection, and automatic
length-field handling. Data rates up to 20 Mb/s are
supported.

The 82596CA provides a powerful host system interface.
It manages memory structures automatically,
with command chaining and bidirectional data chaining.
An on-chip DMA controller manages four channels;
this allows autonomous transfer of data blocks
(buffers and frames) and relieves the CPU of byte
transfer overhead. Buffers containing errored or collided
frames can be automatically recovered without
CPU intervention. The 82596CA provides an upgrade
path for existing 82586 software drivers by
providing an 82586-software-compatible mode that
supports the current 82586 memory structure. The
82596CA also has a Flexible memory structure and
a Simplified memory structure. The 82596CA can
address up to 4 gigabytes of memory. The 82596CA
supports Little Endian and Big Endian byte ordering.

The 82596CA bus interface can achieve a burst
transfer rate of 106 MB/s at 33 MHz. The bus interface
employs bus throttle timers to regulate
82596CA bus use. Two large, independent FIFOs
(128 bytes for Receive and 64 bytes for Transmit)
tolerate long bus latencies and provide programmable
thresholds that allow the user to optimize bus
overhead for any worst-case bus latency. The high-performance
bus is capable of back-to-back transmission
and reception during the IEEE 802.3 9.6-µs
Interframe Spacing (IFS) period.

The 82596CA provides a wide range of diagnostics
and network management functions; these include
internal and external loopback, exception condition
tallies, channel activity indicators, optional capture
of all frames regardless of destination address
(promiscuous mode), optional capture of errored or
collided frames, and time domain reflectometry for
locating fault points on the network cable. The statistical
counters, in 32-bit segmented and linear
modes, are 32 bits each and include CRC errors,
alignment errors, overrun errors, resource errors,
short frames, and received collisions. The 82596CA
also features a monitor mode for network analysis.
In this mode the 82596CA can capture status bytes,
and update statistical counters, of frames monitored
on the link without transferring the contents of the
frames to memory. This can be done concurrently
while transmitting and receiving frames destined for
that station.

The 82596CA can be used in both baseband and
broadband networks. It can be configured for maximum
network efficiency (minimum contention overhead)
with networks of any length. Its highly flexible
CSMA/CD unit supports address field lengths of
zero through six bytes, configurable to either IEEE
802.3/Ethernet or HDLC frame delimitation. It also
supports 16- or 32-bit cyclic redundancy checks.
The CRC can be transferred directly to memory for
receive operations, or dynamically inserted for transmit
operations. The CSMA/CD unit can also be configured
for full duplex operation for high throughput
in point-to-point connections.

The 82596CA is fabricated with Intel's reliable, 5-V,
CHMOS IV (process 648.8) technology. It is available
in a 132-pin PQFP or PGA package.

82596 B-Stepping
The 82596 B-step incorporates new features compared
to the 82596 A1 stepping. The following is a
summary of the 82596 B-step new features.
• The 82596 B-step transmit buffers can now be
  byte aligned.
• In big endian mode, and when configured to linear
  mode, the 82596 B-step treats 32-bit address
  pointers as big endian 32-bit entities. However,
  the SCB absolute address and statistical counters
  are still treated as two 16-bit big endian entities.
  This big endian 32-bit entity support is configured
  through the SYSBUS byte; not setting this
  mode will configure the 82596 B-step to be 100%
  compatible with the 82596 A1-step big endian
  mode.
• The 82596 B-step has improved performance on
  back-to-back frame transmission.
• The 82596 B-step can be configured to reread
  the next Command Block on the CB list upon
  receiving a CU RESUME Control Command.
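Two figures quoted in the Introduction can be checked with simple arithmetic. The sketch below is an illustration under stated assumptions, not the data sheet's own derivation: it assumes a burst of four 32-bit transfers completing in five clocks (a 2-1-1-1 pattern) and a nominal 33.33 MHz clock, which lands close to the quoted 106 MB/s. The 9.6 µs interframe spacing is converted to bit times at a 10 Mb/s serial rate. Function names are hypothetical.

```python
def burst_rate_mb_s(clock_mhz, bytes_per_transfer=4, transfers=4, clocks=5):
    """Burst throughput in MB/s.

    Assumption: `transfers` data transfers complete in `clocks` bus clocks.
    """
    return clock_mhz * bytes_per_transfer * transfers / clocks


def ifs_bit_times(ifs_us, bit_rate_mbps):
    """Interframe spacing expressed in serial bit times."""
    return round(ifs_us * bit_rate_mbps)


print(burst_rate_mb_s(33.33))    # close to the quoted 106 MB/s
print(ifs_bit_times(9.6, 10))    # 9.6 us at 10 Mb/s
```

With one transfer per clock and no address phase the same bus would peak at 4 bytes x 33.33 MHz, roughly 133 MB/s, so the quoted figure is consistent with some per-burst overhead.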
[Package drawing 290218-2, top view]
Figure 2. 82596CA PQFP Pin Configuration

[Package drawing 290218-3, pin view, metal lid; pin assignments are listed in the cross reference table that follows]
Figure 3. 82596CA PGA Pinout
82596CA PGA Cross Reference by Pin Name

Address           Data              Control             Serial Interface
Signal  Pin No.   Signal  Pin No.   Signal    Pin No.   Signal  Pin No.
A2      N9        D0      J2        ADS       M5        CDT     A13
A3      M9        D1      H3        AHOLD     N5        CRS     A14
A4      M10       D2      G2        BE0       M7        CTS     C11
A5      P11       D3      G3        BE1       P5        LPBK    A12
A6      N11       D4      G1        BE2       M8        RTS     C10
A7      P12       D5      D1        BE3       P9        RxC     B11
A8      M11       D6      C1        BLAST     N2        RxD     B12
A9      N12       D7      F3        BOFF      N6        TxC     C12
A10     M12       D8      D2        BRDY      M1        TxD     A11
A11     P13       D9      C2        BREQ      P4
A12     L12       D10     E3        BS16      N1
A13     N13       D11     D3        CA        P3
A14     M13       D12     B2        CLK       J3
A15     P14       D13     B1        DP0       L2
A16     K12       D14     C3        DP1       L3
A17     N14       D15     A1        DP2       L1
A18     J12       D16     B3        DP3       K3
A19     K13       D17     C4        HLDA      M6
A20     M14       D18     A2        HOLD      P2
A21     H12       D19     C5        INT/INT   N3
A22     K14       D20     A3        LE/BE     B14
A23     G12       D21     B4        LOCK      M4
A24     F14       D22     A4        PCHK      P1
A25     F12       D23     C6        PORT      M2
A26     F13       D24     B5        READY     M3
A27     D14       D25     C7        RESET     B13
A28     E12       D26     A5        W/R       N4
A29     D13       D27     B8
A30     D12       D28     C8
A31     C14       D29     A9
                  D30     C9
                  D31     B9

Vcc Pin No.: B6, B7, B10, C13, E2, E13, F2, G13, H2, H13, J13, K2, L13, N7, N8, N10
Vss Pin No.: A6, A7, A8, A10, E1, E14, F1, G14, H1, H14, J1, J14, K1, L14, P6, P7, P8, P10
PIN DESCRIPTIONS

Symbol: CLK    PQFP Pin No.: 9    Type: I
CLOCK. The system clock input provides the fundamental timing for
the 82596. It is a 1X CLK input used to generate the 82596 clock and
requires TTL levels. All external timing parameters are specified in
reference to the rising edge of CLK.

Symbol: D0-D31    PQFP Pin No.: 14-53    Type: I/O
DATA BUS. The 32 Data Bus lines are bidirectional, tri-state lines that
provide the general purpose data path between the 82596 and
memory. With the 82596 the bus can be either 16 or 32 bits wide; this
is determined by the BS16 signal. The 82596 always drives all 32 data
lines during Write operations, even with a 16-bit bus. D31-D0 are
floated after a Reset or when the bus is not acquired.
These lines are inputs during a CPU Port access; in this mode the CPU
writes the next address to the 82596 through the data lines. During
PORT commands (Relocatable SCP, Self-Test, Reset and Dump) the
address must be aligned to a 16-byte boundary. This frees the D3-D0
lines so they can be used to distinguish the commands. The following
is a summary of the decoding data.

D0  D1  D2  D3  D31-D4  Function
0   0   0   0   0000    Reset
0   1   0   0   ADDR    Relocatable SCP
1   0   0   0   ADDR    Self-Test
1   1   0   0   ADDR    Dump Command
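The PORT decoding above amounts to packing a function code into the four low-order data lines of a 16-byte-aligned address. The helper below is a sketch under two assumptions: that D0 is the least significant bit of the 32-bit word, and that the flattened table reads column-wise (D0 = 0, 0, 1, 1 and D1 = 0, 1, 0, 1 for the four functions in the order listed). The names are hypothetical and this is not driver code.

```python
# Function codes on D3-D0, assuming D0 is the least significant bit.
PORT_FUNCTIONS = {
    "reset": 0x0,            # D1=0, D0=0
    "self_test": 0x1,        # D1=0, D0=1
    "relocatable_scp": 0x2,  # D1=1, D0=0
    "dump": 0x3,             # D1=1, D0=1
}


def port_word(function: str, address: int = 0) -> int:
    """Build the 32-bit value written to the data bus for a PORT access."""
    code = PORT_FUNCTIONS[function]
    if function == "reset":
        return code                       # D31-D4 are zero for Reset
    if not 0 <= address <= 0xFFFFFFFF:
        raise ValueError("address must fit in 32 bits")
    if address & 0xF:
        raise ValueError("address must be aligned to a 16-byte boundary")
    return address | code                 # D31-D4 = address, D3-D0 = code


print(hex(port_word("self_test", 0x0001_2340)))
```

Because the PORT pin must be activated twice per access, as stated in the PORT pin description, a real driver would split this word across two writes; that sequencing is outside the scope of this sketch.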
Symbol: DP0-DP3    PQFP Pin No.: 4-7    Type: I/O
DATA PARITY. These are tri-stated data parity pins. There is one
parity line for each byte of the data bus. The 82596 drives them with
even-parity information during write operations having the same timing
as data writes. Likewise, even-parity information, with the same timing
as read information, must be driven back to the 82596 over these pins
to ensure that the correct parity check status is indicated by the
82596.
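Even parity means that each data byte together with its parity bit contains an even number of ones. A minimal sketch of generating the four parity bits for a 32-bit word follows; the mapping of DP0 to the least significant byte (D7-D0) through DP3 to the most significant byte is an assumption, as is the function name.

```python
def data_parity(word: int) -> int:
    """Return a 4-bit value; bit n is the even-parity bit for byte lane n.

    Assumption: lane 0 is D7-D0 (DP0) and lane 3 is D31-D24 (DP3).
    """
    bits = 0
    for lane in range(4):
        byte = (word >> (8 * lane)) & 0xFF
        # Parity bit is 1 when the byte holds an odd number of ones,
        # so that byte plus parity bit always has an even count.
        parity = bin(byte).count("1") & 1
        bits |= parity << lane
    return bits


print(bin(data_parity(0x01010101)))
```

The same function can be used on the read side: recompute the parity of the returned data and compare it with the bits driven on the parity pins.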
Symbol: PCHK    PQFP Pin No.: 127    Type: O
PARITY CHECK. This pin is driven high one clock after RDY to inform
Read operations of the parity status of data sampled at the end of the
previous clock cycle. When driven low it indicates that incorrect parity
data has been sampled. It only checks the parity status of enabled
bytes, which are indicated by the Byte Enable and Bus Size signals.
PCHK is only valid for one clock time after data read is returned to the
82596; i.e., it is inactive (high) at all other times.
Symbol: A31-A2    PQFP Pin No.: 70-108    Type: O
ADDRESS LINES. These 30 tri-stated Address lines output the
address bits required for memory operation. These lines are floated
after a Reset or when the bus is not acquired.
Symbol: BE3-BE0    PQFP Pin No.: 109-114    Type: O
BYTE ENABLE. These tri-stated signals are used to indicate which
bytes are involved with the current memory access. The number of
Byte Enable signals asserted indicates the physical size of the data
being transferred (1, 2, 3, or 4 bytes).
• BE0 indicates D7-D0
• BE1 indicates D15-D8
• BE2 indicates D23-D16
• BE3 indicates D31-D24
These lines are floated after a Reset or when the bus is not acquired.
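Since only A31-A2 are driven, the byte enables carry the information that the two low-order address bits would otherwise provide. The sketch below computes which Byte Enable signals would be asserted for a transfer that starts at a given byte address. It returns a logical mask (bit n set means BEn is asserted) and says nothing about the electrical polarity of the pins; the function name is hypothetical.

```python
def byte_enables(address: int, size: int) -> int:
    """Return a 4-bit mask; bit n set means BEn is asserted.

    address: byte address of the first byte transferred
    size:    number of bytes (1 to 4) within one 32-bit word
    """
    offset = address & 0x3               # byte lane of the first byte
    if size not in (1, 2, 3, 4) or offset + size > 4:
        raise ValueError("transfer must fit within one 32-bit word")
    return ((1 << size) - 1) << offset   # contiguous lanes from offset


print(bin(byte_enables(0x1001, 2)))      # lanes 1 and 2: D15-D8, D23-D16
```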
Symbol: W/R    PQFP Pin No.: 120    Type: O
WRITE/READ. This dual function pin is used to distinguish Write and
Read cycles. This line is floated after a Reset or when the bus is not
acquired.
PIN DESCRIPTIONS (Continued)

Symbol: ADS    PQFP Pin No.: 124    Type: O
ADDRESS STATUS. The 82596 uses this tri-state pin to
indicate that a valid bus cycle has begun and that A31-A2, BE3-BE0,
and W/R are being driven. It is asserted during t1 bus states. This line
is floated after a Reset or when the bus is not acquired.

Symbol: RDY    PQFP Pin No.: 130    Type: I
READY. Active low. This signal is the acknowledgment from
addressed memory that the transfer cycle can be completed. When
high, it causes wait states to be inserted. It is ignored at the end of the
first clock of the bus cycle's data cycle. This active-low signal does not
have an internal pull-up resistor. This signal must meet the setup and
hold times to operate correctly.

Symbol: BRDY    PQFP Pin No.: 2    Type: I
BURST READY. Active low. Burst Ready, like RDY, indicates that the
external system has presented valid data on the data pins in response
to a Read, or that the external system has accepted the 82596 data in
response to a Write request. Also, like RDY, this signal is ignored at
the end of the first clock in a bus cycle. If the 82596 can still receive
data from the previous cycle, ADS will not be asserted in the next
clock cycle; however, Address and Byte Enable will change to reflect
the next data item expected by the 82596. BRDY will be sampled
during each succeeding clock and, if active, the data on the pins will be
strobed to the 82596 or to external memory (read/write). BRDY
operates exactly like READY during the last data cycle of a burst
sequence and during nonburstable cycles.

Symbol: BLAST    PQFP Pin No.: 128    Type: O
BURST LAST. A signal (active low) on this tri-state pin indicates that
the burst cycle is finished and when BRDY is next returned it will be
treated as a normal ready; i.e., another set of addresses will be driven
with ADS or the bus will go idle. BLAST is not asserted if the bus is not
acquired.

Symbol: AHOLD    PQFP Pin No.: 117    Type: I
ADDRESS HOLD. This hold signal is active high; it allows another bus
master to access the 82596 address bus. In a system where an 82596
and an i486 processor share the local bus, AHOLD allows the cache
controller to make a cache invalidation cycle while the 82596 holds the
address lines. In response to a signal on this pin, the 82596
immediately (i.e., during the next clock) stops driving the entire address
bus (A31-A2); the rest of the bus can remain active. For example,
data can be returned for a previously specified bus cycle during
Address Hold. The 82596 will not begin another bus cycle while
AHOLD is active.

Symbol: BOFF    PQFP Pin No.: 116    Type: I
BACKOFF. This signal is active low; it informs the 82596 that another
bus master requires access to the bus before the 82596 bus cycle
completes. The 82596 immediately (i.e., during the next clock) floats its
bus. Any data returned to the 82596 while BOFF is asserted is ignored.
BOFF has higher priority than RDY or BRDY; if two such signals are
returned in the same clock period, BOFF is given preference. The
82596 remains in Hold until BOFF goes high; then the 82596 resumes
its bus cycle by driving out the address and status, and asserting ADS.
BOFF should not be asserted during T1.

Symbol: LOCK    PQFP Pin No.: 126    Type: O
LOCK. This tri-state pin is used to distinguish locked and unlocked bus
cycles. LOCK generates a semaphore handshake to the CPU. LOCK
can be active for several memory cycles; it goes active during the first
locked memory cycle (t1) and goes inactive at the last locked cycle
(t2). This line is floated after a Reset or when the bus is not acquired.
LOCK can be disabled via the sysbus byte in software.
82596CA
PIN DIESCRIPTIONS (Continued)
PQFP
Pin No.
Type
B816
129
I
BUS SIZE. This signal allows the 82596CA to work with either 16· or
32-bit bytes. Inserting B816 low causes the 82596 to perform two 16bit memory accesses when transferring 32-bit data. In little endian
mode the 015-00 lines are driven when B816 is inserted, in Big
Endian mode the 031-016 lines are driven.
HOLD
123
0
HOLD. The HOLD signal is active high, the 82596 uses it to request
local bus r)lastership. In normal operation HOLD goes inactive before
HLDA. The 82596 can be forced off the bus by deasserting HLDA or if
the bus throttle timers expire.
HLDA
118
I
HOLD ACKNOWLEDGE. The HLDA signal is active high, it indicates
that bus mastership has been given to the 82596. HLDA is internally
synchronized; after HOLD is detected low, the CPU drives HLOA low.
NOTE:
Do not connect HLDA to VC~it will cause a deadlock. A user wanting
to give the 82596 permanent access to the bus should connect HLDA
to HOLD. If HLOA goes inactive before HOLD, the 82596 will release
the bus (by deasserting HOLD) within a maximum of within a specified
number of bus cycles as specified in the 82596 User's Manual.
BREQ
115
I
BUS RIEQUEST. This signal, when configured to an externally
activated mode, is used to trigger the bus throttle timers.
PORT
3
I
PORT. When this signal is received, the 82596 latches the data on the
data bus into an internal 32-bit register. When the CPU is asserting this
signal it can write into the 82596 (via the data bus). This pin must be
activated twice during all CPU Port access commands.
RE8ET
69
I
RESET. This active high, internally synchronized signal causes the
82596 to terminate current activity. The signal must be high for at least
five system clock cycles. After five system clock cycles and four TxC
clock cycles the 82596 will execute a Reset when it receives a high
RE8ET signal. When RE8ET returns to low the 82596 waits for the
first CA signal and then begins the initialization sequence.
LE/BE
65
I
LITTLE ENDIAN/BIG ENDIAN. This dual-function pin is used to
select byte ordering. When LE/BE is high, little end ian byte ordering is
used; when low, big endian byte ordering is used for data in frames
(bytes) and for control (8CB, RFD, CBL, etc).
CA
119
I
CHANNEL ATTENTION. The CPU uses this pin to force the 82596 to
begin executing memory resident Command blocks. The CA signal is
internally synchronized. The signal must be high for at least one
system clock. It is latched internally on the high to low edge and then
detected by the 82596.
The first CA after a Reset forces the 82596 into the initialization
sequence beginning at location 00FFFFF6h or an SCP address written
to the 82596 using CPU Port access. All subsequent CA signals cause
the 82596 to begin executing new command sequences from the SCB.
INT/INT
125
O
INTERRUPT. A high signal on this pin notifies the CPU that the 82596
is requesting an interrupt. This signal is an edge triggered interrupt
signal, and can be configured to be active high or low.
PIN DESCRIPTIONS (Continued)
Symbol
PQFP Pin No.
Type
Name and Function
VCC
17 Pins
POWER. +5 V ± 10%.
VSS
17 Pins
GROUND. 0 V.
TxD
54
O
TRANSMIT DATA. This pin transmits data to the serial link. It is high
when not transmitting.
TxC
64
I
TRANSMIT CLOCK. This signal provides the fundamental timing for
the serial subsystem. The clock is also used to transmit data
synchronously on the TxD pin. For NRZ encoding, data is transferred
to the TxD pin on the high to low clock transition. For Manchester
encoding, the transmitted bit center is aligned with the low to high
transition. Transmit clock must always be running for proper device
operation.
LPBK
58
O
LOOPBACK. This TTL-level control signal enables the loopback
mode. In this mode serial data on the TxD input is routed through the
82C501 internal circuits and back to the RxD output without driving the
transceiver cable. To enable this signal, both internal and external
loopback need to be set with the Configure command.
RxD
60
I
RECEIVE DATA. This pin receives NRZ serial data only. It must be
high when not receiving.
RxC
59
I
RECEIVE CLOCK. This signal provides timing information to the
internal shifting logic. For NRZ data the state of the RxD pin is
sampled on the high to low transition of the clock.
RTS
57
O
REQUEST TO SEND. When this signal is low the 82596 informs the
external interface that it has data to transmit. It is forced high after a
Reset or when transmission is stopped.
CTS
62
I
CLEAR TO SEND. An active-low signal that enables the 82596 to
send data. It is normally used as an interface handshake to RTS.
Asserting CTS high stops transmission. CTS is internally synchronized.
If CTS goes inactive, meeting the setup time to the TxC negative edge,
the transmission will stop and RTS will go inactive within, at most, two
TxC cycles.
CRS
63
I
CARRIER SENSE. This signal is active low; it is used to notify the
82596 that traffic is on the serial link. It is only used if the 82596 is
configured for external Carrier Sense. In this configuration external
circuitry is required for detecting traffic on the serial link. CRS is
internally synchronized. To be accepted, the signal must remain active
for at least two serial clock cycles (for CRSF = 0).
CDT
61
I
COLLISION DETECT. This active-low signal informs the 82596 that a
collision has occurred. It is only used if the 82596 is configured for
external Collision Detect. External circuitry is required for collision
detection. CDT is internally synchronized. To be accepted, the signal
must remain active for at least two serial clock cycles (for CDTF = 0).
82596 AND HOST CPU INTERACTION
The 82596CA and the host CPU communicate
through shared memory. Because of its on-chip
DMA capability, the 82596 can make data block
transfers (buffers and frames) independently of the
CPU; this greatly reduces the CPU byte transfer
overhead.
The 82596 is a multitasking coprocessor that comprises two independent logical units: the Command
Unit (CU) and the Receive Unit (RU). The CU executes commands from shared memory. The RU handles all activities related to frame reception. The independence of the CU and RU enables the 82596 to
engage in both activities simultaneously: the CU
can fetch and execute commands from memory
while the RU is storing received frames in memory.
The CPU is only involved with this process after the
CU has executed a sequence of commands or the
RU has finished storing a sequence of frames.
The CPU and the 82596 use the hardware signals
Interrupt (INT) and Channel Attention (CA) to initiate
communication with the System Control Block
(SCB); see Figure 4. The 82596 uses INT to alert the
CPU of a change in the contents of the SCB; the
CPU uses CA to alert the 82596.
The 82596 has a CPU Port Access state that allows
the CPU to execute certain functions without accessing memory. The 82596 PORT pin and data bus
pins are used to enable this feature. The CPU can
directly activate four operations when the 82596 is in
this state.
• Write an alternative System Configuration Pointer
(SCP). This can be used when the 82596 cannot
use the default SCP address space.
• Write a different Dump Command Pointer and execute Dump. This can be used for troubleshooting No Response problems.
• Reset the 82596 via software without disturbing the rest of the system.
• Run a self-test. This can be used for board testing; the
82596 will execute a self-test and write the results to memory.
82596 BUS INTERFACE
The 82596CA has bus interface timings and pin definitions that are compatible with Intel's 32-bit
i486™SX and i486™DX microprocessors. This
eliminates the need for additional bus interface logic.
Operating at 33 MHz, the 82596's bus bandwidth
can be as high as 106 MB/s. Since Ethernet only
requires 1.25 MB/s, this leaves a considerable
amount of bandwidth for the CPU. The 82596 also
has a bus throttle to regulate its use of the bus. Two
timers can be programmed through the SCB: one
controls the maximum time the 82596 can remain on
the bus, the other controls the time the 82596 must
stay off the bus (see Figure 5). The bus throttle can
be programmed to trigger internally with HLDA or
externally with BREQ. These timers can restrict the
82596 HOLD activation time and improve bus utilization.
82596 MEMORY ADDRESSING
The 82596 has a 32-bit memory address range,
which allows addressing up to four gigabytes of
memory. The 82596 has three memory addressing
modes (see Table 1).
• 82586 Mode. The 82596 has a 24-bit memory
address range. The System Control Block, Command List, Receive Descriptor List, and Buffer
Descriptors must reside in one 64-KB memory
segment. Transmit and Receive buffers can reside in a 24-bit address space.
• 32-Bit Segmented Mode. The 82596 has a 32-bit
memory address range. The System Control
Block, Command List, Receive Descriptor List,
and Buffer Descriptors must reside in one 64-KB
memory segment. Transmit and Receive buffers
can reside in a 32-bit address space.
• Linear Mode. The 82596 has a 32-bit memory
address range. Any memory structure can reside
anywhere within the 32-bit memory address
range.
Figure 4. 82596 and Host CPU Intervention
(Block diagram: the CPU drives Channel Attention (CA) to the 82596 and the 82596 drives Interrupt (INT) to the CPU. Both access shared memory, which holds the Initialization Root, the System Control Block (SCB, the "mailbox"), the Receive Frame Area, and the Command List.)
(Timing diagram: without bus throttle timers the 82596 uses the bus for one continuous period t1. With bus throttle timers the same bus use is split into two T-ON periods, t2 and t3, separated by a T-OFF period, where t1 = t2 + t3.)
Figure 5. Bus Throttle Timers
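The bandwidth figures quoted for the bus interface and the t1 = t2 + t3 relationship in Figure 5 can be checked with a little arithmetic. The sketch below is illustrative only. It assumes a 16-byte burst completed in five clocks, which is not stated in this data sheet but happens to reproduce the quoted 106 MB/s, and it treats T-ON and T-OFF as unitless durations. All function names are hypothetical.

```python
def burst_bandwidth_mb_s(clock_hz, bytes_per_burst=16, clocks_per_burst=5):
    """Peak bus bandwidth in MB/s (1 MB = 10**6 bytes); burst shape is an assumption."""
    return clock_hz * bytes_per_burst / clocks_per_burst / 1e6


def split_bus_time(total, t_on):
    """Split one continuous bus period into T-ON sized periods (Figure 5)."""
    if t_on <= 0:
        raise ValueError("t_on must be positive")
    periods = []
    remaining = total
    while remaining > 0:
        periods.append(min(t_on, remaining))
        remaining -= periods[-1]
    return periods


def throttle_bus_share(t_on, t_off):
    """Largest fraction of bus time the 82596 can hold while throttled."""
    return t_on / (t_on + t_off)


peak = burst_bandwidth_mb_s(33_000_000)   # roughly 105.6 MB/s at 33 MHz
ethernet = 10_000_000 / 8 / 1e6           # 10 Mb/s Ethernet expressed in MB/s
share = ethernet / peak                   # fraction of peak bandwidth Ethernet needs
```

Under these assumptions Ethernet needs a little over 1% of the peak bus bandwidth, which is why a fairly restrictive T-ON/T-OFF ratio can still keep up with the serial link.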
Table 1. 82596 Memory Addressing Formats

                            Operation Mode
Pointer or Offset           82586                     32-Bit Segmented          Linear
ISCP Address                24-Bit Linear             32-Bit Linear             32-Bit Linear
SCB Address                 Base (24) + Offset (16)   Base (32) + Offset (16)   32-Bit Linear
Command Block Pointers      Base (24) + Offset (16)   Base (32) + Offset (16)   32-Bit Linear
Rx Frame Descriptors        Base (24) + Offset (16)   Base (32) + Offset (16)   32-Bit Linear
Tx Frame Descriptors        Base (24) + Offset (16)   Base (32) + Offset (16)   32-Bit Linear
Rx Buffer Descriptors       Base (24) + Offset (16)   Base (32) + Offset (16)   32-Bit Linear
Tx Buffer Descriptors       Base (24) + Offset (16)   Base (32) + Offset (16)   32-Bit Linear
Rx Buffers                  24-Bit Linear             32-Bit Linear             32-Bit Linear
Tx Buffers                  24-Bit Linear             32-Bit Linear             32-Bit Linear
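As a rough illustration of Table 1, the sketch below resolves the address of a control structure (SCB, Command Block, or descriptor) in each addressing mode. The function name, the mode strings, and the range checks are hypothetical; the data sheet only states which quantities are combined, not how a driver should validate them.

```python
BASE_BITS = {"82586": 24, "segmented": 32}


def control_structure_address(mode, base, pointer):
    """Return the physical address of an SCB, Command Block, or descriptor.

    In 82586 and 32-bit Segmented modes `pointer` is a 16-bit offset added to
    the base; in Linear mode it is already a 32-bit address and `base` is unused.
    """
    if mode == "linear":
        if not 0 <= pointer <= 0xFFFFFFFF:
            raise ValueError("linear pointer must fit in 32 bits")
        return pointer
    if mode not in BASE_BITS:
        raise ValueError("unknown mode: %r" % (mode,))
    if not 0 <= pointer <= 0xFFFF:
        raise ValueError("offset must fit in 16 bits")
    if not 0 <= base < (1 << BASE_BITS[mode]):
        raise ValueError("base does not fit the mode's address range")
    return base + pointer


scb_82586 = control_structure_address("82586", 0x120000, 0x0040)
scb_segmented = control_structure_address("segmented", 0x80010000, 0xFFFF)
scb_linear = control_structure_address("linear", 0, 0x8001FFFC)
```

Transmit and receive buffers are addressed linearly in every mode, so they need no base-plus-offset step.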
Figure 6. 82596 Shared Memory Structure
(Diagram: the Initialization Root leads to the SCB, which holds the Command List pointer, the Receive Frame pointer, statistics, and the bus throttle values. The Command List (CL) links to Transmit Buffer Descriptors (TBD) and Transmit Buffers 1 to N. The Receive Frame Area (RFA) links to Receive Buffer Descriptors (RBD) 1 to N.)
82596 SYSTEM MEMORY STRUCTURE
The Shared Memory structure consists of four parts:
the Initialization Root, the System Control Block, the
Command List, and the Receive Frame Area (see
Figure 6).
The Initialization Root is in an established location
known to the host CPU and the 82596 (00FFFFF6h).
However, the CPU can establish the Initialization
Root in another location by using the CPU Port access. This root is accessed during initialization, and
points to the System Control Block.
The System Control Block serves as a bidirectional
mail drop for the host CPU and the 82596 CU and
RU. It is the central point through which the CPU and
the 82596 exchange control and status information.
The SCB has two areas. The first contains instructions from the CPU to the 82596. These include
control of the CU and RU (Start, Abort, Suspend,
and Resume), a pointer to the list of CU commands,
a pointer to the Receive Frame Area, a set of Interrupt Acknowledge bits, and the T-ON and T-OFF
timers for the bus throttle. The second area contains
status information the 82596 is sending to the CPU,
such as the CU and RU states (Idle, Active,
Ready, Suspended, No Receive Resources, etc.), interrupt bits (Command Completed, Frame Received,
CU Not Ready, and RU Not Ready), and statistical
counters.
The Command List functions as a program for the
CU; individual commands are placed in memory
units called Command Blocks (CBs). These CBs
contain the parameters and status of specific high-level commands called Action Commands; e.g.,
Transmit or Configure.
Transmit causes the 82596 to transmit a frame. The
Transmit CB contains the destination address, the
length field, and a pointer to a list of linked buffers
holding the frame that is to be constructed from several buffers scattered throughout memory. The
Command Unit operates without CPU intervention;
the DMA for each buffer, and the prefetching of references to new buffers, is performed in parallel. The
CPU is notified only after a transmission is complete.
The Receive Frame Area is a list of Free Frame Descriptors (descriptors not yet used) and a list of user-prepared buffers. Frames arrive at the 82596 unsolicited; the 82596 must always be ready to receive
and store them in the Free Frame Area. The Receive Unit fills the buffers when it receives frames,
and reformats the Free Buffer List into received-frame structures. The frame structure is, for all practical purposes, identical to the format of the frame to
be transmitted. The first Frame Descriptor is referenced by the SCB. Unless the 82596 is configured
to Save Bad Frames, the frame descriptor and the
associated buffer descriptor, which are wasted when
a bad frame is received, are automatically reclaimed
and returned to the Free Buffer List.
Receive buffer chaining (storing incoming frames in
a linked buffer list) significantly improves memory
utilization. Without buffer chaining, the user must allocate consecutive blocks of memory, each capable
of containing a maximum frame (for Ethernet, 1518
bytes). Since an average frame is about 200 bytes,
this is very inefficient. With buffer chaining, the user
can allocate small buffers and the 82596 will only
use those that are needed.
Figure 7 A-D illustrates how the 82596 uses the
Receive Frame Area. Figure 7A shows an unused
Receive Frame Area composed of Free Frame Descriptors and Free Receive Buffers prepared by the
user. The SCB points to the first Frame Descriptor of
the Frame Descriptor List. Figure 7B shows the
same Receive Frame Area after receiving one
frame. This first frame occupies two Receive Buffers
and one Frame Descriptor (a valid received frame
will only occupy one Frame Descriptor). After receiving this frame the 82596 sets the next Free Frame
Descriptor RBD pointer to the next Free RBD. Figure
7C shows the RFA after receiving a second frame.
In this example the second frame occupies only one
Receive Buffer and one RFD. The 82596 again sets
the RBD pointer. This process is repeated again in
Figure 7D, showing the reception of another frame
using one Receive Buffer; in this example there is an
extra Frame Descriptor.
TRANSMIT AND RECEIVE MEMORY
STRUCTURES
There are three memory structures for reception and
transmission: the 82586 memory structure, the
Flexible memory structure, and the Simplified memory structure. The 82586 mode is selected by configuring the 82596 during initialization. In this mode all
the 82596 memory structures are compatible with
the 82586 memory structures.
When the 82596 is not configured to the 82586
mode, the other two memory structures, Simplified
and Flexible, are available for transmitting and receiving. These structures can be selected on a
frame-by-frame basis by setting the S/F bit in the
Transmit Command and the Receive Frame Descriptor (see Figures 29, 30, 41, and 42). The Simplified memory structure offers a simple structure for
ease of programming (see Figure 8). All information
about a frame is contained in one structure; for example, during reception the RFD and data field are
contained in one structure.
The Flexible memory structure (see Figure 9) has a
control field that allows the programmer to specify
the amount of receive data the RFD will contain for
receive operations and the amount of transmit data
the Transmit Command Block will contain for transmit operations. For example, when the control field
in the RFD is set to 20 bytes during a reception, the
first 20 bytes of the data field are stored in the RFD
(6 bytes of destination address, 6 bytes of source
address, 2 bytes of length field, and 6 bytes of data)
and the remainder of the data field is stored in the
Receive Data Buffers. This is useful for capturing
frame headers when header information is contained in the data field. The header information can
then be automatically stored in the RFD partitioned
from the Receive Data Buffer.
The control field can also be used for the Transmit
Command when the Flexible memory structure is
used. The quantity of data field bytes to be transmitted from the Transmit Command Block is specified
by the variable control field.
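The 20-byte example for the Flexible memory structure can be sketched as a simple split of the received bytes between the RFD and the chained receive buffers. This is only a model of the behavior described above, not driver code: the function name, the 64-byte buffer size, and the choice to treat the frame as destination, source, length, and data (without preamble or CRC) are assumptions made for illustration.

```python
def split_flexible_frame(frame, rfd_bytes, buffer_size):
    """Model how a received frame is divided in the Flexible memory structure.

    Returns (bytes held in the RFD, list of chunks stored in receive buffers).
    """
    if rfd_bytes < 0 or buffer_size <= 0:
        raise ValueError("sizes must be positive")
    in_rfd = frame[:rfd_bytes]
    rest = frame[rfd_bytes:]
    buffers = [rest[i:i + buffer_size] for i in range(0, len(rest), buffer_size)]
    return in_rfd, buffers


destination = bytes(6)               # 6-byte destination address
source = bytes([0x02] * 6)           # 6-byte source address
length = (100).to_bytes(2, "big")    # 2-byte length field
data = bytes(range(100))             # 100 bytes of data

in_rfd, buffers = split_flexible_frame(destination + source + length + data, 20, 64)
header_data = in_rfd[14:]            # the 6 data bytes captured alongside the header
```

With the control field set to 20, the RFD ends up holding the 14 header bytes plus the first 6 data bytes, and the remaining 94 data bytes spill into two 64-byte buffers.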
Figure 7. Frame Reception in the RFA
(Panels A to D show the Receive Frame Area before reception and after each of three received frames, as described in the text.)
Figure 8. Simplified Memory Structure
(Diagram: the SCB, with its status, Command List pointer, FD pointer, statistics, and bus throttle fields, points to the Receive Frame Descriptors in the Receive Frame Area. Descriptors FD1 to FD4 each hold a status field and a variable data field. Filled descriptors form the Receive Frame List and empty descriptors form the Free Frame List.)
Figure 9. Flexible Memory Structure
(Diagram: the SCB points to Receive Frame Descriptors FD1 to FD4 in the Receive Frame Area, each with a status field, a control field, and a variable data field. The descriptors link to Receive Buffer Descriptors RBD1 to RBD5, which point to Buffers 1 to 5. Filled structures form the Receive Frame List and empty structures form the Free Frame List.)
TRANSMITTING FRAMES
The 82596 executes high-level Action Commands
from the Command List in system memory. Action
Commands are fetched and executed in parallel with
the host CPU operation, thereby significantly improving system performance. The format of the Action
Commands is shown in Figure 10. Figure 28 shows
the 82586 mode, and Figures 29 and 30 show the
command formats of the Linear and 32-bit Segmented modes.
Figure 10. Action Command Format
(Diagram: a Command Block consists of control fields, namely the command status, the command, and a pointer to the next command, followed by a parameter field holding the command-specific parameters.)
A single Transmit command contains, as part of the
command-specific parameters, the destination address and length field of the transmitted frame and a
pointer to a buffer area in memory containing the data
portion of the frame. The data field is contained in a
memory data structure consisting of a buffer descriptor (BD) and a data buffer, or a linked list of
buffer descriptors and buffers, as shown in Figure
11.
Figure 11. Data Buffer Descriptor and Data Buffer Structure
(Diagram: a Transmit BD holds an actual count, a link field pointing to the next buffer descriptor, and a 24-bit data buffer (DB) address pointing to the data buffer.)
Multiple data buffers can be chained together using
the BDs. Thus, a frame with a long data field can be
transmitted using several (shorter) data buffers
chained together. This chaining technique allows the
system designer to develop efficient buffer management.
The 82596 automatically generates the preamble
(alternating 1s and 0s) and start frame delimiter,
fetches the destination address and length field from
the Transmit command, inserts its unique address
as the source address, fetches the data field specified by the Transmit command, and computes and
appends the CRC to the end of the frame (see Figure 12). In the Linear and 32-bit Segmented modes
the CRC can be optionally inserted on a frame-by-frame basis by setting the NC bit in the Transmit
Command Block (see Figures 29 and 30).
Figure 12. Frame Format
(Fields in order: Preamble, Start Frame Delimiter, Destination Address, Source Address, Length Field, Data Field, Frame Check Sequence, End Frame Delimiter.)
The 82596 can be configured to generate two types
of start and end frame delimiters: End of Carrier
(EOC) or HDLC. In EOC mode the start frame delimiter is 10101011 and the end frame delimiter is indicated by the lack of a signal after the last bit of the
frame check sequence field has been transmitted. In
EOC mode the 82596 can be configured to extend
short frames by adding pad bytes (7Eh) during transmission, according to the length field. In HDLC mode
the 82596 will generate the 01111110 flag for the
start and end frame delimiters, and do standard bit
stuffing and stripping. Furthermore, the 82596 can
be configured to pad frames shorter than the specified minimum frame length by appending the appropriate number of flags to the end of the frame.
When a collision occurs, the 82596 manages the
jam, random wait, and retry processes, reinitializing
DMA pointers without CPU intervention. Multiple
frames can be sent by linking the appropriate number of Transmit commands together. This is particularly useful when transmitting a message larger than
the maximum frame size (1518 bytes for Ethernet).
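The frame assembly that the 82596 performs in hardware during transmission can be approximated in software for test benches or packet generators. The sketch below is a model under stated assumptions, not the device's implementation: it assumes the 32-bit CRC used by IEEE 802.3 Ethernet (the same polynomial as Python's zlib.crc32, appended least significant byte first), a big-endian length field, and it omits the preamble and start frame delimiter. The function names are hypothetical.

```python
import struct
import zlib


def build_frame(destination, source, data):
    """Assemble destination, source, length, data and a 32-bit frame check sequence."""
    if len(destination) != 6 or len(source) != 6:
        raise ValueError("addresses are 6 bytes each")
    length = struct.pack(">H", len(data))             # 2-byte length field
    body = destination + source + length + data
    fcs = struct.pack("<I", zlib.crc32(body) & 0xFFFFFFFF)
    return body + fcs


def fcs_ok(frame):
    """Recompute the CRC over everything but the last 4 bytes and compare."""
    body, fcs = frame[:-4], frame[-4:]
    return fcs == struct.pack("<I", zlib.crc32(body) & 0xFFFFFFFF)


frame = build_frame(b"\xff" * 6, bytes.fromhex("00aa00123456"), b"hello")
corrupted = bytes([frame[0] ^ 0x01]) + frame[1:]      # flip one bit of the destination
```

A receiver-side check such as fcs_ok corresponds to the CRC error reported in the receive status; the alignment and length checks described later are separate tests.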
RECEIVING FRAMES
To reduce CPU overhead, the 82596 is designed to
receive frames without CPU supervision. The host
CPU first sets aside an adequate receive buffer
space and then enables the 82596 Receive Unit.
Once enabled, the RU watches for arriving frames
and automatically stores them in the Receive Frame
Area (RFA). The RFA contains Receive Frame Descriptors, Receive Buffer Descriptors, and Data Buffers (see Figure 13). The individual Receive Frame
Descriptors make up a Receive Descriptor List
(RDL) used by the 82596 to store the destination
and source addresses, the length field, and the
status of each frame received (see Figure 14).
Once enabled, the 82596 checks each passing
frame for an address match. The 82596 will recognize its own unique address, one or more multicast
addresses, or the broadcast address. If a match is
found the 82596 stores the destination and source
addresses and the length field in the next available
RFD. It then begins filling the next available Data
Buffer on the FBL, which is pointed to by the current
RFD, with the data portion of the incoming frame. As
one Data Buffer is filled, the 82596 automatically
fetches the next DB on the FBL until the entire frame
is received. This buffer chaining technique is particularly memory efficient because it allows the system
designer to set aside buffers to fit frames much
shorter than the maximum allowable frame length. If
AL-LOC = 1, or if the flexible memory structure is
used, the addresses and length field can be placed
in the Receive Buffer.
Once the entire frame is received without error, the
82596 does the following housekeeping tasks.
• The actual count field of the last Buffer Descriptor used to hold the frame just received is updated with the number of bytes stored in the associated Data Buffer.
• The next available Receive Frame Descriptor is
fetched.
• The address of the next available Buffer Descriptor is written to the next available Receive Frame
Descriptor.
• A frame received interrupt status bit is posted in
the SCB.
• An interrupt is sent to the CPU.
If a frame error occurs, for example a CRC error, the
82596 automatically reinitializes its DMA pointers
and reclaims any data buffers containing the bad
frame. The 82596 will continue to receive frames
without CPU help as long as Receive Frame Descriptors and Data Buffers are available.
82596 NETWORK MANAGEMENT
AND DIAGNOSTICS
The behavior of data communication networks is
normally very complex because of their distributed
and asynchronous nature. It is particularly difficult to
pinpoint a failure when it occurs. The 82596 has extensive diagnostic and network management functions that help improve reliability and testability. The
82596 reports on the following events after each
frame is transmitted.
• Transmission successful.
• Transmission unsuccessful. Lost Carrier Sense.
• Transmission unsuccessful. Lost Clear to Send.
• Transmission unsuccessful. A DMA underrun occurred because the system bus did not keep up
with the transmission.
• Transmission unsuccessful. The number of collisions exceeded the maximum allowed.
• Number of Collisions. The number of collisions
experienced during the frame.
• Heartbeat Indicator. This indicates the presence
of a heartbeat during the last Interframe Spacing
(IFS) after transmission.
When configured to Save Bad Frames the 82596
checks each incoming frame and reports the following errors.
• CRC error. Incorrect CRC in a properly aligned
frame.
• Alignment error. Incorrect CRC in a misaligned
frame.
• Frame too short. The frame is shorter than the
value configured for minimum frame length.
• Overrun. Part of the frame was not placed in
memory because the system bus did not keep up
with incoming data.
• Out of buffer. Part of the frame was discarded
because of insufficient memory storage space.
• Receive collision. A collision was detected during
reception.
• Length error. A frame not matching the frame
length parameter was detected.
Figure 13. Receive Frame Area Diagram
(Diagram: the Receive Frame Area (RFA) holds a list of Frame Descriptors (FD). The Free Buffer List (FBL) holds Receive Buffer Descriptors (RBD), each pointing to a Data Buffer (DB).)
Figure 14. Receive Frame Descriptor
(Diagram: the descriptor holds the receive frame status, a link field pointing to the next Receive Frame Descriptor, a buffer descriptor link field pointing to the buffer descriptor, the destination address, the source address, and the length field.)
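The memory saving from receive buffer chaining can be estimated from the figures given in the text: a maximum Ethernet frame of 1518 bytes and an average frame of about 200 bytes. The sketch below compares one full-size buffer per frame with a chain of small buffers. The 128-byte buffer size and the function names are arbitrary choices for illustration, and descriptor overhead is ignored.

```python
MAX_FRAME = 1518        # maximum Ethernet frame, from the text
AVERAGE_FRAME = 200     # approximate average frame, from the text


def buffers_needed(frame_len, buffer_size):
    """Number of chained buffers a frame occupies (at least one)."""
    return max(1, -(-frame_len // buffer_size))   # ceiling division


def memory_used(frame_len, buffer_size):
    """Bytes of buffer memory consumed by one frame."""
    return buffers_needed(frame_len, buffer_size) * buffer_size


# Without chaining every frame consumes a buffer sized for the maximum frame.
unchained_utilization = AVERAGE_FRAME / memory_used(AVERAGE_FRAME, MAX_FRAME)

# With chaining an average frame occupies only as many small buffers as it needs.
chained_utilization = AVERAGE_FRAME / memory_used(AVERAGE_FRAME, 128)
```

For these numbers utilization rises from roughly 13% to roughly 78%, at the cost of twelve buffers (and twelve descriptors) for a maximum-length frame.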
NETWORK PLANNING AND
MAINTENANCE
To properly plan, operate, and maintain a communication network, the network management entity
must accumulate information on network behavior.
The 82596 provides a rich set of network-wide diagnostics that can serve as the basis for a network
management entity.
Information on network activity is provided in the
status of each frame transmitted. The 82596 reports
the following activity indicators after each frame.
• Number of collisions. The number of collisions
the 82596 experienced while attempting to transmit the frame.
• Deferred transmission. During the first transmission attempt the 82596 had to defer to traffic on
the link.
The 82596 updates its 32-bit statistical counters after each received frame that both passes address
filtering and is longer than the Minimum Frame
Length configuration parameter. The 82596 reports
the following statistics.
• CRC errors. The number of well-aligned frames
that experienced a CRC error.
• Alignment errors. The number of misaligned
frames that experienced a CRC error.
• No resources. The number of frames that were
discarded because of insufficient resources for
reception.
• Overrun errors. The number of frames that were
not completely stored in memory because the
system bus did not keep up with incoming data.
• Receive Collision counter. The number of collisions detected during receive.
• Short Frame counter. The number of frames that
were discarded because they were shorter than
the configured minimum frame length.
The 82596 can be configured to Promiscuous mode.
In this mode it captures all frames transmitted on the
network without checking the Destination Address.
This is useful when implementing a monitoring station to capture all frames for analysis.
A useful method of capturing frame headers is to
use the Simplified memory mode, configure the
82596 to Save Bad Frames, and configure the
82596 to Promiscuous mode with space in the RFD
allocated for a specific number of receive data bytes.
The 82596 will receive all frames and put them in the
RFD. Frames that exceed the available space in the
RFD will be truncated, the status will be updated,
and the 82596 will retrieve the next RFD. This allows
the user to capture the initial data bytes of each
frame (for instance, the header) and discard the remainder of the frame.
The 82596 also has a monitor mode for network
analysis. During normal operation the receive function enables the 82596 to receive frames that pass
address filtering. These frames must have the Start
of Frame Delimiter (SFD) field and must be longer
than the absolute minimum frame length of 5 bytes
(6 bytes in case of Multicast address filtering). Contents and status of the received frames are transferred to memory. The monitor function enables the
82596 to simply evaluate the incoming frames. The
82596 can monitor the frames that pass or do not
pass the address filtering. It can also monitor frames
which do not have the SFD fields. The 82596 can be
configured to only keep statistical information about
monitor frames. Three options are available in the
Monitor mode. These options are selected by the
two monitor mode configuration bits available in the
configuration command.
When the first option is selected, the 82596 receives
good frames that pass address filtering and transfers them to memory while monitoring frames that
do not pass address filtering or are shorter than the
minimum frame size (these frames are not transferred to memory). When this option is used the
82596 updates six counters: CRC errors, alignment
errors, no resource errors, overrun errors, short
frames and total good frames received.
When the second option is selected, the receive
function is completely disabled. The 82596 monitors
only those frames that pass address filtering and
meet the minimum frame length requirement. When
this option is used the 82596 updates six counters:
CRC errors, alignment errors, total frames (good and
bad), short frames, collisions detected and total
good frames.
When the third option is selected, the receive function is completely disabled. The 82596 monitors all
frames, including frames that do not have a Start
Frame Delimiter. When this option is used the 82596
updates six counters: CRC errors, alignment errors,
total frames (good and bad), short frames, collisions
detected and total good frames.
STATION DIAGNOSTICS
AND SELF-TEST
The 82596 provides a large set of diagnostic and
network management functions. These include internal and external loopback and time domain reflectometry for locating fault points in the network cable.
The 82596 ensures software reliability by dumping
the contents of the 82596 internal registers into system memory. The 82596 has a self-test mode that
enables it to run an internal self-test and place the
results in system memory.
82586 SOFTWARE COMPATIBILITY
The 82596 has a software-compatible state in which
all its memory structures are compatible with the
82586 memory structure. This includes all the Action
Commands, the Receive Frame Area (including the
RFD, Buffer Descriptors, and Data Buffers), the System Control Block, and the initialization procedures.
There are two minor differences between the 82596
in the 82586-Compatible memory structure and the
82586.
• When the internal and external loopback bits in
the Configure command are set to 11 the 82596
is in external loopback and the LPBK pin is activated; in the 82586 this situation would produce
internal loopback.
• During a Dump command both the 82596 and
82586 dump the same number of bytes; however,
the data format is different.
INITIALIZING THE 82596
A Reset command is issued to the 82596 to prepare
it for normal operation. The 82596 is initialized
through two data structures that are addressed by
two pointers, the System Configuration Pointer
(SCP) and the Intermediate System Configuration
Pointer (ISCP). The initialization procedure begins
when a Channel Attention signal is asserted after
RESET. By default, the 82596 uses 00FFFFF4h as the
address of the double word that contains the SCP.
Before the CA signal is asserted this
default address can be changed to any other available address by asserting the PORT pin and providing the desired address over the D31-D4 pins of the
data bus. Pins D3-D0 must be 0010; i.e., any
alternative address must be aligned to 16-byte
boundaries. All addresses sent to the 82596 must be
word aligned, which means that all pointers and
memory structures must start on an even address
(A0 = zero).
SYSTEM CONFIGURATION POINTER (SCP)
The SCP contains the sysbus byte and the location
of the next structure of the initialization process, the
ISCP. The following parameters are selected in the
SYSBUS.
• The 82596 operation mode.
• The Bus Throttle timer triggering method.
• Lock enabled.
• Interrupt polarity.
• Big Endian 32-bit entity mode.
Byte ordering is determined by the LE/BE pin.
LE/BE = 1 selects Little Endian byte ordering and
LE/BE = 0 selects Big Endian byte ordering.
NOTE:
In the following, X indicates a bit not checked in
82586 mode. This bit must be set to 0 in all other
modes.
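The alternative SCP address written through the PORT pin, as described under INITIALIZING THE 82596, combines a 16-byte aligned address on D31-D4 with the function code 0010 on D3-D0. The sketch below only computes that 32-bit value and checks the alignment rules stated in the text. The function names are hypothetical, and how the value is presented across the two required PORT activations is not covered here; consult the 82596 User's Manual for that.

```python
SCP_FUNCTION_CODE = 0b0010   # value required on D3-D0, from the text


def port_alternative_scp(scp_address):
    """Return the 32-bit PORT value selecting an alternative SCP address."""
    if not 0 <= scp_address <= 0xFFFFFFFF:
        raise ValueError("address must fit in 32 bits")
    if scp_address & 0xF:
        raise ValueError("alternative SCP address must be 16-byte aligned")
    return scp_address | SCP_FUNCTION_CODE


def is_word_aligned(address):
    """All pointers and structures must start on an even address (A0 = 0)."""
    return address & 1 == 0


port_value = port_alternative_scp(0x00100000)
```

Because the low four address bits are replaced by the function code, any candidate SCP location should be validated for alignment before it is written.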
The following diagram illustrates the format of the SCPo
31
ODD WORD
X X X X X X X X
0
oj 0
0 0
X X X X X X
xix
X X X X X X X OFFFFF8h
10 0
X X X X X X X X X X X X X X X
0
EVEN WORD
16 15
SYSBUS
xix
A31 ................ A24 A23
0 0
0 0
0 0
0 0
o
OFFFFF4h
AO OFFFFFCh
ISCP ADDRESS
A31 ................ A24 are not checked in 82586 mode
X .................... X areas are not checked in 82586 mode; they must be 0 in all other modes.
23
0- The 32-bit address pointers 'in linear mo.de ore treated
as two 16-bit big endian entities. This is identical to
J l
I I
SYS8US
BE
~
. tho 82596 A 1 stopping definition.
1 - The 32-blt address pointers in linear mode ore treated
as 32-bit big endien entities. This mode is only supported
in the 82596 8 stepping. In this mode the see absolute
address and statistical counters are still treated as two
1 liNT
I I I
LOCK
TRG
16
t.41
t.40
Ix I
U
L
1 0 : Linear mode
1 1 : Reserved
o : internal
16-bit big end ion entities.
Interrupt polarity
Interrupt pin is active
L : NOT CHECKED
0 0 : 82586 mode
o 1 : 32-8it Segmented mode
triggering of the
Bus Throttle timers
o-
1 : external triggering of the
Bus Throttle timers
high
1, - Interrupt pin is active
low
L...--....-..:-
0 : Lock function enabled
1 : Lock function disabled
290218-14
ISCP ADDRESS- The physical address of the ISCP. In the 82586 mode, bits A31-A24 are considered to
be zero.
Figure 15. The System Configuration Pointer
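As an illustration of Figure 15, the following minimal C sketch composes a SYSBUS byte from its fields. The function and macro names are hypothetical, and the bit positions other than bit 7 (BE) are an assumption read from the order of the fields in the figure, so they should be checked against the original drawing before use.

```c
#include <stdint.h>

/* Mode bits M1 M0 of the SYSBUS byte. The value 3 (1 1) is reserved. */
enum sysbus_mode {
    SYSBUS_MODE_82586     = 0, /* 0 0 */
    SYSBUS_MODE_SEGMENTED = 1, /* 0 1 */
    SYSBUS_MODE_LINEAR    = 2  /* 1 0 */
};

/* Assumed bit positions, taken from the field order in Figure 15. */
#define SYSBUS_BE        0x80u /* 32-bit Big Endian entities (B stepping only) */
#define SYSBUS_FIXED_ONE 0x40u /* shown as a constant 1 in the figure */
#define SYSBUS_INT_LOW   0x20u /* interrupt pin active low */
#define SYSBUS_LOCK_OFF  0x10u /* Lock function disabled */
#define SYSBUS_TRG_EXT   0x08u /* external Bus Throttle triggering */

/* Compose the SYSBUS byte; bit 0 (X) is left at 0. */
static uint8_t sysbus_compose(enum sysbus_mode mode, int ext_trigger,
                              int lock_disabled, int int_active_low,
                              int be32)
{
    uint8_t v = SYSBUS_FIXED_ONE;

    v |= (uint8_t)(((unsigned)mode & 0x3u) << 1);
    if (ext_trigger)    v |= SYSBUS_TRG_EXT;
    if (lock_disabled)  v |= SYSBUS_LOCK_OFF;
    if (int_active_low) v |= SYSBUS_INT_LOW;
    if (be32)           v |= SYSBUS_BE;
    return v;
}
```

For example, `sysbus_compose(SYSBUS_MODE_LINEAR, 1, 0, 1, 0)` selects Linear mode with external triggering, Lock enabled, and an active-low interrupt pin.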
Writing the Sysbus
When writing the sysbus byte it is important to pay attention to the byte order.
• When a Little Endian processor is used, the sysbus byte is located at byte address 00FFFFF6h (or address
n + 2 if an alternative SCP address n was programmed).
• When a processor using Big Endian byte ordering is used, the sysbus, alternative SCP, and ISCP addresses
will be different.
  • The sysbus byte is located at 00FFFFF5h.
  • If an alternative SCP address is programmed, the sysbus byte should be at byte address n + 1.
INTERMEDIATE SYSTEM CONFIGURATION POINTER (ISCP)
The ISCP indicates the location of the System Control Block. Often the SCP is in ROM and the ISCP is in RAM.
The CPU loads the SCB address (or an equivalent data structure) into the ISCP and asserts CA. This Channel
Attention signal causes the 82596 to begin its initialization procedure and to get the SCB address from the
ISCP and SCP. In 82586 and 32-bit Segmented modes the SCB base address is also the base address of all
Command Blocks, Frame Descriptors, and Buffer Descriptors (but not buffers). All these data structures must
reside in one 64-KB segment; however, in Linear mode no such limitation is imposed.
The following diagram illustrates the ISCP format.

Offset      Odd word (bits 31-16)        Even word (bits 15-0)
ISCP        SCB OFFSET (A15 ... A0)      BUSY in bits 7-0
ISCP + 4    SCB BASE ADDRESS (A31 ... A0)

Bits A31 ... A24 of the SCB BASE ADDRESS are X in 82586 mode and are address bits in 32-bit Segmented mode.

BUSY       - Indicates that the 82596 is being initialized. The CPU sets BUSY to 01h before it gives
             the first CA to the 82596. BUSY is cleared by the 82596 after the SCB base and offset
             are read. Note that the most significant byte of the first word of the ISCP is not modified
             when BUSY is cleared.
SCB OFFSET - This 16-bit quantity specifies the offset portion of the address of the SCB.
SCB BASE   - Specifies the base portion of the address of the SCB. The base of the SCB is also the base of
             all 82596 Command Blocks, Frame Descriptors, and Buffer Descriptors. In the 82586
             mode, bits A31-A24 are considered to be zero.
Figure 16. The Intermediate System Configuration Pointer-82586 and 32-Bit Segmented Modes
Offset      Odd word (bits 31-16)        Even word (bits 15-0)
ISCP        0                            BUSY in bits 7-0
ISCP + 4    SCB ADDRESS (A31 ... A0)

BUSY        - Indicates that the 82596 is being initialized. BUSY is set to 01h by the CPU before its
              first CA to the 82596. It is cleared by the 82596 after the SCB address is read.
SCB ADDRESS - This 32-bit quantity specifies the physical address of the SCB.
Figure 17. The Intermediate System Configuration Pointer-Linear Mode
INITIALIZATION PROCESS
The CPU sets up the SCP, ISCP, and SCB structures and, if desired, an alternative SCP address. It also
sets BUSY to 01h. The 82596 is initialized when a Channel Attention signal follows a Reset signal, causing the
82596 to access the System Configuration Pointer. The sysbus byte, the operational mode, the bus throttle
timer triggering method, the interrupt polarity, and the state of LOCK are read. After reset the Bus Throttle
timers are essentially disabled: the T-ON value is infinite and the T-OFF value is zero. After the SCP is read, the
82596 reads the ISCP and saves the SCB address. In 82586 and 32-bit Segmented modes this address is
represented as a base address plus the offset (this base address is also the base address of all the control
blocks). In Linear mode the base address is also an absolute address. The 82596 clears BUSY, sets CX and
CNR to equal 1 in the SCB, clears the SCB command word, sends an interrupt to the CPU, and awaits another
Channel Attention signal. RESET configures the 82596 to its default state before CA is asserted.
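The memory setup that precedes the first Channel Attention can be sketched in C as follows. This is only an illustration for a Little Endian host in Linear mode: the helper names are hypothetical, the structures are built as byte images, and the offsets follow one reading of Figures 15 and 17 (sysbus byte at SCP + 2, ISCP address at SCP + 8, BUSY in the lowest byte of the ISCP, SCB address at ISCP + 4).

```c
#include <stdint.h>
#include <string.h>

/* Store a 32-bit value in Little Endian byte order. */
static void put_le32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)(v & 0xFFu);
    p[1] = (uint8_t)((v >> 8) & 0xFFu);
    p[2] = (uint8_t)((v >> 16) & 0xFFu);
    p[3] = (uint8_t)((v >> 24) & 0xFFu);
}

/* Build a 12-byte SCP image: sysbus byte at offset 2, ISCP address at offset 8. */
static void build_scp(uint8_t scp[12], uint8_t sysbus, uint32_t iscp_addr)
{
    memset(scp, 0, 12);
    scp[2] = sysbus;
    put_le32(scp + 8, iscp_addr);
}

/* Build an 8-byte Linear mode ISCP image with BUSY set to 01h. */
static void build_iscp_linear(uint8_t iscp[8], uint32_t scb_addr)
{
    memset(iscp, 0, 8);
    iscp[0] = 0x01;               /* BUSY: cleared by the 82596 after it reads the SCB address */
    put_le32(iscp + 4, scb_addr);
}

/* After CA is asserted, the CPU can watch BUSY to see when the ISCP has been read. */
static int iscp_busy(const volatile uint8_t *iscp)
{
    return iscp[0] != 0;
}
```

A driver would build both images, issue Reset followed by CA, and then wait for `iscp_busy()` to return 0 or for the interrupt described above.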
CONTROLLING THE 82596CA
The host CPU controls the 82596 with the commands, data structures, and methods described in this section.
The CPU and the 82596 communicate through shared memory structures. The 82596 contains two independent
units: the Command Unit and the Receive Unit. The Command Unit executes commands from the CPU,
and the Receive Unit handles frame reception. These two units are controlled and monitored by the CPU
through a shared memory structure called the System Control Block (SCB). The CPU and the 82596 use the
CA and INT signals to communicate with the SCB.
82596 CPU ACCESS INTERFACE (PORT)
The 82596 has a CPU access interface that allows the host CPU to do four things.
• Write an alternative System Configuration Pointer address.
• Write an alternative Dump area pointer and perform Dump.
• Execute a software reset.
• Execute a self-test.
The following events initiate the CPU access state.
• Presence of an address on the D31-D4 data bus pins.
• The D3-D0 pins are used to select one of the four functions.
• The PORT input pin is asserted, as in a regular write cycle.
NOTE:
The SCP, Dump, and Self-Test addresses must be 16-byte aligned.
The 82596 requires two 16-bit write cycles for a port command. The first write holds the internal machines and
reads the first 16 bits; the second activates the PORT command and reads the second 16 bits.
The PORT Reset is useful when only the 82596 needs to be reset. The CPU must wait for 10 system clocks and
5 serial clocks before issuing another CA to the 82596; this new CA begins a new initialization process.
The Dump function is useful for troubleshooting No Response problems. If the chip is in a No Response state,
the PORT Dump operation can be executed and a PORT Reset can be used to reinitialize the 82596 without
disturbing the rest of the system.
The Self-Test function can be used for board testing; the 82596 will execute a self-test and write the results to
memory.
Table 2. PORT Function Selection

Function     D31 ... D4 (Addresses and Results)         D3  D2  D1  D0
Reset        A31 ... A4   Don't Care                     0   0   0   0
Self-Test    A31 ... A4   Self-Test Results Address      0   0   0   1
SCP          A31 ... A4   Alternative SCP Address        0   0   1   0
Dump         A31 ... A4   Dump Area Pointer              0   0   1   1
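The encoding in Table 2 can be sketched in C as shown below. The names are hypothetical; the sketch only forms the 32-bit PORT value and splits it into 16-bit halves, and it makes no claim about which half the hardware expects in the first write cycle.

```c
#include <stdint.h>

/* Function codes carried on D3-D0 (Table 2). */
enum port_function {
    PORT_RESET     = 0x0,
    PORT_SELF_TEST = 0x1,
    PORT_SCP       = 0x2,
    PORT_DUMP      = 0x3
};

/*
 * Combine a 16-byte-aligned address (D31-D4) with a function code (D3-D0).
 * Returns 0 on success, or -1 if a Self-Test, SCP, or Dump address is not
 * 16-byte aligned. The address is a don't-care for Reset.
 */
static int port_command(enum port_function fn, uint32_t addr, uint32_t *cmd)
{
    if (fn != PORT_RESET && (addr & 0xFu) != 0)
        return -1;
    *cmd = (addr & 0xFFFFFFF0u) | (uint32_t)fn;
    return 0;
}

/* The two 16-bit quantities that make up one PORT command. */
static uint16_t port_low_half(uint32_t cmd)  { return (uint16_t)(cmd & 0xFFFFu); }
static uint16_t port_high_half(uint32_t cmd) { return (uint16_t)(cmd >> 16); }
```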
MEMORY ADDRESSING FORMATS
The 82596 accesses memory by 32-bit addresses. There are two types of 32-bit addresses: linear and
segmented. The type of address used depends on the 82596 operating mode and the type of memory structure it
is addressing. The 82596 has three operating modes.
• 82586 Mode
  • A Linear address is a single 24-bit entity. Address pins A31-A24 are always zero.
  • A Segmented address uses a 24-bit base and a 16-bit offset.
• 32-bit Segmented Mode
  • A Linear address is a single 32-bit entity.
  • A Segmented address uses a 32-bit base and a 16-bit offset.
NOTE:
In the previous two memory addressing modes, each command header (CB, TBD, RFD, RBD, and SCB)
must wholly reside within one segment. If the 82596 encounters a memory structure that does not follow this
restriction, the 82596 will fetch the next contiguous location in memory (beyond the segment).
• Linear Mode
  • A Linear address is a single 32-bit entity.
  • There are no Segmented addresses.
Linear addresses are primarily used to address transmit and receive data buffers. In the 82586 and 32-bit
Segmented modes, segmented addresses (base plus offset) are used for all Command Blocks, Buffer Descriptors,
Frame Descriptors, and System Control Blocks. When using Segmented addresses, only the offset
portion of the entity being addressed is specified in the block. The base for all offsets is the same: that of the
SCB. See Table 1.
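A short C sketch of the base-plus-offset arithmetic follows. The names are hypothetical, and the handling of 82586 mode (masking the base to 24 bits) is an assumption drawn from the statement that A31-A24 are always zero in that mode.

```c
#include <stdint.h>

enum seg_mode { SEG_MODE_82586, SEG_MODE_32BIT };

/* Physical address of a structure given the SCB base and its 16-bit offset. */
static uint32_t segmented_to_physical(enum seg_mode mode, uint32_t base,
                                      uint16_t offset)
{
    if (mode == SEG_MODE_82586)
        base &= 0x00FFFFFFu;      /* 24-bit base: A31-A24 are zero */
    return base + offset;
}

/* A header must lie wholly inside the 64-KB segment that starts at the base. */
static int header_fits_in_segment(uint16_t offset, uint32_t size)
{
    return (uint32_t)offset + size <= 0x10000u;
}
```

A driver could call `header_fits_in_segment()` when it lays out its control blocks, since the NOTE above says the 82596 does not wrap at the segment boundary but continues into the next contiguous location.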
LITTLE ENDIAN AND BIG ENDIAN BYTE ORDERING
The 82596 supports both Little Endian and Big Endian byte ordering for its memory structures.
The 82596 A1 stepping supports Big Endian byte ordering for word and byte entities. Dword entities are not
supported with 82596 A1 Big Endian byte ordering. This results in slightly different 82596 A1 memory
structures for Big Endian operation. These structures are defined in the 32-Bit LAN Components User's Manual.
The 82596 B stepping supports Big Endian byte ordering for Linear mode only. All 82596 B 32-bit address
pointers are treated as 32-bit Big Endian entities; however, the SCB absolute address and statistical counters
are treated as two 16-bit Big Endian entities. This 32-bit Big Endian entity support is configured through bit 7 in
the SYSBUS byte.
NOTE:
All 82596 memory entities must be word or dword aligned, except that the transmit buffers can be byte aligned
for the 82596 B stepping.
An example of a dword entity is a frame descriptor command/status dword, whereas the raw data of the frame
are byte entities. Both 32- and 16-bit buses are supported. When a 16-bit bus is used with Big Endian memory
organization, data lines D15-D0 are used. The 82596 has an internal crossover that handles these swap
operations.
COMMAND UNIT (CU)
The Command Unit is the logical unit that executes Action Commands from a list of commands very similar to
a CPU program. A Command Block is associated with each Action Command. The CU is modeled as a logical
machine that takes, at any given time, one of the following states.
• Idle. The CU is not executing a command and is not associated with a CB on the list. This is the initial state.
• Suspended. The CU is not executing a command; however, it is associated with a CB on the list.
• Active. The CU is executing an Action Command and pointing to its CB.
The CPU can affect CU operation in two ways: by issuing a CU Control Command or by setting bits in the
Command word of the Action Command.
RECEIVE UNIT (RU)
The Receive Unit is the logical unit that receives frames and stores them in memory. The RU is modeled as a
logical machine that takes, at any given time, one of the following states.
• Idle. The RU has no memory resources and is discarding incoming frames. This is the initial state.
• No Resources. The RU has no memory resources and is discarding incoming frames. This state differs
from Idle in that the RU accumulates statistics on the number of discarded frames.
• Suspended. The RU has memory available for storing frames, but is discarding them. The suspend state
can only be reached if the CPU forces this through the SCB or sets the suspend bit in the RFD.
• Ready. The RU has memory available and is storing incoming frames.
The CPU can affect RU operation in three ways: by issuing an RU Control Command, by setting bits in the
Frame Descriptor Command word of the frame being received, or by setting the EL bit of the current buffer's
Buffer Descriptor.
SYSTEM CONTROL BLOCK (SCB)
The SCB is a memory block that plays a major role in communications between the CPU and the 82596. Such
communications include the following.
• Commands issued by the CPU
• Status reported by the 82596
Control commands are sent to the 82596 by writing them into the SCB and then asserting CA. The 82596
examines the command, performs the required action, and then clears the SCB command word. Control
commands perform the following types of tasks.
• Operation of the Command Unit (CU). The SCB controls the CU by specifying the address of the Command
Block List (CBL) and by starting, suspending, resuming, or aborting execution of CBL commands.
• Operation of the Bus Throttle. The SCB controls the Bus Throttle timers by providing them with new values
and sending the Load and Start timer commands. The timers can be operated in both the 32-bit Segmented
and Linear modes.
• Reception of frames by the Receive Unit (RU). The SCB controls the RU by specifying the address of the
Receive Frame Area and by starting, suspending, resuming, or aborting frame reception.
• Acknowledgment of events that cause interrupts.
• Resetting the chip.
The 82596 sends status reports to the CPU via the System Control Block. The SCB contains four types of
status reports.
• The cause of the current interrupts. These interrupts are caused by one or more of the following 82596
events.
  • The Command Unit completes an Action Command that has its I bit set.
  • The Receive Unit receives a frame.
  • The Command Unit becomes inactive.
  • The Receive Unit becomes not ready.
• The status of the Command Unit.
• The status of the Receive Unit.
• Status reports from the 82596 regarding reception of corrupted frames.
Events can be cleared only by CPU acknowledgment. If some events are not acknowledged by the ACK field,
the Interrupt signal (INT) will be reissued after Channel Attention (CA) is processed. Furthermore, if a new
event occurs while an interrupt is set, the interrupt is temporarily cleared to trigger edge-triggered interrupt
controllers.
The CPU uses the Channel Attention line to cause the 82596 to examine the SCB. This signal is trailing-edge
triggered: the 82596 latches CA on the trailing edge. The latch is cleared by the 82596 before the SCB
control command is read.
Offset      Odd word (bits 31-16)                   Even word (bits 15-0)
SCB         ACK | X | CUC | R | RUC | X X X X       STAT | 0 | CUS | 0 | RUS | 0 0 0 0
SCB + 4     RFA OFFSET                              CBL OFFSET
SCB + 8     ALIGNMENT ERRORS                        CRC ERRORS
SCB + 12    OVERRUN ERRORS                          RESOURCE ERRORS
Figure 18. SCB-82586 Mode
Offset      Odd word (bits 31-16)                   Even word (bits 15-0)
SCB         ACK | 0 | CUC | R | RUC | 0 0 0 0       STAT | 0 | CUS | RUS | T | 0 0 0
SCB + 4     RFA OFFSET                              CBL OFFSET
SCB + 8     CRC ERRORS
SCB + 12    ALIGNMENT ERRORS
SCB + 16    RESOURCE ERRORS (*)
SCB + 20    OVERRUN ERRORS (*)
SCB + 24    RCVCDT ERRORS (*)
SCB + 28    SHORT FRAME ERRORS
SCB + 32    T-ON TIMER and T-OFF TIMER (one 16-bit word each)
*In Monitor mode these counters change function.
Figure 19. SCB-32-Bit Segmented Mode
Offset      Odd word (bits 31-16)                   Even word (bits 15-0)
SCB         ACK | 0 | CUC | R | RUC | 0 0 0 0       STAT | 0 | CUS | RUS | T | 0 0 0
SCB + 4     COMMAND BLOCK ADDRESS
SCB + 8     RECEIVE FRAME AREA ADDRESS
SCB + 12    CRC ERRORS
SCB + 16    ALIGNMENT ERRORS
SCB + 20    RESOURCE ERRORS (*)
SCB + 24    OVERRUN ERRORS (*)
SCB + 28    RCVCDT ERRORS (*)
SCB + 32    SHORT FRAME ERRORS
SCB + 36    T-ON TIMER and T-OFF TIMER (one 16-bit word each)
*In Monitor mode these counters change function.
Figure 20. SCB-Linear Mode
Command Word

Bits 31-28   Bit 27   Bits 26-24   Bit 23   Bits 22-20   Bits 19-16     Location
ACK          0        CUC          R        RUC          0 0 0 0        SCB + 2

These bits specify the action to be performed as a result of a CA. This word is set by the CPU and cleared by
the 82596. Defined bits are:
Bit 31 ACK-CX    - Acknowledges that the CU completed an Action Command.
Bit 30 ACK-FR    - Acknowledges that the RU received a frame.
Bit 29 ACK-CNA   - Acknowledges that the Command Unit became not active.
Bit 28 ACK-RNR   - Acknowledges that the Receive Unit became not ready.
Bits 24-26 CUC   - (3 bits) This field contains the command to the Command Unit. Valid values are:
                   0 - NOP (does not affect current state of the unit).
                   1 - Start execution of the first command on the CBL. If a command is executing,
                       complete it before starting the new CBL. The beginning of the CBL is in CBL
                       OFFSET (address).
                   2 - Resume the operation of the Command Unit by executing the next command.
                       This operation assumes that the Command Unit has been previously suspended.
                   3 - Suspend execution of commands on CBL after current command is complete.
                   4 - Abort current command immediately.
                   5 - Loads the Bus Throttle timers so they will be initialized with their new values
                       after the active timer (T-ON or T-OFF) reaches Terminal Count. If no timer is
                       active, new values will be loaded immediately. This command is not valid in
                       82586 mode.
                   6 - Loads and immediately restarts the Bus Throttle timers with their new values.
                       This command is not valid in 82586 mode.
                   7 - Reserved.
Bit 23 RESET     - Reset chip (logically the same as hardware RESET).
Bits 20-22 RUC   - (3 bits) This field contains the command to the Receive Unit. Valid values are:
                   0 - NOP (does not alter current state of unit).
                   1 - Start reception of frames. The beginning of the RFA is contained in the RFA
                       OFFSET (address). If a frame is being received, complete reception before
                       starting.
                   2 - Resume frame reception (only when in suspended state).
                   3 - Suspend frame reception. If a frame is being received, complete its reception
                       before suspending.
                   4 - Abort receiver operation immediately.
                   5-7 - Reserved.
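The field positions listed above map onto a 16-bit word as in the following C sketch. Macro and function names are hypothetical; the bit numbers are those of the listing, shifted down by 16 because the command word is the upper half of the first SCB dword.

```c
#include <stdint.h>

/* ACK bits (dword bits 31-28, word bits 15-12). */
#define SCB_ACK_CX   0x8000u
#define SCB_ACK_FR   0x4000u
#define SCB_ACK_CNA  0x2000u
#define SCB_ACK_RNR  0x1000u
#define SCB_RESET    0x0080u   /* dword bit 23 */

/* CUC values (dword bits 26-24, word bits 10-8). */
enum scb_cuc { CUC_NOP = 0, CUC_START = 1, CUC_RESUME = 2, CUC_SUSPEND = 3,
               CUC_ABORT = 4, CUC_LOAD_TIMERS = 5, CUC_LOAD_RESTART_TIMERS = 6 };

/* RUC values (dword bits 22-20, word bits 6-4). */
enum scb_ruc { RUC_NOP = 0, RUC_START = 1, RUC_RESUME = 2, RUC_SUSPEND = 3,
               RUC_ABORT = 4 };

/* Compose the SCB command word from acknowledge bits and unit commands. */
static uint16_t scb_command(uint16_t ack_bits, enum scb_cuc cuc, enum scb_ruc ruc)
{
    return (uint16_t)((ack_bits & 0xF000u)
                      | (((unsigned)cuc & 0x7u) << 8)
                      | (((unsigned)ruc & 0x7u) << 4));
}
```

The CPU would write the result to SCB + 2 and then assert CA; the 82596 clears the word once it has accepted the command.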
Status Word

                                     Bits 15-12   Bit 11   Bits 10-8   Bits 7-4      Bit 3   Bits 2-0   Location
82586 mode                           STAT         0        CUS         0 | RUS       0       0 0 0      SCB
32-Bit Segmented and Linear modes    STAT         0        CUS         RUS           T       0 0 0      SCB

Indicates the status of the 82596. This word is modified only by the 82596. Defined bits are:
Bit 15 CX       - The CU finished executing a command with its I (interrupt) bit set.
Bit 14 FR       - The RU finished receiving a frame.
Bit 13 CNA      - The Command Unit left the Active state.
Bit 12 RNR      - The Receive Unit left the Ready state.
Bits 8-10 CUS   - (3 bits) This field contains the status of the command unit. Valid values are:
                  0 - Idle
                  1 - Suspended
                  2 - Active
                  3-7 - Not used
Bits 4-7 RUS    - This field contains the status of the receive unit. Valid values are:
                  0h (0000) - Idle
                  1h (0001) - Suspended
                  2h (0010) - No Resources. This value indicates both no resources due to lack of
                              RFDs in the RDL and no resources due to lack of RBDs in the FBL.
                  4h (0100) - Ready
                  Ah (1010) - No resources due to no more RBDs (not in the 82586 mode).
                  Ch (1100) - No more RBDs (not in 82586 mode).
                  No other combinations are allowed.
Bit 3 T         - Bus Throttle timers loaded (not in 82586 mode).
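A minimal C sketch of decoding the status word is given below. The struct and function names are hypothetical; the bit positions are the ones listed above.

```c
#include <stdint.h>

/* Decoded view of the SCB status word. */
struct scb_status {
    int cx;        /* bit 15 */
    int fr;        /* bit 14 */
    int cna;       /* bit 13 */
    int rnr;       /* bit 12 */
    unsigned cus;  /* bits 10-8 */
    unsigned rus;  /* bits 7-4 */
    int t;         /* bit 3 */
};

static struct scb_status scb_decode_status(uint16_t w)
{
    struct scb_status s;

    s.cx  = (w >> 15) & 1;
    s.fr  = (w >> 14) & 1;
    s.cna = (w >> 13) & 1;
    s.rnr = (w >> 12) & 1;
    s.cus = (w >> 8) & 0x7u;
    s.rus = (w >> 4) & 0xFu;
    s.t   = (w >> 3) & 1;
    return s;
}

/* Only the RUS encodings listed above are allowed. */
static int rus_is_defined(unsigned rus)
{
    switch (rus) {
    case 0x0: case 0x1: case 0x2: case 0x4: case 0xA: case 0xC:
        return 1;
    default:
        return 0;
    }
}
```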
SCB OFFSET ADDRESSES
CBL Offset (Address)
In 82586 and 32-bit Segmented modes this 16-bit quantity indicates the offset portion of the address for the
first Command Block on the CBL. In Linear mode it is a 32-bit linear address for the first Command Block on
the CBL. It is accessed only if CUC equals Start.
RFA Offset (Address)
In 82586 and 32-bit Segmented modes this 16-bit quantity indicates the offset portion of the address for the
Receive Frame Area. In Linear mode it is a 32-bit linear address for the Receive Frame Area. It is accessed
only if RUC equals Start.
SCB STATISTICAL COUNTERS
Statistical Counter Operation
• The CPU is responsible for clearing all error counters before initializing the 82596. The 82596 updates
these counters by reading them, adding 1, and then writing them back to the SCB.
• The counters are wraparound counters. After reaching FFFFFFFFh the counters wrap around to zero.
• The 82596 updates the required counters for each frame. It is possible for more than one counter to be
updated; multiple errors will result in all affected counters being updated.
• The 82596 executes the read-counter/increment/write-counter operation without relinquishing the bus
(locked operation). This is to ensure that no logical contention exists between the 82596 and the CPU due
to both attempting to write to the counters simultaneously. In the dual-port memory configuration the CPU
should not execute any write operation to a counter if LOCK is asserted.
• The counters are 32 bits wide and their behavior is fully compatible with the IEEE 802.3 standard. The
82596 supports all relevant statistics (mandatory, optional, and desired) through the status of the transmit
and receive header and directly through SCB statistics.
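Because the counters wrap from FFFFFFFFh to zero, host software that samples them periodically can rely on unsigned 32-bit subtraction to get the number of events between two samples. The sketch below uses a hypothetical helper name and assumes fewer than 2^32 events occur between samples.

```c
#include <stdint.h>

/* Events counted between two samples of a 32-bit wraparound counter. */
static uint32_t counter_delta(uint32_t previous, uint32_t current)
{
    /* Unsigned arithmetic is modulo 2^32, so a wrap through zero is handled. */
    return current - previous;
}
```

Sampling this way only reads the counters, which avoids the write contention described above.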
CRCERRS
This 32-bit quantity contains the number of aligned frames discarded because of a CRC error. This counter is
updated, if needed, regardless of the RU state.
ALNERRS
This 32-bit quantity contains the number of frames that both are misaligned (i.e., where CRS deasserts on a
nonoctet boundary) and contain a CRC error. The counter is updated, if needed, regardless of the RU state.
SHRTFRM
This 32-bit quantity contains the number of received frames shorter than the minimum frame length.
The following three counters (RSCERRS, OVRNERRS, and RCVCDT) change function in Monitor mode.
RSCERRS
This 32-bit quantity contains the number of good frames discarded because there were no resources to
contain them. Frames intended for a host whose RU is in the No Receive Resources state fall into this
category. This counter is updated only if the RU is in the No Resources state. When in Monitor mode this
counter counts the total number of frames, good and bad.
OVRNERRS
This 32-bit quantity contains the number of frames known to be lost because the local system bus was not
available. If the traffic problem lasts longer than the duration of one frame, the frames that follow the first are
lost without an indicator, and they are not counted. This counter is updated, if needed, regardless of the RU
state.
RCVCDT
This 32-bit quantity contains the number of collisions detected during frame reception. In Monitor mode this
counter counts the total number of good frames.
ACTION COMMANDS AND OPERATING MODES
This section lists all the Action Commands of the Command Unit Command Block List (CBL). Each command
contains the Command field, the Status and Control fields, the link to the next Action Command, and any
command-specific parameters. There are three basic types of action commands: 82596 Configuration and
Setup, Transmission, and Diagnostics. The following is a list of the actual commands.
• NOP
• Individual Address Setup
• Configure
• MC Setup
• Transmit
• TDR
• Dump
• Diagnose
The 82596 has three addressing modes. In the 82586 mode all the Action Commands look exactly like those
of the 82586.
• 82586 Mode. The 82596 software and memory structure is compatible with the 82586.
• 32-Bit Segmented Mode. The 82596 can access the entire system memory and use the two new memory
structures (Simplified and Flexible) while still using the segmented approach. This does not require any
significant changes to existing software.
• Linear Mode. The 82596 operates in a flat, linear, 4-gigabyte memory space without segmentation. It can
also use the two new memory structures.
In the 32-bit Segmented mode there are some differences between the 82596 and 82586 action commands,
mainly in programming and activating new 82596 features. Those bits marked "don't care" in the compatible
mode are not checked; however, we strongly recommend that those bits all be zeroes; this will allow future
enhancements and extensions.
In the Linear mode all of the address offsets become 32-bit address pointers. All new 82596 features are
accessible in this mode, and all bits previously marked "don't care" must be zeroes.
The Action Commands, and all other 82596 memory structures, must begin on even byte boundaries, i.e., they
must be word aligned.
NOP
This command results in no action by the 82596 except for those performed in the normal command processing.
It is used to manipulate the CBL. The format of the NOP command is shown in Figure 21.
NOP-82586 and 32-Bit Segmented Modes
Offset   Odd word (bits 31-16)                      Even word (bits 15-0)
0        EL | S | I | X ... X | CMD = 0 0 0          C | B | OK | 0 ... 0
4        X ... X                                    LINK OFFSET (A15 ... A0)

NOP-Linear Mode
Offset   Odd word (bits 31-16)                      Even word (bits 15-0)
0        EL | S | I | 0 ... 0 | CMD = 0 0 0          C | B | OK | 0 ... 0
4        LINK ADDRESS (A31 ... A0)
Figure 21
where:
LINK POINTER      - In the 82586 or 32-bit Segmented modes this is a 16-bit offset to the next Command
                    Block. In the Linear mode this is the 32-bit address of the next Command Block.
EL                - If set, this bit indicates that this command block is the last on the CBL.
S                 - If set to one, suspend the CU upon completion of this CB.
I                 - If set to one, the 82596 will generate an interrupt after execution of the command is
                    complete. If I is not set to one, the CX bit will not be set.
CMD (bits 16-18)  - The NOP command. Value: 0h.
Bits 19-28        - Reserved (zero in the 32-bit Segmented and Linear modes).
C                 - This bit indicates the execution status of the command. The CPU initially resets it to zero
                    when the Command Block is placed on the CBL. Following a command completion, the
                    82596 will set it to one.
B                 - This bit indicates that the 82596 is currently executing the NOP command. It is initially
                    reset to zero by the CPU. The 82596 sets it to one when execution begins and to zero
                    when execution is completed. This bit is also set when the 82596 prefetches the command.
NOTE:
The C and B bits are modified in one operation.
OK                - Indicates that the command was executed without error. If set to one, no error occurred
                    (command executed OK). If zero, an error occurred.
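The common Command Block header fields can be expressed in C roughly as follows. The names are hypothetical; the command-word positions follow the bit numbers given above (EL, S, and I are taken to be bits 31 to 29 from their position in Figure 21), and the status positions of C, B, OK, and A are an assumption read from the figures.

```c
#include <stdint.h>

/* Command word (odd word): dword bits 31-16 shown as word bits 15-0. */
#define CB_EL  0x8000u  /* last command block on the CBL */
#define CB_S   0x4000u  /* suspend the CU after this CB */
#define CB_I   0x2000u  /* interrupt after completion */

enum cb_cmd { CB_CMD_NOP = 0, CB_CMD_IA_SETUP = 1, CB_CMD_CONFIGURE = 2 };

/* Status word (even word); positions assumed from the figures. */
#define CB_STAT_C   0x8000u
#define CB_STAT_B   0x4000u
#define CB_STAT_OK  0x2000u
#define CB_STAT_A   0x1000u

/* Compose the command word of an Action Command. */
static uint16_t cb_command(enum cb_cmd cmd, int el, int s, int i)
{
    uint16_t w = (uint16_t)((unsigned)cmd & 0x7u);

    if (el) w |= CB_EL;
    if (s)  w |= CB_S;
    if (i)  w |= CB_I;
    return w;
}

/* True when the 82596 has completed the command without error. */
static int cb_completed_ok(uint16_t status)
{
    return (status & CB_STAT_C) != 0 && (status & CB_STAT_OK) != 0;
}
```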
Individual Address Setup
This command is used to load the 82596 with the Individual Address. This address is used by the 82596 for
inserting the Source Address during transmission and recognizing the Destination Address during reception.
After RESET, and prior to Individual Address Setup Command execution, the 82596 assumes the Broadcast
Address is the Individual Address in all aspects, i.e.:
• This will be the Individual Address Match reference.
• This will be the Source Address of a transmitted frame (for AL-LOC = 0 mode only).
The format of the Individual Address Setup command is shown in Figure 22.
IA Setup-82586 and 32-Bit Segmented Modes
Offset   Odd word (bits 31-16)                      Even word (bits 15-0)
0        EL | S | I | X ... X | CMD = 0 0 1          C | B | OK | A | 0 ... 0
4        INDIVIDUAL ADDRESS 2nd byte, 1st byte      LINK OFFSET (A15 ... A0)
8        INDIVIDUAL ADDRESS 6th byte, 5th byte      INDIVIDUAL ADDRESS 4th byte, 3rd byte

IA Setup-Linear Mode
Offset   Odd word (bits 31-16)                      Even word (bits 15-0)
0        EL | S | I | 0 ... 0 | CMD = 0 0 1          C | B | OK | A | 0 ... 0
4        LINK ADDRESS (A31 ... A0)
8        INDIVIDUAL ADDRESS 4th byte, 3rd byte      INDIVIDUAL ADDRESS 2nd byte, 1st byte
12                                                  INDIVIDUAL ADDRESS 6th byte, 5th byte
Figure 22
where:
LINK ADDRESS, EL, B, C, I, S - As per standard Command Block (see the NOP command for details).
A                  - Indicates that the command was abnormally terminated due to CU Abort control
                     command. If one, then the command was aborted, and if necessary it should be
                     repeated. If this bit is zero, the command was not aborted.
Bits 19-28         - Reserved (zero in the 32-bit Segmented and Linear modes).
CMD (bits 16-18)   - The Address Setup command. Value: 1h.
INDIVIDUAL ADDRESS - The individual address of the node, 0 to 6 bytes long.
The least significant bit of the Individual Address must be zero for Ethernet (see the Command Structure).
However, no enforcement of 0 is provided by the 82596. Thus, an Individual Address with 1 as its least
significant bit is a valid Individual Address in all aspects.
The default address length is 6 bytes, as in 802.3. If a different length is used, the IA Setup command
should be executed after the Configure command.
Configure
The Configure command loads the 82596 with its operating parameters. It allows changing some of the
parameters by specifying a byte count less than the maximum number of configuration bytes (11 in the 82586
mode, 14 in the 32-Bit Segmented and Linear modes). The 82596 configuration depends on its mode of
operation. When configuring the 12th byte (Byte 11 undefined) in 82586 mode this byte should be all ones.
• In the 82586 mode the maximum number of configuration bytes is 12. Any number larger than 12 will be
reduced to 12 and any number less than 4 will be increased to 4.
• The additional features of the serial side are disabled in the 82586 mode.
• In both the 32-Bit Segmented and Linear modes there are four additional configuration bytes, which hold
parameters for additional 82596 features. If these parameters are not accessed, the 82596 will follow their
default values.
• For more detailed information refer to the 32-Bit LAN Components User's Manual.
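The byte-count adjustment described for 82586 mode amounts to a clamp, sketched here in C with a hypothetical function name. It mirrors what the text says the 82596 does internally, so a driver could use it to predict the effective count.

```c
/* Effective Configure byte count in 82586 mode: limited to the range 4..12. */
static unsigned config_byte_count_82586(unsigned requested)
{
    if (requested > 12u)
        return 12u;
    if (requested < 4u)
        return 4u;
    return requested;
}
```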
The format of the Configure command is shown in Figures 23, 24, and 25.

Offset   Odd word (bits 31-16)                      Even word (bits 15-0)
0        EL | S | I | X ... X | CMD = 0 1 0          C | B | OK | A | 0 ... 0
4        Byte 1, Byte 0                             LINK OFFSET (A15 ... A0)
8        Byte 5, Byte 4                             Byte 3, Byte 2
12       Byte 9, Byte 8                             Byte 7, Byte 6
16       X ... X                                    X ... X, Byte 10
Figure 23. CONFIGURE-82586 Mode

Offset   Odd word (bits 31-16)                      Even word (bits 15-0)
0        EL | S | I | 0 ... 0 | CMD = 0 1 0          C | B | OK | A | 0 ... 0
4        Byte 1, Byte 0                             LINK OFFSET (A15 ... A0)
8        Byte 5, Byte 4                             Byte 3, Byte 2
12       Byte 9, Byte 8                             Byte 7, Byte 6
16       Byte 13, Byte 12                           Byte 11, Byte 10
Figure 24. CONFIGURE-32-Bit Segmented Mode

Offset   Odd word (bits 31-16)                      Even word (bits 15-0)
0        EL | S | I | 0 ... 0 | CMD = 0 1 0          C | B | OK | A | 0 ... 0
4        LINK ADDRESS (A31 ... A0)
8        Byte 3, Byte 2                             Byte 1, Byte 0
12       Byte 7, Byte 6                             Byte 5, Byte 4
16       Byte 11, Byte 10                           Byte 9, Byte 8
20       X ... X                                    Byte 13, Byte 12
Figure 25. CONFIGURE-Linear Mode
LINK ADDRESS, EL, B, C, I, S - As per standard Command Block (see the NOP command for details).
A                  - Indicates that the command was abnormally terminated due to a CU Abort control
                     command. If 1, then the command was aborted and if necessary it should be repeated.
                     If this bit is 0, the command was not aborted.
Bits 19-28         - Reserved (zero in the 32-Bit Segmented and Linear Modes).
CMD (bits 16-18)   - The CONFIGURE command. Value: 2h.
The interpretation of the fields follows:

BYTE 0
Bit 7   Bits 6-4   Bits 3-0
P       X X X      BYTE COUNT

BYTE CNT (Bits 0-3)   - Byte Count. Number of bytes, including this one, that hold parameters to be configured.
PREFETCHED (Bit 7)    - Enable the 82596 to write the prefetched bit in all prefetch RBDs.
NOTE:
The P bit is valid only in the new memory structure modes. In 82586 mode this bit is disabled (i.e., no
prefetched mark).
BYTE 1
Bits 7-6     Bits 5-4   Bits 3-0
MONITOR #    X X        FIFO LIMIT

FIFO Limit (Bits 0-3)   - FIFO limit.
MONITOR # (Bits 6-7)    - Receive monitor options. If the Byte Count of the configure
                          command is less than 12 bytes then these Monitor bits are ignored.
DEFAULT: C8h
BYTE 2
SAV BF (Bit 7)      0 - Received bad frames are not saved in the memory.
                    1 - Received bad frames are saved in the memory.
RESUME_RD (Bit 1)   0 - The 82596 does not reread the next CB on the list when a CU Resume
                        Control Command is issued.
                    1 - The 82596 will reread the next CB on the list when a CU Resume
                        Control Command is issued. This is available only on the 82596 B stepping.
DEFAULT: 40h
7
LOOP BACK
MODE
ADDRESS LENGTH
PREAMBLE LENGTH
BYTE 3
ADR LEN (Bits 0-2)
NO SCR ADD INS (Bit 3)
PREAM LEN (Bits 4-5)
LP BCK MODE (Bits 6-7)
DEFAULT: 26h
Address length (any kind).
No Source Address Insertion.
In the 82586 this bit is called AL LOC.
Preamble length.
Loopback mode.
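As a check on the Byte 3 field boundaries, the following C sketch packs the four fields into one byte. The function name is hypothetical and the meaning of the individual field values is not covered here; the sketch only handles the bit positions listed above. Packing an address-length field of 6 and a preamble-length field of 2 with the other fields at zero yields 26h, which matches the documented default.

```c
#include <stdint.h>

/* Pack the Configure Byte 3 fields into their bit positions. */
static uint8_t config_byte3(unsigned adr_len, int no_src_add_ins,
                            unsigned pream_len, unsigned loopback_mode)
{
    return (uint8_t)((adr_len & 0x7u)               /* bits 0-2 */
                     | (no_src_add_ins ? 0x08u : 0u) /* bit 3 */
                     | ((pream_len & 0x3u) << 4)     /* bits 4-5 */
                     | ((loopback_mode & 0x3u) << 6) /* bits 6-7 */);
}

/* Extract the address-length field from a Byte 3 value. */
static unsigned config_byte3_adr_len(uint8_t byte3)
{
    return byte3 & 0x7u;
}
```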
BYTE 4
Bit 7       Bits 6-4                Bit 3   Bits 2-0
BOF METD    EXPONENTIAL PRIORITY    0       LINEAR PRIORITY

LIN PRIO (Bits 0-2)   - Linear Priority.
EXP PRIO (Bits 4-6)   - Exponential Priority.
BOF METD (Bit 7)
DEFAULT: 00h
[Figure 37 diagram: the Command Block List pointer, the Receive Frame Descriptors with their STATUS fields, the Receive Buffer Descriptors (RBD1 to RBD5) with their ACT-cnt fields, and the Receive Buffers (Buffer 1 to Buffer 5), divided into the Receive Frame List (valid data) and the Free Frame List (empty).]
Figure 37. The Receive Frame Area
Note that this sequence is very useful for monitoring. If the 82596 is configured to Save Bad Frames, to
receive in Promiscuous mode, and to use the Simplified memory structure, any programmed length of received
data can be saved in memory.
The Simplified memory structure is shown in Figure 38.
[Figure 38 diagram: the SCB (STATUS, CBL POINTER, RFA POINTER, STATISTICS, BUS THROTTLE) points to the Command List and to the Receive Frame Area. The Receive Frame Area holds Receive Frame Descriptors FD1 to FD4, each with a STATUS field and a variable data field, divided into the Receive Frame List and the Free Frame List (empty).]
Figure 38. RFA Simplified Memory Structure
Flexible Memory Structure
In the second structure, the Flexible memory structure, the received frame is stored in both the RFD and a
linked list of Receive Buffers and Receive Buffer Descriptors. The received frame is placed
in the RFD as configured in the Size field. Any remaining data is placed in a linked list of RBDs.
The Flexible memory structure is shown in Figure 39.
[Figure 39 diagram: the SCB (STATUS, CBL POINTER, RFA POINTER, STATISTICS, BUS THROTTLE) points to the Command List and to the Receive Frame Area. The Receive Frame Area holds Receive Frame Descriptors FD1 to FD4 (control field and variable data field), the Receive Buffer Descriptors, and the Receive Buffers (Buffer 1 to Buffer 3), divided into the Receive Frame List and the free (empty) entries.]
Figure 39. RFA Flexible Memory Structure
Buffers on the receive side can be different lengths. The 82596 will not place more bytes into a buffer than
indicated in the associated RBD. The 82596 will fetch the next RBD before it is needed. The 82596 will
attempt to receive frames as long as the FBL is not exhausted. If there are no more buffers, the 82596
Receive Unit will enter the No Resources state. Before starting the RU, the CPU must place the FBL pointer in
the RBD pointer field of the first RFD. All remaining RBD pointer fields for subsequent RFDs should be all 1s. If
the Receive Frame Descriptor and the associated Receive Buffers are not reused (e.g., the frame is properly
received or the 82596 is configured to Save Bad Frames), the 82596 writes the address of the next free RBD
to the RBD pointer field of the next RFD.
Receive Buffer Descriptor (RBD)
The RBDs are used to store received data in a flexible set of linked buffers. The portion of the frame's data
field that is outside the RFD is placed in a set of buffers chained by a sequence of RBDs. The RFD points to
the first RBD, and the last RBD is flagged with an EOF bit set to 1. Each buffer in the linked list of buffers
related to a particular frame can be any size up to 2^14 bytes but must be word aligned (begin on an
even-numbered byte). This ensures optimum use of the memory resources while maintaining low overhead. All
buffers in a frame are filled with the received data except for the last, in which the actual count can be smaller
than the allocated buffer space.
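The split of a frame between the RFD and the RBD chain can be illustrated with a small host-side model. The Python sketch below is not driver code and does not touch hardware. The names `RBD` and `place_frame` and their fields are hypothetical, chosen only to mirror the SIZE, ACTUAL COUNT, EOF and F fields described in this section, and the model assumes that the first `rfd_size` bytes go to the RFD and the rest fill the buffers in list order.

```python
from dataclasses import dataclass


@dataclass
class RBD:
    """Hypothetical model of one Receive Buffer Descriptor."""
    size: int               # SIZE: capacity of the associated buffer, in bytes
    actual_count: int = 0   # ACTUAL COUNT: bytes actually written
    eof: bool = False       # EOF: last buffer of the frame
    f: bool = False         # F: buffer has been used


def place_frame(frame_len, rfd_size, rbds):
    """Model where the bytes of one received frame end up.

    Returns (bytes stored in the RFD, list of RBDs used for the remainder).
    """
    in_rfd = min(frame_len, rfd_size)
    remaining = frame_len - in_rfd
    used = []
    for rbd in rbds:
        if remaining == 0:
            break
        # Never place more bytes in a buffer than its RBD allows.
        rbd.actual_count = min(remaining, rbd.size)
        rbd.f = True
        remaining -= rbd.actual_count
        used.append(rbd)
    if remaining:
        # Corresponds to the Receive Unit entering the No Resources state.
        raise RuntimeError("free buffer list exhausted")
    if used:
        used[-1].eof = True  # only the last buffer of the frame is flagged
    return in_rfd, used
```

For example, a 200-byte frame with a 14-byte RFD data area and buffers of 64, 128 and 64 bytes fills the first buffer completely, leaves 122 bytes in the second (flagged EOF), and leaves the third buffer untouched on the free list.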
[Figure 40. Receive Frame Descriptor-82586 Mode. Fields shown: EL, S, C, B, OK, status bits; link offset; RBD offset; destination address; source address; length field.]
[Figure 41. Receive Frame Descriptor-32-Bit Segmented Mode. Fields shown: EL, S, SF, C, B, OK, status bits; RBD offset; link offset; size; EOF, F, actual count; destination address; source address; length field; optional data area.]
[Figure 42. Receive Frame Descriptor-Linear Mode. Fields shown: EL, S, SF, C, B, OK, status bits; link address; receive buffer descriptor address; size; EOF, F, actual count; destination address; source address; length field; optional data area.]
where:
EL - When set, this bit indicates that this RFD is the last one on the RDL.
S - When set, this bit suspends the RU after receiving the frame.
SF - This bit selects between the Simplified or the Flexible mode.
  0 - Simplified mode, all the RX data is in the RFD. RBD ADDRESS field is all "1s."
  1 - Flexible mode. Data is in the RFD and in a linked list of Receive Buffer Descriptors.
C - This bit indicates the completion of frame reception. It is set by the 82596.
B - This bit indicates that the 82596 is currently receiving this frame, or that the 82596 is ready to receive the frame. It is initially set to 0 by the CPU. The 82596 sets it to 1 when reception set up begins, and to 0 upon completion. The C and B bits are set during the same operation.
OK (bit 13) - Frame received successfully, without errors. RFDs with bit 13 equal to 0 are possible only if the save bad frames configuration option is selected. Otherwise all frames with errors will be discarded, although statistics will be collected on them.
STATUS - The results of the Receive operation. Defined bits are:
  Bit 12: Length error if configured to check length.
  Bit 11: CRC error in an aligned frame.
  Bit 10: Alignment error (CRC error in misaligned frame).
  Bit 9: Ran out of buffer space-no resources.
  Bit 8: DMA Overrun failure to acquire the system bus.
  Bit 7: Frame too short.
  Bit 6: No EOP flag (for Bit stuffing only).
  Bit 5: When the SF bit equals zero, and the 82596 is configured to save bad frames, this bit signals that the receive frame was truncated. Otherwise it is zero.
  Bits 2-4: Zeros.
  Bit 1: When it is zero, the destination address of the received frame matches the IA address. When it is a 1, the destination address of the received frame did not match the individual address. For example, a multicast address or broadcast address will set this bit to a 1.
  Bit 0: Receive collision, a collision is detected during reception.
LINK ADDRESS - A 16-bit offset (32-bit address in the Linear mode) to the next Receive Frame Descriptor. The Link Address of the last frame can be used to form a cyclical list.
RBD POINTER - The offset (address in the Linear mode) of the first RBD containing the received frame data. An RBD pointer of all ones indicates no RBD.
EOF, F, SIZE, ACT COUNT - These fields are for the Simplified and Flexible memory models. They are exactly the same as the respective fields in the Receive Buffer Descriptor. See the next section for detailed explanation of their functions.
MC - Multicast bit.
DESTINATION ADDRESS - The contents of the destination address of the receive frame. The field is 0 to 6 bytes long.
SOURCE ADDRESS - The contents of the Source Address field of the received frame. It is 0 to 6 bytes long.
LENGTH FIELD - The contents of this 2-byte field are user defined. In 802.3 it contains the length of the data field. It is placed in memory in the same order it is received, i.e., most significant byte first, least significant byte second.
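The status bits listed above lend themselves to a simple table-driven decoder. The following Python sketch shows one way host software might interpret an RFD status word that it has already read from memory. The function name `decode_rfd_status` and the description strings are hypothetical. The position of OK (bit 13) and of status bits 0-12 comes from the list above; placing C at bit 15 and B at bit 14 is an assumption based on their order in the descriptor figures.

```python
# Status bits 0-12 as listed in the RFD field description.
RFD_STATUS_BITS = {
    12: "length error",
    11: "CRC error in an aligned frame",
    10: "alignment error",
    9: "out of buffer space (no resources)",
    8: "DMA overrun",
    7: "frame too short",
    6: "no EOP flag",
    5: "frame truncated",
    1: "destination did not match the individual address",
    0: "receive collision",
}

C_BIT = 1 << 15   # assumed position: frame reception complete
B_BIT = 1 << 14   # assumed position: frame being received
OK_BIT = 1 << 13  # stated in the text: received without errors


def decode_rfd_status(word):
    """Split a 16-bit RFD status word into its flags."""
    return {
        "complete": bool(word & C_BIT),
        "busy": bool(word & B_BIT),
        "ok": bool(word & OK_BIT),
        "flags": [name for bit, name in RFD_STATUS_BITS.items()
                  if word & (1 << bit)],
    }
```

With Save Bad Frames enabled, a descriptor can have "complete" set while "ok" is clear; the "flags" list then names the recorded error conditions.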
NOTES
1. The Destination address, Source address and Length fields are packed, i.e., one field immediately follows
the next.
2. The effect of the Address/Length Location (No Source Address Insertion) configuration parameter while receiving is as follows:
- 82586 Mode: The Destination address, Source address and Length fields are not used; they are placed in
the RX data buffers.
- 32-Bit Segmented and Linear Modes: When the Simplified memory model is used, the Destination address,
Source address and Length fields reside in their respective fields in the RFD. When the Flexible memory
structure is used, the Destination address, Source address, and Length field locations depend on the SIZE
field of the RFD. They can be placed in the RFD, in the RX data buffers, or partially in the RFD and the rest
in the RX data buffers, depending on the SIZE field value.
[Figure 43. Receive Buffer Descriptor. Three layouts are shown: 82586 Mode, 32-Bit Segmented Mode, and Linear Mode. Fields shown: EOF, F, actual count; next RBD offset (next RBD address in Linear mode); receive buffer address; EL, P (32-Bit Segmented and Linear modes), size.]
where:
EOF - Indicates that this is the last buffer related to the frame. It is cleared by the CPU before starting the RU, and is written by the 82596 at the end of reception of the frame.
F - Indicates that this buffer has already been used. The Actual Count has no meaning unless the F bit equals one. This bit is cleared by the CPU before starting the RU, and is set by the 82596 after the associated buffer has been used. This bit has the same meaning as the Complete bit in the RFD and CB.
ACT COUNT - This 14-bit quantity indicates the number of meaningful bytes in the buffer. It is cleared by the CPU before starting the RU, and is written by the 82596 after the associated buffer has already been used. In general, after the buffer is full, the Actual Count value equals the size field of the same buffer. For the last buffer of the frame, Actual Count can be less than the buffer size.
NEXT BD ADDRESS - The offset (absolute address in the Linear mode) of the next RBD on the list. It is meaningless if EL = 1.
BUFFER ADDRESS - The starting address of the memory area that contains the received data. In the 82586 mode, this is a 24-bit address (with pins A24-A31 = 0). In the 32-bit Segmented and Linear modes this is a 32-bit address.
EL - Indicates that the buffer associated with this RBD is last in the FBL.
P - This bit indicates that the 82596 has already prefetched the RBDs and any change in the RBD data will be ignored. This bit is valid only in the new 82596 memory modes, and if this feature has been enabled during the configure command. The 82596 prefetches the RBDs in locked cycles; after prefetching the RBD the 82596 performs a write cycle where the P bit is set to one and the rest of the data remains unchanged. The CPU is responsible for resetting it in all RBDs. The 82596 will not check this bit before setting it.
SIZE - This 14-bit quantity indicates the size, in bytes, of the associated buffer. This quantity must be an even number.
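As an illustration of how the EOF, F and ACT COUNT fields work together, the following Python sketch totals the bytes a frame occupies in its buffer chain. It operates on 16-bit count words that the host has already read from successive RBDs. The function names are hypothetical, and the bit positions (EOF in bit 15, F in bit 14, the 14-bit Actual Count in bits 13-0) are an assumption taken from the field order in Figure 43.

```python
EOF_BIT = 0x8000     # assumed: bit 15, last buffer of the frame
F_BIT = 0x4000       # assumed: bit 14, buffer has been used
COUNT_MASK = 0x3FFF  # 14-bit Actual Count


def decode_rbd_count(word):
    """Split the RBD count word into EOF, F and Actual Count."""
    return {
        "eof": bool(word & EOF_BIT),
        "f": bool(word & F_BIT),
        "actual_count": word & COUNT_MASK,
    }


def received_bytes(count_words):
    """Sum Actual Count over the RBDs of one frame, stopping at EOF."""
    total = 0
    for word in count_words:
        fields = decode_rbd_count(word)
        if not fields["f"]:
            break  # Actual Count has no meaning unless F is set
        total += fields["actual_count"]
        if fields["eof"]:
            break  # last buffer related to this frame
    return total
```

Stopping at the first RBD with F clear reflects the rule that Actual Count is only meaningful once the 82596 has marked the buffer as used.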
PGA PACKAGE THERMAL SPECIFICATION
| Parameter | Thermal Resistance |
| θJC | 3°C/W |
| θJA | 24°C/W |
ELECTRICAL AND TIMING CHARACTERISTICS
Absolute Maximum Ratings
• Storage Temperature ........ -65°C to +150°C
• Case Temperature under Bias ........ -65°C to +110°C
• Supply Voltage with Respect to VSS ........ -0.5V to +6.5V
• Voltage on Other Pins ........ -0.5V to VCC + 0.5V
DC Characteristics
TC = 0°C-85°C, VCC = 5V ± 10%. LE/BE have MOS levels (see VMIL, VMIH).
All other signals have TTL levels (see VIL, VIH, VOL, VOH).
| Symbol | Parameter | Min | Max | Units | Notes |
| VIL | Input Low Voltage (TTL) | -0.3 | +0.8 | V | |
| VIH | Input High Voltage (TTL) | 2.0 | VCC + 0.3 | V | |
| VMIL | Input Low Voltage (MOS) | -0.3 | +0.8 | V | |
| VMIH | Input High Voltage (MOS) | 3.7 | VCC + 0.3 | V | |
| VOL | Output Low Voltage (TTL) | | 0.45 | V | IOL = 4.0 mA |
| VCIL | RXC, TXC Input Low Voltage | -0.5 | 0.6 | V | |
| VCIH | RXC, TXC Input High Voltage | 3.3 | VCC + 0.5 | V | |
| VOH | Output High Voltage (TTL) | 2.4 | | V | IOH = 0.9 mA-1 mA |
| ILI | Input Leakage Current | | ±15 | µA | 0 ≤ VIN ≤ VCC |
| ILO | Output Leakage Current | | ±15 | µA | 0.45 < VOUT < VCC |
| CIN | Capacitance of Input Buffer | | 10 | pF | FC = 1 MHz |
| COUT | Capacitance of Input/Output Buffer | | 12 | pF | FC = 1 MHz |
| CCLK | CLK Capacitance | | 20 | pF | FC = 1 MHz |
| ICC | Power Supply | | 200 | mA | At 25 MHz, ICC Typical = 100 mA |
| ICC | Power Supply | | 300 | mA | At 33 MHz, ICC Typical = 150 mA |
AC Characteristics
82596CA INPUT/OUTPUT SYSTEM TIMINGS
TC = 0°C-85°C, VCC = 5V ± 10%. These timings assume the CL on all outputs is 50 pF unless otherwise
specified. CL can be 20 pF to 120 pF; however, timings must be derated. All timing requirements are given in
nanoseconds.
| Symbol | Parameter | 25 MHz Min | 25 MHz Max | Notes |
| | Operating Frequency | 12.5 MHz | 25 MHz | 1X CLK Input |
| T1 | CLK Period | 40 | 80 | |
| T1a | CLK Period Stability | | 0.1% | Adjacent CLK Δ |
| T2 | CLK High | 14 | | 2.0V |
| T3 | CLK Low | 14 | | 0.8V |
| T4 | CLK Rise Time | | 4 | 0.8V to 2.0V |
| T5 | CLK Fall Time | | 4 | 2.0V to 0.8V |
| T6 | BEn, LOCK, and A2-A31 Valid Delay | 3 | 22 | |
| T6a | BLAST, PCHK Valid Delay | 3 | 27 | |
| T7 | BEn, LOCK, BLAST, A2-A31 Float Delay | 3 | 30 | |
| T8 | W/R and ADS Valid Delay | 3 | 22 | |
| T9 | W/R and ADS Float Delay | 3 | 30 | |
| T10 | D0-D31, DPn Write Data Valid Delay | 3 | 22 | |
| T11 | D0-D31, DPn Write Data Float Delay | 3 | 30 | |
| T12 | HOLD Valid Delay | 3 | 22 | |
| T13 | CA and BREQ Setup Time | 7 | | 1, 2 |
| T14 | CA and BREQ Hold Time | 3 | | 1, 2 |
| T15 | BS16 Setup Time | 8 | | 2 |
| T16 | BS16 Hold Time | 3 | | 2 |
| T17 | BRDY, RDY Setup Time | 8 | | 2 |
| T18 | BRDY, RDY Hold Time | 3 | | 2 |
| T19 | D0-D31, DPn READ Setup Time | 5 | | 2 |
| T20 | D0-D31, DPn READ Hold Time | 3 | | 2 |
| T21 | AHOLD and HLDA Setup Time | 10 | | 1, 2 |
| T22 | AHOLD Hold Time | 3 | | 1, 2 |
| T22a | HLDA Hold Time | 3 | | 1, 2 |
| T23 | RESET Setup Time | 10 | | 1, 2 |
| T24 | RESET Hold Time | 3 | | 1, 2 |
| T25 | INT/INT Valid Delay | 1 | 26 | |
| T26 | CA and BREQ, PORT Pulse Width | 2T1 | | 1, 2, 3 |
| T27 | D0-D31 CPU PORT Access Setup Time | 5 | | 2 |
| T28 | D0-D31 CPU PORT Access Hold Time | 3 | | 2 |
| T29 | PORT Setup Time | 7 | | 2 |
| T30 | PORT Hold Time | 3 | | 2 |
| T31 | BOFF Setup Time | 10 | | 2 |
| T32 | BOFF Hold Time | 3 | | 2 |
AC Characteristics (Continued)
82596CA INPUT/OUTPUT SYSTEM TIMINGS
TC = 0°C-85°C, VCC = 5V ± 5%. These timings assume the CL on all outputs is 50 pF unless otherwise
specified. CL can be 20 pF to 120 pF; however, timings must be derated. All timing requirements are given in
nanoseconds.
| Symbol | Parameter | 33 MHz Min | 33 MHz Max | Notes |
| | Operating Frequency | 12.5 MHz | 33 MHz | 1X CLK Input |
| T1 | CLK Period | 30 | | |
| T1a | CLK Period Stability | | | Adjacent CLK Δ |
| T2 | CLK High | | | 2.0V |
| T3 | CLK Low | | | 0.8V |
| T4 | CLK Rise Time | | 3 | 0.8V to 2.0V |
| T5 | CLK Fall Time | | 3 | 2.0V to 0.8V |
| T6 | BEn, LOCK, and A2-A31 Valid Delay | | 19 | |
| T6a | BLAST, PCHK Valid Delay | | 22 | |
| T7 | BEn, LOCK, BLAST, A2-A31 Float Delay | | 20 | |
| T8 | W/R and ADS Valid Delay | 3 | 19 | |
| T9 | W/R and ADS Float Delay | 3 | 20 | |
| T10 | D0-D31, DPn Write Data Valid Delay | 3 | 19 | |
| T11 | D0-D31, DPn Write Data Float Delay | 3 | 20 | |
| T12 | HOLD Valid Delay | 3 | 19 | |
| T13 | CA and BREQ Setup Time | 7 | | 1, 2 |
| T14 | CA and BREQ Hold Time | 3 | | 1, 2 |
| T15 | BS16 Setup Time | 6 | | 2 |
| T16 | BS16 Hold Time | 3 | | 2 |
| T17 | BRDY, RDY Setup Time | 6 | | 2 |
| T18 | BRDY, RDY Hold Time | 3 | | 2 |
| T19 | D0-D31, DPn READ Setup Time | 5 | | 2 |
| T20 | D0-D31, DPn READ Hold Time | 3 | | 2 |
| T21 | AHOLD and HLDA Setup Time | 8 | | 1, 2 |
| T22 | AHOLD Hold Time | 3 | | 1, 2 |
AC Characteristics (Continued)
82596CA INPUT/OUTPUT SYSTEM TIMINGS
CL on all outputs is 50 pF unless otherwise specified.
All timing requirements are given in nanoseconds.
| Symbol | Parameter | 33 MHz Min | 33 MHz Max | Notes |
| T22a | HLDA Hold Time | 3 | | 1, 2 |
| T23 | RESET Setup Time | 8 | | 1, 2 |
| T24 | RESET Hold Time | 3 | | 1, 2 |
| T25 | INT/INT Valid Delay | 1 | 20 | |
| T26 | CA and BREQ, PORT Pulse Width | 2T1 | | 1, 2, 3 |
| T27 | D0-D31 CPU PORT Access Setup Time | 5 | | 2 |
| T28 | D0-D31 CPU PORT Access Hold Time | 3 | | 2 |
| T29 | PORT Setup Time | 7 | | 2 |
| T30 | PORT Hold Time | 3 | | 2 |
| T31 | BOFF Setup Time | 8 | | 2 |
| T32 | BOFF Hold Time | 3 | | 2 |
NOTES:
1. RESET, HLDA, and CA are internally synchronized. This timing is to guarantee recognition at next clock for RESET, HLDA
and CA.
2. All setup, hold and delay timings are at maximum frequency specification Fmax, and must be derated according to the
following equation for operation at lower frequencies:
Tderated = (Fmax/Fopr) x T
where:
Tderated = The derated value of the specification.
Fmax = Maximum operating frequency.
Fopr = Actual operating frequency.
T = Specification at maximum frequency.
This calculation only provides a rough estimate for derating the frequency. For more detailed information, contact your
Intel Sales Office for the data sheet supplement.
3. CA pulse width need only be 1 T1 wide if the setup and hold times are met; BREQ must meet setup and hold times and
need only be 1 T1 wide.
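The derating equation in Note 2 can be applied directly. The following Python sketch is only a worked illustration of that formula; the function name `derate` and its parameter names are hypothetical, and, as the note itself states, the result is a rough estimate rather than a guaranteed specification.

```python
def derate(t_ns, f_max_mhz, f_opr_mhz):
    """Apply Tderated = (Fmax / Fopr) x T from Note 2.

    t_ns      -- specification at maximum frequency, in nanoseconds
    f_max_mhz -- maximum operating frequency of the part, in MHz
    f_opr_mhz -- actual operating frequency, in MHz
    """
    if not 0 < f_opr_mhz <= f_max_mhz:
        raise ValueError("operating frequency must be in (0, Fmax]")
    return (f_max_mhz / f_opr_mhz) * t_ns
```

For example, a 7 ns setup time specified for a 25 MHz part gives an estimate of 14 ns when the part is operated at 12.5 MHz.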
TRANSMIT/RECEIVE CLOCK PARAMETERS
| Symbol | Parameter | 20 MHz Min | 20 MHz Max | Notes |
| T36 | TxC Cycle | 50 | | 1, 3 |
| T38 | TxC Rise Time | | 5 | 1 |
| T39 | TxC Fall Time | | 5 | 1 |
| T40 | TxC High Time | 19 | | 1, 3 |
| T41 | TxC Low Time | 18 | | 1, 3 |
| T42 | TxD Rise Time | | 10 | 4 |
| T43 | TxD Fall Time | | 10 | 4 |
| T44 | TxD Transition | 20 | | 2, 4 |
| T45 | TxC Low to TxD Valid | | 25 | 4, 6 |
| T46 | TxC Low to TxD Transition | | 25 | 2, 4 |
| T47 | TxC High to TxD Transition | | 25 | 2, 4 |
| T48 | TxC Low to TxD High (At End of Transition) | | 25 | 4 |
TRANSMIT/RECEIVE CLOCK PARAMETERS (Continued)
| Symbol | Parameter | 20 MHz Min | 20 MHz Max | Notes |
RTS AND CTS PARAMETERS
| T49 | TxC Low to RTS Low, Time to Activate RTS | | 25 | 5 |
| T50 | CTS Low to TxC Low, CTS Setup Time | 20 | | |
| T51 | TxC Low to CTS Invalid, CTS Hold Time | 10 | | 7 |
| T52 | TxC Low to RTS High | | 25 | 5 |
RECEIVE CLOCK PARAMETERS
| T53 | RXC Cycle | 50 | | 1, 3 |
| T54 | RXC Rise Time | | 5 | 1 |
| T55 | RXC Fall Time | | 5 | 1 |
| T56 | RXC High Time | 19 | | 1 |
| T57 | RXC Low Time | 18 | | 1 |
RECEIVED DATA PARAMETERS
| T58 | RXD Setup Time | 20 | | 6 |
| T59 | RXD Hold Time | 10 | | 6 |
| T60 | RXD Rise Time | | 10 | |
| T61 | RXD Fall Time | | 10 | |
CRS AND CDT PARAMETERS
| T62 | CDT Low to TXC High, External Collision Detect Setup Time | 20 | | |
| T63 | TXC High to CDT Inactive, CDT Hold Time | 10 | | |
| T64 | CDT Low to Jam Start | | | 10 |
| T65 | CRS Low to TXC High, Carrier Sense Setup Time | 20 | | |
| T66 | TXC High to CRS Inactive, CRS Hold Time (Internal Collision Detect) | 10 | | |
| T67 | CRS High to Jamming Start | | | 12 |
| T68 | Jamming Period | | | 11 |
| T69 | CRS High to RXC High, CRS Inactive Setup Time | 30 | | |
| T70 | RXC High to CRS High, CRS Inactive Hold Time | 10 | | |
TRANSMIT/RECEIVE CLOCK PARAMETERS (Continued)
| Symbol | Parameter | 20 MHz Value | Notes |
INTERFRAME SPACING PARAMETERS
| T71 | Interframe Delay | | 9 |
EXTERNAL LOOPBACK-PIN PARAMETERS
| T72 | TXC Low to LPBK Low | T36 | 4 |
| T73 | TXC Low to LPBK High | T36 | 4 |
NOTES:
1. Special MOS levels. VCIL = 0.9V and VCIH = 3.0V.
2. Manchester only.
3. Manchester. Needs 50% duty cycle.
4. 1 TTL load + 50 pF.
5. 1 TTL load + 100 pF.
6. NRZ only.
7. Abnormal end of transmission-CTS expires before RTS.
8. Normal end to transmission.
9. Programmable value:
T71 = NIFS x T36
where: NIFS = the IFS configuration value
(if NIFS is less than 12 then NIFS is forced to 12).
10. Programmable value:
T64 = (NCDF x T36) + x x T36
(if the collision occurs after the preamble)
where:
NCDF = the collision detect filter configuration value,
and
x = 12, 13, 14, or 15
11. T68 = 32 x T36
12. Programmable value:
T67 = (NCSF x T36) + x x T36
where: NCSF = the Carrier Sense Filter configuration
value, and
x = 12, 13, 14, or 15
13. To guarantee recognition on the next clock.
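The programmable values in Notes 9 through 12 are simple multiples of the TxC cycle time T36. The Python sketch below evaluates them for a given T36. All function names are hypothetical, and the 100 ns figure used in the example is only an assumed clock period (a 10 MHz serial clock); it is not a value taken from the tables above.

```python
JAM_OFFSETS = (12, 13, 14, 15)  # the possible values of x in Notes 10 and 12


def interframe_delay(nifs, t36):
    """Note 9: T71 = NIFS x T36, with NIFS forced to at least 12."""
    return max(nifs, 12) * t36


def cdt_to_jam_start(ncdf, t36):
    """Note 10: T64 = (NCDF x T36) + x x T36, one result per value of x."""
    return [(ncdf + x) * t36 for x in JAM_OFFSETS]


def crs_to_jam_start(ncsf, t36):
    """Note 12: T67 = (NCSF x T36) + x x T36, one result per value of x."""
    return [(ncsf + x) * t36 for x in JAM_OFFSETS]


def jamming_period(t36):
    """Note 11: T68 = 32 x T36."""
    return 32 * t36
```

With an assumed T36 of 100 ns, an IFS configuration value of 96 gives an interframe delay of 9600 ns, and the jamming period is 3200 ns.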
82596CA BUS OPERATION
The following figures show the 82596CA basic bus cycle and basic burst cycle.
Please refer to the 32-Bit LAN Component User's Manual.
[Figure 44. Basic 82596CA Bus Cycle. Signals shown: CLK, address, W/R, RDY, BLAST, DATA, over a read, write, read, write sequence of T1/T2 states separated by idle states.]
[Figure 45. Basic 82596CA Burst Cycle. Signals shown: CLK, A31-A2, W/R, BE0-3, BLAST, DATA, over a T1 state followed by multiple T2 states.]
SYSTEM INTERFACE A.C. TIMING CHARACTERISTICS
The measurements should be done at:
• TC = 0°C-85°C, VCC = 5V ± 10%, C = 50 pF unless otherwise specified.
• A.C. testing inputs are driven at 2.4V for a logic "1" and 0.45V for a logic "0".
• Timing measurements are made at 1.5V for both logic "1" and "0".
• Rise and Fall time of inputs and outputs signals are measured between 0.8V and 2.0V respectively unless
otherwise specified.
• All timings are relative to CLK crossing the 1.5V level.
• All A.C. parameters are valid only after 100 µs from power up.
[Figure 46. CLK Timings. Drive levels of 2.4V and 0.45V with a 1.5V test point.]
Two types of timing specifications are presented below:
1. Input Timing-minimum setup and hold times.
2. Output Timings-output delays and float times from CLK rising edge.
Figure 47 defines how the measurements should be done:
[Figure 47. Drive Levels and Measurements Points for A.C. Specifications. Timings are referenced to CLK crossing 1.5V.]
LEGEND:
Ts = Input Setup Time: T13, T15, T17, T19, T21, T23, T27, T29, T31
Th = Input Hold Time: T14, T16, T18, T20, T22, T22a, T24, T28, T30, T32
Tn = Minimum output delay or minimum float delay: T6, T6a, T7, T8, T9, T10, T11, T12, T25
Tx = Maximum output delay or maximum float delay: T6, T6a, T7, T8, T9, T10, T11, T12, T25
INPUT WAVEFORMS
[Figure 48. CA and BREQ Input Timing. Signals shown: CLK, BREQ, CA.]
[Figure 49. INT/INT Output Timing.]
[Figure 50. HOLD/HLDA Timings. Signals shown: CLK, HOLD, BOFF, address; timings T12, T21, T22, T22a, T31, T32.]
[Figure 51. Input Setup and Hold Time. Signals shown: CLK, D31-D0, DP3-DP0.]
[Figure 52. Output Valid Delay Timing. Signals shown: CLK; A31-A2, BEn, LOCK (T6); PCHK, BLAST (T6a); W/R, ADS (T8); DP3-DP0, D31-D0 output (T10).]
[Figure 53. Output Float Delay Timing. Signals shown: CLK; A31-A2, BEn, LOCK, BLAST, PCHK (T7); T9; DP3-DP0, D31-D0 output (T11).]
[Figure 54. PORT Setup and Hold Time. Signals shown: CLK, D0-D31.]
[Figure 55. RESET Input Timing.]
SERIAL AC TIMING CHARACTERISTICS
[Figure 56. Serial Input Clock Timing. Levels of 3.0V and 0.9V; timings T36, T41, T53, T57.]
[Figure 57. Transmit Data Waveforms. Signals shown: CTS, CRS, TXD (NRZ), TXD (Manchester).]
[Figure 59. Receive Data Waveforms (NRZ).]
[Figure 60. Receive Data Waveforms (CRS).]
OUTLINE DIAGRAMS
132 LEAD CERAMIC PIN GRID ARRAY PACKAGE INTEL TYPE A
[Outline drawing: seating plane, pin diameter B (all pins), swagged pin detail. Dimensions in mm (inch).]
Family: Ceramic Pin Grid Array Package
| Symbol | Millimeters Min | Millimeters Max | Notes | Inches Min | Inches Max | Notes |
| A | 3.56 | 4.57 | | 0.140 | 0.180 | |
| A1 | 0.76 | 1.27 | Solid Lid | 0.030 | 0.050 | Solid Lid |
| A2 | 2.67 | 3.43 | Solid Lid | 0.105 | 0.135 | Solid Lid |
| A3 | 1.14 | 1.40 | | 0.045 | 0.055 | |
| B | 0.43 | 0.51 | | 0.017 | 0.020 | |
| D | 36.45 | 37.21 | | 1.435 | 1.465 | |
| D1 | 32.89 | 33.15 | | 1.295 | 1.305 | |
| e1 | 2.29 | 2.79 | | 0.090 | 0.110 | |
| L | 2.54 | 3.30 | | 0.100 | 0.130 | |
| N | 132 | | | 132 | | |
| S1 | 1.27 | 2.54 | | 0.050 | 0.100 | |
| ISSUE | IWS 10/12/88 | | | | | |
Intel Case Outline Drawings
Plastic Quad Flat Pack (PQFP)
0.025 Inch (0.635mm) Pitch
Dimensions in inches (Min / Max per lead count):
| Symbol | Description | N = 68 | N = 84 | N = 100 | N = 132 | N = 164 | N = 196 |
| A | Package Height | 0.160 / 0.170 | 0.160 / 0.170 | 0.160 / 0.170 | 0.160 / 0.170 | 0.160 / 0.170 | 0.160 / 0.170 |
| A1 | Standoff | 0.020 / 0.030 | 0.020 / 0.030 | 0.020 / 0.030 | 0.020 / 0.030 | 0.020 / 0.030 | 0.020 / 0.030 |
| D, E | Terminal Dimension | 0.675 / 0.685 | 0.775 / 0.785 | 0.875 / 0.885 | 1.075 / 1.085 | 1.275 / 1.285 | 1.475 / 1.485 |
| D1, E1 | Package Body | 0.547 / 0.553 | 0.647 / 0.653 | 0.747 / 0.753 | 0.947 / 0.953 | 1.147 / 1.153 | 1.347 / 1.353 |
| D2, E2 | Bumper Distance | 0.697 / 0.703 | 0.797 / 0.803 | 0.897 / 0.903 | 1.097 / 1.103 | 1.297 / 1.303 | 1.497 / 1.503 |
| D3, E3 | Lead Dimension | 0.400 REF | 0.500 REF | 0.600 REF | 0.800 REF | 1.000 REF | 1.200 REF |
| D4, E4 | Foot Radius Location | 0.623 / 0.637 | 0.723 / 0.737 | 0.823 / 0.837 | 1.023 / 1.037 | 1.223 / 1.237 | 1.423 / 1.437 |
| L1 | Foot Length | 0.020 / 0.030 | 0.020 / 0.030 | 0.020 / 0.030 | 0.020 / 0.030 | 0.020 / 0.030 | 0.020 / 0.030 |
| Issue | IWS Preliminary 12/12/88 | | | | | | |
Dimensions in mm (Min / Max per lead count):
| Symbol | Description | N = 68 | N = 84 | N = 100 | N = 132 | N = 164 | N = 196 |
| A | Package Height | 4.06 / 4.32 | 4.06 / 4.32 | 4.06 / 4.32 | 4.06 / 4.32 | 4.06 / 4.32 | 4.06 / 4.32 |
| A1 | Standoff | 0.51 / 0.76 | 0.51 / 0.76 | 0.51 / 0.76 | 0.51 / 0.76 | 0.51 / 0.76 | 0.51 / 0.76 |
| D, E | Terminal Dimension | 17.15 / 17.40 | 19.69 / 19.94 | 22.23 / 22.48 | 27.31 / 27.56 | 32.39 / 32.64 | 37.47 / 37.72 |
| D1, E1 | Package Body | 13.89 / 14.05 | 16.43 / 16.59 | 18.97 / 19.13 | 24.05 / 24.21 | 29.13 / 29.29 | 34.21 / 34.37 |
| D2, E2 | Bumper Distance | 17.70 / 17.85 | 20.24 / 20.39 | 22.78 / 22.93 | 27.86 / 28.01 | 32.94 / 33.09 | 38.02 / 38.18 |
| D3, E3 | Lead Dimension | 10.16 REF | 12.70 REF | 15.24 REF | 20.32 REF | 25.40 REF | 30.48 REF |
| D4, E4 | Foot Radius Location | 15.82 / 16.17 | 18.36 / 18.71 | 21.25 / 21.25 | 25.89 / 26.33 | 31.06 / 31.41 | 36.14 / 36.49 |
| L1 | Foot Length | 0.51 / 0.76 | 0.51 / 0.76 | 0.51 / 0.76 | 0.51 / 0.76 | 0.51 / 0.76 | 0.51 / 0.76 |
| Issue | IWS Preliminary 12/12/88 | | | | | | |
[Figure 61. Principal Dimensions and Datums. Dimensions in mm (inch).]
[Figure 62. Molded Details. Dimensions in mm (inch).]
[Figure 64. Typical Lead. Detail J and Detail L. Dimensions in mm (inch).]
[Figure 65. Detail M. Dimensions in mm (inch).]
Development Support Tools
5
i960™ FAMILY OF SOFTWARE DEBUGGERS
COMPREHENSIVE SOFTWARE DEBUG SUPPORT FOR i960™ EMBEDDED APPLICATIONS
Intel provides comprehensive software debug support for all members of the i960™
component architecture, including the newest members, the i960SA and i960SB. All
Intel's i960 software debug products share the same high-level, windowed user interface
emerging as the standard for all i960 tools from Intel. This innovative debug interface
allows users to focus their efforts on finding bugs rather than spending time learning and
manipulating the debug environment.
Intel's i960 software debug tools support a wide variety of debug environments, including
code debug on a simulated target environment, a PC-based evaluation board, a serial-based
Intel evaluation board, or a serial-based, customized target system.
GENERAL i960 SOFTWARE DEBUGGER FEATURES
• Windowed, pull down menu user interface shared by other i960 Development Tools
• Full symbolic debug with source level display allows C or assembly code debugging
• Debugging productivity enhanced by ability to quickly browse source code and view call stacks or symbol run-time values
• Breakpoints may be defined symbolically using module names, procedure names and line numbers
• Single step execution, code assembly/disassembly, memory and register display/modification
• Run-time library support allows programs to access host files and perform I/O
*IBM, PC/AT, and Personal System/2 are registered trademarks of International Business Machines Corporation.
*Compaq is a registered trademark of the Compaq Corporation.
*Intel is a registered trademark of the Intel Corporation.
November 1991
Order Number: 280916-002
FEATURES
EASY TO USE, POWERFUL
USER INTERFACE
All i960 debuggers share the same high-level,
powerful user interface as other i960
development tools. Utilizing pulldown menus,
users have access to a color, windowed
environment featuring source-level, symbolic
debugging. Multiple, non-overlapping windows
can be used to display source code, registers,
variable values, and command line entries.
DEBUGGING FEATURES
High-level source or disassembled code can be
displayed in the source window. Users can
scroll through the source, browse from module
to module in a program, scope to any
executable point in the source, or
instantaneously relocate from a symbol name
to the location where it was defined
(hyperscope operation). Symbol names in the
source can be highlighted to inspect the
current run-time value of program variables.
Call stacks can be examined to trace execution
flow.
A variety of breakpoints can be specified
including source breakpoints, watch points,
passpoints, or event-action breakpoints.
Breakpoints can be defined symbolically using
module names, procedure names and line
numbers. Watch points allow users to observe
a variable as it changes during program
execution. Passpoints display a message when
a specified instruction is executed, giving the
user a non-realtime way to track execution of
key code sequences without halting instruction
flow. The event-action form allows complex
breakpoint conditions to be set up, including
data breakpoints (when supported by on-chip
registers).
Users can step through program execution via
a single assembly language instruction, a high-level language statement or a high-level
function or procedure. Memory can be
displayed or modified as common data types
and all processor registers and system tables
can be examined or changed.
Expressions involving symbol names, memory
references, or both, can be defined as watch
expressions whose values are monitored in a
Watch window as a program executes. The
i960 family of software debuggers also allows
screen flipping between the debugger
environment and the display output from the
program.
Low level, run time libraries are provided that
allow programs running on an i960 board to
access the file system on the host or to perform
I/O operations.
RETARGETABLE SOFTWARE
DEBUGGER
Intel's DB-960 Retargetable Software
Debugger is a combination application and
system level debugger designed for use with
the i960 family of embedded microprocessors.
DB-960's retargetable monitor can be
customized to a target system, allowing source-level, symbolic debug across a serial interface
cable.
RETARGETABLE MONITOR
Utilizing a combination of object files and
source code, a retargetable monitor is provided
with DB-960 for users to customize and
incorporate into their proprietary target
systems. This retargetable monitor is designed
to support all members of the i960 family. Most
of the monitor code is provided in object code
and does not need to be changed. Hardware-dependent source code is supplied for
modification by users. Example code is
provided for porting the monitor to the Intel
EV80960CA and QT960 target boards. Both
boards use an Intel 82510 UART serial
controller chip and the Intel 82C54 Counter/
Timer.
HARDWARE DEBUG
DB-960 takes advantage of on-chip debug
registers like those found on the i960CA to
provide two hardware execution address
breakpoints and two data address breakpoints.
Once the monitor has been retargeted to the
target system, hardware designers can
download initialization code, read/write to
registers and examine memory or register
contents.
HIGH SPEED SERIAL LINK
DB-960 communications between the host and
target system is supported via RS232 and
RS422 communication links. RS232 allows
access to industry standard serial protocols
while the RS422 interface provides higher
speed communication (up to 115K baud) for
faster code and data download. PC-AT bus-compatible RS422 communication boards are
available from various third party vendors.
5-2
FEATURES
CUSTOMIZED ENVIRONMENT
Program execution statistics reported
include:
o Total number of instructions executed
o Total time
o Number oftimes a call caused processor to
write registers to external memory
o Current clock setting in cycles per second
o Current wait-state setting for each of the 16
memory regions
o Number of instruction words executed from
cache rather than external memory
o Total number of cycles elapsed
o Number of stack frames or register sets
cached on chip
o Number of times an unaligned load or store
operation occurred
o Bus utilization
o Branch prediction efficiency
o Usage for load, store, call and branch cache
instructions
Generally, DBSIM960 provides all the full
symbolic, debug capabilities found in the i960
family of debug tools, while providing a
complete benchmarking environment prior to
target system availability.
Because the user has control over the target
board and serial driver source code, a highly
customized target environment can be
developed. Serial communication functions can
be modified to allow for parallel
communication schemes, allowing faster
download speeds.
LICENSING
There are no incorporation or royalty fees for
customers shipping the retargeted DB-960
monitor with their product or system.
PC-BASED SOFTWARE
DEBUGGER
The DB960KBDEVA Software Debugger is
designed for debugging i960KA or i960KB code
executing on an Intel EVA-960KB4MB
Software Execution Vehicle plugged into PCATs or compatibles using DOS.
DB960KBDEVA offers the same powerful
debug user interface as other i960 softerware
debuggers and utilizes I/O resources provided
by the pc. Due to compatibility with the
i960KA and i960KB, i960SA and i960SB code
can be executed and debugged using the Intel
EVA-960KB4MB Software Execution Vehicle
in conjunction with the DB960KBDEV A
Software Debugger.
"'By being able to easily change the waitstate definition for
their code, the user's hardware and software design can be
optimized before any hardware development takes place.
IN-CIRCUIT DEBUG MONITOR
Intel's DB960CADIC in-circuit debug monitor
hosted on extended DOS/386 allows users to
debug high-speed, cached applications at the
full speed of the i960CA target processor.
DB960CADIC can be used by both hardware
and software developers, at any stage of design.
Early in the development process,
DB960CADIC allows software debugging when
inserted into an existing i960CA board such as
the EV80960CA, or in the DB960CASAST
stand-alone self-test unit. Later in the design
cycle, DB960CADIC can be inserted into the
user's target system, facilitating debug of
hardware/software integration.
DB960CADIC offers the same, windowed debug
user interface as other i960 software debuggers
and is also available with an optional 4 MB
standalone selftest chassis to debug and test
code before prototype hardware is available.
For further information, see fact sheet
# 280900 from Intel.
.
SIMULATOR-BASED
SOFTWARE DEBUGGER
The DBSIM960 Debug Simulator combines an
i960 CA/KA/SA instruction-level simulator
with the easy to use, powerful DB960 software
debugger interface. Users can debug i960
applications without a hardware target system
being available, allowing products to get to
market sooner. For i960 CA designs,
performance information is provided, with
timing profiles accurate to plus or minus 5%.
Users can specify the target system's clock
speed and wait-state information for each
region of memory.* DBSIM960 uses this
information to provide i960 CA performance
statistics. DBSIM960 expects COFF executable
files generated by Intel's CTOOLS960 compiler
and assembler. Execution flow can be
monitored by using a trace capability, which
reports the 8 digit cycle address, 8 digit
instruction pointer value, and the
disassembled instruction for each operation.
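The effect of a region's wait-state setting on access time can be estimated by hand before running a simulation. The following Python sketch is illustrative only and is not DBSIM960's timing model: it assumes a four-word burst written in the (first, second, third, fourth) wait-state notation, one address cycle per burst, and one data cycle per word plus that word's wait states. The function names are hypothetical.

```python
def burst_cycles(wait_states, words=4, address_cycles=1):
    """Clock cycles for one burst access (assumed model, see lead-in).

    wait_states: per-word wait states, e.g. (3, 1, 1, 1).
    words: number of words actually transferred in the burst.
    """
    if not 1 <= words <= len(wait_states):
        raise ValueError("words must be between 1 and len(wait_states)")
    # One address cycle, then each word costs one data cycle plus its waits.
    return address_cycles + sum(1 + w for w in wait_states[:words])


def burst_time_ns(wait_states, clock_mhz, words=4):
    """Burst duration in nanoseconds at the given clock frequency."""
    return burst_cycles(wait_states, words) * 1000.0 / clock_mhz


if __name__ == "__main__":
    # Hypothetical memory regions for a 25 MHz target.
    regions = {"SRAM": (0, 0, 0, 0), "DRAM": (3, 1, 1, 1)}
    for name, pattern in regions.items():
        print(name, burst_cycles(pattern), "cycles,",
              burst_time_ns(pattern, 25.0), "ns")
```

Under these assumptions a zero wait-state burst takes 5 cycles and a (3,1,1,1) burst takes 11 cycles, which shows why specifying the pattern for each region matters for performance estimates.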
FEATURES
SOFTWARE COMPLETES THE
SYSTEM
Intel provides a comprehensive software
development environment to complement DB960. This environment includes a C Compiler,
an i960 Assembler, a system generator for
automating the compilation process and
instruction-level simulators. The languages
support the entire range of i960 embedded
processors.
WORLDWIDE SERVICE,
SUPPORT, AND TRAINING
To augment its development tools, Intel offers
a full array of seminars, classes, workshops,
field application engineering expertise, hotline
technical support, and on-site service.
Intel also offers a Software Support Contract
which includes technical software information,
automatic distributions of software and
documentation updates, iCOMMENTS
publication, remote diagnostic software, and a
development tools troubleshooting guide.
Intel's 90-day Hardware Support package
includes technical hardware information,
telephone support, warranty on parts, labor,
material, and on-site hardware support.
Intel Development Tools also offers a 30-day,
money-back guarantee to customers who are
not satisfied after purchasing any Intel
development tool.
SPECIFICATIONS AND REQUIREMENTS
HOST SYSTEM REQUIREMENTS
Host system requirements to run Intel's i960
family of software debuggers include the
following:
• DOS version 3.3 or later excluding DOS 4.0
• 640K bytes of RAM in conventional memory
• A fixed disk drive with at least 1.25M bytes
of free disk space
• One disk drive capable of reading 5.25 inch,
360K byte disks
• RS232 serial port (COM1 or COM2)
Evaluated Systems include:
IBM PC-AT* with DOS 3.3
COMPAQ 386* with DOS 3.3
Intel 301/302 with DOS 3.3
IBM Personal System/2* Model 70/80 with
DOS 4.01
ORDERING INFORMATION
DB960KBDEV
DOS-based, retargetable
software debugger for the
i960KA, i960KB, i960SA,
i960SB and i960CA
embedded microprocessors.
Includes host debug
software, retargetable
monitor, host I/O libraries
and documentation.
DB960KBDEVA DOS-based source level
debugger for the i960KA,
i960KB, i960SA and i960SB
embedded microprocessors.
Requires EVA-960KB4MB
Software Execution Vehicle
and PC-AT compatible bus.
DBSIM960D
DOS/386-hosted debug
simulator for the i960 CA,
i960 KA and i960 SA which
utilizes an i960 CA
instruction-level simulator
allowing code development
and debug prior to hardware
prototype availability.
DBSIM960S
UNIX System V /386-hosted
debug simulator for the i960
CA, i960 KA and i960 SA
which utilizes an i960 CA
instruction-level simulator
allowing code development
and debug prior to hardware
prototype availability.
DBSIM960R
IBM RS/6000-hosted debug
simulator for the i960 CA,
i960 KA and i960 SA which
utilizes an i960 CA
instruction-level simulator
allowing code development
and debug prior to hardware
prototype availability.
DB960CADIC
DOS/386 hosted in-circuit
debug monitor for i960CA
only. Includes small board
with i960CA processor,
system debug monitor and
serial interface. Plugs into
i960CA socket on hardware
prototype system.
DB960CASAST Standalone Self Test Unit for
DB960CADIC. Includes built-in power supply, self-test
board, 4M byte of usable
DRAM for code development
and enclosure.
To order your Intel Development Tool product,
for more information, or for the number of
your nearest sales office or distributor, call
800-874-6835 (North America). For literature
on other Intel products call 800-548-4725
(North America). Outside of North America,
please contact your local Intel sales office or
distributor for more information.
EXV-960MC EXECUTION VEHICLE
280879-1
80960MC-BASED TARGET SYSTEM SUPPORTING EARLY
SOFTWARE DEVELOPMENT AND BENCHMARKING
EXV-960MC is a software execution vehicle designed to support 80960MC-based designs.
Users can use the EXV-960MC board to execute and debug their application software
before a functional hardware prototype is available. The EXV-960MC is also designed
with programmable waitstate SRAM to support benchmarking activities. The EXV-960MC is supported by the complete set of Intel C, assembler and Ada code generation
tools. Both of the VAX/VMS*-hosted 80960MC software debuggers, the SDM-960MC
system debug monitor and the Ada-960MC source-level debugger, can be used for
debugging software running on the EXV-960MC.
EXV-960MC includes a Multibus I form factor board and a set of SDM-960MC target
monitor EPROMS. The SDM-960MC and the Ada-960MC debugger are preconfigured to
support the EXV-960MC execution environment. Designers can select the software
debugger best suited to their development needs. The Ada-960MC debugger is a source-level symbolic debugger which provides a productive debugging environment for Ada
applications. The SDM-960MC debug monitor offers a complete debugging facility for
applications written in C, assembler or Ada.
*VAX/VMS is a trademark of Digital Equipment Corp.
December 1990
Order Number: 280879-002
SDM-960MC RETARGETABLE SYSTEM DEBUG
MONITOR
FEATURES
• 25 MHz 80960MC processor
• 256 Kbytes of (0,0,0,0) programmable wait-state SRAM
• 4 Mbytes dual-ported (3,1,1,1) wait-state DRAM
• iSBX™ interface
• Two serial ports, one bi-directional parallel port
• 8254 programmable interval timer
• 8259A programmable interrupt controller
ELECTRICAL CHARACTERISTICS
10 A @ +5V
50 mA @ +12V
50 mA @ -12V
ENVIRONMENTAL CHARACTERISTICS
Operating temperature: 0° to + 60°C (32° to 140°F), 300 LFM
Operating Humidity: 10% to 90% non-condensing
SOFTWARE DEBUGGING SUPPORT
The SDM-960MC is a VAX/VMS*-hosted system debug monitor that provides a complete, flexible
environment to execute and debug 80960MC-based applications. Users can tailor the execution
environment as software development evolves. Initially, the application may require the full
support of the system debug monitor to establish a run-time environment. As the application
evolves, the SDM-960MC allows the application to take more of the responsibility for system
functions.
The default execution environment of the SDM-960MC is the EXV-960MC execution vehicle. The
VAX-hosted portion of the SDM-960MC debug monitor provides complete on-target debugging
support through its interface with the target-resident portion of the SDM-960MC. To facilitate
debugging on a user's custom target system, the SDM-960MC includes source and object files
necessary to reconfigure the target monitor. SDM-960MC and other 80960MC development tools
allow the developers to take full advantage of the 80960MC processor.
FEATURES
• assemble and disassemble 80960MC
instructions
• single step program execution
• access to memory and processor resources
• support 64 execution breakpoints
• issue Interagent Communications (IACs)
• powerful execution trace
• serial download
WORLDWIDE SERVICE AND
SUPPORT
Intel augments its 80960 architecture family
development tools with a full array of
seminars, classes, and workshops; on-site
consulting services and telephone support are
available at all stages of development.
ORDERING INFORMATION
Product Code Description
EXV960MC
80960MC execution vehicle
(board and target EPROM)
SDM960MC
VAX, MicroVAX/VMS
hosted System Debug
Monitor, retargetable source
is included
HARDWARE REQUIREMENTS
• a serial interface
• 25 Kbytes of EPROM
• contiguous 50 Kbytes of RAM
80960SA/SB DEVELOPMENT SUPPORT
280906-1
COMPREHENSIVE DEVELOPMENT SUPPORT FOR 80960SA/SB
EMBEDDED APPLICATIONS
Intel provides comprehensive development support for the 80960 component
architecture, including the newest members, the 80960SA and 80960SB. Tools range from
compilers to simulators and from debuggers to emulators, all designed specifically for
members of the 80960 family, allowing you to take full advantage of their RISC-based
design while reducing time to market.
DEVELOPMENT TOOLS AVAILABLE:
• ASM-960 macro assembler for
developing and tuning speed-critical code
• iC-960 highly optimizing C language
compiler for high-level language
software development
• GEN-960 system generator for
initializing your design to take
advantage of 80960 on-chip features
• DBSIM960KA debug simulator for
80960KA and 80960SA applications
• Windowed, interactive, source-level DB960 debugger which can be targeted to
one of the evaluation and development
boards below, or customized to your
target system
• Evaluation and development boards
including the EV960SB, the QT80960KB,
and the EVA960KB
• ICE-960SA/SB offers a full featured in-circuit emulator for the 80960SA/SB
components
November 1990
Order Number: 280906-001
280906-2
ASM-960 MACRO ASSEMBLER
The ASM-960 macro assembler is used to fine-tune sections of code for peak program
execution speed on the 80960SA, 80960SB,
80960KA, 80960KB, 80960MC, and 80960CA.
ASM-960 does this by giving you absolute
control over program instructions. In addition
to the assembler and macro preprocessor,
ASM-960 includes several utilities for
application program maintenance and debug:
• LINKER provides incremental program
linking/locating and link-time optimization.
• ARCHIVER allows you to build reusable
function libraries for applications.
• DISASSEMBLER produces assembly
language from object files.
• SYMBOL DUMPER provides symbolic
information from a program file for
facilitating low-level debug.
• ROM IMAGE BUILDER produces a hex file
suitable for PROM programmers.
• Macro preprocessor provides code generation
flexibility and improves code readability,
reducing maintenance costs.
A Floating Point Arithmetic Library (FPAL) is
included for the 80960SA, 80960KA, and
80960CA components. It eliminates the need to
develop your own floating point code.
GEN-960 SYSTEM GENERATOR
The 80960 System Generator (GEN-960) helps
you set up data structures for standalone,
embedded applications that use the on-chip
features of the 80960 architecture. GEN-960 is
used with other 80960 tools to generate and
refine ROM or RAM code. GEN-960 supplies a
set of command and template files containing
assembly code and linker control commands to
set up processor control blocks, inter-agent
communication mechanisms, system procedure
tables, and other requirements for
initialization. The result is a batch file
containing all the commands needed to
compile, assemble and link the final target
system.
• Improves engineering productivity by
automating the compilation, assembly and
linking process
• Supplies sample initialization code, reducing
programming time
• Saves engineering time by simplifying the
task of initializing each processor for on-chip
capabilities
iC-960 COMPILER
iC-960 is a highly optimizing C language
compiler for the 80960 family of
microprocessors. iC-960 supports the full C
language as described in the Kernighan and
Ritchie book, The C Programming Language
(Prentice-Hall, 1978). iC-960 includes standard
ANSI extensions to the C language and is used
in conjunction with ASM-960 for creating
object files.
The iC-960 compiler supports a number of
processor dependent optimizations including
global register allocation, constant
propagation, arithmetic identity folding,
redundant load/store elimination, strength
reduction and register allocation/scheduling of
arguments. Processor independent
optimizations include common sub-expression
elimination, folding of constant expressions,
elimination of superfluous branches, removing
unreachable code, tail recursion and procedure
incorporation.
iC-960 includes a standard C library with I/O
functions and mathematical routines. A second
library provides low level, environment-dependent routines emulating UNIX* system
calls and supplies I/O routines for the EVA-960
Software Execution Vehicle.
iC-960 also includes the following
enhancements for embedded application
development:
• Programs may be easily placed in ROM.
• Memory-mapped I/O allows high-level
language access to application-specific input
and output.
• In-line assembly simplifies the integration of
C language and assembly code for speed-critical functions.
• Floating point support produces in-line code
to take full advantage of the floating point
capability of the 80960SB, 80960KB and
80960MC.
Symbolic debugging of source code for iC-960
and ASM-960 is provided by the DB-960 Source
Level Debugger, the DBSIM960KA debugging
simulator, the DB960CADIC in-target
debugger, and the ICE960SB and ICE960KB
emulators.
DEBUGGING SIMULATOR
The DBSIM960KA simulator features an easy
to use, pulldown menu user interface combined
with an 80960SA/80960KA instruction
simulator. DBSIM960KA facilitates debugging
80960SA and 80960KA applications by
providing debugging capabilities before target
hardware is available. DBSIM960KA's
powerful, windowed, source-oriented interface
allows you to focus your efforts on finding bugs
rather than on learning and manipulating the
debug environment.
Ease of learning. Drop-down menus make the
debugger easy to learn for new or casual users.
A command line interface allows direct
command entry for solving more complex
problems, improving productivity of
knowledgeable users.
Extensive debug modes. You can set
conditional breakpoints, pass points, and
temporary breakpoints as needed.
See into your program. Using pull-down
menus or function keys, you can browse source
and Call stacks, monitor processor registers,
view screen output, and watch the values of
variables change.
Full debug symbolics for maximum
productivity. You need not know whether a
variable is an unsigned integer, a real, or a
structure: the debugger displays program
variables in their respective type formats.
EVA-960KB4MB SOFTWARE
EXECUTION VEHICLE
The EVA-960KB4MB is a software execution
vehicle for the 80960KA/KB microprocessor. It
is a single PC AT plug-in board which provides
easy and convenient architecture evaluation
and benchmarking, as well as software
development. Since the board uses an
80960KB, 80960SA and 80960SB performance
can be extrapolated. The EVA-960KB4MB
contains the following:
• 4 MB or 16 MB (EVA960KB16MB) of one
wait-state program memory (DRAM)
• 64 Kbytes of zero wait-state program
memory (SRAM)
• Three-channel programmable interval timer
SOURCE-LEVEL DEBUGGER
The DB-960 Debugger with source-level debug
capabilities is available for PC ATs equipped
with DOS. DB-960 can debug 80960 code
executing on an Intel EVA-960 Software
Execution Vehicle or on a hardware target
system via a serial interface. The EVA-960
targeted debugger uses I/O resources provided
by the PC, while 80960 code executes at high
speed on the EVA-960. Two serial versions of
DB-960 are available. DB-960CADIC plugs
directly into the 80960CA socket on your
prototype, offering a "plug-in and go" debug
environment. DB-960D is a serial, retargetable
version of DB-960 whose system debug monitor
can be customized for 80960SA/SB, 80960KA/KB,
or 80960CA operation.
Ease of learning. Drop-down menus make the
debugger easy to learn for new or casual users.
A command line interface allows direct
command entry for solving more complex
problems, improving productivity of
knowledgeable users.
• Hosted debug monitor which supports two
hardware and 64 software breakpoints,
single-step program execution, register and
memory access, program download and
upload
• DOS access libraries that allow: screen
display, keyboard input, read and write disk
files, and the ability to spawn a DOS process
that could communicate with serial or
parallel I/O
• 20 MHz operation, allowing software to
operate at full speed of 80960KB
EVA-960KB4MB also operates with the DB-960 Source Level Debugger for code
development/debug prior to target system
availability.
Extensive debug modes. You can set
conditional breakpoints, pass points, and
temporary breakpoints as needed.
See into your program. Using pull-down
menus or function keys, you can browse source
and Call stacks, monitor processor registers,
view screen output, and watch the values of
variables change.
Full debug symbolics for maximum
productivity. You need not know whether a
variable is an unsigned integer, a real, or a
structure: the debugger displays program
variables in their respective type formats.
In-Target Debug. Porting the DB960D
retargetable monitor to your target system
allows the debugger to be used in-target, thus
facilitating debugging of code dependent upon
hardware interaction.
ICE960SB IN-CIRCUIT
EMULATOR
ICE960SB is a full featured in-circuit emulator
for the 80960SA and 80960SB components. A
separate ICE probe can be purchased to
support 80960KA and 80960KB components.
ICE960SB includes:
• Full speed emulation of the 80960SA/SB
components to 16 MHz
• Complete symbolic information when used
with Intel 80960 compilers
• 1024 Frames Bus or Execution Trace with
Time-Tags
• Comprehensive break capabilities including
execution addresses, instruction type, bus
read/write/access, data values, and external
synch lines
• Qualification of break conditions based on an
8-state machine or an occurrence counter
• Fastbreaks to dynamically access memory or
variables during emulation
• Examine and modify memory and 80960
registers
• Stand-Alone Self-Test module provides
diagnostic circuitry and 256 Kbytes of
memory for software development
• Optional 2 Mbyte of relocatable expansion
memory
• Support for socketed and surface mounted 84
Pin PLCC components and surface mounted
80 Pin EIAJ components via ONCE mode
• DOS Hosting with support for RS232 and
RS422 communication links
WORLDWIDE SERVICE,
SUPPORT, AND TRAINING
To augment its development tools, Intel offers
a full array of seminars, classes, and
workshops, field application engineering
expertise, hotline technical support, and on-site service.
Intel also offers a Software Support package
which includes technical software information,
telephone support, automatic distribution of
software and documentation updates, access to
the "ToolTalk" electronic bulletin board,
"iComments" publication, remote diagnostic
software, and a development tools
troubleshooting guide.
Intel's Hardware Support package includes
technical hardware information, telephone
support, warranty on parts, labor, material,
and on-site hardware support.
80960SA/SB DEVELOPMENT
TOOLS
ASM960
Assembler package
containing the assembler,
linker/loader, macro
preprocessor, archiver,
ROM image builder, other
object file utilities, and the
80960SA/KA/CA floating
point arithmetic library.
C960
Optimizing C Compiler,
with ANSI extensions for
embedded control
applications; contains
standard STDIO libraries
and in-line assembly
capability.
GEN960
80960 System Generation
software automates the
compilation, assembly and
linking process. Simplifies
usage of 80960 sophisticated
features.
DBSIM960KA
Debugging Simulator
software emulates the
80960SA and 80960KA
instruction set allowing
code development and
debugging prior to
hardware prototype
availability.
DB960KBDEVA
Source Level Debugger
software for the 80960KB/KA
with powerful debug
capabilities including
conditional breakpoints,
source and Call stack
browsing, memory/register
display and modification,
and ability to watch
variables change value.
Requires EVA-960KB4MB
Software Execution
Vehicle. For PC AT hosted
systems only.
DB960D
Source Level Debugger
software for 80960SA/SB,
80960KA/KB, or CA
processors resident on
serially-interfaced
hardware prototype
systems. Includes
customizable system debug
monitor and serial interface
protocol specifications. For
PC AT hosted systems only.
EVA960KB4MB
Software Execution Vehicle
for 80960SA/SB and
80960KA/KB components.
Includes 4 Mbyte of on-board memory, system
debug monitor and code
download software. Code
compatible with the
80960SA/SB components.
Required by
DB960KBDEVA.
EVA960KB16MB Identical to
EVA960KB4MB with
16 Mbyte of DRAM instead
of 4 Mbyte.
ICE960SB
In-Circuit emulator for the
80960SA/SB components.
Includes ICE base and
probe, stand-alone self-test
module, and your choice of
PLCC or PQFP target
adapters. Optional 2 Mbyte
relocatable expansion
memory option provides
overlayable memory for
software prototyping and
hardware debugging.
ARCHITECTURE EVALUATION
STARTER KITS
960SKit3 Contains ASM960D Assembler
and iC960D Compiler
DB960KIT2 Kit contains DB-960KBDEVA
(KB version of DB-960 used with
EVA-960), EVA960KB4MB
Software Execution Vehicle,
ASM960D and C960E. Requires
PC AT with 640K memory.
DB960KIT3 Kit contains DB-960D (serial
version of DB-960 supporting the
80960SA/SB, 80960KA/KB and
80960CA components, operating
on PC-AT/DOS), ASM960D and
C960D. Requires PC AT with
640K memory.
Product Code to order, by Host
Product
Category
PC-AT/DOS
Assembler
C Compiler
System Gen
SX Debugger
KX Debugger
ASM960D
ASM960S
C960D
C960S
GEN960D
DB960D
DB960KBDEVA
DB960D
CA Debugger
UNIX-386
V.4
-
DB960CADIC
DB960D
SA Simulator
CA Simulator SIM960CAD
ICE960SB
ICE960SB
ICE960KB
ICE960KB
-
OS/2
ASM960P ASM960U
C960P
C960U
GEN960U
-
-
-
-
-
-
-
-'-
-
DBSIM960KAS
-
Sun 3/
UNIX
HP9000/
HP-UX
VAX/
ULTRIX
ASM960H
C960H
GEN960H
ASM960VX ASM960MX
C960VX
C960VX
-
-
-
-
-
-
-
SIM960CAU SIM960CAH
-
-
-
-
-
-
",VAX/
ULTRIX
-
-
-
-
-
-
-
-
-
-
-
-
ICE™-960SB AND ICE-960KB
IN-CIRCUIT EMULATOR
280852-1
INTERCHANGEABLE PROBES
The ICE™-960 in-circuit emulator delivers real-time hardware and software debugging
capabilities for i960™ SA/SB and i960 KA/KB-based designs. Features include full-speed emulation of each of the microprocessors, powerful breakpoint specification,
fastbreaks, optional relocatable expansion memory, two types of trace capability, large
trace buffering, sophisticated human interface and high-speed communication links with
the DOS host. The ICE-960 in-circuit emulator gives you unmatched control over all
phases of hardware/software debug, including developing, integrating and testing, which
improves development productivity and shortens time to market.
FEATURES
• Real-Time Emulation of the i960 KA/KB
microprocessors up to 25 MHz and
emulation of the i960 SA/SB to 16 MHz
• Full symbolic integration with Intel
ASM and C compilers
• Optional ICE960KBREM/
ICE960SBREM boards provide 2 Mbytes
of ICE memory which can overlay user
ROM or RAM
• Examine and modify memory and the
i960 registers
• Dynamically monitor and update
program variables via fastbreaks
• Breakpoint capabilities include:
execution address, instruction type, bus
read/write/access, and data value.
Qualification of events is based on an
occurrence counter and an 8-state state machine
October 1991
Order Number: 280852-003
FEATURES
• Hosted on IBM PC AT* or compatible and
supporting RS232, RS422 and Ethernet
operation
• 1024 frame trace buffer for execution and/or
bus trace and time tags
• The on-chip cache does not affect collection of
the execution trace
• 256 Kbytes of memory in standalone self-test
(SAST) unit
• Real-time bus trace with time-tags for
tracking code execution time.
• Assembly and disassembly of code in i960
instruction mnemonics
• ICE to component interconnect includes
support for surface-mounted and socketed 84-pin PLCC and surface mounted 80-pin
EIAJ QFP i960 SA/SB and 132-pin PGA for
i960KA/KB
The ICE-960 in-circuit emulator provides
emulation of the i960 SA/SB at speeds to
16 MHz and the i960 KA/KB at speeds to
25 MHz, thus providing early detection of
subtle timing problems that may arise at full
speed. Intel's intimate knowledge of the
component makes possible the tightest
conceivable conformance between timing
parameters of the emulator and the target
microprocessor.
EVENT RECOGNITION
(BREAKPOINT CONTROL) AND
EMULATION CONTROL
PROCESSOR/MEMORY
EXAMINATION AND
MODIFICATION
An optional board provides ICE-960 with 2
Mbytes of relocatable expansion memory
which allows users to develop applications
either before the target system memory is
working, or in place of ROM or EPROM to
speed the debugging cycle. This memory can be
mapped in two separate 1 Mbyte partitions on
1 Mbyte boundaries.
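The mapping constraint just described (at most two partitions, each 1 Mbyte long and starting on a 1 Mbyte boundary) can be checked mechanically before a memory map is committed. The Python sketch below is an illustration only; the function name and the returned (start, end) format are hypothetical and are not part of the ICE-960 command set.

```python
MBYTE = 1 << 20  # 1 Mbyte = 0x100000 bytes


def validate_partitions(bases):
    """Check proposed expansion-memory base addresses.

    Returns a list of inclusive (start, end) address ranges, or raises
    ValueError if the proposal breaks the constraints described above.
    """
    bases = list(bases)
    if len(bases) > 2:
        raise ValueError("at most two 1 Mbyte partitions are available")
    if len(set(bases)) != len(bases):
        raise ValueError("partitions must not share a base address")
    for base in bases:
        if base % MBYTE != 0:
            raise ValueError(f"base {base:#010x} is not on a 1 Mbyte boundary")
    # Aligned, distinct 1 Mbyte partitions can never overlap.
    return [(base, base + MBYTE - 1) for base in bases]


if __name__ == "__main__":
    for start, end in validate_partitions([0x00000000, 0x00300000]):
        print(f"{start:#010x} .. {end:#010x}")
```

A base such as 0x00080000 is rejected because it falls in the middle of a 1 Mbyte block.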
For the new ICE960KBREM board, the
memory waitstate pattern is (3,1,1,1) when the
user's system does not return RDY# for
accesses in the mapped area. For accesses
where the user system does return RDY# for
these areas, the waitstate pattern will be the
larger of (3,1,1,1) or user waitstate pattern plus
(2,2,2,2). For either board, the board is identical
in size and shape to the ICE probe and is
installed between the probe and the user's
target system when in use. The memory
configuration can be mapped via an ICE MAP
command.
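The wait-state rule for the ICE960KBREM board can be worked through numerically. The Python sketch below assumes that "the larger of" is applied to each of the four burst positions independently (an element-by-element maximum); that reading is an assumption, since the text does not say how the two patterns are compared. The function name is hypothetical.

```python
REM_BASE = (3, 1, 1, 1)     # pattern when the user system does not return RDY#
REM_PENALTY = (2, 2, 2, 2)  # added to the user pattern when RDY# is returned


def rem_wait_states(user_pattern=None):
    """Effective wait-state pattern for accesses in the mapped area.

    user_pattern: the user system's own pattern if it returns RDY# for
    the mapped area, or None if it does not.
    """
    if user_pattern is None:
        return REM_BASE
    adjusted = tuple(u + p for u, p in zip(user_pattern, REM_PENALTY))
    # Assumed interpretation: take the larger value position by position.
    return tuple(max(b, a) for b, a in zip(REM_BASE, adjusted))


if __name__ == "__main__":
    print(rem_wait_states())              # no RDY# from the user system
    print(rem_wait_states((0, 0, 0, 0)))  # zero wait-state user memory
    print(rem_wait_states((2, 0, 0, 0)))
```

Under this reading, zero wait-state user memory that returns RDY# is seen as (3,2,2,2) through the REM board.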
The ICE960KBREM/ICE960SBREM cards add
some constraints when used with the ICE in a
user's target system. First, users should qualify
bus drivers/buffers with DEN# in order to
eliminate potential bus conflict between the
REM board and their target memory while
The i960 registers can be accessed
mnemonically (e.g. g12, r5, fp3) with the ICE-960 emulator software. Data can be displayed
or modified in hexadecimal, decimal, octal, or
binary and by data type (byte, word, etc).
Program memory contents can be modified as
i960 assembly instruction mnemonics.
PROGRAM TRACING
The ICE-960 emulator can store 1024 frames of
program execution history processor/address/
data bus activity in the trace buffer. Each
frame of program execution contains a
discontinuity address (branch, call, return, etc)
and a time-tag. This information can be used to
reconstruct a history of the program execution.
With the execution trace option enabled, the
ICE-960 will run at less than full speed. Each
trace frame of bus cycles contains one complete
bus burst trace. Collection of trace information
is controlled by a logic analyzer type moving
trace window and by bus access type.
ICE-960 provides comprehensive event
recognition capabilities including: two
hardware and thirty-two software breakpoints
for instruction execution breakpoints, and use
of the internal debug registers to recognize
execution of certain instruction types such as
branch or call instructions. Bus analysis logic
provides recognition of external bus addresses
qualified by read, write, or access type as well
as data values. The data values may be entered
as masked values and qualified by type. Two
synchronization lines are provided for
recognition of external events. ICE-960 also
provides qualification of events based on an
occurrence counter or by a recognition
sequence of up to 8 events. Additionally,
emulation can be automatically stopped when
the trace buffer is full. Besides the ability to
execute program code at full speed between
specified points, the ICE-960 emulator provides
the capability to single-step through program
code.
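The two qualification mechanisms described above, an occurrence counter and a recognition sequence of up to 8 events, can be modeled in a few lines. The Python sketch below is a conceptual model only and does not represent the emulator's hardware or command syntax; the class names and the dictionary used to describe a bus event are hypothetical.

```python
class OccurrenceCounter:
    """Fires once a matching event has been seen `count` times."""

    def __init__(self, predicate, count):
        if count < 1:
            raise ValueError("count must be at least 1")
        self.predicate = predicate
        self.remaining = count

    def observe(self, event):
        """Return True only on the event that completes the count."""
        if self.remaining > 0 and self.predicate(event):
            self.remaining -= 1
            return self.remaining == 0
        return False


class SequenceRecognizer:
    """Fires when up to 8 event conditions have matched in order."""

    MAX_EVENTS = 8

    def __init__(self, predicates):
        predicates = list(predicates)
        if not 1 <= len(predicates) <= self.MAX_EVENTS:
            raise ValueError("a sequence holds between 1 and 8 events")
        self.predicates = predicates
        self.state = 0  # index of the next condition to wait for

    def observe(self, event):
        """Advance on a match; return True when the last condition matches."""
        if self.state < len(self.predicates) and self.predicates[self.state](event):
            self.state += 1
            return self.state == len(self.predicates)
        return False


if __name__ == "__main__":
    is_write = lambda e: e["kind"] == "write" and e["addr"] == 0x1000
    is_read = lambda e: e["kind"] == "read" and e["addr"] == 0x2000
    seq = SequenceRecognizer([is_write, is_read])
    for ev in ({"kind": "read", "addr": 0x2000},
               {"kind": "write", "addr": 0x1000},
               {"kind": "read", "addr": 0x2000}):
        print(ev["kind"], hex(ev["addr"]), "->", seq.observe(ev))
```

In this model the read from 0x2000 qualifies the break only after the write to 0x1000 has been seen, which is the kind of ordering a recognition sequence is meant to express.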
RELOCATABLE EXPANSION
MEMORY
using the ICE. Second, the 1 Mbyte partition
size cannot be reduced and may affect the
design of the user's memory subsystem. Third,
the REM boards delay the ADS# and DEN#
signals by 5 ns (typical) and delay the RDY#
signal by 4 ns (typical). Fourth, they add loading,
capacitance, and power requirements as shown
in Tables 3 and 4.
STANDALONE OPERATION
Product software can be developed and
debugged prior to and independent of
hardware availability with the Standalone Self
Test unit (SAST), which contains 256 Kbytes of
two wait-state program memory. The SAST
also provides diagnostic testing to assure full
functionality of the ICE-960 emulator.
VERSATILE AND POWERFUL
HOST SOFTWARE
ICE-960 provides an easy-to-use human
interface which utilizes color forms to
complement a powerful command set. The
software includes: an on-line help facility, a
dynamic command entry and syntax guide,
screen oriented editor, assembler and
disassembler, input/output redirection,
command piping, DOS command entry, and
the ability to customize the command set via
debug procedures and literal definitions.
ICE TO COMPONENT
INTERCONNECT SYSTEM
Using the On-Circuit Emulation (ONCE) i960
SA/SB silicon feature, ICE960SB can be used
in systems with surface-mounted i960 SA/SB
components in either PLCC or EIAJ QFP
packages. The hinge cable adapters included in
the various ICE kits and pictured to the right,
are placed directly on top of the surface
mounted i960 SA/SB device. The circuitry
necessary for the emulator to take control
from the target processor is fully supported in
the emulator. No additional circuitry is
required.
Of course, socketed i960 SA/SB
components in PLCC packages and i960 KA/KB
components in PGA packages are also
supported. Please see Figures 1, 2, 3, and 4 for
ICE Probe physical characteristics. Refer to
Table 5 for hinge cable loading and delay
characteristics.
DEBUG PROCEDURES AND
LITERALS
Debug procedures (PROCs) are user-defined
groups of ICE-960 emulator commands. They
can be stored on disk and recalled during later
debugging sessions. PROCs can be used to
simplify the process of debugging by grouping
repetitive emulator commands, which can then
be accessed by typing the name of the PROC.
Literals are user-defined abbreviations for
whole or partial ICE-960 emulator commands.
Literals are a shorthand method of
customizing the emulator commands to fit
your needs and preferences.
WORLDWIDE SERVICE,
SUPPORT, AND TRAINING
To augment its development tools, Intel offers
a full array of seminars, classes, workshops,
field application engineering expertise, hotline
technical support, and on-site service.
Intel also offers a Software Support contract
which includes technical software information,
automatic distributions of software and
documentation updates, iCOMMENTS
publication, remote diagnostic software, and a
development tools troubleshooting guide.
Intel's 90-day Hardware Support package
includes technical hardware information,
warranty on parts, labor, material, and on-site
hardware support.
Intel Development Tools also offers a 30-day,
money-back guarantee to customers who are
not satisfied after purchasing any Intel
development tool.
HIGH-SPEED HOST-TO-ICE
COMMUNICATIONS
PROTOCOLS
ICE-960 supports RS232 and RS422
communications protocols to 115.2 KBaud and
1152 KBaud respectively, depending upon the
ability of the host to support the specific rate.
Testing for these systems and the
configurations involved are described in the
following sections.
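The practical difference between the two link rates can be estimated with simple arithmetic. The Python sketch below assumes asynchronous framing of 10 bits per byte (one start bit, eight data bits, one stop bit) and ignores any protocol overhead, so it gives a lower bound rather than a measured download time; the function name and the 256 Kbyte image size are illustrative choices.

```python
BITS_PER_BYTE_ON_WIRE = 10  # assumed framing: 1 start + 8 data + 1 stop


def transfer_seconds(num_bytes, baud):
    """Lower-bound transfer time for num_bytes at the given baud rate."""
    if baud <= 0:
        raise ValueError("baud must be positive")
    return num_bytes * BITS_PER_BYTE_ON_WIRE / baud


if __name__ == "__main__":
    image = 256 * 1024  # a 256 Kbyte program image
    for name, baud in (("RS232", 115_200), ("RS422", 1_152_000)):
        print(f"{name} at {baud} baud: {transfer_seconds(image, baud):.2f} s")
```

Under these assumptions a 256 Kbyte image needs roughly 23 seconds at 115.2 KBaud and roughly 2.3 seconds at 1152 KBaud.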
SPECIFICATIONS
HOST REQUIREMENTS
IBM PC-AT (minimum requirements) with 640
KBytes of conventional memory
1 MByte of RAM (Lotus, Intel, Microsoft
expanded memory specification)
20 MByte Fixed Disk
At least one 5¼" or 3½" Floppy Disk drive
RS232 or RS422 Communication Interface
DOS Operating System (version 3.2 or 3.3)
TESTED HOST
CONFIGURATIONS
IBM PC-AT with DOS 3.3. Tested with built-in RS232 and a Quatech DS202 Asynchronous
RS422 Communications Board with 16550
Option
COMPAQ Deskpro 386* with DOS 3.3.
Tested with built-in RS232 and Quatech DS202
Asynchronous RS422 Communications Board
with 16550 Option
Systems Based on an Intel 301/302™ Box,
with DOS 3.3. Tested with built-in RS232 to
115.2 KBaud and a Quatech DS202
Asynchronous RS422 Communications Board
with 16550 Option to 1.152 MBaud
IBM Personal System/2* with DOS 4.01.
Tested with built-in RS232
REQUIRED SYSTEM
RESOURCES
The ICE-960 emulator requires the following:
a) exclusive use of the i960 SA/SB or i960 KA/KB's
on-chip debug registers and b) a
minimum of 256 bytes of target system RAM
used to flush the i960 local registers.
MECHANICAL SPECIFICATIONS
TABLE 1. ICE-960 Emulator Physical Characteristics
| Unit | Width (inches) | Width (cm) | Height (inches) | Height (cm) | Length (inches) | Length (cm) |
| Control Unit | 10.5 | 26.7 | 1.5 | 3.8 | 16.0 | 40.6 |
| Processor Module* | 3.8 | 9.6 | 1.5 | 3.8 | 5.0 | 12.7 |
| SAST | 6.0 | 15.2 | 2.0 | 5.1 | 8.0 | 20.3 |
| OIB | 3.8 | 9.6 | 0.9 | 2.3 | 5.1 | 13.0 |
| Power Supply | 2.8 | 7.1 | 4.2 | 10.7 | 11.0 | 27.9 |
| User Cable | | | | | 22.0 | 55.9 |
| Serial Cables | | | | | 12.0 ft | 3.66 m |
Weight (lbs / kg): 6.0 / 2.72; 3.5 / 1.59; 4.7 / 2.14
*Measurement includes target adaptor
Figure 1: ICE960KB25 Processor Module (side view and top view)
Figure 2: Optional Isolation Board (side view and top view)
Figure 3: ICE960SB16C Adapter (PLCC hinge cable dimensions, side view, with required clearance for surface mount components; all measurements in centimeters)
SPECIFICATIONS
ELECTRICAL SPECIFICATIONS
SYNC Line Specification
The SYNCIN line must be valid for at least one
instruction cycle because it is only sampled on
bus access boundaries. The SYNCIN line is a
standard TTL input. The SYNCOUT line is
driven by a TTL open collector with a 4.75 kΩ
pull-up resistor.
AC/DC Specifications
The Optional Isolation Board (OIB) isolates the
ICE-960 probe from an untested user target
system. When the OIB is in use, the ICE-960
AC and DC specifications differ from the i960
microprocessor as shown below. When the OIB
is not installed, the ICE-960KB timing
specifications are identical to those of the i960
component.
TABLE 2. AC Specifications with the OIB Installed
Symbol'
16 MHz
80960SB
Parameter
Min
Max
Clock Period
T2
Clock Low Time
9ns
T3
Clock High Time
9ns
T4
Clock Fall Time
10 ns
10 ns
T5
Clock Rise Time
10 ns
10 ns
T6
Output Valid Delay
A(2:3), BE#(0:1), BLAST#*
DEN#, DTR#, WR#**
A/D Lines***
40ns
33ns
40ns
33ns
AS Valid Delay (AS#)
T7
ALE# Width
T8
ALE # Valid Delay
T9
Output Float Delay
A(2:3), BE#(0:1), BLAST#*
DEN#, DTR#, WR#**
A/D Lines
125 ns
Min
T1
T6AS
32ns
Max
25 MHz
80960KB
20ns.
6ns
36ns
16ns
33ns
12ns
36ns
33ns
50ns
35ns
50ns
40ns
T10
Input Setup 1
HLDA, INT0#, INT1, INT2, INT3#
13ns
6ns
T11
Input Hold
HLDA, INT0#, INT1, INT2, INT3#
HOLD, READY#, LOCK#
10 ns
10 ns
13ns
13ns
Input Setup 2
HOLD, READY #, LOCK #
17ns
11 ns
T12
125ns
6ns
T13
Setup to ALE # Inactive
7 ns
7ns
T14
Hold after ALE #
5ns
5ns
T15
RESET# Hold
4ns
4ns
T16
RESET# Setup
4ns
4ns
T17
RESET # Width
1281 ns
820ns
*TPLH dependent on termination for KB control signals
**OIB does not float A/D bus during Tr and Ti (between bus cycles)
***Output Valid Delay for control signals after HOLD ACKNOWLEDGE is deasserted: 50 ns for 80960SB and 43 ns for 80960KB
SPECIFICATIONS
TABLE 3. ICE-960 Emulator DC Specifications
| | ICE Probe | OIB | REM | Processor Speed |
| ICE960SB | 1.4 | 0.4 | 0.5 | 16 |
| ICE960KB | 1.4 | 0.6 | 0.7 | 25 |
TARGET SYSTEM DESIGN
CONSIDERATIONS
In addition to the mechanical, power
consumption, and signal loading
considerations for the ICE probe, the following
points should be taken into account when the
target system is being designed:
1) [SA/SB/KA/KB/MC]
The AD bus should not be driven by an
external source unless DEN# is asserted.
2) [SA/SB/KA/KB/MC]
The LOCK# signal must be terminated as
recommended in the 80960SA/SB
component data sheet.
3) [SA/SB/KA/KB/MC]
To guarantee timings, the ICE requires
±5% supply voltage to the target system
(i.e., ICE probe power).
4) [SA/SB]
To ensure correct bus trace, the ICE requires
a data hold time (T11) of 4 ns.
5) [SA/SB/KA/KB/MC]
Each VCC and GND pin of the processor
must be connected to the appropriate
voltage or ground and externally strapped
close to the package.
6) [SA/SB/KA/KB/MC]
Processor no connect (N.C.) pins must be
left disconnected.
SPECIFICATIONS
TABLE 4. Additional DC Loading
| Signal | OIB IIH Max | OIB IIL Max | ICE Probe IIH Max | ICE Probe IIL Max | KBREM IIH Max | KBREM IIL Max | SBREM IIH Max | SBREM IIL Max |
| AD(0:31) | 25 µA | 25 µA | 15 µA | -15 µA | 120 µA | 0.7 mA | 20 µA | 100 µA |
| ADS# | 25 µA | 25 µA | 115 µA | -15 µA | Driven by 74AS760 w/ 4.7k Pull-Up | | 10 µA | 10 µA |
| DEN# | 25 µA | 25 µA | 115 µA | -15 µA | 150 µA | 1.7 mA | 10 µA | 10 µA |
| W/R# | 25 µA | 25 µA | 115 µA | -15 µA | | | | |
| CLK2 | 50 µA | 500 µA | 25 µA | -25 µA | 130 µA | 2.9 mA | 20 µA | 1600 µA |
| RESET | 25 µA | 250 µA | 45 µA | -750 µA | 250 µA | 0.3 mA | 10 µA | 10 µA |
| BE(0:3)# | 25 µA | 25 µA | 115 µA | -15 µA | 10 µA | 0.1 mA | 10 µA | 10 µA |
| READY# | 25 µA | 25 µA | 45 µA | -750 µA | 750 µA | 0.8 mA | 25 µA | 260 µA |
| ALE# | 25 µA | 25 µA | 15 µA | -15 µA | 20 µA | 0.5 mA | 10 µA | 1600 µA |
| DT/R# | 25 µA | 25 µA | 115 µA | -15 µA | | | | |
| INT(0:3) | 25 µA | 25 µA | 15 µA | -565 µA | | | | |
| BADAC# | 25 µA | 25 µA | 15 µA | -565 µA | | | | |
| LOCK# | 25 µA | 25 µA | 140 µA | -500 µA | | | | |
| HOLD | 25 µA | 25 µA | 45 µA | -750 µA | | | | |
| FAILURE# | 25 µA | 25 µA | 20 µA | -1 mA | | | | |
TABLE 5. 80960SB PLCC Hinge Cable Loading and Delay
| Signal Loading | 15 pF typical |
| Signal Delay | Signals from processor delayed 4 ns typical; setup and hold timings unaffected. |
ORDERING INFORMATION
| Order Code | Description |
| ICE960KB25 | ICE960 base, i960 KA/KB probe, 132-pin PGA target component interconnect, and RS232 and RS422 communication cables. (Shrink-Wrap license, Class 1) |
| ICE960SB16C | ICE960 base, i960 SA/SB probe, 84-pin PLCC surface-mount and socketed target component interconnect, and RS232 and RS422 communication cables. (Shrink-Wrap license, Class 1) |
| ICE960SB16J | ICE960 base, i960 SA/SB probe, 80-pin EIAJ surface-mount target component interconnect, and RS232 and RS422 communication cables. (Shrink-Wrap license, Class 1) |
| ADPT80EIAJ | Hinge Cable Adapter for surface-mount i960SB EIAJ QFP packages. This adapter is included in the ICE960SB16J kit. |
| ADPT84PLCC | Hinge Cable Adapter for surface-mount and socketed i960SB PLCC packages. This adapter is included in the ICE960SB16C kit. |
| ICE960SBREM | Optional 2 MByte Relocatable Expansion Memory Board for i960 SA/SB components. |
| ICE960KBREM | Optional 2 MByte Relocatable Expansion Memory Board for 80960KA/KB components. |
| PTOI960SB16 | Probe and Software to convert ICE960KB25 to ICE960SB16. An ADPT80EIAJ or ADPT84PLCC adapter kit should also be ordered with this package to support the component packaging type of your choice. (Shrink-Wrap license, Class 1) |
| PTOI960KB25 | Probe and Software to convert ICE960SB16C or ICE960SB16J to ICE960KB25. (Shrink-Wrap license, Class 1) |
ICE™-960MC IN-CIRCUIT EMULATOR
IN-CIRCUIT EMULATOR FOR THE 80960MC
MICROPROCESSOR
November 1990
Order Number: 280899-001
The ICE™-960MC In-circuit Emulator delivers real-time hardware and software
debugging capabilities for 80960MC based designs. Features include emulation of the
80960MC microprocessor, powerful breakpoint specification, fastbreaks, optional
relocatable expansion memory, two types of trace capability, large trace buffering,
support of virtual and physical component addressing modes, and sophisticated human
interface. The ICE-960MC In-circuit Emulator gives you unmatched control over all
phases of hardware/software debug, including developing, integrating and testing, which
improves development productivity and speeds time to market.
FEATURES
• Real-Time Emulation of the 80960MC
microprocessors up to 20 MHz (25 MHz
optional)
• Full Symbolic Information Relating to
Code. Data symbolics subject to some
limitations in virtual addressing mode
• Optional ICE960KBREM Board Provides
2 Mbytes of ICE Memory Which Can
Overlay User ROM or RAM
• Zero wait-state operation from user
memory
• Examine and modify Memory and the
80960 Registers
• Breakpoint Capabilities include:
Execution Address, Instruction Type,
Bus Read/Write/Access, and Data
Value. Qualification of Events is Based
on an Occurrence Counter and an 8 state
State-Machine
• Hosted on IBM PC AT or compatible
• Dynamically monitor or update program
variables or memory during emulation
with Fastbreaks
• 1024 Frame Trace Buffer for execution
and/or Bus Trace and time tags
• 256 Kbytes of Memory in Standalone
Self-Test (SAST) Unit
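The event qualification named in the feature list (an occurrence counter and a state machine of up to 8 states, with data values that may be masked) can be pictured with a small software model. This is only a conceptual sketch: the class names, the dictionary-based event format, and the helper functions are assumptions made for illustration and do not describe the emulator's hardware or its command syntax.

```python
from typing import Callable, Dict, List, Sequence

Event = Dict[str, object]          # hypothetical bus event record
Matcher = Callable[[Event], bool]


def bus_match(addr: int, access: str) -> Matcher:
    """Match a bus event by address and access type ('read' or 'write')."""
    return lambda e: e["addr"] == addr and e["access"] == access


def masked_data_match(value: int, mask: int) -> Matcher:
    """Match a data value, comparing only the bits selected by mask."""
    return lambda e: (int(e["data"]) & mask) == (value & mask)


class OccurrenceCounter:
    """Reports a break once a matching event has been seen `count` times."""

    def __init__(self, match: Matcher, count: int):
        if count < 1:
            raise ValueError("count must be at least 1")
        self.match = match
        self.remaining = count

    def observe(self, event: Event) -> bool:
        if self.remaining > 0 and self.match(event):
            self.remaining -= 1
        return self.remaining == 0


class SequenceRecognizer:
    """Reports a break once up to 8 matchers have fired in order."""

    MAX_STATES = 8

    def __init__(self, steps: Sequence[Matcher]):
        if not 1 <= len(steps) <= self.MAX_STATES:
            raise ValueError("sequence must have 1 to 8 steps")
        self.steps: List[Matcher] = list(steps)
        self.state = 0

    def observe(self, event: Event) -> bool:
        if self.state < len(self.steps) and self.steps[self.state](event):
            self.state += 1
        return self.state == len(self.steps)


events = [
    {"addr": 0x2000, "access": "write", "data": 0x55},
    {"addr": 0x2000, "access": "read", "data": 0x55},
    {"addr": 0x2000, "access": "write", "data": 0xA5},
    {"addr": 0x3000, "access": "write", "data": 0x00},
    {"addr": 0x2000, "access": "write", "data": 0xFF},
]

counter = OccurrenceCounter(bus_match(0x2000, "write"), count=3)
counter_hits = [counter.observe(e) for e in events]

sequence = SequenceRecognizer([bus_match(0x2000, "read"), bus_match(0x3000, "write")])
sequence_hits = [sequence.observe(e) for e in events]
```

In this model the counter reports a break on the third write to the chosen address, and the sequence recognizer reports a break only after its steps have matched in order.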
REAL-TIME EMULATION
The ICE-960MC In-circuit Emulator provides
emulation of the 80960MC at speeds up to 20
MHz (25 MHz optional), thus providing early
detection of subtle timing problems. Intel's
intimate knowledge of the component makes
possible the tightest conceivable conformance
between timing parameters of the emulator
and the target microprocessor.
EVENT RECOGNITION
(BREAKPOINT CONTROL) AND
EMULATION CONTROL
ICE-960MC provides comprehensive event
recognition capabilities including: two
hardware and thirty-two software breakpoints
for instruction execution breakpoints, and use
of the internal debug registers to recognize
execution of certain instruction types such as
branch or call instructions. Bus analysis logic
provides recognition of external bus addresses
qualified by read, write, or access type as well
as data values which may be entered as
masked values. Two synchronization lines are
provided for recognition of external events.
ICE-960MC also provides qualification of
events based on an occurrence counter or by a
recognition sequence of up to 8 events. Special
additions for the 80960MC include the ability
to recognize process binds. Additionally,
emulation can be automatically stopped when
the trace buffer is full. Besides the ability to
execute program code at full speed between
specified points, the ICE-960MC emulator
provides the capability to single-step through
program code.
PROCESSOR/MEMORY
EXAMINATION AND
MODIFICATION
The 80960MC registers can be accessed
mnemonically (e.g. g12, r5, fp3) with the
ICE-960MC emulator software. Data can be
displayed or modified in one of four bases
(hexadecimal, decimal, octal, or binary) and by
data type (byte, word, etc). Program memory
contents can be disassembled and displayed as
80960 assembly instruction mnemonics.
Additionally, 80960 assembly instruction
mnemonics can be assembled and stored into
program memory. 80960MC system data
structures such as the segment table, dispatch
port, and page tables can also be accessed and
modified mnemonically.
PROGRAM TRACING
The ICE-960MC emulator can store 1024
frames of program execution history or 1024
frames of the 80960MC address/data bus
activity in the trace buffer. Each frame of
program execution contains a discontinuity
address (branch, call, return, etc) and a
time tag. This information can be used to
reconstruct a history of the program execution.
With the execution trace option enabled, the
ICE-960MC will run at less than full speed.
Each trace frame of bus cycles contains one
complete bus burst trace. Collection of trace
information is controlled by a logic analyzer
type moving trace window and by bus access
type.
RELOCATABLE EXPANSION
MEMORY
An optional board provides ICE-960MC with 2
Mbytes of relocatable expansion memory
which allows users to develop applications
either before the target system memory is
working, or in place of ROM or EPROM to
speed the debugging cycle. This memory can be
mapped in two separate 1 Mbyte partitions on
1 Mbyte boundaries. The memory waitstate
pattern is (3,1,1,1) when the user's system does
not return RDY# for accesses directed to the
ICE960KBREM board. For accesses where the
user system does return RDY#, the waitstate
pattern will be the larger of (3,1,1,1) or user
waitstate pattern plus (2,2,2,2). The size and
shape of the board is identical to the ICE probe
and is installed between the probe and the
user's target system when in use. The memory
configuration can be mapped via either an ICE
MAP command or via switches on the
ICE960KBREM board.
The ICE-960KBREM card adds some
constraints when used with the ICE in a user's
target system. First, users should qualify bus
drivers/buffers with DEN# in order to
eliminate potential bus conflict between
REM960 and their target memory. Second, the
1 Mbyte partition size cannot be reduced and
may affect the design of the user's memory
subsystem. Third, ICE960KBREM delays the
ADS# and DEN# signals by 5 nsec (typical)
and delays the RDY# signal by 2 nsec (typical).
Fourth, it adds loading, capacitance, and power
requirements as shown in tables 3 and 4.
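The waitstate rule for the expansion memory can be worked through numerically. The sketch below assumes that "the larger of" is applied element by element across the four words of a burst, which the text does not state explicitly; the function and constant names are illustrative only.

```python
from typing import Optional, Tuple

Pattern = Tuple[int, int, int, int]

REM_BASE: Pattern = (3, 1, 1, 1)     # pattern when the user system does not return RDY#
REM_PENALTY: Pattern = (2, 2, 2, 2)  # added to the user's pattern when it does


def rem_wait_states(user_pattern: Optional[Pattern] = None) -> Pattern:
    """Effective waitstate pattern for an access directed to the REM board.

    Assumption: the comparison is made per burst word (element-wise).
    """
    if user_pattern is None:
        return REM_BASE
    adjusted = tuple(u + p for u, p in zip(user_pattern, REM_PENALTY))
    return tuple(max(b, a) for b, a in zip(REM_BASE, adjusted))


print(rem_wait_states())              # user system does not return RDY#
print(rem_wait_states((0, 0, 0, 0)))  # zero-waitstate user system returning RDY#
print(rem_wait_states((2, 1, 1, 1)))  # slower user system returning RDY#
```

Under the element-wise reading, even a zero-waitstate user system that returns RDY# sees at least (3,2,2,2) on accesses directed to the board.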
STANDALONE OPERATION
Product software can be developed and
debugged prior to and independent of
hardware availability with the Standalone Self
Test unit (SAST), which contains 256 Kbytes of
two wait-state program memory. The SAST
also provides diagnostic testing to assure full
functionality of the ICE-960MC emulator.
VERSATILE AND POWERFUL
HOST SOFTWARE
ICE-960MC provides an easy-to-use human
interface which utilizes color and pull-down
menus to complement a powerful command
set. The software includes: an on-line help
facility, a dynamic command entry and syntax
guide, screen oriented editor, assembler and
disassembler, input/output redirection,
command piping, DOS command entry, and
the ability to customize the command set via
debug procedures and literal definitions.
Special software commands are provided to
display, interpret, and modify the 80960MC
hardware data structures including the
segment table, dispatch port, process control
block, and the page tables and directories.
DEBUG PROCEDURES AND
LITERALS
Debug procedures (PROCs) are user-defined
groups of ICE-960MC emulator commands.
They can be stored on disk and recalled during
later debugging sessions. PROCs can be used to
simplify the process of debugging by grouping
repetitive emulator commands, which can then
be accessed by typing the name of the PROC.
Literals are user-defined abbreviations for
whole or partial ICE-960MC emulator
commands. Literals are a shorthand method of
customizing the emulator commands to fit
your needs and preferences.
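The relationship between literals and PROCs can be modeled in a few lines. This is a conceptual sketch only: the command strings, the literal names, and the token-based expansion rule are invented for illustration and are not ICE-960MC command syntax.

```python
from typing import Dict, List


def expand_literals(line: str, literals: Dict[str, str]) -> str:
    """Replace each whitespace-separated token that names a literal."""
    return " ".join(literals.get(token, token) for token in line.split())


def expand_proc(name: str, procs: Dict[str, List[str]], literals: Dict[str, str]) -> List[str]:
    """Return the fully expanded command list for a named PROC."""
    return [expand_literals(command, literals) for command in procs[name]]


# Hypothetical abbreviations and a hypothetical PROC grouping repetitive commands.
literals = {"dm": "display memory", "sr": "show registers"}
procs = {"snapshot": ["dm 0x1000", "sr"]}

expanded = expand_proc("snapshot", procs, literals)
print(expanded)
```

Typing the PROC name then stands in for the whole command group, while each literal stands in for part of a single command.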
WORLDWIDE SERVICE,
SUPPORT, AND TRAINING
To augment its development tools, Intel offers
a full array of seminars, classes, and
workshops, field application engineering
expertise, hotline technical support, and
on-site service.
Intel also offers a Software Support package
which includes technical software information,
telephone support, automatic distribution of
software and documentation updates, access to
the "ToolTalk" electronic bulletin board,
"iComments" publication, remote diagnostic
software, and a development tools
troubleshooting guide.
Intel's Hardware Support package includes
technical hardware information, telephone
support, warranty on parts, labor, material,
and on-site hardware support.
SPECIFICATIONS
HOST REQUIREMENTS
IBM PC AT (minimum requirements) with 640
KB of conventional memory
• 1 MB of RAM (Lotus, Intel, Microsoft
expanded memory specification)
• 20 MB Fixed Disk
• At least one 5¼" Floppy Disk drive
• A serial interface
• DOS Operating system (version 3.2 or later
excluding 4.x)
REQUIRED SYSTEM
RESOURCES
The ICE-960MC emulator requires the
following: a) exclusive use of the 80960MC's
on-chip debug registers and b) a minimum of 256
bytes of target system RAM used to flush the
80960 local registers.
Mechanical Specifications
TABLE 1. ICE-960MC Emulator Physical Characteristics
| Unit | Width (inches) | Width (cm) | Height (inches) | Height (cm) | Length (inches) | Length (cm) |
| Control unit | 10.5 | 26.7 | 1.5 | 3.8 | 16.0 | 40.6 |
| Processor module† | 3.8 | 9.6 | 1.5 | 3.8 | 5.0 | 12.7 |
| SAST | 6.0 | 15.2 | 2.0 | 5.1 | 8.0 | 20.3 |
| OIB | 3.8 | 9.6 | 0.9 | 2.3 | 5.1 | 13.0 |
| Power supply | 2.8 | 7.1 | 4.2 | 10.7 | 11.0 | 27.9 |
| User cable | | | | | 22.0 | 55.9 |
| Serial cable | | | | | 12.0 ft | 3.66 m |
Weight (lbs / kg): 6.0 / 2.72; 3.5 / 1.59; 4.7 / 2.14
†Measurement includes target adaptor
Figure 1: Processor Module (side view and top view)
Figure 2: Optional Isolation Board (side view and top view)
SPECIFICATIONS
ELECTRICAL SPECIFICATIONS
SYNC Line Specification
The SYNCIN line must be valid for at least one
instruction cycle because it is only sampled on
instruction boundaries. The SYNCIN line is a
standard TTL input. The SYNCOUT line is
driven by a TTL open collector with a
4.75 Kohm pull-up resistor.
AC/DC Specifications
The Optional Isolation Board (OIB) isolates the
ICE-960MC probe from an untested user target
system. When the OIB is in use, the
ICE-960MC AC and DC specifications differ from
the 80960MC microprocessor as shown below.
When the OIB is not installed, the ICE-960MC
specifications are identical to those of the
80960MC component.
TABLE 2. AC Specifications With The OIB Installed
| Symbol* | Parameter | Minimum | Maximum |
| t2 | Clock low time | t2 + 1 ns | |
| t3 | Clock high time | t3 + 1 ns | |
| t6 | Output valid delay: A/D 0:31 | t6 + 8 ns | t6 + 16 ns |
| | DT/R#, DEN#, BE0-3#, ADS#, W/R# | t6 + 7 ns | t6 + 14 ns |
| | HLDA, CACHE, LOCK#, INTA# | t6 + 6 ns | t6 + 8 ns |
| | ALE# | t6 + 10 ns | t6 + 20 ns |
| t7 | ALE# width | t7 - 6.5 ns | |
| t8 | ALE# disable delay | t8 + [illegible] ns | t8 + 14 ns |
| t9 | Output float delay: A/D 0:31 | t9 + 5 ns | t9 + 22 ns |
| | DT/R#, DEN#, BE0-3#, ADS#, W/R# | t9 + 7 ns | t9 + 15 ns |
| | HLDA, CACHE, LOCK#, INTA# | t9 + 6 ns | t9 + 8 ns |
| t10 | Input setup 1: A/D 0:31 | t10 + 2 ns | |
| | BADAC#, INT0-3# deassertion | t10 + 14 ns | |
| t11 | Input hold: A/D 0:31, HOLD | t11 + 6 ns | |
| | BADAC#, INT0-3#, READY# | t11 + 7 ns | |
| t16 | Reset setup time | t16 + 6 ns | |
*Symbol refers to 80960MC specification
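Because the table expresses each limit as an offset from the corresponding 80960MC component specification, a target timing budget with the OIB installed can be derived by simple addition. The sketch below uses four offsets that appear in the table (t2 + 1, t3 + 1, t7 - 6.5, t16 + 6); the component values passed in are placeholders invented for the example and must be replaced with figures from the 80960MC data sheet.

```python
# Offsets in nanoseconds, as read from the table above.
OIB_OFFSET_NS = {
    "t2": +1.0,   # clock low time
    "t3": +1.0,   # clock high time
    "t7": -6.5,   # ALE# width
    "t16": +6.0,  # reset setup time
}


def with_oib(symbol: str, component_ns: float) -> float:
    """Apply the OIB offset to a component specification value."""
    return component_ns + OIB_OFFSET_NS[symbol]


# Placeholder component values, NOT taken from any data sheet.
component_spec_ns = {"t2": 8.0, "t3": 8.0, "t7": 20.0, "t16": 10.0}

oib_spec_ns = {sym: with_oib(sym, ns) for sym, ns in component_spec_ns.items()}
for sym, ns in oib_spec_ns.items():
    print(f"{sym}: {component_spec_ns[sym]} ns -> {ns} ns with OIB")
```

Note that the ALE# width offset is negative, so the pulse seen by the target is narrower with the OIB installed than the component specification alone would suggest.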
TABLE 3. ICE-960MC Emulator DC Specifications
| Symbol | Parameter | Maximum |
| PM-Icc | Supply current with 80960KB-20 | 1400 mA |
| OIB-Icc | Supply current | PM-Icc + 1100 mA |
| REM-Icc | Supply current | PM-Icc + 1300 mA (1700 Total Typical) |
SPECIFICATIONS
TABLE 4. Additional DC Loading
Signal
AD (0:31)
ADS#
DEN #
W/R#
CLK2
RESET
BE (0:3)#
READY #
ALE#
DT/R#
INTO # , INT3 #
INT1,INT2
BADAC#
LOCK#
HOLD
FAILURE #
(with
(with
(without
REM installed)
OIB installed)
OIB installed)
Iih
Iil
Iih
Iil
Iih
Iil
Maximum Maximum Maximum Maximum Maximum Maximum
100uA
140uA
40uA
140uA
80uA
0.6 mA
1.6 mA
1.0 mA
1.6 mA
2.2 mA
20uA
20uA
20uA
20uA
50uA
50uA
20uA
20uA
20uA
20uA
20uA
20uA
20uA
20uA
20uA
20uA
-1mA
-1mA
-1mA
-1mA
-2mA
-2mA
-1mA
-1mA
-1mA
-1mA
-1mA
-1mA
-1mA
-1mA
-1mA
-1mA
120uA
0.7 mA
Driven by 74AS760
w/ 4.7k pull-up
150uA
1.7 mA
130uA
2.9 mA
250uA
0.3 mA
10uA
0.1 mA
750uA
0.8 mA
20uA
0.5 mA
SPECIFICATIONS
POWER SUPPLY
100-120V or 220-240V (Selectable)
50-60 Hz
2 amps (AC Max) @ 120V
1 amp (AC Max) @ 240V
ENVIRONMENTAL
CHARACTERISTICS
Operating Temperature: 10°C to 40°C (50°F to 104°F)
Operating Humidity: Maximum 85% Relative Humidity, non-condensing
ORDERING INFORMATION
| Order Code | Description |
| ICE960MC | The complete 20 MHz ICE960MC emulator system including control unit, processor module, power supply, SAST, OIB, SAB, serial communications cable (SCOM4), IEDIT, V1.0 software. (Requires software license, Class I) |
| ICE960MC25P | 25 MHz ICE960MC as described above |
| I960MCUPG | Conversion kit to convert ICE-960KB to ICE-960MC. Consists of new host and probe software, probe firmware, and manual. Requires ICE-960KB V2.0 hardware. |
| ICE960KBREM | Optional 2 Mbyte Relocatable Expansion Memory Board. |
QT960 EVALUATION AND PROTOTYPING
BOARD
270743·1
LOW COST EVALUATION TOOL
The QT960 products give you a 32-bit starter kit to begin software evaluation and
hardware design at a low cost. The boards feature the 20 MHz 80960KB 32-bit embedded
processor. The 80960KB has integrated floating point, instruction and register caches,
and an on-chip interrupt controller. The 80960K-series are the first in a new
architectural family of embedded processors from Intel built using Intel's CHMOS IV†
process. These boards provide you with full access to the features of the 80960KB
processor. A wire wrap prototyping area offers you easy access to board features to test
your designs. Interleaved EPROM means fast execution of your code taking advantage of
the 80960KB's burst bus. A programmable wait state generator simulates different
memory environments useful in evaluating the performance of your code. These features
make the QT960 boards useful low cost tools for the 32-bit embedded designer.
Once written, you can debug your program with NINDY, an EPROM resident debug
monitor. NINDY enables you to download code, set seven different trace modes, display
and modify memory or registers, and disassemble problem code sequences.
Available separately from Intel are the ASM-960 (assembly language) and iC-960
(high-level language) products which provide you with the code development environment for
the QT960 boards.
The starter kit comes in two versions: the QT960F version has fast SRAM, high speed
EPROM and Flash memory; the QT960E version has lower cost SRAM, Flash memory
and no high speed EPROM. Each version has NINDY in either EPROM (QT960F) or
Flash memory (QT960E), power supply cable, and the QT960 User Manual. Both versions
also include the parts list, source code of the debug monitor, and the board data base
(schematics) all on diskette. Armed with this starter kit you now have a system to
evaluate and prototype your product ideas quickly and at low cost.
November 1990
Order Number: 270743·002
FEATURES
QT960 FEATURES
• 20 MHz Execution Speed
• 128K Bytes of Zero Wait State EPROM*
• 128K Bytes of Flash Memory
• 128K Bytes of Zero Wait State SRAM*
• Programmable Wait State Generator
• Prototyping Wire Wrap Area
• Five Instruction Traces
• Two Hardware Breakpoints
• Display/Modify Memory and Registers
• Code Disassembly
• High Level Language Support
• RS-232 Communications Link
• The QT960E Version has 128K Bytes of Two
Wait State SRAM and 128K Bytes of Four Wait
State Flash Memory
Product Order Codes: EVQT960F20 and EVQT960E20
†CHMOS IV is a patented Intel process.
*QT960F Version only.
FAST AND EASY CODE UPDATES
128K Bytes of Intel's 28F256 Flash memory provides an easy and quick method of changing your
code in nonvolatile memory. Flash memory may be conveniently reprogrammed without
removing it from the board while software is under development.
FAST EPROM
Interleaved fast EPROM (Intel's 27C202) on the QT960F version yields one-zero-zero-zero wait
state code access. It efficiently utilizes the four word burst capabilities of the 80960KB bus
maximizing program performance.
PROTOTYPING SUPPORT
A prototyping wire wrap area is provided on board with access to the system's signals and buses.
This area gives you access to the board's features and allows you to easily test design ideas. A
system bus connector is also provided for off board prototyping.
PROGRAMMABLE WAIT STATE GENERATOR
A software programmable wait state generator enables you to quickly model various memory
speeds. Under software control you can set over 16 different wait state combinations and evaluate
the performance of your target system.
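As a rough way to reason about what a wait state setting costs, the sketch below counts clocks for one four-word burst under a simplified model: one address cycle, then one data cycle per word plus that word's wait states. The model and the function name are assumptions made for illustration (recovery cycles and other bus details are ignored), so treat the results as relative comparisons rather than 80960KB timing figures.

```python
from typing import Sequence


def burst_clocks(wait_states: Sequence[int], address_cycles: int = 1) -> int:
    """Clocks for one burst: address cycle(s) plus (1 + waits) per word."""
    if any(w < 0 for w in wait_states):
        raise ValueError("wait states cannot be negative")
    return address_cycles + sum(1 + w for w in wait_states)


fast_eprom = burst_clocks((1, 0, 0, 0))  # the 1-0-0-0 pattern quoted for the fast EPROM
zero_wait = burst_clocks((0, 0, 0, 0))   # zero wait state memory
slow_memory = burst_clocks((3, 1, 1, 1)) # an example of a slower memory setting

print(fast_eprom, zero_wait, slow_memory)
```

Stepping the generator through several patterns and comparing counts like these against measured run time is one way to see how sensitive a given program is to memory speed.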
DMA
The board offers you eight DMA channels accessed through a NINDY library function using
Intel's 82380. In addition, off board connectors provide DMA I/O capabilities.
FIVE INSTRUCTION TRACES AND TWO HARDWARE
BREAKPOINTS
NINDY utilizes the built-in trace capabilities of the 80960KB to provide you with single-step,
supervisor, call, return, and branch instruction tracing, offering you extensive debug capabilities
for software examination and modification. Two hardware breakpoints enable you to break on
and examine EPROM resident code.
HIGH LEVEL LANGUAGE SUPPORT
NINDY is capable of downloading absolute object code generated by ASM-960 or iC-960. ASM-960
and iC-960 may be purchased separately from Intel.
COMMUNICATION AND SOFTWARE REQUIREMENTS
The QT960 boards communicate with the host through the RS-232 link using an Intel 82510
UART provided on board. The boards support five baud rates: 1200, 2400, 9600, 19200, and 38400.
The default is 9600 baud. To communicate with the QT960 boards you must meet the following
minimum software requirements:
• Terminal Emulator
• XMODEM Download Capabilities
Block Diagram of the QT960 Board (270743-2): 80960KB CPU, clock and reset, wait state generator, address latches, EPROM (128K bytes), Flash memory (128K bytes), SRAM (128K bytes), 82380 system support peripheral, 82510 serial controller, wire wrap pins, and off board connector.
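For readers unfamiliar with the XMODEM requirement, the sketch below builds one packet in the classic checksum variant of the protocol: a start-of-header byte, a block number and its complement, 128 data bytes, and a one-byte arithmetic checksum. Which XMODEM variant NINDY expects (checksum or CRC) is not stated in this document, so the choice here is an assumption, and the function name is illustrative.

```python
SOH = 0x01        # start-of-header byte for 128-byte blocks
PAD = 0x1A        # conventional padding byte (CTRL-Z)
BLOCK_SIZE = 128  # data bytes per classic XMODEM block


def xmodem_packet(block_number: int, payload: bytes) -> bytes:
    """Build one checksum-variant XMODEM packet (132 bytes)."""
    if len(payload) > BLOCK_SIZE:
        raise ValueError("payload longer than one block")
    data = payload.ljust(BLOCK_SIZE, bytes([PAD]))  # pad short final blocks
    seq = block_number & 0xFF                       # block numbers wrap at 256
    checksum = sum(data) & 0xFF                     # 8-bit arithmetic sum
    return bytes([SOH, seq, 0xFF - seq]) + data + bytes([checksum])


packet = xmodem_packet(1, b"hello")
print(len(packet), packet[:3].hex())
```

A terminal emulator with XMODEM support performs this framing, plus the acknowledge and retry handshake, on the host side of the download.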
For information or the number of your nearest sales office call BOO·54S-4 752 (U.S. and Canada).
Intel Corporation, Literature Department, 3065 Bowers Avenue, Santa Clara CA 95051, United States. Tel: 408-987-8080.
DB960CADIC IN-CIRCUIT DEBUG MONITOR
280900-1
DB960CADIC
Intel's DB960CADIC, the in-circuit debug monitor for the 33 MHz i960CA embedded
microprocessor, represents a new generation of development tool technology.
DB960CADIC allows users to debug high-speed, cached applications at the full speed of
the i960CA target processor. Controlled by Intel's DB interface, DB960CADIC offers the
user a tool with a powerful feature set at a fraction of the cost of traditional development
tools. DB960CADIC is designed to improve productivity by allowing the user to debug
software before and after the target system arrives, with minimal hardware intrusion.
Features
• Real-time emulation of the i960CA
embedded microprocessor at speeds up to
33 MHz
• Full development and debug support for
i960CA on-chip cache and RAM
• Minimal intrusive operation, allowing
the user to debug the target system with
minimal modification subject to initial
design constraints
• Breakpoint capabilities include ten
software breakpoints, two hardware
execution address breakpoints, and two
hardware data address breakpoints. The
human interface supplements these
breakpoints with the ability to break on
data values, conditions, and a four-state
state machine in non-real time.
• Low-Cost
• Source-Level, Symbolic Debugging in a
Windowed Human Interface with pull
down Menus (DB). This interface is
consistent across i960CA tools.
• 128K Bytes User Memory
• Virtual I/O, the ability to perform I/O
between the DB960CADIC unit and the
host
• In-Circuit operation facilitates easy
transition between target systems
• Optional Stand-Alone Self-Test
(DB960CASAST) Module
• Optional Logic Analyzer Interface Board
(LAI960CA)
June 1991
Order Number: 280900-002
Full-Speed Debug and
Development
The DB960CADIC In-Circuit Debug Monitor
provides sophisticated real-time hardware and
software debug capabilities for i960CA
embedded microprocessor-based designs. The
user can run at the full speed of the target
processor, ensuring that elusive timing bugs
will be found. The DB960CADIC is jumpered to
receive a clock pulse from either the user's
target system, or from an internal 25 MHz
clock.
Ideal for All Stages of
Development
DB960CADIC can be used by both hardware
and software developers, at any stage of design.
Early in the development process,
DB960CADIC allows software debugging when
inserted into an existing i960CA board such as
the DB960CASAST module or the EV80960CA
board. Later in the design cycle, DB960CADIC
can be inserted into the user's target system,
thus facilitating debug of hardware/software
integration.
Speed Development with Source
Code, Symbolic Debugging
Using source code oriented debugging in a
windowed, symbolic interface, software
engineers can increase productivity by
debugging in the medium they are familiar
with, software source.
Commands can be entered via either function
keys, pull-down menus which group logically
related commands, or a supplementary
command line which allows entry of complex
conditions. In addition, source code symbolics
can be used to examine and modify memory
and registers. Optimal symbolic debugging can
be achieved when using DB960CADIC with
genuine Intel languages.
Powerful Break Capabilities
DB960CADIC provides complex emulation
control by utilizing the on-chip debug registers
within the i960CA. Real-time break
capabilities include the ability to break on any
two execution addresses or data access
addresses in hardware. Software breakpoints
are also used to supplement the hardware
breakpoints for RAM-based memory
subsystems. DB960CADIC extends these
capabilities by providing the ability to break
on data values, NOT data values, or
combinations of the above in a four-state state
machine. More complex conditions such as
breaking when a variable is less than a certain
value can be entered via a very flexible feature
called conditional breakpoints.
128K Bytes User Memory
DB960CADIC provides the user with 128K
bytes of memory in Region F of the i960CA
target space. Since the debug monitor is also
placed in Region F, the on-chip bus interface
unit of the i960CA is configured to address
Region F as byte-wide memory with 5
waitstates and no burst accesses allowed.
Virtual Input/Output
DB960CADIC is shipped with documented
library calls which provide users with a
built-in mechanism of performing target I/O using
the host system. These libraries provide the
ability to simulate I/O operations in the target
system before target hardware is available.
High Speed Serial Link
Communication between a host and the
DB960CADIC module is supported via RS232
and RS422 communication links. RS232 allows
access to industry standard serial protocols
while the RS422 link provides a higher speed
communication mechanism currently
emerging in the development market. PC/AT
compatible RS422 communication boards are
available from various third party vendors.
Optional Stand-Alone Self Test
Chassis
An optional stand-alone self test chassis
complements DB960CADIC by allowing the
user to debug and test code before prototype
hardware is available. The DB960CASAST
includes self-test circuitry to ensure that the
DB960CADIC unit is working correctly. It also
provides 4 Megabytes of DRAM to be used for
developing applications. This memory has a
(3,1,1,1) waitstate pattern at 25 MHz. This
waitstate pattern is programmable using the
bus controller unit in the i960CA. It also
includes an 8254 programmable timer which
can optionally interrupt the i960CA processor
and provide the ability to time code sequences.
Optional Logic Analyzer Interface
Board
The LAI960CA board provides access to
i960CA pins by routing the signals to easily
accessible stake pins while passing them
through to the target system.
Software Completes the System
Intel provides a comprehensive software
development environment to complement
DB960CADIC. This environment includes C
and ASM source languages, a retargetable
debug monitor, and DB960CADIC. The
languages support the entire range of 80960
embedded processors.
Worldwide Service, Support, and
Training
To augment its development tools, Intel offers
a full array of seminars, classes, workshops,
field application engineering expertise, hotline
technical support, and on-site service.
Intel also offers a Software Support contract
which includes technical software information,
telephone support, automatic distributions of
software and documentation updates,
iCOMMENTS publication, remote diagnostic
software, and a development tools
troubleshooting guide.
Intel's 90-day Hardware Support package
includes technical hardware information,
telephone support, warranty on parts, labor,
material, and on-site hardware support.
Intel Development Tools also offers a 30-day,
money-back guarantee to customers who are
not satisfied after purchasing any Intel
development tool.
DB960CADIC SPECIFICATIONS AND REQUIREMENTS
Host System Requirements
Host system requirements to run the in-circuit
debugger include the following:
- DOS version 3.2 or later excluding DOS 4.0
- 640K bytes of RAM in conventional memory
- A 20 MB hard disk
- An RS232 or RS422 Serial Port
Evaluated Systems include:
IBM PC-AT* with DOS 3.3
COMPAQ 386* with DOS 3.3
Intel 301/302* with DOS 3.3
IBM Personal System/2* Model 70/80 with
DOS 4.01
Environment Characteristics
Operating Temperature: +10°C to +40°C (50°F to 104°F)
Operating Humidity: Maximum of 90% relative humidity, non-condensing.
Figure 2 (280900-4): pin and signal layout drawing, with a 1.5" (4 cm) dimension marked.
Ordering Information
DB960CADIC     In-circuit debug monitor for the i960CA embedded
               microprocessor. Operates at speeds up to 33 MHz. Includes
               hardware debug module, RS232/RS422 serial cables, DOS host
               software, and documentation.
DB960CASAST    Stand-Alone Self Test Unit for DB960CADIC. Includes
               built-in power supply, self-test board, 4 Mbytes of usable
               DRAM for code development, and enclosure.
DB960CAST      DB960CADIC and DB960CASAST as described above.
LAI960CA       Optional Logic Analyzer Interface Board for the i960CA
               system. Does not require DB960CADIC.
DB960CADIC Interface Considerations
Target systems intended to receive DB960CADIC must meet the following
requirements:
• The target system must not respond to memory accesses in Region F
(0F0000000-0FFFFFFFF) with DB960CADIC installed. DB960CADIC provides an
ACTIVE out signal which can be used to qualify bus logic to prevent this
occurrence when DB960CADIC is installed.
• The target system must provide 1.3 Amps of power (worst case), 0.9 Amps
average, to power the DB960CADIC unit.
• Use of one of the nine directly accessible i960CA interrupts.
• Use of interrupt table entry 242 or 248.
• Additional signal loading as follows: the DB960CADIC makes use of the
PCLK outputs, D0 through D7, and some of the address and control signals
of the processor. The following table lists the worst case loadings added
by the presence of the DB960CADIC circuitry.
Additional Loading Imposed on the Target by the DB960CADIC
Signal Name    DC Load (µA)    Capacitive Load (pF)
PCLK1          +25/-250        8
PCLK2          +30/-255        17
CLKIN          +12/-12         13
D0:D7          +20/-600        10
A31:26         +25/-250        11
A2:A17         +20/-100        10
BE0*, BE1*     +20/-100        10
ADS*           +50/-500        13
W/R*           +50/-500        13
WAIT*          +25/-250        8
BLAST*         +25/-250        8
FAIL*          +20/-20         8
RESET*         +15/-15         25
INT0:7*        +20/-500        15
NMI*           +20/-500        15
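The Region F requirement listed under Interface Considerations can be pictured as a small address-decode rule. The sketch below is illustrative only: it assumes the quoted range reads as 0xF0000000 through 0xFFFFFFFF, it models the ACTIVE output as a boolean that is true while the DB960CADIC is installed (the real signal polarity is not stated here), and both function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Region F is assumed to be the top 256 Mbytes of the address space,
 * i.e. every address whose upper nibble (A31:A28) is 0xF. */
bool in_region_f(uint32_t addr)
{
    return (addr >> 28) == 0xFu;
}

/* Hypothetical qualification rule for target bus logic: the target may
 * answer a cycle unless the monitor is active AND the access falls in
 * Region F. monitor_active is a software stand-in for the ACTIVE output. */
bool target_may_respond(uint32_t addr, bool monitor_active)
{
    return !(monitor_active && in_region_f(addr));
}
```

With the monitor removed, the same decode leaves Region F available to the target, which is the reason for qualifying the logic with the signal rather than hard-wiring the exclusion.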
INTEL DEVELOPMENT TOOLS SOFTWARE
SERVICES
280921-1
Intel is committed to providing high quality products and customer support. Our
commitment to quality is demonstrated by a 30 day, money-back, unconditional refund to
customers not satisfied with their purchase of an Intel Development Tools product.
Intel supports its customers by offering a 90-day software warranty and standard
software support including free technical support over the phone.
Intel software is continuously undergoing improvement. For customers who desire the
security of having the most current software and the convenience of having updates sent
automatically, Intel offers inexpensive Software Support Contracts.
SOFTWARE WARRANTY
The standard software warranty is 90 days
and entitles the customer to the following
(provided the customer has registered their
software by returning a completed
Warranty Registration Card):
• Replacement of defective media
• Software product updates occurring
within the 90 day warranty period.
STANDARD SOFTWARE SUPPORT
Standard Software Support, provided at no
additional cost, offers the following
additional benefits:
• Free Technical Information Phone
Service ("TIPS")
• Timely response to Software Problem
Reports
June 1991
Order Number: 280921-001
Software Support Contracts
Software Support Contracts cover products for
one year from the date of purchase and are
renewable annually. The following benefits are
provided:
• Automatic Software Updates
• Standard Software Support
• Remote Diagnostic Software for DOS-based
products.
• Monthly issues of iCOMMENTS, a technical
support publication
• Quarterly issues of Troubleshooting Guides
(host-specific)
• Quantity discounts
ORDERING INFORMATION
Ordering Procedures
For more information, call 1-800-468-3548 or your local Intel sales office.
Similar support offerings are available outside of North America. Software
Support Contracts are available for North American customers only.
All orders for contracts, including renewals, can be submitted through the
local Intel sales office or directly to the Development Tools Operation by
calling 1-800-874-6835.
To order a Software Support Contract, a customer must have registered their
product or provide proof of ownership. Customers must also have the most
current version of the software; otherwise, they must order a product
upgrade before a support contract may be purchased.
Pricing is a percentage of the List Price, based on the number of copies
covered by the Software Support Contract. For emulators, the percentage
will be applied to the identified list price of the software portion only,
not the full list price of the emulator.
Pricing Information
Quantity discounts are:
product quantity    pricing per copy
1-10 copies         20% of List Price
11-25 copies        15% of List Price
26+ copies          10% of List Price
VAX and MicroVAX software not included. Please call 1-800-874-6835 for
price quote.
Ordering Information
order code       description
SWSUPPORT51      Software Support Contract for 51 family
SWSUPPORT86      Software Support Contract for 86 family
SWSUPPORT96      Software Support Contract for 96 family
SWSUPPORT286     Software Support Contract for 286 family
SWSUPPORT386     Software Support Contract for 386 family
SWSUPPORT486     Software Support Contract for 486 family
SWSUPPORT960     Software Support Contract for 960 family
iRMK™ 960
REAL-TIME KERNEL
• 32-Bit Real-Time Multi-Tasking Kernel for the i960™ Microprocessor Family
• Requires Only an i960 KA, KB or MC Embedded Processor
• Flexible, Modular Design to Ease System Integration
• Bus Independent
• Easy Customization and Add-On Enhancements
• Fast Execution with Predictable Response Time for Time-Critical
Applications
• Easily EPROMmable
• Comprehensive Development Tool Support
• Compact Code Size (14 Kbytes Including All Optional Modules)
The iRMK 960 Real-Time Kernel is the 32-bit real-time executive developed and supported by Intel, the i960
architecture experts. The kernel is a small, fast and highly modular package of system control software. It
contains the basic software building blocks that act as the foundation in using the key features of the i960
microprocessor. The iRMK 960 software is fully supported by an array of tools that work in the most popular
development environments (i.e., DOS*, VAX/VMS*, SUN*).
The iRMK 960 Real-Time Kernel is available off-the-shelf. The kernel reduces the cost and risk of designing
and maintaining software for numerous real-time applications, such as embedded control systems and dedicated
real-time subsystems in multiprocessor environments. Use of the kernel can save man-years that might
otherwise be spent developing or porting another real-time kernel. This means reduced time to market for the
user.
*DOS® is a registered trademark of Microsoft Corporation.
VAX/VMS™ is a trademark of Digital Equipment Corporation.
SUN™ is a trademark of Sun Microsystems.
October 1991
Order Number: 281006-001
ARCHITECTURAL OVERVIEW
At the heart of the architecture are the kernel core modules consisting of a
scheduler, task manager, interrupt manager and time manager (See Figure 1).
As additional building blocks, the kernel provides optional modules
consisting of a mailbox manager, semaphore manager, memory manager,
on-processor interrupt controller manager and fault handler manager. The
optional device managers for the 82380 Integrated System Peripheral (ISP)
and 8254 Programmable Interval Timer (PIT) complete the architecture.
Figure 1. iRMK™ 960 Real-Time Architecture (281006-1). Blocks shown:
Application; Language Interface Libraries; Kernel Core Modules; Kernel
Optional Modules; User-Supplied System Routines; Kernel-Supplied Device
Managers; Hardware.
FUNCTIONAL FEATURES
A Full Set of Real-Time Building Blocks
The kernel provides a full set of services for real-time applications
including task management, time management, synchronization of and
communications between tasks, and memory pool management.
TASK MANAGEMENT
The iRMK 960 kernel uses system calls to create, manage and schedule tasks
in a multi-tasking environment. It provides pre-emptive priority scheduling
combined with optional time-slice (round robin) scheduling.
The scheduling algorithm used by the kernel enables tasks to be rescheduled
in a fixed amount of time regardless of the number of tasks. Applications
may contain any number of tasks.
An application can integrate optional task handlers to customize task
management. These handlers can execute on task creation, task switch, task
deletion and task priority change. Task handlers can be used for a wide
range of functions, including saving and restoring the state of coprocessor
registers on task switch, masking interrupts based on task priority or
implementing statistical and diagnostic monitors.
INTERRUPT MANAGEMENT
iRMK 960 interrupts are managed by immediately switching control to
user-written interrupt handlers when an interrupt occurs. Response to
interrupts is both fast and predictable. Most of the kernel's system calls
can be executed directly from interrupt handlers.
TIME MANAGEMENT
The time management features included in the kernel provide single-shot
alarms, repetitive alarms and a real-time clock. In addition, alarms can be
reset.
These time management facilities can solve a wide range of real-time
programming problems. Single-shot alarms, for example, can be used to handle
timeouts. If the timeout occurs, the alarm invokes a user-written handler;
if the event occurs before the timeout, the application simply deletes the
alarm. Other uses for the kernel's time management facilities include
polling devices with repetitive alarms, putting tasks to sleep for specified
periods of time, or implementing a time-of-day clock.
MEMORY POOL MANAGEMENT
The iRMK 960 kernel uses the concept of memory pools to efficiently divide
and manage blocks of memory. The memory pool manager provides for both fixed
and variable block allocation.
Memory can be divided into any number of pools. Multiple memory pools might
be created for different speed memories, or for allocating different size
blocks. The times to allocate and de-allocate fixed-size areas from within a
pool have a fixed upper bound.
The kernel-supplied memory manager works with a flat memory architecture.
Users can also write their own memory manager to provide different memory
management policies or support virtual memory.
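The fixed upper bound on allocating and de-allocating fixed-size areas is the property a free-list pool provides. The following is a minimal sketch of that general technique, not the kernel's memory manager: the type pool_t and the functions pool_init, pool_alloc and pool_free are hypothetical names, and the sketch assumes the caller supplies pointer-aligned memory and a block size that is a multiple of the pointer size.

```c
#include <stddef.h>

/* Hypothetical fixed-block pool; this is not the KN_create_pool interface. */
typedef struct pool {
    void  *free_list;   /* head of a singly linked list of free blocks */
    size_t block_size;  /* size of every block in bytes */
    size_t free_count;  /* number of blocks currently free */
} pool_t;

/* Carve mem (len bytes, pointer-aligned) into equal blocks and thread them
 * onto the free list. Returns the number of blocks, or 0 if block_size
 * cannot hold a link pointer or would break alignment. */
size_t pool_init(pool_t *p, void *mem, size_t len, size_t block_size)
{
    unsigned char *base = mem;
    size_t n, i;

    p->free_list = NULL;
    p->block_size = block_size;
    p->free_count = 0;
    if (block_size < sizeof(void *) || block_size % sizeof(void *) != 0)
        return 0;
    n = len / block_size;
    /* Push blocks last-to-first so the lowest address is handed out first. */
    for (i = n; i-- > 0; ) {
        void **blk = (void **)(base + i * block_size);
        *blk = p->free_list;
        p->free_list = blk;
        p->free_count++;
    }
    return n;
}

/* Pop one block: a constant number of steps regardless of pool size. */
void *pool_alloc(pool_t *p)
{
    void **blk = p->free_list;
    if (blk == NULL)
        return NULL;            /* pool exhausted */
    p->free_list = *blk;
    p->free_count--;
    return blk;
}

/* Push a block back: also a constant number of steps. */
void pool_free(pool_t *p, void *area)
{
    void **blk = area;
    *blk = p->free_list;
    p->free_list = blk;
    p->free_count++;
}
```

Because allocation and release each touch only the head of the list, their cost does not grow with the number of blocks, which is what makes a worst-case time figure meaningful. Variable-size allocation generally needs a search or a more elaborate structure, so it does not get the same bound from this scheme.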
Hardware Requirements and Support
The kernel requires only an i960 microprocessor and sufficient memory for
itself and its application. The kernel's design, however, recognizes that
many systems use additional programmable peripheral devices and
coprocessors. The kernel provides optional device managers for:
• The 82380 Integrated System Peripheral (ISP) chip
• The 8254 Programmable Interval Timer (PIT) chip
An application can supply managers for other devices and coprocessors in
addition to or in replacement of the devices listed above.
The openness of the iRMK 960 kernel is a major benefit to the OEM. The
kernel is designed to be programmed into PROM or EPROM, making it easy to
use in embedded designs. In addition, it can be used with any system bus,
including those of MULTIBUS I and MULTIBUS II bus architectures.
INTERTASK SYNCHRONIZATION AND COMMUNICATION
Semaphores, regions and mailboxes are the key mechanisms the kernel uses for
synchronizing tasks and communicating between tasks.
Semaphores are objects used for intertask signaling and synchronization.
Tasks exchange abstract "units" with semaphores as a means of becoming
synchronized. A task requests a unit from a semaphore to gain access to a
resource. If the resource is available, the semaphore will have a unit to
give to the task, enabling the task to proceed. A task sends a unit to a
semaphore to indicate that it has released a previously obtained resource.
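The exchange of units described for semaphores can be followed with a small bookkeeping model. This is a sketch under stated assumptions rather than the kernel's implementation: sem_model_t and the three functions are hypothetical names, blocking is represented only by a count of waiting tasks, and no queue ordering (priority or first-in, first-out) is modelled.

```c
#include <stdbool.h>

/* Hypothetical bookkeeping model of a semaphore holding abstract units. */
typedef struct {
    unsigned units;    /* units currently held by the semaphore */
    unsigned waiting;  /* tasks that asked for a unit and are waiting */
} sem_model_t;

void sem_model_create(sem_model_t *s, unsigned initial_units)
{
    s->units = initial_units;
    s->waiting = 0;
}

/* A task requests a unit. Returns true if a unit was available and the
 * task may proceed; false if the task would have to wait. */
bool sem_model_receive_unit(sem_model_t *s)
{
    if (s->units == 0) {
        s->waiting++;
        return false;
    }
    s->units--;
    return true;
}

/* A task returns a unit. If a task is waiting, the unit goes straight to
 * it (returns true: a waiter was released); otherwise the semaphore keeps
 * the unit (returns false). */
bool sem_model_send_unit(sem_model_t *s)
{
    if (s->waiting > 0) {
        s->waiting--;
        return true;
    }
    s->units++;
    return false;
}
```

A semaphore created with one unit guards a single resource: the first requester proceeds, a second requester waits, and the unit sent back by the first task releases the second.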
A special binary type of semaphore is called a Region. Regions are used to
ensure mutual exclusion, thus preventing deadlock when tasks contend for
control of system resources. A task holding a region's unit runs at the
priority of the highest priority task waiting in queue for the region's
unit.
Mailboxes are queues that can hold any number of messages and are used to
exchange data between tasks. Either data or pointers can be sent using
mailboxes. The kernel allows mailbox messages to be of any length. High
priority messages can be placed (jammed) at the front of the message queue
to ensure that they are received and processed before other messages queued
at the mailbox.
To ensure that high priority tasks are not blocked by lower priority tasks,
the kernel allows tasks to queue at semaphores and mailboxes in priority
order. The kernel also supports first-in, first-out task queueing.
A Modular Architecture for Easy Customization
The kernel is designed for maximum flexibility. It can be customized for any
application. Each major function, mailboxes for example, is implemented as a
separate module. The kernel's modules have not been linked together and are
supplied individually. (See Table 1 for the list of kernel modules, and
their approximate sizes.)
The user links only the modules needed for the application. Any module not
used does not need to be linked in, and does not increase the size of the
kernel in the application. The user can also replace any optional kernel
module with one that implements specific features required by the
application. For example, the user might want to replace the kernel's memory
manager with one that supports virtual memory.
Table 1. iRMK™ 960 Kernel Modules and Approximate Sizes
Core Modules                          Bytes
Task Manager                          2600
Interrupt Manager                     150
Time Manager                          3000
Scheduler                             1700
Initialization                        50
Optional Modules
Mailbox Manager                       1250
Semaphore Manager                     2900
Memory Manager                        1260
Fault Handler Manager                 50
Miscellaneous                         300
Optional Device Manager
82380 Integrated System Peripheral    4200
8254 Programmable Interval Timer      1200
Total size of the (entire) kernel (minus device managers) is about 13.5 Kbytes.
Developing with the iRMK™ 960 Real-Time Kernel
Kernel applications can be written using any language or compiler that
produces code that executes on the i960 microprocessor. This independence is
achieved by using an interface library. This library works with the
idiosyncrasies of a particular language, for example the ordering of
parameters. The interface library translates the calls provided by the
language into a standard format expected by the kernel. Intel provides an
interface library for our iC 960 compiler. The source code of this library
is included, so that the user can modify it to support other compilers.
Because the kernel is supplied as unlinked object modules, applications can
be developed on any system that hosts the development tools needed.
Comprehensive Development Tool Support
Intel provides a complete line of 80960 development tools for writing and
debugging iRMK 960 applications. These tools include:
Software:
ASM 960 assembler
iC 960 compiler
Debuggers:
ICE™ 960       In-Circuit Emulator for the i960 microprocessor
SDM™ 960       System Debug Monitor for the i960 microprocessor
Evaluation Vehicles:
EVA960KB       AT Bus-Compatible Board
EVA960KB4MB    AT Bus-Compatible Board with 4 Mbytes of Memory
QT960          Standalone Evaluation Vehicle
NOTE:
These tools are available for DOS, VAX/VMS*, MicroVAX*, SUN* and
EVA960KB 4MB environments.
Intel Support, Consulting and Training
With iRMK 960 kernel software, the developer has available the total Intel
i960 architecture and real-time expertise of Intel's support engineers.
Intel provides telephone support, on or off-site consulting, troubleshooting
guides and updates. The kernel includes 90 days of Intel's Technical
Information Phone Service (TIPS). Extended support and consulting are also
available.
Contents of the iRMK™ 960 Kernel Development Package
The iRMK 960 Kernel comes in a comprehensive package including:
• Kernel object modules
• Source for the kernel supplied 82380 Integrated System Peripheral and 8254
PIT device managers
• Source for the iC 960 interface library
• Source for sample applications showing the following:
- Structure of kernel applications
- Use of the kernel with an application written in iC 960 language
- Compile, bind and build sequences
- Sample initialization code for the i960 microprocessor
- Applications written to execute in a flat memory space
• User reference guide
• 90 days of customer support
LICENSING
iRMK 960 software requires prior execution of the standard Intel Software
License Agreement (SLA). A single development copy requires a Class I
license and allows iRMK 960 software to be loaded and run on one
single-processor system.
SPECIFICATIONS
System Calls
The following items are system calls arranged by type:
iRMK™ 960 KERNEL SYSTEM CALLS LISTING
KERNEL INITIALIZATION
KN_initialize             Initialize kernel
OBJECT MANAGEMENT
KN_token_to_ptr           Returns a pointer to the area holding object
KN_current_task           Returns a pointer for the current task
TASK MANAGEMENT
KN_create_task            Creates a task
KN_delete_task            Deletes a task
KN_suspend_task           Suspends a task
KN_resume_task            Resumes a task
KN_set_priority           Change priority of a task
KN_get_priority           Return priority of a task
INTERRUPT MANAGEMENT
KN_set_interrupt          Specify interrupt handler
KN_stop_scheduling        Suspend task switching
KN_start_scheduling       Resume task switching
TIME MANAGEMENT
KN_sleep                  Put calling task to sleep
KN_create_alarm           Create and start virtual alarm clock
KN_reset_alarm            Reset an existing alarm
KN_delete_alarm           Delete alarm
KN_get_time               Get time
KN_set_time               Set time
KN_tick                   Notify kernel that clock tick has occurred
INTERTASK COMMUNICATION AND SYNCHRONIZATION
KN_create_semaphore       Create a semaphore
KN_delete_semaphore       Delete a semaphore
KN_send_unit              Add a unit to a semaphore
[name illegible]          Receive a unit from a semaphore
KN_create_mailbox         Create a mailbox
KN_delete_mailbox         Delete a mailbox
KN_send_data              Send data to a mailbox
KN_send_priority_data     Place (jam) priority message at head of message queue
[name illegible]          Request a message from a mailbox
MEMORY MANAGEMENT
KN_create_pool            Create a memory pool
KN_delete_pool            Delete a memory pool
KN_create_area            Create a memory area from a pool
KN_delete_area            Return a memory area to a memory pool
KN_get_pool_attributes    Get a memory pool's attributes
PROGRAMMABLE INTERRUPT CONTROLLER MANAGEMENT
KN_initialize_PICs        Initialize PIC's
KN_mask_slot              Mask out interrupts on a specified slot
[name illegible]          Unmask interrupts on a specified slot
[name illegible]          Signal the PIC that the interrupt on a specified slot has been serviced
KN_new_masks              Change interrupt masks
KN_get_slot               Return the most important active interrupt slot
KN_get_interrupt          Get address of specified interrupt handler
PROGRAMMABLE INTERVAL TIMER MANAGEMENT
KN_initialize_PIT            Initialize the PIT
KN_start_PIT                 Start PIT counting
KN_get_PIT_interval          Return PIT interval
PROCESSOR RECOGNIZED FAULT HANDLING
KN_get_fault_handler         Get address of fault handler currently associated with specified fault type
KN_set_fault_handler         Establish address of fault handler for the specified fault type
PROCESSOR INTERRUPT CONTROLLER SUPPORT
KN_get_processor_priority    Returns value of the processor priority
KN_set_processor_priority    Change the value of the processor priority
PERFORMANCE
The figures listed below were derived from a test suite running on an
EVA-960 evaluation vehicle using an 80960KB running at 20 MHz. The EVA-960
has what is known as 2-1-1-1 wait state memory: the first instruction of a
four instruction fetch takes two wait states, and each of the three
successive instructions takes one wait state. The figures are the worst case
values obtained from several sets of test runs. The code was generated using
the iC 960 DOS hosted compiler, Version 1.1.
Action                                                 Time (in µs)
Create Semaphore                                       6
Delete Semaphore                                       14
FIFO Semaphore Send Unit                               7
FIFO Semaphore Receive Unit                            7
Region Semaphore Send Unit                             18
Region Semaphore Receive Unit                          14
Create Mailbox                                         19
Delete Mailbox                                         23
Send Data                                              21
Receive Data                                           21
Create Alarm                                           29
Delete Alarm                                           30
FIFO Semaphore Send/Receive Unit with Task Switch      75
Suspend Task with Task Switch                          70
Basic Task Switch                                      50
Create Task                                            62
Suspend Task                                           26
Resume Task                                            50
Delete Task                                            50
Get Priority                                           5
Set Priority                                           27
Create Pool                                            18
Get Pool Attributes                                    36
Delete Pool                                            1
Create Area                                            35
Delete Area                                            32
Set Interrupt                                          3
Get Interrupt                                          3
MANUALS
iRMK 960 User's Manual (Intel Order #463863001).
TRAINING INFORMATION
Intel Customer Service Training: "80960 KA/KB Embedded Processor Training
Course"
ORDERING INFORMATION
Ordering Code    Product Description
RMK960           iRMK 960 Real-Time Kernel
EV80960CA Evaluation Board
270870-1
Low Cost Processor Evaluation Tool
Intel's EV80960CA evaluation board provides a low-cost hardware environment for code
execution and software debugging. The board features the 80960CA, the newest and
highest performance member of Intel's family of 32-bit embedded microprocessors. The
board allows a user's program to take full advantage of the power of the 80960CA and
provides zero wait state execution of the user's code.
Popular features such as single line assembler/disassembler, single-step program
execution and software breakpoints are standard on the EV80960CA's on-board monitor.
Available separately, Intel offers a complete code development environment using the
assembler (ASM-960) as well as high-level languages, such as Intel's iC-960 C compiler, to
accelerate development schedules.
The EV80960CA evaluation board package features the 80960CA System Debug Monitor
(SDM) in EPROM, an SDM host software floppy disk, a power supply cable, a 9-pin PC/AT
serial connector for terminal and the EV80960CA User's Manual. The EV80960CA
User's Manual includes schematics of the board, a part list and programmable logic
(PLD) equations. The board is hosted on an IBM or BIOS-compatible PC/AT.
*The SRAM memory system provides zero wait state read (0-0-0-0-0) and one wait state write (1-1-1-1-0) performance.
**The DRAM memory system provides 2-1-1-1-1 reads and writes.
October 1991
Order Number: 270870-001
EV80960CA Features
• 25 MHz Execution Speed
• 32 Kbytes of EPROM for 80960CA SDM Target Operating Firmware
• 64 Kbytes of Zero Wait State Pipelined SRAM*
• 1 Mbyte of Static-Column Mode DRAM** expandable to 4 Mbytes
• Concurrent Interrogation of Memory and Registers
• Software Breakpoints
• Code Disassembly
• High-Level Language Support
• Two RS-232s for Host and User Communication
• Two iSBX I/O Connectors
• An Expansion Bus to Accommodate Eurocard Form-Factor Prototyping Boards
Fast Pipelined SRAM Memory System
The pipelined-read memory system of the EV80960CA provides true zero wait
state read and one wait state write performance. The memory design utilizes
the internal wait state generator of the 80960CA.
Fast Static-Column Mode DRAM
The memory design of the EV80960CA uses the 80960CA burst mode bus and
static-column DRAM mode. The DRAM control PLDs are functionally isolated
into interconnected state machines. The PLDs can be changed to allow
alternative DRAM memory implementations with different DRAM access modes
(static-column mode, nibble mode or fast-page mode).
Concurrent Interrogation of Memory and Registers
The 80960CA System Debug Monitor (SDM) for the EV80960CA allows the user to
read and modify internal registers and external memory while the user's
program is running on the board.
iSBX I/O Connectors and Expansion Interface
The EV80960CA evaluation board has two connectors to support both 8- and
16-bit standard iSBX Expansion Modules. The board also provides an expansion
bus to accommodate Eurocard form-factor prototyping boards.
Block Diagram of the EV80960CA Board (270870-2). Blocks shown: 80960CA CPU;
bus buffering and control signal generation; 64 Kbytes 0-wait state SRAM;
1 Mbyte DRAM; buffer; boot EPROM; sense switches; fault and user LEDs;
timer/counters; UARTs with host interface port and user port; iSBX expansion
(I/O expansion with standard iSBX boards); 32-bit expansion bus.
Communication Link
The EV80960CA board communicates with the host through the RS-232 link
using an Intel 82510 UART provided on board. The board supports seven baud
rates: 300, 1200, 2400, 4800, 9600, 19200 and 38400.
Host System Requirements
The EV80960CA Evaluation Board is hosted on an IBM PC/AT or compatible; a
386-based PC is recommended. The host system must meet the following
minimum requirements:
• 512 Kbytes of Memory
• One 1.2 Mbyte Floppy Disk Drive
• PC-DOS 3.2 or Later
• A Serial Port (COM1 or COM2)
Power Requirements
The EV80960CA Evaluation Board requires 5V at 2000 mA and ±12V at 25 mA.
i960™ SA/SB EVALUATION BOARD
The EV80960SX board is a general purpose evaluation tool for the i960™ SA/SB
embedded processors. This evaluation board provides a high-performance DRAM
subsystem, an interleaved EPROM subsystem, and a robust set of peripheral devices for
benchmarking and debugging application code written for the i960 SA/SB embedded
processors.
The EV80960SX is a great starter kit for your 32-bit application. The EV80960SX,
NINDY debug environment, along with assembler and C-compiler (not provided) provide
a seamless environment for developing code and evaluating the i960 SA/SB processors.
The NINDY monitor provides code download capabilities from a number of popular
development systems, including DOS-based PC's. Single step, breakpoints, register and
memory display are among the full set of features provided by NINDY.
The board is provided with the following
features:
• DRAM Subsystem operates at
1-0-0-0-0-0-0-0 wait states for read and
write cycles in the burst mode. The
DRAM subsystem runs at the maximum
processor frequency of 16 MHz, using
100 ns fast page mode DRAMs. The
DRAM subsystem can accommodate
from 512 Kbytes to 4 Mbytes, using 4 or 8
ZIP-packaged DRAMs.
• Interleaved EPROM Subsystem executes
burst program fetches with a 2-0-1-0-2-0-1-0 wait state performance.
The EPROM subsystem accommodates
four, 32-pin or 28-pin 8-bit wide EPROMs
with up to 150 ns access times.
• Flash EPROM Subsystem reads and
writes two 8-bit wide Flash EPROMS.
• 8259A Interrupt Controller provides
expanded interrupt capabilities using
the i960 SA/SB's interrupt controller
interface.
• Parallel Port Input allows fast
downloads of code or data to the
EV80960SX board. The parallel port
provides auto-busy and interrupt
capabilities, and is a full implementation
of the Centronics standard.
ACE51®, ICE® and MCS® are registered trademarks of Intel Corporation.
Ethernet® is a registered trademark of Xerox Corporation.
*CHMOS is a patented Intel process.
June 1991
Order Number: 272033-001
• Two serial ports provide queued and
interrupt driven serial transfer at up to
128000 baud.
• 82C54 Timer/Counter provides a 32-bit
counter and 16-bit counter, each with
dedicated interrupts.
• Expansion/Prototype Bus (XBUS) allows
expansion cards and prototype hardware
direct access to the i960 SA/SB's bus and
control signals. Optionally, a configurable
wait state scheme provides a no glue
interface to most peripherals attached to the
XBUS.
• LEDs and Switches are user programmable.
One 10-segment bar LED, a 7-segment LED
and an 8-position switch are under program
control.
• Local Area Networking (LAN) is
implemented using an 82596SX LAN
coprocessor.
• Laser Printer Control provides interfaces to
TEC or Canon compatible laser engines.
• Monitor and Self-test diagnostics are
provided for the EV80960SX in the EPROMs
installed in the board.
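The wait-state profiles quoted for the DRAM and EPROM subsystems (1-0-0-0-0-0-0-0 and 2-0-1-0-2-0-1-0) can be compared by totalling them over a burst. The sketch below is illustrative arithmetic only: it assumes each number is the wait-state count for one access of the burst and that one wait state lasts one processor clock period, and the function names are hypothetical. It does not model the base bus cycle itself.

```c
#include <stddef.h>

/* Sum the per-access wait states of a burst profile such as
 * 2-0-1-0-2-0-1-0, given as an array with one entry per access. */
unsigned total_wait_states(const unsigned *profile, size_t accesses)
{
    unsigned sum = 0;
    size_t i;

    for (i = 0; i < accesses; i++)
        sum += profile[i];
    return sum;
}

/* Time spent in wait states, in nanoseconds, assuming one wait state
 * equals one clock period at the given clock frequency in MHz. */
double wait_time_ns(unsigned wait_states, double clock_mhz)
{
    return wait_states * 1000.0 / clock_mhz;
}
```

Under these assumptions an eight-access burst from the DRAM subsystem carries one wait state in total, while the same burst from the interleaved EPROM subsystem carries six; at 16 MHz that is 62.5 ns against 375 ns of added wait time.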
The evaluation board comes complete with a
design database included on diskette, the
NINDY debug monitor on diskette and in
EPROM, power and serial cables, schematics
and user's manual.
The EV80960SX is a public domain design. The
hardware is fully documented and provides
working examples of popular memory and
peripheral interfaces to the i960 SA/SB
processor. The schematic and PLD database
are provided with each board. The EV80960SX
designs are easily duplicated and can be used
directly as the building blocks for custom
designs. Custom hardware can be prototyped
using the expansion bus (XBUS) connector.
EV80960SX Evaluation Board (272033-2)
NORTH AMERICAN SALES OFFICES
ALABAMA
Intel Corp.
5015 Bradford Dr., #2
Huntsville 35805
Tel: (205) 630-4010
FAX: (205) 837-2640
ARIZONA
†Intel Corp.
410 North 44th Street
Suite 500
Phoenix 85008
Tel: (602) 231-0366
FAX: (602) 244-0446
†Intel Corp.
5850 T.G. Lee Blvd.
Suite 340
Orlando 32822
Tel: (407) 240-8000
FAX: (407) 240-8097
GEORGIA
†Intel Corp.
20 Technology Parkway
Suite 150
Norcross 30092
Tel: (404) 449-0541
FAX: (404) 605-9762
ILLINOIS
CALIFORNIA
†Intel Corp.
21515 Vanowen Street
Suite 116
Canoga Park 91303
Tel: (616) 704-6500
FAX: (616) 340-1144
Intel Corp.
1 Sierra Gate Plaza
Suite 280C
Roseville 95678
Tel: (916) 782-8086
FAX: (916) 762-8153
†Intel Corp.
9665 Chesapeake Dr.
Suite 325
San Diego 92123
Tel: (619) 292-6066
FAX: (619) 292-0626
*†Intel Corp.
400 N. Tustin Avenue
Suite 450
Santa Ana 92705
Tel: (714) 635-9642
TWX: 910-595-1114
FAX: (714) 541-9157
*†Intel Corp.
San Tomas 4
2700 San Tomas Expressway
2nd Floor
Santa Clara 95051
Tel: (408) 986-8066
TWX: 910-338-0255
FAX: (408) 727-2620
*†Intel Corp.
Woodfield Corp. Center III
300 N. Martingale Road
Suite 400
Schaumburg 60173
Tel: (708) 605-6031
FAX: (708) 706-9762
INDIANA
†Intel Corp.
8910 Purdue Road
Suite 350
Indianapolis 46268
Tel: (317) 675-0623
FAX: (317) 875-6938
MARYLAND
*†Intel Corp.
10010 Junction Dr.
Suite 200
Annapolis Junction 20701
Tel: (301) 206-2860
FAX: (301) 206-3677
(301) 206-3678
MASSACHUSETTS
*†Intel Corp.
Westford Corp. Center,
3 Carlisle Road
2nd Floor
Westford 01886
Tel: (508) 692-0960
TWX: 710-343-6333
FAX: (508) 692-7867
MICHIGAN
COLORADO
Intel Corp.
4445 Northpark Drive
Suite 100
Colorado Springs 80907
Tel: (719) 594-6622
FAX: (303) 594-0720
*†Intel Corp.
BOO S. Cherry St.
Suite 700
Denver 80222
Tel: (303) 321-8086
TWX: 910-931-2289
FAX: (303) 322-8670
CONNECTICUT
~~lte~~°TP.irm Cor~or~te Park
†Intel Corp.
7071 Orchard Lake Road
Suite 100
West Bloomfield 48322
Tel: (313) 851-8096
FAX: (313) 651-8770
MINNESOTA
†Intel Corp.
3500 W. 80th St.
Suite 360
Bloomington 55431
Tel: (612) 835-6722
TWX: 910-576-2867
FAX: (612) 631-6497
NEW JERSEY
~r~~~~ri
*†Intel Corp.
2950 Express Dr., South
Suite 130
Islandia 11722
Tel: (516) 231-3300
TWX: 510-227-6236
FAX: (516) 348-7939
†Intel Corp.
300 Westage Business Center
Suite 230
Fishkill 12524
Tel: (914) 897-3860
FAX: (914) 897-3125
OHIO
*†Intel Corp.
3401 Park Center Drive
Suite 220
Dayton 45414
Tel: (513) 890-5350
TWX: 810-450-2528
FAX: (513) 890-8658
*†Intel Corp.
25700 Science Park Dr.
Suite 100
Beachwood 44122
Tel: (216) 464-2736
TWX: 810-427-9298
FAX: (804) 282-0673
OKLAHOMA
Intel Corp.
6801 N. Broadway
Suite 115
Oklahoma City 73162
Tel: (405) 848-8086
FAX: (405) 640-9819
OREGON
†Intel Corp.
15254 N.W. Greenbrier Pkwy.
Building B
Beaverton 97006
Tel: (503) 645-8051
TWX: 910-467-6741
FAX: (503) 645-8181
PENNSYLVANIA
*†Intel Corp.
925 Harvest Drive
Suite 200
Blue Bell 19422
Tel: (215) 641-1000
FAX: (215) 641-0785
*†Intel Corp.
400 Penn Center Blvd.
Suite 610
Pittsburgh 15235
Tel: (412) 823-4970
FAX: (412) 629-7578
PUERTO RICO
†Intel Corp.
South Industrial Park
P.O. Box 910
Las Piedras 00671
Tel: (809) 733-8616
TEXAS
83 Wooster Heights Rd.
Danbury 06810
Tel: (203) 746-3130
FAX: (203) 794-0339
C8ff?ce Center
125 Half Mile Road
Red Bank 07701
Tel: (908) 747-2233
FAX: (908) 747-0983
FLORIDA
NEW YORK
†Intel Corp.
8911 N. Capital of Texas Hwy.
Suite 4230
Austin 78759
Tel: (512) 794-8086
FAX: (512) 336-9335
†Intel Corp.
*Intel Corp.
850 Crosskeys Office Park
Fairport 14450
Tel: (716) 425-2750
TWX: 510-253-7391
FAX: (716) 223-2561
*†Intel Corp.
12000 Ford Road
Suite 400
Dallas 75234
Tel: (214) 241-8087
FAX: (214) 484-1180
800 Fairway Drive
Suite 160
Deerfield Beach 33441
Tel: (305) 421-0506
FAX: (305) 421-2444
†Sales and Service Office
*Field Application Location
*†Intel Corp.
7322 S.W. Freeway
Suite 1490
Houston 77074
Tel: (713) 988-6066
TWX: 910-681-2490
FAX: (713) 988-3660
UTAH
†Intel Corp.
428 East 6400 South
Suite 104
Murray 84107
Tel: (801) 263-8051
FAX: (801) 268-1457
WASHINGTON
†Intel Corp.
155 108th Avenue N.E.
Suite 386
Bellevue 98004
Tel: (206) 453-8086
TWX: 910-443-3002
FAX: (206) 451-9556
Intel Corp.
408 N. Mullan Road
Suite 102
Spokane 99206
Tel: (509) 928-8086
FAX: (509) 928-9467
WISCONSIN
Intel Corp.
330 S. Executive Dr.
Suite 102
Brookfield 53005
Tel: (414) 784-8087
FAX: (414) 796-2115
CANADA
BRITISH COLUMBIA
Intel Semiconductor of
Canada, Ltd.
4585 Canada Way
Suite 202
Burnaby V5G 4L6
Tel: (604) 298-0387
FAX: (604) 298-8234
ONTARIO
†Intel Semiconductor of
Canada, Ltd.
2650 Queensview Drive
Suite 250
Ottawa K2B 8H6
Tel: (613) 829-9714
FAX: (613) 820-5936
†Intel Semiconductor of
Canada, Ltd.
190 Attwell Drive
Suite 500
Rexdale M9W 6H8
Tel: (416) 675-2105
FAX: (416) 675-2438
QUEBEC
†Intel Semiconductor of
Canada, Ltd.
1 Rue Holiday
Suite 115
Tour East
Pt. Claire H9R 5N3
Tel: (514) 694-9130
FAX: (514) 694-0064
NORTH AMERICAN DISTRIBUTORS
ALABAMA
Arrow Electronics, Inc.
1015 Henderson Road
Huntsville 35806
Tel: (205) 837-6955
FAX: (205) 721-1581
Hamilton/Avnet Electronics
4960 Corporate Drive, #135
Huntsville 35805
Tel: (205) 837-7210
FAX: (205) 721-0356
MTI Systems Sales
4950 Corporate Drive
Suite 120
Huntsville 35805
Tel: (205) 830-9526
FAX: (205) 830-9557
Pioneer/Technologies Group, Inc.
4835 University Square, #5
Huntsville 35805
Tel: (205) 837-9300
FAX: (205) 837-9358
ARIZONA
†Arrow Electronics, Inc.
4134 E. Wood Street
Phoenix 85040
Tel: (602) 437-0750
FAX: (602) 252-9109
Avnet Computer
30 South McKemy Avenue
Chandler 85226
Tel: (602) 961-6460
FAX: (602) 961-4787
Hamilton/Avnet Electronics
30 South McKemy Avenue
Chandler 85226
Tel: (602) 961-6403
FAX: (602) 961-1331
Wyle Distribution Group
4141 E. Raymond
Phoenix 85040
Tel: (602) 437-2088
FAX: (602) 437-2124
CALIFORNIA
Arrow Commercial System Group
1502 Crocker Avenue
Hayward 94544
Tel: (415) 489-5371
FAX: (415) 489-9393
Arrow Commercial System Group
14242 Chambers Road
Tustin 92680
Tel: (714) 544-0200
FAX: (714) 731-8438
†Arrow Electronics, Inc.
19748 Dearborn Street
Chatsworth 91311
Avnet Computer
755 Sunrise Blvd., #150
Roseville 95661
Tel: (916) 781-2521
FAX: (916) 781-3819
Avnet Computer
1175 Bordeaux Drive, #A
Sunnyvale 94089
Tel: (408) 743-3304
FAX: (408) 743-3348
Avnet Computer
21150 Califa Street
Woodland Hills 91376
Tel: (800) 345-3870
FAX: (818) 594-8333
COLORADO
Arrow Electronics, Inc.
3254 C Frazer Street
Aurora 80011
Tel: (303) 373-5616
FAX: (303) 373-5760
Tel: (303) 799-7800
FAX: (303) 799-7801
†Wyle Distribution Group
451 E 124th Avenue
Thornton 80241
Tel: (303) 457-9953
FAX: (303) 457-4831
Tel: (714) 641-4100
FAX: (714) 754-6033
CONNECTICUT
†Hamilton/Avnet Electronics
1175 Bordeaux Drive, #A
Sunnyvale 94089
†Arrow Electronics, Inc.
12 Beaumont Road
Wallingford 06492
Tel: (203) 265-7741
†Hamilton/Avnet Electronics
4545 Viewridge Avenue
San Diego 92123
FAX: (203) 265-7988
Avnet Computer
55 Federal Road, #103
Danbury 06810
Tel: (619) 571-1900
FAX: (619) 571-8761
Tel: (203) 797-2880
FAX: (203) 791-9050
†Hamilton/Avnet Electronics
21150 Califa St.
Woodland Hills 91367
†Hamilton/Avnet Electronics
55 Federal Road, #103
Danbury 06810
Tel: (818) 594-0403
FAX: (818) 594-8234
Tel: (203) 743-6077
FAX: (203) 791-9050
†Hamilton/Avnet Electronics
13618 West 190th Street
Gardena 90248
†Pioneer/Standard Electronics
112 Main Street
Norwalk 06851
Tel: (213) 516-8600
FAX: (213) 217-6822
Tel: (203) 853-1515
FAX: (203) 838-9901
†Hamilton/Avnet Electronics
755 Sunrise Avenue, #150
Roseville 95661
FLORIDA
Tel: (916) 925-2216
FAX: (916) 925-3478
†Arrow Electronics, Inc.
400 Fairway Drive, #102
Deerfield Beach 33441
Pioneer/Technologies Group, Inc.
134 Rio Robles
San Jose 95134
Tel: (305) 429-8200
FAX: (305) 428-3991
Tel: (408) 954-9100
†Arrow Electronics, Inc.
37 Skyline Drive, #3101
Lake Mary 32746
FAX: (408) 954-9113
†Wyle Distribution Group
124 Maryland Street
El Segundo 90245
Tel: (213) 322-8100
FAX: (213) 416-1151
Wyle Distribution Group
7431 Chapman Ave.
Garden Grove 92641
Tel: (818) 701-7500
FAX: (818) 772-8930
Tel: (714) 891-1717
FAX: (714) 891-1621
†Arrow Electronics, Inc.
9511 Ridgehaven Court
San Diego 92123
†Wyle Distribution Group
2951 Sunrise Blvd., Suite 175
Rancho Cordova 95742
Tel: (619) 565-4800
FAX: (619) 279-8062
Tel: (916) 638-5282
FAX: (916) 638-1491
†Arrow Electronics, Inc.
1180 Murphy Avenue
San Jose 95131
†Wyle Distribution Group
9525 Chesapeake Drive
San Diego 92123
Tel: (408) 441-9700
FAX: (408) 453-4810
Tel: (619) 565-9171
FAX: (619) 365-0512
†Arrow Electronics, Inc.
2961 Dow Avenue
Tustin 92680
†Wyle Distribution Group
3000 Bowers Avenue
Santa Clara 95051
Tel: (305) 428-8877
FAX: (305) 481-2950
GEORGIA
†Hamilton/Avnet Electronics
9605 Maroon Circle, #200
Englewood 80112
†Hamilton/Avnet Electronics
3170 Pullman Street
Costa Mesa 92626
Tel: (408) 743-3300
FAX: (408) 745-6679
Pioneer/Technologies Group, Inc.
674 S. Military Trail
Deerfield Beach 33442
Tel: (407) 333-9300
FAX: (407) 333-9320
Avnet Computer
3343 W. Commercial Blvd.
Bldg. C/O, Suite 107
Ft. Lauderdale 33309
Arrow Commercial System Group
3400 C. Corporate Way
Duluth 30136
Tel: (404) 623-8825
FAX: (404) 623-8802
†Arrow Electronics, Inc.
4250 E. Rivergreen Pkwy., #E
Duluth 30136
Tel: (404) 497-1300
FAX: (404) 476-1493
Avnet Computer
3425 Corporate Way, #G
Duluth 30136
Tel: (404) 623-5452
FAX: (404) 476-0125
Hamilton/Avnet Electronics
3425 Corporate Way, #G
Duluth 30136
Tel: (404) 446-0611
FAX: (404) 446-1011
Pioneer/Technologies Group, Inc.
4250 C. Rivergreen Parkway
Duluth 30136
Tel: (404) 623-1003
FAX: (404) 623-0665
ILLINOIS
†Arrow Electronics, Inc.
1140 W. Thorndale Rd.
Itasca 60143
Tel: (708) 250-0500
Avnet Computer
1124 Thorndale Avenue
Bensenville 60106
Tel: (708) 860-8573
FAX: (708) 773-7976
†Hamilton/Avnet Electronics
1130 Thorndale Avenue
Bensenville 60106
Tel: (708) 860-7700
FAX: (708) 860-8530
MTI Systems
1140 W. Thorndale Avenue
Itasca 60143
Tel: (708) 250-8222
FAX: (708) 250-8275
Tel: (305) 979-9067
FAX: (305) 730-0368
†Pioneer/Standard Electronics
2171 Executive Dr., Suite 200
Addison 60101
Avnet Computer
3247 Tech Drive North
St. Petersburg 33716
Tel: (813) 573-5524
Tel: (708) 495-9680
FAX: (708) 495-9831
FAX: (813) 572-4324
INDIANA
†Hamilton/Avnet Electronics
5371 N.W. 33rd Avenue
Ft. Lauderdale 33309
†Arrow Electronics, Inc.
7108 Lakeview Parkway West Dr.
Indianapolis 46268
Tel: (305) 484-5016
FAX: (305) 484-8369
Tel: (317) 299-2071
FAX: (317) 299-2379
†Hamilton/Avnet Electronics
3247 Tech Drive North
St. Petersburg 33716
Tel: (813) 573-3930
Avnet Computer
485 Gradle Drive
Carmel 46032
Tel: (317) 575-8029
Tel: (714) 838-5422
FAX: (714) 838-4151
Tel: (408) 727-2500
FAX: (408) 727-5896
Avnet Computer
3170 Pullman Street
Costa Mesa 92626
†Wyle Distribution Group
17872 Cowan Avenue
Irvine 92714
Tel: (714) 641-4121
FAX: (714) 641-4170
Tel: (714) 863-9953
FAX: (714) 263-0473
Tel: (407) 657-3300
FAX: (407) 678-1878
Tel: (317) 844-9333
FAX: (317) 844-5921
Avnet Computer
1361B West 190th Street
Gardena 90248
†Wyle Distribution Group
26010 Mureau Road, #150
Calabasas 91302
†Pioneer/Technologies Group, Inc.
337 Northlake Blvd., Suite 1000
Altamonte Springs 32701
9350 Priority Way West Dr.
Tel: (800) 345-3870
FAX: (213) 327-5389
Tel: (818) 880-9000
FAX: (818) 880-5510
Tel: (407) 834-9090
FAX: (407) 834-0865
Tel: (317) 573-0880
FAX: (317) 573-0979
†Certified VAD
FAX: (813) 572-4329
FAX: (317) 844-4964
†Hamilton/Avnet Electronics
7079 University Boulevard
Winter Park 32791
Hamilton/Avnet Electronics
485 Gradle Drive
Carmel 46032
†Pioneer/Standard Electronics
Indianapolis 46250
NORTH AMERICAN DISTRIBUTORS (Contd.)
IOWA
MICHIGAN
Hamilton/Avnet Electronics
2335A Blairsferry Rd., N.E.
Cedar Rapids 52402
Tel: (319) 362-4757
FAX: (319) 393-7050
†Arrow Electronics, Inc.
19880 Haggerty Road
Livonia 48152
Tel: (313) 665-4100
FAX: (313) 462-2686
KANSAS
Avnet Computer
2876 28th Street, S.W., #5
Grandville 49418
Tel: (616) 531-9607
FAX: (616) 531-0059
Arrow Electronics, Inc.
8208 Melrose Dr., Suite 210
Lenexa 66214
Tel: (913) 541-9542
FAX: (913) 541-0328
Avnet Computer
15313 W. 95th Street
Lenexa 61219
Tel: (913) 541-7989
FAX: (913) 541-7904
†Hamilton/Avnet Electronics
15313 W. 95th
Overland Park 66215
Tel: (913) 888-1055
FAX: (913) 541-7951
KENTUCKY
Hamilton/Avnet Electronics
805 A. Newtown Circle
Lexington 40511
Tel: (606) 259-1475
FAX: (606) 252-3238
MARYLAND
Arrow Commercial Systems Group
200 Perry Parkway
Gaithersburg 20877
Tel: (301) 670-1600
FAX: (301) 670-0188
†Arrow Electronics, Inc.
8300 Guilford Road, #H
Columbia 21046
Tel: (301) 995-6002
FAX: (301) 995-6201
Avnet Computer
7172 Columbia Gateway Dr., #G
Columbia 21045
Tel: (301) 995-0020
FAX: (301) 995-3515
†Hamilton/Avnet Electronics
7172 Columbia Gateway Dr., #F
Columbia 21045
Tel: (301) 995-3554
FAX: (301) 995-3515
†North Atlantic Industries
Systems Division
7125 Riverwood Dr.
Columbia 21046
Tel: (301) 290-3999
†Pioneer/Technologies Group, Inc.
15810 Gaither Road
Gaithersburg 20877
Tel: (301) 921-0660
FAX: (301) 670-6746
MASSACHUSETTS
Arrow Electronics, Inc.
25 Upton Dr.
Wilmington 01887
Tel: (508) 658-0900
FAX: (508) 694-1754
Avnet Computer
100 Centennial Drive
Peabody 01960
Tel: (508) 532-9886
FAX: (508) 532-9660
†Hamilton/Avnet Electronics
100 Centennial Drive
Peabody 01960
Tel: (508) 531-7430
FAX: (508) 532-9802
†Pioneer/Standard Electronics
44 Hartwell Avenue
Lexington 02173
Tel: (617) 861-9200
FAX: (617) 863-1547
Wyle Distribution Group
15 Third Avenue
Burlington 01803
Tel: (617) 272-7300
FAX: (617) 272-6809
†Certified VAD
Avnet Computer
41650 Garden Road
Novi 48375
Tel: (313) 347-1820
FAX: (313) 347-4067
Hamilton/Avnet Electronics
2876 28th Street, S.W., #5
Grandville 49418
Tel: (616) 243-8805
FAX: (616) 531-0059
Hamilton/Avnet Electronics
41650 Garden Brook Rd., #100
Novi 48375
Tel: (313) 347-4270
FAX: (313) 347-4021
†Pioneer/Standard Electronics
4505 Broadmoor S.E.
Grand Rapids 49512
Tel: (616) 698-1800
FAX: (616) 698-1831
†Arrow Electronics, Inc.
6 Century Drive
Parsippany 07054
Tel: (201) 538-0900
FAX: (201) 538-4962
Hamilton/Avnet Electronics
103 Twin Oaks Drive
Syracuse 13120
Tel: (315) 437-2641
FAX: (315) 432-0740
Avnet Computer
1-8 Keystone Ave., Bldg. 36
Cherry Hill 08003
Tel: (609) 424-8961
FAX: (609) 751-2502
MTI Systems
50 Horseblock Road
Brookhaven 11719
Tel: (516) 924-9400
FAX: (516) 924-1103
Avnet Computer
10 Industrial Road
Fairfield 07006
Tel: (201) 882-2879
FAX: (201) 808-9251
MTI Systems
1 Penn Plaza
250 W. 34th Street
New York 10119
Tel: (212) 643-1280
FAX: (212) 643-1288
†Hamilton/Avnet Electronics
1 Keystone Ave., Bldg. 36
Cherry Hill 08003
Tel: (609) 424-0110
FAX: (609) 751-2552
†Hamilton/Avnet Electronics
10 Industrial
Fairfield 07006
Tel: (201) 575-3390
FAX: (201) 575-5839
†MTI Systems Sales
6 Century Drive
Parsippany 07054
Tel: (201) 539-6496
FAX: (201) 539-6430
†Pioneer/Standard Electronics
13485 Stamford
Livonia 48150
Tel: (313) 525-1800
FAX: (313) 427-3720
†Pioneer/Standard Electronics
14-A Madison Rd.
Fairfield 07006
Tel: (201) 575-3510
FAX: (201) 575-3454
MINNESOTA
NEW MEXICO
†Arrow Electronics, Inc.
10120A West 76th Street
Eden Prairie 55344
Tel: (612) 829-5588
FAX: (612) 942-7803
Avnet Computer
10000 West 76th Street
Eden Prairie 55344
Tel: (612) 829-0025
FAX: (612) 944-2781
†Hamilton/Avnet Electronics
12400 Whitewater Drive
Minnetonka 55343
Tel: (612) 932-0600
FAX: (612) 932-0613
†Pioneer/Standard Electronics
7625 Golden Triangle Dr., #G
Eden Prairie 55344
Tel: (612) 944-3355
FAX: (612) 944-3794
MISSOURI
†Arrow Electronics, Inc.
2380 Schuetz Road
St. Louis 63141
Tel: (314) 567-6888
FAX: (314) 567-1164
Avnet Computer
739 Goddard Avenue
Chesterfield 63005
Tel: (314) 537-2725
FAX: (314) 537-4248
†Hamilton/Avnet Electronics
741 Goddard
Chesterfield 63005
Tel: (314) 537-1600
FAX: (314) 537-4248
NEW HAMPSHIRE
Avnet Computer
2 Executive Park Drive
Bedford 03102
Tel: (603) 624-6630
FAX: (603) 624-2402
NEW JERSEY
†Arrow Electronics, Inc.
4 East Stow Road
Unit 11
Marlton 08053
Tel: (609) 596-8000
FAX: (609) 596-9632
Alliance Electronics Inc.
10510 Research Avenue
Albuquerque 87123
Tel: (505) 292-3360
FAX: (505) 275-6392
Avnet Computer
7801 Academy Road
Bldg. 1, Suite 204
Albuquerque 87109
Tel: (505) 828-9725
FAX: (505) 828-0360
†Hamilton/Avnet Electronics
7801 Academy Rd., N.E.
Bldg. 1, Suite 204
Albuquerque 87108
Tel: (505) 765-1500
FAX: (505) 243-1395
NEW YORK
†Arrow Electronics, Inc.
3375 Brighton Henrietta Townline Rd.
Rochester 14623
Tel: (716) 427-0300
FAX: (716) 427-0735
Arrow Electronics, Inc.
20 Oser Avenue
Hauppauge 11788
Tel: (516) 231-1000
FAX: (516) 231-1072
Avnet Computer
933 Motor Parkway
Hauppauge 11788
Tel: (516) 231-9040
FAX: (516) 434-7426
Avnet Computer
2060 Townline
Rochester 14623
Tel: (716) 272-9306
FAX: (716) 272-9685
†Hamilton/Avnet Electronics
933 Motor Parkway
Hauppauge 11788
Tel: (516) 231-9800
FAX: (516) 434-7426
†Hamilton/Avnet Electronics
2060 Townline Rd.
Rochester 14623
Tel: (716) 292-0730
FAX: (716) 292-0810
Pioneer/Standard Electronics
68 Corporate Drive
Binghamton 13904
Tel: (607) 722-9300
FAX: (607) 722-9562
†Pioneer/Standard Electronics
60 Crossway Park West
Woodbury, Long Island 11797
Tel: (516) 921-8700
FAX: (516) 921-2143
†Pioneer/Standard Electronics
840 Fairport Park
Fairport 14450
Tel: (716) 381-7070
FAX: (716) 381-5955
NORTH CAROLINA
†Arrow Electronics, Inc.
5240 Greensdairy Road
Raleigh 27604
Tel: (919) 876-3132
FAX: (919) 878-9517
Avnet Computer
2725 Millbrook Rd., #123
Raleigh 27604
Tel: (919) 790-1735
FAX: (919) 872-4972
Hamilton/Avnet Electronics
5250-77 Center Dr., #350
Charlotte 28217
Tel: (704) 527-2485
FAX: (704) 527-8058
†Hamilton/Avnet Electronics
3510 Spring Forest Drive
Raleigh 27604
Tel: (919) 878-0819
~~oo~eL~~oe~~~~~o~I~: gf~cr~'
Inc.
Charlotte 28210
Tel: (704) 527-8188
FAX: (704) 522-8564
Pioneer Technologies Group, Inc.
2810 Meridian Parkway, #148
Durham 27713
Tel: (919) 544-5400
FAX: (919) 544-5885
OHIO
Arrow Commercial System Group
284 Cramer Creek Court
Dublin 43017
Tel: (614) 889-9347
FAX: (614) 889-9680
†Arrow Electronics, Inc.
6573 Cochran Road, #E
Solon 44139
Tel: (216) 248-3990
FAX: (216) 248-1106
Arrow Electronics, Inc.
8200 Washington Village Dr.
Centerville 45458
Tel: (513) 435-5563
FAX: (513) 435-2049
NORTH AMERICAN DISTRIBUTORS (Contd.)
OHIO (Contd.)
Avnet Computer
7764 Washington Village Dr.
Dayton 45459
Tel: (513) 439-6756
FAX: (513) 439-6719
Avnet Computer
30325 Bainbridge Rd., Bldg. A
Solon 44139
Pioneer/Technologies Group, Inc.
259 Kappa Drive
Pittsburgh 15238
Tel: (412) 782-2300
FAX: (412) 963-8255
†Pioneer/Technologies Group, Inc.
500 Enterprise Road
Keith Valley Business Center
Horsham 19044
Tel: (215) 674-4000
FAX: (215) 674-3107
Tel: (216) 349-2505
FAX: (216) 349-1894
†Hamilton/Avnet Electronics
7760 Washington Village Dr.
Dayton 45459
Tel: (513) 439-6733
FAX: (513) 439-6711
†Hamilton/Avnet Electronics
30325 Bainbridge
Solon 44139
Tel: (800) 543-2984
FAX: (216) 349-1894
Hamilton/Avnet Electronics
2600 Corp Exchange Drive, #180
Columbus 43231
Tel: (614) 882-7004
FAX: (614) 882-8650
MTI Systems Sales
23404 Commerce Park Road
Beachwood 44122
Tel: (216) 464-6688
FAX: (216) 464-3564
†Pioneer/Standard Electronics
4433 Interpoint Boulevard
†Pioneer/Standard Electronics
4800 E. 131st Street
Tel: (901) 367-0540
FAX: (901) 367-2081
TEXAS
Tel: (801) 974-9953
FAX: (801) 972-2524
Arrow Electronics, Inc.
3220 Commander Drive
Carrollton 75006
Tel: (214) 380-6464
FAX: (214) 248-7208
Avnet Computer
4004 Beltline, Suite 200
Dallas 75244
Tel: (214) 308-8181
FAX: (214) 308-8129
1235 North Loop West, #525
OREGON
1885 N.W. 169th Place
Beaverton 97006
Tel: (503) 629-8090
FAX: (503) 645-0611
Avnet Computer
9409 Southwest Nimbus Ave.
Beaverton 97005
Tel: (503) 627-0900
FAX: (503) 526-6242
†Hamilton/Avnet Electronics
9409 S.W. Nimbus Ave.
~:~(~~~)i;~g~OI
FAX: (503) 641-4012
Wyle
9640 Sunshine Court
Bldg. G, Suite 200
Beaverton 97005
†Hamilton/Avnet Electronics
1626-F Kramer Lane
Austin 78758
Tel: (713) 240-7733
FAX: (713) 861-6541
Tel: (206) 241-8555
FAX: (206) 241-5472
15385 N.E. 90th Street
Tel: (414) 792-0150
FAX: (414) 792-0156
Avnet Computer
20875 Crossroads Circle, #400
Waukesha 53186
Tel: (414) 784-8205
FAX: (414) 784-6006
†Hamilton/Avnet Electronics
28875 Crossroads Circle, #400
Waukesha 53186
†Pioneer/Standard Electronics
13765 Beta Road
Tel: (414) 784-3480
Dallas 75244
Tel: (214) 386-7300
FAX: (214) 490-6419
ALASKA
Tel: (214) 235-9953
FAX: (214) 644-5064
Wyle Distribution Group
4030 West Braker Lane, #330
Austin 78758
Wyle Distribution Group
11001 South Wilcrest, #100
Houston 77099
Tel: (713) 879-9953
FAX: (713) 879-6540
120 Bishops Way #163
Brookfield 53005
Avnet Computer
1400 West Benson Blvd.
Suite 400
re~~~~;~~~~~9O:99
FAX: (907) 277-2639
CANADA
ALBERTA
Avnet Computer
2816 21st Street Northeast
Calgary T2E 6Z2
Tel: (403) 291-3284
FAX: (403) 250-1591
Hamilton/Avnet Electronics
213 Executive, #320
Mars 16045
Arrow Electronics, Inc.
1946 W. Parkway Blvd.
Tel: (412) 281-4152
FAX: (412) 772-1890
Salt Lake City 84119
Tel: (801) 973-6913
†Arrow Electronics, Inc.
1093 Meyerside, Unit 2
Mississauga L5T 1M4
Tel: (416) 670-7769
FAX: (416) 670-7781
Avnet Computer
Canada System Engineering
Group
3688 Nashua Dr., Unit 6
Mississauga L4V 1M5
Tel: (416) 672-8638
FAX: (416) 677-5091
Avnet Computer
6845 Rexwood Road
Units 7-9
Mississauga L4V 1M4
Tel: (416) 672-8638
FAX: (416) 672-8650
Avnet Computer
190 Colonade Road
Nepean K2E 7J5
Tel: (416) 677-7432
FAX: (416) 677-0940
†Hamilton/Avnet Electronics
190 Colonade Road
Nepean K2E 7J5
Tel: (613) 226-1700
FAX: (613) 226-1184
†Zentronics
1355 Meyerside Drive
Mississauga L5T 1C9
Tel: (416) 564-9600
FAX: (416) 564-3127
†Zentronics
155 Colonade Rd., South
Unit 17
Nepean K2E 7K1
Tel: (613) 226-8840
FAX: (613) 226-6352
QUEBEC
Arrow Electronics Inc.
1100 St. Regis Blvd.
Dorval H9P 2T5
Tel: (514) 421-7411
FAX: (514) 421-7430
Arrow Electronics, Inc.
500 Boul. St-Jean-Baptiste Ave.
Quebec H2E 5R9
Tel: (418) 871-7500
FAX: (418) 871-6816
Avnet Computer
2795 Rue Halpern
St. Laurent H4S 1P6
Zentronics
Tel: (514) 335-2483
FAX: (514) 335-2481
6815 8th Street N.E., #100
Calgary T2E 7H
Tel: (403) 295-8838
FAX: (403) 295-8714
†Hamilton/Avnet Electronics
2795 Halpern
St. Laurent H4S 1P8
BRITISH COLUMBIA
UTAH
Tel: (613) 226-6903
FAX: (613) 723-2018
†Hamilton/Avnet Electronics
6845 Rexwood Rd., Units 3-5
Mississauga L4T 1R2
Arrow Electronics, Inc.
200 N. Patrick Blvd., Ste. 100
Brookfield 53005
Pioneer/Standard Electronics
†Wyle Distribution Group
1810 Greenville Avenue
Richardson 75081
ONTARIO
Arrow Electronics, Inc.
36 Antares Dr., Unit 100
Nepean K2E 7W5
WISCONSIN
Tel: (512) 835-4000
FAX: (512) 835-9829
Tel: (713) 495-4700
FAX: (713) 495-5642
Tel: (604) 273-5575
FAX: (604) 273-2413
Tel: (613) 727-7529
FAX: (613) 226-1184
Tel: (414) 784-4510
FAX: (414) 784-9509
†Pioneer/Standard Electronics
10530 Rockley Road, #100
Houston 77099
Zentronics
11400 Bridgeport Rd., #108
Richmond V6X 1T2
Tel: (206) 881-1150
FAX: (206) 881-1567
†Pioneer/Standard Electronics
1826-0 Kramer Lane
Austin 78758
PENNSYLVANIA
†Certified VAD
17761 N.E. 78th Place, #C
Redmond 98052
Redmond 98052
Tel: (512) 345-8853
FAX: (512) 345-9330
Tel: (412) 772-1888
FAX: (412) 772-1890
†Hamilton/Avnet Electronics
Wyle Distribution Group
Tel: (503) 643-7900
FAX: (503) 646-5466
Avnet Computer
213 Executive Drive, #320
Mars 16046
†Almac Electronics Corp.
14360 S.E. Eastgate Way
Bellevue 98007
Tel: (206) 643-9992
FAX: (206) 643-9709
Houston 77008
†Hamilton/Avnet Electronics
1235 N. Loop West, #521
Houston 77006
†Almac Electronics Corp.
WASHINGTON
Tel: (713) 867-7500
FAX: (713) 861-6851
Tel: (214) 308-8111
FAX: (214) 308-8109
12121 E. 51st St., Suite 102A
Tulsa 74146
Tel: (918) 664-0444
FAX: (918) 250-8763
1100 East 6600 South, #120
†Wyle Distribution Group
1325 West 2200 South, #E
West Valley 84119
Arrow Electronics, Inc.
†Hamilton/Avnet Electronics
†Hamilton/Avnet Electronics
Tel: (801) 972-2800
FAX: (801) 263-0104
OKLAHOMA
Tulsa 74146
Tel: (918) 252-7537
FAX: (918) 254-0917
Tel: (206) 867-0160
FAX: (206) 867-0161
Arrow Commercial System Group
3635 Knight Road, #7
Memphis 38118
†Hamilton/Avnet Electronics
4004 Beltline, #200
Dallas 75244
12111 East 51st Street, #101
Avnet Computer
17761 Northeast 78th Place
Redmond 98052
Salt Lake City 84121
Tel: (800) 772-5668
FAX: (512) 832-4315
Cleveland 44105
Tel: (216) 587-3600
FAX: (216) 663-1004
1100 E. 6600 South, #150
Salt Lake City 84121
Tel: (801) 266-1115
FAX: (801) 266-0362
TENNESSEE
Avnet Computer
Dayton 45424
Tel: (513) 236-9900
FAX: (513) 236-8133
Avnet Computer
†Hamilton/Avnet Electronics
8610 Commerce Court
Burnaby V5A 4N6
Tel: (604) 420-4101
FAX: (604) 420-5376
Tel: (514) 335-1000
FAX: (514) 335-2461
†Zentronics
520 McCaffrey
St. Laurent H4T 1N3
Tel: (514) 737-9700
FAX: (514) 737-5212
EUROPEAN SALES OFFICES
FINLAND
GERMANY
ITALY
SPAIN
Intel GmbH
Dornacher Strasse 1
Intel Corporation Italia S.p.A.
Milanofiori Palazzo E
Intel Iberia SA
8016 Feldkirchen bei Muenchen
Tel: (49) 089/90992-0
FAX: (49) 089/9043948
20094 Assago
Tel: (358) 0 544 644
FAX: (358) 0 544 030
FRANCE
ISRAEL
Intel Corporation S.A.R.L.
1, Rue Edison-BP 303
Intel Semiconductor Ltd.
Atidim Industrial Park-Neve Sharet
78054 St. Quentin-en-Yvelines
P.O. Box 43202
Tel-Aviv 61430
Tel: (972) 03498080
FAX: (972) 03491870
Intel Finland OY
Ruosilantie 2
00390 Helsinki
Cedex
Tel: (33) (1) 30 57 70 00
FAX: (33) (1) 30 64 60 32
Milano
Tel: (39) (02) 89200950
FAX: (39) (2) 3498464
Zurbaran, 28
28010 Madrid
Tel: (34) 308 25 52
FAX: (34) 410 7570
NETHERLANDS
SWEDEN
Intel Semiconductor B.V.
3009 CC Rotterdam
Intel Sweden A.B.
Dalvagen 24
171 36 Solna
Tel: (31) 10 407 11 11
FAX: (31) 10 455 4688
Tel: (46) 8 734 01 00
FAX: (46) 8 278085
Postbus 84130
UNITED KINGDOM
Intel Corporation (UK) Ltd.
Pipers Way
Swindon, Wiltshire SN3 1RJ
Tel: (44) (0793) 696000
FAX: (44) (0793) 641440
EUROPEAN DISTRIBUTORS/REPRESENTATIVES
AUSTRIA
Sacher Electronics GmbH
Rotenmuehlgasse 26
A-1120 Wien
Tel: 43 22281356460
FAX: 43 222 834276
BELGIUM
Inelco Belgium SA
Oorlogskruisenlaan 94
B-1120 Bruxelles
Tel: 32 2 244 2811
Proelectron Vertriebs GmbH
Max-Planck-Strasse 1-3
6072 Dreieich
Tel: 49 6103 304343
FAX: 49 6103 304425
Rein Electronik GmbH
Loetscher Weg 66
4054 Nettetal 1
Tel: 49 2153 7330
FAX: 49 2153 733513
Lasi Elettronica S.p.A.
Naverland 29
DK-2600 Glostrup
Denmark
Tel: 010 45 42 451822
FAX: 39 2 66101385
FAX: 010 45 42 457624
P.I.00839000155
Telcom s.r.l.-Divisione MDS
Via Trombetta
Zona Marconi
Strada Cassanese
Segrate - Milano
FAX: 32 2 216 4301
GREECE
Tel: 39 2 2138010
FAX: 39 2 216061
FRANCE
Pouliadis Associates Corp.
5 Koumbari Street
Kolonaki Square
10674 Athens
NETHERLANDS
Almex
48, Rue de l'Aubepine
B.P. 102
92164 Antony Cedex
Tel: 33 1 4096 5400
FAX: 33 1 4666 6028
Lex Electronics
Silic 585
60 Rue des Gemeaux
94663 Rungis Cedex
Tel: 33 1 4978 4978
FAX: 33 1 4978 0596
Tel: 30 13603741
FAX: 30 1 360 7501
IRELAND
Micro Marketing
Taney Hall
Eglinton Terrace
Dundrum
Dublin
Metrologie
Tour d'Asnieres
4, Avenue Laurent Cely
92606 Asnieres Cedex
Tel: 33 1 47906240
Tel: 0001 989400
FAX: 0001 989 8282
FAX: 33 1 4790 5947
Eastronics Ltd.
Rozanis 11
Tekelec-Airtronic
Cite Des Bruyeres
Rue Carle Vernet
BP 2
92310 Sevres
Tel: 33 1 4623 2425
P.O.B.39300
Tel Baruch
Tel-Aviv 61392
Tel: 972 3 475151
E2000 Vertriebs-AG
Stahlgruberring 12
8000 Muenchen 82
Tel: 49 89 420010
Celdis Spa
Via F.lli Gracchi 36
20092 Cinisello Balsamo
Milano
FAX: 49 89 42001209
Tel: 39 2 66012003
FAX: 39 2 6182433
FAX: 49 89 72447111
PORTUGAL
ATD Electronica LDA
Rua Dr. Faria de
Vasconcelos, 3a
1900 Lisboa
Tel: 351 1 8472200
FAX: 47 2 846545
Nordisk Elektronik AS
Box 36
Torshamnsgatan 39
S-16493 Kista
Sweden
Tel: 46 8 7034630
FAX: 46 8 7039845
SWITZERLAND
Industrade AG.
Hertistrasse 31
CH-8304 Wallisellen
Tel: 41 1 8328111
FAX: 41 1 8307550
TURKEY
SPAIN
EMPA
ATD Electronica
Plaza Ciudad de Viana, 6
28040 Madrid
80050 Sishane
Refik Saydam Cad No. 89/5
Istanbul
Tel: 90 1 143 6212
Tel: 34 1 534 4000/09
FAX: 34 1 534 7663
FAX: 90 1 143 6547
FAX: 972 3 475125
ITALY
Metrologie GmbH
Steinerstrasse 15
8000 Muenchen 70
Tel: 49 89 724470
2627 AP Delft
Tel: 31 15609906
FAX: 31 15619194
Nordisk Elektronik A/S
Postboks 122
Smedsvingen 4
N-1364 Hvalstad
Norway
Tel: 47 2 846210
FAX: 351 1 8472197
GERMANY
Tel: 49 6431 5080
FAX: 49 6431 508289
Koning en Hartman BV.
Energieweg 1
ISRAEL
FAX: 33 1 4507 2191
Jermyn GmbH
Im Dachsstueck 9
6250 Limburg
ITT Multikomponent A/S
Viale Fulvia Testi, N.280
20126 Milano
Tel: 39 2 66101370
Intesi Div. Della Deutsche
Divisione ITT
Industries GmbH
P.I. 06550110156
Milanofiori Palazzo E5
20094 Assago (Milano)
Tel: 39 2 824701
FAX: 39 2 8242631
Metrologia Iberica
Ctra. De Fuencarral N.80
28100 Alcobendas
Madrid
Tel: 34 1 6538611
FAX: 34 1 6517549
SCANDINAVIA
OY Fintronic AS
Heikkilantie 2a
SF-00210 Helsinki
Tel: 358 0 6926022
FAX: 358 0 6821251
UNITED KINGDOM
Access Elect Comp Ltd.
Jubilee House
Jubilee Road
Letchworth
Hertfordshire
SG6 1QH
Tel: 0462 480888
FAX: 0462 682467
Bytech Components Ltd.
12a Cedarwood
Chineham Business Park
Crockford Lane
Basingstoke
Hants RG12 1RW
Tel: 0256 707107
FAX: 0256 707162
Bytech Systems
Unit 3
The Western Centre
Western Road
Bracknell
Berks RG12 1RW
Tel: 0344 55333
FAX: 0344 867270
Metrologie
Rapid House
Oxford Road
High Wycombe
Bucks
Herts HP11 2EE
Tel: 0494 474147
FAX: 0494 452144
Jermyn
Vestry Estate
Otford Road
Sevenoaks
Kent TN14 5EU
Tel: 0732 450144
FAX: 0732 451251
MMD
3 Bennet Court
Bennet Road
Reading
Berkshire RG2 0QX
Tel: 0734 313232
FAX: 0734 313255
Rapid Silicon
3 Bennet Court
Bennet Road
Reading, Berkshire RG2 0QX
Tel: 0734 752266
FAX: 0734 312728
Metro Systems
Rapid House
Oxford Road
High Wycombe
Bucks HP11 2EE
Tel: 0494 474171
FAX: 0494 21860
YUGOSLAVIA
H.R. Microelectronics Corp.
2005 de la Cruz Blvd.
Suite 220
Santa Clara, CA 95050
U.S.A.
Tel: (408) 988-0286
FAX: (408) 988-0306
INTERNATIONAL SALES OFFICES
AUSTRALIA
Intel Australia Pty. Ltd.
Unit 13
Allambie Grove Business Park
25 Frenchs Forest Road East
Frenchs Forest, NSW, 2086
Sydney
Tel: 61-2-975-3300
FAX: 61-2-975-3375
Intel Australia Pty. Ltd.
711 High Street
1st Floor
East Kew, Vic., 3102
Melbourne
Tel: 61-3-810-2141
FAX: 61-3-819-7200
BRAZIL
Intel Semiconductores do Brazil LTDA
Avenida Paulista, 1159-CJS 404/405
01311 - Sao Paulo - S.P.
Tel: 55-11-287-5899
TLX: 11-37-557-ISOB
FAX: 55-11-287-5119
Intel Semiconductor Ltd. *
10/F East Tower
Bond Center
Queensway, Central
Hong Kong
Tel: (852) 844-4555
FAX: (852) 868-1989
INDIA
Intel Asia Electronics, Inc.
4/2, Samrah Plaza
St. Mark's Road
Bangalore 560001
Tel: 91-812-215773
TLX: 953-845-2646 INTEL IN
FAX: 091-812-215067
JAPAN
CHINA/HONG KONG
Intel Japan K.K.
5-6 Tokodai, Tsukuba-shi
Ibaraki, 300-26
Tel: 0298-47-8511
FAX: 0298-47-8450
Intel PRC Corporation
15/F, Office 1, Citic Bldg.
Jian Guo Men Wai Street
Beijing, PRC
Tel: (1) 500-4850
TLX: 22947 INTEL CN
FAX: (1) 500-2953
Intel Japan K.K.*
Hachioji ON Bldg.
4-7-14 Myojin-machi
Hachioji-shi, Tokyo 192
Tel: 0426-48-8770
FAX: 0426-48-8775
Intel Japan K.K.*
Bldg. Kumagaya
2-69 Hon-cho
Kumagaya-shi, Saitama 360
Tel: 0485-24-6871
FAX: 0485-24-7518
Intel Japan K.K.*
Kawa-asa Bldg.
2-11-5 Shin-Yokohama
Kohoku-ku, Yokohama-shi
Kanagawa, 222
Tel: 045-474-7661
FAX: 045-471-4394
Intel Japan K.K.*
Ryokuchi-Eki Bldg.
2-4-1 Terauchi
Toyonaka-shi, Osaka 560
Tel: 06-863-1091
FAX: 06-863-1084
Intel Japan K.K.
Shinmaru Bldg.
1-5-1 Marunouchi
Chiyoda-ku, Tokyo 100
Tel: 03-3201-3621
FAX: 03-3201-6850
Intel Japan K.K.
Green Bldg.
1-16-20 Nishiki
Naka-ku, Nagoya-shi
Aichi 460
Tel: 052-204-1261
FAX: 052-204-1285
KOREA
Intel Korea, Ltd.
16th Floor, Life Bldg.
61 Yoido-dong, Youngdeungpo-Ku
Seoul 150-010
Tel: (2) 784-8186
FAX: (2) 784-8096
SINGAPORE
Intel Singapore Technology, Ltd.
101 Thomson Road #08-03/06
United Square
Singapore 1130
Tel: (65) 250-7811
FAX: (65) 250-9256
TAIWAN
Intel Technology Far East Ltd.
Taiwan Branch Office
8th Floor, No. 205
Bank Tower Bldg.
Tung Hua N. Road
Taipei
Tel: 886-2-5144202
FAX: 886-2-717-2455
INTERNATIONAL DISTRIBUTORS/REPRESENTATIVES
ARGENTINA
INDIA
Dafsys S.R.L.
Chacabuco, 90-6 Piso
1069-Buenos Aires
Tel: 54-1-34-7726
FAX: 54-1-34-1871
Micronic Devices
Arun Complex
No. 65 D.V.G. Road
Basavanagudi
Bangalore 560 004
Tel: 011-91-812-600-631
011-91-812-611-365
TLX: 9538458332 MOBG
AUSTRALIA
Email Electronics
15-17 Hume Street
Huntingdale, 3166
Tel: 011-61-3-544-8244
TLX: AA 30895
FAX: 011-61-3-543-8179
NSD-Australia
205 Middleborough Rd.
Box Hill, Victoria 3128
Tel: 03 8900970
FAX: 03 8990819
BRAZIL
Microlinear
Largo do Arouche, 24
01219 Sao Paulo, SP
Tel: 5511-220-2215
FAX: 5511-220-5750
CHILE
Micronic Devices
No. 516 5th Floor
Swastik Chambers
Sion, Trombay Road
Chembur
Bombay 400 071
TLX: 9531 171447 MDEV
Micronic Devices
25/8, 1st Floor
Bada Bazaar Marg
Old Rajinder Nagar
New Delhi 110 060
Tel: 011-91-11-5723509
011-91-11-589771
TLX: 031-63253 MONO IN
Micronic Devices
6-3-348/12A Dwarakapuri Colony
Hyderabad 500 482
Tel: 011-91-842-226748
Sisteco
Vecinal 40 - Las Condes
Santiago
Tel: 562-234-1644
FAX: 562-233-9895
S&S Corporation
1587 Kooser Road
San Jose, CA 95118
Tel: (408) 978-6216
TLX: 820281
FAX: (408) 978-8635
CHINA/HONG KONG
JAMAICA
Novel Precision Machinery Co., Ltd.
Room 728 Trade Square
681 Cheung Sha Wan Road
Kowloon, Hong Kong
Tel: (852) 360-8999
TWX: 32032 NVTNL HX
FAX: (852) 725-3695
MC Systems
10-12 Grenada Crescent
Kingston 5
Tel: (809) 929-2638
(809) 926-0188
FAX: (809) 926-0104
JAPAN
GUATEMALA
Abinitio
11 Calle 2-Zona 9
Guatemala City
Tel: 5022-32-4104
FAX: 5022-32-4123
*Field Application Location
Asahi Electronics Co. Ltd.
KMM Bldg. 2-14-1 Asano
Kokurakita-ku
Kitakyushu-shi 802
Tel: 093-511-6471
FAX: 093-551-7861
CTC Components Systems Co., Ltd.
4-8-1 Dobashi, Miyamae-ku
Kawasaki-shi, Kanagawa 213
Tel: 044-852-5121
FAX: 044-877-4268
Dia Semicon Systems, Inc.
Flower Hill Shinmachi Higashi-kan
1-23 Shinmachi, Setagaya-ku
Tokyo 154
Tel: 03-3439-1600
FAX: 03-3439-1601
Okaya Koki
2-4-18 Sakae
Naka-ku, Nagoya-shi 460
Tel: 052-204-8315
FAX: 052-204-8380
Ryoyo Electro Corp.
Konwa Bldg.
1-12-22 Tsukiji
Chuo-ku, Tokyo 104
Tel: 03-3546-5011
FAX: 03-3546-5044
SAUDI ARABIA
ME Systems, Inc.
642 N. Pastoria Ave.
Sunnyvale, CA 94086
U.S.A.
Tel: (408) 732-1710
FAX: (408) 732-3095
TLX: 494-3405 AAE SYS
SINGAPORE
Electronic Resources Pte, Ltd.
17 Harvey Road
#03-01 Singapore 1336
Tel: (65) 283-0888
TWX: RS 56541 ERS
FAX: (65) 289-5327
SOUTH AFRICA
Electronic Building Elements
178 Erasmus St. (off Watermeyer St.)
Meyerspark, Pretoria, 0184
Tel: 011-2712-803-7680
FAX: 011-2712-803-8294
KOREA
J-Tek Corporation
Dong Sung Bldg. 9/F
158-24, Samsung-Dong, Kangnam-Ku
Seoul 135-090
Tel: (822) 557-8039
FAX: (822) 557-8304
Samsung Electronics
Samsung Main Bldg.
150 Taepyung-Ro-2KA, Chung-Ku
Seoul 100-102
C.P.O. Box 8780
Tel: (822) 751-3680
TWX: KORSST K 27970
FAX: (822) 753-9065
TAIWAN
Micro Electronics Corporation
12th Floor, Section 3
285 Nanking East Road
Taipei, R.O.C.
Tel: (886) 2-7198419
FAX: (886) 2-7107916
Acer Sertek Inc.
15th Floor, Section 2
Chien Kuo North Rd.
Taipei 18479 R.O.C.
Tel: 886-2-501-0055
TWX: 23756 SERTEK
FAX: (886) 2-5012521
MEXICO
URUGUAY
PSI SA de C.V.
Fco. Villa esq. Ajusco sin
Cuernavaca, MOR 62130
Tel: 52-73-13-9412
52-73-17-5340
FAX: 52-73-17-5333
Interfase
Zabala 1378
11000 Montevideo
Tel: 5982-96-0490
5982-96-1143
FAX: 5982-96-2965
NEW ZEALAND
VENEZUELA
Email Electronics
36 Olive Road
Penrose, Auckland
Tel: 011-64-9-591-155
FAX: 011-64-9-592-681
Unixel C.A.
4 Transversal de Monte Cristo
Edf. AXXA, Piso 1, of. 1&2
Centro Empresarial Boleita
Caracas
Tel: 582-238-6082
FAX: 582-238-1816
NORTH AMERICAN SERVICE OFFICES
ALASKA
CONNECTICUT
MARYLAND
NEW YORK
PUERTO RICO
Intel Corp.
c/o TransAlaska Network
1515 Lore Rd.
Anchorage 99507
Tel: (907) 522-1776
*Intel Corp.
301 Lee Farm Corporate Park
83 Wooster Heights Rd.
Danbury 06811
Tel: (203) 748-3130
**Intel Corp.
10010 Junction Dr., Suite 200
Annapolis Junction 20701
Tel: (301) 206-2860
*Intel Corp.
2950 Expressway Dr. South
Suite 130
Islandia 11722
Tel: (516) 231-3300
Intel Corp.
South Industrial Park
P.O. Box 910
Las Piedras 00671
Tel: (809) 733-8616
Intel Corp.
300 Westage Business Center
Suite 230
Fishkill 12524
Tel: (914) 897-3860
TEXAS
Intel Corp.
c/o TransAlaska Data Systems
c/o Gel Operations
520 Fifth Ave., Suite 407
Fairbanks 99701
Tel: (907) 452-6264
ARIZONA
*Intel Corp.
410 North 44th Street
Suite 500
Phoenix 85008
Tel: (602) 231-0386
FAX: (602) 244-0446
"Intel Corp.
500 E. Fry Blvd., Suite M-15
Sierra Vista 85635
Tel: (602) 459-5010
FLORIDA
**Intel Corp.
800 Fairway Dr., Suite 160
Deerfield Beach 33441
Tel: (305) 421-0506
FAX: (305) 421-2444
*Intel Corp.
5850 T.G. Lee Blvd., Ste. 340
Orlando 32822
Tel: (407) 240-8000
GEORGIA
*Intel Corp.
20 Technology Park, Suite 150
Norcross 30092
Tel: (404) 449-0541
MASSACHUSETTS
**Intel Corp.
Westford Corp. Center
3 Carlisle Rd., 2nd Floor
Westford 01666
Tel: (508) 692-0960
MICHIGAN
*Intel Corp.
7071 Orchard Lake Rd., Ste. 100
West Bloomfield 48322
Tel: (313) 851-8905
MINNESOTA
*Intel Corp.
3500 W. 80th St., Suite 360
Bloomington 55431
Tel: (612) 835-6722
ARKANSAS
5523 Theresa Street
Columbus 31907
Intel Corp.
c/o Federal Express
HAWAII
MISSISSIPPI
**Intel Corp.
Honolulu 96820
Tel: (808) 847-6738
Intel Corp.
c/o Compu-Care
2001 Airport Road, Suite 205F
Jackson 39208
Tel: (601) 932-6275
1500 West Park Drive
Little Rock 72204
CALIFORNIA
*Intel Corp.
21515 Vanowen St., Ste. 116
Canoga Park 91303
Tel: (818) 704-8500
"Intel Corp.
300 N. Continental Blvd.
Suite 100
El Segundo 90245
Tel: (213) 640-6040
"Intel Corp.
1900 Prairie City Rd.
Folsom 95630-9597
Tel: (916) 351-6143
"Intel Corp.
9665 Chesapeake Dr., Suite 325
San Diego 92123
Tel: (619) 292-8086
"·Intel Corp.
400 N. Tustin Avenue
Suite 450
Santa Ana 92705
Tel: (714) 835-9642
**Intel Corp.
2700 San Tomas Exp., 1st Floor
Santa Clara 95051
Tel: (408) 970-1747
COLORADO
*Intel Corp.
600 S. Cherry St., Suite 700
Denver 80222
Tel: (303) 321-8086
ILLINOIS
**†Intel Corp.
Woodfield Corp. Center III
300 N. Martingale Rd., Ste. 400
Schaumburg 60173
Tel: (708) 605-8031
MISSOURI
INDIANA
*Intel Corp.
3300 Rider Trail South
Suite 170
Earth City 63045
Tel: (314) 291-1990
*Intel Corp.
6910 Purdue Rd., Ste. 350
Indianapolis 46268
Tel: (317) 875-0623
Intel Corp.
Route 2, Box 221
Smithville 64089
Tel: (913) 345-2727
KANSAS
NEW JERSEY
*Intel Corp.
10985 Cody, Suite 140
Overland Park 66210
Tel: (913) 345-2727
KENTUCKY
Intel Corp.
133 Walton Ave., Office 1A
Lexington 40508
Tel: (606) 255-2957
Intel Corp.
896 Hillcrest Road, Apt. A
Radcliff 40160 (Louisville)
LOUISIANA
Hammond 70401
(serviced from Jackson, MS)
**Intel Corp.
300 Sylvan Avenue
Englewood Cliffs 07632
Tel: (201) 567-0821
*Intel Corp.
Lincroft Office Center
125 Half Mile Road
Red Bank 07701
Tel: (908) 747-2233
NEW MEXICO
Intel Corp.
Rio Rancho 1
4100 Sara Road
Rio Rancho 87124-1025
(near Albuquerque)
Tel: (505) 893-7000
Intel Corp.
5858 East Molloy Road
Syracuse 13211
Tel: (315) 454-0576
NORTH CAROLINA
*Intel Corp.
5800 Executive Center Drive
Suite 105
Charlotte 28212
Tel: (704) 568-8966
**Intel Corp.
5540 Centerview Dr., Suite 215
Raleigh 27606
Tel: (919) 851-9537
VIRGINIA
*Intel Corp.
25700 Science Park Dr., Ste. 100
Beachwood 44122
Tel: (216) 464-2736
OREGON
**Intel Corp.
15254 N.W. Greenbrier Pkwy.
Building B
Beaverton 97006
Tel: (503) 645-8051
PENNSYLVANIA
*†Intel Corp.
925 Harvest Drive
Suite 200
Blue Bell 19422
Tel: (215) 641-1000
1-800-468-3548
FAX: (215) 641-0785
**†Intel Corp.
400 Penn Center Blvd., Ste. 610
Pittsburgh 15235
Tel: (412) 823-4970
*Intel Corp.
1513 Cedar Cliff Dr.
Camp Hill 17011
Tel: (717) 761-0860
SYSTEMS ENGINEERING OFFICES
2950 Expressway Dr., South
Islandia 11722
Tel: (516) 231-3300
*Carry-in locations
**Carry-in/mail-in locations
UTAH
Intel Corp.
428 East 6400 South
Suite 104
Murray 84107
Tel: (801) 263-8051
FAX: (801) 268-1457
*Intel Corp.
9030 Stony Point Pkwy.
Suite 360
Richmond 23235
Tel: (804) 330-9393
2402 W. Beardsley Road
Phoenix 85027
Tel: (602) 869-4288
1-800-468-3548
NEW YORK
**Intel Corp.
7322 SW Freeway, Suite 1490
Houston 77074
Tel: (713) 988-8086
OHIO
CUSTOMER TRAINING CENTERS
3500 W. 80th Street
Suite 360
Bloomington 55431
Tel: (612) 835-6722
**†Intel Corp.
12000 Ford Rd., Suite 401
Dallas 75234
Tel: (214) 241-8087
**Intel Corp.
3401 Park Center Dr., Ste. 220
Dayton 45414
Tel: (513) 890-5350
ARIZONA
MINNESOTA
**Intel Corp.
Westech 360, Suite 4230
8911 N. Capital of Texas Hwy.
Austin 78752-1239
Tel: (512) 794-8086
WASHINGTON
**Intel Corp.
155 108th Avenue N.E., Ste. 386
Bellevue 98004
Tel: (206) 453-8086
CANADA
ONTARIO
**Intel Semiconductor of
Canada, Ltd.
2650 Queensview Dr., Ste. 250
Ottawa K2B 8H6
Tel: (613) 829-9714
**Intel Semiconductor of
Canada, Ltd.
190 Attwell Dr., Ste. 102
Rexdale (Toronto) M9W 6H8
Tel: (416) 675-2105
QUEBEC
**Intel Semiconductor of
Canada, Ltd.
1 Rue Holiday
Suite 115
Tour East
Pt. Claire H9R 5N3
Tel: (514) 694-9130
FAX: 514-694-0064